If y is a dependent variable (aka the **response variable**) and* x*_{1}*, …, x _{k}* are independent variables (aka

**predictor variables**), then the

**multiple regression model**provides a prediction of y from the

*x*of the form

_{i}Topics:

Everything you need to do real statistical analysis using Excel

Hello,

I am trying to create a predictive model for earnings of automotive companies using a multiple regression model. I will be using 6 independent variables. However, there are 10 companies. Is there some sort of error I should consider. If you could give me name for this type of analysis I could do some of my own research.

or would it not matter if the data comes from different companies?

Aamir,

It depends on how you selected the companies (e.g. randomly) and how you plan to use the data. Does this mean that you have 10 values for each of the variables (based on the 10 companies)?

Charles

I ve picked the 10 biggest companies by units sold. its a time series data.

so each company will have 6 independent variables for the past 30 years. eg. earnings = B1x1 + B2x2….

so each variable has 30 values for 10 companies. i.e 300

the independent variables are rnd spending, profit margin, employee expenses…

I will use the model and do tests to conclude if R&D spending affects earnings.

Question:

would disregard the fact that they values for independent variable come from different companies? how can i take the company into account in my analysis?

thank you for your reply Charles. Really appreciate it.

Aamir,

Thanks for providing this essential information that was missing from your original description.

Whether or not you can use data from 10 different companies now depends on what you are trying to demonstrate. For usual, univariate, time series analysis you would have one independent variable, namely time. You may need to use two independent variables, namely time and company; this is multivariate time series analysis.

Charles

Hello

I am working with time series data (daily data of rainfall) and have some missing data values. Out of 60 years of data data for few months are missing. I want to fill in those missing values using the multiple regression analysis technique. Can you please help me to do this using the real stat package?

Please look at the following webpage for how to do this

http://www.real-statistics.com/handling-missing-data/traditional-approaches-handling-missing-data/

Charles

Thank you for the reply. I see that this method can work great if the missing data are continuous in nature. What if the missing values are random instead of being in a sequence?

Thanks again!

Som

It really depends on how much of the data is missing. If only a little is missing listwise deletion could still work or you could simply generate random values for the missing data*provided data is truly random(.

Charles

Hi Prof,

I am trying to fit a multiple linear regression model to my data (200 observations of one dependent & 6 independent variables) using excel – data analysis tool pack.

I’ve also checked correlation between the variables, 3-4 pairs have correlation >= 0.45

My R^2 is very low 0.26 (adjusted R^2 is 0.23) shows 26% of explained variation; Significance F is very low (0.05 (say, 0.483737793 and 0.074517895) [For these variables correlation is 0.4577]

And the regression coefficient for all 6 variables are very low (say, -0.0027809, -0.027634038, 0.052207914, -1.74467E-06, -4.78887E-05, -9.98705E-06)

So I tried deleting one of the independent variable with p-value but still getting lower values of R^2, regression coefficients, one p-value > 0.05

I don’t find the correct path to obtain a suitable model.

Please suggest the steps to follow, while building a strong multivariate regression model.

Please email me (see Contact Us) an Excel file with your data so that I can understand the situation better.

Charles

Hi,

I am slowly picking up in learning regression techniques. I have a situation here like i need predict a rating score for labour (response variable) and I have 6 predictor variables of different types. Please suggest the steps to start with and complete the analysis, thus designing a predictive model.

Thanks!

You can find information about this subject at

Testing extra variables in multiple regression

Also relevant are

Interaction

Polynomial regression

Charles

Hi Charles,

I am trying to predict the next days stock price based on past data with todays open values.

A stock named ABC depends on Constituent Indices : CNX 500,CNX 200,CNX SMALLCAP,CNX MIDCAP INDEX.

For CNX 500 we have Open,High,Low, Close

For CNX 200 we have Open,High,Low, Close

For CNX SMALLCAP we have Open,High,Low,Close

For CNX CNX MIDCAP we have Open,High,Low,Close

For stock named ABC we have Open, High,Low, Last, Close

Totally i have 21 variables. Since there are 21 variables, i assumed atleast 9 data points for each variable. So i collected datapoints of 21 *9 = 189 data rows.

I wanted to do multiple regression but excel does not support for these 21 variables.

Could you please suggest the best approach to predict the next days price?

I have seen Interaction, Tolerance, VIF in the links you have provided. Does interaction also applies for the stock as well?

Thanks,

Venkat

Hi Charles,

Small correction, typo in earlier message

I am trying to predict the next day stock price based on past data with next day open values.

Thanks,

Venkat

Venkat,

Excel’s Regression data analysis supports up to 16 independent variables. Excel’s LINEST function can be used instead, and it supports up to 64 variables.

You are better off using the Real Statistics Multiple Linear Regression data analysis tool since it supports as many independent variables as you need and is easier to use than LINEST. The tool also generates the VIF values as well (Tolerance is just the reciprocal of VIF).

If you have two variables x and y, then the interaction is modeled via the variable x*y.

Caution: Since you are using stock market data, autocorrelation needs to be taken into account. I have provided the Durbin-Watson function in the Real Statistics Resource Pack to let you test whether there is significant autocorrelation, but have not yet explained how to revise the regression analysis to take autocorrelation into account; this will be done in one of the next release of the software and website.

Charles

Hi Venkat

You might want to use R (programming language) instead of Excel for Data Analysis. It’s brilliant!

Antonio

Hello,

I am running a multiple regression and am trying to figure out if using three variables instead of two will give me better results. I am trying to figure out the correlation between three independent variables. I can’t run a regression through excel without picking one of the variables as dependent so I was wondering if you have any tips. I am hoping to use the correlation to determine the variance inflation factor to see which additional variable is the least correlated to my other chosen independent variables.

Thank you!

Hello Gabriella,

To determine whether you will get a significantly better result with three variables rather than two, you can use the

RSquareTest, as described on the webpage Testing the significance of extra variables on the model. This webpage will also describe theAICstatistic, which can also be used for this purpose.You can calculate the variance inflation factor using the Real Statistics

VIFfunction, as described on the webpageCollinearity

Finally, with three you can calculate the multiple correlation using the Real Statistics

MCORRELfunction, but you do need to pick one of the variables as the dependent variable.Charles

Dear sir,

I am doing analysis on my epidemiological data about the association between dioxin in mother breast milk and their children hormone. I have some questions, could you please help me to find it out?

– I do the log 10 transform to improve normality first. Then, when I make the linear correlation, which data type should I use, log 10 transformed type of base date?

– The same with above, when I make a multiple regression analysis, which type of data should I use?

– Normally, if we find a significant linear relation, we will do multiple analysis to check that relation again. In my case, I found no significant linear relation between all variables, is it necessary for me to do the multiple analysis next?

Thank you very much for reading my questions.

Anh LT

Anh LT,

If you need the data to be normal then you should use the data after the log transformation, otherwise you can use the original data.

It is entirely possible to not find a linear relationship between x1 and y nor between x2 and y, yet find a linear relationship between x1, x2 and y. The following is a very simple example:

y x1 x2

2 4 0

3 0 6

4 -5.5 14

5 15 -5

The correlation for x1 and y is .41, p-value = .59 (regression)

The correlation for x2 and y is -.11, p-value = .89

The correlation for x1, x2 and y is .999, p-value = .039

Thus, you should do the multiple linear regression. In fact, there isn’t much point in doing the simple regressions.

Charles

Is there a way to perform regression analysis and correlation matrix without the data analysis pak installed. Not able to install at public library

Yes. Excel provides regression and correlation data analysis tools. These can be accessed at Data > Analysis | Data Analysis.

Charles

Dear sir,

Im trying to find out what method to apply when analysing results from a questionnare. I have 1 dependent variable and 3 independent variable. The dependet variable is privacy.

Many thanks

Multiple regression could fit your description, but it depends on what you want to do.

Charles

Thnks for you quick answer. The main objective of the survey is to compare gender, income, age and demographic and see what impact these factors have on the awarness of privacy concerns in terms of loyalty cards.

Many thanks,

Pernille

Pernille,

It sounds like a regression problem. If privacy is a continuous variable then you could try multiple linear regression. If instead privacy is binary then you might try logistic regression.

Charles

We are looking at to what degree or what extend age, gender etc is effecting privacy concerns and awarness of information usage, and not a two way answer.

Pernille,

It sounds like a regression model with gender, income, age and other demographic factors as the independent variables and privacy as the independent variable.

Charles

Hi Charles,

I am trying to analyze the correlation and regression between 1 dependent variable(market value of the stock) and 4 independent variables(external factors of the economy) where i need to do this for 20 companies. what model should i use? can i make one table with all the market values of 20 companies and external factors??please help

Kiran,

If I understand correctly, your data can be organized as follows:

Company Fact1 Fact2 Fact3 Fact4 Value

GM 34.2 12.5 2 10 34.5

etc.

You would have 20 rows of data. You could use multiple regression to predict the value of the stock based on the values of the four factors. The value of R in the output is the correlation.

Charles

Dear Sir,

What stat formula will I use if I have one independent variable (personal formation program of the school) and 3 dependent variables (behavior at home, in school and in the community? You may email me for the answer. Thanks.

gadzilva

It depends on what hypothesis you want to test.

Charles

I’m doing a multiple regression on 23 independent variables. Granted, many of them are interaction variables. The 23rd variable is not giving any coefficients nor standard error data, which is then causing errors in the following cells. I’ve tried running it again without the 23rd variable and the 22nd variable is fine, but when I add in the 23rd, it doesn’t work. Any thoughts?

David,

A possible explanation is collinearity (see Collinearity), but I’d have to see the data to figure out what the problem is. Presumably you are using the Real Statistics data analysis tool since the Excel Regression tool is limited to 16 independent variables.

Charles

I am using that and thank you for that tool. I’m pretty confident that it is Collinearity as the variable is actualy interacted between 3 other variables and one is a squared variable. (X2sqX3X4) but there are other collinear variables in the entire data set so I was surprised when it was the only one that did that.

Please,who can help me among you here guys in doing the stat of my research. it’s really nerve cracking for because i’m not really good in stat.E mail me please,I would really appreciate your help. =(

sir i am un able to aply the regression on 1 dependent and 3 independent variabls plz help me

Are you trying to do the reression usin the standard Excel Regression data analysis tool or are you using the Real Statistics Linear Regression data analysis tool or something else?

Charles

Dear Charles,

I am investigating the dependency of a set of PCA components (each component with 27 values, that would be my dependent variables) with regard to a set of design evaluations (12 values, independent variables), I would like to ask you which is the best regression method for datasets which don’t have the same dimension?

Many thanks in advance

Alicia

Alicia,

I don’t completely understand your question. Regression is normally performed on one dataset and so I don’t know how to intend to use multiple datasets, unless you you are simply referring to the data for each variable as a different dataset. In this case, perhaps by different dimensions you mean that some data is missing which is causing the sample sizes to be different. Please clarify these points.

Charles

Dear Charles,

Your Real Statistics Add-In tools are great! Just wondering if there is a way to constrict the constant to zero when using the Linear Regression tool?

Regards,

Jason

Jason,

At present I don’t provide any way of constricting the constant to zero. I understand that there is no general agreement for how to do this. Excel’s Regression data analysis tool does provide this capability, but I am not sure whether the approach used is widely accepted.

Charles

Dear Sir,

thank you very much for your response.

Now the results of regression analysis are updating as per the changes in the data set.

but if I change the no. of inputs (like- earlier I have used 50 data points and now if I try the same with 48 data points), then this regression analysis is not showing any results.

So kindly do the needful to resolve the issue.

Thank you once again for your support.

Regards,

Jai

Dear all,

I am Jai, I am using this regression analysis for Weibull calculation, but I am looking for some kind of procedure, by which the regression analysis will update automatically. Because currently if I change the input data, then only graph is changing (other calculations are not changing, so if I change the data then I have to run regression analysis once again)

I would really appriciate, if someone can provide me the details about autoupdation of Regression Analysis.

Regards,

Jai

Jai,

The analysis performed by Excel’s Regression data analysis tool does not update automatically when the input data is changed.

The Regression data analysis tool provided by the Real Statistics Resource Pack will update automatically when the data is changed. It also handles 64 independent variables (instead of only 16 variables supported by the standard Excel Regression tool).

You can download the Real Statistics Resource Pack for free by going to the webpage http://www.real-statistics.com/free-download/real-statistics-resource-pack/. Please make sure that you install the software as described on that webpage.

See webpage http://www.real-statistics.com/multiple-regression/multiple-regression-analysis/ for more information about how to use the Real Statistics Regression data analysis tool.

Charles

Dear Sir,

I need to carry out multiple regression analysis on ordinal (satisfaction measures) independent variables. There are three parameters each categorised into factors and subfactors (variables). Each subfactor includes multiple questions to get satisfaction rating. Thus summing up the scores from questions to subfactors; subfactors to factors; factors to each respective paramater and finally combined score of all three parameters considered as score of the aspect of interest. My first question is whether the method is correct, and second is which specific regression analysis method should I use. Sample size is 300 households within 16 clusters equally divided into two categories.

Charu,

The approach seems reasonable from what I understand, but with such little detail I cannot say whether the method is correct or not, nor which type of regression analysis to use.

Charles

Dear Sir

I’m preparing budget for my company (production & sales budget) for different products. I had data of 3 years of production & sales for different product. Can you please help me for future projection which statistical formula will be helpful

Thanks

Depending on what the data looks like multiple regression might be a way to make the projection. When you say that you have 3 years, does that means that you have three data elements for production & sales for each product? Or do you have monthly or weekly data for the 3 years? I ask since with only 3 data elements, projections will be somewhat limited.

Charles

3 dependent vars (surveys) and one independent var (performance)

Hi! I am doing my research using likert scale (effect of attitudes) as dependent variable to english performance( ind. var). What stat tool would I employ? thanks

It really depends on what you are trying to demonstrate. What hypothesis are you tryin to test? It is not sufficient to say how many variables you have.

Charles

Hello Mr. Zaiontz,

I am conducting a study with 4 independent variables and one dependent variables. These 4 independent variables include 3 survey data and the other one is method of teaching. The dependent var is grade. Can I use multiple regression? How am I going to come up with a regression model? Please help me…. thanks

survey questions utilizing Likert scale?

Please be more specific. What would you like to find out about such questions?

Charles

Niel,

Yes you can use multiple regression, although the assumptions will likely not be fully satisfied. As to how to do it, please read the website.

Charles

thanks a lot!

According to this formula

Poverty = 0.457 + .00142 ∙ Infant Mortality + .0363 ∙ White + 1.279 ∙ Crime

for a certain case of Alabama where Infant Mortality =9, White= 71 and Crime= 448

Poverty should be 0.457 + 0.01278 + 2.577 + 573 = 573

And not around 15.7 as it is in the table. The result is clearly off and contribution from Crime is significantly higher then contribution from first 2 factors. This would make sense if first 2 factors do not correlate at all with Poverty but i think this is not the case.

I might be wrong but it appears to me that something is a bit off…

You are correct. I made two typing mistakes when I copied the coefficients in Figure 3. The correct regression line should be:

Poverty (predicted) = 0.437 + 1.279 ∙ Infant Mortality + .0363 ∙ White + 0.00142 ∙ Crime

Thanks for catching this error.

Charles

Compressive Strength Tensile Strength % polymer added

25.33 2.44 0

28 2.69 1

33.63 3.11 2

34.52 3.3 3

32.74 3.06 4

28.89 2.78 5

25.93 2.55 6

24 2.12 7

21.62 1.93 8

thirdt column values are independent and remaining 2 columns r dedenedent how 2 do multiple regression? plz

The usual multiple regression model would have two independent variables and one dependent variable. Since you have two dependent variables and one independent, you can create two separate regression models with one dependent and one independent variable. If you want a single regression model with two dependent variables then you probably want multivariate regression. R provides this capability.

Charles

i am doing the same Ctrl-Shift-Enter but the result is not coming. final result is coming like this #Value.

for my problem Y= b0+b1x1 +b2x2 +b3x3 +b4x4 +b5x5 +b6x6

Y matrix=(22*1)

x matrix=(x1 x2 x3 x4 x5 x6)

x1….x6 matrix=(22*6)

please tell how i can solve this problem using multiple regression. using TREND and LINSET command.

Sir i am unable to get the result by applying the command =TREND(B4:B53,C4:E53,G6:I8) at figure 2 of multiple regression analysis.

please inform me how i will over come the problem and i will get the same result what you got at table 2.

TREND is an array formula. It is essential that you press Ctrl-Shift-Enter (and not just Enter) after inserting the formula. If this is not the problem, let me know.

Charles

Very useful and practical. Actually the section for coefficient analisys is sustantive.

very knowledgable article.