Polynomial Regression

In Method of Least Squares for Multiple Regression we review how to fit data to a straight line. Sometimes data fits better with a polynomial curve.

On this webpage we explore how to construct polynomial regression models using standard Excel capabilities. Click here to learn more about Real Statistics capabilities that support polynomial regression.

Excel Capabilities

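We look at a quadratic model, although it is straightforward to extend this to any higher-order polynomial.

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$

This is equivalent to the usual multiple regression model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

studied in Multiple Regression Analysis where $x_2 = x_1^2$.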

Example 1: A group of senior citizens who have never used the Internet before are given training. A sample of 5 people is chosen at random and the number of hours of Internet use is recorded for 6 months, as shown in the table on the upper left side of Figure 1. Determine whether a quadratic regression line is a good fit for the data.

Figure 1 – Data for polynomial regression in Example 1

We next create the table on the right in Figure 1 from this data, adding a second independent variable (MonSq) which is equal to the square of the month. We now run the Regression data analysis tool using the table on the right (quadratic model) in columns I, J and K as the input. The results are displayed in Figure 2.
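For readers who want to check the same steps outside Excel, the following is a minimal Python sketch of the procedure just described: add a squared column and run an ordinary least squares fit on both columns. The data values are made up for illustration (they are not the values from Figure 1), and the variable names are assumptions; NumPy is assumed to be installed.

```python
import numpy as np

# Made-up illustrative data (NOT the values from Figure 1).
month = np.array([1, 2, 3, 4, 5, 6], dtype=float)
hours = np.array([5.0, 9.0, 20.0, 41.0, 70.0, 110.0])

# Second independent variable: the square of the month (the MonSq column).
mon_sq = month ** 2

# Design matrix with an intercept column, Month and MonSq.
X = np.column_stack([np.ones_like(month), month, mon_sq])

# Ordinary least squares: minimizes the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, hours, rcond=None)
b0, b1, b2 = coef

residuals = hours - X @ coef
print(f"Hours = {b0:.2f} + {b1:.2f} * Month + {b2:.2f} * Month^2")
```

The fit treats Month and MonSq as two separate independent variables, which is all the Regression data analysis tool is asked to do here.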

Figure 2 – Quadratic regression output

The Adjusted R Square value of 95% and the p-value (Significance F) close to 0 show that the model is a good fit for the data. The fact that the p-value for the MonSq variable is near 0 also confirms that the quadratic coefficient is significant. This is further confirmed by the scatter diagram in Figure 1, which shows that the quadratic trend line is a better fit for the data than the linear trend line. (To display the quadratic trend line, select Layout > Analysis|Trendline and then More Trendline Options… In the dialog box that appears, choose a Polynomial trendline of Order 2.)

Figure 2 also shows that the regression quadratic that best fits the data is

Hours of Use = 21.92 – 24.55 * Month + 8.06 * Month^2

Thus to predict the number of hours that a particular senior will use the Internet after 3 months, we plug 3 into the model (or use the TREND function) to get 20.8 hours of use.
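The arithmetic behind this prediction can be checked with a short Python sketch. The coefficients are the ones reported above from Figure 2; the function name `predict_hours` is only an illustrative choice.

```python
def predict_hours(month: float) -> float:
    """Evaluate the fitted quadratic reported in Figure 2."""
    return 21.92 - 24.55 * month + 8.06 * month ** 2


# 21.92 - 73.65 + 72.54 = 20.81, i.e. about 20.8 hours after 3 months.
print(round(predict_hours(3), 1))
```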

We can also run the Regression data analysis tool on the original data to compare the above results with the linear model studied in Regression Analysis. The linear model is generated by using only columns I and K from Figure 1. The output is shown in Figure 3.

Figure 3 – Linear regression output

That the quadratic model is a better fit for the data is apparent from the fact that the adjusted R-square value is higher (95.2% vs. 83.5%) and the standard error is lower (13.2 vs. 24.5).
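Adjusted R-square is the right statistic for this comparison because it penalizes the extra MonSq term. The Python sketch below shows the standard adjustment formula. The R-square values and sample size in the example are hypothetical, since the unadjusted values from Figures 2 and 3 are not reproduced in the text.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k independent variables and n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical inputs, chosen only to illustrate the comparison.
n = 20
linear = adjusted_r_squared(0.85, n, k=1)      # Month only
quadratic = adjusted_r_squared(0.96, n, k=2)   # Month and MonSq

print(f"linear: {linear:.3f}, quadratic: {quadratic:.3f}")
```

Because the penalty grows with k, a higher-order model only wins this comparison when the added term reduces the residual error enough to justify the lost degree of freedom.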

Real Statistics Capabilities

60 Responses to Polynomial Regression

1. UCHENWA LINUS OKAFOR says:

Linus

2. Ryan says:

Is the high collinearity (or correlation) between Month and Month^2 a concern?

• Charles says:

Ryan,
The correlation between Month and Month^2 is .9789, which is quite high, but it is also not necessarily at the level of collinearity. If you perform regression with Month and Month^2, the result won’t be very different from the result with just Month^2.
Charles

• Ryan says:

So in that case, you would probably remove Month from the model and fit a new model using only Month^2 as your explanatory variable? My question is about the worrisome correlation between two independent variables in the model.

3. Paul says:

Hi Charles, would you be able to give guidance on a method within excel of applying ± 95% confidence limits to a 3rd order polynomial. The limits would then be used to control a process. Thank you in advance for your reply,

• Charles says:

Paul,
For which variable are you looking for a 95% confidence interval?
Charles

• Paul says:

Hours of use per month, as in your example above.

• Charles says:

Paul, I’m not sure that I understand what you mean by applying a ± 95% confidence limit. Generally, you should have more confidence in the accuracy of a statistic when its confidence interval is narrow. In particular if the confidence interval contains zero then the coefficient for that variable is not significantly different from zero, which means that that variable (at least the cube of that variable in this case) is not making a significant contribution to the regression model.
Charles

• Paul says:

Hi Charles, Thanks for your response. I was hoping to plot a ±95% confidence interval about the polynomial trend. I understand the function when applying it to linear regression, but it is not so easy for a polynomial, I guess (=t*SYX*SQRT(1/n+(A18-XAVG)^2/SSX)).

• Charles says:

Paul,
Polynomial Regression is identical to multiple linear regression except that instead of independent variables like x1, x2, …, xn, you use the variables x, x^2, …, x^n. Thus, the formulas for confidence intervals for multiple linear regression also hold for polynomial regression. See the webpage Confidence Intervals for Multiple Regression.
Charles
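As a rough illustration of the point made in this reply, the sketch below computes a confidence interval for the mean response of a polynomial fit using the multiple regression formula: fitted value ± t · s · sqrt(x0ᵀ(XᵀX)⁻¹x0). It is a sketch under stated assumptions, not the Real Statistics implementation. The data are made up, the function name is hypothetical, NumPy is assumed to be installed, and the critical t value is passed in by the caller (in Excel it could come from T.INV.2T).

```python
import numpy as np


def mean_response_interval(x, y, x0, degree, t_crit):
    """Confidence interval for the mean response at x0 in a polynomial regression.

    t_crit is the two-tailed critical t value with n - degree - 1 degrees of
    freedom; it is supplied by the caller rather than computed here.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = len(y) - (degree + 1)
    s = np.sqrt(resid @ resid / df)  # standard error of the estimate
    row = np.vander([x0], degree + 1, increasing=True)[0]
    se_fit = s * np.sqrt(row @ np.linalg.inv(X.T @ X) @ row)
    y0 = row @ coef
    return y0 - t_crit * se_fit, y0, y0 + t_crit * se_fit


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 9.2, 15.8, 26.1, 37.7, 52.3, 68.0]
# 8 observations and a cubic leave 4 degrees of freedom; 2.776 is the
# two-tailed 95% critical t value for 4 degrees of freedom.
lower, fit, upper = mean_response_interval(x, y, x0=4.5, degree=3, t_crit=2.776)
print(f"{lower:.2f} <= {fit:.2f} <= {upper:.2f}")
```

With degree=1 the half-width reduces to the simple linear regression expression quoted above, t*SYX*SQRT(1/n+(x0-XAVG)^2/SSX). Evaluating the function over a grid of x0 values gives the two bands to plot around the trend line.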

4. Rishav Garg says:

Hi

I am doing multiple regression and getting compile error in hidden module.
Rishav Garg

• Charles says:

Rishav,
To try to figure out what is happening, please answer the following questions:
1. What do you see when you enter the formula =VER() in any spreadsheet cell?
2. What do you see when you press Ctrl-m?
3. Which release of Excel and Windows are you using?
Charles

5. Stamatis says:

Hi Charles

I have a set of data (let’s call them X and Y). I fit a quadratic regression to them and I get an R^2 = 99.29%. Now my problem is to estimate the error of the new values produced by the fitted polynomial. To be more exact, I am interested in the point where the fitted curve crosses the x axis, or in other words where a*x^2 + b*x + c = 0. What is going to be the variance of this point?

• Charles says:

Stamatis,

A polynomial regression is just a special case of multiple linear regression. Therefore you can use the approach shown on the following webpage
Confidence and Prediction Interval

to calculate the standard error (i.e. the square root of the variance) at any point. In particular, you can use the Real Statistics REGPRED array function to do this.

You may need to actually calculate the two roots of the quadratic polynomial a*x^2 + b*x + c = 0. This can be done using the quadratic formula. Alternatively, you can use the new Real Statistics ROOTS function. I will add a description of the ROOTS function to the website shortly.

Charles
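The quadratic formula mentioned in this reply can be sketched in a few lines of Python. This is an independent illustration, not the Real Statistics ROOTS function; the function name is hypothetical. Using `cmath` means a negative discriminant yields complex roots instead of an error.

```python
import cmath


def quadratic_roots(a: float, b: float, c: float):
    """Return both roots of a*x^2 + b*x + c = 0 (complex if the discriminant is negative)."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


# x^2 - 5x + 6 = (x - 2)(x - 3)
r1, r2 = quadratic_roots(1, -5, 6)
print(r1, r2)

# The quadratic reported in Figure 2: 8.06*x^2 - 24.55*x + 21.92
fig2_roots = quadratic_roots(8.06, -24.55, 21.92)
print(fig2_roots)
```

For the Figure 2 coefficients the discriminant is negative, so both roots are complex and that particular fitted curve never crosses the x axis.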

6. naison says:

Hi there, I just want to know if a transcendental model function can be done in Excel.

• Charles says:

What transcendental model functions are you referring to?
Charles

Thank you for these academic materials. Would you please give a guideline for the analysis of a third order polynomial regression model?

Regards

• Charles says:

It is exactly as in Example 1 of the referenced webpage, except that now you must add another column with the cubes of the x values of the input data.
Charles

Dear Charles
Thank you for your response. Would you please illustrate the meaning of a fitted third order polynomial regression curve/model i.e. how I can explain it.

Regards

• Charles says:

A fitted third order curve is one of the form y = ax^3 + bx^2 + cx + d. You are generally looking for the curve of this type that best fits the data. There are various versions of what best fit means. If you want to use linear regression then you are essentially viewing y = ax^3 + bx^2 + cx + d as a multiple linear regression model, where x^3, x^2 and x are the three independent variables. This is the approach used on the referenced webpage to find the best values of a, b, c and d. Here “best” means the smallest value of the sum of squared differences between the observed values of y_i and the values of y_i calculated when x_i is substituted for x in the equation y = ax^3 + bx^2 + cx + d.

You can also use a non-linear model to find the best values of a, b, c and d. This approach is illustrated on the following webpage (using Excel’s Solver):
http://www.real-statistics.com/regression/exponential-regression-models/exponential-regression-using-solver/

Charles

Hey,
I want to fit a polynomial model with four independent variables in the software R. How can I go about that?

Thanks

• Charles says:

Sorry Varada, but this website is about statistics in Excel, not R. In fact, I don’t use R.
Charles

9. BECK says:

Hello
What is the p value for the polynomial line?
If I have a data series and I determine that the polynomial line is a better fit than the linear one and R-square is higher, how do I determine the p-value for the polynomial line? Do I have to change all my values to the squares of the original values from the data series, run regression with Excel and present the p-value I get?

• Charles says:

Beck,
The referenced webpage describes how to calculate the p-value for the linear and quadratic coefficients of the polynomial regression model. There is one p-value for each coefficient (corresponding to the degree of the polynomial). There is one R-square value for the entire regression model.
Charles

10. Hayati says:

Hello Sir,

I want to find a correlation between brain activities and enzyme activities during an emotional state. At first, I performed linear correlation/regression, but almost none of the results showed a significant correlation (even though some had a large r), and I believe my variables are not correlated.

Then, if I use this polynomial regression to look for that correlation, is it relevant? Or what is polynomial regression actually aiming at, if it is not correlation?

Thank you,
Hayati

• Charles says:

Hayati,
It is not clear from your description what sort of polynomial regression you would use. It is possible that the (linear) correlation between x and y is, say, .2, while the linear correlation between x^2 and y is .9. Thus, the polynomial regression y = b*x^2+a might yield a better model (e.g. for predictions) than the linear regression model y = b*x+a.
Charles

• Hayati says:

Thank you for the reply, Mr Charles.

The polynomial regression that I meant is as in this chapter. I am not really familiar with statistics so I do not know if there are any types besides this.

So as in your reply, I can still use polynomial regression (or multiple regression, like explained in this chapter) to find correlation?

Then, I want to add one more question:
Should we go further with the order (quadratic, then cubic) to find better results? (Results might be regression values, R^2 or p-values. I am not sure, but in my case I aim for correlation.)

Hayati

• Charles says:

Hayati,

You can define the correlation coefficient for nonlinear relationships (i.e. based on a nonlinear regression) as the square root of 1 – SSE/SST, where
SSE = the sum of the squared residuals (i.e. where for each data value the residual is the difference between the observed y value and the y value predicted by the regression model)
SST = the sum of the squared differences between the observed y values and the mean of the observed y values

For linear regression this definition is equivalent to the usual definition of the linear correlation coefficient.

You can use polynomial regression to find the polynomial correlation coefficient. You can do this for quadratic, cubic, etc. regression/correlation.

Charles
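The definition given in this reply, R = sqrt(1 – SSE/SST), can be written as a small Python sketch. The function name and the sample values are hypothetical; the predictions can come from any fitted model, polynomial or otherwise.

```python
import math


def regression_correlation(observed, predicted):
    """Correlation coefficient defined as sqrt(1 - SSE/SST).

    Assumes SSE <= SST, which holds for a least squares fit that includes
    an intercept; otherwise the square root is undefined.
    """
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)
    return math.sqrt(1 - sse / sst)


# Made-up observations compared with the hypothetical model y = x^2.
xs = [1, 2, 3, 4, 5]
observed = [1.2, 3.8, 9.1, 15.9, 25.2]
predicted = [x ** 2 for x in xs]
r = regression_correlation(observed, predicted)
print(f"R = {r:.4f}")
```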

11. Maja says:

Hi Charles,

Thank you for making this easier to understand – with the learnings from my statistics classes already blurred this was an excellent brush up!

I am trying to show if there can be talk of herding behaviour in stock markets. For this I have obtained market return data (r_m,t) to calculate the cross-sectional absolute deviation value. Now, in order for me to identify herding behaviour I have to detect a negative correlation between CSAD and r_m,t, from the formula below (with D^Event being a dummy for certain days):

CSAD_(m,t)=γ_0+γ_1 D^Event |R_(m,t) |+γ_2 (1-D^Event )|R_(m,t) |+γ_3 D^Event R_(m,t)^2+γ_4 (1-D^Event )R_(m,t)^2+e_t

My question is now if you have any advice as to how I estimate these coefficients (γ_3 and γ_4 in particular) in Excel.

Many thanks,
Maja

• Charles says:

Sorry Maja, but I don’t understand the formula that you are using.
Charles

12. Bhushan says:

Hi Charles,
The example above shows using a quadratic equation with one independent variable. Is it possible to use a quadratic or cubic equation with 2 or 3 independent variables? In that case how will the equation look?

• Charles says:

Bhushan,
It can have many forms. E.g. y = b0 + b1*x1 + b2*x1^2 + b3*x1^3 + b4*x2^2 + b5*x1*x2.
Charles

• Chris says:

Charles,

I believe Bhushan is asking how to carry out this multivariate polynomial regression using your code. Currently the polynomial regression tab only allows for one dependent variable.

• Charles says:

Chris,
I understood from his comment that he has multiple independent variables (not dependent variables). E.g. y = b0 + b1*x1 + b2*x1^2 + b3*x1^3 + b4*x2^2 + b5*x1*x2. In these cases you can use multiple linear regression where you treat terms such as x1^2 as a new independent variable y1 (whose value is x1^2).
Charles

13. Ashebir says:

Dear Charles, can you explain to me why the third and fourth degree polynomial equations that I get from Excel by changing a linear trend line to a polynomial of third and fourth degree do not match the trend line and result in outputs that are outside of the graph? Thanks in advance.

• Charles says:

Ashebir,
If you send me an Excel file with your data and chart showing the trend line, I will try to answer your question.
Charles

14. Nathan Anderson says:

Hi charles,

I am using the polynomial regression formula to estimate the demand based on prices and demands given. How do I use the formula to find the standard deviation and mean, so I can find probabilities?

Nathan

• Charles says:

Nathan,
The standard deviation and mean of what? Are you referring to forecasts or coefficients or something else?
Charles

15. Lorenzo says:

Dear Charles, can I perform a quadratic regression with 3 dependent variables in Excel? If yes, how?
Thanks Lorenzo

• Charles says:

Lorenzo,
The website doesn’t currently support multivariate regression (i.e. more than one dependent variable). This will be added some time in the future.
Charles

Hallo,

• Charles says:

Charles

Support for multivariate regression as asked by Lorenzo ?

Hello,

I have to find the relation between one dependent and four independent variables. I tried it with regression in Excel, but I get a linear equation (linear regression). I would like to check whether a polynomial, logarithmic or exponential curve fits better. I need this equation to predict the next entries. Also, I don’t want to use the approach of guessing the form of the equation and then finding the coefficients. That will not work for me, as I have to repeat this procedure multiple times.

• Charles says:

The procedure for polynomial regression is described on the referenced webpage. Exponential regression is described on the following webpage
Exponential Regression
As for which approach fits better, one option is to simply graph the data points and fit them with both an exponential trendline and a polynomial trendline (from Excel’s scatter chart capability) and visually see which one fits better. You can also calculate the SSE for each and see which is lower.
Charles

Hallo Charles,

I did not understand what you mean. Here is an example of my data:

Weight  Height  Age  eyesight  output
1       5       5    22        10
2       5       8    25        14
5       8       8    25        18
10      10      5    28        22
12      12      8    40        28

Now I have to find an equation fitting this data so that I can predict the output for the next values.
Here I am not sure whether the equation will be linear, quadratic or a higher-order polynomial. If nothing fits perfectly, which one gives the least error?

Thanks

Hello Charles,

Thank you for your reply. But here you are considering that the equation will have only a polynomial nature. I want to have flexibility with exponential or logarithmic curves too. Is that possible with your software?

Thanks

• Charles says:

If you want to mix polynomial and exponential factors, you can do it with the Real Statistics software, but you will need to manually format your data properly.
Charles

Hello Charles,

Also, this link explains the case of only one independent variable. In my case it is both multivariable and polynomial?

• Charles says:

That is all that is covered in the website.
Charles

Hallo Charles,

I did not understand your comment. Is regression that is both multivariable and polynomial possible with Real Statistics?

• Charles says:

Yes, using multiple linear regression, but you will need to manually transform some of the data.
Charles

Hi Charles,

Could you please explain how to use multiple linear regression to perform multivariable polynomial regression? I tried but still have not succeeded.

• Charles says:

Suppose you have two independent variables x1 and x2 and want to consider polynomials of degree 3 or less, then the multiple linear regression model looks like
y = b0 + b1*x1 + b2*x1^2 + b3*x1^3 + b4*x2 + b5*x2^2 + b6*x2^3 + b7*x1*x2 + b8*x1*x2^2 + b9*x1^2*x2
Charles
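The nine terms in the model above (b1 through b9) are all the products of powers of x1 and x2 whose total degree is between 1 and 3. The Python sketch below enumerates them and expands one observation into the corresponding columns, ready to be used as independent variables in a multiple linear regression. The function names are hypothetical, and the order of the generated terms differs from the order in the equation above.

```python
from itertools import product


def polynomial_terms(n_vars: int, max_degree: int):
    """Exponent tuples for every monomial of total degree 1..max_degree."""
    return [
        exps
        for exps in product(range(max_degree + 1), repeat=n_vars)
        if 1 <= sum(exps) <= max_degree
    ]


def expand_row(row, exponents):
    """Value of each monomial for one observation (x1, x2, ...)."""
    values = []
    for exps in exponents:
        value = 1.0
        for x, e in zip(row, exps):
            value *= x ** e
        values.append(value)
    return values


terms = polynomial_terms(n_vars=2, max_degree=3)
print(len(terms))   # one column per coefficient b1..b9
print(terms)        # e.g. (1, 2) stands for x1 * x2^2

# Expand a single made-up observation with x1 = 2 and x2 = 3.
print(expand_row([2.0, 3.0], terms))
```

Each exponent tuple corresponds to one extra column in the worksheet, so the same idea extends to more variables or higher degrees, although the number of columns grows quickly.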

Hallo Charles,

So I have to guess the form of the equation and then manually enter x^2, x^3, …, right?
And then run simple linear regression from Excel’s data analysis option?
Is there any other professional way, or Excel-based software for curve fitting?

Thanks

• Charles says:

First you enter the data corresponding to the x values. Then you expand the data columns to get the x^2, x^3, etc. values. You can do this manually or by using Real Statistics’ Extracting Columns from a Data Range data analysis tool. Then you perform multiple linear regression — e.g. by using Real Statistics’ Multiple Linear Regression data analysis tool. The combination of these two data analysis tools streamlines the process. You can also use other tools such as SPSS, SAS, etc. to do this.
Charles