Polynomial Regression

In Method of Least Squares for Multiple Regression, we review how to fit data to a straight line. Sometimes, however, the data are better fit by a polynomial curve.

On this webpage we explore how to construct polynomial regression models using standard Excel capabilities. Click here to learn more about Real Statistics capabilities that support polynomial regression.

Excel Capabilities

We look at a quadratic model, although it is straightforward to extend this to any higher order polynomial.

y = β_0 + β_1 x + β_2 x^2 + ε

This is equivalent to the usual multiple regression model

y = β_0 + β_1 x_1 + β_2 x_2 + ε

studied in Multiple Regression Analysis where x_2 = x_1^2.
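Outside Excel, this equivalence is easy to see in a few lines of Python (a sketch with made-up data; numpy is assumed): fitting the quadratic model is just ordinary least squares on the two variables x and x^2.

```python
import numpy as np

# Hypothetical data, generated exactly by y = 5 + 2x + 3x^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 21.0, 38.0, 61.0, 90.0])

# Treat x and x^2 as the two independent variables of a multiple regression
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares: solve X b ~ y for b = (b0, b1, b2)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# numpy's polynomial fit returns the same coefficients (highest power first)
p = np.polyfit(x, y, deg=2)
print(b)         # (intercept, linear, quadratic)
print(p[::-1])   # reversed to match b
```

Both calls recover the same coefficients, which is the sense in which polynomial regression "is" multiple linear regression.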

Example 1: A group of senior citizens who have never used the Internet before are given training. A sample of 5 people is chosen at random and the number of hours of Internet use is recorded for 6 months, as shown in the table on the upper left side of Figure 1. Determine whether a quadratic regression line is a good fit for the data.


Figure 1 – Data for polynomial regression in Example 1

We next create the table on the right in Figure 1 from this data, adding a second independent variable (MonSq) which is equal to the square of the month. We now run the Regression data analysis tool using the table on the right (quadratic model) in columns I, J and K as the input. The results are displayed in Figure 2.


Figure 2 – Quadratic regression output

The Adjusted R Square value of 95% and the p-value (Significance F) close to 0 show that the model is a good fit for the data. The fact that the p-value for the MonSq variable is near 0 also confirms that the quadratic coefficient is significant. This is further confirmed by the scatter diagram in Figure 1, which shows that the quadratic trend line is a better fit for the data than the linear trend line. (To display the quadratic trend line, select Layout > Analysis|Trendline and then More Trendline Options… In the dialog box that appears, choose a Polynomial trendline of Order 2.)

Figure 2 also shows that the quadratic regression model that best fits the data is

Hours of Use = 21.92 – 24.55 * Month + 8.06 * Month^2

Thus, to predict the number of hours that a particular senior will use the Internet after 3 months, we plug Month = 3 into the model (or use Excel's TREND function) to get 20.8 hours of use.
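As a quick arithmetic check of that figure (plain Python, no Excel needed):

```python
# Coefficients read off the regression output in Figure 2
b0, b1, b2 = 21.92, -24.55, 8.06

month = 3
hours = b0 + b1 * month + b2 * month**2   # 21.92 - 73.65 + 72.54
print(round(hours, 1))                    # -> 20.8 hours of use
```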

We can also run the Regression data analysis tool on the original data to compare the above results with the linear model studied in Regression Analysis. The linear model is generated by using only columns I and K from Figure 1. The output is shown in Figure 3.


Figure 3 – Linear regression output

That the quadratic model is a better fit for the data is apparent from the fact that the adjusted R-square value is higher (95.2% vs. 83.5%) and the standard error is lower (13.2 vs. 24.5).

Real Statistics Capabilities

Click here to learn more about Real Statistics capabilities that support polynomial regression.

32 Responses to Polynomial Regression

  1. Bhushan says:

    Hi Charles,
    The example above shows using a quadratic equation with one independent variable. Is it possible to use a quadratic or cubic equation with 2 or 3 independent variables? In that case how will the equation look?

  2. Maja says:

    Hi Charles,

    Thank you for making this easier to understand – with the learnings from my statistics classes already blurred this was an excellent brush up!

    I am trying to show if there can be talk of herding behaviour in stock markets. For this I have obtained market return data (r_m,t) to calculate the cross-sectional absolute deviation value. Now, in order for me to identify herding behaviour I have to detect a negative correlation between CSAD and r_m,t, from below formula (with D^event being a dummy for certain days):

    CSAD_(m,t) = γ_0 + γ_1 D^Event |R_(m,t)| + γ_2 (1 − D^Event)|R_(m,t)| + γ_3 D^Event R_(m,t)^2 + γ_4 (1 − D^Event) R_(m,t)^2 + e_t

    My question is now whether you have any advice as to how I estimate these coefficients (γ_3 and γ_4 in particular) in Excel.

    Many thanks,

  3. Hayati says:

    Hello Sir,
    I want to ask about the application of polynomial regression.

    I want to find a correlation between brain activities and enzyme activities during emotional state. So at first, I perform linear correlation/regression but almost all the results gave no significant in correlations (even though some are with large r) and I believe my variables are not correlated.

    Then, if I use this polynomial regression to look for that correlation, is it relevant? Or what is polynomial regression actually aiming at, if not correlation?

    Thank you,

    • Charles says:

      It is not clear from your description what sort of polynomial regression you would use. It is possible that the (linear) correlation between x and y is, say, .2, while the linear correlation between x^2 and y is .9. Thus, the polynomial regression y = b*x^2 + a might yield a better model (e.g. for predictions) than the linear regression model y = b*x + a.

      • Hayati says:

        Thank you for the reply, Mr Charles.

        The polynomial regression that I meant is as in this chapter. I am not really familiar with statistics so I do not know if there are any types besides this.

        So as in your reply, I can still use polynomial regression (or multiple regression, like explained in this chapter) to find correlation?

        Then, I want to add one more question:
        Should we go further with the order (quadratic, then cubic) to find better results? (Results might be regression values, R^2 or p-values. I am not sure, but in my case I aim for correlation.)


        • Charles says:


          You can define the correlation coefficient for nonlinear relationships (i.e. based on a nonlinear regression) as the square root of 1 – SSE/SST, where
          SSE = the sum of the squared residuals (i.e. where for each data value the residual is the difference between the observed y value and the y value predicted by the regression model)
          SST = the sum of the squared differences between the observed y values and the mean of the observed y values

          For linear regression this definition is equivalent to the usual definition of the linear correlation coefficient.

          You can use polynomial regression to find the polynomial correlation coefficient. You can do this for quadratic, cubic, etc. regression/correlation.
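This nonlinear correlation coefficient is easy to compute outside Excel as well. A sketch in Python (numpy assumed, hypothetical data):

```python
import numpy as np

def poly_corr(x, y, degree):
    """sqrt(1 - SSE/SST) for a polynomial regression of the given degree."""
    y_hat = np.polyval(np.polyfit(x, y, degree), x)  # fitted values
    sse = np.sum((y - y_hat) ** 2)                   # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)                # squared deviations from the mean
    return np.sqrt(1 - sse / sst)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 8.9, 20.2, 35.8, 56.1])  # roughly y = 2x^2 plus noise

print(poly_corr(x, y, 1))  # equals |r|, the usual linear correlation
print(poly_corr(x, y, 2))  # quadratic correlation, closer to 1
```

For degree 1 this reduces to the absolute value of the ordinary Pearson correlation, which is the equivalence mentioned above.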


  4. BECK says:

    What is the p-value for the polynomial line?
    If I have a data series and I determine that the polynomial line is a better fit than the linear one and R-square is higher, how do I determine the p-value for the polynomial line? Do I have to change all my values to the squares of the original values from the data series, run the regression with Excel, and present the p-value I get?

    • Charles says:

      The referenced webpage describes how to calculate the p-value for the linear and quadratic coefficients of the polynomial regression model. There is one p-value for each coefficient (corresponding to the degree of the polynomial). There is one R-square value for the entire regression model.

  5. Varada Shrotri says:

    I want to build a polynomial model with four independent variables in the software R. How can I go about that?


  6. Md Shahadat Hossain says:

    Thank you for this academic material. Would you please give a guideline for the analysis of a third order polynomial regression model?


    • Charles says:

      It is exactly as in Example 1 of the referenced webpage, except that now you must add another column with the cubes of the x values of the input data.
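Outside Excel, the same idea (cubing the x values and treating x^3, x^2 and x as separate regressors) can be sketched in Python with hypothetical data:

```python
import numpy as np

# Cubic model y = a x^3 + b x^2 + c x + d as a multiple linear regression
# with x^3, x^2 and x as the three independent variables
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 10.0, 29.0, 66.0, 127.0, 218.0])  # exactly y = x^3 + 2

X = np.column_stack([x**3, x**2, x, np.ones_like(x)])
(a, b, c, d), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c, d)   # roughly 1, 0, 0, 2
```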

      • Md Shahadat Hossain says:

        Dear Charles,
        Thank you for your response. Would you please illustrate the meaning of a fitted third order polynomial regression curve/model, i.e., how can I explain it?


        • Charles says:

          A fitted third order curve is one of the form y = ax^3 + bx^2 + cx + d. You are generally looking for the curve of this type that best fits the data. There are various versions of what best fit means. If you want to use linear regression then you are essentially viewing y = ax^3 + bx^2 + cx + d as a multiple linear regression model, where x^3, x^2 and x are the three independent variables. This is the approach used on the referenced webpage to find the best values of a, b, c and d. Here “best” means the smallest value of the sum of squared differences between the observed values of y_i and the values of y_i calculated when x_i is substituted for x in the equation y = ax^3 + bx^2 + cx + d.

          You can also use a non-linear model to find the best values of a, b, c and d. This approach is illustrated on the following webpage (using Excel’s Solver):


  7. naison says:

    Hi there, I just want to know if a transcendental model function can be done in Excel.

  8. Stamatis says:

    Hi Charles

    I have a set of data (let's call them X and Y). I fit a quadratic regression to them and I get R^2 = 99.29%. Now my problem is to estimate the error in new values produced by the fitted polynomial. To be more exact, I am interested in the point where the fitted curve crosses the x axis, or in other words where a*x^2 + b*x + c = 0. What is going to be the variance of this point?

    • Charles says:


      A polynomial regression is just a special case of multiple linear regression. Therefore you can use the approach shown on the following webpage
      Confidence and Prediction Interval

      to calculate the standard error (i.e. the square root of the variance) at any point. In particular, you can use the Real Statistics REGPRED array function to do this.

      You may need to actually calculate the two roots of the quadratic polynomial a*x^2 + b*x + c = 0. This can be done using the quadratic formula. Alternatively, you can use the new Real Statistics ROOTS function. I will add a description of the ROOTS function to the website shortly.
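Outside Excel and Real Statistics, the roots can be computed directly with the quadratic formula; a sketch in Python with hypothetical coefficients (numpy's roots function gives the same answer):

```python
import numpy as np

# Quadratic formula for a*x^2 + b*x + c = 0 (hypothetical coefficients)
a, b, c = 2.0, -7.0, 3.0
disc = b**2 - 4*a*c              # discriminant; negative would mean complex roots
r1 = (-b + disc**0.5) / (2*a)
r2 = (-b - disc**0.5) / (2*a)

print(r1, r2)                    # -> 3.0 0.5
print(np.roots([a, b, c]))       # numpy computes the same pair
```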


  9. Rishav Garg says:


    I am doing multiple regression and getting compile error in hidden module.
    Please help.
    Rishav Garg

    • Charles says:

      To try to figure out what is happening, please answer the following questions:
      1. What do you see when you enter the formula =VER() in any spreadsheet cell?
      2. What do you see when you press Ctrl-m?
      3. Which release of Excel and Windows are you using?

  10. Paul says:

    Hi Charles, would you be able to give guidance on a method within excel of applying ± 95% confidence limits to a 3rd order polynomial. The limits would then be used to control a process. Thank you in advance for your reply,

    • Charles says:

      For which variable are you looking for a 95% confidence interval?

      • Paul says:

        Hours of use per month, as in your example above.

        • Charles says:

          Paul, I’m not sure that I understand what you mean by applying a ± 95% confidence limit. Generally, you should have more confidence in the accuracy of a statistic when its confidence interval is narrow. In particular if the confidence interval contains zero then the coefficient for that variable is not significantly different from zero, which means that that variable (at least the cube of that variable in this case) is not making a significant contribution to the regression model.

          • Paul says:

            Hi Charles, thanks for your response. I was hoping to plot a ±95% confidence interval about the polynomial trend. I understand the function when applied to linear regression, not so easy for polynomial I guess (=t*SYX*SQRT(1/n+(A18-XAVG)^2/SSX)).

          • Charles says:

            Polynomial Regression is identical to multiple linear regression except that instead of independent variables like x1, x2, …, xn, you use the variables x, x^2, …, x^n. Thus, the formulas for confidence intervals for multiple linear regression also hold for polynomial regression. See the webpage Confidence Intervals for Multiple Regression.
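One way to sketch those confidence limits outside Excel, using the multiple-regression formulation above (hypothetical data; the t critical value 3.182 for 3 degrees of freedom is hard-coded to avoid a scipy dependency):

```python
import numpy as np

# Quadratic regression as multiple linear regression in (1, x, x^2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([6.0, 9.0, 18.0, 30.0, 50.0, 72.0])

X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

n, k = len(y), 2                       # observations, predictors
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)       # residual variance

XtX_inv = np.linalg.inv(X.T @ X)
x0 = np.array([1.0, 3.5, 3.5**2])      # design vector at x = 3.5
y0 = x0 @ b                            # fitted mean at x = 3.5
se = np.sqrt(s2 * x0 @ XtX_inv @ x0)   # standard error of the fitted mean

t_crit = 3.182                         # t(0.975, 3 df), hard-coded
print(y0 - t_crit * se, y0 + t_crit * se)   # 95% confidence limits at x = 3.5
```

Repeating this for a grid of x values traces out the confidence band around the polynomial trend.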

  11. Ryan says:

    Is the high collinearity (or correlation) between Month and Month^2 a concern?

    • Charles says:

      The correlation between Month and Month^2 is .9789, which is quite high, but not necessarily high enough to cause collinearity problems. If you perform the regression with both Month and Month^2, the result won't be very different from the result with just Month^2.

      • Ryan says:

        So in that case, you would probably remove Month from the model and fit a new model using only Month^2 as your explanatory variable? My question is about the worrisome correlation between two independent variables in the model.


    Thank you for your work.

Leave a Reply

Your email address will not be published. Required fields are marked *