Polynomial Regression

In Method of Least Squares for Multiple Regression, we review how to fit data to a straight line. Sometimes data fits better with a polynomial curve.

On this webpage, we explore how to construct polynomial regression models using standard Excel capabilities. Click here to learn more about Real Statistics capabilities that support polynomial regression.

Excel Capabilities

We look at a quadratic model, although it is straightforward to extend this to any higher-order polynomial.

Quadratic regression model

This is equivalent to the usual multiple regression model

Multiple regression model

studied in Multiple Regression Analysis where x_2 = x_1^2.

Example 1: A group of senior citizens who have never used the Internet before are given training. A sample of 5 people is chosen at random and the number of hours of Internet use is recorded for 6 months, as shown in the table on the upper left side of Figure 1. Determine whether a quadratic regression line is a good fit for the data.

Data polynomial regression

Figure 1 – Data for polynomial regression in Example 1

We next create the table on the right in Figure 1 from this data, adding a second independent variable (MonSq) which is equal to the square of the month. We now run the Regression data analysis tool using the table on the right (quadratic model) in columns I, J and K as the input. The results are displayed in Figure 2.

Quadratic regression Excel

Figure 2 – Quadratic regression output

The Adjusted R Square value of 95% and p-value (Significance F) close to 0 shows that the model is a good fit for the data. The fact that the p-value for the MonSq variable is near 0 also confirms that the quadratic coefficient is significant. This is further confirmed by looking at the scatter diagram in Figure 1, which shows that the quadratic trend line is a better bit for the data than the linear trend line. (To display the quadratic trend line select Layout > Analysis|Trendline and then More Trendline Options… On the display box which appears choose Polynomial trendline of Order 2.)

Figure 2 also shows that the regression quadratic that best fits the data is

Hours of Use = 21.92 – 24.55 * Month + 8.06 * Month2

Thus to predict the number of hours that a particular senior will use the Internet after 3 months, we plug 3 into the model (or use the TREND function) to get 20.8 hours of use.

We can also run the Regression data analysis tool on the original data to compare the above results with the linear model studied in Regression Analysis. The linear model is generated by using only columns I and K from Figure 1. The output is shown in Figure 3.

Linear regression output Excel

Figure 3 – Linear regression output

That the quadratic model is a better fit for the data is apparent from the fact that the adjusted R-square value is higher (95.2% vs. 83.5%) and the standard error is lower (13.2 vs. 24.5).

Real Statistics Capabilities

Click here to learn more about Real Statistics capabilities that support polynomial regression.

83 thoughts on “Polynomial Regression”

  1. Charles,
    Is it possible to fix (or lock) one of the data points that lies on the x-axis, so that the quadratic regression is forced to go through that data point? In other words, force the regression to lie at a specified value on the x-axis (where y=0).
    Terence

    Reply
    • Hello Terence,
      Suppose the specific value on the x-axis is 4 and we are doing linear regression. If you subtract 4 from the x values in your sample (without changing the y values), that seems to move your specific x value to 0. You can now perform linear regression through the origin, as described on the Real Statistics website.
      I have to think about this further to see how this would work for quadratic regression.
      BTW, do you know how to do quadratic regression through the origin?
      Charles

      Reply
  2. What is the formula in excel to calculate correlation (R2) of quadratic/2nd order polynomial curve? I know y regression. I am searching for direct formula.

    Reply
    • The approach described for a 2nd order polynomial regression is essentially linear regression using x^2 as if it were just another independent variable. In this case, the formular for R-square is the same as the formula for R-square for linear regression.
      Charles

      Reply
  3. Firstly I am far from an expert in data analysis or the understanding of quadratic equations, but I successfully use Linest for a single polynomial regression curve to generate the values to be used in an Excel VBA formula.
    Where I am completely stumped is whether this method can be used to predict the values from multiple X values?
    I have the following table, and need to predict the resulting Y values based on the values in both axis. This needs to be done as a formula in VBA. Is this possible please?
    Area
    0.025 0.04 0.09 0.16 0.25
    Vol 25 1.5
    50 2.75 2.1
    75 4.1 3.1 2
    100 5.5 4.1 2.6 2 1.7
    150 8.2 6.1 3.9 3 2.4
    200 11 8.1 5.1 3.8 3.1
    250 10.1 6.4 4.8 3.9
    300 7.9 5.75 4.6
    400 10.3 7.6 6.1
    500 9.6 7.6
    600 12 9.1
    700 10.9
    800 12.5

    Reply
    • Andy,
      Sorry, but I don’t understand your question nor your data.
      Quadratic regression uses the equation y = ax^2 + bx + c. Are you looking for a regression of form y = ax^2 + bx + cz^2 + dz + e ?
      Charles

      Reply
      • Apologies Charles, the data all compressed when I hit enter!
        Basically this is a small table with Area values along the columns, 0.025, through to 0.25, and volumes down the rows, 50 through to 800. All the other values are the results in the table, based on area and volume. Some of the cells were blank.
        I know using Linest for a single set of data I can generate the figures to use in y = ax^2 + bx + c, but where I am lost is how to do the same for results based on two variables.
        I hope this is clearer. As I said, this area is not a strength of mine.

        Reply
  4. Hi Charles,

    What if I have 2 independent variables and one dependent variable, say example Q =output, L = Labor, K = Capital. So what formula should I be using and how to work on the excel?

    Reply
    • Cher,
      Do you want to include L^2, L^3, etc. as well as K^2, K^3, etc. How about L*K, L^2*K, L*K^2, L^2*K^2, etc.? As you can see, this can get pretty complicated, pretty fast. Also, with the addition of each term, you lose one degree of freedom, which may not be very desirable.
      Charles

      Reply
  5. I want to measure purchase intent in response to a particular promotional scheme. For discussion purpose, I test 3 schemes – namely – Cash Discount (CD), Free Sample (FS) and Loyalty Programme (LP) and collected responses on a 5-point Likert scale (5 = Strongly agree (to buy) and 1 = Strongly disagree). How do I analyse this data? Can anyone please help me?

    Reply
    • Shyam,
      I cannot answer your question based on such a limited amount of information. Can you provide more information about the scenario that you are describing?
      Charles

      Reply
  6. Dear Charles,
    When the coefficient of x is significant, the coefficient of x ^ 2 is insignificant, or the coefficient of x is insignificant, the coefficient of x ^ 2 is significant, what kind of interpretation is it correct to make?

    Reply
    • Sami,
      It is possible for x and x^2 to be significant or neither to be significant. It is also possible that one is significant but not the other. E.g. if x and x^2 are the only independent variables and x^2 is significant and x is not, then x is not adding much to predicting y while x^2 is making a significant contribution.
      Charles

      Reply
  7. Hello Charles,

    Hope all is well at your end. A quick question, do you happen to know the formula to calculate the sum of squares for quadratic terms such as A^2, B^2? I am able to manually calulate first order main effects and interaction, but how would you calculate quadratic sum of squares?

    Reply
  8. Hi Sir,
    I want to do polynomial regression of order 3 and above with two independent variables. Can you help me with the procedure. I used to work on Excel but this software is new for me.
    Thank you

    Reply
  9. Hello sir,
    Please tell me how to run the polynomial regression in excel if reciprocal of the variables are taken:
    E.g; y=a+ b(1/x)+c(1/x^2) +d(1/x^3)

    Reply
    • Gowher,
      If you set z = 1/x then the equation takes the form y = a + bz + cz^2 + dz^3, which can be addressed by polynomial regression. Here your data comes from the reciprocals of the x data, plus the reciprocals of the x data squared and the x data cubed.
      Note that this approach uses linear regression. You can also use non-linear regression as explained for exponential regression.
      Charles

      Reply
      • Thank you sir,
        I have done as you suggest and the model is significant. Now please tell me how to obtain confidence limits of each predicted value that have been obtained from the above polynomial regression model in excel or in SPSS.

        Reply
  10. Dear Charles,

    I am doing a multiple linear regression for four independent variables and one dependent variable. The independent variables are pH (x1), temperature (x2), time (x3), concentration of catalyst (x4), and the dependent variable is the % degradation (y) of the pollutant in water. I used the following second order polynomial to fit the experimental data that I have

    y = b0 + b1*x1+b2*x2 + b3*x3 + b4*x4 + b5*x1^2+b6*x2^2 + b7*x3^2+b8*x4^2+ b9*x1x2+b10*x1x3+
    b11*x1x4+b12*x2x3+b13*x2x4+b14*x3x4

    My question is if this is a correct approach for fitting these experimental data. I used Excel for doing the fitting and my adjusted R square is 0.732 for this regression and the final p-values for all the remaining terms in the final equation are much less than 0.05. They are all statistically significant. Is it possible to give me advice on that?

    Best,
    Simon Sakhel

    Reply
  11. Dear Charles,

    I want to ask you about the experienced advantages and disadvantages of polynomial regression comparing with the linear regression, Especially for hydrologic modeling.

    thank you.

    Reply
    • Polynomial regression is just a form of linear regression where a power of one or more of the independent variables is added to the model.
      I have no experience with hydrologic modeling, and so I can’t say whether this approach is useful.
      Charles

      Reply
  12. Hello,

    I have to find out relation between one dependent and four independent variables. I tried it with regression in excel. But I get the linear eqaution (linear regression). I would like to check whether polynomial or logarithmic or exponetial curve fits more correctly? I need this eqaution to predict for next entries. Also I dont want use the approach of using predicting equation and finding coefficient. That will not work out with me, as I have to repeat this procedure for multiple times.

    Thank you in advance

    Reply
    • Adish,
      The procedure for polynomial regression is described on the referenced webpage. Exponential regression is described on the following webpage
      Exponential Regression
      As far as which approach fits better. One approach is to simply graph the data points and fit them with both an exponential trendline and a polynomial trendline (from Excel’s scatter chart capability) and visually see which one fits better. You can also calculate the SSE for each and see which is lower.
      Charles

      Reply
      • Hallo Charles,

        I did not understand what you mean. I give example of my data,

        Weight Height Age eyesight output
        1 5 5 22 10
        2 5 8 25 14
        5 8 8 25 18
        10 10 5 28 22
        12 12 8 40 28

        Now I have find equation fitting to this data so that , I can predict output for next values.
        Here I am not sure, that equation will be linear or quadratic, polynomial. If nothing perfect which one gives least error.

        Thanks
        Adish

        Reply
  13. Hi charles,

    I am using the polynomial regression formula to estimate the demand based on prices and demands given. How do I use the formula to find the standard deviation and mean, so I can find probabilities?

    Nathan

    Reply
  14. Dear Charles, can explain to me why the third and fourth degree polynomial equations that I get from excel by changing a linear trend line to a polynomial of third and fourth degree do not match with the trend line and they result in outputs that are outside of the graph? Thanks in advance.

    Reply
  15. Hi Charles,
    The example above shows using a quadratic equation with one independent variable. Is it possible to use a quadratic or cubic equation with 2 or 3 independent variables? In that case how will the equation look?

    Reply
      • Charles,

        I beleive Bhushan is asking how to carry out this multivariate polynomial regression using your code. Currently the polynomial regression tab only allows for one dependent variable.

        Reply
        • Chris,
          I understood from his comment that he has multiple independent variables (not dependent variables). E.g. y = b0 + b1*x1 + b2*x1^2 + b3*x1^3 + b4*x2^2 + b5*v1*x2. In these cases you can use multiple linear regression where you treat terms such as x1^2 as a new independent variable y1 (whose value is x1^2).
          Charles

          Reply
  16. Hi Charles,

    Thank you for making this easier to understand – with the learnings from my statistics classes already blurred this was an excellent brush up!

    I am trying to show if there can be talk of herding behaviour in stock markets. For this I have obtained market return data (r_m,t) to calculate the cross-sectional absolute deviation value. Now, in order for me to identify herding behaviour I have to detect a negative correlation between CSAD and r_m,t, from below formula (with D^event being a dummy for certain days):

    CSAD_(m,t)=y_0+γ_1 D^Event |R_(m,t) |+γ_2 (1-D^Event )|R_(m,t) |+γ_3 D^Event R_(m,t)^2+γ_4 (1-D^Event )R_(m,t)^2+e_t

    My question is now if you have any advise as to how I estimate these coefficients (y_3 and y_4 in particular) in excel.

    Many thanks,
    Maja

    Reply
  17. Hello Sir,
    I want to ask about the application of polynomial regression.

    I want to find a correlation between brain activities and enzyme activities during emotional state. So at first, I perform linear correlation/regression but almost all the results gave no significant in correlations (even though some are with large r) and I believe my variables are not correlated.

    Then, if I use this polynomial regression to aim for that correlation, is it relevant.? Or what is the polynomial regression are actually aiming if it is not correlation?

    Thank you,
    Hayati

    Reply
  18. Hello
    What is the p value for the polynomial line?
    If I have a data series and I determine that the polynomial line is better fit than linear one and Rsquare is higher, how do I determine the p value for the polynomial line? do I have to change all my values to the square of the original values from the data series, run regression with excel and present the p I get?

    Reply
    • Beck,
      The referenced webpage describes how to calculate the p-value for the linear and quadratic coefficients of the polynomial regression model. There is one p-value for each coefficient (corresponding to the degree of the polynomial). There is one R-square value for the entire regression model.
      Charles

      Reply
  19. Hey,
    I want to do a polynomial model with four independent variables in software R. How can I go on about that?

    Thanks

    Reply
  20. Thank you for this academic materials. Would you please give a guideline for analysis of third order polynomial regression model?

    Regards
    Shahadat

    Reply
    • It is exactly as in Example 1 of the referenced webpage, except that now you must add another column with the cubes of the x values of the input data.
      Charles

      Reply
      • Dear Charles
        Thank you for your response. Would you please illustrate the meaning of a fitted third order polynomial regression curve/model i.e. how I can explain it.

        Regards
        Shahadat

        Reply
        • A fitted third order curve is one of the form y = ax^3 + bx^2 + cx + d. You are generally looking for the curve of this type that best fits the data. There are various versions of what best fit means. If you want to use linear regression then you are essentially viewing y = ax^3 + bx^2 + cx + d as a multiple linear regression model, where x^3, x^2 and x are the three independent variables. This is the approach used on the referenced webpage to find the best values of a, b, c and d. Here “best” means the smallest value of the sum of squared differences between the observed values of y_i and the values of y_i calculated when x_i is substituted for x in the equation y = ax^3 + bx^2 + cx + d.

          You can also use a non-linear model to find the best values of a, b, c and d. This approach is illustrated on the following webpage (using Excel’s Solver):
          https://real-statistics.com/regression/exponential-regression-models/exponential-regression-using-solver/

          Charles

          Reply
  21. Hi Charles

    I have a set of data (lets call em X and Y). I fit to them a quadratic regression and i get an R^2 = 99.29%. Now my problem is to estimate the error my new values produced by the fitted polynomial. To be more exact I am intersted to the point that the fitted curves crosses the x axis or in other words a*x^2 + b*x + c = 0. What is going to be the variance of this point ?

    Reply
    • Rishav,
      To try to figure out what is happening, please answer the following questions:
      1. What do you see when you enter the formula =VER() in any spreadsheet cell?
      2. What do you see when you press Ctrl-m?
      3. Which release of Excel and Windows are you using?
      Charles

      Reply
  22. Hi Charles, would you be able to give guidance on a method within excel of applying ± 95% confidence limits to a 3rd order polynomial. The limits would then be used to control a process. Thank you in advance for your reply,

    Reply
        • Paul, I’m not sure that I understand what you mean by applying a ± 95% confidence limit. Generally, you should have more confidence in the accuracy of a statistic when its confidence interval is narrow. In particular if the confidence interval contains zero then the coefficient for that variable is not significantly different from zero, which means that that variable (at least the cube of that variable in this case) is not making a significant contribution to the regression model.
          Charles

          Reply
          • Hi Charles, Thanks for your response. I was hoping to plot a ±95% confidence interval about the polynomial trend. I understand the function when applying to linear regression, not so easy for polynomial I guess (=t*SYX*SQRT(1/n+(A18-XAVG)^2/SSX).

          • Paul,
            Polynomial Regression is identical to multiple linear regression except that instead of independent variables like x1, x2, …, xn, you use the variables x, x^2, …, x^n. Thus, the formulas for confidence intervals for multiple linear regression also hold for polynomial regression. See the webpage Confidence Intervals for Multiple Regression.
            Charles

    • Ryan,
      The correlation between Month and Month^2 is .9789, which is quite high, but it is also not necessarily at the level of collinearity. If you perform regression with Month and Month^2, the result won’t be very different from the result with just Month^2.
      Charles

      Reply
      • So in that case, you would probably remove Month from the model and fit a new model using only Month^2 as your explanatory variable? My question is about the worrisome correlation between two independent variables in the model.

        Reply

Leave a Reply to Bhushan Cancel reply