Confidence and prediction intervals for forecasted values

The 95% confidence interval for the forecasted values ŷ of x is

Confidence interval regression


This means that there is a 95% probability that the true linear regression line of the population will lie within the confidence interval of the regression line calculated from the sample data.

 Confidence prediction interval

Figure 1 – Confidence vs. prediction intervals

In the graph on the left of Figure 1, a linear regression line is calculated to fit the sample data points. The confidence interval consists of the space between the two curves (dotted lines). Thus there is a 95% probability that the true best-fit line for the population lies within the confidence interval (e.g. any of the lines in the figure on the right above).

There is also a concept called prediction interval. Here we look at any specific value of x, x0, and find an interval around the predicted value ŷ0 for x0 such that there is a 95% probability that the real value of y (in the population) corresponding to x0 is within this interval (see the graph on the right side of Figure 1).

The 95% prediction interval of the forecasted value ŷ0 for x0 is

Prediction interval regression

where the standard error of the prediction is

Standard error prediction

For any specific value x0 the prediction interval is more meaningful than the confidence interval.

Example 1: Find the 95% confidence and prediction intervals for the forecasted life expectancy for men who smoke 20 cigarettes in Example 1 of Method of Least Squares.

Confidence prediction intervals Excel

Figure 2 – Confidence and prediction intervals

Referring to Figure 2, we see that the forecasted value for 20 cigarettes is given by FORECAST(20,B4:B18,A4:A18) = 73.16. The confidence interval, calculated using the standard error 2.06 (found in cell E12), is (68.70, 77.61).

The prediction interval is calculated in a similar way using the prediction standard error of 8.24 (found in cell J12). Thus life expectancy of men who smoke 20 cigarettes is in the interval (55.36, 90.95) with 95% probability.

Example 2: Test whether the y-intercept is 0.

We use the same approach as that used in Example 1 to find the confidence interval of ŷ when x = 0 (this is the y-intercept). The result is given in column M of Figure 2. Here the standard error is


and so the confidence interval is


Since 0 is not in this interval, the null hypothesis that the y-intercept is zero is rejected.

31 Responses to Confidence and prediction intervals for forecasted values

  1. Phil says:

    Hi Charles,
    I wanted to point out a common misunderstanding about confidence intervals: CI’s say nothing about the probability that the true value of the population parameter lies within them – it either does or it doesn’t. A 95% CI just tells you that, if you were to repeat your experiment (sampling) an infinite number of times and run the statistics on each sample, the true parameter will lie within 95% of those confidence intervals. What you described as a CI in the first section of your post is actually Bayesian credible interval, which is a bit more complicated to calculate, but it does tell you the probability that your population parameter lies within the interval.

    • Charles says:


      Thanks for correcting my somewhat appealing, but, in the end, inaccurate statement. Shortly I will update the website with a more accurate characterization of the confidence interval.

      I appreciate your help in making the site more accurate.


  2. Mari says:

    Hello! Thank you very much this incredibly helpful guide! Do you have a tutorial on how to find the C.I and P.I using multiple regression?

    • Charles says:

      Not yet. I plan to include this shortly. I am working on adding information about Survival Analysis now. After that the next thing I will do will include CI/PI for multiple regression.

  3. Paul says:

    Hi Charles,
    I notice that you use n-2 for degrees of freedom, whilst other publications, for example UKAS M3003, use n-1. Although a small difference is seen in the calculation of the degrees of freedom, it can greatly affect the interval magnitude. Could you give some advise as to which calculation should be used for the df value? Thank you in advance for your reply.

    By the way, your website has been an extremely useful aid, Thanks

    • Charles says:

      I am not familiar with UKAS M3003, but I just checked a few other publications and they all show df = n-2 when calculating the confidence interval for regression.

  4. Ryan says:


    What’s the rationale for the added 1 under the square root of the standard error of the prediction? Theoretically, why 1?

    Thank you

    • Charles says:

      The standard error for a prediction interval adds the variability of the points around the predicted mean. mathematically this is where the 1 comes from.
      I plan to eventually show the proofs of the formulas for the confidence and prediction intervals. You will then be able to see from the proofs more precisely where the 1 comes from.

  5. Tristan says:

    Hi Charles,

    This post has helped me so much already, really very insightful and easy to follow!

    I am trying to do a prediction interval for some metal fatigue test data but I am trying to find 90% confidence in the 98% reliability data. In order to do this is just using 0.1 as the probability in TINV sufficient for the method to change it to the 90% probability or do I need to make more changes please?

    Thanks for your help!


    • Charles says:

      TINV(.1,df) (i.e. alpha = .1) is part of the formula for a 90% prediction interval. But I don’t understand what the 98% reliability data means.

  6. Elisa says:

    Hi Charles,
    I am using a exponential/hyperbolic function to fit my data. The model is (a*X+b)-SQRT[(a*X+b)^2-((4*a*X*b*c)/(2*c))].
    I wonder if I can calculate prediction intervals in the way you show, or if there is any parameter that is different for this type of models. If so, can you tell me how I should calculate it?
    Thank you very much

  7. Rizwan says:

    Hi Charles,

    Is Syx = Sres = STEYX(Y,X)? Is it same as Syx = SQRT((SUM(yi – Yi)^2)/(degrees of freedom)), where (xi,yi) are given data and Y is any nonlinear model (not a straight line, say a sigmoidal or logistic curve), which fits the data.

    • Charles says:

      Sorry for the long delay in responding to you. I just realized that I overlooked responding to you.
      It is true that Syx = Sres = STEYX(Y,X) = SQRT((SUM(yi – Yi)^2)/df) for linear models. I am not sure what these terms would even mean for non linear models such as logistic regression models.

  8. David says:

    Hi Charles,

    Thanks for your contributions on this site. I’m a bit confused by your base formula, though. Where you use the sum of squared deviations of x (SSx, calculated as DEVSQ(x) or DEVSQ(A4:A:18), I’ve learned to use the standard deviation of x times (n-1), or STDEV.S(A4:A:18)*(n-1) in Excel speak. This would yield a value of CI SE of 2.090695467 and a PI SE of 8.244184143. The difference is small in your dataset, but where deviations are larger the use of sums of squared deviations instead of the method I’ve heard will yield very different results. That said, I’m not at all certain which method is correct–can you point to some references for your formula, please?

    Thanks in advance for taking the time to clarify this issue for me.


    • Charles says:

      Hi David,

      Note that for any range R1, the square root of DEVSQ(R1) is STDEV.S(R1)*(n-1). In fact, in the formula for cell E12 in Figure 2 of the referenced page I do take the square root of SSx, and so it seems that we should get the same answer. Can you send me an example of your calculation so that I can see why the results are not the same?


      • Zhang says:

        Thanks for your contribution. I would like to know how to calculate CI and PI if there are two independent variables. Thanks.

  9. Anu says:

    Please help how u got value of SSx which I suppose to be:-271.6

  10. Kristian Pedersen says:

    Hi Charles,

    Great. Thank u.


  11. Kristian Pedersen says:


    Whats the formula in J12? Cannot get the same results…



    • Charles says:

      Hi Kristian,
      J12 contains the same value as cell E9. The formula in E9 is =FORECAST(E8,B4:B18,A4:A18).

      • Kristian Pedersen says:

        Hi Charles,

        I’m refering to J12, not J11 :-) J12 contains the formula for se (prediction standard error) and formula result i 8.236857, which I cannot get by using the exact same numbers you do.

        What formula is in cell J12??

        I think it is in the (x – x_)^2 that something is wrong!


  12. Joaquin says:

    Dr. Zaiontz,
    Very neat and concise example. I’m particularly interested in a one sided C.I. (lower bound)
    Would you agree to use
    \hat{y} – t_{crit} s.e.

    where t_{crit} should be calculated in Excel using =TINV(2*\alpha,df),
    where \alpha = 1-p?



    • Charles says:


      I believe that what you wrote is correct.


      • Emiel says:

        Hi all,

        It would be good to point out that the function TINV gives a two-sided confidence interval. The new function T.INV (with a dot) gives a one-sided confidence interval. In case of using the new function, you should take \alpha/2; furthermore, it uses the 1-\alpha/2 value, thus, T.INV(0.975,df).


        • Charles says:


          Actually, more simply you should use the T.INV.2T function for the two-sided critical value. It is equivalent to TINV.

          This is explained on the webpage

          I will shortly update this information to better explain the various usages of the functions.


          • Omamz says:

            Hi Charles,

            Your post has been of tremendous help to me.
            Although it took me some time I have applied most of the solutions and they have worked just fine.
            I need clarification on example 2. Why did you have to eliminate the # 1 before 1/15?
            Would really appreciate your prompt response.

          • Charles says:

            The test uses the confidence interval and not the prediction interval. This is why a 1 is not inserted before 1/15. Shortly I will update the webpage to explain better when the prediction interval is used and when the confidence interval is used.

Leave a Reply

Your email address will not be published. Required fields are marked *