Testing the significance of the slope of the regression line

In this section we show how to test whether the slope of the regression line is significantly different from zero.

Observation: By Theorem 1 of One Sample Hypothesis Testing for Correlation, under certain conditions (in particular, when the population correlation ρ = 0), the test statistic t has the property

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim T(n-2)$$

But by Property 1 of Method of Least Squares

$$r = b\,\frac{s_x}{s_y}$$

and by Definition 3 of Regression Analysis and Property 4 of Regression Analysis

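$$s_{y\cdot x}^2 = MS_{Res} = \frac{SS_{Res}}{n-2} = \frac{(1-r^2)\,SS_T}{n-2} = \frac{(1-r^2)(n-1)\,s_y^2}{n-2}$$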

Putting these elements together we get that

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{b}{s_b} \sim T(n-2)$$

where

$$s_b = \frac{s_{y\cdot x}}{s_x\sqrt{n-1}}$$
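Written out, the algebra that combines these facts runs as follows. This is a reconstruction in the notation used here. It uses r = b·s_x/s_y from Property 1 of Method of Least Squares and the expression for s_y·x² from Definition 3 and Property 4 of Regression Analysis:

$$
\begin{aligned}
s_{y\cdot x}^2 = \frac{(1-r^2)(n-1)\,s_y^2}{n-2}
\quad &\Longrightarrow\quad \frac{\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{s_y\sqrt{n-1}}{s_{y\cdot x}} \\
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
&= b\,\frac{s_x}{s_y}\cdot\frac{s_y\sqrt{n-1}}{s_{y\cdot x}}
= \frac{b}{s_{y\cdot x}/(s_x\sqrt{n-1})} = \frac{b}{s_b}
\end{aligned}
$$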

Since by the population version of Property 4 of Regression Analysis

$$\beta = \rho\,\frac{\sigma_y}{\sigma_x}$$

it follows that ρ = 0 if and only if β = 0. Thus Theorem 1 of One Sample Hypothesis Testing for Correlation can be transformed into the following test of the hypothesis H0: β = 0 (i.e. the slope of the population regression line is zero):

If β = 0, then

$$t = \frac{b}{s_b} \sim T(n-2), \qquad s_b = \frac{s_{y\cdot x}}{s_x\sqrt{n-1}}$$
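The following is a minimal Python sketch of this test. It is an illustration, not the worksheet used in Example 1. It computes b, s_b and t = b/s_b from paired data, and checks that the result matches the correlation form r√(n − 2)/√(1 − r²). The function names and the five-point data set are hypothetical. For the p-value, use Excel's TDIST or a statistics library.

```python
import math
import statistics


def slope_t_test(x, y):
    """t test of H0: beta = 0 for the least-squares line y-hat = b*x + a.

    Returns (b, a, s_yx, s_b, t, df) with s_b = s_yx / (s_x * sqrt(n - 1)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    a = y_bar - b * x_bar
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s_yx = math.sqrt(ss_res / df)                           # standard error of the estimate
    s_b = s_yx / (statistics.stdev(x) * math.sqrt(n - 1))   # standard error of the slope
    return b, a, s_yx, s_b, b / s_b, df


def t_from_correlation(x, y):
    """The same statistic in its correlation form r*sqrt(n-2)/sqrt(1-r^2)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    r = sp / math.sqrt(ss_x * ss_y)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a, s_yx, s_b, t, df = slope_t_test(x, y)
print(f"b = {b:.3f}, s_b = {s_b:.4f}, t = {t:.4f}, df = {df}")
print(f"correlation form: {t_from_correlation(x, y):.4f}")
```

Both routes give the same t, which is the point of the derivation above. Computing b/s_b directly is usually more convenient because b and s_b are also needed for the confidence interval.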

Example 1: Test whether the slope of the regression line in Example 1 of Method of Least Squares is zero.

Figure 1 shows the worksheet for testing the null hypothesis that the slope of the regression line is 0.

Figure 1 – t-test of the slope of the regression line for data in Example 1

Since p-value = .0028 < .05 = α (or equivalently |t| = 3.67 > 2.16 = tcrit), we reject the null hypothesis and conclude that the population slope is significantly different from zero.

Note that the 95% confidence interval for the population slope is

b ± tcrit · sb = -.628 ± 2.16(.171) = (-.998, -.259)
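The arithmetic for Example 1 can be checked with a few lines of Python using the rounded values quoted in the text: b = −.628, sb = .171 and tcrit = 2.16. The sample size n = 15, and hence df = 13, is inferred from the range A4:A18. Because the inputs are rounded, the lower limit comes out as −.997 rather than −.998. Note that the interval does not contain 0, which agrees with rejecting H0.

```python
# Rounded values reported for Example 1 (n = 15, so df = n - 2 = 13)
b, s_b = -0.628, 0.171   # slope and its standard error
t_crit = 2.16            # two-tailed critical value for alpha = .05 and df = 13

t = b / s_b                                   # t statistic, about -3.67
reject_h0 = abs(t) > t_crit                   # |t| > tcrit, so H0: beta = 0 is rejected
lower, upper = b - t_crit * s_b, b + t_crit * s_b

print(f"t = {t:.2f}, reject H0: {reject_h0}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```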

Observation: We can also test whether the slopes of the regression lines arising from two independent populations are significantly different. This would be useful, for example, when testing whether the slope of the regression line for the population of men in Example 1 is significantly different from that of women.

See Hypothesis Testing for Comparing the Slopes of Two Independent Samples for additional information and an example.

Excel Functions: In the following, R1 = the array of observed y values and R2 = the array of observed x values.

STEYX(R1, R2) = standard error of the estimate sy∙x = SQRT(MSRes)

LINEST(R1, R2, TRUE, TRUE) – an array function which generates a number of useful statistics.

To use LINEST, begin by highlighting a blank 5 × 2 region, enter =LINEST( and then highlight the R1 array, enter a comma, highlight the R2 array and finally enter ,TRUE,TRUE) and press Ctrl-Shift-Enter.

The LINEST function returns a number of values, but unfortunately no labels for these values. To make all of this clearer, Figure 2 displays the output from LINEST(B4:B18, A4:A18, TRUE, TRUE) using the data in Figure 1. I have added the appropriate labels manually for clarity.

Figure 2 – LINEST(B4:B18,A4:A18,TRUE,TRUE) for data in Figure 1

R Square is the coefficient of determination r² (see Definition 2 of Basic Concepts of Correlation). All the other values are as described above, except for the standard error of the y-intercept, which will be explained later.
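The following minimal Python sketch builds the same 5 × 2 layout for a single x variable, to show where each LINEST cell comes from. It is an illustration, not Excel's implementation. The rows are b and a; their standard errors; r² and sy∙x; F and dfRes; and SSReg and SSRes. The function name linest and the data are hypothetical. The intercept's standard error uses the standard formula sy∙x·√(1/n + x̄²/SSx).

```python
import math
import statistics


def linest(y, x):
    """Mimic the 5 x 2 block of Excel's LINEST(y, x, TRUE, TRUE) for one x variable.

    Rows: [b, a], [s_b, s_a], [r^2, s_yx], [F, df_res], [SS_reg, SS_res]
    """
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    a = y_bar - b * x_bar
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_t - ss_res
    df_res = n - 2
    s_yx = math.sqrt(ss_res / df_res)                       # same as STEYX
    s_b = s_yx / math.sqrt(ss_x)                            # = s_yx / (s_x * sqrt(n - 1))
    s_a = s_yx * math.sqrt(1 / n + x_bar ** 2 / ss_x)       # standard error of the intercept
    f = ss_reg / (ss_res / df_res)                          # MS_reg / MS_res, df_reg = 1
    return [[b, a], [s_b, s_a], [ss_reg / ss_t, s_yx], [f, df_res], [ss_reg, ss_res]]


# Hypothetical data (y first, as in LINEST(known_y's, known_x's)), for illustration only
for row in linest([2, 4, 5, 4, 5], [1, 2, 3, 4, 5]):
    print(row)
```

With one x variable, the F in row 4 equals the square of t = b/s_b. This is why the F test and the slope t test reach the same conclusion.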

Excel also provides a Regression data analysis tool. The creation of a regression line and hypothesis testing of the type described in this section can be carried out using this tool. Figure 3 displays the principal output of this tool for the data in Example 1.

Figure 3 – Output from Regression data analysis tool

The following is a description of the fields in this report:

Summary Output:

  • Multiple R – correlation coefficient (see Definition 1 of Multiple Correlation; since there is only one independent variable, this equals the absolute value of the correlation coefficient r of Definition 2 of Basic Concepts of Correlation)
  • R Square – coefficient of determination (see Definition 1 of Multiple Correlation), i.e. the square of Multiple R
  • Adjusted R Square – see Definition 2 of Multiple Correlation
  • Standard Error = SQRT(MSRes); this can also be calculated using Excel’s STEYX function
  • Observations – sample size
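The Summary Output fields can all be derived from SSReg, SSRes and n. The following is a small Python sketch of that relationship. The function name and the sums of squares used in the example are hypothetical. The adjusted R² follows the usual formula 1 − (1 − R²)(n − 1)/(n − k − 1), with k = 1 independent variable here.

```python
import math


def summary_output(ss_reg, ss_res, n, k=1):
    """Summary Output fields computed from the regression and residual sums of squares.

    k is the number of independent variables (k = 1 for the simple regression here).
    """
    df_res = n - k - 1
    r_square = ss_reg / (ss_reg + ss_res)               # SSReg / SST
    return {
        "Multiple R": math.sqrt(r_square),               # equals |r| when k = 1
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / df_res,
        "Standard Error": math.sqrt(ss_res / df_res),    # sqrt(MSRes), same as STEYX
        "Observations": n,
    }


# Hypothetical sums of squares (SSReg = 3.6, SSRes = 2.4, n = 5), for illustration only
for label, value in summary_output(3.6, 2.4, 5).items():
    print(f"{label}: {value:.4f}")
```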

ANOVA:

  • The first row lists the values for dfReg, SSReg, MSReg, F = MSReg/MSRes and the p-value of F (labeled Significance F)
  • The second row lists the values for dfRes, SSRes and MSRes
  • The third row lists the values for dfT and SST
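The three ANOVA rows follow directly from SSReg, SSRes and n, as this Python sketch shows. The function name and input values are hypothetical. The p-value column is omitted because it needs the F distribution. With a single independent variable, F = t², so that p-value equals the two-tailed p-value of the slope's t test.

```python
def anova_table(ss_reg, ss_res, n, k=1):
    """ANOVA rows: (df, SS, MS, F) for Regression, (df, SS, MS) for Residual,
    (df, SS) for Total. k is the number of independent variables."""
    df_reg, df_res = k, n - k - 1
    ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
    return {
        "Regression": (df_reg, ss_reg, ms_reg, ms_reg / ms_res),
        "Residual": (df_res, ss_res, ms_res),
        "Total": (n - 1, ss_reg + ss_res),   # dfT = dfReg + dfRes, SST = SSReg + SSRes
    }


table = anova_table(3.6, 2.4, 5)   # hypothetical values, for illustration only
for row, values in table.items():
    print(row, values)
```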

Coefficients (third table):

The third table gives key statistics for testing the y-intercept (Intercept in the table) and slope (Cig in the table). We will explain the intercept statistics in Confidence and Prediction Intervals for Forecasted Values. The slope statistics are as follows:

  • Coefficients – value for the slope of the regression line
  • Standard Error – standard error of the slope, sb = sy∙x / (sx * SQRT(n-1))
  • t-Stat = b/sb
  • P-value = TDIST(ABS(t), dfRes, 2); i.e. the two-tailed p-value
  • 95% confidence interval = b ± tcrit · sb, where tcrit = TINV(.05, dfRes)
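Outside Excel, the roles of TDIST and TINV can be filled without a statistics library when df is a whole number, as dfRes = n − 2 always is. In that case the t distribution has a closed-form finite series in θ = arctan(|t|/√df). The following Python sketch uses that series for the two-tailed p-value and finds tcrit by bisection. The function names are our own, and this is not how Excel computes these values.

```python
import math


def t_two_tailed_p(t, df):
    """Two-tailed p-value P(|T| >= |t|) for Student's t with a positive integer df.

    Meant to play the role of Excel's TDIST(ABS(t), df, 2). Uses the closed-form
    finite series that exists for whole-number df, with theta = atan(|t| / sqrt(df)).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    cos2 = cos_t * cos_t
    if df % 2 == 1:
        # odd df: P(|T| < t) = (2/pi)(theta + sin*(cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ...))
        series = 0.0
        if df > 1:
            term = series = cos_t
            for j in range(1, (df - 3) // 2 + 1):
                term *= cos2 * (2 * j) / (2 * j + 1)
                series += term
        inside = 2 / math.pi * (theta + sin_t * series)
    else:
        # even df: P(|T| < t) = sin*(1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
        term = series = 1.0
        for j in range(1, (df - 2) // 2 + 1):
            term *= cos2 * (2 * j - 1) / (2 * j)
            series += term
        inside = sin_t * series
    return 1 - inside


def t_critical(alpha, df):
    """Two-tailed critical value, in the role of Excel's TINV(alpha, df), by bisection."""
    lo, hi = 0.0, 1e4
    for _ in range(200):
        mid = (lo + hi) / 2
        if t_two_tailed_p(mid, df) > alpha:
            lo = mid      # p still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2


print(t_two_tailed_p(3.67, 13), t_critical(0.05, 13))
```

For Example 1 (df = 13) these give roughly .0028 and 2.16, in line with the values reported in Figure 1.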

In addition to the principal results described in Figure 3, one can optionally generate a table of residuals and table of percentiles as described in Figure 4.

Figure 4 – Additional output from Regression data analysis tool for the data in Example 1

Residual Output:

  • Predicted Life Exp = Cig * b + a; i.e. ŷ
  • Residuals = Observed Life Exp – Predicted Life Exp; i.e. y – ŷ
  • Standard Residuals = Residual / Std Dev of the Residuals (since the mean of the residuals is 0); i.e. e/se

For example, for Observation 1 we have

  • Predicted Life Exp = -.628 * 5 + 85.72 = 82.58
  • Residuals = 80 – 82.58 = -2.58
  • Standard Residuals = -2.58 / 7.69 = -.336

Note that the mean of the residuals is 0, apart from rounding. This always holds for a least-squares line with an intercept, and it is consistent with a key assumption of the regression model. The standard deviation of the residuals is 7.69.
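The residual columns can be reproduced with a short Python sketch. It assumes that "Std Dev of the Residuals" means the sample standard deviation with an n − 1 denominator, which is what statistics.stdev computes. The data and function name are hypothetical. The sketch also prints the mean of the residuals, which comes out as 0 up to floating-point error.

```python
import statistics


def residual_output(x, y, b, a):
    """(predicted, residual, standard residual) for each observation, for y-hat = b*x + a.

    Standard residual = residual / sample standard deviation of the residuals
    (n - 1 denominator, which is what statistics.stdev uses).
    """
    predicted = [b * xi + a for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, predicted)]
    sd = statistics.stdev(residuals)
    return [(yh, e, e / sd) for yh, e in zip(predicted, residuals)]


# Hypothetical data with its least-squares line y-hat = 0.6x + 2.2 (illustration only)
rows = residual_output([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2)
for i, (yh, e, z) in enumerate(rows, start=1):
    print(f"Observation {i}: predicted {yh:.2f}, residual {e:.2f}, standard residual {z:.3f}")
print("mean of residuals:", statistics.fmean(e for _, e, _ in rows))
```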

There is also the option to produce certain charts, which we will review when discussing Example 2 of Multiple Regression Analysis.

42 Responses to Testing the significance of the slope of the regression line

  1. Reza says:

    Hello,
    Is there a direct equation or formula for sample size estimation in a linear regression model that includes alpha and beta for the slope?

  2. TS says:

    I was wondering how to test whether the slope is equal to one?
    Thanks

    • Charles says:

      TS,
      Here is one approach. Consider the linear equation y = bx + a. First note that the linear equation y = (b-1)x + a has slope of zero if and only if b = 1. Next note that y = (b-1)x + a is equivalent to y’ = bx + a where y’ is y+x. Thus you need to perform the usual test for slope = zero using the original x data, but with the y data replaced by the original y data plus the corresponding original x values. Thus if your original x data were 4, 6, 9, 14 and your original y data were 14, 25, 36, 55, you perform the test for zero slope using x data 4, 6, 9, 14 and y data 18, 31, 45, 69.
      Charles

      • Sam says:

        Why are the x values added to the original y instead of subtracted? Isn’t the objective to get, if the original were a line with slope 1, a line with slope zero? Then, if the comparison works for slope zero, you know the original had slope 1?
        Or am I totally confused?
        Thanks.

  3. David Merrick says:

    Hi Charles,
    Example 1 tests whether the slope of the regression line is zero. The Excel formula shows a one-tailed test with alpha = 0.05, but surely this is a two-tailed test and the t-critical value formula should be ‘=TINV(0.025,E11)’. Could you please confirm either way?

    • Charles says:

      David,
      TINV(p,df) is a two tailed inverse; it is equal to T.INV(1-p/2,df), which is a one-tailed inverse.
      Charles

  4. adam says:

    Hi, how would I carry out a hypothesis test for the slope and intercept coefficients once I have the regression results in Excel?

    • Charles says:

      Adam,
      Just look at the p-value for the slope coefficient (or intercept) in the regression analysis. If the p-value is less than alpha then the slope (or intercept) is significantly different from zero.
      Charles

  5. Edward says:

    Dear Charles
    Many thanks for building this platform; it is really helpful to me.
    Recently I have been doing a mutual fund performance test. The objective is to determine whether the difference between two groups’ pooled regression intercept terms (mutual fund risk-adjusted returns) is significantly different from zero. Before asking this question, I consulted some websites, such as http://core.ecu.edu/psyc/wuenschk/docs30/CompareCorrCoeff.pdf. However, it only shows a single-factor model as an example, and in our case I am not sure how to deal with M(2,1) and M(2,2) on page 3 because my regressions are multi-variable models. My question is whether I can use your approach in “Testing the significance of the slope of the regression line” to test whether the difference in intercepts from two independent sample regressions is significant.

    • Charles says:

      Edward,
      I have not yet addressed this issue in the Real Statistics website, but I will eventually do so. You can learn how to do this from the textbook written by Zar. See the Bibliography for a reference.
      Charles

  6. shima says:

    Dear Charles
    Would you let me know how to compare three or more nonlinear regression lines which are not mostly in a fully common interval?

  7. yeva says:

    Hi Charles, thanks for your helpful post. I’ve followed your steps and they’re very useful, but I’m confused about one thing: where is the “Regression data analysis tool”? Looking forward to your reply, thanks a lot 🙂

    • Charles says:

      You can access Excel’s Regression data analysis tool from the Data ribbon, via Data > Analysis|Data Analysis.
      You can access the Real Statistics version of this data analysis tool by pressing Ctrl-m and choosing Regression.
      Charles

  8. lindsey says:

    Hi, great info. I’m having trouble understanding why the value for degrees of freedom in the first screenshot is the sample size minus 2 instead of n-1.

  9. Monica says:

    Please explain to me the rule of thumb for significance testing of the regression estimate of a slope.

  10. Mori says:

    Hello
    Thanks for the helpful website and nice explanations. It might be a stupid question, but I was wondering how I can check and see if the slope of my regression line is significantly different from “1” (instead of zero), and the intercept is different from zero.

    Thanks,

    • Charles says:

      Not a stupid question at all. There are probably a number of ways of doing this, but here is one approach.

      Consider the linear equation y = bx + a. First note that the linear equation y = (b-1)x + a has slope of zero if and only if b = 1. Next note that y = (b-1)x + a is equivalent to y’ = bx + a where y’ is y+x. Thus you need to perform the usual test for slope = zero using the original x data, but with the y data replaced by the original y data plus the corresponding original x values. Thus if your original x data were 4, 6, 9, 14 and your original y data were 14, 25, 36, 55, you perform the test for zero slope using x data 4, 6, 9, 14 and y data 18, 31, 45, 69.

      Since the intercept didn’t change you simply use the usual test for intercept = zero, as shown in Example 2 of Confidence and Prediction Intervals.

      Charles

      • Sherry says:

        Hi Charles,
        I am wondering how to interpret a slope that is significantly different from a constant when that constant is not zero. When the slope is significantly different from zero, we know there is a linear relationship, but how should we interpret the result when the constant is not zero? For example, suppose we are asked to test whether the slope is significantly different from 1 and then explain the result.

        Thanks,
        Sherry

        • Charles says:

          Sherry,
          I would use the fact that the line y = bx + a has slope b = 1 if and only if the line (y-x) = (b-1)x + a has slope 0. Thus you need to test y’ = b’x + a where y’ is y – x.
          Charles

  11. Birend Dhungana says:

    Hi,
    Thank you for the post.
    I am trying to generate a calibration curve consisting of 5 data points. I have six sets of replicate measurement data. Now I am wondering what the best approach is for finding/reporting the standard deviations of the slope and intercept: a) get a single data set by averaging the values from the replicate measurements and use the LINEST function, or b) obtain 6 slope and intercept values by plotting each set of replicate measurements separately and calculate the standard deviations of the slopes and intercepts.
    Also, how can I calculate confidence interval if the approach (a) is used?

    I have a similar set of data obtained under a slightly different condition, and I need to determine whether the resulting slopes for these two conditions are significantly different.

  12. Jérémie says:

    Hi! Thank you for your site, I find it very useful. I have a question about linear regression. I have performed a test to check correlation between two variables. For this test, 11 points were taken 4 times each (i.e. at the temperature of 4.5, the replicates were 4, 7, 6.5, 8.1). If I check for significance of a correlation between the average of each point (11 points on the curve) I can’t reject the null (p=0.073). However, if I use all 44 points I can reject the null (p=0.0005). I am wondering which treatment is correct and why? Thank you in advance.

    • Charles says:

      One of the assumptions for linear regression is that the observations are independent. In the 44 point case, you clearly don’t have independent observations (since there are 4 measurements for each of the 11 points).
      Charles

  13. Mehdi says:

    Hi, thanks for your useful and clear explanations.
    I’ve used Eviews software to estimate a dependent variable as a function of 8 independent variables. The R^2 for my model is high (0.72), but the t values for the parameters of my independent variables are very low (in some cases less than 0.001).
    I want to know whether I should remove variables with low t values, and whether R square is more important than the t value.

    • Charles says:

      The R^2 value is a measure of the overall fit of the model. The p-value (not the t statistic) of each coefficient is a measure of whether the corresponding variable is contributing much to the model. You can remove a variable whose corresponding p-value is not significant (this indicates that the corresponding coefficient is statistically equivalent to zero). You could remove such variables from the model and see what impact this has on the R^2 value. This is explained on the webpage

      Significance regression model variables

      Charles

  14. Vicki says:

    Hi, thanks for the post. It is very helpful. I need to calculate LOQ and LOD for my work. I used the ICH guideline (standard deviation of the y-intercept / slope) to calculate LOQ and LOD. I used regression analysis and got the standard error of the y-intercept (3rd table). I also calculated STEYX. The two values do not match. Are they supposed to match? I am very confused. Also, ICH calls for the standard deviation, but both the 3rd table and STEYX use the name standard error instead of standard deviation; I searched online and everybody said they are the same thing. Can I send you the Excel file?

  15. djU says:

    Hi, I was just wondering how I can tell which independent variables are significant and which are not.

  16. Ryan says:

    Under Figure 3, below Summary Output, I believe “R Square – correlation of determination” should be “coefficient of determination”. Also, do the Real Statistics data analysis tools offer the variance inflation factor for linear regression, or a scatterplot matrix feature to check for multicollinearity?

  17. krithi Subramanian says:

    Hello
    I am doing multiple regression in Excel 2007. I have one dependent variable and 18 independent variables, but I am not able to run the multiple regression on all my data at the same time.
    It gives a warning message like this: Only 16 columns are available. Can you please give me a solution?

    • Charles says:

      Excel’s Regression data analysis tool is limited to 16 independent variables. You can use the Real Statistics Linear Regression data analysis tool instead. This tool supports up to 64 independent variables and is part of the Real Statistics Resource Pack, which you can download for free from this website.
      Charles
