Comparing the slopes for two independent samples

In this section we test whether the slopes for two independent populations are equal, i.e. we test the following null hypothesis:

H0:  β1 = β2 i.e. β1 – β2 = 0

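The test statistic is

$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}}$$

where, for sample j, $b_j$ is the sample slope, $n_j$ is the sample size, $s_{x_j}$ is the standard deviation of the x values and $s_{Res_j}$ is the standard error of the estimate.

If the null hypothesis is true then

$$t \sim T(n_1 + n_2 - 4)$$

where

$$s_{b_1 - b_2} = \sqrt{s_{b_1}^2 + s_{b_2}^2}$$

If the two error variances are equal, then, as in the test for the difference between two means, we can pool the estimates of the error variances, weighting each by its degrees of freedom, and so

$$s_{Res}^2 = \frac{(n_1 - 2)\, s_{Res_1}^2 + (n_2 - 2)\, s_{Res_2}^2}{n_1 + n_2 - 4}$$

Now

$$s_{b_1}^2 = \frac{s_{Res_1}^2}{s_{x_1}^2 (n_1 - 1)} \qquad s_{b_2}^2 = \frac{s_{Res_2}^2}{s_{x_2}^2 (n_2 - 1)}$$

Since we can replace the numerators of each by the pooled value $s_{Res}^2$, we have

$$s_{b_1 - b_2} = s_{Res} \sqrt{\frac{1}{s_{x_1}^2 (n_1 - 1)} + \frac{1}{s_{x_2}^2 (n_2 - 1)}}$$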

Note that while the null hypothesis β = 0 is equivalent to ρ = 0, the null hypothesis β1 = β2 is not equivalent to ρ1 = ρ2.
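A small made-up example may help to show why equal slopes do not imply equal correlations. In the sketch below (the data and the helper name `slope_and_corr` are invented for illustration), both samples have a fitted slope of 2, up to rounding, but the second has much more scatter about its line, so the correlations are roughly 0.999 and 0.79.

```python
import math

def slope_and_corr(x, y):
    """Return the least-squares slope and the Pearson correlation of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
# Made-up residual pattern with zero sum and zero cross-product with x,
# so adding any multiple of it leaves the fitted slope unchanged.
resid = [1, -2, 1, 0, 0]
y_tight = [2 * xi + 0.1 * r for xi, r in zip(x, resid)]
y_noisy = [2 * xi + 2.0 * r for xi, r in zip(x, resid)]

b_tight, r_tight = slope_and_corr(x, y_tight)
b_noisy, r_noisy = slope_and_corr(x, y_noisy)
print(b_tight, r_tight)
print(b_noisy, r_noisy)
```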

Example 1: We have two samples, each comparing life expectancy vs. smoking. The first sample is for males and the second for females. We want to determine whether there is any significant difference in the slopes for these two populations. We assume that the two samples have the values in Figure 1 (for men the data is the same as that in Example 1 of Regression Analysis):

Figure 1 – Data for Example 1


As can be seen from the scatter diagrams in Figure 1, the regression line for women appears to be less steep than that for men. In fact, as Figure 2 shows, the slope of the regression line for men is -0.6282 and the slope for women is -0.4679, but is this difference significant?

As the calculations in Figure 2 show, the null hypothesis H0: the slopes are equal cannot be rejected, whether the pooled or the unpooled value of sRes is used. We therefore cannot conclude that the change in life expectancy per incremental amount of smoking differs significantly between males and females.

Figure 2 – t-test to compare slopes of regression lines


Real Statistics Function: The following supplemental array function is provided by the Real Statistics Resource Pack. Here Rx1, Ry1 are ranges containing the X and Y values for one sample and Rx2, Ry2 are the ranges containing the X and Y values for a second sample.

SlopesTest(Rx1, Ry1, Rx2, Ry2, b, lab): outputs the standard error of the difference in slopes sb1–b2, t, df and p-value for the test described above for comparing the slopes of the regression lines for the two samples.

If b = True (the default) then the pooled standard error sb1–b2 is used (as in cell T10 of Figure 2); otherwise the non-pooled standard error is used (as in cell M10 of Figure 2).

If lab = True then the output is a 4 × 2 range whose first column contains labels and whose second column contains the values described above. If lab = False (the default), only the values are output, in the form of a 4 × 1 range.

The SlopesTest function only produces the correct results if there are no missing data elements in Rx1, Ry1, Rx2, Ry2.
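One way to prepare data that has gaps is to drop every (x, y) pair in which either value is missing before running the test, treating each sample separately. The following is a minimal Python sketch of that clean-up step; the helper names and the sample values are hypothetical, and `None` (or NaN) is assumed to mark an empty cell.

```python
import math

def _missing(v):
    """True if a value is absent (None) or a floating-point NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def complete_pairs(xs, ys):
    """Keep only the (x, y) pairs in which both values are present."""
    kept = [(x, y) for x, y in zip(xs, ys) if not _missing(x) and not _missing(y)]
    return [x for x, _ in kept], [y for _, y in kept]

# Hypothetical sample with gaps; None marks an empty cell.
x_raw = [1, 2, 3, None, 5, 6]
y_raw = [3, 3, None, 4, 4, 5]
x_clean, y_clean = complete_pairs(x_raw, y_raw)
print(x_clean, y_clean)
```

The cleaned lists can then be copied into contiguous ranges and passed to the test in place of the raw columns.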

Observation: For Example 1, the formula

=SlopesTest(A5:A19,B5:B19,D5:D20,E5:E20,FALSE,TRUE)

generates the output in range M29:N32 of Figure 3, while the formula

=SlopesTest(A5:A19,B5:B19,D5:D20,E5:E20)

generates the output in range O29:O32.


Figure 3 – Comparing slopes using Real Statistics function

27 Responses to Comparing the slopes for two independent samples

1. Aravindh says:

Hi,

You have a cell wrong in the excel sheet. Please fix it if you can. In Figure 2, cell M11 should be equal to (b1-b2)/(sb1-b2), NOT (b1-b2)/(sb1-sb2)

Thanks Aravindh,
That was a great catch. I have made the change that you suggested. Thanks for your help.
Charles

Dear Dr. Zaiontz
thank you for your useful example. I have a question:
As I understand from your example, this method applies when a linear regression is used. If we have data that a power regression fits, what should we do? Can we use the data directly, or should we first convert it to a linear regression (by a log transformation, for example)? For example, if we want to compare the regression lines of height vs. weight for male and female fish (which follow a power regression), can we use the data directly?

• Charles says:

I believe that what you have suggested should work since you are only using a transformation. The formulas for comparing the slopes need to be applied after the transformation so that you are comparing the slopes of two (straight) lines.
Charles
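As a rough illustration of the transformation step discussed in this reply, the sketch below uses made-up lengths and weights that follow weight = 0.01 × length³ exactly. After taking logs of both variables the relationship is a straight line whose slope is the exponent, and it is these log-log slopes (one per sample) that would be compared. The helper name `slope` and the data are assumptions for illustration only.

```python
import math

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up data following a power law: weight = 0.01 * length^3.
length = [10.0, 20.0, 30.0, 40.0, 50.0]
weight = [0.01 * v ** 3 for v in length]

# Log-transform both variables so that the power curve becomes a straight line.
log_length = [math.log(v) for v in length]
log_weight = [math.log(v) for v in weight]

exponent = slope(log_length, log_weight)   # close to 3 for this data
print(exponent)
```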

Thank you for your laconic comments. What about “a” (Y intercepts) between two lines? Is this factor important if we want to compare the slopes between two lines or only the “b” (slope) should be compared?

3. Andrew Tilley says:

Do you have a textbook or a paper you can cite to justify these equations? I’m fairly certain the test you present here is incorrect. The test statistic should be (b1 – b2) / sqrt(SE(b1)^2 + SE(b2)^2), where SE(b1) is given not by steyx (this is the standard error of the predicted y value) nor by the equation you give for s_b1. See, e.g., this link: (http://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression) on accurate calculation of these standard errors.

• Charles says:

Andrew,
I used David Howell’s textbook entitled “Statistical Methods for Psychology”, Wadsworth CENGAGE Learning, 2010. Shortly I will recheck my test samples using your approach and the approach I used on the website.
Charles

4. Andrew Tilley says:

Thanks for your quick reply, Charles. I actually caught the source of my confusion, and I’m now convinced that your approach is actually correct! Sorry about that.

5. Lauri says:

Thank you for this, your slopestest function saved me a lot of trouble!

6. Colin says:

Sir

I am a little confused about the “pool the estimates of the error variances” step. The formula (s_Res squared) you used on this website is different from the formula you used in the Excel workbook.

Colin

• Colin says:

Sir

Please ignore my question, you are right.

Colin

7. Johnathan Clayborn says:

Hi Dr. Zaiontz,

This is exactly the type of information that I was looking for to complete a study that I was working on. I was wondering two questions;

1st) Can you explain more about how I would go about finding the X/Y values of the lines in order to perform these calculations? I’m using a trendline in a time-series line graph and I can see that there is definitely statistical significance, but I need to express it mathematically.

2nd) Do you know if this method is possible using SPSS?

• Charles says:

Hi Johnathan,

1) I am not sure what you mean. The X/Y values are the data that you are testing.

2) I don’t use SPSS, but I believe that the answer is yes. E.g. the following webpage references doing this in SPSS: http://core.ecu.edu/psyc/wuenschk/MV/multReg/Potthoff.pdf

Charles

8. Carl says:

Hi, I tried testing your SlopesTest using your Example1 data. When I input it I only get the following “result”: std err. This appears to be only the label as in your Figure 3. When I repeat the formula by excluding the “false,true” part I get the result 0.23271.
Any ideas?

• Charles says:

Carl,
SlopesTest is an array formula. Try entering the formula and then pressing Ctrl-Shift-Enter. The full results should be displayed. If you press Enter instead, then only the first cell in the output will appear.
Charles

• Laura says:

I have the same problem as Carl, except I have tried ‘Ctrl-Shift-Enter’ and it makes no difference to the result. It’s either ‘std err’ or a number (in my case 28.16). Please let me know if you are aware of any other factors that might be stopping this formula from working.

• Charles says:

Laura,

Since this is an array function, you need to first highlight a 4 x 1 column range, then enter a formula of the form SlopesTest(R1, R2, R3, R4, b) where R1, R2, R3 and R4 are ranges and b is either TRUE or FALSE, and finally press Ctrl-Shift-Enter. This will fill the highlighted range with the following values: std err, t, df, p-value.

Alternatively you can first highlight a 4 x 2 range, then enter a formula of form SlopesTest(R1, R2, R3, R4, b, TRUE) and finally press Ctrl-Shift-Enter. This will fill the second highlighted column with the same values as described above and fill the first column with the appropriate labels.

The key is that you must first highlight an output range of sufficient size to contain all the output. It can even be larger than necessary (the extra cells will be filled with #N/A).

Charles

• Laura says:

Thank you for the advice Charles. The problem is now fixed thanks to your suggestion!

Cheers

9. Ramiro says:

Hi Charles,

I noticed that if there are holes in the data the result of SlopesTest is different. Below are the numbers I tried; they are the same, but some points are missing one or the other piece of data. Since only points with both x and y should count, I thought SlopesTest would give me the same result. Should I always remove missing data before doing the SlopesTest?
Thank you,
Ramiro

This gave me p=0.131431
1 3 1 1
2 3 2 2
3 3 3 5
4 4 6
5 4 8
4 5 9
6 5 6
7 6 7 13

and this gave me p=0.14889

1 3 1 1
2 3 2 2
3 3 3 5
5 4 4 6
6 5 5 9
7 6 7 13

• Charles says:

Hi Ramiro,
In the current implementation of the SlopesTest function the correct values are generated only if there is no missing data. You need to remove any missing data before using the function.
Charles

Hi.

How can I do this on excel 2010? I was trying TDIST formula but this function is available only with Excel 2007 or earlier versions and I am unable to understand the 2010 version. Or post a picture using Excel 2010 please.

Cheers

• Charles says:

I am using Excel 2010 and have no problem using the Excel 2007 functions such as TDIST. In any case here are substitutions for Excel 2010:

Replace TDIST(x,df,2) by T.DIST.2T(x,df)
Replace TINV(p,df) by T.INV.2T(p,df)

Charles

11. David says:

Hi Charles,
Thank you very much for this great post!
I have a small question. What if each of my data points (y) is actually a mean over a larger data set? How can I account for that? Should I expect a different result?
Thanks.

• Charles says:

David,
I’m not sure how you would account for this (or if you could account for this). I would think that this would change things considerably.
I suggest that you try a few examples where you create some data (i.e. the larger data sets) and have the y values be the means over the larger data set that you have created. Then run the test using the means and run it again using the larger data set and see what sort of differences there are.
Charles

• David says:

David

12. Patricia Olson says:

Hi Charles
Your RealStatistics Resource Pack for Excel is great. Thank you for providing it. I have been using R but am still learning the language. Your tool is much more time saving for some statistical analyses than R. However, I am having some problems accessing some of the functions such as SlopesTest. I’m using Excel 7. I tried to access it through the example worksheet and still just get the #VALUE! message.
Thank you for your help on this.
Cheers
Patricia

13. Patricia Olson says:

Charles
Never mind… I figured it out finally. I was entering the array data incorrectly.

Thanks
Patricia