# Regression approach to ANCOVA

Example 1: Carry out the analysis for Example 1 of Basic Concepts of ANCOVA using a regression analysis approach.

Our objective is to analyze the effect of teaching method on reading scores without the confounding effect of family income (the covariate). We do this using regression analysis. As we have done several times (see ANOVA using Regression), we use dummy variables for the treatments (i.e. the teaching methods in this example). Since there are four methods, three dummy variables suffice, with Method 4 serving as the reference group. We choose the following coding:

t1 = 1 if Method 1 and = 0 otherwise
t2 = 1 if Method 2 and = 0 otherwise
t3 = 1 if Method 3 and = 0 otherwise
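
The coding above can be sketched in Python as follows. The function name `dummy_code` is illustrative and not part of the original example; Method 4 maps to all zeros because it acts as the reference group.

```python
def dummy_code(method: int) -> tuple:
    """Return (t1, t2, t3) for a teaching method numbered 1 to 4."""
    if method not in (1, 2, 3, 4):
        raise ValueError("method must be 1, 2, 3 or 4")
    # tj is 1 when the subject received Method j and 0 otherwise
    return tuple(int(method == j) for j in (1, 2, 3))


for m in (1, 2, 3, 4):
    print(m, dummy_code(m))
```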

We also use the following variables:

y = reading score
x = family income (covariate)

Thus the data from Figure 1 of Basic Concepts of ANCOVA takes the form for regression analysis shown in Figure 1.

Figure 1 – Data for Example 1 along with dummy variables

Now we define the following regression models:

• Complete model (y, x, t, x*t) – all the variables are used, interaction between treatments and income is modeled
• Full model (y, x, t) – all the variables are used, interaction of treatment with income is not modeled
• Partial model (y, t) – only reading scores and treatments are used
• Partial model (y, x) – only reading scores and income are used
• Partial model (x, t) – only income and treatments are used
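
As a rough sketch of how these five models differ, the following Python function returns the response and the predictor values that each model would use for a single subject. The model labels (`"complete"`, `"full"`, and so on) and the function name `design` are made up for illustration; the first variable in each pair above is treated as the dependent variable, which is an assumption for the (x, t) model.

```python
def design(model, y, x, t):
    """Return (response, predictors) for one subject.

    y: reading score, x: family income, t: (t1, t2, t3) dummy variables.
    """
    t = list(t)
    if model == "complete":   # x, t and the x*t interaction terms
        return y, [x] + t + [x * tj for tj in t]
    if model == "full":       # x and t, no interaction
        return y, [x] + t
    if model == "y_t":        # reading score on treatments only
        return y, t
    if model == "y_x":        # reading score on income only
        return y, [x]
    if model == "x_t":        # income on treatments (x is the response here)
        return x, t
    raise ValueError("unknown model: " + model)


# Hypothetical subject: score 20, income 50.0, taught with Method 1
print(design("complete", 20, 50.0, (1, 0, 0)))
```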

Running Excel’s regression data analysis tool for each model, we obtain the results displayed in Figure 2 (excluding the complete model, which we will look at later):

Figure 2 – Full and reduced regression models

The ANCOVA model follows directly from Figure 2. There are two versions. The first model, shown in Figure 3, is essentially the full model with the variation due to the covariate identified.

Figure 3 – ANCOVA model for Example 1

The sums of squares are calculated as follows (the degrees of freedom are similar):

From Figure 3, we see that the covariate is significant (p-value = 0.012 < .05), and so family income is significant in predicting reading scores.

We also see that the differences between teaching methods are significant (p-value = .032 < .05) even after the effect of family income is removed. This is equivalent to rejecting the following null hypothesis:

H0: $\mu'_1$ = $\mu'_2$ = $\mu'_3$ = $\mu'_4$

where $\mu'_j$ is the mean for teaching method j, adjusted to remove the effect of the covariate. We’ll have more to say about this in a moment.
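
One common way to test this hypothesis with regression is to compare the residual sums of squares of the full model (y, x, t) and the partial model (y, x). The sketch below does this in plain Python on made-up data; the data values, the helper `ss_res` and the variable names are all hypothetical, and the layout need not match the spreadsheet cells in the figures.

```python
def ss_res(rows, y):
    """Residual sum of squares of a least-squares fit of y on rows (intercept added)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, xi))) ** 2
               for xi, yi in zip(X, y))


# HYPOTHETICAL data: 4 methods, 3 children each (not the data of Example 1)
method = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
x = [30, 45, 60, 35, 50, 65, 40, 55, 70, 32, 48, 62]   # family income
y = [15, 19, 26, 28, 31, 38, 20, 27, 30, 14, 20, 23]   # reading score
t = [[int(m == j) for j in (1, 2, 3)] for m in method]

ss_full = ss_res([[xi] + ti for xi, ti in zip(x, t)], y)   # model (y, x, t)
ss_reduced = ss_res([[xi] for xi in x], y)                 # model (y, x)

df_treat = 3                # three dummy variables dropped
df_res = len(y) - 5         # n minus (intercept + x + t1 + t2 + t3)
F = ((ss_reduced - ss_full) / df_treat) / (ss_full / df_res)
print("F =", F, "with df", df_treat, "and", df_res)
```

The p-value is then the right tail of the F distribution with these degrees of freedom, e.g. via Excel's F.DIST.RT function.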

Another way of looking at ANCOVA is to remove the covariate from the analysis (see Figure 4).

Figure 4 – Reduced ANCOVA model for Example 1

Here the adjusted regression SS (cell L40) is =L33 (from Figure 3), the residual SS (cell L41) is =L34, and the adjusted total SS (cell L42) is =L40+L41.

An alternative way of calculating SST in the reduced ANCOVA model uses the slope of the regression line that fits all the data points, namely (with reference to Figure 1 of Basic Concepts of ANCOVA)

bT = SLOPE(A4:A39,B4:B39) = 0.376975

Also note that SST(x,t) = DEVSQ(B4:B39).
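
The formula itself is not reproduced here. A plausible form, stated as an assumption, is the standard identity for the residual sum of squares when y is regressed on x alone across all the data:

$$SS'_T = SS_T(y) - b_T^2 \cdot SS_T(x)$$

where $SS_T(y)$ = DEVSQ(A4:A39) is the total sum of squares of the reading scores, $SS_T(x)$ = DEVSQ(B4:B39) is the total sum of squares of the covariate, and $b_T$ is the slope given above.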

### Adjusted means

We now turn our attention to the treatment means $\mu'_j$ adjusted to remove the effect of the covariate. To obtain estimates for these we need to look at the coefficients of the full model, which is displayed in Figure 5.

Figure 5 – Full model (y, x, t), including coefficients

Thus the regression model is

One thing this shows is that for every unit of increase in x (i.e. for every additional \$1,000 of family income), y (i.e. the child’s reading score) tends to increase by .323 points.

Note that the mean value of x is given by AVERAGE(B4:B39) = 48.802 (using Figure 1).

To get the adjusted mean of the reading scores for Method 4, we set x = 48.802 and t1 = t2 = t3 = 0, and calculate the predicted value for y:

For Method 1 we set x = 48.802, t1 = 1 and t2 = t3 = 0.

Similarly, for Method 2 we set x = 48.802, t2 = 1 and t1 = t3 = 0.

Finally, for Method 3 we set x = 48.802, t3 = 1 and t1 = t2 = 0.

The results are summarized in Figure 6.

Figure 6 – Adjusted means for Example 1
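
The calculation described above amounts to evaluating the full model at the mean of the covariate. A minimal Python sketch follows. The slope .323 and the covariate mean 48.802 are the rounded values quoted in the text; the intercept 2.857 is assumed from a reader comment below; the treatment coefficients in `b_t` are hypothetical placeholders, since their values are not quoted in the text.

```python
def adjusted_mean(intercept, b_x, b_t, x_mean, method):
    """Predicted y at the covariate grand mean for a method numbered 1 to 4."""
    t = [int(method == j) for j in (1, 2, 3)]   # Method 4 gives t1 = t2 = t3 = 0
    return intercept + b_x * x_mean + sum(b * tj for b, tj in zip(b_t, t))


intercept = 2.857        # assumed, taken from a reader comment
b_x = 0.323              # coefficient of x (rounded)
x_mean = 48.802          # AVERAGE(B4:B39)
b_t = (1.0, 2.0, 3.0)    # HYPOTHETICAL coefficients of t1, t2, t3

for m in (1, 2, 3, 4):
    print("Method", m, adjusted_mean(intercept, b_x, b_t, x_mean, m))
```

With the real coefficients from Figure 5 in place of the placeholders, the four printed values would correspond to the adjusted means in Figure 6, up to rounding.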

The values for Y in Figure 6 are the group means of y. E.g. the mean of reading scores for Method 2 is AVERAGE(A12:A19) = 33.75. The adjusted grand mean is the mean of the adjusted means, i.e. AVERAGE(C56:C59) = 23.442.

The adjusted means can also be computed using the slope bW, which is the regression coefficient of x in the full model (i.e. the value in cell S36 of Figure 5), namely bW = .323.

Figure 7 – Alternative method for calculating the adjusted means

Here the values for Y are the group means as described above. The values for X (the covariate) are similar; e.g. the mean family income for the children in the Method 2 sample (cell C49) = AVERAGE(B12:B19) = 60.2625. The grand mean for the covariate (cell C52) is AVERAGE(B4:B39).

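The adjusted means are now given by the formula

$\bar{y}'_j = \bar{y}_j - b_W(\bar{x}_j - \bar{x})$

where $\bar{y}_j$ and $\bar{x}_j$ are the group means of y and x for Method j and $\bar{x}$ is the grand mean of the covariate.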

E.g. the adjusted mean for Method 2 (cell D49) is given by the formula =B49-S36*(C49-C52) where cell S36 contains the value of bW.
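
The same calculation can be sketched in Python, using the rounded values for Method 2 quoted above. The function and argument names are illustrative only.

```python
def adjusted_mean(y_mean_j, x_mean_j, x_grand_mean, b_w):
    """Group mean of y, corrected for how far the group's covariate mean
    lies from the covariate grand mean."""
    return y_mean_j - b_w * (x_mean_j - x_grand_mean)


# Method 2: mean score 33.75, mean income 60.2625, grand mean income 48.802
adj2 = adjusted_mean(33.75, 60.2625, 48.802, 0.323)
print(round(adj2, 2))   # about 30.05 with these rounded inputs
```

Because the Method 2 children come from families with above-average income, the adjustment lowers their mean reading score.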

### 15 Responses to Regression approach to ANCOVA

1. Huy Pham says:

Dear Professor
Thanks for your page so much.
I have an issue that I need your help with. I want to test whether two linear regression lines are parallel. First, I fit a linear regression for each of the two models. Then I want to test whether the two lines are parallel. I think I can compare the slopes of the two models: if there is no significant difference between them, the lines are parallel. Is my method right or wrong?
I hope you can answer me.
thanks and best regards.

• Charles says:

Yes, this is correct.
Charles

2. Stephenie says:

Hi professor,

thank you for this page, it is really helpful! I am not sure about some definitions and I would like to know what the difference is between adjusted, balanced and weighted in statistics?

thank you very much!

Stephanie

• Charles says:

Stephanie,
The meanings of these terms depend on the context in which they are used. In the case of ANCOVA, adjusted implies that we make some change to the normal definition for some specific purpose. Balanced generally means that multiple groups have the same number of elements. Weighted means that you multiply each k-tuple of values by a fixed k-tuple of weights.
Charles

3. Mik says:

Thank you very much for all your work!
What I don’t understand is:
To get the adjusted means we compute “mean_1 – b_x*(mean_x1 – mean_x)”, where mean_1 is the mean of group 1, mean_x1 is the mean of the covariate over group 1, mean_x is the mean of the covariate over all subjects and b_x is the weight of the covariate.
How can I get b_x without knowing the adjusted means? When I do a regression of x on y, I don’t get the same b_x as in the full model. thanks!

• Charles says:

Mik,

As stated at the bottom of the referenced webpage, b_W is equal to the b_x (cell S36 in Figure 5). This is what we need to calculate the adjusted means. You can also calculate b_W and the adjusted means using the ANOVA approach, as described on the webpage ANCOVA using ANOVA.

The value b_x is calculated using the regression shown in Figure 5 without adjusting the means.

Charles

4. Gaurav says:

Dr. Zaiontz

First I would like to thank you for your remarkable effort in creating such a creative tutorial-style statistics website.

I am writing to seek your advice on some data analysis.

We are trying to evaluate relative potency of drug and want to compare with some reference drug.
Design of experiment is as below
1) The test and reference drugs are tested in parallel at five doses (at 1/3 log variation) in animals
2) For each dose we have a group of 10 animals, so 50 animals each for test and reference, totalling 100 animals
3) After drug injection, potency in the animals is checked through an assay that gives quantitative values.

Now the questions are
1) Which statistical method is a good way to judge whether the test drug has the same potency as the reference drug? Should we use ANOVA, ANCOVA or something else for the analysis?
2) Which statistical test is the right way to evaluate significance: the F test or the t test?

Thanks in anticipation

• Charles says:

Based on my understanding of the scenario, you have two factors: Factor A has two levels test and reference drug. Factor B has 5 levels for the 5 dosages. This is a classic two fixed factor ANOVA, which uses an F test. When you get into follow up analyses, you may use a t test.
Charles

5. Citra says:

Sir,
I have four covariates with three factors/treatments, can you explain how to calculate ANCOVA with this several covariates? and can i know what the best treatments in this case?
Thank You,
Citra

• Charles says:

Citra,
Sorry, but I have not yet dealt with the case of multiple covariates. I will eventually add this. I believe that SPSS handles this type of analysis.
Charles

6. Charity says:

Hello,

Thanks for your page and your practice sheets. I was trying this out with my analysis where I only have 2 treatment groups (instead of 4 like yours). I was finding that when I run the linear regressions, it would return “0” for coefficient, standard deviation, and upper/lower limits for…
-“x” and “t1” in the complete regression
-“t1” for the full regression
-“t1” for the y,t partial
-“t1” for the x,t partial

As far as I can see I’m setting up my tables exactly the same as you, would you have any thoughts what I may be doing wrong?

Charity

7. franz schumann says:

Sir,

Would you please be so kind and explain, how the intercept (in the above ex. 2,857) is calculated in ANCOVA

• Charles says:

Franz,
The full model (including the intercept) is obtained using ordinary linear regression on the data in Figure 1. This is done by running either Excel’s Regression data analysis tool or the Real Statistics Linear Regression data analysis tool (see webpage http://www.real-statistics.com/multiple-regression/ for details)
Charles

8. Colin says:

Sir

I think the formula in Cell O32 in Figure 3 is wrong. You choose a wrong degree of freedom.

• Charles says:

Colin,
I believe that you are correct. I accidentally used the df for the row above the one I should have used. The same change needs to be made for the ANOVA approach to ANCOVA webpage. I have now made these changes to the website. Thanks for catching this error.
Charles