Example 1: Carry out the analysis for Example 1 of Basic Concepts of ANCOVA using a regression analysis approach.
Our objective is to analyze the effect of teaching method on reading scores while removing the confounding effect of family income (the covariate). We do this using regression analysis. As we have done several times (see ANOVA using Regression), we use dummy variables for the treatments (i.e. the teaching methods in this example). Since there are four methods, three dummy variables suffice, with Method 4 serving as the reference category. We choose the following coding:
t1 = 1 if Method 1 and = 0 otherwise
t2 = 1 if Method 2 and = 0 otherwise
t3 = 1 if Method 3 and = 0 otherwise
We also use the following variables:
y = reading score
x = family income (covariate)
Thus the data from Figure 1 of Basic Concepts of ANCOVA takes the form for regression analysis shown in Figure 1.
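The dummy coding above can be sketched in a few lines of Python. The records below are made up for illustration (they are not the data in Figure 1), and the helper name `dummy_code` is our own.

```python
# Hypothetical records: (reading score y, family income x in $1,000s, method 1-4).
# These numbers are invented for illustration; they are NOT the data in Figure 1.
records = [
    (15, 35.0, 1), (20, 52.0, 1),
    (32, 60.0, 2), (36, 71.0, 2),
    (18, 40.0, 3), (27, 58.0, 3),
    (14, 30.0, 4), (22, 49.0, 4),
]

def dummy_code(method, n_dummies=3):
    """Return [t1, t2, t3]; Method 4 is the reference and gets all zeros."""
    return [1 if method == k else 0 for k in range(1, n_dummies + 1)]

# One row per pupil in the layout y, x, t1, t2, t3
rows = [(y, x, *dummy_code(m)) for y, x, m in records]

print(" y     x  t1 t2 t3")
for y, x, t1, t2, t3 in rows:
    print(f"{y:2d} {x:5.1f}  {t1}  {t2}  {t3}")
```

Only three dummy variables are needed for four methods: a fourth would be a linear combination of the intercept and the other three.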
Figure 1 – Data for Example 1 along with dummy variables
Now we define the following regression models:
- Complete model (y, x, t, x*t) – all the variables are used, interaction between treatments and income is modeled
- Full model (y, x, t) – all the variables are used, interaction of treatment with income is not modeled
- Partial model (y, t) – only reading scores and treatments are used
- Partial model (y, x) – only reading scores and income are used
- Partial model (x, t) – only income and treatments are used
Running Excel’s regression data analysis tool for each model we obtain the results displayed in Figure 2 (excluding the complete model, which we will look at later):
Figure 2 – Full and reduced regression models
The ANCOVA model follows directly from Figure 2. There are two versions. The first model, shown in Figure 3, is essentially the full model with the variation due to the covariate identified.
Figure 3 – ANCOVA model for Example 1
The sums of squares are calculated as follows (the degrees of freedom are calculated similarly):
From Figure 3, we see that the covariate is significant (p-value = .012 < .05), i.e. family income is a significant predictor of reading scores.
We also see that the differences between teaching methods are significant (p-value = .032 < .05) even after the effect of family income is removed. This is equivalent to rejecting the following null hypothesis:
H0: $\mu'_1 = \mu'_2 = \mu'_3 = \mu'_4$
where $\mu'_j$ is the mean for teaching method j adjusted to remove the effect of the covariate. We’ll have more to say about this in a moment.
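As a rough sketch of how the entries of an ANCOVA table can be obtained from the regression models listed earlier, the code below fits the full model and two partial models by least squares and takes differences of their regression sums of squares. We assume the usual extra-sum-of-squares construction here (covariate SS = SSReg(y, x, t) − SSReg(y, t), treatment SS = SSReg(y, x, t) − SSReg(y, x)), which may differ in detail from the worksheet in Figure 3. The data are hypothetical, and the helpers `solve` and `ss_reg` are our own.

```python
def solve(a, b):
    """Solve the linear system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ss_reg(y, columns):
    """Regression SS for y on an intercept plus the given predictor columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_mean = sum(y) / len(y)
    return sum((f - y_mean) ** 2 for f in fitted)

# Hypothetical data (NOT the values in Figure 1): 3 pupils per method.
y = [15.0, 20.0, 24.0, 32.0, 36.0, 30.0, 18.0, 27.0, 23.0, 14.0, 22.0, 19.0]
x = [35.0, 52.0, 60.0, 60.0, 71.0, 55.0, 40.0, 58.0, 47.0, 30.0, 49.0, 41.0]
method = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
t = [[1.0 if m == j else 0.0 for m in method] for j in (1, 2, 3)]  # t1, t2, t3

n = len(y)
y_bar = sum(y) / n
ss_total = sum((v - y_bar) ** 2 for v in y)

ss_full = ss_reg(y, [x] + t)   # full model (y, x, t)
ss_yt = ss_reg(y, t)           # partial model (y, t)
ss_yx = ss_reg(y, [x])         # partial model (y, x)

ss_cov = ss_full - ss_yt       # covariate, given the treatments (assumed formula)
ss_treat = ss_full - ss_yx     # treatments, adjusted for the covariate (assumed formula)
ss_res = ss_total - ss_full    # residual of the full model
df_res = n - 5                 # intercept + x + 3 dummies

f_cov = ss_cov / (ss_res / df_res)
f_treat = (ss_treat / 3) / (ss_res / df_res)
print(f"SS cov = {ss_cov:.3f}, SS treat = {ss_treat:.3f}, SS res = {ss_res:.3f}")
print(f"F cov = {f_cov:.3f}, F treat = {f_treat:.3f}")
```

The p-values would then come from the F distribution with (1, df_res) and (3, df_res) degrees of freedom, e.g. via Excel's F.DIST.RT.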
Another way of looking at ANCOVA is to remove the covariate from the analysis (see Figure 4).
Figure 4 – Reduced ANCOVA model for Example 1
Here the adjusted regression SS (cell L40) is =L33 (from Figure 3), the residual SS (cell L41) is =L34, and the adjusted total SS (cell L42) is =L40+L41.
An alternative way of calculating SST in the reduced ANCOVA model uses the slope of the regression line that fits all the data points, namely (with reference to Figure 1 of Basic Concepts of ANCOVA)
bT = SLOPE(A4:A39,B4:B39) = 0.376975
Also note that SST(x,t) = DEVSQ(B4:B39).
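The slope-based calculation can be sketched as follows. We assume the standard identity that the adjusted total SS equals SST(y) − bT² · SST(x), which is also the residual SS from regressing y on x alone; the text above does not spell the formula out, so treat this as our reading of it. The data are hypothetical, and `devsq` and `slope` are our own stand-ins for Excel's DEVSQ and SLOPE.

```python
from statistics import mean

# Hypothetical data (NOT the values in Figure 1)
y = [15.0, 20.0, 24.0, 32.0, 36.0, 30.0, 18.0, 27.0, 23.0, 14.0, 22.0, 19.0]
x = [35.0, 52.0, 60.0, 60.0, 71.0, 55.0, 40.0, 58.0, 47.0, 30.0, 49.0, 41.0]

def devsq(values):
    """Sum of squared deviations from the mean, like Excel's DEVSQ."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def slope(ys, xs):
    """Least-squares slope of ys on xs, like Excel's SLOPE(known_ys, known_xs)."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / devsq(xs)

b_T = slope(y, x)                                # slope through all data points
sst_adj = devsq(y) - b_T ** 2 * devsq(x)         # assumed formula for adjusted SST

# Cross-check: residual SS of the simple regression of y on x
intercept = mean(y) - b_T * mean(x)
ss_res_yx = sum((yi - (intercept + b_T * xi)) ** 2 for xi, yi in zip(x, y))

print(f"b_T = {b_T:.6f}, adjusted SST = {sst_adj:.4f}, SSRes(y, x) = {ss_res_yx:.4f}")
```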
We now turn our attention to the treatment means adjusted to remove the effect of the covariate. To obtain estimates for these we need to look at the coefficients of the full model, which is displayed in Figure 5.
Figure 5 – Full model (y, x, t), including coefficients
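Thus the regression model has the form $y = b_0 + b_x x + b_1 t_1 + b_2 t_2 + b_3 t_3$, with the coefficient values taken from Figure 5.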
One thing this shows is that for every unit of increase in x (i.e. for every additional $1,000 of family income), y (i.e. the child’s reading score) tends to increase by .323 points.
Note that the mean value of x is given by AVERAGE(B4:B39) = 48.802 (using Figure 1).
To get the adjusted mean of the reading scores for Method 4, we set x = 48.802 and t1 = t2 = t3 = 0, and calculate the predicted value for y:
For Method 1 we set x = 48.802, t1 = 1 and t2 = t3 = 0.
Similarly, for Method 2 we set x = 48.802, t2 = 1 and t1 = t3 = 0.
Finally, for Method 3 we set x = 48.802, t3 = 1 and t1 = t2 = 0.
The results are summarized in Figure 6.
Figure 6 – Adjusted means for Example 1
The values for Y in Figure 6 are the group means of y. E.g. the mean of reading scores for Method 2 is AVERAGE(A12:A19) = 33.75. The adjusted grand mean is the mean of the adjusted means, i.e. AVERAGE(C56:C59) = 23.442.
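The prediction approach to the adjusted means can be sketched as follows: fit the full model (y, x, t), then predict y for each method at the mean of the covariate. The data are hypothetical (not those of Figure 1) and the least-squares helper `ols` is our own, so the numbers will not match Figure 6.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan solve)."""
    k = len(X[0])
    m = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

# Hypothetical data (NOT the values in Figure 1): 3 pupils per method.
y = [15.0, 20.0, 24.0, 32.0, 36.0, 30.0, 18.0, 27.0, 23.0, 14.0, 22.0, 19.0]
x = [35.0, 52.0, 60.0, 60.0, 71.0, 55.0, 40.0, 58.0, 47.0, 30.0, 49.0, 41.0]
method = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

# Design matrix columns: intercept, x, t1, t2, t3 (Method 4 is the reference)
X = [[1.0, xi] + [1.0 if mi == j else 0.0 for j in (1, 2, 3)]
     for xi, mi in zip(x, method)]
b0, b_x, b1, b2, b3 = ols(X, y)

x_bar = sum(x) / len(x)                       # grand mean of the covariate
effects = {1: b1, 2: b2, 3: b3, 4: 0.0}       # all dummies are 0 for Method 4
adjusted = {j: b0 + b_x * x_bar + e for j, e in effects.items()}

for j, value in adjusted.items():
    print(f"Method {j}: adjusted mean = {value:.3f}")
```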
The adjusted means can also be computed using the slope bW, which is the regression coefficient of x in the full model (i.e. the value in cell S36 of Figure 5), namely bW = .323.
Figure 7 – Alternative method for calculating the adjusted means
Here the values for Y are the group means as described above. The values for X (the covariate) are similar; e.g. the mean family income for the children in the Method 2 sample (cell C49) = AVERAGE(B12:B19) = 60.2625. The grand mean for the covariate (cell C52) is AVERAGE(B4:B39).
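The adjusted means are now given by the formula
$\bar{y}'_j = \bar{y}_j - b_W (\bar{x}_j - \bar{x})$
where $\bar{y}_j$ and $\bar{x}_j$ are the means of y and x for Method j and $\bar{x}$ is the grand mean of the covariate.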
E.g. the adjusted mean for Method 2 (cell D49) is given by the formula =B49-S36*(C49-C52) where cell S36 contains the value of bW.
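The same calculation can be sketched without the regression output. Here we compute bW as the pooled within-group slope, which equals the coefficient of x in the full model when the model contains one dummy per treatment and no interaction; the worksheet above instead reads bW from cell S36. The data are hypothetical and will not reproduce Figure 7.

```python
from statistics import mean

# Hypothetical data (NOT the values in Figure 1):
# method -> (reading scores y, family incomes x in $1,000s)
groups = {
    1: ([15.0, 20.0, 24.0], [35.0, 52.0, 60.0]),
    2: ([32.0, 36.0, 30.0], [60.0, 71.0, 55.0]),
    3: ([18.0, 27.0, 23.0], [40.0, 58.0, 47.0]),
    4: ([14.0, 22.0, 19.0], [30.0, 49.0, 41.0]),
}

# Pooled within-group slope: sum of within-group cross-products over
# sum of within-group squared deviations of x
sp_within = 0.0
ssx_within = 0.0
for ys, xs in groups.values():
    my, mx = mean(ys), mean(xs)
    sp_within += sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    ssx_within += sum((xi - mx) ** 2 for xi in xs)
b_W = sp_within / ssx_within

all_x = [xi for _, xs in groups.values() for xi in xs]
x_grand = mean(all_x)                         # grand mean of the covariate

# Adjusted mean for method j: group mean of y minus b_W * (group mean of x - grand mean of x)
adjusted = {j: mean(ys) - b_W * (mean(xs) - x_grand)
            for j, (ys, xs) in groups.items()}
adjusted_grand = mean(list(adjusted.values()))  # mean of the adjusted means

for j, value in adjusted.items():
    print(f"Method {j}: adjusted mean = {value:.3f}")
print(f"b_W = {b_W:.4f}, adjusted grand mean = {adjusted_grand:.3f}")
```

With equal group sizes, as in this made-up data set, the adjusted grand mean coincides with the ordinary grand mean of y; with unequal group sizes the two generally differ.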