In some experiments where we use ANOVA, some of the unexplained variability (i.e. the error) is due to an additional variable (called a **covariate**) that is not part of the experiment. If we can remove the effect of this variable, we can reduce the error variance and so get a more accurate picture of the true effect of the independent variable. This is the main goal of **Analysis of Covariance** (**ANCOVA**).
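
In symbols, the idea can be sketched with the usual one-way ANCOVA model for a single covariate. The notation here is assumed for illustration and does not come from the original text: $y_{ij}$ is the response for subject $i$ in group $j$, $x_{ij}$ is the covariate value, $\alpha_j$ is the group effect and $\beta$ is a slope common to all groups.

$$y_{ij} = \mu + \alpha_j + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij}$$

Under this model, the within-group (error) sum of squares for $y$ is reduced by the part that the covariate explains:

$$SS_{W}^{\text{adj}} = SS_{W(y)} - \frac{SP_{W(xy)}^{2}}{SS_{W(x)}}$$

Here $SS_{W(x)}$ and $SS_{W(y)}$ are the within-group sums of squares of the covariate and the response, and $SP_{W(xy)}$ is the within-group sum of cross-products. Since the subtracted term is never negative, the adjusted error sum of squares cannot exceed the unadjusted one, although one error degree of freedom is used up in estimating $\beta$.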

As usual we will try to understand how ANCOVA works via an example. We provide two approaches for performing ANCOVA: one a modified ANOVA and the other using regression.

**Example 1**: A school system is exploring four methods of teaching reading to its children, and would like to determine which method is best. It selects a random sample of 40 children and randomly divides them into four groups, using a different teaching method for each group. The reading score of each of the children after a month of training is given in Figure 1.

Before doing the analysis one of the researchers postulated that the scores of the children would be influenced by the income of their families, speculating that children from higher income families would do better on the reading tests no matter which teaching method was used, and so this factor should be taken into account when trying to determine which teaching method to use. The family income (in thousands of dollars) for each of the children in the study is also given in Figure 1. Based on the data, is there a significant difference between the teaching methods?

**Figure 1 – Data for Example 1**

The ANOVA approach to addressing this example is given in ANOVA Approach to ANCOVA and the regression approach to addressing this example is given in Regression Approach to ANCOVA.
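
As a rough illustration of what an ANCOVA with one covariate computes, here is a minimal Python sketch using only the standard library. The data are made up for the illustration and are not the values from Figure 1; the function and variable names are likewise hypothetical. It assumes a single covariate and a common regression slope across groups, and it is not meant to reproduce the worksheet layout used on the linked pages.

```python
# Hypothetical (income, score) pairs per teaching method; NOT the Figure 1 data.
groups = {
    "Method 1": [(20, 15), (35, 19), (50, 24), (65, 26)],
    "Method 2": [(25, 20), (40, 25), (55, 27), (70, 33)],
    "Method 3": [(15, 10), (30, 16), (45, 18), (60, 25)],
}


def deviation_sums(pairs):
    """Return (SS_x, SS_y, SP_xy) around the means of the given (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return ss_x, ss_y, sp_xy


def ancova_one_covariate(groups):
    """One-way ANCOVA with a single covariate, assuming a common slope."""
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    n, k = len(all_pairs), len(groups)

    # Total sums: deviations from the grand means.
    sst_x, sst_y, spt_xy = deviation_sums(all_pairs)

    # Within-group sums: deviations from each group's own means, pooled.
    ssw_x = ssw_y = spw_xy = 0.0
    for pairs in groups.values():
        ss_x, ss_y, sp_xy = deviation_sums(pairs)
        ssw_x += ss_x
        ssw_y += ss_y
        spw_xy += sp_xy

    # Subtract the variability in y that the covariate accounts for.
    adj_total = sst_y - spt_xy ** 2 / sst_x
    adj_within = ssw_y - spw_xy ** 2 / ssw_x
    adj_between = adj_total - adj_within

    # Estimating the slope costs one error degree of freedom.
    df_between, df_within = k - 1, n - k - 1
    f_stat = (adj_between / df_between) / (adj_within / df_within)
    return {
        "ss_within_unadjusted": ssw_y,
        "ss_within_adjusted": adj_within,
        "ss_between_adjusted": adj_between,
        "df": (df_between, df_within),
        "F": f_stat,
    }


result = ancova_one_covariate(groups)
for name, value in result.items():
    print(name, value)
```

The adjusted between-groups sum of squares equals the drop in residual sum of squares when group membership is added to a regression of score on income alone, which is why the ANOVA-style and regression-style calculations agree. A p-value could then be read from the F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.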

Dear Charles,

Thank you for your site. I downloaded the add-in for Excel 2007, but the post-hoc tests for ANOVA are not working properly. The “c” columns are all empty. Is there any solution for this?

Thank you and all best.

Dear Vanja,

You need to fill in the values in this column yourself. For contrasts, you fill in positive values between 0 and 1 that add up to 1 and negative values between -1 and 0 that add up to -1. For the other (pairwise) post hoc tests, you need to fill in one cell in this column with a +1 and another with a -1. This is all described on the webpage Unplanned Tests.
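
To make the rule concrete, here is a small Python sketch that checks whether a set of coefficients follows it and that builds a pairwise set. The function names are hypothetical and the check only mirrors the rule as stated above; it is not how the add-in itself validates the column.

```python
import math


def is_valid_contrast(coeffs, tol=1e-9):
    """True if positive coefficients sum to 1 and negative ones sum to -1."""
    positives = sum(c for c in coeffs if c > 0)
    negatives = sum(c for c in coeffs if c < 0)
    in_range = all(-1 <= c <= 1 for c in coeffs)
    return (
        in_range
        and math.isclose(positives, 1.0, abs_tol=tol)
        and math.isclose(negatives, -1.0, abs_tol=tol)
    )


def pairwise_contrast(n_groups, first, second):
    """Coefficients comparing group `first` (+1) with group `second` (-1).

    Group positions are zero-based; all other groups get a coefficient of 0.
    """
    if first == second:
        raise ValueError("a pairwise comparison needs two different groups")
    coeffs = [0] * n_groups
    coeffs[first] = 1
    coeffs[second] = -1
    return coeffs


# Average of groups 1 and 2 versus group 3, with group 4 left out.
print(is_valid_contrast([0.5, 0.5, -1, 0]))
# Pairwise comparison of group 1 with group 3 among four groups.
print(pairwise_contrast(4, 0, 2))
# Positive values add up to 2 here, so the rule is not satisfied.
print(is_valid_contrast([1, 1, -1, 0]))
```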

Charles

Dear Charles,

Thank you very much for such a wonderful site/forum. I am learning a lot from it. I have a few questions on the assumptions of ANCOVA and I would really appreciate your feedback/comments. I am running ANCOVA tests to examine the influence of my independent variables (ethnicity, gender, age group) on the attitudes of my participants (immigrants to an English-speaking country) towards learning their ethnic languages, while controlling for age at the time of arrival (the covariate). My questions are as follows:

First, when I test for homogeneity of variance, do I need to run the test separately for each independent variable? What I do is run the “homogeneity of variance” test for all my independent variables (fixed factors: age group, ethnicity, gender) and the covariate (age @ arrival) together, not separately. Can you please advise whether what I am doing is correct?

Second, as for the assumption of homogeneity of regression slopes for ANCOVA, does the interaction between age at arrival (the covariate) and the independent variables need to be tested separately for each independent variable, or can they all be tested together? What I did, for example, was to check the interactions between age at arrival and the other independent variables together in the same model (i.e. age@arrival*ethnicity, age@arrival*agegroup, age@arrival*gender, etc.). Is this method of testing the homogeneity of regression slopes correct?

Third, suppose I find, when testing the assumptions of ANCOVA, that all were met except for the homogeneity of regression slopes. For example, when I was playing with my data I found that the covariate (age@arrival) was not independent of the age group but independent of ethnicity and gender. What should I do now? Should I carry on the analysis or choose another test? Any ideas/suggestions.

Thank you very much Charles for your assistance and help

Ayman,

Sorry for the delay in answering your third question. The following websites might be helpful in addressing the situation where the homogeneity of slopes assumption is not met.

http://film.fsu.edu/content/download/51984/428142/file/Rank_Analysis_of_Covariance.pdf

https://www.uvm.edu/~dhowell/gradstat/psych341/lectures/Ancova-Uneq/Covar1.html

http://www.statisticshell.com/docs/ancova.pdf

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203541/
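
For reference, the homogeneity of slopes assumption is usually checked by testing the covariate-by-factor interaction, i.e. by comparing a model with one common slope against a model with a separate slope for each group. The Python sketch below does this for a single factor only, using made-up data and hypothetical names; a design with several factors, as in the question, would normally be fitted in statistical software instead.

```python
# Hypothetical (age at arrival, attitude score) pairs for three made-up groups.
groups = {
    "Group A": [(5, 30), (10, 27), (15, 25), (20, 20)],
    "Group B": [(6, 28), (11, 27), (16, 24), (21, 23)],
    "Group C": [(4, 33), (9, 29), (14, 22), (19, 18)],
}


def deviation_sums(pairs):
    """Return (SS_x, SS_y, SP_xy) around the means of the given (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return ss_x, ss_y, sp_xy


def slope_homogeneity_f(groups):
    """F statistic for equal slopes across groups (one factor, one covariate)."""
    n = sum(len(pairs) for pairs in groups.values())
    k = len(groups)

    ssw_x = ssw_y = spw_xy = 0.0
    rss_separate = 0.0
    for pairs in groups.values():
        ss_x, ss_y, sp_xy = deviation_sums(pairs)
        ssw_x += ss_x
        ssw_y += ss_y
        spw_xy += sp_xy
        # Residual SS around this group's own regression line.
        rss_separate += ss_y - sp_xy ** 2 / ss_x

    # Residual SS when every group shares the pooled within-group slope.
    rss_common = ssw_y - spw_xy ** 2 / ssw_x

    # Separate slopes use k - 1 more parameters than a common slope.
    df_num, df_den = k - 1, n - 2 * k
    f_stat = ((rss_common - rss_separate) / df_num) / (rss_separate / df_den)
    return f_stat, (df_num, df_den)


f_stat, df = slope_homogeneity_f(groups)
print("F =", f_stat, "with df", df)
```

A large F relative to the F distribution with the returned degrees of freedom suggests that the slopes differ between groups, in which case the usual ANCOVA adjustment is questionable and approaches such as those in the links above may be more appropriate.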

Charles

Hi,

Thank you for your site. It is extremely helpful for a beginner such as myself.

I had a question that I was hoping you could answer.

I am working on designing a research project and I am trying to determine the appropriate statistical analysis.

My research project (simplified version) is the following:

There are 4 groups of people: 75 with breast cancer, 50 with lung cancer, and 25 with renal cancer. The 4th group consists of all 150 people, across all cancers.

Assume all groups have the same anti-cancer drug. We then measure three dependent variables: improvement (yes/no), recurrence of cancer (yes/no), and other drug complications (yes/no).

Also, let us assume there are 3 independent variables for each of the groups: age (years), sex (male/female), and length of cancer before treatment (months).

Also, let us assume that after treatment it seems like people with breast cancer have higher improvement rates, lower recurrence rates, and lower complication rates, in comparison to people with lung or renal cancer. However, that could be misleading if people with breast cancer were also younger, female, and had cancer for a shorter duration, in comparison to those with lung or renal cancer.

So, I am having trouble determining which statistical analysis to use to study the same anti-cancer drug for 4 groups (breast, lung, renal, and all cancers) for 3 dependent variables (improvement, recurrence, and complications).

Then, I would like to see if there is a correlation/statistical significance among 3 independent variables across the 4 groups (age, sex, length of cancer before drug) to the same 3 dependent variables (improvement, recurrence, and complications).

Thank you!

Sean,

I don’t have enough information to provide a definitive response, but here are some ideas.

Regarding your second question, you can run MANOVA to see whether there is a difference among the 3 independent variables across the 4 groups.

Regarding the first question you might be able to use MANCOVA to remove the effects of some confounding variables. You might use multivariate regression (or multinomial logistic regression) and look at the interaction effects.

Charles

Can the 4th group not be a control group rather than a method 4 group?

The 4th group can be the control group or something else.

Charles

The description of Example 1 for ANOVA mentions that 3 different teaching methods were used. However, the data in Fig 1 displays 4 teaching methods. Similarly, there are 4 income groups shown. It appears that there is an error here.

Thanks

You are correct. Thanks for catching this mistake. I have now updated the webpage to reflect that there are 40 students in 4 groups.

Charles