Assumptions for ANCOVA

The same assumptions as for ANOVA (normality, homogeneity of variance and random independent samples) are required for ANCOVA. In addition, ANCOVA requires the following assumptions:

• For each level of the independent variable (i.e. for each treatment group), the relationship between the dependent variable (y) and the covariate (x) is linear
• The lines expressing these linear relationships are all parallel (homogeneity of regression slopes)
• The covariate is independent of the treatment effects (i.e. the covariate and independent variables are independent)
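One informal way to look for evidence against the third assumption is a one-way ANOVA of the covariate across the treatment groups. A significant result suggests that the covariate is related to the treatment. The sketch below is an illustration, not the method used on this page. It assumes SciPy is installed, and the helper name `covariate_by_treatment_anova` and the group values are hypothetical, not the Example 1 data.

```python
from scipy.stats import f_oneway


def covariate_by_treatment_anova(covariate_by_group, alpha=0.05):
    """One-way ANOVA of the covariate (x) across treatment groups.

    covariate_by_group: list of sequences, one per treatment group.
    Returns (F, p, differs), where differs is True when the covariate
    means differ significantly across groups at level alpha.
    """
    result = f_oneway(*covariate_by_group)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)


# Hypothetical covariate values for three treatment groups (illustration only)
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [11.0, 10.0, 12.0, 12.0],
    [12.0, 11.0, 10.0, 13.0],
]
F, p, differs = covariate_by_treatment_anova(groups)
print(f"F = {F:.3f}, p = {p:.3f}, covariate means differ: {differs}")
```

This only compares covariate means. It does not replace random assignment of subjects to treatments, which is what normally secures this assumption.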

Example 1: Show that the assumptions hold for the data in Example 1 of Basic Concepts of ANCOVA.

We start by creating a box plot of the reading scores for each of the four methods (using the data from Figure 1 of Basic Concepts of ANCOVA). See Figure 1.

Figure 1 – Box plot for data in Example 1

Each plot looks relatively symmetric and the variances don’t appear to be wildly different. As we can see from the data in Figure 1 of Basic Concepts of ANCOVA, the variances for the reading scores range from 44.8 to 164.8. Since the largest variance is less than 4 times the smallest (164.8/44.8 ≈ 3.7), this range is likely to be acceptable for meeting the homogeneity of variances assumption.
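The variance-ratio rule of thumb (largest variance less than 4 times the smallest, as mentioned in the comments below) takes only a few lines to check. This sketch works from either the variances themselves or the raw group data. The helper names are hypothetical, and `statistics.variance` uses the n − 1 denominator (the sample variance).

```python
from statistics import variance


def variance_ratio(variances):
    """Ratio of the largest to the smallest group variance."""
    return max(variances) / min(variances)


def variance_ratio_from_samples(samples):
    """Same ratio computed from raw data, using sample variances (n - 1 denominator)."""
    return variance_ratio([variance(s) for s in samples])


# Smallest and largest variances quoted in the text for the reading scores
ratio = variance_ratio([44.8, 164.8])
print(f"largest/smallest variance = {ratio:.2f}")  # 3.68
print("within the 4x rule of thumb:", ratio < 4)
```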

We now turn our attention to the ANCOVA-specific assumptions. We create a scatter diagram of the y data values against the x data values for each of the four methods. This is done by creating a scatter diagram for Method 1 in the usual way and then choosing Design > Data|Select Data and clicking on the Add button on the left side. Enter the name Method 2 and specify the range for the x and y values in the dialog box that appears. After repeating this procedure for Method 3 and Method 4 and adding linear trend lines for each method, the resulting chart is as in Figure 2.

Figure 2 – Checking whether regression lines are parallel

Although the four lines are not parallel, their slopes are quite similar, suggesting that the homogeneity of slopes assumption is met. A less subjective check is to test the complete regression model y, x, t, x*t against the full regression model y, x, t. If there is no significant difference between the models, then the interaction terms are not significant, implying that the homogeneity of regression slopes assumption is met. This is the same type of test as the one described in Testing the Significance of Extra Variables on the Regression Model.
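The two models can be written as design matrices. The layout used here is an assumption that mirrors the regression approach: an intercept, x, dummy-coded treatment columns, and, for the complete model only, the x-by-dummy interaction columns. With four methods this gives 3 dummies and 3 interactions, consistent with the 7 columns of B4:H39. The sketch uses NumPy least squares. The helper names and data are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()


def ancova_designs(x, groups):
    """Design matrices for the full (y ~ x, t) and complete (y ~ x, t, x*t) models.

    The treatment t is dummy coded, using the first (sorted) group level as reference.
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    dummies = np.column_stack([(groups == lev).astype(float) for lev in levels[1:]])
    full = np.column_stack([np.ones_like(x), x, dummies])
    complete = np.column_stack([full, dummies * x[:, None]])  # add x*t interactions
    return full, complete


# Hypothetical data: 3 groups of 8 with roughly parallel lines plus noise
rng = np.random.default_rng(0)
x = rng.uniform(20, 60, size=24)
groups = np.repeat(["A", "B", "C"], 8)
offsets = {"A": 0.0, "B": 5.0, "C": 10.0}
y = 10 + 0.8 * x + np.array([offsets[g] for g in groups]) + rng.normal(0, 3, size=24)

full, complete = ancova_designs(x, groups)
r2_full = r_squared(full, y)
r2_complete = r_squared(complete, y)
print(f"R-squared full (y ~ x, t): {r2_full:.4f}")
print(f"R-squared complete (y ~ x, t, x*t): {r2_complete:.4f}")
```

The complete model's R-squared is never smaller than the full model's. The question is whether the increase is significant, which is what the F test below measures.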

First we use Excel’s regression data analysis tool to create the complete model (see Figure 3) using the range B4:H39 from Figure 1 of Regression Approach to ANCOVA when prompted for the Input X range.

Figure 3 – Complete model (y, x, t, x*t) for data in Example 1

Now we test (see Figure 4) whether there is a significant difference between the complete and full models (the full model is shown in Figure 5 of Regression Approach to ANCOVA and the complete model in Figure 3 above).

Figure 4 – Testing homogeneity of regression line slopes

Row 6 of Figure 4 computes the difference between the R-Square values of the complete and full models. Row 7 computes the difference between the residual degrees of freedom of the two models. The F statistic (cell AB8) is then calculated by the formula =AB6*Z7/(AB7*(1-Z6)), i.e. the R-Square difference divided by the degrees of freedom difference, divided in turn by (1 − R-Square)/(residual df) of the complete model. Since the p-value for this statistic is larger than .05, we conclude that there is no significant difference between the two models, and so accept the homogeneity of regression slopes assumption.

Alternatively, we can get the same p-value by using the Real Statistics function:

RSquareTest(B4:H39, B4:E39, A4:A39) = 0.4615
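The F test above can be reproduced outside Excel from the two R-Square values and the degrees of freedom. This sketch assumes that column Z of Figure 4 holds the complete model's R-Square and residual df, which is what the formula requires to be a valid F statistic. It also assumes SciPy is available for the F distribution. The function name and the R-Square values are hypothetical. The degrees of freedom are inferred from the ranges in the text: 36 rows, with 7 versus 4 predictor columns.

```python
from scipy.stats import f


def r_square_f_test(r2_complete, r2_full, df_complete, df_diff):
    """F test for whether the complete model fits significantly better than the full model.

    r2_complete, r2_full: R-squared of the complete (y, x, t, x*t) and full (y, x, t) models
    df_complete: residual degrees of freedom of the complete model
    df_diff: difference in residual degrees of freedom (number of interaction terms)
    """
    # Same arithmetic as the worksheet formula =AB6*Z7/(AB7*(1-Z6))
    F = (r2_complete - r2_full) * df_complete / (df_diff * (1.0 - r2_complete))
    p_value = f.sf(F, df_diff, df_complete)  # right-tail probability of the F distribution
    return F, p_value


# Hypothetical R-squared values; df_complete = 36 - 7 - 1 = 28, df_diff = 7 - 4 = 3
F, p_value = r_square_f_test(0.50, 0.40, df_complete=28, df_diff=3)
print(f"F = {F:.4f}, p = {p_value:.4f}")
```

A p-value above .05 (such as the 0.4615 reported above) means the interaction terms add nothing significant, so the slopes can be treated as homogeneous.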

54 Responses to Assumptions for ANCOVA

1. Colin says:

Sir

It is difficult to test homogeneity of variance and of slopes using charts. I think the criterion is vague. Why is 44.8 to 164.8 an acceptable range for the hypothesis of homogeneity of variance? And how can homogeneity of slopes be checked from Figure 2? The criteria are too vague and subjective.

• Charles says:

Colin,

Regarding the homogeneity of variance assumption, generally if the largest variance is less than 4 times the smallest variance, the test results will be good. You can instead use one of the tests described in http://www.real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/, especially Levene’s test, which is not subjective.

Regarding homogeneity of slopes, looking at the graph is admittedly subjective; testing the complete regression model y, x, t, x*t against the full regression model y, x, t is much less subjective.

Charles
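Levene’s test, mentioned in the reply above, is also available in SciPy as `scipy.stats.levene`. The sketch below is not the Real Statistics implementation, and the group data are hypothetical. It passes `center="mean"` to get Levene’s original test, because SciPy’s default `center="median"` is the Brown-Forsythe variant.

```python
from scipy.stats import levene

# Hypothetical reading scores for three groups (illustration only)
groups = [
    [51, 56, 60, 48, 62, 55],
    [70, 58, 66, 75, 61, 68],
    [45, 63, 52, 70, 40, 58],
]

# center="mean" gives Levene's original test; center="median" (the default) is Brown-Forsythe
stat, p = levene(*groups, center="mean")
print(f"Levene statistic = {stat:.3f}, p = {p:.3f}")
```

A p-value below the chosen alpha (e.g. .05) indicates that the homogeneity of variances assumption is doubtful.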

2. rina says:

Sir,
If the data are not homogeneous, can ANCOVA still be used, or do we have to use another test?

• Charles says:

Which assumption is not being met: homogeneity of variances or homogeneity of slopes?
Charles

3. Hamid A says:

Dear Charles

Thank you for all your efforts. I have a question on the 3rd point of the ANCOVA assumptions “The covariate is independent of the treatment effects”.
I know that a common use for the ANCOVA is to study pre-test post-test results in different groups, by assigning the pre-test score as covariate, post-test as dependent variable, and treatment group as independent variable. The reason behind using ANCOVA here is to remove the influence of pre-test scores on the post-test results.

But how can we use ANCOVA in this setting if we already know that the treatment groups have different pre-test scores, i.e. there’s a correlation between pre-test and groups (they are not independent)? Am I missing something here?

• Charles says:

Hamid,
If you already know that there is a correlation between the treatments and the pre-test results, you couldn’t use this approach, but generally there is no such correlation since you assign the subjects to the treatment groups randomly.
Charles

• Hamid A says:

Thanks for the prompt reply Charles. I might still be able to use it since the correlation was only found after finishing the experiments and plotting the results (incidental finding).

4. Haitham says:

Hi Charles,
Back to Rina’s question: if homogeneity of (regression) slopes is not met, what other test can be used to correct for covariates? A nonparametric test (Quade’s test)?

Thanks

5. barun hanjabam says:

What if homogeneity of variance is violated with unequal sample sizes in ANCOVA? One controversial approach is to first equalize the sample sizes through random selection, then set p-value < .001 to reduce alpha error, and then go ahead with ANCOVA. I don’t think this approach is good or defensible, but some people use it. Can I get any reference about this approach?

• Charles says:

Generally it is frowned upon to delete data elements, but if the sample sizes are fairly similar, so that you are eliminating relatively few data elements, then this may be the easiest approach. The two main problems I see are that you lose some data (which reduces power) and you may get a different result depending on which elements are removed (of course you could get a different result if you took a different sample).

I don’t understand why you need to reduce the p-value (although presumably you mean the alpha value).

The following are a few websites that talk more about how to deal with unequal samples.

• barun hanjabam says:

I set the p-value at a lower level in order to reduce the chance of making a type I error.

• barun hanjabam says:

Yeah, I mean the alpha value.

6. barun hanjabam says:

Are there any extra assumptions for ANCOVA with two covariates that are linearly correlated with each other and with the dependent variable, if such an ANCOVA is usable at all?

• Charles says:

Sorry, but the website hasn’t yet dealt with multiple covariates.
Charles

• barun hanjabam says:

Thanks a lot, Charles, for all the prompt and useful replies.

7. barun hanjabam says:

Is it possible to use ANCOVA to test the effectiveness of an intervention, measured as a change in a variable (DV), in a simple pre-post study design with only one group (without any control group), while controlling for the change in some other covariate(s) due to that intervention?

• Charles says:

Sorry, but I don’t understand your question.
Charles

• barun hanjabam says:

Suppose there is one study group in which an exercise intervention was given to see the change in VO2max. Due to this intervention, there was also an increase in lean body mass, and an increase in lean body mass may itself increase VO2max. So to see whether (a) the exercise intervention is effective in increasing VO2max independent of the increase in lean body mass, and (b) if possible, independent of the initial values of VO2max, which test should be used? Any suggestions?

• barun hanjabam says:

So, here I want to control for 2 things: (a) the change in some other variables due to the intervention, and (b) the initial/pretest values of the variable for which we are testing whether the intervention is effective. How can ANCOVA be useful in this case?

• Charles says:

Barun,
If I am interpreting your question correctly, ANCOVA could be used in case (a) provided the test assumptions are met. I am not sure what (b) means in the context of ANCOVA.
Charles

8. Sanitha says:

Hi Charles,

I have some confusion with my data. I am not sure which statistical analysis to apply.
We conducted an in situ experiment on mosquitoes. We studied the effect of 2 insecticides and 2 different concentrations on the different stages of the mosquito. The study was carried out for 30 days, with sampling every 2 days. I would like to know the efficiency of the insecticides: which insecticide and concentration was effective in controlling the mosquitoes, and in how many days. My question is whether ANCOVA or repeated measures ANOVA is the appropriate analysis for my data.

• Charles says:

Hi Sanitha,
It looks like you have several factors: insecticide type (2 levels), concentration (2 levels) and time (16 levels). Thus you have two fixed factors and a repeated measures factor (time). This sounds like a repeated measures ANOVA.
If you use ANCOVA, what is the covariate?
You might have another factor, namely mosquito stage, although this may be subsumed in the time factor.
Charles

9. Andrew says:

I am looking to use ANCOVA to look at group differences in a pre-/post-test design, using the pretest as a covariate. One group is measured during 2 sessions, before and after a treatment; the other group is a control and is measured in two sessions with no treatment. The DV is the number of minutes spent talking during the session. The problem is that the pretest scores (the covariate) as well as the post-test scores (number of minutes) are very non-normal in their distribution, with lots of measurements at zero minutes and the rest showing some normality. The distributions are very skewed, with most scores at zero. This seems to violate a major assumption of ANCOVA. Ideas? I have another categorical covariate with 3 levels that does not account for any variance in the pre or post measurements. Thanks.

• Andrew says:

One problem (I think) that disinclines me from a repeated measures ANOVA is that I have (by chance) a significant difference between the groups on the pretest.

• Charles says:

Andrew,
If you are only comparing two groups, you might be able to use a nonparametric test (e.g. Wilcoxon signed-ranks test). You might be able to use Friedman’s test with more than two groups.
Charles

• Charles says:

Andrew,
Did you try using a transformation to satisfy the normality assumption?
Charles

• Andrew says:

I have not done any transformation; I always felt they were voodoo. I have lots of zeros (maybe 40%) in my data, so I cannot do a log transform, and a square root doesn’t do enough. I might just do a repeated measures ANOVA, but I still think my data violate the assumptions. Is there a transform you recommend?

• Charles says:

Andrew,
I can’t say for sure that a transformation is the way to go, but it could be a solution. Without more information, I couldn’t tell you which transformation is best. Even if there are a lot of zeros, a log transformation can be used. E.g. if -5 is the smallest sample value, then you could use a transformation of the form LN(x+6).
Charles
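The shift described above can be written generically: add c = 1 − min(x) before taking the natural log, so that the smallest value maps to LN(1) = 0. A smallest value of -5 gives LN(x+6), and zeros give LN(x+1). The helper name is hypothetical. This choice of shift is one convention among several, and different shifts can give somewhat different results.

```python
import math


def shifted_log(values):
    """Apply LN(x + c) with c = 1 - min(values), so the smallest value maps to 0.

    With a smallest value of -5 this is LN(x + 6); with zeros as the
    smallest value it is LN(x + 1).
    """
    c = 1 - min(values)
    return [math.log(v + c) for v in values]


print(shifted_log([-5, 0, 5]))  # LN(1), LN(6), LN(11)
print(shifted_log([0, 0, 3, 7]))  # LN(1), LN(1), LN(4), LN(8)
```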

• Andrew says:

Thank you so much Charles. I’ve transformed the data and performed both ANCOVA and repeated measures ANOVA with little difference between the two. You have been very helpful. Know that your assistance is appreciated!

~Andy

• Charles says:

Andrew,
Glad that I could be of assistance.
Charles

10. Lava says:

Hi Charles,

I’m doing an ANCOVA for the first time for my dissertation and I’m a little bit confused.

I’m looking at stigma attitudes towards mental health in children. I did pre and post test surveys measuring stigma attitudes and emotional intelligence, with an intervention workshop challenging stigma towards mental health in between the two times. I also had a control group who did the same surveys but no intervention. I analyzed the scores/data using repeated measures ANOVA and found a significant main effect of time on stigma, as well as an interaction effect of group x time. Now I want to test whether emotional intelligence has an effect on stigma, and have been told by my research supervisor to use ANCOVA to do this. I’ve watched plenty of videos on YouTube explaining it, but I don’t know if I’m doing it right as I don’t understand the output I’m getting from it… If you could give me a hand I will be forever grateful!

• Charles says:

Lava,
If you send me an Excel file with your data and analysis, I will see if I can help you.
Charles

11. wong says:

I have the pretest as a covariate. If Levene’s test on the pretest is not significant, meaning that there is no significant difference between the control and experimental groups on the pretest, should we use ANCOVA or ANOVA, given that there is no difference on the pretest?

• Charles says:

Sorry, but you have not described the situation well enough for me to answer.
Charles

Hi Charles,

I have a similar issue to someone who posted above, but I did not really understand the answer.

I have data from two groups that I would like to compare while taking into account a covariate. None of the data is normally distributed (group A, group B, or the covariates). I’m not sure which analysis to use. Is there a Mann-Whitney U test with a covariate? Or could this be done with regression (I’m not sure how to set that up, though)?

13. Legrand says:

Dear Charles,
My question may be a little bit “far” from the topic, but I hope you can nonetheless help me. It deals with the assumptions that need to be satisfied when running an ANCOVA with a categorical covariate.
When the covariate is continuous, as you say, three assumptions need to be met: (1) For each independent variable, the relationship between the dependent variable (y) and the covariate (x) is linear, (2) The lines expressing these linear relationships are all parallel (homogeneity of regression slopes), (3) The covariate is independent of the treatment effects (i.e. the covariate and independent variables are independent).
Now, when the covariate is categorical, are there assumptions to be met? I would say that the third one would be that there is no interaction between the IV and the covariate, but is there any equivalent for the first and second one?
Thanks a lot for your help!

14. Franzi says:

Hey Charles,

I’m really struggling with one analysis in my bachelor thesis; maybe you can help me.
I have a mediator hypothesis which, according to my professor, I can’t compute correctly with SPSS. Now he wants me to perform either an ANCOVA or a partial correlation.

I have three variables to be included:
The independent variable (grouping variable) has two levels (forms of strategies of perspective taking)
The dependent variable is metric (number of correct predictions in a partner game)
and the influencing variable is based on a likert-scale (similarity perception to partner)

Can I even do an ANCOVA? Or a partial correlation? I already tried, and SPSS is giving me some results, but I don’t know if this analysis is allowed. I’ve been stuck on this for such a long time that I’m very confused now.

Hope it’s clear what I mean,

• Charles says:

Franzi,
I don’t have enough information to give you a precise answer. It seems like either approach may be possible, but more importantly what hypothesis are you trying to test? First you need to determine what you want to test and then you can determine which test is appropriate.
Charles

• Franzi says:

Of course, I’m sorry for the lack of information!

The hypothesis is:
Participants in the condition Imagine Self, who see themselves as similar to their partner show a significantly higher prediction accuracy, than participants who don’t. In contrast, the perception of similarity doesn’t have an influence in the condition Imagine Other.

Thanks for helping me!
Franzi

• Charles says:

Franzi,
Thanks for providing a clear statement of your objective. This is very important, and often people jump straight to testing before they are clear as to what they should be testing for.
Unfortunately, I don’t have enough details to answer your original question. If you can, I suggest asking your professor to explain in more detail why he is recommending the two approaches he has suggested to you.
Charles

15. Lisa says:

I am interested in looking at grade level differences between students on end of year math scores, after controlling for the pretest. I found that my data violates homogeneity of regression slopes. How should I proceed? Is there an alternative test? Thanks!

• Charles says:

Lisa,
Sorry, but I don’t understand what you are trying to test. What hypothesis are you trying to test?
Charles

• Lisa says:

Are there differences in end of year math scores between groups (e.g., ethnic groups, grade level, gender), after controlling for beginning of year math scores?

• Charles says:

Lisa,
Potential approaches are ANCOVA, MANOVA and repeated measures ANOVA. My understanding is that the homogeneity of variances assumption is not met. One approach for dealing with this is to go directly to post hoc tests.
Charles

• Lisa says:

Thanks!

16. Ryanne says:

Hi Charles,

You did not clearly address how to check the assumption of linearity or how to test it. Could you maybe illustrate this for me?

• Charles says:

Ryanne,
The usual way to test linearity is to create a scatter plot and check whether the points are reasonably aligned (i.e. are reasonably close to some straight line).
In the case of ANCOVA, you need to do this for each independent variable. For the example on the referenced webpage, this means that you create four scatter plots between Reading Score and Income, for each of the four teaching methods.
Charles
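As an informal numerical complement to the per-group scatter plots described above, you can fit both a straight line and a quadratic within each group and compare their R-Square values. If adding the x² term raises R-Square noticeably, that hints at curvature. This is a heuristic of our own, not a formal test and not part of the method described on this page. The function name and data below are hypothetical, and NumPy is assumed.

```python
import numpy as np


def linear_vs_quadratic_r2(xs, ys):
    """Return (R-squared of a linear fit, R-squared of a quadratic fit) for one group."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = []
    for degree in (1, 2):
        coefs = np.polyfit(x, y, degree)  # least-squares polynomial fit
        fitted = np.polyval(coefs, x)
        r2.append(1.0 - ((y - fitted) ** 2).sum() / ss_tot)
    return tuple(r2)


# Hypothetical groups: one roughly linear, one clearly curved
data = {
    "Method 1": ([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.1]),
    "Method 2": ([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]),
}
for name, (xs, ys) in data.items():
    r2_lin, r2_quad = linear_vs_quadratic_r2(xs, ys)
    print(f"{name}: linear R2 = {r2_lin:.4f}, quadratic R2 = {r2_quad:.4f}")
```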

17. Alicia says:

Hi Charles,

I want to run an ANCOVA using R so as to evaluate the effect of several categorical factors (which are sex, age, area, etc., with several levels each, such as male/female, adult/subadult, a/b/c/d, etc.) on the relationship between length (continuous covariate) and weight data (response variable), that is, body condition. So, the question I’m trying to answer is: Are there any differences among the different levels of sex/age/etc in their body condition?

Prior to that, I have to check both the normality and homogeneity of variances assumptions. I recently learned that it is the residuals of the linear model, not the variables themselves, that must satisfy normality.
But my question is: how am I supposed to check it? I mean, should I run the Shapiro-Wilk test on the residuals of the logweight~loglength regression for each level of each factor? Taking sex as an example, should I run it for males and females separately? As some factors in my dataset have many levels, I wonder whether this is correct, and if so, whether there is a quicker (although statistically correct) way to do it.