ANCOVA requires the same assumptions as ANOVA (normality, homogeneity of variance and random independent samples). In addition, ANCOVA requires the following assumptions:

- For each level of the independent variable, the relationship between the dependent variable (*y*) and the covariate (*x*) is linear
- The lines expressing these linear relationships are all parallel (homogeneity of regression slopes)
- The covariate is independent of the treatment effects (i.e. the covariate and the independent variable are independent)

**Example 1**: Show that the assumptions hold for the data in Example 1 of Basic Concepts of ANCOVA.

We start by creating a box plot of the reading scores for each of the four methods (using the data from Figure 1 of Basic Concepts of ANCOVA). See Figure 1.

**Figure 1 – Box plot for data in Example 1**

Each plot looks relatively symmetric and the variances don’t appear to be wildly different. As we can see from the data in Figure 1 of Basic Concepts of ANCOVA, the variances for the reading scores vary from 44.8 to 164.8, which is likely to be an acceptable range to meet the homogeneity of variances assumption.
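The visual check can be backed up with a quick numeric one. The following is a minimal sketch, assuming the common rule of thumb that the largest group variance should be less than about four times the smallest. The function name `variance_ratio` and the sample scores are hypothetical and are not the data from Figure 1 of Basic Concepts of ANCOVA; only the two extreme variances (44.8 and 164.8) come from the text above.

```python
from statistics import variance


def variance_ratio(groups):
    """Return the largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical scores for three groups (illustration only, not the article's data)
groups = [
    [12, 15, 11, 18, 14],
    [22, 19, 25, 17, 21],
    [30, 24, 35, 28, 33],
]
ratio = variance_ratio(groups)
print(f"max/min variance ratio (hypothetical data): {ratio:.2f}")

# Ratio of the extreme variances quoted in the text above
article_ratio = 164.8 / 44.8
print(f"max/min variance ratio (article): {article_ratio:.2f}")
print("below 4x rule of thumb:", article_ratio < 4)
```

A ratio check like this is only a screening device; a formal procedure such as Levene's test is less subjective.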

We now turn our attention to the ANCOVA-specific assumptions. We create a scatter diagram of the y data values against the *x* data values for each of the four methods. This is done by creating a scatter diagram for Method 1 in the usual way and then choosing **Design > Data|Select Data** and clicking on the **Add** button on the left side. Enter the name Method 2 and specify the range for the *x* and y values in the dialog box that appears. After repeating this procedure for Method 3 and Method 4 and adding linear trend lines for each method, the resulting chart is as in Figure 2.

**Figure 2 – Checking whether regression lines are parallel**

Although the four lines are not exactly parallel, their slopes are quite similar, suggesting that the homogeneity of slopes assumption is met. A less subjective check is to test the complete regression model (y, x, t, x*t) against the full regression model (y, x, t). If there is no significant difference between the models, then the interaction terms are not significant, implying that the homogeneity of regression slopes assumption is met. We conduct the same type of test in Testing the Significance of Extra Variables on the Regression Model.
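The model comparison can also be sketched outside Excel. The code below is an illustration under stated assumptions, not the Real Statistics implementation: it uses NumPy least squares, dummy-codes the treatment with the first level as reference, and runs on synthetic data generated with a common slope. The function names and the data are hypothetical.

```python
import numpy as np


def r_squared_and_df(X, y):
    """Fit OLS with an intercept; return (R-squared, residual degrees of freedom)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot, len(y) - A.shape[1]


def slopes_f_statistic(x, y, group):
    """F statistic for adding the covariate-by-treatment interaction terms."""
    levels = np.unique(group)
    # Dummy variables for the treatment, dropping the first level as reference
    t = np.column_stack([(group == lv).astype(float) for lv in levels[1:]])
    X_full = np.column_stack([x, t])                       # y, x, t
    X_complete = np.column_stack([x, t, t * x[:, None]])   # y, x, t, x*t
    r2_full, df_full = r_squared_and_df(X_full, y)
    r2_complete, df_complete = r_squared_and_df(X_complete, y)
    df_num = df_full - df_complete
    F = ((r2_complete - r2_full) / df_num) / ((1.0 - r2_complete) / df_complete)
    return F, df_num, df_complete


# Synthetic data: 4 methods, 9 subjects each, same slope in every group
rng = np.random.default_rng(0)
group = np.repeat(np.arange(4), 9)
x = rng.uniform(10, 50, size=36)
y = 5 + 0.8 * x + 3 * group + rng.normal(0, 4, size=36)

F, df_num, df_den = slopes_f_statistic(x, y, group)
print(f"F = {F:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The p-value would then be the upper tail of the F distribution with `df_num` and `df_den` degrees of freedom (for example via `scipy.stats.f.sf(F, df_num, df_den)`, if SciPy is available). A large p-value indicates that the interaction terms add little, which supports parallel slopes.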

First we use Excel’s regression data analysis tool to create the complete model (see Figure 3) using the range B4:H39 from Figure 1 of Regression Approach to ANCOVA when prompted for the Input X range.

**Figure 3 – Complete model (y, x, t, x\*t) for data in Example 1**

Now we test (see Figure 4) whether there is a significant difference between the complete and full models (as described in Figure 5 of Regression Approach to ANCOVA and Figure 3 above).

**Figure 4 – Testing homogeneity of regression line slopes**

Row 6 of Figure 4 computes the difference between the R-Square values of the complete and full models. Row 7 computes the difference between the residual degrees of freedom of the two models. The *F* statistic (cell AB8) is then defined via the formula =AB6*Z7/(AB7*(1-Z6)). Since the p-value for this statistic is larger than .05, we conclude that there is no significant difference between the two models, and so the homogeneity of regression slopes assumption is met.
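Written out, the spreadsheet formula appears to correspond to the usual F statistic for comparing nested regression models. This is a sketch that assumes cells Z6 and Z7 hold the R-Square value and the residual degrees of freedom of the complete model; the symbols are introduced here for illustration and do not appear in the original figures.

$$
F = \frac{(R_C^2 - R_F^2)\,/\,(df_F - df_C)}{(1 - R_C^2)\,/\,df_C}
$$

Here $R_C^2$ and $R_F^2$ are the R-Square values of the complete model (with interaction terms) and the full model (without them), and $df_C$ and $df_F$ are their residual degrees of freedom. Under the null hypothesis that all interaction coefficients are zero, this statistic follows an F distribution with $df_F - df_C$ and $df_C$ degrees of freedom.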

Alternatively, we can get the same result by using the Real Statistics function

RSquareTest(B4:H39, B4:E39, A4:A39) = 0.4615

Hi Charles,

I have some confusion with my data. I am not sure which statistical analysis to apply.

We conducted an in situ experiment on mosquitoes. We studied the effect of 2 insecticides at 2 different concentrations on the different stages of the mosquito. The study was carried out for 30 days, with sampling conducted every 2 days. I would like to know the efficiency of the insecticides: which insecticide and concentration was effective in controlling the mosquitoes, and in how many days. My question is whether ANCOVA or repeated measures ANOVA is the appropriate analysis for my data.

Hi Sanitha,

It looks like you have several factors: insecticide type (2 levels), concentration (2 levels) and time (16 levels). Thus you have two fixed factors and a repeated measures factor (time). This sounds like a repeated measures ANOVA.

If you use ANCOVA, which variable is the covariate?

You might have another factor, namely mosquito stage, although this may be subsumed in the time factor.

Charles

Thanks, Charles, for the prompt response and the helpful advice.

Is it possible to use ANCOVA to test the effectiveness of an intervention, measured as a change in a variable (DV), in a simple pre-post study design with only one group (without any control group), while controlling for the change in some other covariate(s) due to that intervention?

Sorry, but I don’t understand your question.

Charles

Suppose there is one study group in which an exercise intervention was given to see the change in VO2max. Due to this intervention, there was also an increase in lean body mass, which may itself cause an increase in VO2max. So which test should be used to see whether the exercise intervention is effective in increasing VO2max (a) independent of the increase in lean body mass, and (b) if possible, independent of the initial values of VO2max? Any suggestion?

So here I want to control for two things: (a) the change in some other variable due to the intervention, and (b) the initial (pre-test) values of the variable that we are testing for a change due to the intervention. How can ANCOVA be useful in this case?

Barun,

In case (b) it sounds like you are considering multiple timeframes (i.e. repeated measures) and perhaps not ANCOVA. You can find a discussion of this on the following webpage:

http://www.theanalysisfactor.com/pre-post-data-repeated-measures/

Charles

Barun,

If I am interpreting your question correctly, ANCOVA could be used in case (a) provided the test assumptions are met. I am not sure what (b) means in the context of ANCOVA.

Charles

Any extra assumptions for ANCOVA using two covariates which are linearly correlated to each other, and to the dependent variable, if at all usable??

Sorry, but the website hasn’t yet dealt with multiple covariates.

Charles

Thanks a lot, Charles, for all the prompt and useful replies.

What if homogeneity of variance is violated with unequal sample sizes in ANCOVA? One controversial approach is to first equalize the sample sizes through random selection, then set the p-value < .001 to reduce alpha error, and then go ahead with ANCOVA. I don't think this approach is good or defensible, but some people use it. Can I get any reference about this approach?

Generally it is frowned upon to delete data elements, but if the sample sizes are fairly similar, so that you are eliminating only a relatively few data elements, then this may be the easiest approach. The two main problems I see are that you lose some data (which reduces power) and you may get a different result depending on which elements are removed (of course you could get a different result if you took a different sample).

I don’t understand why you need to reduce the p-value (although presumably you mean the alpha value).

The following are a few websites that talk more about how to deal with unequal samples.

https://www.uvm.edu/~dhowell/gradstat/psych341/lectures/Ancova-Uneq/Covar1.html

http://www.csun.edu/~ata20315/psy524/docs/Psy524%20Lecture%209%20ANCOVA.ppt

http://www.ams.sunysb.edu/~zhu/ams57213/Team3.pptx

I set the p-value at lower level, in order to reduce the chance of making type-I error.

yeah i mean alpha value

Hi Charles,

Back to Rina’s questions: what if homogeneity of (regression) slopes is violated? What other test can be used to correct for covariates? A nonparametric (Quade) test?

Thanks

The following webpages may be useful in the case of ANCOVA with unequal slopes:

https://books.google.it/books?id=coz_Fss1tY8C&pg=PA150&lpg=PA150&dq=ancova+with+unequal+slopes&source=bl&ots=PZKOkTQQVT&sig=qILcKMBQtHsF-o0Tzc40xcRCQ7U&hl=en&sa=X&ved=0ahUKEwjZv_-0t6nKAhXEGw8KHYPqAqk4FBDoAQg2MAQ#v=onepage&q=ancova%20with%20unequal%20slopes&f=false

http://www.ncbi.nlm.nih.gov/pubmed/21887795

Charles

Dear Charles

Thank you for all your efforts. I have a question on the 3rd point of the ANCOVA assumptions “The covariate is independent of the treatment effects”.

I know that a common use for the ANCOVA is to study pre-test post-test results in different groups, by assigning the pre-test score as covariate, post-test as dependent variable, and treatment group as independent variable. The reason behind using ANCOVA here is to remove the influence of pre-test scores on the post-test results.

But how can we use ANCOVA in this setting if we already know that treatment groups have different pre-test scores, i.e. there’s a correlation between pre-test and groups (not independent). Am I missing something here ?

Hamid,

If you already know that there is a correlation between the treatments and the pre-test results, you couldn’t use this approach, but generally there is no such correlation since you assign the subjects to the treatment groups randomly.

Charles

Thanks for the prompt reply Charles. I might still be able to use it since the correlation was only found after finishing the experiments and plotting the results (incidental finding).

Sir..

If the data are not homogeneous, may the ANCOVA test still be used, or do we have to use another test?

Which assumption is not being met: homogeneity of variances or homogeneity of slopes?

Charles

Sir

It is difficult to test homogeneity of variance & slope by charts. I think the criteria is vague. Why (44.8 to 164.8) is an acceptable range for the hypothesis of homogeneity of variance? And how to check homogeneity of slope from Figure 2? The criteria is too vague and subjective.

Colin,

Regarding the homogeneity of variance assumption, generally if the largest variance is less than 4 times the smallest variance, the test results will be good. You can instead use one of the tests described in http://www.real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/, especially Levene’s test which is not subjective.

Regarding homogeneity of slopes, looking at the graph is admittedly subjective; testing the complete regression model y, x, t, x*t against the full regression model y, x, t is much less subjective.

Charles