Homogeneity of Variances

Certain tests (e.g. ANOVA) require that the variances of different populations are equal. This can be determined by the following approaches:

  • Comparison of graphs (esp. box plots)
  • Comparison of variance, standard deviation and IQR statistics
  • Statistical tests

The F test presented in Two Sample Hypothesis Testing of Variances can be used to determine whether the variances of two populations are equal. For three or more groups, the following statistical tests for homogeneity of variances are commonly used:

  • Levene’s test
  • Bartlett’s test

Using the terminology from Definition 1 of Basic Concepts for ANOVA, the following null and alternative hypotheses are used for either of these tests:

H0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2

H1: Not all variances are equal (i.e. \sigma_i^2 ≠ \sigma_j^2 for some i, j)

Levene’s Test

For Levene’s test, the residuals e_ij of the observations from their group means are calculated as follows:

e_{ij} = x_{ij} - \bar{x}_j

where x_ij is the ith observation in group j and \bar{x}_j is the mean of group j.

An ANOVA is then conducted on the absolute value of the residuals. If the group variances are equal, then the average size of the residual should be the same across all groups.
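The procedure can be sketched in a few lines of Python (standard library only). This is a minimal illustration, not the Real Statistics implementation; the function name `levene_w` and the list-of-groups input format are assumptions. It computes the one-way ANOVA F statistic on the absolute residuals |x_ij − x̄_j|. The p-value would then be the right tail of the F distribution with k − 1 and n − k degrees of freedom, e.g. FDIST(F, k − 1, n − k) in Excel.

```python
from statistics import fmean


def levene_w(groups):
    """Levene's statistic (mean version): one-way ANOVA F on |x_ij - mean_j|."""
    # Absolute residuals of each observation from its own group mean
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    z_means = [fmean(g) for g in z]
    grand = fmean([v for g in z for v in g])
    # Between-groups and within-groups sums of squares of the absolute residuals
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, z_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical data: the second group is twice as spread out as the first
print(levene_w([[1, 2, 3], [2, 4, 6]]))  # about 0.8
```

Groups that differ only by a shift have identical absolute residuals, so their statistic is 0: Levene's test reacts to differences in spread, not in location.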

Example 1: Use Levene’s test to determine whether the 4 samples in Example 2 of Basic Concepts for ANOVA have significantly different population variances.


Figure 1 – Levene’s test for Example 1

Since p-value = .90357 > .05 = α (Figure 1), we cannot reject the null hypothesis, and conclude that there is no significant difference between the 4 group variances. Thus the ANOVA test conducted previously for Example 2 of Basic Concepts for ANOVA satisfies the homogeneity of variances assumption.

There are three versions of Levene’s test:

  • Use of mean (as in the explanation above)
  • Use of median (replace mean by median above)
  • Use of 10% trimmed mean (replace mean by 10% trimmed mean above)
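Since only the center changes between the three versions, a sketch of the centers (and of the absolute residuals built from them) shows the whole difference. The 10% trimmed mean below drops ⌊n/10⌋ values from each end of the sorted sample; this is an assumed convention, and other tools trim differently (Excel's TRIMMEAN(R, 0.1), for instance, excludes 10% of the values in total). The helper names are hypothetical.

```python
from statistics import fmean, median


def center(xs, kind="median"):
    """Center subtracted before taking absolute values: mean, median or trimmed mean."""
    if kind == "mean":
        return fmean(xs)
    if kind == "median":
        return median(xs)
    if kind == "trimmed":
        # Assumed convention: drop floor(n / 10) values from each end
        cut = len(xs) // 10
        s = sorted(xs)
        return fmean(s[cut:len(s) - cut])
    raise ValueError(f"unknown center: {kind}")


def abs_residuals(xs, kind="median"):
    """Absolute deviations |x - center| on which the ANOVA is then run."""
    c = center(xs, kind)
    return [abs(x - c) for x in xs]


# A skewed hypothetical sample with one large value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for kind in ("mean", "median", "trimmed"):
    print(kind, center(sample, kind))
# mean 14.5
# median 5.5
# trimmed 5.5
```

The single large value drags the mean far from the bulk of the data, while the median and trimmed mean stay put, which is why those two centers are more robust for skewed or heavy-tailed data.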

The three choices determine the robustness and power of Levene’s test. By robustness, we mean the ability of the test to not falsely detect unequal variances when the underlying data are not normally distributed and the variances are in fact equal. By power, we mean the ability of the test to detect unequal variances when the variances are in fact unequal.

Levene’s original paper only proposed using the mean. Brown and Forsythe extended Levene’s test to use either the median or the trimmed mean. They performed Monte Carlo studies that indicated that using the trimmed mean performed best when the underlying data had a heavy-tailed distribution and the median performed best when the underlying data had a skewed distribution. Using the mean provided the best power for symmetric, moderate-tailed distributions.

Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power. Another choice may be better based on knowledge of the underlying distribution of the data.

Some cautions about Levene’s test: you need to assume that the absolute values of the residuals satisfy the assumptions of ANOVA. Also, because these tests have poor power, a more liberal cutoff value is often used when testing homogeneity of variances.

Real Statistics Function: The following supplemental function contained in the Real Statistics Resource Pack computes the p-value for Levene’s test.

LEVENE(R1, type) = p-value for Levene’s test for the data in range R1. If type = 0 then group means are used; if type > 0 then group medians are used; if type < 0 then 10% trimmed group means are used. If the second argument is omitted, it defaults to 0.

This function ignores any empty or non-numeric cells.

For example, for the data in Example 1, LEVENE(B6:E13) = LEVENE(B6:E13, 0) = 0.90357 (referring to Figure 1). Note that, for the same data, LEVENE(B6:E13, 1) = 0.97971 and LEVENE(B6:E13, -1) = 0.90357.
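If you want to mirror LEVENE's argument conventions outside Excel, they can be expressed as below. These helpers are hypothetical illustrations of the rules stated above (the sign of type selects the center; empty and non-numeric cells are skipped), not the Resource Pack's code.

```python
def center_for_type(type_=0):
    """Mirror LEVENE's type argument: 0 -> mean, > 0 -> median, < 0 -> 10% trimmed mean."""
    if type_ == 0:
        return "mean"
    return "median" if type_ > 0 else "trimmed"


def numeric_cells(column):
    """Keep numbers only, skipping empty (None or "") and non-numeric cells."""
    return [v for v in column
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


print(center_for_type(), center_for_type(2), center_for_type(-1))
# mean median trimmed
print(numeric_cells([12, None, "n/a", 15.5, "", 9]))
# [12, 15.5, 9]
```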

Real Statistics Data Analysis Tool: A Levene’s Test option is included in the Single Factor Anova data analysis tool. This option displays the results of all three versions of Levene’s test.

To use this tool for Example 1, enter Ctrl-m and select Single Factor Anova from the menu. A dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears. Enter B5:E13 in the Input Range, check Column headings included with data, select the Levene’s Test option and click on the OK button.

Bartlett’s Test

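We now show another test for homogeneity of variances using Bartlett’s test statistic B, which is approximately chi-square:

B = \frac{(n-k)\ln s^2 - \sum_{j=1}^k (n_j-1)\ln s_j^2}{1 + \frac{1}{3(k-1)}\left(\sum_{j=1}^k \frac{1}{n_j-1} - \frac{1}{n-k}\right)}

where s^2 is the pooled variance, which as we have seen is MSW, and n = n_1 + ⋯ + n_k is the total sample size:

s^2 = MS_W = \frac{\sum_{j=1}^k (n_j-1)\,s_j^2}{n-k}

B can also be defined as follows, where df_j = n_j − 1 and df_W = n − k:

B = \frac{df_W \ln MS_W - \sum_{j=1}^k df_j \ln s_j^2}{1 + \frac{1}{3(k-1)}\left(\sum_{j=1}^k \frac{1}{df_j} - \frac{1}{df_W}\right)}

B has approximately a chi-square distribution with k – 1 degrees of freedom, so the null hypothesis that all the group variances are equal is rejected if p-value < α, where p-value = CHIDIST(B, k–1). The approximation should be good enough if there are at least 3 observations in each sample.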
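As a sketch, assuming each group is a list of numbers with at least two observations and nonzero variance, Bartlett's statistic and its p-value can be computed as follows. The helper `chi2_sf` plays the role of Excel's CHIDIST (right-tail chi-square probability) and is implemented with the standard series for the regularized incomplete gamma function; both function names are hypothetical.

```python
import math
from statistics import variance


def chi2_sf(x, df):
    """Right-tail chi-square probability, the quantity CHIDIST(x, df) returns."""
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    # Series for the lower regularized gamma function P(s, z):
    # P = z^s e^(-z) / Gamma(s) * sum_{n>=0} z^n / (s (s+1) ... (s+n))
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-16:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def bartlett(groups):
    """Return (B, p-value) for Bartlett's test of equal variances."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]          # df_j = n_j - 1
    variances = [variance(g) for g in groups]   # sample variances s_j^2
    df_w = sum(dfs)                             # df_W = n - k
    ms_w = sum(d * v for d, v in zip(dfs, variances)) / df_w  # pooled variance MS_W
    # Numerator: df_W ln MS_W - sum_j df_j ln s_j^2
    numerator = df_w * math.log(ms_w) - sum(
        d * math.log(v) for d, v in zip(dfs, variances))
    # Correction: 1 + (sum_j 1/df_j - 1/df_W) / (3 (k - 1))
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_w) / (3 * (k - 1))
    b = numerator / correction
    return b, chi2_sf(b, k - 1)


b, p = bartlett([[1, 2, 3], [2, 4, 6]])  # hypothetical data
print(round(b, 4), round(p, 4))
```

The pooled variance line mirrors the worksheet formula MSW = SUMPRODUCT(df_j, s_j^2)/df_W. When all sample variances are equal, the numerator is exactly 0, so B = 0 and the p-value is 1.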

Bartlett’s test is very sensitive to departures from normality. If the samples come from non-normal distributions, then Bartlett’s test may simply be testing for non-normality. Levene’s test is less sensitive to departures from normality.

Example 2: Use Bartlett’s test to determine whether the 4 samples in Example 2 of Basic Concepts for ANOVA have significantly different population variances.


Figure 2 – Bartlett’s test for the data in Example 2

We obtain Bartlett’s test statistic B (cell I6 of Figure 2) by calculating the numerator and denominator of B as described above (cells I4 and I5). To do this we first calculate the values dfj, 1 ⁄ dfj, s_j^2 and ln s_j^2 (cells in the range B13:E16). We also calculate dfW, 1 ⁄ dfW, MSW and ln MSW (cells in range F13:F16). Note that MSW = SUMPRODUCT(B13:E13,B15:E15)/F13.

Since p-value = CHIDIST(B, k–1) = .979 > .05 = α, where B is the value in cell I6 of Figure 2, we don’t reject the null hypothesis, and so conclude that there is no significant difference between the variances of the four methods.

Note that if we change the first sample for Method 4 to 185 (instead of 85) and repeat the analysis we would find that there would be a significant difference in the variances (B = 17.23, p-value = .001 < .05 = α). This would be due to this one outlier. That it was an outlier would show up easily in any graphic representation. We would then need to decide whether this item was simply an error in measurement or a true measurement (see Outliers in ANOVA).

Dealing with non-heterogeneity of variances

We present four ways of dealing with models where the variances are not sufficiently homogeneous:

In the rest of this section we look at transformations that can address heterogeneity of variances, in particular square root and log transformations. For transformations that address normality, see Transformations to Create Symmetry.

Log transformation for homogeneity of variances: A log transformation can be effective when the standard deviations of the group samples are proportional to the group means. Here a log to any base can be used, although log base 10 and the natural log (i.e. log base e) are the common choices. Since you can’t take the log of zero or a negative number, it may be necessary to use the transformation f(x) = log(x+a) where a is a constant sufficiently large to make sure that all the x + a are positive.
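A minimal sketch of the shifted log transformation, using base 10 as in the example that follows. When a is not supplied, the helper picks a = 1 − min(x) whenever some value is ≤ 0; that default is an illustrative assumption, not a rule from the text. The demo uses hypothetical groups whose standard deviation is proportional to the mean: since log10(10x) = 1 + log10(x), the transformed groups differ only by a shift and so have the same standard deviation.

```python
import math
from statistics import stdev


def log10_transform(xs, a=None):
    """Return log10(x + a) for each x; every x + a must be positive."""
    if a is None:
        # Illustrative default: shift only if some value is <= 0, so that min(x) + a = 1
        a = 0 if min(xs) > 0 else 1 - min(xs)
    if min(xs) + a <= 0:
        raise ValueError("a is too small: every x + a must be positive")
    return [math.log10(x + a) for x in xs]


# Hypothetical groups whose SD is proportional to the mean
small = [2, 4, 6, 8]
large = [20, 40, 60, 80]  # each value is 10 times the corresponding one above
print(stdev(small), stdev(large))  # the second SD is 10 times the first
print(stdev(log10_transform(small)), stdev(log10_transform(large)))  # equal up to rounding
```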

Example 3: In an experiment the data in Figure 3 were collected. Check that the variances are homogeneous before proceeding with other tests.


Figure 3 – Data for Example 3

The sample variances in Figure 3 seem quite different. When we perform Levene’s test (Figure 4), we confirm that there is a significant difference between the variances (p-value = 0.024 < .05 = α).


Figure 4 – Levene’s test for data in Example 3

We note there is a correlation between the group means and group standard deviations (r = .88), which leads us to try a log transformation (here we use base 10) to achieve homogeneity of variances (table on the left of Figure 5).

We can see that the variances in the transformed data are more similar. This time Levene’s test (the table on the right of Figure 5) shows that there is no significant difference between the variances (p-value = .20 > .05 = α).


Figure 5 – Log transform and Levene’s test
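The check that motivated the log transformation, correlating the group means with the group standard deviations, is easy to script. A sketch with hypothetical data and hypothetical function names follows; the means-versus-variances version is included because the square root transformation below is suggested when the means track the variances instead.

```python
from statistics import fmean, stdev, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_sd_correlation(groups):
    """Correlation between group means and group SDs (points toward a log transform)."""
    return pearson_r([fmean(g) for g in groups], [stdev(g) for g in groups])


def mean_var_correlation(groups):
    """Correlation between group means and group variances (points toward a square root)."""
    return pearson_r([fmean(g) for g in groups], [variance(g) for g in groups])


# Hypothetical groups whose SD is proportional to their mean
groups = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(mean_sd_correlation(groups))   # 1.0 (up to rounding)
print(mean_var_correlation(groups))  # high, but below 1
```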

Square root transformation for homogeneity of variances: When the group means are proportional to the group variances, often a square root transformation f(x) = \sqrt{x} is useful. Since you can’t take the square root of a negative number, it may be necessary to use a transformation of form f(x) = \sqrt{x + a}, where a is a constant chosen to make sure that all values of x + a are positive. If the values of x are small (e.g. |x| < 10), it might be better to use the transformation f(x) = \sqrt{x + .5} or f(x) = \sqrt{x} + \sqrt{x + 1}.
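The square root variants can be written as follows. This is an illustrative sketch with hypothetical function names; the shifted version checks x + a ≥ 0 because math.sqrt raises an error for negative input. The last variant, √x + √(x + 1), is often called the Freeman-Tukey transformation.

```python
import math


def sqrt_transform(xs, a=0.0):
    """f(x) = sqrt(x + a); a must make every x + a non-negative."""
    if min(xs) + a < 0:
        raise ValueError("a is too small: every x + a must be >= 0")
    return [math.sqrt(x + a) for x in xs]


def sqrt_half(xs):
    """f(x) = sqrt(x + .5), one option for small values."""
    return sqrt_transform(xs, 0.5)


def sqrt_sum(xs):
    """f(x) = sqrt(x) + sqrt(x + 1), the other option for small values."""
    return [math.sqrt(x) + math.sqrt(x + 1) for x in xs]


counts = [0, 1, 4, 9]  # hypothetical small counts
print(sqrt_transform(counts))  # [0.0, 1.0, 2.0, 3.0]
print(sqrt_half(counts))
print(sqrt_sum(counts))
```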

35 Responses to Homogeneity of Variances

  1. Ned from Norn Iron says:

    Many thanks for this… The easy to follow guide to Levene’s and Bartlett’s included in your download is just what I needed to sort out a tricky analytical problem…

    • admin says:

      I am very pleased that the site has been useful for you. I hope that you will use it again in the future.

  2. Sriya says:


    Can you please let me know what transformation method I should be using if both standard deviation to means and means to variances are not proportional? There is no strong correlation for both?


    • Charles says:

      There is no easy answer to your question. It all depends on your data. There are an unlimited number of transformations as well (1/x, x^2, etc.). It also may turn out that a particular transformation creates more problems than it solves.

  3. Colin says:


    Will you add a real statistics function for “Bartlett’s Test” ?


    • Charles says:

      Bartlett’s test is the single-variable case of Box’s Test (Box’s M test), which is already included in the Real Statistics Resource Pack (see the multivariate statistics portion of the website).

  4. Deborah says:

    Hi, I just wanted to ask, what happens if your levene’s test is positive so homogeneity of variance cannot be assumed in a factorial independent measures ANOVA. I know that you have to change the significance to p=0.01 instead of 0.05 (or something along those lines) but what do I do in terms of SPSS? I have run the test as normal but I don’t know how I am supposed to interpret my results considering levene’s positivity.

    Please help!!! Thank you

  5. sonia says:

    sir i wanted to ask why homogeneity of variance is so important?please tell me in some points…

    • Charles says:

      Homogeneity of variances is a requirement for many of the most used statistical tests, including ANOVA. Fortunately, most such tests are pretty forgiving, and as long as the variances are not too unequal the tests give pretty accurate results, but when the requirement is sufficiently violated the results of these tests can be quite unreliable.

  6. Valerian says:

    hi! i want to ask on the interpretation of Bartlet’s test on the Gen stat discover program for ANOVA, i do fail to interpret it

  7. praveen kumar says:

    I want do homogeneity test for two variances please tell me to do the test

    • Charles says:

      Levene’s test can be used for two variances. You can use the LEVENE function as described on the referenced webpage.

  8. umar iqbal says:

    sir i want to know how do i find the relationship between export growth and variation between real and nominal exchange rate based on the measure of 3 months and 6 months i have collected data but now i m confused how do i apply non-parametric test on it and which test…

  9. Lucy says:

    Dear Sir,
    I am analyzing a field experiment on 4 maize varieties. The varieties were replicated three times in one location. Should I examine the homogeneity and normality tests?


    • Charles says:

      It really depends on what you are trying to test. The ANOVA tests will require homogeneity of variance and normality, but they can be quite forgiving even if these assumptions aren’t completely satisfied.

  10. Alfiya says:

    Dear Sir,
    I am doing the One-Way ANOVA analysis. My p-value =0,233 for Levene’s test. Since my data was not normally distributed I transformed it. Do I need to perform Levene’s test again for transformed data? (I have tried and p-value is less then 0.05)
    Thank you

    • Charles says:

      Since you will be testing the transformed data you need to make sure that the assumptions are satisfied on the transformed data. Since homogeneity of variance is an assumption for One-way ANOVA this assumption needs to be verified for the transformed data. Levene’s test is a way of checking this.

  11. Kennedy says:

    I love your website it is so useful in helping me solve statistical problems. I am a little confused about how to perform hypothesis testing when the observations are just given as one total without actually listing them separately – 125 observations (Southern States) and 132 observations (Northern States) with a sample mean of 87 and 88 respectively and a population variance of 7.0 and 6.2 respectively. Level of significance is .01 is there evidence that the workers in southern states are receiving less pay than workers in northern states?

    • Charles says:

      Since you have the population variances you can use a two sample test using the normal distribution, as described in Theorem 1 of Comparing Two Means.

      The null hypothesis is mean1 <= mean2, where 1 refers to the northern states and 2 to the southern states (these are population means). The test statistic is z = (m1-m2)/stdev, where m1-m2 = 88-87 = 1 (sample means) and stdev = sqrt(var), where var = v1/n1 + v2/n2 = 6.2/132 + 7.0/125 (v1 and v2 are the population variances, so they are not squared). If NORMSDIST(z) > .99 then you reject the null hypothesis and conclude that workers in the southern states receive less pay. This is a one-tailed test. If you want a two-tailed test you need to replace .99 by .995.

      If instead of the population variance you had the sample variances you would use Theorem 1 of Two Sample t Test instead.
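A quick check of the arithmetic in this reply, as a Python sketch. It treats 7.0 and 6.2 as population variances, as stated in the question, so they enter the standard error without squaring (if they were standard deviations, they would be squared first). NormalDist().cdf plays the role of NORMSDIST.

```python
from statistics import NormalDist

# Figures from the question: sample size, sample mean, population variance
n_north, mean_north, var_north = 132, 88, 6.2
n_south, mean_south, var_south = 125, 87, 7.0

# Population variances enter the standard error directly
se = (var_north / n_north + var_south / n_south) ** 0.5
z = (mean_north - mean_south) / se
# One-tailed p-value for H1: southern mean < northern mean
p_one_tailed = 1 - NormalDist().cdf(z)

print(round(z, 2), p_one_tailed < 0.01)  # 3.12 True
```

Since NORMSDIST(z) is above .99 (equivalently, the one-tailed p-value is below .01), the null hypothesis would be rejected at the .01 level under these assumptions.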


  12. Sumit says:

    Can one perform t-test or ANOVA using CV if the variance between group/s is not similar? If yes, how does one do it? I am a statistics illiterate

    • Charles says:

      There is a version of the t-test which you can use when the variances are not similar. See the webpage two sample t-test with unequal variances.

      There are also substitute tests for ANOVA when the variances are not equal. See the Dealing with non-heterogeneity of variances topic on the referenced webpage.

      You use the abbreviation “CV”. What does this stand for?


  13. saif salim says:

    Dear sir please I need your help:
    1- 40 students are divided into two groups of 20 each [the control group (i.e CG) and the experimental group(i.e. EG)] they are considered equal in their level of English language study. The CG read version A of a text with certain rhetorical organization , on the other hand the EG read version B of the same text with different rhetorical organization . They are asked to read these two versions and recall information from them so the amount of information recalled and speed of reading spent are recorded . I want to test the following hypothesis ” To what extent will the change (in the rhetorical pattern) affect the ease of information recalled and speed of reading as well? How can I use ANOVA test please when the CG have two marks of amount of information recalled and marks of speed of reading and the EG have also two marks also

    • Charles says:

      It sounds like you have one independent variable Rhetorical organization (RO) and two dependent variables: Information recalled (IR) and Reading speed (RS). You can use MANOVA, or more simply Hotelling’s T-square as described on the webpage Hotelling’s T-square.

      • saif salim says:

        Dear sir, I consider the students (i.e. CG and EG) as dependent variables since there is no difference between them and considered equal, the RO (i.e.the two versions of texts version A and B with different rhetorical organization) as independent variable since it is varied . However the IR and SR are the responses or marks gathered and not variables .Can I use ANOVA test to examine the responses i.e. the IR (amount of information recalled) then do the same procedures to examine the RS (speed of reading) with my thanks and regards sir.

        • Charles says:

          Sorry, but I still don’t understand the question. E.g. in the statement “I consider the students (i.e. CG and EG) as dependent variables since there is no difference between them and considered equal…”, I don’t understand why this statement would make CG and EG dependent variables. I really don’t understand the other statements either.

  14. niki says:


    i conducted a two way anova and the levene’s test p value is 0.001, my study has a continuous dv (eating style) and two categorical iv’s
    Not sure what to do as ive used this analysis to show the effects of two other eating styles?

    • Charles says:

      Sorry, but I don’t understand your question. Please explain in a little more detail what you have done and what you want to accomplish.

      • niki says:

        i have conducted a two way anova to see if there is a sig difference in the means of eating style scores when participants reported their stress levels (high vs low stress) and sleep quality (good vs poor). I have conducted this for both emotional and external eating style but when i conducted the analysis for restrained eating style the homogeneity assumption was violated what can i do?

        • Charles says:

          Look at the Dealing with non-heterogeneity of variances topic towards the end of the referenced webpage.

  15. brain matienga says:

    dear sir
    I am doing research on the effects of wet/dry and wet feeding troughs on feeding grower pigs . so i have 2 treatments wet feeding and dry feeding and 2 blocks that is males and female so i have 24 males and 24 females all pigs are of same age and genetic .Can you help me on the method of data analysis to use. I am measuring the feed in take daily and weighing pigs on entery and then once weekly for 8 weeks

    • Charles says:

      It all depends on what you are trying to test, but based on what you have described it sounds like a three factor Anova. One fixed factor with two levels for feeding type, one fixed factor with two levels for gender and one repeated factor with 9 levels for week. The dependent variable is weight.
