Certain tests (e.g. ANOVA) require that the variances of different populations are equal. This can be determined by the following approaches:
- Comparison of graphs (esp. box plots)
- Comparison of variance, standard deviation and IQR statistics
- Statistical tests
The F test presented in Two Sample Hypothesis Testing of Variances can be used to determine whether the variances of two populations are equal. For three or more groups the following statistical tests for homogeneity of variances are commonly used:
- Levene’s test
- Bartlett’s test
Using the terminology from Definition 1 of Basic Concepts for ANOVA, the following null and alternative hypotheses are used for either of these tests:
H0: σ1² = σ2² = ⋯ = σk²
H1: Not all variances are equal (i.e. σi² ≠ σj² for some i, j)
For Levene’s test, the residuals eij of the cell values from the group means are calculated as follows:

eij = xij − x̄j

where x̄j is the mean of the jth group.
An ANOVA is then conducted on the absolute value of the residuals. If the group variances are equal, then the average size of the residual should be the same across all groups.
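The procedure just described can be sketched in Python. This is a minimal illustration on made-up samples (not the data from the referenced examples); scipy’s built-in Levene test with center="mean" should agree with an ordinary one-way ANOVA run on the absolute residuals.

```python
import numpy as np
from scipy import stats

# Four made-up samples (hypothetical, not the data from Example 1)
groups = [
    np.array([51.0, 87, 50, 48, 79, 61, 53, 54]),
    np.array([82.0, 91, 92, 80, 52, 79, 73, 74]),
    np.array([79.0, 84, 74, 98, 63, 83, 85, 58]),
    np.array([85.0, 80, 65, 71, 67, 51, 63, 93]),
]

# Absolute residuals of each observation from its group mean ...
abs_resid = [np.abs(g - g.mean()) for g in groups]

# ... then an ordinary one-way ANOVA on the absolute residuals
f_stat, p_value = stats.f_oneway(*abs_resid)

# scipy's built-in mean-based Levene test should agree
w_stat, p_scipy = stats.levene(*groups, center="mean")
assert abs(p_value - p_scipy) < 1e-8
print(f"F = {f_stat:.4f}, p = {p_value:.5f}")
```

If the group variances are equal, the absolute residuals should be about the same size in every group, which is exactly what the ANOVA on them tests.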
Example 1: Use Levene’s test to determine whether the 4 samples in Example 2 of Basic Concepts for ANOVA have significantly different population variances.
Since p-value = .90357 > .05 = α (Figure 1), we cannot reject the null hypothesis, and so we conclude there is no significant difference between the variances of the 4 groups. Thus the ANOVA test conducted previously for Example 2 of Basic Concepts for ANOVA satisfies the homogeneity of variances assumption.
There are three versions of Levene’s test:
- Use of mean (as in the explanation above)
- Use of median (replace mean by median above)
- Use of 10% trimmed mean (replace mean by 10% trimmed mean above)
The three choices determine the robustness and power of Levene’s test. By robustness, we mean the ability of the test to not falsely detect unequal variances when the underlying data are not normally distributed and the variances are in fact equal. By power, we mean the ability of the test to detect unequal variances when the variances are in fact unequal.
Levene’s original paper only proposed using the mean. Brown and Forsythe extended Levene’s test to use either the median or the trimmed mean. They performed Monte Carlo studies that indicated that using the trimmed mean performed best when the underlying data had a heavy-tailed distribution and the median performed best when the underlying data had a skewed distribution. Using the mean provided the best power for symmetric, moderate-tailed distributions.
Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power. Another choice may be better based on knowledge of the underlying distribution of the data.
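The three variants can be compared side by side in scipy, whose center argument mirrors these choices; proportiontocut=0.1 corresponds to the 10% trimmed mean. The samples below are simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three made-up samples; the third has a visibly larger spread
groups = [rng.normal(50, s, size=20) for s in (5, 5, 12)]

results = {}
for center in ("mean", "median", "trimmed"):
    # proportiontocut only affects the 'trimmed' variant
    stat, p = stats.levene(*groups, center=center, proportiontocut=0.1)
    results[center] = p
    print(f"center={center:<8} W = {stat:.3f}  p = {p:.4f}")
```

In practice the three p-values are usually close; they diverge most for skewed or heavy-tailed data, which is the situation Brown and Forsythe studied.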
Some cautions about Levene’s test: you need to assume that the absolute values of the residuals satisfy the assumptions of ANOVA. Also, because these tests have relatively low power, a more liberal cutoff value is often used when testing for homogeneity of variances.
Real Statistics Function: The following supplemental function contained in the Real Statistics Resource Pack computes the p-value for Levene’s test.
LEVENE(R1, type) = p-value for Levene’s test for the data in range R1. If type = 0 then group means are used; if type = 1 then group medians are used; if type = 2 then 10% trimmed group means are used. If the second argument is omitted it defaults to 0.
This function ignores any empty or non-numeric cells.
For example, for the data in Example 1, LEVENE(B6:E13) = LEVENE(B6:E13, 0) = 0.90357 (referring to Figure 1). Note that, for the same data, LEVENE(B6:E13, 1) = 0.97971 and LEVENE(B6:E13, 2) = 0.90357.
Real Statistics Data Analysis Tool: A Levene’s Test option is included in the Single Factor Anova data analysis tool. This option displays the results of all three versions of Levene’s test.
To use this tool for Example 1, enter Ctrl-m and select Single Factor Anova from the menu. A dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears. Enter B5:E13 in the Input Range, check Column headings included with data, select the Levene’s Test option and click on the OK button.
We now show another test for homogeneity of variances using Bartlett’s test statistic B, which is approximately chi-square with k − 1 degrees of freedom:

B = [dfW ln s² − Σ dfj ln sj²] ⁄ c

where dfj = nj − 1, dfW = n − k, sj² is the variance of the jth group and s² is the pooled variance across all groups, which as we have seen is MSW, i.e.

s² = MSW = (Σ dfj sj²) ⁄ dfW

The correction factor c is given by

c = 1 + [Σ (1 ⁄ dfj) − 1 ⁄ dfW] ⁄ [3(k − 1)]

Thus the null hypothesis that all the group variances are equal is rejected if p-value < α where p-value = CHIDIST(B, k–1). B is only approximately chi-square, but the approximation should be good enough provided there are at least 3 observations in each sample.
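As a check on the definition, Bartlett’s B can be computed directly from these formulas and compared against scipy’s implementation. The samples below are made up for illustration, not the data of Example 2.

```python
import numpy as np
from scipy import stats

# Made-up samples (hypothetical data, k = 4 groups)
groups = [
    np.array([51.0, 87, 50, 48, 79, 61, 53, 54]),
    np.array([82.0, 91, 92, 80, 52, 79, 73, 74]),
    np.array([79.0, 84, 74, 98, 63, 83, 85, 58]),
    np.array([85.0, 80, 65, 71, 67, 51, 63, 93]),
]

k = len(groups)
df = np.array([len(g) - 1 for g in groups])      # df_j = n_j - 1
dfW = df.sum()                                   # df_W = n - k
s2 = np.array([g.var(ddof=1) for g in groups])   # sample variances s_j^2
msw = (df * s2).sum() / dfW                      # pooled variance MSW

num = dfW * np.log(msw) - (df * np.log(s2)).sum()
c = 1 + (np.sum(1 / df) - 1 / dfW) / (3 * (k - 1))
B = num / c
p_value = stats.chi2.sf(B, k - 1)  # equivalent of Excel's CHIDIST(B, k-1)

# scipy's built-in version should agree with the hand computation
B_scipy, p_scipy = stats.bartlett(*groups)
assert abs(B - B_scipy) < 1e-10
print(f"B = {B:.4f}, p = {p_value:.4f}")
```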
Bartlett’s test is very sensitive to departures from normality. If the samples come from non-normal distributions, then Bartlett’s test may simply be testing for non-normality. Levene’s test is less sensitive to departures from normality.
Example 2: Use Bartlett’s test to determine whether the 4 samples in Example 2 of Basic Concepts for ANOVA have significantly different population variances.
We obtain Bartlett’s test statistic B (cell I6 of Figure 2) by calculating the numerator and denominator of B as described above (cells I4 and I5). To do this we first calculate the values dfj, 1 ⁄ dfj, sj² and ln sj² (cells in the range B13:E16). We also calculate dfW, 1 ⁄ dfW, MSW and ln MSW (cells in range F13:F16). Note that MSW = SUMPRODUCT(B13:E13,B15:E15)/F13.
Since p-value = CHIDIST(B, k–1) = CHIDIST(1.88, 3) = .598 > .05 = α, we don’t reject the null hypothesis, and so conclude that there is no significant difference between the variances of the four methods.
Note that if we change the first sample for Method 4 to 185 (instead of 85) and repeat the analysis we would find that there would be a significant difference in the variances (B = 17.23, p-value = .001 < .05 = α). This would be due to this one outlier. That it was an outlier would show up easily in any graphic representation. We would then need to decide whether this item was simply an error in measurement or a true measurement (see Outliers in ANOVA).
Dealing with heterogeneity of variances
We present four ways of dealing with models where the variances are not sufficiently homogeneous:
- Non-parametric test: Kruskal-Wallis
- Modified tests: Brown-Forsythe and Welch’s procedure
- Transformations (see below)
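As a quick illustration of the first alternative, the Kruskal-Wallis test is available in scipy; a minimal sketch on made-up samples with unequal spreads:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Made-up samples whose spreads differ markedly (sd = 3, 10, 20)
groups = [rng.normal(mu, sd, size=15) for mu, sd in ((50, 3), (55, 10), (60, 20))]

# Rank-based test of whether the groups come from the same distribution
h_stat, p_value = stats.kruskal(*groups)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

Being rank-based, Kruskal-Wallis does not require equal variances, though it tests a somewhat different null hypothesis than ANOVA.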
In the rest of this section we will look at transformations that can address homogeneity of variances. In particular, we look at square root and log transformations. For transformations that address normality, see Transformations to Create Symmetry.
Log transformation for homogeneity of variances: A log transformation can be effective when the standard deviations of the group samples are proportional to the group means. Here a log to any base can be used, although log base 10 and the natural log (i.e. log base e) are the common choices. Since you can’t take the log of a negative number, it may be necessary to use the transformation f(x) = log(x+a) where a is a constant sufficiently large to make sure that all the x + a are positive.
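The effect of the log transformation can be sketched on simulated data where the group standard deviation is proportional to the group mean (the proportionality constant 0.3 and the shift a are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated groups whose sd is proportional to the mean (sd = 0.3 * mean)
groups = [rng.normal(m, 0.3 * m, size=30) for m in (10, 50, 250)]

# Shift a guarantees x + a > 0 before taking logs
a = max(0.0, 1.0 - min(g.min() for g in groups))
logged = [np.log10(g + a) for g in groups]

_, p_raw = stats.levene(*groups, center="median")
_, p_log = stats.levene(*logged, center="median")
print(f"raw p = {p_raw:.4g}, logged p = {p_log:.4g}")
```

Because sd/mean is constant here, the logged groups have roughly equal spread, so the Levene p-value typically rises well above the raw one.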
Example 3: In an experiment the data in Figure 3 were collected. Check that the variances are homogeneous before proceeding with other tests.
The sample variances in Figure 3 seem quite different. When we perform Levene’s test (Figure 4), we confirm that there is a significant difference between the variances (p-value = 0.024 < .05 = α).
We note there is a correlation between the group means and group standard deviations (r = .88), which leads us to try a log transformation (here we use base 10) to try to achieve homogeneity of variances (table on the left of Figure 5).
We can see that the variances in the transformed data are more similar. This time Levene’s test (the table on the right of Figure 5) shows that there is no significant difference between the variances (p-value =.20 > .05).
Square root transformation for homogeneity of variances: When the group means are proportional to the group variances, often a square root transformation is useful. Since you can’t take the square root of a negative number, it may be necessary to use a transformation of the form √(x + a), where a is a constant chosen to make sure that all values of x + a are positive. If the values of x are small (e.g. |x| < 10), it might be better to use the transformation √(x + ½) or √x + √(x + 1).
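The square-root transformation can be sketched on count-like data whose variance is proportional to its mean; Poisson samples (made up for illustration) have exactly this property, and √x + √(x + 1) is the small-count variant mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Poisson counts: variance equals the mean, so spreads differ across groups
groups = [rng.poisson(lam, size=40).astype(float) for lam in (2, 8, 20)]

# Small-count variant of the square root transform
sqrt_groups = [np.sqrt(g) + np.sqrt(g + 1) for g in groups]

_, p_raw = stats.levene(*groups, center="median")
_, p_sqrt = stats.levene(*sqrt_groups, center="median")
print(f"raw p = {p_raw:.4g}, transformed p = {p_sqrt:.4g}")
```

The transform acts as a variance stabilizer: for Poisson-like data the transformed values have roughly constant spread regardless of the mean.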