The Kruskal-Wallis H test is a non-parametric test which is used in place of a one-way ANOVA. Essentially it is an extension of the Wilcoxon Rank-Sum test to more than two independent samples.
Although one-way ANOVA is usually quite robust (as explained in Assumptions for ANOVA), there are many situations where its assumptions are violated badly enough that the Kruskal-Wallis test becomes useful, in particular when:
- Group samples strongly deviate from normality (this is especially relevant when sample sizes are small and unequal and the data are not symmetric)
- Group variances are quite different (especially when there are significant outliers)
Some characteristics of the Kruskal-Wallis test are:
- No assumptions are made about the type of underlying distribution.
- However, it is assumed that all groups have a distribution with the same shape (i.e. a weaker version of homogeneity of variances).
- No population parameters are estimated (and so there are no confidence intervals).
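Property 1: Define the test statistic
$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1)$$
where k = the number of groups, nj is the size of the jth group, Rj is the rank sum for the jth group and n is the total sample size, i.e.
$$n = \sum_{j=1}^{k} n_j$$
Then H is approximately distributed as χ²(k – 1), provided nj ≥ 5, under the following null hypothesis: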
H0: The distribution of scores is equal across all groups
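As a rough illustration of Property 1, the following Python sketch pools the samples, assigns average ranks to ties, computes H and looks up the chi-square upper-tail probability. The function names (`average_ranks`, `kruskal_wallis_h`, `chi2_sf`) and the sample scores are assumptions made for this example; no tie correction is applied, and the chi-square tail uses a standard closed-form series for integer degrees of freedom rather than a statistics library.

```python
from math import erfc, exp, pi, sqrt


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, df) for a list of samples, without tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        r_j = sum(ranks[start:start + len(g)])  # rank sum R_j of this group
        total += r_j ** 2 / len(g)
        start += len(g)
    h = 12 * total / (n * (n + 1)) - 3 * (n + 1)
    return h, len(groups) - 1


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    total = 0.0
    term = sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * total


# Hypothetical scores for three groups (not the data from Example 1).
groups = [
    [27, 31, 35, 40, 44],
    [22, 25, 31, 33, 38],
    [18, 20, 24, 26, 29],
]
h, df = kruskal_wallis_h(groups)
print(f"H = {h:.3f}, df = {df}, p = {chi2_sf(h, df):.4f}")
```

Each hypothetical group has five scores, so the nj ≥ 5 proviso for the chi-square approximation is met.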
Observation: If the assumptions of ANOVA are satisfied, then the Kruskal-Wallis test is less powerful than ANOVA.
An alternative expression for H is given by
$$H = \frac{SS'_B}{n(n+1)/12}$$
where SS′B is the sum of squares between groups using the ranks instead of the raw data. This is based on the fact that, under the null hypothesis, n(n+1)/12 is the expected value (i.e. mean) of the distribution of SS′B/(k – 1).
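The agreement between the rank-sum form of H and the sum-of-squares form can be checked with a short derivation. This sketch relies only on the fact that the rank sums add up to n(n+1)/2, so that the mean of all ranks is (n+1)/2. The symbol $\bar{R}_j = R_j/n_j$ (the mean rank of group j) is introduced here for illustration.

$$
\begin{aligned}
SS'_B &= \sum_{j=1}^{k} n_j\left(\bar{R}_j - \frac{n+1}{2}\right)^2
       = \sum_{j=1}^{k} \frac{R_j^2}{n_j} - \frac{n(n+1)^2}{4} \\
\frac{SS'_B}{n(n+1)/12} &= \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) = H
\end{aligned}
$$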
If there are small sample sizes and many ties, a corrected Kruskal-Wallis test statistic H′ = H/T gives better results, where
$$T = 1 - \frac{\sum (f^3 - f)}{n^3 - n}$$
Here the sum is taken over all scores where ties exist and f is the number of tied scores at that level.
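The ties correction can be sketched in a few lines of Python. The helper names `tie_correction` and `corrected_h` are hypothetical, and `scores` is assumed to be the pooled list of raw scores from all groups.

```python
from collections import Counter


def tie_correction(scores):
    """Return T = 1 - sum(f^3 - f) / (n^3 - n), summing over tied values."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    counts = Counter(scores)  # f = how many scores share each value
    tie_sum = sum(f ** 3 - f for f in counts.values() if f > 1)
    return 1 - tie_sum / (n ** 3 - n)


def corrected_h(h, scores):
    """Return H' = H / T for an uncorrected statistic H and the pooled scores."""
    t = tie_correction(scores)
    if t == 0:
        raise ValueError("all scores are identical, so H' is undefined")
    return h / t
```

When there are no ties, T = 1 and H′ = H; since T ≤ 1, the correction can only increase the statistic.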
Example 1: A cosmetic company created a small trial of a new cream for treating skin blemishes. It measured the effectiveness of the new cream compared to the leading cream on the market and a placebo. Thirty people were put into three groups of 10 at random, although just before the trial began 2 people from the control group and 1 person from the test group for the existing cream dropped out. Figure 1 shows the number of blemishes removed from each person during the trial.
Figure 1 – Data for Example 1
Since the groups are of unequal size and variances for the groups are quite unequal, we use the Kruskal-Wallis test instead of ANOVA (Figure 2).
Figure 2 – Kruskal-Wallis test for Example 1
Using the RANK_AVG function we obtain the ranks of each of the raw scores and then calculate the sum of the ranks for each group, namely R1 = 187.5, R2 = 76.5 and R3 = 114. H is calculated to be 7.91 using the formula given above, namely =12*J17/(J16*(J16+1))-3*(J16+1). The p-value is then calculated using the formula =CHIDIST(J18, J19). Since p-value = .01915 < .05 = α, we reject the null hypothesis and conclude that there is a significant difference between the three treatments.
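The arithmetic for Example 1 can be reproduced outside Excel from the rank sums alone. In this Python sketch the group sizes 10, 9 and 8 are an assumption inferred from the dropouts described in Example 1 (the raw data in Figure 1 are not used), and the p-value uses the closed form exp(-H/2), which holds only for a chi-square distribution with 2 degrees of freedom.

```python
from math import exp

rank_sums = [187.5, 76.5, 114.0]  # R1, R2, R3 as reported for Example 1
sizes = [10, 9, 8]                # assumed group sizes after the dropouts

n = sum(sizes)                    # total sample size
k = len(sizes)                    # number of groups
sum_sq = sum(r ** 2 / m for r, m in zip(rank_sums, sizes))

h = 12 * sum_sq / (n * (n + 1)) - 3 * (n + 1)
df = k - 1
p_value = exp(-h / 2)             # chi-square upper tail, valid for df = 2 only

print(round(h, 2), round(p_value, 5))  # prints 7.91 0.01915
```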
Note that we can perform a one-way ANOVA on the ranks using the ANOVA: One Factor data analysis tool to find SSB. This provides an alternative way of calculating H (see Figure 3), since H is equal to
$$\frac{SS_B}{n(n+1)/12}$$
Figure 3 – ANOVA on ranks for data in Example 1
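The ANOVA-on-ranks route can also be checked numerically. The Python sketch below computes the between-groups sum of squares of the ranks from the rank sums reported for Example 1 and divides by n(n+1)/12. The function name `ss_between_from_rank_sums` is hypothetical, and the group sizes 10, 9 and 8 are assumed from the description of the dropouts.

```python
def ss_between_from_rank_sums(rank_sums, sizes):
    """Between-groups sum of squares of the ranks, from rank sums and sizes."""
    n = sum(sizes)
    grand_mean = (n + 1) / 2  # mean of all ranks, since they sum to n(n+1)/2
    return sum(m * (r / m - grand_mean) ** 2 for r, m in zip(rank_sums, sizes))


rank_sums = [187.5, 76.5, 114.0]  # R1, R2, R3 as reported for Example 1
sizes = [10, 9, 8]                # assumed group sizes after the dropouts
n = sum(sizes)

ss_b = ss_between_from_rank_sums(rank_sums, sizes)
h_alt = ss_b / (n * (n + 1) / 12)

print(round(ss_b, 3), round(h_alt, 2))  # prints 498.375 7.91
```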
Real Statistics Functions: The Real Statistics Resource Pack contains the following supplemental functions:
KRUSKAL(R1) = value of H on the data (without headings) contained in range R1 (organized by columns).
KTEST(R1) = p-value of the Kruskal-Wallis test on the data (without headings) contained in range R1 (organized by columns).
For Example 1, KRUSKAL(B5:D14) = 7.91 and KTEST(B5:D14) = .01915.
The resource pack also provides the following array function:
KW_TEST(R1, lab, ties) = the 4 × 1 range consisting of the values for H, H′, df, p-value if lab = FALSE (default). If lab = TRUE then an extra column is added containing labels. If ties = TRUE (default) then a ties correction is applied (thus H′ = H if no ties correction is applied).
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform the Kruskal-Wallis test.
To use the tool for Example 1, press Ctrl-m, double-click on Analysis of Variance and select Single Factor Anova. When a dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears, enter B4:D14 in the Input Range, check Column headings included with data, select the Kruskal-Wallis option and click on OK.
The output is shown in Figure 4.
Figure 4 – Kruskal-Wallis data analysis
If the Kruskal-Wallis Test shows a significant difference between the groups, then pairwise comparisons can be made using the Mann-Whitney U Test. As described in Experiment-wise Error Rate and Planned Comparisons for ANOVA, it is important to reduce experiment-wise Type I error by using a Bonferroni or Dunn/Sidák correction. For two such comparisons, this amounts to setting α = .05/2 = .025 (Bonferroni) or α = 1 – (1 – .05)^(1/2) = .025321 (Dunn/Sidák).
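The two corrections are simple enough to compute directly. In this Python sketch the function names `bonferroni_alpha` and `sidak_alpha` are hypothetical, and m denotes the number of planned pairwise comparisons.

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-comparison significance level under the Dunn/Sidak correction."""
    return 1 - (1 - alpha) ** (1 / m)


# Two comparisons at an experiment-wise alpha of .05, as in the text.
print(round(bonferroni_alpha(0.05, 2), 6))  # prints 0.025
print(round(sidak_alpha(0.05, 2), 6))       # prints 0.025321
```

The Dunn/Sidák level is always slightly larger than the Bonferroni level, so it is marginally less conservative.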
There are a variety of other follow-up tests (e.g. Nemenyi, Dunn’s and Dunnett’s) which are described at Follow-up Tests to Kruskal-Wallis.