# Kruskal-Wallis Test

Omnibus Test

The Kruskal-Wallis H test is a non-parametric test which is used in place of a one-way ANOVA. Essentially it is an extension of the Wilcoxon Rank-Sum test to more than two independent samples.

Although, as explained in Assumptions for ANOVA, one-way ANOVA is usually quite robust, there are many situations where the assumptions are sufficiently violated and so the Kruskal-Wallis test becomes quite useful: in particular, when:

• Group samples deviate strongly from normality (this is especially relevant when sample sizes are small and unequal and data are not symmetric)
• Group variances are quite different (especially when there are significant outliers)

Some characteristics of Kruskal-Wallis test are:

• No normality assumption is made; however, it is assumed that all groups have a distribution with the same shape (i.e. a weaker version of homogeneity of variances).
• No population parameters are estimated (and so there are no confidence intervals).

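Property 1: Define the test statistic

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1)$$

where k = the number of groups, $n_j$ is the size of the jth group, $R_j$ is the rank sum for the jth group and n is the total sample size, i.e.

$$n = \sum_{j=1}^{k} n_j$$

Then, approximately,

$$H \sim \chi^2(k-1)$$

provided $n_j$ ≥ 5 for each group, based on the following null hypothesis: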

H0: The distribution of scores is equal across all groups
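The statistic in Property 1 can also be computed directly from raw data. The following is a minimal Python sketch, not the Real Statistics implementation: it assumes the standard form H = 12/(n(n+1)) · Σ Rj²/nj − 3(n+1), gives tied values their average rank, and applies no ties correction. The function names `average_ranks`, `chi2_sf` and `kruskal_wallis` and the sample data are made up for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df (stdlib only)."""
    h = x / 2
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    else:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    # recurrence Q(a + 1, h) = Q(a, h) + h^a e^(-h) / Gamma(a + 1)
    while a < df / 2 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1
    return q

def kruskal_wallis(groups):
    """Return (H, df, p-value) for a list of independent samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_j = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r_j ** 2 / len(g)
        start += len(g)
    h = 12 * total / (n * (n + 1)) - 3 * (n + 1)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)

# Illustrative data only (three groups of five, no ties)
h, df, p = kruskal_wallis([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
print(h, df, p)
```

With three completely separated groups as in this toy data, H is large and the p-value is small, as expected.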

Observation: If the assumptions of ANOVA are satisfied, then the Kruskal-Wallis test is less powerful than ANOVA.

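An alternative expression for H is given by

$$H = \frac{SS'_B}{n(n+1)/12} = \frac{12\,SS'_B}{n(n+1)}$$

where $SS'_B$ is the sum of squares between groups using the ranks instead of raw data. This is based on the fact that, under the null hypothesis, $\frac{(k-1)\,n(n+1)}{12}$ is the expected value (i.e. mean) of the distribution of $SS'_B$, so that H has expected value k − 1, the mean of the χ²(k − 1) distribution.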

If there are small sample sizes and many ties, a corrected Kruskal-Wallis test statistic H′ = H/T gives better results, where

$$T = 1 - \frac{\sum (f^3 - f)}{n^3 - n}$$

Here the sum is taken over all scores where ties exist and f is the number of ties at that level.
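As a small illustration of the ties correction, the sketch below computes T from the pooled scores, assuming the usual form T = 1 − Σ(f³ − f)/(n³ − n). The function name `ties_correction` and the data are hypothetical.

```python
from collections import Counter

def ties_correction(pooled):
    """T = 1 - sum(f^3 - f) / (n^3 - n), summing over each tied score value."""
    n = len(pooled)
    # f is the number of observations sharing a given score; only f > 1 contributes
    tie_sum = sum(f ** 3 - f for f in Counter(pooled).values() if f > 1)
    return 1 - tie_sum / (n ** 3 - n)

# Illustrative pooled scores: one pair and one triple of tied values
T = ties_correction([1, 2, 2, 3, 3, 3])
print(T)
```

The corrected statistic is then H′ = H / T. Since T ≤ 1, the correction never decreases H, and T = 1 when there are no ties.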

Example 1: A cosmetic company created a small trial of a new cream for treating skin blemishes. It measured the effectiveness of the new cream compared to the leading cream on the market and a placebo.  Thirty people were put into three groups of 10 at random, although just before the trial began 2 people from the control group and 1 person from the test group for the existing cream dropped out. Figure 1 shows the number of blemishes removed from each person during the trial.

Figure 1 – Data for Example 1

Since the groups are of unequal size and variances for the groups are quite unequal, we use the Kruskal-Wallis test instead of ANOVA (Figure 2).

Figure 2 – Kruskal-Wallis test for Example 1

Using the RANK_AVG function we obtain the ranks of each of the raw scores and then calculate the sum of the ranks for each group, namely R1 = 187.5, R2 = 76.5 and R3 = 114. H is calculated to be 7.91 using the formula given above, namely =12*J17/(J16*(J16+1))-3*(J16+1). The p-value is then calculated using the formula =CHIDIST(J18, J19). Since p-value = .01915 < .05 = α, we reject the null hypothesis and conclude that there is a significant difference between the three treatments.
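The calculation can be checked outside Excel from the rank sums alone. The sketch below assumes group sizes of 10, 9 and 8 (new cream, existing cream, placebo), inferred from the dropouts described in Example 1, since the data in Figure 1 are not reproduced here. With k = 3 groups there are 2 degrees of freedom, for which the chi-square upper tail is simply exp(−H/2).

```python
import math

rank_sums = [187.5, 76.5, 114.0]   # R1, R2, R3 from the text
sizes = [10, 9, 8]                 # assumed group sizes (inferred, see above)

n = sum(sizes)
# sanity check: the rank sums must add up to 1 + 2 + ... + n
assert sum(rank_sums) == n * (n + 1) / 2

h = 12 / (n * (n + 1)) * sum(r ** 2 / m for r, m in zip(rank_sums, sizes)) - 3 * (n + 1)
p = math.exp(-h / 2)               # chi-square upper tail, valid for df = 2 only

print(round(h, 2), round(p, 5))
```

Under these assumed group sizes the result agrees with the values H = 7.91 and p-value = .01915 reported above.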

Note that we can perform a one-way ANOVA on the ranks using the ANOVA: One Factor data analysis tool to find $SS'_B$, the between-groups sum of squares of the ranks. This provides an alternative way of calculating H (see Figure 3) since H is equal to $\frac{12\,SS'_B}{n(n+1)}$.

Figure 3 – ANOVA on ranks for data in Example 1
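The ANOVA-on-ranks route can be sketched in a few lines. This is an illustration only: it uses the rank sums quoted for Example 1 and assumes group sizes of 10, 9 and 8 (inferred from the dropouts, not read from Figure 1). The between-groups sum of squares of the ranks is computed from the group mean ranks and the grand mean rank (n + 1)/2.

```python
rank_sums = [187.5, 76.5, 114.0]   # R1, R2, R3 from Example 1
sizes = [10, 9, 8]                 # assumed group sizes

n = sum(sizes)
grand_mean_rank = (n + 1) / 2

# between-groups sum of squares computed on the ranks
ss_b = sum(m * (r / m - grand_mean_rank) ** 2 for r, m in zip(rank_sums, sizes))

# H recovered from the sum of squares
h_alt = 12 * ss_b / (n * (n + 1))

print(ss_b, round(h_alt, 2))
```

Algebraically, 12·SS′B/(n(n+1)) expands to 12/(n(n+1)) · Σ Rj²/nj − 3(n+1), so this value coincides with the H computed from the rank sums.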

Real Statistics Functions: The Real Statistics Resource Pack contains the following supplemental functions:

KRUSKAL(R1) = value of H on the data (without headings) contained in range R1 (organized by columns).

KTEST(R1) = p-value of the Kruskal-Wallis test on the data (without headings) contained in range R1 (organized by columns).

For Example 1, KRUSKAL(B5:D14) = 7.91 and KTEST(B5:D14) = .01915.

The resource pack also provides the following array function:

KW_TEST(R1, lab, ties) = the 4 × 1 range consisting of the values for H, H′, df, p-value if lab = FALSE (default). If lab = TRUE then an extra column is added containing labels. If ties = TRUE (default) then a ties correction is applied (thus H′ = H if no ties correction is applied).

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform the Kruskal-Wallis test.

To use the tool for Example 1, press Ctrl-m, double click on Analysis of Variance and select Single Factor Anova. When a dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears, enter B4:D14 in the Input Range, check Column headings included with data, select the Kruskal-Wallis option and click on OK.

The output is shown in Figure 4.

Figure 4 – Kruskal-Wallis data analysis

Follow-up Tests

If the Kruskal-Wallis Test shows a significant difference between the groups, then pairwise comparisons can be made using the Mann-Whitney U Test. As described in Experiment-wise Error Rate and Planned Comparisons for ANOVA, it is important to reduce experiment-wise Type I error by using a Bonferroni or Dunn/Sidák correction. For two such comparisons, this amounts to setting α = .05/2 = .025 (Bonferroni) or α = 1 – (1 – .05)^(1/2) = .025321 (Dunn/Sidák).
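The two corrections are easy to compute for any number of comparisons m. A minimal Python sketch follows; the function names `bonferroni` and `sidak` are illustrative.

```python
def bonferroni(alpha, m):
    """Per-comparison significance level for m comparisons (Bonferroni)."""
    return alpha / m

def sidak(alpha, m):
    """Per-comparison significance level for m comparisons (Dunn/Sidak)."""
    return 1 - (1 - alpha) ** (1 / m)

# Two comparisons at an experiment-wise alpha of .05, as in the text
alpha_bonf = bonferroni(0.05, 2)
alpha_sidak = sidak(0.05, 2)
print(alpha_bonf, round(alpha_sidak, 6))
```

Note that comparing all pairs among three groups involves m = 3 comparisons, so the per-comparison α would be smaller than in the two-comparison case shown here.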

There are a variety of other follow-up tests (e.g. Nemenyi, Dunn’s and Dunnett’s) which are described at Follow-up Tests to Kruskal-Wallis.

### 112 Responses to Kruskal-Wallis Test

1. Kay says:

Dear Charles,

Thank you so much for your very helpful site! I was hoping you could help me select the most appropriate test for my data. I have two disease groups (A and B) with 10 patients in each (A1-A10 and B1-B10) and each patient has provided samples from different blood vessels (called P, H and V ). I.e. A1 has a A1-P, A1-H and A1-V. Also the data is all non-parametric, I think,

I’m looking to measure two things:
1) The difference between the vessels within a disease group i.e. A-P v A-H and A-V. I was thinking of doing a Friedman for this as
2) The differences between the diseases per vessel i.e. A-V compared to B-V – I was thinking of doing a Mann-Whitney U
I was also considering doing a nested ANOVA to check for the differences too. I would really appreciate any thoughts you have!

Best wishes,
Kay

• Charles says:

Kay,
When you say that your data is “non-parametric”, I assume that this means that some important assumption of the appropriate parametric test (probably some version of ANOVA) has been violated. Under this assumption, I will make the following comments:
1) Friedman’s test seems appropriate. A possible post-hoc test would be Wilcoxon’s signed ranks test (using a Bonferroni correction factor)
2) Mann-Whitney seems reasonable in this case. I am not sure what sort of nested ANOVA you had in mind, but I thought that your data wasn’t suitable for using a parametric test, such as ANOVA.
Charles

2. Sara Hervitz says:

Hello,
I am trying to examine the relationship between work experience and a score of how usable an application is. I have 4 work experience groups (less than 1 year, 1-2 years, 3-5 years, and 6-10 years). There is a score for each study participant and a corresponding work experience group. Sample size = 12. I want to do a Kruskal-Wallis analysis. Is this correct? If so, what post hoc test would I do afterward? Thank you!

• Charles says:

Sara,
What hypothesis do you want to test?
Charles

3. Jack Yang says:

Dear Charles:
Thank you for providing this useful resource. I finished an experiment but am not sure which test should be used for the analysis. I’d greatly appreciate it if you could give me your comments.
Two patients, three treatments, no carryover effect, monitoring a certain blood parameter.
Three treatments were randomly implemented. One session (treatment and monitoring) was conducted every three days, and each treatment was repeated 5 times, resulting in 15 monitoring findings for each patient. The treatment order, for example, is BABCCBAABBACCAC. I want to know which treatment affects the blood parameter more than the other treatments.

Thank you so much!

Jack Yang

4. tim says:

Very useful, my thanks for sharing this… I have put my data through the downloaded software to apply this test (Kruskal-Wallis) and it carried out the function. However, my p-value keeps returning as 3.26731E-07…

Am I missing some steps prior?

• Charles says:

Tim,
3.26731E-07 is simply .000000326731 in scientific notation. This is a very small number and means that you have a significant result.
Charles

5. Paramesvary says:

Charles,
I went through the whole of the Q&A above. None specifically addressed the questions that I would like to put forward. I would like to analyze my DVs (there are 33 of them) across 4 groups of years of experience in teaching; however, the 1st group has 29 respondents, the 2nd group has 111 respondents, the 3rd group has 59 respondents and the 4th has only 4 respondents. Will I get a reliable result with the KW test if I omit/remove the 4th group from the analysis? Pallant suggests KW should be reported using the median, but when I read through many blogs and YouTube videos, the mean rank seems to be a better option to report as the differences are obvious. If the results could be reported using the mean rank, would you please include the APA format for reporting (I couldn’t find any example). I also noticed that some of the bloggers perform KW using 2 groups each time with 3 different combinations as a post hoc test. I tried performing both KW and later the Mann-Whitney U test for every DV and noticed that the KW test gives values closer to p = .05. When using the Bonferroni correction for the alpha value, none gives me a significantly different result. Is this possible? If so, how should I report this result? Or should I just drop this particular research objective?
I’m just a beginner and taking baby steps in my endeavour. Thank you very much for reading my predicament; your guidance is very much appreciated.

Sincere regards,
Parames

• Charles says:

Parames,
Happy to help (at least where I can).
1. You can remove the 4th group from the analysis, but you should report this when presenting your results.
2. You can certainly report the group medians, but you should be cautious of how you interpret the results. If the group distributions are similar, then the KW test will test equality of medians, but if the groups are not similarly distributed, then the test doesn’t really test equality of group medians.
3. KW with two groups is equivalent to Mann-Whitney
4. If you use the KW test, then you should choose one of the post-hoc tests described on the referenced webpage
http://www.real-statistics.com/one-way-analysis-of-variance-anova/kruskal-wallis-test/
5. You should be able to report the KW test results more or less as you report ANOVA with the logical changes. I don’t know of a specific APA format for KW, but there is one for ANOVA.
Charles

6. qwerty says:

Hello,

I am currently trying to perform analysis on some data. I have a categorical variable with 9 categories and a continuous variable (this is the dependent variable). The data is not evenly distributed so I opted to use a Kruskal-Wallis test instead of an ANOVA. However, I only have a single mean value for each of the categories. For example, for my first category, parity 1, I have an average number of piglets born of 10.46, based on 387 pigs.

Originally there were 30+ pieces of raw data in each set, which were averaged out for each of the categories and provided to me in the data set. Unfortunately I no longer have access to the original raw data. Whenever I try to use software (GraphPad Prism) to analyse this, it will not provide me with a result due to the lack of data values I have to analyse.

Is there a way i can resolve this / different test which would be more relevant or is this too much of a problem to be solved?

Thanks

• Charles says:

I don’t know any way to perform such a test without the original data.
Charles

7. Mohideen says:

Hello,

Could you explain how to test whether the assumption of homogeneity of variances for the Kruskal-Wallis test is met?

Thanks,
Mohideen

• Charles says:

Mohideen,
Depending on how you plan to use the KW test, homogeneity of variances is not a requirement. See the following webpage regarding how to interpret this test:
http://influentialpoints.com/Training/Kruskal-Wallis_ANOVA_use_and_misuse.htm
Usually Welch’s test gives better results when the homogeneity of variances assumption (on the original data) is not met.
Charles

• Mohideen says:

Thanks a lot.

8. Ryan says:

Charles
Thanks for all your help so far. I’m having trouble with your ties correction equation for my H value. The equation you have detailed seems to be different from the output of your add-in. Your equation says n^3-n but the output formula reads n^2(n-1). Any advice would be greatly appreciated.

• Charles says:

Ryan,
You are correct. The formula should read n(n^2-1). The difference in the output is usually not great, and for now you would need to correct the formula manually. I will correct this in the software shortly. Strangely, the ties formula used on the referenced webpage is the right one and so gives the correct value.
Thank you very much for finding this error. I really appreciate your help.
Charles

9. Negin says:

Hi Charles,

I get a negative number for H so I can’t get p-value and I receive an error. What should I do?

Best,
Negin

• Charles says:

Negin,
I don’t think it is possible to get a negative number for H. I suggest that you double check your calculation.
If you send me an Excel file with your data and analysis, I will try to figure out what is happening. You can get my email address at
Charles

10. Tom says:

Thank you, Charles. You are very helpful, and I appreciate your time!

11. Thomas Stieve says:

Hello,

Thank you again for your wonderfully informative pages. I am trying to do an ANOVA type test on a DV that is a Likert item, scale 1 – 5. I have two categorical IVs. I obviously cannot do an ANOVA, so I’m trying to do a Kruskal-Wallis. However, I fail the Levene’s homogeneity of variance test. I have tried to collapse my values in the DV (originally it was 1 – 6, but with only one observation at 6), and normalize the distribution by using a log10, all in vain. I keep failing homogeneity of variance. My questions are 1) is there another way to meet this assumption that I am missing, and 2) if I have to give up on Kruskal-Wallis, what is an alternative? I have successfully done an ordinal regression on the data, but I think an ANOVA-like test is better to tell my story. Any help is much appreciated. Thanks!

• Charles says:

Thomas,
Without seeing your data, it is difficult for me to comment in detail, but I can say that when the homogeneity of variances assumption fails often a good solution is to use Welch’s Test. This is described on the website.
Charles

• Tom says:

Thank you. Let me try that.

• Tom says:

Thank you for the response. Two things:

1. My dependent variable is a likert item, i.e., ordinal data. Welch’s is working with a parametric dependent variable. Can I take the log10 of my dependent variable and use that as the DV for such a test?

2. I see from this post and others that Welch’s mainly is a one-way ANOVA test. However, I have two independent variables. I have seen articles that mention it is feasible to use Welch’s as two-way, but I’m not seeing how.

Thank you VERY much for your help!
Tom

• Charles says:

Tom,

1) You can certainly make such a transformation. Whether the test is useful depends on whether its assumptions are met after making this transformation.

2) I am not familiar with a two-way version of Welch’s test. There is a two-way version of Brown-Forsythe’s test (see http://www.jstor.org/stable/2529238). There is also a two-way version of Kruskal-Wallis’s test, which I described on the following webpage:

Scheirer-Ray-Hare

Charles

12. Candice Martin says:

Hi Charles,

Many thanks for this wonderful program!!

I am comparing the body composition of three groups of athletes and have used the Kruskal-Wallis test. I wanted to do the Dunn test as a follow-up but have a Mac, so unfortunately this was not possible, and I opted for the pairwise difference on average rank as you described. I just want to know if I should apply the Bonferroni correction to the alpha I am using in the formulas for the pairwise difference on average rank calculations. Your help will be much appreciated.

• Charles says:

You don’t need to apply a Bonferroni correction in this case.
Charles

Dear Charles,
Hope you are fine and doing well.

Our team performed an experiment with neonatal calves. We fed kefir to calves and investigated their health and performance.

In fact this study is entirely dependent on the fecal endpoints, where calves were given 5 different score (1−5) daily during the first 2 weeks of life.
In fact we have 3 treatment groups of calves, each was scored daily for a duration of 2 weeks.
Previously we analyzed fecal score data using following procedure in SAS:

proc mixed;
class TRT ID period;
model fecalscore= TRT period trt*period/ ddfm =kr;
repeated period / subject=ID(TRT) type=ar(1);
random id;
lsmeans TRT period trt*period/ pdiff;
run;

We submitted the manuscript to the journal, and it is the second revision that the editor says:

“These scores are not continuous data that can be mathematically combined and analyzed as such. Data should be analyzed for categorical data and results of all scores that you collected should be analyzed appropriately. Data should be presented for each possible score by animals and cases”. If there are 2 groups Wilcoxan Rank Sum is a good approach. If there are 3 or more then Kruskall Wallis could be used.

We kindly ask you is it possible to help us to solve this problem?

Attached I am sending the SAS file of fecal score data.
Thank you very much in advance.

• Charles says:

I assume that the reviewer is suggesting that since the data is not continuous, it is unlikely to be normal and so you can’t use ANOVA and t tests. Instead, he/she is recommending nonparametric tests such as Kruskal-Wallis (instead of ANOVA) and Wilcoxon Rank Sum (instead of t test). Actually you should probably use Mann-Whitney instead of Wilcoxon Rank Sum; the test is equivalent and easier to perform.
These tests are available from SAS and they are available from the Real Statistics data analysis tools.
Charles

14. lam norbu says:

I need help from the people here. Currently I am doing my M.Sc. thesis, but I am going wrong with the Kruskal-Wallis test. Please, can anybody help me?

15. lam norbu says:

I tried a lot with the Kruskal-Wallis test but I ended up with wrong results. It was tested between the number of small mammals and microhabitats (rocks, down logs, grass, litter, shrubs and trees). So if anyone can help me I would be very grateful. I am ready to share the data.

16. Ryan says:

Charles,

Thanks for making this quick summary/tutorial. It’s been really useful. I’m trying to decide if this is the right test for me to run. I’ve got data from a randomized complete block experiment. The sample size is pretty large (81 blocks, 405 observations, 5 treatments). I wanted to run it as a normal ANOVA, but I can’t seem to get the residuals normalized for two of my response variables (even after using a boxcox transformation). So, I figured since it violated the assumption of residual normality I had to start looking for nonparametric alternatives. I don’t know very much about nonparametric designs, so this tutorial has been really useful. My question is whether Kruskal-Wallis, Friedman, or some other test that I haven’t figured out yet is most appropriate.

Again, thanks so much for any advice you have.

Be well,

Ryan

• Charles says:

Ryan,
ANOVA is quite robust to violations of normality, and so unless the data is far from normal I would use ANOVA. If you have a completely random design, then you can use Kruskal-Wallis if the normality assumption is not met (although Welch’s Test is better if the homogeneity of variance assumption is not met). Friedman’s test can be used if the normality assumption is not met in case of repeated measures design or random complete block.
Charles

17. ted says:

Good day Charles,

Can I still use KW test if the sample size in each group is small (2~6)?

Thanks

• Charles says:

You can always run the test with such a small sample, but unless the effect is very big, you shouldn’t expect too much from the test. You should expect that the power of a test with such a small sample will likely be very low.
Charles

18. Fabian Sch. says:

Hi Charles,

Your plugin is really awesome and I work a lot with it to analyze large datasets. However, it can be very laborious to reorganize my data tables so that the individual samples are sorted by columns, as required by the KTEST formula. Would it be possible to modify the formula so that I can select the samples to be tested individually, e.g. =KTEST(B2:B11,B12:B21,B22:B31)? Many thanks! Fabian.

• Charles says:

Fabian,
You can do something like this by choosing Kruskal-Wallis data analysis tool and selecting the standard (stacked) format.
Charles

19. Ken says:

Hi Charles,
Your site and toolbox have been great resources! I have a question:

I am trying to analyze grade distribution data for a specific course over 3 semesters for different student majors. I have counts for total number of letter grades given A-F, and I want to group the counts according to college majors. The distribution is non-parametric (it is nearly binormal, with most students getting A’s).

I’m interested in seeing if there is a test for significance for the individual letter grades over time and/or per major. For example, say 26% of business majors, 25% of psych majors, and 29% of nursing majors received A’s in 2000, and 19%, 20%, and 17% got A’s in 2001…how do I tell if these differences are significant between majors for each year, and if the total number of A’s is significant when compared year to year? I was thinking Kruskal-Wallis since it is non-parametric?

Hope that makes sense…
Thanks!

• Charles says:

Hi Ken,
Kruskal-Wallis is a non-parametric version of one-way ANOVA. Thus you can use it say to compare the percentage of A’s between different majors in one year. It is not a multiple factor test and so you can’t tell if there are significant differences between majors in multiple years. It is not a repeated measures test and so you can’t tell whether there are significant differences in one major by year; you can use Friedman’s nonparametric test to do this.
Charles

20. Felix says:

Thank you Charles for every topic in this web site. Simple and effective to learn and for my personal opinion one of the best.
Felix

21. Sebastian says:

I tried to use the KW-test from the Real Statistics Pack. Unfortunately I got different results with the KW-test in R and in Statistica.

p=.0201 (R)
p=.0200 (Statistica)
p=.0131 (Real Statistics)

I used the KW-test in the “ANOVA: Single Factor” window over 4 groups. Can you tell me what I am doing wrong? Maybe I can send you my testdata.

• Charles says:

Sebastian,
Yes, please send me an Excel file with your data so that I can figure out what is going on.
Charles

• Sebastian says:

Dear Charles,
did you get my mail from Tuesday? 😉
Sebastian

• Charles says:

Dear Sebastian,
Yes, I received it and will look at it soon.
Charles

• Sebastian says:

Dear Charles,
thanks a lot! It works fine now with this adjustment of the data. Will you implement an automatic adjustment in the next release? This would be nice. Thanks again! I really appreciate this support! 🙂
Sebastian

• Charles says:

Sebastian,
The adjustment that I suggested in the email to you is necessary due to a problem with how Excel has implemented the RANK and RANK.AVG functions. I will try to come up with new versions of these functions that correct the problem.
Charles

22. Alistair Cullum says:

I should have noted the general form of the equation I found: α = 1–(1–.05)^(1/k), where k is the number of comparisons.

• Charles says:

Alistair,
I understood that, but the formula should be α = 1–(1–.05)^k.
Charles

• Alistair Cullum says:

But 1-(1-.05)^2 is 0.0975, not 0.0253. Unless I’m missing something, either the equation is wrong or the solution is wrong.

• Charles says:

Alistair,
Yes, you are correct. Sorry that it has taken me so long to understand. The exponent should be 1/2 instead of 2, just as you have said. I have now corrected this on the webpage. Thanks for your patience and for identifying this error.
Charles

• Alistair Cullum says:

No worries. I’m just glad it all got sorted out in the end.
Alistair

23. Alistair Cullum says:

I forgot to mention in my last comment: There seems to be a problem with one of the equations given in the page. At the end of the second-to-last paragraph it says “this amounts to setting α = .05/2 = .025 (Bonferroni) or α = 1 – 1 – .05 = .025321 (Dunn/Sidák).” The α = 1 – 1 – .05 part must be missing something, though I’m not familiar enough with the Dunn/Sidak correction to know what it’s supposed to look like.

Alistair

• Charles says:

Alistair,
Thanks for catching this typo. It should say α = 1–(1–.05)^2 =.025321 (Dunn/Sidák). I have now corrected the referenced webpage.
I really appreciate your help in making the website better.
Charles

• Alistair Cullum says:

I think the exponent should be ½ instead of 2. Otherwise the calculation gives 0.0975 instead of 0.0253.

• Charles says:

Hi Alistair,
The exponent for the Dunn/Sidak correction is 2 and not 1/2. The resulting values should be pretty similar to the Bonferroni correction.
Charles

24. Alistair Cullum says:

Hi Charles.

Thanks as always for your work here. I wonder if you can help with a question that has befuddled me for some time. It relates to the question of distributions needing to be “similar” for the assumptions of the Kruskal-Wallis or Mann-Whitney tests to be met. Does similar mean same general shape, or variability, or both? For example, say we have two samples, both right-skewed to the same degree (a skewness value of +3.0, for example). But the variance of one is five times that of the other. For purposes of K-W, are these similar? I’d like to be able to give my students some definitive advice on when they can use these rank-based tests versus needing to look for another solution.

Thanks,
Alistair

• Charles says:

Hi Alistair,

I believe that similar includes homogeneity of variances. See, for example, the following paper:
http://www.ncbi.nlm.nih.gov/pubmed/10710850

For two samples, if the data is not too skewed, you can use a t test (with Welch’s correction) or a transformation (e.g. log). For more than two samples, Welch’s ANOVA is often a good approach even with heterogeneous variances.

The Mann-Whitney test really tests the following null hypothesis:

H0: if one observation is made at random from each population (call them x0 and y0), then the probability that x0 > y0 is the same as the probability that x0 < y0.

If this is what you require then I don't believe that homogeneity of variances is an issue.

Charles

• Alistair Cullum says:

Thanks, Charles. It looks like the robustness of these rank-based methods is less than I’d been led to believe by some older texts. Based on my reading of this paper – http://www.ncbi.nlm.nih.gov/pubmed/19247980 – I’m going to be much more circumspect in recommending the ranks approach to students, instead suggesting that they use a transform and/or Welch’s method whenever possible.

• Charles says:

Alistair,
That sounds correct to me.
Charles

25. Robert says:

Hi Charles,

I have a problem with calculating the p-value;
for some reason it just doesn’t work.

I tried with the =CHIDIST function (for example: =CHIDIST(G17:G18); =CHIDIST(G17,G18); etc.), but it shows me an error.

Can you tell me what the problem could be?

Thank you very much.

• Charles says:

The CHIDIST function takes the form =CHIDIST(x, df). The examples you gave don’t have two arguments and the first argument can’t be a range.
Charles

26. Alison says:

Dear Charles,
I am analyzing a data set collected from course catalogs of various institutions of higher education. I am trying to learn more about their online offerings in a discipline. My three groups are the types of institutions (2-yrs, 4-yr public, 4-yr private). Samples sizes are 16 or 17 institutions per group. I had planned to examine differences between the mean number of online courses offered between these groups using ANOVA. However, the data do not appear to be normally distributed (many institutions in one of my groups do not offer any online courses in this discipline) and because of all of the zeros, the variances are quite different. Would it be appropriate to analyze these data with a Kruskal-Wallis Test? If so, with a significant result, would it be appropriate to follow up with Mann-Whitney test for pairwise comparisons if I use a Bonferroni correction?
Thanks for considering my questions!
Alison

• Charles says:

Alison,
Since the homogeneity of variances assumption is violated, Welch’s test is usually the preferred test to use. You can follow up with the Games-Howell test. Both of these tests are described on the Real Statistics website.
Charles

• Alison says:

Hi Charles,
What if the data also appear to have a non-normal distribution? Which test, Kruskal-Wallis or Welch’s ANOVA, is more robust when both of these assumptions appear to be violated?

BTW, I have done the analysis three ways now
-ANOVA followed by Bonferroni-corrected contrasts
-Kruskal-Wallis followed by Mann-Whitney U contrasts
and
-Welch’s ANOVA using Games Howell for contrasts.
I get the same results either way.
Alison

• Charles says:

Alison,

You asked which test is better when both of these assumptions are violated, but you have only listed one assumption, namely normality. I assume that the other assumption is homogeneity of variances. When this latter assumption is not met, Welch’s ANOVA is usually chosen over KW. I don’t know how robust Welch’s is to violations of normality, but usually these sorts of tests are much more sensitive to violations of the equal variance assumption. Another approach is bootstrapping.

The following is an article that may shed some more light on the situation:
http://home.cc.umanitoba.ca/~kesel/Cribbie_param_bootstrap_feb_2010.pdf

Charles

27. Mike says:

Is it possible to get p-value with confidence interval for Kruskal-Wallis test? Your articles states that there is no confidence interval for Kruskal-Wallis, but my paper reviewer insists I need to put one, together with p-value ( I have adjusted p-value for it, but not CI)

• Charles says:

Mike,
I can think of two approaches to getting a confidence interval. One is to construct an exact test, similar to the way I have done it for Wilcoxon Signed-ranks or Mann-Whitney. The other is to conduct resampling, as I have done for one sample and two sample tests of the median or for ANOVA. I believe that SAS provides a confidence interval, probably associated with the exact test.
Charles

28. Mehak says:

Could you please tell me whether, for the Kruskal-Wallis test, we can use the same set of subjects for each group, e.g. the same subjects (subject 1 to subject 10) from the population?
In the above example in this thread:
We have three groups, e.g. new cream, old cream and placebo, to see the effects of the new cream in comparison to the available creams. Could we apply these treatments or creams to the same set of subjects for the Kruskal-Wallis test to see the difference between the groups?
I would be thankful, if someone can provide me the answer for my question.

• Charles says:

If you want to use the same subjects you should consider Friedman’s test instead.
Charles

• Mehak says:

Dear Charles,
Thank you for your reply. I was using this test before, but will now use Friedman’s test on my data. In this test, if I am not wrong, we need to rank the columns (groups) for each subject separately.

I have one more question, and I hope it is fine if I ask about ANOVA on this page.

For ANOVA, the assumption is that the sample distributions should be normal, and if they are not normal we need to apply non-parametric tests, e.g. Kruskal-Wallis or Friedman.

But I was looking at some of the examples in a statistical toolbox, where ANOVA was applied to sample data that is not normally distributed.

Could you please tell me whether it is fine to apply ANOVA even when our sample distributions are not normal?

In my case, the ANOVA results look good even though the sample distributions are not normal. I am unsure whether this is the right way to report our work and whether this test can be used with non-normal sample distributions.

Thanks,
Mehak

• Charles says:

Mehak,
Anova is pretty robust to violations of the normality assumption. This means that unless the data is quite skewed the test should work pretty well even if the normality assumption is violated. More important is the homogeneous variance assumption. If the groups have very different variances then you should consider using a different test. In this case Welch’s test is probably the best choice with independent samples.
If you have repeated measures (the same subjects in each group), then you shouldn’t use the ordinary ANOVA but repeated measures ANOVA instead.
Charles

• Mehak says:

Dear Charles,

I really appreciate your quick reply to my query, which cleared up my doubt. I spent almost a week searching for a solution to my problem, and yesterday I came across this blog by chance. It is very helpful; thank you for providing such useful answers to our problems.

Related to my problem, I have used ANOVA for three different groups (score 1, score 2 and score 3) with the same subjects in each group. It would be kind of you to give me feedback on one basic question, since I do not want to base my conclusions on wrong assumptions. My main question is about the sample data: can we use the same subjects for each group in ANOVA, for example the same subjects (Subject 1 to Subject 50) in each group?

We can also use post hoc tests after parametric and non-parametric tests. I am planning to run post hoc tests after the ANOVA and Friedman tests. Would it be fine to use these tests for further investigation?

Thank you
Mehak

29. Gordon says:

Dear Charles,

Will the Kruskal-Wallis test give reliable results if one group is very large compared with the others? In one dataset I’m analysing, we’ve got one group of roughly 500 individuals, another of about 100, then 3 groups of 15-50 each. Intuitively, it would seem that the one very large group would dominate the overall “structure” of the rankings, making it less likely that the smaller groups could have significant influence.

I realise there is no requirement that the groups should be of the same size, but do the assumptions of the test break down (say) in the limit where one group gets extremely large whilst the others become tiny?

• Charles says:

Gordon,
I don’t know of any breakdown in the KW test when the group sample sizes are very different, but I can’t answer your question definitively.
Usually in the case of unequal sample sizes, the statistical power tends to be reduced, since the power is affected more by the smaller samples than by the larger ones. I don’t know whether this is true for the KW test, but it is probably the case.
Charles

30. Juan says:

Dear Charles,

Thank you for your invaluable help on statistics. I ran a survey with several questions using a 5-point (Likert) scale on six different groups (between 12 and 17 people in each group). I want to know if there are differences between the groups for each question. So:
1. Do I use the Kruskal-Wallis test first and then make pairwise comparisons using the Mann-Whitney test?
2. What kind of test should I use to evaluate correlations between questions (Spearman’s test)?

Thank you so much,
Juan

• Charles says:

Juan,
1. Kruskal-Wallis is a reasonable approach. If there is a significant result then you can follow up with pairwise Mann-Whitney tests (probably using a Bonferroni correction). If the KW test doesn’t yield a significant difference, then you don’t need to do pairwise MW tests.
2. Spearman’s is good, but Kendall’s is probably better.
Charles
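
The workflow in point 1 can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it uses the chi-square approximation for the tie-corrected H statistic and the tie-corrected normal approximation (no continuity correction) for Mann-Whitney, both of which can be rough for small groups, where an exact test is preferable. The function names and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import erfc, exp, gamma, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
    counts = Counter(values)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed form for integer df)."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        return exp(-h) * sum(h ** i / gamma(i + 1) for i in range(df // 2))
    terms = sum(h ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(h)) + exp(-h) * terms


def kruskal_wallis(groups):
    """Tie-corrected H and its chi-square p-value (df = k - 1).

    Assumes the pooled data are not all identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


def mann_whitney_p(a, b):
    """Two-sided p-value from the tie-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / sqrt(var)
    return erfc(abs(z) / sqrt(2))


def kw_then_pairwise(groups, alpha=0.05):
    """Run KW; only if it is significant, run Bonferroni-adjusted pairwise MW tests."""
    h, p = kruskal_wallis(groups)
    if p >= alpha:
        return h, p, {}
    pairs = list(combinations(range(len(groups)), 2))
    cutoff = alpha / len(pairs)  # Bonferroni: split alpha over all pairs
    results = {}
    for i, j in pairs:
        p_ij = mann_whitney_p(groups[i], groups[j])
        results[(i, j)] = (p_ij, p_ij < cutoff)
    return h, p, results


# Hypothetical 5-point Likert responses for three groups.
likert = [[1, 2, 2, 3, 1, 2], [3, 4, 3, 5, 4, 4], [2, 3, 3, 2, 4, 3]]
h, p, pairwise = kw_then_pairwise(likert)
print(f"H = {h:.3f}, p = {p:.4f}")
for pair, (p_ij, significant) in pairwise.items():
    print(pair, f"p = {p_ij:.4f}", "significant" if significant else "not significant")
```

With six groups there are 15 pairwise comparisons, so the Bonferroni cutoff for alpha = 0.05 would be 0.05/15 ≈ 0.0033 per comparison.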

• Juan says:

Thanks for the answer. I have another question.
For some questions one group has a bimodal distribution, while the other groups are generally unimodal. Can I use Kruskal-Wallis under those conditions?

• Charles says:

Juan,
For the KW test to be valid the group distributions should have similar shape. The validity of the test depends on how strongly this assumption is violated. If you have decided not to use ANOVA because of non-homogeneity of the variances, then you are probably better off using Welch’s Test instead of KW.
Charles

• juan says:

Dear Charles,

Actually, I decided not to use ANOVA because my data is ordinal (questions using a 5-point scale on six different groups of between 12 and 17 people each). Despite this, can I use Welch’s test? If not, what could I use when I have unimodal and bimodal distributions and need to know if there is a significant difference? Is visual inspection (of a histogram) enough?

Thank you

• Charles says:

Juan,

Likert scales, although technically ordinal, are often treated as interval (or even continuous). The more points on the scale (e.g. 7-point vs 5-point), the better this assumption holds. The following is a link to an article that compares various options for handling Likert scales in ANOVA-like tests.

http://digitalcommons.wayne.edu/cgi/viewcontent.cgi?article=1774&context=jmasm

Charles

31. elias says:

I collected tree density data (count data) grouped into three groups, and I want to analyse it. Which test is appropriate for me? I am thinking of the Kruskal-Wallis test. Can anyone help me?

• Charles says:

Elias,
You haven’t provided enough information to answer your question. Depending on the details, you might use ANOVA, Kruskal-Wallis or Friedman.
Charles

32. Sourav Mukherjee says:

Hello Charles,
Thank you first of all for your efforts and time for putting up your knowledge in this web-page and that too in such easy language. It’s been very helpful. But even then, I have some doubt on whether ‘kruskal wallis test’ should be considered as the most appropriate one to measure the difference in the dataset I have. Please help me out with this doubt.

I have a dataset comprising 30 items (viz., 30 rows) in total. It has been collected for 4 types of items (viz., 4 columns). The problem is that some of the rows are blank in some of the columns (viz., no data for certain items). There are 17 non-blank rows in column 1, 6 in column 2, 8 in column 3 and 13 in column 4; but, as stated earlier, for any row i not all 4 columns j are populated. No Std-Dev. values are given; only the Mean values were used to populate this (aforementioned) sparse table. A look with the naked eye suggests that there are differences between these 4 columns. It seems to me that the ‘kruskal wallis test’ would be appropriate to measure the extent of their differences. Am I right? Moreover, should I do some other tests (say, 6 Wilcoxon Rank-sum tests, taking the 4 columns in pairs), or something else? Please let me know your opinion.

Thanking you
Sourav

• Charles says:

Sourav,
You can perform the Kruskal-Wallis test even with some blank cells or blank rows. In fact, the example I give on the referenced webpage has some blank cells. The test just ignores all the blank cells. It is important that the types of items (i.e. columns) are independent; otherwise you might want to consider the Friedman’s non-parametric test.
Charles

Dear Charles

I am carrying out a survey and I want to test my hypothesis using the Kruskal-Wallis test. My population size is 40 and my questionnaire has 6 questions.

My responses to each questionnaire have five headings, but I do not know how to fit these data into the online software to run the Kruskal-Wallis test.

• Charles says:

Please clarify your request. I understand that 40 people take a questionnaire consisting of 6 questions, but I don’t understand the 5 headings. Also what are you trying to test using Kruskal-Wallis?
Charles

34. Katie says:

Hi I have reported my results for the KW test as e.g. x2(3, N = 151) = 4.56, p = .229

I have been told I should report the ‘K statistic’

What is this and how do I find it? Or do I just need to rejig the above?

Thanks

• Charles says:

Katie,
I believe this is what is called H on the referenced webpage. Some use the letter K instead of H.
Charles

• Katie says:

Thank you Charles,

I was starting to think such a statistic didn’t exist!

So instead of x2 i should be reporting K then?

Katie

• Charles says:

Katie,
I don’t know what X2 is. This is cell X12 in Figure 4 of the referenced webpage.
Charles

35. Dicky says:

Many thanks for this macro. I had already tried several statistical software packages before coming to this site.

I need to analyse 15 samples with 4 replicates each. The results are ordinal (1 – 5). Can this be done with Kruskal-Wallis, or do I have to use another statistical approach?

Best Regards

• Charles says:

Dicky,
If the 4 “replicates” are independent samples then you can use the Kruskal-Wallis Test. (You could use ANOVA if the assumptions are met). If the 4 “replicates” are not independent then you might look at Friedman’s Test instead of Kruskal-Wallis.
Charles

36. Robert says:

Dear Charles,

First, let me thank you for your invaluable help on statistics.
I’ve got a question about the p-value in my calculation. After running the K-W test I got a p-value of .37. At the same time the result was reported as significant. How can this be? I thought p had to be smaller than alpha to be significant.
Could you enlighten me?

• Charles says:

Robert,
If the p-value >= alpha then the sig should be equal to “no”. I just tested the KW data analysis tool and it seems to work fine. Are you using the KW data analysis tool or something else? If you send me a spreadsheet with your data and analysis I will try to figure out what’s wrong.
Charles

37. Mahboob says:

Dear charles,

Many thanks for your nice explanation. I am working on a review and I want to compare the mean of one parameter across nine groups from nine studies. For every group I have the mean, SD and sample size. I don’t know which software or formula I could use to compute the test statistic (ANOVA or Kruskal-Wallis) for my analysis because, as you know, most statistical formulas and software work only from raw data. I also tried “Review Manager”, but it didn’t work with only one study per group. I would be grateful if you could help me in this regard. Thank you.

Mahboob

• Charles says:

Mahboob,

Kruskal-Wallis is based on the ranks of the raw data. Since you don’t have the raw data you won’t be able to use this test.

You have enough information to use a Single Factor Anova, as follows. Suppose that you have k groups whose sizes are n1, n2, …, nk, whose means are m1, m2, …, mk and whose variances are v1, v2, …, vk (each variance is the square of the corresponding SD). Now define n = n1 + … + nk and m = the grand mean = (n1*m1 + … + nk*mk)/n, which reduces to the simple average of the means (m1 + … + mk)/k when all the groups have the same size. Define SSW = v1*(n1-1) + … + vk*(nk-1) and SSB = (m1-m)^2*n1 + … + (mk-m)^2*nk. Define dfB = k – 1, dfW = n – k, MSB = SSB/dfB, MSW = SSW/dfW and F = MSB/MSW. The p-value of the test is then FDIST(F, dfB, dfW).

Charles
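
The calculation above can be sketched in Python as follows. This is a minimal illustration with hypothetical function and variable names; it uses the size-weighted grand mean, which coincides with the simple average of the group means when all groups have the same size. It returns only F and the degrees of freedom, so the p-value would still come from FDIST(F, dfB, dfW) or an equivalent F-distribution function.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA F statistic from group sizes, means and variances.

    Returns (F, dfB, dfW). Pass variances, i.e. squared standard deviations.
    """
    k = len(sizes)
    n = sum(sizes)
    # Grand mean weighted by group size.
    grand_mean = sum(nj * mj for nj, mj in zip(sizes, means)) / n
    ssb = sum(nj * (mj - grand_mean) ** 2 for nj, mj in zip(sizes, means))
    ssw = sum((nj - 1) * vj for nj, vj in zip(sizes, variances))
    df_b, df_w = k - 1, n - k
    f = (ssb / df_b) / (ssw / df_w)
    return f, df_b, df_w


# Hypothetical summary data for three studies: sizes, means and SDs.
sizes = [12, 15, 10]
means = [4.1, 5.3, 4.8]
sds = [1.2, 1.5, 1.1]
f, df_b, df_w = anova_from_summary(sizes, means, [s ** 2 for s in sds])
print(f"F({df_b}, {df_w}) = {f:.3f}")
```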

38. Valerie says:

Hi!

I wonder if you could help me with my survey.
I have 4 factors, namely Work Conditions, Psychosocial Factor, Work Culture and Leaders. I wanted to test the reliability, so I did pilot testing with a total of 7 respondents.
Alpha is 94%.
But on checking the alpha per factor, some factors got a really low alpha.
Does this mean I have to modify the questionnaire, or can I just stick with it since the overall alpha is 94%?

Thank you so much

• Charles says:

Valerie,

Does your survey have (a) four questions or (b) four factors (with each of the four factors having multiple questions)?

If your survey has four factors (option b), then you should generally not calculate Cronbach’s alpha on the survey with all the questions. The fact that you got a high alpha for the total questionnaire and a low alpha on the questions for one factor is surprising to me.

If instead your survey has four questions (option a) and the overall alpha is high but the alpha with one question removed is low, then this is also surprising to me, but indicates that that question has problems.

Charles

39. Chairul Bahri says:

Hi Charles,
Suppose one has performed the Wilcoxon rank sum test along with the Kruskal-Wallis test. Does she also need to perform Mann-Whitney U tests?

Thanks,
Chairul

• Charles says:

Hi Chairul,
No. The Wilcoxon rank sum test is equivalent to the Mann-Whitney test.
Charles
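
The equivalence can be made explicit. If $W_1$ is the sum of the ranks of the first sample (of size $n_1$) in the pooled ranking of both samples, then the Mann-Whitney statistic for that sample is

$$U_1 = W_1 - \frac{n_1(n_1+1)}{2}$$

Since the two statistics differ only by a constant that depends on the sample size, they lead to the same p-value.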

40. Uttam says:

Hey Charles,
First of all, tremendous work on the Excel stat tests, which is very rare to see. Could you show how to do Mood’s Median test, with an example in Excel?

Thanks
Uttam

• Charles says:

Uttam,
I don’t currently support the Mood’s Median Test, but it is pretty easy to conduct such a test based on the chi-square test that is already supported. Suppose that you have samples S1, S2 and S3. First calculate the Median M of the combined three samples. Next create a 2 x 3 contingency table, where the first row consists of the number of elements in each of the samples which are bigger than M and the second row consists of the number of elements in each of the samples that are less than or equal to M. Now perform a chi-square independence test.
I will add this test to the website shortly.
Thanks for bringing this topic up.
Charles
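
The steps described above can be sketched in Python as follows. This is only an illustration with hypothetical names and data, not the Real Statistics implementation. It builds the 2 x k table of counts above the grand median versus at or below it, and returns the Pearson chi-square statistic with its degrees of freedom (k - 1); the p-value would then come from a chi-square distribution function such as Excel's CHIDIST. It assumes at least one value lies above the pooled median, otherwise an expected count is zero.

```python
from statistics import median


def moods_median(samples):
    """Mood's median test via a 2 x k chi-square independence test.

    Returns (grand median, chi-square statistic, degrees of freedom).
    """
    pooled = [v for s in samples for v in s]
    n = len(pooled)
    m = median(pooled)
    # Row 1: counts above the median; row 2: counts at or below it.
    above = [sum(v > m for v in s) for s in samples]
    at_or_below = [len(s) - a for s, a in zip(samples, above)]
    row_totals = (sum(above), sum(at_or_below))
    chi2 = 0.0
    for j, s in enumerate(samples):
        for row, observed in ((0, above[j]), (1, at_or_below[j])):
            expected = row_totals[row] * len(s) / n
            chi2 += (observed - expected) ** 2 / expected
    return m, chi2, len(samples) - 1


# Hypothetical samples S1, S2 and S3.
s1 = [12, 15, 11, 19, 14]
s2 = [22, 18, 25, 17, 21, 24]
s3 = [16, 13, 20, 23]
m, chi2, df = moods_median([s1, s2, s3])
print(f"median = {m}, chi-square = {chi2:.3f}, df = {df}")
```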

• Uttam Biswas says:

Thanks for your reply. Please add the Chi-Square Independence test, as I need it to complete the Mood’s Median test.

• Charles says:

Uttam,
The chi-square test is available now from the software. I will add the Mood’s Median test in the next release of the software.
Charles

• Charles says:

Uttam,
The latest release of the Real Statistics software supports Mood’s Median Test. See Blog for more information.
Charles

41. Eldho says:

Eldho Varghese

• Charles says:

Sorry Eldho, but I don’t know why the Kruskal Wallis test statistic is named “H”.
Charles

42. Tim M says:

Charles,

First, thank you for your expertise and hard work. My question is in regard to the post-hoc pairwise comparisons when using a Kruskal-Wallis analysis. In the formula for calculating the critical value you have placed 2 in the denominator of the second term, but in the text, when describing the example, you have divided by 12. Would you clarify?

Best Regards,
Tim

• Charles says:

Tim,
Thanks for catching the typo. It should be 12 in both places. I have now corrected the webpage.
Charles