Follow-up Tests to Kruskal-Wallis

Nemenyi Test

This test is the same as the Tukey HSD Test (see Unplanned Comparisons), except that we test the difference between rank sums and use the following standard error

s.e. = √(m²k(mk+1)/12)

where k = the number of groups and m = the size of each of the group samples. The group sample sizes must all be equal. The statistic q = (Ri − Rj)/s.e., where Ri and Rj are the rank sums of the two groups being compared, has a studentized range distribution (see Studentized Range Distribution). The critical values for this distribution are presented in the Studentized Range q Table based on the values of α, k (the number of groups) and df = ∞. If q > qcrit then the two groups are significantly different.

This test is equivalent to Rmax − Rmin > qcrit ⋅ s.e.

Picking the largest pairwise difference in rank sums allows us to control the experiment-wise α for all possible pairwise contrasts; in fact, this test keeps the experiment-wise α = .05 for the largest pairwise contrast, and is conservative for all other comparisons.
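As an illustration only (not the Real Statistics implementation), the test can be sketched in Python. The rank-sum standard error matches the s.e. formula shown in Figure 3, and a large finite df is used to approximate df = ∞ in SciPy's studentized_range:

```python
import numpy as np
from scipy.stats import rankdata, studentized_range

def nemenyi(groups, alpha=0.05):
    """Pairwise Nemenyi comparisons on rank sums (requires equal group sizes)."""
    k = len(groups)
    m = len(groups[0])
    assert all(len(g) == m for g in groups), "group sizes must be equal"
    pooled = np.concatenate(groups)
    ranks = rankdata(pooled)                      # ranks over all k*m observations
    rank_sums = ranks.reshape(k, m).sum(axis=1)   # R_i for each group
    se = np.sqrt(m**2 * k * (m * k + 1) / 12)     # s.e. for rank-sum differences
    # df = infinity in theory; a large finite df approximates it
    q_crit = studentized_range.ppf(1 - alpha, k, 10**4)
    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            q = abs(rank_sums[i] - rank_sums[j]) / se
            results[(i, j)] = (q, q > q_crit)
    return results
```

For three groups of 8 with no overlap at all (ranks 1-8, 9-16, 17-24), s.e. = 20 and the extreme pair gives q = 128/20 = 6.4, well above the critical value of about 3.314.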

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform the Nemenyi test, as shown in Example 1.

Example 1: Conduct the Nemenyi Test for the data in range B3:D11 of Figure 2 to determine which groups are significantly different.

Press Ctrl-m, double-click on the Analysis of Variance option and select Single Factor Anova. When a dialog box similar to the one shown in Figure 1 appears, enter B3:D11 in the Input Range, check Column headings included with data, select the Kruskal-Wallis and Nemenyi KW options and click on OK.


Figure 1 – Selecting Kruskal-Wallis and Nemenyi Tests

The output is shown in Figure 2.


Figure 2 – Nemenyi Test

The Kruskal-Wallis Test (the middle part of Figure 2) shows there is a significant difference between the three groups (cell J12). Since the three groups are equal in size we use the Nemenyi test to determine which groups are significantly different.

The template for the Nemenyi test is generated by the Real Statistics data analysis tool, as shown on the right side of Figure 2. You begin by inserting a 1 and -1 in cells M3 and M4 to compare the New and Old groups. Since the difference between the rank sums of these two groups is 76.5 (cell N6), which is greater than 66.28, the value of x-crit (cell P9), we conclude there is a significant difference between the New and Old groups.

We can compare the New and Control groups in the same way (removing the -1 from cell M4 and inserting -1 in cell M5) and see that there is no significant difference between these groups. Similarly, there is no significant difference between Old and Control.
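The key figures can be checked by hand: with m = 8 observations in each of k = 3 groups, the standard error is √(8² · 3 · 25/12) = 20, and with q-crit = 3.314 (k = 3, df = ∞, α = .05 from the Studentized Range q Table), x-crit = 20 × 3.314 = 66.28, matching cell P9. A quick arithmetic check:

```python
import math

m, k = 8, 3                                   # 8 observations in each of 3 groups
se = math.sqrt(m**2 * k * (m * k + 1) / 12)   # = sqrt(400) = 20.0 (cell L9)
q_crit = 3.314                                # studentized range, k = 3, df = inf, alpha = .05
x_crit = se * q_crit                          # ≈ 66.28 (cell P9)
diff = 76.5                                   # rank-sum difference, New vs Old (cell N6)
significant = diff > x_crit                   # True: the groups differ
```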

Some key formulas from Figure 2 are shown in Figure 3.

Cells  Item    Formula
N3     R1      =RANK_SUM(B4:D11,1,1)
O3     n1      =COUNT(B4:B11)
L9     s.e.    =SQRT(O6^2*M9*(O6*M9+1)/12)
M9     k       =COUNT(O3:O5)
N9     q-stat  =N6/L9
O9     q-crit  =QCRIT(M9,480,Q7,2)
P9     x-crit  =L9*O9
S9     sig     =IF(ABS(N9)>O9,"yes","no")

Figure 3 – Selected formulas from Figure 2

Dunn’s Test

This test is similar to the above test and can be viewed as a version of Nemenyi’s test when the sample sizes are unequal.

Dunn’s test uses the statistic

z = (R̄i − R̄j) / s.e.

where R̄i and R̄j are the average ranks of the two groups being compared, and the standard error is

s.e. = √( n(n+1)/12 ⋅ (1/ni + 1/nj) )

where n is the total sample size and ni and nj are the sizes of the two groups. If there are a lot of ties, an improved version of the standard error is given by

s.e. = √( ( n(n+1)/12 − Σ(f³ − f)/(12(n−1)) ) ⋅ (1/ni + 1/nj) )

where f is as in the ties correction for the Kruskal-Wallis test. This test is equivalent to

|R̄i − R̄j| > zcrit ⋅ s.e.   where zcrit = NORM.S.INV(1 − α/(k(k−1)))

Here the term in parentheses, α/(k(k−1)), is a Bonferroni-like correction.
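The procedure can be sketched in Python (an illustration, not the Real Statistics implementation). The no-ties standard error and the α/(k(k−1)) correction follow the description above; the function name is illustrative:

```python
import numpy as np
from scipy.stats import rankdata, norm

def dunn(groups, alpha=0.05):
    """Pairwise Dunn comparisons on mean ranks (unequal group sizes allowed)."""
    k = len(groups)
    sizes = np.array([len(g) for g in groups])
    n = sizes.sum()
    ranks = rankdata(np.concatenate(groups))          # ranks over all n observations
    bounds = np.concatenate(([0], np.cumsum(sizes)))  # slice boundaries per group
    mean_ranks = np.array([ranks[bounds[i]:bounds[i + 1]].mean() for i in range(k)])
    # Bonferroni-like correction over the pairwise comparisons
    z_crit = norm.ppf(1 - alpha / (k * (k - 1)))
    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            se = np.sqrt(n * (n + 1) / 12 * (1 / sizes[i] + 1 / sizes[j]))
            z = abs(mean_ranks[i] - mean_ranks[j]) / se
            results[(i, j)] = (z, z > z_crit)
    return results
```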

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform Dunn’s test, as shown in Example 2.

Example 2: Find all significant differences between the blemish creams of Example 1 of Kruskal-Wallis Test at the 95% significance level.

We repeat the data from Example 1 of Kruskal-Wallis Test in range B3:D13 of Figure 4. As we saw from the Kruskal-Wallis analysis, there is a significant difference between the three groups. Unlike in Example 1, this time we use Dunn’s test since the group sizes are different.

To perform this test, we proceed as in Example 1, except that we choose the Dunn KW option instead of the Nemenyi KW option. When we press the OK button the result shown on the right side of Figure 4 is displayed.


Figure 4 – Dunn’s Test

We see that there is a significant difference between the New and Old creams. If we change the contrast coefficients in range G5:G7, we see that there is no significant difference between New and Control and between Old and Control.

Schaich-Hamerle Test

The Schaich-Hamerle test is similar to Dunn’s test, but it uses the chi-square distribution instead of the studentized range q distribution. Once again, pairwise differences of the average ranks

d = R̄i − R̄j

are compared with the critical value

dcrit = √( χ²α,k−1 ⋅ n(n+1)/12 ⋅ (1/ni + 1/nj) )

Here χ²α,k−1 is the critical value of the chi-square distribution for the given alpha and k − 1 degrees of freedom. The difference between the ith and jth groups is significant if d > dcrit.
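The critical value is a single computation; a minimal Python sketch (the function name and signature are illustrative, with chi2.ppf supplying χ²α,k−1):

```python
import math
from scipy.stats import chi2

def schaich_hamerle_dcrit(n, n_i, n_j, k, alpha=0.05):
    """Critical value for the pairwise difference of average ranks."""
    chi_crit = chi2.ppf(1 - alpha, k - 1)  # chi-square critical value, k-1 df
    return math.sqrt(chi_crit * n * (n + 1) / 12 * (1 / n_i + 1 / n_j))
```

For k = 3 and α = .05, chi2.ppf returns approximately 5.9915, the same chi-square critical value used in Example 3 below.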

Example 3: Find all significant differences between the blemish creams of Example 2 at the 95% significance level.

We summarize the results of the above analysis for the 3 pairwise comparisons in Figure 5.


Figure 5 – Schaich-Hamerle test for Example 3

Here χ2α,k-1 (in cell B36) = CHIINV(B35, B34-1) = 5.9915. For the comparison of the new and old creams, d = D29-D30 = 10.25 and dcrit = SQRT(B36*B33*(B33+1)/12*(1/B29+1/B30)) = 8.9267, and similarly for the other two comparisons.

The only significant comparison at the 95% significance level is between the new cream and the old cream where p < .05 since d > dcrit.

Steel’s Test

After a significant Kruskal-Wallis test, we can compare a control group with each of the other groups, in a manner similar to that used in Dunnett’s test after a one-way ANOVA. The test is also similar to the Nemenyi test, except that this time we use Dunnett’s Table.

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform Steel’s test, as shown in Example 4.

Example 4: Determine whether there is a significant difference between the Control and the New and Old groups for the data in Example 1.

After choosing the Steel test option from the One Factor ANOVA dialog box (see Figure 1) and filling in the contrast coefficients, we obtain the results shown in Figure 6.


Figure 6 – Steel’s Test after KW Test

We see there is no significant difference between the Old and Control groups. If we move the -1 contrast coefficient to cell G5, we would see there is no significant difference between the New and Control groups.


Contrasts

Contrasts can be used after a Kruskal-Wallis test as for one-way ANOVA. A contrast C is defined based on the contrast coefficients ci (which sum to zero) by

C = Σ ci R̄i

where R̄i is the average rank of the ith group, and the standard error is

s.e. = √( n(n+1)/12 ⋅ Σ ci²/ni )

Taking ties into account, the formula for the standard error becomes

s.e. = √( ( n(n+1)/12 − Σ(f³ − f)/(12(n−1)) ) ⋅ Σ ci²/ni )

The square of the standardized contrast, (C/s.e.)², is then tested using a chi-square distribution with n − 1 degrees of freedom.
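A minimal Python sketch of a single contrast test, under these assumptions: the contrast is computed on mean ranks with coefficients summing to zero, the no-ties standard error is s.e. = √(n(n+1)/12 ⋅ Σci²/ni), and the chi-square degrees of freedom are left as a parameter (the description above uses n − 1):

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kw_contrast(groups, coeffs, df, alpha=0.05):
    """Test one contrast on mean ranks after Kruskal-Wallis.

    coeffs must sum to zero; df is the chi-square degrees of freedom.
    """
    sizes = np.array([len(g) for g in groups])
    n = sizes.sum()
    ranks = rankdata(np.concatenate(groups))
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    mean_ranks = np.array([ranks[bounds[i]:bounds[i + 1]].mean()
                           for i in range(len(groups))])
    c = np.asarray(coeffs, dtype=float)
    C = (c * mean_ranks).sum()                              # the contrast
    se = np.sqrt(n * (n + 1) / 12 * (c**2 / sizes).sum())   # no-ties standard error
    stat = (C / se) ** 2                                    # squared standardized contrast
    return stat, stat > chi2.ppf(1 - alpha, df)
```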

Example 5: Determine whether there is a significant difference between the Control and the average of the New and Old groups for the data in Example 2.

This time we choose the Contrasts KW option from the One Factor ANOVA data analysis tool. We see from Figure 7 that there is no significant difference.


Figure 7 – Contrasts after KW Test

45 Responses to Follow-up Tests to Kruskal-Wallis

  1. Ming says:


    First off–thank you for your excellent work! Your tools and great explanations are fantastic.

    I’m no statistician, but I do like to understand how things work. I think you may have a minor glitch in the Bonferroni correction for the Dunn test (I have not checked others). The Bonferroni correction is applied in calculating d-crit. [=NORMSINV(1-Alpha/(k*(k-1)))], where k=the number of groups. The denominator is meant to calculate m, the number of orthogonal tests. To do this correctly, it should all be divided by 2.

    i.e. for 3 groups, the current equation indicates m=6 comparisons being made. (and for 4, 12).

    If I’m correct, please consider this my humble contribution to your great work!

  2. Dennis says:

    Hello Charles,

    I love using the package! However, I just discovered an outcome that I do not understand. Despite a highly significant KW multiple-group comparison, I cannot detect any significant differences in the follow-up Dunn test at all. As in your example, I simply put 1 or -1 in the output table for the combinations I want to test. I have different group sizes and some variables are not normally distributed even after various data transformations; that's why I chose the nonparametric procedures. Am I doing something wrong here? Thanks, Dennis

    • Charles says:

      If you send me an Excel file with your data and the tests you have performed, I will try to understand why you are getting the results that you are seeing. You can get my email address at Contact Us.

  3. Gerardo Burton says:

    Hello Charles
    thank you for this excellent statistical package! It’s been very useful. I have, however, a problem. I sent my work for publication in a scientific journal and used ANOVA followed by Tukey’s test to analyze the results. However, a reviewer asks that I use Kruskal-Wallis followed by Dunn’s test. I tried to do this, but when I select ANOVA single factor the dialog box differs from the one in Example 2 in that there are no follow-up tests with “KW” (and no “Dunn test KW” to choose). Should I choose Tukey HSD?
    Thank you

    • Charles says:

      Hello Gerardo,
      It sounds like you are not using the latest version of the Real Statistics software. If you are using Excel 2007, 2010, 2013 or 2016 you should install the latest version of the software. Dunn KW is one of the options in the latest version of the software. Unfortunately, this capability is not yet available on the MAC or Excel 2003 versions of the software. In this case, you could use Tukey HSD, but that is not the advice you got from the reviewer.

  4. Andrew says:

    Hi Charles,

    Realstats is a great program and I love the website too.

    I’m trying to run a Dunn’s test as a follow-up to the Kruskal Wallis test. I have a Macbook Pro and, for some reason, the latest RealStats for Macs doesn’t seem to include the Dunn’s Test. Is that correct?



  5. john says:

    Can you tell me how you calculated q-critical using the Nemenyi Test?
    I tried it in R but am getting a different result.
    I used qtukey(p=.05,nmeans=3,df=2,lower.tail=F) and got 8.330783.
    qtukey is used in R to calculate the studentized range distribution. Looking at the same distribution via the link provided by you in the article gave me the same result. The calculation you described in the formula table just after the method seems unclear. Please help.

    • Charles says:


      If you use the Real Statistics formula =QINV(0.05,3,2) you get the exact same value as you got in R, namely 8.330783.

      According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity.


  6. john says:

    I suspect the credibility of your result, as I am getting a different result in R.

    • Charles says:


      As I explained in my previous response, the Real Statistics formula =QINV(0.05,3,2) yields the exact same value as you got in R, namely 8.330783.

      According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity. Are you getting a different result from R for the Nemenyi test?


  7. Jill says:

    I’ve found this page very useful – thank you! But I’m having a problem with matching the standard error in Example 2, doing my own calculations.
    Is there a discrepancy between the formula used for Example 2 and the example spreadsheet?
    For the standard error, the formula for Example 2 starts with N*(N+1) (which matches the references). However the spreadsheet on the matching page (DUNN) has N*(N-1). If I use N*(N-1) then I match the standard error in the spreadsheet and Figure 4.
    Thank you for your help and an excellent resource

    • Charles says:

      There is an error in the calculation of the standard error. It should start with N*(N+1) and not N*(N-1). I will correct this in the next release.
      Thank you very much for finding this error. I really appreciate your help in improving the website and software tools.

  8. Davide says:

    Hi Charles!
    Thank you for these tools, they are awesome!!
    Can I ask you a (maybe stupid) question? Let me explain my dataset. I have 5 samples for each of the 6 different conditions (Control, Concentration1, Concentration2 and so on…). I ran a Kruskal-Wallis test and the significance result is “yes”. So I did a Nemenyi Test to compare each of the different concentrations (one at a time) to the Control (putting -1 in the Control and 1 in the concentration, one at a time) and recorded the results. Is that correct? Should I change the “k” value from 6 to 2, since I’m comparing only two conditions at a time?
    Thank you in advance for the answer!
    I’m asking because I tried to run the KW and Nemenyi for Control and only one condition (e.g. Control-Concentration5) and the results are different.
    And what about Dunnett’s test after an ANOVA with the same type of dataset? Should I change something in your functions?
    Thank you again!

    • Charles says:

      The approach you used seems reasonable (assuming that you need to use the KW test in the first place). You shouldn’t change the k value for either test.

      • Davide says:

        Thank you Charles.
        Yes, I need to use KW because the Shapiro-Wilk and Levene’s tests showed that in my dataset normality and homoscedasticity are not always respected, so it seemed reasonable to me to use KW and Nemenyi. That is correct, isn’t it?
        Thanks a lot again.

        • Charles says:

          Yes, except that if the homogeneity of variances assumption is not met, you should consider Welch’s ANOVA.

          • Davide says:

            Hi Charles,
            Thank you for the advice. But can you explain to me in a simple way what the difference is between Kruskal-Wallis and Welch’s ANOVA? Why should I use one instead of the other, and instead of simple single-factor ANOVA?
            In my dataset the normality and homoscedasticity assumptions are not always respected, but in the majority of cases they are.

            Thank you very much for the explanation (if you would have time to answer 🙂 )


          • Charles says:

            If the normality and homoscedasticity assumptions are met, generally you should use simple ANOVA. If normality does not hold but homogeneity of variances holds, then Kruskal-Wallis should be a good choice. When homogeneity of variances does not hold, then you should consider Welch’s test.

  9. Allison says:

    Hi Charles!
    Your website and explanations are wonderful. I’m trying to figure out though if I could apply this to my data. I have complete mass spectrometry data (lists of all proteins) for three separate mice within one healthy group. I’m trying to first determine if the mice within a group can be considered the same in regards to the protein composition of a selected tissue. So my data set contains proteins 1-900 for mouse 1,2,3 and the abundances of each of the 900 proteins. It doesn’t make sense to compare the levels between different proteins, instead I want to show that between the 3 mice they each have the same amounts separately of each protein. (Example: for all of your kitchen appliances, you might need 50 forks but you don’t need 50 ovens so it doesn’t make sense to compare the numbers of ovens to forks. But I want to show you and I have the same kitchen composition as determined individually for each appliance.)

    • Charles says:

      I like your kitchen analogy, especially since it is easier for me to understand things in the kitchen than mass spectrometry data, but it is not clear what your question is.

      • Allison says:

        Sorry for the confusion, but thank you for your response. My question I suppose is two parts:
        1. Would Kruskal-Wallis be the appropriate test to compare kitchens in order to say they are the same because they have the same contents (as each appliance is measured individually)? Our data is discrete as mass spectrometry can only record complete counts (as in 1 or 2 toasters, because 1.5 toasters makes no sense). I have also tested normality and tried numerous ways to normalize the data, but it follows a non-normal distribution. I believe K-W is the correct test, but in the field of mass spectrometry very few groups have used it. Most use a t-test or ANOVA with a disclaimer stating that they know this is not the appropriate test, but the field has not agreed on the most appropriate one so they use it anyway.
        2. What would be a good post-hoc test to identify which appliances were variable between our kitchens? In my case, what would be the right test to determine which of the 900 proteins were more variable between the three mice?
        Thank you again!

  10. Dante Salas says:

    Hi Charles,
    I have a sizable data set with upwards of 20 different categories with different sample sizes and they are mostly skewed right. I used the Kruskal-Wallis H Test and found significance, but I have not used your software yet. I was looking around to see what follow up test I should use and it seems like the Dunn’s Test is the way to go. Looking at how many comparisons that I would need to do seems substantial nearing 200. Is there a better way to go about this, or would the best thing be running through them all?

    Thank you,

    • Charles says:

      Sorry Dante, but I haven’t yet implemented a way to perform all the comparisons at once. In any case, usually you only need to perform a few comparisons before you can detect which of the others will be significant and which won’t be (based on the size of the difference between the R avg values in the case of Dunn’s Test).

  11. Nigel says:

    Hi Charles

    I’m no statistician, but I was under the impression that Dunnett’s test was a parametric test for multiple comparisons against a common control, and is used following ANOVA. The nonparametric equivalent is Steel’s test.

  12. Kevin Bluxome says:


    I believe the formula you are using under the “Pairwise Difference of Average Ranks” section that uses the chi-square is called the Schaich-Hamerle test. However, if I’m not mistaken, shouldn’t the second term in the product under the square root have a denominator of 12, not 2?

    • Charles says:

      Thank you very much for catching this error. I have made the correction to the webpage and will add this test in the next release of the software, which will be available shortly.
      I really appreciate that you have been an active participant in the Real Statistics community over several years, and have made important contributions.

      • Kevin Bluxome says:

        I’m glad I’ve been able to help out, Charles. I’m not a statistician by trade, believe it or not, but I am a self-proclaimed math enthusiast, and stats in particular has been a passion of mine for quite a long time. I guess I just got lucky and find the number-crunching of statistics to be rather fascinating. I just wish I had deeper insight into how some of these formulas were developed; I suspect that a full knowledge of such things would require a bigger handle on calculus than I currently have. But I’m always on the lookout for new techniques to use, so I have the site bookmarked for future reference! Thanks again for all the hard work you do!

  13. Federico Alessandro Ruffinatti says:

    Hi Charles,

    I think that Real-Statistics is a great project (and you are a great man!)
    Thank you for what you are doing!

    Just a question about Dunn’s post-hoc: isn’t there some way to get the actual p-value for this test beyond the sole yes/no significance value?
    If so, is it already corrected for multiple comparisons or I should correct it (or alpha) to get the adjusted value?
    Thank you again for your work!


    • Charles says:

      Thank you for your very kind statements. I am very pleased that you value the Real Statistics project.
      As noted on the webpage, d-crit = NORM.S.INV(1-alpha/(k*(k-1))). Thus the p-value is 1-NORM.S.DIST(d-stat,TRUE). This value must be compared with alpha/(k*(k-1)) to correct for experiment-wise error. Alternatively you can compare (1-NORM.S.DIST(d-stat,TRUE))*k*(k-1) with alpha.
      For the example on the webpage, p-value = .00244. The corrected value is this number times 6, i.e. .014638, a significant result since .014638 < .05.
      Charles

  14. Shane says:

    Hi Charles

    Thank you for all the work you’ve put into the Real Statistics add-in. It’s been really helpful.

    Just following your comment above, after cross-checking my workbook, I get d-crit = NORM.S.INV(1 - alpha/(k*(k-1))). Wouldn’t the p-value just be NORM.S.DIST(d-stat, TRUE) instead?

    • Charles says:

      Actually, p = 1-NORM.S.DIST(d-stat,TRUE), but you need to compare p with alpha/(k*(k-1)).
      If you want to compare with alpha, then use p = k*(k-1)*(1-NORM.S.DIST(d-stat,TRUE)).
      If you want to compare with alpha, then use p = k*(k-1)*(1-NORM.S.DIST(d-stat,TRUE))

      • Shane says:

        Thank you for your reply, but I’m also a bit confused by the contrast coefficients. What if I reverse the values to -1 and 1 in New and Old respectively? I have been playing around with this and I noticed my d-stat becomes positive or negative. How do I find out where exactly I should put 1 and -1? Does it depend on the R-mean, where the highest R-mean value gets the 1 and the lowest R-mean gets the -1? Sorry if that sounded dumb, but I’m really confused. I’ve just been trying to get the p-values for each comparison.

        • Charles says:

          This is a reasonable question, although it does not matter which item gets the -1 and which gets the +1. The sign of the d-stat will change, but this won’t affect the test result or p-value.
