Follow-up Tests to Kruskal-Wallis

If the Kruskal-Wallis test shows a significant difference between the groups, then pairwise comparisons or contrasts can be used to pinpoint the difference(s), as described for follow-up testing after a single-factor ANOVA. It is important to control the familywise Type I error rate.
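As a sketch of this workflow in Python with scipy (the data below are made-up, not from any example on this page), one can run Kruskal-Wallis first and then follow up with pairwise Mann-Whitney tests at a Bonferroni-corrected significance level:

```python
# Kruskal-Wallis followed by Bonferroni-corrected pairwise Mann-Whitney tests.
# Illustrative data only; the correction controls familywise Type I error.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

groups = {
    "A": [27, 30, 29, 25, 33],
    "B": [38, 41, 36, 40, 39],
    "C": [28, 31, 26, 30, 29],
}

h, p = kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")

if p < 0.05:
    pairs = list(combinations(groups, 2))
    alpha_per_pair = 0.05 / len(pairs)   # Bonferroni: alpha divided by number of pairs
    for g1, g2 in pairs:
        _, p_mw = mannwhitneyu(groups[g1], groups[g2], alternative="two-sided")
        print(f"{g1} vs {g2}: p = {p_mw:.4f}, significant = {p_mw < alpha_per_pair}")
```

Note that, as discussed in the comments below, uncorrected pairwise Mann-Whitney tests do not control experimentwise error; the division of alpha by the number of pairs is what provides that control.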

For more information about these follow-up tests and how to perform them in Excel, see the linked pages on this site.

References

NCSS (2012) One-way analysis of variance.
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/One-Way_Analysis_of_Variance.pdf

Hollander, M. & Wolfe, D. A. (1999) Nonparametric statistical methods, 2nd ed. Wiley

62 thoughts on “Follow-up Tests to Kruskal-Wallis”

  1. Hello Charles,
    Is it incorrect to use post hoc ANOVA tests like Tukey Kramer if I am running Kruskal Wallis?
    What is the equivalent of Nemenyi and Dunn’s test in SPSS if I have to refer to it?

    Reply
    • Sujatha,
      It is probably ok if the assumptions are met, but there is no need to use the standard ANOVA post-hoc tests since there are perfectly good KW post-hoc tests. What you shouldn’t do is shop around for a test whose results you like best. The KW post-hoc equivalent of Tukey-Kramer is Nemenyi.
      I don’t use SPSS and so I don’t know what names they use, although I would be surprised if the names were different.
      Charles

      Reply
  2. I have run a Kruskal-Wallis test and I am using the Kruskal-Wallis 1-way ANOVA pairwise comparisons. Do I report significance or adjusted significance (Bonferroni correction)?
    And what is the most appropriate way to present the results in a graph? A box plot, means and standard errors, or even a table?

    Reply
    • Hello Charlie,
      Report adjusted significance.
      Re what to graph, this depends on what you want to accomplish. What is your objective with this graph?
      Charles

      Reply
  3. Hi! Thank you very much for the RealStats Analysis Tool! It helped me so much ^^ I have a question though: my experimental dosages have significant differences with both the positive and negative controls, and I'm at a loss at what to do. Can you please help me? I used Kruskal-Wallis, and then pairwise MW for the post hoc. Thank you!!

    Reply
    • Hi Eyna,
      It seems like you have conducted the tests and so I don’t know what you are asking when you say “I’m at a loss at what to do”.
      Without more information, I don't know whether Kruskal-Wallis was the best test to use. It is very likely that there were better options than pairwise MW for the post hoc, since this doesn't control for experimentwise error.
      Charles

      Reply
  4. Charles,
    I have a question regarding the df of 480 specified in Example 1 of the Nemenyi test output. Can you please explain how this df has been obtained?

    Thanks,
    -Sun

    Reply
  5. Hi Charles, I've been really enjoying RealStats. I've performed a Kruskal-Wallis test on a set of 16 treatments (lack of normality and very heterogeneous variances). When trying to perform a Mann-Whitney test after KW, I couldn't find Mann-Whitney among the "Kruskal-Wallis follow-up options". The only options available were "Contrasts", "Nemenyi", "Dunn", "Steel", and "Schaich-Hamerle".

    Thanks.

    Kind regards.

    Reply
    • Carlos,
      In the latest version of Real Statistics (Rel 5.9) you should also find the Pairwise MW and Pairwise MW Exact options. These perform the Mann-Whitney tests. If you enter the formula =VER(), you will see which release you have.
      Charles

      Reply
  6. Hello Charles!
    Thank you for such a useful program and especially for your explanations and examples. I realize what a colossal work this is!
    But I get some strange results using Kruskal-Wallis and Dunn's test: in some cases of pairwise difference testing I see a p-value > 1 (for example, 5 or 15). I can't understand the reason. Could you help me?
    Thanks

    Reply
  7. Hello Charles,

    Thank you for this incredible tool.

    I recently ran a Kruskal-Wallis Test that returned a significant result. I performed post-hoc testing with Dunn’s Test and couldn’t find any pairwise significant values. The two values with the greatest difference in R-means had a d-stat of 2.52, lower than the d-crit of 2.93. All other d-stat values were also below the d-crit.

    Do you have any thoughts or suggestions?

    Regards,

    Shaun

    Reply
    • Shaun,
      I'd have to see the data to really answer your question, but it is not completely surprising to get a significant overall result and then find no significant pairwise follow-up result. This is because the follow-up tests correct for experiment-wise error.
      Are the pairwise sample sizes equal? In that case, perhaps you should have used a different follow-up test (although this may not have mattered).
      Charles

      Reply
  8. Hi Charles

    Thank you for all the work you’ve put into the real statistics add in. Its been really helpful.

    Just following your comment above, after cross-checking my workbook, I get d-crit = NORM.S.INV(1 – alpha/(k*(k-1))). Wouldn't the p-value just be NORM.S.DIST(d-stat, TRUE) instead?

    Reply
    • Shane,
      Actually, p = 1 – NORM.S.DIST(d-stat, TRUE), but you need to compare p with alpha/(k*(k-1)).
      If you want to compare with alpha, then use p = k*(k-1)*(1 – NORM.S.DIST(d-stat, TRUE)).
      Charles

      Reply
      • Thank you for your reply, but I'm also kind of confused by the contrast coefficients. What if I reverse the values to -1 and 1 in new and old respectively? I have been playing around with this and I noticed my d-stat becomes positive and negative. How do I find out where exactly I should put 1 and -1? Does it depend on the R-mean, where the highest R-mean value gets the 1 and the lowest R-mean gets the -1? Sorry if that sounded dumb, but I'm really confused. I've just been trying to get the p-values for each comparison.

        Reply
        • Shane,
          This is a reasonable question, although it does not matter which item gets the -1 and which gets the +1. The sign of the d-stat will change, but this won't affect the test result or p-value.
          Charles

          Reply
          • Hello Charles —
            I am using your RealStats-2010 package for analysis — thanks so much for all this excellent work! Following on Shane's second comment above (Jan 23, 2018) asking where to put the -1 and 1 for the pairwise comparison, I agree that the sign of the d-stat will change. This DOES, however, affect the calculation of the p-value in my analysis: when I have the -1 assigned to the smaller R-mean I get P=0.000145131, but when I have the direction of the pair swapped, and -1 assigned to the larger R-mean of the pair, I get P=11.99985487.
            I can stabilize this result by taking the absolute value of the d-stat in the calculation of P, as in:
            k*(k-1)*(1-NORM.S.DIST(ABS(d-stat))), but don’t know if this is appropriate.
            Thoughts?

          • Nathan,
            This has also been brought to my attention by someone else. There is an error in how Real Statistics handles the Bonferroni correction of the p-value in this case. Instead of multiplying the p-value by k*(k-1), it should be dividing the alpha value by k*(k-1)/2. This will be corrected in the next release of the Real Statistics software. Thanks for bringing this to my attention.
            Charles

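The issue raised in this thread (the sign of the d-stat flipping with the contrast coefficients, and the Bonferroni correction) can be sketched in Python. This is an illustration of the approach, not the Real Statistics implementation; the rank means and sample sizes below are made-up numbers:

```python
# One pairwise Dunn comparison after Kruskal-Wallis. Taking the absolute value
# of the rank-mean difference makes the result independent of which group gets
# the -1 and which the +1 contrast coefficient.
import math
from scipy.stats import norm

def dunn_pair(rbar_i, rbar_j, n_i, n_j, N, k, alpha=0.05):
    """Two-sided Dunn comparison; N = total sample size, k = number of groups."""
    se = math.sqrt(N * (N + 1) / 12 * (1 / n_i + 1 / n_j))
    z = abs(rbar_i - rbar_j) / se          # |z|: order of i and j is irrelevant
    m = k * (k - 1) / 2                    # number of unordered pairs
    p_adj = min(1.0, 2 * norm.sf(z) * m)   # Bonferroni-adjusted two-sided p, capped at 1
    return z, p_adj, p_adj < alpha

# Swapping the two groups gives the same z and p-value (hypothetical numbers):
print(dunn_pair(10, 20, 5, 5, 15, 3))
print(dunn_pair(20, 10, 5, 5, 15, 3))
```

Capping the adjusted p-value at 1 also avoids the p > 1 values reported elsewhere in these comments, which arise when a one-tailed p is multiplied by k*(k-1) without a cap.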
  9. Hi Charles,

    I think that Real-Statistics is a great project (and you are a great man!)
    Thank you for what you are doing!

    Just a question about Dunn’s post-hoc: isn’t there some way to get the actual p-value for this test beyond the sole yes/no significance value?
    If so, is it already corrected for multiple comparisons, or should I correct it (or alpha) to get the adjusted value?
    Thank you again for your work!

    Federico

    Reply
    • Federico,
      Thank you for your very kind statements. I am very pleased that you value the Real Statistics project.
      As noted on the webpage, d-crit = NORM.S.INV(1 – alpha/(k*(k-1))). Thus the p-value is 1 – NORM.S.DIST(d-stat, TRUE). This value must be compared with alpha/(k*(k-1)) to correct for experiment-wise error. Alternatively, you can compare (1 – NORM.S.DIST(d-stat, TRUE))*k*(k-1) with alpha.
      For the example on the webpage, p-value = .00244. The corrected value is this number times 6, i.e. .014638, a significant result since .014638 < .05.
      Charles

      Reply
  10. Charles,

    I believe the formula you are using under the “Pairwise Difference of Average Ranks” section that uses the chi-square is called the Schaich-Hamerle test. However, if I’m not mistaken, shouldn’t the second term in the product under the square root have a denominator of 12, not 2?

    Reply
    • Kevin,
      Thank you very much for catching this error. I have made the correction to the webpage and will add this test in the next release of the software, which will be available shortly.
      I really appreciate that you have been an active participant in the Real Statistics community over several years, and have made important contributions.
      Charles

      Reply
      • I'm glad I've been able to help out, Charles. I'm not a statistician by trade, believe it or not, but I am a self-proclaimed math enthusiast, and stats in particular has been a passion of mine for quite a long time. I guess I just got lucky and find the number-crunching of statistics to be rather fascinating. I just wish I had deeper insight into how some of these formulas were developed; I suspect that a full knowledge of such things would require a bigger handle on calculus than I currently have. But I'm always on the lookout for new techniques to use, so I have the site bookmarked for future reference! Thanks again for all the hard work you do!

        Reply
  11. Hi Charles

    I’m no statistician, but I was under the impression that Dunnett’s test was a parametric test for multiple comparisons against a common control, and is used following ANOVA. The nonparametric equivalent is Steel’s test.

    Reply
  12. Hi Charles,
    I have a sizable data set with upwards of 20 different categories with different sample sizes, and they are mostly skewed right. I used the Kruskal-Wallis H test and found significance, but I have not used your software yet. I was looking around to see what follow-up test I should use, and it seems like Dunn's test is the way to go. But the number of comparisons I would need to do seems substantial, nearing 200. Is there a better way to go about this, or would the best thing be running through them all?

    Thank you,
    Dante

    Reply
    • Sorry Dante, but I haven't yet implemented a way to perform all the comparisons at once. In any case, you usually only need to perform a few comparisons before you can detect which of the others will and won't be significant (based on the size of the difference between the R avg values in the case of Dunn's test).
      Charles

      Reply
  13. Hi Charles!
    Your website and explanations are wonderful. I’m trying to figure out though if I could apply this to my data. I have complete mass spectrometry data (lists of all proteins) for three separate mice within one healthy group. I’m trying to first determine if the mice within a group can be considered the same in regards to the protein composition of a selected tissue. So my data set contains proteins 1-900 for mouse 1,2,3 and the abundances of each of the 900 proteins. It doesn’t make sense to compare the levels between different proteins, instead I want to show that between the 3 mice they each have the same amounts separately of each protein. (Example: for all of your kitchen appliances, you might need 50 forks but you don’t need 50 ovens so it doesn’t make sense to compare the numbers of ovens to forks. But I want to show you and I have the same kitchen composition as determined individually for each appliance.)

    Reply
    • Allison,
      I like your kitchen analogy, especially since it is easier for me to understand things in the kitchen than mass spectrometry data, but it is not clear what your question is.
      Charles

      Reply
      • Sorry for the confusion, but thank you for your response. My question, I suppose, has two parts:
        1. Would Kruskal-Wallis be the appropriate test to compare kitchens in order to say they are the same because they have the same contents (as each appliance is measured individually)? Our data is discrete, as mass spectrometry can only record complete counts (as in 1 or 2 toasters, because 1.5 toasters makes no sense). I have also tested for normality and tried numerous ways to normalize the data, but it follows a non-normal distribution. I believe K-W is the correct test, but in the field of mass spectrometry very few groups have used it. Most use a t-test or ANOVA with a disclaimer stating that they know this is not the appropriate test, but the field has not agreed on the most appropriate one, so they use it anyway.
        2. What would be a good post-hoc test to identify which appliances were variable between our kitchens? In my case, what would be the right test to determine which of the 900 proteins were more variable between the three mice?
        Thank you again!

        Reply
  14. Hi Charles!
    Thank you for these tools, they are awesome!!
    Can I ask you a (maybe stupid) question? Let me explain my dataset. I have 5 samples for each of the 6 different conditions (Control, Concentration1, Concentration2 and so on…). I ran a Kruskal-Wallis test and the significance result was "yes". So I did a Nemenyi test to compare all the different concentrations (one at a time) to the Control (putting -1 in the Control and 1 in the concentration, one at a time) and recorded the results. Is this correct? Should I change the "k" value from 6 to 2, since I'm comparing only two conditions at a time?
    Thank you in advance for the answer!
    I’m asking because I tried to run the KW and Nemenyi for Control and only one condition (e.g. Control-Concentration5) and the results are different.
    And what about Dunnett's test after an ANOVA with the same type of dataset? Should I change something in your functions?
    Thank you again!

    Reply
    • Davide,
      The approach you used seems reasonable (assuming that you need to use the KW test in the first place). You shouldn’t change the k value for either test.
      Charles

      Reply
      • Thank you Charles.
        Yes, I need to use KW because, using the Shapiro-Wilk and Levene's tests, it turned out that normality and homoscedasticity are not always respected in my dataset, so it seemed reasonable to me to use KW and Nemenyi. That is correct, isn't it?
        Thank you a lot again.
        Davide

        Reply
        • Davide,
          Yes, except that if the homogeneity of variances assumption is not met, you should consider Welch’s ANOVA.
          Charles

          Reply
          • Hi Charles,
            Thank you for the advice. But can you explain to me, in a simple way, what the difference is between Kruskal-Wallis and Welch's ANOVA? Why should I use one instead of the other, and instead of simple single-factor ANOVA?
            In my dataset normality and homoscedasticity assumptions are not always respected, but in the majority of the cases they are respected.

            Thank you very much for the explanation (if you would have time to answer 🙂 )

            Davide

          • Davide,
            If the normality and homoscedasticity assumptions are met, generally you should use simple ANOVA. If normality does not hold but homogeneity of variances holds, then Kruskal-Wallis should be a good choice. When homogeneity of variances does not hold, then you should consider Welch’s test.
            Charles

  15. I've found this page very useful – thank you! But I'm having a problem matching the standard error in Example 2 with my own calculations.
    Is there a discrepancy between the formula used for Example 2 and the example spreadsheet?
    For the standard error, the formula for Example 2 starts with N*(N+1) (which matches the references). However, the spreadsheet on the matching page (DUNN) has N*(N-1). If I use N*(N-1), then I match the standard error in the spreadsheet and Figure 4.
    Thank you for your help and an excellent resource

    Reply
    • Jill,
      There is an error in the calculation of the standard error. It should start with N*(N+1) and not N*(N-1). I will correct this in the next release.
      Thank you very much for finding this error. I really appreciate your help in improving the website and software tools.
      Charles

      Reply
    • John,

      As I explained in my previous response, the Real Statistics formula =QINV(0.05,3,2) yields the exact same value as you got in R, namely 8.330783.

      According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity. Are you getting a different result from R for the Nemenyi test?

      Charles

      Reply
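The standard-error correction discussed in this thread can be checked numerically. The sample sizes below are arbitrary illustrative numbers:

```python
# Numeric check of the Dunn standard error,
# se = sqrt( N(N+1)/12 * (1/n_i + 1/n_j) ), with N+1 rather than N-1.
import math

N, n_i, n_j = 15, 5, 5
se_correct = math.sqrt(N * (N + 1) / 12 * (1 / n_i + 1 / n_j))  # N(N+1): matches the references
se_wrong = math.sqrt(N * (N - 1) / 12 * (1 / n_i + 1 / n_j))    # N(N-1): the spreadsheet error
print(se_correct, se_wrong)
```

Because the two expressions differ for every N > 0, using the wrong one shifts every d-stat, which is why the spreadsheet and the formula gave different results.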
  16. Can you tell me how you calculated q-critical for the Nemenyi test?
    I tried it in R but got a different result.
    I used qtukey(p=.05,nmeans=3,df=2,lower.tail=F) and got 8.330783.
    qtukey is used in R to calculate the studentized range distribution. Looking at the same distribution via the link provided in the article gave me the same result. The calculation you described in the formula table just after the method seems unclear. Please help.

    Reply
    • John,

      If you use the Real Statistics formula =QINV(0.05,3,2) you get the exact same value as you got in R, namely 8.330783.

      According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity.

      Charles

      Reply
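The two quantiles discussed in this exchange can also be reproduced in Python; scipy (1.7 and later) exposes the studentized range distribution. This is an alternative to the =QINV formula, not the Real Statistics implementation itself:

```python
# Studentized range quantiles for the Nemenyi test (requires scipy >= 1.7).
from scipy.stats import studentized_range

alpha, k = 0.05, 3
q_df2 = studentized_range.ppf(1 - alpha, k, 2)    # df = 2: matches R's qtukey value 8.330783
q_inf = studentized_range.ppf(1 - alpha, k, 480)  # df = 480 as a stand-in for infinity
print(q_df2, q_inf)
```

The large-df value is much smaller than the df = 2 value, which is why using df = 2 instead of the (approximately) infinite df that Zar prescribes gives such a different critical value.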
  17. Hi Charles,

    Realstats is a great program and I love the website too.

    I’m trying to run a Dunn’s test as a follow-up to the Kruskal Wallis test. I have a Macbook Pro and, for some reason, the latest RealStats for Macs doesn’t seem to include the Dunn’s Test. Is that correct?

    Thanks!

    Andrew

    Reply
  18. Hello Charles
    Thank you for this excellent statistical package! It's been very useful. However, I have a problem. I sent my work for publication in a scientific journal and used ANOVA followed by Tukey's test to analyze the results. However, a reviewer asks that I use Kruskal-Wallis followed by Dunn's test. I tried to do this, but when I select ANOVA single factor, the dialog box differs from the one in Example 2 in that there are no follow-up tests with "KW" (and no "Dunn test KW" to choose). Should I choose Tukey HSD?
    Thank you
    Gerardo

    Reply
    • Hello Gerardo,
      It sounds like you are not using the latest version of the Real Statistics software. If you are using Excel 2007, 2010, 2013 or 2016, you should install the latest version of the software. Dunn KW is one of the options in the latest version. Unfortunately, this capability is not yet available in the Mac or Excel 2003 versions of the software. In this case, you could use Tukey HSD, but that is not the advice you got from the reviewer.
      Charles

      Reply
  19. Hello Charles,

    I love using the package! However, I just discovered some output that I do not understand. Despite a highly significant KW multiple-group comparison, I cannot detect any significant differences in the follow-up Dunn test at all. As in your example, I simply put 1 or -1 in the output table for the combinations I want to test. I have different group sizes and some variables are not normally distributed even after various data transformations, which is why I chose the nonparametric procedures. Am I doing something wrong here? Thanks, Dennis

    Reply
    • Dennis,
      If you send me an Excel file with your data and the tests you have performed, I will try to understand why you are getting the results that you are seeing. You can get my email address at Contact Us.
      Charles

      Reply
  20. Hi,

    First off–thank you for your excellent work! Your tools and great explanations are fantastic.

    I’m no statistician, but I do like to understand how things work. I think you may have a minor glitch in the Bonferroni correction for the Dunn test (I have not checked others). The Bonferroni correction is applied in calculating d-crit. [=NORMSINV(1-Alpha/(k*(k-1)))], where k=the number of groups. The denominator is meant to calculate m, the number of orthogonal tests. To do this correctly, it should all be divided by 2.

    i.e. for 3 groups, the current equation indicates m = 6 comparisons being made (and for 4 groups, m = 12).

    If I’m correct, please consider this my humble contribution to your great work!

    Reply
