Nemenyi Test
This test is the same as the Tukey HSD Test (see Unplanned Comparisons), except that we test the difference between rank sums and use the standard error

s.e. = √(m²k(mk + 1)/12)

where k = the number of groups and m = the size of each of the group samples. The group sample sizes must all be equal. The test statistic q = (Ri − Rj)/s.e., where Ri and Rj are the rank sums of the two groups being compared, has a studentized range distribution (see Studentized Range Distribution). The critical values for this distribution are presented in the Studentized Range q Table based on the values of α, k (the number of groups) and df = ∞. If q > qcrit then the rank sums of the two groups are significantly different.
This test is equivalent to Rmax − Rmin > qcrit ⋅ s.e.
Picking the largest pairwise difference in rank sums allows us to control the experiment-wise error rate for all possible pairwise contrasts; in fact, this test keeps the experiment-wise α = .05 for the largest pairwise contrast and is conservative for all other comparisons.
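The article carries out the calculations in Excel via the Real Statistics Resource Pack; for readers who want to check them independently, here is a minimal Python sketch of our own (not part of the Real Statistics tool). It assumes SciPy 1.7+ for `scipy.stats.studentized_range`, and approximates df = ∞ with df = 480, the same value the article's q table uses.

```python
import numpy as np
from scipy.stats import rankdata, studentized_range

def nemenyi_pair(groups, i, j, alpha=0.05):
    """Nemenyi test for groups i and j; all groups must have equal size m."""
    k = len(groups)                                 # number of groups
    m = len(groups[0])                              # common group size
    ranks = rankdata(np.concatenate(groups))        # rank all N = m*k values together
    rank_sums = [ranks[g * m:(g + 1) * m].sum() for g in range(k)]
    se = np.sqrt(m**2 * k * (m * k + 1) / 12)       # standard error for rank sums
    q = abs(rank_sums[i] - rank_sums[j]) / se
    q_crit = studentized_range.ppf(1 - alpha, k, 480)  # df = 480 approximates infinity
    return q, q_crit, q > q_crit

# three equal-sized, non-overlapping groups: the extreme pair should be significant
groups = [np.arange(1, 9), np.arange(9, 17), np.arange(17, 25)]
q, q_crit, sig = nemenyi_pair(groups, 0, 2)
```

With three groups of size 8 the standard error works out to √(64·3·25/12) = 20, matching the cell formula used later in Figure 3.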
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform the Nemenyi test, as shown in Example 1.
Example 1: Conduct the Nemenyi Test for the data in range B3:D11 of Figure 2 to determine which groups are significantly different.
Press Ctrl-m, double click on Analysis of Variance option and select Single Factor Anova. When a dialog box similar to that shown in Figure 1 appears, enter B3:D11 in the Input Range, check Column headings included with data, select the Kruskal-Wallis and Nemenyi KW options and click on OK.
Figure 1 – Selecting Kruskal-Wallis and Nemenyi Tests
The output is shown in Figure 2.
Figure 2 – Nemenyi Test
The Kruskal-Wallis Test (the middle part of Figure 2) shows there is a significant difference between the three groups (cell J12). Since the three groups are equal in size we use the Nemenyi test to determine which groups are significantly different.
The template for the Nemenyi test is generated by the Real Statistics data analysis tool, as shown on the right side of Figure 2. You begin by inserting 1 and -1 in cells M3 and M4 to compare the New and Old groups. The difference between the rank sums of these two groups is 76.5 (cell N6). Since this is greater than 66.28, the value of x-crit (cell P9), we conclude there is a significant difference between the New and Old groups.
We can compare the New and Control groups in the same way (removing the -1 from cell M4 and inserting -1 in cell M5) and see that there is no significant difference between these groups. Similarly, there is no significant difference between the Old and Control groups.
Some key formulas from Figure 2 are shown in Figure 3.
| Cells | Item | Formula |
| --- | --- | --- |
| N3 | R1 | =RANK_SUM(B4:D11,1,1) |
| O3 | n1 | =COUNT(B4:B11) |
| N6 | Ri-Rj | =SUMPRODUCT(M3:M5,N3:N5) |
| L9 | s.e. | =SQRT(O6^2*M9*(O6*M9+1)/12) |
| M9 | k | =COUNT(O3:O5) |
| N9 | q-stat | =N6/L9 |
| O9 | q-crit | =QCRIT(M9,480,Q7,2) |
| P9 | x-crit | =L9*O9 |
| S9 | sig | =IF(ABS(N9)>O9,"yes","no") |
Figure 3 – Selected formulas from Figure 2
Dunn’s Test
This test is similar to the Nemenyi test above and can be viewed as a version of it for when the sample sizes are unequal.

Dunn's test uses the statistic

z = (R̄i − R̄j)/s.e.

where

R̄i = Ri/ni

is the average rank of the ith group (Ri = the rank sum and ni = the size of the ith group sample, with N = the total sample size), and the standard error is

s.e. = √( N(N + 1)/12 ⋅ (1/ni + 1/nj) )

If there are a lot of ties, an improved version of the standard error is given by

s.e. = √( (N(N + 1)/12 − Σ(f³ − f)/(12(N − 1))) ⋅ (1/ni + 1/nj) )

where f is as in the ties correction for the Kruskal-Wallis test. This test is equivalent to

|R̄i − R̄j| > zcrit ⋅ s.e.

where

zcrit = NORMSINV(1 − α/(k(k − 1)))

Here the term in parentheses, α/(k(k − 1)), is a Bonferroni-like correction for the number of pairwise comparisons.
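Dunn's statistic can likewise be checked outside Excel. The following is a minimal Python sketch of our own (not the Real Statistics implementation), using the standard error without the ties correction and the Bonferroni-style critical value discussed above:

```python
import numpy as np
from scipy.stats import rankdata, norm

def dunn_pair(groups, i, j, alpha=0.05):
    """Dunn's test for groups i and j; group sizes may differ (no ties correction)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    ranks = rankdata(np.concatenate(groups))        # rank all N values together
    bounds = np.cumsum([0] + sizes)                 # group boundaries in the ranked data
    avg = [ranks[bounds[g]:bounds[g + 1]].mean() for g in range(k)]  # average ranks
    se = np.sqrt(N * (N + 1) / 12 * (1 / sizes[i] + 1 / sizes[j]))
    z = abs(avg[i] - avg[j]) / se
    z_crit = norm.ppf(1 - alpha / (k * (k - 1)))    # Bonferroni-like correction
    return z, z_crit, z > z_crit

# unequal group sizes (6, 9, 9) with no overlap between the extreme groups
groups = [np.arange(1, 7), np.arange(7, 16), np.arange(16, 25)]
z, z_crit, sig = dunn_pair(groups, 0, 2)
```

If heavy ties are present, the N(N + 1)/12 term should be reduced by the ties correction given above before taking the square root.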
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform Dunn’s test, as shown in Example 2.
Example 2: Find all significant differences between the blemish creams of Example 1 of Kruskal-Wallis Test at the 95% significance level.
We repeat the data from Example 1 of Kruskal-Wallis Test in range B3:D13 of Figure 4. As we saw from the Kruskal-Wallis analysis, there is a significant difference between the three groups. Unlike in Example 1, this time we use Dunn’s test since the group sizes are different.
To perform this test, we proceed as in Example 1, except that we choose the Dunn KW option instead of the Nemenyi KW option. When we press the OK button the result shown on the right side of Figure 4 is displayed.
Figure 4 – Dunn’s Test
We see that there is a significant difference between the New and Old creams. If we change the contrast coefficients in range G5:G7, we see that there is no significant difference between New and Control and between Old and Control.
Pairwise Differences of Average Ranks
Another test, similar to Dunn's test, uses the chi-square distribution instead of the studentized range distribution. It is based once again on pairwise differences of the average ranks

d = |R̄i − R̄j|

which we compare with the critical value

dcrit = √( χ2α,k−1 ⋅ N(N + 1)/12 ⋅ (1/ni + 1/nj) )

where χ2α,k−1 is the critical value of the chi-square distribution for the given alpha and k − 1 degrees of freedom. The difference between the ith and jth groups is significant if d > dcrit.
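The critical value dcrit is straightforward to compute; here is a short Python sketch of our own (function name and parameters are ours, chosen for illustration):

```python
import numpy as np
from scipy.stats import chi2

def chi_sq_pairwise_crit(N, n_i, n_j, k, alpha=0.05):
    """Critical value d-crit for the pairwise difference of average ranks."""
    chi_crit = chi2.ppf(1 - alpha, k - 1)   # chi-square critical value, k-1 df
    return np.sqrt(chi_crit * N * (N + 1) / 12 * (1 / n_i + 1 / n_j))

# hypothetical example: k = 3 groups, N = 24 observations, two groups of size 8
chi_crit = chi2.ppf(0.95, 2)                # 5.9915, as in the worked example below
d_crit = chi_sq_pairwise_crit(N=24, n_i=8, n_j=8, k=3)
```

Any pair of groups whose average ranks differ by more than this d_crit is declared significantly different.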
Example 3: Find all significant differences between the blemish creams of Example 2 at the 95% significance level.
We summarize the results of the above analysis for the 3 pairwise comparisons in Figure 5.
Figure 5 – Non-parametric pairwise comparisons for Example 3
Here χ2α,k-1 (in cell B36) = CHIINV(B35, B34-1) = 5.9915. For the comparison of the new and old creams, d = D29-D30 = 10.25 and dcrit = SQRT(B36*B33*(B33+1)/12*(1/B29+1/B30)) = 8.9267, and similarly for the other two comparisons.
The only significant comparison at the 95% significance level is between the new cream and the old cream where p < .05 since d > dcrit.
Dunnett’s Test
After a significant Kruskal-Wallis test, we can compare a control group with each of the other groups, in a manner similar to that used in Dunnett's test after a one-way ANOVA. The test is similar to the Nemenyi test, except that this time we use Dunnett's Table.
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool to perform Dunnett’s test, as shown in Example 4.
Example 4: Determine whether there is a significant difference between the Control and the New and Old groups for the data in Example 1.
After choosing the Dunnett KW option from the One Factor ANOVA dialog box (see Figure 1) and filling in the contrast coefficients, we obtain the results shown in Figure 6.
Figure 6 – Dunnett’s Test after KW Test
We see there is no significant difference between the Old and Control groups. If we move the -1 contrast coefficient to cell G5, we would see there is no significant difference between the New and Control groups.
Contrasts
Contrasts can be used after a Kruskal-Wallis test just as for one-way ANOVA. A contrast C is defined based on the contrast coefficients ci (with Σci = 0) by

C = Σ ci R̄i

where R̄i is the average rank of the ith group, and the standard error is

s.e. = √( N(N + 1)/12 ⋅ Σ ci²/ni )

Taking ties into account, the formula for the standard error becomes

s.e. = √( (N(N + 1)/12 − Σ(f³ − f)/(12(N − 1))) ⋅ Σ ci²/ni )

The square of the contrast, C², is then tested using a chi-square distribution with n − 1 degrees of freedom.
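The contrast computation can be sketched in Python as follows. This is our own illustrative sketch, assuming the standard error without the ties correction; the degrees of freedom for the final chi-square comparison should follow the text above.

```python
import numpy as np
from scipy.stats import rankdata

def rank_contrast(groups, coeffs):
    """Contrast of average ranks: C = sum(c_i * Rbar_i), with sum(c_i) = 0."""
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    ranks = rankdata(np.concatenate(groups))        # rank all N values together
    bounds = np.cumsum([0] + sizes)
    avg = [ranks[bounds[g]:bounds[g + 1]].mean() for g in range(len(groups))]
    C = sum(c * a for c, a in zip(coeffs, avg))     # the contrast
    se = np.sqrt(N * (N + 1) / 12 * sum(c**2 / n for c, n in zip(coeffs, sizes)))
    return C, se, (C / se)**2   # (C/se)^2 is compared with a chi-square critical value

# hypothetical example: compare the third group against the average of the first two
groups = [np.arange(1, 9), np.arange(9, 17), np.arange(17, 25)]
C, se, stat = rank_contrast(groups, [0.5, 0.5, -1.0])
```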
Example 5: Determine whether there is a significant difference between the Control and the average of the New and Old groups for the data in Example 2.
This time we choose the Contrasts KW option from the One Factor ANOVA data analysis tool. We see from Figure 7 that there is no significant difference.
Figure 7 – Contrasts after KW Test
Hi Charles,
I have a sizable data set with upwards of 20 different categories with different sample sizes and they are mostly skewed right. I used the Kruskal-Wallis H Test and found significance, but I have not used your software yet. I was looking around to see what follow up test I should use and it seems like the Dunn’s Test is the way to go. Looking at how many comparisons that I would need to do seems substantial nearing 200. Is there a better way to go about this, or would the best thing be running through them all?
Thank you,
Dante
Sorry Dante, but I haven't yet implemented a way to perform all the comparisons at once. In any case, you usually only need to perform a few comparisons before you can tell which of the others will be significant and which won't (based on the size of the difference between the R avg values in the case of Dunn's Test).
Charles
Hi Charles!
Your website and explanations are wonderful. I’m trying to figure out though if I could apply this to my data. I have complete mass spectrometry data (lists of all proteins) for three separate mice within one healthy group. I’m trying to first determine if the mice within a group can be considered the same in regards to the protein composition of a selected tissue. So my data set contains proteins 1-900 for mouse 1,2,3 and the abundances of each of the 900 proteins. It doesn’t make sense to compare the levels between different proteins, instead I want to show that between the 3 mice they each have the same amounts separately of each protein. (Example: for all of your kitchen appliances, you might need 50 forks but you don’t need 50 ovens so it doesn’t make sense to compare the numbers of ovens to forks. But I want to show you and I have the same kitchen composition as determined individually for each appliance.)
Allison,
I like your kitchen analogy, especially since it is easier for me to understand things in the kitchen than mass spectrometry data, but it is not clear what your question is.
Charles
Sorry for the confusion, but thank you for your response. My question I suppose is two parts:
1. Would Kruskal-Wallis be the appropriate test to compare kitchens in order to say they are the same because they have the same contents (as each appliance is measured individually)? Our data is discrete, as mass spectrometry can only record complete counts (as in 1 or 2 toasters, because 1.5 toasters makes no sense). I have also tested for normality and tried numerous ways to normalize the data, but it follows a non-normal distribution. I believe K-W is the correct test, but in the field of mass spectrometry very few groups have used it. Most use a t-test or ANOVA with a disclaimer stating that they know this is not the appropriate test, but the field has not agreed on the most appropriate one so they use it anyway.
2. What would be a good post-hoc test to identify which appliances were variable between our kitchens? In my case, what would be the right test to determine which of the 900 proteins were more variable between the three mice?
Thank you again!
Allison,
Since I don’t yet understand the mass spectrometry scenario, let me make the following suggestions on the basis that a t test or ANOVA is commonly used.
1. If there are two groups with similar distributions (e.g. both skewed in the same direction), then the Mann-Whitney test could be used instead of a t test. With more than two groups (and similar skewness properties), then a K-W test could be a good choice.
2. Typical follow-up tests after K-W are described on the following webpage:
http://www.real-statistics.com/one-way-analysis-of-variance-anova/kruskal-wallis-test/follow-up-tests-kruskal-wallis/
Charles
Hi Charles!
Thank you for these tools, they are awesome!!
Can I ask you a (maybe stupid) question? Let me explain my dataset. I have 5 samples for each of the 6 different conditions (Control, Concentration1, Concentration2 and so on…). I ran a Kruskal-Wallis test and the significance result was "yes". So I did a Nemenyi Test to compare all the different concentrations (one at a time) to the Control (putting -1 in the Control and 1 in the concentration, one at a time) and recorded the results. Is this correct? Should I change the "k" value from 6 to 2, since I'm comparing only two conditions at a time?
Thank you in advance for the answer!
I’m asking because I tried to run the KW and Nemenyi for Control and only one condition (e.g. Control-Concentration5) and the results are different.
And what about Dunnett's test after an ANOVA with the same type of dataset? Should I change something in your functions?
Thank you again!
Davide,
The approach you used seems reasonable (assuming that you need to use the KW test in the first place). You shouldn’t change the k value for either test.
Charles
Thank you Charles.
Yes, I need to use KW because, using the Shapiro-Wilk and Levene's tests, I found that normality and homoscedasticity are not always respected in my dataset, so it seemed reasonable to me to use KW and Nemenyi. It is correct, isn't it?
Thank you a lot again.
Davide
Davide,
Yes, except that if the homogeneity of variances assumption is not met, you should consider Welch’s ANOVA.
Charles
Hi Charles,
Thank you for the advice. But can you explain to me in a simple way what the difference is between Kruskal-Wallis and Welch's ANOVA? Why should I use one instead of the other, and instead of simple single-factor ANOVA?
In my dataset normality and homoscedasticity assumptions are not always respected, but in the majority of the cases they are respected.
Thank you very much for the explanation (if you would have time to answer 🙂 )
Davide
Davide,
If the normality and homoscedasticity assumptions are met, generally you should use simple ANOVA. If normality does not hold but homogeneity of variances holds, then Kruskal-Wallis should be a good choice. When homogeneity of variances does not hold, then you should consider Welch’s test.
Charles
I’ve found this page very useful – thank you! But I’m having a problem with matching the standard error in Example 2, doing my own calculations.
Is there a discrepancy between the formula used for Example 2 and the example spreadsheet?
For the standard error, the formula for Example 2 starts with N*(N+1) (which matches the references). However, the spreadsheet on the matching page (DUNN) has N*(N-1). If I use N*(N-1) then I match the standard error in the spreadsheet and Figure 4.
Thank you for your help and an excellent resource
Jill,
There is an error in the calculation of the standard error. It should start with N*(N+1) and not N*(N-1). I will correct this in the next release.
Thank you very much for finding this error. I really appreciate your help in improving the website and software tools.
Charles
Thank you for the clarification and the future fix.
Jill
Jill,
This fix is now available in the latest version of the Real Statistics Resource pack, Rel 4.7.
Charles
I suspect the credibility of your result, as I am getting a different result in R.
John,
As I explained in my previous response, the Real Statistics formula =QINV(0.05,3,2) yields the exact same value as you got in R, namely 8.330783.
According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity. Are you getting a different result from R for the Nemenyi test?
Charles
Can you tell me how you calculated q-critical using the Nemenyi Test?
I tried it in R but am getting a different result.
I used qtukey(p=.05,nmeans=3,df=2,lower.tail=F) and got 8.330783.
qtukey is used in R to calculate the studentized range distribution. Looking at the same distribution via the link provided in the article gave me the same result. The calculation you described in the formula table just after the method seems unclear. Please help.
John,
If you use the Real Statistics formula =QINV(0.05,3,2) you get the exact same value as you got in R, namely 8.330783.
According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity.
Charles
Hi Charles,
Realstats is a great program and I love the website too.
I’m trying to run a Dunn’s test as a follow-up to the Kruskal Wallis test. I have a Macbook Pro and, for some reason, the latest RealStats for Macs doesn’t seem to include the Dunn’s Test. Is that correct?
Thanks!
Andrew
Sorry Andrew, but only the Windows release has Dunn’s test at the moment.
Charles
Hello Charles
Thank you for this excellent statistical package! It's been very useful. I have a problem, however. I sent my work for publication in a scientific journal and used ANOVA followed by Tukey's test to analyze the results. However, a reviewer asks that I use Kruskal-Wallis followed by Dunn's test. I tried to do this, but when I select ANOVA single factor, the dialog box differs from the one in Example 2 in that there are no follow-up tests with "KW" (and no "Dunn test KW" to choose). Should I choose Tukey HSD?
Thank you
Gerardo
Hello Gerardo,
It sounds like you are not using the latest version of the Real Statistics software. If you are using Excel 2007, 2010, 2013 or 2016 you should install the latest version of the software. Dunn KW is one of the options in the latest version of the software. Unfortunately, this capability is not yet available on the MAC or Excel 2003 versions of the software. In this case, you could use Tukey HSD, but that is not the advice you got from the reviewer.
Charles
Hello Charles,
I love using the package! However, I just discovered some output that I do not understand. Despite a highly significant KW multiple-group comparison, I cannot detect any significant differences in the follow-up Dunn test at all. As in your example, I simply put 1 or -1 in the output table for the combinations I want to test. I have different group sizes and some variables are not normally distributed even after various data transformations, which is why I chose the nonparametric procedures. Am I doing something wrong here? Thanks, Dennis
Dennis,
If you send me an Excel file with your data and the tests you have performed, I will try to understand why you are getting the results that you are seeing. You can get my email address at Contact Us.
Charles
Hi,
First off–thank you for your excellent work! Your tools and great explanations are fantastic.
I’m no statistician, but I do like to understand how things work. I think you may have a minor glitch in the Bonferroni correction for the Dunn test (I have not checked others). The Bonferroni correction is applied in calculating d-crit. [=NORMSINV(1-Alpha/(k*(k-1)))], where k=the number of groups. The denominator is meant to calculate m, the number of orthogonal tests. To do this correctly, it should all be divided by 2.
i.e. for 3 groups, the current equation indicates m = 6 comparisons being made (and for 4 groups, 12).
If I’m correct, please consider this my humble contribution to your great work!
Hi Ming,
I am very pleased that you like the website.
Actually, I think the formula I used for Dunn's test is correct, as can be seen from the following two references. Since this is a two-tailed test, the division by 2 in the denominator is probably cancelled out by the fact that you also need to divide the alpha value in the numerator by 2.
http://support.minitab.com/en-us/minitab/17/macro-library/macro-files/nonparametrics-macros/krusmc/
http://www.lexjansen.com/pharmasug/2004/statisticspharmacokinetics/sp04.pdf
Charles