**Nemenyi Test**

This test is the same as the Tukey HSD Test (see Unplanned Comparisons), except that we test the difference between rank sums and use the following standard error

$$\text{s.e.} = \sqrt{\frac{m^2 k\,(mk+1)}{12}}$$

where *k* = the number of groups and *m* = the size of each of the group samples. The group sample sizes must all be equal. The statistic *q* = (*R_{i}* − *R_{j}*)/*s.e.* has a **studentized range** distribution (see Studentized Range Distribution). The critical values for this distribution are presented in the Studentized Range q Table based on the values of *α*, *k* (the number of groups) and *df* = ∞. If *q* > *q_{crit}* then the two groups are significantly different.

This test is equivalent to *R_{max}* − *R_{min}* > *q_{crit}* ⋅ *s.e.*

Picking the largest pairwise difference in rank sums allows us to control the experiment-wise error rate for all possible pairwise contrasts; in fact, this test keeps the experiment-wise α = .05 for the largest pairwise contrast, and is conservative for all other comparisons.

**Real Statistics Data Analysis Tool**: The Real Statistics Resource Pack provides a data analysis tool to perform the Nemenyi test, as shown in Example 1.

**Example 1**: Conduct the Nemenyi Test for the data in range B3:D11 of Figure 2 to determine which groups are significantly different.

Press **Ctrl-m**, double click on the **Analysis of Variance** option and select **Single Factor Anova**. When a dialog box similar to that shown in Figure 1 appears, enter B3:D11 in the **Input Range**, check **Column headings included with data**, select the **Kruskal-Wallis** and **Nemenyi KW** options and click on **OK**.

**Figure 1 – Selecting Kruskal-Wallis and Nemenyi Tests**

The output is shown in Figure 2.

**Figure 2 – Nemenyi Test**

The Kruskal-Wallis Test (the middle part of Figure 2) shows there is a significant difference between the three groups (cell J12). Since the three groups are equal in size we use the Nemenyi test to determine which groups are significantly different.

The template for the Nemenyi test is generated by the Real Statistics data analysis tool, as shown on the right side of Figure 2. You begin by inserting a 1 and -1 in cells M3 and M4 to compare the New and Old groups. Since the difference between the rank sums of these two groups is 76.5 (cell N6), which is greater than 66.28, the value of x-crit (cell P9), we conclude there is a significant difference between the New and Old groups.

We can compare the New and Control groups in the same way (removing the -1 from cell M4 and inserting -1 in cell M5) and see that there is no significant difference between these groups. Similarly, there is no significant difference between Old and Control.

Some key formulas from Figure 2 are shown in Figure 3.

| Cells | Item | Formula |
|---|---|---|
| N3 | R_{1} | =RANK_SUM(B4:D11,1,1) |
| O3 | n_{1} | =COUNT(B4:B11) |
| N6 | R_{i}-R_{j} | =SUMPRODUCT(M3:M5,N3:N5) |
| L9 | s.e. | =SQRT(O6^2*M9*(O6*M9+1)/12) |
| M9 | k | =COUNT(O3:O5) |
| N9 | q-stat | =N6/L9 |
| O9 | q-crit | =QCRIT(M9,480,Q7,2) |
| P9 | x-crit | =L9*O9 |
| S9 | sig | =IF(ABS(N9)>O9,"yes","no") |

**Figure 3 – Selected formulas from Figure 2**
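As a rough cross-check of the s.e. and x-crit formulas in Figure 3, the following Python sketch recomputes them for Example 1 (*k* = 3 groups of *m* = 8 observations each). The function names are illustrative only, and the critical value 3.314 is an assumption taken from a standard studentized range table for α = .05, *k* = 3 and *df* = ∞; it is passed in as a parameter rather than computed.

```python
from math import sqrt


def nemenyi_se(k, m):
    """Standard error of a rank-sum difference for k groups of equal size m."""
    return sqrt(m ** 2 * k * (m * k + 1) / 12)


def nemenyi_significant(rank_sum_diff, k, m, q_crit):
    """Return (is_significant, x_crit) for one pairwise rank-sum difference."""
    x_crit = q_crit * nemenyi_se(k, m)
    return abs(rank_sum_diff) > x_crit, x_crit


# Assumed table value: studentized range q for alpha = .05, k = 3, df = infinity
Q_CRIT = 3.314

se = nemenyi_se(k=3, m=8)
significant, x_crit = nemenyi_significant(76.5, k=3, m=8, q_crit=Q_CRIT)
print(f"s.e. = {se:.2f}, x-crit = {x_crit:.2f}, significant = {significant}")
```

With these inputs the standard error is 20 and x-crit is about 66.28, which agrees with cell P9, so the New vs. Old rank-sum difference of 76.5 is flagged as significant.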

**Dunn’s Test**

This test is similar to the above test and can be viewed as a version of Nemenyi’s test when the sample sizes are unequal.

Dunn’s test uses the statistic

where

and the standard error is

If there are a lot of ties, an improved version of the standard error is given by

where *f* is as in the ties correction for the Kruskal-Wallis test. This test is equivalent to

where

Here the term in parentheses is a Bonferroni-like correction.
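The procedure can be sketched in Python for all pairs of groups at once. This is a minimal sketch under stated assumptions, not the Real Statistics implementation: it assumes the commonly cited form of Dunn's test without the ties correction, i.e. *z* = (difference of average ranks)/*s.e.* with *s.e.* = √(*N*(*N*+1)/12 · (1/*n_{i}* + 1/*n_{j}*)), and the critical value NORMSINV(1 − α/(*k*(*k*−1))) quoted in the reader comments on this page. The function name, group labels, average ranks and sample sizes are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def dunn_pairwise(avg_ranks, sizes, alpha=0.05):
    """Dunn's test for every pair of groups (assumed form, no ties correction)."""
    k = len(sizes)
    n_total = sum(sizes.values())
    # Two-tailed test over k(k-1)/2 pairs: (alpha/2) / (k(k-1)/2) = alpha / (k(k-1))
    z_crit = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    results = []
    for a, b in combinations(avg_ranks, 2):
        se = sqrt(n_total * (n_total + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
        z = (avg_ranks[a] - avg_ranks[b]) / se
        results.append((a, b, z, abs(z) > z_crit))
    return z_crit, results


# Hypothetical data: 15 observations in total, so the rank sums add up to 120
sizes = {"A": 5, "B": 6, "C": 4}
avg_ranks = {"A": 4.0, "B": 8.5, "C": 12.25}

z_crit, results = dunn_pairwise(avg_ranks, sizes)
for a, b, z, sig in results:
    print(f"{a} vs {b}: z = {z:.3f}, significant = {sig}")
```

Looping over `combinations` avoids re-entering contrast coefficients for each pair, which matters when there are many groups. In this made-up data only the A vs. C comparison exceeds the critical value.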

**Real Statistics Data Analysis Tool**: The Real Statistics Resource Pack provides a data analysis tool to perform Dunn’s test, as shown in Example 2.

**Example 2**: Find all significant differences between the blemish creams of Example 1 of Kruskal-Wallis Test at the 95% significance level.

We repeat the data from Example 1 of Kruskal-Wallis Test in range B3:D13 of Figure 4. As we saw from the Kruskal-Wallis analysis, there is a significant difference between the three groups. Unlike in Example 1, this time we use Dunn’s test since the group sizes are different.

To perform this test, we proceed as in Example 1, except that we choose the **Dunn KW** option instead of the **Nemenyi KW** option. When we press the **OK** button the result shown on the right side of Figure 4 is displayed.

**Figure 4 – Dunn’s Test**

We see that there is a significant difference between the New and Old creams. If we change the contrast coefficients in range G5:G7, we see that there is no significant difference between New and Control and between Old and Control.

**Schaich-Hamerle Test**

The Schaich-Hamerle test is similar to Dunn’s test, but it uses the chi-square distribution instead of the studentized range *q* distribution. Once again, pairwise differences of the average ranks

$$d = \left|\bar{R}_i - \bar{R}_j\right|$$

are compared with the critical value

$$d_{crit} = \sqrt{\chi^2_{\alpha,k-1} \cdot \frac{N(N+1)}{12}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$$

Here *χ^{2}_{α,k-1}* is the critical value of the chi-square distribution for the given alpha and *k* – 1 degrees of freedom, *N* is the total sample size, and *n_{i}* and *n_{j}* are the sizes of the two groups being compared. The difference between the *i*^{th} and *j*^{th} groups is significant if *d_{crit}* < *d*.

**Example 3**: Find all significant differences between the blemish creams of Example 2 at the 95% significance level.

We summarize the results of the above analysis for the 3 pairwise comparisons in Figure 5.

**Figure 5 – Schaich-Hamerle test for Example 3**

Here *χ^{2}_{α,k-1}* (in cell B36) = CHIINV(B35, B34–1) = 5.9915. For the comparison of the new and old creams, *d* = D29–D30 = 10.25 and *d_{crit}* = SQRT(B36*B33*(B33+1)/12*(1/B29+1/B30)) = 8.9267, and similarly for the other two comparisons.

The only significant comparison at the 95% significance level is between the new cream and the old cream, where *p* < .05 since *d* > *d_{crit}*.
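The critical value for the new vs. old comparison can be reproduced with a short Python sketch. Two points are assumptions rather than facts stated in the text: the total sample size *N* = 27 and the group sizes 9 and 10 were chosen because they reproduce the reported value 8.9267, and the chi-square critical value is obtained from the closed form −2·ln(α), which holds only for 2 degrees of freedom (*k* = 3). The function name is illustrative.

```python
from math import log, sqrt


def schaich_hamerle_d_crit(chi2_crit, n_total, n_i, n_j):
    """Critical difference of average ranks for one pair of groups."""
    return sqrt(chi2_crit * n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))


alpha = 0.05
# For 2 degrees of freedom the chi-square upper tail is exp(-x/2),
# so the critical value is -2*ln(alpha). Other df values need a table or library.
chi2_crit = -2 * log(alpha)

# Assumed sizes: N = 27 overall, with 9 and 10 observations in the two groups
d_crit = schaich_hamerle_d_crit(chi2_crit, n_total=27, n_i=9, n_j=10)

d = 10.25  # difference of average ranks reported for new vs. old
is_significant = d > d_crit
print(f"chi2_crit = {chi2_crit:.4f}, d_crit = {d_crit:.4f}, significant = {is_significant}")
```

Under these assumptions the sketch gives a chi-square critical value of about 5.9915 and a critical difference of about 8.9267, so the observed difference of 10.25 is significant.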

**Steel’s Test**

After a significant Kruskal-Wallis test, we can compare a control group with each of the other groups, in a manner similar to that used in Dunnett’s test after a one-way ANOVA. The test is also similar to the Nemenyi test, except that this time we use Dunnett’s Table.

**Real Statistics Data Analysis Tool**: The Real Statistics Resource Pack provides a data analysis tool to perform Steel’s test, as shown in Example 4.

**Example 4**: Determine whether there is a significant difference between the Control and the New and Old groups for the data in Example 1.

After choosing the **Steel test** option from the One Factor ANOVA dialog box (see Figure 1) and filling in the contrast coefficients, we obtain the results shown in Figure 6.

**Figure 6 – Steel’s Test after KW Test**

We see there is no significant difference between the Old and Control groups. If we move the -1 contrast coefficient to cell G5, we would see there is no significant difference between the New and Control groups.

**Contrasts**

Contrasts can be used after a Kruskal-Wallis test as for one-way ANOVA. A contrast *C* is defined based on the contrast coefficients by

where

Taking ties into account the formula for the standard error becomes

The square of the contrast *C*^{2} is then tested using a chi-square distribution with *n*−1 degrees of freedom.

**Example 5**: Determine whether there is a significant difference between the Control and the average of the New and Old groups for the data in Example 2.

This time we choose the **Contrasts KW** option from the **One Factor ANOVA** data analysis tool. We see from Figure 7 that there is no significant difference.

**Figure 7 – Contrasts after KW Test**

Charles,

I believe the formula you are using under the “Pairwise Difference of Average Ranks” section that uses the chi-square is called the Schaich-Hamerle test. However, if I’m not mistaken, shouldn’t the second term in the product under the square root have a denominator of 12, not 2?

Kevin,

Thank you very much for catching this error. I have made the correction to the webpage and will add this test in the next release of the software, which will be available shortly.

I really appreciate that you have been an active participant in the Real Statistics community over several years, and have made important contributions.

Charles

I’m glad I’ve been able to help out, Charles. I’m not a statistician by trade, believe it or not, but I am a self-proclaimed math enthusiast, and stats in particular has been a passion of mine for quite a long time. I guess I just got lucky and find the number-crunching of statistics to be rather fascinating. I just wish I had deeper insight into how some of these formulas were developed; I suspect that a full knowledge of such things would require a bigger handle on calculus than I currently do. But I’m always on the lookout for new techniques to use, so I have the site bookmarked for future reference! Thanks again for all the hard work you do!

Thank you very much Kevin,

Charles

Hi Charles

I’m no statistician, but I was under the impression that Dunnett’s test was a parametric test for multiple comparisons against a common control, and is used following ANOVA. The nonparametric equivalent is Steel’s test.

Nigel,

I believe that you are correct. The test that I am describing following KW is Steel’s test. I will change the website and software before the next release of the Real Statistics software. Thank you for your comment and help in improving Real Statistics.

Charles

Do I understand correctly that the test labelled as Dunnett KW is really the Steel test?

Yes

Thank you!

P.S. I’ve donated and will send more after I next get paid. I am very grateful for what you are doing here.

Donna,

Thank you very much for your donation. I am glad that I was able to help you.

Charles

Hi Charles,

I have a sizable data set with upwards of 20 different categories with different sample sizes and they are mostly skewed right. I used the Kruskal-Wallis H Test and found significance, but I have not used your software yet. I was looking around to see what follow up test I should use and it seems like the Dunn’s Test is the way to go. Looking at how many comparisons that I would need to do seems substantial nearing 200. Is there a better way to go about this, or would the best thing be running through them all?

Thank you,

Dante

Sorry Dante, but I haven’t yet implemented a way to perform all the comparisons at once. In any case, usually you only need to perform a few comparisons before you can detect which others will be significant and which won’t be (based on the size of the difference between the R avg values in the case of Dunn’s Test).

Charles

Hi Charles!

Your website and explanations are wonderful. I’m trying to figure out though if I could apply this to my data. I have complete mass spectrometry data (lists of all proteins) for three separate mice within one healthy group. I’m trying to first determine if the mice within a group can be considered the same in regards to the protein composition of a selected tissue. So my data set contains proteins 1-900 for mouse 1,2,3 and the abundances of each of the 900 proteins. It doesn’t make sense to compare the levels between different proteins, instead I want to show that between the 3 mice they each have the same amounts separately of each protein. (Example: for all of your kitchen appliances, you might need 50 forks but you don’t need 50 ovens so it doesn’t make sense to compare the numbers of ovens to forks. But I want to show you and I have the same kitchen composition as determined individually for each appliance.)

Allison,

I like your kitchen analogy, especially since it is easier for me to understand things in the kitchen than mass spectrometry data, but it is not clear what your question is.

Charles

Sorry for the confusion, but thank you for your response. My question I suppose is two parts:

1. Would Kruskal-Wallis be the appropriate test to compare kitchens in order to say they are the same because they have the same contents (as each appliance is measured individually)? Our data is discrete as mass spectrometry can only record complete counts (as in 1 or 2 toasters, because 1.5 toasters makes no sense). I have also tested normality and tried numerous ways to normalize the data, but it follows a non-normal distribution. I believe K-W is the correct test, but in the field of mass spectrometry very few groups have used it. Most use a t-test or ANOVA with a disclaimer stating that they know this is not the appropriate test, but the field has not agreed on the most appropriate one so they use it anyway.

2. What would be a good post-hoc test to identify which appliances were variable between our kitchens? In my case, what would be the right test to determine which of the 900 proteins were more variable between the three mice?

Thank you again!

Allison,

Since I don’t yet understand the mass spectrometry scenario, let me make the following suggestions on the basis that a t test or ANOVA is commonly used.

1. If there are two groups with similar distributions (e.g. both skewed in the same direction), then the Mann-Whitney test could be used instead of a t test. With more than two groups (and similar skewness properties), then a K-W test could be a good choice.

2. Typical follow-up tests after K-W are described on the following webpage:

http://www.real-statistics.com/one-way-analysis-of-variance-anova/kruskal-wallis-test/follow-up-tests-kruskal-wallis/

Charles

Hi Charles!

Thank you for these tools, they are awesome!!

Can I ask you a (maybe stupid) question? Let me explain my dataset. I have 5 samples for each of the 6 different conditions (Control, Concentration1, Concentration2 and so on…). I ran a Kruskal-Wallis test and the significance result was “yes”. So I did a Nemenyi Test to compare each of the concentrations (one at a time) to the Control (putting -1 in the Control and 1 in the concentration, one at a time) and recorded the results. Is this correct? Should I change the “k” value from 6 to 2, since I’m comparing only two conditions at a time?

Thank you in advance for the answer!

I’m asking because I tried to run the KW and Nemenyi for Control and only one condition (e.g. Control-Concentration5) and the results are different.

And what about Dunnett’s test after an ANOVA with the same type of dataset? Should I change something in your functions?

Thank you again!

Davide,

The approach you used seems reasonable (assuming that you need to use the KW test in the first place). You shouldn’t change the k value for either test.

Charles

Thank you Charles.

Yes, I need to use KW because the Shapiro-Wilk and Levene’s tests showed that normality and homoscedasticity are not always satisfied in my dataset, so it seemed reasonable to me to use KW and Nemenyi. That is correct, isn’t it?

Thank you a lot again.

Davide

Davide,

Yes, except that if the homogeneity of variances assumption is not met, you should consider Welch’s ANOVA.

Charles

Hi Charles,

Thank you for the advice. But can you explain to me in a simple way what the difference is between Kruskal-Wallis and Welch’s ANOVA? Why should I use one instead of the other, and instead of the simple single-factor ANOVA?

In my dataset normality and homoscedasticity assumptions are not always respected, but in the majority of the cases they are respected.

Thank you very much for the explanation (if you would have time to answer 🙂 )

Davide

Davide,

If the normality and homoscedasticity assumptions are met, generally you should use simple ANOVA. If normality does not hold but homogeneity of variances holds, then Kruskal-Wallis should be a good choice. When homogeneity of variances does not hold, then you should consider Welch’s test.

Charles

I’ve found this page very useful – thank you! But I’m having a problem with matching the standard error in Example 2, doing my own calculations.

Is there a discrepancy between the formula used for Example 2 and the example spreadsheet?

For the standard error, the formula for Example 2 starts with N*(N+1) (which matches the references). However, the spreadsheet on the matching page (DUNN) has N*(N-1). If I use N*(N-1) then I match the standard error in the spreadsheet and Figure 4.

Thank you for your help and an excellent resource

Jill,

There is an error in the calculation of the standard error. It should start with N*(N+1) and not N*(N-1). I will correct this in the next release.

Thank you very much for finding this error. I really appreciate your help in improving the website and software tools.

Charles

Thank you for the clarification and the future fix.

Jill

Jill,

This fix is now available in the latest version of the Real Statistics Resource pack, Rel 4.7.

Charles

I doubt the credibility of your result as I am getting a different result in R.

John,

As I explained in my previous response, the Real Statistics formula =QINV(0.05,3,2) yields the exact same value as you got in R, namely 8.330783.

According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity. Are you getting a different result from R for the Nemenyi test?

Charles

Can you tell me how you calculated q-critical using the Nemenyi Test?

I tried it in R but am getting a different result.

I used qtukey(p=.05,nmeans=3,df=2,lower.tail=F) and got 8.330783.

qtukey is used in R to calculate quantiles of the studentized range distribution. Looking at the same distribution via the link provided by you in the article gave me the same result. The calculation you described in the formula table just after the method seems unclear. Please help.

John,

If you use the Real Statistics formula =QINV(0.05,3,2) you get the exact same value as you got in R, namely 8.330783.

According to the standard textbook by Zar, Biostatistical Analysis, for the Nemenyi test, you don’t use df = 2, but df = infinity. I used df = 480, which is as high as I needed to go to approximate infinity.

Charles

Hi Charles,

Realstats is a great program and I love the website too.

I’m trying to run a Dunn’s test as a follow-up to the Kruskal Wallis test. I have a Macbook Pro and, for some reason, the latest RealStats for Macs doesn’t seem to include the Dunn’s Test. Is that correct?

Thanks!

Andrew

Sorry Andrew, but only the Windows release has Dunn’s test at the moment.

Charles

Hello Charles

Thank you for this excellent statistical package! It’s been very useful. I have a problem, however. I sent my work for publication in a scientific journal and used ANOVA followed by Tukey’s test to analyze the results. However, a reviewer asks that I use Kruskal-Wallis followed by Dunn’s test. I tried to do this, but when I select ANOVA single factor the dialog box differs from the one in Example 2 in that there are no follow-up tests with “KW” (and no “Dunn test KW” to choose). Should I choose Tukey HSD?

Thank you

Gerardo

Hello Gerardo,

It sounds like you are not using the latest version of the Real Statistics software. If you are using Excel 2007, 2010, 2013 or 2016 you should install the latest version of the software. Dunn KW is one of the options in the latest version of the software. Unfortunately, this capability is not yet available on the MAC or Excel 2003 versions of the software. In this case, you could use Tukey HSD, but that is not the advice you got from the reviewer.

Charles

Hello Charles,

I love using the package! However, I just discovered an outcome that I do not understand. Despite a highly significant KW multiple-group comparison, I cannot detect any significant differences in the follow-up Dunn test at all. As in your example, I simply put 1 or -1 in the output table for the combinations I want to test. I have different group sizes and some variables are not normally distributed even after various data transformations, which is why I chose the nonparametric procedures. Am I doing something wrong here? Thanks, Dennis

Dennis,

If you send me an Excel file with your data and the tests you have performed, I will try to understand why you are getting the results that you are seeing. You can get my email address at Contact Us.

Charles

Hi,

First off–thank you for your excellent work! Your tools and great explanations are fantastic.

I’m no statistician, but I do like to understand how things work. I think you may have a minor glitch in the Bonferroni correction for the Dunn test (I have not checked others). The Bonferroni correction is applied in calculating d-crit. [=NORMSINV(1-Alpha/(k*(k-1)))], where k=the number of groups. The denominator is meant to calculate m, the number of orthogonal tests. To do this correctly, it should all be divided by 2.

i.e. for 3 groups, the current equation indicates m=6 comparisons being made. (and for 4, 12).

If I’m correct, please consider this my humble contribution to your great work!

Hi Ming,

I am very pleased that you like the website.

Actually, I think the formula I used for Dunn’s test is correct, as can be seen from the following two references. Since this is a two tailed test, probably the division by 2 in the denominator is cancelled out by the fact that you also need to divide the alpha value in the numerator by 2.

http://support.minitab.com/en-us/minitab/17/macro-library/macro-files/nonparametrics-macros/krusmc/

http://www.lexjansen.com/pharmasug/2004/statisticspharmacokinetics/sp04.pdf

Charles