Unplanned Comparisons

A number of unplanned comparison tests are available. A few of the commonly used post-hoc tests (e.g. Fisher’s LSD, Student-Newman-Keuls (SNK) and Tukey’s B) are not very accurate and usually should not be used. The key issue is to correct for experiment-wise error.

Although the Bonferroni and Dunn/Sidák correction factors can be used, since we are considering unplanned tests, we must assume that all pairwise tests will be made (or at least taken into account). For k groups, this results in m = C(k, 2) = k(k–1)/2 tests. For an experiment-wise error of α we need to use α/m as the alpha for each test (Bonferroni) or 1 – (1 – α)^(1/m) (Dunn/Sidák). This makes these tests too conservative.
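These correction factors are easy to compute directly. The following Python sketch (a hypothetical helper, not part of the Real Statistics tools) shows both per-test alphas for k groups:

```python
from math import comb

def per_test_alpha(k, alpha=0.05):
    """Per-comparison alpha when all m = C(k, 2) pairwise tests are made."""
    m = comb(k, 2)                          # m = k(k-1)/2 pairwise tests
    bonferroni = alpha / m                  # Bonferroni correction
    sidak = 1 - (1 - alpha) ** (1 / m)      # Dunn/Sidak correction
    return m, bonferroni, sidak

m, bonf, sidak = per_test_alpha(4)          # e.g. k = 4 groups
```

For k = 4 there are 6 pairwise tests, so each individual test is run at roughly α = .0083 rather than .05, which is why these corrections are considered too conservative for unplanned comparisons.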

More useful tests are Tukey’s HSD and REGWQ. These tests are designed only for pairwise comparisons (i.e. no complex contrasts). We also describe extensions to Tukey’s HSD test (Tukey-Kramer and Games and Howell) where the sample sizes or variances are unequal. We also describe the Scheffé test, which can be used for non-pairwise comparisons.

We discuss these tests on this webpage. Where the variances are unequal we can also use the Brown-Forsythe F* Test.

General guidelines are:

  • Tukey’s test is usually the safe choice. It is a good choice for comparing large numbers of means
  • REGWQ test is even better (i.e. has more power) for comparing all pairs of means, but should not be used when group sizes are different
  • Hochberg’s GT2 (not reviewed here) is best when sample sizes are very different
  • Games-Howell is useful when uncertain about whether population variances are equivalent.

Tukey’s HSD (Honestly Significant Difference)

The idea behind this test is to focus on the largest value of the difference between two group means. The relevant statistic is

q = (x̄max − x̄min) / √(MSW / n)

where x̄max and x̄min are the largest and smallest of the group sample means and n = the size of each of the group samples. The statistic q has a distribution called the studentized range q (see Studentized Range Distribution). The critical values for this distribution are presented in the Studentized Range q Table based on the values of α, k (the number of groups) and dfW. If q > qcrit then the two means are significantly different.

This test is equivalent to

x̄max − x̄min > qcrit √(MSW / n)

Picking the largest pairwise difference in means allows us to control the experiment-wise error rate for all possible pairwise contrasts; in fact, Tukey’s HSD keeps the experiment-wise α = .05 for the largest pairwise contrast, and is conservative for all other comparisons.

Note that the statistic q is related to the usual t statistic by q = √2 · t. Thus we can use the following t statistic

t = (x̄max − x̄min) / √(2MSW / n)

The critical value for t is now given by tcrit = qcrit/√2. If t > tcrit then we reject the null hypothesis H0: μmax = μmin, and similarly for other pairs.

As described above, to control type I error, we can’t simply use the usual critical value for the distribution, but instead use a critical value based on the largest difference of the means.
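The whole procedure can be sketched in Python using SciPy’s studentized_range distribution (available in scipy.stats from version 1.7 on); tukey_hsd_pair is a hypothetical helper, not a Real Statistics function:

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_hsd_pair(mean_i, mean_j, ms_w, n, k, df_w, alpha=0.05):
    """Tukey HSD comparison of two group means.
    ms_w: within-groups mean square; n: common group size;
    k: number of groups; df_w: within-groups degrees of freedom."""
    q = abs(mean_i - mean_j) / np.sqrt(ms_w / n)       # studentized range statistic
    q_crit = studentized_range.ppf(1 - alpha, k, df_w)
    return q, q_crit, q > q_crit                       # significant if q > q_crit
```

For example, tukey_hsd_pair(8.0, 6.0, 4.0, 16, 4, 44) gives q = 4.0, which exceeds qcrit ≈ 3.78, so that pair would be judged significantly different.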

From these observations we can calculate confidence intervals in the usual way:

(x̄i − x̄j) ± qcrit √(MSW / n)

or equivalently

(x̄i − x̄j) ± tcrit √(2MSW / n)
Example 1: Analyze the data from Example 3 of Planned Comparisons using Tukey’s HSD test to compare the population means of women taking the drug and the control group taking the placebo.

Using the Studentized Range q Table with α = .05, k = 4 and dfW = 44, we get qcrit = 3.7775. Note that since there is no table entry for df = 44, we need to interpolate between the entries for df = 40 and df = 48. Alternatively, we can employ Excel’s table lookup capabilities. We can also use the Real Statistics function QCRIT(4,44,.05,2,FALSE), as described below, to get the same result of 3.7775.
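If you don’t want to interpolate the table by hand, the same critical value can also be computed directly, for instance with SciPy (the studentized_range distribution in scipy.stats, version 1.7+):

```python
from scipy.stats import studentized_range

# Critical value for alpha = .05, k = 4 groups, dfW = 44 -- no interpolation needed
q_crit = studentized_range.ppf(1 - 0.05, 4, 44)
print(round(q_crit, 4))   # close to the interpolated table value 3.7775
```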

The critical value for differences in means is

qcrit √(MSW / n) = 1.8046

Since the difference between the means for women taking the drug and women in the control group is 1.75, and 1.75 is smaller than 1.8046, we conclude that the difference is not significant (just barely). The following table shows the same comparisons for all pairs of variables:


Figure 1 – Pairwise tests using Tukey’s HSD for Example 1

From Figure 1 we see that the only significant difference in means is between women taking the drug and men in the control group (i.e. the pair with largest difference in means). We can also use the t-statistic to calculate the 95% confidence interval as described above. In Figure 2 we compute the confidence interval for the comparison requested in the example as well as for the variables with maximum difference.


Figure 2 – Tukey HSD confidence intervals for Example 1

Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:

QCRIT(k, df, α, tails, h) = the critical value of the Studentized range q for k independent variables, the given degrees of freedom and value of alpha, and tails = 1 (one tail) or 2 (two tails, default). If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.

This function is based on the table of critical values provided in Studentized Range q Table. Note too that in the previous example we found that QCRIT(4,44,.05,2,FALSE) = 3.7775 using linear interpolation (between the table values of df = 40 and df = 48). If harmonic interpolation were used (see Interpolation) then we would have obtained the value QCRIT(4,44) = 3.7763.

The Real Statistics Resource Pack also provides the following functions which provide estimates for the Studentized range distribution and its inverse based on a somewhat complicated algorithm.

QDIST(q, k, df) = the value of the Studentized range distribution at q for k independent variables and df degrees of freedom.

QINV(p, k, df, tails) = the inverse of the Studentized range distribution at p for k independent variables, df degrees of freedom and tails = 1 or 2 (default 2).

Observation: Note that the values calculated by QCRIT and QINV will be similar, at least within the range of alpha values in the table of critical values. E.g.  QINV(.015,4,18,2) = 4.82444 while QCRIT(4,18,.015,2) = 4.75289.

Note that QDIST outputs a two-tailed value. E.g. QDIST(4.82444,4,18) = .015. To get the usual cdf value for the Studentized range distribution, you need to divide the result from QDIST by 2, which for this example is .0075, as confirmed by the fact that QINV(.0075,4,18,1) = 4.82444.

Finally, note that the algorithm used to calculate QINV (and QDIST) is fairly accurate except at low values of p and df. In particular, for df = 1, and certainly when p ≤ .025, QCRIT will be more accurate than QINV (at least for those values found in the table of critical values). This is also true when df = 2 and p ≤ .01, or when df = 3 and p = .001.

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack contains a Tukey’s HSD Test data analysis tool which produces output very similar to that shown in Figure 2.

For example, to produce the first test in Figure 2, proceed as follows: enter Ctrl-m and select the Analysis of Variance data analysis tool from the list. In the dialog box that appears, select the Single Factor Anova option. A dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears. Enter A3:D15 in the Input Range, check Column headings included with data, select the Tukey HSD option and click on the OK button.

The report shown in Figure 3 now appears. We see that only MC-WD is significant, although WC-WD is close.


Figure 3 – Real Statistics Tukey HSD data analysis

Tukey-Kramer Test

When sample sizes are unequal, the Tukey test can be modified by replacing 2/n with 1/ni + 1/nj in the above formulas. In particular, the standard error for the q statistic becomes

s.e. = √( (MSW/2) (1/ni + 1/nj) )

which reduces to √(MSW/n) when ni = nj = n.
Note that the Real Statistics Tukey HSD data analysis tool described above actually performs the Tukey-Kramer Test when the sample sizes are unequal.
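A sketch of the Tukey-Kramer comparison in Python (tukey_kramer is a hypothetical helper; SciPy supplies the Studentized range critical value):

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_kramer(mean_i, n_i, mean_j, n_j, ms_w, k, df_w, alpha=0.05):
    """Tukey-Kramer test for a pair of groups with unequal sizes."""
    se = np.sqrt(ms_w / 2 * (1 / n_i + 1 / n_j))   # reduces to sqrt(MSW/n) when n_i = n_j
    q = abs(mean_i - mean_j) / se
    q_crit = studentized_range.ppf(1 - alpha, k, df_w)
    return q, q_crit, q > q_crit
```

When ni = nj the standard error reduces to √(MSW/n) and the test is identical to Tukey’s HSD.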

Example 2: Analyze the data in range A3:D15 of Figure 4 using the Tukey-Kramer test to compare the population means of women taking the drug and the control group taking the placebo. This example is the same as Example 1 but with some data missing, and so there are unequal sample sizes.


Figure 4 – Data and ANOVA for Example 2

Enter Ctrl-m and select Single Factor Anova and Follow-up Tests from the menu. A dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears. Enter A3:D15 in the Input Range, check Column headings included with data, select the Tukey HSD option and click on OK.

The output is shown in Figure 5.


Figure 5 – Tukey-Kramer Data Analysis

Games and Howell Test

A better alternative to Tukey-Kramer when the variances are unequal is the Games-Howell test. Here the standard error becomes

s.e. = √( (si²/ni + sj²/nj) / 2 )

Thus we use a different pooled variance for each pair instead of the same pooled variance MSW. We employ the following test

q = (x̄i − x̄j) / s.e. > q′crit

where the standard error is as above and q′crit is the critical value of the Studentized range statistic, but with the degrees of freedom given by df′ as defined in Two Sample t-Test with Unequal Variances, namely

df′ = (si²/ni + sj²/nj)² / [ (si²/ni)²/(ni−1) + (sj²/nj)²/(nj−1) ]
In this way we also take care of the case where the variances are unequal in exactly the same manner as in Theorem 1 of Two Sample t-Test with Unequal Variances, except that we now use the q-statistic instead of the t-statistic. Note that the supplemental function DF_POOLED can be used to calculate df′.
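The Games-Howell calculation can be sketched as follows (games_howell is a hypothetical helper; the Welch df′ computed here plays the role of DF_POOLED):

```python
import numpy as np
from scipy.stats import studentized_range

def games_howell(mean_i, var_i, n_i, mean_j, var_j, n_j, k, alpha=0.05):
    """Games-Howell comparison: per-pair variances and Welch df'."""
    a, b = var_i / n_i, var_j / n_j
    se = np.sqrt((a + b) / 2)
    df = (a + b) ** 2 / (a ** 2 / (n_i - 1) + b ** 2 / (n_j - 1))  # Welch-Satterthwaite df'
    q = abs(mean_i - mean_j) / se
    q_crit = studentized_range.ppf(1 - alpha, k, df)
    return q, q_crit, df
```

Note that df′ is generally not an integer; SciPy’s studentized_range.ppf accepts fractional degrees of freedom, whereas a table lookup requires interpolation.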

Example 3: Repeat Example 2 using the Games-Howell test.

We repeat the same steps as we used in Example 2 except that we choose the Games-Howell option. The output of the test is shown in Figure 6.


Figure 6 – Games-Howell Data Analysis


REGWQ Test

The Ryan, Einot, Gabriel, Welsch Studentized Range Q (REGWQ) test uses what is known as a step-down approach. No confidence intervals are calculated.

First, the equality of all of the means is tested at the αk level. If the null hypothesis of equality of means is rejected, then each subset of k – 1 means is tested at the αk-1 level; otherwise, the procedure stops.

In general, if the hypothesis of equality of a set of p means is rejected at the αp level, then each subset of p – 1 means is tested at the αp-1 level; otherwise, the set of p means is considered not to differ significantly and none of its subsets is tested. One continues in this manner until no subsets remain to be tested.

First we arrange the sample means in descending order x̄1 ≥ x̄2 ≥ … ≥ x̄k. At each stage p, we define αp as follows:

αp = 1 − (1 − α)^(p/k) for p < k − 1, and αp = α for p = k − 1 or p = k

Now, the equality of the means x̄i, …, x̄j where p = j − i + 1 is rejected when

x̄i − x̄j > qcrit √(MSW / n)

where qcrit is based on p, dfW and αp, as for Tukey’s HSD test, except that the αp are defined as above. The only problem with this approach is that qcrit values are required for values of α that are usually not found in the Studentized Range q Table. To partially address this issue we provide a second table of values for q (also in Studentized Range q Table) for values of α = .025, .005 and .001. The best we can do for values not in the table is to interpolate.
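Under the standard Ryan/Einot-Gabriel/Welsch definition of the stage levels (assumed here, since the exact formula depends on the variant used), the αp can be computed as:

```python
def regwq_alphas(k, alpha=0.05):
    """Stage-wise significance levels for the REGWQ step-down procedure."""
    alphas = {}
    for p in range(2, k + 1):
        if p >= k - 1:
            alphas[p] = alpha                        # sets of k and k-1 means use alpha itself
        else:
            alphas[p] = 1 - (1 - alpha) ** (p / k)   # smaller subsets are tested more strictly
    return alphas
```

For k = 4 this gives α4 = α3 = .05 and α2 = 1 − .95^(1/2) ≈ .0253, which is why qcrit values for non-standard alphas such as .025 are needed.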

Example 4: Apply the REGWQ to the data in Example 1 of Confidence Interval for ANOVA.


Figure 7 – REGWQ for Example 4

There are three tables in Figure 7. The table in the upper left hand side of the figure consists of the variables sorted from highest to lowest mean. We will refer to the index (i.e. rank) of these variables, where i = 1 refers to the variable with highest mean (i.e. women who take the drug) and i = 4 refers to the variable with lowest mean (i.e. men in the control).

The table in the upper right side of the figure is used to determine the critical values of the mean differences. These are based on the possible values of p, namely 2, 3 and 4. For each p, first we determine the required values of αp (range R8:T8) as described above. For each p we then find the qcrit values from the Studentized Range q Table (interpolating as necessary, via rows 10 and 11 in Figure 7) corresponding to p, df = 44 and αp (range R12:T12). We multiply these values by

√(MSW / n)

to get the critical values for the mean differences corresponding to p = 2, 3 and 4 (range R13:T13).

The final table iterates over all possible pairs of variables (given by the indices i and j, which point to the entries in the first table). For each pair, the value of p is calculated as j − i + 1 (row 17). The difference of the means corresponding to indices i and j is calculated in row 20. Each mean difference is then compared with the critical value calculated in the second table for the value of p in that column. The pairs whose mean difference is larger than the critical value are significant.

For our example, only the mean differences in the first two columns are significant. This means that there is a significant difference between women who take the drug and men in the control group (first column), and there is also a significant difference between women who take the drug and women in the control group.

Dunnett’s Test

This test is used when we want to compare one group (usually the control treatment) with the other groups. In this case, Dunnett’s test is more powerful than the other tests described on this webpage.

The test is similar to Tukey’s HSD, except that instead of testing

q = (x̄max − x̄min) / √(MSW / n)

we test whether

|x̄j − x̄0| > td √(2MSW / n)

where n = the size of the group samples, x̄0 = the mean of the control group, x̄j = the mean of any other group and td = the (two-tailed) Dunnett’s critical value given in the Dunnett’s Table. The table contains the values td(k, dfW, α) where k = the number of groups (treatments) including the control.

The test can also be used when the group sizes are not equal via

|x̄j − x̄0| > td √( MSW (1/n0 + 1/nj) )

although, since Dunnett’s Table is based on equal group sizes, the above formula is only accurate if the group sizes are not too different.
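The unequal-sample-size form of the statistic is simple to compute; a minimal sketch (dunnett_statistic is a hypothetical helper, and td itself still comes from Dunnett’s Table or DCRIT):

```python
import numpy as np

def dunnett_statistic(mean_0, n_0, mean_j, n_j, ms_w):
    """t-type statistic comparing treatment j against the control group 0."""
    se = np.sqrt(ms_w * (1 / n_0 + 1 / n_j))
    return abs(mean_j - mean_0) / se
```

The difference is declared significant when this statistic exceeds td(k, dfW, α).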

Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:

DCRIT(k, df, α, h) = the critical value td for k independent variables, the given degrees of freedom and value of alpha. If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.

Real Statistics Data Analysis Tool: We now show how to use the Dunnett’s Test data analysis tool to address Example 5.

Example 5: Assuming that Method 1 is the control group in Example 1, compare the mean of this method with the means of the other methods using Dunnett’s test. The data is repeated in Figure 8.

Enter Ctrl-m, select Analysis of Variance and press the OK button. Next select Single Factor Anova from the dialog box that appears. A dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA appears. Enter A3:D11 in the Input Range, check Column headings included with data, select the Dunnett’s Test option and click on the OK button.

The output is similar to that shown in Figure 8, although initially the contrast coefficients in range G4:G7 are blank. You need to enter a +1 and -1 in two of these cells, one of which must correspond to the control group (Method 1 in this example). The output for the comparison of Methods 1 and 3 is shown in Figure 8.


Figure 8 – Dunnett’s Test

We see that there is a significant difference between Method 1 and 3 (cell L11). We can of course compare Method 1 with Method 2 or Method 4 by simply changing where the value for the +1 contrast is placed.

Note that cell I11 contains the formula =DCRIT(COUNT(I4:I7),H11,L2). Thus the critical value in Dunnett’s Table when k = 4, df = 28 and α = .05 is 2.483.

Scheffé Test

To carry out Scheffé’s test, take the following steps:

  • Calculate the planned comparison t-test
  • Square the t-statistic to get F (since F = t2)
  • Find the critical value of F with dfB, dfW degrees of freedom for the given value of α and multiply it by dfB; thus the critical value is dfB · FINV(α, dfB, dfW).
  • If F > the critical value then reject null hypothesis

Since Scheffé’s test is very conservative, it is not recommended for pairwise comparisons where better tests are available, but it can be useful for more complicated comparisons where the other unplanned tests don’t apply.
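The Scheffé critical value in the steps above is easy to compute; a sketch using SciPy’s F distribution (scheffe_test is a hypothetical helper, equivalent to dfB · FINV(α, dfB, dfW) in Excel):

```python
from scipy.stats import f as f_dist

def scheffe_critical(df_b, df_w, alpha=0.05):
    """Scheffe critical value: dfB times the F critical value."""
    return df_b * f_dist.ppf(1 - alpha, df_b, df_w)

def scheffe_test(t_stat, df_b, df_w, alpha=0.05):
    """Reject H0 when the squared contrast t-statistic exceeds the critical value."""
    return t_stat ** 2 > scheffe_critical(df_b, df_w, alpha)
```

Note that FINV(α, …) in Excel is the right-tail inverse, which corresponds to f_dist.ppf(1 − α, …) in SciPy.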

Real Statistics Data Analysis Tool: Scheffé’s test can be carried out using  the Single Factor Anova and Follow-up Tests data analysis tool provided by the Real Statistics Resource Pack. We show how this is done in the following example.

Example 6: Carry out the test in Example 2 from Planned Comparisons for ANOVA using Scheffé’s test.

The steps are very similar to those used for Tukey’s HSD, as described above, namely enter Ctrl-m and select Single Factor Anova and Follow-up Tests from the menu. A dialog box similar to that shown in Figure 1 of Confidence Interval for ANOVA will appear. Enter I21:L29 in the Input Range, check Column headings included with data, select the Scheffe option and click on OK.

A report similar to that shown in Figure 9 will appear, but with no numbers in the shaded range O23:O26. You must now enter the contrast values. Setting the contrast coefficients for Methods 1 and 2 to -0.5 and the contrast coefficient for Method 4 to 1, we get the output shown in Figure 9.


Figure 9 – Scheffé data analysis for Example 6

This figure shows there is no significant difference between Method 4 and the average of Methods 1 and 2.


83 Responses to Unplanned Comparisons

  1. Bowen says:


    Great website! Doing these things manually in Excel really does allow for much more flexibility. I have some questions about the Tukey/Tukey-Kramer tests.

    First, a conceptual question: my interpretation of the Tukey test is that one computes q-values for each pair of means compared, using the mean values, MS_W, and n. This is compared to a critical q-value that depends only on df_W, m, and alpha.

    Where I’m confused is how the test “focuses on the largest value of the difference between two group means.” I see only individual paired comparisons made.

    Second, under the Tukey-Kramer test, the s.e. for the q statistic is modified as you note. However, instead replacing (1/n) with 2 * (1/n_i + 1/n_j), should it not be 0.5 * (1/n_i + 1/n_j)? For example, in the trivial case of comparing n_i = n_j = 1, multiplying by 0.5 backs out the original Tukey equation with n = 1. I checked an Excel dataset using Prism, and my q-values check out with the latter.


    • Charles says:

      Thanks for your excellent questions. Let me give you a quick response, although I plan to look at your questions more carefully and provide a more complete response shortly.
      The s.e. provided is not for the q statistic, but for the t statistic that corresponds to the q statistic (i.e. the q value divided by the square root of 2). Having said this, I can see how this might be confusing. I will either explain things better on the website or change the data analysis tool to provide the s.e. of the q statistic. The resulting test and confidence interval should be the same, however. The q statistic is based on the largest difference of mean values. Note that no matter which pair of means you choose, the critical value qcrit does not change.

  2. Ede says:

    Dear Charles

    Thank you so much for this tool that is helping lot of persons to solve our statistical analysis. Really appreciate it.
    I would like to know about Tukey’s HSD Test, but please you should know that I’m a beginner in these kind of study. So my question is:
    How do I set the contrast values that should I enter in the shaded range G5:G8 and you explained (Since Tukey’s HSD Test is a pairwise comparison you should only use two contrasts: one with 1 and the other with -1.) ?
    I understood is that I should only choice two groups to compare and write (1) in the group with the higher value of mean and (-1) in the group with lower value of mean, between the two groups that I want to compare. For example in the same Figure 3, If I want to compare MD with MC, should I write in shaded range G5:G8 MD with (1) and MC with (-1)
    Please, I hope you can help me, and apologize for my question and take part of your time….if you have some information that I should read about it, please feel free to advice me.
    Thanks once more and have great rest of the week,


    • Charles says:

      Dear Ed,
      You have understood things very well. Based on the real-world analysis that you want to accomplish you need to choose the two groups you want to compare. Once you do this you assign a contrast value of +1 to one and -1 to the other exactly as you described.
      If you do other followup tests (e.g. the Contrasts option of the Single Factor Anova data analysis tool) then you can assign other values to the contrast besides 1 and -1, but for Tukey’ HSD 1 and -1 are the only choices.

      • ED says:

        Dear Charles,
        I have been really busy with my experiments and classes, but now coming back to carry out my stats analysis from data obtained and using your “AMAZING” tools,
        Thank you a lot,

        Best regard


  3. Rich says:

    Another outstanding effort with Release 2.11.

    Perhaps some words about the new q-distribution functions you recently added might be included here? And, an observation: it doesn’t seem that all of the new functions are yet included in the tools inventory listings.

    Regards, Rich

    • Charles says:

      Glad you like the new release. I have already started updating the website to explain the new capabilities. I hope to have this completed this weekend.

  4. Elizabeth Malek says:

    Ok. Here is my question again, but better asked (I hope). Yes, Tukey HSD tells me whether there is a significant difference in a pair of means. Now I have multiple yes/no answers. How can I now group them according to their differences? Example: group A (treatments 1 and 2) is significantly different from group B (treatments 3 and 4, 5, and 6). How can I tell if there is a group C (treatments 5 and 6) and/or more groups of statistically different treatments just by looking at the yes/no answers? My goal is to make a table with the means of the treatments from largest to smallest in a row in excel. I would like to underline the treatments that are within the same group. How can I do this?

    • Elizabeth Malek says:

      I just figured it out! I wasn’t showing any differences because all of my contrast values were 1 instead of 1 and -1. Can you explain why we need to use these values, how we determine which we should use, and how to come up with the value?

      • Charles says:


        I am pleased that you figured this out yourself. How you set the contrast values depends on what you want to test. What is important is that the sum of the contrast values is equal to 0 for any single test. Any variable which is not part of the analysis gets a value of 0.

        For example, if you want to compare variable x2 with variable x4 just set the 2nd contrast value to 1 and the 4th to -1 (or vice versa). If you want to compare x1 with the average of the values of x2 and x3, just set the 1st contrast to 1 and the 2nd and 3rd to -.5.

        Suppose x1 represents men who live in the US, x2 represents women who live in the US, x3 represents men who live in Canada and x4 represents women who live in Canada. If you want to compare men vs. women, set x1 = x3 = .5 and x2 = x4 = -.5 (alternatively you could set x1 = x3 = -1 and x2 = x4 = +1 and get the same result). If instead you want to compare people in the US with Canadians, set x1 = x2 = .5 and x3 = x4 = -.5.

        I tend to use contrast values so that the positive contrasts add up to 1 and the negative contrasts add up to -1, but this isn’t essential.


  5. David says:

    Is there any way I tell it to read replicates within a treatment across a row instead of down columns (as in the option to group by row or column in excel’s regular ANOVA analysis) without transposing the data (i.e. all my data has the dependent variables listed down the columns and replicate measurements across the same row)?

    • Charles says:

      Unfortunately the Real Statistics ANOVA data analysis tool currently only accepts two formats: the format that you referenced and the standard format (similar to that used by tools such as SPSS). The only solution I can think of is to transpose the data.

  6. Addie says:

    Thank you for this excellent tool to further my statistic ability when using excel. I’m currently experiencing a small difficulty in performing my own analysis, however.

    When I analyze my data set, the shaded grey column labeled “c” in the Tukey output is empty. Because of this, a lot of the remaining outputs cannot be calculated.

    additionally, there is no column for individual “yes” or “no” on whether each group is different. Based on the ANOVA p value, there is significant difference, but I’m unable to discern which groups.

    Thank you,

  7. William Agurto says:


    There’s a little bug in Real Statistics 3.2 when using Single Factor Anova (first option of Analysis of Variance menu):
    When I used 4 samples (different size: 19, 17, 15 and 18), I got an error in the Random Factor Table: It seems there’s an error calculating the estimate group variance. For my example, that value is negative. For that reason I obtained “#NUM!” in lower and upper bounds for the mean. Perhaps is necessary to use the ABS Excel function in the numerator for getting a positive number?

    The values I use for my example are the following.

    Thank you.


    G1 G2 G3 G4
    78 81 83 47
    45 59 32 90
    77 58 33 37
    2 48 84 15
    45 43 35 28
    46 98 68 5
    30 16 56 84
    96 6 21 36
    44 99 2 2
    31 18 31 4
    72 37 86 49
    28 6 39 36
    29 2 17 2
    68 67 53 65
    51 6 24 77
    15 50 79
    99 84 86
    73 94

    • Charles says:

      It is quite possible for the value to be negative, in which case the remaining values will be meaningless. I will explain this once I update the webpage. The value for df can also be less than one, in which case there will be an error since Excel’s chi-square function will round df down to zero.

      • William Agurto says:

        Thank you for your answer, Charles. I didn’t know that value could be negative and df could be less than one.



        • Charles says:

          Obviously in these cases the estimates generated by the formulas are not particularly useful since a negative variance cannot exist and a df of less than 1 (even if not rounded off) will produce a huge confidence interval.

  8. John Vanek says:

    Thanks for the great website!

    How would I apply the Tukey HSD test to a two factor ANOVA? I’ve successfully used your add-on to use the Tukey following a single way ANOVA, but am at a loss on how to get it to apply to the other ANOVAs.

    Thank you!

  9. David says:


    Above it says:”Example 1: Analyze the data from Example 1 of Confidence Interval for ANOVA using Tukey’s HSD test to compare the population means of women taking the drug and the control group taking the placebo.”

    However if I follow the link it says:”Example 1: Find the confidence intervals for each of the methods in Example 3 of Basic Concepts for ANOVA.”

    Following that link it says:”Example 3: Repeat the analysis for Example 2 where the last participant in group 1 and the last two participants in group 4 leave the study before their reading tests were recorded.”

    Example 2 on the same page says:”Example 2: A school district uses four different methods of teaching their students how to read and wants to find out if there is any significant difference between the reading scores achieved using the four methods. It creates a sample of 8 students for each of the four methods.

    I cannot find any examples comparing women taking drugs vs. placebos or that has 44 degrees of freedom. Do you mean this page?


    • Charles says:

      Yes, the example should refer to Example 3 on the Planned Comparisons webpage. I have just changed the link.
      Thanks for finding this mistake.

  10. Marcy says:

    Hi, thanks for all this useful help and resources. I have tried to match the tukey hsd results with minitab output and by formula from books. With the books i have almost the same std error and confidence intervals that real stats output. I dont understand quite well the c^2/n part .

    With minitab output they use the t test version of tukey and the formulas for the family ratio error and standard error of diferences yield different results from the formulas used in your website for t tukey hsd. Minitab say they use 0,007 for individual error alpha if you go for 0,05 family error rate in the case of 10 comparisons. but it doesnt follow the 1-(1-alpha)^c family error formula or explain how they get that number.
    This are the numbers Iam working with, this are from 3 plants workers ages. first row are labels not data.
    1 2 3
    29 32 25
    27 33 24
    30 31 24
    27 34 25
    28 30 26
    Help!! :)) thank you very much.

    • Charles says:


      I don’t use Minitab, so it is difficult for me to comment on the results they get. I have checked my results with another popular software product and the results for your data are exactly the same.

      Regarding .007 for individual alpha for familywise alpha of .05 with 10 comparisons, this doesn’t seem to have anything to do with the data that you have included in your comment since you only have 3 groups.


  11. Dillah says:

    Thanks for your wonderful website.
    It helps me a lot for my final project problem.
    However, I have a question for you regarding how to perform a table like Figure 1.
    Could you please tell me how to create the table in a great detail?


  12. Alistair Cullum says:

    Thanks for this fantastic resource. I have a question about confidence intervals for differences between means when variances of groups are heterogenous – that is, cases where you’d be using Welch’s test and the Games and Howell test. Would that CI be calculated in the standard way from the standard error and t-statistic, but with the SE and degrees of freedom for t taken from the Games and Howell calculations? Or does the difference in variances introduce an additional wrinkle that needs to be considered?

    Thanks again,

    • Charles says:

      The confidence intervals can be calculated in the usual way.

      • Alistair Cullum says:

        Thank you. I also had a question about a possible typo. In the introduction to this section, you write “Games-Howell is useful when uncertain about whether population variances are equivalent.”, but in the Games-Howell section, the first sentence reads “A better alternative to Tukey-Kramer when sample sizes are unequal is Games and Howell. ” I’m guessing the second of those is also supposed to say “variances” instead of “sample sizes,” but just wanted to double-check.

        • Charles says:

          You are correct. I had corrected this on my offline copy of the site, but not on the website itself. Thanks very much for catching this error and helping to improve the website. Much appreciated.
          I have now made the correction to the referenced webpage.

  13. Erik van Rensbergen / Flanders / Belgium says:

    I want to write my own Excel program for computing the critical q value in Tukey’s test.
    Problem is that I cannot find the statistic q in the formulas provided by Excell.
    Finding q-crit via q-crit = (2)exp0.5 * t-crit doesn’t seem to work to me since q=f(alpha, k, dfW) and t=f(alpha, dfW).
    I went to your the Studentized Range q Table which gave of course al the values necessary to perform Tukey’s HSD test, but I feel this is not being as practical as having it formulated on the same Excel sheet together with Anova and ICC computations. Could you help me to find a solution? I realize that your anwer may very well be that working with the Real Statistics Data Analysis Tool would be a far better solution, but I want to do it my own way…
    Thank you very much, you have always been a great support.

  14. Anna says:

    Hi Charles,
    I’ve run several one-way ANOVAs with Tukey’s HSD and consistently run into the problem of the q statistic showing up as #DIV/0! in the spreadsheet. I’m pretty confident in my setup, so I’m not sure why. Here’s some dummy data that gives this result, if needed:
    all gut tentacle Neo
    4 1 3 0
    3 2 1 2
    5 3 2 1
    3 2 1 1
    5 3 2 2
    10 5 4 2
    2 1 2 2
    3 1 2 1

    The p-value is significant, and visually I can see where the significant differences are, but I can’t seem to convince the formula to report the q-stat correctly.

    • Charles says:

      You need to fill in the c column in the output by placing a +1 in any one of the highlighted cells and a -1 in another (these are the contrast coefficients). E.g. in your problem, if you place +1 next to Group 1 and -1 next to Group 3, you would compare Group 1 with Group 3. You can compare any pair of groups without inflating the experiment-wise error (which is the point of Tukey’s HSD).
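As a cross-check outside Excel, the full pairwise table for data like Anna’s can be produced in one call with SciPy’s tukey_hsd (a sketch, assuming SciPy 1.8 or later, which added scipy.stats.tukey_hsd):

```python
from scipy.stats import tukey_hsd

# Anna's dummy data, one list per group
all_grp  = [4, 3, 5, 3, 5, 10, 2, 3]
gut      = [1, 2, 3, 2, 3, 5, 1, 1]
tentacle = [3, 1, 2, 1, 2, 4, 2, 2]
neo      = [0, 2, 1, 1, 2, 2, 2, 1]

res = tukey_hsd(all_grp, gut, tentacle, neo)
print(res)         # pairwise mean differences with p-values
print(res.pvalue)  # 4 x 4 symmetric matrix of p-values
```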

  15. Felix says:

    Hi Sir:
    How much can the sample sizes differ quantitatively when using Dunnett’s test?

    • Charles says:

      The version of the test in the Real Statistics Resource Pack assumes that the sizes of all the groups are equal. The test can be modified to handle unequal sample sizes. This is done in the same manner in which Tukey-Kramer modifies the Tukey HSD test to handle unequal sample sizes.

  16. Fahmi says:


    I’ve been trying to use both contrasts and Tukey’s HSD for one-way ANOVA, and I got different results from the two comparison analyses: in my analysis the contrast gives a significant result on a pairwise comparison while Tukey’s HSD does not. Can you enlighten me as to why this happens?

    • Charles says:

      It is not so uncommon to find that different tests for the same situation give conflicting results. Each test uses its own set of assumptions and has its own advantages and disadvantages.
      First and foremost, you need to make sure that each test is suitable for the analysis that you are undertaking (and that the assumptions of that test are met). In your situation, Tukey HSD is used for multiple pairwise comparisons, while contrasts can be used for comparisons that are not pairwise. While you can use Tukey’s HSD to perform multiple pairwise comparisons, you cannot use it to compare more than two variables at a time (although you can use contrasts for this).

  17. Francis says:

    Dear Charles,

    As I’m a beginner in all of this, I’m a bit confused about the output of the Tukey HSD test. How can I know at which level of significance two groups are different? (based on the P-values, I want to add the appropriate ‘*, **, ***’ symbols to my charts. However, I can’t find the P-values associated with each pairwise comparison)

    Kind regards

    • Charles says:

      You can calculate the p-value by using the QDIST function. E.g. for Figure 3, p-value = QDIST(3.663207,4,44) = .060185.
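The same p-value can be reproduced outside Excel from the survival function of the studentized range distribution; a sketch with SciPy (QDIST’s table-based interpolation may differ from SciPy’s direct computation in the last digits):

```python
from scipy.stats import studentized_range

# p-value for q-stat = 3.663207 with k = 4 groups and dfW = 44
p = studentized_range.sf(3.663207, 4, 44)
print(p)  # close to QDIST's value of .060185
```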

      • Francis says:

        Thanks. I really appreciate your help.

      • Austin says:

        I am showing a significant between group difference via ANOVA and a significant difference between 2 of my 3 groups on post hoc Tukey HSD. However, I do not see a p-value in the generated output chart.
        When I try the QDIST function mentioned above “=QDIST(q-stat,# of levels,df)”, the formula outputs “1”.
        Can anybody help with this?

        • Charles says:

          If you are using one of the latest releases of Real Statistics, you should see a p-value in the output. This won’t be the case if you are using the Mac version.
          If the QDIST function outputs 1, then the p-value = 1 and you clearly don’t have a significant result.

  18. Dustin says:

    Brilliant site, thank you! I typed in your data and ran the add-in function for this; however, the DCRIT function, namely =DCRIT(COUNT(I3:I6),H10,L1) in my case, returns #VALUE!. I cannot get the function to work with different parameters either. Note, other RealStats functions are working in Excel during this instance as well. Thoughts?

    • Charles says:

      Glad you like the site.
      The formula on the referenced webpage is =DCRIT(COUNT(I4:I7),H11,L2). Your formula should work provided you have shifted all the data one row up. If so, then I suggest that you send me an Excel file with your data so that I can see what went wrong.

      • Davide says:

        I had the same problem.
        Also with my data the DCRIT function results in an error. But I found out that it is because the df for my dataset is bigger than 240 (it is 294), so the formula doesn’t recognize it as a value in Dunnett’s table and returns an error. Eventually, I manually put in the nearest lower df in the table (240), and the formula works as it should 🙂

        Anyway, it’s an awesome tool! Thank you very much!

        Have a nice day

        • Charles says:

          Last week I also came across this error in the DCRIT function. It will be corrected in the next release of the software, which should be available in the next couple of days.

  19. Jose says:

    Dear Professor,

    On another page I wrote a suggestion about reporting the p-value.
    But I need to calculate it…

    I tried to follow what you wrote on this page and did not obtain the values.

    q-stat df
    -18.74358529 4285

    Ah… there are 7 samples/groups.
    I tried to use the formula and did not obtain any p-value; it says there is an error in the formula.
    It seems to me that my values are very big, because if I reduce the df I obtain values.


  20. Davide says:

    Good morning,
    Since the tool for Dunnett’s test shows only whether there is significance or not (with “yes” or “no”), how can I find the p-values for a Dunnett’s test? Is there a way to do that?
    Thank you very much!
    Kind regards

    • Charles says:


      The best I can do at present is make an estimate based on the Table of Critical Values for Dunnett’s Test as shown in
      Dunnett’s Test Table

      E.g. suppose k = 10 and df = 20. From the table I see that the critical values at .01, .05 and .10 are respectively 3.694, 2.946 and 2.600. Thus if the test statistic is 3.2, then I know that the p-value is somewhere between .01 and .05; a rough estimate is something like .03. I can make this more precise by using the DCRIT function. In fact, I see that DCRIT(10,20,.03) = 3.070667, which is lower than 3.2. Thus I need a value lower than .03 (but higher than .01). I try .02 next. Since DCRIT(10,20,.02) = 3.2265, which is higher than 3.2, I need a value higher than .02 (but lower than .03).

      I can iterate this “divide and conquer” approach to whatever degree of accuracy I desire. I settle for .0212 since DCRIT(10,20,.0212) = 3.200038. Note that this approach uses harmonic interpolation, as explained on the webpage

      I had planned to create a Real Statistics function that would do all of this for you for the previous release, but I ran out of time. I will likely put it in the next release.
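The iterative scheme described above is bisection: repeatedly query the critical-value function until it matches the observed statistic. A generic sketch (using SciPy’s studentized range in place of DCRIT purely for illustration; any decreasing critical-value function can be plugged in):

```python
from scipy.stats import studentized_range

def p_from_crit(stat, crit, lo=1e-6, hi=0.999, tol=1e-6):
    """Invert a critical-value function crit(alpha), which is
    decreasing in alpha, to find the p-value of a test statistic."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if crit(mid) > stat:  # critical value still above stat: true p-value is larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Stand-in for DCRIT: studentized range critical value with k = 4, df = 44
crit = lambda a: studentized_range.ppf(1 - a, 4, 44)
p = p_from_crit(3.663207, crit)
# p agrees with studentized_range.sf(3.663207, 4, 44)
```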


  21. Long Nguyen says:

    Dear Charles,

    Your website really opens my mind! I tried analysing my data with Welch’s test followed by the Games and Howell test, but the p-value did not show up. Did I miss something? Is there a formula like QDIST to calculate the p-value for this test?


    • Charles says:

      In the latest release of the Real Statistics software (Rel 4.11) the p-value is given for the Games-Howell test. In any case, you can use the QDIST function.

  22. Nadav says:

    Dear Charles,

    In figure 4, there is a description of a calculation.
    I was trying to manually calculate the Description part myself, and couldn’t understand how you calculated the Std Err (column L in the figure).
    Shouldn’t it be s/sqrt(n)? I.e. for MD, for example, SQRT(2.87/11) = 0.51 (you got 0.48).


    • Charles says:

      That too is a reasonable approach, and in fact I used to use that calculation. I recently changed the approach to use the common variance MSE found in cell I15, since this seemed to be the more commonly used approach. A case can be made for either approach.
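The two choices can be put side by side. A sketch using Nadav’s numbers (s² = 2.87, n = 11); MSE = 2.53 is a hypothetical value, chosen here only because it reproduces the 0.48 in the figure:

```python
import math

# Per-group standard error: uses the group's own variance
s2_group, n = 2.87, 11
se_group = math.sqrt(s2_group / n)   # about 0.51 (Nadav's calculation)

# Pooled standard error: uses the common ANOVA mean square error (MSE)
mse = 2.53                           # hypothetical value, not from the webpage
se_pooled = math.sqrt(mse / n)       # about 0.48 (the webpage's approach)
```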

  23. Jamie says:

    Hi Charles,

    I have just discovered your add-in and website and am pleasantly surprised that it’s so well detailed. I was playing around with the add-in and am confused by the output I see for the Tukey HSD after the one-way ANOVA. I don’t see the pairwise significance table shown in “Fig 1” or the table shown in “Fig 2” on the Unplanned Comparisons webpage. I only see the table shown in “Fig 3”. I guess another way of describing what I’m seeing (using your Worksheet example #2), is that I don’t see the Tukey’s HSD table, but only the Tukey HSD/Kramer table. Am I missing something here? Thanks.

    • Charles says:

      In the current implementation (as shown in Figure 3), you need to make the pairwise comparisons one at a time. For whichever pairwise comparisons you desire, you place a 1 and a -1 in the shaded area corresponding to the two pairs you want to compare.

      • Jamie says:

        Thanks for the quick reply Charles. I see now how it works though I’m afraid I don’t find it very intuitive. As a friendly suggestion, it would be nicer to have the pairwise table generated automatically as in Fig. 1.

        On a different note, I have noticed that when I close Excel and reopen it, RealStats no longer appears under the Add-In menu, but when I go to the Add-ins options it is still checked. Then I need to uncheck it, hit OK, and then recheck it to reload it… and then it works again. Perhaps this is a bug? I’m using Office 2013 and don’t seem to have this trouble with other add-ins. Cheers.

  24. Kats says:

    Hi Charles,
    Thank you for the excellent program for everybody.
    When I read the outcomes of Scheffé’s test using RealStats.xlam, the p-value was lower than in other programs. The cell for the p-value of Scheffé’s test did not include the coefficient 1/dfB. (Sorry for not using a subscript for the B.) When the F value was multiplied by this coefficient, the p-value obtained was exactly the same as those from other programs. I would be happy if you could have a chance to check.
    Best regards,
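The adjustment described above follows from the Scheffé criterion: a contrast is significant when its F statistic exceeds dfB × F-crit, which is equivalent to computing the p-value from F divided by dfB, where dfB = k − 1. A sketch (scheffe_p is an illustrative name, not a Real Statistics function):

```python
from scipy.stats import f

def scheffe_p(F_contrast, df_between, df_within):
    # Scheffé: significant when F_contrast > df_between * F_crit,
    # so the p-value is based on F_contrast / df_between
    return f.sf(F_contrast / df_between, df_between, df_within)

# Consistency with the critical value: if F_contrast equals
# df_between * F_crit at alpha = .05, the p-value is exactly .05
```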

  25. Sam says:

    Hi, when you say: “Since the difference between the means for women taking the drug and women in the control group is 5.83 – 3.83 = 1.75 and 1.75 is smaller than 1.8046, we conclude that the difference is not significant”

    What does that imply? Are they different? And how do I select the best combination among these six man/woman categories?

    • Charles says:

      Not significant means that you cannot conclude that the population means are different.
      If you are using a post hoc test such as Tukey’s HSD, then you can make any pairwise comparisons of categories that interest you (without increasing the experiment-wise error rate).

  26. ENH says:

    Thanks for these programs! I’ve installed the program for Mac 2016 and I’m doing a one-way ANOVA with Tukey’s HSD. Regardless of the groups’ N values or the individual cell values, the output returns a Q-test with std err, lower, upper, x-crit, and Cohen d all equal to zero. The df and q-crit are reported. Any reason why this is, or what I’m doing wrong?

  27. stephanie says:

    Hi Charles,

    Thanks so much for the amazing resources you have created, between all of the web pages and the RealStats Excel package. I’m following along with this example using the RealStats Excel functions (in Excel 2016 on a Mac), and for some reason, “=QCRIT(4,44,0.05,2)” returns a different value for me than what you show on this page. (3.7767272 vs. 3.7775). Any ideas about this?

    • Charles says:

      Hi Stephanie,
      =QCRIT(4,44,.05,2,FALSE) has value 3.7775, while =QCRIT(4,44,.05,2) has the value you calculated. The first formula uses linear interpolation, while the second uses harmonic interpolation (the default).
      I have now updated the webpage to make this clearer. Thanks for bringing this to my attention. I appreciate your help in improving the website.

  28. Piero says:

    Dear Charles,

    after a one-way unbalanced ANOVA on four groups, I did a post-hoc Scheffé test with experiment-wise alpha = 0.05.
    This is because I also want to compare groups of means, not only make pairwise comparisons.
    But after applying Scheffé to a pair of the above means with the latest Real Statistics pack, I obtain the following result:
    F-stat = 7.23, F-crit = 8.12, p-value = 0.000215, lower = -38.8, upper = 1.12
    and for another pair of means:
    F-stat = 3.72, F-crit = 8.12, p-value = 0.014, lower = -25.9, upper = 4.98

    So, while both comparisons are not significant, because F-stat < F-crit, the p-values look as though they were significant! I cannot understand this; could you please explain it to me?

    Thank you very much
    Best Regards

    • Charles says:

      As stated on the website, you shouldn’t use the Scheffé test for pairwise comparisons.

      • Piero says:


        I understand this. But in my design I have 4 groups (A, B, C, D) of different sizes, and I need all pairwise comparisons (I can use Tukey-Kramer for this), but I should also test:
        A vs (B+C+D)
        This latter comparison can’t be done with Tukey-Kramer, so I was thinking of using Scheffé or maybe Games-Howell; I don’t know which is better for an unbalanced design.

        Do you think it’s fine to present in my study the Tukey-Kramer results for the pairwise comparisons, but use a different test for the latter, more complex contrast?

        Thank you very much for all your suggestions


        • Charles says:

          Games-Howell is useful when the variances of the groups are different. If the sample sizes are different but the homogeneity of variances assumption holds then you can use Tukey-Kramer instead.
          Based on the info you have provided, Tukey-Kramer would probably be the correct choice if you only had pairwise comparisons.
          If you have a lot of non-pairwise comparisons, then Scheffé is probably the best choice. If you have many pairwise comparisons and only one non-pairwise comparison, you might consider using Tukey-Kramer for the pairwise comparisons and Bonferroni/contrasts for the non-pairwise comparison.

  29. Ana says:

    Hello Charles

    All the information that you publish is very useful; it has helped me understand many things.
    However, I want to ask you about the value of “c” in the Games-Howell output table, since for my data that column is empty, and therefore all the cells below that use the value “c” give me an error.

    thank you very much

    • Charles says:

      You need to manually fill in this column by placing a +1 in one of the cells and a -1 in another of the cells. These two cells determine which pairwise comparison you want to make. To make multiple such comparisons, just change which cells get the +1 and -1.
