We could have conducted the analysis for Example 1 of Basic Concepts for ANOVA by running multiple two-sample t-tests. For example, to decide whether or not to reject the null hypothesis

H_{0}: *μ*_{1} = *μ*_{2} = *μ*_{3}

we can test the following three separate null hypotheses:

- H_{0}: *μ*_{1} = *μ*_{2}
- H_{0}: *μ*_{2} = *μ*_{3}
- H_{0}: *μ*_{1} = *μ*_{3}

If any of these null hypotheses is rejected then the original null hypothesis is rejected.

Note, however, that if you set *α* = .05 for each of the three sub-analyses then, treating the three tests as independent, the overall alpha value is about .14, since 1 – (1 – *α*)^{3} = 1 – (1 – .05)^{3} = 0.142625 (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting the null hypothesis even when it is true (type I error) is 14.2625%.
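The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `familywise_error_rate` is made up for illustration, and the formula assumes the individual tests are independent.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one type I error across m independent tests,
    each run at significance level alpha (independence is an assumption)."""
    return 1 - (1 - alpha) ** m


# Three pairwise t-tests, each at alpha = .05
overall = familywise_error_rate(0.05, 3)
print(round(overall, 6))  # 0.142625
```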

For *k* groups, you would need to run *m* = COMBIN(*k*, 2) such tests and so the resulting overall alpha would be 1 – (1 – *α*)^{*m*}, a value which would get progressively higher as the number of samples increases. For example, if *k* = 6, then *m* = 15 and the probability of finding at least one significant t-test, purely by chance, even when the null hypothesis is true is over 50%.
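To see how quickly the overall alpha grows with the number of groups, the following sketch tabulates *m* and 1 – (1 – *α*)^{*m*} for a few values of *k*. Here `math.comb` plays the role of Excel's COMBIN; the helper names are illustrative, and independence of the tests is again assumed.

```python
from math import comb


def pairwise_test_count(k: int) -> int:
    """Number of distinct pairs among k groups (Excel's COMBIN(k, 2))."""
    return comb(k, 2)


def overall_alpha(alpha: float, k: int) -> float:
    """Overall alpha when every pair of k groups is tested at level alpha,
    assuming the tests are independent."""
    return 1 - (1 - alpha) ** pairwise_test_count(k)


# Show how the overall alpha climbs as groups are added
for k in range(3, 8):
    print(k, pairwise_test_count(k), round(overall_alpha(0.05, k), 4))
```

The row for *k* = 6 shows *m* = 15 and an overall alpha a little above .5.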

In fact, one of the reasons for performing ANOVA instead of separate t-tests is to reduce the type I error. The only problem is that once you have performed ANOVA, if the null hypothesis is rejected, you will naturally want to determine which groups have unequal means, and so you will need to confront this issue in any case.

With 3 separate tests, in order to achieve a combined type I error rate (called an **experiment-wise error rate** or **family-wise error rate**) of .05 you would need to set each alpha to a value such that 1 – (1 – *α*)^{3} = .05, i.e. *α* = 1 – (1 – .05)^{1/3} = 0.016952. As is mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests. If the experiment-wise error rate < .05 then the error rate is called **conservative**. If it is > .05 then the error rate is called **liberal**.
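The per-test alpha can be computed by inverting the formula, as in the sketch below. The function name `per_test_alpha` is illustrative, and the tests are assumed to be independent.

```python
def per_test_alpha(familywise_alpha: float, m: int) -> float:
    """Alpha to use on each of m independent tests so that the
    combined type I error rate equals familywise_alpha."""
    return 1 - (1 - familywise_alpha) ** (1 / m)


alpha_each = per_test_alpha(0.05, 3)
print(round(alpha_each, 6))  # 0.016952

# Round trip: three tests at this alpha combine back to about .05
print(round(1 - (1 - alpha_each) ** 3, 6))  # 0.05
```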

There are two types of follow-up tests after ANOVA: **planned** (aka **a priori**) and **unplanned** (aka **post hoc** or **a posteriori**) tests. Planned tests are determined prior to the collection of data, while unplanned tests are chosen after the data is collected. These tests have entirely different type I error rates.

For example, suppose there are 4 groups. If an alpha value of .05 is used for a planned test of the null hypothesis (*μ*_{1} + *μ*_{2})/2 = (*μ*_{3} + *μ*_{4})/2 then the type I error rate will be .05. If instead the experimenter collects the data and sees means for the 4 groups of 2, 4, 9 and 7, then the same test will have a type I error rate of more than .05. The reason for this is that once the experimenter sees the data, he will choose to test (*μ*_{1} + *μ*_{2})/2 = (*μ*_{3} + *μ*_{4})/2 because *μ*_{1} and *μ*_{2} are the smallest means and *μ*_{3} and *μ*_{4} are the largest.
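A small simulation can illustrate the difference between the two situations. This is a sketch under simplifying assumptions that are not part of the example above: all four groups are drawn from a standard normal population (so the null hypothesis is true), the standard deviation is treated as known, and a z-test is used in place of a t-test. The function name and parameters are illustrative.

```python
import random
from statistics import NormalDist


def simulate_error_rates(trials=20000, n=10, alpha=0.05, seed=1):
    """Return (planned_rate, post_hoc_rate) when all four true means are equal."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = (1 / n) ** 0.5  # std. error of the contrast with sigma = 1 (assumed known)
    planned_hits = post_hoc_hits = 0
    for _ in range(trials):
        means = [sum(rng.gauss(0, 1) for _ in range(n)) / n for _ in range(4)]
        # Planned: groups 1 and 2 versus groups 3 and 4, fixed before seeing data
        planned = (means[0] + means[1]) / 2 - (means[2] + means[3]) / 2
        # Post hoc: the two smallest sample means versus the two largest
        low1, low2, high1, high2 = sorted(means)
        post_hoc = (high1 + high2) / 2 - (low1 + low2) / 2
        planned_hits += abs(planned) / se > z_crit
        post_hoc_hits += abs(post_hoc) / se > z_crit
    return planned_hits / trials, post_hoc_hits / trials


planned_rate, post_hoc_rate = simulate_error_rates()
print(f"planned: {planned_rate:.3f}, post hoc: {post_hoc_rate:.3f}")
```

Under these assumptions the planned rate should come out close to .05, while the post hoc rate should be close to 1 – (1 – .05)^{3} ≈ .14. Picking the grouping after seeing the data amounts to running all three possible two-versus-two comparisons and reporting the most extreme one.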
Sir:

If I run 10 t-tests with alpha set at .05 and there are no significant results (all p’s above .5), then should I even be concerned with experiment-wise error rate?

Victor,

I believe that in this case you don’t need to be concerned about experiment-wise error (assuming I haven’t made a silly logic mistake).

Charles

Hi Charles,

I was wondering whether you could answer a few of my (relatively simple) questions:

1.) How do post hocs influence the statistical decision for each pairwise comparison?

2.) If one was to use multiple t-tests, what would the experiment wise error be?

3.) What is the formula linking appropriate experiment wise error rate that is associated with each comparison? Is it: desired experiment wise error rate / number of pairwise comparisons?

Any help is much appreciated!

Jack,

1. Don’t understand the question

2. 1-(1-alpha)^k, where k is the number of t-tests

3. The error for each comparison is still alpha

Charles

If you use a post hoc test and the result is significant, you can say that the specific pair of groups you compared differ from each other.

Dear Dr. Charles,

I would appreciate to have your opinion about this problem.

I have to statistically compare two foot pressure distribution maps, corresponding to two different clinical conditions, named A and B for instance.

Each pressure map is composed of, let’s say, 100 sensor cells. The maps are the result of averaging, so for each cell I have a mean pressure value and the related s.d.

Then, what I need to do is to perform a comparison (making 100 t-tests, one for each corresponding cell) between the pressure value in condition A (mean and s.d.) and the pressure value in condition B (mean and s.d.).

My concern is: what is the correct significance level I have to use for each t-test? Can I set p=0.05 for each test, or should I apply some correction (e.g. Bonferroni) to take into account that I’m performing many comparisons?

In effect, I am not interested to know if the whole foot in condition A is different from the whole foot in condition B, because in such a case I can understand that the Bonferroni correction on p-values would be mandatory, in order to keep a 5% experiment-wise type I error.

Instead, the aim of my study is to investigate whether there are statistically significant differences at the level of single cells, and this makes me unsure about the right significance level to apply to each t-test.

Thank you very much for your help

Piero

Piero,

Since you plan to conduct 100 tests, generally you should correct for experiment-wise type I error. This will impact the statistical power.

Charles
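To give a sense of scale for 100 tests, the sketch below compares the uncorrected experiment-wise error rate with the per-test alpha under the Bonferroni and Dunn/Sidák corrections. The variable names are illustrative. The uncorrected rate and the Dunn/Sidák value assume independent tests, which may not hold for neighboring sensor cells; the Bonferroni bound does not require independence.

```python
m = 100              # number of t-tests (one per sensor cell)
target_alpha = 0.05  # desired experiment-wise type I error rate

# Experiment-wise error rate if each test is run at .05 with no correction
uncorrected_rate = 1 - (1 - target_alpha) ** m

# Per-test alpha under each correction
bonferroni_alpha = target_alpha / m
sidak_alpha = 1 - (1 - target_alpha) ** (1 / m)

print(f"uncorrected experiment-wise rate: {uncorrected_rate:.4f}")
print(f"Bonferroni per-test alpha:        {bonferroni_alpha:.6f}")
print(f"Dunn/Sidak per-test alpha:        {sidak_alpha:.6f}")
```

The two corrected values are nearly identical, with Bonferroni slightly smaller and therefore slightly more conservative.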

You’re going to want to use Tukey’s test if you are looking at all possible pairwise comparisons. If you only want to look at a few, then use Bonferroni. Or, if you have a control group and want to compare every other treatment to the control, use the Dunnett correction.

Hi Charles,

I am having a bit of trouble getting to grips with this and I was wondering if you could answer this question:

If you fix the experimentwise error rate at 0.05, what effect does this have on the error rate of each comparison, and how does this influence the statistical decision about each comparison?

Would it be that if you fixed it at 0.05 then the error rate for each comparison would be smaller, using the formula 1 – (1 – .05)^{1/3}? Or have I got this completely wrong?

Any help on this would be much appreciated!

You have got this right. If you fix the experimentwise error rate at 0.05, then this nets out to an alpha value of 1 – (1 – .05)^{1/3} = .016952 on each of the three tests to be conducted.

Charles

Sir,

Thanks for this site and package of yours; I’m learning a lot!

With regards to this particular page about experiment wise error rate, you said just in the last paragraph that:

“…in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)^{3} = .05, i.e. α = 1 – (1 – .05)^{1/3} = 0.016952”

Does it mean that the computed alpha (that is, 0.016952 for m=3 tests among k=4 samples) should be the one used in the pairwise test (m=3) to reduce the overall type I error among your 4 tests. If so, sir, what do you, statisticians, technically call this adjusted alpha?

I’d be very glad to have your response.

And I was also answered by your other page, in your discussion about the kruskal-wallis test. You said:

“If the Kruskal-Wallis Test shows a significant difference between the groups, then pairwise comparisons can be used by employing the Mann-Whitney U Tests. As described in Experiment-wise Error Rate and Planned Comparisons for ANOVA, it is important to reduce experiment-wise Type I error by using a Bonferroni (alpha=0.05/m) or Dunn/Sidák correction (alpha=1-(1-0.05)^(1/3)).”

This only means your page is very efficient, my sincerest appreciation, sir.

Larry,

Glad to see that you are learning a lot from the website. That’s great.

The alpha value of 1 – (1 – .05)^{1/m} depends on m, which is equal to the number of follow-up tests you make. This is the alpha value you should use when you use contrasts (whether pairwise or not). Actually m = the number of orthogonal tests, and so if you restrict yourself to orthogonal tests then the maximum value of m is k – 1 (see Planned Follow-up Tests).

I have always called the “adjusted alpha” simply “alpha”. If there is a technical term for this, I am unaware of it.

Charles

Sir

There is something wrong with the pictures, I cannot see the formula

Colin,

There are no pictures on the referenced webpage. In general, I display “pictures” as images, but some “formulas” are displayed as images while others are displayed using latex. If you can’t see the pictures on some other webpage, let me know what you can’t see so that I can determine whether there are problems with images or latex.

Charles

Colin,

I forgot to mention that some formulas are also displayed as simple text. I also checked the specific webpage that you referenced and see that the first formulas are simple text, while the ones at the end of the document use latex as follows. What browser are you using? You should be able to see the latex formulas, but perhaps this is the problem you are having.

For example, suppose there are 4 groups. If an alpha value of .05 is used for a planned test of the null hypothesis \frac{\mu_1 + \mu_2}{2} = \frac{\mu_3 + \mu_4}{2} then the type I error rate will be .05. If instead the experimenter collects the data and sees means for the 4 groups of 2, 4, 9 and 7, then the same test will have a type I error rate of more than .05. The reason for this is that once the experimenter sees the data, he will choose to test \frac{\mu_1 + \mu_2}{2} = \frac{\mu_3 + \mu_4}{2} because μ1 and μ2 are the smallest means and μ3 and μ4 are the largest.

Charles