We could have conducted the analysis for Example 1 of Basic Concepts for ANOVA by conducting multiple two sample tests. E.g. to decide whether or not to reject the following null hypothesis
H0: μ1 = μ2 = μ3
We can use the following three separate null hypotheses:
- H0: μ1 = μ2
- H0: μ2 = μ3
- H0: μ1 = μ3
If any of these null hypotheses is rejected then the original null hypothesis is rejected.
Note however that if you set α = .05 for each of the three sub-analyses then the overall alpha value is .14 since 1 – (1 – α)3 = 1 – (1 – .05)3 = 0.142525 (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting the null hypothesis even when it is true (type I error) is 14.2525%.
For k groups, you would need to run m = COMBIN(k, 2) such tests and so the resulting overall alpha would be 1 – (1 – α)m, a value which would get progressively higher as the number of samples increases. For example, if k = 6, then m = 15 and the probability of finding at least one significant t-test, purely by chance, even when the null hypothesis is true is over 50%.
In fact, one of the reasons for performing ANOVA instead of separate t-tests is to reduce the type I error. The only problem is that once you have performed ANOVA if the null hypothesis is rejected you will naturally want to determine which groups have unequal variance, and so you will need to confront this issue in any case.
With 3 separate tests, in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)3 = .05, i.e. α = 1 – (1 – .05)1/3 = 0.016952. As is mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests. If the experiment-wise error rate < .05 then the error rate is called conservative. If it is > .05 then the error rate is called liberal.
There are two types of follow up tests following ANOVA: planned (aka a priori) and unplanned (aka post hoc or posteriori) tests. Planned tests are determined prior to the collection of data, while unplanned tests are made after data is collected. These tests have entirely different type I error rates.
For example, suppose there are 4 groups. If an alpha value of .05 is used for a planned test of the null hypothesis then the type I error rate will be .05. If instead the experimenter collects the data and sees means for the 4 groups of 2, 4, 9 and 7, then the same test will have a type I error rate of more than .05. The reason for this is that once the experimenter sees the data, he will choose to test because μ1 and μ2 are the smallest means and μ3 and μ4 are the largest.