Generally, to understand some characteristic of a population, we take a random sample and study the corresponding property of the sample. We then determine whether any conclusions we reach about the sample are representative of the population.
This is done by choosing an estimator function for the characteristic (of the population) we want to study and then applying this function to the sample to obtain an estimate. By using the appropriate statistical test we then determine whether this estimate is based solely on chance.
The hypothesis that the estimate is based solely on chance is called the null hypothesis. Thus, the null hypothesis is true if the observed data (in the sample) do not differ from what would be expected on the basis of chance alone. The complement of the null hypothesis is called the alternative hypothesis.
The null hypothesis is typically abbreviated as H0 and the alternative hypothesis as H1. Since the two are complementary (i.e. H0 is true if and only if H1 is false), it is sufficient to define the null hypothesis.
Since our sample usually only contains a subset of the data in the population, we cannot be absolutely certain as to whether the null hypothesis is true or not. We can merely gather information (via statistical tests) to determine whether it is likely or not. We therefore speak about rejecting or not rejecting (aka retaining) the null hypothesis on the basis of some test, but not of accepting the null hypothesis or the alternative hypothesis. Often in an experiment we are actually testing the validity of the alternative hypothesis by testing whether to reject the null hypothesis.
When performing such tests, there is some chance that we will reach the wrong conclusion. There are two types of errors:
- Type I – H0 is rejected even though it is true (false positive)
- Type II – H0 is not rejected even though it is false (false negative)
The acceptable level of a Type I error is designated by alpha (α), while the acceptable level of a Type II error is designated beta (β).
We use the following terminology:
Significance level is the acceptable level of type I error, denoted α. Typically, a significance level of α = .05 is used (although sometimes other levels such as α = .01 may be employed). This means that we are willing to tolerate a type I error rate of up to 5%, i.e. in cases where the null hypothesis is true, we accept that about 1 out of every 20 samples will lead us to reject it anyway.
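P-value (the probability value) is the probability p, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as extreme as the one observed in the sample. If p < α then we reject the null hypothesis.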
Critical region is the part of the sample space that corresponds to the rejection of the null hypothesis, i.e. the set of possible values of the test statistic which are better explained by the alternative hypothesis. The significance level is the probability that the test statistic will fall within the critical region when the null hypothesis is assumed.
The typical approach for testing a null hypothesis is to select a statistic based on a sample of fixed size, calculate the value of the statistic for the sample and then reject the null hypothesis if and only if the statistic falls in the critical region.
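The link between the significance level and the critical region can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the data are drawn from a standard normal population (so the null hypothesis µ = 0 is true by construction), the population standard deviation is treated as known, and the null hypothesis is rejected whenever the z statistic exceeds an upper critical value. The function name, sample size, number of trials and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist


def simulate_type_i_rate(alpha=0.05, n=30, trials=20000, seed=1):
    """Estimate how often a true null hypothesis (mu = 0) is rejected."""
    rng = random.Random(seed)
    # Critical value: the statistic exceeds it with probability alpha under H0
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really holds (assumed N(0, 1))
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z = (sample mean - 0) / (sigma / sqrt(n)) with sigma = 1
        z = (sum(sample) / n) * n ** 0.5
        if z > z_crit:  # statistic falls in the critical region
            rejections += 1
    return rejections / trials


rate = simulate_type_i_rate()
print(f"Fraction of true null hypotheses rejected: {rate:.3f}")
```

The printed fraction should land close to α, since every rejection in this simulation is a type I error.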
One-tailed hypothesis testing specifies a direction of the statistical test. For example, to test whether cloud seeding increases the average annual rainfall in an area which usually has an average annual rainfall of 20 cm, we define the null and alternative hypotheses as follows, where µ represents the average rainfall after cloud seeding.
H0: µ ≤ 20 (i.e. average rainfall does not increase after cloud seeding)
H1: µ > 20 (i.e. average rainfall increases after cloud seeding)
Here the experimenters are quite sure that the cloud seeding will not significantly reduce rainfall, and so a one-tailed test is used where the critical region is as in the shaded area in Figure 1. The null hypothesis is rejected only if the test statistic falls in the critical region, i.e. the test statistic has a value larger than the critical value.
Figure 1 – Critical region is the right tail
The critical region here is the right (or upper) tail. It is quite possible to have one-sided tests where the critical region is the left (or lower) tail. For example, suppose the cloud seeding is expected to decrease rainfall. Then the null and alternative hypotheses could be as follows:
H0: µ ≥ 20 (i.e. average rainfall does not decrease after cloud seeding)
H1: µ < 20 (i.e. average rainfall decreases after cloud seeding)
Figure 2 – Critical region is the left tail
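The two one-tailed cases above can be sketched in code. This is a minimal illustration, assuming a z-test with a known population standard deviation, which the text does not specify. The function name `one_tailed_z_test` and the numbers (25 seeded years, a sample mean of 21.5 cm and a standard deviation of 4 cm) are hypothetical and are not data from the cloud seeding example.

```python
from statistics import NormalDist


def one_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="right"):
    """Return (z, critical value, reject H0?) for a one-tailed z-test.

    Assumes the population standard deviation sigma is known.
    """
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()
    if tail == "right":
        # H0: mu <= mu0, H1: mu > mu0; critical region is the upper tail
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    elif tail == "left":
        # H0: mu >= mu0, H1: mu < mu0; critical region is the lower tail
        z_crit = std_normal.inv_cdf(alpha)
        reject = z < z_crit
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return z, z_crit, reject


# Hypothetical figures, for illustration only
z, z_crit, reject = one_tailed_z_test(21.5, 20, 4, 25, tail="right")
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")

z_left, z_crit_left, reject_left = one_tailed_z_test(21.5, 20, 4, 25, tail="left")
print(f"z = {z_left:.3f}, critical value = {z_crit_left:.3f}, reject H0: {reject_left}")
```

With the same hypothetical sample, the right-tailed test rejects the null hypothesis while the left-tailed test does not, because the statistic lies in the upper tail only.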
Two-tailed hypothesis testing doesn’t specify a direction of the test. For the cloud seeding example, it is more common to use a two-tailed test. Here the null and alternative hypotheses are as follows.
H0: µ = 20
H1: µ ≠ 20
The reason for using a two-tailed test is that even though the experimenters expect cloud seeding to increase rainfall, it is possible that the reverse occurs and, in fact, a significant decrease in rainfall results. To take care of this possibility, a two-tailed test is used with the critical region consisting of both the upper and lower tails.
Figure 3 – Two-tailed hypothesis testing
In this case we reject the null hypothesis if the test statistic falls in either tail of the critical region. To achieve a significance level of α, the critical region in each tail must have size α/2.
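The α/2 split can be made concrete by computing the two critical values for a test statistic that follows the standard normal distribution under the null hypothesis. That distributional choice is an assumption made for this sketch, and the function name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_critical_values(alpha=0.05):
    """Return (lower, upper) critical values with alpha/2 in each tail."""
    std_normal = NormalDist()
    lower = std_normal.inv_cdf(alpha / 2)       # cuts off the lower tail
    upper = std_normal.inv_cdf(1 - alpha / 2)   # cuts off the upper tail
    return lower, upper


lower, upper = two_tailed_critical_values(0.05)

# Total probability of the critical region when H0 is true
std_normal = NormalDist()
tail_mass = std_normal.cdf(lower) + (1 - std_normal.cdf(upper))

print(f"Reject H0 if z < {lower:.3f} or z > {upper:.3f}")
print(f"Probability of the critical region under H0: {tail_mass:.3f}")
```

For α = .05 the critical values are approximately ±1.96, which are further from zero than the one-tailed critical value of about 1.645 at the same significance level.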
Statistical power is 1 – β. Thus power is the probability that you find an effect when one exists, i.e. the probability of correctly rejecting a false null hypothesis. While a significance level for type I error of α = .05 is typically used, generally the target for β is .20 or .10, and so .80 or .90 is used as the target value for power.
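As a rough illustration of how power depends on sample size, the sketch below computes the power of a right-tailed z-test and the smallest sample size that reaches a target power. It assumes a normally distributed statistic with known standard deviation; the function names and the numbers (a true increase of 1.5 cm, a standard deviation of 4 cm) are hypothetical.

```python
import math
from statistics import NormalDist


def power_right_tailed_z(delta, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test when the true mean exceeds mu0 by delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z statistic is shifted by delta * sqrt(n) / sigma
    shift = delta * n ** 0.5 / sigma
    return 1 - std_normal.cdf(z_crit - shift)


def sample_size_for_power(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n for which the right-tailed z-test reaches the target power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(power)  # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical effect of 1.5 cm with sigma = 4 cm
power_25 = power_right_tailed_z(delta=1.5, sigma=4, n=25)
n_needed = sample_size_for_power(delta=1.5, sigma=4, power=0.80)
print(f"Power with n = 25: {power_25:.2f}")
print(f"Sample size needed for power .80: {n_needed}")
```

Under these assumptions, 25 observations fall short of the usual .80 target, which is one reason the sample size is normally fixed before data collection.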
The general procedure for null hypothesis testing is as follows:
- State the null and alternative hypotheses
- Specify α and the sample size
- Select an appropriate statistical test
- Collect data (note that the previous steps should be done prior to collecting data)
- Compute the test statistic based on the sample data
- Determine the p-value associated with the statistic
- Decide whether to reject the null hypothesis by comparing the p-value to α (i.e. reject the null hypothesis if p < α)
- Report your results, including effect sizes (as described in Effect Size)
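The procedure above can be sketched end to end for the two-tailed cloud seeding hypotheses H0: µ = 20 versus H1: µ ≠ 20. This is only an illustration: the rainfall values are invented, the population standard deviation of 2 cm is assumed to be known so that a z-test applies, and the effect size shown is a simple standardized mean difference rather than any measure prescribed by the text.

```python
from statistics import NormalDist

# Steps 1-3: hypotheses, alpha, sample size and test are fixed before collecting data
MU0 = 20.0     # H0: mu = 20, H1: mu != 20
ALPHA = 0.05
SIGMA = 2.0    # assumed known population standard deviation (hypothetical)

# Step 4: collect data (hypothetical annual rainfall in cm after seeding)
rainfall = [22.1, 19.4, 23.0, 21.2, 20.5, 24.3, 18.9, 22.7, 21.8, 20.1]
n = len(rainfall)
sample_mean = sum(rainfall) / n

# Step 5: compute the test statistic
z = (sample_mean - MU0) / (SIGMA / n ** 0.5)

# Step 6: two-tailed p-value, the probability under H0 of a statistic
# at least as extreme as the one observed, in either direction
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 7: decision
reject = p_value < ALPHA

# Step 8: report, including a standardized effect size
effect_size = (sample_mean - MU0) / SIGMA
print(f"mean = {sample_mean:.2f}, z = {z:.3f}, p = {p_value:.4f}")
print(f"reject H0: {reject}, effect size = {effect_size:.2f}")
```

When the population standard deviation is not known, a t-test would normally replace the z-test; the structure of the steps stays the same.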
Observation: Suppose we perform a statistical test of the null hypothesis with α = .05 and obtain a p-value of p = .04, thereby rejecting the null hypothesis. This does not mean that there is a 4% probability of the null hypothesis being true given the data, i.e. P(H0|D) = .04. What we have shown instead is that, assuming the null hypothesis is true, the conditional probability of obtaining a test statistic at least as extreme as the one observed is .04; i.e. P(D|H0) = .04, where D = the event that the sample data exhibit a test statistic at least as extreme as the one observed.