The chi-square goodness of fit test can be used to test the hypothesis that data come from a normal distribution. In particular, we can use Theorem 2 of Goodness of Fit to test the null hypothesis:
H0: data are sampled from a normal distribution.
Example 1: 90 people were put on a weight gain program. The following frequency table shows the weight gain (in kilograms). Test whether the data is normally distributed with mean 4 kg and standard deviation of 2.5 kg.
Figure 1 – Frequency table and histogram for Example 1
We begin by calculating the probability that x < b for b = 0, 1, …, 8, assuming a normal distribution with mean 4 and standard deviation 2.5. This probability is NORMDIST(b, 4, 2.5, TRUE). The probability that x is in the interval (a, b] is then NORMDIST(b, 4, 2.5, TRUE) – NORMDIST(a, 4, 2.5, TRUE). For the two open-ended intervals, the probability that x ≤ 0 is NORMDIST(0, 4, 2.5, TRUE) and the probability that x > 8 is 1 – NORMDIST(8, 4, 2.5, TRUE). Multiplying each of these probabilities by the sample size of 90 gives the expected frequency for that interval.
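The expected-frequency calculation can also be sketched outside Excel. The following is a minimal Python illustration, assuming the ten intervals (-∞,0], (0,1], …, (7,8], (8,∞) and using the standard library's NormalDist in place of NORMDIST; the function name expected_frequencies is an illustrative choice, not part of the original worksheet.

```python
from statistics import NormalDist


def expected_frequencies(cut_points, mean, sd, n):
    """Expected counts for (-inf, c0], (c0, c1], ..., (ck, inf).

    cut_points: increasing interval boundaries (here 0, 1, ..., 8).
    mean, sd:   parameters of the hypothesized normal distribution.
    n:          sample size.
    """
    dist = NormalDist(mu=mean, sigma=sd)
    # Cumulative probabilities at each boundary, padded with 0 and 1
    # so that the two open-ended intervals are covered as well.
    cdf = [0.0] + [dist.cdf(c) for c in cut_points] + [1.0]
    # P(a < x <= b) = F(b) - F(a), scaled by the sample size.
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Values from Example 1: mean 4, standard deviation 2.5, 90 people.
expected = expected_frequencies(range(0, 9), mean=4, sd=2.5, n=90)
for i, e in enumerate(expected):
    print(f"interval {i + 1}: expected frequency {e:.2f}")
```

Because the boundaries are symmetric about the mean of 4, the first and last expected frequencies are equal, and both fall just below 5.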
Figure 2 – Chi-square test based on known mean and standard deviation
We now perform the chi-square goodness of fit test. Since the observed and expected frequencies of the first and last intervals are less than 5, it is better to combine the first interval with the second and the last interval with the second to last, leaving 8 intervals. The chi-square test statistic is 4.47, which is less than the critical value of CHIINV(.05,7) = 14.07, and so we cannot reject the null hypothesis; i.e. the data are consistent with a normal distribution with mean 4 and standard deviation 2.5. Note that df = number of intervals – 1 = 8 – 1 = 7 since the mean and standard deviation are given.
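The merging of sparse end intervals, the test statistic, and the degrees of freedom can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the counts in the usage lines are made-up toy numbers, not the data of Figure 1.

```python
def merge_end_bins(counts):
    """Combine the first two bins and the last two bins of a frequency list."""
    if len(counts) < 4:
        raise ValueError("need at least 4 bins to merge both ends")
    return [counts[0] + counts[1]] + list(counts[2:-2]) + [counts[-2] + counts[-1]]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_bins, n_estimated_params=0):
    """df = number of bins - 1 - number of parameters estimated from the data."""
    return n_bins - 1 - n_estimated_params


# Toy counts for illustration only (NOT the frequencies from Figure 1).
toy_observed = merge_end_bins([1, 9, 12, 6, 2])
toy_expected = merge_end_bins([3, 7, 10, 6, 4])
stat = chi_square_statistic(toy_observed, toy_expected)
print(round(stat, 3), degrees_of_freedom(len(toy_observed)))
```

With the ten original intervals merged down to eight and no estimated parameters, degrees_of_freedom(8) returns 7, matching the df used above.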
Example 2: In the above example, the population mean and variance were known. This is usually not the case. This time we will simply ask whether the above data comes from a normal population.
We first calculate the sample mean and variance as described in Frequency Tables using the midpoint of each interval, although for the first and last intervals (-∞,0] and (8,∞) we need to guess at acceptable representative values, which we take as -1 (i.e. a weight loss of 1 kg) and 9 respectively.
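The midpoint-based estimates can be sketched in Python as below. This is a minimal illustration that assumes the sample variance uses the n – 1 divisor; the function name is hypothetical, and the observed frequencies from Figure 1 must be supplied by the reader since they are not reproduced here.

```python
def grouped_mean_variance(midpoints, frequencies):
    """Mean and sample variance of grouped data, treating every observation
    in an interval as if it sat at that interval's representative value."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared_dev = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    # Assumption: sample variance with the n - 1 divisor.
    return mean, squared_dev / (n - 1)


# Representative values: -1 for (-inf, 0], the midpoints 0.5, 1.5, ..., 7.5
# for the unit-width intervals, and 9 for the open-ended last interval.
midpoints = [-1.0] + [k + 0.5 for k in range(8)] + [9.0]

# Usage: pass the ten observed frequencies from the frequency table, e.g.
# mean, variance = grouped_mean_variance(midpoints, observed_frequencies)
```

The choice of -1 and 9 is a judgment call; other representative values for the open-ended intervals would shift the estimates slightly.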
Figure 3 – Calculation of mean and standard deviation for Example 1
We next test the null hypothesis that the data is normally distributed using the sample mean and variance (3.74 and 4.84 respectively, as seen in Figure 3) as estimates for the population mean and variance. As in Example 1, we combine the first two and the last two intervals so that all frequencies are at least 5. Once again we use a chi-square goodness of fit test based on 8 intervals, but this time, since the mean and variance are estimated parameters, per Theorem 3 of Goodness of Fit, we use df = 8 – 1 – 2 = 5.
Figure 4 – Chi-square test based on estimated mean and standard deviation
Since χ² = 1.35 < 11.071 = χ²-crit = CHIINV(.05,5), we again retain the null hypothesis that the data is normally distributed.
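The critical values quoted in both examples can be checked without Excel. The sketch below is one possible approach using only the Python standard library: it evaluates the chi-square distribution function with a series for the regularized incomplete gamma function and inverts it by bisection. It is not how CHIINV is implemented; the function names are illustrative assumptions.

```python
import math


def chi2_cdf(x, df):
    """Chi-square distribution function via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a = df / 2.0
    h = x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    # Add terms h^n / (a (a+1) ... (a+n)) until they become negligible.
    while term > 1e-15 * total:
        term *= h / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(h) - h - math.lgamma(a)) * total


def chi2_critical_value(alpha, df):
    """Right-tail critical value: the x with P(X > x) = alpha."""
    target = 1.0 - alpha
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):  # bisection
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical_value(0.05, 7), 3))  # Example 1: df = 7
print(round(chi2_critical_value(0.05, 5), 3))  # Example 2: df = 5
```

Both test statistics reported above (4.47 and 1.35) fall well below the corresponding critical values, which is why the null hypothesis is retained in each case.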