Grubbs’ test can be used to test the presence of one outlier and can be used with data that is normally distributed (except for the outlier) and has at least 7 elements (preferably more).
Here we test the null hypothesis that the data has no outliers vs. the alternative hypothesis that there is one outlier. The ESD test should be used if there is the possibility of more than one outlier.
If you suspect that the maximum value in the data set may be an outlier you can use the test statistic
If you suspect that the minimum value in the data set may be an outlier you can use the test statistic
The critical value for the test is
where tcrit is the critical value of the t distribution T(n−2) and the significance level is α/n. Thus the null hypothesis is rejected if G > Gcrit.
There is also a two-tailed version of the test where G is the larger of the two G values described above and Gcrit is defined as above except that the significance level for tcrit is α/(2n). Alternatively, G can be calculated using the formula
The data set is shown in range A4:A14 of Figure 1. Using the Shapiro-Wilk test, we see that the data is not normally distributed. However, when we remove the data element 3, which seems to be a potential outlier (as shown in range K4:K14), we find that the data is normally distributed, thus allowing us to use Grubbs’ test
Figure 1 – Testing data for normality
Grubbs’ test is implemented in Figure 2.
Figure 2 – Grubbs’ Test
We see that 3 is a little more than 2.5 standard deviations from the mean (cell G6) and that the test is significant (cell G14), meaning that 3 is an outlier (based on α = .05).
Real Statistics Function: The Real Statistics Resource Pack provides the following array function to perform a one-tailed Grubbs’ test.
GRUBBS(R1, lab, alpha): outputs a 4 × 1 column range with the following entries: potential outlier, G, Gcrit, and test significance
If lab = TRUE (default FALSE) then the output is a 4 × 2 matrix with a column of labels added. alpha = the significance level (default .05). The potential outlier is either the maximum or minimum value in R1, depending on which is farthest away from the mean of R1. The test significance if “yes” if G > Gcrit and “no” otherwise.
For Example 1, if you highlight the range P5:Q8, enter the formula =GRUBBS(A4:A14,TRUE) and press Ctrl-Shft-Enter, then the output that appears is displayed in Figure 3.
Figure 3 – Output from GRUBBS formula