Grubbs’ test can be used to test for the presence of a single outlier in data that is normally distributed (except for the outlier) and has at least 7 elements (preferably more).

Here we test the null hypothesis that the data has no outliers vs. the alternative hypothesis that there is one outlier. The ESD test should be used if there is the possibility of more than one outlier.

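If you suspect that the maximum value in the data set may be an outlier you can use the test statistic

$$G = \frac{x_{\max} - \bar{x}}{s}$$

If you suspect that the minimum value in the data set may be an outlier you can use the test statistic

$$G = \frac{\bar{x} - x_{\min}}{s}$$

where *x̄* is the sample mean and *s* is the sample standard deviation.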

The critical value for the test is

$$G_{crit} = \frac{(n-1)\,t_{crit}}{\sqrt{n\,(n-2+t_{crit}^2)}}$$

where *t _{crit}* is the critical value of the t distribution *T*(*n*−2) and the significance level is *α*/*n*. Thus the null hypothesis is rejected if *G > G _{crit}*.

There is also a two-tailed version of the test where *G* is the larger of the two *G* values described above and *G _{crit}* is defined as above except that the significance level for *t _{crit}* is *α*/(2*n*). Alternatively, *G* can be calculated using the formula

$$G = \frac{\max_i |x_i - \bar{x}|}{s}$$
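The calculation described above can be sketched in Python roughly as follows. This is only an illustrative sketch, not the Real Statistics implementation. The function names (`t_pdf`, `t_upper_tail`, `t_crit`, `grubbs`) are hypothetical, the t critical value is approximated numerically using only the standard library (a library routine such as `scipy.stats.t.ppf` could be used instead), and the sketch assumes that the candidate outlier is whichever extreme value lies farther from the mean.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=4000):
    """Approximate P(T > t) for t >= 0 (Simpson's rule on [0, t] plus symmetry)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_crit(p, df):
    """Approximate t with P(T > t) = p by bisection (assumes 0 < p < 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs(data, alpha=0.05, two_tailed=False):
    """Return (candidate outlier, G, G_crit, G > G_crit)."""
    n = len(data)
    m, s = mean(data), stdev(data)  # sample mean and sample standard deviation
    # Assumption: the candidate is the value farthest from the mean,
    # so G = max |x_i - mean| / s.
    candidate = max(data, key=lambda x: abs(x - m))
    g = abs(candidate - m) / s
    # Significance level for t_crit: alpha/n (one-tailed) or alpha/(2n) (two-tailed).
    p = alpha / (2 * n) if two_tailed else alpha / n
    t = t_crit(p, n - 2)
    g_crit = (n - 1) * t / math.sqrt(n * (n - 2 + t * t))
    return candidate, g, g_crit, g > g_crit


# Data set from Example 1
data = [145, 125, 190, 135, 220, 130, 210, 3, 165, 165, 150]
candidate, g, g_crit, significant = grubbs(data)
print(candidate, round(g, 3), round(g_crit, 3), significant)
```

Passing `two_tailed=True` switches the significance level used for the t critical value from *α*/*n* to *α*/(2*n*); everything else in the sketch stays the same.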

**Example 1**: Determine whether the data set {145, 125, 190, 135, 220, 130, 210, 3, 165, 165, 150} has an outlier.

The data set is shown in range A4:A14 of Figure 1. Using the Shapiro-Wilk test, we see that the data is not normally distributed. However, when we remove the data element 3, which seems to be a potential outlier (as shown in range K4:K14), we find that the data is normally distributed, thus allowing us to use Grubbs’ test.

**Figure 1 – Testing data for normality**

Grubbs’ test is implemented in Figure 2.

**Figure 2 – Grubbs’ Test**

We see that 3 is a little more than 2.5 standard deviations from the mean (cell G6) and that the test is significant (cell G14), meaning that 3 is an outlier (based on *α* = .05).
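As a rough check of the arithmetic for Example 1, the values can be worked out by hand. The numbers below are rounded, and the value of the t critical value is our own approximation rather than a figure read from the worksheet.

$$\bar{x} = \frac{1638}{11} \approx 148.91, \qquad s = \sqrt{\frac{33420.91}{10}} \approx 57.81$$

$$G = \frac{\bar{x} - x_{\min}}{s} = \frac{148.91 - 3}{57.81} \approx 2.52$$

$$t_{crit} \approx 3.31 \quad (df = n - 2 = 9,\ \text{upper tail probability } \alpha/n = .05/11 \approx .0045)$$

$$G_{crit} = \frac{(n-1)\,t_{crit}}{\sqrt{n\,(n-2+t_{crit}^2)}} = \frac{10 \cdot 3.31}{\sqrt{11\,(9 + 3.31^2)}} \approx 2.23$$

Since 2.52 > 2.23, the null hypothesis of no outliers is rejected at *α* = .05.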

**Real Statistics Function**: The Real Statistics Resource Pack provides the following array function to perform a one-tailed Grubbs’ test.

**GRUBBS**(R1, *lab, alpha*): outputs a 4 × 1 column range with the following entries: potential outlier, G, Gcrit, and test significance

If *lab* = TRUE (default FALSE) then the output is a 4 × 2 matrix with a column of labels added. *alpha* = the significance level (default .05). The potential outlier is either the maximum or minimum value in R1, depending on which is farthest away from the mean of R1. The test significance is “yes” if *G > G _{crit}* and “no” otherwise.

For Example 1, if you highlight the range P5:Q8, enter the formula =GRUBBS(A4:A14,TRUE) and press **Ctrl-Shift-Enter**, then the output that appears is displayed in Figure 3.

**Figure 3 – Output from GRUBBS formula**

I am trying to find out if the G critical value table uses the sample size N or the degrees of freedom. Calculating the values uses the DF, but the table could be constructed to reference N as the sample size. I am fairly sure that N is the sample size, but I want to check without bothering to calculate G critical from the formula.

I have also read that the Grubbs test should not be used on sample sizes of less than 6. Grubbs test is not reliable with these small sample sizes and leads to rejecting the Ho at an alpha where it should be accepted. You don’t state this but is that your understanding?

Joe,

There are tables of critical values for Grubbs’ test, but I don’t use them since we can use the critical values of the t distribution instead.

You are correct that the test is not reliable for small samples (as is the case for many other statistical tests). I have now updated the webpage with this fact.

Charles

Hello Mr. Zaiontz

I have a data set of n=365 and need to find out which values are outliers. Does Grubbs’ test consider the max and min values only, or can it examine all of the values one by one?

Once the outlier(s) is (are) found, should it (they) be removed from the data set before starting all over?

Pedro,

Grubbs’ test only finds one outlier (which of course will be the max or min, although it needs to consider all the values in determining whether one of these is an outlier). You can use the ESD extension to Grubbs’ test to identify more than one potential outlier.

Whether or not you remove an outlier from the data set depends on the type of test (e.g. various nonparametric tests are robust to outliers) and the reason that the data element is an outlier (e.g. if it is a measurement error or typo then it can be removed). Often you will need to run your test twice, once with the outlier included and once without, with both results reported.

Charles

You said “data that is normally distributed (except for the outlier).”

How can we know that our data is normally distributed if we don’t know the outliers yet? You said except for outliers.

Or is it an assumption that we make and then re-verify at the end?

Akrour,

Yes, you can verify this assumption after you identify the outlier.

Charles

Hi Charles,

when I click on cell G12 of Grubbs’ Test, t-crit = T.INV(2*G10;G11) so cell I12 is a typo.

But if 2*G10 = 2*G8/G9 = 2*alpha/size is correct then I didn’t understand where you write …G-crit is defined as above except that the significance level for t-crit is α/(2n). Is this another typo?

Thank you in advance for your reply and for your helpful website.

Silva,

TINV(2*G10;G11) = T.INV(1-G10;G11). Note that the expression on the left is TINV and not T.INV.

Charles

Hi Charles,

I realized what the problem is. I have Excel 2010 localized in Italian so I have to pay attention to the functions names.

English localization

TINV = two-tailed inverse

T.INV = left-tailed inverse

Italian localization

INVT = left-tailed inverse

INV.T = two-tailed inverse

To me it seemed more logical that T.INV = INV.T, but that’s wrong.

Silva,

I understand the problem very well since I too have the Italian version of Excel on one of my computers.

Charles

Interesting, I will apply it in my presentation.

Thanks for the information on Grubbs, and in addition I learned to evaluate data for their normal distribution characteristics.

This is most kind of you.

Willie