Grubbs’ Test

Grubbs’ test can be used to test for the presence of a single outlier. It applies to data that is normally distributed (except for the outlier) and has at least 7 elements (preferably more).

Here we test the null hypothesis that the data has no outliers vs. the alternative hypothesis that there is one outlier. The ESD test should be used if there is the possibility of more than one outlier.

If you suspect that the maximum value in the data set may be an outlier you can use the test statistic

$$G = \frac{x_{\max} - \bar{x}}{s}$$

If you suspect that the minimum value in the data set may be an outlier you can use the test statistic

$$G = \frac{\bar{x} - x_{\min}}{s}$$
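As an illustration, the two statistics can be computed with a few lines of Python. This is a minimal sketch that assumes G is the distance of the extreme value from the sample mean, measured in sample standard deviations (n − 1 denominator). The function name `grubbs_statistics` is made up for this example, and the data is the set used in Example 1.

```python
import statistics

def grubbs_statistics(data):
    """Return (G_max, G_min): distance of the max and min from the mean, in sample SDs."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    g_max = (max(data) - mean) / sd
    g_min = (mean - min(data)) / sd
    return g_max, g_min

# Data set from Example 1
data = [145, 125, 190, 135, 220, 130, 210, 3, 165, 165, 150]
g_max, g_min = grubbs_statistics(data)
print(f"G for the maximum: {g_max:.3f}")
print(f"G for the minimum: {g_min:.3f}")
```

For this data the statistic for the minimum (the value 3) comes out at about 2.52, while the statistic for the maximum is much smaller, so the minimum is the natural candidate to test.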

The critical value for the test is

$$G_{crit} = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t_{crit}^2}{n - 2 + t_{crit}^2}}$$

where n is the sample size and tcrit is the upper (one-tailed) critical value of the t distribution T(n−2) at significance level α/n. The null hypothesis is rejected if G > Gcrit.
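The critical value can also be computed outside Excel. The sketch below assumes the standard form Gcrit = (n−1)/√n · √(tcrit² / (n − 2 + tcrit²)) and uses only the Python standard library. Since the standard library has no inverse t function, tcrit is found numerically (Simpson's rule for the CDF plus bisection). In practice a statistics library's inverse t function would normally be used instead. The names `t_cdf`, `t_upper_crit` and `grubbs_crit` are hypothetical.

```python
import math

def t_cdf(t, df, steps_per_unit=200):
    """Student t CDF for t >= 0, by Simpson's rule applied to the density."""
    n = 2 * max(50, int(t * steps_per_unit))  # even number of intervals
    h = t / n
    # Normalizing constant of the t density, computed via log-gamma to avoid overflow
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    total = pdf(0.0) + pdf(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_upper_crit(p, df):
    """Value t such that P(T > t) = p, for 0 < p < 0.5, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > p:
            lo = mid  # upper-tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def grubbs_crit(n, alpha=0.05):
    """One-tailed Grubbs critical value for sample size n."""
    t_crit = t_upper_crit(alpha / n, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))

g_crit = grubbs_crit(11)
print(f"Gcrit for n = 11, alpha = .05: {g_crit:.3f}")
```

For n = 11 and α = .05 this gives a critical value of roughly 2.23.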

There is also a two-tailed version of the test where G is the larger of the two G values described above and Gcrit is defined as above except that the significance level for tcrit is α/(2n). Alternatively, G can be calculated using the formula

$$G = \frac{\max_i |x_i - \bar{x}|}{s}$$

Example 1: Determine whether the data set {145, 125, 190, 135, 220, 130, 210, 3, 165, 165, 150} has an outlier.

The data set is shown in range A4:A14 of Figure 1. Using the Shapiro-Wilk test, we see that the data is not normally distributed. However, when we remove the data element 3, which seems to be a potential outlier (as shown in range K4:K14), we find that the data is normally distributed, thus allowing us to use Grubbs’ test.

Figure 1 – Testing data for normality 

Grubbs’ test is implemented in Figure 2.

Figure 2 – Grubbs’ Test

We see that 3 is a little more than 2.5 standard deviations from the mean (cell G6) and that the test is significant (cell G14), meaning that 3 is an outlier (based on α = .05).

Real Statistics Function: The Real Statistics Resource Pack provides the following array function to perform a one-tailed Grubbs’ test.

GRUBBS(R1, lab, alpha): outputs a 4 × 1 column range with the following entries: potential outlier, G, Gcrit, and test significance

If lab = TRUE (default FALSE), then the output is a 4 × 2 matrix with a column of labels added. alpha = the significance level (default .05). The potential outlier is either the maximum or minimum value in R1, depending on which is farther from the mean of R1. The test significance is “yes” if G > Gcrit and “no” otherwise.
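The selection rule described above can be mimicked in Python. This is only an illustrative sketch, not the Real Statistics implementation: the function name `grubbs_summary` and the dictionary labels are invented for the example, and the critical value is passed in by the caller (here approximately 2.234, the one-tailed value for n = 11 and α = .05 from the critical value formula).

```python
import statistics

def grubbs_summary(data, g_crit):
    """Return the four entries described above for a caller-supplied critical value."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    # The candidate is whichever extreme value lies farther from the mean.
    candidate = max((min(data), max(data)), key=lambda x: abs(x - mean))
    g = abs(candidate - mean) / sd
    return {
        "outlier": candidate,
        "G": g,
        "G-crit": g_crit,
        "sig": "yes" if g > g_crit else "no",
    }

# Data set from Example 1; g_crit is an approximate, assumed input
data = [145, 125, 190, 135, 220, 130, 210, 3, 165, 165, 150]
result = grubbs_summary(data, g_crit=2.234)
for label, value in result.items():
    print(label, value)
```

With the Example 1 data the candidate is 3, and since G (about 2.52) exceeds the supplied critical value, the test is reported as significant.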

For Example 1, if you highlight the range P5:Q8, enter the formula =GRUBBS(A4:A14,TRUE) and press Ctrl-Shift-Enter, then the output that appears is displayed in Figure 3.

Figure 3 – Output from GRUBBS formula

12 Responses to Grubbs’ Test

  1. Joe Young says:

    I am trying to find out if the G critical value table uses the sample size N or the degrees of freedom. Calculating the values uses the DF, but the table could be constructed to reference N as the sample size. I am fairly sure that N is the sample size, but I want to check without bothering to calculate the G critical value from the formula.
    I have also read that Grubbs’ test should not be used on sample sizes of less than 6. Grubbs’ test is not reliable with these small sample sizes and leads to rejecting Ho at an alpha where it should be accepted. You don’t state this, but is that your understanding?

    • Charles says:

      Joe,
      There are tables of critical values for Grubbs’ test, but I don’t use them since we can use the critical values of the t distribution instead.
      You are correct that the test is not reliable for small samples (as is the case for many other statistical tests). I have now updated the webpage with this fact.
      Charles

  2. Pedro Perez says:

    Hello Mr. Zaiontz
    I have a data set of n=365 and need to find out which values are outliers. Does Grubbs’ test consider the max and min values only, or can it examine all of the values one by one?

    Once the outliers are found, should they be removed from the data set before starting all over?

    • Charles says:

      Pedro,
      Grubbs’ test only finds one outlier (which of course will be the max or min, although it needs to consider all the values in determining whether one of these is an outlier). You can use the ESD extension to Grubbs’ test to identify more than one potential outlier.
      Whether or not you remove an outlier from the data set depends on the type of test (e.g. various nonparametric tests are robust to outliers) and the reason that the data element is an outlier (e.g. if it is a measurement error or typo then it can be removed). Often you will need to run your test twice, once with the outlier included and once without, with both results reported.
      Charles

  3. AKROUR Rabah says:

    You said “data that is normally distributed (except for the outlier).”
    How can we know that our data is normally distributed if we don’t know the outliers yet, given that you said “except for the outlier”?
    Or is it an assumption that we make and then re-verify at the end?

  4. Silva says:

    Hi Charles,
    when I click on cell G12 of Grubbs’ Test, t-crit = T.INV(2*G10;G11) so cell I12 is a typo.
    But if 2*G10 = 2*G8/G9 = 2*alpha/size is correct then I didn’t understand where you write …G-crit is defined as above except that the significance level for t-crit is α/(2n). Is this another typo?
    Thank you in advance for your reply and for your helpful website.

    • Charles says:

      Silva,
      TINV(2*G10;G11) = T.INV(1-G10;G11). Note that the expression on the left is TINV and not T.INV.
      Charles

      • Silva says:

        Hi Charles,
        I realized what the problem is. I have Excel 2010 localized in Italian, so I have to pay attention to the function names.
        English localization
        TINV = two-tailed inverse
        T.INV = left-tailed inverse
        Italian localization
        INVT = left-tailed inverse
        INV.T = two-tailed inverse
        To me it seemed more logical that T.INV = INV.T, but that is wrong.

        • Charles says:

          Silva,
          I understand the problem very well since I too have the Italian version of Excel on one of my computers.
          Charles

  5. david says:

    Interesting, I will apply it in my presentation.

  6. Willie Delport says:

    Thanks for the information on Grubbs; in addition, I learned how to evaluate data for their normal distribution characteristics.
    This is most kind of you.
    Willie
