We present the original approach to performing the Shapiro-Wilk test. This approach is limited to samples of between 3 and 50 elements. A revised approach using the algorithm of J. P. Royston, which can handle samples with up to 5,000 elements (or even more), is described in Shapiro-Wilk Expanded Test.
The basic approach used in the Shapiro-Wilk (SW) test for normality is as follows:
- Rearrange the data in ascending order so that x1 ≤ … ≤ xn.
- Calculate SS = (x1 – x̄)² + … + (xn – x̄)², i.e. the sum of the squared deviations of the data from the sample mean x̄.
- If n is even, let m = n/2, while if n is odd let m = (n–1)/2
- Calculate b = a1(xn – x1) + a2(x(n–1) – x2) + … + am(x(n–m+1) – xm), taking the ai weights from Table 1 (based on the value of n) in the Shapiro-Wilk Tables. Note that if n is odd, the median data value is not used in the calculation of b.
- Calculate the test statistic W = b² ⁄ SS
- Find the value in Table 2 of the Shapiro-Wilk Tables (for the given value of n) that is closest to W, interpolating if necessary. This is the p-value for the test.
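The steps above, up to the calculation of W, can be sketched in Python as follows. This is only a minimal illustration, not the Real Statistics implementation: the function name shapiro_wilk_w is hypothetical, the ai weights must be supplied by the caller from Table 1, and the three data values are made up for the demonstration (for n = 3 the single weight is a1 = .7071).

```python
from statistics import fmean


def shapiro_wilk_w(data, weights):
    """Return W = b^2 / SS for the data, given the table weights a_1, ..., a_m.

    Hypothetical helper: the weights are NOT computed here; they must be
    looked up in Table 1 of the Shapiro-Wilk Tables for the sample size n.
    """
    x = sorted(data)              # x1 <= ... <= xn
    n = len(x)
    m = n // 2                    # n/2 if n is even, (n-1)/2 if n is odd
    if len(weights) != m:
        raise ValueError(f"expected {m} weights for n = {n}")

    mean = fmean(x)
    ss = sum((xi - mean) ** 2 for xi in x)   # sum of squared deviations
    if ss == 0:
        raise ValueError("all data values are equal, so W is undefined")

    # Pair the largest with the smallest value, the second largest with the
    # second smallest, etc.; for odd n the median is never touched.
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    return b * b / ss


# Illustrative data only (not taken from the examples on this page).
w = shapiro_wilk_w([2.0, 1.0, 3.0], [0.7071])
print(round(w, 4))
```

The p-value still has to be obtained from Table 2, since the sketch stops at the test statistic.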
For example, suppose W = .975 and n = 10. Based on Table 2 of the Shapiro-Wilk Tables the p-value for the test is somewhere between .90 (W = .972) and .95 (W = .978).
Example 1: A random sample of 12 people is taken from a large population. The ages of the people in the sample are given in column A of the worksheet in Figure 1. Is this data normally distributed?
We begin by sorting the data in column A using Data > Sort & Filter|Sort or the QSORT supplemental function, putting the results in column B. We next look up the coefficient values for n = 12 (the sample size) in Table 1 of the Shapiro-Wilk Tables, putting these values in column E.
Corresponding to each of these 6 coefficients a1,…,a6, we calculate the values x12 – x1, …, x7 – x6, where xi is the ith data element in sorted order. E.g. since x1 = 35 and x12 = 86, we place the difference 86 – 35 = 51 in cell H5 (the same row as the cell containing a1). Column I contains the product of the coefficients and difference values. E.g. cell I5 contains the formula =E5*H5. The sum of these values is b = 44.1641, which is found in cell I11 (and again in cell E14).
We next calculate SS as DEVSQ(B4:B15) = 2008.667. Thus W = b² ⁄ SS = 44.1641² ⁄ 2008.667 = .971026. We now look for .971026 when n = 12 in Table 2 of the Shapiro-Wilk Tables and find that the p-value lies between .50 and .90. The W value for .5 is .943 and the W value for .9 is .973.
Interpolating .971026 between these values (using linear interpolation), we arrive at p-value = .873681. Since p-value = .87 > .05 = α, we retain the null hypothesis that the data are normally distributed.
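The arithmetic for Example 1 can be checked with a short Python sketch. The values of b, SS and the two Table 2 entries are the ones quoted above; the function name interpolate_p is hypothetical and is not part of the Real Statistics Resource Pack.

```python
def interpolate_p(w, w_lo, p_lo, w_hi, p_hi):
    """Linearly interpolate the p-value for w between two Table 2 entries."""
    return p_lo + (w - w_lo) / (w_hi - w_lo) * (p_hi - p_lo)


b = 44.1641      # sum of the coefficient * difference products
ss = 2008.667    # DEVSQ of the sorted data
w = b ** 2 / ss  # test statistic W

# Table 2 entries for n = 12 quoted in the text: W = .943 at p = .50
# and W = .973 at p = .90.
p = interpolate_p(w, 0.943, 0.50, 0.973, 0.90)
print(round(w, 6), round(p, 4))
```

Because b and SS are entered here as rounded values, the last digit or two of the interpolated p-value may differ slightly from the worksheet result.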
Example 2: Using the SW test, determine whether the data in Example 1 of Graphical Tests for Normality and Symmetry are normally distributed.
As we can see from the analysis in Figure 2, p-value = .0419 < .05 = α, and so we reject the null hypothesis and conclude with 95% confidence that the data are not normally distributed. This is quite different from the result of the KS test that we found in Example 2 of Kolmogorov-Smirnov Test.
Real Statistics Function: The Real Statistics Resource Pack contains the following supplemental functions where R1 consists only of numeric data without headings:
SHAPIRO(R1, FALSE) = the Shapiro-Wilk test statistic W for the data in the range R1
SWTEST(R1, FALSE, h) = p-value of the Shapiro-Wilk test on the data in R1
SWCoeff(n, j, FALSE) = the jth coefficient for samples of size n
SWCoeff(R1, C1, FALSE) = the coefficient corresponding to cell C1 within sorted range R1
SWPROB(n, W, FALSE, h) = p-value of the Shapiro-Wilk test for a sample of size n for test statistic W
The functions SHAPIRO and SWTEST ignore all empty and non-numeric cells. The range R1 in SWCoeff(R1, C1, FALSE) should not contain any empty or non-numeric cells.
When performing the table lookup, the default is to use harmonic interpolation (h = TRUE). To use linear interpolation, set h to FALSE. See Interpolation for details.
For example, for Example 1 of Chi-square Test for Normality, we have SHAPIRO(A4:A15, FALSE) = .874 and SWTEST(A4:A15, FALSE, FALSE) = SWPROB(15,.874,FALSE,FALSE) = .0419 (referring to the worksheet in Figure 2 of Chi-square Test for Normality).
It is important to note that SHAPIRO(R1, TRUE), SWTEST(R1, TRUE), SWCoeff(n, j, TRUE), SWCoeff(R1, C1, TRUE) and SWPROB(n, W, TRUE) refer to the results using the Royston algorithm, as described in Shapiro-Wilk Expanded Test.
For compatibility with the Royston version of SWCoeff, when j ≤ n/2, SWCoeff(n, j, FALSE) = the negative of the value of the jth coefficient for samples of size n found in the Shapiro-Wilk Tables. When j = (n+1)/2, SWCoeff(n, j, FALSE) = 0, and when j > (n+1)/2, SWCoeff(n, j, FALSE) = -SWCoeff(n, n–j+1, FALSE).
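The sign convention just described can be illustrated with a small Python sketch. It only mimics the rule stated above and is not the SWCoeff code itself: the function name full_coefficients is hypothetical, and the table weights a1, …, am are assumed to be supplied by the caller.

```python
def full_coefficients(table_weights, n):
    """Expand the table weights a_1..a_m into n signed coefficients c_1..c_n.

    Hypothetical helper that follows the convention described in the text:
      j <= n/2      ->  c_j = -a_j
      j  = (n+1)/2  ->  c_j = 0              (only occurs when n is odd)
      j  > (n+1)/2  ->  c_j = -c_(n-j+1) = a_(n-j+1)
    """
    m = n // 2
    if len(table_weights) != m:
        raise ValueError(f"expected {m} table weights for n = {n}")
    lower = [-a for a in table_weights]       # j = 1, ..., m
    middle = [0.0] if n % 2 == 1 else []      # median position for odd n
    upper = list(reversed(table_weights))     # a_m, ..., a_1 for the top half
    return lower + middle + upper


# For n = 3 the single table weight is a1 = .7071.
coeffs = full_coefficients([0.7071], 3)
print(coeffs)

# With this convention b is the sum of c_j * x_j over the sorted data.
x = [1.0, 2.0, 3.0]  # illustrative data only
b = sum(c * xj for c, xj in zip(coeffs, x))
```

With the coefficients signed in this way, the sum of cj·xj over the sorted data equals the sum of ai(x(n+1–i) – xi), so b can be computed without forming the differences explicitly.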