We present the original approach to performing the Shapiro-Wilk test. This approach is limited to samples of between 3 and 50 elements. A revised approach using the algorithm of J.P. Royston, which can handle samples with up to 5,000 elements (or even more), is described in Shapiro-Wilk Expanded Test.
The basic approach used in the Shapiro-Wilk (SW) test for normality is as follows:
- Rearrange the data in ascending order so that x1 ≤ … ≤ xn.
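- Calculate SS, the sum of squared deviations from the sample mean x̄, as follows: SS = Σ (x_i – x̄)^2, where the sum runs over i = 1, …, n.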
- If n is even, let m = n/2, while if n is odd let m = (n–1)/2
- Calculate b as follows, taking the a_i weights from Table 1 (based on the value of n) in the Shapiro-Wilk Table: b = Σ a_i (x_(n+1–i) – x_i), where the sum runs over i = 1, …, m. Note that if n is odd, the median data value is not used in the calculation of b.
- Calculate the test statistic W = b^2 ⁄ SS
- Find the value in Table 2 of the Shapiro-Wilk Table (for the given value of n) that is closest to W, interpolating if necessary. The corresponding probability is the p-value of the test.
For example, suppose W = .975 and n = 10. This means that the p-value of the test is somewhere between .90 and .95. SW is valid for samples from about n = 7 to 2000.
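The steps above can be sketched in Python as follows. This is only a minimal illustration and not the Real Statistics implementation: the function name sw_statistic is hypothetical, and the coefficients are not computed but must be supplied by the caller from Table 1 of the Shapiro-Wilk Table.

```python
def sw_statistic(data, coeffs):
    """Return (b, SS, W) following the original Shapiro-Wilk steps.

    coeffs holds a_1, ..., a_m for this sample size, copied by the caller
    from Table 1 of the Shapiro-Wilk Table (assumption: they are not
    derived here). sw_statistic is a hypothetical name.
    """
    x = sorted(data)                      # x_1 <= ... <= x_n
    n = len(x)
    m = n // 2                            # n/2 if n is even, (n-1)/2 if odd
    if len(coeffs) != m:
        raise ValueError(f"expected {m} coefficients, got {len(coeffs)}")

    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)  # same quantity as Excel's DEVSQ
    if ss == 0:
        raise ValueError("all data values are equal, so W is undefined")

    # Pair the i-th smallest with the i-th largest value; for odd n the
    # median is never touched because i only runs over the first m items.
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs))
    return b, ss, b * b / ss
```

The p-value is then obtained separately by looking up W in Table 2 for the given n.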
Example 1: A random sample of 12 people is taken from a large population. The ages of the people in the sample are given in column A of the worksheet in Figure 1. Is this data normally distributed?
Figure 1 – Shapiro-Wilk test for Example 1
We begin by sorting the data in column A using Data > Sort & Filter|Sort or the QSORT supplemental function, putting the results in column B. We next look up the coefficient values for n = 12 (the sample size) in Table 1 of the Shapiro-Wilk Table, putting these values in column E. Corresponding to each of these 6 coefficients a1, …, a6, we calculate the differences x12 – x1, …, x7 – x6, where xi is the ith data element in sorted order. E.g. since x1 = 35 and x12 = 86, we place the difference 86 – 35 = 51 in cell H5 (the same row as the cell containing a1). Column I contains the product of the coefficient and difference values. E.g. cell I5 contains the formula =E5*H5. The sum of these values is b = 44.1641, which is found in cell I11 (and again in cell E14).
We next calculate SS as DEVSQ(B5:B16) = 2008.667. Thus W = b^2 ⁄ SS = 44.1641^2/2008.667 = .971026. We now look for .971026 when n = 12 in Table 2 of the Shapiro-Wilk Table and find that the value lies between the entries for p = .50 and p = .90. The W value for .5 is .943 and the W value for .9 is .973. Interpolating .971026 between these values, we arrive at p-value = .873681. Since p-value = .87 > .05 = α, we retain the null hypothesis that the data are normally distributed.
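The arithmetic in this example can be checked with a short Python sketch. The helper name interpolate_p is hypothetical, and simple linear interpolation between the two neighboring Table 2 entries is assumed; the numeric inputs are the values quoted in the text above.

```python
def interpolate_p(w, w_lo, p_lo, w_hi, p_hi):
    """Linearly interpolate a p-value between two Table 2 entries.

    (w_lo, p_lo) and (w_hi, p_hi) are the bracketing entries for the
    given n. interpolate_p is a hypothetical helper name.
    """
    if w_lo >= w_hi or not (w_lo <= w <= w_hi):
        raise ValueError("W must lie between the two table entries")
    frac = (w - w_lo) / (w_hi - w_lo)
    return p_lo + frac * (p_hi - p_lo)


# Values from Example 1: b from cell I11 and SS from DEVSQ(B5:B16)
b, ss = 44.1641, 2008.667
w = b ** 2 / ss

# Table 2 entries for n = 12 quoted in the text: W = .943 at p = .50
# and W = .973 at p = .90
p = interpolate_p(w, 0.943, 0.50, 0.973, 0.90)
print(f"W = {w:.6f}, p-value = {p:.4f}")
```

Up to rounding, this reproduces the W and p-value reported for Example 1.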
Example 2: Using the SW test, determine whether the data in Example 1 of Graphical Tests for Normality and Symmetry are normally distributed.
Figure 2 – Shapiro-Wilk test for Example 2
As we can see from the analysis in Figure 2, p-value = .0419 < .05 = α, and so we reject the null hypothesis and conclude with 95% confidence that the data are not normally distributed. This is quite different from the result using the KS test that we found in Example 2 of Kolmogorov-Smirnov Test.
Real Statistics Excel Function: The Real Statistics Resource Pack contains the following supplemental functions where R1 consists only of numeric data without headings:
SHAPIRO(R1, False) = the Shapiro-Wilk test statistic W for the data in the range R1
SWTEST(R1, False) = p-value of the Shapiro-Wilk test on the data in R1
SWCoeff(n, j, False) = the jth coefficient for samples of size n
SWCoeff(R1, C1, False) = the coefficient corresponding to cell C1 within sorted range R1
SWPROB(n, W) = p-value of the Shapiro-Wilk test for a sample of size n for test statistic W
For example, for Example 1 of Chi-square Test for Normality, we have SHAPIRO(A4:A15, False) = .874 and SWTEST(A4:A15, False) = SWPROB(15,.874) = .0419 (referring to the worksheet in Figure 2 of Chi-square Test for Normality).
Note that SHAPIRO(R1, True), SWTEST(R1, True), SWCoeff(n, j, True) and SWCoeff(R1, C1, True) refer to the results using the Royston algorithm, as described in Shapiro-Wilk Expanded Test.
For compatibility with the Royston version of SWCoeff, when j ≤ n/2 then SWCoeff(n, j, False) = the negative of the value of the jth coefficient for samples of size n found in the Shapiro-Wilk Table. When j = (n+1)/2, SWCoeff(n, j, False) = 0 and when j > (n+1)/2, SWCoeff(n, j, False) = -SWCoeff(n, n–j+1, False).
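The sign convention just described can be illustrated with a small Python sketch. This is not the SWCoeff source code: the name signed_coeff is hypothetical, and the table coefficients a_1, …, a_m are assumed to be passed in by the caller from the Shapiro-Wilk Table.

```python
def signed_coeff(n, j, table_coeffs):
    """Return the j-th signed coefficient (1-based) for a sample of size n.

    table_coeffs holds a_1, ..., a_m from the Shapiro-Wilk Table, where
    m = n // 2 (assumption: supplied by the caller). signed_coeff is a
    hypothetical name, not part of the Real Statistics Resource Pack.
    """
    m = n // 2
    if not 1 <= j <= n:
        raise ValueError("j must satisfy 1 <= j <= n")
    if len(table_coeffs) != m:
        raise ValueError(f"expected {m} coefficients, got {len(table_coeffs)}")

    if j <= m:                  # j <= n/2: negative of the table value
        return -table_coeffs[j - 1]
    if 2 * j == n + 1:          # j = (n+1)/2: the median position, odd n only
        return 0.0
    # j > (n+1)/2: mirror image of position n-j+1, with the sign flipped
    return -signed_coeff(n, n - j + 1, table_coeffs)
```

With this convention, the sum over all j of signed_coeff(n, j, …) times the jth sorted value equals the b statistic defined earlier, since each pair contributes a_i times (larger value – smaller value).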