In Skewness and Kurtosis Analysis we showed how to use the skewness and kurtosis to determine whether a data set is normally distributed. In particular, we demonstrated the Jarque-Bera test. We now describe a more powerful test which is also based on skewness and kurtosis.

We first describe the skewness and kurtosis tests, and then we describe the D’Agostino-Pearson Test, which combines these two tests.

**Skewness Test**

The normal distribution has skewness equal to zero. This test determines whether the skewness of the data is statistically different from zero. The test is based on the fact that when the data is normally distributed, the test statistic

$$z_s = \frac{\text{skew}}{\text{s.e.}}$$

has a standard normal distribution, where *skew* is the skewness of the sample data and *s.e.* is its standard error, which is calculated from formulas involving the sample size *n*.
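The *s.e.* formulas used on this page are not reproduced above, so the following Python sketch is only an approximation of the skewness test under stated assumptions. It assumes the widely used large-sample standard error of skewness, $\sqrt{6n(n-1)/((n-2)(n+1)(n+3))}$, applied to the adjusted skewness that Excel's SKEW function returns. The function names `sample_skewness` and `skewness_test` are hypothetical; the six returned values are laid out like the SKEWTEST output, but they may not match SKEWTEST exactly if the Real Statistics formulas differ.

```python
import math
from statistics import NormalDist, fmean, stdev


def sample_skewness(data):
    """Adjusted Fisher-Pearson skewness (the statistic Excel's SKEW returns)."""
    n = len(data)
    m = fmean(data)
    s = stdev(data)  # sample standard deviation, n - 1 in the denominator
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def skewness_test(data, alpha=0.05):
    """Two-tailed z test of H0: population skewness = 0.

    ASSUMPTION: s.e. is the common large-sample standard error of skewness;
    the page's own s.e. formulas are not shown and may differ.
    """
    n = len(data)
    skew = sample_skewness(data)
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    z = skew / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return {
        "skew": skew,
        "s.e.": se,
        "z_s": z,
        "p-value": p_value,
        "lower": skew - z_crit * se,  # 1 - alpha confidence interval
        "upper": skew + z_crit * se,
    }
```

A p-value above *alpha* (equivalently, a confidence interval that contains zero) means the sample skewness is not significantly different from zero.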

**Example 1**: Conduct the skewness test for the data in range B4:C15 of Figure 1.

**Figure 1 – Sample data**

The test results are shown in Figure 2.

**Figure 2 – Skewness Test**

We see from Figure 2 that the skewness is not significantly different from zero; indeed, the 95% confidence interval (-.72991, 1.21315) contains zero.

**Real Statistics Function**: The Real Statistics Resource Pack provides the following array function.

**SKEWTEST**(R1, *lab, alpha*) – array function which tests whether the skewness of the sample data in range R1 is zero (consistent with a normal distribution). The output consists of a 6 × 1 range containing the sample skewness, standard error, test statistic $z_s$, p-value, and the lower and upper limits of the 1 – *alpha* confidence interval.

If *lab* = TRUE then the output contains a column of labels (default = FALSE). The default value for *alpha* is .05.

The output in range X7:Y12 of Figure 2 can be obtained using the array formula

=SKEWTEST(B4:C15,TRUE).

**Kurtosis Test**

The normal distribution has kurtosis equal to zero (here kurtosis means excess kurtosis, i.e. the fourth standardized moment minus 3). This test determines whether the kurtosis of the data is statistically different from zero. The test is based on the fact that when the data is normally distributed, the test statistic

$$z_k = \frac{\text{kurt}}{\text{s.e.}}$$

has a standard normal distribution, where *kurt* is the kurtosis of the sample data and *s.e.* is its standard error, which is calculated from formulas involving the sample size *n*.
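As with the skewness test, the page's *s.e.* formulas are not shown, so this Python sketch rests on an assumption: it uses the common large-sample standard error of kurtosis, $2\,\text{SES}\sqrt{(n^2-1)/((n-3)(n+5))}$, where $\text{SES} = \sqrt{6n(n-1)/((n-2)(n+1)(n+3))}$ is the standard error of skewness, applied to the excess kurtosis that Excel's KURT function returns. `sample_kurtosis` and `kurtosis_test` are hypothetical names, and the results may differ from KURTTEST.

```python
import math
from statistics import NormalDist, fmean, stdev


def sample_kurtosis(data):
    """Sample excess kurtosis (the statistic Excel's KURT returns)."""
    n = len(data)
    m = fmean(data)
    s = stdev(data)  # sample standard deviation, n - 1 in the denominator
    total = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def kurtosis_test(data, alpha=0.05):
    """Two-tailed z test of H0: population excess kurtosis = 0.

    ASSUMPTION: s.e. is the common large-sample standard error of kurtosis;
    the page's own s.e. formulas are not shown and may differ.
    """
    n = len(data)
    kurt = sample_kurtosis(data)
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    z = kurt / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return {
        "kurt": kurt,
        "s.e.": se,
        "z_k": z,
        "p-value": p_value,
        "lower": kurt - z_crit * se,  # 1 - alpha confidence interval
        "upper": kurt + z_crit * se,
    }
```

For example, evenly spaced data such as 1, 2, ..., 20 is flatter than a normal distribution, so `kurtosis_test` reports a negative kurtosis and a negative $z_k$.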

**Example 2**: Conduct the kurtosis test for the data in range B4:C15 of Figure 1.

The test results are shown in Figure 3.

**Figure 3 – Kurtosis Test**

**Real Statistics Function**: The Real Statistics Resource Pack provides the following array function.

**KURTTEST**(R1, *lab, alpha*) – array function which tests whether the kurtosis of the sample data in range R1 is zero (consistent with a normal distribution). The output consists of a 6 × 1 range containing the sample kurtosis, standard error, test statistic $z_k$, p-value, and the lower and upper limits of the 1 – *alpha* confidence interval.

If *lab* = TRUE then the output contains a column of labels (default = FALSE). The default value for *alpha* is .05.

The output in range AC7:AD12 of Figure 3 can be obtained using the array formula =KURTTEST(B4:C15,TRUE).

**D’Agostino-Pearson Omnibus Test**

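The D’Agostino-Pearson test is based on the fact that when the data is normally distributed, the test statistic $z_s^2 + z_k^2$, where $z_s$ and $z_k$ are the skewness and kurtosis test statistics described above, has a chi-square distribution with 2 degrees of freedom, i.e.

$$z_s^2 + z_k^2 \sim \chi^2(2)$$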

This test should generally not be used for data sets with fewer than 20 elements.

**Real Statistics Functions**: The Real Statistics Resource Pack contains the following functions.

**DAGOSTINO**(R1) = the D’Agostino-Pearson test statistic for the data in the range R1

**DPTEST**(R1) = p-value of the D’Agostino-Pearson test on the data in R1
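For a cross-check outside Excel, the following Python sketch computes the omnibus statistic $z_s^2 + z_k^2$ and its p-value. Note that it swaps in D’Agostino's normalizing transformation of the sample skewness and the Anscombe-Glynn transformation of the sample kurtosis (the versions used by SciPy's `scipy.stats.skewtest`, `kurtosistest` and `normaltest`) instead of a plain ratio to a standard error, so its $z_s$ and $z_k$ need not equal the SKEWTEST and KURTTEST values. `z_skew`, `z_kurt` and `dagostino_pearson` are hypothetical stand-ins for DAGOSTINO and DPTEST. The p-value uses the fact that the upper tail of a chi-square distribution with 2 degrees of freedom is $e^{-x/2}$.

```python
import math
from statistics import fmean


def _moments(data):
    """Return n and the 2nd, 3rd and 4th central moments (divisor n)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data must not be constant")
    return n, m2, m3, m4


def z_skew(data):
    """D'Agostino's normal approximation for the sample skewness."""
    n, m2, m3, _ = _moments(data)
    g1 = m3 / m2 ** 1.5  # sqrt(b1): population (biased) skewness
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    a = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(y / a)  # asinh(t) = ln(t + sqrt(t^2 + 1))


def z_kurt(data):
    """Anscombe-Glynn normal approximation for the sample kurtosis."""
    n, m2, _, m4 = _moments(data)
    b2 = m4 / m2 ** 2  # ordinary (not excess) kurtosis; 3 for a normal
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - mean_b2) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (2.0 / sqrt_beta1
                                  + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    if denom == 0:
        raise ZeroDivisionError("kurtosis transformation undefined for this sample")
    term2 = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    return (1.0 - 2.0 / (9.0 * a) - term2) / math.sqrt(2.0 / (9.0 * a))


def dagostino_pearson(data):
    """Return (statistic, p-value); statistic = z_s^2 + z_k^2 ~ chi-square(2)."""
    if len(data) < 8:
        raise ValueError("need at least 8 observations (20+ recommended)")
    k2 = z_skew(data) ** 2 + z_kurt(data) ** 2
    return k2, math.exp(-k2 / 2.0)  # chi-square(2) upper-tail probability
```

As in Example 3, a p-value greater than α gives no grounds to reject the null hypothesis that the data is normally distributed.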

**Example 3**: Use the D’Agostino-Pearson Test to determine whether the data in range B4:C15 of Figure 1 is normally distributed.

From Figure 4, we see that p-value = .636273 > .05 = α, and so we conclude that there are no grounds to reject the null hypothesis that the data is normally distributed. This conclusion agrees with the one obtained using the Shapiro-Wilk test.

**Figure 4 – D’Agostino-Pearson Test for Normality**

Can the D’Agostino-Pearson Test be used to check a fit to a Rayleigh distribution, if R1 is the CDF of the Rayleigh value of the data in sorted order?

Also, I noticed a slight typo: “From Figure 4, we see that p-value = .63673…” Should be .636273 to match the spreadsheet screen grab.

Steve,

I don’t see any way that the D’Agostino-Pearson test could be used as you have described. If, however, the data in R1 could be expressed as sqrt(x^2+y^2), then you could test whether the x and y data are normally distributed (using D’Agostino-Pearson or Shapiro-Wilk) and also check that x and y have mean zero, have the same variance and are independent. In this case, you would have grounds for saying that the data in R1 follows a Rayleigh distribution.

Thanks for catching the typo. I have now corrected the webpage. I really appreciate your help in improving the accuracy of the website.

Charles

Charles,

Great stuff. I have a question. If the p-value for my data is > 0.05 (i.e. the null hypothesis is not rejected), is the standard deviation of the data set usable to model the spread of the data?

For example, suppose the data were the actual demand for a product.

I want to find, say, a 98% CI for the range of expected future demand. Could I say that mean + z*std.deviation is the expected demand level with 98% confidence (where z = norminv(p=.98))?

(given that the data can be treated as “normal”)

Jay,

Yes, you can do all of these things. In particular, you can create confidence intervals even when the null hypothesis is not rejected.

Charles
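Jay's proposed calculation, mean + z × standard deviation with z the .98 quantile of the standard normal distribution (NORM.S.INV(.98) in Excel, about 2.0537), can be sketched in Python as follows. This is a hedged sketch: `demand_upper_bound` is a hypothetical name, the demand data are assumed to be normally distributed, and the sample mean and standard deviation are treated as if they were the true population values.

```python
from statistics import NormalDist, fmean, stdev


def demand_upper_bound(demand, p=0.98):
    """Level that future demand should stay below with probability p.

    ASSUMPTION: demand is normal and the sample mean / standard deviation
    are used in place of the (unknown) population parameters.
    """
    z = NormalDist().inv_cdf(p)  # plays the role of Excel's NORM.S.INV(p)
    return fmean(demand) + z * stdev(demand)
```

With p = .98 this is a one-sided bound: under the stated assumptions, a single future demand value should exceed it only about 2% of the time.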