We consider a random variable *x* and a data set $S = \{x_1, x_2, \ldots, x_n\}$ of size *n*, which contains possible values of *x*. The data set can represent either the population being studied or a sample drawn from the population.

Looking at *S* as representing a distribution, the **skewness** of *S* is a measure of symmetry, while **kurtosis** is a measure of peakedness of the data in *S*.

**Symmetry and Skewness**

**Definition 1**: We use **skewness** as a measure of symmetry. If the skewness of *S* is 0 then the distribution represented by *S* is perfectly symmetric. If the skewness is negative, then the distribution is skewed to the left, while if the skew is positive then the distribution is skewed to the right (see Figure 1 below for an example).

Consistent with Excel, we calculate the skewness of *S* as follows:

$$\text{skew}(S) = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where *x̄* is the mean and *s* is the standard deviation of *S*. To avoid division by zero, this formula requires that *n* > 2.
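For readers who want to check the calculation outside Excel, here is a minimal Python sketch of the sample skewness formula n/((n−1)(n−2)) · Σ((xᵢ − x̄)/s)³ as documented for Excel's SKEW. The function name `excel_skew` is hypothetical, and Excel's internal arithmetic may differ from this in the last few decimal places.

```python
from statistics import mean, stdev

def excel_skew(data):
    """Hypothetical helper: sample skewness n/((n-1)(n-2)) * sum(((x - mean)/s)**3),
    the formula documented for Excel's SKEW."""
    n = len(data)
    if n < 3:
        raise ValueError("SKEW needs at least 3 values (n > 2)")
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    total = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

print(round(excel_skew([2, 5, -1, 3, 4, 5, 0, 2]), 2))  # -0.43
```

The data in the usage line are the values from Example 1 below.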

**Observation**: When a distribution is symmetric, the mean = median. When the distribution is positively skewed, the mean is usually > median, and when the distribution is negatively skewed, the mean is usually < median.

**Excel Function**: Excel provides the **SKEW** function as a way to calculate the skewness of *S*, i.e. if R is a range in Excel containing the data elements in *S* then SKEW(R) = the skewness of *S*.

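**Excel 2013 Function**: There is also a population version of the skewness given by the formula

$$\text{skew}_p(S) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma} \right)^3$$

where $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ is the population standard deviation of *S*. This version has been implemented in Excel 2013 using the function **SKEW.P**.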
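A matching sketch of the population version, assuming SKEW.P follows the formula (1/n) · Σ((xᵢ − x̄)/σ)³ with σ the population standard deviation. As before, `excel_skew_p` is a hypothetical name used only for illustration.

```python
from statistics import mean, pstdev

def excel_skew_p(data):
    """Hypothetical helper: population skewness (1/n) * sum(((x - mean)/sigma)**3),
    the formula behind Excel's SKEW.P."""
    n = len(data)
    x_bar = mean(data)
    sigma = pstdev(data)  # population standard deviation (n in the denominator)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((x - x_bar) / sigma) ** 3 for x in data) / n

print(round(excel_skew_p([2, 5, -1, 3, 4, 5, 0, 2]), 2))  # -0.34
```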

It turns out that for a range R consisting of the data in $S = \{x_1, \ldots, x_n\}$,

$$\text{SKEW.P}(R) = \text{SKEW}(R) \cdot \frac{n-2}{\sqrt{n(n-1)}}$$

where *n* = COUNT(R).
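The conversion factor (n − 2)/√(n(n − 1)) can be checked numerically. The sketch below computes both versions directly on some randomly generated data and compares the direct population value with the converted sample value; the helper names and the choice of exponential test data are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

def skew_sample(data):
    """Sample skewness, same formula as Excel's SKEW."""
    n, m, s = len(data), mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

def skew_population(data):
    """Population skewness, same formula as Excel's SKEW.P."""
    n, m, sigma = len(data), mean(data), pstdev(data)
    return sum(((x - m) / sigma) ** 3 for x in data) / n

def skew_p_from_skew(g, n):
    """Convert a SKEW value g to the matching SKEW.P value for n data points."""
    return g * (n - 2) / sqrt(n * (n - 1))

random.seed(1)
data = [random.expovariate(1.0) for _ in range(25)]
direct = skew_population(data)
converted = skew_p_from_skew(skew_sample(data), len(data))
print(abs(direct - converted) < 1e-12)  # True
```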

**Example 1**: Suppose *S* = {2, 5, -1, 3, 4, 5, 0, 2}. The skewness of *S* = -0.43, i.e. SKEW(R) = -0.43 where R is a range in an Excel worksheet containing the data in *S*. Since this value is negative, the curve representing the distribution is skewed to the left (i.e. the fatter part of the curve is on the right). Also SKEW.P(R) = -0.34.
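As a hand check of Example 1, the intermediate quantities are the mean, the sums of squared and cubed deviations, and the sample variance:

$$\bar{x} = \frac{20}{8} = 2.5, \qquad \sum_{i=1}^{8}(x_i-\bar{x})^2 = 34, \qquad s^2 = \frac{34}{7}, \qquad \sum_{i=1}^{8}(x_i-\bar{x})^3 = -24$$

$$\text{SKEW} = \frac{8}{7 \cdot 6} \cdot \frac{-24}{(34/7)^{3/2}} \approx -0.427, \qquad \text{SKEW.P} = \frac{1}{8} \cdot \frac{-24}{(34/8)^{3/2}} \approx -0.342$$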

**Observation**: SKEW(R) and SKEW.P(R) ignore any empty cells or cells with non-numeric values.
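When reproducing these functions outside Excel, the same filtering has to be done by hand. The sketch below is only an approximation of Excel's rule for range arguments (text, logical values and empty cells are ignored, while cells containing zero are kept), assuming blank cells are represented as `None` and TRUE/FALSE as Python booleans. Since `bool` is a subclass of `int` in Python, booleans must be excluded explicitly. The helper name `numeric_only` is hypothetical.

```python
def numeric_only(cells):
    """Hypothetical helper: keep int/float values and drop blanks (None),
    text and booleans, roughly as Excel does for a range argument."""
    return [c for c in cells
            if isinstance(c, (int, float)) and not isinstance(c, bool)]

cells = [2, 5, None, -1, "n/a", 3, 4, 5, 0, 2, True]
values = numeric_only(cells)
print(values)       # [2, 5, -1, 3, 4, 5, 0, 2]
print(len(values))  # 8, the value COUNT(R) would report for such a range
```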

**Kurtosis**

**Definition 2**: We use **kurtosis** as a measure of peakedness (or flatness). Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution.

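Consistent with Excel, we calculate the kurtosis of *S* as follows:

$$\text{kurt}(S) = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

where *x̄* is the mean and *s* is the standard deviation of *S*. To avoid division by zero, this formula requires that *n* > 3.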

**Excel Function**: Excel provides the **KURT** function as a way to calculate the kurtosis of *S*, i.e. if R is a range in Excel containing the data elements in *S* then KURT(R) = the kurtosis of *S*.

**Example 2**: Suppose *S* = {2, 5, -1, 3, 4, 5, 0, 2}. The kurtosis of *S* = -0.94, i.e. KURT(R) = -0.94 where R is a range in an Excel worksheet containing the data in *S*. Since this value is negative, the curve representing the distribution is relatively flat.
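As a hand check of Example 2, with *x̄* = 2.5 and *s*² = 34/7 as in Example 1, the sum of fourth powers of the deviations is 272.5625, so

$$\text{KURT} = \frac{8 \cdot 9}{7 \cdot 6 \cdot 5} \cdot \frac{272.5625}{(34/7)^2} - \frac{3 \cdot 7^2}{6 \cdot 5} \approx 3.961 - 4.9 \approx -0.939$$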

**Observation**: KURT(R) ignores any empty cells or cells with non-numeric values.

**Graphical Illustration**

We now look at an example of these concepts using the chi-square distribution.

Figure 1 contains the graphs of two chi-square distributions (with different degrees of freedom *df*). We study the chi-square distribution elsewhere, but for now note the following values for the kurtosis and skewness:

**Figure 2 – Comparison of skewness and kurtosis**

The red curve (*df* = 10) is flatter than the blue curve (*df* = 5), which is reflected in the fact that the kurtosis value of the red curve is lower.

Both curves are asymmetric, and skewed to the right (i.e. the fat part of the curve is on the left). This is consistent with the fact that the skewness for both is positive. But the blue curve is more skewed to the right, which is consistent with the fact that the skewness of the blue curve is larger.
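For comparison, the chi-square distribution with *df* degrees of freedom has the standard closed-form skewness √(8/*df*) and excess kurtosis 12/*df*. The sketch below evaluates them for the two curves. These are theoretical values and may not match the figure exactly if its values were computed another way, but they show the same ordering described above: the *df* = 5 curve has both the larger skewness and the larger kurtosis.

```python
from math import sqrt

def chi_square_skewness(df):
    """Theoretical skewness of a chi-square distribution with df degrees of freedom."""
    return sqrt(8 / df)

def chi_square_excess_kurtosis(df):
    """Theoretical excess kurtosis (normal = 0) of a chi-square distribution."""
    return 12 / df

for df in (5, 10):
    print(df, round(chi_square_skewness(df), 3), chi_square_excess_kurtosis(df))
# 5 1.265 2.4
# 10 0.894 1.2
```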

Thanks for helping us understand these basics of statistics.

Using the scores I have, how can I make a GRAPHIC ILLUSTRATION of skewness and kurtosis in Excel?

Namo,

I am not sure what you mean by a graphic illustration. I have tried to do this with the graph of the chi-square distribution, which was done using Excel (see the details in the Examples Workbook, which you can download for free).

Charles

Hi Charles. I want to know ‘what is the typical sort of skew?’

Soniya,

I don’t know of any typical sort of skew. The bell curve has 0 skew (i.e. it is symmetric).

Charles

Thank you Charles.

“the kurtosis value of the blue curve is lower” should read “the kurtosis value of the blue curve is higher”.

In fact, zero skew is seldom observed. See for example http://www.aip.de/groups/soe/local/numres/bookcpdf/c14-1.pdf

Gaylord,

Thanks for catching this typo. I have now corrected the webpage. I appreciate your help in making the website better.

Charles

Based on my experience teaching statistics, you can use the Pearson coefficient of skewness, which is (mean – mode) divided by the standard deviation, or alternatively 3(mean – median) divided by the standard deviation. Most books use the first formula for ungrouped data and the second formula for grouped data.

Prof Amir,

Thank you very much for this suggestion. I will add something about this to the website shortly. I also found an interesting article about the usefulness of these statistics, especially for teaching purposes:

http://www.amstat.org/publications/jse/v19n2/doane.pdf

Charles
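The two Pearson coefficients mentioned above are easy to compute directly. A minimal sketch, using the sample standard deviation and a small made-up right-skewed data set. Note that the mode is not unique for multimodal data (the data in Example 1 has modes 2 and 5); Python's `statistics.mode` returns the first mode encountered (Python 3.8+), so the mode-based version should be used with care. The function names are hypothetical.

```python
from statistics import mean, median, mode, stdev

def pearson_mode_skewness(data):
    """Hypothetical helper: Pearson's first coefficient, (mean - mode) / s."""
    return (mean(data) - mode(data)) / stdev(data)

def pearson_median_skewness(data):
    """Hypothetical helper: Pearson's second coefficient, 3 * (mean - median) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)

data = [1, 2, 2, 2, 3, 4, 9]  # made-up data with a long right tail
print(round(pearson_mode_skewness(data), 3))
print(round(pearson_median_skewness(data), 3))
```

Both values are positive for this data, in line with its long right tail.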

What are the differences and similarities between skewness and kurtosis?

This is described on the referenced webpage. Perhaps you have a more specific question?

Charles

I would like two suggestions:

1. I have 1000 dollars that I want to distribute over 12 months in such a way that the peak is 1.6 times the average (using a normal distribution curve).

2. As far as I know, the peak of a bell curve is attained at the mean (i.e. at month 6.5), but if I want the peak at 40% of the months (i.e. at time 12 × 40/100) while the peak still remains 1.6 times the average (i.e. peak = 1.6 × 100/12), then what will the distribution be?

The peak is usually considered to be the high point in the curve, which for a normal distribution occurs at the mean. Thus, I don’t know what it means for the peak to be 1.6 times the average (which is the mean). Please explain what you mean by the peak.

Charles

Hey Charles

Say you had a bunch of returns data and wished to check the skewness of that data. In this instance, which would be appropriate: SKEW() or SKEW.P()?

I would imagine SKEW(), because SKEW.P() refers to a population and you don’t have the population here; you merely have a bunch of return data, don’t you? Or, when dealing with financial returns, do you assume that the data you have is the population?

Steven,

You would probably use SKEW(), although the results are probably fairly similar.

Charles
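To see why the two results are usually close, recall that SKEW.P(R) = SKEW(R) · (n − 2)/√(n(n − 1)). The sketch below simply evaluates that factor for a few sample sizes; it does not settle which version is more appropriate, only how much they differ. The helper name is hypothetical.

```python
from math import sqrt

def skew_p_over_skew(n):
    """Hypothetical helper: ratio SKEW.P / SKEW for n data points, (n - 2) / sqrt(n (n - 1))."""
    return (n - 2) / sqrt(n * (n - 1))

for n in (10, 30, 100, 1000):
    print(n, round(skew_p_over_skew(n), 4))
```

The factor is about 0.84 for n = 10 but already about 0.98 for n = 100, so for a long series of returns the two functions give similar values.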

Hi,

I want to make sure: by *n*, did you mean the sample size?

Yes, n = sample size.

Charles