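**Definition 1**: Let *x_{1}, …, x_{n}* be an ordered sample with *x_{1}* ≤ … ≤ *x_{n}* and define *S_{n}*(*x*) as follows:

$$
S_n(x) =
\begin{cases}
0 & x < x_1 \\
k/n & x_k \le x < x_{k+1} \\
1 & x \ge x_n
\end{cases}
$$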

Now suppose that the sample comes from a population with cumulative distribution function *F*(*x*) and define *D_{n}* as follows:

$$
D_n = \max_x \left| F(x) - S_n(x) \right|
$$
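As a rough illustration of this definition, the following Python sketch computes *D_{n}* for an ungrouped sample. The function names (`normal_cdf`, `ks_statistic`) and the sample values are made up for illustration. Because *S_{n}* jumps at each data point, the sketch checks the gap on both sides of every jump, which can give a slightly larger value than comparing only at the tabulated points.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Normal CDF via the error function (stands in for F)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF S_n and a hypothesized CDF F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # S_n jumps from (i-1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample, tested against a standard normal distribution.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
d_n = ks_statistic(sample, normal_cdf)
```

Any other hypothesized distribution can be tested by passing a different `cdf` callable.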

**Observation**: It can be shown that *D_{n}* doesn’t depend on *F*. Since *S_{n}*(*x*) depends on the sample chosen, *D_{n}* is a random variable. Our objective is to use *D_{n}* as a way of estimating *F*(*x*).

The distribution of *D_{n}* can be calculated, but for our purposes the important aspect of this distribution is its critical values. These can be found in the Kolmogorov-Smirnov Table.

If *D_{n,α}* is the critical value from the table, then *P*(*D_{n}* ≤ *D_{n,α}*) = 1 – *α*.
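For large *n*, critical values are commonly approximated as *c*(*α*)/√*n*, where *c*(*α*) comes from the limiting Kolmogorov distribution. The sketch below is one way to compute this under that large-sample assumption. The function names are illustrative only, and for small *n* the tabulated values should be preferred.

```python
import math

def kolmogorov_sf(z, terms=100):
    """Asymptotic P(sqrt(n) * D_n > z) from the Kolmogorov series."""
    if z <= 0:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
    return min(1.0, max(0.0, 2.0 * total))

def asymptotic_critical_value(n, alpha):
    """Approximate D_{n,alpha}: solve kolmogorov_sf(c) = alpha by bisection."""
    lo, hi = 0.2, 5.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kolmogorov_sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi / math.sqrt(n)

crit = asymptotic_critical_value(1000, 0.05)   # roughly 1.36 / sqrt(1000)
```

The same series also gives an approximate two-sided p-value for an observed statistic: evaluate `kolmogorov_sf` at √*n* times the observed *D_{n}*.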

*D_{n}* can be used to test the hypothesis that a random sample came from a population with a specific distribution function *F*(*x*). If

$$
\max_x \left| F(x) - S_n(x) \right| \le D_{n,\alpha}
$$

then the sample data is a good fit with *F*(*x*).

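Also from the definition of *D_{n}* given above, it follows that

$$
1 - \alpha = P(D_n \le D_{n,\alpha}) = P\left(\max_x \left| F(x) - S_n(x) \right| \le D_{n,\alpha}\right) = P\left(S_n(x) - D_{n,\alpha} \le F(x) \le S_n(x) + D_{n,\alpha} \ \text{for all } x\right)
$$

Thus *S_{n}*(*x*) ± *D_{n,α}* provides a confidence interval for *F*(*x*).

**Example 1**: Determine whether the data represented in the following frequency table is normally distributed.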

**Figure 1 – Frequency table for Example 1**

This means that 8 elements have value less than 100, 25 elements have value between 101 and 200, etc. We need to find the mean and standard deviation of this data. Since this is a frequency table, we can’t simply use Excel’s AVERAGE and STDEV functions. Instead we first use the midpoints of each interval and then use an approach similar to that described in Frequency Tables as follows:

**Figure 2 – Calculating mean and std dev for data in frequency table**

Thus, the mean is 481.4 and the standard deviation is 155.2. We can now build the table that allows us to carry out the KS test, namely:
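The midpoint calculation can be sketched in Python as follows. The intervals and frequencies here are hypothetical placeholders, not the data from Figure 1, and the sketch assumes the sample (*n* − 1) form of the standard deviation; the worksheet in Figure 2 may use a different convention.

```python
import math

# Hypothetical frequency table: (lower bound, upper bound, frequency).
table = [(0, 100, 8), (100, 200, 25), (200, 300, 40), (300, 400, 20), (400, 500, 7)]

def grouped_mean_sd(rows):
    """Mean and sample standard deviation using interval midpoints."""
    n = sum(freq for _, _, freq in rows)
    mids = [((lo + hi) / 2.0, freq) for lo, hi, freq in rows]
    mean = sum(m * f for m, f in mids) / n
    ss = sum(f * (m - mean) ** 2 for m, f in mids)   # weighted sum of squares
    return mean, math.sqrt(ss / (n - 1))

mean, sd = grouped_mean_sd(table)
```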

**Figure 3 – Kolmogorov-Smirnov test for Example 1**

Columns A and B contain the data from the original frequency table. Column C contains the corresponding cumulative frequency values and column D simply divides these values by the sample size (*n* = 1000) to yield the cumulative distribution function *S_{n}*(*x*).

Column E uses the mean and standard deviation calculated previously to standardize the values of *x* from column A. E.g. the formula in cell E4 is =STANDARDIZE(A4,N$5,N$10), where cell N5 contains the mean and cell N10 contains the standard deviation. Column F uses these standardized values to calculate the cumulative distribution function values assuming that the original data is normally distributed. E.g. cell F4 contains the formula =NORMSDIST(E4). Finally, column G contains the absolute differences between the values in columns D and F. E.g. cell G4 contains the formula =ABS(F4-D4). If the original data is normally distributed, these differences will be close to zero.
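The worksheet columns described above can be mirrored in a short Python sketch. The column A values (taken here to be interval upper bounds), the frequencies, the mean and the standard deviation are all hypothetical stand-ins, since the values in Figure 3 are not reproduced here; `math.erf` plays the role of NORMSDIST.

```python
import math

# Hypothetical inputs: interval upper bounds (column A) and frequencies (column B).
upper = [100, 200, 300, 400, 500]
freq = [8, 25, 40, 20, 7]
mean, sd = 243.0, 102.7   # assumed to have been computed beforehand

n = sum(freq)
rows, cum = [], 0
for x, f in zip(upper, freq):
    cum += f                                          # column C: cumulative frequency
    s_n = cum / n                                     # column D: S_n(x)
    z = (x - mean) / sd                               # column E: STANDARDIZE
    f_x = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # column F: NORMSDIST
    rows.append((x, f, cum, s_n, z, f_x, abs(f_x - s_n)))  # column G: |F - S_n|

d_n = max(row[6] for row in rows)   # KS statistic: largest value in column G
```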

Now *D_{n}* = the largest value in column G, which in our case is 0.0117. If the data is normally distributed then the critical value *D_{n,α}* will be larger than *D_{n}*. From the Kolmogorov-Smirnov Table we see that

*D_{n,α}* = *D_{1000,.05}* = 1.36 / SQRT(1000) = 0.043007

Since *D_{n}* = 0.0117 < 0.043007 = *D_{n,α}*, we conclude that the data is a good fit with the normal distribution.

**Example 2**: Using the KS test, determine whether the data in Example 1 of Graphical Tests for Normality and Symmetry is normally distributed.

We follow the same procedure as in the previous example to obtain the following results. Since the frequencies are all 1, this example should be a bit easier to understand.

**Figure 4 – KS test for data from Example 2**

The Kolmogorov-Smirnov Table shows that the critical value *D_{n,α}* = *D_{15,.05}* = .338

Since *D_{n}* = 0.1874988 < 0.338 = *D_{n,α}*, we conclude that the data is a reasonably good fit with the normal distribution, which is a bit different from what we concluded from looking at the histogram and QQ plot.

**Real Statistics Excel Function**: The following function is provided in the Real Statistics Resource Pack:

**KSCRIT**(*n, α, t*) = the critical value of the Kolmogorov-Smirnov test for a sample of size *n*, for the given value of alpha and *t* = 1 (one tail) or 2 (two tails).

Dear Sir:

I am looking for a test to compare whether a sub-sample of size “n” taken from a sample of size “N” (source sample), with n<<N, has the same attributes as the source sample.

Is Kolmogorov-Smirnov the best test?

The source sample has a multimodal distribution (fish size frequencies), and I have some doubts about how to construct the cumulative sample for the KS test.

The data is in a table of frequencies by size range.

Thanks for your answer

Renato

Renato,

Whether the KS test is the right one depends on what you mean by “has the same attributes”. In any case, the webpage at http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/kolmogorov-smirnov-test/ describes in detail the steps you need to perform the KS test. You can also download the Real Statistics Examples Workbook and look at the Excel worksheet for the KS test and use it as a model for your test. I have not yet created a supplemental function to automate the calculation of the KS test, but I will eventually add this.

Charles

Hi, the SPSS software uses Z K-S = D*SQRT(n) and a p-value, but I can't reproduce the p-value; it is not a probability from the normal distribution.

Example: n = 20, D = .416, Z K-S = .416*SQRT(20) = 1.861, SPSS p-value (two-sided) = .002.

But 2*(1-NORMSDIST(1.861)) is not .002.

Do you know how the p-value is calculated?

Thanks a lot

P.S. Sorry, my English is not the best

Hi Juan Pablo,

You need the distribution function. You can find this at http://www.jstatsoft.org/v08/i18/paper or http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test.

Charles