The one-sample Kolmogorov-Smirnov test is used to test whether a sample comes from a specific distribution. We can use this procedure to determine whether a sample comes from a population that is normally distributed (see Kolmogorov-Smirnov Test for Normality).

We now show how to modify the procedure to test whether a sample comes from an exponential distribution. Tests for other distributions are similar.

**Example 1**: Determine whether the sample data in range B4:B18 of Figure 1 differs significantly from an exponential distribution.

**Figure 1 – Kolmogorov-Smirnov test for exponential distribution**

The result is shown in Figure 1. This figure is very similar to Figure 3 of Kolmogorov-Smirnov Test for Normality. Assuming the null hypothesis holds and the data follows an exponential distribution, then the data in column F would contain the cumulative distribution values *F*(*x*) for every *x* in column B.

We use the Excel function EXPONDIST to calculate the exponential distribution values *F*(*x*) in column F. E.g. the formula in cell F4 is =EXPONDIST(B4,$B$20,TRUE). Here B4 contains the *x* value (0.7 in this case) and B20 contains the value of lambda (*λ*) in the definition of the exponential distribution (Definition 1 of Exponential Distribution). As we can see from Figure 1 of Exponential Distribution, *λ* is simply the reciprocal of the population mean. As usual, we use the sample mean as an estimate of the population mean, and so the value in B20, which contains the formula =1/B19 where B19 contains the sample mean, is used as an estimate of *λ*.
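For readers working outside Excel, the calculation in cell F4 can be sketched in Python as follows. This is a minimal illustration, not part of the original worksheet: the function name `expon_cdf` is hypothetical, and the value of *λ* is the estimate quoted later in this article.

```python
import math

def expon_cdf(x, lam):
    """Cumulative exponential distribution F(x) = 1 - exp(-lam * x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)

lam = 0.247934                 # estimate of lambda quoted in the article (1 / sample mean)
f_b4 = expon_cdf(0.7, lam)     # counterpart of =EXPONDIST(B4,$B$20,TRUE) with B4 = 0.7
print(round(f_b4, 4))
```

The third argument TRUE in EXPONDIST selects the cumulative distribution function, which is what the closed-form expression above computes.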

All the other formulas are the same as described in Kolmogorov-Smirnov Test for Normality where the Kolmogorov-Smirnov test is used to test that data follows a normal distribution.

We see that the calculated value of the test statistic *D* is .286423 (cell G20, which contains the formula =MAX(G4:G18)), which is less than the critical value of 0.338 (cell G21, which contains the formula =KSCRIT(B21,0.05), i.e. the value for *n* = 15 and *α* = .05 in the Kolmogorov-Smirnov Table). Since *D* < *D*<sub>*crit*</sub>, we conclude that there is no significant difference between the data and data coming from an exponential distribution (with *λ* = 0.247934).
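The whole procedure can be sketched in Python as shown here. This is only an illustration under stated assumptions: the sample values are hypothetical (the data in Figure 1 is not reproduced in the text), the function name `ks_exponential` is invented for this sketch, and the statistic is computed the way this article describes it, by comparing *F*(*x*) with the cumulative proportion at each data value. The critical value 0.338 is the one quoted above for *n* = 15 and *α* = .05.

```python
import bisect
import math

def ks_exponential(sample):
    """Return (lambda_hat, D) for a KS comparison with a fitted exponential."""
    xs = sorted(sample)
    n = len(xs)
    lam = 1.0 / (sum(xs) / n)                    # lambda estimated as 1 / sample mean
    d = 0.0
    for x in sorted(set(xs)):                    # one row per distinct data value
        s_n = bisect.bisect_right(xs, x) / n     # cumulative proportion of values <= x
        f_x = 1.0 - math.exp(-lam * x)           # exponential CDF, like EXPONDIST(x, lam, TRUE)
        d = max(d, abs(f_x - s_n))
    return lam, d

# Hypothetical sample of 15 values; NOT the data from Figure 1
sample = [0.7, 1.1, 1.5, 2.0, 2.6, 3.0, 3.4, 3.9, 4.3, 4.8, 5.5, 6.1, 6.8, 7.9, 9.4]
lam, d = ks_exponential(sample)
d_crit = 0.338                                   # table value for n = 15, alpha = .05
reject = d > d_crit
print(f"lambda = {lam:.6f}, D = {d:.6f}, reject H0: {reject}")
```

Using `bisect_right` means tied values share a single cumulative proportion, which mirrors a frequency table with one row per distinct value.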

We can compute an approximate p-value using the formula

KSPROB(G20,B21) = .141851

Hi Charles,

I would like to know the reference that you used for the Kolmogorov-Smirnov critical value at the .05 level of significance.

Wildan,

Are you asking for the reference to Kolmogorov-Smirnov table of critical values? If so, if you google you should find numerous references to the table of critical values.

Charles

I am really not sure how to calculate S(x).

This is described on the following webpage:

Kolmogorov-Smirnov Test for Normality

Charles

Hi Charles,

I am wondering: if a variable shows good normality in the KS test, can I use ANOVA afterwards?

Many thanks.

Tian

Tian,

In general I would recommend the Shapiro-Wilk test for normality rather than the KS test. If you do use the KS test then make sure that you use the Lilliefors version of the test if the mean and standard deviation are estimated from the sample.

If the test for normality holds then you can use Anova provided that the other assumptions hold (especially homogeneity of variances).

Charles

Hi sir,

Why do I only need to consider one side of the difference?

I mean only

abs(cumul/count - F(x))   (1)

but not

abs(F(x) - (cumul-1)/n)   (2)

It makes more sense to me if D_n = max{(1), (2)}, since the step function is discontinuous at x.

Thanks!

Leung,

Sorry, but I don’t quite understand what the other side of the difference is. In any case, the KS test is the one described. Perhaps there are other possible tests along the lines that you are describing.

Charles
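For readers following this exchange, the two differences raised in the question can be computed side by side. The sketch below is an illustration only and is not taken from the article: it assumes distinct data values, uses a hypothetical sample, and the function name `ks_both_sides` is invented here. Many textbooks and libraries define the statistic as the supremum of |S_n(x) - F(x)| over all x, which for a step function is reached either at a jump or just before it, hence the maximum of the two quantities.

```python
import math

def ks_both_sides(sample, cdf):
    """Return (d_after, d_before, d_n) for distinct data values.

    d_after  compares F(x) with the step function at each jump:          k/n
    d_before compares F(x) with the step function just before each jump: (k-1)/n
    """
    xs = sorted(sample)
    n = len(xs)
    d_after = max(abs(k / n - cdf(x)) for k, x in enumerate(xs, start=1))
    d_before = max(abs(cdf(x) - (k - 1) / n) for k, x in enumerate(xs, start=1))
    return d_after, d_before, max(d_after, d_before)

# Hypothetical data; lambda is estimated as 1 / sample mean, as in the article
sample = [0.4, 0.9, 1.7, 2.8, 5.1]
lam = 1.0 / (sum(sample) / len(sample))
d_after, d_before, d_n = ks_both_sides(sample, lambda x: 1.0 - math.exp(-lam * x))
print(d_after, d_before, d_n)
```

When the largest gap happens to fall at a jump, the two versions agree; otherwise the one-sided calculation gives a smaller value of the statistic.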

If the question simply tells you to test whether 2 variables follow a normal distribution, should I use the One-Sample K-S Test or rather consider the p-value of the Kolmogorov-Smirnov Test from the Tests of Normality (which in SPSS is given with the Lilliefors Significance Correction)?

Thanks in advance!

Steve,

In general I would use the Shapiro-Wilk test. It is more accurate.

If you are testing for a normal distribution with a specified mean and standard deviation then you could use the one-sample KS test. If you don’t know the population mean and standard deviation (and will estimate these from the sample), then you should use the Lilliefors version of the test.

Charles

Hi, Charles

I’m confused about how to calculate the value of Sn(x).

Christian,

Sn(x) is explained on the following webpage

Charles

In your page ‘Kolmogorov-Smirnov Test for Normality’, the Sn(x)=k/n, if x(k) <= x < x(k+1). So what is the value of 'k'?

Hi Charles,

Regarding the p-value, what is the difference between your formulas KSPROB(D-statistic, Sample size) and KSDIST(D-statistic, Sample size)? On this page, the p-value is calculated using KSPROB. In the normality case, you used KSDIST. What is the difference? Thanks.

Jacky,

They both represent approximate values for the p-value. KSPROB(x,n) = the p-value estimated using the table of critical values. E.g., KSPROB(.24,30) = .05 because the critical value for alpha = .05 and n = 30 is .24. For values not in the table of critical values a harmonic interpolation is made: e.g. KSPROB(.23,30) = .0667; here .22 and .24 are in the table of critical values but .23 is not so a value between the two critical values is used.

The KSDIST(x,n) function uses a different approach, namely it calculates the p-value using an approximate Kolmogorov distribution function.

Neither value is perfect (nor are they always equal).

Charles
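The harmonic interpolation mentioned in the reply above can be illustrated with a short sketch. This is one reading of the term (interpolating linearly in 1/p) that reproduces the quoted value KSPROB(.23,30) = .0667; it is not the actual KSPROB source code, and the function name `harmonic_interp_p` is hypothetical. Pairing the critical value .22 with alpha = .10 is an assumption, since the reply only says that .22 is in the table.

```python
def harmonic_interp_p(d, d_lo, p_lo, d_hi, p_hi):
    """Interpolate a p-value between two tabulated critical values,
    linearly in 1/p (one reading of 'harmonic interpolation')."""
    t = (d - d_lo) / (d_hi - d_lo)          # position of d between the two table entries
    inv_p = (1.0 - t) / p_lo + t / p_hi     # interpolate the reciprocals of the p-values
    return 1.0 / inv_p

# n = 30: .24 is the critical value for alpha = .05 (from the reply);
# .22 for alpha = .10 is an assumption
p = harmonic_interp_p(0.23, 0.22, 0.10, 0.24, 0.05)
print(round(p, 4))
```

At a tabulated critical value the function returns the corresponding alpha unchanged, e.g. .05 at D = .24.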

Don’t we able to one sample test on k-s test? like gamma distribution?

Sorry, but I don’t understand your question.

Charles

Hi

My research has two variables:

1. average performance Cash F. (Before)

2. average performance Cash F. (After)

Can I use KS to find out whether there is a difference between Before and After?

You would typically use a paired t test or Wilcoxon signed ranks test for this sort of problem. A one sample KS test is typically used to see whether a sample fits a particular distribution.

Charles

Hi, sir

Since the null hypothesis for KS is that a set of data do not display a normal distribution, which means they are significantly different from each other.

If I just want to find out whether several figures, for instance, 1.1, 1.2, 1.4, 1.5, are significantly different from each other, is a one-sample KS test alone OK?

Stacey,

A one-sample KS test can be used to determine whether a sample (such as the one you have listed) is normally distributed, i.e. that the sample is not significantly different from a normal distribution (not that the numbers in the sample are significantly different from each other). If you have the mean and standard deviation of the normal distribution, then you can use the KS test directly. If instead you are estimating the mean and standard deviation from the sample data, then you should use the Lilliefors version of the KS test, as described on the webpage

Lilliefors Test for Normality.

Charles

Thank you so much, Charles.

Your reply is really helpful. I also wanted to ask that if I want to estimate the difference within these five numbers (instead of their normal distribution) to find whether the difference is at a significant level, what kind of statistical test is suitable?

Thanks again.

Sorry, I did not make it clear.

These five numbers are means of five groups. I wanted to compare these five means to find whether data from these groups are significantly different.

Stacey,

You can use ANOVA. The input will be raw data for each group, not the means.

Charles

Thanks a lot. I wanna say your suggestion is really helpful. It is so kind.

I’ve read your introduction to the ANOVA test. Pardon me for another question. The five groups of raw data meet neither the normality assumption nor the homogeneity of variance test (their p-values are all equal to zero). However, the sample sizes are equal, with each group containing 5000 samples. In this situation, is an ANOVA test OK?

Thanks a lot!

Thanks Charles.

Since the null hypothesis for KS is that they are not normally distributed, which means they are significantly different from each other. If I just want to compare several figures, for example, 1.31, 1.24, 1.56, 1.67, 1.45, to find out whether they are significantly different from each other, is a one-sample KS test alone OK?

Hi,

I am trying to figure out how to use the K-S Test to evaluate the plausible randomness (or lack thereof) of a binary Heads-Tails sequence with n=200. It seems this should be possible with a minor tweak to what you present in these pages. Could you point me in the right direction?

Thanks,

Robert

Robert,

As described on the referenced webpage, the KS test can be used to determine whether a sample fits a particular distribution. For the case you have identified this distribution is a uniform distribution with endpoints 0 and 1.

Charles

Before doing a one-way ANOVA test, should we check the normality of the population from which the data were collected using the one-sample KS test, or check the normality of the data itself using KS? In brief, should we do the one-sample KS or KS before we do a one-way ANOVA test?

Thanks for your reply. This question has bothered me for quite a long time.

The answer is yes. You should check normality before doing an ANOVA. However, note that ANOVA is pretty robust to violations of normality, provided the data is reasonably symmetric and the group samples are equal in size.

I provide a number of tests for normality on the website, and so I suggest you take a look at the webpage Testing for Normality and Symmetry. In particular, I would use either the Lilliefors test (which is related to the KS test) or the Shapiro-Wilk test for normality.

Charles

Thanks Charles

I spent a few hours getting the Real Statistics Resource Pack to work on the Dutch version of Excel 2007.

Hi Charles,

Nice article

I can’t find the function =KSCRIT(B21,0.05) in Excel.

I am using Excel 2007

Greetings

Afke

Afke,

KSCRIT is not a standard Excel function. You need to install the Real Statistics Resource Pack to use it.

Charles

Hi sir, why are there two tables for the critical value of D (K-S test)? In one table, α = 0.05 and n = 25 give D = 0.180, and in the other table, α = 0.05 and n = 25 give D = 0.264.

For the K-S test for normality, should I use this table:

http://www.real-statistics.com/statistics-tables/kolmogorov-smirnov-table/

or this table:

http://lya.fciencias.unam.mx/rfuentes/K-S_Lilliefors.pdf

thank you

excuse me for bad English

Hi Masoud,

The article that you reference explains that the critical values in the general KS table are too high when the test is restricted to just the normal distribution. In fact, for low values of n the values that the authors calculated specifically for the normal distribution are about 2/3 of the general table values, which is consistent with .180 and .264. The table of critical values given on the Real Statistics website is for the general KS test.

This article seems to imply that if you want to use KS you should use critical values that are specifically calculated for the distribution you want to test (normal, uniform, exponential, etc.). In the case of the normal distribution I generally use the Shapiro-Wilk test which gives better results, and so I avoid this issue.

Charles

Hi Charles,

I’m a little confused about the KS table here. You chose α = .05 in this case; does it mean that there’s a 95% chance that the distribution is not different from the expected distribution (the exponential distribution in this case)? And why does Dn,α get smaller as α increases? For example, in my case, D = 0.123 and n = 150. If I choose α = 0.05, Dn,α = 0.111 and I have to reject the null hypothesis, but if I choose α = 0.01, Dn,α = 0.133 and I can say my distribution is the same as expected. So what does α actually mean here and how should I choose it?

Thanks a lot!!

Chen

Chen,

The null hypothesis is that the two distributions are equal. The value of alpha is as described in Hypothesis Testing. Generally alpha is chosen to be .05, but you may choose a different value, based on how much error you can tolerate.

Charles
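The relationship between α and the critical value raised in the question above can be illustrated with the standard large-sample approximation D_crit ≈ sqrt(-ln(α/2)/2) / sqrt(n). This is a sketch under that assumption, not necessarily the method behind the Kolmogorov-Smirnov Table or KSCRIT, and the function name `ks_crit_approx` is hypothetical. The values n = 150 and D = 0.123 are the ones quoted in the question.

```python
import math

def ks_crit_approx(n, alpha):
    """Large-sample approximation to the two-sided KS critical value."""
    return math.sqrt(-math.log(alpha / 2.0) / 2.0) / math.sqrt(n)

n, d = 150, 0.123                      # values quoted in the question
for alpha in (0.10, 0.05, 0.01):
    crit = ks_crit_approx(n, alpha)
    # A smaller alpha demands stronger evidence, so the critical value grows
    print(f"alpha = {alpha:.2f}  D_crit = {crit:.3f}  reject H0: {d > crit}")
```

For n = 150 this gives approximately 0.111 at α = .05 and 0.133 at α = .01, in line with the figures in the question. Here α is the probability of rejecting the null hypothesis when it is actually true, not the probability that the data follows the expected distribution, so a stricter (smaller) α makes rejection harder and the critical value larger.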

Hi

I have taken 22 different samples (software packages) of 2 different variables; the first one contains 4 independent variables and the second one contains 7 independent variables. In this situation, can we apply the KS test, or which test can be applied?

Hi,

You need to specify what you are trying to test, before I can tell you which test to use.

If you are trying to compare two samples with different variables, then I would have to respond that this is like comparing apples with oranges.

Charles