One Sample Kolmogorov-Smirnov Test

The one sample Kolmogorov-Smirnov test is used to test whether a sample comes from a specific distribution. We can use this procedure to determine whether a sample comes from a population which is normally distributed (see Kolmogorov-Smirnov Test for Normality).

We now show how to modify the procedure to test whether a sample comes from an exponential distribution. Tests for other distributions are similar.

Example 1: Determine whether the sample data in range B4:B18 of Figure 1 differs significantly from an exponential distribution.

Figure 1 – Kolmogorov-Smirnov test for exponential distribution

The result is shown in Figure 1. This figure is very similar to Figure 3 of Kolmogorov-Smirnov Test for Normality. If the null hypothesis holds, i.e. the data follows an exponential distribution, then column F contains the cumulative distribution values F(x) for every x in column B.

We use the Excel function EXPONDIST to calculate the exponential distribution values F(x) in column F. E.g. the formula in cell F4 is =EXPONDIST(B4,$B$20,TRUE). Here B4 contains the x value (0.7 in this case) and B20 contains the value of lambda (λ) in the definition of the exponential distribution (Definition 1 of Exponential Distribution). As we can see from Figure 1 of Exponential Distribution, λ is simply the reciprocal of the population mean. As usual, we use the sample mean as an estimate of the population mean, and so the value in B20, which contains the formula =1/B19 where B19 contains the sample mean, is used as an estimate of λ.
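As a rough cross-check outside Excel, the calculation in column F can be sketched in Python. This is only an illustration: the function name expon_cdf and the sample values are hypothetical (they are not the data in Figure 1). The only numbers taken from this page are x = 0.7 and the estimate λ = 0.247934.

```python
import math

def expon_cdf(x, lam):
    """Cumulative exponential distribution F(x) = 1 - exp(-lam * x).

    Plays the role of =EXPONDIST(x, lam, TRUE) in the worksheet.
    """
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# Hypothetical sample, NOT the data in range B4:B18 of Figure 1.
sample = [0.7, 1.2, 2.5, 3.1, 6.0]

mean = sum(sample) / len(sample)   # counterpart of cell B19
lam = 1.0 / mean                   # counterpart of cell B20 (=1/B19)

# F(x) for every x, in increasing order of x (counterpart of column F).
cdf_values = [expon_cdf(x, lam) for x in sorted(sample)]

# F(0.7) using the lambda estimate reported on this page.
print(expon_cdf(0.7, 0.247934))
```

The only design choice here is to estimate λ as the reciprocal of the sample mean, exactly as the worksheet does.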

All the other formulas are the same as described in Kolmogorov-Smirnov Test for Normality where the Kolmogorov-Smirnov test is used to test that data follows a normal distribution.

We see that the calculated value of the test statistic D is .286423 (cell G20, which contains the formula =MAX(G4:G18)), which is less than the critical value of 0.338 (cell G21, which contains the formula =KSCRIT(B21,0.05), i.e. the value for n = 15 and α = .05 in the Kolmogorov-Smirnov Table). Since D < Dcrit, we cannot reject the null hypothesis, and so we conclude that there is no significant difference between the data and data coming from an exponential distribution (with λ = 0.247934).
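The steps leading to D can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Real Statistics implementation: the function name ks_statistic_exponential and the 15 sample values are hypothetical, and the only number taken from this page is the critical value 0.338 for n = 15 and α = .05. The first statistic follows the form abs(cumul/n - F(x)) quoted in the comments below. The second also checks the gap just before each jump of the step function, which is how the statistic is commonly defined in textbooks.

```python
import math
from collections import Counter

def ks_statistic_exponential(sample):
    """Return (lam, d_at, d_both) for a fit to an exponential distribution.

    lam    : 1 / sample mean, the estimate of lambda
    d_at   : max |cumul/n - F(x)|, the form quoted from the worksheet
    d_both : also includes |F(x) - (count of values below x)/n|
    """
    n = len(sample)
    lam = n / sum(sample)
    counts = Counter(sample)          # frequency table, so ties are handled
    cumul = 0
    d_at = 0.0
    d_before = 0.0
    for x in sorted(counts):
        below = cumul                 # number of values strictly less than x
        cumul += counts[x]            # number of values less than or equal to x
        f = 1.0 - math.exp(-lam * x)  # exponential CDF at x
        d_at = max(d_at, abs(cumul / n - f))
        d_before = max(d_before, abs(f - below / n))
    return lam, d_at, max(d_at, d_before)

# Hypothetical sample of n = 15 values, NOT the data in Figure 1.
sample = [0.3, 0.5, 1.1, 1.4, 2.0, 2.2, 2.9, 3.5,
          4.1, 4.1, 5.6, 6.8, 8.0, 9.7, 12.3]

D_CRIT = 0.338  # critical value quoted on this page for n = 15, alpha = .05

lam, d_at, d_both = ks_statistic_exponential(sample)
print("lambda estimate:", lam)
print("D (worksheet form):", d_at, "reject H0:", d_at > D_CRIT)
print("D (both sides):", d_both, "reject H0:", d_both > D_CRIT)
```

Since d_both is never smaller than d_at, using it can only make the test more likely to reject.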

We can compute an approximate p-value using the formula

KSPROB(G20,B21) = .141851
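For comparison, a p-value can also be approximated from the large-sample Kolmogorov distribution. The sketch below is not KSPROB, which interpolates in the table of critical values, and we should not expect it to reproduce .141851. The function name kolmogorov_pvalue and the number of series terms are assumptions made for illustration. The inputs D = .286423 and n = 15 are the values from this page.

```python
import math

def kolmogorov_pvalue(d, n, terms=100):
    """Approximate P(D_n > d) using the asymptotic Kolmogorov series

        p ~ 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * n * d^2)

    This is a large-sample approximation; it is rough for small n
    and for very small values of d.
    """
    t = math.sqrt(n) * d
    if t <= 0:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
    # Clamp to [0, 1] to guard against truncation error in the series.
    return min(1.0, max(0.0, 2.0 * total))

# Test statistic and sample size reported on this page.
p = kolmogorov_pvalue(0.286423, 15)
print(p)
```

Like the KSPROB value, the result is above .05, so it is consistent with the conclusion drawn from the critical value.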

39 Responses to One Sample Kolmogorov-Smirnov Test

  1. wildan says:

    hi Charles…
    I would like to know the reference that you used to determine the Kolmogorov-Smirnov critical value for the .05 level of significance.

    • Charles says:

      Wildan,
      Are you asking for the reference to Kolmogorov-Smirnov table of critical values? If so, if you google you should find numerous references to the table of critical values.
      Charles

  2. amir says:

    I am really not sure how to calculate S(x).

  3. Tian says:

    Hi Charles,

    I am wondering: if a variable passes the KS test for normality, can I then use ANOVA?

    Many thanks.
    Tian

    • Charles says:

      Tian,
      In general I would recommend the Shapiro-Wilk test for normality rather than the KS test. If you do use the KS test then make sure that you use the Lilliefors version of the test if the mean and standard deviation are estimated from the sample.
      If the test for normality holds then you can use Anova provided that the other assumptions hold (especially homogeneity of variances).
      Charles

  4. Leung says:

    Hi sir,
    Why do I only need to consider one side of the difference?
    I mean only
    abs(cumul/count - F(x))    (1)
    but not
    abs(F(x) - (cumul-1)/n)    (2)
    It makes more sense to me if D_n = max{(1), (2)}, since the step function is discontinuous at x.

    thx!

    • Charles says:

      Leung,
      Sorry, but I don’t quite understand what the other side of the difference is. In any case, the KS test is the one described. Perhaps there are other possible tests along the lines that you are describing.
      Charles

  5. Steve says:

    If the question simply tells you to test whether 2 variables follow a normal distribution, should I use the One-Sample K-S Test or rather consider the p-value of the Kolmogorov-Smirnov Test from the Tests of Normality (which in SPSS is given with the Lilliefors Significance Correction)?

    Thanks in advance!

    • Charles says:

      Steve,
      In general I would use the Shapiro-Wilk test. It is more accurate.
      If you are testing for a normal distribution with a specified mean and standard deviation then you could use the one-sample KS test. If you don’t know the population mean and standard deviation (and will estimate these from the sample), then you should use the Lilliefors version of the test.
      Charles

  6. Christian says:

    Hi, Charles

    I’m confused about how to calculate the value of Sn(x).

  7. Jacky says:

    Hi Charles,
    Regarding the p-value, what is the difference between your formulas KSPROB(D-statistic, Sample size) and KSDIST(D-statistic, Sample size)? On this page, the p-value is calculated using KSPROB. In the normality case, you used KSDIST. What is the difference? Thanks.

    • Charles says:

      Jacky,

      They both represent approximate values for the p-value. KSPROB(x,n) = the p-value estimated using the table of critical values. E.g., KSPROB(.24,30) = .05 because the critical value for alpha = .05 and n = 30 is .24. For values not in the table of critical values a harmonic interpolation is made: e.g. KSPROB(.23,30) = .0667; here .22 and .24 are in the table of critical values but .23 is not so a value between the two critical values is used.

      The KSDIST(x,n) function uses a different approach, namely it calculates the p-value using an approximate Kolmogorov distribution function.

      Neither value is perfect (nor are they always equal).

      Charles

  8. Mahabub Rahaman says:

    Can’t we use the one-sample K-S test for other distributions, like the gamma distribution?

  9. mammo says:

    Hi
    My research has two variables:
    (1) average performance Cash F. (Before)
    (2) average performance Cash F. (After)
    Can I use the KS test to find out whether there is a difference between before and after?

    • Charles says:

      You would typically use a paired t test or Wilcoxon signed ranks test for this sort of problem. A one sample KS test is typically used to see whether a sample fits a particular distribution.
      Charles

  10. Stacey says:

    Hi, sir

    Since the null hypothesis for KS is that a set of data do not display a normal distribution, which means they are significantly different from each other.
    If I just want to find out whether several figures, for instance, 1.1, 1.2, 1.4, 1.5, are significantly different from each other, is a one-sample KS test enough?

    • Charles says:

      Stacey,
      A one-sample KS test can be used to determine whether a sample (such as the one you have listed) is normally distributed, i.e. that the sample is not significantly different from a normal distribution (not that the numbers in the sample are significantly different from each other). If you have the mean and standard deviation of the normal distribution, then you can use the KS test directly. If instead you are estimating the mean and standard deviation from the sample data, then you should use the Lilliefors version of the KS test, as described on the webpage
      Lilliefors Test for Normality.
      Charles

      • Stacey says:

        Thank you so much, Charles.
        Your reply is really helpful. I also wanted to ask: if I want to assess the differences among these five numbers (rather than whether they are normally distributed) to find out whether the differences are significant, what kind of statistical test is suitable?
        Thanks again.

        • Stacey says:

          Sorry, I did not make it clear.
          These five numbers are means of five groups. I wanted to compare these five means to find whether data from these groups are significantly different.

      • Stacey says:

        Thanks a lot. I wanna say your suggestion is really helpful. It is so kind.
        I’ve read your introduction to the ANOVA test. Pardon me for another question. Five groups of raw data meet neither the normality assumption nor the homogeneity of variance assumption (their p-values all equal zero). However, the sample sizes are equal, with each group containing 5000 samples. In this situation, is an ANOVA test OK?
        Thanks a lot!

  11. Stacey says:

    Thanks Charles.
    Since the null hypothesis for KS is that they are not normally distributed, which means they are significantly different from each other. If I just want to compare several figures, for example, 1.31, 1.24, 1.56, 1.67, 1.45, to find out whether they are significantly different from each other, is a one-sample KS test enough?

  12. Robert says:

    Hi,

    I am trying to figure out how to use the K-S Test to evaluate the plausible randomness (or lack thereof) of a binary Heads-Tails sequence with n=200. It seems this should be possible with a minor tweak to what you present in these pages. Could you point me in the right direction?

    Thanks,

    Robert

    • Charles says:

      Robert,
      As described on the referenced webpage, the KS test can be used to determine whether a sample fits a particular distribution. For the case you have identified this distribution is a uniform distribution with endpoints 0 and 1.
      Charles

  13. Zhenlei says:

    Before doing a one-way ANOVA test, should we check the normality of the population from which the data were collected by using the one-sample KS test, or check the normality of the data itself by using the KS test? In brief, should we do a one-sample KS test or a KS test before we do a one-way ANOVA test?
    Thanks for your reply. This question has bothered me for quite a long time.

    • Charles says:

      The answer is yes. You should check normality before doing an ANOVA. However, note that ANOVA is pretty robust to violations of normality, provided the data is reasonably symmetric and the group samples are equal in size.

      I provide a number of tests for normality on the website, and so I suggest you take a look at the webpage Testing for Normality and Symmetry. In particular, I would use either the Lilliefors test (which is related to the KS test) or the Shapiro-Wilk test for normality.

      Charles

  14. Afke says:

    Thanks Charles

    I spent a few hours getting the Real Statistics Resource Pack to work on the Dutch version of Excel 2007.

  15. Afke says:

    Hi Charles,

    Nice article

    I can’t find the function =KSCRIT(B21,0.05) in Excel.
    I am using Excel 2007

    Greetings

    Afke

    • Charles says:

      Afke,
      KSCRIT is not a standard Excel function. You need to install the Real Statistics Resource Pack to use it.
      Charles

  16. masoud azari says:

    Hi sir, why are there two tables of critical values for D (K-S test)? In one table, for α = 0.05 and n = 25, D = 0.180, while in the other table, for α = 0.05 and n = 25, D = 0.264.
    For the K-S test for normality, should I use this table:

    http://www.real-statistics.com/statistics-tables/kolmogorov-smirnov-table/

    or this table:

    http://lya.fciencias.unam.mx/rfuentes/K-S_Lilliefors.pdf

    thank you
    excuse me for bad English

    • Charles says:

      Hi Masoud,

      The article that you reference explains that the critical values in the general KS table are too high when the test is restricted to just the normal distribution. In fact, for low values of n, the values that the authors calculated specifically for the normal distribution are about 2/3 of the general table values, which is consistent with .180 and .264. The table of critical values on the Real Statistics website is for the general KS test.

      This article seems to imply that if you want to use KS you should use critical values that are specifically calculated for the distribution you want to test (normal, uniform, exponential, etc.). In the case of the normal distribution I generally use the Shapiro-Wilk test which gives better results, and so I avoid this issue.

      Charles

  17. Chen says:

    Hi Charles,

    I’m a little confused about the KS table here. You chose α = .05 in this case. Does it mean that there’s a 95% chance that the distribution is not different from the expected distribution (the exponential distribution in this case)? And why does Dn,α get smaller as α increases? For example, in my case, D = 0.123 and n = 150. If I choose α = 0.05, Dn,α = 0.111 and I have to reject the null hypothesis, but if I choose α = 0.01, Dn,α = 0.133 and I can say my distribution is the same as expected. So what does α actually mean here and how should I choose it?

    Thanks a lot!!

    Chen

    • Charles says:

      Chen,
      The null hypothesis is that the two distributions are equal. The value of alpha is as described in Hypothesis Testing. Generally alpha is chosen to be .05, but you may choose a different value, based on how much error you can tolerate.
      Charles

      • Bhawana Mathur says:

        Hi
        I have taken 22 different (software) samples of 2 different variables; the first one contains 4 independent variables and the second one contains 7 independent variables. In this situation can we apply the KS test, or which test can be applied?

        • Charles says:

          Hi,
          You need to specify what you are trying to test, before I can tell you which test to use.
          If you are trying to compare two samples with different variables, then I would have to respond that this is like comparing apples with oranges.
          Charles
