Kolmogorov-Smirnov Test for Normality

Hypothesis Testing

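Definition 1: Let x1,…,xn be an ordered sample with x1 ≤ … ≤ xn and define Sn(x) as follows:

$$S_n(x) = \begin{cases} 0 & \text{if } x < x_1 \\ k/n & \text{if } x_k \le x < x_{k+1} \\ 1 & \text{if } x \ge x_n \end{cases}$$
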
Now suppose that the sample comes from a population with cumulative distribution function F(x) and define Dn as follows:

$$D_n = \max_x \left| F(x) - S_n(x) \right|$$

Observation: It can be shown that the distribution of Dn doesn’t depend on F (provided F is continuous). Since Sn(x) depends on the sample chosen, Dn is a random variable. Our objective is to use Dn as a way of estimating F(x).
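
As a rough illustration of Definition 1, the following Python sketch computes Sn(x) and Dn for a raw sample. The names `ecdf` and `ks_statistic` are made up for this example and are not part of Excel or the Real Statistics Resource Pack. The sketch checks both sides of each jump of Sn(x), which gives the maximum over all x; the spreadsheet examples on this page only compare F and Sn at the tabulated x values.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the function Sn(x): the fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def ks_statistic(sample, cdf):
    """Dn = max over x of |F(x) - Sn(x)| for a hypothesized cdf F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Sn jumps from (i-1)/n to i/n at x, so check the gap on both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Usage: three values tested against the uniform distribution on [0, 1].
s_n = ecdf([0.1, 0.4, 0.7])
d_n = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
```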

The distribution of Dn can be calculated (see Kolmogorov Distribution), but for our purposes the important aspect of this distribution is its critical values. These can be found in the Kolmogorov-Smirnov Table.

If Dn,α is the critical value from the table, then P(Dn ≤ Dn,α) = 1 – α. Dn can be used to test the hypothesis that a random sample came from a population with a specific distribution function F(x). If

$$\max_x \left| F(x) - S_n(x) \right| \le D_{n,\alpha}$$

then the sample data is a good fit with F(x).

Also from the definition of Dn given above, it follows that

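$$1 - \alpha = P(D_n \le D_{n,\alpha}) = P\left(\max_x \left| F(x) - S_n(x) \right| \le D_{n,\alpha}\right) = P\left(S_n(x) - D_{n,\alpha} \le F(x) \le S_n(x) + D_{n,\alpha} \text{ for all } x\right)$$

Thus Sn(x) ± Dn,α provides a confidence interval for F(x).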
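
A minimal sketch of this band in Python, assuming the critical value is supplied by the caller (e.g. read from the Kolmogorov-Smirnov Table). The name `confidence_band` is hypothetical, and clipping the band to [0, 1] is an addition made here because F(x) is a probability.

```python
def confidence_band(sample, d_crit):
    """Return (x, lower, upper) at each distinct sample point, where
    lower/upper = Sn(x) -/+ d_crit, clipped to the range [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    band = []
    for i, x in enumerate(xs, start=1):
        if i < n and xs[i] == x:
            continue  # tied value: wait for its last occurrence so Sn(x) counts all ties
        s_n = i / n
        band.append((x, max(0.0, s_n - d_crit), min(1.0, s_n + d_crit)))
    return band

# Usage with an arbitrary critical value of 0.3 (illustrative only).
band = confidence_band([3, 1, 2, 4], 0.3)
```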

Example 1: Determine whether the data represented in the following frequency table is normally distributed.

Figure 1 – Frequency table for Example 1

This means that 8 elements have an x value less than 100, 25 elements have an x value between 101 and 200, etc. We need to find the mean and standard deviation of this data. Since this is a frequency table, we can’t simply use Excel’s AVERAGE and STDEV functions. Instead we first use the midpoints of each interval and then use an approach similar to that described in Frequency Tables as follows:

Figure 2 – Calculating mean and standard deviation for data in frequency table
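
The midpoint calculation can be sketched in Python as follows. This is only an illustration of the approach: the function name `grouped_mean_sd` is hypothetical, and the midpoints and frequencies in the usage line are illustrative values, not those of Figure 1.

```python
from math import sqrt

def grouped_mean_sd(midpoints, freqs):
    """Mean and sample standard deviation of grouped data, treating every
    observation in an interval as if it sat at the interval midpoint."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    mean_sq = sum(f * m * m for m, f in zip(midpoints, freqs)) / n
    var_pop = mean_sq - mean ** 2        # population variance
    var_sample = var_pop * n / (n - 1)   # sample variance (n - 1 in the denominator)
    return mean, sqrt(var_sample)

# Usage: three intervals with midpoints 15, 25, 35 (illustrative data).
mean, sd = grouped_mean_sd([15, 25, 35], [24, 32, 33])
```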

Thus, the mean is 481.4 and the standard deviation is 155.2. We can now build the table that allows us to carry out the KS test, namely:

Figure 3 – Kolmogorov-Smirnov test for Example 1

Columns A and B contain the data from the original frequency table. Column C contains the corresponding cumulative frequency values and column D simply divides these values by the sample size (n = 1000) to yield the cumulative distribution function Sn(x).

Column E uses the mean and standard deviation calculated previously to standardize the values of x from column A. E.g. the formula in cell E4 is =STANDARDIZE(A4,N$5,N$10), where cell N5 contains the mean and cell N10 contains the standard deviation. Column F uses these standardized values to calculate the cumulative distribution function values assuming that the original data is normally distributed. E.g. cell F4 contains the formula =NORMSDIST(E4). Finally column G contains the differences between the values in columns D and F. E.g. cell G4 contains the formula =ABS(F4-D4). If the original data is normally distributed these differences will be close to zero.
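
The column-by-column construction can be mirrored outside Excel. The sketch below is one possible Python translation, with `ks_table` a hypothetical name; it uses `statistics.NormalDist` in place of STANDARDIZE and NORMSDIST, and the data in the usage lines is illustrative rather than taken from Figure 3.

```python
from statistics import NormalDist

def ks_table(x_values, freqs, mean, sd):
    """Build rows (x, freq, cum, Sn, z, F, |F - Sn|) and return them with Dn."""
    normal = NormalDist(mean, sd)
    n = sum(freqs)
    rows, cum = [], 0
    for x, f in zip(x_values, freqs):
        cum += f                  # column C: cumulative frequency
        s_n = cum / n             # column D: Sn(x)
        z = (x - mean) / sd       # column E: standardized x
        f_x = normal.cdf(x)       # column F: normal cdf at x
        rows.append((x, f, cum, s_n, z, f_x, abs(f_x - s_n)))  # last entry: column G
    d_n = max(row[-1] for row in rows)
    return rows, d_n

# Usage: right endpoints of three intervals, with a mean and standard
# deviation computed beforehand from the interval midpoints.
rows, d_n = ks_table([20, 30, 40], [24, 32, 33], 26.011, 7.983)
```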

Now Dn = the largest value in column G, which in our case is 0.0117. If the data is normally distributed then, with probability 1 – α, the critical value Dn,α will be at least as large as Dn. From the Kolmogorov-Smirnov Table we see that

Dn,α = D1000,.05 = 1.36 / SQRT(1000) = 0.043007

Since Dn = 0.0117 < 0.043007 = Dn,α, we conclude that the data is a good fit with the normal distribution.
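
The arithmetic of this decision step is simple enough to check in a few lines of Python. The function name is hypothetical, and the coefficient 1.36 is the one used above for α = .05 with a large sample; other values of α need a different coefficient from the table.

```python
from math import sqrt

def ks_crit_large_n(n, coeff=1.36):
    """Large-sample critical value coeff / sqrt(n); 1.36 corresponds to alpha = .05."""
    return coeff / sqrt(n)

d_n = 0.0117                      # largest difference from column G
d_crit = ks_crit_large_n(1000)    # about 0.043007
reject_normality = d_n > d_crit   # False here: no evidence against normality
```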

Example 2: Using the KS test, determine whether the data in Example 1 of Graphical Tests for Normality and Symmetry is normally distributed.

We follow the same procedure as in the previous example to obtain the following results. Since the frequencies are all 1, this example should be a bit easier to understand.

Figure 4 – KS test for data from Example 2

The Kolmogorov-Smirnov Table shows that the critical value Dn,α = D15,.05 = .338.

Since Dn = 0.1874988 < 0.338 = Dn,α, we conclude that the data is a reasonably good fit with the normal distribution (more precisely that there is no significant difference between the data and data which is normally distributed). Note that this is not the same conclusion we reached from looking at the histogram and QQ plot.

Real Statistics Excel Function: The following function is provided in the Real Statistics Resource Pack:

KSCRIT(n, α, tails, h) = the critical value of the Kolmogorov-Smirnov test for a sample of size n, for the given value of alpha (default = .05) and tails = 1 (one tail) or 2 (two tails, default), based on the KS Table. If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.

KSPROB(x, n, tails, iter) = an approximate p-value for the KS test for a Dn value equal to x for a sample of size n and tails = 1 (one tail) or 2 (two tails, default) based on a linear interpolation of the values in the Kolmogorov-Smirnov Table, using iter number of iterations (default = 40).

Note that the values for α in the Kolmogorov-Smirnov Table range from .01 to .2 (for tails = 2) and .005 to .1 for tails = 1. If the p-value is less than .01 (tails = 2) or .005 (tails = 1) then the p-value is given as 0 and if the p-value is greater than .2 (tails = 2) or .1 (tails = 1) then the p-value is given as 1.

For Example 2, KSCRIT(15, .05, 2) = .338 (the same as given in cell H21 of Figure 4). Also note that the p-value = KSPROB(H20, B21) = KSPROB(0.1874988, 15) = 1 (meaning that p-value > .2), and so once again we can’t reject the null hypothesis that the data is normally distributed.

If the value of Dn had been .35 in Example 2, then Dn = .35 > .338 = Dcrit, and so we would have rejected the null hypothesis that the data is normally distributed. In this case we would have seen that p-value = KSPROB(.35,15) = .0427, which once again leads us to reject the null hypothesis.
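
For readers working outside Excel, a different and cruder route to a two-tailed p-value is the limiting Kolmogorov distribution. The sketch below is not how KSPROB works (KSPROB interpolates in the table), so its values can differ from KSPROB, especially for small n. The function name and the cutoff of 0.2 for small arguments are choices made for this example.

```python
from math import exp, sqrt

def ks_pvalue_asymptotic(d_n, n, terms=100):
    """Two-tailed p-value from the limiting Kolmogorov distribution:
    p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2), with lam = sqrt(n) * d_n."""
    lam = sqrt(n) * d_n
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the true value is 1 to many decimals
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))

p = ks_pvalue_asymptotic(0.416, 20)   # roughly .002
```

Because this is a large-sample approximation, it should be treated as a rough guide for samples as small as the one in Example 2.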

Kolmogorov Distribution

As referenced above, the Kolmogorov distribution can be useful in conducting the Kolmogorov-Smirnov test. See Kolmogorov Distribution for more information, including some useful functions provided by the Real Statistics Resource Pack.

Lilliefors Test

When the population mean and standard deviation for the Kolmogorov-Smirnov Test are estimated from the sample mean and standard deviation, as was done in Examples 1 and 2, then the Kolmogorov-Smirnov Table yields results that are too conservative. More accurate results can be derived from the Lilliefors Table as described in the Lilliefors Test for Normality.
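
One way to see the effect of estimating the parameters is a small Monte Carlo experiment: draw many normal samples, estimate the mean and standard deviation from each sample, compute Dn, and take the (1 – α) quantile of the simulated values. The sketch below does this with hypothetical function names and an arbitrary seed and number of trials; it is an illustration only and is not the procedure used to build the Lilliefors Table.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_stat_estimated_normal(sample):
    """Dn against a normal cdf whose mean and sd are estimated from the sample."""
    normal = NormalDist(mean(sample), stdev(sample))
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)  # check both sides of each jump
    return d

def simulated_critical_value(n, alpha=0.05, trials=2000, seed=0):
    """Approximate (1 - alpha) quantile of Dn under normality, by simulation."""
    rng = random.Random(seed)
    stats = sorted(
        ks_stat_estimated_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    index = min(trials - 1, int((1 - alpha) * trials))
    return stats[index]

crit_15 = simulated_critical_value(15)
```

For n = 15 the simulated value typically comes out well below the .338 taken from the Kolmogorov-Smirnov Table, which is the sense in which that table is too conservative here.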

75 Responses to Kolmogorov-Smirnov Test for Normality

  1. Neeraj says:

    Dear Sir,
    I am writing a Java program to perform the Kolmogorov-Smirnov Test.
    So I want to know the mathematical formulae to calculate the Z-score and F(x).
    And what is meant by estimating the population mean and standard deviation from the sample mean and standard deviation? Does it mean taking the population mean and standard deviation to be the same as the sample mean and standard deviation?

    Thank You

  2. César says:

    Dear Sir,

    Is it possible to determine the Dn parameter from data using your Real Statistics add-in? I was checking your examples and the add-in, but I can’t find that formula.

    • Charles says:

      Sorry, but I have not yet implemented this in the Real Statistics add-in. This value depends on the specific distribution that you are trying to fit and since there are so many distributions, I haven’t tried to create a single formula for this. I will look into doing this shortly.

  3. BASANTES S. Ruben says:

    Dear Charles,

    Thanks for your application. Real Statistics is very useful.
    However, I have some questions about the Kolmogorov-Smirnov test.
    I have a huge sample of 3160 observations. My sample describes a normal distribution; I would say it is an ideal sample.
    When I conduct the K-S test on this data, the null hypothesis is rejected.
    Why does this happen? I expected Ho to be accepted!
    Thanks for your response; I am new to this test.

    Best regards,

    • Charles says:

      If you send me an Excel file with your data and the KS test that you have conducted, I will try to figure out what is going on. You can send the file to the email listed on the Contact Us webpage.
      Note: You should use the Lilliefors version of the KS test for normality when you are estimating the population mean and variance from the sample mean and variance.

  4. Shaks says:

    Hi Charles:

    If I want to use the KS test to test for the Benford distribution, would that be a two-sample test? So I would use what you have at this url?



  5. Daniel Vincent says:

    Charles, can you help me with what kind of normality test & hypothesis test that I should use with my survey?

    I created a survey about student’s perception on business ethics, based on gender (male / female), ethics education (formal / informal), & age of maturity (junior / senior). n = 160 with unequal distribution: 44 male – 116 female, 77 informal – 83 formal, 28 junior – 132 senior.

    Thank you in advance.

    • Charles says:

      I suggest that you use the Shapiro-Wilk test for normality.
      Regarding the hypothesis test, first you need to determine what hypothesis/hypotheses you want to test.

  6. Richard says:

    Hi Charles,

    How should one choose n if the sample size is really small, say ~10?


  7. Santosh says:

    Hi Charles,
    Do you have an example of applying the K-S test to check for conformance with Benford’s distribution? Do you think the K-S method is an appropriate way to check for deviation from Benford’s law?

    • Charles says:

      Yes, I can see how you could use the KS test to check for conformance with Benford’s law. I have not provided an example of this.

  8. aditya garg says:

    Would you recommend a book in which all these tests can be found?

  9. Ijeoma says:

    Dear Charles
    Is it possible to get a conflicting result when the explore command and the one sample K-S are used to check the normality of the same data?

    • Charles says:

      I don’t know what “explore command” is.
      It is not surprising that two different tests for normality will give different p-values.

  10. Rob says:

    Great site, I learn a lot from it.

    Can I please ask how you calculated column K (midpt-sq)?

    Thank you.

    • Charles says:

      Cell K4 contains the formula =I4^2, and similarly for the other cells in column K.

      • Rob says:

        Thank you very much Charles.
        I don’t know if I have understood the numbers in column A (x values) correctly.
        Let me explain on this example.

        I have scale of loneliness and results can be
        10-20 – low loneliness
        20-30 – average loneliness
        30-40 – high loneliness
        so I calculate:

        data   midp  freq  midp^2
        10-20  15    24    225
        20-30  25    32    625
        30-40  35    33    1225

        n: 89
        M: 26,011
        Msq: 676,584
        Sq-sum/n: 739,606
        Varp: 63,022
        Var: 63,738
        Stdev: 7,983

        x   freq  Cum  Snx    Z       F(x)   D
        15  24    24   0,270  -1,379  0,084  0,186
        25  32    56   0,629  -0,127  0,450  0,180
        35  33    89   1      1,126   0,870  0,130

        Dmax 0,185
        Dkrit 0,144

        0,185>0,144 so data is not normally distributed.

        Is this calculated correctly or not? I am not sure about choosing the data for column A.

        • Charles says:


          If you assume that the data in each interval is concentrated at the midpoint then the calculation is correct. I have typically assumed this for the calculation of the mean, but have used the right end-point of the intervals for the KS calculation. I can see advantages with both approaches.

          I suggest that instead of using the KS table to calculate the critical value you use the Lilliefors Table instead. It is more accurate for determining whether data is normal when you use the sample mean and standard deviation. See the following webpage:

          Lilliefors Test


  11. Issam says:

    Dear Sir,

    Thank you very much, I’m learning a lot from your website.

    Unfortunately, my data set does not fit a normal distribution.

    I have very large data and I read in this paper (Open Access): Langlois, T. J., Fitzpatrick, B. R., Fairclough, D. V., Wakefield, C. B., Hesp, S. A., McLean, D. L., … Meeuwig, J. J. (2012). Similarities between Line Fishing and Baited Stereo-Video Estimations of Length-Frequency: Novel Application of Kernel Density Estimates. PLoS ONE, 7(11), 1–9. doi:10.1371/journal.pone.0045973

    “We used Monte Carlo simulations to overcome uncertainty regarding the asymptotic distributions of KS test statistics under the null hypothesis”.

    How can I do the simulation in Excel so my data can fit a normal distribution, so that I can run the KS test on my data?

    Thank you very much

  12. Edgar says:

    Dear Charles I appreciate your contributions.

    Please consider the following, in your second example you state the following:

    “Since Dn = 0.1874988 < 0.338 = Dn,α, we conclude that the data is a reasonably good fit with the normal distribution (more precisely that there is no significant difference between the data and data which is normally distributed). Note that is not the same conclusion we reached from looking at the histogram and QQ plot"

    So the same remains for Dn = 0.1874988 […] “If the value of Dn had been .35 in Example 2, then Dn = .35 > .338 = Dcrit, and so we would have rejected the null hypothesis that the data is normally distributed. In this case we would have seen that p-value = KSPROB(.35,15) = .0427, which once again leads us to reject the null hypothesis”

    But if the α=0.01 then the critical value is 0.404 and Dn = 0.35 < 0.404 = Dn,α,

    Then, should we conclude that data is normally distributed ???

    I´ll appreciate your comments,

    Kind regards


    • Charles says:

      Changing the value of alpha from .05 to .01, changes the value for Dcrit, but doesn’t change the value of Dn. I don’t see where the Dn = 0.35 comes from?
      The null hypothesis that the data comes from a normal population cannot be rejected if Dn < Dcrit. Charles

  13. Brody says:

    Charles, thanks, but I too have a question.

    May I perform the KS test on two samples with different counts or n values?

    For example, there are 7 possible categories, and there are 3 individual samples that will distribute within those 7 categories (dealing with sediment, sieves, and weights). I need to compare this set of samples to another set of samples, however, the number of samples here is 7. So, 2 sets of samples. The first, has 3 samples, and the second has 7. There are 7 sieve sizes or categories into which the samples are distributed. Can KS test be run on them?

    If not, then would it be permissible to take the means of each sample, thus giving congruency to the data (same n values, but with means), and use the n from the sample size (n=10 (3 from first, and 7 from second)), rather than the mean (n=2) to establish the critical value, or would I need to use the n from the mean sample size to establish the CV?


    • Charles says:

      Although I don’t completely understand your description, you can perform a two sample KS test with samples of different sizes to determine whether these samples come from populations with the same distributions. See the following webpage for more details
      Two Sample KS Test

  14. sathyapriya says:

    How do I find the cumulative distribution function F(x)?

    • Charles says:

      Examples 1 and 2 on the referenced webpage explain how to compute the cumulative distribution function F(x).

  15. Gianma says:

    Dear Sir,
    Thank you for sharing this.
    I have a question: why in the first example we calculate the Z-score with x=100, 200, etc., but with mean and standard deviation calculated from the mid points (150, 250, etc.)?
    Shouldn’t it be correct to have the midpoints of the intervals in column A for the Z-score calculation?
    Best regards,

    • Charles says:

      Probably either approach is acceptable, but here I have used the endpoints of the various intervals with the mean and stdev based on the midpoint of the intervals.

      • Gianma says:

        Dear Charles,
        Sorry for insisting, but it’s not a negligible difference: using the midpoints of the intervals for calculating the Z-score, the resulting Dn is equal to 0.117>Dn,a, so the overall result is the opposite (the data is NOT a good fit with the normal distribution)…
        Considering that the definition of Z is (Xi-u)/S, where u is the mean of the X values and S is their stdev, I think that only the midpoints of the intervals should be used, if u is calculated as their mean.
        Otherwise, we can use the endpoints of the intervals as Xi, but in this case also mean and stdev should be calculated on these values, and not on the midpoints.
        Do you agree?

        • Charles says:


          I realize that depending on the choice you make you might come to a different conclusion. This is why it is important to view significance values such as alpha = .05 not as absolute things. In fact if you set alpha = .05 as your significance value, any p-value near .05 can be viewed with some caution.

          Unfortunately, this is the nature of statistics. If you get a p-value of .0003 you are fairly confident of your result (at least as far as type I error is concerned), but often depending on which test you choose to use (or which version of a test you use), you might get different outcomes.


          • Gianma says:

            Dear Sir,
            Thank you for your answer. Sincerely, I’m not 100% convinced, but at least this discussion forced me to look deeper into this topic, and review some forgotten page of statistics!
            Best regards,

  16. Avi says:

    Great article. I now understand how you calculate the p-value for the KS test. Thanks so much. However, when I try to replicate it in Excel, the NORMDIST function does not return the same values. Is there something different you are doing? Excel asks me for the mean and standard deviation (which I input) but does not return the same values you have in your sheet.

    many thanks

    • Charles says:

      I don’t see any reference to the NORMDIST function on the referenced webpage. There is a reference to =NORMSDIST(E4), which is the standard normal distribution function (mean = 0 and standard deviation = 1).

  17. abhijit bora says:

    Hello Sir,

    I am searching for Kolmogorov-Smirnov Test two sample data in excel. Can you help me?

  18. Sven says:


    With your tool it is possible to use the Shapiro-Wilk test on a time series and get, besides the p-value, a “yes” or “no” for the normality assumption. Therefore I can do this test for multiple series in parallel with only one formula, which is very nice.

    Is there also a possibility to test for other distributions (Poisson, Stuttering Poisson, Gamma, Negative Binomial, etc.) for multiple series (KS-Test or Chi-Square-Test), so I can see which distribution would fit best?


  19. Cathy says:

    Can we use the Kolmogorov Smirnov test if we want to know whether the data follow a
    binomial distribution?

  20. zohreh says:

    Hi all,
    I am trying to fit an appropriate probability distribution to my data. I know that I can use the K-S test, but my problem is that, as I am going to use MATLAB or Excel for this purpose, I do not know how to use this software for the test. I have never seen an example of this test for the exponential distribution or for distributions other than the normal and lognormal. How can I decide whether, for example, the lognormal or the exponential distribution is appropriate?
    Thank you very much for your help in advance.

  21. Kevin says:


    I am trying to determine if Rokeach value survey (RVS) responses for two different groups are statistically significant. The RVS has subjects rank 18 values in order of importance to them. I have calculated the mean response for each value within each group and ordered them from most important (lowest mean) to least important (highest mean). I was told I could use the Kolmogorov-Smirnov Test to determine if differences in mean value rankings between groups are statistically significant.
    I would appreciate an explanation of this process in Excel.

    Thank you in advance,
    Kevin, Excel expert, stats neophyte

    P.S. I have learned more practical statistics from your site than my undergrad and masters professors have been able to drill into me… Well done, Sir!

    • Charles says:


      It is good to hear that the site has been helpful. My goal was exactly as you stated, to help people make practical use of (and understand) statistics in the environment that is probably the most available for most people, namely Excel.

      If your goal is to determine whether there is a significant difference between the means of the two groups, you probably want to use the t test (if the data in the two groups are normally distributed) or the Mann-Whitney test if they are not. You could also use the two-sample Kolmogorov-Smirnov Test to determine whether the two groups of data come from the same population. I have already described the one sample Kolmogorov-Smirnov Test on the website, but not the two sample test.

      Fortunately, I have just implemented the two sample test in the Real Statistics Resource Pack (Release 2.15) and have written the description for the website (including two examples). I plan to release these in the next couple of days. Stay tuned.


    • Charles says:

      The two-sample KS test is now included in the Real Statistics Resource Pack. The procedure is described on the webpage http://www.real-statistics.com/non-parametric-tests/two-sample-kolmogorov-smirnov-test/.

  22. Sally says:

    Hi, may I know what the p-value means and how to find the p-value for the Kolmogorov-Smirnov test?

    • Sally says:

      besides that is it possible to use the statistical value of other distribution as a critical value to find the p-value of KS test?
      for example, use the z value of normal distribution to find the p-value by KS test.

      • Charles says:

        Sorry, but I don’t understand your question. In any case I will be adding the KS p-value shortly.

    • Charles says:

      I am revising the KS part of the website/software and will add the p-value. Stay tuned.

    • Charles says:

      I have now provided a way of calculating the p-value for the KS test, using the functions KSPROB and KSDIST. These are available in the latest release of the Real Statistics Resource Pack (Rel 2.15).

  23. Sandeep R says:

    Hello sir, I found this article very helpful. I need to fit a log-normal distribution using either the chi-square or the K-S test. You have explained only the normal distribution; please explain the log-normal distribution also.
    Here is my test data:
    mean= 5.1439
    δ= 0.2506
    median= 4.99
    σlnz = 0.247

    interval observed frequency
    1.81 2.759 9
    2.759 3.708 61
    3.708 4.657 116
    4.657 5.606 155
    5.606 6.555 120
    6.555 7.504 42
    7.504 8.453 7
    8.453 9.402 2
    9.402 10.351 2
    10.351 11.3 3
    sum= 517

    • Charles says:

      The procedure for using the K-S test with the log normal distribution is pretty much the same as for the normal distribution. E.g. in Figure 3, you won’t need the E column. Simply enter the formula for the log-normal distribution in column F. E.g. cell F4 would contain a formula like =LOGNORMDIST(A4,N5,N10). The rest is the same as in the examples provided on the webpage.

      • Sandeep R says:

        thank you very much sir for your reply.
        Sir, I have one more doubt: should we use “mean and standard deviation” or “median and σlnz” in the lognormal distribution?

        • Charles says:

          If I understand your original question correctly, then you should use the mean and std dev, esp. since Excel has the LOGNORM.DIST function available, which uses these two parameters. Why do you think the median and σlnz might be good choices? Perhaps this is correct and I am not answering the right question.

      • Bruno says:


        Critical values are the same for all distributions tested?

        Thank you

        • Charles says:

          Hi Bruno,

          If you know the distribution parameters (e.g. mean and variance for the normal distribution) then the table of critical values is the same. Usually these parameter values are estimated from the sample, in which case different critical values should be used. I have provided the table for the normal distribution. This is called the Lilliefors Test. See the following webpages

          Lilliefors Test
          Lilliefors Table


  24. Juan Pablo Góngora says:

    Hi, the SPSS software uses Z K-S = D*SQRT(n) and a p-value, but I can’t reproduce the p-value; it is not the probability from the normal distribution.
    Example: n = 20, D = .416, Z K-S = .416*SQRT(20) = 1.861, SPSS p-value (two-sided) = .002.
    But 2*(1-NORMSDIST(1.861)) is not .002.
    Do you know how the p-value is calculated?
    Tks a lot
    PS. Sorry, my English is not the best

  25. Renato says:

    Dear Sir:
    I am looking for a test to compare if one sub-sample of size “n” taken from a sample of size “N” (source sample), with n<<N, has the same attributes of the source sample.
    Is Kolmogorov-Smirnov the best test?
    The source sample has a multimodal distribution (fish size frequencies), and I have some doubts about how to construct the cumulative sample to carry out the KS test.
    The data is in a table of frequencies by size range.
    Thanks for your answer

