The two-sample Kolmogorov-Smirnov test is used to test whether two samples come from the same distribution. The procedure is very similar to the One-Sample Kolmogorov-Smirnov Test (see also Kolmogorov-Smirnov Test for Normality).
Suppose that the first sample has size m with an observed cumulative distribution function of F(x) and that the second sample has size n with an observed cumulative distribution function of G(x). Define
$$D_{m,n} = \max_x |F(x) - G(x)|$$
The null hypothesis is H0: both samples come from a population with the same distribution. As with the Kolmogorov-Smirnov test for normality, we reject the null hypothesis (at significance level α) if $D_{m,n} > D_{m,n,\alpha}$, where $D_{m,n,\alpha}$ is the critical value.
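The test statistic can be sketched in a few lines of Python. This is only an illustration of the definition, not part of the Real Statistics Resource Pack; the function name `ks_two_sample_statistic` and the sample data are made up for the example. Since both observed cumulative distribution functions are step functions that change only at observed values, it is enough to compare them at the pooled data points.

```python
from bisect import bisect_right

def ks_two_sample_statistic(sample1, sample2):
    """Return max |F(x) - G(x)|, where F and G are the observed
    cumulative distribution functions of the two samples."""
    xs = sorted(sample1)
    ys = sorted(sample2)
    m, n = len(xs), len(ys)
    d = 0.0
    # Both step functions are constant between data points, so the
    # maximum difference is attained at one of the observed values.
    for v in sorted(set(xs) | set(ys)):
        f = bisect_right(xs, v) / m  # proportion of sample 1 that is <= v
        g = bisect_right(ys, v) / n  # proportion of sample 2 that is <= v
        d = max(d, abs(f - g))
    return d

# Hypothetical data, chosen only to make the arithmetic easy to follow
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
print(ks_two_sample_statistic(a, b))  # 0.5
```

The null hypothesis would then be rejected if this value exceeds the critical value for the given sample sizes and significance level.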
For m and n sufficiently large
$$D_{m,n,\alpha} = c(\alpha)\sqrt{\frac{m+n}{mn}}$$
where c(α) = the inverse of the Kolmogorov distribution at α, which can be calculated in Excel as
Dm,n,α = KINV(α)*SQRT((m+n)/(m*n))
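Outside Excel, the same large-sample critical value can be approximated as in the following sketch. It assumes that c(α) is the point at which the upper tail of the Kolmogorov distribution equals α, evaluates that tail with a truncated alternating series, and inverts it by bisection. The function names and the `terms` and `iterations` parameters are illustrative choices, not the Real Statistics implementation of KINV. The truncated series is unreliable for arguments very close to zero, so the sketch is intended for the usual significance levels (roughly α ≤ .2).

```python
import math

def kolmogorov_sf(x, terms=10):
    """Upper tail P(K > x) of the Kolmogorov distribution,
    using the first `terms` terms of the alternating series."""
    if x <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    # Clamp to [0, 1] to absorb truncation error
    return min(1.0, max(0.0, 2.0 * total))

def kolmogorov_inv(alpha, iterations=40, terms=10):
    """Find c such that P(K > c) = alpha by bisection on [0, 5]."""
    lo, hi = 0.0, 5.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if kolmogorov_sf(mid, terms) > alpha:
            lo = mid  # tail still too heavy, so c lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def ks2_critical_value(alpha, m, n):
    """Large-sample critical value c(alpha) * sqrt((m + n) / (m * n))."""
    return kolmogorov_inv(alpha) * math.sqrt((m + n) / (m * n))

print(kolmogorov_inv(0.05))                # roughly 1.358
print(ks2_critical_value(0.05, 100, 100))  # roughly 0.192
```

The sample sizes of 100 in the usage lines are arbitrary and are not taken from Example 1.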
Example 1: Determine whether the two samples on the left side of Figure 1 come from the same distribution. The values in columns B and C are the frequencies of the values in column A.
Figure 1 – Two-sample Kolmogorov-Smirnov test
We carry out the analysis on the right side of Figure 1. Column E contains the cumulative distribution for Men (based on column B), column F contains the cumulative distribution for Women and column G contains the absolute value of the differences. E.g. cell E4 contains the formula =B4/B14, cell E5 contains the formula =B5/B14+E4 and cell G4 contains the formula =ABS(E4-F4).
Cell G14 contains the formula =MAX(G4:G13) for the test statistic and cell G15 contains the formula =KSINV(G1,B14,C14) for the critical value. Since D-stat = .229032 > .224317 = D-crit, we conclude there is a significant difference between the distributions for the samples.
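The spreadsheet steps above (running cumulative proportions for each column, absolute differences, then the maximum) can be mirrored in Python as a rough sketch. The function name `ks_from_frequencies` and the frequencies below are hypothetical; they are not the data from Figure 1.

```python
from itertools import accumulate

def ks_from_frequencies(freq1, freq2):
    """Compute D-stat from two aligned frequency columns.

    freq1[i] and freq2[i] are the counts of the same data value in
    the first and second sample (like columns B and C of a table)."""
    n1, n2 = sum(freq1), sum(freq2)
    cdf1 = [c / n1 for c in accumulate(freq1)]  # cumulative distribution, sample 1
    cdf2 = [c / n2 for c in accumulate(freq2)]  # cumulative distribution, sample 2
    diffs = [abs(a - b) for a, b in zip(cdf1, cdf2)]  # absolute differences
    return max(diffs), n1, n2

# Hypothetical frequencies for three ordered values
men = [2, 3, 5]
women = [4, 4, 2]
d_stat, n1, n2 = ks_from_frequencies(men, women)
print(d_stat, n1, n2)
```

Here the cumulative proportions are .2, .5, 1 and .4, .8, 1, so the largest absolute difference is .3 (up to floating-point rounding).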
We can also use the following functions to carry out the analysis:
Real Statistics Function: The following functions are provided in the Real Statistics Resource Pack:
KSDIST(x, n1, n2, b, m) = the p-value of the two-sample Kolmogorov-Smirnov test at x (i.e. D-stat) for samples of size n1 and n2.
KSINV(p, n1, n2, b, iter, m) = the critical value for significance level p of the two-sample Kolmogorov-Smirnov test for samples of size n1 and n2.
As usual, m = the # of iterations used in calculating an infinite sum (default = 10) in KDIST and KINV and iter (default = 40) = the # of iterations used to calculate KINV.
When the argument b = TRUE (default) then an approximate value is used which works better for small values of n1 and n2. If b = FALSE then it is assumed that n1 and n2 are sufficiently large so that the approximation described previously can be used.
For Example 1, we have the following:
D-crit = KSINV(G1,B14,C14) = .224526
p-value = KSDIST(G14,B14,C14) = .043055
Alternatively, we can use the Two-Sample Kolmogorov-Smirnov Table of critical values to find the critical values, or the following functions which are based on this table:
KS2CRIT(n1, n2, α, tails, h) = the critical value of the two-sample Kolmogorov-Smirnov test for samples of size n1 and n2 for the given value of alpha (default .05) and tails = 1 (one tail) or 2 (two tails, default) based on the table of critical values. If h = TRUE (default) then harmonic interpolation is used; otherwise linear interpolation is used.
KS2PROB(x, n1, n2, tails, h ) = an approximate p-value for the two sample KS test for the Dn1,n2 value equal to x for samples of size n1 and n2, and tails = 1 (one tail) or 2 (two tails, default) based on a linear interpolation (if h = FALSE) or harmonic interpolation (if h = TRUE, default) of the values in the table of critical values, using iter number of iterations (default = 40).
Note that the values for α in the table of critical values range from .01 to .2 (for tails = 2) and .005 to .1 for tails = 1. If the p-value is less than .01 (tails = 2) or .005 (tails = 1) then the p-value is given as 0 and if the p-value is greater than .2 (tails = 2) or .1 (tails = 1) then the p-value is given as 1.
For Example 1, we have the following:
D-crit = KS2CRIT(B14,C14, G1) = .229792
p-value = KS2PROB(G14,B14,C14) = .051232
Finally, we can use the following array function to perform the test:
Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:
KS2TEST(R1, R2, lab, alpha, b, iter, m) is an array function which outputs a column vector with the values D-stat, p-value, D-crit, n1, n2 from the two sample KS test for the samples in ranges R1 and R2, where alpha is the significance level (default = .05) and b, iter and m are as in KSINV.
If R2 is omitted (the default) then R1 is treated as a frequency table (e.g. range B4:C13 in Figure 1).
If lab = TRUE then an extra column of labels is included in the output; thus the output is a 5 × 2 range instead of the 5 × 1 range produced when lab = FALSE (default).
For Example 1, the formula =KS2TEST(B4:C13,,TRUE) inserted in range F21:G25 generates the output shown in Figure 2.
Figure 2 – Output from KS2TEST function
Example 2: Determine whether the samples for Italy and France in Figure 3 come from the same distribution.
Figure 3 – Two data samples
We first show how to perform the KS test manually and then we will use the KS2TEST function.
Figure 4 – Two sample KS test
The approach is to create a frequency table (range M3:O11 of Figure 4) similar to that found in range A3:C14 of Figure 1, and then use the same approach as was used in Example 1. This is done by using the Real Statistics array formula =SortUnique(J4:K11) in range M4:M10 and then inserting the formula =COUNTIF(J$4:J$11,$M4) in cell N4 and highlighting the range N4:O10 followed by Ctrl-R and Ctrl-D. Finally the formulas =SUM(N4:N10) and =SUM(O4:O10) are inserted in cells N11 and O11.
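For readers working outside Excel, the following sketch shows one way to build the same kind of frequency table from two raw samples: the sorted distinct values play the role of the SortUnique output and the counts play the role of the COUNTIF columns. The function name `frequency_table` and the data are illustrative assumptions, not the Italy and France samples from Figure 3.

```python
from collections import Counter

def frequency_table(sample1, sample2):
    """Return rows (value, count in sample1, count in sample2),
    one row per distinct value, in ascending order."""
    c1, c2 = Counter(sample1), Counter(sample2)
    values = sorted(set(c1) | set(c2))  # sorted distinct values from both samples
    # A Counter returns 0 for a value that does not occur in the sample
    return [(v, c1[v], c2[v]) for v in values]

# Hypothetical samples of unequal size
first = [1, 2, 2, 3]
second = [2, 3, 3]
for row in frequency_table(first, second):
    print(row)
```

The count columns of the resulting table can then be cumulated and compared in the same way as in Example 1.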
We can also calculate the p-value using the formula =KSDIST(S11,N11,O11), getting the result of .62169.
We see from Figure 4 (or from p-value > .05) that the null hypothesis is not rejected, i.e. there is no significant difference between the distributions of the two samples. The same result can be achieved by applying the KS2TEST array function to the two samples, which produces the output in Figure 5.
Figure 5 – Output from KS2TEST function
Finally, note that if we use the table lookup, then we get KS2CRIT(8,7,.05) = .714 and KS2PROB(.357143,8,7) = 1 (i.e. > .2).