The Wald-Wolfowitz two-sample runs test is used to determine whether two samples come from the same distribution. The test sorts the values in the combined sample, replaces each value with the symbol 1 (if the value comes from sample 1) or 2 (if the value comes from sample 2), and then applies the one-tailed version of the one-sample runs test to the resulting sequence of symbols.
If there are ties, then the number of runs can differ depending on how the 1’s and 2’s for the tied values are ordered. In this case, we perform multiple versions of the test, randomly changing the order of the 1’s and 2’s for the tied values.
Note that when there is a significant difference between the distributions of the two samples, we can’t tell whether this is a difference in means, medians, variances, skewness, kurtosis, etc.
Example 1: Determine whether the samples in ranges B4:B11 and C4:C10 of Figure 1 come from the same distribution.
First we rearrange the input data as shown in range E4:F18. Essentially we are creating a stacked version of the original data in column E, labeling the data from sample 1 with a 1 in column F and labeling the data from sample 2 with a 2 in column F.
Next we sort the data putting the results in range H4:I18. We can do this by using the array formula =QSORTRows(E4:F18,1).
Figure 1 – Data for Two Sample Runs Test
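The stacking, labeling and sorting steps described above can also be sketched outside of Excel. The following Python fragment is only an illustration: the function name stack_and_sort and the two small samples are hypothetical and are not the data from Figure 1.

```python
def stack_and_sort(sample1, sample2):
    """Return (value, label) pairs sorted by value.

    The label is 1 for values from sample 1 and 2 for values from sample 2.
    """
    stacked = [(x, 1) for x in sample1] + [(x, 2) for x in sample2]
    # sorted() is stable, so for tied values the sample 1 label comes first.
    return sorted(stacked, key=lambda pair: pair[0])


# Hypothetical data, chosen only to illustrate the steps.
sample1 = [12, 25, 36, 41]
sample2 = [18, 36, 50]

pairs = stack_and_sort(sample1, sample2)
labels = [label for _, label in pairs]
print(pairs)
print(labels)
```

The list labels plays the role of column I in the worksheet: it is the sequence of 1’s and 2’s to which the one-sample runs test is applied.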
We next use the array formula for the one-sample runs test, =RUNSTEST(I4:I18,TRUE,1), to obtain results similar to those shown in range K4:L11 of Figure 2.
Figure 2 – Two Sample Runs Test
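For readers who want to see the arithmetic behind a one-tailed runs test, here is a minimal Python sketch. It assumes the usual large-sample normal approximation, in which the number of runs has mean 2·n1·n2/(n1+n2) + 1 and variance 2·n1·n2·(2·n1·n2 − n1 − n2)/((n1+n2)²(n1+n2−1)), and it treats a small number of runs as evidence that the distributions differ. The function names are hypothetical, and the RUNSTEST function may compute its p-value differently (e.g. using an exact test for small samples), so the figures need not match exactly.

```python
import math


def count_runs(labels):
    """Count the maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def runs_test_lower_tail(labels):
    """One-tailed runs test via the normal approximation (an assumption).

    Returns (runs, mean, variance, z, p), where p = P(Z <= z).
    """
    n1 = labels.count(1)
    n2 = labels.count(2)
    n = n1 + n2
    runs = count_runs(labels)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    # Standard normal CDF expressed through the error function.
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return runs, mean, var, z, p


# Hypothetical label sequence: the two samples are completely separated.
runs, mean, var, z, p = runs_test_lower_tail([1, 1, 1, 1, 2, 2, 2, 2])
print(runs, mean, round(var, 4), round(z, 4), round(p, 4))
```

With samples as small as the one in this sketch, the normal approximation is rough, and an exact calculation of the distribution of the number of runs would be preferable.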
Note that the value 36 appears twice in the original data, as shown in Figure 1, once in sample 1 and again in sample 2. When the data is sorted, we see that 36 appears in cells H9 and H10. The order shown in column I is the one that produces the fewest runs (namely 9). However, if the values in cells I9 and I10 are interchanged, the number of runs increases by 2 to 11. Thus, there are two possible outcomes, as shown in range N4:P12 of Figure 2.
We can simplify the calculations by using the following Real Statistics function.
Real Statistics Function: The following array function is provided in the Real Statistics Pack:
RUNS2TEST(R1, R2, lab, iter): outputs a 9 × 1 column range as shown in range L4:L11 of Figure 2 with the results of the two-sample runs test on the data in ranges R1 and R2 if lab = FALSE (default) and a 9 × 2 column range, including labels, as shown in range K4:L11 if lab = TRUE.
In the above, iter = 0 (the default). We use this value if there are no ties or if we are willing to accept a random (actually semi-random) ordering of the 1’s and 2’s for the tied values.
If we are not willing to accept the default output, we ask the function to randomly change the order of the 1’s and 2’s for the tied values iter times. For Example 1, if we set iter = 100, we see from the right side of Figure 2 that the runs = 9 case occurs 45 times and the runs = 11 case occurs 55 times. Note that the p-values for these two cases are different.
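The idea of repeatedly reshuffling the labels of tied values and tallying the resulting run counts can be sketched as follows. This is not the RUNS2TEST implementation, whose internals are not described here. The function names, the seed parameter and the data are all hypothetical, and the data are chosen so that the single tie yields two different run counts.

```python
import random
from collections import Counter
from itertools import groupby


def count_runs(labels):
    """Count the maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def shuffled_labels(sample1, sample2, rng):
    """Sort the combined sample, randomly ordering the labels within each tie."""
    stacked = sorted(
        [(x, 1) for x in sample1] + [(x, 2) for x in sample2],
        key=lambda pair: pair[0],
    )
    labels = []
    for _, group in groupby(stacked, key=lambda pair: pair[0]):
        tied = [label for _, label in group]
        rng.shuffle(tied)  # no effect when the group has a single value
        labels.extend(tied)
    return labels


def tally_runs(sample1, sample2, iterations, seed=None):
    """Count how often each number of runs occurs over random tie orderings."""
    rng = random.Random(seed)
    return Counter(
        count_runs(shuffled_labels(sample1, sample2, rng))
        for _ in range(iterations)
    )


# Hypothetical data with one tie (the value 36 occurs in both samples).
tally = tally_runs([12, 25, 36], [36, 41, 50], iterations=100, seed=1)
print(sorted(tally.items()))
```

Here the tied pair can be ordered in two ways, giving either 2 or 4 runs, so the tally has two entries whose counts sum to the number of iterations. A runs test would then be carried out separately for each distinct run count.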
For Example 1, if we use the array formula =RUNS2TEST(B4:B11,C4:C10,TRUE) we obtain the output shown in range K4:L11 of Figure 2.
If instead we use the array formula =RUNS2TEST(B4:B11,C4:C10,TRUE,100), we obtain the output shown in range N4:P12. In this case, we know that there are 2 possible outcomes, and so we need to highlight a 9 × 3 range for the output. In general, some guesswork might be required to determine how large a range to use for the output. After seeing the output, you might have to increase the size of the output range. Also note that since the orders are randomly generated, the output is likely to be different on successive runs of the formula.