Basic Concepts and Normal Approximation
We now explore how to determine whether a sequence of two possible outcomes is random, judged by its runs, i.e. the maximal blocks of identical consecutive outcomes. We begin with an example.
Example 1: 40 people were selected at random in the following order MMFFF FMFFM MFMMM MFFMM FMFFM MMMF FMFMM FFMMMF. Assuming the population has 50% men and 50% women, is it true that the people were selected at random?
Of the people selected, 22 were men and 18 were women. The number x of women selected has a binomial distribution B(40, π). The null hypothesis is H0: π = .5. Now P(x ≤ 18) = BINOMDIST(18, 40, .5, TRUE) = 0.318. Since .025 < .318 < .975, we can’t reject the null hypothesis at the 95% confidence level. This, however, doesn’t mean that the sample is random.
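The binomial check above can be reproduced outside Excel. The following is a minimal Python sketch, where the helper name binom_cdf is hypothetical and simply sums the binomial probabilities directly; it is meant to mirror BINOMDIST(18, 40, .5, TRUE), not to replace it.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Example 1: 18 women among 40 people, null hypothesis pi = 0.5
p_lower = binom_cdf(18, 40, 0.5)
print(round(p_lower, 3))  # 0.318

# Two-tailed decision at the 5% level: reject only if the
# cumulative probability falls outside [.025, .975]
print(0.025 < p_lower < 0.975)  # True, so H0 is not rejected
```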
We next check to see if the order is random. First we note that there are 20 runs as follows:
MM FFFF M FF MM F MMMM FF MM F M FF MMMM FF M F MM FF MMM F
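Counting runs by hand is error-prone for longer sequences. As a rough sketch (the function name count_runs is an illustrative choice, not part of any package mentioned here), the counts for Example 1 can be checked in Python by grouping consecutive identical symbols:

```python
from itertools import groupby

def count_runs(seq: str) -> int:
    """Number of maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(seq))

raw = "MMFFF FMFFM MFMMM MFFMM FMFFM MMMF FMFMM FFMMMF"
seq = raw.replace(" ", "")  # the spaces only group the data for reading

n_m, n_f = seq.count("M"), seq.count("F")
print(n_m, n_f, count_runs(seq))  # 22 18 20
```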
Before proceeding with the example, we state the key property for conducting runs tests.
Property 1: Suppose there are n1 elements of one type and n2 of the other, where n1 ≥ n2 and n1 is large enough (approximately n1 > 20). Suppose further that there are r runs. Then under the null hypothesis H0 that the order is random, r has an approximately normal distribution N(μ,σ) where μ = 2·n1·n2/(n1+n2) + 1 and σ² = 2·n1·n2·(2·n1·n2 − n1 − n2) / ((n1+n2)²·(n1+n2−1)).
Example 1 (continued) – runs test. Since n1 = 22 > 20, we use Property 1 as shown in Figure 1.
Figure 1 – Runs Test for Example 1
Thus we cannot reject the null hypothesis that the runs are random.
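The conclusion for Example 1 can be checked numerically. The sketch below uses the standard mean and variance of the number of runs under randomness; it assumes that these are the quantities used by Property 1 and Figure 1, and that no continuity correction is applied. The function name runs_test_normal is hypothetical.

```python
from math import erf, sqrt

def runs_test_normal(n1: int, n2: int, r: int):
    """Normal approximation to the runs test (two-tailed).

    Returns (mean, sd, z, p). No continuity correction is applied,
    which is an assumption of this sketch.
    """
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    sd = sqrt(var)
    z = (r - mean) / sd
    # Standard normal CDF at -|z|, doubled for a two-tailed p-value
    p = 2 * 0.5 * (1 + erf(-abs(z) / sqrt(2)))
    return mean, sd, z, p

mean, sd, z, p = runs_test_normal(22, 18, 20)
print(f"{mean:.1f} {sd:.2f} {z:.2f} {p:.2f}")  # 20.8 3.09 -0.26 0.80
```

Since the two-tailed p-value is far above .05, the observed 20 runs are close to what a random ordering would produce.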
Table Lookup Approach
If n1 ≤ 20, then we can test r by using the table of values found in the Runs Test Table. The range listed for r in the table consists of the values for which the null hypothesis (that the runs are random) is not rejected at the 5% level.
Example 2: Is the order of XXXX ZZ XX Z XXXXXX ZZZ XX Z XXX ZZ random?
Figure 2 – Runs Test for Example 2
We see that n1 = 17, n2 = 9 and runs = 10. Since n1 ≤ 20, we use the Runs Test Table. Consulting the table, we see that runs in the interval [8, 19] are acceptable. Since r = number of runs = 10 is in this interval, we cannot reject the null hypothesis at the 5% level (2-tailed test).
Real Statistics Excel Functions: The following functions are provided in the Real Statistics Pack:
RLowerCRIT(n1, n2, α, tails) = lower tail critical value of the runs test for a sample with n1 elements of one type and n2 elements of the other type, where α is the significance level (default .05) and tails = 1 or 2 (default).
RUpperCRIT(n1, n2, α, tails) = upper tail critical value of the runs test for a sample with n1 elements of one type and n2 elements of the other type, where α is the significance level (default .05) and tails = 1 or 2 (default).
In a two-tailed test, if a = RLowerCRIT(n1, n2, α) and b = RUpperCRIT(n1, n2, α), then the closed interval [a, b] is the range of values of r for which the null hypothesis (that the sequence is random) is not rejected.
These functions work for all values of n1 and n2 up to 514, and are based on the runs distribution, which is described in Runs Distribution. Note that the values produced by these functions are similar to, but not always the same as, the values shown in the Runs Test Table.
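To give a feel for an exact runs distribution, here is a minimal Python sketch of the classical combinatorial formula for the probability of observing exactly r runs when all arrangements of the n1 and n2 symbols are equally likely. It is an illustration only: the names runs_pmf and runs_cdf are hypothetical, and this is not claimed to be how RLowerCRIT, RUpperCRIT or the Runs Test Table compute their values.

```python
from math import comb

def runs_pmf(r: int, n1: int, n2: int) -> float:
    """P(R = r) when every arrangement of the symbols is equally likely."""
    if r < 2 or n1 < 1 or n2 < 1:
        return 0.0
    total = comb(n1 + n2, n1)
    k, odd = divmod(r, 2)
    if not odd:
        # r = 2k: k runs of each type, and either type may come first
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        # r = 2k + 1: k + 1 runs of one type and k runs of the other
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return ways / total

def runs_cdf(r: int, n1: int, n2: int) -> float:
    """P(R <= r), summed from the smallest possible number of runs."""
    return sum(runs_pmf(i, n1, n2) for i in range(2, r + 1))

# With n1 = 17 and n2 = 9 (the counts in Example 2) at most 2*9 + 1 = 19
# runs are possible, so the probabilities over r = 2..19 sum to 1
print(round(runs_cdf(19, 17, 9), 10))  # 1.0
```

Critical values for a two-tailed test could then be found by scanning runs_cdf for the points where each tail probability first exceeds α/2, though the exact convention used at the boundaries may differ between tables.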
Runs Test using Real Statistics Functions
Example 3: Determine whether the sequence of numbers in range B3:B17 of Figure 3 is randomly generated.
The approach we use is to turn the sequence of numbers into a sequence of letters of the type shown in the previous examples. We do this by calculating the median of the numbers to be 38 (cell G3 of Figure 3). If an entry in the sequence is greater than the median we code it with the letter A, while if it is less than the median we code it with the letter B. Entries that are equal to the median (e.g. cell B5) are simply ignored. We see the result of this coding in range D3:D17. For example, cell D3 contains the formula =IF(B3>G$3,"A",IF(B3<G$3,"B","")).
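The median coding just described can be sketched in Python as follows. The data values below are hypothetical (they are not the values in range B3:B17 of Figure 3) and were chosen so that the median is 38 and one entry equals the median; the function name code_by_median is also an illustrative choice.

```python
from statistics import median

def code_by_median(values) -> str:
    """Code each value as 'A' (above the median) or 'B' (below it).

    Values equal to the median are dropped, as in the Excel formula.
    """
    m = median(values)
    return "".join("A" if v > m else "B" for v in values if v != m)

data = [45, 30, 38, 52, 41, 12, 60, 33, 25]  # hypothetical sample
print(median(data))          # 38
print(code_by_median(data))  # ABAABABB
```

The resulting string of two symbols can then be tested for randomness in the same way as the sequences in Examples 1 and 2.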
Figure 3 – Runs Test for Example 3
We can now carry out the runs test as demonstrated previously. This time, however, we use the following Real Statistics function to perform the test.
Real Statistics Function: The following array function is provided in the Real Statistics Pack:
RUNSTEST(s, lab, tails): outputs a 9 × 1 column range as shown in range G5:G13 of Figure 3 with the results of the runs test on the string s if lab = FALSE (default) and a 9 × 2 range, including labels, as shown in range F5:G13 if lab = TRUE. If tails = 2 (default) a two-tailed test is used and if tails = 1 a one-tailed test is used. String s must consist of a sequence of two and only two distinct characters.
RUNSTEST(R1, lab, tails): outputs the results of the runs test on the numeric data in range R1 using the coding described in Example 3; i.e. =RUNSTEST(s, lab, tails) where s is the sequence of codes (ignoring any elements in R1 whose value is the median).
If all the cells in R1 contain one of two one-character symbols (e.g. 4 and *) then RUNSTEST(R1, lab, tails) = RUNSTEST(s, lab, tails) where s is the sequence of symbols in R1. This is the only case where R1 may contain non-numeric values.
Example 3 (continued): The result of the runs test is shown in range F5:G13 of Figure 3, as calculated by the array formula =RUNSTEST(B3:B17,TRUE). Here n1 = the number of elements larger than the median and n2 = the number of elements smaller than the median. The value of runs is 9 since there are 9 runs in the sequence shown in column D. The mean, standard deviation, z-stat and p-value (cells G7, G8, G11, G12) are based on Property 1 and tails = 2 since this is the default value for the function. The value of p-exact (cell G13) is the p-value based on the exact test using the function =RUNSDIST(r, n1, n2, TRUE), as described in Runs Distribution.
Since p-value > .05 = α (using either the normal approximation in cell G12 or the exact value in cell G13), we cannot reject the null hypothesis that the sequence in range B3:B17 is random.
Note that we would get the same result using either of the following array formulas:
See Runs Distribution for a description of how to carry out an exact one-sample runs test.
See Data Analysis Tools for Non-parametric Tests for how to conduct the one-sample runs test using the Real Statistics Non-parametric Tests data analysis tool.
See Runs using the Binomial Distribution for another approach to carrying out runs tests.
See Two-Sample Runs Test for information about how to extend the one-sample runs test to two samples.