Property 1: Suppose sample 1 has size n1 and rank sum R1, and sample 2 has size n2 and rank sum R2. Then R1 + R2 = n(n+1)/2, where n = n1 + n2.
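Proof: The ranks of the pooled data are 1, 2, …, n (tied values receive the average of the ranks they span, which leaves the total unchanged), so the result is simply a consequence of the fact that the sum of the first n positive integers is n(n+1)/2. This can be proven by induction. For n = 1, we see that n(n+1)/2 = 1(2)/2 = 1 = n. Assume the result is true for n. Then for n + 1 we have 1 + 2 + … + n + (n+1) = n(n+1)/2 + (n+1) = [n(n+1) + 2(n+1)]/2 = (n+1)(n+2)/2, which is the stated formula with n replaced by n + 1.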
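Property 1 can be checked numerically. The following is a minimal sketch, assuming the data contain no ties; the function name `rank_sums` and the sample values are made up for illustration.

```python
def rank_sums(sample1, sample2):
    """Return (R1, R2), the rank sums of the two samples in the pooled data.

    Assumes all values are distinct, so no tie handling is needed.
    """
    pooled = sorted(sample1 + sample2)
    # Rank 1 goes to the smallest pooled value, rank n to the largest.
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    r1 = sum(rank[v] for v in sample1)
    r2 = sum(rank[v] for v in sample2)
    return r1, r2


# Illustrative data (hypothetical values, not taken from the text).
sample1 = [12.1, 9.4, 15.0, 7.7]
sample2 = [10.2, 8.8, 13.5, 11.9, 14.2]

r1, r2 = rank_sums(sample1, sample2)
n = len(sample1) + len(sample2)

# Property 1: the two rank sums always add up to n(n+1)/2.
assert r1 + r2 == n * (n + 1) // 2
```

Whatever values are used, the assertion holds, because the pooled ranks are always a rearrangement of 1, …, n.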
Property 2: When the two samples are sufficiently large (say of size > 10, although some say 20), then the W statistic is approximately normal N(μ, σ) where

$$\mu = \frac{n_1(n_1+n_2+1)}{2}, \qquad \sigma = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}$$
Proof: We prove that the mean and variance of W = R1 are as described above. The normal approximation was proven in Mann & Whitney (1947) (see Bibliography), and we won’t repeat the proof here.
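Let xi = the rank of the ith data element in the smaller sample, so that W is the sum of the xi for i = 1, …, n1. Under the assumption of the null hypothesis, each xi is equally likely to take any of the values 1, …, n. Thus, by Property 1

$$E[x_i] = \frac{1}{n}\sum_{k=1}^{n} k = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}$$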
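By Property 4a of Expectation

$$E[W] = E\!\left[\sum_{i=1}^{n_1} x_i\right] = \sum_{i=1}^{n_1} E[x_i] = \frac{n_1(n+1)}{2} = \mu$$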
As we did in the proof of Property 1, we can show by induction on n that

$$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$$
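From these it follows that

$$\sum_{k \ne l} k\,l = \left(\sum_{k=1}^{n} k\right)^{2} - \sum_{k=1}^{n} k^2 = \frac{n^2(n+1)^2}{4} - \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(n-1)(3n+2)}{12}$$

where the sum on the left is taken over all ordered pairs (k, l) with 1 ≤ k, l ≤ n and k ≠ l.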
We can now calculate the following expectations:

$$E[x_i^2] = \frac{1}{n}\sum_{k=1}^{n} k^2 = \frac{(n+1)(2n+1)}{6}$$
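Also, where i ≠ j, the pair (xi, xj) is equally likely to be any of the n(n−1) ordered pairs of distinct ranks, and so

$$E[x_i x_j] = \frac{1}{n(n-1)}\sum_{k \ne l} k\,l = \frac{(n+1)(3n+2)}{12}$$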
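By Property 2 of Expectation (case where i = j)

$$\mathrm{var}(x_i) = E[x_i^2] - (E[x_i])^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{(n+1)(n-1)}{12}$$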
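By Property 3 of Basic Concepts of Correlation, when i ≠ j

$$\mathrm{cov}(x_i, x_j) = E[x_i x_j] - E[x_i]\,E[x_j] = \frac{(n+1)(3n+2)}{12} - \frac{(n+1)^2}{4} = -\frac{n+1}{12}$$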
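By an extended version of Property 5 of Basic Concepts of Correlation

$$\mathrm{var}(W) = \sum_{i=1}^{n_1} \mathrm{var}(x_i) + \sum_{i \ne j} \mathrm{cov}(x_i, x_j) = \frac{n_1(n+1)(n-1)}{12} - \frac{n_1(n_1-1)(n+1)}{12} = \frac{n_1(n+1)(n-n_1)}{12} = \frac{n_1 n_2 (n+1)}{12}$$

Since n = n1 + n2, this is σ² as stated in Property 2, which completes the proof.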
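The mean and variance of W can also be checked by brute force for small samples. The sketch below is an illustration rather than part of the proof: it assumes no ties and that, under the null hypothesis, every choice of n1 ranks out of 1, …, n is equally likely. The function names `exact_moments` and `formula_moments` are made up for this example.

```python
from fractions import Fraction
from itertools import combinations


def exact_moments(n1, n2):
    """Exact mean and variance of W = R1 under the null hypothesis.

    Enumerates every way of assigning n1 of the ranks 1..n to sample 1,
    treating each assignment as equally likely (no ties assumed).
    """
    n = n1 + n2
    sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    count = len(sums)
    mean = Fraction(sum(sums), count)
    # Population variance over all equally likely outcomes: E[W^2] - (E[W])^2.
    var = Fraction(sum(s * s for s in sums), count) - mean ** 2
    return mean, var


def formula_moments(n1, n2):
    """Mean n1(n+1)/2 and variance n1*n2*(n+1)/12, with n = n1 + n2."""
    n = n1 + n2
    return Fraction(n1 * (n + 1), 2), Fraction(n1 * n2 * (n + 1), 12)


# The enumeration and the closed-form expressions agree exactly.
for n1, n2 in [(2, 3), (3, 5), (4, 6)]:
    assert exact_moments(n1, n2) == formula_moments(n1, n2)
```

Exact rational arithmetic via `Fraction` is used so that the comparison is not affected by floating-point rounding. The enumeration grows combinatorially, so this check is only practical for small n1 and n2.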