Wilcoxon Rank Sum Test – Advanced

Property 1: Suppose sample 1 has size n1 and rank sum R1, and sample 2 has size n2 and rank sum R2. Then R1 + R2 = n(n+1)/2, where n = n1 + n2.

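Proof: The ranks assigned to the combined sample are the integers 1, 2, …, n (tied values share the average of the ranks they occupy, which leaves the total unchanged), so R1 + R2 is the sum of the first n positive integers, which is \frac{n(n+1)}{2}. This formula can be proven by induction. For n = 1, we see that \frac{n(n+1)}{2} = \frac{1(1+1)}{2} = 1. Assume the result is true for n; then for n + 1 we have 1 + 2 + … + n + (n+1) = \frac{n(n+1)}{2} + (n + 1) = \frac{n(n+1)+2(n+1)}{2} = \frac{(n+1)(n+2)}{2}, which is the formula with n replaced by n + 1.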
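Property 1 is easy to check numerically. The following is a minimal sketch in Python; the helper names `average_ranks` and `rank_sums` and the two data samples are made up for illustration and are not part of the original page. Tied values receive the average of the ranks they occupy, so the identity still holds when ties are present.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_sums(sample1, sample2):
    """Rank the pooled data and return the rank sums (R1, R2)."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Hypothetical data, chosen only for illustration (note the tie at 12).
sample1 = [12, 15, 9, 20]
sample2 = [12, 7, 30, 18, 25]

r1, r2 = rank_sums(sample1, sample2)
n = len(sample1) + len(sample2)
print(r1, r2, n * (n + 1) / 2)  # prints 17.5 27.5 45.0
```

Here R1 + R2 = 17.5 + 27.5 = 45 = 9 · 10 / 2, as Property 1 requires.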

Property 2: When the two samples are sufficiently large (say of size > 10, although some say 20), then the W statistic is approximately normal N(μ, σ) where

\mu = \frac{n_1(n_1+n_2+1)}{2} \qquad \sigma = \sqrt{\frac{n_1 n_2(n_1+n_2+1)}{12}}

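To show how Property 2 is typically used, here is a rough sketch that turns a rank sum W into a z-score and a two-sided p-value, with μ = n1(n1+n2+1)/2 and σ = sqrt(n1 n2 (n1+n2+1)/12) written out in the code. The function name `rank_sum_normal_approx` and the input values are hypothetical. The sketch assumes no ties and applies no continuity correction, both of which a production implementation might handle differently.

```python
import math


def rank_sum_normal_approx(w, n1, n2):
    """Normal approximation for the rank sum W of the sample of size n1.

    Assumes no ties and uses no continuity correction.
    """
    n = n1 + n2
    mu = n1 * (n + 1) / 2                      # mean of W under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n + 1) / 12)  # standard deviation of W under the null
    z = (w - mu) / sigma
    # Two-sided tail probability of a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mu, sigma, z, p_two_sided


# Hypothetical input: two samples of size 12 and an observed rank sum of 117.5.
mu, sigma, z, p = rank_sum_normal_approx(w=117.5, n1=12, n2=12)
print(mu, sigma, z, p)
```

For these sample sizes μ = 12 · 25 / 2 = 150 and σ = sqrt(300), so an observed W of 117.5 lies a little under two standard deviations below the mean.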
Proof: We prove that the mean and variance of W = R1 are as described above. The normal approximation was proven in Mann & Whitney (1947), listed in the Bibliography, and we won’t repeat the proof here.

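Let xi = the rank of the ith data element in the smaller sample, which we take to be sample 1, so that W = x1 + x2 + … + xn1, and let n = n1 + n2. Under the assumption of the null hypothesis (and no ties), each xi is equally likely to be any of the ranks 1, 2, …, n. Thus, by the sum formula used in the proof of Property 1

E[x_i] = \frac{1}{n}\sum_{k=1}^{n} k = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}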

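By Property 4a of Expectation

E[W] = E\left[\sum_{i=1}^{n_1} x_i\right] = \sum_{i=1}^{n_1} E[x_i] = n_1\cdot\frac{n+1}{2} = \frac{n_1(n+1)}{2}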

As we did in the proof of Property 1, we can show by induction on n that

\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}

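From these it follows that, summing over all ordered pairs (k, l) of distinct integers between 1 and n,

\sum_{k \ne l} k\,l = \left(\sum_{k=1}^{n} k\right)^2 - \sum_{k=1}^{n} k^2 = \frac{n^2(n+1)^2}{4} - \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(n-1)(3n+2)}{12}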

We can now calculate the following expectations:

E[x_i^2] = \frac{1}{n}\sum_{k=1}^{n} k^2 = \frac{(n+1)(2n+1)}{6}

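Also, where i ≠ j, the pair (xi, xj) is equally likely to be any of the n(n−1) ordered pairs of distinct ranks, and so

E[x_i x_j] = \frac{1}{n(n-1)}\sum_{k \ne l} k\,l = \frac{(n+1)(3n+2)}{12}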

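By Property 2 of Expectation (case where i = j)

\mathrm{var}(x_i) = E[x_i^2] - (E[x_i])^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{(n+1)(n-1)}{12}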

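By Property 3 of Basic Concepts of Correlation when i ≠ j

\mathrm{cov}(x_i, x_j) = E[x_i x_j] - E[x_i]\,E[x_j] = \frac{(n+1)(3n+2)}{12} - \frac{(n+1)^2}{4} = -\frac{n+1}{12}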

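The moments used in this proof can be checked exactly for any small n by brute force. The sketch below does so with exact fractions; the function name `rank_moments` is hypothetical, and the code assumes no ties, so that a pair of ranks (xi, xj) with i ≠ j is equally likely to be any ordered pair of distinct integers from 1 to n.

```python
from fractions import Fraction
from itertools import permutations


def rank_moments(n):
    """Exact moments of ranks drawn without replacement from 1..n (no ties)."""
    pairs = list(permutations(range(1, n + 1), 2))  # ordered pairs (a, b) with a != b
    e_x = Fraction(sum(range(1, n + 1)), n)                    # E[x_i]
    e_x2 = Fraction(sum(k * k for k in range(1, n + 1)), n)    # E[x_i^2]
    e_xy = Fraction(sum(a * b for a, b in pairs), len(pairs))  # E[x_i x_j], i != j
    var = e_x2 - e_x ** 2                                      # var(x_i)
    cov = e_xy - e_x * e_x                                     # cov(x_i, x_j), i != j
    return e_x, e_x2, e_xy, var, cov


n = 9
e_x, e_x2, e_xy, var, cov = rank_moments(n)
print(e_x, e_x2, e_xy, var, cov)  # prints 5 95/3 145/6 20/3 -5/6
```

For n = 9 these agree with the closed forms (n+1)/2 = 5, (n+1)(2n+1)/6 = 95/3, (n+1)(3n+2)/12 = 145/6, (n+1)(n−1)/12 = 20/3 and −(n+1)/12 = −5/6. The negative covariance reflects the fact that two distinct observations cannot share a rank: if one rank is high, the other is slightly more likely to be low.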
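By an extended version of Property 5 of Basic Concepts of Correlation, where the second sum runs over the n1(n1 − 1) ordered pairs (i, j) with i ≠ j,

\mathrm{var}(W) = \sum_{i=1}^{n_1} \mathrm{var}(x_i) + \sum_{i \ne j} \mathrm{cov}(x_i, x_j) = \frac{n_1(n+1)(n-1)}{12} - \frac{n_1(n_1-1)(n+1)}{12} = \frac{n_1(n+1)(n-n_1)}{12} = \frac{n_1 n_2(n+1)}{12}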



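As a sanity check on the result of the proof, the mean n1(n+1)/2 and variance n1 n2 (n+1)/12 of W can be confirmed by enumerating every possible set of ranks for sample 1. This is only a sketch: the function name `exact_w_moments` and the sample sizes tried are arbitrary choices, and it assumes no ties, so that under the null hypothesis every subset of n1 ranks out of 1, …, n is equally likely.

```python
from fractions import Fraction
from itertools import combinations


def exact_w_moments(n1, n2):
    """Exact null mean and variance of the rank sum W of the sample of size n1."""
    n = n1 + n2
    # Every subset of n1 ranks from 1..n is equally likely under the null (no ties).
    sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    mean = Fraction(sum(sums), len(sums))
    var = Fraction(sum(s * s for s in sums), len(sums)) - mean ** 2
    return mean, var


for n1, n2 in [(2, 3), (3, 5), (4, 4)]:
    n = n1 + n2
    mean, var = exact_w_moments(n1, n2)
    # Compare the enumerated values with the closed forms derived in the proof.
    assert mean == Fraction(n1 * (n + 1), 2)
    assert var == Fraction(n1 * n2 * (n + 1), 12)
    print(n1, n2, mean, var)
```

Exact fractions are used instead of floating point so that the comparison with the closed forms is an equality rather than an approximate match.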

15 Responses to Wilcoxon Rank Sum Test – Advanced

  1. Marcel says:

    I have one question about this proof. You calculate the expectation E(rirj) for all j not equal to i. I don’t understand why we can take this expectation as equivalent to E(rirj) for all i, j. In the covariance we have to use E(rirj) over all ranks, but you use the expectation for all j not equal to i. Why is this correct? Can you explain this to me?

    Thanks for your answer!

    • Charles says:

      I show E[ri rj] both where i = j (i.e. var(ri)) and where i is not equal to j.

      • Marcel says:

        I am grateful to you for your answer. But I wanted to say that you use the expectation E(rirj) with i not equal to j in the covariance cov(rirj). I don’t understand why we can do this. I thought we need the expectation over all i and j (also of the double sum i*j).
        I made a print-screen with both places in your proof: https://image.prntscr.com/image/7lBOfbg1RribBEjyIbN1Kw.png

        • Charles says:

          Thanks for clarifying things. There are two cases: (1) where i = j and (2) where i is not equal to j. In case (1) cov(rirj) = var(ri), which is described in your print-screen. In case (2) the formula is the one shown in your print-screen. I have just updated the referenced webpage to try to make this a bit clearer. Does it help?

  2. Marcel says:

    Your site is very good!
    Have you thought about translating your work into other languages?

  3. willy kiptoo says:

    Thanks a lot, you really put it out on paper

  4. Orukpe Lewis says:

    I’m grateful for this better understanding

  5. Nnamdi says:

    Thanks. Good one

  6. shark says:

    thanks 😀

  7. fra says:

    thanx! really helpful!
