Wilcoxon Rank Sum Test – Proofs

Property 1

Suppose sample 1 has size n₁ and rank sum R₁, and sample 2 has size n₂ and rank sum R₂. Then R₁ + R₂ = n(n+1)/2, where n = n₁ + n₂.

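Proof: The ranks of the combined sample are 1, 2, …, n (tied values share the average of the ranks they occupy, which leaves the total unchanged), so R₁ + R₂ is the sum of the first n positive integers. It therefore suffices to show that this sum is $\frac{n(n+1)}{2}$, which can be proven by induction. For n = 1, we see that $\frac{n(n+1)}{2} = \frac{1(1+1)}{2} = 1$, which is the sum of the first positive integer. Assume the result is true for n. Then for n + 1 we have $1 + 2 + \dots + n + (n+1) = \frac{n(n+1)}{2} + (n+1) = \frac{n(n+1)+2(n+1)}{2} = \frac{(n+1)(n+2)}{2}$, which is the formula with n replaced by n + 1.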
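As a quick numerical check of Property 1, the following Python sketch ranks two small samples jointly and compares R₁ + R₂ with n(n+1)/2. The data values and the helper name `midranks` are assumptions made for illustration. Tied values receive the average of the ranks they occupy, which is the usual convention.

```python
def midranks(values):
    """Return 1-based ranks of values; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


# Hypothetical data, chosen only for illustration (note the tied value 15.5).
sample1 = [12.0, 15.5, 9.0, 15.5]
sample2 = [11.0, 20.0, 15.5, 8.0, 14.0]

ranks = midranks(sample1 + sample2)
n1, n2 = len(sample1), len(sample2)
n = n1 + n2
r1 = sum(ranks[:n1])  # rank sum of sample 1
r2 = sum(ranks[n1:])  # rank sum of sample 2

print(r1, r2, r1 + r2, n * (n + 1) / 2)
```

The last two printed values should agree whatever data are used, since the total of all ranks depends only on n.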

Property 2

When the two samples are sufficiently large (say of size > 10, although some say 20), the W statistic is approximately normal N(μ, σ²), where

$\mu = \frac{n_1(n_1+n_2+1)}{2}$ and $\sigma^2 = \frac{n_1 n_2(n_1+n_2+1)}{12}$

Proof: We prove that the mean and variance of W = R₁ are as described above. The normal approximation was proven in Mann & Whitney (1947) (see reference at the end of this webpage), and we won’t repeat the proof here.
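To illustrate how the normal approximation is typically used, here is a minimal Python sketch that turns a rank sum W into a z-score and a two-sided p-value. It assumes the standard mean n₁(n+1)/2 and variance n₁n₂(n+1)/12 with n = n₁ + n₂, and it applies neither a continuity correction nor a ties correction. The function names and input values are hypothetical.

```python
import math


def rank_sum_z(w, n1, n2):
    """z-score of the rank sum w of the sample of size n1 under the null hypothesis."""
    n = n1 + n2
    mu = n1 * (n + 1) / 2                       # mean of W
    sigma = math.sqrt(n1 * n2 * (n + 1) / 12)   # standard deviation of W
    return (w - mu) / sigma


def two_sided_p(z):
    """Two-sided p-value for z under the standard normal distribution."""
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs: two samples of size 12 and an observed rank sum of 117.
z = rank_sum_z(w=117, n1=12, n2=12)
p = two_sided_p(z)
print(z, p)
```

With small samples or many ties, an exact test or a corrected variance would be more appropriate than this sketch.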

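Let x_i = the rank of the ith data element in the smaller sample, for i = 1, …, n₁, so that W = x₁ + … + x_{n₁}. Under the assumption of the null hypothesis, each x_i is equally likely to be any of the ranks 1, …, n. Thus, by the formula for the sum of the first n positive integers used in the proof of Property 1

$E[x_i] = \frac{1}{n}\sum_{k=1}^{n} k = \frac{n+1}{2}$

By Property 4a of Expectation

$E[W] = E\left[\sum_{i=1}^{n_1} x_i\right] = \sum_{i=1}^{n_1} E[x_i] = \frac{n_1(n+1)}{2}$

As we did in the proof of Property 1, we can show by induction on n that

$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$

From these, it follows that

$\sum_{k \ne l} kl = \left(\sum_{k=1}^{n} k\right)^2 - \sum_{k=1}^{n} k^2 = \frac{n^2(n+1)^2}{4} - \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(n-1)(3n+2)}{12}$

We can now calculate the following expectations:

$E[x_i^2] = \frac{1}{n}\sum_{k=1}^{n} k^2 = \frac{(n+1)(2n+1)}{6}$

Also, where i ≠ j, the pair (x_i, x_j) is equally likely to be any of the n(n−1) ordered pairs of distinct ranks, and so

$E[x_i x_j] = \frac{1}{n(n-1)}\sum_{k \ne l} kl = \frac{(n+1)(3n+2)}{12}$

By Property 2 of Expectation (case where i = j)

$\mathrm{var}(x_i) = E[x_i^2] - (E[x_i])^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2-1}{12}$

By Property 3 of Basic Concepts of Correlation, when i ≠ j

$\mathrm{cov}(x_i, x_j) = E[x_i x_j] - E[x_i]E[x_j] = \frac{(n+1)(3n+2)}{12} - \frac{(n+1)^2}{4} = -\frac{n+1}{12}$

Using an extended version of Property 6 of Basic Concepts of Correlation we see that

$\mathrm{var}(W) = \sum_{i=1}^{n_1} \mathrm{var}(x_i) + 2\sum_{i<j} \mathrm{cov}(x_i, x_j) = \frac{n_1(n^2-1)}{12} - \frac{n_1(n_1-1)(n+1)}{12} = \frac{n_1(n+1)(n-n_1)}{12} = \frac{n_1 n_2(n+1)}{12}$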
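The mean and variance of W can also be checked by brute force for small samples. The sketch below assumes that, under the null hypothesis, every set of n₁ ranks drawn from 1, …, n is equally likely. It enumerates all such sets with exact fractions and compares the results with n₁(n+1)/2 and n₁n₂(n+1)/12, and it also checks the covariance −(n+1)/12 between two distinct ranks. The function names and the sample sizes are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_moments(n1, n2):
    """Exact mean and variance of W over all equally likely sets of n1 ranks."""
    n = n1 + n2
    sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    count = len(sums)
    mean = Fraction(sum(sums), count)
    var = Fraction(sum(s * s for s in sums), count) - mean ** 2
    return mean, var


def pair_covariance(n):
    """Exact cov(x_i, x_j) for i != j when (x_i, x_j) is a random ordered pair of distinct ranks."""
    ranks = range(1, n + 1)
    pairs = [(k, l) for k in ranks for l in ranks if k != l]
    e_product = Fraction(sum(k * l for k, l in pairs), len(pairs))
    e_single = Fraction(n + 1, 2)
    return e_product - e_single ** 2


n1, n2 = 3, 5  # hypothetical sample sizes, kept small so enumeration is cheap
n = n1 + n2
mean, var = exact_moments(n1, n2)

print(mean == Fraction(n1 * (n + 1), 2))         # compares with n1(n+1)/2
print(var == Fraction(n1 * n2 * (n + 1), 12))    # compares with n1*n2*(n+1)/12
print(pair_covariance(n) == Fraction(-(n + 1), 12))
```

Enumeration grows combinatorially with n, so this is only a sanity check for small n₁ and n₂, not a replacement for the algebraic proof.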

Reference

Mann, H. B. & Whitney, D. R. (1947) On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50-60.
http://projecteuclid.org/download/pdf_1/euclid.aoms/1177730491

26 thoughts on “Wilcoxon Rank Sum Test – Proofs”

Jack

August 16, 2017 at 11:18 am

Oh, now I understand it! Thanks for your response.
Jack

August 13, 2017 at 8:23 am

It’s 2Σ i!=j to n1 [-(n+1)/12]. I wrote it wrong. I think there shouldn’t be a ‘2’ in there.
- Charles
  
  August 14, 2017 at 10:38 am
  
  Jack,
  You need to sum all the terms cov(x_i,x_j) where i is not equal to j. Note that each such covariance appears twice, once as cov(x_i,x_j) and once as cov(x_j,x_i). Thus, if you take the sum over i < j, then you need to double the result. Another way to look at this is to count the pairs of indices from 1 to n1 where the indices are not equal. The answer is n1(n1-1), which is the value used in the proof. This is the same as 2 times n1(n1-1)/2, the latter being the number of pairs where the first index is less than the second index. To make this clearer and more accurate, I have now replaced the lower limit of the summation symbol by i < j (instead of i not equal to j). Thanks for bringing this issue to my attention. Charles
  - Jack
    
    July 7, 2018 at 9:54 am
    
    Mr. Charles, you may not believe it, but only today did I learn that you had replied to my message. My written English is not good, and it is difficult for me to access the world wide web. In 2017, the day after I asked you the question, I found that you had changed i!=j to i<j, but the website did not show me your reply. I spent a lot of time on your proof, especially on 2Σ i!=j to n1 [-(n+1)/12]. Now I see your message, but there are some words I do not understand, and I have forgotten many details of the proof. So I just want to know: is Σ i!=j to n1 [-(n+1)/12] right? And is Σ i!=j to n1 [-(n+1)/12] equal to 2Σ i<j to n1 [-(n+1)/12]?
    - Charles
      
      July 7, 2018 at 11:34 am
      
      Jack,
      Sorry, but I don’t understand the notation:
      Σ i!=j to n1 [-(n+1)/12] is right ? And Σ i!=j to n1 [-(n+1)/12] is equal 2Σ i<j to n1 [-(n+1)/12] ?
      - Jack
        
        July 7, 2018 at 12:26 pm
        
        In the third line from the end, you have now changed “2Σ (i!=j to n1) [-(n+1)/12]” to “2Σ (i<j to n1) [-(n+1)/12]”. I want to know whether “Σ (i!=j to n1) [-(n+1)/12]” equals “2Σ (i<j to n1) [-(n+1)/12]”.
      - Charles
        
        July 13, 2018 at 9:19 am
        
        Jack,
        I still don’t understand your notation. What does i! mean in the notation Σ (i!=j to n1) …
        Charles
      - Jack
        
        September 1, 2018 at 4:16 pm
        
        Sir,
        i!=j means i is not equal to j.
        In the third line from the end, you have now changed “2Σ (i!=j to n1) [-(n+1)/12]” to “2Σ (i<j to n1) [-(n+1)/12]”. I want to know whether “Σ (i!=j to n1) [-(n+1)/12]” equals “2Σ (i<j to n1) [-(n+1)/12]”.
      - Jack
        
        September 2, 2018 at 2:04 pm
        
        Now I have a Twitter account, so I have sent the picture to you.
      - Charles
        
        September 2, 2018 at 10:21 pm
        
        Jack,
        Thanks for sending me the tweet. This makes everything clear.
        The equation you wrote is correct (as long as i, j <= n1). Charles
      - Jack
        
        September 3, 2018 at 10:51 am
        
        Thank you!
Marcel

June 15, 2017 at 5:05 pm

I have one question about this proof. You calculate the expectation E(rirj) for all j not equal to i. I don’t understand why we can treat this expectation as equivalent to E(rirj) for all i, j. In the covariance we have to use E(rirj) over all ranks, but you use the expectation for all j not equal to i. Why is that correct? Can you explain this to me?

Thanks for your answer!
- Charles
  
  June 15, 2017 at 8:40 pm
  
  Marcel,
  I show E[ri rj] both where i = j (i.e. var(ri)) and where i is not equal to j.
  Charles
  - Marcel
    
    June 16, 2017 at 9:53 pm
    
    I am grateful to you for your answer. But what I wanted to say is that you use the expectation E(rirj) with i not equal to j for the covariance cov(ri, rj). I don’t understand why we can do this. I thought we needed the expectation over all i and j (including the double sum over i*j).
    I made a print-screen with both places in your proof: https://image.prntscr.com/image/7lBOfbg1RribBEjyIbN1Kw.png
    - Charles
      
      June 19, 2017 at 9:54 am
      
      Marcel,
      Thanks for clarifying things. There are two cases: (1) where i = j and (2) where i is not equal to j. In case (1) cov(ri, rj) = var(ri), which is the one described in your print-screen. In case (2) the formula is the one shown in your print-screen. I have just updated the referenced webpage to try to make this a bit clearer. Does it help?
      Charles
      - Marcel
        
        June 20, 2017 at 10:48 pm
        
        Thank you very much!
        I understand it now.
      - Charles
        
        June 21, 2017 at 9:40 am
        
        Marcel,
        Good to hear. Glad I could help.
        Charles
Marcel

June 4, 2017 at 2:01 pm

Your site is very good!
Have you thought about translating your work into other languages?
- Charles
  
  June 4, 2017 at 4:18 pm
  
  Marcel,
  I hadn’t. What languages did you have in mind?
  Charles
  - Marcel
    
    June 14, 2017 at 10:58 pm
    
    I was thinking of German and Polish. I could help you.
    - Charles
      
      June 15, 2017 at 8:42 pm
      
      Marcel,
      Thanks for the offer, but so far people haven’t been asking for translations. In any case, I’ll think about it.
      Charles
willy kiptoo

May 16, 2017 at 7:04 am

Thanks a lot, you really put it out on paper.
Orukpe Lewis

March 7, 2016 at 12:14 am

I’m grateful for this better understanding.
Nnamdi

February 17, 2016 at 7:37 am

Thanks. Good one
shark

September 17, 2015 at 12:52 pm

thanks 😀
fra

June 15, 2014 at 9:31 am

thanx! really helpful!