We now show how to create a confidence interval for the difference between the population medians using what is called the **Hodges-Lehmann estimation**.

**Example 1**: Find the 95% confidence interval for the difference between the population medians based on the data in Example 1 of Mann-Whitney Test (repeated in range A3:D18 of Figure 1).

**Figure 1 – Set-up for calculating the confidence interval**

There are 12 elements in the Control sample (range B4:B15) and 11 elements in the Drug sample (range C4:C14). Thus there are 12 × 11 = 132 pairs of elements from the two samples. All 132 possible differences are shown in range F4:P15. For example, the difference between the first Control element (cell B4) and the second Drug element (cell C5) is 52 – 31 = 21, shown in cell G4 and calculated by the formula =$E4-G$3. Note that the range F3:P3 contains the array formula =TRANSPOSE(C4:C14).
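Outside Excel, the same table of differences can be produced in a few lines of Python. This is a minimal sketch, not part of the Real Statistics software: the function name `pairwise_differences` is hypothetical, and the two-element samples in the usage line are made-up values except for the 52 and 31 quoted above.

```python
from itertools import product


def pairwise_differences(control, drug):
    """Return every difference control[i] - drug[j] (one per cell of the
    difference table), sorted in ascending order."""
    return sorted(c - d for c, d in product(control, drug))


# Hypothetical samples; the pair 52 - 31 = 21 matches the cell G4 example.
diffs = pairwise_differences([52, 40], [31, 45])
print(diffs)  # [-5, 7, 9, 21]
```

With samples of 12 and 11 values the list has 12 × 11 = 132 entries, one for each cell of F4:P15.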

Using the table of critical values in Mann-Whitney Table, we see that the two-tailed critical value for *α* = .05 when the sample sizes are 11 and 12 is 33. Alternatively, we can use the MCRIT function as shown in cell S5 of Figure 2.

**Figure 2 – Calculation of the confidence interval**

The 95% confidence interval is bounded by the 33rd smallest and 33rd largest values in range F4:P15, as calculated in cells S7 and S8, yielding the 95% confidence interval [-9, 50].
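The "k-th smallest and k-th largest difference" rule can be expressed directly in code. The following is a minimal Python sketch with a hypothetical function name and toy data; for Example 1 one would pass the two samples and k = 33.

```python
def hl_interval(control, drug, k):
    """Return the k-th smallest and k-th largest of all pairwise differences
    control[i] - drug[j] (k is 1-based)."""
    diffs = sorted(c - d for c in control for d in drug)
    n = len(diffs)
    if not 1 <= k <= n // 2:
        raise ValueError("k must be between 1 and half the number of differences")
    # Index k - 1 (0-based) is the k-th smallest; index n - k is the k-th largest.
    return diffs[k - 1], diffs[n - k]


# Toy data: the sorted differences are [0, 1, 1, 2, 2, 3].
print(hl_interval([1, 2, 3], [0, 1], 2))  # (1, 2)
```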

The median of all the values in range F4:P15, called the **Hodges-Lehmann median**, is 4 (cell S9). This can be used as an alternative measure of effect size.
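The Hodges-Lehmann median is simply the ordinary median of the pairwise differences. A minimal Python sketch (hypothetical function name, toy data) using the standard library's `statistics.median`, which, like Excel's MEDIAN, averages the two middle values when the count is even, as it is for the 132 differences in Example 1:

```python
from statistics import median


def hodges_lehmann(control, drug):
    """Median of all pairwise differences control[i] - drug[j]."""
    return median([c - d for c in control for d in drug])


# Toy data: differences [0, 1, 1, 2, 2, 3] -> average of 1 and 2.
print(hodges_lehmann([1, 2, 3], [0, 1]))  # 1.5
```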

Range S10:S13 is similar to range S5:S8, except that the confidence interval is based on the critical value in cell S5 plus 1 (shown in cell S10). Cell S11 shows the approximate alpha value corresponding to the value in cell S10. Thus the 94.45% confidence interval is [-8, 42], where 94.45% ≈ 1 – .05556.

**Example 2**: Find the 95% confidence interval for the difference between the population medians based on the data in Example 2 of Mann-Whitney Test (repeated in range A3:H13 of Figure 3).

**Figure 3 – Set-up for Mann-Whitney confidence interval**

Just as we did for Example 1, we create a table of differences. Since there are 40 non-smokers and 38 smokers, this is a 40 × 38 table occupying the range K4:AV43 of Figure 3 (only the upper-left portion of the table is visible).

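Since the sample sizes are larger than those included in the table of critical values, we use the normal approximation; i.e., the ranks of the lower and upper bounds of the confidence interval are

$$\frac{n_1 n_2}{2} \pm z_{crit}\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

(minus for the lower bound, plus for the upper bound), where $n_1$ and $n_2$ are the two sample sizes and $z_{crit}$ is the critical value of the standard normal distribution for *α*/2 = .025, i.e. $z_{crit}$ = NORM.S.INV(.975) ≈ 1.96.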
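The normal approximation can be sketched in Python, taking the bound ranks to be n1·n2/2 ∓ z_crit·√(n1·n2·(n1 + n2 + 1)/12) with no correction for ties. The function names are hypothetical, and the way an integer rank k is turned back into an alpha value, 2·(1 − Φ(z)) with z = (n1·n2/2 − k)/sd, is our own reading; we chose it because for n1 = 40 and n2 = 38 it gives the 95.11% and 94.99% coverages reported for Example 2.

```python
from math import floor, sqrt
from statistics import NormalDist


def normal_ranks(n1, n2, alpha=0.05):
    """Approximate ranks (among the n1*n2 sorted differences) of the lower
    and upper confidence bounds, using the normal approximation to U
    without a tie correction."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    center = n1 * n2 / 2
    return center - z_crit * sd, center + z_crit * sd


def approx_alpha(k, n1, n2):
    """Two-tailed alpha implied by using integer rank k for the lower bound."""
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (n1 * n2 / 2 - k) / sd
    return 2 * (1 - NormalDist().cdf(z))


lo, hi = normal_ranks(40, 38)  # roughly 563.9 and 956.1
k = floor(lo)
for rank in (k, k + 1):
    print(rank, round(100 * (1 - approx_alpha(rank, 40, 38)), 2))
# 563 95.11
# 564 94.99
```

Rounding the lower rank down gives an interval with alpha at most .05; the next rank up gives the slightly narrower interval with alpha just above .05.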

The calculations required to arrive at the 95% confidence interval and a Hodges-Lehmann median of 7 are shown in Figure 4. We see from the figure that there is a 95.11% confidence interval of [2, 13] and a 94.99% confidence interval also of [2, 13]. Since both intervals are the same, we conclude that the 95% confidence interval is indeed [2, 13].

This won’t always be the case. In general, we will find that the alpha value for the first confidence interval will be at most .05, while the second will be a little larger than .05, and so we will have two confidence intervals, one a little more than 95% and one a little less than 95%.

**Figure 4 – Calculation of the confidence interval**

**Real Statistics Function**: The Real Statistics Pack provides the following array function to calculate the confidence interval based on the samples in ranges R1 and R2 where *alpha* is the *α* value (default .05).

**MANN_CONF**(R1, R2, *lab, ttype, alpha*): returns a 9 × 1 column range containing the lower and upper bounds of the 1 – *α* confidence interval and the Hodges-Lehmann median. If *lab* = TRUE (default FALSE), an extra column of labels is included in the output.

If *ttype* = 0 (default) then the normal approximation is used; if *ttype* = 1 then MCRIT and MPROB are used with harmonic interpolation; if *ttype* = 2 then MCRIT and MPROB are used with linear interpolation; and if *ttype* = 3 then MANNINV and MANNDIST are used (as explained below).

For Example 1, the array formula =MANN_CONF(B4:B15,C4:C15,TRUE,1) returns the values shown in range R5:S13 of Figure 2. For Example 2, the array formula =MANN_CONF(A4:D13,E4:H13,TRUE) returns the values in range AY9:AY17 of Figure 4.

As we have seen in Example 2, with an integer critical value it is not always possible to achieve a confidence interval of exactly 1 – *α*. For this reason, MANN_CONF generates two confidence intervals: the first corresponds to an alpha value that is at most *α*, while the second corresponds to the smallest alpha value that is larger than *α*.
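The two-interval rule can be written generically. This Python sketch is our illustration of the rule as stated, not the MANN_CONF source code; `two_intervals` and its `alpha_of_k` argument are hypothetical names, where `alpha_of_k(k)` is any function returning the alpha of the interval bounded by the k-th smallest and k-th largest differences (for instance the normal approximation or the exact distribution of U). The `toy_alpha` in the usage line is purely illustrative and not a real distribution.

```python
def two_intervals(diffs, alpha_of_k, alpha=0.05):
    """Return (lower, upper, alpha) for the narrowest interval whose alpha is
    at most `alpha`, and for the next narrower one (alpha just above)."""
    d = sorted(diffs)
    n = len(d)
    k = 0
    # alpha_of_k increases with k, so move k up while alpha stays <= alpha.
    while k + 1 <= n // 2 and alpha_of_k(k + 1) <= alpha:
        k += 1
    if k == 0:
        raise ValueError("no interval achieves the requested alpha")
    first = (d[k - 1], d[n - k], alpha_of_k(k))
    second = None
    if k + 1 <= n // 2:
        second = (d[k], d[n - k - 1], alpha_of_k(k + 1))
    return first, second


toy_alpha = lambda k: k / 50  # illustrative only
print(two_intervals(list(range(1, 21)), toy_alpha))
# ((2, 19, 0.04), (3, 18, 0.06))
```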

**Observation**: We can also calculate a confidence interval using the Mann-Whitney exact test (see Mann-Whitney Exact Test), at least when the sample sizes are not too large.

**Example 3**: Repeat Example 1 using the Mann-Whitney exact test.

**Figure 5 – Confidence interval using the exact test**

The formulas are similar to those shown in Figure 2, except that this time the MANNINV and MANNDIST functions are used instead of the MCRIT and MPROB functions.
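MANNINV and MANNDIST are Real Statistics functions whose internals are not shown here. As an independent sketch, the exact null distribution of the Mann-Whitney U statistic can be computed by recursion on the largest observation. The function names are hypothetical, and we rely on the standard result that, for continuous data without ties, the interval from the k-th smallest to the k-th largest difference has coverage 1 – 2·P(U ≤ k – 1). Whether this matches the Real Statistics critical-value convention exactly is an assumption.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def u_counts(m, n):
    """counts[u] = number of the comb(m + n, m) equally likely orderings of
    m + n distinct values whose U equals u, where U counts pairs (x, y) with
    x from the first sample larger than y from the second."""
    if m == 0 or n == 0:
        return (1,)
    out = [0] * (m * n + 1)
    # Largest value comes from sample 1: it exceeds all n values of sample 2.
    for u, c in enumerate(u_counts(m - 1, n)):
        out[u + n] += c
    # Largest value comes from sample 2: it adds nothing to U.
    for u, c in enumerate(u_counts(m, n - 1)):
        out[u] += c
    return tuple(out)


def exact_alpha(k, m, n):
    """Exact two-tailed alpha, 2 * P(U <= k - 1), of the interval bounded by
    the k-th smallest and k-th largest differences."""
    counts = u_counts(m, n)
    return 2 * sum(counts[:k]) / comb(m + n, m)


# Sample sizes of Example 1 (12 and 11), for the ranks discussed above.
for k in (33, 34):
    print(k, exact_alpha(k, 12, 11))
```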

We can also create the same output by using the formula

=MANN_CONF(B4:B15,C4:C14,TRUE,3)

That seems great, exactly what I need. Do you happen to have the Excel workbook for that? I have some trouble doing the ‘by pairing the kth value in R4:R135 with the kth value’…

Thank you so much!

Sorry, I got this… thanks!

Hi Charles,

This is a really great site you have developed here. I have downloaded the Real Statistics add-in and tested it with your example above – it is working fine. I cannot seem to work it through with my own data, though. The problem seems to be with datasets that are larger than about 15 samples in each of 2 groups. Is there a workaround, or do you have plans to update the add-in to allow for larger sample sizes?

Eileen,

Yes, there is another approach which I am about to add to the website. It will work for any size samples.

Charles

Eileen,

I have now revised this webpage. The new approach will work for any size samples. Shortly, I will issue a new version of the Real Statistics software which will implement this approach.

Charles