If there are a large number of ties, then the denominator in the definition of Kendall’s tau (see Kendall’s Tau Basic Concepts) should be replaced by
where nx is the number of pairs with a tie in variable x and ny is the number of pairs with a tie in variable y. This version of Kendall’s Tau is often called tau-b (instead of the previous version which is called tau-a).
The calculation of ny is similar to that of D described in Kendall’s Tau Hypothesis Testing, namely for each i, count the number of j > i for which xi = xj. This sum is ny. Calculating nx is similar, although potentially easier since the xi are in ascending order.
Since in general C(m, 2) = 1 + 2 +⋯+ (m–1), it follows that
where ti = the number of elements in the ith group of ties among the x values and uj = the number of elements in the jth group of ties among the y values.
Example 1: Repeat the analysis for Example 1 of Kendall’s Tau Normal Approximation using Kendall’s tau for the data in range A3:B18 of Figure 1.
Figure 1 – Hypothesis testing for Kendall’s tau (with ties)
As we did in Example 1 of Kendall’s Tau Hypothesis Testing, we first sort the data, placing the results in range D3:E18. This time we can see that there are ties.
The calculation is similar to that used for Example 1 of Kendall’s Tau Normal Approximation, except that we need to account for the ties. In particular, the formula for inversions (D) needs to be modified. E.g. cell F4 in Figure 1 contains the formula =COUNTIFS(E5:E$19,”<“&E4, D5:D$19,”>” & D4).
In order to calculate the modified denominator for tau we need to calculate nx and ny. E.g. the calculation for nx is carried out by putting the formula =COUNTIF(D5:D$19,”=”&D4) in cell H4. We next highlight the range H4:H18 and press Ctrl-D to copy this formula into all the relevant cells in column H. Placing the formula = SUM(H4:H18) in cell H19 yields the value for nx. The value ny (cell I19) can be calculated in a similar way.
We can calculate the value of C as the sum of the concordance elements in a fashion similar to that used to calculate D.
E.g. cell G4 contains the formula =COUNTIFS(E5:E$19,”>”&E4, D5:D$19,”>” & D4). Alternatively we note that C = C(n, 2) – D – T. Now C(n, 2) = C(15, 2) = 105 (cell M5), D = 72 (cell F19) and T = nx + ny – nx&y = 7 + 4 – 1 = 10. The number of ties is equal to the number of ties in x plus the number of ties y minus the number of ties for both x and y, nx&y. We calculate nx&y as the sum of the cells in column J where for example cell J4 contains the formula =COUNTIFS(D5:D$19,”=”&D4,E5:E$19,”=”&E4).
Kendall’s tau (cell M8) is calculated by the formula =(M7-M6)/SQRT((M5-H19)*(M5–I19)).
Observation: If there are a lot of ties we also need to modify the calculation of the standard error as follows:
Example 2: Repeat the analysis for Example 1 using the improved version of the standard error and z described above.
We show the analysis in Figure 2.
Figure 2 – Hypothesis testing for Kendall’s tau: improved version
C and D are calculated as before, but this time we handle the ties using the formulas
Column H contains a non-zero value only for those values in column D (the x values) which are the first one of a group of ties. This value is one less than the number of ties in that group. Similarly column I handles the ties from column E (the y values). E.g. the value 78 occurs 4 times in column D, the first of these in cell D12, and so cell H12 contains the value 4 – 1 = 3. This is done using the formula
Thus there are C(4, 2) = 6 pairs with value 78.
Since for any m, C(m, 2) = m(m–1)/2, we can calculate the number of ties for x, namely nx = C(ti, 2) by the formula =SUMPRODUCT(H4:H18,H4:H18+1)/2, and similarly for ny. In a similar fashion we can calculate the values of all the formulas in the previous observation, as shown in Figure 3.
Figure 3 – Calculation of standard error
From m (cell Q9) we calculate the standard error (cell L9) and the z-score (cell L12) as shown in Figure 2.
Observation: The KCORREL function with ties = TRUE (default) can be employed to carry out all these calculations automatically (see above). Alternatively, the Kendall’s Tau option of the Correlation data analysis tool can also be employed to carry out all these calculations (see Correlation Data Analysis Tool).
For example, the key results for Example 2 can be calculated using the formula =KCORREL(A4:A18,B4:B18,TRUE), as shown in Figure 4.
Figure 4 – Output of KCORREL with ties correction