Definition 1: Let x1, …, xn be a sample for random variable x and let y1, …, yn be a sample for random variable y of the same size n. There are C(n, 2) possible ways of selecting two distinct observations (xi, yi) and (xj, yj) with i ≠ j. Classify each such pair of observations as concordant, discordant or neither as follows:
- concordant if (xi > xj and yi > yj) or (xi < xj and yi < yj)
- discordant if (xi > xj and yi < yj) or (xi < xj and yi > yj)
- neither if xi = xj or yi = yj (i.e. ties are not counted).
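The three cases above can be sketched in code. This is a minimal illustration, not part of the original definition: the function name classify_pair is hypothetical, and it relies on the fact that the product (xi – xj)(yi – yj) is positive for a concordant pair, negative for a discordant pair and zero when there is a tie in x or y.

```python
def classify_pair(xi, yi, xj, yj):
    """Classify the observations (xi, yi) and (xj, yj) per Definition 1.

    Hypothetical helper for illustration only.
    """
    product = (xi - xj) * (yi - yj)
    if product > 0:
        return "concordant"   # x and y move in the same direction
    if product < 0:
        return "discordant"   # x and y move in opposite directions
    return "neither"          # xi = xj or yi = yj (a tie)


print(classify_pair(1, 2, 3, 5))  # x and y both increase
print(classify_pair(1, 5, 3, 2))  # x increases while y decreases
print(classify_pair(1, 2, 1, 7))  # tie in x
```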
Now let C = the number of concordant pairs and D = the number of discordant pairs. Then define tau as

τ = (C – D) / C(n, 2)
Observation: If there are no ties, then C(n, 2) = C + D. Thus

τ = (C – D) / (C + D) = 1 – 2D / C(n, 2)
Observation: To facilitate the calculation of C – D it is best to first put all the x data elements in ascending order. If x and y are perfectly positively correlated, then all the values of y would be in ascending order too, and so if there are no ties then C = C(n, 2) and τ = 1.
Otherwise, there will be some inversions. With the x values in ascending order, for each i count the number of j > i for which yj < yi. The sum of these counts is D. If x and y are perfectly negatively correlated, then all the values of y would be in descending order, and so if there are no ties then D = C(n, 2) and τ = -1.
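The sorting approach can be sketched as follows. This is a rough illustration under the assumption that there are no ties in x or y: the data are reordered so that x is ascending, and D is obtained by counting, for each position, the later y values that are smaller. Since C = C(n, 2) – D when there are no ties, C – D = C(n, 2) – 2D. The function names and the sample data are hypothetical.

```python
def count_discordant(x, y):
    """Count discordant pairs by counting inversions in y after sorting by x.

    Assumes there are no ties in x or in y.
    """
    # Indices that put the x values in ascending order
    order = sorted(range(len(x)), key=lambda k: x[k])
    y_sorted = [y[k] for k in order]
    n = len(y_sorted)
    d = 0
    for i in range(n):
        for j in range(i + 1, n):
            if y_sorted[j] < y_sorted[i]:  # a later y is smaller: inversion
                d += 1
    return d


def tau_no_ties(x, y):
    """Kendall's tau for data without ties (requires n >= 2)."""
    n = len(x)
    total = n * (n - 1) // 2            # C(n, 2)
    d = count_discordant(x, y)
    return (total - 2 * d) / total      # (C - D) / C(n, 2) with C = total - D


x = [3, 1, 4, 2, 5]
y = [30, 10, 20, 40, 50]
print(count_discordant(x, y))  # 3
print(tau_no_ties(x, y))       # 0.4
```

The double loop is quadratic in n, which is fine for small samples; a merge-sort based inversion count would be a faster alternative for large ones.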
Proof that -1 ≤ τ ≤ 1: There are C(n, 2) pairings in total, with C(n, 2) = C + D + T where T = the number of tied pairs. Thus

τ = (C – D) / (C + D + T)
τ is maximum when D = T = 0 and so τ = 1. τ is minimum when C = T = 0 and so τ = -1.
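The counting argument can be checked numerically. The following is a minimal sketch, assuming tau is taken as (C – D) / C(n, 2) with C(n, 2) = C + D + T, and using a small made-up data set that contains ties. The function names kendall_counts and kendall_tau are hypothetical.

```python
from itertools import combinations


def kendall_counts(x, y):
    """Return (C, D, T): concordant, discordant and tied pair counts."""
    c = d = t = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:
            c += 1
        elif product < 0:
            d += 1
        else:
            t += 1  # tie in x or in y
    return c, d, t


def kendall_tau(x, y):
    """Tau as (C - D) / C(n, 2), where C(n, 2) = C + D + T (requires n >= 2)."""
    c, d, t = kendall_counts(x, y)
    return (c - d) / (c + d + t)


x = [1, 2, 2, 3]
y = [1, 3, 2, 2]
c, d, t = kendall_counts(x, y)
n = len(x)
print(c, d, t)                          # 3 1 2
print(c + d + t == n * (n - 1) // 2)    # True
print(-1 <= kendall_tau(x, y) <= 1)     # True
```

Because |C – D| ≤ C + D ≤ C + D + T, the ratio can never leave the interval from -1 to 1, which is what the last check illustrates.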