Cohen’s kappa is a measure of the agreement between two raters, where agreement due to chance is factored out. We now extend Cohen’s kappa to the case where the number of raters can be more than two. This extension is called Fleiss’ kappa. As with Cohen’s kappa, no weighting is used and the categories are considered to be unordered.
Let n = the number of subjects, k = the number of evaluation categories and m = the number of judges for each subject. E.g. for Example 1 of Cohen’s Kappa, n = 50, k = 3 and m = 2. While for Cohen’s kappa both judges evaluate every subject, in the case of Fleiss’ kappa, there may be many more than m judges and not every judge needs to evaluate each subject; what is important is that each subject is evaluated m times.
For every subject i = 1, 2, …, n and every evaluation category j = 1, 2, …, k, let xij = the number of judges that assign category j to subject i. Thus

\sum_{j=1}^{k} x_{ij} = m \quad \text{for each subject } i
The proportion of pairs of judges that agree in their evaluation of subject i is given by

p_i = \frac{1}{m(m-1)} \left( \sum_{j=1}^{k} x_{ij}^2 - m \right)
The mean of the pi is therefore

\bar{p} = \frac{1}{n} \sum_{i=1}^{n} p_i
For the agreement expected due to chance (the error term), we use the measure

\bar{p}_e = \sum_{j=1}^{k} q_j^2 \quad \text{where} \quad q_j = \frac{1}{nm} \sum_{i=1}^{n} x_{ij}

i.e. qj is the proportion of all assignments that were made to category j.
Definition 1: Fleiss’ Kappa is defined to be

\kappa = \frac{\bar{p} - \bar{p}_e}{1 - \bar{p}_e}
We can also define kappa for the jth category by

\kappa_j = 1 - \frac{\sum_{i=1}^{n} x_{ij}(m - x_{ij})}{nm(m-1)\, q_j (1 - q_j)}
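The calculations above can be sketched in Python. The helper names `fleiss_kappa` and `fleiss_kappa_category` are illustrative choices, not part of any library; the input is an n × k matrix of counts xij in which every row sums to m.

```python
from typing import List

def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for an n x k matrix of category counts.
    Each row must sum to m, the number of ratings per subject."""
    n = len(counts)          # number of subjects
    k = len(counts[0])       # number of categories
    m = sum(counts[0])       # ratings per subject
    # p_i = proportion of agreeing judge pairs for subject i
    p = [(sum(x * x for x in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p) / n
    # q_j = overall proportion of assignments made to category j
    q = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    p_e = sum(qj * qj for qj in q)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

def fleiss_kappa_category(counts: List[List[int]], j: int) -> float:
    """Kappa for category j (0-indexed), per the formula for kappa_j."""
    n = len(counts)
    m = sum(counts[0])
    q_j = sum(row[j] for row in counts) / (n * m)
    disagree = sum(row[j] * (m - row[j]) for row in counts)
    return 1 - disagree / (n * m * (m - 1) * q_j * (1 - q_j))
```

For instance, when every subject receives the same rating from all m judges (e.g. rows like `[3, 0]`), `fleiss_kappa` returns 1, the maximum possible agreement.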
The standard error for κj is given by the formula

s.e.(\kappa_j) = \sqrt{\frac{2}{nm(m-1)}}

and the standard error for κ by

s.e.(\kappa) = \frac{\sqrt{2}}{\sum_{j} q_j(1-q_j)\sqrt{nm(m-1)}} \sqrt{\Big(\sum_{j} q_j(1-q_j)\Big)^2 - \sum_{j} q_j(1-q_j)(1-2q_j)}
There is an alternative calculation of the standard error of κ provided in Fleiss’ original paper, namely the square root of the following:

\frac{2\left(\bar{p}_e - (2m-3)\,\bar{p}_e^{\,2} + 2(m-2)\sum_{j=1}^{k} q_j^3\right)}{nm(m-1)(1-\bar{p}_e)^2}
The test statistics zj = κj/s.e.(κj) and z = κ/s.e.(κ) approximately follow a standard normal distribution, which allows us to calculate p-values and confidence intervals. E.g. the 1 – α confidence interval for kappa is therefore approximated as
κ ± NORMSINV(1 – α/2) * s.e.
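The z-statistic, p-value and confidence interval can be sketched in Python as follows. `NormalDist().inv_cdf` from the standard library plays the role of Excel’s NORMSINV; the helper names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def se_kappa_j(n: int, m: int) -> float:
    """Standard error of the category-level kappa: sqrt(2 / (n m (m - 1)))."""
    return sqrt(2 / (n * m * (m - 1)))

def kappa_test(kappa: float, se: float, alpha: float = 0.05):
    """Two-tailed z-test and 1 - alpha confidence interval for a kappa
    estimate with standard error se."""
    z = kappa / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # Excel's NORMSINV(1 - alpha/2)
    ci = (kappa - z_crit * se, kappa + z_crit * se)
    return z, p_value, ci
```

For α = .05, `NormalDist().inv_cdf(.975)` ≈ 1.96, so the interval is roughly κ ± 1.96 · s.e., matching the NORMSINV formula above.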
Example 1: Six psychologists (judges) evaluate 12 patients as to whether they are psychotic, borderline, bipolar or none of these. The ratings are summarized in range A3:E15 of Figure 1. Determine the overall agreement between the psychologists, subtracting out agreement due to chance, using Fleiss’ kappa. Also find Fleiss’ kappa for each disorder.
Figure 1 – Calculation of Fleiss’ Kappa
For example, we see that 4 of the psychologists rated subject 1 as psychotic and 2 rated subject 1 as borderline, while no psychologist rated subject 1 as bipolar or none of these.
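Plugging these subject-1 counts into the formula for pi gives a quick check of the agreement proportion for that subject:

```python
# counts for subject 1 read from Figure 1: 4 psychotic, 2 borderline, 0 bipolar, 0 none
x = [4, 2, 0, 0]
m = sum(x)  # 6 judges
p_1 = (sum(v * v for v in x) - m) / (m * (m - 1))  # (16 + 4 - 6) / 30 = 14/30
```

i.e. about 46.7% of the 15 pairs of judges agree on subject 1.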
We use the formulas described above to calculate Fleiss’ kappa in the worksheet shown in Figure 1. The formulas in the ranges H4:H15 and B17:B22 are displayed in text format in column J, except that the formulas in cells H9 and B19 are not displayed in the figure since they are rather long. These formulas are:
Figure 2 – Long formulas in worksheet of Figure 1
Note too that row 18 (labelled b) contains the formulas for qj(1–qj).
The p-values (and confidence intervals) show us that all of the kappa values are significantly different from zero.
Real Statistics Function: The Real Statistics Resource Pack contains the following supplemental function:
KAPPA(R1, j, lab, alpha, tails, orig): if lab = FALSE (default) returns a 7 × 1 range consisting of κ if j = 0 (default) or κj if j > 0 for the data in R1 (where R1 is formatted as in range B4:E15 of Figure 1), plus the standard error, z-stat, z-crit, p-value and lower and upper bound of the 1 – alpha confidence interval, where alpha = α (default .05) and tails = 1 or 2 (default). If lab = TRUE then an extra column of labels is included in the output. If orig = TRUE then the original calculation for the standard error is used; default is FALSE.
For Example 1, KAPPA(B4:E15) = .2968 and KAPPA(B4:E15,2) = .28. The complete output for KAPPA(B4:E15,,TRUE) is shown in Figure 3.
Figure 3 – Output from KAPPA function
Real Statistics Data Analysis Tool: The Reliability data analysis tool supplied in the Real Statistics Resource Pack can also be used to calculate Fleiss’ kappa.
To calculate Fleiss’ kappa for Example 1, press Ctrl-m and choose the Reliability option from the menu that appears. Fill in the dialog box that appears (see Figure 7 of Cronbach’s Alpha) by inserting B4:E15 in the Input Range, choosing the Fleiss’ kappa option and clicking on the OK button.
The output is shown in Figure 4.
Figure 4 – Output from Fleiss’ Kappa analysis tool
Note that if you change the values for alpha (cell C26) and/or tails (cell C27) the output in Figure 4 will change automatically.