Hubert's multi-rater kappa revisited.
Antonio Martín Andrés, María Álvarez Hernández. Published in: The British Journal of Mathematical and Statistical Psychology (2019)
There is a frequent need to measure the degree of agreement among R observers who independently classify n subjects within K nominal or ordinal categories. The most popular methods for this purpose are kappa-type measures. When R = 2, Cohen's kappa coefficient (weighted or not) is well known. When defined in the ordinal case with quadratic weights, Cohen's kappa has the advantage of coinciding with the intraclass and concordance correlation coefficients. When R > 2, there is more disagreement among proposed definitions, because the definition of the kappa coefficient depends on how the phrase 'an agreement has occurred' is interpreted. This paper uses Hubert's interpretation, that 'an agreement occurs if and only if all raters agree on the categorization of an object', which leads to Hubert's (nominal) and Schuster and Smith's (ordinal) kappa coefficients. Formulae for the large-sample variances of the estimators of all these coefficients are given; these formulae make it possible to illustrate the different ways of carrying out inference and, with the use of simulation, to select the optimal procedure. In addition, it is shown that Schuster and Smith's kappa coefficient also coincides with the intraclass and concordance correlation coefficients when it, too, is defined with quadratic weights.
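To make Hubert's interpretation concrete, the following is a minimal sketch (not the authors' code) of the resulting nominal-case point estimator: observed agreement is the proportion of subjects on which all R raters coincide, and chance agreement is the probability, under independence of the raters' marginal distributions, that all R classifications coincide on some category. The function name `hubert_kappa` and the toy data are illustrative; the paper's large-sample variance formulae and inference procedures are not reproduced here.

```python
import numpy as np

def hubert_kappa(ratings):
    """Point estimate of Hubert's multi-rater kappa for nominal data.

    ratings : (n, R) integer array; ratings[i, r] is the category
    assigned by rater r to subject i.

    Under Hubert's interpretation, an agreement occurs only when all
    R raters place a subject in the same category.
    """
    ratings = np.asarray(ratings)
    n, R = ratings.shape
    cats = np.unique(ratings)

    # Observed proportion of subjects on which all R raters agree.
    p_o = np.mean([(row == row[0]).all() for row in ratings])

    # Chance agreement: sum over categories of the product of each
    # rater's marginal probability of using that category.
    p_e = sum(
        np.prod([np.mean(ratings[:, r] == k) for r in range(R)])
        for k in cats
    )
    return (p_o - p_e) / (1.0 - p_e)

# Toy example: R = 3 raters classify n = 6 subjects into K = 3 categories.
data = [[0, 0, 0],
        [1, 1, 1],
        [2, 2, 1],
        [0, 1, 0],
        [2, 2, 2],
        [1, 1, 1]]
print(hubert_kappa(data))  # ~0.62: all three raters agree on 4 of 6 subjects
```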