R package krippendorffsalpha provides tools for measuring agreement using Krippendorff’s α coefficient.

In R, Krippendorff’s α can be computed using function kripp.alpha of package irr (Gamer et al. 2012), function kripp.boot of package kripp.boot (Proutskova and Gruszczynski 2020), function krippalpha of package icr (Staudt and L’Ecuyer 2020), and functions krippen.alpha.raw and krippen.alpha.dist of package irrCAC (Gwet 2019).
However, these packages fail to provide a number of useful features. In
this article we present package krippendorffsalpha, which improves upon the above-mentioned packages in (at least) the following ways. Package krippendorffsalpha provides S3 methods confint, influence, plot, and summary.

The remainder of this article is organized as follows. In
Section 2, we locate Krippendorff’s
Since Krippendorff’s
The UML class diagram (Fowler et al. 2004) shown below in
Figure 1 provides a conceptual roadmap for our
development. Briefly, a special case of
In this section, we develop Krippendorff’s
In this setup, we have
We can eliminate the arithmetic means in (1) by employing the identity
Level of Measurement | Distance Function |
---|---|
interval | |
nominal | |
ratio | |
bipolar | |
circular | |
ordinal | |
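As a rough illustration of what a distance function for a given level of measurement looks like in R, the sketch below writes out three customary choices: the squared difference for interval data, the discrete metric for nominal data, and a squared relative difference for ratio data. The function names are hypothetical, and the forms are assumptions based on common usage; the package's built-in definitions may differ in detail.

```r
# Customary squared-distance functions (assumed forms; names are hypothetical).
d2_interval <- function(x, y) (x - y)^2               # interval: squared difference
d2_nominal  <- function(x, y) as.numeric(x != y)      # nominal: discrete metric
d2_ratio    <- function(x, y) ((x - y) / (x + y))^2   # ratio: squared relative difference

d2_interval(27.3, 27.8)   # small disagreement between two interval scores
d2_nominal(3, 3)          # identical categories, so no disagreement
d2_ratio(2, 6)            # disagreement relative to the magnitude of the scores
```

Each function takes two scalar scores and returns a nonnegative number that is zero when the scores coincide.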
In any case, (3) is nonparametric for arbitrary
To show that Krippendorff’s
Note that this formulation is quite general since the objects in
In the preceding sections, we generalized
The statistical model underpinning Sklar’s
To see that the one-way mixed-effects ANOVA model (and hence
Mielke and Berry (2007) describe hypothesis testing for MRPPs. Specifically, they
discuss three approaches: permutation, Monte Carlo resampling, and
Pearson type III moment approximation. The latter has significant
advantages. For Krippendorff’s
Collect the scores in an
For
For each
For each
The resulting collection
We carried out a number of realistic simulation experiments and found
that this approach to interval estimation performs well in a wide
variety of circumstances. When the true distribution of
For some levels of measurement, one may, in the interest of robustness,
be tempted to replace squares with absolute values (in the distance
function
Here we illustrate the use of
krippendorffsalpha
by applying Krippendorff’s
Range of Agreement | Interpretation |
---|---|
α ≤ 0.2 | Slight Agreement |
0.2 < α ≤ 0.4 | Fair Agreement |
0.4 < α ≤ 0.6 | Moderate Agreement |
0.6 < α ≤ 0.8 | Substantial Agreement |
α > 0.8 | Near-Perfect Agreement |
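A labelling rule of this kind is easy to apply programmatically. The sketch below is a hypothetical helper, not part of the package, and it assumes the customary cutoffs of 0.2, 0.4, 0.6, and 0.8 with right-closed intervals.

```r
# Hypothetical helper: label agreement estimates using assumed cutoffs.
interpret_agreement <- function(alpha) {
  cut(alpha,
      breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
      labels = c("Slight", "Fair", "Moderate", "Substantial", "Near-Perfect"),
      right = TRUE)   # intervals are closed on the right, e.g. (0.6, 0.8]
}

as.character(interpret_agreement(c(0.15, 0.55, 0.95)))
```

Applying the helper to both limits of a confidence interval, and not only to the point estimate, shows whether the interval straddles two categories.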
Consider the following data, which appear in Krippendorff (2013). These
are nominal values (in
 | u1 | u2 | u3 | u4 | u5 | u6 | u7 | u8 | u9 | u10 | u11 | u12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
c1 | 1 | 2 | 3 | 3 | 2 | 1 | 4 | 1 | 2 | • | • | • |
c2 | 1 | 2 | 3 | 3 | 2 | 2 | 4 | 1 | 2 | 5 | • | 3 |
c3 | • | 3 | 3 | 3 | 2 | 3 | 4 | 2 | 2 | 5 | 1 | • |
c4 | 1 | 2 | 3 | 3 | 2 | 4 | 4 | 1 | 2 | 5 | 1 | • |
Note that the scores for all units except the sixth are constant or
nearly so. This suggests near-perfect agreement, and so we should expect
To apply Krippendorff’s
R> library(krippendorffsalpha)
krippendorffsalpha: Measuring Agreement Using Krippendorff's Alpha Coefficient
Version 1.1 created on 2021-01-13.
copyright (c) 2020-2021, John Hughes
For citation information, type citation("krippendorffsalpha").
Type help(package = krippendorffsalpha) to get started.
Now, we create the dataset as a matrix such that each row corresponds to a unit and each column corresponds to a coder.
R> nominal = matrix(c(1,2,3,3,2,1,4,1,2,NA,NA,NA,
+ 1,2,3,3,2,2,4,1,2,5,NA,3,
+ NA,3,3,3,2,3,4,2,2,5,1,NA,
+ 1,2,3,3,2,4,4,1,2,5,1,NA), 12, 4)
R> nominal
[,1] [,2] [,3] [,4]
[1,] 1 1 NA 1
[2,] 2 2 3 2
[3,] 3 3 3 3
[4,] 3 3 3 3
[5,] 2 2 2 2
[6,] 1 2 3 4
[7,] 4 4 4 4
[8,] 1 1 2 1
[9,] 2 2 2 2
[10,] NA 5 5 5
[11,] NA NA 1 1
[12,] NA 3 NA NA
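Because matrix fills its result column by column, each run of 12 values passed to c corresponds to one coder. The sketch below builds the same matrix with cbind, which may make the units-by-coders layout easier to see; the object names c1 through c4 and nominal2 are chosen here for illustration only.

```r
# Each vector holds one coder's scores for units 1 through 12 (NA = missing).
c1 <- c(1, 2, 3, 3, 2, 1, 4, 1, 2, NA, NA, NA)
c2 <- c(1, 2, 3, 3, 2, 2, 4, 1, 2, 5, NA, 3)
c3 <- c(NA, 3, 3, 3, 2, 3, 4, 2, 2, 5, 1, NA)
c4 <- c(1, 2, 3, 3, 2, 4, 4, 1, 2, 5, 1, NA)

# cbind() places coders in columns, so rows are units.
nominal2 <- cbind(c1, c2, c3, c4)

# Same values as the column-major matrix() call, apart from column names.
same <- identical(unname(nominal2), matrix(c(c1, c2, c3, c4), 12, 4))

dim(nominal2)
rowSums(!is.na(nominal2))   # number of scores available for each unit
```

The row counts make it easy to spot units with fewer than two scores, such as the twelfth unit here.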
Next, we apply Krippendorff’s α to these data. Argument level is set to "nominal", which selects the discrete metric. Argument confint defaults to TRUE, and control parameter bootit defaults to 1,000, so a bootstrap sample of size 1,000 is produced for interval estimation. We set control parameter parallel equal to FALSE because the dataset is too small to warrant parallelization of the bootstrap computation. Finally, we set argument verbose equal to TRUE so that a progress bar is shown during the bootstrap computation. The computation took less than one second.
R> set.seed(42)
R> fit.full = krippendorffs.alpha(nominal, level = "nominal", control = list(parallel = FALSE),
+ verbose = TRUE)
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
As is customary in R, one can view a summary by passing the fit object to summary.krippendorffsalpha, an S3 method. If krippendorffs.alpha was called with confint = TRUE, summary displays a 95% confidence interval by default. The confidence level can be specified using argument conf.level. In any case, the quantile method (Davison and Hinkley 1997) is used to estimate the confidence limits. Any additional arguments passed to summary.krippendorffsalpha are passed on to R’s quantile function. This allows the user to control, for example, how the sample quantiles are computed.
R> summary(fit.full)
Krippendorff's Alpha
Data: 12 units x 4 coders
Call:
krippendorffs.alpha(data = nominal, level = "nominal", verbose = TRUE,
control = list(parallel = FALSE))
Control parameters:
parallel FALSE
bootit 1000
Results:
Estimate Lower Upper
alpha 0.7429 0.4644 1
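The quantile method itself is simple to sketch in base R. The example below uses a simulated vector as a stand-in for a bootstrap sample; it is not package output, and quantile_ci is a hypothetical helper. Additional arguments such as type are forwarded to quantile, which is the mechanism described above.

```r
set.seed(1)
# Stand-in for a bootstrap sample of estimates (simulated, not package output).
boot_sample <- rbeta(1000, shape1 = 8, shape2 = 2)

# Quantile-method limits: the a and 1 - a sample quantiles, a = (1 - conf.level) / 2.
quantile_ci <- function(boot, conf.level = 0.95, ...) {
  a <- (1 - conf.level) / 2
  quantile(boot, probs = c(a, 1 - a), ...)   # extra arguments go to quantile()
}

ci95 <- quantile_ci(boot_sample)                               # default 95% limits
ci99 <- quantile_ci(boot_sample, conf.level = 0.99, type = 1)  # other level and quantile type
ci95
ci99
```

Because the limits are sample quantiles of values that lie in the parameter space, the resulting interval cannot extend beyond the range of the bootstrap estimates.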
We see that
Perhaps the substantial disagreement for the sixth unit was influential enough to pull the estimate down. We can use influence.krippendorffsalpha, another S3 method, to investigate. This function, like other R versions of influence (e.g., influence.lm, influence.glm), computes DFBETA statistics (Young 2017), as illustrated below.
R> (inf.6 = influence(fit.full, units = 6))
$dfbeta.units
6
-0.1141961
Leaving out the sixth unit yields a DFBETA statistic of -0.11, which implies that the estimate would have been 0.857 had that unit been excluded from the analysis:
R> fit.full$alpha.hat - inf.6$dfbeta.units
alpha
0.8571429
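The arithmetic above relies on the convention that a DFBETA statistic is the full-data estimate minus the estimate obtained with one unit removed. The sketch below illustrates that convention with a deliberately simple toy statistic, the proportion of units whose available scores all coincide. Both functions are hypothetical, and the toy statistic is not Krippendorff’s α.

```r
# Toy agreement statistic (hypothetical, not Krippendorff's alpha):
# the proportion of units whose available scores all coincide.
prop_unanimous <- function(x) {
  mean(apply(x, 1, function(r) length(unique(r[!is.na(r)])) <= 1))
}

# DFBETA for unit i: full-data estimate minus the estimate with unit i removed.
dfbeta_units <- function(x, stat, units = seq_len(nrow(x))) {
  full <- stat(x)
  sapply(units, function(i) full - stat(x[-i, , drop = FALSE]))
}

scores <- rbind(c(1, 1, 1),
                c(2, 2, 2),
                c(1, 2, 3),    # a unit with complete disagreement
                c(3, 3, NA))
full  <- prop_unanimous(scores)
dfb_3 <- dfbeta_units(scores, prop_unanimous, units = 3)
full - dfb_3   # estimate with the third unit left out
```

A negative DFBETA, as for the third unit here, indicates that the unit lowers the estimate.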
Let us call krippendorffs.alpha again to get a new interval.
R> fit.sub = krippendorffs.alpha(nominal[-6, ], level = "nominal",
+ control = list(parallel = FALSE))
R> confint(fit.sub)
0.025 0.975
0.6616541 1.0000000
We see that excluding the sixth unit leads to a 95% confidence interval of (0.66, 1.00). The interval was obtained using confint.krippendorffsalpha, whose level argument defaults to 0.95, in keeping with R’s other confint methods. Note that confint.krippendorffsalpha, like summary.krippendorffsalpha, passes any additional arguments on to R’s quantile function.
We conclude this example by producing a visual display of our results (Figure 3). The figure was produced via a call to S3 method plot.krippendorffsalpha, which in turn calls hist and abline. Because we set density = FALSE, the plot does not show a kernel density estimate. Function plot.krippendorffsalpha is capable of producing highly customized plots; see the package documentation for details.
R> plot(fit.sub, xlim = c(0, 1), xlab = "Bootstrap Estimates", main = "Nominal Data",
+ density = FALSE)
Since the dataset used in this example has missing values, we take this opportunity to explain how the package handles missingness. First, the scores for a given unit of analysis are included in the computation only if two or more scores are present for that unit. Otherwise, the unit’s row of the data matrix is simply ignored. Second, if two or more scores are present for a given unit, each NA for that unit is ignored in the computations for that row. This is handled both by the loop (adjusted denominator) and by the distance function, which should return 0 if either of its arguments is NA. In the next example, we illustrate this by way of a user-defined distance function; the package’s built-in distance functions take the same approach.
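The two rules can be made concrete with a short sketch. The code below is one way to realize them and is not the package's actual implementation: d2_na and unit_disagreement are hypothetical names, and the adjusted denominator is assumed here to be the number of pairs among the scores that are present.

```r
# Squared-difference distance that returns 0 when either argument is NA.
d2_na <- function(x, y) {
  d <- (x - y)^2
  if (is.na(d)) 0 else d
}

# Mean pairwise distance within one unit, using only the scores present.
# Returns NA for a unit with fewer than two scores, which would be skipped.
unit_disagreement <- function(row, dist = d2_na) {
  obs <- row[!is.na(row)]
  m <- length(obs)
  if (m < 2) return(NA_real_)
  total <- 0
  for (j in 1:(m - 1))
    for (k in (j + 1):m)
      total <- total + dist(obs[j], obs[k])
  total / choose(m, 2)   # denominator adjusted to the m scores present
}

unit_disagreement(c(1, 1, NA, 3))     # three scores: pairs (1,1), (1,3), (1,3)
unit_disagreement(c(NA, 3, NA, NA))   # one score: unit ignored
```

Guarding against NA inside the distance function is redundant in this sketch, since missing scores are dropped first, but it mirrors the requirement that a distance function return 0 when given a missing argument.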
The data for this example, some of which appear in Figure 4, are 323 pairs of T2* relaxation times (a magnetic resonance quantity) for femoral cartilage (Nissi et al. 2015) in patients with femoroacetabular impingement (Figure 5), a hip condition that can lead to osteoarthritis. One measurement was taken when a contrast agent was present in the tissue, and the other measurement was taken in the absence of the agent. The aim of the study was to determine whether raw and contrast-enhanced T2* measurements agree closely enough to be interchangeable for the purpose of quantitatively assessing cartilage health.
 | u1 | u2 | u3 | u4 | u5 | … | u319 | u320 | u321 | u322 | u323 |
---|---|---|---|---|---|---|---|---|---|---|---|
c1 | 27.3 | 28.5 | 29.1 | 31.2 | 33.0 | … | 19.7 | 21.9 | 17.7 | 22.0 | 19.5 |
c2 | 27.8 | 25.9 | 19.5 | 27.8 | 26.6 | … | 18.3 | 23.1 | 18.0 | 25.7 | 21.7 |
First, we load the cartilage data, which are included in the package. The cartilage data are stored in a data frame; we convert the data frame to a matrix, which is the format required by krippendorffs.alpha.
R> data(cartilage)
R> cartilage = as.matrix(cartilage)
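Now, we compute the estimate for the interval level of measurement, using a bootstrap sample of size 10,000 computed in parallel on three nodes. Setting argument verbose to TRUE causes the fitting function to display a progress bar once again. The computation took five seconds to complete.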
R> set.seed(12)
R> fit.sed = krippendorffs.alpha(cartilage, level = "interval", verbose = TRUE,
+ control = list(bootit = 10000, parallel = TRUE,
+ nodes = 3))
Control parameter 'type' must be "SOCK", "PVM", "MPI", or "NWS". Setting it to "SOCK".
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s
A call of function summary.krippendorffsalpha produced the output shown below.
R> summary(fit.sed)
Krippendorff's Alpha
Data: 323 units x 2 coders
Call:
krippendorffs.alpha(data = cartilage, level = "interval", verbose = TRUE,
control = list(bootit = 10000, parallel = TRUE, nodes = 3))
Control parameters:
bootit 10000
parallel TRUE
nodes 3
type SOCK
Results:
Estimate Lower Upper
alpha 0.8369 0.808 0.8648
We see that
Figure 6 provides a visual display of the cartilage
results. The histogram and kernel density estimate show the expected
large-sample behavior of
We mentioned above that attempting to robustify Krippendorff’s
First, define a new distance function as follows. Note that any user-defined distance function must deal explicitly with NAs if the data at hand exhibit missingness. There are no missing values in the cartilage data, but we illustrate the handling of NA anyway.
R> L1.dist = function(x, y)
+ {
+ d = abs(x - y)
+ if (is.na(d))
+ d = 0
+ d
+ }
Now we call krippendorffs.alpha, supplying our new distance function via the level argument.
R> fit.L1 = krippendorffs.alpha(cartilage, level = L1.dist, verbose = TRUE,
+ control = list(bootit = 10000, parallel = TRUE,
+ nodes = 3))
Control parameter 'type' must be "SOCK", "PVM", "MPI", or "NWS". Setting it to "SOCK".
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s
The results are summarized below. These results strongly suggest that
only moderate to substantial agreement exists between raw T2*
measurements and contrast-enhanced T2* measurements. This contradicts
not only our
R> summary(fit.L1)
Krippendorff's Alpha
Data: 323 units x 2 coders
Call:
krippendorffs.alpha(data = cartilage, level = L1.dist, verbose = TRUE,
control = list(bootit = 10000, parallel = TRUE, nodes = 3))
Control parameters:
bootit 10000
parallel TRUE
nodes 3
type SOCK
Results:
Estimate Lower Upper
alpha 0.6125 0.5761 0.648
In this article, we described Krippendorff’s
We demonstrated the use of krippendorffsalpha version 1.1 by analyzing two datasets: a nominal dataset previously analyzed by Krippendorff, and a sample of raw and contrast-enhanced T2* values from an MRI study of hip cartilage. These analyses highlighted the benefits of the package, which include the use of S3 methods, parallel bootstrap computation, support for user-defined distance functions, and a means of identifying influential units and/or coders.
The results in this paper were obtained using R 4.0.3 for macOS and the pbapply 1.4-2 package. R itself and all packages used (save kripp.boot) are available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org. Package krippendorffsalpha may be downloaded from CRAN or from the author’s GitHub repository, which can be found at https://github.com/drjphughesjr/krippendorffsalpha. Information about the author’s other R packages can be found at http://www.johnhughes.org/software.html.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hughes, "krippendorffsalpha: An R Package for Measuring Agreement Using Krippendorff's Alpha Coefficient", The R Journal, 2021
BibTeX citation
@article{RJ-2021-046,
  author = {Hughes, John},
  title = {krippendorffsalpha: An R Package for Measuring Agreement Using Krippendorff's Alpha Coefficient},
  journal = {The R Journal},
  year = {2021},
  note = {https://rjournal.github.io/},
  volume = {13},
  issue = {1},
  issn = {2073-4859},
  pages = {413-425}
}