We present QCA, a package for performing Qualitative Comparative Analysis (QCA). QCA is becoming increasingly popular with social scientists, but none of the existing software alternatives covers the full range of core procedures. This gap is now filled by QCA. After a mapping of the method’s diffusion, we introduce some of the package’s main capabilities, including the calibration of crisp and fuzzy sets, the analysis of necessity relations, the construction of truth tables and the derivation of complex, parsimonious and intermediate solutions.
Qualitative Comparative Analysis (QCA), a research method popularized
largely through the work of Charles Ragin
(Ragin 1987, 2000, 2008), counts among the most influential
recent innovations in social science methodology. In line with Ragin’s
own background, QCA was initially employed only by a small number
of (political) sociologists (e.g., Amenta et al. 1992; Griffin et al. 1991; Wickham-Crowley 1991). Since then, however,
the method has made inroads into political science and international
relations (e.g., Vis 2009; Thiem 2011), business and economics (e.g., Evans and Aligica 2008; Valliere et al. 2008), management and organization (e.g., Greckhamer 2011), legal studies and criminology
(Miethe and Drass 1999; Arvind and Stirton 2010), education (e.g., Schneider and Sadowski 2010; Glaesser and Cooper 2011), and health policy research (e.g., Harkreader and Imershein 1999; Schensul et al. 2010). Figure 1 charts the
trend in the total number of QCA applications that have appeared in
peer-reviewed journal articles since 1984, broken down by the method’s three
variants: crisp-set QCA (csQCA), multi-value QCA (mvQCA) and fuzzy-set
QCA (fsQCA).
Allowing for a publication lag of about two years, an average of 4.2
applications per year were published during the first decade following the
introduction of csQCA in Ragin (1987). But only his sequel “Fuzzy-Set
Social Science” (Ragin 2000) seems to have finally gotten the “Ragin Revolution”
(Vaisey 2009) off the ground. In the years from 2003 to 2007, the
average number of applications per year had risen to 13.6 before the absolute
number of applications more than tripled from 12 in 2007 to 39 in 2011.
Despite the introduction of fsQCA, applications of csQCA have continued
to increase, from four in 2001 to 22 in 2011. In contrast to csQCA and
fsQCA, mvQCA has remained underutilized. Of a total of 280 applications
between 1984 and 2012, only ten have employed this variant. Even when
accounting for the fact that it was introduced only in 2004, 17 years
after csQCA and four years after fsQCA, this represents a
disproportionately low number.
QCA’s methodological success story has created a growing demand for
tailored software, which has been met almost exclusively by two
programmes: Charles Ragin and Sean Davey’s (2009) fs/QCA and
Lasse Cronqvist’s (2011) Tosmana. Until recently, however,
users of other operating systems were at a disadvantage, as neither programme
ran on anything other than Microsoft Windows. As of version
1.3.2.0, Tosmana also supports other operating systems. In 2008 and
2012, respectively, Kyle Longest and Stephen Vaisey’s (2008) fuzzy package
for Stata and Christopher Reichert and Claude Rubinson’s
(2013) Kirq were developed as alternatives to fs/QCA. For
the R environment, Adrian Duşa’s
QCA package was first
added in 2006, and in 2009 Ronggui Huang (2011) released the
QCA3 package. The detailed
market shares of these software solutions are also shown in
Figure 1.
Not as unequal as their market shares, but still significantly different, are the capabilities of these software solutions. Table 1 provides an overview of the functionality each programme offers. The alternatives to QCA differ in their capabilities, but none covers the entire range of basic procedures. Kirq, fs/QCA and fuzzy cannot handle multi-value sets, whereas Tosmana cannot process fuzzy sets. The analysis of necessity relations is not implemented in Tosmana either, and the other alternatives, except Kirq, offer only limited user-driven routines. Complex and parsimonious solutions can be found by all packages, but among the alternatives only fs/QCA generates intermediate solutions on an automated basis.
Function | Tosmana | Kirq | fs/QCA | fuzzy | QCA3 | QCA |
---|---|---|---|---|---|---|
variant | | | | | | |
csQCA | full | full | full | full | full | full |
mvQCA | full | no | no | no | full | full |
fsQCA | no | full | full | full | full | full |
(tQCA) | no | no | full | no | full | full |
solution type | | | | | | |
complex | full | full | full | full | full | full |
intermediate | no | partial | full | no | partial | full |
parsimonious | full | full | full | full | full | full |
procedure | | | | | | |
necessity tests | no | full | partial | partial | partial | full |
parameters of fit | no | full | full | partial | partial | full |
calibration | partial | no | partial | partial | partial | full |
factorization | no | no | no | no | no | full |
identify (C)SAs | full | no | no | no | full | full |
statistical tests | no | no | no | full | partial | no |

Versions: Tosmana 1.3.2.0; Kirq 2.1.9; fs/QCA 2.5; fuzzy st0140_2; QCA3 0.0-5; QCA 1.0-5.
The calibration of crisp sets is limited in fs/QCA and QCA3. Tosmana
cannot handle fuzzy sets, but it provides more elaborate tools for the
calibration of crisp sets. In addition to Ragin’s (2008) “direct
method” and “indirect method”, fuzzy offers a set-normalizing linear
membership function. Most importantly, it also includes various
statistical procedures for coding truth tables, the appropriateness of
which largely depends on the research design.
QCA combines and enhances the individual strengths of other software
solutions. It can process all QCA variants (including temporal QCA
(tQCA)), generates all solution types, and offers a wide range of
procedures. For example, QCA provides four classes of functions for
almost all kinds of calibration requirements and has an automated
routine for the analysis of necessity relations. QCA is also the only
package that can factorize any Boolean expression. As in Tosmana and
QCA3, simplifying assumptions can also be identified. Unlike Tosmana,
however, which does not present any parameters of fit, QCA produces
inclusion, coverage and PRI (Proportional Reduction in Inconsistency)
scores for both necessity and sufficiency relations in mvQCA.
In summary, a comprehensive QCA software solution has been missing so far. Researchers have often been limited in their analyses when using one programme, or they had to resort to different programmes for performing all required operations. This gap is now filled by the QCA package, which seeks to provide a user-friendly yet powerful command-line interface alternative to the two dominant graphical user interface solutions fs/QCA and Tosmana. In the remainder of this article, we introduce some of the package’s most important functions, including the calibration of sets, the analysis of necessity relations, the construction of truth tables and the derivation of complex, parsimonious and intermediate solution types.
The process of translating base variables (also referred to as raw data)
into condition or outcome set membership scores is called calibration,
in fsQCA also called fuzzification. In contrast to csQCA, continuous base
variables need not be categorized directly in fsQCA but can be
transformed with the help of continuous functions, a procedure called
assignment by transformation (Verkuilen 2005, 465). Ragin (2008), for
example, suggests a piecewise-defined logistic function. Sufficient for
the vast majority of fuzzification needs, QCA offers the calibrate()
function, one of whose flexible function classes for positive end-point
concepts is given in Equation (1).
In addition, calibrate() can generate
set membership scores for sets based on negative or positive mid-point
concepts (Thiem and Duşa 2013, 55–62). If no suitable thresholds have been
found even after all means of external and internal identification have
been exhausted, QCA’s findTh() function can be used to search for
thresholds by means of hierarchical cluster analysis.
> library(QCA)
> # base variable and vector of thresholds
> b <- sort(rnorm(15)); th <- quantile(b, c(0.1, 0.5, 0.9))
> # create bivalent crisp set
> calibrate(b, thresholds = th[2])
[1] 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
> # create trivalent crisp set using thresholds derived from cluster analysis
> calibrate(b, thresholds = findTh(b, groups = 3))
[1] 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2
> # fuzzification using Equation (1)
> round(calibrate(b, type = "fuzzy", thresholds = th), 2)
[1] 0.00 0.00 0.04 0.32 0.42 0.47 0.48 0.50 0.59 0.72 0.77 0.93 0.94 1.00 1.00
> # negation of previous result
> round(calibrate(b, type = "fuzzy", thresholds = rev(th)), 2)
[1] 1.00 1.00 0.96 0.68 0.58 0.53 0.52 0.50 0.41 0.28 0.23 0.07 0.06 0.00 0.00
> # fuzzification using piecewise logistic
> round(calibrate(b, type = "fuzzy", thresholds = th, logistic = TRUE), 2)
[1] 0.02 0.04 0.06 0.25 0.38 0.45 0.46 0.50 0.64 0.79 0.83 0.93 0.93 0.96 0.97
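The piecewise logic of end-point fuzzification can be sketched in a few lines of base R. The function below is our own hypothetical illustration, not the code inside calibrate(): it assumes three increasing thresholds (exclusion e, crossover c, inclusion i), linear segments by default, and two shape exponents p and q that we introduce only for illustration. It mirrors the qualitative pattern in the transcript above: zero membership up to e, exactly 0.5 at c and full membership above i.

```r
# Hypothetical sketch of a piecewise end-point membership function
# (an illustration only, not QCA's internal implementation).
# b:    numeric base variable
# th:   increasing thresholds c(e, c, i) = exclusion, crossover, inclusion
# p, q: assumed shape exponents for the lower and upper segments (1 = linear)
fuzzy_endpoint <- function(b, th, p = 1, q = 1) {
  stopifnot(length(th) == 3, th[1] < th[2], th[2] < th[3])
  e <- th[1]; cr <- th[2]; i <- th[3]
  out <- numeric(length(b))            # values with b <= e stay at 0
  lower <- b > e & b <= cr             # segment rising from 0 to 0.5
  upper <- b > cr & b <= i             # segment rising from 0.5 to 1
  out[lower] <- 0.5 * ((e - b[lower]) / (e - cr))^p
  out[upper] <- 1 - 0.5 * ((i - b[upper]) / (i - cr))^q
  out[b > i] <- 1
  out
}

b <- c(1, 2, 3.5, 5, 6.5, 8, 9)
fuzzy_endpoint(b, th = c(2, 5, 8))       # 0.00 0.00 0.25 0.50 0.75 1.00 1.00
1 - fuzzy_endpoint(b, th = c(2, 5, 8))   # membership in the negated set
```

Taking the complement 1 - μ corresponds to what the transcript obtains by reversing the thresholds: each negated score equals one minus the original score.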
Whenever the occurrence of an event
Provided that
For analyzing necessity relations, QCA offers the superSubset()
function. Notably, superSubset() does not require users to predefine the
combinations to be tested, and so removes the risk of leaving
potentially interesting results undiscovered. The initial set of
combinations always consists of all trivial intersections of the
conditions. If none of them passes the inclusion cut-off, superSubset()
automatically switches to forming set unions until the least complex
form has been found.
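The parameters of fit behind such a search can be illustrated with a short base-R sketch. It assumes the standard set-theoretic definitions, with necessity inclusion equal to sum(min(X, Y)) / sum(Y) and necessity coverage equal to sum(min(X, Y)) / sum(X), and it forms a union of sets as the case-wise maximum. The helper name necessity_fit() and the data are hypothetical. The example shows why switching to unions can pay off: a single condition may fail the inclusion cut-off while its union with another condition passes.

```r
# Necessity parameters of fit, assuming the standard definitions:
#   inclusion = sum(min(X, Y)) / sum(Y)
#   coverage  = sum(min(X, Y)) / sum(X)
necessity_fit <- function(x, y) {
  overlap <- sum(pmin(x, y))
  c(incl = overlap / sum(y), cov = overlap / sum(x))
}

# Hypothetical crisp data for six cases (not Krook's data)
A <- c(1, 1, 0, 0, 1, 0)
B <- c(0, 1, 1, 0, 0, 0)
Y <- c(1, 1, 1, 0, 0, 0)

necessity_fit(A, Y)            # incl = 2/3, cov = 2/3: A misses the third case
necessity_fit(pmax(A, B), Y)   # union A + B: incl = 1, cov = 0.75
```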
For demonstration purposes, we reanalyze the data from Krook’s
(2010) csQCA on women’s representation in 22 Western democratic
parliaments. Countries with electoral systems of proportional
representation (
> data(Krook)
> Krook
ES QU WS WM LP WNP
SE 1 1 1 0 0 1
FI 1 0 1 0 0 1
NO 1 1 1 1 1 1
.. . . . . . .
<rest omitted>
> superSubset(Krook, outcome = "WNP", cov.cut = 0.52)
incl PRI cov.r
--------------------------------
1 ES+LP 1.000 1.000 0.733
2 ES+WM 1.000 1.000 0.524
3 WS+WM+LP 1.000 1.000 0.611
4 QU+wm+LP 1.000 1.000 0.550
5 QU+WM+lp 1.000 1.000 0.524
6 QU+WS+LP 1.000 1.000 0.550
7 QU+WS+WM 1.000 1.000 0.524
8 es+QU+WS 1.000 1.000 0.524
--------------------------------
When not specified otherwise, all sets in the data except the outcome are
assumed to be conditions. By default, the function tests for necessity,
but sufficiency relations can also be analyzed. No trivial intersection
has passed the inclusion cut-off, and superSubset() has thus formed
unions of conditions. Substantively, the first combination
Whenever the occurrence of an event
The classical device for analyzing sufficiency relations is the truth
table, which lists all
 | | | | OUT | |
---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | |
2 | 1 | 1 | 0 | 1 | |
3 | 1 | 0 | 1 | 1 | |
4 | 1 | 0 | 0 | 1 | |
5 | 0 | 1 | 1 | 0 | |
6 | 0 | 1 | 0 | C | |
7 | 0 | 0 | 1 | ? | 0 |
8 | 0 | 0 | 0 | ? | 0 |
It is important to emphasize that the outcome value is not the same as
the outcome set, the latter of which does not show up in a QCA truth
table. Instead, the outcome value is based on the sufficiency inclusion
score, returning a truth value that indicates the degree to which the
evidence is consistent with the hypothesis that a sufficiency relation
between a configuration and the outcome set exists. Configurations
The truthTable() function can generate truth tables for all three main
QCA variants without users having to specify which variant they use. The
data are automatically converted into the appropriate format.
> KrookTT <- truthTable(Krook, outcome = "WNP")
> KrookTT
OUT: outcome value
n: number of cases in configuration
incl: sufficiency inclusion score
PRI: proportional reduction in inconsistency
ES QU WS WM LP OUT n incl PRI
3 0 0 0 1 0 0 2 0.000 0.000
4 0 0 0 1 1 1 1 1.000 1.000
9 0 1 0 0 0 0 1 0.000 0.000
11 0 1 0 1 0 0 4 0.000 0.000
12 0 1 0 1 1 1 1 1.000 1.000
18 1 0 0 0 1 0 1 0.000 0.000
21 1 0 1 0 0 1 1 1.000 1.000
24 1 0 1 1 1 1 1 1.000 1.000
25 1 1 0 0 0 0 3 0.000 0.000
26 1 1 0 0 1 1 1 1.000 1.000
27 1 1 0 1 0 1 1 1.000 1.000
28 1 1 0 1 1 1 2 1.000 1.000
29 1 1 1 0 0 1 1 1.000 1.000
32 1 1 1 1 1 1 2 1.000 1.000
At a minimum, truthTable() requires an appropriate dataset and the
outcome argument, which identifies the outcome set. If the conditions
argument is not provided, all sets in the data except the outcome are
assumed to be the conditions. By default, logical remainders are not
printed unless specified otherwise by the logical argument complete.
The logical argument show.cases prints the names of the objects next to
the configuration in which they have membership above 0.5.
The truthTable() function includes three cut-off arguments that
influence how OUT is coded: n.cut, incl.cut1 and incl.cut0. The first
argument, n.cut, sets the minimum number of cases with membership above
0.5 needed in order not to code a configuration as a logical remainder.
The second argument, incl.cut1, specifies the minimal sufficiency
inclusion score for a non-remainder configuration to be coded as
positive. The third argument, incl.cut0, offers the possibility of
coding configurations as contradictions when their inclusion score is
neither high enough to consider them positive nor low enough to code
them as negative. If the inclusion score of a non-remainder
configuration falls below incl.cut0, this configuration is always
considered negative. By means of the sort.by argument, the truth table
can also be ordered along inclusion scores, numbers of cases or both, in
increasing or decreasing order. If the original condition set labels
are rather long, the logical letters argument can be used to replace
the set labels with upper case letters in alphabetical order.
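The coding rules just described can be made concrete with a simplified re-implementation in base R. This is a hypothetical sketch, not QCA's truthTable(). It handles only bivalent (crisp or fuzzy) conditions and borrows the argument names n.cut, incl.cut1, incl.cut0 and complete for readability. It assumes the usual definitions of sufficiency inclusion, sum(min(X, Y)) / sum(X), and of PRI, (sum(min(X, Y)) - sum(min(X, Y, 1 - Y))) / (sum(X) - sum(min(X, Y, 1 - Y))). A case's membership in a configuration is taken as the minimum over its (possibly negated) condition memberships.

```r
# Simplified, hypothetical re-implementation of truth-table coding for
# bivalent (crisp or fuzzy) conditions; not the code of QCA's truthTable().
tt_sketch <- function(data, outcome, n.cut = 1, incl.cut1 = 1,
                      incl.cut0 = incl.cut1, complete = FALSE) {
  stopifnot(n.cut >= 1, incl.cut0 <= incl.cut1)
  conds <- setdiff(names(data), outcome)
  k <- length(conds)
  X <- as.matrix(data[conds])
  y <- data[[outcome]]
  # All 2^k configurations; reversing the columns of expand.grid() makes the
  # first condition vary slowest, so row r encodes the binary number r - 1.
  configs <- as.matrix(expand.grid(rep(list(0:1), k)))[, k:1, drop = FALSE]
  colnames(configs) <- conds
  res <- lapply(seq_len(nrow(configs)), function(r) {
    v <- configs[r, ]
    # membership in the configuration: minimum over the condition memberships,
    # negated (1 - x) where the configuration requires absence
    m <- apply(X, 1, function(x) min(ifelse(v == 1, x, 1 - x)))
    n <- sum(m > 0.5)
    if (n < n.cut) return(list(OUT = "?", n = n, incl = NA_real_, PRI = NA_real_))
    both <- sum(pmin(m, y))
    contra <- sum(pmin(m, y, 1 - y))
    incl <- both / sum(m)
    out <- if (incl >= incl.cut1) "1" else if (incl < incl.cut0) "0" else "C"
    list(OUT = out, n = n, incl = incl, PRI = (both - contra) / (sum(m) - contra))
  })
  tt <- data.frame(configs,
                   OUT = vapply(res, function(z) z$OUT, character(1)),
                   n = vapply(res, function(z) z$n, integer(1)),
                   incl = vapply(res, function(z) z$incl, numeric(1)),
                   PRI = vapply(res, function(z) z$PRI, numeric(1)),
                   stringsAsFactors = FALSE)
  if (!complete) tt <- tt[tt$OUT != "?", , drop = FALSE]  # hide remainders
  tt
}

# Four hypothetical cases, two crisp conditions and a crisp outcome
d <- data.frame(A = c(1, 1, 1, 0), B = c(1, 1, 0, 1), Y = c(1, 0, 1, 1))
tt_sketch(d, outcome = "Y", incl.cut1 = 0.8, incl.cut0 = 0.4, complete = TRUE)
```

In this toy example, configuration 1 (A = 0, B = 0) contains no case and is therefore a remainder. Rows 2 and 3 are positive. Row 4 holds one case with and one case without the outcome, so its inclusion score of 0.5 lies between the two cut-offs and the row is coded as a contradiction.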
The leftmost column lists the configuration row index values from the complete truth table. Sufficiency inclusion and PRI scores are also provided in the two rightmost columns. Once the truth table is fully coded, it can be minimized according to the theorems of Boolean algebra (McCluskey 1965, 84–89).
The canonical union resulting from the truth table presented in Table 2 is given by Equation (5). It consists of four fundamental intersections (FI), each of which corresponds to one positive configuration. Generally, all FIs also represent positive configurations, but not all positive configurations become FIs. The analyst may decide to exclude some of these configurations from the minimization process on theoretical or empirical grounds.
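The first step of such a minimization in the classical Quine-McCluskey algorithm can be sketched as follows. This base-R function is a hypothetical illustration, not the enhanced algorithm implemented in eqmcc(). Configurations are written as strings of 0s and 1s, one digit per condition. Pairs that differ in exactly one condition are merged, with "-" marking the eliminated condition, and implicants that can no longer be merged are prime implicants (PIs). In the four positive configurations of Table 2, the first condition is present in all four FIs, so they collapse into a single PI containing only that condition.

```r
# Hypothetical sketch of the prime-implicant phase of Quine-McCluskey.
# Implicants are strings such as "110"; "-" marks an eliminated condition.
qm_primes <- function(implicants) {
  current <- unique(implicants)
  primes <- character(0)
  while (length(current) > 0) {
    used <- rep(FALSE, length(current))
    merged <- character(0)
    for (a in seq_along(current)) {
      for (b in seq_along(current)) {
        if (b <= a) next
        x <- strsplit(current[a], "")[[1]]
        y <- strsplit(current[b], "")[[1]]
        pos <- which(x != y)
        # merge only if exactly one fixed (non-"-") position differs
        if (length(pos) == 1 && x[pos] != "-" && y[pos] != "-") {
          x[pos] <- "-"
          merged <- c(merged, paste(x, collapse = ""))
          used[a] <- TRUE
          used[b] <- TRUE
        }
      }
    }
    primes <- c(primes, current[!used])   # nothing left to merge with: prime
    current <- unique(merged)
  }
  unique(primes)
}

# The four positive configurations (FIs) of Table 2, one digit per condition
qm_primes(c("111", "110", "101", "100"))   # "1--": only the first condition remains
```

The second phase, choosing a minimal set of PIs from the PI chart, is not part of this sketch.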
The central function of the QCA package that performs the minimization
is eqmcc()
(enhanced
Quine-McCluskey) (Duşa 2007, 2010). It
can derive complex, parsimonious and intermediate solutions from a truth
table object or a suitable dataset. In contrast to complex solutions,
parsimonious solutions incorporate logical remainders into the
minimization process without any prior assessment by the analyst as to
whether a sufficiency relation is plausible or not. Intermediate
solutions offer a middle way insofar as those logical remainders that
have been used in the derivation of the parsimonious solution are
filtered according to the analyst’s directional expectations about the
impact of each single condition set value on the overall sufficiency
relation of the configuration of which it is part and the outcome set.
By formulating such expectations, the analyst excludes difficult logical
remainders as FIs from the canonical union, whereas those logical
remainders that enter the canonical union are easy. The complex
solution, which is the default option, can be generated by eqmcc()
with minimal typing effort.
> KrookSC <- eqmcc(KrookTT, details = TRUE)
> KrookSC
n OUT = 1/0/C: 11/11/0
Total : 22
S1: ES*QU*ws*LP + ES*QU*ws*WM + es*ws*WM*LP + ES*WS*wm*lp + ES*WS*WM*LP
incl PRI cov.r cov.u
---------------------------------------
ES*QU*ws*LP 1.000 1.000 0.273 0.091
ES*QU*ws*WM 1.000 1.000 0.273 0.091
es*ws*WM*LP 1.000 1.000 0.182 0.182
ES*WS*wm*lp 1.000 1.000 0.182 0.182
ES*WS*WM*LP 1.000 1.000 0.273 0.273
---------------------------------------
S1 1.000 1.000 1.000
The truth table object KrookTT that was generated above is passed to
eqmcc(). No further information is necessary in order to arrive at the
complex solution. The logical argument details causes all parameters
of fit to be printed together with the minimal union S1: inclusion
(incl), PRI (PRI), raw coverage (cov.r) and unique coverage (cov.u)
scores for each PI as well as the minimal union. In addition to
details = TRUE, the logical argument show.cases also prints the
names of the objects that are covered by each PI.
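How raw and unique coverage relate to each other can be shown with a small base-R sketch under the usual definitions. The raw coverage of a PI is sum(min(PI, Y)) / sum(Y). The solution is the case-wise maximum over its PIs. The unique coverage of a PI is the coverage of the whole solution minus the coverage of the solution without that PI. The function name suff_fit() and the toy data are hypothetical, and this is not the code behind eqmcc().

```r
# Hypothetical sketch (not eqmcc()'s code) of sufficiency parameters of fit.
# pims: matrix of case memberships in each PI (one column per PI)
# y:    outcome memberships
suff_fit <- function(pims, y) {
  inclusion <- function(x) sum(pmin(x, y)) / sum(x)
  coverage <- function(x) sum(pmin(x, y)) / sum(y)
  union_of <- function(cols) {               # set union = case-wise maximum
    if (length(cols) == 0) return(rep(0, nrow(pims)))
    apply(pims[, cols, drop = FALSE], 1, max)
  }
  all_cols <- seq_len(ncol(pims))
  sol <- union_of(all_cols)
  # unique coverage: what the solution loses when PI j is dropped
  cov.u <- vapply(all_cols, function(j)
    coverage(sol) - coverage(union_of(setdiff(all_cols, j))), numeric(1))
  rbind(cbind(incl = apply(pims, 2, inclusion),
              cov.r = apply(pims, 2, coverage),
              cov.u = cov.u),
        S1 = c(inclusion(sol), coverage(sol), NA))
}

# Toy data: five cases, two PIs and a crisp outcome (all hypothetical)
pims <- cbind(P1 = c(1, 1, 0, 0, 0), P2 = c(0, 1, 1, 0, 0))
y <- c(1, 1, 1, 1, 0)
suff_fit(pims, y)
```

Here each PI has a raw coverage of 0.5 but a unique coverage of only 0.25, because both PIs cover the second case. The solution as a whole covers three of the four outcome cases (0.75).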
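If alternative minimal unions exist, all of them are printed if the row
dominance principle for PIs is not applied, as specified in the logical
argument rowdom. One PI dominates another if it covers all the
configurations the other covers, in which case the dominated PI is
eliminated. For the parsimonious solution below, logical remainders are
included via include = "?", and all alternative minimal unions are
printed by setting rowdom to FALSE.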
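The row dominance rule itself is easy to express in code. The sketch below is a hypothetical base-R illustration of the classical rule on a 0/1 PI chart, with PIs as rows and positive configurations as columns. A PI is dropped when another retained PI covers every configuration it covers, and of two identical rows the first is kept. How eqmcc() resolves such ties is not reproduced here.

```r
# Hypothetical sketch of the classical row dominance rule (not eqmcc()'s code).
# chart: 0/1 matrix with PIs as rows and positive configurations as columns.
row_dominance <- function(chart) {
  keep <- rep(TRUE, nrow(chart))
  for (i in seq_len(nrow(chart))) {
    for (j in seq_len(nrow(chart))) {
      if (i == j || !keep[j]) next
      covers_all <- all(chart[j, ] >= chart[i, ])  # j covers everything i covers
      same_rows <- all(chart[j, ] == chart[i, ])
      # drop i if j strictly dominates it, or if j is an earlier identical row
      if (covers_all && (!same_rows || j < i)) {
        keep[i] <- FALSE
        break
      }
    }
  }
  chart[keep, , drop = FALSE]
}

chart <- rbind(P1 = c(1, 1, 0), P2 = c(1, 0, 0), P3 = c(0, 0, 1))
row_dominance(chart)   # P2 is dominated by P1, so only P1 and P3 remain
```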
> KrookSP <- eqmcc(KrookTT, include = "?", rowdom = FALSE, details = TRUE)
> KrookSP
n OUT = 1/0/C: 11/11/0
Total : 22
S1: WS + ES*WM + QU*LP + (es*LP)
S2: WS + ES*WM + QU*LP + (WM*LP)
-------------------
incl PRI cov.r cov.u (S1) (S2)
-----------------------------------------------
WS 1.000 1.000 0.455 0.182 0.182 0.182
ES*WM 1.000 1.000 0.545 0.091 0.091 0.091
QU*LP 1.000 1.000 0.545 0.091 0.091 0.091
-----------------------------------------------
es*LP 1.000 1.000 0.182 0.000 0.091
WM*LP 1.000 1.000 0.636 0.000 0.091
-----------------------------------------------
S1 1.000 1.000 1.000
S2 1.000 1.000 1.000
The intermediate solution for bivalent set data requires a vector of
directional expectations in the direxp
argument, where “0” denotes
absence, “1” presence and “-1” neither. The intermediate solution with
all conditions expected to contribute to a positive outcome value when
present is generated as follows:
> KrookSI <- eqmcc(KrookTT, include = "?", direxp = c(1,1,1,1,1), details = TRUE)
> KrookSI
n OUT = 1/0/C: 11/11/0
Total : 22
p.sol: WS + ES*WM + QU*LP + WM*LP
S1: ES*WS + WM*LP + ES*QU*LP + ES*QU*WM
incl PRI cov.r cov.u
------------------------------------
ES*WS 1.000 1.000 0.455 0.182
WM*LP 1.000 1.000 0.636 0.182
ES*QU*LP 1.000 1.000 0.455 0.091
ES*QU*WM 1.000 1.000 0.455 0.091
------------------------------------
S1 1.000 1.000 1.000
For intermediate solutions, eqmcc() also prints the parsimonious
solution (p.sol) whose simplifying assumptions have been used in
filtering logical remainders. The PI chart of this intermediate solution
(i.sol), which has been derived from the (first and only) complex and
the (first and only) parsimonious solution (C1P1), can then be
inspected by accessing the corresponding component in the returned
object.
> KrookSI$PIchart$i.sol$C1P1
4 12 21 24 26 27 28 29 32
ES*WS - - x x - - - x x
WM*LP x x - x - - x - x
ES*QU*LP - - - - x - x - x
ES*QU*WM - - - - - x x - x
If several minimal sums exist under both the parsimonious and complex
solution, the PI chart of the respective combination for the
intermediate solution can be accessed by replacing the numbers in the
C1P1
component.
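Solving a PI chart means finding the smallest sets of PIs that jointly cover every column. The brute-force base-R sketch below is a hypothetical illustration. eqmcc() need not work this way, and plain enumeration grows exponentially with the number of PIs. Applied to the chart printed above, it confirms that all four PIs are needed, because each one covers at least one configuration that no other PI covers.

```r
# Hypothetical brute-force PI chart solver (exponential; for illustration only).
# chart: 0/1 matrix, rows = PIs, columns = positive configurations.
minimal_covers <- function(chart) {
  pis <- rownames(chart)
  for (size in seq_along(pis)) {
    combos <- combn(length(pis), size, simplify = FALSE)
    covering <- Filter(function(idx) all(colSums(chart[idx, , drop = FALSE]) > 0),
                       combos)
    # the first size with any covering combination gives the minimal sums
    if (length(covering) > 0) return(lapply(covering, function(idx) pis[idx]))
  }
  list()   # no cover exists (some column is covered by no PI)
}

# The PI chart of the intermediate solution printed above (x = 1, - = 0)
chart <- rbind(
  "ES*WS"    = c(0, 0, 1, 1, 0, 0, 0, 1, 1),
  "WM*LP"    = c(1, 1, 0, 1, 0, 0, 1, 0, 1),
  "ES*QU*LP" = c(0, 0, 0, 0, 1, 0, 1, 0, 1),
  "ES*QU*WM" = c(0, 0, 0, 0, 0, 1, 1, 0, 1))
colnames(chart) <- c(4, 12, 21, 24, 26, 27, 28, 29, 32)
minimal_covers(chart)   # a single minimal sum containing all four PIs
```

When several minimal sums exist, the function returns all of them, which corresponds to the alternative minimal unions discussed above.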
Besides the PI chart, the solution object returned by eqmcc()
also
contains a data frame of PI set membership scores in the pims
component. These scores can then be used to draw Venn diagrams of
solutions, similar to the one shown in Figure 3, using
suitable R packages such as
VennDiagram
(Chen and Boutros 2011).
> KrookSI$pims$i.sol$C1P1
ES*WS WM*LP ES*QU*LP ES*QU*WM
SE 1 0 0 0
FI 1 0 0 0
NO 1 1 1 1
DK 1 1 0 0
NL 0 1 1 1
ES 0 0 0 1
.. . . . .
<rest omitted>
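PI membership scores of this kind follow directly from the data. For each case, the score is the minimum over the literals in the PI. The hypothetical helper below assumes the notation used in this article, with upper-case set labels denoting presence and lower-case labels denoting absence. Fed with the three Krook cases printed earlier (SE, FI and NO), it reproduces the corresponding rows of the pims output above.

```r
# Hypothetical helper (not part of QCA): membership of each case in a
# conjunctive PI such as "ES*QU*LP" or "es*LP". Upper-case literals mean
# presence, lower-case literals mean absence (1 - x); the conjunction is
# the case-wise minimum.
pi_membership <- function(data, pi) {
  literals <- strsplit(pi, "*", fixed = TRUE)[[1]]
  stopifnot(all(toupper(literals) %in% names(data)))
  columns <- lapply(literals, function(lit) {
    x <- data[[toupper(lit)]]
    if (identical(lit, toupper(lit))) x else 1 - x
  })
  do.call(pmin, columns)
}

# The three Krook cases printed earlier (conditions only)
krook3 <- data.frame(ES = c(1, 1, 1), QU = c(1, 0, 1), WS = c(1, 1, 1),
                     WM = c(0, 0, 1), LP = c(0, 0, 1),
                     row.names = c("SE", "FI", "NO"))
pis <- c("ES*WS", "WM*LP", "ES*QU*LP", "ES*QU*WM")
pims <- sapply(pis, function(p) pi_membership(krook3, p))
rownames(pims) <- rownames(krook3)
pims
```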
In recent years, Qualitative Comparative Analysis (QCA) has become the method of choice for testing configurational hypotheses. However, QCA is still very much a “method in the making”. Extensions, enhancements and alternative algorithms appear on a regular basis (Baumgartner 2009; Eliason and Stryker 2009; Schneider and Wagemann 2012; Thiem 2012). R provides an ideal environment within which established QCA procedures as well as more advanced techniques can be implemented in as transparent and user-responsive a manner as possible. The QCA package makes a significant contribution in this regard. It fills the individual gaps in other programmes’ coverage of basic functionality and provides further improvements through complementary and advanced procedures.
The QCA software market remains dominated by the two graphical user interface programmes fs/QCA and Tosmana. QCA seeks to bridge the method of QCA with powerful command-line software while retaining a user-friendly code and command structure. To further lower the barriers for social scientists to choosing R for QCA, we have published an introductory textbook with extended replications of recent QCA studies from various research areas (Thiem and Duşa 2013). Although most examples have been taken from political science, the book may also be of interest to researchers from related disciplines. The textbook also serves as a comprehensive reference manual for the QCA package.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Thiem & Duşa, "QCA: A Package for Qualitative Comparative Analysis", The R Journal, 2013
BibTeX citation
@article{RJ-2013-009, author = {Thiem, Alrik and Duşa, Adrian}, title = {QCA: A Package for Qualitative Comparative Analysis}, journal = {The R Journal}, year = {2013}, note = {https://rjournal.github.io/}, volume = {5}, issue = {1}, issn = {2073-4859}, pages = {87-97} }