CONTRIBUTED RESEARCH ARTICLES
QCA: A Package for Qualitative Comparative Analysis

We present QCA, a package for performing Qualitative Comparative Analysis (QCA). QCA is becoming increasingly popular with social scientists, but none of the existing software alternatives covers the full range of core procedures. This gap is now filled by QCA. After a mapping of the method's diffusion, we introduce some of the package's main capabilities, including the calibration of crisp and fuzzy sets, the analysis of necessity relations, the construction of truth tables and the derivation of complex, parsimonious and intermediate solutions.


Introduction
Qualitative Comparative Analysis (QCA), a research method popularized largely through the work of Charles Ragin (Ragin, 1987, 2000, 2008), counts among the most influential recent innovations in social science methodology. In line with Ragin's own background, QCA was initially employed only by a small number of (political) sociologists (e.g., Amenta et al., 1992; Griffin et al., 1991; Wickham-Crowley, 1991). Since then, however, the method has made inroads into political science and international relations (e.g., Thiem, 2011; Vis, 2009), business and economics (e.g., Evans and Aligica, 2008; Valliere et al., 2008), management and organization (e.g., Greckhamer, 2011), legal studies and criminology (Arvind and Stirton, 2010; Miethe and Drass, 1999), education (e.g., Glaesser and Cooper, 2011; Schneider and Sadowski, 2010), and health policy research (e.g., Harkreader and Imershein, 1999; Schensul et al., 2010). Figure 1 charts the trend in the total number of QCA applications that have appeared in peer-reviewed journal articles since 1984, broken down by its three variants: crisp-set QCA (csQCA), multi-value QCA (mvQCA) and fuzzy-set QCA (fsQCA).¹

Allowing for a publication lag of about two years, 4.2 applications per year on average were published throughout the first decade following the introduction of csQCA in Ragin (1987). But only his sequel "Fuzzy-Set Social Science" (Ragin, 2000) seems to have finally got the "Ragin Revolution" (Vaisey, 2009) off the ground. In the years from 2003 to 2007, the average number of applications had risen to 13.6 before the absolute number of applications more than tripled from 12 in 2007 to 39 in 2011. Despite the introduction of fsQCA, applications of csQCA have continued to increase from four in 2001 to 22 in 2011. In contrast to csQCA and fsQCA, mvQCA has remained underutilized. Of a total of 280 applications between 1984 and 2012, only ten have employed this variant. Even when accounting for the fact that it was introduced in 2004, 17 years after csQCA and four years after fsQCA, this represents a disproportionately low number. QCA's methodological success story has created a growing demand for tailored software.

1 The number of applications differs slightly from the number of articles, as four articles have each presented two applications of QCA using two different variants. In order to be included in the data, applications had to focus primarily on a substantive research question, not on QCA as a method. All entries have been recorded in the bibliography section on http://www.compasss.org.
2 See the debate in Field Methods for more details on the status of mvQCA (Thiem, 2013; Vink and van Vliet, 2009, 2013).
The R Journal Vol. 5/1, June, ISSN 2073-4859

Huang (2011) has released the QCA3 package. The detailed market shares of these software solutions are also shown in Figure 1. Holding a clear monopoly, fs/QCA is by far the most common software with 82%, followed by Tosmana with 14% and fuzzy with 1%. Other solutions have claimed about 3%, but neither R package has managed to win any market share thus far.
Not as unequal as their market shares, but significantly different still, are the capabilities of these software solutions (see Table 1). The calibration of crisp sets is limited in fs/QCA and QCA3. Tosmana cannot handle fuzzy sets, but it provides more elaborate tools for the calibration of crisp sets. In addition to Ragin's (2008) "direct method" and "indirect method", fuzzy offers a set-normalizing linear membership function. Most importantly, it also includes various statistical procedures for coding truth tables, the appropriateness of which largely depends on the research design.

QCA combines and enhances the individual strengths of other software solutions. It can process all QCA variants (including temporal QCA (tQCA)), generates all solution types, and offers a wide range of procedures. For example, QCA provides four classes of functions for almost all kinds of calibration requirements and has an automated routine for the analysis of necessity relations. QCA is also the only package that can factorize any Boolean expression. As in Tosmana and QCA3, simplifying assumptions can also be identified. Unlike Tosmana, however, which does not present any parameters of fit, QCA produces inclusion, coverage and PRI (Proportional Reduction in Inconsistency) scores for both necessity and sufficiency relations in mvQCA.
In summary, a comprehensive QCA software solution has been missing so far. Researchers have often been limited in their analyses when using one programme, or they have had to resort to different programmes to perform all required operations. This gap is now filled by the QCA package, which seeks to provide a user-friendly yet powerful command-line interface alternative to the two dominant graphical user interface solutions fs/QCA and Tosmana. In the remainder of this article, we introduce some of the package's most important functions, including the calibration of sets, the analysis of necessity relations, the construction of truth tables and the derivation of complex, parsimonious and intermediate solution types.

Calibration of sets
The process of translating base variables (also referred to as raw data) into condition or outcome set membership scores is called calibration, in fsQCA also fuzzification. In contrast to csQCA, continuous base variables need not be categorized directly in fsQCA but can be transformed with the help of continuous functions, a procedure called assignment by transformation (Verkuilen, 2005, p. 465). Ragin (2008), for example, suggests a piecewise-defined logistic function. Sufficient for the vast majority of fuzzification needs, QCA offers the calibrate() function, one of whose flexible function classes for positive end-point concepts is given in Equation (1).

Here, b is the base variable, τ_ex the threshold for full exclusion from set S, τ_cr the crossover threshold at the point of maximally ambiguous membership in S and τ_in the threshold for full inclusion in S. The parameters p and q control the degrees of concentration and dilation. The piecewise-defined logistic membership function suggested in Ragin (2008) is also available. Furthermore, calibrate() can generate set membership scores for sets based on negative or positive mid-point concepts (Thiem and Duşa, 2013, pp. 55-62). If no suitable thresholds have been found even after all means of external and internal identification have been exhausted, QCA's findTh() function can be employed for searching thresholds using hierarchical cluster analysis.
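As a rough illustration of how such a piecewise membership function behaves, the following base-R sketch assumes one plausible form for positive end-point concepts: membership is 0 at or below τ_ex, rises as 0.5 * ((τ_ex − b) / (τ_ex − τ_cr))^p up to the crossover, continues as 1 − 0.5 * ((τ_in − b) / (τ_in − τ_cr))^q up to τ_in, and is 1 above τ_in. The function name calibrate_sketch and its argument names are hypothetical and are not part of the QCA package; the exact form of Equation (1) should be checked against the package documentation.

```r
# Hypothetical sketch of a piecewise membership function (not the QCA package's code).
# b: numeric base variable; t_ex < t_cr < t_in: exclusion, crossover, inclusion thresholds.
# p, q: exponents shaping the curve below and above the crossover (assumed form).
calibrate_sketch <- function(b, t_ex, t_cr, t_in, p = 1, q = 1) {
  stopifnot(t_ex < t_cr, t_cr < t_in)
  m <- numeric(length(b))                      # starts at 0: full exclusion
  lower <- b > t_ex & b <= t_cr                # between exclusion and crossover
  upper <- b > t_cr & b <= t_in                # between crossover and inclusion
  m[lower] <- 0.5 * ((t_ex - b[lower]) / (t_ex - t_cr))^p
  m[upper] <- 1 - 0.5 * ((t_in - b[upper]) / (t_in - t_cr))^q
  m[b > t_in] <- 1                             # full inclusion
  m
}

# Made-up base values, thresholds at 0, 10 and 20
scores <- calibrate_sketch(c(0, 5, 10, 15, 20), t_ex = 0, t_cr = 10, t_in = 20)
print(scores)
```

With p = q = 1 the function is linear between the thresholds; larger exponents pull scores below the crossover towards 0 and scores above it towards 1 in this assumed form.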

Analysis of necessity
Whenever the occurrence of an event B is accompanied by the occurrence of an event A, then B implies A (B ⇒ A) and A is implied by B (A ⇐ B). Put differently, A is necessary for B and B is sufficient for A. Transposed to the set-theoretic terminology of QCA, analyses of necessity proceed from the observation of a value under the outcome set Y (written Y{v_l}) to the observation of a value under the condition set X (written X{v_l}). For analyzing necessity inclusion, the decisive question is to which degree objects are members of X{v_l} and Y{v_l} in relation to their overall membership in Y{v_l}.
If necessity inclusion is high enough, the evidence is consistent with the hypothesis that X{v_l} is necessary for Y{v_l} (X{v_l} ⊇ Y{v_l}). The formula for necessity inclusion Incl_N(X{v_l}) is presented in Equation (2).
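A form of this measure that is consistent with the description above can be written in the standard set-theoretic notation. It assumes that x_i and y_i denote the membership scores of object i (of n objects) in X{v_l} and Y{v_l}; these symbols are introduced here for illustration only. Membership in the intersection of both sets is taken as the minimum of the two scores and is related to the overall membership in the outcome set:

```latex
\mathrm{Incl}_N\left(X\{v_l\}\right)
  = \frac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} y_i}
```

For crisp sets this reduces to the share of objects displaying the outcome value that also display the condition value.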
Provided that X{v_l} ⊇ Y{v_l} is sufficiently true, necessity coverage allows an assessment of the frequency with which B occurs relative to A. The formula for necessity coverage Cov_N(X{v_l}) is given in Equation (3).
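The two necessity measures can be sketched in a few lines of base R. The sketch assumes the standard definitions: inclusion divides the summed membership in the intersection (element-wise minimum) by the summed membership in the outcome set, and coverage divides the same numerator by the summed membership in the condition set. The function name necessity_fit and the membership scores are made up for illustration and are not taken from the QCA package or from any dataset in this article.

```r
# Hypothetical helper: necessity inclusion and coverage for one condition set.
# x: membership scores in the condition set, y: membership scores in the outcome set.
necessity_fit <- function(x, y) {
  stopifnot(length(x) == length(y))
  overlap <- sum(pmin(x, y))        # membership in the intersection of X and Y
  c(incl = overlap / sum(y),        # how much of Y lies inside X
    cov  = overlap / sum(x))        # how much of X is taken up by Y
}

# Made-up fuzzy-set scores for five objects
x <- c(0.9, 0.8, 0.6, 0.3, 1.0)
y <- c(0.7, 0.8, 0.4, 0.2, 0.9)
fit <- necessity_fit(x, y)
print(round(fit, 3))
```

In this example every object's membership in Y is no larger than its membership in X, so the inclusion score is 1, while coverage is lower because X extends beyond Y.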
For analyzing necessity relations, QCA offers the superSubset() function. If p_j denotes the number of values of condition set j with j = 1, 2, ..., k, the function returns necessity inclusion, PRI and coverage scores for those of the d = ∏_{j=1}^{k} (p_j + 1) − 1 combinations of condition set values that just meet the given inclusion and coverage cut-offs. Therefore, superSubset() does not require users to predefine the combinations to be tested, and so removes the risk of leaving potentially interesting results undiscovered. The initial set of combinations always consists of all ∏_{j=1}^{k} p_j trivial intersections X_1{v_1}, X_1{v_2}, ..., X_1{v_p}, ..., X_k{v_p}. The size of the intersection is incrementally increased from 1 to k until its inclusion score falls below the cut-off. If no trivial intersection passes the inclusion cut-off, superSubset() automatically switches to forming set unions until the least complex form has been found.
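To get a feeling for the size of this search space, the count d can be computed directly. The sketch below reads the formula as follows, which is an interpretation rather than a statement from the package documentation: each condition set contributes its p_j values plus the option of being left out of the combination, and the subtraction of 1 removes the empty combination. The function name n_combinations is hypothetical.

```r
# Hypothetical helper: number of combinations of condition set values,
# d = prod(p_j + 1) - 1, where p is the vector of numbers of values per condition set.
n_combinations <- function(p) {
  stopifnot(all(p >= 1))
  prod(p + 1) - 1
}

# Five bivalent (crisp) condition sets
d_crisp <- n_combinations(rep(2, 5))
# Three condition sets with 2, 3 and 2 values (a multi-value setting)
d_multi <- n_combinations(c(2, 3, 2))
print(c(d_crisp, d_multi))
```

Even a modest number of condition sets produces several hundred candidate combinations, which illustrates why an automated search is useful.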
For demonstration purposes, we reanalyze the data from Krook's (2010) csQCA on women's representation in 22 Western democratic parliaments. Countries with electoral systems of proportional representation (ES), parliamentary quotas (QU), social democratic welfare systems (WS), autonomous women's movements (WM), more than 7% left party seats (LP) and more than 30% seats held by women (WNP) are coded "1", all others "0". The first five sets are the conditions to be tested for necessity in relation to the outcome set WNP. For reasons of simplicity and space, we use lower case letters for denoting set negation in all remaining code examples.

When not specified otherwise, all sets in the data but the outcome are assumed to be conditions. By default, the function tests for necessity, but sufficiency relations can also be analyzed. No trivial intersection has passed the inclusion cut-off and superSubset() has thus formed unions of conditions. Substantively, the first combination ES + LP means that having proportional representation or strong left party representation is necessary for having more than 30% parliamentary seats held by women.
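How a union of conditions can be necessary when neither condition is necessary on its own can be illustrated with a small base-R sketch. The six cases below are invented for illustration and are not Krook's data; only the set labels ES, LP and WNP are borrowed from the text. Membership in a union is taken as the element-wise maximum, and necessity inclusion is assumed to be the summed intersection with the outcome divided by the summed outcome membership.

```r
# Made-up crisp data for six hypothetical cases (NOT the Krook dataset)
ES  <- c(1, 1, 0, 0, 1, 0)
LP  <- c(0, 1, 1, 0, 0, 0)
WNP <- c(1, 1, 1, 0, 0, 0)

# Necessity inclusion: share of outcome membership that lies inside the condition
incl_n <- function(x, y) sum(pmin(x, y)) / sum(y)

ES_or_LP <- pmax(ES, LP)            # set union ES + LP

incl_single <- c(ES = incl_n(ES, WNP), LP = incl_n(LP, WNP))
incl_union  <- incl_n(ES_or_LP, WNP)
print(round(incl_single, 3))
print(incl_union)
```

Here each single condition covers only two of the three cases with the outcome, whereas the union covers all three, so only the union would pass an inclusion cut-off of 1.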

Analysis of sufficiency, step 1: Truth tables
Whenever the occurrence of an event A is accompanied by the occurrence of an event B, then A implies B (A ⇒ B) and B is implied by A (B ⇐ A). Put differently, A is sufficient for B and B is necessary for A. Transposed to the set-theoretic terminology of QCA, analyses of sufficiency proceed from the observation of a value under X to the observation of a value under Y. For analyzing sufficiency inclusion, the decisive question is to which degree objects are members of X{v_l} and Y{v_l} in relation to their overall membership in X{v_l}. If sufficiency inclusion is high enough, the evidence is consistent with the hypothesis that X{v_l} is sufficient for Y{v_l} (X{v_l} ⊆ Y{v_l}). The formula for sufficiency inclusion Incl_S(X{v_l}) is presented in Equation (4).
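In the standard set-theoretic notation, a form of this measure that matches the verbal description relates membership in the intersection of both sets to the overall membership in the condition set. As an assumption for illustration, x_i and y_i denote the membership scores of object i (of n objects) in X{v_l} and Y{v_l}:

```latex
\mathrm{Incl}_S\left(X\{v_l\}\right)
  = \frac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} x_i}
```

The numerator is the same as for necessity inclusion; only the denominator changes from the outcome set to the condition set.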
The classical device for analyzing sufficiency relations is the truth table, which lists all d = ∏_{j=1}^{k} p_j configurations and their corresponding outcome value. Configurations represent exhaustive combinations of set values characterizing the objects. For illustration, a simple hypothetical truth table with three bivalent condition sets X_1, X_2 and X_3 and the outcome value OUT is presented in Table 2. Three bivalent conditions yield the eight configurations listed under C_i. The minimum number of cases n that is usually required for the respective outcome value is also appended.
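The configurations of such a truth table can be enumerated with base R. The sketch below builds the eight configurations of three bivalent condition sets with expand.grid(); the column names X1, X2 and X3 follow the hypothetical example in the text, and the ordering of rows is a presentation choice made here, not necessarily the order used in Table 2.

```r
# Enumerate all configurations of three bivalent condition sets.
# expand.grid() varies its first argument fastest, so the columns are listed
# in reverse and then reordered to obtain a conventional row order.
configs <- expand.grid(X3 = 0:1, X2 = 0:1, X1 = 0:1)[, c("X1", "X2", "X3")]
rownames(configs) <- paste0("C", seq_len(nrow(configs)))
print(configs)

# The number of rows equals the product of the numbers of values per condition set
p <- c(2, 2, 2)
n_configs <- prod(p)
print(n_configs)
```

For multi-value conditions the same call works with longer value ranges, for example 0:2 for a condition set with three values.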
It is important to emphasize that the outcome value is not the same as the outcome set, the latter of which does not show up in a QCA truth table. Instead, the outcome value is based on the sufficiency inclusion score, returning a truth value that indicates the degree to which the evidence is consistent with the hypothesis that a sufficiency relation between a configuration and the outcome set exists. Configurations C_1 to C_4 are positive because they support this hypothesis (OUT = 1), C_5 is negative because it does not (OUT = 0). Mixed evidence exists for C_6 (OUT = C). If at least two objects conform to one configuration, but the evidence neither sufficiently confirms nor falsifies the hypothesis of a subset relation between this configuration and the outcome set, contradictions arise. No empirical evidence at all exists for C_7 and C_8. If a configuration has no or too few cases, it is called a logical remainder (OUT = ?).
The truthTable() function can generate truth tables for all three main QCA variants without users having to specify which variant they use. The structure of the data is automatically transposed into the correct format.
> KrookTT <- truthTable(Krook,

At a minimum, truthTable() requires an appropriate dataset and the outcome argument, which identifies the outcome set. If conditions is not provided as an argument, it is assumed that all other sets in the data but the outcome are the conditions. By default, logical remainders are not printed unless specified otherwise by the logical argument complete. The logical argument show.cases prints the names of the objects next to the configuration in which they have membership above 0.5.
The truthTable() function includes three cut-off arguments that influence how OUT is coded. These are n.cut, incl.cut1 and incl.cut0. The first argument n.cut sets the minimum number of cases with membership above 0.5 needed in order to not code a configuration as a logical remainder. The second argument incl.cut1 specifies the minimal sufficiency inclusion score for a non-remainder configuration to be coded as positive. The third argument incl.cut0 offers the possibility of coding configurations as contradictions when their inclusion score is neither high enough to consider them as positive nor low enough to code them as negative. If the inclusion score of a non-remainder configuration falls below incl.cut0, this configuration is always considered negative. By means of the sort.by argument, the truth table can also be ordered along inclusion scores, numbers of cases or both, in increasing or decreasing order. If the original condition set labels are rather long, the logical letters argument can be used to replace the set labels with upper case letters in alphabetical order.
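The coding rule described in this paragraph can be summarized in a short base-R sketch. It is a reading of the prose above, not the package's implementation: the function name code_out, the underscore argument names and the default values are hypothetical, and the four configurations are invented for illustration.

```r
# Hypothetical sketch of how OUT could be coded from case counts and inclusion scores.
# n: number of cases per configuration, incl: sufficiency inclusion score.
# n_cut, incl_cut1, incl_cut0 mirror the roles of n.cut, incl.cut1 and incl.cut0.
code_out <- function(n, incl, n_cut = 1, incl_cut1 = 1, incl_cut0 = 1) {
  stopifnot(incl_cut0 <= incl_cut1)
  ifelse(n < n_cut, "?",                        # too few cases: logical remainder
    ifelse(incl >= incl_cut1, "1",              # high inclusion: positive
      ifelse(incl >= incl_cut0, "C", "0")))     # in between: contradiction, else negative
}

# Four made-up configurations; the last one has no cases and hence no inclusion score
out <- code_out(n = c(3, 2, 4, 0), incl = c(1.0, 0.8, 0.4, NA),
                n_cut = 1, incl_cut1 = 0.9, incl_cut0 = 0.6)
print(out)
```

Setting both inclusion cut-offs to the same value removes the band in which contradictions can occur, so every non-remainder configuration is coded either positive or negative.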
The leftmost column lists the configuration row index values from the complete truth table. Sufficiency inclusion and PRI scores are also provided in the two rightmost columns. Once the truth table is fully coded, it can be minimized according to the theorems of Boolean algebra (McCluskey, 1965, pp. 84-89).

Analysis of sufficiency, step 2: Boolean minimization
The canonical union resulting from the truth table presented in Table 2 is given by Equation (5). It consists of four fundamental intersections (FI), each of which corresponds to one positive configuration. Generally, all FIs also represent positive configurations, but not all positive configurations become FIs. The analyst may decide to exclude some of these configurations from the minimization process on theoretical or empirical grounds.
If two FIs differ on the values of one condition only, then this condition can be eliminated so that a simpler term results. For example, Equation (5) can be reduced in two passes as shown in Figure 2.
In the first pass, the four FIs can be reduced to four simpler terms. In the second pass, these four terms can then be reduced at once to a single term. No further reduction is possible; X_1 is the only term which is essential with respect to the outcome (X_1 ⊆ Y). All terms that survive the Boolean minimization process are called prime implicants (PI).
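The two reduction passes can be traced with a small base-R sketch of the pairwise elimination rule. It assumes, in line with the result described above, that the four FIs are the configurations in which X_1 is present, combined with every value combination of X_2 and X_3. The function reduce_pass is hypothetical and deliberately simple: it returns only the merged terms and does not carry forward terms that could not be merged, so it is not a full Quine-McCluskey implementation and not the algorithm used by eqmcc().

```r
# One reduction pass: merge every pair of terms that differ on exactly one condition.
# terms: numeric matrix, one row per term; -1 marks an already eliminated condition.
reduce_pass <- function(terms) {
  merged <- list()
  n <- nrow(terms)
  if (n >= 2) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        a <- terms[i, ]
        b <- terms[j, ]
        same_gaps <- all((a == -1) == (b == -1))   # same conditions already eliminated
        differing <- which(a != b)
        if (same_gaps && length(differing) == 1) {
          a[differing] <- -1                       # eliminate the differing condition
          merged[[length(merged) + 1]] <- a
        }
      }
    }
  }
  if (length(merged) == 0) return(terms[0, , drop = FALSE])
  unique(do.call(rbind, merged))                   # drop duplicate merged terms
}

# Assumed fundamental intersections: X1 present, all combinations of X2 and X3
fis <- rbind(c(1, 1, 1), c(1, 1, 0), c(1, 0, 1), c(1, 0, 0))
colnames(fis) <- c("X1", "X2", "X3")

pass1 <- reduce_pass(fis)     # four terms with one condition eliminated each
pass2 <- reduce_pass(pass1)   # a single term in which only X1 remains
print(pass1)
print(pass2)
```

The first pass yields four terms with two conditions each, and the second pass collapses them into the single term X_1, matching the description of the two passes.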
The central function of the QCA package that performs the minimization is eqmcc() (enhanced Quine-McCluskey) (Duşa, 2007, 2010). It can derive complex, parsimonious and intermediate solutions from a truth table object or a suitable dataset. In contrast to complex solutions, parsimonious solutions incorporate logical remainders into the minimization process without any prior assessment by the analyst as to whether a sufficiency relation is plausible or not. Intermediate solutions offer a middle way insofar as those logical remainders that have been used in the derivation of the parsimonious solution are filtered according to the analyst's directional expectations about the impact of each single condition set value on the overall sufficiency relation between the configuration of which it is part and the outcome set. By formulating such expectations, difficult logical remainders are excluded as FIs from the canonical union, whereas those logical remainders that enter the canonical union are easy. The complex solution, which is the default option, can be generated by eqmcc() with minimal typing effort.
The truth table object KrookTT that was generated above is passed to eqmcc(). No further information is necessary in order to arrive at the complex solution. The logical argument details causes all parameters of fit to be printed together with the minimal union S1: inclusion (incl), PRI (PRI), raw coverage (cov.r) and unique coverage (cov.u) scores for each PI as well as the minimal union. If details = TRUE, the logical argument show.cases also prints the names of the objects that are covered by each PI.
If alternative minimal unions exist, all of them are printed if the row dominance principle for PIs is not applied as specified in the logical argument rowdom. One PI P_1 dominates another P_2 if all FIs covered by P_2 are also covered by P_1 and both are not interchangeable (cf. McCluskey, 1965, p. 150). Inessential PIs are listed in brackets in the solution output and at the end of the PI part in the parameters-of-fit table, together with their unique coverage scores under each individual minimal union. For example, the parsimonious solution without row dominance applied can be derived by making all logical remainders available for inclusion in the canonical union as FIs and by setting rowdom to FALSE.

[eqmcc() output: parameters-of-fit table for the parsimonious solution; rows include ES*WM, QU*LP and es*LP]
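The row dominance criterion can be checked mechanically on a PI chart. The following base-R sketch uses an invented chart with three PIs and four FIs; the function name dominates and the labels P1 to P3 and FI1 to FI4 are hypothetical, and the sketch reads "not interchangeable" as "not covering exactly the same FIs".

```r
# Made-up PI chart: rows are prime implicants, columns are fundamental intersections;
# TRUE means that the PI covers the FI.
chart <- rbind(
  P1 = c(TRUE,  TRUE,  TRUE,  FALSE),
  P2 = c(TRUE,  TRUE,  FALSE, FALSE),
  P3 = c(FALSE, FALSE, TRUE,  TRUE)
)
colnames(chart) <- paste0("FI", 1:4)

# a dominates b if every FI covered by b is also covered by a
# and the two PIs do not cover exactly the same FIs.
dominates <- function(chart, a, b) {
  all(chart[b, ] <= chart[a, ]) && any(chart[a, ] != chart[b, ])
}

p1_over_p2 <- dominates(chart, "P1", "P2")   # P1 covers everything P2 covers, and more
p2_over_p1 <- dominates(chart, "P2", "P1")   # P2 misses FI3
p1_over_p3 <- dominates(chart, "P1", "P3")   # P3 covers FI4, which P1 does not
print(c(p1_over_p2, p2_over_p1, p1_over_p3))
```

Applying row dominance would drop P2 from this chart, whereas disabling it keeps P2 available as a PI in alternative minimal unions.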
For intermediate solutions, eqmcc() also prints the parsimonious solution (p.sol) whose simplifying assumptions have been used in filtering logical remainders. The PI chart of this intermediate solution (i.sol) that has been derived from the (first and only) complex and the (first and only) parsimonious solution (C1P1) can then be inspected by accessing the corresponding component in the returned object. Besides the PI chart, the solution object returned by eqmcc() also contains a data frame of PI set membership scores in the pims component. These scores can then be used to draw Venn diagrams of solutions, similar to the one shown in Figure 3, using suitable R packages such as VennDiagram (Chen and Boutros, 2011).

Figure 1: Trend in number of QCA applications by variants and year, and software market share.

> KrookSI$PIchart$i.sol$C1P1
          4  12 21 24 26 27 28 29 32
ES*WS     -  -  x  x  -  -  -  x  x
WM*LP     x  x  -  x  -  -  x  -  x
ES*QU*LP  -  -  -  -  x  -  x  -  x
ES*QU*WM  -  -  -  -  -  x  x  -  x

If several minimal sums exist under both the parsimonious and complex solution, the PI chart of the respective combination for the intermediate solution can be accessed by replacing the numbers in the C1P1 component.
The demand for tailored QCA software has been met almost exclusively by two programmes: Charles Ragin and Sean Davey's (2009) fs/QCA and Lasse Cronqvist's (2011) Tosmana. Until recently, however, users of non-Windows operating systems were limited, as neither programme ran on operating systems other than Microsoft Windows. As of version 1.3.2.0, Tosmana also supports other operating systems. In 2008 and 2012, Kyle Longest and Stephen Vaisey's (2008) fuzzy package for Stata and Christopher Reichert and Claude Rubinson's (2013) Kirq were developed as alternatives to fs/QCA. For the R environment, Adrian Duşa's QCA package was first added in 2006, and in 2009 Ronggui Huang (2011) released the QCA3 package.

Table 1 provides an overview of the functionality each programme offers. All alternatives to QCA have different capabilities, but none covers the entire range of basic procedures. Kirq, fs/QCA and fuzzy cannot handle multi-value sets, whereas Tosmana cannot process fuzzy sets. The possibility to analyze necessity relations is not implemented in Tosmana, either, and the other packages, except Kirq, offer only limited user-driven routines. Complex and parsimonious solutions can be found by all packages, but only fs/QCA generates intermediate solutions on an automated basis.

Table 1: Overview of software functionality.