Social scientists and statisticians often use aggregate data to predict individual-level behavior because the latter are not always available. Various statistical techniques have been developed to make inferences from one level (e.g., precinct) to another level (e.g., individual voter) while minimizing the errors associated with ecological inference. While ecological inference has been shown to be highly problematic in a wide array of scientific fields, many political scientists and analysts employ these techniques when studying voting patterns. Indeed, federal voting rights lawsuits now require such analysis, yet expert reports are not consistent in which type of ecological inference is used. This is especially the case in the analysis of racially polarized voting when there are multiple candidates and multiple racial groups. The eiCompare package was developed to easily assess two of the more common ecological inference methods: EI and EI:R\(\times\)C. The package facilitates a seamless comparison between these methods so that scholars and legal practitioners can easily assess whether the two methods produce similar or disparate findings.
Ecological inference is a widely debated methodology for attempting to understand individual, or micro, behavior from aggregate data. Ecological inference has come under fire for being unreliable, especially in the biological sciences, ecology, epidemiology, public health, and many social sciences. For example, Freedman (1999) explains that when confronted with individual-level data, many ecological aggregate estimates in epidemiology have been proven wrong. In the field of ecology, Martin, B. A. Wintle, J. R. Rhodes, P. M. Kuhnert, S. A. Field, S. J. Low-Choy, A. J. Tyre, and H. P. Possingham (2005) expose the problem of zero-inflation in studies of the presence or absence of specific animal species and note that ecological techniques can lead to incorrect inference. Greenland (2001) describes the many pitfalls of ecological inference in public health due to the nonrandomization of social context across ecological units of analysis. Elsewhere, Greenland and J. Robins (1994) have argued that the problem of ecological confounder control leads to biased estimates of risk in epidemiology. Relatedly, Frair, J. Fieberg, M. Hebblewhite, F. Cagnacci, N. J. DeCesare, and L. Pedrotti (2010) argue that while some ecological analyses can be informative when studying animal habitat preference, existing methods of ecological inference provide imprecise information on variation in the outcome variables and that considerable improvements are necessary. Wakefield (2004) provides a useful comparison of how ecological inference performs in epidemiological versus social scientific research. He concludes that in epidemiological applications individual-level data are required for consistently accurate statistical inference.
However, within the narrow subfield of racial voting patterns in American elections, ecological inference is regularly used. This is especially common in scholarly research on the Voting Rights Act, where the United States Supreme Court directly recommended ecological inference analysis as the main statistical method to estimate voting preference by racial group (e.g., Thornburg v. Gingles 478 U.S. 30, 1986). Because courts in the U.S. have so heavily relied on ecological inference, it has gained prominence in political science research. The American Constitution Society for Law and Policy explains that ecological inference is one of the three statistical analyses that must be performed in voting rights research on racial voting patterns. As ecological inference evolved, a group of scholars developed the eiPack package and published an article in R News announcing the new package (Lau, R. T. Moore, and M. Kellermann 2006).
This article does not conclude that ecological inference is appropriate or reliable outside the specific domain of American elections. Indeed, scholars in the fields of epidemiology and public health have correctly pointed out the limitations of drawing individual-level inferences from aggregate data. However, its application to voting data in the United States represents one area where it may have utility, if model assumptions are met (Tam Cho and B. J. Gaines 2004). Indeed, the main point of our article is not to settle the debate on the accuracy of ecological inference in the sciences writ large, but rather to assess the degree of similarity or difference between two heavily used R packages within the field of political science, ei and eiPack. Our package, eiCompare, offers scholars who regularly use ecological inference in analyses of voting patterns the ability to easily compare, contrast, and diagnose estimates across two different ecological methods that are recommended statistical techniques in voting rights litigation.
Today, although there is continued debate among social scientists (Cho 1998; Greiner 2007, 2011), the courts generally rely on two statistical approaches to ecological data. The first, ecological inference (EI), developed by King (1997), is said to be preferred when there are only two racial or ethnic groups, and ideally only two candidates contesting office. However, Wakefield (2004) notes that EI methods can be improved with the use of survey data as Bayesian priors. The second, ecological inference R\(\times\)C (R\(\times\)C), developed by Rosen, W. Jiang, G. King, and M. A. Tanner (2001), is said to be preferred when there are multiple racial or ethnic groups, or multiple candidates contesting office. However, it is not clear that the two methods, when faced with the exact same dataset, would produce different results. In one case, analysis of the same dataset across multiple ecological approaches found that they tend to produce the same conclusion (Grofman and M. A. Barreto 2009). However, others have argued that using King’s EI iterative approach with multiple racial groups or multiple candidates will fail and should not be relied on (Ferree 2004). Still others have gone further, arguing that EI cannot be used to analyze elections with multiple racial groups or multiple candidates, stating that “it biases the analysis for finding racially polarized voting,” going on to call this approach “problematic” and stating that “no valid statistical inferences can be drawn” (Katz 2014).
As with any methodological advancement, there is a healthy and rigorous debate in the literature. However, very little real election data have been brought to bear in this debate. Ferree (2004) offers a simulation of Black, White, and Latino turnout and voting patterns, and then examines real data from a parliamentary election in South Africa conducted under a proportional representation system. Grofman and M. A. Barreto (2009) compare an exit poll to precinct election data in Los Angeles, but only compare Goodman’s ecological regression against King’s EI, using the single-equation versus double-equation approach, and do not examine the R\(\times\)C approach at all.
The challenges surrounding ecological inference are well documented. Robinson (2009) pointed out that relying on aggregate data to infer the behavior of individuals can result in the ecological fallacy, and since then scholars have applied different methods to more accurately discern individual correlations from aggregate data. Goodman (1953, 1959) advanced the idea of ecological regression, whereby individual patterns can be drawn from ecological data under certain conditions. However, Goodman’s logic assumed that group patterns were consistent across each ecological unit, which in reality may not be the case.
Eventually, systematic analysis revealed that these early methods could be unreliable (King 1997). Ecological inference is King’s (1997) solution to the ecological fallacy problem inherent in aggregate data; since the late 1990s it has been the benchmark method courts use in evaluating racial polarization in voting rights lawsuits, and it has been used widely in comparative politics research on group and ethnic voting patterns. Critics claim that King’s EI model was designed primarily for situations with just two groups (e.g., blacks and whites; Hispanics and Anglos). While many geographic areas (e.g., Mississippi, Alabama) still contain essentially two groups and hence pose no threat to traditional EI estimation procedures, the growth of racial groups such as Latinos and Asians has challenged the historical biracial focus on race in the United States (thereby challenging traditional EI model assumptions). Rosen, W. Jiang, G. King, and M. A. Tanner (2001) suggest a rows by columns (R\(\times\)C) approach that allows for multiple racial groups and multiple candidates; however, their Bayesian approach suffered computational difficulties and was not employed at a mass level. Since then, computing power has steadily improved, making R\(\times\)C a realistic solution for many scenarios, and accessible packages now exist in R that are widely used. These two methodological approaches are now both regularly used in political science; however, there is no consistent evidence of how they perform side by side, or whether they produce different results.
Ferree (2004) critiques King’s EI model, arguing that the conditions for iterative estimation (e.g., black vs. non-black, white vs. non-white, Hispanic vs. non-Hispanic) can be considerably biased due to aggregation bias and multimodality in the data. Using a hypothetical simulation dataset, Ferree shows, for example, that combining blacks and whites into a single “non-Hispanic” group in order to estimate Hispanic turnout can vastly overestimate Hispanic turnout. However, the analysis did not provide any clues as to the specific conditions under which R\(\times\)C is significantly better than or preferable to EI. For example, if three racial groups each comprise a third of the electorate, does aggregation bias create more error in EI than in a scenario in which two dominant groups comprise 90% and a small group is just 10%? Likewise, is EI’s iterative approach to candidates more stable when analyzing three candidates and far less stable when eight candidates contest the election? These questions have not been considered empirically. Instead, the existing scholarship uses simulation data to demonstrate theoretically that EI might create bias and that R\(\times\)C is preferred. We argue that real election data should be considered in a side-by-side comparison.
Despite some critiques, other political scientists have defended ecological inference, and even ecological regression, using both simulations and real data. Owen and B. Grofman (1997) assess whether the ecological fallacy in ecological regression is a theoretical problem only or a real problem for empirical analysis. In an extensive review, Owen and Grofman conclude that despite the valid theoretical concerns, linear ecological regression still holds up and provides meaningful and accurate estimates of racially polarized voting. A decade later, Grofman and M. A. Barreto (2009) again take up the question of how ecological models compare to one another using a combination of simulation, actual election precinct data, and an accompanying individual-level exit poll. Their analysis argues that there is general consistency across all ecological models and that once voter turnout rates are accounted for, ecological regression and King’s EI lead scholars to the same results. However, Grofman and Barreto did not consider R\(\times\)C in their comparison.
Greiner and K. M. Quinn (2010) combine R\(\times\)C methods with individual-level exit poll data and argue that this hybrid model can be preferable to a straight aggregation model. However, exit poll data are not always available to researchers and practitioners. Indeed, in most county or city elections, exit poll data do not exist, which is why scholars often attempt to infer voting patterns through aggregate data. Herron and K. W. Shotts (2003) also criticize EI estimates when used in second-stage regressions, given that error from the EI stage is carried into the second-stage estimation. However, Adolph and G. King (2003) respond by adjusting the EI procedure to reduce inconsistencies when estimating second-stage regressions. Nevertheless, these issues with EI do not speak specifically to R\(\times\)C methods.
Greiner and K. M. Quinn (2009) extend the 2\(\times\)2 EI contingency problem to 3\(\times\)3 and estimate voting preferences simultaneously for three candidates across three racial groups (but using counts instead of percentages). We extend this work by analyzing real-world datasets with sizes greater than 3\(\times\)3 (multiple candidates and at least three racial groups). In all of this, our main goal is to assess whether using iterative EI or simultaneous R\(\times\)C approaches changes the conclusions social scientists can draw from the data.
Finally, some have gone even further in arguing that EI is ill-equipped to handle complex datasets with multiple candidates and multiple racial groups, and that only R\(\times\)C can produce reliable results (Katz 2014). In explaining the theoretical reasons why EI cannot accurately process such elections, Katz argues that “adding additional groups and vote choices to King’s (1997) EI is not straightforward,” and adds that “given the estimation uncertainty, it may not be possible to infer which candidate is preferred by members of the group.” The argument against EI in elections with multiple racial groups, or especially multiple candidates, is that EI takes an iterative approach pitting candidate A against all others who are not candidate A. If the election features four candidates (A, B, C, D), critics state that you cannot accurately estimate vote choice quantities by comparing the vote for candidate A against the combined vote for B, C, and D. The iterative approach then moves on to estimate the vote share for candidate B against the combined vote for A, C, and D, and so on, so that four separate equations are run. Katz (2014) claims that EI biases the findings in favor of bloc voting, stating that “this jerry rigged approach to dealing with more than two vote choices stacks the deck in favor of finding statistical evidence for racially polarized.” Given these debates, our package allows scholars to quite easily make side-by-side comparisons and evaluate these competing claims.
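To make the iterative logic concrete, the sketch below shows how a four-candidate contest is broken into separate two-by-two estimations. It is only a schematic illustration: the column names come from the Corona, CA dataset introduced later in this article, ei() is the core routine from the ei package, and eiCompare’s ei_est_gen() (described below) automates exactly this loop.

# Schematic sketch of EI's iterative approach: each candidate's vote share is
# modeled separately against the implicit complement (all other candidates).
library(ei)          # provides the core ei() routine
library(eiCompare)   # provides the cor_06 example data used later in the paper
data(cor_06)

fit_breitenbucher <- ei(pct_breitenbucher ~ pct_latino, total = "totvote", data = cor_06)
fit_montanez      <- ei(pct_montanez ~ pct_latino, total = "totvote", data = cor_06)
# ...and likewise for Spiegel and Skipworth: four separate 2x2 estimations,
# each pitting one candidate against the combined vote for all the others.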
While important advancements have been made in ecological inference techniques by King (1997) and Rosen, W. Jiang, G. King, and M. A. Tanner (2001), there is no consistency in which technique is used and how results are presented. What’s more, legal experts and social scientists often argue during voting rights lawsuits that one technique is superior to the other, or that their results are more accurate. There is no question that both social scientists and legal experts would greatly benefit from a standardized software package that presents both sets of ecological inference results (EI and R\(\times\)C) simultaneously, along with metrics to compare them. Thus, eiCompare was designed to compare the most commonly used methods today, EI and R\(\times\)C, but it also incorporates Goodman methods. The package lets analysts seamlessly assess whether EI and R\(\times\)C estimates are similar (see King (1997) and Rosen, W. Jiang, G. King, and M. A. Tanner (2001) for a methodological description of the techniques). It incorporates functions from ei (King and M. Roberts 2013) and eiPack (Lau, R. T. Moore, and M. Kellermann 2012) into a new package that relatively quickly compares ecological inference estimates across the two routines.
The package includes several functions that ultimately produce tables of
results from the different ecological inference methods. Thus, in the
case of racially polarized voting, analysts can quickly assess whether
different racial groups preferred different candidates, according to the
EI, R\(\times\)C, and Goodman approaches. The eiCompare package wraps
the ei()
procedure (King and M. Roberts 2012) into a generalized function, has a
variety of table-making functions, and a plotting method that
graphically depicts the difference between estimates for the two main EI
methods (EI and R\(\times\)C). Below, we use a working example of a voter precinct dataset from Corona, CA. To use the package, the process is simple: 1) load the package and the appropriate data, run the EI generalized function, and create an EI table of results; 2) run the R\(\times\)C function (from eiPack) and create a table of results; 3) run the Goodman regression generalized function if the user chooses; 4) combine the results of all the algorithms into a comparison table; and 5) plot the comparison results. Before we conclude, we also compare EI and R\(\times\)C findings against exit poll data from a 2005 Los Angeles mayoral run-off election. The rest of the paper follows this outline.
To begin, we install (install.packages("eiCompare")) and load the eiCompare package (library(eiCompare)) from the CRAN repository.
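In code, these two setup steps are simply:

# Install (once) and load the package from CRAN
install.packages("eiCompare")
library(eiCompare)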
First, we load the aggregate-level dataset (data(cor_06)) into R, in this case a precinct (voting district) dataset from a 2006 election in the city of Corona, CA. Table 1 below displays the first six rows and the column headers of the dataset. This dataset includes all the necessary variables to run the code in the eiCompare package. The first column, "precinct", essentially operates as a unique identifier. The second column, "totvote", is the total number of votes cast within the precinct. Columns three and four give the proportions of the two racial groups whose mean voting preferences we seek to estimate. The remaining columns give each candidate's share of the total vote within the precinct.
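The preview in Table 1 can be reproduced with a quick inspection of the data; head() is used here simply as one illustrative way to print the first rows.

# Load the Corona, CA precinct data and preview the first rows (as in Table 1)
data(cor_06)
head(cor_06)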
precinct | totvote | pct_latino | pct_other | pct_breitenbucher | pct_montanez | pct_spiegel | pct_skipworth |
---|---|---|---|---|---|---|---|
22000 | 942 | 0.21 | 0.79 | 0.20 | 0.21 | 0.29 | 0.30 |
22002 | 1240 | 0.16 | 0.84 | 0.22 | 0.22 | 0.29 | 0.27 |
22003 | 1060 | 0.21 | 0.79 | 0.22 | 0.22 | 0.30 | 0.26 |
22004 | 1280 | 0.45 | 0.55 | 0.18 | 0.27 | 0.30 | 0.24 |
22008 | 1172 | 0.31 | 0.69 | 0.23 | 0.25 | 0.30 | 0.22 |
22012 | 1093 | 0.21 | 0.79 | 0.20 | 0.24 | 0.32 | 0.24 |
We are interested in how the four candidates (Breitenbucher, Montanez,
Spiegel, Skipworth) performed with Latino voters and non-Latino voters
(mostly non-Hispanic white), so we can assess whether racially polarized
voting exists. The process begins with the ei_est_gen()
function,
which is a generalized version of the ei()
function from the ei
package. Instead of having to estimate EI results for each candidate and
each racial group separately, ei_est_gen()
automates this process.
The ei_est_gen() function takes a vector of candidate names, a character vector of tilde-prefixed racial group names, the name of the column representing the total number of people in the jurisdiction (e.g., registered voters, ballots cast), the "data.frame" object holding the data, and the table names used to display the results. The function also has four optional arguments: rho, sample, tomog, and density_plot. The former two can be used to adjust the parameters of the ei() algorithm; these are especially useful when the initial run does not converge or warnings are produced. The latter two write tomography and density plots, respectively, to the working directory, but are set to off by default. These plots can be used to assess the stability – and thus veracity – of the EI procedure (see King and M. Roberts (2012) and King (1997) for details). Finally, the ... argument passes additional arguments on to the ei() function from the ei package.
One final note: given its iterative nature, the ei_est_gen() function can take a while to execute. This typically depends on features unique to the dataset, including the number of candidates and groups, the amount of racial/ethnic segregation within the city/area, as well as the number of precincts. This particular example does not take especially long, executing in about a minute on a standard MacBook Pro.
# LOAD DATA
data(cor_06)

# SET SEED FOR REPRODUCIBILITY
set.seed(294271)

# CREATE CHARACTER VECTORS REQUIRED FOR FUNCTION
cands <- c("pct_breitenbucher", "pct_montanez", "pct_spiegel", "pct_skipworth")
race_group2 <- c("~ pct_latino", "~ pct_other")
table_names <- c("EI: Pct Lat", "EI: Pct Other")

# RUN EI GENERALIZED FUNCTION
results <- ei_est_gen(cand_vector = cands, race_group = race_group2,
                      total = "totvote", data = cor_06, table_names = table_names)

# LOOK AT TABLE OF RESULTS
results
The call to the results object produces a table of results indicating the mean estimated voting preferences for Latinos and non-Latinos within the city of Corona (see Table 2). The results strongly suggest the presence of racially polarized voting, as Latinos prefer Montanez as their number one choice, whereas non-Latinos do not.
Candidate | EI: Pct Lat | EI: Pct Other |
---|---|---|
pct_breitenbucher | 19.68 | 21.12 |
se | 0.75 | 0.13 |
pct_montanez | 35.95 | 20.13 |
se | 0.03 | 0.08 |
pct_spiegel | 28.43 | 31.01 |
se | 0.57 | 0.23 |
pct_skipworth | 18.64 | 26.84 |
se | 0.71 | 0.23 |
Total | 102.69 | 99.10 |
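If the initial EI run fails to converge or produces warnings, the optional arguments described above can be adjusted and the diagnostic plots switched on. The sketch below reuses the objects created earlier; the particular value chosen for sample is illustrative only, not a recommendation.

# Re-run ei_est_gen() with diagnostic plots written to the working directory
# and an adjusted number of draws (rho could be tuned in the same way)
results_diag <- ei_est_gen(cand_vector = cands, race_group = race_group2,
                           total = "totvote", data = cor_06, table_names = table_names,
                           sample = 2000, tomog = TRUE, density_plot = TRUE)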
The R\(\times\)C functionality builds on code from the eiPack package: eiCompare simply takes eiPack's results and puts them into a "data.frame"/"table" object similar to the results from the ei_est_gen() function. First, the user follows the code from the eiPack package (here we use the ei.reg.bayes() function) and creates a formula object including all candidates and all groups. The user must ensure that the percentages on both sides of the \(\sim\) symbol sum to 1; the initial data check in the code below simply verifies that this rule is followed. The R\(\times\)C model is then run using the ei.reg.bayes() function. Users can read the eiPack documentation to familiarize themselves with this procedure. Depending on the nature of one’s data, the R\(\times\)C code can take a while to run. Finally, the results are passed to the bayes_table_make() function, along with a vector of candidate names and a vector of table names, similar to what was passed to ei_est_gen().
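Beyond eyeballing the two sums printed by that check, a slightly stricter version can flag any precinct whose shares deviate from one by more than a rounding tolerance. This is a small base-R sketch, not part of the package itself, and the 0.01 tolerance is an arbitrary choice.

# Flag precincts whose group shares or candidate shares do not sum to 1
# within a rounding tolerance (0.01 here is arbitrary)
which(abs(with(cor_06, pct_latino + pct_other) - 1) > 0.01)
which(abs(with(cor_06, pct_breitenbucher + pct_montanez + pct_spiegel +
                 pct_skipworth) - 1) > 0.01)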
# CHECK TO MAKE SURE DATA SUMS TO 1 FOR EACH PRECINCT
with(cor_06, pct_latino + pct_other)
with(cor_06, pct_breitenbucher + pct_montanez + pct_spiegel + pct_skipworth)

# SET SEED FOR REPRODUCIBILITY
set.seed(124271)

# RxC: GENERATE FORMULA
form <- formula(cbind(pct_breitenbucher, pct_montanez, pct_spiegel, pct_skipworth)
                ~ cbind(pct_latino, pct_other))

# RUN EI:RxC MODEL
ei_bayes <- ei.reg.bayes(form, data = cor_06, sample = 10000, truncate = TRUE)

# CREATE TABLE NAMES
table_names <- c("RxC: Pct Lat", "RxC: Pct Other")

# TABLE CREATION
ei_bayes_res <- bayes_table_make(ei_bayes, cand_vector = cands, table_names = table_names)

# LOOK AT TABLE OF RESULTS
ei_bayes_res
Candidate | RxC: Pct Lat | RxC: Pct Other |
---|---|---|
pct_breitenbucher | 18.22 | 21.58 |
se | 1.62 | 0.53 |
pct_montanez | 34.96 | 20.44 |
se | 1.72 | 0.56 |
pct_spiegel | 28.24 | 31.05 |
se | 1.08 | 0.35 |
pct_skipworth | 18.61 | 26.91 |
se | 1.73 | 0.56 |
Total | 100.03 | 99.99 |
The results are presented in Table 3 and look remarkably similar to those presented in Table 2. Indeed, the exact same conclusions would be drawn from an analysis of both tables: Latinos prefer Montanez as their first choice and non-Latinos prefer Spiegel as their top choice.
While many users will skip over the Goodman regression when conducting
ecological inference, given the documented issues with the method
(Shively 1969; King 1997), eiCompare nevertheless
has a Goodman regression generalized function, similar to the
ei_est_gen()
function. This function takes a character vector of
candidate names, a character vector of racial groups, the name of the total vote column, a data object, and a character vector of table names. Because
Goodman is simply a linear regression, the execution is very fast.
<- c("Good: Pct Lat", "Good: Pct Other")
table_names <- goodman_generalize(cands, race_group2, "totvote", cor_06, table_names)
good good
Table 4 shows the Goodman regression results. In this particular case, these results align quite closely with results from the two EI models. All three approaches essentially tell us the same thing.
Candidate | Good: Pct Lat | Good: Pct Other |
---|---|---|
pct_breitenbucher | 17.51 | 20.34 |
se | 3.18 | 3.74 |
pct_montanez | 35.00 | 20.48 |
se | 3.41 | 4.01 |
pct_spiegel | 28.52 | 31.61 |
se | 2.16 | 2.54 |
pct_skipworth | 18.97 | 27.57 |
se | 3.45 | 4.05 |
Total | 100.00 | 100.00 |
The last two sections address the comparison component of the package. The function ei_rc_good_table() takes the objects from the EI, R\(\times\)C, and Goodman regressions and puts them into a "data.frame"/"table" object. To simplify comparison, the table adds an EI-R\(\times\)C column differential for each racial group. This format lets the user quickly assess how the EI and R\(\times\)C methods stack up against one another. The function takes the following arguments: an EI results object (e.g., results), an R\(\times\)C object (e.g., ei_bayes_res), and a character vector groups argument (e.g., c("Latino", "Other")). The good argument for the Goodman regression is set to NULL, and the include_good argument defaults to FALSE. If the user wants to include a Goodman regression in the comparison of results, they need to change the latter to TRUE and specify the good argument as the object name from the goodman_generalize() call.
Candidate | EI: Pct Lat | RxC: Pct Lat | EI_Diff | EI: Pct Other | RxC: Pct Other | EI_Diff |
---|---|---|---|---|---|---|
pct_breitenbucher | 19.68 | 18.22 | -1.46 | 21.12 | 21.58 | 0.46 |
se | 0.75 | 1.62 | | 0.13 | 0.53 | |
pct_montanez | 35.95 | 34.96 | -0.99 | 20.13 | 20.44 | 0.31 |
se | 0.03 | 1.72 | | 0.08 | 0.56 | |
pct_spiegel | 28.43 | 28.24 | -0.19 | 31.01 | 31.05 | 0.04 |
se | 0.57 | 1.08 | | 0.23 | 0.35 | |
pct_skipworth | 18.64 | 18.61 | -0.02 | 26.84 | 26.91 | 0.07 |
se | 0.71 | 1.73 | | 0.23 | 0.56 | |
Total | 102.69 | 100.03 | -2.66 | 99.10 | 99.99 | 0.88 |
The result of ei_rc_good_table() is an object of a new class, "ei_compare", which includes a "data.frame" and a groups character vector. This output is ultimately passed to plot().
ei_rc_combine <- ei_rc_good_table(results, ei_bayes_res,
                                  groups = c("Latino", "Other"))
ei_rc_combine@data

ei_rc_g_combine <- ei_rc_good_table(results, ei_bayes_res, good,
                                    groups = c("Latino", "Other"), include_good = TRUE)
ei_rc_g_combine
Table 5 displays the output of the ei_rc_good_table() call in the first line of code above. The user must append @data to the returned object name to extract just the table. This table summarizes the results of the EI and R\(\times\)C analyses side by side. Clearly, very little difference emerges between the two methods in this particular instance.
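Since the table is stored in a slot (accessed with @data), the returned object can also be inspected with base R's S4 utilities; the slot names noted in the comment are inferred from the description above.

# Inspect the S4 object returned by ei_rc_good_table()
slotNames(ei_rc_combine)   # expected: a data slot and a groups slot
ei_rc_combine@data         # the comparison table as a data.frame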
Finally, users can plot the results of the EI and R\(\times\)C comparison to visually determine whether the two methods are similar. Plotting is simple, as plot methods have been developed for the "ei_compare" class. The code below produces the plot depicted in Figure 1.
# PLOT COMPARISON -- adjust the axes labels slightly
plot(ei_rc_combine, cex.axis = .5, cex.lab = .7)
One possible question remains: whether ecological estimates line up with individual-level estimates. Many studies have pointed out that the ecological fallacy and aggregation bias can produce ecological inference results that are highly questionable. In this section we implement the eiCompare package for a mayoral election in a multiethnic setting in which an individual-level exit poll survey was also administered. We use eiCompare to produce EI and R\(\times\)C results for the 2005 Los Angeles mayoral runoff election between Antonio Villaraigosa and James Hahn, and we also add results from the Los Angeles Times exit poll. Results are displayed in Table 6.
Group | EI: AV | EI: JH | RxC: AV | RxC: JH | Exit: AV | Exit: JH | MOE |
---|---|---|---|---|---|---|---|
White | 45 | 54 | 48 | 52 | 50 | 50 | +/- 2.5 |
Black | 58 | 40 | 50 | 50 | 48 | 52 | +/- 4.2 |
Latino | 82 | 17 | 81 | 19 | 84 | 16 | +/- 3.6 |
Asian | 48 | 51 | 47 | 53 | 44 | 56 | +/- 6.1 |
The results presented in Table 6 demonstrate that not only do EI and R\(\times\)C produce remarkably consistent results, but they also very closely match the individual-level estimates from the Los Angeles Times exit poll. The EI and R\(\times\)C estimates all fall within the margin of error of the individual-level data reported by the exit poll.
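Although the Los Angeles precinct file is not reproduced here, the same workflow extends directly to a contest with four groups and two candidates. The sketch below uses hypothetical object and column names (la_05, pct_white, pct_black, pct_latino, pct_asian, pct_villaraigosa, pct_hahn, totvote); only the vectors passed to the functions change.

# Hypothetical precinct data frame for the 2005 LA runoff (names assumed)
cands_la <- c("pct_villaraigosa", "pct_hahn")
groups_la <- c("~ pct_white", "~ pct_black", "~ pct_latino", "~ pct_asian")

# Iterative EI estimates for each candidate and group
ei_la <- ei_est_gen(cand_vector = cands_la, race_group = groups_la,
                    total = "totvote", data = la_05,
                    table_names = c("EI: White", "EI: Black", "EI: Latino", "EI: Asian"))

# Simultaneous RxC estimates via eiPack, reshaped into a comparable table
form_la <- formula(cbind(pct_villaraigosa, pct_hahn) ~
                     cbind(pct_white, pct_black, pct_latino, pct_asian))
rxc_la <- ei.reg.bayes(form_la, data = la_05, sample = 10000, truncate = TRUE)
rxc_la_res <- bayes_table_make(rxc_la, cand_vector = cands_la,
                               table_names = c("RxC: White", "RxC: Black",
                                               "RxC: Latino", "RxC: Asian"))

# Side-by-side comparison, as in Table 5
ei_rc_good_table(ei_la, rxc_la_res, groups = c("White", "Black", "Latino", "Asian"))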
eiCompare is a new package that builds on the work of King and others that attempts to address the ecological inference problem of making individual-level assessments based on aggregate-level data. As we have reviewed above, there is considerable debate in the sciences about the utility and accuracy of ecological techniques. Despite these well documented questions, ecological inference is widely used in political science and will continue to grow in importance when the constitutionally mandated redistricting in 2021 occurs. The redistricting cycle will bring with it extensive academic, legislative, and legal research using ecological inference to assess racial voting patterns across all 50 states.
While this new package does not develop a new method, per se, it improves analysts’ ability to quickly compare different commonly used EI algorithms to assess the veracity of the methods and also produce tables of their findings. While R\(\times\)C has been touted as the method necessary in situations with multiple groups and multiple candidates, the results do not always demonstrate face validity. In these scenarios – and others – analysts may want to incorporate original EI methods so they can compare how the two approaches stack up. Ultimately, this approach provides a needed assessment between two commonly used methods in voting behavior research.