The R package pltesim calculates and depicts probabilistic long-term effects in binary models with temporal dependence variables. The package performs two tasks. First, it calculates the change in the probability of the event occurring given a change in a theoretical variable. Second, it calculates the rolling difference in the future probability of the event for two scenarios: one where the event occurred at a given time and one where the event does not occur. The package is consistent with the recent movement to depict meaningful and easy-to-interpret quantities of interest with the requisite measures of uncertainty. It is the first to make it easy for researchers to interpret short- and long-term effects of explanatory variables in binary autoregressive models, which can have important implications for the correct interpretation of these models.
Scholars from a wide variety of academic disciplines study phenomena with binary outcomes. This includes the study of war or peace (Beck et al. 1998), civil war or stability (Collier et al. 2003), wildlife habitat selection (Keating and Cherry 2004), automobile accident severity (Al-Ghamdi 2002), banking decisions (Maddala and Trost 1982), labor force participation (Mroz 1987), individual decisions about drinking water sources (Gelman et al. 2004), conflicts over water resources (Gleditsch et al. 2006), and education policy (Bailey et al. 2016), just to name a diverse few.
The desire to generalize produces the incentive for scholars to incorporate information both over time and across units, which results in time-series cross-sectional data. While helpful from an inferential standpoint, modeling processes that vary across time and space increase the number of potential estimation and interpretation problems facing scholars.
One problem that is unique to scholars examining binary time-series cross-sectional (BTSCS) data is the role of temporal dependence, or the notion that the probability of the occurrence of the event (i.e., the dependent variable) depends in part on how much time has passed since the previous occurrence. Whenever scholars estimate BTSCS models where there are omitted (or potentially unobservable) variables that are also correlated with time, there is a substantial risk of incorrect standard errors and highly misleading results (Beck et al. 1998).
Beck et al. (1998,1261) offered a
ground-breaking solution to this inferential obstacle by noting that
“BTSCS data are grouped duration data”, which implies that one can
borrow techniques from duration analysis to properly model the influence
of time since the previous event at some time
At the same time, another movement has produced meaningful improvements in the interpretation of dynamic models in the social sciences. Over the last decade or so, scholars have improved our understanding of the various short- and long-term effects that arise from dynamic models. These long-term effects can take the form of long-range multipliers in autoregressive distributed lag models (De Boef and Keele 2008) or dynamic simulations in models with lagged dependent variables (Williams and Whitten 2012). In addition to providing a more complete picture of the inferences of key theoretical variables, one reason for the explosion in scholarly attention is the emphasis on providing appropriate measures of uncertainty with easy-to-implement software packages (e.g. Williams and Whitten 2011; Gandrud et al. 2016; Choirat et al. 2017).
To this end, this article introduces the R (R Core Team 2017) package pltesim, which utilizes simulation methods to depict probabilistic long-term effects in binary models with temporal dependence (PLTE). The package is available from the Comprehensive R Archive Network (CRAN). The package follows the methodology introduced in (Williams 2016). In the remainder of the paper, we will first discuss the methodological principles at work, then the process pltesim uses to calculate probabilistic long-term effects, and finally an example with various visualization approaches.
Probabilistic long-term effects are the product of the intersection of
two methodological trends: controlling for unmodeled duration dependence
by including temporal dependence variables and interpreting short- and
long-term effects of explanatory variables in autoregressive models. For
example, if one is interested in the effects of
Calculating probabilistic long-term effects involves a two-step process.
The first step finds the change in the predicted probability of the
outcome, given a change in the independent variable (
The second step is to calculate the long-term effect (LTE). Assume that
we have modeled temporal dependence in a simple fashion, with
Then the LTE is calculated at time
And so on, up to a value of
These probabilistic long-term effects can be modified so that they
reflect a wide variety of quantities of interest. For example, scholars
can easily depict the PLTE of a short-term change in
pltesim is the only tool we know of that makes it easy to calculate and visualize probabilistic long-term effects in binary models with temporal dependence. pltesim has four steps:
Find the parameter estimates. Currently pltesim works with binary
outcome models, e.g. logit. So use a binary response with the glm
function included with the default R installation.
Create a counterfactual scenario in a data.frame
class object.
This should have a row with the fitted counterfactual values and
columns with names matching variables in your fitted model. All
variables without values will be treated as having values of 0 in
the counterfactual.
Simulate the long-term effects with pltesim’s plte_builder
function.
Plot the results with pltesim’s plte_plot
function.
In the next section we use simulated data from (Williams 2016) to illustrate these steps.
The following examples replicate panels from Figure 1 in Williams (2016,249). We start by loading the necessary packages:
library(pltesim)
library(ggplot2)
Notice that ggplot2 is loaded. It will be used later in this section to customize the plots created by pltesim.
The simulated data we will use in these examples is packaged with
pltesim. It is called negative_year
. The name refers to the
simulated data having negative duration dependence. It has the following
form:
data("negative_year", package = "pltesim")
head(negative_year)
#> group year y x
#> 1 1 1991 0 -1.04320703
#> 2 1 1992 0 0.56581828
#> 3 1 1993 0 -1.21016176
#> 4 1 1994 0 0.07632362
#> 5 1 1995 0 -0.40669992
#> 6 1 1996 1 0.44269959
where y
is the binary response, x
is the non-time independent
variable, year
is the time variable, and group
identifies each
section of the panel.
Before finding the parameter estimates from this data, we need to create
a standardized time variable that is in terms of time periods from the
last spell (or the beginning of the observation period if
left-censored), rather than years. pltesim includes the btscs
function to accomplish this:btscs
is based on a function by the same name from the R package
DAMisc which itself
is based on the Stata (StataCorp 2009) command implementing the
procedure in (Beck et al. 1998). btscs
was included in
pltesim to (a) allow improvements for handling single period
spells, (b) match pltesim’s syntax for ease of use within one
workflow, and (c) to reduce pltesim’s dependencies. It also starts
the spell time counter at 1 rather than 0.
neg_set <- pltesim::btscs(df = negative_year, event = "y", t_var = "year",
cs_unit = "group")
where df
specifies the data frame. event
is the binary response
variable where 1
indicates an event, 0
otherwise. t_var
specifies
the time variable, cs_unit
specifies the cross-sectional unit. The
resulting data frame has the form:
head(neg_set, n = 10)
#> group year y x spell_time
#> 1 1 1991 0 -1.04320703 1
#> 2 1 1992 0 0.56581828 2
#> 3 1 1993 0 -1.21016176 3
#> 4 1 1994 0 0.07632362 4
#> 5 1 1995 0 -0.40669992 5
#> 6 1 1996 1 0.44269959 6
#> 7 2 1991 0 -1.25522659 1
#> 8 2 1992 0 0.29738988 2
#> 9 2 1993 0 1.00741250 3
#> 10 2 1994 0 -0.42211204 4
This is the same data frame as before with the addition of a
spell_time
column containing a counter of time periods within each
spell.
Now estimate the parameters:
m1 <- glm(y ~ x + spell_time + I(spell_time^2) + I(spell_time^3),
family = binomial(link = "logit"), data = neg_set)
Note the inclusion of the I
interpretation function to create the
squared and cubed versions of spell_time
.plte_builder
to identify the polynomials given
just the base spell_time
variable name.bs
function from the splines package, included with R, allows similar
inclusion of polynomial splines for time using the B-spline basis.
The change in
x_change <- data.frame(x = 0.5)
which can be passed to plte_builder
along with the fitted model object
(m1
). The counterfactual must be in the form of a data frame with
column names matching each variable in the model and one row of fitted
values. Variables from the model not included in the fitted value data
frame will be treated as 0.
The counterfactual is passed to plte_builder
with the cf
argument.
The fitted model object is specified with obj
. The time variable is
identified with obj_tvar
. Information about how long the change in cf_duration
. It is
permanent
by default. The time period from the last spell over which
to simulate the effects is given with the t_points
argument.
The first simulation example finds the estimated impact of the
counterfactual lasting for one time period. To do this, the
plte_builder
function’s cf_duration
argument is set to "one-time"
.
sim1 <- plte_builder(obj = m1, obj_tvar = "spell_time",
cf_duration = "one-time",
cf = x_change, t_points = c(13, 25))
Running this code simulates a one period increase in x
by 0.5 that
occurs at 13 time points from the last spell. By default the central 95
percent interval of 1,000 simulations is returned. The extent of the
returned central interval can be specified with plte_builder
’s ci
argument and the number of simulations can be adjusted with the nsim
argument.
We can now plot the results with the plte_plot
function:
plte_plot(sim1) +
scale_y_continuous(limits = c(0, 0.4))
The first dot from the left in Figure 1 (and vertical
dashed line) represents the median simulated baseline probability (and
central 95 percent simulation interval) of the event occurring given the
simulation scenario
One can assess whether the change in
Note that because the output of plte_plot
is a gg
class ggplot2
object, we can modify it using the full set of ggplot2 functions,
including in this case, the plot’s y-axis limits with
scale_y_continuous
. This modification makes the plot more easily
comparable with the ones that follow in this section.
To examine the effects of changes to cf_duration = "permanent"
. The results are
show in Figure 2.
sim2 <- plte_builder(obj = m1, obj_tvar = "spell_time",
cf = x_change,
cf_duration = "permanent",
t_points = c(13, 25))
plte_plot(sim2) +
scale_y_continuous(limits = c(0, 0.4))
Users can also specify changes that last for periods shorter than the
entire simulation period, but longer than one-period by supplying a
numeric value to cf_duration
. For example, to have the 0.5 increase in
x
last for 4 time periods use:
sim3 <- plte_builder(obj = m1, obj_tvar = "spell_time",
cf_duration = 4,
cf = x_change,
t_points = c(13, 25))
plte_plot(sim3) +
scale_y_continuous(limits = c(0, 0.4))
The results are shown in Figure 3.
Finally, we can use pltesim to examine not only the effects of changes
in x
, but also the compound effect of experiencing multiple events. To
specify multiple events, supply an additional value to t_points
. For
example, to simulate and visualize the compound effect of an event at
simulated time 20 use:
sim4 <- plte_builder(obj = m1, obj_tvar = "spell_time",
cf = x_change,
t_points = c(13, 20, 25))
plte_plot(sim4) +
scale_y_continuous(limits = c(0, 0.4))
We can see in Figure 4 that in addition to the LTE given
the event at time
The goal of pltesim is to allow researchers to easily explore and present the short- and long-term effects of models estimated with temporal dependence. These variables can have a massive influence on the outcome of interest and can change the substantive effects of key theoretical variables. Prominent theories [such as the conflict trap; see Collier et al. (2003)] often have expectations that their variables have long-lasting effects, or that a variable’s influence grows with each recurring event. Yet, up until the introduction of pltesim, scholars have been unable to estimate and graphically depict these theoretically interesting long-run dynamics from BTSCS models.
Phylogenetics, Spatial, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
btscs
is based on a function by the same name from the R package
DAMisc which itself
is based on the Stata (StataCorp 2009) command implementing the
procedure in (Beck et al. 1998). btscs
was included in
pltesim to (a) allow improvements for handling single period
spells, (b) match pltesim’s syntax for ease of use within one
workflow, and (c) to reduce pltesim’s dependencies. It also starts
the spell time counter at 1 rather than 0.[↩]plte_builder
to identify the polynomials given
just the base spell_time
variable name.[↩]Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gandrud & Williams, "Simulating Probabilistic Long-Term Effects in Models with Temporal Dependence", The R Journal, 2017
BibTeX citation
@article{RJ-2017-059, author = {Gandrud, Christopher and Williams, Laron K.}, title = {Simulating Probabilistic Long-Term Effects in Models with Temporal Dependence}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {2}, issn = {2073-4859}, pages = {401-408} }