While autoregressive distributed lag models allow for extremely flexible dynamics, interpreting the substantive significance of complex lag structures remains difficult. In this paper we discuss dynamac (dynamic autoregressive and cointegrating models), an R package designed to assist users in estimating, dynamically simulating, and plotting the results of a variety of autoregressive distributed lag (ARDL) models. It also contains a number of post-estimation diagnostics, including a test for cointegration for when researchers are estimating the error-correction variant of the ARDL model.
Many important social processes vary systematically over time. Models designed to account for such dynamics have become more common in the social sciences. Applied examples range from prime ministerial approval (Clarke et al. 2000), to the determinants of prison admission rates (Jacobs and Helms 1996), to preferences for different types of government spending (Wlezien 1995), to how changing tax rates affect income (Mertens and Montiel Olea 2018).
Dynamic processes take many forms. Some of these processes are integrated, such that the current value ("realization") of a series is an expression of all past values in the series plus some new innovation (i.e., a unit-root process of the form $y_t = y_{t-1} + \epsilon_t$). Others are stationary, tending to return to a long-run mean after a shock.
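Such an integrated series is easy to simulate; the short base-R sketch below (with hypothetical data) builds a random walk from its innovations and shows that first-differencing recovers a stationary series:

```r
set.seed(123)                  # reproducible example
innovations <- rnorm(100)      # the new shock arriving each period
y <- cumsum(innovations)       # y_t = y_{t-1} + e_t: an integrated series

# differencing an integrated series recovers the (stationary) innovations
all.equal(diff(y), innovations[-1])  # TRUE
```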
The job of properly modeling such a diverse set of dynamic processes is challenging. Researchers want to test whether a theoretical regressor of interest, $x$, has an effect on some outcome $y$, while still modeling the dynamics of both series appropriately.
We advocate for one particular specification because of its generalizability, flexibility, and robustness to a variety of data-generating processes: autoregressive distributed lag (ARDL) models.
ARDL models can account for multiple lags of independent variables,
either in levels or in first-differences, as well as multiple lags of
the dependent variable. Moreover, utilizing the ARDL-bounds testing
procedure of Pesaran et al. (2001), we can test whether variables in the
ARDL model are also in a cointegrating relationship. However, the very flexibility of this modeling strategy produces its own challenges. If the researcher includes multiple lags (or differences, or lagged differences) of some independent variable, it becomes increasingly difficult to discern the various effects of a change in that variable on the dependent variable.
We introduce a new R package, dynamac (Jordan and Philips 2018), to help ameliorate these two challenges. The core of the package consists of two functions: pssbounds, which implements the ARDL-bounds testing procedure for cointegration (Pesaran et al. 2001), and dynardl, which estimates ARDL models and simulates the effect of a counterfactual change in one of the regressors. In this respect, dynamac joins other R packages designed to show the substantive and statistical significance of dynamic models (for instance, the dynsim package, Gandrud et al. 2015, 2016), as well as broader classes of models such as those compatible with Zelig (Choirat et al. 2018).
Below we explain the context of ARDL models generally, as well as cointegration testing and the ARDL-bounds test in particular. We then discuss dynamac functions to help estimate both the ARDL-bounds test as well as the ARDL models (and their stochastic simulations). We provide illustrative examples of each function, and conclude by offering suggestions for future research.
The autoregressive distributed lag model has a very general form:

$$y_t = \alpha_0 + \delta t + \sum_{i=1}^{p}\phi_i y_{t-i} + \sum_{j=0}^{q}\theta_j x_{t-j} + \epsilon_t \tag{1}$$

The left-hand side can be estimated either in levels ($y_t$):

$$y_t = \alpha_0 + \phi_1 y_{t-1} + \theta_0 x_t + \theta_1 x_{t-1} + \epsilon_t \tag{2}$$

or in first-differences, the error-correction form:

$$\Delta y_t = \alpha_0 + \phi_1^{*} y_{t-1} + \theta_0 \Delta x_t + \theta_1^{*} x_{t-1} + \epsilon_t \tag{3}$$

For clarity, lagged first-differences are not shown, although they may be added to Equations (2) and (3), either for theoretical reasons or to purge autocorrelation from the residuals.
The strength and drawback of the ARDL model should be immediately apparent: while researchers can specify a variety of lagged levels and differences to account for theory as well as ensure white-noise (random) residuals (that is, no residual autocorrelation), the same flexibility can make inferences regarding the total effect of any individual variable difficult.
We circumvent this difficulty by employing stochastic simulations rather than direct interpretation of coefficients in order to show statistical and substantive significance. Dynamic stochastic simulations provide an alternative to direct hypothesis testing of coefficients, instead focusing on simulating meaningful counterfactuals from model coefficients many times and drawing inferences from the central tendencies of the simulations. Such simulations are becoming increasingly popular with increased computing power, as demonstrated in the social sciences by recent methodological and applied work (Tomz et al. 2003; Breunig and Busemeyer 2012; Williams and Whitten 2012; Gandrud et al. 2015; Philips et al. 2015, 2016; Choirat et al. 2018).
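The core of this simulation approach can be sketched in a few lines of base R (a toy static regression with made-up data, not dynardl's full implementation): estimate the model, draw many coefficient vectors from a multivariate normal centered on the estimates, and summarize the predictions those draws imply for a counterfactual value of x.

```r
library(MASS)   # ships with R; provides mvrnorm()
set.seed(123)

# hypothetical data and a simple regression
x <- rnorm(100)
y <- 0.5 + 0.3 * x + rnorm(100)
fit <- lm(y ~ x)

# draw 1000 plausible coefficient vectors from MVN(estimates, vcov)
draws <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# predicted response at a counterfactual x = 2, one per draw
pred <- as.vector(draws %*% c(1, 2))

# central tendency and uncertainty of the counterfactual response
quantile(pred, c(0.025, 0.5, 0.975))
```

Inference then rests on the central tendency and spread of pred rather than on any single coefficient.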
Recall that the other challenge we wish to confront is that of identifying cointegration in autoregressive distributed lag models in error-correction form. In instances of small-sample data, especially series with relatively few time points, many standard cointegration tests can be unreliable.
In dynamac, these challenges are resolved through pssbounds
(named
for Pesaran, Shin and Smith, the authors of the test), which implements
the ARDL-bounds test for cointegration, and dynardl
, which estimates
autoregressive distributed lag models and implements the stochastic
simulations we describe above. The commands also have the virtue of
working together: estimating an ARDL model in error-correction form with
dynardl
by nature stores the values necessary to execute the
ARDL-bounds test with pssbounds
. Below, we use a substantive example
to motivate our discussion of the package in greater detail.
We illustrate the process—autoregressive distributed lag modeling, testing for cointegration with pssbounds, and interpretation through stochastic simulation—with a substantive example. We hypothesize that public concern about income inequality, Concern (concern), is a function of the share of income going to the top ten percent, Income Top 10 (incshare10). We also hypothesize that the unemployment rate, Unemployment (urate), affects concern over the short-run (i.e., is not cointegrating):
Before estimating any model using dynamac, users should first check
for stationarity. A variety of unit root tests can be performed using
the urca package
(Pfaff et al. 2016). These suggest that all three series are integrated of order I(1), as they appear integrated in levels but stationary in first-differences (see the table below).
|  | Augmented DF | Phillips-Perron | Dickey-Fuller GLS | KPSS |
|---|---|---|---|---|
| Concern | 0.688 | -3.437 | -0.893 | 0.642 |
| ΔConcern | -3.507 | -7.675 | -3.124 | 0.814 |
| Unemployment | -0.612 | -2.762 | -2.802 | 0.224 |
| ΔUnemployment | -5.362 | -4.879 | -5.308 | 0.064 |
| Income Top 10 | 2.992 | 0.442 | 0.994 | 2.482 |
| ΔIncome Top 10 | -3.170 | -6.244 | -4.032 | 0.218 |

One augmenting lag included for all tests. The Augmented Dickey-Fuller, Phillips-Perron, and Dickey-Fuller GLS tests have a null hypothesis of a unit root, while KPSS has a null of stationarity.
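For readers who want the intuition behind these tests, a hand-rolled Dickey-Fuller-style regression on simulated data (a rough sketch, not a substitute for the urca implementations) regresses the change in a series on its own lagged level; a strongly negative t-statistic is evidence against a unit root:

```r
set.seed(1)
y <- cumsum(rnorm(200))   # a simulated integrated (unit-root) series

# t-statistic on the lagged level in a regression of diff(x) on x_{t-1}
df_tstat <- function(x) {
  dx    <- diff(x)
  lag_x <- x[-length(x)]
  summary(lm(dx ~ lag_x))$coefficients["lag_x", "t value"]
}

df_tstat(y)        # modest in magnitude: cannot reject a unit root
df_tstat(diff(y))  # strongly negative: the first-difference looks stationary
```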
Given that all series appear to be I(1), we proceed with estimating a
model in dynardl
in error-correction form, and then testing for
cointegration between concern about inequality and the share of income
of the top 10 percent. In general, we suggest using this
strategy—outlined in Philips (2018)—along with alternative tests
for cointegration.

dynardl
The syntax for dynardl
, the main function in the package, is as
follows:
- formula, a formula of the type y ~ x1 + x2, containing all variables to be estimated with dynardl.
- data, an optional argument specifying a particular dataset in which the variables from formula can be found.
- lags, a list of variables to be lagged and their corresponding lagged levels. For instance, if an analyst had two variables to be lagged by one period each: lags = list("X1" = 1, "X2" = 1). If he or she wanted to incorporate multiple lags of a variable: lags = list("X1" = 1, "X2" = c(1, 2, ...)), expanded for however many lags were desired.
- diffs, a vector of variables to be included as first-differences, i.e., diffs = c("X1", "X3").
- lagdiffs, a list of variables to be included with lagged first-differences. The syntax is identical to lags, but lagged first-differences of the relevant variable are incorporated instead of lagged levels: lagdiffs = list("X1" = 1).
- levels, a vector of variables to be included in levels contemporaneously—i.e., at time t: levels = c("X2", "X3").
- ec, a TRUE/FALSE option for whether the model should be estimated in error-correcting form. If ec = TRUE, the dependent variable will be run in differences. The default is FALSE.
- trend, a TRUE/FALSE option for whether a linear trend should be included in the regression. The default is FALSE.
- constant, a TRUE/FALSE option for whether a constant should be included in the regression. The default is TRUE.
- modelout, a TRUE/FALSE option for whether model output from the ARDL model should be printed in the console. The default is FALSE.
- simulate, a TRUE/FALSE option for whether any shocks should be simulated. Since simulations are potentially computationally intensive, an analyst who is simply using dynardl to test for cointegration with the bounds procedure, or who wishes to just view the model output, might not want to estimate simulations in the interim. The default is FALSE.

Reading the above, dynardl
is simply an engine for regression, but one
that allows users to focus on theoretical specification rather than
technical coding. All variables in the model are entered into the
formula. In this sense, dynardl
can be used in any ARDL context, not
just ones in which the user is also expecting cointegration testing or
dynamic simulations. We estimate our example model shown in Equation
(4) using dynardl
as follows:
data(ineq)
res1 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
                ec = TRUE, simulate = FALSE)
[1] "Error correction (EC) specified;
dependent variable to be run in differences."
We specify the formula by letting dynardl know what the dependent variable and the regressors are. Next, in the lags option, we let the program know we want a first lag of both Concern and Income Top 10; a specification such as lags = list("concern" = c(1, 2), "incshare10" = 2, "urate" = 2) would instead add first and second lags of concern and second lags of incshare10 and urate. In diffs, we let dynardl know to include the first-difference for both Income Top 10 and Unemployment. The option ec = TRUE estimates the dependent variable in error-correction form (i.e., takes the first-difference of Concern), and the program will issue a message indicating to the user that such a transformation has taken place; ec = FALSE would estimate the dependent variable in levels. If ec = TRUE, the saved object will also contain $pssbounds, described below. By specifying simulate = FALSE, we save on computing time by estimating the model without performing the stochastic simulations needed to make substantive inferences, as shown in the next sections. This is useful if, for instance, the residuals of our first model indicate autocorrelation and the model requires respecification.
The model summary is presented with the usual summary(foo):
summary(res1)
Call:
lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
collapse = "+"), collapse = " ")))
Residuals:
Min 1Q Median 3Q Max
-0.025848 -0.005293 0.000692 0.006589 0.031563
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.122043 0.027967 4.364 7.87e-05 ***
l.1.concern -0.167655 0.048701 -3.443 0.0013 **
d.1.incshare10 0.800585 0.296620 2.699 0.0099 **
d.1.urate 0.001118 0.001699 0.658 0.5138
l.1.incshare10 -0.068028 0.031834 -2.137 0.0383 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01169 on 43 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.3671, Adjusted R-squared: 0.3083
F-statistic: 6.236 on 4 and 43 DF, p-value: 0.0004755
As shown from the regression results, dynardl has included a constant, the lagged dependent variable, l.1.concern, the first-difference of the two regressors (Income Top 10 and Unemployment), as well as the lag of Income Top 10. Lagged levels are denoted by l.X.variable, where X is the lag; first-differences are denoted by d.variable; and lagged first-differences are denoted by l.d.X.variable, where X is the lag.
An essential component of ARDL modeling is ensuring that the residuals from any ARDL estimation are white noise. This is especially important in the presence of a lagged dependent variable, where residual autocorrelation renders the coefficient estimates biased and inconsistent.
To assist users in model selection and residual testing, we offer dynardl.auto.correlated. This function takes the residuals from an ARDL model estimated by dynardl and conducts two diagnostic tests—the Shapiro-Wilk test for normality and the Breusch-Godfrey test for higher-order serial correlation—as well as calculates the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the log-likelihood. The arguments for this function are:
- x, a dynardl model object.
- bg.type, a character string, either "Chisq" or "F", for the type of Breusch-Godfrey test for higher-order serial correlation. This returns either the Chi-squared test statistic or the F statistic. The default is "Chisq".
- digits, an integer for the number of digits to round fit statistics to (by default, three).
- order, an integer for the highest possible order of autocorrelation to test in the data. By default, this is computed from the length of the series.
- object.out, a TRUE/FALSE option for whether to print this output into an object. This might be useful if the analyst is comparing multiple ARDL objects and testing for white-noise residuals on the basis of fit criteria like AIC; in that case, the analyst could assign the output to some object for later comparison.

To run these post-estimation diagnostics for our model above, we type:
dynardl.auto.correlated(res1)
------------------------------------------------------
Breusch-Godfrey LM Test
Test statistic: 3.704
p-value: 0.054
H_0: no autocorrelation up to AR 1
------------------------------------------------------
Shapiro-Wilk Test for Normality
Test statistic: 0.965
p-value: 0.158
H_0: residuals are distributed normal
------------------------------------------------------
Log-likelihood: 148.094
AIC: -284.187
BIC: -272.96
Note: AIC and BIC calculated with k = 5 on T = 48 observations.
------------------------------------------------------
Breusch-Godfrey test indicates we reject the null hypothesis of no
autocorrelation at p < 0.10.
Add lags to remove autocorrelation before running dynardl simulations.
Given the results of the Breusch-Godfrey LM test for autocorrelation, there appears to be some autocorrelation still in the residuals. To mitigate this, we can add lagged first-differences to our model (Philips 2018). For instance, to add a lagged first-difference of the dependent variable:
res2 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("concern" = 1),
ec = TRUE, simulate = FALSE)
For brevity the results are not shown here, but they indicate that AR(1) autocorrelation is no longer present in the residuals after adding a lagged first-difference of the dependent variable.
pssbounds
Recall the definition of cointegration: two or more integrated series of
order I(1) are cointegrated if some linear combination of them results
in a stationary series.
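A brief simulated illustration (base R, made-up series) may help fix ideas: x below is a random walk and y tracks it, so although each series is I(1), the residual from a levels-on-levels regression (the "linear combination" in the definition) is stationary:

```r
set.seed(42)
x <- cumsum(rnorm(200))        # an integrated I(1) random walk
y <- 2 + 0.5 * x + rnorm(200)  # y moves with x: the two are cointegrated

# the linear combination y - 2 - 0.5x is just the stationary noise term,
# and the regression residual estimates it
z <- residuals(lm(y ~ x))
var(z) < var(x)   # the residual varies far less than the integrated series
```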
A variety of cointegration tests have been proposed, which we do not
discuss here. Instead, we advocate for a test that has been shown to
demonstrate strong small-sample properties (Philips 2018): the ARDL-bounds test for cointegration (Pesaran et al. 2001). The ARDL-bounds procedure proceeds as follows. First, analysts must ensure that the regressors are not of order I(2) or more, and that any seasonal component of the series has been removed. Second, analysts must ensure that the dependent variable is integrated of order I(1). Once the analyst ensures that the dependent variable is I(1) and the regressors are at most I(1), the ARDL model is estimated in error-correction form. Two points are key. First, any independent variables that are potentially I(1) must be entered in lagged levels at time t − 1. Second, the residuals must be white noise, which can be verified with dynardl.auto.correlated, discussed in the previous section.
The special case of the ARDL model in Equation (5) is tested with an F-test of the hypothesis that the coefficients on all variables appearing in first lags are jointly equal to zero. The null hypothesis is that of no cointegration; rejecting the null hypothesis indicates there is a cointegrating relationship between the series. In addition to the F-test, a one-sided t-test may be used to test the null hypothesis that the coefficient on the lagged dependent variable (in levels) is equal to zero. The alternative to this test is that the coefficient is less than zero, i.e., that error correction is taking place.
The procedure and tests themselves are relatively straightforward to implement, but are complicated by two factors. The first is that the critical values for these tests are non-standard, meaning that they are not readily available in canned procedures; the correct (asymptotic) critical values are provided by Pesaran et al. (2001), and small-sample critical values for the F-statistic are given by Narayan (2005). The second is that the appropriate critical values depend on the number of regressors and on the deterministic components (the "case") of the regression. In short, even if the test is powerful, it can be difficult to implement and interpret correctly. Accordingly, we offer two functions to help implement the test and draw dynamic inferences. The first function, dynardl, estimates the ARDL
, estimates the ARDL
relationship described above (to obtain the resulting F-statistic and
t-statistic). The second function, pssbounds
, takes the results of the
ARDL model and implements the ARDL-bounds test for cointegration,
providing the user with the correct critical values and an
easy-to-understand description of hypothesis testing.
To summarize, if we find that the dependent variable and regressors are
all I(1), they can only appear in lagged levels in the error-correction
model if there is evidence of cointegration. dynamac contains the
Pesaran et al. (2001) ARDL-bounds cointegration test to assist users in
testing for this. To implement the bounds testing procedure, we
introduce the function pssbounds
. This function has the following
syntax:
- data, an optional argument expecting a dynardl model object, from which the following arguments are calculated for the user. If none is given, the user must supply each of the following:
- obs, an integer for the number of observations in the model time series.
- fstat, the value of the F-statistic from the test described above.
- tstat, the value of the t-statistic on the lagged dependent variable.
- case, a numeric 1-2-3-4-5 or numeral I-II-III-IV-V of the case of the regression.
- k, the number of regressors appearing in first lags.
- digits, an integer for the number of digits to round fit statistics to (by default, three).
- object.out, a TRUE/FALSE option for whether to print this output into an object. The default is FALSE.

This command can be implemented in two ways. First, users can run the
appropriate model (following the instructions of Philips (2018)) and pass the relevant arguments—obs, fstat, tstat, case, and k—to pssbounds. Alternatively—and as we advocate—users can estimate the model using foo <- dynardl(...) and then, post-estimation, run pssbounds(foo). Note that the model must be estimated with ec = TRUE, as only error-correcting models can potentially be cointegrating.
Moving back to our example, to test for cointegration we use the pssbounds command. (Users should first ensure white-noise residuals from dynardl, and adjust their model as needed before testing for cointegration.)
res2 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("concern" = 1),
ec = TRUE, simulate = FALSE)
pssbounds(res2)
PESARAN, SHIN AND SMITH (2001) COINTEGRATION TEST
Observations: 47
Number of Regressors (k): 1
Case: 3
------------------------------------------------------
- F-test -
------------------------------------------------------
<------- I(0) ------------ I(1) ----->
10% critical value 4.19 4.94
5% critical value 5.22 6.07
1% critical value 7.56 8.685
F-statistic = 12.204
------------------------------------------------------
- t-test -
------------------------------------------------------
<------- I(0) ------------ I(1) ----->
10% critical value -2.57 -2.91
5% critical value -2.86 -3.22
1% critical value -3.43 -3.82
t statistic = -3.684
------------------------------------------------------
t-statistic note: Small-sample critical values not provided for Case III.
Asymptotic critical values used.
The program displays the number of observations from the regression, the
number of regressors appearing in lagged levels (i.e., the cointegrating
equation), and the case (unrestricted intercept and no trend); all of
these are necessary to obtain the special critical values used by
Pesaran et al. (2001).
In our example, since the value of the F-statistic exceeds the critical
value at the upper I(1) bound of the test at the 1% level, we may
conclude that Income Top 10 and Concern about inequality are in a
cointegrating relationship. As an auxiliary test, pssbounds displays a one-sided test on the t-statistic on the lagged dependent variable. When small-sample critical values are not available for this t-statistic, pssbounds issues a warning note.
dynardl
Once users have decided on the appropriate theoretical
model—accounting for the appropriate lags of the independent
variables, first-differenced variables, and so on—and drawn
conclusions regarding cointegration using pssbounds
(if applicable),
their resulting models might be somewhat complicated. Our solution to
inferences in the face of this complexity is the use of stochastic
simulations.
If a user runs dynardl
with the argument simulate = TRUE
, then the
following options become available:
- shockvar, the independent variable to be shocked in the simulation. This is the independent variable for which we want to estimate an effect. Analysts should enter this in quotes, like shockvar = "X1". This is the only option required without a default: we must know which variable we want inferences about.
- shockval, the amount by which the independent variable should be shocked. The default is a standard deviation of the shock variable.
- time, the time period within the range of the simulation at which to shock the independent variable of interest (the shockvar). The default is 10.
- qoi, whether we should summarize the response of the dependent variable with the mean or the median. Although the default is mean, if there is underlying skew in the distribution, it might be better summarized by median (Rainey 2017).
- forceset, a list of variables and the values at which they should be held during the simulations. By default, variables in levels are held at their means, and variables in differences are held at 0. If forceset is not specified, the simulation will be estimated using these default values; if the analyst instead wanted to investigate a counterfactual at other values, he or she could specify them in forceset. These can be any user-defined values, including means, medians, percentiles, or other values of interest: for instance, forceset = list("X1" = 3) holds X1 at 3 throughout the simulation.
- range, an integer for how long the simulation is to run (the default is 20 periods).
- burnin, the number of time periods before the range begins. burnin allows time for the variables to equilibrate. The default is 20, meaning that 20 time periods are simulated before the range of the simulation begins; users are not shown the burnin time periods.
- sims, the number of simulations to run. The quantities of interest are created from these simulations. The default number is 1000.
- sig, an integer for any user-desired significance level: for instance, a 90 percent interval would be sig = 90. The default level is 95.
- expectedval, a TRUE/FALSE option for whether the expected values (which average errors within simulations) or predicted values (which do not) are desired. Unless the analyst has a strong reason to use expected values, we recommend using predicted values by including expectedval = FALSE, the default.

dynardl
takes this set of arguments and generates the appropriate autoregressive distributed lag model given the levels, lagged levels, first-differences, and lagged first-differences of the variables provided. Once this model is estimated using OLS, dynardl simulates the quantities of interest. Specifically, the coefficients are drawn a number of times (given by sims) from a multivariate normal distribution with mean of the estimated coefficients and variance given by the estimated variance-covariance matrix, while the remaining variables are held at their means, at zero (for differenced terms), or at the values supplied in forceset. Note that the equation is given time to equilibrate (through the use of burnin periods) before the independent variable is shocked.
Perhaps the most important argument is shockvar. This is the variable for which the user wants some sort of inference regarding its effect on the dependent variable: the response in the dependent variable is created by shocking the shockvar. At period time, the shockvar is "shocked" by the amount of shockval. This shock is then distributed appropriately, as defined by the model specified: for instance, if the shockvar enters the model in lagged levels, the shock registers in the dependent variable in the following period. Only one shockvar can be specified.
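To see how a shock propagates, consider a stripped-down sketch with hypothetical coefficient values (not estimates from the example model): a lagged-dependent-variable equation is iterated forward, x is shocked at period 10, and y transitions from its old equilibrium to a new one.

```r
# y_t = a + phi * y_{t-1} + b * x_t, with illustrative (made-up) values
a <- 0.1; phi <- 0.8; b <- 0.5

x <- rep(1, 30)
x[10:30] <- 2                       # permanent +1 shock to x at period 10

y <- numeric(30)
y[1] <- (a + b * x[1]) / (1 - phi)  # start at the pre-shock equilibrium (= 3)
for (t in 2:30) y[t] <- a + phi * y[t - 1] + b * x[t]

# y rises toward the new long-run value (a + b * 2) / (1 - phi) = 5.5
round(y[c(9, 10, 15, 30)], 3)
```

The geometric adjustment visible here is exactly what the simulation plots trace out, with uncertainty bands added by the coefficient draws.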
The only thing left to resolve is what to do with our sims. We are not particularly interested in any individual simulation in any individual time period. Rather, we want a sense of the central effect of the shockvar on the dependent variable across the simulations. In foo$simulation, dynardl stores the desired summary quantity of interest (either qoi = "mean" or qoi = "median") of the simulated dependent variable in each period, along with upper and lower percentiles defined by sig. Additionally, users can specify whether they want predicted values (the average across the simulations within a time period) or expected values (where each individual simulation essentially averages out fundamental error). This is through the option expectedval = FALSE or TRUE, respectively; we highly recommend the former.
Moving back to our substantive example, now that we have our model and found the relationship to be cointegrating, we move on to interpreting the results through dynamic simulations. Again, although analytic calculations of short- and long-run effects are possible (c.f., De Boef and Keele 2008), with a multitude of lags and lagged first-differences the effect of a change in any one variable quickly becomes difficult to interpret directly. dynamac therefore provides two functions for plotting the simulated response: area.simulation.plot and spike.simulation.plot. The only difference is whether an area plot or a spike plot is desired (both are illustrated below). The arguments for both functions are:
- x, a dynardl model object with a simulation to plot.
- response, whether the simulated response of the dependent variable should be plotted in absolute levels of the response or in changes from the mean value of the dependent variable (response = "mean.changes"). The default is response = "levels".
- bw, whether the plot should be produced in black and white (for publications) or color. The default is bw = FALSE (a color plot).

To plot these simulations, all the researcher needs to do is imagine the counterfactual he or she wishes to investigate. For instance, what is the effect of a 7-percentage-point increase in Income Top 10 on Concern? To examine the statistical and substantive significance of this counterfactual, we estimate the following model and plot it. Note that users may want to first set a seed in order to ensure that the plots are reproducible, since the simulations are stochastic. The resulting plot is shown in Figure 1.
set.seed(2059120)
res3 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("concern" = 1),
ec = TRUE, simulate = TRUE,
shockvar = c("incshare10"), shockval = .07)
area.simulation.plot(res3)
Figure 1: Simulated response of Concern, plotted with area.simulation.plot().
The black dotted line shows the predicted value, while the colored areas, from darkest to lightest, show the 75, 90, and 95 percentiles of the predictions from the simulations, something akin to a credible interval or confidence interval. The shock to Income Top 10 occurs by default at period 10, which can be changed through the time option.
If instead we desire a spikeplot, we could type:
spike.simulation.plot(res3)
The resulting plot is shown in Figure 2.
Figure 2: The same simulated response, plotted with spike.simulation.plot().
Figures 1 and 2 both illustrate that the immediate effect of a 7-percentage-point increase in Income Top 10 is to increase Concern by about 4 percentage points. Over time, this effect decays, eventually moving Concern to a new value of 0.56 from its pre-shock mean of 0.58. For clarity: these effects are produced by estimating the model formula given by the user, taking the estimated coefficients from the model, drawing a number of simulated coefficients from a multivariate normal distribution with mean and variance given by the estimated coefficients and variance-covariance from the model, and using them to predict values of the dependent variable (incorporating fundamental error by drawing stochastic terms at each period). If needed, users can extend the range of the simulation time periods to ensure that the dependent variable has enough time to respond.
Users may have noticed fluctuations in the confidence intervals shown in
Figures 1 and 2. This is due to the
stochastic sampling technique used to create these simulations. When
using dynardl
, users have the option of increasing the number of
simulations performed using the sims
option. This should have the
effect of smoothing out period-to-period fluctuations in the confidence
intervals, at the cost of increased computing time.
dynardl
has a number of additional features users may find useful.
First, not only is it suitable for error-correction models; dynardl
can model a variety of stationary ARDL processes. To see this, we show a
simple example using data from Durr et al. (2000), who examine how ideological movements in the United States affect aggregate public support for the Supreme Court. Our model is one in which Supreme Court Support (dcalc) is a function of a constant, the lag of Supreme Court Support, the ideological divergence between the Supreme Court and the public (Ideological Divergence, iddiv), and the level of Congressional Approval (congapp). To estimate this model using dynardl, we type:
data(supreme.sup)
res4 <- dynardl(dcalc ~ iddiv + congapp, data = supreme.sup,
lags = list("dcalc" = 1),
levels = c("iddiv", "congapp"),
ec = FALSE, simulate = FALSE)
[1] "Dependent variable to be run in levels."
Anytime we specify levels, any variables appearing in the list will be included contemporaneously (i.e., appear at time t).
summary(res4)
Call:
lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
collapse = "+"), collapse = " ")))
Residuals:
Min 1Q Median 3Q Max
-12.1912 -3.7482 -0.2058 2.6513 11.9549
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.6319 11.7367 3.547 0.001078 **
l.1.dcalc 0.2437 0.1352 1.802 0.079758 .
iddiv -5.4771 2.5803 -2.123 0.040535 *
congapp 0.4732 0.1270 3.725 0.000649 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.626 on 37 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.4693, Adjusted R-squared: 0.4263
F-statistic: 10.91 on 3 and 37 DF, p-value: 2.833e-05
As the results show, the lag of Supreme Court Support is only weakly related to Supreme Court Support. Moreover, as Ideological Divergence grows, Supreme Court Support decreases. In contrast, as Congressional Approval increases, Supreme Court Support also increases. We can also bring a substantive interpretation to these effects by again using stochastic simulations, investigating the counterfactual of a 15-percentage-point decrease in Congressional Approval on Supreme Court Support. Moreover, we demonstrate in Figure 3 the difference the number of simulations makes when generating predictions.
# 1000 sims
res4 <- dynardl(dcalc ~ iddiv + congapp, data = supreme.sup,
lags = list("dcalc" = 1),
levels = c("iddiv", "congapp"),
ec = FALSE, simulate = TRUE,
shockval = -15, shockvar = c("congapp"))
# 15000 sims
res5 <- dynardl(dcalc ~ iddiv + congapp, data = supreme.sup,
lags = list("dcalc" = 1),
levels = c("iddiv", "congapp"),
ec = FALSE, simulate = TRUE,
shockval = -15, shockvar = c("congapp"),
sims = 15000)
par(mfrow = c(2, 1))
area.simulation.plot(res4)
area.simulation.plot(res5)
The effect of the 15-percentage-point decrease in Congressional Approval
is about a 10-point decrease in Supreme Court Support. To uncover this
effect, the upper plot uses 1000 simulations, while the bottom uses
15000 and results in a much smoother measure of uncertainty in
predictions across the simulated timeframe, with only a slight increase
in estimation time.
While the default range for simulations is 20 time periods, with a shock occurring at period 10, users can adjust both through the time and range options. In addition, dynardl offers the ability to set variables at particular values other than their means throughout the simulation through the option forceset. As an example, we re-estimate the model from above, but extend the total simulation range to 50 periods, move the shock point to period 20, and force Ideological Divergence and Congressional Approval to particular values:
res6 <- dynardl(dcalc ~ iddiv + congapp, data = supreme.sup,
lags = list("dcalc" = 1),
levels = c("iddiv", "congapp"),
ec = FALSE, simulate = TRUE,
shockval = -15, shockvar = c("congapp"),
sims = 15000, time = 20, range = 50,
forceset = list("iddiv" = 0.22, "congapp" = 72))
spike.simulation.plot(res6)
The results of a 15-percentage-point drop in Congressional Approval on Supreme Court Support are shown in Figure 4. It is clear that there is a relatively quick negative effect on Supreme Court Support in response to a drop in Congressional Approval, and Supreme Court Support remains at around 88 for the rest of the time periods.
dynamac also offers a few ancillary functions to assist in dynamic
modeling using autoregressive distributed lag models, and dynardl in
particular. These include three basic time-series operators to lag,
difference, and lag-difference variables. Users are free to utilize
these functions outside of estimating the models.
For lagging variables, we use lshift(x, l), where x is the variable
to be lagged and l is the number of periods to lag it by. If a user
wanted a second lag (that is, the value of x from two periods prior),
they would use lshift(x, 2):
head(ineq$incshare10)
[1] 0.3198 0.3205 0.3198 0.3182 0.3151 0.3175
head(lshift(ineq$incshare10, 2)) # second lag
[1] NA NA 0.3198 0.3205 0.3198 0.3182
For differencing variables, we use dshift(x)
, where x
is the
variable to be first-differenced:
head(dshift(ineq$incshare10))
[1] NA 0.0007 -0.0007 -0.0016 -0.0031 0.0024
For lag-differencing variables, we use ldshift(x, l), where x is the
variable to be lag-differenced and l is the number of periods for the
difference to be lagged. If a user wanted the first-difference of x
lagged two periods, they would use ldshift(x, 2):
head(ldshift(ineq$incshare10, 2))
[1] NA NA NA 0.0007 -0.0007 -0.0016
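The three operators are related to one another. The identities below follow from the operators' definitions above; this is a minimal sketch using a toy vector, assuming dynamac is installed and loaded:

```r
library(dynamac)

# A first-difference is the series minus its first lag; a lagged
# difference is the first-difference shifted back l periods.
x <- c(1, 4, 9, 16, 25)

all.equal(dshift(x), x - lshift(x, 1))
all.equal(ldshift(x, 2), lshift(dshift(x), 2))
```

Both comparisons should return TRUE, with the leading NA positions matching as well.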
As a final example, we model the relationship between a country’s energy
consumption and its economic output. There is a substantial body of
literature that focuses on both the short- and long-run effects between
energy and GDP—as well as the directionality of the
relationship—making it a prime candidate for time series analysis
(Ozturk 2010; Tugcu et al. 2012). In fact, a number of
articles utilize the bounds testing approach discussed above (e.g., Odhiambo 2009; Fuinhas and Marques 2012). In the analysis below, we
focus on two key variables: the natural log of gross domestic product
(in constant 2010 US dollars) and the log of energy consumption (in
million tons of oil equivalent). These data are for the French economy,
measured annually from 1965 to 2017. We assume that energy consumption,
lnenergy, drives GDP, lnGDP_cons2010USD, and not the other way around.
As Figure 5 suggests, both series appear non-stationary, likely with a trend as well. Unit-root tests (not shown) confirm that both series contain a unit root. We therefore estimate an ARDL model in error-correction form with a deterministic trend, test for cointegration, and interpret the results.
data(france.data)
res7 <- dynardl(lnGDP_cons2010USD ~ lnenergy, data = france.data,
lags = list("lnGDP_cons2010USD" = 1, "lnenergy" = 1),
diffs = c("lnenergy"), trend = TRUE,
ec = TRUE, simulate = FALSE)
summary(res7)
Call:
lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
collapse = "+"), collapse = " ")))
Residuals:
Min 1Q Median 3Q Max
-0.0262561 -0.0056128 0.0000717 0.0053919 0.0246083
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.995130 1.743518 4.012 0.000214 ***
l.1.lnGDP_cons2010USD -0.283335 0.071552 -3.960 0.000253 ***
d.1.lnenergy 0.257645 0.055622 4.632 2.88e-05 ***
l.1.lnenergy 0.170680 0.048796 3.498 0.001036 **
trendvar 0.003824 0.001063 3.598 0.000769 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01112 on 47 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.6688, Adjusted R-squared: 0.6406
F-statistic: 23.72 on 4 and 47 DF, p-value: 8.84e-11
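From these estimates we can also compute the long-run multiplier (LRM) of energy consumption: the negative of the coefficient on the lagged level of lnenergy divided by the coefficient on the lagged dependent variable. This calculation is standard for ARDL models in error-correction form (it is not a dynamac function); a quick sketch using the coefficients reported above:

```r
# Long-run multiplier of lnenergy, from the output above:
# -(coefficient on l.1.lnenergy) / (coefficient on l.1.lnGDP_cons2010USD)
lrm <- -(0.170680) / (-0.283335)
lrm
```

This suggests a long-run elasticity of roughly 0.6: a one-percent increase in energy consumption is associated with about a 0.6 percent increase in GDP once the system has fully re-equilibrated.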
As is clear from the regression results, there appears to be a fairly
slow rate of re-equilibration when the two series are out of their
long-run equilibrium relationship, as shown by the coefficient on the
lagged dependent variable. Moreover, both the first-difference and lag
of the log of energy consumption, as well as the trend variable, are
statistically significant. Before proceeding to test for cointegration,
we ensure the residuals are white-noise using dynardl.auto.correlated
:
dynardl.auto.correlated(res7)
------------------------------------------------------
Breusch-Godfrey LM Test
Test statistic: 2.563
p-value: 0.109
H_0: no autocorrelation up to AR 1
------------------------------------------------------
Shapiro-Wilk Test for Normality
Test statistic: 0.971
p-value: 0.241
H_0: residuals are distributed normal
------------------------------------------------------
Log-likelihood: 162.789
AIC: -313.578
BIC: -301.87
Note: AIC and BIC calculated with k = 5 on T = 52 observations.
------------------------------------------------------
The Breusch-Godfrey test suggests there is no residual autocorrelation
of order AR(1), and the residuals are approximately normally
distributed. In addition, using the information criteria reported in
dynardl.auto.correlated
, we find that the model with a trend term
estimated above is more consistent with the data than one without (these
results are not reported here). Given that the model appears to be
well-specified, we proceed to test for cointegration. Note that the
program automatically selects critical values from “Case 5”, an
unrestricted intercept and trend:
pssbounds(res7)
PESARAN, SHIN AND SMITH (2001) COINTEGRATION TEST
Observations: 52
Number of Regressors (k): 1
Case: 5
------------------------------------------------------
- F-test -
------------------------------------------------------
<------- I(0) ------------ I(1) ----->
10% critical value 5.8 6.515
5% critical value 6.93 7.785
1% critical value 9.8 10.675
F-statistic = 8.2
------------------------------------------------------
- t-test -
------------------------------------------------------
<------- I(0) ------------ I(1) ----->
10% critical value -3.13 -3.4
5% critical value -3.41 -3.69
1% critical value -3.96 -4.26
t statistic = -3.96
------------------------------------------------------
t-statistic note: Small-sample critical values not provided for Case V.
Asymptotic critical values used.
Since the F-statistic (i.e., the test that all variables appearing in lagged levels are equal to zero) is larger than the I(1) critical value in the table at the 5% level, we conclude that cointegration between energy consumption and economic output exists. In addition, the one-sided t-test of the coefficient on the lagged dependent variable also supports the conclusion of cointegration. Given our model results, and finding evidence of cointegration, we turn to a substantive interpretation of our findings: what would be the effect of a one standard deviation decrease in energy consumption on GDP?
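The size of this shock, one standard deviation of logged energy consumption, can be recovered directly from the data. A minimal sketch, assuming the variable name from the france.data dataset used above:

```r
library(dynamac)
data(france.data)

# One standard deviation of the log of energy consumption;
# this is the (absolute) size of the shock used in the simulation
sd(france.data$lnenergy, na.rm = TRUE)
```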
res7 <- dynardl(lnGDP_cons2010USD ~ lnenergy, data = france.data,
lags = list("lnGDP_cons2010USD" = 1, "lnenergy" = 1),
diffs = c("lnenergy"), trend = TRUE,
ec = TRUE, simulate = TRUE, shockvar = "lnenergy", shockval = -0.2289)
[1] "Error correction (EC) specified; dependent variable to be run in differences."
[1] "Deterministic linear trend added to model formula."
[1] "dynardl estimating ..."
|==============================================================| 100%
spike.simulation.plot(res7)
As shown by the spike plot in Figure 6, the pre-shock predictions
(i.e., any prediction before the shock occurs) are clearly
distinguishable from the post-shock values: GDP falls in response to
the decrease in energy consumption and settles at a new, lower level.
In this paper we have introduced dynamac, a suite of functions
designed to assist applied time series analysts. This centers around
dynardl
, a multipurpose function to estimate autoregressive
distributed lag models. This flexible program models both stationary
ARDL relationships as well as cointegrating ones, and it allows users to
easily change model specifications (e.g., lags, differences, and
lag-differences). We also include two post-estimation diagnostic
commands to check for white-noise residuals and cointegration. Last,
through several applied examples we have shown the advantages of dynamic
simulation for interpreting the substantive significance of the results
of single-equation time series models, something users can easily do
using dynamac.
There are a number of interesting potential future research areas. Most useful would be extending ARDL modeling to the panel context. While other packages are designed to handle multiple panels (cf. Gandrud et al. 2016), the ability to incorporate models that account for unit heterogeneity, such as random or fixed intercepts, would prove useful to users. In addition, adding the ability to estimate alternative models in the dynamic panel data context, such as those that account for causal feedback and time-varying covariates (Blackwell and Glynn 2018), or generalized method of moments (GMM) estimators, would also assist practitioners.
Notes:
1. lags = list("concern" = c(1,2), "incshare10" = 2, "urate" = 2) would add lags at incshare10 and urate.
2. ec = FALSE would estimate the dependent variable in levels. If ec = TRUE, the saved object will also contain $pssbounds, described below.
3. Lags are denoted by l.X.variable, where X is the lag. First-differences are denoted by d.variable. Lagged first-differences are denoted by l.d.X.variable, where X is the lag.
4. This requires ec = TRUE, as only error-correcting models can potentially be cointegrating.
5. Users should check the residuals from dynardl, and adjust their model as needed before testing for cointegration.
6. In this case, pssbounds issues a warning note.
For attribution, please cite this work as:
Jordan, Soren and Andrew Q. Philips. 2018. "Dynamic Simulation and Testing for Single-Equation Cointegrating and Stationary Autoregressive Distributed Lag Models." The R Journal 10(2): 469-488.