This article describes the R package OrthoPanels, which includes the function opm()
. This function implements the orthogonal reparameterization approach recommended by (Lancaster 2002) to estimate dynamic panel models with fixed effects (and optionally: wave specific intercepts). This article provides a statistical description of the orthogonal reparameterization approach, a demonstration of the package using real-world data, and simulations comparing the estimator to the known-to-be-biased OLS estimator and the commonly used GMM estimator.
Panel data includes observations on
This article provides an introduction to the R package
OrthoPanels, which
includes the function opm()
. This function implements the orthogonal
reparameterization approach recommended by (Lancaster 2002) to estimate
dynamic panel models with fixed effects (and optionally: wave specific
intercepts). In this article, we first provide a brief review of the
methods implemented in OrthoPanels, and then discuss an empirical
illustration using some of the features of the function opm()
.
Finally, we conduct some Monte Carlo simulations to demonstrate the
performance of the opm()
function under different assumptions about
the data generating process.
Panel models are used when we have observations on
A dynamic panel model contains one or more lags of the dependent
variable on the right-hand side. A dynamic model is necessary if the
dependent variable is autoregressive (Arellano 2003 129). We would
expect a dependent variable to be autoregressive if we believe that
subsequent to something happening that temporarily changes the value of
the dependent variable (e.g., a temporary shift in an independent
variable), the dependent variable will return partly but not entirely to
its original value before the next observation. The consequence is that
we have a dependent variable in which values that are above/below
average at one observation are more likely than not to be above/below
average at the next observation. This autoregressive relationship
between current and past values of the dependent variable is not due to
an omitted variable that links current and past values and not due to
correlation between the current and past values of the errors. The
autoregressive relationship is due to past values of the dependent
variable having a direct effect on current values (Finkel 2008 486). A simple example of a data generating process that might
require a dynamic model with fixed effects (and wave specific
intercepts) is:
where
The estimation of (1) presents a challenge. The OLS
fixed-effects estimator with a lag dependent variable is biased with a
fixed (small)
A popular alternative to the OLS estimation is the (Anderson and Hsiao 1981)
instrumental variable approach and more generally the Arellano and Bond
GMM estimator (1991), and other GMM estimators
(Arellano and Bover 1995; Blundell et al. 2000). The downside of these
estimators is that they are inefficient (Behr 2003). Further, the GMM
approach uses approximate inference methods and requires assumptions
about the appropriateness of past values of the dependent variable (and
possibly independent variables) as instruments. These assumptions may or
may not be valid. Lancaster (2000) argues the GMM estimator
contains no data or information that is not already contained in the
likelihood for the model. (Hsiao et al. 2002) propose a
transformed-likelihood approach to dealing with the incidental
parameters problem. (Lancaster 2002) proposes a conditional likelihood
estimator (also a type of transformed-likelihood approach) that can
analytically compute the conditional probability distributions, over our
variables of interest, exactly. Such likelihood-based estimators require
no instrumental variable assumptions. We refer to the approach suggested
by Lancaster as orthogonal reparameterization (or OPM for orthogonalized
panel model). This approach can be conceived of as a Bayesian version of
the frequentist, transformed-likelihood approach proposed and tested by
Hsiao, Pesaran and Tahmiscioglu (2002). See
(Pesaran 2015 692–695) and (Hsiao 2014 80–135) for a
review of likelihood-based methods developed to estimate linear dynamic
panel models with fixed effects. The transformed-likelihood approach has
been shown to perform better than GMM estimators
(Hsiao et al. 2002; Hsiao 2014), particularly when the
autoregressive parameter is close to
The Lancaster likelihood-based estimator proceeds as follows. The
maximum likelihood estimation of model (1) with a fixed (small)
The key to this approach is that we are not actually interested in
estimates of the incidental parameters
A straightforward example of the OPM approach (Lancaster 2002) is as
follows. We wish to estimate a model that includes parameters
Suppose the fixed effects can be reparameterized so that the likelihood
function for the data for a single case factors as:
Not all likelihoods can be transformed so that the incidental parameters
are orthogonalized. However, a solution with an equivalent intuition may
be available. It may be possible to reparameterize the fixed effects so
that they are information orthogonal. If we denote the log of the
likelihood for the data for observation
Once we have the marginal posterior for our parameters of interest, we
can use Monte Carlo methods to sample values from the marginal posterior
to produce estimates and credible intervals for the parameters, as
follows. We wish to estimate (1). To generalize a little, let us
allow for
Sampling from this posterior (4) gives us estimates
(distributions) for
Next, we integrate out
We can now proceed to sample triplet values
The limitation of this approach is that it must be worked out for each family of models. For example, the necessary reparameterization will differ for a panel probit model. Lancaster (2000) provides reparameterizations for Poisson count models, static linear panel models and dynamic (stationary and nonstationary) linear panel models. Estimates of dynamic, binary models by this approach are still in their early days (é and Kyriazidou 2000; Arellano and Bonhomme 2009). The approach relies on the Bayesian idea of integrating out the fixed effects to give us the marginal posterior distribution for the remaining parameters – although, this has many similarities to the frequentist idea of a conditional likelihood (Cox and Reid 1987). For a review and comparison of these approaches, see (Lancaster 2000).
In this section, we demonstrate the use of the OrthoPanels package
with an empirical example.
Evaluations of the leaders of the Conservative, Liberal Democrat and
Labour parties, measured using the question: “On a scale that runs
from 0 to 10, where 0 means strongly dislike and 10 means strongly
like, how do you feel about (David Cameron/Nick Clegg/Gordon
Brown)?” ‘0
A standard retrospective assessment of the national economic situation, or ‘sociotropic’ evaluation, using the question: “How do you think the general economic situation in this country has changed over the last 12 months? Has it:” Response options are: ‘got a lot worse’ (1); ‘got a little worse’ (2); ‘stayed the same’ (3); ‘got a little better’ (4); ‘got a lot better’ (5); ‘Don’t Know’.
Two policy evaluation questions. These are evaluations of how the
government has handled the NHS and terrorism.
A measure of the respondent’s ideology based on preference for
increasing or cutting taxes, using the question: “Using the 0 to 10
scale below, where the end marked 0 means that government should cut
taxes a lot and spend much less on health and social services, and
the end marked 10 means that government should raise taxes a lot and
spend much more on health and social services, where would you place
yourself on this scale?” ‘0
And a measure of whether the respondent identifies with the governing Labour party, using the question: “Generally speaking, do you think of yourself as Labour, Conservative, Liberal Democrat or what?” This is recoded 1 for Labour and 0 otherwise.
For each variable, a ‘Don’t know’ response was identified as missing and the corresponding case was deleted. Note though that OrthoPanels can accomodate unbalanced panel data under the assumption that the data is missing at random.
The variables for the model to be estimated can be contained in a frame, list, or environment and the model can be specified symbolically with a formula as described below.
The variables for our empirical example are included in the data frame
BES_panel
. This data frame is included with the OrthoPanels package.
> head(BES_panel)
n t Econ Clegg Brown Cameron Approve NHS Terror PID Tax
1 1 1 3 0 9 0 7 0 0 1 6
2 2 1 4 0 10 0 8 0 0 1 8
3 3 1 3 0 5 4 7 0 0 0 6
4 4 1 2 0 7 3 4 0 0 1 6
5 5 1 2 0 0 0 0 0 0 0 5
6 6 1 2 0 7 0 8 0 0 1 4
In this data frame, "Econ"
refers to evaluations of the national
economy; "Clegg"
refers to evaluations of the Liberal Democrat leader;
"Brown"
refers to evaluations of the Labour leader; "Cameron"
refers
to evaluations of the Conservative leader; "Approval"
refers to
government approval; "NHS"
refers to evaluations on how the government
has handled the NHS; "Terror"
refers to evaluations on how the
government has handled terrorism; "PID"
refers to whether the
individual identifies as Labour; and "Tax"
refers to the respondent’s
ideology based on their preference for cutting taxes. In addition to the
variables, the data frame contains the vectors "n"
and "t"
indicating the case number and wave for each observation. If the
independent and dependent variables are contained in 3- and 2-dimension
arrays, these vectors are unnecessary because the index is implicit in
the organization of the arrays.
We begin by specifying our model using a formula, indicating the data to be used, and the case and time variables:
> BES.opm.model<-opm(Approve ~ Econ + Clegg + Brown + Cameron + NHS + Terror + PID + Tax,
data = BES_panel, index = c('n','t'), n.samp = 10000, add.time.indicators = TRUE)
The first argument is a formula specifying the model symbolically:
response
term1 + term2
. This is consistent with the lm()
function. It is not necessary to include the lagged dependent variable
or the fixed effects in the model specification. This is done
automatically.
The other arguments are: data
which specifies the data frame, list or
environment containing the variables in the model; n.samp
which
specifies the number of Monte Carlo iterations to use to estimate the
parameters; index
which is a two-element vector containing the names
of the case and time variables, respectively; and add.time.indicators
which is a logical argument. If the add.time.indicators
is TRUE
, the
model includes dummy variables for each wave (time point). The default
is FALSE
. The data
and index
arguments are optional. If data is
not specified, the variables are taken from the environment from which
opm()
is called. If index is not specified, the first two vectors are
assumed to be the case and time indices, respectively. An additional
optional argument is subset
. This is a vector specifying a subset of
observations to be used in the estimation.
The function opm()
returns an object of class "opm"
which includes a
list, samples
, with the following elements: "rho"
, a vector of
n.samp
parameter samples of "v"
, a vector of n.samp
parameter samples of "beta"
, a n.samp
by x
variable matrix of parameter samples of
> summary(BES.opm.model)
Call:
opm(x = Approve ~ Econ + Clegg + Brown + Cameron + NHS + Terror +
PID + Tax, data = BES_panel, index = c("n", "t"), n.samp = 10000,
add.time.indicators = TRUE)
Parameter estimates:
<--95CI <--68CI med 68CI--> 95CI-->
rho 0.128000 0.15800000 0.189000 0.224000 0.25800000
sig2 1.611477 1.68193076 1.759734 1.846594 1.93906618
beta.Econ -0.038086 0.00077932 0.041545 0.081118 0.12235574
beta.Clegg -0.020535 0.00060795 0.022996 0.045020 0.06678845
beta.Brown 0.297306 0.31959250 0.343490 0.367214 0.39072453
beta.Cameron -0.124476 -0.10195910 -0.077744 -0.054332 -0.03195979
beta.NHS -0.235810 -0.18692841 -0.136087 -0.085502 -0.03647700
beta.Terror -0.153162 -0.11566089 -0.077034 -0.037453 -0.00036214
beta.PID 0.488818 0.64660493 0.804710 0.959541 1.10995704
beta.Tax 0.017598 0.04022762 0.063176 0.086574 0.10903010
beta.tind.2 0.217953 0.27900281 0.342908 0.406506 0.46890678
The package includes functions that may be of use in exploring the
results. The function confint()
computes equal tailed credible
intervals for one or more parameters in the fitted opm model. We can
calculate 90% equal tailed credible intervals as follows:
> confint(BES.opm.model, level = 0.90)
5% 95%
rho 0.13800000 0.24700000
sig2 1.63217694 1.90996203
beta.Econ -0.02625512 0.10794038
beta.Clegg -0.01312387 0.06009414
beta.Brown 0.30435125 0.38292909
beta.Cameron -0.11665040 -0.03883916
beta.NHS -0.21928907 -0.05272413
beta.Terror -0.14110536 -0.01271258
beta.PID 0.53904739 1.05947755
beta.Tax 0.02570175 0.10189246
beta.tind.2 0.23669235 0.4483367
The function caterplot()
creates side-by-side plots of credible
intervals of the opm model parameters. The intervals are displayed as
horizontal lines, with the 90% interval using a thicker line width and
the 95% interval a thinner one. The posterior median is indicated with a
dot.
caterplot(BES.opm.model)
abline(v=0)
We can use the function plot()
to obtain the posterior density of each
parameter.
plot(BES.opm.model, "rho")
In a dynamic model, the
> quantile(BES.opm.model$samples$beta[, 1] / (1 - BES.opm.model$samples$rho),
+ probs = c(0.025, 0.5, 0.975))
2.5% 50% 97.5%
-0.04723233 0.05098789 0.15025695
opm()
function: Monte Carlo simulationsTo demonstrate the performance of the OPM estimator operationalized by
the opm()
function, we use it to estimate models from simulated data
with a known data generating process.
We use
We generate panel data sets with an
Table 1, panels A through D, report the results of the simulations when
the autoregressive parameter,
The table reports the average value estimated for the autoregressive
parameter
In addition to presenting the results for the opm()
function (denoted
OPM), the tables also present the results from an OLS fixed effects
estimation and from a GMM estimation
(Arellano and Bover 1995; Blundell and Bond 1998). The OLS fixed effects results
demonstrate the estimation bias that the OPM estimator is designed to
rectify. It is also a commonly used estimator for this type of panel
data, despite the known bias (Nickell 1981).The GMM approach is probably
the most commonly used alternative and appears to be the next best
performing estimator compared to likelihood-based estimators
(Hsiao et al. 2002). There are a number of variations on the
GMM approach (Croissant and Millo 2008). We used a difference GMM with a
two-step estimator. We use the available lags of the dependent variable
as instruments. This specification was chosen as it seemed to produce
the best results given our data generating process. We used the
plm package to produce the
GMM estimates (Croissant and Millo 2008). This is a very versatile package
for estimating panel models within R.
A (T=2; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Approach | Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
OLS, FE | 0.59 | 0.42 | 1.05 | 1.04 | 0 | 0 | 0 | 0.31 | 0.08 | 3.95 |
GMM | 0.92 | 0.5 | 1.98 | 2.21 | 0.97 | 0.97 | 0.72 | 0.18 | 0.05 | 36.57 |
OPM | 0.9 | 0.5 | 5.89 | 5.17 | 0.93 | 0.94 | 0.92 | 0.03 | 0.01 | 2.79 |
B (T=3; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
Approach | Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
OLS, FE | 0.71 | 0.45 | 1.54 | 1.54 | 0 | 0 | 0 | 0.19 | 0.05 | 3.46 |
GMM | 0.9 | 0.5 | 6.17 | 3.67 | 0.95 | 0.96 | 0.80 | 0.09 | 0.02 | 60.75 |
OPM | 0.9 | 0.5 | 5.29 | 5.03 | 0.92 | 0.94 | 0.91 | 0.02 | 0.007 | 1.26 |
C (T=4; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
Approach | Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
OLS, FE | 0.76 | 0.47 | 1.93 | 1.92 | 0 | 0 | 0 | 0.14 | 0.03 | 3.07 |
GMM | 0.9 | 0.5 | 50.22 | 4.46 | 0.94 | 0.94 | 0.85 | 0.06 | 0.01 | 1633.78 |
OPM | 0.9 | 0.5 | 5.09 | 4.98 | 0.94 | 0.94 | 0.93 | 0.01 | 0.005 | 0.68 |
D (T=9; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
Approach | Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
OLS, FE | 0.84 | 0.49 | 3.07 | 3.07 | 0 | 0.02 | 0 | 0.06 | 0.01 | 1.93 |
GMM | 0.9 | 0.5 | 4.92 | 4.81 | 0.92 | 0.92 | 0.88 | 0.02 | 0.005 | 0.92 |
OPM | 0.9 | 0.5 | 5.02 | 5.02 | 0.95 | 0.94 | 0.94 | 0.005 | 0.003 | 0.26 |
E (T=2; N=1000; Rho=0.5; Beta=0.5; LR Effect=1) | ||||||||||
Approach | Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
OLS, FE | 0.29 | 0.45 | 0.63 | 0.63 | 0 | 0 | 0 | 0.21 | 0.05 | 0.37 |
GMM | 0.5 | 0.5 | 1.01 | 1.00 | 0.96 | 0.95 | 0.96 | 0.04 | 0.01 | 0.11 |
OPM | 0.5 | 0.5 | 1.01 | 1.01 | 0.93 | 0.94 | 0.92 | 0.03 | 0.01 | 0.07 |
F (T=3; N=1000; Rho=0.5; Beta=0.5; LR Effect=1) | ||||||||||
Approach | Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
OLS, FE | 0.37 | 0.47 | 0.75 | 0.75 | 0 | 0.008 | 0 | 0.13 | 0.03 | 0.25 |
GMM | 0.5 | 0.5 | 1.00 | 1.00 | 0.94 | 0.95 | 0.94 | 0.02 | 0.01 | 0.06 |
OPM | 0.5 | 0.5 | 1.00 | 1.00 | 0.92 | 0.94 | 0.92 | 0.02 | 0.01 | 0.04 |
Beginning with a
When we look at the results for the OPM and GMM approaches (again with
It may seem curious that the GMM estimator is so poor at estimating the
long-run effect, when, on average, it is good at estimating
It must be noted then that for the reasons just discussed the mean
estimate is not a very stable measure of the performance of the GMM
estimates of the long-run effect. For the GMM estimator, under these
circumstances, small differences in the estimates of
As
Table 1, Panels E & F, present the results with a reduced
Overall, the simulations demonstrate that under the conditions
considered (specifically a low
Rho | Beta | LR Effect | LR Effect | Rho | Beta | LR Effect | Rho | Beta | LR Effect | |
(Average) | (Average) | (Average) | (Median) | (95% CI) | (95% CI) | (95% CI) | (RMSE) | (RMSE) | (RMSE) | |
(T=3; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
0.93 | 0.51 | 7.74 | 7.63 | 0.06 | 0.62 | 0.17 | 0.03 | 0.01 | 2.95 | |
(T=4; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
0.93 | 0.51 | 7.45 | 7.36 | 0.02 | 0.48 | 0.04 | 0.03 | 0.01 | 2.56 | |
(T=9; N=1000; Rho=0.9; Beta=0.5; LR Effect=5) | ||||||||||
0.91 | 0.5 | -6.99 | 4.11 | 0.05 | 0.06 | 0.07 | 0.07 | 0.03 | 147.93 | |
(T=3; N=1000; Rho=0.5; Beta=0.5; LR Effect=1) | ||||||||||
0.53 | 0.51 | 1.08 | 0.52 | 0.48 | 0.29 | 0.28 | 0.03 | 0.01 | 0.09 |
Beyond the poor estimates for the long-run effect under specific
conditions, we have also indicated that one of the downsides of GMM
estimators is the need to make assumptions about the appropriateness of
instrumental variables. Up to this point, we have been using lagged
levels of the dependent variable as instruments in the GMM estimator. An
extension of this is to use the lagged differences of the dependent
variable as instruments (Arellano and Bover 1995; Blundell and Bond
1998). This is possible once the researcher has a
As a second empirical example, we model the dynamics of labour demand of
firm abond_panel
. This dataframe is
included with the OrthoPanels package.
We estimate the following:
> abond.opm.model <- opm(n ~ w + l_w + k + l_k + l2_k + ys + l_ys + l2_ys + yr1980
+ yr1981 + yr1982 + yr1983 + yr1984, data = abond_panel, index = c('id','year'),
n.samp = 10000, add.time.indicators = FALSE)
where n
is the log of employment in firm w
is the
log of the real product wage; k
is the log of the gross capital stock;
and ys
is the log of the gross industry output.
> summary(abond.opm.model)
Call:
opm(x = n ~ w + l_w + k + l_k + l2_k + ys + l_ys + l2_ys + yr1980 +
yr1981 + yr1982 + yr1983 + yr1984, data = abond_panel, index = c("id",
"year"), n.samp = 10000, add.time.indicators = FALSE)
Parameter estimates:
<--95CI <--68CI med 68CI--> 95CI-->
rho 0.123000 0.1400000 0.160000 0.179000 0.198000
sig2 0.042789 0.0452270 0.047862 0.050739 0.053676
beta.w -0.741396 -0.6200462 -0.497854 -0.376953 -0.262932
beta.l_w -0.321563 -0.2077651 -0.088837 0.030969 0.146837
beta.k 0.452300 0.5184428 0.585200 0.655670 0.718653
beta.l_k -0.088889 0.0042183 0.097407 0.188744 0.279613
beta.l2_k -0.031549 0.0319986 0.097301 0.163611 0.229459
beta.ys 0.207566 0.4594758 0.714674 0.971791 1.234787
beta.l_ys -1.055967 -0.6944647 -0.329579 0.030539 0.392799
beta.l2_ys -0.240888 0.0202597 0.301264 0.580882 0.844360
beta.yr1980 -0.040064 -0.0050531 0.029723 0.064730 0.099077
beta.yr1981 -0.057071 -0.0135131 0.029364 0.073729 0.115364
beta.yr1982 -0.019645 0.0192744 0.063018 0.106132 0.146820
beta.yr1983 0.031298 0.0759822 0.121946 0.167622 0.211210
beta.yr1984 0.050123 0.1021425 0.156051 0.207324 0.258260
We calculate 90% equal tailed credible intervals as follows:
> confint(abond.opm.model, level = 0.90)
5% 95%
rho 0.128000000 0.19200000
sig2 0.043604899 0.05267575
beta.w -0.699216408 -0.29845898
beta.l_w -0.283471542 0.10825151
beta.k 0.474339992 0.69903010
beta.l_k -0.057501600 0.25006782
beta.l2_k -0.010780489 0.20708355
beta.ys 0.293130802 1.15175284
beta.l_ys -0.938412547 0.28266139
beta.l2_ys -0.161339807 0.76612707
beta.yr1980 -0.028266743 0.08774348
beta.yr1981 -0.042205375 0.10218775
beta.yr1982 -0.006818364 0.13445195
beta.yr1983 0.045990409 0.19575660
beta.yr1984 0.067247279 0.24236211
We create side-by-side plots of credible intervals of the OPM model parameters. The intervals are displayed as horizontal lines, with the 90% interval using a thicker line width and the 95% interval a thinner one. The posterior median is indicated with a dot.
caterplot(abond.opm.model)
abline(v=0)
Let’s investigate the accuracy of opm()
’s parameter estimates on 200
simulated datasets.
First, let’s define the parameters used by the data-generating process:
rho <- .5
beta <- .5
sig2 <- 1
set.seed(321)
The following function generates a synthetic dataset of desired
dimensions (rho
), beta
), and sig2
)):
generate <- function(N, T, rho, beta, sig2) {
LT <- T + 50
f <- runif(N, -1, 1)
x <- array(.75 * f, dim = c(N, LT)) + rnorm(N * LT, sd = 4)
y <- matrix(0, N, LT)
for (t in 1:LT) {
yy <- if (t > 1)
y[, t - 1]
else
((f + beta * .75 * f)/(1 - rho))
y[, t] <- rho * yy + f + x[, t] * beta +
rnorm(N, sd = sqrt(sig2))
}
data.frame(i = rep(seq(N), LT - 50),
t = rep(seq(LT - 50), each = N),
x1 = c(x[(50 * N + 1):(LT * N)]),
y = c(y[(50 * N + 1):(LT * N)]))
}
Now we generate 200 datasets with
N <- 1000
T <- 3
reps <- 200
ds <- replicate(n = reps,
generate(N = N, T = T, rho = rho, beta = beta, sig2 = sig2),
simplify = FALSE)
Now we fit the OPM model to the datasets and save the results:
library(OrthoPanels)
library(knitr)
set.seed(421)
opms <- lapply(ds, function(d) {
opm(y ~ x1, data = d, n.samp = 1000)
})
Let’s check the sampled parameters:
true_param <- c(rho = rho, sig2 = sig2, beta = beta)
est_param <- sapply(opms, coef)
resid <- sweep(est_param, 1, true_param)
rmse <- sqrt(rowMeans(resid ^ 2))
kable(rbind(`True` = true_param,
`Est` = rowMeans(est_param),
`Bias` = rowMeans(resid),
`RMSE` = rmse))
rho | sig2 | beta | |
---|---|---|---|
True | 0.5000000 | 1.0000000 | 0.5000000 |
Est | 0.5038800 | 1.0050657 | 0.5018072 |
Bias | 0.0038800 | 0.0050657 | 0.0018072 |
RMSE | 0.0252336 | 0.0517390 | 0.0106620 |
Density plot for each parameter, with true value marked with a vertical line:
plot(density(sapply(opms, coef)[1,]),
main = 'Density of median of posterior samples of rho')
abline(v = rho, col = 'darkred')
plot(density(sapply(opms, coef)[2,]),
main = 'Density of median of posterior samples of sig2')
abline(v = sig2, col = 'darkred')
plot(density(sapply(opms, coef)[3,]),
main = 'Density of median of posterior samples of beta')
abline(v = beta, col = 'darkred')
The proportion of time the 95% credible interval includes the true value of the parameter:
cis <- sapply(lapply(opms, confint),
function(ci) {
ci[, '2.5%'] <= c(rho, sig2, beta) &
ci[, '97.5%'] >= c(rho, sig2, beta)
})
rowSums(cis) / reps
rho | sig2 | beta |
---|---|---|
0.930 | 0.955 | 0.925 |
CausalInference, Econometrics, MixedModels, SpatioTemporal
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Pickup, et al., "OrthoPanels: An R Package for Estimating a Dynamic Panel Model with Fixed Effects Using the Orthogonal Reparameterization Approach", The R Journal, 2017
BibTeX citation
@article{RJ-2017-003, author = {Pickup, Mark and Gustafson, Paul and Cubranic, Davor and Evans, Geoffrey}, title = {OrthoPanels: An R Package for Estimating a Dynamic Panel Model with Fixed Effects Using the Orthogonal Reparameterization Approach}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {1}, issn = {2073-4859}, pages = {60-76} }