In this paper, a general overview on spatial and spatiotemporal ARCH models is provided. In particular, we distinguish between three different spatial ARCH-type models. In addition to the original definition of (Otto et al. 2016), we introduce an logarithmic spatial ARCH model in this paper. For this new model, maximum-likelihood estimators for the parameters are proposed. In addition, we consider a new complex-valued definition of the spatial ARCH process. Moreover, spatial GARCH models are briefly discussed. From a practical point of view, the use of the R-package spGARCH is demonstrated. To be precise, we show how the proposed spatial ARCH models can be simulated and summarize the variety of spatial models, which can be estimated by the estimation functions provided in the package. Eventually, we apply all procedures to a real-data example.
Whereas autoregressive conditional heteroscedasticity (ARCH) models are applied widely in time series analysis, especially in financial econometrics, spatial conditional heteroscedasticity has not been seen as critical issue in spatial econometrics up to now. Although it is well-known that classical least squares estimators are biased for spatially correlated data as well as for spatial data with an inhomogeneous variance across space, there are just a few papers proposing statistical models accounting for spatial conditional heteroscedasticity in terms of the ARCH and GARCH models of (Engle 1982) and (Bollerslev 1986). The first extensions to spatial models attempted were time series models incorporating spatial effects in temporal lags (see Borovkova and Lopuhaa (2012) and Caporin and Paruolo (2006), for instance). Instantaneous spatial autoregressive dependence in the conditional second moments, i.e., the conditional variance in each spatial location is influenced by the variance nearby, has been introduced by (Otto et al. 2016). Further details and derivations can also be found in (Otto et al. 2018, 2019). Their models allow for these instantaneous effects but require certain regularity conditions. In this paper, we propose an alternative specification of spatial autoregressive conditional heteroscedasticity based on an exponential definition of the conditional variance. This new model can be seen as the spatial equivalent of the log-GARCH model by (Geweke 1986; Pantula 1986; Milhøj 1987). Other recent papers propose a mixture of these two approaches (see Sato and Matsuda (2017, 2018a)). Moreover, all these models can be used in spatiotemporal settings (see Otto et al. (2018; Sato and Matsuda 2018b)).
In addition to the novel spatial logarithmic ARCH model, this paper demonstrates the use of the R-package spGARCH. From this practical point of view, the simulation of several spatial ARCH-type models as well as the estimation of a variety of spatial models with conditional heteroscedasticity are shown. There are several packages implementing geostatistical models, kriging approaches, and other spatial models (cf. Cressie (1993; Cressie and Wikle 2011)). One of the most powerful packages used to deal with models of spatial dependence is spdep, written by (Bivand and Piras 2015). It implements most spatial models in a user-friendly way, such as spatial autoregressive models, spatial lag models, and so forth (see, also, Elhorst (2010) for an overview). These models are typically called spatial econometrics models, although they are not tied to applications in economics. In contrast, the package gstat provides functions for geostatistical models, variogram estimation, and various kriging approaches (see Pebesma (2004) for details). For dealing with big geospatial data, the Stem package uses an expectation-maximization (EM) algorithm for fitting hierarchical spatiotemporal models (see Cameletti (2015) for details). For a distributed computing environment, the MATLAB software D-STEM from (Finazzi and Fasso 2014) also provides powerful tools for dealing with heterogeneous spatial supports, large multivariate data sets, and heterogeneous spatial sampling networks. Additionally, these fitted models are suitable for spatial imputation. Contrary to these EM approaches, Bayesian methods for modeling spatial data are implemented in the R-INLA package (see Rue et al. (2009) for technical details of the integrated nested Laplace approximations and Martins et al. (2013) for recently implemented features). Along with this package, the R-INLA project provides several functions for diverse spatial models incorporating integrated nested Laplace approximations.
In contrast to the above mentioned software for spatial models, the prevalent R-package for time series GARCH-type models is rugarch from (Ghalanos 2018). Since spGARCH has been developed mainly to deal with spatial data, we aim to provide a package which is user-friendly for researchers and data scientists working in applied spatial science. Thus, the package is coordinated with the objects and ideas of R packages for spatial data rather than packages for dealing with time series.
We structure the paper as follows. In the next Section 2, we discuss all covered spatial and spatiotemporal ARCH-type models. In addition, we introduce a novel logarithmic spatial ARCH model, which has weaker regularity conditions than the other spatial ARCH models. In the subsequent section, parameter estimation based on the maximum-likelihood principle is discussed for both the previously proposed spatial ARCH models as well as the new logarithmic spatial ARCH model. Furthermore, spatial GARCH models are briefly discussed. However, the focus of this paper should be on ARCH-type models. After these theoretical sections, we demonstrate the use of the R-package spGARCH in Section 5. Further, we fit a spatial autoregressive model with exogenous regressors and spatial ARCH residuals for a real-world data set. In particular, we analyze prostate cancer incidence rates in southeastern U.S. states. Section 7 concludes the paper.
Let
To define spatial models, in particular areal spatial models such as the
simultaneous autoregressive (SAR) models, it is convenient to consider a
vector of observations
First, we define this vector
For this definition, there is a one-to-one relation between
It is important to assume that the spatial weighting matrix is a
non-stochastic, positive matrix with zeros on the main diagonal to
ensure that a location is not influenced by itself (cf. Elhorst (2010; Cressie and Wikle 2011)). The vector of random errors is denoted by
Lemma 1 (Otto et al. (2018)). Suppose that
It is important to note that
There are two cases in which the support of the errors does not need to
be constrained. If
Of course, the truncated support of the errors has an impact on the
extent of the spatial dependence on the conditional variances.
Obviously, the support need not be constrained regarding
Next, we consider an logarithmic spatial ARCH process (log-spARCH). In
this setting, we define the natural logarithm of
Theorem 1. Suppose that
At location
Corollary 1. Assume that the assumptions of Theorem
1 are fulfilled, then
For all proofs, we refer to the Appendix.
Now, we propose a complex-valued spARCH process. In order to obtain a
solution of
Finally, we show that spatiotemporal processes are covered directly by
these approaches. For spatiotemporal data, the vector
Process type | Definition of |
Comments |
---|---|---|
spARCH | ||
spARCH (oriented) | ||
spatial log-ARCH | ||
spARCH (complex) |
Since all conditional and unconditional odd moments of spatial ARCH
processes are equal to zero, these ARCH-type models can easily be added
to any kind of (spatial) regression model without influencing the mean
equation as well as the spatial dependence in the first conditional and
unconditional moments. This makes the spatial ARCH models flexible tools
for dealing with conditional spatial heteroscedasticity in the residuals
of spatial models. For instance, one can consider spatial autoregressive
models for
In contrast to other models for heteroscedastic errors, such as the
SARAR or SARMA models, which assume spatial autoregressive or spatial
moving average error terms (cf. (Haining 1978; Fingleton 2008; Kelejian and Prucha; 2010)), the SARspARCH model does not affect the
spatial autocorrelation of the process, just the spatial
heteroscedasticity, because all conditional and unconditional odd
moments are equal to zero. Thus,
Additionally, one may include spatially lagged observations of
The parameters of a spatial ARCH process can be estimated by the
maximum-likelihood approach. To obtain the joint density for
The Jacobian matrix, of course, depends on the definition of
From a computational perspective, the computation of the log determinant
of this matrix is feasible for large data sets. To be precise, the
log-determinant is equal to
In the spGARCH package,
we implemented the iterative maximization algorithm with inequality
constraints proposed by (Ye 1988), which is implemented in the R-package
Rsolnp (see Ghalanos and Theussl (2012)). It
is important to note that the log determinant of the Jacobian also
depends on the parameters in such a way that it needs to be computed in
each iteration (see, also, Theorem 13.7.3 of (Harville 2008) for the
computation of a determinant for the sum of a diagonal matrix and an
arbitrary matrix), but
Certainly, the choice of the weighting matrices are an important design
choice of the models, which has to be prespecified. However, the true
structure of
The R-package spGARCH
provides several basic functions for the analysis of spatial data
showing spatial conditional heteroscedasticity. In particular, the
process can be simulated for arbitrarily chosen weighting matrices
according to the definitions in Section 2. Moreover, we
implement a function for the computation of the maximum-likelihood
estimators. To generate a user-friendly output, the object generated by
the estimation function can easily be summarized by the generic
summary()
function. We also provide all common generic methods, such
as plot()
, print()
, logLik()
, and so forth. To maximize the
computational efficiency, the actual version of the package contains
compiled C++ code (using the packages
Rcpp and
RcppEigen, cf. Eddelbuettel and François (2011; Bates and Eddelbuettel 2013)). A brief overview of the package and its main functions is
given in Table 2. Further, we focus on the two main
aspects of the package, i.e., the simulation (described in detail in
Section 5.1) and estimation (Section
5.2) aspects of the spARCH, log-spARCH, and
SARspARCH processes.
Function | Description |
---|---|
Main functions | |
sim.spARCH() |
Simulation of spARCH and log-spARCH processes |
sim.spGARCH() |
Simulation of spGARCH, E-spGARCH, and log-spGARCH processes |
qml.spARCH() |
Quasi-maximum-likelihood estimation for spARCH models |
qml.SARspARCH() |
Quasi-maximum-likelihood estimation for SAR models with spARCH residuals |
Generic methods | |
summary() |
Summary of an object of ‘spARCH ’ class generated by qml.spARCH() or qml.SARspARCH() |
print() |
Printing method for ‘spARCH ’ class or summary.spARCH class |
fitted() |
Extracts the fitted values of an object of ‘spARCH ’ class |
residuals() |
Extracts the residuals of an object of ‘spARCH ’ class |
logLik() |
Extracts the log-likelihood of an object of ‘spARCH ’ class |
extractAIC() |
Extracts the AIC of an object of ‘spARCH ’ class |
plot() |
Provides several descriptive plots of the residuals of an object of ‘spARCH ’ class |
The simulations of all spatial ARCH-type models are implemented in one
function, namely, the sim.spARCH()
function. The different definitions
of the model are specified via the argument type
. The use of
sim.spARCH()
is very similar to how a basic random number generator is
used, meaning that the first argument n
is the number of generated
values and all further arguments specify the parameters of the spARCH
process. For instance, one might simulate an oriented spARCH process
(meaning
R> require("spdep")
R> rho <- 0.7
R> alpha <- 1
R> d <- 50
R> n <- d^2
R> nblist <- cell2nb(d, d, type = "queen")
R> W <- nb2mat(nblist)
R> W[upper.tri(W)] <- 0
R> Y <- sim.spARCH(n = n, rho = rho, alpha = alpha, W = W,
+ type = "spARCH", control = list(seed = 5515))
To build the spatial weighting matrix, we used cell2nb()
from the
spdep package, returning
an nb
object of a nb
object into a contiguity matrix, as
sim.spARCH()
requires either a matrix (class matrix
) or a sparse
matrix (class dgCMatrix
) as an argument. Usually, spatial weighting
matrices are sparse by construction. Thus, dgCMatrix
matrix or rather to a
SparseMatrix
object defined in the eigen library in C++. Via the
control
parameter, a random seed might be passed to the simulation
function. If not, a random seed is assigned randomly from a uniform
distribution and printed in console in order that one might reproduce
the result even without having a random seed specified in advance. We
prefer to print a single number in the console rather than returning to
the random number generator (RNG) state as an attribute of the returned
vector. Thus, a random seed might either be passed as an optional
argument to sim.spARCH()
or set before calling sim.spARCH()
by
set.seed()
.
There are several types of spatial ARCH processes which can be simulated
by sim.spARCH()
. They are all specified by the argument type
. If
type = "spARCH"
, then the original spARCH process according to the
definition in (Otto et al. 2018) is simulated.
type = "log-spARCH"
, an log-spARCH process is simulated with an
user-specified value of type = "complex-spARCH"
, complex solutions of
Figure 2 illustrates the behavior of different types
of spatial ARCH processes. All of them are simulated with the same
parameters and random seeds in such a manner that the vector
Above left: spatial white noise for comparison; center: oriented spARCH
(type = “gaussian”
); right: spatial E-ARCH
(type = “exp”
).
Below left: spARCH with truncated normal errors
(type = “gaussian”
); center: spatial E-ARCH
(type = “exp”
), right: complex spARCH
(type = “complex”
).
As pointed out in Section 2, spatiotemporal ARCH models
are directly covered if time is considered as one dimension of the
R> d <- 4
R> T <- 20
R> D_s <- 1:(d^2)
R> D_t <- 1:T
R> n <- length(D_s) * length(D_t)
R> nblist <- cell2nb(d, d, type = "queen")
R> W_list <- nb2listw(nblist)
R> W_s <- Matrix(listw2mat(W_list))
R> W <- W_s
R> for(t in D_t[-1]){
R> W <- bdiag(W, W_s)
R> }
R> diag(W[-c(1:length(D_s)), -c((n - length(D_s)+1):n)]) <- 0.2
R> set.seed(1)
R> Y <- sim.spARCH(n = n, rho = 0.8, alpha = 1, W = W, type = "log-spARCH")
The spatial weighting scheme has been chosen as block diagonal matrix
like proposed above, i.e., constant matrices
The resulting simulation is depicted in Figures
3 and 4. Whereas the
simulated values are shown as time series plots placed at their correct
spatial locations in Figure 3,
4 depicts the observations as consecutive
spatial random fields. Note that the same color coding has been chosen
for both representations. On the one hand side, one can observe spatial
volatility clusters (e.g. in the pre-last plot in Figure
4,
Note that the true spatial orientation is preserved in this representation (4 × 4 grid), i.e., plots appearing close to each other are also located close to each other in space and they are, therefore, more related than more distant locations. An alternative representation as consecutive spatial random fields is shown in Figure .
…
The first five and the last two time points are plotted as spatial random fields, i.e., the simulations are shown in their natural temporal ordering. An alternative representation as time series in their true spatial ordering is shown in Figure .
For sake of completeness, we briefly demonstrate the simulation of
spatial GARCH-type models using the sim.spGARCH()
function. Like for
sim.spARCH()
, the type of the spatial GARCH model can be chosen by the
argument type
. More precisely, there are the following options
type = "spGARCH"
for simulation of spatial GARCH models according
to the definition in (11),type = "e-spGARCH"
for simulation of exponential spatial GARCH
models according to the definition in (12) with type = "log-spGARCH"
for simulation of logarithmic spatial GARCH
models according to the definition in (12) with
type = "complex-spGARCH"
for simulation of a complex-valued
spatial GARCH model.To simulate a spatial GARCH process, two spatial weights matrices need
to be specified via the arguments W1
and W2
. Moreover, two
parameters rho
and lambda
. For instance, a spatial GARCH model
can be simulated on a
R> require("spdep")
R> rho <- 0.5
R> lambda <- 0.3
R> alpha <- 1
R> d <- 20
R> nblist <- cell2nb(d, d, type = "rook") # Rook's contiguity matrix
R> W_1 <- nb2mat(nblist)
R> W_2 <- W_1
R> Y <- sim.spGARCH(rho = rho, lambda = lambda, alpha = alpha,
+ W1 = W_1, W2 = W_2, type = "spGARCH")
Similarly, spatial log-GARCH processes and exponential spatial GARCH
processes can be simulated by changing the argument type
. In this
case, the parameters sim.spGARCH()
by the arguments b
, theta
, and zeta
.
Other important functions of the package are the qml.spARCH()
and
qml.SARspARCH()
functions, which implement a quasi-maximum-likelihood
estimation algorithm (QML). As for the sim.spARCH()
function, many
spARCH models are covered in the qml.spARCH()
and qml.SARspARCH()
function. Thus, the user needs to specify which particular spARCH model
is to be fitted via the argument type
. Moreover, the model for the
mean equation is a user-specified formula
, making the use of the
estimation functions similar to the use of the common lm()
or glm()
functions.
In general, the estimators exhibited good performances for a variety of
error distributions in simulation studies, although the likelihood
function was derived under the normality assumption. This is not
surprising, as the maximum-likelihood estimators have good properties
under mild assumptions for the error processes of a variety of similar
spatial econometrics models (cf. Lee (2004; Lee and Yu 2010a,b; 2012)).
Thus, we refer to the approach as the QML approach, and the name of the
estimation functions start with qml
instead of ml
. In the following
paragraphs, we start the simulation of one specific sample, which is
then used further to illustrate the log-likelihood functions as well as
to demonstrate parameter estimation.
Compared to the log-spARCH processes, the likelihood functions of spARCH models are rather flat around the global maximum. This behavior is illustrated for simulated processes in Figure 5. The observations for the log-spARCH process have been simulated as follows.
R> nblist <- cell2nb(20, 20, type = "queen")
R> W <- nb2mat(nblist)
R> y <- sim.spARCH(n = 20^2, rho = 0.5, alpha = 1, W = W,
+ type = "log-spARCH", control = list(seed = 5515))
To simulate an oriented process, the entries of type
must be changed to
"spARCH"
, i.e.,
R> W[upper.tri(W)] <- 0
R> y2 <- sim.spARCH(n = 20^2, rho = 0.5, alpha = 1, W = W,
+ type = "spARCH",
+ control = list(seed = 5515))
spARCH process
log-spARCH process
To estimate the parameters of an intercept-free log-spARCH model without
any regressors, the formula passed to the function qml.spARCH()
should
be specified as y 0
. In addition, a data.frame
can be passed via
the data
argument to the qml
functions. Although the likelihood
function of a spARCH process is flat, good estimates can be obtained
through iterative maximization. (Otto et al. 2018) analyze the performance
of the estimators in detail. The algorithm implemented in the packages
is based on the Rsolnp
package, allowing for both equality and inequality parameter constraints
(cf. Ghalanos and Theussl (2012)).
The results of the estimation procedure are returned via an object of
the class ‘spARCH
’, for which we provide additionally several generic
functions. First, there is a summary()
function for the ‘spARCH
’
object. The summary shows all important estimation results, i.e., the
parameter estimates, standard errors, test statistics, and asymptotic
p-values, including significance stars. The estimation of the above
simulated log-spARCH process would return the following results.
R> spARCH_object <- qml.spARCH(y ~ 0, W = W, type = "log-spARCH")
R> summary(spARCH_object)
Call:
qml.spARCH(formula = y ~ 0, W = W, type = "log-spARCH")
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.6867629 -0.6197315 -0.0053580 -0.0002615 0.5708346 2.8576621
Coefficients:
Estimate Std. Error t value Pr(>|t|)
alpha 0.919324 0.128544 7.1518 8.564e-13 ***
rho 0.402998 0.056519 7.1304 1.001e-12 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
AIC: 543.01, BIC: 539.01 (Log-Likelihood: -269.51)
Moran's I (residuals): -0.028568, p-value: 0.31795
Moran's I (squared residuals): 0.035239, p-value: 0.14479
The standard errors are estimated as Cramer-Rao bounds from the Hessian
matrix of the log-likelihood function. For triangular weighting
matrices, the estimators are asymptotically normally distributed
(Otto et al. (2018)). In addition to the Akaike and Bayesian Schwarz
information criteria, the results of Moran’s test on the residuals and
squared residuals are reported for the spatial autocorrelation of the
residuals. However, it is possible to use functions like AIC()
or
BIC()
, since there is a logLik()
method for the objects from class
‘spARCH
’. Additionally, the fitted values and residuals can be
extracted by fitted()
and residuals()
, respectively.
To analyze the residuals, we provide additionally several descriptive
plots via the generic plot()
function. The first two plots are
produced by moran.plot()
imported from the package
spdep. They inspect the
spatial autocorrelation of the residuals and the squared residuals. In
addition, the error distribution is depicted in the third graphic by a
normal Q-Q-plot. The output obtained for the above numerical example is
given below and in Figure 6.
%\begin{CodeInput}
%R> AIC(spARCH_object)
%R> BIC(spARCH_object)
%R> par(mfcol = c(1,3))
%R> plot(spARCH_object)
%\end{CodeInput}
R> AIC(spARCH_object)
[1] 543.0126
R> BIC(spARCH_object)
[1] 550.9956
R> par(mfcol = c(1,3))
R> plot(spARCH_object)
Reproduce the results as follows:
eps <- residuals(x)
W <- as.matrix(x$W)
moran.plot(eps, mat2listw(W), zero.policy = TRUE,
xlab = "Residuals", ylab = "Spatially Lagged Residuals")
Reproduce the results as follows:
eps <- residuals(x)
W <- as.matrix(x$W)
moran.plot(eps, mat2listw(W), zero.policy = TRUE,
xlab = "Residuals", ylab = "Spatially Lagged Residuals")
Reproduce the results as follows:
eps <- residuals(x)
std_eps <- (eps - mean(eps))/sd(eps)
qqnorm(eps, ylab = "Standardized Residuals")
qqline(eps)
plot()
.The mean equation can be specified as formula
for all models, i.e.,
the spARCH, log-spARCH, and SARspARCH models. Thus, there is a huge
variety of possible spatial ARCH models as well as regression models
with spARCH residuals which can be fitted by the estimation functions.
In addition to linear models of the form y a + b
, more sophisticated
models can also be fitted, e.g., models with interactions y a + b:c
,
factor models y factor
, polynomial models y poly(a, 3)
,
seasonally or regularly varying models of the form y sin(t) + cos(t)
or y sin(long) + cos(long) + sin(lat) + cos(lat)
, and so forth. We
also implement an extractAIC()
method for ‘spARCH
’ objects, such
that one might also use step()
for stepwise model selection. Table
3 provides an overview of possible
combinations of the arguments formula
and type
and shows the
resulting models, which can be fitted by the functions qml.spARCH()
and qml.SARspARCH()
, respectively.
Function | formula |
type |
Resulting model |
---|---|---|---|
qml.spARCH() |
y 0 |
"spARCH" |
spARCH model (see (1) and (2)) |
qml.spARCH() |
y 1 |
"spARCH" |
spARCH model with an additional intercept for the mean equation |
qml.spARCH() |
y a + b |
"spARCH" |
Linear Regression with regressors a and b and spARCH residuals |
qml.spARCH() |
y a + b:c |
"spARCH" |
Linear Regression with more complex expressions and spARCH residuals |
qml.spARCH() |
y 0 |
"log-spARCH" |
log-spARCH model (see (1) and (6)) |
qml.spARCH() |
y 1 |
"log-spARCH" |
log-spARCH model with an additional intercept for the mean equation |
qml.spARCH() |
y a + b |
"log-spARCH" |
Linear Regression with regressors a and b and log-spARCH residuals |
qml.spARCH() |
y a + b:c |
"log-spARCH" |
Linear Regression with more complex expressions and log-spARCH residuals |
qml.SARspARCH() |
y 0 |
"spARCH" |
SAR model without an intercept, but with spARCH residuals (see (9) and (10)) |
qml.SARspARCH() |
y 1 |
"spARCH" |
SAR model with an intercept and spARCH residuals |
qml.SARspARCH() |
y a + b |
"spARCH" |
SAR model with an intercept and the regressors a and b and spARCH residuals |
qml.SARspARCH() |
y a + b:c |
"spARCH" |
SAR model with more complex expressions and spARCH residuals |
qml.SARspARCH() |
y 0 |
"log-spARCH" |
SAR model without an intercept, but with log-spARCH residuals (see (9) and (10)) |
qml.SARspARCH() |
y 1 |
"log-spARCH" |
SAR model with an intercept and log-spARCH residuals |
qml.SARspARCH() |
y a + b |
"log-spARCH" |
SAR model with an intercept and the regressors a and b plus log-spARCH residuals |
qml.SARspARCH() |
y a + b:c |
"log-spARCH" |
SAR model with more complex expressions and log-spARCH residuals |
Below, the focus is on the incidence rates (2008–2012) for prostate cancer provided by the Centers for Disease Control and Prevention (U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute (2015)). In particular, we analyze the incidence rates in all counties of several southeastern U.S. states, namely Arkansas, Louisiana, Mississippi, Tennessee, North and South Carolina, Georgia, Alabama, and Florida. This area also covers the counties along the Mississippi River collectively known as “cancer alley” (see Nitzkin (1992; Berry 2003; Brent 2010)). All rates are age-adjusted to the 2000 U.S. standard population (cf. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute (2015)). To reproduce the example, the logarithmic incidence rates as well as several covariates are included in the package.
As explanatory variables, we included a large set of environmental,
climate, behavioral, and health covariates, which might have an
influence on incidence rates for prostate cancer. For instance, we
consider air pollution, such as prostate_cancer
. Eventually, the final explanatory factors were chosen
by minimizing the Bayesian information criterion using the generic
function step()
as follows.
R> data(prostate_cancer)
R> out <- step(qml.SARspARCH(formula, B = B, W = W, type = "spARCH",
+ data = prostate_cancer), k = log(length(Y)))
The formula
object simply defines a linear model between the
logarithmic incidence rates and all factors. Further, matrix
By minimizing the BIC criterion, the
Using the generic summary()
for the ‘spARCH
’ class, the estimated
model can be summarized as follows.
Call:
qml.SARspARCH(formula = formula, B = B, W = W, type = "spARCH",
data = prostate_cancer)
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.7492270 -0.1079639 -0.0001509 -0.0005261 0.1121190 0.6404564
Coefficients:
Estimate Std. Error t value Pr(>|t|)
alpha (spARCH) 0.0203839 0.0042674 4.7766 1.783e-06 ***
rho (spARCH) 0.3782104 0.1309656 2.8879 0.003879 **
lambda (SAR) 0.6768133 0.0356765 18.9708 < 2.2e-16 ***
(Intercept) 1.5388985 0.1702222 9.0405 < 2.2e-16 ***
F_2 0.0192857 0.0069917 2.7584 0.005809 **
F_10 -0.0205693 0.0064450 -3.1915 0.001415 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIC: -1734.2, BIC: -1746.2 (Log-Likelihood: 873.11)
Moran's I (residuals): -0.022899, p-value: 0.32023
Moran's I (squared residuals): 0.021409, p-value: 0.00050052
First, we see that the model has a significant spatial autocorrelation
in the mean equation since lambda (SAR)
) differs
significantly from zero. This implies that there are clusters of higher
prostate cancer incidence rates and, vice versa, lower incidence rates.
Second, the error process shows conditional, autoregressive
heteroscedasticity in space, which is captured by the spARCH component
of the model, i.e., plot()
can be used to
produce the plots shown in Figure 7.
After fitting the model, one also may include further regressors or
estimate an intercept-only model via update()
. For illustration, we
added the percentage of positive results for a prostate-specific antigen
(PSA) test in each county as an additional explanatory variable by
R> out2 <- update(out, . ~ . + PSA_test)
The PSA test is used for prostate cancer screening, meaning that there should definitely be a positive dependence between the PSA test and the incidence rates. In fact, the estimated parameter is positive, and the AIC is lower compared to the previous model. To be precise, the updated parameters are
Estimate Std. Error t value Pr(>|t|)
alpha (spARCH) 0.0199281 0.0043105 4.6231 3.78e-06 ***
rho (spARCH) 0.3902185 0.1280266 3.0479 0.0023041 **
lambda (SAR) 0.6643605 0.0366748 18.1149 < 2.2e-16 ***
(Intercept) 1.1349551 0.2301554 4.9313 8.17e-07 ***
F_2 0.0198504 0.0069903 2.8397 0.0045159 **
F_10 -0.0224035 0.0065828 -3.4034 0.0006656 ***
PSA_test 0.0095962 0.0042728 2.2459 0.0247125 *
plot()
for the real-data
example.F. 1 | F. 2 | F. 3 | F. 4 | F. 5 | F. 6 | F. 7 | F. 8 | F. 9 | F. 10 | |
---|---|---|---|---|---|---|---|---|---|---|
0.69 | 0.72 | |||||||||
0.33 | -0.03 | |||||||||
0.13 | -0.12 | |||||||||
0.31 | 0.05 | |||||||||
0.07 | 0.44 | |||||||||
1.00 | -0.02 | |||||||||
Solar radiation | 0.60 | 0.44 | ||||||||
Precipitation | -0.08 | -0.26 | ||||||||
Outdoor temperature | 1.00 | -0.05 | ||||||||
Temperature differences | 0.32 | 0.94 | ||||||||
Ambient maximal temperature | 0.08 | -0.39 | ||||||||
-0.23 | 0.32 | |||||||||
Percentage of current smokers | 0.47 | -0.85 | ||||||||
Percentage of former smokers | 0.92 | 0.37 | ||||||||
Smoke some days | -0.07 | -0.62 | ||||||||
Never smoked | -0.96 | 0.25 | ||||||||
Aerobic activity | -0.05 | 0.58 | ||||||||
Exercises | 0.41 | 0.33 | ||||||||
Physical activity index | -0.09 | 0.99 | ||||||||
Alcohol consumption | 0.04 | 0.62 | ||||||||
Binge drinking | 0.07 | 0.44 | ||||||||
Heavy drinking | 0.43 | 0.02 | ||||||||
High cholesterol | 0.00 | 1.00 | ||||||||
Cholesterol checked | 0.55 | 0.00 | ||||||||
Overweight (BMI 25.0-29.9) | 0.99 | 0.09 | ||||||||
Obese (BMI 30.0 - 99.8) | -0.75 | 0.01 | ||||||||
Blood stool test | 0.56 | -0.23 | ||||||||
Sigmoidoscopy | 0.14 | -0.16 | ||||||||
High blood pressure | 0.03 | 0.79 | ||||||||
Flu shot | 0.81 | -0.13 | ||||||||
Pneumonia vaccination | 0.51 | -0.26 | ||||||||
Health care coverage | 0.58 | 0.18 | ||||||||
Seatbelt use | -0.58 | 0.10 |
This paper examines spatial models for autoregressive conditional heteroscedasticity. In contrast to previously proposed spatial GARCH models, these models allow for instantaneous autoregressive dependence in the second conditional moments. Previous approaches only allowed for spatial dependence in the first temporal lag. However, these models are also captured by the spatial ARCH approach, since temporal dependence can be included by appropriate choices of the weighting matrix. In addition to discussing previously proposed models, we introduced a novel spatial logarithmic ARCH model, for which the probability density has been derived and maximum-likelihood estimators discussed.
In addition to this theoretical model, we focus on the computational
implementation of all considered spatial ARCH models in the R-package
spGARCH. In particular,
the simulation and estimation has been demonstrated. Regarding
maximum-likelihood estimation, a broad range of spatial models are
implemented in the package. Furthermore, the spatial weights matrices,
as well as the mean model, can easily be specified by the user,
providing a flexible and easy-to-use tool for spatial ARCH models. All
estimation functions return an object for class ‘spARCH
’, for which
several generic functions are provided, such as summary()
, plot()
,
and AIC()
. This setup also allows the use of the R-base functions,
such as step()
for stepwise model selection or update()
for updating
the results of different mean models. Eventually, the use of these
functions are demonstrated by an empirical example, namely county-level
incidence rates of prostate cancer.
In the future, the package should be extended for further spatial ARCH-type models. Along this vein, a class for model specifications should be added alongside the actual implementations via arguments for the fitting functions. In that way, the package can be aligned to common time series ARCH packages, such as the rugarch package. Furthermore, the package could benefit from robust estimation methods, another focus for future research.
Proof of Theorem 1. For this definition of
Proof of Corollary 1. For
spGARCH, spdep, gstat, Stem, rugarch, Rsolnp, Rcpp, RcppEigen
Finance, HighPerformanceComputing, NumericalMathematics, Optimization, Spatial, SpatioTemporal, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Otto, "spGARCH: An R-Package for Spatial and Spatiotemporal ARCH and GARCH models", The R Journal, 2019
BibTeX citation
@article{RJ-2019-053, author = {Otto, Philipp}, title = {spGARCH: An R-Package for Spatial and Spatiotemporal ARCH and GARCH models}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {401-420} }