We present a new version (CAR()
and SAR()
families for conditional and simultaneous autoregressive random effects were implemented. Eigen decomposition of the matrix describing the spatial structure (e.g., the neighborhood matrix) was used to transform the CAR/SAR random effects into an independent, but heteroscedastic, Gaussian random effect. A linear predictor is fitted for the random effect variance to estimate the parameters in the CAR and SAR models. This gives a computationally efficient algorithm for moderately sized problems.
We present an algorithm for fitting spatial generalized linear models
with conditional and simultaneous autoregressive [CAR & SAR;
Besag (1974); Cressie (1993)] random effects. The algorithm completely avoids
the need to differentiate the spatial correlation matrix by transforming
the model using an eigen decomposition of the precision matrix. This
enables the use of already existing methods for hierarchical generalized
linear models [HGLMs; Lee and Nelder (1996)] and this algorithm is implemented in
the latest version (
The hglm package, up to version 1.2-8, provides the functionality for fitting HGLMs with uncorrelated random effects using an extended quasi likelihood method [EQL; Lee et al. (2006)]. The new version includes a first order correction of the fixed effects based on the current EQL fitting algorithm, which is more precise than EQL for models having non-normal outcomes (Lee and Lee 2012). Similar to the h-likelihood correction of Noh and Lee (2007), it corrects the estimates of the fixed effects and thereby also reduces potential bias in the estimates of the dispersion parameters for the random effects (variance components). The improvement in terms of reduced bias is substantial for models with a large number of levels in the random effect (Noh and Lee 2007), which is often the case for spatial generalized linear mixed models (GLMMs).
Earlier versions of the package allow modeling of the dispersion parameter of the conditional mean model, with fixed effects, but not the dispersion parameter(s) of the random effects. The current implementation, however, enables the user to specify a linear predictor for each dispersion parameter of the random effects. Adding this option was a natural extension to the package because it allowed implementation of our proposed algorithm that fits CAR and SAR random effects.
Though the CAR/SAR models are widely used for spatial data analysis there are not many software packages which can be used for their model fitting. GLMMs with CAR/SAR random effects are often fitted in a Bayesian way using BUGS software [e.g., WinBUGS; Lunn et al. (2000) or alike], which leads to extremely slow computation due to its dependence on Markov chain Monte Carlo simulations. The R package INLA (Martins et al. 2013) provides a relatively fast Bayesian computation of spatial HGLMs via Laplace approximation of the posterior distribution where the model development has focused on continuous domain spatial modeling (Lindgren and Rue 2015) whereas less focus has been on discrete models including CAR and SAR. Recently, package spaMM (Rousset and Ferdy 2014) was developed to fit spatial HGLMs but is rather slow even for moderately sized data. Here, we extend package hglm to also provide fast computation of HGLMs with CAR and SAR random effects and at the same time an attempt has been made to improve the accuracy of the estimates by including corrections for the fixed effects.
The rest of the paper is organized as follows. First we give an introduction to CAR and SAR structures and present the h-likelihood theory together with the eigen decomposition of the covariance matrix of the CAR random effects and show how it simplifies the computation of the model. Then we present the R implementation, show the use of the implementation for two real data examples and evaluate the package using simulations. The last section concludes the article.
HGLMs with spatially correlated random effects are commonly used in spatial data analysis (Cressie 1993; Wall 2004). A spatial HGLM with Gaussian CAR random effects is given by
with
where
gives a Gaussian SAR random effects structure. The
If
In order to explain the specific model fitting algorithm implemented in hglm, we present a brief overview of the EQL algorithm for HGLMs (Lee and Nelder 1996). First, we start with the standard HGLM containing only uncorrelated random effects. Then, we extend the discussion for CAR and SAR random effects. Though it is possible to have more than one random-effect term, in a HGLM, coming from different distributions among the conjugate distributions to GLM families, for presentational simplicity we consider only one random-effect in this section. The (log-)h-likelihood for a HGLM with independent random effects can be presented as
where,
where
In order to estimate the model parameters, (Lee and Nelder 1996) suggested a
two-step procedure in which the first, for fixed dispersion parameters
In the second step, the following profile likelihood is maximized to estimate the dispersion parameters.
where optim
function.
A unified algorithm for estimating the variance components can be derived by using profile likelihood adjustment in Equation ((6)) instead of ((5)). This leads to maximizing
for estimating
HL(0,1) and EQL were found to be biased, especially for binary responses
when the cluster size is small (Lee and Nelder 2001; Noh and Lee 2007) and when method = "EQL1"
option, but still
hglm | spaMM | HGLMMM | |
---|---|---|---|
EQL | method = "EQL" |
HLmethod = "EQL-" |
- |
HL(0,1) | - | HLmethod = "HL(0,1)" |
LapFix = FALSE |
HL(1,1) | HLmethod = "HL(1,1)" |
LapFix = TRUE |
method = "EQL1"
in the hglm package, the
HL(1,1) correction of the working response is applied in the EQL
algorithm.
The above EQL method cannot be directly applied to HGLMs with CAR/SAR
random effects because the resulting
The main difference between an ordinary GLMM with independent random effects and CAR/SAR random effects models is that the random effects in the later cases are not independent. In the following we show that by using the eigen decomposition we can reformulate a GLMM with CAR or SAR random effects to an equivalent GLMM with independent but heteroscedastic random effects.
Let
It is worth noting that the relation between the eigenvalues of
Re-arranging Equation ((1)), we have
where,
where
where
The spatial models above are implemented in the hglm package
(CAR()
and SAR()
, for the
random effects. The “CAR” and “SAR” families allow the user to define
spatial structures by specifying the
Throughout the examples below, we use Gaussian CAR random effects to introduce spatial correlation. However, one can also add additional independent non-Gaussian random effects along with the Gaussian CAR random effect. For example, an overdispersed count response can be fitted using a Poisson HGLM with an independent Gamma and Gaussian CAR random effects.
Option | Explanation |
---|---|
rand.family = CAR(D = nbr) |
The random effect has conditional or simul- |
rand.family = SAR(D = nbr) |
taneous autoregressive covariance structure. |
Here nbr is a matrix provided by the user. |
|
method = "EQL1" |
The first order correction of fixed effects |
(Lee and Lee 2012) applied on the EQL estimates. | |
rand.disp.X = X |
Linear predictor for the variance component |
of a random effect. The matrix X is provided |
|
by the user. | |
rand.family = |
This option provides the possibility of having |
list(Gamma(),CAR()) |
different distributions for the random effects. |
In the hglm package vignette, we look at the improvement of EQL1 in comparison to EQL. Here, we focus on the precision of the parameter estimates with spatial HGLMs.
We study the properties of the estimates produced by hglm using a
simulation study built around the Scottish Lip Cancer example (see also
the examples). We simulate data with the same
Bias in the estimation methods | ||||||
---|---|---|---|---|---|---|
3-7 Parameter | True value | CAR | SAR | |||
3-4 | EQL |
EQL1 correction |
EQL |
EQL1 correction |
||
intercept | 0.2500 | 0.0351 |
0.0105* | 0.0541 |
0.0307* | |
0.3500 | ||||||
0.6667 | 0.0660 |
0.0658 |
n/a* | n/a* | ||
0.0118 |
0.0119 |
n/a* | n/a* | |||
0.8165 | n/a* | n/a* | 0.0393 |
0.0370 |
||
n/a* | n/a* | 0.0037 |
0.0049 |
|||
1.5000 | ||||||
0.1000 | ||||||
Both the EQL and the EQL1 correction are slightly biased for
"EQL1"
over 1000 Monte Carlo simulations
for the Scottish Lip Cancer example.The simulation results also revealed that the distribution of
Fitting CAR and SAR models for large data sets could be computationally challenging, especially for the spatial variance-covariance parameters. Using our new algorithm in hglm, moderately sized problems can be fitted efficiently.
We re-sampled from the ohio
data set (for more details on the data set
see also the examples) for different number of locations, each with 10
replicates, executed on a single Intel©Xeon©E5520 2.27GHz CPU. In each
replicate, the data was fitted using both hglm and spaMM, for the
same model described above. The results regarding average computational
time are summarized in Figure 2. hglm is clearly more usable
for fitting larger sized data sets. In the package vignette, we also
show the comparisons of the parameter estimates, where the "EQL"
estimates from hglm are almost identical to the "EQL-"
estimates
from spaMM, with high correlation coefficients between the two
methods: 1.0000, 0.9998, 0.9997 and 0.9999 for the residual variance,
Here we analyze the cancer
data set (source: Clayton and Kaldor 1987) from the hglm
package. Calling data(cancer)
loads a numeric vector E
that
represents the expected number of skin cancer patients in different
districts in Scotland, a numeric vector O
giving the corresponding
observed counts, a numeric vector Paff
giving proportion of people
involved agriculture, farming, and fisheries, and matrix D
giving the
neighborhood structure among Scottish districts. Here we demonstrate how
the data is fitted as a CAR model or a SAR model, using the hglm
package with the EQL method.
> library(hglm)
> data(cancer)
> logE <- log(E)
> XX <- model.matrix(~ Paff)
> cancerCAR <- hglm(X = XX, y = O, Z = diag(56),
+ family = poisson(),
+ rand.family = CAR(D = nbr),
+ offset = logE, conv = 1e-8,
+ maxit = 200, fix.disp = 1)
> summary(cancerCAR)
Call:
hglm.default(X = XX, y = O, Z = diag(56), family = poisson(),
rand.family = CAR(D = nbr), conv = 1e-08, fix.disp = 1, offset = logE)
----------
MEAN MODEL
----------
Summary of the fixed effects estimates:
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 0.26740 0.20732 1.290 0.20893
Paff 0.03771 0.01215 3.103 0.00471 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: P-values are based on 25 degrees of freedom
Summary of the random effects estimates:
Estimate Std. Error
[1,] 0.6407 1.0467
[2,] 0.5533 0.3829
[3,] 0.4124 0.5202
...
NOTE: to show all the random effects, use print(summary(hglm.object),
print.ranef = TRUE).
----------------
DISPERSION MODEL
----------------
NOTE: h-likelihood estimates through EQL can be biased.
Dispersion parameter for the mean model:
[1] 1
Model estimates for the dispersion term:
Link = log
Effects:
[1] 1
Dispersion = 1 is used in Gamma model on deviances to calculate the
standard error(s).
Dispersion parameter for the random effects:
[1] 656.3
Dispersion model for the random effects:
Link = log
Effects:
.|Random1
Estimate Std. Error
1/CAR.tau 6.487 1.727
-CAR.rho/CAR.tau -1.129 0.303
CAR.tau (estimated spatial variance component): 0.1542
CAR.rho (estimated spatial correlation): 0.174
Dispersion = 1 is used in Gamma model on deviances to calculate the
standard error(s).
EQL estimation converged in 10 iterations.
In the above output provided by the summary()
method, fixed effects
estimates are given under MEAN MODEL
, and the dispersion parameter
estimates are given under DISPERSION MODEL
. Here, we have only one
random effects term that has a CAR structure, and the corresponding
parameter estimates, .|Random1
. However, these are not the natural
parameters of the CAR model (see Section 3.1)
therefore the estimates of the natural dispersion parameters are given
just after them which are, in this case,
Furthermore, the value 656.3 for the dispersion of the random effects
given in the output above is an overall variance of
> cancerSAR <- hglm(X = XX, y = O, Z = diag(56), family = poisson(),
+ rand.family = SAR(D = nbr), offset = logE,
+ conv = 1e-08, fix.disp = 1)
> summary(cancerSAR)
Call:
hglm.default(X = XX, y = O, Z = diag(56), family = poisson(),
rand.family = SAR(D = nbr), conv = 1e-08, fix.disp = 1, offset = logE)
----------
MEAN MODEL
----------
Summary of the fixed effects estimates:
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 0.19579 0.20260 0.966 0.34241
Paff 0.03637 0.01165 3.122 0.00425 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: P-values are based on 27 degrees of freedom
Summary of the random effects estimates:
Estimate Std. Error
[1,] 0.7367 1.0469
[2,] 0.6336 0.3930
[3,] 0.4537 0.5784
...
NOTE: to show all the random effects, use print(summary(hglm.object),
print.ranef = TRUE).
----------------
DISPERSION MODEL
----------------
NOTE: h-likelihood estimates through EQL can be biased.
Dispersion parameter for the mean model:
[1] 1
Model estimates for the dispersion term:
Link = log
Effects:
[1] 1
Dispersion = 1 is used in Gamma model on deviances to calculate the
standard error(s).
Dispersion parameter for the random effects:
[1] 16.3
Dispersion model for the random effects:
Link = log
Effects:
.|Random1
Estimate Std. Error
1/sqrt(SAR.tau) 2.7911 0.4058
-SAR.rho/sqrt(SAR.tau) -0.4397 0.0822
SAR.tau (estimated spatial variance component): 0.1284
SAR.rho (estimated spatial correlation): 0.1575
Dispersion = 1 is used in Gamma model on deviances to calculate the
standard error(s).
EQL estimation converged in 12 iterations.
For the CAR model, the hglm
estimates are exactly the same as those
labelled as “PQL” estimates in Lee and Lee (2012). To get the EQL1 correction the
user has to add the option method = "EQL1"
and the hglm
function
gives similar results to those reported in Lee and Lee (2012), e.g., their HL(1,1)
estimates were
"EQL1"
correction gives (0.234, 0.0377, 0.156, 0.174) (see
also Section “Fitting a spatial Markov Random Field model using the CAR
family” in the package vignette). A minor difference to the "EQL1"
result appears because Lee and Lee (2012) used the HL(1,1) modification to their
HL(0,1) whereas we apply such a correction directly to EQL which is
slightly different from HL(0,1) (see Table 1).
We analyze a data set consisting of the student grades of 1,967 Ohio
Elementary Schools during the year 2001–2002. The data set is freely
available on the internet (URL http://www.spatial-econometrics.com/)
as a web supplement to LeSage and Pace (2009) but was not analyzed therein. The shape
files were downloaded from
http://www.census.gov/cgi-bin/geo/shapefiles2013/main and the
districts of 1,860 schools in these two files could be connected
unambiguously. The data set contains information on, for instance,
school building ID, Zip code of the location of the school, proportion
of passing on five subjects, number of teachers, number of students,
etc. We regress the median of 4th grade proficiency scores,
where
The above choice of constructing the weight matrix is rather simple. Because the aim of this paper is to demonstrate the use of hglm for fitting spatial models rather than drawing conclusions from real data analysis, we skip any further discussion on the construction of the weight matrix. Interested, readers are referred to LeSage and Pace (2009) for more discussion on the construction of the spatial weight matrices. With the spatial weight matrix defined as above, we can estimate model ((13)) by using our hglm package in the following way.
> ## load the data object 'ohio'
> data(ohio)
>
> ## fit a CAR model for the median scores of the districts
> X <- model.matrix(MedianScore ~ 1, data = ohioMedian)
> Z <- model.matrix(~ 0 + district, data = ohioMedian)
> ohioCAR <- hglm(y = ohioMedian$MedianScore, X = X, Z = Z,
+ rand.family = CAR(D = ohioDistrictDistMat))
> summary(ohioCAR)
Call:
hglm.default(X = X, y = ohioMedian$MedianScore, Z = Z,
rand.family = CAR(D = ohioDistrictDistMat))
----------
MEAN MODEL
----------
Summary of the fixed effects estimates:
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 72.429 0.819 88.44 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: P-values are based on 1566 degrees of freedom
Summary of the random effects estimates:
Estimate Std. Error
[1,] -21.433 11.071
[2,] -17.890 10.511
[3,] -4.537 7.844
...
NOTE: to show all the random effects, use print(summary(hglm.object),
print.ranef = TRUE).
----------------
DISPERSION MODEL
----------------
NOTE: h-likelihood estimates through EQL can be biased.
Dispersion parameter for the mean model:
[1] 190.5
Model estimates for the dispersion term:
Link = log
Effects:
Estimate Std. Error
5.2498 0.0357
Dispersion = 1 is used in Gamma model on deviances to calculate the
standard error(s).
Dispersion parameter for the random effects:
[1] 1.01
Dispersion model for the random effects:
Link = log
Effects:
.|Random1
Estimate Std. Error
1/CAR.tau 0.0097 8e-04
-CAR.rho/CAR.tau -0.0011 2e-04
CAR.tau (estimated spatial variance component): 103.6
CAR.rho (estimated spatial correlation): 0.1089
Dispersion = 1 is used in Gamma model on deviances to calculate the
standard error(s).
EQL estimation converged in 5 iterations.
The estimated spatial correlation parameter among school districts is 0.109. We can obtain fitted values from the CAR model and predict the school districts without any observations. The following codes perform such prediction and the results are visualized in Figure 3(B). We remove the estimate of Lake Erie, as estimation for an uninhabited region is meaningless.
> ## extract districts from the map data
> districtShape <- as.numeric(substr(as.character(ohioShape@data$UNSDIDFP), 3, 7))
>
> ## calculate fitted values from the CAR model
> CARfit <- matrix(ohioCAR$ranef + ohioCAR$fixef,
+ dimnames = list(rownames(ohioDistrictDistMat), NULL))
> ohioShape@data$CAR <- CARfit[as.character(districtShape),]
> is.na(ohioShape@data$CAR[353]) <- TRUE # remove estimate of Lake Erie
>
> ## visualize the results
> spplot(ohioShape, zcol = "CAR", main = "Fitted values from CAR",
+ col.regions = heat.colors(1000)[1000:1], cuts = 1000)
A predict()
method is not available because predicting spatially
correlated random effects for autoregressive models requires re-fitting
the whole model. Thus standard kriging cannot be used because the
covariance structure changes if the neighborhood matrix is altered,
while keeping
In the example above, ohioMedian$district
has 616 levels and 54 of the
districts have no records (Figure 3(A)). The incidence matrix
Z
, created using the model.matrix
function, therefore has 616
columns and 54 of these are columns of zeros. Hence, there are 616
levels in the fitted spatial random effect giving predictions for the
districts without records.
The hglm package is one of few non-Bayesian packages on CRAN to fit
spatial HGLMs, where the fixed and random effects are estimated
simultaneously. We have shown how the HGLM framework, allowing linear
predictors to model variance components, can be exploited to fit CAR and
SAR models. This gives a computationally efficient algorithm for
moderately sized problems (number of locations
X. Shen was supported by a Swedish Research Council grant (No. 2014-371).
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Alam, et al., "Fitting Conditional and Simultaneous Autoregressive Spatial Models in hglm", The R Journal, 2015
BibTeX citation
@article{RJ-2015-017, author = {Alam, Moudud and ård, Lars and Shen, Xia}, title = {Fitting Conditional and Simultaneous Autoregressive Spatial Models in hglm}, journal = {The R Journal}, year = {2015}, note = {https://rjournal.github.io/}, volume = {7}, issue = {2}, issn = {2073-4859}, pages = {5-18} }