ngspatial : A Package for Fitting the Centered Autologistic and Sparse Spatial Generalized Linear Mixed Models for Areal Data

Two important recent advances in areal modeling are the centered autologistic model and the sparse spatial generalized linear mixed model (SGLMM), both of which are reparameterizations of traditional models. The reparameterizations improve regression inference by alleviating spatial confounding, and the sparse SGLMM also greatly speeds computing by reducing the dimension of the spatial random effects. Package ngspatial (’ng’ = non-Gaussian) provides routines for fitting these new models. The package supports composite likelihood and Bayesian inference for the centered autologistic model, and Bayesian inference for the sparse SGLMM. Background and introduction The traditional autologistic model (Besag, 1972) and areal GLMM (Besag et al., 1991) have enjoyed widespread popularity: they have been applied thousands of times in many fields, e.g., epidemiology, marketing, agriculture, ecology, forestry, geography, and image analysis. But it was recently discovered that both models are spatially confounded (Caragea and Kaiser, 2009; Clayton et al., 1993). This confounding can cause bias and/or variance inflation in the estimators of regression coefficients, leading to erroneous regression inference. This is a serious drawback because many spatial modelers are interested in regression (rather than, or in addition to, prediction). To address the confounding of the traditional autologistic model, Caragea and Kaiser (2009) devised the centered autologistic model, so named because it replaces the traditional model’s autocovariate with a centered autocovariate (see below for details). The confounding of the mixed model was first addressed by Reich et al. (2006) using a technique now known as restricted spatial regression (Hodges and Reich, 2010). This technique alleviates spatial confounding and also yields a faster mixing Markov chain, but the computational burden remains high because the dimension of the spatial random effects is reduced only slightly relative to the traditional model. By using the so called Moran operator, Hughes and Haran (2013) were able to reparameterize the mixed model in a way that not only improves regression inference but also dramatically reduces the dimension of the random effects. The resulting model, which we will call the sparse SGLMM, can be fitted so efficiently that even the largest areal datasets can be analyzed quickly. These promising new models cannot be applied using existing software, and so we have provided support for the models in version 1.0 of R package ngspatial, the subject of this article. First we discuss the two models in some detail. Then we present ngspatial 1.0, which permits composite likelihood and Bayesian inference for the centered autologistic model, and Bayesian inference for the sparse SGLMM. We conclude with a summary. The models supported by ngspatial 1.0 Areal models The autologistic model and the sparse SGLMM are areal models, i.e., models for data observed at the vertices of a graph G = (V, E), where V = {1, 2, . . . , n} are the vertices, and E ⊂ V ×V are the edges. Each vertex of G represents an areal unit, i.e., an area over which measurements have been aggregated. For example, areal units could be counties, census tracts, voxels, pixels, or provinces. An edge (i, j) of G represents the spatial adjacency of areal units i and j. Typically, two areal units are considered adjacent if they share a boundary, but other definitions are possible. We will assume that G is undirected and free of loops and parallel edges. The R Journal Vol. 6/2, December 2014 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 82 The centered autologistic model The traditional autologistic model was proposed by Besag (1972). The model is a Markov random field (MRF) model (Kindermann and Snell, 1980), which is to say that G describes conditional independencies among the random variables Zi (i = 1, . . . , n) associated with V. For the autologistic model, the ith observation is Bernoulli distributed conditional on its neighbors: log P(Zi = 1 | {Zj : (i, j) ∈ E}) P(Zi = 0 | {Zj : (i, j) ∈ E}) = x′ iβ+ η ∑ j:(i,j)∈E Zj, where xi is a p-vector of spatial predictors associated with the ith areal unit, β is a p-vector of spatial regression coefficients, η is a spatial dependence parameter, and ∑ Zj is the so called autocovariate. We see that η is a measure of Zi’s reactivity to its neighbors. If η = 0, the model reduces to the ordinary Bernoulli GLM, while η > 0 (< 0) corresponds to positive (negative) spatial dependence. We will assume that η > 0 since the model is usually applied to phenomena that exhibit spatial attraction rather than repulsion. Caragea and Kaiser (2009) showed that the traditional autologistic model is confounded. This is because the traditional autocovariate is not well suited to the task of fitting small-scale structure in the data, i.e., clustering induced by spatial dependence and residual to the large-scale structure x′β. Instead, the traditional autocovariate and the spatial predictors “compete” to explain the data, which prevents the model from isolating the role of the spatial predictors. Caragea and Kaiser (2009) reparameterized the model by centering the autocovariate. The resulting conditional log odds are log P(Zi = 1 | {Zj : (i, j) ∈ E}) P(Zi = 0 | {Zj : (i, j) ∈ E}) = x′ iβ+ η ∑ j:(i,j)∈E (Zj − μj), where μj = {1 + exp(−xjβ)} −1 is the independence expectation of Zj. Centering allows the autocovariate to fit only residual structure so that fitting the large-scale structure is left to the regression term. Thus the centered model restores to β and η their desired interpretations as regression coefficients and dependence parameter, respectively. Maximum likelihood and Bayesian inference for the autologistic model are complicated by an intractable normalizing function. To see this, first let Z = (Z1, . . . , Zn)′; let X be the design matrix; let A = [1{(i, j) ∈ E}] be the adjacency matrix of G, where 1{·} is the indicator function; let θ = (β′, η)′ be the full parameter vector; and let μ = (μ1, . . . , μn)′ be the vector of independence expectations. Then, assuming G has clique number 2, the joint pmf of the centered model is π(Z | θ) = c(θ)−1 exp ( Z ′Xβ− ηZ ′Aμ+ η 2 Z ′AZ ) , (1) where c(θ) = ∑ Y ∈{0,1}n exp ( Y ′Xβ− ηY ′Aμ+ η 2 Y ′AY ) is the normalizing function (Hughes et al., 2011). The normalizing function is intractable for all but the smallest datasets because the sample space {0, 1}n contains 2n points. Our package offers two ways to solve this problem: (1) composite likelihood inference, which sidesteps c(θ), and (2) auxiliary-variable MCMC for Bayesian inference, which allows c(θ) to cancel from the Metropolis-Hastings acceptance probability. See below for details. The sparse spatial generalized linear mixed model The traditional SGLMM for areal data—sometimes referred to as the BYM model—was proposed by Besag, York, and Mollié (1991). The BYM model is hierarchical, inducing spatial dependence by way of a latent autonormal random vector. Conditional on these spatial random effects, the observations are independent and follow an ordinary GLM. Specifically, the transformed conditional means are g(μi) = x′ iβ+ Si, (2) where g is a link function, μi = E(Zi | Si), and Si is the spatial random effect for the ith areal unit. The most common specification for S = (S1, . . . , Sn)′ is the so called intrinsic conditional autoregression (ICAR), a zero-mean Gaussian Markov random field (GMRF; Rue and Held, 2005) with a The R Journal Vol. 6/2, December 2014 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 83 singular precision matrix that corresponds to the intuitively appealing full conditional distributions Si | {Sj : (i, j) ∈ E} ∼ N  1 di ∑ j:(i,j)∈E Sj, 1 τdi  , where di is the degree of vertex i and τ > 0 is a smoothing parameter. These conditionals correspond to the joint distribution S ∼ N{0, (τQ)−1}, where Q = D− A, with D = diag(di). Since Q is singular, the BYM model is employed from the Bayesian point of view, with the ICAR a prior distribution on S. Packages CARBayes (Lee, 2013) and spdep (Bivand, 2015) provide tools for fitting the ICAR and other CAR models. Reich et al. (2006) showed that the BYM model is spatially confounded in the sense that the random effects can “pollute” the regression manifold C(X), which can lead to a biased and variance-inflated posterior for β. To see this, first let P be the orthogonal projection onto C(X), so that I − P is the orthogonal projection onto C(X)⊥. Now eigendecompose P and I− P to obtain orthogonal bases (Kn×p and Ln×(n−p), say) for C(X) and C(X) ⊥. Then (2) can be rewritten as g(μi) = x′ iβ+ k ′ iγ + l ′ iδ, where γp×1 and δ(n−p)×1 are random coefficients. This form shows that K is the source of the confounding, for K and X have the same column space. Since the columns of K are merely synthetic predictors (i.e., they have no scientific meaning), Reich et al. (2006) recommend removing them from the model. The resulting model (henceforth the RHZ model) has g(μi) = x′ iβ+ l ′ iδ, so that spatial smoothing is restricted to the orthogonal complement of C(X). In a subsequent paper, Hodges and Reich (2010) referred to this technique as restricted spatial regression. Restricted spatial regression is not only an effective remedy for confounding but also speeds computing. Because the columns of L are orthogonal, the RHZ model’s random effects are approximately a posteriori uncorrelated. This yields a fast mixing Markov chain, and the cost per iteration is reduced because a simple spherical normal proposal is sufficient for updating the random effects. But fitting the RHZ model to large areal datasets is still quite burdensome computationally because the random effects remain high dimensional. By taking full advantage of G, Hughes and Haran (2013) were able to greatly reduce the number of random effects while also improving regression inference. Hughes and Haran (2013) begin by defining the so called Moran operator for X wi


Background and introduction
The traditional autologistic model (Besag, 1972) and areal GLMM (Besag et al., 1991) have enjoyed widespread popularity: they have been applied thousands of times in many fields, e.g., epidemiology, marketing, agriculture, ecology, forestry, geography, and image analysis.But it was recently discovered that both models are spatially confounded (Caragea and Kaiser, 2009;Clayton et al., 1993).This confounding can cause bias and/or variance inflation in the estimators of regression coefficients, leading to erroneous regression inference.This is a serious drawback because many spatial modelers are interested in regression (rather than, or in addition to, prediction).
To address the confounding of the traditional autologistic model, Caragea and Kaiser (2009) devised the centered autologistic model, so named because it replaces the traditional model's autocovariate with a centered autocovariate (see below for details).
The confounding of the mixed model was first addressed by Reich et al. (2006) using a technique now known as restricted spatial regression (Hodges and Reich, 2010).This technique alleviates spatial confounding and also yields a faster mixing Markov chain, but the computational burden remains high because the dimension of the spatial random effects is reduced only slightly relative to the traditional model.By using the so called Moran operator, Hughes and Haran (2013) were able to reparameterize the mixed model in a way that not only improves regression inference but also dramatically reduces the dimension of the random effects.The resulting model, which we will call the sparse SGLMM, can be fitted so efficiently that even the largest areal datasets can be analyzed quickly.
These promising new models cannot be applied using existing software, and so we have provided support for the models in version 1.0 of R package ngspatial, the subject of this article.First we discuss the two models in some detail.Then we present ngspatial 1.0, which permits composite likelihood and Bayesian inference for the centered autologistic model, and Bayesian inference for the sparse SGLMM.We conclude with a summary.

The models supported by ngspatial 1.0 Areal models
The autologistic model and the sparse SGLMM are areal models, i.e., models for data observed at the vertices of a graph G = (V, E), where V = {1, 2, . . ., n} are the vertices, and E ⊂ V × V are the edges.Each vertex of G represents an areal unit, i.e., an area over which measurements have been aggregated.For example, areal units could be counties, census tracts, voxels, pixels, or provinces.An edge (i, j) of G represents the spatial adjacency of areal units i and j.Typically, two areal units are considered adjacent if they share a boundary, but other definitions are possible.We will assume that G is undirected and free of loops and parallel edges.
The R Journal Vol.6/2, December 2014 ISSN 2073-4859 The centered autologistic model The traditional autologistic model was proposed by Besag (1972).The model is a Markov random field (MRF) model (Kindermann and Snell, 1980), which is to say that G describes conditional independencies among the random variables Z i (i = 1, . . ., n) associated with V.For the autologistic model, the ith observation is Bernoulli distributed conditional on its neighbors: where x i is a p-vector of spatial predictors associated with the ith areal unit, β is a p-vector of spatial regression coefficients, η is a spatial dependence parameter, and ∑ Z j is the so called autocovariate.
We see that η is a measure of Z i 's reactivity to its neighbors.If η = 0, the model reduces to the ordinary Bernoulli GLM, while η > 0 (< 0) corresponds to positive (negative) spatial dependence.We will assume that η > 0 since the model is usually applied to phenomena that exhibit spatial attraction rather than repulsion.Caragea and Kaiser (2009) showed that the traditional autologistic model is confounded.This is because the traditional autocovariate is not well suited to the task of fitting small-scale structure in the data, i.e., clustering induced by spatial dependence and residual to the large-scale structure x β.Instead, the traditional autocovariate and the spatial predictors "compete" to explain the data, which prevents the model from isolating the role of the spatial predictors.Caragea and Kaiser (2009) reparameterized the model by centering the autocovariate.The resulting conditional log odds are log where µ j = {1 + exp(−x j β)} −1 is the independence expectation of Z j .Centering allows the autocovariate to fit only residual structure so that fitting the large-scale structure is left to the regression term.
Thus the centered model restores to β and η their desired interpretations as regression coefficients and dependence parameter, respectively.
Maximum likelihood and Bayesian inference for the autologistic model are complicated by an intractable normalizing function.To see this, first let Z = (Z 1 , . . ., Z n ) ; let X be the design matrix; let A = [1{(i, j) ∈ E}] be the adjacency matrix of G, where 1{•} is the indicator function; let θ = (β , η) be the full parameter vector; and let µ = (µ 1 , . . ., µ n ) be the vector of independence expectations.Then, assuming G has clique number 2, the joint pmf of the centered model is where is the normalizing function (Hughes et al., 2011).
The normalizing function is intractable for all but the smallest datasets because the sample space {0, 1} n contains 2 n points.Our package offers two ways to solve this problem: (1) composite likelihood inference, which sidesteps c(θ), and (2) auxiliary-variable MCMC for Bayesian inference, which allows c(θ) to cancel from the Metropolis-Hastings acceptance probability.See below for details.

The sparse spatial generalized linear mixed model
The traditional SGLMM for areal data-sometimes referred to as the BYM model-was proposed by Besag, York, and Mollié (1991).The BYM model is hierarchical, inducing spatial dependence by way of a latent autonormal random vector.Conditional on these spatial random effects, the observations are independent and follow an ordinary GLM.Specifically, the transformed conditional means are where g is a link function, µ i = E(Z i | S i ), and S i is the spatial random effect for the ith areal unit.The most common specification for S = (S 1 , . . ., S n ) is the so called intrinsic conditional autoregression (ICAR), a zero-mean Gaussian Markov random field (GMRF; Rue and Held, 2005) with a The R Journal Vol.6/2, December 2014 ISSN 2073-4859 singular precision matrix that corresponds to the intuitively appealing full conditional distributions where d i is the degree of vertex i and τ > 0 is a smoothing parameter.These conditionals correspond to the joint distribution S ∼ N {0, (τQ) −1 }, Since Q is singular, the BYM model is employed from the Bayesian point of view, with the ICAR a prior distribution on S. Packages CARBayes (Lee, 2013) and spdep (Bivand, 2015) provide tools for fitting the ICAR and other CAR models.Reich et al. (2006) showed that the BYM model is spatially confounded in the sense that the random effects can "pollute" the regression manifold C(X), which can lead to a biased and variance-inflated posterior for β.To see this, first let P be the orthogonal projection onto C(X), so that I − P is the orthogonal projection onto C(X) ⊥ .Now eigendecompose P and I − P to obtain orthogonal bases (K n×p and L n×(n−p) , say) for C(X) and C(X) ⊥ .Then (2) can be rewritten as where γ p×1 and δ (n−p)×1 are random coefficients.This form shows that K is the source of the confounding, for K and X have the same column space.
Since the columns of K are merely synthetic predictors (i.e., they have no scientific meaning), Reich et al. (2006) recommend removing them from the model.The resulting model (henceforth the RHZ model) has so that spatial smoothing is restricted to the orthogonal complement of C(X).In a subsequent paper, Hodges and Reich (2010) referred to this technique as restricted spatial regression.
Restricted spatial regression is not only an effective remedy for confounding but also speeds computing.Because the columns of L are orthogonal, the RHZ model's random effects are approximately a posteriori uncorrelated.This yields a fast mixing Markov chain, and the cost per iteration is reduced because a simple spherical normal proposal is sufficient for updating the random effects.But fitting the RHZ model to large areal datasets is still quite burdensome computationally because the random effects remain high dimensional.
By taking full advantage of G, Hughes and Haran (2013) were able to greatly reduce the number of random effects while also improving regression inference.Hughes and Haran (2013) begin by defining the so called Moran operator for X with respect to G: (I − P)A(I − P).This operator appears in the numerator of a generalized form of Moran's I, a popular nonparametric measure of spatial dependence (Moran, 1950): Boots and Tiefelsdorf (2000) showed that (1) the (standardized) spectrum of a Moran operator comprises the possible values for the corresponding I X (A), and (2) the eigenvectors comprise all possible mutually distinct patterns of clustering residual to C(X) and accounting for G.The positive (negative) eigenvalues correspond to varying degrees of positive (negative) spatial dependence, and the eigenvectors associated with a given eigenvalue (λ i , say) are the patterns of spatial clustering that data exhibit when the dependence among them is of degree λ i .
In other words, the eigenvectors of the Moran operator form a multiresolutional spatial basis for C(X) ⊥ that exhausts all possible patterns that can arise on G. Since we do not expect to observe repulsion in the phenomena to which these models are usually applied, we can use the spectrum of the operator to discard all repulsive patterns, retaining only attractive patterns for our analysis (although it can be advantageous to accommodate repulsion (Griffith, 2006).The nature of the attractive Moran eigenvectors is illustrated in Figure 1.
By retaining only eigenvectors that exhibit positive spatial dependence, we can usually reduce the model dimension by at least half a priori.And Hughes and Haran (2013) showed that a much greater reduction is possible in practice, with 50-100 eigenvectors being sufficient for most datasets.
Let M n×q contain the first q n eigenvectors of the Moran operator.Then the sparse SGLMM has first stage where m i is the ith row of M and δ s is a q-vector of random coefficients.This implies p + q + 1 model The R Journal Vol.6/2, December 2014 ISSN 2073-4859 parameters, compared to p + n + 1 for the traditional model and p + (n − p) + 1 = n + 1 for the RHZ model.This dramatic reduction in dimension speeds computation considerably, allowing even the largest areal datasets to be analyzed quickly.
We note that the sparse SGLMM was partly inspired by spatial filtering, also known as principle coordinates of neighbor matrices.See, e.g., Griffith (2003); Dray et al. (2006); Borcard et al. (2011) for more information.Package spdep provides tools for spatial filtering.

Package ngspatial
Package ngspatial supports composite likelihood and Bayesian inference for the centered autologistic model, and Bayesian inference for the sparse SGLMM.Usage of the package's two main functions, autologistic and sparse.sglmm, is simple and intuitive while offering considerable flexibility.

Fitting the centered autologistic model
The fitting function for the centered autologistic model is called autologistic.This function returns an object of class "autologistic".Auxiliary functions residuals, summary, and vcov accept an object of type "autologistic" and return various kinds of residuals, summarize the fit, and return an estimated covariance matrix, respectively.

A fast perfect sampler for the centered autologistic model
A key component of our autologistic framework is function rautologistic, which simulates a draw from a centered autologistic model.The function has the following signature.

rautologistic(X, A, theta)
The three arguments are as described above.The functions of ngspatial require that an adjacency matrix be binary and symmetric (see function isSymmetric from the base package) and of type matrix.Note that package spdep provides function nb2mat, which constructs an adjacency matrix from a neighbors list.spdep also provides many other functions that are useful for handling adjacency structures.
This function employs coupling from the past (Propp and Wilson, 1996) to generate a vector distributed exactly according to (1).We use perfect sampling, rather than ordinary MCMC, for two reasons.First, the MCMC algorithm we use to do Bayesian inference requires that we draw a perfect sample during each iteration (Møller et al., 2006).Were we to use an approximation, the resulting Markov chain would not necessarily have the true posterior as its stationary distribution (Murray et al., 2006).Second, although perfect sampling can be computationally burdensome, a carefully implemented perfect sampler is fast enough to permit composite likelihood analysis of even very large datasets, while obviating convergence diagnosis.
Our perfect sampler was implemented in C++.More specifically, we used the Armadillo linear algebra library (Sanderson, 2010), which provides two benefits: (1) intuitive syntax for linear algebra (e.g., Z Xβ can be computed as Z.t() * X * beta), and (2) speed (Armadillo uses delayed evaluation to combine multiple operations, thus reducing or eliminating the need for temporaries).We integrated the C++ and R code using the Rcpp and RcppArmadillo packages (Eddelbuettel and François, 2011).
The R Journal Vol.6/2, December 2014 ISSN 2073-4859 We tested our Armadillo-based sampler in a number of circumstances and found it to be over three times as fast as an optimal pure R version.

Composite likelihood inference for the centered autologistic model
One way to overcome the intractability of the normalizing function c(θ) is to avoid it.This can be accomplished by considering the so called pseudolikelihood (PL), which is a composite likelihood (Lindsay, 1988) of the conditional type.Each of the n factors in the pseudolikelihood is the likelihood of a single observation, conditional on said observation's neighbors: where z i is the observed value of Z i , and A i is the ith row of A. Since the p i are free of the normalizing function, so is the log pseudolikelihood, which is given by Although ( 3) is not the true log likelihood unless η = 0, Besag (1975) showed that the MPLE converges almost surely to the maximum likelihood estimator (MLE) as the lattice size goes to ∞ (under an infill regime).For small samples the MPLE is less precise than the MLE (and the Bayes estimator), but point estimation of β is generally so poor for small samples that precision is unimportant.When the sample size is large enough to permit accurate estimation of β, the MPLE is nearly as precise as the MLE (Hughes et al., 2011).
We find the MPLE θ = arg max PL (θ) by using the optim function to minimize − PL (θ).Although the Nelder-Mead simplex algorithm finds the minimum quickly, we opted for the much faster BFGS method.To speed computation even further, we supply the score function Confidence intervals can be obtained using a parametric bootstrap or sandwich estimation.For the former we generate b samples from π(Z | θ) and compute the MPLE for each sample, thus obtaining the bootstrap sample θ(1) , . . ., θ(b) .Appropriate quantiles of the bootstrap sample are then used to construct approximate confidence intervals for the elements of θ.
The second approach for computing confidence intervals is based on (Varin et al., 2011) √ where I −1 (θ)J (θ)I −1 (θ) is the so called Godambe sandwich matrix (Godambe, 1960).The "bread" in this sandwich is the inverse of the information matrix I (θ) = −E∇ 2 PL (θ), and the "filling" is the variance of the score: J (θ) = E∇∇ PL (θ).We use the observed information (computed by optim) in place of I and estimate J using a parametric bootstrap.For the bootstrap we simulate b samples Z (1) , . . ., Z (b) from π(Z | θ) and estimate J as Because the bootstrap sample can be generated in parallel (using the parallel package) and little subsequent processing is required, these approaches to inference are very efficient computationally, even for large datasets.We note that sandwich estimation is over twice as fast as the full bootstrap.Moreover, asymptotic inference and bootstrap inference yield comparable results for practically all sample sizes because (4) is not, in fact, an asymptotic result.This is because the log pseudolikelihood is approximately quadratic with Hessian approximately invariant in law, which implies that the MPLE is approximately normally distributed irrespective of sample size (Geyer, 2013).

Bayesian inference for the centered autologistic model
For Bayesian inference we use the auxiliary-variable MCMC algorithm of Møller et al. (2006), which allows us to construct a proposal distribution so that the normalizing constant cancels from the Metropolis-Hastings ratio.The method requires that we can draw independent realizations from the unnormalized density for any value of θ, which we can do using our perfect sampler.
where Y * is the proposed auxiliary variable, θ * = (β * , η * ) is the proposed θ, and θ is the maximum pseudolikelihood estimate of θ.Thus log α is Because the auxiliary proposals cannot be generated in parallel, this approach to Bayesian analysis is computationally expensive.Our optimized perfect sampler eases the burden somewhat, as does our Armadillo implementation of (5).We achieve an additional gain in efficiency as follows.
We use a normal random walk Metropolis-Hastings algorithm, and so our proposal for θ is (p + 1)variate normal, i.e., θ * (k+1) | θ * (k) ∼ N p+1 (θ * (k) , Σ), for a suitably chosen covariance matrix Σ. Taking Σ = τ 2 I tends to result in a low acceptance rate unless τ 2 is quite small, in which case a long run is required to adequately explore the posterior.Instead, we begin by using where ΣGLM is the estimated asymptotic covariance matrix for β obtained from fitting the ordinary logistic model to the data.We use Σ 0 for a training run (the number of iterations can be chosen by the user), and then we use the posterior sample covariance matrix from the training run as the proposal covariance matrix for a subsequent run.We use the latter sample to do inference.The training run usually results in a much better mixing chain, which reduces the total number of iterations.Still, rigorous Bayesian inference (following Møller et al.) is impractical for large lattices because perfect sampling does not scale well.
As for the prior f (θ), we assume that β and η are a priori independent, i.e., f (θ) = f 1 (β) f 2 (η).The prior for β is N (0, σ 2 I).The common standard deviation defaults to 1,000 but can be specified by the user via control parameter sigma.The prior for η is U (0, η max ).The right endpoint defaults to 2 but can be chosen by the user via control parameter eta.max.

Usage examples for the centered autologistic model
Now we present some usage examples for the autologistic model.The fitting function is called autologistic and has the following signature.autologistic(formula, data, A, method = c("PL", "Bayes"), model = TRUE, x = FALSE, y = FALSE, verbose = FALSE, control = list()) Arguments formula, data, model, x, and y are analogous to those that appear in the signatures of other commonly used R model fitting functions, e.g., lm and glm.A is of course the binary adjacency matrix for G.The method argument is used to select pseudolikelihood inference or Bayesian inference.
The R Journal Vol.6/2, December 2014 ISSN 2073-4859 If verbose is equal to TRUE, various messages may be sent to the standard output stream.Most of those messages have to do with the default values for control parameters that were not set by the user via the control argument.
For the following example the underlying graph is the 50 × 50 square lattice.First the corresponding adjacency matrix is constructed.Then the vertices are assigned coordinates restricted to the unit square centered at the origin.The vertex locations are used as spatial covariates, with regression coefficients β = (2, 2) .The resulting independence expectations µ = Xβ, which range from approximately 0.1 to approximately 0.9, are level plotted in grayscale to show that the large-scale structure corresponds to a probability gradient that increases as one moves from southwest to northeast.Then, for η = 0.6, a dataset is simulated and plotted.The dataset is then fitted using the PL method, and no confidence intervals are computed.Finally, the fit is summarized.
We use, and recommend, fixed-width output analysis (Flegal et al., 2008).In fixed-width analysis, one chooses a tolerance and stops sampling when all Monte Carlo standard errors fall below the tolerance.The output shown below indicates that the Monte Carlo standard errors fell below the default tolerance of 0.01 after 714,025 draws were made from the posterior, which is why the analysis was terminated before having reached the (default) maximum number of iterations (maxit = 1e6).We use the batchmeans package (Haran and Hughes, 2012) to compute Monte Carlo standard errors (denoted MCSE in the output).
In this example we use a training run of length 10,000, and the length of the subsequent inferential run will be at least 10,000.
Control parameter maxit must be a positive whole number greater than or equal to minit .Setting it to 1e+06.
Control parameter sigma must be a positive number.Setting it to the default value of 1,000.Control parameter eta.max must be a positive number.Setting it to the default value of 2.
Warning: MCMC may be time consuming.

Progress => 5%
Progress => 10% Progress => 15% Call: autologistic(formula = Z ~X -1, A = A, method = "Bayes", verbose = TRUE, control = list(trainit = 10000, minit = 10000)) The object returned by function autologistic is a list with many useful elements.The following output resulted from applying the names function to the object returned in the second example above.

Fitting the sparse SGLMM
The fitting function for the sparse SGLMM is called sparse.sglmm.The function returns an object of class "sparse.sglmm".Auxiliary functions residuals, summary, and vcov accept an object of type "sparse.sglmm"and return various kinds of residuals, summarize the fit, and return an estimated covariance matrix, respectively.

Bayesian inference for the sparse SGLMM
The second stage of the sparse SGLMM, i.e., the prior for δ s , is where τ s is a smoothing parameter.The prior for β is spherical p-variate normal with mean zero and common variance sigma.b,which defaults to 1,000 but can be controlled by the user via argument hyper.The prior for τ s is gamma with parameters 0.5 and 2,000 (Kelsall and Wakefield, 1999).This prior is attractive because it corresponds to the prior belief that the fixed effects are sufficient to explain the data (since a large value for τ s implies small variances for the random effects) and because it discourages artifactual spatial structure in the posterior.
When the response is normally distributed, the identity link is assumed, in which case the first stage is where δ h are heterogeneity random effects.When the response is Poisson distributed, heterogeneity random effects are optional.In any case, the prior on δ h is spherical q-variate normal with mean zero and common variance 1/τ h .The prior for τ h is gamma with parameters a.h and b.h.The default values for these parameters are 0.01 and 100, respectively, or their values can be set by the user through argument hyper.
If the response is Bernoulli or Poisson, β and δ s are updated using Metropolis-Hastings random walks with normal proposals.The proposal covariance matrix for β is the estimated asymptotic covariance matrix from a glm fit to the data.The proposal for δ s is spherical normal with common standard deviation sigma.s.If the response is Poisson distributed and heterogeneity random effects are included, those random effects are updated using a Metropolis-Hastings random walk with a spherical normal proposal.The common standard deviation is sigma.h.Both sigma.sand sigma.hdefault to 0.01, but they can be set via argument tune.
The R Journal Vol.6/2, December 2014 ISSN 2073-4859 Most of the arguments are analogous to those already described above.Arguments attractive and repulsive are used to select the number of Moran eigenvectors.Both attractive and repulsive eigenvectors are permitted, although repulsive defaults to 0, which corresponds to pure spatial smoothing.Function sparse.sglmmchecks the validity of attractive and repulsive by eigendecomposing the Moran operator and examining the spectrum.Execution is terminated (with an informative error message) if either value is invalid.Arguments tune and hyper can be used to supply values for tuning parameters and hyperparameters, respectively, as described above.
We will illustrate the usage of our sparse SGLMM function by analyzing the data shown in Figure 2. The plot shows infant mortality data for 3,071 US counties.Each shaded peak represents a ratio of deaths to births, i.e., an empirical infant mortality rate, for a given county.The data were obtained from the 2008 Area Resource File (ARF), a county-level database maintained by the Bureau of Health Professions, Health Resources and Services Administration, US Department of Health and Human Services.Specifically, three variables were extracted from the ARF: the three-year (2002)(2003)(2004) average number of infant deaths before the first birthday, the three-year average number of live births, and the three-year average number of low birth weight infants.
To these data we fit the sparse Poisson SGLMM with log E(DEATHS i | β, δ s ) = log BIRTHS i + β 0 + β 1 LOW i + β 2 BLACK i + β 3 HISP i + β 4 GINI i where DEATHS is the number of infant deaths; BIRTHS is the number of live births; LOW is the rate of low birth weight; BLACK is the percentage of black residents (according to the 2000 US Census); HISP is the percentage of hispanic residents (2000 US Census); GINI is the Gini coefficient, a measure of income inequality (Gini, 1921); AFF is a composite score of social affluence (Yang et al., 2009); and STAB is residential stability, an average z-score of two variables from the 2000 US Census.
Other particularly useful fields include beta.sample,gamma.sample,and tau.s.sample, which contain the posterior samples for β, δ s , and τ s , respectively.(The package uses γ and δ, respectively, in place of δ s and δ h .This paper has used δ s and δ h because here it was necessary to present the RHZ model, which is not discussed in the package documentation.)Monte Carlo standard errors are also returned, as are acceptance rates.

Conclusion
This article introduced version 1.0 of R package ngspatial, which supports two promising new models for areal data, namely, the centered autologistic model and the sparse SGLMM.The package is user friendly because its model-fitting functions and auxiliary functions were designed to behave like the analogous functions (e.g., lm, glm, summary) in the stats package.The package is also efficient because the code uses vector and matrix operations where possible, and because key portions of the code were written in C++.

Figure 1 :
Figure 1: A Moran eigenvector exhibiting strong positive spatial dependence.The graph is the square 100 × 100 lattice.The coordinates of the vertices (x i = (x i , y i ) for i = 1, . . ., 10 000) were restricted to the unit square, and those coordinates were used as spatial covariates, i.e., X = (x 1 • • • x 10 000 ).

Figure 2 :
Figure 2: The empirical infant mortality rates.

Table 1 :
Results for sparse SGLMM and ICAR fits to the infant mortality data.