Eigenvector-based spatial filtering constitutes a highly flexible semiparametric approach to account for spatial autocorrelation in a regression framework. It combines judiciously selected eigenvectors from a transformed connectivity matrix to construct a synthetic spatial filter and remove spatial patterns from model residuals. This article introduces the spfilteR package that provides several useful and flexible tools to estimate spatially filtered linear and generalized linear models in R. While the package features functions to identify relevant eigenvectors based on different selection criteria in an unsupervised fashion, it also helps users to perform supervised spatial filtering and to select eigenvectors based on alternative user-defined criteria. Besides a brief discussion of the eigenvector-based spatial filtering approach, this article presents the main functions of the package and illustrates their usage. Comparison to alternative implementations in other R packages highlights the added value of the spfilteR package.
The presence of spatial autocorrelation in regression residuals constitutes a severe problem in standard inferential statistics, as it causes common econometric methods to produce inefficient or even biased and inconsistent parameter estimates (Franzese and Hays 2007; Goodchild 2009; Darmofal 2015). Besides parametric spatial regression techniques, which have become the dominant approach to this challenge in the social sciences, spatial filtering techniques offer an alternative way to handle spatially clustered data. The particular appeal of these semiparametric approaches to spatial autocorrelation arises from their flexibility and the relative ease of estimation and interpretation (e.g., Getis and Griffith 2002; Tiefelsdorf and Griffith 2007). In particular, the eigenvector-based spatial filtering (ESF) approach pioneered by Griffith (1996, 2000, 2003) has proven useful in various academic disciplines.
This article introduces the spfilteR package that provides a set of flexible and useful functions to implement the ESF approach in regression models. Besides tools to detect spatial autocorrelation in individual variables and regression residuals by means of the Moran coefficient (MC) (Cliff and Ord 1972, 1981), the package features easily customizable functions which allow users to perform supervised and unsupervised spatial filtering with eigenvectors. While other R packages like spatialreg (Bivand and Piras 2015) and spmoran (Murakami 2020) also contain implementations of the unsupervised ESF approach, they are less flexible in the specification of eigenvector selection criteria, which constitutes the crucial step in the ESF approach. These packages also offer few functions for the supervised selection of eigenvectors.
In contrast, the spfilteR package allows users to obtain eigenvectors
from a transformed connectivity matrix and to identify a suitable
candidate set in order to perform supervised spatial filtering.
Alternatively, unsupervised eigenvector selection procedures for
different (generalized) linear models based on a stepwise regression
procedure are implemented as well. These functions select eigenvectors
based on either i) the maximization of model fit, ii) minimization of
residual autocorrelation, iii) the statistical significance of residual
autocorrelation, or iv) the statistical significance of the candidate
eigenvectors. Parameter estimates are obtained by means of ordinary
least squares (OLS) for linear models and maximum likelihood estimation
(MLE) for generalized linear models (GLMs). The print, summary, and
plot methods further facilitate the interpretation and visualization
of the results.
After a theoretical description of the ESF approach in a regression framework, this article presents some stylized R code to demonstrate the implementation of the ESF approach using the functions and the synthetic dataset accompanying the spfilteR package. It also briefly compares the unsupervised ESF procedures contained in this package to alternative implementations in other R packages. The last section summarizes and concludes this article.
Intuitively, the ESF approach put forth by Griffith (1996, 2000, 2003) as well as Tiefelsdorf and Griffith (2007) addresses the problem of spatially autocorrelated regression residuals by partitioning the error term into a spatially structured and a random component (see also Griffith and Chun 2014). Consider a stylized linear regression model whose error term is spatially autocorrelated.
To this end, synthetic proxy variables are generated that reflect the spatial pattern present in the model residuals as closely as possible. Including these synthetic variables as control variables in the regression's mean equation removes the problematic spatial structure from the disturbances and allows the use of standard procedures, such as OLS or MLE, for parameter estimation. Generating these proxy variables, which act as the spatial filter, requires the decomposition of the transformed and exogenously defined connectivity matrix, which represents the dependence structure among the units of analysis.
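In the notation commonly used in the ESF literature (assumed here, since the symbols are not defined in this section), let y be the n × 1 outcome vector, X the n × k covariate matrix, E* an n × p matrix of selected eigenvectors with coefficient vector γ, and ε̃ the remaining, spatially uncorrelated error. The partition described above can then be written as:

$$
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \varepsilon = E^{*}\gamma + \tilde{\varepsilon}, \\
y &= X\beta + E^{*}\gamma + \tilde{\varepsilon}.
\end{aligned}
$$

Since E* enters the mean equation like any other set of regressors, β and γ can be estimated jointly by OLS (or by MLE in a GLM).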
The eigenfunction (or spectral) decomposition of a transformed connectivity matrix constitutes the core element of the ESF approach. More formally, the decomposition yields
The projection matrix is given by
The function glmFilter() in the spfilteR package reports the condition number (see also Griffith and Amrhein 1997).
However, since the number of eigenvectors equals the number of observations in the data, only a subset of eigenvectors can be included in the regression equation.
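To make the decomposition concrete, the following base-R sketch builds a binary rook connectivity matrix W for a small regular grid, symmetrizes it, and decomposes MWM, where M = I − X(XᵀX)⁻¹Xᵀ is the projection matrix built from an intercept and optional covariates (the standard ESF formulation, assumed here). The helpers path_adj(), rook_grid(), and esf_eigen() are hypothetical and not part of spfilteR, whose own tool for this step is getEVs(). The sketch also illustrates the point above: the decomposition returns exactly as many eigenvectors as there are observations.

```r
# Hypothetical helpers for illustration only; they are not spfilteR functions.
# path_adj(): binary adjacency of m cells arranged in a line.
path_adj <- function(m) {
  A <- matrix(0, m, m)
  A[abs(row(A) - col(A)) == 1] <- 1
  A
}
# rook_grid(): binary rook adjacency for an nr x nc grid (column-major cell order).
rook_grid <- function(nr, nc) {
  kronecker(diag(nc), path_adj(nr)) + kronecker(path_adj(nc), diag(nr))
}
# esf_eigen(): eigenfunction decomposition of M W M, where M projects out
# an intercept plus the optional covariates in X.
esf_eigen <- function(W, X = NULL) {
  n <- nrow(W)
  W <- (W + t(W)) / 2                             # symmetrize W
  X <- cbind(rep(1, n), X)                        # intercept + covariates
  M <- diag(n) - X %*% solve(crossprod(X), t(X))  # projection matrix
  eigen(M %*% W %*% M, symmetric = TRUE)          # eigenvalues in decreasing order
}

W <- rook_grid(5, 5)
ev <- esf_eigen(W)
E <- ev$vectors
dim(E)                                   # 25 25: one eigenvector per observation
max(abs(crossprod(E) - diag(25)))        # numerically zero: orthonormal columns
```

Because every eigenvector with a non-zero eigenvalue lies in the range of M, these eigenvectors are centered and orthogonal to the intercept, which is why they can be added to a regression with an intercept without changing its estimate.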
Identifying and selecting relevant eigenvectors is decisive in the ESF
approach and involves two steps. In a first step, a set of candidate
eigenvectors, the search set
Griffith (2003), for example, proposes a qualitative threshold determining
the candidate set by computing
Once a feasible candidate set is identified, the importance of each
eigenvector in
Once
Equation (3) depicts the spatially filtered regression
model and illustrates how the ESF approach partitions the regression
residuals
This stylized filtering scheme directly extends to GLMs, although the
link function might corrupt the uncorrelatedness of the eigenvectors. If
a substantial amount of multicollinearity among the eigenvectors is
present, each eigenvector included in the subset of
The stable release version of the spfilteR package can be obtained
from CRAN.
# install package from CRAN
R> install.packages("spfilteR")
# OR: install development version from GitHub
R> library(devtools)
R> devtools::install_github("sjuhl/spfilteR")
Alongside a collection of functions, the package also provides an
artificial dataset and a stylized binary connectivity matrix based on
the rook scheme of adjacency that connects
To this end, consider a simple linear regression model with a single
regressor. Once the model is fitted, the function MI.resid() performs a
test of residual spatial autocorrelation based on the Moran coefficient
(Cliff and Ord 1981).
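The statistic behind this test can be sketched in base R. For OLS residuals e with design matrix X and M = I − X(XᵀX)⁻¹Xᵀ, the Moran coefficient is I = (n/S₀)·eᵀWe/eᵀe, where S₀ is the sum of all connectivity weights, and under the null hypothesis its expectation is E[I] = (n/S₀)·tr(MW)/(n − k) (Cliff and Ord 1981). The helpers below are hypothetical, the data are simulated, and the variance needed for the z-score is omitted for brevity, so this is only an illustration of what MI.resid() reports, not a replacement for it.

```r
# Hypothetical helpers (not spfilteR functions).
path_adj <- function(m) { A <- matrix(0, m, m); A[abs(row(A) - col(A)) == 1] <- 1; A }
rook_grid <- function(nr, nc) kronecker(diag(nc), path_adj(nr)) + kronecker(path_adj(nc), diag(nr))

# Moran coefficient of a vector x for connectivity matrix W.
moran_I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * drop(crossprod(z, W %*% z)) / sum(z^2)
}

# Expected Moran coefficient of OLS residuals under the null (design matrix X).
expected_I_resid <- function(X, W) {
  n <- nrow(X)
  k <- ncol(X)
  M <- diag(n) - X %*% solve(crossprod(X), t(X))
  (n / sum(W)) * sum(diag(M %*% W)) / (n - k)
}

set.seed(1)
W <- rook_grid(6, 6)
n <- nrow(W)
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)            # simulated data without spatial structure
e <- resid(lm(y ~ x))
c(observed = moran_I(e, W), expected = expected_I_resid(cbind(1, x), W))
```

For an intercept-only model the expectation reduces to the familiar −1/(n − 1); with additional regressors it depends on X, which is why MI.resid() takes the covariates as input.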
# load package and data
R> library(spfilteR)
R> data("fakedata")
R> y <- fakedataset$x1
R> X <- fakedataset$x2
R> resid <- resid(lm(y~X))
R> MI.resid(resid,x=X,W=W,alternative="greater")
I EI VarI zI pI
0.350568 -0.0119261 0.01207299 3.299085 0.0004850019 ***
The results suggest that the residuals are spatially autocorrelated,
which violates the Gauss-Markov assumption of uncorrelated errors.
As shown above, the ESF approach starts with the eigenfunction
decomposition of a transformed and symmetrized connectivity matrix as
depicted in Equation (2). The function getEVs() allows users to easily
obtain these eigenvectors. Moreover, users have the option to specify
covariates that are used to construct the projection matrix via the
argument covars.
R> EVs <- getEVs(W=W,covars=NULL)
R> E <- EVs$vectors
In addition to the eigenvectors and their corresponding eigenvalues,
getEVs() also reports the value of the MC associated with each of the
eigenvectors. To this end, getEVs() calls the helper function MI.ev(),
which calculates the MC for each supplied eigenvector (Tiefelsdorf and
Boots 1995; see also Griffith 1996).
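For a symmetric W, the MC of every eigenvector with a non-zero eigenvalue λᵢ of MWM (here with M = I − 11ᵀ/n) equals (n/1ᵀW1)·λᵢ, because such eigenvectors are centered and have unit length. The base-R sketch below checks this identity on a small rook grid and then applies the threshold rule used in the package example that follows, which retains eigenvectors whose MC is at least 25% of the largest MC. The helper names are hypothetical; MI.ev() and getEVs() are the package's tools for this task.

```r
# Hypothetical helpers (not spfilteR functions).
path_adj <- function(m) { A <- matrix(0, m, m); A[abs(row(A) - col(A)) == 1] <- 1; A }
rook_grid <- function(nr, nc) kronecker(diag(nc), path_adj(nr)) + kronecker(path_adj(nc), diag(nr))
moran_I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * drop(crossprod(z, W %*% z)) / sum(z^2)
}

W <- rook_grid(6, 6)
n <- nrow(W)
M <- diag(n) - matrix(1 / n, n, n)          # intercept-only projection matrix
ev <- eigen(M %*% W %*% M, symmetric = TRUE)
mc <- (n / sum(W)) * ev$values              # MC implied by each eigenvalue

# Direct MC computation for the eigenvectors with non-zero eigenvalues
nz <- which(abs(ev$values) > 1e-6)
mc_direct <- sapply(nz, function(i) moran_I(ev$vectors[, i], W))
max(abs(mc_direct - mc[nz]))                # numerically zero

# Candidate set: MC at least 25% of the largest MC
Ec <- mc / max(mc) >= 0.25
sum(Ec)                                     # number of candidate eigenvectors
```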
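Based on the MC values, users can define the candidate set, for example, by retaining all eigenvectors whose MC is at least 25% of the largest MC (see Griffith 2003):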
# identify candidate set
R> Ec <- EVs$moran/max(EVs$moran)>=.25
# obtain ESF residuals
R> esf.resid <- resid(lm(y~X+E[,Ec]))
# check for remaining spatial autocorrelation in model residuals
R> MI.resid(esf.resid,x=X,W=W,alternative="greater")
I EI VarI zI pI
-0.1836998 -0.0119261 0.01207299 -1.563326 0.941012
The results indicate that the ESF approach successfully removed positive
spatial autocorrelation from the regression residuals. Furthermore, the
functions partialR2() and vif.ev() included in the spfilteR package
allow users to investigate the proportion of variance explained by each
eigenvector and to identify potential problems of variance inflation. In
this example, eigenvector 13 accounts for the largest share, with a
partial R² of about 0.23, while all VIF values remain below 1.09:
R> round(partialR2(y=y,x=X,evecs=E[,Ec]),6)
0.000377 0.060584 0.001004 0.028734 0.020554 0.004804 0.000091 0.007010
0.030418 0.079015 0.004550 0.000012 0.232083 0.011407 0.000959 0.004993
0.001714 0.000094 0.036713 0.044113 0.006588 0.005762 0.001845 0.009648
0.002761 0.031923 0.007490 0.000075 0.004271 0.004042 0.004060
R> vif.ev(x=X,evecs=E[,Ec],na.rm=TRUE)
1.004420 1.001660 1.050409 1.049729 1.011899 1.001588 1.008393 1.000929
1.034209 1.013360 1.000230 1.000027 1.005781 1.022793 1.073397 1.015425
1.014602 1.014900 1.000798 1.002998 1.004616 1.019448 1.001397 1.015900
1.005540 1.000474 1.018344 1.008363 1.000284 1.009756 1.086114
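The textbook definitions behind these two diagnostics can be sketched in base R: the partial R² of an eigenvector is the share of the residual sum of squares of the model without it that the eigenvector removes, and its VIF is 1/(1 − R²ⱼ) from regressing the eigenvector on all other regressors. The helpers partial_r2() and vif_ev() are hypothetical, the data are simulated, and the exact computations inside partialR2() and vif.ev() may differ in detail. In this sketch, because the eigenvectors are mutually orthogonal and centered, any VIF above 1 can only stem from their correlation with the covariates, which is consistent with the values close to 1 reported above.

```r
# Hypothetical helpers (not spfilteR functions).
path_adj <- function(m) { A <- matrix(0, m, m); A[abs(row(A) - col(A)) == 1] <- 1; A }
rook_grid <- function(nr, nc) kronecker(diag(nc), path_adj(nr)) + kronecker(path_adj(nc), diag(nr))

# Partial R^2 of eigenvector e given the regressors in X.
partial_r2 <- function(y, X, e) {
  sse0 <- sum(resid(lm(y ~ X))^2)
  sse1 <- sum(resid(lm(y ~ X + e))^2)
  (sse0 - sse1) / sse0
}

# VIF of each column of E, regressed on X and the remaining columns of E.
vif_ev <- function(X, E) {
  sapply(seq_len(ncol(E)), function(j) {
    r2 <- summary(lm(E[, j] ~ X + E[, -j]))$r.squared
    1 / (1 - r2)
  })
}

set.seed(3)
W <- rook_grid(6, 6)
n <- nrow(W)
M <- diag(n) - matrix(1 / n, n, n)
E <- eigen(M %*% W %*% M, symmetric = TRUE)$vectors[, 1:4]  # four leading eigenvectors
x <- rnorm(n)
y <- 1 + x + 6 * E[, 2] + rnorm(n)          # only the second eigenvector matters

pr2 <- sapply(seq_len(ncol(E)), function(j) partial_r2(y, x, E[, j]))
vifs <- vif_ev(x, E)
round(rbind(partialR2 = pr2, VIF = vifs), 4)
```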
Besides the supervised eigenvector selection procedure, the function
lmFilter() performs unsupervised spatial filtering and provides
parameter estimates by means of OLS. Importantly, users can specify
different selection criteria. In this way, the function eases the
implementation of the ESF approach while simultaneously providing
considerable flexibility regarding the stepwise selection of
eigenvectors. Specifically, the following input arguments allow users to
customize the selection procedure:
objfn determines the objective function of the search algorithm: the maximization of model fit ('R2'), the minimization of residual spatial autocorrelation ('MI'), the significance of eigenvectors ('p'), or the significance level of residual spatial autocorrelation ('pMI'). Alternatively, all eigenvectors may be included by specifying objfn='all', implying that no selection takes place.
MX (optional) specifies the covariates used to construct the projection matrix.
sig and bonferroni indicate the significance level if the search algorithm selects eigenvectors based on their significance or on the significance of residual spatial autocorrelation. If bonferroni=TRUE and objfn='p', the significance level is adjusted to account for inflated Type-I errors. If objfn='pMI', bonferroni is automatically set to FALSE.
positive (TRUE or FALSE) restricts the eigenvector search to those eigenvectors associated with positive levels of spatial autocorrelation.
ideal.setsize (TRUE or FALSE) determines whether the ideal size of the candidate set is estimated.
alpha allows users to specify a threshold for the inclusion of eigenvectors in the candidate set based on their MC values (see Griffith 2003).
tol sets a tolerance threshold for remaining residual autocorrelation if objfn='MI'. Once the level of residual autocorrelation reaches the threshold, the selection procedure terminates.
boot.MI (optional) takes an integer indicating the number of bootstrap permutations used to estimate the variance of the Moran test for residual autocorrelation.
These arguments allow users to customize the ESF model and obtain
parameter estimates with a single function call and only a few lines of
code. While the print method for the output (an object of class
"spfilter") only reports the number of selected eigenvectors, the
summary method provides a host of useful additional information.
R> (esf <- lmFilter(y=y,x=X,W=W,objfn="p",sig=.1,bonferroni=TRUE
+ ,positive=TRUE,ideal.setsize=TRUE))
3 out of 22 candidate eigenvectors selected
R> summary(esf,EV=TRUE)
- Spatial Filtering with Eigenvectors (Linear Model) -
Coefficients (OLS):
Estimate SE p-value
(Intercept) 9.370881 0.71253832 4.103548e-23 ***
beta_1 0.975771 0.08536198 1.511830e-19 ***
Adjusted R-squared:
Initial Filtered
0.4673945 0.6534442
Filtered for positive spatial autocorrelation
3 out of 22 candidate eigenvectors selected
Objective Function: "p" (significance level=0.1)
Bonferroni correction: TRUE (adjusted significance level=0.00455)
Summary of selected eigenvectors:
Estimate SE p-value partialR2 VIF MI
ev_13 -9.552977 1.626696 6.290028e-08 0.23208263 1.005781 0.6302019 ***
ev_10 -5.571465 1.632824 9.483754e-04 0.07901543 1.013360 0.7303271 ***
ev_2 4.900028 1.623316 3.261057e-03 0.06058390 1.001660 1.0004147 **
Moran's I ( Residuals):
Observed Expected Variance z p-value
Initial 0.3505680 -0.01192610 0.01207299 3.299085 0.0004850019 ***
Filtered 0.1397003 -0.03703186 0.02417938 1.136562 0.1278607838
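The adjusted significance level in the summary above is consistent with a plain Bonferroni correction that divides sig by the number of candidate eigenvectors, that is, 0.1/22 ≈ 0.00455. A one-line check in R:

```r
sig <- 0.1            # significance level passed to lmFilter()
n_candidates <- 22    # candidate eigenvectors reported in the summary
adj_sig <- sig / n_candidates
round(adj_sig, 5)     # 0.00455
```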
Besides the parameter estimates of the filtered model, the summary
method provides information on the fit of the filtered and the
unfiltered models, the objective function, and the Moran test for
residual autocorrelation. If users specify EV=TRUE, information on the
included eigenvectors is displayed as well, in the order of their
selection. Just like above, we see that eigenvector 13, for example,
has a partial R² of about 0.23.
Figure 2: Plotting method for objects of class "spfilter" (left), spatial pattern captured by the filter and calculated by MI.sf() (center), and spatial pattern of the filtered residuals (right).
Finally, the left part of Figure 2 demonstrates the plotting method for
objects of class "spfilter", which is invoked by plot(esf). It
visualizes the MC of each eigenvector and highlights the ones selected
by the unsupervised selection procedure. The grey shaded area
illustrates the candidate set. The function MI.sf() computes the MC
value associated with the map pattern depicted by the spatial filter.
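The idea behind the unsupervised selection with objfn='MI' can be approximated by a greedy forward search: starting from the unfiltered model, add the candidate eigenvector that most reduces the absolute MC of the residuals, and stop once the remaining autocorrelation falls below a tolerance or no candidate improves it further. The base-R sketch below implements this simplified rule on simulated data and then computes the MC of the resulting spatial filter E*γ, the quantity MI.sf() is described as computing. The helper select_by_MI() is hypothetical, and lmFilter()'s actual algorithm may differ in details such as its stopping rule.

```r
# Hypothetical helpers (not spfilteR functions).
path_adj <- function(m) { A <- matrix(0, m, m); A[abs(row(A) - col(A)) == 1] <- 1; A }
rook_grid <- function(nr, nc) kronecker(diag(nc), path_adj(nr)) + kronecker(path_adj(nc), diag(nr))
moran_I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * drop(crossprod(z, W %*% z)) / sum(z^2)
}

# Greedy forward selection minimizing |MC| of the OLS residuals
# (a simplified stand-in for objfn = 'MI' with tolerance tol).
select_by_MI <- function(y, x, E, cand, W, tol = 0.05) {
  sel <- integer(0)
  cur <- abs(moran_I(resid(lm(y ~ x)), W))
  while (cur > tol && length(sel) < length(cand)) {
    rest <- setdiff(cand, sel)
    mi <- sapply(rest, function(j) abs(moran_I(resid(lm(y ~ x + E[, c(sel, j)])), W)))
    if (min(mi) >= cur) break          # no candidate lowers |MC| any further
    sel <- c(sel, rest[which.min(mi)])
    cur <- min(mi)
  }
  list(selected = sel, abs_MI = cur)
}

set.seed(42)
W <- rook_grid(8, 8)
n <- nrow(W)
M <- diag(n) - matrix(1 / n, n, n)
ev <- eigen(M %*% W %*% M, symmetric = TRUE)
E <- ev$vectors
mc <- (n / sum(W)) * ev$values
cand <- which(mc / max(mc) >= 0.25)            # candidate set
x <- rnorm(n)
y <- 2 + x + 3 * E[, 1] - 2 * E[, 3] + rnorm(n, sd = 0.5)  # spatially structured errors

res <- select_by_MI(y, x, E, cand, W)
res

# MC of the fitted spatial filter E* gamma
sel <- res$selected
mi_sf <- NA
if (length(sel) > 0) {
  fit <- lm(y ~ x + E[, sel, drop = FALSE])
  sf <- drop(E[, sel, drop = FALSE] %*% coef(fit)[-(1:2)])
  mi_sf <- moran_I(sf, W)
}
mi_sf
```

Because the eigenvectors are orthonormal, the MC of any filter E*γ is a weighted average of the MCs of its eigenvectors with weights proportional to γⱼ², so the filter's MC always lies between the smallest and the largest MC among the selected eigenvectors.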
The ESF methodology directly extends to GLMs. In fact, one of the advantages of the filtering approach as compared to parametric spatial regression models in this context is that parameter estimates can be obtained by standard MLE and do not require the application of more sophisticated estimation techniques (Griffith et al. 2019).
Besides the supervised filtering procedure, the function glmFilter()
from the spfilteR package allows users to perform unsupervised spatial
filtering in GLMs. While its usage is purposefully similar to the
function lmFilter() introduced above, GLMs require some adjustments of
the filtering procedure. As a result, glmFilter() not only uses MLE
instead of OLS to obtain parameter estimates but also differs in some
of its inputs. In addition to the inputs already discussed above,
glmFilter() differs from lmFilter() with respect to the following input
arguments:
objfn defines the eigenvector selection criterion. Possible criteria are the maximization of model fit ('AIC' or 'BIC'), the minimization of residual autocorrelation ('MI'), the significance level of candidate eigenvectors ('p'), the significance of residual spatial autocorrelation ('pMI'), or all eigenvectors in the candidate set ('all').
model specifies the type of model to be estimated. The current version of spfilteR (version 1.0.0) supports 'probit', 'logit', and 'poisson' as input.
optim.method determines the method used to optimize the likelihood function.
min.reduction takes values in the interval
resid.type allows users to specify the type of residuals used to calculate the MC value. Valid arguments are 'raw', 'deviance', and the default option 'pearson'.
Implementing the ESF approach in GLMs using glmFilter() requires as few
lines of code as using lmFilter() for linear regression models. The
following example demonstrates the ease of implementation for a logit, a
probit, and a Poisson regression model:
# define DVs
R> y.bin <- fakedataset$indicator
R> y.count <- fakedataset$count
# seed (because of 'boot.MI')
R> set.seed(123)
# logit model
R> (esf.logit <- glmFilter(y=y.bin,x=NULL,W=W,objfn="p",model="logit",optim.method="BFGS"
+ ,sig=.05,bonferroni=FALSE,resid.type="pearson",boot.MI=100))
3 out of 31 candidate eigenvectors selected
# probit model
R> (esf.probit <- glmFilter(y=y.bin,x=NULL,W=W,objfn="BIC",model="probit"
+ ,optim.method="BFGS",min.reduction=0,resid.type="deviance"
+ ,boot.MI=100))
2 out of 31 candidate eigenvectors selected
# poisson model
R> (esf.poisson <- glmFilter(y=y.count,x=NULL,W=W,objfn="pMI",model="poisson"
+ ,optim.method="BFGS",sig=.1,resid.type="pearson"
+ ,boot.MI=100))
0 out of 31 candidate eigenvectors selected
Of course, users can also define their own eigenvector selection
criteria or apply the ESF approach to models currently not supported by
the glmFilter() function. Just like for the linear regression models
illustrated above, the function getEVs() performs the eigenfunction
decomposition of the transformed and symmetrized connectivity matrix,
and users can implement a supervised selection procedure using the
standard glm() function.
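As a minimal sketch of such a user-defined procedure, the code below fits a logit model with glm(), selects candidate eigenvectors by greedy forward AIC reduction, and compares the MC of the Pearson residuals before and after filtering. It only mimics the idea behind objfn='AIC' in glmFilter(); the data are simulated, the helper names are hypothetical, and glmFilter() itself relies on its own likelihood optimization (see optim.method) rather than glm().

```r
# Hypothetical helpers (not spfilteR functions).
path_adj <- function(m) { A <- matrix(0, m, m); A[abs(row(A) - col(A)) == 1] <- 1; A }
rook_grid <- function(nr, nc) kronecker(diag(nc), path_adj(nr)) + kronecker(path_adj(nc), diag(nr))
moran_I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * drop(crossprod(z, W %*% z)) / sum(z^2)
}

set.seed(7)
W <- rook_grid(8, 8)
n <- nrow(W)
M <- diag(n) - matrix(1 / n, n, n)
ev <- eigen(M %*% W %*% M, symmetric = TRUE)
E <- ev$vectors
mc <- (n / sum(W)) * ev$values
cand <- which(mc / max(mc) >= 0.25)                            # candidate set
y <- rbinom(n, 1, plogis(-0.3 + 6 * E[, 1] + 5 * E[, 2]))      # spatially patterned binary outcome

# Logit model with an intercept and the eigenvectors indexed by 'sel'.
fit_logit <- function(sel) {
  if (length(sel) == 0) return(glm(y ~ 1, family = binomial(link = "logit")))
  glm(y ~ E[, sel], family = binomial(link = "logit"))
}

# Greedy forward selection: add the candidate with the lowest AIC while AIC improves.
sel <- integer(0)
best <- AIC(fit_logit(sel))
repeat {
  rest <- setdiff(cand, sel)
  if (length(rest) == 0) break
  aic <- sapply(rest, function(j) AIC(fit_logit(c(sel, j))))
  if (min(aic) >= best) break
  sel <- c(sel, rest[which.min(aic)])
  best <- min(aic)
}

fit0 <- fit_logit(integer(0))
fit1 <- fit_logit(sel)
c(AIC_initial = AIC(fit0), AIC_filtered = AIC(fit1),
  MI_initial = moran_I(residuals(fit0, type = "pearson"), W),
  MI_filtered = moran_I(residuals(fit1, type = "pearson"), W))
```

Swapping binomial(link = "logit") for binomial(link = "probit") or poisson() adapts the same loop to the other model types discussed above.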
Alternative implementations of the ESF approach outlined here also
exist in other R packages. While these packages are highly useful for
spatial analysts, the spfilteR package offers several notable
extensions to these existing implementations.
The spmoran package developed by Murakami (2020) contains different
functions for estimating eigenvector-based spatial additive mixed
models. Although the function esf() estimates a linear spatial
filtering model, the main advantages of this package are the estimation
of the random effects ESF model (e.g., Murakami and Griffith 2019) and
the fast approximation of the eigenfunction decomposition, which makes
this package especially useful for large datasets. Moreover, users can
also call the functions meigen() and meigen_f() to obtain eigenfunctions
and perform supervised eigenvector selection.
At the same time, the eigenvector selection criteria implemented in
esf() only allow for the identification of relevant eigenvectors based
on model fit statistics such as the adjusted R².
Alternatively, the spatialreg package, which encompasses a great
variety of spatial estimation techniques, not only provides the
SpatialFiltering() function for estimating spatially filtered linear
models but also allows for the estimation of spatially filtered GLMs via
ME(). Yet, both of these functions utilize an objective function that
selects eigenvectors based on the overall reduction of residual
autocorrelation. While it is possible to restrict the candidate set size
and to customize the level of remaining autocorrelation at which the
search terminates, users cannot select alternative objective functions.
Moreover, ME() does not allow for the inclusion of covariates in the
construction of the projection matrix.
Therefore, the spfilteR package provides additional flexibility,
especially for the estimation of filtered linear and generalized linear
models, where the ESF approach is predominantly applied. Since the
eigenvector selection procedure is the crucial step in the ESF approach,
the options provided by lmFilter() and glmFilter() allow users to
tailor the ESF procedure to their specific needs. The option to estimate
the ideal size of the eigenvector candidate set
Despite this additional flexibility, the functions that perform
unsupervised eigenvector selection are very easy to use and only require
a minimum of code. Moreover, the getEVs() command and several additional
helper functions such as MI.ev(), MI.sf(), partialR2(), and vif.ev()
introduced above facilitate the estimation of spatially filtered
(generalized) linear models. Consequently, while the spmoran and the
spatialreg packages cover additional model types and estimation
strategies, the flexibility provided by the spfilteR package constitutes
a great advantage in the most common applications of the ESF approach.
This article briefly covers the basics of spatial filtering with eigenvectors and introduces the spfilteR package. Using the synthetic dataset provided by the package, it discusses the main functions and their implementation in the context of supervised and unsupervised spatial filtering as well as its extension to GLMs. By comparing the package to alternative implementations of the ESF approach, this article highlights that the flexibility provided by the spfilteR package constitutes an important improvement in settings where the ESF approach is commonly applied.
Funding was provided by the German Research Foundation (DFG) through the Collaborative Research Center (SFB) 884 (grant number: 139943784). This work was also supported by a postdoc fellowship of the German Academic Exchange Service (DAAD).
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Juhl, "spfilteR: An R package for Semiparametric Spatial Filtering with Eigenvectors in (Generalized) Linear Models", The R Journal, 2021
BibTeX citation
@article{RJ-2021-085, author = {Juhl, Sebastian}, title = {spfilteR: An R package for Semiparametric Spatial Filtering with Eigenvectors in (Generalized) Linear Models}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {450-459} }