Eigenvector-based Spatial filtering constitutes a highly flexible semiparametric approach to account for spatial autocorrelation in a regression framework. It combines judiciously selected eigenvectors from a transformed connectivity matrix to construct a synthetic spatial filter and remove spatial patterns from model residuals. This article introduces the spfilteR package that provides several useful and flexible tools to estimate spatially filtered linear and generalized linear models in R. While the package features functions to identify relevant eigenvectors based on different selection criteria in an unsupervised fashion, it also helps users to perform supervised spatial filtering and to select eigenvectors based on alternative user-defined criteria. Besides a brief discussion of the eigenvector-based spatial filtering approach, this article presents the main functions of the package and illustrates their usage. Comparison to alternative implementations in other R packages highlights the added value of the spfilteR package.
The presence of spatial autocorrelation in regression residuals constitutes a severe problem in standard inferential statistics as it causes common econometric methods to produce inefficient or even biased and inconsistent parameter estimates (Franzese and Hays 2007; Goodchild 2009; Darmofal 2015). Besides parametric spatial regression techniques, which became the dominant approach to this challenge in the social sciences, spatial filtering techniques offer an alternative approach to handle spatially clustered data. The particular appeal of these alternative semiparametric approaches to spatial autocorrelation arise from their flexibility and the relative ease of estimation and interpretation (Getis and Griffith 2002; e.g., Tiefelsdorf and Griffith 2007). Especially the eigenvector-based spatial filtering (ESF) approach pioneered by (Griffith 1996, 2000, 2003; Griffith) has proven to be useful in various academic disciplines.
This article introduces the spfilteR package that provides a set of flexible and useful functions to implement the ESF approach in regression models. Besides tools to detect spatial autocorrelation in individual variables and regression residuals by means of the Moran coefficient (MC) (Cliff and Ord 1972, 1981), the package features easily customizable functions which allow users to perform supervised and unsupervised spatial filtering with eigenvectors. While other R packages like spatialreg (Bivand and Piras 2015) and spmoran (Murakami 2020) also contain implementations of the unsupervised ESF approach, they are less flexible in the specification of eigenvector selection criteria which constitutes the crucial step in the ESF approach. These packages also offer few functions for the supervised selection of eigenvectors.
In contrast, the spfilteR package allows users to obtain eigenvectors
from a transformed connectivity matrix and to identify a suitable
candidate set in order to perform supervised spatial filtering.
Alternatively, unsupervised eigenvector selection procedures for
different (generalized) linear models based on a stepwise regression
procedure are implemented as well. These functions select eigenvectors
based on either i) the maximization of model fit, ii) minimization of
residual autocorrelation, iii) the statistical significance of residual
autocorrelation, or iv) the statistical significance of the candidate
eigenvectors. Parameter estimates are obtained by means of ordinary
least squares (OLS) for linear models and maximum likelihood estimation
(MLE) for generalized linear models (GLMs). The print
, summary
, and
plot
methods further facilitate the interpretation and visualization
of the results.
After a theoretical description of the ESF approach in a regression framework, this article presents some stylized R code to demonstrate the implementation of the ESF approach using the functions and the synthetic dataset accompanying the spfilteR package. It also briefly compares the unsupervised ESF procedures contained in this package to alternative implementations in other R packages. The last section summarizes and concludes this article.
Intuitively, the ESF approach put forth by (Griffith 1996, 2000, 2003; Griffith) and also Tiefelsdorf and Griffith (2007) addresses the problem of spatially autocorrelated regression residuals by partitioning the error term into a spatially structured and a random component (see also Griffith and Chun 2014). Consider a stylized linear regression model of the following form:
\[\label{eq:ols} \mathbf{y} = \mathbf{X\beta} + \mathbf{e}, \tag{1}\] where \(\mathbf{X}\) denotes the matrix of covariates (plus a vector of ones for the intercept term) and \(\mathbf{\beta}\) is the corresponding parameter vector. If the errors \(\mathbf{e}\) are not independent but exhibit a non-random spatial pattern, the ESF approach removes this pattern from the disturbances and thereby “whitens” the residuals.
To this end, synthetic proxy variables are generated that reflect the spatial pattern present in model residuals as closely as possible. Subsequently including these synthetic variables as control variables in the regression’s mean equation removes the problematic spatial structure from the disturbances and allows the use of standard procedures — such as OLS or MLE — for parameter estimation. Generating these proxy variables that act as the spatial filter requires the decomposition of the transformed and exogenously defined connectivity matrix which represents the dependence structure among the units of analysis.
The eigenfunction (or spectral) decomposition of a transformed connectivity matrix constitutes the core element of the ESF approach. More formally, the decomposition yields
\[\label{eq:MVM} \mathbf{MVM}=\mathbf{E\Lambda E}', \tag{2}\] where \(\mathbf{M}\) is a symmetric and idempotent projection matrix, and \(\mathbf{V}\) is the exogenously specified connectivity matrix which is symmetrized by \(\frac{1}{2}(\mathbf{W}+\mathbf{W}')\). The columns of matrix \(\mathbf{E}\) are the \(n\) mutually uncorrelated eigenvectors obtained from \(\mathbf{MVM}\), and \(\mathbf{\Lambda}\) is a diagonal matrix with the corresponding eigenvalues \(\mathbf{\lambda}=\{\lambda_1,\lambda_2,\dots,\lambda_n\}\) on its main diagonal. Tiefelsdorf and Boots (1995) show that each eigenvector in \(\mathbf{E}\) represents a distinct map pattern permitted by the units’ spatial arrangement and is associated with a certain level of spatial autocorrelation.1
The projection matrix is given by \(\mathbf{M}=\mathbf{I}-\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\), where \(\mathbf{I}\) is the identity matrix, and the eigenvectors in \(\mathbf{E}\) are mutually uncorrelated and orthogonal to the covariates in the design matrix \(\mathbf{X}\).2 If only the intercept is included in the construction of the projection matrix, this equation simplifies to \(\mathbf{M}=(\mathbf{I}-\mathbf{11}'/n)\), where \(\mathbf{1}\) is an \(n\times 1\)-dimensional vector of ones. As Tiefelsdorf and Griffith (2007) show, the underlying spatial process generating the data determines both the form of the spatial misspecification in a naïve nonspatial regression and the appropriate specification of \(\mathbf{M}\).
However, since the number of eigenvectors equals the number of observations in the data, only a subset of eigenvectors can be included in the regression equation.
Identifying and selecting relevant eigenvectors is decisive in the ESF approach and involves two steps. In a first step, a set of candidate eigenvectors, the search set \(\mathbf{E}^C \subset \mathbf{E}\), needs to be determined based on different criteria. If the model residuals exhibit positive levels of spatial autocorrelation, eigenvectors depicting negative autocorrelation can be discarded since simultaneously including eigenvectors associated with positive and negative spatial autocorrelation can cause problems (Tiefelsdorf and Griffith 2007). Moreover, eigenvectors portraying negligible levels of spatial autocorrelation can be eliminated as well since they contribute little to the spatial pattern present in model residuals (Chun and Griffith 2014).
Griffith (2003), for example, proposes a qualitative threshold determining the candidate set by computing \(MC_i/MC_{max}\) for all eigenvectors \(i \in \{1,2,\dots,n\}\), where \(MC_{max}\) denotes either the largest positive or the largest negative Moran coefficient of all eigenvectors in \(\mathbf{E}\). According to this approach, eigenvectors for which \(MC_i/MC_{max} \geq 0.25\) should be included in the candidate set \(\mathbf{E}^C\). Alternatively, 1 propose a nonlinear function to calculate the ideal size of the candidate set for a given level of spatial autocorrelation and the total number of positive eigenvectors. However, this approach is only applicable if the residuals exhibit positive levels of spatial autocorrelation.
Once a feasible candidate set is identified, the importance of each eigenvector in \(\mathbf{E}^C\) needs to be established in a second step. This is typically done by a stepwise regression procedure which sequentially evaluates each eigenvector in the candidate set. To this end, the search algorithm utilizes an objective function in order to determine which eigenvectors to select. The selected eigenvectors \(\mathbf{E}^* \in \mathbf{E}^C\) are the synthetic covariates constituting the spatial filter. Selection criteria commonly employed in the literature include the maximization of model fit statistics (Griffith 2003; e.g., Tiefelsdorf and Griffith 2007), the statistical significance of the eigenvectors (Le Gallo and Páez 2013; e.g., Griffith and Chun 2014), the minimization of residual autocorrelation (e.g., Tiefelsdorf and Griffith 2007), or arbitrary combinations of different selection criteria (e.g., Páez 2019). The aim is to specify an objective function that provides a parsimonious subset of eigenvectors. Parsimony here means that \(\mathbf{E}^*\) minimizes residual autocorrelation with respect to the pre-specified connectivity structure of the filtered model by selecting the smallest number of eigenvectors possible to obtain spatially independent errors (Tiefelsdorf and Griffith 2007).
Once \(\mathbf{E}^*\) is established, it can be added to the regression model in Equation ((1)):
\[\label{eq:filter} \mathbf{y} = \mathbf{X\beta} + \overbrace{\underbrace{\mathbf{E}^*\mathbf{\gamma}}_{filter} + \underbrace{\vphantom{\gamma}\mathbf{\epsilon}}_{noise}}^{\mathbf{e}}. \tag{3}\]
Equation ((3)) depicts the spatially filtered regression model and illustrates how the ESF approach partitions the regression residuals \(\mathbf{e}\) from Equation ((1)) into a spatial trend component (\(\mathbf{E}^*\mathbf{\gamma}\)) and a random component (\(\mathbf{\epsilon}\)). The selected eigenvectors \(\mathbf{E}^*\), in conjunction with their parameter estimates \(\mathbf{\gamma}\), represent the spatial pattern latent in \(\mathbf{e}\). This term constitutes the synthetic spatial filter that shifts the spatial pattern from the error term to the regression’s systematic part. Thereby, it removes the spatial structure from the error term, leaving white noise residuals \(\mathbf{\epsilon}\).
This stylized filtering scheme directly extends to GLMs, although the link function might corrupt the uncorrelatedness of the eigenvectors. If a substantial amount of multicollinearity among the eigenvectors is present, each eigenvector included in the subset of \(\mathbf{E}^*\) should be reevaluated whenever a new eigenvector is selected (e.g., Griffith et al. 2019).
The stable release version of the spfilteR package can be obtained from CRAN.3 Alternatively, the latest development version is available on GitHub:
# install package from CRAN
> install.packages("spfilteR")
R
# OR: install development version from GitHub
> library(devtools)
R> devtools::install_github("sjuhl/spfilteR") R
Alongside a collection of functions, the package also provides an artificial dataset and a stylized binary connectivity matrix based on the rook scheme of adjacency that connects \(n=100\) units on a regular \(10 \times 10\) grid. I use this made-up dataset to illustrate key features of the package and its functionality.
To this end, consider a simple linear regression model with a single
regressor. Once the model is fitted, the function MI.resid()
performs
a test of residual spatial autocorrelation based on the Moran
coefficient (Cliff and Ord 1981).
# load package and data
> library(spfilteR)
R> data("fakedata")
R
> y <- fakedataset$x1
R> X <- fakedataset$x2
R
> resid <- resid(lm(y~X))
R> MI.resid(resid,x=X,W=W,alternative="greater")
R
I EI VarI zI pI 0.350568 -0.0119261 0.01207299 3.299085 0.0004850019 ***
The results suggest that the residuals are spatially autocorrelated, which violates the Gauss-Markov assumption of uncorrelated errors since \(cov(\epsilon_i,\epsilon_j)\neq0 \forall i\neq j\). To address this problem, the spfilteR package allows users to implement the ESF approach and to select relevant eigenvectors using different supervised or unsupervised selection procedures.
As shown above, the ESF approach starts with the eigenfunction
decomposition of a transformed and symmetrized connectivity matrix as
depicted in Equation ((2)). The function getEVs()
allows
users to easily obtain these eigenvectors. Moreover, users have the
option to specify covariates that are used in order to construct the
projection matrix \(\mathbf{M}\) via the input covars
.
> EVs <- getEVs(W=W,covars=NULL)
R> E <- EVs$vectors R
In addition to the eigenvectors and their corresponding eigenvalues,
getEVs()
also reports the value of the MC associated with each of the
eigenvectors.4 The first eigenvector depicts the spatial pattern
permitted by \(\mathbf{W}\) with the largest possible degree of positive
spatial autocorrelation. The second eigenvector displays the pattern
associated with the second largest possible degree of positive
autocorrelation that is uncorrelated with the first pattern, and so on
(Griffith 1996). Consequently, while the first eigenvectors represent
global patterns of positive spatial autocorrelation, the pattern becomes
more local as the degree of spatial autocorrelation approaches zero. The
last eigenvectors in the set capture patterns of negative
autocorrelation (see Figure 1).
Based on the MC values, users can define the candidate set \(\mathbf{E}^C\) and select relevant eigenvectors based on any desired selection criterium. Using the threshold suggested by Griffith (2003) outlined above, the set \(\mathbf{E}^C\) consists of \(31\) eigenvectors. For illustrative purposes, I skip the second step of the eigenvector selection procedure and include all eigenvectors in the ESF model so that \(\mathbf{E}^C=\mathbf{E}^*\).
# identify candidate set
> Ec <- EVs$moran/max(EVs$moran)>=.25
R
# obtain ESF residuals
> esf.resid <- resid(lm(y~X+E[,Ec]))
R
# check for remaining spatial autocorrelation in model residuals
> MI.resid(esf.resid,x=X,W=W,alternative="greater")
R
I EI VarI zI pI -0.1836998 -0.0119261 0.01207299 -1.563326 0.941012
The results indicate that the ESF approach successfully removed positive
spatial autocorrelation from regression residuals. Furthermore, the
functions partialR2()
and vif.ev()
included in the spfilteR
package allow users to investigate the proportion of explained variance
by each eigenvector and identify potential problems of variance
inflation. In this example, eigenvector 13 accounts for about \(23.21\%\)
of the variance in \(\mathbf{y}\). Moreover, none of the eigenvectors
induces problematic levels of multicollinearity as the variance
inflation factor (VIF) of each eigenvectors remains close to \(1\).
> round(partialR2(y=y,x=X,evecs=E[,Ec]),6)
R
0.000377 0.060584 0.001004 0.028734 0.020554 0.004804 0.000091 0.007010
0.030418 0.079015 0.004550 0.000012 0.232083 0.011407 0.000959 0.004993
0.001714 0.000094 0.036713 0.044113 0.006588 0.005762 0.001845 0.009648
0.002761 0.031923 0.007490 0.000075 0.004271 0.004042 0.004060
> vif.ev(x=X,evecs=E[,Ec],na.rm=TRUE)
R
1.004420 1.001660 1.050409 1.049729 1.011899 1.001588 1.008393 1.000929
1.034209 1.013360 1.000230 1.000027 1.005781 1.022793 1.073397 1.015425
1.014602 1.014900 1.000798 1.002998 1.004616 1.019448 1.001397 1.015900
1.005540 1.000474 1.018344 1.008363 1.000284 1.009756 1.086114
Besides the supervised eigenvector selection procedure, the function
lmFilter()
performs unsupervised spatial filtering and provides
parameter estimates by means of OLS. Importantly, users can specify
different selection criteria. Thereby, this function eases the
implementation of the ESF approach while simultaneously providing
considerable flexibility regarding the stepwise selection of
eigenvectors. Specifically, the following input arguments allow users to
customize the selection procedure and ensure the function’s flexibility:
objfn
allows users to determine the objective function of the
search algorithm determining \(\mathbf{E}^*\). It supports eigenvector
selection based on the adjusted \(R^2\) (’R2’
), residual spatial
autocorrelation (’MI’
), the significance of eigenvectors (’p’
),
and the significance level of residual spatial autocorrelation
(’pMI’
). Alternatively, all eigenvectors may be included by
spefifying objfn=’all’
, implying that no selection takes place.MX
(optional) specifies the covariates used to construct the
projection matrix \(\mathbf{M}\). As Tiefelsdorf and Griffith (2007) show, the
specification of \(\mathbf{M}\) is directly linked to the form of the
spatial misspecification in the unfiltered naïve regression model.sig
and bonferroni
indicate the significance level if the search
algorithm selects eigenvectors based on their significance or the
significance of residual spatial autocorrelation. If
bonferroni=TRUE
and objfn=’p’
, the significance level will be
adjusted in order to account for inflated Type-I errors. If
objfn=’pMI’
, bonferroni
is automatically set to FALSE
.positive
(TRUE
or FALSE
) restricts the eigenvector search to
those eigenvectors associated with positive levels of spatial
autocorrelation.ideal.setsize
(TRUE
or FALSE
) determines the ideal size of the
candidate set \(\mathbf{E}^C\) according to the formula given in
alpha
allows users to specify a threshold for the inclusion of
eigenvectors in the candidate set based on their MC values (see Griffith 2003).tol
sets a tolerance threshold for remaining residual
autocorrelation if objfn=’MI’
. Once the level of residual
autocorrelation reaches the threshold, the selection procedure
terminates.boot.MI
(optional) takes integers indicating the number of
bootstrap permutations in order to estimate the variance of the
Moran test for residual autocorrelation.These arguments allow users to customize the ESF model and obtain
parameter estimates by using a single function call and only a few lines
of code. While the print
method for the output — an object of class
"spfilter"
— only reports the number of selected eigenvectors in
\(\mathbf{E}^*\) and the size of the candidate set \(\mathbf{E}^C\), the
summary
method provides a host of useful additional information.
> (esf <- lmFilter(y=y,x=X,W=W,objfn="p",sig=.1,bonferroni=TRUE
R+ ,positive=TRUE,ideal.setsize=TRUE))
3 out of 22 candidate eigenvectors selected
> summary(esf,EV=TRUE)
R
- Spatial Filtering with Eigenvectors (Linear Model) -
Coefficients (OLS):
-value
Estimate SE p9.370881 0.71253832 4.103548e-23 ***
(Intercept) 0.975771 0.08536198 1.511830e-19 ***
beta_1
-squared:
Adjusted R
Initial Filtered 0.4673945 0.6534442
for positive spatial autocorrelation
Filtered 3 out of 22 candidate eigenvectors selected
: "p" (significance level=0.1)
Objective Function: TRUE (adjusted significance level=0.00455)
Bonferroni correction
:
Summary of selected eigenvectors-value partialR2 VIF MI
Estimate SE p-9.552977 1.626696 6.290028e-08 0.23208263 1.005781 0.6302019 ***
ev_13 -5.571465 1.632824 9.483754e-04 0.07901543 1.013360 0.7303271 ***
ev_10 4.900028 1.623316 3.261057e-03 0.06058390 1.001660 1.0004147 **
ev_2
's I ( Residuals):
Moran Observed Expected Variance z p-value
Initial 0.3505680 -0.01192610 0.01207299 3.299085 0.0004850019 ***
Filtered 0.1397003 -0.03703186 0.02417938 1.136562 0.1278607838
Besides the parameter estimates of the filtered model, the summary
method provides information on the fit of the filtered and the
unfiltered models, the objective function, and the Moran test for
residual autocorrelation. If users specify EV=TRUE
, information on the
included eigenvectors in the order of their selection will be displayed
as well. Just like above, we see that the eigenvector 13, for example,
explains \(23.21\%\) of the variance, and the VIF indicates no problems of
multicollinearity in the filtered model. The adjusted \(R^2\) also shows
that the ESF approach considerably improves model fit.
“spfilter”
(left), spatial pattern captured by the filter
and calculated by MI.sf()
(center), and spatial patterns of
filtered residuals (right).
Finally, the left part of Figure 2 demonstrates the
plotting method for objects of class "spfilter"
which is produced by
plot(esf)
. It visualizes the MC of each eigenvector and highlights the
ones selected by the unsupervised selection procedure. The grey shaded
area illustrates the candidate set \(\mathbf{E}^C\) from which the
eigenvectors in \(\mathbf{E}^*\) are selected. Figure 2
further depicts the spatial pattern of the spatial filter (center) and
the filtered residuals (right). The function MI.sf()
computes the MC
value associated with the map pattern depicted by the spatial filter
\(\mathbf{E}^*\mathbf{\gamma}\) in Equation ((3)) (e.g., Le Gallo and Páez 2013).
The ESF methodology directly extends to GLMs. In fact, one of the advantages of the filtering approach as compared to parametric spatial regression models in this context is that parameter estimates can be obtained by standard MLE and do not require the application of more sophisticated estimation techniques (Griffith et al. 2019).
Besides the supervised filtering procedure, the function glmFilter()
from the spfilteR package allows users to perform unsupervised spatial
filtering in GLMs. While its usage is purposefully similar to the
function lmFilter()
introduced above, GLMs require some adjustments of
the filtering procedure. As a result, glmFilter()
not only uses MLE
instead of OLS to obtain parameter estimates but also differs in some of
the function’s input. Hence, in addition the input already discussed
above, glmFilter()
differs to lmFilter()
with respect to the
following input arguments:
objfn
defines the eigenvector selection criterium. Possible
criteria are the maximization of model fit (’AIC’
or ’BIC’
),
minimization of residual autocorrelation (’MI’
), the significance
level of candidate eigenvectors (’p’
), the significance of
residual spatial autocorrelation (’pMI’
), or all eigenvectors in
the candidate set (’all’
).model
specifies the type of model to be estimated. The current
version of spfilteR (version 1.0.0) supports ’probit’
,
’logit’
, and ’poisson’
as input.optim.method
determines the method used to optimize the likelihood
function.min.reduction
takes values in the interval \([0,1)\). It defines the
minimum level of reduction in the AIC or BIC (if either selection
criterium is chosen) relative to the current AIC/ BIC a candidate
eigenvector needs to achieve in order to be included in the spatial
filter.resid.type
allows users to specify the type of residuals which is
used to calculate the MC value. Valid arguments are ’raw’
,
’deviance’
, and the default option ’pearson’
.Implementing the ESF approach in GLMs using glmFilter()
requires as
few lines of code as using the lmFilter()
function in the context of
linear regression models. The following example demonstrates the ease of
implementation in the context of a logit, a probit, and a Poisson
regression model:
# define DVs
> y.bin <- fakedataset$indicator
R> y.count <- fakedataset$count
R
# seed (because of 'boot.MI')
set.seed(123)
# logit model
> (esf.logit <- glmFilter(y=y.bin,x=NULL,W=W,objfn="p",model="logit",optim.method="BFGS"
R+ ,sig=.05,bonferroni=FALSE,resid.type="pearson",boot.MI=100))
3 out of 31 candidate eigenvectors selected
# probit model
> (esf.probit <- glmFilter(y=y.bin,x=NULL,W=W,objfn="BIC",model="probit"
R+ ,optim.method="BFGS",min.reduction=0,resid.type="deviance"
+ ,boot.MI=100))
2 out of 31 candidate eigenvectors selected
# poisson model
> (esf.poisson <- glmFilter(y=y.count,x=NULL,W=W,objfn="pMI",model="poisson"
R+ ,optim.method="BFGS",sig=.1,resid.type="pearson"
+ ,boot.MI=100))
0 out of 31 candidate eigenvectors selected
Of course, users can also define their own eigenvector selection
criteria or apply the ESF approach to models currently not supported by
the glmFilter()
function. Just like for linear regression models
illustrated above, the function getEVs()
performs the eigenfunction
decomposition of the transformed and symmetrized connectivity matrix,
and users can implement a supervised selection procedure using the
standard glm()
function.
Of course, alternative implementations of the ESF approach outlined here exist in other R packages as well. While these packages are highly useful for spatial analysts, the spfilteR package offers a couple of notable extensions that improve these existing implementations.5
The spmoran package developed by Murakami (2020) contains different
functions for estimating eigenvector-based spatial additive mixed
models. Although the function esf()
estimates a linear spatial
filtering model, the main advantages of this package are the estimation
of the random effects ESF model (e.g., Murakami and Griffith 2019) and the fast
approximation of the eigenfunction decomposition, which makes this
package especially useful for large datasets. Moreover, users can also
use the functions meigen()
and meigen_f()
to obtain eigenfunctions
and perform supervised eigenvector selection.
At the same time, the eigenvector selection criteria implemented in
esf()
only allow for the identification of relevant eigenvectors based
on model fit statistics such as the adjusted \(R^2\), the AIC, or the BIC.
The specification of the projection matrix \(\mathbf{M}\) also does not
allow for the inclusion of covariates. Furthermore, the spmoran
package does not support the ESF approach in the context of GLMs.
Alternatively, the spatialreg package, which encompasses a great
variety of different spatial estimation techniques, not only provides
the SpatialFiltering()
function estimating spatially filtered linear
models. It also allows for the estimation of spatially filtered GLMs by
using ME()
. Yet, both of these functions utilize an objective function
that selects eigenvectors based on the overall reduction of residual
autocorrelation. While it is possible to restrict the candidate set size
and to customize the level of remaining autocorrelation at which the
search terminates, users cannot select alternative objective functions.
Moreover, ME()
does not allow for the inclusion of covariates in the
construction of \(\mathbf{M}\). Since there is no function to perform the
eigenfunction decomposition shown in Equation ((2)), the
package offers no support for supervised spatial filtering.
Therefore, the spfilteR package provides additional flexibility –
especially for the estimation of filtered linear and generalized linear
models where the ESF approach is predominantly applied. Since the
eigenvector selection procedure is the crucial step in the ESF approach,
the options provided by lmFilter()
and glmFilter()
allow users to
tailor the ESF procedure to their specific needs. The option to estimate
the ideal size of the eigenvector candidate set \(\mathbf{E}^C\) according
to 1, the specification of different residual types in GLMs, and
the ability to define a threshold for the minimum increase in model fit
when an objective function is chosen accordingly are examples of
features unique to the spfilteR package.
Despite this additional flexibility, the functions that perform
unsupervised eigenvector selection are very easy to use and only require
a minimum of code. Moreover, the getEVs()
command and several
additional helper functions such as MI.ev()
, MI.sf()
, partialR2()
,
and vif.ev()
introduced above facilitate the estimation of spatially
filtered (generalized) linear models. Consequently, while the spmoran
and the spatialreg packages cover additional model types and
estimation strategies, the flexibility provided by the spfilteR
package constitutes a great advantage in the most common applications of
the ESF approach.
This article briefly covers the basics of spatial filtering with eigenvectors and introduces the spfilteR package. Using the synthetic dataset provided by the package, it discusses the main functions and their implementation in the context of supervised and unsupervised spatial filtering as well as its extension to GLMs. By comparing the package to alternative implementations of the ESF approach, this article highlights that the flexibility provided by the spfilteR package constitutes an important improvement in settings where the ESF approach is commonly applied.
Funding was provided by the German Research Foundation (DFG) through the Collaborative Research Center (SFB) 884 (grant number: 139943784). This work was also supported by a postdoc fellowship of the German Academic Exchange Service (DAAD).
spfilteR, spatialreg, spmoran, adespatial, vegan
Econometrics, Environmetrics, Phylogenetics, Psychometrics, Spatial
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Juhl, "spfilteR: An R package for Semiparametric Spatial Filtering with Eigenvectors in (Generalized) Linear Models", The R Journal, 2021
BibTeX citation
@article{RJ-2021-085, author = {Juhl, Sebastian}, title = {spfilteR: An R package for Semiparametric Spatial Filtering with Eigenvectors in (Generalized) Linear Models}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {450-459} }