In this article we present the Bayesian estimation of spatial probit models in R and provide an implementation in the package spatialprobit. We show that large probit models can be estimated with sparse matrix representations and Gibbs sampling of a truncated multivariate normal distribution with the precision matrix. We present three examples and point to ways to achieve further performance gains through parallelization of the Markov Chain Monte Carlo approach.
The abundance of geolocation data and social network data has lead to a growing interest in spatial econometric methods, which model a contemporaneous dependence structure using neighboring observations (e.g. friends in social networks).
There are a variety of R packages for spatial models available, an incomplete list includes spBayes (Finley and Banerjee 2013), spatial (Venables and Ripley 2002), geostatistical packages called geoR (Diggle and Ribeiro 2007) and sgeostat (Majure and Gebhardt 2013), spdep (Bivand 2013), sphet (Piras 2010), sna (Butts 2013) and network (Butts et al. 2013).
While all of these packages deal with linear spatial models, in this article we focus on a nonlinear model, the spatial probit model, and present the Bayesian estimation first proposed by LeSage (2000). As will be shown below, one crucial point we have been working on was the generation of random numbers of a truncated multivariate normal distribution in very high dimensions.
Together with these high dimensions and the size of problems in spatial
models comes the need to work with sparse matrix representations rather
than the usual dense matrix()
. Popular R packages for dealing with
sparse matrices are
Matrix (Bates and Maechler 2013) and
sparseM (Koenker and Ng 2013).
In the next section we first introduce spatial probit models and
describe the Bayesian estimation procedure of the SAR probit model in
detail. Subsequently we discuss some implementation issues in R,
describe the sarprobit
method in package
spatialprobit
(Wilhelm and Matos 2013), compare it against maximum likelihood (ML) and
generalized method of moments (GMM) estimation in
McSpatial
(McMillen 2013) and illustrate the estimation with an example from
social networks. We also illustrate how to parallelize the Bayesian
estimation with the parallel package.
The book of LeSage and Pace (2009) is a good starting point and reference for spatial econometric models in general and for limited dependent variable spatial models in particular (chapter 10, p. 279).
Suppose we have the spatial autoregressive model (SAR model, spatial lag model)
for
In a spatial probit model,
The data generating process for
Note that if glm()
.
Another popular spatial model is the spatial error model (SEM) which takes the form
Recent studies using spatial probit models include, among others, Marsh et al. (2000), Klier and McMillen (2008) and LeSage et al. (2011).
Maximum Likelihood and GMM estimation of the spatial probit is
implemented in the package McSpatial (McMillen 2013) with the methods
spprobitml
and spprobit
. Here we present the details of Bayesian
estimation. Although GMM is massively more efficient computationally
than Bayesian Markov Chain Monte Carlo (MCMC), as a
instrumental-variables estimation it will work well only in very large
samples. This point is also brought by Franzese et al. (2013), who compares
different spatial probit estimation strategies.
The basic idea in Bayesian estimation is to sample from a posterior
distribution of the model parameters
Given the observed variables
subject to rtmvnorm.sparseMatrix()
in package
tmvtnorm
(Wilhelm and Manjunath 2013), see the next section for details.
For a normal prior
The standard way for sampling from this distribution is rmvnorm()
from package mvtnorm
(Genz et al. 2013).
The remaining conditional density
Since MCMC methods are widely considered to be (very) slow, we devote this section to the discussion of some implementation issues in R. Key factors to estimate large spatial probit models in R include the usage of sparse matrices and compiled Fortran code, and possibly also parallelization, which has been introduced to R 2.14.0 with the package parallel. We report estimation times and memory requirements for several medium-size and large-size problems and compare our times to those reported by LeSage and Pace (2009 291). We show that our approach allows the estimation of large spatial probit models in R within reasonable time.
In this section we describe the efforts made to generate samples from
The generation of samples from a truncated multivariate normal
distribution in high dimensions is typically done with a Gibbs sampler
rather than with other techniques such as rejection sampling (for a
comparison, see Wilhelm and Manjunath (2010)). The Gibbs sampler
draws from univariate conditional densities
The distribution of
and mean
and bounds
Package tmvtnorm has such a Gibbs sampler in place since version 0.9,
based on the covariance matrix
In our spatial probit model, the covariance matrix
Suppose one wants to draw
Data storage problem: Since matrix
> library(Matrix)
> I_n_dense <- diag(10000)
> print(object.size(I_n_dense), units = "Mb")
762.9 Mb
> rm(I_n_dense)
> I_n_sparse <- sparseMatrix(i = 1:10000, j = 1:10000, x = 1)
> print(object.size(I_n_sparse), units = "Mb")
0.2 Mb
Data access problem: Even with a sparse matrix representation
for H[i,j]
like in triplet or hash table
representations, will result in
Iterating only over the non-zero elements in row
How fast is the random number generation for
m=2 | 0.03 | 0.25 | 2.60 | ||||
0.03 | 0.25 | 2.75 | |||||
0.03 | 0.25 | 2.84 | |||||
0.03 | 0.25 | 2.80 | |||||
0.26 | 2.79 | ||||||
m=6 | 0.03 | 0.23 | 2.48 | ||||
0.01 | 0.23 | 2.53 | |||||
0.03 | 0.23 | 2.59 | |||||
0.02 | 0.24 | 2.57 | |||||
0.25 | 2.68 |
One more performance issue we discuss here is the burn-in size in the
innermost Gibbs sampler generating burn.in=20
samples, we have to generate 21 draws in order to keep
just one. In our situation, a large burn-in size will dramatically
degrade the MCMC performance, so the number of burn-in samples for
generating burn.in=10
, when the Gibbs
sampler starts from zero. Alternatively, they also propose to use no
burn-in phase at all (e.g. burn.in=0
), but then to set the start value
of the sampler to the previous value of
The mean vector mu <- qr.solve(S, X %*% beta)
and mu <- solve(qr(S), X %*% beta)
.
The latter will apply a QR decomposition for a sparse matrix
qr()
from the package Matrix,
whereas the first function from the base package will coerce
solve(qr(S), X %*% beta)
takes only
half the time required by qr.solve(S, X %*% beta)
.
Drawing
The computation of log-determinants for do_ldet()
).
In spatial lag models (and its probit variants), a change of an
explanatory variable solve(qr(S),rep(1,n))
speeds up the calculation
of total effects by magnitudes.
After describing the implementation issues to ensure that the estimation is fast enough and capable to handle large problems, we briefly describe the interface of the methods in package spatialprobit (Wilhelm and Matos 2013) and then turn to some examples.
The main estimation method for the SAR probit model
sar_probit_mcmc(y, X, W)
takes a vector of dependent variables y
, a
model matrix X
and a spatial weight matrix W
. The method
sarprobit(formula,W, data)
is a wrapper which allows a model formula
and a data frame. Both methods require a spatial weight matrix W
to be
passed as an argument. Additionally, the number of MCMC start values,
the number of burn-in iterations and a thinning parameter can be
specified. The estimation fit <- sarprobit(y ~ x, W, data)
returns an
object of class sarprobit
. The model coefficients can be extracted via
coef(fit)
. summary(fit)
returns a coefficient table with impacts(fit)
gives the marginal effects and the plot(fit)
method
provides MCMC trace plots, posterior density plots as well as
autocorrelation plots of the model parameters. logLik(fit)
and
AIC(fit)
return the log likelihood and the AIC of the model for model
comparison and testing.
We replicate the experiment from LeSage and Pace (2009 10.1.5) for
> library(spatialprobit)
> set.seed(2)
> n <- 400
> beta <- c(0, 1, -1)
> rho <- 0.75
> X <- cbind(intercept = 1, x = rnorm(n), y = rnorm(n))
> I_n <- sparseMatrix(i = 1:n, j = 1:n, x = 1)
> nb <- knn2nb(knearneigh(cbind(x = rnorm(n), y = rnorm(n)), k = 6))
> listw <- nb2listw(nb, style = "W")
> W <- as(as_dgRMatrix_listw(listw), "CsparseMatrix")
> eps <- rnorm(n = n, mean = 0, sd = 1)
> z <- solve(qr(I_n - rho * W), X %*% beta + eps)
> y <- as.double(z >= 0)
We estimate the spatial probit model as
> sarprobit.fit1 <- sarprobit(y ~ X - 1, W, ndraw = 1000, burn.in = 200,
+ thinning = 1, m = 10)
> summary(sarprobit.fit1)
> plot(sarprobit.fit1)
Table 2 shows the results of the SAR probit
estimation for the same experiment as in LeSage and Pace (2009). The results shown
there can be replicated either using the code above (for LeSagePaceExperiment()
function in the package and setting
set.seed(2)
(for
> set.seed(2)
> res1 <- LeSagePaceExperiment(n = 400, beta = c(0, 1, -1), rho = 0.75,
+ ndraw = 1000, burn.in = 200, thinning = 1, m = 10)
> summary(res1)
> set.seed(2)
> res2 <- LeSagePaceExperiment(n = 1000, beta = c(0, 1, -1), rho = 0.75,
+ ndraw = 1000, burn.in = 200, thinning = 1, m = 1)
> summary(res2)
The corresponding ML and GMM estimates in Table 2 can
be replicated by creating the data as above (replacing n <- 1000
for
> library(McSpatial)
> wmat <- as.matrix(W)
> mle.fit1 <- spprobitml(y ~ X[, "x"] + X[, "y"], wmat = wmat)
> gmm.fit1 <- spprobit(y ~ X[, "x"] + X[, "y"], wmat = wmat)
Our Bayesian estimation yields similar results, but our implementation
is much faster (factor 20-100) than the LeSage implementation, even on a
single core. The McSpatial GMM estimation seems to be biased in this
example. The evaluation of the performance relative to McSpatial ML
clearly needs to take into account a number of factors: First, the
choice of the number of MCMC iterations burn.in
samples, as
well as
Table 3 shows that a change of the Gibbs sampler
burn-in size
As a second example we present the estimation of the probit model based
on a random undirected graph with
> library(igraph)
> library(spatialprobit)
> set.seed(1)
> n <- 200
> branch <- 3
> probability <- branch/n
> grandom <- igraph::erdos.renyi.game(n = n, p.or.m = probability,
+ type = "gnp", directed = F, loops = F)
> V(grandom)$name <- 1:n
> A <- igraph::get.adjacency(grandom, type = "both", sparse = TRUE)
> W <- A/ifelse(rowSums(A) == 0, 1, rowSums(A))
> plot(grandom, layout = layout.fruchterman.reingold, vertex.label.family = "sans",
+ vertex.size = 2, vertex.label = "")
Figure 1 shows the resulting random graph.
Next we are going to estimate the spatial probit model with
> set.seed(1.2345)
> x <- rnorm(n)
> X <- cbind(const = rep(1, n), x = x)
> p <- 0.3
> beta <- c(-1, 2)
> I_n <- sparseMatrix(i = 1:n, j = 1:n, x = 1)
> z <- solve(qr(I_n - p * W), X %*% beta + rnorm(n))
> y <- as.numeric(z >= 0)
> sarprobit.fit <- sarprobit(y ~ X - 1, W, ndraw = 3000, burn.in = 200,
+ thinning = 1)
The true parameter in this model are
> summary(sarprobit.fit)
--------MCMC spatial autoregressive probit--------
Execution time = 25.350 secs
N draws = 3000, N omit (burn-in)= 200
N observations = 200, K covariates = 2
# of 0 Y values = 151, # of 1 Y values = 49
Min rho = -1.000, Max rho = 1.000
--------------------------------------------------
Estimate Std. Dev p-level t-value Pr(>|z|)
Xconst -1.25361 0.20035 0.00000 -6.26 2.3e-09 ***
Xx 2.05238 0.28529 0.00000 7.19 1.2e-11 ***
rho 0.24796 0.10571 0.00967 2.35 0.02 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The direct, indirect and total marginal effects are extracted using
> impacts(sarprobit.fit)
--------Marginal Effects--------
(a) Direct effects
lower_005 posterior_mean upper_095
Xx 0.231 0.268 0.3
(b) Indirect effects
lower_005 posterior_mean upper_095
Xx -0.299 -0.266 -0.23
(c) Total effects
lower_005 posterior_mean upper_095
Xx 0.00149 0.00179 0
The corresponding non-spatial probit model is estimated using the
glm()
function:
> glm1 <- glm(y ~ x, family = binomial("probit"))
> summary(glm1, digits = 4)
Call:
glm(formula = y ~ x, family = binomial("probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2337 -0.3488 -0.0870 -0.0002 2.4107
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.491 0.208 -7.18 7.2e-13 ***
x 1.966 0.281 6.99 2.7e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 222.71 on 199 degrees of freedom
Residual deviance: 103.48 on 198 degrees of freedom
AIC: 107.5
Number of Fisher Scoring iterations: 7
Figures 2 and 3 show the trace plots and posterior densities as part of the MCMC estimation results. Table 4 compares the SAR probit and standard probit estimates.
LeSage (2009) | sarprobit | McSpatial ML | McSpatial GMM | ||||||
Estimates | Mean | Std dev | Mean | Std dev | Mean | Std dev | Mean | Std dev | |
2-5 | |||||||||
-0.1844 | 0.0686 | 0.0385 | 0.0562 | 0.0286 | 0.0302 | -0.1042 | 0.0729 | ||
0.9654 | 0.1179 | 0.9824 | 0.1139 | 1.0813 | 0.1115 | 0.7690 | 0.0815 | ||
-0.8816 | 0.1142 | -1.0014 | 0.1163 | -0.9788 | 0.1040 | -0.7544 | 0.0829 | ||
0.6653 | 0.0564 | 0.7139 | 0.0427 | 0.7322 | 0.0372 | 1.2208 | 0.1424 | ||
Time (sec.) | 11.5 | ||||||||
2-5 | |||||||||
0.05924 | 0.0438 | -0.0859 | 0.0371 | 0.0010 | 0.0195 | -0.1045 | 0.0421 | ||
0.96105 | 0.0729 | 0.9709 | 0.0709 | 1.0608 | 0.0713 | 0.7483 | 0.0494 | ||
-1.04398 | 0.0749 | -0.9858 | 0.0755 | -1.1014 | 0.0728 | -0.7850 | 0.0528 | ||
0.69476 | 0.0382 | 0.7590 | 0.0222 | 0.7227 | 0.0229 | 1.4254 | 0.0848 | ||
Time (sec.) | 15.5 |
m=1 | m=2 | m=5 | m=10 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
2-3 Estimates | Mean | Std dev | Mean | Std dev | Mean | Std dev | Mean | Std dev | |||
0.0385 | 0.0571 | 0.0447 | 0.0585 | 0.0422 | 0.0571 | 0.0385 | 0.0562 | ||||
1.0051 | 0.1146 | 0.8294 | 0.0960 | 0.9261 | 0.1146 | 0.9824 | 0.1139 | ||||
-1.0264 | 0.1138 | -0.8417 | 0.0989 | -0.9446 | 0.1138 | -1.0014 | 0.1163 | ||||
0.7226 | 0.0411 | 0.6427 | 0.0473 | 0.6922 | 0.0411 | 0.7139 | 0.0427 | ||||
Time (sec.) | 10.2 | 10.2 | 10.5 | 11.0 | |||||||
Time (sec.) in LeSage (2009) | 195 | 314 | - | 1270 |
SAR probit | Probit | ||||||
---|---|---|---|---|---|---|---|
2-4 Estimates | Mean | Std dev | p-level | Mean | Std dev | p-level | |
-1.2536 | 0.2004 | 0.0000 | -1.4905 | 0.2077 | 0.0000 | ||
2.0524 | 0.2853 | 0.0000 | 1.9656 | 0.2812 | 0.0000 | ||
0.2480 | 0.1057 | 0.0097 | |||||
Time (sec.) | 25.4 |
A last example looks at the information flow in social networks.
Coleman et al. (1957; Coleman et al. 1966) have studied how innovations (i.e. new
drugs) diffuse among physicians, until the new drug is widely adopted by
all physicians. They interviewed 246 physicians in 4 different US cities
and looked at the role of 3 interpersonal networks (friends, colleagues
and discussion circles) in the diffusion process (November,
1953–February, 1955). Figure 4
illustrates one of the 3 social structures based on the question "To
whom do you most often turn for advice and information?". The data set
is available at http://moreno.ss.uci.edu/data.html#ckm, but also part
of spatialprobit (data(CKM)
). See also Burt (1987) and
Bulte and Lilien (2001) for further discussion of this data set.
The dependent variable in the model is the month in which a doctor first prescribed the new drug. Explanatory variables are the social structure and individual variables.
We are estimating a standard probit model as well as a SAR probit model
based on the "advisorship" network and trying to find determinants for
all adopters of the new drug. We do not aim to fully reanalyze the
Coleman data set, nor to provide a detailed discussion of the results.
We rather want to illustrate the model estimation in R. We find a
positive relationship for the social influence variable in the probit
model, but social contagion effects as captured by
> set.seed(12345)
> # load data set "CKM" and spatial weight matrices "W1","W2","W3"
> data(CKM)
> # 0/1 variable for early adopter
> y <- as.numeric(CKM$adoption.date <= "February, 1955")
> # create social influence variable
> influence <- as.double(W1 %*% as.numeric(y))
> # Estimate Standard probit model
> glm.W1 <- glm(y ~ influence + city + med_sch_yr + meetings + jours + free_time +
+ discuss + clubs + friends + community + patients + proximity + specialty,
+ data = CKM, family = binomial("probit"))
> summary(glm.W1, digits = 3)
> # Estimate SAR probit model without influence variable
> sarprobit.fit.W1 <- sarprobit(y ~ 1 + city + med_sch_yr + meetings + jours +
+ free_time + discuss + clubs + friends + community + patients + proximity +
+ specialty, data = CKM, W = W1)
> summary(sarprobit.fit.W1, digits = 3)
Probit | SAR Probit | |||||
Estimates | Mean | z value | Mean | z value | ||
(Intercept) | ||||||
influence | ||||||
city | ||||||
med_sch_yr | ||||||
meetings | ||||||
jours | ||||||
free_time | ||||||
discuss | ||||||
clubs | ||||||
friends | ||||||
community | ||||||
patients | ||||||
proximity | ||||||
specialty | ||||||
Table 5 presents the estimation results for
the non-spatial probit and the SAR probit specification. The coefficient
estimates are similar in magnitude and sign, but the estimate for
MCMC is, similar to the bootstrap, an embarrassingly parallel problem.
It can be easily run in parallel on several cores. From version 2.14.0,
R offers a unified way of doing parallelization with the parallel
package. There are several different approaches available to achieve
parallelization and not all approaches are available for all platforms.
See for example the conceptual differences between the two main methods
mclapply
and parLapply
, where the first will only work serially on
Windows. Users are therefore encouraged to read the parallel package
documentation for choosing the appropriate way.
Here, we only sketch how easy the SAR probit estimation can be done in parallel with 2 tasks:
> library(parallel)
> mc <- 2
> run1 <- function(...) sarprobit(y ~ X - 1, W, ndraw = 500, burn.in = 200,
+ thinning = 1)
> system.time({
+ set.seed(123, "L'Ecuyer")
+ sarprobit.res <- do.call(c, mclapply(seq_len(mc), run1))
+ })
> summary(sarprobit.res)
Due to the overhead in setting up the cluster, it is reasonable to expect another 50% performance gain when working with 2 CPUs.
In this article we presented the estimation of spatial probit models in
R and pointed to the critical implementation issues. Our performance
studies showed that even large problems with sarprobit()
), the
probit model with spatial errors (semprobit()
) and the SAR Tobit model
(sartobit()
). The Bayesian approach can be further extended to other
limited dependent spatial models, such as ordered probit or models with
multiple spatial weights matrices. We are planning to include these in
the package in near future.
spBayes, spatial, geoR, sgeostat, spdep, sphet, sna, network, Matrix, sparseM, spatialprobit, McSpatial, LearnBayes, tmvtnorm, mvtnorm, igraph
Bayesian, Distributions, Econometrics, Finance, GraphicalModels, MixedModels, NumericalMathematics, Optimization, Spatial, SpatioTemporal, Survival, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Wilhelm & Matos, "Estimating Spatial Probit Models in R", The R Journal, 2013
BibTeX citation
@article{RJ-2013-013, author = {Wilhelm, Stefan and Matos, Miguel Godinho de}, title = {Estimating Spatial Probit Models in R}, journal = {The R Journal}, year = {2013}, note = {https://rjournal.github.io/}, volume = {5}, issue = {1}, issn = {2073-4859}, pages = {130-143} }