Survival data often include a fraction of units that are susceptible to an event of interest as well as a fraction of “immune” units. In many applications, spatial clustering in unobserved risk factors across nearby units can also affect their survival rates and odds of becoming immune. To address these methodological challenges, this article introduces our BayesSPsurv R-package, which fits parametric Bayesian Spatial split-population survival (cure) models that can account for spatial autocorrelation in both subpopulations of the user’s time-to-event data. Spatial autocorrelation is modeled with spatially weighted frailties, which are estimated using a conditionally autoregressive prior. The user can also fit parametric cure models with or without nonspatial i.i.d. frailties, and each model can incorporate time-varying covariates. BayesSPsurv also includes various functions to conduct pre-estimation spatial autocorrelation tests, visualize results, and assess model performance, all of which are illustrated using data on post-civil war peace survival.
Conventional survival models have been applied to analyze time-to-event data across several academic disciplines, but these models rely on two core assumptions that are not always tenable. The first is that all units, including right-censored observations, will eventually experience the event of interest. In many applications, however, some fraction of subjects that are “immune” or “cured” may never experience the event (Maller and Zhou 1996; Peng and Taylor 2014; Beger et al. 2017). A clinical study of obesity on human death rates can employ a standard survival model because all subjects will eventually die, but a study on vaccine effectiveness will likely assume that some fraction of the treated population will become immune while others may respond differently and remain uncured. To account for both subpopulations, scholars have developed a class of split-population (SP) survival models that probabilistically separate the immune fraction from the units that are susceptible to the event of interest and then estimate the conditional hazard of survival among the latter units (Box-Steffensmeier and Zorn 1999; Box-Steffensmeier and Jones 2004; Cai et al. 2012; Beger et al. 2018). The second assumption of conventional survival models is that each observation is conditionally independent after controlling for covariates. This assumption is violated if spatially clustered units share common unobserved features that influence their baseline risk of experiencing an event (Darmofal 2009; Taylor and Rowlingson 2017). If spatial autocorrelation matters for process survival and/or the probability of being immune to an event, ignoring it can lead to biased parameter estimates.
Although there are numerous methods for dealing with spatially autocorrelated time-to-event data (e.g., (Henderson et al. 2002; Li and Ryan 2002; Banerjee et al. 2003; Zhou et al. 2020)), these spatial survival models do not differentiate between at-risk and immune observations. Conversely, existing SP survival models either ignore spatial autocorrelation altogether or only account for it in the survival stage (Banerjee and Carlin 2004). We present the BayesSPsurv (Bolte et al. 2021b) R-package (available at https://CRAN.R-project.org/package=BayesSPsurv), which has the power to overcome these limitations as well as the flexibility for researchers to model spatial clustering in their survival data however they choose to define it. In particular, BayesSPsurv allows the user to estimate parametric Bayesian Spatial split-population survival (cure) models with spatial frailties in both the model’s split and survival stages. These models account for spatial clustering in the immune and at-risk fractions of the data. The package also includes functions and code for pre-estimation spatial autocorrelation diagnostics, visualizing results, simulating multiple Markov chains, and implementing Markov Chain Monte Carlo (MCMC) estimation routines for SP survival models with independent (exchangeable) or no random effects. The user can also include time-varying covariates in either stage. In the next section, we outline previous work on SP and spatial survival models, including existing packages and their limitations. We then formally develop the Pooled, Exchangeable, and Spatial Bayesian SP survival models before describing the various functions available in the BayesSPsurv package. We demonstrate these functions using replication data from a published study on the survival of post-civil war peace.
Scholars have identified at least two sources of conditional heterogeneity in survival data that hasve been separately addressed. The first occurs when a subset of the units in the data are immune from experiencing a “failure” event, which violates the assumption that even right-censored observations experience the event of interest (Box-Steffensmeier and Zorn 1999; Box-Steffensmeier and Jones 2004; Beger et al. 2017). Cure rate or split-population (SP) survival models account for this immune fraction by first estimating the probability that right-censored units are immune from an event in a “split-stage” and then the time until at-risk censored and non-censored units experience the event. Recent work has extended these models to include i.i.d. frailties (Peng and Taylor 2011), time-varying covariates (Beger et al. 2017), and account for random right-censoring (Patilea and Van Keilegom 2020). Split-population survival models have been used to study phenomena, ranging from the survival of breast cancer patients (Wang et al. 2005), criminal recidivism (Schmidt and Witte 1989), the risk of coups (Beger et al. 2017), and parasite-induced mortality in river salmon (Ray et al. 2014).
Separately, conventional survival models have been extended to account for spatial autocorrelation among nearby units (Banerjee et al. 2003; Taylor and Rowlingson 2017; Zhou et al. 2020). These models relax the assumption of spatial independence by incorporating spatially weighted frailties into the survival model’s baseline hazard function. This allows for the possibility that adjacent units share unmodeled risk factors that influence their underlying propensities for experiencing a failure event. Apart from recent advances in modeling spatial frailties in different survival frameworks (e.g. (Diva et al. 2008; Zhou et al. 2020)), spatial survival models have been applied to analyze, for example, geographically referenced data on leukemia survival (Henderson et al. 2002), position announcements by U.S. House members (Darmofal 2009), prostate cancer (Zhou et al. 2020), and fire service response times (Taylor and Rowlingson 2017). Despite these considerable methodological developments, far less attention has been dedicated to accounting for spatial autocorrelation in SP survival settings. This is surprising because spatial autocorrelation between units may influence their probability of being immune and the survival rate among the units at risk of experiencing the failure event simultaneously. Banerjee and Carlin (2004) develop Bayesian spatial cure models but focus on modeling spatial autocorrelation in the survival stage using a conditionally autoregressive (CAR) prior. In fact, these authors emphasize that future research must “include covariates and spatial random effects as regressors in the cure rate portion of the model, instead of just the log-relative risk portion” (274).
In line with these trends in the literature, some R packages fit standard and split-population survival models but do not allow for the incorporation of spatial information. The survival package (Therneau 2020) fits parametric and semiparametric Cox survival models via MLE, whereas dynsurv (Wang et al. 2020) fits the Cox Proportional Hazards (PH) model with dynamic coefficients using MCMC methods. Conventional semi-parametric cure models can be estimated via MLE with the smcure (Cai et al. 2012) or nltm (Garibotti et al. 2019) package. The flexsurvcure (Amdahl 2020) package fits parametric mixture and non-mixture cure models for time-to-event data, and the spduration package fits parametric SP survival models with time-varying covariates (Beger et al. 2018).
A small handful of packages allow the user to incorporate spatial information into their survival models, but never in split-population settings. The BayesX package (Umlauf et al. 2019) and its associated interface to R R2BayesX (Belitz et al. 2017) fit spatial survival models and structured additive regression models with spatial frailties. The spBayesSurv package (Zhou and Hanson 2020) fits several Bayesian survival models with spatial frailties that can be formulated on a PH, proportional odds, or AFT scale, all of which can include time-varying covariates. The spatsurv package also fits Bayesian spatial survival models, including a PH model that permits users to incorporate Gaussian process frailties (Taylor and Rowlingson 2020). Beyond R, WinBUGS and GeoBUGs code has been developed to fit, for instance, survival and non-mixture cure models with spatial frailties (Banerjee et al. 2003; Banerjee et al. 2004; Thomas et al. 2004).
To our knowledge, our BayesSPsurv package is the first to allow users to fit parametric split-population survival models with not just time-varying covariates but also spatial frailties in both stages. The frailties in the parametric Bayesian spatial SP model are estimated using the CAR prior approach (Besag et al. 1991; Bernardinelli et al. 1995; Banerjee et al. 2003; Banerjee and Carlin 2004). BayesSPsurv also includes functions and routines coded in C++ to fit nonspatial parametric SP survival models with exchangeable frailties in the model’s split and survival-stage equation and without any frailties. Statistical inference of the models in BayesSPsurv is conducted via combined MCMC techniques that require little input from users. Our package and supplemental code also provide functions to implement spatial autocorrelation tests, produce country-level adjacency matrices, generate and compare multiple Markov chains, assess convergence, and conduct model comparison. Before outlining the functionality of the package in greater detail, we turn to briefly develop each of the three included Bayesian split-population survival models.
Define
Suppose that the survival data includes two subpopulations: an “at-risk” fraction that can fail and an “immune” fraction that will not experience the (failure) event of interest (Maller and Zhou 1996; Yin and Ibrahim 2005; Beger et al. 2017). When presented with this data generation process, researchers typically employ split-population survival (cure) models with or without unit-specific frailties to simultaneously estimate the probability of observations being in the immune fraction and the effect of covariates on the hazard of survival among the at-risk fraction (Maller and Zhou 1996; Lu 2010; Peng and Taylor 2014).
To understand these models in more detail, consider the split-population
survival model for the duration
where
where
Suppose, however, that the survival data with the two aforementioned
subpopulations should be fit with time-varying covariates. We can
re-define this data with unique “entry time” duration as
where the “split-stage” equation is
with
Suppose, however, that independence among the frailties cannot be assumed—that is, that the frailties exhibit spatial autocorrelation or clustering that influences each units’ propensity for being immune to an event of interest and their survival time if they are not immune. In a Bayesian split-population survival model, the assumption of spatial independence is relaxed by assigning spatial weights to the unit-specific frailties in the model’s split and survival stage and then statistically incorporating these spatially weighted frailties via the conditionally autoregressive (CAR) prior approach (Besag et al. 1991; Banerjee et al. 2003). The CAR prior accounts for spatially autocorrelated frailties by allowing the frailties to be spatially autocorrelated across geographically adjacent units.
To understand how the CAR prior is applied, first note that spatial data
often take the form of a lattice in which a continuous spatial surface
is divided into a grid of units such as counties, districts, or
countries. The spatially weighted frailties are constructed by defining
the relevant spatial relationship among adjacent units (this could, for
example, be geographic distance or contiguity) in an adjacency matrix
The resulting conditional distributions of the spatial frailties for
The log-likelihood of the Pooled (non-frailty), Exchangeable (nonspatial
frailty), and Spatial split-population (SP) survival models are
compatible with any survival distribution. The BayesSPsurv package,
however, supports MCMC estimation of these models for the Weibull and
log-logistic distributions. Our empirical application below focuses on
the Weibull survival distribution. The density, survival, and hazard
rate in the Weibull case are
Following standard practice for Bayesian inference (Carlin and Louis 2000), we
assign the multivariate normal (MVN) prior to
where
The joint posterior distribution of the Bayesian Spatial SP survival
model with time-varying covariates—our main model of interest—is
where
The three split-population survival models in BayesSPsurv are each
estimated with an MCMC algorithm for Bayesian inference. Because
closed-forms for the posterior distributions of
Each of the three Bayesian SP survival models in BayesSPsurv
incorporates a cure rate fraction and assumes that the time-to-event
baseline hazard follows a Weibull or log-logistic distribution (which
the user specifies). Users can also incorporate time-varying covariates
in either stage. BayesSPsurv contains compiled C++ code using the
package Rcpp (Eddelbuettel et al. 2020) to
maximize computational efficiency when estimating the included Bayesian
SP survival models. In addition to the pre-estimation spatial
autocorrelation (Join Count and Global Moran’s I) tests described below,
the BayesSPsurv package also permits users to calculate the deviance
information criterion (DIC) and log-likelihood statistics from the
Spatial, Exchangeable, and Pooled SP survival models’ MCMC output. The
DIC is a measure of model fit that also penalizes the effective number
of parameters. Like the Akaike Information Criterion, models with
smaller values are preferable to those with larger values. Users can
also conduct MCMC diagnostics with various extant packages designed to
handle mcmc
objects such as
coda (Plummer et al. 2020).
Function | Description |
---|---|
spatialSPsurv() |
Fits Bayesian Spatial SP survival model |
exchangeSPsurv() |
Fits Bayesian Exchangeable SP survival model |
pooledSPsurv() |
Fits Bayesian Pooled SP survival model |
plot_JoinCount() |
Implement and plot Join Count statistics |
plot_Moran.I() |
Implement and plot global Moran’s I statistics |
spatial_SA() |
Generate spatial weights (adjacency) matrix |
SPstats() |
Calculate DIC and log-likelihood from fitted models |
To demonstrate the utility and various functions in
BayesSPsurv, we use
replication data from Walter (2015)’s ((2015)) global study
on post-civil war peace duration (denoted in the package as
Walter_2015_JCR
). Walter (2015)’s ((2015)) panel data
consist of 1,237 observations from 46 countries observed between 1962
and 2009. As discussed in Bolte et al. (2021a), her data are well-suited for SP
survival analysis because they include two underlying populations: an
“at-risk” fraction of countries in which civil wars can potentially
recur, and an “immune” fraction of countries in which civil conflict
recurrence is structurally improbable. The Walter_2015_JCR
data are a
subset of Walter (2015)’s ((2015)) most important variables,
including lgdpl
for log per capita income, unpko
for the presence of
UN peacekeeping operations, the binary variable victory
, coded as 1
when one side in the civil war wins the conflict militarily, and the
dummy variable comprehensive
, coded as 1 when the combatants sign a
comprehensive peace agreement. The dataset also includes the binary
indicator renewed_war
, coded as 1 for the year in which a civil war
recurs and 0 otherwise. This variable will serve as our failure event.
First, however, we load the BayesSPsurv package along with the dataset
and then use the add_duration()
function from the
spduration package to
add several variables that allow us to capture the survival
characteristics of the data (Beger et al. 2017). unitID
indicates
the unique unit identifier (in this case, a unique id for each civil
conflict), and timeID
specifies the temporal variable.
library(BayesSPsurv)
data(Walter_2015_JCR)
walter <- spduration::add_duration(Walter_2015_JCR,"renewed_war",
unitID = "id", tID = "year",
freq = "year", ongoing = FALSE)
str(walter)
'data.frame': 1237 obs. of 21 variables:
$ year : num 2002 2003 2004 2005 2003 ...
$ lastyear : num 0 0 0 1 0 0 0 0 0 0 ...
$ renewed_war : num 0 0 0 1 0 0 0 0 0 0 ...
$ fhcompor1 : num -1.17 -1.17 -1.08 -1 -1.08 ...
$ lgdpl : num 5.75 6.29 6.36 6.4 8 ...
...
$ failure : num 0 0 0 1 0 0 0 0 0 0 ...
$ ongoing : num 0 0 0 0 0 0 0 0 0 0 ...
$ end.spell : num 0 0 0 1 0 0 0 0 0 0 ...
$ cured : num 0 0 0 0 1 1 1 1 1 1 ...
$ atrisk : num 1 1 1 1 0 0 0 0 0 0 ...
$ censor : num 0 0 0 0 0 0 0 0 0 0 ...
$ duration : num 1 2 3 4 1 2 3 4 5 6 ...
$ t.0 : num 0 1 2 3 0 1 2 3 4 5 ...
These new variables include duration
, a cumulative count of the years
of post-war peace survived, and the dummy variable atrisk
, coded as 1
for all observations that eventually experience war recurrence in the
sample period. Altogether the data include 77 post-civil war peace
spells and 24 instances of civil war recurrence.
If the preferred frailty unit is at the country-level, users can take
advantage of the spatial_SA()
function in the BayesSPsurv package to
generate their binary spatial weights matrix. In most cases, analysts
define spatial clustering in terms of geographic proximity. This often
requires the researcher to define some maximum distance threshold below
which two units are considered “neighbors.” spatial_SA()
allows users
to specify their own distance threshold. For this example, we use the
spatial_SA()
function to generate an adjacency matrix of countries in
the data whereby “proximity” (ccode
) and our distance threshold of 800 km. The spatial_SA()
function will produce a walter
that
includes both the original data frame and the associated adjacency
matrix.
walter <- BayesSPsurv::spatial_SA(data = walter, var_ccode = "ccode",
threshold = 800L)
walter[[2]][1:6,1:6]
42 90 92 93 135 155
42 0 0 0 0 0 0
90 0 0 1 1 0 0
92 0 1 0 1 0 0
93 0 1 1 0 0 0
135 0 0 0 0 0 0
155 0 0 0 0 0 0
Note that spatial adjacency between units does not need to be defined geographically; users can conceptualize “space” as any form of the dyadic relationship between units, but typically spatial clustering is substantively captured with some measure of geographic distance. Users can also create their own adjacency matrix from scratch to incorporate into the Bayesian estimation routine if their units of analysis are something other than countries.
Having generated a matrix that records the spatial relationship of all
pairs of units, we may now be interested in assessing the presence and
degree of spatial autocorrelation in the data with respect to our
outcomes of interest. The BayesSPsurv package provides two functions
to conduct these preliminary tests on country-level data. The first is
the plot_JoinCount()
function, which generates an adjacency matrix
with a user-defined distance threshold (as above), implements the join
count test for each cross-section in the data, and then automatically
plots the test statistics with user-specified confidence intervals
across each observed year. The join count test is a widely used
correlational statistic for evaluating whether the expected count of
categorically distinct adjacent units is greater than what we would
expect by chance alone (Cliff and Ord 1981). More formally, if we assume
two categories, Black and White, then the join count test statistic is
where BW and Eplot_JoinCount
also has an argument
specifying the minimum number of units that must be in a cross-section
for the test statistic to be calculated. In our case, there were fewer
than 12 countries in every year prior to 1975 in the data, and none were
geographically proximate. By specifying n=12
, we are simply dropping
years prior to 1975. The following code produces the plot in Figure
1a.
plot_JoinCount(data = walter[[1]], var_cured = "cured", var_id = "ccode",
var_time = "year", n = 12)
Figure 1
|
|
|
|
Negative values indicate clustering (positive spatial autocorrelation), and positive values indicate spatial dispersion. Figure 1a clearly depicts spatially correlated patterns of potential “peace consolidation,” though the direction of the autocorrelation varies over time.
The second function, plot_Moran.I()
, implements the Global Moran’s I
test, which assesses the direction and degree of spatial autocorrelation
in continuous or ordinal data (Moran 1950; Paradis et al. 2020). We use this test
to examine whether the duration of post-civil war peace among
geographically proximate countries exhibits spatial autocorrelation (or
dispersion). Like the previous function, plot_Moran.I()
plots the
Moran’s I values based on user-defined adjacencies and, in this case,
90% confidence intervals for each year.
plot_Moran.I(data = walter[[1]], var_duration = "duration", var_id = "ccode",
var_time = "year", n = 12)
Figure 1b reveals that the spatial relationship of post-war peace survival oscillates between positive autocorrelation (in the 1970s, early 1990s, and late 2000s) and spatial dispersion (in much of the 1980s and late 1990s). Taken together, the Join Count and Moran’s I tests suggest that unobserved heterogeneous risk factors that transcend the borders of a single state may lead to spatial autocorrelation in both the consolidation and duration of post-war peace in Walter (2015)’s ((2015)) data. This suggests that a Spatial SP survival model is an appropriate method of analysis, particularly if proximate countries share common unobservable risk factors that affect their odds of being immune to war recurrence or the time it takes for renewed war to occur.
The spatialSPsurv()
function fits the Bayesian Spatial SP survival
model, which includes spatially autocorrelated frailties in the model’s
split and survival stage and can incorporate time-varying covariates. We
illustrate the features of the spatialSPsurv()
function by fitting the
Bayesian Spatial Weibull cure model on Walter (2015)’s
((2015)) data on post-war peace.lgdpl
variable in the
estimated model’s split-stage equation, which estimates the probability
of units being in the “consolidated” post-war peace group (the “immune”
fraction), though it is important to note that atrisk
is used here as
the dependent variable. We estimate the probability of post-war survival
among the at-risk fraction as a function of UN peacekeeping (unpko
),
outright military victory in the previous war (victory
), the
resolution of the previous war via peace agreement (comprehensive
),
and, again, GDP/capita (lgdpl
). The spatial relationship of each
country is incorporated into the model via the spatial weights matrix
generated from the spatial_SA()
function described earlier.
The Spatial SP survival model is fit via Bayesian methods for inference.
The MVN prior is used for the model’s split-stage (V
) and survival stage (W
). Estimation proceeds via the
MCMC algorithm described in the previous section. To this end, we
specify the MCMC sampler as follows: the argument N = 15,000
sets the
number of MCMC iterations to 15,000, burn = 5,000
discards the first
5,000 states of the Markov chain, thin = 15
specifies the number of
steps that are employed to prevent autocorrelation, and m = 10
limits
the steps in the slice sampling to 10. The argument w = c(1,1,1)
specifies the default values of the size of the slice sampling for
For simplicity, we retain the default 0 as the initial value for all
parameters with the ini.beta
, ini.gamma
, ini.W
, and ini.V
arguments. We incorporate separate proposal variances, namely
prop.varV = 1e-05
and prop.varW = 1e-05
, for the split and survival
stage frailties (V, W) and then assign separate
Metropolis-Hastings proposal steps for estimating these parameters to
optimize the acceptance rates. To specify the covariates in the model,
duration ~
precedes the survival-stage covariates, while immune ~
precedes the split-stage variables. Y0
indicates the time elapsed
since inception until time t - 1, while LY
is a dummy that captures
the last observation year due to censoring or failure. A = walter[[2]]
calls the spatial weights matrix from our list object. S= ’sp_id’
(generated by spatial_SA())
gives the unit identifier in the spatial
weights matrix that matches the units in the data frame. The argument
form
is used to specify the parametric survival distribution (which
can be either “Weibull” or “loglog”).
Importantly, the model functions in the BayesSPsurv package generate single chains for each parameter, but we provide a routine for estimating multiple chains in parallel on the GitHub repository for the package (https://github.com/Nicolas-Schmidt/BayesSPsurv). We first discuss the results and diagnostics for the single-chain results using the above MCMC specification and then illustrate the multiple-chain results and diagnostics for the Spatial SP survival model.
set.seed(123456)
model <- spatialSPsurv(duration = duration ~ victory + comprehensive + lgdpl + unpko ,
immune = atrisk ~ lgdpl,
Y0 = 't.0',
LY = 'lastyear',
S = 'sp_id' ,
data = walter[[1]],
N = 15000,
burn = 5000,
thin = 15,
w = c(1,1,1),
m = 10,
ini.beta = 0,
ini.gamma = 0,
ini.W = 0,
ini.V= 0,
form = "Weibull",
prop.varV = 1e-05,
prop.varW = 1e-05,
A = walter[[2]])
The function automatically provides a progress bar for users to track
the percent of the computation that has been completed. The elapsed run
time for the Spatial SP Weibull specification being examined here was
just under 23 minutes (calculated with system.time()
). Once completed,
the generic print()
function displays the results
print(model)
Call:
spatialSPsurv(duration = duration ~ victory + comprehensive +
lgdpl + unpko, immune = atrisk ~ lgdpl, Y0 = "t.0",
LY = "lastyear", S = "sp_id", A = walter[[2]],
data = walter[[1]], N = 15000, burn = 5000, thin = 15, w = c(1,
1, 1), m = 10, ini.beta = 0, ini.gamma = 0, ini.W = 0,
ini.V = 0, form = "Weibull", prop.varV = 1e-05, prop.varW = 1e-05)
Iterations = 1:666
Thinning interval = 1
Number of chains = 1
Sample size per chain = 666
Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Duration equation:
Mean SD Naive SE Time-series SE
(Intercept) 0.3434823 1.0681878 0.041391437 0.078812901
victory 0.1028612 0.5321456 0.020620224 0.020620224
comprehensive 0.2036460 0.6178274 0.023940326 0.023940326
lgdpl 0.4831914 0.1424928 0.005521485 0.009230762
unpko 0.3057078 0.7695278 0.029818596 0.029818596
Immune equation:
Mean SD Naive SE Time-series SE
(Intercept) -0.4306327 4.545655 0.1761405 0.3648687
lgdpl -2.4722920 3.113144 0.1206319 0.1646276
The posterior mean estimates suggest that peace agreements, military
victory, and peacekeeping units may increase the survival of post-civil
war peace, though none of these parameters are highly reliable given
their 95% BCIs, as reported in (Bolte et al. 2021a). However, lgdpl
appears to
reliably increase the survival of peace among at-risk cases while also
reliably decreasing the probability of peace “consolidation” in the
split-stage. Calling the SPstats()
function calculates the Deviance
Information Criterion (DIC) and log-likelihood (LL) statistics from the
estimated model, where
SPstats(model)
$DIC
[1] -1726.128
$Loglik
[1] 5301.714
The spatialSPsurv()
function produces an object of class "mcmc"
,
which is compatible with all standard summary and diagnostic methods for
the single Markov chains in the
coda package
(Plummer et al. 2020). For instance, we can test for convergence with both the
Geweke (1992) test using the geweke.diag()
function and the
Heidelberger and Welch (1983) stationarity test via the heidel.diag()
function in
coda. We report the
Heidelberger and Welch (1983) stationarity test results in Bolte et al. (2021a) to save space.
In this case, the Geweke tests indicate little evidence against
convergence for each split-stage and survival-stage covariate, because
the test statistics are reasonably close to zero.
geweke.diag(model$gammas)
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
(Intercept) lgdpl
-1.3755 0.4429
geweke.diag(model$betas)
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
(Intercept) victory comprehensive lgdpl unpko
-0.8248 -0.4271 1.8237 0.3601 -0.9471
geweke.diag(model$rho)
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
var1
0.3544
coda’s plot()
function
can then be used to display trace and posterior density plots for the
single Markov chains of the model’s split-stage mcmc.list
object and easily plot the trace and density plots of the multiple
chains. Using the code on GitHub, the Spatial SP survival model was
re-run four times (to generate four chains) with 0, 1, 10, and 50 as the
initial starting values for each of the estimated parameters (
par(mfrow=c(2,2))
plot(gammas, density = FALSE, auto.layout = FALSE, col =c (28,26,27,29))
plot(gammas, trace = FALSE, auto.layout = FALSE)
par(mfrow=c(2,4))
plot(betas[,2:5], density= FALSE, auto.layout = FALSE, col = c(28,26,27,29))
plot(betas[,2:5], trace= FALSE, auto.layout = FALSE)
The trace plots for the Spatial SP Weibull model’s gelman.diag()
function in coda, which should be approximately 1 if
all chains have converged to a common distribution.
Potential scale reduction factors:
Point est. Upper C.I.
(Intercept) 1.19 1.38
victory 1.17 1.20
comprehensive 1.17 1.19
lgdpl 1.32 2.40
unpko 1.11 1.14
Multivariate psrf
1.05
Potential scale reduction factors:
Point est. Upper C.I.
(Intercept) 1.05 1.07
lgdpl 1.14 1.33
Multivariate psrf
1.05
The multivariate PSRF is less than 1.1 in both stages, but the point estimates for all parameters are still insufficiently far from 1, suggesting that extending the chains would likely improve the accuracy of the estimates.
One way to substantively interpret the spatial frailties is to display a
map and determine whether adjacent units share similar frailty values
(e.g., (Darmofal 2009)). The following code from the
countrycode
(Arel-Bundock 2020) and
rworldmap
(South 2016) packages permits users to create choropleth maps of
the spatial frailty posterior means. Figures
4a-4b display the
single-chain split-stage (V
) and survival-stage (W
) frailties from
our Spatial SP Weibull model (the code generates Figure
4a; Figure 4b
simply uses W
in place of V
.
library(rworldmap)
library(countrycode)
library(classInt)
spv <- matrix(apply(model$V, 2, mean), ncol = 1, nrow = ncol(model$V))
ISO3 <- countrycode(colnames(model$V) ,'gwn','iso3c')
spv <- data.frame(ccode = colnames(model$V), ISO3 = ISO3, spv = spv)
map <- joinCountryData2Map(spv, joinCode = "ISO3", nameJoinColumn = "ISO3")
classInt <- classIntervals(map[["spv"]], n = 6, style = "quantile")
mapParams <- mapCountryData(map,
nameColumnToPlot = "spv",
addLegend = FALSE,
catMethod = classInt[["brks"]],
colourPalette = palette(RColorBrewer::brewer.pal(6,"RdBu")),
mapTitle = "")
do.call(addMapLegend, c(mapParams, legendLabels = "all", legendWidth = 0.5,
legendIntervals = "data", legendMar = 2))
|
|
|
|
Note that in certain regions, there are distinct spatial bands in the (i) split-stage frailties, which range from -0.633 to 0.381 with a corresponding standard deviation of 0.34 and (ii) survival-stage frailties that range from -0.391 to 0.766 with a corresponding standard deviation of 0.32. These spatial bands reveal geographic clustering in both consolidation and duration of post-war peace since similar frailty values often occur near one another.
The BayesSPsurv package can also be used to estimate Bayesian
Exchangeable and Pooled SP survival models. The Exchangeable model
incorporates frailties that are assumed to be statistically independent
in both stages, while the Pooled model excludes frailties altogether. We
again use Walter (2015)’s ((2015)) data on post-civil war
peace to illustrate these models. Beginning with the Exchangeable
Weibull SP survival model, we again assign the MVN prior to the model’s
id_WV=country
to define the country-level nonspatial frailties.
set.seed(123456)
country <- countrycode(unique(walter[[1]]$ccode),'gwn','iso3c')
model1 <- exchangeSPsurv(duration = duration ~ victory + comprehensive + lgdpl + unpko,
immune = atrisk ~ lgdpl,
Y0 = 't.0',
LY = 'lastyear',
S = 'sp_id' ,
data = walter[[1]],
N = 15000,
burn = 5000,
thin = 15,
w = c(1,1,1),
m = 10,
ini.beta = 0,
ini.gamma = 0,
ini.W = 0,
ini.V= 0,
form = "Weibull",
prop.varV = 1e-05,
prop.varW = 1e-05,
id_WV=country)
Again, users can assess either single or multiple (parallel) Markov
chains with different starting parameter values when fitting the
Exchangeable SP survival model. We report the single-chain results here
for simplicity. The run time for the above specification was
approximately 16 minutes and 30 seconds. The single-chain results and
model fit statistics are obtained with the print()
and SPstats()
functions.
print(model1)
Call:
exchangeSPsurv(duration = duration ~ victory + comprehensive +
lgdpl + unpko, immune = atrisk ~ lgdpl, Y0 = "t.0",
LY = "lastyear", S = "sp_id", data = walter[[1]],
N = 15000, burn = 5000, thin = 15, w = c(1, 1, 1), m = 10,
ini.beta = 0, ini.gamma = 0, ini.W = 0, ini.V = 0, form = "Weibull",
prop.varV = 1e-05, prop.varW = 1e-05, id_WV = country)
Iterations = 1:666
Thinning interval = 1
Number of chains = 1
Sample size per chain = 666
Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Duration equation:
Mean SD Naive SE Time-series SE
(Intercept) 0.31054460 1.0938158 0.042384501 0.08335381
victory 0.14520456 0.5328378 0.020647044 0.02064704
comprehensive 0.09721372 0.6374526 0.024700786 0.02470079
lgdpl 0.48473536 0.1455731 0.005640844 0.01061988
unpko 0.44731384 0.7421925 0.028759376 0.02875938
Immune equation:
Mean SD Naive SE Time-series SE
(Intercept) -0.8993176 5.924550 0.2295716 0.4969172
lgdpl -3.7286951 5.846367 0.2265421 0.5271203
SPstats(model1)
$DIC
[1] 3549.075
$Loglik
[1] 7506.608
Based on these results, lgdpl
is still a highly reliable predictor of
longer peace duration among conflicts at risk of recurrence, while the
other survival stage covariates remain positive but statistically
unreliable. In contrast to the spatial frailty model, however, the
effect of lgdpl
on the probability of peace consolidation is now
statistically unreliable. Users interested in displaying and
interpreting the exchangeable frailties can do so in a variety of ways,
but if these results are being compared to those of a spatial frailty
model, we recommend mapping the exchangeable frailty values and
comparing the clustering (or lack thereof) of the i.i.d. frailty terms
to those produced from the spatial model (Darmofal 2009).
Finally, the pooledSPsurv()
function fits the Pooled Bayesian SP
survival model. For this application, we again assign the MVN prior to
the model’s
set.seed(123456)
model2 <- pooledSPsurv(duration = duration ~ victory + comprehensive + lgdpl + unpko,
immune = atrisk ~ lgdpl,
Y0 = 't.0',
LY = 'lastyear',
data = walter[[1]],
N = 15000,
burn = 5000,
thin = 15,
w = c(1,1,1),
m = 10,
ini.beta = 0,
ini.gamma = 0,
form = "Weibull")
Note that the MCMC sampler is almost identical to that of the previous two models, but we now exclude the separate Metropolis-Hastings proposal step for the frailities in the MCMC algorithm as well as the initial values for the frailty estimates since this is a non-frailty model. The run time for the Pooled model was approximately 10 minutes. The single-chain output is again easily displayed.
print(model2)
Call:
pooledSPsurv(duration = duration ~ victory + comprehensive +
lgdpl + unpko, immune = atrisk ~ lgdpl, Y0 = "t.0",
LY = "lastyear", data = walter[[1]], N = 15000, burn = 5000,
thin = 15, w = c(1, 1, 1), m = 10, ini.beta = 0, ini.gamma = 0,
form = "Weibull")
Iterations = 1:666
Thinning interval = 1
Number of chains = 1
Sample size per chain = 666
Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Duration equation:
Mean SD Naive SE Time-series SE
(Intercept) 0.3014335 1.0572594 0.040967971 0.07197438
victory 0.1060453 0.5485730 0.021256770 0.02125677
comprehensive 0.1900127 0.6054421 0.023460406 0.02481878
lgdpl 0.4647260 0.1552465 0.006015679 0.01166616
unpko 0.2692275 0.7740804 0.029995006 0.02999501
Immune equation:
Mean SD Naive SE Time-series SE
(Intercept) -1.213902 9.617551 0.3726725 1.3195278
lgdpl -2.896038 4.939374 0.1913968 0.5267751
SPstats(model2)
$DIC
[1] 1804.527
$Loglik
[1] 6229.248
For both the Exchangeable and Pooled SP survival models, users can again
employ the code and routines available in our GitHub repository to view
the trace plots from the single or multiple Markov chains associated
with the posterior densities for the
Survival data often include two populations with different underlying risk propensities: the immune fraction of right-censored units that will never experience the event of interest and the at-risk fraction of units that have or will. Numerous R-packages such as smcure, nltm, and spduration have been developed to fit parametric or semi-parametric cure models for nonspatial survival data, but none of these packages allow the user to account for spatial autocorrelation in both stages. A separate set of packages allows the user to include spatially weighted frailties in conventional survival models (Taylor and Rowlingson 2017; Zhou et al. 2020), but spatial autocorrelation may also influence the probability of immunity from an event of interest. The BayesSPsurv package addresses this lacuna by offering a suite of functions to fit Bayesian SP survival models. Specifically, users can estimate Bayesian Pooled, Exchangeable, and Spatial frailty SP survival models with time-varying covariates, specify their own spatial weights matrices, and easily examine diagnostic statistics. The applied potential of the Spatial SP survival model, in particular, is considerable, as researchers studying anything from the survival of cancer patients to the survival of political regimes may have a methodological need to model spatial autocorrelation in the immune fraction.
Future work can build upon the spatial and nonspatial cure models
presented here to develop estimation routines for survival data with
multiple stages, competing risks, or recurrent events. Moreover,
although BayesSPsurv is the first to allow spatially autocorrelated
frailties in both stages, future work can extend the frailty models in
our package by employing a similar approach as Yin (2005) to incorporate
shared frailties with factor loadings to account for any correlation
between the frailty terms (
BayesSPsurv, survival, dynsurv, smcure, nltm, flexsurvcure, spduration, BayesX, R2BayesX, spBayesSurv, spatsurv, Rcpp, coda, doParallel, doRNG, countrycode, rworldmap
Bayesian, ClinicalTrials, Econometrics, GraphicalModels, HighPerformanceComputing, MixedModels, NumericalMathematics, OfficialStatistics, Spatial, Survival
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bolte, et al., "BayesSPsurv: An R Package to Estimate Bayesian (Spatial) Split-Population Survival Models", The R Journal, 2021
BibTeX citation
@article{RJ-2021-068, author = {Bolte, Brandon and Schmidt, Nicolás and Béjar, Sergio and Huynh, Nguyen and Mukherjee, Bumba}, title = {BayesSPsurv: An R Package to Estimate Bayesian (Spatial) Split-Population Survival Models}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {1}, issn = {2073-4859}, pages = {595-613} }