BayesSPsurv: An R Package to Estimate Bayesian (Spatial) Split-Population Survival Models

Survival data often include a fraction of units that are susceptible to an event of interest as well as a fraction of “immune” units. In many applications, spatial clustering in unobserved risk factors across nearby units can also affect their survival rates and odds of becoming immune. To address these methodological challenges, this article introduces our BayesSPsurv R-package, which fits parametric Bayesian Spatial split-population survival (cure) models that can account for spatial autocorrelation in both subpopulations of the user’s time-to-event data. Spatial autocorrelation is modeled with spatially weighted frailties, which are estimated using a conditionally autoregressive prior. The user can also fit parametric cure models with or without non-spatial i.i.d. frailties, and each model can incorporate time-varying covariates. BayesSPsurv also includes various functions to conduct pre-estimation spatial autocorrelation tests, visualize results, and assess model performance, all of which are illustrated using data on post-civil war peace survival.


Introduction
Conventional survival models have been applied to analyze time-to-event data across several academic disciplines, but these models rely on two core assumptions that are not always tenable. The first is that all units, including right-censored observations, will eventually experience the event of interest. In many applications, however, some fraction of subjects that are "immune" or "cured" may never experience the event (Maller and Zhou, 1996;Peng and Taylor, 2014;Beger et al., 2017). A clinical study of obesity on human death rates can employ a standard survival model because all subjects will eventually die, but a study on vaccine effectiveness will likely assume that some fraction of the treated population will become immune while others may respond differently and remain uncured. To account for both subpopulations, scholars have developed a class of split-population (SP) survival models that probabilistically separate the immune fraction from the units that are susceptible to the event of interest and then estimate the conditional hazard of survival among the latter units (Cai et al., 2012;Beger et al., 2018;Box-Steffensmeier and Zorn, 1999;Box-Steffensmeier and Jones, 2004). A second assumption of conventional survival models is that each observation is conditionally independent after controlling for covariates. This assumption is violated if spatially clustered units share common unobserved features that influence their baseline risk of experiencing an event (Darmofal, 2009;Taylor and Rowlingson, 2017). If spatial autocorrelation matters for process survival and/or the probability of being immune to an event, ignoring it can lead to biased parameter estimates.
Although there are numerous methods for dealing with spatially autocorrelated time-to-event data (e.g. Li and Ryan, 2002;Banerjee et al., 2003;Henderson et al., 2002;, these spatial survival models do not differentiate between at-risk and immune observations. Conversely, existing SP survival models either ignore spatial autocorrelation altogether or only account for it in the survival stage (Banerjee and Carlin, 2004). We present the BayesSPsurv (Bolte et al., 2021b) R-package (available at https://CRAN.R-project.org/package=BayesSPsurv), which has the power to overcome these limitations as well as the flexibility for researchers to model spatial clustering in their survival data however they choose to define it. In particular, BayesSPsurv allows the user to estimate parametric Bayesian Spatial split-population survival (cure) models with spatial frailties in both the model's split and survival stages. These models account for spatial clustering in the immune and at-risk fractions of the data. The package also includes functions and code for pre-estimation spatial autocorrelation diagnostics, visualizing results, simulating multiple Markov chains, and implementing Markov Chain Monte Carlo (MCMC) estimation routines for SP survival models with independent (exchangeable) or no random effects. The user can also include time-varying covariates in either stage. In the next section, we outline previous work on SP and spatial survival models, including existing packages and their limitations. We then formally develop the Pooled, Exchangeable, and Spatial Bayesian SP survival models before describing the various functions available in the BayesSPsurv package. We demonstrate these functions using replication data from a published study on the survival of post-civil war peace.

Background and other R-packages
Scholars have identified at least two sources of conditional heterogeneity in survival data that has been separately addressed. The first occurs when a subset of the units in the data are immune from experiencing a "failure" event, which violates the assumption that even right-censored observations experience the event of interest (Box-Steffensmeier and Zorn, 1999;Box-Steffensmeier and Jones, 2004;Beger et al., 2017). Cure rate or split-population (SP) survival models account for this immune fraction by first estimating the probability that right-censored units are immune from an event in a "split-stage" and then the time until at-risk censored and non-censored units experience the event. Recent work has extended these models to include i.i.d. frailties (Peng and Taylor, 2011), time-varying covariates (Beger et al., 2017), and account for random right-censoring (Patilea and Van Keilegom, 2020). Splitpopulation survival models have been used to study phenomena, ranging from the survival of breast cancer patients (Wang et al., 2005), criminal recidivism (Schmidt and Witte, 1989), the risk of coups (Beger et al., 2017), and parasite-induced mortality in river salmon (Ray et al., 2014).
Separately, conventional survival models have been extended to account for spatial autocorrelation among nearby units (Banerjee et al., 2003;Taylor and Rowlingson, 2017;. These models relax the assumption of spatial independence by incorporating spatially weighted frailties into the survival model's baseline hazard function. This allows for the possibility that adjacent units share unmodeled risk factors that influence their underlying propensities for experiencing a failure event. Apart from recent advances in modeling spatial frailties in different survival frameworks (e.g. Diva et al., 2008;, spatial survival models have been applied to analyze, for example, geographically referenced data on leukemia survival (Henderson et al., 2002), position announcements by U.S. House members (Darmofal, 2009), prostate cancer , and fire service response times (Taylor and Rowlingson, 2017). Despite these considerable methodological developments, far less attention has been dedicated to accounting for spatial autocorrelation in SP survival settings. This is surprising because spatial autocorrelation between units may influence their probability of being immune and the survival rate among the units at-risk of experiencing the failure event simultaneously. Banerjee and Carlin (2004) develop Bayesian spatial cure models, but focus on modeling spatial autocorrelation in the survival stage using a conditionally autoregressive (CAR) prior. In fact, these authors emphasize that future research must "include covariates and spatial random effects as regressors in the cure rate portion of the model, instead of just the log-relative risk portion" (274).
In line with these trends in the literature, some R packages fit standard and split-population survival models but do not allow for the incorporation of spatial information. The survival package (Therneau, 2020) fits parametric and semi-parametric Cox survival models via MLE, whereas dynsurv (Wang et al., 2020) fits the Cox Proportional Hazards (PH) model with dynamic coefficients using MCMC methods. Conventional semi-parametric cure models can be estimated via MLE with the smcure (Cai et al., 2012) or nltm (Garibotti et al., 2019) package. The flexsurvcure (Amdahl, 2020) package fits parametric mixture and non-mixture cure models for time-to-event data, and the spduration package fits parametric SP survival models with time-varying covariates (Beger et al., 2018).
A small handful of packages allow the user to incorporate spatial information into their survival models, but never in split-population settings. The BayesX package (Umlauf et al., 2019) and its associated interface to R R2BayesX (Belitz et al., 2017) fits spatial survival models and structured additive regression models with spatial frailties. The spBayesSurv package  fits several Bayesian survival models with spatial frailties that can be formulated on a PH, proportional odds, or AFT scale, and all can include time-varying covariates. The spatsurv package also fits Bayesian spatial survival models, including a PH model that permits users to incorporate Gaussian process frailties (Taylor and Rowlingson, 2020). Beyond R, WinBUGS and GeoBUGs code has been developed to fit, for instance, survival and non-mixture cure models with spatial frailties (Banerjee et al., 2003Thomas et al., 2004).
To our knowledge, our BayesSPsurv package is the first to allow users to fit parametric splitpopulation survival models with not just time-varying covariates but also spatial frailties in both stages. The frailties in the parametric Bayesian spatial SP model are estimated using the CAR prior approach (Besag et al., 1991;Bernardinelli et al., 1995;Banerjee et al., 2003;Banerjee and Carlin, 2004). BayesSPsurv also includes functions and routines coded in C++ to fit non-spatial parametric SP survival models with exchangeable frailties in the model's split and survival-stage equation and without any frailties. Statistical inference of the models in BayesSPsurv is conducted via combined MCMC techniques that require little input from users. Our package and supplemental code also provide functions to implement spatial autocorrelation tests, produce country-level adjacency matrices, generate and compare multiple Markov chains, assess convergence, and conduct model comparison. Before outlining the functionality of the package in greater detail, we turn to briefly develop each of the three included Bayesian split-population survival models.

Model development
Define i = {1, 2, ...N} for the units that may fail or experience an event of interest in a continuous time survival dataset. Let f (t) and F(t) represent the probability density function and cumulative distribution function. The survival distribution is S(t) = 1 − F(t), and the hazard rate is h(t) = f (t) S(t) . Some units will fail during the time period under observation ( C i = 1), while others do not and are "censored" ( C i = 0). The general likelihood of the conventional survival model in which all units eventually experience the event of interest is, Suppose that the survival data includes two subpopulations: an "at-risk " fraction that can fail and an "immune" fraction that will not experience the (failure) event of interest (Maller and Zhou, 1996;Yin and Ibrahim, 2005;Beger et al., 2017). When presented with this data generation process, researchers typically employ split-population survival (cure) models with or without unit-specific frailties to simultaneously estimate the probability of observations being in the immune fraction and the effect of covariates on the hazard of survival among the at-risk fraction (Maller and Zhou, 1996;Lu, 2010;Peng and Taylor, 2014).
To understand these models in more detail, consider the split-population survival model for the duration t that splits the sample into an at-risk and an immune fraction. Let α i = Pr(Y i = 1) be the probability with which units enter the immune fraction. α i can be estimated via a binary response function and is defined for the logit case as: where Z i are p2-dimensional covariates, γ the parameter vector in R p 2 , and V i ∼ N(0, σ 2 ) are the non-spatial i.i.d unit-specific frailties (random effects). Equation 2 is the split-population model's split-stage equation, where the unit-specific frailties V i , which are each independent of other individual random effects, account for unobserved heterogeneity that influences probability α i . Let W i ∼ N(0, σ 2 ) denote the non-spatial i.i.d unit-specific frailties that capture the possibility that some units are at different risks of experiencing the event of interest due to unobserved factors. The proportional hazards function of the SP survival model with non-spatial unit-specific frailties is, where h 0 (t i ) is the baseline hazard (e.g. Weibull, log-logistic), log ω i = W i , X i is the p1-dimensional covariates, and β the parameter vector in R p 1 . We focus on incorporating unit-specific frailty terms generally because they are most commonly used in the social sciences, but our approach could plausibly be extended to a shared frailty framework if the researcher believes that the frailties occur in clusters such that within-cluster frailties are correlated and while frailties between clusters are independent.
Suppose, however, that the survival data with the two aforementioned subpopulations should be fit with time-varying covariates. We can re-define this data with unique "entry time" duration as t0 and "exit time" as duration t for each period at which an observation is observed. Let t0 ij denote unit i's elapsed time since inception until the beginning of time period j, t ij the elapsed time since that unit's inception until the end of period j, andC ij = 1 if that unit fails or is censored (C ij = 0) at t ij . The probability of survival up until period j is now S(t0) = 1 − F(t0) where F (t0) = t0 0 f (t0). In this case, both subpopulations contribute to the log-likelihood of the split-population survival model with non-spatial i.i.d frailties as: where the "split-stage" equation is . V i and W i are the non-spatial frailties. The model's survival stage estimates the probability of survival prior to the event of interest conditional upon being at-risk for that event given covariates X ij and the baseline hazard function. If V i = W i = 0, then (4) defines the log-likelihood of the "Pooled" SP survival model (without unit-specific frailties) with time-varying covariates (Ibrahim et al., 2001;Lu, 2010). However, if unobserved unit-specific heterogeneity influences the probability of immunity or survival time, it can be accounted for with the split and survival-stage frailty terms (V i and W i ). In a Bayesian split-population survival framework, these frailties are incorporated into each stage of the model using the exchangeable normal prior, with τ as the precision parameter (Banerjee et al., 2003;Banerjee and Carlin, 2004). The prior in (5) is induced by treating each specified unit as exchangeable rather than assigning weights corresponding to each unit's spatial relationship to one another (Bernardinelli and Montomoli, 1992;Darmofal, 2009). This Exchangeable split-population survival model is appropriate if each unit's frailty is presumed to be independent from other individual random effects. Geographically, for instance, this means that the influence of each unit-specific frailty on that unit's probability of being immune or it's risk propensity is completely unrelated to the neighboring units' frailties unobserved effects.
Suppose, however, that independence among the frailties cannot be assumed-that is, that the frailties exhibit spatial autocorrelation or clustering that influences each units' propensity for being immune to an event of interest and their survival time if they are not immune. In a Bayesian splitpopulation survival model, the assumption of spatial independence is relaxed by assigning spatial weights to the unit-specific frailties in the model's split and survival stage, and then statistically incorporating these spatially weighted frailties via the conditionally autoregressive (CAR) prior approach (Besag et al., 1991;Banerjee et al., 2003). The CAR prior accounts for spatially autocorrelated frailties by allowing the frailties to be spatially autocorrelated across geographically adjacent units.
To understand how the CAR prior is applied, first note that spatial data often take the form of a lattice in which a continuous spatial surface is divided into a grid of units such as counties, districts, or countries. The spatially weighted frailties are constructed by defining the relevant spatial relationship among adjacent units (this could, for example, be geographic distance or contiguity) in an adjacency matrix A with elements a ii . Each element a ii in A is given a weight of 1 if units i and i are "neighbors," and 0 if they are not. Once these spatial weights are defined via the matrix A, this information is then incorporated into the CAR prior, which permits us to model spatially dependent frailties between these contiguous units. To employ the CAR prior approach in a Bayesian SP survival framework, the frailties V i are collected into the vector V = {V 1 , ...., V N }, and W i into W = {W 1 ,....,W N }. Separate CAR priors are then employed for V and W, which implies the following model structure: λ is the precision parameter (Besag et al., 1991;Banerjee and Carlin, 2004). The CAR(λ) prior for V and W has a joint distribution in each case that has been formally characterized by scholars (Banerjee et al., 2003, 126).
The resulting conditional distributions of the spatial frailties for V and W are are the averages of the neighboring W i =i and V i =i respectively where i adj i denotes that i is adjacent to i given A, and m i is the number of these adjacencies (Bernardinelli and Montomoli, 1992, 989;Thomas et al., 2004;Banerjee et al., 2003). Incorporating the spatial information in A in this way accounts for the possibility that spatially proximate units share common unmodeled factors that influence their probability of being immune or their survival time before experiencing the event of interest. Using this CAR prior approach to address spatial autocorrelation, the Spatial split-population survival model's log-likelihood is defined The log-likelihood of the Pooled (non-frailty), Exchangeable (non-spatial frailty), and Spatial split-population (SP) survival models are compatible with any survival distribution. The BayesSPsurv package, however, supports MCMC estimation of these models for the Weibull and log-logistic distributions. Our empirical application below focuses on the Weibull survival distribution. The density, survival, and hazard rate in the Weibull case are where θ = exp X ij β, W for the Spatial, θ = exp X ij β + W i for the Exchangeable and θ = exp X ij β for the Pooled split-population Weibull model. The density, survival function and the hazard rate for the log-logistic case is defined in Bolte et al. (2021a).

Bayesian inference
Following standard practice for Bayesian inference (Carlin and Louis, 2000), we assign the multivariate normal (MVN) prior to β = {β 1 , ..., β p 1 } and γ = {γ 1 , ..., γ p 2 }, and the Gamma prior for ρ with shape and scale parameters a ρ and b ρ for each of the three Bayesian split-population survival models in the BayesSPsurv package: where a ρ , b ρ , S β , ν β , S γ , ν γ are the hyperparameters. We use Bayesian hierarchical modeling to estimate Σ β and Σ γ employing the Inverse-Wishart (IW) distribution when using the MVN (a weakly informative) prior. For Bayesian MCMC estimation of the Spatial SP survival (Weibull) model, we assign the hyperprior p(λ) to λ given the CAR prior approach. Specifically, we assign the Gamma hyperprior λ ∼ Gamma(a λ , b λ ) for λ (Banerjee and Carlin, 2004;Darmofal, 2009). 1 To estimate the Exchangeable SP Weibull model, we assign the (multivariate) normal prior for the model's split and survival-stage frailties (V i , W i ), and the priors defined in (9) for β, γ,and ρ. To identify the Exchangeable and Spatial SP models' intercepts, we impose the constraint that the frailties sum to zero (∑ i V i = 0 and ∑ i W i = 0).
The three split-population survival models in BayesSPsurv are each estimated with an MCMC algorithm for Bayesian inference. Because closed forms for the posterior distributions of β, γ and ρ are not available, these parameters in each model are updated in the MCMC algorithm via slice-sampling (with stepout and shrinkage) from their respective full conditional distribution. The closed form of the full conditional distributions (e.g., π Σ β |β and π (Σ γ |γ)) and details about slice-sampling is provided in Bolte et al. (2021a). 2 Further, because the closed forms for the posterior distributions of λ, W and V are not available for the Spatial split-population survival model, our MCMC algorithm incorporates Gibbs Sampling for estimating λ. Our MCMC update scheme then employs the Metropolis-Hastings algorithm for estimating V given λ, and then uses the Metropolis-Hastings algorithm to estimate W given λ. 3

Using the BayesSPsurv R package
Each of the three Bayesian SP survival models in BayesSPsurv incorporates a cure rate fraction and assumes that the time-to-event baseline hazard follows a Weibull or log-logistic distribution (which the user specifies). Users can also incorporate time-varying covariates in either stage. BayesSPsurv contains compiled C++ code using the package Rcpp (Eddelbuettel et al., 2020) to maximize computational efficiency when estimating the included Bayesian SP survival models. In addition to the pre-estimation spatial autocorrelation (Join Count and Global Moran's I) tests described below, the BayesSPsurv package also permits users to calculate the deviance information criterion (DIC) and log-likelihood statistics from the Spatial, Exchangeable, and Pooled SP survival models' MCMC output. The DIC is a measure of model fit that also penalizes the effective number of parameters. Like the Akaike Information Criterion, models with smaller values are preferable to those with larger values. Users can also conduct MCMC diagnostics with various extant packages designed to handle mcmc objects such as coda (Plummer et al., 2020).

Loading the package, dataset, and assessing spatial autocorrelation
To demonstrate the utility and various functions in BayesSPsurv, we use replication data from Walter's (2015) global study on post-civil war peace duration (denoted in the package as Walter_2015_JCR). Walter's (2015) panel data consist of 1,237 observations from 46 countries observed between 1962 and 2009. As discussed in Bolte et al. (2021a), her data are well-suited for SP survival analysis because they include two underlying populations: an "at-risk" fraction of countries in which civil wars can potentially recur, and an "immune" fraction of countries in which civil conflict recurrence is structurally improbable. The Walter_2015_JCR data are a subset of Walter's (2015) most important variables, including lgdpl for log per capita income, unpko for the presence of UN peacekeeping operations, the binary variable victory, coded as 1 when one side in the civil war wins the conflict militarily, and the dummy variable comprehensive, coded as 1 when the combatants sign a comprehensive peace agreement. The dataset also includes the binary indicator renewed_war, coded as 1 for the year in which a civil war recurs and 0 otherwise. This variable will serve as our failure event.
First, however, we load the BayesSPsurv package along with the dataset and then use the add_duration() function from the spduration package to add several variables that allow us to capture the survival characteristics of the data (Beger et al., 2017). unitID indicates the unique unit identifier (in this case, a unique id for each civil conflict), and timeID specifies the temporal variable. library(BayesSPsurv) data(Walter_2015_JCR) walter <-spduration::add_duration(Walter_2015_JCR,"renewed_war", unitID = "id", tID = "year", freq = "year", ongoing = FALSE) str(walter) data.frame : 1237 obs. of 21 variables: These new variables include duration, a cumulative count of the years of post-war peace survived, and the dummy variable atrisk, coded as 1 for all observations that eventually experience war recurrence in the sample period. Altogether the data include 77 post-civil war peace spells and 24 instances of civil war recurrence.
If the preferred frailty unit is at the country-level, users can take advantage of the spatial_SA() function in the BayesSPsurv package to generate their binary spatial weights matrix. In most cases, analysts define spatial clustering in terms of geographic proximity, but this often requires the researcher to define some maximum distance threshold below which two units are considered "neighbors." spatial_SA() allows users to specify their own distance threshold. For this example, we use the spatial_SA() function to generate an adjacency matrix of countries in the data whereby "proximity" (a ii = 1) is defined as having capitals that are within 800 km of each other (and a ii = 0 otherwise). We simply specify the unique identifier for each country (ccode) and our distance threshold of 800 km. The spatial_SA() function will produce a 1 × N vector with identifying information (e.g. country ID) for each observation that matches the rows and columns of the matrix. The result is a list object called walter that includes both the original data frame and the associated adjacency matrix. Note that spatial adjacency between units does not need to be defined geographically; users can conceptualize "space" as any form of dyadic relationship between units, but typically spatial clustering is substantively captured with some measure of geographic distance. Users can also create their own adjacency matrix from scratch to incorporate into the Bayesian estimation routine if their units of analysis are something other than countries.
Having generated a matrix that records the spatial relationship of all pairs of units, we may now be interested in assessing the presence and degree of spatial autocorrelation in the data with respect to our outcomes of interest. The BayesSPsurv package provides two functions to conduct these preliminary tests on country-level data. The first is the plot_JoinCount() function, which generates an adjacency matrix with a user-defined distance threshold (as above), implements the join count test for each cross-section in the data, and then automatically plots the test statistics with user-specified confidence intervals across each observed year. The join count test is a widely used correlational statistic for evaluating whether the expected count of categorically distinct adjacent units is greater than what we would expect by chance alone (Cliff and Ord, 1981). More formally, if we assume two categories, Black and White, then the join count test statistic is: where BW and E(BW) are the observed and expected counts of adjacent Black and White units, respectively, and σ 2 BW = E BW 2 − E (BW) 2 . We use this test as a preliminary exercise to determine whether countries that never experience conflict recurrence in the sample exhibit spatial autocorrelation. Note that plot_JoinCount also has an argument specifying the minimum number of units that must be in a cross-section for the test statistic to be calculated. In our case, there were fewer than 12 countries in every year prior to 1975 in the data, and none were geographically proximate. By specifying n=12, we are simply dropping years prior to 1975. The following code produces the plot in Negative values indicate clustering (positive spatial autocorrelation), and positive values indicate spatial dispersion. Figure 1a clearly depicts spatially correlated patterns of potential "peace consolidation," though the direction of the autocorrelation varies over time.
The second function, plot_Moran.I(), implements the Global Moran's I test, which assesses the direction and degree of spatial autocorrelation in continuous or ordinal data (Moran, 1950;Paradis et al., 2020). We use this test to examine whether the duration of post-civil war peace among geographically proximate countries exhibits spatial autocorrelation (or dispersion). Like the previous function, plot_Moran.I() plots the Moran's I values based on user-defined adjacencies and, in this case, 90% confidence intervals for each year.
plot_Moran.I(data = walter [[1]], var_duration = "duration", var_id = "ccode", var_time = "year", n = 12) Figure 1b reveals that the spatial relationship of post-war peace survival oscillates between positive autocorrelation (in the 1970s, early 1990s, and late 2000s) and spatial dispersion (in much of the 1980s and late 1990s). Taken together, the Join Count and Moran's I tests suggest that unobserved heterogenous risk factors that transcend the borders of a single state may lead to spatial autocorrelation in both the consolidation and duration of post-war peace in Walter's (2015) data. This suggests that a Spatial SP survival model is an appropriate method of analysis, particularly if proximate countries share common unobservable risk factors that affect their odds of being immune to war recurrence or the time it takes for renewed war to occur.

Applying the Bayesian Spatial SP survival model
The spatialSPsurv() function fits the Bayesian Spatial SP survival model, which includes spatially autocorrelated frailties in the model's split and survival stage and can incorporate time-varying covariates. We illustrate the features of the spatialSPsurv() function by fitting the Bayesian Spatial Weibull cure model on Walter's (2015) data on post-war peace. 4 The log-logistic results are reported in Bolte et al. (2021a). We include the lgdpl variable in the estimated model's split-stage equation, which estimates the probability of units being in the "consolidated" post-war peace group (the "immune" fraction), though it is important to note that atrisk is used here as the dependent variable. We estimate the probability of post-war survival among the at-risk fraction as a function of U.N. peacekeeping (unpko), outright military victory in the previous war (victory), the resolution of the previous war via peace agreement (comprehensive), and, again, GDP/capita (lgdpl). The spatial relationship of each country is incorporated into the model via the spatial weights matrix generated from the spatial_SA() function described earlier.
The Spatial SP survival model is fit via Bayesian methods for inference. The MVN prior is used for the model's split-stage (γ) and survival-stage (β) parameters (with the following hyperparameters s β = I p1 , v β = p1, s γ = I p2 , v γ = p2). The Gamma prior is used for the hazard's shape parameter ρ (with hyperparameters aρ = 1, bρ = 1), and separate CAR priors-with the Gamma hyperprior assigned to λ for CAR (λ) 5 -incorporate the spatially autocorrelated frailties in the model's split-stage (V) and survival stage (W). Estimation proceeds via the MCMC algorithm described in the previous section. To this end, we specify the MCMC sampler as follows: the argument N = 15,000 sets the number of MCMC iterations to 15,000, burn = 5,000 discards the first 5,000 states of the Markov chain, thin = 15 specifies the number of steps that are employed to prevent autocorrelation, and m = 10 limits the steps in the slice sampling to 10. The argument w = c(1,1,1) specifies the default values of the size of the slice sampling for γ, β, and ρ.
For simplicity, we retain the default 0 as the initial value for all parameters with the ini.beta, ini.gamma, ini.W, and ini.V arguments. We incorporate separate proposal variances, namely prop.varV = 1e-05 and prop.varW = 1e-05, for the split and survival stage frailties (V, W), and then assign separate Metropolis-Hastings proposal steps for estimating these parameters to optimize the acceptance rates. To specify the covariates in the model, duration~precedes the survival-stage covariates, while immune~precedes the split-stage variables. Y0 indicates the time elapsed since inception until time t -1, while LY is a dummy that captures the last observation year due to censoring or failure. A = walter [[2]] calls the spatial weights matrix from our list object. S= sp_id (generated by spatial_SA()) gives the unit identifier in the spatial weights matrix that matches the units in the data frame. The argument form is used to specify the parametric survival distribution (which can be either "Weibull" or "loglog"). Importantly, the model functions in the BayesSPsurv package generate single chains for each parameter, but we provide a routine for estimating multiple chains in parallel on the GitHub repository for the package (https://github.com/Nicolas-Schmidt/BayesSPsurv). We first discuss the results and diagnostics for the single chain results using the above MCMC specification, and then illustrate the multiple chain results and diagnostics for the Spatial SP survival model. The function automatically provides a progress bar for users to track the percent of the computation that has completed. The elapsed run time for the Spatial SP Weibull specification being examined here was just under 23 minutes (calculated with system.time()). Once completed, the generic print() function displays the results 6 : print (  The posterior mean estimates suggest that peace agreements, military victory, and peacekeeping units may increase the survival of post-civil war peace, though none of these parameters are highly reliable given their 95% BCIs, as reported in (Bolte et al., 2021a). However, lgdpl, appears to reliably increase the survival of peace among at-risk cases, while also reliably decreasing the probability of peace "consolidation" in the split-stage. Calling the SPstats() function calculates the Deviance Information Criterion (DIC) and log-likelihood (LL) statistics from the estimated model, where DIC = −2 × (L − P), L is the log-likelihood of the data given the posterior means of the covariate parameters, and P is the effective number of parameters in the model: The spatialSPsurv() function produces an object of class "mcmc", which is compatible with all standard summary and diagnostic methods for the single Markov chains in the coda package (Plummer et al., 2020). For instance, we can test for convergence with both the Geweke (1992) test using the geweke.diag() function and the Heidelberger and Welch (1983) stationarity test via the heidel.diag() function in coda. We report the Heidelberger and Welch (1983) stationarity test results in Bolte et al. (2021a) to save space. In this case, the Geweke tests indicate little evidence against convergence for each split-stage and survival-stage covariate, because the test statistics are reasonably close to zero. Fraction in 1st window = 0.1 Fraction in 2nd window = 0.5 var1 0.3544 coda's plot() function can then be used to display trace and posterior density plots for the single Markov chains of the model's split-stage γ, survival-stage β, and ρ parameters (which we do not report here to save space). Users may also wish to validate MCMC convergence using multiple chains with different initial values for each parameter rather than only examining single-chain diagnostics. A routine for generating multiple chains in parallel using the doParallel (Wallig et al., 2020) and doRNG (Gaujoux, 2020) packages is available on the BayesSPsurv GitHub repository, though single chains with different starting values can also be run sequentially. The user can then collect them into a single mcmc.list object and easily plot the trace and density plots of the multiple chains. Using the code on GitHub, the Spatial SP survival model was re-run four times (to generate four chains) with 0, 1, 10, and 50 as the initial starting values for each of the estimated parameters (β, γ, W, and V). The trace and density plots for the multiple chains are reported below (we excluded the β intercept for aesthetic reasons), and the total run time was approximately 28 minutes for all four chains to be generated simultaneously: par(mfrow=c(2,2)) plot (gammas,density = FALSE,auto.layout = FALSE,col =c (28,26,27,29)) plot(gammas, trace = FALSE, auto.layout = FALSE) par(mfrow=c(2,4)) plot(betas [,2:5],density= FALSE,auto.layout = FALSE,col = c(28,26,27,29)) plot(betas[,2:5], trace= FALSE, auto.layout = FALSE) The trace plots for the Spatial SP Weibull model's β and γ reveal decent mixing between the first three chains, though the fourth chain appears to take longer to converge. This is unsurprising given its extreme initial value. We can assess the convergence of the multiple chains using the Gelman-Rubin diagnostic, which compares the within-chain variance to the between-chain variance (Gelman and Rubin, 1992). If the difference in these variances is large, then the multiple chains likely have not converged to a proper stationary distribution. We can easily calculate the potential scale reduction  The multivariate PSRF is less than 1.1 in both stages, but the point estimates for all parameters are still insufficiently far from 1, suggesting that extending the chains would likely improve the accuracy of the estimates.

Applying the Bayesian Exchangeable and Pooled SP survival models
The BayesSPsurv package can also be used to estimate Bayesian Exchangeable and Pooled SP survival models. The Exchangeable model incorporates frailties that are assumed to be statistically independent in both stages, while the Pooled model excludes frailties altogether. We again use Walter's (2015) data on post-civil war peace to illustrate these models. Beginning with the Exchangeable Weibull SP survival model, we again assign the MVN prior to the model's γ and β parameters, the Gamma prior to ρ, and the same default settings for the hyperparameters (s β = I p1 , v β = p1, s γ = I p2 , v γ = p2, a ρ = 1, b ρ = 1). We also incorporate separate proposal variance variables for the nonspatial i.i.d frailties (V i ,W i ) and assign separate Metropolis-Hastings steps to estimate these parameters. Further, we use the same MCMC sampler specification as before but rather than incorporating spatial information via an adjacency matrix, we let id_WV=country to define the country-level nonspatial frailties: set.seed ( Based on these results, lgdpl is still a highly reliable predictor of longer peace duration among conflicts at risk of recurrence, while the other survival stage covariates remain positive but statistically unreliable. In contrast to the spatial frailty model, however, the effect of lgdpl on the probability of peace consolidation is now statistically unreliable. Users interested in displaying and interpreting the exchangeable frailties can do so in a variety of ways, but if these results are being compared to those of a spatial frailty model, we recommend mapping the exchangeable frailty values and comparing the clustering (or lack thereof) of the i.i.d. frailty terms to those produced from the spatial model (Darmofal, 2009).
Finally, the pooledSPsurv() function fits the Pooled Bayesian SP survival model. For this application, we again assign the MVN prior to the model's γ and β parameters, the Gamma prior to ρ, and the same default settings for the hyperparameters described earlier.

Conclusion
Survival data often include two populations with different underlying risk propensities: the immune fraction of right-censored units that will never experience the event of interest and the at-risk fraction of units that have or will. Numerous R-packages such as smcure, nltm, and spduration have been developed to fit parametric or semi-parametric cure models for non-spatial survival data, but none of these packages allow the user to account for spatial autocorrelation in both stages. A separate set of packages allows the user to include spatially weighted frailties in conventional survival models (Taylor and Rowlingson, 2017; but spatial autocorrelation may also influence the probability of immunity from an event of interest. The BayesSPsurv package addresses this lacuna by offering a suite of functions to fit Bayesian SP survival models. Specifically, users can estimate Bayesian Pooled, Exchangeable, and Spatial frailty SP survival models with time-varying covariates, specify their own spatial weights matrices, and easily examine diagnostic statistics. The applied potential of the Spatial SP survival model in particular is considerable, as researchers studying anything from the survival of cancer patients to the survival of political regimes may have a methodological need to model spatial autocorrelation in the immune fraction. Future work can build upon the spatial and nonspatial cure models presented here to develop estimation routines for survival data with multiple stages, competing risks, or recurrent events. Moreover, although BayesSPsurv is the first to allow spatially autocorrelated frailties in both stages, future work can extend the frailty models in our package by employing a similar approach as Yin (2005) to incorporate shared frailties with factor loadings to account for any correlation between the frailty terms (W and V) in either stage. Subsequent iterations of the package will include alternative baseline hazards based on penalized splines or non-parametric Gaussian processes as well as the option to use a semi-parametric Cox model with exchangeable or spatial frailties. Implementing these future developments with an external MCMC sampling dependency built on C++ subroutines like STAN would be useful, given the Hamiltonian Monte Carlo algorithm's efficiency in sampling from high dimensional posterior distributions. We also plan to extend BayesSPsurv to accommodate left and interval censoring and delayed entry.