Mapping Smoothed Spatial Effect Estimates from Individual-Level Data: MapGAM

Abstract We introduce and illustrate the utility of MapGAM, a user-friendly R package that provides a unified framework for estimating, predicting and drawing inference on covariate-adjusted spatial effects using individual-level data. The package also facilitates visualization of spatial effects via automated mapping procedures. MapGAM estimates covariate-adjusted spatial associations with a univariate or survival outcome using generalized additive models that include a non-parametric bivariate smooth term of geolocation parameters. Estimation and mapping methods are implemented for continuous, discrete, and right-censored survival data. In the current manuscript, we summarize the methodology implemented in MapGAM and illustrate the package using two example simulated datasets: the first considering a case-control study design from the state of Massachusetts and the second considering right-censored survival data from California.


Introduction
In spatial epidemiology studies, mapping crude and adjusted spatial distributions of disease risk is a useful tool for identifying risk factors of public health concern (Elliott and Wartenberg, 2004). The underlying (or crude) geographic pattern of disease is often what is observed by public health practitioners, but these patterns may be due to important spatially-varying predictors such as socioeconomic status, race/ethnicity, or environmental exposures. Individual-level spatial analyses can provide insight regarding disease risk by adjusting for these variables without aggregation bias (also known as ecological bias). Disease risks often have complex spatial patterns that are subject to high variability due to sparsity. Smoothing provides an efficient method to deal with these issues by borrowing strength from adjacent observations to reduce variability while allowing for non-parametric flexibility when estimating the spatial distribution of risk. Generalized additive models (GAMs), originally proposed by Hastie and Tibshirani (1986), are common model-based approaches for mapping point-based epidemiologic data (Webster et al., 2006;Vieira et al., 2008;Baker et al., 2011;Akullian et al., 2014;Bristow et al., 2014;Hoffman et al., 2015). GAMs provide a unified statistical framework that allows for the adjustment of individual-level risk factors when evaluating spatial variability in a flexible way. The flexibility provided by GAMs, together with the intuitive nature of many smoothing techniques, make them an ideal choice for modeling complex spatial associations.
There are a number of R packages implementing GAMs and related models (R Core Team, 2015). The gam package (Hastie, 2004) provides an implementation of the GAM framework of Hastie and Tibshirani (1986) by providing two types of commonly used smoothing methods: cubic loess smoothing splines for univariate variables and local kernel smoothing (LOESS) for multivariate variables. The mgcv (Wood, 2009;Breslow and Clayton, 1993) package implements cubic smoothing splines and tensor product smooths, an extension of cubic splines to multi-dimentions. mgcv also provides various criterion to aid in the selection of model complexity via the choice of effective degrees of freedom and provides functions to fit generalized additive mixed effects models (GAMMs) for correlated data. Package gamlss (Rigby and Stasinopoulos, 2005;Stasinopoulos and Rigby, 2007) implements an extension of the GAM that incorporates selected distributions outside of the exponential family. With respect to censored survival data, parametric additive models can be fit using both the gamlss.cens package (Stasinopoulos et al., 2015) and the VGAM package (W., 2007). Bayesian inferences for the spatial analysis of survival data based on the parametric proportional hazards model are implemented in package spatsurv (Taylor et al., 2016;Taylor and Rowlingson, 2014). However, parametric models assume a full distribution of the survival times, and misspecifying the distribution may yield bias for estimates. Cox proportional hazards models, which are semi-parametric without specifying a form for underlying hazard function, are more robust for survival analysis including mulitple adjusted variables.
A variety of R packages incorporate Cox proportional hazards models and spatial smoothing term. The R interface to BayesX (Umlauf et al., 2015;Belitz et al., 2016;Kneib et al., 2014), R2BayesX , provides survival spatial analysis based on structured additive models (STAR) without specifying the baseline hazard. mboost (Hothorn et al., 2016) implements boosting for optimizing penalized likehood function, and as an extention of mboost, gamboostMSM (Reulen, 2014) provides estimates for multistate models. mgcv also incorporates coxph family in the model fitting. However these packages use cubic, B-and/or P-splines as smoothing methods; none offer LOESS. LOESS adapts to varying data densities by defining a local neighborhood based on a fixed proportion ("span") of the observations; this is especially useful for spatial analyses as population densities typically vary within any study region. Therefore, none of the above packages provide an implementation of the Cox proportional hazards additive model for censored survival data that allows for multivariate loess smoothing of covariates such as geolocation parameters, despite the fact that spatial effect estimation in the context of survival outcomes is of great interest in epidemiology studies (Henderson et al., 2002;Bristow et al., 2014). Moreover, displaying spatial predictions on a map with irregular geographic boundaries is a nontrivial effort, often handled by exporting statistical predictions to separate specialized geographic information system (GIS) software such as ArcGIS that requires a paid user license (Webster et al., 2006;Vieira et al., 2008) or by omitting geographic boundaries altogether (Akullian et al., 2014). At best, these limitations and complexities pose a significant barrier to researchers not already well versed in both GAMs and GIS methods and at worst may lead to reporting errors due to the inefficient transfer of estimates between separate software packages.
To address the above deficiencies of current software, MapGAM was built to provide a single R package that allows for estimating, predicting, and visualizing covariate-adjusted spatial effects using individual-level data. The package estimates covariate-adjusted spatial associations with a univariate or survival outcome via GAMs that include a non-parametric bivariate smooth term of geolocation parameters. Estimation and mapping methods are implemented for continuous, discrete, and right-censored survival data. In addition, support functions for efficient control sampling in case-control studies and inferential procedures for testing global and pointwise spatial effects are implemented. We have found that a unified system for estimating and visualizing covariate-adjusted spatial effects on outcomes arising from the most commonly encountered epidemiologic study designs greatly facilitates efficient and reproducible analyses in these settings.
This article serves as an introduction and illustration of the MapGAM package. The remainder of the manuscript is organized as follows: Section 2.2 provides an overview of the methodology implemented in MapGAM for estimating and visualizing spatial effects in the context of a generalized additive model for continuous, binary or count outcome data. An illustrative example using MapGAM to analyze hypothetical case-control data from the state of Massachusetts is also provided. Section 2.3 considers estimating spatial effects on right-censored survival times via a Cox proportional hazards additive model. The estimation procedures implemented in MapGAM are provided and a brief simulation study considers the performance of the proposed fitting methods in various settings. In Section 2.4 we consider inference procedures associated with spatial modeling and illustrate how to use the package to perform a global test of a spatial effect and calculate confidence intervals for predictions at each spatial prediction point. Section 2.5 concludes with discussion of the utility of the MapGAM package and considers possible extensions of the package in future research.

Spatial effect on a univariate outcome
We consider estimating and visualizing covariate-adjusted spatial effects in the context of a GAM for continuous, binary or count outcome data. The spatial effect can be estimated by fitting a GAM model with a bivariate smoothing term for the two geolocation parameters. Typical models will also include additional adjustment for demographic characteristics and other risk factors that may serve as potential confounding factors in the association between location and the outcome of interest.

Generalized additive model
We consider modeling observations that are distributed on a map with u i and v i denoting the geographical parameters for the i th observation, i = 1, . . . , n. Let Y i denote the outcome and X i denote a vector of adjustment covariates. Further suppose that the distribution of the outcome belongs to the exponential family. The GAM then assumes that where g(·) is the link function for mean of the outcome µ i = IE[Y i ] and the variance of the outcome is defined by the assumed probability model and denoted as V i ≡ Var(Y i ) = V(µ i , φ); a function of the mean and nuisance parameter φ. β denotes a vector of coefficients associated with adjustment covariate X i and f (u i , v i ) represents the spatial effects, which is a nonlinear function of location. When fitting the model, we separate the spatial effect into parametric and nonparametric portions: f (u i , v i ) = γ 1 u i + γ 2 v i + s i , and the model becomes The parametric part of the spatial effect is fit jointly with other adjustment variables using least squares, while the nonparametric term is fit using a nonparametric smoother. To ensure identifiability, we constrain the model so that ∑ i s i = 0.
A local scoring procedure (Hastie and Tibshirani, 1986) is used to fit the model. Let l denote the log-likelihood function based upon one observations Y = [Y 1 , . . . , Y n ] , which is a function of η = [η 1 , . . . , η n ] . To estimate the parameters of the model we seek to maximize the expected log likelihood: where the expectation is taken over the joint distribution of X and Y. This has intuitive appeal since it seeks to choose a model that maximizes the likelihood of all possible future observations. Under standard regularity conditions (namely the ability to interchange integration and differentiation), we obtain While there is no general closed for solution to Eq.(4), a first-order Taylor series expansion leads to an iterative estimating procedure given by which is equivalent to In the exponential family case, we can compute the first and second derivatives of the expected log likelihood as and Then taking the expectation (conditional on X) of Eq. (8) we obtain Hence η i is updated in the GLM case by Further, letting Y old w = [Y old w1 , · · · , Y old wn ] denote the working response computed in terms of η old and µ old and given by we obtain from Eq.(2), Eq.(10), and Eq.(11), The coefficientsβ new 0 ,β new and nonparametric term, s, must be estimated in order to obtain an updated value of η new in Eq.(10). If no parametric linear predictor term is included in the model (beyond the spatial smoothing term), the updated s new can be estimated by regressing the working response Y old w on a bivariate smoother for u and v. However, with the parametric linear predictor term included in the model, the backfitting algorithm can be used to update β 0 ,β and s, as is done in the gam package.
A locally weighted scatterplot smoother (LOESS) (Cleveland, 1979(Cleveland, , 1981Cleveland and Devlin, 1988) is utilized as the bivariate smoothing function for the two geolocation parameters u and v in the MapGAM package. The smoothing parameter defining the neighborhood used to select the K nearest observations points for smoothing may be user specified or automatically chosen by minimizing AIC (Webster et al., 2006).

Estimating and mapping a spatial effect
In the MapGAM package, typical spatial applications will start with the predgrid() function to create a regular grid of points within the study area, potentially restricted to points within optional map boundaries (e.g., a country, state, or regional map obtained from the maps package or imported from a shapefile). Crude or covariate-adjusted odds ratios, hazard ratios, or other effect estimates are then obtained for each grid point using the modgam() function to smooth by geolocation. modgam() provides compatible and flexible interfaces, acting as a wrapper function to the gam() function in the gam package. Specifically, the model can be specified via a formula statement, or for users less familiar with writing model formulas in R, the formula can be omitted in which case the model is specified implicitly by structuring the data so that the first column of the data represents the outcome to be modeled (or the first two columns for survival objects), the next two columns represent the parameters for geolocation, and the remaining columns represent the adjustment covariates to be included in the model. With the model specified, modgam() proceeds by calling the gam() function to estimate model parameters, then calls mypredict.gam() to generate predictions for the specified grid. The optspan() function can be used to find an optimal span size (proportion of data size included in the neighborhood) for the LOESS smoother. Optionally, the modgam() function can call optspan() to choose the optimal span for fitting the model in an automated fashion.
Considering the estimated spatial effect f (u i , v i ) for the i th location, researchers are often interested in the spatial effect difference (or ratio, log-ratio) comparing each location to a defined reference. To obtain spatial effect estimates, one can specify type="spatial", then modgam() provides three options for the choice of reference: the median of f (u i , v i ), i = 1, . . . , n, the mean of f (u i , v i ), i = 1, . . . , n, or an estimated spatial effect value at a user-specified geolocation. Alternatively, specifying reference="none" will produce prediction estimates based upon the linear predictor for each covariate combination in the prediction dataset (including the model intercept). To produce estimates of effects for all adjustment covariates, the option type="all" may be specified. The result of modgam() is an object of class modgam() that can be summarized by class-defined printing, summarizing and plotting methods. Specifically, a heatmap of the predicted values from a fitted model can be generated using either the colormap() or plot() functions. For tailored plots, the trimdata() and sampcont() functions can be used to restrict data to those areas within a specified set of map boundaries and to conduct simple or spatiotemporal stratified sampling from eligible controls-a useful feature for analysis of data from large cohorts.

Application to case-control data from Massachusetts
In this section we present an illustrative example using MapGAM to analyze hypothetical casecontrol data from Massachusetts. MAdata is a simulated case-control study dataset available in the MapGAM package. Contained in the dataset are 90 cases and 910 controls with randomly generated geolocations across Massachusetts, geocoded on a Lambert projection (in meters). MAmap provides a map of Massachusetts using the same projection. The dataset also contains three randomly generated potential adjustment covariates: smoking, mercury exposure and selenium exposure. A summary of the dataset follows: The geolocations of the observations are shown in Figure 1, which can be generated with the following code: We first start with generating a prediction grid for the map using predgrid().
library("PBSmapping") R> gamgrid <-predgrid(MAdata, map = MAmap) After defining a prediction grid, modgam() is used to fit a GAM model based on the MAdata and generate predictions on the defined grid. A formula expression indicates that the indicator Case is specified as the response, and two spatial parameters Xcoord and Ycoord are included in lo() to specify a geospatial smoothing term. In addition, potential confounders Smoking, Mercury and Selenium are also adjusted for in the model as linear terms. Argument sp is used to specify the span size for the spatial smoothing term. A specification of sp = null (the default) implies that an optimal span will be selected. Note that if the model formula is not supplied, the data must be structuring so that the outcome is in the first column, the two spatial parameters are in the second and third columns and the adjustment variables are in other columns. In that case, specifying m="adjusted" will include all other columns of the data as linear terms in the model and m="crude" will fit only the two spatial parameters (the m argument is ignored if a model formula is supplied). For this particular example, the resulting call to modgam() using the formula statement is given as follows: Coefficients in the above output represent log-odds ratios. The interpretation of parametric terms remain the same as the usual logistic regression model. For example, we estimate the odds of disease is estimated to be e 1.53 = 4.63-fold higher when comparing smokers to non-smokers with similar location and exposure to mercury and selenium.
The interpretation of the smoothed spatial terms is best done graphically. A heatmap of the estimated spatial effect predictions (representing the odds ratio comparing the odds at each location to the median odds across all locations) can be generated using the modgam plotting routine via a call to the plot() function. This in turn relies upon the colormap() function defined within MapGAM. The resulting heatmap is displayed in Figure 2. The exp argument is used to specify whether the heatmap is drawn on the scale of the odds ratio (exp=TRUE) or the log odds ratio (exp=FALSE).

Estimating spatial effects for right-censored survival data
To quantify spatial effects on censored survival outcomes, MapGAM implements a Cox proportional hazards additive model with a bivariate (two geolocation parameters) smoothing term. The incorporation of a bivariate smoother within the Cox model is not, to the best of our knowledge, currently implemented within R. In this section, we briefly introduce the methodology implemented in MapGAM as an extension of the GAM methods previously discussed for GLMs, provide a limited simulation study to illustrate the validity of the methodology in selected settings and provide an example of applying the MapGAM package to estimate spatial effects on censored survival data using hypothetical survival times derived from the state of California.

Fitting the Cox proportional hazards additive model
Suppose we observe right-censored survival data that is distributed on a map with u i and v i as the geographical parameters for the i th observation, i = 1, . . . , n. Let T i denote the observed followup time and δ i denote the indicator of whether or not T i represents the true failure time for observation i. Further, letX i be a vector including adjustment covariates X and geolocations (u, v) corresponding to observation i. The Cox proportional hazards additive model used in the MapGAM package incorporates a bivariate smoother into the Cox proportional hazards model (Kelsall and Diggle, 1998) as where λ i (t) represents the hazard function for observation i evaluated at time t and λ 0 (t) denotes the baseline hazard (ie. the hazard of an observation with all covariate values equal to 0 and location with s = 0, where again s is a smooth function of spatial coordinates u and v). Define the linear predictor For ease of exposition, consider the case of no tied failure times. Then the partial likelihood and log-partial likelihood are given by and respectively, where D represents the set of indices of all unique failures and R j = {k|T k ≥ T j } denotes the risk set just prior to time T j . In the event of tied failure times, MapGAM defaults to the use of the Efron approximation (Efron, 1977) for the partial likelihood: The R Journal Vol. 12/1, June 2020 ISSN 2073-4859 where F j is the set of indices of the failures occurring at time T j , and |F j | is the number of indices in the set F j .
Letting l denote the log-partial likelihood in Eq.(17), we again seek to solve the maximization problem provided in Eq.(3). The solution can be found iteratively using Eq.(6). We can compute the first and second derivatives of the log-partial likelihood with respect to η i for observation i as and The Cox model is a semi-parametric model without any specification for the distribution of the survival times, so it is not possible to calculate a close form for the expectation of the second derivatives for the log partial likelihood as required in Eq.(9). So before updating η, a GAM model can be fitted using the second derivatives as responses to estimate the expectation of the second derivatives of log-partial likelihood.
To this end, we modify the local scoring procedure presented in Section 2.2.1 by noting that in Eq. (6), with an estimate η old , the new estimate for η can be obtained using the following two steps: 1. Estimate IE[d 2 l/dη 2 i ] by fitting a generalized additive model using [d 2 l/dη 2 i ], i = 1, · · · , n as responses, including the linear predictor ofX and a bivariate smoother of geolocation parameters; 2. Estimate η new using the backfitting algorithm described in Section 2.2.1 with

Simulation examples
In this section we assess the performance of our proposed method for fitting the Cox proportional hazards additive model using two simulation studies. In both simulation settings, two spatial parameters (u, v) and adjustment covariate x are generated from a uniform distribution with range from −1 to 1. Survival times were then simulated from an exponential distribution with a hazard function. The first simulation example assumes a linear effect of all covariates on the log-hazard and that the effect of adjustment covariate x does not interact with the effect of the spatial parameters u and v. λ = 0.03 exp {log(0.7)x + log(1.2)u + log(1.5)v} .
In the second simulation example, the spatial parameters have a nonlinear effect on the log-hazard, while the adjustment covariate x has a linear effect that does not interact with the spatial coordinates. The hazard function used in the second simulation example is λ = 0.03 exp log(0.7)x + log(1.2)u + log(1.5)v + log(0.8)u 2 + log(1.8)uv .
The true data-generating heatmaps of the two examples are shown in Figures 3a and 3c, respectively. When we set a seed of 269, with N = 5000 sampled data points. The survival times under the first (second) simulation setting range from 0.0011(0.0011) to 316.5(396.8), and have a median of 22.66(24.16).
In both settings, censoring times were randomly sampled from a Uni f orm(0, 70) distribution and observed times were taken to be the minimum of the true failure time and censoring time for each observation, yielding approximately 41.6% and 43.9% censoring in scenario 1 and 2, respectively. Code for this simulation is provided in the Appendix. Cox proportional hazards additive models were fit and the spatial effect of the points on an equally-spaced grid (201 × 201) extended across u ∈ [−1, 1] and v ∈ [−1, 1] were predicted using the modgam function from the MapGAM package. Smoothing span sizes of 0.4 and 0.2 were utilized for scenario 1 and 2, respectively. In each case, these values roughly correspond to the automated span size chosen when optimizing AIC. Figure 3b and 3d display the estimated spatial effects for example data sets using the first (linear relationship) and second (nonlinear relationship) simulation settings, respectively. Comparing the estimated values in Figures 3b and 3d to the corresponding true data generating values displayed in Figures 3a and 3d, we can see that the additive proportional hazards model implemented in MapGAM accurately recreates the true spatial effects (either linear or nonlinear) giving rise to the data. In addition, two scatterplots of the estimated versus true spatial effect are provided in Figure  4a and 4b, again illustrating that the additive proportional hazards method outlined above is able to correctly identify the spatial effects present in the data with minimal bias.

Application to right-censored California data
In this section we use the MapGAM package to estimate and visualize spatial effects for a dataset simulated from information on censored survival times of California ovarian cancer patients. These are data contained in the object CAdata within the MapGAM package. The original source is the California advanced-stage invasive epithelial ovarian cancer patients reported to the California Cancer Registry from 1996 to 2006 (Bristow et al., 2014). After removing patients with age <25 and >80 for identifiability reasons, and adding random noise to the geolocation parameters, CAdata represents a random draw of size N = 5, 000 observations from the original dataset. Observed times and failure status were simulated based upon the observed distribution found in the original dataset. Potential covariates available in the dataset include age and insurance type (6 categories in total: Managed Care, Medicare, Medicaid, Other Insurance, Not Insured and Unknown). A summary of CAdata is as follows: R> data("CAdata") R> summary ( R> data("CAmap") R> plot(CAmap) R> points(CAdata$X,CAdata$Y) Below we generate the object CAgrid for the state of California using the predgrid() function and estimate spatial effects on the relative risk of death from a Cox proportional hazards additive model using the modgam() function. As with the previous example, coeficients for parametric terms in the model are interpretable as the would be in a standard (non-GAM) fit of the data. In this case, the coefficients for these terms represent log-hazard ratios. For example, we estimate that the hazard ratio comparing two subpopulations differing in age by 1 year but having similar insurance status is approximately e 1.026 = 1.03. The smoothed spatial terms are again best intepreted graphically. A heatmap of the hazard ratio comparing the estimated hazard at each location to the median hazard across all locations is plotted using the plotting routines defined for modgam objects via plot(). The resulting heatmap is displayed in Figure 6.

Making inferences
In addition to providing point estimates associated with each spatial location, MapGAM provides pointwise standard errors as well confidence intervals. This inference is returned by the modgam function when the option se.fit=TRUE is specified. The estimated pointwise standard errors for spatial effects are derived from the sum of two variance curves: one from the parametric terms associated with location, γ 1 u i + γ 2 v i , and the other from the non parametric term, s i (Chambers and Hastie, 1992). Briefly, variance estimation requires computation of the operation matrix G i for each smooth term s i , such that s i = G i z, where z is the working response from the last iteration of the fitting algorithm described in Section 2.1 and is asymptotically distributed as a Gaussian random variable. From this, the covariance matrix for the estimated s i is given by G i Cov(z)G i , which can be estimated byφG i W −1 G i , where W is a diagonal matrix with elements defined by the weights used in the last iteration of the fitting algorithm andφ is an overdispersion parameter estimated using Pearson's Chi square statistic. The operation matrix, G i , tends to be computationally expensive to obtain for non-parametric or semi-parametric smoothing procedures, and hence approximations are often used when estimating G i Cov(z)G i . One approach is to approximateφG i W −1 G i byφG i W −1 , which is generally conservative for non-projection smoothers (Chambers and Hastie, 1992). In this case, G i can be orthogonally decomposed into G i = H i + N i , where H i can be obtained as the design matrix corresponding to the parametric portion of the linear predictor, and N i corresponds to the non-parametric portion. Thus, the variance of the estimated smooth term can be approximated via a decomposition of two variance components: (i) the variance from the parametric portion of the linear predictor which captures the correlation all parametric terms that are fitted together, and (ii) the variance from the non-parametric portion of linear predictor reflecting the marginal information obtained in the smoothing terms. modgam conducts a global test for spatial effects via a likelihood ratio test by comparing the deviance between a full model (including the spatial smoother) and a reduced model (omitting the spatial smoother). For the full model, the degrees of freedom of the non-parametric term are computed as tr(S) − 1, where S denotes the smoothing matrix, and the degrees of freedom of the parametric portion are p + 3 (p + 2 for survival data). Thus, the degrees of freedom of the full model are tr(S) + p + 2 (tr(S)+p+1), and the degrees of freedom for the likelihood ratio test statistic are tr(S) + 1. The function modgam will return the p-value for the likelihood ratio test automatically. In addition, modgam also performs a permutation test of the global spatial effect and pointwise significance (Kelsall and Diggle, 1998;Webster et al., 2006) . The function will return the results of the permutation test when permute=N.permt is specified in the function call, where N.permt denotes the desired number of permutations used to generate the permutation distribution.
For visualizing inference for spatial effects, the plot function will plot all point estimates along with the associated lower and higher band of confidence intervals provided that se.fit=TRUE is specified in the original modgam call. By setting "contours = intervals", areas with confidence intervals excluding 0 (on the log estimated effect scale) will be indicated on the map by plotting the contours of an indicator vector created to indicate whether 0 is below, between or above the confidence intervals at the grid points. By setting "contours = permrank", contours will be added to indicate significant areas that had a pointwise permutation based p value less than a specified threshold (default of .05).

An example
Returning to the CAdata example presented in Section 2.3.3, we consider visualizing spatial inference. Setting se.fit=TRUE, modgam function returns pointwise standard errors and confidence intervals. In fit3 below, the resulting standard errors can be obtained via the call fit3$se. The resulting confidence intervals can be plotted via the plot function, and are shown in Figure 7. R> fit3 <-modgam(Surv(time, event)~AGE + factor(INS) + lo(X, Y), + data = CAdata, rgrid = CAgrid, sp = 0.3, verbose = FALSE, + se.fit = TRUE) R> plot(fit3, CAmap, exp = True, mapmin = 0.2, mapmax = 5, + border.gray = 0.7, contours = "interval") R> fit3 Call: modgam(formula = Surv(time, event)~AGE + factor ( Heatmap of the hazard ratio as well as confidence intervals compared to the median hazard with significant areas circled which were identified by confidence intervals. The left plot illustrates the lower bound of a 95% confidence interval for the hazard ratio at each location. The center plot depicts the estimated hazard ratio at each location. The right plot indicates the upper bound of a 95% confidence interval for the hazard ratio at each location.

Concluding remarks
GAMs provide a unified statistical framework that allows for the adjustment of individual-level risk factors when evaluating spatial variability in a flexible way. Given the complex nature of spatial patterns, GAMs provide an improved framework over traditional parametric modeling of spatial patterns. The MapGAM package introduced here provides a fairly comprehensive and user-friendly set of tools for both fitting GAMs to a variety of outcomes and visualizing complex spatial effects. Of course, one must be careful of overfitting observed data given the flexibility afforded by the GAM framework. As such, care is needed when choosing the degree of flexibility utilized in model specifications and honest assessments of out-of-sample predictive performance should be considered.
Bivariate LOESS smoothing with standard error estimation is computationally intensive, especially in the context of GAMs and proportional hazards models. For example, with 5000 observations, a span size of 0.2, and a binomial outcome modgam took about 1 second to provide estimates without standard errors but about 50 seconds with standard error estimates (se.fit=TRUE) on recent personal computers. For the same span size and number of observations but with a proportional hazards model, modgam took about 40 seconds without standard errors and about 70 seconds with standard errors. Although slower than we might like, the run times for se.fit=TRUE are much faster than the pointwise permutation test we previously employed which required a 1000-fold increase in run times (Webster et al., 2006).
Estimating and mapping spatial distributions of disease risk is extremely useful for identifying health disparities, and mapping risk surfaces that are adjusted for individual-level confounding variables is of great interest to epidemiologists. By developing and actively maintaining a convenient R package, MapGAM, we intend to facilitate mapping crude and covariate-adjusted spatial effects for the most common probability models used to characterize the relationship of disease risk to spatial location and other factors. In the future we hope to improve the flexibility of the package by expanding the incorporated smoothing methods, including the addition of basis expansion and tensor product methods, allowing for smoothing over more than two dimensions, and expanding the sampcont function to include additional sampling methods such as matching. Further research on the development and implementation of adaptive smoothing methods that allow for the amount of smoothing to vary depending on the local extent of a spatial effect is currently in progress, and may be added to the package in a future update. In addition, while spatial correlation is accounted for via the fixed effects smoothed spatial term in the models we have presented, correlation may also arise if repeated measures on sampling units are taken through time. This is currently beyond the scope of the package, but is an area of our current research.