The IDSpatialStats R Package: Quantifying Spatial Dependence of Infectious Disease Spread

Spatial statistics for infectious diseases are important because the spatial and temporal scale over which transmission operates determine the dynamics of disease spread. Many methods for quantifying the distribution and clustering of spatial point patterns have been developed (e.g. Kfunction and pair correlation function) and are routinely applied to infectious disease case occurrence data. However, these methods do not explicitly account for overlapping chains of transmission and require knowledge of the underlying population distribution, which can be limiting when analyzing epidemic case occurrence data. Therefore, we developed two novel spatial statistics that account for these effects to estimate: 1) the mean of the spatial transmission kernel, and 2) the τ-statistic, a measure of global clustering based on pathogen subtype. We briefly introduce these statistics and show how to implement them using the IDSpatialStats R package.


Introduction
The transmission process which drives an epidemic can be characterized by the spatial distance separating linked cases. When these transmission events accumulate over time, they are observed as areas of elevated disease prevalence. Knowledge of the extent of the affected area and where new cases may arise is crucial for many disease control strategies (e.g. ring vaccination, vector control etc). In epidemiology, case occurrence data-(x,y) coordinates with temporal information (t) and other covariates-are often used to understand these types of infectious disease dynamics. These data are typically treated as a generic point process so that they can be described in terms of spatial intensity (expected number of cases per unit area) or clustering due to spatial dependence (covariance in x,y space).
In the broader field of spatial statistics, there are many methods that measure the spatial intensity or clustering of a generic point process on a Cartesian (x,y) coordinate system (Table 1). These methods primarily fall into three categories: first-order first-moment (FOFM), first-order second-moment (FOSM), and second-order second-moment (SOSM). The FOFM measures use quadrature (aggregate counts of points within cells) to quantify intensity of the point pattern continuously over (x,y) space. Packages such as lgcp (Taylor et al., 2013(Taylor et al., , 2015 and ppmlasso (Renner and Warton, 2013) allow users to model first-order intensity as a count process using a regressive function of (x,y) coordinates and other covariates. The FOSM measures-Moran's I and Geary's C (Moran, 1950;Geary, 1954)-also use quadrature, but they describe general covariance among cell counts across the (x,y) dimensions. These spatial statistics can be calculated using the spdep R package (Bivand and Piras, 2015). The SOSM measures, such as the K-function and its non-cumulative analogue, the pair correlation function, quantify clustering among neighboring points. Both the FOSM and SOSM measures are considered global spatial statistics because they describe spatial dependence for the entire study area. However, the SOSM can further describe how spatial dependence changes as a function of distance by comparing the observed intensity of neighboring points within distance d to that expected under complete spatial randomness. The K-function and pair correlation function can be calculated using the ads (Pélissier and Goreaud, 2015), spatstat (Baddeley et al., 2016), and splancs Rowlingson and Diggle (2017) R packages.
These classic spatial measures are limited in their ability to describe infectious disease dynamics primarily because they treat case occurrence data as a generic point process. The FOFM and FOSM measures use quadrature, which make them vulnerable to error associated with data aggregation (Robinson, 2009) and the modifiable areal unit problem (Openshaw and Taylor, 1979). The SOSM measures, like the K-function and pair correlation function, are more common in epidemiology. However, their statistical interpretation is less intuitive in terms of classic epidemiological quantities of relative disease risk, such as the incidence rate ratio or hazard ratio. Additionally, even the temporal forms of these functions (e.g. the space-time K-function) do not capture the typical distances traveled in a single transmission generation as they quantify the overall spatial dependence between all cases, not just those epidemiologically linked. The mean distance between sequential cases in a transmission chain is an important epidemiological quantity because it provides insight into potential mechanisms driving spread as well as helping inform interventions. Therefore, we developed novel measures that build upon concepts in spatial statistics to characterize infectious disease spread using case occurrence  Rowlingson and Diggle (2017) data. Importantly, these measures are robust to heterogeneities in the underlying population, and substantial case under-reporting, which is common in epidemiology.
We describe these two measures of spatial dependence for infectious diseases and show how they can be calculated with the IDSpatialStats R package in the following three sections. First, we introduce a function which simulates infectious disease spread as a spatial branching process. This function is primarily intended to simulate example datasets for the est.transdist family of functions and τ-statistics that use temporal information to indicate linked cases. Second, we demonstrate how to estimate the mean and standard deviation of the spatial transmission kernel (Salje et al., 2016b). Estimating the spatial transmission kernel requires an understanding of the number of transmission generations separating cases at different time points of the epidemic. This method provides a measure of fine-scale spatial dependence between two cases, which can be interpreted as the mean distance between sequential cases in a transmission chain. Third, we describe a measure of global clusteringthe τ-statistic-that calculates the relative risk of infection given some criteria to identify cases closely related along a chain of transmission (Lessler et al., 2016). The τ-statistic is a global clustering statisticlike the K-function and pair correlation function-that provides an overall measure of clustering for the entire course of an outbreak. Depending on the parameterization, the τ-statistic represents the odds of observing another case with distance d of an infected case compared with either the underlying population or other pathogen types. The following sections contain a brief introduction to each statistic to provide context to the code implementation-for more detailed description of each statistic, see Lessler et al. (2016) and Salje et al. (2016b).
We have implemented these tools in the IDSpatialStats R package version 0.3.5 and above. The latest stable release depends on the doParallel and foreach packages (Microsoft and Weston, 2017;Corporation and Weston, 2018) and can be downloaded from CRAN. A development version of the package is also available on Github at https://github.com/HopkinsIDD/IDSpatialStats.

Simulating spatial disease spread
We use a stochastic spatial branching process to simulate epidemiological data in the sim.epidemic function. Simulations begin with an index case at (x, y, t) = (0, 0, 0) and transmission events that link two cases follow according to a random Markov process in (x, y) space (i.e. Brownian motion). The number of transmission events occurs according to a Poisson distribution, with its mean and variance set to the effective reproduction number R of the infecting pathogen. The spatial distance traversed by each transmission event is given by a user specified probability distribution which serves as the dispersal kernel function. When specifying the dispersal kernel, the trans.kern.func argument expects a list object containing a probability distribution function and its named arguments. For example, to simulate an epidemic where transmission typically occurs at the local level, but long distance transmission events sometimes occur, an exponential transmission kernel might be used because of its long tail. Alternatively, if transmission is expected to consistently occur within a given range, then a normal kernel may be more appropriate.
In simulations where the basic reproductive number R 0 is used to define a constant R-value and R 0 > 1, the number of cases will continue to increase with each time step. This effect may not be appropriate when simulating settings where intervention efforts or depletion of susceptible individuals causes heterogeneity in R over the course of the epidemic. Thus, the sim.epidemic function accepts either a scalar value for a constant R value or a vector of R values with a length equal to tot.generations, allowing simulations with a variable R value, as shown in the following R code.

The mean transmission distance
In Salje et al. (2016b), we introduced a method to estimate the mean and standard deviation of the spatial transmission kernel using case occurrence data. These data include location (x, y) and onset time t of each case (case times) and the infecting pathogen's generation time g(x). To estimate these spatial statistics, we use the Wallinga-Teunis (WT) method (Wallinga and Teunis, 2004) to probabilistically estimate the number of transmission events required to link two cases, denoted as θ. In settings where a phylogenetic model or contact tracing provide information on transmission pathways, the spatial kernel can be empirically estimated using the distribution of observed distances among all linked cases. The mean and standard deviation of this kernel can then be calculated for any time interval between t 1 and t 2 to give µ obs t (t 1 , t 2 ), with the assumption that the number of transmission events separating all case pairs is homogeneous (θ = 1). When data that indicate case linkage are lacking, this assumption is incorrect because the distance between two cases depends on the number of transmission events separating them. In this case, the mean transmission distance at each time interval µ t must be estimated as a weighted mean: The R Journal Vol. 11/2, December 2019 ISSN 2073-4859 Where, w(θ = i, t 1 , t 2 ) gives the weight for each of the i elements of θ and the second term µ a (θ = i, µ k , σ k ) gives the mean distance separating case pairs that are linked by the ith value of θ.
We have implemented four nested functions that are used to estimate w(θ = i, t 1 , t 2 ) and describe them briefly below. Listed in order, they are comprised of est.wt.matrix.weights, est.wt.matrix, get.transdist.theta, and est.transdist.theta.weights. Although, these functions are documented separately, they are all driven by the est.transdist family of functions and do not need to be run manually unless desired.

Wallinga-Teunis matrices
The est.wt.matrix.weights function builds upon code from the R0 package (Obadia et al., 2012) to calculate the basic WT matrix (Wallinga and Teunis, 2004). This matrix gives the probability that a case at time t i (rows) infected a case at time t j (columns), i.e. θ = 1, based on the generation time distribution of the pathogen g(x). For an epidemic with t unique case times, est.wt.matrix.weights gives a T × T matrix.
The est.wt.matrix function produces a WT type matrix for all infector-infectee case pairs. Given the WT matrix produced by est.wt.matrix.weights and total case count n, this function calculates an n × n matrix giving the probability that case i (rows) infected case j (columns). The WT matrix object can be handed directly to est.wt.matrix via the basic.wt.weights argument, or if this argument is NULL, the est.wt.matrix.weights function is called automatically.

Estimation of θ weights
The get.transdist.theta function estimates the number of transmission events θ separating pairs of cases using the probabilities in the infector-infectee WT matrix produced by the est.wt.matrix function. Sampling all possible transmission trees is impractical for most datasets, so this function constructs a transmission tree by randomly selecting the infector of each case in the epidemic and then θ is determined by finding the product of all probabilities in the chain of transmission that link the randomly sampled case pairs. The object theta.wts (in the code segment below) contains a three-dimensional array [i,j,k], where the rows i and columns j represent unique case times and the third dimension k is the number of transmission events θ. Each cell gives the probability that two cases occurring at times i and j are connected by θ transmission events in the randomly sampled transmission tree. Probabilities in each [i,j,·] row are normalized across all θ values. The get.transdist.theta function samples a single randomized transmission tree from the epidemic data, therefore we want to simulate many iterations of this random sampling to get a better estimate of the true distribution of θ.
The est.transdist.theta.weights function estimates the distribution of θ across all t i and t j combinations by simulating many iterations of transmission trees using the get.transdist.theta function. Its output is the same as the get.transdist.theta function, however, it represents the normalized probabilities after n.rep number of simulations.

Estimating mean of transmission kernel
To estimate the mean transmission distance over the duration of the epidemic we must use the observed distances between case pairs given the time they occurred µ obs t (t i , t j ) and combine them into an overall estimate of the mean of the transmission kernel µ k . The workhorse function est.trandsdist estimates the overall mean µ k and standard deviation σ k of the kernel. This function first calls the est.wt.matrix.weights, est.wt.matrix, get.transdist.theta, and est.transdist.theta.weights functions described above to estimate the distribution of θ across all case pairs and then calculates each of the weights w(θ = i, t 1 , t 2 ). The weights are calculated as the proportion of all case pairs occurring at t i and t j that are separated by each estimated θ over all simulations: .
Here, the functions I 1 and I 2 indicate if two cases occurred at time t i and t j and were linked by θ transmission events, or if they just occurred at t i and t j respectively. In words this can be written as: w(θ = i, t 1 , t 2 ) = Total cases at t 1 and t 2 across all simulations separated by θ transmission events Total cases at t 1 and t 2 across all simulations .
Once the weights of the θ values are estimated, the est.transdist function calculates µ k and σ k as the average weighted estimate over all combinations of t i and t j . If we now let k index the vector of θ values, then:μ For a derivation of these equations, see sections 2.3 and 2.4 of Salje et al. (2016b).
The est.trandist function requires case occurrence data-a matrix with three columns [x,y,t]and the mean and standard deviation of the infecting pathogen's generation time (for calculating WT matrices) as input. The function returns estimates of µ k and σ k of the spatial transmission kernel. These estimates are made under the assumption that µ k = σ k , so the upper bound ofμ k andσ k are also provided for when this assumption is violated. Bound estimates are equal to √ 2 times the values estimated under the µ k = σ k assumption (see section 2.5 of Salje et al. (2016b)). Additional constraints on the estimation of µ k and σ k can be defined in the remaining arguments, such as the time step in which the analysis should begin (t1), the maximum number of time steps (max.sep) and maximum spatial distance (max.dist) to consider when estimating θ, and the number of randomized transmission tree simulations to run (n.transtree.reps).
To estimate the uncertainty aroundμ k due to sampling or observation error, we have implemented a wrapper function called est.transdist.bootstrap.ci that performs bootstrap iterations using the est.transdist function. Upon each iteration, the epidemiological data are resampled with replacement and µ k is re-estimated. The est.transdist.bootstrap.ci function contains all the same arguments as the est.transdist function, with additional arguments defining the number of bootstrapped iterations to perform, the high and low boundaries of the desired confidence interval, and options for running the bootstrap analysis in parallel.
When parallel computation is enabled (the default is parallel = FALSE), the function uses the makeCluster() function of the parallel package to make the appropriate cluster type for the operating system of the local machine (SOCK cluster for Windows or a Fork cluster for Unix-like machines). The cluster is then registered as the parallel backend for the foreach package, which is used to run the bootstraps in parallel. The user can define the number of cores to use when running in parallel using the n.cores argument. If parallel = TRUE and n.cores = NULL, the function will use half the total cores on the local machine.

Change in mean transmission distance over time
An estimate of µ k over the duration of an epidemic is indicative of the overall spatial dependence. However, conditions may change over the course of an epidemic that alter the spatial scale upon which transmission operates. To quantify temporal heterogeneity in the mean transmission distance, we have implemented the est.transdist.temporal and est.transdist.temporal.bootstrap.ci functions, which estimate the change inμ k over time and its bootstrapped confidence intervals respectively.
When applying the temporal versions of the est.transdist functions, it is important to consider the sample size at each time step because the est.transdist.temporal function estimates µ k for all cases leading up to each unique time step. Some time steps at the beginning of an epidemic may be returned as NA if there are not enough unique cases to estimate µ k . Furthermore, in scenarios where time steps in the beginning of an epidemic have low sample sizes, such as an epidemic with a low R 0 ,μ k may be over-or under-estimated and display larger confidence intervals due to sampling error. Therefore, we recommend either setting the t1 argument to the first time step that contains a sufficient sample size, or plotting results along with cumulative sample size as we have done in Figure 3.

Application to foot-and-mouth disease
To provide an example of how the functions shown above can be applied to real data, we estimate the mean transmission distance for the 2001 foot-and-mouth epidemic among farms in Cumbria, UK. These data can be found in the fmd data object included in the sparr package (Davies et al., 2011). It contains transformed (x,y) coordinates of the infected farms and the time step (t) in which it was infected, which is given in days since the index farm was infected ( Figure 2). The generation time for foot-and-mouth disease is estimated to have a mean of 6.1 days and a standard deviation of 4.6 days (Haydon et al., 2003), so we use these in the gen.   , col= blue ) } Using our approach described above, we estimated the mean transmission distance between case farms in the sparr package foot-and-mouth disease data to beμ k = 5.8 km (95% CI = 5.7-6.1 km). Interestingly, this estimate of µ k is lower than that reported in Salje et al. (2016b), whereμ k = 9.1 km (95% CI = 8.4-9.7 km). The difference inμ k is likely due to differences in data sources. The values estimated in Salje et al. (2016b) include case farms from both Cumbria and Dumbfriesshire, UK with the additional constrain that only case farms where the source farm was confirmed were included. The sparr data set, on the other hand, contains all case farms from only Cumbria.

Global clustering: the τ-statistic
Estimating the mean of the spatial transmission kernel (above) provides information on the small spatial scale of individual transmission events. After subsequent generations of transmission where different transmission chains overlap in space, a larger area of elevated disease prevalence will be observed. To describe this larger-scale process, we introduced the τ-statistic in Lessler et al. (2016). The τ-statistic measures global clustering with an epidemiological interpretation-the relative risk of an individual being a related case (under some definition) given they are within a particular distance from another case. The spatial distances where the relative risk is high represent an area of elevated prevalence that is likely to have greater public health utility compared with the scale of individual transmission because interventions must account for ongoing transmission at the population level to contain an outbreak. Therefore, the τ-statistic provides a more intuitive global measure of spatial clustering, which can be interpreted as the relative risk of infection.
Formulation of the τ-statistic has a mathematical relationship to the K-function and pair correlation function. The K-function quantifies the expected number of neighboring points within distance d of a typical point Z relative to the intensity of the underlying population distribution λ.

y coordinates of Z]
In the simplest scenario, λ is assumed to be homogeneously distributed, so that λ = N/A, where N The R Journal Vol. 11/2, December 2019 ISSN 2073-4859 is the total number of cases and A is the total study area. Under the assumption of a heterogeneous underlying population distribution, λ becomes the location specific intensity λ(S), where S ⊂ A within distance d of location Z. In both cases, the K-function is plotted with the theoretical value of the K-function for a homogeneous Poisson process πd 2 , which indicates clustering or dispersion relative to complete randomness. The cumulative aspect of the K-function (using all neighbors within distance d) is, however, a well-known constraint that makes it difficult to interpret changes in clustering over distance. The pair correlation function alleviates this constraint by applying the K-function within a distance range (d 1 , d 2 ) and standardizing it by the complete spatial randomness expectation for a homogeneous Poisson process within this range: where, ∆d = d 2 − d 1 . Both the K-function and pair correlation functions have seen general application due to developments that accommodate inhomogeneous underlying population distribution, clustering between typed points, and edge effects. However, these functions assume complete knowledge of the underlying population distribution and use a null statistical hypothesis of complete spatial randomness, which is precarious for scenarios in epidemiology where the underlying population is unknown and relative risk is used to understand disease dynamics.
To incorporate other metrics of global clustering, the IDSpatialStats package provides wrapper functions for calculating both the cross Kand cross pair correlation functions using the Kcross and PCFcross functions from the spatstat package (Baddeley et al., 2016). These wrapper functions allow for straightforward calculation of these statistics using typed epidemiological data that is formatted for IDSpatialStats functions (Figure 4) legend(-3, 4.15e6, legend=c("Theoretical Poisson process", "Observed function"), col=c("red", "black"), lty=2:1, box.lty=0, bg= transparent , x.intersp=0.7, y.intersp = 1.2) plot(g$pcf, type= l , lwd=2, xaxt= n , xlab= distance (m) , ylab= cross pair correlation function ) abline(h=1, col= red , lty=2, lwd=2) axis(1, at=which(r.vals %in% labs), labels=labs) A measure of relative risk that does not assume knowledge of the underlying population distribution was developed for point pattern data in veterinary epidemiology (Diggle et al., 2005). This function, which is implemented in the sparr (Davies et al., 2011) and spatstat (Baddeley et al., 2016) packages, uses spatial kernel functions to calculate a ratio of spatial intensity for two different point types ρ(S) = λ 1 (S)/λ 0 (S). This formulation can quantify the relative risk for case-control point data or case occurrences with multiple types, but it is not a global clustering statistic.
The τ-statistic can be computed in two ways depending on the underlying assumption that the true population distribution is known. If this assumption is true, then the τ-statistic is similar to other common measures of global clustering that rely on knowledge of the background population distribution to quantify generic clustering of a point process.
whereπ(d 1 , d 2 ) represents the incidence rate within distance d 1 to d 2 of a case andπ(0, ∞) represents the incidence rate over the entire extent of the study area. This can be implemented by defining the numerator and denominator using the get.pi function or by using the generalized get.tau function with comparison.type = representative argument. Theπ(·) terms in the numerator and denominator use the occurrence data to calculate the incidence rates for linked cases within some defined distance. Therefore, a critical step in performing an analysis with the τ-statistic is specifying which cases are linked through some defined relationship (homologous) and those that are not (non-homologous). Homology can be defined statically or dynamically. When defined statically, the get.pi.typed and get.tau.typed functions can be used to assign case types based on a type column supplied by the data matrix. When defined dynamically, an indicator function I(·) is used to delineate linked and unlinked cases in the data, which allows greater flexibility when defining case type homology. plot(tau$r.low+tau$r/2, tau$tau, type= l , lwd=2, col= blue , xlab= distance (m) ) abline(h=1, lty=2, lwd=2, col= red ) abline(v=100) The interpretation of the τ-statistic is analogous to that of the pair-correlation function in two ways. First, the τ-statistic is not compared to the theoretical measure of a random Poisson process because the metric is an incidence rate ratio with epidemiological meaning. Instead, this measure is plotted in comparison to a ratio of 1, indicating no relative difference in disease risk among homologous cases. Second, the τ-statistic measures relative risk using case pairs within a distance range (d 1 ≤ d ij < d 2 ). This approach can describe how fine-scale spatial dependence changes over distance. However, if the user wishes to estimate cumulative spatial dependence (analogous to the K-function), then d 1 can be fixed at zero (0 ≤ d ij < d 2 ).

Estimating the τ-statistic withθ
If the underlying population distribution is unknown, then the τ-statistic can be computed so that it quantifies global clustering in terms of relative risk. This goes beyond classic measures of clustering by utilizing some relationship between linked and unlinked cases to distinguish transmission chains and quantify relative clustering between them. To do so, we use the functionθ(·) which gives the odds ratio of cases related to case i to those independent of i to give an estimate of the τ-statistic that is not biased by assumptions about the underlying population distribution.
The indicator function I(·) is applied to all ij case pairs within distance d 1 and d 2 . It returns a binary response which is equal to 1 when case pairs meet user-specified criteria to be homologous and equal to 2 when they are non-homologous. The result is an i × j relation matrix z ij which is used to find the sums of homologous and non-homologous case pairs. Using an indicator function also allows additional criteria to be used to define case type homology, such as temporal proximity ( Figure 6). plot(tau$r, tau$tau, type= l , lwd=2, col= blue , xlab= distance (m) ) abline(h=1, lty=2, lwd=2, col= red ) abline(v=100) Figure 6: The τ-statistic calculated using get.tau with an indicator function based on serotype homology and temporal proximity (blue line) with the theoretical value of no relative difference in disease risk shown by the dashed red line. The vertical black line indicates the mean of the spatial dispersal kernel (100m) used to simulate the DengueSimR01 data set.

Calculating variance in point estimates
In the examples above, the get.pi, get.theta, and get.tau function families calculate point estimates forπ,θ, andτ respectively. In scenarios where observation error, sampling bias, or measurement error are expected to introduce additional variance, users may wish to place confidence intervals around these point estimates. For this purpose, each family of functions contains a function ending with a .bootstrap suffix, which generates point estimates for boot.iter number of bootstrapped samples of the data (Efron and Tibshirani, 1994). Functions ending with a .ci suffix are wrappers that calculate user specified confidence intervals based on the bootstrapped samples (Figure 7).

Null hypothesis testing
A common approach for interpreting spatial clustering statistics includes hypothesis testing using simulation envelopes to assess whether an observed spatial measure is statistically significant (Ripley, 1979;Baddeley et al., 2014). To enable null hypothesis tests, we have implemented a permutation method (Good, 2010) to simulate the nonparametric distribution ofπ,θ, andτ under the null hypothesis of no spatial dependence. The permutation algorithm simulates the null distribution by randomly reassigning case coordinates to observations upon each permutation. Null distributions can be computed using functions ending in the .permute suffix and then plotted with observed measures to assess statistical significance as a function of distance (Figure 8).

Summary
Conventional spatial statistics are often used to describe the intensity or clustering of point processes. Quantifying spatial dependence of infectious disease spread, however, requires a modified approach that considers overlapping transmission chains and the likelihood of case linkage. Therefore, we have implemented two types of spatial statistics in the IDSpatialStats package (the mean transmission distance µ k , and the τ-statistic) that can be used along with other measures of spatial dependence (e.g. the cross K-function and cross pair correlation function) to understand the spatial spread of infectious diseases.
We showed how to simulate epidemiological data and estimate µ k , and the τ-statistic, which can be used as templates for other analyses. First, the transdist family of functions provides a measure of fine-scale spatial dependence by estimating the mean of the transmission distanceμ k between sequentially linked cases in a transmission chain (Salje et al., 2016b). Second, the get.tau family of functions measure spatial dependence on a larger-scale by estimating the τ-statistic, which describes the area of elevated prevalence surrounding cases. This family of functions does so by estimating the relative risk of a case being homologous compared with non-homologous case types. The definition of case type homology is flexible and can utilize temporal or biological information, such as genotype and serotype of the pathogen.
The generalized structure of the get.tau family allows for diverse applications of the τ-statistic to epidemiological data. Previous studies have used the τ-statistic to quantify spatial and/or temporal dependence of transmission for Dengue, Cholera, HIV, and Measles disease systems (see Table 2 for detailed descriptions). These studies illustrate that, regardless of the system under study, analyses are enhanced when bootstrapping, permutation tests, and/or assessment of observation error is employed to understand the distribution of error and statistical significance for estimates of the τ-statistic.
The IDSpatialStats package is undergoing continued development. Future directions include expanding the implementation of the τ-statistic to facilitate estimation of spatio-temporal dependence by incorporating a temporal interval into the spatial search window. This technique was used by Salje et al. (2012) in the form of the φ-statistic to estimateφ(d 1 , d 2 , t 1 , t 2 ). Additional developments include a theoretical framework for the τ-statistic that incorporates uncertainty due to pathogen generation time, and to define case type homology more continuously using genetic distance matrices. We hope these developments will enable users to address more complex questions and incorporate more sources of uncertainty into estimates of spatial dependence. Check Github at https://github.com/HopkinsIDD/ IDSpatialStats for latest development release. Overview of the τ-statistic, its performance given observation error, Lessler et al. (2016)