CONTRIBUTED RESEARCH ARTICLE 1 spMC: Modelling Spatial Random Fields with Continuous Lag Markov Chains

Currently, a part of the R statistical software is developed in order to deal with spatial models. More specifically, some available packages allow the user to analyse categorical spatial random patterns. However, only the spMC package considers a viewpoint based on transition probabilities between locations. Through the use of this package it is possible to analyse the spatial variability of data, make inference, predict and simulate the categorical classes in unobserved sites. An example is presented by analysing the well-known Swiss Jura data set. Introduction Originally, the spMC package (Sartore, 2013) was developed with the purpose of analysing categorical data observed in 3-D locations. It deals with stochastic models based on Markov chains, which may be used for the analysis of spatial random patterns in continuous multidimensional spaces (Carle and Fogg, 1997). Its results are easily interpretable and it is a good alternative to the T-PROGS software developed by Carle (1999), which is oriented towards modelling groundwater systems. The models considered in the spMC package are used to analyse any categorical random variable Z(s) at the d-dimensional position s 2 R d which satisfies the Markov property. Other R packages are also helpful for analysing categorical spatial data. For example, the gstat package (Pebesma, 2004) allows for analyses using traditional methods such as the parameter estimation of spatial models based on variograms and kriging techniques for predictions. All these methods and their variants are also available in other packages, e.g. geoRglm (Christensen and Ribeiro Jr, 2002) and RandomFields (Schlather, 2013). When Z(s) is assumed to be linked to a continuous hidden random process, these packages are useful for studying the covariance structure of the data. The spMC package extends the functionality of the T-PROGS software to R users. New useful functions are included for faster modelling of transition probability matrices, and efficient algorithms are implemented for improving predictions and simulations of categorical random fields. The main tasks and their functions are clearly summarised in Table 1. Three different fitting methods were implemented in the package. The first is based on the estimates of the main features that characterise the process, the second focuses on the minimisation of the discrepancies between the empirical and theoretical transition probabilities, and the third follows the maximum entropy approach. Once the model parameters are properly estimated, transition probabilities are calculated through the matrix- valued exponential function (see Higham, 2008, Algorithm 10.20 in Chapter 10). These transition probabilities are then combined to predict the category in an unsampled position. Three algorithms are used to simulate spatial random fields; those based on the kriging techniques (Carle and Fogg, 1996), those using fixed and random path methods (Li, 2007a; Li and Zhang, 2007), or those using multinomial categorical simulation proposed by Allard et al. (2011). In order to reduce computation time through OpenMP API (version 3.0; OpenMP Architecture Review Board, 2008), the setCores() function allows the user to change the number of CPU cores, so that one can mix shared memory parallel techniques with those based on the Message Passing Interface (The MPI Forum, 1993) as described in Smith (2000). Here, it will be shown how to perform a geostatistical analysis of the Jura data set (Goovaerts, 1997) using the spMC package (version 0.3.1). The data set consists of 359 sampled spatial coordinates and their respective observed realisations of two categorical variables (related to the rock-type and the land use) and some continuous variables (corresponding to the topsoil content).


Introduction
Originally, the spMC package (Sartore, 2013) was developed with the purpose of analysing categorical data observed in 3-D locations.It deals with stochastic models based on Markov chains, which may be used for the analysis of spatial random patterns in continuous multidimensional spaces (Carle and Fogg, 1997).Its results are easily interpretable and it is a good alternative to the T-PROGS software developed by Carle (1999), which is oriented towards modelling groundwater systems.The models considered in the spMC package are used to analyse any categorical random variable Z(s) at the d-dimensional position s ∈ R d which satisfies the Markov property.Other R packages are also helpful for analysing categorical spatial data.For example, the gstat package (Pebesma, 2004) allows for analyses using traditional methods such as the parameter estimation of spatial models based on variograms and kriging techniques for predictions.All these methods and their variants are also available in other packages, e.g.geoRglm (Christensen and Ribeiro Jr, 2002) and RandomFields (Schlather, 2013).When Z(s) is assumed to be linked to a continuous hidden random process, these packages are useful for studying the covariance structure of the data.
The spMC package extends the functionality of the T-PROGS software to R users.New useful functions are included for faster modelling of transition probability matrices, and efficient algorithms are implemented for improving predictions and simulations of categorical random fields.The main tasks and their functions are clearly summarised in Table 1.Three different fitting methods were implemented in the package.The first is based on the estimates of the main features that characterise the process, the second focuses on the minimisation of the discrepancies between the empirical and theoretical transition probabilities, and the third follows the maximum entropy approach.Once the model parameters are properly estimated, transition probabilities are calculated through the matrixvalued exponential function (see Higham, 2008, Algorithm 10.20 in Chapter 10).These transition probabilities are then combined to predict the category in an unsampled position.Three algorithms are used to simulate spatial random fields; those based on the kriging techniques (Carle and Fogg, 1996), those using fixed and random path methods (Li, 2007a;Li and Zhang, 2007), or those using multinomial categorical simulation proposed by Allard et al. (2011).In order to reduce computation time through OpenMP API (version 3.0;OpenMP Architecture Review Board, 2008), the setCores() function allows the user to change the number of CPU cores, so that one can mix shared memory parallel techniques with those based on the Message Passing Interface (The MPI Forum, 1993) as described in Smith (2000).
Here, it will be shown how to perform a geostatistical analysis of the Jura data set (Goovaerts, 1997) using the spMC package (version 0.3.1).The data set consists of 359 sampled spatial coordinates and their respective observed realisations of two categorical variables (related to the rock-type and the land use) and some continuous variables (corresponding to the topsoil content).

Brief overview of the models
The spMC package deals with both one-dimensional and multidimensional continuous lag models.If Z(s l ) denotes a categorical random variable in a location s l , for any l = 1, . . ., n, its outcome conventionally takes values in the set of mutually exclusive states {z 1 , . . . ,z K }, where K represents the total number of observable categories.A continuous lag Markov chain model organises the conditional probabilities for any i, j = 1, . . ., K, in a K × K transition probability matrix.Generally speaking, such a model is a transition probability matrix-valued function depending on one-dimensional or multidimensional The R Journal Vol.T(h φ ) : R d → [0, 1] K×K , wherein h φ denotes a d-dimensional continuous lag along the direction φ ∈ R d .Such a lag corresponds to the difference between the location coordinates and is proportional to the direction φ.The exponential form, is usually adopted to model the observed variability and local anisotropy.The components of the transition rate matrix R φ ∈ R K×K (the model coefficients) depend on the direction φ and they must satisfy the following properties (Norris, 1998, Section 2.1): • r ii ≤ 0, for any i = 1, . . ., K.
• The row sums satisfy • The column sums satisfy where p i is the proportion of the i-th category.
The R Journal Vol.5/2, December ISSN 2073-4859 The components of R −φ may be computed through the relation where −φ denotes the opposite direction.

Transition rate matrix estimation
In order to obtain an estimate of the transition rate matrix along the direction φ, the package provides two solutions, i.e. by following the one-dimensional approach or the multidimensional.The latter estimates the matrix R φ by the ellipsoidal interpolation of d matrices, which are computed along the axial directions through one-dimensional procedures.
The one-dimensional techniques related to the tpfit_ml() and tpfit_me() functions are based on mean lengths L i, φ and transition frequencies of embedded occurrences f * kj, φ .The iterated least squares method is implemented through the tpfit_ils() function.
The first two functions estimate the stratum mean lengths for each category through the mlen() function.The mean lengths are computed either with the average of the observed stratum lengths or their expectation based on the maximum likelihood estimate by assuming that the observed lengths are independent realisations of a log-normal random variable.In order to verify the distributional assumption on the lengths, the function getlen() estimates stratum lengths of embedded Markov chains along a chosen direction, while other functions such as boxplot.lengths(),density.lengths(),hist.lengths() are used for graphical diagnostics.
The tpfit_ml() function computes the transition frequencies of embedded occurrences as an average through the function embed_MC().The maximum entropy method, adopted by the tpfit_me() function, calculates the transition frequencies of embedded occurrences through the iterative proportion fitting (Goodman, 1968).The algorithm may be summarised as follows: 4. Repeat the second and the third step until convergence.
Both tpfit_ml() and tpfit_me() functions estimate the autotransition rates as r ii = −1/L i, φ , while the rates for any i = j are calculated as r ij, φ = f * ij, φ /L i, φ .The tpfit_ils() function estimates the transition rate matrix by minimising the sum of the squared discrepancies between the empirical probabilities given by the transiogram() function and theoretical probabilities given by the model.The bound-constrained Lagrangian method (Conn et al., 1991) is performed in order to have a proper transition rate matrix, which satisfies the transition rate properties.
The multidimensional approach is computationally efficient.In fact a generic entry of the matrix R φ is calculated by the ellipsoidal interpolation as where h v, φ is the v-th component of the vector h φ , e v represents the standard basis vector, and the rate r ij, e v is replaced by r ij, −e v for components h v, φ < 0. In this way, it is only necessary to have in memory the estimates for the main directions.
The multi_tpfit_ml(), multi_tpfit_me() and the multi_tpfit_ils() functions automatically perform the estimation of d transition rate matrices along the axial directions with respect to the chosen one-dimensional method.

Prediction and simulation based on transition probabilities
Several methods were developed to predict or simulate the category in an unobserved location s 0 given the knowledge of the sample positions s 1 , . . ., s n .The conditional probability, is used to predict or simulate the category in s 0 , where z i represents the i-th category and z(s l ) is the observed category in the l-th sample location.
Usually such a probability is approximated through • Kriging-and Cokriging-based methods, implemented in the sim_ik() and sim_ck() functions.
The approximation proposed by Carle and Fogg (1996) is implemented in the functions sim_ik() and sim_ck().Both of them use some variant of the following where and the weights w ij, l are calculated by solving the following system of linear equations: where This approximation does not satisfy the probability axioms, because such probabilities might lie outside the interval [0, 1] and it is not ensured that they sum up to one.To solve the former problem truncation is considered, but the usual normalisation is not adopted to solve the latter; in fact, after the truncation, these probabilities might also sum up to zero instead of one.The implemented stabilisation algorithm translates the probabilities with respect to the minimum computed for that point.Then, the probabilities are normalised as usual.
To improve the computational efficiency of the algorithm, the m-nearest neighbours are considered in the system of equations instead of all sample points; in so doing, a decrease in computing time is noted and the allocated memory is drastically reduced to a feasible quantity.
For the approximation adopted in the sim_path() function, conditional independence is assumed in order to approximate the conditional probability as in the analysis of a Pickard random field (Pickard, 1980).This method, as described in Li (2007b), considers m known neighbours in the axial directions, so that the probability is computed as The method proposed by Allard et al. ( 2011) is implemented in the sim_mcs() function.It was introduced to improve the computational efficiency of the Bayesian maximum entropy approach The R Journal Vol.5/2, December ISSN 2073-4859 proposed by Bogaert (2002).Here, the approximation of the conditional probability is . Also in this case, the user can choose to apply this approximation by considering all data or only the m-nearest neighbours, with the same advantages described above.
Once the conditional probabilities are computed, the prediction is given by the highest probable category, while the simulation is given by randomly selecting one category according to the computed probabilities.
After the first simulation is drawn, the sim_ik() and sim_ck() functions execute an optimisation phase in order to avoid "artifact discontinuities", wherein the simulated patterns do not collimate with the conditioning data (Carle, 1997).The user can then choose to perform simulated annealing or apply a genetic algorithm in order to reduce the quantity where r ij, SIM and p i, SIM are coefficients estimated from the pattern to optimise, while r ij, MOD and p i, MOD are those used to generate the initial simulation.Other comparison methods are also available through the argument optype.

An example with the Jura data set
The data set consists of spatial coordinates and the observed values of both categorical and continuous random variables collected at 359 locations in the Jura region in Switzerland.In particular, we will deal with the rock-type categorical variable of the geological map created by Goovaerts (1997, see Figure 4), which consists of 5957 sites.The aim of these analyses is related to the parameters estimation of the model in (1), and its interpretation through graphical methods.These analyses are useful to check the model assumptions and to ensure the accuracy of the predictions.
First, the spMC package and the Jura data set in the gstat package are loaded as follows: library(spMC) data(jura, package = "gstat") If the package is compiled with the OpenMP API, the number of CPU cores to use can be set by setCores(4) otherwise a message will be displayed and only one core can be internally used by the spMC package.
In order to study the spatial variability of the data and interpret the transitions from a geometrical viewpoint, the empirical transition probabilities along the main axes are calculated.These probabilities point out the persistence of a category according to the lag between two points.They also provide juxtapositional and asymmetrical features of the process, which are not detected by adopting indicator cross-variograms (Carle and Fogg, 1996).Therefore, all couples of points along axial directions are chosen such that their lag-length is less than three.After, we calculate the empirical transition probabilities for twenty points within the maximum distance.This can be conducted with the execution of the following code: )) If we want to compare these probabilities with the theoretical one, we first need to estimate two transition rate matrices, i.e. the model coefficients, along the axial directions.Three estimation methods are available in the spMC package (see Table 1), but only those based on mean lengths and maximum entropy are shown, even though the iterated least squares may be similarly applied.The code to estimate the transition rates through the mean lengths method is written as follows: The R Journal Vol.5/2, December ISSN 2073-4859 One−dimensional transiograms (X−axis)    RTm <-list() In this case, the mean lengths are calculated through the average of the stratum lengths along the chosen directions.On the other hand, to estimate the transition rate matrices through the maximum entropy approach, the following code must be executed: ] <-tpfit_me(data, coords, direction = c( , 1)) Given the model coefficients, the transition probabilities for some specific lags are calculated as in (1).This is done as follows: RTr <-list() ETr <-list() Since these probabilities are calculated with respect to some fixed directions, i.e. by considering a one-dimensional perspective, they can be graphically compared.By the use of the mixplot() function, several transition probability diagrams (transiograms) can be superposed in a unique graphic, e.g.

for (i in 1:2) mixplot(list(Trg[[i]], RTr[[i]], ETr[[i]]
), type = c("p", "l", "l"), pch = "+", col = c(3, 1, 2), legend = FALSE, main = paste( "One-dimensional transiograms", c("(X-axis)", "(Y-axis)")[i])) By looking at the graphics in Figure 1, one can see how well the estimated models fit the observed probabilities (green points).This kind of graphic may be interpreted as a transition probability matrix; in fact it shows the probability dynamic related to one-dimensional lags along the specified direction.For example, let us consider the first horizontal line of graphics in Figure 1 on the left.They denote the transition probabilities at each lag from the Argovian state to one of the five categories.
These graphics are mainly used to investigate the stationarity of the stochastic process.In particular, the process is weakly stationary if the expected probabilities are not dependent on the location points, i.e.IE[Pr(Z(s) = z i )] = p i for all s ∈ R d .Theoretically, the transition probability matrix in (1) becomes constant as the lag distance h φ → ∞.In order to check the stationarity property of the process along one fixed direction φ, we need to look at empirical transition probabilities computed for large distances.If most of these probabilities reproduce the characteristics already described, the data might be considered as a realisation from a weakly stationary process.
The comparison can also be made between two or more transiograms drawn for different directions.In so doing, it is possible to check if the process is anisotropic.This happens when there is directional dependence in the data.From a probabilistic point of view, the transiogram may show different dynamics when it approaches the limit probability matrix along different directions.In our case, the behaviours of the empirical transition probabilities along the axial directions do not match.Although those based on the estimated model are more regular than the empirical, they are not similar.This means that the Jura data set is anisotropic.
The function pemt() can be considered as another tool to check the anisotropy of the process.It estimates the transition rate matrix for each multidimensional lag direction and computes the transition probabilities as in (1).At the same time the function calculates other probabilities through the transition rates computed as in (2).Then the probabilities are drawn by use of the function image.pemt()(see Figure 2).If probabilities at the same level (those with the same colour) are placed on ellipses, the process is characterised by geometrical anisotropy.Comparisons made by the use of contour.pemt() are more evident, because contour lines are displayed for both the pseudo-empirical and the theoretical probabilities in a unique graphic.
The following R code can be executed to obtain Figure 2: psEmpTr <-pemt (data, coords, 4 , max.dist = c(.5, .5))mycol <-rev(heat.colors(5)) image(psEmpTr, col = mycol, useRaster = TRUE, breaks = c( :5 ) / 5 ) From a computational point of view, the model based on the ellipsoidal interpolation of transition rate matrices is the most efficient way to calculate transition probabilities given multidimensional lags.In these cases, the model coefficients can be separately estimated by a unique R function.Hence, the functions multi_tpfit_ml() and multi_tpfit_me() provide methods to estimate transition rate The R Journal Vol.5/2, December ISSN 2073-4859 matrices along axial directions.These functions implement algorithms based on the mean lengths and the maximum entropy respectively (see Table 1).
MTr <-list() MTr$average <-multi_tpfit_ml(data, coords) MTr$entropy <-multi_tpfit_me(data, coords) With the output of these functions, we can draw the theoretical transition probability maps as follows: image(MTr$average, 4 , max.dist = .25,col = mycol, nlevels = 5, breaks = :5 / 5 ) image(MTr$entropy, 4 , max.dist = .25,col = mycol, nlevels = 5, breaks = :5 / 5 ) Both graphics in Figure 3 denote transition probability maps.The way to read these graphics is almost the same as for one-dimensional transiograms.Each image in this kind of graphic represents a 2-D transition probability section; this means that the probability level is given by the colour of the points.Each point is located to a specific "bidimensional" lag.
The transition probabilities obtained through maximum entropy rates (see Figure 3 on the right) are too regular for the process.If we look at the transiogram in Figure 1 on the left, we note that the red lines are not so close to the empirical transition probabilities and this may create some forecast problems when we consider the multidimensional lags.In fact, since the model was developed for stationary processes, its use is suitable when the stochastic process might be considered stationary.
Once the best fitting mdoel is chosen, we can predict the category in the unknown points or simulate a random field.In any case, we need to consider that the approximation of the simulation probabilities is affected by further variability due to the ellipsoidal interpolation of the transition rates.This means that the real transition rates for a non-axial direction can be overestimated or underestimated with a bigger error than the axial directions.
In this example, 100 observations are sampled from the original geological map and, instead of re-estimating transition rates, we are going to predict the category in the original locations through the already estimated rates.From a computational viewpoint, this allows us to compare the prediction accuracy of the procedures exposed in Table 1.
In real applications, all data may be used to estimate the parameters of the model and get better predictions.Simulations should be used only for drawing scenarios for non-observed locations.Obviously, the most probable category in a location is the best prediction.
The following lines of code are executed to plot the Jura geological map (see Figure 4): The R Journal Vol.5/2, December ISSN 2073-4859 Sample of 100 observations X Y q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Argovian Kimmeridgian Sequanian Portlandian Quaternary X <-jura.grid$XY <-jura.grid$Ylibrary(RColorBrewer) brwCol <-brewer.pal(nlevels(data),"Accent") par(mfrow = c(1, 1), mar = c(5, 4, 4, 2)) plot(X, Y, col = brwCol[data], pch = 15, main = "Swiss Jura geological map") legend("topleft", legend = levels(data), col = brwCol, pch = 15) One hundred observations are randomly selected as follows: set.seed(29 62 11) smp <-sample(length(data):1, 1 ) and they are plotted by the following code: plot(X, Y, type = "n", main = "Sample of 1 observations", xlab = "X", ylab = "Y") points(X[smp], Y [smp],pch = 19,col = brwCol[data[smp]]) legend("topleft", legend = levels(data), col = brwCol, pch = 15) Usually, before performing the simulation, a grid of points is generated by the use of the wellknown expand.grid()function.In this example, the simulation grid is set as grid <-jura.grid[, 1:2] The kriging algorithm will approximate the conditional probabilities by considering the twelve nearest neighbours for all points in the simulation grid.Since only predictions will be presented here, the optimisation phase used to adjust the simulations will be skipped.grid, radius = 1) The multinomial categorical simulation method will approximate the prediction probabilities by considering all sample points.All these functions returns a "data.frame"object as output.It contains the coordinates, predictions, simulations and the approximated probability vector for each point in the simulation grid.Through these quantities we can plot the prediction maps by the use of the following R code: posCol <-as.integer(iks$Prediction)plot(X, Y, pch = 15, col = brwCol[posCol], main = "Kriging prediction map") legend("topleft", legend = levels(data), col = brwCol, pch = 15) posCol <-as.integer(fpth$Prediction)plot(X, Y, pch = 15, col = brwCol[posCol], main = "Fixed path prediction map") legend("topleft", legend = levels(data), col = brwCol, pch = 15) posCol <-as.integer(rpth$Prediction)plot(X, Y, pch = 15, col = brwCol[posCol], main = "Random path prediction map") legend("topleft", legend = levels(data), col = brwCol, pch = 15) posCol <-as.integer(mcs$Prediction)plot(X, Y, pch = 15, col = brwCol[posCol], main = "Multinomial categorical prediction map") legend("topleft", legend = levels(data), col = brwCol, pch = 15) By looking at the graphics in Figures 5 and 6, we can have an idea of the prediction heterogeneity of these methods.In order to establish which is the best predictor, one should perform these simulations more than once.At each time, another 100 observations must be randomly selected.However, this is beyond the aim of this example.
Using only 2% of the original data, we can obtain the results in Table 2 by checking how many predictions match with the observed categories.Since the data are used in the computations of the probabilities, the prediction accuracy improves under particular conditions.Essentially, the sample size should increase while the spatial domain, wherein the observations are taken, is fixed and bounded.In this example, we can have more accurate predictions by increasing the number of the random selected observations and keeping the number of points in the simulation grid fixed.

Conclusions
Although there exist other approaches to study categorical random fields, the spMC package is a powerful tool for modelling continuous lag Markov chains.In particular, the user is allowed to deal with categorical response variables based on spatial stochastic processes without specifying a variogram model for the spatial dependence structure.
Several functions were developed to investigate graphically the properties of the process (e.g.stationarity and anisotropies), while others are useful to estimate the parameters, to predict and simulate the categorical values on unsampled locations.All of the functions in the package were designed in order to achieve good results in a reasonable time, as well as for large data sets.

Figure 1 :
Figure1: Empirical transiogram (green points) and theoretical probabilities (average method in black lines, maximum entropy method in red) along the X-axis (left) and along the Y-axis (right).

Figure 3 :
Figure 3: Multidimensional theoretical transiogram (transition rates are estimated by the average method (left) and by the maximum entropy method (right).

Figure 4 :
Figure 4: The Swiss Jura geological map (left) and the 100 sampled observations (right).
iks <-sim_ik(MTr$average, data = data[smp], coords = coords[smp, ],grid, knn = 12, max.it = )Both fixed and random path simulation methods are performed by considering those nearest points along the axial directions within a radius of length one.

Figure 5 :
Figure 5: Prediction obtained by kriging probability approximation (left) and by fixed path probability approximation (right).

Figure 6 :
Figure 6: Prediction obtained by random path probability approximation (left) and by multinomial categorical probability approximation (right).

Table 1 :
Most important user functions in the spMC package.

Table 2 :
Percentages of matched categories.