Spatial stratification of landscapes allows for the development of efficient sampling surveys, the inclusion of domain knowledge in data-driven modeling frameworks, and the production of information relating the spatial variability of response phenomena to that of landscape processes. This work presents the rassta package as a collection of algorithms dedicated to the spatial stratification of landscapes, the calculation of landscape correspondence metrics across geographic space, and the application of these metrics for spatial sampling and modeling of environmental phenomena. The theoretical background of rassta is presented through references to several studies which have benefited from landscape stratification routines. The functionality of rassta is presented through code examples which are complemented with the geographic visualization of their outputs.
The application of robust, quantitative approaches for the spatial modeling of environmental phenomena has increased in the past few decades mainly due to an increase in computational power, advances in statistical modeling, and the availability of geospatial layers of environmental information (Scull et al. 2003; Elith and Leathwick 2009). Most of these approaches aim at building explicit quantitative relationships between environmental controls and response phenomena through statistical learning. Examples of these approaches include digital soil mapping (DSM) (McBratney et al. 2003), species distribution modeling (SDM) (Guisan and Zimmermann 2000), land use/land cover classification (Ham et al. 2005), and forest fire modeling (Chuvieco et al. 2010). Despite the extensively documented success of these approaches, there are still some challenges that limit their application. For instance, poor statistical performance is often reported in studies where input data is too limited to accurately represent control-response relationships (Araújo and Guisan 2006). Moreover, model parsimony and interpretation of results can be compromised when using ‘black-box’ algorithms (Arrouays et al. 2020). Similarly, including a priori knowledge about natural processes in purely statistical approaches can be challenging to achieve (Heuvelink and Webster 2001).
Several studies have suggested embedding spatial stratification routines within approaches such as DSM, SDM, land use/cover mapping, forest fire modeling, and others to overcome the challenges limiting their application. In such studies, the spatial stratification of landscapes creates units with reduced spatial variability of environmental phenomena as compared to the overall variability across a landscape. The use of these units allows the researcher to (a) obtain balanced representations of control-response relationships (Guisan and Zimmermann 2000; West et al. 2016); (b) include expert knowledge of physical processes for improving modeling with limited data (Zhu et al. 2008); (c) improve the performance of parameterization of mechanistic models (Park and Van De Giesen 2004; Baldwin et al. 2017); and, (d) facilitate the interpretation of environmental conditions and their influence on the spatiotemporal variability of processes of interest (Rodrigues et al. 2019).
In general, landscape stratification routines follow fundamental ecological concepts that explain the hierarchical and multi-scale nature of relationships between environmental phenomena across space (Allen and Starr 1982). Therefore, landscape stratification methods have been applied in many studies that use geospatial information for environmental modeling, such as those previously cited. However, few packages exist in the R environment with functions strictly aimed at landscape stratification routines using geospatial data. Although one could implement custom stratification algorithms using multiple all-purpose geospatial analysis packages such as terra (Hijmans 2021) and sf (Pebesma 2018), the ease of use, reproducibility, and replicability of analysis is often enhanced when algorithms are implemented as part of a dedicated package. The motif package (Nowosad 2021) is the only example the authors could find of a package that is fully dedicated to landscape stratification in R using geospatial data. Although the methods offered by motif are effective for large-scale studies (Jasiewicz et al. 2015; Nowosad 2021), their application is currently limited to rasters of categorical data. Thus, motif is not practical for the modeling of spatially continuous environmental phenomena, which is often a goal of landscape stratification routines.
This work presents the rassta package as a collection of algorithms for the spatial stratification of landscapes, sampling, and modeling of environmental phenomena. The rassta package is not intended as a drop-in replacement for statistically-robust environmental modeling approaches. Rather, it is intended to serve as a generalized framework to derive geospatial information that can be used to improve inference with these statistical approaches.
The algorithms in the rassta package assist in the analysis of
environmental information related to the spatial variability of natural
phenomena across landscapes. These functions focus on integrating standard
geospatial techniques and quantitative analysis in a generalized framework for
landscape stratification, sampling, and modeling. All of the functions in the
rassta package take geospatial data in raster format as input. In the
context of geographic information systems (GIS), the raster format can be
considered a graphical representation of a matrix that is organized in rows and
columns, and which may be stacked in multiple layers (e.g., multi-band satellite
imagery). Each cell (pixel) in the raster contains a value representing a
spatially-varying phenomenon, such as elevation or precipitation. A few
functions in rassta also produce geospatial data in vector format. Vector
data represents geometric entities in the form of points, lines, and polygons.
The rassta package uses the highly efficient terra package as the
backbone for handling raster and vector data. Most of the geospatial data
manipulation with terra is performed in C++ and is based on two main R
data types (classes): SpatRaster
and SpatVector
. Note that terra
imports the Rcpp package (Eddelbuettel and François 2011) since terra uses
C++ (including external pointers) to manipulate these classes.
Most of the functions implemented in rassta are interrelated in the sense
that the outputs from some functions can be used as the inputs for others. This
functional interrelation allows for a generalized framework to conduct spatial
stratification, sampling, and modeling in a single package following a
project-oriented approach. In general, the functions of rassta can be
grouped into five categories: (a) landscape stratification; (b) landscape
correspondence metrics; (c) stratified sampling; (d) spatial modeling; and (e)
miscellaneous (Figure 1). Each category and its corresponding
functions (except for miscellaneous) are theoretically founded on several
studies focused on understanding spatially-varying natural phenomena across
landscapes. In the next sections, the rationale behind each category and its
functions is described. This description is complemented with references to
corresponding scientific literature and includes code examples showing the
application of each function with extensive use of plotting functions (for
visualization purposes only). Most of the plotting functions are derived from
the terra package using the SpatRaster
and SpatVector
classes. [Note:
To reduce the extension of code examples, all the map and graph plotting
functions were consolidated in the function figure()
].
Several studies have suggested the need to account for the hierarchical and multi-scale nature of landscape processes. Allen and Starr (1982) suggested that landscape processes can be explained through hierarchical multivariate structures given their multiple spatial and temporal scales. Based on Dokuchaeiv’s theory of soil formation (Glinka 1927) and the soil-landscape paradigm (Hudson 1992), McSweeney et al. (1994) proposed a nested model of soil-landscape processes at the physiographic, geomorphometric, and within-soil domains. Flügel (1995) suggested that the regionalization of hydrology-related processes should consider the multi-scale landscape heterogeneity in terms of soil, topography, geology, climate, and vegetation. These ideas have led these and other authors to formulate frameworks for the creation of spatial entities that stratify the landscape. The general purpose of these entities has been to define spatially-explicit domains that represent distinctive landscape processes and/or interactions (McSweeney et al. 1994). Accordingly, spatial stratification using rassta focuses on the creation of such domains (hereafter referred to as units).
The landscape stratification process with rassta follows a hierarchical approach similar to Austin and Heyligers (1989), who individually classified gradients of precipitation and elevation into intervals that were intersected with geologic classes for sampling purposes. Similarly, in rassta, a set of first-level units is created separately for each landscape factor under analysis. Then, multiple sets of first-level units are integrated into a single set of second-level units. The first-level units, called classification units, can be created outside of rassta via multicriteria analysis, statistical learning, or other methods. Moreover, the classification units can be formally defined through classification schemes, such as those based on taxonomic keys. The second-level units, called stratification units, result from the spatial intersection of multiple sets of classification units. Note that both classification and stratification units represent a spatial stratification for a given landscape. Figure 2 shows an example of a simple landscape stratification process based on two landscape factors, each with three raster layers representing continuous variables.
There are three important aspects of the stratification approach used within rassta that must be considered. (a) One can simply create stratification units by incorporating variables from multiple landscape factors in a single classification process. However, the interpretation of results is often compromised when using a large number of variables in “all-in-the-bag” statistically driven classification schemes. (b) Multiple sets of classification units can belong to a single landscape factor, and each set can be created from variables at a distinct spatial scale. Presumably, this can account for the multi-scale nature of landscape factors in the stratification process. (c) A landscape factor can be represented by a single categorical variable, as in the case of geologic units or soil parent material. In this case, the landscape factor/variable is already in the form of classification units. Figure 3 shows a landscape stratification scenario like that addressed in (b) and (c).
A set of n classification units represents n distinct landscape
configurations related to a single landscape factor. Note that the term
landscape configuration is used here as a generic term for a particular
pattern in the spatial variability of one or multiple variables belonging to a
landscape factor. Currently, rassta allows the creation of classification
units via unsupervised learning thanks to its functions som_gap()
and
som_pam()
. The function som_gap()
performs dimension reduction based on the
self-organizing map (SOM) proposed by Kohonen (1990). The R package
kohonen (Wehrens and Kruisselbrink 2018) is called internally by som_gap()
to produce
the SOM. The function som_gap()
also performs cluster analysis on the SOM
codes based on the partitioning around medoids (PAM) (Kaufman and Rousseeuw 1990), with
estimation of the optimum number of clusters (k) through the gap statistic
(Tibshirani et al. 2001). It is important to mention that the output SOM object
returned by som_gap()
can be used as input for any other clustering algorithm
(e.g., hierarchical, spectral, etc.) or statistical analysis outside of
rassta.
The code below shows how som_gap()
reduces the feature space and selects k
clusters from four terrain variables. Note that the processing time of
som_gap()
is significant (around 162 seconds on a 4-cores Intel processor at
3.2 GHz for the following example). The processing time increases as the number
of cells/layers in the argument var.rast
increases, and/or as the argument
K.max
increases.
# Load the rassta and terra packages
library(rassta)
library(terra)
# Note that terra imports Rcpp, but if Rcpp is not automatically loaded then:
library(Rcpp)
# Get the data required to run the examples from rassta’s installation folder
wasoil <- system.file("exdat/wasoil.zip", package = "rassta")
# Copy data to current working directory and extract files
file.copy(from = wasoil, to = getwd())
unzip("wasoil.zip")
# Set seed
set.seed(963)
# Multi-layer SpatRaster with 4 terrain variables
terr.var <- rast(c("height.tif", "midslope.tif", "slope.tif", "wetness.tif"))
# Scale variables to mean = 0 and standard deviation = 1
terr.varscale <- scale(terr.var)
# Dimensionality reduction and estimation of optimum k (max k to evaluate: 12)
terr.som <- som_gap(terr.varscale, xdim = 10, ydim = 10, K.max = 12)
# Plot results
figure(4, d = list(terr.var, terr.som))
The function som_pam()
creates raster versions from the outputs of
som_gap()
. The code below shows how som_pam()
creates raster versions of the
SOM grid and PAM clustering computed in the previous example.
Note that the approach for creating classification units should not be limited
to that offered by som_gap()
and som_pam()
. There are many other approaches
outside of rassta that can be followed, such as supervised classification
based on statistical learning, or GIS-based multicriteria analysis. The best
approach may depend on the research question(s) being addressed. Therefore, the
selection of the proper approach and the optional use of other R packages and/or
GIS software is left to the user. Also, note that classification units created
outside of rassta are completely compatible with rassta objects and
methods if the units are represented through the SpatRaster
class from
terra.
A set of n stratification units represents n distinct landscape
configurations related to multiple landscape factors. Note that the term
landscape configuration is used here as a generic term for a particular
pattern in the spatial variability of multiple variables belonging to multiple
landscape factors, or to the same factor represented at multiple spatial scales.
The function strata()
allows the spatial intersection of multiple sets of
classification units into a single set of stratification units. This function
also assigns a unique numeric code to each stratification unit. The numeric code
makes it possible to trace back each classification unit composing a given
stratification unit. The code below shows the construction of stratification
units with strata()
using classification units from three landscape factors
(climate, soil parent material, and terrain).
There are two metrics of landscape correspondence that can be calculated with rassta: (a) the spatial signature of classification units, and (b) the landscape similarity to stratification units. These metrics quantify the relative correspondence between any location across geographic space and landscape configurations represented by classification and stratification units. Several studies have applied similar concepts related to continuous correspondence between landscape configurations for the modeling of spatially-varying phenomena. Early examples include studies using multivariate distance metrics in the feature space for SDM (Carpenter et al. 1993) and studies applying the fuzzy set theory (Zadeh 1965) for multicriteria evaluation (Burrough 1989), DSM (Zhu and Band 1994) and landform classification (MacMillan et al. 2000).
The spatial patterns of the degree of correspondence between any landscape configuration and the configuration represented by a given classification unit are defined as the spatial signature. The spatial signature is represented by a raster layer of continuous values that results from the cell-wise aggregation of empirical distribution functions mapped over geographic space. Each distribution function corresponds to one variable and relates the classification unit to “typical” values of the variable within the classification unit. The concept of spatial signature is based on the work of Pike and Rozema (1975) and Pike (1988). These authors used the term geometric signature to describe a set of sample statistics (e.g., mean, standard deviation) of terrain variables (e.g., slope, curvature) used to distinguish “geomorphically disparate landscapes” (Pike 1988).
The spatial signature in rassta replaces the geometric signature’s
measurements of central tendency and dispersion statistics with statistical
distribution functions generated across geographic space. The statistical
distribution functions used in rassta are: (a) the probability density
function (PDF) based on the kernel density estimation, (b) the empirical
cumulative distribution function (ECDF), and (c) an inverted version of the ECDF
(iECDF). Note that the spatial signature concept is somewhat similar to the
virtual ecological niche (Hirzel et al. 2001) and the multivariate environmental
similarity surface (Elith et al. 2010), which are implemented in R through the packages
virtualspecies (Leroy et al. 2016) and dismo (Hijmans et al. 2020),
respectively. Figure 7 and Figure 8 show an illustration
and a pseudocode of the process to calculate the spatial signature of a
classification unit, respectively. Note that the function FUNSIG()
in the
pseudocode is just a placeholder to encompass the three functions from
rassta that are required to calculate spatial signatures. These functions
are select_functions()
, predict_functions()
, and signature()
, each will be
further discussed next.
An important assumption is made when using the PDF, ECDF, and iECDF to characterize the typical values of a given variable within a given classification unit. The position of a value within the distribution function is an indicator of how typical the value is in terms of the variable’s distribution within the classification unit. For instance, values closer to, or at the peak of the PDF are assumed to be the most typical values of the variable within the classification unit. Contrarily, values at the tails of the PDF are the less typical. Although one could simply use the PDF as a generalized function to denote typical values, this function assigns the same weight to values at the tails of the distribution regardless of the tail’s location (left or right). In some cases, a priori knowledge can dictate that typical values of a variable within a given classification unit are those approaching \(+\infty\), or those approaching \(-\infty\). The use of the ECDF and the iECDF is intended for those cases. More specifically, if a classification unit is known to be associated with a variable’s extreme values toward \(+\infty\), then the ECDF can be used to represent this association. Conversely, if the classification unit is associated with those variable’s extreme values toward \(-\infty\), then the iECDF can be used.
The function select_functions()
allows the user to select the statistical
distribution function used to represent the typical values for a given variable
within a specific classification unit. Both automatic and interactive selection
modes are supported, with the latter based on a shiny app
(Chang et al. 2021). The automatic selection of distribution functions is based on
within-unit statistics, also referred to as zonal statistics in the GIS
literature, and it follows the criteria described next:
The code below shows the automatic selection of statistical distribution
functions for four climatic classification units and two variables with
select_functions()
.
# Multi-layer SpatRaster with 2 climatic variables
clim.var <- rast(c("precipitation.tif", "temperature.tif"))
# Single-layer SpatRaster with 4 climatic classification units
clim.cu <- rast("climate.tif")
# Automatic selection of statistical distribution functions
clim.difun <- select_functions(cu.rast = clim.cu,
var.rast = clim.var,
mode = "auto")
# Plot results
figure(8, d = list(clim.difun, clim.cu, clim.var))
The selected distribution functions can be used to generate predictions of
distribution function values over geographic space with the function
predict_functions()
as shown in the code below. The predictions are generated
by fitting a locally estimated scatterplot smoothing (LOESS) regression with the
within-unit distribution function’s values (y) and the within-unit variable’s
values (x). The fitted LOESS and the raster layer of the variable are then
used to predict new distribution function values across geographic space.
# Multi-layer SpatRaster of climatic variables and classification units
clim.all <- c(clim.var, clim.cu)
# Ouput table from select_functions()
df <- clim.difun$distfun
# Predicted distribution functions for climatic variables
clim.pdif <- predict_functions(cuvar.rast = clim.all,
cu.ind = 3,
cu = df$Class.Unit,
vars = df$Variable,
dif = df$Dist.Func)
# Plot results
figure(9, d = list(clim.pdif, clim.cu))
The function signature()
calculates the spatial signature of a given
classification unit by aggregating all of the predicted distribution functions
associated with the unit. The code below shows the calculation of spatial
signatures with signature()
. Note that the arguments inprex
and outname
allow the user to identify the raster layers representing the predicted
distribution functions associated with each classification unit in a set, and to
assign a unique name to each resulting raster layer of spatial signature,
respectively.
The spatial patterns of the degree of correspondence between any landscape configuration and the landscape configuration represented by a given stratification unit are defined as the landscape similarity. The landscape similarity is represented by a raster layer of continuous values, which results from the cell-wise aggregation of the spatial signatures of multiple classification units. This aggregation is possible because any given stratification unit is the result of the spatial intersection of multiple classification units, commonly one per landscape factor or factor scale (see Figure 2 and 3). Moreover, each classification unit has one spatial signature associated with it. Therefore, any given stratification unit will be associated with multiple spatial signatures, which can be cell-wise aggregated to calculate the landscape similarity. Figure 12 shows an example of the calculation process for a layer of landscape similarity to stratification unit.
The function similarity()
calculates the landscape similarity layer for each
stratification units in a given set (with the set being represented by a
single-layer SpatRaster
object), as shown in the following example. The
argument su.code
indicates the name of the landscape factors/factor scales
used to create the stratification units, and the digit position (start, end) of
the classification units’ ID in the stratification unit’s numeric code.
# Multi-layer SpatRaster with spatial signatures of classification units
clim.sig <- rast(list.files(pattern = "climate_")) # For climatic units
mat.sig <- rast(list.files(pattern = "material_")) # For soil parent material units
terr.sig <- rast(list.files(pattern = "terrain_")) # For terrain units
# Single-layer SpatRaster of stratification units
su <- rast("su.tif")
# Landscape similarity to stratification units
su.ls <- similarity(su.rast = su, sig.rast = c(clim.sig, mat.sig, terr.sig),
su.code = list(climate = c(1, 1), material = c(2, 2),
terrain = c(3, 3)))
# Plot results
figure(12, d = list(su.ls, su, clim.sig, mat.sig, terr.sig))
Stratified sampling is an efficient technique for achieving an adequate representation of environmental variability, reducing cost of field work, and improving modeling with limited observations (Austin and Heyligers 1989; Wessels et al. 1998; Guisan and Zimmermann 2000; Zhu et al. 2008; West et al. 2016). Accordingly, sampling with rassta to select observations/sampling locations is performed in a stratified fashion using stratification units. Additionally, the raster layers of landscape similarity to stratification units can be included in the sampling process. Including the landscape similarity layers results in a non-probability sample. For each stratification unit, the sampling process selects the observation(s)/sampling location(s) at the raster cell where the highest landscape similarity value occurs, resulting in a stratified, non-probability sample that is biased towards maximizing the representativeness of landscape configurations. This idea of biased, stratified sampling is based on the work of Gillison (1983); Gillison and Brewer (1985), Austin and Heyligers (1989), and Zhu et al. (2008). These authors have suggested that bias related to landscape configurations is relevant for the maximization of environmental representativeness, detection of maximum diversity, and representation of non-stochastic control-response relationships.
The function observation()
performs the automatic selection of the
representative response observation for each stratification unit in a given set.
Given a stratification unit, the unit’s representative response observation is
that whose value best reflects the influence that the unit’s landscape
configuration exerts on the response. This function requires a set of
observations/samples already collected for a set of stratification units.
Currently, observation()
selects observations based on the following methods:
(a) mls: select the observation at the raster cell with the maximum landscape
similarity value; (b) mrv: select the observation whose response value is the
median of all the values; and (c) random: select an observation at random.
Note that the latter represents a case of stratified random sampling.
The code below shows the selection of representative soil organic carbon (SOC)
observations based on the maximum landscape similarity method. Note that the
arguments su.rast
and ls.rast
require the stratification units and landscape
similarity layers previously created with strata()
and similarity()
,
respectively.
# SpatVector with SOC observations for stratification units
soc.obs <- vect("soc.shp")
# Representative SOC observation for each stratification unit
su.obs <- observation(su.rast = su, obs = soc.obs, col.id = 1, col.resp = 2,
method = "mls", ls.rast = su.ls$landsim)
# Plot results
figure(13, d = list(su.obs, soc.obs, su))
The function locations()
performs the automatic selection of the
representative sampling location(s) for each stratification unit in a given set,
where the representative sampling location is the raster cell where the highest
landscape similarity value occurs. Currently, locations()
implements two
selection methods: (a) buffer: select sampling locations within areas with
landscape similarity values above a certain threshold; and (b) absolute:
select sampling locations with the highest landscape similarity values. The code
below shows the use of locations()
based on the buffer method.
Predictive modeling with rassta is based on the assumption that each
stratification unit represents a distinct landscape configuration and that this
configuration influences a natural phenomenon in a distinctive manner. It is
assumed that the influence that a stratification unit’s landscape configuration
has on response phenomena at a specific location (i.e, raster cell) is
proportional to the unit’s landscape similarity value at that raster cell.
Therefore, given a stratification unit x, the corresponding raster layer of
landscape similarity ls, the response y, and a raster cell c, the greater
the value of ls at c, the more similar y at c will be to the typical y
for x. The typical (i.e., representative) value of a response phenomenon for a
given stratification unit can be defined in several ways. For instance, if a
response phenomenon was sampled/measured multiple times within a given
stratification unit, the typical response value could be that from the
sample/measurement at the raster cell with the highest landscape similarity
value (see observation()
).
Several studies have used landscape similarity layers to model the spatial
variability of natural phenomena. These studies argue that the use of similarity
layers is appropriate in cases when (a) available observations for modeling are
limited (Zhu et al. 2008); (b) initial spatial distribution patterns are needed for
survey design (Carpenter et al. 1993); (c) expert-driven selection of informative
variables is possible (Knick and Dyer 1997); (d) a priori knowledge of response-control
relationships in the form of conceptual models is available (Schmidt et al. 2005; Zhu et al. 2010); and (e) discriminating between (ecologically) positive and
negative deviations from reference environments is required (Watrous et al. 2006).
Accordingly, engine()
allows the modeling of environmental phenomena with a
number of training observations as few as the number of landscape similarity
layers [cases (a) and (b)]; training observations and landscape similarity
layers as outcomes of expert-driven landscape stratification [cases (c) and
(d)]; and landscape similarity layers derived from spatial signatures that
discriminate between the tails of distribution functions [case (e)].
Modeling with rassta is performed using the function engine()
. For
continuous responses, engine()
performs a weighted average involving
representative response values and landscape similarity layers. For a raster
cell c, the modeled response value is equal to the weighted average of the
representative values for those stratification units with the highest landscape
similarity values at c. The stratification units with the highest landscape
similarity values at c can be considered as the nearest neighbors (in
feature space) of the landscape configuration at c. These nearest neighbors
are called winning stratification units, and the weight of their corresponding
representative value is proportional to the winning unit’s landscape similarity
value at c. For categorical responses, the modal response value of the winning
stratification units replaces the weighted average. Figure 16 shows
an example of the modeling process for continuous responses with rassta.
Note that the weighted average for modeling phenomena across geographic space
has been widely applied in GIS-based multicriteria decision analysis (GIS-MCDA).
In GIS-MCDA, attributes (i.e., variables) in the form of raster layers are
weighted according to expert criteria. The weighted variables are then combined
through (cell-wise) overlay operators such as multiplication, addition and
(ordered) averaging. The resulting value at each cell represents the relative
suitability for a certain condition/decision (Malczewski 2006). The function
engine()
generalizes the weighted overlay process of GIS-MCDA by allowing the
use of sampled/measured data of a response phenomenon in conjunction with the
landscape similarity layers acting as weighted variables. This generalization
allows the modeling of real-valued phenomena in continuous or categorical form.
The modeling approach of engine()
is almost the same as that proposed by
Zhu (1997) to model landscape attributes across geographic space. The difference
between engine()
and the approach of Zhu (1997) is that engine()
allows
the selection of the number of landscape similarity layers for the weighted
average calculation. Presumably, restricting the number of layers will reduce
the shortening (‘shrinking’) effect that weighted averaging has on the range of
modeled continuous response values (Nolan et al. 2019).
The code below demonstrates the use of engine()
for the predictive modeling of
soil organic carbon. Note that the representative response values (argument
su.repobs
) are those previously selected with observation()
, and that the
layers of landscape similarity (argument ls.rast
) are those previously created
with similarity()
.
# Table with the numeric code of stratification units and representative SOC values
su.soc <- su.obs$su_repobs[, c("SU", "soc")]
# engine() requires a (tiled) SpatVector with the boundaries of the area of interest
aoi <- vect("aoi.shp")
# engine() writes results directly on disk
if (dir.exists("soc") == FALSE) {dir.create("soc")} # Create directory
# Spatial modeling of SOC across the landscape based on 3 winning stratification units
soc <- engine(ls.rast = su.ls$landsim, n.win = 3, su.repobs = su.soc,
tiles = aoi, outdir = "soc", overwrite = TRUE)
figure(16, d = list(soc, "soc_valid.shp")) # Plot results
The spatial signature only applies to classification units created from
continuous variables. Thus, spatial signatures cannot be calculated for
classification units that represent categorical variables, such as land use/land
cover. In such cases, a one-hot encoding can be applied to produce binary layers
for the units. These layers are considered the spatial signatures of the
classification units. The code below shows the creation of binary layers for
soil parent material units with dummies()
.
The function plot3D()
produces interactive maps showing the 3-dimensional
(XYZ) variability in raster layers representing continuous variables. The
XYZ reference positions are obtained from a user-supplied elevation layer. For
large raster layers (large spatial coverage and/or high spatial resolution),
this function allows the option to decrease resolution and subset the data. The
code below shows how plot3D()
creates an interactive
3D map for SOC, as modeled with engine()
.
# Single-layer SpatRaster of terrain elevation
elev <- rast("elevation.tif")
# Interactive 3D map
soc3D <- plot3D(c(elev, soc), z = 1, ex = 0.2, pals = "Fall", rev = TRUE)
soc3D$soc
This work presented the rassta package for spatial stratification, sampling, and modeling of environmental phenomena within the R environment. Future versioning of the rassta package will focus on developing new approaches for spatial stratification. Stratification based on spatial intersection may not be feasible to implement in highly complex landscapes because these landscapes may require many (sets of) classification units to accurately represent the spatial variability of landscape factors, leading to over-stratification, and thus, greater demand for samples/observations to conduct predictive modeling based on landscape similarity. One plausible solution is the application of the stratification methods presented by Jasiewicz et al. (2015), Jasiewicz et al. (2018), Nowosad (2021), and Nowosad and Stepinski (2021). However, these methods have been purposely designed for studies with continental/global applications. Therefore, these methods should be adapted for rassta to tailor their application at local scales to allow for more precise representations of natural phenomena and their spatial variability. Another focus of versioning can be new functions to visualize the variability of response phenomena relative to the hierarchical structure represented by the stratification units. Lastly, future versioning of rassta should also consider the user’s experiences to ensure its general applicability.
The core ideas implemented in the rassta package include the multi-scale, hierarchical landscape stratification based on spatial intersection, the application of non-parametric distribution estimators to define the typical landscape configuration of stratification units, and the use of spatially explicit landscape correspondence metrics for non-probability sampling and predictive modeling. Some of these ideas have previously been implemented in R through a few packages dedicated to the analysis of geospatial data. Nevertheless, rassta offers a unified, generalized framework to conduct multiple landscape stratification routines through a dedicated set of algorithms. Moreover, spatially-explicit information created with rassta, like stratification units, landscape similarity layers, and representative observations, can be embedded into statistically robust modeling approaches to optimize the analysis of environmental phenomena.
Supplementary materials are available in addition to this article. It can be downloaded at RJ-2022-036.zip
terra, sf, motif, rassta, Rcpp, kohonen, virtualspecies, dismo, shiny
ChemPhys, HighPerformanceComputing, NumericalMathematics, Spatial, SpatioTemporal, WebTechnologies
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Fuentes, et al., "rassta: Raster-Based Spatial Stratification Algorithms", The R Journal, 2022
BibTeX citation
@article{RJ-2022-036, author = {Fuentes, Bryan A. and Dorantes, Minerva J. and Tipton, John R.}, title = {rassta: Raster-Based Spatial Stratification Algorithms}, journal = {The R Journal}, year = {2022}, note = {https://doi.org/10.32614/RJ-2022-036}, doi = {10.32614/RJ-2022-036}, volume = {14}, issue = {2}, issn = {2073-4859}, pages = {286-304} }