The focus of this paper is on the open-source R package roahd (RObust Analysis of High dimensional Data), see (Tarabelloni et al. 2017). roahd has been developed to gather recently proposed statistical methods that deal with the robust inferential analysis of univariate and multivariate functional data. In particular, efficient methods for outlier detection and related graphical tools, methods to represent and simulate functional data, as well as inferential tools for testing differences and dependency among families of curves will be discussed, and the associated functions of the package will be described in detail.
Functional Data Analysis (FDA) has seen impressive growth in statistical research, driven by the increasingly frequent production of complex data in many research contexts (e.g., healthcare, environmental science, engineering). According to the FDA model, data can be seen as measurements of a certain quantity (or a set of quantities) along a given, independent and continuous indexing variable (such as time or space). Observations are then treated as random functions and can be viewed as trajectories of stochastic processes defined on a given infinite dimensional functional space. In this context, ‘high dimensional data’ means a high number of covariates/predictors (e.g., evaluations of a signal on a given grid) for a single sample unit (e.g., a signal). We therefore face the traditional ‘large p, small n’ problem: the number of features can exceed the number of observations. Many research areas deal with this kind of data, for example biomedical signals, high resolution imaging, and the analysis of website stream data.
Although research in FDA dates back to the 1970s and 1980s, the first edition of (Ramsay and Silverman 2005) and (Ramsay and Silverman 2002) made the methods available to a larger audience, with an enormous impact on the spread of this topic. The authors mainly cover explorative methods, parametric and semi-parametric approaches. Other important books on functional data analysis are (Ferraty and Vieu 2006), (Horvath and Kokoszka 2012) and (Kokoszka and Reimherr 2017). In addition to these monographs there is a vast quantity of scientific papers, ranging from theoretical to applied techniques aimed at modelling and analysing functions.
In open-source R software development, the number of packages focused on general functional data analysis is rapidly increasing. In particular, fda (Ramsay et al. 2014) provides functions implementing many methods of functional data analysis, including smoothing, plotting and regression models (see (Ramsay and Silverman 2005), (Ramsay et al. 2009)). The package fda.usc (Febrero-Bande and Oviedo de la Fuente 2012) carries out exploratory and descriptive analysis of functional data, such as depth measurements or functional outlier detection, as well as functional regression models (univariate, nonparametric), basis representation and Functional Principal Component Analysis (FPCA). The package fdasrvf (Tucker 2017) performs alignment, FPCA, and modeling of univariate and multivariate functions, allowing for elastic analysis of functional data through phase and amplitude separation. The core of the package fdapace (Dai et al. 2018) is FPCA for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm or numerical integration. The package rainbow (Shang and Hyndman 2019) provides tools for functional data display, exploratory analysis (plots, bagplots and boxplots) and outlier detection, while the package fds contains 19 data sets with functional data. There are also many other packages, focused on more specific methods for functional data analysis, like regression, classification and clustering, registering and aligning, and studying time series of functional data (see (Zeileis 2005)).
The focus of this paper is on the open-source R package roahd (RObust Analysis of High dimensional Data), see (Tarabelloni et al. 2017). roahd has been developed to gather recently proposed statistical methods that deal with the robust statistical analysis of univariate and multivariate functional data. The latter is the case where each observation in a dataset is a set of possibly correlated functions, measured at discrete points. Despite the usefulness of robust statistical methods in data analysis (e.g., median, quantiles, trimmed means), their generalisation to the functional framework is not straightforward, due to the infinite-dimensional nature of the spaces embedding the data. One possibility is to leverage the concept of depth measures in order to create proper order statistics to be used in a suitable robust inferential framework.
In the multivariate context there are many possible definitions of depth
measures. Among others, see
(Tukey 1975; Liu and Singh 1993; Liu et al. 1999; Zuo and Serfling 2000). For univariate and
multivariate functional data, two main approaches to the generalisation
of depth measures have been considered so far. A first approach is to
average a multivariate depth, say
The main contributions of the package, described and detailed in the following sections, are:
the implementation of simple and handy S3 classes representing functional data (fData and mfData for the univariate and multivariate case, respectively), as well as a set of algebraic operations and convenience functions to expressively operate on them;
the implementation of useful generators for functional data, such as generate_gauss_fdata and generate_gauss_mfdata, that can be used to simulate artificial datasets of Gaussian functional data with a target mean and covariance (that must be specified by the researcher as arguments of the related functions), which could be very useful to test or illustrate existing and new methods;
the implementation of efficient functions for computing depth measures and robust statistics, such as MBD, MEI, median_mfData, cor_spearman, for both univariate and multivariate functional data, that allow observations to be ranked from the center of the distribution outward and down-upward/up-downward with respect to sample measurements;
the implementation of useful graphical methods, like the functional boxplot (fbplot) and the outliergram (outliergram), that can be employed to carry out an explorative analysis of a functional dataset, and to robustify it by discarding shape, magnitude and covariance outliers.
Robust methods for functional data are generally computationally intensive, so the package’s functions have been implemented with attention to computational efficiency, in order to allow the processing of realistic datasets.
The paper is structured as follows: in Section 2 we
introduce the methods and the functions to represent and to generate
functional data; in Section 3 we describe the robust
statistics and indexes implemented in roahd; in Section 4
the main graphical tools for outlier detection are detailed, and finally
Section 5 contains discussion, conclusion and further
potential developments.
The S3 classes fData and mfData implement a simple and compact representation of univariate and multivariate functional datasets. They can be used by specifying, for each observation in the functional dataset, a set of measurements over a discrete grid, representing the independent variable indexing the functional data (e.g., time). The constructor of fData requires the evenly spaced grid over which the functional observations are measured, as parameter grid, and the values of the observations in the functional dataset, as parameter values, provided in the form of a two dimensional data structure (e.g., matrix or array) having as rows the observations and as columns their measurements over the grid of length P. When the constructor is called, it checks that the grid is actually evenly spaced. An example of the function’s call is the following:
library(roahd)

grid <- seq(0, 1, length.out = 100)
# One curve per row; note that log(grid) is -Inf at grid[1] = 0
values <- matrix(c(sin(2 * pi * grid),
                   cos(2 * pi * grid),
                   4 * grid * (1 - grid),
                   tan(grid),
                   log(grid)),
                 nrow = 5, ncol = 100, byrow = TRUE)
fD <- fData(grid, values)
plot(fD, main = 'Univariate FD', xlab = 'time [s]', ylab = 'values', lwd = 2)
In particular, the number of rows is the sample size, i.e., the number of statistical units, and each statistical unit is a function evaluated on the grid of P points. In the artificial example above we have 5 curves evaluated on an evenly spaced grid of 100 points. The resulting plot is shown in Figure 1.
Figure 1: fData object, with 5 curves evaluated on an evenly spaced grid of 100 points.
An mfData object, instead, implements a multivariate functional dataset where each component is defined over the same indexing variable. In practice, we deal with a discrete grid. The constructor of mfData requires the evenly spaced grid of definition as parameter grid and a list containing the L components of the multivariate functional dataset, defined as 2D data structures (analogously to the constructor of the fData class).
The S3 implementation makes it possible to enrich the package with expressive operations that enable an easy manipulation of datasets, while keeping a light structure that allows users to easily access the inner state of objects. For instance, we added an overloaded operator [.fData ([.mfData) that allows the standard slicing of matrix and array classes to be used also for fData (mfData). We provided an overloaded implementation of the four basic algebraic operations, +.fData, -.fData, *.fData, /.fData, that allow simple expressions on fData objects to be written and evaluated without explicitly carrying them out on the set of measurements. +.fData, -.fData support the sum or subtraction of compatible functional datasets (e.g., same definition grid), while the *.fData, /.fData operators perform an element-wise multiplication or division by a scalar quantity. We also added two convenience functions, append_fData and append_mfData, that can be used to concatenate two compatible (same definition grid) functional datasets.
Statistical functions for the computation of the mean (a specialisation of the generic mean function of R), the median (through median_fData and median_mfData, see Section 3 for more details), or the covariance (cov_fun and its specialisations for fData and mfData), are also implemented for fData and mfData.
Finally, we implemented dedicated specialisations for the visualisation of functional data in plot.fData (plot.mfData). In the case of mfData, the graphical window is split into a rectangular lattice so that each component is plotted separately. An example is given by the mfD_healthy dataset in the roahd package (see Figure 2).
data("mfD_healthy", package = "roahd")
The dataset mfD_healthy collects preprocessed (denoised, smoothed and registered) 8-lead electrocardiographic (ECG) signals during a median heartbeat of a sample of healthy subjects. The dataset mfD_LBBB contains the ECGs of subjects affected by Left Bundle Branch Block.
Figure 2: the mfD_healthy dataset in the roahd package.
roahd contains functions that can be used to simulate artificial data sets of functional data, both univariate and multivariate. The data are obtained as realisations of a Gaussian process over a discrete grid with a specific variance-covariance operator and mean, see (Rasmussen and Williams 2006). In general, given a covariance function and a mean, data are generated from the finite-dimensional approximation of the process on a discrete grid.
The function generate_gauss_fdata requires the sample size N, the mean vector centerline, and the matrix representation of the desired variance-covariance operator Cov or, as an alternative to Cov, its Cholesky factor CholCov, which is what is actually used to impose the desired covariance structure on the errors of the generating formulas. A built-in function can be used to generate exponential Matérn covariance functions, namely exp_cov_function(grid, alpha, beta), returning the discretised version of the covariance over the grid. An example of the function’s call is the following:
generate_gauss_fdata( N=50,
centerline = sin( 2 * pi * seq( 0, 1, length.out = 10^3 ) ),
Cov =exp_cov_function( seq( 0, 1, length.out = 10^3 ),
alpha = 0.2, beta = 0.3 ) )
The simulated data are shown in Figure 3.
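The role of the Cholesky factor can be illustrated with a minimal base R sketch that does not rely on roahd. It is only an illustration under stated assumptions: the covariance is assumed to have the exponential form alpha * exp(-beta * |s - t|), and the names C, R, Z and X are introduced here for the example.

```r
set.seed(42)

grid  <- seq(0, 1, length.out = 200)
alpha <- 0.2
beta  <- 0.3
N     <- 50

# Assumed exponential covariance, discretised over the grid
C <- alpha * exp(-beta * abs(outer(grid, grid, "-")))
centerline <- sin(2 * pi * grid)

# chol() returns the upper triangular factor R such that t(R) %*% R equals C
R <- chol(C)

# Independent standard Gaussian noise, one row per simulated curve
Z <- matrix(rnorm(N * length(grid)), nrow = N)

# Rows of Z %*% R have covariance C; adding the centerline sets the mean
X <- sweep(Z %*% R, 2, centerline, "+")
```

Each row of X is one simulated curve, so matplot(grid, t(X), type = "l") would give a plot comparable to the one produced from the roahd generator.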
Similarly, we can use the function generate_gauss_mfdata to generate a sample of multivariate functional data. The function requires: the sample size N; the number of components of the multivariate data L; a matrix containing by rows the means of each component, centerline; a vector correlations of length L(L-1)/2 containing the correlations between the components; and either a list listCov containing the discretised covariance functions over the grid, or a list listCholCov of their Cholesky factors.
An example of the function’s call is the following:
generate_gauss_mfdata( N=100, 2,
centerline = matrix( c( sin( 2 * pi * seq( 0, 1, length.out = 10^3 )),
cos( 2 * pi * seq( 0, 1, length.out = 10^3 ) ) ), nrow = 2, byrow = TRUE ),
correlations = 0.5,
listCov = list(exp_cov_function( seq( 0, 1, length.out = 10^3 ),
alpha = 0.1, beta = 0.5 ),
exp_cov_function( seq( 0, 1, length.out = 10^3 ),
alpha = 0.5, beta = 0.1 )))
The simulated data are shown in Figure 4.
In order to provide a center-outward and a down-upward/up-downward order
of data, the Band Depth (BD) and Modified Band Depth (MBD) are
implemented both for functional and multivariate functional data. Let us
recall the empirical version of Band Depth for functional data, as
introduced in (López-Pintado and Romo 2009) and in
(López-Pintado and Romo 2011). Given a stochastic process
Both functions BD and MBD require either an object of class fData or a matrix-like dataset of functional data (e.g., fData$values), with observations as rows and measurements over grid points as columns. They return a vector containing the values of depth for the given dataset. Thanks to these values, the dataset can be ordered from the center outward, for example:
MBD(fD)
[1] 0.48503510 0.46228408 0.51505469 0.31594122 0.37595102
[6] 0.49060245 0.35306449 0.40934204 0.47676898 0.37799510
[11] 0.22585633 0.45465633 0.49204245 0.34556571 0.14763429
[16] 0.27375020 0.42478041 0.51758041 0.07788571 0.49579265
...
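To make the ranking mechanism concrete, the following base R sketch computes the MBD (with bands formed by pairs of curves) in two ways: a direct enumeration of all bands, and a rank-based shortcut valid when there are no ties. The function names mbd_naive and mbd_rank are hypothetical and introduced here only for illustration; this is not necessarily how MBD is implemented in roahd.

```r
# Naive MBD: for each pair of curves (j, k), measure the fraction of grid
# points at which every curve lies inside the band delimited by j and k.
mbd_naive <- function(values) {
  N <- nrow(values)
  pairs <- combn(N, 2)
  depth <- numeric(N)
  for (m in seq_len(ncol(pairs))) {
    lower <- pmin(values[pairs[1, m], ], values[pairs[2, m], ])
    upper <- pmax(values[pairs[1, m], ], values[pairs[2, m], ])
    inside <- sweep(values, 2, lower, ">=") & sweep(values, 2, upper, "<=")
    depth <- depth + rowMeans(inside)
  }
  depth / ncol(pairs)
}

# Rank-based MBD (assumes no ties): a curve with pointwise rank r lies in
# (r - 1) * (N - r) bands formed by other curves, plus the N - 1 bands
# that it delimits itself.
mbd_rank <- function(values) {
  N <- nrow(values)
  r <- apply(values, 2, rank)
  rowMeans((r - 1) * (N - r) + (N - 1)) / choose(N, 2)
}

set.seed(1)
values <- matrix(rnorm(6 * 15), nrow = 6)   # 6 curves on a 15-point grid
depths <- mbd_rank(values)
order(depths, decreasing = TRUE)            # curves from deepest to most outlying
```

The rank-based version avoids the loop over all pairs of curves, which is the kind of saving that matters on realistic datasets.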
Another interesting down-upward/up-downward order of data can be built
on top of Epigraph Index (EI) and Hypograph Index (HI) or of their
corresponding Modified versions (MEI and MHI). We recall the definition
of EI (HI) for univariate functional data as introduced in
(López-Pintado and Romo 2011). Given a stochastic process
EI (HI) and MEI (MHI) require either an object of class fData or a matrix-like dataset of functional data (e.g., fData$values), with observations as rows and measurements over grid points as columns. They return a vector containing the values of the corresponding indexes for the given dataset, which can provide the desired ordering of data. In (López-Pintado and Romo 2011) the authors propose another well-posed definition of depth, the Modified Half Region Depth (MHRD) for functional data. The function MHRD in roahd computes the MHRD of elements of a univariate functional dataset, and has the same usage as the previous functions.
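A small base R sketch may help to fix ideas on the modified indexes. It assumes the convention in which MEI is the average proportion of curves lying on or above a given curve and MHI the average proportion lying on or below it, with each curve counted in its own epigraph and hypograph. The names mei and mhi are hypothetical helpers, not the roahd functions.

```r
# Modified epigraph index: average share of curves on or above each curve
mei <- function(values) {
  N <- nrow(values)
  r <- apply(values, 2, rank)     # pointwise ranks, 1 = lowest curve
  rowMeans(N - r + 1) / N
}

# Modified hypograph index: average share of curves on or below each curve
mhi <- function(values) {
  N <- nrow(values)
  r <- apply(values, 2, rank)
  rowMeans(r) / N
}

# Five vertically shifted copies of the same curve, one per row
grid    <- seq(0, 1, length.out = 50)
offsets <- c(0.3, 0.1, 0.5, 0.2, 0.4)
values  <- outer(offsets, sin(2 * pi * grid), "+")

mei(values)          # the lowest curve (row 2) has the largest MEI
order(mei(values))   # top-down ordering of the curves
```

Under this convention, and in the absence of ties, the two indexes of each curve sum to 1 + 1/N, so either of them induces the same down-upward ordering.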
In (Ieva et al. 2013) and (Ieva and Paganoni 2017) these statistics have been generalized to the multivariate functional framework. The MEI (MHI) of multivariate functional data is defined analogously. The function multiMBD computes the MBD for a dataset of multivariate curves. In particular, multiMBD requires either an object of class mfData or a list of 2-dimensional matrices (Data), each having as rows the units of that component and as columns the measurements of the functional data over the grid, as well as either a set of weights, weights, or the string uniform, specifying that a set of uniform weights (of value 1/L) has to be used. As an example, let us compute the multivariate MBD of the dataset (mfD) shown in Figure 4, using uniform weights:
multiMBD(mfD, weights="uniform")
[1] 0.40842020 0.45438788 0.24038384 0.31500606 0.35914343
[6] 0.48603636 0.44544040 0.42960606 0.37119798 0.21466667
[11] 0.46331111 0.37947677 0.46344646 0.39914141 0.30079394
[16] 0.48643838 0.14745657 0.47115354 0.41804242 0.32814545
...
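The weighting scheme can be sketched in base R as a weighted average of component-wise depths. This is a minimal illustration, assuming that the multivariate MBD is the weighted sum of the univariate MBDs of the components with weights summing to one; mbd and multi_mbd are hypothetical names, and the rank-based formula assumes no ties.

```r
# Univariate MBD via pointwise ranks (no ties assumed)
mbd <- function(values) {
  N <- nrow(values)
  r <- apply(values, 2, rank)
  rowMeans((r - 1) * (N - r) + (N - 1)) / choose(N, 2)
}

# Multivariate MBD as a weighted average of the component-wise MBDs.
# `components` is a list of N x P matrices, one per component.
multi_mbd <- function(components, weights = "uniform") {
  L <- length(components)
  if (identical(weights, "uniform")) weights <- rep(1 / L, L)
  stopifnot(length(weights) == L, abs(sum(weights) - 1) < 1e-12)
  depths <- sapply(components, mbd)   # N x L matrix, one column per component
  as.vector(depths %*% weights)
}

set.seed(3)
A <- matrix(rnorm(8 * 5), nrow = 8)   # first component: 8 units, 5 grid points
B <- matrix(rnorm(8 * 5), nrow = 8)   # second component, same units
multi_mbd(list(A, B))                 # uniform weights 1/2 and 1/2
multi_mbd(list(A, B), weights = c(0.8, 0.2))
```

Non-uniform weights let one component dominate the ranking, which is useful when the components are not equally informative.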
The choice for the weights
The functions median_fData (median_mfData) of the package compute the sample median of a univariate (multivariate) functional dataset based on a definition of depth for univariate (multivariate) functional data. Their input is the dataset whose median is required, in the form of an fData or mfData object, and a string specifying the name of the depth definition to use, as parameter type. This name should bind to a function actually defined in the workspace, such as the built-in ones of roahd (e.g., MBD, MHRD, etc.).
Figure 5 shows the plot of healthy ECG data (see mfD_healthy) with the multivariate functional median superimposed, computed by maximizing the multivariate MBD (9) with uniform weights.
median_mfData(mfD_healthy, type = "multiMBD")
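The idea of a depth-based median can be sketched in a few lines of base R: the median is the observed curve that maximises the chosen depth. The helper names mbd and depth_median are hypothetical; unlike median_fData, which receives the depth name as a string, this sketch takes the depth function itself as an argument.

```r
# Univariate MBD via pointwise ranks (no ties assumed)
mbd <- function(values) {
  N <- nrow(values)
  r <- apply(values, 2, rank)
  rowMeans((r - 1) * (N - r) + (N - 1)) / choose(N, 2)
}

# Depth-based median: the observed curve with the largest depth
depth_median <- function(values, depth_fun) {
  values[which.max(depth_fun(values)), ]
}

# Five vertically shifted copies of the same curve, one per row
grid    <- seq(0, 1, length.out = 50)
offsets <- c(0.3, 0.1, 0.5, 0.2, 0.4)
values  <- outer(offsets, sin(2 * pi * grid), "+")

med <- depth_median(values, mbd)   # here the curve with the middle offset (row 1)
```

Because the median is one of the observed curves, it is not distorted by a few extreme observations the way a pointwise mean would be.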
When dealing with multivariate functional data, it is possible to
compute correlation coefficients between observations’ univariate
components that generalise the Spearman’s coefficient
The function cor_spearman can be used to compute the Spearman correlation coefficient (13) for a bivariate mfData object, using the ordering definition specified by ordering (the default is to use MEI) to rank univariate components and then compute the correlation coefficient. Besides MEI, also MHI can be used to determine ranks.
Another well known measure of concordance in bivariate data is Kendall’s coefficient. The function cor_kendall can be used to compute the Kendall correlation coefficient (16) for a bivariate mfData object, using the ordering definition specified by ordering (by default max is used, i.e., formula (14)) to rank univariate components, then to compute concordant and discordant pairs and finally the correlation coefficient. Also area (i.e., formula (15)) can be used. As an example, let us compute both coefficients for the bivariate dataset (mfD) shown in Figure 4:
cor_spearman(mfD, ordering ="MEI")
[1] 0.6098597
cor_kendall(mfD, ordering="area")
[1] 0.4222222
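The ranking step behind the functional Spearman coefficient can be sketched in base R as follows. The sketch assumes that the coefficient is the correlation between the ranks of the MEI values of the two components; mei and spearman_fd are hypothetical names introduced for the illustration, and the MEI convention counts each curve in its own epigraph.

```r
# Modified epigraph index of each curve (rows = curves, no ties assumed)
mei <- function(values) {
  N <- nrow(values)
  r <- apply(values, 2, rank)
  rowMeans(N - r + 1) / N
}

# Spearman-type coefficient between two families of curves observed on the
# same statistical units: rank correlation of their MEI values
spearman_fd <- function(x, y) {
  stopifnot(nrow(x) == nrow(y))
  cor(mei(x), mei(y), method = "spearman")
}

# Toy data: five vertically shifted copies of the same curve
grid    <- seq(0, 1, length.out = 50)
offsets <- c(0.3, 0.1, 0.5, 0.2, 0.4)
x <- outer(offsets, sin(2 * pi * grid), "+")

spearman_fd(x, 2 * x + 1)   # second component preserves the ordering of x
spearman_fd(x, -x)          # second component reverses the ordering of x
```

The two calls give the extreme cases of perfect concordance and perfect discordance between the orderings induced on the two components.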
In (Ieva et al. 2018) a bootstrap-based inferential framework for the Spearman coefficient is introduced. In particular, the authors suggest computing a sample from the bootstrap distribution of the statistic, from which a confidence interval can be obtained with the function BCIntervalSpearman. This function requires: two univariate functional datasets in the form of fData objects, fD1, fD2; the ordering relation to be used in the Spearman’s coefficient computation, as the parameter ordering; the number of bootstrap iterations to use in order to estimate the confidence interval, bootstrap_iterations; and the parameter alpha, defining the coverage probability (1 - alpha) of the interval.
As an example, let us compute a BCA confidence interval for the Spearman coefficient of the two components of the dataset (mfD) shown in Figure 4:
BCIntervalSpearman(mfD$fDList[[1]], mfD$fDList[[2]], ordering = 'MEI',
alpha=0.05, bootstrap_iterations = 1000)
$lower
[1] 0.6520883
$upper
[1] 0.9819355
A verbosity parameter can be set in function BCIntervalSpearman in order to log information on the function’s progress when the computational time is long. The simple or Bias-Corrected and Accelerated version of the confidence interval allows testing for the presence of dependency between two families of univariate curves. The function cor_spearman applied to a dataset of type mfData returns the pointwise estimate of the Spearman matrix, while BCIntervalSpearmanMultivariate returns two matrices containing the lower and upper bounds of the corresponding confidence intervals. To clarify their use, we show the results corresponding to the first two leads, i.e., I and II, of mfD_healthy.
mfD_healthy_subset = as.mfData(list(mfD_healthy$fDList[[1]],
mfD_healthy$fDList[[2]]))
cor_spearman(mfD_healthy_subset, ordering='MEI')
[1] 0.6840466
BCIntervalSpearmanMultivariate(mfD_healthy_subset,
ordering='MEI', alpha=0.05, bootstrap_iterations = 1000)
$lower
[,1] [,2]
[1,] 1.0000000 0.4805781
[2,] 0.4805781 1.0000000
$upper
[,1] [,2]
[1,] 1.000000 0.820072
[2,] 0.820072 1.000000
In order to perform a comparison between correlation patterns across
different populations of multivariate functional data, in
(Ieva et al. 2018) the authors propose a bootstrap procedure to test the
equality between the two corresponding Spearman matrices. Consider two
multivariate functional datasets, where the observations are
realizations of the stochastic processes
The function BTestSpearman performs the test described above and requires: two multivariate functional samples in the form of mfData objects, mfD1, mfD2; the ordering relation to be used in the Spearman’s coefficient computation, ordering; the number of bootstrap iterations to be performed, bootstrap_iterations; the norm to measure the differences between the Spearman correlation matrices of the two functional datasets, normtype (the allowed values are the same as for parameter type in R’s base function norm). The function returns the estimates of the test’s p-value and statistic. As an example, let us perform the test considering the first two components of the mfD_healthy and mfD_LBBB datasets provided by the roahd package.
mfD_healthy_subset = as.mfData(list(mfD_healthy$fDList[[1]],
mfD_healthy$fDList[[2]]))
mfD_LBBB_subset = as.mfData(list(mfD_LBBB$fDList[[1]],
mfD_LBBB$fDList[[2]]))
BTestSpearman(mfD_healthy_subset, mfD_LBBB_subset,
bootstrap_iterations = 1000,
ordering = "MEI", normtype = "f")
$pvalue
[1] 0.473
$phi
[1] 0.06562356
The tools shown in this section (i.e., the functional boxplot and the
outliergram) enable a complete inferential analysis of (multivariate)
functional data based on robust statistics, like depth measures,
described in Section 3. These tools are very useful also in
the outlier detection framework which is of primary interest in FDA,
since outliers may deeply affect the inference of high dimensional data,
especially whenever the sample size is small.
The functional boxplot (see (Sun and Genton 2011)) is obtained by ranking
functions from the center of the distribution outwards thanks to a
suitable depth definition, computing the region of 50% most central
functions, see Eq. (3). The fences are obtained by
inflating such region by a factor
The function fbplot computes the depths of a dataset and marks outlying observations. If used with the graphical option on (default behaviour), it also plots the functional boxplot of the dataset. fbplot requires: the univariate functional dataset whose functional boxplot must be determined, in the form of an fData object, Data; either a vector containing the depths for each statistical unit of the dataset, or a string containing the name of the method to be used to compute them, Depths; and the value of the inflation factor, Fvalue (the default value is 1.5). As an example, consider the first component of mfD_healthy.
fbplot(mfD_healthy$fDList[[1]], Depths="MBD", Fvalue=3,
main="Functional Boxplot")
$Depth
[1] 0.4399681 0.1534263 0.4097385 0.4116510 0.3872242
[6] 0.4123326 0.2387404 0.3568670 0.3669691 0.4601483
[11] 0.3006872 0.3744531 0.1360906 0.4319324 0.3206186
[16] 0.2400199 0.4340800 0.4462739 0.2805389 0.4638281
...
$Fvalue
[1] 3
$ID_outliers
2
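The construction of the fences can be sketched in base R. This is only a sketch under explicit assumptions: depths are MBD values computed from pointwise ranks, the central region is the envelope of the deepest half of the curves, and the fences extend that envelope by Fvalue times its pointwise width. The function name fbplot_sketch is hypothetical, and the details may differ from what fbplot does internally.

```r
fbplot_sketch <- function(values, Fvalue = 1.5) {
  N <- nrow(values)

  # MBD via pointwise ranks (no ties assumed)
  r <- apply(values, 2, rank)
  depth <- rowMeans((r - 1) * (N - r) + (N - 1)) / choose(N, 2)

  # Envelope of the 50% deepest curves
  central <- order(depth, decreasing = TRUE)[seq_len(ceiling(N / 2))]
  lower <- apply(values[central, , drop = FALSE], 2, min)
  upper <- apply(values[central, , drop = FALSE], 2, max)

  # Fences: envelope inflated by Fvalue times its pointwise width
  width    <- upper - lower
  fence_lo <- lower - Fvalue * width
  fence_up <- upper + Fvalue * width

  # A curve is flagged if it leaves the fences at any grid point
  outside <- sweep(values, 2, fence_lo, "<") | sweep(values, 2, fence_up, ">")
  list(Depth = depth, Fvalue = Fvalue,
       ID_outliers = which(rowSums(outside) > 0))
}

# Toy data: 20 vertically shifted curves plus one magnitude outlier (row 21)
grid   <- seq(0, 1, length.out = 100)
base   <- sin(2 * pi * grid)
values <- rbind(outer((1:20) / 20, base, "+"), base + 10)

res <- fbplot_sketch(values, Fvalue = 1.5)
res$ID_outliers
```

In this toy dataset only the curve shifted by 10 units leaves the fences, so it is the only one flagged.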
The function fbplot can also automatically compute the best adjustment factor F that yields a desired proportion of outliers (True Positive Rate, TPR) in a Gaussian dataset with the same center and covariance function as the fData object (see (Sun and Genton 2012)). Such automatic tuning involves the simulation of a number N_trials of separate datasets of Gaussian functional data of size trial_size, with the same center and covariance as the original dataset (the covariance is robustly estimated with the function covOGK of the package robustbase, see (Maronna and Zamar 2002)), and the computation of N_trials values for Fvalue such that the desired proportion TPR of observations is flagged as outliers. The optimal value of Fvalue for the original population is then found as the average of the previously computed values of Fvalue. The parameters to control the adjustment procedure can be passed through the argument adjust, whose default is FALSE and which otherwise is a list with (some of) the fields:
N_trials: the number of repetitions of the adjustment procedure based on the simulation of a Gaussian dataset of functional data, each one producing an adjusted value of F, which will lead to the averaged adjusted value Fvalue. Default is 20;
trial_size: the number of statistical units in the Gaussian population of functional data that will be simulated at each repetition of the adjustment procedure;
TPR: the True Positive Rate of outliers, i.e., the proportion of observations in a dataset without amplitude outliers that have to be considered outliers;
F_min: the minimum value of Fvalue, defining the left boundary for the optimisation problem aimed at finding the optimal value of Fvalue;
F_max: the maximum value of Fvalue, defining the right boundary for the optimisation problem aimed at finding the optimal value of Fvalue;
tol: the tolerance to be used in the optimisation problem aimed at finding the optimal value of Fvalue;
maxiter: the maximum number of iterations to solve the optimisation problem aimed at finding the optimal value of Fvalue.
Due to the S3 specialization, fbplot can also construct functional boxplots of a multivariate functional dataset (see (Ieva and Paganoni 2013)). In Figure 7 we show the functional boxplot of the first two leads, i.e., I and II, of mfD_healthy.
mfD_healthy_subset <- as.mfData(list(mfD_healthy$fDList[[1]],
mfD_healthy$fDList[[2]]))
fbplot( mfD_healthy_subset, Fvalue = 1.5, xlab = 'time',
ylab = list( 'Values 1', 'Values 2' ),
main = list( 'First component', 'Second component' ) )
$Depth
[1] 0.4061113 0.1695619 0.4086113 0.4204500 0.3005780
[6] 0.4233634 0.3170089 0.3522569 0.3821995 0.4625769
[11] 0.3055684 0.3535029 0.1630668 0.4213114 0.3553795
[16] 0.3079317 0.3853858 0.3711121 0.2640493 0.4489892
...
$Fvalue
[1] 1.5
$ID_outliers
[1] 2 16
$Depth
[1] 0.43298990 0.36373333 0.40771515 0.38667677 0.10381616
[6] 0.44186869 0.36482424 0.34480404 0.45246061 0.36904040
[11] 0.25263232 0.28706869 0.42344040 0.29333333 0.02921414
[16] 0.49872121 0.49638384 0.41457980 0.25275960 0.29689697
....
$Fvalue
[1] 2.5
$ID_outliers
[1] 5 15 27 31 32 33 34 35 44 50 52 55 57 59 63 67 71 73 75 85 23
Let us point out that the functional boxplot has been constructed mainly for detection of magnitude outliers, i.e., curves that lie far from the range of the majority bulk of data.
A method that can be used to detect shape outliers and covariance outliers is the outliergram (see (Arribas-Gil and Romo 2014)), based on the computation of MBD and MEI (see (2) and (6)) of univariate functional data. Shape outliers are curves that present a different pattern with respect to the rest of the data in terms of their derivatives, and covariance outliers are curves generated by a model that differs from the model of the majority of data only in terms of the variance and covariance operator that affects the second order moments of data. The function outliergram displays the outliergram of a univariate functional dataset of class fData (see Figure 8) and returns a vector of observation IDs indicating the outlying observations in the dataset.
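The logic of the outliergram can be sketched in base R. The sketch rests on assumptions that should be checked against (Arribas-Gil and Romo 2014): with MBD and MEI computed from pointwise ranks (no ties, each curve counted in its own epigraph), every curve satisfies MBD <= a0 + a1 * MEI + a0 * N^2 * MEI^2 with a0 = -2 / (N (N - 1)) and a1 = 2 (N + 1) / (N - 1), and a curve is flagged when its distance from this parabola exceeds the third quartile of the distances plus Fvalue times their interquartile range. The name outliergram_sketch is hypothetical, and the roahd function may differ in its details.

```r
outliergram_sketch <- function(values, Fvalue = 1.5) {
  N <- nrow(values)
  r <- apply(values, 2, rank)                 # pointwise ranks, no ties assumed

  mbd <- rowMeans((r - 1) * (N - r) + (N - 1)) / choose(N, 2)
  mei <- rowMeans(N - r + 1) / N

  # Quadratic boundary in the (MEI, MBD) plane
  a0 <- -2 / (N * (N - 1))
  a1 <- 2 * (N + 1) / (N - 1)
  parabola <- a0 + a1 * mei + a0 * N^2 * mei^2

  # Distance below the parabola: (numerically) zero for curves that never
  # cross the others, larger for curves that change rank along the grid
  d <- parabola - mbd
  q <- quantile(d, c(0.25, 0.75), names = FALSE)

  list(MBD = mbd, MEI = mei, d = d,
       ID_outliers = which(d > q[2] + Fvalue * (q[2] - q[1])))
}

# Toy data: 20 parallel curves plus one curve (row 21) that crosses them all
grid   <- seq(0, 1, length.out = 100)
base   <- sin(2 * pi * grid)
values <- rbind(outer((1:20) / 20, base, "+"),
                base + 1.3 * grid - 0.13)

res <- outliergram_sketch(values)
res$ID_outliers
```

The crossing curve stays within the range of the others, so a magnitude-based rule would not catch it, while its distance from the parabola singles it out as a shape outlier.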
The function multivariate_outliergram implements the generalisation of the outliergram to multivariate functional data, following (Ieva and Paganoni 2017). The outliergram for multivariate functional data is constructed in analogy with the univariate one and is based on the quadratic boundary (20). The function displays the outliergram of a multivariate functional dataset of class mfData (see Figure 9) and returns a vector of observation IDs indicating the outlying observations in the dataset.
multivariate_outliergram(mfD, Fvalue = 2, shift=TRUE)
$Fvalue
2
$Depth
[1] 0.0386524565 0.0267086844 0.1297124989 0.0474675556
[5] 0.0296031652 0.0446769503 0.0353957777 0.0294200492
$ID_outliers
[1] 12 18 32 47 70 83 91 96
In this paper we have described the implementation in the roahd package of several statistical methods that deal with the robust statistical analysis of univariate and multivariate functional data, and some graphical tools mainly aimed at identifying and discarding outliers from a dataset of (potentially multivariate) functional data. The package should simplify the access and use of these strongly nonparametric methods to perform a suitable robust inferential analysis of high dimensional and complex data.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ieva, et al., "roahd Package: Robust Analysis of High Dimensional Data", The R Journal, 2019
BibTeX citation
@article{RJ-2019-032, author = {Ieva, Francesca and Paganoni, Anna Maria and Romo, Juan and Tarabelloni, Nicholas}, title = {roahd Package: Robust Analysis of High Dimensional Data}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {291-307} }