ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure

Detecting outliers in a multivariate and unsupervised context is an important and ongoing problem notably for quality control. Many statistical methods are already implemented in R and are brieﬂy surveyed in the present paper. But only a few lead to the accurate identiﬁcation of potential outliers in the case of a small level of contamination. In this particular context, the Invariant Coordinate Selection (ICS) method shows remarkable properties for identifying outliers that lie on a low-dimensional subspace in its ﬁrst invariant components. It is implemented in the ICSOutlier package. The main function of the package, ics.outlier , offers the possibility of labelling potential outliers in a completely automated way. Four examples, including two real examples in quality control, illustrate the use of the function. Comparing with several other approaches, it appears that ICS is generally as efﬁcient as its competitors and shows an advantage in the context of a small proportion of outliers lying in a low-dimensional subspace. In quality control, the method may help in properly identifying some defective products while not detecting too many false positives.


Introduction
The unsupervised detection of multivariate outliers is a timeless subject of interest in statistics, see Aggarwal (2017), Hodge and Austin (2004), Hadi et al. (2009) for a complete overview.Indeed, this can be the goal of the analysis, like in fraud detection, in medical applications or manufacturing-defect detection.It can also be used for preprocessing in almost any statistical analysis that is sensitive to the presence of outliers, e.g.Principal Component Analysis (PCA).Many statistical methods exist and have already been implemented in R through tens of packages.Only some of these packages are dedicated to the unsupervised context and are mentioned below.One of the most common methods for multivariate outlier detection is the Mahalanobis Distance (MD), and many packages are based on this distance: mvoutlier (Filzmoser and Gschwandtner, 2017), CerioliOutlierDetection (Green and Martin, 2017a), rrcovHD (Todorov, 2016), faoutlier (Chalmers and Flora, 2015).Additionally, traditional methods such as the angle-based methods are implemented in the packages abodOutlier (Jimenez, 2015), HighDimOut (Fan, 2015), the distribution-based methods in alphaOutlier (Rehage and Kuhnt, 2016), extremevalues (van der Loo, 2010), HDoutliers (Fraley, 2016), outliers (Komsta, 2011), the density-based methods in DMwR2 (Torgo, 2016), HighDimOut, ldbod (Williams, 2017), Rlof (Hu et al., 2015) and the depth-based methods in depth (Genest et al., 2017).Finally, other approaches have been implemented, such as the one based on Projection Pursuit (PP) in REPPlab (Fischer et al., 2016b), and PCA in OutlierDC (Eo and Cho, 2014), pcadapt (Luu and Blum, 2017), rrcov (Todorov and Filzmoser, 2009) and rrcovHD.However, it is important to note that all the implemented functions do not necessary return outlierness measures and the identities of the outlying observations.Limiting only to some packages fulfilling these conditions and to multivariate methods for numerical variables leads to focusing on mvoutlier, CerioliOutlierDetection and rrcov.These packages implement modified versions of the robust Mahalanobis distance, different algorithms for the outlier identification in high dimensions (Filzmoser and Todorov, 2013), classical and robust PCA.In PCA, the labelling of the outliers is based on the comparison of two outlierness measures: a score distance (SD) based on the first principal components and a distance in the orthogonal space (OD).All these methods rely on distances.To extend the analysis, we include comparisons with Local Outlier Factor (LOF) (Rlof package) and Angle-Based Outlier Factor (ABOD) (abodOutlier package) approaches even if they only return outlierness measures and not the identities of the outlying observations.These methods dedicated to outlier detection in an unsupervised context have different properties.Contrary to PCA, the Mahalanobis distance and its modified versions are invariant under any affine transformation as long as the location and scatter estimators are affine equivariant.However, several drawbacks have been noticed in the use of these distances.First, Filzmoser et al. (2005) highlight the fact that an outlier is not necessarily an extreme value.So, using a fixed threshold to label outliers for every data set is not the wisest solution and the threshold should be adjusted to the sample size.Moroever, as Cerioli (2010) notes, the threshold used usually is not adapted to the case in which there is no outlier in the data.Finally, for Cerioli (2010), the comparison of the distances of each observation to the threshold should not be performed independently without taking into account any simultaneous adjustments.Otherwise, the method may lead to many false positives, The R Journal Vol.10/1, July 2018 ISSN 2073-4859 i.e non outlying observations that are identified as outliers.Note that this is a serious concern that applies to almost all the outlier-detection methods.To the best of our knowledge, the mvoutlier and CerioliOutlierDetection packages are the first ones that tend to correct these drawbacks, but only for the robust Mahalanobis distance with the Minimum Covariance Determinant (MCD) estimators.
The ICSOutlier (Archimbaud et al., 2016) package is another approach to the detection of outliers.It appears that ICS is as efficient as its competitors in many situations and shows an advantage in the case of a small proportion of outliers lying in a low-dimensional subspace.This method is well-designed in particular for quality control.Its current version is able to handle a small proportion of outliers that lie on a subspace, using the affine equivariant Invariant Coordinate Selection (ICS) method, as presented by Archimbaud et al. (2018).The method relies on a generalized diagonalization and on the selection of the invariant components associated with the largest eigenvalues.Note that when more than say 10% of the observations are suspected to be outlying, invariant components associated with the smallest eigenvalues may also be of interest and this alternative will be considered in a future version of the package.ICS fulfills all the properties mentioned previously: (i) detect the absence of outliers, (ii) not be sensitive to the standardization of the data and (iii) be a multivariate method which controls the number of observations identified as outliers.The restriction to a small proportion of outliers that belong to a subspace is of interest in some areas of manufacturing products with a high level of quality control (e.g.automotive, avionics or aerospace).
In the following sections, the principle of the Invariant Coordinate Selection (ICS) method is recalled as well as the three steps of the outlier detection procedure: (i) invariant components selection, (ii) outlierness measure definition and (iii) outlier identification.Then, we explain how to use the ICSOutlier package and finally we analyze the efficiency of the ICS method compared to other methods on four examples, including a new real data set included in the package.

Invariant Coordinate Selection (ICS) for outlier detection Principle
The ICS method is a powerful method designed for exploring multivariate data by revealing their structure (Nordhausen et al., 2008).As explained in Tyler et al. (2009), it is based on a simultaneous spectral decomposition of two scatter matrices and leads to an affine invariant coordinate system.Readers not so familiar with the concept of scatter matrices are refered for example to Rousseeuw and Hubert (2013), Nordhausen and Tyler (2015) and references therein for definitions, examples and properties.
Following Archimbaud et al. (2018), for a p-variate dataset X n = (x 1 , . . ., x n ) , let V 1 (X n ) and V 2 (X n ) be two affine equivariant scatter matrices (symmetric and positive definite) and m 1 (X n ) and m 2 (X n ) their associated location estimators.ICS is looking for the diagonal p × p matrix D(X n ) containing the eigenvalues of V 1 (X n ) −1 V 2 (X n ) in decreasing order and the p × p matrix B(X n ) = (b 1 , . . ., b p ) containing the corresponding eigenvectors as its rows and such that Letting V 1 (X n ) be "more robust" than V 2 (X n ), the interpretation of these eigenvalues is the main point of the ICS method: they correspond to some kurtosis measure that depends on the choice of V 1 (X n ) and V 2 (X n ) .For example if V 1 is the usual empirical variance-covariance matrix and V 2 is the scatter matrix based on the fourth moments (the default in the ics.outlier function), the eigenvalues are ordered according to their classical Pearson kurtosis values in decreasing order.In this case and for a small proportion of outliers, it is advisable to analyze the projections that maximize the kurtosis and are associated with the largest eigenvalues (Archimbaud et al., 2018).
Then, using the location estimator m 1 (X n ) associated with the scatter matrix V 1 (X n ) , the corresponding centered scores are obtained as z i = B(X n )(x i − m 1 (X n )) for i = 1, . . ., n, so that: They are called the invariant components (IC) because of their affine invariance property in the sense that if X * n = X n A + 1 n b for any p × p regular matrix A and any p-vector b, where 1 n is a n-vector of ones and J is a p × p diagonal matrix with diagonal elements ±1, which The R Journal Vol.10/1, July 2018 ISSN 2073-4859 means the invariant coordinates change at most their signs.
These scores are related to the Mahalanobis Distance (MD).For any observation x i with i = 1, . . ., n, the squared Euclidian norm of its centered invariant coordinates is exactly the squared Mahalanobis distance from m 1 (X n ) in the sense of V 1 (X n ): The added value of using ICS over MD is realized when the structure of the data is on a subspace of dimension q < p.In this context and with a small percentage of outliers, it is of interest to focus only on the q projections that maximize a kurtosis measure.Indeed, if the number k of selected Invariant Coordinates corresponds to the dimension q of the subspace on which the outliers lie, then the ICS method is expected to recover the subspace of interest for outlier detection.As detailed in Tyler et al. (2009), this subspace corresponds to Fisher's discriminant subspace in case of mixtures of Gaussian distributions.Nevertheless, the main difficulty is to correctly estimate this dimension k.

Invariant coordinates selection
Depending on the combination of the scatters V 1 (X n ) and V 2 (X n ) chosen, the ICSOutlier package incorporates automated ways to select these k invariant coordinates.The two approaches proposed in Archimbaud et al. (2018) are implemented here: a test based on a quasi-inferential procedure and some normality tests.Considering V 1 (X n ) be more robust than V 2 (X n ) and a small percentage of outliers (less than 10% is recommended), the structure of the outlierness should be contained in the first k non-normal components.Note that the structure may be also contained in the last components (associated with the smallest eigenvalues) if the proportion of outliers is large.What is meant by "large" depends on the scatter pair but also on the distribution of the data as detailed in Archimbaud et al. (2018).Since the invariant coordinates are already ordered decreasingly according to a kurtosis measure, it is enough to test whether each component is Gaussian beginning by the one associated with the largest eigenvalue and stop the test procedure as soon as we find a Gaussian component.So, if k + 1 denotes the rank of the first Gaussian component, the components are sequentially tested at the adapted level α j = α/j, for j = 1, . . ., k, as in Dray (2008).Because of the sequentiality, the Bonferroni correction adjusts the significance of each test and ensures a nominal level α.
The first approach is a Parallel Analysis (PA) based on Monte Carlo simulations of eigenvalues for Gaussian populations of the same dimension as the initial dataset.This method is common for selecting components in PCA as described in Peres-Neto et al. (2005).It is based on mEig simulations of a Gaussian population and, for each j = 1, . . ., p, the computation of the 1 − α j percentile of the j th eigenvalue of ICS which is considered as a cut-off for the j th component.Then, sequentially from j = 1, if the observed j th eigenvalue exceeds the cut-off then the j th invariant component is declared non-Gaussian and is selected.As soon as one eigenvalue is smaller than its associated cut-off, the invariant component is considered as Gaussian and the test procedure is stopped.
The second approach is directly based on usual normality tests and is applied to the invariant coordinates.In the package the user can choose out of the following five tests: the D'Agostino test of skewness (DA), the Anscombe-Glynn (AG) test of kurtosis, the Bonett-Seier (BS) test of Geary's kurtosis, the Jarque-Bera (JB) test for normality which is based on both skewness and kurtosis measures and the Shapiro-Wilk (SW) normality test (see Yazici and Yolacan (2007) and Bonett and Seier (2002) for a complete description of each).The process of testing is still sequential with an adaptation of the level of each test.Once the first normal coordinate is found, the test procedure stops and only the non-Gaussian coordinates are selected.

Measure of outlierness
Let k denote the number of invariant components selected by one of the two test procedures detailed previously or by looking at the screeplot of the eigenvalues.Then, for each observation x i , i = 1, . . ., n, its squared ICS distance is computed based on the k selected invariant coordinates by: where The R Journal Vol.10/1, July 2018 ISSN 2073-4859

Outlier identification
Finally, the outliers are identified based on the comparison of their ICS distance with the expectation under the Gaussian distribution as in Archimbaud et al. (2018).The cut-off is derived from mDist Monte Carlo simulations of a Gaussian population of the same dimension as the initial dataset.For each observation, the ICS distance is computed using the k selected components.For a given level β, the cut-off corresponds to the average of the 1 − β percentiles of the distances over all the simulations.An observation is labeled as outlier if its ICS distance is higher than this cut-off.By default, β = 5%.

Using the package ICSOutlier
The main function available in ICSOutlier is ics.outlier which directly calls all the other functions included in the package.First it is necessary to apply the ics2 function from the ICS (Nordhausen et al., 2008) package in order to create an object of class ics2.The ICS package has been available for a few years, and contains a function called ics for implementing the ICS method.However, the ics function cannot be used directly for outlier detection, and a new function, ics2, had to be added to the ICS package (version 1.3-0).The arguments of ics2 are mostly the same as those of the function ics, except for now it is necessary to add the location vectors associated with the scatter estimators.
The main function ics.outlier of the package ICSOutlier uses the output of the ics2 function as an input.The function ics.outlier implements the three steps necessary for outlier detection using ICS in an automated way: (i) Select the invariant coordinates which recover at best the subspace where the outliers are lying using some test procedure.
(ii) Compute the ICS distance based on the selected invariant coordinates as an outlierness measure for each observation.
(iii) Label the outliers using the cut-off value obtained through simulations.
The links between the ics2 function from the ICS package and the functions available in the ICSOutlier package are illustrated on Figure 1.

ICS ics2
Outlier  The ics.outlier function only takes as parameter an object of class ics2 and not an object of class ics because this later returns only uncentered invariant coordinates contrary to the ics2 class.From a practical point a view, the S1 and S2 arguments of the ics2 function should be the name of the function which returns a list with the first (resp.second) location vector m 1 (resp.m 2 ) and the first scatter matrix V 1 (resp.V 2 ).
The R Journal Vol.10/1, July 2018 ISSN 2073-4859 In addition to this ics2 object, there are other parameters of ics.outlier that can be tuned, including an option to parallelize the computations in order to increase the speed of the function: • method: name of the method used to select the ICS components.Options are "simulation" for the parallel analysis approach and "norm.test"for the normality tests approach.Depending on the method, either comp.norm.testor comp.simu.test is used.
• test: • iseed: The seed passed on to clusterSetRNGStream if parallel computation is used.Default is NULL which means no fixed seed is used.
• pkg: A character vector listing all the packages which need to be loaded on the different cores via require if parallel computation is used.Must be at least "ICSOutlier" and must contain the packages needed to compute the scatter matrices.
• qtype: specifies the quantile algorithm used in quantile.
By default the ics2 function performs ICS with the usual mean vector and the usual empirical variance-covariance matrix returned by MeanCov and the location vector based on the third moments associated to the scatter matrix based on the fourth moments returned by Mean3Cov4.

Examples
In the following examples, we illustrate first, using an artifical data set, how ics.outlier behaves in the case of no outlier and then apply the function to three data sets available in R. For reproducibility all examples have a fixed seed.

Example with no outlier
One of the advantages of using ICS for outlier detection is its ability to detect the absence of outliers.If the first invariant coordinate is considered normal that means there is no outlier in the data set.So, in presence of a normal multivariate data set the function should select in most cases no component and hence will not detect outliers.
For the first example we simulate a bivariate normally distributed data set with 500 observations and we determine the cut-offs for identifying the outliers at the level 2.5% for all methods.library("ICSOutlier", quietly = TRUE) # Data simulation set.seed(123)X <-matrix(rnorm(1000, 0, 0.1), 500, 2) Using the default ICS setting and choosing the Jarque-Bera test of kurtosis to select the components at the default level 5% can easily be done as follows.
As a comparison we compute the robust Mahalanobis distances using the MCD with a breakdown point of 25% and compute the commonly used cut-off value coming from the χ 2 2 distribution and the adjusted cut-off value from (Green and Martin, 2017b) as implemented in the package CerioliOut-lierDetection.

R> [1] 14.22071
To visualize these results, the robust squared Mahalanobis distances are plotted with two different cut-off values in Figure 2.
Another situation where the ics.outlier function can conclude to an absence of outliers is when the squared ICS distances are below the corresponding cut-off for all observations, as illustrated in the help file of the function: ?ics.outlier.

HTP data set
The real data set HTP, introduced in Archimbaud et al. (2018), is included in this package.The data set provides the results of 88 numerical tests for 902 high-tech parts.Based on these results the producer considered all parts functional and all of them were sold.However two parts, 581 and 619, showed defects in use and were returned to the manufacturer.These two observations can thus be considered as outliers and the objective is to detect them.We use the ics.outlier function with its default settings, as most users initially would do but with the parallelizing option for the computations.# Code for the Figure 4 screeplot(icsHTP, cex.lab = 0.7, cex.axis = 0.7, cex.names = 0.7, cex.main = 0.7) Based on this screeplot, three components might be a reasonable choice and then the functions ics.distances and dist.simu.testcan be used to obtain the desired distances and cut-off value.
ics.dist.scree<-ics.distances(icsHTP,index = 1:3) # by default it can take quite long as m = 10000, so we choose to # use all but one available cores to parallelize the simulations.library(parallel) ics.cutOff <-dist.simu.test(icsHTP,1:3, ncores = detectCores()-1, iseed = 123) ics.cutOffR> 97.5% R> 12.80771 Before visualizing these results we also apply two competing outlier detection methods.First the Finite-Sample Reweighted MCD (FSRMCD) outlier detection test of Cerioli (Cerioli, 2010), implemented in the package CerioliOutlierDetection and secondly the SIGN1 algorithm (Filzmoser et al., 2008) implemented in the package mvoutlier.For the FSRMCD algorithm, we choose as nominal level α = 1 − γ 1/n for the individual outlier tests, with γ the nominal size of the intersection test and n the number of observations.Then the ICS distances based on the three selected components are plotted against the distances from the two competing methods together with the corresponding cut-off values.
# FSRMCD library("CerioliOutlierDetection") FSRMCD <-cerioli2010.fsrmcd.test(HTP,signif.alpha= 1 -0.975**(1/nrow(HTP)), mcd.alpha = 0.75) # Two critical values: one for points included in the reweighted MCD (weights == 1) # and one for points excluded from the reweighted MCD (weights == 0)).As illustrated in Figure 5, the cut-offs used for the FSRMCD and SIGN1 algorithms plotted on the x-axis lead to too many false positives.Even when focusing only on the rank of the outlying observations it is clear that the ICS method outperforms the other methods.The outlier labelled 619 is obviously outlying with all methods, but the outlier labelled 581 has the 4th highest squared ICS distance compared to 15th and 12th for the distances computed with FSRMCD and SIGN1 algorithms.Note that the data set contains highly collinear variables and it is well-known that high-breakdown point methods are not optimal in this case.Archimbaud et al. (2018) contains a more detailed analysis of this dataset comparing ICS with other methods and also gives the impact of the choice of different scatter matrices.For the present data set the default scatter combination used here seems the best.In this analysis, it is also interesting to notice that the first component highlights only the two defective parts detected on the second and the third components of ICS and as far as we know they are not defective parts but false positives.

Reliability data set
The Reliability data set comes also from an industrial context and is part the of REPPlab package.It contains 55 variables measured during the production process of 520 units.The challenge for this data is to identify the produced items with a fault not detected by the marginal tests.According to Fischer et al. (2016a), the observations numbered 414 and 512 are most likely outliers.
A special feature of this data set is that for example variable 24 is nearly constant which makes this data set hard to use with high-breakdown methods.For example the MCD cannot be computed for this data set when all variables are used.The use of ICS and other distance-based outlier detection methods is also detailed in Archimbaud et al. (2018).
Let us now illustrate the use of other scatter matrices than the default one together with the way to pass arguments to some of the functions.This time the invariant components are selected using simulations (Parallel analysis) with an initial decision level of 5%.These results are then compared to two methods not included in Archimbaud et al. (2018), namely the Local Outlier Factor (LOF, (Breunig et al., 2000)) and the Angle-Based Outlier Factor (Kriegel et al., 2008) outlier detection methods.First, we apply the LOF method with the lof function from the Rlof package which parallelizes the computation of the local outlier factor using l neighbours for each observation.The number of neighbours is determined based on the guidelines from Breunig et al. (2000): we compute the outlierness measures for l between 5 and 50 and we aggregate the results by taking the maximum of the local outlier factor for each observation.However, the distribution of the LOF values (and of the aggregated indices) is not known and therefore, there exists no theoretical cut-off to identify the outlying observations.In practice, observations with LOF values much higher than one are considered as potentially outlying.Second, we calculate the angle-based outlier factor for each observation through the abod function from the abodOutlier package on a random sample of 10% (the default) of the data (not on the entire data set because it is too time-consuming).As for the LOF method, the distribution of the outlyingness indices is unknown.But for this method, a potentially outlying observation is indicated by a small index value.
The location and scatter combinations we use here are the regular mean vector and covariance matrix and the joint maximum likelihood estimation of location and scatter of a t-distribution with one degree of freedom, also known as Cauchy MLE estimate.The function MeanCov returns the mean vector and regular covariance matrix as required for ics2 and for the Cauchy MLE we use the function tM where the degree of freedom is specified using the argument df.As the Cauchy MLE is considered the more robust estimator it should be specified in ics2 as S1.

HBK data set
The hbk data set is included in the robustbase package (Rousseeuw et al., 2017).It is an artificial data set created by Hawkins et al. (1984) for illustrating the so-called masking effect of outliers.It contains two groups of outliers with observations 1-10 in the first group and observations 11-14 in the second group, over the 75 observations characterized by the three explanatory variables.Usually with non-robust methods only the observations 12, 13 and 14 are identified as outliers.For this example the percentage of outliers is around 19% and so is larger than the percentage we recommend for the current version of the package.Our aim is to illustrate that in such a situation, the choice of the scatter matrices may have a large impact on the results.
First, the combination of the highly robust MCD estimates with the mean vector and regular covariance matrix is studied.Note that in order to be able to use the MCD via the function CovMcd a wrapper needs to be written around the function so that it returns the proper list as expected by ics2.components at the default level of 5%.The level of the quantile used for deriving the cut-off for outlier identification is the default, 2.5%.For this combination, ICS performs equally well as the MCD-based Mahalanobis distances, as illustrated in the first three plots of the Figure 7.All outliers are correctly identified, no false positive is detected and the two groups of outliers are clearly separated.

Conclusion and future developments
The use of ICSOutlier for outlier detection has been illustrated on several examples.The results are already pretty good with the default parameters as soon as the percentage of outliers is small.Note however that the components selection procedure can be improved sometimes by a visual inspection of the screeplot and the cut-off for outlier identification can be adjusted by looking at the ICS distances plot.The ability of the method to check for the absence of outliers is an advantage compared with methods such as the Mahalanobis distance.In the same vein, the number of false positives, i.e the number of non-outliers declared as outliers, is restricted thanks to the use of the cut-off derived from simulations.Moreover, the real HTP data set included in the package shows that the ICS method is useful for outlier detection in the context of quality control, among possible applications.Finally the ICS method competes with common approaches based on distances (the Mahalanobis distance, its robust version, the PCA methods, its variants and improvements) as well as other methods based on the density (LOF) or on angles (ABOD).by using the parallelizing option.Moreover, for now, the function can only select the first invariant components which is relevant when the percentage of outliers is small.For a large percentage, the invariant components associated with the smallest eigenvalues may also be of interest and this is a current perspective of the present package.A generalization of the ICS method in case of perfectly collinear variables is also a work in progress.

Figure 1 :
Figure 1: Organization chart of the functions called by ics.outlier

Figure 2 :Figure 3 :
Figure 2: Squared robust Mahalanobis distances and two different cut-off values.

Figure 4 :
Figure 4: Screeplot of ICS eigenvalues for HTP data and default parameters.

Figure 5 :
Figure 5: Squared ICS distances with 3 components against FSRMCD (left panel) and SIGN1 (right panel) measures of outlierness for HTP data with cut-offs at level of 2.5%.

Figure 6 :
Figure 6: Squared ICS distances with 39 components against LOF (left panel) and ABOD (right panel) measures of outlierness for Reliability data.

Figure 7 :
Figure 7: Squared robust Mahalanobis distance based on the MCD (left panel), squared ICS distances with MCD and Mean-Cov (middle panel), and squared ICS distances with default scatters (right panel).All cut-offs values are at level of 2.5% name of the marginal normality test to use if method = "norm.test".Possibilities are "jarque.test","anscombe.test","bonett.test","agostino.test","shapiro.test".Default is "agostino.test".number of simulations performed to derive the cut-off values for selecting the ICS components.Only if method = "simulation".See comp.simu.testfor details.• level.test:level for the comp.norm.testor comp.simu.testfunctions.The inital level for selecting the invariant coordinates.• adjust: logical.For selecting the invariant coordinates, the level of the test can be adjusted for each component to deal with multiple testing.See comp.norm.testand comp.simu.testfor details.Default is TRUE.• level.dist:level for the dist.simu.testfunction.The (1-level(s))th quantile(s) used to determine the cut-off value(s) for the ICS distances.
• mEig:• mDist: number of simulations performed to derive the cut-off value for the ICS distances.See dist.simu.testfordetails.•type: currently the only option is "smallprop" which means that only the first ICS components can be selected.See comp.norm.testorcomp.simu.testfordetails.• ncores: number of cores to be used in dist.simu.testand comp.simu.test.If NULL or 1, no parallel computing is used.Otherwise makeCluster with type = "PSOCK", from the parallel package, is used.