Linear discriminant analysis (LDA) is a powerful tool for building classifiers with easy computation and interpretation. Recent advancements in science and technology have led to the popularity of datasets with high dimensions, high orders and complicated structure. Such datasets motivate the generalization of LDA in various research directions. The R package TULIP integrates several popular high-dimensional LDA-based methods and provides a comprehensive and user-friendly toolbox for linear, semi-parametric and tensor-variate classification. Functions are included for model fitting, cross validation and prediction. In addition, motivated by datasets with diverse sources of predictors, we further include functions for covariate adjustment. Our package is carefully tailored for low storage and high computation efficiency. Moreover, our package is the first R package for many of these methods, providing great convenience to researchers in this area.
Linear discriminant analysis (LDA) is one of the most popular classification methods and a cornerstone of multivariate statistics (e.g., Michie et al. 1994). Classical LDA builds a linear classifier based on
In recent decades, the advancements in science and technology have enabled researchers to collect datasets of increasing size and complexity. Such datasets pose challenges to LDA. Four challenges that we tackle with this package are as follows. First, in research areas such as biology, genomics and psychology, we often have more predictors than samples. However, LDA is not directly applicable to such high-dimensional data, because the sample covariance matrix is not invertible when the number of predictors exceeds the sample size.
Secondly, when we have a large number of predictors, variable selection is often desired so that we can obtain a sparse classifier involving only a small proportion of the variables. On one hand, Fan and Fan (2008) and Bickel and Levina (2008) showed in theory that variable selection is critical for accurate classification. On the other hand, sparse classifiers are much easier to interpret in practice. However, LDA generally does not perform variable selection.
Thirdly, contemporary datasets often have complicated structure that renders the linear classifier in LDA inadequate. For example, in the presence of thousands of predictors, it may be inappropriate to model all of them with the normal distribution. Moreover, research in neuroimaging, computational biology and personalized recommendation produces data in the form of matrices (2-way tensors) or higher-order tensors. The analysis of tensor datasets requires considerable modification to the vector-based LDA model.
Last but not least, integrative analysis with multiple data sources has been drawing researchers' attention recently. The co-existence of diverse data types, such as vectors, matrices and tensors, calls for more sophisticated models that integrate the information from them to improve classification accuracy. It is critical to model the dependence among different types of data to reduce the noise level in the data and improve prediction accuracy (Pan et al. 2019).
Motivated by these challenges, many methods have been proposed to generalize LDA to datasets with high dimensions, non-normality and/or higher order predictors. In this package, we implement six methods that generalize LDA to contemporary complicated datasets. All of them are developed under models closely related to the LDA model, and penalties are imposed to achieve classification accuracy and variable selection in high dimensions. These methods include:
Direct sparse discriminant analysis (DSDA): DSDA generalizes the classical LDA model to high dimensions when there are only two classes (Mai et al. 2012). It formulates high-dimensional LDA into a penalized least squares problem.
Regularized optimal affine discriminant (ROAD): under the same model
as DSDA, ROAD fits a sparse classifier by minimizing the
classification error under the
Sparse optimal scoring (SOS) for binary problems: SOS is also developed under the LDA model (Clemmensen et al. 2011). It penalizes the optimal scoring problem (Hastie et al. 1994). We focus on its application in binary problems.
Semiparametric sparse discriminant analysis (SeSDA): SeSDA assumes a semiparametric model where data transformation can be applied to alleviate the non-normality. In practice, SeSDA finds the data-driven transformation and then performs model-fitting on the transformed data (Mai and Zou 2015).
Multiclass sparse discriminant analysis (MSDA): Instead of focusing on binary problems, MSDA considers the multiclass LDA model (Mai et al. 2015). It exploits the fact that the Bayes rule can be estimated by minimizing a quadratic loss. To account for the multiclass structure, a group lasso penalty (Yuan and Lin 2006) is applied to achieve variable selection.
Covariate-adjusted tensor classification in high-dimensions (CATCH): CATCH (Pan et al. 2019) is developed for tensor predictors. It takes advantage of the tensor structure to significantly reduce the number of parameters and hence alleviate computation complexity.
| | DSDA | SOS | ROAD | SeSDA | MSDA | CATCH |
|---|---|---|---|---|---|---|
| Classes | Binary | Binary | Binary | Binary | Multi-class | Multi-class |
| Data type | Vector | Vector | Vector | Vector | Vector | Tensor |
| Model | LDA | LDA | LDA | SeLDA | LDA | TDA/CATCH |
| Covariate adjustment | Yes | No | No | No | Yes | Yes |
See Table 1 for a comparison of these methods. Despite their different model assumptions and formulas, all of them have strong theoretical support and excellent empirical performance. We further note that they can be combined with covariate adjustment when multiple data sources are available. Our package TULIP (Pan et al. 2021) integrates diverse discriminant analysis models and supportive functions to make it a convenient and well-equipped toolbox. It has several notable advantages. First, we not only include functions for model fitting, but also cross validation functions for easy control of the sparsity level, and prediction functions for the prediction of future observations. In addition, we provide covariate adjustment functions that efficiently remove heterogeneity in the predictors and combine information from covariates. Second, our package greatly facilitates the application of DSDA, ROAD and SeSDA for R users, as they do not have public R packages on CRAN outside ours. Third, although MSDA and SOS have been implemented in packages msda (Mai et al. 2015) and sparseLDA (Clemmensen and Kuhn 2016), we carefully modify their algorithms in our implementation to lower storage cost and/or speed up computation.
We acknowledge that much other effort has been devoted to topics closely related to that of our paper. On one hand, a large number of high-dimensional discriminant analysis methods have by now been developed. Some excellent examples include (Tibshirani et al. 2002; Trendafilov and Jolliffe 2007; Fan and Fan 2008; Wu et al. 2009; Cai et al. 2011; Clemmensen et al. 2011; Shao et al. 2011; Witten and Tibshirani 2011; Fan et al. 2012; Xu et al. 2015; Niu et al. 2015). On the other hand, many works in the literature study matrix/tensor regression and classification methods, and many of them impose a low-rank assumption (Kolda and Bader 2009; Chi and Kolda 2012; Zhou et al. 2013; Lai et al. 2013; Li and Schonfeld 2014; Zeng et al. 2015; Zhong and Suslick 2015; Liu et al. 2017). All these methods have been reported to have great performance, but a comprehensive study of them is beyond the scope of our current paper.
The rest of this paper is organized as follows. We start with a brief overview of discriminant analysis models in Section 2. Model estimation and implementation details are discussed in Section 3. Section 4 contains instructions and examples on the usage of the package. A real data example is given in Section 5 to confirm the numerical performance of methods in the package.
Recall that
In the rest of this section, we discuss various statistical models that have been widely studied in the literature, along with the Bayes rules under these assumptions. Specifically, we review the classical LDA model, the semiparametric LDA model, and the tensor discriminant analysis model. We also discuss a general framework for covariate adjustment.
Given a multivariate predictor
Define
The LDA model is simple yet elegant. All the parameters in this model
have natural interpretations, while the Bayes rule has a nice linear
form. An interesting fact about the Bayes rule in (2) is
that it does not explicitly involve the
Although the Bayes rule is derived under the somewhat restrictive
normality and equal covariance assumptions, the discriminant directions
In many real-life problems, we have additional covariates along with the
predictors. The covariates play two roles in classification: they have predictive power of their own, and they also account for part of the variation in the predictors. For example, in genomics studies, we record not only
gene expression levels but also age and clinical measurements. In this
case, we may view the gene expression levels as the high-dimensional
predictor, and the age and clinical measurements as the covariates. We
consider an LDA-type model to incorporate the covariates. In addition to
the response
Obviously, the CA-LDA model reduces to the LDA model in the absence of
covariates. With the covariates, the CA-LDA model continues to have
natural interpretations. Equation (3) indicates that
Under the CA-LDA model, the Bayes’ rule is
Although LDA is reasonably resistant to model misspecification, we may
still need more flexible models when data are heavily non-normal. The
semiparametric linear discriminant analysis (SeLDA) model (Lin and Jeon 2003) is
proposed for this purpose. SeLDA assumes that there exists a set of
strictly monotone univariate transformations
For identifiability, we further assume that all the diagonal elements in
It is easy to see that the LDA model is a special case of the SeLDA
model, if we restrict
Although the SeLDA model requires much weaker conditions than the LDA
model, it preserves many of the desirable properties. One of them is
that the Bayes rule continues to be a linear function of the transformed
data
Consequently, just as in the LDA model, when the dimension is high, we
assume that
The tensor discriminant analysis (TDA) model is proposed for
classification based on tensor predictors. We first briefly introduce
some standard tensor notation (Kolda and Bader 2009). See Appendix 7.1 for more rigorous definitions. An
Further, we say a tensor
Now we discuss the tensor discriminant analysis (TDA) model. Consider
the
Under the TDA model, the Bayes’ rule is
Similar to the vector case, when additional covariates are provided, the TDA model can be combined with covariate adjustment. Pan et al. (2019) proposed
the CATCH model for this purpose. In addition to
The Bayes’ rule under the CATCH model is
In this section, we formally introduce the six methods implemented by
the package: DSDA, ROAD, SOS, SeSDA, MSDA and CATCH. Throughout the rest
of this paper, we denote
The direct sparse discriminant analysis (DSDA) is proposed for binary
classification under the LDA model in (1). Recall that our
main interest is in estimating the coefficients
Numerical and theoretical studies show that DSDA consistently estimates the Bayes rule under mild conditions. Also, DSDA can be computed very
efficiently, as (11) is a heavily-studied
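For readers who want the formulation at a glance, the penalized least squares problem takes the following schematic form (our notation; see Mai et al. (2012) for the exact response coding and theory):

$$(\hat{\beta}_0, \hat{\boldsymbol{\beta}}) \;=\; \underset{\beta_0,\,\boldsymbol{\beta}}{\arg\min}\;\left\{ \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^2 \;+\; \lambda\,\lVert\boldsymbol{\beta}\rVert_1 \right\},$$

where $y_i$ is a numerical coding of the two class labels and $\lambda>0$ controls the sparsity of $\hat{\boldsymbol{\beta}}$.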
Regularized optimal affine discriminant (ROAD; Fan et al. 2012) is another
binary penalized discriminant analysis method for high-dimensional data.
ROAD estimates
In its optimization, the constraint of
The authors of ROAD proposed to solve (13) by replacing
the nonconvex constraint with a quadratic penalty. However, we adopt a
different approach to solve (13). It is shown in Mai and Zou (2013) that the solution paths of DSDA and ROAD are equivalent. In
other words, for any
We also implement the successful discriminant analysis method sparse optimal scoring (SOS; Clemmensen et al. 2011). We focus on binary problems, where we are able to greatly improve the computation speed. For multiclass problems, SOS can be solved by the R package sparseLDA.
In binary problems, SOS creates a dummy variable
However, we take another approach to solve SOS with lower computation
cost. Mai and Zou (2013) showed that
Therefore, to solve for
SeSDA (Mai and Zou 2015) fits the SeLDA model in (5) for binary
problems. It is expected to have better performance than DSDA when data
are heavily non-normal. SeSDA has two steps. First, we find an estimate
Two estimators have been proposed for
The naive estimator is shown to consistently estimate
Similar to
Up to now, we have focused on binary classifiers. In this section, we
discuss a multiclass classifier under the LDA model (1).
Assume that
The multi-class sparse discriminant analysis (MSDA) has been proposed for fitting a sparse classifier in this context. It exploits the fact that, on the population level, we have
Therefore, in high dimensions, MSDA replaces the parameters with their sample estimates and imposes a group sparsity structure through the group lasso penalty (Yuan and Lin 2006). More specifically, MSDA estimates
Compute
Initialize
For steps
for each element
Compute
Update
At convergence, output
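The key operation in the blockwise update above is a group soft-thresholding step. A minimal stand-alone illustration is given below (this is only a sketch of the operator, not the package's compiled Fortran implementation):

# Group soft-thresholding: shrink the whole block u toward zero.
# Returns the zero vector when ||u||_2 <= lambda; otherwise rescales u.
group_soft <- function(u, lambda) {
  nrm <- sqrt(sum(u^2))
  if (nrm <= lambda) rep(0, length(u)) else (1 - lambda / nrm) * u
}
group_soft(c(0.5, -1.2, 0.3), lambda = 0.4)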
Therefore, in our implementation we modify the original MSDA algorithm
for lower storage and computation cost for high-dimensional data. Note
that
When
Recall that, under the TDA model, we aim to estimate the parameters
CATCH can be solved by a coordinate descent algorithm with an explicit updating formula in each iteration.
When we have additional covariates
For vector-variate
For tensor-variate
Afterwards, the ensemble of all
We recommend selecting the tuning parameter in all methods by cross
validation, which is implemented in our package as supportive functions
for most of the methods. In cross validation, a sequence of potential
tuning parameters is supplied. For each candidate tuning parameter
If desired, our package can automatically generate a sequence of tuning
parameters for all the methods. They will first compute the smallest
The R package TULIP provides user-friendly functions to fit discriminant analysis models and perform predictions on vector and tensor data. The package can be downloaded from https://cran.r-project.org/web/packages/TULIP or installed in R through install.packages('TULIP'). When the package is installed, the required packages MASS (Venables and Ripley 2002) for LDA model fitting, Matrix (Bates and Maechler 2016) and tensr (Gerard and Hoff 2016) for matrix and tensor operations, and glmnet (Friedman et al. 2010) for the LASSO are also installed automatically, so users do not need to install them separately. To guarantee high computational efficiency, the core algorithms of MSDA and CATCH are implemented in Fortran, which have already been compiled and can be used directly.
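Once the installation completes, the package is loaded in the usual way:

library(TULIP)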
Among all six methods, there is always a tuning parameter dfmax to limit the number of selected variables; the functions will only return solutions whose number of non-zero elements is less than dfmax. Furthermore, MSDA has a model option to specify the version of the implementation, choosing between multi.original and multi.modified. The methods are summarized in Table 2.
| Method | Problem | dfmax | model option | Covariate | Cross validation |
|---|---|---|---|---|---|
| DSDA | Binary vector | Yes | No | Yes | Yes |
| ROAD | Binary vector | Yes | No | No | No |
| SOS | Binary vector | Yes | No | No | No |
| SeSDA | Binary vector | Yes | No | No | Yes |
| MSDA | Multi-class vector | Yes | Yes | Yes | Yes |
| CATCH | Multi-class tensor | Yes | No | Yes | Yes |
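For instance, the dfmax argument mentioned above can be supplied to any of the fitting functions to cap the model size. A hypothetical call (x and y are placeholder training predictors and class labels) might look like:

# Hypothetical: keep only solutions with at most 50 non-zero variables
obj <- msda(x, y, dfmax = 50)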
The functions in the package consist of two parts. One part contains core functions which generate solution paths of all the methods, including functions dsda, road, sos, SeSDA, msda and catch. Since binary classification can be regarded as a special case of multi-class problems, we also embed DSDA into the msda function. See Section 4.1 for details. The other part includes supportive functions that perform covariate adjustment, prediction, cross validation and handle some special cases.
To illustrate how to use the functions, we first simulate a binary vector data set named dat.vec, which contains 150 training observations with 500 predictors and 1000 testing observations. The data set dat.vec is summarized in Table 3 and can be simulated by the code
dat.vec<-sim.bi.vector(1000)
| Variable | Type | Dimension |
|---|---|---|
| x | matrix | 150 × 500 |
| y | vector | 150 |
| testx | matrix | 1000 × 500 |
| testy | vector | 1000 |
Moreover, we include two real data sets, GDS1615 and colorimetric sensor
array data set, in the package to demonstrate usage of the functions.
Data set GDS1615 (Burczynski et al. 2006) is a vector data set where
observations belong to three classes. The original data set contains 127
observations and 22283 variables. Package msda preprocessed the data
by computing F-test statistics of each variable (Mai et al. 2015), whose
definition is in appendix. Hence only 127 variables are kept in the data
set. The colorimetric sensor array (CSA) data was used to demonstrate the performance of discriminant analysis methods (Zhong and Suslick 2015). It records information of chemical dyes after being exposed to volatile chemical toxicants in order to identify their classes. It contains 147 observations in 21
classes. For each observation, the predictor is a
Function dsda
The following code shows an example of utilizing DSDA. Given the data set dat.vec, we fit DSDA on the training data over a sequence of tuning parameters and then call the predict function on the fitted model to obtain the prediction for each tuning parameter value.
obj <- dsda(dat.vec$x, y=dat.vec$y, lambda=seq(0.005, 0.3, length.out=20))
pred <- predict(obj, dat.vec$testx)
err<- apply(pred, 2, function(x){mean(x!=dat.vec$testy)})
print(min(err))
[1] 0.111
print(obj$lambda[which.min(err)])
[1] 0.02052632
If one wishes, dsda can also be used in a more automatic way. On one hand, it can be called without supplying a sequence of values for the tuning parameter; the function will then automatically generate a sequence based on the data. On the other hand, the prediction can be performed along with model fitting if testing data is supplied. The function dsda will produce the prediction error on the testing data corresponding to each tuning parameter. See the following example.
obj <- dsda(dat.vec$x, y=dat.vec$y,testx=dat.vec$testx)
err <- apply(obj$pred, 2, function(x){mean(x!=dat.vec$testy)})
print(min(err))
[1] 0.107
print(obj$lambda[which.min(err)])
[1] 0.03180946
Figure 3 shows a solution path of the DSDA model as the tuning parameter varies. The usage of covariates in dsda is the same as in the function catch, and we do not give a separate example here to avoid redundancy.
Function SeSDA
Function SeSDA fits a semiparametric sparse discriminant analysis model on the input vector data. The simulated data dat.vec follows a normal distribution within each class. We take an exponential transformation of it to violate the normality assumption. The following example shows that SeSDA achieves an error rate of 11%. However, if we directly apply DSDA to this data set, the minimum error rate is as high as 15.8%. Therefore, the preprocessing in SeSDA can indeed help to improve performance in this scenario.
x <- exp(dat.vec$x)
testx <- exp(dat.vec$testx)
obj.SeSDA <- SeSDA(x, y=dat.vec$y)
pred.SeSDA <- predict(obj.SeSDA, testx)
err <- apply(pred.SeSDA, 2, function(x){mean(x!=dat.vec$testy)})
min(err)
[1] 0.11
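For comparison, the direct DSDA fit mentioned above can be obtained with the same interface as in the previous section (a sketch; the 15.8% error quoted above refers to the authors' run of this comparison):

# Apply DSDA directly to the transformed (non-normal) data
obj.dsda <- dsda(x, y=dat.vec$y, testx=testx)
err.dsda <- apply(obj.dsda$pred, 2, function(p){mean(p!=dat.vec$testy)})
min(err.dsda)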
Further, Figure 4 shows how the distribution of the first variable changes after transformation. It is clear that both the pooled and naive transformations result in approximately normal distributions.
Functions ROAD and SOS
Functions ROAD and SOS generate solution paths equivalent to those of the ROAD (Fan et al. 2012) and SOS (Clemmensen et al. 2011) methods on binary vector data, respectively. Both models are fitted by calling the dsda function.
Compared to the original package for SOS, sparseLDA, our implementation is usually faster, especially when a solution path or parameter tuning is needed; for example, fitting a solution path with 10 possible values of the tuning parameter is noticeably quicker with our implementation. The lambdas passed into the ROAD and SOS functions are directly used by the dsda function, while the lambdas returned by the two functions are their corresponding parameters in the ROAD and SOS models, respectively.
Figure 5 shows the relationship between the
obj.dsda <- dsda(dat.vec$x, y=dat.vec$y, lambda=seq(0.1, 0.5, length.out=20))
obj.road <- ROAD(dat.vec$x, y=dat.vec$y, lambda=seq(0.1, 0.5, length.out=20))
obj.sos <- SOS(dat.vec$x, y=dat.vec$y, lambda=seq(0.1, 0.5, length.out=20))
Function msda
The function msda provides an interface to fit MSDA. Similarly to dsda, it can be called without specifying possible values of the tuning parameter. The function msda can also perform predictions when testing data is supplied and make adjustments for covariates when covariates exist. We apply msda to the GDS1615 data set as a demonstration. We report the minimum error, its corresponding parameter value and the number of non-zero variables selected by the model.
data(GDS1615)
x <- GDS1615$x
y <- GDS1615$y
set.seed(123456)
teindex <- c(sample(which(y==1), sum(y==1)/3), sample (which(y==2),
sum(y==2)/3), sample(which(y==3), sum(y==3)/3))
obj <- msda(x[-teindex, ], y=y[-teindex], testx=x[teindex, ])
err <- apply(obj$pred, 2, function(x){mean(x!=y[teindex])})
paste(min(err), obj$lambda[which.min(err)], obj$df[which.min(err)] )
[1] "0.04878049 1.446872 19"
If one wishes to visualize the discriminant effect, plots of the projections onto the discriminant coefficients are helpful. A principal component analysis can also be applied to show the classification even more clearly. For illustration, we perform principal component analysis on
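A sketch of such a visualization is given below. It assumes that the fitted object stores the discriminant coefficient matrices in obj$theta, one matrix per tuning parameter, as in the msda package; adjust the component name if the actual object structure differs.

# Project training data onto the estimated discriminant directions,
# then plot the first two principal components of the projections.
i <- which.min(err)                  # index of the selected tuning parameter
beta.hat <- obj$theta[[i]]           # assumed: p x (K-1) coefficient matrix
proj <- x[-teindex, ] %*% beta.hat   # projected training observations
pc <- prcomp(proj)
plot(pc$x[, 1], pc$x[, 2], col = y[-teindex], pch = 19, xlab = "PC1", ylab = "PC2")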
We also note that msda
has an argument model
that can be specified
by users to use different algorithms in MSDA. The options for model
include binary
, multi.original
and multi.modified
. The option
binary
can only be used in binary problems. If selected, MSDA is solved by DSDA, which gives the same solution, usually in less time.
However, using this option in multi-class problems will result in an
error. The option multi.original
indicates that MSDA is solved by the
original algorithm, which requires the calculation of the full
covariance matrix. When the dimension is low, multi.original
is often
efficient. The option multi.modified
, on the other hand, solves MSDA
with the modified algorithm, where only part of the covariance matrix is
calculated in each iteration. This option allows MSDA to be applicable
in much higher dimensions. Also, when multi.modified
is selected, we
suggest using relatively larger tuning parameters to account for the
high dimensionality. If unspecified, model
is set to be binary
in
binary problems. If the response variable is multi-class, the function
will call multi.original implementation for
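For example, the modified algorithm can be requested explicitly as follows, reusing the GDS1615 objects from the example above:

# Force the modified algorithm, which avoids forming the full covariance matrix
obj.mod <- msda(x[-teindex, ], y=y[-teindex], model="multi.modified")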
Function catch
To illustrate the usage of function catch, we first simulate a data set named dat.ten with tensor predictors and additional covariates. The variables in dat.ten are summarized in Table 4. Data set dat.ten can be simulated by code
dat.ten<-sim.tensor.cov(1000)
| Variable | Type | Dimension |
|---|---|---|
| x | list | 150; each element is a 10 × 10 × 10 tensor |
| y | vector | 150 |
| z | matrix | 150 × 2 |
| vec_x | matrix | 150 × 1000 |
| testx | list | 1000; each element is a 10 × 10 × 10 tensor |
| testy | vector | 1000 |
| testz | matrix | 1000 × 2 |
| vec_testx | matrix | 1000 × 1000 |
Function catch
fits a CATCH model on the input tensor data. Covariates
are optional for the function and the function will fit a TDA model when
there is no covariate. Function catch
has already integrated the
adjustment step and model fitting step, hence it will automatically
adjust for covariates when covariates exist. If one prefers to separate
the adjustment step, he/she can call adjten
function to make
adjustments and then supply the adjusted predictors into catch
, which
we will discuss in supportive functions.
Similar to the two functions above, function catch can generate a solution path on default or user-specified potential values of the parameter. It will also perform prediction when testing data is supplied. To make predictions with a CATCH model, users can either directly apply the catch function or separate the adjustment and model fitting steps and then call the predict function. The following example shows how to fit the model and make predictions when covariates exist. As mentioned above, functions dsda and msda share the same argument names for covariates.
obj <- catch(dat.ten$x, dat.ten$z, dat.ten$y, dat.ten$testx, dat.ten$testz)
pred <- obj$pred
err <- apply(pred, 2, function(x){mean(x!=dat.ten$testy)})
min(err)
[1] 0.167
obj$lambda[which.min(err)]
[1] 0.4270712
An example of applying CATCH to fit the model and perform prediction on the CSA data is as follows. The catch function takes a list of multi-dimensional arrays as input. In the dataset, x is a list of length 148, where each element is a matrix; y is a vector whose values range between 1 and 21. We set nlambda=10 to generate a parameter sequence of length 10, and the prediction for each value of the parameter is reported.
data(csa)
x <- csa$IDLH
y <- csa$y
teindex <- seq(1,147,7)
obj <- catch(x[-teindex, ], y=y[-teindex], testx=x[teindex, ], nlambda=10)
err <- apply(obj$pred, 2, function(x){mean(x!=y[teindex])})
print(err)
[1] 0.9523819 0.1904762 0.0952381 0.0000000 0.0952381 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000
Two special cases
We provide two more functions for two common problems in practice,
binary classification and matrix classification, respectively. First,
the function catch_matrix fits the CATCH model on matrix data (2-way tensors), which is a special case of catch. The usage of catch_matrix is exactly the same as that of catch, with the only exception that the predictor has to be a matrix instead of a higher-order tensor.
Second, our package includes the function dsda.all
that integrates
cross validation, model fitting and prediction. It requires the input of
the training set and testing set. Then the optimal tuning parameter is
chosen by cross validation on the training set, and the corresponding
testing error is reported. See the following example.
obj <- dsda.all(dat.vec$x, dat.vec$y, dat.vec$testx, dat.vec$testy, nfolds = 10)
print(obj$err)
[1] 0.116
Supportive functions
The package provides functions cv.dsda, cv.msda, cv.SeSDA and cv.catch to perform cross validation. For all of these functions, users can supply a sequence of potential values for the tuning parameter. Otherwise, the function will first fit a model on the entire data set and then perform cross validation on the automatically generated parameter sequence. As in msda, users can specify which model to use in cv.msda or let the function determine it from the input data.
Users can also specify the number of folds by the argument nfolds. Another argument, lambda.opt, has two options, "min" and "max": when multiple tuning parameters achieve the lowest error rate, "min" will return the smallest such tuning parameter, while "max" will return the largest one. We take cv.dsda and cv.catch as two examples.
obj.dsda <- cv.dsda(dat.vec$x, dat.vec$y, nfolds = 10)
obj.catch <- cv.catch(dat.ten$x, dat.ten$z, dat.ten$y, lambda.opt="min")
Functions adjten and adjvec implement the adjustment step for tensor and vector data, respectively. They take the training tensor/vector, covariates and response as input, and output the adjusted tensor/vector along with the adjustment coefficients used by dsda, msda and catch. When users input covariates along with the tensor/vector, the model fitting functions will automatically make the adjustment. But if users do not want to use the automatic prediction in the model fitting functions and prefer to predict via predict, they need to first make the adjustment to obtain the adjustment coefficient gamma, and then pass it into the function predict. Notice that making the adjustment and fitting a model on the adjusted tensor and response without covariates is equivalent to fitting a model by inputting the original tensor, covariates and response labels. Examples of the two approaches are given as follows.
obj <- catch(dat.ten$x, dat.ten$z, dat.ten$y, dat.ten$testx, dat.ten$testz)
obj.adj <- adjten(dat.ten$x, dat.ten$z, dat.ten$y, dat.ten$testx,
dat.ten$testz)
obj.fit <- catch(dat.ten$x, dat.ten$z, dat.ten$y)
pred <- predict(obj.fit, obj.adj$testxres, dat.ten$z, dat.ten$testz,
obj.adj$gamma)
There are three prediction functions, corresponding to dsda, msda and catch, respectively. All of them can be called directly via predict, and the generic will recognize which method to use based on the input fitted model object. When covariates exist, users need to pass the adjustment coefficients obtained from function adjten, the fitted model and the testing data altogether to make predictions. Therefore, we encourage users to directly use the model fitting functions msda and catch to fit the model and predict categorical responses.
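For completeness, a minimal sketch of the separate fit-then-predict workflow without covariates (reusing the GDS1615 objects from Section 4) is:

# Fit on the training part only, then predict through the generic predict()
obj <- msda(x[-teindex, ], y=y[-teindex])
pred <- predict(obj, x[teindex, ])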
In this section, we demonstrate the performance of the models on a real data set. We consider the attention deficit hyperactivity disorder (ADHD) data set, which is available on NITRC (http://fcon_1000.projects.nitrc.org/indi/adhd200) (Bellec et al. 2017). It contains three parts of information: s-MRI data, which is a 3-D tensor; covariate information including age, gender and handedness, which is a vector; and the response label. Among all 930 individuals, there are four types of categorical labels: Typically Developing Children (TDC), ADHD Combined, ADHD Hyperactive and ADHD Inattentive.
We downsize the tensor to dimension
Given the tensor structure and the existence of covariates, the most suitable approach is to apply CATCH. We also vectorize the tensors and stack the covariates along with the long vectors to apply the vector methods. For the binary case, DSDA is applied. For the multi-class case, the MSDA model with multi.modified is applied, since the dimension is too large to employ multi.original. We also compare with SOS (Clemmensen et al. 2011) using its own package sparseLDA and
| Method | Binary: error (%) | Binary: SE | Binary: time (s) | Multi-class: error (%) | Multi-class: SE | Multi-class: time (s) |
|---|---|---|---|---|---|---|
| DSDA/multi.modified | 23.58 | 0.23 | 36.30 | 35.85 | 0.24 | 88.8 |
| SeSDA | 23.68 | 0.24 | 731.28 | NA | NA | NA |
| CATCH | 22.79 | 0.24 | 78.6 | 35.22 | 0.25 | 101.4 |
|  | 23.99 | 0.16 | 36.23 | 35.66 | 0.21 | 134.4 |
| SOS | 23.87 | 0.26 | 100.8 | 37.07 | 0.29 | 1114.8 |
For each replicate, we perform cross validation on the training data and record the classification error on the testing data. The entire process was repeated 100 times, and we report the mean and standard error of the error rates. The performance is shown in Table 5.
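For reference, the vectorization described above can be sketched as follows (x.list and z are hypothetical object names; this is illustrative and not the exact code used for the reported results):

# Flatten each 3-D tensor into a long vector and append the covariates
vec.x <- t(sapply(x.list, as.vector))   # n x (d1*d2*d3) matrix
vec.x <- cbind(vec.x, z)                # stack covariates with the long vectors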
Package TULIP provides a toolbox to fit various sparse discriminant analysis models, including parametric models DSDA, ROAD, SOS for binary vector data, semiparametric model SeSDA for binary vector data, MSDA for multiclass vector data, and CATCH for multiclass tensor data. As a comprehensive toolbox, the package provides prediction and cross validation functions as well.
Meanwhile, the package proposes an approach to handle cases where both predictors and covariates are supplied. The predictor can be a vector or a tensor, while the covariates are usually low-dimensional vectors. Covariates may have an effect on both the response and the predictor; therefore, making the adjustment and excluding the influence of the covariates from the predictor is important. The package includes functions to make the adjustment, and they can be called easily by supplying covariates in the model fitting functions.
On each dimension, which is named a mode, a tensor is composed of vectors of length
Denote the mode-
Data sets dat.vec and dat.ten are used to illustrate the usage of the functions. Detailed model settings are described in Section 4. Here is the code to simulate the two data sets.
Code to simulate data set dat.vec:
set.seed(123456)
sigma <- matrix(0.3, 500, 500)
diag(sigma) <- 1
dsigma <- t(chol(sigma))
#define beta and mean
beta <- matrix(0, nrow = 500, ncol = 1)
beta[1:10,1] <- 0.5
M <- matrix(0, nrow = 2, ncol = 500)
M[2,] <- sigma%*%beta
y <- c(rep(1, 75), rep(2, 75))
#generate test data
telabel <- ceiling(runif(1000)*2)
x <- matrix(rnorm(150*500),ncol = 500)%*%t(dsigma)
x[y==2, ] <- x[y==2, ] + M[2,]
testx <- matrix(rnorm(1000*500), ncol = 500) %*% t(dsigma)
testx[telabel==2, ] <- testx[telabel==2, ] + M[2, ]
dat.vec <- list(x = x, y = y, testx = testx, testy = telabel)
Code to simulate data set dat.ten:
library(tensr)  # provides atrans() and amprod() used below
set.seed(123456)
sigma <- array(list(), 3) #define covariance matrices
dsigma <- array(list(), 3)
for (i in 1:3){
sigma[[i]] <- diag(10)
dsigma[[i]] <- t(chol(sigma[[i]]))
}
B2 <- array(0, dim=c(10,10,10)) #define B and mean
B2[1:2, 1:2, 1:2] <- 0.8
M <- array(list(), 2)
M[[1]] <- array(0, dim=c(10,10,10))
M[[2]] <- atrans(B2, sigma)
y <- c(rep(1,75), rep(2,75))
coef <- array(0, dim=c(10,10,10,2)) #define alpha
coef[1:5, 1:5, 1:5, 1] <- 1
telabel <- ceiling(runif(1000)*2)
z <- matrix(rnorm(2*150), nrow=150, ncol=2) #generate covariates
z[y==2,] <- z[y==2,] + 0.3
testz <- matrix(rnorm(2*1000), nrow=1000, ncol=2)
testz[telabel==2, ] <- testz[telabel==2, ] + 0.3
vec_x <- matrix(rnorm(1000*150), ncol=150) #generate tensor
x <- array(list(),150)
for (i in 1:150){
x[[i]] <- array(vec_x[,i], c(10,10,10)) + amprod(coef, t(z[i,]), 4)[,,,1]
x[[i]] <- M[[y[i]]] + atrans(x[[i]], dsigma)
}
vec_testx <- matrix(rnorm(1000*1000), ncol=1000)
testx <- array(list(), 1000)
for (i in 1:1000){
testx[[i]] <- array(vec_testx[,i], c(10,10,10)) + amprod(coef, t(testz[i,]),
4)[,,,1]
testx[[i]] <- M[[telabel[i]]] + atrans(testx[[i]], dsigma)
}
dat.ten <- list(x=x, z=z, testx=testx, testz=testz, vec_x=t(vec_x),
vec_testx=t(vec_testx), y=y, testy=telabel)
Denote the sample mean of Class
Further unfold
The F-test statistic used to preprocess GDS1615 data is defined as
where
For attribution, please cite this work as
Pan, et al., "TULIP: A Toolbox for Linear Discriminant Analysis with Penalties", The R Journal, 2021
BibTeX citation
@article{RJ-2021-025,
  author  = {Pan, Yuqing and Mai, Qing and Zhang, Xin},
  title   = {TULIP: A Toolbox for Linear Discriminant Analysis with Penalties},
  journal = {The R Journal},
  year    = {2021},
  note    = {https://rjournal.github.io/},
  volume  = {12},
  issue   = {2},
  issn    = {2073-4859},
  pages   = {134-154}
}