Ordinal data are used in many domains, especially when measurements are collected from people through observations, tests, or questionnaires. ordinalClust is an innovative R package dedicated to ordinal data that provides tools for modeling, clustering, co-clustering and classifying such data. Ordinal data are modeled using the BOS distribution, a model with two meaningful parameters referred to as "position" and "precision". The former indicates the mode of the distribution and the latter describes how scattered the data are around the mode: given these two parameters, the user can easily interpret the distribution of the data. The package is built around the co-clustering framework, in which rows and columns are clustered simultaneously. The co-clustering approach uses the Latent Block Model (LBM) and the SEM-Gibbs algorithm for parameter inference, while the clustering and classification methods rely on simplified versions of this algorithm. For the classification process, two approaches are proposed. In the first one, the BOS parameters are estimated from the training dataset in the conventional way. In the second approach, parsimony is introduced by estimating the parameters and column-clusters from the training dataset. We empirically show that this approach can yield better results. For the clustering and co-clustering processes, the ICL-BIC criterion is used for model selection. An overview of these methods is given, and their use with the ordinalClust package is described on real datasets. The latest stable package version is available on the Comprehensive R Archive Network (CRAN).
Ordinal data are a specific kind of categorical data occurring when the levels are ordered (Agresti 2012). Some common contexts for the collection of ordinal data include satisfaction surveys, aptitude and personality tests and psychological questionnaires. In the present work, an ordinal variable is represented by x, and its m ordered levels are written (1, ..., m).
Thus far, ordinal data have received more attention from a supervised point of view. For example, consider a marketing firm investigating which factors influence the size of a soda (small, medium, large or extra large) that people order at a fast-food chain. These factors may include which type of sandwich is ordered (burger or chicken), whether or not fries are also ordered, and the consumer's age. In this case, an observation consists of factors of different types, and the variable to predict is an ordinal variable. Several software packages can analyze ordinal data in a regression framework. The cumulative link model (CLM) assumes that:

$$\mathrm{logit}\left(p(x \leq \mu)\right) = \log \frac{p(x \leq \mu)}{1 - p(x \leq \mu)} = \beta_0(\mu) + \boldsymbol{t}^{\top}\boldsymbol{\beta},$$

where x is the ordinal variable, μ is one of its levels, t stands for the covariates, and the intercept β_0(μ) increases with μ.
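As a brief illustration of such a regression approach, the clm() function from the ordinal package (not part of ordinalClust) fits a CLM; the sketch below uses the wine dataset shipped with that package.

# CLM example with the 'ordinal' package: predict an ordered bitterness
# rating from two covariates (temperature and contact).
library(ordinal)
data(wine)
fm <- clm(rating ~ temp + contact, data = wine)
summary(fm)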
However, the focus of these techniques differs from ours in two ways. Firstly, they work in a supervised framework (classification). Secondly, they work with datasets for which the variables to predict are ordinal responses, while the other variables are of various types. Our goal is to provide a tool for both unsupervised and supervised tasks, on datasets comprised of ordinal variables only (in the classification context, the response is categorical). From an unsupervised point of view, the Latent Gold software (J. Vermunt 2005) is – to our knowledge – the only software that uses CLMs to cluster the data. Nevertheless, the implementation of this method is known to be computationally expensive. In addition, it is not provided through a user-friendly R package.
Other contributions have defined clustering algorithms with ordinal variables. In (McParland and Gormley 2013), the authors propose a model-based technique by considering the probability distribution of ordinal data as a discretization of an underlying continuous variable. This approach is implemented in the clustMD package (McParland and Gormley 2017), which is designed more generally for heterogeneous data. In (Ranalli and Rocci 2016), the categorical variables are seen as a discretization of an underlying finite mixture of Gaussians. In other works, the authors use the multinomial distribution to model the data. For instance, (Giordan and Diana 2011) use the multinomial distribution and a cluster tree, whereas (Jollois and Nadif 2009) apply a constrained multinomial distribution. However, these contributions do not provide a way to co-cluster and classify ordinal data. Furthermore, they are not always available as an R package (except in the case of (McParland and Gormley 2013)). More recently, (Corneli et al. 2020) proposed a method to co-cluster ordinal data modeled via latent Gaussian random variables. Their package ordinalLBM (Corneli et al. 2019) is available on CRAN.
Finally, the CUB (Combination of a discrete Uniform and a shifted
Binomial random variable) model (D’Elia and Piccolo 2005) is widely used to analyze
ordinal datasets. For instance, (Corduas 2008) proposes a clustering
algorithm based on a mixture of CUB models. In the CUB model, an answer
is interpreted as the result of a cognitive process where the decision
is intrinsically continuous but is expressed on a discrete scale of m levels.
More recently, Biernacki and Jacques (2016) proposed the so-called Binary Ordinal Search model, referred to as the "BOS" model. It is a probability distribution specific to ordinal data that is parameterized with two meaningful parameters: a position parameter corresponding to the mode of the distribution, and a precision parameter describing how concentrated the distribution is around this mode.
A dataset of ordinal data will be written as x = (x_ij)_{i,j}, with 1 ≤ i ≤ N and 1 ≤ j ≤ J, where N is the number of individuals and J the number of ordinal features.
The BOS model (Biernacki and Jacques 2016) is a probability distribution for ordinal data parameterized by a position parameter μ ∈ {1, ..., m} and a precision parameter π ∈ [0, 1]. The position μ is the mode of the distribution, and the precision π describes how concentrated around μ the distribution is: when π = 1, the distribution is a Dirac mass at μ, whereas when π = 0, it is the uniform distribution on {1, ..., m}. The model is obtained by assuming that an answer results from a stochastic binary search of the response within the set of levels, each comparison made during the search being correct with probability π.
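To make the two parameters concrete, the stochastic binary search can be sketched in a few lines of R. This is an illustrative reimplementation written from the description above; the function rbos() and its implementation details are ours and are not part of the ordinalClust API.

# Sketch of the BOS generative process: repeatedly split the current set of
# candidate levels at a uniformly drawn breakpoint, keeping the sub-interval
# closest to mu with probability pi ("perfect" comparison), or a random
# sub-interval otherwise ("blind" comparison).
rbos <- function(m, mu, pi) {
  e <- 1:m                                 # current set of candidate levels
  while (length(e) > 1) {
    y <- e[sample.int(length(e), 1)]       # breakpoint drawn uniformly in e
    splits <- list(e[e < y], y, e[e > y])  # the three sub-intervals
    if (runif(1) < pi) {
      # perfect comparison: keep the sub-interval closest to the mode mu
      d <- sapply(splits, function(s) if (length(s) > 0) min(abs(s - mu)) else Inf)
      e <- splits[[which.min(d)]]
    } else {
      # blind comparison: keep a sub-interval with probability
      # proportional to its size
      sizes <- sapply(splits, length)
      e <- splits[[sample.int(3, 1, prob = sizes)]]
    }
  }
  e
}
# a high precision concentrates the draws around the position parameter:
table(replicate(1000, rbos(m = 4, mu = 2, pi = 0.9)))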
With this being in a co-clustering context, it is assumed that there are K row-clusters and H column-clusters inherent to the data matrix.
Let us consider the data matrix x = (x_ij)_{i,j}, together with the row partition v = (v_ik)_{i,k} and the column partition w = (w_jh)_{j,h}, where v_ik = 1 if row i belongs to row-cluster k (and 0 otherwise), and w_jh = 1 if column j belongs to column-cluster h. The univariate random variables x_ij are assumed to be conditionally independent given v and w, so the conditional density of x is

$$p(x \mid v, w; \alpha) = \prod_{i,j,k,h} p(x_{ij}; \alpha_{kh})^{v_{ik} w_{jh}},$$

where α_kh = (μ_kh, π_kh) is the BOS parameter of block (k, h). The Latent Block Model density is then

$$p(x; \theta) = \sum_{v \in V} \sum_{w \in W} p(v; \theta)\, p(w; \theta)\, p(x \mid v, w; \alpha),$$

with the knowledge that V and W denote the sets of all possible row and column partitions, that p(v; θ) = ∏_{i,k} γ_k^{v_ik} and p(w; θ) = ∏_{j,h} ρ_h^{w_jh}, where γ_k and ρ_h are the row-cluster and column-cluster mixing proportions, and that θ = (γ_k, ρ_h, α_kh) gathers all the model parameters.
In the co-clustering context, the aim of the inference is to maximize the observed log-likelihood l(θ; x̌) = ∑_{x̂} log p(x; θ), where x̌ denotes the observed data and x̂ the missing data. Because the EM algorithm is not computationally tractable for the Latent Block Model, the SEM-Gibbs algorithm (Algorithm 1) is used instead.
The ordinalClust package provides three modes for value initialization, set through the argument init, which can take the values 'random', 'kmeans' or 'randomBurnin'. The first value randomly initializes the partitions v^(0) and w^(0) with the multinomial distribution. The second value initializes the partitions with the kmeans algorithm. The third one, 'randomBurnin', is a bit more complex and requires additional arguments for the algorithm. It aims at avoiding a degeneracy of the algorithm that leads to empty clusters, knowing that the degeneracy event arises more often at the early stage of the algorithm (thus during the burn-in period). This mode starts with a random initialization. However, for the first nbSEMburn iterations (nbSEMburn < nbSEM), whenever a row-cluster becomes empty, a percentage percentRandomB of the row partitions are resampled from the multinomial distribution; the same holds for the column partitions whenever a column-cluster becomes empty.
The first iterations of the SEM-Gibbs are called the burn-in period, which means that the parameters are not yet stable. Consequently, only the iterations that occur after this burn-in period are taken into account and are referred to as the "sampling distribution" hereafter. While the final estimation of the position parameters μ_kh is the mode of their sampling distributions, the final estimation of the continuous parameters (π_kh, γ_k, ρ_h) is the mean of their sampling distributions.
To determine how many row-clusters and how many column-clusters are necessary, an adaptation of the ICL criterion (Biernacki et al. 2000), called ICL-BIC, is proposed in Jacques and Biernacki (2018). In practice, the algorithm must be executed for all the numbers of row-clusters and column-clusters under comparison, and the configuration with the highest ICL-BIC value is retained.
The clustering model described in this section is a particular case of the co-clustering model, in which each feature is in its own cluster (H = J). The multivariate ordinal variables are consequently assumed to be conditionally independent given the row partition v:

$$p(x_i \mid v_{ik} = 1; \theta) = \prod_{j=1}^{J} p(x_{ij}; \mu_{kj}, \pi_{kj}),$$

where (μ_kj, π_kj) are the position and precision parameters of row-cluster k for the j-th feature.
To infer the parameters of this model, the SEM-Gibbs Algorithm 1 is used with the part in 1.2 removed from the SE-step. The part in 1.3 relating to missing value imputation remains. It is noted here that clustering can also be achieved by using the co-clustering model in section "2.3", and by considering the resulting row partition v as the clustering partition.
By considering a classification task with a categorical variable to predict from ordinal data, the configuration encountered is a particular case where the row partition v is known, each row-cluster corresponding to one of the classes to predict. Two classification models are proposed.
This first model is similar to the clustering model: each variable represents a column-cluster of size 1, thus H = J. The features are assumed to be independent conditionally on the class:

$$p(x_i \mid v_{ik} = 1; \theta) = \prod_{j=1}^{J} p(x_{ij}; \mu_{kj}, \pi_{kj}). \qquad (3)$$
The inference of this model’s parameters only requires the M-step of Algorithm 1. However, if there are missing data, the SE-step made of the part in 1.3 only is also required.
This model is a parsimonious version of the first model. Parsimony is introduced by grouping the features into H clusters and assuming that the features of a same cluster follow a common BOS distribution:

$$p(x_i \mid v_{ik} = 1; \theta) = \prod_{j=1}^{J} \prod_{h=1}^{H} p(x_{ij}; \mu_{kh}, \pi_{kh})^{w_{jh}}. \qquad (4)$$
To infer this model’s parameters, Algorithm 1 is used with an SE-step only containing the part in 1.2, and the entire M-step. Again, if there are missing data, the SE-step made of the part in 1.3 is also required.
The Latent Block Model as described before cannot take variables with different numbers of levels m into account: the BOS distribution of a block is defined for a fixed number of levels.
In Selosse et al. (2019), a constrained Latent Block Model is provided. Although it does not allow ordinal features with different numbers of levels to be gathered in a same column-cluster, it is able to take into account the fact that there are several numbers of levels and to perform a co-clustering on more diverse datasets. The matrix x is reorganized so that its columns are grouped by number of levels: x = (x^(1), ..., x^(D)), where D is the number of different numbers of levels and x^(d) contains the features with m_d levels. The model relies on the following independence hypothesis:

$$p(x^{(1)}, \ldots, x^{(D)}) = p(x^{(1)}) \times \cdots \times p(x^{(D)}),$$

with each x^(d) following its own Latent Block Model. The D matrices share the same row partition v, but each x^(d) has its own column partition w^(d).
In this case, the SEM-Gibbs algorithm is slightly changed: in the SE-step, a sampling step is appended for every additional column partition w^(d).
This section explains how to use the implementation of the methods described before through the ordinalClust package. Users should be aware that the code provided was run with R 3.5.3, and that the results could be different with another version. If users wish to use a more recent version of R (≥ 3.6.0), they should execute the command RNGkind(sample.kind='Rounding') before running the code, so that the random number generator behaves as in R 3.5.3.
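For example, the following one-line setup (only needed under R ≥ 3.6.0) restores the pre-3.6.0 sampling behavior before any call to set.seed():

# under R >= 3.6.0, run this once before the set.seed() calls below
RNGkind(sample.kind = "Rounding")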
The datasets included were part of the QoLR package (Anota et al. 2017). They contain responses to the well-known EORTC QLQ-C30 (European Organization for Research and Treatment of Cancer Quality of Life Questionnaire), completed by patients affected by breast cancer. For all questions, the most positive answer is given by level "1". For example, for the question "During the past week, did you feel irritable?", the possible responses "Not at all.", "A little.", "Quite a bit." and "Very much." are assigned the levels 1, 2, 3 and 4 respectively, because it is perceived as more negative to have felt irritable. Two datasets are available:
- dataqol is a dataframe with 117 lines, each line representing a patient, with the following columns:
  - Id: patient Id,
  - q1-q28: responses to 28 questions with 4 levels,
  - q29-q30: responses to 2 questions with 7 levels.
- dataqol.classif is a dataframe with 40 lines, each line representing a patient, with the same columns as dataqol plus:
  - death: whether the patient passed away (2) or not (1).

The datasets contain missing values, coded as NA, in both dataqol and dataqol.classif. To load the package and its datasets, the following commands must be executed:
library(ordinalClust)
data("dataqol")
data("dataqol.classif")
Then, a seed is set so that users can obtain results identical to this document:
set.seed(1)
Users must define how many SEM-Gibbs iterations (nbSEM) and how many burn-in iterations (nbSEMburn) are needed for Algorithm 1. The section "3.7" provides an empirical way of checking the correctness of these values. Moreover, the nbindmini argument must be defined: it indicates the minimum number of elements that must be present in a block. Finally, the init argument indicates how to initialize the algorithm. It can be set to "kmeans", "random" or "randomBurnin".
nbSEM <- 150
nbSEMburn <- 100
nbindmini <- 1
init <- "randomBurnin"
percentRandomB <- c(50, 50)
Here, percentRandomB is a vector because it defines two percentages: the percentage of rows that will be resampled if a row-cluster is emptied, and the percentage of columns that will be resampled if a column-cluster is emptied.
In this section, the dataqol.classif dataset is used. The aim is to predict the death variable from the ordinal data that correspond to the patients' answers. The following commands show how to set up the classification configuration. First, the ordinal responses are stored in a matrix x, and the death variable to predict is stored in v:
x <- as.matrix(dataqol.classif[,2:29])
v <- dataqol.classif$death
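As a quick sanity check (an illustrative snippet, not part of the original analysis), the dimensions of x and the class distribution of v can be inspected; the expected shapes follow from the dataset description above:

# 40 patients and 28 ordinal responses are expected here
dim(x)
# class sizes of the variable to predict (1: alive, 2: deceased)
table(v)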
ordinalClust provides two classification models. The first model (chosen by the option kc=0) is a multivariate BOS model with the assumption that, conditionally on the class of the observations, the features are independent, as in Equation (3). The second model introduces parsimony by grouping the features into clusters and assuming that the features of a cluster have a common distribution, as in Equation (4). This latter is a novel approach for classification. The number of clusters of features is defined through the argument kc = H.
# sampling data for training and prediction
nb.sample <- ceiling(nrow(x)*7/10)
sample.train <- sample(1:nrow(x), nb.sample, replace=FALSE)
x.train <- x[sample.train,]
x.validation <- x[-sample.train,]
v.train <- v[sample.train]
v.validation <- v[-sample.train]
We also indicate how many classes there are, and how many levels the ordinal data have:
# classes
kr <- 2
# levels
m <- 4
The training can be performed using the function bosclassif
. In the
code below, several kc
parameters are tested. When kc = 0
, the
multivariate model is used: all variables are considered to be
independent. When kc > 0
, the parsimonious model is used: the
variables are grouped into kc
groups. To classify new observations,
the predict
function is used: it takes as arguments the result from
bosclassif
and the observations to classify. In the following example,
we store in the preds
matrix the predictions resulting from the
classifications performed with different kc
.
kcol <- c(0, 1, 2, 3, 4)
preds <- matrix(0, nrow = length(kcol), ncol = nrow(x.validation))
for( kc in 1:length(kcol) ){
classif <- bosclassif(x = x.train, y = v.train, kr = kr, kc = kcol[kc],
m = m, nbSEM = nbSEM, nbSEMburn = nbSEMburn,
nbindmini = nbindmini, init = init,
percentRandomB = percentRandomB)
new.prediction <- predict(classif, x.validation)
if(!is.character(new.prediction)){
preds[kc,] <- new.prediction@zr_topredict
}
}
Then the preds matrix can be converted to a dataframe:
preds <- as.data.frame(preds)
row.names <- paste0("kc = ", kcol)
rownames(preds) <- row.names
preds
v.validation
> preds
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
kc=0 2 1 2 2 2 2 1 1 1 2 1 2
kc=1 2 1 2 1 2 2 1 2 1 1 2 2
kc=2 2 1 2 2 2 2 1 2 1 2 2 2
kc=3 2 1 2 1 2 2 1 2 1 2 1 2
kc=4 1 1 2 1 1 1 1 2 1 2 1 2
> v.validation
[1] 2 1 1 1 1 1 1 2 1 1 1 2
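Before examining sensitivity and specificity, a simple overall accuracy per model can be computed; this is an illustrative sketch based on the objects built above, not part of the original analysis:

# proportion of correctly classified validation patients for each kc
apply(preds, 1, function(p) mean(p == v.validation))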
Table 1 shows the sensitivity and specificity for each value of kc. The code to get these values is available in the Appendix "4.1". First of all, the results are globally satisfying since the sensitivities and specificities are quite high. We observe that the parsimonious models (kc = 1, 2, 3, 4) yield better results than the multivariate model (kc = 0). The two parsimonious models kc = 1 and kc = 3 obtain the best results. This illustrates the benefit of introducing parsimonious models in a supervised context. However, users should be aware that the dataset is small, and the number of observations used here is too low to draw definitive conclusions.
| | sensitivity | specificity |
|---|---|---|
| kc = 0 | 0.67 | 0.44 |
| kc = 1 | 1.00 | 0.56 |
| kc = 2 | 1.00 | 0.33 |
| kc = 3 | 1.00 | 0.56 |
| kc = 4 | 0.78 | 0.67 |

Table 1: Sensitivities and specificities for each value of kc.
This section uses the dataqol dataset, plotted in Figure 4. The purpose of clustering is to emphasize information regarding the rows of a data matrix. First, the matrix of ordinal responses is extracted from the dataqol dataset:
set.seed(1)
x <- as.matrix(dataqol[,2:29])
The clustering is obtained using the bosclust
function:
clust <- bosclust(x = x, kr = 3, m = 4,
nbSEM = nbSEM, nbSEMburn = nbSEMburn,
nbindmini = nbindmini, init = init)
The outcome can be plotted using the plot
function:
plot(clust)
Figure 5 represents the clustering result. We count the clusters from the bottom to the top. Among the three row-clusters obtained, the lighter a cluster's blocks are, the more positively the corresponding patients globally replied. The parameters are obtained with the command clust@params:
:
> clust@params
[[1]]
[[1]]$mus
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 1 1 1 1 1 1 1 1 1 2 1 2 1 1
[2,] 2 2 1 1 1 2 1 1 2 2 1 3 2 1
[3,] 3 4 3 3 1 4 3 2 3 4 2 4 3 4
[,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,] 1 1 1 2 1 1 1 2 1 1 1 1
[2,] 1 1 1 2 2 1 2 2 1 2 1 1
[3,] 1 1 1 4 3 3 3 3 2 2 2 3
[,27] [,28]
[1,] 1 1
[2,] 1 1
[3,] 4 1
[[1]]$pis
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0.8079608 0.6673682 0.961979 0.7770536 1 0.9619790 1.0000000 0.8852379
[2,] 0.3946294 0.3736864 0.722322 0.4690402 1 0.3567357 0.5546162 0.6402318
[3,] 0.4319502 0.5928978 0.347433 0.4930463 1 0.2718517 0.5888644 0.3310052
[,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 0.9246885 0.5903583 0.6951631 0.5438752 0.9226941 0.4932884 0.8825371
[2,] 0.4767814 0.6937982 0.1481492 0.1859040 0.1176366 0.6624020 0.7916167
[3,] 0.3220447 0.7079570 0.4084469 0.5779180 0.5745136 0.1691940 0.3161048
[,16] [,17] [,18] [,19] [,20] [,21] [,22]
[1,] 0.8036703 0.7364791 0.6643935 1.0000000 0.9619790 0.6951631 0.5681893
[2,] 0.3054584 0.8394348 0.5440131 0.3395749 0.4757433 0.4142450 0.3805989
[3,] 0.1255990 0.4281432 0.5470879 0.4280508 0.2300193 0.5776385 0.2632960
[,23] [,24] [,25] [,26] [,27] [,28]
[1,] 0.4905033 0.5510665 0.8167944 0.7477762 0.8521366 0.9226941
[2,] 0.3870155 0.4064222 0.6484691 0.4666815 0.3530825 0.6599010
[3,] 0.4183768 0.4709545 0.1959082 0.5465595 0.6419857 0.4174326
clust@params is a list of length D, where D is the number of different numbers of levels in the dataset. Here, the data have a unique number of levels, so clust@params has one element. Each element of the list has two attributes, pis and mus: they indicate the precision π and position μ values for each row-cluster (rows of the matrices) and each question (columns of the matrices).
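For instance, the position and precision parameters of the first row-cluster can be extracted as follows (an illustrative use of the slots shown above):

# position (mu) and precision (pi) of row-cluster 1 for the 28 questions
clust@params[[1]]$mus[1, ]
clust@params[[1]]$pis[1, ]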
In the example above, the choice of kr = 3 row-clusters was made by running the algorithm for several values of kr: thanks to the slot clust@icl, we can find out which result has the highest ICL-BIC value. The corresponding code is available in the Appendix "4.2".
Once again, this section uses the dataqol
dataset. The co-clustering
is performed using the boscoclust
function:
set.seed(1)
coclust <- boscoclust(x = x, kr = 3, kc = 3, m = 4,
nbSEM = nbSEM, nbSEMburn = nbSEMburn,
nbindmini = nbindmini, init = init)
As in the clustering context, the result can be plotted with the command below, as in Figure 6.
plot(coclust)
In this case, the algorithm highlights a structure amid the rows, as for the clustering in Figure 5. In addition, it also reveals a structure inherent to the columns: for example, the third column-cluster is lighter than the others; consequently, these questions were globally responded to in a more positive way.
Once again, the parameters of the co-clustering are available through
the command coclust@params
:
> coclust@params
[[1]]
[[1]]$mus
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 2 1
[3,] 3 3 1
[[1]]$pis
[,1] [,2] [,3]
[1,] 0.8496224 0.6266097 0.9426305
[2,] 0.4876194 0.5340329 0.7722278
[3,] 0.2638594 0.3044552 0.3623779
In order to find out which questions belong to the third column-cluster (the one whose corresponding blocks are lighter), we need the command coclust@zc, which indicates the column-cluster of each column. coclust@zc is also a list of length D (the number of different numbers of levels). Here, D = 1, so the column-cluster memberships are obtained through coclust@zc[[1]]:
which(coclust@zc[[1]] == 3)
[1] 3 5 8 15 17 25 28
We know that questions 3, 5, 8, 15, 17, 25 and 28 are globally the ones that were answered the most positively; their wording can be found in the EORTC QLQ-C30 questionnaire.
In the examples above, the choice of three row-clusters and three column-clusters was also made by comparing ICL-BIC values; the corresponding code is available in the Appendix "4.3".
In this section we use the dataqol dataset, which contains missing values. The following code retrieves the indices of the missing values, together with the values imputed by the clustering and by the co-clustering (available through the xhat slot):
missing <- which(is.na(x))
missing
values.imputed.clust <- clust@xhat[[1]][missing]
values.imputed.clust
values.imputed.coclust <- coclust@xhat[[1]][missing]
values.imputed.coclust
> missing
[1] 148 177 278 352 380 440 450 559 996 1058 1496 1513 1611 1883 1981
[16] 2046 2047 2050 2085 2285 2402 2450 2514 2517 2518 2663 2754 2785 2900 2902
[31] 2982 2986 3060 3152 3366 3367 3368 3520 3572 3602
> values.imputed.clust
[1] 4 4 4 1 1 1 4 4 1 4 4 4 4 1 4 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 4 1 1 1
> values.imputed.coclust
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
We see that the co-clustering and the clustering algorithms imputed different values for the missing data.
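The disagreement can be quantified by cross-tabulating the two imputations (an illustrative sketch using the vectors computed above):

# rows: values imputed by the clustering; columns: by the co-clustering
table(values.imputed.clust, values.imputed.coclust)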
Co-clustering can be seen as a parsimonious way of performing clustering, which is why these two techniques are compared here. For example, the interpretation of row-clusters is more precise with the co-clustering. Indeed, in Figure 5, the row-clusters can be seen as a group of people who globally replied positively, a group of people who replied negatively, and a third group that replied in between. On the other hand, in Figure 6, an inherent structure of the data is better highlighted and adds more information: for each row-cluster, it is also easy to detect the questions that were replied to negatively. Co-clustering can therefore be seen as a more efficient way of performing clustering. Furthermore, the interpretation of the parameters was easier with the co-clustering result, because it only had 3 × 3 = 9 pairs of position and precision parameters to interpret, instead of 3 × 28 = 84 for the clustering.
The Adjusted Rand Index (Rand 1971) was computed on row partitions of co-clustering and clustering results, using the package mclust (Scrucca et al. 2016).
mclust::adjustedRandIndex(coclust@zr, clust@zr)
The value obtained is 0.41
, meaning that co-clustering creates a row
partition related to that created by the clustering, without being
identical.
The SEM-algorithm can be slow at reaching its stationary state, depending on the dataset. After having chosen the nbSEM and nbSEMburn arguments, the stability of the parameters can be checked graphically: the pichain, rhochain and paramschain slots contain the values of the mixing proportions and of the BOS parameters for each iteration of the algorithm. As an example, the following code plots the evolution of the precision parameters π of the co-clustering along the SEM-Gibbs iterations:
par(mfrow = c(3, 3))
for(kr in 1:3){
  for(kc in 1:3){
    toplot <- rep(0, nbSEM)
    for(i in 1:nbSEM){
      # precision parameter of block (kr, kc) at iteration i
      toplot[i] <- coclust@paramschain[[1]]$pis[kr, kc, i]
    }
    plot.default(toplot, type = "l", ylim = c(0, 1),
                 col = "hotpink3", main = "pi",
                 ylab = paste0("pi_", kr, kc, " values"),
                 xlab = "SEM-Gibbs iterations")
  }
}
In Figure 7, we observe that the parameters reach their stationary state before the 100th iteration. The burn-in period (nbSEMburn=100) is therefore long enough. The total number of iterations corresponds to the argument nbSEM=150, so the sampling distribution is made of the 50 remaining iterations.
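As an illustrative consistency check (assuming the [row-cluster, column-cluster, iteration] array layout used in the plotting code above), the final precision estimates can be recomputed as the mean of the sampling distribution:

# average the pi chains over the post-burn-in iterations; the result
# should be close to the values stored in coclust@params[[1]]$pis
pis.chain <- coclust@paramschain[[1]]$pis
pi.hat <- apply(pis.chain[, , (nbSEMburn + 1):nbSEM], c(1, 2), mean)
pi.hat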
If users wish to execute one of the functions described previously on variables with different numbers of levels, the columns of x must first be grouped by number of levels. The additional changes for the arguments to pass are listed below:

- m must be a vector of length D, where D is the number of different numbers of levels; its d-th element indicates the number of levels of the d-th group of variables.
- kc must be a vector of length D; its d-th element indicates the number of column-clusters for the d-th group of variables.
- idx_list is a new vector argument of length D; its d-th element indicates the index of the first column of x that has the number of levels m[d].

An example on the dataqol dataset is available in the Appendix "4.4".
The ordinalClust package presented in this paper implements several methods for analyzing ordinal data. First, it implements a clustering and co-clustering framework based on the Latent Block Model, coupled with an SEM-Gibbs algorithm and the BOS distribution. Moreover, it defines a novel approach to classifying ordinal data. For the classification method, two models are proposed, so that users can introduce parsimony in their analyses. Similarly, it has been shown that the co-clustering method provides a parsimonious way of performing clustering. The framework is able to handle missing values, which is notably relevant in the case of real datasets. Finally, these techniques are also implemented for datasets with ordinal data having different numbers of levels. The package ordinalClust is available on the Comprehensive R Archive Network (CRAN) and is still under active development. A future work will implement the method defined in Gelman and Rubin (1992) to automatically define the number of iterations of the SEM-Gibbs algorithm.
The following code computes the specificities and sensitivities obtained with the different kc values in the section "3.2":
library(caret)
actual <- v.validation - 1
specificities <- rep(0,length(kcol))
sensitivities <- rep(0,length(kcol))
for(i in 1:length(kcol)){
prediction <- unlist(as.vector(preds[i,])) - 1
u <- union(prediction, actual)
conf_matrix <- table(factor(prediction, u),factor(actual, u))
sensitivities[i] <- recall(conf_matrix)
specificities[i] <- specificity(conf_matrix)
}
sensitivities
specificities
> sensitivities
[1] 0.6666667 1.0000000 1.0000000 1.0000000 0.7777778
> specificities
[1] 0.4444444 0.5555556 0.3333333 0.5555556 0.6666667
set.seed(1)
library(ordinalClust)
data("dataqol")
M <- as.matrix(dataqol[,2:29])
nbSEM <- 150
nbSEMburn <- 100
nbindmini <- 2
init <- "randomBurnin"
percentRandomB <- c(50)
icl <- rep(0,3)
for(kr in 2:4){
object <- bosclust(x = M, kr = kr, m = 4, nbSEM = nbSEM,
nbSEMburn = nbSEMburn, nbindmini = nbindmini,
percentRandomB = percentRandomB, init = init)
if(length(object@icl)) icl[kr-1] <- object@icl
}
icl
> icl
[1] -3713.311 -3192.351 0
We see that the clustering algorithm could not find a solution without an empty cluster for kr = 4. The highest ICL-BIC is obtained for kr = 3.
set.seed(1)
library(ordinalClust)
data("dataqol")
M <- as.matrix(dataqol[,2:29])
nbSEM <- 150
nbSEMburn <- 100
nbindmini <- 2
init <- "randomBurnin"
percentRandomB <- c(50, 50)
icl <- matrix(0, nrow = 3, ncol = 3)
for(kr in 2:4){
for(kc in 2:4){
object <- boscoclust(x = M,kr = kr, kc = kc, m = 4, nbSEM = nbSEM,
nbSEMburn = nbSEMburn, nbindmini = nbindmini,
percentRandomB = percentRandomB, init = init)
if(length(object@zr)){
icl[kr-1, kc-1] <- object@icl
}
}
}
icl
> icl
[,1] [,2] [,3]
[1,] -3529.423 0.000 -3503.235
[2,] 0.000 -3373.573 0.000
[3,] 0.000 -3361.628 -3299.497
We note that the co-clustering algorithm could not find a solution
without an empty cluster for (kr, kc) = (2,3), (3,2), (3,4), (4,2)
.
The highest ICL-BIC is obtained when (kr,kc) = (3, 3)
.
The following code shows how to handle different numbers of levels in a co-clustering context. It may take several minutes to run due to the high number of levels of the last two columns.
set.seed(1)
library(ordinalClust)
# loading the real dataset
data("dataqol")
# loading the ordinal data
x <- as.matrix(dataqol[,2:31])
# defining the numbers of levels of each group of columns:
m <- c(4,7)
# defining number of row and column clusters
krow <- 3
kcol <- c(3,1)
# configuration for the inference
nbSEM <- 20
nbSEMburn <- 15
nbindmini <- 2
init <- 'random'
d.list <- c(1,29)
# Co-clustering execution
object <- boscoclust(x = x,kr = krow, kc = kcol, m = m,
idx_list = d.list, nbSEM = nbSEM,
nbSEMburn = nbSEMburn, nbindmini = nbindmini,
init = init)