The BayesBinMix package offers a Bayesian framework for clustering binary data with or without missing values by fitting mixtures of multivariate Bernoulli distributions with an unknown number of components. It allows the joint estimation of the number of clusters and model parameters using Markov chain Monte Carlo sampling. Heated chains are run in parallel and accelerate the convergence to the target posterior distribution. Identifiability issues are addressed by implementing label switching algorithms. The package is demonstrated and benchmarked against the Expectation-Maximization algorithm using a simulation study as well as a real dataset.
Clustering data is a fundamental task in a wide range of applications and finite mixture models are widely used for this purpose (McLachlan and Peel 2000; Marin et al. 2005; Frühwirth–Schnatter 2006). In this paper our attention is focused on clustering binary datasets. A variety of studies aims at identifying patterns in binary data including, but not limited to, voting data (Ilin 2012), text classification (Juan and Vidal 2002), handwritten digit recognition (Al-Ohali et al. 2003), medical research (Sun et al. 2007), animal classification (Li 2005) and genetics (Abel et al. 1993).
Throughout this paper the term cluster is used as a synonym of mixture component. Finite mixture models can be estimated under a frequentist approach using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977). However, the likelihood surface of a mixture model can exhibit many local maxima and it is well known that the EM algorithm may fail to converge to the main mode if it is initialized from a point close to a minor mode. Moreover, under a frequentist approach, the selection of the number of clusters is not straightforward: a mixture model is fitted for each possible number of clusters and the optimal one is then selected according to penalized likelihood criteria such as the Bayesian information criterion (Schwarz 1978) or the integrated complete likelihood criterion (Biernacki et al. 2000). The reader is also referred to Lindsay (1995) and Böhning (2000) for non-parametric likelihood estimation of a mixture model.
On the other hand, the Bayesian framework allows one to place a prior distribution on both the number of clusters and the model parameters, and then (approximately) sample from the joint posterior distribution using Markov chain Monte Carlo (MCMC) algorithms (Richardson and Green 1997; Stephens 2000a; Nobile and Fearnside 2007; White et al. 2016). However, the Bayesian approach is not free of problems. First, vanilla MCMC algorithms may require a very large number of iterations to discover the high posterior density areas and/or sufficiently explore the posterior surface due to the existence of minor modes. Second, identifiability issues arise due to the label switching phenomenon (Redner and Walker 1984), which complicates the inference procedure.
The BayesBinMix package explicitly addresses the aforementioned problems for clustering multivariate binary data. Specifically, it:
- allows missing values in the observed data,
- performs MCMC sampling for estimating the posterior distribution of the number of clusters and model parameters,
- produces a rapidly mixing MCMC sample by running parallel heated chains which can switch states,
- post-processes the generated MCMC sample and produces meaningful posterior mean estimates using state of the art algorithms to deal with label switching.
The rest of the paper is organised as follows. The mixture model is presented in Section 2. Its prior assumptions and the corresponding hierarchical model is introduced in Section 3. The basic MCMC scheme is detailed in Section 4.1. Section 4.2 deals with post-processing the generated MCMC sample in order to overcome identifiability issues due to the label switching problem. Finally, the basic sampler is embedded in a Metropolis-coupled MCMC algorithm as described in Section 4.3. The main function of the package is described in Section 5. Simulated and real datasets are analyzed in Sections 6.1 and 6.2, respectively.
Let $x = (x_1, \ldots, x_n)$ denote a sample of $n$ multivariate binary observations, where $x_i = (x_{i1}, \ldots, x_{id}) \in \{0,1\}^d$, $i = 1, \ldots, n$. We assume that each observation arises from a mixture of $K$ multivariate Bernoulli distributions with independent components:
$$P(x_i \mid K, p, \theta) = \sum_{k=1}^{K} p_k \prod_{j=1}^{d} \theta_{kj}^{x_{ij}} \left(1-\theta_{kj}\right)^{1-x_{ij}}, \qquad (1)$$
where $p = (p_1, \ldots, p_K)$ denotes the vector of mixture weights and $\theta_{kj}$ the probability of success for feature $j$ in cluster $k$.
It is straightforward to prove that the variance-covariance matrix of a mixture of independent Bernoulli distributions is not diagonal (see e.g. Bishop 2006), in contrast to the diagonal covariance matrix of a single collection of independent Bernoulli distributions. Therefore, the mixture model exhibits a richer covariance structure and can prove useful for discovering correlations in heterogeneous multivariate binary data.
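This is easy to verify by simulation. The following self-contained sketch (our own illustration, not code from the package) draws from a two-component mixture of independent Bernoulli distributions and shows that the empirical covariance matrix has clearly non-zero off-diagonal entries:

set.seed(1)
n <- 10000; d <- 3
theta <- rbind(c(0.9, 0.9, 0.9),   # success probabilities, component 1
               c(0.1, 0.1, 0.1))   # success probabilities, component 2
z <- sample(1:2, n, replace = TRUE)                 # equal mixture weights
x <- matrix(rbinom(n * d, size = 1, prob = theta[z, ]), n, d)
round(cov(x), 2)   # off-diagonal entries are close to 0.16, not 0

Here $\mathrm{Cov}(x_{ij}, x_{il}) = \mathrm{E}(x_{ij} x_{il}) - \mathrm{E}(x_{ij})\mathrm{E}(x_{il}) = 0.41 - 0.25 = 0.16$ for $j \neq l$, although each individual component has a diagonal covariance matrix.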
The observed likelihood of the model is written as
$$L(p, \theta; x) = \prod_{i=1}^{n} \sum_{k=1}^{K} p_k \prod_{j=1}^{d} \theta_{kj}^{x_{ij}} \left(1-\theta_{kj}\right)^{1-x_{ij}}. \qquad (2)$$
Note that the allocation variables $z = (z_1, \ldots, z_n)$, with $z_i \in \{1, \ldots, K\}$ denoting the component that generated observation $i$, are treated as missing data: conditionally on $z_i = k$, $x_i$ follows the product of independent Bernoulli distributions of component $k$.
Note that the quantities $n_k = \sum_{i=1}^{n} \mathrm{I}(z_i = k)$ (the size of cluster $k$) and $s_{kj} = \sum_{i: z_i = k} x_{ij}$ (the number of successes for feature $j$ in cluster $k$) are sufficient statistics of the complete data for $p_k$ and $\theta_{kj}$, respectively.
The following prior assumptions are imposed:
$$K \sim \text{(truncated) Poisson or Uniform on } \{1, \ldots, K_{\max}\},$$
$$p \mid K \sim \text{Dirichlet}(\gamma_1, \ldots, \gamma_K),$$
$$\theta_{kj} \mid K \sim \text{Beta}(\alpha, \beta), \quad \text{independently for } k = 1, \ldots, K;\ j = 1, \ldots, d,$$
where $K_{\max}$, $\gamma_1, \ldots, \gamma_{K_{\max}}$, $\alpha$ and $\beta$ are user-defined constants (see the Kmax, gamma, alpha and beta arguments in Table 1).
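For concreteness, a single draw from this hierarchical prior can be simulated in a few lines of base R. This is only a sketch: the values of Kmax, the Poisson rate lambda and the dimension d below are hypothetical choices, not package defaults.

set.seed(1)
Kmax <- 20; d <- 5; lambda <- 1
K <- sample(1:Kmax, 1, prob = dpois(1:Kmax, lambda))     # truncated Poisson prior
g <- rgamma(K, shape = rep(1, K), rate = 1)              # Dirichlet(1, ..., 1) via
p <- g / sum(g)                                          # normalized Gamma draws
theta <- matrix(rbeta(K * d, 1, 1), nrow = K, ncol = d)  # theta_kj ~ Beta(1, 1)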
According to Equations (4), (5), (6), (7) and (8), the joint probability density function of the model factorizes as
$$f(x, z, p, \theta, K) = f(x \mid z, \theta, K)\, f(z \mid p, K)\, f(\theta \mid K)\, f(p \mid K)\, f(K).$$
A general framework for updating the number of mixture components ($K$) is provided by trans-dimensional MCMC approaches, such as the reversible jump sampler (Richardson and Green 1997). Here we follow the allocation sampler of Nobile and Fearnside (2007): due to conjugacy, the mixture weights $p$ and the Bernoulli parameters $\theta$ can be integrated out analytically, so the sampler operates on the collapsed space of the allocations $z$ and the number of clusters $K$. Let $\pi(z, K \mid x)$ denote the resulting collapsed posterior distribution (Equation (13)). From Equation (13), the (collapsed) conditional posterior distribution of each allocation $z_i$, given the remaining allocations and $K$, is available in closed form (Equation (14)).
It is well known that draws from the conditional distributions in Equation (14) exhibit strong serial correlation, slowing down the convergence of the MCMC sampler. The mixing can be improved by additionally proposing simultaneous updates of blocks of allocation variables, using the following Metropolis-Hastings moves:
- Move 1: select two mixture components and propose a random reallocation of the assigned observations.
- Move 2: select two mixture components and propose to move a randomly selected subset of observations from the first to the second one.
- Move 3: select two mixture components and propose a reallocation of the assigned observations according to the full conditional probabilities, given the already processed ones.
Each move is accepted according to the corresponding Metropolis-Hastings acceptance probability; see Nobile and Fearnside (2007) for details. A schematic illustration of the collapsed posterior and a Move 1 style update is sketched below.
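The following sketch (our own illustration, assuming complete data and the priors of Section 3; not the package's internal code) evaluates the collapsed log-posterior of the allocations for fixed K and performs a Move 1 style update. Since the random reallocation proposal is symmetric, the Metropolis-Hastings ratio reduces to a ratio of collapsed posteriors.

# Collapsed log-posterior of z (up to an additive constant) for fixed K:
# p and theta are integrated out analytically thanks to conjugacy.
log_collapsed_z <- function(z, x, K, alpha = 1, beta = 1, gamma = rep(1, K)) {
  lp <- 0
  for (k in 1:K) {
    members <- which(z == k)
    nk <- length(members)
    skj <- if (nk > 0) colSums(x[members, , drop = FALSE]) else numeric(ncol(x))
    lp <- lp + lgamma(gamma[k] + nk) - lgamma(gamma[k]) +
      sum(lbeta(alpha + skj, beta + nk - skj) - lbeta(alpha, beta))
  }
  lp   # terms that do not depend on z are omitted
}

# Move 1 style update (assumes K >= 2): randomly reallocate the members
# of two randomly chosen components; the proposal is symmetric.
move1 <- function(z, x, K) {
  k12 <- sample(1:K, 2)
  members <- which(z %in% k12)
  z_new <- z
  z_new[members] <- sample(k12, length(members), replace = TRUE)
  if (log(runif(1)) < log_collapsed_z(z_new, x, K) - log_collapsed_z(z, x, K))
    z_new else z
}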
The final step of the allocation sampler is to update the number of clusters ($K$) using the Absorption/Ejection move of Nobile and Fearnside (2007), which proposes to increase or decrease $K$ by one:
- Attempt an ejection with probability $p_e$; otherwise an absorption is attempted.
- Suppose that an ejection is attempted. The candidate state contains $K + 1$ components: an ejecting component is selected and a new (ejected) component is created.
- Propose a reallocation of the observations assigned to the ejecting component between itself and the ejected component, according to a reallocation probability drawn from a Beta distribution.
- Accept the candidate state with the corresponding Metropolis-Hastings acceptance probability.
If an absorption is attempted:
- all observations allocated to the absorbed component are reallocated to the absorbing component,
- the candidate state is accepted with the corresponding Metropolis-Hastings acceptance probability.
The parameter governing the ejection proposal corresponds to the ejectionAlpha argument of the package (see Table 1).
The allocation sampler for mixtures of multivariate Bernoulli distributions is summarized in Algorithm 1.
Given an initial state $(z^{(0)}, K^{(0)})$:
For $t = 1, 2, \ldots$, repeat the following steps:
1. (a) Compute the conditional distributions in Equation (14) and update the allocations $z$ accordingly.
   (b) Propose the Metropolis-Hastings moves M1, M2 and M3.
2. Propose an Absorption/Ejection move to update $K$.
Note that in step 1.(a) each allocation is updated conditionally on the current values of the remaining ones. Finally, we mention that after the last step of Algorithm 1 we can also simulate the component-specific parameters $p$ and $\theta$ from their full conditional distributions, given the current values of $z$ and $K$, as sketched below.
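Conditionally on $(z, K)$, conjugacy yields $p \mid z, K \sim \text{Dirichlet}(\gamma_1 + n_1, \ldots, \gamma_K + n_K)$ and $\theta_{kj} \mid x, z, K \sim \text{Beta}(\alpha + s_{kj}, \beta + n_k - s_{kj})$. A minimal base-R sketch of this step, assuming complete data (the function and variable names are ours, not the package's):

simulate_parameters <- function(x, z, K, alpha = 1, beta = 1, gamma = rep(1, K)) {
  d <- ncol(x)
  nk <- tabulate(z, nbins = K)                  # cluster sizes n_k
  g <- rgamma(K, shape = gamma + nk, rate = 1)  # Dirichlet draw via
  p <- g / sum(g)                               # normalized Gamma variables
  skj <- matrix(0, K, d)                        # successes s_kj per cluster/feature
  for (k in 1:K) {
    if (nk[k] > 0) skj[k, ] <- colSums(x[z == k, , drop = FALSE])
  }
  theta <- matrix(rbeta(K * d, alpha + skj, beta + nk - skj), K, d)
  list(p = p, theta = theta)
}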
Label switching (Redner and Walker 1984) is a well known identifiability problem occurring in MCMC outputs of mixture models, arising from the symmetry of the likelihood with respect to permutations of the components' labels. A set of sufficient conditions that lead to label switching, and its consequences under a general framework of missing data models, is given in Papastamoulis and Iliopoulos (2013). If an MCMC sample exhibits label switching, the standard practice of estimating the posterior means and other parametric functions by ergodic averages becomes meaningless. In order to deal with this identifiability problem we have considered two versions of the ECR algorithm (Papastamoulis and Iliopoulos 2010; Papastamoulis 2014; Rodríguez and Walker 2014) as well as the KL algorithm (Stephens 2000b). These algorithms are quite efficient and in most cases exhibit almost identical results, but ECR is significantly faster and computationally more lightweight than KL. The implementation uses the R package label.switching (Papastamoulis 2016).
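For illustration, reordering a raw MCMC sample with ECR outside the package's pipeline takes a few lines. In the sketch below, z.sims (an m x n matrix of sampled allocations), theta.sims (an m x K x d array of sampled success probabilities) and z.map (an n-vector used as pivot, e.g. a high posterior density draw) are hypothetical objects standing in for a run with fixed K:

library("label.switching")
ls <- ecr(zpivot = z.map, z = z.sims, K = K)            # ECR permutations (m x K)
reordered <- permute.mcmc(theta.sims, ls$permutations)  # undo the label switching
theta.reordered <- reordered$output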
Note here that in the case that the most probable number of clusters equals 1, no reordering is necessary and the label switching algorithms are not applied.
There are various strategies for improving MCMC sampling, see e.g. chapter 6 in Gilks et al. (1996). In this study, the Metropolis-coupled MCMC (MC$^3$) strategy is adopted: a total of nChains chains are run, where chain $i$ targets the tempered posterior $\pi_i(z, K \mid x) \propto \pi(z, K \mid x)^{h_i}$, with heats $1 = h_1 > h_2 > \cdots > h_{nChains} > 0$. Only the first (cold) chain targets the posterior distribution of interest, while the heated chains ($h_i < 1$) flatten the posterior surface and can escape from minor modes more easily. Let $(z_i, K_i)$ and $(z_j, K_j)$ denote the current states of chains $i$ and $j$; a proposed swap of the two states is accepted with probability
$$\min\left\{1,\ \frac{\pi_i(z_j, K_j \mid x)\,\pi_j(z_i, K_i \mid x)}{\pi_i(z_i, K_i \mid x)\,\pi_j(z_j, K_j \mid x)}\right\}.$$
In order to take full advantage of the computing power of modern-day computers, our implementation runs the heated chains in parallel using the foreach and doParallel packages, attempting a swap between a randomly chosen pair of chains at the end of each cycle.
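The swap step itself is simple. The following sketch (our own illustration with hypothetical names, not the package's internal code) shows the acceptance decision for a proposed swap between chains i and j, where lp_i and lp_j denote the log-posterior values (at temperature 1) of the current states and h_i, h_j the corresponding heats:

# log acceptance ratio of an MC^3 swap: (h_i - h_j) * (lp_j - lp_i)
swap_accepted <- function(lp_i, lp_j, h_i, h_j) {
  log(runif(1)) < (h_i - h_j) * (lp_j - lp_i)
}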
The main function of the BayesBinMix package is coupledMetropolis(), with its arguments shown in Table 1. This function takes as input a binary data array (possibly containing missing values) and runs the allocation sampler for a series of heated chains which run in parallel while swaps between pairs of chains are proposed. In the case that the most probable number of mixture components is larger than 1, the label switching algorithms are applied.
| Argument | Description |
|---|---|
| Kmax | Maximum number of clusters (integer, at least equal to two). |
| nChains | Number of parallel (heated) chains. |
| heats | nChains-dimensional vector specifying the temperature of each chain: the first entry should always be equal to 1 and the remaining entries should lie in the interval (0, 1). |
| binaryData | The observed binary data (array). Missing values are allowed as long as the corresponding entries are denoted as NA. |
| outPrefix | The name of the produced output folder. An error is thrown if the directory exists. |
| ClusterPrior | Character string specifying the prior distribution of the number of clusters. Available options: poisson or uniform. It defaults to the (truncated) Poisson distribution. |
| m | The number of MCMC cycles. At the end of each cycle a swap between a pair of heated chains is attempted. Each cycle consists of 10 iterations. |
| alpha | First shape parameter of the Beta prior distribution (strictly positive). Defaults to 1. |
| beta | Second shape parameter of the Beta prior distribution (strictly positive). Defaults to 1. |
| gamma | Kmax-dimensional vector (positive) corresponding to the parameters of the Dirichlet prior of the mixture weights. Default value: rep(1, Kmax). |
| z.true | An optional vector of cluster assignments considered as the ground-truth clustering of the observations. It is only used to obtain a final permutation of the labels (after the label switching algorithms) in order to maximise the similarity between the resulting estimates and the real cluster assignments. Useful for simulations. |
| ejectionAlpha | Probability of ejecting an empty component. Defaults to 0.2. |
| burn | Optional integer denoting the number of MCMC cycles that will be discarded as burn-in period. |

Table 1: Arguments of the coupledMetropolis() function.
As the function runs it prints some basic information on the screen, such as the progress of the sampler as well as the acceptance rate of proposed swaps between chains. The output which is returned to the user mainly consists of "mcmc" objects, a class imported from the coda package (Plummer et al. 2006). More specifically, the coupledMetropolis() function returns the objects detailed in Table 2. We note that this is just a subset of the full output of the sampler, which consists of several additional quantities, such as the raw MCMC values corresponding to the whole set of generated values of the number of clusters; these are written to the folder specified by outPrefix.
| Object | Description |
|---|---|
| K.mcmc | Object of class "mcmc" (see coda package) containing the simulated values (after burn-in) of the number of clusters for the cold chain. |
| parameters.ecr.mcmc | Object of class "mcmc" containing the simulated values (after burn-in) of the success probabilities and mixture weights, conditionally on the most probable number of clusters, reordered according to the ECR algorithm. |
| allocations.ecr.mcmc | Object of class "mcmc" containing the simulated values (after burn-in) of the allocation vectors, conditionally on the most probable number of clusters, reordered according to the ECR algorithm. |
| classificationProbabilities.ecr | Data frame of the reordered classification probabilities per observation after reordering the most probable number of clusters with the ECR algorithm. |
| clusterMembershipPerMethod | Data frame of the most probable allocation of each observation after reordering the MCMC sample which corresponds to the most probable number of clusters according to ECR, STEPHENS and ECR-ITERATIVE-1 methods. |
| K.allChains | m x nChains matrix containing the simulated values of the number of clusters per chain. |
| chainInfo | Number of parallel chains, cycles, burn-in period and acceptance rate of swap moves. |

Table 2: Main output returned to the user by the coupledMetropolis() function.
In this section the usage of the BayesBinMix package is described and various benchmarks are presented. First, we demonstrate a typical implementation on a single simulated dataset and inspect the simulated parameter values and estimates. Then we perform an extensive study on the number of estimated clusters and compare our findings to the FlexMix package (Leisch 2004; Grün and Leisch 2007, 2008). An application to a real dataset is provided next.
At first, a single simulated dataset is used in order to give a brief overview of the implementation. We simulated n = 200 observations with d = 100 binary features from a mixture of K = 6 components and stored them in the array x, after introducing missing values at random: x contains a total of 1038 missing values corresponding to 34 rows. We will run nChains = 4 parallel chains with temperatures equal to (1, 0.8, 0.6, 0.4) and call the coupledMetropolis() function as follows.
> library('BayesBinMix')
> nChains <- 4
> heats <- seq(1, 0.4, length = nChains)
# using the truncated Poisson prior distribution on the number of clusters
> cm1 <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats,
binaryData = x, outPrefix = "bbm-poisson", ClusterPrior = "poisson",
m = 1100, z.true = z.true, burn = 100)
# using the uniform prior distribution on the number of clusters
> cm2 <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats,
binaryData = x, outPrefix = "bbm-uniform", ClusterPrior = "uniform",
m = 1100, z.true = z.true, burn = 100)
Note that we have called the function twice, using either the truncated Poisson or the uniform prior on the set {1, ..., 20}. In both cases we supplied the argument z.true, which contains the true allocation of each observation. It is only used for making the inferred clusters agree with the labelling of the true values and it has no impact on the MCMC or label switching algorithms.
In this section we illustrate summaries of the basic output returned to
the user, using only the run which corresponds to the Poisson prior
distribution (cm1
). The print()
method of the package returns a
basic summary of the fitted model:
> print(cm1)
* Run information:
Number of parallel heated chains: 4
Swap acceptance rate: 63.5%
Total number of iterations: 11000
Burn-in period: 1000
Thinning: 10.
* Estimated posterior distribution of the number of clusters:
6 7 8
0.971 0.026 0.003
* Most probable model: K = 6 with P(K = 6|data) = 0.971
* Estimated number of observations per cluster conditionally on K = 6 (3 label switching
algorithms):
STEPHENS ECR ECR.ITERATIVE.1
1 50 50 50
2 46 46 46
3 30 30 30
4 36 36 36
5 12 12 12
6 26 26 26
* Posterior mean of probability of success per feature and cluster (ECR algorithm):
cluster_1 cluster_2 cluster_3 cluster_4 cluster_5 cluster_6
theta_1 0.33364058 0.8465393 0.7023264 0.3340989 0.08364937 0.8933767
theta_2 0.71919239 0.6653526 0.3227822 0.3982836 0.22369486 0.5936094
theta_3 0.49869339 0.2285653 0.3605507 0.3570447 0.07206039 0.1883581
theta_4 0.22360156 0.9148123 0.3359406 0.7889224 0.15476900 0.5924109
theta_5 0.01867034 0.8296381 0.8107050 0.1121773 0.78051586 0.1442368
<+ 95 more rows>
Next we present summaries of the marginal posterior distributions of the (reordered) MCMC sample of parameters, conditionally on the selected number of clusters. The reordered MCMC sample is returned to the user as the "mcmc" object parameters.ecr.mcmc. Hence we can use the summary() method of the coda package, which prints empirical means, standard deviations, as well as the quantiles for each variable. This is done with the following command.
> summary(cm1$parameters.ecr.mcmc)
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
theta.1.1 0.33364 0.06504 0.0020874 0.0020874
... ... ... ... ...
theta.6.1 0.89338 0.05552 0.0017816 0.0017816
<+ 99 blocks of 6 rows>
p.1 0.24663 0.02869 0.0009208 0.0009208
... ... ... ... ...
p.6 0.13270 0.02276 0.0007304 0.0007304
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
theta.1.1 0.2115064 0.289971 0.32896 0.37502 0.46542
... ... ... ... ... ...
theta.6.1 0.7676282 0.859599 0.90175 0.93405 0.97556
<+ 99 blocks of 6 rows>
p.1 0.1950401 0.225938 0.24436 0.26611 0.31012
... ... ... ... ... ...
p.6 0.0905194 0.117203 0.13206 0.14795 0.17993
The user can also visualize the output with a trace of the sampled values and a density estimate for each variable in the chain, using the plot() method of the coda package. For illustration, the following example plots the trace and histogram of theta.2.1 and p.2.
mat <- matrix(c(1:4), byrow = TRUE, ncol = 2)
layout(mat, widths = rep(c(2, 1), 2), heights = rep(1, 4))
mcmcSubset <- cm1$parameters.ecr.mcmc[ , c("theta.2.1", "p.2")]
plot(mcmcSubset, auto.layout = FALSE, ask = FALSE, col = "gray40")
The reader is also referred to the coda package, which provides various other functions for calculating and plotting MCMC diagnostics.
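For instance, assuming the cm1 object from the run above, effective sample sizes and the Geweke convergence diagnostic are directly available (a sketch; any coda diagnostic applies):

library("coda")
effectiveSize(cm1$parameters.ecr.mcmc)   # effective sample size per variable
geweke.diag(cm1$K.mcmc)                  # Geweke diagnostic for the cold chain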
Figures 4.(a) and 4.(b) illustrate the sampled values of the number of clusters per chain, which are returned to the user as the m x nChains array named K.allChains. The actual posterior distribution corresponds to the blue line. Note that as the temperature increases the posterior distribution of the number of clusters becomes flatter.
Figure 4: (a), (b) sampled values of the number of clusters per chain; (c), (d) raw MCMC values of the mixture weights conditionally on the most probable number of clusters; (e), (f) the same values reordered according to the ECR algorithm; (g), (h) posterior mean estimates versus the true parameter values.
Next we inspect the MCMC output conditionally on the event that the number of clusters equals its most probable value, K = 6. The corresponding raw (i.e., not reordered) MCMC sample is saved in the file rawMCMC.mapK.6.txt in the folder specified by the outPrefix argument. For illustration we plot the raw values of the mixture weights. As shown in Figures 4.(c) and 4.(d), the sample is mixing very well between the symmetric posterior areas, since the labels change in every iteration. The corresponding reordered values (according to the ECR algorithm) are returned to the user as the "mcmc" object named parameters.ecr.mcmc, shown in Figures 4.(e) and 4.(f). Note that the high posterior density areas are quite close to the true values of the relative frequencies of generated observations per cluster (indicated by horizontal lines). Finally, Figures 4.(g) and 4.(h) display the posterior mean estimates (arising from the reordered MCMC sample) versus the true values of the success probabilities.
Next we deal with model selection issues, that is, selecting the appropriate number of clusters. For this reason we compare BayesBinMix with the EM-algorithm implementation provided in FlexMix. Under a frequentist framework, the selection of the number of mixture components is feasible using penalized likelihood criteria, such as the BIC (Schwarz 1978) or ICL (Biernacki et al. 2000), after fitting a mixture model for each possible value of K. Both methods are applied as follows.
> library("BayesBinMix")
> library("flexmix")
> nChains <- 8
> heats <- seq(1, 0.4, length = nChains)
> cm <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats, binaryData = x,
outPrefix = "sampler", ClusterPrior = "poisson", m = 330, burn = 30)
# now run flexmix for binary data clustering
> ex <- initFlexmix(x ~ 1, k = 1:20, model = FLXMCmvbinary(),
control = list(minprior = 0), nrep = 10)
Note that for both algorithms the number of clusters varies in the set {1, ..., 20}, using nrep = 10 different starting points in FlexMix. Here we used a total of only m = 330 MCMC cycles in order to show that reliable estimates can be obtained using a small number of iterations. Figure 5 displays the most probable number of mixture components estimated by BayesBinMix and the selected number of clusters using FlexMix, for each possible value of the true number of clusters used to simulate the data. Observe that when the number of clusters is less than 5, both methods are able to estimate the true number of mixture components. However, FlexMix tends to underestimate the number of clusters when the true number is 5 or larger.
We consider the zoo database available at the UC Irvine Machine Learning Repository (Lichman 2013). The database contains 101 animals, each of which has 15 boolean attributes and 1 discrete attribute (legs). The partition of animals into a total of 7 classes (mammal, bird, reptile, fish, amphibian, insect and invertebrate) can be considered as the ground-truth clustering of the data, provided in the vector z.ground_truth. Following Li (2005), the discrete variable legs is transformed into six binary features, which correspond to 0, 2, 4, 5, 6 and 8 legs, respectively. Also we eliminate one of the two entries corresponding to frog, as suggested by Li (2005). In total we consider a 100 x 21 binary array x as the input data.
Recall that the Bernoulli mixture in Equation (1) assumes that each cluster consists of a product of independent Bernoulli distributions. Here this assumption is not valid, due to the fact that the six new binary variables arising from legs are not independent: exactly one of them equals 1, so they always sum to one. Hence, it is interesting to examine the behaviour of the method when the independence assumption is violated.
We test our method considering both prior assumptions on the number of clusters, as well as different hyper-parameters for the prior distribution of the Bernoulli parameters: the uniform Beta(1, 1) prior and the Jeffreys Beta(0.5, 0.5) prior. This results in a total of four runs, shown in the following code.
# read data
> xOriginal <- read.table("zoo.data", sep = ",")
> x <- xOriginal[ , -c(1, 14, 18)]
> x <- x[-27, ] # delete 2nd frog
# now transform v14 into six binary variables
> v14 <- xOriginal[-27, 14]
> newV14 <- array(data = 0, dim = c(100, 6))
> for(i in 1:100){
+ if( v14[i] == 0 ){ newV14[i,1] = 1 }
+ if( v14[i] == 2 ){ newV14[i,2] = 1 }
+ if( v14[i] == 4 ){ newV14[i,3] = 1 }
+ if( v14[i] == 5 ){ newV14[i,4] = 1 }
+ if( v14[i] == 6 ){ newV14[i,5] = 1 }
+ if( v14[i] == 8 ){ newV14[i,6] = 1 }
+ }
> x <- as.matrix(cbind(x, newV14))
# apply BayesBinMix using 8 heated chains
> library("BayesBinMix")
> nChains <- 8
> heats <- seq(1, 0.6, length = nChains)
# K ~ P{1,...,20}, theta_{kj} ~ Beta(1, 1)
> c1 <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats, binaryData = x,
+ outPrefix = "poisson-uniform", ClusterPrior = "poisson",
+ m = 4400, burn = 400, z.true = z.ground_truth)
# K ~ U{1,...,20}, theta_{kj} ~ Beta(1, 1)
> c2 <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats, binaryData = x,
+ outPrefix = "uniform-uniform", ClusterPrior = "uniform",
+ m = 4400, burn = 400, z.true = z.ground_truth)
# K ~ P{1,...,20}, theta_{kj} ~ Beta(0.5, 0.5)
> c3 <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats, binaryData = x,
+ outPrefix = "poisson-jeffreys", ClusterPrior = "poisson",
+ m = 4400, burn = 400, z.true = z.ground_truth)
# K ~ U{1,...,20}, theta_{kj} ~ Beta(0.5, 0.5)
> c4 <- coupledMetropolis(Kmax = 20, nChains = nChains, heats = heats, binaryData = x,
+ outPrefix = "uniform-jeffreys", ClusterPrior = "uniform",
+ m = 4400, burn = 400, z.true = z.ground_truth)
Next, we compare the estimated clusters (conditionally on the most probable number of clusters per run) against the ground-truth classification z.ground_truth. For this reason we provide the Rand index (adjusted or not) based on the confusion matrix between the estimated and ground-truth clusters, using the package flexclust (Leisch 2006).
> library("flexclust")
> z <- array(data = NA, dim = c(100, 4))
> z[ , 1] <- c1$clusterMembershipPerMethod$ECR
> z[ , 2] <- c2$clusterMembershipPerMethod$ECR
> z[ , 3] <- c3$clusterMembershipPerMethod$ECR
> z[ , 4] <- c4$clusterMembershipPerMethod$ECR
> rand.index <- array(data = NA, dim = c(4, 3))
> rownames(rand.index) <- c("poisson-uniform", "uniform-uniform",
"poisson-jeffreys", "uniform-jeffreys")
> colnames(rand.index) <- c("K_map", "rand_index", "adjusted_rand_index")
> findMode <- function(x){ as.numeric( names(sort(-table(x$K.mcmc)))[1] ) }
> rand.index[ , 1] <- c( findMode(c1), findMode(c2), findMode(c3), findMode(c4) )
> for(i in 1:4){
+ rand.index[i, 2] <- randIndex(table(z[ , i], z.ground_truth), correct = FALSE)
+ rand.index[i, 3] <- randIndex(table(z[ , i], z.ground_truth))
+ }
> rand.index
K_map rand_index adjusted_rand_index
poisson-uniform 4 0.9230303 0.7959666
uniform-uniform 5 0.9408081 0.8389208
poisson-jeffreys 6 0.9505051 0.8621216
uniform-jeffreys 7 0.9490909 0.8525556
Note that both Rand indices (raw and adjusted) are larger for z[, 3], that is, the six-component mixture model that corresponds to the Poisson prior on the number of clusters combined with the Jeffreys prior on the Bernoulli parameters.

Finally, we benchmarked the wall clock time of the coupledMetropolis() function when running nChains = 4 heated chains on the same number of parallel threads for m = 1100 cycles, with each cycle consisting of 10 MCMC iterations and at most Kmax = 20 clusters allowed; the resulting run times for a range of sample sizes are displayed in Figure 8.

The BayesBinMix package for fitting mixtures of Bernoulli distributions with an unknown number of components has been presented. The pipeline consists of a fully Bayesian treatment for the clustering of multivariate binary data: it allows the joint estimation of the number of clusters and model parameters, deals with identifiability issues and produces a rapidly mixing chain. Using a simulation study we concluded that the method outperforms the EM algorithm in terms of estimating the number of clusters, while at the same time producing accurate estimates of the underlying model parameters. In the real dataset we explored the flexibility provided by different prior assumptions and concluded that the estimated clusters correspond closely to the natural grouping of the data.
For the prior distribution on the number of clusters, our experience suggests that the truncated Poisson distribution performs better than the uniform (see also Nobile and Fearnside (2007)). Regarding the prior distribution on the Bernoulli parameters, we recommend trying both the uniform distribution (default choice) and the Jeffreys prior, especially when the sample size is small. An important user-defined choice is the number of heated chains which run in parallel, as well as the temperature of each chain. We suggest running at least nChains = 4 heated chains. In the presented examples, temperature vectors of the form heats = seq(1, 0.3, length = nChains) achieved reasonable acceptance ratios of the proposed swaps between pairs of chains; however, we advise trying different values in case the swap acceptance ratio is too small. Finally, we used m = 1100 and burn = 100 for the total number of MCMC cycles and burn-in period, respectively. For these particular values of nChains and m, Figure 8 displays the wall clock time demanded by the coupledMetropolis() function.
Research was funded by MRC award MR/M02010X/1. The authors would like to thank Rebecca Howard and Dr. Lijing Lin (University of Manchester) for using the software and reporting bugs in earlier versions. We also thank an anonymous reviewer and Roger Bivand, Editor of the R Journal, for their valuable comments and suggestions that considerably improved the package and the presentation of our findings.
CRAN packages used: BayesBinMix, label.switching, foreach, doParallel, coda, FlexMix, flexclust
CRAN Task Views implied by cited packages: Bayesian, Cluster, GraphicalModels, HighPerformanceComputing