Working with Multilabel Datasets in R: The mldr Package

Most classification algorithms deal with datasets that have a set of input features, the variables used as predictors, and a single output class, the variable to be predicted. In recent years, however, many scenarios have emerged in which the classifier has to work with several outputs at once. Automatic labeling of text documents, image annotation, and protein classification are among them. Multilabel datasets are the product of these new needs, and they have many specific traits. The mldr package allows the user to load datasets of this kind, obtain their characteristics, produce specialized plots, and manipulate them. The goal is to provide the exploratory tools needed to analyze multilabel datasets, as well as the transformation and manipulation functions that make it possible to apply binary and multiclass classification models to this data or to develop new multilabel classifiers. Thanks to its integrated user interface, the exploratory functions are available even to non-specialized R users.
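
As a rough illustration of the ideas summarized above, the following sketch builds a tiny multilabel data frame, computes two common descriptive measures (label cardinality and label density) in base R, and splits the data into one binary problem per label, the transformation usually called binary relevance. The data, the column names, and the `toy` object are made up for this example. The final guarded block assumes the mldr package is installed and that `mldr_from_dataframe()` and `mldr_transform()` behave as described in the package documentation; it is a sketch under those assumptions, not a reference workflow.

```r
# Toy multilabel data (hypothetical): two input features, three binary labels.
toy <- data.frame(
  f1       = c(0.2, 0.7, 0.5, 0.9),
  f2       = c(1.0, 0.3, 0.8, 0.1),
  sports   = c(1, 0, 1, 0),
  politics = c(0, 1, 1, 0),
  economy  = c(0, 1, 1, 1)
)
label_cols   <- c("sports", "politics", "economy")
feature_cols <- c("f1", "f2")

# Label cardinality: average number of active labels per instance.
cardinality <- mean(rowSums(toy[label_cols]))

# Label density: cardinality divided by the total number of labels.
density <- cardinality / length(label_cols)

# Binary relevance: one single-output dataset per label, each keeping
# all input features plus exactly one label column as the class.
br <- lapply(label_cols, function(lbl) toy[c(feature_cols, lbl)])
names(br) <- label_cols

# Optional: the same steps through mldr, only if the package is available.
# Assumption: label columns are given by position (columns 3 to 5 here).
if (requireNamespace("mldr", quietly = TRUE)) {
  toy_mld <- mldr::mldr_from_dataframe(toy, labelIndices = 3:5, name = "toy")
  print(summary(toy_mld))                             # dataset-level measures
  br_mldr <- mldr::mldr_transform(toy_mld, type = "BR")
}
```

Each element of `br` can then be passed to any ordinary binary classifier, which is what makes this kind of transformation a convenient bridge between multilabel data and existing single-output models.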

Francisco Charte, David Charte

CRAN packages used

RWeka, mldr, shiny, Rcmdr, rattle, XML, circlize, devtools, pROC

CRAN Task Views implied by cited packages

WebTechnologies, MachineLearning, Finance, NaturalLanguageProcessing


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Charte & Charte, "The R Journal: Working with Multilabel Datasets in R: The mldr Package", The R Journal, 2015

BibTeX citation

@article{RJ-2015-027,
  author = {Charte, Francisco and Charte, David},
  title = {The R Journal: Working with Multilabel Datasets in R: The mldr Package},
  journal = {The R Journal},
  year = {2015},
  note = {},
  doi = {10.32614/RJ-2015-027},
  volume = {7},
  issue = {2},
  issn = {2073-4859},
  pages = {149-162}
}