VSURF: An R Package for Variable Selection Using Random Forests

This paper describes the R package VSURF. Based on random forests, it handles both regression and classification problems and returns two subsets of variables. The first is a subset of important variables that retains some redundancy, which can be relevant for interpretation; the second is a smaller subset that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy first ranks the explanatory variables using the random forests permutation-based importance score, then introduces variables with a stepwise forward procedure. Both subsets can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example, and applications to real datasets are presented.
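As a rough illustration of the workflow summarized above, the following minimal sketch runs the whole procedure on the built-in `iris` data. That dataset is used here only for convenience and is not one of the paper's examples. The values of `ntree` and the `nfor.*` arguments are deliberately small so that the run is quick, and they are assumptions rather than recommended settings. The sketch assumes the VSURF package is installed from CRAN.

```r
# Minimal sketch: two-stage variable selection with VSURF on iris.
# Assumes install.packages("VSURF") has already been run.
library(VSURF)

set.seed(2015)  # random forests are stochastic; fix the seed for repeatability

data(iris)
iris.vsurf <- VSURF(
  x = iris[, 1:4],      # explanatory variables
  y = iris[, 5],        # response (a factor, so classification)
  ntree = 100,          # trees per forest (reduced for speed)
  nfor.thres = 20,      # forests used for the preliminary ranking/thresholding
  nfor.interp = 10,     # forests used for the interpretation step
  nfor.pred = 10        # forests used for the prediction step
)

# Column indices of the variables kept at each stage
iris.vsurf$varselect.thres   # after eliminating unimportant variables
iris.vsurf$varselect.interp  # interpretation subset (may keep redundancy)
iris.vsurf$varselect.pred    # smaller prediction subset

summary(iris.vsurf)
```

The indices refer to columns of `x`, so `colnames(iris)[iris.vsurf$varselect.pred]` would give the names of the variables retained for prediction.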

Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot
2015-11-08

CRAN packages used

VSURF, rpart, randomForest, party, ipred, Boruta, varSelRF, spikeSlabGAM, BioMark, mlbench, mixOmics

CRAN Task Views implied by cited packages

MachineLearning, Environmetrics, Survival, ChemPhys, Multivariate, Bayesian, HighPerformanceComputing

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Genuer, et al., "VSURF: An R Package for Variable Selection Using Random Forests", The R Journal, 2015

BibTeX citation

@article{RJ-2015-018,
  author = {Genuer, Robin and Poggi, Jean-Michel and Tuleau-Malot, Christine},
  title = {VSURF: An R Package for Variable Selection Using Random Forests},
  journal = {The R Journal},
  year = {2015},
  note = {https://doi.org/10.32614/RJ-2015-018},
  doi = {10.32614/RJ-2015-018},
  volume = {7},
  issue = {2},
  issn = {2073-4859},
  pages = {19-33}
}