VSURF: An R Package for Variable Selection Using Random Forests
Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot
The R Journal (2015) 7:2, pages 19-33.
Abstract This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables, including some redundancy, which can be relevant for interpretation. The second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance, and proceeds using a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example and its applications to real datasets are presented.
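As a rough illustration of the workflow summarized in the abstract, the following minimal sketch runs the package on its bundled simulated dataset `toys` and reads off the two returned subsets. It assumes the VSURF package is installed from CRAN. The seed value and the object name `fit` are arbitrary choices made here for illustration, and the call relies entirely on the data-driven defaults rather than on any tuning recommended by the authors.

```r
# Minimal sketch: default VSURF run on the simulated `toys` data.
# Assumes install.packages("VSURF") has been run beforehand.
library(VSURF)

data("toys")       # list with predictor matrix toys$x and class labels toys$y
set.seed(2015)     # arbitrary seed, only for reproducibility of this sketch

# Full procedure with default settings; this can take a while.
fit <- VSURF(x = toys$x, y = toys$y)

# Column indices of the variables kept for interpretation
# (important variables, some redundancy allowed).
interp_vars <- fit$varselect.interp

# Column indices of the smaller subset aimed at prediction.
pred_vars <- fit$varselect.pred

print(interp_vars)
print(pred_vars)
```

The indices refer to columns of `toys$x`. Because the prediction subset is built by stepwise introduction of variables from the interpretation subset, it should be contained in it.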
Received: 2014-07-28; online 2015-11-08

@article{RJ-2015-018,
  author = {Robin Genuer and Jean-Michel Poggi and Christine Tuleau-Malot},
  title = {{VSURF: An R Package for Variable Selection Using Random Forests}},
  year = {2015},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2015-018},
  url = {https://doi.org/10.32614/RJ-2015-018},
  pages = {19--33},
  volume = {7},
  number = {2}
}