Identification of informative variables in an information system is often performed using simple one-dimensional filtering procedures that discard information about interactions between variables. Such an approach may remove relevant variables from consideration. Here we present MDFS (MultiDimensional Feature Selection), an R package that identifies informative variables while taking into account synergistic interactions between multiple descriptors and the decision variable. MDFS implements an algorithm based on information theory (Mnich and Rudnicki, 2017). The computational kernel of the package is implemented in C++. A high-performance version implemented in CUDA C is also available. The application of MDFS is demonstrated using the well-known Madelon dataset, in which the decision variable is generated from synergistic interactions between descriptor variables. It is shown that multidimensional analysis yields better sensitivity and a better ranking of variable importance than one-dimensional analysis.
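The point about one-dimensional filters can be illustrated with a toy example. The following is a minimal standalone sketch in Python, not code from the MDFS package: the helper names (entropy, conditional_entropy) and the XOR toy data are assumptions made for illustration. It compares the information a single descriptor carries about the decision on its own with the information it adds once a second descriptor is taken into account.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, keys):
    """Entropy of labels given a discrete conditioning key per object."""
    n = len(labels)
    groups = {}
    for key, label in zip(keys, labels):
        groups.setdefault(key, []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())


# Hypothetical toy data: two binary descriptors, decision y = x1 XOR x2,
# with every (x1, x2) combination repeated 25 times.
rows = [(a, b) for a, b in product((0, 1), repeat=2) for _ in range(25)]
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

# One-dimensional view: information about y carried by each descriptor alone.
ig_x1_alone = entropy(y) - conditional_entropy(y, x1)
ig_x2_alone = entropy(y) - conditional_entropy(y, x2)

# Two-dimensional view: information x1 adds once x2 is already known.
ig_x1_given_x2 = conditional_entropy(y, x2) - conditional_entropy(y, list(zip(x1, x2)))

print(f"x1 alone:    {ig_x1_alone:.3f} bits")
print(f"x2 alone:    {ig_x2_alone:.3f} bits")
print(f"x1 given x2: {ig_x1_given_x2:.3f} bits")
```

In this toy setting each descriptor looks uninformative in isolation, yet the pair determines the decision completely, so a filter that scores variables one at a time would discard both of them.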
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2019-019.zip
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Piliszek, et al., "MDFS: MultiDimensional Feature Selection in R", The R Journal, 2019
BibTeX citation
@article{RJ-2019-019,
  author = {Piliszek, Radosław and Mnich, Krzysztof and Migacz, Szymon and Tabaszewski, Paweł and Sułecki, Andrzej and Polewko-Klim, Aneta and Rudnicki, Witold},
  title = {MDFS: MultiDimensional Feature Selection in R},
  journal = {The R Journal},
  year = {2019},
  note = {https://doi.org/10.32614/RJ-2019-019},
  doi = {10.32614/RJ-2019-019},
  volume = {11},
  issue = {1},
  issn = {2073-4859},
  pages = {198-210}
}