MDFS: MultiDimensional Feature Selection in R
Radosław Piliszek, Krzysztof Mnich, Szymon Migacz, Paweł Tabaszewski, Andrzej Sułecki, Aneta Polewko-Klim and Witold Rudnicki
The R Journal (2019) 11:1, pages 198-210.
Abstract: Identification of informative variables in an information system is often performed using simple one-dimensional filtering procedures that discard information about interactions between variables. Such an approach may remove some relevant variables from consideration. Here we present an R package MDFS (MultiDimensional Feature Selection) that identifies informative variables while taking into account synergistic interactions between multiple descriptors and the decision variable. MDFS is an implementation of an algorithm based on information theory (Mnich and Rudnicki, 2017). The computational kernel of the package is implemented in C++. A high-performance version implemented in CUDA C is also available. The application of MDFS is demonstrated using the well-known Madelon dataset, in which a decision variable is generated from synergistic interactions between descriptor variables. It is shown that multidimensional analysis results in better sensitivity and a better ranking of variable importance.
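The loss of information in one-dimensional filtering can be illustrated with a toy example. The sketch below is not the MDFS algorithm and does not use the package's API; it is a minimal, language-independent illustration written in Python. It assumes two binary descriptors and a decision variable equal to their XOR, a purely synergistic relationship. The helper names `entropy` and `information_gain` are hypothetical and introduced only for this example.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(decision, *descriptors):
    """Mutual information I(Y; X1..Xk) = H(Y) + H(X1..Xk) - H(Y, X1..Xk), in bits."""
    joint_x = list(zip(*descriptors))
    joint_xy = list(zip(decision, *descriptors))
    return entropy(decision) + entropy(joint_x) - entropy(joint_xy)


# Toy data (an assumption for illustration): every combination of two
# binary descriptors, with the decision defined as their XOR.
rows = list(product([0, 1], repeat=2))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

ig_x1 = information_gain(y, x1)        # one-dimensional view of x1
ig_x2 = information_gain(y, x2)        # one-dimensional view of x2
ig_pair = information_gain(y, x1, x2)  # two-dimensional view of (x1, x2)

print(f"IG(y; x1)     = {ig_x1:.3f} bits")
print(f"IG(y; x2)     = {ig_x2:.3f} bits")
print(f"IG(y; x1, x2) = {ig_pair:.3f} bits")
```

In this toy setting each descriptor on its own carries no information about the decision, so a one-dimensional filter would discard both, whereas the pair taken together determines the decision completely.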
Received: 2018-12-01; online: 2019-08-16. Supplementary material (1.3 KiB).

@article{RJ-2019-019,
  author = {Radosław Piliszek and Krzysztof Mnich and Szymon Migacz and Paweł Tabaszewski and Andrzej Sułecki and Aneta Polewko-Klim and Witold Rudnicki},
  title = {{MDFS: MultiDimensional Feature Selection in R}},
  year = {2019},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2019-019},
  url = {https://doi.org/10.32614/RJ-2019-019},
  pages = {198--210},
  volume = {11},
  number = {1}
}