dGAselID: An R Package for Selecting a Variable Number of Features in High Dimensional Data
Nicolae Teodor Melita and Stefan Holban
The R Journal (2017) 9:2, pages 18-34.
Abstract The dGAselID package proposes an original approach to feature selection in high dimensional data. The method is built upon a diploid genetic algorithm. The genotype to phenotype mapping is modeled after Incomplete Dominance Inheritance, which removes the need to define a dominance scheme. Fitness is evaluated by a user-selectable supervised classifier, chosen from a broad range of options. Cross validation options are also available. A new approach to crossover, inspired by the random assortment of chromosomes during meiosis, is included. Several mutation operators, also inspired by genetics, are proposed. The package is fully compatible with the data formats used in Bioconductor and the MLInterfaces package, so it is readily applicable to microarray studies, but it is flexible enough for other feature selection applications on high dimensional data. Several options for visualizing the evolution and the outcomes are implemented to facilitate the interpretation of results. The package’s functionality is illustrated by examples.
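To make the genotype to phenotype idea concrete, here is a minimal sketch in Python. It is not the package's implementation. It assumes each individual carries two homologous 0/1 chromosomes, that a locus where both alleles agree expresses the shared allele, and that a heterozygous locus is expressed with probability 0.5. The function name `express` and the 0.5 rule are illustrative assumptions, not taken from dGAselID.

```python
import random

def express(chrom_a, chrom_b, rng):
    """Map a diploid genotype (two 0/1 chromosomes) to a 0/1 feature mask.

    Assumption for illustration only: heterozygous loci are expressed
    with probability 0.5; the package may resolve them differently.
    """
    if len(chrom_a) != len(chrom_b):
        raise ValueError("homologous chromosomes must have equal length")
    mask = []
    for a, b in zip(chrom_a, chrom_b):
        if a == b:
            # Homozygous locus: the shared allele is expressed as-is.
            mask.append(a)
        else:
            # Heterozygous locus: no dominance scheme, so draw at random.
            mask.append(1 if rng.random() < 0.5 else 0)
    return mask

rng = random.Random(0)  # seeded generator for reproducibility
mask = express([1, 1, 0, 0], [1, 0, 1, 0], rng)
# Indices of the features that this individual selects.
selected = [i for i, bit in enumerate(mask) if bit == 1]
```

Under these assumptions the number of selected features is not fixed in advance: it depends on how many loci end up expressed, which is one way a variable-size feature subset can arise.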
Received: 2016-08-25; online 2017-08-25

@article{RJ-2017-040,
  author = {Nicolae Teodor Melita and Stefan Holban},
  title = {{dGAselID: An R Package for Selecting a Variable Number of Features in High Dimensional Data}},
  year = {2017},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2017-040},
  url = {https://doi.org/10.32614/RJ-2017-040},
  pages = {18--34},
  volume = {9},
  number = {2}
}