The R Journal: accepted article

This article will be copy edited and may be changed before publication.

Finding Optimal Normalizing Transformations via bestNormalize
Ryan A. Peterson

Abstract The bestNormalize R package was designed to help users find a transformation that can effectively normalize a vector regardless of its actual distribution. Each of the many normalization techniques that have been developed has its own strengths and weaknesses, and deciding which to use before the data are fully observed is difficult or impossible. The package facilitates choosing among a range of possible transformations and automatically returns the best one, i.e., the one that makes the data look the most normal. To evaluate and compare normalization efficacy across a suite of possible transformations, we developed a statistic based on a goodness-of-fit test statistic divided by its degrees of freedom. Transformations can be seamlessly trained and applied to newly observed data, and can be implemented in conjunction with caret and recipes for data preprocessing in machine learning workflows. Custom transformations and normalization statistics are supported.
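The workflow described in the abstract can be sketched in a few lines of R. This is a minimal illustration, not the article's own example: the simulated gamma data, the seed, and the object names (x, bn, x_new) are assumptions made here, and the call relies on the package defaults.

```r
# Minimal sketch; assumes the bestNormalize package is installed from CRAN.
library(bestNormalize)

set.seed(100)                           # arbitrary seed (assumption)
x <- rgamma(250, shape = 1, rate = 1)   # simulated right-skewed data (assumption)

# Fit the candidate transformations and keep the one whose
# normality statistic is lowest.
bn <- bestNormalize(x)

bn$chosen_transform   # the selected transformation object
bn$norm_stats         # estimated normality statistic for each candidate

# Apply the trained transformation to the training data and to new data.
x_t <- predict(bn)
x_new <- c(0.5, 1, 2)                   # hypothetical new observations
x_new_t <- predict(bn, newdata = x_new)

# Back-transform to the original scale.
x_back <- predict(bn, newdata = x_t, inverse = TRUE)
max(abs(x - x_back))                    # round-trip error, expected to be near zero
```

Because the selected transformation depends on the data, and on random cross-validation splits when the statistic is estimated out of sample, the chosen transformation can differ between runs unless a seed is set.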

Received: 2020-06-03; online 2021-06-07
CRAN packages: bestNormalize, caret, recipes, MASS, LambertW, nortest, parallel, doRNG, tidymodels, visreg, scales, ggplot2, mgcv, yardstick
CRAN Task Views implied by cited CRAN packages: SocialSciences, Distributions, Econometrics, Environmetrics, HighPerformanceComputing, Multivariate, TeachingStatistics, Bayesian, Graphics, MachineLearning, NumericalMathematics, Phylogenetics, Psychometrics, Robust


CC BY 4.0
This article is licensed under a Creative Commons Attribution 4.0 International license.

@article{RJ-2021-041,
  author = {Ryan A. Peterson},
  title = {{Finding Optimal Normalizing Transformations via
          bestNormalize}},
  year = {2021},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2021-041},
  url = {https://journal.r-project.org/archive/2021/RJ-2021-041/index.html}
}