Nonparametric Independence Tests and k-sample Tests for Large Sample Sizes Using Package HHG

Nonparametric tests of independence and k-sample tests are ubiquitous in modern applications, but they are typically computationally expensive. We present a family of nonparametric tests that are computationally efficient and powerful for detecting any type of dependence between a pair of univariate random variables. The computational complexity of the suggested tests is sub-quadratic in sample size, allowing calculation of test statistics for millions of observations. We survey both the algorithms and the HHG package in which they are implemented, with usage examples showing the proposed tests for both the independence case and the k-sample problem. The tests are compared to existing nonparametric tests via several simulation studies comparing both runtime and power. Special focus is given to the design of the data structures used in the implementation of the tests. These data structures can be useful for developers of nonparametric distribution-free tests.

Barak Brill, Yair Heller, Ruth Heller
2018-05-16

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2018-008.zip

CRAN packages used

Hmisc, infotheo, entropy, minerva, dHSIC, energy, HHG, kernlab, dslice, rbenchmark, doRNG

CRAN Task Views implied by cited packages

Multivariate, Bayesian, ClinicalTrials, Cluster, Econometrics, HighPerformanceComputing, MachineLearning, NaturalLanguageProcessing, OfficialStatistics, Optimization, ReproducibleResearch, SocialSciences

Bioconductor packages used

minet

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Brill, et al., "Nonparametric Independence Tests and k-sample Tests for Large Sample Sizes Using Package HHG", The R Journal, 2018

BibTeX citation

@article{RJ-2018-008,
  author = {Brill, Barak and Heller, Yair and Heller, Ruth},
  title = {Nonparametric Independence Tests and k-sample Tests for Large Sample Sizes Using Package HHG},
  journal = {The R Journal},
  year = {2018},
  note = {https://doi.org/10.32614/RJ-2018-008},
  doi = {10.32614/RJ-2018-008},
  volume = {10},
  number = {1},
  issn = {2073-4859},
  pages = {424--438}
}