The two-sample problem refers to the comparison of two probability distributions via two independent samples. With high-dimensional data, such a comparison is performed along a large number \(p\) of possibly correlated variables or outcomes. In genomics, for instance, the variables may represent gene expression levels at \(p\) locations, recorded for two (usually small) groups of individuals. In this paper we introduce TwoSampleTest.HD, a new R package to test for the equal distribution of the \(p\) outcomes. Specifically, TwoSampleTest.HD implements the tests recently proposed by Cousido-Rocha et al. (2019) for the low sample size, large dimensional setting. These tests take the possible dependence among the \(p\) variables into account, and work for sample sizes as small as two. The tests are based on the distance between the empirical characteristic functions of the two samples, averaged over the \(p\) locations. Several options are available to estimate the variance of the test statistic under dependence. The package also provides individual permutation \(p\)-values, so feature discovery is possible when the null hypothesis of equal distribution is rejected. We illustrate the usage of the package through the analysis of simulated and real data, and compare the results with those of alternative approaches. In particular, we highlight the benefits of the implemented tests relative to ordinary multiple comparison procedures. Practical recommendations are given.
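The idea of averaging a characteristic-function distance over the \(p\) variables, and of attaching a permutation \(p\)-value to each variable, can be illustrated with a minimal sketch. It is not the package's implementation: the Gaussian weight (which gives the distance a closed form), the function names, the `scale` parameter and the simulated data are all assumptions made for illustration, and the variance estimation under dependence used by the actual tests is omitted.

```python
import math
import random


def ecf_distance(x, y, scale=1.0):
    """Weighted L2 distance between the empirical characteristic functions
    of samples x and y. Assumption: the weight is a N(0, scale^2) density,
    so the integral reduces to averages of a Gaussian kernel."""
    def mean_kernel(a, b):
        total = sum(math.exp(-0.5 * (scale * (u - v)) ** 2) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


def averaged_statistic(X, Y, scale=1.0):
    """Average the per-variable distances over the p variables.
    X and Y are lists of p variables, each a list of observations."""
    return sum(ecf_distance(x, y, scale) for x, y in zip(X, Y)) / len(X)


def permutation_p_value(x, y, n_perm=200, scale=1.0, rng=None):
    """Permutation p-value for one variable: reshuffle the pooled
    observations and count distances at least as large as the observed one."""
    rng = rng or random.Random(0)
    observed = ecf_distance(x, y, scale)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ecf_distance(pooled[:n], pooled[n:], scale) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: p = 5 variables, 4 observations per group,
# with only the first variable shifted in the second group.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
Y = [[rng.gauss(3 if j == 0 else 0, 1) for _ in range(4)] for j in range(5)]
print(averaged_statistic(X, Y))
print([permutation_p_value(x, y) for x, y in zip(X, Y)])
```

With very small groups the number of distinct relabelings is limited, so the individual permutation \(p\)-values in this sketch are coarse; the global tests described above are what make inference feasible in that setting.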
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2023-063.zip
Equalden.HD: An R package for testing the equality of a high dimensional set of densities. Computer Methods and Programs in Biomedicine, 217: 106694, 2022. DOI https://doi.org/10.1016/j.cmpb.2022.106694.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Cousido-Rocha & Uña-Álvarez, "TwoSampleTest.HD: An R Package for the Two-Sample Problem with High-Dimensional Data", The R Journal, 2023
BibTeX citation
@article{RJ-2023-063,
  author = {Cousido-Rocha, Marta and Uña-Álvarez, Jacobo de},
  title = {TwoSampleTest.HD: An R Package for the Two-Sample Problem with High-Dimensional Data},
  journal = {The R Journal},
  year = {2023},
  note = {https://doi.org/10.32614/RJ-2023-063},
  doi = {10.32614/RJ-2023-063},
  volume = {15},
  issue = {3},
  issn = {2073-4859},
  pages = {79-92}
}