Protein structure data consist of several dihedral angles, lying on a multidimensional torus. Analyzing such data has long been key to understanding the functional properties of proteins. However, most existing statistical methods assume that data lie in Euclidean space and are therefore ill-suited to angular data. In this paper, we introduce the package ClusTorus, which specializes in analyzing multivariate angular data. The package provides tools and routines for algorithmic clustering and model-based clustering of data on the torus. In particular, it enables the construction of conformal prediction sets and predictive clustering, based on kernel density estimates and mixture model estimates. A novel hyperparameter selection strategy for predictive clustering is also implemented, with improved stability and computational efficiency. We demonstrate the use of the package in clustering protein dihedral angles from two real data sets.
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2022-032.zip.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hong & Jung, "ClusTorus: An R Package for Prediction and Clustering on the Torus by Conformal Prediction", The R Journal, 2022
BibTeX citation
@article{RJ-2022-032,
  author = {Hong, Seungki and Jung, Sungkyu},
  title = {ClusTorus: An R Package for Prediction and Clustering on the Torus by Conformal Prediction},
  journal = {The R Journal},
  year = {2022},
  note = {https://doi.org/10.32614/RJ-2022-032},
  doi = {10.32614/RJ-2022-032},
  volume = {14},
  issue = {2},
  issn = {2073-4859},
  pages = {186-207}
}