Mass spectrometry (MS) is a powerful tool for measuring biomolecules, but the data produced is often difficult to handle computationally because it is stored as a ragged array. In R, this format is typically encoded in complex S4 objects built around environments, requiring an extensive background in R to perform even simple tasks. However, the adoption of tidy data (Wickham 2014) provides an alternative data structure that is highly intuitive and works neatly with base R functions and common packages, as well as other programming languages. Here, we discuss the current state of R-based MS data processing and the convenience and challenges of integrating tidy data techniques into it, and we present RaMS, a package that produces tidy representations of MS data.
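To make the contrast between a ragged array and a tidy table concrete, the following is a minimal base R sketch. The toy values and the names `scans`, `rt`, `mz`, and `int` are assumptions chosen for illustration; this is not RaMS output or the package's implementation.

```r
# Hypothetical toy data: three scans, each holding a different number of peaks.
# All names and values here are illustrative assumptions.
scans <- list(
  list(rt = 1.0, mz = c(100.1, 150.2),        int = c(500, 1200)),
  list(rt = 2.0, mz = c(100.1, 150.2, 200.3), int = c(450, 1100, 300)),
  list(rt = 3.0, mz = c(150.2),               int = c(900))
)

# Ragged: the number of peaks differs from scan to scan,
# so the data cannot be stored as a simple rectangular matrix.
peaks_per_scan <- lengths(lapply(scans, `[[`, "mz"))

# Tidy: one row per observation (retention time, m/z, intensity).
# The scalar rt is recycled to match the length of mz and int.
tidy <- do.call(rbind, lapply(scans, function(s) {
  data.frame(rt = s$rt, mz = s$mz, int = s$int)
}))

# Base R subsetting now works directly, e.g. pulling out the
# intensity over time for ions near m/z 150.2.
chrom <- tidy[abs(tidy$mz - 150.2) < 0.01, ]
```

Once the data is in this long form, tasks such as filtering by m/z or plotting intensity against retention time need only ordinary data frame operations rather than accessor methods on nested objects.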
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2022-050.zip
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kumler & Ingalls, "The R Journal: Tidy Data Neatly Resolves Mass-Spectrometry's Ragged Arrays", The R Journal, 2022
BibTeX citation
@article{RJ-2022-050,
  author = {Kumler, William and Ingalls, Anitra E.},
  title = {The R Journal: Tidy Data Neatly Resolves Mass-Spectrometry's Ragged Arrays},
  journal = {The R Journal},
  year = {2022},
  note = {https://doi.org/10.32614/RJ-2022-050},
  doi = {10.32614/RJ-2022-050},
  volume = {14},
  issue = {3},
  issn = {2073-4859},
  pages = {193-202}
}