Tidy Data Neatly Resolves Mass-Spectrometry’s Ragged Arrays

Mass spectrometry (MS) is a powerful tool for measuring biomolecules, but the data produced is often difficult to handle computationally because it is stored as a ragged array. In R, this format is typically encoded in complex S4 objects built around environments, requiring an extensive background in R to perform even simple tasks. However, tidy data (Wickham 2014) provides an alternative data structure that is highly intuitive and works neatly with base R functions and common packages, as well as with other programming languages. Here, we discuss the current state of R-based MS data processing, examine the convenience and challenges of integrating tidy data techniques into MS data processing, and present RaMS, a package that produces tidy representations of MS data.
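To make the contrast between a ragged array and a tidy table concrete, the following is a minimal sketch in plain Python, chosen only because the tidy layout is language-agnostic. The scan values, the field names (`rt`, `mz`, `intensity`), and the helper `tidy` are hypothetical illustrations, not the RaMS API or its output format.

```python
# Hypothetical toy data: three scans, each with a different number of peaks.
# This is the "ragged" shape: one retention time per scan, but m/z and
# intensity vectors whose lengths vary from scan to scan.
ragged_scans = [
    {"rt": 1.0, "mz": [100.0, 118.1, 150.2], "intensity": [200.0, 5000.0, 310.0]},
    {"rt": 2.0, "mz": [118.1], "intensity": [7200.0]},
    {"rt": 3.0, "mz": [100.0, 118.1], "intensity": [180.0, 4100.0]},
]


def tidy(scans):
    """Flatten ragged scans into one row per (rt, mz, intensity) observation."""
    return [
        {"rt": scan["rt"], "mz": mz, "intensity": inten}
        for scan in scans
        for mz, inten in zip(scan["mz"], scan["intensity"])
    ]


tidy_rows = tidy(ragged_scans)

# With one observation per row, a chromatogram for a single mass is
# just a row filter rather than a loop over nested objects.
target_mz, tolerance = 118.1, 0.01
chromatogram = [
    (row["rt"], row["intensity"])
    for row in tidy_rows
    if abs(row["mz"] - target_mz) <= tolerance
]
print(chromatogram)
```

The trade-off in this layout is that the retention time is repeated on every row of a scan, which costs memory but lets ordinary filtering, grouping, and plotting tools operate on the data directly.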

William Kumler (University of Washington School of Oceanography), Anitra E. Ingalls (University of Washington School of Oceanography)
2022-12-20

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2022-050.zip

References

H. Wickham. Tidy data. Journal of Statistical Software, 59(10): 1–23, 2014. URL https://doi.org/10.18637/jss.v059.i10.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kumler & Ingalls, "Tidy Data Neatly Resolves Mass-Spectrometry's Ragged Arrays", The R Journal, 2022

BibTeX citation

@article{RJ-2022-050,
  author = {Kumler, William and Ingalls, Anitra E.},
  title = {Tidy Data Neatly Resolves Mass-Spectrometry's Ragged Arrays},
  journal = {The R Journal},
  year = {2022},
  note = {https://doi.org/10.32614/RJ-2022-050},
  doi = {10.32614/RJ-2022-050},
  volume = {14},
  issue = {3},
  issn = {2073-4859},
  pages = {193-202}
}