Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R

In recent years, the cost of DNA sequencing has decreased at a rate that has outpaced improvements in memory capacity. It is now common to collect or have access to many gigabytes of biological sequences. This has created an urgent need for approaches that analyze sequences in subsets without requiring all of the sequences to be loaded into memory at one time. It has also opened opportunities to improve the organization and accessibility of information acquired in sequencing projects. The DECIPHER package offers solutions to these problems by assisting in the curation of large sets of biological sequences stored in compressed format inside a database. This approach has many practical advantages over standard bioinformatics workflows and enables large analyses that would otherwise be prohibitively time-consuming.

Erik S. Wright
2016-05-01

CRAN packages used

RSQLite

CRAN Task Views implied by cited packages

Databases

Bioconductor packages used

Biostrings, DECIPHER

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Wright, "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R", The R Journal, 2016

BibTeX citation

@article{RJ-2016-025,
  author = {Wright, Erik S.},
  title = {Using {DECIPHER} v2.0 to Analyze Big Biological Sequence Data in {R}},
  journal = {The R Journal},
  year = {2016},
  note = {https://doi.org/10.32614/RJ-2016-025},
  doi = {10.32614/RJ-2016-025},
  volume = {8},
  number = {1},
  issn = {2073-4859},
  pages = {352--359}
}