Crumble: reference free lossy compression of sequence quality values
Abstract
Motivation: The bulk of the space taken up by NGS CRAM files consists of per-base quality values. Most of these are unnecessary for variant calling, offering an opportunity for space saving.
Results: On the Syndip test set, a 17-fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2- to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details).
Availability and implementation: Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble.
Supplementary information: Supplementary data are available at Bioinformatics online.
Keywords
Data Compression, High-Throughput Nucleotide Sequencing
Journal Title
Bioinformatics
Journal ISSN
1367-4803
1367-4811
Volume Title
35
Publisher
Oxford University Press
Sponsorship
Wellcome Trust (unknown)
This work was funded by the Wellcome Trust [WT098051].