Crumble: reference free lossy compression of sequence quality values

Published version
Peer-reviewed

Type

Article

Authors

Bonfield, James K 
McCarthy, Shane A 

Abstract

Motivation: Per-base quality values account for most of the space in CRAM files of next-generation sequencing (NGS) data. Most of these values are unnecessary for variant calling, which offers an opportunity to save space. Results: On the Syndip test set, the quality storage portion of a CRAM file can be reduced 17-fold while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2-fold to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details). Availability and implementation: Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble. Supplementary information: Supplementary data are available at Bioinformatics online.

Keywords

Data Compression, High-Throughput Nucleotide Sequencing

Journal Title

Bioinformatics

Journal ISSN

1367-4803
1367-4811

Volume

35

Publisher

Oxford University Press

Sponsorship

This work was funded by the Wellcome Trust [WT098051].