Systematic bias in high-throughput sequencing data and its correction by BEADS.

Cheung, Ming-Sin; Down, Thomas A; Latorre, Isabel; Ahringer, Julie

Published version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/279749

Repository DOI

https://doi.org/10.17863/CAM.27119

Files

Published version (4.74 MB)

Type

Article

Authors

Cheung, Ming-Sin

Down, Thomas A

Latorre, Isabel

Ahringer, Julie

https://orcid.org/0000-0002-7074-4051

Abstract

Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant yet unexpected enrichment at promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyzer platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on an input control as a normalizer is not generally appropriate because of sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
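The abstract names GC content as one source of bias but does not describe the individual correction steps. As a purely illustrative sketch, and not the BEADS implementation, the following Python shows one common way GC bias can be reduced: each read is weighted by the ratio of the expected to the observed frequency of its GC-content bin. The function names (`gc_bin`, `gc_weights`), the binning scheme, and the use of a set of background sequences as the expectation are all assumptions made for this example.

```python
from collections import Counter


def gc_bin(seq, bins=10):
    """Assign a DNA sequence to a GC-content bin in the range 0..bins-1."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    # Integer binning; a fully GC sequence falls into the top bin.
    return min(bins - 1, int(bins * gc / len(seq)))


def gc_weights(read_seqs, background_seqs, bins=10):
    """Per-bin weights: expected fraction (background) / observed fraction (reads).

    Hypothetical helper for illustration only. Bins that are over-represented
    among the reads get a weight below 1, under-represented bins a weight above 1.
    """
    observed = Counter(gc_bin(s, bins) for s in read_seqs)
    expected = Counter(gc_bin(s, bins) for s in background_seqs)
    n_obs, n_exp = len(read_seqs), len(background_seqs)
    weights = {}
    for b, count in observed.items():
        expected_frac = expected.get(b, 0) / n_exp
        observed_frac = count / n_obs
        weights[b] = expected_frac / observed_frac
    return weights


# Toy data: GC-rich reads are over-represented relative to the background.
reads = ["GGCC", "GGCC", "GGCC", "ATAT"]
background = ["GGCC", "ATAT", "GGCC", "ATAT"]
weights = gc_weights(reads, background, bins=4)
weighted_total = sum(weights[gc_bin(r, 4)] for r in reads)
```

In this toy case the GC-rich bin is down-weighted and the AT-rich bin is up-weighted, so the two bins contribute equally after weighting while the total weighted read count is unchanged. A real pipeline would also need to handle mappability and regional effects, which this sketch does not attempt.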

Keywords

Algorithms; Animals; Base Composition; Caenorhabditis elegans; Chromatin Immunoprecipitation; DNA, Helminth; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA

Journal Title

Nucleic Acids Res

Journal ISSN

0305-1048
1362-4962

Volume Title

39

Publisher

Oxford University Press (OUP)

Publisher DOI

https://doi.org/10.1093/nar/gkr425

Rights

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Sponsorship

Wellcome Trust (054523/Z/98/C)

Collections

Cambridge University Research Outputs