Systematic bias in high-throughput sequencing data and its correction by BEADS.
Publication Date
2011-08
Journal Title
Nucleic Acids Res
ISSN
0305-1048
Publisher
Oxford University Press (OUP)
Volume
39
Issue
15
Pages
e103
Language
eng
Type
Article
Physical Medium
Print-Electronic
Citation
Cheung, M., Down, T., Latorre, I., & Ahringer, J. (2011). Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res, 39 (15), e103. https://doi.org/10.1093/nar/gkr425
Abstract
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected, enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyzer platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
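The abstract names three bias sources but does not spell out the individual steps, so the following is only a minimal sketch of what a three-step correction of this kind could look like, assuming one step per bias source. It is not the published BEADS implementation. All function names, the binned data layout, the GC classes, and the 0.25 mappability cutoff are hypothetical choices made for illustration, and step 3 assumes a control track that has already been through steps 1 and 2.

```python
from collections import Counter


def gc_corrected_coverage(reads, genome_gc, n_bins):
    """Step 1 (assumed): weight each read by expected / observed GC frequency.

    reads: list of (bin_index, gc_class) pairs, one per mapped read.
    genome_gc: list of gc_class values sampled from the genome background.
    Returns the summed read weights per genomic bin.
    """
    observed = Counter(gc for _, gc in reads)
    expected = Counter(genome_gc)
    coverage = [0.0] * n_bins
    for bin_index, gc in reads:
        observed_frac = observed[gc] / len(reads)
        expected_frac = expected[gc] / len(genome_gc)
        coverage[bin_index] += expected_frac / observed_frac
    return coverage


def mappability_corrected(coverage, mappability, min_mappability=0.25):
    """Step 2 (assumed): divide by mappability; mask poorly mappable bins as None."""
    return [c / m if m >= min_mappability else None
            for c, m in zip(coverage, mappability)]


def regional_corrected(sample, control):
    """Step 3 (assumed): divide by a control track rescaled to mean 1.

    Bins that are masked in either track, or empty in the control, stay masked.
    """
    usable = [c for c in control if c is not None and c > 0]
    if not usable:
        return [None] * len(sample)
    mean_control = sum(usable) / len(usable)
    corrected = []
    for s, c in zip(sample, control):
        if s is None or c is None or c <= 0:
            corrected.append(None)
        else:
            corrected.append(s / (c / mean_control))
    return corrected


# Toy data: two genomic bins, reads over-represented in the "high" GC class.
reads = [(0, "high"), (0, "high"), (0, "high"), (1, "low")]
step1 = gc_corrected_coverage(reads, genome_gc=["high", "low"], n_bins=2)
step2 = mappability_corrected(step1, mappability=[1.0, 0.5])
step3 = regional_corrected(step2, control=[1.0, 3.0])
```

Masking unusable bins with None rather than zero keeps them distinguishable from bins that genuinely have no signal, which matters when the corrected track is later averaged across genomic features.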
Keywords
Animals; Caenorhabditis elegans; DNA, Helminth; Chromatin Immunoprecipitation; Sequence Analysis, DNA; Base Composition; Algorithms; High-Throughput Nucleotide Sequencing
Sponsorship
Wellcome Trust (054523/Z/98/C)
Identifiers
External DOI: https://doi.org/10.1093/nar/gkr425
This record's URL: https://www.repository.cam.ac.uk/handle/1810/279749
Rights
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Licence:
http://www.rioxx.net/licenses/all-rights-reserved