Personalized and graph genomes reveal missing signal in epigenomic data

Published version
Peer-reviewed

Authors

Groza, Cristian 
Kwan, Tony 
Soranzo, Nicole 
Pastinen, Tomi 
Bourque, Guillaume 

Abstract

Background: Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results.

Results: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls, either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls, while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the number of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks.

Conclusions: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.

Keywords

Research, Graph genomes, Personalized genomes, Genome graphs, De novo assembly, Modified reference, Reference bias, ChIP-seq, Epigenomics

Journal Title

Genome Biology

Journal ISSN

1474-760X

Volume Title

21

Publisher

BioMed Central