A haplotype-aware de novo assembly of related individuals using pedigree sequence graph.

Accepted version
Peer-reviewed

Type

Article

Authors

Garg, Shilpa 
Aach, John 
Li, Heng 
Church, George 

Abstract

MOTIVATION: Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP), and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read data from mother-father-child trios, and therefore require relatively high coverage of costly long-read data to produce high-quality assemblies. Thus, building a trio-aware assembler capable of producing accurate and chromosomal-scale diploid genomes of all individuals in a pedigree, while keeping sequencing costs low, is a pressing need of the genomics community.

RESULTS: We present a novel approach to diploid assembly, based on a pedigree sequence graph, that uses accurate Illumina data and long-read Pacific Biosciences (PacBio) data from all related individuals, thereby generalizing our previous work on single individuals. We demonstrate the effectiveness of our pedigree approach on a simulated trio of pseudo-diploid yeast genomes with different heterozygosity rates, and on real data from a human chromosome. We show that we require as little as 30× coverage Illumina data and 15× PacBio data from each individual in a trio to generate chromosomal-scale phased assemblies. Additionally, we show that we can detect and phase variants from the generated phased assemblies.

AVAILABILITY: https://github.com/shilpagarg/WHdenovo.

Keywords

Genome, Genomics, Haplotypes, High-Throughput Nucleotide Sequencing, Humans, Pedigree, Sequence Analysis, DNA

Journal Title

Bioinformatics

Journal ISSN

1367-4803
1367-4811

Publisher

Oxford University Press

Rights

All rights reserved

Sponsorship

Wellcome Trust (unknown)