Haplotype-aware graph indexes.
Bioinformatics (Oxford, England)
Oxford University Press (OUP)
Sirén, J., Garrison, E., Novak, A. M., Paten, B., & Durbin, R. (2020). Haplotype-aware graph indexes. Bioinformatics (Oxford, England), 36(2), 400-407. https://doi.org/10.1093/bioinformatics/btz575
Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes.
Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.
Sequence Analysis, DNA, Haplotypes, Genome, Algorithms, Software
External DOI: https://doi.org/10.1093/bioinformatics/btz575
This record's URL: https://www.repository.cam.ac.uk/handle/1810/296161
All rights reserved