DEFLATE compression algorithm corrects for overestimation of phylogenetic diversity by Grantham approach to single-nucleotide polymorphism classification.

Published version
Peer-reviewed

Type

Article

Authors

Schlosberg, Arran 
Lam, Brian YH 
Yeo, Giles SH 
Clifton-Bligh, Roderick J 

Abstract

Improvements in speed and cost of genome sequencing are resulting in increasing numbers of novel non-synonymous single nucleotide polymorphisms (nsSNPs) in genes known to be associated with disease. The large number of nsSNPs makes laboratory-based classification infeasible and familial co-segregation with disease is not always possible. In-silico methods for classification or triage are thus utilised. A popular tool based on multiple-species sequence alignments (MSAs) and work by Grantham, Align-GVGD, has been shown to underestimate deleterious effects, particularly as sequence numbers increase. We utilised the DEFLATE compression algorithm to account for expected variation across a number of species. With the adjusted Grantham measure we derived a means of quantitatively clustering known neutral and deleterious nsSNPs from the same gene; this was then used to assign novel variants to the most appropriate cluster as a means of binary classification. Scaling of clusters allows for inter-gene comparison of variants through a single pathogenicity score. The approach improves upon the classification accuracy of Align-GVGD while correcting for sensitivity to large MSAs. Open-source code and a web server are made available at https://github.com/aschlosberg/CompressGV.
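The idea of using compression to gauge variation can be illustrated with a toy example. The sketch below is not the CompressGV implementation and does not reproduce the adjusted Grantham measure; it only shows, under the assumption that compressed length is a reasonable proxy for variability, that DEFLATE (via Python's standard `zlib` module) produces a shorter output for a fully conserved run of residues than for a highly variable one. The names `deflate_size`, `conserved`, and `variable` are hypothetical and chosen for illustration.

```python
import random
import zlib

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def deflate_size(residues: str) -> int:
    """Return the byte length of the zlib (DEFLATE) compressed residues."""
    return len(zlib.compress(residues.encode("ascii"), 9))


# Hypothetical data: a perfectly conserved run versus a run drawn
# uniformly at random from the amino acid alphabet (fixed seed).
rng = random.Random(0)
conserved = "L" * 200
variable = "".join(rng.choice(AMINO_ACIDS) for _ in range(200))

# The conserved run compresses to far fewer bytes than the variable one.
print("conserved:", deflate_size(conserved))
print("variable: ", deflate_size(variable))
```

For the method actually used in the article, including how the compressed measure adjusts the Grantham-based score, refer to the open-source code linked in the abstract.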

Keywords

Algorithms; Computational Biology; Genetic Variation; Internet; Models, Theoretical; Polymorphism, Single Nucleotide; Sequence Alignment; User-Computer Interface

Journal Title

Int J Mol Sci

Journal ISSN

1661-6596
1422-0067

Volume Title

15

Publisher

MDPI AG

Sponsorship

Medical Research Council (MC_UU_12012/1)
Medical Research Council (MC_UU_12012/5)
Medical Research Council (MC_UU_12012/5/B)
Medical Research Council (MC_PC_12012)