How much do model organism phenotypes contribute to the computational identification of human disease genes?

Alghamdi, Sarah 
Hoehndorf, Robert 


Computing phenotypic similarity has been shown to be useful for identifying new disease genes and for supporting rare disease diagnosis. Genotype–phenotype data from orthologous genes in model organisms can compensate for the lack of human data and greatly increase genome coverage. Work over the past decade has demonstrated the power of cross-species phenotype comparisons, and several cross-species phenotype ontologies have been developed for this purpose. However, the relative contribution of different model organisms to the computational identification of disease-associated genes has not yet been fully explored. We use methods based on phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in different model organisms to disease-associated phenotypes in humans. We then use semantic machine learning methods to measure how much each model organism contributes to the identification of known human gene–disease associations. We find that mouse genotype–phenotype data is the most important dataset for identifying human disease genes by semantic similarity and machine learning over phenotype ontologies. Data from other model organisms does not improve identification over that obtained using the mouse alone, and therefore does not contribute significantly to this task. Our work has implications for the future development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation.

Journal Title
Disease Models and Mechanisms
Publisher
Company of Biologists
This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. URF/1/3790-01-01, URF/1/4355-01-01, and FCC/1/1976-34-01.