PheneBank: a literature-based database of phenotypes.

Pilehvar, Mohammad Taher; Bernard, Adam; Smedley, Damian; Collier, Nigel

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/329879

Repository DOI

https://doi.org/10.17863/CAM.77324

Type

Article

Authors

Pilehvar, Mohammad Taher

Bernard, Adam

Smedley, Damian

Collier, Nigel

https://orcid.org/0000-0002-7230-4164

Abstract

MOTIVATION: Significant effort has been spent by curators to create coding systems for phenotypes, such as the Human Phenotype Ontology, as well as disease-phenotype annotations. We aim to support the discovery of literature-based phenotypes and integrate them into the knowledge discovery process. RESULTS: PheneBank is a Web-portal for retrieving human phenotype-disease associations that have been text-mined from the whole of Medline. Our approach exploits state-of-the-art machine learning for concept identification by utilizing an expert-annotated rare disease corpus from the PMC Text Mining subset. The system is evaluated for entities on a gold-standard corpus of rare disease sentences and for associations against the Monarch Initiative data. AVAILABILITY AND IMPLEMENTATION: The PheneBank Web-portal is freely available at http://www.phenebank.org. Annotated Medline data is available from Zenodo at DOI: 10.5281/zenodo.1408800. Semantic annotation software is freely available for non-commercial use at GitHub: https://github.com/pilehvar/phenebank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Keywords

Humans, Rare Diseases, Software, Algorithms, Data Mining, Phenotype

Journal Title

Bioinformatics

Journal ISSN

1367-4803
1367-4811

Publisher

Oxford University Press (OUP)

Publisher DOI

https://doi.org/10.1093/bioinformatics/btab740

Sponsorship

Engineering and Physical Sciences Research Council (EP/M005089/1)
Medical Research Council (MR/M025160/1)

Collections

Cambridge University Research Outputs