Text Mining for Contexts and Relationships in Cancer Genomics Literature

Published version
Peer-reviewed

Abstract

Motivation: Scientific advances build on the findings of existing research. The 2001 publication of the human genome has led to the production of huge volumes of literature exploring the context-specific functions and interactions of genes. Technology is needed to perform large-scale text mining of research papers to extract the reported actions of genes in specific experimental contexts and cell states such as cancer, thereby facilitating the design of new therapeutic strategies.

Results: We present a new corpus and text mining methodology that can accurately identify and extract the most important details of cancer genomics experiments from biomedical texts. We build a Named Entity Recognition model that accurately extracts relevant experiment details from PubMed abstract text, and a second model that identifies the relationships between them. This system outperforms earlier models and enables the analysis of gene function in diverse and dynamically evolving experimental contexts.

Availability: We make our text mining models, code, and annotated corpus available under an open source licence. See:

Supplementary information: Supplementary data are available at Bioinformatics online.

Description

Funder: UK Research and Innovation; DOI: https://doi.org/10.13039/100014013


Funder: Amazon Machine Learning Research Award


Funder: Cancer Research UK Cambridge Institute; DOI: https://doi.org/10.13039/501100022011


Funder: Biotechnology and Biological Sciences Research Council; DOI: https://doi.org/10.13039/501100000268

Journal Title

Bioinformatics

Journal ISSN

1367-4803
1367-4811

Volume Title

40

Publisher

Oxford University Press (OUP)

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

Cancer Research UK (CB4210)
Diabetes UK (via British Council) (65BX18MNIB)
Biotechnology and Biological Sciences Research Council (BB/S013466/1)
BBSRC (BB/T013486/1)
Horizon Europe UKRI Underwrite ERC (EP/Y031350/1)
Cancer Research UK (24453)