Improving Automated Literature-based Discovery with Neural Networks: Neural biomedical Named Entity Recognition, Link Prediction and Discovery

Crichton, Gamal Kashaka Omari

Repository URI

https://www.repository.cam.ac.uk/handle/1810/293886

Repository DOI

https://doi.org/10.17863/CAM.40995

Files

Thesis (1.27 MB)

Type

Thesis

Authors

Crichton, Gamal Kashaka Omari

https://orcid.org/0000-0002-3036-0811

Abstract

Literature-based Discovery (LBD) uses information from explicit statements in the literature to generate new or unstated knowledge. Automated LBD can thus facilitate hypothesis generation and testing over large collections of publications, supporting and accelerating scientific research that is otherwise hampered by the explosion of publications and the fragmentation of knowledge. Existing methods, however, rely on approaches that cannot adequately capture the complex information available in scientific literature, and they are prone to proposing spurious discoveries or an abundance of low-quality ones. To solve these problems, automated LBD needs to accurately glean the extensive information present in the literature, cope with the dynamic nature of scientific knowledge, and place high-quality proposals at the top of its ranked outputs.

Recent advances in Natural Language Processing (NLP) allow deep textual analysis that captures a wide range of the information present in text, and the resulting models adapt easily to recognising new biomedical entities and terms. Similarly, recent advances in graph processing make it possible to perform in-depth analysis of information represented as graphs, such as published biomedical connections, to facilitate high-quality knowledge discovery. Both of these advances make extensive use of neural networks.

This work used neural networks to advance automated LBD in three ways: 1) improving biomedical Named Entity Recognition (NER), which extracts entities from unstructured text, by using multi-task learning across multiple biomedical datasets; 2) improving knowledge discovery from realistic, random-sliced and time-sliced biomedical graphs using link prediction; and 3) improving the ranking of published discoveries on open and closed LBD instances by using neural models to score the strength of connection paths. Excitingly, the latter approaches outperformed those used by the state-of-the-art LION LBD system, indicating that integrating them into LION would provide better support to the cancer researchers who use it.

The results of this work show that it is feasible to use neural networks to improve LBD in different ways. They also demonstrate that neural networks are versatile enough to improve both traditional and non-traditional LBD. The principal implication of these findings is that neural biomedical knowledge discovery, especially LBD, is useful at present as well as being a potentially rich field for further study.

Date

2019-02-18

Advisors

Korhonen, Anna

Keywords

Literature-based Discovery, LBD, Neural networks, Named Entity Recognition, NER, Multi-task Learning, LION LBD, knowledge discovery, Natural Language Processing, NLP, Machine Learning, Deep Learning, Biomedical NLP, Biomedical Knowledge Discovery, Link Prediction, Language Technology Laboratory

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Cambridge Commonwealth, European & International Trust

Collections

Theses - Theoretical and Applied Linguistics