Data-driven and Machine Learning approaches for exploration and inference of Biological Pathways in physiological and pathological states


Abstract

The complexity of human biology cannot be understood by studying its individual components in isolation. A fundamental element of this complexity arises from the numerous layers of regulation present within a cell. Interactions within and between these interconnected regulatory layers form a vast and dynamic cellular interaction network. Biological pathways are used in an attempt to model this network. These pathways play a key role in advancing our understanding of cellular processes and are employed across a wide range of tasks. The utility of pathways is demonstrated in the first part of this thesis, which investigates the epigenetic changes in cancer through machine learning models that classify cancer types and subtypes. Interpretation of these models through pathway analysis techniques confirms that the genomic loci they detect are involved in cancer processes.

Despite the central importance and broad applicability of pathways across systems biology, their comprehensiveness and quality can be improved. Previous work has shown that they are incomplete and inconsistent. In support of this view, here I show that pathway databases are not yet sufficiently accurate to facilitate conclusive pathway enrichment analysis. This knowledge gap highlights an open research direction: developing prediction models over pathway data to recover the knowledge missing from pathway databases. In this endeavour, the biological context is a key consideration, as interactions between genes and other biomolecules, such as RNA, proteins and metabolites, differ depending on the context. Therefore, as a proof of principle for pathway prediction, I focus on predicting missing pathway edges in the interferon system.

To pursue this objective, I first integrate and analyse over 250 gene expression experiments to identify the key genes and pathways of the interferon system, a critical component of host defence. With knowledge of this system, I then proceed to develop link prediction models to predict pathway edges in this system. For this purpose, I utilise graph machine learning approaches including graph neural networks and knowledge graph embedding models. I investigate whether incorporating prior biological information can aid pathway prediction and, surprisingly, find that additional biological network information does not aid the prediction of pathway edges. After validating the performance of the models, I integrate them into a methodology for curating custom subpathways and networks. I also show a novel strategy to validate the predicted edges in these subpathways using predictions from large language models. Collectively, this thesis offers contributions to the field of systems immunology by providing insights into biological pathway databases and tools to study the interferon system. These include a web application to study the interferon system, investigations into the inconsistencies between pathway databases and a novel approach to pathway curation.

Date

2025-04-02

Advisors

Rueda, Oscar

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)