Biophysical dynamical priors in machine learning

Moss, Jacob

doi:https://doi.org/10.17863/CAM.118119

Repository URI

https://www.repository.cam.ac.uk/handle/1810/383935

Repository DOI

https://doi.org/10.17863/CAM.118119

Files

Primary Thesis (16.33 MB)

Type

Thesis

Authors

Moss, Jacob

https://orcid.org/0000-0003-1555-7306

Abstract

The proliferation of machine learning models in biology is driven by their great potential for novel discoveries, ranging from new medicines to an improved understanding of the development of species. In addition, an ever-increasing number of high-resolution biological datasets provides the fuel for these models to extract meaningful insights. Owing to the success of this pairing, machine learning in biology has emerged as a vital field of research. As with any real-world application, the scenarios are usually much more complex than the benchmark tasks found in foundational machine learning research. Moreover, the standard modelling approaches in biology frequently involve classical techniques such as dimensionality reduction to a 2D plane followed by empirical observations. This motivates the need to construct better modelling techniques that make use of developments in deep learning. The challenge is then to scale these approaches to the vast size and dimensionality of biological datasets.

In this dissertation, I hypothesise that incorporating domain knowledge can not only improve predictive performance but also yield additional insights that cannot be obtained through data-driven methods alone. To evaluate this hypothesis, I select a set of important problems in biology, such as genetic regulation, and construct a variety of machine learning models with the aim of evaluating the relative efficacy of different levels of biophysical inductive bias. The approaches I introduce in Chapter 3 range from simple, black-box dynamical biases to explicit biophysical priors encoded directly in the model. The resulting techniques are studied through the lens of the latent force paradigm, a combination of nonparametric mechanistic and data-driven approaches, leading to improved biological interpretability and greater representational power. In Chapter 4, I propose and evaluate methods that improve predictive performance compared with the standard approaches and drastically increase the scalability of the paradigm. Chapter 5 extends the popular biological framework, RNA velocity, both to reduce invalid assumptions and to enable the inference of single-cell resolution quantities such as pseudotime. Finally, Chapter 6 concludes the work and outlines directions for future work.

Date

2024-11-01

Advisors

Lio, Pietro

Keywords

Gaussian processes, Gaussian process, latent force model, dynamical model, biology, genetics, transcriptomics, machine learning, neural network

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Sponsorship

Full PhD studentship funded by GSK

Collections

Theses - Computer Science and Technology
