
Statistical Methods for the Analysis of Contextual Gene Expression Data


Type

Thesis

Authors

Abstract

Technological advances have enabled profiling of gene expression variability, at both the RNA and the protein level, with ever-increasing throughput. In addition, miniaturisation has made it possible to quantify gene expression from small volumes of input material and, most recently, in single cells. Increasingly, these technologies also preserve context information, for instance by assaying tissues at high spatial resolution. A second example of contextual information comes from multi-omics protocols, which, for example, assay gene expression and DNA methylation in the same cells or samples. Although such contextual gene expression datasets are increasingly available for both population and single-cell variation studies, methods for their analysis are not yet established. In this thesis, we propose two modelling approaches for the analysis of gene expression variation in specific biological contexts. The first contribution of this thesis is a statistical method for analysing single-cell expression data in a spatial context. Our method identifies the sources of gene expression variability by decomposing the variance into components, each attributable to a different source. These sources include aspects of spatial variation such as cell-cell interactions. In applications to data from different technologies, we show that cell-cell interactions are indeed a major determinant of the expression level of specific genes, with a relevant link to their function. The second contribution is a latent variable model for the unsupervised analysis of gene expression data that accounts for structured prior knowledge of the experimental context. The proposed method enables the joint analysis of gene expression data and other omics data profiled in the same samples, and the model can account for the grouping structure of samples, e.g. samples from individuals with different clinical covariates or from distinct experimental batches.
Our model constitutes a principled framework to compare the molecular identities of these distinct groups.
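The idea of decomposing expression variability into source-specific components can be illustrated with a generic Gaussian-process variance-components model. A gene's expression across cells is treated as a zero-mean Gaussian whose covariance is a weighted sum of kernels plus independent noise, and the fitted weights give the share of variance attributed to each source. This is a minimal sketch under stated assumptions, not the thesis's implementation: the squared-exponential spatial kernel, the linear kernel built from other genes' expression, the fixed length scale, the simulated data, and the helper names (`rbf_kernel`, `normalise`, `variance_fractions`) are all hypothetical, and no cell-cell interaction term is modelled.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def rbf_kernel(coords, length_scale):
    """Squared-exponential covariance between cells from their 2-D positions (assumed kernel)."""
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def normalise(kernel):
    """Scale a kernel to unit mean diagonal so the variance terms are comparable."""
    return kernel / np.mean(np.diag(kernel))


def neg_log_marginal_likelihood(log_vars, y, kernels):
    """Gaussian negative log marginal likelihood of y under sum_k v_k K_k + v_noise I."""
    variances = np.exp(log_vars)
    n = y.shape[0]
    cov = variances[-1] * np.eye(n)  # last parameter is the i.i.d. noise variance
    for var, kernel in zip(variances[:-1], kernels):
        cov = cov + var * kernel
    factor = cho_factor(cov + 1e-8 * np.eye(n), lower=True)
    alpha = cho_solve(factor, y)
    log_det = 2.0 * np.sum(np.log(np.diag(factor[0])))
    return 0.5 * (y @ alpha + log_det + n * np.log(2.0 * np.pi))


def variance_fractions(y, kernels, names):
    """Fit one variance per kernel plus noise by maximum likelihood; return fractions."""
    y = y - y.mean()
    x0 = np.zeros(len(kernels) + 1)  # log-variances, all starting at 1.0
    result = minimize(neg_log_marginal_likelihood, x0, args=(y, kernels),
                      method="L-BFGS-B", bounds=[(-10.0, 5.0)] * x0.size)
    variances = np.exp(result.x)
    return dict(zip(list(names) + ["noise"], variances / variances.sum()))


# Usage on simulated data: a gene driven mostly by spatial position plus noise.
rng = np.random.default_rng(0)
n_cells = 150
coords = rng.uniform(size=(n_cells, 2))             # hypothetical cell positions
other_genes = rng.normal(size=(n_cells, 10))        # hypothetical expression of other genes
K_spatial = normalise(rbf_kernel(coords, 0.2))
K_intrinsic = normalise(other_genes @ other_genes.T)
chol = np.linalg.cholesky(K_spatial + 1e-6 * np.eye(n_cells))
y = chol @ rng.standard_normal(n_cells) + rng.normal(scale=np.sqrt(0.1), size=n_cells)

fractions = variance_fractions(y, [K_intrinsic, K_spatial], ["intrinsic", "spatial"])
print(fractions)
```

Fixing the length scale keeps the example short; in practice kernel hyperparameters would usually be optimised together with the variance terms, and normalising each kernel to unit mean diagonal is one common convention for making the fitted variances interpretable as shares of the total.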

Date

2018-09-26

Advisors

Stegle, Oliver
Saez-Rodriguez, Julio

Keywords

Gaussian Processes, Factor Analysis, Gene Expression, Machine Learning, Bayesian Modelling

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Fellowship from the EMBL international PhD program