Bayesian modelling and sampling strategies for ordering and clustering problems with a focus on next-generation sequencing data


Type

Thesis

Authors

Strauss, Magdalena Elisabeth 

Abstract

This thesis presents novel methods for ordering and clustering problems. The first two parts focus on the development of models and sampling strategies specifically tailored for next-generation sequencing data. Most high-throughput measurements for single-cell data are destructive, resulting in the loss of longitudinal information. In the first part, I developed a new Bayesian method for reconstructing this information computationally, sampling orders efficiently using MCMC on a space of permutations. This approach provides novel insights into biological phenomena and experimental artefacts.

The second part presents a new clustering method for single-cell data, which specifically models the uncertainty of the clustering structure that results in part from the uncertainty of the orders discussed above. The proposed method uses nonparametric Bayesian methods, consensus clustering and efficient MCMC sampling to identify differences in dynamic patterns for different branches of gene expression data. It also categorises genes in a way consistent with biological function in an application to stimulated dendritic cells, and integrates data from different cell lines in a principled way.

The third part of the thesis adapts some of the methods developed in the first two parts to applications with very sparsely and irregularly sampled data, and explores through simulations the applicability of such models in different circumstances.

The fourth part discusses clustering methods for samples in a variety of different contexts, such as RNA expression, methylation or protein expression, and develops and critically discusses a novel hierarchical Bayesian method that integrates both different contexts and different groups of samples, for example different cancer types.

The unifying underlying theme of the thesis is the development of methods and efficient sampling and approximation strategies capable of capturing the uncertainty inherent in any statistical analysis of high-dimensional and noisy data.

Date

2018-11-29

Advisors

Wernisch, Lorenz

Keywords

next-generation sequencing data, pseudotime ordering, nonparametric Bayes, efficient sampling, multi-omics methods

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Medical Research Council