High-dimensional covariance estimation with applications to functional genomics


Type

Thesis

Abstract

Covariance matrix estimation plays a central role in statistical analyses. In molecular biology, for instance, covariance estimation facilitates the identification of dependence structures between molecular variables that shed light on the underlying biological processes. However, covariance estimation is generally difficult because high-throughput molecular experiments often generate high-dimensional and noisy data, possibly with missing values. In this context, there is a need to develop scalable and robust estimation methods that can improve inference by, for example, taking advantage of the many sources of external information available in public repositories.

This thesis introduces novel methods and software for estimating covariance matrices from high-dimensional data. Chapter 2 introduces a flexible and scalable Bayesian linear shrinkage covariance estimator. The estimator accommodates multiple shrinkage target matrices, allowing the incorporation of external information from an arbitrary number of sources. It is also less sensitive to target misspecification and can outperform state-of-the-art single-target linear shrinkage estimators.
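
As a rough illustration of the idea of multi-target linear shrinkage, the sketch below forms a convex combination of the sample covariance and several target matrices. It is not the Bayesian estimator of Chapter 2: the function name `multi_target_shrinkage`, the choice of targets and the hand-picked weights are all assumptions made for this example, whereas the thesis estimator determines the amount of shrinkage from the data.

```python
import numpy as np


def multi_target_shrinkage(X, targets, weights):
    """Convex combination of the sample covariance and several targets.

    Hypothetical helper for illustration only.
    X: (n, p) data matrix, rows are samples.
    targets: list of (p, p) target matrices.
    weights: one non-negative weight per target, summing to at most 1.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    S = np.cov(X, rowvar=False)  # sample covariance, singular when n <= p
    estimate = (1.0 - weights.sum()) * S
    for w, T in zip(weights, targets):
        estimate = estimate + w * T
    return estimate


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # n = 5 samples, p = 20 variables
p = X.shape[1]
S = np.cov(X, rowvar=False)

# Two assumed targets: the identity and the diagonal of the sample covariance.
targets = [np.eye(p), np.diag(np.diag(S))]
sigma_hat = multi_target_shrinkage(X, targets, weights=[0.3, 0.3])
```

With only five samples the sample covariance `S` is rank-deficient, while `sigma_hat` is positive definite because the identity target contributes a strictly positive amount to every eigenvalue.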

Chapter 3 explores a dimensionality reduction approach, probabilistic principal component analysis, as a model-based covariance estimation method that can handle missing values. Because it assumes a low-dimensional latent structure, this approach is particularly useful when the inverse covariance is required (e.g. for network inference). All of our methods are implemented as well-documented open-source R libraries.
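
To make the link between the latent structure and the inverse covariance concrete, the standard probabilistic principal component analysis model can be written as follows. The notation ($W$, $\sigma^2$, $q$, $p$, $M$) is assumed here for illustration and is not taken from the thesis.

```latex
% Standard PPCA model with q latent dimensions and p observed variables (q < p)
x = W z + \mu + \varepsilon, \qquad
z \sim \mathcal{N}(0, I_q), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I_p)

% Implied covariance of x
C = W W^\top + \sigma^2 I_p

% Inverse via the Woodbury identity: only the q x q matrix M is inverted
C^{-1} = \sigma^{-2} \left( I_p - W M^{-1} W^\top \right), \qquad
M = W^\top W + \sigma^2 I_q
```

Under this model the implied covariance is positive definite whenever $\sigma^2 > 0$, and its inverse requires inverting only a $q \times q$ matrix, not a $p \times p$ one.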

Finally, Chapter 4 presents a case study using a dataset of cytokine expression in patients with traumatic brain injury. Studies of this type are crucial to researching the inflammatory response in the brain and potential patient recovery. However, due to the difficulties in patient recruitment, they result in high-dimensional datasets with relatively low sample sizes. We show how our methods can facilitate the multivariate analysis of cytokines across time and different treatment regimes.

Date

2019-02-15

Advisors

Richardson, Sylvia
Leday, Gwenaël
Vallejos, Catalina

Keywords

Covariance, high-dimensional, linear shrinkage, probabilistic principal component analysis, Bayesian

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Wellcome Trust 4 year PhD studentship for Mathematical Genomics and Medicine