dc.contributor.author | Gray, Harry
dc.date.accessioned | 2020-01-08T16:21:02Z
dc.date.available | 2020-01-08T16:21:02Z
dc.date.issued | 2020-03-11
dc.date.submitted | 2019-02-15
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/300603
dc.description.abstract | Covariance matrix estimation plays a central role in statistical analyses. In molecular biology, for instance, covariance estimation facilitates the identification of dependence structures between molecular variables that shed light on the underlying biological processes. However, covariance estimation is generally difficult because high-throughput molecular experiments often generate high-dimensional and noisy data, possibly with missing values. In such contexts, there is a need to develop scalable and robust estimation methods that can improve inference by, for example, taking advantage of the many sources of external information available in public repositories. This thesis introduces novel methods and software for estimating covariance matrices from high-dimensional data. Chapter 2 introduces a flexible and scalable Bayesian linear shrinkage covariance estimator. The estimator accommodates multiple shrinkage target matrices, allowing the incorporation of external information from an arbitrary number of sources. It is also less sensitive to target misspecification and can outperform state-of-the-art single-target linear shrinkage estimators. Chapter 3 explores a dimensionality reduction approach, probabilistic principal component analysis, as a model-based covariance estimation method that can handle missing values. Because it assumes a low-dimensional latent structure, this approach is particularly useful when the inverse covariance is required (e.g. for network inference). All of our methods are implemented as well-documented open-source R libraries. Finally, Chapter 4 presents a case study using a dataset of cytokine expression in patients with traumatic brain injury. Studies of this type are crucial to researching the inflammatory response in the brain and potential patient recovery. However, due to the difficulties in patient recruitment, they result in high-dimensional datasets with relatively low sample sizes. We show how our methods can facilitate the multivariate analysis of cytokines across time and different treatment regimes.
dc.description.sponsorship | Wellcome Trust 4 year PhD studentship for Mathematical Genomics and Medicine
dc.language.iso | en
dc.rights | All rights reserved
dc.rights | All Rights Reserved | en
dc.rights.uri | https://www.rioxx.net/licenses/all-rights-reserved/ | en
dc.subject | Covariance
dc.subject | high-dimensional
dc.subject | linear shrinkage
dc.subject | probabilistic principal component analysis
dc.subject | bayesian
dc.title | High-dimensional covariance estimation with applications to functional genomics
dc.type | Thesis
dc.type.qualificationlevel | Doctoral
dc.type.qualificationname | Doctor of Philosophy (PhD)
dc.publisher.institution | University of Cambridge
dc.publisher.department | MRC Biostatistics Unit
dc.date.updated | 2020-01-08T14:03:26Z
dc.identifier.doi | 10.17863/CAM.47676
dc.contributor.orcid | Gray, Harry [0000-0002-6714-0089]
dc.publisher.college | Darwin College
dc.type.qualificationtitle | PhD in Biostatistics
cam.supervisor | Richardson, Sylvia
cam.supervisor | Leday, Gwenaël
cam.supervisor | Vallejos, Catalina
cam.supervisor.orcid | Richardson, Sylvia [0000-0003-1998-492X]
cam.supervisor.orcid | Vallejos, Catalina [0000-0003-3638-1960]
cam.thesis.funding | false

