Large-scale inference and imputation for multi-tissue gene expression

Viñas Torné, Ramon

doi:https://doi.org/10.17863/CAM.104513

Repository URI

https://www.repository.cam.ac.uk/handle/1810/362131

Repository DOI

https://doi.org/10.17863/CAM.104513

Files

Primary Thesis (9.93 MB)

Type

Thesis

Authors

Viñas Torné, Ramon

https://orcid.org/0000-0003-2411-4478

Abstract

Integrating molecular information across tissues and cell types is essential for understanding the coordinated biological mechanisms that drive disease and characterise homoeostasis. Effective multi-tissue omics integration promises a system-wide view of human physiology, with potential to shed light on intra- and multi-tissue molecular phenomena, but faces many complexities arising from the intricacies of biomedical data. This integration problem challenges single-tissue and conventional techniques for omics analysis, which are often unable to model a variable number of tissues with sufficient statistical strength, and necessitates the development of scalable, non-linear, and flexible methods.

This dissertation develops inference and imputation methods for the analysis of gene expression data, an immensely rich and complex biomedical data modality, enabling integration across multiple tissues. The imputation task can strongly influence downstream applications, including performing differential expression analysis, determining co-expression networks, and characterising cross-tissue associations. Inferring tissue-specific gene expression may also play a fundamental role in clinical settings, where gene expression is often profiled in accessible tissues such as whole blood. Because gene expression is highly context-specific, imputation methods may facilitate the prediction of gene expression in inaccessible tissues, with applications in diagnosing and monitoring pathophysiological conditions.

The modelling approaches presented throughout the thesis address four important methodological problems. The first work introduces a flexible generative model for the in-silico generation of realistic gene expression data across multiple tissues and conditions, which may reveal tissue- and disease-specific differential expression patterns and may be useful for data augmentation. The second study proposes two deep learning methods to study whether the complete transcriptome of a tissue can be inferred from the expression of a minimal subset of genes, with potential application in the selection of tissue-specific biomarkers and the integration of large-scale biorepositories. The third work presents a novel method, hypergraph factorisation, for the joint imputation of multi-tissue and cell-type gene expression, providing a system-wide view of human physiology. The fourth study proposes a graph representation learning approach that leverages spatial information to improve the reconstruction of tissue architectures from spatial transcriptomic data. Collectively, this thesis develops flexible and powerful computational approaches for the analysis of tissue-specific gene expression data.

Date

2023-09-01

Advisors

Liò, Pietro

Keywords

data imputation, data integration, deep learning, gene expression, multi-tissue transcriptomics

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Engineering and Physical Sciences Research Council (2276380)

Fundació "la Caixa"

Fundación Rafael del Pino

Relationships

Is supplemented by:

https://doi.org/10.1038/ng.2653

Collections

Theses - Computer Science and Technology
