
The effect of non-coding variants on gene transcription in human blood cell types


Type

Thesis

Authors

Kreuzhuber, Roman 

Abstract

To understand complex genetic diseases, it is necessary to study DNA, its transcription and translation, and the regulation of these processes. From a mechanistic point of view, diseases can be caused by alterations in the DNA sequence or by dysregulation of gene expression. To understand cellular regulation, connections first have to be established between the elements that contribute to gene expression and its regulation.

During my PhD, I studied the regulatory effects of human genetic variation. To do so, I processed and analysed datasets measuring the effects of genetic variants on gene transcript levels, identified regulatory variants, and placed them in a broader biochemical and physiological context.

Expression quantitative trait locus (eQTL) studies identify gene expression variation that can be explained by genetic variation, in a tissue-specific and potentially cell-type-specific manner. I reprocessed and aggregated eQTL datasets of seven purified blood cell types that were generated by collaborators in different laboratories. I showed that the increased sample size enables the identification of additional associations, including associations with low-frequency variants, and I compared my cell-type-specific eQTL results to those of an eQTL study performed on whole blood without cell type purification.

Gene regulation is complex and relies on several layers of control, of which genetic background is only one. To help understand genetic control, I compared my eQTL results across the seven cell types and highlighted cell-type-specific associations.

I placed my eQTL results in a broader context by overlapping them with genomic regions known to be important for gene regulation. Beyond their direct interpretation, eQTL results have been used, by means of colocalisation, to improve the understanding of results from genome-wide association studies (GWAS). I performed a colocalisation analysis and, as an example, showed how a GWAS variant mechanistically exerts its effect on plateletcrit, a blood cell index studied in a recent GWAS (Astle et al., 2016). Finally, I drew a link between gene regulation, three-dimensional chromatin structure and gene constraint against coding loss-of-function mutations.

Complementing association analyses such as eQTL studies and GWAS, recent advances in machine learning can be used to predict the effects of genetic variation in silico. To facilitate the application of published machine learning models to custom data and to predict the biological effects of genetic variants, I developed the software Kipoi (www.kipoi.org) in collaboration with a fellow PhD student, Ziga Avsec (Prof. Julien Gagneur's group, TU Munich, Germany).

The software is designed to facilitate the sharing and re-use of trained machine learning models in genomics. Together with Ziga Avsec, I conceptualised and implemented core elements of the platform. My major contribution to this software project was the implementation of tools and features for estimating the effects of DNA variants.

Date

2018-11-12

Advisors

Ouwehand, Willem Hendrik
Stegle, Oliver

Keywords

eQTL, genomics, deep learning, kipoi, blood cells, colocalisation

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge