Investigation of transcription factor binding at distal regulatory elements
Abstract
Cellular development and function necessitate precise patterns of gene expression. Control of gene expression is in part orchestrated by a class of remote regulatory elements, termed enhancers, which are brought into contact with promoters via DNA looping. Enhancers typically contain clusters of transcription factor (TF) binding sites, and TF recruitment to them is thought to play a key role in transcriptional control. In this thesis I have addressed two issues regarding gene regulation by enhancers. First, with recent genome-wide enhancer mapping, it is becoming increasingly apparent that genes are commonly regulated by multiple enhancers in the same cell type. How a gene’s regulatory information is encoded across multiple enhancers, however, is still not fully understood. Second, numerous recent studies have found that enhancers are enriched for expression-modulating and disease-associated genetic variants. However, understanding and predicting the effects of enhancer variants remains a major challenge. I focussed on a human lymphoblastoid cell line (LCL), GM12878, for which ChIP-Seq data are available for 52 different TFs from the ENCODE project. Importantly, Promoter Capture Hi-C data are available for the same LCL, making it possible to link enhancers to their target genes globally.
In the first part of the thesis, I investigated how gene regulatory information is encoded across enhancers. Specifically, I asked whether a gene tends to use multiple enhancers to convey the same or distinct regulatory information. I found that there was a general trend towards a “shadow” enhancer architecture, whereby similar combinations of TFs were recruited to multiple enhancers. However, numerous examples of “integrating” enhancers were observed, where the same gene showed large variation in TF binding across enhancers. Distinct groups of TFs were associated with these contrasting models of TF enhancer binding.
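As a rough illustration of the contrast between “shadow” and “integrating” architectures, the sketch below scores how similar the sets of bound TFs are across the enhancers linked to one gene. The abstract does not state which similarity measure was used, so the Jaccard index, the function names and the example TF assignments here are all assumptions made purely for illustration.

```python
from itertools import combinations


def jaccard(tfs_a, tfs_b):
    """Jaccard index between two collections of TF names (1.0 if both are empty)."""
    a, b = set(tfs_a), set(tfs_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_similarity(enhancer_tfs):
    """Mean Jaccard index over all pairs of enhancers linked to one gene.

    enhancer_tfs maps a (hypothetical) enhancer ID to the set of TFs bound there.
    Returns None when the gene has fewer than two enhancers.
    """
    pairs = list(combinations(enhancer_tfs.values(), 2))
    if not pairs:
        return None
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Illustrative, made-up TF assignments (not data from the thesis).
shadow_like = {
    "enh1": {"EBF1", "PAX5", "SPI1"},
    "enh2": {"EBF1", "PAX5", "SPI1"},
    "enh3": {"EBF1", "PAX5"},
}
integrating_like = {
    "enh1": {"RELA", "NFKB1"},
    "enh2": {"PAX5"},
    "enh3": {"PAX5", "EBF1"},
}

shadow_score = mean_pairwise_similarity(shadow_like)
integrating_score = mean_pairwise_similarity(integrating_like)
print(shadow_score, integrating_score)
```

Under this assumed measure, a score near 1 would correspond to the same TF combination recurring at each enhancer, and a score near 0 to enhancers that each contribute different TFs.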
To investigate the functional effects of variation at enhancers, I additionally took advantage of a panel of LCLs derived from 359 individuals, which have been genotyped by the 1000 Genomes Project, and for which RNA-Seq data are publicly available. I used TF binding models to computationally predict variants impacting TF binding, and tested the association of these variants with the expression of the target genes they contact according to Promoter Capture Hi-C data. Compared to the standard eQTL calling approach, this offers increased sensitivity, as only variants that physically contact the promoter and are predicted to impact TF binding are tested. Using this approach, I discovered a set of predicted TF-binding affinity variants at distal regions that associate with gene expression. Interestingly, a large proportion of these binding variants fall at the promoters of other genes. This finding suggests that some promoters may be able to act in an enhancer-like manner via long-range interactions, consistent with very recent findings from alternative approaches.
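The two steps described above (predicting a variant's effect on TF binding, then testing it against target-gene expression) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method used in the thesis: the binding model is assumed to be a simple position weight matrix (PWM) with a uniform background, the association measure is assumed to be a Pearson correlation between genotype dosage and expression, and the motif, sequences and expression values are invented for illustration.

```python
import math

# Hypothetical 4-bp PWM: base probabilities per motif position (illustrative only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window, pwm):
    """Log-odds score of one motif-length window against the PWM."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def best_score(seq, pwm):
    """Highest log-odds score over all motif-length windows in seq."""
    width = len(pwm)
    return max(window_score(seq[i:i + width], pwm)
               for i in range(len(seq) - width + 1))


def binding_change(ref_seq, alt_seq, pwm):
    """Predicted change in binding score (alternative minus reference allele)."""
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


def pearson_r(xs, ys):
    """Pearson correlation between genotype dosages and expression values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Step 1: score a made-up G>C variant that disrupts the motif "ACGT".
delta = binding_change("TTACGTAA", "TTACCTAA", PWM)

# Step 2: correlate alternative-allele dosage (0, 1 or 2 copies) with
# made-up expression values of the contacted target gene.
dosage = [0, 1, 2, 0, 1, 2]
expression = [5.0, 4.0, 3.0, 5.2, 3.9, 3.1]
r = pearson_r(dosage, expression)
print(delta, r)
```

In this toy case the alternative allele lowers the predicted binding score and its dosage is negatively correlated with expression. A real analysis would additionally restrict testing to variants in fragments that contact the gene's promoter, and would assess significance with correction for multiple testing.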