
Pathogenicity and selective constraint in the non-coding genome


Type

Thesis

Authors

Short, Patrick 

Abstract

Gene regulation plays a central role in evolution, organismal development, and disease. Despite the critical importance of gene regulation throughout development, few genetic variants with large effects in regulatory elements have been robustly associated with disease. In this work, my overarching aim was to gain a better understanding of the contribution of genetic variation in regulatory elements to Mendelian disorders, and I approached this problem from three different perspectives. I first sought to assess the contribution of regulatory variation to severe developmental disorders using sequence data from 8,000 affected individuals and their parents, and to identify individual elements with a high probability of harbouring pathogenic variants. Next, I used population genetic models and data from more than 28,000 whole genome sequenced individuals to examine the forces of selection operating on non-coding elements genome-wide. Finally, I conducted a pilot experiment using a massively parallel reporter assay (MPRA) to assay >50,000 different non-coding variants, including variants observed in patients with developmental disorders, across more than 700 different non-coding elements. I also collaborated on an assessment of the impact of patient mutations in eleven different enhancers using mouse transgenesis assays.

A few key results from the work are summarised below:

  • I provide evidence that de novo SNVs in non-coding elements contribute to severe developmental disorders, and estimate that they contribute to 1-3% of cases not harbouring a likely diagnostic coding variant.
  • These de novo SNVs reside primarily in highly evolutionarily conserved regulatory elements. I estimate that a large fraction of conserved non-coding elements (50-70%) act as enhancers and that a smaller subset (10-15%) have a function related to alternative splicing.
  • Statistical modelling of the distribution of variants in developmental disorder patients suggests that a small fraction of bases (maximum likelihood estimate of 3%) within a disease-associated non-coding element are likely pathogenic with high penetrance when mutated.
  • I develop a new genome-wide mutation rate model that accounts for a variety of germline features, including recombination rate, replication timing, sequence context, and histone marks, and that greatly outperforms models based on sequence context alone.
  • I find evidence for widespread purifying selection in the non-coding genome that is correlated with nucleotide-level evolutionary conservation, even when the conserved nucleotides lie within otherwise poorly conserved sequence.
  • I show that the selective constraint on small insertions and deletions is likely greater than the selective constraint on SNVs.
  • I present data from a pilot experiment assessing more than 50,000 different non-coding variants in a massively parallel reporter assay conducted in both HeLa and neuroblastoma cells.

Date

2018-09-10

Advisors

Hurles, Matthew

Keywords

genomics, developmental disorders, statistical genetics

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Wellcome Trust