
Genetic architecture of transcript splicing in blood and phenotypic consequences




Transcript splicing is a fundamental process that allows multiple isoforms to be generated from a single gene body, increasing the functional capacity of our genomes. Bulk RNA-sequencing has allowed us to analyse this at scale by investigating not only the amount of a gene product detected, but also where, and by how much, certain parts of genes have been excised. This thesis presents the largest investigation to date into the genetic architecture of transcript splicing in blood, utilising a deeply phenotyped cohort of 4,732 healthy adults, in addition to 638 adults presenting to intensive care units with sepsis.

I first explore the genetic architecture of transcript splicing by mapping splicing quantitative trait loci (sQTLs) in a healthy cohort of blood donors from the INTERVAL study. Transcript splicing is quantified from split reads in RNA-seq data using the LeafCutter pipeline, which allows quantification without regard to established reference annotations. As the derived splice events do not have a one-to-one mapping to currently defined isoforms, I create a pipeline to richly annotate these splice events to aid subsequent analyses. The sQTL analysis identifies 29,514 cis-sQTLs in 6,853 cis-sGenes, and I demonstrate large overlap with previous findings in addition to many new associations. Using cis-eSNPs derived from the same cohort, I perform a targeted trans-sQTL analysis under the hypothesis that trans-sSNPs could regulate splicing by regulating gene products involved in splicing. This validates the few currently known trans-sQTL associations and provides a total of 642 splice events (in 208 sGenes), including known splice factors. Given the scale and novelty of these results, I develop an interactive online portal for browsing and exploring the sQTL results, into which subsequent analyses are incorporated, providing an interpretable form of the results generated by this thesis. This portal is publicly available at:

As the INTERVAL cohort is deeply phenotyped, containing plasma protein, metabolite, and lipid measurements along with their genetic associations, I perform colocalisation analysis of these with the generated sQTLs to explore their shared genetic architecture. This reveals that many splice events and molecular phenotypes appear to be regulated by shared genetic effects, and, through examples, I demonstrate how splicing could be modulating these downstream phenotypes through mechanisms such as changes in solubility. As a proof of concept, I then compare public GWAS statistics for immune and blood-related diseases with both the sQTLs and the QTLs of the downstream molecular phenotypes, detailing many splicing-mediated pathways of disease through which risk loci are putatively acting, the majority of which are independent of eQTLs.

To investigate the interaction of genetic and environmental effects on transcript splicing in disease, I utilise the GAinS cohort of 638 adults who had blood taken upon arrival at the ICU with sepsis. Using this cohort, I explore the transcriptomic differences between these individuals that are explained by transcript splicing and how this information can be used to predict patient status. I subsequently compare the genetic architecture of these splicing events with that of the healthy individuals and with the previously defined downstream associations with molecular phenotypes. Notably, through colocalisation with summary statistics for COVID-19 susceptibility and severity, I observe risk loci shared with those impacting transcript splicing in the sepsis patients that were not observed in the healthy individuals.

In summary, this thesis provides an in-depth analysis of the genetic architecture of the largest catalogue of transcript splicing to date, explores the utility of these splicing associations in explaining the regulation of downstream molecular phenotypes, and demonstrates how they can be used to understand the mechanistic pathways of risk loci.





Davenport, Emma
Inouye, Michael


eQTL, genomics, GWAS, metabolite, protein, sepsis, sQTL, transcript splicing


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Wellcome Trust (PhD studentship 222548/Z/21/Z); Wellcome Cambridge Trust Scholarship