Genetic determinants of the human plasma proteome and their application in biology and disease
Abstract
Proteins are the primary functional units of biology and the direct targets of most drugs, yet there is limited knowledge of the genetic factors that determine inter-individual variation in protein levels (protein quantitative trait loci, pQTLs). Because of limitations in high-throughput proteomic measurement technology, well-powered genome-wide association studies of large numbers of proteins have so far lagged behind many of the other "omic" studies, such as transcriptomics and metabolomics. The complexity of human plasma adds to the challenge: it has a high dynamic range, with protein concentrations spanning several orders of magnitude, and contains a large number of low-abundance proteins. Using an expanded high-throughput multiplex aptamer-based proteomic assay with more than twice the proteome coverage of previous studies, I greatly extend existing knowledge of the genetic determinants of human plasma proteins by testing 10.6 million DNA variants against the levels of 2,994 proteins in 3,301 individuals. I identify 1,927 genetic associations with 1,478 proteins, replicating many previous associations and gaining novel insights into the genetic architecture of the human plasma proteome. I use several approaches to highlight the application of pQTLs to biology and disease. I show several examples linking distant pQTLs to biologically plausible genes and demonstrate mediation of distant pQTLs by local protein levels, highlighting the role of protein-protein interactions. In addition, I find epistatic effects of genetically determined phenotypes (blood group and secretor status) on protein levels. By linking pQTLs to previously reported disease associations, I show that disease-associated variants are enriched for pQTLs, and I provide insights into possible mechanisms underpinning some of the disease loci.
Finally, I identify causal roles for protein biomarkers in disease through multivariable Mendelian randomisation (MR) analysis, leveraging the simultaneous measurement of multiple functionally related proteins at a locus to account for potential pleiotropic effects. Whereas MR studies of plasma proteins have been constrained by the scarcity of suitable genetic instruments, the data generated here remedy this bottleneck by furnishing an extensive toolkit of such instruments. Overall, the work in this thesis foreshadows major advances in post-genomic science through the increasing application of novel bioassay technologies to major population biobanks.
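The idea behind multivariable MR can be illustrated with the standard inverse-variance-weighted (IVW) estimator. When several related proteins share genetic instruments at a locus, the variant-outcome associations are regressed jointly on the variant-protein associations, so each protein's estimate is its direct effect conditional on the others. The sketch below is a minimal illustration under stated assumptions, not the thesis's own pipeline. It assumes independent (uncorrelated) variants and harmonised summary statistics on a common scale. The function name `mvmr_ivw` and all numbers are hypothetical.

```python
import numpy as np

def mvmr_ivw(beta_x, beta_y, se_y):
    """Multivariable IVW MR estimate without an intercept (illustrative sketch).

    beta_x: (J, K) associations of J independent variants with K proteins
    beta_y: (J,) associations of the same variants with the outcome
    se_y:   (J,) standard errors of beta_y
    Returns K direct causal effect estimates, one per protein.
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2  # inverse-variance weights
    xtw = beta_x.T * w  # X^T W, with W = diag(w)
    # Weighted least squares: theta = (X^T W X)^{-1} X^T W y
    return np.linalg.solve(xtw @ beta_x, xtw @ beta_y)

# Hypothetical example: two proteins at one locus, instrumented by four variants.
bx = np.array([[0.30, 0.10],
               [0.05, 0.40],
               [0.20, 0.20],
               [0.10, 0.35]])
true_theta = np.array([0.5, -0.2])  # assumed direct effects, for illustration only
by = bx @ true_theta                # noiseless outcome associations
se = np.array([0.02, 0.03, 0.025, 0.02])

theta_hat = mvmr_ivw(bx, by, se)
print(theta_hat)  # approximately [0.5, -0.2]
```

Because the outcome associations here are noiseless, the estimator recovers the assumed effects. With real data, the estimates carry sampling error, and correlated variants would require a correlation-aware weighting matrix rather than the diagonal weights used here.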