
Modelling fitness and stability of G protein-coupled receptor variants


Abstract

G protein-coupled receptors (GPCRs) are the molecular targets of more than a third of approved drugs [1], which are used in a wide variety of diseases [2]. Protein-altering genetic variants have complex effects on the biophysical and functional properties of GPCRs. Understanding these effects using computational variant effect prediction methods could help engineer improved GPCR constructs for in vitro experiments [3] and improve the understanding of GPCR gene essentiality and gene-disease links [4]. Recently, machine learning (ML) models have made significant advances in protein structure prediction [5] and variant effect prediction [6]. However, use of ML models in high-stakes decisions is challenging because the mechanism by which these predictions are derived from the underlying data is not well understood [7]. In this thesis, I aimed to generate interpretable predictions of the effects of GPCR variants on stability and clinical phenotypes.

In Chapter 2, I aimed to improve the interpretability of ML models for GPCR variant stability. To this end, I built a model which combines evolutionary fitness prediction with structure-based Monte Carlo simulations to predict the relative stability of GPCR mutants. My model performed competitively with other proposed models for prediction of the stability of GPCR mutants in detergent. Combining these features with experimental measurements of GPCR expression further improved predictive performance beyond that reported for previous methods. Because of the simple structure of my model, the contributions of the input features could be checked easily, allowing the number of input features to be reduced without loss of performance.

In Chapter 3, I aimed to test whether estimates of GPCR gene essentiality could be improved by combining information from missense and loss-of-function variants. I found that 50% of genes in the GPCR superfamily have too few expected loss-of-function variants to be ranked in the top quintile of genes by the popular LOEUF metric for human gene essentiality, and that this is a result of reduced splicing in GPCR genes compared to other genes of similar length. I constructed a metric combining evidence for depletion of missense and loss-of-function variants. This metric discriminated GPCR genes associated with lethal phenotypes on knockout in mice from non-essential genes more effectively than loss-of-function constraint alone, while performing similarly for non-GPCR essential genes. I concluded that combined metrics of constraint against predicted loss-of-function (pLoF) and missense variants could improve the discovery of disease-linked GPCR genes.

In Chapter 4, I aimed to characterise missense variants in selected CXC subfamily chemokine GPCRs associated with an immunological phenotype, neutrophil count, using sequence-based variant effect prediction and structural analysis. I found that variants in CXCR2 with low predicted fitness, or at key functional sites for chemokine ligand binding and G protein signalling, are more likely to be associated with changes in neutrophil count. However, I did not observe a similar effect for CXCR1. I concluded that rare variants which reduce CXCR2 function are likely to be associated with neutrophil count in humans.

Overall, I have shown that single-amino acid variants in GPCRs can be characterised effectively using interpretable computational methods, and that the essentiality and phenotypic associations of GPCR genes can be estimated through prediction of the protein-level effects of missense variants.

Date

2024-09-19

Advisors

Bender, Andreas

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as All rights reserved

Sponsorship

Biotechnology and Biological Sciences Research Council (2275936)