Modelling the structural, functional and phenotypic consequences of protein coding mutations
View / Open Files
Authors
Dunham, Alistair
Advisors
Beltrao, Pedro
Date
2021-09-29Awarding Institution
University of Cambridge
Qualification
Doctor of Philosophy (PhD)
Type
Thesis
Metadata
Show full item recordCitation
Dunham, A. (2021). Modelling the structural, functional and phenotypic consequences of protein coding mutations (Doctoral thesis). https://doi.org/10.17863/CAM.82589
Abstract
Proteins are integral to all cellular processes and underpin the function of all extant organisms, meaning variants impacting them are a primary cause of phenotypic variation. Protein coding variants are a key area of study in biology, with relevance from structural and molecular biology to population genetics. They are also medically important, impacting inherited genetic diseases, cancer and response to pathogens. Recent advances in highthroughput experimental techniques have opened the door to many new approaches in biology, and protein variants are no exception. Deep mutational scanning experiments exhaustively measure the fitness of variants in a protein, which gives us more experimentally validated mutational consequence measurements than ever before. Such advances, together with ever larger sequence and structure databases, have created an opportunity to apply large scale analyses to coding variation, studying the effect on protein structure, function and phenotype.
In this thesis I perform three large scale variant analyses. First, I use the consequences of variation to learn about protein structure and function. I compile a dataset from 28 deep mutational scanning studies, covering 6291 positions in 30 proteins, and use the consequences of mutation at each position to define a mutational landscape. I show rich biophysical relationships in this landscape and identify functionally distinct positional subtypes of each amino acid. In the second analysis, I explore genotype to phenotype prediction using a dataset of 1011 S. cerevisiae strains, with genotypes, transcriptomics, proteomics and measured phenotypes, and comprehensive gene deletions in four strains. I show knowledge-based
models of mutational consequences and pathway function can be used to associate genes with phenotypes and predict growth phenotypes across 34 growth conditions. However, genetic background is found to have a large effect on variant consequences, to such an extent that the same deletion can be highly significant in one strain and have no effect in another. Finally, I analyse computational variant effect prediction, benchmarking current predictors using deep mutational scanning data. I then develop a new end-to-end deep convolutional neural network predictor that predicts consequences directly from sequence and structure and show it improves on current methods. Together these projects advance our knowledge of protein coding variation and enhance our capacity to link variation to impacts on structure, function and phenotype.
Keywords
Bioinformatics, Protein, Mutation, Deep Learning
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.82589
Rights
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Licence URL: https://creativecommons.org/licenses/by-nc-sa/4.0/
Statistics
Total file downloads (since January 2020). For more information on metrics see the
IRUS guide.
Recommended or similar items
The current recommendation prototype on the Apollo Repository will be turned off on 03 February 2023. Although the pilot has been fruitful for both parties, the service provider IKVA is focusing on horizon scanning products and so the recommender service can no longer be supported. We recognise the importance of recommender services in supporting research discovery and are evaluating offerings from other service providers. If you would like to offer feedback on this decision please contact us on: support@repository.cam.ac.uk