
Modelling the structural, functional and phenotypic consequences of protein coding mutations





Dunham, Alistair 


Proteins are integral to all cellular processes and underpin the function of all extant organisms, so variants affecting them are a primary cause of phenotypic variation. Protein coding variants are a key area of study in biology, with relevance ranging from structural and molecular biology to population genetics. They are also medically important, contributing to inherited genetic diseases and cancer and affecting the response to pathogens. Recent advances in high-throughput experimental techniques have opened the door to many new approaches in biology, and the study of protein variants is no exception. Deep mutational scanning experiments exhaustively measure the fitness of variants in a protein, providing more experimentally validated measurements of mutational consequences than ever before. These advances, together with ever larger sequence and structure databases, create an opportunity to apply large-scale analyses to coding variation and to study its effects on protein structure, function and phenotype.

In this thesis I perform three large-scale variant analyses. First, I use the consequences of variation to learn about protein structure and function. I compile a dataset from 28 deep mutational scanning studies, covering 6291 positions in 30 proteins, and use the consequences of mutation at each position to define a mutational landscape. I show that this landscape captures rich biophysical relationships, and I identify functionally distinct positional subtypes of each amino acid. In the second analysis, I explore genotype-to-phenotype prediction using a dataset of 1011 S. cerevisiae strains with genotypes, transcriptomics, proteomics and measured phenotypes, together with comprehensive gene deletions in four strains. I show that knowledge-based models of mutational consequences and pathway function can be used to associate genes with phenotypes and to predict growth phenotypes across 34 growth conditions. However, I find that genetic background has a large effect on variant consequences, to the extent that the same deletion can be highly significant in one strain and have no effect in another. Finally, I analyse computational variant effect prediction, benchmarking current predictors against deep mutational scanning data. I then develop a new end-to-end deep convolutional neural network that predicts variant consequences directly from sequence and structure, and show that it improves on current methods. Together, these projects advance our knowledge of protein coding variation and enhance our capacity to link variation to its impacts on structure, function and phenotype.





Beltrao, Pedro


Bioinformatics, Protein, Mutation, Deep Learning


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge