
The Influence of Structural Constraints on Protein Evolution


Type

Thesis

Authors

Perron, Umberto 

Abstract

Few mathematical models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and the increasing availability of structural data. The primary goal of my PhD project was to create a structurally aware amino acid substitution model in which proteins are represented using an expanded alphabet that relays both amino acid identity and structural information. Each character in this alphabet specifies an amino acid as well as information about the rotamer configuration of its side chain: the discrete geometric pattern of permitted side chain atomic positions, as defined by the dihedral angles between covalently linked atoms. I generated a 55-state “Dayhoff-like” substitution model (RAM55) by assigning rotamer states in 79,558 structures (∼50% of all PDBe entries) and identifying substitutions between closely related sequences. RAM55’s rotamer state exchange patterns clearly show that the evolutionary properties of amino acids depend strongly upon side chain geometry. Exploiting knowledge of these patterns assists in phylogenetic analyses: I show that RAM55 performs as well as or better than traditional 20-state models on simulated and empirical data for divergence time estimation, tree inference, side chain configuration prediction and ancestral sequence reconstruction.

Further, encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of RAM55 to 20-state amino acid data for which structures are not known. Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in excellent ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. This strategy significantly expands the applicability of RAM55 to real-world scenarios where structure might only be available for some of the sequences of interest.

Thus, not only is rotamer configuration a valuable source of information for phylogenetic studies, but modelling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.
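The ambiguity encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common phylogenetic convention of giving each alignment character a 0/1 compatibility vector over the model's states; whether the thesis implements it exactly this way is an assumption. The toy alphabet `STATES` and the helpers `amino_acid_of` and `tip_vector` are hypothetical and do not reproduce the actual RAM55 state list.

```python
# Hypothetical toy alphabet of (amino acid, rotamer) states.
# This is NOT the real 55-state RAM55 alphabet.
STATES = ["ALA", "GLY", "SER1", "SER2", "SER3", "VAL1", "VAL2", "VAL3"]
INDEX = {state: i for i, state in enumerate(STATES)}


def amino_acid_of(state):
    """Strip the trailing rotamer number, e.g. 'SER2' -> 'SER'."""
    return state.rstrip("0123456789")


def tip_vector(observation):
    """Return a 0/1 vector marking every state compatible with the observation.

    A fully specified state (structure known) marks exactly one entry.
    A bare amino acid (structure unknown) marks all of its rotamer states.
    """
    vector = [0] * len(STATES)
    if observation in INDEX:
        vector[INDEX[observation]] = 1
        return vector
    compatible = [i for state, i in INDEX.items()
                  if amino_acid_of(state) == observation]
    if not compatible:
        raise ValueError(f"unknown character: {observation!r}")
    for i in compatible:
        vector[i] = 1
    return vector


known = tip_vector("SER2")    # structure available: one state
unknown = tip_vector("SER")   # sequence only: all SER rotamer states
```

Under this sketch, a sequence without a structure still contributes its amino acid identity, while the rotamer state at each site is left for the model to resolve.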

Description

Date

2020-05

Advisors

Goldman, Nick

Keywords

protein, evolution, protein structure, phylogenetics, modelling

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge