Computational discovery and modelling of tandem domain repeats in proteins


Type

Thesis

Authors

Lafita Masip, Aleix  https://orcid.org/0000-0003-1549-3162

Abstract

Domains are functional and evolutionary units of proteins that typically fold into stable globular structures. A small subset of natural multidomain proteins contain large arrays of nearly identical domains repeated in tandem, challenging some of our assumptions about protein folding and evolution. In this study, I aim to discover new tandem domain repeats, characterise their sequence and structural properties and understand their roles in the function of proteins. I start by using computational sequence analysis tools across large datasets of proteins and bacterial genomes to survey the prevalence and distribution of tandem domain repeats across organisms and domain families. Next, I computationally analyse and compare structures of domains found as tandem repeats, several of which have been experimentally determined by our collaborators in the course of this study. I finally develop two computational methods to systematically model the structure and misfolding energetics of tandem domain repeats. Nearly identical tandem domain repeats are rare in natural proteins (below 0.1%) and their sequences are highly biased in amino acid composition. Many of them have structural roles in bacterial surface proteins implicated in biofilm formation and host colonisation; new examples of such proteins, named "Periscope proteins", show rapid domain repeat number variation, a molecular mechanism used to modulate bacterial phenotype. Tandem domain repeat structures reveal unusual structural malleability, with numerous cases of domain atrophy (loss of core secondary structures) and elaboration. They are also predicted to be more resistant to misfolding via tandem domain swapping, with potential misfolding-resistant mechanisms such as the domain topology and length.
This study improves our understanding of the prevalence, type and function of tandem domain repeats in proteins, in particular their role as structural elements in bacterial surface proteins, and suggests new protein and domain targets for further experimental characterisation. It also has important implications for protein misfolding and for the design and engineering of multidomain proteins.

Date

2021-02-24

Advisors

Bateman, Alex

Keywords

protein modelling, protein classification, protein domains, protein misfolding, multidomain proteins, tandem domain repeats

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EMBL International PhD fellowship