Computational Discovery of Bacterial Fibrillar Adhesins and Adhesive Domains


Type

Thesis

Authors

Monzon, Vivian 

Abstract

Fibrillar adhesins are filamentous bacterial surface proteins, which can play a key role in host-pathogen interactions. They have a characteristic protein architecture with repeating domains, called stalk domains, folding into a rod-like stalk structure. An adhesive domain is positioned at the tip of the stalk. This study aims to provide a comprehensive characterisation of this protein class as well as to discover novel fibrillar adhesins and adhesive domains.

Using a collection of known adhesive and stalk domains, a domain-based search for fibrillar adhesins was conducted, yielding over 3,500 protein hits in the UniProt Reference Proteomes, spread widely across the bacterial tree of life. These proteins were called fibrillar adhesin-like (FA-like) proteins. Investigating them in depth revealed different adhesive and stalk domain combinations and distinct protein architectures across bacterial phyla. It also revealed identifying features for the development of a machine learning (ML)-based discovery approach. This approach was applied to the Firmicutes and Actinobacteria UniProt Reference Proteomes, detecting over 5,000 FA-like proteins that were missed by the domain-based approach, including proteins without a known adhesive and/or stalk domain. Exploring these proteins with a focus on those lacking a known adhesive domain enabled the discovery of potential novel adhesive domain families. Using AlphaFold2, the structures of these domains were predicted to identify their potential function. The AlphaFold2 release also enabled the detection of FA-like proteins using adhesive and stalk domain structures. Here, the TMalign and Foldseek structure aligners were compared, with Foldseek showing a higher concordance with the sequence-based discovery approaches. Integrating structural features into the ML-based discovery approach improved its precision when tested on three bacterial proteomes.

To find out more about the function of the detected FA-like proteins, protein interaction prediction methods were tested with a focus on AlphaFold-Multimer. The results showed limitations in confidently predicting a binding target. The challenges encountered were discussed, as well as an AlphaFold-induced increase in the development of alternative methods tackling these challenges.

Date

2023-03-01

Advisors

Bateman, Alexander

Keywords

Adhesive Proteins, AlphaFold, Bacteria, Bioinformatics, Host-pathogen interaction, Protein domain families

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EMBL International PhD fellowship