Network modularity and local environment similarity as descriptors of protein structure


Type

Thesis

Authors

Abstract

As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Here we explore two approaches for analysing protein structure, both starting from the three-dimensional co-ordinates of each atom within the structure, which are then abstracted into a more useful form.

The first method transforms the protein into a network in which its amino acids are the nodes and the edges are generated using a simple proximity test. By applying the Infomap community detection algorithm, we can fragment the protein into highly intra-connected subregions. These subregions are compact and globular, and can be compared with known structural and functional subunits of the protein (also known as domains). By performing this fragmentation systematically across a large set of proteins and checking for structurally conserved fragments, we can search for novel candidate domains. This method for automatically decomposing a protein into compact substructures may also be useful for coarse-graining molecular dynamics, analysing the protein’s topology, de novo protein design, or fitting electron density maps derived from single-particle electron microscopy.
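
As an illustration of the network construction step only, the following minimal sketch builds a residue contact network from one representative point per residue. The function name `build_contact_network`, the 8 Å cutoff, and the use of a single coordinate per residue are assumptions made for this example; the abstract states only that a simple proximity test is used. The community detection step (Infomap) is not reproduced here.

```python
import math
from itertools import combinations

def build_contact_network(coords, cutoff=8.0):
    """Return an adjacency map {residue index: set of neighbouring indices}.

    coords: one (x, y, z) point per residue (hypothetical choice, e.g. C-alpha).
    cutoff: distance threshold in angstroms (assumed value, not from the thesis).
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        # Proximity test: connect two residues if they lie within the cutoff.
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

# Toy input: two tight clusters of three "residues" placed 20 angstroms apart.
toy_coords = [
    (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0),
    (20.0, 0.0, 0.0), (23.0, 0.0, 0.0), (20.0, 3.0, 0.0),
]
network = build_contact_network(toy_coords)
```

In this toy case each cluster is fully connected internally and has no edges to the other cluster, which is the kind of densely intra-connected region a community detection algorithm would be expected to separate.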

The second method calculates, for each atom of the protein, a descriptor of its local environment known as a Smooth Overlap of Atomic Positions (SOAP) descriptor. Using these descriptors we can perform overall comparisons of the subregions identified above. In addition, by comparing the descriptors of a set of proteins known to share common structural or functional features (such as binding of a particular ligand), we can automatically identify the most highly conserved atoms of the set. These atoms may line ligand binding pockets or correspond to allosteric sites, which could inform drug design.
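
To make the idea of comparing descriptors concrete, the sketch below assumes each atom's descriptor is already available as a plain numeric vector and compares two vectors with a normalised dot product raised to a power `zeta`, a form commonly used for SOAP vectors. The scoring scheme in `conservation_scores` (average best match across the other proteins) is a hypothetical choice for illustration and is not taken from the thesis; both function names are likewise invented for this example.

```python
import math

def soap_similarity(p, q, zeta=1):
    """Normalised dot-product similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Guard against zero vectors, which have no defined direction.
    return (dot / norm) ** zeta if norm else 0.0

def conservation_scores(reference, others):
    """Score each reference atom by its average best match in the other proteins.

    reference: list of descriptor vectors, one per atom of a reference protein.
    others: list of proteins, each a list of descriptor vectors.
    """
    scores = []
    for p in reference:
        best = [max(soap_similarity(p, q) for q in protein) for protein in others]
        scores.append(sum(best) / len(best))
    return scores

# Toy descriptors: the first reference atom has a close match in both other
# proteins; the second has none.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
others = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]],
]
scores = conservation_scores(reference, others)
```

Atoms with scores close to 1 have a near-identical local environment in every protein of the set, so under this assumed scheme they would be the candidates for conserved positions.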

Date

2019-12-01

Advisors

Ahnert, Sebastian

Keywords

computational biology, protein structure, network science

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EPSRC (1644501)