Protein Shape Description and its Application to Shape Comparison
There are currently over 138,000 known macromolecular structures deposited in the wwPDB (Worldwide Protein Data Bank) database. While each structure file describes a single structure, the collection as a whole can be combined to obtain statistical information about macromolecules in general. This has been the basis for many structural biology methods, including the molecular replacement method used in X-ray crystallography and homologous-structure restraints in refinement. Given the success of methods based on prior information, it is plausible that novel methods could be developed, and current methods improved, by using further prior information; more specifically, by using the shape similarity of structure density maps instead of sequence or model similarity. Therefore, this project introduces a mathematical framework for computing three different measures of macromolecular three-dimensional shape similarity and demonstrates how these descriptors can be applied in symmetry detection and protein-domain clustering. The ability to detect cyclic (C), dihedral (D), tetrahedral (T), octahedral (O) and icosahedral (I) symmetry groups, and to compute all associated symmetry elements, has direct applications in map averaging and in reducing storage requirements by storing only the asymmetric part of a map. Moreover, the ability to find structures with similar shapes made it possible to reduce the size of the BALBES protein domain database by more than 18.7%, achieving a proportional speed-up in the search stages of its applications. Finally, the method described in this project has many possible applications throughout structural biology. 
The method could, for example, facilitate matching and fitting of protein domains into density maps produced by electron-microscopy techniques, or enable molecular-replacement candidate searches based on shape rather than sequence similarity. To support the development of further applications, software implementing the methods described here is also presented and released to the community.
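As a rough illustration of the storage saving from symmetry: a map with exact point-group symmetry is fully determined by one asymmetric unit plus the group's rotation operators, so only about 1/|G| of the map needs to be stored, where |G| is the number of rotations in the group. The Python sketch below uses the standard orders of the chiral point groups listed above (C_n: n, D_n: 2n, T: 12, O: 24, I: 60). The function names are hypothetical and not part of ProSHADE; the sketch assumes idealised, exact symmetry and ignores the small cost of storing the operators themselves.

```python
def point_group_order(symbol: str, n: int = 1) -> int:
    """Number of rotations in a chiral point group (hypothetical helper)."""
    if symbol in ("C", "D"):
        if n < 1:
            raise ValueError("fold n must be a positive integer")
        # C_n has n rotations; D_n adds n two-fold axes, giving 2n.
        return n if symbol == "C" else 2 * n
    # Polyhedral groups have a fixed number of rotations.
    fixed = {"T": 12, "O": 24, "I": 60}
    if symbol in fixed:
        return fixed[symbol]
    raise ValueError(f"unknown point group: {symbol}")


def asymmetric_fraction(symbol: str, n: int = 1) -> float:
    """Fraction of an exactly symmetric map that must be stored (assumption:
    ideal symmetry, operator storage overhead ignored)."""
    return 1.0 / point_group_order(symbol, n)


if __name__ == "__main__":
    for sym, n in [("C", 4), ("D", 3), ("T", 1), ("O", 1), ("I", 1)]:
        label = f"{sym}{n}" if sym in ("C", "D") else sym
        order = point_group_order(sym, n)
        frac = asymmetric_fraction(sym, n)
        print(f"{label}: {order} rotations, store {frac:.4f} of the map")
```

For an icosahedral virus capsid map, for instance, this idealised estimate means storing roughly 1/60 of the full map.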
ProSHADE Repository: http://fg.oisin.rc-harwell.ac.uk/projects/proshade/