Integration and analysis of protein evolutionary relationships and small molecule bioactivity data
University of Cambridge
European Bioinformatics Institute
Krüger, F. (2014). Integration and analysis of protein evolutionary relationships and small molecule bioactivity data (doctoral thesis). https://doi.org/10.17863/CAM.15984
Interactions of small organic molecules and proteins have been studied extensively in the search for therapeutic drugs. Historically, the two interaction partners have been assigned to separate scientific disciplines: small organic molecules to chemistry, proteins to biology. Likewise, chemical and biological data have been stored and maintained separately. The aim of my thesis was to integrate the ChEMBL database, a public repository of small molecule bioactivity measurements, with resources of protein evolutionary relationships, and to exploit these new links to further our understanding of small molecule bioactivity. To link biological assays via the evolutionary relationships of their protein targets, I established a mapping of small molecule binding to specific structural protein domains, the fundamental building blocks of protein architecture and evolution. By mapping small molecule binding to protein domains, I was able to examine links between the properties of small molecules and the evolutionary units that mediate their binding. I used domain definitions from Pfam, a database of protein domains derived from conserved sequence blocks. The mapping is now an integral part of the ChEMBL database and can be used to restrict sequence-based queries to the sequence regions that are relevant to small molecule binding.

Further, I integrated information from the homology resource EnsemblCompara Genetrees with bioactivity data from ChEMBL to examine the conservation of small molecule potency between homologous proteins within and across species. Potency differences between related proteins are a useful indicator of small molecule specificity. Specificity is an early milestone for most drug discovery projects because it allows a desired process to be manipulated in a targeted manner, with side effects kept to a minimum.
I examined pairs of closely related human proteins and found that potency differences were, overall, greater than the estimated background noise. In a cross-species comparison using the same integration approach, I observed that potency differences between pairs of related proteins in human and rat were, overall, no greater than the background noise. This is relevant to the use of model organisms in drug discovery, which relies on extrapolating from a response measured in one species to a therapeutic effect in humans. Taken together, I have integrated small molecule bioactivity data with protein evolutionary data from two resources, Pfam and EnsemblCompara Genetrees. This has provided a framework for studying small molecule binding in the context of protein evolution.
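The comparison of potency differences between related proteins against background noise can be illustrated with a small Python sketch. The abstract does not say how the noise was estimated or which statistical test was used, so everything here is an assumption for illustration only: potencies are on a -log10 molar scale (comparable to ChEMBL's pChEMBL value), noise is taken from repeat measurements of the same compound against the same protein, and the two distributions are compared crudely by their medians. The function names and all data values are hypothetical.

```python
from itertools import combinations
from statistics import median

# Hypothetical sketch; not the thesis method. Potencies are assumed to be
# on a -log10(molar) scale, e.g. comparable to ChEMBL's pChEMBL value.


def potency_differences(potencies_a, potencies_b):
    """Absolute potency differences for compounds measured on both proteins."""
    shared = potencies_a.keys() & potencies_b.keys()
    return [abs(potencies_a[c] - potencies_b[c]) for c in sorted(shared)]


def replicate_differences(replicates):
    """Assumed noise model: absolute differences between repeat measurements
    of the same compound against the same protein."""
    diffs = []
    for values in replicates.values():
        diffs.extend(abs(x - y) for x, y in combinations(values, 2))
    return diffs


def exceeds_noise(pair_diffs, noise_diffs):
    """Crude check: is the median pair difference above the median noise?"""
    return median(pair_diffs) > median(noise_diffs)


# Made-up example data for two related proteins and a replicate set.
protein_a = {"cpd1": 7.2, "cpd2": 6.1, "cpd3": 8.0}
protein_b = {"cpd1": 5.9, "cpd2": 6.0, "cpd3": 6.5, "cpd4": 5.0}
replicates = {"cpd1": [7.2, 7.0], "cpd5": [6.4, 6.9, 6.6]}

pair_diffs = potency_differences(protein_a, protein_b)
noise = replicate_differences(replicates)
print(exceeds_noise(pair_diffs, noise))  # True
```

The same functions apply unchanged to a within-species pair (two human proteins) or a cross-species pair (a human protein and its rat counterpart); only the input tables differ. A real analysis would replace the median comparison with a proper statistical test and a justified noise estimate.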