Curation, characterisation and prediction of Drosophila signalling pathway members
Authors
Antonazzo, G.
Advisors
Brown, Nicholas H
Date
2022-01-01
Awarding Institution
University of Cambridge
Qualification
Doctor of Philosophy (PhD)
Type
Thesis
Citation
Antonazzo, G. (2022). Curation, characterisation and prediction of Drosophila signalling pathway members (Doctoral thesis). https://doi.org/10.17863/CAM.84682
Abstract
Signalling pathways are key to virtually every aspect of the biology of multicellular organisms. Extensive research in Drosophila melanogaster has greatly contributed to the understanding of these pathways, but a central resource distilling the vast literature on the topic has been lacking. At the same time, there is now a large amount of publicly available functional genomics data in Drosophila that, if appropriately analysed, might be able to contribute to further progress in the study of signalling pathways. Here, I describe an effort to systematize what is currently known about which genes are part of Drosophila pathways and use the resulting resource as the foundation for machine learning analyses, aiming to address whether existing data can be used to predict novel pathway members.
First, I describe my contribution to a systematic review of the literature on Drosophila signalling pathways. High-confidence lists of member genes were established for 16 pathways, and annotated using the Gene Ontology controlled vocabulary. The results of this review have been presented in a publicly available resource in the FlyBase database. Second, I performed analyses of various published data aiming to characterise the biological properties of genes within pathways. These analyses showed that members of a given pathway have correlated mRNA expression profiles and higher numbers of both physical and genetic interactions with each other than expected by chance, but do not show strong trends of having arisen in the same period during the history of life. Pathway members also have fewer loss-of-function variants in natural Drosophila populations than other genes, highlighting their biological importance. Third, I established a machine learning pipeline that makes use of these various types of data to predict new candidate pathway members, using the annotated members as positive training examples. The predictions displayed high accuracy in recognising true annotated members held out from training, suggesting that the predicted new members are useful candidates for future experimental work.
Overall, the work presented here highlights the importance of systematic curation of published findings to biological research. It also demonstrates how such curation, when combined with computational analyses of published data, can contribute to continued progress in the study of Drosophila signalling pathways.
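As an illustration of the kind of test behind the statement that pathway members have more interactions with each other than expected by chance, the following is a minimal sketch of a permutation test. It is not taken from the thesis: the function names, the toy gene identifiers (g0, g1, ...), the toy interaction list and the choice of a simple random-gene-set null model are all assumptions made for this example.

```python
import random
from itertools import combinations


def count_internal_edges(genes, edges):
    """Count interactions whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def permutation_p_value(pathway, all_genes, edges, n_perm=1000, seed=0):
    """Compare within-pathway interactions against random gene sets.

    Returns the observed count and an empirical one-sided p-value.
    The null model (uniform random gene sets of the same size) is an
    assumption of this sketch, not the method used in the thesis.
    """
    rng = random.Random(seed)
    observed = count_internal_edges(pathway, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(pathway))
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 50 genes, a 5-gene "pathway" that is fully
# interconnected, plus a few unrelated background interactions.
all_genes = [f"g{i}" for i in range(50)]
pathway = all_genes[:5]
edges = list(combinations(pathway, 2)) + [("g10", "g20"), ("g11", "g30"), ("g12", "g40")]

observed, p_value = permutation_p_value(pathway, all_genes, edges)
print(observed, p_value)
```

A small p-value here only says that the toy pathway is more densely connected than random gene sets of the same size. A real analysis would also need to account for confounders such as how intensively each gene has been studied.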
Keywords
Signalling pathways, Drosophila, Gene Ontology, Machine learning, Biocuration
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.84682