Building a pipeline to solicit expert knowledge from the community to aid gene summary curation.
Publication Date
2020-01
Journal Title
Database : the journal of biological databases and curation
ISSN
1758-0463
Publisher
Oxford University Press
Volume
2020
Language
eng
Type
Article
This Version
AM
Physical Medium
Print
Citation
Antonazzo, G., Urbano, J. M., Marygold, S., Millburn, G., & Brown, N. (2020). Building a pipeline to solicit expert knowledge from the community to aid gene summary curation. Database : the journal of biological databases and curation, 2020. https://doi.org/10.1093/database/baz152
Abstract
Brief summaries describing the function of each gene’s product(s) are of great value to the research community, especially when interpreting genome-wide studies that reveal changes to hundreds of genes. However, manually writing such summaries, even for a single species, is a daunting task; for example, the Drosophila melanogaster genome contains almost 14,000 protein-coding genes. One solution is to generate summaries computationally, but such summaries often fail to capture the key functions or to express them eloquently. Here, we describe how we solicited help from the research community to generate manually written summaries of D. melanogaster gene function. Using the data within the FlyBase database, we developed a computational pipeline to identify researchers who have worked extensively on each gene. We e-mailed these researchers and asked them to draft a brief summary of the main function(s) of the gene’s product, which we then edited for consistency to produce a “gene snapshot”. This approach yielded 1,800 gene snapshot submissions within a three-month period. We discuss the general utility of this strategy for other databases that capture data from the research literature.
Database URL: https://flybase.org/
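The abstract does not state how the pipeline decides which researchers have "worked extensively" on a gene. Purely as a hypothetical sketch, assume each gene is linked to a list of publications and each publication to its author list; one simple heuristic is to count how many of a gene's papers each author appears on and contact the top-ranked authors. The input layout, the `rank_experts` function, the `top_n` cutoff and the placeholder gene and author names are all illustrative assumptions, not FlyBase's actual data model or selection criteria.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input layout: gene ID -> list of papers, each paper a list of author names.
GenePapers = Dict[str, List[List[str]]]


def rank_experts(gene_papers: GenePapers, top_n: int = 3) -> Dict[str, List[str]]:
    """Return, for each gene, the authors who appear on the most of its papers.

    Assumed heuristic for illustration only: the number of papers an author
    has on a gene is used as a proxy for having worked extensively on it.
    """
    experts: Dict[str, List[str]] = {}
    for gene, papers in gene_papers.items():
        counts: Counter = Counter()
        for authors in papers:
            # Count each author at most once per paper.
            counts.update(set(authors))
        # Sort by descending paper count, then alphabetically so ties are stable.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        experts[gene] = [name for name, _ in ranked[:top_n]]
    return experts


# Placeholder data: one hypothetical gene with four papers.
example = {
    "geneA": [
        ["Author A", "Author B"],
        ["Author A"],
        ["Author B", "Author C"],
        ["Author A", "Author C"],
    ],
}
print(rank_experts(example, top_n=2))  # → {'geneA': ['Author A', 'Author B']}
```

This sketch only illustrates the counting step; it ignores factors such as author position, publication date or whether an author is still active, any of which a real selection step might need to consider.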
Keywords
Animals, Drosophila melanogaster, Data Collection, Software, Databases, Genetic, Genome, Insect
Sponsorship
Medical Research Council (UK) [G1000968 and MR/N030117/1]
National Human Genome Research Institute at the National Institutes of Health [U41 HG00739]
Funder references
MRC (G1000968)
MRC (MR/N030117/1)
National Institutes of Health (NIH) (via Harvard School of Public Health) (132685-5104589)
National Institutes of Health (NIH) (via Harvard University) (132685-5104589)
National Institutes of Health (NIH) (via Harvard University) (132626-5085854)
Embargo Lift Date
2022-12-19
Identifiers
External DOI: https://doi.org/10.1093/database/baz152
This record's URL: https://www.repository.cam.ac.uk/handle/1810/300148
Rights
All rights reserved