
dc.contributor.author: Guo, Yufan [en]
dc.contributor.author: Korhonen, Anna-Leena [en]
dc.contributor.author: Liakata, Maria [en]
dc.contributor.author: Silins, Ilona [en]
dc.contributor.author: Hogberg, Johan [en]
dc.contributor.author: Stenius, Ulla [en]
dc.date.accessioned: 2011-06-16T15:45:27Z
dc.date.available: 2011-06-16T15:45:27Z
dc.date.issued: 2011-03-08 [en]
dc.identifier.citation: BMC Bioinformatics 2011, 12:69
dc.identifier.issn: 1471-2105
dc.identifier.uri: http://www.dspace.cam.ac.uk/handle/1810/237756
dc.description.abstract:
Background: Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts under sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking.
Methods: We take three schemes of different type and granularity - those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC) - and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA.
Results: Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme.
Conclusions: We have shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.
dc.language: English [en]
dc.language.iso: en
dc.title: A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment [en]
dc.type: Article
dc.date.updated: 2011-06-16T15:45:27Z
dc.description.version: RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are. [en]
dc.rights.holder: Guo et al.; licensee BioMed Central Ltd.
prism.publicationDate: 2011 [en]
dcterms.dateAccepted: 2011-03-08 [en]
rioxxterms.versionofrecord: 10.1186/1471-2105-12-69 [en]
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved [en]
rioxxterms.licenseref.startdate: 2011-03-08 [en]
dc.identifier.eissn: 1471-2105
rioxxterms.type: Journal Article/Review [en]
pubs.funder-project-id: EPSRC (EP/G051070/1)
pubs.funder-project-id: MRC (G0601766)


