
dc.contributor.author | Hawizy, Lezan | en
dc.contributor.author | Jessop, David | en
dc.contributor.author | Adams, Nico | en
dc.contributor.author | Murray-Rust, Peter | en
dc.date.accessioned | 2011-06-17T05:00:15Z
dc.date.available | 2011-06-17T05:00:15Z
dc.date.issued | 2011-05-16 | en
dc.identifier.citation | Journal of Cheminformatics 2011, 3:17
dc.identifier.issn | 0065-7727
dc.identifier.uri | http://www.dspace.cam.ac.uk/handle/1810/238153
dc.description.abstract | Background: The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Results: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). Conclusions: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with > 99.5% precision.
dc.language | English | en
dc.language.iso | en
dc.title | ChemicalTagger: A tool for semantic text-mining in chemistry | en
dc.type | Article
dc.date.updated | 2011-06-17T05:00:15Z
dc.description.version | RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are. | en
dc.rights.holder | Hawizy et al.; licensee BioMed Central Ltd.
prism.publicationDate | 2011 | en
dcterms.dateAccepted | 2011-05-16 | en
rioxxterms.versionofrecord | 10.1186/1758-2946-3-17 | en
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | en
rioxxterms.licenseref.startdate | 2011-05-16 | en
dc.identifier.eissn | 1758-2946
rioxxterms.type | Journal Article/Review | en

