Text mining for improved exposure assessment

Published version
Peer-reviewed

Type

Article

Authors

Larsson, K 
Silins, I 
Guo, Y 
Stenius, U 

Abstract

Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is a great asset to the scientific community. However, manually retrieving the relevant published information is extremely time-consuming, and gaining an overview of the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy created specifically for exposure information. Natural Language Processing (NLP) techniques were then used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can assist researchers by facilitating information retrieval and classification, enabling the recognition of data gaps, and providing an overview of the available scientific literature through chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system.

Keywords

biomarkers, phthalates, taxonomy, blood, inhalation, lead (element), urine, information retrieval

Journal Title

PLOS ONE

Journal ISSN

1932-6203

Volume Title

12

Publisher

PLOS

Sponsorship

Medical Research Council (MR/M013049/1)
Medical Research Council (G0601766)
S.B. received funding from the Commonwealth Scholarship Commission (http://cscuk.dfid.gov.uk/) and the Cambridge Trust (https://www.cambridgetrust.org/). A.K. received funding from the Medical Research Council UK (grant MR/M013049/1).