Text mining for improved exposure assessment
Larsson, K., Baker, S., Silins, I., Guo, Y., Stenius, U., Korhonen, A., & Berglund, M. (2017). Text mining for improved exposure assessment. PLOS ONE, 12(3), e0173132. https://doi.org/10.1371/journal.pone.0173132
Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is undoubtedly a great asset to the scientific community. However, manual retrieval of relevant published information is extremely time-consuming, and obtaining an overview of the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy created specifically for exposure information. Natural Language Processing (NLP) techniques were then used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can assist researchers by facilitating information retrieval and classification, enabling recognition of data gaps, and providing an overview of the available scientific literature through chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system.
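The annotate, extract features, train, classify workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the learning algorithm or the exact features used, so the following is an assumption-laden stand-in: a bag-of-words multinomial Naive Bayes classifier written with the Python standard library only. The class name `NaiveBayesTextClassifier`, the two labels, and the four training sentences are all hypothetical and invented for illustration; they are not taken from the annotated corpus or the exposure taxonomy.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)        # number of documents per label
        self.word_counts = defaultdict(Counter)  # per-label token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (invented; NOT from the annotated corpus).
train_texts = [
    "lead concentrations were measured in blood and urine samples",
    "phthalate metabolites in urine of pregnant women",
    "lead levels in indoor dust and air samples",
    "phthalates measured in household dust and drinking water",
]
train_labels = [
    "biomonitoring",
    "biomonitoring",
    "environmental monitoring",
    "environmental monitoring",
]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
label_a = model.predict("urine biomarkers of phthalate exposure")
label_b = model.predict("dust and air samples indoors")
```

A real system of the kind the abstract describes would likely assign several taxonomy labels per abstract and use richer semantic and syntactic features than raw word counts; this sketch assigns a single label per text purely to keep the example short.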
biomarkers, phthalates, taxonomy, blood, inhalation, lead (element), urine, information retrieval
S.B. received funding from the Commonwealth Scholarship Commission (http://cscuk.dfid.gov.uk/) and the Cambridge Trust (https://www.cambridgetrust.org/). A.K. received funding from Medical Research Council UK grant MR/M013049/1.
Medical Research Council (MR/M013049/1)
External DOI: https://doi.org/10.1371/journal.pone.0173132
This record's URL: https://www.repository.cam.ac.uk/handle/1810/264007
Attribution 4.0 International