Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews.

Shemilt, Ian; Simon, Antonia; Hollands, Gareth J; Marteau, Theresa M; Ogilvie, David; O'Mara-Eves, Alison; Kelly, Michael P; Thomas, James

Published version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/275673

Repository DOI

https://doi.org/10.17863/CAM.22931

Files

Published version (1.31 MB)

Type

Article

Authors

Shemilt, Ian

Simon, Antonia

Hollands, Gareth J

Marteau, Theresa M

Ogilvie, David

https://orcid.org/0000-0002-0270-4672

O'Mara-Eves, Alison

Kelly, Michael P

Thomas, James

Abstract

In scoping reviews, boundaries of relevant evidence may be initially fuzzy, with refined conceptual understanding of interventions and their proposed mechanisms of action an intended output of the scoping process rather than its starting point. Electronic searches are therefore sensitive, often retrieving very large record sets that are impractical to screen in their entirety. This paper describes methods for applying and evaluating the use of text mining (TM) technologies to reduce impractical screening workload in reviews, using examples of two extremely large-scale scoping reviews of public health evidence (choice architecture (CA) and economic environment (EE)). Electronic searches retrieved >800,000 (CA) and >1 million (EE) records. TM technologies were used to prioritise records for manual screening. TM performance was measured prospectively. TM reduced manual screening workload by 90% (CA) and 88% (EE) compared with conventional screening (absolute reductions of ≈430,000 (CA) and ≈378,000 (EE) records). This study expands an emerging corpus of empirical evidence for the use of TM to expedite study selection in reviews. By reducing screening workload to manageable levels, TM made it possible to assemble and configure large, complex evidence bases that crossed research discipline boundaries. These methods are transferable to other scoping and systematic reviews incorporating conceptual development or explanatory dimensions.
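
The percentage and absolute reductions reported in the abstract can be related with simple arithmetic. The sketch below assumes that workload reduction is defined as the number of records not screened manually divided by the number of records conventional screening would have covered. That definition and the function name implied_baseline are assumptions made for illustration, not taken from the paper. Under that assumption the implied baselines are smaller than the raw search yields (>800,000 and >1 million records), which would be consistent with the percentages being measured against a reduced record set (for example, after de-duplication), although the abstract does not say so.

```python
def implied_baseline(relative_reduction: float, absolute_reduction: int) -> int:
    """Number of records conventional screening would have covered,
    implied by a relative and an absolute workload reduction.

    Assumes relative_reduction = absolute_reduction / baseline
    (an illustrative definition, not confirmed by the abstract).
    """
    return round(absolute_reduction / relative_reduction)


# Figures as reported in the abstract; the absolute reductions are approximate.
reported = {
    "CA": (0.90, 430_000),
    "EE": (0.88, 378_000),
}

for review, (relative, absolute) in reported.items():
    baseline = implied_baseline(relative, absolute)
    screened = baseline - absolute  # records still screened manually
    print(f"{review}: implied baseline ~{baseline:,}, screened manually ~{screened:,}")
```

Because the reported absolute reductions are rounded, the implied baselines are rough estimates only.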

Keywords

scoping review methods; study selection; systematic review methods; text mining; Data Mining; Machine Learning; Natural Language Processing; Pattern Recognition, Automated; Periodicals as Topic; Review Literature as Topic; Vocabulary, Controlled; Workload

Journal Title

Res Synth Methods

Journal ISSN

1759-2879
1759-2887

Volume Title

5

Publisher

Wiley

Publisher DOI

https://doi.org/10.1002/jrsm.1093

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Sponsorship

Medical Research Council (MC_UU_12015/6)
Wellcome Trust (087636/Z/08/Z)
Economic and Social Research Council (ES/G007462/1)
Medical Research Council (MR/K023187/1)
Medical Research Council (MC_UP_1001/1)

Collections

Scholarly Works - Institute of Public Health