Machine Learning for Workflow Applications in Screening Mammography: Systematic Review and Meta-Analysis.
Background Advances in computer processing and improvements in data availability have led to the development of machine learning (ML) techniques for mammographic imaging. Purpose To evaluate the reported performance of stand-alone ML applications for screening mammography workflow. Materials and Methods Ovid Embase, Ovid Medline, Cochrane Central Register of Controlled Trials, Scopus, and Web of Science literature databases were searched for relevant studies published from January 2012 to September 2020. The study was registered with the PROSPERO International Prospective Register of Systematic Reviews (protocol no. CRD42019156016). Stand-alone technology was defined as a ML algorithm that can be used independently of a human reader. Studies were quality assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 and the Prediction Model Risk of Bias Assessment Tool, and reporting was evaluated using the Checklist for Artificial Intelligence in Medical Imaging. A primary meta-analysis included the top-performing algorithm and corresponding reader performance from which pooled summary estimates for the area under the receiver operating characteristic curve (AUC) were calculated using a bivariate model. Results Fourteen articles were included, which detailed 15 studies for stand-alone detection (n = 8) and triage (n = 7). Triage studies reported that 17%-91% of normal mammograms identified could be read by adapted screening, while "missing" an estimated 0%-7% of cancers. In total, an estimated 185 252 cases from three countries with more than 39 readers were included in the primary meta-analysis. The pooled sensitivity, specificity, and AUC was 75.4% (95% CI: 65.6, 83.2; P = .11), 90.6% (95% CI: 82.9, 95.0; P = .40), and 0.89 (95% CI: 0.84, 0.98), respectively, for algorithms, and 73.0% (95% CI: 60.7, 82.6), 88.6% (95% CI: 72.4, 95.8), and 0.85 (95% CI: 0.78, 0.97), respectively, for readers. Conclusion Machine learning (ML) algorithms that demonstrate a stand-alone application in mammographic screening workflows achieve or even exceed human reader detection performance and improve efficiency. However, this evidence is from a small number of retrospective studies. Therefore, further rigorous independent external prospective testing of ML algorithms to assess performance at preassigned thresholds is required to support these claims. ©RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Whitman and Moseley in this issue.
Cancer Research UK (C96/A25177)
Engineering and Physical Sciences Research Council (EP/M00483X/1)
Engineering and Physical Sciences Research Council (EP/N014588/1)