International evaluation of an AI system for breast cancer screening.

McKinney, Scott Mayer; Sieniek, Marcin; Godbole, Varun; Godwin, Jonathan; Antropova, Natasha; Ashrafian, Hutan; Back, Trevor; Chesus, Mary; Corrado, Greg S; Darzi, Ara; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Gilbert, Fiona J; Halling-Brown, Mark; Hassabis, Demis; Jansen, Sunny; Karthikesalingam, Alan; Kelly, Christopher J; King, Dominic; Ledsam, Joseph R; Melnick, David; Mostofi, Hormuz; Peng, Lily; Reicher, Joshua Jay; Romera-Paredes, Bernardino; Sidebottom, Richard; Suleyman, Mustafa; Tse, Daniel; Young, Kenneth C; De Fauw, Jeffrey; Shetty, Shravya

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/299195

Repository DOI

https://doi.org/10.17863/CAM.46260

Files

Accepted version (951.36 KB)

Type

Article

Abstract

Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful [1]. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives [2]. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
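
As an illustration of how an "absolute reduction" in false positives and false negatives can be read, the following is a minimal sketch, assuming the percentages are percentage-point differences between the error rates of a human reader and those of the AI system on the same cases. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def error_rates(labels: Sequence[int], decisions: Sequence[int]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    labels:    1 = cancer confirmed, 0 = no cancer (ground truth)
    decisions: 1 = recalled for follow-up, 0 = not recalled
    """
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    false_pos = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    false_neg = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    return false_pos / negatives, false_neg / positives


def absolute_reduction(reader_rate: float, ai_rate: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return (reader_rate - ai_rate) * 100.0


# Hypothetical toy data, for illustration only (not study data).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
ai     = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

reader_fpr, reader_fnr = error_rates(labels, reader)
ai_fpr, ai_fnr = error_rates(labels, ai)

fp_reduction = absolute_reduction(reader_fpr, ai_fpr)
fn_reduction = absolute_reduction(reader_fnr, ai_fnr)
print(f"Absolute reduction in false positives: {fp_reduction:.1f} percentage points")
print(f"Absolute reduction in false negatives: {fn_reduction:.1f} percentage points")
```

An absolute reduction compares the two rates by subtraction, so it differs from a relative reduction, which would divide the difference by the reader's rate.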

Keywords

Artificial Intelligence, Breast Neoplasms, Early Detection of Cancer, Female, Humans, Mammography, Reproducibility of Results, United Kingdom, United States

Journal Title

Nature

Journal ISSN

0028-0836
1476-4687

Volume Title

577

Publisher

Springer Science and Business Media LLC

Publisher DOI

https://doi.org/10.1038/s41586-019-1799-6

Sponsorship

Department of Health (via National Institute for Health Research (NIHR)) (NF-SI-0515-10067)
NETSCC (None)
Engineering and Physical Sciences Research Council (EP/N014588/1)

Professor Fiona Gilbert receives funding from the National Institute for Health Research (Senior Investigator award).

Collections

Cambridge University Research Outputs