Deep learning on whole-slide images for early detection and risk prediction of oesophageal cancer


This dissertation introduces novel computational techniques to identify patients at particularly high risk for progressing from Barrett’s oesophagus (BE) to oesophageal adenocarcinoma (EAC) earlier and more accurately using data from a minimally-invasive cell collection device. It also introduces a new software library for efficiently handling computational pathology tasks.

Oesophageal adenocarcinoma is usually diagnosed late, leading to a five-year survival rate of only 13%. The identification of its precursor, Barrett’s oesophagus (BE), is thus a crucial early detection goal. Identifying the cancer at an early stage drastically increases patient five-year survival to 80%. A new minimally-invasive screening device for BE detection called the Cytosponge presents a solution. However, only 0.3% of BE cases progress to cancer per patient-year, so detecting BE leads to a large number of costly and invasive follow-up procedures. I have therefore developed machine learning systems to predict which precursors become deadly by identifying two features prognostic of progression to EAC: atypia, a kind of cellular irregularity, and P53 aberrance. These models automate patient stratification and drastically reduce the time it takes to screen for these progression markers. I have also identified a clinically relevant correlation between the automatically detected quantity of BE in a pathology slide and the length of the BE segment identified from endoscopy.

Beyond oesophageal cancer, the inspection of stained tissue slides by pathologists is essential for the early detection, diagnosis, and monitoring of disease. However, whole-slide images (WSIs), the digitised form of these slides, present a number of unique challenges for analysis, requiring special consideration of image annotations, slide and image artefacts, and evaluation of model performance. I have therefore developed SliDL, a Python library for performing pre- and post-processing of WSIs for deep learning. SliDL allows users to perform essential processing tasks in a few simple lines of code, bridging the gap between standard image analysis and WSI analysis. By providing a framework in which deep learning methods for WSI analysis can be developed and applied, SliDL increases the accessibility of an important application of deep learning.

Digital pathology is rapidly growing as a topic salient to both computer science and medicine. My work aims to contribute to both fields by providing a software library that democratises access to WSI analysis and by applying it to a pressing issue in the early detection of cancer.

Markowetz, Florian
Fitzgerald, Rebecca
barrett's oesophagus, cancer, computational biology, cytosponge, deep learning, digital pathology, early detection, machine learning, oesophageal, oesophageal adenocarcinoma, oesophageal cancer, pathology, whole slide image
Doctor of Philosophy (PhD)
Awarding Institution
University of Cambridge
Cancer Research UK (S_4065)
Bill & Melinda Gates Foundation; Cancer Research UK (FM: C14303/A17197); Medical Research Council (RCF: RG84369)