
Exploration of cell-free DNA’s biological properties for better understanding and improved circulating tumour DNA detection





Vijayaraghavan, Aadhitthya 


Background: Liquid biopsies, which sample bodily fluids, are minimally invasive techniques that can be used for cancer profiling. Cell-free DNA (cfDNA) consists of DNA fragments shed into circulation by various types of cells. Circulating tumour DNA (ctDNA), the cfDNA derived from cancer cells, forms a subset of cfDNA. Detecting ctDNA can be challenging because it often constitutes only a minor fraction of the cfDNA. Recent studies suggest that understanding the biological properties of ctDNA can improve its detection in a tumour-naive way.

Aim: To develop novel data analysis approaches and tools to explore the biological properties of cfDNA sequencing data, aiming to improve cancer detection and diagnosis.

Methods: Two methods, FRENDS (FRagment ENDS Sequence context) and MIDS (fragMent dIstance to the miDpoint of genomic elementS), were developed from two biological properties of cfDNA: the sequence content of fragment ends and fragment positioning. The FRENDS method extracts nucleotide frequencies and motif frequencies at fragment starts and ends. The MIDS method assesses the distance from each fragment midpoint to the centre of the region of interest in 66 different genomic lists, ranging from nucleosome positions and transcription start sites to DNASE1 hypersensitive regions.
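The two kinds of signal described in the Methods can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the motif length `k`, the use of plain fragment sequences as input, and the half-open single-chromosome coordinates are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import Counter


def end_motif_frequencies(fragment_seqs, k=4):
    """Relative frequency of k-mers at fragment starts and ends.

    fragment_seqs: iterable of fragment sequences (hypothetical input).
    k: motif length (an assumed value, not taken from the thesis).
    """
    counts = Counter()
    for seq in fragment_seqs:
        seq = seq.upper()
        if len(seq) < k:
            continue  # fragment too short to carry a k-mer at each end
        for motif in (seq[:k], seq[-k:]):
            if set(motif) <= set("ACGT"):  # skip motifs with N or other codes
                counts[motif] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def midpoint_distances(fragments, centres):
    """Signed distance from each fragment midpoint to the nearest centre.

    fragments: list of (start, end) pairs on one chromosome (assumed half-open).
    centres: sorted list of region centre positions on the same chromosome.
    """
    if not centres:
        return []
    distances = []
    for start, end in fragments:
        mid = (start + end) // 2
        i = bisect_left(centres, mid)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = centres[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda c: abs(mid - c))
        distances.append(mid - nearest)
    return distances
```

In such a sketch, the motif frequencies and a histogram of the midpoint distances per genomic list would serve as candidate features for a classifier.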

Results: The FRENDS and MIDS signals showed differences between cancer patients and healthy donors in both plasma and urine cfDNA data, and between analytes. The FRENDS model (26 features), the MIDS model (94 features) and a stacked ensemble model (built from FRENDS, MIDS and fragment length features obtained from previous publications) achieved AUCs of 0.98, 0.94 and 0.99, respectively, in the training set (n=279) and 0.97, 0.86 and 0.96 in the test set (n=183) of in-house shallow whole-genome sequencing (sWGS) plasma data (0.4X depth), with AUCs of >0.97, >0.84 and >0.96 in early-stage (stages I and II) disease and >0.99, >0.91 and >0.99 in late-stage disease. FRENDS scores derived from the 26 FRENDS features showed a significant difference in patient outcomes. The same feature sets from FRENDS and MIDS performed well in external datasets (DELFI: n=504, 1.5X, FRENDS AUC 0.96, MIDS AUC 0.94, ensemble AUC 0.98; Heitzer: n=401, 0.2X, FRENDS AUC 0.99, MIDS AUC 0.98, ensemble AUC 0.99) and in in-house urine sWGS cfDNA data (n=92, 0.4X, FRENDS AUC 0.97, MIDS AUC 0.99, ensemble AUC 0.99). All stages (including stage I) in the DELFI dataset showed an AUC of >0.92.
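For readers unfamiliar with the metric, the AUC reported above is the probability that a randomly chosen cancer sample receives a higher model score than a randomly chosen healthy sample. The sketch below computes it directly from that definition; the function name is hypothetical and the labels and scores in the usage lines are made-up toy values, not data from the thesis.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half).

    labels: 1 for cancer, 0 for healthy (assumed encoding).
    scores: model scores, higher meaning more cancer-like.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up values: three of the four cancer/healthy
# pairs are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples; a rank-based implementation would be preferable for larger data.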

Conclusions: Fragment end and position properties show promise for detecting cancer. An ensemble model built from these properties can be a valuable tool for early cancer detection, detection of minimal residual disease, and monitoring of patients in clinical settings.





Markowetz, Florian


Bioinformatics, Cancer early detection, cfDNA, Liquid biopsy, Machine learning


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Cancer Research UK (S_4076)