Early detection and diagnosis of cancer with interpretable machine learning to uncover cancer-specific DNA methylation patterns.
Abstract
Cancer, a collection of more than two hundred distinct diseases, remains a leading cause of morbidity and mortality worldwide. Metastatic cancer, usually detected at advanced stages of disease, accounts for 90% of cancer-associated deaths. Early detection, combined with current therapies, would therefore have a significant impact on survival and treatment across cancer types. Epigenetic changes such as DNA methylation are among the early events underlying carcinogenesis. Here, we report an interpretable machine learning model that classifies 13 cancer types, as well as non-cancer tissue samples, with 98.2% accuracy using only DNA methylome data. We use the features identified by this model to develop EMethylNET, a robust model in which an XGBoost model provides information to a deep neural network, and which generalizes to independent data sets. We also demonstrate that the methylation-associated genomic loci detected by the classifier are associated with genes, pathways and networks involved in cancer, providing insights into the epigenomic regulation of carcinogenesis.
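The two-stage idea in the abstract (a first model identifies informative methylation sites, and a second model is trained on those sites) can be illustrated with a minimal sketch on synthetic data. This is not EMethylNET and not the authors' code: single-threshold decision stumps stand in for XGBoost feature importance, logistic regression stands in for the deep neural network, the task is reduced to two classes, and every function name, parameter and data value below is a hypothetical choice made for illustration.

```python
import math
import random

def make_data(n_samples=200, n_features=30, informative=(3, 7, 11), seed=0):
    """Synthetic beta values in [0, 1]; only the `informative` sites differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.random() for _ in range(n_features)]
        for j in informative:
            # Toy assumption: higher methylation in class 1, lower in class 0.
            row[j] = min(1.0, max(0.0, rng.gauss(0.7 if label else 0.3, 0.1)))
        X.append(row)
        y.append(label)
    return X, y

def stump_score(X, y, j, threshold=0.5):
    """Accuracy of one split on feature j at a fixed threshold, in the better direction."""
    hits = sum((row[j] > threshold) == bool(label) for row, label in zip(X, y))
    acc = hits / len(y)
    return max(acc, 1.0 - acc)

def select_features(X, y, k):
    """Stage 1 (stand-in for tree-based importance): keep the k best-scoring features."""
    scores = [(stump_score(X, y, j), j) for j in range(len(X[0]))]
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def train_logistic(X, y, cols, epochs=200, lr=0.5):
    """Stage 2 (stand-in for the neural network): logistic regression on selected columns."""
    w = [0.0] * len(cols)
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        gw = [0.0] * len(cols)
        gb = 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * row[c] for wi, c in zip(w, cols))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for i, c in enumerate(cols):
                gw[i] += err * row[c]
            gb += err
        # Batch gradient descent step on the mean log-loss gradient.
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(row, cols, w, b):
    """Predict class 1 when the linear score is positive."""
    z = b + sum(wi * row[c] for wi, c in zip(w, cols))
    return 1 if z > 0 else 0

X, y = make_data()
train_X, train_y = X[:150], y[:150]
test_X, test_y = X[150:], y[150:]

selected = select_features(train_X, train_y, k=3)
w, b = train_logistic(train_X, train_y, selected)
accuracy = sum(predict(r, selected, w, b) == t for r, t in zip(test_X, test_y)) / len(test_y)
print("selected sites:", selected, "held-out accuracy:", round(accuracy, 2))
```

Selecting features on the training split only, and scoring on a held-out split, mirrors the abstract's emphasis on generalization to independent data; the selected indices also keep the second-stage model interpretable in terms of specific sites.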
Description
Acknowledgements: S.A.S. conceived the study. I.N. developed the machine learning models and carried out the data processing and analysis. M.S. contributed to the initial machine learning models and analysis during a summer studentship. S.J. contributed an external data set and expertise. I.N. and S.A.S. wrote the manuscript with input from the other authors. We acknowledge the contribution of Dr Charles Massie (In Memoriam) of the University of Cambridge, who was also involved in the conception of the study and whose advice and expertise on cancer early detection and cancer-related DNA methylome analysis were invaluable to this study. We are thankful to Prof. Rebecca Fitzgerald (University of Cambridge), who contributed an oesophageal cancer data set to this study. We also thank the members of the S.A.S. laboratory who read and commented on the manuscript.
Journal ISSN
2396-8923

