
Advances in Time-to-Event Analysis: Big Data Applications in Cancer Risk Prediction





Jung, Alexander Wolfgang 


The digital transformation of health care provides new opportunities to study disease and gain unprecedented insights into the underlying biology. With the wealth of data generated, new statistical challenges arise. This thesis addresses some of them, with a particular focus on time-to-event analysis. The Cox hazard model, one of the most widely used statistical tools in biomedicine, is extended to the analysis of large-scale and high-dimensional data sets. Built on recent machine learning frameworks, the approach scales readily to big data settings. The method is extensively evaluated in simulation and case studies, showcasing its applicability to different data modalities, ranging from hospital admission episodes to histopathological images of tumour resections. The motivating application of this thesis is electronic health records (EHR), collections of various interlinked data at an individual level. With many countries starting to implement national health data resources, methods that can cope with these data sets become paramount. In particular, cancers could benefit significantly from these developments. The lifetime risk of developing a malignancy is around 50%. However, this risk is not equally distributed, with large differences between individuals. Hence, being able to utilise the data available in EHR could potentially help to stratify individuals by their risk profiles and to screen or even intervene early. The proposed method is used to build a predictive model for 20 primary cancer sites based on clinical disease histories, basic health parameters, and family histories covering 6.7 million Danish individuals over a combined 193 million life years. The obtained risk score can predict cancer incidence across most organ sites. Further, the information could potentially be used to create cohorts with similar efficiency while screening earlier, creating the possibility for risk-targeted screening programs.
Additionally, the obtained results could be transferred between health care systems, as shown here between Denmark and the UK. Taken together, the thesis establishes a method to analyse the extensive amounts of data now being generated and evaluates the potential these data sources can have in the context of cancer risk.





Gerstung, Moritz


Survival Analysis, Cancer Risk Prediction


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Novo Nordisk Foundation (NNF17OC0027594)