Advances in Time-to-Event Analysis: Big Data Applications in Cancer Risk Prediction


Type

Thesis

Authors

Jung, Alexander Wolfgang 

Abstract

The digital transformation of health care provides new opportunities to study disease and gain unprecedented insights into the underlying biology. With the wealth of data generated, new statistical challenges arise. This thesis addresses some of them, with a particular focus on time-to-event analysis. The Cox proportional hazards model, one of the most widely used statistical tools in biomedicine, is extended to large-scale and high-dimensional data sets. Built on recent machine learning frameworks, the approach scales readily to big data settings. The method is extensively evaluated in simulation studies and case studies, showcasing its applicability to different data modalities, ranging from hospital admission episodes to histopathological images of tumour resections. The motivating application of this thesis is electronic health records (EHR), collections of various interlinked data at an individual level. With many countries starting to implement national health data resources, methods that can cope with these datasets become paramount. Cancers in particular could benefit significantly from these developments. The lifetime risk of developing a malignancy is around 50%. However, this risk is not equally distributed, with large differences between individuals. Hence, being able to utilise the data available in EHR could potentially help to stratify individuals by their risk profiles and to screen or even intervene early. The proposed method is used to build a predictive model for 20 primary cancer sites based on clinical disease histories, basic health parameters, and family histories covering 6.7 million Danish individuals over a combined 193 million life years. The obtained risk score can predict cancer incidence across most organ sites. Further, the information could potentially be used to create cohorts with similar efficiency while screening earlier, creating the possibility for risk-targeted screening programs.
The obtained results could also be transferred between health care systems, as shown here between Denmark and the UK. Taken together, the thesis establishes a method to analyse the extensive amounts of data being generated today and evaluates the potential of these data sources in the context of cancer risk.
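The abstract builds on the Cox proportional hazards model. As a point of reference only, the sketch below shows the quantity such a model is typically fitted by minimising: the negative log partial likelihood, using the Breslow convention for tied event times. It is a minimal NumPy illustration, not the thesis's implementation; the function name `cox_neg_log_partial_likelihood` is hypothetical, and `log_hazard` stands for any per-individual risk score (a linear predictor or the output of a machine learning model).

```python
import numpy as np


def cox_neg_log_partial_likelihood(log_hazard, time, event):
    """Negative log partial likelihood of a Cox model (Breslow ties).

    Hypothetical helper for illustration.
    log_hazard: per-individual risk score eta_i (e.g. x_i^T beta)
    time:       observed follow-up time for each individual
    event:      1 if the event was observed, 0 if censored
    """
    log_hazard = np.asarray(log_hazard, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Sort by ascending time so the risk set of individual i
    # (everyone with time >= t_i) is a suffix of the sorted arrays.
    order = np.argsort(time, kind="stable")
    eta, t, d = log_hazard[order], time[order], event[order]

    # Reverse cumulative log-sum-exp: log sum_{j >= k} exp(eta_j),
    # computed stably with the logaddexp ufunc.
    rev_lse = np.logaddexp.accumulate(eta[::-1])[::-1]

    # Breslow ties: all individuals tied at t_i share the full risk set,
    # which starts at the first index holding that time.
    first = np.searchsorted(t, t, side="left")
    log_denom = rev_lse[first]

    # Only observed events contribute terms to the partial likelihood.
    return float(-np.sum(eta[d] - log_denom[d]))


if __name__ == "__main__":
    # Two individuals with equal risk, both with events at distinct times:
    # partial likelihood = 1/2 * 1 = 1/2, so the loss is log 2.
    print(cox_neg_log_partial_likelihood([0.0, 0.0], [1.0, 2.0], [1, 1]))
    # -> about 0.693 (= log 2)
```

The risk-set denominators couple all individuals, which is what makes the full-data loss expensive at the scale of national registries. A common way to scale it with machine learning frameworks is to evaluate the loss on random minibatches, with risk sets restricted to each batch; whether the thesis uses exactly this approximation is not stated in this record.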

Date

2022-09-09

Advisors

Gerstung, Moritz

Keywords

Survival Analysis, Cancer Risk Prediction

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

Novo Nordisk Foundation (NNF17OC0027594)