
Information and generative deep learning with applications to medical time-series






Physiological time-series data are a valuable but under-utilised resource in intensive care medicine. These data are highly structured and contain a wealth of information about the patient state, but can be very high-dimensional and difficult to interpret. Understanding temporal relationships between time-series variables is crucial for many important tasks, in particular identifying patient phenotypes within large heterogeneous cohorts, and predicting and explaining physiological changes to a patient over time. There are wide-ranging complexities involved in learning such insights from longitudinal data, including the lack of a universally accepted framework for understanding causal influence in time series, issues with poor-quality data segments that bias downstream tasks, and important privacy concerns around releasing sensitive personal data. These challenges are by no means unique to this clinical application, and there are significant domain-agnostic elements within this thesis that apply broadly to any research area centred on time-series monitoring (e.g. climate science, mathematical finance, signal processing).

In the first half of this thesis, I focused firstly on information and causal influence in time-series data and then on flexible time-series modelling and hierarchical model comparison using Bayesian methods. To aid these tasks, I reviewed and developed new statistical methodology, particularly using integrated likelihoods for model evidence estimation. Together, this provided a framework for evaluating trajectories of the information contained within and between physiological variables, and allowed a comparison between patient cohorts that showed evidence of impaired physiological regulation in Covid-19 patients. The second half of this thesis introduced generative deep learning models as a tool to address some of the key difficulties in clinical time-series data, including artefact detection, imputation and synthetic dataset generation. The latter is especially important for the future of critical care research, because of the inherent challenges in publishing clinical datasets. However, I showed that there are many obstacles that must be addressed before large-scale synthetic datasets can be utilised fully, including preserving complex relationships between physiological time-series variables within the synthetic data.





Eglen, Stephen
Ercole, Ari


artefact detection, causal influence, generative deep learning, intensive care medicine, multilevel models, synthetic data


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
EPSRC (2089662)
Engineering and Physical Sciences Research Council (EPSRC) National Productivity Investment Fund (NPIF) EP/S515334/1, reference 2089662