Reliable and decentralised deep learning for physiological data
Repository URI
Repository DOI
Change log
Authors
Abstract
Physiological data encompass measurements from various bodily functions and processes. By employing machine learning to model these data, especially with the advancement of mobile sensing technologies, it becomes feasible to automatically and continually monitor and diagnose one's health status. This holds considerable promise for easing the burden on clinical resources and ensuring timely treatment for the wider population. Nonetheless, significant challenges related to the data and the modelling methods are yet to be resolved, obstructing the deployment of machine learning, especially deep learning, in real-world healthcare contexts.
One challenge is that labelled physiological data for model development are usually insufficient and imbalanced, leading to models occasionally exhibiting bias and overconfidence in their predictions. This can result in unreliable diagnoses which yield expensive clinical costs. Moreover, deep learning research generally requires massive data on a centralised server, while privacy concerns hinder the aggregation of physiological data from individuals or hospitals.
In order to tackle these challenges and pave the way for reliable deep learning-driven health diagnostics, this thesis proposes several novel solutions and makes the following contributions:
Chapter 4 introduces an ensemble learning approach designed to handle data imbalance and model overconfidence for binary health screening. This method utilises balanced training sets derived from imbalanced physiological data, training multiple ensemble models. The predictions from these models are fused to reduce bias and calibrate confidence from a signal model, with model uncertainty measured by the inconsistency among multiple models. This approach effectively mitigates model overconfidence, thereby facilitating reliable automated diagnoses.
In Chapter 5, an efficient uncertainty quantification approach is presented to improve the reliability of multi-class mobile health diagnostics. This approach incorporates the cutting-edge technique of evidential deep learning and introduces two novel mechanisms specifically designed to handle class imbalance. The quantified uncertainty enables accurate and efficient detection of misdiagnoses and out-of-training distributed inputs.
Chapter 6 introduces a cross-device federated learning method to address privacy concerns arising from gathering physiological data for model development. This method allows physiological data to remain on personal mobile devices, with only locally trained models aggregated into a global health diagnostic model. To mitigate bias caused by data imbalance, a novel loss-weighted model aggregation method is proposed to enhance the performance of the global model.
Chapter 7 illustrates a cross-silo federated learning method that enables multiple data holders such as hospitals to collaboratively train a model without exchanging raw data. The distributional heterogeneity of these physiological data silos poses a challenge to federated learning. To address this, a novel method based on feature sharing and augmentation is proposed to balance privacy protection and model performance.
All proposed methods have been validated using real-world physiological datasets and commonly used machine learning benchmark data. Specific attention is given to clinical tasks, including the modelling of respiratory audio for respiratory health screening, ECG signals for predicting cardiovascular diseases, and dermoscopic images for detecting skin cancer. Extensive experiments demonstrate that these methods effectively address challenges posed by limited, imbalanced, and decentralised physiological data, thereby enabling reliable health diagnoses. These contributions have significant potential to advance the deployment of deep learning in real-world healthcare scenarios.

