
External validity of machine learning-based prognostic scores for cystic fibrosis: A retrospective study using the UK and Canadian registries.

Published version
Peer-reviewed

Authors

Alaa, Ahmed 
Floto, Andres 
Schaar, Mihaela van der 

Abstract

Precise and timely referral for lung transplantation is critical for the survival of cystic fibrosis patients with terminal illness. While machine learning (ML) models have been shown to achieve significant improvement in prognostic accuracy over current referral guidelines, the external validity of these models and their resulting referral policies has not been fully investigated. Here, we studied the external validity of machine learning-based prognostic models using annual follow-up data from the UK and Canadian Cystic Fibrosis Registries. Using a state-of-the-art automated ML framework, we derived a model for predicting poor clinical outcomes in patients enrolled in the UK registry, and conducted external validation of the derived model using the Canadian Cystic Fibrosis Registry. In particular, we studied the effect of (1) natural variations in patient characteristics across populations and (2) differences in clinical practice on the external validity of ML-based prognostic scores. Overall, a decrease in prognostic accuracy on the external validation set (AUCROC: 0.88, 95% CI 0.88-0.88) was observed compared to the internal validation accuracy (AUCROC: 0.91, 95% CI 0.90-0.92). Based on our ML model, analysis of feature contributions and risk strata revealed that, while external validation of ML models exhibited high precision on average, both factors (1) and (2) can undermine the external validity of ML models in patient subgroups with moderate risk for poor outcomes. A significant boost in prognostic power (F1 score) from 0.33 (95% CI 0.31-0.35) to 0.45 (95% CI 0.45-0.45) was observed in external validation when variations in these subgroups were accounted for in our model. Our study highlighted the significance of external validation of ML models for cystic fibrosis prognostication. The uncovered insights on key risk factors and patient subgroups can be used to guide the cross-population adaptation of ML-based models and inspire new research on applying transfer learning methods for fine-tuning ML models to cope with regional variations in clinical care.
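For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how AUC-ROC and F1 could be computed, with percentile bootstrap confidence intervals, on an internal and an external validation cohort. It is not the authors' pipeline: the function names, the 0.5 decision threshold, the bootstrap procedure, and the simulated cohorts are all assumptions made for illustration, and the record does not state how the published confidence intervals were obtained.

```python
import random

def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) form; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def f1_score(labels, scores, threshold=0.5):
    """F1 of the binary prediction score >= threshold (threshold is an assumption)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def bootstrap_ci(metric, labels, scores, n_boot=200, seed=0):
    """95% percentile bootstrap interval for metric(labels, scores)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(0.025 * len(stats))]
    hi = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return lo, hi

def simulate_cohort(n, noise, seed):
    """Hypothetical cohort: binary outcome plus a noisy risk score in [0, 1]."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.2 else 0
        s = 0.3 + 0.4 * y + rng.gauss(0, noise)
        labels.append(y)
        scores.append(min(1.0, max(0.0, s)))
    return labels, scores

# Simulated data only: the "external" cohort is given noisier scores to mimic
# a model applied to a population it was not derived from.
cohorts = {
    "internal": simulate_cohort(500, noise=0.15, seed=1),
    "external": simulate_cohort(500, noise=0.25, seed=2),
}
for name, (y, s) in cohorts.items():
    print(name,
          "AUC-ROC", round(auc_roc(y, s), 3), bootstrap_ci(auc_roc, y, s),
          "F1", round(f1_score(y, s), 3), bootstrap_ci(f1_score, y, s))
```

The numbers printed by this sketch come from simulated data and bear no relation to the registry results quoted above; the point is only to show what an internal-versus-external comparison of these metrics looks like in practice.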

Description

Acknowledgements: We acknowledge the UK Cystic Fibrosis Registry and the Canadian Cystic Fibrosis Registry for their high-quality data on CF patients. We thank Dr. Janet Allen (University of Cambridge), Ms. Stephanie Chang (Cystic Fibrosis Canada) and Prof. Anne Stephenson (University of Toronto) for their help with CF data access, extraction and correction. We would like to express special thanks to Prof. Changhee Lee (Chung-Ang University) for his help with the cleaning and preprocessing of the UK CF data.


Funder: Cystic Fibrosis Trust; funder-id: http://dx.doi.org/10.13039/501100000292


Funder: Cystic Fibrosis Foundation; funder-id: http://dx.doi.org/10.13039/100000897

Keywords

Lung, Rare Diseases, Brain Disorders, 4.2 Evaluation of markers and technologies, 4 Detection, screening and diagnosis, 3 Good Health and Well Being

Journal Title

PLOS Digit Health

Journal ISSN

2767-3170

Publisher

Public Library of Science (PLoS)

Sponsorship

Cystic Fibrosis Trust (DHRP016)