
Supervised machine learning for the early prediction of acute respiratory distress syndrome (ARDS).

Accepted version





Le, Sidney 
Pellegrini, Emily 
Green-Saxena, Abigail 
Hoffman, Jana 


PURPOSE: Acute respiratory distress syndrome (ARDS) is a serious respiratory condition with high mortality and associated morbidity. The objective of this study is to develop and evaluate a novel application of gradient boosted tree models trained on patient health record data for the early prediction of ARDS. MATERIALS AND METHODS: 9919 patient encounters from the Medical Information Mart for Intensive Care III (MIMIC-III) database were retrospectively analyzed. XGBoost gradient boosted tree models for early ARDS prediction were created using routinely collected clinical variables and numerical representations of radiology reports as inputs. The XGBoost models were iteratively trained and validated using 10-fold cross-validation. RESULTS: On a hold-out test set, the classifiers attained area under the receiver operating characteristic curve (AUROC) values of 0.905 for the detection of ARDS at onset and 0.827, 0.810, and 0.790 for the prediction of ARDS at 12-, 24-, and 48-h windows prior to onset, respectively. CONCLUSION: Supervised machine learning may help identify patients who will develop ARDS up to 48 h prior to onset.
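The abstract names two evaluation tools, 10-fold cross-validation and AUROC. The following is a minimal sketch of both, not the authors' code. It does not train an XGBoost model, the function names `auroc` and `k_fold_indices` are hypothetical, and the labels and scores in the usage note are made-up toy values. The strided fold assignment is an illustrative assumption; the abstract does not say how folds were drawn.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    Tied scores share the average of the ranks they span.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # Sort sample indices by ascending score, then assign 1-based ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, validation) index pairs by striding."""
    folds = [list(range(f, n, k)) for f in range(k)]
    return [
        ([i for g, fold in enumerate(folds) if g != f for i in fold], folds[f])
        for f in range(k)
    ]
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because three of the four positive-negative pairs are ranked correctly. In a workflow like the one described, each `(train, validation)` pair from `k_fold_indices` would be used to fit a model and score the held-out fold, with the final AUROC reported on a separate test set.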



Acute respiratory distress syndrome, Clinical decision support systems, Electronic health records, Intensive care unit, Machine learning, Medical informatics, Adolescent, Adult, Aged, Area Under Curve, Critical Care, Databases, Factual, Early Diagnosis, Female, Humans, Intensive Care Units, Male, Middle Aged, ROC Curve, Respiratory Distress Syndrome, Retrospective Studies, Risk Factors, Supervised Machine Learning, Young Adult

Journal Title

J Crit Care


Elsevier BV
MRC (MR/P502091/1)