Open-source machine learning pipeline automatically flags instances of acute respiratory distress syndrome from electronic health records

Félix Leonardo Morales; Feihong Xu; Hyojun Ada Lee; Heliodoro Tejedor Navarro; Meagan A Bechel; Eryn L Cameron; Jesse David Kelso; Curtis H Weiss; Luís A Nunes Amaral

doi:10.1101/2024.05.21.24307715

Abstract

Physicians could benefit substantially from automated diagnosis and prognosis tools that help address information overload and decision fatigue. Intensive care physicians stand to gain the most from such tools, as they are at particularly high risk of both. Acute Respiratory Distress Syndrome (ARDS) is a life-threatening condition that affects more than 10% of critical care patients and carries a mortality rate above 40%. However, clinical recognition rates for ARDS have been shown to be low (30-70%). In this work, we present a reproducible computational pipeline that automatically adjudicates ARDS in retrospective datasets of mechanically ventilated adult patients. The pipeline automates the steps outlined by the Berlin Definition using natural language processing tools and classification algorithms. First, we trained an XGBoost model to detect bilateral infiltrates using labeled chest imaging reports from two hospitals over three time periods. We trained a second XGBoost model to detect a pneumonia diagnosis using a subset of attending physician notes from one hospital, labeled for pneumonia, the most common ARDS risk factor. Both models perform well on out-of-bag samples. They reach an area under the receiver operating characteristic curve (AUROC) of 0.88 for adjudicating chest imaging reports and 0.86 for detecting pneumonia in attending physician notes. Next, we integrated the models and validated the entire pipeline on a fourth cohort from a third hospital (MIMIC-III). The pipeline achieved a sensitivity of 93.5%, a large improvement over the 22.6% ARDS recognition rate reported for these encounters, with a false positive rate of 18.8%. We conclude that our reproducible, automated pipeline shows promising performance for retrospective ARDS adjudication and provides a valuable resource for physicians seeking to improve ARDS diagnosis and treatment. We surmise that real-time integration of the pipeline with electronic health record (EHR) systems could aid clinical practice by facilitating the recognition of ARDS cases at scale.
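
The abstract says the pipeline automates the Berlin Definition steps. The sketch below illustrates how per-encounter signals might be combined into an ARDS flag. It is not the authors' implementation.

The oxygenation thresholds follow the Berlin Definition: PaO2/FiO2 of at most 300 mmHg with PEEP of at least 5 cmH2O, and the severity cut-offs at 200 and 100. Everything else is a hypothetical assumption:

- The `EncounterSignals` schema.
- The 0.5 probability cutoff for both classifiers.
- The 48-hour window pairing imaging with hypoxemia.
- Treating pneumonia as the only risk factor.

The sketch also omits the Berlin exclusion of respiratory failure explained by cardiac failure or fluid overload.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EncounterSignals:
    """Hypothetical per-encounter inputs (not the authors' data schema)."""
    pf_ratio: float            # PaO2/FiO2 ratio, mmHg
    peep_cmh2o: float          # PEEP at that blood gas, cmH2O
    hypoxemia_time: datetime   # time of the blood gas
    p_bilateral: float         # imaging-report model output in [0, 1]
    imaging_time: datetime     # time of the chest imaging report
    p_pneumonia: float         # physician-note model output in [0, 1]
    risk_factor_time: datetime # time of the note documenting pneumonia


def oxygenation_category(pf_ratio: float, peep_cmh2o: float) -> Optional[str]:
    """Berlin oxygenation criterion: P/F <= 300 mmHg with PEEP >= 5 cmH2O."""
    if peep_cmh2o < 5 or pf_ratio > 300:
        return None
    if pf_ratio <= 100:
        return "severe"
    if pf_ratio <= 200:
        return "moderate"
    return "mild"


def flag_ards(s: EncounterSignals,
              threshold: float = 0.5,                           # assumed cutoff
              pairing_window: timedelta = timedelta(hours=48),  # assumed
              insult_window: timedelta = timedelta(days=7)) -> Optional[str]:
    """Return a Berlin severity label if every modeled criterion holds, else None."""
    severity = oxygenation_category(s.pf_ratio, s.peep_cmh2o)
    if severity is None:
        return None
    # Bilateral infiltrates must be flagged close in time to the hypoxemia.
    if s.p_bilateral < threshold:
        return None
    if abs(s.imaging_time - s.hypoxemia_time) > pairing_window:
        return None
    # Risk factor (here only pneumonia) within one week of onset.
    if s.p_pneumonia < threshold:
        return None
    if abs(s.hypoxemia_time - s.risk_factor_time) > insult_window:
        return None
    return severity


t0 = datetime(2024, 1, 1, 8, 0)
example = EncounterSignals(pf_ratio=150, peep_cmh2o=8, hypoxemia_time=t0,
                           p_bilateral=0.9, imaging_time=t0 + timedelta(hours=6),
                           p_pneumonia=0.8, risk_factor_time=t0 - timedelta(days=2))
print(flag_ards(example))  # moderate
```

Keeping the classifier outputs as probabilities lets the cutoff be tuned to trade sensitivity against false positives. This is the same trade-off the abstract reports for the full pipeline.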

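The abstract evaluates the two classifiers by AUROC and the integrated pipeline by sensitivity and false positive rate. The standard-library sketch below shows how each metric is computed from labels and scores.

- **AUROC** is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. This is the Mann-Whitney formulation.
- **Sensitivity** is TP / (TP + FN).
- **False positive rate** is FP / (FP + TN).

The toy labels, scores and the 0.5 decision cutoff are made up for illustration and are not data from the study.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def sensitivity_and_fpr(labels: Sequence[int],
                        predictions: Sequence[int]) -> Tuple[float, float]:
    """Return (TP / (TP + FN), FP / (FP + TN)) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cutoff
print(round(auroc(labels, scores), 3))  # 0.889
print(sensitivity_and_fpr(labels, preds))
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition concrete. For real evaluations a library routine that sorts scores once would be the usual choice.
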
This is a preprint.