
This is a preprint; it has not yet been peer reviewed by a journal.


medRxiv [Preprint]. 2024 Nov 2:2024.05.21.24307715. [Version 3]. doi: 10.1101/2024.05.21.24307715

Open-source machine learning pipeline automatically flags instances of acute respiratory distress syndrome from electronic health records

Félix Leonardo Morales, Feihong Xu, Hyojun Ada Lee, Heliodoro Tejedor Navarro, Meagan A Bechel, Eryn L Cameron, Jesse David Kelso, Curtis H Weiss, Luís A Nunes Amaral
PMCID: PMC11142283  PMID: 38826348

Abstract

Physicians could benefit greatly from automated diagnosis and prognosis tools that help address information overload and decision fatigue. Intensive care physicians, who are at particularly high risk for both, stand to gain the most from such tools. Acute Respiratory Distress Syndrome (ARDS) is a life-threatening condition affecting >10% of critical care patients and carrying a mortality rate above 40%. However, ARDS recognition rates in clinical settings have been shown to be low (30-70%). In this work, we present a reproducible computational pipeline that automatically adjudicates ARDS on retrospective datasets of mechanically ventilated adult patients. The pipeline automates the steps outlined in the Berlin Definition using natural language processing tools and classification algorithms. First, we used labeled chest imaging reports from two hospitals over three time periods to train an XGBoost model to detect bilateral infiltrates, and a subset of attending physician notes from one hospital labeled for the most common ARDS risk factor (pneumonia) to train another XGBoost model to detect a pneumonia diagnosis. Both models achieved high performance when tested on out-of-bag samples: an area under the receiver operating characteristic curve (AUROC) of 0.88 for adjudicating chest imaging reports, and an AUROC of 0.86 for detecting pneumonia in attending physician notes. Next, we integrated the two models and validated the entire pipeline on a fourth cohort from a third hospital (MIMIC-III), finding a sensitivity of 93.5% (a marked improvement over the 22.6% ARDS recognition rate reported for these encounters) along with a false positive rate of 18.8%. We conclude that our reproducible, automated diagnostic pipeline exhibits promising performance for retrospective ARDS adjudication, thus providing a valuable resource for physicians aiming to enhance ARDS diagnosis and treatment strategies. We surmise that real-time integration of the pipeline with electronic health record (EHR) systems has the potential to aid clinical practice by facilitating the recognition of ARDS cases at scale.
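As a rough illustration of the kind of text-classification step the abstract describes, the sketch below trains a gradient-boosted tree model to flag bilateral infiltrates in chest imaging reports and evaluates it with AUROC on a held-out split. It is a minimal sketch, not the authors' released pipeline: the file name chest_imaging_reports.csv, the column names report_text and bilateral_infiltrates, the TF-IDF featurization, and the XGBoost hyperparameters are all assumptions for illustration only.

```python
# Minimal sketch (assumptions flagged): flag bilateral infiltrates in
# chest imaging reports with an XGBoost classifier and report held-out AUROC.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical labeled dataset: one row per report, label 0/1 for
# physician-adjudicated bilateral infiltrates (file and columns assumed).
df = pd.read_csv("chest_imaging_reports.csv")
texts, labels = df["report_text"], df["bilateral_infiltrates"]

# Hold out a stratified test split for evaluation.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Turn free-text reports into TF-IDF features (unigrams and bigrams);
# the actual feature engineering in the preprint may differ.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=5)
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

# Gradient-boosted tree classifier; hyperparameters are illustrative defaults.
clf = XGBClassifier(
    n_estimators=300, max_depth=4, learning_rate=0.1, eval_metric="logloss"
)
clf.fit(X_train, y_train)

# AUROC is the metric reported in the abstract for both classifiers.
auroc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"Held-out AUROC: {auroc:.3f}")
```

The same pattern would presumably carry over to the second classifier, swapping in labeled attending physician notes and a pneumonia-diagnosis label.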

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
