This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

medRxiv [Preprint]. 2023 Jun 1:2023.05.30.23290757. [Version 1] doi: 10.1101/2023.05.30.23290757

Characterizing the Importance of Hematologic Biomarkers in Screening for Severe Sepsis using Machine Learning Interpretability Methods

Dipak P Upadhyaya, Yasir Tarabichi, Katrina Prantzalos, Salman Ayub, David C Kaelber, Satya S Sahoo
PMCID: PMC10312863  PMID: 37398448

Abstract

Background

Early detection of sepsis in patients admitted to the emergency department (ED) is an important clinical objective that can help reduce morbidity and mortality. Monocyte Distribution Width (MDW) is a new biomarker recently approved by the US Food and Drug Administration (FDA) for sepsis screening. We aimed to use data from an electronic health record (EHR) system to characterize the relative importance of MDW in the presence of routinely available hematologic parameters and vital signs measures.

Methods

In this retrospective cohort study, we included ED patients admitted to the MetroHealth hospital (a large regional safety-net hospital in Cleveland, OH, USA) with suspected infection who later developed severe sepsis. All adult patients presenting to the ED were eligible for inclusion; encounters without complete blood count with differential data or vital signs data were excluded. We developed seven data models and an ensemble of four high-accuracy machine learning (ML) algorithms, using the Sepsis-3 diagnostic criteria for validation. Using the results generated by the high-accuracy ML models, we applied the Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) post-hoc ML interpretability methods to characterize the contributions of individual hematologic parameters, including MDW, and vital signs measures in screening for severe sepsis.
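
As an illustration of the idea behind Shapley-based attribution, the following is a minimal sketch, not the authors' pipeline and not the SHAP library itself. It computes exact Shapley values for a single instance by enumerating every feature coalition, replacing features outside the coalition with a baseline value. The scoring function, its weights, the feature names, and the patient and baseline values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one instance.

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch; other conventions exist).
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical linear risk score; weights are invented for illustration only.
WEIGHTS = {"wbc": 0.04, "mdw": 0.01, "heart_rate": 0.005}


def toy_risk(features):
    return sum(WEIGHTS[k] * features[k] for k in WEIGHTS)


# Hypothetical patient and reference values, not taken from the study.
patient = {"wbc": 18.0, "mdw": 24.0, "heart_rate": 120.0}
baseline = {"wbc": 8.0, "mdw": 20.0, "heart_rate": 80.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

For a linear score like this one, each attribution reduces to the weight times the feature's deviation from baseline, and the attributions sum to the difference between the patient's score and the baseline score. Enumeration is exponential in the number of features, which is why practical tools rely on approximations or model-specific algorithms.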

Findings

We evaluated 7071 adult patients from 303,339 adult ED visits occurring between May 1, 2020 and August 26, 2022. The seven data models were implemented to reflect the ED clinical workflow, with incremental addition of standard complete blood count (CBC), CBC with differential, MDW, and finally vital signs measures. The random forest and deep neural network models achieved classification area under the receiver operating characteristic curve (AUC) values of up to 93% (CI 92–94) and 90% (CI 88–91), respectively, on the data model combining hematologic parameters and vital signs measures. We applied the LIME and SHAP ML interpretability methods to these high-accuracy ML models. Both interpretability methods consistently found that the value of MDW is grossly attenuated (low feature importance scores of 0.015 for SHAP and 0.0004 for LIME) in the presence of other routinely reported hematologic parameters and vital signs measures for severe sepsis detection.
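
For readers less familiar with the AUC metric reported above, here is a minimal sketch of how it can be computed from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up toy values, not study data, and the function name `roc_auc` is introduced here for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = severe sepsis, 0 = no severe sepsis (invented values).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

This pairwise form is quadratic in the number of encounters; for a cohort the size of the one described here, a sort-based implementation from a standard library would be the practical choice.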

Interpretation

Using ML interpretability methods applied to EHR data, we show that MDW can be replaced with routinely reported CBC with differential, together with vital signs measures, for severe sepsis screening. MDW requires specialized laboratory equipment and modification of existing care protocols; these results could therefore guide decisions about allocating limited resources in cost-constrained care settings. The analysis also demonstrates a practical application of ML interpretability methods in clinical decision-making.

Funding

National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health/National Center for Advancing Translational Sciences, National Institute on Drug Abuse

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
