PLOS ONE. 2021 May 6;16(5):e0250923. doi: 10.1371/journal.pone.0250923

Clinical factors associated with rapid treatment of sepsis

Xing Song 1,*, Mei Liu 2, Lemuel R Waitman 1, Anurag Patel 3, Steven Q Simpson 4,*
Editor: Robert Moskovitch 5
PMCID: PMC8101717  PMID: 33956846

Abstract

Purpose

To understand what clinical presenting features of sepsis patients are historically associated with rapid treatment involving antibiotics and fluids, as appropriate.

Design

This was a retrospective, observational cohort study using a machine-learning model with an embedded feature selection mechanism (gradient boosting machine).

Methods

For adult patients (age ≥ 18 years) who were admitted through the Emergency Department (ED) and met clinical criteria of severe sepsis from 11/2007 to 05/2018 at an urban tertiary academic medical center, we developed gradient boosting models (GBMs) using a total of 760 original and derived variables, including demographic variables, laboratory values, vital signs, infection diagnosis present on admission, and historical comorbidities. We identified the most impactful factors having strong association with rapid treatment, and further applied Shapley Additive exPlanation (SHAP) values to examine the marginal effects of each factor.

Results

For the subgroups with and without the fluid bolus treatment component, the models achieved high accuracy, with area under the receiver operating characteristic curve (AUROC) of 0.91 [95% CI, 0.86–0.95] and 0.84 [95% CI, 0.81–0.86], and sensitivity of 0.91 [95% CI, 0.81–0.97] and 0.81 [95% CI, 0.72–0.87], respectively. We identified the 20 most impactful factors associated with rapid treatment for each subgroup. In the non-hypotensive subgroup, initial physiological values were the most impactful to the model, while in the fluid bolus subgroup, value minima and maxima tended to be the most impactful.

Conclusion

These machine learning methods identified factors associated with rapid treatment of severe sepsis patients from a large volume of high-dimensional clinical data. The results provide insight into differences in the rapid provision of treatment among patients with sepsis.

Introduction

Sepsis is an important public health problem in the United States and is the leading cause of death among hospitalized patients. Current estimates suggest that sepsis afflicts over 1.7 million Americans each year and is responsible for over 270,000 deaths [1]. In 2012 the Surviving Sepsis Campaign (SSC) first published 3-hour and 6-hour care bundles for reducing mortality due to sepsis [2]. Based on these recommendations, hospitals implement the guidelines differently, adapted to local standards of care, and clinicians behave differently, based on patients’ manifestations of illness. Education of healthcare workers and attention to quality improvement have aided in reducing mortality from sepsis [1, 2]. To that end, numerous efforts have used alerting mechanisms within the electronic medical record (EMR) to attempt early warning of the signs of sepsis, whether based on systemic inflammatory response syndrome (SIRS) or organ dysfunction, with varying degrees of success [3–5]. One weakness of EMR-based alerting has been its inability to detect when infection is suspected or present, while the strengths of the approach lie in identifying objective data, such as respiratory rate, blood pressure, or specific criteria of organ dysfunction in sepsis [6].

EMRs have revolutionized the curation and presentation of clinical data; in the current state, medicine is far better served than it has ever been, in terms of having data readily available for clinical use. However, the mode of presentation of data to end users (physicians, nurses, etc.) remains steadfastly in a 20th century paradigm, in that EMRs predominantly exist as information storehouses, and their potential to guide efficient diagnosis and treatment decisions for various conditions is unrealized, as is their potential to facilitate quality improvement efforts. EMRs currently provide users with static patient data; the values are displayed in the manner of the specific EMR and are displayed in essentially the same fashion, regardless of the identity of the user. The physician user mostly sees only the data that they have specifically sought, and frequently this data is sought principally to confirm prior beliefs about the patient. We envision an EMR that is adaptive to the patient’s data, the location or facility, and the user, much as the consumer-oriented products Amazon and Google are. In other words, the envisioned EMR presents the data to an individual user in a fashion intended to promote specific behaviors and based on adaptive algorithms determined by users’ collective previous desired behavior. For Amazon or Google, those desired behaviors involve product purchases or the viewing of specific web pages. For the EMR, such behaviors could include rapid treatment of sepsis.

Development of such an EMR requires several steps: a) understanding which conditions can benefit from it, b) within those conditions, determining the features that are associated with efficient diagnosis and care delivery that alter patient outcomes, c) understanding which of those features are promoters of efficient diagnosis and treatment, rather than simply covariates, d) testing putative promoters by altering modes of EMR display to individual users and monitoring subsequent diagnostic and therapeutic activity, along with patient outcomes. We chose to study sepsis, as it is a high morbidity and mortality condition that is also the most expensive condition treated in American hospitals [7].

Numerous efforts have been made to improve prognostic accuracy and efficiency for sepsis and its complications via machine learning techniques. For example, Yang et al. [8], Komorowski et al. [9] and Reyna et al. [10] developed artificial intelligence models to predict sepsis in intensive care, while Mao et al. [11] and Lauritsen et al. [12] extended the prediction application to the ED and general ward. Itzhak et al. [13] and Cherifa et al. [14] developed models to predict acute hypertensive or hypotensive episodes among ICU admissions. However, few, if any, studies have been designed to understand the specific clinical features that patients exhibit at the time that physicians initiate rapid treatment of sepsis. Without assuming causality, one could evaluate from a situational awareness perspective which clinical features are most closely associated with rapid, thorough treatment. We believed that data from such a study could provide novel information that could be used to prompt rapid sepsis treatment for appropriate patients, regardless of the extant diagnostic criteria. We performed a retrospective, machine learning analysis of patients presenting to our ED over a ten-year period to identify all patients meeting clinical criteria for severe sepsis, and to determine which of their clinical characteristics were associated with rapid initiation of antibiotics, fluids, and other sepsis treatments.

Methods

We collected a retrospective, observational cohort of adult patients (age ≥ 18 years) who were admitted through the University of Kansas Hospital ED from 11/2007 through 05/2018. The de-identified data were obtained from the Healthcare Enterprise Resource for Ontological Narration (HERON), an i2b2-based clinical integrated data repository [15, 16]. The operation of HERON as an honest broker research repository was approved by the University of Kansas Medical Center Institutional Review Board (Human Subject Committee) as an expedited protocol and is renewed and reviewed annually (HSC #12337).

Because of the date range and the nature of these studies, we used the definitions of sepsis, severe sepsis, and septic shock according to the American College of Chest Physicians/Society of Critical Care Medicine (Sepsis-1) definitions, including the SIRS and the laboratory thresholds for organ dysfunction [17]. Patients were included by satisfying all of the following criteria:

  • presence of a suspected infection, based on clinicians’ actions to diagnose and treat infection and defined as a body fluid culture ordered and an anti-infective administered within four hours of one another;

  • presence of two or more SIRS criteria;

  • at least one site of acute organ dysfunction, which was defined by the first instance of an abnormal laboratory or examination value and based on organ dysfunction criteria from the first and second international consensus definitions [2, 6].

To further exclude patients who were admitted through the ED but developed sepsis later in their hospitalization, i.e., to include only sepsis present on admission, we inferred an hour boundary based on the timing distribution of patients with infection present on admission, which was 48 hours since triage.

The outcome of interest was timely completion of the SSC 3-hour bundle components [17]. We chose the SSC bundles not as an endorsement, but as quantifiable, time-stamped, and recorded actions that are representative of rapid treatment and that are widely known to critical care and emergency practitioners. The specific treatment bundles were not proposed until 2012, but the components of the 3-hour bundle represent standard elements of excellent sepsis care and have always been present in well-treated patients. However, during the entire study time period Sepsis-2 definitions were the hospital standard and SSC bundles were promoted via continuous quality improvement. No specific sepsis treatments are initiated in our region by Emergency Medical Service (EMS) personnel. In the ED, only physicians may order antibiotics or fluids, though blood cultures may be initiated by nursing personnel. We defined the responses separately for two subgroups, based on the SSC bundle recommendations. Group 1: if a fluid bolus was never triggered, i.e., neither hypotension (systolic blood pressure < 90 mm Hg, mean arterial pressure < 70 mm Hg, or documented drop in systolic blood pressure ≥ 40 mm Hg) nor lactate ≥ 4 mmol/L was present, we defined rapid treatment as completion of the remaining bundle components within 3 hours of triage. Group 2: if a fluid bolus was triggered, we defined rapid treatment as Group 1 actions, plus completion of a 2-liter bolus within 2 hours of bolus initiation. Thus, separate feature selection models were developed for each subgroup.
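The subgroup assignment and the rapid-treatment label described above can be sketched as follows. This is a minimal illustration, not the study's extraction code: the `Encounter` fields are hypothetical names, all times are assumed to be hours since triage, and "within 3 hours" and "within 2 hours" are assumed to be inclusive bounds.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Encounter:
    # Hypothetical field names; hours are measured from triage.
    min_sbp: float                             # lowest systolic BP, mm Hg
    min_map: float                             # lowest mean arterial pressure, mm Hg
    max_sbp_drop: float                        # largest documented SBP drop, mm Hg
    max_lactate: Optional[float]               # mmol/L, None if never measured
    other_components_done_hr: Optional[float]  # culture, antibiotics, lactate all done
    bolus_start_hr: Optional[float] = None
    bolus_done_hr: Optional[float] = None


def bolus_triggered(e: Encounter) -> bool:
    """Fluid bolus is triggered by hypotension or lactate >= 4 mmol/L."""
    hypotensive = e.min_sbp < 90 or e.min_map < 70 or e.max_sbp_drop >= 40
    high_lactate = e.max_lactate is not None and e.max_lactate >= 4
    return hypotensive or high_lactate


def label(e: Encounter) -> Tuple[int, bool]:
    """Return (group, rapid_treatment) for one encounter."""
    group = 2 if bolus_triggered(e) else 1
    rapid = (e.other_components_done_hr is not None
             and e.other_components_done_hr <= 3)
    if group == 2:
        # Group 2 additionally needs the 2 L bolus finished within
        # 2 hours of bolus initiation.
        bolus_ok = (e.bolus_start_hr is not None
                    and e.bolus_done_hr is not None
                    and e.bolus_done_hr - e.bolus_start_hr <= 2)
        rapid = rapid and bolus_ok
    return group, rapid
```

Keeping the trigger test separate from the label makes it explicit that the same encounter-level rule decides both which model an encounter belongs to and which completion criteria apply to it.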

We adopted a gradient boosting machine (GBM), an embedded feature selection technique that performs feature selection while constructing and optimizing a prediction model, on the two subgroups separately [18]. GBM is an ensemble learning technique that generates a sequence of decision trees, each of which is designed to further improve prediction accuracy over the previous trees [19, 20]. We randomly partitioned the data into a training set (70% of patients) for model development and a testing set (30% of patients) used for measuring prediction accuracy. To control overfitting, we carefully tuned the model hyper-parameters (i.e. depth of each tree, number of trees, learning rate, and minimum-child-weight) within the training set using 10-fold cross validation. At each iteration during the training stage, we performed “down-sampling”, a common technique of sampling positive and negative cases in equal proportion at each node of each tree to avoid overweighting by negative cases [21]. Missing values were handled in the following fashion: for categorical data, a value of 0 was set for missing, whereas for numerical data, a missing value split was always accounted for, and the best imputation value was adaptively learned, based on improvement in training AUROC, at each tree node within the ensemble. For example, if a variable X takes values (0, 1, 2, 3, NA, and NA), where “NA” stands for missing, the following 2 decisions will be made automatically at each split for each tree: (a) should we split based on missing or not; (b) if we split based on values, for example, > 1 or ≤ 1, should we merge the missing cases with the bin of > 1 or ≤ 1. We used the R package “xgboost” for model development and the “xgboostExplainer” package for SHAP value derivation [22, 23].
Additionally, because the data encompass a crucial decade in the development of sepsis diagnosis and treatment, we evaluated whether treatment year was a significant feature of the data, by stratifying the validation set by year.
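The missing-value decision (b) described above can be illustrated with a toy split. The sketch below is not the xgboost implementation: it scores a candidate split by weighted Gini impurity as a stand-in criterion (an assumption made for brevity), tries sending the missing cases to each side, and keeps the better direction. The function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_score(left, right):
    """Size-weighted impurity of a two-way split; lower is better."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_missing_direction(xs, ys, threshold):
    """Decide whether missing values (None) join the <= or the > bin."""
    left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]
    score_left = split_score(left + missing, right)
    score_right = split_score(left, right + missing)
    if score_left <= score_right:
        return "left", score_left
    return "right", score_right


# Variable X from the text: (0, 1, 2, 3, NA, NA), with made-up outcomes.
xs = [0, 1, 2, 3, None, None]
ys = [0, 0, 1, 1, 1, 1]
direction, score = best_missing_direction(xs, ys, threshold=1)
```

In this toy data the missing cases share the outcome of the "> 1" bin, so they are merged to the right; a variable whose missingness carries signal can therefore influence the model even when no value is recorded.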

Seven hundred sixty distinct predictors were fed into the model, including demographics, vital signs, routine laboratory values, a variety of statistics that summarize vital sign and laboratory trends when multiple observations were made, clinical manifestations of Systemic Inflammatory Response Syndrome (SIRS) and acute organ dysfunction, as well as infection diagnosis present on admission and comorbidity diagnoses before admission (Table 1). To include factors that were more likely to induce rapid treatment, rather than being an outcome of it, we sampled values that occurred strictly before IV fluid initiation for Group 2 patients. For Group 1 patients, we sampled values until all sepsis-defining values were present or until completion of the bundle when that occurred before all sepsis-defining values were present, which we called the prediction point. Factors were selected based on their collective discriminant power, measured by area under the receiver operating characteristic curve (AUROC), and the optimal sensitivity and specificity were determined by the point closest to the top-left corner of the ROC curve by Euclidean distance. We also evaluated area under the precision recall curve (AUPRC) and the positive predictive value (PPV) when optimal sensitivity was achieved, as well as calibration measured by the Brier score and the Hosmer-Lemeshow test (HL).
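The operating-point rule and the Brier score named above can be written out in a few lines. This is an illustrative sketch with hypothetical function names and made-up scores, not the evaluation code used in the study; it assumes every distinct predicted score is tried as a cutoff and that a case is called positive when its score is at or above the cutoff.

```python
import math


def roc_points(y_true, y_score):
    """Return (threshold, sensitivity, specificity) for each distinct cutoff."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        tn = sum(1 for y, s in zip(y_true, y_score) if s < t and y == 0)
        points.append((t, tp / pos, tn / neg))
    return points


def closest_to_top_left(points):
    """Pick the cutoff whose (1 - specificity, sensitivity) is nearest (0, 1)."""
    return min(points, key=lambda p: math.hypot(1 - p[2], 1 - p[1]))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up example: 3 rapid-treatment cases among 7 encounters.
y_true = [0, 0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
threshold, sensitivity, specificity = closest_to_top_left(roc_points(y_true, y_score))
```

The Euclidean rule weights sensitivity and specificity equally, which is one reason the PPV at the chosen point can remain modest when the outcome is rare.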

Table 1. Complete list of variables included in the full model.

| Category | Variables | Type | Count |
|---|---|---|---|
| Demographics | Age, Sex, Race, Ethnicity | Continuous: age. Binary: sex, race, ethnicity | 4 |
| Vital signs | Temperature, Heart Rate, Respiratory Rate, Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP), Mean Arterial Pressure (MAP), Glasgow Coma Scale (GCS), LOC, O2 Saturation, FiO2, SpO2/FiO2 ratio, SpO2, PaO2, PaCO2 | Continuous | 20 |
| Laboratory values | Basic Metabolic Panel [BMP]: Sodium, Potassium, Bicarbonate (CO2), Anion Gap, Glucose, Calcium, Blood Urea Nitrogen (BUN), Serum Creatinine (SCr) baseline, SCr change, Phosphate. Liver Function Test [LFT]: Albumin, Bilirubin baseline, Bilirubin change. Complete Blood Count [CBC]: White Blood Cells (WBC) and percentage band, Hemoglobin, Platelet Count. Other labs: D-Dimer, INR, PTT, Fibrinogen, Lactate, pH | Continuous | 23 |
| Vital signs and laboratory value trends | Initial, highest, lowest, and average value of variable vital signs and labs before prediction point^a | Continuous | 172 |
| Diagnostics present or before admission | Infection diagnosis codes (present on admission) (538), comorbidities (16), Charlson comorbidities index, Chronic Conditions (on problem list or medical history) (7) | Binary | 561 |
| Critical events | Identifiers of first occurrence of 2 SIRS (12), first occurrence of distinct sites of organ dysfunction (7), triage time of the day (4) | Binary | 23 |

^a Prediction point is defined as the time of 3-hour bundle initiation (first occurrence of blood culture order, first antibiotics administration, initial lactate, and fluid bolus) if applicable, or sepsis onset (last occurrence of 2 SIRS, suspected infection and first site of organ dysfunction) if not.

The importance of factors was ranked based on “gain”, or the cumulative improvement in AUROC attributed to all splits involving the predictor across all decision trees [20]. The marginal effects were measured by the SHAP (Shapley Additive exPlanations) values [23], which evaluated how the odds ratio changed by including a particular factor of certain value for each individual patient (S1 Appendix). The SHAP value not only captured the global patterns of effects of each factor but also demonstrated the patient-level variations of the effects. For each model, we reported the 20 factors that provided the most individual “gain” (i.e. cumulatively accounting for at least 50% “gain”). To interpolate the non-linear factorial effects as well as the uncertainties, we fit cubic splines with 6 knots over the SHAP values and constructed a bootstrapped confidence interval for each factor [24]. Since the XGBoost implementation of the GBM model incorporated missing value branches for each split of each tree, we were also able to identify whether the “missing pattern” of certain factors could have meaningful implications [22].
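The odds-ratio reading of SHAP values and the bootstrapped interval can be illustrated as follows. This is a simplified sketch under stated assumptions: SHAP values are taken to be on the log-odds scale (so their exponential is a multiplicative change in odds), the interval is a plain percentile bootstrap of the mean, and the spline smoothing step is omitted. Function names and the example values are hypothetical.

```python
import math
import random


def odds_ratio_multiplier(shap_value):
    """Convert a log-odds SHAP contribution into an odds multiplier."""
    return math.exp(shap_value)


def bootstrap_ci(values, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Made-up SHAP values for patients sharing one value range of a factor.
shap_values = [0.35, 0.42, 0.28, 0.51, 0.39, 0.44, 0.31, 0.47]
multipliers = [odds_ratio_multiplier(v) for v in shap_values]
ci_low, ci_high = bootstrap_ci(multipliers)
```

An interval lying entirely above 1.0 would correspond to a value range read as significantly increasing the odds of rapid treatment, and one lying entirely below 1.0 as significantly decreasing them.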

Results

The initial cohort contained 25,427 encounters identified as suspected infection, of which 11,590 developed markers of two SIRS and at least one site of organ dysfunction within 48 hours of triage (Fig 1). The mean age of the final cohort was 57 (±17) years, evenly distributed between males and females, with the majority being Caucasian. Group 1 included 4,735 (41%) encounters, with 47% women; Group 2 comprised 6,855 (59%) encounters, with 55% women. The Logistic Organ Dysfunction Score (LODS) of Group 2 was slightly higher than that of Group 1 (Table 2).

Fig 1. Consort diagram for cohort inclusion and exclusion.

Table 2. Demographic and physiological characteristics.

| Characteristic | Overall | Group 1 (n = 4,735) | Group 2 (n = 6,855) |
|---|---|---|---|
| Demographic Characteristics | | | |
| Age, mean (sd) | 57 (17) | 57 (16) | 58 (18) |
| Sex, n (%) | | | |
| Female | 6,068 (52) | 2,202 (47) | 3,773 (55) |
| Male | 5,522 (48) | 2,533 (53) | 3,062 (45) |
| Race, n (%) | | | |
| White | 7,633 (66) | 2,917 (62) | 4,573 (67) |
| Black | 2,570 (22) | 1,278 (27) | 1,367 (20) |
| Asian | 166 (1) | 52 (1.1) | 109 (1.6) |
| Other^a | 1,223 (11) | 469 (9.9) | 787 (11.4) |
| Ethnicity, n (%) | | | |
| Non-Hispanic | 10,640 (92) | 4,385 (93) | 6,281 (92) |
| Hispanic | 918 (8) | 350 (7) | 574 (8) |
| Physiological Characteristics | | | |
| Initial Temperature (°C), mean (sd) | 37.3 (1.20) | 37.3 (1.15) | 37.3 (1.22) |
| Initial Heart Rate (/min), mean (sd) | 108 (19.7) | 108 (17.8) | 108 (20.4) |
| Initial Respiratory Rate (/min), mean (sd) | 22 (6.7) | 22 (6.2) | 22 (7.0) |
| Initial WBC counts (K/uL), median (IQR) | 14 (9.4, 19) | 13 (8.6, 17.9) | 14 (9.7, 19.7) |
| Temperature ≥38°C or ≤36°C^b, n (%) | 5,126 (44) | 1,772 (37) | 3,213 (47) |
| Heart Rate ≥90/min^b, n (%) | 10,792 (93) | 4,403 (93) | 6,386 (93) |
| Respiratory Rate ≥20/min^b, n (%) | 9,518 (82) | 3,731 (79) | 5,718 (83) |
| WBC counts >12K/uL or <4K/uL, n (%) | 9,623 (83) | 3,782 (80) | 5,776 (84) |
| Logistic Organ Dysfunction Score | | | |
| LODS within first 3hr since triage, mean (sd) | 1.9 (1.80) | 1.9 (1.59) | 2.5 (2.12) |
| LODS prior to initial antibiotics, mean (sd) | 2.5 (2.05) | 2.1 (1.62) | 2.6 (2.17) |

^a The catch-all “Other” category includes: American Ind/Pac Islander/Two Races, Other and Unknown.

^b All the SIRS events are captured within the first 48 hours during the ED stay.

Times of the bundle components with respect to triage are shown in Table 3 and Fig 1. Bundle completion rates were lower (17% vs. 29% for completion, 2.2% vs. 10.7% for rapid completion) and time to completion of the bundle was longer (27.5 [IQR: 10.4–80.0] hours vs 4.7 [IQR: 3.3–8.0] hours from triage) in Group 2 than in Group 1. Low bundle completion was principally associated with prolonged time to IV fluid completion; 57% of Group 2 patients completed the other three bundle components (blood culture, antibiotics administration, and initial lactate) within 3 hours after triage.

Table 3. Bundle component timing. Cells show n (%, median hours since triage [IQR]).

| Bundle Components | Group 1 (n = 4,735) | Group 2 (n = 6,855) |
|---|---|---|
| Blood Culture | 3,923 (83%, 1.3 [0.8, 4.3]) | 5,857 (85%, 1.3 [0.6, 4.5]) |
| Antibiotic Administration | 4,032 (85%, 3.2 [2.1, 5.4]) | 6,098 (89%, 2.9 [1.7, 4.6]) |
| Initial Lactate | 2,533 (53%, 2.9 [2.1, 5.8]) | 4,387 (64%, 3.4 [2.7, 6.0]) |
| IV Fluid Bolus Begin | NA | 5,402 (79%, 7.4 [2.7, 27.4]) |
| IV Fluid Bolus Complete | NA | 3,150 (46%, 28.4 [10.2, 79.2]) |
| Bundle Completeness | | |
| At least initiated | 4,573 (97%, 1.2 [0.4, 3.4]) | 6,855 (100%, 0.9 [0.4, 2.4]) |
| At least initiated therapeutic components | 2,013 (42%, 4.2 [3.1, 6.4]) | 3,068 (26%, 3.6 [2.3, 5.2]) |
| Completion of bundle components (except for IV Fluid Bolus) | 1,393 (29%, 4.7 [3.3, 8.0]) | 3,920 (57%, 4.4 [3.0, 7.2]) |
| Completion of all bundle components | 1,393 (29%, 4.7 [3.3, 8.0]) | 1,925 (17%, 27.5 [10.4, 80.0]) |
| Rapid completion of bundle components | 506 (10.7%, 2.0 [1.4, 2.6]) | 148 (2.2%, 1.8 [1.3, 2.3], 1.0 [0.4, 1.4]^a) |

^a The first time is the completion of bundle components since triage (except for IV fluid bolus); the second time is the completion of the 2 L bolus since fluid initiation.

Model 1 was built on Group 1 patients, and ultimately selected 142 discriminant factors. Model 2 was developed for Group 2 patients and selected 158 important factors. As shown in Fig 2, both models showed good predictive ability for rapid completion of sepsis bundles based on AUROC in the validation cohort: 0.84 [95% CI, 0.81–0.86] for Model 1 and 0.91 [95% CI, 0.86–0.95] for Model 2. The optimal sensitivity and specificity for Model 1 were 81% [95% CI, 72%–87%] and 74% [95% CI, 70%–83%], and were 91% [95% CI, 81%–97%] and 83% [95% CI, 79%–87%] for Model 2. At the points of optimal sensitivity, Model 1 achieved a PPV of 40% [95% CI, 36%–42%], and Model 2 achieved a PPV of 44% [95% CI, 40%–47%]. Both models achieved competitive AUPRCs (0.41 [95% CI, 0.37–0.43] for Model 1 and 0.29 [95% CI, 0.20–0.41] for Model 2) in comparison to the baseline rates of 10.7% and 2.2%. Both models showed good calibration, with p-value > 0.1 for the HL test. The lift curves for both models suggest good discriminative power: the higher the risk decile, the more rapid treatment cases it includes. In addition, the model performance was consistent across calendar years (Fig 3).
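The lift-by-decile reading mentioned above can be sketched as follows. This is an illustrative computation with a hypothetical function name and made-up data, assuming lift is defined as the rapid-treatment rate within a risk bin divided by the overall rate, with bins formed by ranking encounters from highest to lowest predicted score.

```python
def lift_by_decile(y_true, y_score, n_bins=10):
    """Lift per risk bin, highest-risk bin first.

    Assumes at least `n_bins` encounters and at least one positive case.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    base_rate = sum(y_true) / len(y_true)
    size = len(ranked) / n_bins
    lifts = []
    for b in range(n_bins):
        chunk = ranked[round(b * size):round((b + 1) * size)]
        rate = sum(y for _, y in chunk) / len(chunk)
        lifts.append(rate / base_rate)
    return lifts


# Made-up example: 20 encounters, the 4 highest-scored ones were rapidly treated.
scores = [s / 20 for s in range(20, 0, -1)]
outcomes = [1, 1, 1, 1] + [0] * 16
lifts = lift_by_decile(outcomes, scores)
```

A lift well above 1 in the top bins that falls toward 0 in the lower bins is the pattern described in the text as good discriminative power.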

Fig 2. Prediction performance metrics.

Fig 3. Model performance comparisons over calendar years.

The middle point of each bar represents the corresponding performance metric over validation set within certain calendar year group. Upper and lower bounds of each bar correspond to 95% bootstrapping confidence interval for each metric. The model performance was consistent across calendar years.

A Spearman correlation test (0.6 [0.43–0.76]) suggested that the feature rankings of the two models were statistically different. However, both models identified 8 common risk factors among the top 20 factors: maximum Glasgow Coma Scale (GCS), minimum heart rate, initial temperature, initial WBC count, initial serum creatinine (SCr), minimum diastolic blood pressure (DBP), initial platelet count and age (Fig 4). Among Group 1 patients, the majority of the top 20 most impactful factors to the model were components of the initial physiological profiles, such as: increased bilirubin on first measure or bilirubin increase from pre-hospitalization baseline, arterial pH, blood pressure, heart rate, respiratory rate, SpO2, and INR. The most impactful factors specific to Group 2 patients were more likely to represent the minimum, maximum, or mean of values before the prediction point: minimum and maximum mean arterial pressure, minimum and maximum systolic blood pressure (SBP), mean heart rate, and maxima of respiratory rate, temperature, or total CO2.

Fig 4. Variable importance plot for Model 1 and Model 2.

The importance score of each variable has been scaled to a maximum value of 100. The colors indicate marginal associations of variables with rapid treatment, which are abstractions calculated by comparing SHAP values at 25th, 50th and 75th percentiles of the variable values.

Fig 5 further depicts the full details of the marginal effects of the 12 most impactful features for both models (for better resolution, we report the marginal effects of the features ranked 13–20 in S2 Appendix), which can be used to identify specific value ranges that have strong positive or negative association with rapid treatment. For both Group 1 and Group 2 patients, the presence of a recorded GCS was associated with a reduction in odds ratio of rapid treatment by a factor of 0.3 to 0.6, while the absence of a recorded GCS was associated with an increase of odds ratio by a factor of 1.2 to 1.3. A higher minimum heart rate was associated with increasing likelihood of rapid treatment. The odds ratio for rapid treatment was significantly increased among both Group 1 and Group 2 patients when the minimum heart rate was ≥100 beats/min. Initial temperature showed a U-shaped relationship, with significant increase of odds ratio when ≤ 36°C or ≥ 39°C for the non-hypotensive patients. However, the relationship of temperature to rapid treatment was more monotonic among Group 2 patients, with significant increases in odds ratio for rapid treatment when initial temperature was ≥ 39°C. The odds ratio for rapid treatment was increased by a factor of 1.5 by an initial SCr ≥ 1.5 mg/dL among Group 2 patients; however, the magnitude of this effect was substantially lower and less consistent among Group 1 patients.

Fig 5.

Marginal effects of variables ranked top 12 for Model 1 (Panel A) and Model 2 (Panel B) based on SHAP values, i.e. exponential of the SHAP value. Each dot represents an average change of odds ratio for a variable, taking certain values within a bootstrapped sample. Each colored vertical line depicts a 95% bootstrap confidence interval based on 100 bootstrapped samples. A brown line suggests an odds ratio change significantly higher than 1.0; a blue line suggests an odds ratio change significantly lower than 1.0; a yellow line suggests an odds ratio not significantly different from 1.0. Orange dots represent the odds ratio effect of not having the particular data point recorded for the model. The dashed horizontal line shows an odds ratio of 1.

Other features associated with significant increases in odds ratio for rapid treatment were specific to either group. For Group 1 patients (Fig 5A), an initial bilirubin increase from baseline by ≥ 1.0 mg/dL, initial arterial pH value ≤ 7.4, initial WBC count ≥ 20,000/mm3, or an initial heart rate ≥ 120 beats/min were associated with significantly increased odds ratios for rapid treatment. For Group 2 patients (Fig 5B), multiple representations of blood-pressure-related factors were shown to be important, such as minimum SBP, DBP and MAP, maximum SBP and MAP, and mean SBP. However, these values were not necessarily correlated with one another (Fig 6). A minimum SBP or MAP ≤ 90 mmHg, or a mean SBP ≤ 100 mmHg, was significantly associated with increased likelihood of rapid treatment, while a minimum DBP ≥ 60 mmHg was associated with a moderately higher chance of rapid treatment. Additionally, a mean heart rate ≥ 120/min, and/or an initial SCr increase ≥ 1 mg/dL, and/or missing maximum total CO2 or temperature were associated with significantly increased odds ratios for rapid treatment (S2 Appendix).

Fig 6. The correlation heatmap among different abstractions of the same clinical variable with repeated measurement.

Note that the “Initial” values are not always very different from the other types of summaries.

Discussion

Early and aggressive treatment of sepsis with antibiotics and fluids can be lifesaving [25–28]. Yet, in spite of educational efforts, reporting measures, and even regulations mandating hospital protocols for sepsis diagnosis and treatment, many patients who should be rapidly treated are not [2, 29, 30]. An understanding of what patient factors are associated with rapid treatment (or slow treatment) may allow for beneficial changes in education, individualized presentation of data in the EMR, or consistency in approach to septic patients.

We used a data-driven machine learning approach to evaluate which clinical features that are present early in a patient’s hospital course have been associated with rapid sepsis treatment by physicians in our institution. Our initial investigations began with logistic regression modeling. However, this modeling was limited in two important ways. First, the variables to input into the model were based upon prior clinical knowledge and did not allow for discovery of latent variables that may be important but that were not already intuitive. Second, logistic regression forces linear and independent behavior of variables that, themselves, may be non-linear and highly correlated.

Our study spanned more than a decade of care that encompassed both Sepsis-2 and Sepsis-3 diagnostic criteria, as well as continuous quality improvement efforts to speed care delivery for sepsis. Because the elements of the treatment bundle are relatively basic (antibiotics and intravenous fluids), it is likely that the specific bundle recommendation and the timing recommended by the bundle play only a small role in the rapidity with which physicians treat septic patients. We believe that we have identified patient features that may well promote rapid delivery of basic resuscitative therapies, regardless of the timing recommendations by the SSC guidelines. We developed two models for subgroups, determined by the presence or absence of hypotension or lactate ≥ 4 mmol/L. Both models demonstrated good performance. Model 2 demonstrated better predictive ability, indicating that rapid delivery of treatment bundles is more uniform in these patients.

Although each model identifies more than 100 factors associated with rapid treatment, some general patterns based on the most impactful clinical features emerge. For example, instead of specific values of the SIRS criteria or counts of SIRS criteria met, we identified that ranges of values for temperature, WBC count, heart rate and respiratory rate were associated with rapid 3-hour bundle treatment. This suggests that physicians rely on ranges of findings, rather than on specific thresholds, when making treatment decisions. Some findings were intuitive and predictable, such as that minimal blood pressure among hypotensive patients was associated with more rapid treatment. Other findings are less intuitive, such as that the presence of any recorded GCS was associated with slower treatment. GCS was less likely to be frequently recorded when it was normal. We hypothesize that sepsis treatment may be slowed in patients with abnormal GCS, because they first receive CT scans or other evaluations to assess other causes of abnormal mental status. Age is also an important feature for rapidity of treatment, such that patients over 60 years of age who were not hypotensive received treatment more rapidly than younger patients. However, among hypotensive patients, age over 70 was associated with slower treatment (S2 Appendix).

Rather than focusing on features that predict the diagnosis of sepsis, these studies focus on features that predict more rapid treatment of sepsis. To the extent that more rapid treatment is associated with improved sepsis outcomes, the data could be of use in designing more effective EMR presentations that provide physicians with improved situational awareness and promote desired behavior. These studies cannot establish causality, but they may provide insights into factors that motivate physicians to treat faster when sepsis is present. Much as Amazon promotes specific products or Google promotes specific advertisements based on modeling of previous behavior, alerting mechanisms that highlight the appropriate features when they are present, i.e. those features historically associated with more rapid treatment, could more effectively stimulate providers to speed therapy than do simple alerts based on the presence or absence of sepsis. Because physician behavior varies from one hospital to another, incorporation of these findings in the EMR would require local model implementation and continuous learning within each hospital, which could be accomplished either by local servers or by a cloud implementation that could be informed by data from many hospitals. As discussed in [31], awareness and compliance with protocols can be difficult to maintain in the absence of an effective clinical pathway, which requires careful planning and dedicated resources. A better understanding of alerting features associated with rapid treatment would provide useful and evidence-based insights to a better design for creating such clinical pathways and continuous education.

The majority of the most impactful patient features involve vital signs data. Although that finding might have been predicted anecdotally, we believe this to be the first study demonstrating associations of vital signs data with rapidity of treatment for sepsis. Certain features are machine generated, such as mean respiratory rate over time, which is not a known diagnostic or prognostic parameter in sepsis. However, it seems possible, even likely, that physicians consciously or unconsciously make judgments and take actions related to it. Though it would be of interest, it is not necessary to understand exactly why or how the clinical features promote the desired behavior; that is, our studies do not predict the physician thought process, only the outcome of that process.
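To make the idea of a machine-generated feature concrete, the following minimal sketch derives initial, minimum, maximum, and mean values from a series of timed vital sign readings. The function name, the data layout, and the readings are assumptions for illustration; the study's actual feature engineering pipeline is not reproduced here.

```python
from statistics import mean

def summarize_vital(readings):
    """Collapse (minutes_since_triage, value) readings into summary features."""
    if not readings:
        # No measurement recorded: keep the gap explicit rather than imputing a value.
        return {"initial": None, "min": None, "max": None, "mean": None}
    ordered = sorted(readings, key=lambda reading: reading[0])
    values = [value for _, value in ordered]
    return {
        "initial": values[0],   # first value after triage
        "min": min(values),
        "max": max(values),
        "mean": mean(values),   # e.g. mean respiratory rate over time
    }

# Hypothetical respiratory rate readings (breaths per minute), deliberately out of order.
respiratory_rate = [(30, 24), (5, 22), (60, 20), (90, 18)]
features = summarize_vital(respiratory_rate)
```

Sorting by time before taking the first value matters because flowsheet rows are not guaranteed to arrive in chronological order.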

Our study has several limitations. Because we used only structured EMR data, the definition of suspected infection was based on clinicians’ perceptions of infection, as evidenced by the actions of obtaining blood cultures and initiating antibiotics; we are unable to ascertain more detail about how and why infection was suspected. The actual indication that a physician has diagnosed infection is routinely captured in free-text clinical notes; to further this research, one would need to incorporate free-text nursing and physician notes using natural language processing tools. The diagnostic criteria we used were those of the Sepsis-1 and Sepsis-2 consensus conferences, chosen because they were the extant diagnostic criteria throughout the vast majority of the study period, and physicians and other providers would have been familiar with them. Lactate value was not used in the learning, even though it may be a strong predictor, because of the need to prevent “label leakage”: the prediction targets for both models under the Sepsis-1 and Sepsis-2 criteria are partially defined by initial lactate. This issue would be mitigated in Model 1 if Sepsis-3 diagnostic criteria were applied to the cohort, since lactate is not a defining feature of sepsis in those criteria. Our data sources do not include some factors that may impact rapidity of treatment, such as presenting symptoms [32] and social factors (income level, educational level, occupation, or zip code), nor were we able to capture such features as staffing ratios, pharmacy volumes, or ED wait times; these factors could well affect clinician recognition of sepsis and treatment decision making. We did find that time of day did not significantly affect rapid sepsis treatment (Table 1). Data were analyzed at the encounter level, and it is possible that a given patient could exist in both the training data and the validation data.
Finally, we have demonstrated association, not causation, and it is impossible to determine whether we have described overt patient characteristics or latent provider characteristics. However, we have no record of individual provider characteristics that could inform the models. Further, the data provide no insights into how providers arrived at decisions to treat patients with the associated features, only that they did, i.e. the models do not replicate physician thought processes in any way, but are predictive of physician behavior.
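The encounter-level limitation noted above can be avoided by assigning whole patients, rather than individual encounters, to one partition. The following is a minimal sketch of such a patient-level split, not the procedure used in this study. The record layout, the function name, and the 70% training fraction are assumptions for illustration.

```python
import random

def split_by_patient(encounters, train_fraction=0.7, seed=0):
    """Assign whole patients, not single encounters, to training or validation."""
    patient_ids = sorted({enc["patient_id"] for enc in encounters})
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(patient_ids)
    n_train = round(train_fraction * len(patient_ids))
    train_ids = set(patient_ids[:n_train])
    train = [enc for enc in encounters if enc["patient_id"] in train_ids]
    valid = [enc for enc in encounters if enc["patient_id"] not in train_ids]
    return train, valid

# Hypothetical ED encounters; patients "p1" and "p3" have repeat visits.
visits = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"]
encounters = [{"encounter_id": i, "patient_id": pid} for i, pid in enumerate(visits, start=1)]

train, valid = split_by_patient(encounters)
```

Because the split is made on patient identifiers, every repeat visit follows its patient into the same partition, at the cost of partition sizes that only approximate the requested fraction of encounters.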

Conclusion

We developed machine-learning models for accurately predicting rapid treatment of patients with sepsis in the emergency department and identified clinical factors that are commonly available and that physicians may recognize and use but that are not a part of standard thresholds. These studies may be useful to inform a new generation of EMR sepsis alerting tools.

Supporting information

S1 Appendix. Examples of SHAP value representations.

(PDF)

S2 Appendix

Marginal effects of variables ranked top 11–20 for Model 1 (Panel A) and Model 2 (Panel B) based on SHAP values, i.e. exponential of the SHAP value.

(PDF)

Abbreviations

SSC

Surviving Sepsis Campaign

EMR

Electronic Medical Record

SIRS

Systemic Inflammatory Response Syndrome

ED

Emergency Department

HERON

Healthcare Enterprise Repository for Ontological Narration

EMS

Emergency Medical Service

GBM

Gradient Boosting Machine

SHAP

Shapley Additive exPlanation values

AUROC

Area Under the Receiver Operating Characteristic curve

AUPRC

Area Under the Precision-Recall Curve

PPV

Positive Predictive Value

HL

Hosmer-Lemeshow test

SBP

Systolic Blood Pressure

DBP

Diastolic Blood Pressure

MAP

Mean Arterial Pressure

GCS

Glasgow Coma Scale

LOC

Level of Consciousness

BMP

Basic Metabolic Panel

BUN

Blood Urea Nitrogen

CBC

Complete Blood Count

LFT

Liver Function Test

SCr

Serum Creatinine

WBC

White Blood Cell

LODS

Logistic Organ Dysfunction Score

IQR

Inter-Quartile Range

Data Availability

Data cannot be shared publicly because they include high-dimensional patient-level data. The high-dimensional nature of this dataset poses a potential re-identification risk if it is linked with other data sources. Data are available from the University of Kansas Medical Center Institutional Data Access / Ethics Committee (contact via the HERON team at University of Kansas Medical Center, misupport@kumc.edu) for researchers who meet the criteria for access to confidential data.

Funding Statement

SQS and AP received Blue KC Outcome Research Grants (No.0925-0001), and these authors played a role in study design, decision to publish, and preparation of the manuscript. LRW received CTSA grant UL1TR002366 from NCRR/NIH and played a role in data collection and preparation of the manuscript. XS played a role in study design, developed the training and testing setup, extracted the study cohort, cleaned the data, and performed all experiments. ML played a role in study design and manuscript revision. The funders provided support in the form of salaries for authors XS, ML, LRW, AP, and SQS, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section as well as above.

References

  • 1. Rhee C, Dantes R, Epstein L, et al.: Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009–2014. JAMA 2017; 318:1241–1249.
  • 2. Levy MM, Fink MP, Marshall JC, et al.: 2001 SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions Conference. Crit Care Med. 2003;31(4):1250–1256. 10.1097/01.CCM.0000050454.01978.3B
  • 3. Brandt BN, Gartner AB, Moncure M, et al.: Identifying Severe Sepsis via Electronic Surveillance. Am J Med Qual 2015; 30:559–565. 10.1177/1062860614541291
  • 4. Croft CA, Moore FA, Efron PA, et al.: Computer versus paper system for recognition and management of sepsis in surgical intensive care. J Trauma Acute Care Surg 2014; 76:311–319. 10.1097/TA.0000000000000121
  • 5. Churpek M, Snyder A, Han X, et al.: Quick sepsis-related organ failure assessment, systemic inflammatory response syndrome, and early warning scores for detecting clinical deterioration in infected patients outside the intensive care unit. Am J Respir Crit Care Med 2017; 195:906–911. 10.1164/rccm.201604-0854OC
  • 6. Bone RC, Balk RA, Cerra FB, et al.: Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest. 1992;101(6):1644–1655. 10.1378/chest.101.6.1644
  • 7. Torio CM, Moore BJ: National Inpatient Hospital Costs: The Most Expensive Conditions by Payer, 2013: Statistical Brief #204. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs. Rockville (MD), 2006.
  • 8. Yang M., Liu C., Wang X., Li Y., Gao H., Liu X., et al. (2020). An Explainable Artificial Intelligence Predictor for Early Detection of Sepsis. Critical Care Medicine, 48(11), e1091–e1096. 10.1097/CCM.0000000000004550
  • 9. Komorowski M., Celi L. A., Badawi O., Gordon A. C., & Faisal A. A. (2019). Understanding the Artificial Intelligence Clinician and optimal treatment strategies for Sepsis in intensive care. arXiv preprint arXiv:1903.02345.
  • 10. Reyna M. A., Josef C., Seyedi S., Jeter R., Shashikumar S. P., Westover M. B., et al. (2019, September). Early prediction of Sepsis from clinical data: the PhysioNet/Computing in Cardiology Challenge 2019. In 2019 Computing in Cardiology (CinC) (pp. Page-1). IEEE.
  • 11. Mao Q., Jay M., Hoffman J. L., Calvert J., Barton C., Shimabukuro D., et al. (2018). Multicentre validation of a sepsis prediction algorithm using only vital sign data in the emergency department, general ward and ICU. BMJ Open, 8(1). 10.1136/bmjopen-2017-017833
  • 12. Lauritsen S. M., Kristensen M., Olsen M. V., Larsen M. S., Lauritsen K. M., Jørgensen M. J., et al. (2020). Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nature Communications, 11(1), 1–11. 10.1038/s41467-019-13993-7
  • 13. Itzhak N., Nagori A., Lior E., Schvetz M., Lodha R., Sethi T., et al. (2020, August). Acute Hypertensive Episodes Prediction. In International Conference on Artificial Intelligence in Medicine (pp. 392–402). Springer, Cham.
  • 14. Cherifa M., Blet A., Chambaz A., Gayat E., Resche-Rigon M., & Pirracchio R. (2020). Prediction of an acute hypotensive episode during an ICU hospitalization with a super learner machine-learning algorithm. Anesthesia & Analgesia, 130(5), 1157–1166. 10.1213/ANE.0000000000004539
  • 15. Murphy SN, Weber G, Mendis M, et al.: Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). JAMIA. 2010 Mar-Apr; 17(2):124–30. 10.1136/jamia.2009.000893
  • 16. Waitman LR, Warren JJ, Manos EL, et al.: Expressing observations from electronic medical record flowsheets in an i2b2 based clinical data repository to support research and quality improvement. AMIA Annu Symp Proc. 2011; 2011:1454–63.
  • 17. Dellinger RP, Levy MM, Rhodes A, et al.: Surviving sepsis campaign: international guidelines for management of severe sepsis and septic shock: 2012. Crit Care Med. 41(2):580–637. 10.1097/CCM.0b013e31827e83af
  • 18. Saeys Y, Inza I, Larranaga P.: A review of feature selection techniques in bioinformatics. Bioinformatics. 2007 October; 23(19):2507–2517. 10.1093/bioinformatics/btm344
  • 19. He K, Li Y, Zhu J, et al.: Component-wise gradient boosting and false discovery control in survival analysis with high-dimensional covariates. Bioinformatics. 2016 January 1; 32(1):50–57. 10.1093/bioinformatics/btv517
  • 20. Friedman J: Greedy function approximation: a gradient boosting machine. Ann. Stat. 2001 October; 29(5):1189–1232.
  • 21. Weber I, Florin E, von Papen M, Timmermann L. The influence of filtering and downsampling on the estimation of transfer entropy. PLoS One. 2017;12(11):e0188210. Published 2017 Nov 17. 10.1371/journal.pone.0188210
  • 22. Chen T, Guestrin C.: XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016; pp 785–794.
  • 23. Lundberg SM, Lee SI: A unified approach to interpreting model predictions. Neural Information Processing Systems 2017 Conference. Long Beach, CA, USA.
  • 24. Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine. 1989 May; 8(5):551–61.
  • 25. Liu VX, Morehouse JW, Marelich GP, et al.: Multicenter implementation of a treatment bundle for patients with sepsis and intermediate lactate values. Am J Respir Crit Care Med 2016; 193:1264–1270. 10.1164/rccm.201507-1489OC
  • 26. Liu V, Morehouse JW, Soule J, et al.: Fluid volume, lactate values, and mortality in sepsis patients with intermediate lactate values. Ann Am Thorac Soc 2013; 10:466–473. 10.1513/AnnalsATS.201304-099OC
  • 27. Whiles BB, Deis AS, Simpson SQ: Increased Time to Initial Antimicrobial Administration Is Associated With Progression to Septic Shock in Severe Sepsis Patients. Crit Care Med 2017; 45:623–629. 10.1097/CCM.0000000000002262
  • 28. Pruinelli L, Westra BL, Yadav P, et al.: Delay Within the 3-Hour Surviving Sepsis Campaign Guideline on Mortality for Patients With Severe Sepsis and Septic Shock. Crit Care Med 2018; 46:1. 10.1097/CCM.0000000000002696
  • 29. Deis AS, Whiles BB, Brown AR, et al.: Three-Hour Bundle Compliance and Outcomes in Patients With Undiagnosed Severe Sepsis. Chest 2017; 153:39–45. 10.1016/j.chest.2017.09.031
  • 30. Levy MM, Rhodes A, Phillips GS, et al.: Surviving Sepsis Campaign: association between performance metrics and outcomes in a 7.5-year study. Crit Care Med 2015; 43:3–12. 10.1097/CCM.0000000000000723
  • 31. Uffen J. W., Oosterheert J. J., Schweitzer V. A., Thursky K., Kaasjager H. A. H., and Ekkelenkamp M. B., “Interventions for rapid recognition and treatment of sepsis in the emergency department: a narrative review,” Clin. Microbiol. Infect., February 2020. 10.1016/j.cmi.2020.02.022
  • 32. Filbin MR, Lynch J, Gillingham TD, et al. Presenting Symptoms Independently Predict Mortality in Septic Shock: Importance of a Previously Unmeasured Confounder. Critical Care Medicine. 2018;46(10):1592–1599. 10.1097/CCM.0000000000003260

Decision Letter 0

Robert Moskovitch

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

28 Sep 2020

PONE-D-20-06811

Clinical Factors Associated with Rapid Treatment of Sepsis

PLOS ONE

Dear Dr. Song,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 12 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Robert Moskovitch

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. Thank you for stating the following in the Financial Disclosure section:

[SQS and PA received Blue KC Outcome Research Grants (No.0925-0001) and the authors played role in study design, decision to publish and preparation of the manuscript.

LR received CTSA grant UL1TR002366 from NCRR/NIH and the author played role in data collection and preparation of the manuscript.].   

We note that one or more of the authors are employed by a commercial company: Anurag4Health

  1. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide and URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

5. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Additional Editor Comments (if provided):

Please refer carefully to the reviewers' comments.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: High-level comments:

The paper is relatively clearly written, and of mild interest, although its impact and significance are somewhat limited.

The paper discusses a data analysis technique, and model analysis technique (SHAP) applied to a concrete medical problem (rapid treatment of Sepsis).

The topic of Sepsis is a bit specific, but the methodology can be used for different types of complications.

The results and the analysis that are presented in this paper have some interest in the medical domain, but it is not a substantial contribution.

The contributions have to be carefully stated and rewritten.

I thought it was unfortunate that the authors chose not to submit supplementary material that could have included additional details about the XGBoost parameters.

Low-level comments:

Abstract - "ED" - was used before it was defined. You explained it on page 4.

Abstract - "These machine learning methods identified factors associated with rapid treatment of severe sepsis patients from a large volume of high-dimensional clinical data" - What about the combination of these factors? The combination of these factors leads to Sepsis and not separately.

Page 3 - "Sepsis is an important public health problem in the United States and is the leading cause of death among hospitalized patients" - Where is the definition of Sepsis? Please define Sepsis clearly.

Page 3 - SIRS - again, was used before it was defined. You explained it on Page 4.

Any similar research with SHAP and Sepsis or other complications? Write more about it in the introduction.

The PDF contains low-resolution figures; it's hard to understand, especially when printing. For example, I can't see the features in Figure 4.

Page 5 - 'We randomly partitioned the data' - why randomly? Try with stratified k-fold cross-validation (same class distribution).

Page 6 - "To control overfitting, we carefully tuned the model hyper-parameters within the training set using 10-fold cross-validation." - Why not training, validation, and testing set? You can use the validation set for parameter tuning.

Your explanation about the missing values is not clear to me.

You mentioned SHAP on page 6 but explained it only on page 7.

It is not clear how do you treat the continuous variable. Aggregation over time? It will be better to add figures to explain how exactly you generate the features over time.

Why GBM? Justify your selection. You explained why not logistic regression. Can you tell why not, for example, Random forest? Neural network?

"logistic regression forces linear and independent behavior of variables that, themselves, may be non-linear and highly correlated." And GBMs? Please explain why it's different.

Page 7-8 - "Since the XGBoost implementation of the GBM model incorporated missing value branches for each split of each tree, we were also able to identify if the "missing pattern" of certain factors could have meaningful implications (14)." - Is XGBoost the only one?

Why you put the demographic table on the results section?

Add the research questions before the results.

"These studies cannot establish causality, but they may provide insights into factors that motivate physicians to treat faster when sepsis is present." - I think it's important to emphasize this in the introduction.

Please share your insights in the Conclusions section.

"Because physician behavior varies from one hospital to another, incorporation of these findings in the EMR would require local model implementation and continuous learning within each hospital, which could be accomplished either by local servers or by a cloud implementation that could be informed by data from many hospitals." - This sentence is hard to understand. Consider also trying your method on public data, such as MIMIC III. Any additional information you can add for readers that want to replicate your experiments? Assuming other researchers hold your data.

Did you use the same seed for both models (for group 1, 2)?

Minor points that could be usefully clarified:

Line numbering is part of the template. Did you use another template?

Please prepare figures that are readable in B&W (I think it already made for colorblind).

Reviewer #2: The paper defines an interesting research question, and addresses it using Gradient boosting models on ED visits data. Overall, the manuscript is well-written; however, there are several shortfalls that need to be addressed.

Please refer to the attached file for the detailed comments.

Reviewer #3: In the manuscript titled "Clinical Factors Associated with Rapid Treatment of Sepsis" by Song et. al. the authors described how they used data from University of Kansas Medical Center. They used retrospective data of 10 years period for finding clinical factors indicative for rapid treatment of features.

Limitation of current systems is that EMR is static and present only data that was predefined to be shown for specific use. Although other data is available, it may be buried under menus and submenus.

The uniqueness of the current study is that rather than focusing on features that predict the diagnosis of sepsis, this study focus on features that predict more rapid treatment of sepsis.

Overall, the manuscript is well written and the analysis was done adequately. I do have a few comments that can be used to improve the manuscript. The definitions of the inclusion and exclusion criteria for defining cases/controls and for thresholding were not clear. See related comments below.

The next sentence in the Introduction is not clear: "To further exclude patients who were admitted through the ED but developed sepsis later in their hospitalization, we inferred an hour boundary based on the timing distribution of patients with infection present on admission, which was 13 hours since triage." What is this hour boundary, where does it fall on the timeline of the admission, and for what purposes?

Sepsis is also a critical issue in ICUs. The authors did not mention that point. Please explain whether the model could be applicable to ICUs. If possible, test the model in the ICU of the same medical center.

A reference to the SSC document is missing on page 5.

"EMS (personnel)" is used without definition of EMS abbreviation.

Definition of the two groups is not clear. What is the exact differences between the two groups and what is shared, and why was it defined as such? Is it based on previous studies, or clinical guidelines or something else?

GBM was trained on 70% of patients. What if a patient was admitted more than once to the ED?

In the discussion, the authors state that "Data are analyzed on an encounter level, and it is possible that a given patient could exist in both the training data and the validation data." This is not how it was presented in the introduction, where the counts represented numbers of patients. It is a critical limitation if a patient can appear in both training and validation, as sepsis is a risk factor for future sepsis. Instead, the authors should eliminate multiple samples of the same patient (at random or by chronological order), or keep all admissions of the same patient in the same split (training or validation).

In any case, this decision cannot be buried in the discussion but must be explicitly stated in the methods.
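The second remedy proposed above, keeping all admissions of one patient in the same split, can be illustrated with a minimal Python sketch. This is not the authors' code (the review record states they worked in R); the tuple layout `(patient_id, encounter_id)`, the function name `split_by_patient`, and the toy data are hypothetical, and the 70% fraction is applied to patients rather than to encounters.

```python
import random
from collections import defaultdict


def split_by_patient(encounters, train_fraction=0.7, seed=0):
    """Split encounters so that each patient lands in exactly one partition.

    `encounters` is a list of (patient_id, encounter_id) tuples; both field
    names are hypothetical and only serve this illustration.
    """
    by_patient = defaultdict(list)
    for patient_id, encounter_id in encounters:
        by_patient[patient_id].append(encounter_id)

    patient_ids = sorted(by_patient)          # fixed order for reproducibility
    random.Random(seed).shuffle(patient_ids)  # shuffle patients, not encounters

    n_train = round(train_fraction * len(patient_ids))
    train_patients = set(patient_ids[:n_train])

    train = [(p, e) for p, e in encounters if p in train_patients]
    valid = [(p, e) for p, e in encounters if p not in train_patients]
    return train, valid


# Toy data: patient "A" has three encounters, "B" has two, the rest one each.
encounters = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5),
              ("C", 6), ("D", 7), ("E", 8), ("F", 9), ("G", 10)]
train, valid = split_by_patient(encounters)
```

Because the shuffle operates on patient identifiers, repeat admissions cannot leak from training into validation; the price is that the encounter-level split ratio only approximates 70/30.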

"At each iteration during the training stage, we sampled positive and negative cases in equal proportion" Does equal proportion mean 1:1, or proportional to the overall proportion of positive and negatives, or the proportion of each node?

Moreover, what are positive and negative cases? I assume this relates to timely completion or rapid completion, but the authors need to be explicit, as it was defined well above.
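The two readings of "equal proportion" raised in the question above can be made concrete with a short Python sketch. It does not claim to show what the authors did; the label coding (1 for a positive case, 0 for a negative case), the function names, and the toy class sizes are all assumptions for illustration.

```python
import random


def sample_one_to_one(labels, n_per_class, rng):
    """Reading 1: draw the same NUMBER of positives and negatives (1:1)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)


def sample_stratified(labels, fraction, rng):
    """Reading 2: draw the same FRACTION of each class, keeping the overall ratio."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (rng.sample(pos, round(fraction * len(pos)))
            + rng.sample(neg, round(fraction * len(neg))))


# Hypothetical imbalanced data: 20 positive and 80 negative cases.
labels = [1] * 20 + [0] * 80
rng = random.Random(0)
balanced = sample_one_to_one(labels, n_per_class=10, rng=rng)
stratified = sample_stratified(labels, fraction=0.5, rng=rng)
```

Under the first reading the sample is half positive regardless of prevalence; under the second it keeps the 20:80 ratio of the full data. The two choices give the model very different class priors, which is why the distinction matters.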

"(0, 1, 2, 3, NA, and NA)" shouldn't it be "(0, 1, 2, 3, and NA)"?

Please state version of R used and version of the other packages xgboost and xgboostExplainer.

Table 1:

- No need for "Basic Metabolic Panel [BMP]:", "Liver Function Test [LFT]: " as the details are followed.

I guess the numbers in parentheses are the numbers of features, but the details do not match. For example, there are not 20 vital signs listed.

List of diagnosis codes or variables should be given in the supplement or as a reference where it is listed.

How come "Triage time of the day" gives 4 features? Time of the day is a single datum.

Figure 1 doesn't match the numbers in the text. For example, in the text Group 1 comprised 6855 patients, while in Figure 1 it is Group 2 that has this number of samples.

The figures' resolution is too low and has to be improved.

The authors claimed based on Figure 3 that "model performance was consistent across calendar years". First, should it be 'models'?

Second, such a claim needs to be supported statistically.

"A Spearman correlation test (0.6 [0.43 - 0.76]) suggested that the feature rankings of the two models were statistically different." What is 0.6? Is it the r squared, or test's p-value? A non-significant p-value does not implies that models are statistically different.

How many features were used for this test? All of them or top 20?
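For readers unfamiliar with the statistic under discussion, the following Python sketch computes a Spearman rank correlation between two feature rankings. A Spearman coefficient lies between -1 and 1 and measures agreement between rankings; it is neither an r-squared nor a p-value. Whether the quoted 0.6 is such a coefficient is exactly what the reviewer asks, and this sketch does not settle it. The importance scores below are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical importance scores of the same five features in two models.
importance_m1 = [0.30, 0.25, 0.20, 0.15, 0.10]
importance_m2 = [0.28, 0.10, 0.22, 0.25, 0.15]
rho = spearman_rho(importance_m1, importance_m2)
```

The result depends on which features enter the comparison (all of them or only the top 20), which is the point of the follow-up question above.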

Authors need to compare not only the top features but their directionality.

When comparing the features of model 1 and model 2, the authors claim that among the top factors, initial physiological values were present for M1, while for M2 the top factors were more likely to represent values before the prediction point. The authors need to quantify this.

It is not clear to me how the authors see a reduction in OR when GCS is absent. Is it reflected in Figures 5 and 6?

Figure 7 is not discussed, and therefore should be eliminated.

The authors used one method for two purposes: (1) outcome prediction and (2) identifying important features. Although these two objectives are related, they do not have to be coupled. There are methods that are better for prediction and other methods that are better for finding the most influential features.

Data was not made available.

Code was not shared. Although this is not a requirement, it has become accepted practice for authors to publish their code in a public repository, as a package or as a notebook.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Nadav Rappoport

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: comments.pdf

PLoS One. 2021 May 6;16(5):e0250923. doi: 10.1371/journal.pone.0250923.r002

Author response to Decision Letter 0


8 Dec 2020

The manuscript applies association rule mining to identify features that are associated with rapid treatment of sepsis patients in the Emergency Department (ED). The paper defines an interesting research question and addresses it using gradient boosting models on ED visit data. Overall, the manuscript is well written; however, there are several shortfalls that need to be addressed:

Assumptions and Scope:

The authors mention several key assumptions and definitions throughout the paper. As a reader, I would like to see them all in one place, preferably toward the beginning of the paper as bullet points. Currently, they are defined as needed throughout the paper. For example, “Since sepsis, so defined, was considered to be present on admission…” on page 5. Also, “…components of the 3-hour bundle represent standard elements of excellent sepsis care and have always been present in well-treated patients” on page 5. Similarly, “We hypothesize that patients with abnormal GCS may slow sepsis treatment, because they first receive …” on page 14. To set the stage, most of the key limitations on page 15 (e.g. “definition of suspected infection was based on clinicians’ perceptions of infection”) need to be mentioned earlier, so that readers know exactly what the scope is.

Response: We thank the reviewer for the suggestions on reorganizing the paper to achieve better clarity. We have moved the key assumptions and definitions toward the beginning of the paper, on pages 4 and 5. More specifically,

- We have defined the study cohort using bullet points on page 5.

- The statement “Since sepsis, so defined, was considered to be present on admission…” on page 5 is now integrated into the exclusion criteria: “To further exclude patients who were admitted through the ED but developed sepsis later in their hospitalization, i.e. to include only sepsis present on admission, we inferred an hour boundary based on the timing distribution of patients with infection present on admission, which was 13 hours since triage.” on page 5.

- The statement “…components of the 3-hour bundle represent standard elements of excellent sepsis care and have always been present in well-treated patients” on page 5 is now integrated into: “The outcome of interest was timely completion of the Surviving Sepsis Guidelines (SSC) 3-hour bundle components (10). We chose the SSC bundles not as an endorsement, but as quantifiable, time-stamped, and recorded actions that are representative of rapid treatment and that are widely known to critical care and emergency practitioners. The specific treatment bundles were not proposed until 2012, but the components of the 3-hour bundle represent standard elements of excellent sepsis care and have always been present in well-treated patients. ...” on page 5.

- The statement “definition of suspected infection was based on clinicians’ perceptions of infection” on page 15 is now integrated into one of the inclusion criteria: “presence of a suspected infection is based on definition following clinicians’ perceptions of infection, which was defined as a body fluid culture ordered and anti-infective administered within four hours of one another”, described on page 5.
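The suspected-infection criterion quoted in the last bullet (a body fluid culture ordered and an anti-infective administered within four hours of one another) can be expressed as a small check. The Python sketch below is only an illustration: the function name, the use of lists of timestamps, and the decision to treat exactly four hours as inside the window are assumptions, not details taken from the manuscript.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # "within four hours of one another"


def suspected_infection(culture_order_times, anti_infective_times, window=WINDOW):
    """Return True if any culture order and any anti-infective administration
    fall within `window` of each other, in either order.

    Treating a gap of exactly `window` as a match is an assumption.
    """
    return any(abs(c - a) <= window
               for c in culture_order_times
               for a in anti_infective_times)


# Hypothetical encounter: one culture ordered at 10:00.
cultures = [datetime(2018, 1, 1, 10, 0)]
on_time = [datetime(2018, 1, 1, 13, 30)]   # 3.5 hours after the culture order
too_late = [datetime(2018, 1, 1, 15, 0)]   # 5 hours after the culture order
```

Taking the absolute difference makes the rule symmetric, so an anti-infective given shortly before the culture order also qualifies, which matches the phrase "of one another".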

However, the hypothesis “We hypothesize that patients with abnormal GCS may slow sepsis treatment, because they first receive …” on page 14 was not something we could have known at the design stage. It is one of the findings of the study, which we interpreted from a clinical point of view.

Method:

The authors used the XGBoost and SHAP packages in R. Since these are well-known methods, I would have appreciated more details about the modeling process, any data normalization/standardization procedures for continuous variables, the tuning and parameter selection phase, and the reason for not having a testing set (the authors used 70% training and 30% validation sets, split randomly). I liked the brief juxtaposition of XGBoost and logistic regression on page 13; it would be better to have it earlier, in the methods section. Since the authors used historical data, it would be interesting to see the distribution of Diagnosis Related Groups (DRG) or ICD-10 codes among the sepsis population. This could further validate your sepsis identification method.

Response: We have added more technical details regarding the modeling process, data preprocessing, and tuning in the Methods section (page 6). Note that, to retain better interpretability of the modeling results and owing to the robustness of tree-based models such as GBM, we did not perform additional normalization/standardization procedures for continuous variables, apart from missing value handling. We now also mention the comparison with logistic regression earlier in the manuscript.
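The remark about skipping normalization for tree-based models can be illustrated with a toy example. In the Python sketch below, a single Gini-based split stands in for the trees inside a GBM; this is a deliberate simplification, since gradient boosting libraries use a different, gradient-based split score, but that score likewise depends on the ordering of feature values rather than on their scale. The feature name `heart_rate`, the labels, and the values are invented.

```python
def best_split_partition(values, labels):
    """Return the set of sample indices sent left by the best Gini split."""
    def gini(idx):
        if not idx:
            return 0.0
        p = sum(labels[i] for i in idx) / len(idx)
        return 2 * p * (1 - p)

    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = float("inf"), None
    for cut in range(1, len(order)):
        if values[order[cut - 1]] == values[order[cut]]:
            continue  # no threshold can separate equal values
        left, right = order[:cut], order[cut:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(order)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


def standardize(values):
    """Z-score transform: subtract the mean, divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


# Hypothetical feature and outcome (1 = rapid treatment) for eight encounters.
heart_rate = [62, 75, 88, 96, 104, 118, 131, 140]
rapid = [0, 0, 0, 0, 1, 0, 1, 1]

raw_partition = best_split_partition(heart_rate, rapid)
scaled_partition = best_split_partition(standardize(heart_rate), rapid)
```

Standardization is a monotonic transformation, so the sorted order of the samples, and therefore every candidate partition and its score, is unchanged; only the numeric threshold moves. Both calls select the same left-hand group of samples.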

With regard to ICD coding, this could indeed be interesting, but it probably would not validate the technique. In fact, we used a method of sepsis identification very similar to that of Rhee et al. (JAMA, 2017). Additionally, we demonstrated in a related dataset from our institution that ICD codes are quite insensitive for identifying patients with infection-induced organ dysfunction (Deis et al., CHEST, 2017). The work would be interesting, but we don’t believe that it would enhance our findings.

Results/Discussion:

It seems that variables related to vital signs play a big role in the rapidity of treatment, which is quite expected. The authors also discuss the possibility of using other variables such as machine-generated features (e.g. mean respiratory rate over time on page 15) and social factors (e.g. income, education on page 15). Since the latter is a study limitation, I would like to see what previous research found about these features and how your findings compare. In doing so, you may refer to some recent similar works, e.g. [1], [2].

Moreover, the authors mention an important aspect of the subject, where operational factors contributing to ED wait times (e.g. staffing ratio, wait time on page 15) can impact physicians’ decisions to expedite treatment. As far as I know, congested downstream units (e.g. ICU) prolong the expected ED boarding time. Providers, on the other hand, initiate or expedite treatment in the ED when they foresee a long wait time. I would like to see a brief discussion on this matter. Please refer to [3] for the case of ICU beds. Also, you may define the terms “risk factor” and “contributing factor” that were used for features between the two different groups.

Response: We thank the reviewer for this comment. [1] suggested the importance of advances in personalized medicine for better diagnosing and treating sepsis patients, which is exactly what we are trying to achieve using a robust machine learning model. However, [1] emphasized discovering genomic biomarkers, while we focused on phenotypic features that are commonly recorded in EMR data. The majority of our findings involve vital signs. Although this might have been anticipated anecdotally, we believe this to be the first study quantifying the associations of vital signs data with rapidity of treatment on a continuous scale. [2] pointed out that “most studies reported on changes in compliance with bundle elements and/or mortality rates”; however, few of those studies took a step further to understand the risk factors that affect compliance or non-compliance, which is what we are trying to uncover here. To emphasize this point, we have added the following paragraph to the discussion section: “As discussed in [2], awareness and compliance with protocols can be difficult to maintain in the absence of an effective clinical pathway, which requires careful planning and dedicated resources. With a better understanding of alerting features associated with rapid treatment would provide useful and evidence-based insights to a better design for creating such clinical pathway and continuous education.”

It is an interesting proposition that treatment is initiated or expedited in the ED when ED personnel foresee a long wait time. This would be true under ideal circumstances; however, in the real time workflow of the emergency department and the hospital floors and ICUs, it is rare that one can actually foresee the circumstance. Beds can and do open at any time, and patients are accepted by the admitting team, whether on the hospital ward or in the ICU, as soon as they can be. It is rare, certainly in our hospital, that the ED physicians can actually anticipate that there will be a long delay in transfer to a floor bed or ICU bed. The variable and unpredictable nature of bed availability leads to variable behavior on the part of ED physicians. The simulation you reference [3] necessarily assumes that there is a rational, predictable behavior that simply does not exist under live circumstances. In fact, a full ICU can and does lead to later implementation of sepsis bundles, once time has passed and the ED physician realizes that the patient will not be moving soon from the ED. Delays in this circumstance can be significant.

Minor Comments:

- Page 5. (ref. to SSC document) needs a proper citation

- Page 11. “The marginal effects of the top 20 features for each model are shown in Figure 5 and Figure 6.” is repeated a few lines above.

- Page 12, explanations for Figures 5 and 6 are almost identical, so I would merge them together.

- The quality of the provided figures is poor. Maybe this is because of the Word-to-PDF conversion.

Response: We thank the reviewer for the careful review! We have addressed all the minor comments and included figures of better resolution. We have tried saving the figures in raw format as .tiff files, which should guarantee better resolution. Please also note that we have merged Figures 5 and 6 into the new Figure 5, and made Figure 7 the new Figure 6.

References:

[1] P. Palma and J. Rello, “Precision medicine for the treatment of sepsis: recent advances and future prospects,” Expert Rev. Precis. Med. Drug Dev., vol. 4, no. 4, pp. 205–213, Jul. 2019.

[2] J. W. Uffen, J. J. Oosterheert, V. A. Schweitzer, K. Thursky, H. A. H. Kaasjager, and M. B. Ekkelenkamp, “Interventions for rapid recognition and treatment of sepsis in the emergency department: a narrative review,” Clin. Microbiol. Infect., Feb. 2020.

[3] I. Hasan, E. Bahalkeh, and Y. Yih, “Evaluating intensive care unit admission and discharge policies using a discrete event simulation model,” Simulation, p. 003754972091474, Apr. 2020.

Attachment

Submitted filename: Response to Comments_final.docx

Decision Letter 1

Robert Moskovitch

13 Jan 2021

PONE-D-20-06811R1

Clinical Factors Associated with Rapid Treatment of Sepsis

PLOS ONE

Dear Dr. Song,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper was improved meaningfully; however, please address the additional reviewers' comments and respond with a cover letter describing the corresponding modifications.

Please submit your revised manuscript by Feb 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Robert Moskovitch

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: * The figure's resolution is much better now.

* Please don't use acronyms before defining them, such as ED in the abstract. Don't assume all readers are familiar with these acronyms, even if they are popular.

* Add similar research on SHAP and sepsis or other complications, as well as studies predicting sepsis or other complications. Write more about it in the introduction. Compare and discuss the results with those of other studies, and describe the differences.

Here are some suggestions for quite similar studies:

Murugesan, I., Murugesan, K., Balasubramanian, L., & Arumugam, M. (2019, September). Interpretation of Artificial Intelligence Algorithms in the Prediction of Sepsis. In 2019 Computing in Cardiology (CinC) (pp. Page-1). IEEE.

Lauritsen, S. M., Kristensen, M., Olsen, M. V., Larsen, M. S., Lauritsen, K. M., Jørgensen, M. J., ... & Thiesson, B. (2020). Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nature communications, 11(1), 1-11.

Yang, M., Liu, C., Wang, X., Li, Y., Gao, H., Liu, X., & Li, J. (2020). An Explainable Artificial Intelligence Predictor for Early Detection of Sepsis. Critical Care Medicine, 48(11), e1091-e1096.

Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C., & Faisal, A. A. (2019). Understanding the Artificial Intelligence Clinician and optimal treatment strategies for Sepsis in intensive care. arXiv preprint arXiv:1903.02345.

And similar papers for prediction of complications:

Reyna, M. A., Josef, C., Seyedi, S., Jeter, R., Shashikumar, S. P., Westover, M. B., ... & Clifford, G. D. (2019, September). Early prediction of Sepsis from clinical data: the PhysioNet/Computing in Cardiology Challenge 2019. In 2019 Computing in Cardiology (CinC) (pp. Page-1). IEEE.

Fleuren, L. M., Klausch, T. L., Zwager, C. L., Schoonmade, L. J., Guo, T., Roggeveen, L. F., ... & Elbers, P. W. (2020). Machine learning for the prediction of Sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive care medicine, 1-18.

Itzhak, N., Nagori, A., Lior, E., Schvetz, M., Lodha, R., Sethi, T., & Moskovitch, R. (2020, August). Acute Hypertensive Episodes Prediction. In International Conference on Artificial Intelligence in Medicine (pp. 392-402). Springer, Cham.

Mao, Q., Jay, M., Hoffman, J. L., Calvert, J., Barton, C., Shimabukuro, D., ... & Das, R. (2018). Multicentre validation of a sepsis prediction algorithm using only vital sign data in the emergency department, general ward and ICU. BMJ open, 8(1).

Cherifa, M., Blet, A., Chambaz, A., Gayat, E., Resche-Rigon, M., & Pirracchio, R. (2020). Prediction of an acute hypotensive episode during an ICU hospitalization with a super learner machine-learning algorithm. Anesthesia & Analgesia, 130(5), 1157-1166.

Reviewer #2: (No Response)

Reviewer #3: In the revised manuscript titled "Clinical Factors Associated with Rapid Treatment of Sepsis", the authors dramatically improved the writing. It is now much clearer, and the authors addressed all previous comments and suggestions raised by the reviewer.

However, I still have a few comments that can improve the manuscript further.

- The author listed 3 criteria "Patients were included by satisfying the following criteria:" Are all three criteria required for every sample to be included?

- The "EMS personnel" abbreviation is used without definition.

- Table 2:

1. There is no need for replicating column titles for every subsection.

2. I would recommend also stratifying the columns by rapid treatment vs. not.

- Figure 6: It is not clear whether this is for group 2 only, or cross sectional.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

PLoS One. 2021 May 6;16(5):e0250923. doi: 10.1371/journal.pone.0250923.r004

Author response to Decision Letter 1


5 Mar 2021

Response to Reviewer 1: We appreciate the reviewer’s comments for the minor revision.

- We have identified all acronyms and added their full spellings when first introduced in the text. In addition, we have added a new “Acronym” section at the beginning of the text.

- We have also enriched the “Introduction” section with all the recommended articles added. On second page, we added “…Numerous efforts have been made to improve prognostic accuracy and efficiency for sepsis and its complications via machine learning techniques. For example, Yang et al.(8), Komorowski et al. (9) and Reyna et al. (10) developed artificial intelligence models to predict sepsis in intensive care; while Mao et al. (11) and Lauritsen et al. (12) extended the prediction application to ED and general ward. Itzhak et al. (13) and Cherifa et al. (14) developed models to predict acute hypertensive or hypotensive episodes among ICU admissions. However, few, if any studies have been designed to understand specific clinical features that patients exhibit at the time that physicians initiate rapid treatment of sepsis. Without assuming causality, one could evaluate from a situational awareness perspective which clinical features are most closely associated with rapid, thorough treatment. We believed that data from such a study could provide novel information that could be used to prompt rapid sepsis treatment for appropriate patients, regardless of the extant diagnostic criteria.”

Response to Reviewer 3: We appreciate the reviewer’s comments for the minor revision.

- Yes. All three criteria are required for the patient encounter to be eligible for the study, and we have stated so very specifically in this revision.

- EMS stands for Emergency Medical Service. Besides EMS, we have identified all acronyms and added their full spellings where they are first introduced in the text. In addition, we have added a new “Acronym” section at the beginning of the text.

- For Table 2, we believe we understand what the reviewer is intending, and we considered altering the table. However, on reflection, we believe that we should leave the table as is. The main thrust of the paper is to identify the demographic and other characteristics that identify, i.e. separate, the patients who receive rapid treatment, compared with those who do not. We do so with much more sophisticated techniques than one would employ with a perusal of summary statistics in a table. We opted to provide the demographics of the entire study set and to look at the subsets of those who required vasopressors and those who did not, and we think this better serves the purpose of informing the reader of who our study population is. This is underscored by the fact that demographic information was not among the features with the greatest effect in the model. The physiological data that most significantly separate rapidly treated patients from others are displayed in Figure 2, while demographics do not rise to that level of influence. We did remove the redundant column titles, as suggested.

- Figure 6 is based on the entire (cross-sectional) data set, i.e. both group 1 and group 2.

Attachment

Submitted filename: Response to Reviewers_rev2.docx

Decision Letter 2

Robert Moskovitch

19 Apr 2021

Clinical Factors Associated with Rapid Treatment of Sepsis

PONE-D-20-06811R2

Dear Dr. Song,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Robert Moskovitch

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Following the reviewers' responses, it seems that the paper is ready for publication.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #3: I see that the authors of the manuscript "Clinical Factors Associated with Rapid Treatment of Sepsis" addressed all reviewers' comments adequately.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

Acceptance letter

Robert Moskovitch

26 Apr 2021

PONE-D-20-06811R2

Clinical Factors Associated with Rapid Treatment of Sepsis

Dear Dr. Song:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Robert Moskovitch

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Appendix. Examples of SHAP value representations.

    (PDF)

    S2 Appendix. Marginal effects of variables ranked 11–20 for Model 1 (Panel A) and Model 2 (Panel B) based on SHAP values, i.e., the exponential of the SHAP value.

    (PDF)

    Attachment

    Submitted filename: comments.pdf

    Attachment

    Submitted filename: Response to Comments_final.docx

    Attachment

    Submitted filename: Response to Reviewers_rev2.docx

    Data Availability Statement

    Data cannot be shared publicly because they include high-dimensional patient-level data. The high dimensionality of this dataset poses a potential re-identification risk if it is linked with other data sources. Data are available from the University of Kansas Medical Center Institutional Data Access / Ethics Committee (contact via the HERON team at University of Kansas Medical Center, misupport@kumc.edu) for researchers who meet the criteria for access to confidential data.


    Articles from PLoS ONE are provided here courtesy of PLOS
