Author manuscript; available in PMC: 2025 Nov 1.
Published in final edited form as: Am J Emerg Med. 2024 Sep 2;85:140–147. doi: 10.1016/j.ajem.2024.08.037

Validation and Comparison of Triage-Based Screening Strategies for Sepsis

Kasra Rahmati a,b, Samuel M Brown b,c, Joseph R Bledsoe d, Paul Passey b, Peter P Taillac e, Scott T Youngquist e, Matthew M Samore f, Catherine L Hough g, Ithan D Peltan b,c,*
PMCID: PMC11525104  NIHMSID: NIHMS2024275  PMID: 39265486

Abstract

OBJECTIVE:

This study sought to externally validate and compare proposed methods for stratifying sepsis risk at ED triage.

METHODS:

This nested case/control study enrolled ED patients from four hospitals in Utah and evaluated the performance of previously published sepsis risk scores amenable to use at ED triage based on their area under the precision-recall curve (AUPRC, which balances positive predictive value and sensitivity) and area under the receiver operating characteristic curve (AUROC, which balances sensitivity and specificity). Score performance for predicting whether patients met Sepsis-3 criteria in the ED was compared to patients’ assigned ED triage score (Canadian Triage and Acuity Scale [CTAS]) with adjustment for multiple comparisons.

RESULTS:

Among 2000 case/control patients, 981 met Sepsis-3 criteria on final adjudication. The best performing sepsis risk scores were the Predict Sepsis version #3 (AUPRC 0.183, 95% CI 0.148-0.256; AUROC 0.859, 95% CI 0.843-0.875) and Borelli scores (AUPRC 0.127, 95% CI 0.107-0.160, AUROC 0.845, 95% CI 0.829-0.862), which significantly outperformed CTAS (AUPRC 0.038, 95% CI 0.035-0.042, AUROC 0.650, 95% CI 0.628-0.671, p<0.001 for all AUPRC and AUROC comparisons). The Predict Sepsis and Borelli scores exhibited sensitivity of 0.670 and 0.678 and specificity of 0.902 and 0.834, respectively, at their recommended cutoff values and outperformed Systemic Inflammatory Response Syndrome (SIRS) criteria (AUPRC 0.083, 95% CI 0.070-0.102, p=0.052 and p=0.078, respectively; AUROC 0.775, 95% CI 0.756-0.795, p<0.001 for both scores).

CONCLUSIONS:

The Predict Sepsis and Borelli scores exhibited improved performance including increased specificity and positive predictive values for sepsis identification at ED triage compared to CTAS and SIRS criteria.

Keywords: Sepsis, emergency department (ED) triage, screening, prediction model validation

INTRODUCTION

Sepsis is life-threatening organ dysfunction caused by a dysregulated host response to infection [1]. Prompt treatment of sepsis improves outcomes and lowers costs [2, 3]. However, rapid identification and diagnosis of sepsis is a difficult proposition given the syndrome’s frequently subtle presentation and the busy emergency department (ED) environment.

Early screening interventions can expedite delivery of care to patients with sepsis and may substantially decrease mortality rates, but must both maximize sensitivity and minimize false positives to avoid resource wastage and overtreatment [4, 5]. Patient- and population-level complications due to antimicrobial-associated adverse events present tradeoff challenges between early treatment and overtreatment [5], emphasizing the need for validated and optimized sepsis prediction models.

While several published studies have aimed to create prediction models and scores to help identify patients at high risk of having sepsis based on patient past medical history, labs, vital signs, and other risk factors in addition to symptoms, few such models have undergone appropriate external validation [6, 7]. Moreover, prior studies have employed metrics for discrimination emphasizing sensitivity and specificity rather than sensitivity and positive predictive value (PPV) — the key determinants of clinical utility for these models [8] — and some have evaluated patients already diagnosed with sepsis, thus limiting generalizability and applicability to real-time triage [6, 8].

To determine the optimal model for identification of patients at high risk of sepsis at ED triage, we compared untargeted (i.e., usual care) triage-based risk assessment based on the Canadian Triage and Acuity Scale (CTAS) to 12 previously published sepsis identification models that utilize data points readily available at triage (the Hamilton Early Warning Score [HEWS], Modified Early Warning Score [MEWS], National Early Warning Score [NEWS], Prehospital Early Sepsis Detection [PRESEP], Predict Sepsis Version #3 [Predict Sepsis v3], Prehospital Severe Sepsis [PRESS], Quick Sepsis-related Organ Failure Assessment [qSOFA], Rapid Emergency Medicine Score [REMS], Screening to Enhance Prehospital Identification of Sepsis Score [SEPSIS], Sepsis Early Warning Score [SEWS], and Simple Triage Scoring System [STSS]) and to systemic inflammatory response syndrome (SIRS) criteria [9–22] in a multicenter analysis emphasizing model sensitivity and positive predictive value [8].

METHODS

We performed a nested case-control study of adult patients who presented to the ED. The Intermountain Healthcare Institutional Review Board approved this study with waiver of informed consent.

Setting

The study took place in four EDs within Intermountain Health (Intermountain), a vertically integrated healthcare system based in Utah. The EDs were located in two suburban community hospitals, one urban community hospital, and one urban tertiary/teaching hospital in and around Salt Lake City, Utah. Study EDs varied in size (19 to 57 ED beds) and annual visit volumes (22,000 to 89,000).

Subjects

Adult (age ≥18 years) patients presenting to a study ED during 2018 were eligible for inclusion in the study. Trauma patients and patients who left the ED without treatment were excluded. Patients were identified and demographic and clinical data obtained via previously validated electronic queries of Intermountain’s Enterprise Data Warehouse (EDW), a centrally managed database linking data across multiple information systems [23, 24]. Structured manual review of the electronic medical record by trained study personnel was used to complete missing data, verify outliers, and obtain data unavailable from the EDW (see Supplemental Methods) [24–26].

To provide sufficient outcome events for sepsis prediction models’ validation within a feasible sample size for data abstraction [27, 28], we randomly selected 1000 case subjects who met Sepsis-3 criteria [1] (IV or equivalent antimicrobials, body fluid culture, and Sequential Organ Failure Assessment score ≥2 points above baseline) while in the ED and 1000 control subjects who did not meet Sepsis-3 criteria using a random number generator. Trained personnel employed validated structured methods [25] to determine whether ED clinicians considered infection in the differential diagnosis and whether they suspected or diagnosed infection. Sepsis case patients who did not have infection suspected or diagnosed by the treating ED clinician but instead received antimicrobials for an indication other than acute infection (e.g., prophylaxis for cirrhotic patient with gastrointestinal bleeding) were reassigned as controls. Abstractors also adjudicated the presence and source of infection in the ED based on all clinical documentation and laboratory data available in the patient’s medical record (including data through hospital discharge and any relevant outpatient data) using validated methods as previously described (see the Supplemental Methods for additional details) [25].

Outcome and exposures

The primary outcome was meeting Sepsis-3 criteria as defined above while in the ED. Patient demographic and clinical information was obtained from the EDW, including the triage acuity score assigned to each patient by trained ED nurses using the 5-point Canadian Triage and Acuity Scale (CTAS), which ranges from 1 (resuscitation) to 5 (non-urgent). Validation of outlying data, missing data completion, and collection of data elements unavailable in the EDW (e.g., clinicians’ suspected source of infection, symptoms) occurred via structured review of Intermountain’s electronic medical record by trained reviewers as previously described [24–26] (see details in the Supplemental Methods). Previously published sepsis prediction scores designed for application at ED triage or the prehospital setting and systemic inflammatory response syndrome (SIRS) criteria [9–22] were calculated using first-available ED data (Table 1). Score inputs unavailable from ED triage documentation (white blood count, differential, and skin exam) were obtained via laboratory testing or documentation performed later in the ED encounter (Table 1). Consistent with clinical practice, we assigned the points that would have resulted from normal values when a score input was missing. Scores’ discrimination for detection of patients meeting Sepsis-3 criteria in the ED was measured by the area under the precision-recall curve (AUPRC) and area under the receiver operating characteristic curve (AUROC).

Table 1.

Sepsis risk score thresholds and input parameters.

Sepsis risk score (a) | Score threshold (b) | Temperature (°C) | Heart rate (beats/min) | Systolic blood pressure (mmHg) | Resp. rate (breaths/min) | SpO2 (%) | Glasgow Coma Scale score | Other
BAS 90-30-90 [9, 49] | ≥1 | | | <90 | >30 | <90 | |
Borelli [10] | ≥3 | <36; >38.3 | >90 | <90 | >20 | <90 | <15 | Suspected infection
CTAS (c) [11, 49] | ≤2 | <36; ≥38.5 | ≥90 | >220 or DBP >130; 200-220 or DBP 110-130; <80 or DBP <60 | ≥20 | <90; <92; 92-94; ≥94 | 3-9; 10-13; 14-15 | Presenting problem, nursing assessment
HEWS [12, 49, 50] | ≥3 | ≤35(3); 35.1-36; 38-39; ≥39.1(2) | ≤40(2); 41-50; 101-110; 111-130(2); >130(3) | <71(3); 71-90(2); 171-200(2); >200(3) | <8(3); 8-13(2); 21-30(2); >30(3) | <85(3); 85-92 | 10-14; 7-9(2); ≤6(3) |
MEWS [13, 50] | ≥5 | <35(2); ≥38.5(2) | <40(2); 41-50; 101-110; 111-129(2); ≥130(3) | <70(3); 71-80(2); 81-100; ≥200(2) | <9(2); 15-20; 21-29(2); ≥30(3) | | 10-14; 7-9(2); ≤6(3) |
NEWS [14, 50, 51] | ≥5 | ≤35(3); 35.1-36; 38.1-39; ≥39.1(2) | ≤40(3); 41-50; 91-110; 111-130(2); ≥131(3) | ≤90(3); 91-100(2); 101-110(2); ≥220(3) | ≤8(3); 9-11; 21-24(2); ≥25(3) | ≤91(3); 92-93(2); 94-95 | <15(3) |
Predict Sepsis v3 [15] | ≥2 | 38.1-38.5; >38.5(2) | >110 | ≤100(2) | >24 | <94 | <15(2) |
PRESEP [16] | ≥4 | <36; >38(4) | <90(2) | <90(2) | <22 | <92(2) | |
qSOFA [17] | ≥2 | | | ≤100 | ≥22 | | <15 |
REMS [18, 52] | ≥9 | <40.9(4); 39-40.9(3); 38.5-38.9; 34-35.9; 32-33.9(2); 30-31.9(3); <30(4) | <179(4); 140-179(3); 110-139(2); 55-69(2); 40-54(3); <39(4) | MAP: >159(4); 130-159(3); 110-129(2); 55-69(2); 40-54(3); <39(4) | >49(4); 35-49(3); 25-34; 10-11; 6-9(2); <5(4) | <75(4); 75-85(3); 86-89 | <5(4); 5-7(3); 8-10(2); 11-13 |
SEPSIS [19] | ≥5 | 37.5-39.5; >39.5(2) | 101-140; 141-160(2) | 60-99; >160(−1) | 21-40; 41-60(2) | <94 | ≤12 | Skin exam (d)
SEWS [20] | ≥7 | | >110(2) | <106(3) or <100(4); DBP <78 or <58(2) | >24(5) | | <14(3) |
SIRS [21] | ≥2 | <36; >38 | >90 | | >20 | | | WBC (d); Differential (d)
STSS [22] | ≥3 | | Shock index >1 (HR ÷ SBP) | | >30 | <90 | <15 |

(a) Thresholds adapted from original score development/validation publication or external publications, as indicated.

(b) The presence of each criterion is worth 1 point toward the total score, except where alternative values are indicated in parentheses.

(c) Utilized CTAS value was obtained from nurse documentation and was not calculated de novo. CTAS inputs are shown for illustrative purposes.

(d) Data element obtained from first-available physical exam documentation or laboratory values.

Abbreviations: BAS, Blood Pressure “Andningsfrekvens” Saturation score; BP, blood pressure; CTAS, Canadian Triage and Acuity Scale; DBP, diastolic blood pressure; HEWS, Hamilton Early Warning Score; MAP, mean arterial pressure; MEWS, Modified Early Warning Score; NEWS, National Early Warning Score; PRESEP, Prehospital Early Sepsis Detection; Predict Sepsis v3, Predict Sepsis Version #3; PRESS, Prehospital Severe Sepsis; qSOFA, Quick Sepsis-related Organ Failure Assessment; REMS, Rapid Emergency Medicine Score; Resp. rate, respiratory rate; SBP, systolic blood pressure; SEPSIS, Screening to Enhance Prehospital Identification of Sepsis Score; SEWS, Sepsis Early Warning Score; SIRS, Systemic Inflammatory Response Syndrome; SpO2, oxygen saturation; STSS, Simple Triage Scoring System.
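To illustrate how point-based scores of this kind can be computed with missing inputs treated as normal, the following minimal Python sketch implements the Predict Sepsis v3 and qSOFA rows of Table 1. It is not the study's code: the function names, argument names, and the example patient are hypothetical, and a missing input is assumed to contribute 0 points because normal values score 0 in Table 1.

```python
from typing import Callable, Optional


def points(value: Optional[float], rule: Callable[[float], int]) -> int:
    """Apply a scoring rule; a missing input (None) earns the points of a normal value (0)."""
    return 0 if value is None else rule(value)


def predict_sepsis_v3(temp_c=None, heart_rate=None, sbp=None,
                      resp_rate=None, spo2=None, gcs=None) -> int:
    """Predict Sepsis v3 total from the Table 1 thresholds (positive screen at >= 2)."""
    return (
        points(temp_c, lambda t: 2 if t > 38.5 else 1 if t >= 38.1 else 0)
        + points(heart_rate, lambda hr: 1 if hr > 110 else 0)
        + points(sbp, lambda s: 2 if s <= 100 else 0)
        + points(resp_rate, lambda rr: 1 if rr > 24 else 0)
        + points(spo2, lambda o: 1 if o < 94 else 0)
        + points(gcs, lambda g: 2 if g < 15 else 0)
    )


def qsofa(sbp=None, resp_rate=None, gcs=None) -> int:
    """qSOFA total from the Table 1 thresholds (positive screen at >= 2)."""
    return (
        points(sbp, lambda s: 1 if s <= 100 else 0)
        + points(resp_rate, lambda rr: 1 if rr >= 22 else 0)
        + points(gcs, lambda g: 1 if g < 15 else 0)
    )


# Hypothetical triage record with no documented Glasgow Coma Scale score.
patient = {"temp_c": 38.7, "heart_rate": 118, "sbp": 104, "resp_rate": 22, "spo2": 95}
ps_total = predict_sepsis_v3(**patient)
qsofa_total = qsofa(sbp=patient["sbp"], resp_rate=patient["resp_rate"])
print(ps_total, ps_total >= 2, qsofa_total, qsofa_total >= 2)
```

In this hypothetical record the two scores disagree: the fever and tachycardia reach the Predict Sepsis v3 threshold, while only the respiratory rate counts toward qSOFA.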

Data analysis

To measure the benefit from using a sepsis prediction score at triage in comparison to untargeted risk assessment using usual triage criteria, the primary analysis compared the AUPRC of each candidate sepsis prediction score to the AUPRC for generic/usual care risk assessment based on the CTAS value assigned by ED nurses at triage. Secondary analyses compared scores to SIRS criteria. SIRS criteria were selected as a comparator based on their frequent incorporation into sepsis screening strategies. The AUPRC, which evaluates how risk scores balance positive predictive value (precision) and sensitivity (recall), has been recommended over the area under the receiver operating characteristic curve (AUROC or C statistic) when analyzing prediction accuracy for rare events [8, 29, 30]. The AUPRC value for a non-informative test is equal to the population event rate. Since the PPV values necessary for AUPRC estimation are dependent on prevalence and therefore are not directly available from the case/control data, PPV values were calculated using the sensitivity and specificity obtained from the case/control data and the sepsis prevalence calculated in the overall cohort including all eligible ED patients using a standard formula [31]. We used bootstrapping to generate 95% confidence intervals for AUPRC and PPV. All potential scores were tested for equivalent performance jointly and in pairwise comparisons versus the triage acuity score. P values for pairwise comparisons were adjusted for multiple comparisons using the Bonferroni method [32].

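$$\mathrm{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{(\text{sensitivity} \times \text{prevalence}) + \left[(1 - \text{specificity}) \times (1 - \text{prevalence})\right]} \qquad \text{(Formula 1)}$$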
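As a worked check of Formula 1, the short Python sketch below recomputes PPV from the rounded sensitivity and specificity values reported in Table 4 and the cohort prevalence reported in the Results (3,302 of 132,807 eligible patients). The function name is hypothetical, and because the inputs are rounded, agreement with the published PPV values should be expected only to about three decimal places.

```python
def ppv_from_case_control(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Formula 1: PPV from case/control sensitivity and specificity plus cohort prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sepsis prevalence in the full eligible cohort, as reported in the Results.
prevalence = 3302 / 132807

ppv_predict_sepsis = ppv_from_case_control(0.670, 0.902, prevalence)
ppv_ctas = ppv_from_case_control(0.693, 0.769, prevalence)
print(round(ppv_predict_sepsis, 3), round(ppv_ctas, 3))
```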

Data analysis was performed in Stata version 16.1 (College Station, TX). A p value or multiplicity-adjusted p value of ≤0.05 was considered statistically significant. We estimated a priori that a balanced case/control sample including 1988 patients would yield 80% power to detect an AUPRC difference of 0.048 before adjustment for multiplicity.
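The analysis itself was run in Stata, and the authors' code is not reproduced here. The Python sketch below only illustrates the general idea of a prevalence-adjusted AUPRC with a bootstrap confidence interval under stated assumptions: precision at each threshold is recomputed with Formula 1, the area is accumulated as a step-wise sum over recall (average precision), and cases and controls are resampled separately. The function names, the integration rule, the stratified resampling, and the nearest-rank percentile interval are all assumptions, and the data are toy values.

```python
import random


def adjusted_auprc(scores, labels, prevalence):
    """Step-wise area under the precision-recall curve, with precision
    re-weighted to the population prevalence via Formula 1."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    area, previous_recall = 0.0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        recall = sum(s >= threshold for s in cases) / len(cases)
        false_pos_rate = sum(s >= threshold for s in controls) / len(controls)
        true_pos = recall * prevalence
        false_pos = false_pos_rate * (1 - prevalence)
        precision = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else 0.0
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def bootstrap_ci(scores, labels, prevalence, n_boot=1000, seed=0):
    """Percentile 95% interval; cases and controls are resampled separately (an assumption)."""
    rng = random.Random(seed)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    estimates = []
    for _ in range(n_boot):
        c = rng.choices(cases, k=len(cases))
        k = rng.choices(controls, k=len(controls))
        estimates.append(adjusted_auprc(c + k, [1] * len(c) + [0] * len(k), prevalence))
    estimates.sort()
    return estimates[int(0.025 * (n_boot - 1))], estimates[int(0.975 * (n_boot - 1))]


# Toy case/control data (hypothetical): integer risk scores, 1 = case, 0 = control.
toy_scores = [4, 3, 3, 2, 1, 0, 2, 1, 1, 0, 0, 0]
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
estimate = adjusted_auprc(toy_scores, toy_labels, prevalence=0.024)
low, high = bootstrap_ci(toy_scores, toy_labels, prevalence=0.024, n_boot=200)
print(round(estimate, 3), round(low, 3), round(high, 3))
```

A score that assigns every patient the same value yields an adjusted AUPRC equal to the prevalence under this construction, which matches the non-informative baseline described in the text.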

RESULTS

Of 132,807 adult ED patients eligible for analysis, 74,942 (56.4%) were female and 3,302 (2.4%) met Sepsis-3 criteria in the ED (Supplemental Table 1). Patients were generally similar across study sites (Supplemental Table 2). Among the 2,000 case/control patients, 981 (49.1%) met Sepsis-3 criteria and were included as sepsis case patients after confirmation that the ED clinician diagnosed infection (Figure 1). Based on infection status adjudication using all available data, infection was present in the ED for 903 (92.0%) sepsis case patients and 160 (15.7%) non-sepsis control patients (Supplemental Figure 1). Sepsis case patients were older (62 ±19 years versus 51±21), had more comorbidities (weighted Elixhauser comorbidity score median 18 [IQR 8-30] vs 7 [IQR 0-18]), and exhibited more abnormal vital signs (Table 2).

Figure 1.


Subject inclusion flow diagram. (Legend: * Objective Sepsis-3 criteria required administration of eligible antimicrobial, body fluid culture, and SOFA score ≥2 points above baseline but did not account for reason for antimicrobial administration being suspected or confirmed infection. ** Adjudication of the presence/absence of infection on final assessment used standardized, validated methods and all available data.)

Table 2.

Demographic and clinical characteristics of case/control ED patients.

Sepsis-3 cases (N=981) Controls (N=1,019) Missing P value
Age (years) 62 (19) 51 (21) 0 (0%) <0.001
Female sex 550 (56.1%) 551 (54.1%) 0 (0%) 0.393
Hispanic/Latino or race other than White 175 (17.8%) 195 (19.1%) 0 (0%) 0.649
ED arrival from long-term care facility 73 (7.4%) 20 (2.0%) 0 (0%) <0.001
Weighted Elixhauser comorbidity score 19 (8-30) 10 (0-18) 122 (5.6%) <0.001
Initial ED vital signs
  Temperature (°C) 37.5 (1.3) 36.7 (0.7) 11 (0.5%) <0.001
  Fever 354 (36.1%) 73 (7.2%) 0 (0%) <0.001
  Respiratory rate 21 (6.1) 19 (4.0) 2 (0.1%) <0.001
  Systolic blood pressure (mmHg) 126 (28) 138 (24) 4 (0.2%) <0.001
  Shock (SBP<90 or MAP <65) 142 (14.5%) 21 (2.1%) 0 (0%) <0.001
  Heart rate 102 (23) 88 (20) 3 (0.1%) <0.001
Initial ED laboratory results
  Lactate measured and <2 mmol/L 368 (37.5%) 100 (9.8%) 0 (0%) <0.001
  White blood cell count (1000/dL) 12.8 (7.3) 9.5 (4.6) 284 (14.2%) <0.001
ED triage acuity score 0 (0%) <0.001
  Resuscitation (1) 37 (3.8%) 19 (1.9%)
  Emergent (2) 582 (59.3%) 338 (33.2%)
  Urgent (3) 358 (36.5%) 608 (59.7%)
  Semi-urgent or non-urgent (4 or 5) 4 (0.4%) 54 (5.3%)
Acute infection considered in ED clinician differential diagnosis 981 (100%) 424 (41.6%) 0 (0%) <0.001
ED-diagnosed presence/source of infection 0 (0%) <0.001
  Pneumonia/pulmonary 266 (27.1%) 39 (3.8%)
  Urinary Tract 301 (30.7%) 31 (3.0%)
  Other or unknown source 414 (42.2%) 93 (9.1%)
  No infection suspected 0 (0%) 856 (84.0%)
Infection present on final adjudication 903 (92.0%) 160 (15.7%) 0 (0%) <0.001
IV or equivalent antimicrobial administration in ED 981 (100%) 97 (9.5%) 0 (0%) <0.001
ED disposition 0 (0%) <0.001
  Admit to inpatient floor 491 (50.1%) 218 (21.4%)
  Admit to ICU 206 (21.0%) 42 (4.1%)
  Transfer to non-study system facility 14 (1.4%) 10 (1.0%)
  Discharged from ED 251 (25.6%) 747 (73.3%)
  Expired 19 (1.9%) 2 (0.2%)
Hospital mortality 53 (5.4%) 8 (0.8%) 0 (0%) <0.001

Values reported as mean (SD), median (IQR), or n(%).

Abbreviations: ED, emergency department; ICU, Intensive Care Unit; MAP, mean arterial pressure; SBP, systolic blood pressure.

The Predict Sepsis (AUPRC 0.183 [95% CI 0.148-0.256]; AUROC 0.859 [95% CI 0.843-0.875]), Borelli (AUPRC 0.127 [95% CI 0.107-0.160]; AUROC 0.845 [95% CI 0.829-0.862]), and SEPSIS (AUPRC 0.127 [95% CI 0.105-0.176]; AUROC 0.818 [95% CI 0.800-0.836]) scores exhibited the best discrimination among tested sepsis risk scores (Table 3). Compared to CTAS (AUPRC 0.038 [95% CI 0.035-0.042]; AUROC 0.650 [95% CI 0.628-0.671]), the AUPRC- and AUROC-based discrimination was significantly higher for all three scores (p<0.001 for each pairwise comparison). While the Predict Sepsis, Borelli, and SEPSIS scores also outperformed SIRS (AUPRC 0.083 [95% CI 0.070-0.102]; AUROC 0.775 [95% CI 0.756-0.795]) in AUROC-based evaluations and qSOFA (AUPRC 0.081 [95% CI 0.069-0.097]; AUROC 0.690 [95% CI 0.670-0.710]) had a lower AUROC than SIRS (p<0.001 for each pairwise comparison), the differences in AUPRC-based discrimination were not statistically significant (p=0.052, p=0.078, p=0.442, and p=1.0, respectively) (Figures 2 and 3).

Table 3.

Prediction models and their respective classification accuracy in comparison to CTAS and SIRS.

Score Score threshold AUPRC (95% CI) AUPRC p Value vs CTAS AUPRC p Value vs SIRS AUROC (95% CI) AUROC p Value vs CTAS AUROC p Value vs SIRS
CTAS ≤2 0.038 (0.035-0.042) <0.001 0.650 (0.628-0.671) <0.001
SIRS ≥2 0.083 (0.070-0.102) <0.001 0.775 (0.756-0.795) <0.001
BAS 90-30-90 ≥1 0.057 (0.047-0.084) 0.598 0.351 0.618 (0.602-0.634) 0.086 <0.001
Borelli ≥3 0.127 (0.107-0.160) <0.001 0.078 0.845 (0.829-0.862) <0.001 <0.001
HEWS ≥3 0.076 (0.063-0.116) 0.351 1.00 0.747 (0.726-0.768) <0.001 0.010
MEWS ≥5 0.123 (0.103-0.170) 0.013 0.546 0.767 (0.747-0.787) <0.001 1.00
NEWS ≥5 0.113 (0.092-0.160) <0.001 1.00 0.790 (0.770-0.810) <0.001 1.00
Predict Sepsis v3 ≥2 0.183 (0.148-0.256) <0.001 0.052 0.859 (0.843-0.875) <0.001 <0.001
PRESEP ≥4 0.116 (0.095-0.161) 0.026 0.832 0.776 (0.756-0.796) <0.001 1.00
qSOFA ≥2 0.081 (0.069-0.097) <0.001 1.00 0.690 (0.670-0.710) 0.009 <0.001
REMS ≥9 0.060 (0.051-0.081) 0.286 0.286 0.692 (0.670-0.715) 0.014 <0.001
SEPSIS ≥5 0.127 (0.105-0.176) 0.013 0.442 0.818 (0.800-0.836) <0.001 <0.001
SEWS ≥7 0.061 (0.052-0.090) 0.611 0.832 0.672 (0.649-0.696) 1.00 <0.001
STSS ≥3 0.087 (0.072-0.104) <0.001 1.00 0.718 (0.697-0.738) <0.001 <0.001

Figure 2.


Comparison of the area under the precision-recall curve of 14 sepsis risk scores at ED triage. Area under the precision-recall curve (AUPRC) values for scores are reported together with p value (adjusted for multiple comparisons) versus that achieved by the Canadian Triage and Acuity Scale (CTAS). A non-informative test would have an AUPRC value equal to the population sepsis prevalence (0.024). (Legend: * = p≤0.05 & p>0.01, ** = p≤0.01 & p>0.001, *** = p≤0.001)

Figure 3.


Comparison of the area under the receiver operating characteristic curve of 14 sepsis risk scores at ED triage. Area under the receiver operating characteristic curve (AUROC) values for scores are reported together with p value (adjusted for multiple comparisons) versus that achieved by the Canadian Triage and Acuity Scale (CTAS). A non-informative test would have an AUROC value equal to 0.5. (Legend: * = p≤0.05 & p>0.01, ** = p≤0.01 & p>0.001, *** = p≤0.001)

Triage categorization as “emergent” or “resuscitation” (CTAS ≤2) had 69% sensitivity for sepsis (Table 4), similar to the Predict Sepsis (0.67) and Borelli (0.68) scores at their recommended cutoff values but substantially higher than the SEPSIS score (0.12). Each of the three scores had higher specificity (0.90, 0.83, and 0.99, respectively) and positive predictive value (0.15, 0.09, and 0.27, respectively) compared to CTAS (specificity 0.77, positive predictive value 0.07). SIRS ≥2 had slightly lower sensitivity (0.63) and positive predictive value (0.04) than CTAS.

Table 4.

Prediction models and their respective classification accuracy.

Score Sensitivity Specificity PPV NPV
CTAS 0.693 0.769 0.071 0.990
SIRS 0.631 0.650 0.044 0.986
BAS 90-30-90 0.300 0.934 0.104 0.981
Borelli 0.678 0.834 0.094 0.990
HEWS 0.573 0.795 0.067 0.987
MEWS 0.284 0.956 0.141 0.981
NEWS 0.531 0.871 0.095 0.987
Predict Sepsis v3 0.670 0.902 0.148 0.991
PRESEP 0.524 0.907 0.125 0.987
qSOFA 0.178 0.966 0.117 0.979
REMS 0.190 0.940 0.075 0.979
SEPSIS 0.116 0.992 0.274 0.978
SEWS 0.505 0.765 0.052 0.984
STSS 0.117 0.985 0.169 0.978
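To make the Table 4 figures more concrete, the following Python sketch converts a score's sensitivity and specificity into expected counts per 1,000 unselected ED patients and a negative predictive value. It is illustrative arithmetic only: the function name and the per-1,000 framing are assumptions, the prevalence is the cohort value reported in the Results, and the inputs are the rounded Table 4 values.

```python
def per_thousand(sensitivity: float, specificity: float, prevalence: float, n: int = 1000) -> dict:
    """Expected screening outcomes among n unselected patients."""
    septic = n * prevalence
    non_septic = n - septic
    true_pos = sensitivity * septic
    false_pos = (1 - specificity) * non_septic
    missed = septic - true_pos
    true_neg = non_septic - false_pos
    return {
        "flagged": true_pos + false_pos,      # patients with a positive screen
        "true_pos": true_pos,                 # flagged patients who meet Sepsis-3 criteria
        "false_pos": false_pos,               # flagged patients who do not
        "missed": missed,                     # septic patients with a negative screen
        "npv": true_neg / (true_neg + missed),
    }


prevalence = 3302 / 132807
predict_sepsis = per_thousand(0.670, 0.902, prevalence)
ctas = per_thousand(0.693, 0.769, prevalence)
for name, result in (("Predict Sepsis v3", predict_sepsis), ("CTAS", ctas)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions, the two strategies identify a similar number of septic patients per 1,000 arrivals, but CTAS ≤2 flags roughly twice as many patients overall (approximately 242 versus 112), which is the difference in false positive burden that the PPV values summarize.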

DISCUSSION

In this multicenter study employing granular and high-fidelity clinical data, the Predict Sepsis (version 3) and Borrelli risk scores demonstrated substantially improved performance for identifying ED patients at increased risk for sepsis at ED triage compared to both a standard severity assessment strategy and SIRS-based screening. Among unselected patients, the Predict Sepsis score achieved a PPV nearly 2-fold higher than standard triage severity assessment and 4-fold higher than SIRS-based screening while maintaining similar 67% sensitivity. The SEPSIS score achieved a positive predictive value of 27%, but had only 12% sensitivity.

Treatment, and therefore diagnosis, of sepsis is a time-critical problem for ED clinicians that is complicated by the condition’s non-specific presentation and its relatively low frequency [33]. In the absence of a simple, non-invasive test for the syndrome, sepsis diagnosis requires a resource-intensive, multidisciplinary, and multimodal evaluation. Risk scores, alerts, and other decision support aids with inadequate sensitivity risk blinding clinicians to patients requiring rapid evaluation for sepsis, while systems with inadequate PPV risk alert fatigue, resource drain, overtreatment, and anchoring bias [34–37]. Effective rapid assessment pathways therefore require activation criteria that simultaneously maximize the proportion of sepsis patients captured and minimize the false positive activation rate. In this setting, AUPRC provides a more balanced assessment of prediction models’ relative clinical utility than the standard prediction model discrimination statistic, AUROC, which yields an overly optimistic assessment of discriminative performance [8, 29]. For instance, SIRS significantly outperformed qSOFA in AUROC-based assessments but had similar discrimination based on the scores’ AUPRC, reflecting the tradeoff between qSOFA’s low sensitivity but high PPV versus SIRS’s moderate sensitivity but low PPV.

In addition to emphasizing AUPRC, our study builds on past studies investigating ED-based performance of sepsis risk scores along several lines. First, we focused on scores’ performance at ED triage, a critical opportunity point which nevertheless has been studied relatively infrequently to date [12, 20, 38–42]. We tested a broad range of scores proposed or employed in practice for triage-based sepsis risk assessment, including scores like NEWS initially developed for generic outcome prediction. We also evaluated multiple scores initially developed for prehospital (i.e., ambulance) use [9, 10, 15, 16, 19], an environment similar to ED triage in that the breadth and depth of inputs available for risk scores are limited. Finally, and perhaps most importantly, our study simulated real-world risk assessment at ED triage by including unselected patients presenting to the ED rather than a population restricted to patients with suspected infection, suspected severe illness, or both. To our knowledge, our study is the first external validation of triage-based sepsis risk assessment performance among unselected patients for many of the evaluated scores. For those scores that have been evaluated for this setting and population, including HEWS, SIRS, qSOFA, and NEWS, our findings were in general agreement with prior reports [12, 39–42].

Our study investigated sepsis risk scores’ utility to guide a binary diagnostic or treatment decision (e.g., activation of a sepsis care pathway), and its case-control design precluded assessment of the tested scores’ calibration. While AUPRC values correlate better with clinical utility than the most common discrimination metric, AUROC, AUPRC-based performance may differ in settings with different sepsis prevalence [8]. However, sepsis prevalence in this study was similar to a recent national estimate [33] and our AUROC-based comparisons paralleled the findings from AUPRC-based comparisons, supporting the generalizability of our findings.

Additional limitations include the fact that study data were captured retrospectively. Inputs for some evaluated scores were not routinely captured at ED triage (skin exam for the SEPSIS risk score and white blood cell count and differential for SIRS). As these inputs are likely to be stable during the ED encounter, data collected or documented later in the ED course were assumed to be representative of triage values. Some potential sepsis risk scores were dependent on inputs (e.g., end-tidal carbon dioxide) not captured in routine practice [43] or were not portable [44, 45], precluding evaluation in the present study. Systematic, prospective capture of the diverse inputs for all potential scores at ED triage may have yielded different findings. While CTAS has a similar 5-level structure and overall similar performance to other triage scales such as the Emergency Severity Index and the Manchester Triage System [46, 47], results from our comparison may not generalize to EDs using triage systems other than CTAS [40, 48]. Finally, alternative windows for defining the outcome that identified sepsis patients with infections missed in the ED or sepsis-defining organ failure developing after ED departure may have yielded different results.

In summary, the Predict Sepsis score and the Borelli score exhibited the best clinical utility for identifying patients at increased risk for sepsis at ED triage, substantially outperforming standard triage-based severity assessment methods. Our findings highlight the value of external validation and balanced metrics when evaluating the clinical utility of risk scores for low outcome prevalence conditions.

Supplementary Material


Funding:

The study was supported by grants from the National Institutes of Health (K23GM129661 and R35GM151147) and the Intermountain Research and Medical Foundation. The funding sources had no role in the design, conduct, analysis, or reporting of this study.

Conflicts of interest:

Outside the present work, KR reports an issued patent, IDP reports grant funding from Janssen Pharmaceuticals and payments to his institution for study enrollments from Regeneron and Bluejay Diagnostics, SMB reports royalties on a patent from ReddyPort, STY reports an industry grant from CoLabs, and JRB reports payment from JAJ LLC medical consulting. Other authors report no conflicts of interest.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Prior presentation: A preliminary version of this manuscript was presented at the October 2022 annual scientific assembly of the American College of Emergency Physicians in San Francisco, CA.

REFERENCES

[1] Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801–10. doi: 10.1001/jama.2016.0287.
[2] Paoli CJ, Reynolds MA, Sinha M, Gitlin M, Crouser E. Epidemiology and Costs of Sepsis in the United States-An Analysis Based on Timing of Diagnosis and Severity Level. Crit Care Med. 2018;46(12):1889–97. doi: 10.1097/CCM.0000000000003342.
[3] Seymour CW, Gesten F, Prescott HC, Friedrich ME, Iwashyna TJ, Phillips GS, et al. Time to Treatment and Mortality during Mandated Emergency Care for Sepsis. N Engl J Med. 2017;376(23):2235–44. doi: 10.1056/NEJMoa1703058.
[4] Gatewood MO, Wemple M, Greco S, Kritek PA, Durvasula R. A quality improvement project to improve early sepsis care in the emergency department. BMJ Qual Saf. 2015;24(12):787–95. doi: 10.1136/bmjqs-2014-003552.
[5] Martínez ML, Plata-Menchaca EP, Ruiz-Rodríguez JC, Ferrer R. An approach to antibiotic treatment in patients with sepsis. J Thorac Dis. 2020;12(3):1007–21. doi: 10.21037/jtd.2020.01.47.
[6] Murad MH, Katabi A, Benkhadra R, Montori VM. External validity, generalisability, applicability and directness: a brief primer. BMJ Evid Based Med. 2018;23(1):17–9. doi: 10.1136/ebmed-2017-110800.
[7] Steckler A, McLeroy KR. The importance of external validity. Am J Public Health. 2008;98(1):9–10. doi: 10.2105/AJPH.2007.126847.
[8] Leisman DE. Rare Events in the ICU: An Emerging Challenge in Classification and Prediction. Crit Care Med. 2018;46(3):418–24. doi: 10.1097/CCM.0000000000002943.
[9] Wallgren UM, Castren M, Svensson AE, Kurland L. Identification of adult septic patients in the prehospital setting: a comparison of two screening tools and clinical judgment. Eur J Emerg Med. 2014;21(4):260–5. doi: 10.1097/MEJ.0000000000000084.
[10] Borrelli G, Koch E, Sterk E, Lovett S, Rech MA. Early recognition of sepsis through emergency medical services pre-hospital screening. Am J Emerg Med. 2019;37(8):1428–32. doi: 10.1016/j.ajem.2018.10.036.
[11] Bullard MJ, Unger B, Spence J, Grafstein E, Group CNW. Revisions to the Canadian Emergency Department Triage and Acuity Scale (CTAS) adult guidelines. CJEM. 2008;10(2):136–51. doi: 10.1017/s1481803500009854.
[12] Skitch S, Tam B, Xu M, McInnis L, Vu A, Fox-Robichaud A. Examining the utility of the Hamilton early warning scores (HEWS) at triage: Retrospective pilot study in a Canadian emergency department. CJEM. 2018;20(2):266–74. doi: 10.1017/cem.2017.21.
[13] Subbe CP, Kruger M, Rutherford P, Gemmel L. Validation of a modified Early Warning Score in medical admissions. QJM. 2001;94(10):521–6. doi: 10.1093/qjmed/94.10.521.
[14] Alam N, Vegting IL, Houben E, van Berkel B, Vaughan L, Kramer MH, et al. Exploring the performance of the National Early Warning Score (NEWS) in a European emergency department. Resuscitation. 2015;90:111–5. doi: 10.1016/j.resuscitation.2015.02.011.
[15] Wallgren UM, Sjolin J, Jarnbert-Pettersson H, Kurland L. The predictive value of variables measurable in the ambulance and the development of the Predict Sepsis screening tools: a prospective cohort study. Scand J Trauma Resusc Emerg Med. 2020;28(1):59. doi: 10.1186/s13049-020-00745-6.
[16] Bayer O, Schwarzkopf D, Stumme C, Stacke A, Hartog CS, Hohenstein C, et al. An Early Warning Scoring System to Identify Septic Patients in the Prehospital Setting: The PRESEP Score. Acad Emerg Med. 2015;22(7):868–71. doi: 10.1111/acem.12707.
[17] Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, et al. Assessment of Clinical Criteria for Sepsis: For the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):762–74. doi: 10.1001/jama.2016.0288.
[18] Olsson T, Terent A, Lind L. Rapid Emergency Medicine Score: a new prognostic tool for in-hospital mortality in nonsurgical emergency department patients. J Intern Med. 2004;255(5):579–87. doi: 10.1111/j.1365-2796.2004.01321.x.
[19] Smyth MA, Gallacher D, Kimani PK, Ragoo M, Ward M, Perkins GD. Derivation and internal validation of the screening to enhance prehospital identification of sepsis (SEPSIS) score in adults on arrival at the emergency department. Scand J Trauma Resusc Emerg Med. 2019;27(1):67. doi: 10.1186/s13049-019-0642-2.
[20] Mellhammar L, Linder A, Tverring J, Christensson B, Boyd JH, Akesson P, et al. Scores for sepsis detection and risk stratification - construction of a novel score using a statistical approach and validation of RETTS. PLoS One. 2020;15(2):e0229210. doi: 10.1371/journal.pone.0229210.
  • [21].Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest. 1992;101(6):1644–55. 10.1378/chest.101.6.1644. [DOI] [PubMed] [Google Scholar]
  • [22].Talmor D, Jones AE, Rubinson L, Howell MD, Shapiro NI. Simple triage scoring system predicting death and the need for critical care resources for use during epidemics. Crit Care Med. 2007;35(5):1251–6. 10.1097/01.CCM.0000262385.95721.CC. [DOI] [PubMed] [Google Scholar]
  • [23].Clayton PD, Narus SP, Huff SM, Pryor TA, Haug PJ, Larkin T, et al. Building a comprehensive clinical information system from components: the approach at Intermountain Health Care. Methods Inf Med. 2003;42(1):1–7.http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=12695790&retmode=ref&cmd=prlinks. [PubMed] [Google Scholar]
  • [24].Peltan ID, Bledsoe JR, Oniki TA, Sorensen J, Jephson AR, Allen TL, et al. Emergency Department Crowding Is Associated With Delayed Antibiotics for Sepsis. Ann Emerg Med. 2019;73(4):345–55. 10.1016/j.annemergmed.2018.10.007. [DOI] [PubMed] [Google Scholar]
  • [25].Hooper GA, Klippel CJ, McLean SR, Stenehjem EA, Webb BJ, Murnin ER, et al. Concordance Between Initial Presumptive and Final Adjudicated Diagnoses of Infection Among Patients Meeting Sepsis-3 Criteria in the Emergency Department. Clin Infect Dis. 2023;76(12):2047–55. 10.1093/cid/ciad101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Peltan ID, Brown SM, Bledsoe JR, Sorensen J, Samore MH, Allen TL, et al. ED Door-to-Antibiotic Time and Long-term Mortality in Sepsis. Chest. 2019;155(5):938–46. 10.1016/j.chest.2019.02.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Collins GS, Ogundimu EO, Altman DG. Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Stat Med. 2016;35(2):214–26. 10.1002/sim.6787. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Riley RD, Debray TPA, Collins GS, Archer L, Ensor J, van Smeden M, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med. 2021;40(19):4230–51. 10.1002/sim.9025. [DOI] [PubMed] [Google Scholar]
  • [29].Ozenne B, Subtil F, Maucort-Boulch D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol. 2015;68(8):855–9. 10.1016/j.jclinepi.2015.02.010.
  • [30].Leisman DE, Harhay MO, Lederer DJ, Abramson M, Adjei AA, Bakker J, et al. Development and Reporting of Prediction Models: Guidance for Authors From Editors of Respiratory, Sleep, and Critical Care Journals. Crit Care Med. 2020;48(5):623–33. 10.1097/CCM.0000000000004246.
  • [31].Tenny S, Hoffman MR. Prevalence. StatPearls, Treasure Island, FL: StatPearls Publishing; 2024.
  • [32].Wright SP. Adjusted P-Values for Simultaneous Inference. Biometrics. 1992;48(4):1005–13. 10.2307/2532694.
  • [33].Wang HE, Jones AR, Donnelly JP. Revised National Estimates of Emergency Department Visits for Sepsis in the United States. Crit Care Med. 2017;45(9):1443–9. 10.1097/CCM.0000000000002538.
  • [34].IDSA Sepsis Task Force. Infectious Diseases Society of America (IDSA) Position Statement: Why IDSA Did Not Endorse the Surviving Sepsis Campaign Guidelines. Clin Infect Dis. 2018;66(10):1631–5. 10.1093/cid/cix997.
  • [35].Umscheid CA, Betesh J, VanZandbergen C, Hanish A, Tait G, Mikkelsen ME, et al. Development, implementation, and impact of an automated early warning and response system for sepsis. J Hosp Med. 2015;10(1):26–31.
  • [36].Taylor SP, Rozario N, Kowalkowski MA, Gunn EC, Ellerman J, Heffner AC, et al. Trends in False-Positive Code Sepsis Activations in the Emergency Department. Ann Am Thorac Soc. 2020;17(4):520–2. 10.1513/AnnalsATS.201910-757RL.
  • [37].Kuye I, Rhee C. Spotlight: Overdiagnosis and Delay: Challenges in Sepsis Diagnosis. https://psnet.ahrq.gov/web-mm/spotlight-overdiagnosis-and-delay-challenges-sepsis-diagnosis; 2018 [accessed September 16, 2023].
  • [38].Durr D, Niemi T, Despraz J, Tusgul S, Dami F, Akrour R, et al. National Early Warning Score (NEWS) Outperforms Quick Sepsis-Related Organ Failure (qSOFA) Score for Early Detection of Sepsis in the Emergency Department. Antibiotics (Basel). 2022;11(11):1518. 10.3390/antibiotics11111518.
  • [39].Sterk E, Hyun BH, Rech MA. Comparison of an ED triage sepsis screening tool and qSOFA in identifying CMS SEP-1 patients. Am J Emerg Med. 2020;38(10):1995–9. 10.1016/j.ajem.2020.06.030.
  • [40].Nieves Ortega R, Rosin C, Bingisser R, Nickel CH. Clinical Scores and Formal Triage for Screening of Sepsis and Adverse Outcomes on Arrival in an Emergency Department All-Comer Cohort. J Emerg Med. 2019;57(4):453–60 e2. 10.1016/j.jemermed.2019.06.036.
  • [41].Filbin MR, Thorsen JE, Lynch J, Gillingham TD, Pasakarnis CL, Capp R, et al. Challenges and Opportunities for Emergency Department Sepsis Screening at Triage. Sci Rep. 2018;8(1):11059. 10.1038/s41598-018-29427-1.
  • [42].Usman OA, Usman AA, Ward MA. Comparison of SIRS, qSOFA, and NEWS for the early identification of sepsis in the Emergency Department. Am J Emerg Med. 2019;37(8):1490–7. 10.1016/j.ajem.2018.10.058.
  • [43].Hunter CL, Silvestri S, Ralls G, Stone A, Walker A, Papa L. A prehospital screening tool utilizing end-tidal carbon dioxide predicts sepsis and severe sepsis. Am J Emerg Med. 2016;34(5):813–9. 10.1016/j.ajem.2016.01.017.
  • [44].Brann F, Sterling NW, Frisch SO, Schrager JD. Sepsis Prediction at Emergency Department Triage Using Natural Language Processing: Retrospective Cohort Study. JMIR AI. 2024;3:e49784. 10.2196/49784.
  • [45].Delahanty RJ, Alvarez J, Flynn LM, Sherwin RL, Jones SS. Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis. Ann Emerg Med. 2019;73(4):334–44. 10.1016/j.annemergmed.2018.11.036.
  • [46].Worster A, Fernandes CM, Eva K, Upadhye S. Predictive validity comparison of two five-level triage acuity scales. Eur J Emerg Med. 2007;14(4):188–92. 10.1097/MEJ.0b013e3280adc956.
  • [47].Zachariasse JM, van der Hagen V, Seiger N, Mackway-Jones K, van Veen M, Moll HA. Performance of triage systems in emergency care: a systematic review and meta-analysis. BMJ Open. 2019;9(5):e026471. 10.1136/bmjopen-2018-026471.
  • [48].Zaboli A, Ausserhofer D, Pfeifer N, Solazzo P, Magnarelli G, Siller M, et al. Triage of patients with fever: The Manchester triage system’s predictive validity for sepsis or septic shock and seven-day mortality. J Crit Care. 2020;59:63–9. 10.1016/j.jcrc.2020.05.019.
  • [49].Lane DJ, Wunsch H, Saskin R, Cheskes S, Lin S, Morrison LJ, et al. Screening strategies to identify sepsis in the prehospital setting: a validation study. CMAJ. 2020;192(10):E230–E9. 10.1503/cmaj.190966.
  • [50].McNarry AF, Goldhill DR. Simple bedside assessment of level of consciousness: comparison of two simple assessment scales with the Glasgow Coma scale. Anaesthesia. 2004;59(1):34–7. 10.1111/j.1365-2044.2004.03526.x.
  • [51].Silcock DJ, Corfield AR, Gowens PA, Rooney KD. Validation of the National Early Warning Score in the prehospital setting. Resuscitation. 2015;89:31–5. 10.1016/j.resuscitation.2014.12.029.
  • [52].Ruangsomboon O, Boonmee P, Limsuwat C, Chakorn T, Monsomboon A. The utility of the rapid emergency medicine score (REMS) compared with SIRS, qSOFA and NEWS for Predicting in-hospital Mortality among Patients with suspicion of Sepsis in an emergency department. BMC Emerg Med. 2021;21(1):2. 10.1186/s12873-020-00396-x.
