Abstract
This cohort study uses data from electronic health records to assess variability in a sepsis prediction model across 9 hospitals.
Use of sepsis prediction models may be associated with reduced patient mortality.1 There is concern, however, about the external validity of widely implemented models. For example, although the Epic Sepsis Model (ESM) performed poorly in an academic health system,2 another study at a community hospital found ESM improved the timeliness of antibiotic administration.3 The reasons for these disparate findings are unclear.4 Because increased ESM alerting at larger vs smaller hospitals suggests that hospital-related factors are associated with sepsis model performance variation,4,5 we evaluated the ESM performance across 9 hospitals.
Methods
This cohort study enrolled consecutive adult patients presenting to 1 of 9 hospitals in a large network (BJC HealthCare) between January 1, 2020, and June 30, 2022 (eMethods in Supplement 1). The Washington University Institutional Review Board approved this study with a waiver of consent because it involved no more than minimal risk to participants and it could not be practically conducted without a waiver. The ESM was calculated silently (ie, results not clinically available) every 20 minutes from the patient’s arrival in the emergency department. The primary outcome was sepsis, defined as meeting Sepsis-3 criteria using data extracted from the electronic health record; sepsis onset was based on clinical recognition (ie, either cultures or antimicrobials ordered). We evaluated discrimination by calculating the C-statistic (area under the receiver operating characteristic curve) at the hospitalization level, using the highest occurring score prior to clinical recognition of sepsis. A Pearson product-moment correlation was used to detect associations between C-statistics and hospital factors (time to clinical recognition of sepsis, sepsis incidence, Van Walraven index [a modification of Elixhauser comorbidity measures], and cancer prevalence). We used R version 4.0 (R Project for Statistical Computing). We set statistical significance at P = .05 and used 2-sided tests.
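The hospitalization-level discrimination analysis described above can be illustrated with a minimal Python sketch. This is not the study's code (the analysis was run in R). The function names, the tuple layout of the score rows, and the toy values are hypothetical. The sketch also assumes that all available scores are used for encounters without clinical recognition of sepsis, a detail the main text leaves to the eMethods. The C-statistic is computed as the proportion of sepsis and non-sepsis encounter pairs in which the sepsis encounter has the higher peak score, with ties counted as one-half.

```python
def highest_score_before_recognition(score_rows, recognition_minute):
    """Return {encounter_id: highest score observed before clinical recognition}.

    score_rows: iterable of (encounter_id, minutes_since_arrival, score).
    recognition_minute: {encounter_id: minute of clinical recognition};
        encounters absent from this mapping are treated as non-sepsis and
        keep all of their scores (an assumption of this sketch).
    Encounters with no score before recognition are omitted from the result.
    """
    best = {}
    for encounter_id, minute, score in score_rows:
        cutoff = recognition_minute.get(encounter_id)
        if cutoff is not None and minute >= cutoff:
            continue  # score was produced at or after recognition
        if encounter_id not in best or score > best[encounter_id]:
            best[encounter_id] = score
    return best


def c_statistic(sepsis_scores, other_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not sepsis_scores or not other_scores:
        raise ValueError("both groups must contain at least one encounter")
    concordant = 0.0
    for s in sepsis_scores:
        for o in other_scores:
            if s > o:
                concordant += 1.0
            elif s == o:
                concordant += 0.5
    return concordant / (len(sepsis_scores) * len(other_scores))


# Toy data (made up): scores arrive every 20 minutes from arrival.
rows = [
    ("e1", 0, 5.0), ("e1", 20, 9.0), ("e1", 40, 50.0),
    ("e2", 0, 3.0), ("e2", 20, 7.0),
    ("e3", 0, 2.0),
]
recognition = {"e1": 30}  # e1 is the only sepsis encounter in this toy set

peak = highest_score_before_recognition(rows, recognition)
sepsis = [s for e, s in peak.items() if e in recognition]
others = [s for e, s in peak.items() if e not in recognition]
print(c_statistic(sepsis, others))
```

The pairwise loop is quadratic in the number of encounters, so a rank-based formulation would be preferable at the scale of this study. The simple form is used here only because it makes the definition explicit.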
Results
We evaluated 806 696 encounters across 9 hospitals, including 233 875 from the largest hospital in the network and 572 821 from the remaining hospitals. Sepsis incidence ranged from 4.1% to 12.0% (Table).
Table. Characteristics and Outcomes of Sepsis and Other Encounters Across 9 US Hospitals.
| Variable, No. (%)a | Hospital A | Hospital B | Hospital C | Hospital D | Hospital E | Hospital F | Hospital G | Hospital H | Hospital I |
|---|---|---|---|---|---|---|---|---|---|
| Encounters, No. | 233 875 | 97 996 | 172 929 | 64 154 | 74 963 | 54 206 | 40 571 | 33 958 | 34 065 |
| Staffed beds, No. | 1273 | 449 | 233 | 158 | 107 | 102 | 76 | 68 | 35 |
| Characteristics of patient encounters | |||||||||
| Age, mean (SD), y | 49 (19) | 57 (20) | 44 (19) | 51 (21) | 50 (20) | 55 (20) | 50 (20) | 53 (19) | 49 (20) |
| Race | |||||||||
| Black | 101 231 (43) | 18 166 (19) | 127 782 (74) | 10 713 (17) | 1354 (1.8) | 5151 (9.6) | 2900 (7.1) | 421 (1.2) | 10 585 (31) |
| White | 100 050 (43) | 67 598 (69) | 23 884 (14) | 48 946 (76) | 52 718 (70) | 43 468 (80) | 33 732 (83) | 18 854 (56) | 30 916 (91) |
| Otherb | 32 594 (14) | 12 232 (12) | 21 263 (12) | 4495 (7.0) | 20 891 (28) | 5587 (10) | 3939 (9.7) | 2728 (8.0) | 4519 (13) |
| Sex | |||||||||
| Male | 126 908 (54) | 42 208 (43) | 81 507 (47) | 29 138 (45) | 43 354 (58) | 26 881 (50) | 17 864 (44) | 15 936 (47) | 16 731 (49) |
| Female | 106 967 (46) | 55 788 (57) | 91 422 (53) | 35 016 (55) | 31 609 (42) | 27 325 (50) | 22 707 (56) | 18 022 (53) | 17 334 (51) |
| Hypertension | 23 383 (10.0) | 10 902 (11) | 16 151 (9.3) | 5167 (8.1) | 5244 (7.0) | 2960 (5.5) | 1278 (3.2) | 2469 (7.3) | 1479 (4.3) |
| T2D | 14 482 (6.2) | 5732 (5.8) | 7006 (4.1) | 3110 (4.8) | 3368 (4.5) | 1905 (3.5) | 828 (2.0) | 1034 (3.0) | 1050 (3.1) |
| CHF | 12 166 (5.2) | 5345 (5.5) | 5860 (3.4) | 2536 (4.0) | 2344 (3.1) | 1515 (2.8) | 657 (1.6) | 558 (1.6) | 991 (2.9) |
| COPD | 11 107 (4.7) | 4312 (4.4) | 8407 (4.9) | 3378 (5.3) | 4002 (5.3) | 2250 (4.2) | 961 (2.4) | 967 (2.8) | 1668 (4.9) |
| CKD | 10 986 (4.7) | 4474 (4.6) | 4950 (2.9) | 2069 (3.2) | 1922 (2.6) | 1082 (2.0) | 281 (0.7) | 622 (1.8) | 486 (1.4) |
| Cancer | 19 497 (8.3) | 3609 (3.7) | 1785 (1.0) | 853 (1.3) | 810 (1.1) | 1287 (2.4) | 215 (0.5) | 1621 (4.8) | 284 (0.8) |
| Van Walraven Elixhauser Comorbidity Score, mean (SD) | 9 (13) | 10 (13) | 6 (11) | 8 (13) | 7 (12) | 8 (12) | 6 (10) | 9 (13) | 6 (11) |
| Positive for SARS-CoV-2 | 6672 (2.9) | 2477 (2.5) | 7196 (4.2) | 1583 (2.5) | 2572 (3.4) | 1556 (2.9) | 1108 (2.7) | 708 (2.1) | 1164 (3.4) |
| Outcomes of patient encounters | |||||||||
| Met Sepsis-3 criteria | 28 148 (12) | 11 900 (12) | 7024 (4.1) | 4949 (7.7) | 6120 (8.2) | 3923 (7.2) | 2127 (5.2) | 2213 (6.5) | 1648 (4.8) |
| Hours between presentation and sepsis clinical recognition, mean (SD) | 8.3 (59) | 4.5 (38) | 6.2 (32) | 2.6 (22) | 0.41 (11) | 3.9 (21) | 2.9 (16) | 4.2 (29) | 1.6 (11) |
| Died | 4762 (2.0) | 1603 (1.6) | 1481 (0.9) | 689 (1.1) | 351 (0.5) | 572 (1.1) | 230 (0.6) | 29 (<0.1) | 224 (0.7) |
Abbreviations: CHF, congestive heart failure; CKD, chronic kidney disease; COPD, chronic obstructive pulmonary disease; T2D, type 2 diabetes; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
a Percentage values are rounded to 2 digits (≥10 to whole numbers and <10 to a single decimal place).
b Other includes Asian, American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, and individuals with no race or ethnicity provided. We did not break these categories down further given their size at certain hospitals, which could risk identifiability.
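As a consistency check, the sepsis incidence percentages in the Table can be recomputed from the encounter counts with the rounding rule stated in the table footnote. The sketch below is illustrative only. The counts are copied from the "Encounters" and "Met Sepsis-3 criteria" rows, and the names `TABLE_COUNTS` and `format_pct` are hypothetical.

```python
# Counts copied from the Table: hospital -> (encounters, met Sepsis-3 criteria).
TABLE_COUNTS = {
    "A": (233875, 28148),
    "B": (97996, 11900),
    "C": (172929, 7024),
    "D": (64154, 4949),
    "E": (74963, 6120),
    "F": (54206, 3923),
    "G": (40571, 2127),
    "H": (33958, 2213),
    "I": (34065, 1648),
}


def format_pct(count, total):
    """Format a percentage: whole number if >= 10, else one decimal place."""
    pct = 100.0 * count / total
    return f"{pct:.0f}" if pct >= 10 else f"{pct:.1f}"


incidence = {
    hospital: format_pct(sepsis, total)
    for hospital, (total, sepsis) in TABLE_COUNTS.items()
}

for hospital, pct in incidence.items():
    print(f"Hospital {hospital}: {pct}%")
```

Applied to the counts above, this rule reproduces the percentages printed in the "Met Sepsis-3 criteria" row.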
Discrimination varied substantially across hospitals (Figure). The C-statistic at the largest hospital (0.55) was similar to that reported in another study at the University of Michigan (0.63).2 The highest C-statistic in the present study (0.73) was close to the range (0.76-0.83) reported by the model developer.
Figure. Association Between Hospital-Level Sepsis Incidence and Epic Sepsis Model C-Statistic Across 9 US Hospitals in a Network.
Each hospital is represented by a blue point (A through I), with 95% CIs represented by vertical bars. The diagonal line represents the line of best fit among hospitals A through I.
Across the 9 hospitals, C-statistics were negatively correlated with sepsis incidence (r = −0.80; P = .009), comorbidity burden (r = −0.78; P = .013), and cancer prevalence (r = −0.86; P = .003). The correlation with time to clinical recognition of sepsis was not statistically significant (r = −0.59; P = .09).
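With only 9 hospitals, the link between each correlation coefficient and its P value can be made explicit. Under the usual test for a Pearson correlation, t = r√(n − 2)/√(1 − r²) follows a t distribution with n − 2 degrees of freedom. The sketch below is not the authors' code. It is a minimal standard-library Python illustration that assumes n = 9 and takes the rounded r values reported above, so its results should agree with the reported P values only up to rounding. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_from_r(r, n, steps=20000):
    """Two-sided P value for a Pearson r from n pairs (requires |r| < 1).

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even) and returns 1 - 2 * P(0 < T < |t|).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return 1 - 2 * central


# Rounded coefficients as reported in the text; n = 9 hospitals.
reported = {
    "sepsis incidence": -0.80,
    "comorbidity burden": -0.78,
    "cancer prevalence": -0.86,
    "time to clinical recognition": -0.59,
}

p_values = {name: two_sided_p_from_r(r, 9) for name, r in reported.items()}
for name, p in p_values.items():
    print(f"{name}: r = {reported[name]:.2f}, P = {p:.3f}")
```

Because the test has only 7 degrees of freedom, a coefficient as large in magnitude as −0.59 still does not reach the .05 threshold.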
Discussion
This study found that the performance of a sepsis model was negatively correlated with the incidence of sepsis, the presence of comorbidities, and cancer prevalence. We found no evidence that encounters with COVID-19 were associated with ESM discrimination, suggesting that COVID-19-related alerting increases may be due to model miscalibration.4,5 Our study was limited in that between-hospital differences in recognizing sepsis might affect the calculation of its incidence.1
Possible explanations for our findings include that sepsis may present more heterogeneously at hospitals with a higher sepsis incidence or be more difficult to detect in populations with a higher comorbidity burden, in which other clinical disorders could mimic sepsis. Beyond differential patient characteristics, the low C-statistic at the large teaching hospital in our study may be related to unique patterns in care, documentation, or billing compared with other hospitals in the network. Although academic medical centers often use sepsis prediction models, our findings suggest they may be most useful at lower-acuity hospitals, where sepsis rates may be lower. A forthcoming ESM update,6 which includes a recommendation for training on a hospital’s own data before clinical use and a change in the definition of sepsis onset, may mitigate the performance variation that we found and improve the clinical value of the model.
eMethods. Supplementary Methods
eReferences
Data Sharing Statement
References
- 1. Adams R, Henry KE, Sridharan A, et al. Prospective, multi-site study of patient outcomes after implementation of the TREWS machine learning-based early warning system for sepsis. Nat Med. 2022;28(7):1455-1460.
- 2. Wong A, Otles E, Donnelly JP, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med. 2021;181(8):1065-1070. doi: 10.1001/jamainternmed.2021.2626
- 3. Tarabichi Y, Cheng A, Bar-Shain D, et al. Improving timeliness of antibiotic administration using a provider and pharmacist facing sepsis early warning system in the emergency department setting: a randomized controlled quality improvement initiative. Crit Care Med. 2022;50(3):418-427.
- 4. Finlayson SG, Subbaswamy A, Singh K, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med. 2021;385(3):283-286. doi: 10.1056/NEJMc2104626
- 5. Wong A, Cao J, Lyons PG, et al. Quantification of sepsis model alerts in 24 US hospitals before and during the COVID-19 pandemic. JAMA Netw Open. 2021;4(11):e2135286. doi: 10.1001/jamanetworkopen.2021.35286
- 6. Ross C. Epic overhauls popular sepsis algorithm criticized for faulty alarms. Stat News. Published October 3, 2022. Accessed November 7, 2022. https://www.statnews.com/2022/10/03/epic-sepsis-algorithm-revamp-training/

