2021 Dec 13;29(3):559–575. doi: 10.1093/jamia/ocab236

Table 1.

Study characteristics

Study (year) | Clinical setting and data source | Sample size (a) | Cohort criteria / infection definition | Task and objective
Horng et al. [47] (2017)
  • ED

  • Beth Israel Deaconess (Boston, MA, United States)

  • Dec 17, 2008 – Feb 17, 2013

230 936 patient visits

 
  • Infection: 32 103 P; 14%

  • No infection: 198 833 P; 86%

 
  • Train: 147 799 P; 64%

  • Validation: 46 187 P; 20%

  • Test: 36 950 P; 16%

Cohort criteria / infection definition: Angus Sepsis ICD-9-CM abstraction criteria [79]

Task and objective: Identify patients with suspected infection, in order to demonstrate the benefits of using clinical text together with structured data for detecting ED patients with suspected infection.
Apostolova and Velez [48] (2017)
  • ICU

  • MIMIC-III

  • 2001–2012

634 369 nursing notes

 
  • Infection presence: 186 158 N; 29%

  • Possible infection: 3262 N; 1%

  • No infection: 448 211 N; 70%

 
  • Train: 70%

  • Test: 30%

Cohort criteria / infection definition: Notes describing a patient taking or being prescribed antibiotics to treat an infection

Task and objective: Identify notes indicating suspected or present infection, in order to develop a system for detecting infection signs and symptoms in free-text nursing notes.
Culliton et al. [49] (2017)
  • Inpatient care

  • Baystate hospitals (Springfield, MA, United States)

  • 2012–2016

203 000 adult inpatient admission encounters

 
  • Used 68 482 E

  • Severe sepsis: 1427 E; 2.1%

 
  • 3-fold cross validation: only text data

  • Model construction: 2012–2015 data

 

Test set: 2016 data

 
  • Used 13 603 E

  • Severe sepsis: 425 P; 3.1%

Cohort criteria / infection definition: Modified Baystate clinical definition of severe sepsis (8 structured variables) and severe sepsis ICD codes

Task and objective: Predict severe sepsis 4, 8, and 24 h before the earliest time the structured variables meet the severe sepsis definition, in order to compare the accuracy of predicting which patients will meet the clinical definition of sepsis when using unstructured data only, structured data only, or both.
Delahanty et al. [51] (2019)
  • ED

  • Tenet Healthcare Hospitals (Nashville, TN, United States)

  • January 1, 2016 – October 31, 2017

2 759 529 patient encounters

 
  • Sepsis: 54 661 E; 2%

  • No Sepsis: 2 704 868 E; 98%

 

Train: 1 839 503 E; 66.7%

 
  • Sepsis: 36 458 E; 2%

  • No sepsis: 1 803 045 E; 98%

 

Test: 920 026 E; 33.3%

 
  • Sepsis: 18 203 E; 2%

  • No sepsis: 901 823 E; 98%

Cohort criteria / infection definition: Rhee’s modified Sepsis-3 definition [80]

Task and objective: Predict sepsis risk in patients 1, 3, 6, 12, and 24 h after the first vital sign or laboratory result is recorded in the EHR, in order to develop a new sepsis screening tool comparable to benchmark screening tools.
Liu et al. [50] (2019)
  • ICU

  • MIMIC-III

  • 2001–2012

  • 38 645 adult patients

  • Train: 70% P

  • Test: 30% P

  • Applied model to:

  • 15 930 P with suspected infection and at least 1 physiological EHR data point

Cohort criteria / infection definition: Sepsis-3 definition [1]

Task and objective: Predict septic shock in sepsis patients before the earliest time septic shock criteria are met, in order to demonstrate an approach using NLP features for septic shock prediction.
Amrollahi et al. [53] (2020)
  • ICU

  • MIMIC-III

  • 2001–2012

40 175 adult patients

 
  • Sepsis: 2805 P; ∼7%

 
  • Train: 80% P

  • Test: 20% P

Cohort criteria / infection definition: Sepsis-3 definition [1]

Task and objective: Predict sepsis onset hours in advance using a deep learning approach, in order to show that a pre-trained neural language representation model can improve early sepsis detection.
Hammoud et al. [54] (2020)
  • ICU

  • MIMIC-II

  • 2001–2007

17 763 patients

 
  • Sepsis: 6097 P

  • Severe sepsis: 3962 P

  • Septic shock: 1469 P

 

5-fold cross validation

Cohort criteria / infection definition: Sepsis definition based on the one used by Henry et al. [78]

Task and objective: Predict early septic shock in ICU patients using a model that can be optimized based on user preference or performance metrics.
Goh et al. [52] (2021)
  • ICU

  • Singapore government-based hospital (Singapore, Singapore)

  • Apr 2, 2015 – Dec 31, 2017

  • 5317 patients (114 602 notes)

 

Train and validation: 3722 P (80 162 N)

 
  • Sepsis: 6.45%

  • No sepsis: 93.55%

 

Test: 1595 P (34 440 N)

 
  • Sepsis: 5.45%

  • No sepsis: 94.55%

Cohort criteria / infection definition: ICU admission with an ICD-10 code for sepsis, severe sepsis, or septic shock

Task and objective: Identify whether a patient has sepsis at consultation time, or predict sepsis 4, 6, 12, 24, and 48 h after consultation, in order to develop an algorithm that uses structured and unstructured data to diagnose and predict sepsis.
Qin et al. [55] (2021)
  • ICU

  • MIMIC-III

  • 2001–2012

  • 49 168 patients

 

Train: 33 434 P

 
  • Sepsis: 1353 P

  • No Sepsis: 32 081 P

 

Validation: 8358 P

 
  • Sepsis: 338 P

  • No Sepsis: 8020 P

 

Test: 7376 P

 
  • Sepsis: 229 P

  • No Sepsis: 7077 P

Cohort criteria / infection definition: PhysioNet Challenge restrictive Sepsis-3 definition [81]

Task and objective: Predict whether a patient will develop sepsis, in order to explore how numerical and textual features can be used to build a predictive model for early sepsis prediction.

ED: emergency department; ICU: intensive care unit; ICD: International Classification of Diseases; ICD-9-CM: ICD, 9th revision, Clinical Modification; ICD-10: ICD, 10th revision; MIMIC-II: Multiparameter Intelligent Monitoring in Intensive Care II database; MIMIC-III: Medical Information Mart for Intensive Care III database.

(a) Sample size unit abbreviations: P: patients; N: notes; E: encounters.
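
The percentages in the sample size column follow directly from the reported counts. As a minimal sketch, the prevalence of the positive class can be recomputed from the table for three of the studies; the `COHORTS` dictionary and the `prevalence_pct` helper are illustrative names introduced here, not part of any reviewed study.

```python
# Counts copied from Table 1 (P = patients, E = encounters).
# COHORTS and prevalence_pct are hypothetical names used only for illustration.
COHORTS = {
    "Horng et al. (2017)": {"positive": 32_103, "total": 230_936},
    "Delahanty et al. (2019)": {"positive": 54_661, "total": 2_759_529},
    "Amrollahi et al. (2020)": {"positive": 2_805, "total": 40_175},
}


def prevalence_pct(positive: int, total: int) -> float:
    """Return the share of positive cases as a percentage of the cohort."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positive / total


for study, counts in COHORTS.items():
    pct = prevalence_pct(counts["positive"], counts["total"])
    print(f"{study}: {pct:.1f}%")
```

Rounded to whole percentages, these values agree with the 14%, 2%, and approximately 7% reported in the table, and they show how strongly imbalanced the cohorts are.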