2021 Dec 13;29(3):559–575. doi: 10.1093/jamia/ocab236

Table 1.

Study characteristics

Study (year)	Clinical setting and data source	Sample size^a	Cohort criteria and infection definition	Task and objective
Horng et al.⁴⁷ (2017)	ED Beth Israel Deaconess (Boston, MA, United States) Dec 17, 2008–Feb 17, 2013	230 936 patient visits Infection: 32 103 P; 14% No infection: 198 833 P; 86% Train: 147 799 P; 64% Validation: 46 187 P; 20% Test: 36 950 P; 16%	Angus Sepsis ICD-9-CM abstraction criteria⁷⁹	Identify patients with suspected infection to demonstrate benefits of using clinical text with structured data for detecting ED patients with suspected infection.
Apostolova and Velez⁴⁸ (2017)	ICU MIMIC-III 2001–2012	634 369 nursing notes Infection presence: 186 158 N; 29% Possible infection: 3262 N; 1% No infection: 448 211 N; 70% Train: 70% Test: 30%	Notes describing patient taking or being prescribed antibiotics for treating infection	Identify notes with suspected or presence of infection to develop a system for detecting infection signs and symptoms in free-text nursing notes.
Culliton et al.⁴⁹ (2017)	Inpatient care Baystate hospitals (Springfield, MA, United States) 2012–2016	203 000 adult inpatient admission encounters Used 68 482 E Severe sepsis: 1427 E; 2.1% 3-fold cross validation: only text data Model construction: 2012–2015 data Test set: 2016 data: Used 13 603 E Severe sepsis: 425 P; 3.1%	Modified Baystate clinical definition of severe sepsis (8 structured variables) and severe sepsis ICD codes	Predict severe sepsis 4, 8, and 24 h before the earliest time structured variables meet the severe sepsis definition to compare accuracy of predicting patients that will meet the clinical definition of sepsis when using unstructured data only, structured data only, or both types.
Delahanty et al.⁵¹ (2019)	ED Tenet Healthcare Hospitals (Nashville, TN, United States) January 1, 2016–October 31, 2017	2 759 529 patient encounters Sepsis: 54 661 E; 2% No sepsis: 2 704 868 E; 98% Train: 1 839 503 E; 66.7% Sepsis: 36 458 E; 2% No sepsis: 1 803 045 E; 98% Test: 920 026 E; 33.3% Sepsis: 18 203 E; 2% No sepsis: 901 823 E; 98%	Rhee’s modified Sepsis-3 definition⁸⁰	Predict sepsis risk in patients 1, 3, 6, 12, and 24 h after the first vital sign or laboratory result is recorded in the EHR to develop a new sepsis screening tool comparable to benchmark screening tools.
Liu et al.⁵⁰ (2019)	ICU MIMIC-III 2001–2012	38 645 adult patients Train: 70% P Test: 30% P Applied model to: 15 930 P with suspected infection and at least 1 physiological EHR data	Sepsis-3 definition¹	Predict septic shock in sepsis patients before the earliest time septic shock criteria are met to demonstrate an approach using NLP features for septic shock prediction.
Amrollahi et al.⁵³ (2020)	ICU MIMIC-III 2001–2012	40 175 adult patients Sepsis: 2805 P; ∼7% Train: 80% P Test: 20% P	Sepsis-3 definition¹	Predict sepsis onset hours in advance using a deep learning approach to show a pre-trained neural language representation model can improve early sepsis detection.
Hammoud et al.⁵⁴ (2020)	ICU MIMIC-II 2001–2007	17 763 patients Sepsis: 6097 P Severe sepsis: 3962 P Septic shock: 1469 P 5-fold cross validation	Sepsis definition based on that used by Henry et al.⁷⁸	Predict early septic shock in ICU patients using a model that can be optimized based on user preference or performance metrics.
Goh et al.⁵² (2021)	ICU Singapore government-based hospital (Singapore, Singapore) Apr 2, 2015–Dec 31, 2017	5317 patients (114 602 notes) Train and validation: 3722 P (80 162 N) Sepsis: 6.45% No sepsis: 93.55% Test: 1595 P (34 440 N) Sepsis: 5.45% No sepsis: 94.55%	ICU admission with an ICD-10 code for sepsis, severe sepsis, or septic shock	Identify if a patient has sepsis at consultation time or predict sepsis 4, 6, 12, 24, and 48 h after consultation to develop an algorithm that uses structured and unstructured data to diagnose and predict sepsis.
Qin et al.⁵⁵ (2021)	ICU MIMIC-III 2001–2012	49 168 patients Train: 33 434 P Sepsis: 1353 P No sepsis: 32 081 P Validation: 8358 P Sepsis: 338 P No sepsis: 8020 P Test: 7376 P Sepsis: 229 P No sepsis: 7077 P	PhysioNet Challenge restrictive Sepsis-3 definition⁸¹	Predict if a patient will develop sepsis to explore how numerical and textual features can be used to build a predictive model for early sepsis prediction.

ED: emergency department; EHR: electronic health record; ICU: intensive care unit; ICD: International Classification of Diseases; ICD-9-CM: ICD, 9th revision, Clinical Modification; ICD-10: ICD, 10th revision; MIMIC-II: Multiparameter Intelligent Monitoring in Intensive Care II database; MIMIC-III: Medical Information Mart for Intensive Care III database; NLP: natural language processing.

^a Sample size unit abbreviations: P: patients; N: notes; E: encounters.