Abstract
Objective
Artificial intelligence (AI) holds promise for predicting sepsis. However, challenges remain in integrating AI, natural language processing (NLP) and free text data to enhance sepsis diagnosis at emergency department (ED) triage. This study aimed to evaluate the effectiveness of AI in improving sepsis diagnosis.
Methods
This retrospective cohort study analysed data from 134 266 patients admitted to the ED and subsequently hospitalised between 1 January 2016 and 31 December 2021. The data set comprised 10 variables and free-text triage comments, which underwent tokenisation and processing using a bag-of-words model. We evaluated four traditional NLP classifier models: logistic regression, LightGBM, random forest and neural network. We also evaluated the performance of the BERT classifier. We used the area under the precision-recall curve (AUPRC) and the area under the curve (AUC) as performance metrics.
Results
Random forest exhibited superior predictive performance with an AUPRC of 0.7890 (95% CI: 0.7668 to 0.8018) and an AUC of 0.8000 (95% CI: 0.7842 to 0.8173). Using raw text, the BERT model achieved an AUPRC of 0.7542 (95% CI: 0.7418 to 0.7741) and an AUC of 0.7735 (95% CI: 0.7628 to 0.8017) for sepsis prediction. Key variables included ED treatment time, patient age, arrival-to-treatment time, Australasian Triage Scale and visit type.
Discussion
This study demonstrates AI, particularly random forest and BERT classifiers, for early sepsis detection in EDs using free-text patient concerns.
Conclusion
Incorporating free text into machine learning improved diagnosis and identified missed cases, enhancing sepsis prediction in the ED with an AI-powered clinical decision support system. Large, prospective studies are needed to validate these findings.
Keywords: Machine learning, Artificial intelligence
WHAT IS ALREADY KNOWN ON THIS TOPIC
Sepsis remains a leading cause of death worldwide, with early detection being critical for improving outcomes. Existing diagnostic tools rely heavily on structured data, such as vital signs and laboratory results, which can delay diagnosis in emergency departments.
WHAT THIS STUDY ADDS
This study demonstrates that integrating artificial intelligence (AI) with free-text triage notes using natural language processing improves sepsis prediction. Machine learning models, including random forest and BERT, showed strong predictive performance, highlighting the value of unstructured data.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
These findings support incorporating free-text analysis into AI-powered clinical decision support tools for early sepsis detection. Prospective validation in diverse healthcare settings is needed.
Introduction
Sepsis presents a significant global health challenge, with over 48.9 million cases and approximately 11 million deaths worldwide in 2017 alone.1 Apart from its immediate impact on mortality and morbidity, sepsis also imposes a substantial burden on healthcare systems due to prolonged hospital stays.2–6 Timely diagnosis and treatment are crucial for improving patient outcomes,7 8 particularly in the emergency department (ED), where triage plays a critical role in identifying and prioritising patients based on the severity of their conditions. However, the rate of sepsis diagnosis during triage is often low, especially in overcrowded EDs.9 10
While clinical scoring systems exist to aid in sepsis diagnosis,11 their integration into routine practice is limited by their complexity and the need for continuous education and proper implementation.12–15
The emergence of artificial intelligence (AI) as a tool in medical diagnosis holds promise in this context, given its capacity to analyse extensive medical data, encompassing patient records, scans, laboratory results and current evidence in the field.16 17
Integrating AI into the existing triage process has the potential to provide real-time sepsis risk predictions on patient admission, thus offering valuable decision support to triage nurses and clinicians.18 Central to this advancement is free text data, including unstructured clinical notes, narrative descriptions and other textual information documented by healthcare providers in patient records. Previous studies combining free text with machine learning (ML) techniques have demonstrated improved sepsis diagnosis in the ED.19 The first opportunity for prediction of sepsis is available from the moment the patients encounter an ED triage nurse to describe their symptoms.
Given the significance of the underlying challenges in ED and the potential benefits offered by AI, we hypothesise that leveraging AI and natural language processing (NLP) in conjunction with free text data could improve the diagnosis and treatment of sepsis and may improve outcomes. The primary objective of our study is to ascertain whether using NLP that uses free text can enhance the diagnosis of sepsis during the ED triage process.
Methods
This retrospective cohort study analysed data from patients admitted to Sir Charles Gairdner Hospital (SCGH) after presenting to the ED between 1 January 2016 and 31 December 2021. Data were extracted from the ED Information System and the Hospital Morbidity Data Collection for SCGH, merged using unique patient identifiers and admission dates, with ethical approval and a waiver of consent. The study included individuals admitted to the hospital from the ED, excluding those transferred from other hospitals or not admitted. Sepsis was identified using International Classification of Diseases (ICD)-10 AM codes, excluding ED diagnoses to prevent misclassification, and removing cases with specific hospital-acquired complications (HACs). Out of 353 906 ED presentations, 134 266 patients were admitted, and 7925 were diagnosed with sepsis. After excluding 2354 HAC-related cases, 5571 sepsis cases were used for model training. ICD codes and descriptions used to exclude sepsis in patients who potentially had HACs are summarised in table 1. A breakdown of the resulting data set and a statistical analysis of each feature are presented in table 3.
Table 1. ICD codes and descriptions used to exclude sepsis in patients who potentially had HACs.
| Code | Description |
|---|---|
| O860 | Infection of obstetrical surgical wound |
| O862 | Urinary tract infection following delivery |
| T814 | Wound infection following a procedure, not elsewhere classified |
| T826 | Infection and inflammatory reaction due to cardiac valve prosthesis |
| T827 | Infection and inflammatory reaction due to other cardiac and vascular devices, implants and grafts |
| T8271 | Infection and inflammatory reaction due to electronic cardiac device |
| T8272 | Infection and inflammatory reaction due to coronary artery bypass and valve grafts |
| T8273 | Infection and inflammatory reaction due to other vascular grafts |
| T8274 | Infection and inflammatory reaction due to central vascular catheter |
| T8275 | Infection and inflammatory reaction due to peripheral vascular catheter |
| T8276 | Infection and inflammatory reaction due to surgically created arteriovenous fistula and shunt |
| T8277 | Infection and inflammatory reaction due to vascular dialysis catheter |
| T8279 | Infection and inflammatory reaction due to cardiac and vascular devices, implants and grafts, not elsewhere classified |
| T835 | Infection and inflammatory reaction due to prosthetic device, implant and graft in urinary system |
| T836 | Infection and inflammatory reaction due to prosthetic device, implant and graft in genital tract |
| T845 | Infection and inflammatory reaction due to internal joint prosthesis |
| T846 | Infection and inflammatory reaction due to internal fixation device (any site) |
| T847 | Infection and inflammatory reaction due to other internal orthopaedic prosthetic devices, implants and grafts |
| T8571 | Infection and inflammatory reaction due to peritoneal dialysis catheter |
| T8572 | Infection and inflammatory reaction due to nervous system prosthetic devices, implants and grafts |
| T8573 | Infection and inflammatory reaction due to gastrointestinal prosthetic devices, implants and grafts |
| T8574 | Infection and inflammatory reaction due to respiratory prosthetic devices, implants and grafts |
| T8575 | Infection and inflammatory reaction due to breast prostheses and implants |
| T8576 | Infection and inflammatory reaction due to ocular prosthetic devices, implants and grafts |
| T8577 | Infection and inflammatory reaction due to internal hearing devices, implants and grafts |
| T8578 | Infection and inflammatory reaction due to other internal prosthetic devices, implants and grafts |
| T874 | Infection of amputation stump |
HACs, hospital-acquired complications; ICD, International Classification of Diseases.
Table 2. Hyperparameters and their respective search ranges for different classification models used in sepsis prediction.
| Classification model | Hyperparameter | Search range |
|---|---|---|
| Logistic regression | Regularisation (C) | (0.001, 0.01, 0.1, 1, 10, 100) |
| LightGBM | Number of estimators, learning rate | (50, 100, 200), (0.05, 0.1, 0.2) |
| Random forest | Number of trees, maximum depth | (50, 100, 200), (5, 10, 20) |
| Neural network | Hidden layer sizes, regularisation (alpha) | ((50), (100), (50, 50), (100, 100)), (0.0001, 0.001, 0.01) |
Each model's performance was optimised using grid search with fivefold cross-validation.
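The grid-search-with-cross-validation step described in Table 2 can be sketched as follows. This is a minimal illustration using the random forest grid from the table; the feature matrix and labels here are synthetic stand-ins, not the study's triage data.

```python
# Sketch of grid search with cross-validation over the random forest
# hyperparameters listed in Table 2. Data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # placeholder feature matrix
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # placeholder labels

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [5, 10, 20]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                          # fivefold cross-validation, as in Table 2
    scoring="average_precision",   # AUPRC, the study's primary metric
)
search.fit(X, y)
best_params = search.best_params_  # best hyperparameter combination found
```

The same pattern applies to the other three classifiers by substituting their grids from the table.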
Data preparation
The clinical data set was preprocessed to ensure consistency and suitability for ML algorithms, involving data cleaning, transformation and feature engineering. Time-related features were standardised by converting them to minutes, ensuring uniformity and facilitating numerical analysis. Categorical variables were label-encoded, converting them into integers to preserve their ordinal relationships and making them suitable for ML algorithms. To streamline the data set and focus on features most relevant to the prediction of sepsis, several columns that contained redundant, irrelevant or incomplete information were dropped during the data preprocessing stage. In the triage data, columns such as PP_COMMENTS, Clinical Comments, Visit Type.1, Symptom Code, Disposition, Gender, Symptom Description, Diagnosis Description and Departure Destination were removed. These columns either contained free-text comments that were not directly relevant to ML algorithms or data that did not contribute meaningfully to sepsis prediction.
Similarly, in the admissions data, columns like SEPARATED_DESTINATION, SEPARATION_DATETIME, SEPARATION_FIN_YEAR, SEPARATION_MONTH, IP LOS in HOURS, intensive care unit (ICU)-related variables (eg, ICU HOURS, CONTINUOUS_VENTILATORY_SUPPORT), as well as various ICD and procedure codes, were discarded. These fields were excluded because they either captured information postadmission (which was outside the scope of triage-based sepsis prediction), were incomplete, or were not essential for the primary objective of the study. By removing such columns, we aimed to eliminate noise from the data set and focus only on the variables that could improve model performance and efficiency.
Feature engineering involved deriving new features from existing data to provide meaningful insights. For instance, arrival month and arrival hour were extracted from the arrival date, and time differences such as arrival to treatment time and ED treat to discharge time were calculated. These features helped capture temporal patterns and relationships in the data. To avoid sparsity and ensure meaningful analysis, categorical data were aggregated. Categories with fewer records were combined into broader groups, such as grouping visit types with fewer than 100 records and transport modes with fewer than 20 occurrences.
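The time-based feature engineering above can be sketched with plain timestamps. Field names are illustrative, not the actual column names in the data set.

```python
# Minimal sketch of the temporal feature engineering: extracting arrival
# month/hour and computing minute-level intervals between timestamps.
from datetime import datetime

def time_features(arrival, treatment_start, ed_discharge):
    """Derive the temporal features described in the text from three timestamps."""
    return {
        "arrival_month": arrival.month,
        "arrival_hour": arrival.hour,
        "arrival_to_treatment_min": (treatment_start - arrival).total_seconds() / 60,
        "treat_to_discharge_min": (ed_discharge - treatment_start).total_seconds() / 60,
    }

feats = time_features(
    datetime(2021, 7, 3, 13, 5),    # ED arrival
    datetime(2021, 7, 3, 13, 41),   # treatment start
    datetime(2021, 7, 3, 18, 11),   # ED discharge
)
# feats -> arrival_month 7, arrival_hour 13,
#          arrival_to_treatment_min 36.0, treat_to_discharge_min 270.0
```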
A unique identifier for each record was created by combining date and patient-specific information, crucial for accurately merging the triage and admissions data sets. The presence of sepsis was identified using specific ICD-10 codes, and records with identified sepsis were flagged and subsequently removed to focus on new sepsis cases detected after admission. Potential hospital-acquired conditions related to sepsis were also flagged to distinguish between community-acquired and hospital-acquired sepsis cases. Finally, intermediate columns used for calculations or identification processes were dropped to prepare the data set for further analysis.
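The record-flagging logic above can be illustrated as follows. The sepsis code prefixes shown (A40/A41) and the HAC subset are illustrative assumptions for the sketch; the study's exact code lists are given in the text and Table 1.

```python
# Hedged sketch of the flagging step: mark admissions whose ICD-10-AM codes
# indicate sepsis, and separately flag HAC exclusion codes (subset of Table 1).
SEPSIS_PREFIXES = ("A40", "A41")              # assumption: illustrative sepsis prefixes
HAC_CODES = {"O860", "O862", "T814", "T874"}  # illustrative subset of Table 1

def flag_record(icd_codes):
    """Return sepsis/HAC flags and whether the record is kept for training."""
    sepsis = any(code.startswith(SEPSIS_PREFIXES) for code in icd_codes)
    hac = any(code in HAC_CODES for code in icd_codes)
    # HAC-related sepsis cases are excluded to focus on community-acquired sepsis.
    return {"sepsis": sepsis, "possible_hac": hac,
            "keep_as_sepsis_case": sepsis and not hac}

excluded = flag_record(["A419", "T814"])  # sepsis code plus a HAC code -> excluded
retained = flag_record(["A403"])          # sepsis code, no HAC code -> retained
```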
Statistical analysis
We evaluated the performance of four traditional ML algorithms for sepsis prediction: logistic regression, LightGBM, random forest and neural network. To validate the models' performance, 10-fold cross-validation was used. In this method, the data set is divided into ten equal parts: in each fold, 90% of the data is used for training and 10% for testing. This process is repeated ten times, until each part of the data set has been used for testing exactly once. Grid search was performed to identify optimal model hyperparameters during the training process, as summarised in table 2.
Table 3. Counts and feature statistics for the data set used to train the AI model.
| Data | No sepsis (n=128 695; 95.9%) | Sepsis (n=5571; 4.1%) |
|---|---|---|
| Patient age: mean (95% CI) | 62.0 (61.9 to 62.1) | 69.1 (68.6 to 69.5) |
| Patient gender male: no. (%) | 62 224 (48) | 3069 (55) |
| Median (IQR) ATS | 3 (2–4) | 2 (2–3) |
| Minutes from arrival to treatment: mean (95% CI) | 35.9 (35.7 to 36.2) | 22.0 (21.1 to 22.9) |
| Minutes from treatment to discharge from ED: mean (95% CI) | 268.7 (267.5 to 270.0) | 381.0 (373.8 to 388.1) |
| Arrival hour: median (IQR) | 13 (10–18) | 13 (10–18) |
| Arrival month: median (IQR) | 7 (4–10) | 7 (4–10) |
| Visit type | 15 categories (eg, emergency presentation, transfer from metro hospital) | |
| Transport description | 10 categories (eg, ambulance, private transport) | |
| Triage comment | Free text | |
The p values for differences between the two groups were statistically significant (p<0.05).
AI, artificial intelligence; ATS, Australasian Triage Scale; ED, emergency department.
We used the elastic net regularisation method for feature selection,20 which required tuning two hyperparameters: α (the mixing parameter between the lasso and ridge penalties) and λ (the amount of regularisation). α determined the number of variables included in the analysis, while λ controlled the degree of coefficient shrinkage. Here we set α=0.9, and grid search was performed for λ over the range (0.01, 0.1, 1, 10, 100). During training, we used the variable weights from elastic net regularisation to identify important predictors and hyperparameters via grid search for inclusion in the final model for prediction on the test set. This process involved testing the range of λ values to find the best fit for the training data.
Features were first excluded if they were not available at triage (eg, separation date, time, ICU hours), had high missingness or were irrelevant to early prediction (eg, discharge disposition). Subsequently, elastic net regularisation (α=0.9) was used during model training for data-driven feature selection. This method allowed variable selection while reducing overfitting by penalising both large coefficients and redundancy. A grid search over λ values (0.01, 0.1, 1, 10, 100) was used to identify a robust feature set.
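The elastic net feature-selection step can be sketched with scikit-learn, which parameterises the lasso/ridge mix as `l1_ratio` (the α=0.9 above) and the regularisation strength as `C` (roughly the inverse of λ). The data below are synthetic stand-ins for the triage features.

```python
# Sketch of elastic net feature selection: a penalty that is 90% lasso /
# 10% ridge shrinks uninformative coefficients towards (or exactly to) zero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))                 # 10 candidate features
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=300) > 0).astype(int)

model = LogisticRegression(
    penalty="elasticnet", solver="saga",
    l1_ratio=0.9,          # alpha = 0.9, as in the study
    C=0.1,                 # strong regularisation (C is roughly 1/lambda)
    max_iter=5000,
)
model.fit(X, y)

# Features whose coefficients survive the penalty form the selected set.
selected = np.flatnonzero(np.abs(model.coef_[0]) > 1e-6)
```

In practice the search over λ would be repeated within each cross-validation fold, as described above.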
Due to an unequal number of patients in the two groups (sepsis 4% and no-sepsis 96%), the training data set was imbalanced which could bias the system’s performance. To address this issue, we used a random undersampling technique to create a balanced training set. In this method, the data is balanced by reducing the number of instances in the majority class (no-sepsis) to match the minority class (sepsis). We further performed 10-fold cross-validation within the training set (90% for training and 10% for validation) to identify the optimal combination of variables. The non-sepsis group included various diagnoses, which may have introduced some label noise and affected model performance. However, we applied random undersampling only to the training data, while keeping the test data unchanged to avoid bias and to ensure that model evaluation reflects real-world conditions. Sepsis cases are typically much fewer than non-sepsis cases, which is a common characteristic of ED data sets.
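The random undersampling step described above amounts to downsampling the majority (no-sepsis) class in the training split to the minority class size, while leaving the test split at its natural prevalence. A minimal sketch:

```python
# Sketch of random undersampling: the majority class is reduced to match
# the minority class count. Applied to the TRAINING split only.
import random

def undersample(records, labels, seed=42):
    """Return a class-balanced subset of (records, labels)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = pos + rng.sample(neg, k=len(pos))   # match majority to minority count
    rng.shuffle(keep)
    return [records[i] for i in keep], [labels[i] for i in keep]

X = [f"patient_{i}" for i in range(100)]
y = [1] * 4 + [0] * 96                          # ~4% sepsis prevalence, as in the data
Xb, yb = undersample(X, y)
# yb now contains 4 sepsis and 4 no-sepsis labels
```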
Text-based BERT classification
We also evaluated the performance of the Bidirectional Encoder Representations from Transformers (BERT) model for predicting sepsis from raw text data as it is shown to be efficient for NLP tasks.21 The raw text data was preprocessed, which involved tokenising the text data using the BERT tokeniser from the Hugging Face Transformers library. Tokenisation converts raw text into tokens, input IDs and attention masks, standardising the input for BERT. We used the same strategy for the BERT model as for the classical classifiers, including stratified 10-fold cross-validation (90% training, 10% testing) and random undersampling during training to handle class imbalance and maintain consistency across model pipelines. Tokenised inputs and labels were converted into TensorDatasets and loaded into DataLoaders for efficient batching and shuffling, with a batch size of 16 to facilitate training and evaluation.
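To make the tokenisation step concrete, the toy encoder below mimics the output format of the Hugging Face tokeniser: each comment becomes a fixed-length list of input IDs plus an attention mask marking real tokens (1) versus padding (0). This is a simplified whitespace illustration, not the actual BERT WordPiece tokeniser, and the vocabulary is invented.

```python
# Toy illustration (NOT the real BERT tokeniser) of the input-ID /
# attention-mask format produced for each triage comment.
PAD, UNK = 0, 1
vocab = {"fever": 2, "and": 3, "chills": 4, "abdominal": 5, "pain": 6}

def encode(text, max_len=8):
    """Map words to IDs, truncate/pad to max_len, and build the attention mask."""
    ids = [vocab.get(tok, UNK) for tok in text.lower().split()][:max_len]
    mask = [1] * len(ids)               # 1 marks a real token
    ids += [PAD] * (max_len - len(ids))
    mask += [0] * (max_len - len(mask)) # 0 marks padding
    return {"input_ids": ids, "attention_mask": mask}

enc = encode("fever and chills")
# enc -> {'input_ids': [2, 3, 4, 0, 0, 0, 0, 0],
#         'attention_mask': [1, 1, 1, 0, 0, 0, 0, 0]}
```

The real tokeniser additionally inserts special tokens (eg, [CLS], [SEP]) and splits words into subword pieces.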
For training, we fine-tuned a pretrained BERT model, BertForSequenceClassification, for binary classification of sepsis versus non-sepsis. We used the bert-base-uncased model and the AdamW optimiser with a learning rate of 2e-5. A linear learning rate scheduler with warmup was applied. Training was performed over three epochs, with gradient clipping to prevent exploding gradients. The model’s performance was evaluated on the testing set using accuracy, calculated as the proportion of correctly predicted labels. All simulations were performed using the Python programming language.
Results
Table 4 summarises the performance (mean (IQR) across 10 folds) of four classifiers used in the study. Random forest outperformed other classifiers, achieving an area under precision-recall curve (AUPRC) of 0.7890 (0.7668–0.8018) and an area under the curve (AUC) of 0.8000 (0.7842–0.8173), indicating its high discriminatory power and reliability. All subsequent work focused on results obtained using the random forest classifier due to its superior performance.
Table 4. Performance of four classifiers used in this study to predict sepsis, reported as mean (IQR) across 10 folds.
| Metric | Logistic regression | LightGBM | Random forest | Neural network | BERT |
|---|---|---|---|---|---|
| AUPRC | 0.7760 (0.7585–0.8050) | 0.7880 (0.7776–0.8174) | 0.7890 (0.7668–0.8018) | 0.7670 (0.7448–0.7804) | 0.7542 (0.7418–0.7741) |
| AUC | 0.7810 (0.7564–0.8030) | 0.7950 (0.7684–0.8092) | 0.8000 (0.7842–0.8173) | 0.7740 (0.7627–0.8030) | 0.7735 (0.7628–0.8017) |
| Precision | 0.7020 (0.6889–0.7151) | 0.7290 (0.7154–0.7427) | 0.7450 (0.7259–0.7707) | 0.6890 (0.6597–0.7152) | 0.7105 (0.6953–0.7338) |
| Recall | 0.7330 (0.7218–0.7603) | 0.7410 (0.7249–0.7615) | 0.7680 (0.7540–0.7883) | 0.7050 (0.6889–0.7170) | 0.7684 (0.7522–0.7888) |
| F1-score | 0.7170 (0.6950–0.7412) | 0.7350 (0.7164–0.7508) | 0.7560 (0.7342–0.7669) | 0.6971 (0.6753–0.7178) | 0.7382 (0.7173–0.7519) |
AUC, area under the curve; AUPRC, area under precision-recall curve; BERT, Bidirectional Encoder Representations from Transformers.
To assess whether differences in AUC between classifiers were statistically significant, we applied DeLong’s test. The results showed no statistically significant difference (p>0.05) in AUC between random forest and other classifiers (including BERT), despite random forest achieving the highest mean AUC.
Figure 1 shows the net benefit plot for sepsis prediction. The random forest model (blue curve) outperforms the ‘Always Act’ (red dashed line) and ‘Never Act’ (green dashed-dotted line) strategies across various threshold probabilities. The random forest model maintains a high net benefit up to a threshold of about 0.3 where the model effectively balances true positives and false positives, after which the net benefit decreases.
Figure 1. Net benefit plot for random forest model. The random forest model maintains a high net benefit up to a threshold of approximately 0.3, effectively balancing true positives and false positives compared to the baseline strategies.
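The net benefit in Figure 1 follows the standard decision-curve definition: at a threshold probability pt, net benefit = TP/N − FP/N × pt/(1 − pt), where the 'Always Act' strategy treats every patient and 'Never Act' has zero net benefit. The sketch below uses made-up counts for a hypothetical 1000-patient sample, not the study's actual numbers.

```python
# Hedged sketch of the net benefit calculation underlying Figure 1
# (decision curve analysis). Counts are hypothetical.
def net_benefit(tp, fp, n, pt):
    """Net benefit at threshold probability pt (Vickers-Elkin definition)."""
    return tp / n - fp / n * pt / (1 - pt)

# Model acting at pt = 0.2 on a hypothetical 1000-patient sample:
nb_model = net_benefit(tp=35, fp=60, n=1000, pt=0.2)    # 0.035 - 0.015 = 0.020
# 'Always Act' treats everyone (TP = all 41 sepsis, FP = the other 959):
nb_always = net_benefit(tp=41, fp=959, n=1000, pt=0.2)  # negative at this threshold
nb_never = 0.0                                          # 'Never Act' baseline
```

As the threshold rises, the pt/(1 − pt) weight on false positives grows, which is why the model's net benefit in Figure 1 declines beyond about 0.3.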
Examining the distribution of predicted probabilities, the median for the ‘sepsis’ condition is about 0.6, reflecting the model’s tendency to assign higher probabilities to sepsis cases. The IQR ranges from 0.45 to 0.75, indicating greater variability compared with the ‘no sepsis’ condition. The whiskers extend from 0.3 to 0.9, covering most data points. No significant outliers are detected, suggesting more consistent predictions for sepsis cases.
The elastic-net feature selection algorithm selected nine features across all 10 folds: age, pain, temperature, Australasian Triage Scale, visit type, ED treatment to discharge time, urinary tract infection, headache and fever.
Discussion
The results obtained in this study highlight the potential of AI for early detection of sepsis in the ED using free text analysis of patient concerns prior to clinical assessment. The random forest classifier demonstrated the best predictive performance among the models evaluated, with an AUPRC of 0.7890 and an AUC of 0.8000. Notably, the BERT model, which leverages advanced NLP techniques to process raw text data, also achieved high predictive performance, with an AUPRC of 0.7542 and an AUC of 0.7735. While slightly lower than the random forest model, BERT’s strong performance highlights the utility of leveraging unstructured triage narratives for sepsis prediction. The fact that both models performed well supports the feasibility of incorporating free-text data into AI-driven clinical decision support tools at ED triage, where early detection is critical for improving patient outcomes. The elastic-net feature selection algorithm identified nine key features across all folds, including demographic factors (age), clinical indicators (pain, temperature) and process metrics (ED treat to discharge time). These features provide valuable insights into the predictive factors associated with sepsis development.
Our findings align with previous research by Horng et al,19 which demonstrated the effectiveness of incorporating free-text data into AI-based clinical decision support systems for sepsis prediction in the ED. Their retrospective study (2008–2013) used ICD-9-CM discharge diagnoses and divided patients into training (64%), validation (20%) and test (16%) groups. They created four models integrating vital signs, chief concerns and preprocessed free text, using bag-of-words and topic models with a support vector machine for prediction. Out of 230 936 patient visits, about 14% were diagnosed with infection. The best-performing models, which included free text data, achieved an AUC of 0.86 in the test set.
In addition to establishing an AI model to detect sepsis, we were also interested in the prevalence of sepsis in a large ED. One of the difficulties for clinicians is maintaining vigilance across the many ‘common’ conditions that present to the ED. Initially, 5.9% of patients in the data set had sepsis ICD codes assigned during their admission. After excluding patients with potential HACs, the prevalence reduced to 4.1%. Given the possibility that some of the removed HAC cases might still be relevant sepsis cases, we estimate the prevalence of ‘real’ sepsis to be between 4.1% and 5.9% in our data set.
Our results also align with those reported by the Australian Institute of Health and Welfare, which indicated a 4.9% prevalence of ‘Certain infectious and parasitic diseases’ in ED presentations in Western Australia.22 Sepsis accounts for 6.7% of healthcare expenditure in the region, highlighting its substantial economic burden. Our findings support using free-text data in ML models to enhance sepsis diagnosis in ED patients. By analysing clinical notes and narrative descriptions, these models can identify sepsis cases that may be missed when relying only on structured data like vital signs and demographics. Our study, using limited ED data and triage comments, demonstrates the potential of ML, particularly random forest, in early sepsis detection. These models, leveraging demographic, clinical and temporal features, can help clinicians identify at-risk patients promptly, leading to timely interventions and improved outcomes. Identified features such as pain levels, vital signs and treatment duration emphasise the importance of comprehensive patient assessment in identifying sepsis risk. Integrating these predictors into clinical decision support systems can enhance diagnostic accuracy and facilitate targeted interventions, reducing morbidity and mortality associated with sepsis.
Automating decision support for sepsis detection at ED triage is challenging,23 24 as previous studies25 26 often rely on lab results and continuous vital sign monitoring, which are not always available in fast-paced ED settings. Thus, clinical practice continues to use vital signs as a trigger for diagnosis of sepsis at triage,27 28 even though vital signs alone are neither sensitive nor specific enough for early sepsis detection. Leveraging all available data, including free text from clinical notes, can improve decision support triggers, and our results, supported by previous studies,19 highlight this opportunity.1
Previous systematic reviews and meta-analyses on AI for sepsis prediction highlight its effectiveness.13 18 Kijpaisalratana et al29 used retrospective data from adult ED patients, and used algorithms including logistic regression, gradient boosting, random forest and neural networks. Their models, trained on 80% of the data and tested on 20%, outperformed traditional models including quick Sequential Organ Failure Assessment, Modified Early Warning Score and systemic inflammatory response syndrome with a higher area under receiver operating characteristic curve of 0.93 for the best model (random forest). These findings support the superiority of ML over traditional methods for predicting sepsis in ED patients.
Fleuren et al18 conducted a systematic review and meta-analysis of 28 studies, evaluating 130 ML models for real-time sepsis diagnosis. They reported that the diagnostic accuracy (measured by AUC) varied by setting: 0.68–0.99 in the ICU, 0.96–0.98 in-hospital and 0.87–0.97 in the ED. Despite varying sepsis definitions, the models accurately predicted sepsis, highlighting the need for further research to bridge the gap between data and clinical practice.
Limitations
There are several limitations in our study. First, we did not include key variables like temperature and heart rate, which could have improved prediction accuracy. Second, since we used free text data as input to the model, artefacts (spelling and grammatical errors) could have impacted the model’s performance. Third, we used random undersampling of the training data to balance classes while preserving the original distribution in the test data. This approach, while simple, may reduce the diversity of non-sepsis presentations and introduce label noise. Future work involves exploring the performance with data augmentation strategies such as generative adversarial networks or synthetic text generation for improved representation of the sepsis class.30 Finally, the heterogeneity within the non-sepsis group may have introduced label noise, potentially affecting model performance. We did not further stratify this group into diagnostic subcategories, as such granularity is not available to clinicians at the time of triage. This limitation reflects real-world constraints but may reduce model specificity. With access to a larger data set containing more sepsis cases, future models may achieve greater accuracy and robustness. Although multimodal clinical data, vital signs and laboratory results could enhance sepsis detection and patient risk stratification, synchronising these data streams optimally is challenging and adds complexity to the healthcare system.
Conclusions
This study demonstrates the utility of ML models, particularly random forest, in predicting sepsis among ED patients. By leveraging a combination of demographic, clinical and temporal features, these models offer valuable insights into sepsis risk stratification and early detection. However, further research is needed to address data quality issues, refine feature selection and facilitate real-world implementation to maximise clinical impact and improve patient outcomes. In this study, we used the original BERT model to assess the feasibility of applying ML and basic transformer-based architectures to triage text classification. While BERT was not pretrained specifically on medical data, it served as a baseline for evaluating the potential of language models in this context. In future work, we plan to explore domain-adapted variants such as Bio_ClinicalBERT and PubMedBERT for potentially enhanced performance for clinical implementation. Future research should also involve validating the model with clinician acceptance through iterative feedback, stakeholder engagement, prospective data and external data sets to ensure generalisability across diverse populations and settings.
Acknowledgements
We thank Sandra O’Keefe for her assistance in data extraction and Dr Peter Allely for his assistance as the data custodian and for providing access to data. We thank the WA Data Science Innovation Hub for initial project support through the WA Health Hackathon 2022.
Footnotes
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Patient consent for publication: Not applicable.
Provenance and peer review: Not commissioned; externally peer reviewed.
Data availability statement
Data may be obtained from a third party and are not publicly available.
References
- 1. Rudd KE, Johnson SC, Agesa KM, et al. Global, regional, and national sepsis incidence and mortality, 1990-2017: analysis for the Global Burden of Disease Study. Lancet. 2020;395:200–11. doi: 10.1016/S0140-6736(19)32989-7.
- 2. Adrie C, Alberti C, Chaix-Couturier C, et al. Epidemiology and economic evaluation of severe sepsis in France: age, severity, infection site, and place of acquisition (community, hospital, or intensive care unit) as determinants of workload and cost. J Crit Care. 2005;20:46–58. doi: 10.1016/j.jcrc.2004.10.005.
- 3. Iwashyna TJ, Ely EW, Smith DM, et al. Long-term cognitive impairment and functional disability among survivors of severe sepsis. JAMA. 2010;304:1787–94. doi: 10.1001/jama.2010.1553.
- 4. Seymour CW, Gesten F, Prescott HC, et al. Time to treatment and mortality during mandated emergency care for sepsis. N Engl J Med. 2017;376:2235–44. doi: 10.1056/NEJMoa1703058.
- 5. Rhodes A, Evans LE, Alhazzani W, et al. Surviving Sepsis Campaign: international guidelines for management of sepsis and septic shock: 2016. Intensive Care Med. 2017;43:304–77. doi: 10.1007/s00134-017-4683-6.
- 6. Ferrer R, Martin-Loeches I, Phillips G, et al. Empiric antibiotic treatment reduces mortality in severe sepsis and septic shock from the first hour. Crit Care Med. 2014;42:1749–55. doi: 10.1097/CCM.0000000000000330.
- 7. Arefian H, Heublein S, Scherag A, et al. Hospital-related cost of sepsis: a systematic review. J Infect. 2017;74:107–17. doi: 10.1016/j.jinf.2016.11.006.
- 8. Fleischmann C, Scherag A, Adhikari NKJ, et al. Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations. Am J Respir Crit Care Med. 2016;193:259–72. doi: 10.1164/rccm.201504-0781OC.
- 9. Leisman DE, Angel C, Schneider SM, et al. Sepsis presenting in hospitals versus emergency departments: demographic, resuscitation, and outcome patterns in a multicenter retrospective cohort. J Hosp Med. 2019;14:340–8. doi: 10.12788/jhm.3188.
- 10. Darraj A, Hudays A, Hazazi A, et al. The association between emergency department overcrowding and delay in treatment: a systematic review. Healthcare (Basel). 2023;11:385. doi: 10.3390/healthcare11030385.
- 11. McLymont N, Glover GW. Scoring systems for the characterization of sepsis and associated outcomes. Ann Transl Med. 2016;4:527. doi: 10.21037/atm.2016.12.53.
- 12. Liu VX, Lu Y, Carey KA, et al. Comparison of early warning scoring systems for hospitalized patients with and without infection at risk for in-hospital mortality and transfer to the intensive care unit. JAMA Netw Open. 2020;3:e205191. doi: 10.1001/jamanetworkopen.2020.5191.
- 13. Tong-Minh K, Welten I, Endeman H, et al. Predicting mortality in adult patients with sepsis in the emergency department by using combinations of biomarkers and clinical scoring systems: a systematic review. BMC Emerg Med. 2021;21:70. doi: 10.1186/s12873-021-00461-z.
- 14. Chamberlain DJ, Willis E, Clark R, et al. Identification of the severe sepsis patient at triage: a prospective analysis of the Australasian Triage Scale. Emerg Med J. 2015;32:690–7. doi: 10.1136/emermed-2014-203937.
- 15. Tromp M, Hulscher M, Bleeker-Rovers CP, et al. The role of nurses in the recognition and treatment of patients with sepsis in the emergency department: a prospective before-and-after intervention study. Int J Nurs Stud. 2010;47:1464–73. doi: 10.1016/j.ijnurstu.2010.04.007.
- 16. Kumar Y, Koul A, Singla R, et al. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J Ambient Intell Humaniz Comput. 2023;14:8459–86. doi: 10.1007/s12652-021-03612-z.
- 17. Yusuf M, Atal I, Li J, et al. Reporting quality of studies using machine learning models for medical diagnosis: a systematic review. BMJ Open. 2020;10:e034568. doi: 10.1136/bmjopen-2019-034568.
- 18. Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46:383–400. doi: 10.1007/s00134-019-05872-y.
- 19. Horng S, Sontag DA, Halpern Y, et al. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS One. 2017;12:e0174708. doi: 10.1371/journal.pone.0174708.
- 20. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
- 21. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. 2018. Preprint. arXiv:1810.04805.
- 22. Li L, Sunderland N, Rathnayake K, et al. Epidemiology of sepsis in Australian public hospitals. Sydney: Australian Commission on Safety and Quality in Health Care; 2020.
- 23. Ge C, Deng F, Chen W, et al. Machine learning for early prediction of sepsis-associated acute brain injury. Front Med. 2022;9:962027. doi: 10.3389/fmed.2022.962027.
- 24. Mani S, Ozdas A, Aliferis C, et al. Medical decision support using machine learning for early detection of late-onset neonatal sepsis. J Am Med Inform Assoc. 2014;21:326–36. doi: 10.1136/amiajnl-2013-001854.
- 25. Tsoukalas A, Albertson T, Tagkopoulos I. From data to optimal decision making: a data-driven, probabilistic machine learning approach to decision support for patients with sepsis. JMIR Med Inform. 2015;3:e11. doi: 10.2196/medinform.3445.
- 26. Henry KE, Hager DN, Pronovost PJ, et al. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med. 2015;7:299ra122. doi: 10.1126/scitranslmed.aab3719.
- 27. Sawyer AM, Deal EN, Labelle AJ, et al. Implementation of a real-time computerized sepsis alert in nonintensive care unit patients. Crit Care Med. 2011;39:469–73. doi: 10.1097/CCM.0b013e318205df85.
- 28. McGillicuddy DC, O'Connell FJ, Shapiro NI, et al. Emergency department abnormal vital sign "triggers" program improves time to therapy. Acad Emerg Med. 2011;18:483–7. doi: 10.1111/j.1553-2712.2011.01056.x.
- 29. Kijpaisalratana N, Sanglertsinlapachai D, Techaratsami S, et al. Machine learning algorithms for early sepsis detection in the emergency department: a retrospective study. Int J Med Inform. 2022;160:104689. doi: 10.1016/j.ijmedinf.2022.104689.
- 30. Stoev T, Ferrario A, Demiray B, et al. Coping with imbalanced data in the automated detection of reminiscence from everyday life conversations of older adults. IEEE Access. 2021;9:116540–51. doi: 10.1109/ACCESS.2021.3106249.
Associated Data
Data Availability Statement
Data may be obtained from a third party and are not publicly available.

