Hospital-acquired sepsis remains a major health problem warranting continued attention from both research and clinical practice. More than 100,000 cases of hospital-acquired sepsis occur every year in the United States.1,2 Sepsis is associated with poor survival (19.2% mortality rate), prolonged lengths of stay (median of 17 days in hospital, 8 days in ICU), and substantial hospitalization costs (median of almost $40,000).3,4 Overall, hospital-acquired sepsis accounts for at least $2.6 billion of annual U.S. hospital costs.5
The burgeoning role of artificial intelligence (and machine learning [ML], specifically) has promised to help us identify patients at risk for sepsis and facilitate earlier intervention. In their paper “Physiological Machine Learning Models for Prediction of Sepsis in Hospitalized Adults: An Integrative Review,” the authors examined literature that either: (a) developed or validated sepsis prediction models or (b) implemented sepsis prediction models in a clinical environment. Not surprisingly, the ML model development/validation studies (n=12) outnumbered the implementation studies (n=2).
ML model development/validation studies reported moderate-to-high area-under-the-curve (AUC) values. AUCs are commonly reported for ML model performance because the AUC measures how well a model discriminates between people with and without a condition. One reason many investigators report only AUC values is that this metric is largely insensitive to a condition’s prevalence in the population. In my experience working with relatively rare nursing-relevant outcomes, prevalence-sensitive metrics (such as precision and the F1 measure) provide additional insight into how beneficial a predictive model might be for clinicians.
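This distinction can be made concrete with a small sketch. The following illustration (hypothetical scores, not from any study in the review) computes the AUC as the probability that a randomly chosen positive case outscores a randomly chosen negative case, then makes the condition 20 times rarer by replicating the negative scores: the AUC is unchanged, while precision at a fixed alert threshold collapses.

```python
def auc(pos, neg):
    """Probability a random positive outscores a random negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at(pos, neg, thr):
    """Fraction of alerts (scores >= thr) that are true positives."""
    tp = sum(p >= thr for p in pos)
    fp = sum(n >= thr for n in neg)
    return tp / (tp + fp) if tp + fp else 0.0

pos = [0.9, 0.8, 0.7, 0.6, 0.4]    # model scores for true sepsis cases
neg = [0.65, 0.5, 0.3, 0.2, 0.1]   # model scores for non-cases
rare_neg = neg * 20                # same score distribution, 20x more non-cases

print(auc(pos, neg), auc(pos, rare_neg))      # identical: AUC ignores prevalence
print(precision_at(pos, neg, 0.5),            # precision falls sharply as
      precision_at(pos, rare_neg, 0.5))       # the condition becomes rarer
```

In this toy example the AUC stays at 0.88 in both populations, while precision at the 0.5 threshold drops from about 0.67 to about 0.09, which is exactly the gap between "discriminates well" and "useful at the bedside."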
In this integrative review, the F1 measure (the harmonic mean of sensitivity and precision) was reported in only 2 of the studies. One of those F1 values was particularly low (0.05), suggesting inadequate sensitivity (i.e., the ability to identify patients with the condition) and/or precision (i.e., the percentage of alerts that correctly identify patients with the condition). If sensitivity is low, the model is not identifying enough people with the condition. If precision is low, alarm fatigue results from too many false positive alerts. By training a predictive model to maximize AUC rather than the F1 measure, it’s possible we are not optimizing the model for satisfactory clinical use.
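Because the harmonic mean is dominated by the smaller of its two inputs, an F1 of 0.05 guarantees that at least one of sensitivity or precision is dismal. A short sketch (illustrative numbers only) makes the point:

```python
def f1(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)

print(f1(0.80, 0.75))    # a clinically useful balance: F1 ~ 0.77
print(f1(1.00, 0.026))   # even PERFECT sensitivity with 2.6% precision gives F1 ~ 0.05
```

In other words, a model with F1 = 0.05 could at best be catching every septic patient while generating roughly 39 false alarms for every true one, which is a recipe for alarm fatigue.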
It’s also worth mentioning that clinical use is not only influenced by the ML model performance characteristics but also by constraints in the clinical environment and whether interventions exist to alter the patient outcome. Many refer to this problem as the “last mile” of ML applications in healthcare. Simply because a model can accurately predict an outcome does not imply clinicians can effectively intervene. In fact, the ability to intervene can be influenced by a number of local contextual factors that require organization-specific quality improvement initiatives.6 While building an ML model is complex, using the information to support clinical decision making and change people’s behavior is widely viewed as even more complicated. The Figure shown here7 is one example of a framework that attempts to succinctly convey the complex system in which ML models (labeled as “Processing & Output” here) exist.
Figure.
User- and Context-Dependent Framework for Clinical Decision Support Systems
There are no magic solutions to either adequate ML model performance or successful implementation and intervention. However, an essential prerequisite is having nurses (and other clinicians) engaged throughout the lifecycle of ML model development, evaluation, and implementation.

For example, in the model development process, what if we discover a bimodal distribution of the weight variable? Nurses who regularly perform data entry would be able to help ascertain whether this finding might be caused by differing weight units (i.e., pounds vs. kilograms) or populations (e.g., adults vs. children). Similarly, nurses contribute expertise on the quality of available data. For example, if documenting vital signs in the electronic health record is routinely performed later in the shift (rather than in real time), an ML model built on the assumption that data are available in real time will not perform as well when implemented in a clinical environment with delayed data entry. Nurses also have knowledge of whether additional predictors might improve model performance; for example, whether a patient is on oxygen might be an important predictor, but including the amount of oxygen the patient is receiving might be even more helpful. Or, consider the possibility that simply the act of opening the sepsis alert could serve as a proxy for “nurse concern” that enhances the model’s performance.

Finally, nurses can provide invaluable insight into which information is (or isn’t) actionable and how that facilitates adoption and use. For example, an elevated white blood cell count might contribute to the model’s predictive ability and could help explain to clinicians how the model is working, but there are no direct interventions that alter the white blood cell count for patients experiencing sepsis. Whether and how to display that variable’s information is a decision best made by clinicians, rather than ML developers.
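The pounds-versus-kilograms scenario above is the kind of check a development team might automate once a nurse has flagged it. The following is a minimal sketch of such a screen; the function name, the 2.2046 lb/kg conversion, and the 30–140 kg "plausible adult weight" bounds are illustrative assumptions, not a validated rule, and flagged rows would go to clinician review rather than be converted silently.

```python
LB_PER_KG = 2.2046  # pounds per kilogram

def flag_possible_unit_mix(weights, low=30.0, high=140.0):
    """Crude screen for a pounds/kilograms mix in an 'adult weight (kg)' field:
    flag values that are implausible as adult kilograms but plausible as
    pounds once divided by the conversion factor. Thresholds are illustrative."""
    return [w for w in weights
            if w > high and low <= w / LB_PER_KG <= high]

# hypothetical chart values: mostly kilograms, two rows likely entered in pounds
weights = [62.0, 80.5, 74.2, 176.0, 68.9, 154.0]
print(flag_possible_unit_mix(weights))  # [176.0, 154.0]
```

Even a heuristic this simple depends on clinical input: the plausible-weight bounds differ for adult versus pediatric units, which is precisely why nurses who know the population and the documentation workflow need to be in the loop.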
Artificial intelligence and ML will be “working” in healthcare for the foreseeable future. For predictive models to have the best chance at improving health outcomes, they not only need to perform well on metrics that matter to clinicians; they also need to be placed into clinical environments in ways that account for the context in which they’re expected to contribute.
Acknowledgement:
Dr. Jeffery received support for this work from the Agency for Healthcare Research and Quality (AHRQ) and the Patient-Centered Outcomes Research Institute (PCORI) under award number K12 HS026395. The content is solely the responsibility of the author and does not necessarily represent the official views of AHRQ, PCORI, or the United States Government.
References
1. Fleischmann C, Scherag A, Adhikari NK, et al. Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations. Am J Respir Crit Care Med. 2016;193(3):259–272.
2. Angus DC, Linde-Zwirble WT, Lidicker J, et al. Epidemiology of severe sepsis in the United States: Analysis of incidence, outcome, and associated costs of care. Crit Care Med. 2001;29(7):1303–1310.
3. Page DB, Donnelly JP, Wang HE. Community-, healthcare-, and hospital-acquired severe sepsis hospitalizations in the University HealthSystem Consortium. Crit Care Med. 2015;43(9):1945–1951.
4. Rhee C, Wang R, Zhang Z, et al. Epidemiology of hospital-onset versus community-onset sepsis in U.S. hospitals and association with mortality: A retrospective analysis using electronic clinical data. Crit Care Med. 2019;47(9):1169–1176.
5. Torio C, Moore B. National inpatient hospital costs: The most expensive conditions by payer, 2013. HCUP Statistical Brief #204. 2016, May. https://www.hcup-us.ahrq.gov/reports/statbriefs/sb204-Most-Expensive-Hospital-Conditions.pdf
6. Gripp L, Raffoul M, Milner KA. Implementation of the Surviving Sepsis Campaign one-hour bundle in a short stay unit: A quality improvement project. Intensive Crit Care Nurs. 2020:103004.
7. Jeffery AD, Novak LL, Kennedy B, et al. Participatory design of probability-based decision support tools for in-hospital nurses. J Am Med Inform Assoc. 2017;24(6):1102–1110.

