For machine learning models focused on clinical prediction problems to be useful, they must be informative and actionable, accurate and timely, interpretable, reproducible and generalizable, and, perhaps most importantly, implementable.(1) Risk prediction models deployed as clinical decision support (CDS) tools embedded in electronic health record (EHR) systems are often designed to help clinicians make good decisions quickly. One type of risk prediction model, the early warning system, attempts to identify patients who will clinically deteriorate soon. Those patients can then receive rapid treatment designed to stabilize their physiology and decrease their risk of death, permanent disability, or worsening organ dysfunction.(2) The pediatric early warning system (PEWS) family of scores was originally developed to decrease the likelihood of cardiopulmonary arrest outside of an intensive care unit.(3) A large randomized controlled trial of PEWS implementation conducted in 21 hospitals in 7 countries outside the U.S. did not show an all-cause mortality benefit.(4) However, PEWS and similar systems may have benefits beyond patient-level prediction, including empowering nurses, providing less-experienced nurses with reference ranges for vital signs, and alerting nurses and physicians to concerning changes.(5) Overall, however, consistent early identification of children with imminent clinical deterioration, particularly in high-risk subgroups such as cardiology and oncology patients, remains an unmet need.
In this issue of Pediatric Critical Care Medicine, Rust et al. report the design and implementation of a CDS system that predicts deterioration in hospitalized children.(6) To develop the system, the authors conducted a retrospective cohort study of patients at a single large, freestanding U.S. children’s hospital between October 2015 and December 2019. Using a common machine/statistical learning approach, least absolute shrinkage and selection operator (LASSO) regularized logistic regression, the authors developed three separate models: one each for children with congenital heart disease, children with cancer, and all other patients. They then integrated those models into a single Deterioration Risk Index (DRI).
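For readers less familiar with the method, the sketch below shows the general shape of a LASSO-regularized logistic regression workflow using scikit-learn. It illustrates the technique only; it is not the authors’ DRI code, and the file name, column names, and tuning choices are assumptions.

```python
# Minimal sketch of LASSO-regularized logistic regression for a deterioration
# label. Illustration only; file and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical training table: one row per observation window, with candidate
# predictor columns and a binary deterioration label.
df = pd.read_csv("cardiology_training_windows.csv")
X = df.drop(columns=["deterioration_within_horizon"])
y = df["deterioration_within_horizon"]

# In practice, predictors would typically be standardized first so the L1
# penalty treats them comparably.
model = LogisticRegressionCV(
    penalty="l1",          # LASSO penalty shrinks uninformative coefficients to zero
    solver="saga",         # saga supports the L1 penalty
    Cs=20,                 # grid of inverse regularization strengths to search
    cv=5,                  # cross-validated choice of penalty strength
    scoring="roc_auc",
    max_iter=5000,
)
model.fit(X, y)

# Nonzero coefficients identify the predictors the penalty retained, which is
# how a large candidate set can be reduced to the ~20-30 predictors per model.
retained = X.columns[(model.coef_ != 0).ravel()]
print(retained.tolist())
```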
The study cohort included 104,447 control hospitalizations encompassing 317,536 patient-days and 126 case hospitalizations that included a clinical deterioration event. The DRI is designed to be calculated every 15 minutes by the EHR system and includes 18 (oncology), 21 (cardiology), or 28 (all others) predictors. This is consistent with other early warning scores: in a recent review of methods used in developing early warning scores for adult care, the number of predictors varied from 3 to 72.(2)
The authors found that the DRI had greater discrimination and sensitivity than both the existing situational awareness system at the site (known as Watchstander) and PEWS. In the 18 months following pilot DRI deployment, the site experienced three clinical deterioration events rather than the thirteen expected based on historical trends. Of the three events that did occur, the DRI generated an alarm more than two hours prior to clinical deterioration for two; only one was not preceded by a DRI alarm. Alarm burden was only modestly higher after DRI implementation, and results from the pilot implementation suggested that alarms per detected deterioration event may be lower with the DRI than with Watchstander.
This manuscript has several notable strengths. First, it builds on the authors’ prior work developing and deploying CDS systems at their site.(7) Second, the DRI targets models to high-risk patient populations in order to overcome weaknesses of existing systems; it is an excellent example of an iteratively improved, purpose-built CDS system. Third, the DRI uses a pragmatic imputation strategy (last observation carried forward for up to 24 hours, then imputation of a normal value) that is feasible for real-time CDS.(8) Fourth, the DRI is based on an inherently interpretable modeling approach, LASSO logistic regression, rather than a less transparent machine learning algorithm.(9) Fifth, the authors evaluated the potential for bias in model sensitivity between males and females and across race and ethnicity groups. Sixth, the DRI was “designed for dissemination”: alarm burden and clinical workflow integration were considered throughout the study.
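The following is a minimal sketch of a real-time-feasible imputation strategy of the kind described: carry the last observation forward for up to 24 hours, and fall back to a “normal” value when no recent measurement exists. The variable names and the reference normal values are placeholders, not the authors’ implementation.

```python
# Sketch of last-observation-carried-forward (max 24 h) imputation with a
# fallback to a population "normal" value. Values and names are illustrative.
import pandas as pd

NORMAL_VALUES = {"heart_rate": 100.0, "resp_rate": 22.0, "spo2": 98.0}  # placeholders

def impute_variable(obs: pd.DataFrame, score_times: pd.DataFrame, var: str) -> pd.Series:
    """Return one value of `var` per scoring time (e.g., every 15 minutes).

    obs: columns ["time", var], one row per charted measurement.
    score_times: column ["time"], the regular scoring grid.
    """
    merged = pd.merge_asof(
        score_times.sort_values("time"),
        obs.sort_values("time"),
        on="time",
        direction="backward",              # last observation at or before the score time...
        tolerance=pd.Timedelta(hours=24),  # ...carried forward for at most 24 hours
    )
    # Anything still missing (no measurement in the prior 24 h) gets the normal value.
    return merged[var].fillna(NORMAL_VALUES[var])
```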
Most importantly, the DRI has already been deployed on the cardiology and general hospital medicine wards at the study site. Many clinical prediction models are developed, but few are implemented in clinical practice. The authors accomplished this implementation by creating a partnership between the model development team, the relevant clinical teams, and health system information technology teams to understand the current decisional needs(10) of clinicians and to build a system that would function well within existing workflows. The DRI has been integrated into the existing situational awareness program (Watchstander) in a human-in-the-loop format: an alert from the DRI does not automatically lead to an escalation of care such as ICU transfer, but rather triggers an existing bedside rapid response huddle, risk mitigation, and development of an escalation plan if needed.
The deterioration prediction target is also a strength. Some systems attempt to predict a heterogeneous “deterioration” composite that includes events which may reflect natural progression of disease and an appropriate health system response (e.g., transfer to the ICU and initiation of non-invasive ventilation).(11) In contrast, the deterioration composite predicted by the DRI includes intubation, initiation of inotropes or vasopressors, or administration of 60 mL/kg of fluid within the hour prior to or immediately following ICU admission. These events are more specific and reflect a patient who is rapidly deteriorating or has reached a physiological threshold. Most (100/126) of the deterioration events in the training data were these “emergency transfers.” The remainder were code blue events outside the ICU and unexpected deaths.
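To make the outcome definition concrete, here is a minimal, hypothetical sketch of how such an “emergency transfer” composite might be operationalized. The field names, event types, and the symmetric one-hour window are assumptions for illustration, not the paper’s exact phenotyping logic.

```python
# Hypothetical "emergency transfer" phenotype: a qualifying intervention
# (intubation, vasoactive infusion, or >=60 mL/kg fluid) close in time to ICU
# admission. Field names and window are illustrative assumptions.
from datetime import datetime, timedelta

def is_emergency_transfer(events: list[dict], icu_admit_time: datetime) -> bool:
    """events: dicts with keys 'type' ('intubation', 'vasoactive', or
    'fluid_bolus'), 'time' (datetime), and, for fluid boluses, 'ml_per_kg'."""
    window_start = icu_admit_time - timedelta(hours=1)
    window_end = icu_admit_time + timedelta(hours=1)
    fluid_total_ml_per_kg = 0.0
    for event in events:
        if not (window_start <= event["time"] <= window_end):
            continue
        if event["type"] in ("intubation", "vasoactive"):
            return True
        if event["type"] == "fluid_bolus":
            fluid_total_ml_per_kg += event["ml_per_kg"]
    return fluid_total_ml_per_kg >= 60
```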
Among the most important weaknesses are that the DRI models are based on data from a single institution and that they were not validated using current best practices in this study. The study dataset was extremely imbalanced (nearly 1:1,000 cases to controls), and the authors struggled to converge on a stable model when they attempted to hold out a separate test set. They chose instead to rely on cross-validation. Future prospective validation will be essential to solidify confidence in the system’s performance.
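The sketch below illustrates why cross-validation is often preferred over a single held-out test set when events are this rare: stratified folds keep a handful of cases in every fold, whereas a random split can leave a test set with too few events for stable performance estimates. It uses simulated data and standard scikit-learn tools; it is not the authors’ evaluation code.

```python
# Illustration only: stratified cross-validation for a rare (~1:1,000) outcome,
# using simulated data rather than the study dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated cohort with roughly 0.1% event prevalence.
X, y = make_classification(
    n_samples=100_000, n_features=25, n_informative=8,
    weights=[0.999, 0.001], random_state=0,
)

# LASSO-penalized logistic regression, with standardization so the penalty
# treats predictors comparably.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000),
)

# Stratification guarantees that each fold retains its share of the rare cases.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"Per-fold AUROC: {np.round(aucs, 3)}, mean {aucs.mean():.3f}")
```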
Another weakness is that some of the model input variables are idiosyncratic and will not easily generalize to other health systems. Some “structured” EHR variables, such as laboratory values and vital signs, generalize relatively easily to other sites. Other structured variables, such as diagnoses and procedures, use widely adopted formats and also generalize easily, but they exist in a site’s EHR only for patients with prior encounters at that site. A third category of variables includes those captured in, for example, drop-down menus and recorded in customized EHR data structures (“flowsheets” in the EHR system used at the authors’ institution, although analogous structures exist in other vendors’ EHRs). These variables are structured because they are categorical rather than free text, but they would require substantial effort to translate to another site. They include variables such as “parent/caregiver gut feeling” as well as “increased edema” and other nursing assessments. This weakness has also been present in other recently reported dynamic prediction models in our field.(12, 13)
Backcharting is another potential limitation of the DRI system in its current form. Nurses are extremely busy and have many hourly and per-shift tasks and assessments. Those assessments generate rich information about the patient’s current physiologic state. However, the results of those assessments may be captured in alternative (e.g., paper) forms and entered into the EHR later, when the nurse has enough time. In most retrospective datasets, backcharted data are timestamped when the assessment occurred, not when the data were entered. In real-time applications, those data are not available until hours later, which can delay the point at which a CDS system has sufficient information in the EHR to generate a critical prediction.(14) Importantly, this is more likely to occur for sicker patients, because their nurses are busier providing direct care. This may be one reason why PEWS has had less predictive utility in prospective studies than in retrospective data.
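A minimal sketch of this distinction follows: retrospective analyses typically filter on the time an assessment describes, whereas a deployed system can only see rows whose entry time has already passed. The column names (“assessment_time”, “entry_time”) are illustrative assumptions.

```python
# Illustration of the backcharting gap between retrospective and real-time
# data availability. Column names are assumptions for this sketch.
import pandas as pd

def retrospective_view(flowsheet: pd.DataFrame, prediction_time: pd.Timestamp) -> pd.DataFrame:
    """Rows a retrospective analysis would include: filtered on the time the
    assessment describes (when it was performed at the bedside)."""
    return flowsheet[flowsheet["assessment_time"] <= prediction_time]

def realtime_view(flowsheet: pd.DataFrame, prediction_time: pd.Timestamp) -> pd.DataFrame:
    """Rows a deployed CDS system could actually have seen: filtered on the
    time the data were entered into the EHR."""
    return flowsheet[flowsheet["entry_time"] <= prediction_time]

def backcharted_rows(flowsheet: pd.DataFrame, prediction_time: pd.Timestamp) -> pd.DataFrame:
    """Assessments a retrospective model 'saw' that had not yet been charted
    when the real-time prediction would have fired."""
    retro = retrospective_view(flowsheet, prediction_time)
    return retro[retro["entry_time"] > prediction_time]
```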
Overall, despite its weaknesses, this study is an important contribution because the authors “ate their own dog food” and did the hard work of implementing a novel machine learning-based index into clinical practice. Further prospective validation will be important. Despite its origins as a single-center quality improvement project, the DRI will have the greatest impact if it is translated to other sites and evaluated in clinical trials.(15)
Acknowledgments
This work was supported, in part, by NIH/NLM R01HD105939.
Footnotes
Copyright Form Disclosure: Dr. Bennett’s institution received funding from the National Institute of Child Health and Human Development, the National Center for Advancing Translational Sciences, and the National Heart, Lung, and Blood Institute; he received support for article research from the National Institutes of Health.
References
- 1. Sanchez-Pinto LN, Bennett TD: Evaluation of Machine Learning Models for Clinical Prediction Problems. Pediatr Crit Care Med 2022; 23:405–408
- 2. Fu L-H, Schwartz J, Moy A, et al.: Development and validation of early warning score system: A systematic literature review. J Biomed Inform 2020; 105:103410
- 3. Duncan H, Hutchison J, Parshuram CS: The Pediatric Early Warning System score: a severity of illness score to predict urgent medical need in hospitalized children. J Crit Care 2006; 21:271–278
- 4. Parshuram CS, Dryden-Palmer K, Farrell C, et al.: Effect of a Pediatric Early Warning System on All-Cause Mortality in Hospitalized Pediatric Patients: The EPOCH Randomized Clinical Trial. JAMA 2018; 319:1002–1012
- 5. Bonafide CP, Roberts KE, Weirich CM, et al.: Beyond statistical prediction: qualitative evaluation of the mechanisms by which pediatric early warning scores impact patient safety. J Hosp Med 2013; 8:248–253
- 6. Rust L, Gorham TJ, Bambach S, et al.: The Deterioration Risk Index: Developing and Piloting a Machine Learning Algorithm to Reduce Pediatric Inpatient Deterioration. Pediatr Crit Care Med 2023; In press
- 7. Gorham TJ, Rust S, Rust L, et al.: The Vitals Risk Index-Retrospective Performance Analysis of an Automated and Objective Pediatric Early Warning System. Pediatr Qual Saf 2020; 5:e271
- 8. Martin B, DeWitt PE, Scott HF, et al.: Machine Learning Approach to Predicting Absence of Serious Bacterial Infection at PICU Admission. Hosp Pediatr 2022; e2021005998
- 9. Rudin C: Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat Mach Intell 2019; 1:206–215
- 10. Bennett TD, Marsh R, Maertens JA, et al.: Decision-Making About Intracranial Pressure Monitor Placement in Children With Traumatic Brain Injury. Pediatr Crit Care Med 2019; 20:645–651
- 11. Bonafide CP, Holmes JH, Nadkarni VM, et al.: Development of a score to predict clinical deterioration in hospitalized children. J Hosp Med 2012; 7:345–349
- 12. Aczon MD, Ledbetter DR, Laksana E, et al.: Continuous Prediction of Mortality in the PICU: A Recurrent Neural Network Model in a Single-Center Dataset. Pediatr Crit Care Med 2021; 22:519–529
- 13. Trujillo Rivera EA, Chamberlain JM, Patel AK, et al.: Dynamic Mortality Risk Predictions for Children in ICUs: Development and Validation of Machine Learning Models. Pediatr Crit Care Med 2022; 23:344–352
- 14. Jagannath S, Sarcevic A, Young V, et al.: Temporal Rhythms and Patterns of Electronic Documentation in Time-Critical Medical Work. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Glasgow, Scotland, UK: ACM; 2019. p. 1–13. Available from: https://dl.acm.org/doi/10.1145/3290605.3300564
- 15. Angus DC: Randomized Clinical Trials of Artificial Intelligence. JAMA 2020; 323:1043–1045
