Journal of the American Medical Informatics Association (JAMIA)
2021 Feb 26;28(6):1207–1215. doi: 10.1093/jamia/ocaa347

Using machine learning to improve the accuracy of patient deterioration predictions: Mayo Clinic Early Warning Score (MC-EWS)

Santiago Romero-Brufau 1,2, Daniel Whitford 3, Matthew G Johnson 4, Joel Hickman 4, Bruce W Morlan 4, Terry Therneau 4, James Naessens 4, Jeanne M Huddleston 1
PMCID: PMC8661441  PMID: 33638343

Abstract

Objective

We aimed to develop a model for accurate prediction of general care inpatient deterioration.

Materials and Methods

Training and internal validation datasets were built using 2-year data from a quaternary hospital in the Midwest. Model training used gradient boosting and feature engineering (clinically relevant interactions, time-series information) to predict general care inpatient deterioration (resuscitation call, intensive care unit transfer, or rapid response team call) in 24 hours. Data from a tertiary care hospital in the Southwest were used for external validation. C-statistic, sensitivity, positive predictive value, and alert rate were calculated for different cutoffs and compared with the National Early Warning Score. Sensitivity analysis evaluated prediction of intensive care unit transfer or resuscitation call.

Results

Training, internal validation, and external validation datasets included 24 500, 25 784, and 53 956 hospitalizations, respectively. The Mayo Clinic Early Warning Score (MC-EWS) demonstrated excellent discrimination in both the internal and external validation datasets (C-statistic = 0.913, 0.937, respectively), and results were consistent in the sensitivity analysis (C-statistic = 0.932 in external validation). At a sensitivity of 73%, MC-EWS would generate 0.7 alerts per day per 10 patients, 45% less than the National Early Warning Score.

Discussion

Low alert rates are important for implementation of an alert system. Other early warning scores developed for the general care ward have achieved lower discrimination overall compared with MC-EWS, likely because MC-EWS includes both nursing assessments and extensive feature engineering.

Conclusions

MC-EWS achieved superior prediction of general care inpatient deterioration using sophisticated feature engineering and a machine learning approach, reducing alert rate.

Keywords: early warning score, clinical deterioration, machine learning

INTRODUCTION

Background and significance

Physiological deterioration in the hospital is often unrecognized and can generally be defined as “any significant worsening in the condition of a hospitalized patient that can result in patient morbidity and/or mortality.”1 It is commonly caused by sepsis or acute respiratory failure, with sepsis alone involved in 34% to 52% of in-hospital deaths and estimated to have an in-hospital mortality of 10%, and acute respiratory failure estimated to have an in-hospital mortality of 20%.2,3 Physiological deterioration can lead to profound clinical instability or cardiac arrest, requiring care escalation.4 Research has shown that up to 85% of such deteriorations are preceded by abnormal patient vital signs,5–8 suggesting that many of these outcomes may be preventable.9

Early intervention during patient deterioration may improve patient outcomes. For example, delayed transfer of critically ill patients to the intensive care unit (ICU) is associated with increased mortality10,11 and mortality increases approximately 8% for every hour that antibiotic treatment is delayed for patients with septic shock.12

To address the crucial need for quick interventions during clinical deterioration, hospitals created rapid response systems responsible for responding to patients’ acute needs.13 A typical rapid response system has 3 limbs: an afferent (to detect deteriorating patients), an efferent (to respond to those deteriorations), and an analytic administrative portion to continually assess and improve the system.14,15 In an effort to support the afferent limb (ie, the detection of physiological deterioration), modeling methods and patient data have been utilized to develop scores that aid in assessing and predicting patient deterioration early in a hospital visit.

To date, over 70 early warning scores (EWSs) have been developed, each using different combinations of vital sign components.16 However, when used without the other 2 limbs, EWSs fall short in 2 ways: they fail to show quantitative improvements in key clinical outcomes, and they generate a significant number of false alerts, contributing to alert fatigue for providers.17–19 In fact, even EWSs that perform well in standard comparative analyses (eg, area under the receiver-operating characteristic curve [AUROC]) can generate a surprisingly high number of false alerts.20

To improve the predictive accuracy of EWSs, some data scientists are using very large volumes of routinely collected patient measurements (eg, vital signs and laboratory results) to develop and calibrate predictive models.21–23 Additional improvements have come from sophisticated modeling techniques, such as machine learning, which improve predictive accuracy for clinical deterioration on general care wards.24–27 Missing from these models, however, are less routinely used predictors such as nursing assessments of patient functional status and mobility.

Nursing assessments can be independently predictive of patient deterioration,28,29 and have also shown significant value in a machine learning model for pediatric patients.30 The incorporation of these predictors into a machine learning model for adult patients has not been sufficiently explored. Additionally, clinically relevant interactions between patient vital signs and laboratory results have been explored in the emergency department31 and pediatric ICU32 settings but not in general care settings.

OBJECTIVE

The objective of this study was to develop and validate the predictive accuracy of the Mayo Clinic Early Warning Score (MC-EWS), a machine learning model for predicting acute deterioration in general care wards that utilizes not only patient vital signs and laboratory tests as predictors, but also nursing assessments and variables defining known physiologic interactions.

MATERIALS AND METHODS

Population and datasets

Following approval from Mayo Clinic’s Institutional Review Board, we retrospectively collected data for consecutive hospitalizations to general care beds in 2 quaternary care hospitals in Minnesota, and a tertiary care hospital in Arizona. All adult (>18 years of age) patients that spent any time in a general care or telemetry bed were included. Hospitalizations were excluded if they were entirely ICU stays or if they were primarily for research, rehabilitation, or psychiatric purposes.

We defined “encounter” as a hospitalization and “episode” as any uninterrupted period of time spent in the general care or telemetry wards. An encounter consists of 1 or more episodes, interrupted by ICU transfers or procedures. Data from time in the ICU or procedure settings were excluded. In our datasets, each row represented a time point: a moment in time in each patient’s hospitalization. When a new value was entered for a variable, a row was added, with the latest values carried forward for the remaining variables in that row.
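To make this row structure concrete, the following R sketch (using dplyr and tidyr; the encounter_id, heart_rate, and sbp columns are illustrative, not the actual schema) pivots charted values into one row per time point and carries the latest values forward:

```r
library(dplyr)
library(tidyr)

# Long-format charting data: one row per newly entered value (illustrative)
events <- data.frame(
  encounter_id = c(1, 1, 1, 1),
  time = as.POSIXct(c("2010-03-01 08:00", "2010-03-01 08:00",
                      "2010-03-01 16:00", "2010-03-02 00:30"), tz = "UTC"),
  variable = c("heart_rate", "sbp", "heart_rate", "sbp"),
  value = c(88, 132, 104, 95)
)

# One row per time point; variables not charted at that moment are
# carried forward from the most recent prior entry
timepoints <- events %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  arrange(encounter_id, time) %>%
  group_by(encounter_id) %>%
  fill(heart_rate, sbp, .direction = "down") %>%
  ungroup()
```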

A schematic description of the datasets used can be seen in Figure 1. Admissions between January 2010 and December 2011 to 2 referral hospitals in southeastern Minnesota (the 2 hospitals were consolidated in 2014) were split as follows: one-third was used for a validation dataset, and the remaining two-thirds was used to build the training dataset by selecting only 1 time point per hospitalization. In this two-thirds dataset, we first defined cases as hospitalizations that had an outcome of interest (see Outcomes) and controls as hospitalizations without one. From the case hospitalizations, we selected the time point of the first outcome. From the remaining hospitalizations, we selected a random time point each and then randomly subsampled them so that the final training dataset had a 1:10 case-to-control ratio. The idea was to produce a slimmed-down and faster-running dataset that allowed the model to distinguish: "is this a deteriorating patient, or a nondeteriorating patient at a random time point in their hospitalization?" A minimal sketch of this sampling step appears below. Admissions between January 2011 and June 2015 to a referral hospital in Arizona were used as an external validation dataset.
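The sketch assumes a time-point table like the one above plus an illustrative outcome column (1 at the moment of an RRT call, ICU transfer, or resuscitation call; 0 otherwise); column names and the seed are illustrative:

```r
library(dplyr)
set.seed(2020)  # for the random control time points (illustrative seed)

# Cases: the first outcome time point of each hospitalization with an event
case_rows <- timepoints %>%
  filter(outcome == 1) %>%
  group_by(encounter_id) %>%
  slice_min(time, n = 1) %>%
  ungroup()

# Controls: one random time point per hospitalization without an event,
# then subsampled to enforce the 1:10 case-to-control ratio
control_rows <- timepoints %>%
  filter(!encounter_id %in% case_rows$encounter_id) %>%
  group_by(encounter_id) %>%
  slice_sample(n = 1) %>%
  ungroup() %>%
  slice_sample(n = 10 * nrow(case_rows))

training <- bind_rows(case_rows, control_rows)
```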

Figure 1. Datasets and sample sizes.

Predictors

Predictors included demographics, vital signs, laboratory test results, and nursing assessments. The data were preprocessed to screen for extreme outliers that were not clinically plausible in order to correct for any values that had been entered into the medical record incorrectly. Supplementary Appendix 1 includes a list of the valid ranges for vital signs. Feature engineering was used to capture physiologically relevant interaction terms of the primary variables (eg, shock index, defined as heart rate divided by systolic blood pressure). Other calculated variables included measurements of a variable through time (maxima, minima, and ranges over the previous 24 or 48 hours). The full list of predictor candidates used can be found in Supplementary Appendix 2. At each time point, we used the latest value obtained in a general care ward. For newly admitted patients, we used results from laboratory studies obtained immediately preceding transfer to general care wards. Values obtained when the patient was in the ICU or operating room were not included.
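As an illustration of this kind of feature engineering, the sketch below derives the shock index and 24-hour heart rate extrema; it assumes the illustrative time-point table from the previous section (POSIXct times, so the 24-hour window is 24 × 3600 seconds) and is not the production pipeline:

```r
library(dplyr)

features <- timepoints %>%
  mutate(shock_index = heart_rate / sbp) %>%   # clinically relevant interaction
  group_by(encounter_id) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(
    # Running extrema of heart rate over the preceding 24 hours
    hr_max_24h = sapply(seq_along(time), function(i)
      max(heart_rate[time > time[i] - 24 * 3600 & time <= time[i]], na.rm = TRUE)),
    hr_min_24h = sapply(seq_along(time), function(i)
      min(heart_rate[time > time[i] - 24 * 3600 & time <= time[i]], na.rm = TRUE)),
    hr_range_24h = hr_max_24h - hr_min_24h
  ) %>%
  ungroup()
```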

Patient location

Patients were excluded from the deterioration prediction model during periods of time spent in the ICU, in the operating room, or undergoing a procedure. To determine patient location at the time of variable measurement, we used 2 different sources: the bed registry and the location provided by the patient monitoring system. Bed registry data are always present and represent the patient bed "reserved" for the overnight stay. The monitoring system location is recorded automatically, with a physical address (the patient's room), whenever the patient is attached to a monitoring device. All highly monitored areas (ICU, operating room, and procedure room) use the same patient monitoring system, so a location is recorded every time a patient is in those areas. Combining these 2 data sources, we could accurately determine a patient's location in the hospital throughout their hospitalization.
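A minimal sketch of combining the 2 sources, under the assumption (for illustration only) that the monitor feed takes precedence over the bed registry whenever it is present; unit codes and column names are hypothetical:

```r
library(dplyr)

locations <- data.frame(
  encounter_id = c(1, 1, 1),
  time = as.POSIXct(c("2010-03-01 08:00", "2010-03-01 14:00",
                      "2010-03-01 20:00"), tz = "UTC"),
  bed_registry_unit = c("GEN-7A", "GEN-7A", "GEN-7A"),  # reserved overnight bed
  monitor_unit = c(NA, "OR-3", NA)                      # present only when monitored
)

resolved <- locations %>%
  mutate(
    unit = coalesce(monitor_unit, bed_registry_unit),
    # Time points in highly monitored areas are excluded from the model
    in_general_care = !grepl("^(ICU|OR|PROC)", unit)
  )
```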

Standard of care in the study hospitals

In the general care beds, the standard of nursing care during the study period was to obtain a set of vital signs every 8 to 12 hours, with flexibility to increase the frequency at nurses' discretion. Vital signs are hand entered by nursing staff into the electronic medical record, or automatically collected and hand confirmed by a nurse. Vital signs on medical and surgical floors with telemetric capability are collected more frequently, with electronic capture and recording into the electronic health record. Laboratory studies in both floor settings are ordered at the physician's discretion at any time of day. The rapid response team (RRT) responds at the request of nurses, physicians, or advanced allied health providers in both the general care and telemetry floors of the hospitals 24 hours per day, every day. A list of the criteria for calling the RRT at each hospital is displayed in Supplementary Appendix 3.

Outcomes

We used widely accepted surrogates that are representative of the conceptual outcome "acute physiological deterioration." The primary outcome was a composite of resuscitation call for cardiorespiratory arrest (code), call to the RRT, or unplanned transfer to the ICU. Postoperative transfers to the postanesthesia care unit or ICU immediately after surgery were considered planned and were not included as outcomes in the analyses. Our secondary outcome for sensitivity analysis was the composite of unplanned ICU transfer or code.

Model building

We built the model on the training set, which included 1 time point per hospitalization. This time point included some information about previous time points. We selected a gradient boosting machine33 as the machine learning method for its ability to deal with missing values, interactions between predictors, and nonlinear relationships.33,34 A series of gradient boosting machine models were built on the training dataset using different shrinkage, depth, and minimum observations per node. The final model, which we call the MC-EWS, was selected based on its performance in the cross-validated training dataset.
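A minimal sketch of that tuning loop with the R gbm package is shown below; the hyperparameter grid, the deteriorates_24h outcome column, and the excluded identifier columns are illustrative assumptions rather than the published specification, and `training` stands for the feature-engineered training table sketched earlier:

```r
library(gbm)
set.seed(2020)

# Candidate hyperparameter grid (values are illustrative)
grid <- expand.grid(shrinkage = c(0.01, 0.05, 0.1),
                    interaction.depth = c(2, 4, 6),
                    n.minobsinnode = c(10, 30))

fits <- lapply(seq_len(nrow(grid)), function(i) {
  gbm(deteriorates_24h ~ . - encounter_id - time,
      data = training,
      distribution = "bernoulli",          # binary deterioration outcome (0/1)
      n.trees = 2000,
      shrinkage = grid$shrinkage[i],
      interaction.depth = grid$interaction.depth[i],
      n.minobsinnode = grid$n.minobsinnode[i],
      cv.folds = 5)
})

# Pick the candidate with the lowest cross-validated error
cv_error <- sapply(fits, function(f) min(f$cv.error))
mc_ews <- fits[[which.min(cv_error)]]
best_iter <- gbm.perf(mc_ews, method = "cv", plot.it = FALSE)

# Relative influence of predictors (the kind of ranking shown in Figure 4)
rel_inf <- summary(mc_ews, n.trees = best_iter, plotit = FALSE)
```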

Model accuracy analysis

The MC-EWS was evaluated in both the internal and external validation datasets. These datasets have multiple time points per hospitalization. First, a score was calculated at each patient time point. Then a score cutoff was defined as a proposed threshold for alerting clinicians to possible deterioration. Every time a patient's score exceeded that cutoff, it was considered an alert. To prevent double-counting of repeated triggers, any alerts in the 24 hours after an initial trigger were not considered for the purposes of calculating the model's accuracy. An alert was considered a true positive if a primary outcome occurred during the following 24-hour window, and a false positive if no outcome occurred in that window. False negatives were outcomes that had not been detected by an alert under this analysis. True negatives were 24-hour windows in which there were neither alerts nor outcomes. That analysis was repeated for different alert thresholds for both the MC-EWS and, as a comparator, different cutoff values of the National Early Warning Score (NEWS).35 Then, sensitivity, specificity, positive predictive value (PPV), and alerts per day per 10 patients were calculated for each score cutoff. Sensitivity and specificity were used to build the receiver-operating characteristic (ROC) curve and to calculate the area under it (AUROC).
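The sketch below illustrates this alert-level evaluation at a single cutoff, assuming a scored validation table (encounter_id, time, score), a table of outcome times, and an externally supplied count of patient-days; all names are illustrative:

```r
library(dplyr)

# Keep an alert only if it is not within `window` seconds of a prior kept alert
suppress_repeats <- function(times, window) {
  keep <- logical(length(times))
  last_kept <- -Inf
  for (i in seq_along(times)) {
    if (as.numeric(times[i]) - last_kept > window) {
      keep[i] <- TRUE
      last_kept <- as.numeric(times[i])
    }
  }
  keep
}

evaluate_cutoff <- function(scored, outcomes, cutoff, patient_days,
                            window_hours = 24) {
  window <- window_hours * 3600

  alerts <- scored %>%
    filter(score > cutoff) %>%
    group_by(encounter_id) %>%
    arrange(time, .by_group = TRUE) %>%
    filter(suppress_repeats(time, window)) %>%
    ungroup() %>%
    rowwise() %>%
    # True positive: a primary outcome occurs within 24 h after the alert
    mutate(tp = any(outcomes$encounter_id == encounter_id &
                    outcomes$time > time & outcomes$time <= time + window)) %>%
    ungroup()

  detected <- outcomes %>%
    rowwise() %>%
    # False negative: an outcome with no alert in the preceding 24 h
    mutate(hit = any(alerts$encounter_id == encounter_id &
                     alerts$time < time & alerts$time >= time - window)) %>%
    ungroup()

  tp <- sum(alerts$tp); fp <- sum(!alerts$tp); fn <- sum(!detected$hit)
  c(sensitivity = tp / (tp + fn),
    ppv = tp / (tp + fp),
    alerts_per_day_per_10_patients = 10 * nrow(alerts) / patient_days)
}
```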

Because RRT calls are dependent on RRT calling criteria at each of the 2 sites as well as on individual provider or nurses’ judgement, we performed a sensitivity analysis to test the model’s robustness. This was done by removing RRT calls and repeating the accuracy analysis for the secondary outcome of only ICU transfers and resuscitation calls. This was then compared with the analysis for the primary outcome.

R version 3.1.3 “Smooth Sidewalk” (R Foundation for Statistical Computing, Vienna, Austria) was used for statistical analysis and the R package “gbm” was used for model building.

RESULTS

Datasets

Figure 1 shows the relationships between each of the datasets used in this study and presents their sample sizes. Our initial dataset was first divided by geographical location (ie, Rochester vs Arizona); we then used a 67%/33% split to further divide the internal data into 2 sets for training and validation. This resulted in 3 distinct datasets for our analysis: a training dataset (the slimmed-down subset of 24 500 hospitalizations, with only 1 time point per hospitalization), a dataset for internal validation of the model (25 784 hospitalizations, all time points included), and a dataset for external validation of the model (53 956 hospitalizations, all time points included).

Descriptive statistics

Table 1 presents the patient and hospitalization characteristics in our datasets for training (and parent pretraining dataset), internal validation, and external validation. The results show that population characteristics were comparable across all datasets (with the exception of the training dataset, purposefully enriched to a 1:10 case-to-control ratio). Event rates for all 3 portions of the primary outcome were similar in the pretraining and internal validation datasets but higher in the training and external validation datasets.

Table 1. Patient characteristics across datasets

| Variable | Pretraining (n = 51 826) | Training (n = 24 500) | Internal validation (n = 25 784) | External validation (n = 53 956) |
| --- | --- | --- | --- | --- |
| Total patients^a | 39 425 | 21 165 | 19 580 | 32 761 |
|  Female patients | 20 067 (50.9) | 10 453 (49.4) | 10 058 (51.4) | 15 198 (46.4) |
| Mean age, y | 58.5 | 60.3 | 58.4 | 63.8 |
| Total episodes^b | 76 503 | 37 925 | 38 043 | 57 156 |
| Total time points | 11 697 727 | 34 923 | 5 955 389 | 5 037 866 |
| LOS of hospitalization, d | | | | |
|  25th percentile | 2 | 3 | 2 | 2 |
|  50th percentile | 3 | 5 | 3 | 3 |
|  75th percentile | 6 | 9 | 6 | 5 |
| Events | | | | |
|  Total events | 3185 | 3185 | 1547 | 6909 |
|  RRT calls | 1519 (47.7) | 1519 (47.7) | 749 (48.4) | 4353 (63) |
|  Code 45 | 190 (6.0) | 190 (6.0) | 98 (6.3) | 207 (3) |
|  Unscheduled transfer to ICU | 1476 (46.3) | 1476 (46.3) | 700 (45.2) | 2349 (34) |
| Event rate per 100 episodes | | | | |
|  Total event rate | 4.2 | 11.4 | 4.1 | 12.1 |
|  RRT call rate | 2.0 | 5.4 | 2.0 | 7.6 |
|  Code 45 rate | 0.2 | 0.7 | 0.3 | 0.4 |
|  Unscheduled transfers to ICU rate | 1.9 | 5.3 | 1.8 | 4.1 |

Values are n (%).

ICU: intensive care unit; LOS: length of stay; MC-EWS: Mayo Clinic Early Warning Score; NEWS: National Early Warning Score; RRT: rapid response team.

^a Total unique patients. Some patients had more than 1 hospitalization in our dataset.

^b An episode is an uninterrupted period of time when a patient is in a general care ward.

Score performance in validation datasets

Figure 2 shows a comparison of the ROC curves (top section) and alert rate vs sensitivity curves (bottom section) for both the MC-EWS and NEWS in the internal validation dataset and external validation dataset, using our primary composite outcome (RRT, ICU transfer, or resuscitation call). The ROC curves show that the MC-EWS outperforms the NEWS in both datasets. The sensitivity vs alert rate curves also show that the MC-EWS generates fewer alerts than the NEWS at a fixed sensitivity level.

Figure 2. Mayo Clinic Early Warning Score (MC-EWS) and National Early Warning Score (NEWS) in the validation sets for primary outcome. AUROC: area under the receiver-operating characteristic curve; ROC: receiver-operating characteristic.

Table 2 compares the performance of the 2 scores (MC-EWS and NEWS) in the external validation dataset. Each set of measurements is reported at equal sensitivity between the MC-EWS and NEWS to allow for a direct comparison of the other measures. The MC-EWS demonstrates superior performance across all metrics at every level of sensitivity. For example, Table 2 shows that at a sensitivity of 73%, the MC-EWS has a PPV of 0.12 and generates 0.70 alerts per day per 10 patients, a 45% lower alert rate than the NEWS, which at the same sensitivity has a PPV of 0.07 and generates 1.27 alerts per day per 10 patients.

Table 2. Performance of MC-EWS and NEWS in external validation dataset

| Sensitivity | MC-EWS Specificity | MC-EWS PPV | MC-EWS Alerts per Day per 10 Patients | NEWS^a Specificity | NEWS^a PPV | NEWS^a Alerts per Day per 10 Patients | Difference in Alert Rate (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.73 | 0.94 | 0.12 | 0.70 | 0.88 | 0.07 | 1.27 | −45 |
| 0.81 | 0.90 | 0.08 | 1.09 | 0.84 | 0.06 | 1.71 | −36 |
| 0.89 | 0.83 | 0.06 | 1.78 | 0.76 | 0.04 | 2.47 | −28 |
| 0.94 | 0.73 | 0.04 | 2.81 | 0.70 | 0.03 | 3.11 | −10 |

MC-EWS: Mayo Clinic Early Warning Score; NEWS: National Early Warning Score; PPV: positive predictive value.

^a Linear interpolation was used to calculate the PPV, specificity, and alert rate at each sensitivity value for NEWS.

Sensitivity analysis

Figure 3 shows the ROC curves (left) and alert rate vs sensitivity curves (right) for both the MC-EWS and NEWS. The curves show little change after removing RRT calls as an outcome, with MC-EWS again demonstrating consistent improvement over the NEWS. There is only a minimal loss in the performance of the MC-EWS in this sensitivity analysis (AUROC of 0.932 in sensitivity analysis vs 0.937 with primary outcome).

Figure 3. Mayo Clinic Early Warning Score (MC-EWS) and National Early Warning Score (NEWS) in the external validation set for secondary outcome. AUROC: area under the receiver-operating characteristic curve; ROC: receiver-operating characteristic.

Relative influence of model predictors

Figure 4 presents the top 20 variables in the final model, selected by the machine learning algorithm from the more than 120 variables that were included. The top 4 variables (which account for 40% of the relative importance) include 2 secondary variables (Kirkland probability, and respiratory index in the presence of oxygen supplementation)36 and 2 primary variables (systolic blood pressure and heart rate).

Figure 4. Relative influence of top 20 variables. BUN: blood urea nitrogen; DBP: diastolic blood pressure; HCO3: bicarbonate; INR: international normalized ratio; pCO2: partial pressure of carbon dioxide; RASS: Richmond Agitation Sedation Scale; RR: respiratory rate; SBP: systolic blood pressure; SpO2: oxygen saturation.

DISCUSSION

We have developed the MC-EWS, a machine learning model that accurately predicts acute deterioration in the general care wards, represented by the primary composite outcome of RRT call, resuscitation call, or unplanned transfer to the ICU. Over 120 variables were used as predictors, including demographics, vital signs (including trend over time), laboratory results, nursing assessments, and clinically relevant interactions. This model demonstrated excellent discrimination (AUROC = 0.937 in the external validation set) and could be implemented to alert providers to act early in the deterioration process, with the goal of reducing preventable mortality and morbidity. Compared with the NEWS, our model achieved a 45% reduction in alert rate at a comparable sensitivity of 73%.

There have been other attempts to incorporate machine learning approaches to the prediction of in-hospital mortality or acute deterioration in the general care wards, but they have achieved lower accuracy overall. Arguably, this is because no other approach has incorporated all of the elements described: a high number of predictor variables including nursing assessments, and feature engineering. The models are not publicly available, so we could not calculate and directly compare them within our dataset, but we provide a brief comparison subsequently.

The APPROVE score predicts acute respiratory failure and mortality.24 It did not, however, include feature engineering, and its discrimination (AUROC = 0.87) was lower than that of the MC-EWS. Giannini et al25 developed a random forest classifier to predict severe sepsis and septic shock using 587 variables, including time-series data. However, to keep alerts at 10 per day for implementation, sensitivity was only 26%, meaning that most deteriorating patients did not trigger an alert. The MEWS++ random forest model, developed by Kia et al37 to predict death or escalation of care within 6 hours, also used an extensive range of predictors, including the time series of frequently measured variables. It compares favorably with the MC-EWS, with a sensitivity of 78.9% and PPV of 11.5% at the chosen cutoff, but it would likely benefit from capturing relevant physiologic interactions between variables. Other attempts have used deep learning, which has the advantage of automatically learning features and has been applied to predicting in-hospital cardiac arrest.38 However, for that model, the alarm rate of 0.3 per patient per hour at a sensitivity of 75% was still too high. The most likely explanation for why that model was not able to achieve higher accuracy is that it used only 4 vital signs as predictors, missing the lift obtained from nursing assessments and laboratory results.

We chose to model our outcome as a composite of clinical responses (ICU transfer, RRT call, or code), rather than mortality, to capture instances in which escalation of care was considered necessary, rather than including events of expected death in which de-escalation was the appropriate clinical response. Because of the subjectivity of RRT calling criteria, a sensitivity analysis was done to evaluate the performance of MC-EWS with RRT calls removed as an outcome. Results showed little difference when compared with the accuracy analysis for the primary outcome, corroborating that RRT calls are a reliable surrogate for physiological deterioration. The slight reduction in performance for both MC-EWS and NEWS was likely due to the lower number of total outcomes after eliminating RRT calls as an outcome. We compared the performance in the internal and external validation sets (see Supplementary Appendix 4). We noted a slight increase in predictive performance in our external validation dataset compared with the internal validation set. This could be due to a higher frequency of vital sign recordings, or more accurate data entry.

Examining the most influential predictors in the MC-EWS yields some interesting insights. The top predictor, Kirkland score,36 was a composite derived from the shock index, Braden score,39 respiratory rate, and oxygen saturation. The inclusion of the Braden score, which consists of several nursing assessments such as nutrition, mobility, and sensory perception, supports our emphasis on including nursing assessments in the model. Looking at other top predictors, we found that 7 of the top 10 were either secondary variables derived from others or time-dependent variables. In an application in which every improvement in discrimination is vital, this demonstrates the value feature engineering can provide.

Strengths of our study include the selection of a large number of variables for creation of the model, particularly nursing assessments, the range of values over time, and clinically relevant interactions. The low-bias gradient boosting machine learning method used was able to capture nonlinear relationships and incorporate "missing" as a possible value to be evaluated. Missing variable values provide important clinical context regarding provider medical decision making (eg, whether a laboratory study was ordered) or the realities of clinical operations at that moment (eg, no time for the nurse to chart a vital sign). Because a missing value does not necessarily reduce our expectation of risk the same way a normal value would, it is important to treat a missing variable value as potentially informative.

Our study is not without its limitations. The MC-EWS was developed and validated in 3 tertiary and quaternary care hospitals operated by Mayo Clinic and may perform differently if evaluated in smaller community hospitals or those with different rapid response systems. Future research could be done to improve upon the performance of the MC-EWS, for example, by using continuous monitoring devices or wearable data. Deep learning models have been developed using several vital signs as predictors of deterioration with promising results.38,40,41 Future studies could focus on applying these methodologies to multi-institution datasets. In addition, the development data were older (2010-2011). However, with the possible exception of COVID-19 (coronavirus disease 2019), we do not expect that changes over time in practice patterns or patient characteristics would invalidate our findings.

Additionally, implementation of MC-EWS should be piloted to measure whether this machine learning automatic alert system can trigger consistent reactions (efferent limb of the rapid response system) resulting in earlier diagnostic and therapeutic interventions. Other machine learning scores have been implemented as afferent limbs of rapid response systems. While they have resulted in improved standardization of care,42 they are often not perceived as helpful by providers.43 It will be important to explore how to design a rapid response system that integrates cleanly into existing workflows, is trusted by caregivers, and assesses counterbalance measures to ensure that it does not adversely affect the volumes of RRT calls and ICU transfers.

CONCLUSION

We developed the Mayo Clinic Early Warning Score, a machine learning model that accurately predicts acute deterioration in the general care wards and is ready for an implementation study as the afferent limb of a rapid response system.

FUNDING

This study was funded by Mayo Clinic’s Department of Medicine internal research funds.

Supplementary Material

ocaa347_Supplementary_Data

AUTHOR CONTRIBUTIONS

SR-B and JMH conceptualized the study. SR-B, MGJ, BWM, TT, and JMH designed the methodology. SR-B, MGJ, JH, BWM, and JMH performed the investigation. SR-B, MGJ, JH, TT, and BWM performed the formal data analysis. SR-B and DW drafted the manuscript. All authors reviewed and approved the final version of the manuscript.

CONFLICT OF INTEREST STATEMENT

The Mayo Clinic Early Warning Score was licensed by Mayo Clinic to a third-party company, Jvion Inc. SR-B, MGJ, JH, BWM, and JMH receive royalties as part of the licensing agreement.

DATA AVAILABILITY

Data can be made available upon request to the corresponding author. The request will follow institutional committee approvals as per institutional policy.

REFERENCES

1. Jones D, Mitchell I, Hillman K, Story D. Defining clinical deterioration. Resuscitation 2013; 84 (8): 1029–34.
2. Liu V, Escobar GJ, Greene JD, et al. Hospital deaths in patients with sepsis from 2 independent cohorts. JAMA 2014; 312 (1): 90–2.
3. Stefan MS, Shieh MS, Pekow PS, et al. Epidemiology and outcomes of acute respiratory failure in the United States, 2001 to 2009: a national survey. J Hosp Med 2013; 8 (2): 76–82.
4. Peate I, Dutton H. Acute Nursing Care: Recognising and Responding to Medical Emergencies. 2nd ed. New York, NY: Routledge; 2020.
5. Buist MD, Jarmolowski E, Burton PR, Bernard SA, Waxman BP, Anderson J. Recognising clinical instability in hospital patients before cardiac arrest or unplanned admission to intensive care. A pilot study in a tertiary-care hospital. Med J Aust 1999; 171 (1): 22–5.
6. Schein RM, Hazday N, Pena M, Ruben BH, Sprung CL. Clinical antecedents to in-hospital cardiopulmonary arrest. Chest 1990; 98 (6): 1388–92.
7. Hillman KM, Bristow PJ, Chey T, et al. Antecedents to hospital deaths. Intern Med J 2001; 31 (6): 343–8.
8. Kause J, Smith G, Prytherch D, et al. A comparison of antecedents to cardiac arrests, deaths and emergency intensive care admissions in Australia and New Zealand, and the United Kingdom–the ACADEMIA study. Resuscitation 2004; 62 (3): 275–82.
9. Hogan H, Healey F, Neale G, Thomson R, Vincent C, Black N. Preventable deaths due to problems in care in English acute hospitals: a retrospective case record review study. BMJ Qual Saf 2012; 21 (9): 737–45.
10. Young MP, Gooder VJ, McBride K, James B, Fisher ES. Inpatient transfers to the intensive care unit: delays are associated with increased mortality and morbidity. J Gen Intern Med 2003; 18 (2): 77–83.
11. Chalfin DB, Trzeciak S, Likourezos A, Baumann BM, Dellinger RP. Impact of delayed transfer of critically ill patients from the emergency department to the intensive care unit. Crit Care Med 2007; 35 (6): 1477–83.
12. Kumar A, Roberts D, Wood KE, et al. Duration of hypotension before initiation of effective antimicrobial therapy is the critical determinant of survival in human septic shock. Crit Care Med 2006; 34 (6): 1589–96.
13. Hillman K, Parr M, Flabouris A, Bishop G, Stewart A. Redefining in-hospital resuscitation: the concept of the medical emergency team. Resuscitation 2001; 48 (2): 105–10.
14. Devita MA, Bellomo R, Hillman K, et al. Findings of the first consensus conference on medical emergency teams. Crit Care Med 2006; 34 (9): 2463–78.
15. DeVita MA, Smith GB, Adam SK, et al. Identifying the hospitalised patient in crisis: a consensus conference on the afferent limb of rapid response systems. Resuscitation 2010; 81 (4): 375–82.
16. Smith GB, Prytherch DR, Schmidt PE, Featherstone PI. Review and performance evaluation of aggregate weighted 'track and trigger' systems. Resuscitation 2008; 77 (2): 170–9.
17. Smith GB, Prytherch DR, Schmidt PE, Featherstone PI, Higgins B. A review, and performance evaluation, of single-parameter track and trigger systems. Resuscitation 2008; 79 (1): 11–21.
18. Hamilton F, Arnold D, Baird A, Albur M, Whiting P. Early warning scores do not accurately predict mortality in sepsis: a meta-analysis and systematic review of the literature. J Infect 2018; 76 (3): 241–8.
19. de Grooth HJ, Girbes AR, Loer SA. Early warning scores in the perioperative period: applications and clinical operating characteristics. Curr Opin Anaesthesiol 2018; 31 (6): 732–8.
20. Romero-Brufau S, Huddleston JM, Escobar GJ, Liebow M. Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Crit Care 2015; 19 (1): 285.
21. Finlay GD, Rothman MJ, Smith RA. Measuring the modified early warning score and the Rothman index: advantages of utilizing the electronic medical record in an early warning system. J Hosp Med 2014; 9 (2): 116–9.
22. Rothman MJ, Rothman SI, Beals J. Development and validation of a continuous measure of patient condition using the electronic medical record. J Biomed Inform 2013; 46 (5): 837–48.
23. Churpek MM, Yuen TC, Winslow C, et al. Multicenter development and validation of a risk stratification tool for ward patients. Am J Respir Crit Care Med 2014; 190 (6): 649–55.
24. Dziadzko MA, Novotny PJ, Sloan J, et al. Multicenter derivation and validation of an early warning score for acute respiratory failure or death in the hospital. Crit Care 2018; 22 (1): 286.
25. Giannini HM, Ginestra JC, Chivers C, et al. A machine learning algorithm to predict severe sepsis and septic shock: development, implementation, and impact on clinical practice. Crit Care Med 2019; 47 (11): 1485–92.
26. Barton CM, Lynn-Palevsky A, Fletcher G, et al. Predicting patient mortality: using machine learning to identify at-risk patients and improve outcomes. Am J Respir Crit Care Med 2020; 201: A4299.
27. Ye C, Wang O, Liu M, et al. A real-time early warning system for monitoring inpatient mortality risk: prospective study using electronic medical record data. J Med Internet Res 2019; 21 (7): e13719.
28. Rothman MJ, Solinger AB, Rothman SI, Finlay GD. Clinical implications and validity of nursing assessments: a longitudinal measure of patient condition from analysis of the electronic medical record. BMJ Open 2012; 2 (4): e000646.
29. Romero-Brufau S, Gaines K, Nicolas CT, Johnson MG, Hickman J, Huddleston JM. The fifth vital sign? Nurse worry predicts inpatient deterioration within 24 hours. JAMIA Open 2019; 2 (4): 465–70.
30. Wellner B, Grand J, Canzone E, et al. Predicting unplanned transfers to the intensive care unit: a machine learning approach leveraging diverse clinical elements. JMIR Med Inform 2017; 5 (4): e45.
31. Delahanty RJ, Alvarez J, Flynn LM, Sherwin RL, Jones SS. Development and evaluation of a machine learning model for the early identification of patients at risk for sepsis. Ann Emerg Med 2019; 73 (4): 334–44.
32. Rubin J, Potes C, Xu-Wilson M, et al. An ensemble boosting model for predicting transfer to the pediatric intensive care unit. Int J Med Inform 2018; 112: 15–20.
33. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist 2001; 29 (5): 1189–232.
34. Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal 2002; 38 (4): 367–78.
35. Smith GB, Prytherch DR, Meredith P, Schmidt PE, Featherstone PI. The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death. Resuscitation 2013; 84 (4): 465–70.
36. Kirkland LL, Malinchoc M, O'Byrne M, et al. A clinical deterioration prediction tool for internal medicine patients. Am J Med Qual 2013; 28 (2): 135–42.
37. Kia A, Timsina P, Joshi HN, et al. MEWS++: enhancing the prediction of clinical deterioration in admitted patients through a machine learning model. J Clin Med 2020; 9 (2): 343.
38. Kwon JM, Lee Y, Lee Y, Lee S, Park J. An algorithm based on deep learning for predicting in-hospital cardiac arrest. J Am Heart Assoc 2018; 7 (13): e008678.
39. Bergstrom N, Braden BJ, Laguzza A, Holman V. The Braden Scale for predicting pressure sore risk. Nurs Res 1987; 36 (4): 205–10.
40. Kim SY, Kim S, Cho J, et al. A deep learning model for real-time mortality prediction in critically ill children. Crit Care 2019; 23 (1): 279.
41. Kam HJ, Kim HY. Learning representations for the early detection of sepsis with deep neural networks. Comput Biol Med 2017; 89: 248–55.
42. Paulson SS, Dummett BA, Green J, Scruth E, Reyes V, Escobar GJ. What do we do after the pilot is done? Implementation of a hospital early warning system at scale. Jt Comm J Qual Patient Saf 2020; 46 (4): 207–16.
43. Ginestra JC, Giannini HM, Schweickert WD, et al. Clinician perception of a machine learning-based early warning system designed to predict severe sepsis and septic shock. Crit Care Med 2019; 47 (11): 1477–84.
