Abstract
Having an interpretable dynamic length-of-stay (LOS) model can help hospital administrators and clinicians make better decisions and improve the quality of care. The widespread implementation of electronic medical record (EMR) systems has enabled hospitals to collect massive amounts of health data. However, how to integrate this deluge of data into healthcare operations remains unclear. We propose a framework grounded in established clinical knowledge to model patients' lengths-of-stay. In particular, we impose expert knowledge when grouping raw clinical data into medically meaningful variables, which summarize patients' health trajectories. We use dynamic predictive models to output patients' remaining lengths-of-stay (RLOS), future discharges, and census probability distributions based on their health trajectories up to the current day of their stay. Evaluated on large-scale EMR data, the dynamic model significantly improves predictive power over previously published models while remaining medically interpretable.
Keywords: healthcare, hospitals, statistics, nonparametric, computational methods
1. Introduction
The study of length-of-stay (LOS) has been a focus of both the operational and the clinical literature. Clinically, there is strong evidence that longer LOS is associated with hospital-acquired conditions (HAC), adverse drug events, and readmission (Hoogervorst-Schilp et al. 2015, Basques et al. 2015, Ansari et al. 2018, Rinne et al. 2017, Kaboli et al. 2012). For example, each additional day in hospital increases a patient's risk of infection by 17.6% and the risk of an adverse drug reaction by 5.5% (Hauck and Zhao 2011). Conversely, a 27% reduction in LOS was associated with a 16% decrease in 30-day readmissions and an 18% decrease in 90-day mortality (Kaboli et al. 2012).
Financially, the transition from per-diem payment to a diagnosis-related group (DRG)-based payment system has put pressure on hospitals to use resources efficiently and manage patients' lengths-of-stay. Because a DRG-based system pays a fixed amount per admission, fewer hospital days mean lower variable costs and more capacity for new admissions. As a result, in the first year under Medicare's Inpatient Prospective Payment System (which is DRG-based), the average length-of-stay fell 9% (Rosenberg and Browne 2001).
In this study, we leverage more than seven years (August 2008 – October 2015) of electronic medical record (EMR) data from the 71-bed level III/IV neonatal intensive care unit (NICU) of a children's hospital in Chicago, Illinois. We focus only on the 4624 encounters that ended in a safe discharge home, which represent the majority of NICU patients (89.1%). The mortality rate in our data is less than 5%. Our motivation for studying NICU patients with a focus on lengths-of-stay is threefold. First, like other critical care units, the NICU is one of the most data-intensive units in a hospital (Anthony Celi et al. 2013, Sanchez-Pinto et al. 2018, Ghassemi et al. 2015a). All neonates (newborns) under care are constantly monitored by machines and frequently checked by medical professionals. Second, NICU patients' conditions can be extremely severe, often systemic, and possibly life-threatening, so models built for the NICU must account for a broad range of conditions. Third, as a consequence of this severity, lengths-of-stay in the NICU are significantly longer and much more variable than those of adult or pediatric ICU patients. The median LOS for an adult or pediatric ICU patient is around two days (Verburg et al. 2017, Pollack et al. 2018), whereas the median LOS for patients in this study is 9 days, with a mean of 23.4 days and a standard deviation of 34.7 days. This is partly because adult or pediatric ICU patients are usually transferred to the floor, whereas recovered NICU patients, like general inpatients, are most frequently discharged home.
In the medical literature, lengths-of-stay, and time-to-event data in general, can be modeled using survival analysis techniques. Cox proportional hazards (CoxPH) models are the most widely used to analyze how patients' health conditions affect time durations (Collett 2015). An alternative is the accelerated failure time (AFT) model, which assumes a linear relationship between the health variables and log(LOS). Together with an additional assumption on the distribution of LOS, this allows direct regression of LOS on patients' health variables. Even though LOS is estimated directly in AFT models, these models are usually applied in retrospective mortality studies (Verma et al. 1999, Bender et al. 2013, Lee et al. 2016, Richardson et al. 2001, Harsha and Archana 2015). Their goal is often to identify the underlying medical relationship between LOS and patient severity, as indicated by a handful of variables, rather than to predict prospectively. This leaves an unmet need when neonatologists want to compare their prospective clinical discharge predictions against such models. In addition to being retrospective, these models are static: they generate one estimate per patient from summary statistics of health variables that often change over time.
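To make the AFT formulation concrete, here is a minimal Python sketch of our own, not a reproduction of the published baselines. It assumes a log-normal error, $\log(\mathrm{LOS}_i) = \beta_0 + \beta^\top x_i + \sigma\varepsilon_i$ with $\varepsilon_i \sim N(0,1)$, and relies on the fact that when every stay ends in an observed discharge, as in our cohort, maximum likelihood for this model reduces to ordinary least squares on log(LOS). The names `fit_lognormal_aft` and `predicted_median_los` are hypothetical.

```python
import numpy as np


def fit_lognormal_aft(X, los):
    """Fit log(LOS) = b0 + X @ b + sigma * eps, with eps ~ N(0, 1).

    Assumes no censored stays, so the maximum-likelihood fit of this
    log-normal AFT model equals ordinary least squares on log(LOS).
    LOS must be positive (e.g. counted in days starting from 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.log(np.asarray(los, dtype=float))
    design = np.column_stack([np.ones(len(y)), X])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma = float(np.sqrt(np.mean(resid ** 2)))  # ML estimate of the scale
    return beta, sigma


def predicted_median_los(beta, X):
    """Median LOS implied by the fitted model: exp(b0 + X @ b)."""
    X = np.asarray(X, dtype=float)
    return np.exp(beta[0] + X @ beta[1:])
```

Published AFT models often use other error distributions (e.g., Weibull) and must handle censored stays; in that case a full survival likelihood is required rather than least squares.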
1.1. Contribution
Therefore, for neonatologists and hospital administrators, an interpretable prognostic model that accurately forecasts a NICU patient's remaining stay would allow them to better benchmark clinical estimates, plan treatment carefully, and optimize medical care. In this paper, we make the following contributions:
Our study leverages expert knowledge throughout the iterative development of an accurate and interpretable machine learning algorithm for healthcare operations applications. Unlike recent studies (Rajkomar et al. 2018) that applied machine learning algorithms directly to raw EMR data, every health variable used in our model was vetted by neonatologists during clinical focus-group sessions. Codifying clinical knowledge when defining and refining the health variables makes the results more robust and accurate, and involving clinicians throughout the iterative model development helps ensure that the model outputs are interpretable. This study demonstrates the value of maintaining robust clinical input throughout the development of machine learning models.
Instead of building a retrospective static model, we construct a dynamic model that takes in comprehensive, high-dimensional, real-time health data and predicts a patient's remaining length-of-stay (RLOS). Using a standard machine learning algorithm, random forests, an ensemble of decision trees, we achieve highly accurate remaining length-of-stay predictions. Our model maintains validation R² > 0.8 until more than 55% of patients have been discharged, and it outperforms the current state-of-the-art AFT regression models until 90% of patients have been discharged. For discharge predictions over rolling time horizons (next week, next 2 weeks, next month), our model achieves validation AUROCs > 0.88 over the first 93% of the time until discharge.
The high accuracy of our predictions also results from modeling the survival function directly for each LOS and each patient. We also discuss how to construct accurate overall census predictions: because our survival functions give the probability that each existing patient remains in the NICU over time, they can serve as a key building block for the probability distribution of the NICU census.
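As an illustration of that building block, the sketch below turns per-patient probabilities of still being in the NICU at a target date (for instance $S_l(h \mid Z_l)$ for a horizon of $h$ days) into a census distribution. It assumes, for simplicity, that patients' discharges are independent and ignores new admissions; under that assumption the census of existing patients is a sum of independent Bernoulli variables (a Poisson-binomial distribution), computed here by repeated convolution. `census_distribution` is a hypothetical helper, not part of our pipeline.

```python
import numpy as np


def census_distribution(stay_probs):
    """Distribution of how many current patients are still in the unit.

    stay_probs[i] is patient i's probability of still being in the NICU
    at the target date. Assumes independent discharges (Poisson-binomial).
    Returns dist with dist[k] = P(census == k).
    """
    dist = np.array([1.0])  # with no patients, the census is 0 for sure
    for p in stay_probs:
        # Each patient contributes 0 (discharged, prob 1 - p) or 1 (stays, prob p).
        dist = np.convolve(dist, [1.0 - p, p])
    return dist
```

The expected census equals the sum of the individual stay probabilities; modeling dependence between patients or future admissions would require a richer model.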
1.2. Literature review
Although simple linear or log-linear models have difficulty capturing complex medical interactions, most clinical studies deploy regression models with a handful of key variables or scores. Previously reported Cox proportional hazards and accelerated failure time models fall into this category (Verburg et al. 2017, Verma et al. 1999, Richardson et al. 2001, Bender et al. 2013, Lee et al. 2016, Chaou et al. 2017). In these models, predictions are typically based on the last available measurements or a summary of previous measurements, even though a patient's health condition is often measured repeatedly over time.
While most of the literature focuses on explanatory and retrospective modeling, there have been a few predictive models for ICU lengths-of-stay. A few studies have responded to the ever-growing data with novel techniques to forecast patient outcomes at various time points. Ghassemi et al. 2015b applied multi-task Gaussian process models to multivariate time-series data to predict acuity and mortality in hospital and after discharge. Rajkomar et al. 2018 modeled a larger, multi-center time-series dataset using recurrent neural networks to predict mortality within 24 h, 30-day readmissions, and week-long length-of-stay. Although they achieved high accuracy at several fixed time points, these are static models: they make fixed-time predictions within fixed time windows and ignore health variables that change dynamically.
Instead of static outcomes at fixed time points, Aczon et al. 2017 applied recurrent neural networks to time-series data and generated temporally dynamic mortality-risk predictions for pediatric ICU patients. This work also illustrates another major challenge of predictive modeling in healthcare applications: the difficulty of interpreting a "black-box" model, such as a neural network, is likely to discourage medical professionals from integrating machine learning algorithms into their clinical decision-making (Sanchez-Pinto et al. 2018, Verghese et al. 2018).
The idea of using multi-state models for time-to-event data in longitudinal studies has a long history (Meira-Machado et al. 2009). These models include time-varying covariates and have the potential to obtain dynamic predictions. However, the complexity of a multi-state model depends on the size of the possible state space and the number of transitions. Due to the high numbers of transitions and the high-dimensional health states in the NICU setting, a naive application of a multi-state model is simply not scalable.
1.3. Challenges
We overcome the high-dimensionality of patients’ health states by modeling length-of-stay transitions instead. Our prediction target is the survival function after each LOS. From the survival functions we can generate the real-time remaining lengths-of-stay of a patient given their current health variables and the current length-of-stay.
Not only are the clinical data high-dimensional, but interactions among them may also be highly indicative of a patient's health status. The following example demonstrates the benefit of a tree-based algorithm that captures these interactions automatically. In the NICU, neonatologists intubate neonates who suffer from respiratory distress. Intubation is an invasive therapy, and it interferes with regular oral feeding, which is crucial for growth and recovery. A tree-based method can capture this clinical interaction between breathing and feeding, as well as changes in a patient's respiratory severity. Using three health variables and their interactions, the sampled tree in Figure 1 separates patients into four groups (from left to right): feeding problem, no breathing or feeding problem, breathing recovered, and ongoing breathing problem. Figure 1 also shows the survival functions (probability of staying in the NICU) for these four groups, based on their health trajectories over the first 10 days of stay. The tree predicts that neonates without any feeding or breathing problem will be discharged earliest, followed by those whose breathing problem resolved and then those with a feeding problem, while those whose breathing problem continued are likely to stay the longest.
Figure 1.
Tree-based Survival Functions Example. The survival curves show the probabilities of a patient staying in the NICU over time. intubated: a patient is or was intubated due to breathing problems during the 10th day; LV_max: maximum level of respiratory support for the 10th day, PO_perc: percentage of nutrition by regular oral feeding during the 10th day. Extubated: patient’s respiratory condition improved during the 10th day and the tube was removed. The bottom survival curves are the estimates for patients belonging to the corresponding groups.
The remainder of the paper is structured as follows. §2 and §3 introduce the model and the estimation algorithm proposed in this study. §4 details how clinicians participated in our model development. §5 describes the data used to evaluate our method empirically. §6 analyzes the model's performance, and §7 concludes with a discussion.
2. Model
In this section, we overcome the high dimensionality of patients' health states by modeling length-of-stay transitions instead. Our prediction target is the survival function given a patient's current health variables and current length-of-stay. We first provide a basic formulation, then transform the transition function matrix into a survival function matrix, and finally provide implementation details.
2.1. Markov model
A Markov process involves transitions among a finite set of health states $S$. Each potential transition occurs at a rate that depends on covariates, including the prior history of an individual's health states. For each individual $i$, the data consist of health vectors and the times of the transitions from each state $s_k$ to $s_{k+1}$, for $k = 0, \dots, n_i - 1$, where $n_i$ is the number of state transitions and each transition occurs with a covariate-dependent transition probability. In an inference task, the objective is to estimate the effects of covariates on the transition probabilities. Given a new observation starting at an initial state, the model also yields predicted probabilities tailored to the individual.
2.2. The New Model
As an alternative to the previous model, we propose a model with a set of LOS states $L = \{0, 1, \dots, n\}$, in which the transition function $p_{ll'} = \Pr(l' \mid l, Z_l)$ is the probability of the future state (LOS $l'$) given the current LOS $l$ and the currently observed health vector $Z_l$. Because LOS increases monotonically, the transition probabilities from $l$ to any $l' < l$ are 0. For our model of NICU stays, $n$ represents the LOS on the date of discharge from the hospital.
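$$
P = \begin{pmatrix}
p_{00} & p_{01} & p_{02} & \cdots & p_{0n} \\
0 & p_{11} & p_{12} & \cdots & p_{1n} \\
0 & 0 & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & p_{nn}
\end{pmatrix} \tag{1}
$$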
The use of state-observation transition functions allows us to model transitions in terms of the observed health vector $Z_l$, generated dynamically from a patient's information available up to LOS $l$, represented by the filtration $\mathcal{F}_l$. Note that although $l$ is Markovian, $Z_l$ is not necessarily Markovian, because it may summarize the patient's history contained in $\mathcal{F}_l$.
In what follows, we split $\Pr(l' \mid l, Z_l)$ into $|L|$ separately trained transition functions $\Pr_l(l' \mid Z_l) = \Pr(l' \mid l, Z_l)$, $l' = l, \dots, n$. Each of these functions is given by a survival model estimated with a random survival forest (RSF) (Ishwaran et al. 2008). Next, we discuss how to fit the transition functions.
2.3. A Survival Model for Transitions
A survival model is a framework for analyzing data on the time to an event. It focuses on the probability $S(t)$ that the event has not occurred by time $t$. In our probabilistic framework, each state (LOS $l$) has a separate survival function $S_l(t \mid Z_l)$, defined as the probability that discharge does not occur within the next $t$ days; in other words, that the remaining LOS satisfies RLOS ≥ $t$:
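$$
S_l(t \mid Z_l) = \Pr(\mathrm{RLOS} \ge t \mid l, Z_l) = \sum_{l'=l+t}^{n} p_{ll'}, \qquad t = 0, 1, \dots, n-l \tag{2}
$$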
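For intuition, the survival curve attached to a group of patients, such as a terminal node of a survival tree, can be estimated from the group's observed remaining LOS. The sketch below uses a Kaplan-Meier-type estimator on integer days, adapted to the definition $S(t) = \Pr(\mathrm{RLOS} \ge t)$. It is one common choice for such a node-level curve, not necessarily the estimator used inside the RSF implementation. Censoring support is included only for generality, since every encounter in our cohort ends in a discharge; `survival_geq` is a hypothetical name.

```python
import numpy as np


def survival_geq(rlos, event=None, horizon=None):
    """Kaplan-Meier-type estimate of S(t) = Pr(RLOS >= t), t = 0..horizon.

    rlos  : integer remaining LOS (days) of the patients in one group
    event : 1 if discharge was observed, 0 if censored (default: all observed)
    """
    rlos = np.asarray(rlos, dtype=int)
    event = np.ones_like(rlos) if event is None else np.asarray(event, dtype=int)
    if horizon is None:
        horizon = int(rlos.max()) + 1
    surv = np.ones(horizon + 1)  # S(0) = 1: everyone has RLOS >= 0
    s = 1.0
    for t in range(1, horizon + 1):
        u = t - 1  # Pr(RLOS >= t) multiplies factors over event days u < t
        at_risk = np.sum(rlos >= u)
        discharged = np.sum((rlos == u) & (event == 1))
        if at_risk > 0:
            s *= 1.0 - discharged / at_risk
        surv[t] = s
    return surv
```

Without censoring, this reduces to the empirical fraction of the group with RLOS ≥ t.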
One of our predictive targets is the expected remaining LOS $E_l(Z_l)$, i.e., the expected time to discharge after the current LOS $l$, given the health vector $Z_l$ observed at $l$:
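$$
E_l(Z_l) = \mathbb{E}[\mathrm{RLOS} \mid l, Z_l] = \sum_{l'=l}^{n} (l'-l)\, p_{ll'} = \sum_{t=1}^{n-l} S_l(t \mid Z_l) \tag{3}
$$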
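For a non-negative integer remaining LOS, the expected value can be read directly off the survival curve through the identity $\mathbb{E}[\mathrm{RLOS}] = \sum_{t \ge 1} \Pr(\mathrm{RLOS} \ge t)$. The sketch below applies this identity. It assumes the curve has been extended until it reaches 0; otherwise the sum is truncated and underestimates $E_l(Z_l)$. `expected_rlos` is a hypothetical helper.

```python
import numpy as np


def expected_rlos(surv):
    """Expected remaining LOS from a discrete survival curve.

    surv[t] = S_l(t | Z_l) = Pr(RLOS >= t) for t = 0, 1, ..., with surv[0] = 1
    and the curve assumed to reach 0 by its last entry.
    """
    surv = np.asarray(surv, dtype=float)
    return float(surv[1:].sum())  # E[RLOS] = sum over t >= 1 of Pr(RLOS >= t)
```

For the curve (1, 0.75, 0.25, 0.25, 0), produced by remaining stays of 0, 1, 1, and 3 days, the function returns 1.25, the mean of those four stays.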
This allows us to translate the transition matrix into a survival matrix:
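$$
\begin{pmatrix}
S_0(0 \mid Z_0) & S_0(1 \mid Z_0) & S_0(2 \mid Z_0) & \cdots & S_0(n \mid Z_0) \\
0 & S_1(0 \mid Z_1) & S_1(1 \mid Z_1) & \cdots & S_1(n-1 \mid Z_1) \\
0 & 0 & S_2(0 \mid Z_2) & \cdots & S_2(n-2 \mid Z_2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & S_n(0 \mid Z_n)
\end{pmatrix} \tag{4}
$$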
Note that although a transition probability matrix, eq. (1), can always be transformed into a survival function matrix, eq. (4), the converse is not necessarily true. Within each independently trained $S_l$, the row $S_l(0 \mid Z_l), S_l(1 \mid Z_l), \dots, S_l(n-l \mid Z_l)$ can be used to recover transition probabilities that sum to 1. However, we impose no such constraints across different $S_l$. Therefore, the transition probabilities recovered from the column $S_0(l \mid Z_0), S_1(l-1 \mid Z_1), \dots, S_l(0 \mid Z_l), 0, \dots, 0$ might not sum to 1. Imposing such constraints would require all of these functions to be estimated jointly.
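The row-wise recovery amounts to differencing: since $S_l(t \mid Z_l)$ is the probability that RLOS ≥ $t$, we have $p_{l,l+t} = S_l(t \mid Z_l) - S_l(t+1 \mid Z_l)$, with $S_l(n-l+1 \mid Z_l) = 0$ because every patient is discharged by LOS $n$. The hypothetical helpers below implement this map and its inverse (a reverse cumulative sum) for a single row. Like the separately trained $S_l$, they enforce nothing across rows.

```python
import numpy as np


def survival_to_transitions(surv_row):
    """Recover p_{l,l+t} = Pr(RLOS = t | Z_l) from one survival row.

    surv_row[t] = S_l(t | Z_l) for t = 0, ..., n - l. A trailing 0 encodes
    S_l(n - l + 1) = 0, i.e. discharge by LOS n at the latest.
    """
    s = np.append(np.asarray(surv_row, dtype=float), 0.0)
    return s[:-1] - s[1:]


def transitions_to_survival(trans_row):
    """Inverse map: S_l(t) = sum over t' >= t of p_{l,l+t'} (reverse cumsum)."""
    p = np.asarray(trans_row, dtype=float)
    return np.cumsum(p[::-1])[::-1]
```

A row recovered this way sums to $S_l(0 \mid Z_l)$, which is 1 for a valid survival curve.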
3. Estimation
One motivation for our model is its eventual incorporation into the electronic medical record system. In that scenario, a dynamic model would be trained on all historical data to predict the remaining lengths-of-stay of individuals currently in the NICU. Our current algorithm mirrors this time lag between training and prediction. We first order individuals by their admission dates. Then, for each state $l$, we take the patients with LOS ≥ $l$ and select the earliest-admitted 80% of them as the training set. We apply RSF to the training set's health covariates at $l$ to induce the survival function $S_l$, and we evaluate the model on the remaining 20% of individuals using their data and remaining lengths-of-stay at $l$. Algorithm 1 outlines the procedure.
Algorithm 1: An outline of the algorithm for estimating $S_l$ using RSF
Data: For all individuals, the LOS states $L = \{0, 1, \dots, n\}$, the remaining lengths-of-stay (RLOS) at each $l$, and the corresponding health vectors $Z_l$, generated dynamically from all information available for each patient up to LOS $l$.
Order all individuals by their admission time
for each LOS $l \in L$ do
    Select the patients with LOS ≥ $l$
    Select the first 80% of these patients (by admission time) as the training set
    Apply RSF to the training set's health covariates $Z_l$ and RLOS at $l$ to estimate $S_l$
    Evaluate $S_l$ on the latter 20% of individuals using their $Z_l$ and RLOS at $l$
end
Output: A series of $S_l$, $l \in L$, each of which takes an unseen health vector and predicts the expected RLOS
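The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not our production code: `features_at` (returning $Z_l$ for a patient) and `fit_model` (standing in for a random survival forest fit) are hypothetical callables supplied by the caller, and only the time-ordered, per-state 80/20 split is implemented concretely.

```python
import numpy as np


def train_dynamic_models(admit_time, los, features_at, fit_model,
                         max_los, train_frac=0.8):
    """One survival model per LOS state l, trained on earlier admissions.

    admit_time : admission times (sortable), one per patient
    los        : total LOS in days, one per patient
    features_at: hypothetical callable (patient index, l) -> health vector Z_l
    fit_model  : hypothetical callable (Z_train, rlos_train) -> fitted model,
                 a stand-in for a random survival forest
    """
    admit_time = np.asarray(admit_time)
    los = np.asarray(los)
    order = np.argsort(admit_time, kind="stable")  # order by admission time
    models, test_sets = {}, {}
    for l in range(max_los + 1):
        at_risk = order[los[order] >= l]  # patients with LOS >= l, still ordered
        if len(at_risk) < 2:
            break  # too few patients left to split
        n_train = int(train_frac * len(at_risk))  # earliest 80% for training
        train, test = at_risk[:n_train], at_risk[n_train:]
        Z_train = np.array([features_at(i, l) for i in train])
        rlos_train = los[train] - l  # remaining LOS at state l
        models[l] = fit_model(Z_train, rlos_train)
        test_sets[l] = test  # latter 20%, evaluated with Z_l and RLOS at l
    return models, test_sets
```

Splitting by admission time rather than at random keeps every test patient admitted after the training patients, mirroring deployment on patients currently in the NICU.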
3.1. Computational Time and Resources
The average-case time complexity of training a random forest is $\Theta(MKN\log^2 N)$ (Louppe 2014), where $N$ is the number of training samples, $M$ the number of trees, and $K$ the number of randomly drawn candidate variables. The number of patients with LOS ≥ $l$, $N_l$, decreases as $l$ increases. Even though the number of forests grows linearly with $n$, each additional forest is no more expensive to train than the previous one. Therefore, the total training time grows sub-linearly with $n$. Because each $S_l$ can be trained separately and in parallel, computation can be sped up substantially. The total runtime of our dynamic RSF model, including hyperparameter tuning, on a single core (2.40 GHz Intel Xeon) is about 3.27 hours for LOS 0–80, using 351 MB of memory, which is consistent with the current literature (Probst et al. 2019).
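The sub-linear growth argument can be checked numerically. The sketch below assumes, as a simplification, that the cost of each forest is proportional to $N_l \log^2 N_l$ with $M$ and $K$ held fixed, and compares the total over all LOS states with the cost of training the same number of forests on the full cohort. `relative_training_cost` is a hypothetical helper; it estimates a ratio, not a runtime.

```python
import numpy as np


def relative_training_cost(los, max_los):
    """Ratio of sum_l N_l log^2 N_l to (number of states) * N_0 log^2 N_0.

    N_l is the number of patients with LOS >= l. Assumes per-forest cost
    is proportional to N log^2 N for fixed numbers of trees and variables.
    """
    los = np.asarray(los)
    n_l = np.array([np.sum(los >= l) for l in range(max_los + 1)], dtype=float)
    n_l = n_l[n_l > 1]  # drop states with at most one patient
    cost = np.sum(n_l * np.log(n_l) ** 2)
    naive = len(n_l) * n_l[0] * np.log(n_l[0]) ** 2
    return float(cost / naive)
```

The ratio is 1 only when no patient is discharged before the last state, and it shrinks as the at-risk population $N_l$ decays.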
3.2. Parameter tuning and model evaluation
For hyperparameter tuning, all parameters except the number of randomly drawn candidate variables ($K$) are set to their defaults: the log-rank splitting rule, a maximum of 1000 trees, minimum node size 3, sampling with replacement, and no pruning (Ishwaran and Kogalur 2018). $K$ ranges from a minimum of 3 to a maximum of half the total number of variables. We tune on the training data, choose the hyperparameters with the highest training R², and then evaluate this model on the test set, reporting test R², RMSE, MAE, and the c-statistic.
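For reference, the regression metrics reported for each LOS model can be computed from observed and predicted remaining LOS as in the short sketch below. R² is taken as one minus the ratio of the residual to the total sum of squares, consistent with "the proportion of variance explained"; `regression_metrics` is a hypothetical helper name.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Test-set R^2, RMSE and MAE for observed vs. predicted remaining LOS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": float(1.0 - ss_res / ss_tot),  # proportion of variance explained
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "mae": float(np.mean(np.abs(resid))),
    }
```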
All models are trained on the same 80% training set and evaluated on the same 20% test set. We used the full training set for parameter tuning and did not use cross-validation. The ROC curve is computed for the binary outcome of whether a test-set patient is discharged within 7, 14, or 30 days. Discharge within the horizon is the positive class, so a patient correctly predicted to be discharged early is a true positive, while a patient predicted to be discharged early who stays longer is a false positive. Our LOS model can also be used to develop an early warning system; see Online Supplement Figure 16 for the AMOC curves of the LOS 7, 14, and 30 models. We chose the c-statistic because all of our patients are eventually discharged home and, given the LOS distribution, discharge within these horizons is not a low-prevalence outcome, the setting in which reporting positive predictive value (PPV) would be more appropriate (Romero-Brufau et al. 2015).
4. Clinical Participation
Because the ultimate goal of this project is to integrate the model into the clinical system, we ran weekly clinical focus-group sessions throughout the project to iteratively develop the final model and its data inputs. To make sure both the data and the model outputs are interpretable to clinicians, we started with a handful of clinically validated variables, similar to those used in the NICU literature (Verma et al. 1999, Bender et al. 2013). As we gradually increased the complexity of the model and the input data, we continued to validate intermediate data inputs and model outputs with the clinicians. We constructed additional variables from clinical hypotheses about how they could prolong or shorten patient LOS, and we tested each hypothesis through visual inspection before including the variable as model input (see Online Supplement Figure 11 for an example).
We also showed intermediate model outputs, e.g., single decision trees, to the entire NICU team to validate the non-linear interactions captured by the models. Following this protocol, we expanded our input variables iteratively, based on additional clinical factors raised by the NICU team during the focus groups. When we transitioned to a less interpretable random forest model, neonatologists were still able to interpret it clinically using variable importance plots (Online Supplement Figures 17 to 19). In the end, we codified neonatology practice, translating the information a physician uses to make clinical decisions into a machine-readable, expert-interpretable format. Some of these variables and their interactions have not been reported in the clinical literature. Had we started with a black-box model, e.g., a neural network, we would not have received the clinical input and validation that were crucial for our model's performance and interpretability.
5. Data
Multiple attempts have been made to leverage the rich information stored in electronic medical records (EMR) for accurate predictions (Rajkomar et al. 2018, Aczon et al. 2017). However, few studies focus on both model performance and interpretability. In this section, we introduce a data processing scheme that incorporates medical knowledge from the neonatologists and allows us to model the health trajectory dynamically.
We tackle the interpretability problem in three ways. First, instead of feeding raw data directly to a model, we carefully constructed variables that capture medical relevance, with help from the neonatologists. Second, to match the dynamic nature of our model, each variable is characterized by the time at which it becomes available in the EMR system. Third, to tackle the high dimensionality problem, we reduced data dimensionality by aggregating related information and explicitly generating variables that capture known medical or physiological interactions. For a complete breakdown of the variables used in our model, see Table 2. Variable summary statistics are available in the Online Supplement.
5.1. Raw data
We collected the EMRs from the NICU at the University of Chicago Comer Children's Hospital between August 2008 and October 2015. The raw data comprise 43,319,934 data points from 4624 encounters of 4612 patients who were safely discharged home. We define an encounter as a continuous admission to the NICU. A patient can have multiple encounters, though this is rare (0.26%).
The raw data were available in the EMR system at different time points. Some are collected at birth, such as demographics, gestational age at birth, congenital malformations. More dynamic information was collected at various time points during the span of an encounter with the NICU. The dynamic information includes sampled measurements of vitals (e.g. pulse, blood pressure, temperature, weight), machine functional readings (e.g., assisted respiratory rate, mean airway pressure, intravenous fluid rate), exam results (laboratories, neurology, radiology), diagnoses (daily problem list), and interventions (surgeries, procedures, medications).
5.2. Data Processing
To make sure both our model inputs and outputs are interpretable to medical professionals, neonatologists participated in every stage of the study. With their guidance, we aggregated related medical data and generated health variables that encapsulate established medical knowledge or physiological meaning. This reduced the dimensionality of our health data significantly while enhancing the interpretability of the model.
We categorized each variable along two dimensions: medical relevance and time of availability in the EMR system. We defined the medical category of a variable by how it relates to neonatal physiology, using Gomella et al. 2013 as a guideline. We then classified data by when they become available in the EMR system: before birth, at birth, in real time, daily, and after discharge. For example, because of the time lag between diagnosis and coding, ICD-9 diagnostic codes not related to congenital defects usually become available after discharge and before billing.
All information available before birth, such as demographics, financial class, zipcode, and maternal information, as well as information available at birth, such as gestational age at birth, weight at birth, and Apgar scores, are captured in static variables.
5.3. Dynamic information
Real-time information consists of machine-generated readings, recorded exam results, and interventions. For example, weights are recorded regularly through machine vital-sign readings and irregularly through nurse readings. Because we do not require the covariates in the health vector Z to be Markovian, Z can include, in addition to the time-dependent covariates at each LOS l, the history of a variable up to time l, captured in non-Markovian, time-smoothed trajectory variables. For instance, the time-dependent weight measurements were used to generate three time-smoothed trajectory variables: daily weight, daily weight change, and (rolling) weekly weight change (Table 2). Depending on their medical meaning, some real-time variables also have rolling 3-day and rolling weekly averages to capture their trajectories.
These trajectory variables are important both for predicting remaining lengths-of-stay and for the clinical interpretation of our model outputs. For example, the past-3-day respiratory support history ranks among the top 10 variables by importance in the day-7 and day-14 models (Online Supplement Figures 18 and 19).
Other information, such as procedures, surgeries, and diagnoses, may not be available in real-time but are updated daily in the system. We consider these as dynamic information as well. For binary variables, we take the daily, weekly, and monthly rolling maximum. For count variables, we take the daily, weekly, and monthly sum (Table 2).
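The trajectory and event variables described above can be sketched as trailing-window aggregates over a patient's daily series. The window conventions here are our assumptions, not the exact definitions in Table 2: windows are trailing and end on the current day, the first days use whatever shorter history exists, "monthly" means 30 days, and undefined changes are NaN. All function names are hypothetical.

```python
import numpy as np


def rolling(values, window, how):
    """Trailing aggregate: out[t] = how(values[t - window + 1 .. t])."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for t in range(len(values)):
        out[t] = how(values[max(0, t - window + 1): t + 1])  # shorter at start
    return out


def weight_trajectory(daily_weight):
    """Daily weight, day-over-day change and 7-day change (NaN if undefined)."""
    w = np.asarray(daily_weight, dtype=float)
    daily_change = np.diff(w, prepend=np.nan)  # no change defined on day 0
    weekly_change = np.full_like(w, np.nan)
    weekly_change[7:] = w[7:] - w[:-7]
    return w, daily_change, weekly_change


# Assumed window lengths in days for the daily/weekly/monthly aggregates.
WINDOWS = {"daily": 1, "weekly": 7, "monthly": 30}


def event_features(flags, counts):
    """Rolling max of a daily binary flag and rolling sum of a daily count."""
    feats = {}
    for name, win in WINDOWS.items():
        feats[f"flag_{name}_max"] = rolling(flags, win, np.max)
        feats[f"count_{name}_sum"] = rolling(counts, win, np.sum)
    return feats
```

A flag that fires on one day thus stays visible to the weekly feature for seven days, which is how a brief intervention can still inform predictions later in the stay.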
We included procedures, surgeries, and medications in the health status covariates because patients’ health conditions dictate the choices and timings of these therapeutic actions. For example, tracheoesophageal fistula (TEF) repair surgeries are performed to correct the congenital malformations when neonates are healthy enough. Similarly, neonatologists prescribe anticonvulsant medications based on the frequency and severity of seizures.
5.4. Dimension Reduction
To address the high dimensionality and potential sparsity of our data, we organized and aggregated related information in medically meaningful ways. Below, we use one respiratory variable, which aggregates 220 machine-generated measurements, as an example of such a transformation for dimension reduction. An example of a metabolic and nutritional variable appears in Online Supplement Table 10 and Online Supplement Figure 12.
In previous NICU length-of-stay models (Verma et al. 1999, Bender et al. 2013), respiratory variables at various time points are frequently constructed as health indicators in the morbidity assessment index for newborns (MAIN) (Table 3). However, these variables capture only assisted ventilation and mechanical ventilation at fixed time points. In addition to being static, they ignore several other respiratory support methods, such as non-invasive ventilation, high-frequency oscillatory ventilation, and extracorporeal membrane oxygenation. In comparison, we ordered the respiratory support methods by severity and aggregated 220 real-time machine-generated measurements into one respiratory support variable (Table 1). As a result, we can track a patient's respiratory support level over the entire NICU encounter (Figure 2) using just one variable. As the example in §1 shows, the respiratory support variable is crucial in predicting patient LOS.
Table 1.
Respiratory support types and corresponding severity levels
| Respiratory support | Level |
|---|---|
| Extracorporeal Membrane Oxygenation | 7 |
| High Frequency Oscillatory Ventilation | 6 |
| Mechanical Ventilation | 5 |
| Non Invasive Ventilation | 4 |
| Continuous Positive Airway Pressure | 3 |
| High Flow Nasal Cannula | 2 |
| Low Flow Nasal Cannula | 1 |
| Room Air | 0 |
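A minimal sketch of how Table 1 can drive a daily respiratory variable is shown below. It assumes that each raw machine reading has already been labeled with one of the support types in Table 1 (the mapping from the 220 raw measurements is not reproduced here), and it takes the daily maximum, in the spirit of the LV_max variable in Figure 1. `daily_max_support` is a hypothetical helper.

```python
# Severity levels from Table 1 (higher = more intensive support).
RESP_LEVEL = {
    "Extracorporeal Membrane Oxygenation": 7,
    "High Frequency Oscillatory Ventilation": 6,
    "Mechanical Ventilation": 5,
    "Non Invasive Ventilation": 4,
    "Continuous Positive Airway Pressure": 3,
    "High Flow Nasal Cannula": 2,
    "Low Flow Nasal Cannula": 1,
    "Room Air": 0,
}


def daily_max_support(records):
    """Map (day, support_type) readings to the highest level used each day.

    records: iterable of (day, support_type) pairs whose support_type has
    already been derived from the raw readings (assumed preprocessing).
    """
    levels = {}
    for day, support in records:
        level = RESP_LEVEL[support]
        levels[day] = max(levels.get(day, level), level)
    return levels
```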
Figure 2.
The dynamic respiratory variable of one patient tracked over the entire NICU encounter. The patient was admitted right after birth starting with mechanical ventilation and deteriorated quickly, requiring extracorporeal membrane oxygenation. After that, this patient gradually improved over the next month. However, about a week before discharge, the condition deteriorated again but was eventually resolved.
6. Results
In this study, the mean and median lengths-of-stay are 23.4 days and 9 days, respectively, indicating substantial variation in NICU patients' lengths-of-stay. The maximum length-of-stay is 454 days, and the total is 108,003 patient-days. The histogram of lengths-of-stay for NICU patients is shown in Figure 3. Of the 4624 encounters included, 45% are female patients and 38% were delivered by Cesarean section. On average, patients delivered by Cesarean section stayed 4.88 days longer. The average gestational age is 254 days, with 53% born preterm. On average, each additional day of gestational age is associated with 0.826 fewer days in the NICU. Additional patient demographics and other characteristics are summarized in Online Supplement Table 4. The Comer NICU data are HIPAA-protected and not publicly available. A complete description of the data appears in Online Supplement §B.
Figure 3.
Histogram of LOS for NICU patients. Data are truncated at the 95th percentile (94 days).
6.1. Remaining LOS prediction
Our prediction target, the remaining LOS, is defined in §2. We evaluated the numeric predictions using R², the proportion of variance explained by the model. Test-set R² exceeds 0.80 for every LOS model up to LOS 11, by which point more than 55% of NICU patients have been discharged home. To establish baseline performance on our dataset, we adapted the static AFT regression model to our dynamic setting, following the algorithm published in Bender et al. 2013. Figure 4 shows the improved predictive capability of the new model over this dynamic AFT model. We also trained a generic off-the-shelf long short-term memory (LSTM) recurrent neural network (RNN) on the training data. Our model outperforms the LSTM model for the first 20 days (by which point about two-thirds of NICU patients have been discharged). Our model's performance is stable across measures; see Figure 13 and Figure 14 for the MAE and RMSE statistics. Test-set R² remains above 0.55 through LOS 68 (by which point more than 90% of NICU patients have been discharged), with the single exception of LOS 57 (R² = 0.52). This is partly due to our choice to split the training and test sets by time: with randomly sampled training and test sets, the LOS 57 model achieved a test R² of 0.57.
Figure 4.
Prediction results on test set for LOS 0–80 models compared to dynamic AFT and LSTM RNN performance.
6.2. Discharge Time Predictions
To evaluate our dynamic discharge predictions, we used the Area Under Curve for the time-dependent receiver operating characteristic (ROC) curves. In the time-dependent ROC, cases (true positive) are those who discharged before next t days, and the controls (true negative) are those who remained in NICU for the next t days. In the case of next week discharge, cases are those discharged within the next 7 days and controls are those remained until next week.
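The time-dependent c-statistic at a single LOS state can be sketched as follows. Since $S_l(t \mid Z_l) = \Pr(\mathrm{RLOS} \ge t)$, the predicted risk of discharge within the next $t$ days is $1 - S_l(t \mid Z_l)$; cases have RLOS < t and controls RLOS ≥ t, and the AUC is computed as the Mann-Whitney probability that a random case receives a higher risk than a random control, with ties counted as one half. Confidence intervals (e.g., by bootstrapping) are omitted, and `time_dependent_auc` is a hypothetical helper.

```python
import numpy as np


def time_dependent_auc(surv_at_t, rlos, t):
    """AUC for 'discharged within the next t days' at one LOS state.

    surv_at_t[i] : predicted S_l(t | Z_l) for test patient i
    rlos[i]      : observed remaining LOS of test patient i (days)
    """
    risk = 1.0 - np.asarray(surv_at_t, dtype=float)  # Pr(RLOS < t)
    rlos = np.asarray(rlos)
    cases, controls = risk[rlos < t], risk[rlos >= t]
    if len(cases) == 0 or len(controls) == 0:
        return float("nan")  # AUC undefined without both classes
    diff = cases[:, None] - controls[None, :]  # all case-control pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```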
We focused on the time-dependent c-statistic for three time horizons: next-week, next-2-week, and next-month discharge. Because short-term discharges are often decided by the neonatologists, we did not predict them (e.g., discharge within the next 3 days). All metrics were calculated on the holdout encounters at each LOS.
Because the current state-of-the-art AFT regression model is static, we first compared it with our model using only the Day 1 (LOS 0) data. Although our model lacked the Day 2 to Day 7 information available to the AFT regression model, both models used the same patients in the training and test sets. Our model's c-statistics were 0.95 (95% CI 0.94–0.96) for next-2-week discharge and 0.97 (95% CI 0.96–0.98) for next-month discharge, significantly higher than those of the AFT regression model, 0.89 (95% CI 0.87–0.91) and 0.89 (95% CI 0.87–0.92), respectively (Figure 15). We do not show a next-week discharge comparison because the AFT model uses information on whether a patient stayed until Day 7. As shown in Figure 5, the time-dependent predictions of next-week, next-2-week, and next-month discharge achieved test c-statistics above baseline performance across the LOS 0 to LOS 68 models.
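The text reports 95% confidence intervals for the c-statistics but does not say how they were computed. One common choice is a nonparametric percentile bootstrap over patients; the sketch below shows that technique as an assumption, not as the method behind the reported intervals. Function names, the number of resamples, and the seed are hypothetical.

```python
import random


def auc(labels, scores):
    """Mann-Whitney AUC: labels are 1 for cases, 0 for controls.
    Returns None if the sample contains only one class."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap (1 - alpha) interval,
    resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = auc([labels[i] for i in idx], [scores[i] for i in idx])
        if value is not None:  # skip resamples that lost one of the classes
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return auc(labels, scores), (lo, hi)
```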
Figure 5.
Dynamic prediction test-set c-statistics for three time horizons (next week, next 2 weeks, and next month) with LOS models from LOS 0 to LOS 68.
6.3. Clinical interpretations
Our dynamic random forest model not only makes accurate LOS predictions but also identifies the variables most important to those predictions. This enables physicians to better understand the underlying clinical process during a patient's stay. For example, our models for days 0, 7, and 14 show the decreasing importance of gestational age and weight and the increasing importance of oral feeding and respiratory support levels (Figures 17 to 19). From a medical standpoint, it is not surprising that on the day of birth, gestational age and weight matter the most for the remaining length-of-stay. What is surprising is that, for patients who remain in the NICU on days 7 and 14, gestational age, while still important, becomes less important than oral feeding and respiratory support. This may allow physicians to target oral feeding and respiratory support levels, instead of targeting care based on gestational age or weight, in an attempt to decrease LOS.
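The text does not specify which variable-importance measure underlies Figures 17 to 19. To illustrate the general idea, the sketch below computes permutation importance, one widely used model-agnostic measure: shuffle one feature column, re-score the model, and record the drop in performance. It is an assumption-laden sketch, not the importance computation used for the figures; `predict`, `score`, and the row layout are hypothetical.

```python
import random


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Average drop in `score` when each feature column is shuffled.

    predict : callable mapping a list of feature rows (lists) to predictions
    score   : callable(y_true, y_pred), larger is better (e.g. R^2 or -MSE)
    Returns one importance value per feature column.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Applied to each LOS-specific model separately, a measure like this yields one importance ranking per day, which is the kind of comparison across days 0, 7, and 14 described above.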
6.4. Census distributions over time
The predicted survival function for each patient at each LOS allows us to generate the exact distributions of census over time. Figure 6 shows the probability distributions of the census generated for the 70 patients who were in the NICU on Jan 02, 2015. Because the training set consists of the first 80% of patients at each LOS, which roughly corresponds to patients admitted to our NICU before July 17, 2014, the patients present on Jan 02, 2015 are highly unlikely to be in the training set.
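One way to turn per-patient survival functions into a census distribution is sketched below. It assumes, as our own simplification, that patients' discharges are independent, so the census at a given horizon is a sum of independent Bernoulli variables (a Poisson-binomial distribution) that can be built up by convolution. Each patient's probability of still being in the unit is taken from a hypothetical discrete survival curve `surv[t] = P(total LOS > t)` conditioned on the current LOS.

```python
def stay_probability(surv, los, horizon):
    """P(still in the unit at los + horizon | in the unit at los).

    surv[t] = P(total LOS > t) for one patient (hypothetical discrete
    representation); assumes surv extends at least to los + horizon."""
    return surv[los + horizon] / surv[los]


def census_distribution(stay_probs):
    """Distribution of the number of patients still in the unit.

    stay_probs[i] is patient i's probability of still being in the NICU at
    the forecast horizon. Patients are assumed independent, so the census is
    Poisson-binomial; returns dist with dist[k] = P(census == k)."""
    dist = [1.0]
    for p in stay_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)  # patient discharged by the horizon
            new[k + 1] += mass * p      # patient still in the unit
        dist = new
    return dist
```

Repeating the calculation for each horizon (here, each day from Jan 03 to Feb 01) would give one census distribution per day, the kind of output summarized in Figure 6.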
Figure 6.
Probability distributions of the census for the 70 patients who were in the NICU on Jan 02, 2015, over 30 time horizons (Jan 03 to Feb 01). The solid line tracks the true census over time.
7. Discussion and Conclusion
The increased availability of detailed patient data via the use of electronic health records argues for the integration of medical knowledge, comprehensive data, and real-time predictive models. In this paper, we demonstrate the value of identifying medically relevant variables by codifying clinical knowledge before applying machine learning algorithms. With accurate and interpretable results, neonatologists can compare their clinical evaluations to our model predictions, review cases with discrepancies, and detect early signals of change in patients' health conditions that could affect LOS.
We also introduce a dynamic LOS model for NICU patients that takes as input real-time health variables aggregated from patients' clinical data available up to LOS l and generates dynamic survival functions. The resulting remaining length-of-stay predictions are highly accurate, with test-set R² > 0.8 for the models up to LOS 11, by which point more than 55% of patients have been discharged. Our model outperforms the current state-of-the-art AFT regression model until a substantial share of the patients (LOS = 68, > 90th percentile) have been discharged home.
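The text does not state how a numeric remaining-LOS prediction is read off a survival function. As an illustration only, the sketch below derives two natural summaries from a hypothetical discrete curve `surv[t] = P(T > t)` for total LOS T, for a patient still in the unit at LOS l (T > l): the mean, via the identity E[R] = Σ_{k≥0} S(l+k)/S(l) for the integer remaining LOS R = T − l, and the median, the smallest k with S(l+k)/S(l) ≤ 0.5. Truncating the sum at the end of the curve gives a restricted mean, which understates the true mean if the curve has not reached zero.

```python
def expected_remaining_los(surv, los):
    """Restricted mean of remaining LOS R = T - los given T > los, using
    E[R] = sum_{k>=0} P(R > k) = sum_{k>=0} S(los + k) / S(los).
    surv[t] = P(T > t); the sum stops at the end of the curve."""
    s_now = surv[los]
    return sum(surv[t] for t in range(los, len(surv))) / s_now


def median_remaining_los(surv, los):
    """Smallest k with P(R <= k) >= 0.5, i.e. S(los + k) / S(los) <= 0.5.
    Returns None if the median lies beyond the end of the curve."""
    s_now = surv[los]
    for k in range(len(surv) - los):
        if surv[los + k] / s_now <= 0.5:
            return k
    return None
```

For example, with `surv = [1.0, 1.0, 0.5, 0.25, 0.0]`, a patient still present at LOS 1 has an expected remaining LOS of 1.75 days and a median of 1 day.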
For outliers with LOS ≥ 69 days, our remaining length-of-stay performance declines. However, our future discharge predictions remain accurate for another 29 days: for these outliers, we can still predict with high AUC (> 0.86) after 98 days (the 95.5th percentile) whether they will stay another 30 days.
7.1. Census Forecasting
With the predicted future distributions for the entire NICU census, our model is a key component of accurate census forecasting for better resource planning and care management (Pallin et al. 2013). The census forecasting process is well established and straightforward to carry out. However, each step could introduce forecasting errors, making it difficult to disentangle the sources of error and evaluate the efficacy of our LOS model.
To forecast the total census, one must model the arrival process, i.e., the number of neonates admitted to the NICU each day, similar to the models used in admission control problems (Kim et al. 2014, Shmueli et al. 2003). This model needs to be fitted to empirical data while accounting for seasonality and weekly patterns (Dobson et al. 2010).
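A minimal arrival model of the kind described above is sketched below, under assumptions of our own: daily admissions are Poisson with a rate that depends only on the day of the week (0 = Monday through 6 = Sunday), estimated as the empirical mean for that weekday. Seasonality is omitted for brevity, and all function names are hypothetical. Poisson draws use Knuth's multiplication method, which is adequate for the small daily rates typical of a single unit.

```python
import math
import random
from collections import defaultdict


def fit_weekday_rates(daily_counts):
    """daily_counts: list of (weekday, admissions), weekday 0=Mon..6=Sun.
    Returns the mean admissions per weekday (a day-of-week Poisson model)."""
    totals, days = defaultdict(float), defaultdict(int)
    for weekday, count in daily_counts:
        totals[weekday] += count
        days[weekday] += 1
    return {d: totals[d] / days[d] for d in totals}


def sample_poisson(rate, rng):
    """Knuth's method: multiply uniforms until the product drops below
    exp(-rate); the number of extra multiplications is Poisson(rate)."""
    threshold, k, prod = math.exp(-rate), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_arrivals(rates, start_weekday, n_days, seed=0):
    """Simulate daily admissions; `rates` must cover every weekday visited."""
    rng = random.Random(seed)
    return [sample_poisson(rates[(start_weekday + i) % 7], rng) for i in range(n_days)]
```

Each simulated arrival stream would then need covariates for the new patients before their own LOS distributions could be added to the census forecast.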
In addition to the number of arrivals, we need to construct a distribution of the covariates for those arriving in the NICU. The correlation structure among covariates can be modeled as a multivariate Gaussian with a variance-covariance matrix estimated from empirical data. Because several key covariates (e.g., congenital defects) are sparse yet affect lengths-of-stay, this step could introduce significant variation into the census forecast.
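The multivariate Gaussian step can be sketched with the standard library alone: estimate the mean vector and sample covariance matrix from historical admissions, factor the covariance as L Lᵀ with a Cholesky decomposition, and draw x = μ + L z with z standard normal. This is an illustrative sketch with hypothetical names; it assumes a positive-definite covariance and continuous covariates. Binary indicators such as congenital defects would need further handling (for example, thresholding a Gaussian draw), which is one reason sparse covariates are hard to capture this way.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L @ L.T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def fit_gaussian(rows):
    """Sample mean vector and unbiased covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def sample_covariates(mean, cov, n_samples, seed=0):
    """Draw covariate vectors x = mean + L z, z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return out
```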
Our focus on the survival functions enables us to construct the probability distributions of the NICU census over time for those eventually discharged home, who represent the vast majority of NICU discharge dispositions. We can incorporate mortality (<5%) and transfers (~5%) with a competing risk model by adding another dimension to the outcome. However, mortality is rare in our dataset: the number of deaths is of the same order as the number of health variables in our model. Thus, accurate mortality predictions would require more data, e.g., a national NICU database.
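To illustrate what "adding another dimension to the outcome" involves, the sketch below computes the Aalen-Johansen estimate of the cumulative incidence of each competing outcome, a standard nonparametric summary for competing risks. It is not a model proposed in the text. The cause codes are hypothetical (for example, 1 = discharged home, 2 = death, 3 = transfer, 0 = censored), and times are treated as discrete days. Without censoring, each curve ends at the empirical fraction of patients with that outcome.

```python
from collections import Counter


def cumulative_incidence(times, causes):
    """Aalen-Johansen estimate of cumulative incidence per cause.

    times[i]  : day of the event or censoring for patient i
    causes[i] : 0 if censored, otherwise an integer cause label
    Returns {cause: [(t, CIF_cause(t)), ...]} at each distinct observed time.
    """
    event_counts = Counter(zip(times, causes))
    labels = sorted({c for c in causes if c != 0})
    at_risk = len(times)
    surv = 1.0  # overall event-free probability just before t
    cif = {c: 0.0 for c in labels}
    curves = {c: [] for c in labels}
    for t in sorted(set(times)):
        d_total = sum(event_counts[(t, c)] for c in labels)
        for c in labels:
            # probability of reaching t event-free times cause-c hazard at t
            cif[c] += surv * event_counts[(t, c)] / at_risk
            curves[c].append((t, cif[c]))
        surv *= 1.0 - d_total / at_risk
        at_risk -= d_total + event_counts[(t, 0)]  # remove events and censored
    return curves
```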
7.2. Limitations
We follow the practice of the existing survival analysis literature and model the time series of physiological indicators as both time-dependent and trajectory covariates (Witten and Tibshirani 2010, Fang et al. 2016, Ma et al. 2019). However, some of these variables, such as oxygen levels, blood pressure, or pulse, could potentially be modeled separately as stochastic processes. For example, one could construct, for each covariate, a Markovian model of how it evolves from the current LOS l to the next, using the remaining covariates as predictors. Given the current state of clinical knowledge about which factors affect a neonate's physiology during an intensive care stay, and how, our data has the potential to support realistic dynamic models, but doing so will require significant research. Future researchers could build upon our model and further explore the relationships among these clinical covariates.
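A deliberately simplified version of the Markovian idea above is sketched below for a single discretized covariate, such as a hypothetical ordinal respiratory-support scale. It estimates day-to-day transition probabilities from observed daily sequences and propagates a distribution one day forward. Unlike the model described in the text, it ignores the other covariates as predictors; conditioning on them would require, for example, a multinomial regression per current state. Names and the optional additive `smoothing` parameter are our own.

```python
from collections import defaultdict


def fit_transition_matrix(sequences, smoothing=0.0):
    """Estimate P(level at day l+1 | level at day l) from daily sequences.

    sequences: one list of daily levels per patient (hypothetical scale).
    Other covariates are ignored in this sketch. States never observed with
    a successor get no row."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    states = sorted(states)
    matrix = {}
    for a in states:
        row_total = sum(counts[a].values()) + smoothing * len(states)
        if row_total == 0:
            continue
        matrix[a] = {b: (counts[a][b] + smoothing) / row_total for b in states}
    return matrix


def step_distribution(dist, matrix):
    """Propagate a distribution over levels one day forward."""
    out = defaultdict(float)
    for a, p in dist.items():
        row = matrix.get(a, {a: 1.0})  # no observed successor: assume it stays
        for b, q in row.items():
            out[b] += p * q
    return dict(out)
```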
The survival functions were trained with separate random survival forest models. As the LOS increases, the number of patients decreases, causing the predictive performance to deteriorate. After LOS 20, the generic LSTM RNN model at times performed close to or better than our dynamic RSF model. Because the LSTM RNN model was trained on the training set patients' entire stays, for long LOSs (> 20 days) it could exploit the dynamic LOS process, while the RSF suffered from the shrinking amount of training data. A hybrid model that uses the more interpretable RSF for the first 21 days and a more predictive black-box neural network for longer LOSs could potentially balance the strengths and shortcomings of these two models. Although it is outside the scope of this paper, further research into a more sophisticated neural network model has the potential to improve predictive power further. In addition to the models reported in the results section, we also implemented several other algorithms that have not appeared in the LOS literature, such as k-nearest neighbors (Lowsky et al. 2013), gradient boosted decision trees (Chen and Guestrin 2016), and Bayesian additive regression trees (Chipman et al. 2010).
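The shrinking training data described above follows directly from fitting one model per LOS. The sketch below builds one training set per LOS from a list of stays and makes the shrinkage visible; it also includes the day-21 switch of the hybrid idea. The conventions are assumptions for illustration: a patient with total LOS `los` enters the LOS-l set when `los > l`, its target is `los - l`, and `daily[l]` is a hypothetical feature vector summarizing data up to LOS l.

```python
def landmark_datasets(stays, max_los):
    """Build one (features, remaining LOS) training set per LOS model.

    stays: list of dicts with total 'los' (days) and 'daily' features, where
    daily[l] summarizes data up to LOS l (hypothetical layout). A patient
    enters the LOS-l set if still in the unit at l (los > l)."""
    datasets = {}
    for l in range(max_los + 1):
        datasets[l] = [(s["daily"][l], s["los"] - l) for s in stays if s["los"] > l]
    return datasets


def choose_model(los, switch_day=21):
    """Hypothetical hybrid: interpretable RSF for LOS 0-20, neural net after."""
    return "rsf" if los < switch_day else "neural_net"
```

Because each LOS-l set contains only the patients still in the unit at l, its size can only stay the same or shrink as l grows, which is the mechanism behind the deterioration at long LOSs.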
Our model is trained and evaluated on data from the University of Chicago Comer Children's Hospital's NICU. This represents our first attempt to codify neonatology patients' entire health conditions. Once neonatologists start to benchmark predictions and review cases using our model, they are likely to have feedback and suggestions that could generate better health variables and further improve our model performance. In addition, other hospital systems might have customized data models different from our electronic health system. With modification, our approach can be adapted to other ICU settings, given the availability of similar health data. Collaboration among multiple health systems would result in more comprehensive data and likely better models, especially for the outliers or for capturing mortality.
Thus, our results suggest that when applied to clinically meaningful variables, dynamic methods can dramatically improve prediction performance, especially as patients’ conditions change over their stays in the hospital. These improvements in prediction accuracy, when incorporated into an optimization approach for resource allocation and staff scheduling, can lead to better health outcomes.
Supplementary Material
Acknowledgments
John R. Birge and Daniel Adelman are grateful for the financial support from Booth School of Business. We thank the entire neonatology section for their input in identifying clinical drivers of length-of-stay. We also thank the reviewers for providing very helpful comments while sacrificing precious personal time to provide prompt reviews that greatly improved the paper.
Contributor Information
Kanix Wang, Booth School of Business, The University of Chicago, Chicago, Illinois 60637.
Walid Hussain, Section of Neonatology, Department of Pediatrics, The University of Chicago, Chicago, Illinois 60637.
John R. Birge, Booth School of Business, The University of Chicago, Chicago, Illinois 60637.
Michael D. Schreiber, Section of Neonatology, Department of Pediatrics, The University of Chicago, Chicago, Illinois 60637.
Daniel Adelman, Booth School of Business, The University of Chicago, Chicago, Illinois 60637.
References
- Aczon M, Ledbetter D, Ho L, Gunny A, Flynn A, Williams J, Wetzel R (2017) Dynamic Mortality Risk Predictions in Pediatric Critical Care Using Recurrent Neural Networks. arXiv:1701.06675 [cs, math, q-bio, stat] URL http://arxiv.org/abs/1701.06675.
- Ansari SF, Yan H, Zou J, Worth RM, Barbaro NM (2018) Hospital Length of Stay and Readmission Rate for Neurosurgical Patients. Neurosurgery 82(2):173–181, ISSN 1524–4040, URL 10.1093/neuros/nyx160.
- Anthony Celi L, Mark RG, Stone DJ, Montgomery RA (2013) “Big Data” in the Intensive Care Unit. Closing the Data Loop. American Journal of Respiratory and Critical Care Medicine 187(11):1157–1160, ISSN 1073-449X, URL 10.1164/rccm.201212-2311ED.
- Basques BA, Webb ML, Bohl DD, Golinvaux NS, Grauer JN (2015) Adverse Events, Length of Stay, and Readmission After Surgery for Tibial Plateau Fractures. Journal of Orthopaedic Trauma 29(3):e121–e126, ISSN 0890-5339, URL 10.1097/BOT.0000000000000231.
- Bender J, Koestler D, Ombao H, McCourt M, Alskinis B, Rubin LP, Padbury JF (2013) Neonatal Intensive Care Unit: Predictive Models for Length of Stay. Journal of perinatology : official journal of the California Perinatal Association 33(2):147–153, ISSN 0743-8346, URL 10.1038/jp.2012.62.
- Chaou CH, Chen HH, Chang SH, Tang P, Pan SL, Yen AMF, Chiu TF (2017) Predicting Length of Stay among Patients Discharged from the Emergency Department—Using an Accelerated Failure Time Model. PLOS ONE 12(1):e0165756, ISSN 1932-6203, URL 10.1371/journal.pone.0165756.
- Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794, KDD ‘16 (New York, NY, USA: Association for Computing Machinery; ), ISBN 9781450342322, URL 10.1145/2939672.2939785.
- Chipman HA, George EI, McCulloch RE (2010) BART: Bayesian additive regression trees. Annals of Applied Statistics 4(1):266–298, ISSN 1932-6157, 1941-7330, URL 10.1214/09-AOAS285, publisher: Institute of Mathematical Statistics.
- Collett D (2015) Modelling Survival Data in Medical Research (New York: Chapman and Hall/CRC; ), 3 edition, ISBN 978-0-429-19629-4, URL 10.1201/b18041.
- Dobson G, Lee HH, Pinker E (2010) A Model of ICU Bumping. Operations Research 58(6):1564–1576, ISSN 0030-364X, URL 10.1287/opre.1100.0861, publisher: INFORMS.
- Fang HB, Wu TT, Rapoport AP, Tan M (2016) Survival analysis with functional covariates for partial follow-up studies. Statistical Methods in Medical Research 25(6):2405–2419, ISSN 0962-2802, URL 10.1177/0962280214523586, publisher: SAGE Publications Ltd STM.
- Ghassemi M, Celi LA, Stone DJ (2015a) State of the art review: the data revolution in critical care. Critical Care 19(1):118, ISSN 1364-8535, URL 10.1186/s13054-015-0801-4.
- Ghassemi M, Pimentel MA, Naumann T, Brennan T, Clifton DA, Szolovits P, Feng M (2015b) A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data. Proceedings of the … AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence 2015:446–453, ISSN 2159-5399, URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4864016/.
- Gomella TL, Cunningham MD, Eyal F (2013) Neonatology 7th Edition (New York: McGraw-Hill Education/Medical; ), 7 edition edition, ISBN 978-0-07-176801-6.
- Harsha SS, Archana BR (2015) SNAPPE-II (Score for Neonatal Acute Physiology with Perinatal Extension-II) in Predicting Mortality and Morbidity in NICU. Journal of Clinical and Diagnostic Research : JCDR 9(10):SC10–SC12, ISSN 2249-782X, URL 10.7860/JCDR/2015/14848.6677.
- Hauck K, Zhao X (2011) How dangerous is a day in hospital? A model of adverse events and length of stay for medical inpatients. Medical Care 49(12):1068–1075, ISSN 1537-1948, URL 10.1097/MLR.0b013e31822efb09.
- Hoogervorst-Schilp J, Langelaan M, Spreeuwenberg P, de Bruijne MC, Wagner C (2015) Excess length of stay and economic consequences of adverse events in Dutch hospital patients. BMC Health Services Research 15(1):531, ISSN 1472-6963, URL 10.1186/s12913-015-1205-5.
- Ishwaran H, Kogalur UB (2018) randomForestSRC: Random Forests for Survival, Regression, and Classification (RF-SRC). URL https://CRAN.R-project.org/package=randomForestSRC.
- Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. The Annals of Applied Statistics 2(3):841–860, ISSN 1932-6157, 1941-7330, URL 10.1214/08-AOAS169.
- Kaboli PJ, Go JT, Hockenberry J, Glasgow JM, Johnson SR, Rosenthal GE, Jones MP, Vaughan-Sarrazin M (2012) Associations between reduced hospital length of stay and 30-day readmission rate and mortality: 14-year experience in 129 Veterans Affairs hospitals. Annals of Internal Medicine 157(12):837–845, ISSN 1539-3704, URL 10.7326/0003-4819-157-12-201212180-00003.
- Kim SH, Chan CW, Olivares M, Escobar G (2014) ICU Admission Control: An Empirical Study of Capacity Allocation and Its Implication for Patient Outcomes. Management Science 61(1):19–38, ISSN 0025-1909, URL 10.1287/mnsc.2014.2057, publisher: INFORMS.
- Lee HC, Bennett MV, Schulman J, Gould JB, Profit J (2016) Estimating Length of Stay by Patient Type in the Neonatal Intensive Care Unit. American Journal of Perinatology 33(8):751–757, ISSN 1098-8785, URL 10.1055/s-0036-1572433.
- Louppe G (2014) Understanding Random Forests: From Theory to Practice. Ph.D. thesis, Université de Liège, Liège, Belgique, URL https://orbi.uliege.be/handle/2268/170309.
- Lowsky DJ, Ding Y, Lee DKK, McCulloch CE, Ross LF, Thistlethwaite JR, Zenios SA (2013) A K-nearest neighbors survival probability prediction method. Statistics in Medicine 32(12):2062–2069, ISSN 1097-0258, URL http://dx.doi.org/10.1002/sim.5673.
- Ma J, Lee DKK, Perkins ME, Pisani MA, Pinker E (2019) Using the Shapes of Clinical Data Trajectories to Predict Mortality in ICUs. Critical Care Explorations 1(4):e0010, ISSN 2639-8028, URL 10.1097/CCE.0000000000000010.
- Meira-Machado L, de Uña-Álvarez J, Cadarso-Suárez C, Andersen PK (2009) Multi-state models for the analysis of time-to-event data. Statistical methods in medical research 18(2):195–222, ISSN 0962-2802, URL 10.1177/0962280208092301.
- Pallin DJ, Allen MB, Espinola JA, Camargo CA, Bohan JS (2013) Population aging and emergency departments: visits will not increase, lengths-of-stay and hospitalizations will. Health Affairs (Project Hope) 32(7):1306–1312, ISSN 1544-5208, URL 10.1377/hlthaff.2012.0951.
- Pollack MM, Holubkov R, Reeder R, Dean JM, Meert KL, Berg RA, Newth CJL, Berger JT, Harrison RE, Carcillo J, Dalton H, Wessel DL, Jenkins TL, Tamburro R, Eunice Kennedy Shriver National Institute of Child Health and Human Development Collaborative Pediatric Critical Care Research Network (CPCCRN) (2018) PICU Length of Stay: Factors Associated With Bed Utilization and Development of a Benchmarking Model. Pediatric Critical Care Medicine: A Journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies 19(3):196–203, ISSN 1529-7535, URL 10.1097/PCC.0000000000001425.
- Probst P, Wright MN, Boulesteix AL (2019) Hyperparameters and tuning strategies for random forest. WIREs Data Mining and Knowledge Discovery 9(3):e1301, ISSN 1942-4795, URL http://dx.doi.org/10.1002/widm.1301.
- Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J (2018) Scalable and accurate deep learning with electronic health records. npj Digital Medicine 1(1):18, ISSN 2398-6352, URL 10.1038/s41746-018-0029-1.
- Richardson DK, Corcoran JD, Escobar GJ, Lee SK (2001) SNAP-II and SNAPPE-II: Simplified newborn illness severity and mortality risk scores. The Journal of Pediatrics 138(1):92–100, ISSN 0022-3476, URL 10.1067/mpd.2001.109608.
- Rinne ST, Graves MC, Bastian LA, Lindenauer PK, Wong ES, Hebert PL, Liu CF (2017) Association between length of stay and readmission for COPD. The American Journal of Managed Care 23(8):e253–e258, ISSN 1936-2692.
- Romero-Brufau S, Huddleston JM, Escobar GJ, Liebow M (2015) Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Critical Care 19(1), ISSN 1364-8535, URL 10.1186/s13054-015-0999-1.
- Rosenberg MA, Browne MJ (2001) The Impact of the Inpatient Prospective Payment System and Diagnosis-Related Groups. North American Actuarial Journal 5(4):84–94, ISSN 1092-0277, URL 10.1080/10920277.2001.10596020.
- Sanchez-Pinto LN, Luo Y, Churpek MM (2018) Big Data and Data Science in Critical Care. Chest 154(5):1239–1248, ISSN 0012-3692, URL 10.1016/j.chest.2018.04.037.
- Shmueli A, Sprung CL, Kaplan EH (2003) Optimizing Admissions to an Intensive Care Unit. Health Care Management Science 6(3):131–136, ISSN 1572-9389, URL 10.1023/A:1024457800682.
- Verburg IWM, Atashi A, Eslami S, Holman R, Abu-Hanna A, de Jonge E, Peek N, de Keizer NF (2017) Which Models Can I Use to Predict Adult ICU Length of Stay? A Systematic Review. Critical Care Medicine 45(2):e222–e231, ISSN 1530-0293, URL 10.1097/CCM.0000000000002054.
- Verghese A, Shah NH, Harrington RA (2018) What This Computer Needs Is a Physician: Humanism and Artificial Intelligence. JAMA 319(1):19–20, ISSN 1538-3598, URL 10.1001/jama.2017.19198.
- Verma A, Okun NB, Maguire TO, Mitchell BF (1999) Morbidity assessment index for newborns: a composite tool for measuring newborn health. American Journal of Obstetrics and Gynecology 181(3):701–708, ISSN 0002-9378.
- Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Statistical methods in medical research 19(1):29–51, ISSN 0962-2802, URL 10.1177/0962280209105024.