Abstract
The aim of the present study was to validate in a new cohort, and if necessary update, a predictive model previously developed using a classification and regression tree (CART) algorithm for predicting successful extubation (ES). This prospective cohort study enrolled adults admitted to 10 intensive care units, who had successfully passed a spontaneous breathing trial (SBT) and were considered ready for extubation. After extubation, the patients were followed up for 48 h. The primary outcome measure was ES, defined as the ability to maintain spontaneous unassisted breathing for >48 h after extubation. The 3-factor CART model was applied to patients in this cohort. The predicted probability of ES for each patient in this validation cohort was calculated based on the original CART model using the Laplace correction method. The performance was assessed by discrimination and calibration. A decision curve analysis was used to assess the clinical net benefit (NB). Extubation failure (EF) occurred in 90/530 patients (17%). Among the 90 patients, 72 (13.6% of the cohort) were reintubated, while 18 patients remained on rescue noninvasive ventilation within 48 h after extubation. The original CART model showed high discrimination but only moderate calibration, with predicted probabilities that were systematically lower than the observed probabilities. The original CART model was updated, and the updated model preserved excellent discrimination (area under the receiver operating characteristic curve, 0.91; 95% confidence interval, 0.87 to 0.93) and exhibited near-perfect calibration (calibration slope, 1; intercept, 0). Between threshold probabilities of 50 and 80%, the NB of using this updated model was significantly higher than that of the current strategy. The updated CART model may be used to estimate the predicted probability of ES after a successful SBT for individual patients. Applying this model may substantially reduce the number of unexpected EFs.
Keywords: decision trees, endotracheal extubation, calibration, decision curve analysis, model updating
Introduction
Discontinuation of mechanical ventilation (MV) is an essential issue for critically ill patients and clinicians; however, controversy continues concerning the best approach for conducting this task. The current recommendation is to perform weaning using an evidence-based two-step approach, in which daily evaluation for readiness to wean is followed by a spontaneous breathing trial (SBT) (1,2). Patients who successfully pass the SBT should then be extubated, provided that neurological status, secretion volume, cough strength and airway patency are not problematic (1,3). However, the rate of extubation failure is high (13.5 to 22%) in patients extubated on the basis of the current strategy (4–6). Both delayed and premature discontinuation of MV have been associated with increased mortality (1,7,8). Therefore, there is an urgent requirement for improved methods of identifying patients who are likely to undergo extubation successfully. Extensive investigations have been conducted to identify predictors of extubation outcome; however, none of the predictors reported to date have demonstrated high accuracy (2,9).
Research into weaning pathophysiology indicates that patients show substantial alterations in numerous physiological variables over time during the SBT (10). Thus, mathematically-based prediction models, which include multiple variables, may provide improved predictive outcomes compared with traditional predictors (11). Recently, a number of clinical models for predicting extubation outcome have been reported in the literature (12–14). Certain models have exhibited good discrimination; however, none has been externally validated. External validation is crucial to determine generalizability, and it is necessary to establish the ability of a model to predict outcome in different settings prior to its use in clinical practice being recommended (15,16).
In a previous study, data from a cohort of mechanically ventilated elderly patients were prospectively analyzed, and used to develop a predictive model using a classification and regression tree (CART) algorithm (17), also known as a decision tree (18), to predict extubation outcome in patients following a successful SBT. This CART model selected three discriminators, which are readily available in the majority of intensive care units (ICUs), and showed good discrimination with an area under the receiver operating characteristic curve (AUC) of 0.94.
The aim of the present study was to validate and, if necessary, update this previously developed CART model (17) in a multicenter cohort of adult patients admitted to 10 Chinese ICUs. Furthermore, the present study aimed to determine whether the clinical benefit achieved using this model surpasses that of the current strategy of extubating all patients who have successfully completed an SBT. Therefore, a net benefit analysis was also conducted (19,20), which is a novel method to quantify the clinical usefulness of a predictive model.
Materials and methods
Study setting and patients
This prospective validation cohort study was conducted between April 2013 and October 2013 in 10 Chinese ICUs in six tertiary hospitals (Pingjin Hospital, Tianjin, China; Tianjin Chest Hospital, Tianjin, China; Institute of Traumatic Brain Injury and Neurology, Tianjin, China; General Hospital of Chinese People's Armed Police Forces, Beijing, China; Xizang Corps Hospital, Lhasa, Tibet Autonomous Region; and Langfang Fourth Peoples' Hospital, Bazhou, China). Consecutive patients >18 years of age who had been on MV for >48 h were included if they were considered to be able to undergo an SBT, on the basis of the readiness criteria that are listed in Table I (2). Exclusion criteria are detailed in Fig. 1. The Institutional Review Board at each of the participating sites approved this observational study and written informed consent was obtained from the patients' next of kin.
Table I.
No. | Criterion |
---|---|
1 | Improvement in underlying conditions |
2 | Adequate oxygenation, indicated by PaO2 >60 mmHg at FiO2 ≤0.4 with an extrinsic positive end-expiratory pressure <8 cmH2O |
3 | Cardiovascular stability (absence of active myocardial ischemia; absence of vasopressor use or <5 µg/kg/min dopamine or dobutamine; heart rate <130 beats/min) |
4 | Body temperature <38°C |
5 | Hemoglobin >8 g/dl |
6 | Awake or easily arousable |
7 | Adequate coughing during suctioning and no requirement for suctioning more often than every 2 h |
8 | Blood pH ≥7.3 |
PaO2, partial pressure of oxygen in arterial blood; FiO2, fraction of inspired oxygen.
Study protocol
Patients eligible for enrollment were placed in a semirecumbent position and immediately underwent a 60-min SBT. The ventilator (Evita-4 or XL; Dräger, Lübeck, Germany) mode was set to 100% automatic tube compensation (ATC) plus 5 cmH2O positive end-expiratory pressure and other settings remained unchanged. If no indications of SBT failure (Table II) were observed during the SBT, the trial was considered successful. If the patient had adequate mental status, and the ability to cough and expectorate, extubation was performed. Alternatively, if manifestations of SBT failure were detected during this period, the SBT was terminated and MV reinstituted using the original settings. Immediate reintubation was performed in the presence of a major clinical event (21,22) (Table II). Rescue therapy with non-invasive ventilation (NIV) was used to avoid reintubation in patients with respiratory failure after extubation (21,22) if immediate reintubation was not necessary (Table II). Respiratory therapists applied NIV (BiPAP Vision; Respironics, Inc., Murrysville, PA, USA) using the S/T mode, following standard procedures (21). The final decisions regarding whether to conduct extubation, rescue NIV or reintubation were made solely by the primary team, with no involvement from the research team, according to the clinical protocols of the respective institutions.
Table II.
Criteria for SBT failure (≥1 criterion required)
Criteria for post-extubation respiratory failure that prompted immediate rescue therapy with NIV (≥1 criterion required)
Major clinical events that prompted immediate reintubation (≥1 event required)
SBT, spontaneous breathing trial; NIV, non-invasive ventilation; SpO2, peripheral capillary oxygen saturation; FiO2, fraction of inspired oxygen; PaCO2, partial pressure of carbon dioxide in arterial blood.
Study outcome
After extubation, the patients received follow-up for 48 h. The primary outcome measure was successful extubation (ES), defined as the ability to maintain spontaneous unassisted breathing for >48 h after extubation. Reinstitution of either NIV or invasive MV within 48 h of extubation was considered an extubation failure (EF).
Data collection and definitions of candidate predictors for ES
The original CART model has been described in detail in a previous study (17). The following candidate predictors were required to calculate the probability of ES in the validation set: Rapid shallow breathing index (RSBI) at 1 min of SBT (RSBI1), change in RSBI at 30 min (∆RSBI30) of SBT and product of airway occlusion pressure (P0.1) and RSBI at 30 min of SBT (P0.1 × RSBI30). P0.1 was measured using the ventilator (Evita-4; Dräger), according to previously described methods (23). RSBI was calculated by dividing respiratory rate (f) by tidal volume (Vt, in liters) and the values were displayed on the ventilator. At least three measurements were obtained, separated by an interval of ≥15 sec, and the mean value was used for analysis. Differences in RSBI at 30 min during the 60-min SBT (∆RSBI30) were assessed using the ratio of RSBI30 to RSBI1, expressed as a percentage.
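The derivation of the three candidate predictors can be illustrated with a minimal Python sketch. The function names and the example readings are hypothetical, and the sketch assumes that each quantity is averaged over its repeated readings before the ratio and the product are formed; the text does not state the order of these operations.

```python
from statistics import mean


def rsbi(resp_rate_per_min, tidal_volume_l):
    """Rapid shallow breathing index: f / Vt, in breaths/min/l."""
    if tidal_volume_l <= 0:
        raise ValueError("tidal volume must be positive")
    return resp_rate_per_min / tidal_volume_l


def candidate_predictors(rsbi1_readings, rsbi30_readings, p01_30_readings):
    """Average repeated readings, then derive the three CART predictors.

    Assumption: averaging is done per quantity before combining them.
    """
    rsbi1 = mean(rsbi1_readings)      # RSBI at 1 min of SBT
    rsbi30 = mean(rsbi30_readings)    # RSBI at 30 min of SBT
    p01_30 = mean(p01_30_readings)    # P0.1 at 30 min of SBT (cmH2O)
    return {
        "RSBI1": rsbi1,
        # Ratio of RSBI30 to RSBI1, expressed as a percentage
        "dRSBI30_pct": 100.0 * rsbi30 / rsbi1,
        # Product of P0.1 and RSBI at 30 min (cmH2O·breaths/min/l)
        "P01xRSBI30": p01_30 * rsbi30,
    }


# Hypothetical patient: three readings of each quantity
example = candidate_predictors([78, 80, 82], [84, 88, 92], [3.0, 3.5, 4.0])
print(example)
```

In this made-up example the patient breathes at an RSBI of 80 breaths/min/l at 1 min and 88 breaths/min/l at 30 min, so ∆RSBI30 is 110% and P0.1 × RSBI30 is 308 cmH2O·breaths/min/l.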
Demographic characteristics (such as age, weight, gender and comorbid conditions), acute physiology and chronic health evaluation II (APACHE-II) score on admission, reasons for MV, and duration of MV prior to SBT were recorded. Electrocardiogram, heart rate (HR), arterial blood pressure and oxygen saturation by pulse oximetry (SpO2) were continuously monitored. Arterial blood samples were collected prior to SBT and at 30 min after extubation, and were immediately analyzed using an ABL 520 blood gas analyzer (Radiometer Medical ApS, Copenhagen, Denmark).
Statistical analysis
Data were presented as the mean ± standard deviation for continuous and ordinal variables, or medians (interquartile range) if the data were not normally distributed. For bivariate comparisons, the Wilcoxon test was used for all continuous variables, and Fisher's exact test for categorical variables. As the present study aimed to test the models and single predictors in a new population, the threshold values of the three predictors alone to predict ES were the same as those used or cited in previous relevant studies (2,17,24,25): P0.1 × RSBI30, ≤328 cmH2O·breaths/min/l; ∆RSBI30, ≤105%; RSBI1, ≤105 breaths/min/l.
Model validation
The original CART model was initially validated (17) by assessing its discrimination and calibration (agreement between predicted and actual probabilities of ES) properties in the validation sample. The probability of ES at each terminal node was calculated using the Laplace correction method (26,27) using the derivation cohort (17). Subsequently, the predicted probability of ES was obtainable for each patient in the validation cohort (Pori), as each patient was ultimately allocated to one of the five terminal nodes of the original CART model. Discrimination was assessed by calculating the AUC and the 95% confidence interval (CI) derived from bootstrapping 1,000 samples. In addition, the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the model were computed. A true-positive (TP) test result was defined as one that predicted ES and the extubation was successful. A true-negative (TN) test result was defined as one that predicted EF and extubation was unsuccessful. Model calibration was assessed by generating plots of actual (y-axis) and predicted probabilities (x-axis) of ES across quantiles of predicted probabilities (28).
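As an illustration of the Laplace correction step, the sketch below uses the usual form of the correction for a two-class tree, (k + 1)/(n + 2), where k is the number of ES patients among the n derivation patients in a terminal node. The node labels and counts are hypothetical and are not taken from the derivation study (17).

```python
def laplace_probability(n_success, n_total, n_classes=2):
    """Laplace-corrected class probability at a terminal node: (k + 1) / (n + C)."""
    if n_total < 0 or not 0 <= n_success <= n_total:
        raise ValueError("need 0 <= n_success <= n_total")
    return (n_success + 1) / (n_total + n_classes)


# Hypothetical derivation counts (ES patients, all patients) for five terminal nodes
derivation_nodes = {
    "node1": (20, 20),
    "node2": (15, 18),
    "node3": (6, 12),
    "node4": (2, 10),
    "node5": (0, 8),
}

node_probability = {
    node: laplace_probability(k, n) for node, (k, n) in derivation_nodes.items()
}

# Each validation patient falls into exactly one terminal node, which gives P_ori
patient_nodes = ["node1", "node3", "node5"]
p_ori = [node_probability[node] for node in patient_nodes]
print(p_ori)
```

The correction pulls extreme node estimates away from 0 and 1: a node in which all 20 derivation patients were successfully extubated receives a probability of 21/22 rather than 1, which avoids degenerate predictions when the tree is applied to new patients.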
Model updating
To improve the calibration, the original CART model was updated following standardized procedures. These included updating or adjusting the intercept, followed by more extensive updates if necessary, including model revision (20,29). Initially, the simple recalibration method was used. In brief, a logistic regression model was fitted with the predicted probability of the original CART model (Pori) as the only covariate in the validation set (Formula 1):
$$\ln\left(\frac{P(y=1)}{1-P(y=1)}\right) = a + \beta_{overall} \times \acute{y}$$
where y is the outcome in the validation cohort (ES=1, EF=0), a is the updated intercept, βoverall is the calibration slope, and ý is the predicted probability of the original CART model (Pori). Therefore, the updated model's predicted probabilities in the validation set (Pup) were calculated as follows (Formula 2):
$$P_{up} = \frac{1}{1+\exp\left[-\left(a + \beta_{overall} \times P_{ori}\right)\right]}$$
Formula 2 can be further transformed into Formula 3:
This updating method, known as the simple recalibration method, preserves the tree structure, but updates the predictions to obtain calibration-in-the-large (updating of intercept), and compensates for any overfitting that may have occurred at model development (βoverall) (20).
In this method, there is no need to adjust the effects of the individual predictors (such as P0.1 × RSBI30, ∆RSBI30, RSBI1), which is likely to be the primary advantage of the current method compared with other more extensive updating methods (20). The performance of the updated model (discrimination and calibration) was assessed using the aforementioned methods.
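The simple recalibration step can be sketched as a two-parameter logistic regression of the outcome on Pori. The following pure-Python example fits the intercept and slope by Newton-Raphson; it is an illustration under stated assumptions, not the code used in the study (the analyses were run in SPSS, MedCalc and R), and the data are invented.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_recalibration(p_ori, y, max_iter=100, tol=1e-10):
    """Fit logit P(y=1) = a + b * p_ori by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for p, yi in zip(p_ori, y):
            mu = _sigmoid(a + b * p)
            w = mu * (1.0 - mu)
            r = yi - mu
            g0 += r            # score for the intercept
            g1 += r * p        # score for the slope
            h00 += w           # information matrix entries
            h01 += w * p
            h11 += w * p * p
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("p_ori must take at least two distinct values")
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


def updated_probability(p, a, b):
    """Recalibrated probability P_up for an original prediction p."""
    return _sigmoid(a + b * p)


# Invented validation data: five node probabilities, ten patients per node
node_values = [0.1, 0.3, 0.5, 0.7, 0.9]
successes = [2, 4, 6, 8, 9]
p_ori, y = [], []
for value, k in zip(node_values, successes):
    p_ori += [value] * 10
    y += [1] * k + [0] * (10 - k)

a, b = fit_recalibration(p_ori, y)
p_up = [updated_probability(p, a, b) for p in p_ori]
print(a, b, sum(p_up) / len(p_up), sum(y) / len(y))
```

Because the model includes an intercept, the mean of the updated probabilities equals the observed ES rate at convergence, which is the calibration-in-the-large property described above.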
Clinical usefulness
Decision curve analysis (DCA) (19) was used to quantify the clinical usefulness of the updated CART model. Prediction models often provide a result in continuous form, such as the probability of an event from 0 to 100%. To evaluate such a model using decision-analytic methods, the analyst is required to dichotomize the continuous result at a given threshold probability (Pt). At each Pt, the net benefit (NB) of using the model may be calculated using the following formula (19):
$$NB = \frac{TP}{N} - w \times \frac{FP}{N}$$
In this formula, TP and FP are the number of patients with true- and false-positive classification, respectively, N is the total number of patients and w is a weight equal to the odds of the threshold probability [Pt/(1−Pt)], being essentially the relative harm of a false-positive (FP) and a false negative (FN) classification. In the current model, patients with predicted probability of ES above and below the Pt were classified as positive and negative, respectively. Thus, the key aspect of DCA is that a single Pt may be used to categorize patients as positive or negative and to weight FP and FN classifications (19). In practice, it is often difficult to define the optimal Pt precisely. A decision curve was constructed, which addresses this problem by assessing the NB of using a model in supporting decision-making across a full range of Pts (19,30). To compare the NB of the updated model with extubating patients according to the current strategy at a specified Pt, the difference in NB (∆NB) and corresponding bootstrap CI were computed. If the 95% CI did not include zero, then a statistically significant difference at the 0.05 level was declared (30).
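A minimal sketch of the NB calculation follows, using NB = TP/N − w × FP/N with w = Pt/(1 − Pt) and a percentile bootstrap for the difference in NB. The function names, the example data and the choice of a simple percentile interval are assumptions made for illustration; the "current strategy" is represented as classifying every patient as positive (extubate all).

```python
import random


def net_benefit(p_pred, y, pt):
    """NB of extubating only patients whose predicted probability of ES exceeds pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for p, yi in zip(p_pred, y) if p > pt and yi == 1)
    fp = sum(1 for p, yi in zip(p_pred, y) if p > pt and yi == 0)
    w = pt / (1.0 - pt)  # odds of the threshold probability
    return tp / n - w * fp / n


def net_benefit_extubate_all(y, pt):
    """NB of the current strategy: every patient is classified as positive."""
    n = len(y)
    events = sum(y)
    w = pt / (1.0 - pt)
    return events / n - w * (n - events) / n


def bootstrap_delta_nb(p_pred, y, pt, n_boot=1000, seed=0):
    """Percentile 95% CI for NB(model) - NB(extubate all)."""
    rng = random.Random(seed)
    n = len(y)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        pb = [p_pred[i] for i in idx]
        yb = [y[i] for i in idx]
        deltas.append(net_benefit(pb, yb, pt) - net_benefit_extubate_all(yb, pt))
    deltas.sort()
    return deltas[int(0.025 * n_boot)], deltas[int(0.975 * n_boot) - 1]


# Invented example: ten patients, outcome 1 = ES, 0 = EF
p_pred = [0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.3, 0.3, 0.2, 0.6]
y = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
delta = net_benefit(p_pred, y, 0.5) - net_benefit_extubate_all(y, 0.5)
print(delta, bootstrap_delta_nb(p_pred, y, 0.5))
```

Evaluating `net_benefit` over a grid of Pt values and plotting the results against Pt gives the decision curve; the bootstrap interval is the basis for the significance statement at a given Pt.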
The analyses were performed using SPSS software, version 21.0.0 (IBM SPSS, Armonk, NY, USA), MedCalc software, version 12.7.0.0 (MedCalc, Mariakerke, Belgium) and R software, version 3.0.1 (The R Foundation for Statistical Computing, Vienna, Austria).
Results
Participating ICUs
The median profile of the 10 participating ICUs was an 18-bed ICU in a tertiary hospital with 708 beds; ICUs included a median of 57 patients in the study. No ICU contributed <5% or >15% of the total sample. During the 6-month study period, 4,539 patients were admitted to the 10 ICUs.
Patient characteristics
As shown in Fig. 1, 530 patients were finally extubated after a successful SBT. A total of 72 patients (13.6%) were re-intubated and 18 patients remained on NIV as rescue therapy within 48 h after extubation. The overall EF rate was 17%, as EF was defined as a requirement for any invasive or noninvasive ventilatory support. The reasons for reintubation and EF are presented in Table III. The case-mix in the present validation sample differed from that of the derivation cohort. Patients in the validation cohort were younger, and presented with an increased APACHE-II score (mean difference, 1.5; 95% CI, 0.45 to 2.58) compared with the derivation cohort. The median duration of MV in the validation set was 8 days (interquartile range, 6 to 11), which was significantly shorter (P<0.001) than the 10 days (interquartile range, 9 to 14) in the derivation set. Relative to the derivation sample, the external validation sample represented a more diverse population in terms of disease spectrum and settings (two medical ICUs, two surgical ICUs, three general ICUs, one neurosurgical ICU and two coronary care units). Furthermore, the validation sample included a higher proportion of patients with acute respiratory distress syndrome (ARDS; 16.2 vs. 6.6%, P=0.02) and a decreased proportion of patients with chronic obstructive pulmonary disease (COPD; 8.5 vs. 35.2%, P<0.001) compared with the derivation sample (Table IV). Respiratory variables during SBT are presented in Table V.
Table III.
Cause | Reintubation (n=72) | Extubation failure (n=90) |
---|---|---|
No improvement in signs of muscle fatigue (a) | 25 (34.7) | 37 (41.1) |
Hypoxemia | 27 (37.5) | 31 (34.4) |
No improvement in respiratory acidosis | 5 (6.9) | 7 (7.8) |
Neurological deterioration (b) | 6 (8.3) | 6 (6.7) |
Excess respiratory secretions | 4 (5.6) | 4 (4.4) |
Hemodynamic instability (c) | 5 (7.0) | 5 (5.6) |
(a) Such as the use of respiratory accessory muscles, paradoxical abdominal motion, or retraction of the intercostal spaces.
(b) Defined as the presence of at least one of the following: i) Psychomotor agitation inadequately controlled by sedation; and ii) decreased consciousness, rendering the patient unable to tolerate noninvasive ventilation (NIV).
(c) With a systolic blood pressure <90 mmHg despite adequate volume challenge, the use of vasopressors, or both.
Extubation failure includes re-intubated patients and patients who remained on NIV rescue therapy within 48 h following extubation.
Table IV.
Parameter | Validation cohort (n=530) |
---|---|
Age (years) | 67 (60,72) |
Gender [male (%)] | 315 (59.4) |
APACHE II score | 21.5±4.9 |
PSV (cmH2O) | 11.3±1.8 |
MBP (mmHg) | 89.7±8.8 |
Duration of MV (days) | 8 (6,11) |
PaO2/FiO2 (mmHg) | 261±78 |
PaCO2 (mmHg) | 41±7 |
HR (beats/min) | 84±15 |
Extubation failure [n (%)] | 90 (17.0) |
Reason for MV [n (%)] | |
COPD | 45 (8.5) |
Pneumonia | 122 (23.0) |
Septic shock | 73 (13.8) |
ARDS | 86 (16.2) |
Congestive heart failure | 70 (13.2) |
Postoperative acute respiratory failure | 36 (6.8) |
Multiple trauma without brain injury | 45 (8.5) |
Cardiac arrest | 4 (0.8) |
Neurological disease | 49 (9.2) |
Data expressed as the mean ± standard deviation or as the median (interquartile range) if the data were not normally distributed. APACHE II, acute physiology and chronic health evaluation; PSV, pressure support ventilation; MBP, mean blood pressure; MV, mechanical ventilation; PaO2, partial pressure of oxygen in arterial blood; FiO2, fraction of inspired oxygen; PaCO2, partial pressure of carbon dioxide in arterial blood; HR, heart rate; COPD, chronic obstructive pulmonary disease; ARDS, acute respiratory distress syndrome.
Table V.
Parameter | Extubation failure (n=90) | Extubation success (n=440) | P-value |
---|---|---|---|
Age (years) | 69 (65,74) | 67 (59,72) | 0.04 |
APACHE II score | 21.4±5.4 | 21.5±4.8 | 0.447 |
MBP (mmHg) | 88.6±9.3 | 89.8±8.7 | 0.17 |
HR (beats/min) | 86.6±15.7 | 82.8±15.8 | 0.035 |
Duration of MV (days) | 9 (7,12.3) | 8 (6,11) | 0.05 |
1 min of SBT | | | |
RSBI1 (breaths/min/l) | 91±32 | 75±36 | <0.001 |
P0.1 × RSBI1 (cmH2O·breaths/min/l) | 336±160 | 217±145 | <0.001 |
30 min of SBT | | | |
RSBI30 (breaths/min/l) | 107±36 | 69±31 | <0.001 |
P0.1 × RSBI30 (cmH2O·breaths/min/l) | 443±163 | 218±132 | <0.001 |
∆RSBI30 (%) | 125±59 | 99±36 | <0.001 |
60 min of SBT | | | |
RSBI60 (breaths/min/l) | 104±57 | 67±34 | <0.001 |
P0.1 × RSBI60 (cmH2O·breaths/min/l) | 420±251 | 211±133 | <0.001 |
Data expressed as the mean ± standard deviation or as the median (interquartile range) if the data were not normally distributed. SBT, spontaneous breathing trial; APACHE II, acute physiology and chronic health evaluation; MBP, mean blood pressure; HR, heart rate; MV, mechanical ventilation; P0.1, airway occlusion pressure; P0.1 × RSBI, product of airway occlusion pressure and RSBI; RSBIn, rapid shallow breathing index at n min; ∆RSBI30, change in RSBI at 30 min of SBT, expressed as a percentage of RSBI30 to RSBI1.
Model validation
In the validation cohort, the original CART model showed high discrimination (AUC, 0.91; 95% CI, 0.87 to 0.93), but moderate calibration, with a substantial mismatch between predicted and actual probabilities over the entire range of probabilities (Fig. 2A). A substantial improvement in the calibration was observed when the simple recalibration method (20) was used, obviating the need for more extensive updating methods. The calibration plot of the updated CART model exhibited an intercept of 0 and a calibration slope of 1, suggesting near-perfect calibration (Fig. 2B). The predicted probability of ES for an individual patient could be estimated using the updated CART model (Fig. 3).
The updated CART model exhibited the same discriminative ability as the original CART model in the validation set; the AUC was identical for the two models and significantly higher than that of any single predictor alone (P<0.001; Table VI). If a predicted probability >0.49 (determined by the Youden index) was used as the cut-off point for predicting ES, the updated CART model had a sensitivity of 93.4% (95% CI, 90.7 to 95.4%), a specificity of 80.3% (95% CI, 70.6 to 87.1%) and a diagnostic accuracy of 91.1% (95% CI, 88.7 to 93.6%) (Table VI).
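The choice of cut-off by the Youden index can be illustrated as follows. The Youden index is J = sensitivity + specificity − 1, and the cut-off with the largest J is selected. The function name and the example data are hypothetical; a prediction strictly above the cut-off is treated as a predicted ES, in line with the ">0.49" rule in the text.

```python
def youden_cutoff(p_pred, y):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Patients with p > cutoff are classified as predicted ES (positive).
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(p_pred)):
        tp = sum(1 for p, yi in zip(p_pred, y) if p > c and yi == 1)
        tn = sum(1 for p, yi in zip(p_pred, y) if p <= c and yi == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Invented predictions from a four-node model; outcome 1 = ES, 0 = EF
p_pred = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8]
y = [0, 0, 0, 1, 1, 1, 1, 1]
cutoff, j = youden_cutoff(p_pred, y)
print(cutoff, j)
```

Because a tree model yields only as many distinct predicted probabilities as it has terminal nodes, the search needs to consider only those few candidate cut-offs.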
Table VI.
Variable | RSBI30 | ∆RSBI30 | P0.1 × RSBI30 | Original CART model | Updated CART model |
---|---|---|---|---|---|
AUC | 0.69 (0.65 to 0.73); Ref. | 0.70 (0.65 to 0.73); z=0.09, P=0.93 | 0.83 (0.79 to 0.86); z=5.15, P<0.001 | 0.89 (0.87 to 0.93) (c,d); z=7.75, P<0.001 | 0.89 (0.87 to 0.93) (c,d); z=7.75, P<0.001 |
Sensitivity (a) | 85.5 (81.9 to 88.4) | 72.7 (68.4 to 76.7) | 76.4 (72.2 to 80.1) | 93.4 (90.7 to 95.4) | 93.4 (90.7 to 95.4) |
Specificity (a) | 52.2 (42 to 62.2) | 65.6 (55.3 to 74.6) | 88.9 (80.7 to 93.9) | 80.3 (70.6 to 87.1) | 80.3 (70.6 to 87.1) |
PPV (a) | 89.7 (86.5 to 92.3) | 91.2 (87.7 to 93.7) | 97.1 (94.8 to 98.4) | 95.8 (93.5 to 97.3) | 95.8 (93.5 to 97.3) |
NPV (a) | 42.3 (33.6 to 51.6) | 33.0 (26.5 to 40.1) | 43.5 (36.5 to 50.7) | 71.3 (61.8 to 79.2) | 71.3 (61.8 to 79.2) |
NB (Pt=50%) (b) | 0 (−0.68 to 0.83) | 0 (−0.02 to 0.01) | 0 (−0.47 to 0.57) | 8.0 (4.86 to 12.7) | 8.0 (4.86 to 12.7) |
NB (Pt=75%) (b) | 14.5 (6.6 to 22.2) | 10.7 (2.0 to 19.4) | 25.6 (15.2 to 35.2) | 35.3 (26.2 to 44.2) | 35.3 (26.2 to 44.2) |
NB (Pt=80%) (b) | 23.3 (10.8 to 27.7) | 21.8 (6.3 to 27.6) | 40.7 (22.5 to 45.7) | 48.9 (32.4 to 53.3) | 48.9 (32.4 to 53.3) |
(a) Threshold values of each single predictor for predicting successful extubation were: RSBI30, ≤105 breaths/min/l; ∆RSBI30, ≤105%; and P0.1 × RSBI30, ≤328 cmH2O·breaths/min/l.
(b) Compared with the current strategy: Patients who successfully pass the spontaneous breathing trial (SBT) should be extubated if neurological status, excessive secretions, adequate cough and airway obstruction are not issues.
(c) P<0.001 vs. ∆RSBI30.
(d) P<0.001 vs. P0.1 × RSBI30.
The optimal threshold values of the two models to predict successful extubation were: Predicted probability >0.4 for the original CART model and predicted probability >0.49 for the updated CART model. Values in parentheses are 95% confidence intervals. RSBI30, rapid shallow breathing index at 30 min of SBT; ∆RSBI30, change in RSBI at 30 min of SBT, expressed as the percentage of RSBI30 to RSBI1; P0.1 × RSBI30, product of airway occlusion pressure and RSBI at 30 min of SBT; CART, classification and regression tree; AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value; NB, net benefit; Pt, threshold probability; Ref., reference value for AUC comparisons.
DCA
The results of the DCA are presented in Fig. 4 and Table VI. The updated CART model displayed an improved NB compared with the ‘current strategy’, ∆RSBI30, RSBI30 and P0.1 × RSBI30 across a range of clinically plausible Pts (50 to 80%).
Discussion
The accurate prediction of extubation outcome remains challenging. Using a multicenter prospective cohort of mechanically ventilated patients in 10 Chinese ICUs, the previously developed CART model (17) for predicting ES after a successful SBT was externally validated, and an updated version of this model was proposed to improve calibration. The updated CART model outperformed the use of the single predictors alone and the current strategy with regard to the clinical NB.
Validations of predictive models are necessary to verify their generalizability to new sites, as performance in the original data may be optimistic; however, temporal and external validation studies are limited (16). Prediction models that are not validated are typically not sufficiently developed for clinical application (18). Models predicting ES are usually submitted only to internal validation by the bootstrapping (14) or split-sample method (31). Results are often accepted without sufficient regard to the importance of external validation (16,20), limiting the generalizability of a prediction model to future settings. To the best of our knowledge, the current study is the first to validate a prediction model for extubation outcome based on a multi-center population.
It has been suggested that a formal validation study should consist of an adequate sample of different but related patients compared with the derivation population. This standard required for the validation of a prediction model (20) has been fulfilled in the present study. In this study, the group of patients used for the validation of the CART model differed in numerous aspects, such as age, APACHE-II score and prevalence of disease, from the group of patients used for derivation. This is known as a difference in case-mix. However, this difference in case-mix may result in the poor calibration of the original model in the validation population (20,29). As demonstrated in Fig. 2A, the original CART model systematically underestimated the probabilities of ES. This type of miscalibration, with a slope ~1 and a non-zero intercept, is a typical result in external validation studies, as described by Steyerberg (20). This result indicates that certain patient characteristics (such as age, APACHE-II score and prevalence of disease), which were not included in the prediction model, were distributed differently between the derivation and validation sets (20,29). In such cases, a simple recalibration method is often sufficient and preferable (20,29). As expected, the model updated with this simple recalibration method showed excellent calibration (Fig. 2B), indicating that the updated CART model was able to provide accurate predicted probabilities of ES in the new population.
If an existing model exhibits worse performance in another population, newly collected patient data are often used to develop a new prediction model instead of validating and updating the existing one (29,32,33). This may lead to a loss of the scientific information captured in the previous (development) study, which runs counter to the notion that inferences and guidelines to enhance evidence-based medicine should be based on as much information as possible (34). A superior alternative to redeveloping new models in each new patient sample is to update existing prediction models and adjust or recalibrate them to the local circumstances or setting of the validation sample under investigation (29,34,35). Validation and updating, as conducted in the present study, may lead to more stable and generalizable prediction models, since updated models combine the information captured in the original model with information from new individuals (32,36,37). Therefore, validation and updating may be preferable to the development of new models.
Notably, the updated CART model produced an identical AUC (0.91) to the original one. This similar discriminatory ability may be explained by the simple re-calibration method employed to update the original CART model (20). This updating strategy preserves the tree structure, but re-estimates the predictions of the outcome in each of the five terminal nodes of the original CART model. This updating method involves refitting the intercept to produce an average predicted probability equal to the observed overall event rate and, if necessary, the calibration slope (overall weights of the predictors) (20,29,32). This method would not alter the relative ranking of the predicted probabilities (20,29,34,38), and thus would not affect a model's discrimination. However, the updated CART model, which exhibited near-perfect calibration, remains preferable compared with the original. Furthermore, the evaluation of calibration is more important if model predictions are used to inform patients or physicians to make clinical decisions (20,30,32).
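The reason why recalibration leaves the AUC unchanged can be checked numerically: with a positive calibration slope, the logistic recalibration is a strictly increasing function of Pori, so the ranking of patients, and hence the AUC, is preserved. The sketch below uses invented data and arbitrary illustrative values for the intercept and slope.

```python
import math


def auc(scores, y):
    """AUC as the Mann-Whitney statistic; tied pairs count one half."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def recalibrate(p, a, b):
    """Logistic recalibration of an original predicted probability p."""
    return 1.0 / (1.0 + math.exp(-(a + b * p)))


# Invented predictions from a five-node tree; outcome 1 = ES, 0 = EF
p_ori = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

a, b = -1.2, 3.5  # hypothetical intercept and positive slope
p_up = [recalibrate(p, a, b) for p in p_ori]

print(auc(p_ori, y), auc(p_up, y))
```

The two printed values coincide, whereas the predicted probabilities themselves change, which is why calibration has to be assessed separately from discrimination.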
The original and updated CART models were able to discriminate effectively, with an identical AUC (0.91) that was significantly higher compared with any single predictive factor alone (such as RSBI, P0.1 × RSBI30) in this validation cohort. In the present study, EF occurred in 17% (90/530) of the extubated patients who completed the SBT and were judged appropriate by their physician for extubation. If a predicted probability >0.49 (determined by the Youden index) was used as the cut-off point for predicting ES (Table VI), the updated CART model correctly predicted extubation outcome in 91.1% of patients that successfully passed the SBT, with a sensitivity of 93.4%, a specificity of 80.3% and an AUC of 0.91, identifying 72/90 EF patients. Of the 18 EF patients who were not identified by the model, 11 experienced EF for reasons other than acute respiratory failure or respiratory distress, such as neurological deterioration or excess respiratory secretions. Indeed, the etiology of EF influences outcome, with the highest mortality for respiratory failure or respiratory distress (8). Among the EF patients in the present study, 83% exhibited acute respiratory failure or respiratory distress as the reason for EF, which is consistent with a prior study (8). The updated CART model identified 91% of patients with EF resulting from respiratory failure or respiratory distress, which may be clinically useful.
The CART model possesses a number of clear advantages. First, the updated CART model (Fig. 3) is practical as it is simple to use, easy to remember with a graphical presentation, does not require any calculations and may be applied anywhere in an ICU. By contrast, traditional models based on logistic regression are complicated to use, require extensive calculation and may not be practical for application in ICUs that have a large workload. Second, the CART model consists of three clearly defined predictors that do not require any subjective interpretation. Third, the generalizability of the CART model is supported by its external validation and updating in diverse clinical settings. The more numerous and diverse the settings in which the model is demonstrated to be valid, the more likely it is that the model will be applicable to an untested setting (15). Indeed, the updated model combines the information captured in the original model with information from new individuals. Hence, the updated CART model is based on data used in the development and validation studies, further improving its stability and generalizability (20,29). Fourth, the patients included in the present study sample represent a broad disease spectrum, including the majority of diseases leading to invasive MV, including postoperative acute respiratory failure, multiple trauma, cardiac arrest, neurological disease, congestive heart failure, ARDS, septic shock, pneumonia and COPD.
As the CART model has been validated and updated in a new patient population with a varying case-mix, the final CART model for predicting ES appears to be robust, stable and reliable. We classify our model, which was validated across diverse medical settings, as level 2 on the hierarchy (15). Ideally, level 2 should be followed by impact evaluation (level 1). The current data suggest that strict application of the updated CART model in the studied patient population would reduce the incidence of EF from 17 to 3.4%, but increase the occurrence of unnecessarily delayed extubation from 0 to 5.4%. As both unnecessarily delayed extubation and EF are associated with increased mortality, traditional metrics, such as sensitivity, specificity and AUC, are unable to determine whether the CART model is sufficiently accurate to be beneficial in a clinical context (19). Decision-analytic methods incorporate consequences and, in theory, may resolve this ambiguity. For example, if a clinician assumes that the harm associated with an unexpected EF (FP classification) is equal to the harm of an unnecessary delay in extubation (FN classification), then w=1 and Pt=0.5. The NB of 0.08 at a Pt of 50% means that extubating patients according to the updated CART model, compared with using the current strategy, leads to the equivalent of a net 8.0 TP results per 100 patients without an increase in the number of FP classifications. The NB formula (19) calculates that this is the equivalent of a net 8.0 fewer FP results per 100 patients. In other words, use of the updated CART model may lead to the equivalent of an 8.0% reduction in the number of patients experiencing EF, with no increase in the number of patients whose extubation is delayed unnecessarily. As it may be difficult to define this threshold in practice, a full range of Pts was considered. The updated CART model is improved compared with the current strategy across a wide range of Pts (50 to 80%).
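The NB arithmetic at Pt=0.5 can be retraced with a short Python sketch using the standard decision curve formula, NB = TP/n − (FP/n) × Pt/(1 − Pt) (19). The counts below are not reported as such in the text: they are an assumption, reconstructed from the reported figures (530 patients, 90 EF, 72 EF patients identified by the model, sensitivity of 93.4% for ES), and the current strategy is assumed to be extubating every patient who passed the SBT. The function and variable names are hypothetical.

```python
def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    """Net benefit at threshold probability pt, following reference 19.

    tp: patients correctly predicted to have ES
    fp: patients predicted to have ES who experienced EF (unexpected EF)
    """
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    w = pt / (1.0 - pt)  # relative weight of an FP; w = 1 when pt = 0.5
    return tp / n - w * fp / n


# Counts reconstructed from the reported percentages (an assumption).
N = 530           # patients who passed the SBT and were extubated
N_EF = 90         # patients with EF
N_ES = N - N_EF   # 440 patients with ES
MODEL_TP = 411    # approximately 93.4% of the 440 ES patients
MODEL_FP = 18     # 90 EF patients minus the 72 identified by the model

pt = 0.5
nb_model = net_benefit(MODEL_TP, MODEL_FP, N, pt)
nb_extubate_all = net_benefit(N_ES, N_EF, N, pt)  # assumed current strategy
nb_gain = nb_model - nb_extubate_all

print(f"NB (updated CART model): {nb_model:.3f}")
print(f"NB (extubate all):       {nb_extubate_all:.3f}")
print(f"Difference:              {nb_gain:.3f}")
```

Under these assumptions the difference is about 0.08, in line with the value quoted above. The sketch holds the classification fixed and evaluates a single threshold, whereas a full decision curve would reclassify patients at each Pt.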
It is crucial to emphasize that the CART model is not able to predict EF caused by airway protection issues. Adding factors of airway competence (such as cough strength and secretion volume) may have improved the performance of the model. However, in recent studies of extubation outcome (6,8,12,22,24), the authors reported that retained secretions and/or weak cough accounted for only 6.2% (range, 0 to 9.9%) of the EFs, which is consistent with the present results. The low frequency of abundant secretions in these studies suggests that, a number of years after the increased risk of EF in the presence of copious secretions and weak cough was initially reported, physicians have become proficient at extracting principles that emerge from research studies and incorporating them into everyday clinical practice. Thus, adding these quantitative measures of cough strength and quantity of endotracheal secretions is unlikely to produce a substantial improvement in the performance of the updated CART model.
A number of limitations of the present study require consideration. First, as with all observational studies on extubation outcome, a selection bias may have occurred, as the predefined criteria for reintubation included certain subjective indices, including ‘agitation’, ‘massive’ and ‘decreased consciousness’. However, decisions regarding reintubation in the ICUs were made by board-certified intensive care specialists using the standard policy in the institutions. Furthermore, the reintubation and EF rates (17%) in the present study are consistent with those observed in previous studies (4,6,8). Although never subjected to rigorous cost/benefit analyses, reintubation rates of 5 to 20% are generally considered reasonable, and are typical for the majority of effectively managed ICUs (3,40). That the CART model reintubation rate (13.6%) falls within this range suggests that the impact of this selection bias on the results was minimal. A second limitation was that the present study was conducted in a limited number of centers within a single country. The level of generalizability of the present results to other institutions that have differences in case-mix is not clear. However, the CART model has been validated and adjusted in a new population with a wide spectrum of disease across diverse medical settings, further improving its stability and generalizability. Therefore, in theory, the CART model may be applicable to other clinical settings with a high degree of confidence (15). However, external validation is a dynamic process and a number of issues remain unresolved. For example, it is not obvious when a model may be considered sufficiently validated and updated. The extent to which this process of model validation and adjustment requires continuation prior to clinical application will depend on the context (32), and general rules are as yet unavailable.
Considering the limited number of centers involved in the present validation study, the updated CART model may require further repeated validation studies in hospitals or institutions in other geographical areas, preferably in different countries, to improve its stability and generalizability. Third, a selection bias may have been present, as the use of ATC has the potential to allow marginal patients, who may subsequently develop ventilatory failure following extubation, to tolerate the SBT. However, Elsasser et al observed that the built-in commercial ATC may provide adequate inspiratory tube compensation with minimal overassistance (41). This conclusion is further supported by the observation that the reintubation rate in the present study (13.6%) is comparable with that reported from other trials assessing outcome after SBTs (4,6,8).
The original CART model discriminated effectively between patients who were extubated successfully following an SBT and those who were not; however, in the present validation set, calibration was modest. The model was improved using a simple re-calibration method. The updated CART model represents a simple approach that may be used by clinicians to estimate the predicted probability of ES following a successful SBT for individual patients. The clinical consequence of applying the updated CART model appears to be substantial in terms of the potential reduction in unexpected EF. Future studies are required to further validate and, if required, update the CART model in hospitals or institutions in other geographical areas, within China or across countries, that have differences in case-mix, as the reliable prediction of the probability of ES after a successful SBT is of substantial practical value.
Acknowledgements
The authors thank the nursing staff, attending physicians and respiratory therapists at each of the participating sites for their cooperation in this study. The authors also wish to thank Professor Mao-Ti Wei for his statistical guidance on the decision curve analysis.
Glossary
Abbreviations
- APACHE II: acute physiology and chronic health evaluation II
- ARDS: acute respiratory distress syndrome
- ATC: automatic tube compensation
- AUC: area under the receiver operating characteristic curve
- CART: classification and regression tree
- CI: confidence interval
- COPD: chronic obstructive pulmonary disease
- DCA: decision curve analysis
- ES: successful extubation
- EF: extubation failure
- FN: false-negative
- FP: false-positive
- ICU: intensive care unit
- MV: mechanical ventilation
- NB: net benefit
- NIV: non-invasive ventilation
- NPV: negative predictive value
- OR: odds ratio
- P0.1: airway occlusion pressure
- PPV: positive predictive value
- Pt: threshold probability
- RSBI: rapid shallow breathing index
- RSBI1: RSBI at 1 min of SBT
- RSBI30: RSBI at 30 min of SBT
- ∆RSBI30: ratio of RSBI30 to RSBI1 as a percentage
- SBT: spontaneous breathing trial
- TN: true-negative
- TP: true-positive
References
- 1. Boles JM, Bion J, Connors A, et al. Weaning from mechanical ventilation. Eur Respir J. 2007;29:1033–1056. doi: 10.1183/09031936.00010206.
- 2. MacIntyre NR, Cook DJ, Ely EW, Jr, et al. Evidence-based guidelines for weaning and discontinuing ventilatory support: A collective task force facilitated by the American College of Chest Physicians; the American Association for Respiratory Care; and the American College of Critical Care Medicine. Chest. 2001;120(Suppl 6):375S–395S. doi: 10.1378/chest.120.6_suppl.375S.
- 3. Macintyre NR. Evidence-based assessments in the ventilator discontinuation process. Respir Care. 2012;57:1611–1618. doi: 10.4187/respcare.02055.
- 4. Esteban A, Alia I, Tobin MJ, et al. Spanish Lung Failure Collaborative Group: Effect of spontaneous breathing trial duration on outcome of attempts to discontinue mechanical ventilation. Am J Respir Crit Care Med. 1999;159:512–518. doi: 10.1164/ajrccm.159.2.9803106.
- 5. Frutos-Vivar F, Esteban A, Apezteguia C, et al. Outcome of reintubated patients after scheduled extubation. J Crit Care. 2011;26:502–509. doi: 10.1016/j.jcrc.2010.12.015.
- 6. Perren A, Previsdomini M, Llamas M, et al. Patients' prediction of extubation success. Intensive Care Med. 2010;36:2045–2052. doi: 10.1007/s00134-010-1984-4.
- 7. Coplin WM, Pierson DJ, Cooley KD, Newell DW, Rubenfeld GD. Implications of extubation delay in brain-injured patients meeting standard weaning criteria. Am J Respir Crit Care Med. 2000;161:1530–1536. doi: 10.1164/ajrccm.161.5.9905102.
- 8. Thille AW, Harrois A, Schortgen F, Brun-Buisson C, Brochard L. Outcomes of extubation failure in medical intensive care unit patients. Crit Care Med. 2011;39:2612–2618. doi: 10.1097/CCM.0b013e3182282a5a.
- 9. MacIntyre N. Ventilator discontinuation process: Evidence and guidelines. Crit Care Med. 2008;36:329–330. doi: 10.1097/01.CCM.0000297958.82589.E2.
- 10. Jubran A, Tobin MJ. Pathophysiologic basis of acute respiratory distress in patients who fail a trial of weaning from mechanical ventilation. Am J Respir Crit Care Med. 1997;155:906–915. doi: 10.1164/ajrccm.155.3.9117025.
- 11. Jubran A, Grant BJ, Laghi F, Parthasarathy S, Tobin MJ. Weaning prediction: Esophageal pressure monitoring complements readiness testing. Am J Respir Crit Care Med. 2005;171:1252–1259. doi: 10.1164/rccm.200503-356OC.
- 12. Frutos-Vivar F, Ferguson ND, Esteban A, et al. Risk factors for extubation failure in patients following a successful spontaneous breathing trial. Chest. 2006;130:1664–1671. doi: 10.1378/chest.130.6.1664.
- 13. Bien MY, Shui Lin Y, Shih CH, et al. Comparisons of predictive performance of breathing pattern variability measured during T-piece, automatic tube compensation and pressure support ventilation for weaning intensive care unit patients from mechanical ventilation. Crit Care Med. 2011;39:2253–2262. doi: 10.1097/CCM.0b013e31822279ed.
- 14. Mokhlesi B, Tulaimat A, Gluckman TJ, Wang Y, Evans AT, Corbridge TC. Predicting extubation failure after successful completion of a spontaneous breathing trial. Respir Care. 2007;52:1710–1717.
- 15. McGinn TG, Guyatt GH, Wyer PC, Naylor CD, Stiell IG, Richardson WS. Evidence-based Medicine Working Group: Users' guides to the medical literature: XXII: how to use articles about clinical decision rules. JAMA. 2000;284:79–84. doi: 10.1001/jama.284.1.79.
- 16. Altman DG, Vergouwe Y, Royston P, Moons KG. Prognosis and prognostic research: Validating a prognostic model. BMJ. 2009;338:b605. doi: 10.1136/bmj.b605.
- 17. Liu Y, Wei LQ, Li GQ, et al. A decision-tree model for predicting extubation outcome in elderly patients after a successful spontaneous breathing trial. Anesth Analg. 2010;111:1211–1218. doi: 10.1213/ANE.0b013e3181f4e82e.
- 18. Adams ST, Leveson SH. Clinical prediction rules. BMJ. 2012;344:d8312. doi: 10.1136/bmj.d8312.
- 19. Vickers AJ, Elkin EB. Decision curve analysis: A novel method for evaluating prediction models. Med Decis Making. 2006;26:565–574. doi: 10.1177/0272989X06295361.
- 20. Steyerberg EW, editor. Clinical Prediction Models: A Practical Approach to Development, Validation and Updating. Springer; New York: 2009. 19 Patterns of External Validity; pp. 333–360.
- 21. Ferrer M, Valencia M, Nicolas JM, Bernadich O, Badia JR, Torres A. Early noninvasive ventilation averts extubation failure in patients at risk: a randomized trial. Am J Respir Crit Care Med. 2006;173:164–170. doi: 10.1164/rccm.200505-718OC.
- 22. Esteban A, Frutos-Vivar F, Ferguson ND, et al. Noninvasive positive-pressure ventilation for respiratory failure after extubation. N Engl J Med. 2004;350:2452–2460. doi: 10.1056/NEJMoa032736.
- 23. Kuhlen R, Hausmann S, Pappert D, Slama K, Rossaint R, Falke K. A new method for P0.1 measurement using standard respiratory equipment. Intensive Care Med. 1995;21:554–560. doi: 10.1007/BF01700159.
- 24. Segal LN, Oei E, Oppenheimer BW, et al. Evolution of pattern of breathing during a spontaneous breathing trial predicts successful extubation. Intensive Care Med. 2010;36:487–495. doi: 10.1007/s00134-009-1735-6.
- 25. Tobin MJ, Jubran A. In: Principles and Practice of Mechanical Ventilation. 2nd edition. Tobin MJ, editor. McGraw-Hill; New York, NY: 2006. Weaning from mechanical ventilation; pp. 1185–1220.
- 26. Perlich C, Provost F, Simonoff JS. Tree induction vs. logistic regression: A learning-curve analysis. J Mach Learn Res. 2003;4:211–255.
- 27. Provost F, Domingos P. Tree induction for probability-based ranking. Machine Learning. 2003;52:199–215. doi: 10.1023/A:1024099825458.
- 28. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology. 2010;21:128–138. doi: 10.1097/EDE.0b013e3181c30fb2.
- 29. Janssen KJ, Moons KG, Kalkman CJ, Grobbee DE, Vergouwe Y. Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol. 2008;61:76–86. doi: 10.1016/j.jclinepi.2007.04.018.
- 30. Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, Van Calster B. Assessing the incremental value of diagnostic and prognostic markers: A review and illustration. Eur J Clin Invest. 2012;42:216–228. doi: 10.1111/j.1365-2362.2011.02562.x.
- 31. Vassilakopoulos T, Routsi C, Sotiropoulou C, et al. The combination of the load/force balance and the frequency/tidal volume can predict weaning outcome. Intensive Care Med. 2006;32:684–691. doi: 10.1007/s00134-006-0104-y.
- 32. Moons KG, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: Application and impact of prognostic models in clinical practice. BMJ. 2009;338:b606. doi: 10.1136/bmj.b606.
- 33. Reilly BM, Evans AT. Translating clinical research into clinical practice: Impact of using prediction rules to make decisions. Ann Intern Med. 2006;144:201–209. doi: 10.7326/0003-4819-144-3-200602070-00009.
- 34. Moons KG, Kengne AP, Grobbee DE, et al. Risk prediction models: II. External validation, model updating and impact assessment. Heart. 2012;98:691–698. doi: 10.1136/heartjnl-2011-301247.
- 35. van Houwelingen HC. Validation, calibration, revision and combination of prognostic survival models. Stat Med. 2000;19:3401–3415. doi: 10.1002/1097-0258(20001230)19:24<3401::AID-SIM554>3.0.CO;2-2.
- 36. Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD. Validation and updating of predictive logistic regression models: A study on sample size and shrinkage. Stat Med. 2004;23:2567–2586. doi: 10.1002/sim.1844.
- 37. Moons KG. Criteria for scientific evaluation of novel markers: A perspective. Clin Chem. 2010;56:537–541. doi: 10.1373/clinchem.2009.134155.
- 38. Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: A review. J Clin Epidemiol. 2008;61:1085–1094. doi: 10.1016/j.jclinepi.2008.04.008.
- 39. Zapata L, Vera P, Roglan A, Gich I, Ordonez-Llanos J, Betbese AJ. B-type natriuretic peptides for prediction and diagnosis of weaning failure from cardiac origin. Intensive Care Med. 2011;37:477–485. doi: 10.1007/s00134-010-2101-4.
- 40. MacIntyre N. Discontinuing mechanical ventilatory support: Removing positive pressure ventilation vs removing the artificial airway. Chest. 2006;130:1635–1636. doi: 10.1378/chest.130.6.1635.
- 41. Elsasser S, Guttmann J, Stocker R, Mols G, Priebe HJ, Haberthür C. Accuracy of automatic tube compensation in new-generation mechanical ventilators. Crit Care Med. 2003;31:2619–2626. doi: 10.1097/01.CCM.0000094224.78718.2A.