Key Points
Question
Is use of the risk-adjusted cumulative sum (CUSUM) associated with improvement in early detection of hospitals with excess perioperative mortality relative to episodic performance evaluation?
Findings
In this national, hospital-level, comparative effectiveness study of 697 566 patients undergoing surgery across 104 Veterans Affairs hospitals, the CUSUM identified hospitals with excess perioperative mortality significantly earlier than episodic evaluation and was associated with future performance.
Meaning
Findings suggest that the CUSUM represents a useful tool that could be implemented within current national quality improvement programs and would likely enhance both quality and performance improvement efforts.
Abstract
Importance
National surgical quality improvement programs lack tools for early detection of quality or safety concerns, which risks patient safety because of delayed recognition of poor performance.
Objective
To compare the risk-adjusted cumulative sum (CUSUM) with episodic evaluation for early detection of hospitals with excess perioperative mortality.
Design, Setting, and Participants
National, observational, hospital-level, comparative effectiveness study of 697 566 patients. Identification of hospitals with excess, risk-adjusted, quarterly 30-day mortality using observed to expected ratios (ie, current criterion standard in the Veterans Affairs Surgical Quality Improvement Program) was compared with the risk-adjusted CUSUM. Patients included in the study underwent a noncardiac operation at a Veterans Affairs hospital, had a record in the Veterans Affairs Surgical Quality Improvement Program (January 1, 2011, through December 31, 2016), and were aged 18 years or older.
Main Outcome and Measure
Number of hospitals identified as having excess risk-adjusted 30-day mortality.
Results
The cohort included 697 566 patients treated at 104 hospitals across 24 quarters. The mean (SD) age was 60.9 (13.2) years, 91.4% were male, and 8.6% were female. For each hospital, the median number of quarters detected with observed to expected ratios, at least 1 CUSUM signal, and more than 1 CUSUM signal was 2 quarters (IQR, 1-4 quarters), 8 quarters (IQR, 4-11 quarters), and 3 quarters (IQR, 1-4 quarters), respectively. During 2496 total quarters of data, outlier hospitals were identified 33.3% of the time (830 quarters) with at least 1 CUSUM signal within a quarter, 12.5% (311 quarters) with more than 1 CUSUM signal, and 11.0% (274 quarters) with observed to expected ratios at the end of the quarter. The CUSUM detection occurred a median of 49 days (IQR, 25-63 days) before observed to expected ratio reporting (1 signal, 35 days [IQR, 17-54 days]; 2 signals, 49 days [IQR, 26-61 days]; 3 signals, 58 days [IQR, 44-69 days]; ≥4 signals, 49 days [IQR, 42-69 days]; trend test, P < .001). Of 274 hospital quarters detected with observed to expected ratios, 72.6% (199) were concurrently detected by at least 1 CUSUM signal vs 42.7% (117) by more than 1 CUSUM signal. There was a dose-response relationship between the number of CUSUM signals in a quarter and the median observed to expected ratio (0 signals, 0.63; 1 signal, 1.28; 2 signals, 1.58; 3 signals, 2.08; ≥4 signals, 2.49; trend test, P < .001).
Conclusions
This study found that with CUSUM, hospitals with excess perioperative mortality can be identified well in advance of standard end-of-quarter reporting, which suggests episodic evaluation strategies fail to detect out-of-control processes and place patients at risk. Continuous performance evaluation tools should be adopted in national quality improvement programs to prevent avoidable patient harm.
This comparative effectiveness study investigates use of the risk-adjusted cumulative sum vs episodic evaluation for early detection of Veterans Affairs hospitals with excess perioperative mortality.
Introduction
After concerns were raised about the state of health care for veterans, the US Congress passed legislation in 1985 mandating that the US Department of Veterans Affairs (VA) “establish and conduct a comprehensive quality assurance program to monitor and evaluate the quality of health care furnished by the VA’s Department of Medicine and Surgery.”1 For surgery, this prompted the development and implementation of what is now known as the Veterans Affairs Surgical Quality Improvement Program (VASQIP) to collect, analyze, and report hospital-level, risk-adjusted, perioperative outcomes episodically (eg, quarterly) at all VA facilities.2,3 In the private sector, following the VASQIP template, the American College of Surgeons developed the National Surgical Quality Improvement Program (ACS-NSQIP).3 Although these types of quality improvement programs have been associated with improved quality and safety of surgical care in both VA facilities and the private sector, episodic reporting systems may delay recognition of poor outcomes.4,5 Consequently, hospitals with quality or safety concerns may be unaware of clinically significant issues, exposing future patients to problematic care processes or outcomes. Early recognition using continuous monitoring techniques could allow clinical and administrative leadership to address life-threatening quality gaps more proactively.
The cumulative sum (CUSUM) is a statistical process control technique initially used in the industrial setting to monitor the quality of production processes but subsequently adopted for health care use.6 In the 1990s, CUSUM methods were applied to monitor surgical quality, detect clinically significant deterioration in outcomes, and rapidly recognize outcome clusters for specific procedures.6,7,8 Although the lack of risk-adjustment techniques then precluded broader application, advances in computational technology have created new opportunities to use this methodology.9,10 The CUSUM is now used to monitor patient and graft survival after transplant across US transplant centers.11,12 Recent work suggests the risk-adjusted CUSUM can be applied to monitor perioperative outcomes across multiple specialties, facilitate early detection of hospitals with excess perioperative adverse events, and identify centers with quality issues undetected with existing episodic monitoring strategies.13 However, despite the risk of patient harm from delayed recognition (and correction) of errant care processes and outcomes, continuous monitoring tools, such as CUSUM, remain the exception rather than the norm in surgical quality improvement programs.
National surgical quality improvement programs use reporting and analytic frameworks that have remained largely unchanged since they were first introduced several decades ago.2,3 Like all of health care, quality improvement techniques need to incorporate technology and analytic strategies that can adapt to a rapidly changing health care landscape.14 A critical piece of information missing from current episodic monitoring strategies (such as the observed to expected ratio) is time: specifically, the timing of events in the numerator relative to one another. Episodic strategies are predicated on the number of events in the numerator relative to a denominator and ignore when those events occur. Put differently, using episodic analysis, 10 adverse outcomes observed after 100 operations will look the same whether they occur in 1 of every 10 cases or in 10 cases in a row. The value of CUSUM is that it incorporates time in the evaluation of performance. Although CUSUM monitoring is an established analytic method, the optimal manner to apply it for assessing multispecialty, perioperative outcomes has not been established. The aims of this study are to compare the potential benefit of continuously monitoring hospital performance using the CUSUM with standard episodic evaluation (quarterly observed to expected ratios) and to evaluate hospital-level factors associated with false-positive and false-negative CUSUM signaling.
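The point about timing can be illustrated with a minimal sketch. It is not the risk-adjusted, continuous-time CUSUM used in this study: it assumes a constant expected risk of 10% per operation and a simple one-sided cumulative sum of observed minus expected outcomes, and the names `oe_ratio` and `peak_cusum` are hypothetical.

```python
def oe_ratio(outcomes, expected_risk):
    """Observed events divided by expected events over the whole period."""
    return sum(outcomes) / (expected_risk * len(outcomes))


def peak_cusum(outcomes, expected_risk):
    """Peak of a one-sided cumulative sum of (observed - expected), floored at zero."""
    running, peak = 0.0, 0.0
    for event in outcomes:
        running = max(0.0, running + event - expected_risk)
        peak = max(peak, running)
    return peak


RISK = 0.10  # assumed constant expected risk per operation (illustrative only)

# 10 adverse outcomes in 100 operations: 1 in every 10 cases vs 10 cases in a row
spread = [1 if i % 10 == 9 else 0 for i in range(100)]
clustered = [0] * 45 + [1] * 10 + [0] * 45

for label, series in (("spread", spread), ("clustered", clustered)):
    print(label, round(oe_ratio(series, RISK), 2), round(peak_cusum(series, RISK), 2))
```

Under these assumptions both series have an observed to expected ratio of 1.0, but the accumulated excess peaks near 0.9 for the evenly spaced events and near 9.0 for the consecutive events, which is the information an episodic ratio discards.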
Methods
Data
VASQIP data were used to conduct a national, observational, hospital-level comparative effectiveness study of VA hospitals from January 1, 2011, through December 31, 2016. VASQIP is a mandatory, national, surgical quality improvement program for all VA hospitals with a surgical program. The quality and reliability of VASQIP data have previously been demonstrated.15 Data are collected by trained nurse abstractors and then validated and analyzed centrally by the VA National Surgery Office to create a quarterly VASQIP report provided to clinical and administrative leaders at each VA hospital. This comparative effectiveness study was approved by the institutional review board of the Baylor College of Medicine and the Michael E. DeBakey VA Medical Center Research and Development Committee. Because VASQIP is an existing data source, patient consent was waived. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
Study Cohort
Patients aged 18 years or older who underwent noncardiac surgery at a VA hospital were included (n = 702 475). Only noncardiac operations were included because cardiac VASQIP is a separate data set with different variables. For patients with more than 1 operation in a 30-day period, only data from the first operation were used. A small proportion of patients were excluded (n = 4909 [0.7%]): those with missing covariate data or vital status information; those recorded as having a cardiac, eye, or transplant procedure or a procedure assigned to a nonsurgical specialty; and those with a complication date before the recorded operation date. The final cohort comprised 697 566 patients.
Quarterly Observed to Expected Ratio Method
Quarterly, hospital-specific, 30-day mortality observed to expected ratios were used as the control (ie, current criterion standard in the VASQIP) because this is the approach used by the VA National Surgery Office to identify hospitals with quality or safety concerns. Expected mortality rates were derived from estimates of a multivariable logistic regression model with the following covariates: age, sex, American Society of Anesthesiologists classification, procedural urgency, history of severe chronic obstructive pulmonary disease, history of neurologic event, preoperative functional status, history of diabetes, preoperative weight loss of more than 10% body weight within 6 months of surgery, procedural complexity, and surgical specialty.16,17 Robust SEs were used. Hospitals whose 95% CI lower boundary for the observed to expected ratio (calculated with the binomial distribution) was greater than 1.0 were identified as outliers.2 Although bayesian approaches and reliability adjustment are used by some quality improvement programs to assess hospital performance, they are not currently used by the VASQIP.18 However, because a hospital with a single quarter of outlier performance could represent random variation, we evaluated 3 other controls as sensitivity analyses: (1) 2 consecutive quarters of observed to expected ratio outlier performance; (2) reliability-adjusted observed to expected ratio outlier performance for 1 quarter; and (3) reliability-adjusted observed to expected ratio outlier performance for 2 consecutive quarters.19
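The outlier rule can be sketched as follows. This is a sketch under stated assumptions rather than the VASQIP procedure: it assumes an exact (Clopper-Pearson) binomial lower limit on the observed death proportion, rescaled by the expected number of deaths, and it takes the expected deaths as given instead of fitting the logistic model. The function names and the example counts are hypothetical.

```python
def binom_prob_below(x, n, p):
    """P(X < x) for X ~ Binomial(n, p), built up with the pmf recurrence."""
    pmf = (1.0 - p) ** n  # P(X = 0)
    total = 0.0
    for k in range(x):
        total += pmf
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
    return total


def exact_lower_bound(x, n, alpha=0.05):
    """Clopper-Pearson lower limit for a binomial proportion, found by bisection."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, x / n  # the lower limit lies below the point estimate
    for _ in range(100):
        mid = (lo + hi) / 2.0
        upper_tail = 1.0 - binom_prob_below(x, n, mid)  # P(X >= x | p = mid)
        if upper_tail < alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def oe_outlier(observed_deaths, n_operations, expected_deaths, alpha=0.05):
    """Return (O/E ratio, lower CI limit of O/E, outlier flag)."""
    ratio = observed_deaths / expected_deaths
    lower_p = exact_lower_bound(observed_deaths, n_operations, alpha)
    lower_ratio = lower_p * n_operations / expected_deaths
    return ratio, lower_ratio, lower_ratio > 1.0


# Hypothetical quarter: 280 operations with 2.8 expected deaths
print(oe_outlier(8, 280, 2.8))
print(oe_outlier(5, 280, 2.8))
```

With these hypothetical counts, 8 observed deaths is flagged as an outlier and 5 is not, even though both ratios exceed 1.0, because only the former has a lower confidence limit above 1.0.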
CUSUM Method
The continuous time CUSUM uses an integrated time series approach accounting for the accumulation of postoperative deaths over time.13 Parametric proportional hazard regression was used to construct each hospital’s risk-adjusted CUSUM, controlling for the same covariates used to calculate observed to expected ratios. The observed CUSUM results from the integration of the event series, which is then compared with a risk-adjusted expected CUSUM calculated with a hazard-based approach. Integrating these differences results in the CUSUM (Figure 1; eAppendix 1 in Supplement 1). With this continuous approach, outcomes are incorporated into the multivariable model on the postoperative date of occurrence (ie, during the quarter) instead of episodically (ie, only at the end of the quarter). Hospitals with excess mortality were detected with the V-mask method. The V mask is defined by a slope and radius and forms a “V” shape around the CUSUM, with the open arms at time zero and the vertex projected forward in time (ie, the “V” becomes narrower over time). We used a V-mask slope and radius previously demonstrated to optimize the correlation between CUSUM and observed to expected ratio hospital detection while balancing sensitivity and specificity (slope of 2.5 and radius of 1.0).13 Building on prior work evaluating a single CUSUM signal for detecting a poorly performing hospital, multiple CUSUM signals (ie, persistent detection after the first signal) were evaluated in an effort to decrease false-positive detection.13 To address the potential for autocorrelation (ie, a hospital that signaled using the CUSUM would be at risk for additional signals moving forward), monitoring was continued from the point at which the signal occurred, but with the CUSUM reset as though a preceding signal had not occurred.
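The signaling and reset logic described above can be sketched in a simplified, discrete form. This is an illustration under assumptions, not the continuous-time, hazard-based method of the study: cases are processed one at a time, the expected risk per case is supplied directly, and the V mask is reduced to a one-sided check in which a signal occurs when any earlier point falls below the lower arm. The parameters `slope` and `offset` are hypothetical, and how they map onto the slope of 2.5 and radius of 1.0 reported here is not specified in this text.

```python
def cusum_signals(observed, expected, slope, offset):
    """Return the case indices at which a one-sided V-mask-style rule signals.

    observed: 0/1 outcomes in time order; expected: risk-adjusted probabilities.
    After each signal the cumulative sum is reset, as though no prior signal
    had occurred, and monitoring continues from that point.
    """
    signals = []
    path = [0.0]  # cumulative (observed - expected) since the last reset
    for i, (obs, exp) in enumerate(zip(observed, expected)):
        path.append(path[-1] + obs - exp)
        steps = len(path) - 1
        current = path[-1]
        # An earlier point j lies below the lower arm of a mask anchored at the
        # current point when the rise since j exceeds offset + slope * elapsed.
        if any(current - path[j] > offset + slope * (steps - j) for j in range(steps)):
            signals.append(i)
            path = [0.0]  # reset and keep monitoring
    return signals


# Hypothetical series with a 2% expected risk per case
clustered = [0] * 10 + [1, 1] + [0] * 10 + [1, 1, 1]
spread = ([0] * 24 + [1]) * 4
risk = [0.02] * 25

print(cusum_signals(clustered, risk, slope=0.1, offset=1.5))
print(cusum_signals(spread, [0.02] * len(spread), slope=0.1, offset=1.5))
```

With these illustrative settings, deaths in consecutive cases trigger signals, whereas the same expected risk with deaths spaced 25 cases apart does not. The reset after each signal means a later signal reflects new accumulation rather than carryover from the earlier one.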
Statistical Analysis
Patient-level data were aggregated at the hospital level for analysis. The main outcome of interest was the number of hospitals identified as an outlier according to excess risk-adjusted 30-day mortality. Outlier detection with the observed to expected ratio served as the reference standard for calculating sensitivity, specificity, and positive and negative predictive values. Receiver operating characteristic curves were used to calculate the area under the curve. Concordance between the observed to expected ratio and CUSUM outlier detection was ascertained with tetrachoric correlation.
For hospitals identified as a quarterly outlier with CUSUM and observed to expected ratios (concordant detection), the earliest CUSUM signal date in that quarter was compared with the end-of-quarter observed to expected ratio date to quantify the time lag for outlier identification. The Kruskal-Wallis test and a nonparametric trend test were applied to compare time lag based on the number of CUSUM signals. An additional sensitivity analysis comparing observed to expected ratios calculated after 60 days (instead of quarterly) with the CUSUM was performed to evaluate whether earlier episodic ascertainment might have value for surgical quality improvement efforts (results are presented in eAppendix 2 in Supplement 1). Multivariable hierarchical regression was used to (1) evaluate the association between the number of CUSUM signals in a given quarter and concordant detection of the same hospital in the subsequent quarter (ie, estimation of future performance) and (2) evaluate the association between hospital factors and false-positive and false-negative CUSUM signaling (model C statistics in the eTable in Supplement 1). Hospital factors were ascertained from previous work delineating VA facility structural and process measures.20 The variance inflation factor and correlation matrices were used to assess multicollinearity of hospital factors. Variables with a variance inflation factor greater than 5 were excluded. P values were 2-sided at a .05 significance level. All analyses were performed from January 30, 2022, to December 2, 2022, with SAS, version 9.4 (SAS Institute Inc) and Stata, version 17 (StataCorp LLC).
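The time lag calculation is simple date arithmetic. The sketch below assumes that observed to expected ratio detection is dated to the last calendar day of the quarter containing the first CUSUM signal; the signal dates are hypothetical and the helper names are not from the study.

```python
from datetime import date, timedelta
from statistics import median


def quarter_end(d):
    """Last calendar day of the quarter containing d."""
    last_month = 3 * ((d.month - 1) // 3 + 1)
    if last_month == 12:
        return date(d.year, 12, 31)
    return date(d.year, last_month + 1, 1) - timedelta(days=1)


def lag_days(first_signal):
    """Days from the first CUSUM signal to end-of-quarter reporting."""
    return (quarter_end(first_signal) - first_signal).days


# Hypothetical first-signal dates for 3 concordant hospital quarters
first_signals = [date(2014, 2, 10), date(2016, 11, 15), date(2012, 4, 1)]
lags = [lag_days(d) for d in first_signals]
print(lags, median(lags))
```

Any lag computed this way is bounded by the length of the quarter, so the comparison across signal counts concerns how early within the quarter the first signal occurs.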
Results
Overall, 697 566 patients underwent an operation across 104 VA hospitals during 24 quarters. The mean (SD) age was 60.9 (13.2) years, 91.4% were male, and 8.6% were female. For each hospital, the median number of quarters detected with observed to expected ratios, at least 1 CUSUM signal, and more than 1 CUSUM signal was 2 quarters (IQR, 1-4 quarters), 8 quarters (IQR, 4-11 quarters), and 3 quarters (IQR, 1-4 quarters), respectively. For 2496 total quarters of data, outlier hospitals were identified 33.3% of the time (830 quarters) with at least 1 CUSUM signal within a quarter, 12.5% (311 quarters) with more than 1 CUSUM signal, and 11.0% (274 quarters) with observed to expected ratios at the end of the quarter. Of 274 hospital quarters detected with observed to expected ratios, 72.6% (199) were concurrently detected by at least 1 CUSUM signal vs 42.7% (117) by more than 1 CUSUM signal. There was a dose-response relationship between the number of CUSUM signals in a quarter and the median observed to expected ratio (0 signals, 0.63; 1 signal, 1.28; 2 signals, 1.58; 3 signals, 2.08; ≥4 signals, 2.49 [1.80-2.95]; trend test, P < .001).
For hospitals concordantly identified with both CUSUM and observed to expected ratios in a quarter, Figure 2 demonstrates the distribution in time lag between initial CUSUM detection during the quarter and subsequent observed to expected ratio detection at the end of the quarter. The initial CUSUM signal occurred a median of 49 days (IQR, 25-63 days) before observed to expected ratio reporting (1 signal, 35 days [IQR, 17-54 days]; 2 signals, 49 days [IQR, 26-61 days]; 3 signals, 58 days [IQR, 44-69 days]; ≥4 signals, 49 days [IQR, 42-69 days]; trend test, P < .001). Among hospitals with multiple CUSUM signals, the median interval between signals was 9 days (IQR, 3-22 days) (2 signals, 10 days [IQR, 4-24 days]; 3 signals, 9 days [IQR, 3-26 days]; ≥4 signals, 6 days [IQR, 2-18 days]; trend test, P = .03).
For hospitals identified with at least 1 CUSUM signal, 67.3% (559 of 830) were concordantly identified as an outlier with both CUSUM and observed to expected ratios in the following quarter. Among hospitals concordantly identified with at least 1 CUSUM signal and observed to expected ratios, 20.6% (41 of 199) were concordantly identified as an outlier in the following quarter. For hospitals identified as an outlier by observed to expected ratios (but not CUSUM), 36.0% (27 of 75) were identified by CUSUM in the subsequent quarter. An increasing number of CUSUM signals was associated with concordant CUSUM and observed to expected ratio detection in the following quarter (1 signal: odds ratio [OR], 1.50 [95% CI, 1.19-1.89]; 2 signals: OR, 1.48 [95% CI, 1.04-2.10]; 3 signals: OR, 1.54 [95% CI, 0.94-2.53]; and ≥4 signals: OR, 2.80 [95% CI, 1.33-5.93]).
Table 1 demonstrates performance characteristics for CUSUM detection relative to the observed to expected ratio. Using a single CUSUM signal to detect an outlier hospital, the false-positive detection rate was 28.4% (631 of 2222). For multiple CUSUM signals, the rate was 8.7% (194 of 2222). Specificity increased with an increasing number of CUSUM signals (1 signal, 80.3% [95% CI, 78.6%-82.0%]; 2 signals, 94.1% [95% CI, 93.0%-95.0%]; 3 signals, 97.8% [95% CI, 97.1%-98.4%]; and ≥4 signals, 99.3% [95% CI, 98.9%-99.6%]). The positive predictive value also increased (1 signal, 15.8% [95% CI, 12.8%-19.2%]; 2 signals, 29.2% [95% CI, 22.8%-36.3%]; 3 signals, 45.5% [95% CI, 34.8%-56.4%]; and ≥4 signals, 60.5% [95% CI, 43.4%-76.0%]). Table 2 presents the association between hospital characteristics and false-positive or false-negative CUSUM detection.20 Overall, higher-complexity hospitals were more likely to demonstrate false-positive CUSUM signaling (complexity level 3, reference; complexity level 2: OR, 2.86 [95% CI, 0.30-26.76]; complexity level 1c: OR, 11.53 [95% CI, 1.26-105.52]; complexity level 1b: OR, 15.33 [95% CI, 1.63-144.30]; and complexity level 1a: OR, 16.95 [95% CI, 1.77-162.09], with level 1a being the highest complexity and level 3 being the lowest).
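The reported concurrent detection and false-positive rates follow directly from the hospital-quarter counts given in this section, with the observed to expected ratio as the reference standard. The sketch below reproduces that arithmetic; the function name is hypothetical, and the remaining cells of each 2 × 2 table are derived by subtraction from the reported totals (2496 quarters, 274 flagged by observed to expected ratios).

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard 2 x 2 performance measures against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


TOTAL, OE_POSITIVE = 2496, 274

# At least 1 CUSUM signal: 830 quarters flagged, 199 concordant with O/E
at_least_1 = detection_metrics(
    tp=199, fp=830 - 199, fn=OE_POSITIVE - 199,
    tn=TOTAL - 830 - (OE_POSITIVE - 199),
)

# More than 1 CUSUM signal: 311 quarters flagged, 117 concordant with O/E
more_than_1 = detection_metrics(
    tp=117, fp=311 - 117, fn=OE_POSITIVE - 117,
    tn=TOTAL - 311 - (OE_POSITIVE - 117),
)

for label, m in (("at least 1", at_least_1), ("more than 1", more_than_1)):
    print(label, {k: round(100 * v, 1) for k, v in m.items()})
```

These counts describe the cumulative thresholds (at least 1 signal, more than 1 signal), whereas Table 1 is stratified by the number of signals in a quarter, so the two sets of values are not expected to coincide.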
Table 1. CUSUM Performance (Stratified by the Number of Signals in a Quarter) Relative to Observed to Expected Ratio for Detecting Outlier Hospitals.
CUSUM signals, No. | Sensitivity (95% CI), % | Specificity (95% CI), % | Positive predictive value (95% CI), % | Negative predictive value (95% CI), % | AUC | Correlation^a |
---|---|---|---|---|---|---|
1 | 29.9 (24.6-35.7) | 80.3 (78.6-82.0) | 15.8 (12.8-19.2) | 90.3 (88.9-91.6) | 0.55 | 0.17 |
2 | 19.7 (15.2-24.9) | 94.1 (93.0-95.0) | 29.2 (22.8-36.3) | 90.5 (89.2-91.6) | 0.56 | 0.38 |
3 | 14.6 (10.6-19.3) | 97.8 (97.1-98.4) | 45.5 (34.8-56.4) | 90.3 (89.0-91.4) | 0.56 | 0.51 |
≥4 | 8.4 (5.4-12.3) | 99.3 (98.9-99.6) | 60.5 (43.4-76.0) | 89.8 (88.5-91.0) | 0.54 | 0.58 |
Abbreviations: AUC, area under the curve; CUSUM, cumulative sum.
^a Tetrachoric correlation.
Table 2. Hospital Factors Associated With False-Positive and False-Negative CUSUM Signaling (Relative to Observed to Expected Ratio), Stratified by Single vs Multiple Quarterly CUSUM Signals.
Factor | Single CUSUM signal, false positive, OR (95% CI) | Single CUSUM signal, false negative, OR (95% CI) | Multiple CUSUM signals, false positive, OR (95% CI) | Multiple CUSUM signals, false negative, OR (95% CI) |
---|---|---|---|---|
Size | | | | |
Hospital operating beds per 10 000 veteran users, No. | 1.00 (0.99-1.02) | 1.00 (0.95-1.06) | 1.00 (0.98-1.03) | 1.02 (0.61-1.67) |
Complexity level^a | | | | |
3 | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
2 | 2.86 (0.30-26.76) | 1.03 (0.13-8.03) | 0.14 (0.05-0.45) | 7.95 (0.85-74.25) |
1c | 11.53 (1.26-105.52) | 0.17 (0.02-1.43) | 0.48 (0.22-1.07) | 1.20 (0.21-6.97) |
1b | 15.33 (1.63-144.30) | 0.24 (0.02-3.18) | 0.84 (0.47-1.49) | 0.49 (0.11-2.18) |
1a | 16.95 (1.77-162.09) | 0.45 (0.03-6.47) | 1.00 (1.00-1.00) | 1.00 (1.00-1.00) |
Disease burden | | | | |
Mean relative risk score^b | 0.66 (0.20-2.14) | 1434.06 (31.29-65 724.32) | 0.61 (0.12-3.04) | 3.16 (0.05-214.99) |
Academic mission | | | | |
Resident slots per 10 000 veteran users | 1.02 (1.00-1.05) | 0.90 (0.84-0.98) | 1.01 (0.98-1.05) | 0.95 (0.88-1.03) |
Reliance on VA | | | | |
Reliance of Medicare-eligible enrollees on VA^c | 16.90 (1.23-232.93) | 0.001 (0.001-0.086) | 9.60 (0.28-324.29) | 0.002 (0.000-8.459) |
Infrastructure | | | | |
Multisite | 1.24 (0.73-2.09) | 5.29 (0.97-28.92) | 0.80 (0.40-1.61) | 1.00 (0.14-7.10) |
Square footage per unique veteran user^d | 0.97 (0.95-1.00) | 1.02 (0.96-1.09) | 1.00 (0.97-1.03) | 1.00 (0.93-1.08) |
Care delivery structure | | | | |
Ratio of nonhospital-based outpatient visits to total visits | 0.52 (0.11-2.38) | 13.32 (0.39-452.05) | 0.42 (0.05-3.57) | 1.41 (0.02-121.11) |
Community and environment | | | | |
Community hospital beds per patient, No. | 1.09 (0.49-2.43) | 4.51 (0.95-21.30) | 0.63 (0.19-2.12) | 5.03 (0.40-63.15) |
Total Medicaid generosity^e | 1.00 (1.00-1.00) | 1.00 (1.00-1.00) | 1.00 (1.00-1.00) | 1.00 (1.00-1.00) |
SAIL Quality Score^f | | | | |
1 | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
2 | 1.40 (0.69-2.84) | 2.77 (0.49-15.52) | 1.37 (0.53-3.56) | 1.44 (0.29-10.29) |
3 | 1.07 (0.53-2.17) | 2.59 (0.49-13.63) | 0.94 (0.36-2.49) | 1.22 (0.20-7.62) |
4 | 1.04 (0.48-2.25) | 2.83 (0.52-15.51) | 1.00 (0.34-2.91) | 1.79 (0.25-12.59) |
5 | 1.07 (0.43-2.65) | 2.91 (0.17-50.19) | 1.10 (0.32-3.74) | 1.60 (0.11-23.75) |
Abbreviations: CUSUM, cumulative sum; OR, odds ratio; SAIL, Strategic Analytics for Improvement and Learning; VA, US Department of Veterans Affairs.
^a VA facilities are divided into 5 levels of complexity (1a, 1b, 1c, 2, and 3, with 1a being the highest complexity and 3 being the lowest) based on factors including patient risk, teaching, research, volume, number of physician specialists, and availability of intensive care units.20
^b Veterans’ estimated health care costs divided by mean observed health care costs of an individual in the VA.
^c VA cost divided by the sum of VA cost and Medicare cost.
^d Building square footage plus leased square footage.
^e Adjusted total Medicaid spending by the state divided by number of state residents with income below the federal poverty level.
^f Assessment of 27 quality measures, such as mortality rate, complications, patient satisfaction, and overall efficiency, for hospital system performance within the VA.
With 2 consecutive observed to expected ratio outlier quarters as the detection benchmark, outliers were identified 2.1% of the time (53 quarters) during 2496 hospital quarters. Of these outliers, 60.4% (32 quarters) were concurrently identified with at least 1 CUSUM signal. The initial CUSUM signal occurred a median of 137 days (IQR, 116-154 days) before the end of the second quarter with observed to expected ratio detection. After reliability adjustment, outliers were identified 6.1% of the time (152 of 2496) across all quarters of data. Of these outliers, 70.4% (107) were concurrently identified with at least 1 CUSUM signal, with the initial CUSUM signal occurring a median of 57 days (IQR, 51-64 days) before observed to expected ratio detection at the end of the quarter. With 2 consecutive reliability-adjusted observed to expected ratio outlier quarters as the benchmark, 1.0% of hospital quarters (25 of 2496) were identified as outliers. Of these outliers, 72.0% (18) were concurrently identified by CUSUM. The initial CUSUM signal occurred a median of 135 days (IQR, 116-155 days) before the end of the second quarter of observed to expected ratio detection.
Discussion
National quality improvement programs, such as VASQIP and ACS-NSQIP, were implemented decades ago to improve the quality and safety of surgical care.5,17 However, episodic evaluation limits the ability of hospitals to proactively respond to clinically important deterioration in surgical care and to effectively reduce surgical risk. Consequently, more contemporaneous feedback is vital to ensure that hospitals’ response to adverse events is timely.21,22,23 The CUSUM represents an analytic method that could provide more immediate data to inform performance improvement and potentially fill a gap within contemporary quality improvement programs.11,12,13 In this context, our study suggests that most hospitals with excess perioperative mortality identified at the end of the quarter had at least 1 earlier CUSUM signal suggestive of an out-of-control process during the quarter. The accumulation of CUSUM signals during the course of the quarter indicates an opportunity for earlier detection of hospitals with ongoing, persistent quality or safety concerns, which in turn could translate into patient lives saved.
Episodic approaches to performance evaluation used by contemporary national surgical and nonsurgical quality improvement programs provide feedback only at the end of an observation period. This delay potentially exposes patients to suboptimal care processes or outcomes.21,22,24 In fact, our group previously estimated that at least 129 operations and 368 postoperative inpatient days per quarter at each VA hospital may be at risk after CUSUM detection in the middle of a quarter but before observed to expected ratio detection at the end of that quarter.13 When multiple CUSUM signals are detected in a quarter, they can occur in close temporal clusters (eg, 5 deaths in a short period). These types of clinically relevant clusters might otherwise go unrecognized under current episodic strategies. Early recognition of these clusters is valuable because they may all stem from the same unrecognized, underlying care process. For example, an unexpected increase in surgical site infections occurring during a 2-week period could be associated with recurrent, inappropriate perioperative antibiotic selection or a failed instrument sterilization process. The ability to rapidly identify clusters of events affords hospitals the opportunity to implement timely corrective measures and therefore prevent additional patient harm. Unfortunately, our work suggests that there are substantial delays between when a problem is occurring and when it is recognized and addressed within current surgical quality improvement programs.
When VASQIP was implemented nearly 3 decades ago, its primary intent was as a quality assessment tool.4 However, given numerous improvements in surgical quality and safety in both the VA and the private sector, the focus of national quality improvement programs should shift to performance improvement.5,17 Recent data suggest national improvements in perioperative outcomes have occurred independent of participation in these types of quality improvement programs.25,26 As such, alternative analytic methods and tools could serve as useful adjuncts for participating hospitals and improve the flow of data to relevant stakeholders. Because CUSUM can be modified to meet different program monitoring goals, it may represent an optimal tool that could be broadly implemented within the analytic infrastructure of national surgical and nonsurgical quality improvement programs. Wider use of early detection tools would not only enhance the robustness of national quality improvement programs but also could encourage hospitals to take a more proactive approach to local quality improvement.
The CUSUM is intended to be a management, not a regulatory, tool. Therefore, when the potential value of an early detection tool such as the CUSUM is considered, it is important to balance the number of false-positive detections (minimizing false-positive signals ensures stakeholders do not experience “alarm fatigue”) against the risk of missing a truly important quality improvement problem requiring a performance improvement plan. In other words, early detection tools should be sensitive enough to make stakeholders aware of opportunities for improvement, but not so sensitive that they are disregarded because detection is frequently not associated with any process-level issues that can be addressed. Using multiple CUSUM signals in a quarter to detect hospitals with potentially out-of-control processes appears to help minimize false-positive detection. We believe this provides a good balance between timely, appropriate feedback and the risk of alarm fatigue. Ultimately, CUSUM is an optimal tool for achieving this balance because it allows stakeholders to set detection thresholds consistent with the monitoring goal. CUSUM signaling also appears to estimate future performance and identify opportunities for improvement that may otherwise go undetected with current episodic evaluation strategies.13 We envision an opportunity for surgical programs and other stakeholders to access hospital-specific CUSUM charts through a secure online dashboard to allow an evaluation of contemporaneous program performance, similar to the manner in which the CUSUM is currently used by the Scientific Registry of Transplant Recipients.11,12
Limitations
There are several limitations to acknowledge regarding our findings. Given the observational and retrospective nature of our study, there are some inherent challenges in understanding the degree to which a tool such as the CUSUM could facilitate local quality improvement. More specifically, our data do not provide any information about what hospitals may have been doing in response to being identified as an outlier. The optimal V mask for monitoring perioperative outcomes has not been established, and our study indicates future work will need to consider differences in signaling thresholds based on the type of hospital being monitored, volume of procedures performed, and risk tolerance. The VASQIP also does not include details on cause of death or associated patient and hospital factors. As such, we could not ascertain the extent to which observed CUSUM or observed to expected ratio signaling represented potentially modifiable performance issues. Our analysis of hospital factors is limited to only VA hospitals, and the ORs had wide 95% CIs. Therefore, further work on the association between hospital factors and CUSUM monitoring is warranted. Our study focused on all-cause mortality because it is the primary outcome assessed by VASQIP and ACS-NSQIP. Because other outcomes are also captured by these programs, future work to expand the CUSUM beyond mortality will be of value. Finally, we compared CUSUM with episodic evaluation by using quarters of data because this is the approach used by both VASQIP and ACS-NSQIP. Although an alternative approach might be to use observed to expected ratio analyses on shorter increments (eg, 60 days), doing so would not address the inability of current episodic analytic strategies to detect clinically meaningful outcome clusters (eg, 3 deaths in a week).
Conclusions
This hospital-level comparative effectiveness study found that compared with current episodic monitoring, the CUSUM facilitates earlier detection of hospitals with persistent surgical care concerns. As such, it represents a quality improvement and performance improvement tool that could prevent patient harm and be integrated into the framework and infrastructure of existing national quality improvement programs without any added data collection or resources. Although CUSUM provides a mechanism for earlier detection, national quality improvement infrastructure will need contemporaneous data flow to and from each hospital to realize the true benefit of this technique. Further work is needed to determine how information about unique and diverse hospital-level factors should be incorporated to create more reliable strategies for monitoring hospital performance. Implementation of a confidential CUSUM reporting mechanism would complement current episodic analytic and feedback strategies, facilitate the flow of information to stakeholders, and enhance the robustness of national quality improvement efforts.
References
- 1.HR 505—Veterans’ Administration health-care amendments of 1985. Accessed June 6, 2023. https://www.congress.gov/bill/99th-congress/house-bill/505/text
- 2.Khuri SF, Daley J, Henderson W, et al; National VA Surgical Quality Improvement Program. The Department of Veterans Affairs’ NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. Ann Surg. 1998;228(4):491-507. doi: 10.1097/00000658-199810000-00006
- 3.Birkmeyer JD, Shahian DM, Dimick JB, et al. Blueprint for a new American College of Surgeons: National Surgical Quality Improvement Program. J Am Coll Surg. 2008;207(5):777-782. doi: 10.1016/j.jamcollsurg.2008.07.018
- 4.Khuri SF, Daley J, Henderson WG. The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Arch Surg. 2002;137(1):20-27. doi: 10.1001/archsurg.137.1.20
- 5.Hall BL, Hamilton BH, Richards K, Bilimoria KY, Cohen ME, Ko CY. Does surgical quality improve in the American College of Surgeons National Surgical Quality Improvement Program? an evaluation of all participating hospitals. Ann Surg. 2009;250(3):363-376. doi: 10.1097/SLA.0b013e3181b4148f
- 6.Page ES. Continuous inspection schemes. Biometrika. 1954;41(1/2):100. doi: 10.2307/2333009
- 7.de Leval MR, François K, Bull C, Brawn W, Spiegelhalter D. Analysis of a cluster of surgical failures: application to a series of neonatal arterial switch operations. J Thorac Cardiovasc Surg. 1994;107(3):914-923. doi: 10.1016/S0022-5223(94)70350-7
- 8.Steiner SH, Cook RJ, Farewell VT. Monitoring paired binary surgical outcomes using cumulative sum charts. Stat Med. 1999;18(1):69-86.
- 9.Chaput de Saintonge DM, Vere DW. Why don’t doctors use cusums? Lancet. 1974;1(7848):120-121. doi: 10.1016/S0140-6736(74)92345-9
- 10.Wohl H. The cusum plot: its utility in the analysis of clinical data. N Engl J Med. 1977;296(18):1044-1045. doi: 10.1056/NEJM197705052961806
- 11.Axelrod DA, Guidinger MK, Metzger RA, Wiesner RH, Webb RL, Merion RM. Transplant center quality assessment using a continuously updatable, risk-adjusted technique (CUSUM). Am J Transplant. 2006;6(2):313-323. doi: 10.1111/j.1600-6143.2005.01191.x
- 12.Axelrod DA, Kalbfleisch JD, Sun RJ, et al. Innovations in the assessment of transplant center performance: implications for quality improvement. Am J Transplant. 2009;9(4, pt 2):959-969. doi: 10.1111/j.1600-6143.2009.02570.x
- 13.Massarweh NN, Chen VW, Rosen T, et al. Comparative effectiveness of risk-adjusted cumulative sum and periodic evaluation for monitoring hospital perioperative mortality. Med Care. 2021;59(7):639-645. doi: 10.1097/MLR.0000000000001559
- 14.Lyman WB, Passeri M, Murphy K, et al. The next step in surgical quality improvement: outcome situational awareness. Can J Surg. 2020;63(2):E120-E122. doi: 10.1503/cjs.000519
- 15.Davis CL, Pierce JR, Henderson W, et al. Assessment of the reliability of data collected for the Department of Veterans Affairs National Surgical Quality Improvement Program. J Am Coll Surg. 2007;204(4):550-560. doi: 10.1016/j.jamcollsurg.2007.01.012
- 16.Massarweh NN, Anaya DA, Kougias P, Bakaeen FG, Awad SS, Berger DH. Variation and impact of multiple complications on failure to rescue after inpatient surgery. Ann Surg. 2017;266(1):59-65. doi: 10.1097/SLA.0000000000001917
- 17.Massarweh NN, Kougias P, Wilson MA. Complications and failure to rescue after inpatient noncardiac surgery in the Veterans Affairs health system. JAMA Surg. 2016;151(12):1157-1165. doi: 10.1001/jamasurg.2016.2920
- 18.Dimick JB, Ghaferi AA, Osborne NH, Ko CY, Hall BL. Reliability adjustment for reporting hospital outcomes with surgery. Ann Surg. 2012;255(4):703-707. doi: 10.1097/SLA.0b013e31824b46ff
- 19.Wakeam E, Hyder JA. Reliability of reliability adjustment for quality improvement and value-based payment. Anesthesiology. 2016;124(1):16-18. doi: 10.1097/ALN.0000000000000845
- 20.Byrne MM, Daw CN, Nelson HA, Urech TH, Pietz K, Petersen LA. Method to develop health care peer groups for quality and financial comparisons across hospitals. Health Serv Res. 2009;44(2, pt 1):577-592. doi: 10.1111/j.1475-6773.2008.00916.x
- 21.Reason J. Human error: models and management. West J Med. 2000;172(6):393-396. doi: 10.1136/ewjm.172.6.393
- 22.Vincent C. Understanding and responding to adverse events. N Engl J Med. 2003;348(11):1051-1056. doi: 10.1056/NEJMhpr020760
- 23.Markovitz AA, Ryan AM. Pay-for-performance: disappointing results or masked heterogeneity? Med Care Res Rev. 2017;74(1):3-78. doi: 10.1177/1077558715619282
- 24.Rodziewicz TL, Houseman B, Hipskind JE. Medical error reduction and prevention. National Library of Medicine. Updated May 2, 2023. Accessed July 1, 2023. https://www.ncbi.nlm.nih.gov/books/NBK499956/
- 25.Etzioni DA, Wasif N, Dueck AC, et al. Association of hospital participation in a surgical outcomes monitoring program with inpatient complications and mortality. JAMA. 2015;313(5):505-511. doi: 10.1001/jama.2015.90
- 26.Osborne NH, Nicholas LH, Ryan AM, Thumma JR, Dimick JB. Association of hospital participation in a quality reporting program with surgical outcomes and expenditures for Medicare beneficiaries. JAMA. 2015;313(5):496-504. doi: 10.1001/jama.2015.25
Associated Data