Abstract
Objective:
The original Leapfrog Initiative recommends selective referral based on procedural volume thresholds (500 coronary artery bypass graft [CABG] surgeries, 30 abdominal aortic aneurysm [AAA] repairs, 100 carotid endarterectomies [CEA], and 7 esophagectomies annually). We tested the volume-mortality relationship for these procedures in the University HealthSystem Consortium (UHC) Clinical Database SM, a database of all-payor discharge abstracts from UHC academic medical center members and affiliates. We determined whether the Leapfrog thresholds represent the optimal cutoffs to discriminate between high- and low-mortality hospitals.
Methods:
Logistic regression was used to test whether volume was a significant predictor of mortality. Volume was analyzed in 3 different ways: as a continuous variable, a dichotomous variable (above and below the Leapfrog threshold), and a categorical variable. We examined all possible thresholds for volume and observed the optimal thresholds at which the odds ratio is the highest, representing the greatest difference in odds of death between the 2 groups of hospitals.
Results:
In multivariate analysis, a relationship between volume and mortality exists for AAA in all 3 models. For CABG, there is a strong relationship when volume is tested as a dichotomous or categorical variable. For CEA and esophagectomy, we were unable to identify a consistent relationship between volume and outcome. We identified empirical thresholds of 250 CABG, 15 AAA, and 22 esophagectomies, but were unable to find a meaningful threshold for CEA.
Conclusions:
In this group of academic medical centers and their affiliated hospitals, we demonstrated a significant relationship between volume and mortality for CABG and AAA but not for CEA and esophagectomy, based on the Leapfrog thresholds. We described a new methodology to identify optimal data-based volume thresholds that may serve as a more rational basis for selective referral.
This report investigates the relationship between surgical procedural volume and mortality in a database of US academic medical centers and their affiliates, the University HealthSystem Consortium Clinical Database SM. We tested the ability of the volume thresholds proposed by the Leapfrog Group to discriminate between high- and low-quality hospitals, as reflected by in-hospital mortality rate. We then used the data to determine empirical volume thresholds and compared these to the Leapfrog thresholds.
The first suggestion of a relationship between procedural volume and outcome was made by Luft et al in 1979.1 Since that time, many subsequent studies have corroborated their results; however, a consensus about the significance of high volume and its association with lower in-hospital mortality still does not exist and the use of volume as a quality measure continues to be debated.2–4 Despite the lack of agreement, many policymakers and some physicians are advocating the use of volume as a major measure of quality and as the basis for such policies as selective patient referral.5–8
The recent article by Birkmeyer et al was designed to be the definitive study on the relationship between volume and outcome.9 Medicare data were used to investigate the relationship between volume and mortality for 14 complex surgical procedures. The strength of the relationship between volume and mortality varied substantially across the cases investigated, comparing the highest quintile to the lowest quintile. The use of volume-based referrals, however, requires a single threshold (as opposed to quintiles) to discriminate between high- and low-quality hospitals.
The Leapfrog Group, a consortium of healthcare purchasers and providers representing approximately 33 million patients and $56 billion in healthcare revenue, is perhaps the best-known promoter of volume-based selective referral.10 The Leapfrog Initiative plans to use market forces to promote improvement in the quality of healthcare. One of its initial guidelines calls for selective referral to high-volume hospitals for 5 invasive procedures, as well as high-risk neonatal care. The annual volume thresholds were set at 500 coronary artery bypass graft (CABG) procedures per year, 400 coronary angioplasties, 30 abdominal aortic aneurysm (AAA) repairs, 100 carotid endarterectomies (CEA), and 7 esophagectomies. The Leapfrog Group based these thresholds on expert opinion and a critical review of the literature.10–16 Several of these studies used geographically limited databases with few high-volume institutions and may not be generalizable. More importantly, these analyses were not intended to determine thresholds but were primarily designed to validate the existence of volume-outcome correlations. Finally, more recent studies of the same populations failed to show a relationship between volume and mortality for 2 of these procedures, AAA and CABG.17,18 The Leapfrog Group revised their suggested volume thresholds in April 2003, removing CEA from the list of procedures and adding major pancreatic resections. In addition, they altered the thresholds for the remaining 3 procedures (450 CABG, 50 AAA, and 13 esophagectomies). This amendment illustrates that despite the consistent evidence for a relationship between volume and outcome in the literature, it is still not clear how to proceed to policy changes.5,19 Although selective referral may be a viable option, it is still not clear where the threshold should be set and if a single threshold is even reasonable.
The relationship between volume and outcome is likely a proxy for other structural and process components of care, which more accurately predict quality than volume alone. Many of these suggested structural and process characteristics, such as the presence of house staff or more specialized attending staff, dedicated operating rooms, or better nurse staffing, are more prevalent in academic institutions. The importance of process measures has been recognized by the Leapfrog Group. They recently proposed a set of process measures for each of their index procedures to be used as an adjunct to volume; however, until these process measures are better defined and institutions are able to document their performance based on these indicators, volume will continue to be used in quality measurement.
One of the major studies that failed to show a relationship between surgical volume and outcome comes from the Veterans Affairs National Surgical Quality Improvement Program database (VA NSQIP).18 This large, multi-institutional national database (68,631 operations from 123 institutions) was used to investigate 8 common surgical procedures and failed to find a correlation between volume and outcome (mortality for all procedures except CEA, which used stroke rate) for any of these 8 procedures. There are a number of possible explanations for this. There is a narrower range of volumes available within the VA system. For example, high-volume VA hospitals for AAA would only be classified as low or even very low hospitals in the recent study of Medicare data.9 Additionally, although surgeons may operate in low volume at the VA, they are likely to have higher volume practices at affiliated academic medical centers, showing the difficulty with simple institutional volume. Finally, it has been suggested that the failure to show a volume-outcome relationship stems from the unique structural properties of institutions within the VA system.20 To test whether structural similarity can attenuate the volume-outcome relationship, we used data contained in the University HealthSystem Consortium (UHC) Clinical Database (CDB) SM. The UHC comprises academic health centers and their affiliated community teaching hospitals, a subgroup of US hospitals with similar institutional attributes.19
Proposed policy initiatives focus on using the volume-mortality relationship as a basis for selective referral to high-volume institutions. These initiatives are predicated on the identification of a single volume threshold that reliably discriminates between high-“quality” (as reflected by low mortality) and low-“quality” hospitals for a given procedure. To investigate the feasibility of setting such a threshold, we tested whether the thresholds proposed by the Leapfrog Group can reliably discriminate between high- and low-quality institutions in this sample of academic medical centers and community teaching affiliates. We then used the data in the UHC CDB to determine the optimal thresholds for discriminating between high- and low-quality institutions in this particular database. Finally, we compared these empirical thresholds with those proposed by the Leapfrog Group.
METHODS
Database
Founded in 1984, the UHC is an alliance of 87 academic medical centers across the United States.21 The UHC Clinical Database SM is a collection of all-payor hospital discharge abstracts from UHC members and their community teaching affiliates. For this study, data were analyzed from calendar years 1999–2000. The 2 years were examined in aggregate to maximize analytic power.
Cohort Determination
We defined a cohort for each of the 4 surgical procedures based on International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) procedure and diagnosis codes using the procedural definitions proposed by the Leapfrog Group.10 These are listed in Table 1. The database was queried for cases meeting these definitions and 4 separate analyses were performed.
TABLE 1. Procedural Definitions As Proposed by the Leapfrog Group Using ICD-9-CM Procedural Codes +/− ICD-9-CM Diagnostic Codes
Definition of Variables
Dependent Variable
The dependent variable under investigation is in-hospital mortality. The ideal end point for several of these analyses would actually be complication rate, either overall or specific for the procedure (eg, stroke rate for CEA) because mortality is a rare outcome. However, mortality is the end point used by the Leapfrog group as the basis for their proposal. Additionally, ICD-9–based complication rates may reflect, in addition to other factors, the diligence and sophistication of the coding and abstraction practices at individual institutions. In this setting, “complication rates” will have more institutional variability than more well-defined outcomes, such as in-hospital mortality.
Primary Independent Variable
The primary independent variable under investigation is procedural volume. Procedural volume was determined by summing the number of procedures that met the Leapfrog definitions for each year at each institution. We analyzed procedural volume as 3 different variables: (1) a continuous variable; (2) a dichotomous variable (above and below the Leapfrog thresholds); and (3) a categorical variable with 4 groups defined as follows: (a) <50% of the Leapfrog threshold, (b) >50% of the Leapfrog threshold and <Leapfrog threshold, (c) >Leapfrog threshold and <150% of the Leapfrog threshold, and (d) >150% of the Leapfrog threshold. We chose to examine volume groups as a function of preassigned volume cutoffs, rather than establishing equal quartiles or quintiles as many studies have done, because we were interested in testing the validity of strict volume thresholds.
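As an illustration only, the 3 codings of volume can be sketched in a few lines of Python. The function name and the handling of volumes that fall exactly on a cutoff are our assumptions; the definitions above use strict inequalities and do not say how boundary values were assigned.

```python
from typing import Dict

# Original Leapfrog annual volume thresholds, as quoted in the text.
LEAPFROG_THRESHOLDS: Dict[str, int] = {
    "CABG": 500,
    "AAA": 30,
    "CEA": 100,
    "esophagectomy": 7,
}


def volume_variables(procedure: str, annual_volume: int) -> dict:
    """Return the 3 codings of procedural volume for one institution-year.

    Hypothetical helper. Assumption: a volume exactly equal to a cutoff is
    placed in the higher group (the text does not specify this).
    """
    threshold = LEAPFROG_THRESHOLDS[procedure]
    ratio = annual_volume / threshold

    if ratio < 0.5:
        group = "a"  # <50% of the Leapfrog threshold
    elif ratio < 1.0:
        group = "b"  # 50% up to the Leapfrog threshold
    elif ratio < 1.5:
        group = "c"  # Leapfrog threshold up to 150%
    else:
        group = "d"  # 150% of the Leapfrog threshold or more

    return {
        "continuous": annual_volume,
        "dichotomous": annual_volume >= threshold,  # True = at or above threshold
        "categorical": group,
    }
```

For example, an institution performing 600 CABG procedures in a year would be coded as above the threshold and in group (c), because 600 is 120% of 500.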
Covariates and Confounders
Patient-level covariates included in this analysis were age, race, gender, emergency status, whether the patient was transferred in from another acute-care institution, insurance status, and a measure of severity of illness. For the latter variable, a refinement class designation (RDRGs) based on secondary diagnoses was used.21 These severity classes are 0 (baseline or no substantial secondary diagnosis), 1 (moderate secondary diagnosis, eg, diabetes or chronic obstructive pulmonary disease), 2 (major secondary diagnosis), and 3 (catastrophic secondary diagnosis). A patient's refinement class is designated by the highest refinement class assigned to all secondary diagnoses. For analysis, refinement class 0 was used as the reference group and the variable was analyzed as a categorical variable. For esophagectomy, refinement class was collapsed into a dichotomous variable of refinement class 0/1 or 2/3 to achieve model convergence.
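The assignment rule for refinement class can be expressed as a short sketch. The function names are hypothetical, and treating a patient with no secondary diagnoses as class 0 is our reading of "baseline or no substantial secondary diagnosis".

```python
from typing import Iterable


def patient_refinement_class(secondary_classes: Iterable[int]) -> int:
    """Highest refinement class (0-3) across a patient's secondary diagnoses.

    Assumption: a patient with no secondary diagnoses is assigned class 0.
    """
    classes = list(secondary_classes)
    if any(c not in (0, 1, 2, 3) for c in classes):
        raise ValueError("refinement classes must be 0, 1, 2, or 3")
    return max(classes, default=0)


def collapse_for_esophagectomy(refinement_class: int) -> str:
    """Collapse the 4 classes into the dichotomy used for esophagectomy."""
    return "0/1" if refinement_class <= 1 else "2/3"
```

A patient whose secondary diagnoses carry classes 1, 0, and 3 would therefore be assigned class 3, which falls in the "2/3" stratum for esophagectomy.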
Multivariate Regression Analysis
A logistic regression model was used to evaluate the univariate relationship between each independent variable and mortality. The patient was the unit of analysis and we controlled for clustering within institutions using the generalized estimating equation methodology.22 Any covariate with a P value < 0.10 was considered for inclusion in the multivariable model. The final multivariate model was built at the significance level of 0.05. All analyses were performed using SAS V. 8.2, PROC GENMOD.
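Written out, a model of this kind might take the following general form. The notation is ours and is offered only as a sketch: Y_ij is in-hospital death for patient i at institution j, V_j is the volume variable (in any of its 3 codings), and x_ij is the vector of retained patient-level covariates. The working correlation structure used for the generalized estimating equations is not stated in the text and is therefore left unspecified here.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0 + \beta_V\, V_j + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij},
\qquad
\mathrm{OR}_V = e^{\beta_V}
```

Under this formulation, the adjusted odds ratio for volume is the exponentiated coefficient on V_j, and the clustering adjustment affects its standard error rather than the form of the mean model.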
Threshold Determination
To determine the optimal threshold for discriminating between high- and low-mortality hospitals, the hospitals were split into 2 groups according to a volume threshold, which was then varied. The odds ratio (OR) for each threshold was determined using a logistic regression model that controlled for patient-level covariates and that adjusted for clustering within institutions. The optimal threshold was identified as the volume that most significantly discriminated among hospitals with respect to odds of death.
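The threshold sweep can be illustrated with a simplified sketch. Note the assumptions: this version computes a crude (unadjusted) odds ratio from pooled counts, whereas the analysis described above used a logistic regression model adjusted for patient-level covariates and for clustering; it also selects the threshold with the largest odds ratio. The data structure and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One record per hospital: (annual_volume, deaths, cases). Hypothetical layout.
Hospital = Tuple[int, int, int]


def crude_odds_ratio(low_deaths: int, low_cases: int,
                     high_deaths: int, high_cases: int) -> float:
    """Odds of death below the threshold divided by odds at or above it."""
    low_odds = low_deaths / (low_cases - low_deaths)
    high_odds = high_deaths / (high_cases - high_deaths)
    return low_odds / high_odds


def sweep_thresholds(hospitals: List[Hospital],
                     candidates: List[int]) -> Dict[int, float]:
    """Crude odds ratio for each candidate volume threshold.

    Simplification: no covariate or clustering adjustment is applied here.
    Thresholds for which the odds ratio is undefined are skipped.
    """
    results: Dict[int, float] = {}
    for t in candidates:
        low = [h for h in hospitals if h[0] < t]
        high = [h for h in hospitals if h[0] >= t]
        if not low or not high:
            continue  # every hospital falls on one side of the cut
        ld, lc = sum(h[1] for h in low), sum(h[2] for h in low)
        hd, hc = sum(h[1] for h in high), sum(h[2] for h in high)
        if ld == 0 or hd == 0 or ld == lc or hd == hc:
            continue  # a zero cell makes the odds ratio undefined
        results[t] = crude_odds_ratio(ld, lc, hd, hc)
    return results
```

The candidate with the largest value can then be read off with `max(results, key=results.get)`. Because many thresholds are examined, any threshold chosen this way is subject to a multiple-testing caveat.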
Hospitals Meeting the Threshold
We determined how many institutions in the UHC CDB met each of the 3 thresholds for each procedure during the year 2000: the original Leapfrog thresholds, the revised Leapfrog thresholds, and the empirical thresholds determined by our analysis.
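The tabulation itself is simple counting, sketched below. The threshold values are those quoted in this report (CEA has no revised or empirical threshold); the function name and the convention that a hospital "meets" a threshold when its volume is at or above it are our assumptions.

```python
from typing import Dict, List, Tuple

# Annual volume thresholds quoted in this report.
THRESHOLD_SETS: Dict[str, Dict[str, int]] = {
    "original Leapfrog": {"CABG": 500, "AAA": 30, "CEA": 100, "esophagectomy": 7},
    "revised Leapfrog": {"CABG": 450, "AAA": 50, "esophagectomy": 13},
    "empirical": {"CABG": 250, "AAA": 15, "esophagectomy": 22},
}


def hospitals_meeting(volumes: List[int], threshold: int) -> Tuple[int, float]:
    """Number and percent of institutions at or above a volume threshold."""
    if not volumes:
        raise ValueError("at least one institution is required")
    n = sum(1 for v in volumes if v >= threshold)
    return n, 100.0 * n / len(volumes)
```

Applying `hospitals_meeting` to the year-2000 institutional volumes for one procedure, once per threshold set, yields one row of such a comparison.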
RESULTS
Cohort Composition
The UHC CDB contains ICD-9 coded abstracts on over 2.3 million annual discharges from 113 academic medical centers and their affiliates. From this database, we identified 4 cohorts as follows: 9,869 AAA from 83 institutions, 69,827 CABG from 99 institutions, 17,015 CEA from 102 institutions, and 1,634 esophagectomies from 88 institutions. Based on previously published estimates of national rates, the UHC database contains approximately 14% of AAA performed annually in the United States, 9–11% of the annual number of CABG procedures, 6–7% of the annual number of CEA, and 33% of esophagectomies.8,23 Table 2 contains a description of the UHC institutions for the 4 procedures, including mortality rates. Table 2 also contains the mean, median, and range of institutional volumes seen in the UHC CDB for each of the 4 procedures during the calendar year 2000. There is a wide distribution of institutional volumes for each of the 4 procedures. Because these procedural definitions were based on those proposed by the Leapfrog Group, which were designed to be inclusive, combined procedures were included; 17% of CABG procedures were performed in conjunction with a valve procedure. Table 3 describes the patient characteristics of the cohort under investigation for each procedure, including the percent of patients treated at hospitals above the suggested Leapfrog thresholds as well as the composition of the 4 volume groups. The mean age is 65 or younger for 3 of the 4 procedures, suggesting that this is a different population from that captured by Medicare data.
TABLE 2. Institutional Characteristics: The Cohorts Defined Within the UHC Clinical Database (CDB) SM for Each Procedure During the Calendar Years 1999 and 2000
TABLE 3. Patient Characteristics: The Cohorts Defined Within the UHC Clinical Database (CDB) SM for Each Procedure During the Calendar Years 1999 and 2000
Multivariate Model Using the Original Leapfrog Thresholds
The adjusted odds ratios for the 3 volume variables from the multivariate models are shown in Table 4. The relationship between procedural volume and mortality is significant for AAA whether volume is included as a continuous variable (change in risk per 10 AAA performed), a dichotomous variable (above or below the Leapfrog cut-off) or a categorical variable. For CABG, the volume-outcome relationship is not significant as a continuous variable, but is highly significant when included as a dichotomous or categorical variable. We therefore consider these 2 procedures to have a significant volume-mortality relationship.
TABLE 4. Multivariate Model Results: Adjusted Odds Ratio for Mortality from the Multivariate Hierarchical Logistic Regression Model.
For CEA, the results are more complicated. There is no relationship between volume and outcome when volume is a continuous or categorical variable; however when the Leapfrog cut-off is used to define 2 groups, there is a statistically significant relationship (P = 0.04). This is likely a function of the large sample size (n = 17,015) and the clinical significance is limited. For esophagectomy, the only statistically significant difference in mortality rates (P = 0.02) is seen between the extreme groups (highest volume vs. lowest volume). Based on the weak and inconsistent nature of these results, we conclude that there is not a strong relationship between volume and mortality based on the Leapfrog thresholds for these 2 procedures in the UHC database.
Threshold Determination and Comparison
To determine the empirical thresholds, we modeled the UHC data using logistic regression as described above. Figure 1 graphs these results. The odds ratio, which represents the increased risk of death, is depicted on the y-axis while the threshold for each of the 4 procedures is varied along the x-axis. The error bars represent the pointwise 95% confidence intervals (CI) and the difference is considered significant so long as these do not cross 1.0. The scale on the y-axis was adjusted to include the widest 95% CI for each procedure. It is clear from these graphs that the Leapfrog thresholds are not optimal for discriminating high versus low mortality in these data sets. For AAA, the graph peaks at 15 AAA per year with an OR of 1.70, which can be named the optimal threshold for discriminating between high and low mortality hospitals. The CABG graph has a clear inflection point at 250 per year, with a similar OR of 1.70. The CEA graph depicts a peak at 20; however, at this point, the CI crosses 1, signifying a lack of statistically significant difference between the 2 groups of hospitals. For esophagectomy, the difference between the 2 groups becomes significant for a threshold set above 22. Based on these results, we suggest the following alternative thresholds for the UHC data: 15 AAA, 250 CABG, and 22 esophagectomies. There is no meaningful volume threshold available for CEA, based on its relationship with mortality. The true confidence intervals for the associated odds ratios may be wider than depicted as those depicted have not been adjusted for multiple testing over all thresholds.
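The rule that a difference is significant only when the 95% CI excludes 1.0 can be illustrated with a crude 2 x 2 calculation. This sketch uses the standard log-based (Woolf) interval for an unadjusted odds ratio, which is an assumption on our part: the intervals plotted in Figure 1 come from the adjusted, clustered regression model, not from this formula. Function names are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a log-based (Woolf) confidence interval.

    a = deaths, b = survivors in low-volume hospitals;
    c = deaths, d = survivors in high-volume hospitals.
    All 4 cells must be positive. z = 1.96 gives an approximate 95% CI.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def excludes_one(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below 1.0."""
    return lower > 1.0 or upper < 1.0
```

With the same mortality rates, a larger cohort narrows the interval, which is one reason a small difference can reach statistical significance in a very large sample.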
FIGURE 1. Odds ratio as a function of volume threshold. The Leapfrog thresholds are represented by the dashed lines, whereas the new empirical thresholds are indicated by the solid lines. Please note that the scale of the y-axis has been adjusted for each graph to allow depiction of the widest 95% confidence interval.
Number of Hospitals Meeting the Thresholds
Table 5 shows the number of hospitals in the UHC CDB that meet each of the 2 proposed sets of Leapfrog volume thresholds and the new empirical volume thresholds. The value of the threshold markedly alters the number of hospitals that would be “approved” should volume-based selective referral be instituted and would have a major impact on patient care.
TABLE 5. Comparison of Leapfrog Volume Thresholds with Empirical Thresholds from the UHC CDB: The Number and Percent of UHC Institutions Meeting or Exceeding These Three Thresholds is Shown for Comparison for Each of the Four Procedures
DISCUSSION
With an increasing national focus on patient safety, there is substantial pressure, both from the medical and lay communities, to devise better means for discriminating between high-quality and low-quality hospitals. One of the first suggested quality measures was in-hospital mortality rates. As the importance of case mix was recognized, there was a movement toward observed to expected mortality ratios, where the actual death rate is compared with an expected death rate for the institution derived from modeling based on case mix.24–27 More recently, quality measurement has begun to focus on procedural volume, based primarily on its relationship with adjusted mortality rates. Procedural volume is appealing because it is a simple measure that can be easily understood and used by both healthcare providers and consumers. However, at this point, the causal relationships underlying the correlation between procedural volume and outcomes are still not fully understood. Therefore, despite the fact that the relationship between volume and outcome has been repeatedly reported in the literature, it is not yet clear how to translate this relationship into policy initiatives. Selective referral as suggested by such initiatives as the Leapfrog Group may be a reasonable option, but a rational approach to identifying volume thresholds must be taken.
There are 2 approaches to investigating the relationship between volume and adjusted mortality rates. The first proposed by Khuri et al. analyzes institutional observed to expected mortality ratios as a function of procedural volume.18 Although this is a statistically valid approach, the Pearson coefficient, the appropriate test for the correlation, has limited power to detect the relationship (Betensky RA, et al. Hospital volume versus outcome: an unusual example of bivariate association. submitted). Because of this limitation, we performed this analysis at the patient level with adjustment for confounding, using an approach similar to several other studies.9,12,16,28,29
In this work, 2 aspects of the volume-outcome relationship as it relates to volume thresholds were investigated. First, the relationship between volume and mortality in a sample of academic medical centers and their affiliated community teaching hospitals was tested using the proposed Leapfrog thresholds. Then, the data were used to identify the optimal threshold for discriminating between high- and low-mortality hospitals in this data set. Based on the characteristics of member institutions, this cohort of hospitals is assumed to represent a relatively homogeneous group with similar structural characteristics and possibly some similar process characteristics. For example, the surgeons operating at these institutions are affiliated with a university or medical center, are more likely to specialize in performing certain procedures, and are assisted by house staff for both intraoperative and postoperative patient care. Similarly, it is likely that there are specialized operating rooms and intensive care units at UHC institutions. In addition, members of the Consortium have agreed to contribute their data for analysis and quality improvement measures. Each member receives quarterly and annual reports that include risk-adjusted length of stay, cost, and mortality measures, as well as other benchmarking data that can be used for quality improvement. The commitment to such benchmarking efforts suggests that this group of hospitals is more likely to pay attention to and maintain high-quality care.
One of the assumptions made by the Leapfrog Group, and therefore by this analysis, is that mortality is the “gold standard” for measuring quality; however, this is not the case for all procedures. The overall mortality rate for CEA in the UHC database is only 0.7%, and the difference we are trying to detect between “high mortality” and “low mortality” is on the order of 0.2–0.3%. We are able to find a statistically significant difference when the hospitals are split into 2 groups, but this is likely a function of the large sample size in each group, and its clinical significance is limited. Further attempts to study the volume-outcome relationship for CEA should focus on complication rates, which may be more reliable measures than mortality rate for this procedure.30 The difficulty lies in the unreliability of the coding of complication data in administrative databases.31–33 The volume-mortality relationship for any low-mortality procedure, such as most routine elective surgery, will likely suffer from similar limitations.
In this group of academic health centers and their affiliated community teaching hospitals, there is not a strong relationship between volume and outcome across the entire spectrum of volumes for esophagectomies; in fact, we were unable to detect a relationship using volume groups based on the Leapfrog threshold. This may be, at least in part, a function of the low number of procedures available for analysis and therefore a lack of power. The graph in Figure 1 does suggest that it is possible to discriminate between high- and low-quality hospitals if one is willing to set the threshold at 22 esophagectomies per year, where the difference in mortality is larger. This may also explain why we failed to detect a relationship between the high- and low-volume groups as defined in our initial analysis; the significant findings were buried in our highest volume group. A comparison of the observed mortality rates helps to illustrate the advantage of the empirical threshold of 22 over the Leapfrog threshold of 7. The raw mortality rate for hospitals performing fewer than 7 esophagectomies per year is 6.1% compared with 5.5% for hospitals performing more than 7. If we use the empirical threshold, the mortality rate is 7.0% for hospitals performing fewer than 22 procedures per year, whereas hospitals performing more than 22 per year have an observed mortality rate of only 2.9%.
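To make this comparison concrete, the raw mortality rates quoted above can be converted to crude odds ratios. This is a back-of-the-envelope sketch only: the values are unadjusted and are not the adjusted odds ratios from the regression models, and the function names are ours.

```python
def odds(rate: float) -> float:
    """Convert a mortality proportion (0-1) to odds."""
    return rate / (1.0 - rate)


def crude_odds_ratio(low_volume_rate: float, high_volume_rate: float) -> float:
    """Unadjusted odds ratio of death, low-volume versus high-volume hospitals."""
    return odds(low_volume_rate) / odds(high_volume_rate)


# Raw esophagectomy mortality rates quoted in the text.
leapfrog_or = crude_odds_ratio(0.061, 0.055)   # threshold of 7 per year
empirical_or = crude_odds_ratio(0.070, 0.029)  # threshold of 22 per year

print(f"Leapfrog threshold (7):   crude OR = {leapfrog_or:.2f}")
print(f"Empirical threshold (22): crude OR = {empirical_or:.2f}")
```

The crude odds ratio is roughly 1.1 at the Leapfrog threshold and roughly 2.5 at the empirical threshold, corresponding to absolute mortality differences of 0.6 and 4.1 percentage points, respectively.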
For AAA and CABG, there are sufficient numbers of patients per institution and a substantial mortality rate; therefore volume can be considered a reasonable quality measure based on its relationship with mortality. The graphs in Figure 1 illustrate the ideal quality discriminating volume thresholds based on the UHC data. For both of these procedures, at the optimal point, the OR is 1.70. For the data in the UHC CDB, the optimal thresholds would be 15 AAA per year and 250 CABG procedures.
Until we are better able to define and measure quality of care, proxies such as volume may offer a temporary bridge in the quality movement. It must be remembered, however, that volume is a limited quality measure. It is a surrogate measure that likely reflects other more direct structural and process characteristics that promote high quality of care. Health professionals and researchers must continue to investigate other quality measures, including structure and process measures, as well as outcomes such as postoperative complication rates within large cohorts of patients.
One possible way to use volume as a quality measure is to set minimum procedural volume thresholds to serve as a basis for selective referral to high volume institutions. The value of these thresholds has major ramifications for patient care as illustrated in Table 5 and must be set with caution. Yet to date, it is not clear where the ideal thresholds for distinguishing between high and low mortality hospitals lie. This idea is reinforced by the fact that the Leapfrog Group recently amended their recommendations. They removed CEA from their guidelines and the results of this analysis of the UHC CDB support this decision. They also altered the thresholds for the other 3 procedures and added pancreatic resections.
This study suggests that the cutoffs recommended by the Leapfrog Group fall short and may not represent the optimal thresholds for discriminating between high- and low-mortality hospitals within this database of academic medical centers and their teaching affiliates. We were able to define better thresholds for AAA, CABG, and esophagectomy but not CEA, based on the data available in the UHC CDB. Although this methodology may offer a rational basis for volume-based selective referral, there are still many policy questions that must be answered. For example, how will selective referral alter referral patterns and continuity of care? Also, is selective referral to only a small number of geographically dispersed hospitals (eg, 5 institutions in UHC for esophagectomy) feasible? The next step will be to determine if these thresholds can be used to discriminate quality in a larger population-based database, containing a broader range of US institutions. For a volume threshold to serve as the basis for selective referral, it must be proven to be a reliable measure across different groups of hospitals, and ideally also across time.
Quality measurement is not a simple or inexpensive undertaking. The use of volume thresholds as a basis for selective referral is appealing because of its simplicity. Yet, we caution against implementing arbitrary thresholds. This study describes a more rational approach using the data to define empirical thresholds where possible. Physicians and researchers ultimately must strive for better, more direct measures so that quality measurement does not need to rely on proxies such as volume. The most recent proposal by the Leapfrog group, which attempts to define some process measures for their index procedures, suggests that these more direct measures will eventually supplant the use of indirect proxies such as volume.7 In the meantime, it is important to remain wary of making public and payor policy changes based solely on volume until more investigation has been performed.
ACKNOWLEDGMENTS
We would like to thank the University HealthSystem Consortium for providing us with the data for this analysis. Our special thanks to Jodi Neikirk, Director of Analytic Services, and Stacy Wang, Senior Statistical Analyst, for their assistance in data acquisition and analysis and to Ms. Niekirk, Dr. Richard Bankowitz, and Susan Bradshaw for their assistance in preparation of the manuscript.
Discussion
Dr. Lazar J. Greenfield (Ann Arbor, Michigan): I would like to congratulate Dr. Christian on her presentation and express my gratitude for the opportunity to review the manuscript well ahead of the meeting.
The Leapfrog Group, which is a consortium of health care purchasers comprised primarily of Fortune 500 companies, has been in the forefront of efforts to improve quality of health care progressing rapidly from recommendations to specific dictates regarding surgical procedures, critical care, and the utilization of information technology. One could only wish for the same attention to their companies’ accounting standards. (Laughter)
The authors have tested the recommended threshold for 4 major operative procedures and confirmed that volume is statistically associated with improved outcomes for the specific populations presenting for abdominal aneurysm repair and coronary artery bypass. However, the thresholds selected by Leapfrog are too high in both cases.
One limitation to this report is that it involves only hospitals associated with the UHC. Therefore, we do not know what percent of the total hospital cohort who perform CABGs the 99 studied hospitals represent. Also, it is not clear whether the report includes combined coronary bypass procedures with others, such as valves. Similarly, there may be geographic variations in outcomes just as there are in indications.
The fundamental question in any discussion of quality outcome is whether volume thresholds represent the optimal metric or whether there is an advantage to observed-to-expected mortality ratios as reported by Khuri and others. At our own institution, Campbell and his group have been able to confirm the value of these O/E ratios as demonstrated in the national VA Hospital experience.
The fact that no significant relationship could be demonstrated between volume and mortality for carotid procedures and esophagectomy points out the limitations of such an approach in procedures with low frequency, such as esophagectomy, or expected low mortality rates such as carotid disease. For these and other procedures, it is much more helpful to know the complication rate, which proves to be a major determinant for cost and quality.
Even after revising the volume criteria to lower the threshold for inclusion of more academic health centers, we should be concerned about the consequences of adherence to such a referral policy. The adverse impact of transferring large numbers of patients to high-volume hospitals in terms of nurse staffing ratios and hospital occupancy could well offset a small benefit in mortality.
The authors should be congratulated on their careful documentation of the limitations of the proposal by the Leapfrog Group and should be encouraged to continue their efforts to document the indicators of quality care.
Although there has been much discussion of the importance of quality outcomes, the most recent Harris poll indicates that only a small percentage of the population is even aware of quality reports on the physicians or institutions that they use, and only 1 to 2% of either employers or patients make any decisions based on quality alone. It appears that cost still trumps quality and unfortunately is likely to do so for the foreseeable future.
Thank you for the opportunity to discuss this interesting paper.
Dr. Robert S. Rhodes (Philadelphia, Pennsylvania): Thank you for an outstanding piece of work. And I also express my gratitude for the opportunity to review the manuscript ahead of time. Your findings shed light on an important part of the volume outcome puzzle and seem to clarify some key methodologic issues.
For instance, the conclusions of the Leapfrog Group are mainly based on statewide databases that encompass a wide range of hospital sizes and hospital volumes. In using an extensive but more restrictive database from the 87 academic medical centers, you properly note that these medical centers are likely to have specific structural attributes that might contribute to improved quality. Thus, the fact that you identify different empirical volume thresholds from those of the Leapfrog Group, or in some cases no volume thresholds, is not surprising.
On the other hand, I am intrigued both by the shape of the odds ratio as a function of specific volume and by the width of the confidence intervals at a given volume for a given procedure. With regard to shape, the relationship of odds ratio to volume for aneurysmectomy, endarterectomy, and esophagectomy is relatively flat, whereas the curve for CABG shows a peak and a subsequent decline with increasing volume. Indeed, at extremely high volume, the odds ratio for CABG appears to suggest poorer outcomes at high volumes than at low volumes. My first question then is to ask you to speculate on the validity of this finding and the seeming optimal volume effect of CABG.
My next question relates to the relatively wider confidence intervals evident at low volumes of endarterectomy and aneurysmectomy. One explanation, of course, is that they are simply due to a lower number of hospitals at these volume levels. However, the relatively wide range could also be due to greater variability in mortality among hospitals in these volume ranges. Can you elaborate on this?
If it is due to greater variability in outcomes, it would further support the concept that volume is a surrogate measure that reflects other structural and process factors. This finding would then also raise the question as to what the structural factors are. Do you plan to share these data with the participating hospitals in an identified manner?
The key message is that volume is a surrogate measure for other structural and process factors, and this cannot be overemphasized. Furthermore, using an empirical volume threshold seems to be the antithesis of Continued Quality Improvement. The authors are to be congratulated on this work and hopefully will continue this research. Thank you.
Dr. Caprice K. Christian (Boston, Massachusetts): I want to begin by addressing our choice of databases since this was mentioned by both of the discussants. We chose this database because it represents a convenient sample of teaching hospitals, both academic medical centers and their community hospital teaching affiliates. One of our objectives, which I did not have time to discuss in the presentation but which is included in the manuscript, was to test whether structural similarity could blunt or even negate the volume-mortality relationship.
Next I will address the shape of the graphs. The peak and then a subsequent decline in the CABG graph are expected. This analysis assumes a ‘step function’ for the volume-mortality relationship. At the lower end of the graph, as you move along the x-axis, more hospitals are appropriately classified as high mortality and low volume. As a result, the high volume group represents a more homogeneous group of low mortality hospitals and the odds ratio increases. This is true until the ideal threshold is met. At that point, the graph will then turn downwards as the low volume group is ‘contaminated’ with better performers from the high volume group.
The findings at the extremes of the graph need to be interpreted with caution as they represent characteristics of a very small number of hospitals. The 2 points you refer to in the CABG graph actually reflect the characteristics of a few high volume outliers. We performed this analysis before controlling for severity of illness and these actually were above 1, but were not statistically significant. This leads us to conclude that these hospitals may have a ‘healthier’ population than some other hospitals at the high end of the spectrum.
The shape of the AAA and CABG graphs is actually quite similar if you account for the difference in scale between the 2. The graphs are flat when few patients are reclassified by moving the threshold. For example, the AAA graph appears to be flat because there were no hospitals in this data set that fell between 150 and 250. Therefore, the odds ratios in this area reflect repetitions of the same model. Finally, for esophagectomies, if we had carried the graph out, it would also have a peak and return to a nonsignificant relationship. At 22 (the threshold) we see a peak OR of 2.59. By 28 this has dropped to 2.47, by 32 it drops to 2.35, and by 50 the OR is only 1.80 (with 95% CI of 0.64 to 5.03). I think that for these 3 procedures, there is a clear peak, which represents the optimal threshold for this particular data set. I think it is difficult to draw any conclusions about the graph for CEA, but even this appears to peak and trend down as the volume threshold is set higher.
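The threshold search described in this response can be illustrated with a minimal sketch. It is not the authors' code: it uses crude (unadjusted) odds ratios with a Woolf log-based 95% interval rather than the risk-adjusted patient-level regression used in the study, and the hospital data and the names `crude_odds_ratio`, `sweep_thresholds`, and `HOSPITALS` are hypothetical. The made-up data follow a step function (10% mortality below a volume of 40, 2.5% at or above it), so the odds ratio rises until the cutoff reaches 40 and falls once better performers are moved into the low-volume group.

```python
import math


def crude_odds_ratio(low, high):
    """Odds of death in the low-volume group relative to the high-volume group.

    Each argument is a (deaths, survivors) pair. Returns (OR, ci_low, ci_high)
    with a Woolf (log-based) 95% interval, or None if any cell is zero.
    """
    a, b = low
    c, d = high
    if min(a, b, c, d) == 0:
        return None
    odds_ratio = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        odds_ratio,
        math.exp(math.log(odds_ratio) - 1.96 * se),
        math.exp(math.log(odds_ratio) + 1.96 * se),
    )


def sweep_thresholds(hospitals):
    """Use every observed volume as a cutoff; hospitals below it are 'low volume'."""
    results = {}
    for threshold in sorted({vol for vol, _, _ in hospitals}):
        low_deaths = sum(d for vol, d, n in hospitals if vol < threshold)
        low_cases = sum(n for vol, d, n in hospitals if vol < threshold)
        high_deaths = sum(d for vol, d, n in hospitals if vol >= threshold)
        high_cases = sum(n for vol, d, n in hospitals if vol >= threshold)
        stats = crude_odds_ratio(
            (low_deaths, low_cases - low_deaths),
            (high_deaths, high_cases - high_deaths),
        )
        if stats is not None:  # skip cutoffs that leave an empty cell
            results[threshold] = stats
    return results


# Hypothetical hospitals as (annual volume, deaths, cases); NOT study data.
HOSPITALS = [
    (10, 1, 10), (20, 2, 20), (30, 3, 30),                 # 10% mortality
    (40, 1, 40), (80, 2, 80), (120, 3, 120), (200, 5, 200),  # 2.5% mortality
]

results = sweep_thresholds(HOSPITALS)
best_threshold = max(results, key=lambda t: results[t][0])

for threshold, (odds_ratio, lo, hi) in results.items():
    print(f"cutoff {threshold:>3}: OR {odds_ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print("cutoff with the highest odds ratio:", best_threshold)
```

In this toy data set the intervals at the smallest cutoffs are wide because very few patients fall below them, which mirrors the caution about interpreting the extremes of the graphs.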
As far as the confidence intervals are concerned, you bring up 2 important points. These models are built at the patient level, and the widened confidence intervals at the lower end likely reflect both the lower number of institutions and the lower number of patients per institution. I also agree that part of it is due to the wider variability in the reported mortality rates, but would have to disagree with the underlying reason. I think that part of the greater variability is due to the discreteness of our outcome of mortality. If you have 1 patient, for example, the mortality rate is either zero or 100% and there is no in-between. Similarly, for 3 patients, it is zero, 33%, 67%, or 100%. As you include more patients, there is a greater range of mortality rates available for analysis and your confidence intervals are going to be smaller. For CABG, there are many more patients at each point on the graph than for any of the other procedures, leading to much narrower confidence intervals. For esophagectomy, the CIs are wide and stay wide throughout the graph, reflecting the lower number of patients available for analysis for this procedure.
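The point about discreteness can be made concrete with a short sketch. The function names are hypothetical, and the standard error shown is the ordinary binomial formula for a single observed proportion, used here only as an assumption to illustrate why intervals narrow with more patients; it is not the patient-level model used in the study.

```python
import math
from fractions import Fraction


def possible_mortality_rates(n_patients):
    """All mortality rates a hospital with n_patients can possibly report."""
    return [Fraction(deaths, n_patients) for deaths in range(n_patients + 1)]


def binomial_standard_error(rate, n_patients):
    """Standard error of an observed proportion under a simple binomial model."""
    return math.sqrt(rate * (1 - rate) / n_patients)


# With 1 patient only 0% or 100% is possible; with 3 patients, 0, 1/3, 2/3, or 1.
print([str(r) for r in possible_mortality_rates(1)])
print([str(r) for r in possible_mortality_rates(3)])

# For an assumed true rate of 5%, the standard error shrinks as patients are added.
for n in (25, 100, 400):
    print(n, round(binomial_standard_error(0.05, n), 4))
```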
Dr. Michael J. Zinner (Boston, Massachusetts): I would like to thank the Association for the ability to present our work and also thank the discussants for their insightful comments and questions. Permit me, in closing, to make a few remarks about quality in our patients.
Everyone here knows that there is a relationship between volume and outcome. However, no one knows what that relationship is. Is it linear? That is, the more you do, the better it is forever? Or is it a step function? Are there thresholds? Or perhaps there are multiple thresholds. Specifically, the appropriate use of single procedure volume and the choice of which outcome parameter to measure it against are not clear. A low number of cases per hospital or provider, or a low event rate for the outcome, may reflect chance occurrence as opposed to poor quality. It is the tyranny of small numbers. So when it comes to selective referral based solely on volume, we should be careful what we wish for.
Real quality improvements will come with prospectively gathered risk adjusted data that do not rely on surrogates for quality. The use of volume thresholds as a basis for selective referrals is appealing because of its simplicity. We caution, however, against implementing arbitrary thresholds and propose a more rational approach using data to define empirical thresholds when possible. Finally, what we should strive for is a better understanding of these relationships and the continuing education of our patients about our desires to improve the quality of the care we deliver to them.
Footnotes
The information contained in this article was based in part on the Clinical Data Products Data Base maintained by the University HealthSystem Consortium (UHC).
Reprints: Michael J. Zinner, MD, Surgeon-in-Chief and Chairman, Department of Surgery, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115. E-mail: mzinner@partners.org.
REFERENCES
- 1. Luft HS, Bunker JP, Enthoven AC. Should operations be regionalized? The empirical relation between surgical volume and mortality. N Engl J Med. 1979;301:1364–1369.
- 2. Khuri SF. Invited commentary: surgeons, not General Motors, should set standards for surgical care. Surgery. 2001;130:429–431.
- 3. Dudley RA, Johansen KL. Invited commentary: physician responses to purchaser quality initiatives for surgical procedures. Surgery. 2001;130:425–428.
- 4. Daley J. Invited commentary: quality of care and the volume-outcome relationship—what's next for surgery? Surgery. 2002;131:16–18.
- 5. Dudley RA, Johansen KL, Brand R, et al. Selective referral to high-volume hospitals: estimating potentially avoidable deaths. JAMA. 2000;283:1159–1166.
- 6. Epstein AM. Volume and outcome—it is time to move ahead. N Engl J Med. 2002;346:1161–1164.
- 7. Gordon TA, Bowman HM, Tielsch JM, et al. Statewide regionalization of pancreaticoduodenectomy and its effect on in-hospital mortality. Ann Surg. 1998;228:71–78.
- 8. Birkmeyer JD, Finlayson EV, Birkmeyer CM. Volume standards for high-risk surgical procedures: potential benefits of the Leapfrog initiative. Surgery. 2001;130:415–422.
- 9. Birkmeyer JD, Siewers AE, Finlayson EV, et al. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346:1128–1137.
- 10. http://www.leapfroggroup.org.
- 11. Hannan EL, Popp AJ, Tranmer B, et al. Relationship between provider volume and mortality for carotid endarterectomies in New York state. Stroke. 1998;29:2292–2297.
- 12. Hannan EL, Kilburn H Jr, Bernard H, et al. Coronary artery bypass surgery: the relationship between inhospital mortality rate and surgical volume after controlling for clinical risk factors. Med Care. 1991;29:1094–1107.
- 13. Kazmers A, Perkins AJ, Jacobs LA. Aneurysm rupture is independently associated with increased late mortality in those surviving abdominal aortic aneurysm repair. J Surg Res. 2001;95:50–53.
- 14. Williams SV, Nash DB, Goldfarb N. Differences in mortality from coronary artery bypass graft surgery at five teaching hospitals. JAMA. 1991;266:810–815.
- 15. O'Connor GT, et al. A regional prospective study of in-hospital mortality associated with coronary artery bypass grafting. The Northern New England Cardiovascular Disease Study Group. JAMA. 1991;266:803–809.
- 16. Begg CB, Cramer LD, Hoskins WJ, et al. Impact of hospital volume on operative mortality for major cancer surgery. JAMA. 1998;280:1747–1751.
- 17. Sollano JA, Gelijns AC, Moskowitz AJ, et al. Volume-outcome relationships in cardiovascular operations: New York State, 1990–1995. J Thorac Cardiovasc Surg. 1999;117:419–428; discussion 428–430.
- 18. Khuri SF, Daley J, Henderson W, et al. Relation of surgical volume to outcome in eight common operations: results from the VA National Surgical Quality Improvement Program. Ann Surg. 1999;230:414–429; discussion 429–432.
- 19. Halm EA, Lee C, Chassin M. How is volume related to quality in health care? A systematic review of the research literature. Washington, DC: Institute of Medicine; 2002.
- 20. Birkmeyer JD. Relation of surgical volume to outcome. Ann Surg. 2000;232:724–725.
- 21. http://www.uhc.edu.
- 22. Liang K-Y, Zeger S. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
- 23. http://www.americanheart.org.
- 24. Khuri SF, Daley J, Henderson W, et al. The Department of Veterans Affairs’ NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. National VA Surgical Quality Improvement Program. Ann Surg. 1998;228:491–507.
- 25. Dubois RW, Rogers WH, Moxley JH III, et al. Hospital inpatient mortality. Is it a predictor of quality? N Engl J Med. 1987;317:1674–1680.
- 26. Khuri SF, Daley J, Henderson W, et al. The National Veterans Administration Surgical Risk Study: risk adjustment for the comparative assessment of the quality of surgical care. J Am Coll Surg. 1995;180:519–531.
- 27. Khuri SF, Daley J, Henderson W, et al. Risk adjustment of the postoperative mortality rate for the comparative assessment of the quality of surgical care: results of the National Veterans Affairs Surgical Risk Study. J Am Coll Surg. 1997;185:315–327.
- 28. Maxwell JG, Rutledge R, Covington DL, et al. A statewide, hospital-based analysis of frequency and outcomes in carotid endarterectomy. Am J Surg. 1997;174:655–660; discussion 660–661.
- 29. Hannan EL, Radzyner M, Rubin D. The influence of hospital and surgeon volume on in-hospital mortality for colectomy, gastrectomy, and lung lobectomy in patients with cancer. Surgery. 2002;131:6–15.
- 30. Daley J, Henderson WG, Khuri SF. Risk-adjusted surgical outcomes. Annu Rev Med. 2001;52:275–287.
- 31. Best WR, Khuri SF, Phelan M, et al. Identifying patient preoperative risk factors and postoperative adverse events in administrative databases: results from the Department of Veterans Affairs National Surgical Quality Improvement Program. J Am Coll Surg. 2002;194:257–266.
- 32. Romano PS, Mark DH. Bias in the coding of hospital discharge data and its implications for quality assessment. Med Care. 1994;32:81–90.
- 33. Romano PS, Chan BK, Schembri ME, et al. Can administrative data be used to compare postoperative complication rates across hospitals? Med Care. 2002;40:856–867.