The Permanente Journal. 2014 Winter;18(1):14–18. doi: 10.7812/TPP/12-133

Accuracy of National Surgery Quality Improvement Program Models in Predicting Postoperative Morbidity in Patients Undergoing Colectomy

Jeffrey A Neale 1, Craig Reickert 2, Andrew Swartz 3, Subhash Reddy 4, Maher A Abbas 5, Ilan Rubinfeld 6
PMCID: PMC3951025  PMID: 24626067

Although National Surgery Quality Improvement Program (NSQIP)-generated morbidities used to create area under the receiver operator curves (AUROCs) are accurate for patients in an overall surgical model, predictive models for morbidity are marginal for laparoscopic and open abdominal colectomies. NSQIP risk models tend to emphasize comorbidities rather than intraoperative details or technical aspects of colonic resections.

Abstract

Background:

The National Surgery Quality Improvement Program (NSQIP) is the standard for assessment of acuity-adjusted outcomes in surgery. The validity of NSQIP has not been well established in colorectal surgery. Technical and process variables, which NSQIP may not consider, affect morbidity rate.

Objective:

A retrospective observational study was undertaken to determine the accuracy of NSQIP models in predicting morbidity for patients undergoing laparoscopic or open colectomy.

Methods:

NSQIP participant use files for 2005 to 2008 were obtained. Data were selected using Current Procedural Terminology coding for open or laparoscopic colectomy. NSQIP-generated predicted morbidities were used to create area under the receiver operator curves (AUROCs).

Results:

AUROCs demonstrated an accurate predictive model if the value was above 0.8 and indicated a marginal predictive model if below 0.7. The AUROC for the general NSQIP model was 0.817 (confidence interval [CI] = 0.815–0.819, p < 0.001). The AUROC for the combined laparoscopic and open colectomy group was 0.703 (CI = 0.698–0.709, p < 0.001). AUROCs for the individual laparoscopic and open colectomy groups were 0.633 (CI = 0.619–0.647, p < 0.001) and 0.701 (CI = 0.695–0.707, p < 0.001), respectively.

Conclusion:

This study demonstrates that although NSQIP-generated morbidities used to create AUROCs are accurate for patients in an overall surgical model, predictive models for morbidity are marginal for laparoscopic and open abdominal colectomies. NSQIP risk models tend to emphasize comorbidities rather than intraoperative details or technical aspects of colonic resections.

Introduction

In 1994, the Veterans Health Administration (VHA) established the National Surgical Quality Improvement Program (NSQIP) for monitoring and improving the quality of surgical care across all VHA medical centers where major surgery is performed. The impact of NSQIP on quality of care was substantial, with a 47% decrease in the 30-day postoperative mortality and a 43% reduction in postoperative complications.1 The implementation of NSQIP in VHA hospitals demonstrated that systematic collection, analysis, and feedback of risk-adjusted surgical data could lead to improved outcomes.1 NSQIP collects 250 preoperative, intraoperative, and 30-day postoperative variables to quantify 30-day risk-adjusted surgical outcomes. This prospective, peer-controlled, and validated database includes 95% of the data points.2 The data represent a sample of institutional operative cases. Data are collected by specially trained nurse coordinators and validated by standard methods to ensure reliable comparison between institutions. This approach is gaining wide acceptance and is rapidly becoming a standard for measuring and improving quality of care for general, vascular, and colon and rectal surgery practices in many health care institutions in the US. Initiatives have been undertaken to broaden the implementation of NSQIP in additional surgical subspecialties, including gynecology, orthopedics, and neurosurgery; the “multispecialty” hospital membership includes these surgical subspecialties and more. In the colon and rectal surgical realm, NSQIP has had an impact on decreasing surgical site infections (SSI) and has been used to study the impact of a laparoscopic or open approach on the frequency of SSI.3,4 Fleming and colleagues5 recently used NSQIP data to demonstrate that a laparoscopic approach for restorative proctocolectomy was associated with a statistically significant reduction in both minor and major postoperative complications compared with the traditional open approach.

Currently, risk models for morbidity and mortality are adjusted each year, and institutional outcomes are based on the acuity-adjusted observed-to-expected ratios. These models’ operative results are highly predictive when applied to the general population of NSQIP. For any of these models, accuracy can be judged by 2 components: its ability to separate diseased from nondiseased (discrimination) and its ability to correctly estimate the risk (calibration). The area under the receiver operating characteristic curve (AUROC) is one of the most common means of measuring discrimination. The AUROC and its associated c-statistic are functions of the sensitivity and specificity for each value of the measure or model. Because specificity and sensitivity can be manipulated on the basis of threshold choice, the c-statistic allows one to balance the view of the predictive model across the various metrics. The c-statistic value can range from 0.5 (no predictive ability) to 1 (perfect discrimination). The AUROC and its c-statistic are optimized semiannually to ensure accurate risk adjustment for reliable interinstitutional comparison. These models tend to favor demographic and comorbidity data, as these are common to all procedures. Another tool that one could use to evaluate goodness of fit in logistic regression is the Hosmer-Lemeshow test. However, this test cannot be used for large datasets such as ours because “[a]s with any statistical test, the power increases with sample size; this can be undesirable for goodness of fit tests because in very large data sets, small departures from the proposed model will be considered significant.”6 Given NSQIP’s need to gather a dataset common to all procedures, there are no specific colon and rectal data points collected. Despite proven broader surgical and specific colon and rectal predictive benefits, current NSQIP risk models are slightly better at predicting mortality than morbidity. 
In a review of semiannual reports, both mortality and morbidity are accurately predicted, with c-statistics on the AUROC curve of 0.94 (range = 0.85–0.87). We hypothesized that these models tend to emphasize comorbidity data rather than intraoperative details and technical aspects of surgery, and therefore are not solely reliable in predicting the outcome of patients undergoing colectomy.
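To make the discrimination measure concrete, the following is a minimal sketch of how a c-statistic can be computed from predicted risks and observed outcomes. It uses the pairwise (Mann-Whitney) definition rather than any NSQIP or SPSS routine; the function name `c_statistic` and the six-patient data set are hypothetical and serve only as an illustration.

```python
from itertools import product


def c_statistic(predicted_risk, had_event):
    """Fraction of (event, non-event) patient pairs in which the patient
    with the event received the higher predicted risk.
    Tied predictions count as one half."""
    events = [p for p, y in zip(predicted_risk, had_event) if y]
    non_events = [p for p, y in zip(predicted_risk, had_event) if not y]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted morbidity risks and observed outcomes (illustrative only).
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
outcomes = [0, 0, 1, 0, 1, 1]
auc = c_statistic(risks, outcomes)  # 8 of 9 pairs are ordered correctly
```

A value of 0.5 means the model orders event and non-event patients no better than chance, and 1 means every pair is ordered correctly. The pairwise loop is quadratic in the number of patients, so a rank-based calculation would be preferable for a data set of NSQIP size.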

Materials and Methods

NSQIP participant use files were obtained under a data use agreement of the American College of Surgeons, and the study was approved by the Henry Ford Health institutional review board. We evaluated the most recent 4 years available at the time of analysis, January 1, 2005 to December 31, 2008. Patients were selected using Current Procedural Terminology (CPT) coding for major colectomy and labeled as either open or laparoscopic. For open colectomy and laparoscopic colectomy, the CPT codes are listed in Table 1. The noncolectomy group was defined as patients undergoing procedures other than those listed under open and laparoscopic colectomy. Postoperative morbidity was defined as the occurrence of 1 or more of the following events: SSI (superficial, deep, or organ space), wound disruption, pneumonia, unplanned intubation, pulmonary embolism, mechanical ventilation longer than 48 hours, renal insufficiency, acute renal failure, urinary tract infection, stroke or cerebrovascular accident, coma lasting longer than 24 hours, peripheral nerve injury, cardiac arrest requiring cardiopulmonary resuscitation, myocardial infarction, bleeding transfusions, graft/prosthesis/flap failure, deep vein thrombosis or thrombophlebitis, sepsis, and septic shock. It should be noted that each of these events, even if not directly applicable to colectomy surgery, is part of the standard set of NSQIP adverse events that all NSQIP surgical clinical reviewers look for. NSQIP-generated predicted morbidities were then used to create AUROCs for the various populations: all of NSQIP, noncolon-related surgeries, all colectomies, laparoscopic colectomies, and open colectomies. The receiver operating characteristic curve is generated by the modeling process, and its c-statistic (the AUROC) gives an objective measure of how well that curve discriminates; it is defined as the probability that the model's prediction of the outcome is better than chance.7 The c-statistic can range from 0.5 (no predictive ability) to 1 (perfect discrimination). 
AUROCs were judged by the c-statistic: < 0.70 (no clinical utility), 0.70 to 0.79 (marginal clinical utility), 0.80 to 0.89 (adequate clinical utility), and greater than 0.90 (excellent clinical utility).8 All analyses were verified using segmentation and subset methods. Data were analyzed using statistical analysis software (SPSS version 19, IBM SPSS, New York, NY), and p < 0.05 was considered significant.
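The c-statistic bands stated above can be expressed as a small lookup. This is only a sketch: the function name `clinical_utility` is hypothetical, and the handling of values that fall between the printed bands (for example 0.795, or exactly 0.90) is an assumption, because the text does not specify it.

```python
def clinical_utility(c_stat):
    """Map a c-statistic to the utility bands quoted in the Methods.
    Assumption: each band is treated as a half-open interval, and a value
    of exactly 0.90 is placed in the top band."""
    if not 0.0 <= c_stat <= 1.0:
        raise ValueError("c-statistic must lie between 0 and 1")
    if c_stat < 0.70:
        return "no clinical utility"
    if c_stat < 0.80:
        return "marginal clinical utility"
    if c_stat < 0.90:
        return "adequate clinical utility"
    return "excellent clinical utility"


# Values reported in this study for the general model and the combined colectomy group.
general_model = clinical_utility(0.817)
all_colectomies = clinical_utility(0.703)
```

Under these bands the general NSQIP model falls in the adequate range and the combined colectomy group in the marginal range.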

Table 1.

Current Procedural Terminology codes

Procedure | Codes
Open colectomy | 44139, 44140, 44141, 44143, 44144, 44145, 44146, 44147, 44150, 44151, 44152
Laparoscopic colectomy | 44204, 44205, 44206, 44207, 44208, 44210, 44211, 44212, 44213, 44215
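The cohort assignment described in the Methods can be sketched as a lookup against the Table 1 codes. The code sets below are copied from Table 1; the function name `label_procedure` and the use of a single CPT code per record are assumptions, and the structure of the actual participant use files is not modeled here.

```python
# CPT codes as listed in Table 1.
OPEN_COLECTOMY_CPT = frozenset({
    "44139", "44140", "44141", "44143", "44144", "44145",
    "44146", "44147", "44150", "44151", "44152",
})
LAPAROSCOPIC_COLECTOMY_CPT = frozenset({
    "44204", "44205", "44206", "44207", "44208",
    "44210", "44211", "44212", "44213", "44215",
})


def label_procedure(cpt_code):
    """Return 'open', 'laparoscopic', or 'noncolectomy' for one CPT code."""
    code = str(cpt_code).strip()
    if code in OPEN_COLECTOMY_CPT:
        return "open"
    if code in LAPAROSCOPIC_COLECTOMY_CPT:
        return "laparoscopic"
    return "noncolectomy"


# Any code outside Table 1 falls into the noncolectomy group.
labels = [label_procedure(c) for c in ("44140", 44204, "99999")]
```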

Results

The general NSQIP population from January 1, 2005 to December 31, 2008 included 635,265 patients, of whom 45,645 underwent colonic resections (Table 2). Of the colonic resections, 12,455 (27.3%) were laparoscopic and 33,190 (72.7%) were open procedures. The mean age of all patients undergoing colectomy (the “colectomy” group) was 62.1 years, and 48.1% were male. The patients undergoing procedures unrelated to the colon (the “noncolectomy” group) were younger (mean = 54.5 years), and 42.2% were male. Emergent colectomies comprised 18.6% of all colectomies; 3.6% of laparoscopic colectomies and 24.2% of open procedures were emergent. Compared with other NSQIP-captured noncolorectal abdominal procedures, a higher proportion of colectomies were performed as emergency procedures, and these most often employed the open approach. The AUROCs for emergent morbidity, emergent mortality, elective morbidity, and elective mortality were 0.73, 0.86, 0.64, and 0.88, respectively (Table 3). The mean relative value unit was 25.6 for all colectomies, 28.1 for laparoscopic colectomies, and 24.7 for open colectomies. As displayed in Table 2, the American Society of Anesthesiologists (ASA) status for colectomies was significantly higher than that for noncolectomy cases, especially for open procedures. As expected, the predicted morbidity of the colectomy group was much higher than that of the noncolectomy group (11% for noncolectomies vs 24% for all colectomies, 17% for laparoscopic colectomies, and 26% for open procedures; all univariate data significant at p < 0.001). The occurrence of actual morbidity for all NSQIP procedures, all NSQIP noncolectomy procedures, all colectomies, laparoscopic colectomies, and open procedures was 14.2%, 13.0%, 14.2%, 17.9%, and 34.9%, respectively.

Table 2.

Findings by group

Characteristic | All NSQIP procedures (N = 635,265)a | Noncolectomies (n = 589,620)a | All colectomies (n = 45,645)a | Laparoscopic (n = 12,455)a | Open (n = 33,190)a
Mean age, years | 55.1 | 54.5 | 62.1 | 60.0 | 62.9
Male, % | 42.6 | 42.2 | 48.1 | 48.4 | 47.9
Emergency procedure, % | 12.9 | 12.4 | 18.6 | 3.6 | 24.2
Mean relative value unit | 15.6 | 14.8 | 25.6 | 28.1 | 24.7
ASA 1, % | 10.4 | 11.0 | 3.3 | 5.0 | 2.7
ASA 2, % | 45.6 | 45.7 | 44.4 | 59.6 | 38.7
ASA 3, % | 36.7 | 36.3 | 41.5 | 32.3 | 45.0
ASA 4, % | 6.6 | 6.4 | 9.8 | 2.9 | 12.4
ASA 5, % | 0.3 | 0.3 | 0.9 | 0.1 | 1.1
Average predicted morbidity | 0.12 | 0.11 | 0.24 | 0.17 | 0.26
Actual morbidity occurrence | 0.14 | 0.13 | 0.14 | 0.18 | 0.35

a Significant at p < 0.001.

ASA = American Society of Anesthesiologists class; NSQIP = National Surgical Quality Improvement Program.
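The Introduction notes that institutional outcomes are reported as acuity-adjusted observed-to-expected ratios. The sketch below shows the basic arithmetic under the usual definition (observed events divided by the sum of predicted probabilities). The function name and the four-patient example are hypothetical, and NSQIP's own adjustment procedure is not reproduced here.

```python
def observed_to_expected(had_event, predicted_risk):
    """Observed number of events divided by the expected number,
    where the expected number is the sum of predicted probabilities."""
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return sum(had_event) / expected


# Hypothetical patients: 2 observed events against about 1.0 expected.
oe_ratio = observed_to_expected([1, 0, 0, 1], [0.4, 0.1, 0.2, 0.3])

# The same idea applied to group means, using the open colectomy column
# of Table 2 (actual 0.35, predicted 0.26) purely as an illustration.
open_colectomy_ratio = 0.35 / 0.26
```

A ratio above 1 indicates more events than the model expected. Applying the ratio to rounded group means, as in the last line, is only a rough illustration and not how institutional reports are derived.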

Table 3.

Morbidity and mortality by colectomy emergency status

Colectomy emergency status | N | AUROCa | Confidence interval | p value | Corresponding figure
Emergency, morbidity | 4408 | 0.73 | 0.72–0.74 | < 0.001 | 2
Emergency, mortality | 1295 | 0.86 | 0.85–0.87 | < 0.001 | 3
Elective, morbidity | 9376 | 0.64 | 0.64–0.65 | < 0.001 | 4
Elective, mortality | 777 | 0.88 | 0.87–0.89 | < 0.001 | 5

a AUROC value > 0.8 (accurate predictive model); value ≤ 0.7 (marginal predictive model).

AUROC = area under the receiver operator characteristic curve.

The detail of each AUROC curve is aggregated and summarized in Table 4. The AUROC for the general NSQIP model was 0.817, which was accurate in predicting morbidity in the entire patient population; the confidence interval (CI) was appropriate, and the p value was of statistical significance. The AUROC for the combined laparoscopic and open colectomy group was 0.703 and therefore marginal in predicting morbidity for the entire colectomy group. An appropriate CI was also obtained, and the p value demonstrated statistical significance. The AUROCs for the individual laparoscopic and open colectomy groups were 0.633 and 0.701, respectively. The NSQIP-generated AUROCs for these patient populations were marginal at predicting morbidity, which was supported by adequate sample size, CI, and p values (Figure 1). Figures 2 to 5 show AUROCs for morbidity and mortality for elective and emergency colectomies.

Table 4.

Aggregation and summarization of AUROCs

Group | N | AUROCa | Confidence interval | p value | Corresponding figure
All patients | 635,265 | 0.817 | 0.815–0.819 | < 0.001 | 1
Noncolectomy | 589,620 | 0.816 | 0.814–0.818 | < 0.001 | 1
Open colectomy | 33,190 | 0.701 | 0.694–0.707 | < 0.001 | 1
All colectomies | 45,645 | 0.702 | 0.697–0.708 | < 0.001 | 1
Laparoscopic colectomy | 12,455 | 0.633 | 0.619–0.647 | < 0.001 | 1

a AUROC value > 0.8 (accurate predictive model); value ≤ 0.7 (marginal predictive model).

AUROC = area under the receiver operator characteristic curve.

Figure 1.

Comparison of receiver operator characteristic (ROC) curves.

a The random guess line, also known as the line of no discrimination, represents the strategy of randomly guessing. That is, the result is no more accurate than a random guess, for example, heads or tails. The closer the sample curve approaches the random guess line, the less accurate the test. The closer the sample line approaches the left and upward direction (toward y = 1), the more accurate the test.1,2 Diagonal segments are produced by ties.

NSQIP = National Surgical Quality Improvement Program.

1. Harrell FE Jr. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. New York, NY: Springer-Verlag New York, Inc; 2001.

2. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007 Feb 20;115(7):928–35. DOI: http://dx.doi.org/10.1161/CIRCULATIONAHA.106.672402.

Figure 2.

Morbidity of emergency colectomy: receiver operator characteristic curve.

Solid diagonal line represents the random guess line.

Figure 5.

Mortality of elective colectomy: receiver operator characteristic curve.

Solid diagonal line represents the random guess line.

Discussion

Our review demonstrates that the NSQIP-generated morbidities used to create AUROCs are accurate for patients in an overall surgical model. However, NSQIP-generated morbidities used to create AUROCs to predict morbidity in patients undergoing open colectomy demonstrated marginal accuracy at best and even less reliability for laparoscopic colectomy. The NSQIP risk models tend to emphasize comorbidities rather than intraoperative details or technical aspects of colonic resections. It is our opinion that certain factors may affect the surgical morbidity, including the case volume experience of the surgeon, the surgeon’s training (eg, specialized training in colorectal surgery or minimally invasive surgical fellowship), institutional support for colorectal oncology therapy, conversion from laparoscopic to open procedure, and institutional investment in laparoscopic equipment and dedicated surgical teams in the operating room. For instance, Bates and colleagues9,10 compared operative mortality rates of board-certified colorectal surgeons vs other institutional general surgeons, finding that overall mortality rates for colorectal operations were 1.4% for colorectal surgeons and 7.3% for other general surgeons. Specific patient factors such as type of prior abdominal surgery, adhesions, severity of disease process (in cases such as diverticulitis and inflammatory bowel disease), quality of bowel preparation, intraoperative decision making, intraoperative technique choices, or unexpected findings that change the planned strategy are not tracked or monitored by NSQIP. For example, the study conducted by Van’t Sant et al11 showed that anastomotic leakage developed in 7.8% of patients treated with mechanical bowel preparation and in 5.7% of patients not treated with mechanical bowel preparation (p = 0.79). 
Anastomotic leakage and intraabdominal abscess adverse events, which are of far greater concern to surgeons, are collected under the broad category of the organ/deep space infection variable in NSQIP. Therefore, it is plausible that NSQIP could be enhanced to better evaluate the technical and process-related variables that might affect morbidity rates in colon and rectal surgery, which current NSQIP data have not routinely considered. Investigators for the Michigan Colectomy Collaborative currently are focusing independent efforts on colectomy procedures, using a broader NSQIP approach to produce uniformity across the data and accurate comparison of different institutions.12

Future research efforts are needed to further understand and quantify the impact of various intraoperative factors on postoperative outcome to improve the value of the NSQIP program as it pertains to colorectal surgical procedures. This is of paramount importance considering that colorectal surgical procedures contribute to a substantial percentage of postoperative complications among all general surgical procedures.13

Conclusions

This study demonstrated the limitations of NSQIP as a risk-adjusted program used to monitor postoperative outcomes in patients undergoing colorectal resections. When evaluating practice improvement opportunities on the basis of expected outcomes for colon and rectal surgery in NSQIP reports, an organization and its physicians must balance the significant power of statistical measurements with a need for granularity about specific patient factors that may also influence outcome. Further research is needed to delineate the impact of various intraoperative technical and process-related factors that can affect outcome.

Figure 3.

Mortality of emergency colectomy: receiver operator characteristic curve.

Solid diagonal line represents the random guess line.

Figure 4.

Morbidity of elective colectomy: receiver operator characteristic curve.

Solid diagonal line represents the random guess line.

Acknowledgments

Kathleen Louden, ELS, of Louden Health Communications provided editorial assistance.

Footnotes

Disclosure Statement

The author(s) have no conflicts of interest to disclose.
