Abstract
Background:
Surgical risk calculators (SRCs) have been developed for estimation of postoperative complications but do not directly inform decision-making. Decision curve analysis (DCA) is a method for evaluating prediction models, measuring their utility in guiding decisions. We aimed to analyze the utility of SRCs to guide both preoperative and postoperative management of patients undergoing hepatopancreaticobiliary surgery by using DCA.
Methods:
A single-institution, retrospective review of patients undergoing hepatopancreaticobiliary operations between 2015 and 2017 was performed. Estimation of postoperative complications was conducted using the American College of Surgeons SRC (ACS-SRC) and the Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator; risks were compared with observed outcomes. DCA was used to model optimal patient selection for risk prevention strategies and to compare the relative performance of the ACS-SRC and POTTER calculators.
Results:
A total of 994 patients were included in the analysis. C-statistics for the ACS-SRC prediction of 12 postoperative complications ranged from 0.546 to 0.782. DCA revealed that an ACS-SRC–guided readmission prevention intervention, when compared with an all-or-none approach, yielded a superior net benefit for patients with estimated risk between 5% and 20%. Comparison of SRCs for venous thromboembolism intervention demonstrated superiority of the ACS-SRC for thresholds for intervention between 2% and 4%, with the POTTER calculator performing superiorly between 4% and 8% estimated risk.
Conclusions:
SRCs can be used not only to predict complication risk but also to guide risk prevention strategies. This methodology should be incorporated into external validations of future risk calculators and can be applied for institution-specific quality improvement initiatives to improve patient outcomes.
Keywords: Surgical risk calculators, Decision curve analysis, Risk prediction, Postoperative complications, Net benefit
Introduction
Postoperative complications, which occur in up to 30% of general surgery procedures, may lead to prolonged hospital stay, delays in adjuvant treatment, poor quality of life, and postoperative mortality.1 In addition, adverse events represent a significant burden to the health care system; thus, it is critical to accurately predict these complications and seek to reduce them.2 Many groups have attempted to refine risk estimation through the creation of surgical risk calculators (SRCs).3–5 As an example, the American College of Surgeons (ACS) SRC uses patient demographics and clinical characteristics to generate estimated risk percentages for 12 complications including cardiovascular events, surgical site infections, and death.4 Although patient-specific risk estimation is a valuable tool for informed consent discussion, clinical application of these data to guide intervention in situations where estimated risk is high has thus far been limited.
Despite their wealth of information, risk calculators have several limitations. The ACS-SRC, for example, was developed using a heterogeneous cohort of patients and may yield variable accuracy in specific patient populations.4 In addition, calculators may only provide accurate risk prediction within a certain range and thus may oversimplify overall risk.6 With the advent of new calculators, it has also become increasingly difficult to establish the relative performance of one calculator over another. Finally, translation of risk calculator data into clinical practice is lacking. To address this latter point, Vickers et al. developed decision curve analysis (DCA) as a statistical approach using existing risk calculation data to evaluate the clinical consequences of a targeted intervention to improve overall patient outcomes.6–8 This represents a novel approach to inform implementation of risk prevention measures.
The aim of this study was to use validated SRCs, including the ACS-SRC and the Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator,3 to 1) compare predicted risk with observed outcomes, 2) use DCA to estimate the potential impact of risk-driven interventions on patient outcomes, and 3) compare the overall net benefit of risk-driven interventions from two different calculators. To explore this, we focused exclusively on patients undergoing hepatopancreaticobiliary (HPB) surgery, as these are complex operations associated with relatively high morbidity and mortality. We hypothesized that utilization of DCA in interpretation of risk estimation could provide objective data to guide quality improvement initiatives and optimize patient outcomes in HPB surgery with the potential for application to other surgical patient populations.
Methods
Derivation of the study cohort
A retrospective review of our institutional HPB database was performed for all patients who underwent pancreaticoduodenectomy (Whipple procedure), distal pancreatectomy, or hepatectomy for any indication between January 1, 2015 and June 30, 2017. Both open and laparoscopic cases were included. The Current Procedural Terminology codes used to categorize patients by procedure type are listed (Appendix Table A.1). Baseline characteristics were obtained from our institutional ACS National Surgical Quality Improvement Program data set. Observed 30-day outcomes were obtained from ACS National Surgical Quality Improvement Program data and from comprehensive review of the electronic health record. This study was approved by the Institutional Review Board at the University of Pittsburgh. Because this was a retrospective study, informed consent was waived for participation in the study.
External validation of the risk calculator
Baseline characteristics were entered into the ACS web-based SRC between August 20, 2017 and August 28, 2017 to obtain estimated risk percentages of the 12 specified 30-day complication rates. Observed and predicted risk percentages were compared using discrimination, calibration, and the Brier score. Discrimination was measured using the C-statistic or the area under the receiver operating characteristic curve (AUC).9 Calibration was measured using the Hosmer–Lemeshow test, which divides the data set into deciles based on predicted values and compares the observed response rates to the expected rates, with significant differences indicating lack of fit.10 Overall predictive accuracy was measured using the raw Brier score, which is the average gap (mean squared difference) between forecast probabilities and actual outcomes. This incorporates a model’s discriminative ability and calibration with a score of 0 for a perfect model and a score of p*(1-p), where p is the a priori probability of the outcome, for a noninformative model.11,12
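As an illustration of the discrimination and accuracy metrics described above, the following is a minimal Python sketch, not the code used in this study. The function names and the five-patient data set are hypothetical; the C-statistic is computed by simple pairwise concordance, with ties counted as one-half.

```python
from itertools import product


def brier_score(predicted, observed):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


def noninformative_brier(prevalence):
    """Brier score of a model that assigns the a priori probability p to everyone."""
    return prevalence * (1 - prevalence)


def c_statistic(predicted, observed):
    """Share of (event, non-event) pairs in which the event received the higher risk."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5  # ties count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = complication occurred).
predicted = [0.10, 0.40, 0.20, 0.80, 0.30]
observed = [0, 1, 0, 1, 0]

print(f"Brier score: {brier_score(predicted, observed):.3f}")
print(f"Noninformative reference: {noninformative_brier(sum(observed) / len(observed)):.3f}")
print(f"C-statistic: {c_statistic(predicted, observed):.3f}")
```

A Brier score close to the noninformative reference p*(1-p) suggests that the model adds little beyond the overall event rate, which is why the raw score is best read alongside the outcome prevalence.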
Decision curve analysis
Decision curves were constructed using the open source “rmda” package (version 1.4) in R (version 3.4.1; R Development Core Team, Vienna, Austria).13 A detailed explanation on the manual calculation of a decision curve is provided (Appendix). In addition, to understand the relative frequencies of risk estimates, we developed a custom score distribution script in Python using standard plotting libraries (version 3.4; Python Software Foundation, Wilmington, DE). No data manipulation was performed, and the visualizations are easily reproducible.
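The idea behind a score distribution can be sketched in a few lines of Python. This is not the script used in this study; the function name and the risk values are hypothetical, and plotting is omitted. The sketch tabulates the share of patients whose estimated risk falls at or below each candidate threshold probability.

```python
def fraction_at_or_below(risks, thresholds):
    """Share of risk estimates that are <= each threshold probability."""
    n = len(risks)
    return {t: sum(1 for r in risks if r <= t) / n for t in thresholds}


# Hypothetical estimated VTE risks for 10 patients (proportions, not percentages).
risks = [0.012, 0.018, 0.025, 0.031, 0.034, 0.047, 0.052, 0.066, 0.081, 0.120]
thresholds = [0.02, 0.04, 0.06, 0.08]

distribution = fraction_at_or_below(risks, thresholds)
for threshold, share in distribution.items():
    print(f"risk <= {threshold:.0%}: {share:.0%} of patients")
```

Reading a decision curve alongside such a distribution shows how many patients actually sit near a given threshold, and therefore how much of the cohort a change in threshold would affect.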
Decision curve analysis for selection of risk intervention and comparison of risk calculators
After our external validation of the ACS-SRC in our patient cohort, we constructed a DCA curve to model readmission. We also performed an external validation for the POTTER calculator in our patient cohort to compare C-statistics. As an example, we then plotted DCA curves for venous thromboembolism (VTE) for both the ACS-SRC and POTTER calculators, as compared with treat-all and treat-none.
Statistical analysis
Continuous data are shown as mean ± standard deviation, and categorical data are shown as number (percent). Student’s t-test was used to compare continuous variables. Statistical analyses were performed using RStudio (version 3.4.1; R Foundation for Statistical Computing, Vienna, Austria).
Case scenario explanation of decision curve analysis
To illustrate the concepts of DCA, we offer the following scenario. City Y has performed poorly in tornado preparedness and is considering the adoption of a new weather prediction model, model X, to predict a tornado and thereby guide the activation of their city-wide siren system. Currently, sirens only activate when a tornado is already reported on the ground. Model X inputs a series of meteorological variables and outputs an estimated risk percentage, from 0% to 100%, of a tornado touching down in the next several hours. City Y officials want to determine if the expected net benefit of implementing model X is superior to no intervention (status quo) or to a liberal strategy of activating the sirens on any day with inclement weather.
To begin, city Y officials perform an external validation of model X to assess its reproducibility and geographic transportability. Weather data from the past 100 d are input into model X, and the estimated risk percentages of a tornado touching down are recorded. When comparing these expected values to actual observed tornado events, metrics of model X performance such as AUC and calibration can be recorded. Although important, these metrics alone do not inform on the utility of adopting model X; therefore, city Y officials wish to apply some form of decision analytics. They choose DCA because their same external validation data set is sufficient to perform DCA.
DCA provides a method for exploring risk-based intervention (in this example, siren activation) that, unlike other forms of decision analysis, does not require utility scores for all possible outcomes of the decision tree.8,14,15 Instead, DCA approaches the problem in terms of the threshold probability (Pt) above which the decision maker would deem the expected value of intervention to be greater than that of not intervening. In this formulation, the ratio (1 − Pt)/Pt represents the harm of a false-negative result relative to that of a false-positive result (e.g., a threshold probability of 10% signifies that the harm of a single false negative is nine times the harm of a false positive). In our example, a false positive indicates unnecessary siren activation, whereas a false negative represents a failure of siren activation on a day of a tornado touching down. In other words, a false positive signifies waste of time and resources while a false negative indicates a missed opportunity for improved safety and care. The effectiveness of an intervention and its cost (monetary, risk to those involved, etc.) is inherently accounted for in the decision maker’s selection of the threshold probability. An intervention that approaches 100% efficacy with minimal cost and risk lends itself to a low threshold probability for triggering the intervention.
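The relationship between the threshold probability and the implied weighting of errors can be made concrete with a short Python sketch. The function name is hypothetical; the arithmetic follows the 10% example in the preceding paragraph.

```python
def false_negative_harm_ratio(pt):
    """Harm of one false negative relative to one false positive at threshold pt."""
    if not 0 < pt < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    return (1 - pt) / pt


for pt in (0.10, 0.50, 0.75):
    ratio = false_negative_harm_ratio(pt)
    print(f"Pt = {pt:.0%}: one false negative weighs as much as {ratio:.2f} false positives")
```

At a threshold of 10% the ratio is nine, at 50% the two errors are weighted equally, and above 50% a false positive is treated as the costlier error.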
DCA derives the net benefit (y-axis) to intervene when the estimated risk percentage from model X is greater than city Y’s specified threshold probability (x-axis). The unit of net benefit is analogous to net true positives. As a reference, a net benefit of 0.1 is equivalent to a net of 10 true-positive siren activations per 100 d without an increase in the number of false-positive activations. At the decision maker’s selected threshold probability, the model or strategy with the highest net benefit would be the best to implement. At a given Pt, net benefit is calculated by subtracting the proportion of all days with false-positive siren activations (simply count the number of days without an observed tornado where model X’s estimated risk percentage was greater than or equal to the given Pt) from the proportion with true-positive activations, weighting by the relative harm of a false-positive and a false-negative result in accordance with the following formula, where n is the total number of days:

$$\text{Net benefit} = \frac{\text{true-positive count}}{n} - \frac{\text{false-positive count}}{n} \times \frac{P_t}{1 - P_t}$$
By allowing the threshold probability to vary, DCA can show graphically the net benefit obtained by using model X for the decision to activate sirens. In the absence of other predictive models, DCA compares net benefit of using model X against the net benefits of two opposing strategies of treat-none and treat-all days with the intervention. For the former, the net benefit to intervene is zero, which is constant whatever the threshold probability. In the latter, when intervention is implemented for all days, (true-positive count/n) is city Y tornado prevalence (π) and (false-positive count/n) is 1 − π, resulting in a net benefit function defined by π − (1 − π) × Pt/(1 − Pt) that ranges from π down to negative infinity.
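The calculation described above can be sketched in Python as follows. This is a minimal illustration rather than the rmda implementation: net benefit is taken as (true-positive count/n) minus (false-positive count/n) weighted by Pt/(1 − Pt), and the function names and the 10-day data set are hypothetical.

```python
def net_benefit(risks, outcomes, pt):
    """Net benefit of intervening whenever the estimated risk is >= pt."""
    n = len(outcomes)
    true_pos = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 1)
    false_pos = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 0)
    return true_pos / n - (false_pos / n) * pt / (1 - pt)


def net_benefit_treat_all(prevalence, pt):
    """Net benefit when the intervention is applied on every day."""
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def net_benefit_treat_none(pt):
    """Net benefit when the intervention is never applied (always zero)."""
    return 0.0


# Hypothetical data: 10 days, tornado on the first two (1 = tornado observed).
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.6, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
prevalence = sum(outcomes) / len(outcomes)

for pt in (0.05, 0.25, 0.50):
    print(
        f"Pt = {pt:.2f}: model {net_benefit(risks, outcomes, pt):+.3f}, "
        f"treat-all {net_benefit_treat_all(prevalence, pt):+.3f}, "
        f"treat-none {net_benefit_treat_none(pt):+.3f}"
    )
```

Evaluating these three functions over a grid of threshold probabilities and plotting the results against Pt yields the decision curve.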
DCA’s utility becomes more apparent when multiple models are compared. Model W claims to be superior to model X because of its higher AUC. City Y officials have agreed on a threshold probability of 50% for siren activations (i.e., they are willing to accept a false alarm rate of 50%). As such, the model with the highest net benefit at a Pt of 50% would be their selection. City Z is also looking to implement a model; however, they have more fortified commercial and residential architecture and are willing to accept a higher threshold probability of 75% for siren activation. A higher model AUC does not guarantee a higher net benefit across all threshold probabilities; therefore, they would also look at the DCA comparing model X and model W before making their selection.
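A toy comparison illustrates how the preferred model can change with the threshold probability. The sketch below is in Python, the risk estimates for models X and W are invented for illustration, and the helper names are hypothetical.

```python
def net_benefit(risks, outcomes, pt):
    """Net benefit of intervening whenever the estimated risk is >= pt."""
    n = len(outcomes)
    true_pos = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 1)
    false_pos = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 0)
    return true_pos / n - (false_pos / n) * pt / (1 - pt)


def best_model(models, outcomes, pt):
    """Name of the model with the highest net benefit at threshold pt."""
    return max(models, key=lambda name: net_benefit(models[name], outcomes, pt))


# Hypothetical data: 10 days, tornado on the first two (1 = tornado observed).
tornado = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
models = {
    "model X": [0.8, 0.3, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    "model W": [0.6, 0.6, 0.4, 0.4, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1],
}

choice_city_y = best_model(models, tornado, 0.50)  # city Y's threshold
choice_city_z = best_model(models, tornado, 0.75)  # city Z's threshold
print("City Y (Pt = 50%):", choice_city_y)
print("City Z (Pt = 75%):", choice_city_z)
```

In this invented data set, model W ranks both tornado days above every other day and therefore has the higher C-statistic, yet it never issues an estimate of 75% or more. City Y would select model W, whereas city Z would obtain a greater net benefit from model X.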
Results
Baseline characteristics of the study cohort
Baseline characteristics of patients undergoing one of three selected HPB procedures are shown (Table 1). Of the 994 patients, 306 (30.8%) underwent a Whipple procedure, 127 (12.8%) underwent distal pancreatectomy, and 561 (56.4%) underwent hepatectomy. Approximately one-third (36.3%) of patients undergoing a Whipple procedure were <65 y old, whereas 48% of patients undergoing distal pancreatectomy and 54.9% of patients undergoing a hepatectomy were <65 y. The majority of patients in the Whipple (97.1%) and distal pancreatectomy (95.3%) groups had a cancer diagnosis, whereas only 58.5% of those undergoing hepatectomy had a cancer diagnosis.
Table 1 –
Baseline characteristics of patients undergoing hepatopancreaticobiliary surgery, stratified by procedure type.
| | Whipple n = 306 | Distal pancreatectomy n = 127 | Hepatectomy n = 561 |
|---|---|---|---|
| Age (y)–no. (%) | | | |
| <65 | 111 (36.3) | 61 (48.0) | 308 (54.9) |
| 65–74 | 117 (38.2) | 45 (35.4) | 155 (27.6) |
| 75–84 | 71 (23.2) | 19 (15.0) | 87 (15.5) |
| >85 | 7 (2.3) | 2 (1.6) | 11 (2.0) |
| Sex–no. (%) | | | |
| Female | 145 (47.4) | 65 (51.2) | 276 (49.2) |
| Male | 161 (52.6) | 62 (48.8) | 285 (50.8) |
| Body mass index (kg/m2)–mean ± SD | 27.0 ± 5.9 | 29.6 ± 6.9 | 28.6 ± 6.0 |
| Functional status | | | |
| Independent | 304 (99.3) | 127 (100.0) | 559 (99.6) |
| Partially dependent | 2 (0.7) | 0 | 2 (0.4) |
| Emergent–no. (%) | 1 (0.3) | 0 | 1 (0.2) |
| ASA class | | | |
| 1 | 0 | 1 (0.8) | 3 (0.5) |
| 2 | 35 (11.4) | 19 (15.0) | 97 (17.3) |
| 3 | 247 (80.7) | 98 (77.2) | 436 (77.7) |
| 4 | 24 (7.8) | 9 (7.1) | 25 (4.5) |
| Ventilator dependent–no. (%) | 0 | 0 | 0 |
| Cancer–no. (%) | 297 (97.1) | 121 (95.3) | 328 (58.5) |
| Sepsis–no. (%) | | | |
| None | 301 (98.4) | 127 (100.0) | 555 (98.9) |
| SIRS | 3 (1.0) | 0 | 2 (0.4) |
| Sepsis | 2 (0.7) | 0 | 4 (0.7) |
| Diabetes–no. (%) | | | |
| No | 223 (72.9) | 92 (72.4) | 459 (81.8) |
| Insulin | 47 (15.4) | 11 (8.7) | 40 (7.1) |
| Noninsulin | 36 (11.8) | 24 (18.9) | 62 (11.1) |
| Hypertension–no. (%) | 182 (59.5) | 71 (55.9) | 276 (49.2) |
| Congestive heart failure–no. (%) | 0 | 0 | 1 (0.2) |
| Dyspnea–no. (%) | | | |
| No | 295 (96.4) | 0 (0.0) | 552 (98.4) |
| At rest | 1 (0.3) | 4 (3.1) | 1 (0.2) |
| Moderate exertion | 10 (3.3) | 123 (96.9) | 8 (1.4) |
| Smoking–no. (%) | 72 (23.5) | 33 (26.0) | 106 (18.9) |
| COPD–no. (%) | 15 (4.9) | 33 (26.0) | 106 (18.9) |
| Dialysis–no. (%) | 1 (0.3) | 0 | 1 (0.2) |
| Acute renal failure–no. (%) | 0 | 0 | 0 |
| Ascites–no. (%) | 0 | 0 | 0 |
| Chronic steroid use–no. (%) | 5 (1.6) | 3 (2.4) | 9 (1.6) |
ASA = American Society of Anesthesiologists; COPD = chronic obstructive pulmonary disease; SIRS = systemic inflammatory response syndrome; SD = standard deviation.
Comparison of predicted and observed 30-day outcomes
Predicted (based on the ACS-SRC) and observed 30-day rates of 12 complications specified by the ACS-SRC are shown (Table 2). Poor calibration was observed for “any complication” and “pneumonia” with Hosmer–Lemeshow test P-values of 0.04 and 0.01, respectively. All other complications displayed a P-value >0.05, indicating satisfactory calibration. In addition, we compared predicted and observed outcomes individually for Whipple procedure (Appendix Table A.2), distal pancreatectomy (Appendix Table A.3), and hepatectomy (Appendix Table A.4). Among all patients, the highest C-statistic was noted for discharge to a facility (0.782, Fig. 1A) with the lowest being for surgical site infection (0.546, Fig. 1B). Receiver operating characteristic curves for readmission (C-statistic 0.611, Fig. 1C) and VTE (C-statistic 0.674, Fig. 1D) are also shown. Analysis of the Brier scores (with a lower score reflecting superior accuracy) demonstrates the highest accuracy for cardiac complications (0.002), death (0.009), and renal failure (0.013).
Table 2 –
Predicted and observed 30-day outcomes for patients undergoing hepatopancreaticobiliary surgery.
| | Predicted % ± SD | Observed no. (%) | C-statistic | Brier score | Hosmer–Lemeshow P-value |
|---|---|---|---|---|---|
| Serious complication | 21.8 ± 7.5 | 190 (19.1) | 0.606 | 0.152 | 0.37 |
| Any complication | 25.3 ± 9.0 | 220 (22.1) | 0.599 | 0.171 | 0.04 |
| Pneumonia | 4.1 ± 2.3 | 27 (2.7) | 0.656 | 0.026 | 0.01 |
| Cardiac complication | 1.8 ± 1.5 | 2 (0.2) | 0.571 | 0.002 | 0.55 |
| Surgical site infection | 13.8 ± 6.4 | 132 (13.3) | 0.546 | 0.117 | 0.32 |
| Urinary tract infection | 3.2 ± 1.5 | 19 (1.9) | 0.706 | 0.019 | 0.26 |
| Venous thromboembolism | 3.2 ± 1.2 | 25 (2.5) | 0.674 | 0.024 | 0.07 |
| Renal failure | 1.9 ± 1.5 | 13 (1.3) | 0.658 | 0.013 | 0.47 |
| Readmission | 13.7 ± 3.6 | 182 (18.3) | 0.611 | 0.148 | 0.14 |
| Return to the operating room | 3.9 ± 1.8 | 36 (3.6) | 0.591 | 0.035 | 0.08 |
| Death | 2.2 ± 2.5 | 9 (0.9) | 0.604 | 0.009 | 0.78 |
| Discharge to a nursing or rehabilitation facility | 9.3 ± 9.3 | 86 (8.7) | 0.782 | 0.069 | 0.39 |
| Length of stay (d), mean ± SD | 7.9 ± 2.5 | 6.6 ± 5.3 | | | |
SD = standard deviation.
Risk prediction was obtained from the American College of Surgeons Surgical Risk Calculator. P-value for length of stay was <0.001.
Fig. 1 –

Receiver operating characteristic curves generated from comparison of predicted (from the American College of Surgeons Surgical Risk Calculator) and observed 30-day complication rates for surgical site infection (A), readmission (B), venous thromboembolic event (C), and discharge to a nursing facility (D) in patients undergoing hepatopancreaticobiliary surgery.
Decision curve analysis for risk intervention
We modeled the application of DCA for intervention for readmission (Table 3). The threshold probability at which an intervention would be implemented is shown in addition to the corresponding net benefit (for treating only those patients with an ACS-SRC risk above the threshold versus treating all patients) and the adjusted benefit of using the ACS-SRC risk to determine intervention. In addition, we demonstrate the number of patients that would be spared from an unnecessary intervention at each threshold. As an example, if a surgeon selects an ACS-SRC–predicted readmission risk of 10% at which to intervene, unnecessary intervention (i.e., intervention is implemented but the patient would not have experienced the complication) would be avoided in 26 of 100 patients.
Table 3 –
Application of decision curve analysis for readmission intervention, based on data obtained from the American College of Surgeons (ACS) Surgical Risk Calculator (SRC).
| Pt (%) | Patient count, total | Patient count, TPC | Patient count, FPC | Net benefit, treatment based on ACS-SRC | Net benefit, all patients treated | Relative benefit using ACS-SRC | Number of interventions avoided per 100 patients |
|---|---|---|---|---|---|---|---|
| 0 | 994 | 182 | 812 | 0.183 | 0.183 | 0 | 0 |
| 5 | 990 | 182 | 808 | 0.141 | 0.140 | 0.001 | 1.480 |
| 6 | 980 | 182 | 798 | 0.134 | 0.131 | 0.003 | 4.359 |
| 7 | 948 | 181 | 767 | 0.130 | 0.122 | 0.008 | 11.185 |
| 8 | 934 | 181 | 753 | 0.124 | 0.112 | 0.012 | 13.364 |
| 9 | 881 | 176 | 705 | 0.121 | 0.102 | 0.018 | 18.527 |
| 10 | 819 | 171 | 648 | 0.121 | 0.092 | 0.029 | 25.693 |
| 11 | 758 | 161 | 597 | 0.115 | 0.082 | 0.033 | 26.639 |
| 12 | 655 | 150 | 505 | 0.124 | 0.072 | 0.052 | 38.258 |
| 13 | 572 | 136 | 436 | 0.124 | 0.061 | 0.063 | 42.049 |
| 14 | 514 | 122 | 392 | 0.113 | 0.050 | 0.063 | 38.754 |
| 15 | 426 | 100 | 326 | 0.100 | 0.039 | 0.061 | 34.429 |
| 20 | 24 | 5 | 19 | 0.010 | −0.021 | 0.032 | 12.617 |
FPC = false-positive count; Pt = threshold probability; TPC = true-positive count.
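The treat-all column and the final column of Table 3 can be approximated with a short Python sketch. It assumes the standard DCA expressions: treat-all net benefit equals π − (1 − π) × Pt/(1 − Pt), and the number of interventions avoided per 100 patients equals the difference in net benefit divided by Pt/(1 − Pt), multiplied by 100. The function names are hypothetical, and because the model net benefit is entered as the rounded table value, the result only approximates the tabulated figure.

```python
def net_benefit_treat_all(prevalence, pt):
    """Net benefit when every patient receives the intervention."""
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def interventions_avoided_per_100(nb_model, nb_all, pt):
    """Net reduction in interventions per 100 patients versus treating everyone."""
    return (nb_model - nb_all) / (pt / (1 - pt)) * 100


prevalence = 182 / 994  # observed readmissions / total patients (Table 2)
pt = 0.13               # threshold probability of 13%
nb_model = 0.124        # ACS-SRC-guided net benefit at 13%, rounded value from Table 3

nb_all = net_benefit_treat_all(prevalence, pt)
avoided = interventions_avoided_per_100(nb_model, nb_all, pt)
print(f"Treat-all net benefit at 13%: {nb_all:.3f}")
print(f"Interventions avoided per 100 patients: {avoided:.1f}")
```

Under these assumptions the treat-all net benefit rounds to the 0.061 listed in Table 3, and the estimate of interventions avoided lies close to the tabulated 42.049, with the small difference attributable to rounding of the input.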
We demonstrate the DCA for readmission, with the net benefit on the y-axis and the threshold probability on the x-axis (Fig. 2). The solid line shows the net benefit if all patients were to receive the readmission intervention, and the dashed line represents no patients receiving the intervention. The dash-dot line shows the decision curve using the SRC-predicted risk. Using the SRC approach to guide intervention results in superior net benefit between threshold probabilities of 5% and 20%.
Fig. 2 –

Decision curve analysis for readmission for patients undergoing hepatopancreaticobiliary surgery, based on estimated risk from the American College of Surgeons Surgical Risk Calculator.
Decision curve analysis for comparison of risk calculators
Next we sought to use DCA to compare risk-based VTE intervention using two different risk calculators (ACS-SRC and POTTER). The receiver operating characteristic curves for VTE for both ACS-SRC and POTTER are shown (Fig. 3A), with a C-statistic of 0.574 for POTTER and 0.674 for ACS-SRC. As shown in Figure 3B, ACS-SRC–guided intervention yielded a greater net benefit at lower threshold probabilities (2%-4%) and the POTTER results were superior between threshold probabilities of 4% and 8%. In addition, we demonstrate the score distribution of each calculator at different threshold probabilities (Fig. 3C), which displays the overall distribution of risk percentage frequency obtained from the calculators at or below each probability in our external validation data set.
Fig. 3 –

Receiver operating characteristic curves demonstrating predicted 30-day venous thromboembolism (VTE) risk as estimated by the American College of Surgeons Surgical Risk Calculator (ACS-SRC) and the Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator (A). Decision curve analysis using both the ACS-SRC and POTTER calculators for VTE risk intervention (B). Decision curve analysis with intervals representing the frequency of reported risk estimation at or below each probability (C).
Discussion
SRCs have been devised with the intent of guiding informed consent discussions through estimation of specific risks. The clinical application of these results in the postoperative period, however, has thus far been limited. Building on prior work by others using DCA, we demonstrate the utility of DCA in the understanding of implementation of risk-reducing approaches based on estimated risk in a population of almost 1000 patients undergoing high-risk HPB surgery. Our results highlight that the model with the largest AUC may not be the optimal model for clinical decisions. DCA also allows for the selection of the optimal SRC depending on the goals of a quality improvement program and thresholds for intervention. This approach has broad applications to a variety of surgical specialties and can serve to guide quality improvement initiatives to improve surgical outcomes.
To the best of our knowledge, this is the largest published external validation of the ACS-SRC in a population of patients undergoing HPB surgery with almost 1000 patients. Our results demonstrate satisfactory overall estimation of the 12 complications with results comparable with other external validation studies in various surgical populations. Similar to other studies, we found variations in model performance between different complications.16–18 In addition, our Brier scores were comparable with those reported in an exploration of the ACS-SRC in a smaller study of patients undergoing HPB procedures19 and in patients undergoing Whipple.20 We do believe that it is important to incorporate multiple tests of predictive accuracy in the external validation of SRCs, as expounded by others and as we have carried out here.21 Because interpretation of model performance remains subjective and should be interpreted in the context of different practice environments, it is particularly important to factor in the SRC calibration, accuracy, and discrimination results before applying SRC results to clinical practice or research endeavors.
One key advantage of DCA is the ability to select a threshold probability. DCA can be applied in individual practice settings, with surgeons determining their own threshold for intervention based on their available risk interventions and perceived benefits, and is also applicable on a larger scale for institution-wide quality improvement initiatives. For example, to reduce postoperative VTE, an institution may select to institute aggressive risk management strategies (e.g., early ambulation, mechanical compression, outpatient pharmacologic prophylaxis) in patients with an estimated VTE risk over 3% based on ACS-SRC risk.22,23 Thus, in a preoperative office setting, providers can obtain these risk estimations and use these results not only to counsel patients but also to identify potential barriers to ensure a smooth postoperative course. This allows for preoperative interventions (e.g., prehabilitation which may facilitate early postoperative ambulation) and allows for sufficient planning to institute any postoperative interventions (e.g., patient and caretaker education about enoxaparin injections).24,25 This approach, however, does necessitate understanding of the risks/benefits and resources available for each intervention and thus must be tailored for each institution. DCA allows for flexibility in threshold probability and provides a range in which a given calculator remains reliable and thus is adaptable to changes in goal metrics.
Certainly, DCA is not the only approach developed for modeling the impacts of decision-making from a clinical, resource, and financial perspective. Cost-effectiveness analysis and decision tree analysis, for example, represent valuable and more robust methodologies, although they may require additional data such as billing records and utility scores which may not be readily available.26–28 DCA, on the other hand, has the advantage of being less cumbersome, using existing clinical data from a validation data set.6,8,29
In this study, we used the ACS-SRC calculator as it is the most widely used and well-validated SRC.4,5,30 Despite this, these results may not be applicable to all patient populations, which, together with the rise of big data approaches, has driven the development of a multitude of SRCs for specific patient populations. We compared the ACS-SRC with the recently published and machine learning–driven POTTER calculator to explore the relative utility of each of the calculators in VTE risk.3 Interestingly, despite a higher C-statistic, the ACS-SRC demonstrated a superior net benefit at a more limited range of threshold probabilities (2%-4%), whereas the POTTER calculator performed better at a higher range (4%-8%). This finding highlights the value of direct comparison of these risk estimation approaches with selection of a risk prediction method based on specific goals. For example, although POTTER was developed for emergency general surgery operations, its results may be more applicable if an institution has selected a higher threshold probability (e.g., 4%-8%) on which to intervene to reduce VTE risk.
This study has several limitations. We chose to focus on patients undergoing HPB surgery given the high-risk nature of these operations and the importance of risk mitigation among this patient population. We believe that this methodology can be extended to other surgical or nonsurgical patient populations, but it is possible that it may not carry the same value in lower-risk populations. In addition, we acknowledge that this study, like many decision modeling studies, is theoretical in nature. A large, prospective study comparing DCA-guided interventions to other guided risk intervention approaches would be particularly informative. Finally, for some of the complications, we observed a low number of events; thus, external validation of the calculator and subsequent DCA may be limited.
With the rapid advent of novel risk calculators, it is imperative that future external validations include not only standard metrics of model performance, but also data in support of their potential role for clinical implementation.31 We demonstrate the application of SRC data to guide interventions for risk reduction, using DCA to inform intervention implementation for high-risk patients. This method represents a key step in leveraging SRC data to guide clinical decision-making. Future work will focus on implementation of this strategy in a prospective study testing SRC-guided intervention measures. We believe that this work can be easily extrapolated to other disease types, using DCA and other risk calculators to maximize patient outcomes for oncologic and nononcologic diseases.
Disclosure
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Footnotes
Meeting presentation: This was presented at the American College of Surgeons 104th Annual Clinical Congress, Scientific Forum, Boston, MA, October 2018.
Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jss.2020.11.059.
REFERENCES
- 1. Healey MA, Shackford SR, Osler TM, Rogers FB, Burns E. Complications in surgical patients. Arch Surg. 2002;137:611–618.
- 2. Grosse SD, Nelson RE, Nyarko KA, Richardson LC, Raskob GE. The economic burden of incident venous thromboembolism in the United States: a review of estimated attributable healthcare costs. Thromb Res. 2016;137:3–10.
- 3. Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical risk is not linear. Ann Surg. 2018;268:574–583.
- 4. Bilimoria KY, Liu Y, Paruch JL, et al. Development and evaluation of the universal ACS NSQIP surgical risk calculator: a decision aid and informed consent tool for patients and surgeons. J Am Coll Surg. 2013;217:833–842.e3.
- 5. Liu Y, Cohen ME, Hall BL, Ko CY, Bilimoria KY. Evaluation and enhancement of calibration in the American College of Surgeons NSQIP surgical risk calculator. J Am Coll Surg. 2016;223:231–239.
- 6. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak. 2006;26:565–574.
- 7. Steyerberg EW, Vickers AJ. Decision curve analysis: a discussion. Med Decis Mak. 2008;28:146–149.
- 8. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA. 2015;313:409–410.
- 9. Pencina MJ, D’Agostino RB. Evaluating discrimination of risk prediction models: the C statistic. JAMA. 2015;314:1063–1064.
- 10. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Mak. 2015;35:162–169.
- 11. Redelmeier D, Bloch D, Hickam D. Assessing predictive accuracy: how to compare Brier scores. J Clin Epidemiol. 1991;44:1141–1146.
- 12. Wu YC, Lee WC. Alternative performance measures for prediction models. PLoS One. 2014;9.
- 13. Brown M. rmda: Risk Model Decision Analysis. Available at: http://mdbrown.github.io/rmda/.
- 14. Localio AR, Goodman S. Beyond the usual prediction accuracy metrics: reporting results for clinical decision making. Ann Intern Med. 2012;157:294–295.
- 15. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016;352:3–7.
- 16. Basta MN, Bauder AR, Kovach SJ, Fischer JP. Assessing the predictive accuracy of the American College of Surgeons national surgical quality improvement project surgical risk calculator in open ventral hernia repair. Am J Surg. 2016;212:272–281.
- 17. Vaziri S, Wilson J, Abbatematteo J, et al. Predictive performance of the American College of Surgeons universal risk calculator in neurosurgical patients. J Neurosurg. 2018;128:942–947.
- 18. Khavanin N, Qiu CS, Mlodinow AS, et al. External validation of the breast reconstruction risk assessment calculator. J Plast Reconstr Aesthet Surg. 2017;70:876–883.
- 19. Beal EW, Lyon E, Kearney J, et al. Evaluating the American College of Surgeons national surgical quality improvement project risk calculator: results from the U.S. Extrahepatic Biliary Malignancy Consortium. HPB. 2017;19:1104–1111.
- 20. Mogal HD, Fino N, Clark C, et al. NSQIP risk calculator in patients undergoing pancreaticoduodenectomy. J Surg Oncol. 2017;114:157–162.
- 21. Collins GS, De Groot JA, Dutton S, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol. 2014;14:1–11.
- 22. Kakkar A, Cohen A, Tapson V, et al. Venous thromboembolism risk and prophylaxis in acute care hospital setting (ENDORSE survey): findings in surgical patients. Ann Surg. 2010;251:330–338.
- 23. Agnelli G. Prevention of venous thromboembolism in surgical patients. Circulation. 2004;110(24 Suppl L):4–12.
- 24. Colwell CW, Pulido P, Hardwick ME, Morris BA. Patient compliance with outpatient prophylaxis: an observational study. Orthopedics. 2005;28:143–147.
- 25. Mayo NE, Feldman L, Scott S, et al. Impact of preoperative change in physical function on postoperative recovery: argument supporting prehabilitation for colorectal surgery. Surgery. 2011;150:505–514.
- 26. Hill SR. Cost-effectiveness analysis for clinicians. BMC Med. 2012;10:2–4.
- 27. Celi LA, Charlton P, Ghassemi MM, et al. Secondary analysis of electronic health records. 2016.
- 28. Sonnenberg FA, Beck JR. Markov models in medical decision making: a practical guide. Med Decis Mak. 1993;13:322–338.
- 29. Rousson V, Zumbrunn T. Decision curve analysis revisited: overall net benefit, relationships to ROC curve analysis, and application to case-control studies. BMC Med Inform Decis Mak. 2011;11.
- 30. Cohen ME, Liu Y, Ko CY, Hall BL. An examination of American College of Surgeons NSQIP surgical risk calculator accuracy. J Am Coll Surg. 2017;224:787–795.e1.
- 31. Pencina MJ, Goldstein BA, D’Agostino RB. Prediction models – development, evaluation, and clinical application. N Engl J Med. 2020;382:1583–1586.
