Author manuscript; available in PMC: 2022 Jul 13.
Published in final edited form as: Surg Endosc. 2020 Jan 17;35(1):182–191. doi: 10.1007/s00464-020-07378-x

Development and Validation of Machine Learning Models to Predict Gastrointestinal Leak and Venous Thromboembolism After Weight Loss Surgery: An Analysis of the MBSAQIP Database

Jacob Nudel 1,2, Andrew M Bishara 3,4, Susanna WL de Geus 1, Prasad Patil 5, Jayakanth Srinivasan 2, Donald T Hess 1, Jonathan Woodson 2
PMCID: PMC9278895  NIHMSID: NIHMS1650039  PMID: 31953733

Abstract

Background:

Postoperative gastrointestinal leak and venous thromboembolism (VTE) are devastating complications of bariatric surgery. The performance of currently available predictive models for these complications remains wanting, while machine learning has shown promise to improve on traditional modeling approaches. The purpose of this study was to compare two machine learning strategies, artificial neural networks (ANNs) and gradient boosting machines (XGBs), with conventional models using logistic regression (LR) in predicting leak and VTE after bariatric surgery.

Methods:

ANN, XGB, and LR prediction models for leak and VTE among adults undergoing initial elective weight loss surgery were trained and validated using preoperative data from the 2015–2017 Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program database. Data was randomly split into training, validation, and testing populations. Model performance was measured by the area under the receiver-operating characteristic curve (AUC) on the testing data for each model.

Results:

The study cohort contained 436,807 patients. The incidences of leak and VTE were 0.70% and 0.46%. ANN (AUC 0.75, 95% CI, 0.73 – 0.78) was the best-performing model for predicting leak, followed by XGB (AUC 0.70, 95% CI, 0.68 – 0.72) and then LR (AUC 0.63, 95% CI, 0.61 – 0.65, p < 0.001 for all comparisons). In detecting VTE, ANN, XGB, and LR achieved similar AUCs of 0.65 (95% CI, 0.63–0.68), 0.67 (95% CI, 0.64–0.70), and 0.64 (95% CI, 0.61–0.66) respectively; the performance difference between XGB and LR was statistically significant (p = 0.001).

Conclusions:

ANN and XGB outperformed traditional LR in predicting leak. These results suggest that ML has the potential to improve risk stratification for bariatric surgery, especially as techniques to extract more granular data from medical records improve. Further studies investigating the merits of machine learning to improve patient selection and risk management in bariatric surgery are warranted.

Keywords: Bariatric surgery, Postoperative Complications, Anastomotic Leak, Venous Thromboembolism, Machine Learning, Deep Learning

INTRODUCTION

Obesity and associated metabolic diseases constitute a major public health threat for which bariatric surgery is a highly effective intervention [1]. Laparoscopic weight loss surgery (WLS) is safe relative to other elective general surgical procedures [2], but complications can be morbid and expensive [3]. Safety concerns among both patients [4] and providers [5] help explain why WLS is under-utilized relative to clinical needs [6]. Stratification of risk for post-operative complications can guide patient selection, inform referral practices and patient counseling, and identify high-risk patients for monitoring and intervention.

Gastrointestinal leak occurs in less than one percent of WLS cases [7] but is associated with other complications, readmission, reoperation, death [8], and increased cost [9]. Obese patients are at high risk for deep vein thrombosis [10, 11] and American Society for Metabolic and Bariatric Surgery guidelines recommend routine thromboprophylaxis [12]. Nevertheless, venous thromboembolism (VTE) remains a leading cause of morbidity and mortality in this population [13, 14] and optimizing thromboprophylaxis strategies remains an area of considerable interest [13, 15, 16]. Prior risk models for leak and VTE achieve modest results [14, 17]. For example, BariClot is a VTE risk assessment tool based on logistic regression (LR) that was developed and validated using the Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) registry. Though it achieved an area under the receiver-operating characteristic curve (AUC) of just 0.60, it outperformed two previously published models [14, 18, 19].

Machine learning (ML), a branch of artificial intelligence, is the study of computer algorithms that extract information from data without explicit instructions from humans. ML does not refer to a specific mathematical approach, but to a broad array of statistical models. These are generally related in their flexibility and capacity to distinguish subtle, nonlinear patterns in data that are often not accessible to traditional approaches like LR [20]. ML models have recently outperformed LR in preoperative risk stratification using National Surgical Quality Improvement Program data [21, 22].

Artificial neural networks (ANNs) and gradient boosting machines (XGBs) are powerful classes of ML models that perform well in medical risk prediction using tabular data [23, 24]. A simple ANN is a stack of layered functions with each layer containing a matrix of weights. Data pass through the stack with the output of one layer used as the input to the next, ultimately transforming the data into model outputs. Training involves repeatedly adjusting the weights to gradually match model outputs to target outputs [25]. XGB is an ML algorithm in which a series of decision models are iteratively constructed, tested, and adjusted to correct outputs, ultimately resulting in a decision tree algorithm optimized for a regression or classification task [26].
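The layered structure of a simple ANN can be illustrated with a short sketch. This is not the study's model: the weights, layer sizes, and function names (`dense`, `relu`, `forward`) are invented for illustration, and no training is performed.

```python
import math

def dense(x, weights, bias):
    """One layer: multiply the input vector by a weight matrix and add a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in v]

def sigmoid(z):
    """Squash a real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Pass data through the stack: each layer's output is the next layer's input."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = relu(dense(x, w, b))
    return sigmoid(dense(x, w_out, b_out)[0])

# Hypothetical weights for a network with 3 inputs, 2 hidden units, and 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
risk = forward([1.0, 2.0, 3.0], layers)
```

Training would adjust the numbers in `layers` so that `risk` moves toward the observed outcome for each patient; that step is omitted here.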

The aim of this study was to develop and validate preoperative ANN and XGB risk models for gastrointestinal leak and VTE among WLS patients and compare their performance against traditional models.

METHODS

Data Source and Study Population

All available MBSAQIP data from 2015 – 2017 was used. This national registry contains patient-level variables characterizing pre-operative risk factors and 30-day post-operative outcomes. In 2017, 832 accredited bariatric centers contributed over 200,000 cases to the registry [27]. The study population included patients aged 18–79 with no prior foregut or bariatric surgery who underwent elective laparoscopic gastric bypass (CPT 43644 or 43645) or laparoscopic sleeve gastrectomy (CPT 43775). We excluded patients with no information on height and weight or Body Mass Index (BMI) given the fundamental importance of this information to the study interventions. This study was approved by the Boston Medical Center Institutional Review Board under a pre-existing protocol for research on MBSAQIP data.

Outcomes

Outcomes of interest were gastrointestinal leak and VTE. Each was defined as a composite endpoint of 30-day outcomes variables in MBSAQIP. Leak was defined as postoperative organ space infection, presence of a surgical drain for more than 30 days, or leak as the suspected reason for any readmission, reintervention, or reoperation [7]. VTE was defined as anticoagulation therapy for imaging-confirmed deep vein thrombosis (DVT) or pulmonary embolism (PE) or readmission, reintervention, reoperation, or death with DVT or PE as the suspected cause [14].
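As a rough sketch, the composite endpoints can be expressed as boolean rules over a patient record. The field names below (`organ_space_infection`, `drain_over_30_days`, `suspected_reasons`, `anticoag_for_confirmed_dvt_pe`) are hypothetical placeholders, not actual MBSAQIP variable names.

```python
def has_leak(rec):
    """Leak: organ space infection, drain present > 30 days, or leak suspected
    as the reason for a readmission, reintervention, or reoperation."""
    return bool(
        rec.get("organ_space_infection", False)
        or rec.get("drain_over_30_days", False)
        or "leak" in rec.get("suspected_reasons", ())
    )

def has_vte(rec):
    """VTE: anticoagulation for imaging-confirmed DVT/PE, or DVT/PE suspected
    as the cause of a readmission, reintervention, reoperation, or death."""
    reasons = rec.get("suspected_reasons", ())
    return bool(
        rec.get("anticoag_for_confirmed_dvt_pe", False)
        or "dvt" in reasons
        or "pe" in reasons
    )

# Hypothetical records for illustration only.
drain_patient = {"drain_over_30_days": True}
pe_readmission = {"suspected_reasons": ("pe",)}
uneventful = {}
```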

Predictive Models

For each outcome of interest, we randomly split the data into training, validation, and testing populations comprising 50%, 25%, and 25% of the study cohort respectively. To account for imbalanced data, we oversampled positive cases to a ratio of 0.5 in the training set using the imbalanced-learn Python library [28, 29]. Positive and negative cases were split separately to ensure equitable distribution of positive cases in the training, validation, and testing sets.
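The splitting and oversampling steps can be sketched with the standard library alone. This is an illustration rather than the study's code, which used imbalanced-learn; it assumes that a ratio of 0.5 means one positive case for every two negative cases after resampling, and the function names are invented here.

```python
import random

def stratified_split(labels, fractions=(0.5, 0.25, 0.25), seed=0):
    """Split indices into train/validation/test, shuffling each class separately."""
    rng = random.Random(seed)
    splits = [[] for _ in fractions]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        start = 0
        for k, frac in enumerate(fractions):
            # The last split takes whatever remains so that no case is dropped.
            end = len(idx) if k == len(fractions) - 1 else start + round(frac * len(idx))
            splits[k].extend(idx[start:end])
            start = end
    return splits

def oversample(train_idx, labels, ratio=0.5, seed=0):
    """Duplicate random positives until positives / negatives reaches `ratio`."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    if not pos:
        return list(train_idx)
    target = int(ratio * len(neg))
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    return list(train_idx) + extra

# Toy cohort: 8 positive cases among 1,000 patients.
labels = [1] * 8 + [0] * 992
train, val, test = stratified_split(labels)
balanced_train = oversample(train, labels)
```

Only the training set is oversampled; the validation and testing sets keep the original event rate so that reported performance reflects the real incidence.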

Predictive models used all clinical variables that could be reasonably ascertained the day prior to surgery (Table 1). To permit valid comparisons of model performance, all models used all available input variables to generate predictions. Some features were calculated or consolidated from MBSAQIP variables (Table 1). Continuous variables were zero-centered and scaled to unit variance. Methods for handling missing and incomplete data are described in Table 1. Wherever possible, missing continuous variables were set to the training population mean. Missing categorical variables were assigned to a unique, unknown category.
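A minimal sketch of the preprocessing described above follows. The helper names and the example albumin values are hypothetical; the key point it illustrates is that the mean and standard deviation are learned from the training population only and then reused for other data.

```python
import statistics

def fit_scaler(train_values):
    """Learn mean and standard deviation from the non-missing training values."""
    observed = [v for v in train_values if v is not None]
    return statistics.mean(observed), statistics.pstdev(observed)

def transform(values, mean, sd):
    """Impute missing values with the training mean, then zero-center and scale."""
    return [((mean if v is None else v) - mean) / sd for v in values]

def encode_category(value, known_levels):
    """Assign a missing or unrecognized category to a dedicated 'Unknown' level."""
    return value if value in known_levels else "Unknown"

# Hypothetical albumin values; None marks a missing measurement.
train_albumin = [4.0, None, 4.4, 3.6]
mean, sd = fit_scaler(train_albumin)
scaled = transform([None, 4.4], mean, sd)
asa_class = encode_category(None, {"1", "2", "3", "4", "5"})
```

A mean-imputed value becomes exactly zero after scaling, so a missing continuous measurement contributes no deviation from the average patient.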

Table 1.

Input variables and outcomes among 436,807 patients undergoing elective laparoscopic gastric bypass or sleeve gastrectomy.

Input variable
Sex, n (%)
  Female 346559 (79.3)
  Male 90248 (20.7)
Race, n (%)
  American Indian or Alaska Native 1745 (0.4)
  Asian 2138 (0.5)
  Black or African American 77050 (17.6)
  Native Hawaiian or Other Pacific Islander 1222 (0.3)
  Unknown/Not Reported 35193 (8.1)
  White 319459 (73.1)
Hispanic ethnicity, n (%)
  No 340748 (78.0)
  Unknown 41535 (9.5)
  Yes 54524 (12.5)
Procedure, n (%)
  Gastric bypass 121528 (27.8)
  Sleeve gastrectomy 315279 (72.2)
Gastroesophageal reflux disease, n (%)
  No 301408 (69.0)
  Yes 135399 (31.0)
Limited ambulation, n (%)
  No 429440 (98.3)
  Yes 7367 (1.7)
Vein thrombosis requiring therapy, n (%)
  No 429833 (98.4)
  Yes 6974 (1.6)
History of myocardial infarction, n (%)
  No 431143 (98.7)
  Yes 5664 (1.3)
Previous PCI or angioplasty, n (%)
  No 427889 (98.0)
  Yes 8918 (2.0)
Previous cardiac surgery, n (%)
  No 431964 (98.9)
  Yes 4843 (1.1)
Hypertension requiring medication, n (%)
  No 224663 (51.4)
  Yes 212144 (48.6)
Number of anti-Hypertensive medications, n (%)
  0 159267 (36.5)
  1 94885 (21.7)
  2 72381 (16.6)
  3+ 110274 (25.2)
Hyperlipidemia, n (%)
  No 331523 (75.9)
  Yes 105284 (24.1)
Venous stasis, n (%)
  No 432278 (99.0)
  Yes 4529 (1.0)
Dialysis requirement, n (%)
  No 435460 (99.7)
  Yes 1347 (0.3)
Renal insufficiency, n (%)
  No 433915 (99.3)
  Yes 2892 (0.7)
Preoperative therapeutic anticoagulation, n (%)
  No 425520 (97.4)
  Yes 11287 (2.6)
Diabetes, n (%)
  Insulin-dependent 38102 (8.7)
  No 320820 (73.4)
  Non-Insulin dependent 77885 (17.8)
Smoker, n (%)
  No 399223 (91.4)
  Yes 37584 (8.6)
Functional status, n (%)
  Independent 432220 (98.9)
  Partially Dependent 2833 (0.6)
  Totally Dependent 1754 (0.4)
Chronic obstructive pulmonary disease, n (%)
  No 429313 (98.3)
  Yes 7494 (1.7)
Oxygen dependent, n (%)
  No 433635 (99.3)
  Yes 3172 (0.7)
History of pulmonary embolism, n (%)
  No 431748 (98.8)
  Yes 5059 (1.2)
Sleep apnea, n (%)
  No 269762 (61.8)
  Yes 167045 (38.2)
Chronic steroids, n (%)
  No 429452 (98.3)
  Yes 7355 (1.7)
Presence and timing of placement of IVCF, n (%) 1
  Placed in anticipation of surgery 2243 (0.5)
  Pre-existing 978 (0.2)
  No 433539 (99.3)
  Unknown 47 (0.0)
American Society of Anesthesiology Class, n (%)
  1-No Disturb 1476 (0.3)
  2-Mild Disturb 97939 (22.4)
  3-Severe Disturb 319773 (73.2)
  4-Life Threat 15571 (3.6)
  5-Moribund 40 (0.0)
  Unknown 2008 (0.5)
Training level of first assistant, n (%)
  Attending - Other 24369 (5.6)
  Attending - Weight Loss Surgeon 65444 (15.0)
  Minimally Invasive Surgery Fellow 38613 (8.8)
  None (no assist or scrub tech/RN only) 63273 (14.5)
  Physician Assistant/Nurse Practitioner/Registered Nurse 166222 (38.1)
  Resident (PGY 1–5+) 78886 (18.1)
Year of operation, n (%)
  2015 131926 (30.2)
  2016 146614 (33.6)
  2017 158267 (36.2)
Height in centimeters, mean (sd) 166.7 (9.2)
Consolidated preoperative BMI, mean (sd) 2 45.4 (8.0)
Change in BMI in the year prior to surgery, mean (sd) 3 −2.0 (2.3)
Weight in kilograms, mean (sd) 4 126.7 (26.8)
Age in years, mean (sd) 5 44.7 (12.0)
Preoperative albumin level, mean (sd) 6 4.1 (0.4)
Preoperative hematocrit level, mean (sd) 7 40.9 (3.8)
Operative duration (minutes) 8 85.8 (47.1)
Outcomes
Gastrointestinal leak, n (%)
  No 433739 (99.3)
  Yes 3068 (0.7)
Venous thromboembolism, n (%)
  No 434795 (99.5)
  Yes 2012 (0.5)

Abbreviations: BMI body mass index; PCI percutaneous coronary intervention; IVCF inferior vena cava filter

1. The presence and timing of placement of preoperative inferior vena cava filters were consolidated into one variable.

2. In the event that preoperative BMI was available but maximum BMI for the preceding year was not, the most recent BMI was assumed to be the maximum BMI (n = 27,862); when preoperative BMI was not available, it was set equal to the maximum (n = 2,268).

3. A continuous variable representing the difference between the maximum BMI and the pre-operative BMI was computed. 27,862 missing.

4. Back-calculated from height and consolidated BMI.

5. The 2015 MBSAQIP PUF reports ages as whole numbers, whereas the 2016 and 2017 PUFs report ages to the hundredth decimal place. To avoid losing information from the latter cohorts, we reassigned each 2015 patient a randomly selected age from a uniform distribution within the appropriate year.

6. 114,343 missing.

7. 44,969 missing.

8. Used only in BariClot calculation.
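The derived variables described in the table footnotes can be sketched as follows. The function names are invented for illustration, and the exact procedure used in the study may differ in detail.

```python
import random

def consolidate_bmi(preop_bmi, max_bmi):
    """Fill whichever BMI is missing with the one that is available."""
    if max_bmi is None:
        max_bmi = preop_bmi
    if preop_bmi is None:
        preop_bmi = max_bmi
    return preop_bmi, max_bmi

def weight_kg(bmi, height_cm):
    """Back-calculate weight from the definition BMI = kg / m^2."""
    return bmi * (height_cm / 100.0) ** 2

def jitter_age(whole_year_age, rng):
    """Draw a fractional age uniformly within the reported year of age."""
    return whole_year_age + rng.random()

rng = random.Random(0)
preop, maximum = consolidate_bmi(45.4, None)
weight = weight_kg(45.4, 166.7)
age = jitter_age(44, rng)
```

Here a patient with a BMI of 45.4 and a height of 166.7 cm is assigned a weight of about 126 kg, and a 2015 patient recorded as 44 years old receives an age somewhere in the interval from 44 up to, but not including, 45.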

ANN and XGB were compared to LR for prediction of both VTE and leak. Our ANN, XGB, and LR models were compared to BariClot for prediction of VTE. Our models computed the probability of an outcome for each patient, while BariClot generated a risk score [14]. All predictive models were implemented in Python 3.6 [30, 31] using the Anaconda Distribution [32] with extensive use of the Pandas [33] and NumPy [34] libraries. We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis reporting guidelines [35]. All code used for pre-processing data and building predictive models is open sourced.

ANN models were implemented in PyTorch [36] with code adapted from open sources [37–39]. ANN architecture consisted of two layers with rectified linear units applied after each layer. We selected a relatively simple architecture because initial experiments with more complex architectures increased computational demand without a notable increase in predictive power. Categorical variables were encoded as neural embeddings [40]. Batch normalization was applied between layers [41]. Early stopping [42] and random dropout [43] were employed to avoid over-fitting training data [23]. Training was terminated when the ANN achieved peak performance on the validation data. XGB was implemented in XGBoost using default hyperparameters [26]. LR was implemented in statsmodels [44].
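Early stopping can be sketched independently of any deep learning framework. The rule below, which stops after a fixed number of epochs without improvement (`patience`), is a common convention assumed here for illustration; the manuscript states only that training ended at peak validation performance. The validation AUC values are made up.

```python
def early_stopping_epoch(validation_aucs, patience=3):
    """Return (best_epoch, best_auc), stopping once the validation AUC has
    failed to improve for `patience` consecutive epochs."""
    best_epoch, best_auc, stale = -1, float("-inf"), 0
    for epoch, auc in enumerate(validation_aucs):
        if auc > best_auc:
            best_epoch, best_auc, stale = epoch, auc, 0
        else:
            stale += 1
            if stale >= patience:
                break  # later epochs are never run
    return best_epoch, best_auc

# Hypothetical per-epoch validation AUCs.
history = [0.60, 0.68, 0.72, 0.71, 0.73, 0.72, 0.72, 0.71, 0.80]
best_epoch, best_auc = early_stopping_epoch(history)
```

In this toy history training halts after epoch 7, so the weights from epoch 4 are kept and the final value of 0.80 is never observed.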

Statistical Comparison of Model Performance

Model performance was measured by computing the AUC generated by each model on the test set for each outcome. The Delong test [45] with threshold of 0.05 was used to statistically compare AUCs generated by each predictive model. AUC confidence intervals were obtained using the Delong procedure. Bootstrapping was used to find confidence intervals for other model performance measures including comparison of partial AUCs. The pROC package [46] with RStudio [47] and R version 3.5.2 [48] was used for all model performance calculations. Plots were made with ggplot2 [49].
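For readers less familiar with these measures, the AUC equals the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The sketch below computes it directly and adds a percentile bootstrap interval. Note the substitution: the study obtained AUC intervals with the DeLong procedure in pROC, so the bootstrap shown here is a simpler stand-in for illustration, and all names and data are hypothetical.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, stat=auc, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(stat([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy predictions: two negatives followed by two positives.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
point_estimate = auc(scores, labels)
lower, upper = bootstrap_ci(scores, labels)
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a demonstration but slow on a cohort of this size; rank-based implementations are preferable in practice.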

Descriptive statistics were computed using the tableone Python library [50]. Training, validation, and test populations were compared using one-way ANOVA and chi-squared tests for continuous and categorical variables respectively.

RESULTS

The study cohort contained 436,807 patients of whom 3,068 (0.70%) developed leak and 2,012 (0.46%) suffered VTE (Supplementary Figure 1). Characteristics of the cohort are shown in Table 1. The training, validation, and testing sets for both gastrointestinal leak and VTE had 218,403, 109,202, and 109,202 patients respectively. There were no clinically meaningful differences in patient characteristics between training, validation, and test sets, although there were some statistically significant differences (Table 1 and Table 2 in the Supplement).

Figure 1 shows model performance for prediction of leak. ANN was the best-performing model with an AUC of 0.75 (95% CI, 0.73 – 0.78). ANN outperformed XGB (p < 0.001), which also performed well, achieving an AUC of 0.70 (95% CI, 0.68 – 0.72). Both ANN and XGB significantly outperformed LR (p < 0.001 for each comparison), which achieved an AUC of 0.63 (95% CI, 0.61 – 0.65).

Figure 1. Receiver Operating Characteristic Curves for Predicting Gastrointestinal Leak.

ANN, artificial neural network; XGB, gradient boosting machine; LR, logistic regression.

ANN achieved a partial AUC of 0.05 under the portion of the ROC with specificity greater than 90%, outperforming both XGB (partial AUC 0.03, p < 0.001) and LR (partial AUC 0.01, p < 0.001). With the specificity threshold held as close as possible to 0.975, ANN achieved a sensitivity of 0.493 (95% CI, 0.458 – 0.529), a positive predictive value (PPV) of 0.122 (95% CI, 0.114 – 0.131), and outperformed XGB and LR at the same threshold (Table 2). Of the 767 patients in the testing set who went on to suffer post-operative leaks, ANN would have identified 378 at the 0.975 specificity threshold, while XGB and LR would have identified 184 and 103 respectively.

Table 2.

Performance characteristics of the artificial neural network (ANN), gradient boosting machine (XGB), and logistic regression (LR) models for predicting gastrointestinal leak at the 97.5% specificity threshold.

Model | Sensitivity, median (95% CI) | Specificity, median (95% CI) | PPV, median (95% CI)
ANN | 0.493 (0.458 – 0.529) | 0.975 (0.974 – 0.976) | 0.122 (0.114 – 0.131)
XGB | 0.24 (0.209 – 0.270) | 0.975 (0.974 – 0.976) | 0.063 (0.056 – 0.071)
LR | 0.134 (0.111 – 0.159) | 0.975 (0.974 – 0.976) | 0.037 (0.030 – 0.043)
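The figures in Table 2 can be cross-checked with simple arithmetic, using only numbers reported in the text: 109,202 patients in the testing set, 767 of whom developed a leak. The helper name below is invented for illustration. Because the published values are bootstrap medians, small rounding differences are expected.

```python
def metrics_from_rates(n_total, n_pos, sensitivity, specificity):
    """Convert sensitivity and specificity into expected counts and PPV."""
    n_neg = n_total - n_pos
    true_pos = sensitivity * n_pos            # leaks correctly flagged
    false_pos = (1.0 - specificity) * n_neg   # non-leaks incorrectly flagged
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv

N_TEST, N_LEAK, SPECIFICITY = 109202, 767, 0.975

ann_tp, ann_fp, ann_ppv = metrics_from_rates(N_TEST, N_LEAK, 0.493, SPECIFICITY)
xgb_tp, xgb_fp, xgb_ppv = metrics_from_rates(N_TEST, N_LEAK, 0.24, SPECIFICITY)
lr_tp, lr_fp, lr_ppv = metrics_from_rates(N_TEST, N_LEAK, 0.134, SPECIFICITY)
```

At 97.5% specificity roughly 2,711 patients without a leak are flagged by every model, so differences in PPV are driven entirely by how many true leaks each model captures (about 378, 184, and 103).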

Model performance for prediction of VTE is summarized in Figure 2. ANN, XGB, and LR achieved similar AUCs of 0.65 (95% CI, 0.63–0.68), 0.67 (95% CI, 0.64–0.70), and 0.64 (95% CI, 0.61–0.66) respectively. XGB outperformed LR (p = 0.001) but there were no other statistically significant differences between models. ANN, XGB, and LR outperformed BariClot (p < 0.001 for all three comparisons), which achieved an AUC of 0.56 (95% CI, 0.54 – 0.59). At the 0.975 specificity threshold, confusion matrix metrics of the ANN, XGB, and LR models were generally comparable to one another and superior to BariClot (Table 3).

Figure 2. Receiver Operating Characteristic Curves for Predicting Venous Thromboembolism.

ANN, artificial neural network; XGB, gradient boosting machine; LR, logistic regression.

Table 3.

Performance characteristics of the artificial neural network (ANN), gradient boosting machine (XGB), logistic regression (LR), and BariClot models for predicting venous thromboembolism at the 97.5% specificity threshold.

Model | Sensitivity, median (95% CI) | Specificity, median (95% CI) | PPV, median (95% CI)
ANN | 0.203 (0.169 – 0.239) | 0.975 (0.974 – 0.976) | 0.036 (0.03 – 0.042)
XGB | 0.211 (0.175 – 0.247) | 0.975 (0.974 – 0.976) | 0.038 (0.031 – 0.044)
LR | 0.159 (0.127 – 0.191) | 0.975 (0.974 – 0.976) | 0.029 (0.023 – 0.034)
BariClot | 0.101 (0.076 – 0.127) | 0.975 (0.974 – 0.976) | 0.018 (0.014 – 0.023)

All models used all input variables in prediction. The relative importance of predictive variables in XGB models for both outcomes is shown in Figures 3 and 4. XGB identified age, height and weight-related measures, hematocrit, albumin, and assistant training level as important predictors for both leak and VTE. History of DVT was among the most important factors in predicting VTE, but not leak (Figures 3 and 4). Odds ratios for predictive variables used by logistic regression models are listed in the Supplement (Tables 3 and 4).

Figure 3. Relative importance of each predictive variable in the gradient boosting machine model for predicting gastrointestinal leak.

Relative importance quantifies the relative contribution of each variable to minimizing the error of the gradient boosting model. The measure is scaled from zero to one against the most important predictor [24]. Relationships between importance and outcomes are nonlinear and cannot be interpreted directionally with respect to their influence on outcomes, nor can they be used to generate cutoff or threshold values. BMI, body mass index; DVT, deep vein thrombosis; MIS, minimally invasive surgery; PE, pulmonary embolism; HTN, hypertension; IVCF, inferior vena cava filter; PCI, percutaneous coronary intervention; GERD, gastroesophageal reflux disease; ASA, American Society of Anesthesiology; COPD, chronic obstructive pulmonary disease; HLD, hyperlipidemia; MI, myocardial infarction.
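The scaling of the importance measure against the most important predictor can be shown in a few lines. The raw values below are invented for illustration and do not come from the study's models.

```python
def scale_importance(raw):
    """Divide each raw importance by the largest one, so the top predictor scores 1.0."""
    top = max(raw.values())
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return {name: value / top for name, value in ranked}

# Hypothetical raw importance values.
raw_importance = {"albumin": 30.0, "age": 40.0, "sex": 10.0}
relative = scale_importance(raw_importance)
```

The scaled values rank predictors but carry no sign, so they say nothing about whether a higher value of a variable raises or lowers risk.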

Figure 4. Relative importance of each predictive variable in the gradient boosting machine model for predicting venous thromboembolism.

BMI, body mass index; GERD, gastroesophageal reflux disease; HTN, hypertension; IVCF, inferior vena cava filter; MIS, minimally invasive surgery; ASA, American Society of Anesthesiology; PE, pulmonary embolism.

DISCUSSION

This study demonstrates the potential utility of applying ML methods for pre-operative risk assessment in bariatric surgery. For predicting leak, ANN and XGB outperformed LR, which performed very similarly to a previously reported LR model [17]. In our study, the potential clinical benefits of ML are most apparent when evaluating our leak models at high specificity, where ANN and XGB performed particularly well and could prove useful in preoperative screening. At 97.5% specificity, ANN predicted several-fold more leaks than LR and achieved a PPV over 10%. Among patients with a 10% probability of leak, the benefits of weight loss surgery are unlikely to outweigh the risks. These results suggest ML methods can offer clinically meaningful improvements in risk stratification, even for uncommon events that are difficult to predict using any statistical method.

In the context of VTE, ANN and XGB perform similarly to LR, with XGB achieving a small but statistically significant advantage. All three of our models outperformed BariClot even though BariClot employs intra-operative information in prediction, likely because BariClot was trained on less data than our models. Recent contributions to the literature on VTE risk after weight loss surgery use a wider range of variables than ours and incorporate patient data from perioperative, intra-operative, and post-operative time points [13, 14, 16]. Our VTE risk models are less predictive than our leak models. This may be because widespread thromboprophylaxis among patients in MBSAQIP dampens the statistical signals available to VTE models.

These results contribute to an emerging literature describing ML for medical risk assessment. ML techniques have recently been applied to tabular data to predict a variety of outcomes including delirium [24] and pediatric emergency department triage [23] with good results. However, ML does not always outperform traditional LR. For example, ML outperformed LR in just one of two recent, rigorous efforts to predict heart failure readmissions, likely due to differences between the data sets used by each team [51–53]. Our results fit the general pattern that no single predictive modelling technique consistently prevails.

Several limitations apply. First, outcomes of interest that occur beyond 30 days or for which patients do not present to the index institution may be missed [54]. However, the effect of increasing the incidence of outcomes in a test population on model performance is unclear, and may actually boost performance. Second, feature selection is limited to the specific variables and level of detail available in MBSAQIP. It is not clear that models developed using narrowly scoped, highly structured data will perform well outside of this context [20, 55]. Nevertheless, our results indicate that ML techniques may provide significant performance gains against LR. ANNs are especially powerful in learning from unstructured and multimodal data. Thus, we suspect access to a wider set of features would have improved the predictive performance of all of our models and of ANNs in particular. Additionally, pre-trained ANNs can be adapted to new data in a process called transfer learning. In this fashion, the insights gained through training in large administrative datasets can be harnessed to build high-performing models in specific clinical contexts with relatively small numbers of observations that can be collected on the scale of single institutions [56]. Third, we do not have sufficient data to externally validate our models. ANN and XGB were somewhat overfitted to the training data, but all three of our models performed similarly in the validation and testing data, confirming internal validity (Supplementary Table 5). Fourth, several variables, including the precise age of all patients in the 2015 cohort, were missing in a non-trivial number of cases. However, we split the data to equally distribute the missing data among the training, validation, and testing cohorts, and model performance should therefore account for bias introduced in imputation.

Our ML models are also limited in terms of usability. They employ more variables than clinicians can reasonably input at the point of care. Their utility will depend on assistive software that marries innovation in clinical data management to user interface design [20, 57]. Additionally, ML models are opaque and difficult to interpret. XGBs have the concept of relative importance, which measures the influence of each variable on model output [24, 58]. For example, our XGB suggests previously unreported predictors of leak including preoperative change in BMI, first assistant training level, race, ethnicity, and steroid use (Figure 3) [7, 59]. However, unlike the LR odds ratio, relative importance does not have clear numerical or directional meaning and lacks an intuitive semantic connection to model outcomes. ANNs have no such analogous concept and are particularly difficult to interpret. In some cases, interpretable algorithms like LR may be preferable to ANN or XGB even at the expense of predictive performance.

Despite these limitations, we offer a number of innovations, particularly with respect to our ANN. It is implemented in PyTorch, an industry-standard framework. It makes use of a number of contemporary techniques to optimize performance and training that are common in industry but only beginning to emerge in the medical outcomes literature [23]. These include non-linearities between layers, dropout, batch normalization, and automatic early stopping. Additionally, our ANN uses neural embeddings for categorical variables. Traditionally, categorical variables are represented as one-hot vectors for use in high-dimensional operations. By training feature vectors for each possible value of a categorical variable, we can represent values more meaningfully and in theory make better predictions [60]. This technique originated in natural language processing [61] and has been used in commercial software [62] and data science competitions [40]. This may be its first application to surgical outcomes. Others can straightforwardly adapt our ANN to analyze any organized tabular data and modify its structure to experiment with deeper and more complicated architectures.
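The difference between one-hot encoding and a neural embedding can be illustrated with a small sketch. The embedding table below is hypothetical: in a trained ANN its values would be learned, and the number of levels and dimensions here are arbitrary.

```python
def one_hot(index, n_levels):
    """Represent a category as a vector with a single 1 in its own position."""
    return [1.0 if i == index else 0.0 for i in range(n_levels)]

def embed(index, table):
    """Look up the dense vector stored for a category level."""
    return table[index]

def one_hot_times_table(index, table):
    """Multiplying a one-hot vector by the table selects the same row as a lookup."""
    hot = one_hot(index, len(table))
    dim = len(table[0])
    return [sum(hot[i] * table[i][j] for i in range(len(table))) for j in range(dim)]

# Hypothetical 2-dimensional embeddings for a 4-level categorical variable.
embedding_table = [
    [0.10, 0.90],
    [0.20, 0.80],
    [0.75, 0.05],
    [0.70, 0.10],
]
looked_up = embed(2, embedding_table)
multiplied = one_hot_times_table(2, embedding_table)
```

An embedding is therefore equivalent to a one-hot input followed by a trainable weight matrix, but levels that behave alike can end up with nearby vectors (as the last two rows do in this made-up table), which a one-hot code cannot express.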

Artificial intelligence has the potential to transform surgery by transferring responsibility for complex cognitive and manual tasks from humans to machines, ultimately automating and amplifying the capabilities of surgical teams [20]. This study represents incremental progress toward that future and generally supports the expectation that advances in artificial intelligence and ML will meaningfully improve the performance of predictive models in surgery. To our knowledge, this is the first successful application of modern ML algorithms to characterize preoperative risk among WLS patients. Before these models can be deployed at the point of care, they must be validated in future and external cohorts. They may need to be retrained or updated with additional data in order to ensure they perform as expected in particular patient populations.

Supplementary Material

1650039_SuppFile

ACKNOWLEDGMENTS

The Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) and the hospitals participating in the MBSAQIP are the source of the data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.

Source of Funding:

Supported by the Boston University Institute for Health System Innovation and Policy and the Boston Medical Center Department of Surgery. On a separate project, Dr. Bishara was supported by a National Institute of General Medical Sciences training grant (T32 GM008440, PI: Dexter Hadley).

Footnotes

Previous communications: Presented at the American College of Surgeons Quality and Safety conference, July 2019

Conflicts of Interest: Drs. Nudel and Bishara are co-founders of Bezel Health, a company building software to measure and improve healthcare quality interventions. Drs. Woodson, De Geus, Srinivasan, Patil, and Hess do not have any conflicts of interest to report.

REFERENCES

  • [1].Nguyen NT, Varela JE. Bariatric surgery for obesity and metabolic disorders: state of the art. Nature reviews Gastroenterology & hepatology. 2017; 14:160; [DOI] [PubMed] [Google Scholar]
  • [2].Böckelman C, Hahl T, Victorzon M. Mortality Following Bariatric Surgery Compared to Other Common Operations in Finland During a 5-Year Period (2009–2013). A Nationwide Registry Study. Obesity Surgery. 2017; 27:2444–2451; 10.1007/s11695-017-2664-z [DOI] [PubMed] [Google Scholar]
  • [3].Fry BT, Scally CP, Thumma JR, Dimick JB. Quality improvement in bariatric surgery: the impact of reducing postoperative complications on medicare payments. Annals of surgery. 2018; 268:22–27; [DOI] [PubMed] [Google Scholar]
  • [4].Funk LM, Jolles S, Fischer LE, Voils CI. Patient and referring practitioner characteristics associated with the likelihood of undergoing bariatric surgery: a systematic review. JAMA surgery. 2015; 150:999–1005; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Funk LM, Jolles SA, Greenberg CC, Schwarze ML, Safdar N, McVay MA, Whittle JC, Maciejewski ML, Voils CI. Primary care physician decision making regarding severe obesity treatment and bariatric surgery: a qualitative study. Surgery for Obesity and Related Diseases. 2016; 12:893–901; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Vidal J, Corcelles R, Jiménez A, Flores L, Lacy AM. Metabolic and bariatric surgery for obesity. Gastroenterology. 2017; 152:1780–1790; [DOI] [PubMed] [Google Scholar]
  • [7].Alizadeh RF, Li S, Inaba C, Penalosa P, Hinojosa MW, Smith BR, Stamos MJ, Nguyen NT. Risk Factors for Gastrointestinal Leak after Bariatric Surgery: MBASQIP Analysis. J Am Coll Surg. 2018; 227:135–141; PMID:29605723; 10.1016/j.jamcollsurg.2018.03.030 [DOI] [PubMed] [Google Scholar]
  • [8].Mocanu V, Dang J, Ladak F, Switzer N, Birch DW, Karmali S. Predictors and outcomes of leak after Roux-en-Y Gastric Bypass: An analysis of the MBSAQIP data registry. Surgery for Obesity and Related Diseases. 2019; [DOI] [PubMed] [Google Scholar]
  • [9].Turrentine FE, Denlinger CE, Simpson VB, Garwood RA, Guerlain S, Agrawal A, Friel CM, LaPar DJ, Stukenborg GJ, Jones RS. Morbidity, mortality, cost, and survival estimates of gastrointestinal anastomotic leaks. Journal of the American College of Surgeons. 2015; 220:195–206; [DOI] [PubMed] [Google Scholar]
  • [10].Ward-Smith P. Body mass index, surgery, and risk of venous thromboembolism in middleaged women. Urologic Nursing. 2012; 32:220–223; [Google Scholar]
  • [11].Klovaite J, Benn M, Nordestgaard BG. Obesity as a causal risk factor for deep venous thrombosis: a M endelian randomization study. Journal of internal medicine. 2015; 277:573–584; [DOI] [PubMed] [Google Scholar]
  • [12].ASMBS Clinical Issues Committee. ASMBS updated position statement on prophylactic measures to reduce the risk of venous thromboembolism in bariatric surgery patients. 2013; doi: 10.1016/j.soard.2013.03.006
  • [13].Aminian A, Andalib A, Khorgami Z, Cetin D, Burguera B, Bartholomew J, Brethauer SA, Schauer PR. Who should get extended thromboprophylaxis after bariatric surgery. Annals of surgery. 2017; 265:143–150.
  • [14].Dang JT, Switzer N, Delisle M, Laffin M, Gill R, Birch DW, Karmali S. Predicting venous thromboembolism following laparoscopic bariatric surgery: development of the BariClot tool using the MBSAQIP database. Surg Endosc. 2018; PMID:30003351; doi: 10.1007/s00464-018-6348-0
  • [15].Gaborit B, Aron-Wisnewsky J, Salem J-E, Bege T, Frere C. Pharmacologic Venous Thromboprophylaxis After Bariatric Surgery. Annals of surgery. 2018.
  • [16].Thereaux J, Lesuffleur T, Czernichow S, Basdevant A, Msika S, Nocca D, Millat B, Fagot-Campagna A. To what extent does posthospital discharge chemoprophylaxis prevent venous thromboembolism after bariatric surgery?: results from a nationwide cohort of more than 110,000 patients. Annals of surgery. 2018; 267:727–733.
  • [17].Kumar SB, Hamilton BC, Wood SG, Rogers SJ, Carter JT, Lin MY. Is laparoscopic sleeve gastrectomy safer than laparoscopic gastric bypass? a comparison of 30-day complications using the MBSAQIP data registry. Surg Obes Relat Dis. 2018; 14:264–269; PMID:29519658; doi: 10.1016/j.soard.2017.12.011
  • [18].Bahl V, Hu HM, Henke PK, Wakefield TW, Campbell DA, Caprini JA. A validation study of a retrospective venous thromboembolism risk scoring method. Ann Surg. 2010; 251:344–350; PMID:19779324; doi: 10.1097/SLA.0b013e3181b7fca6
  • [19].Finks JF, English WJ, Carlin AM, Krause KR, Share DA, Banerjee M, Birkmeyer JD, Birkmeyer NJ, Collaborative MBS. Predicting risk for venous thromboembolism with bariatric surgery: results from the Michigan Bariatric Surgery Collaborative. Annals of surgery. 2012; 255:1100–1104.
  • [20].Hashimoto DA, Rosman G, Rus D, Meireles OR. Artificial intelligence in surgery: promises and perils. Annals of surgery. 2018; 268:70–76.
  • [21].Kim JS, Merrill RK, Arvind V, Kaji D, Pasik SD, Nwachukwu CC, Vargas L, Osman NS, Oermann EK, Caridi JM. Examining the ability of artificial neural networks machine learning models to accurately predict complications following posterior lumbar spine fusion. Spine. 2018; 43:853–860.
  • [22].Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical Risk Is Not Linear: Derivation and Validation of a Novel, User-friendly, and Machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator. Ann Surg. 2018; 268:574–583; PMID:30124479; doi: 10.1097/SLA.0000000000002956
  • [23].Goto T, Camargo CA, Faridi MK, Freishtat RJ, Hasegawa K. Machine Learning-Based Prediction of Clinical Outcomes for Children During Emergency Department Triage. JAMA Netw Open. 2019; 2:e186937; PMID:30646206; doi: 10.1001/jamanetworkopen.2018.6937
  • [24].Wong A, Young AT, Liang AS, Gonzales R, Douglas VC, Hadley D. Development and Validation of an Electronic Health Record–Based Machine Learning Model to Estimate Delirium Risk in Newly Hospitalized Patients Without Known Cognitive Impairment. JAMA Network Open. 2018; 1:e181018; doi: 10.1001/jamanetworkopen.2018.1018
  • [25].LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521:436–444; PMID:26017442; doi: 10.1038/nature14539
  • [26].Chen T, Guestrin C. XGBoost: A scalable tree boosting system. 2016; Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining:785–794.
  • [27].MBSAQIP. MBSAQIP Participant Use Data File. [accessed 2019 January 14]. https://www.facs.org/quality-programs/mbsaqip/participant-use.
  • [28].Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, Omar RZ. How to develop a more accurate risk prediction model when there are few events. BMJ. 2015; 351:h3868.
  • [29].Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. The Journal of Machine Learning Research. 2017; 18:559–563.
  • [30].Millman KJ, Aivazis M. Python for scientists and engineers. Computing in Science & Engineering. 2011; 13:9–12.
  • [31].Oliphant TE. Python for scientific computing. Computing in Science & Engineering. 2007; 9
  • [32].Anaconda Software Distribution. Computer software. Vers. 2–2.4.0. Anaconda, Nov. 2016. Web. https://anaconda.com.
  • [33].McKinney W. Data structures for statistical computing in python. 2010; Proceedings of the 9th Python in Science Conference 445:51–56.
  • [34].Oliphant TE. A guide to NumPy. Trelgol Publishing USA; 2006.
  • [35].Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Annals of Internal Medicine. 2015; 162:55; doi: 10.7326/m14-0697
  • [36].Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. Automatic differentiation in PyTorch. 2017.
  • [37].Howard JAO. fastai. GitHub; 2018. https://github.com/fastai/fastai.
  • [38].Seth Y. A Neural Network in PyTorch for Tabular Data with Categorical Embeddings. Let the Machines Learn; 2018. https://yashuseth.blog/2018/07/22/pytorch-neural-network-for-tabular-data-with-categorical-embeddings/.
  • [39].Ng A, Katanforoosh K. CS230 Deep Learning Course Notes and Code Examples.
  • [40].Guo C, Berkhahn F. Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737. 2016.
  • [41].Ioffe S, Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv. 2015.
  • [42].Caruana R, Lawrence S, Giles CL. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. 2001; Advances in neural information processing systems:402–408.
  • [43].Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research. 2014; 15:1929–1958.
  • [44].Seabold S, Perktold J. Statsmodels: Econometric and statistical modeling with python. 2010; Proceedings of the 9th Python in Science Conference 57:61.
  • [45].DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988; 837–845.
  • [46].Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics. 2011; 12:77.
  • [47].Team RS. RStudio: integrated development for R. RStudio, Inc, Boston, MA: URL http://www.rstudio.com. 2015;42
  • [48].Team RC. R: A language and environment for statistical computing. 2013.
  • [49].Wickham H. ggplot2: elegant graphics for data analysis. Springer; 2016.
  • [50].Pollard TJ, Johnson AEW, Raffa JD, Mark RG. tableone: An open source Python package for producing summary statistics for research papers. JAMIA Open. 2018; 1:26–31; doi: 10.1093/jamiaopen/ooy012
  • [51].Frizzell JD, Liang L, Schulte PJ, Yancy CW, Heidenreich PA, Hernandez AF, Bhatt DL, Fonarow GC, Laskey WK. Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA cardiology. 2017; 2:204–209; doi: 10.1001/jamacardio.2016.3956
  • [52].Johnson KW, Soto JT, Glicksberg BS, Shameer K, Miotto R, Ali M, Ashley E, Dudley JT. Artificial intelligence in cardiology. Journal of the American College of Cardiology. 2018; 71:2668–2679.
  • [53].Golas SB, Shibahara T, Agboola S, Otaki H, Sato J, Nakae T, Hisamitsu T, Kojima G, Felsted J, Kakarmath S. A machine learning model to predict the risk of 30-day readmissions in patients with heart failure: a retrospective analysis of electronic medical records data. BMC medical informatics and decision making. 2018; 18:44.
  • [54].Telem DA, Yang J, Altieri M, Patterson W, Peoples B, Chen H, Talamini M, Pryor AD. Rates and Risk Factors for Unplanned Emergency Department Utilization and Hospital Readmission Following Bariatric Surgery. Ann Surg. 2016; 263:956–960; PMID:26727087; doi: 10.1097/SLA.0000000000001536
  • [55].Maier-Hein L, Vedula SS, Speidel S, Navab N, Kikinis R, Park A, Eisenmann M, Feussner H, Forestier G, Giannarou S. Surgical data science for next-generation interventions. Nature Biomedical Engineering. 2017; 1:691.
  • [56].Lee G, Rubinfeld I, Syed Z. Adapting surgical models to individual hospitals using transfer learning. 2012; 2012 IEEE 12th International Conference on Data Mining Workshops:57–63.
  • [57].Bihorac A, Ozrazgat-Baslanti T, Ebadi A, Motaei A, Madkour M, Pardalos PM, Lipori G, Hogan WR, Efron PA, Moore F. MySurgeryRisk: development and validation of a machine-learning risk algorithm for major complications and death after surgery. Annals of surgery. 2019.
  • [58].Natekin A, Knoll A. Gradient boosting machines, a tutorial. Frontiers in neurorobotics. 2013; 7:21.
  • [59].Masoomi H, Kim H, Reavis KM, Mills S, Stamos MJ, Nguyen NT. Analysis of factors predictive of gastrointestinal tract leak in laparoscopic and open gastric bypass. Archives of Surgery. 2011; 146:1048–1051; doi: 10.1001/archsurg.2011.203
  • [60].Qu Y, Cai H, Ren K, Zhang W, Yu Y, Wen Y, Wang J. Product-based neural networks for user response prediction. 2016; 2016 IEEE 16th International Conference on Data Mining:1149–1154.
  • [61].Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. 2013; Advances in neural information processing systems:3111–3119.
  • [62].Covington P, Adams J, Sargin E. Deep neural networks for YouTube recommendations. 2016; Proceedings of the 10th ACM Conference on Recommender Systems:191–198.
