Abstract
Introduction
Pancreaticoduodenectomy (PD) for patients with pancreatic ductal adenocarcinoma (PDAC) is associated with a high risk of postoperative complications (PoCs) and risk prediction of these is therefore critical for optimal treatment planning. We hypothesize that novel deep learning network approaches through transfer learning may be superior to legacy approaches for PoC risk prediction in the PDAC surgical setting.
Methods
Data from the US National Surgical Quality Improvement Program (NSQIP) 2002–2018 were used, comprising a total of 5,881,881 patients, including 31,728 PD patients. Modelling approaches comprised a model trained on a general surgery patient cohort and then tested on a PD-specific cohort (general model), a transfer learning model trained on the general surgery patients with subsequent transfer and retraining on a PD-specific patient cohort (transfer learning model), a model trained and tested exclusively on the PD-specific patient cohort (direct model), and a benchmark random forest model trained on the PD patient cohort (RF model). The models were subsequently compared against the American College of Surgeons (ACS) surgical risk calculator (SRC) in terms of predicting mortality and morbidity risk.
Results
Both the general model and transfer learning model outperformed the RF model in 14 and 16 out of 19 prediction tasks, respectively. Additionally, both models outperformed the direct model on 17 out of the 19 tasks. The transfer learning model also outperformed the general model on 11 out of the 19 prediction tasks. The transfer learning model outperformed the ACS-SRC regarding mortality, and all the models outperformed the ACS-SRC regarding morbidity prediction, with the general model achieving the highest Receiver Operator Characteristics Area Under the Curve (ROC-AUC) of 0.669 compared to the 0.524 of the ACS-SRC.
Conclusion
DNNs deployed using a transfer learning approach may be of value for PoC risk prediction in the PD setting.
Introduction
Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer-related deaths in Western countries, with a 5-year survival rate of approximately 12%, making it the cancer with the lowest 5-year survival rate in the United States [1]. Furthermore, patients with successfully resected tumors have a 3-year survival rate of only 20–34% [2], which is attributable to a combination of aggressive tumor growth patterns and poor response to oncological treatment [3–5].
Surgical resection, in the form of pancreaticoduodenectomy (PD, Whipple’s procedure), distal pancreatectomy (DP), or total pancreatectomy (TP), provides the only curative option for patients with PDAC. These procedures are, however, associated with a plethora of postoperative complications (PoCs) such as superficial surgical site infections (SSSIs), organ/space surgical site infections (OSSIs), venous thromboembolisms (VTEs), hemorrhage, and death, collectively affecting upwards of 40% of patients undergoing PD [6]. These complications not only prolong the surgical treatment phase and subject patients to significant morbidity, but may furthermore render the patient unable to proceed with adjuvant chemotherapy due to frailty issues [7].
Due to these factors, as well as the fact that upwards of 80% of successfully resected patients suffer tumor recurrence [8], the risk of an operative approach to PDAC treatment needs to be carefully weighed against the potential benefits, and tools for identifying PoC risks thus play an important role in selecting optimal treatment strategies for PDAC patients. While multiple risk prediction tools have been proposed for both general surgical and PDAC patients, these have reported varied performance in terms of identifying PoC risks [9, 10], especially for PD patients [11]. Novel approaches such as artificial intelligence (AI) deep neural networks (DNNs) have, however, recently shown superior performance over legacy approaches in PoC prediction [12], although the potential value for pancreatic surgery patients remains unknown.
Legacy risk prediction models often have the inherent drawback of being trained on general surgical cohorts with subsequent application to specific surgical procedures such as PD. In contrast, DNN approaches can leverage the power of transfer learning [13], meaning that the model can be trained to learn general features from one patient cohort (e.g., features associated with PoCs in general surgical populations) with subsequent retraining and fine-tuning on a specific surgical procedure or cohort (e.g., patients undergoing a PD procedure). This makes it possible to create new models with knowledge of previously learned input–output relationships, which can then be further refined and improved through training on new data [14]. We have previously demonstrated the value of this approach [13]. Investigating whether DNNs and the potential of transfer learning could be of value in predicting PoC risk for PD patients is the goal of this study. We hypothesize that transfer learning of a DNN, previously trained on a large-scale dataset with many surgical procedure types, to a dataset consisting exclusively of PD patients could be superior to legacy approaches, including the American College of Surgeons Surgical Risk Calculator (ACS-SRC) [15], for tasks including predicting the 30-day risk of mortality and morbidity as defined by the National Surgical Quality Improvement Program (NSQIP) [16].
Methods
This study was conducted using a dataset obtained from the US American College of Surgeons (ACS) NSQIP, which includes manually curated PoCs from more than 700 US hospitals across 2,941 different procedure subtypes. For this study we used the 2002 to 2018 dataset available through NSQIP. The study and the use of the dataset were approved by NSQIP. IRB approval was waived by the Massachusetts General Hospital IRB, including the requirement for informed consent.
Data was accessed May 3rd, 2023. This was a purely retrospective study on data obtained from the NSQIP quality registry. As such, no interaction with patients was required. The study group did not have access to data enabling the identification of individual participants, and patient consent was thus neither required (as per the IRB decision to waive the requirement for consent), nor possible due to the de-identified nature of the dataset.
Datasets and modelling approaches
The dataset included manually labelled data from 5,881,881 patients, with more than 150 different variables for each patient, such as perioperative biochemistry, height, weight, age, smoking status, comorbidities, demographics, American Society of Anesthesiologists (ASA) score, and postoperative complications as defined by NSQIP [16] and shown in Table 1.
Table 1. The incidence of postoperative complications (prediction variables) for the three datasets before the split into validation/training sets, with the number of patients experiencing each complication shown.
PD: Pancreaticoduodenectomy.
| Complication | General dataset (n = 5,874,941) | PD dataset (n = 17,037) | Test dataset (n = 2,000) |
|---|---|---|---|
| Superficial surgical site infection | 79,986 (1.4%) | 1,348 (7.9%) | 153 (7.7%) |
| Deep surgical site infection | 21,314 (0.4%) | 345 (2.0%) | 38 (1.9%) |
| Organ/space surgical site infection | 49,259 (0.8%) | 2,321 (13.6%) | 247 (12.4%) |
| Wound disruption | 21,921 (0.4%) | 242 (1.4%) | 23 (1.2%) |
| Postoperative pneumonia | 54,850 (0.9%) | 704 (4.1%) | 86 (4.3%) |
| Unplanned intubation | 43,288 (0.7%) | 716 (4.2%) | 77 (3.9%) |
| Pulmonary embolism | 18,967 (0.3%) | 193 (1.1%) | 32 (1.6%) |
| Ventilator dependence >48 hours | 43,476 (0.7%) | 582 (3.4%) | 66 (3.3%) |
| Progressive renal insufficiency | 14,349 (0.2%) | 141 (0.8%) | 13 (0.7%) |
| Acute renal failure | 15,511 (0.3%) | 183 (1.1%) | 22 (1.1%) |
| Urinary tract infection | 62,844 (1.1%) | 514 (3.0%) | 59 (4.0%) |
| Stroke | 11,256 (0.2%) | 51 (0.3%) | 5 (0.3%) |
| Cardiac arrest | 17,383 (0.3%) | 211 (1.2%) | 19 (1.0%) |
| Myocardial infarction | 21,027 (0.4%) | 193 (1.1%) | 21 (1.1%) |
| Deep vein thrombosis | 32,230 (0.6%) | 478 (2.8%) | 59 (3.0%) |
| Sepsis | 43,464 (0.7%) | 1,271 (7.5%) | 158 (7.9%) |
| Septic shock | 23,498 (0.4%) | 565 (3.3%) | 70 (3.5%) |
| Bleeding requiring transfusion | 303,726 (5.2%) | 3,593 (21.1%) | 412 (20.6%) |
| Death | 57,605 (1.0%) | 349 (2.0%) | 40 (2.0%) |
A graphical representation of patient selection and allocation into training and test data is illustrated in Fig 1. Of the 5,881,881 patients in the NSQIP dataset, we identified 31,944 patients as having undergone a PD procedure (as indicated by the CPT codes 48150, 48152, 48153, and 48154). A total of 216 patients were excluded because their PD operation had a duration of 2 hours or less; we assessed that such procedures may not have been completed, with a consequent risk of incorrect coding. This resulted in a final sample size of 31,728 patients.
Fig 1. 5,881,881 patients were in the National Surgical Quality Improvement Program (NSQIP) dataset, 31,944 of whom were PD patients.
216 of these patients were excluded because of an operation time of less than 120 minutes. The remaining 31,728 patients were split into two datasets: one containing 40% of the PD patients, which was recombined with the patients from the remainder of the NSQIP dataset (general dataset), and a second containing the remaining 60% of the PD patients, which was split into a training set, a validation set, and a test set used after the training of all the models to test their accuracy.
The dataset of 31,728 PD patients was randomly split into two datasets. The first, containing 12,907 PD patients (40%), was recombined with the non-PD dataset (5,862,034 patients) covering all other operation types (termed the “general dataset” in the following); this dataset was fielded to allow the model to learn both general and PD-specific features. The second was a dedicated dataset containing the remaining 19,037 PD patients (60%) and only PD patients (termed the “PD dataset”).
The overall objective of the modelling approaches was to predict the 30-day risk of death and/or the occurrence of 18 different postoperative complications as defined by NSQIP [16], using a DNN and a random forest for multi-label classification, enabling the prediction of all outcomes by a single model (please see prediction variables below). To align with the ACS-SRC, we further included a target termed “morbidity”, defined as the occurrence of any of the 18 NSQIP-defined complications within a 30-day period following surgery.
The datasets were used for training and testing four different multi-label modelling approaches, aiming to identify the optimal training and dataset use approach in the PD setting:
1. Training of a DNN on the general dataset with direct porting and testing on the PD dataset (general model)
2. Transfer learning of a DNN from a general to a PD-specific setting, i.e., training on the general dataset with subsequent transfer to the PD dataset for retraining (transfer learning model)
3. Direct training of a DNN only on the PD dataset (direct model)
4. Training of a Random Forest (RF) model directly on the PD dataset, serving as a benchmark (RF model)

Finally, the models were benchmarked against the American College of Surgeons (ACS) Surgical Risk Calculator (SRC), whose mortality risk and compound morbidity risk values were included in the NSQIP dataset.
To separate the PD patients into training, validation, and test sets, we first randomly selected 2,000 PD patients (6.3%) as a held-out test set used to evaluate all models after training. The remaining 17,037 PD patients were then randomly split into an 80% training set (13,632 PD patients) and a 20% validation set (3,407 PD patients), which was used to assess validity and tune hyperparameters for modelling approaches 1–4.
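As a rough illustration, the split can be sketched as follows (a minimal sketch; `pd_df` is a hypothetical dataframe standing in for the PD dataset, and the random seeds are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# `pd_df` stands in for the PD dataset (dummy single-column frame here;
# the real data hold the NSQIP variables for each patient).
pd_df = pd.DataFrame({"age": range(19037)})

pd_test = pd_df.sample(n=2000, random_state=42)   # held-out test set
pd_rest = pd_df.drop(pd_test.index)               # 17,037 remaining patients
pd_train, pd_valid = train_test_split(pd_rest, test_size=0.2, random_state=42)
```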
Model architecture
The model architecture is depicted in Fig 2. Categorical values were converted into integers, and a dimension was assigned to each category in an embedding matrix. To determine the dimension, we raised the cardinality (the number of unique values) of the variable to the power of 0.56 and multiplied the result by 1.6. The resulting value was compared to the maximum dimensional space of 600 used by the fast.ai library [17], and the lower of the two was selected as the dimension of the given category [18]. This process was repeated for all categorical variables, and the resulting embeddings were concatenated into a single embedding space with separate dimensions for each categorical variable.
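This rule can be expressed compactly; the sketch below mirrors the fast.ai heuristic described above, with the cap of 600 as the library default:

```python
# Embedding-size rule: min(600, round(1.6 * cardinality ** 0.56)).
def embedding_dim(cardinality: int, max_dim: int = 600) -> int:
    """Embedding dimension for a categorical variable with `cardinality` levels."""
    return min(max_dim, round(1.6 * cardinality ** 0.56))

# Example: a variable with 53 unique values is embedded in 15 dimensions.
print(embedding_dim(53))  # -> 15
```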
Fig 2. Model architecture with all layers depicted.
The embedded categorical variables were then passed through a dropout layer and concatenated with the continuous variables, which had passed through a normalization layer. This was followed by a linear layer and a rectified linear unit (ReLU) activation layer. The resulting tensor was passed through a subsequent normalization, dropout, linear, ReLU activation, normalization, and another dropout layer before finally passing through a linear layer with 19 output variables denoting NSQIP’s 18 different complications and death. Backpropagation was used to train the trainable parameters via the Adam optimizer, combined with a flattened binary cross-entropy with logits loss function with positive weights.
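A minimal PyTorch sketch of this layer stack is given below; the hidden-layer widths (200 and 100) and dropout rate are illustrative assumptions, as the paper reports parameter counts rather than exact layer sizes:

```python
import torch
import torch.nn as nn

class TabularDNN(nn.Module):
    """Sketch of the described layer stack; hidden widths are assumptions."""
    def __init__(self, emb_szs, n_cont, n_out=19, p=0.1):
        super().__init__()
        # One embedding per categorical variable: (cardinality, dimension) pairs.
        self.embeds = nn.ModuleList(nn.Embedding(c, d) for c, d in emb_szs)
        self.emb_drop = nn.Dropout(p)          # dropout on the embeddings
        self.bn_cont = nn.BatchNorm1d(n_cont)  # normalization of continuous inputs
        n_emb = sum(d for _, d in emb_szs)
        self.layers = nn.Sequential(
            nn.Linear(n_emb + n_cont, 200), nn.ReLU(),
            nn.BatchNorm1d(200), nn.Dropout(p),
            nn.Linear(200, 100), nn.ReLU(),
            nn.BatchNorm1d(100), nn.Dropout(p),
            nn.Linear(100, n_out),             # 18 complications + death
        )

    def forward(self, x_cat, x_cont):
        # Embed each categorical column, concatenate, apply dropout.
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        x = self.emb_drop(x)
        # Join with normalized continuous variables and run the dense stack.
        x = torch.cat([x, self.bn_cont(x_cont)], dim=1)
        return self.layers(x)  # raw logits, to be paired with BCEWithLogitsLoss
```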
To counteract the imbalance in the data, positive weights were applied to the loss function of all DNNs since there were fewer positive outputs than negative ones. This method increases the impact of the minority class on gradient updates during the training process by multiplying the loss with the positive weights when the minority class is misclassified. The positive weight for each output variable was determined by calculating the ratio of negative outcomes to positive outcomes for each variable in the training sets [19].
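In PyTorch terms, this weighting can be sketched as follows (a minimal illustration; `y_train` is a synthetic stand-in for the training outcome matrix):

```python
import torch
import torch.nn as nn

# Synthetic (n_patients, 19) binary outcome matrix standing in for the labels.
y_train = torch.randint(0, 2, (1000, 19)).float()

pos = y_train.sum(dim=0)             # positives per outcome
neg = y_train.shape[0] - pos         # negatives per outcome
pos_weight = neg / pos.clamp(min=1)  # ratio of negatives to positives

# Misclassified positives are up-weighted by `pos_weight` during training.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```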
The DNNs all had 53 embedding layers and were trained for 5 epochs, but they differed in the number of trainable parameters, learning rates, weight decay, and weights in the loss function. The general model had 1,253,130 trainable parameters, a learning rate of 3e-3, and a weight decay of 0.2. The transfer learning model also had 1,253,130 trainable parameters, with a learning rate of 2e-4 and a weight decay of 0.2. The direct model trained on PD patients had 703,219 trainable parameters and a learning rate of 2e-4; no weight decay was specified for this model.
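The training and transfer workflow can be sketched with fastai roughly as below; the dataloaders (`general_dls`, `pd_dls`), layer sizes, and the `pos_weight` tensor are placeholders, since the paper does not publish its training code:

```python
from fastai.tabular.all import *

# Train on the general dataset first (general model hyperparameters).
learn = tabular_learner(
    general_dls, layers=[200, 100],                          # placeholder widths
    loss_func=BCEWithLogitsLossFlat(pos_weight=pos_weight),  # weighted loss
    wd=0.2,
)
learn.fit_one_cycle(5, lr_max=3e-3)  # 5 epochs, lr 3e-3, weight decay 0.2

# Transfer: swap in the PD-only dataloaders and retrain at a lower learning rate.
learn.dls = pd_dls
learn.fit_one_cycle(5, lr_max=2e-4)  # transfer learning model: lr 2e-4
```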
To compare the DNNs with a conventional method of handling structured tabular data we created a random forest model. Our random forest consisted of 100 trees, each trained on a sample of 75% of the total data. We used the DecisionTreeClassifier from Scikit-learn with a minimum of five samples per node to train each tree. For each split point in the decision trees, we randomly sampled 50% of the columns. We set the minimum number of samples required to be a leaf node to 40.
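An approximately equivalent configuration using scikit-learn’s built-in ensemble is sketched below; the study assembled the forest manually from DecisionTreeClassifier, so this hyperparameter mapping is an assumption:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,     # 100 trees
    max_samples=0.75,     # each tree trained on 75% of the data
    max_features=0.5,     # 50% of columns sampled at each split point
    min_samples_split=5,  # at least five samples required to split a node
    min_samples_leaf=40,  # at least 40 samples required in a leaf
    n_jobs=-1,
)
# rf.fit(X_train, Y_train)  # scikit-learn accepts a multi-label (n, 19) target
```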
Input variables
The input variables for the models included 64 different factors gathered preoperatively, as well as operation time, which was recorded after the operation. These variables can be classified as either continuous or categorical. Continuous variables are numerical values, such as weight, height, age, and protein levels in the blood, while categorical variables include smoking status, comorbidities, type of anesthesia used, the specialty under which the patient was treated, and more.
Continuous input variables were standardized by mean and standard deviation (Z-score normalization). Missing continuous data were handled firstly by replacing the missing value with the median value for the corresponding group and secondly, for each variable where missing values were imputed, generating a new binary categorical variable. This binary variable indicates the presence or absence of missing data for each observation, effectively signaling whether the original continuous variable value was missing for each patient. If any categorical variables had missing data, they were given their own separate category and included in the dataset. Both the categorical and continuous variables were then converted into vectors and updated in each epoch of training via the embedding process. An overview of the input data can be found in S1 Table in S1 Data.
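The imputation and indicator scheme can be sketched as follows (a simplified illustration using the overall column median, where the study used the median of the corresponding group; column names are hypothetical):

```python
import pandas as pd

def impute_with_indicator(df: pd.DataFrame, cont_cols: list) -> pd.DataFrame:
    """Median-impute continuous columns, flag imputed rows, then Z-score."""
    df = df.copy()
    for col in cont_cols:
        missing = df[col].isna()
        # New binary categorical variable flagging imputed observations.
        df[col + "_missing"] = missing.astype(int).astype("category")
        df[col] = df[col].fillna(df[col].median())            # median imputation
        df[col] = (df[col] - df[col].mean()) / df[col].std()  # Z-score normalization
    return df

example = pd.DataFrame({"albumin": [3.9, None, 4.2, 3.1]})
print(impute_with_indicator(example, ["albumin"]))
```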
Prediction variables
The prediction variables in the model were the PoCs as defined by NSQIP, occurring up to 30 days after the surgical procedure. Of these, 18 were related to morbidity and the last was mortality. To evaluate the performance of the models, the area under the receiver operating characteristic curve (ROC-AUC) was used on the test set of 2,000 PD patients. This metric was calculated for each of the 19 output variables (mortality and 18 different postoperative complications) across all four models. The average ROC-AUC value was then determined for the 18 morbidity variables in each of the four models. This value was compared to that of the risk calculator, which only provides the probability of morbidity and mortality without specifying the type of morbidity. For the prediction variables, the dataset contained no missing data.
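The evaluation can be sketched as below; the arrays are synthetic stand-ins for the test-set labels and model outputs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, (2000, 19))  # 18 complications + death
y_prob = rng.random((2000, 19))          # model-predicted probabilities

# Per-outcome ROC-AUC, then the average over the 18 morbidity outcomes.
aucs = [roc_auc_score(y_true[:, k], y_prob[:, k]) for k in range(19)]
morbidity_auc = float(np.mean(aucs[:18]))  # average morbidity ROC-AUC
mortality_auc = aucs[18]                   # death
```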
SHAP
We calculated Shapley values with the ‘SHAP’ library [20] to evaluate and visualize the impact of each input variable on the predictions made by our deep neural network (DNN) models. This method assesses the contribution of each input variable to the predicted outcome on the test set. The underlying principle is based on cooperative game theory. The Shapley value represents the average contribution of a feature to the prediction outcome, taken across all possible combinations of input features. This approach allows for a comprehensive understanding of how each input variable influences the models’ predictions [20].
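As a hedged sketch of how such an analysis might look with the SHAP library (the paper does not specify the exact explainer; `model`, `X_background`, and `X_test` are placeholders):

```python
import shap

# Explain the trained network against a background sample of the data.
explainer = shap.DeepExplainer(model, X_background)
shap_values = explainer.shap_values(X_test)  # one array per output task

# Summary bar plot: mean |SHAP| per input variable, color-coded by task.
shap.summary_plot(shap_values, X_test)
```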
Model implementation
The models in this study were developed using Python 3.7.6, with PyTorch 1.13.1 [21] and fastai 2.7.12 [17]. Performance metrics were derived using scikit-learn 0.22.1 [22].
Results
The performance of the models on the test data is depicted in Table 2 and Fig 3, with ROC-AUC as the evaluation metric for all 19 variables. The direct model (trained and tested on PD data only) generally had the poorest performance among the four models, having the best performance only for predicting myocardial infarction (MI). The general model (trained on the general dataset containing both general and PD patients) and the transfer learning model (trained on the general dataset with subsequent retraining on the PD dataset) each outperformed it on 17 out of the 19 prediction tasks, while the RF model outperformed it on 14 out of the 19 prediction tasks.
Table 2. The overall performance of the four models on all variables in the test set, with Receiver Operator Characteristics Area Under the Curve (ROC-AUC) values as the metric.
The general model was trained on a general surgery patient cohort; the transfer learning model was trained on the general surgery patient cohort and transferred to a PD-specific patient cohort; the direct model and the Random Forest (RF) model were trained exclusively on the PD-specific patient cohort.
| Complication | General model | Transfer learning model | Direct model | RF model |
|---|---|---|---|---|
| Superficial Surgical site infection | 0.608 | 0.582 | 0.537 | 0.575 |
| Deep surgical site infection | 0.695 | 0.706 | 0.622 | 0.593 |
| Organ/space surgical site infection | 0.580 | 0.581 | 0.517 | 0.608 |
| Wound disruption | 0.676 | 0.702 | 0.630 | 0.577 |
| Postoperative pneumonia | 0.639 | 0.629 | 0.562 | 0.567 |
| Unplanned intubation | 0.642 | 0.664 | 0.634 | 0.663 |
| Pulmonary embolism | 0.662 | 0.683 | 0.615 | 0.651 |
| Ventilator dependence >48 hours | 0.706 | 0.705 | 0.669 | 0.701 |
| Progressive renal insufficiency | 0.648 | 0.676 | 0.590 | 0.638 |
| Acute renal failure | 0.718 | 0.750 | 0.713 | 0.763 |
| Urinary tract infection | 0.726 | 0.687 | 0.623 | 0.665 |
| Stroke | 0.738 | 0.615 | 0.714 | 0.583 |
| Cardiac arrest | 0.643 | 0.685 | 0.646 | 0.610 |
| Myocardial infarction | 0.601 | 0.636 | 0.657 | 0.557 |
| Deep vein thrombosis | 0.653 | 0.634 | 0.578 | 0.603 |
| Sepsis | 0.645 | 0.626 | 0.552 | 0.641 |
| Septic shock | 0.695 | 0.713 | 0.629 | 0.702 |
| Bleeding requiring transfusion | 0.761 | 0.753 | 0.733 | 0.738 |
| Death | 0.648 | 0.678 | 0.660 | 0.672 |
Fig 3. Performance, measured as Receiver Operator Characteristics Area Under the Curve (ROC-AUC), of the four different modelling approaches benchmarked against each other for predicting mortality and the 18 different complications included in the National Surgical Quality Improvement Program (NSQIP) dataset.
SSSI: Superficial Surgical site infection, DSSI: Deep surgical site infection, OSSI: Organ/space surgical site infection, WOUND: Wound disruption, PNEUMONIA: Postoperative pneumonia, UNPINT: Unplanned intubation, PE: Pulmonary embolism, VENT48: Ventilator dependence >48 hours, PRI: Progressive renal insufficiency, ARF: Acute renal failure, UTI: Urinary tract infection, STROKE: Stroke, CAR: Cardiac arrest requiring CPR, MI: Myocardial infarction, DVT: Deep vein thrombosis, SEPSIS: Sepsis, SEPSHOCK: Septic shock, BLEED: Bleeding requiring transfusion, DECEASED: Mortality.
Furthermore, Table 2 and Fig 3 demonstrate that the best performing models were the general model and the transfer learning model. As depicted in Fig 3B and 3E respectively, the transfer learning model outperformed the RF model in 16 out of the 19 prediction tasks, and the general model outperformed it in 14 out of the 19 prediction tasks. When comparing the general model with the transfer learning model, as shown in Fig 3C, they exhibited similar performance on most of the outputs, with the transfer learning model outperforming the general model in 11 out of the 19 prediction tasks.
The ACS-SRC only reports two variables to the NSQIP dataset: morbidity risk and mortality risk. To compare the performance of the four models with the ACS-SRC, the average ROC-AUC of all variables except the deceased variable was calculated and labeled as “morbidity” in Table 3. This table demonstrates that all models outperformed the risk calculator regarding morbidity risk, with the general model achieving the highest average morbidity ROC-AUC of 0.669. The transfer learning model and RF model also outperformed the ACS-SRC when assessing mortality risk; however, all models fared similarly, with results ranging from 0.648 for the general model to 0.678 for the transfer learning model (Fig 4).
Table 3. The average morbidity Receiver Operator Characteristics Area Under the Curve (ROC_AUC) scores of the four models, calculated on the test set.
Additionally, the table includes the average morbidity and mortality risk scores obtained from the same test set derived from the American College of Surgeons Surgical Risk Calculator (ACS-SRC).
| Complications | General model | Transfer learning model | Direct model | RF-model | ACS-SRC |
|---|---|---|---|---|---|
| Morbidity | 0.669 | 0.668 | 0.623 | 0.635 | 0.524 |
| Mortality | 0.648 | 0.678 | 0.661 | 0.672 | 0.667 |
Fig 4. The average morbidity and mortality Receiver Operator Characteristics Area Under the Curve (ROC-AUC) of the four models as well as the American College of Surgeons Surgical Risk Calculator (ACS-SRC).
Considering SHAP values for the transfer learning model and the direct model, as shown in Figs 5 and 6, the models’ predictions are based on different variable interactions. For example, “serum albumin” was the fourth biggest driver of complications in the direct model but only the thirteenth biggest driving factor in the transfer learning model. In contrast, “gender”, which is not ranked within the top 40 in the direct model, was the fourth most significant driving factor in the transfer learning model. Furthermore, the transfer learning model seems to have more driving factors than the direct model, indicating a higher model complexity. Even though the models differed in which variables ranked highest, there was substantial overlap in which variables were generally regarded as the most important; these included “weight”, “age”, and “operation time”, which were the top three driving factors in both models.
Fig 5. SHAP values for the transfer learning model.

The x-axis shows the average impact on the color-coded prediction tasks, and the y-axis lists the input variables ranked by impact. PATOS: Present at time of surgery.
Fig 6. SHAP values for the direct model.
The x-axis shows the average impact on the color-coded prediction tasks, and the y-axis lists the input variables ranked by impact. PATOS: Present at time of surgery.
Discussion
In this study, we demonstrate that transfer learning of a DNN (transfer learning model), as well as a larger model trained on a diverse range of operations (general model), outperforms alternative approaches such as a random forest model (RF model), a DNN trained exclusively on PD patients (direct model), and non-deep-learning models (ACS-SRC) when predicting postoperative complication risk in PD patients. The DNN transfer learning approach thus outperformed the ACS-SRC, suggesting a potential value of this approach when targeting limited-volume operation subtypes such as PD, where direct transfer of models pretrained on large PD datasets or direct de-novo training of risk prediction models may not be feasible.
By utilizing DNNs that capture relationships between input and output variables of common diseases and treatment options, it is possible to develop coherent predictive models even when limited data are available for low-volume surgical procedures. By leveraging transfer learning techniques, small datasets can effectively be augmented with knowledge acquired from other, potentially larger, datasets. This expands the amount of data available for training larger models, thereby offering a potential for improving outcomes in the PD setting.
Comparing our study’s findings with the results from deploying the ACS-SRC on PDs for neuroendocrine tumors by Dave et al. [23], both the general and the transfer learning model exhibited superior predictive performance compared to the NSQIP calculator. Our study incorporated a comparable variable termed ’morbidity’, aligning with the ’serious complication’ variable examined by Dave et al. (Table 3). Notably, Dave et al. reported an AUC of 0.55 for their ’serious complication’ value, whereas both our general and transfer learning models achieved an AUC of 0.67. It should, however, be acknowledged that while there were overlapping complications, there were differences in the specific variables included in the respective studies, making a direct comparison difficult.
In a study from Aoki et al. [6] an attempt was made to predict a value referred to as ‘serious morbidity’ resulting in an ROC AUC of 0.708. However, the definition of ’serious morbidity’ in this study was based on the presence of a Clavien–Dindo classification grade of IV or V, which introduces a disparity between this study and ours in terms of prediction variables. Similarly, a study from Braga et al. [24] also predicted major complications and achieved a ROC AUC of 0.743. However, their definition of major complications also aligned with a Clavien-Dindo classification of IV or V and included a variety of other types of complications, making it challenging to compare their study directly with ours.
As such, in assessing the performance of this model versus previous approaches, it is important to underline that prediction targets are often not aligned.
Most previous approaches have targeted PD-specific complications such as pancreatic fistula development, which is indeed a major driver of postoperative complications in the PD setting. In contrast, this study focuses on the prediction of general, non-PD-specific complications. The rationale behind this choice stems from the fact that key drivers of fistula development (pancreatic texture and pancreatic duct diameter) are often not available for risk prediction before the time of surgery, and models incorporating these features are thus of little use in the preoperative setting where the decision on whether to proceed with an operative strategy must be made.
The relevance of fistula development as a driver of other complications should, however, not be underestimated. Fistula development is a recognized driver of other complications, including SSIs [25] and postoperative hemorrhage [26]. Previous results fielding a large-scale postoperative risk prediction model using a DNN approach on multiple surgical procedure subtypes from the NSQIP dataset yielded a combined morbidity risk prediction ROC-AUC of 0.87 [12], which is superior to the combined morbidity risk prediction ROC-AUC of 0.678 demonstrated here. The reason behind this suboptimal performance is likely multifaceted but could include the fact that fistula development risk did not factor into risk calculations in this model. The fact that the DNN approach presented here is on par with or superior to previous approaches does, however, highlight that even with state-of-the-art DNN approaches that have previously demonstrated superior performance compared with legacy approaches [12], PD risk prediction continues to be a difficult task where models exhibiting excellent performance remain elusive. Future efforts could potentially benefit from incorporating methods for assessing fistula risk by including preoperatively available data points assessing pancreatic texture and duct diameter, potentially through automated density analyses of preoperative CT scans combined with pancreatic duct diameter measurements.
This study has limitations that should be acknowledged. As is the case for all studies utilizing registry datasets, the models depend on the quality and transferability of the data on which they are built. As an example, temporal information on when during treatment data points were obtained cannot be assessed. This poses a challenge especially for continuous variables such as biochemistry, which are susceptible to fluctuation depending on the time of measurement. A second limitation is that the PD dataset is of limited size, although this was also the rationale for assessing the value of transfer learning approaches in the first place. The PD dataset was used for training all models except the general model, as well as for validation and testing of all models. The limited size of this dataset therefore reduces the generality of the findings and hinders the learning of very complex relationships between variables. It is particularly challenging for rare outcome predictions such as stroke, which occurred in only 5 patients in the test set. A third limitation relates to the underlying patient demographics and treatment strategies: NSQIP contains data primarily from US patients, and it thus cannot be assessed how the models would perform on non-US patients or hospital systems. Furthermore, it should be noted that although the ability of DNNs to include a multitude of relevant input variables gives the approach a position of strength over conventional regression-based approaches, it often also hinders manual use of the model, as it is impractical for the clinical user to input several hundred parameters for each risk prediction. Ideally, actual clinical use of DNNs would thus require automated embedding of the DNNs directly into electronic health record (EHR) systems. Lastly, it is worth noting that, as with all studies concerning DNNs, the black-box issue of determining which factors the model perceives as most relevant remains an unsolved problem. It is therefore difficult to address the rationale behind specific predictions, hindering the ability to determine the relationships between variables which the model found most important. We have attempted to visualize the importance of the most important variables using SHAP values; however, visualizing the importance of a single feature in a non-linear model still presents a significant challenge.
Even with these limitations, we conclude that DNNs and transfer learning approaches may have a value in predicting general complications in the setting of low-volume surgical cases such as PDs, although overall performance improvements and EHR system integrations are likely needed before models can see actual clinical use.
Supporting information
S1 Data. (DOCX)
Data Availability
The de-identified dataset used for this study can be obtained from the authors or the following institutional contacts, provided written authorization from data owners American College of Surgeons, National Surgical Quality Improvement Program (ACS-NSQIP) can be obtained. The reason for the current restriction is patient privacy as well as the fact that the ACS-NSQIP does not allow for commercial use of data collected through the ACS-NSQIP. As such, there is a requirement of data use only for specific and approved research projects, and that this is individually approved by the ACS-NSQIP, including through a signed data user agreement. This data user agreement can be obtained by contacting the ACS-NSQIP at baa@facs.org. The American College of Surgeons National Surgical Quality Improvement program (ACS-NSQIP), serving as the governing body for the data used in this study, can be contacted at baa@facs.org. The point of contact is Brian Matel. Contacting researchers will be required to complete a new data user agreement with the ACS-NSQIP if secondary use of the data is desired.
Funding Statement
Funded by a grant from the Novo Nordisk Foundation (grant #NNF19OC0055183) to MS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023;73(1):17–48. doi: 10.3322/caac.21763
- 2. Wang H, Liu J, Xia G, Lei S, Huang X, Huang X. Survival of pancreatic cancer patients is negatively correlated with age at diagnosis: a population-based retrospective study. Sci Rep. 2020;10(1):7048. doi: 10.1038/s41598-020-64068-3
- 3. Springfeld C, Jäger D, Büchler MW, Strobel O, Hackert T, Palmer DH, et al. Chemotherapy for pancreatic cancer. Presse Med. 2019;48(3 Pt 2):e159–e74. doi: 10.1016/j.lpm.2019.02.025
- 4. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–49. doi: 10.3322/caac.21660
- 5. Kleeff J, Michalski C, Friess H, Büchler MW. Pancreatic cancer: from bench to 5-year survival. Pancreas. 2006;33(2):111–8. doi: 10.1097/01.mpa.0000229010.62538.f2
- 6. Aoki S, Miyata H, Konno H, Gotoh M, Motoi F, Kumamaru H, et al. Risk factors of serious postoperative complications after pancreaticoduodenectomy and risk calculators for predicting postoperative complications: a nationwide study of 17,564 patients in Japan. J Hepatobiliary Pancreat Sci. 2017;24(5):243–51. doi: 10.1002/jhbp.438
- 7. Chikhladze S, Lederer AK, Kousoulas L, Reinmuth M, Sick O, Fichtner-Feigl S, et al. Adjuvant chemotherapy after surgery for pancreatic ductal adenocarcinoma: retrospective real-life data. World J Surg Oncol. 2019;17(1):185. doi: 10.1186/s12957-019-1732-3
- 8. Luu AM, Belyaev O, Höhn P, Praktiknjo M, Janot M, Uhl W, et al. Late recurrences of pancreatic cancer in patients with long-term survival after pancreaticoduodenectomy. J Gastrointest Oncol. 2021;12(2):474–83. doi: 10.21037/jgo-20-433
- 9. Cohn SL, Fernandez Ros N. Comparison of 4 Cardiac Risk Calculators in Predicting Postoperative Cardiac Complications After Noncardiac Operations. Am J Cardiol. 2018;121(1):125–30. doi: 10.1016/j.amjcard.2017.09.031
- 10. Cohen ME, Liu Y, Ko CY, Hall BL. An Examination of American College of Surgeons NSQIP Surgical Risk Calculator Accuracy. J Am Coll Surg. 2017;224(5):787–95.e1. doi: 10.1016/j.jamcollsurg.2016.12.057
- 11. Höhn P, Runde F, Luu AM, Fahlbusch T, Fein D, Klinger C, et al. Applicability of the surgical risk calculator by the American College of Surgeons in the setting of German patients undergoing complete pancreatectomy: multicentre study using data from the StuDoQ|Pancreas registry. BJS Open. 2023;7(2). doi: 10.1093/bjsopen/zrac164
- 12. Bonde A, Varadarajan KM, Bonde N, Troelsen A, Muratoglu OK, Malchau H, et al. Assessing the utility of deep neural networks in predicting postoperative surgical complications: a retrospective study. Lancet Digit Health. 2021;3(8):e471–e85. doi: 10.1016/S2589-7500(21)00084-4
- 13. Millarch AS, Bonde A, Bonde M, Klein KV, Folke F, Rudolph SS, et al. Assessing optimal methods for transferring machine learning models to low-volume and imbalanced clinical datasets: experiences from predicting outcomes of Danish trauma patients. Front Digit Health. 2023;5:1249258. doi: 10.3389/fdgth.2023.1249258
- 14. Chahal H, Toner H. ‘Small Data’ Are Also Crucial for Machine Learning. Scientific American. 2021. https://www.scientificamerican.com/article/small-data-are-also-crucial-for-machine-learning/
- 15. Bilimoria KY, Liu Y, Paruch JL, Zhou L, Kmiecik TE, Ko CY, et al. Development and evaluation of the universal ACS NSQIP surgical risk calculator: a decision aid and informed consent tool for patients and surgeons. J Am Coll Surg. 2013;217(5):833–42.e1–3. doi: 10.1016/j.jamcollsurg.2013.07.385
- 16. ACS. NSQIP Participant Use Data File 2018 [accessed August 18th, 2023]. https://www.facs.org/media/xunbqzy5/nsqip_puf_userguide_2018.pdf
- 17. Howard J, Gugger S. Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. O’Reilly Media; 2020.
- 18. Howard J, Gugger S. fastai tabular model documentation. https://docs.fast.ai/tabular.model.html
- 19. Hart E. Machine Learning 101: The What, Why, and How of Weighting. KDnuggets.
- 20. Lundberg SM, Lee S-I. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems (NIPS). 2017.
- 21. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems. 2019. pp. 8026–37.
- 22. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.