Table 2. Performance of the prognostic models and the senior surgeon in the test set (n = 400)
| Model/Surgeon | Balanced Accuracy | Accuracy | Sensitivity | Specificity | AUC<sup>a</sup> | Brier Score |
|---|---|---|---|---|---|---|
| LR<sup>b</sup> | 0.580 | 0.90 (CI<sup>c</sup> 0.86–0.92) | 0.235 | 0.924 | 0.624 | 0.041 |
| RF<sup>d</sup> | 0.591 | 0.81 (CI 0.77–0.85) | 0.353 | 0.830 | 0.671 | 0.043 |
| XGB<sup>e</sup> | 0.608 | 0.84 (CI 0.80–0.88) | 0.353 | 0.864 | 0.646 | 0.042 |
| KNN<sup>f</sup> | 0.616 | 0.70 (CI 0.65–0.74) | 0.529 | 0.702 | 0.623 | 0.042 |
| Senior Surgeon | 0.527 | 0.74 (CI 0.69–0.78) | 0.294 | 0.760 | - | - |

<sup>a</sup>AUC: area under the receiver operating characteristic curve; <sup>b</sup>LR: logistic regression; <sup>c</sup>CI: confidence interval; <sup>d</sup>RF: random forest; <sup>e</sup>XGB: eXtreme Gradient Boosting; <sup>f</sup>KNN: K-nearest neighbors.
95% confidence intervals are provided for accuracy. No confidence interval is reported for balanced accuracy, as it is a composite metric whose uncertainty depends on both sensitivity and specificity, and no standard analytical method is established for its estimation.
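
The dependence of balanced accuracy on sensitivity and specificity can be illustrated with a minimal sketch. It assumes the usual binary-classification definition, balanced accuracy = (sensitivity + specificity) / 2; the function name `balanced_accuracy` and the dictionary `table2` are illustrative and do not come from the study's code. The inputs are the rounded sensitivity and specificity values from Table 2, so the results may differ from the reported column in the third decimal place.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Arithmetic mean of sensitivity and specificity (binary case)."""
    return (sensitivity + specificity) / 2


# (sensitivity, specificity) pairs copied from Table 2 (already rounded).
table2 = {
    "LR": (0.235, 0.924),
    "RF": (0.353, 0.830),
    "XGB": (0.353, 0.864),
    "KNN": (0.529, 0.702),
    "Senior Surgeon": (0.294, 0.760),
}

# Recompute the composite metric for every row of the table.
balanced = {
    name: balanced_accuracy(se, sp) for name, (se, sp) in table2.items()
}

for name, value in balanced.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption, each recomputed value lies within 0.001 of the Balanced Accuracy column, which is the size of discrepancy expected from rounding the inputs to three decimals.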