Table 2.
First Author (Year) | Discrimination (AUC) | Calibration Measures | Calibration Performance | DCA |
---|---|---|---|---|
Buisman (2022) [20] | 0.73 | Calibration curve | Good calibration (MSKCC model)/slight underprediction (Erasmus MC model) | NR |
Bertsimas (2022) [21] | KRAS-variant: 0.76 (both training and testing)/external validation: 0.78/wild-type, training: 0.79/wild-type, testing: 0.57 | NR | NR | NR |
Bao (2021) [22] | Mean time-dependent: 0.75 | NR | NR | NR |
Lam (2023) [23] | 0.65 (both for OS and RFS) | NR | NR | NR |
Reijonen (2023) [24] | 0.62 (OS) | NR | NR | NR |
Margonis (2018) [25] | 0.625 | NR | NR | NR |
Paredes (2020) [26] | Model without KRAS: 0.649–0.662 (validation cohort)/model with KRAS: 0.642–0.667 (validation cohort) | Calibration curve | No KRAS: good calibration/KRAS: fair | NR |
Fruhling (2021) [27] | 1-, 3-, 5-year OS: 0.71, 0.67, 0.67/internal validation: 0.62 | Calibration curve | Excellent calibration in development cohort | NR |
Taghavi (2021) [28] | Training: 0.64/validation: 0.71 | NR | NR | NR |
Brudvik (2019) [29] | Development, 5 yr OS: 0.69/development, 5 yr RFS: 0.66 | NR | NR | NR |
Moaven (2023) [30] | GBT, OS: 0.77/GBT, recurrence: 0.63/LRB, OS: 0.64/LRB, recurrence: 0.57 | NR | NR | NR |
Villard (2022) [31] | Development: 0.74/validation: 0.69/simplified model, development: 0.74, validation: 0.66 | Calibration curve, CITL, slope, HL test | CITL: 0.36, slope: 0.89 (validation), good overall fit | NR |
Chen (2020) [32] | Development: 0.69 at 24 months and 0.65 at 33 months/internal validation: 0.63/cohort 2: 0.81 at 15 months | Calibration curve | Good calibration | NR |
Chen (2022) [33] | 1-, 3-, 5-year OS: 0.828, 0.740, 0.700 in the solitary LM group; 0.747, 0.714, 0.753 in the 2–4 LM group; 0.728, 0.741, 0.792 in the ≥ 5 LM group | Calibration curve | Fair calibration only in the 2–4 LM group | NR |
Dai (2021) [34] | Training: 0.866/validation: 0.792 | Calibration curve | Poor calibration in the validation cohort | Clinical utility with lift curves |
Liu (2021) [35] | 0.707 | Calibration curve | Fair | NR |
Liang (2021) [36] | Training: 0.742/validation: 0.773 | Calibration curve | Fair in both training and validation cohorts | NR |
Wu (2021) [37] | 0.71 (both neoadjuvant and non-neoadjuvant groups) | NR | NR | NR |
Sasaki (2022) [38] | Development: 0.61 (model as a continuous variable), 0.60 (model as a categorical variable)/Asian external validation cohort: 0.62 (model as a continuous variable), 0.60 (model as a categorical variable)/European external validation cohort: 0.57 (model as a continuous variable), 0.57 (model as a categorical variable) | NR | NR | NR |
Huiskens (2019) [39] | Stage 1 model: 0.70/Stage 2 model: 0.72 | HL test | Stage 1 model: chi-square: 3.5, p = 0.63/Stage 2 model: chi-square: 7.8, p = 0.18 | NR |
Bai (2022) [40] | 5-year OS, development: 0.721/5-year OS, validation: 0.665/2-year RFS, development: 0.728/2-year RFS, validation: 0.640 | NR | NR | NR |
Fang (2022) [41] | 0.715 | NR | NR | NR |
Qin (2022) [42] | 1-, 2-, 3-year ihPFS: 0.695, 0.764, 0.782 | Calibration curve | Fair calibration | yes |
Kawaguchi (2021) [43] | RAS mutant, development: 0.629/RAS mutant, validation: 0.644/wild type, development: 0.625/wild type, validation: 0.624 | Calibration curve | Fair calibration (development and validation cohort) | NR |
Zhang (2023) [44] | Risk score: 1, 3, 5 years, training: 0.624, 0.630, 0.662/testing: 0.610, 0.646, 0.688/validation: 0.612, 0.622, 0.652/full model: 0.783, corrected: 0.772 | Calibration curve | Fair calibration | yes |
Chen (2021) [45] | Complications: 0.658/PFS: 0.676/OS: 0.700 | Calibration curve, HL test | Complications: fair, HL test: chi-square 3.99, p = 0.91/PFS: fair/OS: good | yes (for complications) |
Jin (2022) [46] | Training: 0.826/validation: 0.820/external validation: 0.763 | Calibration curve | Poor calibration (internal validation), fair (external validation) | yes |
Zhai (2022) [47] | 0.659 | NR | NR | NR |
Liu (2021) [48] | Development: 0.696/validation: 0.682 | Calibration curve | Development: fair/validation: poor | NR |
Moro (2020) [49] | AIC: wtKRAS: 1356, mtKRAS: 1356 | Brier scores after bootstrapping | Brier: 0.1741 (wtKRAS), 0.1793 (mtKRAS) | NR |
Chen (2021) [50] | Complications: 0.750/PFS: 0.663/OS: 0.684 | Calibration curves and HL test | Complications: fair/PFS: fair/OS: fair | yes |
Yao (2021) [51] | Presence of LN metastases: 0.655/PFS: 0.656 | Calibration curves and HL test | Presence of LN metastases: fair/PFS: fair | NR |
Kazi (2023) [52] | 0.692 | Calibration table | Good calibration (small group numbers) | NR |
Meng (2021) [53] | 1 yr OS, training: 0.788/1 yr OS, validation: 0.702/3 yr OS, training: 0.752/3 yr OS, validation: 0.848 | Calibration curve | 1 yr OS: fair, 3 yr OS: good (small numbers) | NR |
Imai (2016) [54] | 0.66 | Calibration curve | 3 and 5 yr OS: fair | NR |
Chen (2022) [55] | Development: 0.754/validation: 0.882 | Calibration curve, HL test | HL: chi-square: 1.36, p = 0.998, calibration curve: good calibration in development and validation cohorts | yes |
Cheng (2022) [56] | Training: 0.709/validation: 0.735 | Calibration curve | CSS: fair in training and validation/OS: fair in training and validation | NR |
Kulik (2018) [57] | Preoperative: 0.716/preop- and perioperative: 0.761 | NR | NR | NR |
Bai (2021) [58] | LDH-CRS: 0.674/mCRS: 0.681 | NR | NR | NR |
Wang (2021) [59] | 1st score, 1, 3, 5 yr OS, training: 0.84, 0.73, 0.70/1, 3, 5 yr OS, int. validation: 0.75, 0.70, 0.70/1, 3, 5 yr OS, ext. validation: 0.77, 0.78, 0.72/2nd score, 3 yr OS, training: 0.76/5 yr OS, training: 0.75/3 yr OS, validation: 0.74/5 yr OS, validation: 0.66 | Calibration curve | Merged score: fair | NR |
Xu (2021) [60] | Training: 0.746/validation: 0.764 | Calibration curve, slope, intercept | Validation: fair, calibration slope: 1.09, intercept: −0.006 | NR |
Sasaki (2018) [61] | 0.669 | NR | NR | NR |
Wada (2022) [62] | Training: 0.83/validation: 0.81/mixed model: 0.85 | NR | NR | NR |
Kim (2020) [63] | Training: 0.824/validation: 0.898 | HL test | p = 0.831 | NR |
Dupre (2019) [64] | Preoperative: 0.619/postoperative: 0.637 | NR | NR | NR |
Qi (2023) [65] | SOF, 5 yr: 0.63/SOF, 8 yr: 0.74/combined, 5 yr: 0.69/combined, 8 yr: 0.79 | Calibration curve | Fair calibration | NR |
Wu (2021) [66] | 0.705 | Calibration curve | Fair calibration | NR |
Dasari (2023) [67] | Development, 1, 2, 3, 5 yr: 0.756, 0.745, 0.706, 0.698/validation, 1, 2, 3, 5 yr: 0.679, 0.659, 0.678, 0.732 | NR | NR | NR |
Liu (2023) [68] | DEG risk score, development, 5 yr: 0.74/validation, 5 yr: 0.64/mixed model: 0.69 | Calibration curve | Good calibration | yes |
Amygdalos (2023) [69] | 0.70 | NR | NR | NR |
Chen (2023) [70] | 0.732 | Calibration curve | Fair | NR |
Wu (2018) [71] | OS, 1 and 3 yr: 0.621, 0.661/CSS, 1 and 3 yr: 0.621, 0.660 | Calibration curve | Fair in training and validation, both for OS and CSS | NR |
Deng (2023) [72] | Training: 0.720/validation: 0.740 | Calibration curve, HL test | Training: fair calibration, chi-square: 4.97, p = 0.7612/validation: poor calibration, chi-square: 3.89, p = 0.8671 | yes (utility in a narrow range of thresholds) |
Berardi (2023) [73] | Training: 0.68/validation: 0.60 | Calibration curve | Fair | NR |
Liu (2019) [74] | Development: 0.675/validation: 0.77 | Calibration curve | Development: 1 yr poor, 3 yr good/validation: 1 yr poor, 3 yr poor, 5 yr poor | NR |
Welsh (2008) [75] | 0.781 | Calibration plot, HL test | Validation: chi-square = 6.03, p = 0.196 | NR |
Famularo (2023) [76] | RF model: 0.66 | NR | NR | NR |
He (2023) [77] | Training: 0.801/validation: 0.739 | Calibration curve, slope, intercept | Development: good calibration/validation: fair calibration, slope: 1.0, intercept 0.0 | yes |
Kattan (2008) [78] | Optimism-corrected: 0.612 | Calibration curve | Fair | NR |
Wensink (2023) [79] | Optimism-corrected, 6 m: 0.643, 12 m: 0.641 | Calibration curve, slope | Fair at 6 and 12 months, optimism-corrected slope: 0.86 | yes |
Fendler (2015) [80] | Training: 0.81/validation: 0.83 | NR | NR | NR |
Marfa (2016) [81] | Training: 0.903 | NR | NR | NR |
Jiang (2023) [82] | CSS, training, 1 and 3 yr: 0.77, 0.70/validation, 1 and 3 yr: 0.72, 0.68/OS, training, 1 and 3 yr: 0.78, 0.70/validation, 1 and 3 yr: 0.74, 0.70 | Calibration curve | Training: fair/validation: poor | yes (superior to AJCC stage) |
Endo (2023) [83] | OS-OPT, training: 0.68/testing: 0.69/RFS-OPT, training: 0.68/testing: 0.69 | NR | NR | NR |
Rees (2008) [84] | Preoperative: 0.781/postoperative: 0.805 | HL test | Preoperative: chi-square: 8.125, p = 0.087/postoperative: chi-square: 7.453, p = 0.114 | NR |
Zakaria (2007) [85] | DSS: 0.61/recurrence: 0.58 | NR | NR | NR |
Tan (2008) [86] | 0.59 | NR | NR | NR |
Hill (2012) [87] | Apparent: 0.69/optimism-corrected: 0.67 | NR | NR | NR |
Takeda (2021) [88] | Development: 0.65 | NR | NR | NR |
Wang (2017) [89] | 0.642 | NR | NR | NR |
Spelt (2013) [90] | ANN: 0.72/Cox model: 0.66 | NR | NR | NR |
AUC: area under the curve, DCA: decision curve analysis, MSKCC: Memorial Sloan Kettering Cancer Center, KRAS: Kirsten rat sarcoma virus, wtKRAS: wild-type KRAS, mtKRAS: mutant KRAS, NR: not reported, OS: overall survival, RFS: recurrence-free survival, GBT: gradient-boosted trees, LRB: logistic regression with bootstrapping, CITL: calibration-in-the-large, HL: Hosmer–Lemeshow, LM: liver metastases, ihPFS: intrahepatic progression-free survival, PFS: progression-free survival, AIC: Akaike information criterion, LN: lymph node, CSS: cancer-specific survival, LDH: lactate dehydrogenase, mCRS: modified clinical risk score, SOF: spatial organization features, DEG: differentially expressed genes, RF: random forest, AJCC: American Joint Committee on Cancer, OPT: optimal policy tree, DSS: disease-specific survival, ANN: artificial neural network.
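To make the performance measures in Table 2 more concrete, the sketch below computes several of them for a binary outcome. It is illustrative only and is not the method used by any of the included studies. It covers the c-statistic (AUC, estimated as the share of concordant event/non-event pairs), the Brier score, the Hosmer–Lemeshow (HL) statistic over groups of sorted predicted risk, and decision-curve net benefit at a single threshold probability. The function names and toy data are hypothetical. Many models in the table predict survival outcomes and report Harrell's C or time-dependent AUC, which account for censoring; those variants are not covered here.

```python
from typing import Sequence, Tuple


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Binary-outcome AUC: share of (event, non-event) pairs in which the
    event has the higher predicted risk; tied predictions count as 0.5."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float],
                    groups: int = 10) -> Tuple[float, int]:
    """HL chi-square statistic over `groups` bins of sorted predicted risk.
    Returns (statistic, degrees of freedom = groups - 2). The p-value is the
    upper tail of a chi-square distribution with that many degrees of freedom
    (e.g. scipy.stats.chi2.sf(stat, df)); omitted to stay dependency-free."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        mean_p = expected / n_g
        denom = n_g * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone with p >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: observed outcomes and predicted risks for 8 patients.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.4, 0.9]
print(c_statistic(y, p))             # 0.875
print(net_benefit(y, p, 0.5))        # 0.375
print(brier_score(y, p))
print(hosmer_lemeshow(y, p, groups=4))
```

With these eight hypothetical patients, 14 of the 16 event/non-event pairs are concordant (AUC 0.875). In a full decision curve analysis, net benefit would be evaluated over a range of thresholds and compared with "treat all" and "treat none" strategies.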