Table 2.
Comparison of the best-performing feature selectors (ranked by AUC). Because class boundaries at high and low overall survival values produce strongly imbalanced classes, balanced accuracy is reported. Single-center metrics are the mean across the splits of stratified 10-fold cross-validation. A large performance drop is observed when the models are tested on unseen multi-center data.
Prior | Center | Robustness | Class Boundary | Selector | ML model | AUC | Bal. Acc. |
---|---|---|---|---|---|---|---|
– | S | Non-robust | 304.20 | MRMR | Gaussian Process | 1.00 | 70% |
– | S | Non-robust | 365.00 | MIFS | MLP | 0.98 | 93% |
– | S | Non-robust | 425.80 | CIFE | AdaBoost | 1.00 | 94% |
– | S | Non-robust | 540.00 | MRMR | MLP | 1.00 | 95% |
– | M | Non-robust | 304.20 | MRMR | Gaussian Process | 0.50 | 52% |
– | M | Non-robust | 365.00 | MIFS | MLP | 0.42 | 50% |
– | M | Non-robust | 425.80 | CIFE | AdaBoost | 0.49 | 51% |
– | M | Non-robust | 540.00 | MRMR | MLP | 0.51 | 55% |
MR | S | Robust | 304.20 | RELF | Random Forest | 0.76 | 69% |
MR | S | Robust | 365.00 | RELF | Nearest Neighbors | 0.72 | 52% |
MR | S | Robust | 425.80 | RELF | XGBoost | 0.81 | 71% |
MR | S | Robust | 540.00 | GINI | Decision Tree | 0.69 | 68% |
MR | M | Robust | 304.20 | RELF | Random Forest | 0.54 | 50% |
MR | M | Robust | 365.00 | RELF | Nearest Neighbors | 0.57 | 51% |
MR | M | Robust | 425.80 | RELF | XGBoost | 0.49 | 45% |
MR | M | Robust | 540.00 | GINI | Decision Tree | 0.46 | 43% |
H | S | Robust | 304.20 | CIFE | XGBoost | 0.90 | 75% |
H | S | Robust | 365.00 | MRMR | AdaBoost | 0.78 | 69% |
H | S | Robust | 425.80 | MRMR | AdaBoost | 0.82 | 67% |
H | S | Robust | 540.00 | GINI | AdaBoost | 0.74 | 68% |
H | M | Robust | 304.20 | CIFE | XGBoost | 0.66 | 57% |
H | M | Robust | 365.00 | MRMR | AdaBoost | 0.54 | 58% |
H | M | Robust | 425.80 | MRMR | AdaBoost | 0.51 | 50% |
H | M | Robust | 540.00 | GINI | AdaBoost | 0.48 | 43% |
Abbreviations: MR sequence prior, H hand-picked, S single-center, M multi-center, Bal. Acc. balanced accuracy, MRMR minimum redundancy maximum relevance, MIFS mutual information feature selection, CIFE conditional infomax feature extraction, RELF ReliefF, GINI Gini index, CMIM conditional mutual information maximization, MLP multi-layer perceptron, RBF SVC support vector classifier with radial basis function kernel. The full table with all performance metrics is reported in the supplementary material.
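To illustrate why balanced accuracy is reported instead of plain accuracy, the following minimal sketch computes both for a two-class problem, taking balanced accuracy as the mean of the per-class recalls. The label vectors and the 90/10 class ratio are hypothetical and are not taken from the study data; the function names are chosen for illustration only.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 90 samples below the class boundary (0),
# 10 samples above it (1). These numbers are illustrative assumptions.
y_true = [0] * 90 + [1] * 10

# A trivial classifier that always predicts the majority class.
y_majority = [0] * 100

print(accuracy(y_true, y_majority))           # 0.9
print(balanced_accuracy(y_true, y_majority))  # 0.5
```

In this example the majority-class predictor reaches 90% accuracy but only 50% balanced accuracy, which is why balanced accuracy values near 50% in a two-class setting indicate chance-level performance.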