Table 4:
Phase classification performance of models on the C4KC-KiTS dataset: XGBoost, ResNet3D 18-layer (r3d_18), Mixed Convolution Network 18-layer (mc3_18), R(2+1)D 18-layer (r2plus1d_18), and TotalSegmentator (ts_phase). Models are evaluated using AUC, sensitivity, specificity, positive predictive value (PPV), F1-score, and accuracy.
| Model | AUC | Sensitivity | Specificity | PPV | F1-score | Accuracy | p-value |
|---|---|---|---|---|---|---|---|
| Non-contrast | | | | | | | |
| XGBoost | 0.994 | 0.981 | 0.996 | 0.990 | 0.985 | 0.981 | — |
| r3d_18 | 0.992 | 0.981 | 0.973 | 0.929 | 0.954 | 0.981 | NaN |
| mc3_18 | 0.989 | 0.971 | 0.940 | 0.852 | 0.908 | 0.971 | 1.000 |
| r2plus1d_18 | 0.992 | 0.981 | 0.986 | 0.963 | 0.972 | 0.981 | NaN |
| ts_phase | 0.984 | 0.971 | 0.996 | 0.990 | 0.981 | 0.990 | 1.000 |
| Arterial/Venous | | | | | | | |
| XGBoost | 0.994 | 0.961 | 0.974 | 0.975 | 0.968 | 0.961 | — |
| r3d_18 | 0.961 | 0.876 | 0.878 | 0.884 | 0.880 | 0.876 | <0.001 |
| mc3_18 | 0.925 | 0.838 | 0.772 | 0.796 | 0.816 | 0.838 | <0.001 |
| r2plus1d_18 | 0.917 | 0.800 | 0.777 | 0.792 | 0.796 | 0.800 | <0.001 |
| ts_phase | 0.620 | 0.614 | 0.626 | 0.635 | 0.624 | 0.620 | <0.001 |
| Delayed | | | | | | | |
| XGBoost | 0.991 | 0.956 | 0.974 | 0.915 | 0.935 | 0.956 | — |
| r3d_18 | 0.926 | 0.670 | 0.917 | 0.701 | 0.685 | 0.670 | <0.001 |
| mc3_18 | 0.862 | 0.406 | 0.911 | 0.569 | 0.474 | 0.406 | <0.001 |
| r2plus1d_18 | 0.877 | 0.494 | 0.867 | 0.517 | 0.505 | 0.494 | <0.001 |
| ts_phase | 0.469 | 0.197 | 0.741 | 0.180 | 0.188 | 0.620 | <0.001 |
P-values indicate the significance of accuracy differences compared to XGBoost (p<0.001 considered significant).
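
The per-phase metrics in Table 4 can be related to one another with a short sketch. It assumes each phase is scored one-vs-rest against the other two, which the table layout suggests but the text here does not state explicitly. The function names and the toy labels (`nc`, `av`, `dl`) are hypothetical and chosen only for illustration; this is not the authors' evaluation code.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, PPV and F1 for one phase treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    ppv = tp / (tp + fp) if (tp + fp) else nan
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1_from_ppv_and_sensitivity(ppv, sensitivity),
    }


def f1_from_ppv_and_sensitivity(ppv, sensitivity):
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return float("nan")
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Toy labels (hypothetical): nc = non-contrast, av = arterial/venous, dl = delayed.
y_true = ["nc", "nc", "av", "av", "dl", "dl"]
y_pred = ["nc", "av", "av", "av", "dl", "nc"]
toy = one_vs_rest_metrics(y_true, y_pred, positive="nc")

# Consistency check against the XGBoost non-contrast row (PPV 0.990, sensitivity 0.981).
xgb_f1 = round(f1_from_ppv_and_sensitivity(0.990, 0.981), 3)  # 0.985, as in the table
```

The last line reproduces the reported F1-score of 0.985 from the reported PPV and sensitivity, which is a quick way to sanity-check individual rows of the table.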