Table 3: Phase classification performance on the VinDr-Multiphase dataset for XGBoost, ResNet3D 18-layer (r3d_18), Mixed Convolution Network 18-layer (mc3_18), R(2+1)D 18-layer (r2plus1d_18), and TotalSegmentator (ts_phase). Models are evaluated using AUC, sensitivity, specificity, PPV, F1-score, and accuracy.
| Model | AUC | Sensitivity | Specificity | PPV | F1-score | Accuracy | p-value |
|---|---|---|---|---|---|---|---|
| Non-contrast | | | | | | | |
| XGBoost | 0.999 | 0.994 | 0.999 | 0.994 | 0.994 | 0.994 | — |
| r3d_18 | 0.995 | 0.983 | 0.996 | 0.978 | 0.980 | 0.983 | 0.479 |
| mc3_18 | 0.999 | 0.994 | 0.994 | 0.968 | 0.981 | 0.994 | NaN |
| r2plus1d_18 | 0.997 | 0.983 | 0.991 | 0.952 | 0.967 | 0.983 | 0.479 |
| ts_phase | 0.986 | 0.972 | 1.000 | 1.000 | 0.986 | 0.995 | 0.113 |
| Arterial | | | | | | | |
| XGBoost | 0.977 | 0.885 | 0.997 | 0.995 | 0.937 | 0.885 | — |
| r3d_18 | 0.960 | 0.725 | 0.991 | 0.983 | 0.834 | 0.725 | <0.001 |
| mc3_18 | 0.973 | 0.845 | 0.977 | 0.962 | 0.900 | 0.845 | 0.011 |
| r2plus1d_18 | 0.963 | 0.637 | 0.995 | 0.990 | 0.775 | 0.637 | <0.001 |
| ts_phase | 0.877 | 0.961 | 0.794 | 0.767 | 0.853 | 0.863 | <0.001 |
| Venous | | | | | | | |
| XGBoost | 0.974 | 0.939 | 0.919 | 0.861 | 0.898 | 0.939 | — |
| r3d_18 | 0.971 | 0.934 | 0.927 | 0.873 | 0.902 | 0.934 | 0.838 |
| mc3_18 | 0.969 | 0.924 | 0.965 | 0.933 | 0.929 | 0.924 | 0.361 |
| r2plus1d_18 | 0.967 | 0.927 | 0.907 | 0.841 | 0.882 | 0.927 | 0.475 |
| ts_phase | 0.913 | 0.871 | 0.956 | 0.913 | 0.891 | 0.926 | <0.001 |
| Delayed | | | | | | | |
| XGBoost | 0.937 | 0.780 | 0.964 | 0.666 | 0.718 | 0.780 | — |
| r3d_18 | 0.945 | 0.911 | 0.900 | 0.462 | 0.613 | 0.911 | 0.003 |
| mc3_18 | 0.953 | 0.862 | 0.932 | 0.546 | 0.669 | 0.862 | 0.061 |
| r2plus1d_18 | 0.957 | 0.921 | 0.875 | 0.410 | 0.567 | 0.921 | 0.002 |
| ts_phase | 0.500 | 0.000 | 1.000 | 0.000 | 0.000 | 0.915 | <0.001 |
P-values indicate the significance of each model's accuracy difference relative to XGBoost (p < 0.001 considered significant).
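
As a reading aid for the per-phase columns, the following minimal sketch shows how sensitivity, specificity, PPV, F1-score, and accuracy follow from one-vs-rest confusion counts. It assumes a one-vs-rest formulation for each phase, which the table does not state explicitly. The function names `one_vs_rest_metrics` and `f1_from` and the example counts are hypothetical and are not taken from the study.

```python
def one_vs_rest_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute per-class metrics from one-vs-rest confusion counts.

    Assumption: each phase is scored as "this phase" vs. "all other phases".
    Ratios with a zero denominator are reported as 0.0.
    """
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    return {
        "sensitivity": safe_div(tp, tp + fn),      # true positive rate (recall)
        "specificity": safe_div(tn, tn + fp),      # true negative rate
        "ppv": safe_div(tp, tp + fp),              # positive predictive value
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }


def f1_from(ppv: float, sensitivity: float) -> float:
    """F1-score as the harmonic mean of PPV and sensitivity."""
    denom = ppv + sensitivity
    return 0.0 if denom == 0 else 2 * ppv * sensitivity / denom


# Hypothetical counts, for illustration only (not from the dataset).
metrics = one_vs_rest_metrics(tp=90, fp=10, fn=10, tn=890)

# Consistency check against a table row: XGBoost, arterial phase
# (PPV 0.995, sensitivity 0.885) gives the reported F1-score.
arterial_f1 = round(f1_from(0.995, 0.885), 3)  # 0.937
```

The same relation accounts for the ts_phase delayed row: with sensitivity and PPV both 0.000, the F1-score is 0.000, while specificity stays at 1.000 because no scan is assigned to that class.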