Table 2. Performance of various MRI-based deep learning models on the internal and external test sets.
Framework | Aug. | Internal AUC | Internal P value | Internal accuracy | Internal sensitivity | Internal specificity | Internal F1 | External AUC | External P value | External accuracy | External sensitivity | External specificity | External F1
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Tri-ResNet18 | Com. | 0.750 (0.635–0.863) | 0.374 | 69.4 (57.5–79.8) | 71.1 (55.7–83.6) | 66.7 (46.0–83.5) | 0.744 | 0.754 (0.616–0.893) | 0.056 | 66.7 (52.5–78.9) | 80.0 (59.3–93.2) | 55.2 (35.7–73.6) | 0.690
Tri-ResNet34 | Com. | 0.757 (0.636–0.879) | 0.269 | 72.2 (60.4–82.1) | 68.9 (53.4–81.8) | 77.8 (57.7–91.4) | 0.756 | 0.755 (0.616–0.896) | 0.041 | 66.7 (52.5–78.9) | 72.0 (50.6–87.9) | 62.1 (42.3–79.3) | 0.667
Tri-ResNet50 | Com. | 0.779 (0.663–0.895)† | NA | 72.2 (60.4–82.1)† | 77.8 (62.9–88.8)† | 63.0 (42.4–80.6) | 0.778† | 0.778 (0.648–0.908)† | NA | 74.1 (60.3–85.0)† | 76.0 (54.9–90.6)† | 72.4 (52.8–87.3)† | 0.731†
Tri-ResNet101 | Com. | 0.755 (0.638–0.871) | 0.352 | 69.4 (57.5–79.8) | 62.2 (46.5–76.2) | 81.5 (61.9–93.7) | 0.718 | 0.748 (0.600–0.894) | 0.080 | 68.5 (46.5–85.1) | 68.0 (46.5–85.1) | 69.0 (49.2–84.7) | 0.667
Tri-VGG16 | Com. | 0.759 (0.647–0.871) | 0.028 | 65.3 (53.1–76.1) | 64.4 (48.8–78.1) | 66.7 (46.0–84.5) | 0.699 | 0.706 (0.551–0.861) | 0.010 | 63.0 (48.7–75.7) | 60.0 (38.7–78.9) | 65.5 (45.7–82.1) | 0.600
Tri-ResNet50 | Mixup | 0.835 (0.720–0.933) | 0.015 | 79.2 (68.0–87.8) | 77.8 (62.9–88.8) | 81.5 (61.9–93.7) | 0.824 | 0.825 (0.712–0.938) | 0.149 | 77.8 (64.4–88.0) | 84.0 (63.9–95.5) | 72.4 (52.8–87.3) | 0.778
Tri-ResNet50 | MixCut* | 0.870 (0.742–0.952)* | 0.004* | 83.3 (72.7–91.1)* | 88.9 (76.0–96.3)* | 74.1 (53.7–88.9) | 0.870* | 0.840 (0.730–0.950)* | 0.037* | 81.5 (68.6–90.8)* | 88.0 (68.8–97.5)* | 75.9 (56.5–89.7)* | 0.815*
Accuracy, sensitivity, and specificity are expressed as percentages. Data in parentheses are 95% confidence intervals. P values are for the difference in AUC between each model and the Tri-ResNet50 model trained with the common data augmentations. With the common data augmentations fixed, a value on which the optimal framework uniquely performs best is marked with ‘†’; with the optimal framework fixed, a value on which MixCut uniquely performs best is marked with ‘*’. MRI, magnetic resonance imaging; Aug., data augmentation method; AUC, area under the curve; F1, F1 score; Com., common; NA, not available.
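For readers who want to see how the threshold-based columns relate to one another, the following is a minimal Python sketch that derives accuracy, sensitivity, specificity, and F1 from confusion-matrix counts and attaches a 95% interval to each proportion. Two things here are assumptions rather than facts from the table: the counts (`tp=32, fn=13, tn=18, fp=9`) are hypothetical values chosen only because they reproduce the point estimates in the first Tri-ResNet18 row, and the interval method (exact Clopper–Pearson) is not stated in the table and is used purely for illustration. The function names are likewise illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion k/n.

    Assumption: the table does not say which interval method was used;
    Clopper-Pearson is shown here only as one common choice.
    """

    def solve(f, target):
        # f is monotonically decreasing in p on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2, i.e. cdf(k - 1; p) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def summarize(tp, fn, tn, fp):
    """Point estimates (and intervals for the proportions) from raw counts."""
    n = tp + fn + tn + fp
    return {
        "accuracy": ((tp + tn) / n, clopper_pearson(tp + tn, n)),
        "sensitivity": (tp / (tp + fn), clopper_pearson(tp, tp + fn)),
        "specificity": (tn / (tn + fp), clopper_pearson(tn, tn + fp)),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts, not taken from the source.
result = summarize(tp=32, fn=13, tn=18, fp=9)

for name in ("accuracy", "sensitivity", "specificity"):
    estimate, (low, high) = result[name]
    print(f"{name}: {100 * estimate:.1f} ({100 * low:.1f}-{100 * high:.1f})")
print(f"f1: {result['f1']:.3f}")
```

Because F1 depends on both true-positive and false-positive counts, it can move differently from accuracy when the positive and negative classes are imbalanced, which is one reason the table reports it separately.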