2022 Jul 18;12(12):5564–5573. doi: 10.7150/thno.74125

Table 2.

Performance of the deep-learning model and two experienced radiologists in the training and internal validation cohorts

| | TP | TN | FP | FN | AUC, % [95% CI] | Sensitivity, % (n/N) [95% CI] | Specificity, % (n/N) [95% CI] | Accuracy, % (n/N) [95% CI] | P† |
|---|---|---|---|---|---|---|---|---|---|
| Training | 1368 | 5051 | 113 | 696 | 82.05 [81.01-83.08] | 66.28 (1368/2064) [64.10-68.32] | 97.81 (5051/5164) [97.41-98.20] | 88.81 (6419/7228) [88.06-89.53] | |
| Internal validation | | | | | | | | | |
| Deep-learning model | 356 | 1268 | 23 | 160 | 83.61 [81.58-85.64] | 68.99 (356/516) [65.12-73.06] | 98.22 (1268/1291) [97.44-98.92] | 89.87 (1624/1807) [88.39-91.23] | |
| Radiologist 1 | 181 | 1239 | 52 | 335 | 65.52 [63.40-67.65] | 35.08 (181/516) [30.81-39.34] | 95.97 (1239/1291) [94.89-96.98] | 78.58 (1420/1807) [76.62-80.45] | < 0.0001 |
| Radiologist 2 | 115 | 1248 | 43 | 401 | 59.48 [57.62-61.34] | 22.29 (115/516) [18.60-25.78] | 96.67 (1248/1291) [95.66-97.60] | 75.43 (1363/1807) [73.38-77.40] | < 0.0001 |

TP = true positive, TN = true negative, FP = false positive, FN = false negative

†: comparison between each radiologist and the deep-learning model; AUCs were compared using DeLong's test.
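The sensitivity, specificity, and accuracy columns follow directly from the TP/TN/FP/FN counts, so they can be recomputed as a sanity check (AUC cannot, since it requires the underlying prediction scores). A minimal sketch, using the counts from the table above:

```python
# Recompute Table 2's derived metrics from the confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as percentages, 2 d.p."""
    sens = 100 * tp / (tp + fn)            # TP / all actual positives
    spec = 100 * tn / (tn + fp)            # TN / all actual negatives
    acc = 100 * (tp + tn) / (tp + tn + fp + fn)
    return round(sens, 2), round(spec, 2), round(acc, 2)

# Deep-learning model, internal validation cohort
print(metrics(356, 1268, 23, 160))   # → (68.99, 98.22, 89.87)

# Training cohort
print(metrics(1368, 5051, 113, 696))  # → (66.28, 97.81, 88.81)
```

Both rows reproduce the table's point estimates exactly; the bracketed 95% confidence intervals would additionally require an interval method (e.g. bootstrap or Clopper-Pearson), which the counts alone do not determine.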