2023 Apr 22;96(1150):20220685. doi: 10.1259/bjr.20220685

Table 2.

The performance of the DL model and readers in the test data set

| Reader | Condition | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Image-based analysis | | | | | |
| DL model | | 0.95 | 0.92 (2876/3133) | 0.87 (213/245) | 0.92 (2663/2888) |
| Reader A | Without DL model | 0.96 | 0.96 (3001/3133) | 0.97 (238/245) | 0.96 (2763/2888) |
| | With DL model | 0.97 | 0.96 (3012/3133) | 0.98 (239/245) | 0.96 (2773/2888) |
| | Comparison (p value) | 0.100 | 0.091 | 1.000 | 0.123 |
| Reader B | Without DL model | 0.93 | 0.98 (3065/3133) | 0.88 (215/245) | 0.99 (2850/2888) |
| | With DL model | 0.95 | 0.98 (3079/3133) | 0.90 (220/245) | 0.99 (2859/2888) |
| | Comparison (p value) | 0.006^a | 0.001^a | 0.074 | 0.015^a |
| Reader C | Without DL model | 0.96 | 0.94 (2956/3133) | 0.95 (233/245) | 0.94 (2723/2888) |
| | With DL model | 0.99 | 0.95 (2971/3133) | 1.00 (245/245) | 0.94 (2726/2888) |
| | Comparison (p value) | <0.001^a | 0.077 | 0.002^a | 0.877 |
| Reader D | Without DL model | 0.93 | 0.96 (3016/3133) | 0.90 (220/245) | 0.97 (2796/2888) |
| | With DL model | 0.96 | 0.95 (2973/3133) | 0.96 (235/245) | 0.95 (2738/2888) |
| | Comparison (p value) | <0.001^a | <0.001^a | 0.001^a | <0.001^a |
| Patient-based analysis | | | | | |
| DL model | | 0.98 | 0.96 (48/50) | 0.96 (24/25) | 0.96 (24/25) |
| Reader A | Without DL model | 0.98 | 0.98 (49/50) | 1.00 (25/25) | 0.96 (24/25) |
| | With DL model | 1.00 | 1.00 (50/50) | 1.00 (25/25) | 1.00 (25/25) |
| | Comparison (p value) | 0.317 | 1.000 | N/A | 1.000 |
| Reader B | Without DL model | 0.96 | 0.96 (48/50) | 0.92 (23/25) | 1.00 (25/25) |
| | With DL model | 1.00 | 1.00 (50/50) | 1.00 (25/25) | 1.00 (25/25) |
| | Comparison (p value) | 0.149 | 0.480 | 0.480 | N/A |
| Reader C | Without DL model | 0.98 | 0.98 (49/50) | 0.96 (24/25) | 1.00 (25/25) |
| | With DL model | 1.00 | 1.00 (50/50) | 1.00 (25/25) | 1.00 (25/25) |
| | Comparison (p value) | 0.317 | 1.000 | 1.000 | N/A |
| Reader D | Without DL model | 0.94 | 0.94 (47/50) | 0.88 (22/25) | 1.00 (25/25) |
| | With DL model | 1.00 | 0.98 (49/50) | 0.96 (24/25) | 1.00 (25/25) |
| | Comparison (p value) | 0.073 | 0.480 | 0.480 | N/A |

AUC, area under the curve; DL, deep learning; N/A, not applicable.

Accuracy, sensitivity, and specificity were calculated at the threshold that maximized the Youden index. AUC values were compared with DeLong's test, and accuracy, sensitivity, and specificity were compared with McNemar's test.

^a Statistically significant difference.
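
The footnotes describe how the table entries were derived. The minimal Python sketch below illustrates those calculations and is not the authors' code. The function names `metrics_from_counts`, `youden_threshold`, and `mcnemar_corrected` are hypothetical. The false-negative and false-positive counts for the DL model (32 and 225) are inferred from the reported fractions. The McNemar variant shown is the continuity-corrected chi-square form, which is an assumption: the table states neither the variant used nor the discordant pair counts.

```python
from math import erfc, sqrt


def metrics_from_counts(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Youden's J from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_j": sensitivity + specificity - 1.0,
    }


def youden_threshold(scores, labels):
    """Return (cut-off, J) maximizing Youden's J; score >= cut-off means positive.

    Assumes labels are 0/1 and that both classes are present.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        j = metrics_from_counts(tp, fn, tn, fp)["youden_j"]
        if j > best_j:  # ties keep the lowest cut-off
            best_t, best_j = t, j
    return best_t, best_j


def mcnemar_corrected(b, c):
    """Two-sided p value of McNemar's chi-square test with continuity correction.

    b and c are the two discordant pair counts (assumed inputs; not in the table).
    """
    n = b + c
    if n == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(stat / 2.0))


# DL model, image-based row: 213/245 positives and 2663/2888 negatives correct.
dl = metrics_from_counts(tp=213, fn=32, tn=2663, fp=225)
print({k: round(v, 2) for k, v in dl.items()})
# {'sensitivity': 0.87, 'specificity': 0.92, 'accuracy': 0.92, 'youden_j': 0.79}

# Two discordant pairs, both in the same direction.
print(round(mcnemar_corrected(0, 2), 3))  # 0.48
```

Under these assumptions, two discordant cases that both change in the same direction give a p value of about 0.48, which is in line with the 0.480 entries in the patient-based rows where accuracy changes by two cases.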