2023 Apr 22;96(1150):20220685. doi: 10.1259/bjr.20220685

Table 2.

The performance of the DL model and readers in the test data set

Reader	Condition	AUC	Accuracy	Sensitivity	Specificity
Image-based analysis
DL model		0.95	0.92 (2876/3133)	0.87 (213/245)	0.92 (2663/2888)
Reader A	Without DL model	0.96	0.96 (3001/3133)	0.97 (238/245)	0.96 (2763/2888)
	With DL model	0.97	0.96 (3012/3133)	0.98 (239/245)	0.96 (2773/2888)
	Comparison	0.100	0.091	1.000	0.123
Reader B	Without DL model	0.93	0.98 (3065/3133)	0.88 (215/245)	0.99 (2850/2888)
	With DL model	0.95	0.98 (3079/3133)	0.90 (220/245)	0.99 (2859/2888)
	Comparison	0.006^a	0.001^a	0.074	0.015^a
Reader C	Without DL model	0.96	0.94 (2956/3133)	0.95 (233/245)	0.94 (2723/2888)
	With DL model	0.99	0.95 (2971/3133)	1.00 (245/245)	0.94 (2726/2888)
	Comparison	<0.001^a	0.077	0.002^a	0.877
Reader D	Without DL model	0.93	0.96 (3016/3133)	0.90 (220/245)	0.97 (2796/2888)
	With DL model	0.96	0.95 (2973/3133)	0.96 (235/245)	0.95 (2738/2888)
	Comparison	<0.001^a	<0.001^a	0.001^a	<0.001^a
Patient-based analysis
DL model		0.98	0.96 (48/50)	0.96 (24/25)	0.96 (24/25)
Reader A	Without DL model	0.98	0.98 (49/50)	1.00 (25/25)	0.96 (24/25)
	With DL model	1.00	1.00 (50/50)	1.00 (25/25)	1.00 (25/25)
	Comparison	0.317	1.000	N/A	1.000
Reader B	Without DL model	0.96	0.96 (48/50)	0.92 (23/25)	1.00 (25/25)
	With DL model	1.00	1.00 (50/50)	1.00 (25/25)	1.00 (25/25)
	Comparison	0.149	0.480	0.480	N/A
Reader C	Without DL model	0.98	0.98 (49/50)	0.96 (24/25)	1.00 (25/25)
	With DL model	1.00	1.00 (50/50)	1.00 (25/25)	1.00 (25/25)
	Comparison	0.317	1.000	1.000	N/A
Reader D	Without DL model	0.94	0.94 (47/50)	0.88 (22/25)	1.00 (25/25)
	With DL model	1.00	0.98 (49/50)	0.96 (24/25)	1.00 (25/25)
	Comparison	0.073	0.480	0.480	N/A

AUC, area under the curve; DL, deep learning; N/A, not applicable.

Accuracy, sensitivity, and specificity were calculated at the threshold that maximized the Youden index. Comparison rows give p-values for each reader without versus with the DL model: AUC values were compared with DeLong’s test, and accuracy, sensitivity, and specificity were compared with McNemar’s test.
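
The threshold selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `youden_threshold`, the convention that a score at or above the threshold counts as positive, and the toy data are all assumptions made for the example.

```python
def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1.

    scores: model outputs; higher is assumed to mean "more likely positive".
    labels: 1 for positive, 0 for negative.
    A case is called positive when score >= threshold (assumed convention).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None
    # Every distinct score is a candidate cut-off.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1  # Youden index at this cut-off
        if best is None or j > best[0]:
            best = (j, threshold, sens, spec)

    _, threshold, sens, spec = best
    return threshold, sens, spec


# Hypothetical toy data, not taken from the study.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
threshold, sens, spec = youden_threshold(scores, labels)
print(threshold, sens, spec)
```

On ties the sketch keeps the lowest threshold that reaches the maximum; the source does not say how ties were handled.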

^a Statistically significant difference.
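
As a rough illustration of how one of the McNemar p-values in the Comparison rows could be obtained, the sketch below uses the chi-square form with continuity correction. The source does not state which variant of the test was used, so that choice is an assumption, as are the function name `mcnemar_p` and the 0.05 significance level. The discordant counts are inferred from the patient-based rows: Reader B goes from 48/50 to 50/50 correct, so two patients changed from wrong to right and none changed the other way.

```python
from math import erfc, sqrt


def mcnemar_p(b, c):
    """P-value of McNemar's test (chi-square, continuity-corrected; assumed variant).

    b: cases correct without the DL model but wrong with it.
    c: cases wrong without the DL model but correct with it.
    Only these discordant pairs enter the statistic.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, nothing to compare
    # max(..., 0) keeps the corrected difference from going negative when b == c.
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


ALPHA = 0.05  # assumed significance level; not stated in the source

# Reader B, patient-based accuracy: 48/50 without vs. 50/50 with the DL model.
p_reader_b = mcnemar_p(0, 2)
# Reader A, patient-based accuracy: 49/50 without vs. 50/50 with the DL model.
p_reader_a = mcnemar_p(0, 1)

print(round(p_reader_b, 3), p_reader_b < ALPHA)
print(round(p_reader_a, 3), p_reader_a < ALPHA)
```

Under these assumptions the two values round to 0.48 and 1.0, in line with the 0.480 and 1.000 reported for those rows, and neither would be marked significant.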