2020 Jun 24;9(6):1981. doi: 10.3390/jcm9061981

Table 3.

Diagnostic Performance of DL algorithm and ED physicians (visible pneumonia on CR vs. non-pneumonia).

	AUROC (95% CI)	p Value	Sensitivity (95% CI)	p Value	Specificity (95% CI)	p Value
DL algorithm	0.940 (0.910–0.962)	NA	0.817 * (0.696–0.905)	NA	0.944 * (0.912–0.967)	NA
Session 1 (ED physicians only)
Observer 1	0.856 (0.816–0.891)	0.003 ^a	0.833 (0.715–0.917)	1.000 ^a	0.690 (0.634–0.741)	<0.0001 ^a
Observer 2	0.887 (0.850–0.918)	0.053 ^a	0.700 (0.568–0.812)	0.119 ^a	0.974 (0.949–0.989)	0.093 ^a
Observer 3	0.920 (0.887–0.946)	0.455 ^a	0.683 (0.550–0.797)	0.022 ^a	0.997 (0.982–1.000)	0.0001 ^a
Group	0.871 (0.849–0.890)	0.007 ^a	0.739 (0.668–0.801)	0.034 ^a	0.887 (0.864–0.907)	<0.0001 ^a
Session 2 (ED physicians with DL algorithm assistance)
Observer 1	0.936 (0.905–0.958)	0.007 ^b	0.867 (0.754–0.941)	0.774 ^b	0.954 (0.924–0.975)	<0.0001 ^b
Observer 2	0.907 (0.873–0.935)	0.412 ^b	0.783 (0.658–0.879)	0.227 ^b	1.000 (0.988–1.000)	0.008 ^b
Observer 3	0.907 (0.872–0.934)	0.609 ^b	0.817 (0.696–0.905)	0.022 ^b	0.990 (0.971–0.998)	0.625 ^b
Group	0.916 (0.898–0.931)	0.002 ^b	0.822 (0.758–0.875)	0.014 ^b	0.981 (0.970–0.989)	<0.0001 ^b

AUROC = area under the receiver operating characteristic curve, CR = chest radiograph, DL = deep learning, ED = emergency department, NA = not applicable. * Sensitivity and specificity of the DL algorithm were determined at the high-sensitivity threshold. ^a Comparison of performance with the DL algorithm. ^b Comparison of performance with session 1.
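
As an illustration of how sensitivity and specificity depend on the operating threshold mentioned in the footnote, the following is a minimal sketch, not the study's actual analysis code. The function name `sensitivity_specificity`, the toy labels and scores, and both threshold values are hypothetical and chosen only for demonstration; they are not taken from the study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a binary classifier.

    labels: 1 = condition present, 0 = condition absent.
    scores: classifier output; a case is called positive if score >= threshold.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (NOT from the study): 4 positive and 6 negative cases.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.2, 0.7, 0.3, 0.1, 0.1, 0.05, 0.6]

# Two hypothetical operating points: a lower threshold trades
# specificity for sensitivity.
for thr in (0.5, 0.15):
    sens, spec = sensitivity_specificity(toy_labels, toy_scores, thr)
    print(f"threshold={thr}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

On this toy data, lowering the threshold raises sensitivity at the cost of specificity, which is the trade-off a high-sensitivity operating point accepts.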