2021 May 25;10(6):30. doi: 10.1167/tvst.10.6.30

Table 3.

Performance Comparison Among Our Approach, Baseline A, Baseline B, and Baseline C on the Dataset Containing Only Interpretable Images

Method	Accuracy, No. (%, 95% CI)	FNR, No. (%, 95% CI)	Recall, No. (%, 95% CI)	Specificity, No. (%, 95% CI)	AUC, % (95% CI)	P Value
Our approach	69 (88.46%, 81.37%–95.55%)	7 (16.67%, 8.40%–24.94%)	35 (83.33%, 75.06%–91.60%)	34 (94.44%, 89.36%–99.53%)	93.85% (88.39%–99.31%)	0.00766
Baseline A	59 (75.64%, 66.11%–85.17%)	17 (40.48%, 29.58%–51.37%)	25 (59.52%, 48.63%–70.42%)	34 (94.44%, 89.36%–99.53%)	79.89% (71.00%–88.79%)	<0.001
Baseline B	65 (83.33%, 75.06%–91.60%)	10 (23.81%, 14.36%–33.26%)	32 (76.19%, 66.74%–85.64%)	33 (91.67%, 85.53%–97.80%)	89.48% (81.88%–97.09%)	<0.001
Baseline C	68 (87.18%, 79.76%–94.60%)	6 (14.29%, 6.52%–22.05%)	36 (85.71%, 77.95%–93.48%)	32 (88.89%, 81.91%–95.86%)	90.21% (83.31%–97.12%)	0.00443

Performance comparison among our approach (alternate gradient descent with binary output), baseline A (two single-modal CNNs trained as a 3-output task), baseline B (interpretability classifiers followed by two single-modal CNNs trained as a 2-output task), and baseline C (two-stream CNNs representing state-of-the-art methods for two-modal image analysis) on the dataset containing only interpretable images. AUC, area under the receiver operating characteristic curve; CI, confidence interval; CNN, convolutional neural network; FNR, false-negative rate.
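
The percentages in Table 3 can be recomputed from the reported counts. The sketch below does this under stated assumptions: the denominators (78 images in total, 42 positive, 36 negative) are not given in the table and are inferred from the count-to-percentage ratios (for example, 69/78 = 88.46%). The names `COUNTS`, `metrics`, and `wald_ci_pct` are hypothetical helpers introduced for illustration. The interval function uses a normal-approximation (Wald) interval, which appears to match the printed accuracy interval but is an inference, not a method stated in the table.

```python
import math

# Assumed denominators, inferred from the table's counts and percentages.
N_TOTAL = 78     # interpretable images in the test set
N_POSITIVE = 42  # positive cases (denominator for recall and FNR)
N_NEGATIVE = 36  # negative cases (denominator for specificity)

# Counts copied from Table 3, in column order:
# (accuracy count, FNR count, recall count, specificity count)
COUNTS = {
    "Our approach": (69, 7, 35, 34),
    "Baseline A": (59, 17, 25, 34),
    "Baseline B": (65, 10, 32, 33),
    "Baseline C": (68, 6, 36, 32),
}


def pct(count, total):
    """Return count / total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


def metrics(name):
    """Recompute the four percentage columns for one method."""
    correct, false_neg, true_pos, true_neg = COUNTS[name]
    return {
        "accuracy": pct(correct, N_TOTAL),
        "fnr": pct(false_neg, N_POSITIVE),
        "recall": pct(true_pos, N_POSITIVE),
        "specificity": pct(true_neg, N_NEGATIVE),
    }


def wald_ci_pct(count, total, z=1.96):
    """Normal-approximation 95% interval for a proportion, in percent.

    Assumption: this is one common way to build such an interval; the
    table does not say which method the authors used.
    """
    p = count / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return (round(100.0 * (p - half_width), 2),
            round(100.0 * (p + half_width), 2))


if __name__ == "__main__":
    for method in COUNTS:
        print(method, metrics(method))
    print("Accuracy interval, our approach:", wald_ci_pct(69, N_TOTAL))
```

Under these assumptions, the recomputed percentages agree with all four rows of the table. The printed recall, FNR, and specificity intervals also seem to be reproduced when 78, rather than the class size, is used as `total` in `wald_ci_pct`; this is an observation from the numbers only and should be checked against the article's methods section.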