Table 3.
Performance Comparison Among Our Approach, Baseline A, Baseline B, and Baseline C on the Dataset Containing Only Interpretable Images
Method | Accuracy/No. (%, 95% CI) | FNR/No. (%, 95% CI) | Recall/No. (%, 95% CI) | Specificity/No. (%, 95% CI) | AUC % (95% CI) | P Value |
---|---|---|---|---|---|---|
Our approach | 69 (88.46%, 81.37%–95.55%) | 7 (16.67%, 8.40%–24.94%) | 35 (83.33%, 75.06%–91.60%) | 34 (94.44%, 89.36%–99.53%) | 93.85% (88.39%–99.31%) | 0.00766 |
Baseline A | 59 (75.64%, 66.11%–85.17%) | 17 (40.48%, 29.58%–51.37%) | 25 (59.52%, 48.63%–70.42%) | 34 (94.44%, 89.36%–99.53%) | 79.89% (71.00%–88.79%) | <0.001 |
Baseline B | 65 (83.33%, 75.06%–91.60%) | 10 (23.81%, 14.36%–33.26%) | 32 (76.19%, 66.74%–85.64%) | 33 (91.67%, 85.53%–97.80%) | 89.48% (81.88%–97.09%) | <0.001 |
Baseline C | 68 (87.18%, 79.76%–94.60%) | 6 (14.29%, 6.52%–22.05%) | 36 (85.71%, 77.95%–93.48%) | 32 (88.89%, 81.91%–95.86%) | 90.21% (83.31%–97.12%) | 0.00443 |
Performance comparison among our approach (alternate gradient descent with binary output), baseline A (2 single-modal CNNs as a 3-output task), baseline B (interpretability classifiers followed by 2 single-modal CNNs as a 2-output task), and baseline C (two-stream CNNs representing state-of-the-art methods for 2-modal image analysis) on the dataset containing only interpretable images.
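The percentages and intervals in Table 3 can be recomputed from the reported counts. The sketch below does this for the "Our approach" row. It rests on assumptions that the table itself does not state: the group sizes (78 images in total, 42 positive and 36 negative) are inferred from the count/percentage pairs, and the intervals are assumed to be normal-approximation (Wald) intervals whose standard error uses the total sample size of 78 for every metric, which is what the reported bounds are numerically consistent with. The helper `wald_ci` and all variable names are hypothetical.

```python
import math

# Inferred from the table, not stated in it: 69/78 = 88.46%,
# 35/42 = 83.33%, 34/36 = 94.44%.
N_TOTAL, N_POS, N_NEG = 78, 42, 36


def wald_ci(successes, group_size, n_for_se=N_TOTAL, z=1.96):
    """Return (proportion, lower, upper) for a Wald 95% interval.

    Assumption: the standard error uses n_for_se (the total sample size),
    not group_size, because that choice reproduces the reported bounds.
    """
    p = successes / group_size
    half_width = z * math.sqrt(p * (1 - p) / n_for_se)
    return p, p - half_width, p + half_width


# Counts for "Our approach" taken from Table 3.
true_pos, true_neg = 35, 34
false_neg = N_POS - true_pos  # 7, matching the FNR column

rows = {
    "accuracy": wald_ci(true_pos + true_neg, N_TOTAL),
    "fnr": wald_ci(false_neg, N_POS),
    "recall": wald_ci(true_pos, N_POS),
    "specificity": wald_ci(true_neg, N_NEG),
}

for name, (p, lo, hi) in rows.items():
    print(f"{name}: {100 * p:.2f}% ({100 * lo:.2f}% to {100 * hi:.2f}%)")
```

Replacing `true_pos` and `true_neg` with the counts from another row (for example 25 and 34 for baseline A) yields that row's values under the same assumptions. The AUC intervals and P values depend on per-image scores that are not in the table, so they are not reproduced here.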