. 2024 Jul 25;14(8):5288–5303. doi: 10.21037/qims-24-160

Table 5. Comparison of AUC, sensitivity, specificity, and accuracy declared by the developers of AI-based software and obtained during the three stages of the experiment for the five models.

Diagnostic accuracy metrics | AI-based software | Declared | Obtained: Stage 1 (detection of pathologies overall) | Obtained: Stage 2 (lung nodules segmentation) | Obtained: Stage 3 (lung nodules segmentation and classification)
AUC (95% CI) | qXR | 0.920 | 0.921 (0.862 to 0.980) | 0.823* (0.754 to 0.889) | 0.792* (0.721 to 0.860)
AUC (95% CI) | Celsus | 0.920 | 0.956 (0.918 to 0.994) | 0.885 (0.824 to 0.945) | 0.812* (0.744 to 0.879)
AUC (95% CI) | Program for automated analysis of digital fluorograms | 0.950 | 0.858* (0.790 to 0.925) | 0.844* (0.775 to 0.910) | 0.688* (0.619 to 0.753)
AUC (95% CI) | Care Mentor AI | 0.930 | 0.810* (0.723 to 0.897) | 0.708* (0.640 to 0.773) | 0.667* (0.599 to 0.734)
AUC (95% CI) | Lunit INSIGHT CXR | 0.920 | 0.932 (0.887 to 0.977) | 0.787* (0.720 to 0.854) | 0.787* (0.720 to 0.854)
Sensitivity (95% CI) | qXR | 0.900 | 0.854 (0.750 to 0.954) | 0.646* (0.510 to 0.781) | 0.583* (0.444 to 0.723)
Sensitivity (95% CI) | Celsus | 0.900 | 0.875 (0.781 to 0.970) | 0.770* (0.652 to 0.890) | 0.625* (0.488 to 0.762)
Sensitivity (95% CI) | Program for automated analysis of digital fluorograms | 0.900 | 0.750* (0.630 to 0.872) | 0.690* (0.556 to 0.819) | 0.375* (0.238 to 0.512)
Sensitivity (95% CI) | Care Mentor AI | 0.860 | 0.604* (0.466 to 0.740) | 0.417* (0.277 to 0.556) | 0.333* (0.200 to 0.467)
Sensitivity (95% CI) | Lunit INSIGHT CXR | 0.790 | 0.920** (0.840 to 0.990) | 0.574* (0.433 to 0.716) | 0.574* (0.433 to 0.716)
Specificity (95% CI) | qXR | 0.820 | 0.830 (0.722 to 0.937) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0)
Specificity (95% CI) | Celsus | 0.860 | 0.960** (0.900 to 1.0) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0)
Specificity (95% CI) | Program for automated analysis of digital fluorograms | 0.980 | 0.960 (0.900 to 1.0) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0)
Specificity (95% CI) | Care Mentor AI | 0.920 | 0.910 (0.835 to 0.990) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0)
Specificity (95% CI) | Lunit INSIGHT CXR | 0.950 | 0.810* (0.700 to 0.920) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0)
Accuracy (95% CI) | qXR | 0.850 | 0.880 (0.820 to 0.950) | 0.820 (0.744 to 0.898) | 0.789 (0.707 to 0.871)
Accuracy (95% CI) | Celsus | 0.860 | 0.916 (0.860 to 0.972) | 0.884 (0.820 to 0.950) | 0.810 (0.732 to 0.889)
Accuracy (95% CI) | Program for automated analysis of digital fluorograms | 0.940 | 0.850* (0.781 to 0.930) | 0.842* (0.769 to 0.915) | 0.684* (0.590 to 0.778)
Accuracy (95% CI) | Care Mentor AI | 0.910 | 0.760* (0.672 to 0.844) | 0.705* (0.610 to 0.797) | 0.663* (0.568 to 0.758)
Accuracy (95% CI) | Lunit INSIGHT CXR | N/A | 0.860 (0.790 to 0.930) | 0.787 (0.707 to 0.868) | 0.787 (0.704 to 0.870)

The comparison relies on the ground truth markup. Obtained values that, taking the 95% CI into account, were lower than those declared by the developer are marked with “*”; those that were higher are marked with “**”. The metrics declared by the vendors refer to detection of pathologies overall. AUC, area under the receiver operating characteristic curve; AI, artificial intelligence; CXR, chest X-ray; CI, confidence interval.
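
The marking rule in the footnote can be illustrated with a short sketch. It assumes one reading of "taking into account 95% CI" that is consistent with the marked cells in Table 5: an obtained value is flagged “*” when the whole interval lies below the declared value, “**” when the whole interval lies above it, and left unmarked otherwise. The function name `ci_flag` and the strict inequalities are assumptions made for illustration, not the authors' published procedure.

```python
from typing import Optional


def ci_flag(declared: Optional[float], ci_low: float, ci_high: float) -> str:
    """Return "*", "**", or "" for one obtained metric (hypothetical helper).

    Assumed rule: flag only when the declared value falls strictly
    outside the 95% CI of the obtained value.
    """
    if declared is None:          # no declared value (N/A in the table)
        return ""
    if ci_high < declared:        # entire interval below the declared value
        return "*"
    if ci_low > declared:         # entire interval above the declared value
        return "**"
    return ""                     # declared value lies within the interval


# A few cells taken from Table 5: (label, declared, CI lower, CI upper)
rows = [
    ("qXR AUC, Stage 2", 0.920, 0.754, 0.889),
    ("Lunit INSIGHT CXR sensitivity, Stage 1", 0.790, 0.840, 0.990),
    ("Celsus accuracy, Stage 1", 0.860, 0.860, 0.972),
    ("Lunit INSIGHT CXR accuracy, Stage 1", None, 0.790, 0.930),
]

for label, declared, low, high in rows:
    print(f"{label}: {ci_flag(declared, low, high)!r}")
```

The strict comparison is chosen because the Celsus Stage 1 accuracy interval has a lower bound equal to the declared 0.860 and that cell is unmarked in the table.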