Table 5. AUC, sensitivity, specificity, and accuracy of the five AI-based software products: values declared by the developers compared with values obtained in each of the three stages of the experiment.
| Diagnostic accuracy metrics | AI-based software | Declared | Obtained, Stage 1 (detection of pathologies overall) | Obtained, Stage 2 (lung nodules segmentation) | Obtained, Stage 3 (lung nodules segmentation and classification) |
|---|---|---|---|---|---|
| AUC (95% CI) | qXR | 0.920 | 0.921 (0.862 to 0.980) | 0.823* (0.754 to 0.889) | 0.792* (0.721 to 0.860) |
| | Celsus | 0.920 | 0.956 (0.918 to 0.994) | 0.885 (0.824 to 0.945) | 0.812* (0.744 to 0.879) |
| | Program for automated analysis of digital fluorograms | 0.950 | 0.858* (0.790 to 0.925) | 0.844* (0.775 to 0.910) | 0.688* (0.619 to 0.753) |
| | Care Mentor AI | 0.930 | 0.810* (0.723 to 0.897) | 0.708* (0.640 to 0.773) | 0.667* (0.599 to 0.734) |
| | Lunit INSIGHT CXR | 0.920 | 0.932 (0.887 to 0.977) | 0.787* (0.720 to 0.854) | 0.787* (0.720 to 0.854) |
| Sensitivity (95% CI) | qXR | 0.900 | 0.854 (0.750 to 0.954) | 0.646* (0.510 to 0.781) | 0.583* (0.444 to 0.723) |
| | Celsus | 0.900 | 0.875 (0.781 to 0.970) | 0.770* (0.652 to 0.890) | 0.625* (0.488 to 0.762) |
| | Program for automated analysis of digital fluorograms | 0.900 | 0.750* (0.630 to 0.872) | 0.690* (0.556 to 0.819) | 0.375* (0.238 to 0.512) |
| | Care Mentor AI | 0.860 | 0.604* (0.466 to 0.740) | 0.417* (0.277 to 0.556) | 0.333* (0.200 to 0.467) |
| | Lunit INSIGHT CXR | 0.790 | 0.920** (0.840 to 0.990) | 0.574* (0.433 to 0.716) | 0.574* (0.433 to 0.716) |
| Specificity (95% CI) | qXR | 0.820 | 0.830 (0.722 to 0.937) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0) |
| | Celsus | 0.860 | 0.960** (0.900 to 1.0) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0) |
| | Program for automated analysis of digital fluorograms | 0.980 | 0.960 (0.900 to 1.0) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0) |
| | Care Mentor AI | 0.920 | 0.910 (0.835 to 0.990) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0) |
| | Lunit INSIGHT CXR | 0.950 | 0.810* (0.700 to 0.920) | 1.0** (1.0 to 1.0) | 1.0** (1.0 to 1.0) |
| Accuracy (95% CI) | qXR | 0.850 | 0.880 (0.820 to 0.950) | 0.820 (0.744 to 0.898) | 0.789 (0.707 to 0.871) |
| | Celsus | 0.860 | 0.916 (0.860 to 0.972) | 0.884 (0.820 to 0.950) | 0.810 (0.732 to 0.889) |
| | Program for automated analysis of digital fluorograms | 0.940 | 0.850* (0.781 to 0.930) | 0.842* (0.769 to 0.915) | 0.684* (0.590 to 0.778) |
| | Care Mentor AI | 0.910 | 0.760* (0.672 to 0.844) | 0.705* (0.610 to 0.797) | 0.663* (0.568 to 0.758) |
| | Lunit INSIGHT CXR | N/A | 0.860 (0.790 to 0.930) | 0.787 (0.707 to 0.868) | 0.787 (0.704 to 0.870) |
The obtained metrics were calculated against the ground-truth markup. An obtained value is marked with “*” when its entire 95% CI lies below the developer-declared value. It is marked with “**” when its entire 95% CI lies above the declared value. The declared values are those reported by the vendors for detection of pathologies overall. AUC, area under the receiver operating characteristic curve; AI, artificial intelligence; CXR, chest X-ray; CI, confidence interval.
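The footnote's marking rule amounts to a simple interval check. The Python sketch below encodes one reading of it. It assumes a value is marked only when its whole 95% CI lies strictly below (“*”) or strictly above (“**”) the declared figure.

This reading is consistent with every marker in Table 5. That includes Celsus accuracy in Stage 1, whose lower bound (0.860) equals the declared value and is left unmarked. The function name `flag_against_declared` is hypothetical, chosen for illustration.

```python
from typing import Optional


def flag_against_declared(declared: Optional[float], ci_low: float, ci_high: float) -> str:
    """Return the Table 5 marker for an obtained metric (one reading of the footnote rule).

    "*"  -> the entire 95% CI lies strictly below the developer-declared value
    "**" -> the entire 95% CI lies strictly above the developer-declared value
    ""   -> the CI contains the declared value, or nothing was declared (N/A)
    """
    if declared is None:  # e.g. accuracy of Lunit INSIGHT CXR was not declared
        return ""
    if ci_high < declared:  # obtained performance significantly lower than declared
        return "*"
    if ci_low > declared:  # obtained performance significantly higher than declared
        return "**"
    return ""  # declared value falls inside the CI (boundaries included)


if __name__ == "__main__":
    # (label, declared value, CI lower bound, CI upper bound), copied from Table 5
    examples = [
        ("qXR, AUC, Stage 1", 0.920, 0.862, 0.980),
        ("Program for automated analysis of digital fluorograms, AUC, Stage 1", 0.950, 0.790, 0.925),
        ("Lunit INSIGHT CXR, sensitivity, Stage 1", 0.790, 0.840, 0.990),
        ("Celsus, specificity, Stage 2", 0.860, 1.0, 1.0),
        ("Celsus, accuracy, Stage 1", 0.860, 0.860, 0.972),
        ("Lunit INSIGHT CXR, accuracy, Stage 1", None, 0.790, 0.930),
    ]
    for label, declared, low, high in examples:
        marker = flag_against_declared(declared, low, high)
        print(f"{label}: {marker or '(no marker)'}")
```

The check uses only the CI bounds, not the point estimate. This is why, for example, qXR sensitivity in Stage 1 (0.854 against a declared 0.900) carries no marker: its CI (0.750 to 0.954) still contains the declared value.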