Table 2. Diagnostic Performances of AI-CAD When Applied to Digital Mammography Interpretation.
References | Purpose | Cancer Proportion | AI Category | External Validation* | AUC | Sensitivity | Specificity | Accuracy |
---|---|---|---|---|---|---|---|---|
Kooi et al. 2017 [28] | Compare conventional mammography CAD vs. CNN, and CNN vs. radiologists on a test set | 1.5% (271 annotated cancers in 18,182 images) | Deep CNN | No, 18,453 images from 2188 cases for the test set | CAD 0.910 vs. CNN 0.929; test set: CNN 0.878 vs. radiologists 0.911 | - | - | - |
Becker et al. 2017 [86] | Evaluate diagnostic accuracy of deep learning-based software | 7.7% (18 of 233 cases) | dANN | No, 30% saved for validation | 0.840 (experienced readers: 0.890, inexperienced readers: 0.790) | 84.2% (84.2%, 84.2%) | 80.4% (89.0%, 83.0%) | - |
Al-Masni et al. 2018 [87] | Detection and classification of masses on DM | 50.0% (300 of 600 cases) | ROI-based CNN | No | 0.877 | 93.2% | 78.0% | 85.5% |
Bandeira Diniz et al. 2018 [88] | Detection of mass/non-mass regions in non-dense and dense breasts | - (2482 images from 1241 women) | Deep CNN | No, 20% saved as test set | - | 91.5% in non-dense, 90.4% in dense breasts | 90.5% in non-dense, 96.4% in dense breasts | 91.0% in non-dense, 94.8% in dense breasts |
Ribli et al. 2018 [89] | Propose a CAD system that detects and classifies malignant or benign lesions | - (2949 cases) | Faster R-CNN | Yes, DM DREAM challenge (AUC 0.85) | 0.950 | - | 90% | - |
Chougrad et al. 2018 [90] | Deep learning CAD to aid radiologists in classifying mammographic mass lesions | 51.0% | Deep CNN | Yes, MIAS database | DDSM 0.98, INbreast 0.97, BCDR 0.96, MIAS 0.99 | - | - | DDSM 97.35%, INbreast 95.50%, BCDR 96.67%, MIAS 98.23% |
Rodriguez-Ruiz et al. 2019 [25] | Compare the stand-alone performance of an AI system with that of 101 radiologists | 24.6% (653 cancers in 2652 examinations) | Deep CNN (Transpara 1.4.0, ScreenPoint Medical) | - | AI 0.840 vs. radiologists' average 0.814 | AI sensitivity higher in 5 of 9 datasets at the radiologists' average specificity | - | - |
Rodriguez-Ruiz et al. 2019 [26] | Compare the performance of radiologists unaided vs. aided by an AI system | 20.1% (110 cancers in 546 examinations) | Deep CNN (Transpara 1.3.0, ScreenPoint Medical) | - | With AI 0.89 vs. without AI 0.87 | With AI 86% vs. without AI 83% (p = 0.046) | With AI 79% vs. without AI 77% (p = 0.06) | - |
McKinney et al. 2020 [24] | Evaluate the performance of AI-CAD on large, clinically representative datasets from the UK and USA | UK: 1.6%; USA: 22.2% | Deep learning AI model | Yes, tested on the USA test set | AI 0.740, outperforming the average radiologist (0.625, p = 0.0002) | UK: ↑2.7% vs. the first reader, non-inferior to the second reader; USA: ↑9.4% | UK: ↑1.2% vs. the first reader, non-inferior to the second reader; USA: ↑5.7% | - |
Kim et al. 2020 [23] | Evaluate whether an AI algorithm for mammography can improve the accuracy of breast cancer diagnosis | 50.0% (160 cancers in 320 examinations in the test set) | Deep CNN (Lunit INSIGHT for mammography) | - | AI 0.940, higher than the average of 14 radiologists without AI (0.810); radiologists improved with AI, 0.801 to 0.881 | AI 88.87%; radiologists improved with AI assistance, 75.27% to 84.78% | AI 81.87%; radiologists improved with AI assistance, 71.96% to 74.64% | - |
*With independent test set. AI = artificial intelligence, AUC = area under the receiver operating characteristic curve, BCDR = Breast Cancer Digital Repository, CAD = computer-aided detection/diagnosis, CNN = convolutional neural network, dANN = deep artificial neural network, DDSM = Digital Database for Screening Mammography, DM = digital mammography, MIAS = Mammographic Image Analysis Society, ROI = region-of-interest, UK = United Kingdom, USA = United States
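The metrics tabulated above follow their standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/total, and AUC as the probability that a randomly chosen cancer case receives a higher score than a randomly chosen non-cancer case. A minimal sketch in plain Python (toy data for illustration only, not drawn from any study in the table):

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels/predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)        # true-positive rate (cancer detected)
    specificity = tn / (tn + fp)        # true-negative rate (normal called normal)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: fraction of (cancer, non-cancer)
    pairs in which the cancer case scores higher (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 cancers, 3 normals
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_metrics(y_true, y_pred))                 # (0.667, 0.667, 0.667)
print(auc(y_true, [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]))       # 0.889
```

Note that AUC is computed from continuous scores and is threshold-free, whereas sensitivity, specificity, and accuracy depend on the operating point chosen, which is why the studies above report them separately.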