| Study | Dataset | Objective | Model | Performance | Key findings | Limitations |
| --- | --- | --- | --- | --- | --- | --- |
| Annarumma et al., 2019 [17] | 3229 institutional adult chest radiographs | Developed and tested a deep CNN-based AI model for automated triaging of adult chest radiographs by the urgency of imaging appearances | Ensemble of two deep CNNs | Sensitivity of 71%, specificity of 95%, PPV of 73%, and NPV of 94% | The AI model interpreted and prioritized chest radiographs based on critical or urgent findings | False negatives could cause urgent cases to be misread as non-urgent, delaying timely clinical attention; the same radiology label could correspond to different levels of urgency, and this spectrum of urgency was not addressed |
| Dunnmon et al., 2019 [18] | 533 frontal chest radiographs | Assessed the ability of CNNs to perform automated binary classification of chest radiographs | Variety of classification CNNs | AUC of 0.96 | Demonstrated automated classification of chest radiographs as normal or abnormal | Predicted only the presence or absence of abnormality in the thoracic region; did not provide explainability |
| Nguyen et al., 2022 [19] | 6285 frontal chest radiographs | Deployed and validated an AI-based system for detecting abnormalities on chest X-ray scans in real-world clinical settings | EfficientNet | F1 score of 0.653, accuracy of 79.6%, sensitivity of 68.6%, and specificity of 83.9% | Examined AI performance on a clinical dataset different from the training dataset | Classified radiographs only as normal or abnormal owing to a lack of detailed ground truth; did not assess the effect of AI on radiologists' diagnostic performance |
| Saleh et al. [20] | 18,265 frontal-view chest X-ray images | Developed CNN-based DL models and compared their feasibility and performance in classifying 14 chest pathologies found on chest X-rays | Variety of classification CNNs with DC-GANs | Accuracy of 67% and 62% for the best-performing model with and without augmentation, respectively | Used GAN-based data augmentation to address the scarcity of data for some pathologies | A different test set was used for the AI model with augmentation; test sets included images from the NIH database only |
| Hwang et al. [21] | 1089 frontal chest X-ray images | Developed a deep learning–based algorithm (DLAD) that classified chest radiographs as normal or abnormal for various thoracic diseases | Variety of classification CNNs | AUC of 0.979, sensitivity of 0.979, and specificity of 0.880 | The AI model outperformed physicians, including thoracic radiologists; radiologists aided by DLAD performed better than radiologists without it | Validation was performed on experimentally designed datasets containing chest radiographs with only one target disease; DLAD covered only four major thoracic disease categories |
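The studies above report overlapping sets of screening metrics (sensitivity, specificity, PPV, NPV, accuracy, F1), all of which derive from the same four confusion-matrix counts. As a minimal sketch of how these figures relate, the helper below computes them from hypothetical counts; the function name and the example counts are illustrative and are not taken from any of the cited studies.

```python
# Illustrative computation of the binary screening metrics reported in the
# table above, from confusion-matrix counts (TP, FP, TN, FN).
# The counts in the example are hypothetical, not from any cited study.

def triage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall / true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    ppv = tp / (tp + fp)                         # positive predictive value (precision)
    npv = tn / (tn + fn)                         # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }

if __name__ == "__main__":
    # Hypothetical cohort: 100 truly urgent and 400 truly non-urgent radiographs
    metrics = triage_metrics(tp=71, fp=20, tn=380, fn=29)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of abnormality in the evaluation set, which is one reason headline figures are hard to compare across studies with different case mixes.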