Table 1.
Detailed information on the 12 best papers (maturity score: high) found in our systematic meta-review of 463 papers
| Paper title | Primary task; modality | Key findings | Limitations | Patients (train/val/test) | No. of data sites | Labels | Architecture, dimensionality | Pretraining | Metrics | Results | Reproducibility (code/data open source) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Artificial intelligence-enabled rapid diagnosis of patients with COVID-19<sup>27</sup> | diagnosis, CT | system identified 68% of RT-PCR-positive patients with normal CT (asymptomatic). Clinical information is important for diagnosis, and the model is as sensitive as a senior radiologist | small data size, mild cases have few abnormal findings on chest CT, severity of pathological findings variable in CT | 534/92/279 | 18 | RT-PCR tests | Inception-ResNet-v2 (pretrained ImageNet), 3-layer MLP, 2D | transfer learning (pulmonary tuberculosis model) | AUROC, sensitivity, specificity | 0.92 AUC, 84.3% sens, 82.8% spec | code—yes, data—no |
| Artificial intelligence augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other origin at chest CT<sup>32</sup> | diagnosis, CT | AI assistance improved radiologists' performance in diagnosing COVID-19. AI alone outperformed radiologists on sensitivity and specificity | bias in radiologist annotation, heterogeneous data, bias in location of COVID (China) versus non-COVID pneumonia patients (USA) | 830/237/119 | 13 | RT-PCR tests, slice-level by radiologist | EfficientNet-B4, 2D | transfer learning (ImageNet) | AUROC, sensitivity, specificity, accuracy, AUPRC | 0.95 AUC, 95% sens, 96% spec, 96% acc, 0.9 AUPRC | code—yes, data—no |
| Automated assessment of CO-RADS and chest CT severity scores in patients with suspected COVID-19 using artificial intelligence<sup>33</sup> | diagnosis, CT | a freely accessible algorithm that assigns CO-RADS and CT severity scores to non-contrast CT scans of patients suspected of COVID-19 with high diagnostic performance | only one data center, high COVID prevalence, low prevalence for other diseases | 476/105 | 1 | RT-PCR, radiology report | lobe segmentation 3D UNet, CO-RADS scoring, 3D Inception Net | transfer learning (ImageNet and Kinetics) | AUC, sensitivity, specificity | internal: 0.95 AUC, external: 0.88 AUC | code—yes, data—no |
| Diagnosis of COVID-19 pneumonia using chest radiography: value of artificial intelligence<sup>35</sup> | diagnosis, X-ray | AI surpassed senior radiologists in COVID-19 differential diagnosis | high COVID prevalence, human ROC-AUC values were averaged from 3 readers | 5,208/2,193 | 5 hospitals, 30 clinics | RT-PCR, natural language processing on radiology report | CV19-Net | 3-stage transfer learning (ImageNet) | AUC, sensitivity, specificity | 0.92 AUC, 88.0% sens, 79.0% spec | code—yes, data—no |
| Development and evaluation of an artificial intelligence system for COVID-19 diagnosis<sup>23</sup> | diagnosis, multimodal | paired cohort of chest X-ray (CXR)/CT data: CT is superior to CXR for diagnosis by a wide margin. AI system outperforms all radiologists in 4-class classification | more data on more pneumonia subtypes needed, no clinical information used (could enable severity assessment) | 2,688/2,688/3,649 | 7 | – | lung seg 2D UNet, slice diagnosis 2D ResNet152 | transfer learning (pretrained ImageNet) | AUC, sensitivity, specificity | AUC 0.978 | code—yes, data—no |
| AI-assisted CT imaging analysis for COVID-19 screening: building and deploying a medical AI system<sup>31</sup> | diagnosis, CT | system was deployed in 4 weeks in 16 hospitals; AI outperformed radiologists in sensitivity by a wide margin | model fails when multiple lesions, metal or motion artifacts are present, system depends on fully annotated CT data | 1,136 | 5 | nucleic acid test, 6 annotators (lesions, lung) | 3D UNet++, ResNet50 | full training | sensitivity, specificity | sens 97.4%, spec 92.2% | code—no, data—no |
| Automated assessment and tracking of COVID-19 pulmonary disease severity on chest radiographs using convolutional Siamese neural networks<sup>32</sup> | severity, X-ray | continuous severity score used for longitudinal evaluation and risk stratification (admission CXR score predicts intubation and death, AUC = 0.8). Follow-up CXR score by AI is concordant with radiologist (r = 0.74) | patients only from urban areas in USA, no generalization to posteroanterior radiographs | 160,000/267 (images) | 2 | RT-PCR tests, 2–5 annotators, mRALE | Siamese DenseNet-121 | DenseNet-121 (ImageNet, fine-tuned on CheXpert) | PXS score, Pearson, AUC | r = 0.86, AUC = 0.8 | code—yes, data—partial (COVID CXR not released) |
| Development and clinical implementation of tailored image analysis tools for COVID-19 in the midst of the pandemic<sup>36</sup> | severity, CT | developed algorithms for quantification of pulmonary opacity in 10 days. Human-level performance with <200 CT scans. Model integrated into clinical workflow | data not carefully acquired: not a complete, consecutively acquired or fully random sample; empirical HU-thresholds for quantification | 146/66 | 1 | RT-PCR, 3 radiologist annotators | 3D UNet | full training | Dice coefficient, Hausdorff distance | Dice = 0.97 | code—yes, data—no |
| Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography<sup>27</sup> | prognosis, CT | AI with diagnostic performance comparable with senior radiologist. AI lifts junior radiologists to senior level. AI predicts drug efficacy and clinical prognosis. Identifies biomarkers for novel coronavirus pneumonia lesion. Data available | – | 3,777 | 4 | pixel-level annotation (5 radiologists) | lung-lesion seg DeepLabV3, diagnosis analysis 3D ResNet-18, gradient boosting decision tree | full training | Dice coefficient, AUC, accuracy, sensitivity, specificity | AUC 0.9797, acc 92.49%, sens 94.93%, spec 91.13% | code—yes, data—yes |
| Relational modeling for robust and efficient pulmonary lobe segmentation in CT scans<sup>30</sup> | segmentation, CT | leverages structured relationships with non-local module. Can enlarge receptive field of convolution features. Robustly segments COVID-19 infections | errors on border of segmentations, gross pathological changes not represented in data | 4,370/1,100 | 2 (pretraining: 21 centers) | radiology report | RTSU-Net (2-stage 3D UNet) | pretraining on COPDGene | intersection over union, average asymmetric surface distance | IOU 0.953, AASD 0.541 | code—yes, data—no/partial |
| Dual-branch combination network (DCN): toward accurate diagnosis and lesion segmentation of COVID-19 using CT images<sup>37</sup> | diagnosis, CT | DCN for combined segmentation and classification. Lesion attention (LA) module improves sensitivity to CT images with small lesions and facilitates early screening. Interpretability: LA provides meaningful attention maps | diagnosis depends on accuracy of segmentation module, no slice-level annotation | 1,202 | 10 | RT-PCR, pixel-level annotation by 6 radiologists | UNet, ResNet-50 | full training | accuracy, Dice, sensitivity, specificity, AUC, average accuracy | acc 92.87%, Dice 99.11%, sens 92.86%, spec 92.91%, AUC 0.977, average acc 92.89% | code—no, data—no |
| AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia<sup>29</sup> | prognosis, CT | 2D/3D COVID-19 quantification, roughly on par with radiologists. Facilitates prognosis/staging which outperforms radiologists. Rich set of model ensembles, uses clinical features | test dataset partly split by centers | 693 (321,000 slices)/513 for test | 8 | RT-PCR | AtlasNet, 2D | full training | Dice coefficient, correlation, accuracy | Dice 0.7, balanced accuracy 0.7 | code—no, data—yes (without images) |
For discussion, please see the text.