Table 1.
Characteristic | VinDr-CXR | ChestX-ray14 | CheXpert | MIMIC-CXR | PadChest |
---|---|---|---|---|---|
Number of radiographs total (training set/test set) [n] | 18,000 (15,000/3,000) | 112,120 (86,524/25,596) | 157,878 (128,356/29,320) | 213,921 (170,153/43,768) | 110,525 (88,480/22,045) |
Number of patients (Total) [n] | N/A | 30,805 | 65,240 | 65,379 | 67,213 |
Patient age [years] | | | | | |
Median | 42 | 49 | 61 | N/A | 63 |
Mean ± Standard deviation | 54 ± 18 | 47 ± 17 | 60 ± 18 | N/A | 59 ± 20 |
Range (minimum, maximum) | (2, 91) | (1, 96) | (18, 91) | N/A | (1, 105) |
Patient sex female/male [%] | | | | | |
Training set | 47.8/52.2 | 42.4/57.6 | 41.4/58.6 | N/A | 50.0/50.0 |
Test set | 44.1/55.9 | 41.9/58.1 | 39.0/61.0 | N/A | 48.2/51.8 |
Projections [%] | | | | | |
Anteroposterior | 0.0 | 40.0 | 84.5 | 58.2 | 17.1 |
Posteroanterior | 100.0 | 60.0 | 15.5 | 41.8 | 82.9 |
Country | Vietnam | USA | USA | USA | Spain |
Contributing hospitals [n] | 2 | 1 | 1 | 1 | 1 |
Clinical setting | N/A | N/A | Inpatient and outpatient | Intensive care unit | N/A |
Radiography systems [n] | ≥ 8 | N/A | N/A | N/A | N/A |
Labeling method | Manual | Automatic (NLP) | Automatic (NLP) | Automatic (NLP) | Partially manual, partially automatic (NLP) |
Radiographs with cardiomegaly [%] | 11.8 | 2.5 | 12.6 | 19.7 | 8.9 |
Radiographs with pleural effusion [%] | 4.1 | 11.9 | 41.3 | 22.6 | 6.3 |
Radiographs with pneumonia [%] | 4.0 | 1.3 | 2.5 | 6.5 | 4.7 |
Radiographs with atelectasis [%] | 0.8 | 10.3 | 16.7 | 19.9 | 5.6 |
Radiographs with consolidation [%] | 1.2 | 4.2 | 6.0 | 4.0 | 1.5 |
Radiographs with pneumothorax [%] | 0.4 | 4.7 | 10.3 | 4.6 | 0.4 |
Radiographs without abnormality [%] | 70.3 | 53.8 | 10.8 | 37.7 | 32.9 |
Indicated are the included datasets, i.e., VinDr-CXR [28], ChestX-ray14 [29], CheXpert [30], MIMIC-CXR [31], and PadChest [32], and their characteristics. Only frontal chest radiographs (both anteroposterior and posteroanterior projections) were used for this study, while lateral projections were disregarded. Multiple radiographs may have been included per patient. N/A: not available; NLP: natural language processing.
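
The selection rule in the caption (keep frontal projections, drop lateral ones, allow several radiographs per patient) can be illustrated with a minimal sketch. The record layout, the `view` field, and the codes `AP`, `PA`, and `LATERAL` are assumptions made for illustration; each dataset ships its own metadata schema, and this is not the preprocessing code used in the study.

```python
from collections import Counter

# Assumed view codes for anteroposterior and posteroanterior projections
FRONTAL_VIEWS = {"AP", "PA"}


def select_frontal(records):
    """Keep only records whose (hypothetical) 'view' field is a frontal projection."""
    return [r for r in records if r.get("view", "").upper() in FRONTAL_VIEWS]


def projection_shares(records):
    """Return the percentage of each projection among the given records."""
    counts = Counter(r["view"].upper() for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {view: round(100.0 * n / total, 1) for view, n in counts.items()}


# Toy metadata, not taken from any of the datasets in Table 1
records = [
    {"patient_id": "p1", "view": "PA"},
    {"patient_id": "p1", "view": "LATERAL"},
    {"patient_id": "p2", "view": "AP"},
    {"patient_id": "p3", "view": "PA"},
    {"patient_id": "p3", "view": "PA"},
]

frontal = select_frontal(records)    # lateral image of p1 is dropped
shares = projection_shares(frontal)  # per-projection percentages, as in the "Projections [%]" rows
```

Filtering is done per radiograph rather than per patient, so a patient with several frontal images (such as `p3` in the toy data) contributes all of them, in line with the caption's note on multiple radiographs per patient.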