2023 Dec 19;13:22576. doi: 10.1038/s41598-023-49956-8

Table 1.

Dataset characteristics.

| Characteristic | VinDr-CXR | ChestX-ray14 | CheXpert | MIMIC-CXR | PadChest |
| --- | --- | --- | --- | --- | --- |
| Number of radiographs, total (training set/test set) [n] | 18,000 (15,000/3,000) | 112,120 (86,524/25,596) | 157,878 (128,356/29,320) | 213,921 (170,153/43,768) | 110,525 (88,480/22,045) |
| Number of patients (total) [n] | N/A | 30,805 | 65,240 | 65,379 | 67,213 |
| Patient age [years] | | | | | |
| Median | 42 | 49 | 61 | N/A | 63 |
| Mean ± standard deviation | 54 ± 18 | 47 ± 17 | 60 ± 18 | N/A | 59 ± 20 |
| Range (minimum, maximum) | (2, 91) | (1, 96) | (18, 91) | N/A | (1, 105) |
| Patient sex, female/male [%] | | | | | |
| Training set | 47.8/52.2 | 42.4/57.6 | 41.4/58.6 | N/A | 50.0/50.0 |
| Test set | 44.1/55.9 | 41.9/58.1 | 39.0/61.0 | N/A | 48.2/51.8 |
| Projections [%] | | | | | |
| Anteroposterior | 0.0 | 40.0 | 84.5 | 58.2 | 17.1 |
| Posteroanterior | 100.0 | 60.0 | 15.5 | 41.8 | 82.9 |
| Country | Vietnam | USA | USA | USA | Spain |
| Contributing hospitals [n] | 2 | 1 | 1 | 1 | 1 |
| Clinical setting | N/A | N/A | Inpatient and outpatient | Intensive care unit | N/A |
| Radiography systems [n] | ≥ 8 | N/A | N/A | N/A | N/A |
| Labeling method | Manual | Automatic (NLP) | Automatic (NLP) | Automatic (NLP) | Partially manual, partially automatic (NLP) |
| Radiographs with cardiomegaly [%] | 11.8 | 2.5 | 12.6 | 19.7 | 8.9 |
| Radiographs with pleural effusion [%] | 4.1 | 11.9 | 41.3 | 22.6 | 6.3 |
| Radiographs with pneumonia [%] | 4.0 | 1.3 | 2.5 | 6.5 | 4.7 |
| Radiographs with atelectasis [%] | 0.8 | 10.3 | 16.7 | 19.9 | 5.6 |
| Radiographs with consolidation [%] | 1.2 | 4.2 | 6.0 | 4.0 | 1.5 |
| Radiographs with pneumothorax [%] | 0.4 | 4.7 | 10.3 | 4.6 | 0.4 |
| Radiographs without abnormality [%] | 70.3 | 53.8 | 10.8 | 37.7 | 32.9 |

Indicated are the included datasets, i.e., VinDr-CXR [28], ChestX-ray14 [29], CheXpert [30], MIMIC-CXR [31], and PadChest [32], and their characteristics. Only frontal chest radiographs (both anteroposterior and posteroanterior projections) were used for this study, while lateral projections were disregarded. Multiple radiographs may have been included per patient. N/A: not available; NLP: natural language processing.
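
The training/test counts in the first row of the table lend themselves to a quick arithmetic check. The following is a minimal sketch, not part of the original study: the counts are transcribed by hand from Table 1, and the names `SPLITS`, `split_summary`, and `summarize` are hypothetical helpers introduced only for illustration. For each dataset it reports the difference between the stated total and the sum of the two splits, together with the share of radiographs held out for testing.

```python
# Counts transcribed from the first row of Table 1: (total, training set, test set).
# Assumption: the transcription is accurate; the helper names are illustrative only.
SPLITS = {
    "VinDr-CXR": (18_000, 15_000, 3_000),
    "ChestX-ray14": (112_120, 86_524, 25_596),
    "CheXpert": (157_878, 128_356, 29_320),
    "MIMIC-CXR": (213_921, 170_153, 43_768),
    "PadChest": (110_525, 88_480, 22_045),
}


def split_summary(total, train, test):
    """Return (total minus train + test, test share of train + test in percent)."""
    difference = total - (train + test)
    test_share = round(100.0 * test / (train + test), 1)
    return difference, test_share


def summarize(splits):
    """Apply split_summary to every dataset in the mapping."""
    return {name: split_summary(*counts) for name, counts in splits.items()}


if __name__ == "__main__":
    for name, (difference, test_share) in summarize(SPLITS).items():
        print(f"{name}: total - (train + test) = {difference}, "
              f"test share = {test_share}%")
```

A non-zero difference marks a row worth comparing against the corresponding dataset publication before the totals are reused elsewhere.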