2022 Jul 20;9:429. doi: 10.1038/s41597-022-01498-w

Table 1.

An overview of existing public datasets for CXR interpretation.

Dataset	Release year	No. of findings	No. of samples	Image-level labels	Local labels
JSRT¹²	2000	1	247 (◃, *)	Available	Available
MC¹⁴	2014	1	138 (◃, *)	Available	N/A
SH¹⁴	2014	1	662 (◃, *)	Available	N/A
Indiana¹³	2016	10	8,121 (◃, *)	Available	N/A
ChestX-ray8⁸	2017	8	108,948 (●)	Available	Available (†)
ChestX-ray14⁸	2017	14	112,120 (●)	Available	N/A
CheXpert²	2019	14	224,316 (●)	Available	N/A
PadChest⁹	2019	193	160,868 (●, *)	Available	N/A (††)
MIMIC-CXR¹⁰	2019	14	377,110 (●)	Available	N/A
VinDr-CXR (ours)	2020	28	18,000 (*)	Available	Available

(●) Labeled by an NLP algorithm.
(*) Labeled by radiologists.
(◃) Moderate-size datasets that are not suitable for training deep learning models.
(†) A portion of the dataset (983 images) is provided with hand-labeled bounding boxes.
(††) 27% of the dataset was manually annotated with encoded anatomical regions of the findings.