2022 Jul 20;9:429. doi: 10.1038/s41597-022-01498-w

Table 1.

An overview of existing public datasets for CXR interpretation.

| Dataset | Release year | # findings | # samples | Image-level labels | Local labels |
|---|---|---|---|---|---|
| JSRT [12] | 2000 | 1 | 247 (◃, *) | Available | Available |
| MC [14] | 2014 | 1 | 138 (◃, *) | Available | N/A |
| SH [14] | 2014 | 1 | 662 (◃, *) | Available | N/A |
| Indiana [13] | 2016 | 10 | 8,121 (◃, *) | Available | N/A |
| ChestX-ray8 [8] | 2017 | 8 | 108,948 (●) | Available | Available (†) |
| ChestX-ray14 [8] | 2017 | 14 | 112,120 (●) | Available | N/A |
| CheXpert [2] | 2019 | 14 | 224,316 (●) | Available | N/A |
| PadChest [9] | 2019 | 193 | 160,868 (●, *) | Available | N/A (††) |
| MIMIC-CXR [10] | 2019 | 14 | 377,110 (●) | Available | N/A |
| VinDr-CXR (ours) | 2020 | 28 | 18,000 (*) | Available | Available |

(●) Labeled by an NLP algorithm.
(*) Labeled by radiologists.
(◃) Moderate-size datasets that are not suitable for training deep learning models.
(†) A portion of the dataset (983 images) is provided with hand-labeled bounding boxes.
(††) 27% of the dataset was manually annotated with encoded anatomical regions of the findings.