Table 1.
Dataset | Release year | # findings | # samples | Image-level labels | Local labels |
---|---|---|---|---|---|
JSRT [12] | 2000 | 1 | 247(◃, *) | Available | Available |
MC [14] | 2014 | 1 | 138(◃, *) | Available | N/A |
SH [14] | 2014 | 1 | 662(◃, *) | Available | N/A |
Indiana [13] | 2016 | 10 | 8,121(◃, *) | Available | N/A |
ChestX-ray8 [8] | 2017 | 8 | 108,948(●) | Available | Available† |
ChestX-ray14 [8] | 2017 | 14 | 112,120(●) | Available | N/A |
CheXpert [2] | 2019 | 14 | 224,316(●) | Available | N/A |
Padchest [9] | 2019 | 193 | 160,868(●, *) | Available | N/A†† |
MIMIC-CXR [10] | 2019 | 14 | 377,110(●) | Available | N/A |
VinDr-CXR (ours) | 2020 | 28 | 18,000(*) | Available | Available |
(●) Labeled by an NLP algorithm. (*) Labeled by radiologists. (◃) Moderate-size datasets that are not suitable for training deep learning models. (†) A portion of the dataset (983 images) is provided with hand-labeled bounding boxes. (††) 27% of the dataset was manually annotated with encoded anatomical regions of the findings.