Table 1.
An overview of existing public datasets for CXR interpretation.
| Dataset | Release year | # findings | # samples | Image-level labels | Local labels |
|---|---|---|---|---|---|
| JSRT [12] | 2000 | 1 | 247 (◃, *) | Available | Available |
| MC [14] | 2014 | 1 | 138 (◃, *) | Available | N/A |
| SH [14] | 2014 | 1 | 662 (◃, *) | Available | N/A |
| Indiana [13] | 2016 | 10 | 8,121 (◃, *) | Available | N/A |
| ChestX-ray8 [8] | 2017 | 8 | 108,948 (●) | Available | Available† |
| ChestX-ray14 [8] | 2017 | 14 | 112,120 (●) | Available | N/A |
| CheXpert [2] | 2019 | 14 | 224,316 (●) | Available | N/A |
| PadChest [9] | 2019 | 193 | 160,868 (●, *) | Available | N/A†† |
| MIMIC-CXR [10] | 2019 | 14 | 377,110 (●) | Available | N/A |
| VinDr-CXR (ours) | 2020 | 28 | 18,000 (*) | Available | Available |
(●) Labeled by an NLP algorithm. (*) Labeled by radiologists. (◃) Moderate-size datasets that are not suitable for training deep learning models. (†) A portion of the dataset (983 images) is provided with hand-labeled bounding boxes. (††) 27% of the dataset was manually annotated with encoded anatomical regions of the findings.