Table 2.
Dataset characteristics.
| Category | Characteristic | Training set | Test set |
|---|---|---|---|
| Collection statistics | Years | 2018 to 2020 | 2018 to 2020 |
| | Number of scans | 15,000 | 3,000 |
| | Number of human annotators per scan | 3 | 5 |
| | Image size (pixels × pixels, median) | 2788 × 2446 | 2748 × 2394 |
| | Age (years, median)* | 43.77 | 31.80 |
| | Male (%)* | 52.21 | 55.90 |
| | Female (%)* | 47.79 | 44.10 |
| | Data size (GB) | 161 | 31.3 |
| Local labels | 1. Aortic enlargement (%) | 2348 (15.65%) | 220 (7.33%) |
| | 2. Atelectasis (%) | 62 (0.41%) | 86 (2.87%) |
| | 3. Cardiomegaly (%) | 1817 (12.11%) | 309 (10.30%) |
| | 4. Calcification (%) | 177 (1.18%) | 194 (6.47%) |
| | 5. Clavicle fracture (%) | 1 (0.01%) | 2 (0.07%) |
| | 6. Consolidation (%) | 121 (0.81%) | 96 (3.20%) |
| | 7. Edema (%) | 1 (0.01%) | 0 (0.00%) |
| | 8. Emphysema (%) | 14 (0.09%) | 3 (0.10%) |
| | 9. Enlarged PA (%) | 21 (0.14%) | 8 (0.27%) |
| | 10. Interstitial lung disease (ILD) (%) | 152 (1.01%) | 221 (7.37%) |
| | 11. Infiltration (%) | 245 (1.63%) | 58 (1.93%) |
| | 12. Lung cavity (%) | 21 (0.14%) | 9 (0.30%) |
| | 13. Lung cyst (%) | 4 (0.03%) | 2 (0.07%) |
| | 14. Lung opacity (%) | 547 (3.65%) | 84 (2.80%) |
| | 15. Mediastinal shift (%) | 85 (0.57%) | 20 (0.67%) |
| | 16. Nodule/Mass (%) | 410 (2.73%) | 176 (5.87%) |
| | 17. Pulmonary fibrosis (%) | 1017 (6.78%) | 217 (7.23%) |
| | 18. Pneumothorax (%) | 58 (0.39%) | 18 (0.60%) |
| | 19. Pleural thickening (%) | 882 (5.88%) | 169 (5.63%) |
| | 20. Pleural effusion (%) | 634 (4.23%) | 111 (3.70%) |
| | 21. Rib fracture (%) | 41 (0.27%) | 11 (0.37%) |
| | 22. Other lesion (%) | 363 (2.42%) | 94 (3.13%) |
| Global labels | 23. Lung tumor (%) | 132 (0.88%) | 80 (2.67%) |
| | 24. Pneumonia (%) | 469 (3.13%) | 246 (8.20%) |
| | 25. Tuberculosis (%) | 479 (3.19%) | 164 (5.47%) |
| | 26. Other diseases (%) | 4002 (26.68%) | 657 (21.90%) |
| | 27. COPD (%) | 7 (0.05%) | 2 (0.07%) |
| | 28. No finding (%) | 10606 (70.71%) | 2051 (68.37%) |
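Note: The numbers of positive labels are based on the majority vote of the participating radiologists; each percentage is the count divided by the number of scans in that split (15,000 for training, 3,000 for test). (*) These statistics were computed only over the CXR scans for which the patient's sex and age were known.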
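The label counts in Table 2 combine two simple steps: per-scan aggregation of the annotators' votes, then prevalence over the split. The sketch below illustrates both under stated assumptions. It assumes a finding counts as positive for a scan when strictly more than half of that scan's annotators marked it (with the odd panel sizes of 3 and 5 used here, ties cannot occur), and the per-scan vote lists are hypothetical illustration data, not taken from the dataset. The helper names `majority_vote` and `prevalence` are likewise hypothetical.

```python
from typing import Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Assumed rule: positive when strictly more than half of the annotators marked the finding."""
    return sum(votes) * 2 > len(votes)


def prevalence(n_positive: int, n_scans: int) -> str:
    """Format a count as 'count (pct%)' with two decimals, matching the layout of Table 2."""
    return f"{n_positive} ({100 * n_positive / n_scans:.2f}%)"


# Hypothetical votes for one finding on three training scans (3 annotators each).
train_votes = [
    [True, True, False],   # 2 of 3 -> positive
    [False, False, True],  # 1 of 3 -> negative
    [True, True, True],    # 3 of 3 -> positive
]
n_pos = sum(majority_vote(v) for v in train_votes)
print(n_pos)  # 2

# Reproducing two Table 2 entries for aortic enlargement from the reported counts.
print(prevalence(2348, 15000))  # 2348 (15.65%)
print(prevalence(220, 3000))    # 220 (7.33%)
```

Using a strict majority means that, for an even-sized panel, a split vote would be treated as negative; this choice is itself an assumption and does not affect the 3- and 5-annotator settings reported above.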