2022 Jul 20;9:429. doi: 10.1038/s41597-022-01498-w

Table 2.

Dataset characteristics.

| Group | Characteristics | Training set | Test set |
|---|---|---|---|
| Collection statistics | Years | 2018 to 2020 | 2018 to 2020 |
| | Number of scans | 15,000 | 3,000 |
| | Number of human annotators per scan | 3 | 5 |
| | Image size (pixel × pixel, median) | 2788 × 2446 | 2748 × 2394 |
| | Age (years, median)* | 43.77 | 31.80 |
| | Male (%)* | 52.21 | 55.90 |
| | Female (%)* | 47.79 | 44.10 |
| | Data size (GB) | 161 | 31.3 |
| Local labels | 1. Aortic enlargement (%) | 2348 (15.65%) | 220 (7.33%) |
| | 2. Atelectasis (%) | 62 (0.41%) | 86 (2.87%) |
| | 3. Cardiomegaly (%) | 1817 (12.11%) | 309 (10.30%) |
| | 4. Calcification (%) | 177 (1.18%) | 194 (6.47%) |
| | 5. Clavicle fracture (%) | 1 (0.01%) | 2 (0.07%) |
| | 6. Consolidation (%) | 121 (0.81%) | 96 (3.20%) |
| | 7. Edema (%) | 1 (0.01%) | 0 (0.00%) |
| | 8. Emphysema (%) | 14 (0.09%) | 3 (0.10%) |
| | 9. Enlarged PA (%) | 21 (0.14%) | 8 (0.27%) |
| | 10. Interstitial lung disease (ILD) (%) | 152 (1.01%) | 221 (7.37%) |
| | 11. Infiltration (%) | 245 (1.63%) | 58 (1.93%) |
| | 12. Lung cavity (%) | 21 (0.14%) | 9 (0.30%) |
| | 13. Lung cyst (%) | 4 (0.03%) | 2 (0.07%) |
| | 14. Lung opacity (%) | 547 (3.65%) | 84 (2.80%) |
| | 15. Mediastinal shift (%) | 85 (0.57%) | 20 (0.67%) |
| | 16. Nodule/Mass (%) | 410 (2.73%) | 176 (5.87%) |
| | 17. Pulmonary fibrosis (%) | 1017 (6.78%) | 217 (7.23%) |
| | 18. Pneumothorax (%) | 58 (0.39%) | 18 (0.60%) |
| | 19. Pleural thickening (%) | 882 (5.88%) | 169 (5.63%) |
| | 20. Pleural effusion (%) | 634 (4.23%) | 111 (3.70%) |
| | 21. Rib fracture (%) | 41 (0.27%) | 11 (0.37%) |
| | 22. Other lesion (%) | 363 (2.42%) | 94 (3.13%) |
| Global labels | 23. Lung tumor (%) | 132 (0.88%) | 80 (2.67%) |
| | 24. Pneumonia (%) | 469 (3.13%) | 246 (8.20%) |
| | 25. Tuberculosis (%) | 479 (3.19%) | 164 (5.47%) |
| | 26. Other diseases (%) | 4002 (26.68%) | 657 (21.90%) |
| | 27. COPD (%) | 7 (0.05%) | 2 (0.07%) |
| | 28. No finding (%) | 10606 (70.71%) | 2051 (68.37%) |

Note: The numbers of positive labels are reported based on the majority vote of the participating radiologists. (*) These values were calculated only from the CXR scans for which the patient's sex and age were known.
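The majority-vote rule in the note, and the percentages in the table, can be illustrated with a minimal sketch. The function names `majority_vote` and `prevalence`, the set-based input format, and the strict-majority threshold (at least 2 of 3 annotators, or 3 of 5) are assumptions made for illustration; the source does not specify how ties or the aggregation were actually implemented.

```python
from collections import Counter


def majority_vote(annotations, n_annotators):
    """Return the labels marked positive by a strict majority of annotators.

    annotations: one collection of label names per annotator, listing the
    findings that annotator marked as present on a single scan
    (hypothetical input format).
    """
    # Count each label at most once per annotator.
    counts = Counter(label for labels in annotations for label in set(labels))
    # Assumed rule: strict majority, e.g. 2 of 3 or 3 of 5.
    threshold = n_annotators // 2 + 1
    return {label for label, count in counts.items() if count >= threshold}


def prevalence(positive_count, total_scans):
    """Percentage of scans carrying a label, rounded to two decimals."""
    return round(100.0 * positive_count / total_scans, 2)


if __name__ == "__main__":
    # Three hypothetical annotators reading one training scan.
    scan = [
        {"Cardiomegaly", "Aortic enlargement"},
        {"Cardiomegaly"},
        {"Cardiomegaly", "Aortic enlargement", "Pleural effusion"},
    ]
    print(sorted(majority_vote(scan, n_annotators=3)))
    # Aortic enlargement in the training set: 2348 of 15,000 scans.
    print(prevalence(2348, 15000))
```

Under these assumptions, a finding reported by only one of three annotators is dropped, and the percentage columns are simply the positive count divided by the number of scans in that split (15,000 for training, 3,000 for test).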