2024 Feb 8;8:10. doi: 10.1186/s41747-023-00411-3

Table 1.

Characteristics of the datasets utilized in this study

Characteristic | VinDr-CXR | ChestX-ray14 | CheXpert | MIMIC-CXR | UKA-CXR | PadChest
Number of radiographs (total) | 18,000 | 112,120 | 157,878 | 213,921 | 193,361 | 110,525
Number of radiographs (training set) | 15,000 | 86,524 | 128,356 | 170,153 | 153,537 | 88,480
Number of radiographs (test set) | 3,000 | 25,596 | 29,320 | 43,768 | 39,824 | 22,045
Number of patients | N/A | 30,805 | 65,240 | 65,379 | 54,176 | 67,213
Patient age (years), median | 42 | 49 | 61 | N/A | 68 | 63
Patient age (years), mean ± standard deviation | 54 ± 18 | 47 ± 17 | 60 ± 18 | N/A | 66 ± 15 | 59 ± 20
Patient age (years), range (minimum, maximum) | (2, 91) | (1, 96) | (18, 91) | N/A | (1, 111) | (1, 105)
Patient's sex, females/males [%], training set | 47.8/52.2 | 42.4/57.6 | 41.4/58.6 | N/A | 34.4/65.6 | 50.0/50.0
Patient's sex, females/males [%], test set | 44.1/55.9 | 41.9/58.1 | 39.0/61.0 | N/A | 36.3/63.7 | 48.2/51.8
Projections [%], anteroposterior | 0.0 | 40.0 | 84.5 | 58.2 | 100.0 | 17.1
Projections [%], posteroanterior | 100.0 | 60.0 | 15.5 | 41.8 | 0.0 | 82.9
Location | Hanoi, Vietnam | Maryland, USA | California, USA | Massachusetts, USA | Aachen, Germany | Alicante, Spain
Number of contributing hospitals | 2 | 1 | 1 | 1 | 1 | 1
Labeling method | Manual | NLP (ChestX-ray14 labeler) | NLP (CheXpert labeler) | NLP (CheXpert labeler) | Manual | Manual & NLP (PadChest labeler)
Original labeling system | Binary | Binary | Certainty | Certainty | Severity | Binary
Accessibility of the dataset for research | Public | Public | Public | Public | Internal | Public

The table shows the statistics of the datasets used: VinDr-CXR [21], ChestX-ray14 [22], CheXpert [23], MIMIC-CXR [24], UKA-CXR [3, 25–28], and PadChest [29]. The values refer to frontal chest radiographs only; percentages are given relative to the total number of radiographs. A "Binary" labeling system records whether a finding is present or not. "Severity" refers to classification of the severity of a finding. "Certainty" indicates that a certainty level was assigned to each finding during labeling, either by experienced radiologists (manual) or by an automatic natural language processing (NLP) labeler. Note that some datasets may include multiple radiographs per patient.

N/A: not available
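
The radiograph counts in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the total, training, and test counts from the table, then computes the test-set share of each dataset and the number of radiographs that fall into neither split. The names `COUNTS` and `split_summary` are hypothetical and do not come from the study; the table itself does not state how any remainder was used.

```python
# Radiograph counts copied from Table 1 (frontal radiographs only).
COUNTS = {
    "VinDr-CXR": {"total": 18_000, "train": 15_000, "test": 3_000},
    "ChestX-ray14": {"total": 112_120, "train": 86_524, "test": 25_596},
    "CheXpert": {"total": 157_878, "train": 128_356, "test": 29_320},
    "MIMIC-CXR": {"total": 213_921, "train": 170_153, "test": 43_768},
    "UKA-CXR": {"total": 193_361, "train": 153_537, "test": 39_824},
    "PadChest": {"total": 110_525, "train": 88_480, "test": 22_045},
}


def split_summary(counts):
    """Per dataset, return the test share of the total (percent, one decimal)
    and the number of radiographs in neither the training nor the test set."""
    summary = {}
    for name, c in counts.items():
        test_pct = round(100 * c["test"] / c["total"], 1)
        unassigned = c["total"] - c["train"] - c["test"]
        summary[name] = {"test_pct": test_pct, "unassigned": unassigned}
    return summary


if __name__ == "__main__":
    for name, row in split_summary(COUNTS).items():
        print(f"{name}: test {row['test_pct']}% of total, "
              f"unassigned {row['unassigned']}")
```

A nonzero `unassigned` value flags a dataset whose training and test counts do not add up to the reported total, which is worth checking against the dataset's own documentation.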