2023 Dec 19;13:22576. doi: 10.1038/s41598-023-49956-8

Table 2.

On-domain performance of the convolutional neural network for individual imaging findings.

Dataset	Training Strategy	Cardiomegaly	Pleural Effusion	Pneumonia	Atelectasis	Consolidation	Pneumothorax	No Abnormality	Average
VinDr-CXR	Local	92.2 ± 0.7	93.7 ± 1.4	88.3 ± 1.2	78.4 ± 3.13	88.1 ± 1.9	93.3 ± 2.3	87.08 ± 0.7	88.7 ± 5.2
	Collaborative	95.3 ± 0.5	98.6 ± 0.4	89.9 ± 1.0	91.2 ± 1.4	94.7 ± 1.0	98.5 ± 0.7	92.9 ± 0.5	94.4 ± 3.2
	P value	0.001	0.001	0.896	0.001	0.001	0.003	0.001	0.001
ChestX-ray14	Local	87.5 ± 0.5	81.5 ± 0.3	68.8 ± 1.1	74.7 ± 0.4	72.8 ± 0.5	84.4 ± 0.4	72.2 ± 0.3	77.4 ± 6.6
	Collaborative	89.4 ± 0.5	82.6 ± 0.3	73.3 ± 1.1	77.1 ± 0.4	74.7 ± 0.5	87.5 ± 0.3	73.1 ± 0.3	79.7 ± 6.4
	P value	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001
CheXpert	Local	86.7 ± 0.3	87.3 ± 0.2	76.4 ± 0.8	68.4 ± 0.4	74.4 ± 0.5	85.5 ± 0.3	87.2 ± 0.3	80.8 ± 7.1
	Collaborative	86.7 ± 0.3	88.1 ± 0.2	73.8 ± 0.9	68.8 ± 0.4	74.6 ± 0.5	86.3 ± 0.3	87.7 ± 0.3	80.8 ± 7.5
	P value	0.443	0.001	0.001	0.864	0.681	0.001	0.001	0.509
MIMIC-CXR	Local	80.9 ± 0.2	90.7 ± 0.2	73.9 ± 0.5	81.7 ± 0.2	80.3 ± 0.5	86.5 ± 0.4	85.4 ± 0.2	82.8 ± 5.0
	Collaborative	78.8 ± 0.2	90.9 ± 0.1	74.1 ± 0.5	81.2 ± 0.2	82.2 ± 0.4	86.5 ± 0.5	85.0 ± 0.2	82.7 ± 5.1
	P value	0.001	0.045	0.768	0.001	0.001	0.442	0.001	0.088
PadChest	Local	92.2 ± 0.3	95.5 ± 0.3	84.8 ± 0.7	84.4 ± 0.6	89.0 ± 0.9	86.8 ± 2.0	85.8 ± 0.3	88.3 ± 3.9
	Collaborative	92.5 ± 0.2	95.9 ± 0.3	85.1 ± 0.6	84.3 ± 0.6	90.0 ± 0.8	92.5 ± 1.5	85.0 ± 0.3	89.3 ± 4.3
	P value	0.017	0.003	0.806	0.371	0.922	0.001	0.001	0.001

Performance is reported as the area under the receiver operating characteristic curve (AUROC) for each dataset, training strategy (local or collaborative), and imaging finding. See Table 1 for details on dataset characteristics. Differences between locally and collaboratively trained models were tested for statistical significance using bootstrapping; the resulting p values are given in the "P value" rows.
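
The bootstrap comparison described in the note can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a paired design (both models scored on the same test cases), a two-sided p value derived from the sign of the resampled AUROC differences, and 1000 resamples, none of which are stated in the table note. The function names `auroc` and `paired_bootstrap_auroc` are hypothetical.

```python
import numpy as np


def auroc(y_true, y_score):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined when one class is missing
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def paired_bootstrap_auroc(y_true, score_a, score_b, n_boot=1000, seed=0):
    """Compare two models' AUROC on the same test set by paired bootstrap.

    Returns (mean AUROC difference b - a, two-sided p value).
    Assumption: the p value is twice the smaller fraction of resampled
    differences lying on either side of zero, capped at 1.
    """
    y_true = np.asarray(y_true)
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    if np.unique(y_true).size < 2:
        raise ValueError("y_true must contain both classes")

    rng = np.random.default_rng(seed)
    n = y_true.shape[0]
    diffs = []
    while len(diffs) < n_boot:
        # Resample test cases with replacement; use the same indices for both models.
        idx = rng.integers(0, n, size=n)
        a = auroc(y_true[idx], score_a[idx])
        if np.isnan(a):
            continue  # resample lacked one class; draw again
        b = auroc(y_true[idx], score_b[idx])
        diffs.append(b - a)

    diffs = np.asarray(diffs)
    p = 2.0 * min(np.mean(diffs <= 0), np.mean(diffs >= 0))
    return float(diffs.mean()), float(min(p, 1.0))
```

Calling `paired_bootstrap_auroc(labels, local_scores, collaborative_scores)` once per imaging finding would give one difference and one p value per column. Note that with 1000 resamples the estimate can come out as exactly 0, so in practice a lower bound is usually reported instead.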