[Preprint]. 2020 May 8:2020.05.04.20090803. [Version 1] doi: 10.1101/2020.05.04.20090803

Table 5.

Performance metrics achieved with different combinations of augmented training data when classifying the baseline test data into bacterial and viral pneumonia categories. Bold values indicate superior performance.

Dataset	Accuracy	AUC	Sensitivity	Specificity	Precision	F-score	MCC
Baseline	0.9308	0.9565	0.9711	0.8649	0.9216	0.9457	0.8527

Data augmentation with weakly labeled images

Baseline + Montreal	0.9179	0.9479	0.9794	0.8176	0.8978	0.9368	0.8270
Baseline + Twitter	0.9308	0.9577	0.9835	0.8446	0.9119	0.9464	0.8541
Baseline + NIH	0.9179	0.9600	0.9587	0.8514	0.9134	0.9355	0.8249
Baseline + CheXpert	0.9405	0.9689	0.9877	0.8624	0.9201	0.9542	0.8716
Baseline + RSNA	0.9359	0.9592	0.9877	0.8514	0.9158	0.9503	0.8653
Baseline + NIH + CheXpert	0.9333	0.9606	0.9835	0.8514	0.9154	0.9483	0.8594
Baseline + NIH + RSNA	0.9231	0.9642	0.9959	0.8041	0.8926	0.9415	0.8411
Baseline + CheXpert + RSNA	0.9359	0.9628	0.9835	0.8582	0.9190	0.9501	0.8647
Baseline + NIH + CheXpert + RSNA	0.9154	0.9542	0.9794	0.8109	0.8944	0.9350	0.8217
Baseline + CheXpert + Twitter	0.9103	0.9538	0.9629	0.8244	0.8997	0.9302	0.8088
Baseline + CheXpert + Montreal	0.9231	0.9595	0.9711	0.8446	0.9109	0.9400	0.8365
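
Apart from AUC, which needs the ranked classifier scores, every metric reported in the table can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. It assumes the F column is the F1 score. The function name `binary_metrics` and the counts in the usage example are hypothetical: the source does not report a confusion matrix, and the counts were chosen only because they round to the values in the Baseline row.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    tp, fn: positive-class samples predicted correctly / incorrectly
    tn, fp: negative-class samples predicted correctly / incorrectly
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    # Assumption: "F" denotes the F1 score (harmonic mean of precision and recall).
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
    }


# Hypothetical counts (not taken from the source), chosen so that the
# rounded results agree with the Baseline row of the table.
metrics = binary_metrics(tp=235, fn=7, tn=128, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells, it can separate rows that accuracy alone cannot. For example, Baseline and Baseline + Twitter share an accuracy of 0.9308 but differ in MCC.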