2023 Jan 23;12:1044496. doi: 10.3389/fonc.2022.1044496

Table 3.

Performance values and statistical significance test results of the best data augmentation strategies for the models trained with the OPTIMAM Hologic database.

		Only synthetic BI-RADS D in training			Synthetic and real BI-RADS D in training
Test set	Strategy	FROC AUC (95% CI)	Gain	p-value	FROC AUC (95% CI)	Gain	p-value
OPTIMAM Hologic BI-RADS D Test Set	Baseline	79.71% (78.44, 80.98)	Ref	Ref	80.60% (79.20, 82.00)	Ref	Ref
	BC-Aug	79.62% (77.83, 81.41)	-0.09	0.0064	81.10% (80.40, 81.80)	+0.50	0.2277
	OP-Aug	79.86% (78.30, 81.42)	+0.15	0.8269	80.75% (78.77, 82.73)	+0.15	0.5599
	OP-CS-BC-Aug	80.95% (79.63, 82.27)	+1.24	0.0696	80.76% (79.92, 81.60)	+0.16	0.7921
INbreast Dataset (external validation)	Baseline	81.51% (78.93, 84.09)	Ref	Ref	84.71% (83.39, 86.03)	Ref	Ref
	BC-Aug	85.66% (81.91, 89.41)	+4.15	0.0002	84.88% (82.86, 86.90)	+0.17	0.1666
	OP-Aug	83.45% (80.03, 86.87)	+1.94	6.08e-05	86.16% (83.37, 88.95)	+1.45	0.0041
	OP-CS-BC-Aug	84.47% (82.32, 86.62)	+2.95	0.0008	84.29% (82.22, 86.36)	-0.42	0.0162

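The columns on the left correspond to the models trained without real BI-RADS D mammograms; the columns on the right correspond to the models trained with both synthetic and real BI-RADS D mammograms. The baseline models were trained without synthetic images. The 95% confidence intervals of the FROC AUC are given in parentheses. Gain is the difference in FROC AUC relative to the baseline, in percentage points. The p-values were computed using the DeLong method with a maximum of 10 false positives per image (FPPI). Bold values correspond to the best performing strategy. Ref corresponds to the reference method.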
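As an illustration of how the Gain column relates to the FROC AUC column, the following minimal sketch recomputes each gain as the difference between a strategy's FROC AUC and the baseline's, using the values from the "only synthetic BI-RADS D in training" columns. The dictionary layout and the function name `gains_vs_baseline` are hypothetical and are not taken from the article. Because the inputs are the rounded values printed in the table, a recomputed gain can differ from the reported one by 0.01 (for example, 84.47 - 81.51 gives 2.96 where the table reports +2.95), presumably because the reported gains were derived from unrounded AUCs.

```python
# FROC AUC values (%) copied from Table 3,
# "only synthetic BI-RADS D in training" columns.
froc_auc = {
    "OPTIMAM Hologic BI-RADS D Test Set": {
        "Baseline": 79.71,
        "BC-Aug": 79.62,
        "OP-Aug": 79.86,
        "OP-CS-BC-Aug": 80.95,
    },
    "INbreast Dataset (external validation)": {
        "Baseline": 81.51,
        "BC-Aug": 85.66,
        "OP-Aug": 83.45,
        "OP-CS-BC-Aug": 84.47,
    },
}


def gains_vs_baseline(aucs, reference="Baseline"):
    """Return each strategy's FROC AUC minus the reference AUC,
    in percentage points, rounded to two decimals.
    (Hypothetical helper, for illustration only.)"""
    ref = aucs[reference]
    return {
        name: round(value - ref, 2)
        for name, value in aucs.items()
        if name != reference
    }


for test_set, aucs in froc_auc.items():
    for strategy, gain in gains_vs_baseline(aucs).items():
        print(f"{test_set} | {strategy}: {gain:+.2f}")
```

The same function applies unchanged to the "synthetic and real BI-RADS D in training" columns once their AUC values are substituted. It reproduces only the Gain arithmetic and does not recompute the confidence intervals or the DeLong p-values, which require the per-case model outputs.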