Table 3.
Performance values and statistical significance test results of the best data augmentation strategies for the models trained with the OPTIMAM Hologic database.
| Test set | Strategy | FROC AUC (only synthetic BI-RADS D in training) | Gain (only synthetic) | p-value (only synthetic) | FROC AUC (synthetic and real BI-RADS D in training) | Gain (synthetic and real) | p-value (synthetic and real) |
|---|---|---|---|---|---|---|---|
| OPTIMAM Hologic BI-RADS D Test Set | Baseline | 79.71% (78.44, 80.98) | Ref | Ref | 80.60% (79.20, 82.00) | Ref | Ref |
| OPTIMAM Hologic BI-RADS D Test Set | BC-Aug | 79.62% (77.83, 81.41) | -0.09 | 0.0064 | **81.10% (80.40, 81.80)** | +0.50 | 0.2277 |
| OPTIMAM Hologic BI-RADS D Test Set | OP-Aug | 79.86% (78.30, 81.42) | +0.15 | 0.8269 | 80.75% (78.77, 82.73) | +0.15 | 0.5599 |
| OPTIMAM Hologic BI-RADS D Test Set | OP-CS-BC-Aug | **80.95% (79.63, 82.27)** | +1.24 | 0.0696 | 80.76% (79.92, 81.60) | +0.16 | 0.7921 |
| INbreast Dataset (external validation) | Baseline | 81.51% (78.93, 84.09) | Ref | Ref | 84.71% (83.39, 86.03) | Ref | Ref |
| INbreast Dataset (external validation) | BC-Aug | **85.66% (81.91, 89.41)** | +4.15 | 0.0002 | 84.88% (82.86, 86.90) | +0.17 | 0.1666 |
| INbreast Dataset (external validation) | OP-Aug | 83.45% (80.03, 86.87) | +1.94 | 6.08e-05 | **86.16% (83.37, 88.95)** | +1.45 | 0.0041 |
| INbreast Dataset (external validation) | OP-CS-BC-Aug | 84.47% (82.32, 86.62) | +2.95 | 0.0008 | 84.29% (82.22, 86.36) | -0.42 | 0.0162 |
The columns on the left correspond to the models trained without real BI-RADS D mammograms. The baseline models were trained without synthetic images. The 95% confidence intervals of the FROC AUC are given in parentheses. The p-value was computed using the DeLong method with a maximum of 10 FPPI. Bold values correspond to the best performing strategy. Ref corresponds to the reference method.
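
As a quick sanity check on how the Gain column relates to the FROC AUC values, the sketch below recomputes it for the INbreast rows trained with only synthetic BI-RADS D images. It assumes that Gain is simply the strategy's FROC AUC minus the baseline's, in percentage points; the caption does not define it explicitly. The helper name `gain_over_baseline` is hypothetical and is not taken from the paper.

```python
# FROC AUC values (%) copied from the INbreast rows of Table 3,
# "only synthetic BI-RADS D in training" columns.
froc_auc = {
    "Baseline": 81.51,
    "BC-Aug": 85.66,
    "OP-Aug": 83.45,
    "OP-CS-BC-Aug": 84.47,
}


def gain_over_baseline(values, baseline_key="Baseline"):
    """Return each strategy's FROC AUC minus the baseline, in percentage points.

    Assumption: Gain is a plain difference of the reported percentages.
    """
    base = values[baseline_key]
    return {
        name: round(value - base, 2)
        for name, value in values.items()
        if name != baseline_key
    }


gains = gain_over_baseline(froc_auc)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

Under this assumption the recomputed gains match the table for BC-Aug and OP-Aug. For OP-CS-BC-Aug the difference of the rounded values is 2.96 rather than the reported +2.95, which would be consistent with the gain having been computed from unrounded FROC AUC values.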