Table 1: Additional DM & DBT Evaluation

Dataset | Location | Manufacturer | Model | Input Type | AUC |
---|---|---|---|---|---|
OMI-DB | UK | Hologic | 2D | DM | 0.963 ± 0.003 |
Site A – DM | Oregon | GE | 2D | DM | 0.927 ± 0.008 |
Site E | China | Hologic | 2D | DM | 0.971 ± 0.005 |
Site E (resampled) | China | Hologic | 2D | DM | 0.956 ± 0.020 |
Site A – DBT | Oregon | Hologic | 2D* | DBT manufacturer synthetics | 0.922 ± 0.016 |
Site A – DBT | Oregon | Hologic | 3D | DBT slices | 0.947 ± 0.012 |
Site A – DBT | Oregon | Hologic | 2D + 3D | DBT manufacturer synthetics + slices | 0.957 ± 0.010 |
All results correspond to using the “index” exam for cancer cases and “confirmed” negatives for the non-cancer cases, except for Site E, where the negatives are unconfirmed. “Pre-index” results, where available, and additional analyses are included in Extended Data Figure 8. Rows 1–2: Performance of the 2D deep learning model on held-out test sets of the OMI-DB (1,205 cancers, 1,538 negatives) and Site A (254 cancers, 7,697 negatives) datasets. Rows 3–4: Performance on a dataset collected at a Chinese hospital (Site E; 533 cancers, 1,000 negatives). The dataset consists entirely of diagnostic exams, given the low prevalence of screening mammography in China. Nevertheless, even when adjusting for tumor size using bootstrap resampling to approximate the distribution of tumor sizes expected in an American screening population (see Methods), the model still achieves high performance (Row 4). Rows 5–7: Performance on DBT data (Site A – DBT; 78 cancers, 518 negatives). Row 5 contains the results of the 2D model fine-tuned on the manufacturer-generated synthetic 2D images, which are created to augment or substitute for DM images in a DBT study (* indicates this fine-tuned model). Row 6 contains the results of the weakly-supervised 3D model, illustrating strong performance when evaluated on the MSP images computed from the DBT slices. We note that when scoring the DBT volume as the maximum bounding box score over all of the slices, the strongly-supervised 2D model used to create the MSP images exhibits an AUC of 0.865 ± 0.020. Thus, fine-tuning this model on the MSP images significantly improves its performance. Row 7: Results when combining predictions across the final 3D model and 2D models. The standard deviation for each AUC value was calculated via bootstrapping.
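
The bootstrapped standard deviations reported with each AUC can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `auc` and `bootstrap_auc_std`, the number of resamples (`n_boot=1000`), the exam-level resampling with replacement, and the synthetic scores are all assumptions made for illustration, since the caption does not specify these details.

```python
import numpy as np


def auc(labels, scores):
    """AUC via the Mann-Whitney statistic; tied pairs count as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of cancer cases
    neg = scores[~labels]   # scores of negative cases
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def bootstrap_auc_std(labels, scores, n_boot=1000, seed=0):
    """Standard deviation of the AUC over bootstrap resamples of the exams.

    Assumption: exams are resampled with replacement, and resamples that
    contain only one class are redrawn, since the AUC is undefined for them.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = labels.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if labels[idx].all() or not labels[idx].any():
            continue
        aucs.append(auc(labels[idx], scores[idx]))
    return float(np.std(aucs, ddof=1))


if __name__ == "__main__":
    # Synthetic scores only; these are not the study's data.
    rng = np.random.default_rng(42)
    y = np.concatenate([np.ones(78, dtype=bool), np.zeros(518, dtype=bool)])
    s = np.concatenate([rng.normal(1.5, 1.0, 78), rng.normal(0.0, 1.0, 518)])
    print(f"AUC = {auc(y, s):.3f} ± {bootstrap_auc_std(y, s):.3f}")
```

The class sizes in the usage example mirror the Site A – DBT test set (78 cancers, 518 negatives) only to give a sense of how the spread of the estimate scales with a small number of positives; a stratified bootstrap that resamples cancers and negatives separately would be an equally plausible reading of the caption.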