Author manuscript; available in PMC: 2022 Aug 30.
Published in final edited form as: Nat Med. 2021 Jan 11;27(2):244–249. doi: 10.1038/s41591-020-01174-9

Table 1: Summary of additional model evaluation.

Additional DM & DBT Evaluation

Dataset             Location  Manufacturer  Model    Input Type                            AUC
OMI-DB              UK        Hologic       2D       DM                                    0.963 ± 0.003
Site A – DM         Oregon    GE            2D       DM                                    0.927 ± 0.008
Site E              China     Hologic       2D       DM                                    0.971 ± 0.005
Site E (resampled)  China     Hologic       2D       DM                                    0.956 ± 0.020
Site A – DBT        Oregon    Hologic       2D*      DBT manufacturer synthetics           0.922 ± 0.016
Site A – DBT        Oregon    Hologic       3D       DBT slices                            0.947 ± 0.012
Site A – DBT        Oregon    Hologic       2D + 3D  DBT manufacturer synthetics + slices  0.957 ± 0.010

All results use the “index” exam for cancer cases and “confirmed” negatives for the non-cancer cases, except for Site E, where the negatives are unconfirmed. “Pre-index” results, where available, and additional analysis are included in Extended Data Figure 8. Rows 1–2: Performance of the 2D deep learning model on held-out test sets of the OMI-DB (1,205 cancers, 1,538 negatives) and Site A (254 cancers, 7,697 negatives) datasets. Rows 3–4: Performance on a dataset collected at a Chinese hospital (Site E; 533 cancers, 1,000 negatives). The dataset consists entirely of diagnostic exams, given the low prevalence of screening mammography in China. Nevertheless, even after adjusting for tumor size, using bootstrap resampling to approximate the distribution of tumor sizes expected in an American screening population (see Methods), the model still achieves high performance (Row 4). Rows 5–7: Performance on DBT data (Site A – DBT; 78 cancers, 518 negatives). Row 5 contains results of the 2D model fine-tuned on the manufacturer-generated synthetic 2D images, which are created to augment or substitute for DM images in a DBT study (* indicates this fine-tuned model). Row 6 contains the results of the weakly-supervised 3D model, illustrating strong performance when evaluated on the MSP images computed from the DBT slices. We note that when the DBT volume is scored as the maximum bounding box score over all of the slices, the strongly-supervised 2D model used to create the MSP images exhibits an AUC of 0.865 ± 0.020. Thus, fine-tuning this model on the MSP images significantly improves its performance. Row 7: Results when combining predictions across the final 3D model and 2D models. The standard deviation for each AUC value was calculated via bootstrapping.
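As an illustration of how a bootstrapped standard deviation for an AUC might be obtained, the following is a minimal sketch and not the procedure used to produce the table. The function names `auc` and `bootstrap_auc_std`, the number of replicates (`n_boot=1000`), and the choice to resample cancers and negatives separately are all assumptions made for this example; the caption states only that bootstrapping was used. The scores in the usage section are synthetic.

```python
import random
import statistics
from bisect import bisect_left, bisect_right


def auc(pos_scores, neg_scores):
    """AUC as the probability that a cancer case outscores a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    neg_sorted = sorted(neg_scores)
    wins = 0.0
    for s in pos_scores:
        below = bisect_left(neg_sorted, s)          # negatives strictly below s
        ties = bisect_right(neg_sorted, s) - below  # negatives equal to s
        wins += below + 0.5 * ties
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_auc_std(pos_scores, neg_scores, n_boot=1000, seed=0):
    """Standard deviation of the AUC over bootstrap replicates.

    Assumption: each replicate resamples positives and negatives
    separately, with replacement, keeping the class sizes fixed.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        pos = rng.choices(pos_scores, k=len(pos_scores))
        neg = rng.choices(neg_scores, k=len(neg_scores))
        estimates.append(auc(pos, neg))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Synthetic scores only; class sizes mirror the Site A DBT row (78 / 518).
    rng = random.Random(1)
    pos = [rng.gauss(1.5, 1.0) for _ in range(78)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(518)]
    print(f"AUC = {auc(pos, neg):.3f} ± {bootstrap_auc_std(pos, neg):.3f}")
```

Resampling within each class keeps the number of cancers and negatives constant across replicates. Resampling whole exams without regard to class is an equally plausible reading of "bootstrapping" and would give slightly different spreads for small test sets.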