Table 2.
Summary of the evaluation metrics of the best-performing models on the internal test set, with 95% confidence intervals
| Model | Accuracy | Precision | Sensitivity | Specificity | F1 Score | AUROC |
|---|---|---|---|---|---|---|
| a) Stage 1 Model performance on the internal test set containing only axillary patches | ||||||
| Stage 1 | ||||||
| Resnet18 | 0.94 (0.91–0.97) | 0.94 (0.88–0.99) | 0.93 (0.89–0.98) | 0.96 (0.92–0.99) | 0.93 (0.90–0.97) | 0.98 (0.97–1.00) |
| VGG16 | 0.93 (0.89–0.96) | 0.90 (0.84–0.95) | 0.92 (0.87–0.97) | 0.93 (0.89–0.97) | 0.91 (0.87–0.95) | 0.98 (0.97–0.99) |
| Densenet121 | 0.92 (0.88–0.95) | 0.89 (0.84–0.95) | 0.91 (0.85–0.96) | 0.92 (0.88–0.96) | 0.90 (0.86–0.94) | 0.97 (0.94–0.99) |
| Stage 2 | ||||||
| Resnet18 | 0.97 (0.94–0.99) | 0.95 (0.90–0.98) | 0.97 (0.93–1.00) | 0.96 (0.93–0.99) | 0.96 (0.93–0.98) | 1.00 (0.99–1.00) |
| b) Model performance on the internal test set containing both axillary and non-axillary patches | ||||||
| Stage 1 | ||||||
| Resnet18 | 0.85 (0.81–0.89) | 0.45 (0.33–0.57) | 0.95 (0.86–1.00) | 0.84 (0.79–0.89) | 0.61 (0.49–0.72) | 0.97 (0.91–1.00) |
| Stage 2 | ||||||
| Resnet18 | 0.99 (0.98–1.00) | 0.95 (0.86–1.00) | 0.97 (0.90–1.00) | 0.99 (0.98–1.00) | 0.96 (0.92–1.00) | 1.00 (1.00–1.00) |
AUROC, area under the receiver operating characteristic curve.
(a) Stage 1 Model performance on the internal test set containing only axillary patches. (b) Model performance on the internal test set containing both axillary and non-axillary patches.
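The F1 score is the harmonic mean of precision and sensitivity, so the F1 column can be cross-checked against the other two. The sketch below is an illustrative check, not the authors' evaluation code: the helper `f1_score` and the `rows` dictionary are hypothetical names, and the inputs are the rounded point estimates copied from Table 2, so a recomputed value could in principle differ from the reported one in the last digit.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, reported F1) point estimates copied from Table 2.
# The row labels are shorthand introduced here for readability.
rows = {
    "a) Stage 1 Resnet18": (0.94, 0.93, 0.93),
    "a) Stage 1 VGG16": (0.90, 0.92, 0.91),
    "a) Stage 1 Densenet121": (0.89, 0.91, 0.90),
    "a) Stage 2 Resnet18": (0.95, 0.97, 0.96),
    "b) Stage 1 Resnet18": (0.45, 0.95, 0.61),
    "b) Stage 2 Resnet18": (0.95, 0.97, 0.96),
}

for name, (precision, sensitivity, reported) in rows.items():
    computed = f1_score(precision, sensitivity)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {reported:.2f}")
```

The Stage 1 row in part (b) shows how the harmonic mean behaves: sensitivity stays high at 0.95, but precision of 0.45 pulls the F1 score down to 0.61.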