Table 1.
| Fold | Detection N | AP@.5 | mAP@.5:.95 | Segmentation N | Median IOU | Median DICE | Classification N | Superclasses? | Accuracy | MCC | AUROC (micro) | AUROC (macro) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 (Val.) | 6102 | 75.3 | 34.4 | 1389 | 78.5 | 87.9 | 5351 | No | 71.0 | 58.1 | 93.3 | 84.6 |
| 1 (Val.) | | | | | | | | Yes | 77.5 | 65.2 | 93.7 | 89.0 |
| 2 | 15442 | 74.9 | 33.2 | 3474 | 78.0 | 87.6 | 13597 | No | 70.1 | 56.9 | 93.8 | 83.6 |
| 2 | | | | | | | | Yes | 79.4 | 68.2 | 94.6 | 86.5 |
| 3 | 12672 | 74.0 | 33.8 | 1681 | 80.2 | 89.0 | 11176 | No | 68.6 | 57.0 | 93.5 | 87.1 |
| 3 | | | | | | | | Yes | 79.0 | 68.1 | 94.4 | 89.4 |
| 4 | 8260 | 75.3 | 33.5 | 1948 | 80.9 | 89.5 | 7288 | No | 73.1 | 61.8 | 94.5 | 85.0 |
| 4 | | | | | | | | Yes | 83.9 | 73.5 | 96.1 | 87.4 |
| 5 | 7295 | 74.9 | 31.5 | 1306 | 78.1 | 87.7 | 6294 | No | 61.7 | 47.0 | 89.3 | 79.2 |
| 5 | | | | | | | | Yes | 68.4 | 52.4 | 89.0 | 80.8 |
| Mean (Std) | — | 74.8 (0.5) | 33.0 (0.9) | — | 79.3 (1.3) | 88.5 (0.8) | — | No | 68.4 (4.2) | 55.7 (5.4) | 92.8 (2.0) | 83.7 (2.9) |
| Mean (Std) | — | | | — | | | — | Yes | 77.7 (5.7) | 65.6 (7.9) | 93.5 (2.7) | 86.0 (3.2) |
Note: All performance metrics are reported as percentages. Fold 1 acted as the validation set for hyperparameter tuning, so the bottom row reports the mean and standard deviation of four values (folds 2–5). The number of test-set nuclei varies by fold because the data were split at the level of hospitals, not nuclei. Classification accuracy was consistently higher when the assessment was done at the level of superclasses. Abbreviations: AP@.5, average precision using a threshold of 0.5 for considering a detection to be true; mAP@.5:.95, mean average precision at detection thresholds between 0.5 and 0.95; IOU, intersection over union; DICE, Dice similarity coefficient; MCC, Matthews correlation coefficient; AUROC, area under the receiver operating characteristic curve.
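For readers who want to check or extend the summary row, the sketch below recomputes the mean and standard deviation for three of the columns from the per-fold values in Table 1 (folds 2–5 only, since fold 1 is the validation fold). It assumes the population form of the standard deviation (NumPy's default, `ddof=0`), which reproduces the reported figures; the paper's exact rounding and aggregation conventions are not stated, so treat this as an illustration rather than the original analysis code.

```python
import numpy as np

# Per-fold test results for folds 2-5, taken from Table 1.
# Fold 1 (validation) is excluded from the summary row.
ap_50      = np.array([74.9, 74.0, 75.3, 74.9])  # detection AP@.5
map_50_95  = np.array([33.2, 33.8, 33.5, 31.5])  # detection mAP@.5:.95
median_iou = np.array([78.0, 80.2, 80.9, 78.1])  # segmentation median IOU

for name, vals in [("AP@.5", ap_50),
                   ("mAP@.5:.95", map_50_95),
                   ("Median IOU", median_iou)]:
    # np.std defaults to the population standard deviation (ddof=0),
    # which matches the "(Std)" values in the bottom row of Table 1.
    print(f"{name}: {vals.mean():.1f} ({vals.std():.1f})")

# Expected output:
# AP@.5: 74.8 (0.5)
# mAP@.5:.95: 33.0 (0.9)
# Median IOU: 79.3 (1.3)
```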