Table 2.
Performance at WSI-level (private data)

Dataset | Micro-accuracy (SKET labels) | Micro-accuracy (GT labels) | Weighted F1-score (SKET labels) | Weighted F1-score (GT labels) |
---|---|---|---|---|
Catania | 0.911 ± 0.004 | 0.918 ± 0.006 | 0.797 ± 0.011 | 0.807 ± 0.020 |
Radboudumc | 0.906 ± 0.005 | 0.909 ± 0.008 | 0.744 ± 0.020 | 0.758 ± 0.025 |
Private data | 0.908 ± 0.005 | 0.912 ± 0.006 | 0.769 ± 0.015 | 0.779 ± 0.019 |

Performance on publicly available images

Dataset | Accuracy (SKET labels) | Accuracy (GT labels) | Weighted F1-score (SKET labels) | Weighted F1-score (GT labels) |
---|---|---|---|---|
GlaS [36] | 0.745 ± 0.059 | 0.745 ± 0.065 | 0.717 ± 0.050 | 0.750 ± 0.066 |
CRC [37] | 0.876 ± 0.014 | 0.856 ± 0.024 | 0.878 ± 0.019 | 0.855 ± 0.024 |
UNITOPATHO [31,32] (single sections) | 0.549 ± 0.025 | 0.543 ± 0.026 | 0.590 ± 0.015 | 0.591 ± 0.020 |
UNITOPATHO [31,32] (WSIs) | 0.750 ± 0.022 | 0.770 ± 0.025 | 0.723 ± 0.024 | 0.764 ± 0.023 |
TCGA-COAD [33] | 0.862 ± 0.051 | 0.868 ± 0.093 | 0.925 ± 0.029 | 0.927 ± 0.056 |
Xu [38] | 0.717 ± 0.053 | 0.728 ± 0.038 | 0.677 ± 0.084 | 0.725 ± 0.041 |
AIDA [34] | 0.743 ± 0.046 | 0.760 ± 0.030 | 0.744 ± 0.047 | 0.752 ± 0.026 |
IMP-CRC [35] | 0.706 ± 0.035 | 0.678 ± 0.048 | 0.707 ± 0.033 | 0.682 ± 0.048 |
Performance of the CNN on the WSI-level classification task for the Catania and Radboudumc datasets (upper part) and on the classification of images from publicly available datasets (lower part). WSI-level performance is evaluated with micro-accuracy and weighted F1-score. For each classification type and each metric, the mean and standard deviation across the models trained in the k-fold cross-validation are reported, including the cumulative results over the private datasets ("Private data" row). Results are reported for CNNs trained with the automatically generated weak labels (SKET labels) and with the manually created ground-truth weak labels (GT labels).
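
As an illustration of how entries of the form "mean ± standard deviation" can be obtained, the following minimal sketch computes micro-accuracy and weighted F1-score for each cross-validation model and then aggregates them. It is not the evaluation code used for the table. It assumes multilabel targets encoded as binary indicator vectors, micro-accuracy defined as the fraction of correct (sample, class) decisions pooled over all classes, weighted F1 defined as the support-weighted mean of per-class F1, and the sample standard deviation across folds. The function names and the toy data are hypothetical.

```python
from statistics import mean, stdev


def micro_accuracy(y_true, y_pred):
    """Fraction of correct (sample, class) decisions, pooled over all classes."""
    correct = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            correct += int(t == p)
            total += 1
    return correct / total


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    n_classes = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        support = tp + fn  # number of true positives for this class in y_true
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


def summarize(scores):
    """Mean and sample standard deviation over the cross-validation models."""
    return mean(scores), stdev(scores)


# Toy example (hypothetical data): 3 slides, 2 classes, 3 cross-validation models.
y_true = [[1, 0], [0, 1], [1, 1]]
fold_preds = [
    [[1, 0], [0, 1], [1, 0]],  # model 1: misses class 1 on the third slide
    [[1, 0], [1, 1], [1, 1]],  # model 2: false positive for class 0 on the second slide
    [[1, 0], [0, 1], [1, 1]],  # model 3: all decisions correct
]

acc_mean, acc_std = summarize([micro_accuracy(y_true, p) for p in fold_preds])
f1_mean, f1_std = summarize([weighted_f1(y_true, p) for p in fold_preds])
print(f"Micro-accuracy: {acc_mean:.3f} ± {acc_std:.3f}")
print(f"Weighted F1-score: {f1_mean:.3f} ± {f1_std:.3f}")
```

Whether the sample or the population standard deviation is used changes the reported spread slightly when the number of folds is small, so the choice should match the one made in the original evaluation.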