2022 Jul 22;5:102. doi: 10.1038/s41746-022-00635-4

Table 2.

CNN performance overview.

Performance at WSI-level (private data)

| Dataset | Micro-accuracy (SKET labels) | Micro-accuracy (GT labels) | Weighted F1-score (SKET labels) | Weighted F1-score (GT labels) |
|---|---|---|---|---|
| Catania | 0.911 ± 0.004 | 0.918 ± 0.006 | 0.797 ± 0.011 | 0.807 ± 0.020 |
| Radboudumc | 0.906 ± 0.005 | 0.909 ± 0.008 | 0.744 ± 0.020 | 0.758 ± 0.025 |
| Private data | 0.908 ± 0.005 | 0.912 ± 0.006 | 0.769 ± 0.015 | 0.779 ± 0.019 |

Performance on publicly available images

| Dataset | Accuracy (SKET labels) | Accuracy (GT labels) | Weighted F1-score (SKET labels) | Weighted F1-score (GT labels) |
|---|---|---|---|---|
| GlaS [36] | 0.745 ± 0.059 | 0.745 ± 0.065 | 0.717 ± 0.050 | 0.750 ± 0.066 |
| CRC [37] | 0.876 ± 0.014 | 0.856 ± 0.024 | 0.878 ± 0.019 | 0.855 ± 0.024 |
| UNITOPATHO [31,32] (single sections) | 0.549 ± 0.025 | 0.543 ± 0.026 | 0.590 ± 0.015 | 0.591 ± 0.020 |
| UNITOPATHO [31,32] (WSIs) | 0.750 ± 0.022 | 0.770 ± 0.025 | 0.723 ± 0.024 | 0.764 ± 0.023 |
| TCGA-COAD [33] | 0.862 ± 0.051 | 0.868 ± 0.093 | 0.925 ± 0.029 | 0.927 ± 0.056 |
| Xu [38] | 0.717 ± 0.053 | 0.728 ± 0.038 | 0.677 ± 0.084 | 0.725 ± 0.041 |
| AIDA [34] | 0.743 ± 0.046 | 0.760 ± 0.030 | 0.744 ± 0.047 | 0.752 ± 0.026 |
| IMP-CRC [35] | 0.706 ± 0.035 | 0.678 ± 0.048 | 0.707 ± 0.033 | 0.682 ± 0.048 |

Performance of the CNN on the WSI-level classification task for the Catania and Radboudumc datasets (upper part) and on the classification of images from publicly available datasets (lower part). WSI-level performance is evaluated with micro-accuracy and weighted F1-score. Each metric is reported as the mean ± standard deviation across the models trained in the k-fold cross-validation; cumulative results over the private data are also included. Results are shown for CNNs trained with the automatically generated weak labels (SKET labels) and with the manually created ground-truth weak labels (GT labels).
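As an illustration of how values of the form "mean ± standard deviation" could be obtained, the following is a minimal sketch and not the authors' evaluation code. It assumes multilabel targets encoded as 0/1 indicator vectors, computes micro-accuracy by pooling every (slide, class) decision, computes the weighted F1-score as the support-weighted average of per-class F1, and summarizes per-fold scores with the sample standard deviation. The function names, the toy labels, and the choice of sample (rather than population) standard deviation are all assumptions; the caption does not specify them.

```python
from statistics import mean, stdev


def micro_accuracy(y_true, y_pred):
    """Fraction of correct (slide, class) decisions, pooled over all classes."""
    correct = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            correct += int(t == p)
            total += 1
    return correct / total


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    n_classes = len(y_true[0])
    f1_scores, supports = [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)  # number of true positives for this class
    total_support = sum(supports)
    if total_support == 0:
        return 0.0
    return sum(f * s for f, s in zip(f1_scores, supports)) / total_support


def summarize(fold_scores):
    """Mean and sample standard deviation over the k fold models (assumed)."""
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical toy data: 4 slides, 2 classes (not taken from the paper).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [1, 1], [1, 0]]

acc = micro_accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)

# Hypothetical per-fold scores, one value per cross-validation model.
fold_mean, fold_std = summarize([0.90, 0.91, 0.92])
print(f"{fold_mean:.3f} ± {fold_std:.3f}")
```

With real predictions, one score per fold model would be passed to `summarize` for each metric and dataset, giving one table cell per call.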