2021 Feb 2;23(2):e23436. doi: 10.2196/23436

Table 2.

Overall mean performance of each task's classifiers, measured as mean balanced accuracy and evaluated at the tile level and the slide level.

Task[a]                              ResNet50 performance (mean balanced accuracy)
                                     Tile level   Slide level (95% CI)[b]
1: Patient age                       76.2%        87.5%
2: Slide preparation date
   Data set 1: 2015 versus 2017      54.1%        56.1% (52.7% to 59.5%)
   Data set 1: 2016 versus 2018      56.5%        63.2% (53.4% to 73.0%)
   Data set 2: 2014 versus 2016      69.0%        82.0% (76.4% to 87.6%)
   Data set 2: 2015 versus 2017      66.6%        83.5% (80.9% to 86.1%)
   Data set 2: 2016 versus 2018      52.7%        56.7% (52.6% to 60.7%)
3: Slide origin                      94.2%        97.9% (97.3% to 98.5%)
4: Scanner type                      100%         100%

[a] Test sets for each task had a minimum of 10 slides per class.

[b] Confidence intervals are shown for the decisive criterion (slide level) and are omitted for tasks where no variation was observed at the slide level.
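To illustrate how the two columns of the table can differ, the sketch below computes balanced accuracy as the mean of per-class recall, which is the standard definition of the metric. The table does not say how tile predictions were combined into a slide prediction. The function `slide_predictions` therefore uses majority voting over a slide's tiles purely as a hypothetical assumption, and the example labels are invented for illustration only. They are not data from the study.

```python
from collections import Counter, defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = Counter()
    total = Counter()
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Each class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)


def slide_predictions(tile_slide_ids, tile_preds):
    """Hypothetical aggregation (an assumption, not the paper's stated method):
    each slide takes the label predicted for the majority of its tiles."""
    votes = defaultdict(Counter)
    for slide, pred in zip(tile_slide_ids, tile_preds):
        votes[slide][pred] += 1
    return {slide: counts.most_common(1)[0][0] for slide, counts in votes.items()}


if __name__ == "__main__":
    # Invented toy data: two slides with three tiles each.
    tile_slides = ["A", "A", "A", "B", "B", "B"]
    tile_true = [0, 0, 0, 1, 1, 1]
    tile_pred = [0, 1, 0, 1, 1, 0]
    slide_truth = {"A": 0, "B": 1}

    print("tile level:", balanced_accuracy(tile_true, tile_pred))

    per_slide = slide_predictions(tile_slides, tile_pred)
    ids = sorted(per_slide)
    print("slide level:", balanced_accuracy([slide_truth[s] for s in ids],
                                            [per_slide[s] for s in ids]))
```

In this toy case, one misclassified tile per slide lowers tile-level balanced accuracy to 2/3. Majority voting still assigns both slides correctly, giving a slide-level value of 1.0. This is one plausible reason slide-level figures can exceed tile-level ones, but the mechanism in the study itself is not stated here.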