[Preprint]. 2023 Mar 3:rs.3.rs-2526701. [Version 1] doi: 10.21203/rs.3.rs-2526701/v1

TABLE 2:

Repeatability analysis highlighting quadratic weighted kappa (QWK) summary statistics – mean, median with interquartile range (IQR) and adjusted linear regression (LR) β values – for design choices within each design choice category for our automated visual evaluation (AVE) classifier. Rows shaded in salmon indicate design choices filtered out at this stage due to poor repeatability.

| Design Choice Category | Design Choice | QWK Mean (SD) | QWK Median (IQR) | Adjusted LR β |
|---|---|---|---|---|
| Architecture | densenet121 | 0.743 (0.062) | 0.748 (0.719 - 0.786) | −0.016 |
| | resnest50 | 0.675 (0.069) | 0.649 (0.630 - 0.743) | −0.083** |
| | resnet50 | 0.752 (0.048) | 0.760 (0.736 - 0.776) | −0.018 |
| | SWT | 0.743 (0.079) | 0.748 (0.671 - 0.815) | ref |
| Loss Function | Cross Entropy | 0.725 (0.069) | 0.738 (0.671 - 0.771) | −0.039** |
| | Focal | 0.717 (0.070) | 0.730 (0.654 - 0.773) | −0.078** |
| | QWK | 0.779 (0.042) | 0.782 (0.752 - 0.809) | ref |
| | CORAL | 0.678 (0.056) | 0.649 (0.636 - 0.729) | −0.069** |
| Balancing Strategy | Balanced loss | 0.703 (0.107) | 0.751 (0.647 - 0.769) | −0.053** |
| | Balanced sampling | 0.729 (0.057) | 0.735 (0.675 - 0.781) | −0.046** |
| | Remove controls | 0.775 (0.054) | 0.777 (0.744 - 0.809) | ref |
| | Sampling 1:1:2 | 0.744 (0.055) | 0.758 (0.728 - 0.783) | −0.042** |
| | Sampling 1:1:4 | 0.776 (0.033) | 0.772 (0.752 - 0.798) | −0.026 |
| | Sampling 2:1:1 | 0.764 (0.017) | 0.762 (0.750 - 0.778) | −0.045 |
| | None | 0.706 (0.069) | 0.721 (0.638 - 0.749) | −0.019 |
| Dropout | No Dropout | 0.663 (0.072) | 0.649 (0.620 - 0.723) | −0.088** |
| | Train Dropout only | 0.725 (0.058) | 0.738 (0.681 - 0.759) | −0.035** |
| | Monte Carlo Dropout | 0.760 (0.059) | 0.772 (0.733 - 0.802) | ref |
| Multilevel Ground Truth | 3 level all patients | 0.740 (0.068) | 0.752 (0.719 - 0.780) | ref |
| | 3 level subsets | 0.707 (0.070) | 0.709 (0.637 - 0.778) | −0.026** |
| | 5 level all patients | 0.705 (0.064) | 0.721 (0.650 - 0.748) | −0.025 |

SWT: Swin Transformer; CORAL: CORAL (consistent rank logits) loss, as described in the METHODS section; ref: reference category.
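
As an illustration of the summary statistics reported in this table, the following minimal sketch computes a quadratic weighted kappa between two sets of ordinal class labels and then summarizes a collection of QWK values as mean (SD) and median (IQR). It is not the authors' code. The function names `quadratic_weighted_kappa` and `summarize`, the pairing of predictions on repeat images, the use of the sample standard deviation, and the example labels are all assumptions made for illustration.

```python
import numpy as np


def quadratic_weighted_kappa(a, b, n_classes):
    """QWK between two integer label vectors with classes 0..n_classes-1.

    Hypothetical helper: here `a` and `b` are assumed to be predicted
    classes for two repeat images of the same subjects.
    """
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)

    # Observed agreement matrix: counts of (label in a, label in b).
    observed = np.zeros((n_classes, n_classes), dtype=float)
    for i, j in zip(a, b):
        observed[i, j] += 1

    # Expected counts if the two label sets were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic disagreement weights, 0 on the diagonal, 1 at maximum distance.
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2

    # Undefined (division by zero) if every label falls in a single class.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def summarize(values):
    """Mean, sample SD, median, and interquartile range of QWK values."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    return {
        "mean": v.mean(),
        "sd": v.std(ddof=1),  # assumption: sample SD (ddof=1)
        "median": median,
        "iqr": (q1, q3),
    }


if __name__ == "__main__":
    # Made-up 3-level labels for two repeat images per subject.
    first = [0, 1, 2, 2, 1, 0, 1, 2]
    second = [0, 1, 2, 1, 1, 0, 2, 2]
    print(quadratic_weighted_kappa(first, second, n_classes=3))

    # Made-up QWK values, e.g. one per model sharing a design choice.
    print(summarize([0.70, 0.75, 0.80]))
```

Under these assumptions, each row of the table would correspond to calling `summarize` on the QWK values of all models that share the given design choice. The adjusted LR β column comes from a separate regression step that this sketch does not attempt to reproduce.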