TABLE 2:
Repeatability analysis of our automated visual evaluation (AVE) classifier: quadratic weighted kappa (QWK) summary statistics, namely the mean with standard deviation (SD), the median with interquartile range (IQR), and the adjusted linear regression (LR) β, for each design choice, grouped by design choice category. Rows shaded in salmon indicate design choices filtered out at this stage due to poor repeatability.
| Design Choice Category | Design Choice | QWK Mean (SD) | QWK Median (IQR) | Adjusted LR β |
|---|---|---|---|---|
| Architecture | densenet121 | 0.743 (0.062) | 0.748 (0.719 - 0.786) | −0.016 |
| | resnest50 | 0.675 (0.069) | 0.649 (0.630 - 0.743) | −0.083** |
| | resnet50 | 0.752 (0.048) | 0.760 (0.736 - 0.776) | −0.018 |
| | SWT | 0.743 (0.079) | 0.748 (0.671 - 0.815) | ref |
| Loss Function | Cross Entropy | 0.725 (0.069) | 0.738 (0.671 - 0.771) | −0.039** |
| | Focal | 0.717 (0.070) | 0.730 (0.654 - 0.773) | −0.078** |
| | QWK | 0.779 (0.042) | 0.782 (0.752 - 0.809) | ref |
| | CORAL | 0.678 (0.056) | 0.649 (0.636 - 0.729) | −0.069** |
| Balancing Strategy | Balanced loss | 0.703 (0.107) | 0.751 (0.647 - 0.769) | −0.053** |
| | Balanced sampling | 0.729 (0.057) | 0.735 (0.675 - 0.781) | −0.046** |
| | Remove controls | 0.775 (0.054) | 0.777 (0.744 - 0.809) | ref |
| | Sampling 1:1:2 | 0.744 (0.055) | 0.758 (0.728 - 0.783) | −0.042** |
| | Sampling 1:1:4 | 0.776 (0.033) | 0.772 (0.752 - 0.798) | −0.026 |
| | Sampling 2:1:1 | 0.764 (0.017) | 0.762 (0.750 - 0.778) | −0.045 |
| | None | 0.706 (0.069) | 0.721 (0.638 - 0.749) | −0.019 |
| Dropout | No Dropout | 0.663 (0.072) | 0.649 (0.620 - 0.723) | −0.088** |
| | Train Dropout only | 0.725 (0.058) | 0.738 (0.681 - 0.759) | −0.035** |
| | Monte Carlo Dropout | 0.760 (0.059) | 0.772 (0.733 - 0.802) | ref |
| Multilevel Ground Truth | 3 level all patients | 0.740 (0.068) | 0.752 (0.719 - 0.780) | ref |
| | 3 level subsets | 0.707 (0.070) | 0.709 (0.637 - 0.778) | −0.026** |
| | 5 level all patients | 0.705 (0.064) | 0.721 (0.650 - 0.748) | −0.025 |
SWT: Swin Transformer; CORAL: consistent rank logits loss, as described in the METHODS section; ref: reference category for the adjusted LR β values within each design choice category.
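The table summarizes repeatability with QWK. As a minimal sketch (not the authors' code), the functions below compute quadratic weighted kappa for ordinal labels coded 0 to `num_levels` − 1 and then summarize a set of per-run QWK values as mean (SD) and median (IQR). The function names, the 0-based label coding, the use of the sample SD, and the default "exclusive" quartile method of Python's `statistics.quantiles` are all assumptions; what each QWK value compares and the exact quartile convention are defined in the paper's Methods, not here.

```python
import statistics
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_levels-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty sequences of equal length")
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    n = len(rater_a)
    # Observed table: observed[i][j] counts items labelled i by A and j by B.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1
    # Marginal label counts, used to build the chance-expected table.
    hist_a, hist_b = Counter(rater_a), Counter(rater_b)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extreme corners.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return 1.0 - weighted_observed / weighted_expected


def summarize_qwk(values):
    """Mean (SD) and median (IQR) of QWK values from repeated runs (needs >= 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method (assumption)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


if __name__ == "__main__":
    # Hypothetical 3-level labels for a ground truth and three repeated runs.
    truth = [0, 0, 1, 1, 2, 2]
    runs = [[0, 0, 1, 2, 2, 2], [0, 1, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2]]
    per_run = [quadratic_weighted_kappa(truth, pred, num_levels=3) for pred in runs]
    print(summarize_qwk(per_run))
```

Quadratic weights penalize a disagreement of two levels four times as heavily as a disagreement of one level, which is why QWK suits ordinal outcomes such as the 3- and 5-level ground truths compared above.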