TABLE 3.
Mean and standard deviation of CAV test accuracies on the test split of the concept data, taken over all tested model architectures. Per-dataset columns average over the model architectures trained on that dataset; the Overall columns average over all datasets.
| Model | SCDB Mean | SCDB Std | ISIC Mean | ISIC Std | EyePACS Mean | EyePACS Std | Overall Mean | Overall Std |
|---|---|---|---|---|---|---|---|---|
| Baseline | 84.03% | ±1.83 | 68.93% | ±3.99 | 64.41% | ±6.99 | 72.46% | ±4.27 |
| Overfit | 77.89% | ±2.26 | 67.80% | ±3.46 | 62.72% | ±7.01 | 69.47% | ±4.25 |
| DP | 73.69% | ±2.68 | 68.13% | ±4.31 | 61.58% | ±7.06 | 67.80% | ±4.68 |
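The Overall columns appear to be the unweighted average of the three per-dataset columns, for both the mean and the standard deviation. This is inferred from the numbers rather than stated in the caption. The minimal sketch below recomputes the Overall values from the transcribed per-dataset entries under that assumption; the names `TABLE3` and `overall` are illustrative only.

```python
# Per-dataset (mean accuracy in %, std in percentage points), transcribed from Table 3.
TABLE3 = {
    "Baseline": {"SCDB": (84.03, 1.83), "ISIC": (68.93, 3.99), "EyePACS": (64.41, 6.99)},
    "Overfit": {"SCDB": (77.89, 2.26), "ISIC": (67.80, 3.46), "EyePACS": (62.72, 7.01)},
    "DP": {"SCDB": (73.69, 2.68), "ISIC": (68.13, 4.31), "EyePACS": (61.58, 7.06)},
}


def overall(row):
    """Assumed aggregation: unweighted average of the per-dataset means and stds.

    Note that averaging standard deviations is not a pooled standard deviation;
    this only checks whether the table's Overall columns follow this simple rule.
    """
    means = [mean for mean, _ in row.values()]
    stds = [std for _, std in row.values()]
    return round(sum(means) / len(means), 2), round(sum(stds) / len(stds), 2)


for model, row in TABLE3.items():
    print(model, overall(row))
```

The recomputed values match the table to within 0.01 percentage points. The one mismatch, the Overfit standard deviation (4.24 recomputed vs. ±4.25 reported), is consistent with the source averaging unrounded per-dataset values before rounding.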