Table 1.
Row | Selector | Scoring | Classifier | Before cleaning | After cleaning |
---|---|---|---|---|---|
(1) | Vanilla | Eq. (2) | Vanilla | 64.1 | 68.4 |
(2) | Vanilla | BALD [34] | Vanilla | 64.1 | 68.3 |
(3) | SSL | Eq. (2) | Vanilla | 64.1 | 70.9 |
(4) | SSL | Eq. (2) | SSL | 78.7 | 80.3 |
(5) | Vanilla | Eq. (2) | SSL | 78.7 | 79.4 |
(6) | Co-teaching | Eq. (2) | Co-teaching | 66.5 | 68.8 |
(7) | – | – | ELR [29] | 67.0 | – |
(8) | (Clean training) | – | Vanilla | 73.6 | – |
(9) | (Clean training) | – | SSL | 80.7 | – |
Models are evaluated on a clean test set (N = 50 k) before and after relabelling 32.7% of samples in the training set (CIFAR10H, N = 5 k, η = 30%).
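
To make the before/after comparison easier to read, the following is a minimal sketch that computes the change in each row's score after cleaning. The numbers are copied from rows (1) to (6) of Table 1; the names `rows` and `cleaning_gains` are hypothetical helpers introduced only for this illustration, and the differences are expressed in the table's own units.

```python
# Values copied from Table 1: (row, selector, classifier, before, after).
# Rows (7)-(9) are omitted because they report no "after cleaning" value.
rows = [
    ("(1)", "Vanilla", "Vanilla", 64.1, 68.4),
    ("(2)", "Vanilla", "Vanilla", 64.1, 68.3),
    ("(3)", "SSL", "Vanilla", 64.1, 70.9),
    ("(4)", "SSL", "SSL", 78.7, 80.3),
    ("(5)", "Vanilla", "SSL", 78.7, 79.4),
    ("(6)", "Co-teaching", "Co-teaching", 66.5, 68.8),
]


def cleaning_gains(table):
    """Return {row_id: after - before}, rounded to one decimal place."""
    return {rid: round(after - before, 1) for rid, _, _, before, after in table}


gains = cleaning_gains(rows)
for rid, gain in gains.items():
    print(f"{rid}: +{gain:.1f}")
```

Read this way, among the rows that share the Vanilla classifier, the SSL selector in row (3) shows the largest change, and among the rows that share the SSL classifier, row (4) changes more than row (5).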