| Model | True | Noisy (40%) | Cleaned (10%) | Cleaned (20%) |
|---|---|---|---|---|
| **SYM** | | | | |
| M1 | 73.32 | 45.19 (0.21) | 54.37 (0.13) | 62.53 (0.09) |
| M2 | **80.16** | **48.93** (0.23) | **58.45** (0.10) | **67.42** (0.10) |
| **IDN** | | | | |
| M1 | **69.91** | 45.93 (0.10) | **49.97** (0.06) | **55.87** (0.06) |
| M2 | 65.76 | **46.50** (0.13) | 49.90 (0.12) | 55.28 (0.10) |
Table 2. Co-teaching (M1) and SSL (M2) models are compared on a noisy CIFAR10H validation set over three runs with different label initialisations; parentheses give the standard deviation over the three runs. Both approaches use ResNet-50 with different weight initialisations and regularisation. We compare classification accuracy on true, noisy, and cleaned labels. M2 is deliberately less regularised (e.g. lower weight decay) and is expected to perform worse under the more challenging IDN noise model. For each validation set, the highest accuracy within each noise model is highlighted in bold.
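To make the SYM condition concrete, the sketch below shows how symmetric label noise at a 40% rate is typically injected: each label is flipped, with probability equal to the noise rate, to a class drawn uniformly from the remaining classes. This is a minimal illustration assuming SYM denotes standard symmetric noise; `add_symmetric_noise` and its parameter names are illustrative, not taken from the paper's codebase.

```python
import numpy as np

def add_symmetric_noise(labels: np.ndarray, noise_rate: float,
                        num_classes: int = 10, seed: int = 0) -> np.ndarray:
    """Return a copy of `labels` with a `noise_rate` fraction flipped
    uniformly to one of the other classes (symmetric noise)."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    # Decide independently for each sample whether its label is corrupted.
    flip = rng.random(len(labels)) < noise_rate
    for i in np.flatnonzero(flip):
        # Sample a replacement class different from the original label.
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy

# Example: 40% symmetric noise on CIFAR-10-style labels (10 classes).
clean = np.random.default_rng(1).integers(0, 10, size=10000)
noisy = add_symmetric_noise(clean, noise_rate=0.40)
print("observed noise rate:", (clean != noisy).mean())
```

Instance-dependent noise (IDN), by contrast, corrupts labels with a probability that depends on each input's features, which is why it is the harder condition in the table; its exact construction follows the paper's IDN noise model and is not reproduced here.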