2022 Mar 4;13:1161. doi: 10.1038/s41467-022-28818-3

Table 2.

Classification accuracy under noisy evaluation labels.

| Model | True | Noisy (40%) | Cleaned (10%) | Cleaned (20%) |
|-------|------|-------------|---------------|---------------|
| **SYM** | | | | |
| M1 | 73.32 | 45.19 (0.21) | 54.37 (0.13) | 62.53 (0.09) |
| M2 | **80.16** | **48.93 (0.23)** | **58.45 (0.10)** | **67.42 (0.10)** |
| **IDN** | | | | |
| M1 | **69.91** | 45.93 (0.10) | **49.97 (0.06)** | **55.87 (0.06)** |
| M2 | 65.76 | **46.50 (0.13)** | 49.90 (0.12) | 55.28 (0.10) |

Co-teaching (M1) and SSL (M2) models are compared on a noisy CIFAR10H validation set D_eval over three runs with different label initialisations. Both approaches use ResNet-50 with different weight initialisation and regularisation. We compare classification accuracy on the true, noisy, and cleaned labels. M2 is deliberately less regularised (i.e. lower weight decay) and is expected to perform worse under the more challenging IDN noise model. For each validation set, the highest accuracy is highlighted in bold.
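To illustrate why accuracy measured against noisy labels is capped well below accuracy against true labels, the following sketch (not from the paper; the function and variable names are hypothetical) applies 40% symmetric (SYM) label noise to a set of labels and scores a hypothetical perfect predictor against both the true and the noisy label sets:

```python
import numpy as np

def symmetric_noise(labels, rate, num_classes, rng):
    """Flip each label with probability `rate` to a uniformly random
    *different* class (the SYM noise model)."""
    labels = labels.copy()
    flip = rng.random(labels.shape) < rate
    # offsets in 1..num_classes-1 guarantee the flipped label differs
    offsets = rng.integers(1, num_classes, size=labels.shape)
    labels[flip] = (labels[flip] + offsets[flip]) % num_classes
    return labels

rng = np.random.default_rng(0)
num_classes, n = 10, 10_000
true = rng.integers(0, num_classes, size=n)

# hypothetical "perfect" predictor: always outputs the true label
preds = true.copy()

noisy = symmetric_noise(true, rate=0.40, num_classes=num_classes, rng=rng)

acc_true = (preds == true).mean()
acc_noisy = (preds == noisy).mean()
print(f"accuracy vs true labels:  {acc_true:.2%}")   # 100%
print(f"accuracy vs noisy labels: {acc_noisy:.2%}")  # ~60% (1 - noise rate)
```

Even an oracle predictor scores only about 1 − 0.40 = 60% against 40%-noisy labels, which is consistent with the gap between the "True" and "Noisy (40%)" columns in the table; label cleaning recovers part of that gap.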