2022 Mar 4;13:1161. doi: 10.1038/s41467-022-28818-3

Table 2.

Classification accuracy under noisy evaluation labels.

| Model | True | Noisy (40%) | Cleaned (10%) | Cleaned (20%) |
|-------|------|-------------|---------------|---------------|
| **SYM** | | | | |
| M1 | 73.32 | 45.19 (0.21) | 54.37 (0.13) | 62.53 (0.09) |
| M2 | **80.16** | **48.93 (0.23)** | **58.45 (0.10)** | **67.42 (0.10)** |
| **IDN** | | | | |
| M1 | **69.91** | 45.93 (0.10) | **49.97 (0.06)** | **55.87 (0.06)** |
| M2 | 65.76 | **46.50 (0.13)** | 49.90 (0.12) | 55.28 (0.10) |

Co-teaching (M1) and SSL (M2) models are compared on a noisy CIFAR10H validation set D_eval over three runs with different label initialisations. Both approaches use ResNet-50 with different weight initialisation and regularisation. We compare classification accuracy on the true, noisy, and cleaned labels. M2 is deliberately less regularised (i.e. lower weight decay) and is expected to perform worse under the more challenging IDN noise model. For each validation set, the highest accuracy is highlighted in bold.
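To illustrate why accuracy measured against noisy labels is capped well below accuracy against true labels, the following sketch (not from the paper; the function and variable names are hypothetical) applies 40% symmetric (SYM) label noise to a set of labels and scores a hypothetical perfect predictor against both the true and the noisy label sets:

```python
import numpy as np

def symmetric_noise(labels, rate, num_classes, rng):
    """Flip each label with probability `rate` to a uniformly random
    *different* class (the SYM noise model)."""
    labels = labels.copy()
    flip = rng.random(labels.shape) < rate
    # offsets in 1..num_classes-1 guarantee the flipped label differs
    offsets = rng.integers(1, num_classes, size=labels.shape)
    labels[flip] = (labels[flip] + offsets[flip]) % num_classes
    return labels

rng = np.random.default_rng(0)
num_classes, n = 10, 10_000
true = rng.integers(0, num_classes, size=n)

# hypothetical "perfect" predictor: always outputs the true label
preds = true.copy()

noisy = symmetric_noise(true, rate=0.40, num_classes=num_classes, rng=rng)

acc_true = (preds == true).mean()
acc_noisy = (preds == noisy).mean()
print(f"accuracy vs true labels:  {acc_true:.2%}")   # 100%
print(f"accuracy vs noisy labels: {acc_noisy:.2%}")  # ~60% (1 - noise rate)
```

Even an oracle predictor scores only about 1 − 0.40 = 60% against 40%-noisy labels, which is consistent with the gap between the "True" and "Noisy (40%)" columns in the table; label cleaning recovers part of that gap.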