a, Model metamers were generated for each stage of a ‘generation’ model (one of the models from Figs. 2c,d, 3, 4 and 6g for visual models and from Figs. 2f and 5 for auditory models). These metamers were presented to ‘recognition’ models (all other models from the listed figures). We measured each recognition model’s accuracy on the generation model’s metamers and averaged accuracy over all recognition models (excluding the generation model), as shown here for a standard-trained ResNet50 image model. Error bars represent s.e.m. over N = 28 recognition models. b, Average model recognition of metamers from the standard ResNet50, the three self-supervised ResNet50 models and the three adversarially trained ResNet50 models. To obtain self-supervised and adversarially trained results, we averaged each recognition model’s accuracy curve across all generation models and averaged these curves across recognition models. Error bars represent s.e.m. over N = 28 recognition models for standard models and N = 29 recognition models for adversarially trained and self-supervised models. c, Same as b but for the standard AlexNet, LowpassAlexNet and VOneAlexNet models from Fig. 6d–h. Error bars represent s.e.m. over N = 28 recognition models. d, Same as b but for auditory models, with metamers generated from the standard CochResNet50, the three CochResNet50 models with waveform adversarial perturbations and the two CochResNet50 models with cochleagram adversarial perturbations. Chance performance for models is 1/794 because they had a ‘null’ class in addition to 793 word labels. Error bars represent s.e.m. over N = 16 recognition models for the standard model and N = 17 recognition models for adversarially trained models. e,f, Correlation between human and model recognition of another model’s metamers for visual (e; N = 219 model stages) and auditory (f; N = 144 model stages) models. 
The abscissa plots average human recognition accuracy of metamers generated from one stage of a model, and error bars represent s.e.m. across participants. The ordinate plots average recognition of those metamers by other models, and error bars represent s.e.m. across recognition models. Human recognition of a model’s metamers is highly correlated with other models’ recognition of those same metamers.
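
The averaging described in this caption can be illustrated with a minimal sketch. It is not the authors' analysis code: the data layout (a dictionary mapping each recognition model to its per-stage accuracies), the function names, the use of the sample standard deviation for the s.e.m. and the choice of a Pearson correlation are all assumptions made for illustration, since the caption does not specify them.

```python
import math
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and s.e.m. (sample SD / sqrt(N); the SD convention is an assumption)."""
    values = list(values)
    return mean(values), stdev(values) / math.sqrt(len(values))


def recognition_curve(accuracy_by_model, generation_model):
    """Average per-stage accuracy over all recognition models, excluding the generator.

    accuracy_by_model is a hypothetical layout: recognition model name ->
    list of accuracies, one per stage of the generation model.
    """
    curves = [acc for name, acc in accuracy_by_model.items()
              if name != generation_model]
    # zip(*curves) groups accuracies stage by stage across recognition models.
    return [mean_and_sem(stage_values) for stage_values in zip(*curves)]


def pearson_r(x, y):
    """Pearson correlation, written out by hand (the correlation type is an assumption)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Chance level for the auditory models: 793 word labels plus one 'null' class.
AUDIO_CHANCE = 1 / (793 + 1)
```

For panels e and f, one point per model stage would pair the human average (abscissa) with the output of `recognition_curve` for that stage (ordinate), and `pearson_r` would then be applied across all stages.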