2023 Oct 16;26(11):2017–2034. doi: 10.1038/s41593-023-01442-0

Extended Data Fig. 9. Adversarial robustness of VOneAlexNet and LowPassAlexNet to different types of adversarial examples.

a, Adversarial robustness to “fooling images”. Fooling images (ref. 14) are constructed from random noise initializations (the same noise type used for initialization during model metamer generation) by making small Lp-constrained perturbations that cause the model to classify the noise as a particular target class. LowPassAlexNet and VOneAlexNet were more robust than the standard AlexNet for all perturbation types (ANOVA comparing VOneAlexNet or LowPassAlexNet to the standard architecture; main effect of architecture; F(1,8) > 6787.0, p < 0.031, ηₚ² > 0.999, for all adversarial perturbation types in both cases). Although there was a significant robustness difference between LowPassAlexNet and VOneAlexNet, it was in the opposite direction to the difference in metamer recognizability: VOneAlexNet was more robust (ANOVA comparing VOneAlexNet to LowPassAlexNet; main effect of architecture; F(1,8) > 98.6, p < 0.031, ηₚ² > 0.924 for all perturbation types). Error bars plot s.e.m. across five sets of target labels.

b, Adversarial robustness to feature adversaries. Feature adversaries (ref. 56) are constructed by perturbing a natural “source” image so that it yields model activations (at a particular stage) that are close to those evoked by a different natural “target” image, while constraining the perturbed image to remain within a small distance from the original natural image in pixel space. The robustness measure plotted here is averaged across adversaries generated for all stages of a model. LowPassAlexNet and VOneAlexNet were more robust than the standard AlexNet for all perturbation types (ANOVA comparing VOneAlexNet or LowPassAlexNet to the standard architecture; main effect of architecture; F(1,8) > 90.8, p < 0.031, ηₚ² > 0.919, for all adversarial perturbation types). Although there was a significant robustness difference between LowPassAlexNet and VOneAlexNet, it was again in the opposite direction to the difference in metamer recognizability: VOneAlexNet was more robust (ANOVA comparing VOneAlexNet to LowPassAlexNet; main effect of architecture; F(1,8) > 69.0, p < 0.031, ηₚ² > 0.895 for all perturbation types). Here and in c, error bars plot s.e.m. across five samples of target and source images.

c, Performance on feature adversaries for each model stage used to obtain the average curve in b. LowPassAlexNet is not more robust than VOneAlexNet to any type of adversarial example, even though it has more recognizable model metamers (Fig. 6g), illustrating that metamers reveal a different type of model discrepancy than that revealed by typical metrics of adversarial robustness.
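
For readers unfamiliar with feature adversaries, the construction described in panel b can be illustrated with a minimal sketch. This is not the authors' procedure: the linear "model stage", the L∞ constraint, the signed-gradient update, and all names and parameter values (`features`, `feature_adversary`, `eps`, `step`, `n_steps`) are assumptions chosen to keep the example short and dependency-free. It shows only the general idea of pulling a source image's activations toward those of a target image while staying within a small pixel-space distance of the source.

```python
import random


def features(weights, image):
    """Toy linear 'model stage' (an assumption): one activation per weight row."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def feature_distance(weights, image_a, image_b):
    """Squared Euclidean distance between the activations of two images."""
    act_a = features(weights, image_a)
    act_b = features(weights, image_b)
    return sum((u - v) ** 2 for u, v in zip(act_a, act_b))


def feature_adversary(weights, source, target, eps=0.05, step=0.01, n_steps=50):
    """Perturb `source` so its activations approach those of `target`.

    Uses signed-gradient steps, then projects each pixel back into the
    L-infinity ball of radius `eps` around the source and into [0, 1].
    """
    target_act = features(weights, target)
    adv = list(source)
    for _ in range(n_steps):
        # Activation mismatch at the current iterate (fixed for this step).
        diff = [a - t for a, t in zip(features(weights, adv), target_act)]
        for j in range(len(adv)):
            # Gradient of the squared activation distance w.r.t. pixel j.
            grad_j = 2.0 * sum(d * row[j] for d, row in zip(diff, weights))
            sign = (grad_j > 0) - (grad_j < 0)
            moved = adv[j] - step * sign
            # Stay within eps of the source pixel, then within valid pixel range.
            moved = min(max(moved, source[j] - eps), source[j] + eps)
            adv[j] = min(max(moved, 0.0), 1.0)
    return adv


if __name__ == "__main__":
    rng = random.Random(0)
    n_pixels, n_units = 8, 4  # hypothetical toy sizes
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)] for _ in range(n_units)]
    source = [rng.random() for _ in range(n_pixels)]
    target = [rng.random() for _ in range(n_pixels)]
    adv = feature_adversary(weights, source, target)
    print("distance before:", feature_distance(weights, source, target))
    print("distance after: ", feature_distance(weights, adv, target))
    print("max pixel change:", max(abs(a - s) for a, s in zip(adv, source)))
```

In this sketch, a more robust model would be one for which the activation distance stays large for a given `eps`. In a real network the gradient would come from automatic differentiation rather than the closed form used here.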