Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: Transact Mach Learn Res. 2023 Jun;2023:https://openreview.net/forum?id=K0CAGgjYS1.

Figure 1:

For fixed R=1 and η=0.1, ViT-base trained with DP-SGD under various noise levels σ achieves similar performance on CIFAR10 (setting in Section 5.3). Here ‘non-DP’ means both σ=0 and no clipping. The loss curves for different σ are very similar to each other (though not identical) because we fix the random seed at the beginning of each iteration across the different runs. This eliminates differences caused by uncontrolled random realizations and makes the comparison fair.
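
To make the roles of R, η, σ and the per-iteration seed concrete, the following is a minimal NumPy sketch of one DP-SGD update. It assumes the textbook formulation (clip each per-sample gradient to norm at most R, add Gaussian noise with standard deviation σR to the summed gradient, then average over the batch) and is not the code used for the figure. The function name `dp_sgd_step`, its arguments, and the toy gradients are hypothetical and chosen only for illustration.

```python
import numpy as np

def dp_sgd_step(w, per_sample_grads, R=1.0, eta=0.1, sigma=1.0, seed=0, clip=True):
    """One illustrative DP-SGD update on a flat parameter vector w.

    Assumed (textbook) update, not taken from the paper's code:
        w <- w - eta * (sum_i clip(g_i, R) + sigma * R * N(0, I)) / B
    """
    g = np.asarray(per_sample_grads, dtype=float)  # shape (B, d)
    if clip:
        # Rescale each per-sample gradient so its L2 norm is at most R.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, R / (norms + 1e-12))
    # Re-seeding at every iteration means runs with different sigma
    # see the same standard-normal draw, scaled by sigma * R.
    rng = np.random.default_rng(seed)
    noise = sigma * R * rng.standard_normal(w.shape)
    return w - eta * (g.sum(axis=0) + noise) / g.shape[0]

# Toy batch of two per-sample gradients (hypothetical values).
w0 = np.zeros(3)
grads = np.array([[3.0, 4.0, 0.0], [0.3, 0.4, 0.0]])

w_sigma0 = dp_sgd_step(w0, grads, sigma=0.0, seed=7)             # clipping only
w_sigma1 = dp_sgd_step(w0, grads, sigma=1.0, seed=7)
w_sigma2 = dp_sgd_step(w0, grads, sigma=2.0, seed=7)
w_non_dp = dp_sgd_step(w0, grads, sigma=0.0, seed=7, clip=False)  # plain SGD
```

In a training loop one would pass the iteration index (or a value derived from it) as `seed`. Under this assumption, runs with different σ share the same noise direction at every step and differ only in its scale, which is consistent with the curves being close but not identical.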