2018 Jul 27;115(33):E7665–E7671. doi: 10.1073/pnas.1806579115

Fig. 4.


Separating two isotropic Gaussians, with a nonmonotone activation function (see Predicting Failure for details). Here N=800, d=320, and Δ=0.5. The main frame presents the evolution of the population risk along the SGD trajectory, starting from two different initializations of $(w_i^0)_{i\le N} \sim_{\mathrm{iid}} \mathsf{N}(0, \kappa^2/d\, I_d)$ for either κ=0.1 or κ=0.4. In Inset, we plot the evolution of the average of $\|w\|_2$ for the same conditions. Symbols are empirical results. Continuous lines are predictions obtained with the reduced PDE (Eq. 13).
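
The initialization described in the caption can be illustrated with a minimal sketch. It only draws the initial weights and computes the quantity tracked in the inset at time zero; it does not run SGD or solve the reduced PDE. The function names `sample_initial_weights` and `mean_l2_norm` and the fixed seed are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def sample_initial_weights(n_units, dim, kappa, seed=0):
    """Draw n_units i.i.d. vectors from N(0, kappa^2/dim * I_dim).

    Hypothetical helper: each coordinate is an independent Gaussian with
    standard deviation kappa / sqrt(dim), which gives covariance
    (kappa^2 / dim) * I_dim for each vector.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    std = kappa / math.sqrt(dim)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_units)]


def mean_l2_norm(weights):
    """Average Euclidean norm over all weight vectors."""
    return sum(math.sqrt(sum(x * x for x in w)) for w in weights) / len(weights)


if __name__ == "__main__":
    # Values of N, d, and kappa taken from the caption.
    for kappa in (0.1, 0.4):
        w0 = sample_initial_weights(n_units=800, dim=320, kappa=kappa)
        print(f"kappa={kappa}: average ||w||_2 at initialization = {mean_l2_norm(w0):.4f}")
```

With this scaling, the squared norm of each vector has expectation κ², so for d=320 the average norm at initialization should sit close to κ itself. This is why the two choices of κ correspond to two distinct starting points for the curves in the inset.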