Separating two isotropic Gaussians with a nonmonotone activation function (see Predicting Failure for details). Here , , and . The main frame shows the evolution of the population risk along the SGD trajectory, starting from two different initializations of for either or . The inset shows the evolution of the average of for the same conditions. Symbols are empirical results; continuous lines are predictions obtained with the reduced PDE (Eq. 13).