2019 Dec 23;117(1):161–170. doi: 10.1073/pnas.1908636117

Fig. 5.

Spectra of the Hessian for the same solutions as in Fig. 4, for various algorithms. The spectra are directly comparable since they are all computed on the same loss function (MSE; using CE does not change the results qualitatively) and the networks are normalized. (Top) The results with the parameter β of the activation functions set to a value such that all solutions of all algorithms are still valid; this value is determined exclusively by the LAL algorithm. (Bottom) The results for a much lower value of β, which can be used once the LAL solutions are removed; here differences between ceSGD-slow, eLAL and fBP that were not visible at higher β can emerge (the spectrum of LAL would still be the widest by far even at this β).
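
As a rough illustration of what a Hessian spectrum of the MSE loss is, the following minimal sketch computes one for a toy network. Everything in it is an assumption made for illustration and not the setup used for the figure: the two-layer architecture, the tanh(βx) form of the activation, the random data, the finite-difference Hessian, and the function names `mse_loss`, `numerical_hessian` and `hessian_spectrum`. It also omits the network normalization mentioned in the caption.

```python
import numpy as np


def mse_loss(w, X, y, beta, n_hidden):
    """MSE loss of a toy two-layer network (hypothetical architecture).

    w is a flat parameter vector: first the input-to-hidden weights,
    then the hidden-to-output weights. The activation tanh(beta * x)
    is an assumed form, chosen only for illustration.
    """
    n_in = X.shape[1]
    W1 = w[: n_in * n_hidden].reshape(n_in, n_hidden)
    w2 = w[n_in * n_hidden:]
    hidden = np.tanh(beta * (X @ W1))
    out = hidden @ w2
    return 0.5 * np.mean((out - y) ** 2)


def numerical_hessian(f, w, eps=1e-4):
    """Symmetric Hessian of f at w via central finite differences."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = eps
        for j in range(i, n):
            e_j = np.zeros(n)
            e_j[j] = eps
            # Second-order central difference for d^2 f / (dw_i dw_j).
            H[i, j] = (
                f(w + e_i + e_j) - f(w + e_i - e_j)
                - f(w - e_i + e_j) + f(w - e_i - e_j)
            ) / (4.0 * eps ** 2)
            H[j, i] = H[i, j]
    return H


def hessian_spectrum(w, X, y, beta, n_hidden):
    """Eigenvalues (ascending) of the Hessian of the MSE loss at w."""
    f = lambda v: mse_loss(v, X, y, beta, n_hidden)
    return np.linalg.eigvalsh(numerical_hessian(f, w))


# Toy usage on random data (not data from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.choice([-1.0, 1.0], size=20)
w = rng.standard_normal(3 * 2 + 2)
spectrum = hessian_spectrum(w, X, y, beta=1.0, n_hidden=2)
print(spectrum)
```

Finite differences keep the sketch dependency-free but scale quadratically with the number of parameters; for networks of realistic size an automatic-differentiation library would likely be the more practical choice.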