Fig. 5.
Spectra of the Hessian for the same solutions as in Fig. 4, for various algorithms. The spectra are directly comparable since they are all computed on the same loss function (MSE; using CE does not change the results qualitatively) and the networks are normalized. (Top) The results with the parameter of the activation functions set to a value such that all solutions of all algorithms are still valid; this value is determined exclusively by the LAL algorithm. (Bottom) The results for a much lower value of this parameter, which can be used when the LAL solutions are removed, and at which differences between ceSGD-slow, eLAL and fBP that were not visible at the higher value can emerge (the spectrum of LAL would still be the widest by far even at this lower value).