2020 Dec;132:428–446. doi: 10.1016/j.neunet.2020.08.022

Fig. 11.
Impact of initialization on the performance of wide networks. We plot the generalization error decomposition as in Fig. 9 for (A) an under-parameterized (Nh=10) and (B) an over-parameterized (Nh=200) network. We demonstrate that large initializations harm generalization in wide, over-parameterized networks but not in smaller ones. This effect is due to the frozen subspace that arises when training with fewer examples than free parameters, and again contrasts with our finding that larger networks perform better in the small-initialization setting depicted in Fig. 9. Other parameters: P=50, Nt=20, σϵ=0, σw¯=1.