2018 Jun 19;9:2383. doi: 10.1038/s41467-018-04316-3

Fig. 6.


Model accuracy using three weight regularization techniques on the Fashion-MNIST dataset. All models were trained with stochastic gradient descent using the same hyper-parameters, the same number of hidden layers (three), and the same number of hidden neurons per layer (1000). Panels a–c use the ReLU activation function for the hidden neurons with Nesterov momentum; d–f use ReLU without Nesterov momentum; g–i use the SReLU activation function with Nesterov momentum; and j–l use SReLU without Nesterov momentum. Panels a, d, g, j present experiments with SET-MLP; b, e, h, k with MLPFixProb; and c, f, i, l with MLP.
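The caption varies two training components: the hidden-layer activation (ReLU vs. SReLU) and the optimizer (SGD with or without Nesterov momentum). The plain-Python sketch below shows the standard textbook forms of these components so the four panel groups are easier to tell apart. It is not the authors' implementation. The SReLU threshold and slope defaults, the learning rate, the momentum coefficient, and the toy quadratic loss are all hypothetical values chosen for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def srelu(x: float, t_l: float = -0.4, a_l: float = 0.2,
          t_r: float = 0.4, a_r: float = 0.2) -> float:
    """S-shaped ReLU: identity between the thresholds t_l and t_r,
    with separate linear slopes a_l (left) and a_r (right) outside them.

    In SReLU these four values are learned per neuron; the defaults here
    are hypothetical placeholders chosen only for illustration.
    """
    if x >= t_r:
        return t_r + a_r * (x - t_r)
    if x <= t_l:
        return t_l + a_l * (x - t_l)
    return x


def sgd_momentum_step(w: Vector, v: Vector,
                      grad_fn: Callable[[Vector], Vector],
                      lr: float, mu: float, nesterov: bool) -> Tuple[Vector, Vector]:
    """One SGD step with classical (nesterov=False) or Nesterov momentum."""
    # Nesterov momentum evaluates the gradient at the look-ahead point w + mu * v;
    # classical momentum evaluates it at the current weights w.
    point = [wi + mu * vi for wi, vi in zip(w, v)] if nesterov else w
    g = grad_fn(point)
    v_new = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def run(nesterov: bool, steps: int = 2, lr: float = 0.1, mu: float = 0.9) -> Vector:
    """Minimize the hypothetical toy loss 0.5 * sum(w_i^2), whose gradient is w."""
    w, v = [1.0], [0.0]
    for _ in range(steps):
        w, v = sgd_momentum_step(w, v, lambda p: list(p), lr, mu, nesterov)
    return w


if __name__ == "__main__":
    print(run(nesterov=False))  # approximately [0.72]
    print(run(nesterov=True))   # approximately [0.729]
```

The two momentum variants differ only in where the gradient is evaluated. On this toy quadratic the Nesterov variant takes a slightly smaller second step, because its look-ahead gradient already accounts for the accumulated velocity.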