eLife. 2021 Aug 31;10:e68980. doi: 10.7554/eLife.68980

Table 1. Optimized GRU hyperparameters.

| Parameter name | Description | Range | Final value |
| --- | --- | --- | --- |
| Number of layers | Multiple recurrent layers could be stacked on top of each other. | [1; 3] | 1 |
| Hidden size | Size of the hidden state vector. | [10; 500] | 300 |
| Learning rate | The rate at which network weights were updated during training. | [10⁻⁶; 1] | 0.0023 |
| L2 | Strength of the L2 weight regularization. | [0; 10] | 0.0052 |
| Gradient clipping | Gradient clipping (Pascanu et al., 2013) limits the gradient magnitude to a specified maximum value. | [yes; no] | Yes |
| Max. gradient | Value at which the gradients are clipped. | [0.1; 2] | 1 |
| Dropout | During training, a fraction of units could be set to 0 for regularization purposes (Srivastava et al., 2014). | [0; 0.2] | 0 |
| Residual connection | Feeding the input directly to the linear decoder, bypassing the RNN's computation. | [yes; no] | No |
| Batch size | The number of training trials fed into the network before each weight update. | [3; 20] | 12 |
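For concreteness, the sketch below shows one way the final values in Table 1 could be assembled into a trainable model. It is a minimal illustration, not the authors' implementation: the PyTorch framework, the Adam optimizer, the MSE loss, and the input/output dimensions (`N_CHANNELS`, `N_OUTPUTS`) are all assumptions, and gradient-norm clipping is used following the Pascanu et al. (2013) citation.

```python
# Hypothetical sketch of a GRU decoder using the final hyperparameters
# from Table 1. Framework (PyTorch), optimizer, loss, and the
# N_CHANNELS / N_OUTPUTS dimensions are assumptions for illustration.
import torch
import torch.nn as nn

N_CHANNELS = 64  # hypothetical: input features per time step
N_OUTPUTS = 2    # hypothetical: dimensionality of the decoded output


class GRUDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Number of layers = 1, hidden size = 300, dropout = 0 (Table 1)
        self.rnn = nn.GRU(input_size=N_CHANNELS, hidden_size=300,
                          num_layers=1, dropout=0.0, batch_first=True)
        self.decoder = nn.Linear(300, N_OUTPUTS)
        # Residual connection = No (Table 1): the input is not fed
        # directly to the linear decoder.

    def forward(self, x):
        out, _ = self.rnn(x)      # (batch, time, 300)
        return self.decoder(out)  # (batch, time, N_OUTPUTS)


model = GRUDecoder()
# Learning rate = 0.0023; L2 strength = 0.0052, expressed here as Adam
# weight decay (the paper's exact regularization setup is an assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=0.0023,
                             weight_decay=0.0052)
loss_fn = nn.MSELoss()


def training_step(batch_x, batch_y):
    # batch_x / batch_y would hold 12 trials per update (batch size = 12).
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    # Gradient clipping enabled, max. gradient = 1 (Table 1)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```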