| Hyperparameter | Description | Search range | Selected value |
| --- | --- | --- | --- |
| Number of layers | Number of recurrent layers stacked on top of each other. | [1; 3] | 1 |
| Hidden size | Size of the hidden state vector. | [10; 500] | 300 |
| Learning rate | Rate at which network weights are updated during training. | [10⁻⁶; 1] | 0.0023 |
| L2 regularization | Strength of the L2 weight regularization. | [0; 10] | 0.0052 |
| Gradient clipping | Whether gradient clipping (Pascanu et al., 2013), which limits the gradient magnitude to a specified maximum value, is applied. | [yes; no] | Yes |
| Max. gradient | Value at which the gradients are clipped. | [0.1; 2] | 1 |
| Dropout | Fraction of units set to 0 during training for regularization purposes (Srivastava et al., 2014). | [0; 0.2] | 0 |
| Residual connection | Whether the input is fed directly to the linear decoder, bypassing the RNN's computation. | [yes; no] | No |
| Batch size | Number of training trials fed into the network before each weight update. | [3; 20] | 12 |
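
To make the table concrete, the sketch below wires the selected values into a small PyTorch training setup. It is a minimal illustration, not the authors' implementation: the RNN cell type, the optimizer (Adam), the loss function, and the input/output dimensions are assumptions for illustration; only the hyperparameter values themselves come from the table. Here the L2 strength is mapped onto the optimizer's `weight_decay` term, which is one common way to realize an L2 penalty.

```python
# Minimal sketch: selected hyperparameters from the table in a PyTorch setup.
# Task dimensions, cell type, optimizer, and loss are assumptions.
import torch
import torch.nn as nn

INPUT_SIZE, OUTPUT_SIZE = 64, 2  # hypothetical task dimensions


class RecurrentModel(nn.Module):
    def __init__(self, num_layers=1, hidden_size=300, dropout=0.0,
                 residual=False):
        super().__init__()
        self.residual = residual
        # Recurrent layers are stacked via num_layers; PyTorch applies
        # dropout between stacked layers, so it is only active if > 1.
        self.rnn = nn.RNN(INPUT_SIZE, hidden_size, num_layers=num_layers,
                          dropout=dropout if num_layers > 1 else 0.0,
                          batch_first=True)
        # With a residual connection, the input bypasses the RNN and is
        # concatenated with its output before the linear decoder.
        decoder_in = hidden_size + (INPUT_SIZE if residual else 0)
        self.decoder = nn.Linear(decoder_in, OUTPUT_SIZE)

    def forward(self, x):
        out, _ = self.rnn(x)
        if self.residual:
            out = torch.cat([out, x], dim=-1)
        return self.decoder(out)


# Selected values: 1 layer, hidden size 300, no dropout, no residual.
model = RecurrentModel(num_layers=1, hidden_size=300, dropout=0.0,
                       residual=False)
# L2 strength 0.0052 as weight decay; Adam itself is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0023,
                             weight_decay=0.0052)
criterion = nn.MSELoss()  # placeholder loss


def train_step(batch_x, batch_y):
    """One weight update on a batch of training trials."""
    optimizer.zero_grad()
    loss = criterion(model(batch_x), batch_y)
    loss.backward()
    # Gradient clipping (Pascanu et al., 2013) at the selected maximum of 1.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()


# Example update with random data at the selected batch size of 12.
x = torch.randn(12, 50, INPUT_SIZE)   # (batch, time steps, features)
y = torch.randn(12, 50, OUTPUT_SIZE)
train_step(x, y)
```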