2023 Aug 18;9:e1511. doi: 10.7717/peerj-cs.1511

Table 3. Optimal hyper-parameters for the GRU, LSTM, and BiLSTM neural network architectures for Task 1.

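| Model | Vocabulary size | Embedding dimension | Dropout | LSTM/GRU layer size | FF layer size | Importance | Optimizer |
|---|---|---|---|---|---|---|---|
| GRU | 2,500 | 8 | 0.199 | 32 | 8 | 1.282 | RMSprop |
| GRU | 5,000 | 8 | 0.367 | 4 | 32 | 1.016 | RMSprop |
| GRU | 10,000 | 8 | 0.681 | 64 | 256 | 1.228 | RMSprop |
| GRU | 16,000 | 8 | 0.677 | 8 | 8 | 1.046 | Nadam |
| GRU | 32,000 | 8 | 0.649 | 4 | 256 | 1.042 | Nadam |
| GRU | 64,000 | 8 | 0.593 | 4 | 64 | 1.195 | Adamax |
| LSTM | 2,500 | 8 | 0.340 | 8 | 256 | 2.423 | Nadam |
| LSTM | 5,000 | 8 | 0.650 | 16 | 16 | 1.268 | Nadam |
| LSTM | 10,000 | 8 | 0.594 | 16 | 64 | 1.598 | Adam |
| LSTM | 16,000 | 8 | 0.189 | 64 | 8 | 1.055 | Adamax |
| LSTM | 32,000 | 8 | 0.643 | 64 | 32 | 2.187 | Adam |
| LSTM | 64,000 | 8 | 0.207 | 64 | 8 | 2.173 | Adam |
| BiLSTM | 2,500 | 6 | 0.646 | 8 | 256 | 1.519 | Adamax |
| BiLSTM | 5,000 | 8 | 0.678 | 16 | 8 | 1.622 | RMSprop |
| BiLSTM | 10,000 | 8 | 0.405 | 8 | 8 | 1.860 | Adam |
| BiLSTM | 16,000 | 8 | 0.639 | 16 | 8 | 2.493 | RMSprop |
| BiLSTM | 32,000 | 8 | 0.279 | 16 | 16 | 1.039 | Adamax |
| BiLSTM | 64,000 | 8 | 0.430 | 16 | 16 | 1.870 | RMSprop |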
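
To make the rows of Table 3 concrete, the following is a minimal sketch of how one row might be turned into a Keras model. It is not the authors' implementation. The names `HyperParams`, `embedding_params`, and `build_model` are hypothetical, and the layer order, the activations, the placement of dropout, the loss, and the number of output classes (`num_classes=2`) are all assumptions, since the table does not state them. Only the numeric values in the three example rows come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    """One row of Table 3 (the Importance column is omitted)."""
    model: str           # "GRU", "LSTM", or "BiLSTM"
    vocab_size: int
    embedding_dim: int
    dropout: float
    rnn_units: int       # "LSTM/GRU layer size" in the table
    ff_units: int        # "FF layer size" in the table
    optimizer: str


# The vocabulary-size 2,500 row for each architecture, copied from Table 3.
GRU_2500 = HyperParams("GRU", 2500, 8, 0.199, 32, 8, "RMSprop")
LSTM_2500 = HyperParams("LSTM", 2500, 8, 0.340, 8, 256, "Nadam")
BILSTM_2500 = HyperParams("BiLSTM", 2500, 6, 0.646, 8, 256, "Adamax")


def embedding_params(hp: HyperParams) -> int:
    """Weights in the embedding matrix: one vector per vocabulary entry."""
    return hp.vocab_size * hp.embedding_dim


def build_model(hp: HyperParams, num_classes: int = 2):
    """Build a Keras model from one table row.

    Assumed (not given in the table): layer order, ReLU/softmax activations,
    dropout after the recurrent layer, the loss, and num_classes.
    """
    # Imported here so the rest of the sketch runs without TensorFlow.
    from tensorflow import keras
    layers = keras.layers

    if hp.model == "GRU":
        rnn = layers.GRU(hp.rnn_units)
    elif hp.model == "LSTM":
        rnn = layers.LSTM(hp.rnn_units)
    elif hp.model == "BiLSTM":
        rnn = layers.Bidirectional(layers.LSTM(hp.rnn_units))
    else:
        raise ValueError(f"unknown model type: {hp.model}")

    model = keras.Sequential([
        layers.Embedding(hp.vocab_size, hp.embedding_dim),
        rnn,
        layers.Dropout(hp.dropout),
        layers.Dense(hp.ff_units, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=hp.optimizer.lower(),  # e.g. "rmsprop", "nadam", "adamax"
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model(GRU_2500)` would return a compiled model for the first GRU row. `embedding_params` shows how the vocabulary size drives the size of the embedding layer: 2,500 × 8 weights for that row.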