Table 3. Optimal hyper-parameters for the GRU, LSTM, and BiLSTM neural network architectures for Task 1.
| Architecture | Vocabulary size | Embedding dimension | Dropout | LSTM/GRU layer size | FF layer size | Importance | Optimizer |
|---|---|---|---|---|---|---|---|
| GRU | 2,500 | 8 | 0.199 | 32 | 8 | 1.282 | RMSprop |
| GRU | 5,000 | 8 | 0.367 | 4 | 32 | 1.016 | RMSprop |
| GRU | 10,000 | 8 | 0.681 | 64 | 256 | 1.228 | RMSprop |
| GRU | 16,000 | 8 | 0.677 | 8 | 8 | 1.046 | Nadam |
| GRU | 32,000 | 8 | 0.649 | 4 | 256 | 1.042 | Nadam |
| GRU | 64,000 | 8 | 0.593 | 4 | 64 | 1.195 | Adamax |
| LSTM | 2,500 | 8 | 0.340 | 8 | 256 | 2.423 | Nadam |
| LSTM | 5,000 | 8 | 0.650 | 16 | 16 | 1.268 | Nadam |
| LSTM | 10,000 | 8 | 0.594 | 16 | 64 | 1.598 | Adam |
| LSTM | 16,000 | 8 | 0.189 | 64 | 8 | 1.055 | Adamax |
| LSTM | 32,000 | 8 | 0.643 | 64 | 32 | 2.187 | Adam |
| LSTM | 64,000 | 8 | 0.207 | 64 | 8 | 2.173 | Adam |
| BiLSTM | 2,500 | 6 | 0.646 | 8 | 256 | 1.519 | Adamax |
| BiLSTM | 5,000 | 8 | 0.678 | 16 | 8 | 1.622 | RMSprop |
| BiLSTM | 10,000 | 8 | 0.405 | 8 | 8 | 1.860 | Adam |
| BiLSTM | 16,000 | 8 | 0.639 | 16 | 8 | 2.493 | RMSprop |
| BiLSTM | 32,000 | 8 | 0.279 | 16 | 16 | 1.039 | Adamax |
| BiLSTM | 64,000 | 8 | 0.430 | 16 | 16 | 1.870 | RMSprop |