PeerJ Comput. Sci. 2023 Aug 18;9:e1511. doi: 10.7717/peerj-cs.1511

Table 4. Optimal hyper-parameters for transformer neural network architectures for Task 1.

| Model | Vocabulary size | Embedding dimension | Batch size | Number of heads | Attention layers | Importance | Optimizer |
|---|---|---|---|---|---|---|---|
| Transformer | 2,500 | 16 | 32 | 2 | 1 | 1.245 | RMSprop |
| | 5,000 | 8 | 128 | 4 | 2 | 1.928 | RMSprop |
| | 10,000 | 8 | 64 | 8 | 4 | 1.594 | Adam |
| | 16,000 | 8 | 32 | 2 | 2 | 1.117 | Adamax |
| | 32,000 | 8 | 64 | 8 | 4 | 2.226 | RMSprop |
| | 64,000 | 8 | 64 | 4 | 4 | 1.764 | RMSprop |
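
Each row of Table 4 can be read as a concrete model configuration. The sketch below, which is not the authors' implementation, shows how the 10,000-word vocabulary row (embedding dimension 8, 8 heads, 4 attention layers, batch size 64, Adam) could be instantiated as a small encoder-only transformer classifier in Keras; the sequence length `SEQ_LEN`, the feed-forward width, and the binary sigmoid output head are assumptions not given in the table.

```python
# Minimal sketch of an encoder-only transformer built from one row of
# Table 4 (vocab 10,000, embed dim 8, 8 heads, 4 attention layers,
# batch size 64, Adam). SEQ_LEN, the feed-forward width, and the binary
# output head are assumptions; this is illustrative, not the paper's code.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # "Vocabulary size" column
EMBED_DIM = 8        # "Embedding dimension" column
NUM_HEADS = 8        # "Number of heads" column
NUM_LAYERS = 4       # "Attention layers" column
BATCH_SIZE = 64      # "Batch size" column
SEQ_LEN = 128        # assumed; the table does not report sequence length


class TokenAndPositionEmbedding(layers.Layer):
    """Sum of learned token and position embeddings."""

    def __init__(self, seq_len, vocab_size, embed_dim):
        super().__init__()
        self.tok = layers.Embedding(vocab_size, embed_dim)
        self.pos = layers.Embedding(seq_len, embed_dim)

    def call(self, x):
        positions = tf.range(tf.shape(x)[-1])
        return self.tok(x) + self.pos(positions)


inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = TokenAndPositionEmbedding(SEQ_LEN, VOCAB_SIZE, EMBED_DIM)(inputs)

# NUM_LAYERS encoder blocks: multi-head self-attention plus a position-wise
# feed-forward network, each with a residual connection and layer norm.
for _ in range(NUM_LAYERS):
    attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=EMBED_DIM)(x, x)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(4 * EMBED_DIM, activation="relu")(x)  # width assumed
    ff = layers.Dense(EMBED_DIM)(ff)
    x = layers.LayerNormalization()(x + ff)

x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # assumed binary task head

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_dataset.batch(BATCH_SIZE), epochs=...)
```

The remaining rows of the table correspond to the same construction with the five constants swapped for that row's values and `optimizer` set to the row's choice (RMSprop or Adamax instead of Adam).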