2023 Mar 2;13:3517. doi: 10.1038/s41598-023-30657-1

Table 3.

Hyperparameters optimized via population-based training.

Hyperparameter                    BERT        XLNet
Hidden size                       144         144
Number of layers                  12          6
Number of attention heads         12          6
Feed-forward layer hidden size    128         128
Learning rate                     1 × 10⁻⁶    5 × 10⁻⁶
Batch size                        16          16
Dropout                           0.5         0.4
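
For readers who want the values in Table 3 in machine-readable form, the following is a minimal sketch that transcribes them into a Python configuration object. The class name `TransformerHyperparams`, its field names, and the `head_dim` helper are illustrative assumptions and do not come from the paper's code. The helper assumes the standard multi-head attention convention of splitting the hidden size evenly across heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerHyperparams:
    """Values transcribed from Table 3; the field names are hypothetical."""
    hidden_size: int
    num_layers: int
    num_attention_heads: int
    feed_forward_size: int
    learning_rate: float
    batch_size: int
    dropout: float

    @property
    def head_dim(self) -> int:
        # Assumption: the hidden size is split evenly across attention heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


BERT = TransformerHyperparams(
    hidden_size=144, num_layers=12, num_attention_heads=12,
    feed_forward_size=128, learning_rate=1e-6, batch_size=16, dropout=0.5,
)
XLNET = TransformerHyperparams(
    hidden_size=144, num_layers=6, num_attention_heads=6,
    feed_forward_size=128, learning_rate=5e-6, batch_size=16, dropout=0.4,
)

if __name__ == "__main__":
    for name, hp in (("BERT", BERT), ("XLNet", XLNET)):
        print(name, "per-head dimension:", hp.head_dim)
```

Under that even-split assumption, both configurations are consistent: a hidden size of 144 divides evenly by 12 heads and by 6 heads.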