Appl Soft Comput. 2022 Apr 18;122:108842. doi: 10.1016/j.asoc.2022.108842

Table 3. Hyper-parameter tuning.

| Sl. no. | Hyper-parameter | DistilBERT | BERT | RoBERTa |
|---|---|---|---|---|
| 1 | Pre-trained model | distilbert-base-uncased | bert-base-uncased | roberta-base |
| 2 | Learning rate | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 |
| 3 | Activation | Softmax | Softmax | Softmax |
| 4 | Batch size | 8, 16, 32 | 8, 16, 32 | 8, 16, 32 |
| 5 | Number of epochs trained | 10, 5 | 10, 5 | 10 |
| 6 | Maximum sequence length | 125 | 125 | 125 |
| 7 | Dropout | 0.2 | 0.2 | 0.2 |
| 8 | Hidden size | 768 | 768 | 768 |
| 9 | Optimizer | AdamW, Adam | AdamW, Adam | AdamW, Adam |
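The sketch below shows how the search space in Table 3 could be wired up with the Hugging Face `transformers` library. It is not the authors' code: the checkpoint names, learning rates, batch sizes, epoch counts, dropout, and maximum sequence length come from the table, while `num_labels=2`, the output directory naming, and the grid-search structure are illustrative assumptions.

```python
# Minimal sketch of the Table 3 hyper-parameter grid, assuming the Hugging
# Face `transformers` Trainer API. Values marked "row N" come from the table;
# everything else is an assumption for illustration.
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

CHECKPOINTS = ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"]  # row 1
LEARNING_RATES = [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]  # row 2
BATCH_SIZES = [8, 16, 32]                        # row 4
EPOCHS = [10, 5]                                 # row 5 (RoBERTa: 10 only)
MAX_SEQ_LEN = 125                                # row 6


def dropout_override(checkpoint: str) -> dict:
    """Dropout 0.2 (row 7); the config key differs between architectures."""
    if "distilbert" in checkpoint:
        return {"dropout": 0.2}
    return {"hidden_dropout_prob": 0.2}


def build_model(checkpoint: str, num_labels: int = 2):
    # Hidden size 768 (row 8) is already the default for all three base
    # checkpoints, so only dropout needs an explicit override. The softmax
    # activation (row 3) is applied implicitly by the cross-entropy loss
    # over the classifier logits. num_labels=2 is an assumption.
    config = AutoConfig.from_pretrained(
        checkpoint, num_labels=num_labels, **dropout_override(checkpoint)
    )
    return AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)


def training_args(checkpoint: str, lr: float, batch_size: int, epochs: int):
    # AdamW (row 9) via its PyTorch implementation; the table also lists
    # Adam, which would need a custom optimizer passed to the Trainer.
    return TrainingArguments(
        output_dir=f"runs/{checkpoint}-lr{lr}-bs{batch_size}",
        learning_rate=lr,
        per_device_train_batch_size=batch_size,
        num_train_epochs=epochs,
        optim="adamw_torch",
    )


# One grid point as a usage example; a full sweep would loop over
# CHECKPOINTS x LEARNING_RATES x BATCH_SIZES x EPOCHS.
checkpoint = CHECKPOINTS[0]
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = build_model(checkpoint)
args = training_args(checkpoint, lr=2e-5, batch_size=16, epochs=10)
batch = tokenizer(
    ["example input"], truncation=True, padding="max_length", max_length=MAX_SEQ_LEN
)
```

Completing a run would only require passing `model`, `args`, and a tokenized dataset to `transformers.Trainer`; the dataset is omitted here because the table does not specify it.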