Table 3. Hyper-parameter tuning.
| Sl. no. | Hyper-parameter | DistilBERT | BERT | RoBERTa |
|---|---|---|---|---|
| 1 | Pre-trained model | distilbert-base-uncased | bert-base-uncased | roberta-base |
| 2 | Learning rate | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 |
| 3 | Activation | Softmax | Softmax | Softmax |
| 4 | Batch size | 8, 16, 32 | 8, 16, 32 | 8, 16, 32 |
| 5 | Number of epochs trained | 5, 10 | 5, 10 | 10 |
| 6 | Maximum sequence length | 125 | 125 | 125 |
| 7 | Dropout | 0.2 | 0.2 | 0.2 |
| 8 | Hidden size | 768 | 768 | 768 |
| 9 | Optimizer | AdamW, Adam | AdamW, Adam | AdamW, Adam |
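The search space in Table 3 can be swept with the Hugging Face Transformers Trainer API. The sketch below is illustrative rather than the paper's actual code: `run_grid`, `train_ds`, `eval_ds`, and `num_labels` are hypothetical placeholders, the datasets are assumed to be pre-tokenized with truncation at the maximum sequence length of 125, and setting the dropout of 0.2 (whose config parameter name differs between DistilBERT and BERT/RoBERTa) is omitted for brevity.

```python
# Minimal grid-search sketch over the hyper-parameters in Table 3,
# assuming the Hugging Face Transformers Trainer API.
from itertools import product

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODELS = ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"]  # row 1
LEARNING_RATES = [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]                            # row 2
BATCH_SIZES = [8, 16, 32]                                                  # row 4
EPOCHS = [5, 10]                                                           # row 5
MAX_SEQ_LENGTH = 125                                                       # row 6


def run_grid(train_ds, eval_ds, num_labels):
    """Fine-tune every (model, lr, batch size, epochs) combination.

    `train_ds` and `eval_ds` are assumed to be datasets already tokenized
    with truncation at MAX_SEQ_LENGTH.
    """
    for name, lr, bs, ep in product(MODELS, LEARNING_RATES, BATCH_SIZES, EPOCHS):
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=num_labels
        )
        args = TrainingArguments(
            output_dir=f"./out/{name}-lr{lr}-bs{bs}-ep{ep}",
            learning_rate=lr,                # row 2
            per_device_train_batch_size=bs,  # row 4
            num_train_epochs=ep,             # row 5
            optim="adamw_torch",             # row 9: AdamW optimizer
        )
        trainer = Trainer(
            model=model,
            args=args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            tokenizer=tokenizer,
        )
        trainer.train()
        trainer.evaluate()
```

The softmax activation (row 3) and hidden size of 768 (row 8) need no explicit configuration here: both are built into the sequence-classification heads of the three base models.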