Table 3. Hyper-parameter tuning.
| Sl. no. | Hyper-parameter | DistilBERT | BERT | RoBERTa |
|---|---|---|---|---|
| 1 | Pre-trained model | distilbert-base-uncased | bert-base-uncased | roberta-base |
| 2 | Learning rate | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 | 1e−5, 2e−5, 3e−5, 4e−5, 5e−5 |
| 3 | Activation | Softmax | Softmax | Softmax |
| 4 | Batch size | 8, 16, 32 | 8, 16, 32 | 8, 16, 32 |
| 5 | Number of epochs trained | 5, 10 | 5, 10 | 10 |
| 6 | Maximum sequence length | 125 | 125 | 125 |
| 7 | Dropout | 0.2 | 0.2 | 0.2 |
| 8 | Hidden size | 768 | 768 | 768 |
| 9 | Optimizer | AdamW, Adam | AdamW, Adam | AdamW, Adam |
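The search space in Table 3 can be swept with the Hugging Face Transformers Trainer API. The sketch below is illustrative rather than the paper's actual code: `run_grid`, `train_ds`, `eval_ds`, and `num_labels` are hypothetical placeholders, the datasets are assumed to be pre-tokenized with truncation at the maximum sequence length of 125, and setting the dropout of 0.2 (whose config parameter name differs between DistilBERT and BERT/RoBERTa) is omitted for brevity.

```python
# Minimal grid-search sketch over the hyper-parameters in Table 3,
# assuming the Hugging Face Transformers Trainer API.
from itertools import product

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODELS = ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"]  # row 1
LEARNING_RATES = [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]                            # row 2
BATCH_SIZES = [8, 16, 32]                                                  # row 4
EPOCHS = [5, 10]                                                           # row 5
MAX_SEQ_LENGTH = 125                                                       # row 6


def run_grid(train_ds, eval_ds, num_labels):
    """Fine-tune every (model, lr, batch size, epochs) combination.

    `train_ds` and `eval_ds` are assumed to be datasets already tokenized
    with truncation at MAX_SEQ_LENGTH.
    """
    for name, lr, bs, ep in product(MODELS, LEARNING_RATES, BATCH_SIZES, EPOCHS):
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=num_labels
        )
        args = TrainingArguments(
            output_dir=f"./out/{name}-lr{lr}-bs{bs}-ep{ep}",
            learning_rate=lr,                # row 2
            per_device_train_batch_size=bs,  # row 4
            num_train_epochs=ep,             # row 5
            optim="adamw_torch",             # row 9: AdamW optimizer
        )
        trainer = Trainer(
            model=model,
            args=args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
            tokenizer=tokenizer,
        )
        trainer.train()
        trainer.evaluate()
```

The softmax activation (row 3) and hidden size of 768 (row 8) need no explicit configuration here: both are built into the sequence-classification heads of the three base models.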