Table 3. Model architectures
Model | Number of Filters/Units/Encoders | Embedding Dimension | Max Sequence Length | Dropout | Activation Function | Optimizer | Total Parameters |
---|---|---|---|---|---|---|---|
CNN | 8 | 200 | 557 | 0.3 | ReLU | Adam | 5.51 M |
RNN | 8 | 200 | 557 | 0.3 | ReLU | Adam | 5.50 M |
GRU | 8 | 200 | 557 | 0.3 | ReLU | Adam | 5.50 M |
LSTM | 8 | 200 | 557 | 0.3 | ReLU | Adam | 5.50 M |
Bi-LSTM | 8 | 200 | 557 | 0.3 | ReLU | Adam | 5.51 M |
Transformer Encoder | 1 encoder (2 heads) | 200 | 557 | 0.3 | ReLU | Adam | 5.94 M |
BERT-Base | 12 encoders (12 heads) | 768 | 512 | 0.3 (fine-tune layer) | ReLU (fine-tune layer) | Adam (fine-tune layer) | 110 M |
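The recurrent and attention layers in Table 3 are narrow (8 units, or one 2-head encoder, over 200-dimensional embeddings). The standard per-layer parameter formulas show that these layers contribute only a few thousand parameters. This suggests that most of the roughly 5.5 M total comes from the embedding layer. The sketch below is illustrative only. It assumes Keras-style layers (the table does not name a framework), Keras' default `reset_after=True` for the GRU, and a per-head dimension of 200 / 2 = 100 for the attention block. The function names are hypothetical helpers, not part of any library.

```python
# Hypothetical helpers: parameter counts for the layers in Table 3,
# assuming Keras-style layer definitions.

def simple_rnn_params(units: int, input_dim: int) -> int:
    # Input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units) + units

def lstm_params(units: int, input_dim: int) -> int:
    # Four gates (input, forget, cell, output), each the size of a SimpleRNN block
    return 4 * simple_rnn_params(units, input_dim)

def gru_params(units: int, input_dim: int, reset_after: bool = True) -> int:
    # Three gates; Keras' default reset_after=True adds a second bias vector per gate
    base = 3 * simple_rnn_params(units, input_dim)
    return base + (3 * units if reset_after else 0)

def bilstm_params(units: int, input_dim: int) -> int:
    # One forward and one backward LSTM with identical shapes
    return 2 * lstm_params(units, input_dim)

def mha_params(d_model: int, num_heads: int) -> int:
    # Assumes per-head dimension d_model // num_heads.
    # Q, K, V projections (kernel + bias) plus the output projection (kernel + bias).
    head_dim = d_model // num_heads
    qkv = 3 * (d_model * num_heads * head_dim + num_heads * head_dim)
    out = num_heads * head_dim * d_model + d_model
    return qkv + out

EMBED_DIM = 200  # Embedding Dimension column in Table 3
UNITS = 8        # Number of Filters/Units column in Table 3

if __name__ == "__main__":
    counts = {
        "RNN": simple_rnn_params(UNITS, EMBED_DIM),
        "GRU": gru_params(UNITS, EMBED_DIM),
        "LSTM": lstm_params(UNITS, EMBED_DIM),
        "Bi-LSTM": bilstm_params(UNITS, EMBED_DIM),
        "Attention (2 heads)": mha_params(EMBED_DIM, 2),
    }
    for name, n in counts.items():
        print(f"{name}: {n:,} parameters")
```

Under these assumptions, the LSTM layer has 6,688 parameters and the Bi-LSTM layer has 13,376. This difference of about 0.01 M is consistent with the 5.50 M vs 5.51 M totals in the table. The attention projections alone account for about 0.16 M. The Transformer Encoder's feed-forward width is not reported, so its remaining parameters are not reconstructed here.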