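Table 3. Hyperparameters chosen for all our experiments. A single value applies to all three settings; slash-separated values are listed in the order 2010/2012/2014.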
| Hyperparameter | Value (2010/2012/2014) |
|---|---|
| Dimension of token-level word representation | 50 |
| Dimension of character representation | 25 |
| Character-level LSTM size | 25 |
| Character-level CNN filter size | 3 |
| Number of character-level CNN filters | 25 |
| Token-level LSTM size | 100 |
| Dropout probability | 0.5 |
| Learning rate | 0.005 |
| Gradient clipping | 5.0 |
| Training epochs | 50/30/55 |
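
For readers who want to reuse these settings, the table can be expressed as a small configuration object. This is only a sketch: the class name, the field names, and the helper `hyperparameters_for` are hypothetical and do not come from the original implementation. Only the numeric values are taken from Table 3, and the keys `"2010"`, `"2012"`, and `"2014"` simply mirror the column labels.

```python
from dataclasses import dataclass

# Epoch counts from the "Training epochs" row, keyed by the column labels.
TRAINING_EPOCHS = {"2010": 50, "2012": 30, "2014": 55}


@dataclass(frozen=True)
class Hyperparameters:
    # Field names are illustrative assumptions; values are copied from Table 3.
    token_embedding_dim: int = 50
    char_embedding_dim: int = 25
    char_lstm_size: int = 25
    char_cnn_filter_size: int = 3
    char_cnn_num_filters: int = 25
    token_lstm_size: int = 100
    dropout: float = 0.5
    learning_rate: float = 0.005
    gradient_clipping: float = 5.0
    training_epochs: int = 50


def hyperparameters_for(setting: str) -> Hyperparameters:
    """Return the shared values with the epoch count for the given setting."""
    return Hyperparameters(training_epochs=TRAINING_EPOCHS[setting])
```

Calling `hyperparameters_for("2012")` yields the shared values with `training_epochs` set to 30. Keeping the shared values as defaults and overriding only the epoch count reflects the fact that it is the only row in the table that differs across the three columns.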