Table 7. Model validation for hyperparameter selection.
The table lists the losses of models trained with different hyperparameter values. Rows give the number of layers in the transformer architecture, and columns give the dimension of the embedding space. The loss shown is the average cross-entropy loss evaluated on a held-out validation set.
| Number of layers \ Embedding dimension | 32 | 64 | 128 |
|---|---|---|---|
| 4 | 83.1% | 88.4% | 90.7% |
| 6 | 86.3% | 94.6% | 96.8% |
| 8 | 90.5% | 96.4% | 96.8% |
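
As a rough illustration of the metric named in the caption, the sketch below computes an average cross-entropy loss over a held-out set. It is not the authors' evaluation code: the function name `average_cross_entropy`, the clamping constant `eps`, and the toy probabilities are all hypothetical, and the model is assumed to output one probability vector per validation example.

```python
import math


def average_cross_entropy(predicted_probs, targets, eps=1e-12):
    """Mean negative log-probability assigned to the correct class.

    predicted_probs: list of per-example probability vectors (assumed input format).
    targets: list of integer class indices, one per example.
    eps: hypothetical lower clamp that avoids log(0).
    """
    if len(predicted_probs) != len(targets):
        raise ValueError("predicted_probs and targets must have the same length")
    if not targets:
        raise ValueError("validation set must not be empty")

    total = 0.0
    for probs, target in zip(predicted_probs, targets):
        # Cross entropy against a one-hot target reduces to -log p(correct class).
        total += -math.log(max(probs[target], eps))
    return total / len(targets)


# Toy held-out set (made-up numbers, for illustration only).
val_probs = [[0.5, 0.5], [0.25, 0.75]]
val_targets = [0, 1]
val_loss = average_cross_entropy(val_probs, val_targets)
```

In a hyperparameter sweep of the kind the table summarizes, a function like this would be evaluated once per trained configuration, with the same validation set used for every configuration so that the entries are comparable.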