TABLE 3.
Hyperparameter tuning details: the name of each hyperparameter, the candidate values evaluated, and the selected optimal value.
| Hyperparameter | Candidate values | Optimal value |
|---|---|---|
| Number of heads (self‐attention) | {1, 3, 5, 6, 9} | 3 |
| Number of heads (cross‐attention) | {2, 6, 10, 12, 18} | 6 |
| $d_{\text{model}}$ | {16, 32, 64, 128} | 32 |
| Learning rate | {5e−3, 1e−3, 5e−4, 1e−4, 5e−5, 1e−5} | 5e−4 |
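
For readers who wish to reproduce the search, the candidate values in Table 3 can be written as a configuration dictionary. The following is a minimal sketch only: the key names (`n_heads_self_attention`, `n_heads_cross_attention`, `d_model`, `learning_rate`) and the helper `iter_grid` are hypothetical, and the table does not state whether the hyperparameters were tuned jointly over the full Cartesian grid, one at a time, or by another strategy. The exhaustive enumeration shown here is therefore an assumption, not a description of the procedure used.

```python
from itertools import product

# Candidate values copied from Table 3. Key names are illustrative assumptions.
search_space = {
    "n_heads_self_attention": [1, 3, 5, 6, 9],
    "n_heads_cross_attention": [2, 6, 10, 12, 18],
    "d_model": [16, 32, 64, 128],
    "learning_rate": [5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
}

# Optimal values reported in Table 3.
optimal = {
    "n_heads_self_attention": 3,
    "n_heads_cross_attention": 6,
    "d_model": 32,
    "learning_rate": 5e-4,
}


def iter_grid(space):
    """Yield every combination of candidate values as a dict.

    Assumes an exhaustive grid search, which the table does not confirm.
    """
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


grid = list(iter_grid(search_space))
```

Under the full-grid assumption, `grid` holds one configuration per combination of candidate values, and `optimal` is one of its entries. If the hyperparameters were instead tuned independently, only the per-parameter lists in `search_space` are relevant.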