Table 5:
Hyperparameters of the Multimodal Transformer (MulT) used for the various tasks. The "# of Crossmodal Blocks" and "# of Crossmodal Attention Heads" are per crossmodal transformer.
| Hyperparameter | CMU-MOSEI | CMU-MOSI | IEMOCAP |
|---|---|---|---|
| Batch Size | 16 | 128 | 32 |
| Initial Learning Rate | 1e-3 | 1e-3 | 2e-3 |
| Optimizer | Adam | Adam | Adam |
| Transformer Hidden Unit Size d | 40 | 40 | 40 |
| # of Crossmodal Blocks D | 4 | 4 | 4 |
| # of Crossmodal Attention Heads | 8 | 10 | 10 |
| Temporal Convolution Kernel Size (L/V/A) | (1 or 3)/3/3 | (1 or 3)/3/3 | 3/3/5 |
| Textual Embedding Dropout | 0.3 | 0.2 | 0.3 |
| Crossmodal Attention Block Dropout | 0.1 | 0.2 | 0.25 |
| Output Dropout | 0.1 | 0.1 | 0.1 |
| Gradient Clip | 1.0 | 0.8 | 0.8 |
| # of Epochs | 20 | 100 | 30 |
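
For convenience, the table can be expressed as a per-dataset configuration. The sketch below is illustrative only: the dictionary name `MULT_HPARAMS`, its key names, and the two helper functions are ours, not part of any released MulT codebase, and the optimizer/clipping helpers assume a standard PyTorch training loop (the "Initial Learning Rate" suggests a scheduler may further decay it, which is not shown here).

```python
import torch

# Hyperparameters transcribed from Table 5 (key names are ours, not official).
# conv_kernel_lva holds the temporal convolution kernel sizes for the
# language (L), vision (V), and audio (A) streams; where the table lists
# "1 or 3" for L, both options are recorded as a tuple.
MULT_HPARAMS = {
    "CMU-MOSEI": {
        "batch_size": 16,
        "lr": 1e-3,
        "hidden_dim": 40,                    # transformer hidden unit size d
        "num_blocks": 4,                     # crossmodal blocks D per transformer
        "num_heads": 8,                      # crossmodal attention heads per transformer
        "conv_kernel_lva": ((1, 3), 3, 3),   # L is 1 or 3
        "embed_dropout": 0.3,                # textual embedding dropout
        "attn_dropout": 0.1,                 # crossmodal attention block dropout
        "out_dropout": 0.1,
        "grad_clip": 1.0,
        "epochs": 20,
    },
    "CMU-MOSI": {
        "batch_size": 128,
        "lr": 1e-3,
        "hidden_dim": 40,
        "num_blocks": 4,
        "num_heads": 10,
        "conv_kernel_lva": ((1, 3), 3, 3),
        "embed_dropout": 0.2,
        "attn_dropout": 0.2,
        "out_dropout": 0.1,
        "grad_clip": 0.8,
        "epochs": 100,
    },
    "IEMOCAP": {
        "batch_size": 32,
        "lr": 2e-3,
        "hidden_dim": 40,
        "num_blocks": 4,
        "num_heads": 10,
        "conv_kernel_lva": (3, 3, 5),
        "embed_dropout": 0.3,
        "attn_dropout": 0.25,
        "out_dropout": 0.1,
        "grad_clip": 0.8,
        "epochs": 30,
    },
}


def make_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.Adam:
    """Adam optimizer at the table's initial learning rate (illustrative helper)."""
    return torch.optim.Adam(model.parameters(), lr=MULT_HPARAMS[dataset]["lr"])


def clip_gradients(model: torch.nn.Module, dataset: str) -> None:
    """Clip the gradient norm to the table's value; call before optimizer.step()."""
    torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_norm=MULT_HPARAMS[dataset]["grad_clip"]
    )
```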