Table 5: Hyperparameters of the Multimodal Transformer (MulT) used for the various tasks. The “# of Crossmodal Blocks” and “# of Crossmodal Attention Heads” apply to each crossmodal transformer.
| Hyperparameter | CMU-MOSEI | CMU-MOSI | IEMOCAP |
|---|---|---|---|
| Batch Size | 16 | 128 | 32 |
| Initial Learning Rate | 1e-3 | 1e-3 | 2e-3 |
| Optimizer | Adam | Adam | Adam |
| Transformer Hidden Unit Size *d* | 40 | 40 | 40 |
| # of Crossmodal Blocks *D* | 4 | 4 | 4 |
| # of Crossmodal Attention Heads | 8 | 10 | 10 |
| Temporal Convolution Kernel Size (L/V/A) | (1 or 3)/3/3 | (1 or 3)/3/3 | 3/3/5 |
| Textual Embedding Dropout | 0.3 | 0.2 | 0.3 |
| Crossmodal Attention Block Dropout | 0.1 | 0.2 | 0.25 |
| Output Dropout | 0.1 | 0.1 | 0.1 |
| Gradient Clip | 1.0 | 0.8 | 0.8 |
| # of Epochs | 20 | 100 | 30 |
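For concreteness, the table can be read as a per-dataset training configuration. The following is a minimal sketch assuming a PyTorch implementation (the paper's released code is PyTorch-based, but the names `HPARAMS`, `make_optimizer`, and `clip_and_step` here are hypothetical); where the table allows a language kernel size of "1 or 3", the sketch picks 3 as a representative choice.

```python
import torch
import torch.nn as nn

# Hyperparameters from Table 5, keyed by dataset.
# conv_kernels is the temporal convolution kernel size per modality,
# ordered (language, visual, audio).
HPARAMS = {
    "cmu_mosei": dict(batch_size=16, lr=1e-3, d_model=40, n_blocks=4,
                      n_heads=8, conv_kernels=(3, 3, 3), embed_dropout=0.3,
                      attn_dropout=0.1, out_dropout=0.1, grad_clip=1.0,
                      epochs=20),
    "cmu_mosi":  dict(batch_size=128, lr=1e-3, d_model=40, n_blocks=4,
                      n_heads=10, conv_kernels=(3, 3, 3), embed_dropout=0.2,
                      attn_dropout=0.2, out_dropout=0.1, grad_clip=0.8,
                      epochs=100),
    "iemocap":   dict(batch_size=32, lr=2e-3, d_model=40, n_blocks=4,
                      n_heads=10, conv_kernels=(3, 3, 5), embed_dropout=0.3,
                      attn_dropout=0.25, out_dropout=0.1, grad_clip=0.8,
                      epochs=30),
}

def make_optimizer(model: nn.Module, dataset: str) -> torch.optim.Adam:
    """Adam optimizer with the per-dataset initial learning rate."""
    return torch.optim.Adam(model.parameters(), lr=HPARAMS[dataset]["lr"])

def clip_and_step(model: nn.Module,
                  optimizer: torch.optim.Optimizer,
                  dataset: str) -> None:
    """Clip gradients to the per-dataset norm, then take an update step."""
    torch.nn.utils.clip_grad_norm_(model.parameters(),
                                   HPARAMS[dataset]["grad_clip"])
    optimizer.step()
```

In a training loop, `clip_and_step(model, optimizer, "iemocap")` would replace the bare `optimizer.step()` call so that the gradient-clip value from the table is applied before each parameter update.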