
Table 5:

Hyperparameters of the Multimodal Transformer (MulT) used for the various tasks. The "# of Crossmodal Blocks" and "# of Crossmodal Attention Heads" are given per crossmodal transformer.

Hyperparameter                                  CMU-MOSEI      CMU-MOSI       IEMOCAP
Batch Size                                      16             128            32
Initial Learning Rate                           1e-3           1e-3           2e-3
Optimizer                                       Adam           Adam           Adam
Transformer Hidden Unit Size d                  40             40             40
# of Crossmodal Blocks D                        4              4              4
# of Crossmodal Attention Heads                 8              10             10
Temporal Convolution Kernel Size (L/V/A)        (1 or 3)/3/3   (1 or 3)/3/3   3/3/5
Textual Embedding Dropout                       0.3            0.2            0.3
Crossmodal Attention Block Dropout              0.1            0.2            0.25
Output Dropout                                  0.1            0.1            0.1
Gradient Clip                                   1.0            0.8            0.8
# of Epochs                                     20             100            30
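
Since the table doubles as a training configuration, the sketch below shows how the CMU-MOSEI column might map onto a PyTorch setup. It is a minimal illustration under stated assumptions, not the authors' released implementation: the TemporalConvFrontEnd class, the input feature dimensions (orig_d_l, orig_d_v, orig_d_a), and the placeholder loss head are hypothetical, and the crossmodal transformer stack itself (the D = 4 blocks with 8 heads and the attention/output dropouts) is elided. Only the temporal convolutions, textual embedding dropout, Adam optimizer, gradient clipping, and epoch/batch settings mirror the table directly.

```python
# Minimal sketch (NOT the authors' released code) wiring the CMU-MOSEI
# column of Table 5 into a PyTorch front end plus training step.
import torch
import torch.nn as nn

# Table 5, CMU-MOSEI column
D_MODEL = 40                          # transformer hidden unit size d
N_BLOCKS = 4                          # crossmodal blocks D (per crossmodal transformer)
N_HEADS = 8                           # crossmodal attention heads
KERNELS = {"l": 3, "v": 3, "a": 3}    # temporal conv kernel sizes (L/V/A)
EMBED_DROPOUT = 0.3                   # textual embedding dropout
ATTN_DROPOUT = 0.1                    # crossmodal attention block dropout (stack elided below)
OUT_DROPOUT = 0.1                     # output dropout (stack elided below)
LR = 1e-3
GRAD_CLIP = 1.0
BATCH_SIZE = 16
EPOCHS = 20


class TemporalConvFrontEnd(nn.Module):
    """Projects each modality to d = 40 with a 1D temporal convolution.
    The original feature dimensions below are hypothetical placeholders."""

    def __init__(self, orig_d_l=300, orig_d_v=35, orig_d_a=74):
        super().__init__()
        self.embed_dropout = nn.Dropout(EMBED_DROPOUT)  # applied to textual features only
        self.proj_l = nn.Conv1d(orig_d_l, D_MODEL, KERNELS["l"], padding=KERNELS["l"] // 2)
        self.proj_v = nn.Conv1d(orig_d_v, D_MODEL, KERNELS["v"], padding=KERNELS["v"] // 2)
        self.proj_a = nn.Conv1d(orig_d_a, D_MODEL, KERNELS["a"], padding=KERNELS["a"] // 2)

    def forward(self, x_l, x_v, x_a):
        # inputs: (batch, seq_len, features); Conv1d expects (batch, features, seq_len)
        x_l = self.embed_dropout(x_l)
        return (self.proj_l(x_l.transpose(1, 2)),
                self.proj_v(x_v.transpose(1, 2)),
                self.proj_a(x_a.transpose(1, 2)))


model = TemporalConvFrontEnd()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
loss_fn = nn.MSELoss()  # CMU-MOSEI sentiment is scored as regression


def train_step(batch_l, batch_v, batch_a, target):
    """One optimization step: forward, backward, clip to the table's max norm."""
    optimizer.zero_grad()
    h_l, h_v, h_a = model(batch_l, batch_v, batch_a)
    pred = h_l.mean(dim=(1, 2))       # placeholder head; MulT's fusion stack goes here
    loss = loss_fn(pred, target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()
```

Swapping in the CMU-MOSI or IEMOCAP columns only changes the constants at the top (e.g., IEMOCAP uses kernel sizes 3/3/5, learning rate 2e-3, and gradient clip 0.8), which is why keeping them as named configuration values is convenient.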