
Table 5:

Hyperparameters of the Multimodal Transformer (MulT) used for the various tasks. The "# of Crossmodal Blocks" and "# of Crossmodal Attention Heads" are given per crossmodal transformer.

Hyperparameter                                  CMU-MOSEI      CMU-MOSI       IEMOCAP
Batch Size                                      16             128            32
Initial Learning Rate                           1e-3           1e-3           2e-3
Optimizer                                       Adam           Adam           Adam
Transformer Hidden Unit Size d                  40             40             40
# of Crossmodal Blocks D                        4              4              4
# of Crossmodal Attention Heads                 8              10             10
Temporal Convolution Kernel Size (L/V/A)        (1 or 3)/3/3   (1 or 3)/3/3   3/3/5
Textual Embedding Dropout                       0.3            0.2            0.3
Crossmodal Attention Block Dropout              0.1            0.2            0.25
Output Dropout                                  0.1            0.1            0.1
Gradient Clip                                   1.0            0.8            0.8
# of Epochs                                     20             100            30
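
Since the table doubles as a training configuration, the sketch below shows how the CMU-MOSEI column might map onto a PyTorch setup. It is a minimal illustration under stated assumptions, not the authors' released implementation: the TemporalConvFrontEnd class, the input feature dimensions (orig_d_l, orig_d_v, orig_d_a), and the placeholder loss head are hypothetical, and the crossmodal transformer stack itself (the D = 4 blocks with 8 heads and the attention/output dropouts) is elided. Only the temporal convolutions, textual embedding dropout, Adam optimizer, gradient clipping, and epoch/batch settings mirror the table directly.

```python
# Minimal sketch (NOT the authors' released code) wiring the CMU-MOSEI
# column of Table 5 into a PyTorch front end plus training step.
import torch
import torch.nn as nn

# Table 5, CMU-MOSEI column
D_MODEL = 40                          # transformer hidden unit size d
N_BLOCKS = 4                          # crossmodal blocks D (per crossmodal transformer)
N_HEADS = 8                           # crossmodal attention heads
KERNELS = {"l": 3, "v": 3, "a": 3}    # temporal conv kernel sizes (L/V/A)
EMBED_DROPOUT = 0.3                   # textual embedding dropout
ATTN_DROPOUT = 0.1                    # crossmodal attention block dropout (stack elided below)
OUT_DROPOUT = 0.1                     # output dropout (stack elided below)
LR = 1e-3
GRAD_CLIP = 1.0
BATCH_SIZE = 16
EPOCHS = 20


class TemporalConvFrontEnd(nn.Module):
    """Projects each modality to d = 40 with a 1D temporal convolution.
    The original feature dimensions below are hypothetical placeholders."""

    def __init__(self, orig_d_l=300, orig_d_v=35, orig_d_a=74):
        super().__init__()
        self.embed_dropout = nn.Dropout(EMBED_DROPOUT)  # applied to textual features only
        self.proj_l = nn.Conv1d(orig_d_l, D_MODEL, KERNELS["l"], padding=KERNELS["l"] // 2)
        self.proj_v = nn.Conv1d(orig_d_v, D_MODEL, KERNELS["v"], padding=KERNELS["v"] // 2)
        self.proj_a = nn.Conv1d(orig_d_a, D_MODEL, KERNELS["a"], padding=KERNELS["a"] // 2)

    def forward(self, x_l, x_v, x_a):
        # inputs: (batch, seq_len, features); Conv1d expects (batch, features, seq_len)
        x_l = self.embed_dropout(x_l)
        return (self.proj_l(x_l.transpose(1, 2)),
                self.proj_v(x_v.transpose(1, 2)),
                self.proj_a(x_a.transpose(1, 2)))


model = TemporalConvFrontEnd()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
loss_fn = nn.MSELoss()  # CMU-MOSEI sentiment is scored as regression


def train_step(batch_l, batch_v, batch_a, target):
    """One optimization step: forward, backward, clip to the table's max norm."""
    optimizer.zero_grad()
    h_l, h_v, h_a = model(batch_l, batch_v, batch_a)
    pred = h_l.mean(dim=(1, 2))       # placeholder head; MulT's fusion stack goes here
    loss = loss_fn(pred, target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()
```

Swapping in the CMU-MOSI or IEMOCAP columns only changes the constants at the top (e.g., IEMOCAP uses kernel sizes 3/3/5, learning rate 2e-3, and gradient clip 0.8), which is why keeping them as named configuration values is convenient.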