Author manuscript; available in PMC: 2024 May 21.

Published in final edited form as: Adv Neural Inf Process Syst. 2021 Dec;2021(DB1):1–20.

Table 11:

Hyperparameters for prediction on the MM-IMDb dataset in the multimedia domain.

Component	Model	Parameter	Value
Text Encoder	2-Layer MaxoutMLP	Hidden size	512
		Output dim	128/256/512
		MLP num	2
Image Encoder	2-Layer MaxoutMLP	Hidden size	1024
		Output dim	128/256/512
		MLP num	2
Classification Head	Linear
	2-Layer MLP	Hidden size	512
		Activation	ReLU
	2-Layer Maxout_Linear	Hidden size	512
		MLP num	2
Fusion	Concatenate
	LRTF [106]	Output dim	512
		Ranks	128
	MI-Matrix [77]	Output dim	1024
Training	Unimodal, EF, LF, LRTF, MI-Matrix	Loss	Binary Cross Entropy
		Batch size	128
		Num epochs	Text: 125, Image: 25, LF: 5, EF/LRTF: 15, MI-Matrix: 20
		Optimizer	AdamW
		Learning rate	Unimodal: 0.0001, EF: 0.04, LF/LRTF/MI-Matrix: 0.008
		Weight decay	0.01
	CCA [145]	Loss	Binary Cross Entropy + CCA
		CCA weight	0.001
		Batch size	800
		Num epochs	20
		Optimizer	AdamW
		Learning rate	0.01
		Weight decay	0.01
	RMFE [53]	Loss	Binary Cross Entropy + Regularization
		Regularization weight	1e-10
		Batch size	128
		Num epochs	10
		Optimizer	AdamW
		Learning rate	0.01
		Weight decay	0.01
	RefNet [135]	Loss	Binary Cross Entropy + Contrast + Self-supervised
		Contrast weight	0.0001
		Self-supervised weight	0.1
		Batch size	128
		Num epochs	10
		Optimizer	AdamW
		Learning rate	0.01
		Weight decay	0.01
	MFM [155]	Loss	Binary Cross Entropy + Reconstruction (MSE)
		Batch size	128
		Num epochs	10
		Optimizer	Adam
		Learning rate	0.005
		Recon Loss Modality Weight	[1,1]
		Cross Entropy Weight	2.0
		Intermediate Modules	MLP [512,256,256], MLP [512,256,256], MLP [1024,512,256]
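
As an illustration of how the training rows of this table could be consumed programmatically, the following is a minimal sketch in plain Python. The names `TrainConfig` and `train_config` are hypothetical and do not come from the original work; the sketch also assumes that "Unimodal" in the learning-rate entry covers both the text-only and image-only models, and it omits the per-method loss-term weights (CCA weight, regularization weight, and so on). Where the table lists no weight decay (MFM), the field is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """One training row of the table (hypothetical container, not the authors' API)."""
    loss: str
    batch_size: int
    num_epochs: int
    optimizer: str
    learning_rate: float
    weight_decay: Optional[float]  # None where the table lists no value


# Shared row "Unimodal, EF, LF, LRTF, MI-Matrix": per-method epochs and learning rates.
# Assumption: "Unimodal" applies to both the text-only and image-only models.
_SHARED_EPOCHS = {"text": 125, "image": 25, "lf": 5, "ef": 15, "lrtf": 15, "mi-matrix": 20}
_SHARED_LR = {"text": 1e-4, "image": 1e-4, "ef": 0.04, "lf": 0.008, "lrtf": 0.008, "mi-matrix": 0.008}

# Methods that have their own row in the table (loss-term weights omitted here).
_OWN_ROW = {
    "cca": TrainConfig("BCE + CCA", 800, 20, "AdamW", 0.01, 0.01),
    "rmfe": TrainConfig("BCE + regularization", 128, 10, "AdamW", 0.01, 0.01),
    "refnet": TrainConfig("BCE + contrast + self-supervised", 128, 10, "AdamW", 0.01, 0.01),
    "mfm": TrainConfig("BCE + reconstruction (MSE)", 128, 10, "Adam", 0.005, None),
}


def train_config(method: str) -> TrainConfig:
    """Return the training hyperparameters listed in the table for `method`."""
    key = method.strip().lower()
    if key in _OWN_ROW:
        return _OWN_ROW[key]
    if key in _SHARED_EPOCHS:
        # Loss, batch size, optimizer and weight decay are common to the shared row.
        return TrainConfig("BCE", 128, _SHARED_EPOCHS[key], "AdamW", _SHARED_LR[key], 0.01)
    raise KeyError(f"no hyperparameters listed for method {method!r}")


if __name__ == "__main__":
    for name in ("Text", "EF", "LRTF", "CCA", "MFM"):
        print(name, train_config(name))
```

Keeping the shared row as two small lookup dictionaries mirrors the table, where only the number of epochs and the learning rate vary across the unimodal, EF, LF, LRTF, and MI-Matrix settings.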