Table 11:
| Component | Model | Parameter | Value |
|---|---|---|---|
| Text Encoder | 2-Layer MaxoutMLP | Hidden size | 512 |
| | | Output dim | 128/256/512 |
| | | MLP num | 2 |
| Image Encoder | 2-Layer MaxoutMLP | Hidden size | 1024 |
| | | Output dim | 128/256/512 |
| | | MLP num | 2 |
| Classification Head | Linear | | |
| | 2-Layer MLP | Hidden size | 512 |
| | | Activation | ReLU |
| | 2-Layer Maxout_Linear | Hidden size | 512 |
| | | MLP num | 2 |
| Fusion | Concatenate | | |
| | LRTF [106] | Output dim | 512 |
| | | Ranks | 128 |
| | MI-Matrix [77] | Output dim | 1024 |
| Training | Unimodal, EF, LF, LRTF, MI-Matrix | Loss | Binary Cross Entropy |
| | | Batch size | 128 |
| | | Num epochs | Text: 125, Image: 25, LF: 5, EF/LRTF: 15, MI-Matrix: 20 |
| | | Optimizer | AdamW |
| | | Learning rate | Unimodal: 0.0001, EF: 0.04, LF/LRTF/MI-Matrix: 0.008 |
| | | Weight decay | 0.01 |
| | CCA [145] | Loss | Binary Cross Entropy + CCA |
| | | CCA weight | 0.001 |
| | | Batch size | 800 |
| | | Num epochs | 20 |
| | | Optimizer | AdamW |
| | | Learning rate | 0.01 |
| | | Weight decay | 0.01 |
| | RMFE [53] | Loss | Binary Cross Entropy + Regularization |
| | | Regularization weight | 1e-10 |
| | | Batch size | 128 |
| | | Num epochs | 10 |
| | | Optimizer | AdamW |
| | | Learning rate | 0.01 |
| | | Weight decay | 0.01 |
| | RefNet [135] | Loss | Binary Cross Entropy + Contrast + Self-supervised |
| | | Contrast weight | 0.0001 |
| | | Self-supervised weight | 0.1 |
| | | Batch size | 128 |
| | | Num epochs | 10 |
| | | Optimizer | AdamW |
| | | Learning rate | 0.01 |
| | | Weight decay | 0.01 |
| | MFM [155] | Loss | Binary Cross Entropy + Reconstruction (MSE) |
| | | Batch size | 128 |
| | | Num epochs | 10 |
| | | Optimizer | Adam |
| | | Learning rate | 0.005 |
| | | Recon Loss Modality Weight | [1,1] |
| | | Cross Entropy Weight | 2.0 |
| | | Intermediate Modules | MLP [512,256,256], MLP [512,256,256], MLP [1024,512,256] |
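
For readers who want to reuse these settings, the training rows of Table 11 can be collected into a small lookup structure. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `TRAIN_CONFIGS`, and the `loss_weights` keys are hypothetical, and it assumes that the "Unimodal" learning rate applies to both the text-only and image-only models. The MFM row lists no weight decay, so that field is left as `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class TrainConfig:
    """One training row of Table 11 (hypothetical container, for illustration)."""
    loss: str
    batch_size: int
    num_epochs: int
    optimizer: str
    learning_rate: float
    weight_decay: Optional[float] = None  # None where the table lists no value
    loss_weights: Dict[str, object] = field(default_factory=dict)


BCE = "Binary Cross Entropy"

# Values copied from Table 11; the dictionary keys are illustrative labels.
TRAIN_CONFIGS = {
    # Assumption: "Unimodal" covers the text-only and image-only models.
    "Text": TrainConfig(BCE, 128, 125, "AdamW", 1e-4, 0.01),
    "Image": TrainConfig(BCE, 128, 25, "AdamW", 1e-4, 0.01),
    "EF": TrainConfig(BCE, 128, 15, "AdamW", 0.04, 0.01),
    "LF": TrainConfig(BCE, 128, 5, "AdamW", 0.008, 0.01),
    "LRTF": TrainConfig(BCE, 128, 15, "AdamW", 0.008, 0.01),
    "MI-Matrix": TrainConfig(BCE, 128, 20, "AdamW", 0.008, 0.01),
    "CCA": TrainConfig(BCE + " + CCA", 800, 20, "AdamW", 0.01, 0.01,
                       {"cca": 0.001}),
    "RMFE": TrainConfig(BCE + " + Regularization", 128, 10, "AdamW", 0.01, 0.01,
                        {"regularization": 1e-10}),
    "RefNet": TrainConfig(BCE + " + Contrast + Self-supervised", 128, 10,
                          "AdamW", 0.01, 0.01,
                          {"contrast": 1e-4, "self_supervised": 0.1}),
    "MFM": TrainConfig(BCE + " + Reconstruction (MSE)", 128, 10, "Adam", 0.005,
                       None,
                       {"recon_modality": [1, 1], "cross_entropy": 2.0}),
}


def get_train_config(method: str) -> TrainConfig:
    """Return the Table 11 training settings for a method label."""
    try:
        return TRAIN_CONFIGS[method]
    except KeyError:
        known = ", ".join(sorted(TRAIN_CONFIGS))
        raise KeyError(f"Unknown method {method!r}; expected one of: {known}")
```

A call such as `get_train_config("CCA")` returns the CCA row, which is the only one using a batch size of 800. Keeping the per-method loss weights in a separate dictionary leaves the shared fields uniform across methods.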