Table 4.
Model | Speech encoder | Text encoder | Vision encoder | Fusion strategy |
---|---|---|---|---|
MFN [43] | LSTM | LSTM | LSTM | Feature fusion |
MCTN [32] | CNN | CNN | CNN | Concatenation |
RAVEN [33] | LSTM | LSTM | LSTM | Feature fusion |
HFusion [34] | openSMILE | Word2vec + CNN | 3D CNN | Hierarchical fusion |
MulT [35] | Conv 1D | Conv 1D | Conv 1D | Feature fusion |
SSE-FT [40] | Wav2vec | RoBERTa | FabNet | Hierarchical fusion |
CMC-HF (ours) | CSA encoder | EFC encoder | PC encoder | Hierarchical fusion |