Author manuscript; available in PMC: 2020 May 1.
Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2019 Jul;2019:6558–6569. doi: 10.18653/v1/p19-1656

Table 1:

Results for multimodal sentiment analysis on CMU-MOSI with aligned and non-aligned multimodal sequences. The marker h on a metric means higher is better, and l means lower is better. EF stands for early fusion, and LF stands for late fusion.

(Word Aligned) CMU-MOSI Sentiment

| Model | Acc7 (h) | Acc2 (h) | F1 (h) | MAE (l) | Corr (h) |
|---|---|---|---|---|---|
| EF-LSTM | 33.7 | 75.3 | 75.2 | 1.023 | 0.608 |
| LF-LSTM | 35.3 | 76.8 | 76.7 | 1.015 | 0.625 |
| RMFN (Liang et al., 2018) | 38.3 | 78.4 | 78.0 | 0.922 | 0.681 |
| MFM (Tsai et al., 2019) | 36.2 | 78.1 | 78.1 | 0.951 | 0.662 |
| RAVEN (Wang et al., 2019) | 33.2 | 78.0 | 76.6 | 0.915 | 0.691 |
| MCTN (Pham et al., 2019) | 35.6 | 79.3 | 79.1 | 0.909 | 0.676 |
| MulT (ours) | 40.0 | 83.0 | 82.8 | 0.871 | 0.698 |

(Unaligned) CMU-MOSI Sentiment

| Model | Acc7 (h) | Acc2 (h) | F1 (h) | MAE (l) | Corr (h) |
|---|---|---|---|---|---|
| CTC (Graves et al., 2006) + EF-LSTM | 31.0 | 73.6 | 74.5 | 1.078 | 0.542 |
| LF-LSTM | 33.7 | 77.6 | 77.8 | 0.988 | 0.624 |
| CTC + MCTN (Pham et al., 2019) | 32.7 | 75.9 | 76.4 | 0.991 | 0.613 |
| CTC + RAVEN (Wang et al., 2019) | 31.7 | 72.7 | 73.1 | 1.076 | 0.544 |
| MulT (ours) | 39.1 | 81.1 | 81.0 | 0.889 | 0.686 |
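
The gap between MulT and the strongest baseline on each metric can be checked with simple arithmetic. The Python sketch below transcribes the values from Table 1 and, for each metric, finds the best baseline and the margin by which MulT improves on it. The names `ALIGNED`, `UNALIGNED`, and `margins` are illustrative and do not come from the paper. The sign of the MAE margin is flipped so that a positive number always means MulT is better.

```python
# Values transcribed from Table 1; column order: Acc7, Acc2, F1, MAE, Corr.
METRICS = ["Acc7", "Acc2", "F1", "MAE", "Corr"]
LOWER_IS_BETTER = {"MAE"}  # the "l" metric; all others are "h"

ALIGNED = {
    "EF-LSTM": [33.7, 75.3, 75.2, 1.023, 0.608],
    "LF-LSTM": [35.3, 76.8, 76.7, 1.015, 0.625],
    "RMFN":    [38.3, 78.4, 78.0, 0.922, 0.681],
    "MFM":     [36.2, 78.1, 78.1, 0.951, 0.662],
    "RAVEN":   [33.2, 78.0, 76.6, 0.915, 0.691],
    "MCTN":    [35.6, 79.3, 79.1, 0.909, 0.676],
    "MulT":    [40.0, 83.0, 82.8, 0.871, 0.698],
}

UNALIGNED = {
    "CTC + EF-LSTM": [31.0, 73.6, 74.5, 1.078, 0.542],
    "LF-LSTM":       [33.7, 77.6, 77.8, 0.988, 0.624],
    "CTC + MCTN":    [32.7, 75.9, 76.4, 0.991, 0.613],
    "CTC + RAVEN":   [31.7, 72.7, 73.1, 1.076, 0.544],
    "MulT":          [39.1, 81.1, 81.0, 0.889, 0.686],
}


def margins(results, ours="MulT"):
    """Return {metric: (best_baseline, improvement)} for one table block.

    A positive improvement means `ours` beats the best baseline,
    regardless of whether the metric is higher- or lower-is-better.
    """
    out = {}
    for i, metric in enumerate(METRICS):
        baselines = {name: row[i] for name, row in results.items() if name != ours}
        pick = min if metric in LOWER_IS_BETTER else max
        best = pick(baselines, key=baselines.get)
        diff = results[ours][i] - baselines[best]
        if metric in LOWER_IS_BETTER:
            diff = -diff
        out[metric] = (best, round(diff, 3))
    return out


if __name__ == "__main__":
    for label, block in [("aligned", ALIGNED), ("unaligned", UNALIGNED)]:
        print(label)
        for metric, (best, gain) in margins(block).items():
            print(f"  {metric}: best baseline {best}, MulT margin {gain}")
```

Under this reading of the table, the strongest baseline varies by metric in the aligned setting, while LF-LSTM is the strongest baseline on every metric in the unaligned setting.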