Table 3.
Model | Regression: mR2 | Classification: mAP (%) | Classification: mRA (%) | ERS |
---|---|---|---|---|
A random method based on priors | ||||
Chance | 0 | 10.55 | 50 | 0.151 |
Learning from skeleton | ||||
ST-GCN | 0.044 | 12.63 | 55.96 | 0.194 |
LMA | 0.075 | 13.59 | 57.71 | 0.216 |
Learning from pixels | ||||
TF | −0.008 | 10.93 | 50.25 | 0.149 |
TS-ResNet101 | 0.084 | 17.04 | 62.29 | 0.240 |
I3D | 0.098 | 15.37 | 61.24 | 0.241 |
TSN | 0.095 | 17.02 | 62.70 | 0.247 |
TSN-Spatial | 0.048 | 15.34 | 60.03 | 0.212 |
TSN-Flow | 0.098 | 15.78 | 61.28 | 0.241 |
Best performance for each evaluation metric under each modality is highlighted in bold
mR2 = mean of the coefficient of determination (R2) over dimensional emotions, mAP (%) = mean average precision, i.e. area under the precision-recall curve (PR AUC), over categorical emotions, mRA (%) = mean area under the ROC curve (ROC AUC) over categorical emotions, and ERS = emotion recognition score. Baseline methods: ST-GCN (Yan et al. 2018), TF (Kantorov and Laptev 2014), TS-ResNet101 (Simonyan and Zisserman 2014), I3D (Carreira and Zisserman 2017), and TSN (Wang et al. 2016).
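
The table notes do not spell out how ERS combines the three metrics. The sketch below assumes one plausible definition: ERS is the average of mR2 and the mean of mAP and mRA, with both percentages converted to fractions. That definition is an assumption, not something stated in the table, and the names `ers`, `TABLE3`, and `max_abs_gap` are hypothetical. The script recomputes ERS for every row so the result can be compared with the reported column. Small gaps are expected, because the inputs are themselves rounded.

```python
# Rows copied from Table 3: (mR2, mAP in %, mRA in %, reported ERS).
TABLE3 = {
    "Chance": (0.0, 10.55, 50.0, 0.151),
    "ST-GCN": (0.044, 12.63, 55.96, 0.194),
    "LMA": (0.075, 13.59, 57.71, 0.216),
    "TF": (-0.008, 10.93, 50.25, 0.149),
    "TS-ResNet101": (0.084, 17.04, 62.29, 0.240),
    "I3D": (0.098, 15.37, 61.24, 0.241),
    "TSN": (0.095, 17.02, 62.70, 0.247),
    "TSN-Spatial": (0.048, 15.34, 60.03, 0.212),
    "TSN-Flow": (0.098, 15.78, 61.28, 0.241),
}


def ers(mr2, map_pct, mra_pct):
    """Assumed ERS: mean of mR2 and the classification score.

    The classification score is taken to be the mean of mAP and mRA,
    both converted from percentages to fractions. This definition is an
    assumption for illustration, not taken from the table itself.
    """
    classification = (map_pct / 100.0 + mra_pct / 100.0) / 2.0
    return (mr2 + classification) / 2.0


def max_abs_gap(table=TABLE3):
    """Largest absolute difference between recomputed and reported ERS."""
    return max(
        abs(ers(mr2, map_pct, mra_pct) - reported)
        for mr2, map_pct, mra_pct, reported in table.values()
    )


if __name__ == "__main__":
    for model, (mr2, map_pct, mra_pct, reported) in TABLE3.items():
        recomputed = ers(mr2, map_pct, mra_pct)
        print(f"{model:14s} recomputed={recomputed:.3f} reported={reported:.3f}")
    print(f"largest gap: {max_abs_gap():.4f}")
```

If the assumed definition matches the one used to build the table, the largest gap should stay at the level of rounding error in the third decimal place.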