Author manuscript; available in PMC: 2014 Oct 7.
Published in final edited form as: Proc ACM Int Conf Multimodal Interact. 2012;2012:485–492. doi: 10.1145/2388676.2388781

Table 4.

Correlation performance (correlation coefficients computed per sequence and then averaged) of individual modalities and their fusions for fully continuous affect dimension state estimation. Results for both sub-challenges (frame-level FCSC and word-level WLSC) are shown on the development and test sets. Extrap. denotes extrapolation over non-speech regions for the audio and lexical indicators. PF stands for particle-filtering-based fusion. Mixed denotes using the best fusion method for each affect dimension.
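The metric described in the caption can be sketched as follows. This is a minimal illustration, assuming the Pearson coefficient is the correlation in question (the caption does not name it); the function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def mean_sequence_correlation(predictions, labels):
    """Compute one coefficient per sequence, then average the coefficients."""
    scores = [pearson(p, l) for p, l in zip(predictions, labels)]
    return sum(scores) / len(scores)

# Toy data (hypothetical): two sequences of predictions and ground-truth traces.
preds = [[1, 2, 3, 4], [4, 3, 2, 1]]
truth = [[2, 4, 6, 8], [1, 2, 3, 4]]

# One perfectly correlated and one perfectly anti-correlated sequence average to 0.0.
score = mean_sequence_correlation(preds, truth)
print(score)
```

Averaging per-sequence coefficients weights every sequence equally regardless of its length, which differs from computing a single coefficient over all concatenated frames.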

(a) FCSC - development
Arousal Expectancy Power Valence Avg.
Single Modality
Baseline Video 0.157 0.130 0.115 0.186 0.146
Video 0.306 0.215 0.242 0.370 0.283
Audio (Extrap.) 0.215 0.215 0.455 0.297 0.295
Lexical (Extrap.) 0.171 0.176 0.396 0.308 0.263
Speech Modality Fusions
Audio+Lex. (PF) 0.276 0.208 0.283 0.373 0.285
Audio+Lex. (Extrap., PF) 0.275 0.205 0.556 0.423 0.365
Video and Speech Modality Fusions
Baseline Video+Audio 0.162 0.162 0.111 0.208 0.157
Video+Audio+Lex. (PF) 0.383 0.266 0.238 0.408 0.324
Video+Audio+Lex. (Extrap., PF) 0.377 0.210 0.477 0.473 0.384
Video+Audio+Lex. (Mixed, PF) 0.383 0.266 0.556 0.473 0.420
(b) WLSC - development
Arousal Expectancy Power Valence Avg.
Single Modality
Baseline Video 0.145 0.111 0.103 0.137 0.124
Video 0.280 0.202 0.223 0.383 0.272
Baseline Audio 0.097 0.052 0.061 0.085 0.074
Audio 0.257 0.239 0.147 0.270 0.228
Lexical 0.230 0.211 0.160 0.270 0.218
Speech Modality Fusions
Audio+Lex. (PF) 0.323 0.222 0.192 0.358 0.274
Audio+Lex. (Extrap., PF) 0.329 0.169 0.296 0.377 0.293
Video and Speech Modality Fusions
Baseline Video+Audio 0.113 0.090 0.083 0.114 0.100
Video+Audio+Lex. (PF) 0.350 0.263 0.209 0.437 0.315
Video+Audio+Lex. (Extrap., PF) 0.384 0.182 0.228 0.458 0.313
Video+Audio+Lex. (Mixed, PF) 0.350 0.263 0.296 0.458 0.342
(c) FCSC - test
Arousal Expectancy Power Valence Avg.
Single Modality
Baseline Video 0.092 0.121 0.064 0.140 0.104
Video 0.251 0.153 0.099 0.210 0.178
Audio (Extrap.) 0.074 0.032 0.418 0.149 0.168
Lexical (Extrap.) 0.046 0.057 0.426 0.228 0.189
Fusion
Baseline Video+Audio 0.149 0.110 0.138 0.146 0.136
Video+Audio+Lex. (Mixed, PF) 0.359 0.215 0.477 0.325 0.344
(d) WLSC - test
Arousal Expectancy Power Valence Avg.
Single Modality
Baseline Video 0.091 0.114 0.121 0.143 0.117
Video 0.184 0.156 0.146 0.226 0.178
Baseline Audio 0.119 0.075 0.056 0.076 0.081
Audio 0.078 0.087 0.073 0.128 0.091
Lexical 0.078 0.190 0.189 0.189 0.162
Fusion
Baseline Video+Audio 0.103 0.105 0.066 0.111 0.096
Video+Audio+Lex. (Mixed, PF) 0.302 0.194 0.293 0.331 0.280
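The Mixed row can be illustrated with a short sketch that picks, for each affect dimension, the fusion method with the highest development-set correlation. The selection rule coded here (a plain per-dimension maximum over the four PF fusion rows) is an assumption, and the helper name is hypothetical; the numbers are copied from part (a). The rule reproduces the Mixed row of (a), but not the arousal entry of the Mixed row in (b), so the authors' actual procedure may differ.

```python
DIMENSIONS = ["Arousal", "Expectancy", "Power", "Valence"]

# PF fusion rows from Table 4(a), FCSC development set.
candidates = {
    "Audio+Lex. (PF)": [0.276, 0.208, 0.283, 0.373],
    "Audio+Lex. (Extrap., PF)": [0.275, 0.205, 0.556, 0.423],
    "Video+Audio+Lex. (PF)": [0.383, 0.266, 0.238, 0.408],
    "Video+Audio+Lex. (Extrap., PF)": [0.377, 0.210, 0.477, 0.473],
}

def pick_best_per_dimension(rows):
    """Return {dimension: (method, correlation)} using the per-dimension maximum."""
    best = {}
    for i, dim in enumerate(DIMENSIONS):
        # Assumed rule: keep the method with the highest correlation on this dimension.
        method = max(rows, key=lambda name: rows[name][i])
        best[dim] = (method, rows[method][i])
    return best

mixed = pick_best_per_dimension(candidates)
mixed_avg = sum(value for _, value in mixed.values()) / len(DIMENSIONS)

for dim, (method, value) in mixed.items():
    print(f"{dim}: {value:.3f} from {method}")
print(f"Avg.: {mixed_avg:.4f}")
```

Under this rule the power dimension is served by a speech-only fusion, while arousal, expectancy, and valence come from fusions that include video.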