Author manuscript; available in PMC: 2014 Oct 7.

Published in final edited form as: Proc ACM Int Conf Multimodal Interact. 2012;2012:485–492. doi: 10.1145/2388676.2388781

Table 4.

Correlation performance (the average of the correlation coefficients computed on each sequence) of individual modalities and their fusions for fully continuous estimation of affect dimension states. Results for both sub-challenges (frame-level FCSC and word-level WLSC) are shown on the development and test sets. Extrap. denotes extrapolation over non-speech regions for the audio and lexical indicators. PF stands for particle-filtering-based fusion. Mixed denotes the use of the best fusion method for each affect dimension.

(a) FCSC - development
	Arousal	Expectancy	Power	Valence	Avg.
Single Modality
Baseline Video	0.157	0.130	0.115	0.186	0.146
Video	0.306	0.215	0.242	0.370	0.283
Audio (Extrap.)	0.215	0.215	0.455	0.297	0.295
Lexical (Extrap.)	0.171	0.176	0.396	0.308	0.263
Speech Modality Fusions
Audio+Lex. (PF)	0.276	0.208	0.283	0.373	0.285
Audio+Lex. (Extrap., PF)	0.275	0.205	0.556	0.423	0.365
Video and Speech Modality Fusions
Baseline Video+Audio	0.162	0.162	0.111	0.208	0.157
Video+Audio+Lex. (PF)	0.383	0.266	0.238	0.408	0.324
Video+Audio+Lex. (Extrap., PF)	0.377	0.210	0.477	0.473	0.384
Video+Audio+Lex. (Mixed, PF)	0.383	0.266	0.556	0.473	0.420

(b) WLSC - development
	Arousal	Expectancy	Power	Valence	Avg.
Single Modality
Baseline Video	0.145	0.111	0.103	0.137	0.124
Video	0.280	0.202	0.223	0.383	0.272
Baseline Audio	0.097	0.052	0.061	0.085	0.074
Audio	0.257	0.239	0.147	0.270	0.228
Lexical	0.230	0.211	0.160	0.270	0.218
Speech Modality Fusions
Audio+Lex. (PF)	0.323	0.222	0.192	0.358	0.274
Audio+Lex. (Extrap., PF)	0.329	0.169	0.296	0.377	0.293
Video and Speech Modality Fusions
Baseline Video+Audio	0.113	0.090	0.083	0.114	0.100
Video+Audio+Lex. (PF)	0.350	0.263	0.209	0.437	0.315
Video+Audio+Lex. (Extrap., PF)	0.384	0.182	0.228	0.458	0.313
Video+Audio+Lex. (Mixed, PF)	0.350	0.263	0.296	0.458	0.342
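
The caption defines Mixed as using the best fusion method for each affect dimension. The sketch below illustrates one reading that is consistent with the numbers in panels (a) and (b): the best fusion row per dimension is picked on the FCSC development results, and the same choice is then looked up in the WLSC development results. This is an inference from the table values, not something the caption states. The function names are hypothetical, and the row labels use the caption's abbreviation Extrap.

```python
# Fusion rows copied from panels (a) and (b); columns follow DIMENSIONS.
DIMENSIONS = ["Arousal", "Expectancy", "Power", "Valence"]

fcsc_dev = {
    "Audio+Lex. (PF)": [0.276, 0.208, 0.283, 0.373],
    "Audio+Lex. (Extrap., PF)": [0.275, 0.205, 0.556, 0.423],
    "Video+Audio+Lex. (PF)": [0.383, 0.266, 0.238, 0.408],
    "Video+Audio+Lex. (Extrap., PF)": [0.377, 0.210, 0.477, 0.473],
}

wlsc_dev = {
    "Audio+Lex. (PF)": [0.323, 0.222, 0.192, 0.358],
    "Audio+Lex. (Extrap., PF)": [0.329, 0.169, 0.296, 0.377],
    "Video+Audio+Lex. (PF)": [0.350, 0.263, 0.209, 0.437],
    "Video+Audio+Lex. (Extrap., PF)": [0.384, 0.182, 0.228, 0.458],
}


def best_method_per_dimension(scores):
    """Return, for each dimension, the fusion row with the highest correlation."""
    return [
        max(scores, key=lambda method: scores[method][i])
        for i in range(len(DIMENSIONS))
    ]


def mixed_row(selection, scores):
    """Look up the score of the selected method for each dimension."""
    return [scores[method][i] for i, method in enumerate(selection)]


# Assumption: the per-dimension choice is made once, on the FCSC development set.
selection = best_method_per_dimension(fcsc_dev)
mixed_fcsc = mixed_row(selection, fcsc_dev)
mixed_wlsc = mixed_row(selection, wlsc_dev)

for name, method in zip(DIMENSIONS, selection):
    print(f"{name}: {method}")
print("Mixed (FCSC dev):", mixed_fcsc, "avg", sum(mixed_fcsc) / len(mixed_fcsc))
print("Mixed (WLSC dev):", mixed_wlsc, "avg", sum(mixed_wlsc) / len(mixed_wlsc))
```

Under this assumption the per-dimension values match the Mixed rows of panels (a) and (b), and their unrounded averages agree with the reported Avg. column to three decimals.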

(c) FCSC - test
	Arousal	Expectancy	Power	Valence	Avg.
Single Modality
Baseline Video	0.092	0.121	0.064	0.140	0.104
Video	0.251	0.153	0.099	0.210	0.178
Audio (Extrap.)	0.074	0.032	0.418	0.149	0.168
Lexical (Extrap.)	0.046	0.057	0.426	0.228	0.189
Fusion
Baseline Video+Audio	0.149	0.110	0.138	0.146	0.136
Video+Audio+Lex. (Mixed, PF)	0.359	0.215	0.477	0.325	0.344

(d) WLSC - test
	Arousal	Expectancy	Power	Valence	Avg.
Single Modality
Baseline Video	0.091	0.114	0.121	0.143	0.117
Video	0.184	0.156	0.146	0.226	0.178
Baseline Audio	0.119	0.075	0.056	0.076	0.081
Audio	0.078	0.087	0.073	0.128	0.091
Lexical	0.078	0.190	0.189	0.189	0.162
Fusion
Baseline Video+Audio	0.103	0.105	0.066	0.111	0.096
Video+Audio+Lex. (Mixed, PF)	0.302	0.194	0.293	0.331	0.280
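
The score reported throughout Table 4 is described in the caption as the average of correlation coefficients evaluated over each sequence. A minimal sketch of that computation follows. It assumes Pearson correlation, since the caption says only "correlation coefficients", and the function names `pearson` and `average_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_correlation(predictions, labels):
    """Compute one coefficient per sequence, then average the coefficients."""
    if len(predictions) != len(labels) or not predictions:
        raise ValueError("need the same non-zero number of sequences")
    scores = [pearson(p, t) for p, t in zip(predictions, labels)]
    return sum(scores) / len(scores)


# Toy data: one perfectly correlated and one perfectly anti-correlated sequence.
toy_predictions = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
toy_labels = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
toy_score = average_correlation(toy_predictions, toy_labels)
print(toy_score)
```

Averaging per-sequence coefficients weights every sequence equally regardless of its length, which differs from computing a single coefficient over all frames concatenated together.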