Table 1.
Comparison of the K-EmoCon dataset with the existing multimodal emotion recognition datasets.
| Name (year) | Size | Modalities | Spon. vs. posed | Natural vs. induced | Annotation method | Annotation type | Context |
|---|---|---|---|---|---|---|---|
| IEMOCAP (2008) [51] | 10 | Videos, face motion capture, gesture, speech (audio & transcribed) | Both | Both† | Per dialog turn | S, E | Dyadic |
| SEMAINE (2011) [52] | 150 | Videos, FAUs, speech (audio & transcribed) | Spon. | Induced | Trace-style continuous | E | Dyadic |
| MAHNOB-HCI (2011) [23] | 27 | Videos (face and body), eye gaze, audio, biosignals (EEG, GSR, ECG, respiration, skin temp.) | Spon. | Induced | Per stimuli | S | Individual |
| DEAP (2012) [24] | 32 | Face videos, biosignals (EEG, GSR, BVP, respiration, skin temp., EMG & EOG) | Spon. | Induced | Per stimuli | S | Individual |
| DECAF (2015) [25] | 30 | NIR face videos, biosignals (MEG, hEOG, ECG, tEMG) | Spon. | Induced | Per stimuli | S | Individual |
| ASCERTAIN (2016) [26] | 58 | Facial motion units (EMO), biosignals (ECG, GSR, EEG) | Spon. | Induced | Per stimuli | S | Individual |
| MSP-IMPROV (2016) [53] | 12 | Face videos, speech audio | Both | Both† | Per dialog turn | E | Dyadic |
| DREAMER (2017) [27] | 23 | Biosignals (EEG, ECG) | Spon. | Induced | Per stimuli | S | Individual |
| AMIGOS (2018) [28] | 40 | Videos (face & body), biosignals (EEG, ECG, GSR) | Spon. | Induced | Per stimuli | S, E | Individual, Group |
| MELD (2019) [38] | 7 | Videos, speech (audio & transcribed) | Both | Both† | Turn-based | E | Dyadic, Group |
| CASE (2019) [29] | 30 | Biosignals (ECG, respiration, BVP, GSR, skin temp., EMG) | Spon. | Induced | Trace-style continuous | S | Individual |
| CLAS (2020) [100] | 64 | Biosignals (ECG, PPG, EDA), accelerometer | Spon. | Induced | Per stimuli/task | Predefined‡ | Individual |
| K-EmoCon (2020) | 32 | Videos (face, gesture), speech audio, accelerometer, biosignals (EEG, ECG, BVP, EDA, skin temp.) | Spon. | Natural | Interval-based continuous | S, P, E | Dyadic |

Spon. = spontaneous. Posed emotions are those that a subject is instructed to enact. Similarly, induced emotions are those elicited with a set of selected stimuli. For annotation types, S = self-annotations, P = partner annotations, and E = external observer annotations.
†A dataset was considered to contain induced emotions if scripted interaction was involved in the data collection, even though no artificial stimulus (such as an emotion-inducing video clip) was used.
‡Predefined emotion categories of stimuli and success rates of participants in a set of purposefully selected cognitive tasks were used as ground-truth labels.