2020 Sep 8;7:293. doi: 10.1038/s41597-020-00630-y

Table 1.

Comparison of the K-EmoCon dataset with the existing multimodal emotion recognition datasets.

Name (year)	Size	Modalities	Spon. vs. posed	Natural vs. induced	Annotation method	Annotation type	Context
IEMOCAP (2008)⁵¹	10	Videos, face motion capture, gesture, speech (audio & transcribed)	Both	Both^†	Per dialog turn	S, E	Dyadic
SEMAINE (2011)⁵²	150	Videos, FAUs, speech (audio & transcribed)	Spon.	Induced	Trace-style continuous	E	Dyadic
MAHNOB-HCI (2011)²³	27	Videos (face and body), eye gaze, audio, biosignals (EEG, GSR, ECG, respiration, skin temp.)	Spon.	Induced	Per stimuli	S	Individual
DEAP (2012)²⁴	32	Face videos, biosignals (EEG, GSR, BVP, respiration, skin temp., EMG & EOG)	Spon.	Induced	Per stimuli	S	Individual
DECAF (2015)²⁵	30	NIR face videos, biosignals (MEG, hEOG, ECG, tEMG)	Spon.	Induced	Per stimuli	S	Individual
ASCERTAIN (2016)²⁶	58	Facial motion units (EMO), biosignals (ECG, GSR, EEG)	Spon.	Induced	Per stimuli	S	Individual
MSP-IMPROV (2016)⁵³	12	Face videos, speech audio	Both	Both^†	Per dialog turn	E	Dyadic
DREAMER (2017)²⁷	23	Biosignals (EEG, ECG)	Spon.	Induced	Per stimuli	S	Individual
AMIGOS (2018)²⁸	40	Videos (face & body), biosignals (EEG, ECG, GSR)	Spon.	Induced	Per stimuli	S, E	Individual, Group
MELD (2019)³⁸	7	Videos, speech (audio & transcribed)	Both	Both^†	Turn-based	E	Dyadic, Group
CASE (2019)²⁹	30	Biosignals (ECG, respiration, BVP, GSR, skin temp., EMG)	Spon.	Induced	Trace-style continuous	S	Individual
CLAS (2020)¹⁰⁰	64	Biosignals (ECG, PPG, EDA), accelerometer	Spon.	Induced	Per stimuli/task	Predefined^‡	Individual
K-EmoCon (2020)	32	Videos (face, gesture), speech audio, accelerometer, biosignals (EEG, ECG, BVP, EDA, skin temp.)	Spon.	Natural	Interval-based continuous	S, P, E	Dyadic

Posed emotions are those a subject is instructed to enact, as opposed to spontaneous (Spon.) emotions. Similarly, induced emotions are those elicited with a set of selected stimuli, as opposed to natural emotions. For annotation types, S = self annotations, P = partner annotations, and E = external observer annotations.

^†A dataset was considered to contain induced emotions if scripted interaction was involved in the data collection, even though no artificial stimuli (such as an emotion-inducing video clip) were used.

^‡Predefined emotion categories of stimuli and success rates of participants in a set of purposefully selected cognitive tasks were used as ground-truth labels.