TABLE 1.
Data Set | Greater than 50 people recorded (# people) | Greater than 5,000 clips (# clips) | At least 6 emotion categories (# categories) | At least 8 raters per clip for over 95% of clips (# raters) | All 3 rating modalities (which modalities) |
---|---|---|---|---|---|
CREMA-D (this work) | ✓ (91) | ✓ (7,442) | ✓ (6) | ✓ (4-12, mean 9.8) | ✓ (audio, visual, audio-visual) |
GEMEP [31] | x (10) | x (1,260) | ✓ (18) | ✓ (Audio 23, Visual 25, AV 23) | ✓ (audio, visual, audio-visual) |
De Silva Multimodal [32] | x (2) | x (72) | ✓ (6) | ✓ (18) | ✓ (audio, visual, audio-visual) |
Mower Provost [15] | x (1) | x (72) | x (5) | ✓ (117) | ✓ (audio, visual, audio-visual, AV mismatch) |
AV Integration [33] | x (6) | x (60) | x (2) | ✓ (8) | ✓ (audio, visual, audio-visual, AV mismatch) |
AV Synthetic Character [34] | x (1 female voice, 1 animated face) | x (210) | x (4) | x (3 to 4 for AV, 6 to 7 for A or V) | ✓ (audio, visual, audio-visual) |
RekEmozio [35] | x (17) | x (2,720) | x (0) | x (3 to 4) | x (audio for oral, visual for faces) |
Vera Am Mittag German Audio-Visual Database [36] | ✓ (104) | x (1,421) | x (7, for faces) | ✓ (Audio: 6 or 17; Face: 8-34, mean 14) | x (audio, visual) |
IEMOCAP [37] | x (10) | ✓ (10,039) | ✓ (9) | x (3) | x (audio-visual) |
Chen Bimodal [38] | ✓ (100) | ✓ (9,900) | ✓ (11) | x (None) | x (audio-visual) |
HUMAINE [39] | x (≤48, unspecified) | x (48) | ✓ (48) | x (6) | x (audio-visual) |
RECOLA [40] | x (46) | x (46) | x (2) | x (6) | x (audio-visual) |
CHAD [41] | x (42) | ✓ (6,228) | ✓ (7) | ✓ (120) | x (audio) |
MAHNOB-HCI [42] | x (27) | x (1,296) | ✓ (9) | x (1) | x (self-report) |
✓ indicates that the criterion is met; x indicates that the criterion is not met. Each highlighted cell indicates that the criterion for the column was met by the data set.