EB+ [24] |
2020 |
200 |
18–66 |
25 |
Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) |
1216 videos, with 395 K frames in total |
iSAFE [9] |
2020 |
44 |
17–22 |
60 |
Two ethnicities (Indo-Aryan and Dravidian (Asian)) |
395 clips |
RAF-DB * [70] |
2019 |
thousands |
- |
- |
The image URLs were collected from Flickr |
30,000 facial images |
TAVER * [26] |
2019 |
17 |
21–38 |
10 |
One ethnicity (Korean) |
17 videos of 1–4 min |
4DFAB * [61] |
2018 |
180 |
5–75 |
60 |
Three ethnicities (Caucasian (Europeans and Arabs), Asian (East-Asian and South-Asian) and Hispanic/Latino) |
Two million frames. The vertex number of reconstructed 3D meshes ranges from 60 k to 75 k |
Aff-Wild2 * [71] |
2018 |
258 |
infants, young and elderly |
30 |
Five ethnicities (Caucasian, Hispanic or Latino, Asian, black, or African American) |
Extends Aff-Wild with 260 more subjects and 1,413,000 new video frames |
RAVDESS * [27] |
2018 |
24 |
21–33 |
30 |
(Caucasian, East-Asian, and Mixed (East-Asian Caucasian, and Black-Canadian First nations Caucasian)) |
7356 recordings composed of 4320 speech recordings and 3036 song recordings |
AM-FED+ * [72] |
2018 |
416 |
- |
14 |
Participants from around the world |
1044 videos of naturalistic facial responses to online media content recorded over the Internet |
GFT * [28] |
2017 |
96 |
21–28 |
- |
Participants were randomly selected |
172,800 frames |
AffectNet * [73] |
2017 |
450,000 |
average age 33.01 years |
- |
More than 1,000,000 facial images from the Internet |
1,000,000 images with facial landmarks. 450,000 images annotated manually |
AFEW-VA * [74] |
2017 |
240 |
8–76 |
- |
Movie actors |
600 video clips |
SEWA * [29] |
2017 |
398 |
18–65 |
20–30 |
Six ethnicities (British, German, Hungarian, Greek, Serbian, and Chinese) |
1990 audio-visual recording clips |
BP4D+ (MMSE) [25] |
2016 |
140 |
18–66 |
25 |
Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) |
1.4 million frames. Over 10 TB of high-quality data generated for the research community |
Aff-Wild * [75] |
2016 |
500 |
- |
- |
- |
500 videos from YouTube |
EmotioNet * [17] |
2016 |
1,000,000 |
- |
- |
One million images of facial expressions downloaded from the Internet |
Images queried from web: 100,000 images annotated manually, 900,000 images annotated automatically |
FER-Wild * [15] |
2016 |
24,000 |
- |
- |
- |
24,000 images from web |
BAUM-1 * [16] |
2016 |
31 |
19–65 |
30 |
One ethnicity (Turkish) |
1184 multimodal facial video clips containing spontaneous facial expressions and speech of 13 emotional and mental states |
BioVid Emo * [30] |
2016 |
86 |
18–65 |
- |
- |
15 standardized film clips |
Vinereactor * [76] |
2016 |
222 |
- |
web-cam |
Mechanical Turk workers |
6029 video responses from 343 unique Mechanical Turk workers in response to 200 video stimuli. A total of 1,380,343 video frames |
CHEAVD * [77] |
2016 |
238 |
11–62 |
25 |
- |
Extracted from 34 films, two TV series and four other television shows. In the wild |
ISED * [31] |
2016 |
50 |
18–22 |
50 |
One ethnicity (Indian) |
428 videos |
4D CCDb * [32] |
2015 |
4 |
20–50 |
60 |
- |
34 audio-visuals |
MAHNOB Mimicry * [33] |
2015 |
60 |
18–34 |
25 |
Staff and students at Imperial College London |
Over 54 sessions of dyadic interactions between 12 confederates and their 48 counterparts |
OPEN-EmoRec-II * [34] |
2015 |
30 |
Mean age: women 37.5 years; men 51.1 years |
- |
- |
Video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and facial reactions annotations |
HAPPEI * [78] |
2015 |
8500 faces |
- |
- |
- |
4886 images |
AVEC’14 * [35] |
2014 |
84 |
18–63 |
- |
German |
300 audio-visuals |
BAUM-2 * [13] |
2014 |
286 |
5–73 |
- |
Two ethnicities (Turkish, English) |
1047 video clips |
BP4D-Spontaneous * [14] |
2013 |
41 |
18–29 |
25 |
Four ethnicities (Asian, African-American, Hispanic, and Euro-American) |
368,036 frames |
DISFA * [36] |
2013 |
27 |
18–50 |
20 |
Four ethnicities (Asian, Euro-American, Hispanic, and African-American) |
130,000 frames |
RECOLA * [37] |
2013 |
46 |
Mean age: 22 years, standard deviation: three years |
- |
Four ethnicities (French, Italian, German, and Portuguese) |
27 videos |
AM-FED * [79] |
2013 |
242 |
Range of ages and ethnicities |
14 |
Viewers from a range of ages and ethnicities |
168,359 frames/242 facial videos |
FER-2013 * [11] |
2013 |
35,685 |
- |
- |
- |
Images queried from web |
AVEC’13 (AViD-Corpus) * [38] |
2013 |
292 |
18–63 |
30 |
One ethnicity (German) |
340 audio-visuals |
CCDb * [39] |
2013 |
16 |
25–56 |
- |
All participants were fully fluent in the English language |
30 audio-visuals |
MAHNOB Laughter * [62] |
2013 |
22 |
Average age: 27 and 28 years |
25 |
Participants from 12 different countries and of different origins |
180 sessions: 563 laughter episodes, 849 speech utterances, 51 posed laughs, 67 speech–laugh episodes, and 167 other vocalizations annotated in the dataset |
DynEmo * [40] |
2013 |
358 |
25–65 |
25 |
One ethnicity (Caucasian) |
Two sets of 233 and 125 recordings of EFE of ordinary people |
PICS-Stirling ESRC 3D Face Database * [63] |
2013 |
99 |
- |
- |
- |
2D images, video sequences and 3D face scans |
DEAP * [41] |
2012 |
32 |
19–37 |
- |
Mostly European students |
40 one-minute-long videos shown to subjects |
AFEW * [10] |
2012 |
330 |
1–70 |
- |
Extracted from movies |
1426 sequences with length from 300 to 5400 ms. 1747 expressions |
SEMAINE * [42] |
2012 |
24 |
22–60 |
- |
Undergraduate and postgraduate students |
130,695 frames |
Belfast induced * [80] |
2012 |
Set1: 114 |
Undergraduate students |
- |
Undergraduate students |
570 audio-visuals |
|
|
Set2: 82 |
Mean age of participants 23.78 |
- |
Undergraduate students, postgraduate students or employed professionals |
650 audio-visuals |
|
|
Set3: 60 |
Mean age of participants 32.54 |
- |
(Peru, Northern Ireland) |
180 audio-visuals |
MAHNOB-HCI * [43] |
2012 |
27 |
19–40 |
60 |
Different educational backgrounds, from undergraduate students to postdoctoral fellows, with English proficiency ranging from intermediate to native speaker |
756 data sequences |
Hi4D-ADSIP * [12] |
2011 |
80 |
18–60 |
60 |
Undergraduate students from the Performing Arts Department at the University. Undergraduate students, postgraduate students and members of staff from other departments |
3360 images/sequences |
UNBC-McMaster (UNBC Shoulder Pain Archive (SP)) * [44] |
2011 |
25 |
- |
- |
Participants self-identified as having a problem with shoulder pain |
48,398 frames/200 video sequences |
CAM3D * [45] |
2011 |
16 |
24–50 |
25 |
Three ethnicities (Caucasian, Asian and Middle Eastern) |
108 videos of 12 mental states |
SFEW * [7] |
2011 |
95 |
- |
- |
- |
700 images: 346 images in Set 1 and 354 images in Set 2 |
B3D(AC) * [46] |
2010 |
14 |
21–53 |
25 |
Native English speakers |
1109 sequences, 4.67 s long |
USTC-NVIE * [64] |
2010 |
215 |
17–31 |
30 |
Students |
236 apex images |
CK+ * [47] |
2010 |
123 |
18–50 |
- |
Three ethnicities (Euro-American, Afro-American and other) |
593 sequences |
MMI-V * [65] |
2010 |
25 |
20–32 |
25 |
Three ethnicities (European, South American, Asian) |
1 h and 32 min of data. 392 segments |
AVLC * [67] |
2010 |
24 |
Average ages were respectively 30, 28 and 29 years |
25 |
Eleven ethnicities (Belgium, France, Italy, UK, Greece, Turkey, Kazakhstan, India, Canada, USA, and South Korea) |
1000 spontaneous laughs and 27 acted laughs |
AvID * [48] |
2009 |
15 |
19–37 |
- |
Native Slovenian speakers |
Approximately one hour of video for each subject |
AVIC [49] |
2009 |
21 |
≤30 and ≥40 |
25 |
Two ethnicities (Asian and European) |
No. episodes 324 |
DD [50] |
2009 |
57 |
- |
30 |
19% non-Caucasian |
No. episodes 238 |
VAM-faces * [81] |
2008 |
20 |
16–69 (70% ≤ 35) |
25 |
One ethnicity (German) |
1867 images (93.6 images per speaker on average) |
FreeTalk * [82] |
2008 |
4 |
- |
60 |
Originating from different countries and each of them speaking a different native language (Finnish, French, Japanese, and English) |
No. episodes 300 |
IEMOCAP * [68] |
2008 |
10 |
- |
120 |
Actors (fluent English speakers) |
Two hours of audiovisual data, including video, speech, motion capture of face, and text transcriptions |
SAL * [51] |
2008 |
4 |
- |
- |
- |
30 min sessions for each user |
HUMAINE * [52] |
2007 |
Multiple |
- |
- |
- |
50 ‘clips’ from naturalistic and induced data |
EmoTABOO * [53] |
2007 |
- |
- |
- |
French dataset |
10 clips |
AMI [69] |
2006 |
- |
- |
25 |
- |
A multi-modal data set consisting of 100 h of meeting recordings |
ENTERFACE * [54] |
2006 |
16 |
average age 25 |
- |
- |
- |
|
|
5 |
22–38 |
|
|
|
|
|
16 |
average age 25 |
|
|
|
RU-FACS [56] |
2005 |
100 |
18–30 |
24 |
Two ethnicities (African-American and Asian or Latino) |
400–800 min dataset |
MMI * [66] |
2005 |
19 |
19–62 |
24 |
Three ethnicities (European, Asian, or South American) |
Subjects portrayed 79 series of facial expressions. Image sequences of frontal and side views are captured. 740 static images/848 videos |
UT-Dallas * [55] |
2005 |
284 |
18–25 |
29.97 |
One ethnicity (Caucasian) |
1540 standardized clips |
MIT [57] |
2005 |
17 |
- |
- |
- |
Over 25,000 frames were scored |
EmoTV * [83] |
2005 |
48 |
- |
- |
French |
51 video clips |
UA-UIUC * [58] |
2004 |
28 |
Students |
- |
Students |
One video clip for each subject |
AAI [59] |
2004 |
60 |
18–30 |
- |
Two ethnicities (European American and Chinese American) |
One audiovisual for each subject |
Smile dataset [60] |
2001 |
95 |
- |
30 |
- |
195 spontaneous smiles |