2022 Feb 16;22(4):1524. doi: 10.3390/s22041524

Table 5.

Macro-expressions datasets. The columns report: the dataset name (Dataset); the year (Year); the number of subjects; the range of subjects’ ages (Age); the number of frames captured per second (FPS); ethnicity; and the amount of data/frames. In the table cells, a ‘-’ indicates that no information is available, while a ‘*’ following the dataset name indicates that the data are publicly available.

Dataset Year Number of Subjects Age FPS Ethnicity Amount of Data/Frames
EB+ [24] 2020 200 18–66 25 Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) 1216 videos, with 395 K frames in total
iSAFE [9] 2020 44 17–22 60 Two ethnicities (Indo-Aryan and Dravidian (Asian)) 395 clips
RAF-DB * [70] 2019 thousands - - The images URLs were collected from Flickr 30,000 facial images
TAVER * [26] 2019 17 21–38 10 One ethnicity (Korean) 17 videos of 1–4 min
4DFAB* [61] 2018 180 5–75 60 Three ethnicities (Caucasian (Europeans and Arabs), Asian (East-Asian and South-Asian) and Hispanic/Latino) Two million frames. The vertex number of reconstructed 3D meshes ranges from 60 k to 75 k
Aff-Wild2 * [71] 2018 258 infants, young and elderly 30 Five ethnicities (Caucasian, Hispanic or Latino, Asian, black, or African American) Extends Aff-Wild with 260 more subjects and 1,413,000 new video frames
RAVDESS * [27] 2018 24 21–33 30 (Caucasian, East-Asian, and Mixed (East-Asian Caucasian, and Black-Canadian First nations Caucasian)) 7356 recordings composed of 4320 speech recordings and 3036 song recordings
AM-FED+ * [72] 2018 416 - 14 Participants from around the world 1044 videos of naturalistic facial responses to online media content recorded over the Internet
GFT * [28] 2017 96 21–28 - Participants were randomly selected 172,800 frames
AffectNet* [73] 2017 450,000 average age 33.01 years - More than 1,000,000 facial images from the Internet 1,000,000 images with facial landmarks. 450,000 images annotated manually
AFEW-VA* [74] 2017 240 8–76 - Movie actors 600 video clips
SEWA* [29] 2017 398 18–65 20–30 Six ethnicities (British, German, Hungarian, Greek, Serbian, and Chinese) 1990 audio-visual recording clips
BP4D+ (MMSE) [25] 2016 140 18–66 25 Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) 1.4 million frames. Over 10TB high quality data generated for the research community
Aff-Wild * [75] 2016 500 - - - 500 videos from YouTube
EmotioNet * [17] 2016 1,000,000 - - One million images of facial expressions downloaded from the Internet Images queried from web: 100,000 images annotated manually, 900,000 images annotated automatically
FER-Wild * [15] 2016 24,000 - - - 24,000 images from web
BAUM-1 * [16] 2016 31 19–65 30 One ethnicity (Turkish) 1184 multimodal facial video clips containing spontaneous facial expressions and speech of 13 emotional and mental states
BioVid Emo * [30] 2016 86 18–65 - - 15 standardized film clips
Vinereactor * [76] 2016 222 - web-cam Mechanical Turkers 6029 video responses from 343 unique Mechanical Turk workers in response to 200 video stimuli. Total number of 1,380,343 video frames
CHEAVD * [77] 2016 238 11–62 25 - Extracted from 34 films, two TV series and four other television shows. In the wild
ISED * [31] 2016 50 18–22 50 One ethnicity (India) 428 videos
4D CCDb * [32] 2015 4 20–50 60 - 34 audio-visuals
MAHNOB Mimicry * [33] 2015 60 18–34 25 Staff and students at Imperial College London Over 54 sessions of dyadic interactions between 12 confederates and their 48 counterparts
OPEN-EmoRec-II * [34] 2015 30 Mean age: women 37.5 years; men 51.1 years - - Video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and facial reactions annotations
HAPPEI * [78] 2015 8500 faces - - - 4886 images.
AVEC’14 * [35] 2014 84 18–63 - German 300 audio-visuals
BAUM-2 * [13] 2014 286 5–73 - two ethnicities (Turkish, English) 1047 video clips
BP4D-Spontaneous * [14] 2013 41 18–29 25 four ethnicities (Asian, African-American, Hispanic, and Euro-American) 368,036 frames
DISFA * [36] 2013 27 18–50 20 four ethnicities (Asian, Euro American, Hispanic, and African-American) 130,000 frames
RECOLA * [37] 2013 46 Mean age: 22 years, standard deviation: three years - four ethnicities (French, Italian, German and Portuguese) 27 videos
AM-FED * [79] 2013 242 Range of ages and ethnicities 14 Viewers from a range of ages and ethnicities 168,359 frames/242 facial videos
FER-2013 * [11] 2013 35,685 - - - Images queried from web
AVEC’13 (AViD-Corpus) * [38] 2013 292 18–63 30 one ethnicity (German) 340 audio-visuals
CCDb * [39] 2013 16 25–56 - All participants were fully fluent in the English language 30 audio-visuals
MAHNOB Laughter * [62] 2013 22 Average age: 27 and 28 years 25 12 different countries and of different origins. 180 sessions; 563 laughter episodes, 849 speech utterances, 51 posed laughs, 67 speech–laugh episodes and 167 other vocalizations annotated in the dataset
DynEmo * [40] 2013 358 25–65 25 One ethnicity (Caucasian) Two sets of 233 and 125 recordings of EFE of ordinary people
PICS-Stirling ESRC 3D Face Database * [63] 2013 99 - - - 2D images, video sequences and 3D face scans
DEAP * [41] 2012 32 19–37 - Mostly European students 40 one-minute-long videos shown to subjects
AFEW * [10] 2012 330 1–70 - Extracted from movies 1426 sequences with length from 300 to 5400 ms. 1747 expressions
SEMAINE * [42] 2012 24 22–60 - Undergraduate and postgraduate students 130,695 frames
Belfast induced * [80] 2012 Set1: 114 Undergraduate students - undergraduate students 570 audio-visuals
Set2: 82 Mean age of participants 23.78 - Undergraduate students, postgraduate students or employed professionals 650 audio-visuals
Set3: 60 Mean age of participants 32.54 - (Peru, Northern Ireland) 180 audio-visuals
MAHNOB-HCI * [43] 2012 27 19–40 60 Different educational background, from undergraduate students to postdoctoral fellows, with different English proficiency from intermediate to native speakers 756 data sequences
Hi4D-ADSIP * [12] 2011 80 18–60 60 Undergraduate students from the Performing Arts Department at the University. Undergraduate students, postgraduate students and members of staff from other departments 3360 images/sequences
UNBC-McMaster (UNBC Shoulder Pain Archive (SP)) * [44] 2011 25 - - Participants were self-identified while having a problem with shoulder pain 48,398 frames/200 video sequences
CAM3D * [45] 2011 16 24–50 25 Three ethnicities (Caucasian, Asian and Middle Eastern) 108 videos of 12 mental states
SFEW * [7] 2011 95 - - - 700 images: 346 images in Set 1 and 354 images in Set 2
B3D(AC) * [46] 2010 14 21–53 25 Native English speakers 1109 sequences, 4.67 s long
USTC-NVIE * [64] 2010 215 17–31 30 Students 236 apex images
CK+ * [47] 2010 123 18–50 - Three ethnicities (Euro-American, Afro-American and other) 593 sequences
MMI-V * [65] 2010 25 20–32 25 Three ethnicities (European, South American, Asian) 1 h and 32 min of data. 392 segments
AVLC * [67] 2010 24 Average ages were respectively 30, 28 and 29 years 25 eleven ethnicities (Belgium, France, Italy, UK, Greece, Turkey, Kazakhstan, India, Canada, USA and South Korea) 1000 spontaneous laughs and 27 acted laughs
AvID * [48] 2009 15 19–37 - Native Slovenian speakers Approximately one-hour video for each subject
AVIC [49] 2009 21 ≤30 and ≥40 25 Two ethnicities (Asian and European) No. episodes 324
DD [50] 2009 57 - 30 19% non-Caucasian No. episodes 238
VAM-faces * [81] 2008 20 16–69 (70% ≤ 35) 25 One ethnicity (German) 1867 images (93.6 images per speaker on average)
FreeTalk * [82] 2008 4 - 60 Originating from different countries and each of them speaking a different native language (Finnish, French, Japanese, and English) No. episodes 300
IEMOCAP * [68] 2008 10 - 120 Actors (fluent English speakers) Two hours of audiovisual data, including video, speech, motion capture of face, and text transcriptions
SAL * [51] 2008 4 - - - 30 min sessions for each user
HUMAINE * [52] 2007 Multiple - - - 50 ‘clips’ from naturalistic and induced data
EmoTABOO * [53] 2007 - - - French dataset 10 clips
AMI [69] 2006 - - 25 - A multi-modal data set consisting of 100 h of meeting recordings
ENTERFACE * [54] 2006 16 average age 25 - - -
5 22–38
16 average age 25
RU-FACS [56] 2005 100 18–30 24 Two ethnicities (African-American and Asian or Latino) 400–800 min dataset
MMI * [66] 2005 19 19–62 24 Three ethnicities (European, Asian, or South American) Subjects portrayed 79 series of facial expressions. Image sequences of frontal and side views are captured. 740 static images/848 videos
UT-Dallas * [55] 2005 284 18–25 29.97 One ethnicity (Caucasians) 1540 standardized clips
MIT [57] 2005 17 - - - Over 25,000 frames were scored
EmoTV * [83] 2005 48 - - French 51 video clips
UA-UIUC * [58] 2004 28 Students - Students One video clip for each subject
AAI [59] 2004 60 18–30 - Two ethnicities (European American and Chinese American) One audiovisual for each subject
Smile dataset [60] 2001 95 - 30 - 195 spontaneous smiles