Table 5.
Macro-expressions datasets. The columns report: the dataset name (Dataset); the year (Year); the number of subjects; the range of subjects’ age (Age); the number of frames captured per second (FPS); ethnicity; and the amount of data/frames. In the table cells, a ‘-’ indicates that no information is available, while a ‘*’ following the dataset name indicates that the data is publicly available.
Dataset | Year | Number of Subjects | Age | FPS | Ethnicity | Amount of Data/Frames |
---|---|---|---|---|---|---|
EB+ [24] | 2020 | 200 | 18–66 | 25 | Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) | 1216 videos, with 395 K frames in total |
iSAFE [9] | 2020 | 44 | 17–22 | 60 | Two ethnicities (Indo-Aryan and Dravidian (Asian)) | 395 clips |
RAF-DB * [70] | 2019 | thousands | - | - | Image URLs were collected from Flickr | 30,000 facial images |
TAVER * [26] | 2019 | 17 | 21–38 | 10 | One ethnicity (Korean) | 17 videos of 1–4 min |
4DFAB* [61] | 2018 | 180 | 5–75 | 60 | Three ethnicities (Caucasian (Europeans and Arabs), Asian (East-Asian and South-Asian) and Hispanic/Latino) | Two million frames. The vertex number of reconstructed 3D meshes ranges from 60 k to 75 k |
Aff-Wild2 * [71] | 2018 | 258 | infants, young and elderly | 30 | Five ethnicities (Caucasian, Hispanic or Latino, Asian, black, or African American) | Extending it with 260 more subjects and 1,413,000 new video frames |
RAVDESS * [27] | 2018 | 24 | 21–33 | 30 | (Caucasian, East-Asian, and Mixed (East-Asian Caucasian, and Black-Canadian First nations Caucasian)) | 7356 recordings composed of 4320 speech recordings and 3036 song recordings |
AM-FED+ * [72] | 2018 | 416 | - | 14 | Participants from around the world | 1044 videos of naturalistic facial responses to online media content recorded over the Internet |
GFT * [28] | 2017 | 96 | 21–28 | - | Participants were randomly selected | 172,800 frames |
AffectNet* [73] | 2017 | 450,000 | average age 33.01 years | - | More than 1,000,000 facial images from the Internet | 1,000,000 images with facial landmarks. 450,000 images annotated manually |
AFEW-VA* [74] | 2017 | 240 | 8–76 | - | Movie actors | 600 video clips |
SEWA* [29] | 2017 | 398 | 18–65 | 20–30 | Six ethnicities (British, German, Hungarian, Greek, Serbian, and Chinese) | 1990 audio-visual recording clips |
BP4D+ (MMSE) [25] | 2016 | 140 | 18–66 | 25 | Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) | 1.4 million frames. Over 10 TB of high-quality data generated for the research community |
Aff-Wild * [75] | 2016 | 500 | - | - | - | 500 videos from YouTube |
EmotioNet * [17] | 2016 | 1,000,000 | - | - | One million images of facial expressions downloaded from the Internet | Images queried from web: 100,000 images annotated manually, 900,000 images annotated automatically |
FER-Wild * [15] | 2016 | 24,000 | - | - | - | 24,000 images from web |
BAUM-1 * [16] | 2016 | 31 | 19–65 | 30 | One ethnicity (Turkish) | 1184 multimodal facial video clips containing spontaneous facial expressions and speech of 13 emotional and mental states |
BioVid Emo * [30] | 2016 | 86 | 18–65 | - | - | 15 standardized film clips |
Vinereactor * [76] | 2016 | 222 | - | web-cam | Mechanical Turk workers | 6029 video responses from 343 unique Mechanical Turk workers in response to 200 video stimuli. Total number of 1,380,343 video frames |
CHEAVD * [77] | 2016 | 238 | 11–62 | 25 | - | Extracted from 34 films, two TV series and four other television shows. In the wild |
ISED * [31] | 2016 | 50 | 18–22 | 50 | One ethnicity (India) | 428 videos |
4D CCDb * [32] | 2015 | 4 | 20–50 | 60 | - | 34 audio-visuals |
MAHNOB Mimicry * [33] | 2015 | 60 | 18–34 | 25 | Staff and students at Imperial College London | Over 54 sessions of dyadic interactions between 12 confederates and their 48 counterparts |
OPEN-EmoRec-II * [34] | 2015 | 30 | Mean age: women 37.5 years; men 51.1 years | - | - | Video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and facial reactions annotations |
HAPPEI * [78] | 2015 | 8500 faces | - | - | - | 4886 images. |
AVEC’14 * [35] | 2014 | 84 | 18–63 | - | German | 300 audio-visuals |
BAUM-2 * [13] | 2014 | 286 | 5–73 | - | Two ethnicities (Turkish, English) | 1047 video clips |
BP4D-Spontaneous * [14] | 2013 | 41 | 18–29 | 25 | Four ethnicities (Asian, African-American, Hispanic, and Euro-American) | 368,036 frames |
DISFA * [36] | 2013 | 27 | 18–50 | 20 | Four ethnicities (Asian, Euro American, Hispanic, and African-American) | 130,000 frames |
RECOLA * [37] | 2013 | 46 | Mean age: 22 years, standard deviation: three years | - | Four ethnicities (French, Italian, German and Portuguese) | 27 videos |
AM-FED * [79] | 2013 | 242 | Range of ages and ethnicities | 14 | Viewers from a range of ages and ethnicities | 168,359 frames/242 facial videos |
FER-2013 * [11] | 2013 | 35,685 | - | - | - | Images queried from web |
AVEC’13 (AViD-Corpus) * [38] | 2013 | 292 | 18–63 | 30 | One ethnicity (German) | 340 audio-visuals |
CCDb * [39] | 2013 | 16 | 25–56 | - | All participants were fully fluent in the English language | 30 audio-visuals |
MAHNOB Laughter * [62] | 2013 | 22 | Average age: 27 and 28 years | 25 | 12 different countries and of different origins | 180 sessions: 563 laughter episodes, 849 speech utterances, 51 posed laughs, 67 speech–laugh episodes and 167 other vocalizations annotated in the dataset |
DynEmo * [40] | 2013 | 358 | 25–65 | 25 | One ethnicity (Caucasian) | Two sets of 233 and 125 recordings of EFE of ordinary people |
PICS-Stirling ESRC 3D Face Database * [63] | 2013 | 99 | - | - | - | 2D images, video sequences and 3D face scans |
DEAP * [41] | 2012 | 32 | 19–37 | - | Mostly European students | 40 one-minute-long videos shown to subjects |
AFEW * [10] | 2012 | 330 | 1–70 | - | Extracted from movies | 1426 sequences with length from 300 to 5400 ms. 1747 expressions |
SEMAINE * [42] | 2012 | 24 | 22–60 | - | Undergraduate and postgraduate students | 130,695 frames |
Belfast induced * [80] | 2012 | Set1: 114 | Undergraduate students | - | Undergraduate students | 570 audio-visuals |
Belfast induced * [80] | 2012 | Set2: 82 | Mean age of participants 23.78 | - | Undergraduate students, postgraduate students or employed professionals | 650 audio-visuals |
Belfast induced * [80] | 2012 | Set3: 60 | Mean age of participants 32.54 | - | (Peru, Northern Ireland) | 180 audio-visuals |
MAHNOB-HCI * [43] | 2012 | 27 | 19–40 | 60 | Different educational background, from undergraduate students to postdoctoral fellows, with different English proficiency from intermediate to native speakers | 756 data sequences |
Hi4D-ADSIP * [12] | 2011 | 80 | 18–60 | 60 | Undergraduate students from the Performing Arts Department at the University. Undergraduate students, postgraduate students and members of staff from other departments | 3360 images/sequences |
UNBC-McMaster (UNBC Shoulder Pain Archive (SP)) * [44] | 2011 | 25 | - | - | Participants were self-identified while having a problem with shoulder pain | 48,398 frames/200 video sequences |
CAM3D * [45] | 2011 | 16 | 24–50 | 25 | Three ethnicities (Caucasian, Asian and Middle Eastern) | 108 videos of 12 mental states |
SFEW * [7] | 2011 | 95 | - | - | - | 700 images: 346 images in Set 1 and 354 images in Set 2 |
B3D(AC) * [46] | 2010 | 14 | 21–53 | 25 | Native English speakers | 1109 sequences, 4.67 s long |
USTC-NVIE * [64] | 2010 | 215 | 17–31 | 30 | Students | 236 apex images |
CK+ * [47] | 2010 | 123 | 18–50 | - | Three ethnicities (Euro-American, Afro-American and other) | 593 sequences |
MMI-V * [65] | 2010 | 25 | 20–32 | 25 | Three ethnicities (European, South American, Asian) | 1 h and 32 min of data. 392 segments |
AVLC * [67] | 2010 | 24 | Average ages were respectively 30, 28 and 29 years | 25 | Eleven ethnicities (Belgium, France, Italy, UK, Greece, Turkey, Kazakhstan, India, Canada, USA and South Korea) | 1000 spontaneous laughs and 27 acted laughs |
AvID * [48] | 2009 | 15 | 19–37 | - | Native Slovenian speakers | Approximately one-hour video for each subject |
AVIC [49] | 2009 | 21 | ≤30 and ≥40 | 25 | Two ethnicities (Asian and European) | No. episodes 324 |
DD [50] | 2009 | 57 | - | 30 | 19% non-Caucasian | No. episodes 238 |
VAM-faces * [81] | 2008 | 20 | 16–69 (70% ≤ 35) | 25 | One ethnicity (German) | 1867 images (93.6 images per speaker on average) |
FreeTalk * [82] | 2008 | 4 | - | 60 | Originating from different countries and each of them speaking a different native language (Finnish, French, Japanese, and English) | No. episodes 300 |
IEMOCAP * [68] | 2008 | 10 | - | 120 | Actors (fluent English speakers) | Two hours of audiovisual data, including video, speech, motion capture of face, and text transcriptions |
SAL * [51] | 2008 | 4 | - | - | - | 30 min sessions for each user |
HUMAINE * [52] | 2007 | Multiple | - | - | - | 50 ‘clips’ from naturalistic and induced data |
EmoTABOO * [53] | 2007 | - | - | - | French dataset | 10 clips |
AMI [69] | 2006 | - | - | 25 | - | A multi-modal data set consisting of 100 h of meeting recordings |
ENTERFACE * [54] | 2006 | 16 | average age 25 | - | - | - |
ENTERFACE * [54] | 2006 | 5 | 22–38 | - | - | - |
ENTERFACE * [54] | 2006 | 16 | average age 25 | - | - | - |
RU-FACS [56] | 2005 | 100 | 18–30 | 24 | Two ethnicities (African-American and Asian or Latino) | 400–800 min dataset |
MMI * [66] | 2005 | 19 | 19–62 | 24 | Three ethnicities (European, Asian, or South American) | Subjects portrayed 79 series of facial expressions. Image sequences of frontal and side views are captured. 740 static images/848 videos |
UT-Dallas * [55] | 2005 | 284 | 18–25 | 29.97 | One ethnicity (Caucasians) | 1540 standardized clips |
MIT [57] | 2005 | 17 | - | - | - | Over 25,000 frames were scored |
EmoTV * [83] | 2005 | 48 | - | - | French | 51 video clips |
UA-UIUC * [58] | 2004 | 28 | Students | - | Students | One video clip for each subject |
AAI [59] | 2004 | 60 | 18–30 | - | Two ethnicities (European American and Chinese American) | One audiovisual for each subject |
Smile dataset [60] | 2001 | 95 | - | 30 | - | 195 spontaneous smiles |