. 2023 Feb 23;23(5):2455. doi: 10.3390/s23052455

Table 3.

Datasets for speech emotion recognition.

| Name | Type | Details | Number of Emotion Categories | Number of Samples |
|---|---|---|---|---|
| MDS [184] | Textual | Product reviews from the Amazon shopping site, consisting of different words, sentences, and documents | 2 or 5 | 100,000 |
| SST [185] | Textual | Semantic emotion recognition database established by Stanford University | 2 or 5 | 11,855 |
| IMDB [186] | Textual | Contains a large number of movie reviews | 2 | 25,000 |
| EMODB [187] | Performer-based | Ten German sentences spoken by ten speakers (five male, five female) | 7 | 800 |
| SAVEE [188] | Performer-based | Performed by four male speakers; spoken in English | 7 | 480 |
| CREMA-D [189] | Performer-based | Spoken in English | 6 | 7442 |
| IEMOCAP * [190] | Performer-based | Conversations between two people (one male, one female); spoken in English | 4 | - |
| Chinese Emotion Speech Dataset [191] | Induced | Spoken in Chinese | 5 | 3649 |
| MELD * [192] | Induced | Data from the TV series Friends | 3 | 13,000 |
| RECOLA Speech Database [179] | Natural | Spoken by 46 speakers (19 male, 27 female); spoken in French | 5 | 7 h |
| FAU Aibo Emotion Corpus [193] | Natural | Communications between 51 children and a robot dog; spoken in German | 11 | 9 h |
| Semaine Database [194] | Natural | Spoken by 150 speakers; spoken in English, Greek, and Hebrew | 5 | 959 conversations |
| CHEAVD [195] | Natural | Spoken by 238 speakers (from children to the elderly); spoken in Chinese | 26 | 2322 |

* Can also be used for multimodal emotion recognition.
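For readers comparing these corpora programmatically, the table can be encoded as a small dataset registry with a filter by collection type. This is a purely illustrative sketch: the names, types, and counts come from Table 3, but the `SERDataset` class and `by_type` helper are assumptions for illustration, not part of any of the cited corpora's tooling. Only a subset of rows is shown.

```python
# Illustrative registry of a few Table 3 entries; the class and helper
# are hypothetical, and values are copied verbatim from the table.
from dataclasses import dataclass

@dataclass(frozen=True)
class SERDataset:
    name: str
    kind: str        # Textual / Performer-based / Induced / Natural
    language: str
    n_emotions: str  # kept as a string: some entries are "2 or 5"
    n_samples: str   # kept as a string: some entries are "7 h" or "-"

DATASETS = [
    SERDataset("EMODB", "Performer-based", "German", "7", "800"),
    SERDataset("SAVEE", "Performer-based", "English", "7", "480"),
    SERDataset("CREMA-D", "Performer-based", "English", "6", "7442"),
    SERDataset("MELD", "Induced", "English", "3", "13,000"),
    SERDataset("RECOLA Speech Database", "Natural", "French", "5", "7 h"),
]

def by_type(kind: str) -> list[SERDataset]:
    """Return all registered datasets of the given collection type."""
    return [d for d in DATASETS if d.kind == kind]
```

Keeping the category and sample counts as strings preserves heterogeneous entries ("2 or 5", "7 h", "-") without forcing a lossy numeric conversion.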