2021 Dec 17;21(24):8448. doi: 10.3390/s21248448

Table 4.

Main available datasets for conversational agents—part B.

Datasets for Goal-Oriented CAs

| Dataset | Source | Content | Size | Purpose |
|---|---|---|---|---|
| Schema Guided Dialogue [232] | dialogue simulator + paid crowd workers | multi-domain, task-oriented human-agent conversations | 20 k conversations | intent prediction, language generation, dialogue tracking |
| MultiWOZ [233] | turkers working | human-human conversations | 10 k dialogues | task-oriented dialogue modelling |
| Taskmaster-1 [234] | crowd workers, users and technical center operators | spoken & written dialogs | 5507 spoken & 7708 written dialogs | dialogue systems research, development and design |
| MultiDoGo [235] | crowd workers paired with trained annotators | human-to-human, services dialogues | ~81 k dialogues across 6 domains | virtual assistants development |
Datasets for Supporting CAs

| Dataset | Source | Content | Size | Purpose |
|---|---|---|---|---|
| COVID-19 dialogue dataset [176] | online healthcare platform | conversations between doctors and patients | 603 English + 1088 Chinese consultations | medical dialogue systems |
| MedDialog [236] | medical dialogue platform | doctors–patients conversations | 1.1 M Chinese + 0.3 M English dialogues | medical dialogue systems |
| SEMAINE [239] | human–human conversation experiment | emotionally coloured conversations, video recordings | 25 recordings, ~30 min long | eliciting non-verbal signals in human-computer interactions |
| EmpatheticDialogues [238] | 810 crowd workers select an emotion and talk about it | conversations grounded in emotional situations | 25 k conversations | recognizing human’s feelings |
| Offensive response dataset [241] | input–response records from SimSimi, annotated for offensiveness by crowd workers | input–response pairs and their annotation | 110 k chat pairs | improve CA abilities |
| BURCHAK dataset [242] | dialogues of pairs of participants, discussing visual attributes of 9 objects | chat outputs of dialogues | 177 dialogues, 2454 turns | learning visually grounded word meanings in a foreign language |
| The CIMA collection [246] | conversations between crowd workers playing as students and tutors | tutoring interactions and accompanying responses to 350 exercises | 2970 tutor responses | tutoring conversation based on a provided strategy |