
Box 2.

Examples of technical evaluation measures for conversational agents and their individual modules

Conversational agent as a whole (global measures): Dialogue success rate (% of successful task completions), dialogue-based cost measures (duration, number of turns necessary to achieve a task, number of repetitions, corrections, or interruptions)

Automatic speech recognition: Word accuracy, word error rate, word insertion rate, word substitution rate, sentence accuracy

Natural language understanding: Percentage of words correctly understood, not covered, or partially covered; % of sentences correctly analyzed; % of words outside the dictionary; % of sentences whose final semantic representation is the same as the reference; % of correct frame units, considering the actual frame units; frame-level accuracy; frame-level coverage

Dialogue management: Percentage of correct responses; % of half-answers; % of times the system works trying to solve a problem; % of times the user acts trying to solve a problem

Natural language generation: Number of times the user requests a repetition of the reply provided by the system; user response time; number of times the user does not answer; rate of out-of-vocabulary words

Speech synthesis: Intelligibility of synthetic speech and naturalness of the voice

Abbreviations: %, percentage

Adapted from López-Cózar et al. 2011 [36] and Walker et al. 1997 [43].
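
To make two of the measures above concrete, the following minimal Python sketch (not part of the original box) shows how the word error rate for automatic speech recognition and the global dialogue success rate might be computed. It assumes whitespace-tokenized transcripts; all function and variable names are illustrative, not taken from the cited sources.

# Illustrative sketch only: word error rate via word-level edit distance,
# and dialogue success rate as the share of successfully completed tasks.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions needed if the hypothesis were empty
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions needed if the reference were empty
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def dialogue_success_rate(task_completed: list) -> float:
    """Proportion of dialogues in which the user's task was completed."""
    if not task_completed:
        return 0.0
    return sum(bool(x) for x in task_completed) / len(task_completed)

# Example: one deletion ("an") and one substitution ("monday" -> "sunday")
# against a 5-word reference give WER = 2/5 = 0.4.
print(word_error_rate("book an appointment for monday", "book appointment for sunday"))
print(dialogue_success_rate([True, True, False, True]))  # 0.75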