
Box 2.

Examples of technical evaluation measures for conversational agents and their individual modules

Conversational agent as a whole (global measures): Dialogue success rate (% of successful task completions), dialogue-based cost measures (duration, number of turns necessary to achieve a task, number of repetitions, corrections, or interruptions)

Automatic speech recognition: Word accuracy, word error rate, word insertion rate, word substitution rate, sentence accuracy

Natural language understanding: Percentage of words correctly understood, not covered, or partially covered; % of sentences correctly analyzed; % of words outside the dictionary; % of sentences whose final semantic representation is the same as the reference; % of correct frame units, considering the actual frame units; frame-level accuracy; frame-level coverage

Dialogue management: Percentage of correct responses; % of half-answers; % of times the system works trying to solve a problem; % of times the user acts trying to solve a problem

Natural language generation: Number of times the user requests a repetition of the reply provided by the system; user response time; number of times the user does not answer; rate of out-of-vocabulary words

Speech synthesis: Intelligibility of synthetic speech and naturalness of the voice

Abbreviations: %, percentage

Adapted from López-Cózar et al. 2011 [36] and Walker et al. 1997 [43].
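
To make two of the measures above concrete, the following minimal Python sketch (not part of the original box) shows how the word error rate for automatic speech recognition and the global dialogue success rate might be computed. It assumes whitespace-tokenized transcripts; all function and variable names are illustrative, not taken from the cited sources.

# Illustrative sketch only: word error rate via word-level edit distance,
# and dialogue success rate as the share of successfully completed tasks.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions needed if the hypothesis were empty
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions needed if the reference were empty
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def dialogue_success_rate(task_completed: list) -> float:
    """Proportion of dialogues in which the user's task was completed."""
    if not task_completed:
        return 0.0
    return sum(bool(x) for x in task_completed) / len(task_completed)

# Example: one deletion ("an") and one substitution ("monday" -> "sunday")
# against a 5-word reference give WER = 2/5 = 0.4.
print(word_error_rate("book an appointment for monday", "book appointment for sunday"))
print(dialogue_success_rate([True, True, False, True]))  # 0.75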