. 2023 Dec 9;31(3):746–761. doi: 10.1093/jamia/ocad222

Table 2.

Summary of the outcome measures, grouped into the seven main categories and their subcategories identified in the review.

Category and subcategory (number of outcome measures, Nom)
Selected typical outcome measures (unit, questionnaire, or method)
1. Functionality (Nom = 44)
  • Sentence classification performance (Nom = 5)

Precision (%),25,93 sensitivity (%),25,93 accuracy (%),25 specificity (%),25 and F1 (value)25,93 of the classifier.
  • Understanding and responses (Nom = 17)

Response accuracy (%),17,19,86,103 inquiries unable to answer (%),86 response completion (%),86 understanding (scale, survey),91 etc.
  • Engagement functions (Nom = 7)

Topics initiated by CA versus participants,26 attempts to restart conversation (n),8 sentiment (score, coding responses manually),17 etc.
  • Task achievements and efforts (Nom = 4)

Conversation tasks completed (%),94 task failure rate (n and %),8 task completion (coefficient),25 and time per task (seconds).8
  • Voice and device control (Nom = 2)

Adequate volume, speed, and sound quality (a survey),91 and negative technical aspects (qualitative analysis of user's responses).104
  • Clinical assessment performance (Nom = 9)

Accuracy,40,100 sensitivity,40,90 and specificity of CA-based clinical assessment outcomes (CA vs standard clinical assessments),40,90 etc.
2. Safety and information quality (Nom = 17)
  • CA response appropriateness (Nom = 6)

Response appropriateness (scale),19,86,87 appropriate responses (descriptive),83 etc.
  • Risk of misinformation (Nom = 4)

Misinformation (%),17 reliable (%)86 and evidence-based (%)85 resources, information accuracy and completeness (%),85 and quality (descriptive).84
  • Risk of unintended harms or adverse events (Nom = 4)

Responses with risk of unintended harms (n and %; eg, medication and emergency tasks),8 serious adverse events (n),82 and deaths (n and %).8
  • Privacy and trust (Nom = 3)

Privacy and trust (a survey),55 privacy and trust (a qualitative study, interview),55 and privacy infringement (a survey).105
3. User experience (Nom = 80)
  • Ease of use (Nom = 2)

Ease of use (scale, a self-designed questionnaire)11,88 and learning experience (score, a self-designed questionnaire).106
  • Engagement (Nom = 3)

User engagement (scale, a survey),95 DBCI engagement (scale),92 and perceived engagement (scale, a survey).101
  • Conversation capability (Nom = 6)

Response appropriateness (scale, a survey),11 dialogue performance (score, SASSI),94 emotional awareness (score, a questionnaire),106 etc.
  • Usefulness/helpfulness (Nom = 6)

Usefulness (scale, a survey21,107 or interview88), perceived helpfulness (a survey, open-ended question, or interview),21,108 etc.
  • Perceived quality and trust (Nom = 5)

Perceived trust (score, a questionnaire),89 perceived quality of the answers (score, EORTC QLQ-INFO25),39 etc.
  • Satisfaction (Nom = 5)

Satisfaction (scale, a self-defined survey8,15,72,99,105,107,109–111 and CSQ-8 81,82,112), content satisfaction (scale, a survey),106 etc.
  • Feasibility (Nom = 3)

Feasibility (score, a self-designed questionnaire).18,20,26,81
  • Usability (Nom = 5)

Usability (scale, SUS),75,96,107,113,114 usability (open comments, a focus group session),27 perceived usability (scale),92,105 etc.
  • Acceptance/preference (Nom = 11)

Acceptance (scale, a survey),88 preference of CA (scale, a survey),88 potential to replace humans (scale, a survey),11 etc.
  • Overall user experience with mixed themes (Nom = 26)

Overall user experience (UEQ,23,25 USE,96 NPS,115 URP-I81,82), users with positive or negative experience (n, the CA prompted the survey),15 etc.
  • Working alliance (Nom = 1)

Working alliance (questionnaire, WAI-SR78,79,81,82,101,112).
  • Suggestions for improvement (Nom = 4)

Suggestions for improvements (open-ended question in a survey),20,93,108,110 good and bad experiences with the CA (a survey),109 etc.
  • Other open comments (Nom = 3)

Perceived stress (survey and interview),24 benefit (focus group study),27 and feelings of answering sensitive questions (CA vs humans).89
4. Clinical/health outcomes (Nom = 68)
  • Psychological/mental health (Nom = 34)

PHQ-9,21,95,106,109,112,116 QIDS-SR,105 GAD-7,21,81,82,95,106,109,112,116–118 SAS,105 PANAS,106,109,112,119 PSYCHLOPS,21 DASS21,75,117 PSS-10,95,105,120 etc.
  • Disease conditions (Nom = 3)

Pain (%, NRS),82 Parkinson’s disease rating scale (MDS-UPDRS),121 and Parkinson’s disease questionnaire.121
  • Modification of behaviors and risk factors (Nom = 23)

Behavior modification (score, SQUASH),122,123 smoking cessation (%, a survey),124 physical activity (score, AAS),113,123 etc.
  • Knowledge and skills (Nom = 4)

Knowledge gained (scale, a survey),16,18 problem solvability (score, a survey),75 problem resolution (score, a survey),75 etc.
  • Health wellbeing and issues (Nom = 4)

SWLS,120 WHO-5-J,95,125 EQ-5D-5L,126 and falls (falls per 1000 patient-days).13
5. Costs and health economic analyses (Nom = 2)
  • Cost effectiveness (Nom = 1)

Time spent per 100 patients (hours per 100 patients, an analysis of the conversation logs).14
  • Costs (Nom = 1)

Monthly budget (dollars per month, an analysis of running costs of the CA system).73
6. Usage, adherence and uptake (Nom = 62)
  • Usage (Nom = 38)

Conversation duration (seconds, minutes, or hours),14,15,26,55,72,73,75,81,82,88,115,126–128 exchanges (n),108–110 CA responses (n),92,127,129 etc.
  • Adherence (Nom = 15)

Adherence (n),108,120 dropouts (n, %; conversation dropouts,15 and dropouts of interventions15,116,122), follow-up rate (%),14 etc.
  • Uptake (Nom = 9)

Completed questionnaires (n),122 total followers (n),73 total impressions (n),73 average daily reach times (times of reach per day),73 etc.
7. User characteristics for implementation science (Nom = 12)
  • Age and gender (Nom = 2)

Age (age groups, n, %)15,72,88,110 and gender (n, %).15,72,88,110
  • Nationality, ethnicity and religion (Nom = 4)

Nationality (n),88 race and ethnicity (%, White, Hispanic, Black),72 religion (n),88 and language (%, users in Spanish).72
  • Education and socioeconomic status (Nom = 2)

Occupation and education (n, %, self-designed questionnaire),88 and urbanization levels (n, self-designed questionnaire).88
  • Health conditions (Nom = 3)

Users with a personal history of cancer (%),72 a family history of cancer (%),72 and risks of different cancers (NCCN criteria, Tyrer–Cuzick criteria).72
  • Devices used (Nom = 1)

Mobile users (%).73

AAS, the Active Australia Survey; CSQ-8, Client Satisfaction Questionnaire with 8 questions; DASS21, depression, anxiety, and stress scales 21; DBCI, a questionnaire on the Digital Behavior Change Intervention; EORTC QLQ-INFO25, the European Organisation for Research and Treatment of Cancer Quality of Life Group information questionnaire; EQ-5D-5L, health-related quality of life with 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression; F1, the harmonic mean (average) of the precision and recall; GAD-7, the Generalized Anxiety Disorder 7-item scale; NCCN criteria, National Comprehensive Cancer Network criteria; NPS, net promoter score; PANAS, the positive and negative affect schedule; PHQ-9, the Patient Health Questionnaire, a 9-item self-report questionnaire that assesses the frequency and severity of depressive symptomatology within the previous 2 weeks; PSS-10, the Perceived Stress Scale; PSYCHLOPS, the psychological outcome profiles; QIDS-SR, the Quick Inventory of Depressive Symptomatology-Self-report; ROC, receiver operating characteristic, a graphical plot to evaluate a binary classifier/decision system across different discrimination thresholds; SAS, the Self-rating Anxiety Scale; SASSI, the Subjective Assessment of Speech System Interfaces, a 7-point Likert scale on accuracy, likeability, cognitive demand, annoyance, habitability, and speed; SQUASH, the Dutch Short Questionnaire to assess health-enhancing physical activity; SUS, the system usability scale; SWLS, the Satisfaction with Life Scale; UEQ, the User Experience Questionnaire; URP-I, usage rating profile-intervention with feasibility (6 items) and acceptability (6 items) scales; USE, the Usefulness, Satisfaction, and Ease of Use (USE) Questionnaire Short-Form; WAI-SR, the Working Alliance Inventory-Short Revised (agreement on the tasks of therapy, agreement on the goals of therapy, and development of an affective bond); WHO-5-J, the World Health Organization-Five Well-Being Index (Japanese version).
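
The classification measures listed under Functionality (precision, sensitivity, specificity, accuracy, and F1) can all be derived from the four cells of a binary confusion matrix, with F1 being the harmonic mean of precision and recall as defined in the note above. The following is a minimal sketch under stated assumptions, not the method used by any of the reviewed studies: the names `ConfusionCounts` and `classifier_metrics` are hypothetical, the counts are invented for illustration, and recall is treated as a synonym for sensitivity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary classifier counts against reference labels."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classifier_metrics(c):
    """Compute the five measures as fractions in [0, 1]; multiply by 100 for %."""
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # F1 is the harmonic mean of precision and recall (sensitivity).
    if precision is None or sensitivity is None or precision + sensitivity == 0:
        f1 = None
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not come from any reviewed study.
    example = ConfusionCounts(tp=30, fp=10, fn=20, tn=40)
    for name, value in classifier_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Returning `None` for an undefined ratio (for example, precision when the classifier makes no positive predictions) is one possible design choice; it keeps an undefined measure from being silently reported as zero.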