Table 2.
| Category and subcategory (number of outcome measures, Nom) | Selected typical outcome measures (unit, questionnaire or method) |
|---|---|
| 1. Functionality | (Nom = 44) |
| | Precision (%) [25,93], sensitivity (%) [25,93], accuracy (%) [25], specificity (%) [25], and F1 (value) [25,93] of the classifier. |
| | Response accuracy (%) [17,19,86,103], inquiries unable to answer (%) [86], response completion (%) [86], understanding (scale, survey) [91], etc. |
| | Topics initiated by CA versus participants [26], attempts to restart conversation (n) [8], sentiment (score, coding responses manually) [17], etc. |
| | Conversation tasks completed (%) [94], task failure rate (n and %) [8], task completion (coefficient) [25], and time per task (seconds) [8]. |
| | Adequate volume, speed, and sound quality (a survey) [91] and negative technical aspects (qualitative analysis of user's responses) [104]. |
| | Accuracy [40,100], sensitivity [40,90], specificity of CA-based clinical assessment outcomes (CA vs standard clinical assessments) [40,90], etc. |
| 2. Safety and information quality | (Nom = 17) |
| | Response appropriateness (scale) [19,86,87], appropriate responses (descriptive) [83], etc. |
| | Misinformation (%) [17], reliable (%) [86] and evidence-base (%) [85] resources, information accuracy and completeness (%) [85], and quality (descriptive) [84]. |
| | Responses with risk of unintended harms (n and %; eg, medication and emergency tasks) [8], serious adverse events (n) [82], and deaths (n and %) [8]. |
| | Privacy and trust (a survey) [55], privacy and trust (a qualitative study, interview) [55], and privacy infringement (a survey) [105]. |
| 3. User experience | (Nom = 80) |
| | Ease of use (scale, a self-designed questionnaire) [11,88] and learning experience (score, a self-designed questionnaire) [106]. |
| | User engagement (scale, a survey) [95], DBCI engagement (scale) [92], and perceived engagement (scale, a survey) [101]. |
| | Response appropriateness (scale, a survey) [11], dialogue performance (score, SASSI) [94], emotional awareness (score, a questionnaire) [106], etc. |
| | Usefulness (scale, a survey [21,107] or interview [88]), perceived helpfulness (a survey, open-ended question, or interview) [21,108], etc. |
| | Perceived trust (score, a questionnaire) [89], perceived quality of the answers (score, EORTC QLQ-INFO25) [39], etc. |
| | Satisfaction (scale, a self-defined survey [8,15,72,99,105,107,109–111] and CSQ-8 [81,82,112]), content satisfaction (scale, a survey) [106], etc. |
| | Feasibility (score, a self-designed questionnaire) [18,20,26,81]. |
| | Usability (scale, SUS) [75,96,107,113,114], usability (open comments, a focus group session) [27], perceived usability (scale) [92,105], etc. |
| | Acceptance (scale, a survey) [88], preference of CA (scale, a survey) [88], potential to replace humans (scale, a survey) [11], etc. |
| | Overall user experience (UEQ [23,25], USE [96], NPS [115], URP-I [81,82]), users with positive or negative experience (n, the CA prompted the survey) [15], etc. |
| | Working alliance (questionnaire, WAI-SR [78,79,81,82,101,112]). |
| | Suggestions for improvements (open-ended question in a survey) [20,93,108,110], good and bad experiences with the CA (a survey) [109], etc. |
| | Perceived stress (survey and interview) [24], benefit (focus group study) [27], and feelings of answering sensitive questions (CA vs humans) [89]. |
| 4. Clinical/health outcomes | (Nom = 68) |
| | PHQ-9 [21,95,106,109,112,116], QIDS-SR [105], GAD-7 [21,81,82,95,106,109,112,116–118], SAS [105], PANAS [106,109,112,119], PSYCHLOPS [21], DASS21 [75,117], PSS-10 [95,105,120], etc. |
| | Pain (%, NRS) [82], Parkinson’s disease rating scale (MDS-UPDRS) [121], and Parkinson’s disease questionnaire [121]. |
| | Behavior modification (score, SQUASH) [122,123], smoking cessation (%, a survey) [124], physical activity (score, AAS) [113,123], etc. |
| | Knowledge gained (scale, a survey) [16,18], problem solvability (score, a survey) [75], problem resolution (score, a survey) [75], etc. |
| | SWLS [120], WHO-5-J [95,125], EQ-5D-5L [126], and falls (falls per 1000 patient-days) [13]. |
| 5. Costs and health economic analyses | (Nom = 2) |
| | Time spent per 100 patients (hours per 100 patients, an analysis of the conversation logs) [14]. |
| | Monthly budget (dollars per month, an analysis of running costs of the CA system) [73]. |
| 6. Usage, adherence and uptake | (Nom = 62) |
| | Conversation duration (second, minute, or hour) [14,15,26,55,72,73,75,81,82,88,115,126–128], exchanges (n) [108–110], CA responses (n) [92,127,129], etc. |
| | Adherence (n) [108,120], dropouts (n, %; conversation dropouts [15] and dropouts of interventions [15,116,122]), follow-up rate (%) [14], etc. |
| | Completed questionnaires (n) [122], total followers (n) [73], total impressions (n) [73], average daily reach times (times of reach per day) [73], etc. |
| 7. User characteristics for implementation science | (Nom = 12) |
| | Age (age groups, n, %) [15,72,88,110] and gender (n, %) [15,72,88,110]. |
| | Nationality (n) [88], race and ethnicity (%, White, Hispanic, Black) [72], religion (n) [88], and language (%, users in Spanish) [72]. |
| | Occupation and education (n, %, self-designed questionnaire) [88], and urbanization levels (n, self-designed questionnaire) [88]. |
| | Users with a personal history of cancer (%) [72], a family history of cancer (%) [72], and risks of different cancers (NCCN criteria, Tyrer–Cuzick criteria) [72]. |
| | Mobile users (%) [73]. |

AAS, the Active Australia Survey; CSQ-8, Client Satisfaction Questionnaire with 8 questions; DASS21, depression, anxiety, and stress scales 21; DBCI, a questionnaire on the Digital Behavior Change Intervention; EORTC QLQ-INFO25, the European Organisation for Research and Treatment of Cancer Quality of Life Group information questionnaire; EQ-5D-5L, health-related quality of life with 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression; F1, the harmonic mean (average) of the precision and recall; GAD-7, the 7-item Generalized Anxiety Disorder scale; NCCN criteria, National Comprehensive Cancer Network criteria; NPS, net promoter score; PANAS, the positive and negative affect schedule; PHQ-9, the Patient Health Questionnaire, a 9-item self-report questionnaire that assesses the frequency and severity of depressive symptomatology within the previous 2 weeks; PSS-10, the Perceived Stress Scale; PSYCHLOPS, the psychological outcome profiles; ROC, receiver operating characteristic, a graphical plot to evaluate a binary classifier/decision system across different discrimination thresholds; QIDS-SR, the Quick Inventory of Depressive Symptomatology-Self-report; SAS, the Self-rating Anxiety Scale; SASSI, the Subjective Assessment of System Speech Interfaces, a 7-point Likert scale on accuracy, likeability, cognitive demand, annoyance, habitability, and speed; SQUASH, the Dutch Short Questionnaire to assess health enhancing physical activity; SUS, the system usability scale; SWLS, the Satisfaction with Life Scale; UEQ, the User Experience Questionnaire; URP-I, usage rating profile-intervention with feasibility (6 items) and acceptability (6 items) scales; USE, the Usefulness, Satisfaction, and Ease of Use (USE) Questionnaire Short-Form; WAI-SR, the Working Alliance Inventory-Short Revised (agreement on the tasks of therapy, agreement on the goals of therapy and development of an affective bond); WHO-5-J, the World Health Organization-5 Well-Being Index (Japanese version).
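
The classifier measures listed under Functionality (precision, sensitivity, specificity, accuracy, and F1) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, with F1 taken as the harmonic mean of precision and recall (sensitivity), as stated in the abbreviation list. The names `ConfusionCounts` and `classifier_metrics` and the example counts are hypothetical and are not drawn from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classifier_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard binary-classifier measures as fractions in [0, 1]."""
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall (sensitivity).
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; multiply by 100 to report percentages.
    example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    for name, value in classifier_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Returning NaN for an undefined ratio (for example, precision when the classifier makes no positive predictions) is one possible convention; a study might instead report such a measure as not applicable.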