2023 Dec 9;31(3):746–761. doi: 10.1093/jamia/ocad222

Table 3.

The proposed framework for evaluating CAs in healthcare.

Stage → 1. Feasibility and usability → 2. Efficacy → 3. Effectiveness → 4. Implementation
WHO digital health Brief description
  • Feasibility: The ability to work as intended.

  • Usability: The degree of a system being used to achieve specified goals in a specified context of use.

  • Efficacy: The ability to achieve the intended results in a research setting or trial.

  • Effectiveness: The ability to achieve the intended results in a real application (nonresearch setting).

  • Implementation science: To assess the uptake, integration and sustainability of evidence-based digital health interventions for a given context, including policies and practices.
Evaluation targets
  • Stability (system uptime/failure rates)

  • Performance consistency

  • Standards adherence (terminology, interoperability, security)

  • User satisfaction

  • Workflow “fit”

  • Learning curve (design)

  • Cognitive performance/errors

  • Reliability

  • Changes in care processes (time)

  • Changes in outcomes (system performance/health)

  • Changes in process, outcome, coverage, and costs

  • Total cost of implementation, and health impact

  • Error rates

  • Learning curve of users

  • Changes in policy, practices attributable to system

  • Adaptability and extendibility to new use-cases

Illustrative number of users 10-100 100-1000 10 000+ 1000 000+
Studies reviewed and outcome measures at 4 major evaluation stages aligned with the WHO guide Studies (n) 40 21 12 8
Study design and sample size (n) Single-arm studies (n = 3-10 [107,114,131]; 11-20 [21,27,91,95,97,103,111]; 21-30 [11,20,23,24,26,96]; 31-40 [25,80,90]; 41-50 [8,55,98,105]; 73 [92]; 89 [94]; 101 [81]; 116 [113]; 121 [93]; 318 [74]; 4390 messages [73]), 2-arm quasi-experimental study (n = 454 [125]), laboratory tests (investigators, n = 1 [19]; 2 [17,83–87]) and RCTs (n = 142 [39] and 289 [132])
  • Cross-over study (n = 179 [40])

  • Case-control study (n = 95 [13])

  • Cross-sectional studies (n = 354 [127]; 929 [88]; 4737 [110])

  • Randomized controlled trials (n = 28 [120]; 180 [82]; 513 [126]; 700 [118]; 927 [16]; 57 214 [124])

  • Cohort study (n = 3629 [117])

Cross-sectional studies (n = 1206 [78]; 7099 [104]; 16 519 [15]; 14 698 [129]; 36 070 [79]; 61 070 [72]; 135 263 [128]; 610 conversations [133])
User characteristics for implementation science
  • Devices used [73]

  • Age and gender [88,110]

  • Nationality, ethnicity and religions [88]

  • Education and socioeconomic status [88]

  • Age and gender [15,72]

  • Nationality, ethnicity and religions [72]

  • Health conditions [72]

Usage, adherence and uptake
Costs and health economic analyses
  • Cost-effectiveness [14]

  • Health economic analyses, such as cost-utility, cost-effectiveness, cost-minimization, and cost-benefit analyses (our recommendation)

  • Health economic analyses, such as cost-utility, cost-effectiveness, cost-minimization, and cost-benefit analyses (our recommendation)

Clinical/health outcomes
  • Knowledge and skills [21]

  • Health wellbeing and issues [95,125]

  • Psychological/mental health [21,55,74,81,95,105,125]

  • Behavioral modification and risk factors [26,81,113]

  • Disease condition [82]

  • Knowledge and skills [16]

  • Health wellbeing and issues [13,126]

  • Psychological/mental health [13,16,82,117,118]

  • Clinical assessment performance [40]

  • Behavioral modification and risk factors [16,82,110,118,122,124,126]

  • Psychological/mental health [78,79] (through short inbuilt questionnaires in CA apps)

User experience
  • Ease of use [88]

  • Satisfaction [82,110,122]

  • Working alliance [82]

  • Overall experience [16,82,110,120]

  • Usefulness/helpfulness [88,118]

  • Acceptance/preference [82,88]

  • Conversational capability [88]

  • Suggestions for improvement [110]

  • Satisfaction [15,72]

  • Working alliance [78,79]

  • Overall experience [15,78]

  • Acceptance and preference [104]

Safety and information quality
  • Privacy and trust [55,105]

  • Risk of causing death [8]

  • CA response capability [17,19,85,86]

  • Risk of misinformation [17]

  • Risk of unintended harms [8,85]

  • CA response appropriateness [19,83,86,87,105]

  • Resources and contents quality [84–86]

  • Risk of unintended harms [82]

Functionality
  • Response speed [8]

  • Task achievements [8,25,94]

  • Engagement functions [8,17,26,85,94]

  • Voice and device control [91]

  • Classification performance [25,93]

  • Clinical assessment performance [90]

  • Understanding and accurate responses [11,19,86,91,98,103,107,111,132]

  • Understanding and accurate responses [108]

  • Voice and device control [104]

  • Understanding and accurate responses [133]

The framework summarizes the included CA evaluation studies (n = 43), with their study designs, outcome measures, and sample sizes, at the 4 essential evaluation stages. The 4 stages and corresponding evaluation targets were proposed by the WHO. The essential categories that we identified for each stage are marked in light blue.
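For readers who want to work with the framework programmatically, the 4 stages, their brief descriptions, and the "Studies (n)" row of the table can be held in a small lookup structure. The following is only a minimal sketch: the class name `EvaluationStage`, the tuple `STAGES`, and the helper `stage_by_number` are hypothetical names chosen for illustration and do not come from the article or the WHO guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationStage:
    """One stage of the framework table (hypothetical container, not from the article)."""
    number: int
    name: str
    description: str
    studies_reviewed: int  # value taken from the "Studies (n)" row of the table


# Names, descriptions, and counts are copied from the table above.
STAGES = (
    EvaluationStage(
        1,
        "Feasibility and usability",
        "Feasibility: the ability to work as intended. Usability: the degree of a "
        "system being used to achieve specified goals in a specified context of use.",
        40,
    ),
    EvaluationStage(
        2,
        "Efficacy",
        "The ability to achieve the intended results in a research setting or trial.",
        21,
    ),
    EvaluationStage(
        3,
        "Effectiveness",
        "The ability to achieve the intended results in a real application "
        "(nonresearch setting).",
        12,
    ),
    EvaluationStage(
        4,
        "Implementation",
        "To assess the uptake, integration and sustainability of evidence-based "
        "digital health interventions for a given context, including policies "
        "and practices.",
        8,
    ),
)


def stage_by_number(number: int) -> EvaluationStage:
    """Return the stage with the given number, or raise ValueError if absent."""
    for stage in STAGES:
        if stage.number == number:
            return stage
    raise ValueError(f"no such stage: {number}")


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.number}. {stage.name}: {stage.studies_reviewed} studies")
```

A frozen dataclass is used here so that the table values cannot be modified by accident; a plain dictionary keyed by stage number would serve equally well.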