. 2025 Sep 24;9:e75421. doi: 10.2196/75421

Table 5. Comparative evaluation on simulated conversations with behavioral science agentic workflow and base Gemini.

| Condition | Conversations preferred, autoevaluation (N=153), n (%) | Conversations preferred, human expert review (N=30), n (%) | Conversation average character length (N=153), mean (SD) |
| --- | --- | --- | --- |
| Behavioral science agentic workflow | 102 (66.7) | Expert 1: 22 (73.3); Expert 2: 20 (66.7) | 3825 (1678) |
| Base Gemini | 51 (33.3) | Expert 1: 8 (26.7); Expert 2: 10 (33.3) | 3904 (2056) |
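
The percentages in Table 5 follow directly from the preference counts and the stated sample sizes (N=153 for autoevaluation, N=30 per human expert). The short Python sketch below recomputes them as a consistency check. The dictionary layout and the names `preferred`, `pct`, and `shares` are illustrative assumptions, not part of the study's code.

```python
# Preference counts as reported in Table 5.
# The structure and names here are illustrative, not the authors' code.
preferred = {
    "autoevaluation": {"total": 153, "workflow": 102, "base": 51},
    "expert_1": {"total": 30, "workflow": 22, "base": 8},
    "expert_2": {"total": 30, "workflow": 20, "base": 10},
}


def pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Map each evaluator to (workflow %, base Gemini %).
shares = {
    name: (pct(row["workflow"], row["total"]), pct(row["base"], row["total"]))
    for name, row in preferred.items()
}

for name, (workflow_pct, base_pct) in shares.items():
    print(f"{name}: workflow {workflow_pct}%, base Gemini {base_pct}%")
```

In each row the two counts sum to the evaluator's sample size, so the two percentages are complementary.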