. 2025 Sep 24;9:e75421. doi: 10.2196/75421

Table 5. Comparative evaluation of simulated conversations generated with the behavioral science agentic workflow versus base Gemini.

	Conversations preferred, autoevaluation (N=153), n (%)	Conversations preferred, human expert review (N=30), n (%)	Average conversation length in characters (N=153), mean (SD)
Behavioral science agentic workflow	102 (66.7)	Expert 1: 22 (73.3); Expert 2: 20 (66.7)	3825 (1678)
Base Gemini	51 (33.3)	Expert 1: 8 (26.7); Expert 2: 10 (33.3)	3904 (2056)