2025 May 21;11:e72522. doi: 10.2196/72522

Table 3. Comparison of completeness and readability of chatbot responses on US prostate cancer screening guidelines.

| Study | Chatbot name | Completeness, n/N (%) | Average readability score, mean (SD) |
| --- | --- | --- | --- |
| This study | PCIᵃ | 17/23 (74) | 64.5 (8.7)ᵇ |
| Zhu et al [15] | ChatGPT | 21/22 (95)ᶜ | 100 (NRᵈ) |
| Zhu et al [15] | ChatGPT Plus | 20.3/22 (92)ᶜ | 100 (NR) |
| Zhu et al [15] | ChatSonic | 14.3/22 (65) | 95 (NR) |
| Zhu et al [15] | YouChat | 10.34/22 (47) | 98 (NR) |
| Zhu et al [15] | Neeva AI | 8.8/22 (40) | 84 (NR) |
| Zhu et al [15] | Perplexity Detailed | 6.6/22 (30) | 95 (NR) |
| Zhu et al [15] | Perplexity Concise | 6.6/22 (30) | 95 (NR) |
| Owens et al [23] | ChatGPT 3.5 standard response | 6/11 (54) | 38.0 (7.6) |
| Owens et al [23] | ChatGPT 3.5 low literacy response | 4/11 (34) | 70.3 (7.2)ᵉ |
| Owens et al [23] | ChatGPT 4.0 standard response | 7/11 (63) | 43.1 (9.2) |
| Owens et al [23] | ChatGPT 4.0 low literacy response | 7/11 (63) | 74.1 (9.9)ᵉ |
| Owens et al [23] | Google Gemini standard response | 6/11 (54) | 55.7 (10.4) |
| Owens et al [23] | Google Gemini low literacy response | 5/11 (45) | 81.0 (3.6)ᵉ |
| Owens et al [23] | Google Gemini Advanced standard response | 6/11 (54) | 66.3 (9.4)ᵉ |
| Owens et al [23] | Google Gemini Advanced low literacy response | 6/11 (54) | 79.4 (5.1)ᵉ |
| Owens et al [23] | Microsoft Copilot standard response | 8/11 (72) | 50.8 (9.3) |
| Owens et al [23] | Microsoft Copilot low literacy response | 6/11 (54) | 65.1 (6.6)ᵉ |
| Owens et al [23] | Microsoft Copilot Pro standard response | 7/11 (63) | 61.2 (9.5) |
| Owens et al [23] | Microsoft Copilot Pro low literacy response | 6/11 (54) | 78.8 (4.7)ᵉ |
ᵃPCI: Prostate Cancer Info.

ᵇNot applicable.

ᶜChatbot had a higher completeness score than PCI.

ᵈNR: not reported.

ᵉChatbot had a definitively higher readability score than PCI based on the Flesch-Kincaid readability measure. Other scores may also be higher but were not based on a validated measure.
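The Flesch-style readability scores in the table are on a 0-100 scale, computed from sentence length and syllable density. As a rough illustration only (not the validated instrument used in these studies), the Flesch Reading Ease formula can be sketched as follows; the vowel-group syllable counter here is a naive heuristic of my own, whereas validated tools apply additional exception rules:

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Naive syllable estimate: count runs of consecutive vowels in each word,
    # with a minimum of one syllable per word.
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short sentences of one-syllable words score high (easy), while long, polysyllabic clinical sentences score low, which is why the "low literacy" chatbot responses above score 20-30 points higher than the standard responses.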