2025 May 21;11:e72522. doi: 10.2196/72522

Table 3. Comparison of completeness and readability of chatbot responses on US prostate cancer screening guidelines.

Study	Chatbot name	Completeness, n/N (%)	Average readability score
			mean (SD)	%, mean (SD)
This study	PCI^a	17/23 (74)	64.5 (8.7)	—^b
Zhu et al [15]	ChatGPT	21/22 (95)^c	—	100 (NR^d)
Zhu et al [15]	ChatGPT Plus	20.3/22 (92)^c	—	100 (NR)
Zhu et al [15]	ChatSonic	14.3/22 (65)	—	95 (NR)
Zhu et al [15]	YouChat	10.34/22 (47)	—	98 (NR)
Zhu et al [15]	Neeva AI	8.8/22 (40)	—	84 (NR)
Zhu et al [15]	Perplexity Detailed	6.6/22 (30)	—	95 (NR)
Zhu et al [15]	Perplexity Concise	6.6/22 (30)	—	95 (NR)
Owens et al [23]	ChatGPT 3.5 standard response	6/11 (54)	38.0 (7.6)	—
Owens et al [23]	ChatGPT 3.5 low literacy response	4/11 (34)	70.3 (7.2)^e	—
Owens et al [23]	ChatGPT 4.0 standard response	7/11 (63)	43.1 (9.2)	—
Owens et al [23]	ChatGPT 4.0 low literacy response	7/11 (63)	74.1 (9.9)^e	—
Owens et al [23]	Google Gemini standard response	6/11 (54)	55.7 (10.4)	—
Owens et al [23]	Google Gemini low literacy response	5/11 (45)	81.0 (3.6)^e	—
Owens et al [23]	Google Gemini Advanced standard response	6/11 (54)	66.3 (9.4)^e	—
Owens et al [23]	Google Gemini Advanced low literacy response	6/11 (54)	79.4 (5.1)^e	—
Owens et al [23]	Microsoft Copilot standard response	8/11 (72)	50.8 (9.3)	—
Owens et al [23]	Microsoft Copilot low literacy response	6/11 (54)	65.1 (6.6)^e	—
Owens et al [23]	Microsoft Copilot Pro standard response	7/11 (63)	61.2 (9.5)	—
Owens et al [23]	Microsoft Copilot Pro low literacy response	6/11 (54)	78.8 (4.7)^e	—

^a PCI: Prostate Cancer Info.

^b Not applicable.

^c Chatbot had a higher completeness score than PCI.

^d NR: not reported.

^e Chatbot had definitively higher readability scores than PCI based on the Flesch-Kincaid readability. Other scores may also be higher but were not based on a validated measure.