Table 1. Descriptive statistics of scores across chatbots.
| | ChatGPT 3.5 (n=75) | Google Bard (n=72) | Bing AI (n=75) |
|---|---|---|---|
| Mean Flesch reading ease score (SD)ᵃ | 33.90 (8.1) | 49.72 (15.4) | 46.53 (9.7) |
| Mean accuracy (SD) | 5.29 (0.97) | 5.00 (0.98) | 4.87 (1.1) |
| Mean overall rating (SD) | 8.37 (1.8) | 7.94 (1.9) | 7.41 (2.1) |
| Responses appropriate for a patient-facing platform, n (%) | 71 (95) | 65 (90) | 65 (87) |
| Sufficiency for clinical practice | | | |
| Yes, n (%) | 41 (55) | 35 (49) | 35 (47) |
| No: not specific enough, n (%) | 14 (19) | 15 (21) | 23 (31) |
| No: inaccurate information, n (%) | 20 (27) | 20 (28) | 17 (23) |
| No: not concise, n (%) | 0 (0) | 2 (3) | 0 (0) |
ᵃOut of n=25 for ChatGPT 3.5 and Bing AI and n=24 for Google Bard because only 1 Flesch reading ease score was calculated for each response. All other measures in the table are based on evaluations of each chatbot response by 3 board-certified dermatologists.
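For context, the Flesch reading ease score reported in the first row is the standard readability measure computed from sentence length and syllable count; the formula below is the conventional definition (the specific calculator used is not stated here).

```latex
\mathrm{FRE} = 206.835
  - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right)
  - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)
```

Higher scores indicate more readable text; scores in the 30–50 range correspond roughly to college-level reading difficulty, consistent with the mean scores reported above.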