Letter. 2025 Jan 7;8:e60827. doi: 10.2196/60827

Table 1. Descriptive statistics of scores between chatbots.

| Measure | ChatGPT 3.5 (n=75) | Google Bard (n=72) | Bing AI (n=75) |
|---|---|---|---|
| Mean Flesch reading ease score (SD)ᵃ | 33.90 (8.1) | 49.72 (15.4) | 46.53 (9.7) |
| Mean accuracy (SD) | 5.29 (0.97) | 5.00 (0.98) | 4.87 (1.1) |
| Mean overall rating (SD) | 8.37 (1.8) | 7.94 (1.9) | 7.41 (2.1) |
| Responses appropriate for a patient-facing platform, n (%) | 71 (95) | 65 (90) | 65 (87) |
| Sufficiency for clinical practice | | | |
|   Yes, n (%) | 41 (55) | 35 (49) | 35 (47) |
|   No: not specific enough, n (%) | 14 (19) | 15 (21) | 23 (31) |
|   No: inaccurate information, n (%) | 20 (27) | 20 (28) | 17 (23) |
|   No: not concise, n (%) | 0 (0) | 2 (3) | 0 (0) |
ᵃOut of n=25 for ChatGPT and Bing AI and n=24 for Google Bard, because only one Flesch reading ease score was calculated per response. The other measures in the table are based on evaluation of each chatbot response by 3 board-certified dermatologists.
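For context, the Flesch reading ease score reported in the table is conventionally computed from sentence and word length: FRE = 206.835 − 1.015 × (total words / total sentences) − 84.6 × (total syllables / total words), with higher scores indicating easier-to-read text. The sketch below is a minimal illustration of that standard formula; the naive vowel-group syllable counter and the sample sentence are assumptions for demonstration only, not the authors' tooling (dedicated readability libraries use dictionary-based syllable counts).

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch reading ease formula:
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    # Count sentences by terminal punctuation; floor at 1 to avoid division by zero.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def count_syllables(word: str) -> int:
        # Assumed heuristic: one syllable per contiguous vowel group.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)

# Hypothetical patient-facing chatbot response used only to exercise the formula.
print(round(flesch_reading_ease(
    "Apply sunscreen daily. Reapply every two hours when outdoors."), 2))
```

On the conventional interpretation of this scale, the means in the table (roughly 34 to 50) fall in the "difficult" to "fairly difficult" bands, i.e., college-level to high-school-level reading material.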