Table 1. Descriptive statistics of scores across chatbots.
| | ChatGPT 3.5 (n=75) | Google Bard (n=72) | Bing AI (n=75) |
|---|---|---|---|
| Mean Flesch reading ease score (SD)ᵃ | 33.90 (8.1) | 49.72 (15.4) | 46.53 (9.7) |
| Mean accuracy (SD) | 5.29 (0.97) | 5.00 (0.98) | 4.87 (1.1) |
| Mean overall rating (SD) | 8.37 (1.8) | 7.94 (1.9) | 7.41 (2.1) |
| Responses appropriate for a patient-facing platform, n (%) | 71 (95) | 65 (90) | 65 (87) |
| Sufficiency for clinical practice | | | |
| Yes, n (%) | 41 (55) | 35 (49) | 35 (47) |
| No: not specific enough, n (%) | 14 (19) | 15 (21) | 23 (31) |
| No: inaccurate information, n (%) | 20 (27) | 20 (28) | 17 (23) |
| No: not concise, n (%) | 0 (0) | 2 (3) | 0 (0) |
ᵃOut of n=25 for ChatGPT 3.5 and Bing AI and n=24 for Google Bard because only 1 Flesch reading ease score was calculated for each response. All other measures in the table are based on evaluations of each chatbot response by 3 board-certified dermatologists.
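For context, the Flesch reading ease score reported in the first row is the standard readability measure computed from sentence length and syllable count; the formula below is the conventional definition (the specific calculator used is not stated here).

```latex
\mathrm{FRE} = 206.835
  - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right)
  - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)
```

Higher scores indicate more readable text; scores in the 30–50 range correspond roughly to college-level reading difficulty, consistent with the mean scores reported above.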