Table 2.
Adjusted percentage average ratings of large language model responses. These ratings were calculated as the mean of normalized scores, using the following formula to scale responses uniformly from 0% to 100%: adjusted percentage rating = ((actual Likert score – 1) / (Likert scale maximum – 1)) × 100%.
| | ChatGPT: Likert score, mean (SD) | ChatGPT: Adjusted average Likert rating (%), mean (SD) | Claude 2: Likert score, mean (SD) | Claude 2: Adjusted average Likert rating (%), mean (SD) | Bard: Likert score, mean (SD) | Bard: Adjusted average Likert rating (%), mean (SD) |
|---|---|---|---|---|---|---|
| Accuracy | 4.2 (0.55) | 79.93 (13.8) | 4.61 (0.58) | 90.13 (14.58) | 3.76 (0.85) | 69.08 (21.3) |
| Relevance | 4.28 (0.64) | 81.91 (16.1) | 4.76 (0.4) | 94.08 (9.96) | 4.04 (0.67) | 75.99 (16.79) |
| Clarity | 4.24 (0.61) | 80.92 (16.1) | 4.68 (0.38) | 92.11 (9.38) | 3.86 (0.64) | 71.38 (15.89) |
| Emotional sensitivity | 4.49 (1) | 58.11 (16.61) | 5.46 (0.92) | 74.34 (15.3) | 4.7 (0.97) | 61.62 (16.16) |
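
The normalization in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `adjusted_percentage` and its `scale_max` parameter are hypothetical. The scale maxima used below (5 for accuracy, 7 for emotional sensitivity) are an assumption inferred from the reported numbers, since this excerpt does not state the scale lengths. Applying the formula to a rounded mean reproduces the tabulated percentage only approximately, presumably because the table averages normalized scores computed from unrounded data.

```python
def adjusted_percentage(score: float, scale_max: int) -> float:
    """Rescale a Likert score from the range [1, scale_max] to 0-100 percent.

    Implements the caption's formula:
    ((score - 1) / (scale_max - 1)) * 100.
    """
    if scale_max <= 1:
        raise ValueError("scale_max must be greater than 1")
    if not 1 <= score <= scale_max:
        raise ValueError("score must lie between 1 and scale_max")
    return (score - 1) / (scale_max - 1) * 100


# The endpoints of any scale map to 0% and 100%.
lowest = adjusted_percentage(1, 5)
highest = adjusted_percentage(5, 5)

# Assumption: accuracy was rated on a 5-point scale.
# The table reports 90.13% for the Claude 2 mean of 4.61.
accuracy_pct = adjusted_percentage(4.61, 5)

# Assumption: emotional sensitivity was rated on a 7-point scale.
# The table reports 74.34% for the Claude 2 mean of 5.46.
sensitivity_pct = adjusted_percentage(5.46, 7)

print(lowest, highest)
print(round(accuracy_pct, 2), round(sensitivity_pct, 2))
```

Because the transformation is linear, normalizing each response and then averaging gives the same result as normalizing the unrounded mean, so the small gaps from the tabulated values are consistent with rounding of the reported means.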