Table 4.
Comprehensiveness assessment for common questions answered by the three LLM chatbots, limited to responses that received a 'good' accuracy rating
| LLMᵃ | n | Response Comprehensiveness, Mean (SD) | Response Comprehensiveness, Median |
|---|---|---|---|
| ChatGPT-3.5 | 11 | 4.7 (0.3) | 4.7 |
| ChatGPT-4.0 | 11 | 4.7 (0.3) | 4.7 |
| Google Bard | 11 | 4.7 (0.2) | 4.7 |
ᵃ Based on majority consensus across the three graders.