Table 3.
LLM^a | n | Response comprehensiveness, mean (SD) | Response comprehensiveness, median |
---|---|---|---|
ChatGPT-3.5 | 22 | 4.6 (0.3) | 4.5 |
ChatGPT-4.0 | 33 | 4.6 (0.4) | 4.7 |
Google Bard | 15 | 4.7 (0.2) | 4.7 |

^a Based on majority consensus across the three graders.