Table 4. Certainty, usefulness, and potential classroom use of the justifications provided by GPT-4 and Bing
| Rating | GPT-4 | Bing | P-value |
|---|---|---|---|
| Item 1: Certainty of the justification provided by chatbots | | | |
| This is not the correct answer, and the information is wrong. | 7 (3.89) | 7 (3.89) | - |
| Not the right answer, but the information is somewhat correct. | 16 (8.89) | 21 (11.67) | - |
| This is the correct answer, but the information is wrong. | 6 (3.33) | 3 (1.67) | - |
| It is the correct answer, and the information is accurate. | 151 (83.89) | 149 (82.78) | 0.777 |
| Item 2: Usefulness of the justification provided by chatbots | | | |
| It has no educational pearls. | 5 (2.78) | 9 (5.00) | - |
| There are about 1–2 educational pearls or important concepts that a competent physician should know. | 47 (26.11) | 53 (29.44) | - |
| There are quite a few (more than 3) educational pearls that a competent physician should know. | 86 (47.78) | 59 (32.78) | 0.037a) |
| The entire contents are educational pearls that a competent physician should know. | 42 (23.33) | 59 (32.78) | 0.046a) |
| Item 3: Potential use of the justification provided by chatbots in classes | | | |
| No, I wouldn’t use anything. | 24 (13.33) | 22 (12.22) | - |
| I would use some of this as a guide. | 82 (45.56) | 69 (38.33) | - |
| Yes, I would use the entire explanation. | 74 (41.11) | 89 (49.44) | 0.112 |
Values are presented as number (%). P-values are from the chi-square test comparing GPT-4 and Bing on the highest rating of each item.
a)The difference is statistically significant according to the chi-square test.
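As a rough check on these values, the sketch below (not the authors' analysis code) assumes each P-value comes from a 2×2 chi-square test without continuity correction, contrasting one rating category against all others for GPT-4 versus Bing, with 180 rated responses per chatbot as the column totals imply. Under that assumption, the Item 1 comparison reproduces P = 0.777.

```python
# Minimal sketch: reproduce the Item 1 P-value under the assumed 2x2 setup
# (highest rating vs. all other ratings, GPT-4 vs. Bing), no Yates correction.
from scipy.stats import chi2_contingency

# Item 1 counts from Table 4 (180 rated responses per chatbot).
gpt4_highest, gpt4_other = 151, 180 - 151
bing_highest, bing_other = 149, 180 - 149

table = [
    [gpt4_highest, gpt4_other],
    [bing_highest, bing_other],
]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.3f}, P = {p:.3f}")  # P ≈ 0.777, matching Table 4
```

The same construction with the Item 3 counts (74 vs. 89 on the highest rating) yields P ≈ 0.112; whether the published analysis used a continuity correction or a different partition of categories is an assumption here.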