2023 Nov 20;20:30. doi: 10.3352/jeehp.2023.20.30

Table 4.

Ratings of certainty, usefulness, and potential use in class for the best GPT-4 and Bing scores by 2 medical educators

| Rating | GPT-4 | Bing | P-value |
|---|---|---|---|
| Item 1: Certainty of the justification provided by chatbots | | | |
| This is not the correct answer, and the information is wrong. | 7 (3.89) | 7 (3.89) | - |
| Not the right answer, but the information is somewhat correct. | 16 (8.89) | 21 (11.67) | - |
| This is the correct answer, but the information is wrong. | 6 (3.33) | 3 (1.67) | - |
| It is the correct answer, and the information is accurate. | 151 (83.89) | 149 (82.78) | 0.777 |
| Item 2: Usefulness of the justification provided by chatbots | | | |
| It has no educational pearls. | 5 (2.78) | 9 (5.00) | - |
| There are about 1–2 educational pearls or important concepts that a competent physician should know. | 47 (26.11) | 53 (29.44) | - |
| There are quite a few (more than 3) educational pearls that a competent physician should know. | 86 (47.78) | 59 (32.78) | 0.037 a) |
| The entire contents are educational pearls that a competent physician should know. | 42 (23.33) | 59 (32.78) | 0.046 a) |
| Item 3: Potential use of the justification provided by chatbots in classes | | | |
| No, I wouldn’t use anything. | 24 (13.33) | 22 (12.22) | - |
| I would use some of this as a guide. | 82 (45.56) | 69 (38.33) | - |
| Yes, I would use the entire explanation. | 74 (41.11) | 89 (49.44) | 0.112 |

Values are presented as number (%). P-value for the chi-square test comparing GPT-4 and Bing on the highest rating of each item.

a) The difference is statistically significant according to the chi-square test.
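
The comparison on the highest rating of each item can be sketched as a Pearson chi-square test on a 2×2 table (highest rating vs. all other ratings, GPT-4 vs. Bing). This is a minimal illustration, not the authors' analysis script: it assumes 180 ratings per chatbot (the sum of the counts within each item) and no continuity correction, both inferred from the table and not stated in the note. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(top_a, top_b, n_a=180, n_b=180):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    top_a, top_b: counts of the highest rating for each chatbot.
    n_a, n_b: total ratings per chatbot (assumed to be 180 here).
    Returns (chi-square statistic, P-value).
    """
    observed = [[top_a, n_a - top_a], [top_b, n_b - top_b]]
    row_sums = [n_a, n_b]
    col_sums = [top_a + top_b, (n_a - top_a) + (n_b - top_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Highest-rating counts (GPT-4, Bing) taken from the table.
highest_ratings = {
    "Item 1": (151, 149),
    "Item 2": (42, 59),
    "Item 3": (74, 89),
}

for item, (gpt4, bing) in highest_ratings.items():
    chi2, p = chi_square_2x2(gpt4, bing)
    print(f"{item}: chi2 = {chi2:.3f}, P = {p:.3f}")
```

Under these assumptions the sketch yields P-values of 0.777, 0.046, and 0.112 for Items 1, 2, and 3, matching the values reported for the highest rating of each item.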