2023 Nov 20;20:30. doi: 10.3352/jeehp.2023.20.30

Table 4.

Ratings of certainty, usefulness, and potential use in class for the best GPT-4 and Bing scores by 2 medical educators

Rating	GPT-4	Bing	P-value
Item 1: Certainty of the justification provided by chatbots
This is not the correct answer, and the information is wrong.	7 (3.89)	7 (3.89)	-
Not the right answer, but the information is somewhat correct.	16 (8.89)	21 (11.67)	-
This is the correct answer, but the information is wrong.	6 (3.33)	3 (1.67)	-
It is the correct answer, and the information is accurate.	151 (83.89)	149 (82.78)	0.777
Item 2: Usefulness of the justification provided by chatbots
It has no educational pearls.	5 (2.78)	9 (5.00)	-
There are about 1–2 educational pearls or important concepts that a competent physician should know.	47 (26.11)	53 (29.44)	-
There are quite a few (more than 3) educational pearls that a competent physician should know.	86 (47.78)	59 (32.78)	0.037^a)
The entire contents are educational pearls that a competent physician should know.	42 (23.33)	59 (32.78)	0.046^a)
Item 3: Potential use of the justification provided by chatbots in classes
No, I wouldn’t use anything.	24 (13.33)	22 (12.22)	-
I would use some of this as a guide.	82 (45.56)	69 (38.33)	-
Yes, I would use the entire explanation.	74 (41.11)	89 (49.44)	0.112

Values are presented as number (%). P-values are from the chi-square test comparing GPT-4 and Bing on the highest rating of each item.

^a) The difference is statistically significant according to the chi-square test.
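
The chi-square comparison on the highest rating of each item can be sketched as follows. This is a minimal illustration, not the authors' analysis script. It assumes a 2×2 table for each item (highest rating vs. all other ratings, GPT-4 vs. Bing), 180 ratings per chatbot (the sum of the counts in each column of one item), and a Pearson chi-square test without continuity correction; the table does not state these details. The function name `chi_square_2x2` and the variable names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Assumption: each chatbot received 180 ratings per item (column sums in Table 4).
N_PER_CHATBOT = 180

# Counts for the highest rating of each item, taken from Table 4: (GPT-4, Bing).
highest_rating_counts = {
    "Item 1": (151, 149),
    "Item 2": (42, 59),
    "Item 3": (74, 89),
}

p_values = {}
for item, (gpt4, bing) in highest_rating_counts.items():
    # Rows: chatbot; columns: highest rating vs. any other rating.
    _, p = chi_square_2x2(gpt4, N_PER_CHATBOT - gpt4, bing, N_PER_CHATBOT - bing)
    p_values[item] = p
    print(f"{item}: p = {p:.3f}")
```

Under these assumptions, the computed P-values round to 0.777, 0.046, and 0.112, in line with the values reported for the highest rating of Items 1, 2, and 3.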