2023 Nov 20;20:30. doi: 10.3352/jeehp.2023.20.30

Table 3. Factors associated with correct answers provided by chatbots in a bivariate logistic regression model

Variable	GPT-4	Bing	Claude	Bard	GPT-3
Area
Surgery	Ref	Ref	Ref	Ref	Ref
Internal medicine	0.37 (0.02 to 2.25)	2.21 (0.60 to 7.63)	2.17 (0.82 to 5.64)	0.79 (0.28 to 2.07)	1.16 (0.42 to 3.00)
Pediatrics	0.13 (0.01 to 1.02)	0.36 (0.09 to 1.37)	1.08 (0.27 to 3.23)	1.82 (0.15 to 1.99)	0.53 (0.15 to 1.83)
Obstetrics & gynecology	0.13 (0.01 to 0.82)	1.53 (0.36 to 6.86)	0.71 (0.24 to 2.04)	0.42 (0.13 to 1.27)	0.88 (0.28 to 2.70)
Public health	0.23 (0.11 to 1.96)	0.72 (0.17 to 3.02)	3.53 (0.90 to 17.79)	1.49 (0.38 to 6.50)	1.05 (0.30 to 3.83)
Emergency medicine	0.27 (0.01 to 7.38)	Not estimable	4.12 (0.60 to 82.89)	2.45 (0.34 to 50.03)	0.42 (0.08 to 2.17)
Peruvian knowledge
Not required	Ref	Ref	Ref	Ref	Ref
Required	0.23 (0.09 to 0.61)^a)	0.65 (0.26 to 1.78)	0.94 (0.42 to 2.21)	0.67 (0.31 to 1.50)	0.67 (0.31 to 1.50)
Type of item
Recall	Ref	Ref	Ref	Ref	Ref
Application of knowledge	2.25 (0.84 to 5.71)	1.02 (0.35 to 2.60)	0.61 (0.25 to 1.39)	0.43 (0.16 to 0.99)	0.88 (0.39 to 1.89)

Values are presented as odds ratios (95% confidence intervals).

Ref, reference.

^a) The odds ratio was statistically significant.
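The notes do not state which significance threshold was used or how the intervals were computed. The sketch below is a hedged aid for reading the table. It assumes a two-sided test at the 5% level whose decision agrees with the interval, as a Wald test does with a Wald interval. Under that assumption, an odds ratio counts as significant exactly when its 95% CI excludes 1.

The sketch also recovers the log-odds coefficient and standard error implied by a reported interval, using OR = exp(β) and 95% CI = exp(β ± 1.96·SE). That back-calculation assumes a Wald interval that is symmetric on the log scale. If the authors used profile-likelihood intervals (as R's `confint()` does for `glm` fits), the recovered values are only approximate. The helper names `parse_cell`, `excludes_null`, `implied_wald_se` and `wald_interval` are hypothetical.

```python
import math
import re
from typing import Optional, Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile

# Matches cells printed as "0.23 (0.09 to 0.61)", ignoring any trailing footnote marker.
CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)\)")


def parse_cell(cell: str) -> Optional[Tuple[float, float, float]]:
    """Return (odds ratio, lower, upper), or None for 'Ref' / 'Not estimable' cells."""
    m = CELL.match(cell)
    if m is None:
        return None
    odds_ratio, lower, upper = (float(g) for g in m.groups())
    return odds_ratio, lower, upper


def excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True when the interval lies entirely below or entirely above the null odds ratio."""
    return upper < null or lower > null


def implied_wald_se(lower: float, upper: float, z: float = Z_95) -> Tuple[float, float]:
    """Log-odds centre and standard error implied by a symmetric (Wald) interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


def wald_interval(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Odds ratio and Wald CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # GPT-4, Peruvian knowledge required (marked significant in the table)
    or_, lower, upper = parse_cell("0.23 (0.09 to 0.61)^a)")
    print(excludes_null(lower, upper))  # True
    # Claude, public health: the interval spans 1
    print(excludes_null(*parse_cell("3.53 (0.90 to 17.79)")[1:]))  # False
    beta, se = implied_wald_se(lower, upper)
    print(round(math.exp(beta), 2), round(se, 2))  # 0.23 0.49
```

The centre recovered from the GPT-4 interval (about 0.23) matches the printed odds ratio. This is consistent with, but does not prove, a Wald-type interval for that cell.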