2023 Nov 20;20:30. doi: 10.3352/jeehp.2023.20.30

Table 2.

Total and subgroup scores of the best attempt of each chatbot

	GPT-4	Bing	Claude	Bard	GPT-3	Total items
Total	157 (87.2)	152 (84.4)	129 (71.6)	124 (68.8)	124 (68.8)	180
Area
Surgery	26 (96.2)	22 (81.48)	17 (62.9)	20 (74.07)	19 (70.3)	27
Internal medicine	68 (90)	68 (90)	59 (78.6)	52 (69.3)	55 (73.3)	75
Pediatrics	14 (77.7)	11 (61.1)	11 (61.1)	11 (61.1)	10 (55.5)	18
Obstetrics & gynecology	24 (68.5)	27 (77.1)	17 (48.5)	17 (48.5)	21 (60)	35
Public health	18 (85.7)	16 (76)	18 (85.7)	17 (80.9)	15 (71.4)	21
Emergency medicine	7 (87.5)	8 (100)	7 (87.5)	7 (87.5)	4 (50)	8
Peruvian knowledge
Required	24 (70.5)	27 (79.4)	24 (70.5)	21 (61.7)	21 (61.7)	34
Not required	133 (91.09)	125 (85.6)	105 (71.9)	103 (70.5)	103 (70.5)	146
Type of item
Recall	30 (78.9)	32 (84.2)	30 (78.9)	31 (81.5)	27 (71.05)	38
Application of knowledge	127 (89.4)	120 (84.5)	99 (69.7)	94 (66.1)	97 (68.3)	142

Values are presented as number (%) or number.
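
The percentages in parentheses are the number of correct answers divided by the number of items in that row. As a minimal sketch of that arithmetic, the snippet below recomputes the "Total" row from the counts in the table. The names `TOTAL_ITEMS`, `correct`, and `percent_correct` are illustrative and not taken from the article.

```python
# Counts copied from the "Total" row of Table 2; 180 is the number of items.
TOTAL_ITEMS = 180
correct = {"GPT-4": 157, "Bing": 152, "Claude": 129, "Bard": 124, "GPT-3": 124}


def percent_correct(n_correct: int, n_items: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100.0 * n_correct / n_items


# Hypothetical helper structure: one recomputed percentage per chatbot.
percentages = {name: percent_correct(n, TOTAL_ITEMS) for name, n in correct.items()}

for name, pct in percentages.items():
    # Two decimals are printed so the values can be compared with the table.
    print(f"{name}: {correct[name]} ({pct:.2f})")
```

Recomputed to two decimals, some values differ slightly in the last digit from the table (for example 129/180 is about 71.67, listed as 71.6), which suggests that some published percentages may have been truncated rather than rounded.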