2023 Nov 20;20:30. doi: 10.3352/jeehp.2023.20.30

Table 2.

Total and subgroup scores of the best attempt of each chatbot

| Category | GPT-4 | Bing | Claude | Bard | GPT-3 | Total items |
|---|---|---|---|---|---|---|
| Total | 157 (87.2) | 152 (84.4) | 129 (71.6) | 124 (68.8) | 124 (68.8) | 180 |
| Area | | | | | | |
| Surgery | 26 (96.2) | 22 (81.48) | 17 (62.9) | 20 (74.07) | 19 (70.3) | 27 |
| Internal medicine | 68 (90) | 68 (90) | 59 (78.6) | 52 (69.3) | 55 (73.3) | 75 |
| Pediatrics | 14 (77.7) | 11 (61.1) | 11 (61.1) | 11 (61.1) | 10 (55.5) | 18 |
| Obstetrics & gynecology | 24 (68.5) | 27 (77.1) | 17 (48.5) | 17 (48.5) | 21 (60) | 35 |
| Public health | 18 (85.7) | 16 (76) | 18 (85.7) | 17 (80.9) | 15 (71.4) | 21 |
| Emergency medicine | 7 (87.5) | 8 (100) | 7 (87.5) | 7 (87.5) | 4 (50) | 8 |
| Peruvian knowledge | | | | | | |
| Required | 24 (70.5) | 27 (79.4) | 24 (70.5) | 21 (61.7) | 21 (61.7) | 34 |
| Not required | 133 (91.09) | 125 (85.6) | 105 (71.9) | 103 (70.5) | 103 (70.5) | 146 |
| Type of item | | | | | | |
| Recall | 30 (78.9) | 32 (84.2) | 30 (78.9) | 31 (81.5) | 27 (71.05) | 38 |
| Application of knowledge | 127 (89.4) | 120 (84.5) | 99 (69.7) | 94 (66.1) | 97 (68.3) | 142 |

Values are the number of correct answers (%) for each chatbot; the last column gives the number of items.
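
The percentages in Table 2 follow from dividing each count of correct answers by the number of items in that row. The sketch below recomputes the overall scores from the "Total" row as an illustration only; the function name `percent_correct` and the one-decimal rounding are assumptions, not the authors' procedure. Because several printed values appear to be truncated rather than rounded (129/180 is shown as 71.6), a recomputed figure can differ from the table in the last digit.

```python
# Counts of correct answers on the best attempt, copied from the "Total" row of Table 2.
TOTAL_ITEMS = 180
correct = {"GPT-4": 157, "Bing": 152, "Claude": 129, "Bard": 124, "GPT-3": 124}


def percent_correct(n_correct: int, n_items: int) -> float:
    """Return the share of correct answers as a percentage, rounded to one decimal.

    Hypothetical helper for illustration; the rounding rule is an assumption.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return round(100 * n_correct / n_items, 1)


# Recompute the overall percentage for each chatbot.
scores = {name: percent_correct(n, TOTAL_ITEMS) for name, n in correct.items()}

for name, pct in scores.items():
    print(f"{name}: {correct[name]}/{TOTAL_ITEMS} = {pct}%")
```

The same helper applies to any subgroup row by passing that row's item count, for example `percent_correct(7, 8)` for GPT-4 in emergency medicine.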