BMC Med Educ. 2024 Mar 29;24:354. doi: 10.1186/s12909-024-05239-y

Table 2. Key parameters investigated in each study

| Author | No. of MCQs | Tested vs. Human | Medical Field | Questions Evaluated By | Performance Scores |
| --- | --- | --- | --- | --- | --- |
| Sevgi et al. | 3 | No | Neurosurgery | Evaluated by the author according to current literature | 2 (66.6%) of the questions were accurate |
| Biswas | 5 | No | General | N/A | N/A |
| Agarwal et al. | 320 | No | Medical Physiology | 2 physiologists | p value for validity < 0.001 (Chat-GPT vs. Bing < 0.001; Bard vs. Bing < 0.001); p value for difficulty < 0.006 (Chat-GPT vs. Bing 0.010; Chat-GPT vs. Bard 0.003) |
| Ayub et al. | 40 | No | Dermatology | 2 board-certified dermatologists | 16 (40%) of questions valid for exams |
| Cheung et al. | 50 | Yes | Internal Medicine/Surgery | 5 international medical experts and educators | Overall performance: AI score 20 (40%) vs. human score 30 (60%); mean difference -0.80 ± 4.82; total time required: AI 20 min 25 s vs. human 211 min 33 s |
| Totlis et al. | 18 | No | Anatomy | N/A | N/A |
| Han et al. | 3 | No | Biochemistry | N/A | N/A |
| Klang et al. | 210 | No | Internal Medicine, Surgery, Obstetrics & Gynecology, Psychiatry, Pediatrics | 5 specialist physicians in the tested fields | Problematic questions by field: Surgery 30%, Gynecology 20%, Pediatrics 10%, Internal Medicine 10%, Psychiatry 0% |
Summary of key parameters investigated in each study, November 2023