2023 Jul 17;183(9):1028–1030. doi: 10.1001/jamainternmed.2023.2909

Table. Summary of Cases, Clinical Skills Assessed, and Scores for Chatbot and Student Responses by Case and Clinical Reasoning Skill.

In the 6 clinical reasoning skill columns, Yes/No indicates whether the case assessed that skill. The last 2 rows give mean (SD) scores by skill.

| Case description | Total word count | Diagnostic schemaᵇ | Differential diagnosis | Illness scriptsᶜ | Case summary | Problem list | Otherᵈ | Chatbot model 3.5 score, %ᵃ | Chatbot model 4 score, %ᵃ | Student score, mean (SD), % |
|---|---|---|---|---|---|---|---|---|---|---|
| Chronic fatigue and anemia | 583 | Yes | Yes | No | No | Yes | Yes | 62.9 | 76.8 | 75.9 (10.7) |
| Acute abdominal pain and diarrhea | 798 | Yes | Yes | No | No | Yes | Yes | 72.7 | 90.6 | 83.1 (11.5) |
| Acute confusion and hypertension | 1013 | Yes | Yes | Yes | Yes | Yes | Yes | 65.4 | 88.7 | 78.2 (11.6) |
| Chronic diarrhea and amenorrhea | 1109 | No | Yes | No | Yes | No | Yes | 65. | 70.2 | 82.1 (11.0) |
| Subacute fever and abdominal pain | 919 | Yes | Yes | No | No | Yes | Yes | 67.9 | 87.8 | 83.0 (7.7) |
| Chronic dyspnea | 831 | No | Yes | No | No | Yes | No | 60.4 | 85.7 | 74.1 (14.0) |
| Acute chest pain | 747 | No | No | No | No | No | Yes | 82.3 | 91.7 | 96.0 (13.0) |
| Acute RUQ pain | 902 | No | Yes | No | Yes | No | Yes | 61.5 | 81.6 | 77.6 (16.7) |
| Acute lightheadedness | 885 | No | Yes | No | Yes | Yes | Yes | 41.5 | 70.3 | 80.5 (12.2) |
| Acute abdominal pain and fever | 1071 | No | Yes | No | Yes | No | Yes | 79.4 | 99.4 | 86.8 (10.3) |
| Chronic fatigue | 917 | No | Yes | No | Yes | Yes | Yes | 71.4 | 94.7 | 86.2 (8.3) |
| Subacute confusion | 953 | Yes | Yes | Yes | Yes | Yes | Yes | 71.6 | 93.6 | 78.2 (8.3) |
| Acute abdominal pain and nausea | 972 | Yes | Yes | No | No | Yes | Yes | 81.1 | 86.5 | 84.1 (10.0) |
| Subacute dyspnea | 1093 | No | Yes | Yes | No | Yes | Yes | 80.9 | 92.4 | 80.5 (11.2) |
| Model 4 score, mean (SD), %ᵉ | NA | 89.8 (9.0) | 84.3 (13.8) | 92.0 (7.8) | 81.9 (25.3) | 87.7 (11.2)ᶠ | 86.4 (17) | NA | NA | NA |
| Student score, mean (SD), %ᵉ | NA | 85.4 (17.3) | 86.1 (16.8) | 87.6 (14.8) | 82.2 (21.0) | 71.8 (20.1)ᶠ | 82.8 (20.3) | NA | NA | NA |

Abbreviations: NA, not applicable; RUQ, right upper quadrant.

ᵃ Scores listed are the mean score for each case from 2 runs, each graded by 2 independent faculty graders using the same grading rubric.

ᵇ A diagnostic schema is defined as a thorough collection of possible causes of a specific symptom, organized into categories by organ system or physiological process.

ᶜ An illness script is defined as a summary of the features of a specific disease, organized into categories such as epidemiology, historical features, examination findings, and relevant test abnormalities.

ᵈ Other assessed clinical skills include the following: diagnostic test selection and interpretation, identification of cognitive biases, discussion of relevant literature search strategies, and interpretation of the significance of physical examination findings.

ᵉ Scores listed are the mean score for questions or prompts that tested a specific clinical reasoning skill, graded by 2 independent faculty graders using the same grading rubric.

ᶠ The difference in mean scores between model 4 and students on problem list–related questions was statistically significant (P < .001).
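The score aggregation described in the table footnotes (a case score pooled over 2 runs and 2 graders; a skill score reported as mean (SD) over the prompts testing that skill) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `case_score` and `skill_score` are hypothetical, the grades are invented for illustration, and the source does not state how per-prompt grades were combined or whether the SD is a sample or population SD (the sketch assumes grader averaging per prompt and a sample SD).

```python
from statistics import mean, stdev


def case_score(grades_by_run: list[list[float]]) -> float:
    """Case-level score: pool the rubric grades from every run and every
    grader, then take the mean.

    grades_by_run[r][g] is grader g's percentage score for chatbot run r.
    """
    pooled = [grade for run in grades_by_run for grade in run]
    return mean(pooled)


def skill_score(grades_by_prompt: list[list[float]]) -> str:
    """Skill-level score formatted as 'mean (SD)', one decimal place.

    ASSUMPTION (not stated in the source): each prompt's score is the mean
    of its graders' grades, and the SD is the sample SD across prompts.
    """
    per_prompt = [mean(grades) for grades in grades_by_prompt]
    return f"{mean(per_prompt):.1f} ({stdev(per_prompt):.1f})"


# Illustrative grades only (not study data): 2 runs x 2 graders for one case.
print(case_score([[70.0, 80.0], [60.0, 70.0]]))  # 70.0

# Illustrative grades only: 3 prompts testing one skill, 2 graders each.
print(skill_score([[75.0, 85.0], [70.0, 70.0], [95.0, 85.0]]))  # 80.0 (10.0)
```

Because every run has the same number of graders, pooling all grades gives the same result as averaging the per-run means.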