Table. Summary of Cases, Clinical Skills Assessed, and Scores for Chatbot and Student Responses by Case and Clinical Reasoning Skill.
Case description | Total word count | Skill assessed: diagnostic schema<sup>b</sup> | Skill assessed: differential diagnosis | Skill assessed: illness scripts<sup>c</sup> | Skill assessed: case summary | Skill assessed: problem list | Skill assessed: other<sup>d</sup> | Chatbot score, %<sup>a</sup>: model 3.5 | Chatbot score, %<sup>a</sup>: model 4 | Student score, mean (SD), % |
---|---|---|---|---|---|---|---|---|---|---|
Chronic fatigue and anemia | 583 | Yes | Yes | No | No | Yes | Yes | 62.9 | 76.8 | 75.9 (10.7) |
Acute abdominal pain and diarrhea | 798 | Yes | Yes | No | No | Yes | Yes | 72.7 | 90.6 | 83.1 (11.5) |
Acute confusion and hypertension | 1013 | Yes | Yes | Yes | Yes | Yes | Yes | 65.4 | 88.7 | 78.2 (11.6) |
Chronic diarrhea and amenorrhea | 1109 | No | Yes | No | Yes | No | Yes | 65. | 70.2 | 82.1 (11.0) |
Subacute fever and abdominal pain | 919 | Yes | Yes | No | No | Yes | Yes | 67.9 | 87.8 | 83.0 (7.7) |
Chronic dyspnea | 831 | No | Yes | No | No | Yes | No | 60.4 | 85.7 | 74.1 (14.0) |
Acute chest pain | 747 | No | No | No | No | No | Yes | 82.3 | 91.7 | 96.0 (13.0) |
Acute RUQ pain | 902 | No | Yes | No | Yes | No | Yes | 61.5 | 81.6 | 77.6 (16.7) |
Acute lightheadedness | 885 | No | Yes | No | Yes | Yes | Yes | 41.5 | 70.3 | 80.5 (12.2) |
Acute abdominal pain and fever | 1071 | No | Yes | No | Yes | No | Yes | 79.4 | 99.4 | 86.8 (10.3) |
Chronic fatigue | 917 | No | Yes | No | Yes | Yes | Yes | 71.4 | 94.7 | 86.2 (8.3) |
Subacute confusion | 953 | Yes | Yes | Yes | Yes | Yes | Yes | 71.6 | 93.6 | 78.2 (8.3) |
Acute abdominal pain and nausea | 972 | Yes | Yes | No | No | Yes | Yes | 81.1 | 86.5 | 84.1 (10.0) |
Subacute dyspnea | 1093 | No | Yes | Yes | No | Yes | Yes | 80.9 | 92.4 | 80.5 (11.2) |
Model 4 score, mean (SD), %<sup>e</sup> | NA | 89.8 (9.0) | 84.3 (13.8) | 92.0 (7.8) | 81.9 (25.3) | 87.7 (11.2)<sup>f</sup> | 86.4 (17) | NA | NA | NA |
Student score, mean (SD), %<sup>e</sup> | NA | 85.4 (17.3) | 86.1 (16.8) | 87.6 (14.8) | 82.2 (21.0) | 71.8 (20.1)<sup>f</sup> | 82.8 (20.3) | NA | NA | NA |
Abbreviations: NA, not applicable; RUQ, right upper quadrant.
<sup>a</sup> Scores listed are the mean score for each case from 2 runs, each graded by 2 independent faculty graders using the same grading rubric.
<sup>b</sup> A diagnostic schema is defined as a thorough collection of causes for a specific symptom, organized into categories based on organ system or physiological process.
<sup>c</sup> An illness script is defined as a summary of the features of a specific disease, organized into categories such as epidemiology, historical features, examination findings, and relevant test abnormalities.
<sup>d</sup> Other assessed clinical skills include diagnostic test selection and interpretation, identification of cognitive biases, discussion of relevant literature search strategies, and interpretation of the significance of physical examination findings.
<sup>e</sup> Scores listed are the mean score for questions or prompts that tested a specific clinical reasoning skill, graded by 2 independent faculty graders using the same grading rubric.
<sup>f</sup> The difference in mean scores between model 4 and students on problem list–related questions was statistically significant (P < .001).
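
The per-case chatbot scores are described in the footnotes as means over 2 runs, each graded by 2 faculty graders. A minimal sketch of that averaging follows, assuming a simple unweighted mean over all four grades; the table does not state whether grades were first averaged within each run, although with 2 graders per run both approaches give the same result. The function name `case_score` and the grades shown are hypothetical and are not taken from the study.

```python
from statistics import mean


def case_score(grades_by_run):
    """Return the unweighted mean of every grader's percentage score across all runs."""
    all_grades = [grade for run in grades_by_run for grade in run]
    return mean(all_grades)


# Hypothetical grades (not study data): 2 runs, each scored by 2 graders.
example_runs = [[70.0, 74.0], [80.0, 76.0]]
example_score = case_score(example_runs)
print(f"{example_score:.1f}")
```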