Figure. Performance of Generative Pre-trained Transformer 4 (GPT-4).
Histogram of GPT-4’s performance. Performance scale scores (Bond et al²): 5 = the actual diagnosis was suggested in the differential; 4 = the suggestions included something very close, but not exact; 3 = the suggestions included something closely related that might have been helpful; 2 = the suggestions included something related, but unlikely to be helpful; 0 = no suggestions close to the target diagnosis. (The scale does not contain a score of 1.)