Figure 3.
Accuracy of ChatGPT-3.5 and ChatGPT-4 by question domain. The absolute number of correct answers in each domain is shown at the top of each bar. Domain numbers 1-13 correspond to statistics, bone & soft tissue, breast, CNS & eye, gastrointestinal, genitourinary, gynecology, head & neck & skin, lung & mediastinum, lymphoma & leukemia, pediatrics, biology, and physics, respectively. The X-axis labels are shifted to save space.