Table 1:
Test | Section 1: Physics, Instrumentation, Radionuclides, and Radiation Safety | p-value | Section 2: Acquisition and Quality Control, Gated SPECT, Artifact Recognition, and MUGA | p-value | Section 3: Test Selection, Stress and Nuclear Protocols Interpretation, Appropriate Use, and Risk Stratification | p-value | Section 4: Cardiac PET, Multimodality Imaging, Cardiac Amyloidosis, Cases with the Experts: PET and SPECT | p-value |
---|---|---|---|---|---|---|---|---|
GPT-4 (4) | 63.0% (58.0% – 64.0%) | p < 0.001 (4 vs. G); p = 0.024 (4 vs. 4T); p < 0.001 (G vs. 4T); p < 0.001 (4o vs. 4T); p < 0.001 (4o vs. G); p < 0.001 (4o vs. 4) | 44.4% (41.1% – 46.7%) | p = 0.877 (4 vs. G); p < 0.001 (4 vs. 4T); p < 0.001 (G vs. 4T); p = 0.306 (4o vs. 4T); p < 0.001 (4o vs. G); p < 0.001 (4o vs. 4) | 67.5% (66.0% – 69.0%) | p < 0.001 (4 vs. G); p = 0.044 (4 vs. 4T); p < 0.001 (G vs. 4T); p < 0.001 (4o vs. 4T); p < 0.001 (4o vs. G); p < 0.001 (4o vs. 4) | 58.0% (56.7% – 59.3%) | p < 0.001 (4 vs. G); p = 0.222 (4 vs. 4T); p < 0.001 (G vs. 4T); p < 0.001 (4o vs. 4T); p < 0.001 (4o vs. G); p < 0.001 (4o vs. 4) |
Gemini (G) | 38.0% (36.0% – 40.0%) | | 42.2% (41.3% – 43.2%) | | 40.0% (37.5% – 42.5%) | | 45.5% (44.3% – 46.6%) | |
GPT-4 Turbo (4T) | 62.0% (61.2% – 62.8%) | | 53.3% (51.1% – 55.6%) | | 68.8% (67.5% – 70.0%) | | 57.6% (56.1% – 60.6%) | |
GPT-4o (4o) | 73.0% (71.2% – 74.9%) | | 55.6% (54.2% – 56.9%) | | 60.0% (57.4% – 62.5%) | | 63.6% (62.2% – 65.0%) | |
Values are presented as median percentage scores with 95% confidence intervals across 30 attempts at the exam. SPECT: single-photon emission computed tomography; PET: positron emission tomography; MUGA: multiple-gated acquisition.