Table 1.
| | GPT-4 | Bing | GPT-3 | Claude | Bard |
---|---|---|---|---|---|
Total | 0.647 | 0.668 | 0.700 | 0.714 | 0.574 |
Areas | | | | | |
Surgery | 0.100 | 0.655 | 0.769 | 0.843 | 0.688 |
Internal medicine | 0.638 | 0.837 | 0.669 | 0.678 | 0.632 |
Pediatrics | 0.571 | 0.595 | 0.550 | 0.847 | 0.417 |
Obstetrics & gynecology | 0.745 | 0.396 | 0.733 | 0.699 | 0.697 |
Public health | 0.709 | 0.844 | 0.699 | 0.741 | 0.096 |
Emergency medicine | 1.000 | 0.111 | 0.832 | -0.007 | 0.495 |
Type of item | | | | | |
Recall | 0.533 | 0.782 | 0.665 | 0.623 | 0.321 |
Application of knowledge | 0.688 | 0.632 | 0.708 | 0.735 | 0.628 |