2024 Feb 21;10:e51523. doi: 10.2196/51523

Table 3.

Subject-level total correct matching responses and accuracy consensus across compared models.

| Subject | GPT^a-3.5 vs Bard: total correct matching responses, n | GPT^a-3.5 vs Bard: accuracy consensus | Bard vs GPT-4: total correct matching responses, n | Bard vs GPT-4: accuracy consensus | GPT-3.5 vs GPT-4: total correct matching responses, n | GPT-3.5 vs GPT-4: accuracy consensus | Bard, GPT-3.5, and GPT-4: total correct matching responses, n | Bard, GPT-3.5, and GPT-4: accuracy consensus |
|---|---|---|---|---|---|---|---|---|
| Biology | 17 | 0.4 | 22 | 0.46 | 23 | 0.48^b | 17 | 0.52 |
| Chemistry | 4 | 0.31 | 7 | 0.50 | 8 | 0.50^b | 4 | 0.50 |
| Physics | 8 | 0.58 | 13 | 1.00 | 14 | 0.93^b | 8 | 1.00 |

^a GPT: Generative Pre-trained Transformers.

^b Highest accuracy within a subject.