2024 Jun 26;24:694. doi: 10.1186/s12909-024-05630-9

Table 1.

AI chatbots’ accuracy

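| | ChatGPT-4: absolute frequency | ChatGPT-4: % | Microsoft Copilot: absolute frequency | Microsoft Copilot: % | Google Gemini: absolute frequency | Google Gemini: % | ChatGPT-4 vs Google Gemini: Chi2 | ChatGPT-4 vs Google Gemini: p-value | ChatGPT-4 vs Microsoft Copilot: Chi2 | ChatGPT-4 vs Microsoft Copilot: p-value | Microsoft Copilot vs Google Gemini: Chi2 | Microsoft Copilot vs Google Gemini: p-value | Overall among AI chatbots: Chi2 | Overall among AI chatbots: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Failure | 57 | 6.96 | 83 | 10.13 | 246 | 30.04 | -0.23 | 0.00* | -0.031 | 0.199 | -0.198 | 0.00* | 312.76 | 0.000* |
| Logical reasoning and general culture | 39 | 68.42 | 51 | 61.45 | 126 | 51.22 | -0.28 | 0.00* | -0.038 | 0.70 | -0.242 | 0.00* | 52 | 0.000* |
| Biology | 6 | 10.53 | 8 | 9.64 | 31 | 12.60 | -0.1 | 0.00* | -0.008 | 1.00 | -0.09 | 0.00* | 166.01 | 0.000* |
| Chemistry | 7 | 12.28 | 11 | 13.25 | 32 | 13.01 | -0.16 | 0.00* | -0.025 | 1.00 | -0.13 | 0.00* | 73.03 | 0.000* |
| Physics and mathematics | 5 | 8.77 | 13 | 15.66 | 57 | 23.17 | -0.43 | 0.00* | -0.066 | 0.46 | -0.366 | 0.00* | 94.16 | 0.000* |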

* statistically significant findings