2024 Jun 26;24:694. doi: 10.1186/s12909-024-05630-9

Table 1.

AI chatbots’ accuracy

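| Failure | ChatGPT-4, absolute frequency | ChatGPT-4, % | Microsoft Copilot, absolute frequency | Microsoft Copilot, % | Google Gemini, absolute frequency | Google Gemini, % | ChatGPT-4 vs Google Gemini, χ² | ChatGPT-4 vs Google Gemini, p-value | ChatGPT-4 vs Microsoft Copilot, χ² | ChatGPT-4 vs Microsoft Copilot, p-value | Microsoft Copilot vs Google Gemini, χ² | Microsoft Copilot vs Google Gemini, p-value | Overall among AI chatbots, χ² | Overall among AI chatbots, p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 57 | 6.96 | 83 | 10.13 | 246 | 30.04 | -0.23 | 0.00* | -0.031 | 0.199 | -0.198 | 0.00* | 312.76 | 0.000* |
| *Logical reasoning and general culture* | 39 | 68.42 | 51 | 61.45 | 126 | 51.22 | -0.28 | 0.00* | -0.038 | 0.70 | -0.242 | 0.00* | 52 | 0.000* |
| *Biology* | 6 | 10.53 | 8 | 9.64 | 31 | 12.60 | -0.1 | 0.00* | -0.008 | 1.00 | -0.09 | 0.00* | 166.01 | 0.000* |
| *Chemistry* | 7 | 12.28 | 11 | 13.25 | 32 | 13.01 | -0.16 | 0.00* | -0.025 | 1.00 | -0.13 | 0.00* | 73.03 | 0.000* |
| *Physics and mathematics* | 5 | 8.77 | 13 | 15.66 | 57 | 23.17 | -0.43 | 0.00* | -0.066 | 0.46 | -0.366 | 0.00* | 94.16 | 0.000* |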

Note: * indicates statistically significant findings.
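
The percentage columns in Table 1 can be read in two ways, and a short check may help. The sketch below recomputes them from the failure counts: the per-subject percentages appear to be each subject's share of that chatbot's total failures, while the first-row percentages appear to be failures over the number of questions. The value `N_QUESTIONS = 819` is an assumption inferred from the reported percentages, not a figure stated in the table, and the function names are hypothetical. The test statistics and p-values are not recomputed here.

```python
# Failure counts copied from Table 1 (first row, then the four subject rows).
FAILURES = {
    "ChatGPT-4": {
        "total": 57,
        "Logical reasoning and general culture": 39,
        "Biology": 6,
        "Chemistry": 7,
        "Physics and mathematics": 5,
    },
    "Microsoft Copilot": {
        "total": 83,
        "Logical reasoning and general culture": 51,
        "Biology": 8,
        "Chemistry": 11,
        "Physics and mathematics": 13,
    },
    "Google Gemini": {
        "total": 246,
        "Logical reasoning and general culture": 126,
        "Biology": 31,
        "Chemistry": 32,
        "Physics and mathematics": 57,
    },
}

# ASSUMPTION: number of questions per chatbot, inferred from 57 / 6.96%.
N_QUESTIONS = 819


def section_shares(bot: str) -> dict:
    """Each subject's share (in %) of the chatbot's total failures."""
    counts = FAILURES[bot]
    total = counts["total"]
    return {
        section: round(100 * n / total, 2)
        for section, n in counts.items()
        if section != "total"
    }


def overall_failure_rate(bot: str, n_questions: int = N_QUESTIONS) -> float:
    """Failures as a percentage of all questions (under the assumed total)."""
    return round(100 * FAILURES[bot]["total"] / n_questions, 2)


if __name__ == "__main__":
    for bot in FAILURES:
        print(bot, overall_failure_rate(bot), section_shares(bot))
```

Under these assumptions the recomputed values agree with the percentage columns of the table to two decimal places, which supports reading the subject rows as a breakdown of failures rather than as per-subject failure rates.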