2023 Nov 20;20:30. doi: 10.3352/jeehp.2023.20.30

Table 1.

Agreement among the 3 attempts of each chatbot, calculated using Fleiss' kappa

	GPT-4	Bing	GPT-3	Claude	Bard
Total	0.647	0.668	0.700	0.714	0.574
Areas
Surgery	0.100	0.655	0.769	0.843	0.688
Internal medicine	0.638	0.837	0.669	0.678	0.632
Pediatrics	0.571	0.595	0.550	0.847	0.417
Obstetrics & gynecology	0.745	0.396	0.733	0.699	0.697
Public health	0.709	0.844	0.699	0.741	0.096
Emergency medicine	1.000	0.111	0.832	-0.007	0.495
Type of item
Recall	0.533	0.782	0.665	0.623	0.321
Application of knowledge	0.688	0.632	0.708	0.735	0.628
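
For readers unfamiliar with the statistic, the following is a minimal sketch of how a Fleiss kappa of the kind reported in Table 1 can be computed for repeated attempts on the same items. It is not the authors' code. The function name `fleiss_kappa`, the input layout (one list of answer labels per item, one label per attempt), and the example answers are assumptions made for illustration only; the source does not state how responses were coded.

```python
import math
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels.

    Hypothetical helper: every item must carry the same number of
    ratings (here, one per attempt), and at least two.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rating pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    p_observed = agreement_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    p_expected = sum((c / total) ** 2 for c in category_totals.values())

    if 1.0 - p_expected == 0.0:
        # Only one category was ever used: kappa is undefined (0/0).
        return math.nan
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up answers for 4 items across 3 attempts (not data from the study).
example = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["A", "A", "B"],
    ["C", "C", "C"],
]
print(round(fleiss_kappa(example), 3))  # 0.745
```

Kappa equals 1 when every attempt gives the same answer to every item and more than one answer category occurs overall. It falls to 0 or slightly below when agreement is no better than chance, which is how a value such as -0.007 can arise.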