2024 Jun 26;8:e59267. doi: 10.2196/59267

Table 2.

κ coefficient for interrater agreement between GPT-4 and the physicians’ evaluations for the differential diagnosis lists.

| Differential-diagnosis list generator | Cohen κ coefficient (95% CI) | Strength of agreement [34] | Number of differential-diagnosis lists |
| --- | --- | --- | --- |
| All | 0.63 (0.56-0.69) | Fair to good | 1176 |
| GPT-4 | 0.47 (0.39-0.56) | Fair to good | 392 |
| Google Bardᵃ | 0.67 (0.52-0.73) | Fair to good | 392 |
| LLaMA2 chatbotᵇ | 0.63 (0.52-0.73) | Fair to good | 392 |

ᵃ Currently Google Gemini.

ᵇ LLaMA2: LLM Meta AI 2.
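For readers who want to see how a Cohen κ value like those in Table 2 is obtained, the following is a minimal sketch and not the authors' analysis code. It assumes each differential-diagnosis list receives one categorical rating from each of two raters (for example, 1 if the final diagnosis was judged to be in the list and 0 otherwise). The function names, the toy ratings, and the cutoffs in `strength_of_agreement` (below 0.40 poor, 0.40 to 0.75 fair to good, above 0.75 excellent) are illustrative assumptions; the table's own labels follow reference [34], and the 95% CIs are not computed here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")

    n = len(ratings_a)

    # Observed agreement: share of items with identical ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over categories of the product of the marginals.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single category.")

    return (observed - expected) / (1 - expected)


def strength_of_agreement(kappa):
    """Map kappa to a label. The cutoffs are an assumption for illustration."""
    if kappa < 0.40:
        return "Poor"
    if kappa <= 0.75:
        return "Fair to good"
    return "Excellent"


# Hypothetical toy ratings (not data from the study).
model_ratings = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
physician_ratings = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

kappa = cohen_kappa(model_ratings, physician_ratings)
print(round(kappa, 2), strength_of_agreement(kappa))
```

In the toy data the two raters agree on 8 of 10 items (observed agreement 0.8) while chance agreement is 0.5, so κ is about 0.6. This shows why κ can be noticeably lower than raw percentage agreement.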