Letter. 2025 Jan 6;5(2):95–99. doi: 10.1016/j.aopr.2025.01.002

Table 3.

Intra-AI agreement.

AI Model     Comparison             Agreement Level   Kappa (κ) Value^a
ChatGPT-4o   Answers 1 & 2          Very good         0.879
ChatGPT-4o   Answers 1 & 3          Very good         0.879
ChatGPT-4o   Answers 2 & 3          Very good         0.874
ChatGPT-4o   Combined vs. Correct   Good              0.679
Llama 70B    Answers 1 & 2          Very good         0.942
Llama 70B    Answers 1 & 3          Very good         0.921
Llama 70B    Answers 2 & 3          Very good         0.926
Llama 70B    Combined vs. Correct   Moderate          0.526
Llama 405B   Answers 1 & 2          Very good         0.858
Llama 405B   Answers 1 & 3          Very good         0.853
Llama 405B   Answers 2 & 3          Very good         0.900
Llama 405B   Combined vs. Correct   Good              0.700

Agreement levels and kappa (κ) values for comparisons among the three repeated answers from ChatGPT-4o, Llama 70B, and Llama 405B, and between each model's combined answers and the correct answers. Higher kappa values indicate stronger agreement.

^a All kappa values P < 0.001.
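
For readers who want to reproduce this kind of statistic, the following is a minimal sketch of how a kappa value between two answer sets can be computed and then mapped to the agreement labels used in the table. It is an assumption-laden illustration, not the study's analysis: it assumes the answers are coded as categorical multiple-choice selections, uses scikit-learn's cohen_kappa_score (i.e., Cohen's kappa), and applies Altman-style cutoffs, which happen to be consistent with the labels shown above (e.g., 0.526 labelled Moderate, 0.679 and 0.700 Good, values above 0.85 Very good). The example data are invented.

# Hedged sketch: compute a kappa value between two runs of a model and map it
# to an Altman-style agreement label. The answer lists and cutoffs below are
# illustrative assumptions, not data from the study.
from sklearn.metrics import cohen_kappa_score


def agreement_level(kappa: float) -> str:
    """Map a kappa value to an Altman-style agreement label."""
    if kappa > 0.80:
        return "Very good"
    if kappa > 0.60:
        return "Good"
    if kappa > 0.40:
        return "Moderate"
    if kappa > 0.20:
        return "Fair"
    return "Poor"


# Hypothetical example: two runs of the same model on ten multiple-choice items.
answers_run1 = ["A", "C", "B", "D", "A", "B", "C", "D", "A", "B"]
answers_run2 = ["A", "C", "B", "D", "A", "B", "C", "A", "A", "B"]

kappa = cohen_kappa_score(answers_run1, answers_run2)
print(f"kappa = {kappa:.3f} -> {agreement_level(kappa)}")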