Table 3. Intra-AI agreement.
| AI Model | Comparison | Agreement Level | Kappa (κ) Value^a |
|---|---|---|---|
| ChatGPT-4o | Answers 1 & 2 | Very good | 0.879 |
| ChatGPT-4o | Answers 1 & 3 | Very good | 0.879 |
| ChatGPT-4o | Answers 2 & 3 | Very good | 0.874 |
| ChatGPT-4o | Combined vs. Correct | Good | 0.679 |
| Llama 70B | Answers 1 & 2 | Very good | 0.942 |
| Llama 70B | Answers 1 & 3 | Very good | 0.921 |
| Llama 70B | Answers 2 & 3 | Very good | 0.926 |
| Llama 70B | Combined vs. Correct | Moderate | 0.526 |
| Llama 405B | Answers 1 & 2 | Very good | 0.858 |
| Llama 405B | Answers 1 & 3 | Very good | 0.853 |
| Llama 405B | Answers 2 & 3 | Very good | 0.900 |
| Llama 405B | Combined vs. Correct | Good | 0.700 |
Agreement levels and kappa values for comparisons between repeated answers (answers 1, 2, and 3) within ChatGPT-4o, Llama 70B, and Llama 405B, and for each model's combined answers versus the correct answers. Higher kappa values indicate stronger agreement.
^a All kappa values: P < 0.001.
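For reference, the kappa statistic expresses agreement corrected for chance. A minimal sketch of the standard (Cohen's) kappa formula is given below, assuming that is the coefficient reported in the table; the source does not restate the exact variant or the cut-offs used for the qualitative labels.

```latex
% Cohen's kappa (assumed form of the reported coefficient):
% p_o = observed proportion of agreement between the two answer sets
% p_e = proportion of agreement expected by chance from the marginal distributions
\kappa = \frac{p_o - p_e}{1 - p_e}
% kappa = 1 indicates perfect agreement; kappa = 0 indicates chance-level agreement.
```

The labels in the table are consistent with commonly used benchmarks (roughly 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good), though these thresholds are an assumption rather than a statement from the table itself.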