Table 2. First diagnostic test accuracy.
| Participants | Correct responses (%) | Incorrect responses (%) | pᵃ |
|---|---|---|---|
| Human expert | 88 | 12 | 0.208 |
| ChatGPT-4 | 80 | 20 | |
| ChatGPT-4o | 87 | 13 | |
| ChatGPT o3-mini | 89 | 11 | |
ᵃ Cochran's Q test; no statistically significant difference was found between the paired measurements.
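
For readers unfamiliar with the test named in the footnote, the following is a minimal sketch of how Cochran's Q statistic is computed from paired binary outcomes (1 = correct, 0 = incorrect). The function name `cochran_q` and the `toy` matrix are hypothetical and purely illustrative; the toy values are not the study's case-level responses, which are not reported in the table.

```python
def cochran_q(rows):
    """Return (Q, df) for paired binary data.

    rows: one list per case, each holding k outcomes (1 = correct, 0 = incorrect),
    one outcome per rater (here: human expert and three models).
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]  # correct answers per rater
    row_totals = [sum(r) for r in rows]                       # correct answers per case
    n = sum(row_totals)                                       # grand total of correct answers
    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined when every case is all-correct or all-incorrect")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1


# Hypothetical toy data (NOT from the study): 6 cases x 4 raters.
toy = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
q, df = cochran_q(toy)
print(f"Q = {q:.3f}, df = {df}")
```

Under the null hypothesis that all raters have the same probability of a correct response, Q approximately follows a chi-square distribution with k − 1 degrees of freedom (3 for the four raters compared here), so a p-value can be obtained from, for example, `scipy.stats.chi2.sf(q, df)`.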