2024 Nov 26;121(49):e2414955121. doi: 10.1073/pnas.2414955121

Fig. 6.

Comparison of human and GPT-4 grading. Average model and human performance for a subset of 933 questions and answers from (A) GPT-4 and (B) GPT-3.5, generated with the metacognitive prompting method.