2024 Jul 22;26:e58158. doi: 10.2196/58158

Table 2.

Accuracy of different prompt techniques on the osteoarthritis benchmark.


                          GPT-3.5                   GPT-4
                          IO (a)  COT (b) P value   IO      COT     P value
Osteoarthritis benchmark  0.16    0.17    .41       0.24    0.23    .52
Guideline item QA (c)     0.26    0.28    .03       0.38    0.38    .80
Management option QA      0.22    0.20    .004      0.30    0.27    .002
Treatment strategy QA     0.02    0.03    <.001     0.07    0.07    .79
Real-case QA              0.03    0.01    <.001     0.01    0.01    .20

(a) Input-output prompt technique.

(b) Zero-shot chain-of-thought prompt technique.

(c) QA: question-answer.
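To make the direction of each IO-versus-COT difference explicit, the following minimal Python sketch transcribes the accuracies reported in Table 2 and computes COT accuracy minus IO accuracy for each benchmark subset and model. The dictionary layout and the function name cot_minus_io are illustrative assumptions, not part of the study, and the sketch does not reproduce the significance tests, whose method is not stated in this table.

```python
# Accuracies transcribed from Table 2 as (IO, COT) pairs per model.
# IO = input-output prompting; COT = zero-shot chain-of-thought prompting.
# P values are deliberately omitted; this only computes raw differences.
TABLE2 = {
    "Osteoarthritis benchmark": {"GPT-3.5": (0.16, 0.17), "GPT-4": (0.24, 0.23)},
    "Guideline item QA": {"GPT-3.5": (0.26, 0.28), "GPT-4": (0.38, 0.38)},
    "Management option QA": {"GPT-3.5": (0.22, 0.20), "GPT-4": (0.30, 0.27)},
    "Treatment strategy QA": {"GPT-3.5": (0.02, 0.03), "GPT-4": (0.07, 0.07)},
    "Real-case QA": {"GPT-3.5": (0.03, 0.01), "GPT-4": (0.01, 0.01)},
}


def cot_minus_io(table):
    """Return {(subset, model): COT accuracy minus IO accuracy}.

    Results are rounded to 2 decimals to match the table's precision
    and to hide floating-point noise from the subtraction.
    """
    return {
        (subset, model): round(cot - io, 2)
        for subset, models in table.items()
        for model, (io, cot) in models.items()
    }


if __name__ == "__main__":
    # Print one signed difference per subset and model.
    for (subset, model), diff in cot_minus_io(TABLE2).items():
        print(f"{subset:<26} {model:<8} {diff:+.2f}")
```

For example, the output shows that COT scored lower than IO on management option QA for both models (-0.02 for GPT-3.5 and -0.03 for GPT-4), while the differences on the other subsets are at most 0.02 in either direction.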