Table 2.
Accuracy of different prompt techniques against the osteoarthritis benchmark.
| | GPT-3.5 IO^a | GPT-3.5 COT^b | GPT-3.5 P value | GPT-4 IO | GPT-4 COT | GPT-4 P value |
| --- | --- | --- | --- | --- | --- | --- |
| Osteoarthritis benchmark | 0.16 | 0.17 | .41 | 0.24 | 0.23 | .52 |
| Guideline item QA^c | 0.26 | 0.28 | .03 | 0.38 | 0.38 | .80 |
| Management option QA | 0.22 | 0.20 | .004 | 0.30 | 0.27 | .002 |
| Treatment strategy QA | 0.02 | 0.03 | <.001 | 0.07 | 0.07 | .79 |
| Real-case QA | 0.03 | 0.01 | <.001 | 0.01 | 0.01 | .20 |
^a IO: input-output prompt technique.
^b COT: zero-shot chain of thought prompt technique.
^c QA: question-answer.
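
To make the two prompt techniques compared in this table concrete, the following is a minimal sketch of how an input-output (IO) prompt and a zero-shot chain of thought (COT) prompt might differ. The function names, the prompt template, the trigger phrase "Let's think step by step.", and the sample question are all assumptions for illustration; the exact prompts used in the study are not given in this table.

```python
# Hypothetical prompt builders for illustration only; the study's actual
# prompt wording is not stated in this table.

# Assumed trigger phrase, commonly used for zero-shot chain of thought prompting.
COT_TRIGGER = "Let's think step by step."


def build_io_prompt(question: str) -> str:
    """Input-output (IO): pose the question and ask for the answer directly."""
    return f"Question: {question}\nAnswer:"


def build_cot_prompt(question: str) -> str:
    """Zero-shot chain of thought (COT): same question plus a reasoning trigger."""
    return f"Question: {question}\nAnswer: {COT_TRIGGER}"


if __name__ == "__main__":
    # Illustrative question, not taken from the benchmark.
    sample = "Which management options are appropriate for knee osteoarthritis?"
    print(build_io_prompt(sample))
    print()
    print(build_cot_prompt(sample))
```

Under these assumptions the only difference between the two conditions is the appended trigger phrase, so any accuracy gap between the IO and COT columns would reflect the effect of eliciting intermediate reasoning rather than a change in the question itself.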