2024 Aug 14;10:20552076241269538. doi: 10.1177/20552076241269538

Table 3.

Comparison of ChatGPT-4's performance using prompts with and without the specification “according to NCCN.”

                                                ChatGPT-4 prompt template
                                                All           Non-specific    NCCN-specified   p-value
                                                              promptᵇ         promptᶜ
                                                Mean (SD)     Mean (SD)       Mean (SD)
-------------------------------------------------------------------------------------------------------
Query prompts (n)                               108           54              54
ChatGPT total RECsᵃ                             6.0 (1.92)    6.9 (1.67)      5.0 (1.68)       <0.001**
  Rater-approved ChatGPT REC ratio (%)ᵈ         88.5 (14.8)   85.8 (15.9)     91.2 (13.3)      0.011*
  NCCN-aligned ChatGPT REC ratio (%)ᵈ           86.7 (16.1)   83.5 (16.7)     89.9 (15.0)      0.006**
  Rater-disagreed ChatGPT REC ratio (%)ᵈ        9.5 (13.7)    11.7 (14.8)     7.4 (12.3)       0.020*
NCCN total RECs                                 6.0 (2.18)    6.0 (2.18)      6.0 (2.18)
  ChatGPT REC/NCCN REC ratio (%)ᵉ               100.0 (40.5)  116.1 (39.6)    84.1 (35.0)      <0.001**
  NCCN-aligned ChatGPT REC/NCCN REC ratio (%)ᵉ  81.0 (20.6)   89.7 (15.5)     72.4 (21.6)      <0.001**
Correctness (range 1–5)                         4.5 (0.65)    4.4 (0.69)      4.6 (0.59)       0.017*
Comprehensiveness (range 1–5)                   4.4 (0.70)    4.7 (0.41)      4.2 (0.81)       <0.001**
Specificity (range 1–5)                         4.0 (0.71)    4.2 (0.57)      3.7 (0.73)       <0.001**
Appropriateness (range 1–5)                     4.4 (0.70)    4.4 (0.72)      4.3 (0.68)       0.640
ᵃ RECs: recommendations.
ᵇ Prompt without “according to NCCN”.
ᶜ Prompt with “according to NCCN”.
ᵈ Percentage computed with ChatGPT total RECs as the denominator.
ᵉ Percentage computed with NCCN total RECs as the denominator.

* Significant at p < 0.05; ** significant at p < 0.01.
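The ratio definitions in footnotes d and e and the significance marks in the legend reduce to simple arithmetic. The following Python sketch illustrates them for a single query prompt. It rests on stated assumptions. The counts in the example are hypothetical and are not study data. Each row label is read as the numerator of its ratio; for example, the NCCN-aligned ChatGPT REC/NCCN REC ratio is taken as the number of NCCN-aligned ChatGPT RECs divided by NCCN total RECs. That reading is an interpretation of the table, not a procedure stated in the source. The names `QueryCounts`, `pct`, `ratios` and `significance_mark` are hypothetical helpers written for this illustration.

```python
"""Minimal sketch of the Table 3 ratio definitions and significance marks.

ASSUMPTIONS: the example counts are hypothetical (not study data), and each
row label of the table is read as the numerator of its ratio.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueryCounts:
    """Hypothetical recommendation (REC) counts for one query prompt."""
    chatgpt_total: int    # RECs returned by ChatGPT
    rater_approved: int   # ChatGPT RECs approved by the raters
    nccn_aligned: int     # ChatGPT RECs that match the NCCN guideline
    rater_disagreed: int  # ChatGPT RECs the raters disagreed with
    nccn_total: int       # RECs listed in the NCCN guideline


def pct(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


def ratios(q: QueryCounts) -> dict[str, float]:
    """Compute the five per-prompt ratios reported in Table 3."""
    return {
        # Footnote d: ChatGPT total RECs as the denominator.
        "rater_approved": pct(q.rater_approved, q.chatgpt_total),
        "nccn_aligned": pct(q.nccn_aligned, q.chatgpt_total),
        "rater_disagreed": pct(q.rater_disagreed, q.chatgpt_total),
        # Footnote e: NCCN total RECs as the denominator.
        "chatgpt_over_nccn": pct(q.chatgpt_total, q.nccn_total),
        "aligned_over_nccn": pct(q.nccn_aligned, q.nccn_total),
    }


def significance_mark(p: float) -> str:
    """Legend: ** if p < 0.01, * if p < 0.05, no mark otherwise."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # Hypothetical prompt: ChatGPT lists 7 RECs and the guideline lists 6.
    example = QueryCounts(chatgpt_total=7, rater_approved=6, nccn_aligned=5,
                          rater_disagreed=1, nccn_total=6)
    for name, value in ratios(example).items():
        print(f"{name}: {value:.1f}%")
    for p in (0.001, 0.011, 0.020, 0.640):
        print(f"p = {p}: {significance_mark(p)!r}")
```

With these hypothetical counts the ratios come to about 85.7% (rater-approved), 71.4% (NCCN-aligned), 14.3% (rater-disagreed), 116.7% (ChatGPT/NCCN) and 83.3% (NCCN-aligned ChatGPT/NCCN). The sketch also shows why the ChatGPT REC/NCCN REC ratio can exceed 100%: this happens whenever ChatGPT returns more recommendations than the guideline lists, as in the non-specific prompt group (116.1%). The table itself reports each ratio as a mean (SD) across the query prompts in a group.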