. 2024 Aug 14;10:20552076241269538. doi: 10.1177/20552076241269538

Table 3.

Comparison of ChatGPT-4's performance using prompts with and without the specification “according to NCCN.”

| ChatGPT-4 measure | All, mean (SD) | Non-specific prompt^b, mean (SD) | NCCN-specified prompt^c, mean (SD) | p-value |
|---|---|---|---|---|
| Query prompts (n) | 108 | 54 | 54 | |
| ChatGPT total RECs^a | 6.0 (1.92) | 6.9 (1.67) | 5.0 (1.68) | <0.001** |
| Rater-approved ChatGPT REC ratio, %^d | 88.5 (14.8) | 85.8 (15.9) | 91.2 (13.3) | 0.011* |
| NCCN-aligned ChatGPT REC ratio, %^d | 86.7 (16.1) | 83.5 (16.7) | 89.9 (15.0) | 0.006** |
| Rater-disagreed ChatGPT REC ratio, %^d | 9.5 (13.7) | 11.7 (14.8) | 7.4 (12.3) | 0.020* |
| NCCN total RECs | 6.0 (2.18) | 6.0 (2.18) | 6.0 (2.18) | |
| ChatGPT REC/NCCN REC ratio, %^e | 100.0 (40.5) | 116.1 (39.6) | 84.1 (35.0) | <0.001** |
| NCCN-aligned ChatGPT REC/NCCN REC ratio, %^e | 81.0 (20.6) | 89.7 (15.5) | 72.4 (21.6) | <0.001** |
| Correctness (range 1–5) | 4.5 (0.65) | 4.4 (0.69) | 4.6 (0.59) | 0.017* |
| Comprehensiveness (range 1–5) | 4.4 (0.70) | 4.7 (0.41) | 4.2 (0.81) | <0.001** |
| Specificity (range 1–5) | 4.0 (0.71) | 4.2 (0.57) | 3.7 (0.73) | <0.001** |
| Appropriateness (range 1–5) | 4.4 (0.70) | 4.4 (0.72) | 4.3 (0.68) | 0.640 |

^a RECs: recommendations.
^b Prompt without “according to NCCN”.
^c Prompt with “according to NCCN”.
^d Percentage calculated with ChatGPT total RECs as the denominator.
^e Percentage calculated with NCCN total RECs as the denominator.

* Significant at p < 0.05; ** significant at p < 0.01.
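To make the two denominators in footnotes d and e concrete, here is a minimal Python sketch. It assumes each percentage is computed per query as the count named in the row label divided by the footnoted denominator, times 100, and that these per-query values are then averaged across queries. The `QueryCounts` class, its field names, and the example counts are hypothetical illustrations and do not come from the study. The `significance_marker` helper simply applies the thresholds in the significance legend.

```python
from dataclasses import dataclass


@dataclass
class QueryCounts:
    """Hypothetical per-query recommendation (REC) counts; names are illustrative only."""
    chatgpt_total: int    # all RECs in one ChatGPT response
    rater_approved: int   # ChatGPT RECs approved by raters
    rater_disagreed: int  # ChatGPT RECs raters disagreed with
    nccn_aligned: int     # ChatGPT RECs matching an NCCN REC
    nccn_total: int       # RECs in the NCCN guideline for this query


def ratios(q: QueryCounts) -> dict:
    """Per-query percentages, assuming the definitions in footnotes d and e."""
    return {
        # Footnote d: ChatGPT total RECs as the denominator
        "rater_approved_pct": 100 * q.rater_approved / q.chatgpt_total,
        "nccn_aligned_pct": 100 * q.nccn_aligned / q.chatgpt_total,
        "rater_disagreed_pct": 100 * q.rater_disagreed / q.chatgpt_total,
        # Footnote e: NCCN total RECs as the denominator
        "chatgpt_vs_nccn_pct": 100 * q.chatgpt_total / q.nccn_total,
        "aligned_vs_nccn_pct": 100 * q.nccn_aligned / q.nccn_total,
    }


def significance_marker(p: float) -> str:
    """Return the table's marker: '**' if p < 0.01, '*' if p < 0.05, else ''."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # Hypothetical query: ChatGPT lists 8 RECs, the guideline lists 6
    example = QueryCounts(chatgpt_total=8, rater_approved=7,
                          rater_disagreed=1, nccn_aligned=6, nccn_total=6)
    for name, value in ratios(example).items():
        print(f"{name}: {value:.1f}")
    print(significance_marker(0.020))
```

Because the ChatGPT/NCCN ratio uses the NCCN count as its denominator, it exceeds 100% whenever a response lists more recommendations than the guideline does. This is consistent with the 116.1% mean for non-specific prompts.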