. 2024 Aug 14;10:20552076241269538. doi: 10.1177/20552076241269538

Table 2.

Overall and subgroup analysis of ChatGPT-4 on urological cancer treatment recommendations.

ChatGPT-4				Cancer type							Disease status
		All		Prostate cancer		Kidney cancer		Bladder cancer			Localized		Systemic		Recurrent
		Mean	(SD)	Mean	(SD)	Mean	(SD)	Mean	(SD)	p-value	Mean	(SD)	Mean	(SD)	Mean	(SD)	p-value
Query prompts (n)		108		36		34		38			56		40		6
ChatGPT total RECs^a		6.0	(1.92)	6.4	(2.10)	5.8	(1.64)	5.8	(1.97)	0.336	5.7	(1.55)	5.6	(2.04)	7.8	(1.85)	0.017*
Rater-approved ChatGPT REC ratio (%)^b		88.5	(14.8)	90.1	(12.8)	83.4	(19.0)	91.4	(11.0)	0.048*	85.7	(16.0)	93.3	(11.1)	73.8	(16.6)	0.002*
NCCN-aligned ChatGPT REC ratio (%)^b		86.7	(16.1)	85.0	(16.1)	82.9	(19.2)	91.7	(11.8)	0.050	83.7	(17.8)	91.4	(12.0)	73.2	(17.9)	0.009*
Rater-disagreed ChatGPT REC ratio (%)^b		9.5	(13.7)	7.9	(12.4)	13.9	(18.1)	7.1	(9.0)	0.072	12.4	(15.4)	4.3	(7.8)	24.5	(17.1)	<0.001**
NCCN total RECs		6.0	(2.18)	5.9	(1.11)	6.1	(2.89)	6.0	(2.29)	0.961	5.7	(1.89)	6.1	(2.40)	7.6	(2.33)	0.131
ChatGPT REC/NCCN REC ratio (%)^c		100.0	(40.5)	102.7	(35.5)	103.4	(56.7)	95.2	(28.5)	0.728	108.5	(45.1)	86.6	(30.3)	113.8	(41.3)	0.057
NCCN-aligned ChatGPT REC/NCCN REC ratio (%)^c		81.0	(20.6)	80.8	(20.8)	78.0	(20.1)	83.6	(21.3)	0.638	84.6	(20.0)	77.3	(21.8)	77.3	(17.2)	0.314
Correctness	(range 1–5)	4.5	(0.65)	4.4	(0.69)	4.4	(0.75)	4.6	(0.49)	0.215	4.4	(0.67)	4.7	(0.51)	3.7	(0.79)	<0.001**
Comprehensiveness	(range 1–5)	4.4	(0.70)	4.6	(0.69)	4.3	(0.68)	4.5	(0.73)	0.388	4.5	(0.72)	4.4	(0.67)	4.1	(0.65)	0.207
Specificity	(range 1–5)	4.0	(0.71)	4.0	(0.67)	3.8	(0.71)	4.1	(0.75)	0.278	3.9	(0.59)	4.0	(0.80)	3.7	(0.71)	0.545
Appropriateness	(range 1–5)	4.4	(0.70)	4.3	(0.73)	4.3	(0.75)	4.5	(0.62)	0.432	4.3	(0.75)	4.5	(0.52)	3.6	(0.88)	0.006*

^a RECs: recommendations.

^b ChatGPT total RECs as the denominator.

^c NCCN total RECs as the denominator.

*Significant at p < 0.05.
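
The two denominators named in footnotes b and c can be illustrated with a short calculation. The following is a minimal sketch, not the authors' analysis code: the `PromptCounts` class, the `ratios` function, and the example counts are all hypothetical, and it assumes each ratio is computed per query prompt before the means and SDs in the table are taken.

```python
from dataclasses import dataclass


@dataclass
class PromptCounts:
    """Recommendation (REC) counts for one query prompt (hypothetical values)."""
    chatgpt_total: int    # RECs produced by ChatGPT
    rater_approved: int   # ChatGPT RECs approved by the raters
    nccn_aligned: int     # ChatGPT RECs that match an NCCN REC
    nccn_total: int       # RECs listed by NCCN for the same scenario


def ratios(p: PromptCounts) -> dict:
    """Return the percentage ratios for one prompt."""
    return {
        # Footnote b: ChatGPT total RECs as the denominator.
        "rater_approved_pct": 100 * p.rater_approved / p.chatgpt_total,
        "nccn_aligned_pct": 100 * p.nccn_aligned / p.chatgpt_total,
        # Footnote c: NCCN total RECs as the denominator.
        "chatgpt_to_nccn_pct": 100 * p.chatgpt_total / p.nccn_total,
        "nccn_aligned_to_nccn_pct": 100 * p.nccn_aligned / p.nccn_total,
    }


# Hypothetical prompt: ChatGPT gives 6 RECs, 5 approved, 5 NCCN-aligned; NCCN lists 8.
example = PromptCounts(chatgpt_total=6, rater_approved=5, nccn_aligned=5, nccn_total=8)
result = ratios(example)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Because the same NCCN-aligned count is divided by two different totals, the footnote b ratio describes how much of ChatGPT's output is guideline-concordant, while the footnote c ratio describes how much of the guideline ChatGPT covers.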