2024 Aug 14;10:20552076241269538. doi: 10.1177/20552076241269538

Table 2. Overall and subgroup analysis of ChatGPT-4 on urological cancer treatment recommendations.

Columns are grouped by cancer type (prostate, kidney, bladder) and by disease status (localized, systemic, recurrent). Values are mean (SD) unless otherwise noted.

| ChatGPT-4 | All | Prostate Ca. | Kidney Ca. | Bladder Ca. | p-value (cancer type) | Localized | Systemic | Recurrent | p-value (disease status) |
|---|---|---|---|---|---|---|---|---|---|
| Query prompts (n) | 108 | 36 | 34 | 38 | | 56 | 40 | 6 | |
| ChatGPT total RECsᵃ | 6.0 (1.92) | 6.4 (2.10) | 5.8 (1.64) | 5.8 (1.97) | 0.336 | 5.7 (1.55) | 5.6 (2.04) | 7.8 (1.85) | 0.017* |
| Rater-approved ChatGPT REC ratio (%)ᵇ | 88.5 (14.8) | 90.1 (12.8) | 83.4 (19.0) | 91.4 (11.0) | 0.048* | 85.7 (16.0) | 93.3 (11.1) | 73.8 (16.6) | 0.002* |
| NCCN-aligned ChatGPT REC ratio (%)ᵇ | 86.7 (16.1) | 85.0 (16.1) | 82.9 (19.2) | 91.7 (11.8) | 0.050 | 83.7 (17.8) | 91.4 (12.0) | 73.2 (17.9) | 0.009* |
| Rater-disagreed ChatGPT REC ratio (%)ᵇ | 9.5 (13.7) | 7.9 (12.4) | 13.9 (18.1) | 7.1 (9.0) | 0.072 | 12.4 (15.4) | 4.3 (7.8) | 24.5 (17.1) | <0.001** |
| NCCN total RECs | 6.0 (2.18) | 5.9 (1.11) | 6.1 (2.89) | 6.0 (2.29) | 0.961 | 5.7 (1.89) | 6.1 (2.40) | 7.6 (2.33) | 0.131 |
| ChatGPT REC/NCCN REC ratio (%)ᶜ | 100.0 (40.5) | 102.7 (35.5) | 103.4 (56.7) | 95.2 (28.5) | 0.728 | 108.5 (45.1) | 86.6 (30.3) | 113.8 (41.3) | 0.057 |
| NCCN-aligned ChatGPT REC/NCCN REC ratio (%)ᶜ | 81.0 (20.6) | 80.8 (20.8) | 78.0 (20.1) | 83.6 (21.3) | 0.638 | 84.6 (20.0) | 77.3 (21.8) | 77.3 (17.2) | 0.314 |
| Correctness (range 1–5) | 4.5 (0.65) | 4.4 (0.69) | 4.4 (0.75) | 4.6 (0.49) | 0.215 | 4.4 (0.67) | 4.7 (0.51) | 3.7 (0.79) | <0.001** |
| Comprehensiveness (range 1–5) | 4.4 (0.70) | 4.6 (0.69) | 4.3 (0.68) | 4.5 (0.73) | 0.388 | 4.5 (0.72) | 4.4 (0.67) | 4.1 (0.65) | 0.207 |
| Specificity (range 1–5) | 4.0 (0.71) | 4.0 (0.67) | 3.8 (0.71) | 4.1 (0.75) | 0.278 | 3.9 (0.59) | 4.0 (0.80) | 3.7 (0.71) | 0.545 |
| Appropriateness (range 1–5) | 4.4 (0.70) | 4.3 (0.73) | 4.3 (0.75) | 4.5 (0.62) | 0.432 | 4.3 (0.75) | 4.5 (0.52) | 3.6 (0.88) | 0.006* |
ᵃ RECs: recommendations.
ᵇ Percentage calculated with ChatGPT total RECs as the denominator.
ᶜ Percentage calculated with NCCN total RECs as the denominator.
*Significant at p < 0.05.
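Footnotes b and c define each percentage by its denominator. The sketch below illustrates that arithmetic under an assumption: each ratio is computed per query prompt and then summarized as mean (SD) across prompts, as the table's mean (SD) format suggests. The per-prompt counts, the field names, the helper functions, and the use of the sample standard deviation are all hypothetical; they are not the study's data or code.

```python
from statistics import mean, stdev

# Hypothetical per-prompt counts, for illustration only (NOT study data).
# Field names are assumptions chosen to mirror the table's row labels.
prompts = [
    {"chatgpt_total": 6, "rater_approved": 5, "rater_disagreed": 1, "nccn_aligned": 5, "nccn_total": 5},
    {"chatgpt_total": 4, "rater_approved": 4, "rater_disagreed": 0, "nccn_aligned": 3, "nccn_total": 6},
    {"chatgpt_total": 8, "rater_approved": 6, "rater_disagreed": 2, "nccn_aligned": 6, "nccn_total": 8},
]


def pct(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


def per_prompt_ratios(p):
    """Compute the table's ratio rows for one query prompt."""
    return {
        # Footnote b: ChatGPT total RECs as the denominator.
        # Approved and disagreed shares need not sum to 100%.
        "rater_approved_pct": pct(p["rater_approved"], p["chatgpt_total"]),
        "nccn_aligned_pct": pct(p["nccn_aligned"], p["chatgpt_total"]),
        "rater_disagreed_pct": pct(p["rater_disagreed"], p["chatgpt_total"]),
        # Footnote c: NCCN total RECs as the denominator.
        "chatgpt_over_nccn_pct": pct(p["chatgpt_total"], p["nccn_total"]),
        "aligned_over_nccn_pct": pct(p["nccn_aligned"], p["nccn_total"]),
    }


def summarize(values):
    """Format values as 'mean (SD)', using the sample standard deviation (assumed)."""
    return f"{mean(values):.1f} ({stdev(values):.1f})"


def summarize_prompts(prompts):
    """Average each per-prompt ratio across prompts (mean of ratios)."""
    ratios = [per_prompt_ratios(p) for p in prompts]
    return {key: summarize([r[key] for r in ratios]) for key in ratios[0]}


def ratio_of_means(prompts, num_key, den_key):
    """Alternative summary for comparison: ratio of the mean counts."""
    return pct(mean(p[num_key] for p in prompts), mean(p[den_key] for p in prompts))


if __name__ == "__main__":
    for row, value in summarize_prompts(prompts).items():
        print(f"{row}: {value}")
    print("ratio of means:", round(ratio_of_means(prompts, "chatgpt_total", "nccn_total"), 1))
```

With these toy counts, the mean of the per-prompt ChatGPT/NCCN ratios (about 95.6%) differs from the ratio of the mean counts (about 94.7%). The table's kidney cancer column appears to show the same effect: mean totals of 5.8 and 6.1 would give roughly 95%, whereas the reported ChatGPT REC/NCCN REC ratio is 103.4%. That gap is consistent with per-prompt averaging.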