2024 Aug 2;7(8):e2425373. doi: 10.1001/jamanetworkopen.2024.25373

Table 3. Chatbots as an Abstract Generator: Comparison of Grades, Subgroup Analysis: Chatbot 1 vs Chatbot 2 [a]

Grading scale   Grade by surgeon reviewer, median (IQR)   P value [b]
                Chatbot 1            Chatbot 2
10-Point scale  7.0 (6.0-8.0)        7.0 (6.0-8.0)        .41
20-Point scale  14.0 (12.0-16.0)     14.0 (13.0-16.0)     .41
Rank            3.0 (2.0-4.0)        2.0 (1.0-3.0)        .02
[a] Abstracts were generated by chatbot 1 (Chat Generative Pretrained Transformer [GPT] version 3.5) or chatbot 2 (Chat-GPT version 4.0) and graded by 5 surgeon-reviewers.

[b] Statistical significance was P < .05.