Table 3. Chatbots as an Abstract Generator: Comparison of Grades, Subgroup Analysis of Chatbot 1 vs Chatbot 2ᵃ
Grading scale | Surgeon-reviewer grade, chatbot 1, median (IQR) | Surgeon-reviewer grade, chatbot 2, median (IQR) | P valueᵇ |
---|---|---|---|
10-Point scale | 7.0 (6.0-8.0) | 7.0 (6.0-8.0) | .41 |
20-Point scale | 14.0 (12.0-16.0) | 14.0 (13.0-16.0) | .41 |
Rank | 3.0 (2.0-4.0) | 2.0 (1.0-3.0) | .02 |
ᵃ Abstracts were generated by chatbot 1 (Chat Generative Pretrained Transformer [GPT] version 3.5) or chatbot 2 (Chat-GPT version 4.0) and graded by 5 surgeon-reviewers.
ᵇ Statistical significance was defined as P < .05.