Table 5. Blinded assessment of the readability of ChatGPT-generated vs original abstracts.
| Abstract type | Number of abstracts selected as most readable, N (%) |
|---|---|
| Original | 14 (31.11%) |
| GPT 3.5-generated | 28 (62.22%) |
| GPT 4-generated | 3 (6.67%) |
Significance tests:
- Original vs GPT 3.5-generated: P = 0.003
- Original vs GPT 4-generated: P = 0.003
- GPT 3.5-generated vs GPT 4-generated: P < 0.001
Readability data were available for only 45 of the 62 abstracts initially selected for this study, because some evaluations in the assessment process were incomplete.
Pearson’s χ² test was used to compare performance between abstract subgroups.
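As a minimal sketch of how pairwise Pearson χ² comparisons on these selection counts could be run, the code below assumes each comparison is framed as a 2×2 table of selected vs not selected out of the 45 evaluated abstracts; the paper does not specify the exact contingency structure used, so the resulting P values may not match those reported above.

```python
# Illustrative sketch only: assumes pairwise 2x2 tables of
# "selected as most readable" vs "not selected" out of 45 abstracts.
from scipy.stats import chi2_contingency

TOTAL = 45  # abstracts with complete readability evaluations
counts = {"Original": 14, "GPT 3.5": 28, "GPT 4": 3}

def pairwise_chi2(group_a, group_b):
    """Pearson's chi-squared test on a 2x2 selected/not-selected table."""
    table = [
        [counts[group_a], TOTAL - counts[group_a]],
        [counts[group_b], TOTAL - counts[group_b]],
    ]
    stat, p, dof, _ = chi2_contingency(table, correction=False)
    return stat, p

for a, b in [("Original", "GPT 3.5"), ("Original", "GPT 4"), ("GPT 3.5", "GPT 4")]:
    stat, p = pairwise_chi2(a, b)
    print(f"{a} vs {b}: chi2 = {stat:.2f}, P = {p:.4f}")
```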