Table 5. Blinded assessment of the readability of ChatGPT-generated vs original abstracts.
| Abstract type | Number of abstracts selected as most readable, N (%) |
|---|---|
| Original | 14 (31.11%) |
| GPT 3.5-generated | 28 (62.22%) |
| GPT 4-generated | 3 (6.67%) |
Significance tests:
- Original vs GPT 3.5-generated: P = 0.003
- Original vs GPT 4-generated: P = 0.003
- GPT 3.5-generated vs GPT 4-generated: P < 0.001
Readability data were available for only 45 of the 62 abstracts initially selected for this study, because some evaluations in the assessment process were incomplete.
Pearson’s χ² test was used to compare performance between abstract subgroups.
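As a minimal sketch of how pairwise Pearson χ² comparisons on these selection counts could be run, the code below assumes each comparison is framed as a 2×2 table of selected vs not selected out of the 45 evaluated abstracts; the paper does not specify the exact contingency structure used, so the resulting P values may not match those reported above.

```python
# Illustrative sketch only: assumes pairwise 2x2 tables of
# "selected as most readable" vs "not selected" out of 45 abstracts.
from scipy.stats import chi2_contingency

TOTAL = 45  # abstracts with complete readability evaluations
counts = {"Original": 14, "GPT 3.5": 28, "GPT 4": 3}

def pairwise_chi2(group_a, group_b):
    """Pearson's chi-squared test on a 2x2 selected/not-selected table."""
    table = [
        [counts[group_a], TOTAL - counts[group_a]],
        [counts[group_b], TOTAL - counts[group_b]],
    ]
    stat, p, dof, _ = chi2_contingency(table, correction=False)
    return stat, p

for a, b in [("Original", "GPT 3.5"), ("Original", "GPT 4"), ("GPT 3.5", "GPT 4")]:
    stat, p = pairwise_chi2(a, b)
    print(f"{a} vs {b}: chi2 = {stat:.2f}, P = {p:.4f}")
```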