2025 Apr 21;14(6):1281–1295. doi: 10.1007/s40123-025-01142-x

Table 4.

Readability of the original online resources and performance of LLMs in improving their readability

Readability metrics	Original resources	ChatGPT-4o (01 preview)	p value (Orig. vs GPT-4o)	ChatGPT-3.5	p value (Orig. vs GPT-3.5)	Google Gemini	p value (Orig. vs Gemini)
Syllables	1457.2 (736.9)	330.4 (159.6)	< 0.001	984.5 (412.7)	0.008	808.4 (205.7)	< 0.001
Words	871.0 (392.7)	230.4 (107.9)	< 0.001	649.2 (256.2)	0.21	521.9 (122.4)	< 0.001
3+ syllable words	91.0 (60.7)	13.3 (10.4)	< 0.001	43.0 (23.1)	0.001	38.5 (12.6)	< 0.001
Sentences	41.5 (17.1)	18.4 (7.8)	0.04	41.0 (16.6)	0.9	31.1 (9.0)	0.02
SMOG Readability Score	10.3 (2.2)	5.3 (1.6)	< 0.001	7.6 (1.2)	< 0.001	7.8 (1.3)	< 0.001
Flesch-Kincaid Grade Level	9.7 (1.9)	5.8 (1.5)	< 0.001	7.7 (1.4)	< 0.001	7.5 (1.1)	< 0.001

LLM large language model, SMOG Simple Measure of Gobbledygook
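
The two grade-level scores in the table are formula-based: each is computed from the same kinds of counts listed in the upper rows (words, sentences, syllables, and words of three or more syllables). As a minimal sketch, the standard published forms of the Flesch-Kincaid Grade Level and SMOG formulas are shown below. The function names and the example counts are hypothetical, and the excerpt does not say which tool or formula variant the authors used, so this is an illustration of the arithmetic rather than a reproduction of their method.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (assumed variant)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables: int, sentences: int) -> float:
    """Standard SMOG formula; polysyllables = words with 3+ syllables.

    The 30 / sentences factor normalises the count to a 30-sentence sample.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for ONE document (not taken from the table).
counts = {"words": 200, "sentences": 20, "syllables": 300, "polysyllables": 12}

fkgl = flesch_kincaid_grade(counts["words"], counts["sentences"], counts["syllables"])
smog = smog_index(counts["polysyllables"], counts["sentences"])

print(f"Flesch-Kincaid Grade Level: {fkgl:.1f}")
print(f"SMOG: {smog:.1f}")
```

Both formulas are nonlinear in the counts (ratios and a square root), so plugging the table's mean counts into them will not in general return the table's mean scores. A group-level score has to be obtained by scoring each document separately and then averaging.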