Table 4.
Readability of the original online resources and LLM performance in improving their readability
| Readability metrics | Original resources | ChatGPT-4o (01 preview) | p value (Orig. vs GPT-4o) | ChatGPT-3.5 | p value (Orig. vs GPT-3.5) | Google Gemini | p value (Orig. vs Gemini) |
|---|---|---|---|---|---|---|---|
| Syllables | 1457.2 (736.9) | 330.4 (159.6) | < 0.001 | 984.5 (412.7) | 0.008 | 808.4 (205.7) | < 0.001 |
| Words | 871.0 (392.7) | 230.4 (107.9) | < 0.001 | 649.2 (256.2) | 0.21 | 521.9 (122.4) | < 0.001 |
| 3+ syllable words | 91.0 (60.7) | 13.3 (10.4) | < 0.001 | 43.0 (23.1) | 0.001 | 38.5 (12.6) | < 0.001 |
| Sentences | 41.5 (17.1) | 18.4 (7.8) | 0.04 | 41.0 (16.6) | 0.9 | 31.1 (9.0) | 0.02 |
| SMOG Readability Score | 10.3 (2.2) | 5.3 (1.6) | < 0.001 | 7.6 (1.2) | < 0.001 | 7.8 (1.3) | < 0.001 |
| Flesch-Kincaid Grade Level | 9.7 (1.9) | 5.8 (1.5) | < 0.001 | 7.7 (1.4) | < 0.001 | 7.5 (1.1) | < 0.001 |
LLM: large language model; SMOG: Simple Measure of Gobbledygook
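
For readers unfamiliar with the two grade-level scores in the last rows of the table, the following is a minimal sketch of how they are conventionally computed from the count metrics in the rows above them. It uses the standard published formulas (Kincaid et al. for Flesch-Kincaid, McLaughlin for SMOG); the source does not state which tool or implementation produced the reported scores, so the function names and the sample counts here are illustrative assumptions. Note that plugging the table's mean counts into these formulas is not expected to reproduce the mean scores, since the scores were presumably computed per document and then averaged.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula for one document."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_grade(polysyllables: int, sentences: int) -> float:
    """Standard SMOG grade formula for one document.

    `polysyllables` is the number of words with three or more syllables;
    the count is normalized to a 30-sentence sample.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for a single document (not taken from the table).
    words, sentences, syllables, polysyllables = 600, 40, 900, 45
    print(f"FKGL: {flesch_kincaid_grade(words, sentences, syllables):.1f}")
    print(f"SMOG: {smog_grade(polysyllables, sentences):.1f}")
```

Both formulas reward shorter sentences and fewer long words, which is consistent with the pattern in the table: the outputs with the largest reductions in syllables and 3+ syllable words also show the lowest grade-level scores.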