Table 3.
Post-Hoc Results for One-Way ANOVA of Readability Metrics Across ChatGPT, Gemini, and CoPilot
| Metric | Comparison | ChatGPT: Mean Difference (P-value) | Gemini: Mean Difference (P-value) | CoPilot: Mean Difference (P-value) |
|---|---|---|---|---|
| Word Count | Baseline vs. Step 1 | 1019 (**** < 0.0001) | 1071 (**** < 0.0001) | 1031 (**** < 0.0001) |
| | Baseline vs. Step 2 | 1110 (**** < 0.0001) | 1133 (**** < 0.0001) | 1102 (**** < 0.0001) |
| | Baseline vs. Step 3 | 1119 (**** < 0.0001) | 1114 (**** < 0.0001) | 1129 (**** < 0.0001) |
| Grade Level | Baseline vs. Step 1 | −1.595 (* 0.0140) | −1.626 (** 0.0089) | −1.195 (ns 0.0977) |
| | Baseline vs. Step 2 | 1.484 (* 0.0255) | −0.07895 (ns 0.9986) | 0.6842 (ns 0.5394) |
| | Baseline vs. Step 3 | 3.463 (**** < 0.001) | 1.942 (** 0.0012) | 2.347 (*** 0.0001) |
| Reading Ease | Baseline vs. Step 1 | 18.05 (**** < 0.0001) | 16.34 (**** < 0.0001) | 14.41 (**** < 0.0001) |
| | Baseline vs. Step 2 | −4.474 (ns 0.4819) | 5.889 (ns 0.2836) | 0.9579 (ns 0.9883) |
| | Baseline vs. Step 3 | −17.90 (**** < 0.0001) | −10.19 (* 0.0141) | −9.321 (* 0.0128) |