2025 May 28;20:531. doi: 10.1186/s13018-025-05955-1

Table 3.

Post-Hoc Results for One-way ANOVA Readability Metrics Across ChatGPT, Gemini, and CoPilot

| Metric | Comparison | ChatGPT: Mean Difference (P-value) | Gemini: Mean Difference (P-value) | CoPilot: Mean Difference (P-value) |
|---|---|---|---|---|
| Word Count | Baseline vs. Step 1 | 1019 (**** < 0.0001) | 1071 (**** < 0.0001) | 1031 (**** < 0.0001) |
| | Baseline vs. Step 2 | 1110 (**** < 0.0001) | 1133 (**** < 0.0001) | 1102 (**** < 0.0001) |
| | Baseline vs. Step 3 | 1119 (**** < 0.0001) | 1114 (**** < 0.0001) | 1129 (**** < 0.0001) |
| Grade Level | Baseline vs. Step 1 | −1.595 (* 0.0140) | −1.626 (** 0.0089) | −1.195 (ns 0.0977) |
| | Baseline vs. Step 2 | 1.484 (* 0.0255) | −0.07895 (ns 0.9986) | 0.6842 (ns 0.5394) |
| | Baseline vs. Step 3 | 3.463 (**** < 0.001) | 1.942 (** 0.0012) | 2.347 (*** 0.0001) |
| Reading Ease | Baseline vs. Step 1 | 18.05 (**** < 0.0001) | 16.34 (**** < 0.0001) | 14.41 (**** < 0.0001) |
| | Baseline vs. Step 2 | −4.474 (ns 0.4819) | 5.889 (ns 0.2836) | 0.9579 (ns 0.9883) |
| | Baseline vs. Step 3 | −17.90 (**** < 0.0001) | −10.19 (* 0.0141) | −9.321 (* 0.0128) |
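
The asterisk labels next to each P-value in Table 3 are not defined in the table itself. The sketch below shows one way to reproduce them, assuming the widely used cutoffs of 0.05, 0.01, 0.001, and 0.0001; these thresholds and the helper name `significance_stars` are assumptions made for illustration, not something the table states. The sample P-values are taken from the Grade Level, Baseline vs. Step 1 row.

```python
def significance_stars(p: float) -> str:
    """Map a P-value to a star label.

    Thresholds are an assumption (a common statistical-software
    convention); Table 3 does not define them explicitly.
    """
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # not significant at the 0.05 level


# P-values from Table 3: Grade Level, Baseline vs. Step 1
grade_level_step1 = {"ChatGPT": 0.0140, "Gemini": 0.0089, "CoPilot": 0.0977}

for model, p_value in grade_level_step1.items():
    print(f"{model}: {p_value} -> {significance_stars(p_value)}")
```

Under these assumed cutoffs the three labels come out as `*`, `**`, and `ns`, which matches the labels printed in that row of the table.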