Author manuscript; available in PMC: 2026 Feb 18.
Published in final edited form as: Behav Ther. 2025 Mar 10;56(4):680–688. doi: 10.1016/j.beth.2025.02.005

Table 2.

Blinded Ratings of Hierarchies

| Rating | ChatGPT-Generated (N = 72), M (SD) | ChatGPT-Generated, Range | Human-Generated (N = 18), M (SD) | Human-Generated, Range | Test |
|---|---|---|---|---|---|
| Appropriateness | 4.47 (0.58) | 2.33 – 5.0 | 4.94 (0.13) | 4.67 – 5.0 | t(87.56) = 6.43, p < .001, d = 1.14 |
| Specificity | 4.17 (0.65) | 2.33 – 5.0 | 4.78 (0.34) | 4.0 – 5.0 | t(50.85) = 5.51, p < .001, d = 1.18 |
| Variability | 3.96 (0.79) | 1.67 – 5.0 | 4.63 (0.41) | 3.33 – 5.0 | t(52.08) = 5.01, p < .001, d = 1.07 |
| Safety | 4.89 (0.24) | 3.67 – 5.0 | 4.81 (0.29) | 4.0 – 5.0 | t(23.53) = 1.08, p = .29, d = .30 |
| Overall | 3.99 (0.82) | 2.0 – 5.0 | 4.70 (0.39) | 3.67 – 5.0 | t(57.80) = 5.34, p < .001, d = 1.11 |
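
The fractional degrees of freedom in the Test column are consistent with an unequal-variance (Welch) t-test, although the table itself does not name the test. As a rough check under that assumption, the sketch below recomputes t and the Welch-Satterthwaite degrees of freedom for the Appropriateness row from the rounded means, SDs, and group sizes shown in the table. The function name `welch_from_summary` is hypothetical, and because the inputs are rounded to two decimals, the results should land near the reported values rather than match them exactly.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t statistic and degrees of freedom from summary statistics.

    Assumes the reported test is Welch's unequal-variance t-test; the
    source table does not state this explicitly.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    se = math.sqrt(v1 + v2)
    t = (m2 - m1) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Appropriateness row: ChatGPT-generated (N = 72) vs. human-generated (N = 18)
t, df = welch_from_summary(4.47, 0.58, 72, 4.94, 0.13, 18)
print(f"t({df:.2f}) = {t:.2f}")
```

The same call can be repeated for the other rows by substituting their means and SDs. The effect sizes (d) are not recomputed here because the table does not say which standardizer was used.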