Table 2.
Blinded Ratings of Hierarchies
| Rating | ChatGPT-Generated (N = 72), M (SD) | ChatGPT-Generated (N = 72), Range | Human-Generated (N = 18), M (SD) | Human-Generated (N = 18), Range | Test |
|---|---|---|---|---|---|
| Appropriateness | 4.47 (0.58) | 2.33 – 5.0 | 4.94 (0.13) | 4.67 – 5.0 | t(87.56) = 6.43, p < .001, d = 1.14 |
| Specificity | 4.17 (0.65) | 2.33 – 5.0 | 4.78 (0.34) | 4.0 – 5.0 | t(50.85) = 5.51, p < .001, d = 1.18 |
| Variability | 3.96 (0.79) | 1.67 – 5.0 | 4.63 (0.41) | 3.33 – 5.0 | t(52.08) = 5.01, p < .001, d = 1.07 |
| Safety | 4.89 (0.24) | 3.67 – 5.0 | 4.81 (0.29) | 4.0 – 5.0 | t(23.53) = 1.08, p = .29, d = 0.30 |
| Overall | 3.99 (0.82) | 2.0 – 5.0 | 4.70 (0.39) | 3.67 – 5.0 | t(57.80) = 5.34, p < .001, d = 1.11 |
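
The fractional degrees of freedom in the Test column suggest an unequal-variance (Welch) t-test, although the table itself does not name the test. Under that assumption, the following minimal sketch shows how a t statistic and its Welch-Satterthwaite degrees of freedom can be approximated from the summary statistics alone. The function name `welch_from_summary` is hypothetical, and because the inputs are the rounded means and SDs from the Appropriateness row, the output will be close to, but not identical to, the reported values.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Approximate Welch's t and Welch-Satterthwaite df from summary statistics.

    Assumption: the table's tests are Welch t-tests (inferred from the
    fractional df, not stated in the source).
    """
    # Squared standard error contributed by each group
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    # Test statistic: mean difference over the combined standard error
    t = (m2 - m1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Appropriateness row: ChatGPT-generated (N = 72) vs. human-generated (N = 18)
t, df = welch_from_summary(4.47, 0.58, 72, 4.94, 0.13, 18)
print(f"t({df:.2f}) = {t:.2f}")
```

Working from summary statistics keeps the check independent of the raw ratings, at the cost of rounding error in the inputs.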