Table 4.
Rater measurement.
| Rater | Total | Observed Average | FAM | Measure (logits) | SE | Infit MNSQ | Infit ZSTD | Outfit MNSQ | Outfit ZSTD | PTMAC |
|---|---|---|---|---|---|---|---|---|---|---|
| Rater 1 | 1270 | 2.14 | 2.40 | 1.62 | 0.08 | 0.91 | −1.7 | 0.91 | −1.6 | 0.48 |
| Rater 2 | 758 | 2.63 | 2.59 | 2.31 | 0.13 | 1.37 | 4.3 | 1.33 | 3.3 | 0.32 |
| Rater 3 | 1020 | 1.72 | 2.15 | 0.62 | 0.08 | 0.86 | −2.5 | 0.86 | −2.5 | 0.53 |
| Rater 4 | 1562 | 2.63 | 2.65 | 2.54 | 0.09 | 1.12 | 2.2 | 1.14 | 2.0 | 0.44 |
| Rater 5 | 1273 | 2.28 | 2.24 | 1.01 | 0.08 | 0.87 | −2.4 | 0.87 | −2.5 | 0.48 |
| Rater 6 | 817 | 2.67 | 2.66 | 2.61 | 0.13 | 1.22 | 2.9 | 1.20 | 2.1 | 0.43 |
SE = standard error, FAM = fair average measure, PTMAC = point-measure correlation.
No substantial differences were noted between the observed average score and the FAM, indicating high rater assessment precision. Table 1 indicates that the rater strata and reliability were 10.19 and 0.98, respectively. The chi-square statistic was also significant (χ² = 382.6; df = 5; p < .001), indicating substantial heterogeneity among raters. Thus, the raters differed in severity, and these differences influenced the ratings of student performance.
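The relationship between these summary statistics and the rater measures can be illustrated with a minimal sketch. It assumes the conventional many-facet Rasch definitions, which the text does not state explicitly: a fixed-effect ("all same") chi-square computed from information-weighted measures with df equal to the number of raters minus one, a separation index G with reliability G² / (1 + G²), and strata (4G + 1) / 3. All function names are hypothetical. Because the measures and standard errors in Table 4 are rounded to two decimals, the recomputed chi-square falls in the same range as the reported 382.6 but does not match it exactly.

```python
# Rater severity measures (logits) and standard errors, copied from Table 4.
measures = [1.62, 2.31, 0.62, 2.54, 1.01, 2.61]
std_errors = [0.08, 0.13, 0.08, 0.09, 0.08, 0.13]


def fixed_chi_square(measures, std_errors):
    """Fixed-effect chi-square for the hypothesis that all raters share
    one severity (assumed formula: information-weighted sum of squares)."""
    weights = [1.0 / se ** 2 for se in std_errors]  # information = 1 / SE^2
    sum_w = sum(weights)
    sum_wd = sum(w * d for w, d in zip(weights, measures))
    sum_wd2 = sum(w * d ** 2 for w, d in zip(weights, measures))
    chi2 = sum_wd2 - sum_wd ** 2 / sum_w
    df = len(measures) - 1
    return chi2, df


def separation_from_strata(strata):
    """Invert strata = (4G + 1) / 3 to recover the separation index G."""
    return (3.0 * strata - 1.0) / 4.0


def reliability_from_separation(separation):
    """Separation reliability, assumed to be G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


chi2, df = fixed_chi_square(measures, std_errors)
print(f"chi-square from rounded table values: {chi2:.1f} (df = {df})")

# Consistency check between the reported strata and reliability.
g = separation_from_strata(10.19)
print(f"separation G = {g:.2f}, implied reliability = "
      f"{reliability_from_separation(g):.2f}")
```

Under these assumptions, the reported strata of 10.19 imply a reliability that rounds to the reported 0.98, so the two figures are mutually consistent.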