Heliyon. 2023 May 23;9(6):e16548. doi: 10.1016/j.heliyon.2023.e16548

Table 4.

Rater measurement.

Rater	Total	Observed Average	FAM	Measure (logits)	SE	Infit MNSQ	Infit ZSTD	Outfit MNSQ	Outfit ZSTD	PTMAC
Rater 1	1270	2.14	2.4	1.62	0.08	0.91	−1.7	0.91	−1.6	0.48
Rater 2	758	2.63	2.59	2.31	0.13	1.37	4.3	1.33	3.3	0.32
Rater 3	1020	1.72	2.15	0.62	0.08	0.86	−2.5	0.86	−2.5	0.53
Rater 4	1562	2.63	2.65	2.54	0.09	1.12	2.2	1.14	2	0.44
Rater 5	1273	2.28	2.24	1.01	0.08	0.87	−2.4	0.87	−2.5	0.48
Rater 6	817	2.67	2.66	2.61	0.13	1.22	2.9	1.2	2.1	0.43

SE = standard error, FAM = fair average measure, MNSQ = mean square, ZSTD = standardized fit statistic, PTMAC = point-measure correlation.

The observed average scores and the fair average measures (FAM) did not differ substantially, indicating high precision in the raters' assessments. Table 1 indicates that the rater strata and reliability were 10.19 and 0.98, respectively. The chi-square test was also significant (χ² = 382.6; df = 5; p < .001), indicating substantial heterogeneity among raters. This result shows that the raters differed in severity and that these differences influenced the ratings of student performance.
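
The reported strata and reliability can be cross-checked against each other. The sketch below assumes the standard Rasch relationships between separation reliability R, the separation index G, and the strata index H, namely G = sqrt(R / (1 − R)) and H = (4G + 1) / 3. The source does not state which formulas were used, and the function names are illustrative only.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Separation index G implied by a separation reliability R (assumed formula)."""
    return math.sqrt(reliability / (1.0 - reliability))


def strata_from_separation(separation: float) -> float:
    """Strata index H = (4G + 1) / 3 (assumed formula)."""
    return (4.0 * separation + 1.0) / 3.0


def reliability_from_strata(strata: float) -> float:
    """Invert the two relationships above: H -> G -> R."""
    separation = (3.0 * strata - 1.0) / 4.0
    return separation**2 / (1.0 + separation**2)


# Values reported in the text: strata = 10.19, reliability = 0.98.
g_from_rounded_r = separation_from_reliability(0.98)
h_from_rounded_r = strata_from_separation(g_from_rounded_r)
r_from_reported_h = reliability_from_strata(10.19)

print(f"G implied by R = 0.98: {g_from_rounded_r:.2f}")
print(f"H implied by R = 0.98: {h_from_rounded_r:.2f}")
print(f"R implied by H = 10.19: {r_from_reported_h:.3f}")
```

Under these assumptions, a strata value of 10.19 corresponds to a reliability that rounds to 0.98, so the two reported figures agree. Working forward from the rounded 0.98 gives a slightly smaller strata value, because the rounding discards precision.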