2024 Sep 13;35(3):1178–1217. doi: 10.1007/s40593-024-00426-w

Table 3.

Quadratic Weighted Kappa Across ASAP Essay Traits

	Content	Organization	Word choice	Sentence fluency	Conventions
ASAP 1
N-Gram reg.	0.536	0.511	0.515	0.491	0.481
Feature reg.	0.678	0.635	0.672	0.636	0.623
Feature DNN	0.693	0.657	0.690	0.645	0.639
DistilBERT	0.713	0.666	0.677	0.675	0.666
Hybrid	0.743	0.672	0.673	0.681	0.648
M. & B. (2018)¹	0.67	0.60	0.64	0.62	0.61
M. & B. (2020)²	0.703	0.664	0.675	0.648	0.638
ASAP 2
N-Gram reg.	0.552	0.541	0.548	0.396	0.402
Feature reg.	0.637	0.658	0.686	0.672	0.684
Feature DNN	0.664	0.662	0.698	0.688	0.699
DistilBERT	0.651	0.591	0.686	0.674	0.685
Hybrid	0.688	0.686	0.715	0.736	0.685
M. & B. (2018)¹	0.61	0.58	0.60	0.59	0.62
M. & B. (2020)²	0.617	0.623	0.630	0.603	0.601

The best-performing model for each trait and prompt is printed in bold. reg. = ridge regression.

¹Performance benchmarks in terms of QWK from Mathias and Bhattacharyya (2018)

²Performance benchmarks in terms of QWK from Mathias and Bhattacharyya (2020)
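
Every value in Table 3 is a quadratic weighted kappa (QWK), which measures agreement between two sets of ordinal scores (here, human and model trait scores). For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition in Python. The function name, its arguments, and the example scores are illustrative assumptions; this is not the authors' evaluation code, and the handling of the degenerate case is a convention chosen for this sketch.

```python
from collections import Counter


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Return the QWK between two equal-length lists of integer scores.

    Illustrative helper (hypothetical name and signature), following the
    standard definition: 1 - sum(w * observed) / sum(w * expected),
    with weights w_ij = (i - j)^2 / (K - 1)^2 for K score categories.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n_categories = max_score - min_score + 1
    if n_categories < 2:
        raise ValueError("need at least two score categories")

    n = len(human)
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    hist_human = Counter(human)              # marginal counts per rater
    hist_machine = Counter(machine)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count if the two raters scored independently.
            weighted_expected += weight * hist_human[i] * hist_machine[j] / n

    if weighted_expected == 0:
        # Both raters gave the same constant score; treated as perfect
        # agreement here (an assumption, other tools may return NaN).
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-4 scale, for illustration only.
print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # 1.0
print(quadratic_weighted_kappa([1, 2, 3, 4], [4, 3, 2, 1], 1, 4))  # -1.0
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the penalty grows with the squared distance between scores, near misses cost far less than large discrepancies.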