2024 Sep 13;35(3):1178–1217. doi: 10.1007/s40593-024-00426-w

Table 3. Quadratic Weighted Kappa Across ASAP Essay Traits

Prompt / Model      Content   Organization   Word choice   Sentence fluency   Conventions
ASAP 1
  N-Gram reg.       0.536     0.511          0.515         0.491              0.481
  Feature reg.      0.678     0.635          0.672         0.636              0.623
  Feature DNN       0.693     0.657          0.690*        0.645              0.639
  DistilBERT        0.713     0.666          0.677         0.675              0.666*
  Hybrid            0.743*    0.672*         0.673         0.681*             0.648
  M. & B. (2018)¹   0.67      0.60           0.64          0.62               0.61
  M. & B. (2020)²   0.703     0.664          0.675         0.648              0.638
ASAP 2
  N-Gram reg.       0.552     0.541          0.548         0.396              0.402
  Feature reg.      0.637     0.658          0.686         0.672              0.684
  Feature DNN       0.664     0.662          0.698         0.688              0.699*
  DistilBERT        0.651     0.591          0.686         0.674              0.685
  Hybrid            0.688*    0.686*         0.715*        0.736*             0.685
  M. & B. (2018)¹   0.61      0.58           0.60          0.59               0.62
  M. & B. (2020)²   0.617     0.623          0.630         0.603              0.601

The best-performing model for each trait and prompt is marked with an asterisk (*). QWK = quadratic weighted kappa; reg. = ridge regression; DNN = deep neural network; M. & B. = Mathias and Bhattacharyya.

¹ Performance benchmarks (QWK) from Mathias and Bhattacharyya (2018)

² Performance benchmarks (QWK) from Mathias and Bhattacharyya (2020)
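In automated essay scoring, QWK typically measures agreement between predicted and human-assigned scores on an ordinal scale, so each cell in Table 3 can be read as such an agreement value. To make the metric concrete, here is a minimal Python sketch of the standard two-rater quadratic weighted kappa on an integer score scale. It is an illustration under that assumption, not the evaluation code behind the table; the function name `quadratic_weighted_kappa`, its `min_rating`/`max_rating` parameters, and the example scores are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two raters' integer scores.

    Hypothetical helper: scores must lie in [min_rating, max_rating]. The
    weight for a score pair (i, j) is (i - j)**2 / (n_cat - 1)**2, so larger
    disagreements are penalized quadratically.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if max_rating <= min_rating:
        raise ValueError("max_rating must exceed min_rating")
    for score in (*rater_a, *rater_b):
        if not min_rating <= score <= max_rating:
            raise ValueError(f"score {score} outside [{min_rating}, {max_rating}]")

    n_cat = max_rating - min_rating + 1
    n_items = len(rater_a)

    # Observed matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            # Expected count under independent raters, same total as observed.
            expected = hist_a[i] * hist_b[j] / n_items
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave every item the same single score: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Usage: hypothetical human vs. model scores on a 1-4 scale (not data from Table 3).
human = [1, 2, 3, 4, 2, 3]
model = [1, 2, 3, 3, 2, 4]
print(round(quadratic_weighted_kappa(human, model, 1, 4), 3))  # prints 0.818
```

Because the weights grow with the squared distance between scores, a prediction that is off by two points costs four times as much as one that is off by one. A value of 1 indicates perfect agreement and 0 indicates chance-level agreement, which frames the roughly 0.40 to 0.74 range seen in the table.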