2024 Sep 13;35(3):1178–1217. doi: 10.1007/s40593-024-00426-w

Table 4.

Model Performances Across MEWS Essay Traits

Model               Content  Organization  Language quality
MEWS 1 (AD)
  N-Gram reg.       0.330    0.142         0.442
  Feature reg.      0.423    0.509         0.662
  Feature DNN       0.380    0.482         0.648
  DistilBERT        0.396    0.171         0.556
  Hybrid            0.463    0.521         0.698
  Human threshold¹  0.66     0.68          0.71
MEWS 2 (TE)
  N-Gram reg.       0.289    0.167         0.464
  Feature reg.      0.435    0.507         0.654
  Feature DNN       0.377    0.517         0.688
  DistilBERT        0.355    0.192         0.667
  Hybrid            0.376    0.528         0.723
  Human threshold¹  0.52     0.77          0.72

The best-performing model for each trait and prompt is printed in bold. reg. = ridge regression.

¹ Human rater agreement in terms of quadratic weighted kappa (QWK).
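
For readers unfamiliar with QWK (quadratic weighted kappa), the agreement statistic named in the table footnote, the following is a minimal sketch of how it can be computed for two sets of integer scores. The function name, its parameters, and the handling of the degenerate case are assumptions made for illustration; this is not the authors' implementation, and the score range of the MEWS rubric is not taken from the source.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores.

    Hypothetical helper for illustration only.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty score lists")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(rater_a)

    # Observed confusion matrix: rows = rater A, columns = rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty: grows with the squared score distance.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Count expected under chance agreement given the marginals.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used one identical category throughout; returning 1.0
        # is a convention chosen here, the statistic is strictly undefined.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Under this definition, identical score lists give a value of 1, chance-level agreement gives a value near 0, and systematic disagreement gives negative values. Because of the quadratic weights, a disagreement of two score points is penalized four times as heavily as a disagreement of one point.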