Table 4. Model Performances Across MEWS Essay Traits
| Model | Content | Organization | Language quality |
|---|---|---|---|
| MEWS 1 (AD) | | | |
| N-Gram reg. | 0.330 | 0.142 | 0.442 |
| Feature reg. | 0.423 | 0.509 | 0.662 |
| Feature DNN | 0.380 | 0.482 | 0.648 |
| DistilBERT | 0.396 | 0.171 | 0.556 |
| Hybrid | **0.463** | **0.521** | **0.698** |
| Human threshold¹ | 0.66 | 0.68 | 0.71 |
| MEWS 2 (TE) | | | |
| N-Gram reg. | 0.289 | 0.167 | 0.464 |
| Feature reg. | **0.435** | 0.507 | 0.654 |
| Feature DNN | 0.377 | 0.517 | 0.688 |
| DistilBERT | 0.355 | 0.192 | 0.667 |
| Hybrid | 0.376 | **0.528** | **0.723** |
| Human threshold¹ | 0.52 | 0.77 | 0.72 |
The best-performing model for each trait and prompt is printed in bold; reg. = ridge regression.
¹ Human rater agreement in terms of quadratic weighted kappa (QWK).
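The human thresholds above are expressed as QWK. For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the standard formula, κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij, with quadratic weights w_ij = (i − j)² / (N − 1)², observed counts O and expected counts E under rater independence. The function name `quadratic_weighted_kappa` and its signature are hypothetical; the source does not state which implementation produced the values in Table 4, and the example scores are made up for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Hypothetical helper: QWK between two equal-length lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if max_rating <= min_rating:
        raise ValueError("need at least two score categories")
    n_cat = max_rating - min_rating + 1
    n = len(rater_a)

    # Observed matrix: observed[i][j] counts items scored i by rater A and j by rater B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        if not (min_rating <= a <= max_rating and min_rating <= b <= max_rating):
            raise ValueError(f"score out of range: {a}, {b}")
        observed[a - min_rating][b - min_rating] += 1

    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at maximal distance.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            # Expected count if the two raters' marginals were independent.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Degenerate case: both raters gave the same single score to every item.
        # Kappa is undefined here; this sketch returns 1.0 by assumption.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Usage: human scores vs. hypothetical model predictions on a 1-4 scale.
print(round(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 4, 4], 1, 4), 3))  # → 0.917
```

Because the weights grow with the squared score distance, a one-point disagreement is penalized far less than a three-point one, which is why QWK is a common agreement measure for ordinal essay scores. For integer scores and the same label set, this sketch should agree with scikit-learn's `cohen_kappa_score` with `weights="quadratic"`.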