. 2022 Feb 21;2021:881–890.

Table 3:

Human judgement counts for sentence pairs from the test set, for all models and for the human reference sentences. S: the generated sentence was simpler; F: the original sentence was simpler; E: both were of equal complexity; N: neither sentence could be understood; U: the sentence was not changed by the model or human reference; SG: simplification gain as defined in Equation 2. Bold indicates the best model. SG scores are significant (p < 0.05).

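| System | S | F | E | N | U | SG |
|---|---|---|---|---|---|---|
| Human | 1 730 | 273 | 904 | 40 | 4 053 | 0.21 |
| n-gram | 1 452 | 1 004 | 1 732 | 110 | 2 702 | 0.06 |
| GPT-1 | 1 404 | 747 | 1 736 | 117 | 2 996 | 0.09 |
| GPT-2 | 1 372 | 1 077 | 1 661 | 118 | 2 772 | 0.04 |
| NTS | 587 | 855 | 1 022 | 98 | 4 438 | -0.04 |
| ClinicalNTS | 1 483 | 1 597 | 404 | 93 | 3 423 | -0.02 |
| PhraseTable | 2 425 | 2 759 | 269 | 98 | 1 449 | -0.05 |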
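
Equation 2 is not reproduced in this excerpt, so the exact definition of SG cannot be confirmed here. As a rough consistency check, the sketch below assumes SG is the share of judgements favouring the generated sentence minus the share favouring the original, that is (S - F) divided by the row total. The function name `simplification_gain` and the `TABLE_3` dictionary are illustrative only. Under this assumption, every row sums to 7 000 judgements and the computed values agree with the reported SG column to two decimal places.

```python
# Counts copied from Table 3 as (S, F, E, N, U), paired with the reported SG.
TABLE_3 = {
    "Human":       ((1730, 273, 904, 40, 4053), 0.21),
    "n-gram":      ((1452, 1004, 1732, 110, 2702), 0.06),
    "GPT-1":       ((1404, 747, 1736, 117, 2996), 0.09),
    "GPT-2":       ((1372, 1077, 1661, 118, 2772), 0.04),
    "NTS":         ((587, 855, 1022, 98, 4438), -0.04),
    "ClinicalNTS": ((1483, 1597, 404, 93, 3423), -0.02),
    "PhraseTable": ((2425, 2759, 269, 98, 1449), -0.05),
}


def simplification_gain(s, f, e, n, u):
    """Assumed form of SG: (S - F) / total judgements.

    This is a guess that happens to match the table; it is not
    taken from Equation 2 of the paper.
    """
    total = s + f + e + n + u
    return (s - f) / total


if __name__ == "__main__":
    for name, (counts, reported) in TABLE_3.items():
        sg = simplification_gain(*counts)
        print(f"{name:12s} total={sum(counts)} "
              f"computed={sg:+.2f} reported={reported:+.2f}")
```

If the paper's Equation 2 differs from this assumed form, the function body is the only part that would need to change.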