2023 Aug 18;9:e1511. doi: 10.7717/peerj-cs.1511

Table 7. Evaluation results (averaged with calculated confidence intervals) of Task 2 using the testing dataset.

The best results are presented in bold.

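| Model | Metric | Vocab size 8,000 | Vocab size 16,000 | Vocab size 32,000 |
|---|---|---|---|---|
| LSTM | BLEU | 0.560 ± 0.009 | 0.574 ± 0.006 | 0.592 ± 0.006 |
| LSTM | ROUGE | 0.596 ± 0.011 | 0.602 ± 0.005 | 0.623 ± 0.007 |
| BiLSTM | BLEU | 0.612 ± 0.009 | 0.649 ± 0.003 | 0.671 ± 0.011 |
| BiLSTM | ROUGE | 0.635 ± 0.015 | 0.673 ± 0.008 | 0.695 ± 0.005 |
| Transformer | BLEU | 0.912 ± 0.011 | 0.946 ± 0.006 | 0.927 ± 0.011 |
| Transformer | ROUGE | 0.932 ± 0.014 | 0.950 ± 0.010 | 0.949 ± 0.004 |

| Model | Metric | Score (single value reported) |
|---|---|---|
| T5 small | BLEU | 0.906 ± 0.005 |
| T5 small | ROUGE | 0.928 ± 0.009 |
| T5 base | BLEU | 0.902 ± 0.004 |
| T5 base | ROUGE | 0.937 ± 0.008 |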
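
The "mean ± interval" entries in Table 7 can be illustrated with a minimal sketch of how such a value might be computed from repeated runs. The table does not state which interval method was used, so the normal-approximation 95% interval (z = 1.96) below is an assumption. The function name `mean_with_ci` and the example scores are hypothetical and are not taken from the paper.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, half_width) for a normal-approximation confidence interval.

    Assumption: z = 1.96 corresponds to a 95% interval; the source table
    does not say which method or confidence level was used.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, z * sem


if __name__ == "__main__":
    # Hypothetical BLEU scores from five repeated runs (illustrative only).
    runs = [0.94, 0.95, 0.95, 0.94, 0.95]
    mean, half_width = mean_with_ci(runs)
    print(f"{mean:.3f} ± {half_width:.3f}")
```

With only a handful of runs, a Student's t critical value would give a wider and arguably more appropriate interval than z = 1.96; the normal approximation is used here only to keep the sketch dependency-free.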