2023 Aug 18;9:e1511. doi: 10.7717/peerj-cs.1511

Table 7. Evaluation results (averaged with calculated confidence intervals) of Task 2 using the testing dataset.

The best results are presented in bold.

Model	Metric	Vocab size 8,000	Vocab size 16,000	Vocab size 32,000
LSTM	BLEU	0.560 ± 0.009	0.574 ± 0.006	0.592 ± 0.006
LSTM	ROUGE	0.596 ± 0.011	0.602 ± 0.005	0.623 ± 0.007
BiLSTM	BLEU	0.612 ± 0.009	0.649 ± 0.003	0.671 ± 0.011
BiLSTM	ROUGE	0.635 ± 0.015	0.673 ± 0.008	0.695 ± 0.005
Transformer	BLEU	0.912 ± 0.011	0.946 ± 0.006	0.927 ± 0.011
Transformer	ROUGE	0.932 ± 0.014	0.950 ± 0.010	0.949 ± 0.004
T5 small	BLEU	–	–	0.906 ± 0.005
T5 small	ROUGE	–	–	0.928 ± 0.009
T5 base	BLEU	–	–	0.902 ± 0.004
T5 base	ROUGE	–	–	0.937 ± 0.008
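
Each cell in Table 7 has the form mean ± interval half-width. The table does not state how the intervals were calculated, so the following is only a minimal sketch of one common approach: a normal-approximation 95% confidence interval over repeated runs. The function name `mean_with_ci`, the multiplier `z=1.96`, and the per-run scores are all assumptions made for illustration; they are not taken from the article.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    Assumption: the runs are independent and z=1.96 corresponds to a 95%
    interval. The article may have used a different procedure.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two runs are needed to estimate spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean, z * sem


# Hypothetical per-run BLEU scores, invented for illustration only.
runs = [0.90, 0.92, 0.91, 0.93, 0.89]
mean, half_width = mean_with_ci(runs)
print(f"{mean:.3f} ± {half_width:.3f}")
```

With only a few runs, a Student's t multiplier would give a wider and arguably more appropriate interval than the fixed `z` used here.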