Table 2: Performance of fine-tuned T5 models on the summarization task. 95% confidence intervals are given in parentheses. The first row is a baseline representing the best prior performance on this task. See the Appendix for the full set of results.
| Model | Training | Summarization |
|---|---|---|
| Gao et al., 2023 | Single task | 7.60 (5.31 – 9.89) |
| T5 220M | Single task | 26.35 (22.18 – 30.52) |
| T5 220M | Multi-task | 24.84 (20.28 – 29.40) |
| T5 770M | Single task | 26.90 (22.58 – 31.23) |
| T5 770M | Multi-task | 23.99 (19.86 – 28.13) |
| SciFive 220M | Single task | 25.31 (21.45 – 29.17) |
| SciFive 220M | Multi-task | 24.38 (19.99 – 28.78) |
| SciFive 770M | Single task | 27.31 (23.09 – 31.53) |
| SciFive 770M | Multi-task | 25.31 (21.45 – 29.17) |
| Clinical-T5 220M | Single task | 25.35 (21.19 – 29.51) |
| Clinical-T5 220M | Multi-task | 26.21 (21.92 – 30.49) |
| Clinical-T5 770M | Single task | 28.28 (24.17 – 32.38) |
| Clinical-T5 770M | Multi-task | 28.55 (24.29 – 32.80) |
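The table does not specify how the 95% intervals were obtained; a common choice for per-example summarization scores is a percentile bootstrap over the test set. The sketch below illustrates that approach under stated assumptions (the `bootstrap_ci` helper, the resampling scheme, and the synthetic scores are all illustrative, not the paper's exact procedure).

```python
import numpy as np

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-example scores.

    `scores` is a 1-D array of per-document summarization scores
    (e.g., ROUGE on the test set). This is an illustrative assumption
    about how such intervals can be computed, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    # Resample the test set with replacement and record each resample's mean.
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)

# Usage with synthetic scores (not the results reported in Table 2).
if __name__ == "__main__":
    fake_scores = np.random.default_rng(1).normal(loc=26.0, scale=12.0, size=200)
    mean, (lo, hi) = bootstrap_ci(fake_scores)
    print(f"{mean:.2f} ({lo:.2f} – {hi:.2f})")
```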