[Preprint]. 2024 Jul 25:arXiv:2408.00588v1. [Version 1]

Figure 2:


Performance of different medical evidence summarization systems in automatic evaluations. P-values were calculated with a paired t-test to assess the statistical significance of the differences between models. FT - fine-tuning; ZS - zero-shot learning; * - p < 0.05; ** - p < 0.01; *** - p < 0.001; **** - p < 0.0001; ns - not significant.
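
The significance notation in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function names `paired_t_statistic` and `significance_label` and the score lists are hypothetical. The sketch computes the paired t statistic from per-document score differences and maps a p-value to the caption's symbols. Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which a library routine such as `scipy.stats.ttest_rel` provides.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equal-length score lists.

    Each position is assumed to hold the scores of two systems on the
    same test item, so the test is run on the per-item differences.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    if len(scores_a) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def significance_label(p_value):
    """Map a p-value to the symbols defined in the caption."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


# Hypothetical per-item metric scores for two systems (not data from the paper).
system_a = [1.0, 2.0, 3.0, 4.0]
system_b = [0.0, 1.0, 2.0, 2.0]
t_stat = paired_t_statistic(system_a, system_b)
```

The thresholds are checked from smallest to largest so that each p-value receives the most stars it qualifies for.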