Table 11.
Frequency distribution of scores for each metric across all evaluators.
| Metric | Score | Frequency |
| --- | --- | --- |
| M1ᵃ | 2 | 4 |
| M1 | 3 | 10 |
| M1 | 4 | 12 |
| M1 | 5 | 4 |
| M2ᵇ | 2 | 3 |
| M2 | 3 | 10 |
| M2 | 4 | 10 |
| M2 | 5 | 7 |
| M3ᶜ | 3 | 12 |
| M3 | 4 | 16 |
| M3 | 5 | 2 |
ᵃM1: relevance of the summary to the inbound query.
ᵇM2: representation of the aim, population, intervention, results, and outcome classification in the summary.
ᶜM3: the model summary is better than the baseline summary.
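The frequencies in Table 11 can be condensed into per-metric summary statistics. The sketch below transcribes the table and computes, for each metric, the number of ratings, the frequency-weighted mean score, and the share of ratings at 4 or above. It treats scores as numeric values; the threshold of 4 and the helper name `summarize` are illustrative assumptions, not part of the original evaluation.

```python
# Frequencies transcribed from Table 11: metric -> {score: count}
FREQUENCIES = {
    "M1": {2: 4, 3: 10, 4: 12, 5: 4},
    "M2": {2: 3, 3: 10, 4: 10, 5: 7},
    "M3": {3: 12, 4: 16, 5: 2},
}


def summarize(freq, high=4):
    """Return (number of ratings, weighted mean score, share of scores >= high).

    `high` is an illustrative cut-off chosen for this sketch only.
    """
    total = sum(freq.values())
    mean = sum(score * count for score, count in freq.items()) / total
    share_high = sum(count for score, count in freq.items() if score >= high) / total
    return total, mean, share_high


for metric, freq in FREQUENCIES.items():
    total, mean, share_high = summarize(freq)
    print(f"{metric}: n={total}, mean={mean:.2f}, share>=4={share_high:.0%}")
# M1: n=30, mean=3.53, share>=4=53%
# M2: n=30, mean=3.70, share>=4=57%
# M3: n=30, mean=3.67, share>=4=60%
```

Each metric totals 30 ratings, which serves as a quick consistency check on the transcribed frequencies.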