Table 11.
Frequency distribution of the scores assigned by all evaluators for each metric.
Metric | Score | Frequency |
M1^a | 2 | 4 |
M1 | 3 | 10 |
M1 | 4 | 12 |
M1 | 5 | 4 |
M2^b | 2 | 3 |
M2 | 3 | 10 |
M2 | 4 | 10 |
M2 | 5 | 7 |
M3^c | 3 | 12 |
M3 | 4 | 16 |
M3 | 5 | 2 |
^a M1: summary relevance to the inbound query.
^b M2: aim, population, intervention, results, and outcome classification representation in the summary.
^c M3: model summary better than the baseline summary.
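
As a reading aid, the frequencies above can be collapsed into a total rating count and a frequency-weighted mean score per metric. The following is a minimal sketch, not part of the original analysis: the dictionary `frequencies` and the helper `summarize` are hypothetical names, the counts are copied from the table, and averaging assumes the scores can be treated as numeric values on a common scale, which the table itself does not state.

```python
# Frequencies copied from Table 11: {metric: {score: number of ratings}}.
frequencies = {
    "M1": {2: 4, 3: 10, 4: 12, 5: 4},
    "M2": {2: 3, 3: 10, 4: 10, 5: 7},
    "M3": {3: 12, 4: 16, 5: 2},
}


def summarize(freq):
    """Return (total ratings, frequency-weighted mean score) for one metric.

    Assumption: scores are treated as numeric so that a mean is meaningful.
    """
    total = sum(freq.values())
    mean = sum(score * count for score, count in freq.items()) / total
    return total, mean


# Derived values only; these means are not reported in the table.
summary = {metric: summarize(freq) for metric, freq in frequencies.items()}

for metric, (total, mean) in summary.items():
    print(f"{metric}: n={total}, mean={mean:.2f}")
```

Each metric sums to the same number of ratings, which is a quick consistency check on the table.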