Table 10.
System | Batch 1 | Batch 2 | Batch 3 |
---|---|---|---|
Wishart-S1 | 3.95 | 4.23 | - |
Wishart-S2 | 3.95 | - | - |
Wishart-S3 | 3.95 | - | - |
Baseline1 | 2.86 | 3.02 | 3.19 |
Baseline2 | 2.73 | 2.87 | 3.17 |
main system | 3.35 | 3.39 | 3.13 |
system 2 | - | 3.34 | 3.07 |
system 3 | - | 3.34 | 2.98 |
system 4 | - | 3.34 | - |
The final score of a system is the average of its individual scores across the different evaluation criteria. A hyphen (-) indicates that the system did not participate in the corresponding batch. The scores are assigned by experts who read and evaluated the “ideal” answers, and they range from 1 to 5, with 5 being the best score.
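The averaging described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the table: the criterion names (`criterion_a` and so on), the per-criterion values, and the helper names `final_score` and `format_cell` are all hypothetical, since only the final averages are reported here.

```python
from statistics import mean
from typing import Dict, Optional


def final_score(criterion_scores: Optional[Dict[str, float]]) -> Optional[float]:
    """Average the per-criterion expert scores (each in the range 1 to 5).

    None stands for a batch in which the system did not participate.
    """
    if criterion_scores is None:
        return None
    return round(mean(criterion_scores.values()), 2)


def format_cell(score: Optional[float]) -> str:
    """Render a table cell, using a hyphen for non-participation."""
    return "-" if score is None else f"{score:.2f}"


# Hypothetical per-criterion scores for one system in one batch.
example = {
    "criterion_a": 3.0,
    "criterion_b": 3.5,
    "criterion_c": 4.0,
    "criterion_d": 3.5,
}

print(format_cell(final_score(example)))  # average of the four values
print(format_cell(final_score(None)))     # system skipped the batch
```

Representing a skipped batch as `None` rather than as 0 keeps non-participation from being mistaken for a low score when the table is rendered.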