Table 10.
Average scores for each system and each batch of phase B of Task 1b for the “ideal” answers
| System | Batch 1 | Batch 2 | Batch 3 |
|---|---|---|---|
| Wishart-S1 | 3.95 | 4.23 | - |
| Wishart-S2 | 3.95 | - | - |
| Wishart-S3 | 3.95 | - | - |
| Baseline1 | 2.86 | 3.02 | 3.19 |
| Baseline2 | 2.73 | 2.87 | 3.17 |
| main system | 3.35 | 3.39 | 3.13 |
| system 2 | - | 3.34 | 3.07 |
| system 3 | - | 3.34 | 2.98 |
| system 4 | - | 3.34 | - |
The final score is the average of a system's individual scores across the different evaluation criteria. A hyphen (-) indicates that the system did not participate in the corresponding batch. The scores were assigned by experts who read and evaluated the “ideal” answers; they range from 1 to 5, with 5 being the best score.
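
As an illustration of the averaging described in the note to Table 10, the following is a minimal sketch, not the evaluation code actually used. The function name `final_score` and the criterion names are hypothetical placeholders, and the sketch assumes an unweighted mean over the criteria, with an empty input standing for a batch in which the system did not participate.

```python
from statistics import mean


def final_score(criterion_scores):
    """Average per-criterion expert scores (each on a 1-5 scale).

    `criterion_scores` maps a criterion name to its score. The names are
    placeholders; the real criteria are defined by the task organizers.
    Returns None when there are no scores, which corresponds to the
    hyphen (-) used in the table for non-participation.
    """
    if not criterion_scores:
        return None
    for name, score in criterion_scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name!r} is outside the 1-5 range: {score}")
    # Assumption: every criterion contributes equally to the final score.
    return mean(list(criterion_scores.values()))


# Hypothetical scores, not taken from the table.
example = {"criterion_a": 4.0, "criterion_b": 3.0, "criterion_c": 5.0, "criterion_d": 4.0}
print(final_score(example))
```

Returning `None` for a missing batch keeps non-participation distinct from a low score, so it cannot be mistaken for a value on the 1 to 5 scale.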