Table 3. Retriever and reader evaluation results at different Top@k cut-offs.
Evaluation metric | Top@1 | Top@5 | Top@10 | Top@20 |
---|---|---|---|---|
**Retriever** | | | | |
Recall (single document) | 0.495 | 0.711 | 0.720 | **0.836** |
Recall (multiple documents) | 0.494 | 0.716 | 0.720 | **0.836** |
Mean reciprocal rank (MRR) | 0.495 | 0.572 | 0.582 | **0.775** |
Precision | **0.495** | 0.344 | 0.342 | 0.304 |
Mean average precision (MAP) | 0.494 | 0.672 | 0.690 | **0.697** |
**Reader** | | | | |
F1-score | 0.504 | 0.636 | 0.636 | **0.771** |
Exact match (EM) | 0.539 | 0.549 | 0.698 | **0.775** |
Semantic answer similarity (SAS) | 0.503 | 0.623 | 0.687 | **0.785** |
Accuracy | 0.895 (same for all Top@k) | | | |
Bold indicates the best result for each metric.
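The retriever metrics in the upper half of the table follow the standard top-k definitions: Recall@k is the fraction of gold documents found in the first k results, Precision@k the fraction of the first k results that are relevant, MRR the reciprocal rank of the first hit, and MAP the mean of average precision over all queries. A minimal sketch of the per-query computations, assuming each query yields a ranked list of document IDs and a set of gold-relevant IDs (the function names are illustrative, not the evaluation code used in this work):

```python
from typing import List, Set

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of gold documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are gold documents."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def mrr_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first gold document within the top k."""
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Mean of precision values at each rank where a gold document appears."""
    hits, precisions = 0, []
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / min(len(relevant), k) if relevant else 0.0
```

Corpus-level scores are then the means of these per-query values; MAP in particular is the mean of `average_precision_at_k` across all queries.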
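The reader metrics compare a predicted answer string against the gold answer. Exact match and token-level F1 are conventionally computed on normalized text, SQuAD-style (lowercasing, stripping punctuation and articles), while SAS scores the pair with a semantic similarity model and so has no closed-form token definition. A hedged sketch of EM and F1, assuming SQuAD-style normalization:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between answers."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```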