Table 4.
Performance of the models on questions with few relevant documents vs. questions with many relevant documents
| Question set | Model | Top 1 EM | Top 1 F1 | Top 5 EM | Top 5 F1 | Top 10 EM | Top 10 F1 | Top 15 EM | Top 15 F1 |
|---|---|---|---|---|---|---|---|---|---|
| Questions with few relevant documents | QA-Not-Rerank | 31.00 | 40.48 | 35.00 | 43.93 | 46.00 | 55.79 | 48.00 | 55.12 |
| Questions with few relevant documents | QANA | 31.00 | 40.52 | 45.00 | 54.18 | 48.00 | 57.28 | 52.00 | 59.22 |
| Questions with many relevant documents | QA-Not-Rerank | 20.00 | 24.41 | 25.00 | 31.75 | 35.00 | 42.86 | 36.00 | 42.84 |
| Questions with many relevant documents | QANA | 22.00 | 28.02 | 29.00 | 33.33 | 36.00 | 41.11 | 39.00 | 46.21 |