Table 3. Performance of different models, measured by exact match (EM) and F1.

| Model | Top 1 EM | Top 1 F1 | Top 5 EM | Top 5 F1 | Top 10 EM | Top 10 F1 | Top 15 EM | Top 15 F1 |
|---|---|---|---|---|---|---|---|---|
| DrQA-NYT [9] | 22.50 | 27.58 | 28.00 | 32.78 | 29.50 | 34.11 | 32.00 | 36.87 |
| DrQA-Wiki [9] | 21.00 | 26.17 | 22.50 | 27.92 | 26.00 | 31.49 | 29.00 | 34.37 |
| QA-NLM-U [21] | 23.50 | 30.54 | 33.00 | 39.71 | 41.00 | 48.02 | 43.00 | 50.71 |
| QA-Not-Rerank [30] | 25.50 | 32.45 | 30.00 | 37.84 | 40.50 | 47.32 | 42.00 | 48.95 |
| QANA-TempPub | 26.00 | 33.69 | 36.00 | 42.75 | 39.50 | 47.19 | 44.00 | 50.71 |
| QANA-TempCont | 22.50 | 29.70 | 32.50 | 40.67 | 41.50 | 49.05 | 44.50 | 51.09 |
| QANA | 26.50 | 34.27 | 37.00 | 43.76 | 42.00 | 49.20 | 45.50 | 52.71 |
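
The table does not say how EM and F1 are computed per question. As a rough illustration, the sketch below assumes the common SQuAD-style convention: answers are lowercased, punctuation and English articles are removed, EM checks for an identical normalized string, and F1 is the harmonic mean of token-level precision and recall. The function names (`normalize`, `exact_match`, `f1_score`) are hypothetical and are not taken from the paper. The scores in the table are presumably these per-question values averaged over the test set and scaled to percentages.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, split into tokens.

    Assumption: SQuAD-style normalization; the paper may use a different one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction, gold):
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the predicted and gold answers."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Berlin Wall", "berlin wall"))
    print(round(f1_score("fall of the Berlin Wall", "Berlin Wall"), 4))
```

Under this convention a partially correct answer scores 0 on EM but earns partial credit on F1, which is consistent with every F1 value in the table being higher than the corresponding EM value.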