Table 2. Robustness to retrieval configuration (mean ± std across runs).
| Dataset | Retrieval configuration | Accuracy degradation (%) |
|---|---|---|
| TruthfulQA | Baseline (k = 5) | 21.86 ± 0.19 |
| | No Filtering | 20.92 ± 0.17 |
| | k = 10 | 23.20 ± 0.14 |
| | k = 3 | 21.10 ± 0.14 |
| MMLU | Baseline (k = 5) | 18.70 ± 0.14 |
| | No Filtering | 18.10 ± 0.14 |
| | k = 10 | 19.90 ± 0.14 |
| | k = 3 | 18.50 ± 0.14 |
| MedMCQA | Baseline (k = 5) | 23.90 ± 0.14 |
| | No Filtering | 22.80 ± 0.14 |
| | k = 10 | 24.30 ± 0.14 |
| | k = 3 | 23.10 ± 0.14 |
| SCALR | Baseline (k = 5) | 20.40 ± 0.14 |
| | No Filtering | 19.60 ± 0.14 |
| | k = 10 | 21.70 ± 0.14 |
| | k = 3 | 20.20 ± 0.14 |