Table 3.
Hyperparameters of the delay shift agent and baseline methods after validation.
| Agent | Learning rate | Discount factor (γ) | ε-decay | L2 regularisation | Action selection strategy |
|---|---|---|---|---|---|
| Delay Shift | 0.001 | 0.95 | 0.995 | 0.01 | ε-greedy policy based on Q-function |
| Rule-Based | – | – | – | – | Δ = +2 s (offset applied to the estimated task delay) |
| Entropy-Based | 0.01 | 0.9 | – | 0.001 | Softmax over Q-estimates (T = 0.5) |
| Heuristic QoS | – | – | – | – | p = 0.7 (probabilistic admission for tasks with delay ≤ 100 ms) |
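
As an illustration of the two learned action-selection strategies listed in Table 3, the following minimal Python sketch shows ε-greedy selection with a decaying ε and softmax selection with temperature T. It is a sketch under stated assumptions rather than the implementation used here: the function names, the multiplicative form of the decay, the initial ε, and the floor on ε are hypothetical, since the table reports only the decay factor (0.995) and the temperature (T = 0.5).

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)


def decay_epsilon(epsilon, decay=0.995, floor=0.0):
    """Assumed multiplicative schedule: epsilon <- max(floor, epsilon * decay)."""
    return max(floor, epsilon * decay)


def softmax_probs(q_values, temperature=0.5):
    """Action probabilities proportional to exp(Q / T)."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]      # hypothetical Q-estimates for three actions
    eps = 1.0                # assumed initial exploration rate
    for _ in range(3):
        action = epsilon_greedy(q, eps)
        eps = decay_epsilon(eps)
    print(action, eps, softmax_probs(q))
```

A lower temperature concentrates the softmax distribution on the highest Q-estimate, whereas the decay factor controls how quickly the ε-greedy agent moves from exploration to exploitation.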

