2025 Aug 22;15:30845. doi: 10.1038/s41598-025-13983-4

Table 3.

Hyperparameters of the delay shift agent and baseline methods after validation.

| Agent | Learning rate | Discount factor (γ) | ε-decay | L2 regularisation | Action selection strategy |
|---|---|---|---|---|---|
| Delay Shift | 0.001 | 0.95 | 0.995 | 0.01 | ε-greedy policy based on Q-function |
| Rule-Based | – | – | – | – | Δ = +2 s (offset applied to the estimated task delay) |
| Entropy-Based | 0.01 | 0.9 | – | 0.001 | Softmax over Q-estimates (T = 0.5) |
| Heuristic QoS | – | – | – | – | p = 0.7 (probabilistic admission for tasks with delay ≤ 100 ms) |
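
The action selection strategies named in Table 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: only the numeric constants come from the table, while the function names, the multiplicative form of the ε-decay schedule, the optional ε floor, and the rejection of tasks above the 100 ms limit are assumptions made for the example.

```python
import math
import random

# Constants taken from Table 3. Everything else below is an illustrative assumption.
EPSILON_DECAY = 0.995        # Delay Shift agent
SOFTMAX_TEMPERATURE = 0.5    # Entropy-Based baseline (T)
RULE_OFFSET_S = 2.0          # Rule-Based baseline (delta, in seconds)
ADMIT_PROBABILITY = 0.7      # Heuristic QoS baseline (p)
DELAY_LIMIT_MS = 100.0       # Heuristic QoS baseline (delay threshold)


def epsilon_greedy(q_values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay=EPSILON_DECAY, floor=0.0):
    """Assumed schedule: multiply epsilon by the decay factor after each step.

    The floor parameter is hypothetical; the table does not list a minimum.
    """
    return max(floor, epsilon * decay)


def softmax_probs(q_values, temperature=SOFTMAX_TEMPERATURE):
    """Boltzmann distribution over Q-estimates (max subtracted for stability)."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, rng, temperature=SOFTMAX_TEMPERATURE):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def rule_based_delay(estimated_delay_s, offset_s=RULE_OFFSET_S):
    """Add the fixed offset to the estimated task delay (seconds)."""
    return estimated_delay_s + offset_s


def heuristic_qos_admit(delay_ms, rng, p=ADMIT_PROBABILITY,
                        limit_ms=DELAY_LIMIT_MS):
    """Admit a task with probability p when its delay is within the limit.

    Assumption: tasks above the limit are rejected; the table does not say.
    """
    return delay_ms <= limit_ms and rng.random() < p
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible. A lower softmax temperature concentrates probability on the highest Q-estimate, whereas ε-greedy explores uniformly at a rate that shrinks as ε decays.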