. 2025 Aug 22;15:30845. doi: 10.1038/s41598-025-13983-4

Table 3.

Hyperparameters of the delay shift agent and baseline methods after validation.

Agent	Learning rate	Discount factor (γ)	ε-decay	L2 regularisation	Action selection strategy
Delay Shift	0.001	0.95	0.995	0.01	ε-greedy policy based on Q-function
Rule-Based	–	–	–	–	Δ = +2 s (offset applied to the estimated task delay)
Entropy-Based	0.01	0.9	–	0.001	Softmax over Q-estimates (T = 0.5)
Heuristic QoS	–	–	–	–	p = 0.7 (probabilistic admission for tasks with delay ≤ 100 ms)
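
The action selection strategies in Table 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the initial exploration rate, the exploration floor of 0.01, and the rejection of tasks above the 100 ms threshold are assumptions made for illustration. Only the numeric values 0.995, T = 0.5, +2 s, p = 0.7 and 100 ms are taken from the table.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay=0.995, floor=0.01):
    """Multiplicative decay (0.995 from Table 3); the floor is an assumption."""
    return max(floor, epsilon * decay)


def softmax_probs(q_values, temperature=0.5):
    """Softmax over Q-estimates with temperature T (0.5 in Table 3)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature, rng):
    """Sample an action index from the softmax distribution."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def rule_based_delay(estimated_delay_s, offset_s=2.0):
    """Rule-based baseline: fixed +2 s offset on the estimated task delay."""
    return estimated_delay_s + offset_s


def heuristic_qos_admit(delay_ms, rng, p=0.7, threshold_ms=100.0):
    """Admit a task with probability p when its delay is within the threshold.

    Assumption: tasks above the threshold are rejected; Table 3 does not
    state how they are handled.
    """
    return delay_ms <= threshold_ms and rng.random() < p


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]
    eps = 1.0  # assumed initial exploration rate, not given in Table 3
    for _ in range(3):
        action = epsilon_greedy(q, eps, rng)
        eps = decay_epsilon(eps)
        print(action, round(eps, 6))
    print(softmax_probs(q))
```

A lower temperature concentrates the softmax probabilities on the highest Q-estimate, whereas the ε-greedy agent explores uniformly at a rate that shrinks by the decay factor after each step.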