Algorithm 1 Training process of the improved D3QN
1: Input: total iteration rounds T, attenuation factor γ, and exploration rate ε.
2: Initialize the evaluation network parameters θ, the target network parameters θ⁻, the update frequency x, and the replay buffer D;
3: for t = 1 to T do
4:   With probability ε, randomly select an action from the optimized exploration space e; otherwise, select the action given by the greedy policy;
5:   Execute the selected action; the agent observes the new state s′ and the reward r;
6:   Store the transition (s, a, r, s′) in D;
7:   Randomly sample a batch of transitions (s, a, r, s′) from D and compute the Q-values of the evaluation network;
8:   Compute the loss and update the parameters θ;
9:   Every x steps, reset θ⁻ ← θ, update the target Q-values, and reduce the value of ε;
10:  s ← s′;
11: end for
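The loop above can be sketched in runnable form. This is a minimal illustration only: the toy environment, the state/action sizes, the restricted exploration space, and the tabular arrays standing in for the evaluation and target networks are all assumptions for demonstration, not the paper's actual setup. It does, however, follow the same step order: ε-greedy selection within a restricted space, replay storage, a double-Q update of the evaluation parameters, and a periodic target sync with ε decay.

```python
import random
import numpy as np

random.seed(0)

# Hypothetical toy environment: 5 states, 3 actions; action 2 moves forward,
# reward 1.0 on reaching the last state. A stand-in for the real task.
N_STATES, N_ACTIONS = 5, 3

def env_step(s, a):
    s_next = min(s + (1 if a == 2 else 0), N_STATES - 1)
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r

# Steps 1-2: inputs and initialization (symbol names mirror Algorithm 1).
T = 500                # total iteration rounds
gamma = 0.9            # attenuation (discount) factor
eps = 1.0              # exploration rate epsilon
eps_decay = 0.99       # per-sync reduction of epsilon (step 9)
x = 20                 # target-network update frequency
alpha = 0.1            # learning rate (assumed)
D = []                 # replay buffer
Q_eval = np.zeros((N_STATES, N_ACTIONS))   # evaluation "network" (tabular stand-in)
Q_target = Q_eval.copy()                   # target "network"
explore_space = [1, 2]                     # optimized exploration space e (assumed subset)

s = 0
for t in range(1, T + 1):
    # Step 4: with probability eps, explore only within the restricted space e.
    if random.random() < eps:
        a = random.choice(explore_space)
    else:
        a = int(np.argmax(Q_eval[s]))
    # Step 5: act and observe the new state and reward.
    s_next, r = env_step(s, a)
    # Step 6: store the transition in D.
    D.append((s, a, r, s_next))
    # Steps 7-8: sample a batch and apply a double-DQN update: the evaluation
    # net picks the next action, the target net values it.
    batch = random.sample(D, min(len(D), 8))
    for (bs, ba, br, bs2) in batch:
        a_star = int(np.argmax(Q_eval[bs2]))
        target = br + gamma * Q_target[bs2, a_star]
        Q_eval[bs, ba] += alpha * (target - Q_eval[bs, ba])
    # Step 9: every x steps, reset the target parameters and decay eps.
    if t % x == 0:
        Q_target = Q_eval.copy()
        eps *= eps_decay
    # Step 10: advance the state (restart when the terminal state is reached).
    s = 0 if s_next == N_STATES - 1 else s_next
```

After training, `Q_eval[0]` should rank the forward action highest, and `eps` will have decayed once per target sync; the dueling-architecture value/advantage split of the full D3QN is omitted here for brevity.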