Joint Beamforming, Power Allocation, and Splitting Control for SWIPT-Enabled IoT Networks with Deep Reinforcement Learning and Game Theory

2022 Mar 17;22(6):2328. doi: 10.3390/s22062328

Algorithm 3 The two-layer hybrid MRA training algorithm.

1: (Input) $λ_{i}, μ_{i}, ν_{i}, \forall i$, batch size $η$, learning rate $α$, minimum exploration rate $ϵ_{\min}$, discount factor $ζ$, exploration decay rate $d$, and convergence threshold $ϱ$;
2: (Output) Learned DQN to decide $P_{i}, θ_{i}, f_{i}, \forall i$, for (7);
3: (Upper-layer DQN-based learning:)
4: Initialize action $a^{(0)}$ and replay buffer $D = \emptyset$;
5: for episode $= 1$ to $M$ do
6:     Initialize state $s^{(0)}$;
7:     for time $t = 1$ to $N$ do
8:         Observe current state $s^{(t)} = \{L^{(t)}, F^{(t)}\}$;
9:         $ϵ = \max(ϵ \cdot d, ϵ_{\min})$;
10:         if random number $r < ϵ$ then
11:             Select $a^{(t)}$ from $\hat{A}_{F}$ at random;
12:         else
13:             Select $a^{(t)} = \arg\max_{a'} Q^{*}(s^{(t)}, a', ω)$;
14:         end if
15:         Observe next state $s'$;
16:         (Lower-layer game-theory-based iteration:)
17:         for each link $i$ do
18:             for iteration $k = 1$ to $K$ do
19:                 Update $P_{i}[k]$ with (47);
20:                 Update $θ_{i}[k]$ with (48);
21:                 if $|U_{i}[k] - U_{i}[k - 1]| \leq ϱ$ then
22:                     $k' = k$; break;
23:                 end if
24:             end for
25:             $k^{*} = \min\{k', K\}$;
26:             $P_{i}^{(t)} = P_{i}[k^{*}]$; $θ_{i}^{(t)} = θ_{i}[k^{*}]$;
27:         end for
28:         Determine $U_{i}^{(t)}$ based on $P_{i}^{(t)}$ and $θ_{i}^{(t)}$ in the lower layer, and $f_{i}^{(t)}$ in the upper layer, $\forall i$;
29:         Store transition $(s^{(t)}, a^{(t)}, r^{(t)}, s')$ in $D$;
30:         Select $η$ random samples $(s^{(j)}, a^{(j)}, r^{(j)}, s^{(j + 1)})$ from $D$;
31:         Calculate $\hat{Q}(s^{(j)}, a^{(j)}, ω')$ and perform SGD to find the optimal weight of DNN, $ω^{*}$;
32:         Update $ω = ω^{*}$ for DQN in the upper layer;
33:         $s^{(t)} = s'$;
34:     end for
35: end for
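
The two control-flow patterns in Algorithm 3 that are easiest to get wrong are the decaying ε-greedy selection (steps 9 to 14) and the per-link lower-layer loop with early stopping (steps 17 to 27). The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The function names `epsilon_greedy` and `lower_layer_iteration` are hypothetical. Because the update rules (47) and (48) and the utility $U_{i}$ are not reproduced in this excerpt, they are passed in as placeholder callables, and the demo uses toy contraction maps in their place. The sketch also assumes that the $θ_{i}$ update sees the freshly updated $P_{i}$, and that $U_{i}[0]$ is evaluated at the initial point; neither is stated in the algorithm.

```python
import random
from typing import Callable, Sequence, Tuple


def epsilon_greedy(q_values: Sequence[float], epsilon: float, decay: float,
                   epsilon_min: float, rng: random.Random) -> Tuple[int, float]:
    """Steps 9-14: decay epsilon, then explore or exploit.

    q_values stands in for Q*(s, a, w) evaluated over a finite action set.
    Returns the chosen action index and the decayed epsilon.
    """
    epsilon = max(epsilon * decay, epsilon_min)           # step 9
    if rng.random() < epsilon:                            # step 10
        action = rng.randrange(len(q_values))             # step 11: random action
    else:
        # step 13: greedy action, i.e. the argmax over the Q-values
        action = max(range(len(q_values)), key=lambda a: q_values[a])
    return action, epsilon


def lower_layer_iteration(
    p0: float,
    theta0: float,
    update_power: Callable[[float, float], float],   # placeholder for (47)
    update_theta: Callable[[float, float], float],   # placeholder for (48)
    utility: Callable[[float, float], float],        # placeholder for U_i
    max_iters: int,
    threshold: float,
) -> Tuple[float, float, int]:
    """Steps 18-26 for a single link i.

    Returns (P_i[k*], theta_i[k*], k*), where k* is the iteration at which
    the utility change first drops to the threshold, or max_iters (K) if
    that never happens.
    """
    p, theta = p0, theta0
    u_prev = utility(p, theta)            # assumed definition of U_i[0]
    for k in range(1, max_iters + 1):
        p = update_power(p, theta)        # step 19
        theta = update_theta(p, theta)    # step 20 (assumed to use the new p)
        u = utility(p, theta)
        if abs(u - u_prev) <= threshold:  # step 21
            return p, theta, k            # steps 22 and 25: k* = k'
        u_prev = u
    return p, theta, max_iters            # no convergence: k* = K


# Toy demo: contraction maps with fixed point (P, theta) = (1.0, 0.5).
rng = random.Random(0)
action, eps = epsilon_greedy([0.1, 0.7, 0.3], epsilon=1.0, decay=0.5,
                             epsilon_min=0.1, rng=rng)
p_star, theta_star, k_star = lower_layer_iteration(
    0.0, 0.0,
    update_power=lambda p, th: 0.5 * (p + 1.0),
    update_theta=lambda p, th: 0.5 * (th + 0.5),
    utility=lambda p, th: -((p - 1.0) ** 2) - (th - 0.5) ** 2,
    max_iters=100, threshold=1e-6,
)
print(action, eps, p_star, theta_star, k_star)
```

Returning `max_iters` when the threshold is never met mirrors step 25, where $k^{*} = \min\{k', K\}$ falls back to $K$; it also avoids relying on $k'$ when the break in step 22 was never taken.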