Joint Beamforming, Power Allocation, and Splitting Control for SWIPT-Enabled IoT Networks with Deep Reinforcement Learning and Game Theory

2022 Mar 17;22(6):2328. doi: 10.3390/s22062328

Algorithm 1. The single-layer DQN-based MRA training algorithm.

1: (Input) $λ_{i}, μ_{i}, ν_{i}, \forall i$, batch size $η$, learning rate $α$, minimum exploration rate $ϵ_{\min}$, discount factor $ζ$, and exploration decay rate $d$;
2: (Output) Learned DQN to decide $P_{i}, θ_{i}, f_{i}, \forall i$, for (7);
3: Initialize action $a^{(0)}$ and replay buffer $D = \emptyset$;
4: for episode = 1 to $M$ do
5:   Initialize state $s^{(0)}$;
6:   for time $t = 1$ to $N$ do
7:     Observe current state $s^{(t)}$;
8:     $ϵ = \max(ϵ \cdot d, ϵ_{\min})$;
9:     if random number $r < ϵ$ then
10:       Select $a^{(t)} \in \hat{A}$ at random;
11:     else
12:       Select $a^{(t)} = \arg\max_{a'} Q^{*}(s^{(t)}, a', ω)$;
13:     end if
14:     Observe next state $s'$;
15:     Store transition $(s^{(t)}, a^{(t)}, r^{(t)}, s')$ in $D$, where $r^{(t)}$ is $U^{(t)}$ obtained with (23);
16:     Select randomly $η$ stored samples $(s^{(j)}, a^{(j)}, r^{(j)}, s^{(j + 1)})$ from $D$ for experience;
17:     Obtain $\hat{Q}(s^{(j)}, a^{(j)}, ω')$ for all $j$ samples with (13);
18:     Perform SGD to minimize the loss in (14) for finding the optimal weight of DNN, $ω^{*}$;
19:     Update $ω = ω^{*}$ in the DQN;
20:     $s^{(t)} = s'$;
21:   end for
22: end for
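
The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The DNN is replaced by a table of weights `w[s][a]` (a linear model on one-hot features), and the environment in `step` is a hypothetical toy that stands in for the SWIPT network and the utility in (23). The initial exploration rate of 1.0 is assumed. The target is assumed to take the standard DQN form $r + ζ \max_{a'} Q(s', a', ω')$, since (13) is not reproduced in this section. All function and parameter names are illustrative.

```python
import random

def train(num_episodes=50, steps_per_episode=20, batch_size=8, lr=0.1,
          eps_min=0.05, discount=0.9, decay=0.995,
          n_states=4, n_actions=3, seed=0):
    """Epsilon-greedy Q-learning with a replay buffer, mirroring Algorithm 1."""
    rng = random.Random(seed)
    # Stand-in for the DNN weights: one weight per (state, action) pair.
    w = [[0.0] * n_actions for _ in range(n_states)]
    buffer = []   # replay buffer D (step 3)
    eps = 1.0     # assumed initial exploration rate

    def step(s, a):
        # Hypothetical toy environment: reward 1 if the action equals
        # s mod n_actions, otherwise 0; the next state is uniformly random.
        reward = 1.0 if a == s % n_actions else 0.0
        return reward, rng.randrange(n_states)

    for _ in range(num_episodes):                      # step 4
        s = rng.randrange(n_states)                    # step 5
        for _ in range(steps_per_episode):             # step 6
            eps = max(eps * decay, eps_min)            # step 8
            if rng.random() < eps:                     # steps 9-10: explore
                a = rng.randrange(n_actions)
            else:                                      # step 12: exploit
                a = max(range(n_actions), key=lambda k: w[s][k])
            r, s_next = step(s, a)                     # step 14
            buffer.append((s, a, r, s_next))           # step 15
            batch = rng.sample(buffer, min(batch_size, len(buffer)))  # step 16
            # Step 17: targets from the weights as they stand before this
            # update (playing the role of the target weights).
            targets = [rj + discount * max(w[sj1]) for (_, _, rj, sj1) in batch]
            # Steps 18-19: one SGD step per sample on 0.5 * (target - Q)^2.
            for (sj, aj, _, _), y in zip(batch, targets):
                w[sj][aj] += lr * (y - w[sj][aj])
            s = s_next                                 # step 20
    return w, eps
```

Calling `train()` returns the learned weight table and the final exploration rate. In a real system the table would be replaced by a neural network and `step` by the network simulator, while the loop structure stays the same.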