Multi-Objective Optimization of Energy Saving and Throughput in Heterogeneous Networks Using Deep Reinforcement Learning

2021 Nov 27;21(23):7925. doi: 10.3390/s21237925

Algorithm 2 PPO-Based Deep Optimistic Linear Support

1:
Initialization:
2:
$S$ = partial CSS, which is composed of $V_{t}$ obtained after the PPO learning.
3:
$W$ = corner weights, which are obtained from $S$.
4:
$Q$ = priority queue of multi-objective weights, where each entry is a tuple of a weight vector and its importance (i.e., ( $[ω_{1}^{t}, ω_{2}^{t}], I$ )).

5:
$ω_{t} = Q$ .pop()
6:
for iteration = 1, 2, … do
7:
for $t$ = 1, 2, …, $T$ do
8:
$a_{t} = π_{θ_{old}} (s_{t})$
9:
$[r_{e}^{t}, r_{d}^{t}], s_{t + 1} = \mathrm{Env}(a_{t})$
10:
Scale down the rewards $[r_{e}^{t}, r_{d}^{t}]$
11:
$M = M \cup \{s_{t}, a_{t}, [r_{e}^{t}, r_{d}^{t}], s_{t + 1}\}$
12:
$[A_{e}^{t}, A_{d}^{t}] =$ advantage estimates computed from Equation (26)
13:
$\hat{A}_{t} = \hat{A}_{t} \cup \{[A_{e}^{t}, A_{d}^{t}] \times [ω_{1}^{t}, ω_{2}^{t}]\}$
14:
end for
15:
Optimize the surrogate $L$ with respect to $θ$ using $\hat{A}_{t}$, for $K$ epochs
16:
Optimize $V_{ϕ}$ with respect to $ϕ$ using $\hat{V}_{t}^{GAE(γ, λ)}$, for $K$ epochs
17:
$θ_{old} = θ, ϕ_{old} = ϕ$
18:
end for upon convergence
19:
$V_{t} = V_{ϕ} (s)$
20:
$W = W \cup ω_{t}$
21:
if $ω_{t} \cdot V_{t} > \sum_{U \in S} ω_{t} \cdot U$ then
22:
Remove from $S$ the vectors $V_{del}$ made obsolete by the new $V_{t}$
23:
$ω_{c}$ = new corner weights obtained from $S$
24:
$S = S \cup V_{t}$
25:
Remove from $Q$ the weights $ω_{del}$ made obsolete by the new $ω_{c}$
26:
for each $ω^{'}$ in $ω_{c}$ do
27:
if estimated improvement of $(ω^{'}, W, S) > τ$ then
28:
$Q = Q \cup ω^{'}$
29:
end if
30:
end for
31:
end if
32:
if $Q$ is not empty then
33:
go back to line 1
34:
end if
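
The bookkeeping in steps 12, 13 and 21 to 23 can be illustrated for the two-objective case with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and two readings of the listing are assumptions: the product in step 13 is taken as a dot product that yields one scalar advantage per time step, and the test in step 21 compares the new value vector against the best vector already in $S$ (as in standard optimistic linear support), although the listing writes a sum. Weights are assumed to have the form $[ω_{1}, 1 - ω_{1}]$. The improvement estimate of step 27 and the priority queue itself are omitted.

```python
import numpy as np


def scalarize_advantages(advantages, weights):
    """Steps 12-13 (assumed reading): dot each [A_e, A_d] with [w1, w2]."""
    advantages = np.asarray(advantages, dtype=float)  # shape (T, 2)
    weights = np.asarray(weights, dtype=float)        # shape (2,)
    return advantages @ weights                       # shape (T,)


def scalarized_max(css, w1):
    """Best scalarized value over the vectors in S at weight (w1, 1 - w1)."""
    return max(w1 * v[0] + (1.0 - w1) * v[1] for v in css)


def improves_css(v_new, css, weights):
    """Step 21 (assumed reading): does v_new beat every vector in S at these weights?"""
    w1, w2 = weights
    if not css:
        return True  # an empty S is improved by any value vector
    best = max(w1 * u[0] + w2 * u[1] for u in css)
    return w1 * v_new[0] + w2 * v_new[1] > best


def corner_weights(css, tol=1e-9):
    """Step 23 for two objectives: w1 values where the upper envelope of the
    scalarized-value lines changes slope, plus the extremes 0 and 1."""
    corners = {0.0, 1.0}
    for i in range(len(css)):
        for j in range(i + 1, len(css)):
            (a1, a2), (b1, b2) = css[i], css[j]
            denom = (a1 - a2) - (b1 - b2)
            if abs(denom) < tol:
                continue  # parallel lines never cross
            w1 = (b2 - a2) / denom  # where the two lines intersect
            if 0.0 < w1 < 1.0:
                value = w1 * a1 + (1.0 - w1) * a2
                # keep the crossing only if it lies on the upper envelope
                if value >= scalarized_max(css, w1) - tol:
                    corners.add(w1)
    return sorted(corners)
```

For example, with $S$ holding the value vectors (1, 0) and (0, 1), `corner_weights` returns the extremes and the single crossing at $ω_{1} = 0.5$. A candidate (0.8, 0.8) passes `improves_css` at weights (0.5, 0.5), and adding it to $S$ replaces that corner with two new ones, which are the weights the algorithm would then consider queuing.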