A Turbo Q-Learning (TQL) for Energy Efficiency Optimization in Heterogeneous Networks

2020 Aug 30;22(9):957. doi: 10.3390/e22090957

Algorithm 5: The algorithm for optimizing initial problems.

Require: α \in A, β \in B, |S| \in \{0, 1, \dots, {|S|}_{max}\}, reward r, learning rates λ = \{λ_{1}, λ_{2}, λ_{3}\}, greedy probabilities ε = \{ε_{1}, ε_{2}, ε_{3}\}, and discount factors γ = \{γ_{1}, γ_{2}, γ_{3}\}.

Ensure: Optimal action configuration \{α, β, |S|\} in each state.

1: Initialize U_{t}, α, β, |S|;

2: while n <= threshold_{4} do

3: Fix the CRE bias β and the number of activated SBSs |S|, and calculate the ABS ratio α according to Algorithm 2. Pass the solved α to step (4) and step (5);

4: Fix the ABS ratio α and the number of activated SBSs |S|, and calculate the CRE bias β according to Algorithm 3. Pass the solved β to step (3) and step (5);

5: Fix the ABS ratio α and the CRE bias β, and calculate the number of activated SBSs |S| according to Algorithm 4. Pass the solved |S| to step (3) and step (4);

6: n = n + 1;

7: end while

8: Output: α, β, |S|;
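The alternating structure of Algorithm 5 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the three solver callables stand in for Algorithms 2 to 4 (whose Q-learning details are not reproduced here), the counter n is assumed to start at 0, and `toy_utility` with its grids `ALPHAS`, `BETAS`, and `SBS_COUNTS` is a hypothetical separable objective chosen purely so the example runs.

```python
from typing import Callable, Tuple

Config = Tuple[float, float, int]  # (alpha, beta, num_sbs)


def alternating_optimization(
    solve_alpha: Callable[[float, int], float],
    solve_beta: Callable[[float, int], float],
    solve_num_sbs: Callable[[float, float], int],
    init: Config,
    threshold: int,
) -> Config:
    """Cycle through three sub-solvers, each holding the other two variables fixed."""
    alpha, beta, num_sbs = init                # step 1: initialize
    n = 0                                      # assumption: counter starts at 0
    while n <= threshold:                      # step 2
        alpha = solve_alpha(beta, num_sbs)     # step 3 (stand-in for Algorithm 2)
        beta = solve_beta(alpha, num_sbs)      # step 4 (stand-in for Algorithm 3)
        num_sbs = solve_num_sbs(alpha, beta)   # step 5 (stand-in for Algorithm 4)
        n += 1                                 # step 6
    return alpha, beta, num_sbs                # step 8: output


# Hypothetical toy problem, NOT the paper's energy-efficiency model.
ALPHAS = [0.125, 0.25, 0.375, 0.5, 0.625]   # candidate ABS ratios
BETAS = [0, 2, 4, 6, 8]                     # candidate CRE biases
SBS_COUNTS = list(range(0, 9))              # candidate numbers of activated SBSs


def toy_utility(alpha: float, beta: float, num_sbs: int) -> float:
    """Stand-in objective with its maximum at (0.5, 6, 4)."""
    return -(alpha - 0.5) ** 2 - (beta - 6) ** 2 - (num_sbs - 4) ** 2


# Each toy solver is an exhaustive search over one variable (not Q-learning).
result = alternating_optimization(
    solve_alpha=lambda b, s: max(ALPHAS, key=lambda a: toy_utility(a, b, s)),
    solve_beta=lambda a, s: max(BETAS, key=lambda b: toy_utility(a, b, s)),
    solve_num_sbs=lambda a, b: max(SBS_COUNTS, key=lambda s: toy_utility(a, b, s)),
    init=(0.125, 0, 0),
    threshold=3,
)
print(result)  # (0.5, 6, 4)
```

Because each sub-solver immediately sees the most recent values of the other two variables, the loop behaves like block coordinate ascent. With a coupled objective such a scheme may settle on a point that is only optimal one coordinate at a time, which is one reason the loop is repeated until the iteration threshold rather than run once.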