A Turbo Q-Learning (TQL) for Energy Efficiency Optimization in Heterogeneous Networks

2020 Aug 30;22(9):957. doi: 10.3390/e22090957

Algorithm 4: Given the ABS ratio α and the CRE bias β, optimize the number of activated SBSs |S|

Require: a_{t} = (α, β, |S|)

Ensure: Optimized number of activated SBSs |S|

1: Initialize Q_{|S|}(s, a), state s and n = 0;

2: Set the learning rate λ_{3}, greedy probability ε_{3}, discount factor γ_{3}, and threshold_{3};

3: while n <= threshold_{3} do

4: In state s, select the optimal action a with greedy probability ε_{3};

5: Observe r;

6: Randomly transfer from s to s';

7: Update Q_{|S|}(s, a) according to Formula (20);

8: s ← s';

9: n = n + 1;

10: end while

11: Output: |S| = a;
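The loop of Algorithm 4 can be sketched in Python as a plain tabular Q-learning routine. This is a minimal illustration under several assumptions, not the authors' implementation. Formula (20) is not reproduced in this section, so the standard Q-learning update is used in its place. Step 4 is read literally, meaning the greedy action is taken with probability ε_{3} and a uniformly random action otherwise. The function name `optimize_num_sbs`, the integer state indexing, the `reward_fn` callback and all default values are hypothetical. The ABS ratio α and the CRE bias β are treated as fixed and folded into `reward_fn`.

```python
import random


def optimize_num_sbs(reward_fn, n_states, max_sbs, lr=0.1, eps=0.9,
                     gamma=0.9, threshold=5000, seed=0):
    """Tabular Q-learning over the number of active SBSs |S|.

    lr, eps, gamma and threshold stand in for lambda_3, epsilon_3,
    gamma_3 and threshold_3; reward_fn(state, num_sbs) is a hypothetical
    callback that returns the observed reward r.
    """
    rng = random.Random(seed)
    actions = list(range(1, max_sbs + 1))  # candidate values of |S|

    # Step 1: initialize Q(s, a), the state s and the counter n.
    q = [[0.0] * len(actions) for _ in range(n_states)]
    s = rng.randrange(n_states)
    n = 0

    # Step 3: iterate while n <= threshold.
    while n <= threshold:
        # Step 4 (assumed reading): greedy with probability eps,
        # otherwise a uniformly random action.
        if rng.random() < eps:
            a = max(range(len(actions)), key=lambda i: q[s][i])
        else:
            a = rng.randrange(len(actions))

        # Step 5: observe the reward r.
        r = reward_fn(s, actions[a])

        # Step 6: move to a randomly chosen next state s'.
        s_next = rng.randrange(n_states)

        # Step 7: standard Q-learning update, assumed in place of Formula (20).
        q[s][a] += lr * (r + gamma * max(q[s_next]) - q[s][a])

        # Steps 8 and 9: s <- s', n = n + 1.
        s = s_next
        n += 1

    # Step 11: report the greedy action in the final state as |S|.
    best = max(range(len(actions)), key=lambda i: q[s][i])
    return actions[best], q


# Toy usage with a made-up reward that peaks at |S| = 3.
best_s, q_table = optimize_num_sbs(lambda s, k: -(k - 3) ** 2,
                                   n_states=4, max_sbs=6)
```

The toy reward here ignores the state entirely. In the setting of the paper the reward would presumably reflect the energy efficiency obtained with |S| active SBSs under the given α and β, which is outside the scope of this sketch.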