A Turbo Q-Learning (TQL) for Energy Efficiency Optimization in Heterogeneous Networks

2020 Aug 30;22(9):957. doi: 10.3390/e22090957

Algorithm 3: The ABS ratio $\alpha$ and the number of activated SBSs $|S|$ are given to optimize the CRE bias $\beta$

Require: $a_t = (\alpha, \beta, |S|)$

Ensure: Optimized CRE bias $\beta$

1: Initialize $Q_\beta(s, a)$, state $s$, and $n = 0$;

2: Set the learning rate $\lambda_2$, greedy probability $\varepsilon_2$, discount factor $\gamma_2$, and $threshold_2$;

3: while $n \le threshold_2$ do

4: In state $s$, select the optimal action $a$ with greedy probability $\varepsilon_2$;

5: Observe $r$;

6: Randomly transfer from $s$ to $s'$;

7: Update $Q_\beta(s, a)$ according to Formula (19);

8: $s \leftarrow s'$;

9: $n = n + 1$;

10: end while

11: Output: $\beta = a$;
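The loop in Algorithm 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Formula (19) is not reproduced in this section, so the sketch assumes it is the standard tabular Q-learning update $Q(s,a) \leftarrow Q(s,a) + \lambda_2 [r + \gamma_2 \max_{a'} Q(s',a') - Q(s,a)]$. The function name `optimize_cre_bias`, the candidate list `beta_candidates`, the state count `n_states`, and the toy reward `toy_reward` are all hypothetical, as is reading "Output: $\beta = a$" as the greedy action in the final state.

```python
import random


def toy_reward(alpha, beta, num_sbs):
    """Hypothetical stand-in for the observed reward r (peaks at beta = 6)."""
    return -(beta - 6.0) ** 2


def optimize_cre_bias(alpha, num_sbs, beta_candidates, reward_fn,
                      n_states=4, lr=0.1, eps=0.9, gamma=0.9,
                      threshold=2000, seed=0):
    """Sketch of Algorithm 3: tabular Q-learning over candidate CRE biases."""
    rng = random.Random(seed)
    n_actions = len(beta_candidates)

    # Step 1: initialize Q_beta(s, a), the state s, and the counter n.
    q = [[0.0] * n_actions for _ in range(n_states)]
    s = rng.randrange(n_states)
    n = 0

    # Step 3: iterate until the counter exceeds threshold_2.
    while n <= threshold:
        # Step 4: with probability eps take the greedy action,
        # otherwise pick a random action (exploration).
        if rng.random() < eps:
            a = max(range(n_actions), key=lambda i: q[s][i])
        else:
            a = rng.randrange(n_actions)

        # Step 5: observe the reward for the chosen CRE bias.
        r = reward_fn(alpha, beta_candidates[a], num_sbs)

        # Step 6: move to a randomly chosen next state s'.
        s_next = rng.randrange(n_states)

        # Step 7: assumed form of Formula (19), the standard Q-learning update.
        q[s][a] += lr * (r + gamma * max(q[s_next]) - q[s][a])

        # Steps 8 and 9: s <- s', n <- n + 1.
        s = s_next
        n += 1

    # Step 11: output beta = a, read here as the greedy action in the
    # final state (an assumption about the intended output).
    best = max(range(n_actions), key=lambda i: q[s][i])
    return beta_candidates[best], q
```

For example, `optimize_cre_bias(0.5, 4, [0.0, 3.0, 6.0, 9.0, 12.0], toy_reward)` returns the selected bias together with the learned Q-table. In a real system the reward would come from the measured energy efficiency rather than a closed-form toy function.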