A Turbo Q-Learning (TQL) for Energy Efficiency Optimization in Heterogeneous Networks

2020 Aug 30;22(9):957. doi: 10.3390/e22090957

Algorithm 2: The CRE bias β and the number of activated SBSs |S| are given to optimize the ABS ratio α

Require: A_{t} = (α, β, |S|)

Ensure: Optimized ABS ratio α

1: Initialize Q_{α}(s, a), state s, and n = 0;
2: Set learning rate λ_{1}, greedy probability ε_{1}, discount factor γ_{1}, and threshold_{1};
3: while n <= threshold_{1} do
4:     In state s, select the optimal action a with greedy probability ε_{1};
5:     Observe r;
6:     Randomly transfer from s to s';
7:     Update Q_{α}(s, a) according to Formula (18);
8:     s ← s';
9:     n = n + 1;
10: end while
11: Output: α = a;
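The loop in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Formula (18) is not reproduced in this section, so the sketch assumes the standard tabular Q-learning update Q(s, a) ← Q(s, a) + λ_{1}[r + γ_{1} max_{a'} Q(s', a') − Q(s, a)]. The function name `optimize_abs_ratio`, the discretized candidate list `alphas`, the state count `n_states`, and the callback `reward_fn` (a stand-in for the reward obtained under the fixed β and |S|) are all hypothetical. The final output is read here as the greedy action in the last state, which is one possible reading of "α = a".

```python
import random


def optimize_abs_ratio(alphas, n_states, reward_fn, lr=0.1, eps=0.9,
                       gamma=0.9, threshold=1000, seed=0):
    """Tabular Q-learning sketch of Algorithm 2 (all names are hypothetical).

    alphas    : candidate ABS ratios, one per action (assumed discretization)
    n_states  : number of discrete states (assumed)
    reward_fn : callable (state, alpha) -> reward r; stands in for the
                paper's reward, which this section does not define
    eps       : probability of taking the greedy (currently best) action
    """
    rng = random.Random(seed)
    n_actions = len(alphas)

    # Step 1: initialize Q_alpha(s, a), the state s, and the counter n.
    q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    n = 0

    # Step 3: iterate until the counter exceeds the threshold.
    while n <= threshold:
        # Step 4: greedy action with probability eps, otherwise a random one.
        if rng.random() < eps:
            a = max(range(n_actions), key=lambda i: q[s][i])
        else:
            a = rng.randrange(n_actions)

        # Step 5: observe the reward r.
        r = reward_fn(s, alphas[a])

        # Step 6: transfer randomly from s to s'.
        s_next = rng.randrange(n_states)

        # Step 7: assumed form of Formula (18), the standard Q-learning update.
        td_target = r + gamma * max(q[s_next])
        q[s][a] += lr * (td_target - q[s][a])

        # Steps 8 and 9: move to s' and advance the counter.
        s = s_next
        n += 1

    # Step 11: report the ABS ratio of the greedy action in the final state.
    best = max(range(n_actions), key=lambda i: q[s][i])
    return alphas[best], q
```

As a usage example, calling `optimize_abs_ratio([0.1, 0.2, 0.3], n_states=2, reward_fn=my_reward)` with a user-supplied `my_reward(state, alpha)` returns the selected ABS ratio together with the learned Q-table. Keeping the reward as a callback leaves the sketch independent of how the fixed β and |S| enter the reward.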