Delay-Tolerance-Based Mobile Data Offloading Using Deep Reinforcement Learning

2019 Apr 8;19(7):1674. doi: 10.3390/s19071674

Algorithm 1 Reward function of the proposed method
Current load of the connecting eNB: $L_{t+1}$
Ideal load (the control target value): $L_{ideal}$
Available bandwidth of the connecting eNB: $ABW$
Action selected at time $t$: $a_{t}$
Episode end time: $t_{end}$
Normalization variable: $\beta$
if $L_{t+1} \leq L_{ideal}$ then
  $r \leftarrow 1 + \frac{t}{\beta}$
else
  if $ABW = 0$ and $a_{t} = 4$ then
    $r \leftarrow 0$
  else
    $r \leftarrow -\left(\frac{L_{t+1} - L_{ideal}}{L_{ideal}} + \frac{t_{end} - t}{\beta}\right)$
  end if
end if
return $r$
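
As a rough illustration, the branching in Algorithm 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function name `reward` and all parameter names are hypothetical, the action index 4 is taken literally from the listing without interpreting what that action does, and `load_ideal` and `beta` are assumed to be strictly positive.

```python
def reward(load_next, load_ideal, abw, action, t, t_end, beta):
    """Sketch of Algorithm 1 (all names here are illustrative assumptions).

    load_next  -- load of the connecting eNB after the action, L_{t+1}
    load_ideal -- ideal load used as the control target, L_ideal (assumed > 0)
    abw        -- available bandwidth of the connecting eNB, ABW
    action     -- index of the action selected at time t, a_t
    t, t_end   -- current time step and episode end time
    beta       -- normalization variable (assumed > 0)
    """
    if load_next <= load_ideal:
        # Target met: positive reward that grows with elapsed time.
        return 1 + t / beta
    if abw == 0 and action == 4:
        # No bandwidth left and action 4 chosen: neutral reward.
        return 0.0
    # Target missed: penalize the relative overload plus the
    # normalized time remaining in the episode.
    overload = (load_next - load_ideal) / load_ideal
    remaining = (t_end - t) / beta
    return -(overload + remaining)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise each branch.
    print(reward(0.5, 0.8, abw=10.0, action=1, t=20, t_end=100, beta=100.0))
    print(reward(1.0, 0.8, abw=0.0, action=4, t=20, t_end=100, beta=100.0))
    print(reward(1.0, 0.8, abw=5.0, action=4, t=20, t_end=100, beta=100.0))
```

Written this way, the three outcomes are easy to compare: a reward of at least 1 when the load is at or below the target, zero when no bandwidth is available and action 4 is chosen, and a negative value otherwise, whose magnitude shrinks as the overload decreases and as $t$ approaches $t_{end}$.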