Sensors. 2021 Mar 25;21(7):2308. doi: 10.3390/s21072308
Algorithm 2 Main Online Learning Algorithm
1: Initialize the time slot count: $t = 0$;
2: **while** $t \le T$ **do**
3:   Increment the iteration index $t = t + 1$;
4:   **for** $f = 1:F$
5:     **if** $t = 1$ (initial time slot)
6:       **then** initialize the super arm for the first trial ($k = 1$) as $\mathcal{A}_1 = \{0_1, \ldots, 0_N\}$,
7:     **else** $\mathcal{A}_1 = S^*$,
8:     **end if**
9:     *Exploration Stage:* Run Algorithm 1
10:    *Estimation Stage:*
11:    Calculate the mean reward vector for the frame $\hat{\mu}_n[f,t] = \left(\hat{\mu}_{n,1}[f,t], \hat{\mu}_{n,2}[f,t], \ldots, \hat{\mu}_{n,J}[f,t]\right)$, where $\hat{\mu}_{n,p}[f,t] = \frac{\sum_{k=1}^{K} \mu_{n,p}[k,f,t]}{K}, \forall p \in J, \forall n \in L_b$.
12:    *Adjustment Stage:*
13:    **if** $\Psi_p$ (number of times the $p$-th arm is played) $\neq 0$
14:      **then** adjust $\bar{\mu}_{n,p}[f,t] = \hat{\mu}_{n,p}[f,t] + \sqrt{\frac{3 \ln K}{2 \Psi_p}}$,
15:    **else** $\bar{\mu}_{n,p}[f,t] = \hat{\mu}_{n,p}[f,t], \forall p \in J, \forall n \in L_b$.
16:    **end if**
17:  **end for**
18:  Average the adjusted mean reward vector over all frames: $\bar{\mu}_n[t] = \left( \frac{\sum_{f \in F} \bar{\mu}_{n,1}[f,t]}{F}, \frac{\sum_{f \in F} \bar{\mu}_{n,2}[f,t]}{F}, \ldots, \frac{\sum_{f \in F} \bar{\mu}_{n,J}[f,t]}{F} \right), \forall n \in L_b$.
19:  *Exploitation Stage:*
20:  Average $\bar{\mu}_n[t]$ over the accumulated number of time slots, as $\bar{\bar{\mu}}_n = \frac{\sum_{t'=1}^{t} \bar{\mu}_n[t']}{t} = \left[\bar{\bar{\mu}}_{n,1}, \bar{\bar{\mu}}_{n,2}, \ldots, \bar{\bar{\mu}}_{n,J}\right], \forall n \in L_b$.
21:  For the next time slot: find the $N$ optimum arm indexes as $p_n^* = \arg\max_p \left(\bar{\bar{\mu}}_{n,p}\right), \forall p \in J, \forall n \in L_b$, and the updated super arm as $S^* \triangleq [p_1^*, p_2^*, \ldots, p_N^*]$.
22: **end while**
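The loop structure of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the dimensions (`T`, `F`, `K`, `J`, `N`), the `sample_reward` stub, and the round-robin arm schedule standing in for Algorithm 1's exploration stage are all assumptions made for the demo; only the estimation, UCB-style adjustment, frame averaging, and exploitation steps follow the listing above.

```python
import math
import random

# Assumed small dimensions for a demo: T time slots, F frames per slot,
# K trials per frame, J arms, N players (all hypothetical values).
T, F, K, J, N = 20, 4, 8, 5, 3

random.seed(0)
true_means = [[random.random() for _ in range(J)] for _ in range(N)]

def sample_reward(n, p):
    """Stub for the per-trial reward of player n on arm p (assumption)."""
    return true_means[n][p] + random.uniform(-0.1, 0.1)

cum_reward = [[0.0] * J for _ in range(N)]  # running sums for exploitation
super_arm = [0] * N                         # initial super arm {0_1,...,0_N}

for t in range(1, T + 1):
    adj_sum = [[0.0] * J for _ in range(N)]  # sums of adjusted means over f
    for f in range(F):
        # Exploration stage stand-in: a round-robin over arms, counting
        # how often each arm is played (Psi_p in the listing).
        plays = [0] * J
        trial_sum = [[0.0] * J for _ in range(N)]
        for k in range(K):
            p = k % J
            plays[p] += 1
            for n in range(N):
                trial_sum[n][p] += sample_reward(n, p)
        # Estimation stage: mean reward over the K trials of the frame,
        # then adjustment stage: add the sqrt(3 ln K / (2 Psi_p)) bonus
        # whenever the arm was played at least once.
        for n in range(N):
            for p in range(J):
                mean = trial_sum[n][p] / K
                if plays[p] != 0:
                    mean += math.sqrt(3 * math.log(K) / (2 * plays[p]))
                adj_sum[n][p] += mean
    # Average the adjusted means over all F frames, accumulate over slots.
    for n in range(N):
        for p in range(J):
            cum_reward[n][p] += adj_sum[n][p] / F
    # Exploitation stage: per player, pick the arm with the highest
    # time-averaged adjusted mean; these indexes form the next super arm.
    super_arm = [max(range(J), key=lambda p: cum_reward[n][p] / t)
                 for n in range(N)]

print(super_arm)  # indexes of the N selected arms after T slots
```

The adjustment term plays the usual role of an upper-confidence bonus: arms played fewer times in a frame get a larger bonus, so the exploitation stage does not prematurely lock onto under-sampled arms.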