Online Learning Approach for Predictive Real-Time Energy Trading in Cloud-RANs

2021 Mar 25;21(7):2308. doi: 10.3390/s21072308

$L_{b}$	Index set of the N BSs.
$L_{i}$	Index set of the $K_{i}$ information users.
$L_{e}$	Index set of the $K_{e}$ active energy users (EUs).
$L_{e}^{[idle]}$	Index set of the $K_{e}^{[idle]}$ idle EUs.
$P_{n}^{[Tx]}$	Total transmit power at the n-th RRH.
$P_{n}^{[circ]}$	Hardware circuit power consumption at the n-th RRH.
$P_{CU}^{[circ]}$	Hardware circuit power consumption at the CU.
$P_{n}^{[Tmax]}$	Maximum transmit power allowance of the n-th RRH.
$P_{CU}^{[\max]}$	Maximum power provision by the grid at the CU.
$T$	Index set of the T time slots.
$F$	Index set of the F frames within a time slot.
$K$	Index set of the K trials within a frame.
$B_{n}^{[ahead]}$	Amount of energy purchased from the day-ahead market (arm).
$B_{n}^{[spot]}$	Amount of energy to be purchased from the spot-market.
$S_{n}$	Amount of excess energy to be sold back to the grid.
$E_{n}$	Amount of renewable energy generation at the n-th RRH.
$B_{n}^{[total]} (k)$	Total energy cost of the n-th RRH at the k-th trial.
$E^{[total]} = \{E^{1}, \dots, E^{J}\}$	All energy packages (arms) offered by the grid in the day-ahead market.
$μ_{n, p}^{[k, f, t]} = R (B_{n}^{[ahead]} (k))$	Reward associated with arm $B_{n}^{[ahead]}$ at the k-th trial of the f-th frame in the t-th time slot.
$A_{k}^{[set]} = \{B_{1}^{[ahead]} (k), \dots, B_{N}^{[ahead]} (k)\}$	N energy packages purchased a day ahead at the k-th trial (super arm).
$R (A_{k}^{[set]})$	Reward for the super arm $A_{k}^{[set]}$ at the k-th trial.
$μ_{n}^{[k, f, t]} = (μ_{n, 1}^{[k, f, t]}, μ_{n, 2}^{[k, f, t]}, \dots, μ_{n, J}^{[k, f, t]})$	Reward vector for the n-th RRH.
${\hat{μ}}_{n}^{[f, t]} = ({\hat{μ}}_{n, 1}^{[f, t]}, {\hat{μ}}_{n, 2}^{[f, t]}, \dots, {\hat{μ}}_{n, J}^{[f, t]})$	Estimated mean reward vector.
${\bar{μ}}_{n}^{[f, t]} = ({\bar{μ}}_{n, 1}^{[f, t]}, {\bar{μ}}_{n, 2}^{[f, t]}, \dots, {\bar{μ}}_{n, J}^{[f, t]})$	Adjusted reward vector of individual arms.
$R_{t}^{[acc]}$	Accumulated reward at time slot t.
$Q_{t}$	Regret at time slot t.