| Algorithm 1 Deep Linear Transition Network (DLTN) |
| Input: Cloud RAN, set of users, action decision |
| Output: Cumulative reward |
| Procedure: |
| Step 1: Estimate the signal-to-interference-plus-noise ratio (SINR) at the receiver of a user using Equation (1). |
| Step 2: Calculate the data rate between the BS and the user using Equation (2). |
| Step 3: Compute the total delay of the user using Equation (3); it consists of three components: the transmission delay, the queuing delay, and the retransmission delay. |
| Step 4: Formulate the transmission delay as shown in Equation (4). |
| Step 5: Estimate the state of the power control model using Equation (6); the state includes the queue length of data rate, the current delay, and the current transmission power. |
| Step 6: Action: estimate the transmission power with respect to the power control action using Equation (7). |
| Step 7: Reward: compute the reward of the power control model as the weighted sum of the reward and a penalty for large transmission power, as shown in Equation (8). |
| Step 8: Maximize the expected long-term reward using Equation (9). |
| Step 9: Compute the learning rate and discount factors using Equation (9). |
| Step 10: Output the cumulative reward, which is used to allocate resources in the cloud system. |
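
The quantities in Steps 1 to 4 and Step 8 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names and parameters are hypothetical, and the formulas are the common textbook forms (SINR as received power over interference plus noise, Shannon capacity for the data rate, packet size over rate for the transmission delay, a discounted sum for the long-term reward). Equations (1) to (4) and (9) may differ in detail. The state, action, and reward of Steps 5 to 7 are left out because their exact form depends on Equations (6) to (8).

```python
import math


def sinr(tx_power, channel_gain, interference, noise_power):
    """Assumed textbook SINR in linear units: signal / (interference + noise)."""
    return tx_power * channel_gain / (interference + noise_power)


def data_rate(bandwidth_hz, sinr_value):
    """Assumed Shannon-capacity rate in bit/s between the BS and a user."""
    return bandwidth_hz * math.log2(1.0 + sinr_value)


def transmission_delay(packet_bits, rate_bps):
    """Assumed transmission delay in seconds: payload size divided by rate."""
    return packet_bits / rate_bps


def total_delay(tx_delay, queue_delay, retx_delay):
    """Total delay as the sum of the three components named in Step 3."""
    return tx_delay + queue_delay + retx_delay


def discounted_return(rewards, gamma):
    """Cumulative discounted reward: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Illustrative values only; none of them come from the paper.
s = sinr(tx_power=3.0, channel_gain=1.0, interference=0.5, noise_power=0.5)
rate = data_rate(bandwidth_hz=1e6, sinr_value=s)
d_tx = transmission_delay(packet_bits=1000, rate_bps=rate)
d_total = total_delay(d_tx, queue_delay=0.001, retx_delay=0.0)
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With these illustrative inputs the SINR is 3, so the assumed Shannon rate is 2 Mbit/s and a 1000-bit packet takes 0.5 ms to transmit. The backward loop in `discounted_return` avoids computing explicit powers of the discount factor.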