Algorithm 1: Developed Q-learning Channel Prediction Structure.
1. Initialize the Q-table values and the reward matrix R with zeros.
Inputs
2. Number of iterations and the size of the channel parameters for every user device.
3. Initial distance of every user device from the base station (BS).
4. Path loss parameter.
5. Design random pilot symbols.
6. Initialize the random channel parameters for each user based on the fading model, the number of antennas at the BS, and the number of devices in the cell.
7. Designate the power percentage for each user.
8. Determine the system bandwidth, the total transmit power, and the noise spectral density.
9. Assign the desired channel parameters and the target rate.
Procedure
10. Based on the channel gain, the total transmit power, and the initial power factor of each user, calculate the signal-to-interference-plus-noise ratio (SINR) and the minimum required rate for each device.
11. At each iteration, compare the initially generated rate with the target rate.
12. Update the Q-table values that represent the current state-action pair.
Q-algorithm
13. Identify the discount factor, the learning rate, the current state, and the terminal state.
14. Choose the next state at random and set it as the new state.
15. Inspect all possible actions to move to the new state.
16. Select the best action, i.e., the one that maximizes the Q-value function (argmax), to move to the new state.
17. Identify the immediate reward based on the action taken to move to the new state.
18. Based on (1) the maximum Q-value obtained in step 16, (2) the corresponding reward, and (3) the discount factor, update the Q-value according to the Bellman equation.
Outputs
19. Based on the updated values in the Q-table, update the channel coefficients and the channel gain, then calculate a new user rate and compare it to the target rate.
20. Compute the difference between the updated value function and the previous one.
21. Based on this difference, further update the value in the Q-table.
22. Check whether the terminal state has been reached or the episode has been completed.
23. Compose the predicted channel taps.
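The Q-algorithm part of Algorithm 1 (steps 13 to 18, 20, and 21) can be illustrated with a small tabular sketch in Python. This is not the authors' implementation. The toy three-state reward matrix, the function names `q_update` and `run_episode`, the values of `gamma` and `alpha`, the use of `-1` to mark forbidden transitions, and the convention that an action is the index of the next state are all assumptions made for illustration. The mapping from Q-table entries to channel coefficients and predicted taps (steps 19 and 23) is omitted because the algorithm box does not specify it.

```python
import numpy as np


def q_update(Q, R, state, action, gamma, alpha):
    """Apply one tabular Q-learning update in place.

    Assumption: an action is the index of the state it leads to.
    """
    next_state = action
    # Step 18: Bellman target from the immediate reward, the discount
    # factor, and the maximum Q-value available in the new state.
    target = R[state, action] + gamma * np.max(Q[next_state])
    # Step 20: difference between the updated and the previous value.
    delta = target - Q[state, action]
    # Step 21: move the table entry toward the target by the learning rate.
    Q[state, action] += alpha * delta
    return next_state, delta


def run_episode(Q, R, start, terminal, gamma, alpha, rng, max_steps=100):
    """Run one episode with random exploration until the terminal state."""
    state = start
    for _ in range(max_steps):
        # Step 22: stop once the terminal state has been reached.
        if state == terminal:
            break
        # Steps 14 and 15: list the allowed moves and pick one at random.
        # Assumption: a reward of -1 marks a forbidden transition.
        allowed = np.flatnonzero(R[state] >= 0)
        action = rng.choice(allowed)
        state, _ = q_update(Q, R, state, action, gamma, alpha)
    return Q


# Toy reward matrix (hypothetical): rows are states, columns are actions.
R = np.array([[-1.0, 0.0, -1.0],
              [0.0, -1.0, 100.0],
              [-1.0, 0.0, 100.0]])
Q = np.zeros_like(R)  # Step 1: Q-table initialized with zeros.
rng = np.random.default_rng(0)
for _ in range(200):
    run_episode(Q, R, start=0, terminal=2, gamma=0.9, alpha=0.5, rng=rng)
print(np.round(Q, 2))
```

In this toy setting the entry for moving from state 1 to the terminal state 2 converges to the reward of 100, and the entry for moving from state 0 to state 1 converges to the discounted value 0.9 × 100 = 90. In the channel-prediction setting, the reward would instead reflect how close the generated user rate is to the target rate.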