2023 Jan 26;23(3):1383. doi: 10.3390/s23031383
Algorithm 1: Developed Q-learning Channel Prediction Structure.
  1.  Initialize the Q-table values, and initialize the reward matrix $R$ with zeros.

Inputs
  2.  Number of iterations and the size of the channel parameters for every user device.

  3.  Initial distance $d_i$ of every user device from the base station (BS).

  4.  Path-loss parameter $\vartheta$.

  5.  Design random pilot symbols.

  6.  Initialize the random channel parameters $h_{ij}$ for each user based on the fading model, for $j = 1, 2, \dots, N$ and $i = 1, 2, \dots, M$, where $N$ is the number of antennas at the BS and $M$ is the number of devices in the cell.

  7.  Designate the power percentage $\eta_i$ for each user.

  8.  Determine the system bandwidth $B$, the total transmit power $P_T$, and the noise spectral density $N_0$.

  9.  Assign the desired channel parameters $h_i^{d}$ and the target rate $R_T$.
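The input stage (steps 2 to 9) can be sketched in Python as follows. This is a minimal sketch under assumptions the algorithm does not state: Rayleigh fading whose average power follows the path loss $d_i^{-\vartheta}$, QPSK pilot symbols, power fractions normalized to sum to one, and illustrative values for every parameter. All names (`M`, `N`, `THETA`, `rayleigh_channel`, and so on) are hypothetical.

```python
import math
import random

# Illustrative parameter values (assumptions, not the paper's settings).
M = 3               # step 6: number of devices in the cell
N = 4               # step 6: number of antennas at the BS
ITERATIONS = 100    # step 2: number of learning iterations
THETA = 3.0         # step 4: path-loss parameter
B = 1e6             # step 8: system bandwidth in Hz
P_T = 1.0           # step 8: total transmit power in W
N0 = 1e-12          # step 8: noise spectral density in W/Hz
R_T = 1e6           # step 9: target rate in bit/s

rng = random.Random(0)   # fixed seed for a reproducible draw

# Step 3: initial distance d_i of each device from the BS, in metres (assumed range).
d = [rng.uniform(10.0, 100.0) for _ in range(M)]

# Step 5: random unit-energy QPSK pilot symbols (the modulation is an assumption).
QPSK = [complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]
pilots = [rng.choice(QPSK) for _ in range(16)]


def rayleigh_channel(d_i: float) -> list[complex]:
    """Step 6 (assumed model): h_ij ~ CN(0, d_i^-theta) for j = 1..N."""
    std = math.sqrt(d_i ** (-THETA) / 2.0)   # std of each real/imaginary part
    return [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(N)]


h = [rayleigh_channel(d_i) for d_i in d]     # h[i][j]: device i, antenna j

# Step 7: power fractions eta_i, normalized so they sum to 1 (assumed).
raw = [rng.random() for _ in range(M)]
total = sum(raw)
eta = [x / total for x in raw]
```

Drawing independent Gaussian real and imaginary parts per antenna is one common way to model Rayleigh fading; any other fading model from step 6 could replace `rayleigh_channel`.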

Procedure
  10.  Based on the channel gain $|h_{ij}|^2$, the total transmit power $P_T$, and the initial power factor $\eta_i$ of each user, calculate the signal-to-interference-plus-noise ratio $\mathrm{SINR}_i$ and the minimum required rate $R_i$ for each device.

  11.  At each iteration, compare the initial generated rate $R_i$ with the target rate $R_T$.

  12.  Update the Q-table entry $Q(s,a)$ for the current state-action pair.
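Step 10 does not give the SINR expression. Assuming a downlink NOMA model with perfect successive interference cancellation (SIC), in which each device cancels the signals of devices with weaker channels and treats those of devices with stronger channels as interference, steps 10 and 11 could look like the sketch below. The NOMA/SIC model, the use of an aggregate gain $\sum_j |h_{ij}|^2$ per device, the Shannon rate $R_i = B \log_2(1 + \mathrm{SINR}_i)$, and all numbers are assumptions; `sinr_noma` and `rates` are hypothetical helpers.

```python
import math


def sinr_noma(gains, eta, p_t, n0, bandwidth):
    """SINR of each device under an assumed downlink NOMA model with perfect SIC.

    Device i removes the signals of devices with weaker gains and treats the
    signals of devices with stronger gains as interference.
    """
    noise = n0 * bandwidth
    sinr = []
    for i, g_i in enumerate(gains):
        interference = sum(eta[k] * p_t * g_i
                           for k, g_k in enumerate(gains) if g_k > g_i)
        sinr.append(eta[i] * p_t * g_i / (interference + noise))
    return sinr


def rates(sinr, bandwidth):
    """Shannon rate R_i = B * log2(1 + SINR_i), in bit/s."""
    return [bandwidth * math.log2(1.0 + s) for s in sinr]


# Illustrative numbers (hypothetical): aggregate gains sum_j |h_ij|^2.
gains = [1e-6, 1e-5]      # device 0 weaker, device 1 stronger
eta = [0.8, 0.2]          # more power to the weaker device, as is usual in NOMA
B, P_T, N0, R_T = 1e6, 1.0, 1e-12, 1e6

sinr = sinr_noma(gains, eta, P_T, N0, B)
r = rates(sinr, B)
meets_target = [r_i >= R_T for r_i in r]   # step 11: compare R_i with R_T
```

With these illustrative numbers the weaker device reaches $\mathrm{SINR}_0 = 2/3$ and falls short of $R_T$, while the stronger device reaches $\mathrm{SINR}_1 = 2$ and meets it.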

Q-algorithm
  13.  Identify the discount factor $\gamma$, the learning rate $\alpha$, the current state, and the terminal state.

  14.  Choose a next state at random and set it as the new state.

  15.  Inspect all possible actions $a_i$ for moving to the new state.

  16.  Select the best action $a_i \in A$, i.e., the one that maximizes the Q-value function, $a_i = \arg\max_{a \in A} Q(s,a)$, to move to the new state.

  17.  Identify the immediate reward $R$ based on the action taken to move to the new state.

  18.  Based on (1) the maximum Q-value obtained in step 16, (2) the corresponding reward $R$, and (3) the discount factor $\gamma$, update the value of the current state-action pair with Bellman's equation, where $s'$ is the new state:

$$Q_{\text{new}}(s,a) \leftarrow R + \gamma \max_{a'} Q(s',a')$$
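Steps 16 to 18 reduce to two small operations on a Q-table stored as a list of rows (a storage choice assumed here): pick the valid action with the largest Q-value, and form the Bellman target from the immediate reward and the discounted maximum Q-value of the new state. The helper names `greedy_action` and `bellman_target` are hypothetical.

```python
from typing import Sequence


def greedy_action(q_row: Sequence[float], valid: Sequence[int]) -> int:
    """Step 16: return the valid action a that maximizes Q(s, a)."""
    return max(valid, key=lambda a: q_row[a])


def bellman_target(reward: float, q_next_row: Sequence[float], gamma: float) -> float:
    """Step 18: Q_new(s, a) = R + gamma * max_a' Q(s', a').

    The update uses the maximum Q-value of the new state s'; argmax only
    identifies which action attains that maximum.
    """
    return reward + gamma * max(q_next_row)


# Usage with made-up numbers: in state s, only actions 0 and 2 are valid.
a = greedy_action([0.0, 5.0, 3.0], valid=[0, 2])   # -> 2
q_new = bellman_target(10.0, [0.0, 50.0, 20.0], gamma=0.8)
```

Keeping the value (`max`) and the action (`argmax`) separate mirrors the distinction between steps 16 and 18.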
Outputs
  19.  Based on the updated $Q(s,a)$ values in the Q-table, update the channel coefficients $h_{ij}$ and the channel gain $|h_{ij}|^2$, then calculate a new user rate and compare it with the target rate $R_T$.

  20.  Compute the difference between the updated value function and the previous value, $\Delta Q = Q_{\text{new}}(s,a) - Q(s,a)$.

  21.  Based on step 20, further update the $Q(s,a)$ value in the Q-table according to $Q(s,a) \leftarrow Q(s,a) + \alpha \cdot \Delta Q$.

  22.  Check whether the terminal state has been reached or the episode has been completed.

  23.  Compose the predicted channel taps $\hat{h}_i$.
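Taken together, steps 18, 20, and 21 form the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[R + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. The sketch below runs that loop on a toy four-state environment. Everything environment-specific is hypothetical: the reward matrix (with $-1$ marking forbidden moves), the encoding of an action as the state it leads to, the values of $\gamma$ and $\alpha$, and the `CANDIDATE_TAPS` list that stands in for step 23, since the algorithm does not state how Q-values map to channel taps.

```python
import random

# Toy environment (hypothetical): R[s][a] is the reward for moving from state s
# to state a; -1 marks a forbidden move, and state 3 is the terminal state.
R = [
    [-1,  0, -1,  -1],
    [ 0, -1,  0, 100],
    [-1,  0, -1, 100],
    [-1,  0,  0, 100],
]
TERMINAL = 3
GAMMA = 0.8   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)

# Hypothetical candidate channel taps, one per state, used only to illustrate step 23.
CANDIDATE_TAPS = [0.2 + 0.1j, 0.5 - 0.3j, 0.7 + 0.2j, 0.9 + 0.05j]


def train(episodes: int = 500, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]            # step 1: Q-table of zeros
    for _ in range(episodes):
        s = rng.randrange(n)                      # random start state
        while s != TERMINAL:                      # step 22: stop at the terminal state
            valid = [a for a in range(n) if R[s][a] >= 0]
            a = rng.choice(valid)                 # step 14: random next state (= action)
            # Step 18: Bellman target; forbidden entries stay 0, which is
            # harmless here because all allowed rewards are non-negative.
            q_new = R[s][a] + GAMMA * max(Q[a])
            delta_q = q_new - Q[s][a]             # step 20: difference
            Q[s][a] += ALPHA * delta_q            # step 21: blended update
            s = a
    return Q


def predict_tap(Q: list[list[float]], start: int) -> complex:
    """Step 23 (illustrative): follow the greedy policy, return the tap of the state reached."""
    s = start
    for _ in range(len(R)):                       # bound the walk in case Q is untrained
        if s == TERMINAL:
            break
        valid = [a for a in range(len(R)) if R[s][a] >= 0]
        s = max(valid, key=lambda a: Q[s][a])     # step 16: argmax over valid actions
    return CANDIDATE_TAPS[s]


Q = train()
print(round(Q[0][1], 3), round(Q[1][3], 3))
print(predict_tap(Q, start=0))
```

The blend with $\alpha$ in step 21 lets each new Bellman target move $Q(s,a)$ only part of the way, which smooths out the randomness of the state choice in step 14.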