2023 Mar 3;23(5):2772. doi: 10.3390/s23052772
Algorithm 1 Proposed deep Q-learning algorithm
1:  Initialize the policy and target DQNs with random weights w
2:  Initialize ϵ
3:  for each episode do
4:      for each instance do
5:          Select a channel matrix and add it to the action space $A_t$ for the present state space $S_t$
6:          Observe the immediate reward $R_t$ and the next state space $S_{t+1}$
7:          Put $(S_t, A_t, R_t, S_{t+1})$ → ERM
8:          Form a random sample mini batch of $(S_t, A_t, R_t, S_{t+1})$ from the ERM
9:          for each tuple in the mini batch do
10:             Calculate Q-values
11:             Approximate Q*-values using the target DNN
12:             Compute the loss from Q and Q*
13:             Optimize w of the policy DNN with the Adam optimizer
14:     Update the target DNN weights w after all time steps
Ensure: $R_r$ $R_{\max}$
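
The loop in Algorithm 1 can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the authors' implementation: a linear Q-function over one-hot states stands in for the policy and target DNNs, the action is chosen ϵ-greedily, the loss is a squared error, one Adam step is taken per mini batch, the target weights are copied from the policy weights at the end of each episode, and the environment is a toy in which the next state does not depend on the chosen channel. The names `train_q` and `rates`, and all hyperparameter values, are hypothetical.

```python
import random

def train_q(reward, episodes=200, steps=20, batch=8, gamma=0.5,
            lr=0.02, eps=1.0, eps_decay=0.98, eps_min=0.05, seed=0):
    """Toy deep-Q-style loop; reward[s][a] is a hypothetical rate table."""
    rng = random.Random(seed)
    n_s, n_a = len(reward), len(reward[0])
    # Step 1: random weights w; the target starts as a copy (assumption).
    policy = [[rng.uniform(-0.1, 0.1) for _ in range(n_a)] for _ in range(n_s)]
    target = [row[:] for row in policy]
    # Adam state: first and second moment per weight, plus a step counter.
    m = [[0.0] * n_a for _ in range(n_s)]
    v = [[0.0] * n_a for _ in range(n_s)]
    b1, b2, tiny, k = 0.9, 0.999, 1e-8, 0
    erm = []  # experience replay memory, unbounded here for brevity
    for _ in range(episodes):                                # step 3
        s = rng.randrange(n_s)
        for _ in range(steps):                               # step 4
            # Step 5: epsilon-greedy choice (assumed selection rule).
            if rng.random() < eps:
                a = rng.randrange(n_a)
            else:
                a = max(range(n_a), key=lambda j: policy[s][j])
            # Step 6: toy environment, next state independent of the action.
            r, s_next = reward[s][a], rng.randrange(n_s)
            erm.append((s, a, r, s_next))                    # step 7
            sample = rng.sample(erm, min(batch, len(erm)))   # step 8
            grad = [[0.0] * n_a for _ in range(n_s)]
            for bs, ba, br, bn in sample:                    # step 9
                q = policy[bs][ba]                           # step 10
                q_star = br + gamma * max(target[bn])        # step 11
                # Step 12: gradient of (q - q_star)^2, averaged over the batch.
                grad[bs][ba] += 2.0 * (q - q_star) / len(sample)
            # Step 13: one Adam step per mini batch (a simplification).
            k += 1
            for i in range(n_s):
                for j in range(n_a):
                    m[i][j] = b1 * m[i][j] + (1 - b1) * grad[i][j]
                    v[i][j] = b2 * v[i][j] + (1 - b2) * grad[i][j] ** 2
                    m_hat = m[i][j] / (1 - b1 ** k)
                    v_hat = v[i][j] / (1 - b2 ** k)
                    policy[i][j] -= lr * m_hat / (v_hat ** 0.5 + tiny)
            s = s_next
        target = [row[:] for row in policy]   # step 14 (assumed target sync)
        eps = max(eps_min, eps * eps_decay)   # assumed decay schedule
    return policy

# Hypothetical 3-state, 3-channel rate table; the best channel differs per state.
rates = [[1.0, 0.2, 0.1], [0.1, 1.0, 0.3], [0.2, 0.1, 1.0]]
q = train_q(rates)
greedy = [row.index(max(row)) for row in q]  # greedy channel per state
```

Because the toy transitions ignore the action, the greedy channel in each state should simply be the one with the highest immediate rate, which makes the sketch easy to sanity-check. A real setup would replace the weight table with a neural network and the rate table with the channel model of the paper.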