2023 Jan 20;23(3):1198. doi: 10.3390/s23031198
Algorithm 1 DQN Algorithm
Input:  MDP = (S, A, T, R), done, update
Output: Optimal policy π
  Init(π)
  s_t ← s_0
  a ← NOP
  while s_t ≠ done and episode < episodes do
      // Iterative selection of optimal value
      A = A
      a ← RandomSelect(A)
      update(s_t, a, π) // Update parameters
      Go to the next state s_{t+1}
  end while
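
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the five-state chain environment, the `step` function, and the hyperparameters `alpha` and `gamma` are hypothetical, and a Q-table with a temporal-difference update stands in for the neural network and gradient step that a DQN would use. The listing folds the episode counter into a single while condition; the sketch separates it into an outer loop over episodes that resets the state to s_0.

```python
import random

# Hypothetical toy MDP (not from the paper): a 5-state chain, goal at the right end.
N_STATES = 5
ACTIONS = [0, 1]          # 0 = move left, 1 = move right
DONE = N_STATES - 1       # terminal state, plays the role of "done"


def step(state, action):
    """Transition T and reward R of the toy chain."""
    next_state = max(0, state - 1) if action == 0 else min(DONE, state + 1)
    reward = 1.0 if next_state == DONE else 0.0
    return next_state, reward


def update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One temporal-difference step on a Q-table.

    Assumption: this tabular rule stands in for the parameter update
    that a DQN would perform on its network.
    """
    if next_state == DONE:
        target = reward
    else:
        target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def train(episodes=200, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # Init(pi)
    for _ in range(episodes):                 # episode < episodes
        state = 0                             # s_t <- s_0
        for _ in range(max_steps):
            if state == DONE:                 # s_t != done
                break
            action = rng.choice(ACTIONS)      # a <- RandomSelect(A)
            next_state, reward = step(state, action)
            update(q, state, action, reward, next_state)  # update parameters
            state = next_state                # go to the next state s_{t+1}
    return q


def greedy_policy(q):
    """Read the policy off the learned values: best action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Calling `greedy_policy(train())` returns one action per state. Because the update is off-policy, the purely random action selection in the listing is enough to learn values here; practical DQN implementations typically use an epsilon-greedy rule, a replay buffer, and a target network, none of which appear in the listing above.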