Algorithm 1 DQN-CNN with Experience Replay |
Initialize the experience replay memory D and the number of iterations M |
Randomly initialize the Q-value function |
for iteration number = 1, M do |
randomly initialize the first action a_1 |
initialize the first state s_1 |
for t = 1, T do |
with probability ϵ, select a random action a_t |
otherwise select a_t = argmax_a Q(s_t, a) |
input , into , get classification
|
if then |
reward
|
if t < T then |
reward
|
else
|
execute a_t, get the reward r_t and the next state s_{t+1} |
store the transition (s_t, a_t, r_t, s_{t+1}) in D |
sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D |
Calculate the gradient of the loss on the minibatch to update the Q-value function |
end if |
end for |
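The replay-and-update loop of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the DQN-CNN itself. A lookup table stands in for the Q-network, the classification and reward-shaping steps are left out, and each episode starts from a fixed state rather than a random first action. The environment `chain_step`, the helper names, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class ReplayMemory:
    """Experience replay memory D with a fixed capacity."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform random minibatch, capped at the number of stored transitions.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def select_action(q, state, n_actions, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else argmax_a Q(state, a)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))


def chain_step(state, action):
    """Hypothetical toy environment: action 1 moves right, action 0 stays put.
    Reaching state 3 yields reward 1 and ends the episode."""
    next_state = state + 1 if action == 1 else state
    done = next_state == 3
    return (1.0 if done else 0.0), next_state, done


def train(env_step, initial_state, n_actions, M=300, T=20, epsilon=0.3,
          gamma=0.9, lr=0.5, batch_size=8, seed=0):
    rng = random.Random(seed)
    memory = ReplayMemory(capacity=1000)
    q = {}  # tabular stand-in for the Q-network (assumption, not a CNN)
    for _ in range(M):                      # outer loop over iterations
        state = initial_state
        for _t in range(T):                 # inner loop over time steps
            action = select_action(q, state, n_actions, epsilon, rng)
            reward, next_state, done = env_step(state, action)
            memory.store((state, action, reward, next_state, done))
            # Replay: move Q toward the one-step target on a random minibatch.
            for s, a, r, s2, d in memory.sample(batch_size, rng):
                best_next = max(q.get((s2, b), 0.0) for b in range(n_actions))
                target = r if d else r + gamma * best_next
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + lr * (target - old)
            if done:
                break
            state = next_state
    return q
```

Calling `train(chain_step, initial_state=0, n_actions=2)` returns the learned table. In this toy chain, the greedy action in states 0 to 2 should then be 1 (move right). In a real DQN the table update would be replaced by a gradient step on the network's loss.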