2017 Oct 16;17(10):2356. doi: 10.3390/s17102356
Algorithm 1 DQN-CNN with Experience Replay
 Initialize the experience replay memory D and the number of iterations M
 Randomly initialize the Q-value function Q with weights θ
 for iteration = 1, M do
   Randomly initialize the first action a_1
   Initialize the first state s_1
   for t = 1, T do
     With probability ϵ, select a random action a_t
     otherwise select a_t = argmax_a Q*(s_t, a; θ)
     Input a_t and s_t into the CNN to obtain the classification c_t = CNN(a_t, s_t)
     if c_t == label then
       r_t = 1
       if t < T then
         r_t = 2
       end if
     else
       r_t = 0
     end if
     Execute a_t, observe the reward r_t and the next state s_{t+1}
     Store the transition (s_t, a_t, r_t, s_{t+1}) in D
     Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
     Set y_j = r_j                                 if s_{j+1} is terminal
         y_j = r_j + γ max_a Q(s_{j+1}, a; θ)      otherwise
     Perform a gradient descent step on (y_j − Q(s_j, a_j; θ))² to update θ
   end for
 end for
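
The control flow of Algorithm 1 can be sketched in Python. This is a minimal illustration, not the paper's implementation: a linear Q-function over one-hot states stands in for the CNN, a toy left/right chain environment stands in for the classification task, and all names (`ReplayMemory`, `LinearQ`, `train`, the chain rewards) are hypothetical. What it does show faithfully is the ϵ-greedy action choice, the replay memory D of (s, a, r, s', done) transitions, the terminal/non-terminal target y_j, and the squared-error gradient step on θ.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-capacity buffer D of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


class LinearQ:
    """Q(s, a; theta): one weight row per action over one-hot state features."""

    def __init__(self, n_states, n_actions, lr=0.1):
        self.theta = np.zeros((n_actions, n_states))
        self.lr = lr

    def q_values(self, phi):
        return self.theta @ phi

    def update(self, phi, a, target):
        # one gradient descent step on (target - Q(s, a; theta))^2
        td_error = target - self.theta[a] @ phi
        self.theta[a] += self.lr * td_error * phi


def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v


def train(iterations=300, T=10, gamma=0.9, eps=0.2, batch=8, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 5, 2  # toy chain; action 1 moves right, 0 moves left
    q = LinearQ(n_states, n_actions)
    memory = ReplayMemory(2000)
    for _ in range(iterations):
        s = rng.randrange(n_states - 1)       # random non-terminal start state
        for t in range(T):
            phi = one_hot(s, n_states)
            if rng.random() < eps:            # with probability eps: explore
                a = rng.randrange(n_actions)
            else:                             # otherwise: a = argmax_a Q(s, a; theta)
                a = int(np.argmax(q.q_values(phi)))
            s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
            done = s_next == n_states - 1     # rightmost state is terminal
            r = 1.0 if done else 0.0
            memory.store((s, a, r, s_next, done))
            s = s_next
            if len(memory) >= batch:          # minibatch update from replay
                for sj, aj, rj, sj1, dj in memory.sample(batch):
                    if dj:                    # y_j = r_j when s_{j+1} is terminal
                        y = rj
                    else:                     # y_j = r_j + gamma * max_a Q(s_{j+1}, a)
                        y = rj + gamma * np.max(q.q_values(one_hot(sj1, n_states)))
                    q.update(one_hot(sj, n_states), aj, y)
            if done:
                break
    return q


if __name__ == "__main__":
    q = train()
    policy = [int(np.argmax(q.q_values(one_hot(s, 5)))) for s in range(5)]
    print("greedy policy:", policy)
```

Because replayed transitions are sampled independently of the current trajectory, the bootstrap target propagates the terminal reward backwards through the chain even when recent episodes never reach the goal, which is the main benefit of the replay memory D.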