Selectivity Enhancement in Electronic Nose Based on an Optimized DQN

Sensors (Basel). 2017 Oct 16;17(10):2356. doi: 10.3390/s17102356

Algorithm 1 DQN-CNN with Experience Replay

Initialize the experience replay memory D and set the number of iterations M

Randomly initialize the Q-value function

for iteration = 1, M do
    randomly initialize the first action a_{1}
    initialize the first state s_{1}
    for t = 1, T do
        with probability ϵ select a random action a_{t}
        otherwise select a_{t} = \arg\max_{a} Q^{*}(s_{t}, a; θ)
        input a_{t} and s_{t} into the CNN to get the classification c_{t} = CNN(a_{t}, s_{t})
        if c_{t} == label then
            reward r_{t} = 1
            if t < T then reward r_{t} = 2
        else
            r_{t} = 0
        end if
        execute a_{t}, get r_{t} and the next state s_{t+1}
        store (s_{t}, a_{t}, r_{t}, s_{t+1}) in D
        sample a random minibatch of transitions (s_{j}, a_{j}, r_{j}, s_{j+1}) from D
        set y_{j} = \begin{cases} r_{j}, & s_{j+1} \text{ is terminal} \\ r_{j} + γ \max_{a'} Q(s_{j+1}, a'; θ), & s_{j+1} \text{ is not terminal} \end{cases}
        perform a gradient descent step on (y_{j} - Q(s_{j}, a_{j}; θ))^{2} to update θ
    end for
end for
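
The individual steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the CNN classifier and the network parameters θ are replaced here by a plain lookup table Q (a dict mapping each state to a list of action values), and all function and class names (select_action, step_reward, td_target, ReplayMemory, q_update) are hypothetical. The reward function assumes one reading of the algorithm box, namely that a correct classification earns 2 before the final step T and 1 at step T, and that an incorrect one earns 0.

```python
import random
from collections import deque


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def step_reward(predicted, label, t, T):
    """Assumed reward scheme: 2 if correct before step T, 1 if correct at T, else 0."""
    if predicted == label:
        return 2 if t < T else 1
    return 0


def td_target(r, next_q_values, terminal, gamma):
    """y_j = r_j if s_{j+1} is terminal, else r_j + gamma * max_a' Q(s_{j+1}, a')."""
    if terminal:
        return r
    return r + gamma * max(next_q_values)


class ReplayMemory:
    """Fixed-capacity memory D; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform random minibatch, capped at the number of stored transitions.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def q_update(Q, batch, gamma, lr):
    """One gradient-descent step on (y_j - Q(s_j, a_j))^2 for a tabular Q.

    The table stands in for the network parameters theta (an assumption);
    the constant factor 2 from the gradient is absorbed into lr.
    """
    for s, a, r, s_next, terminal in batch:
        y = td_target(r, Q[s_next], terminal, gamma)
        Q[s][a] += lr * (y - Q[s][a])
```

In a full loop, each step would call select_action on Q[s_t], compute the reward from the classifier output, store the transition (s_t, a_t, r_t, s_{t+1}, terminal) in a ReplayMemory, and then pass a sampled minibatch to q_update.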