. 2021 Mar 11;21(6):1960. doi: 10.3390/s21061960

Algorithm 1 DQN Algorithm

Create a replay buffer $B$
Randomly initialize the local network weights $θ$
Initialize the target network weights $θ_{t} \leftarrow θ$
for each episode $e$ do
    Initialize a random state $s$
    for each transition $t$ do
        Select action $a$ according to the $ϵ$-greedy policy
        Find the next state $s'$ and the reward
        Store the experience in buffer $B$
        Select batch-size random experiences from buffer $B$
        Calculate $Q(s, a)$ using the local network and the batch of experiences
        Calculate $Q'(s', a)$ using the target network and the batch of next states
        Calculate the loss function using $Q'(s', a)$ and $Q(s, a)$
        if $t \bmod \text{updating-interval} = 0$ then
            $θ_{t} \leftarrow θ$
        end if
    end for
end for
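
The loop above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-state chain environment (`step`), all hyperparameter values, and the helper names are hypothetical. A table `theta[s][a]` stands in for the local network (with one-hot states, a single linear layer reduces to such a table). The target uses the common choice $r + γ \max_{a'} Q'(s', a')$, and the explicit gradient step on the squared error is assumed, since the listing stops at the loss calculation.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2          # hypothetical chain: action 1 = right, 0 = left
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2  # illustrative values, not from the source
BATCH_SIZE, UPDATE_INTERVAL = 8, 10


def step(state, action):
    """Toy environment (an assumption): reward 1 on reaching the last state."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def select_action(theta, state, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: theta[state][a])


def train(episodes=300, max_steps=50, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)                      # replay buffer B
    # Local "network" weights theta, randomly initialized (table stand-in).
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)]
             for _ in range(N_STATES)]
    theta_t = [row[:] for row in theta]              # theta_t <- theta
    t = 0                                            # transitions counted across episodes
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)          # random non-terminal start state
        for _ in range(max_steps):
            t += 1
            action = select_action(theta, state, EPSILON, rng)
            next_state, reward, done = step(state, action)
            buffer.append((state, action, reward, next_state, done))
            batch = rng.sample(list(buffer), min(BATCH_SIZE, len(buffer)))
            for s, a, r, s2, d in batch:
                q_sa = theta[s][a]                          # Q(s, a), local network
                q_next = 0.0 if d else max(theta_t[s2])     # max Q'(s', a'), target network
                td_error = (r + GAMMA * q_next) - q_sa
                # Assumed gradient step on the loss 0.5 * td_error ** 2.
                theta[s][a] += LR * td_error
            if t % UPDATE_INTERVAL == 0:
                theta_t = [row[:] for row in theta]         # theta_t <- theta
            state = next_state
            if done:
                break
    return theta


if __name__ == "__main__":
    learned = train()
    greedy = [max(range(N_ACTIONS), key=lambda a: learned[s][a])
              for s in range(N_STATES - 1)]
    print("greedy action per non-terminal state:", greedy)
```

Copying `θ` into `θ_t` only every `UPDATE_INTERVAL` transitions keeps the regression target fixed between synchronizations, which is the purpose of the final `if` in the listing. Here the counter `t` runs across episodes so that short episodes still trigger synchronization; counting within each episode is an equally valid reading of the listing.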