A Generalized Robot Navigation Analysis Platform (RoNAP) with Visual Results Using Multiple Navigation Algorithms

Sensors. 2022 Nov 22;22(23):9036. doi: 10.3390/s22239036

Algorithm 1 DQN training procedure.

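Input: Total number of training episodes $\Gamma_{\max}$, experience pool $D$ and its capacity, target network update frequency $F$, training batch size $B$, discount factor $\gamma$, $\varepsilon$-greedy parameter $\varepsilon$, goal tolerance $\epsilon$, and maximum total penalty $R_{\min}$ (the lowest total reward an episode may reach before it is stopped)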

Output: Target neural network $\hat{Q}$ with weights $\hat{\omega}$

1: Initialization: set the episode counter $\Gamma = 0$, the initial state $s_t$, the total reward of each episode $R = 0$, the value neural network $Q$ with random weights $\omega$, and the target network $\hat{Q}$ with weights $\hat{\omega} = \omega$

2: while $\Gamma < \Gamma_{\max}$ do

3: while $\lVert (x, y) - (x_d, y_d) \rVert \geq \epsilon$ and $R \geq R_{\min}$ do

4: Select an action $a_t$ $\varepsilon$-greedily, taking the greedy action $a_t = \arg\max_{a} Q(s_t, a; \omega)$ when not exploring

5: Execute $a_t$ and obtain the reward $r_t$ and the next state $s_{t+1}$

6: Store $(s_t, a_t, r_t, s_{t+1})$ into $D$.

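7: Randomly sample a minibatch of $B$ transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$ and compute the targets

$$y_j = \begin{cases} r_j, & \text{if } s_{j+1} \text{ is terminal}, \\ r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \hat{\omega}), & \text{otherwise}. \end{cases} \tag{11}$$

8: Compute the mean squared error loss

$$\text{loss} = \frac{1}{B} \sum_{j=1}^{B} \bigl(y_j - Q(s_j, a_j; \omega)\bigr)^2. \tag{12}$$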

9: Update $\omega$ with a gradient descent step on the loss in Eq. (12)

10: Every $F$ steps, update the target network by setting $\hat{\omega} \leftarrow \omega$ (i.e., $\hat{Q} \leftarrow Q$)

11: Update $R \leftarrow R + r_t$ and $s_t \leftarrow s_{t+1}$

12: end while

13: Set $\Gamma \leftarrow \Gamma + 1$ and $R \leftarrow 0$, and return the robot to its initial state

14: end while

15: return target neural network $\hat{Q}$ with weights $\hat{\omega}$
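The inner-loop test in step 3 can be sketched in Python as follows. This is an illustrative assumption, not code from the paper. The function name `episode_continues` and its arguments are hypothetical, and the norm is assumed to be the Euclidean distance.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def episode_continues(position: Point, goal: Point, tolerance: float,
                      total_reward: float, r_min: float) -> bool:
    """Hypothetical helper mirroring step 3 of Algorithm 1.

    The episode keeps running while the robot is still at least `tolerance`
    away from the goal (Euclidean norm assumed) and the accumulated reward R
    has not dropped below the penalty floor R_min.
    """
    distance = math.hypot(position[0] - goal[0], position[1] - goal[1])
    return distance >= tolerance and total_reward >= r_min


# Robot 5 units from the goal with R = 0 and R_min = -10: keep going.
print(episode_continues((0.0, 0.0), (3.0, 4.0), 0.5, 0.0, -10.0))  # True
```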
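Step 4 leaves the exploration rule implicit. The sketch below assumes the common convention that $\varepsilon$ is the probability of taking a random action. The paper calls $\varepsilon$ the "greedy value", so its exact role may differ. Under that assumption, an $\varepsilon$-greedy choice over the Q-values of the current state might look like this. `select_action` is a hypothetical name.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy selection over the Q-values Q(s_t, a; w) of one state.

    Assumption: epsilon is the probability of exploring (random action);
    otherwise the greedy action argmax_a Q(s_t, a; w) is returned.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest Q-value (ties resolve to the lowest index)
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
print(select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))  # 1
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing navigation algorithms on the same scenario.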
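The experience pool $D$ used in steps 6 and 7 is commonly a fixed-capacity first-in, first-out buffer, with minibatches drawn uniformly at random. The sketch below assumes that behaviour, since the paper does not say what happens when $D$ is full. It also adds a `done` flag to each stored transition so that the target computation can tell whether $s_{t+1}$ ends the episode. This is an assumption: the algorithm stores only $(s_t, a_t, r_t, s_{t+1})$. `Transition`, `ReplayPool` and `done` are hypothetical names.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]
    done: bool  # assumption: True if next_state ends the episode


class ReplayPool:
    """Fixed-capacity experience pool D with uniform minibatch sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently discards the oldest entry once full
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self.buffer.append(transition)  # step 6

    def sample(self, batch_size: int) -> List[Transition]:
        # step 7: B distinct transitions drawn uniformly without replacement
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


pool = ReplayPool(capacity=3)
for k in range(5):
    pool.store(Transition((float(k),), 0, -1.0, (float(k + 1),), False))
print(len(pool), [t.state for t in pool.buffer])  # 3 [(2.0,), (3.0,), (4.0,)]
```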
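For a sampled minibatch, the targets $y$ of Eq. (11) and the loss of Eq. (12) reduce to a few lines. This sketch makes three assumptions. The Q-values are passed in as plain lists; in practice they would come from $Q$ and $\hat{Q}$. The non-bootstrapped case $y = r$ applies when the next state is terminal. `td_targets` and `mse_loss` are hypothetical names.

```python
from typing import List, Sequence


def td_targets(rewards: Sequence[float],
               next_q_target: Sequence[Sequence[float]],
               dones: Sequence[bool], gamma: float) -> List[float]:
    """Targets y = r for terminal transitions, else r + gamma * max_a' Q_hat(s', a')."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        # q_next holds Q_hat(s_{t+1}, a'; w_hat) for every action a'
        targets.append(r if done else r + gamma * max(q_next))
    return targets


def mse_loss(targets: Sequence[float], q_taken: Sequence[float]) -> float:
    """Mean over the batch of (y - Q(s_t, a_t; w))^2."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)


y = td_targets([1.0, -1.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.9)
loss = mse_loss(y, [2.0, 0.0])  # y is about [2.8, -1.0], loss about 0.82
```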
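Step 9 updates $\omega$ by gradient descent on the mean squared error. To keep the sketch dependency-free, it replaces the paper's neural network with a linear approximator $Q(s, a; \omega) = \omega_a^\top \phi(s)$ over a hypothetical feature vector $\phi(s)$. This is an assumption made purely for illustration. With a real network, the gradient would come from automatic differentiation instead. `q_value` and `gradient_step` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Weights = List[List[float]]  # weights[a] is the weight vector w_a for action a


def q_value(w: Weights, features: Sequence[float], action: int) -> float:
    """Linear stand-in for the value network: Q(s, a; w) = w_a . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w[action], features))


def gradient_step(w: Weights, batch: Sequence[Tuple[Sequence[float], int]],
                  targets: Sequence[float], lr: float) -> None:
    """One in-place gradient-descent step on mean_j (y_j - Q(s_j, a_j; w))^2."""
    n = len(batch)
    grads = [[0.0] * len(row) for row in w]
    for (features, action), y in zip(batch, targets):
        error = y - q_value(w, features, action)
        for i, x in enumerate(features):
            # derivative of (y - w_a . phi)^2 / n with respect to w_a[i]
            grads[action][i] += -2.0 * error * x / n
    # apply the update only after all gradients use the old weights
    for a, row in enumerate(w):
        for i in range(len(row)):
            row[i] -= lr * grads[a][i]


w = [[0.0, 0.0], [0.0, 0.0]]
gradient_step(w, [((1.0, 0.0), 0)], targets=[1.0], lr=0.25)
print(w)  # [[0.5, 0.0], [0.0, 0.0]]
```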
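In DQN, the target weights are refreshed from the value network ($\hat{\omega} \leftarrow \omega$) every $F$ training steps and held fixed in between. The minimal sketch below assumes weights stored as nested Python lists and uses a hypothetical `TargetNetworkSync` helper that counts training steps.

```python
import copy
from typing import List

Weights = List[List[float]]


class TargetNetworkSync:
    """Keeps target weights w_hat and refreshes them from w every F steps."""

    def __init__(self, w: Weights, every: int) -> None:
        self.target: Weights = copy.deepcopy(w)  # initialization: w_hat = w
        self.every = every
        self.steps = 0

    def after_training_step(self, w: Weights) -> None:
        self.steps += 1
        if self.steps % self.every == 0:
            # deep copy so later updates to w do not leak into w_hat
            self.target = copy.deepcopy(w)


w = [[1.0]]
sync = TargetNetworkSync(w, every=3)
w[0][0] = 2.0
for _ in range(2):
    sync.after_training_step(w)
print(sync.target)  # [[1.0]]
sync.after_training_step(w)
print(sync.target)  # [[2.0]]
```

The deep copy matters. Assigning `self.target = w` would alias the two networks, so the target would track every gradient step and the stabilizing effect of a delayed target would be lost.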