A Shortest Distance Priority UAV Path Planning Algorithm for Precision Agriculture

Sensors (Basel). 2024 Nov 25;24(23):7514. doi: 10.3390/s24237514

Algorithm 3 Deep Q-Learning with Experience Replay.

1: Initialize replay memory $D$ to capacity $N$
2: Initialize action-value function $Q$ with random weights
3: for episode = 1, M do
4:     Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
5:     for t = 1, T do
6:         With probability $\varepsilon$ select a random action $a_t$
7:         otherwise select $a_t = \arg\max_a Q^*(\phi(s_t), a; \theta)$
8:         Execute action $a_t$ in emulator and observe reward $r_t$ and image $x_{t+1}$
9:         Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$
10:        Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$
11:        Sample random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$
12:        Set $y_j = \begin{cases} r_j & \text{for terminal } \phi_{j+1} \\ r_j + \gamma \max_{a'} Q(\phi_{j+1}, a'; \theta) & \text{for non-terminal } \phi_{j+1} \end{cases}$
13:        Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ using the RMSprop update rule
14:    end for
15: end for
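The replay memory D, the ε-greedy action choice, and the minibatch target y_j in Algorithm 3 can be illustrated with a minimal Python sketch. It assumes states are already preprocessed into tuples φ, that Q is any callable returning one value per action, and that each stored transition carries an explicit terminal flag so the target can distinguish terminal from non-terminal φ_{j+1}. The names Transition, ReplayMemory, epsilon_greedy, and td_targets are hypothetical and do not come from the paper.

```python
import random
from collections import deque
from typing import Callable, Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """A stored transition (phi_t, a_t, r_t, phi_{t+1}).

    The terminal flag is an added assumption so the target can tell
    terminal from non-terminal phi_{t+1}.
    """
    phi: tuple
    action: int
    reward: float
    next_phi: tuple
    terminal: bool


class ReplayMemory:
    """Replay memory D with fixed capacity N (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=N) drops the oldest transition once N are stored.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform random minibatch, drawn without replacement.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action, otherwise argmax_a Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(
    batch: List[Transition],
    q_fn: Callable[[tuple], Sequence[float]],
    gamma: float,
) -> List[float]:
    """y_j = r_j if phi_{j+1} is terminal, else r_j + gamma * max_a' Q(phi_{j+1}, a')."""
    return [
        t.reward if t.terminal else t.reward + gamma * max(q_fn(t.next_phi))
        for t in batch
    ]


# Usage: capacity 3, so only the three most recent transitions survive.
rng = random.Random(0)
memory = ReplayMemory(capacity=3)
for k in range(5):
    memory.store(Transition((k,), k % 2, float(k), (k + 1,), k == 4))
batch = memory.sample(2, rng)
targets = td_targets(batch, lambda phi: [0.0, float(phi[0])], gamma=0.9)
print(len(memory), targets)
```

A deque with a maximum length is one simple way to realize "replay memory to capacity N": once it is full, each new transition evicts the oldest one, so sampling always draws from the most recent N experiences.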
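The gradient descent step with the RMSprop update rule in Algorithm 3 can be sketched for a deliberately simple case. The code below assumes a linear action-value function Q(φ, a; θ) = θ_a · φ in place of the deep network used with DQN, and applies a common form of RMSprop (a running average of squared gradients with decay 0.9) to the squared error (y_j − Q(φ_j, a_j; θ))^2, treating y_j as a constant. The function name rmsprop_td_step and the hyperparameter values are illustrative assumptions, not settings reported by the authors.

```python
import math
from typing import List, Sequence


def rmsprop_td_step(
    theta: List[List[float]],
    cache: List[List[float]],
    phi: Sequence[float],
    action: int,
    target: float,
    lr: float = 0.01,
    decay: float = 0.9,
    eps: float = 1e-8,
) -> float:
    """One RMSprop step on (y_j - Q(phi_j, a_j; theta))^2 for a linear Q.

    Assumes Q(phi, a; theta) = theta[a] . phi (hypothetical stand-in for the
    deep network). theta and cache are updated in place; the loss before the
    update is returned. The target y_j is treated as a constant.
    """
    q = sum(w * x for w, x in zip(theta[action], phi))
    error = target - q
    for i, x in enumerate(phi):
        grad = -2.0 * error * x  # dL/dtheta[action][i], evaluated at the old theta
        # Running average of squared gradients (RMSprop accumulator).
        cache[action][i] = decay * cache[action][i] + (1.0 - decay) * grad * grad
        theta[action][i] -= lr * grad / (math.sqrt(cache[action][i]) + eps)
    return error * error


# Usage: repeatedly fit Q(phi, a=1) toward a fixed target y = 2.0.
theta = [[0.0, 0.0], [0.0, 0.0]]
cache = [[0.0, 0.0], [0.0, 0.0]]
losses = [rmsprop_td_step(theta, cache, [1.0, 0.5], 1, 2.0) for _ in range(200)]
print(losses[0], losses[-1])
```

Because this Q is linear in θ, only the row θ_{a_j} receives a nonzero gradient; with a neural network the same loss would instead be backpropagated through all shared weights.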