Toward Energy-Efficient Routing of Multiple AGVs with Multi-Agent Reinforcement Learning

2023 Jun 15;23(12):5615. doi: 10.3390/s23125615

Algorithm 1: MADDPG with the ϵ-greedy policy for AGVs.

for $j = 1$ to max-episode do

Initialize the parameters

for $t = 1$ to $M$ do

for $i = 1$ to $N$ do

$n$ = random number

if $n < \epsilon$ then

execute a random action $a$

else

execute the action that maximizes $Q_{t}(a)$ (this branch is taken with probability $1 - \epsilon$)

end if

$a_{i} = \mu_{\theta_{i}}(o_{i}) + \mathcal{N}_{t}$

$a = (a_{1}, \dots, a_{N})$

$r_{i} = k_{i1} \times D_{\mathrm{position}} + k_{i2} \times C_{v} \times v \times \cos(\omega) + k_{i3} \times C_{e} \times (E_{\mathrm{target}} - E_{i}) + k_{i4} \times C_{\mathrm{AGV}} + k_{i5} \times C_{\mathrm{obstacles}}$

end for
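The per-agent step above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The function names (`epsilon_greedy`, `noisy_action`, `reward`), the uniform draw for n, the Gaussian form of the noise term N_t, and the scalar reward inputs are all assumptions made for the example. How the ϵ-greedy choice and the noisy actor output are combined is not stated in the listing, so they are kept as separate helpers.

```python
import math
import random
from typing import List, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random action index if n < epsilon, else the argmax of q_values."""
    n = rng.random()  # assumption: n is drawn uniformly from [0, 1)
    if n < epsilon:
        return rng.randrange(len(q_values))  # explore: any action
    # exploit: the action with the largest Q value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def noisy_action(mu_out: Sequence[float], sigma: float,
                 rng: random.Random) -> List[float]:
    """a_i = mu(o_i) + N_t, with N_t assumed to be zero-mean Gaussian noise."""
    return [m + rng.gauss(0.0, sigma) for m in mu_out]


def reward(k: Sequence[float], d_position: float, c_v: float, v: float,
           omega: float, c_e: float, e_target: float, e_i: float,
           c_agv: float, c_obstacles: float) -> float:
    """Weighted sum of the five reward terms; k holds (k_i1, ..., k_i5)."""
    return (k[0] * d_position
            + k[1] * c_v * v * math.cos(omega)
            + k[2] * c_e * (e_target - e_i)
            + k[3] * c_agv
            + k[4] * c_obstacles)


# Usage: with epsilon = 0 the draw n is never below epsilon,
# so the greedy branch is always taken and index 1 is returned.
rng = random.Random(0)
greedy_choice = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng)
```

Passing the random generator explicitly keeps the sketch reproducible; the weights `k` and the cost terms would come from the reward design described in the paper.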

for agent $i = 1$ to $N$ do

$y^{j} = r_{i}^{j} + \gamma Q_{i}^{\mu'}(x'^{j}, a_{1}', \dots, a_{N}')$

$L(\theta_{i}) = \frac{1}{s} \sum_{j} \left( y^{j} - Q_{i}^{\mu}(x^{j}, a_{1}^{j}, \dots, a_{N}^{j}) \right)^{2}$

$\nabla_{\theta_{i}} J \simeq \frac{1}{s} \sum_{j} \nabla_{\theta_{i}} \mu_{i}(o_{i}^{j}) \nabla_{a_{i}} Q_{i}^{\mu}(x^{j}, a_{1}^{j}, \dots, a_{i}, \dots, a_{N}^{j}) \big|_{a_{i} = \mu_{i}(o_{i}^{j})}$

end for
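The critic part of this update (the target y and the mean squared loss over s samples) can be illustrated with a small sketch. The `Sample` container, the function names, and the callables `q` and `q_target` standing in for the critic and its target network are hypothetical, and each agent's action is reduced to a single float for brevity. The policy-gradient step is left out because it needs automatic differentiation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A critic maps (joint state, joint action) to a scalar Q value.
Critic = Callable[[Sequence[float], Sequence[float]], float]


@dataclass
class Sample:
    x: List[float]             # joint state x^j
    actions: List[float]       # (a_1^j, ..., a_N^j)
    reward: float              # r_i^j for the agent being updated
    x_next: List[float]        # next joint state x'^j
    next_actions: List[float]  # (a_1', ..., a_N') from the target policies


def td_target(sample: Sample, q_target: Critic, gamma: float) -> float:
    """y^j = r_i^j + gamma * Q'_i(x'^j, a_1', ..., a_N')."""
    return sample.reward + gamma * q_target(sample.x_next, sample.next_actions)


def critic_loss(batch: Sequence[Sample], q: Critic, q_target: Critic,
                gamma: float) -> float:
    """Mean squared error between targets and current Q values over the batch."""
    s = len(batch)
    return sum((td_target(b, q_target, gamma) - q(b.x, b.actions)) ** 2
               for b in batch) / s


# Usage with toy linear critics (purely illustrative values).
q = lambda x, a: sum(x) + sum(a)
q_target = lambda x, a: 0.5 * (sum(x) + sum(a))
batch = [
    Sample([1.0], [0.0, 1.0], 1.0, [2.0], [1.0, 1.0]),
    Sample([0.0], [0.0, 0.0], 3.0, [0.0], [0.0, 0.0]),
]
loss = critic_loss(batch, q, q_target, gamma=0.5)
```

In a real setup the critics would be neural networks and the loss would be minimized by gradient descent on the critic parameters; the sketch only shows how the two formulas fit together.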

for $i = 1$ to $N$ do

$\theta_{i}' \leftarrow \tau \theta_{i} + (1 - \tau) \theta_{i}'$

end for
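The target-network update in this last loop is a soft (Polyak) update. A minimal sketch, assuming each agent's parameters are stored as a flat list of floats and using a hypothetical helper name `soft_update`:

```python
from typing import List, Sequence


def soft_update(theta: Sequence[float], theta_target: Sequence[float],
                tau: float) -> List[float]:
    """Return tau * theta + (1 - tau) * theta_target, element by element."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(theta, theta_target)]


# Usage: one update per agent, as in the loop over i = 1..N.
# The parameter values and tau are illustrative only.
online = [[1.0, 2.0], [4.0, -2.0]]   # theta_i for two agents
targets = [[0.0, 0.0], [0.0, 0.0]]   # theta_i' for two agents
targets = [soft_update(th, th_t, tau=0.5) for th, th_t in zip(online, targets)]
```

A small τ makes the target parameters trail the online parameters slowly, which is the usual reason for this kind of update; the value of τ used in the paper is not given in this listing.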