Algorithm 1: MADDPG with the $\epsilon$-greedy policy for AGVs.

for episode = 1 to max-episode do
    Initialize the parameters
    for t = 1 to M do
        for agent i = 1 to N do
            n = random number
            if n < $\epsilon$ then
                execute a random action a
            else
                execute the action that maximizes the agent's action value
            end if
            [...]
        end for
        for agent i = 1 to N do
            [...]
        end for
        for agent i = 1 to N do
            [...]
        end for
    end for
end for
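The exploration branch of Algorithm 1 (draw a random number n, then either execute an arbitrary action or the action that maximizes the value estimate) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite, discrete action set, the usual convention that the random action is taken when n < $\epsilon$, and a hypothetical callable `q_value` standing in for an agent's critic; the function name `epsilon_greedy_action` and the toy action labels are likewise illustrative.

```python
import random
from typing import Callable, Hashable, Optional, Sequence


def epsilon_greedy_action(
    actions: Sequence[Hashable],
    q_value: Callable[[Hashable], float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Pick one action for a single agent with an epsilon-greedy rule.

    `q_value` is a hypothetical stand-in for the agent's value estimate
    of each candidate action in the current state.
    """
    if not actions:
        raise ValueError("actions must not be empty")
    rng = rng or random.Random()

    n = rng.random()  # uniform draw in [0.0, 1.0)
    if n < epsilon:
        # Explore: execute any action, chosen uniformly at random.
        return rng.choice(actions)
    # Exploit: execute the action with the highest value estimate.
    return max(actions, key=q_value)


# Toy usage with made-up values for three AGV moves.
q_table = {"left": 0.1, "stay": 0.7, "right": 0.4}
actions = list(q_table)

# With epsilon = 0.0 the rule never explores, so it returns "stay".
greedy = epsilon_greedy_action(actions, q_table.get, epsilon=0.0)

# With epsilon = 1.0 every call explores and returns some valid action.
explored = epsilon_greedy_action(actions, q_table.get, epsilon=1.0,
                                 rng=random.Random(0))
```

In practice $\epsilon$ is often decayed over episodes so that agents explore early and exploit later; whether Algorithm 1 uses a fixed or decaying $\epsilon$ is not stated in the listing.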