Algorithm 1: Proposed SAC-based path planning algorithm for a multi-arm manipulator.
1: Define the MAMMDP, the augmented state, the state, the goal state, and …
2: Initialize network parameters
3: Initialize the parameter values of the target network
4: Initialize global replay memory
5: …
6: for … to M do
7:   Initialize local buffer  ▹ Memory for an episode
8:   for … to … do
9:     Randomly choose the goal and initial positions
10:    …
11:    …
12:    …
13:    if … then  ▹ Get next state and reward
14:      …
15:      …
16:    else if … then
17:      …
18:      …
19:    else if … then
20:      …
21:      Terminate due to goal arrival
22:    end if
23:    …
24:    Store the transition in …
25:    ▹ Parameters update
26:    Sample a mini-batch of m transitions from …
27:    …
28:    …
29:    …
30:    …
31:    Update the parameters of each network by gradient descent
32:    using …
33:    …
34:    Update the state value target
35:  end for
36:  …
37:  if … then  ▹ HER
38:    Set an additional goal
39:    for … to … do
40:      Sample a transition from …
41:      if … then
42:        …
43:      else
44:      end if
45:      Store the transition in …
46:    end for
47:  end if
48: end for
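Lines 4 and 26 rely on a global replay memory that is filled during training and from which mini-batches of m transitions are drawn. The sketch below shows one common way to build such a memory. The fixed capacity, the first-in-first-out eviction and uniform sampling without replacement are assumptions; the algorithm does not specify them. `ReplayMemory` and its methods are hypothetical names.

```python
import random
from collections import deque


class ReplayMemory:
    """Bounded FIFO replay memory (hypothetical): the oldest transitions are evicted when full."""

    def __init__(self, capacity: int):
        # deque with maxlen silently drops the oldest element on overflow
        self._buffer = deque(maxlen=capacity)

    def store(self, transition) -> None:
        """Append a single transition."""
        self._buffer.append(transition)

    def extend(self, transitions) -> None:
        """Append many transitions, e.g. the relabeled ones produced by HER."""
        self._buffer.extend(transitions)

    def sample(self, m: int, rng=random):
        """Draw m distinct transitions uniformly at random (raises ValueError if m > len)."""
        return rng.sample(list(self._buffer), m)

    def __len__(self) -> int:
        return len(self._buffer)
```

The episode's local buffer (line 7) can be an ordinary list, because it only needs to hold one episode and is discarded afterwards.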
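The loss expressions in lines 27 to 32 did not survive extraction. Line 34 mentions a state value target, which suggests the original SAC formulation with a separate value network. Under that assumption, each network regresses onto a scalar target per transition. The sketch computes these targets and the losses for plain floats. It assumes twin Q-functions combined with a minimum, a temperature `alpha` and a discount `gamma`, none of which are given in the algorithm. All function names are hypothetical. In a real implementation, the same expressions would be evaluated on network outputs and differentiated automatically.

```python
from typing import Sequence


def soft_value_target(q1_pi: float, q2_pi: float, log_prob_pi: float, alpha: float) -> float:
    """Target for V(s): min(Q1, Q2)(s, a) - alpha * log pi(a|s), with a freshly sampled from pi(.|s)."""
    return min(q1_pi, q2_pi) - alpha * log_prob_pi


def soft_q_target(reward: float, done: bool, v_target_next: float, gamma: float) -> float:
    """Target for Q(s, a): r + gamma * V_target(s'); the bootstrap term is dropped at terminal steps."""
    return reward + (0.0 if done else gamma * v_target_next)


def policy_loss(q1_pi: float, q2_pi: float, log_prob_pi: float, alpha: float) -> float:
    """Per-sample policy loss alpha * log pi(a|s) - min(Q1, Q2)(s, a), minimized by gradient descent."""
    return alpha * log_prob_pi - min(q1_pi, q2_pi)


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between network outputs and their targets over a mini-batch."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
```

Taking the minimum of two Q estimates counters the overestimation bias of a single critic. The `done` flag ensures that a transition ending in goal arrival (line 21) is not bootstrapped past the end of the episode.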
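Line 34 updates the state value target, but the rule is not given. A common choice in SAC is Polyak averaging with a small step size τ. The sketch assumes that rule and treats the parameters as flat lists of floats. `soft_update` is a hypothetical name. With τ = 1 it reduces to a hard copy, which some implementations apply periodically instead.

```python
from typing import List


def soft_update(target: List[float], source: List[float], tau: float) -> List[float]:
    """Polyak averaging: target <- tau * source + (1 - tau) * target (assumed update rule)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    if len(target) != len(source):
        raise ValueError("target and source must have the same number of parameters")
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

A small τ (values around 0.005 are typical in SAC implementations) makes the target change slowly. This keeps the Q target of the previous step from chasing a moving estimate.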
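Lines 37 to 47 apply hindsight experience replay (HER) to the episode stored in the local buffer. Line 38 sets one additional goal. Each sampled transition is then re-scored against that goal (lines 41 to 44) and stored (line 45). The sketch below assumes the "final" strategy: the additional goal is the position reached at the end of the episode. It also assumes a sparse reward with a distance tolerance. The reward values, the tolerance, the `achieved_goal` field and the function name are all hypothetical, because the algorithm's own conditions and rewards were lost. If the networks take the augmented state (state concatenated with goal) as input, it would be rebuilt from the relabeled `goal` field at sampling time.

```python
import math
import random
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]

# Hypothetical constants: the algorithm does not state its rewards or tolerance.
GOAL_REWARD = 0.0
STEP_REWARD = -1.0
TOLERANCE = 0.05


@dataclass(frozen=True)
class Transition:
    state: Vec
    action: Vec
    reward: float
    next_state: Vec
    achieved_goal: Vec  # position actually reached after the action (assumed field)
    goal: Vec
    done: bool


def her_relabel(local_buffer: List[Transition], global_memory: List[Transition],
                k: int, rng: random.Random) -> None:
    """Relabel k sampled transitions of one episode with an additional goal."""
    if not local_buffer:
        return
    # Line 38: additional goal = final achieved position ("final" strategy, assumed)
    additional_goal = local_buffer[-1].achieved_goal
    for _ in range(k):                                   # line 39
        t = rng.choice(local_buffer)                     # line 40
        # Lines 41-44: recompute the reward for the substituted goal (assumed test)
        if math.dist(t.achieved_goal, additional_goal) <= TOLERANCE:
            reward, done = GOAL_REWARD, True
        else:
            reward, done = STEP_REWARD, False
        # Line 45: store the relabeled copy; the original transition is unchanged
        global_memory.append(replace(t, goal=additional_goal, reward=reward, done=done))
```

Relabeling gives episodes that never reach the sampled goal at least some transitions with a success reward. This is why HER helps with the sparse goal-arrival signal in line 21.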