Biped Robots Control in Gusty Environments with Adaptive Exploration Based DDPG

2024 Jun 8;9(6):346. doi: 10.3390/biomimetics9060346

Algorithm 1 Modified HER Algorithm for Retaining Experience with Adjusted Rewards

1: Initialize environment and agent
2: $\mathit{state\_history} \leftarrow [\,]$
3: $\mathit{fallen\_index} \leftarrow -1$
4: while task not finished do
5:     $\mathit{action} \leftarrow$ agent selects action($\mathit{current\_state}$)
6:     $\mathit{next\_state}, \mathit{reward}, \mathit{done}, \mathit{info} \leftarrow$ environment executes action($\mathit{action}$)
7:     Append $\mathit{current\_state}$ to $\mathit{state\_history}$
8:     if $\mathit{info}[\text{'fallen'}] = \text{True}$ then
9:         $\mathit{done} \leftarrow \text{True}$
10:         $\mathit{fallen\_index} \leftarrow \text{length}(\mathit{state\_history})$
11:     end if
12:     $\mathit{current\_state} \leftarrow \mathit{next\_state}$
13: end while
14: if $\mathit{fallen\_index} \geq 20$ then
15:     $\mathit{successful\_states} \leftarrow \mathit{state\_history}[0 : \mathit{fallen\_index} - 20]$
16:     $\mathit{last\_actions} \leftarrow \mathit{state\_history}[\mathit{fallen\_index} - 20 : \mathit{fallen\_index}]$
17:     Add a reward of 10 to the reward of $\mathit{successful\_states}$
18:     Process $\mathit{successful\_states}$ for further learning as successful attempts
19:     Store $\mathit{last\_actions}$ in experience pool without additional reward
20: end if
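
The post-episode split in Algorithm 1 can be illustrated with a short Python sketch. This is only a minimal sketch under stated assumptions, not the authors' implementation: Algorithm 1 records states only, whereas the sketch assumes each step is kept as a full transition so that the reward can be adjusted. The names `Transition` and `relabel_episode` are hypothetical; the 20-step window and the bonus of 10 are taken from Algorithm 1.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Transition(NamedTuple):
    """One environment step (hypothetical container, not from the paper)."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]
    done: bool


def relabel_episode(
    episode: List[Transition],
    fallen_index: int,
    tail: int = 20,       # steps just before the fall, kept without bonus
    bonus: float = 10.0,  # extra reward for the earlier, "successful" steps
) -> Tuple[List[Transition], List[Transition]]:
    """Split an episode as in steps 14-20 of Algorithm 1.

    Returns (successful, last_steps). Both lists are empty when the robot
    did not fall (fallen_index == -1) or fell within the first `tail` steps.
    """
    if fallen_index < tail:
        return [], []
    cut = fallen_index - tail
    # Steps before the final window receive the bonus reward.
    successful = [t._replace(reward=t.reward + bonus) for t in episode[:cut]]
    # The final window before the fall is stored with its original reward.
    last_steps = list(episode[cut:fallen_index])
    return successful, last_steps


# Usage: a dummy 30-step episode in which the fall is detected at the last step.
episode = [
    Transition([float(i)], [0.0], 1.0, [float(i + 1)], i == 29)
    for i in range(30)
]
successful, last_steps = relabel_episode(episode, fallen_index=30)
```

With this dummy episode, the first 10 transitions receive the bonus and the final 20 are returned unchanged; both lists would then be added to the replay buffer of the DDPG agent.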