Algorithm 1 RLCA algorithm code

Initialize the training environment

Initialize the replay memory buffer D to its capacity

Initialize the evaluation network with random weights θ

Initialize the target network with weights θ⁻ = θ

For episode = 1, M do

Initialize the positions of the own USV and the obstacle USVs

While true

Update the training environment

With probability ε, select a random USV action a_t ∈ A_t

Otherwise, select USV action a_t = argmax_a Q(s_t, argmax_a Q(s_t, a; θ); θ⁻)

Execute action a_t in the training environment and observe s_{t+1}

Obtain the reward R_all = R_goal + R_collision + R_COLREGs + R_φ + R_Δφ from the maneuvering situation and the COLREGs

Determine the category of s_{t+1} and increment its count n(S) by one, following the category-based exploration method

Obtain the exploration reward R_explore = σ / (n(S) + d_t) from category-based exploration

Obtain the total reward R_improve = R_all + R_explore

Store the transition (s_t, a_t, r_t, s_{t+1}) in the replay memory buffer D

Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D

Obtain y_j = r_j if step j+1 is terminal, and y_j = r_j + γ max_a Q(s_{j+1}, a; θ⁻) otherwise

Perform a gradient descent step on (y_j − Q(s_j, a_j; θ))² to update the evaluation network parameters θ

If the number of steps reaches the target update interval N

Update the target network weights θ⁻ = θ

End if

Increment the number of steps by 1

End while

End for

Return the weights θ* = θ
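
To make the listing concrete, the following Python sketch shows one way the RLCA training loop could be implemented with PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the USV environment (a hypothetical env with reset()/step() returning the shaped reward R_all and an info dict carrying the state category and the distance term d_t), the network sizes, and all hyperparameters (σ, ε, learning rate, update interval N) are placeholders.

import random
from collections import deque, defaultdict

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


class QNet(nn.Module):
    """Small fully connected Q-network: state -> one Q-value per discrete action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s):
        return self.net(s)


def train_rlca(env, state_dim, n_actions, episodes=500, gamma=0.99, eps=0.1,
               sigma=1.0, buffer_size=100_000, batch_size=64,
               target_update=200, lr=1e-3):
    q_eval = QNet(state_dim, n_actions)       # evaluation network (theta)
    q_target = QNet(state_dim, n_actions)     # target network (theta-)
    q_target.load_state_dict(q_eval.state_dict())
    optimizer = optim.Adam(q_eval.parameters(), lr=lr)

    replay = deque(maxlen=buffer_size)        # replay memory buffer D
    visit_count = defaultdict(int)            # n(S): visits per state category
    step = 0

    for _ in range(episodes):
        state, done = env.reset(), False      # assumed Gym-style interface
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    q = q_eval(torch.as_tensor(state, dtype=torch.float32))
                    action = int(q.argmax())

            # env.step is assumed to return the shaped reward R_all
            # (goal + collision + COLREGs + heading terms) and an info dict
            # with the state category and the distance term d_t.
            next_state, r_all, done, info = env.step(action)

            # category-based exploration bonus (sketch of R_explore)
            visit_count[info["category"]] += 1
            r_explore = sigma / (visit_count[info["category"]] + info["d_t"])
            reward = r_all + r_explore         # R_improve

            replay.append((state, action, reward, next_state, done))
            state = next_state
            step += 1

            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                s  = torch.as_tensor(np.array([b[0] for b in batch]), dtype=torch.float32)
                a  = torch.as_tensor([b[1] for b in batch], dtype=torch.int64)
                r  = torch.as_tensor([b[2] for b in batch], dtype=torch.float32)
                s2 = torch.as_tensor(np.array([b[3] for b in batch]), dtype=torch.float32)
                d  = torch.as_tensor([float(b[4]) for b in batch], dtype=torch.float32)

                # Double-DQN target: the evaluation network picks the action,
                # the target network scores it; terminal states use r only.
                with torch.no_grad():
                    best_a = q_eval(s2).argmax(dim=1, keepdim=True)
                    q_next = q_target(s2).gather(1, best_a).squeeze(1)
                    y = r + gamma * q_next * (1.0 - d)

                q_sa = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q_sa, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # periodically copy theta into the target network (theta-)
            if step % target_update == 0:
                q_target.load_state_dict(q_eval.state_dict())

    return q_eval.state_dict()                 # theta*

The minibatch update mirrors the y_j definition in the listing: the evaluation network selects the greedy action, the target network evaluates it, and terminal transitions use the reward alone.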