Algorithm 1 RLCA algorithm

Initialize the training environment
Initialize the replay memory buffer to capacity D
Initialize the evaluate network with random weights θ
Initialize the target network with random weights θ⁻ = θ
For episode = 1, M do
    Initialize the initial positions of the own USV and the obstacle USVs
    While true do
        Update the training environment
        With probability ε, select a random USV action a_t
        Otherwise, select the USV action a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the training environment, and obtain the next state s_{t+1}
        Obtain the reward r_m via maneuvering and the COLREGs
        Obtain the category of the encounter, and add one to its count by the method of category-based exploration
        Obtain a reward r_c based on category-based exploration
        Obtain the total reward r_t = r_m + r_c
        Store the transition (s_t, a_t, r_t, s_{t+1}) in replay memory buffer D
        Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
        Obtain the target y_j = r_j + γ max_{a'} Q(s_{j+1}, a'; θ⁻)
        Update the evaluate network parameters θ with gradient descent on (y_j − Q(s_j, a_j; θ))²
        If the number of steps reaches the update step N then
            Update the target network with weights θ⁻ = θ
        End if
        Increment the number of steps by one
    End while
End for
Return the weights θ
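The sketch below illustrates how the loop in Algorithm 1 could be organized in code. It is a minimal, assumption-laden Python/PyTorch illustration, not the paper's implementation: the Gym-style environment interface, the state/action dimensions, the "encounter_category" field in the step info, and the 1/count exploration bonus are all placeholders standing in for the USV simulator, COLREGs reward, and category-based exploration reward defined elsewhere in the paper. Only the DQN bookkeeping (replay buffer, evaluate/target networks, ε-greedy selection, periodic target update) follows the listing directly.

```python
import random
from collections import deque, defaultdict

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 5                    # assumed sizes of the USV state/action spaces
GAMMA, EPSILON, LR = 0.99, 0.1, 1e-3           # assumed hyperparameters
MEMORY_CAPACITY, BATCH_SIZE, TARGET_UPDATE_N = 10_000, 64, 200

def build_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_eval, q_target = build_net(), build_net()
q_target.load_state_dict(q_eval.state_dict())  # θ⁻ ← θ
optimizer = torch.optim.Adam(q_eval.parameters(), lr=LR)
memory = deque(maxlen=MEMORY_CAPACITY)         # replay memory buffer D
category_counts = defaultdict(int)             # counts used by category-based exploration
step = 0

def select_action(state):
    # ε-greedy selection over the evaluate network
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_eval(torch.tensor(state, dtype=torch.float32)).argmax())

def train_episode(env, max_steps=500):
    global step
    state = env.reset()                        # own USV and obstacle USVs re-initialized
    for _ in range(max_steps):
        action = select_action(state)
        # reward from maneuvering and the COLREGs comes from the environment (assumed interface)
        next_state, r_maneuver, done, info = env.step(action)
        category = info["encounter_category"]  # assumed: env reports the encounter category
        category_counts[category] += 1
        r_explore = 1.0 / category_counts[category]   # placeholder exploration bonus, not the paper's formula
        total_reward = r_maneuver + r_explore
        memory.append((state, action, total_reward, next_state, done))

        if len(memory) >= BATCH_SIZE:
            batch = random.sample(memory, BATCH_SIZE)
            s, a, r, s2, d = map(lambda x: torch.tensor(x, dtype=torch.float32), zip(*batch))
            a = a.long()
            with torch.no_grad():
                y = r + GAMMA * q_target(s2).max(dim=1).values * (1 - d)   # TD target from target network
            q = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        step += 1
        if step % TARGET_UPDATE_N == 0:
            q_target.load_state_dict(q_eval.state_dict())  # periodic target-network update
        state = next_state
        if done:
            break
```

In this sketch the target network is refreshed every TARGET_UPDATE_N environment steps, mirroring the "update step N" condition in the listing; all other numerical choices are illustrative defaults.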