Algorithm 2. Training the prediction neural network
Network structure:
input layer: neurons (agent positions and target location probabilities, both relative to the size of the domain),
hidden layer: neurons,
output layer: one neuron per possible action.
Activation function:
sigmoid function.
Loss function:
mean squared error (MSE).
Input: domain,
set of possible actions,
probability of true alarms (Equation (3)),
rate of false alarms and their probability (Equation (4)),
sensor sensitivity,
discount factor,
objective probability map (obtained by using the value ),
number of iterations between updates of the target network weights,
initial value (Equation (22)) and its discount factor,
learning rate (depending on the type of optimizer),
number of epochs,
initial weights of the prediction network and initial weights of the target network,
training data set (that is, the table of pairs created by Procedure 1).
Output: the trained prediction network.
1. Create the prediction network.
2. Create the target network as a copy of the prediction network.
3. For each epoch do:
   4. For each pair from the training data set do:
      5. For each action do:
         6. Calculate the value with the prediction network.
         7. Calculate the probability (Equation (22)).
      8. End for.
      9. Choose an action according to the probabilities.
      10. Apply the chosen action and set the next position of the agent.
      11. Calculate the next probability map with Equations (20) and (21).
      12. If or , then
         13. Set the immediate reward.
      14. Else
         15. Calculate the immediate reward with respect to and (Equation (14)).
      16. End if.
      17. For each action do:
         18. If then
            19. Set .
         20. Else
            21. Calculate the value with the target network.
         22. End if.
      23. End for.
      24. Calculate the target value (Equation (17)).
      25. Calculate the temporal difference (TD) learning error for the chosen action (Equation (19)) and set the error to zero for all other actions.
      26. Update the weights of the prediction network by backpropagating this error.
      27. Once every specified number of iterations, set the weights of the target network equal to those of the prediction network.
   28. End for.
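The training loop above can be illustrated with a minimal NumPy sketch. It is a hypothetical illustration, not the authors' implementation, and it rests on several assumptions. Equation (22) is taken to be a Boltzmann (softmax) distribution over the network outputs with a temperature `eta0` that is multiplied by `eta_decay` after every epoch. The target value of Equation (17) is assumed to have the standard deep Q-learning form r + γ·max Q_target(x', a'). The hidden layer uses the sigmoid and the output layer is assumed to be linear. Steps 10 to 16 (next position, Equations (20), (21) and (14)) are abstracted into a user-supplied `step` function that returns the next input vector, the reward, and a terminal flag, and that flag stands in for the conditions in steps 12 and 18. All names (`QNetwork`, `boltzmann_probs`, `train`, `toy_step`, `sync_every`, and the rest) and all default values are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def boltzmann_probs(q, temperature):
    # Assumed form of Equation (22): softmax of the outputs divided by a temperature.
    z = (q - q.max()) / temperature  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class QNetwork:
    """Input -> sigmoid hidden layer -> linear output (one neuron per action, assumed)."""

    def __init__(self, n_in, n_hidden, n_actions, rng):
        self.W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)

    def forward(self, x):
        h = sigmoid(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def copy_from(self, other):
        for name in ("W1", "b1", "W2", "b2"):
            setattr(self, name, getattr(other, name).copy())

    def backprop(self, x, h, err, lr):
        # Gradient descent on 0.5 * sum(err**2), where err = target - output.
        d_out = -err
        d_h = (self.W2.T @ d_out) * h * (1.0 - h)  # chain rule through the sigmoid
        self.W2 -= lr * np.outer(d_out, h)
        self.b2 -= lr * d_out
        self.W1 -= lr * np.outer(d_h, x)
        self.b1 -= lr * d_h


def train(states, step, n_actions, n_hidden=16, gamma=0.9, lr=0.05,
          epochs=50, sync_every=20, eta0=1.0, eta_decay=0.99, seed=0):
    rng = np.random.default_rng(seed)
    pred = QNetwork(states[0].size, n_hidden, n_actions, rng)    # step 1
    target = QNetwork(states[0].size, n_hidden, n_actions, rng)
    target.copy_from(pred)                                       # step 2
    eta, iteration = eta0, 0
    for _ in range(epochs):                                      # step 3
        for x in states:                                         # step 4
            q, h = pred.forward(x)                               # steps 5-8
            a = rng.choice(n_actions, p=boltzmann_probs(q, eta))  # step 9
            x_next, r, terminal = step(x, a)                     # steps 10-16
            # Steps 17-23: zero next values when the episode ends (assumed stand-in for step 18).
            q_next = np.zeros(n_actions) if terminal else target.forward(x_next)[0]
            y = r + gamma * q_next.max()                         # step 24
            err = np.zeros(n_actions)
            err[a] = y - q[a]                                    # step 25
            pred.backprop(x, h, err, lr)                         # step 26
            iteration += 1
            if iteration % sync_every == 0:                      # step 27
                target.copy_from(pred)
        eta *= eta_decay  # assumed: temperature discounted once per epoch
    return pred


def toy_step(x, a):
    # Hypothetical one-step environment: action 0 pays 1, other actions pay 0, episode ends.
    return x, (1.0 if a == 0 else 0.0), True


states = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])]
net = train(states, toy_step, n_actions=2, epochs=200)
q_trained, _ = net.forward(states[0])
print(q_trained)
```

In the toy run, the output for action 0 should move toward 1 and the output for action 1 toward 0. Because only the chosen action's entry of the error vector is nonzero (step 25), the other output neurons receive no gradient from that sample, so each update corrects a single action value.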