Algorithm 2. Training the prediction neural network
Network structure:
  input layer: 2n neurons (n neurons for the agent's position and n for the target location probabilities, one per cell of the domain of size n),
  hidden layer: 2n neurons,
  output layer: 9 neurons (in accordance with the number of possible actions).
Activation function:
  sigmoid function f(x) = 1/(1 + e^(-x)).
Loss function:
  mean square error (MSE) function.
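
For concreteness, this specification translates into a small feed-forward model. The following is a minimal sketch, assuming a PyTorch implementation; the class name PredictionNetwork is illustrative, and since the algorithm does not state whether the sigmoid also applies to the 9 output neurons, the sketch applies it only to the hidden layer.

```python
import torch
import torch.nn as nn

class PredictionNetwork(nn.Module):
    """Feed-forward Q-network: 2n inputs -> 2n sigmoid hidden units -> 9 action values."""

    def __init__(self, n: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(2 * n, 2 * n),   # input layer -> hidden layer
            nn.Sigmoid(),              # sigmoid activation, as specified
            nn.Linear(2 * n, 9),       # hidden layer -> one output per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Mean square error, as specified for the loss function.
loss_fn = nn.MSELoss()
```
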
Input: domain C = {c_1, c_2, …, c_n},
  set A of the nine possible actions,
  probability p_TA of true alarms (Equation (3)),
  rate α of false alarms and their probability p_FA = α·p_TA (Equation (4)),
  sensor sensitivity λ,
  discount factor γ,
  objective probability map P* (obtained by using the value ε),
  number r of iterations for updating the weights,
  initial value η (Equation (22)) and its discount factor δ,
  learning rate ρ (with respect to the type of optimizer),
  number M of epochs,
  initial weights w of the prediction network and initial weights w′ = w of the target network,
  training data set (that is, the L×N table of (c, P) pairs created by Procedure 1).
Output: The trained prediction network.
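
Before the (c, P) pairs produced by Procedure 1 can be fed to the network, they must be flattened into the 2n-dimensional input described above. The encoding is not spelled out here; the helper below is a hypothetical sketch that assumes a one-hot vector over the n cells for the agent's position, concatenated with the n-cell probability map.

```python
import torch

def encode_state(agent_cell: int, prob_map: list[float]) -> torch.Tensor:
    """Assumed encoding: one-hot agent position concatenated with the probability map P."""
    n = len(prob_map)
    position = torch.zeros(n)
    position[agent_cell] = 1.0          # mark the cell currently occupied by the agent
    return torch.cat([position, torch.tensor(prob_map, dtype=torch.float32)])
```
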
1. Create the prediction network.
2. Create the target network as a copy of the prediction network.
3. For each epoch j=1,,M do:
4. For each pair (c, P) from the training data set, do:
5. For each action a ∈ A do:
6. Calculate the value Q(c, P, a; w) with the prediction network.
7. Calculate the probability p(a | Q; η) (Equation (22)).
8. End for.
9. Choose an action according to the probabilities p(a | Q; η).
10. Apply the chosen action and set the next position c′ = a(c) of the agent.
11. Calculate the next probability map P′ with Equations (20) and (21).
12. If P′ = P* or c′ ∉ C, then
13. Set the immediate reward R(a) = 0.
14. Else
15. Calculate the immediate reward R(a) with respect to P and P′ (Equation (14)).
16. End if.
17. For each action a ∈ A do:
18. If P′ = P*, then
19. Set Q(c′, P′, a; w′) = 0.
20. Else
21. Calculate the value Q(c′, P′, a; w′) with the target network.
22. End if.
23. End for.
24. Calculate the target value Q⁺ = R(a) + γ·max_{a∈A} Q(c′, P′, a; w′) (Equation (17)).
25. Calculate the temporal-difference learning error as Δ_l Q = Q⁺ − Q(c, P, a; w) for the chosen action a (Equation (19)) and set Δ_l Q = 0 for all other actions.
26. Update the weights w in the prediction network by backpropagation with respect to the error Δ_l Q.
27. Every r iterations, set the weights of the target network as w′ = w.
28. End for.
29. End for.
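
Taken together, the numbered steps above follow the familiar two-network (DQN-style) update with softmax exploration. The sketch below is an illustration of that control flow under several stated assumptions, not the authors' implementation: Equation (22) is taken to be a Boltzmann/softmax rule with temperature η (discounted by δ once per epoch), the optimizer is plain SGD with learning rate ρ, and apply_action, next_prob_map, immediate_reward and is_terminal are hypothetical placeholders for Equations (20), (21), (14) and the termination test. It reuses the PredictionNetwork and encode_state helpers sketched above.

```python
import copy
import torch

def train(pred_net, dataset, *, gamma, eta, delta, r, rho, epochs,
          apply_action, next_prob_map, immediate_reward, is_terminal):
    """DQN-style training loop mirroring Algorithm 2 (illustrative sketch)."""
    target_net = copy.deepcopy(pred_net)                # steps 1-2: target network starts as a copy
    optimizer = torch.optim.SGD(pred_net.parameters(), lr=rho)   # optimizer type is an assumption
    loss_fn = torch.nn.MSELoss()
    iteration = 0

    for epoch in range(epochs):                         # step 3: epochs
        for c, P in dataset:                            # step 4: (c, P) pairs from Procedure 1
            x = encode_state(c, P)
            q_values = pred_net(x)                      # steps 5-8: Q(c, P, a; w) for all actions
            probs = torch.softmax(q_values / eta, dim=0)    # assumed form of Equation (22)
            a = torch.multinomial(probs, 1).item()      # step 9: sample an action

            c_next = apply_action(c, a)                 # step 10: c' = a(c)
            P_next = next_prob_map(P, c_next)           # step 11: Equations (20) and (21)
            terminal = is_terminal(P_next, c_next)      # P' = P* or the agent left the domain
            reward = 0.0 if terminal else immediate_reward(P, P_next)   # steps 12-16

            # Steps 17-24: target value from the target network (both termination cases treated alike here).
            with torch.no_grad():
                q_next = torch.zeros(9) if terminal else target_net(encode_state(c_next, P_next))
            q_target = reward + gamma * q_next.max()    # Q+ = R(a) + gamma * max_a Q(c', P', a; w')

            # Steps 25-26: nonzero error only on the chosen action, then backpropagate.
            target_vec = q_values.detach().clone()
            target_vec[a] = q_target
            loss = loss_fn(q_values, target_vec)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            iteration += 1
            if iteration % r == 0:                      # step 27: sync w' = w every r iterations
                target_net.load_state_dict(pred_net.state_dict())
        eta *= delta                                    # assumed schedule: discount the temperature each epoch

    return pred_net
```
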