Detection of Static and Mobile Targets by an Autonomous Agent with Deep Q-Learning Abilities

2022 Aug 22;24(8):1168. doi: 10.3390/e24081168

Algorithm 1. Generating the training data set

Input: domain $C = \{c_{1}, c_{2}, \dots, c_{n}\}$; set $A = \{\uparrow, \nearrow, \rightarrow, \searrow, \downarrow, \swarrow, \leftarrow, \nwarrow, \odot\}$ of possible actions; probability $p_{TA}$ of true alarms (Equation (3)); rate $\alpha$ of false alarms and their probability $p_{FA} = \alpha p_{TA}$ (Equation (4)); sensor sensitivity $\lambda$; range $[\xi_{1}, \xi_{2}]$ of possible numbers $0 < \xi_{1} < \xi_{2} \leq n - 1$ of targets; length $L \in (0, \infty)$ of the agent’s trajectory; number $N \in (0, \infty)$ of agent trajectories; initial probability map $P(0)$ on the domain $C$.

Output: data set that is an $L \times N$ table of pairs $(c, P)$ of agent positions $c$ and corresponding probability maps $P$.

1. Create the $L \times N$ data table.

2. For each agent trajectory $j = 1, \dots, N$ do:

3.     Choose a number $\xi \in [\xi_{1}, \xi_{2}]$ of targets according to a uniform distribution on the interval $[\xi_{1}, \xi_{2}]$.

4.     Choose the target locations $c_{1}, c_{2}, \dots, c_{\xi} \in C$ randomly according to the uniform distribution on the domain $C$.

5.     Choose the initial agent position $c(0) \in C$ randomly according to the uniform distribution on the domain $C$.

6.     For $l = 0, \dots, L - 1$ do:

7.         Save the pair $\langle c(l), P(l) \rangle$ as the $l$th entry of the $j$th column of the data table.

8.         Choose an action $a(l) \in A$ randomly according to the uniform distribution on the set $A$.

9.         Apply the chosen action and set the next position $c(l + 1) = a(c(l))$ of the agent.

10.         Calculate the next probability map $P(l + 1)$ with Equations (20) and (21).

11.     End for

12. End for

13. Return the data table.
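
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the algorithm itself: the domain $C$ is taken to be a rectangular grid of `width` by `height` cells, the actions are encoded as `(dx, dy)` offsets, a move that would leave the grid keeps the agent in place, and target cells are drawn without repetition. Equations (20) and (21) are not reproduced here, so the map update is passed in as a hypothetical callable `update_map`; the names `ACTIONS`, `apply_action`, and `generate_dataset` are likewise illustrative only.

```python
import random

# The nine actions of the set A: eight compass moves plus "stay".
# Encoding them as (dx, dy) grid offsets is an assumption of this sketch.
ACTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1),
           (-1, -1), (-1, 0), (-1, 1), (0, 0)]


def apply_action(pos, action, width, height):
    """Return the next cell; the agent stays put if the move leaves the grid (assumed)."""
    x, y = pos[0] + action[0], pos[1] + action[1]
    if 0 <= x < width and 0 <= y < height:
        return (x, y)
    return pos


def generate_dataset(width, height, xi1, xi2, L, N, p0, update_map, rng=None):
    """Build an L x N table of (position, probability map) pairs.

    update_map(pmap, pos, targets) is a hypothetical stand-in for
    Equations (20) and (21); it must return a new map, not mutate pmap.
    """
    rng = rng or random.Random()
    cells = [(x, y) for x in range(width) for y in range(height)]
    table = [[None] * N for _ in range(L)]                   # step 1
    for j in range(N):                                       # step 2
        xi = rng.randint(xi1, xi2)                           # step 3 (inclusive bounds)
        targets = rng.sample(cells, xi)                      # step 4 (distinct cells assumed)
        pos = rng.choice(cells)                              # step 5
        pmap = p0                                            # initial map P(0)
        for l in range(L):                                   # step 6
            table[l][j] = (pos, pmap)                        # step 7
            action = rng.choice(ACTIONS)                     # step 8
            pos = apply_action(pos, action, width, height)   # step 9
            pmap = update_map(pmap, pos, targets)            # step 10
    return table                                             # step 13


if __name__ == "__main__":
    # Placeholder update that leaves the map unchanged, used only to run the sketch.
    uniform = [[1 / 25] * 5 for _ in range(5)]
    table = generate_dataset(5, 5, 1, 3, L=10, N=4, p0=uniform,
                             update_map=lambda pmap, pos, targets: pmap)
    print(len(table), len(table[0]))
```

Passing the update rule as a callable keeps the trajectory generation independent of the sensor model, so the parameters $p_{TA}$, $\alpha$, and $\lambda$ would enter only through whatever function is supplied for `update_map`.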