Reinforcement Learning-Based Data Association for Multiple Target Tracking in Clutter

2020 Nov 18;20(22):6595. doi: 10.3390/s20226595

Algorithm 1. The Q-learning method pseudocode.

Initialize
    Set the state s and the action a
    For each state s_{i} and action a_{i}
        Set Q(s_{i}, a_{i}) = 0
    End For
    Randomly choose an initial state s_{t}
While the terminal condition is not reached do
    Choose the best action a_{t} from the current state s_{t} from Q-table
    Execute action a_{t}, then get the immediate reward
    Find out the new state s_{t + 1}
    Acquire the corresponding maximum Q-value of s_{t + 1}
    Update the Q-table by (19)
    Update the state s_{t} \leftarrow s_{t + 1}
End While
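
The loop in Algorithm 1 can be illustrated with a minimal tabular sketch in Python. Several parts are assumptions rather than the authors' implementation. Eq. (19) is not reproduced in this section, so the sketch uses the standard Q-learning update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The names q_learning, chain_step, alpha, gamma, and epsilon are hypothetical. The five-state chain environment is a toy stand-in for the data association problem. The small epsilon exploration step is also an addition, since a purely greedy choice over an all-zero table would otherwise depend entirely on tie-breaking.

```python
import random


def q_learning(n_states, n_actions, step, is_terminal, episodes=200,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning following the structure of Algorithm 1 (sketch)."""
    rng = random.Random(seed)
    # Initialize: Q(s_i, a_i) = 0 for every state-action pair.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Randomly choose an initial state s_t.
        s = rng.randrange(n_states)
        while not is_terminal(s):
            if rng.random() < epsilon:
                # Assumed exploration step (not in the pseudocode).
                a = rng.randrange(n_actions)
            else:
                # Choose the best action from the Q-table, ties broken at random.
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            # Execute a_t, get the immediate reward, and find the new state.
            s_next, r = step(s, a)
            # Acquire the maximum Q-value of s_{t+1}.
            max_next = max(Q[s_next])
            # Assumed form of the update: the standard Q-learning rule.
            Q[s][a] += alpha * (r + gamma * max_next - Q[s][a])
            # Update the state: s_t <- s_{t+1}.
            s = s_next
    return Q


def chain_step(s, a):
    """Hypothetical toy environment: states 0..4, action 0 = left, 1 = right."""
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    reward = 1.0 if s_next == 4 else 0.0  # reward only on reaching state 4
    return s_next, reward


Q = q_learning(5, 2, chain_step, lambda s: s == 4)
# Greedy policy read off the learned table for the non-terminal states.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(policy)
```

In this toy chain the only reward lies at the right end, so the learned greedy policy should select action 1 (move right) in every non-terminal state. In the tracking setting the states, actions, and reward would instead be those defined by the paper.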