Table 1.
General steps of the gating algorithms (see text for details).
1 | Choose motor action a and gating action g for the current state s by applying a softmax to the relevant action values in QM and QG (Equations 1, 2) |
2 | Observe reward r and next state s′ |
3 | Compute TD errors (Equations 3–5) |
4 | Update the eligibility traces specific to the current state s and the chosen actions a and g (Equations 6–8) |
5 | Update all state/state-action values (Equations 9–13) |
6 | Update all eligibility traces (Equations 14–16) |
7 | Repeat steps 1–6 until termination |
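The steps in Table 1 can be sketched in code as follows. This is a minimal illustration rather than the implementation used in the study: the equations themselves are not reproduced here, so the forms of the TD errors (all three share the target r + γV(s′)), the use of replacing traces, the decay factor γλ, the parameter values, and the toy chain task `toy_step` (which ignores the gating action) are all assumptions made for the example. The names `V`, `QM`, and `QG` follow the table; every other name is hypothetical.

```python
import math
import random

N_STATES, N_MOTOR, N_GATE = 5, 2, 2  # hypothetical toy sizes


def softmax_choice(values, beta, rng):
    """Sample an index with probability proportional to exp(beta * value)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def toy_step(s, a, g):
    """Hypothetical chain task: motor action 1 advances, 0 stays; g is ignored."""
    s_next = min(s + a, N_STATES - 1)
    done = s_next == N_STATES - 1
    return (1.0 if done else 0.0), s_next, done


def new_tables():
    """Zero-initialised tables shaped like V, QM and QG (also used for traces)."""
    V = [0.0] * N_STATES
    QM = [[0.0] * N_MOTOR for _ in range(N_STATES)]
    QG = [[0.0] * N_GATE for _ in range(N_STATES)]
    return V, QM, QG


def run_episode(V, QM, QG, rng, alpha=0.1, gamma=0.9, lam=0.8, beta=2.0,
                max_steps=1000):
    """Run one episode, updating V, QM, QG in place; return the step count."""
    eV, eM, eG = new_tables()  # eligibility traces start at zero each episode
    s = 0
    for t in range(1, max_steps + 1):
        # Step 1: softmax choice of motor action a and gating action g.
        a = softmax_choice(QM[s], beta, rng)
        g = softmax_choice(QG[s], beta, rng)
        # Step 2: observe reward r and next state s'.
        r, s_next, done = toy_step(s, a, g)
        # Step 3: TD errors (assumed form; terminal states have value 0).
        target = r + (0.0 if done else gamma * V[s_next])
        dV, dM, dG = target - V[s], target - QM[s][a], target - QG[s][g]
        # Step 4: set the traces for the current s, a, g (assumed replacing traces).
        eV[s], eM[s][a], eG[s][g] = 1.0, 1.0, 1.0
        # Steps 5 and 6, fused into one pass: each entry is first updated by its
        # TD error weighted by its trace, then its trace is decayed by gamma * lam.
        decay = gamma * lam
        for x in range(N_STATES):
            V[x] += alpha * dV * eV[x]
            eV[x] *= decay
            for k in range(N_MOTOR):
                QM[x][k] += alpha * dM * eM[x][k]
                eM[x][k] *= decay
            for k in range(N_GATE):
                QG[x][k] += alpha * dG * eG[x][k]
                eG[x][k] *= decay
        # Step 7: repeat until termination.
        if done:
            return t
        s = s_next
    return max_steps


if __name__ == "__main__":
    rng = random.Random(0)
    V, QM, QG = new_tables()
    lengths = [run_episode(V, QM, QG, rng) for _ in range(200)]
    print("mean length, first 20 episodes:", sum(lengths[:20]) / 20)
    print("mean length, last 20 episodes:", sum(lengths[-20:]) / 20)
```

Fusing steps 5 and 6 into a single loop is only a convenience: every entry is updated with its current trace before that trace is decayed, which matches the order given in the table.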