Figure 10. Actor's parameter adaptation during closed-loop control.
(A) Cumulative reward over time. (B) Action values computed at the output layer of the Actor. Each color represents the value of one action; the red trace corresponds to the action that moves the robot along a direct path to the target. (C) Outputs of the three hidden-layer processing elements of the Actor. The largest adaptation of the values occurs before the “knee” of the cumulative reward curve. After the “knee”, the relative values of the system parameters stabilize, indicating consolidation of performance.
