Skip to main content
. 2020 Jul 7;9:e53262. doi: 10.7554/eLife.53262

Figure 4. Algorithmic level.

Figure 4.

(A) Summary of the algorithm used by the actor. (B) Identifying an action based on a gradient of F. The panel shows an example of a dependence of F on a, and we wish a to take the value maximizing F. To find the action, we let a to change over time in proportion to the gradient of F over a (Equation 4.2, where the dot over a denotes derivative over time). For example, if the action is initialized to a=1.5, then the gradient of F at this point is positive, so a is increased (Equation 4.2), as indicated by a green arrow on the x-axis. These changes in a continue until the gradient is no longer positive, i.e. when a is at the maximum. Analogously, if the action is initialized to a=3.5, then the gradient of F is negative, so a is decreased until it reaches the maximum of F.

HHS Vulnerability Disclosure