eLife. 2020 Jul 7;9:e53262. doi: 10.7554/eLife.53262

Figure 6. Description of a model selecting action intensity.

(A) Details of the algorithm. The update rules for the variance parameters can be obtained by computing the derivatives of $F$, giving $\delta_g^2 - 1/\Sigma_g$ and $\delta_h^2/\Sigma_h^2 - 1/\Sigma_h$, but to simplify these expressions, we scale them by $\Sigma_g^2$ and $\Sigma_h^2$, resulting in Equations 6.5. Such scaling does not change the value to which the variance parameters converge because $\Sigma_g^2$ and $\Sigma_h^2$ are positive. (B) Mapping of the algorithm onto the network architecture. Notation as in Figure 5B. This network is very similar to that shown in Figure 5B, but now the projection from the habit system to the output nuclei is weighted by its precision $1/\Sigma_h$ (to reflect the weighting factor in Equation 6.2), and the rate of decay (or relaxation to baseline) in the output nuclei also needs to depend on $\Sigma_h$. One way to ensure that the prediction error in the goal-directed system is scaled by $\Sigma_g$ is to encode $\Sigma_g$ in the rate of decay, or leak, of these prediction error neurons (Bogacz, 2017). Such decay is included as the last term in orange Equation 6.7, which describes the dynamics of the prediction error neurons. A prediction error evolving according to this equation converges to the value in orange Equation 6.3 (the equilibrium value can be found by setting the left-hand side of orange Equation 6.7 to 0 and solving for $\delta_g$). In Equation 6.7, the total reward $R$ was replaced, according to Equation 1.1, by the sum of the instantaneous reward $r$ and the available reward $v$ computed by the valuation system. (C) Dynamics of the model.
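
The two convergence claims in panels (A) and (B) can be checked numerically. The following is a minimal Python sketch, not the author's implementation. It assumes that the prediction error dynamics have the leaky form "rate of change of the error = r + v - prediction - Σg × error", where `prediction` is a hypothetical stand-in for the reward expected by the goal-directed system, and that the scaled variance update is proportional to the squared raw prediction error minus the current variance. All function names, step sizes and numerical values are illustrative choices and do not come from the paper.

```python
def relax_prediction_error(r, v, prediction, sigma_g, dt=0.01, steps=5000):
    """Euler-integrate a leaky prediction error neuron (assumed form).

    d(delta)/dt = r + v - prediction - sigma_g * delta
    Setting the left-hand side to 0 gives the equilibrium
    delta = (r + v - prediction) / sigma_g.
    """
    delta = 0.0
    for _ in range(steps):
        delta += dt * (r + v - prediction - sigma_g * delta)
    return delta


def learn_variance(errors, sigma=1.0, lr=0.01, scaled=True):
    """Update a variance parameter from a stream of raw prediction errors.

    scaled=False uses the gradient form  err^2 / sigma^2 - 1 / sigma.
    scaled=True multiplies that gradient by sigma^2, giving err^2 - sigma.
    Because sigma^2 is positive, both forms vanish at the same sigma.
    """
    for err in errors:
        if scaled:
            grad = err ** 2 - sigma
        else:
            grad = err ** 2 / sigma ** 2 - 1.0 / sigma
        sigma += lr * grad
    return sigma


if __name__ == "__main__":
    # Hypothetical values: the equilibrium should be (1 + 2 - 1) / 4.
    print(relax_prediction_error(r=1.0, v=2.0, prediction=1.0, sigma_g=4.0))

    # A constant raw error of 2 has a squared error of 4, so both
    # update rules should settle at a variance close to 4.
    errors = [2.0] * 20000
    print(learn_variance(errors, scaled=True))
    print(learn_variance(errors, lr=0.1, scaled=False))
```

With these illustrative values, the relaxed prediction error approaches the raw error divided by Σg, and both forms of the variance update settle near the same value. The unscaled form is given a larger learning rate here only because its gradient is much smaller near the fixed point.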
