Figure 6. Description of a model selecting action intensity.
(A) Details of the algorithm. The update rules for the variance parameters can be obtained by computing the derivatives of the objective function with respect to these parameters; to simplify the resulting expressions, we scale them, giving Equations 6.5. Such scaling does not change the values to which the variance parameters converge, because the scaling factors are positive. (B) Mapping of the algorithm onto the network architecture. Notation as in Figure 5B. This network is very similar to that shown in Figure 5B, but now the projection from the habit system to the output nuclei is weighted by its precision (to reflect the weighting factor in Equation 6.2), and the rate of decay (or relaxation to baseline) in the output nuclei also needs to depend on the precision parameters. One way to ensure that the prediction error in the goal-directed system is divided by its variance is to encode this variance in the rate of decay or leak of the prediction error neurons (Bogacz, 2017). Such decay is included as the last term of orange Equation 6.7, which describes the dynamics of the prediction error neurons. A prediction error evolving according to this equation converges to the value in orange Equation 6.3 (the equilibrium value can be found by setting the left-hand side of orange Equation 6.7 to 0 and solving for the prediction error). In Equation 6.7, the total reward was replaced, according to Equation 1.1, by the sum of the instantaneous reward and the available reward computed by the valuation system. (C) Dynamics of the model.
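As an illustrative sketch of the scaling described in panel A (using generic notation in the spirit of Bogacz, 2017, rather than the exact symbols of Equations 6.5), consider a single Gaussian term of an objective $F$ with observation $x$, prediction $\mu$, variance parameter $\nu$, and prediction error $\delta$:
\[
F = -\frac{(x-\mu)^{2}}{2\nu} - \frac{1}{2}\ln\nu + \mathrm{const},
\qquad
\delta = \frac{x-\mu}{\nu}.
\]
Differentiating with respect to the variance gives $\partial F/\partial\nu = \tfrac{1}{2}\left(\delta^{2} - 1/\nu\right)$; multiplying by the positive factor $2\nu$ yields the simpler update $\Delta\nu = \alpha\left(\nu\delta^{2} - 1\right)$. Both forms vanish at the same point, $\nu = (x-\mu)^{2}$ on average, which is why scaling by a positive quantity leaves the convergence value unchanged.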
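The need for the decay rate of the output nuclei to depend on the precision parameters (panel B) can be seen from a generic rate equation for a unit receiving precision-weighted inputs; the symbols here ($y$ for the output activity, $h$ and $g$ for the habit and goal-directed inputs, $\Pi_h$ and $\Pi_g$ for their precisions, $\tau$ for a time constant) are illustrative assumptions rather than the notation of Equation 6.2:
\[
\tau\,\dot{y} = \Pi_h h + \Pi_g g - \left(\Pi_h + \Pi_g\right) y
\quad\Longrightarrow\quad
y^{*} = \frac{\Pi_h h + \Pi_g g}{\Pi_h + \Pi_g}.
\]
Only when the leak term scales with the summed precisions does the equilibrium $y^{*}$ equal a precision-weighted combination of the two systems' outputs.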
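Similarly, the trick of encoding a variance in the leak of prediction error neurons (Bogacz, 2017) can be sketched with generic symbols ($\delta$ for the prediction error neuron's activity, $x-\mu$ for its driving input, $\nu$ for the variance, $\tau$ for a time constant), which stand in for, but are not, the exact terms of orange Equation 6.7:
\[
\tau\,\dot{\delta} = (x - \mu) - \nu\,\delta
\quad\Longrightarrow\quad
\delta^{*} = \frac{x-\mu}{\nu}.
\]
Setting $\dot{\delta} = 0$ and solving for $\delta$ gives the equilibrium $\delta^{*}$, a prediction error divided by its variance (i.e. weighted by its precision), mirroring how orange Equation 6.3 is obtained from orange Equation 6.7.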