Author manuscript; available in PMC: 2022 Sep 28.
Published in final edited form as: Adv Neural Inf Process Syst. 2020;33:3442–3453.

Figure 3:

Results from an example IBL mouse. (a-d) Inferred trial-to-trial weight trajectories for the choice bias (yellow) and contrast sensitivity (purple), recovered under different learning models: (a) RF0, the no-learning model, in which only a noise component tracks changes in behavior. This mouse's bias fluctuates between leftward and rightward choices (negative and positive bias weight), whereas its decision-making is increasingly influenced by the task stimuli (gradually increasing stimulus weight). (b) RF1, REINFORCE with a single learning rate shared by all weights. (c) RFK, REINFORCE with a separate learning rate for each of the two weights. (d) RFβ, REINFORCE with baselines, where a separate baseline is also inferred for each weight. (e-g) The decomposition of trial-to-trial weight updates into learning and noise components, for the model shown in the same row. The noise component is shown as a dashed line and the learning component as a solid line.
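The split of each weight update into a learning part and a noise part can be illustrated with a minimal sketch. The caption does not give the update equations, so the code below assumes a standard REINFORCE update for a Bernoulli logistic choice policy plus Gaussian noise. The function name `reinforce_step`, its parameters, and the reward coding are hypothetical and chosen only for illustration; this is not the authors' inference code.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def reinforce_step(w, x, y, r, alpha, beta=0.0, sigma=0.0, rng=None):
    """One trial's weight update, split into learning and noise parts.

    Assumed (not taken from the caption) form of the update:
        learning = alpha * (r - beta) * grad_w log p(y | x, w)
        noise    ~ Normal(0, sigma^2), drawn independently per weight

    w     : weights, e.g. (bias, stimulus sensitivity)
    x     : inputs, e.g. (1, signed contrast)
    y     : choice, 1 for rightward and 0 for leftward
    r     : reward on this trial
    alpha : learning rate, scalar (RF1-like) or one per weight (RFK-like)
    beta  : baseline, scalar or one per weight (RFbeta-like)
    sigma : noise standard deviation, scalar or one per weight
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)

    p_right = sigmoid(w @ x)
    # Gradient of the log-likelihood of the observed choice under a
    # logistic policy: (y - p) * x.
    grad_log_p = (y - p_right) * x

    learning = alpha * (r - beta) * grad_log_p
    noise = sigma * rng.standard_normal(w.shape)
    return w + learning + noise, learning, noise


# Example trial: zero initial weights, rightward stimulus, rewarded choice.
w0 = np.zeros(2)
x_t = np.array([1.0, 0.5])
w1, learn_part, noise_part = reinforce_step(
    w0, x_t, y=1, r=1.0, alpha=0.1, sigma=0.0
)
```

Under these assumptions, setting `alpha` to zero leaves only the noise term, analogous to RF0; a scalar `alpha` corresponds to RF1, a per-weight `alpha` to RFK, and adding a per-weight `beta` to RFβ.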