PLoS Comput Biol. 2023 Aug 18;19(8):e1011385. doi: 10.1371/journal.pcbi.1011385

Fig 3. Performance on discrimination and reversal tasks is similar for agents with one Q matrix versus G and N matrices.


After acquisition (i.e., at trial 200), a second tone is added and the agent must learn to poke right in response to the 10 kHz tone. The pairing is then switched, and the agent learns to poke right in response to 6 kHz and to poke left in response to 10 kHz. A. Mean reward per trial. The reward obtained during the last 20 trials does not differ between agents with one Q matrix versus G and N matrices for discrimination (T = 0.121, P = 0.905, N = 10 each) or reversal (T = -1.194, P = 0.250, N = 10 each). (B&C) Fraction of responses per trial; the optimum is 0.5 responses per trial, since each tone is presented 50% of the time. B. During the first few blocks of discrimination trials, the agent goes Left in response to the 10 kHz tone, exhibiting generalization. C. After the first few blocks, the agent learns to go Right in response to 10 kHz. After reversal, the agent suppresses the right response to 10 kHz. D. Dynamics of Q values for the state (Poke port, 6 kHz) for a single run with G and N matrices. Note that two different states (rows in the matrix) were created in the N matrix for this agent. E. Dynamics of G and N values for the state (Poke port, 10 kHz) for a single run. F. Dynamics of β1, which changes according to recent reward history; thus, β1 decreases at the beginning of discrimination, increases as the agent acquires the correct right response, and then decreases to its minimum at reversal. G. Number of states in the G and N matrices for a single run. In all panels, gray dashed lines show boundaries between tasks.
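The task structure and the reward-history-dependent β described above can be illustrated with a minimal sketch. This is not the paper's model (which uses separate G and N matrices and dynamic state creation); it is a simplified tabular Q-learner with a softmax inverse temperature β interpolated by a running average of recent reward, an assumption made here only to mimic the β1 dynamics in panel F. All names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical sketch: single-Q-matrix agent on acquisition,
# discrimination, and reversal phases of the two-tone task.
ACTIONS = ["left", "right"]

def softmax_choice(qvals, beta, rng):
    """Sample an action index with probability proportional to exp(beta * Q)."""
    prefs = [math.exp(beta * q) for q in qvals]
    r = rng.random() * sum(prefs)
    acc = 0.0
    for i, p in enumerate(prefs):
        acc += p
        if r <= acc:
            return i
    return len(prefs) - 1

def run_phase(Q, correct, n_trials, alpha=0.2, beta_min=0.5, beta_max=5.0,
              tau=0.05, rng=None):
    """Run one task phase; `correct` maps tone -> rewarded action.

    beta is interpolated between beta_min and beta_max by a running
    average of recent reward, so exploration rises after a contingency
    switch (illustrative assumption, not the published rule).
    """
    rng = rng or random.Random(0)
    avg_reward = 0.0
    rewards = []
    for _ in range(n_trials):
        tone = rng.choice(sorted(correct))
        beta = beta_min + (beta_max - beta_min) * avg_reward
        a = softmax_choice(Q[tone], beta, rng)
        r = 1.0 if ACTIONS[a] == correct[tone] else 0.0
        Q[tone][a] += alpha * (r - Q[tone][a])   # bandit-style Q update
        avg_reward += tau * (r - avg_reward)     # recent reward history
        rewards.append(r)
    return rewards

rng = random.Random(1)
Q = {"6kHz": [0.0, 0.0], "10kHz": [0.0, 0.0]}
run_phase(Q, {"6kHz": "left"}, 200, rng=rng)                           # acquisition
disc = run_phase(Q, {"6kHz": "left", "10kHz": "right"}, 400, rng=rng)  # discrimination
rev = run_phase(Q, {"6kHz": "right", "10kHz": "left"}, 400, rng=rng)   # reversal
```

Because the 10 kHz Q values start at zero while β is still high from acquisition, early discrimination choices lean on the generalized left response, paralleling panel B; mean reward over the last 20 trials of each phase can then be compared as in panel A.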