PLoS Comput Biol. 2023 Aug 18;19(8):e1011385. doi: 10.1371/journal.pcbi.1011385

Fig 3. Performance on discrimination and reversal tasks is similar for agents with one Q matrix versus G and N matrices.


After acquisition (i.e., at trial 200), a second tone is added and the agent must learn to poke right in response to the 10 kHz tone. The pairing is then switched, and the agent learns to poke right in response to 6 kHz and to poke left in response to 10 kHz. A. Mean reward per trial. The reward obtained during the last 20 trials does not differ between agents with one Q matrix versus G and N matrices for discrimination (T = 0.121, P = 0.905, N = 10 each) or reversal (T = -1.194, P = 0.250, N = 10 each). (B&C) Fraction of responses per trial; the optimum is 0.5 responses per trial, since each tone is presented 50% of the time. B. During the first few blocks of discrimination trials, the agent goes Left in response to the 10 kHz tone, exhibiting generalization. C. After the first few blocks, the agent learns to go Right in response to 10 kHz. After reversal, the agent suppresses the right response to 10 kHz. D. Dynamics of Q values for the state (Poke port, 6 kHz) for a single run with G and N matrices. Note that two different states (rows in the matrix) were created in the N matrix for this agent. E. Dynamics of G and N values for the state (Poke port, 10 kHz) for a single run. F. Dynamics of β1, which changes according to recent reward history; thus, β1 decreases at the beginning of discrimination, increases as the agent acquires the correct right response, and then decreases to its minimum at reversal. G. Number of states in the G and N matrices for a single run. In all panels, gray dashed lines show boundaries between tasks.
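The task structure and the reward-history-dependent β described above can be illustrated with a minimal sketch. This is not the paper's model (which uses separate G and N matrices and dynamic state creation); it is a simplified tabular Q-learner with a softmax inverse temperature β interpolated by a running average of recent reward, an assumption made here only to mimic the β1 dynamics in panel F. All names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical sketch: single-Q-matrix agent on acquisition,
# discrimination, and reversal phases of the two-tone task.
ACTIONS = ["left", "right"]

def softmax_choice(qvals, beta, rng):
    """Sample an action index with probability proportional to exp(beta * Q)."""
    prefs = [math.exp(beta * q) for q in qvals]
    r = rng.random() * sum(prefs)
    acc = 0.0
    for i, p in enumerate(prefs):
        acc += p
        if r <= acc:
            return i
    return len(prefs) - 1

def run_phase(Q, correct, n_trials, alpha=0.2, beta_min=0.5, beta_max=5.0,
              tau=0.05, rng=None):
    """Run one task phase; `correct` maps tone -> rewarded action.

    beta is interpolated between beta_min and beta_max by a running
    average of recent reward, so exploration rises after a contingency
    switch (illustrative assumption, not the published rule).
    """
    rng = rng or random.Random(0)
    avg_reward = 0.0
    rewards = []
    for _ in range(n_trials):
        tone = rng.choice(sorted(correct))
        beta = beta_min + (beta_max - beta_min) * avg_reward
        a = softmax_choice(Q[tone], beta, rng)
        r = 1.0 if ACTIONS[a] == correct[tone] else 0.0
        Q[tone][a] += alpha * (r - Q[tone][a])   # bandit-style Q update
        avg_reward += tau * (r - avg_reward)     # recent reward history
        rewards.append(r)
    return rewards

rng = random.Random(1)
Q = {"6kHz": [0.0, 0.0], "10kHz": [0.0, 0.0]}
run_phase(Q, {"6kHz": "left"}, 200, rng=rng)                           # acquisition
disc = run_phase(Q, {"6kHz": "left", "10kHz": "right"}, 400, rng=rng)  # discrimination
rev = run_phase(Q, {"6kHz": "right", "10kHz": "left"}, 400, rng=rng)   # reversal
```

Because the 10 kHz Q values start at zero while β is still high from acquisition, early discrimination choices lean on the generalized left response, paralleling panel B; mean reward over the last 20 trials of each phase can then be compared as in panel A.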