This preprint has not yet been peer reviewed by a journal. The National Library of Medicine includes NIH-funded preprints in PMC and PubMed as part of a pilot program.

bioRxiv [Preprint]. 2024 Oct 7:2024.02.28.582617. Version 3. doi: 10.1101/2024.02.28.582617

Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning

Sarah Jo C Venditto, Kevin J Miller, Carlos D Brody, Nathaniel D Daw
PMCID: PMC10925334  PMID: 38464244

Abstract

Different brain systems have been hypothesized to subserve multiple “experts” that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture the difference between automaticity and deliberation. However, a static MoA cannot capture shifts in strategy over time. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously infers action values from a set of agents and the temporal dynamics of underlying “hidden” states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and orbitofrontal cortex (OFC) neural encoding during the task, suggesting that these states capture genuine shifts in behavioral and neural dynamics.
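To make the model structure concrete, below is a minimal, hypothetical Python sketch of a MoA-HMM likelihood: each hidden state k carries its own mixture weights over the agents' action values, the per-trial policy is a softmax over the weighted values, and a standard HMM forward recursion marginalizes over the hidden-state sequence. All names, shapes, and the softmax policy here are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.special import logsumexp

def moa_hmm_loglik(q_agents, actions, pi0, trans, betas):
    """Log-likelihood of an action sequence under a MoA-HMM sketch.

    q_agents : (T, n_agents, n_actions) per-trial action values from each
               agent (e.g., one model-free and one model-based learner)
    actions  : (T,) indices of the chosen actions
    pi0      : (K,) initial hidden-state distribution
    trans    : (K, K) hidden-state transition matrix (rows sum to 1)
    betas    : (K, n_agents) state-specific agent mixture weights
    """
    T = len(actions)
    K = len(pi0)
    log_trans = np.log(trans)

    # Emission model: in state k, the policy is a softmax over the
    # beta-weighted sum of each agent's action values.
    log_emit = np.empty((T, K))
    for k in range(K):
        net = np.einsum('a,tac->tc', betas[k], q_agents)    # (T, n_actions)
        logp = net - logsumexp(net, axis=1, keepdims=True)  # log-softmax
        log_emit[:, k] = logp[np.arange(T), actions]

    # Standard HMM forward recursion in log space over hidden states.
    log_alpha = np.log(pi0) + log_emit[0]
    for t in range(1, T):
        log_alpha = log_emit[t] + logsumexp(log_alpha[:, None] + log_trans,
                                            axis=0)
    return logsumexp(log_alpha)

# Toy usage with random inputs, just to show the expected shapes.
rng = np.random.default_rng(0)
T, n_agents, n_actions, K = 200, 2, 2, 3
q = rng.normal(size=(T, n_agents, n_actions))
a = rng.integers(n_actions, size=T)
pi0 = np.full(K, 1.0 / K)
trans = np.full((K, K), 0.05) + np.eye(K) * 0.85  # sticky states
betas = rng.normal(size=(K, n_agents))
print(moa_hmm_loglik(q, a, pi0, trans, betas))

In the full model described by the abstract, the agents' action values are themselves learned (e.g., by MF and MB update rules with fitted parameters) jointly with the HMM; here they are passed in as fixed inputs to keep the sketch short.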

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
