2021 Nov 19;10:e69748. doi: 10.7554/eLife.69748

Figure 1. Male and female mice showed different exploratory strategies in a restless bandit task: males explored more than females, and they explored for longer periods once they started.

(A) Schematic of the mouse touchscreen chamber with the restless two-armed bandit task and trial structure. (B) Average probability of obtaining reward compared to the chance probability of reward across individuals (dots). (C) Average probability of obtaining reward compared to the chance probability of reward across sexes. (D) Average response time across sexes. Females responded significantly faster than males. (E) (left) A hidden Markov model (HMM) that labeled exploration and exploitation as latent goal states underlying observed choices. This model includes an exploitation state for each arm and an exploration state in which the subject chooses one of the arms randomly. (right) Reward probabilities (lines) and choices (dots) for 300 example trials for a given mouse. Shaded areas highlight explore-labeled choices. (F, G) Average (F) and distribution (G) of the percentage of HMM-labeled exploratory trials in females and males. (H) Dynamic landscape of the fitted HMMs for males and females. The model fit to males had deeper exploratory states, with higher activation energy between the states. * indicates p < 0.05. Graphs depict mean ± SEM across animals.
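The state structure described in panel E can be illustrated with a minimal sketch: one explore state that emits either arm with equal probability, and one exploit state per arm that always emits that arm. The transition probabilities, the rule that exploit states return only to explore, the explore starting state, and the names `make_transitions` and `viterbi` are assumptions made for this illustration; they are not taken from the caption and may differ from the model fitted in the paper.

```python
import numpy as np

# States: 0 = explore, 1 = exploit left arm, 2 = exploit right arm.
# Emission rows are states, columns are choices (0 = left, 1 = right).
EMIT = np.array([
    [0.5, 0.5],   # explore: either arm at random
    [1.0, 0.0],   # exploit left: always left
    [0.0, 1.0],   # exploit right: always right
])


def make_transitions(p_stay_explore, p_stay_exploit):
    """Build a 3x3 transition matrix (hypothetical parameterisation).

    Assumption: exploit states never jump directly to each other;
    they can only stay put or fall back to explore.
    """
    leave = 1.0 - p_stay_explore
    return np.array([
        [p_stay_explore, leave / 2, leave / 2],
        [1.0 - p_stay_exploit, p_stay_exploit, 0.0],
        [1.0 - p_stay_exploit, 0.0, p_stay_exploit],
    ])


def viterbi(choices, trans, emit=EMIT, init=(1.0, 0.0, 0.0)):
    """Return the most likely state sequence for a list of 0/1 choices.

    Assumption: the animal starts in the explore state (init).
    """
    choices = np.asarray(choices, dtype=int)
    n, n_states = len(choices), trans.shape[0]
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        log_t, log_e = np.log(trans), np.log(emit)
        log_i = np.log(np.asarray(init, dtype=float))
    score = np.empty((n, n_states))
    back = np.zeros((n, n_states), dtype=int)
    score[0] = log_i + log_e[:, choices[0]]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_t   # cand[i, j]: from state i to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_e[:, choices[t]]
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 1, 0, -1):              # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path


trans = make_transitions(p_stay_explore=0.7, p_stay_exploit=0.9)
labels = viterbi([0, 1, 0, 1, 1, 1, 1, 1, 1, 1], trans)
percent_explore = 100.0 * np.mean(labels == 0)
```

With labels of this kind, the percentage of explore-labeled trials per animal (as in panels F and G) is simply the share of trials assigned to state 0.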

Figure 1—figure supplement 1. Male and female mice had reached asymptotic performance.

There is no change in reward acquisition, response time, and reward retrieval time across days. (A) Average probability of obtaining reward compared to the chance probability of reward across days in male and female mice. (B) Average response time across days in male and female mice. (C) Average reward retrieval time across days in male and female mice.

Figure 1—figure supplement 2. A combination of two time constants best describes the rate of choice switching in animals’ behavior, and validation of the hidden Markov model.

Related to Figure 1D. (A) The tetrachoric correlation (r) between RL model-inferred explore-exploit states and HMM-inferred states. (B) The standardized regression coefficients (beta coefficients) of RL model-inferred states and HMM-inferred states in predicting response time. (C) The distribution of times between switch decisions (inter-switch intervals). A single probability of switching would produce exponentially distributed inter-switch intervals. Orange line, the maximum likelihood fit for a single discrete exponential distribution. Solid blue line, a mixture of two exponential distributions, with each component distribution in dotted blue. The two components reflect one fast-switching time constant (average interval, 1.7 trials) and one persistent time constant (6.8 trials). The right plot is the same as the left, but with a log scale. Inset is the log likelihood of mixtures of different numbers of exponential distributions. (D) Probability of choice as a function of value differences between choices for exploratory and exploitative states. (E) Difference in choice response time between explore and exploit choices. (F) The probability of animals switching targets on the next trial, given the current trial’s outcome and latent state. (G) Difference in reward retrieval time between explore and exploit choices. There is no significant difference in retrieval time between the two latent states, suggesting that exploration was not merely disengagement from the task. * indicates p < 0.05. Graphs depict mean ± SEM across animals.
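The comparison in panel C, a single discrete exponential against a mixture of two, can be sketched as follows. This is an illustration only: it treats the discrete exponential as a geometric distribution on intervals of 1, 2, ... trials, fits the mixture with a plain EM loop, and uses hypothetical function names and starting values. It is not claimed to be the fitting procedure used in the paper.

```python
import numpy as np


def geom_logpmf(k, p):
    """Log P(K = k) for a geometric distribution on k = 1, 2, ..."""
    return (k - 1.0) * np.log1p(-p) + np.log(p)


def fit_single(intervals):
    """Maximum likelihood fit of one geometric: p = 1 / mean interval."""
    k = np.asarray(intervals, dtype=float)
    p = np.clip(1.0 / k.mean(), 1e-9, 1 - 1e-9)
    return p, geom_logpmf(k, p).sum()


def mixture_loglik(k, w, p):
    """Total log likelihood and per-component log joint terms."""
    comp = np.log(w)[:, None] + geom_logpmf(k[None, :], p[:, None])
    m = comp.max(axis=0)
    return (m + np.log(np.exp(comp - m).sum(axis=0))).sum(), comp


def fit_mixture(intervals, n_iter=500):
    """EM fit of a two-component geometric mixture.

    Starting values (0.8 and 0.2) are arbitrary assumptions.
    Returns weights, mean intervals (fast component first), log likelihood.
    """
    k = np.asarray(intervals, dtype=float)
    w = np.array([0.5, 0.5])
    p = np.array([0.8, 0.2])
    for _ in range(n_iter):
        _, comp = mixture_loglik(k, w, p)
        r = np.exp(comp - comp.max(axis=0))    # E-step: responsibilities
        r /= r.sum(axis=0)
        w = r.mean(axis=1)                     # M-step: weights
        p = np.clip(r.sum(axis=1) / (r * k).sum(axis=1), 1e-9, 1 - 1e-9)
    loglik, _ = mixture_loglik(k, w, p)
    order = np.argsort(1.0 / p)                # fast-switching component first
    return w[order], 1.0 / p[order], loglik


# Simulated inter-switch intervals, using the two mean intervals
# reported in the caption (1.7 and 6.8 trials) and an assumed 50/50 mix.
rng = np.random.default_rng(0)
n = 20000
fast = rng.random(n) < 0.5
intervals = np.where(fast, rng.geometric(1 / 1.7, n), rng.geometric(1 / 6.8, n))

p_single, ll_single = fit_single(intervals)
weights, mean_intervals, ll_mixture = fit_mixture(intervals)
```

Comparing `ll_mixture` with `ll_single` mirrors the inset in panel C: a clearly higher log likelihood for the mixture indicates that one switching probability does not account for the intervals. A fair comparison across different numbers of components would also need to penalise the extra parameters.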