A. Dynamics of beliefs in Task 1 (top) and Task 2 (bottom). Black arrows indicate transitions between states in the absence of observations (⌀) as a function of elapsed time, t, following an odor observation. ‘X’ indicates an unconstrained duration, and a dashed arrow indicates a transition that happens only when ‘X’ is finite. B. RNN activity at each time step (small black dots connected by lines) during an example trial, shown in a 2D subspace identified using PCA, for two example networks trained on Task 1 (top) and Task 2 (bottom). The putative ITI fixed point is indicated by a purple circle. Vectors indicate the response to odor (black) and reward (red). Activity during an omission trial is shown in cyan; note that omission trials were present in the training data only for Task 2. C-D. Average normalized distance of each model’s activity from its fixed point over time, following an odor (panel C) or reward (panel D) observation. To allow distances to be compared across models, each model’s distances were normalized by the maximum distance following each observation. E. Difference between each RNN’s odor memory and reward memory, for Untrained RNNs and Value RNNs trained on each task. An RNN’s odor memory is defined as the number of time steps after an odor that the RNN’s activity takes to return to its ITI fixed point (see panel C); reward memory is defined similarly (see panel D). Same conventions as Fig 3D.
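The quantities in panels C-D and E can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy exponential-decay trajectories, and in particular the `threshold` used to decide when activity has "returned" to the fixed point are all assumptions made for illustration, since the caption does not state the return criterion.

```python
import numpy as np


def normalized_distance(activity, fixed_point):
    """Distance of each hidden state from the fixed point, scaled by the maximum.

    activity: (T, N) array of RNN activity following one observation.
    fixed_point: (N,) array, the putative ITI fixed point.
    """
    dists = np.linalg.norm(activity - fixed_point, axis=1)
    peak = dists.max()
    return dists / peak if peak > 0 else dists


def memory_duration(norm_dists, threshold=0.05):
    """Time steps until the normalized distance first drops below `threshold`.

    The threshold is a hypothetical return criterion, not taken from the source.
    If activity never returns, the full trajectory length is reported.
    """
    below = np.flatnonzero(norm_dists < threshold)
    return int(below[0]) if below.size else len(norm_dists)


# Toy trajectories (assumed): activity decays geometrically back to the fixed point,
# slowly after an odor and quickly after a reward.
T = 100
t = np.arange(T)[:, None]
fixed_point = np.array([0.5, -0.2])
odor_activity = fixed_point + np.array([1.0, 2.0]) * 0.9 ** t
reward_activity = fixed_point + np.array([-1.5, 0.5]) * 0.5 ** t

odor_memory = memory_duration(normalized_distance(odor_activity, fixed_point))
reward_memory = memory_duration(normalized_distance(reward_activity, fixed_point))

# Panel E plots this difference for each network.
memory_difference = odor_memory - reward_memory
print(odor_memory, reward_memory, memory_difference)
```

Normalizing by each trajectory's own maximum puts networks with different activity scales on a common 0-to-1 axis, which is why a single threshold can be applied across models in this sketch.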