Nature. 2021 Nov 24;600(7889):489–493. doi: 10.1038/s41586-021-04129-3

Extended Data Fig. 1 | Additional details of the COIN model (related to Fig. 1). a–b, Hierarchy and generalisation in contextual inference.

a, Local transition probabilities are generated in two steps via a hierarchical Dirichlet process. In the first step (top), an infinite set of global transition probabilities β is generated via a stochastic stick-breaking process (see Suppl. Inf.). Probabilities are represented by the width of bar segments, with different colours indicating different contexts. In the second step (bottom), for each context (‘from context’), local transition probabilities to each other context (‘to context’) are generated (a row of Π) via a stochastic Dirichlet process and are equal to the global probabilities in expectation (bar a self-transition bias, which we set to zero here for clarity). (An analogous hierarchical Dirichlet process, not shown, is used to generate the global and local cue probabilities.)

b, Contextual inference updates both the global and local transition probabilities. Context transition counts are maintained for all from–to pairs of known contexts and are updated based on the contexts inferred at two consecutive time points (responsibilities at time points t and t + 1). These updated context transition counts are used to update the inferred global transition probabilities β̂. The updated global transition probabilities and context transition counts in turn produce new inferences about the local transition probabilities Π̂. Note that although the model infers full (Dirichlet) posterior distributions over both the global and local transition probabilities, for clarity here we only show the means of these posterior distributions (indicated by the hat notation). In the example shown, only row 3 of the context transition counts is updated (as context 3 has an overwhelming responsibility at time t), but all rows of the local transition probabilities are updated due to the updating of the global transition probabilities (if the model were non-hierarchical, there would be no global transition probabilities, and so the local transition probabilities would be updated only for context 3 via the updated context transition counts). Thus inferences about transition probabilities generalise from one context (here context 3) to all other contexts (here contexts 1 and 2) due to the hierarchical nature of the generative model. Note that when a novel context is encountered for the first time, its local transition probabilities are initialised based on β̂, thus allowing well-informed inferences about transitions to be drawn immediately. (A schematic code sketch of these generative and update steps follows this legend.)

c–e, Parameter inference in the COIN model for the simulation shown in Fig. 1c–f. In addition to inferring states and contexts, the COIN model also infers transition (c) and cue (d) probabilities, as well as the parameters of context-specific state dynamics (e).

c, Transition probabilities. Top: estimated global transition probabilities (solid lines) to each known context (line colours) and to the novel context (grey). Pale lines show the estimated stationary probabilities of the same contexts, representing the expected proportion of time spent in each context given the current estimate of the local transition probabilities (below). Bottom three panels: estimated local transition probabilities from each context (colours as in the top panel).

d, Estimated global (top panel) and local (bottom three panels, one for each known context) cue probabilities for each cue (line colours). Although the model infers full (Dirichlet) posterior distributions over both transition (c) and cue (d) probabilities, for clarity here we only show the means of these posterior distributions.
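To make the two-level construction in a and the count-based updates in b concrete, the following is a minimal sketch in Python/NumPy. It is not the authors' implementation: the truncation K_MAX, the hyperparameters GAMMA, ALPHA and KAPPA, and the simplified posterior-mean updates (the COIN model maintains full Dirichlet posteriors via the hierarchical Dirichlet process) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K_MAX = 10    # truncation of the "infinite" set of contexts, for simulation only
GAMMA = 2.0   # concentration of the global stick-breaking process (assumed)
ALPHA = 10.0  # concentration of the per-context local Dirichlet process (assumed)
KAPPA = 0.0   # self-transition bias, set to zero as in panel a

# Step 1 (panel a, top): global transition probabilities beta via stick-breaking.
# Each context takes a Beta(1, GAMMA) fraction of the remaining probability mass.
sticks = rng.beta(1.0, GAMMA, size=K_MAX)
remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
beta = sticks * remaining
beta /= beta.sum()  # renormalise the truncated stick

# Step 2 (panel a, bottom): each row of Pi (the "from" context) is a Dirichlet
# draw centred on beta, optionally with extra mass KAPPA on the self-transition.
Pi = np.empty((K_MAX, K_MAX))
for j in range(K_MAX):
    Pi[j] = rng.dirichlet(ALPHA * beta + KAPPA * np.eye(K_MAX)[j])

# Update (panel b): transition counts are incremented by the outer product of
# consecutive responsibilities; simplified posterior means are then recomputed.
counts = np.zeros((K_MAX, K_MAX))

def update(resp_t, resp_t1, counts):
    """One count update followed by (approximate) posterior-mean estimates."""
    counts += np.outer(resp_t, resp_t1)
    beta_hat = counts.sum(axis=0) + GAMMA / K_MAX   # global estimate from arrivals
    beta_hat /= beta_hat.sum()
    # Every row mixes its own counts with the shared beta_hat, which is how an
    # update driven by one context generalises to the rows of all other contexts.
    Pi_hat = counts + ALPHA * beta_hat
    Pi_hat /= Pi_hat.sum(axis=1, keepdims=True)
    return beta_hat, Pi_hat, counts

# Panel b example: context 3 (index 2) has overwhelming responsibility at time t.
resp_t  = np.array([0.05, 0.05, 0.90] + [0.0] * (K_MAX - 3))
resp_t1 = np.array([0.10, 0.10, 0.80] + [0.0] * (K_MAX - 3))
beta_hat, Pi_hat, counts = update(resp_t, resp_t1, counts)

# Panel c (pale lines): stationary probabilities, i.e. the long-run occupancy of
# each context implied by the current local transition estimate Pi_hat.
evals, evecs = np.linalg.eig(Pi_hat.T)
stationary = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
stationary /= stationary.sum()
```

The model itself additionally tracks full Dirichlet posteriors and the auxiliary variables of the hierarchical Dirichlet process; the posterior-mean shortcut above is only meant to show why transition counts from one context propagate, via β̂, to the local transition probabilities of all contexts.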
e, Posterior distributions of the drift (left) and retention (right) parameters for the three known contexts (colours as in c; the novel context is not shown for clarity). Although the model infers the joint distribution of the drift and retention parameters for each context, for clarity here we show the marginal distribution of each parameter separately. Note that drift and retention are estimated to be larger for the red context, which is associated with the largest perturbation.
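For a concrete picture of the context-specific state dynamics whose drift and retention parameters are inferred in e, the following is a minimal sketch, not the paper's code: the parameter values, noise scale and function names are illustrative assumptions, and the COIN model infers a joint posterior over these parameters rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(1)

# One retention factor (a) and drift (d) per known context; values are assumed
# here purely for illustration.
retention = np.array([0.95, 0.97, 0.99])   # a: fraction of the state retained per trial
drift     = np.array([0.00, 0.01, 0.03])   # d: per-trial drift of the state
sigma_q   = 0.01                           # process-noise s.d. (assumed)

def simulate_states(n_trials, retention, drift, sigma_q, rng):
    """Simulate each context's state, x_t = a * x_{t-1} + d + noise."""
    n_ctx = len(retention)
    x = np.zeros((n_trials, n_ctx))
    for t in range(1, n_trials):
        x[t] = retention * x[t - 1] + drift + rng.normal(0.0, sigma_q, size=n_ctx)
    return x

x = simulate_states(200, retention, drift, sigma_q, rng)

# The steady-state mean of each context's state is d / (1 - a): larger drift and
# retention yield a larger asymptotic state, consistent with the note that both
# are estimated to be larger for the context associated with the largest perturbation.
print(drift / (1.0 - retention))
```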