2017 Dec 1;1(4):381–414. doi: 10.1162/NETN_a_00018

Figure 11. A generative model of pictographic reading. In this model there are two discrete hierarchical levels, with two sorts of hidden states at the second level and four at the first. The hidden states at the higher level correspond to the sentence or narrative (generating sequences of words at the first level) and to which word the agent is currently sampling (with six alternative sentences and four words, respectively). These hidden states combine to specify the word at the first level (flee, feed, or wait). The hidden states at the first level comprise the current word and the quadrant the agent is looking at. These combine to generate outcomes in terms of the letters or pictograms that would be seen at that location. In addition, two further hidden states flip the relative locations vertically or horizontally. The vertical flip can be thought of in terms of font substitution (uppercase versus lowercase), while the horizontal flip means a word is invariant under changes to the order of its letters (cf. palindromes). In this example, flee means that a bird is next to a cat, feed means a bird is next to some seeds, and wait means seeds are above (or below) the bird. Notice that there is a (proprioceptive) outcome signaling the word currently being sampled (e.g., head position), while at the lower level there are two discrete outcome modalities: the first (exteroceptive) outcome corresponds to the observed letter, and the second (proprioceptive) outcome specifies a point of visual fixation (e.g., in a head-centered frame of reference). Similarly, there are policies at both levels: the high-level policy determines which word the agent is currently reading, while the lower level dictates eye movements among the quadrants containing letters.
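The mapping described above, from the first-level hidden states (word, quadrant, and the two flip states) to a letter or pictogram outcome, can be sketched as a simple lookup. The quadrant layouts below (which pictogram sits in which quadrant for each word) are illustrative assumptions consistent with the caption, not the paper's exact likelihood matrices:

```python
# Hypothetical quadrant layouts, numbered 0..3 (top-left, top-right,
# bottom-left, bottom-right), consistent with the caption: "flee" puts a
# bird next to a cat, "feed" a bird next to seeds, "wait" seeds above a bird.
WORDS = {
    "flee": ["bird", "cat",   "blank", "blank"],
    "feed": ["bird", "seed",  "blank", "blank"],
    "wait": ["seed", "blank", "bird",  "blank"],
}

def letter_outcome(word, quadrant, flip_h=False, flip_v=False):
    """Combine the four first-level hidden states into an outcome:
    the pictogram seen at the fixated quadrant."""
    q = quadrant
    if flip_h:  # horizontal flip swaps left/right columns
        q = {0: 1, 1: 0, 2: 3, 3: 2}[q]
    if flip_v:  # vertical flip swaps top/bottom rows
        q = {0: 2, 1: 3, 2: 0, 3: 1}[q]
    return WORDS[word][q]
```

Under a horizontal flip, fixating the top-left quadrant of "flee" yields the cat rather than the bird, which is why the word is invariant under letter order: the same set of pictograms is generated, just in mirrored locations.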
These discrete outcomes (the pictogram, what, and target location, where) generate continuous visuomotor signals as follows: the target location (specified by the discrete where outcome) is the center of the corresponding quadrant (denoted by L in the figure). This point of fixation attracts the current center of gaze (in the generative model), which is enacted by action (in the generative process), where action simply moves the eye horizontally or vertically. At every point in time, the visual outcome is sampled from an image (with 32 × 32 pixels) specified by the discrete what outcome. This sampling is eccentric, based upon the displacement between the target location and the current center of gaze (denoted by d in the figure). Finally, the image contrast is attenuated as a Gaussian function of displacement to emulate sensory attenuation. In short, the continuous state space model has two hidden causes, target location and identity (denoted by vL and vI), and a single hidden state (x), corresponding to the current center of gaze.
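The eccentric sampling with Gaussian contrast attenuation can be sketched as follows. The attenuation width (sigma) and the use of a circular shift to model displacement are illustrative assumptions; the paper specifies only that contrast falls off as a Gaussian function of the displacement d between target location and center of gaze:

```python
import numpy as np

def sample_visual_outcome(image, gaze, target, sigma=8.0):
    """Eccentric sampling of a 32x32 image with Gaussian contrast attenuation.

    image:  32x32 array rendering the discrete 'what' outcome
    gaze:   current center of gaze x, as (row, col) pixel coordinates
    target: target location v_L, the center of the attended quadrant
    sigma:  width of the Gaussian attenuation (an assumed value)
    """
    d = np.asarray(target, float) - np.asarray(gaze, float)  # displacement d
    # Eccentric sampling: shift the image by the displacement
    shifted = np.roll(image, int(round(d[0])), axis=0)
    shifted = np.roll(shifted, int(round(d[1])), axis=1)
    # Contrast attenuated as a Gaussian function of displacement magnitude
    contrast = np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))
    return contrast * shifted
```

When gaze coincides with the target (d = 0), the image is seen at full contrast; as the displacement grows, the sampled image dims, emulating the sensory attenuation described above.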
