Author manuscript; available in PMC: 2018 May 2.
Published in final edited form as: Neural Comput. 2018 Mar 22;30(5):1209–1257. doi: 10.1162/NECO_a_01072

Predictive Coding in Area V4: Dynamic Shape Discrimination under Partial Occlusion

Hannah Choi 1,2,*, Anitha Pasupathy 2,3,4, Eric Shea-Brown 1,2,5,6
PMCID: PMC5930045  NIHMSID: NIHMS940202  PMID: 29566355

Abstract

The primate visual system has an exquisite ability to discriminate partially occluded shapes. Recent electrophysiological recordings suggest that response dynamics in intermediate visual cortical area V4, shaped by feedback from prefrontal cortex (PFC), may play a key role. To probe the algorithms that may underlie these findings, we build and test a model of V4 and PFC interactions based on a hierarchical predictive coding framework. We propose that probabilistic inference occurs in two steps. Initially, V4 responses are driven solely by bottom-up sensory input and are thus strongly influenced by the level of occlusion. After a delay, V4 responses combine both feedforward input and feedback signals from the PFC; the latter reflect predictions made by PFC about the visual stimulus underlying V4 activity. We find that this model captures key features of V4 and PFC dynamics observed in experiments. Specifically, PFC responses are strongest for occluded stimuli and delayed responses in V4 are less sensitive to occlusion, supporting our hypothesis that the feedback signals from PFC underlie robust discrimination of occluded shapes. Thus, our study proposes that area V4 and PFC participate in hierarchical inference, with feedback signals encoding top-down predictions about occluded shapes.

Keywords: Predictive coding, area V4, prefrontal cortex

1 Introduction

In natural scenes, objects rarely appear in isolation; rather, animals often have to discriminate and recognize partially occluded objects. While recognition under occlusion is difficult for even the best computer vision systems, animals seldom have trouble. The neural basis of this capacity, however, is poorly understood. Here, we study the physiological underpinnings of a special case of the general problem, where occluders can be detected as distinct stimulus features.

Feedback projections from higher cortices are hypothesized to be important for successful recognition of occluded objects (Rust & Stocker, 2010; Gregoriou et al., 2014), and there are abundant feedback connections in the visual stream. Despite this, models of object recognition are typically hierarchical feedforward circuits (Fukushima, 1980; Riesenhuber & Poggio, 1999; Serre et al., 2007; Cadieu et al., 2007; Yamins et al., 2014). This is partly because of the complexity of including feedback signals, but also because little is known about where the relevant feedback signals originate, where they terminate in visual cortex, and how they contribute to recognition. Developing a computational framework explaining how feedback facilitates shape recognition under occlusion, therefore, is a prominent challenge for visual neuroscience.

Recent experimental results provide key insights into how interactions between area V4, a fundamental stage in the primate shape processing pathway (Roe et al., 2012; Pasupathy & Connor, 1999, 2001), and the prefrontal cortex, important for the control of complex behavior (Miller & Cohen, 2001), may underlie the ability to recognize partially occluded objects (Kosai et al., 2014; Pasupathy et al., 2015; Fyall et al., 2017). Specifically, in monkeys trained to discriminate pairs of shapes under varying degrees of occlusion, dynamics of V4 and PFC activity suggest that feedback signals from PFC to area V4 may serve to discount the effect of occlusion on the responses of V4 neurons, thereby increasing shape selectivity. This raises the question of how the feedback signals in V4-PFC circuitry perform the computation necessary for shape recognition. In this paper, we propose and test the hypothesis that this occurs via hierarchical predictive coding. With the proposed model, we successfully explain the dynamics of a subpopulation of V4 neurons that exhibit a delayed response peak (Pasupathy et al., 2015; Fyall et al., 2017), presumably induced by feedback signals from PFC.

Predictive coding has been proposed as a method to create efficient neural codes, and has successfully described neural responses in a variety of different sensory systems (Bogacz, 2015; Bastos et al., 2012; Friston & Kiebel, 2009a, b; Srinivasan et al., 1982; Rao & Ballard, 1999; Spratling, 2016; Rao, 1997, 1999, 2004, 2005; Lee & Mumford, 2003; Yuille & Kersten, 2006). Notably, the predictive coding framework reproduces center-surround antagonism in retina (Srinivasan et al., 1982) and endstopping effects in V1 (Rao & Ballard, 1999). In these studies, feedforward signals from each cortical area represent the residual errors between the feedback predictions and the encoding expectation. This interpretation of feedforward signals, however, has met the criticism (Koch & Poggio, 1999) that it implies reduced firing when familiar sensory inputs are encountered, differing from the common view in which sensory neurons respond strongly to preferred features. Here, we introduce a novel implementation of predictive coding, where the responses in V4 and PFC correspond to their most likely (or optimal) values given the stimulus and a hierarchical representation of its likelihood. Furthermore, the hierarchical inference is implemented in two steps, initially reflecting only the feedforward sensory signals and later integrating the feedback predictions, to explain the dynamic shape selective responses in V4.

In addition to assigning an algorithmic role to the feedback signals, our model makes further predictions on the structure of the network, representation of the stimuli, and prior expectations encoded in V4 and PFC. Previous studies have shown that shapes can be discriminated based on V4 activity at the population level (Meyers et al., 2008; Pasupathy & Connor, 2002), and shape identity information is already available at the level of V4. However, in our model feedback predictions effectively re-map the population responses and amplify the shape identity information that is reduced by partial occlusion. Furthermore, our model predicts that such amplification of the shape identity information following feedback predictions occurs only when the occlusion is salient and distinct from the shape.

In sum, our model suggests that feedback signals to V4 during the representation of occluded shapes can be interpreted in the context of predictive coding. These results shed light on how prior expectations contribute to the recognition of complex images in V4 and higher cortical areas.

2 Methods

2.1 Experiments

Experimental procedures are described in detail by Kosai et al. (2014) and Fyall et al. (2017), and are only briefly outlined in this section to provide background.

Animals were trained on a sequential shape discrimination task, where two stimuli were presented in sequence and the animal had to report whether they were the same or different with a rightward or a leftward saccade, respectively. The second stimulus in the sequence was presented in the receptive field of the V4 neuron under study and was partially occluded. During recordings in area V4, all task details were customized to the preferences of the single neuron under study. Specifically, one of the two discriminanda was a preferred shape that elicited strong responses from the neuron while the other was a non-preferred shape. Both shapes were presented in a preferred color for the cell and the occluding dots were in a non-preferred color so they provided only a modulatory influence. For recordings in the PFC, we studied many neurons simultaneously and did not customize stimulus shape or color to individual neuronal preferences as is customary in the field. Each day the experimental session began as follows. We chose two stimuli to serve as the discriminanda. This was followed by two phases. First, during the training phase, animals performed the sequential discrimination task with the unoccluded versions of the discriminanda. This phase typically included 20 attempts and ensured that the unoccluded versions of the discriminanda were discriminable in the periphery. This was followed by the test phase, during which the discriminanda were occluded to different levels with a field of randomly positioned dots. The level of occlusion was titrated by varying dot diameter while the number of dots was held constant, and was quantified as the percentage of the shape area that remained visible (% visible area).

All animal procedures conformed to NIH guidelines and were approved by the Institutional Animal Care and Use Committee at the University of Washington.

2.2 Coding assumptions

We explain the response dynamics of V4 and PFC neurons during the shape discrimination task by building a computational model based on a few coding principles, which we introduce here.

First, we assume that average firing rates of the neuronal populations recorded in experiments reflect the most likely representation of the neuronal responses given the input visual stimulus and a specific hierarchical model of the responses that we define below. Thus, assuming the sensory system seeks to infer the most likely representation of neuronal responses {r1, …, rn} of hierarchical areas ranging from the lowest area 1 to the highest area n, we simply find the set of responses that maximizes the posterior probability p(r1, …, rn|κ), where κ represents the sensory input. We refer to these as the optimal firing rates.

Second, the model is constructed based on the hierarchical predictive coding principle. In predictive coding (Rao & Ballard, 1999; Friston & Kiebel, 2009a; Bogacz, 2015), feedback from higher cortical areas is interpreted as a prediction about activities in lower cortical areas. In the lower cortical areas, bottom-up sensory signals are combined with these top-down predictions. With the predictions and the sensory inputs thus combined, probability distributions of the neural responses are constructed based on hierarchical Bayesian inference (Rao & Ballard, 1999; Bogacz, 2015; Lee & Mumford, 2003; Yuille & Kersten, 2006). Under this assumption, combined with predictive coding, neuronal activities depend on the activities of the next higher area, but are conditionally independent of activities in other cortical areas. In other words, the neurons in area i + 1, whose activity is denoted as r_{i+1}, make the ‘top-down’ prediction Pred(r_{i+1}) of the neuronal activity r_i in area i. The noise η_i, which characterizes the difference between the actual neuronal response r_i and the prediction Pred(r_{i+1}) made by the next higher layer, is given as

$$\eta_i = r_i - \mathrm{Pred}(r_{i+1}). \tag{1}$$

We assume that the noise terms have a distribution g_i(η_i) with zero mean. This leads to p(r_i|r_{i+1}), the distribution of the neuronal activity r_i in area i given the next-level activity r_{i+1}, having its mean at the top-down prediction Pred(r_{i+1}).
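As a minimal sketch of Eq. 1, the prediction errors of a hierarchy can be computed level by level. The function name `prediction_errors`, the one-callable-per-level interface, and the linear toy predictor in the usage lines are assumptions made for illustration; the text itself only specifies that each error is the response minus the prediction from the next higher area.

```python
import numpy as np

def prediction_errors(responses, predictors):
    """Return the list of errors eta_i = r_i - Pred(r_{i+1}) (Eq. 1).

    responses  -- arrays ordered from the lowest to the highest area
    predictors -- callables; predictors[k] maps responses[k + 1] to a
                  prediction of responses[k] (hypothetical interface)
    """
    if len(predictors) != len(responses) - 1:
        raise ValueError("need exactly one predictor per adjacent pair of areas")
    return [r_low - predict(r_high)
            for r_low, r_high, predict in zip(responses[:-1], responses[1:], predictors)]

# Toy two-level usage with a linear prediction (illustrative numbers only).
weights = np.array([[1.0], [0.5]])
r_low = np.array([3.0, 1.0])
r_high = np.array([2.0])
errors = prediction_errors([r_low, r_high], [lambda r: weights @ r])
```

With a zero-mean noise distribution, these errors are what the inference drives toward zero, subject to the bottom-up evidence.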

The posterior probability of the response representation across all levels given the sensory stimulus κ therefore factors as

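$$p(r_1, \ldots, r_n \mid \kappa) = \nu \cdot p(\kappa \mid r_1, \ldots, r_n)\, p(r_1, \ldots, r_n) = \nu \cdot p(\kappa \mid r_1)\, p(r_1 \mid r_2) \cdots p(r_{n-1} \mid r_n)\, p(r_n), \tag{2}$$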

where ν is a normalization constant.

Above we described the general and classical framework for hierarchical representation of a stimulus κ via a sequence of firing rates. In summary, we assume that the brain aims to have neuronal activity in every layer get as close as possible to the prediction made by the responses of the next higher layer, where the discrepancy is given by a noise term ηi. Then, the neural firing rates adjust to those that are most consistent, i.e., most likely, given the stimulus κ. We next describe the specific form of the representation that we use here.

2.3 Model architecture

Our model is composed of two layers, a V4 layer and a PFC layer (Fig. 1A). We designate the higher cortical area as PFC based on the experimental evidence indicating feedback from PFC as a likely precursor of the delayed responses in V4 (Pasupathy et al., 2015; Fyall et al., 2017) (see Results). Furthermore, previous experimental studies have found anatomical and physiological evidence for direct feedforward (Ninomiya et al., 2012) and feedback (Barbas & Mesulam, 1985; Ungerleider et al., 2008) connections between V4 and PFC in the primate brain.

Figure 1.

Schematic diagram of network model. (A) Model network of V4 and PFC populations and the schematic of the input shape stimulus. By optimizing the cost function with respect to both V4 and PFC responses, the network implements both feedforward connections from V4 to PFC and feedback connections from PFC to V4. Note that the model is not image computable, and the input stimulus in the figure is given to illustrate the model setup. (B) Top-down predictions made by PFC on each of the three V4 units are represented by Gaussian distributions with means at f(u · rpfc) = u · rpfc. (C) Bottom-up component, which is represented by the conditional probability distributions of the V4 responses given the shape stimulus. When the input stimulus is unoccluded shape A, the response distribution of the shape A-selective V4 population has a higher mean than those of the shape B- and occluder-selective populations. As the occlusion level increases, the mean of the shape A-selective response distribution decreases and the standard deviation increases. Shape B-selective distribution stays at the constant baseline and the occluder-selective response distribution moves towards higher rates. The response distribution of each V4 population is shown in the same color as in (A).

The V4 layer is composed of three units: two that are selective for each of the two visual shapes that are being discriminated, namely, shape A and shape B (Fig. 1A, V4 unit 1 (green) and V4 unit 2 (blue), respectively), and a third V4 unit that responds selectively to the occluder-specific features, such as color (Fig. 1A, V4 unit 3 (red)). Shape selectivity has been previously demonstrated in area V4 (Pasupathy & Connor, 1999). While the existence of V4 cells that are selective exclusively for occluders has not been confirmed experimentally, a recent experimental study has found such strictly occluder-selective cells in the IT cortex (Namima & Pasupathy, 2016). Furthermore, we do not require that neurons corresponding to V4 unit 3 be exclusively selective for occluders independent of other stimulus features; rather, they could respond preferentially to any occluder-specific features. The V4 cells that preferentially respond to the color of the occluders are a good candidate, as in the experiment occluders were presented in a different color than the shape or the background. Supporting this, many V4 neurons are known to have color selectivity (Zeki, 1973; Schein & Desimone, 1990; Bushnell et al., 2011a; Bushnell & Pasupathy, 2012), and many are sensitive simply to stimulus area rather than shape (Eghbali et al., 2016). Indeed, in Fig. S1B, we present example V4 cells that respond strongly to the presence of occluders regardless of whether these occluders are presented with the preferred or the non-preferred shape. Each V4 unit can be interpreted as a sub-population of V4 neurons with similar tuning properties.

The model includes two PFC units, which represent two distinct neuronal populations in PFC. While the roles of PFC neurons are not well-understood, PFC is believed to be involved in planning complex behavior and tasks involving short-term memory (Miller & Cohen, 2001). Experimental recordings (Pasupathy et al., 2015; Fyall et al., 2017) from PFC also show that a subset of PFC neurons have mild shape selectivity, while also responding strongly to occluders.

The sum of PFC activities weighted by the connection weights between V4 and PFC units (Fig. 1A) is represented as the feedback signal to V4 units. The initial feedback connection weights between V4 units and PFC units are chosen so that the PFC units show appropriate selectivity after training. Namely, one of the PFC units in the model is designated to be weakly shape A-selective and the other PFC unit is weakly shape B-selective. Both PFC units respond strongly to partially occluded shapes, and only weakly to unoccluded shapes.

In this way, PFC neurons of the model respond strongly to both the task-relevant visual features (shape identity) and nuisance variables (occlusion level), while each of the V4 populations responds preferentially to a single feature of the input visual stimulus. Thus, although the V4 responses are already modulated by both shape and occlusion level, the signals become even more mixed as they go up in the hierarchy. Previous studies have shown that such mixed selectivity in the PFC plays an important computational role in a high-dimensional population encoding of task-relevant information (Rigotti et al., 2013; Fusi et al., 2016).

2.4 Probabilistic network model

As we detail further below, the responses of the neuronal units evolve toward values that maximize the posterior probability of these responses given the input shape stimulus. In other words, the neuronal activities, and synaptic weights at a slower time scale, are found by estimating the most likely values given the shape stimulus.

In our model, visual inputs are simplified and represented by κ, which includes the shape identity s (shape A or shape B, s ∈ {A, B}) and the degree of occlusion c (c ∈ [0, 1]), so that κ = (s, c). We assume that the V4-PFC circuitry builds a two-level hierarchical description of the input stimulus κ, via firing rates of V4 (rv4) and PFC neurons (rpfc). As it is assumed that each successive random variable is conditionally dependent only on the random variable in the adjacent higher level, the posterior probability of the V4 and PFC responses given κ factors as

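$$\begin{aligned}
p(r_{v4}, r_{pfc} \mid \kappa) &= h_0 \cdot p(\kappa \mid r_{v4}, r_{pfc})\, p(r_{v4}, r_{pfc}) \\
&= h_0 \cdot p(\kappa \mid r_{v4}, r_{pfc})\, p(r_{v4} \mid r_{pfc})\, p(r_{pfc}) \\
&= h_0 \cdot p(\kappa \mid r_{v4})\, p(r_{v4} \mid r_{pfc})\, p(r_{pfc}) \\
&= h \cdot p(\kappa \mid r_{v4})\, p(r_{v4} \mid r_{pfc}),
\end{aligned} \tag{3}$$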

where h0 and h are constants. The first equality comes from Bayes’ theorem, with a normalization term h0. The second equality is simply a property of joint probability. The third equality is based on the assumption that the probability distribution is set up hierarchically. Based on the assumption of spatially Markovian inference (Lee & Mumford, 2003; Rao & Ballard, 1999; Friston & Kiebel, 2009a; Bogacz, 2015), we made a simplification p(κ|rv4, rpfc) = p(κ|rv4) in Eq. 3. Finally, a flat prior on the PFC firing rates is assumed, which is embedded in the constant h on the last line of Eq. 3, and therefore, the posterior probability of the neuronal responses is

$$p(r_{v4}, r_{pfc} \mid \kappa) = h \cdot p(\kappa \mid r_{v4})\, p(r_{v4} \mid r_{pfc}). \tag{4}$$

The firing rates of the V4 and PFC units are given as

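$$r_{v4} = \begin{bmatrix} r_{v4,1} \\ r_{v4,2} \\ r_{v4,3} \end{bmatrix}, \quad r_{pfc} = \begin{bmatrix} r_{pfc,1} \\ r_{pfc,2} \end{bmatrix}, \tag{5}$$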

where rv4,1 and rv4,2 represent the average firing rates of the shape-selective V4 neuronal populations (preferring shape A and shape B, respectively), and rv4,3 is the average firing rate of the occluder feature-selective V4 population.

We first describe p(κ|rv4) and how V4 firing rates depend on the input stimulus κ. We define μ as the bottom-up representation of the stimulus

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}. \tag{6}$$

The difference between this bottom-up representation and the V4 responses rv4 gives the noise term η1,

$$\eta_1 = \mu - r_{v4}, \tag{7}$$

which has a Gaussian distribution with zero mean and diagonal covariance matrix

$$\Sigma_1 = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}. \tag{8}$$

The distribution p(κ|rv4) is the likelihood of the V4 neuronal activities given the sensory input κ. Assuming a flat prior on rv4, p(κ|rv4) ∝ p(rv4|κ). Thus,

$$p(r_{v4} \mid \kappa) = \mathcal{N}(r_{v4}; \mu, \Sigma_1). \tag{9}$$

The mean μ and the covariance matrix Σ1 are determined by the input shape identity s and the occlusion level c. Changes in μ and Σ1 describe the sensory-input driven responses of the V4 populations to different shapes under various degrees of occlusion. In other words, for each occlusion level and the shape identity, there is a most-likely firing rate of each V4 unit given by μ, and that likelihood falls off according to the covariance Σ1.

Here we describe how we modulate μ and Σ1 based on the sensory input κ. Let us assume the animal is presented with shape A as the test shape. With shape A presented, μ1, the Gaussian mean of the firing rate distribution of V4 unit 1 in Fig. 1A (the shape A-selective V4 population), decreases as occlusion c increases (Fig. 1C, green). On the other hand, μ2 of the V4 population preferring shape B (V4 unit 2 in Fig. 1A) stays constant at a “baseline” firing rate, independent of the change in occlusion level. That is, because V4 unit 2 does not prefer shape A, it responds with a low firing rate regardless of the occlusion level (Fig. 1C, blue). The standard deviation σ1 of the preferred V4 unit increases as occlusion increases, in order to capture the increasing uncertainty of the shape identity under higher degrees of occlusion (see Fig. 1C, where the green distribution widens as occlusion increases). The standard deviation σ2 of the non-preferred V4 population (V4 unit 2) is assumed to be constant.

A justification for increasing the input variance σ1, but not σ2, with occlusion is as follows. These terms represent uncertainty in shape identity signals. We hypothesize that occlusion introduces the most uncertainty for neuronal responses to preferred shapes, as randomly placed occluders may either hide critical features of the preferred shapes or fail to hide these features. In the first case, shape signals will be strongly suppressed; in the second, they will be maintained. For non-preferred shapes, which lack critical features, we hypothesize that shape signals for different occlusion patterns will be less volatile. Supporting this, while random placement of occluders may form accidental contours, a previous experimental study in V4 has shown that responses to preferred contours are suppressed when those contours are accidentally formed at the junction between the occluded and occluding objects (Bushnell et al., 2011b). Accordingly, the variance σ2 should be roughly constant with added occlusion, or if increasing, only by a small amount. In Results, we explore in more detail which trends in variances are consistent with the data (Fig. 9).

Figure 9.

Model of graded encoding of feedforward sensory inputs across a heterogeneous V4 population. (A) Schematic of an expanded model composed of two PFC units, ten occluder-selective V4 units, and twenty shape-selective V4 units with graded shape preferences. (B) Input-dependent peak firing rates (top) and variances (bottom) as a function of occlusion level, for occluder-preferred V4 units (red) and shape-selective V4 units (blue-green). (C) The model responses of PFC units (i), occluder-selective V4 units (ii), and shape-selective V4 units (iii). A selected number of the shape-selective V4 unit responses are shown in (iv) for a better display. Solid lines indicate initial responses, and dotted lines correspond to delayed responses.

Finally, for V4 unit 3, the relevant stimuli (occluding dots) are present on every trial but slightly shifted in position; as we assume that this unit responds to the presence of occluders but not their specific configuration, the variance σ3 is taken to be constant across occlusion levels (Fig. 1C, red). As the occlusion level increases, μ3 of the occluder-selective V4 population (V4 unit 3) also increases.

The dependence of the means and the variances on the occlusion level c was set to be linear: μ = μ0 + α · c and Σ1 = Σ0 + β · c with μ0 = [50 20 20]^T, α = [−5 0 100]^T, Σ0 = I_3, and β = [5 0 0]^T. The slopes (α, β) and the values defining the response distributions when the shape is unoccluded (μ0, Σ0, at c = 0) were manually chosen to match the peak firing rates observed in experiments. With this choice of α, as the occlusion level increases, the peak of the response distribution decreases, stays constant at a low baseline firing rate, and increases, for V4 units 1, 2, and 3, respectively. Thus, V4 units 1 and 2 reproduce response patterns of V4 neurons to preferred and non-preferred shapes under varying degrees of occlusion in experiments (Fig. 2; S1A), and unit 3 replicates V4 neurons that respond strongly to occlusion (Fig. S1B). The values chosen for β, on the other hand, indicate that ambiguity of the stimulus feature increases only for V4 unit 1, which prefers the test shape. In this way, the input stimuli (shape A and shape B with various degrees of occlusion) are represented by the response distributions of three different V4 populations given κ, rather than by actual pixel images.
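The linear dependence on occlusion can be written out as a short sketch. The numerical values are those quoted in the text; the function name `bottom_up_params`, the restriction to shape A as the test shape, and the reading of β as an increment to the diagonal of Σ1 are assumptions of this example.

```python
import numpy as np

# Values quoted in the text for shape A as the test shape.
MU0 = np.array([50.0, 20.0, 20.0])      # means at c = 0
ALPHA = np.array([-5.0, 0.0, 100.0])    # slopes of the means
SIGMA0 = np.eye(3)                      # covariance at c = 0
BETA = np.array([5.0, 0.0, 0.0])        # slopes of the variances

def bottom_up_params(c):
    """Return (mu, Sigma_1) for an occlusion level c in [0, 1].

    Assumption: beta is added to the diagonal entries of Sigma_1, so the
    matrix stays diagonal for every c.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("occlusion level c must lie in [0, 1]")
    mu = MU0 + ALPHA * c
    sigma1 = SIGMA0 + np.diag(BETA * c)
    return mu, sigma1

mu_half, sigma1_half = bottom_up_params(0.5)   # intermediate occlusion
```

Under these assumptions only the first diagonal entry of Σ1 changes with c, which is how the sketch encodes growing uncertainty for the preferred-shape unit alone.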

Figure 2.

Recordings from V4 and PFC show characteristic response dynamics. (A) Example V4 cell responses to a preferred (left) and a non-preferred shape (right) during the discrimination task. Test stimulus onset was at time 0 ms. Level of occlusion was measured by % unoccluded area (line color). Black line (100% unoccluded) represents the unoccluded stimulus. Two transient peaks are identified by filled and open rectangles. (B) The time averaged V4 firing rates during the initial and the delayed peaks (identified in A) as a function of occlusion level. Solid lines show averaged firing rates for the preferred shape during the initial peak, and the dotted lines indicate averaged firing rates during the delayed transients, as marked above response traces in (A). (C) Response of an example PFC cell to the two shape stimuli (left and right) during the discrimination task. (D) Averaged PFC responses as a function of occlusion level. Responses to each of the two shapes are shown in green and blue, respectively. Population data follow the same trend. Data adapted with permission from Pasupathy et al. (2015); Fyall et al. (2017).

The second term on the right side of Eq. 4, p(rv4|rpfc), provides the top-down effects on the posterior distribution, also described as Gaussian. Here, the mean is the prediction made by PFC, u · rpfc, which is the sum of the two PFC population responses weighted by the connection weight matrix u. In more general cases, this weighted sum is filtered by a nonlinearity f, thus yielding the top-down prediction f(u · rpfc) (Fig. 1B). For the simulations in this study, however, the nonlinearity on weighted PFC responses was ignored and the predictions were assumed to be linear, i.e., f(u · rpfc) = u · rpfc, as in Rao & Ballard (1999). The connection weights between the V4 and PFC neuronal units are given as

$$u = \begin{bmatrix} u_{1,1} & u_{1,2} \\ u_{2,1} & u_{2,2} \\ u_{3,1} & u_{3,2} \end{bmatrix}. \tag{10}$$

The difference between u · rpfc, the top-down prediction made by PFC, and the V4 responses rv4 is then

$$\eta_2 = r_{v4} - u \cdot r_{pfc}, \tag{11}$$

where the noise η2 has a Gaussian distribution with zero mean and covariance matrix Σ2,

$$\Sigma_2 = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}. \tag{12}$$

The distribution of V4 responses given the PFC responses, p(rv4|rpfc), is then

$$p(r_{v4} \mid r_{pfc}) = \mathcal{N}(r_{v4}; u \cdot r_{pfc}, \Sigma_2). \tag{13}$$

The standard deviation of the response distribution of each V4 unit given the PFC responses determines the relative significance of the top-down predictive contribution on shaping the V4 responses. Specifically, a smaller standard deviation leads to smaller noise terms, forcing closer matches between PFC and V4 responses. These standard deviations were chosen as σ1 = 10, σ2 = 10, and σ3 = 1. Thus, the top-down component is more strongly emphasized for V4 unit 3, the V4 neuronal population selective for occluders. We found that such emphasis on the predictive component for the occluder-selective V4 population was necessary to reproduce the experimentally observed PFC response characteristics – an increase in PFC responses with a rise in occlusion level (see Results).

Given the visual stimulus κ, the firing rates rv4 and rpfc adjust in order to maximize the posterior distribution, namely, p(κ|rv4)p(rv4|rpfc). Maximizing this is equivalent to minimizing its negative logarithm, which, up to a factor of 2 and additive terms that do not depend on the responses or the weights, is the cost function E,

$$E = (r_{v4} - \mu)^T \Sigma_1^{-1} (r_{v4} - \mu) + (r_{v4} - u \cdot r_{pfc})^T \Sigma_2^{-1} (r_{v4} - u \cdot r_{pfc}). \tag{14}$$

Note that this cost function is the sum of the squared error η1^T η1 between the V4 responses and the sensory-input imposed representation, and the squared error η2^T η2 between the V4 responses and the top-down prediction made by PFC, weighted by their inverse variances.
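The relation between Eq. 14 and the Gaussian posterior can be checked numerically. The sketch below is illustrative only: the function names and the weight matrix used in the usage lines are hypothetical, and the Gaussian log-density is written out by hand so that the block depends only on NumPy.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian N(x; mean, cov)."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2.0 * np.pi))

def cost(r_v4, r_pfc, u, mu, sigma1, sigma2):
    """Cost E of Eq. 14: two prediction errors weighted by inverse covariances."""
    e1 = r_v4 - mu            # bottom-up error
    e2 = r_v4 - u @ r_pfc     # top-down prediction error
    return e1 @ np.linalg.solve(sigma1, e1) + e2 @ np.linalg.solve(sigma2, e2)

def neg_log_posterior(r_v4, r_pfc, u, mu, sigma1, sigma2):
    """Negative log of N(r_v4; mu, Sigma_1) * N(r_v4; u r_pfc, Sigma_2)."""
    return -(gaussian_logpdf(r_v4, mu, sigma1)
             + gaussian_logpdf(r_v4, u @ r_pfc, sigma2))

# Illustrative inputs; the weights u are hypothetical.
mu = np.array([50.0, 20.0, 20.0])
sigma1 = np.eye(3)
sigma2 = np.diag([100.0, 100.0, 1.0])
u = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
example_cost = cost(np.array([51.0, 20.0, 20.0]), np.array([51.0, 20.0]),
                    u, mu, sigma1, sigma2)
```

For fixed Σ1 and Σ2, twice the negative log posterior differs from E only by a constant, so both have the same minimizers with respect to the responses and the weights.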

The optimal “parameters” – the neuronal responses and the connection weights – are thus found by minimizing this cost function E with respect to the parameters rv4, rpfc, and u. The initial V4 responses in experiments, which presumably depend only on the feedforward sensory input, are found by minimizing only the first term of Eq. 14. The initial responses are therefore equal to the sensory-driven representation μ. However, the delayed V4 responses, which we hypothesize to depend on both the feedforward sensory input and the feedback prediction, are found by minimizing the entire cost function Eq. 14.
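A short worked step may make the role of the two covariance matrices concrete. Because Σ1 and Σ2 are diagonal, E separates across V4 units. Writing s_i^2 for the i-th diagonal entry of Σ1 and τ_i^2 for the i-th diagonal entry of Σ2 (labels introduced here only to keep the two matrices apart), and holding the top-down prediction fixed, the stationarity condition for each unit reads:

```latex
\frac{\partial E}{\partial r_{v4,i}}
  = \frac{2\,(r_{v4,i} - \mu_i)}{s_i^2}
  + \frac{2\,\bigl(r_{v4,i} - [u \cdot r_{pfc}]_i\bigr)}{\tau_i^2} = 0
\quad\Longrightarrow\quad
r_{v4,i} = \frac{\tau_i^2\,\mu_i + s_i^2\,[u \cdot r_{pfc}]_i}{s_i^2 + \tau_i^2}.
```

The delayed response of each unit is therefore a weighted average of the sensory-driven value and the prediction, and the weight on the prediction grows as s_i^2 grows relative to τ_i^2. If the linear occlusion dependence is read as acting on the diagonal variances, the preferred-shape unit has τ_1^2 = 100 with s_1^2 rising from 1 to 6, so the prediction receives a weight of about 1/101 without occlusion and 6/106 at full occlusion.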

2.5 Training protocol: weight adjustment during the preliminary phase

We divide the optimization process into two phases based on the experimental setup: the preliminary phase and the test phase. In this section, we discuss how the synaptic weight matrix between PFC and V4 is found during the preliminary phase. To find these weights, we minimized the cost function E with respect to rv4 and rpfc as well as with respect to the connection weight matrix u, over a series of unoccluded trials. Then during the test phase, the optimal estimates of the neuronal responses to shapes under varying degrees of occlusion are determined by minimizing the cost function with respect to rv4 and rpfc, with the connection weights fixed at the learned values.

The preliminary phase corresponds to the stage at the beginning of the experiment where the animal is exposed about 20 times to the pair of unoccluded shapes used for the experimental session. We introduced its equivalent in the simulation, during which the cost function E is minimized by gradient descent with respect to the firing rates of the V4 units rv4 and PFC units rpfc, as well as the connection weight matrix u. During this phase, unoccluded shape A and shape B are randomly chosen and used as inputs to the model for up to 30 trials.

The optimal estimates of rv4, rpfc, and u are obtained by performing gradient descent on E with respect to these parameters at different learning rates:

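$$\frac{dr_{v4}}{dt} = -k_r \frac{\partial E}{\partial r_{v4}}, \qquad \frac{dr_{pfc}}{dt} = -k_r \frac{\partial E}{\partial r_{pfc}}, \qquad \frac{du}{dt} = -k_u \frac{\partial E}{\partial u}. \tag{15}$$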
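For reference, the partial derivatives that enter Eq. 15 follow directly from Eq. 14 when Σ1 and Σ2 are symmetric and held fixed for a given stimulus. This expansion is a worked step added for the reader rather than a quotation from the original derivation:

```latex
\begin{aligned}
\frac{\partial E}{\partial r_{v4}} &= 2\,\Sigma_1^{-1}(r_{v4} - \mu) + 2\,\Sigma_2^{-1}(r_{v4} - u \cdot r_{pfc}), \\
\frac{\partial E}{\partial r_{pfc}} &= -2\,u^{T}\,\Sigma_2^{-1}(r_{v4} - u \cdot r_{pfc}), \\
\frac{\partial E}{\partial u} &= -2\,\Sigma_2^{-1}(r_{v4} - u \cdot r_{pfc})\,r_{pfc}^{T}.
\end{aligned}
```

The weight update is thus proportional to the outer product of the precision-weighted top-down prediction error and the PFC activity.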

The learning rate of u, ku = 0.001, was significantly smaller than that of rv4 and rpfc, kr = 0.1. This models the relatively faster dynamics of firing rates and the slower dynamics of synaptic plasticity. For each selected shape, we carried out gradient descent either until the firing rates reached steady state after a minimum of 20 iterations, or until the maximum of 500 iterations was reached. While rv4 and rpfc rapidly converge to a fixed point for each of the sampled shapes, the connection matrix u gradually converges over the course of multiple samples of shapes A and B. In this way, the weight matrix u is tuned over the course of the preliminary phase, which corresponds to the animal’s familiarization with the pair of shapes at the beginning of the experiment.
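One way to picture a single stimulus presentation in the preliminary phase is an Euler discretisation of Eq. 15. This is a sketch under stated assumptions, not the authors' implementation: the function names, the steady-state tolerance `tol`, and the stopping test on the size of the firing-rate update are hypothetical, and the text does not give the initial values of rv4, rpfc, or u. Whether the discrete updates remain stable depends on the scale of u and rpfc, which is left to the caller.

```python
import numpy as np

def gradients(r_v4, r_pfc, u, mu, sigma1_inv, sigma2_inv):
    """Partial derivatives of E (Eq. 14) with respect to r_v4, r_pfc and u."""
    e1 = r_v4 - mu            # bottom-up error
    e2 = r_v4 - u @ r_pfc     # top-down prediction error
    g_v4 = 2.0 * (sigma1_inv @ e1 + sigma2_inv @ e2)
    g_pfc = -2.0 * (u.T @ (sigma2_inv @ e2))
    g_u = -2.0 * np.outer(sigma2_inv @ e2, r_pfc)
    return g_v4, g_pfc, g_u

def run_trial(mu, sigma1, sigma2, r_v4, r_pfc, u,
              k_r=0.1, k_u=0.001, min_iter=20, max_iter=500, tol=1e-6):
    """Gradient descent for one stimulus; returns (r_v4, r_pfc, u, iterations)."""
    s1_inv, s2_inv = np.linalg.inv(sigma1), np.linalg.inv(sigma2)
    r_v4, r_pfc, u = r_v4.copy(), r_pfc.copy(), u.copy()
    n = 0
    for n in range(1, max_iter + 1):
        g_v4, g_pfc, g_u = gradients(r_v4, r_pfc, u, mu, s1_inv, s2_inv)
        # Fast firing-rate updates and slow weight updates, applied together.
        r_v4 = r_v4 - k_r * g_v4
        r_pfc = r_pfc - k_r * g_pfc
        u = u - k_u * g_u
        # Assumed steady-state test: largest firing-rate change below tol.
        change = k_r * max(np.abs(g_v4).max(), np.abs(g_pfc).max())
        if n >= min_iter and change < tol:
            break
    return r_v4, r_pfc, u, n
```

Calling `run_trial` repeatedly with randomly chosen unoccluded shapes, and passing the returned u into the next call, would mirror the slow tuning of the weights across trials described above.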

We set initial weights for u1,2 and u2,1 smaller than the initial values of other connection weights, to slightly bias one of the PFC populations (PFC unit 1) to be shape A-selective and the other (PFC unit 2) to be shape B-selective.

We acknowledge a limitation of the gradient descent method on E in Eq. 15, which is that it requires nonlocal computation. In other words, the activities and the synaptic strengths of all the neuronal units in the system must be known in order to take a gradient descent step, a requirement that is not physiologically realistic. This issue also exists in previous models of predictive coding and sparse coding in the visual system (Rao & Ballard, 1999; Olshausen & Field, 1996, 1997), as pointed out by Bogacz (2015) and Zylberberg et al. (2011). While we do not pursue this matter further here, we note that Zylberberg et al. (2011) show that in the limit where the neuronal activity is sparse and uncorrelated, the nonlocal gradient descent rule is approximately equivalent to a synaptically local rule.

2.6 Optimal stimulus representation during the test phase

Once the weight matrix u has converged over the course of the preliminary phase, it is fixed at the learned values during the test phase. The test phase corresponds to the recording session where the animal performs the matching task while test shapes with varying degrees of occlusion are displayed. We hypothesize that the V4 and PFC recordings from the experiment are represented by the average firing rates of the V4 and PFC populations in the model network, rv4 and rpfc, that minimize the cost function E. Either shape A or shape B can be used as the input to the network. Without loss of generality, however, in this paper we only show the simulations with shape A as the test shape, so that the V4 unit selective for shape A (V4 unit 1) is the “preferred” population and the shape B-selective V4 unit (V4 unit 2) is the “non-preferred” population.

For each occlusion level, the optimization is carried out in two parts, to reflect the dynamics of the V4 responses. The initial responses of V4 neurons observed in experiments are compared to the V4 responses rv4 that minimize the first part of the cost function E (Eq. 14), namely,

$$E_1 = (r_{v4} - \mu)^{T}\, \Sigma_1^{-1}\, (r_{v4} - \mu). \tag{16}$$

E1 is simply a weighted difference between the V4 neuronal responses and the V4 responses predicted by the bottom-up sensory input. Therefore, the rv4 that minimizes E1 is interpreted as the V4 response shaped by only the feedforward inputs.

On the other hand, the delayed responses of V4 neurons, as well as the PFC responses, are found by minimizing the entire cost function E (Eq. 14) with respect to rv4 and rpfc. We rewrite the full cost function E as E2:

$$E_2 := E = (r_{v4} - \mu)^{T}\, \Sigma_1^{-1}\, (r_{v4} - \mu) + (r_{v4} - u \cdot r_{pfc})^{T}\, \Sigma_2^{-1}\, (r_{v4} - u \cdot r_{pfc}). \tag{17}$$

E2 includes a term that depends on the difference between rv4 and the top-down predictions made by PFC, u · rpfc, in addition to the error term between rv4 and the V4 responses predicted by the input visual stimulus. Therefore, the rv4 that minimizes this cost function E2 is interpreted as the V4 response shaped by both the feedforward and the feedback signals. This rv4 is compared to the delayed responses of V4 neurons in experiments, which we hypothesize to be induced by feedback from PFC.

E1 and E2 are minimized with respect to rv4 and rpfc using gradient descent and MATLAB fminsearch, starting from an initial value of 10 spikes/s for all neuronal units.
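The two-step optimization can be illustrated with a minimal Python sketch. It is not the implementation used in the paper (which relies on gradient descent and MATLAB fminsearch): because E2 is quadratic in rv4 and rpfc once u is fixed, the sketch instead solves the stationarity conditions as a linear system. The covariances are assumed diagonal, and the function `stimulus` with all of its numbers (means, variances, occlusion levels) is hypothetical and chosen only for illustration; the weights `U` are the learned values reported in Section 3.3.

```python
import numpy as np

# Learned feedback weights reported in Section 3.3 (rows: V4 units, columns: PFC units).
U = np.array([[2.32, 0.21],
              [0.26, 2.37],
              [0.94, 0.94]])

def initial_response(mu):
    """Minimizer of E1 (Eq. 16): for positive-definite weighting it is mu itself."""
    return mu.copy()

def delayed_response(mu, var1, var2, u=U):
    """Minimizer of E2 (Eq. 17) over r_v4 and r_pfc, assuming diagonal covariances."""
    A = np.diag(1.0 / var1)  # assumed Sigma_1^{-1}
    B = np.diag(1.0 / var2)  # assumed Sigma_2^{-1}
    n_v4, n_pfc = u.shape
    # Setting dE2/dr_v4 = 0 and dE2/dr_pfc = 0 gives one linear system.
    H = np.block([[A + B, -B @ u],
                  [-u.T @ B, u.T @ B @ u]])
    rhs = np.concatenate([A @ mu, np.zeros(n_pfc)])
    sol = np.linalg.solve(H, rhs)
    return sol[:n_v4], sol[n_v4:]

def stimulus(c):
    """Hypothetical input statistics for shape A at occlusion level c in [0, 1]."""
    mu = np.array([40.0 - 10.0 * c, 10.0, 18.0 + 40.0 * c])  # assumed means
    var1 = np.array([(2.0 + 2.0 * c) ** 2, 4.0, 4.0])         # assumed feedforward variances
    var2 = np.array([16.0, 16.0, 16.0])                       # assumed feedback variances
    return mu, var1, var2

for c in (0.0, 0.2, 0.4, 0.6):
    mu, var1, var2 = stimulus(c)
    r_init = initial_response(mu)
    r_delayed, r_pfc = delayed_response(mu, var1, var2)
    print(c, np.round(r_init, 2), np.round(r_delayed, 2), np.round(r_pfc, 2))
```

With these made-up inputs, the response of V4 unit 1 falls by less between c = 0 and c = 0.6 in the delayed step than in the initial step, which is the qualitative effect the two-step scheme is meant to capture.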

3 Results

We first present experimental evidence that supports the hypothesis that feedback signals from PFC modulate shape representations in V4 (Experimental evidence for feedback signals in area V4). We then compare the outcomes in our probabilistic network model (Structure and design of probabilistic network model) to physiology and explain how robust shape recognition can be achieved in our model (Two-step inference on neuronal dynamics). Subsequently, we identify necessary assumptions on the network structure (Parsimony of the network structure) and the signal structure (Structure of inputs to V4; Differential weighting of feedforward and feedback inputs) of the model to capture the key trends in the experimental results. Finally, using our model, we make predictions on shape selective neuronal responses to a new type of reduced stimulus clarity (Model prediction for responses to non-salient occlusion, noise, or reduced contrast).

3.1 Experimental evidence for feedback signals in area V4

Recent experiments demonstrated that neurons in V4 and PFC show strikingly different response patterns in monkeys performing a sequential shape discrimination task. Specifically, a class of V4 neurons shows evidence of feedback signals from PFC, supported by interesting response patterns in these V4 neurons and PFC neurons. Our goal in this study is to provide a normative model describing these experimental results.

Fig. 2A shows the response dynamics of an example V4 cell to a preferred shape (left) and a non-preferred shape (right). The V4 neuron exhibits two transient peaks when the preferred shape is presented, but only one smaller peak for the non-preferred shape. In the initial transient at the onset of the preferred shape stimulus, the V4 neuron responds strongly to the unoccluded shape (black), and increasing occlusion weakens the shape selective responses (color). While the first peak shows a dramatic dependence on occlusion, the later peak shows a weaker dependence. Fig. 2B shows the averaged responses of the V4 neuron during the initial transient (50–125 ms) and the delayed transient (175–250 ms), illustrating the differential effects of occlusion on V4 responses over time. The reduced effect of occlusion on V4 responses to the preferred shape during the second transient leads to enhanced shape selectivity, as previously observed in Pasupathy et al. (2015), Kosai et al. (2014), and Fyall et al. (2017). Such response patterns were observed in many other V4 neurons in experiments. In Fig. S1A, we show a few more example V4 cells that exhibit shape selective responses that are less sensitive to occlusion during the delayed response peak.

In contrast to V4 neurons, PFC neurons exhibit one peak, and show strongest responses to occluded stimuli and weakest responses to unoccluded stimuli, as shown for an example PFC neuron in Fig. 2C (Pasupathy et al., 2015). Fig. 2D shows the time-averaged responses of the PFC neuron as a function of occlusion level, for both the preferred and the non-preferred shapes. As occlusion increases, the PFC responses increase, the opposite of the trend in V4. Moreover, the peak PFC responses occur between the initial and the delayed transients of V4 responses, consistent with the hypothesis that the PFC responses, which arise from feedforward transmission of sensory information, in turn send feedback inputs and drive the second peak of responses in V4. These experimental observations led us to the hypothesis that the feedback inputs from PFC and other higher cortices underlie delayed improvement of shape selective responses under occlusion in V4. For more details on the experimental results, see Kosai et al. (2014) and Pasupathy et al. (2015).

3.2 Structure and design of probabilistic network model

We sought to understand the response dynamics of V4 and PFC neurons in the context of predictive coding, a hierarchical encoding of stimuli widely used to probe interactions of lower and higher sensory areas. We first pose a probabilistic network model of the V4-PFC circuitry with the presumptive feedback based on predictive coding, and introduce an innovation that differentiates our model from previous predictive coding models.

In each layer of our V4-PFC network model, there are distinct units, each of which represents a neuronal population with similar tuning properties. The V4 layer is composed of three units that respond preferentially to different features of the visual stimulus (Fig. 1A): unit 1 to shape A, unit 2 to shape B, and unit 3 to an occluder-specific feature, for example the color of the occluders. In PFC, there are two units that respond strongly to occlusion, while also exhibiting some degree of shape selectivity. The representation of a population of neurons as a single unit is a common simplification, and we find that replacing each unit with a population of multiple neurons with mild heterogeneity yields qualitatively the same response trends as the single-unit model (see Appendix S2).

In the model, V4 receives feedforward sensory inputs and seeks to match the responses imposed by the sensory inputs. At the same time, feedback predictions from PFC bias the V4 responses. The weighted sums of PFC responses provide top-down predictions conditioned on underlying visual stimulus, and are regarded as the feedback from PFC to V4. With hierarchical Bayesian inference assumed, the most likely representation of the responses is obtained by finding a set of responses that maximize the posterior probability given the visual stimulus, which is equivalent to the product of conditional probabilities of the neuronal activities given only the activities of the next higher area (See Methods, Eq. 3). Note that we are not only finding the optimal responses of V4 units, but also the optimal responses of PFC units to minimize the cost function. Therefore, the V4 neuronal responses drive the PFC responses while the PFC predictions drive V4 neurons, enacting feedforward and feedback connections between V4 and PFC. Here, the visual input to each V4 unit is represented as a Gaussian distribution, whose mean and variance change according to the shape identity and the occlusion level (Fig. 1C). Similarly, the feedback from PFC to each V4 unit is described by a Gaussian distribution with the peak at a sum of the PFC responses weighted by the synaptic strengths (Fig. 1B).

In this way, the optimal representation of the neuronal responses integrates both the bottom-up sensory input and the top-down prediction. This is done by minimizing a cost function composed of the difference between the V4 activities and the top-down predictions as well as the difference between the V4 activities and the V4 responses predicted by the sensory input, with each term inversely weighted by its respective variance (See Methods, Eq. 14). We compare this optimal representation directly to the neuronal responses in experiments; this differs from previous studies (Rao & Ballard, 1999; Srinivasan et al., 1982) where the residual error between the prediction and the neuronal activity was associated with physiologically measured responses. With this reformulation, neural activity conveys both the sensory input and the internal prediction, preventing the situation in original implementations of predictive coding in which neurons have depressed activity when familiar stimuli are presented regardless of the sensory tuning properties.
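The inverse-variance weighting can be made concrete with a one-unit illustration. This is a simplification rather than the full model: a single V4 response r, a top-down prediction p that is held fixed, and scalar variances $\sigma_1^2$ (feedforward) and $\sigma_2^2$ (feedback); the symbols r and p are introduced here only for this example.

```latex
\begin{aligned}
E(r) &= \frac{(r-\mu)^2}{\sigma_1^2} + \frac{(r-p)^2}{\sigma_2^2},\\[4pt]
\frac{dE}{dr} = 0 \;\;\Longrightarrow\;\;
r^{*} &= \frac{\mu/\sigma_1^2 + p/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}
       \;=\; \mu + \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\,(p-\mu).
\end{aligned}
```

In this scalar case the optimal response is a precision-weighted average: the larger the feedforward variance $\sigma_1^2$ relative to $\sigma_2^2$, the further the response moves from the sensory mean μ toward the prediction p.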

3.3 Network training and synaptic weight matrix

First, the network was trained following the experimental procedure in which the animal was exposed to the pair of unoccluded shapes. During this preliminary phase, the connection weight matrix u between PFC and V4 is learned by gradient descent on the cost function E with respect to the weights u as well as the neuronal responses rv4 and rpfc, while unoccluded shape stimuli, randomly selected from the set of shape A and shape B, are input to the network. The learning rate for neuronal firing rates is much larger than that for weights (see Methods, Eq. 15). Thus, for each sampled shape, the firing rates of the neuronal units converge rapidly. The weight matrix u converges on a slower time scale, over the course of the preliminary phase with multiple presentations of unoccluded shapes. With initial values of the connection weights set to

$$u = \begin{bmatrix} 1 & -1 \\ -1 & 1 \\ 1 & 1 \end{bmatrix},$$

the connection weight matrix converges to

$$u = \begin{bmatrix} 2.32 & 0.21 \\ 0.26 & 2.37 \\ 0.94 & 0.94 \end{bmatrix},$$

where the asymmetric weights between the PFC units and the shape-selective V4 units indicate shape selectivity in PFC units. The shape selectivity in PFC units and resulting response characteristics are preserved as long as the initial values for u1,2 and u2,1 are sufficiently smaller than u1,1 and u2,2 to introduce an initial bias on shape selectivity.
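The preliminary-phase procedure can be sketched in Python as follows. The learning rates, iteration limits, initial firing rate of 10 spikes/s, and initial weights are taken from the text; everything else is an assumption made for illustration, including the diagonal variances `VAR1` and `VAR2`, the mean responses in `MU`, the steady-state tolerance `TOL`, the number of presentations, and the function names. The sketch is not expected to reproduce the converged weights reported above.

```python
import numpy as np

K_R, K_U = 0.1, 0.001          # learning rates from Methods (Eq. 15)
MIN_ITER, MAX_ITER = 20, 500   # iteration limits from Methods
TOL = 1e-4                     # assumed steady-state tolerance (not from the paper)
VAR1 = np.full(3, 10.0)        # assumed diagonal of the feedforward covariance
VAR2 = np.full(3, 10.0)        # assumed diagonal of the feedback covariance
MU = {"A": np.array([40.0, 10.0, 18.0]),   # assumed mean V4 input, unoccluded shape A
      "B": np.array([10.0, 40.0, 18.0])}   # assumed mean V4 input, unoccluded shape B
U0 = np.array([[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])  # initial weights from the text

def cost(r_v4, r_pfc, u, mu):
    """Two-term cost E with diagonal covariances."""
    e1 = r_v4 - mu
    e2 = r_v4 - u @ r_pfc
    return e1 @ (e1 / VAR1) + e2 @ (e2 / VAR2)

def present(u, mu, learn=True):
    """Gradient descent on E for one stimulus presentation."""
    r_v4 = np.full(3, 10.0)    # initial rates of 10 spikes/s
    r_pfc = np.full(2, 10.0)
    for it in range(1, MAX_ITER + 1):
        g1 = (r_v4 - mu) / VAR1            # variance-weighted feedforward residual
        g2 = (r_v4 - u @ r_pfc) / VAR2     # variance-weighted feedback residual
        d_v4 = -K_R * 2.0 * (g1 + g2)
        d_pfc = K_R * 2.0 * (u.T @ g2)
        if learn:
            u = u + K_U * 2.0 * np.outer(g2, r_pfc)  # slow weight update
        r_v4 = r_v4 + d_v4
        r_pfc = r_pfc + d_pfc
        step = max(np.abs(d_v4).max(), np.abs(d_pfc).max())
        if it >= MIN_ITER and step < TOL:
            break
    return r_v4, r_pfc, u

def train(n_presentations=100, seed=0):
    """Present randomly chosen unoccluded shapes and let u adapt slowly."""
    rng = np.random.default_rng(seed)
    u = U0.copy()
    for _ in range(n_presentations):
        shape = "A" if rng.random() < 0.5 else "B"
        _, _, u = present(u, MU[shape])
    return u

u_trained = train()
print(np.round(u_trained, 2))
```

Calling `present(u_trained, MU["A"], learn=False)` then plays the role of a test-phase presentation with the weights held fixed.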

The convergence of the weight matrix depends on the choice of initial conditions, given the non-convex and under-constrained nature of the cost function E, as there are multiple combinations of the connection weights and neuronal responses that minimize E. However, this does not limit our main results, as we can regard the biased initial values as the connections between a subset of PFC populations and the V4 population of interest before learning the shapes, which may have either weak negative values or positive values, among a wide range of random initial connection weights between PFC and V4. Depending on the initial connection weights, the connections will either become stronger or weaker over the course of training, and shape selectivity in PFC neurons emerges.

Our simulations with synaptic weights starting from different initial values show that the neuronal responses of the model are robust to the precise choice of these initial weights. In Fig. 3A, we randomly choose different initial weights, under the constraint that u1,1 and u2,2 start from stronger values (in the range from 0.5 to 3.5) than u1,2 and u2,1 (in the range from −1 to 1). The initial weights between the occluder-selective V4 unit and the PFC units, u3,1 and u3,2, are randomly chosen in the range from 0 to 2. For all of these choices, the connection weights u1,1 and u2,2 converge to higher values than u1,2 and u2,1, resulting in mild shape selectivity in PFC units, with PFC unit 1 preferring the test shape (Fig. 3A(i),(iv)). If the initial connection weights u1,2 and u2,1 are larger than u1,1 and u2,2, the shape preferences in PFC units switch (Fig. 3B(i),(iv)), but the response characteristics of V4 neurons remain unchanged (Fig. 3B(ii),(iii)). Interestingly, responses of V4 units are highly robust to differences in initial weights, converging to almost indistinguishable values in each case, as shown in Fig. 3A,B (ii),(iii).

Figure 3.

Convergence of connection weights from different initial values. During the training phase, the gradient descent starts with ten different randomly chosen sets of initial weights (see text). (A) When the initial connection weights for u1,1 and u2,2 are higher than the initial connection weights for u1,2 and u2,1, the connection weights u1,1 and u2,2 converge to larger values than u1,2 and u2,1 during the training phase (i). With these connection weights, responses of V4 unit 1 and unit 2 (ii), V4 unit 3 (iii), and PFC unit 1 and 2 (iv) to the test shape under varying degrees of occlusion are generated, and are almost identical regardless of the precise initial condition. (B) Same as in (A), but with initial connection weights for u1,2 and u2,1 larger than the initial connection weights for u1,1 and u2,2.

The obtained connection weight matrix is interpreted as a stored template or memory of the shape pair, and is fixed during the following test phase. The memory of the shapes encoded in the connection weights is similar to the idea proposed in Mumford (1992) where it was suggested that descending pathways store templates in the weights of their synapses.

3.4 Two-step inference on neuronal dynamics

With the trained connection weights, we find the model responses of each unit to partially occluded stimuli are comparable to neuronal responses in experimental recordings during the sequential shape discrimination task described above. In particular, we separate the responses inferred strictly by feedforward sensory inputs from those generated by integrated signals of both feedforward inputs and feedback predictions, and show that the model responses capture the temporal dynamics in the electrophysiological recordings.

The optimal representations of the neuronal responses rv4 and rpfc that minimize either the first term (E1 from Eq. 16) or the full representation of the cost function E (E2 from Eq. 17) are computed at each occlusion level. As explained in Methods, these are equivalent to the optimal responses in hierarchical Bayesian inference that maximize the posterior probability of the V4 neuronal responses given the shape identity and the occlusion level. Here we assume that the occluders are of a color different from that of the shape or the background, i.e., occlusion is salient and distinct (Fig. 4B). The occluders therefore activate V4 unit 3, the occluder-selective neuronal population in the model.

Figure 4.

Model simulations. The optimal representation based on hierarchical Bayesian inference reproduces V4 and PFC responses in the experiments. (A) The network model schematic as in Fig. 1A. The solid rectangle shows the initial feedforward-only signal computation. The dotted rectangle encompasses the computations for the delayed response inferences that integrate the bottom-up sensory inputs and the top-down predictions from PFC. The corresponding optimal representations are shown in solid (initial, feedforward-only) and dotted (delayed, feedforward+feedback) lines in D–E. (B) Illustration of the input stimuli: shape A with varying degrees of occlusion. The actual images were not used as the input; the κ-dependent population response distributions of V4 neurons were used to represent the shape stimuli. Note that the occluders are of a different color than the shape or the background, and activate a group of V4 cells selective for the color. (C) Inferred PFC responses increase as occlusion level increases, in accordance with experiments. A weak shape selectivity is present, as PFC unit 1 responds at higher rates than PFC unit 2 to the presented shape A across the occlusion levels. (D) Inferred responses of the shape-selective V4 units before (solid) and after (dotted) the top-down prediction. The green lines are the optimal responses of the V4 population selective for the test shape, shape A (V4 unit 1), and the blue lines are those of the non-preferred V4 population that responds preferentially to shape B (V4 unit 2). (E) Model prediction of average firing rates of the occluder-selective V4 population (V4 unit 3), as a function of occlusion level. The salient occlusion activates this class of V4 neurons. Note that the x-axis shows fraction unoccluded.

We make the inference on the neuronal responses in two steps. First, only the bottom-up sensory input is considered, so that the posterior distribution depends only on the stimulus κ (Fig. 4A, solid box). In other words, the optimal representations of the activities of the V4 units, rv4, are found by minimizing only the first term of the cost function E in Eq. 14, or equivalently, by maximizing Eq. 9. We hypothesize that these optimal responses, modulated only by the bottom-up sensory inputs, correspond to the initial transient in recorded V4 responses. Thus, only feedforward signals are present at this stage.

The delayed transients in V4 responses following the peak of responses in PFC, on the other hand, are compared to the optimal responses that integrate both the bottom-up and the top-down inputs. The model representations of the delayed V4 responses and the PFC responses, therefore, are obtained by finding rv4 and rpfc minimizing the full cost function E (Eq. 14), which is equivalent to maximizing the full posterior distribution in Eq. 4 composed of both the feedforward, κ-dependent distribution and the feedback, prediction-driven distribution. In this way, the model draws a connection between the response dynamics of V4 and PFC neurons and different computational stages in the feedforward-feedback loop.

The inferred optimal responses of each neuronal unit in V4 and PFC across a range of occlusion levels, before and after the feedback from PFC, are shown in Fig. 4C,D, and E. Both PFC unit 1 and unit 2 responses increase with added occlusion (Fig. 4C), in agreement with the experiments where PFC neurons respond strongly to occluded stimuli and weakly to unoccluded stimuli (Fig. 2D). Such increased PFC responses to occlusion result from the PFC connections to the occluder-selective V4 unit 3; through the synaptic connections, PFC predictions are compelled to match the responses of V4 unit 3 which responds preferentially to occluders. The model PFC units also show shape selectivity, with PFC unit 1 showing higher responses than PFC unit 2 to the test shape A across occlusion levels. This agrees with physiological evidence for shape selectivity in PFC (Pasupathy et al., 2015).

The two-step inference on the V4 responses accurately predicts the response characteristics of the initial and the delayed peaks in experimental recordings of V4 neurons. While the responses of V4 unit 2 (the neuronal unit not preferring the test shape A) stay constant at a low rate across the occlusion levels, V4 unit 1 (the preferred V4 unit) shows a decreasing response pattern as occlusion increases, i.e., as unoccluded area decreases. Compared to the responses inferred only based on the feedforward sensory input (Fig. 4D, solid green), the firing rates are less dependent on occlusion level when the feedback predictions are included (Fig. 4D, dotted green). Thus, with the feedback, an increase in occlusion does not as extensively degrade the preferred V4 responses. The model predictions therefore agree with the experimental observation on the two transients in V4 (Fig. 2B), and are in accordance with our hypothesis that the initial V4 responses reflect the feedforward signals from the afferent areas, and the delayed peak of responses in V4 are computed based on both the feedforward sensory signals and the feedback predictions from PFC. Because the response of the preferred V4 unit becomes resistant to occlusion when the feedback prediction is included, we say that the feedback enables V4 neurons to have enhanced shape discriminability under partial occlusion.

We note that, in contrast to other sensory areas (Rao & Ballard, 1999; Srinivasan et al., 1982) where this comparison has been successfully made, the responses of shape-selective V4 neurons are not accurately described by a direct comparison to the residual errors between the feedback predictions and the neuronal responses underlying the current estimate of the sensory signals. Here, the residual error of V4 unit 1 and 2 increases with added occlusion (Fig. 5), unlike the activity of shape-selective V4 neurons in experiments, which were strongest for unoccluded stimuli and weaker with occlusion (Fig. 2B, Kosai et al. (2014)). Instead, we identify the optimal estimates shaped by the feedforward input and the feedback predictions as the neuronal responses measured in V4, which do replicate response characteristics in experiments. Unlike the residual errors which reflect novelty of the sensory inputs, the optimal response representation conveys both sensory stimulus features and stimulus novelty.

Figure 5.

Error signals. (A) Squared difference between top-down prediction u · rpfc and the initial V4 responses rv4 obtained by minimizing E1. (B) Squared difference between top-down prediction u · rpfc and the delayed V4 responses rv4 obtained by minimizing E2.

Finally, the group of neurons that are hypothesized to respond preferentially to occluder saliency exhibits increasing responses as occlusion increases, both with and without the feedback (Fig. 4E). Although this class of neurons has not been systematically recorded in experiments, we identified several neurons with increasing responses to added occlusion regardless of the co-presented shapes (Fig. S1B), similar to the response patterns of V4 unit 3 (Fig. 4E).

In the above, we have compared the steady-state representation of neuronal responses in the model to transient peaks of responses in the experiments. The two-step inference does not include a mechanism that accounts for the shape of the transient activities observed in experiments. Specifically, instead of producing the brief suppression of responses between the initial peak and the delayed peak (Fig. 2A), the gradient descent on E1 (Eq. 16) and E2 (Eq. 17) with respect to rv4 simply predicts that the V4 responses rv4 reach and stay at the respective steady-state firing rates that minimize E1 and E2.

The gradient descent dynamics are shown in Fig. S3, where the feedback prediction term from PFC is included in the cost function after neuronal responses reach the steady state firing rates minimizing E1. Before optimization of the full cost function E2 starts, the responses may be brought down to the baseline firing rate for a brief interval rather than being continued from the values optimizing E1; in either case, the optimal responses measured at the end of optimization process of E2 do not change (Fig. S3 B,D), indicating that those values are robust within the range of firing rates we consider. Note that unless the responses are deliberately suppressed, the gradient descent dynamics do not exhibit transient peaks as observed in experiments. This implies that there may be additional physiological mechanisms in the cortical circuitry responsible for the transient dynamics. In principle, it is also possible that such temporal effects could be interpreted by extending the predictive coding to the temporal domain (Rao & Ballard, 1999; Friston & Kiebel, 2009a,b).

In summary, in this section we asked how the responses in a hierarchical predictive coding model compare to physiology. We find that, upon training, the model indeed predicts the observed responses in V4 and PFC, when the dynamics unfold over an initial feedforward and a second feedback stage.

3.5 Parsimony of the network structure

In the simulations above, we have assumed a specific network structure. This poses the question of whether these assumptions were necessary, and in general what aspects of network structure are required to reproduce the observed physiological responses.

Shape selectivity in V4 and PFC neurons is supported by experiments (Fyall et al., 2017); we therefore included the test shape-preferred and non-preferred V4 and PFC units, namely, V4 units 1 and 2 and PFC units 1 and 2. In addition, our model includes a group of V4 cells that responds strongly to occlusion. We found that such occluder-selective V4 neurons are necessary to capture the response characteristics of PFC neurons observed in the experiments. Since the second term in the cost function (Eq. 14) is the squared difference between the PFC predictions, a linear combination of PFC responses, and the actual V4 responses, the PFC responses minimizing the cost function tend to follow the response trends of the afferent V4 neurons. The shape A (test shape)-preferred V4 unit 1 exhibits monotonically decreasing firing rates as occlusion level increases, while the activity of the shape B-selective V4 unit 2 stays constant across degrees of occlusion, as a consequence of the bottom-up stimulus-dependent inputs. With only these two types of neuronal populations, therefore, the PFC responses cannot capture the firing rate increase induced by occlusion. Given our model architecture without any additional mechanisms, there has to be a class of V4 neurons that responds strongly to occlusion but only weakly to unoccluded stimuli, so that PFC follows similar response trends. Moreover, we found that the increase in PFC responses with occlusion cannot be obtained by including a simple prior distribution of PFC responses in the cost function in place of the third class of V4 neurons (the only way to have a prior implement the observed changes in PFC responses would be to have that prior itself change with occlusion level). As discussed above, there are several candidates for the types of V4 neurons represented by unit 3. These include populations of neurons that respond preferentially to the color of the occluding dots.

Another feature of our architecture, the convergence of the signals, with each of the PFC cells connected to multiple afferent V4 neurons from different populations, is also critical to replicate the shape selective responses that become more robust to occlusion after the PFC feedback. We experimented with different architectures and found that such convergence is crucial for transmitting information between different V4 units. Unless the same PFC unit makes predictions about both the shape-preferred V4 unit (V4 unit 1) and the occluder-selective V4 unit (V4 unit 3), the information about the occlusion level encoded by the occluder-selective V4 unit will not be transmitted to the shape-selective V4 population, which is crucial for maintaining robust shape discrimination and weaker dependence on occlusion. This structure, where the neurons of the lower cortical areas with different tuning properties send convergent signals to neurons in higher cortices, agrees with physiological findings in which signals become more mixed as they travel along the hierarchy (Felleman & Van Essen, 1991; Rigotti et al., 2013; Fusi et al., 2016).

Another feature of our model is that fewer units in PFC (2) combine to make linear predictions about the responses of a larger number (3) of V4 units. This is also necessary to capture the experimental data. Without such convergence, the V4 responses imposed by the bottom-up sensory input can be matched perfectly by the top-down predictions made by PFC units, leading the optimal predictive coding solution to make identical copies of the sensory input at each stage along the hierarchy, which clearly does not occur in experiments. Translating this constraint into biology, this does not mean there must be fewer neurons in higher areas of the brain, but rather that there are fewer functional or active populations that can be grouped as single units in the higher area during the task.
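The role of having fewer PFC units than V4 units can be checked with a small Python sketch. It assumes, for simplicity, an isotropic feedback covariance, so that the best top-down prediction is an ordinary least-squares fit of u · rpfc to the sensory mean μ. The vector `mu` and the third PFC unit added in `u_square` are hypothetical; `u_convergent` holds the learned weights reported in Section 3.3.

```python
import numpy as np

# Learned 3x2 weights reported in Section 3.3 (three V4 units, two PFC units).
u_convergent = np.array([[2.32, 0.21],
                         [0.26, 2.37],
                         [0.94, 0.94]])

# Hypothetical non-convergent variant: add a third PFC unit so that u is square.
u_square = np.column_stack([u_convergent, [0.0, 0.0, 1.0]])

# Hypothetical mean V4 input for an occluded shape (illustrative values only).
mu = np.array([34.0, 10.0, 42.0])

def prediction_residual(u, target):
    """Residual of the best least-squares top-down prediction u @ r_pfc of target."""
    r_pfc = np.linalg.lstsq(u, target, rcond=None)[0]
    return target - u @ r_pfc

res_convergent = prediction_residual(u_convergent, mu)
res_square = prediction_residual(u_square, mu)

print("residual norm, 2 PFC units:", np.linalg.norm(res_convergent))
print("residual norm, 3 PFC units:", np.linalg.norm(res_square))
```

With the square, invertible matrix the prediction matches μ exactly, so rv4 = μ makes both terms of the cost vanish and the feedback leaves the feedforward response unchanged. With two PFC units a nonzero residual remains, and it is this mismatch that lets the feedback term pull rv4 away from μ.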

In our model, information about the shape identity s and the occlusion level c are both input to V4. The system implements a feedforward-feedback loop involving the higher area PFC to enhance shape discriminability under occlusion, as illustrated in a state space view in Fig. 6. Without the feedback predictions, during the initial responses, high occlusion moves noisy versions of the responses close to, or even above, the unity line, obscuring the shape identity (Fig. 6A). However, when the feedback from PFC is included, the responses move away from the unity line, thus clarifying the shape identity under partial occlusion (Fig. 6B). The convergent structure of the network is the key for this effect to occur. Although information about occlusion is initially present at the level of V4, it does not impact the shape selective V4 units without feedback from PFC. In other words, PFC predictions re-map the information about the shape identity and the occlusion level onto the shape-selective space in V4, enhancing the shape discriminability there.

Figure 6.

Shape discriminability under occlusion increases with the top-down prediction. The optimal average firing rates across degrees of occlusion as in Fig. 4D (yellow), projected onto the state space of V4 unit 1 (preferred) and unit 2 (non-preferred) responses. For each occlusion level, 200 responses were generated by adding white noise around the optimal average value (yellow), with a standard deviation of 2 chosen arbitrarily for illustration purposes (blue: low occlusion, green: high occlusion). When the population responses are under the unity line (dotted black), rv4,1 > rv4,2, and the animal concludes that the test shape presented is shape A. The opposite is true for rv4,2 > rv4,1. Before the top-down prediction (A), the noisy responses under high occlusion (green dots) lie close to the unity line, obscuring the shape identity. With the top-down prediction included (B), the average optimal responses to occluded stimuli are moved horizontally to larger rv4,1 values (yellow). Thus the noisy responses are compressed and moved away from the unity line, clarifying the shape identity.
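The state-space readout described in this caption can be sketched in Python. The 200 samples and the noise standard deviation of 2 follow the caption; the mean responses passed to `fraction_below_unity` are hypothetical values for a highly occluded stimulus, and the decision rule (shape A whenever rv4,1 > rv4,2) is the one stated above.

```python
import numpy as np

def fraction_below_unity(mean_pref, mean_nonpref, noise):
    """Fraction of noisy responses with r_v4,1 > r_v4,2 (read out as shape A)."""
    r1 = mean_pref + noise[:, 0]
    r2 = mean_nonpref + noise[:, 1]
    return float(np.mean(r1 > r2))

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 2.0, size=(200, 2))  # 200 samples, SD 2, as in the caption

# Hypothetical mean responses (preferred, non-preferred) under high occlusion.
before = fraction_below_unity(14.0, 10.0, noise)  # feedforward only
after = fraction_below_unity(20.0, 11.0, noise)   # with top-down prediction

print("fraction read out as shape A before feedback:", before)
print("fraction read out as shape A after feedback:", after)
```

The same noise draws are reused for both conditions, so any difference between the two fractions comes only from the shift of the mean response away from the unity line.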

We note that recurrent connections among V4 populations, rather than the feedback described above, could in principle also transmit information about the occlusion level to the shape-selective neurons. Which mechanism is more effective and efficient is an open question. However, the current experimental evidence, namely the delayed peak of responses in V4 arising after PFC responses peak and the strong PFC responses to occlusion, is suggestive of feedback.

In summary, the proposed network, composed of two PFC units and three V4 units, has a parsimonious structure to explain the neuronal responses in the experiments under predictive coding principles.

3.6 Structure of inputs to V4

In the above simulations, we have assumed a simple input structure, where the sensory input is determined by probability distributions of V4 responses conditioned on shape identity s and occlusion level c. Based on experiments, we model μ1 for the test shape-preferred V4 unit 1 to decrease from a high firing rate as occlusion level grows, μ2 for the non-preferred V4 unit 2 to stay at a low baseline firing rate, and μ3 for the occlusion-preferring V4 unit 3 to increase. To provide a firmer basis for this, we show several example V4 neurons in Fig. S1. Example cells in Fig. S1A behave like V4 units 1 and 2, with decreasing responses to preferred shapes under added occlusion and overall low responses to non-preferred shapes. On the other hand, the cells shown in Fig. S1B, which display relatively low, shape-selective responses to unoccluded shapes and increasing responses to both preferred and non-preferred shapes as occlusion level increases, may correspond to V4 unit 3. The population-averaged initial peak responses of V4 neurons to preferred and non-preferred shapes further support our implementation of μ (Fig. 7A). Specifically, the averaged responses of V4 neurons with two clear transient peaks and shape selectivity exhibit a decreasing response pattern to preferred stimuli with added occlusion, but responses to non-preferred shapes stay at a consistently low firing rate across the range of occlusion levels.

Figure 7.


V4 population encoding of shape stimuli. (A) Population averaged initial peak responses of 39 V4 cells that show two clear transient peaks and shape selectivity. The population averaged responses (normalized) to preferred shapes (green) decrease as occlusion increases, while those to non-preferred shapes (blue) remain at a relatively constant low activity level across the range of occlusion levels. Data adapted with permission from Pasupathy et al. (2015). (B) Normalized responses of 109 V4 neurons to the shapes displayed in the insets (unoccluded), sorted based on firing rate. The population responses to the shape on the top have a sharp peak, indicating a division between the neurons that show strong preference for the shape and the rest of the neurons. Responses to the shape on the bottom are more distributed across the V4 population. Data adapted with permission from Pasupathy & Connor (2002).

In addition to the peak firing rate μ, another component that forms the input signals is the variance of the V4 response distributions given the sensory input. As discussed earlier, we hypothesize that σ1 increases with added occlusion as high degrees of occlusion obscure the shape identity, and that σ2 is constant across degrees of occlusion, since random placements of occluding dots on a non-preferred shape will not introduce as much variability in responses as on a preferred shape. We also modeled σ3 to stay constant regardless of occlusion level, as this unit responds to the presence of occluders but not their specific configuration. To test this hypothesis, we examined the consequence of other plausible assumptions for how variance depends on occlusion. First, when variances for all three V4 units increase at the same rate (Fig. 8A), the response characteristics of the shape-preferred V4 unit 1 remain unchanged but the non-preferred V4 unit 2 shows increasing delayed responses to added occlusion. Such increasing delayed responses of unit 2 are also obtained when the input variances for both V4 unit 1 and unit 2 are increased with occlusion, while the variance for V4 unit 3 is kept constant (Fig. 8B,C). When the variances of all V4 units are decreased with added occlusion, on the other hand, we observe very different response patterns (Fig. 8D). Specifically, feedback does not improve the shape discriminability, as the initial and the delayed responses of V4 unit 1 are identical. Based on these simulations, we limit our model to the cases where introduction of occlusion increases the input variances for shape-selective V4 units.

Figure 8.


Dependence of input variance on occlusion levels. Model responses across occlusion levels when (A) the variances σ1, σ2, and σ3 increase with added occlusion at the same rate (σ1 = σ2 = σ3 = 1 + 5 · c); (B) the variances for the shape-selective V4 units, σ1 and σ2, increase at the same rate as occlusion increases, while σ3 remains unchanged (σ1 = σ2 = 1 + 5 · c; σ3 = 1); (C) σ1 and σ2 both increase, but σ2 at a slower rate, while σ3 again remains unchanged (σ1 = 1 + 5 · c; σ2 = 1 + 2 · c; σ3 = 1); and (D) the variances σ1, σ2, and σ3 decrease with added occlusion at the same rate (σ1 = σ2 = σ3 = 1 − c).

Our simulations show that the experimental results (Fig. 2A,B; 7A; S1A) are best captured by different rates of variance increase for different V4 units (Fig. 4A; 8C). In particular, the best match occurs when the variance for V4 unit 1 increases with added occlusion, the variance for V4 unit 2 stays constant or increases by a smaller amount than that for V4 unit 1, and the variance for V4 unit 3 stays constant. As a result, shape selectivity under occlusion is consistently improved in the delayed signals (Fig. 8C).
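For reference, the four variance schedules compared in Fig. 8 can be written out directly from the caption. The sketch below only tabulates those formulas; the function name `variance_schedules` and the dictionary layout are illustrative choices, and the sampled values of c are an assumption about its range.

```python
def variance_schedules(c):
    """Input variances (sigma1, sigma2, sigma3) for each Fig. 8 panel.

    The formulas are copied from the Fig. 8 caption; c is the occlusion
    level. Panel D reaches zero variance if c is allowed to equal 1, so
    any inverse-variance weighting would need c < 1 in that case.
    """
    return {
        "A": (1 + 5 * c, 1 + 5 * c, 1 + 5 * c),  # all increase equally
        "B": (1 + 5 * c, 1 + 5 * c, 1.0),        # sigma3 held constant
        "C": (1 + 5 * c, 1 + 2 * c, 1.0),        # sigma2 rises more slowly
        "D": (1 - c, 1 - c, 1 - c),              # all decrease equally
    }


for c in (0.0, 0.5, 0.9):
    print(f"c = {c}")
    for panel, sigmas in variance_schedules(c).items():
        print(f"  {panel}: " + ", ".join(f"{s:.2f}" for s in sigmas))
```

All four schedules agree at c = 0, where every variance equals 1, so the panels differ only in how uncertainty changes once occluders are added.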

The dependence of variance on occlusion may not be uniquely defined and likely varies among V4 neurons. Indeed, neurons in V4 show a spectrum of different response patterns to non-preferred stimuli, indicating that different V4 neurons encode input variances in more than one way. For example, the top two rows in Fig. S1A show example V4 neurons whose delayed responses to non-preferred stimuli do not increase with added occlusion, corresponding to our model simulation with a constant variance for V4 unit 2. On the other hand, the last row in Fig. S1A shows an example V4 neuron with increased delayed responses to occluded non-preferred shapes. The example V4 cells thus suggest that neurons in V4 may respond to partially occluded, non-preferred shapes with constant variances or slightly increased variances.

Our original model has only two shape-selective V4 units, one tuned for the test shape and the other not preferring the test shape. A more biologically realistic model would consist of a population of V4 neurons with diverse response properties. To construct this population, we first examined response profiles of a population of 109 neurons in V4 previously reported in Pasupathy & Connor (2002). For some shapes, the disparity between the responses of the neurons preferring the shapes and those of the non-preferring neurons is noticeable, as illustrated in the sorted population responses to a given shape in Fig. 7B, top panel. However, for many other shapes, the population of V4 neurons shows more graded responses, as in Fig. 7B, bottom panel.

We next expanded the model network to a larger network with two PFC units and thirty V4 units (Fig. 9A). Among the V4 units, 10 are occluder-preferred units and the remaining 20 units are shape-selective. Instead of dividing the shape-selective V4 units into test shape-preferred and non-preferred groups, we modeled the V4 units to have a spectrum of peak firing rates and variances for the input-driven responses (Fig. 9B). For the V4 units that are more tuned to the test shape (corresponding to higher values of μ), the input-driven variance increases by larger amounts with added occlusion (Fig. 9B). The connection weights from PFC units to the V4 units are also adjusted accordingly (i.e., so that in Fig. 9A, the green PFC unit has stronger connections to the green V4 units compared to the blue V4 units and vice versa).
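One way to picture the graded population is to give each shape-selective unit an unoccluded mean and a variance slope that grows with that mean. The following sketch does this under stated assumptions: the even spacing, the range of means, the maximum slope of 5, and the names `build_shape_units` and `input_variance` are all hypothetical and are not taken from Fig. 9B.

```python
def build_shape_units(n_units=20, mu_min=0.2, mu_max=1.0, max_var_slope=5.0):
    """Create shape-selective units with graded tuning.

    Each unit gets an unoccluded mean `mu0` and a variance slope that is
    larger for units more strongly tuned to the test shape. The spacing
    and numeric ranges are placeholders, not the paper's parameters.
    """
    if n_units < 2:
        raise ValueError("need at least two units to span a spectrum")
    units = []
    for i in range(n_units):
        frac = i / (n_units - 1)  # 0 = least tuned, 1 = most tuned
        units.append({
            "mu0": mu_min + frac * (mu_max - mu_min),
            "var_slope": max_var_slope * frac,
        })
    return units


def input_variance(unit, c):
    """Input-driven variance at occlusion level c (unit variance at c = 0)."""
    return 1.0 + unit["var_slope"] * c


units = build_shape_units()
for unit in (units[0], units[len(units) // 2], units[-1]):
    print(f"mu0={unit['mu0']:.2f}  variance at c=0.5: {input_variance(unit, 0.5):.2f}")
```

Tying the variance slope to the tuning strength reproduces the stated property that more strongly tuned units become more uncertain under occlusion, while weakly tuned units keep a nearly constant variance.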

This expanded model yields qualitatively the same results as the simple network with two shape-selective V4 units. The PFC units and the occluder-selective V4 units show increasing responses with added occlusion (Fig. 9C(i),(ii)), and the shape-selective V4 units yield decreasing response patterns with an increase in occlusion (Fig. 9C(iii)). Moreover, the delayed responses of the shape-selective V4 units obtained by optimizing the full cost function exhibit reduced sensitivity to occlusion, and the effect is stronger for the units with stronger test shape preference. To see this more clearly, Fig. 9C(iv) presents responses of a selected number of the shape-selective V4 units with high, intermediate, and low degrees of preference for the test shape. In sum, the feedback-induced delayed increase in responses to occluded stimuli, seen in neurons that respond preferentially to the test shape, is maintained in a population of V4 units with graded response properties, validating the predictions made by our simplified model.

3.7 Differential weighting of feedforward and feedback inputs

In our model, the relative strength of feedback and feedforward interactions is determined by assumptions about the levels of variability in the inference errors (the noise terms in Eqs. 7 and 11) at each network layer (Eqs. 8 and 12). Here we ask how these assumptions impact the ability of the model to reproduce trends in experimental data.

Recall that the cost function E in our model has two terms, one based on bottom-up sensory inputs and the other based on top-down predictions (Eq.14). Contribution of each of these components is weighted by the inverse variance of the respective probability distribution. The pattern of the optimal responses to occlusion can therefore be modulated by these variances. Here we examine how this occurs, and show that the tradeoff between feedforward and feedback components achieved by the variances in Fig. 4 is necessary to capture the response characteristics observed in experiments.

We first discuss effects of the variances for the bottom-up input-driven distributions. In the original model (Fig. 4), for the bottom-up component, variances are set equal to 1 for all three V4 populations when the input shape is unoccluded. We also set the variance for the test shape-preferred V4 population (V4 unit 1) to increase as occlusion level increases, to capture the increase in uncertainty of the shape identity in presence of occlusion. We found that this increase in variance for the preferred V4 unit is necessary to mimic its weaker sensitivity to occlusion when feedback inputs are included. Without the increase in variance, this V4 unit depends relatively more on the bottom-up inputs under high degrees of occlusion, and as a result, shows a steep decrease in its responses as occlusion increases (Fig. 10A, left panel, green). By increasing the variance of the sensory input-dependent distribution, therefore, the optimal response of this V4 population becomes more dependent on the top-down predictions made by PFC. As the PFC populations respond strongly to occluded stimuli, weighting the bottom-up component less will result in a more gradual decrease in V4 responses to increasing occlusion, as in the original model in Fig. 4D.
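The effect of inverse-variance weighting can be illustrated with a scalar toy problem. The sketch below is not the model's full cost function: it assumes a single response r, a fixed top-down prediction, and a cost of the form (r − μ)²/var_bu + (r − pred)²/var_td, whose minimizer is the precision-weighted average of the two terms. The function name and the values of the bottom-up mean and the top-down prediction are hypothetical; only the top-down variance of 10 for shape-selective units is taken from the text.

```python
def optimal_response(mu_bu, var_bu, pred_td, var_td):
    """Minimizer of (r - mu_bu)**2 / var_bu + (r - pred_td)**2 / var_td.

    Setting the derivative to zero gives a precision-weighted average:
    each term is weighted by its inverse variance.
    """
    w_bu = 1.0 / var_bu   # weight on the bottom-up (sensory) term
    w_td = 1.0 / var_td   # weight on the top-down (prediction) term
    return (w_bu * mu_bu + w_td * pred_td) / (w_bu + w_td)


mu_bu = 0.4     # hypothetical weak sensory drive under heavy occlusion
pred_td = 0.9   # hypothetical top-down prediction, held fixed here
var_td = 10.0   # top-down variance for shape-selective units

for var_bu in (1.0, 3.5, 6.0):
    r = optimal_response(mu_bu, var_bu, pred_td, var_td)
    print(f"bottom-up variance {var_bu:>4}: optimal response {r:.3f}")
```

As the bottom-up variance grows, the optimum moves away from the sensory mean and toward the prediction, which is the mechanism by which increased input variance flattens the decline of the preferred unit's delayed response. In the full model the PFC responses are optimized jointly rather than held fixed, so this sketch captures only the weighting, not the dynamics.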

Figure 10.


Model simulations with modified top-down and bottom-up variances predict different response patterns in neuronal units. The responses of each neuronal unit when (A) the bottom-up variance σ1 of the shape A-selective V4 response distribution stays constant with increasing occlusion, (B) the top-down predictive distributions all have unit variances (σ1=σ2=σ3=1), and (C) the top-down variances are all larger than the bottom-up variances (σ1=σ2=σ3=10).

Next, we examine the choice of the top-down variances in the original model that successfully captures experimental data. In the initial model (Fig. 4), the variances of the top-down component do not depend on the occlusion level and stay at constant values. However, the top-down effect is differentially weighted for each of the V4 populations; it is weighted more for the occluder-selective V4 population (σ3=1) compared to the shape A- and B-selective neurons (σ1=σ2=10). This is needed to reproduce the rise in PFC responses at higher levels of occlusion. The smaller variance, or equivalently, more “weight”, on the top-down predictions of the occluder-selective V4 unit drives the PFC unit to follow the same increasing response pattern as the occluder-selective V4. The smaller variance imposed on the top-down prediction for the occluder-selective V4 unit can be interpreted as the top-down predictions having more significance for occlusion than for identity of the shape.

We investigated effects of changes in the top-down variances on the response patterns. When the feedback prediction-driven distributions for all V4 units are uniformly weighted with unit variance, the top-down effect becomes more pronounced (Fig. 10B) compared to the case with the variances at the original values (Fig. 4D). As a consequence, the delayed responses of the test shape-preferred V4 (V4 unit 1) increase with added occlusion, reflecting strong modulation by PFC (Fig. 10B, left panel, green dotted line). Similarly, when the top-down variances on all three V4 units are set to be larger than the bottom-up variances, relatively more influence is exerted by the bottom-up drive (Fig. 10C). As a result, the feedback no longer increases robustness of V4 unit 1 responses under partial occlusion (Fig. 10C, left panel, green dotted line).

In sum, we have shown that the ability to reproduce trends in experimental recordings in our predictive coding model requires the balance of top-down and the bottom-up influences that is given by the increase in the input-dependent variance with added occlusion for the test shape-selective neurons and the smaller variance in the top-down prediction on the occluder-selective neurons.

3.8 Model prediction for responses to non-salient occlusion, noise, or reduced contrast

Above, we have assumed that occlusion is salient, and that there is a separate population of cells in V4 that responds preferentially to occlusion. But what happens to predictions of the model when the occlusion is non-salient – that is, indistinct from the shape? To answer this, we consider the case where the occluder reduces the shape signal, but does not activate a dedicated class of V4 neurons. For example, when the occluders are of the same color as the shape or the background, occlusion would increase ambiguity of the shape identity but would not induce responses in a V4 population separately responsive to a distinct color. Other examples include a decrease in shape clarity by white noise or reduced contrast (illustrated in Fig. 11B).

Figure 11.


Model simulation with indiscriminate occlusion or noise does not activate a class of V4 neurons, predicting the top-down signals to have no effect on the V4 responses. (A) Model schematic. Same model as in Fig. 4A, but with an input stimulus obscured by non-salient occlusion, noise, or reduced contrast. (B) Illustration of the input stimuli: shape A with varying degrees of noise, contrast, and non-salient occlusion with occluders of the same color as the background or the shape. These types of visual ambiguity are not salient while obscuring the shape identity. (C) Inferred PFC responses as a function of fraction of the shape unoccluded (shape clarity). Reduced shape clarity alone does not increase the responses of shape A-selective PFC population. (D) Inferred responses of the shape-selective V4 units before (solid) and after (dotted) the top-down prediction, as a function of occlusion/obscurity level. The responses are depicted by color and line type as in Fig. 4D. The responses of the preferred V4 population after the top-down inputs are not distinguishable from those before the top-down inputs. Therefore, the top-down prediction does not improve shape discriminability under occlusion. (E) Model prediction of average firing rates of the occluder-selective V4 population. The non-salient occlusion does not activate the V4 population selective for some distinct feature (e.g. color) of the occluders. Note that fraction unoccluded on the x-axis means shape clarity in the case of reduced contrast or added noise.

We simulated such non-salient occlusion and ambiguity in our model by setting μ3, and therefore the peak of the response distribution for the occluder-selective V4 conditioned on sensory stimulus, to a constant. Therefore, an increase in occlusion or ambiguity in the shape stimulus does not increase the responses of V4 unit 3, as shown in Fig. 11E. The peak μ1 for the shape A-preferring V4 unit, however, is assumed to decrease with occlusion, as for previous simulations. Note that such neuronal behaviors are assumed because the occluder-selective V4 unit 3 is not modeled to specifically detect occlusion, but rather respond to some occluder-specific feature such as a distinct color with contrast relative to background. This results in a decrease in the preferred PFC responses with occlusion/ambiguity, and only a slight increase in the non-preferred PFC responses (Fig. 11C). Therefore, the feedback predictions made by PFC do not increase the preferred V4 unit 1 responses when the shape ambiguity (occlusion level) is high. In Fig. 11D, the preferred V4 responses after the feedback (dotted green) are therefore indistinguishable from the responses before the feedback (solid green). Our model thus predicts that when the shape signal is reduced in a way that is not salient, the feedback from PFC does not improve shape discriminability.

From the point of view of perception, this prediction seems plausible since we often have more difficulty recognizing an object when the obscurant is not distinct from the object. Moreover, preliminary experimental observations show that PFC neurons do not respond strongly to occluders of the same color as the background. In addition, the second peak of responses was not observed in V4 neurons when the shapes were obscured by reducing their contrast. While these preliminary observations are in accordance with our model predictions, more data should certainly be collected before conclusions can be drawn.

4 Discussion

In this study, we have proposed that robust shape-selective V4 responses under partial occlusion can be explained in the framework of predictive coding and hierarchical Bayesian inference. We have used this framework to construct a model of V4 and PFC in which signals converge as they travel up the hierarchy. In particular, we suggest that top-down predictions made by PFC neurons with mixed selectivity for shape identity and occlusion play a significant role in maintaining robust shape discriminability under salient partial occlusion in V4. In this model, PFC neurons make linear predictions of V4 activities in the form of feedback signals, and the connection weights are interpreted as storing the memory of the shape identities. We reformulated the traditional framework of predictive coding so that the optimal representation of the internal states of the model V4 and PFC units, rather than the residual errors, is comparable to the electrophysiological recordings in these areas.

Our model suggests that the initial responses in experimental recordings of a class of V4 neurons are purely feedforward and computed solely based on the bottom-up sensory input, while the delayed responses are modulated by both the bottom-up sensory signals and the top-down predictions. The model further shows that the feedback signals in V4 improve the shape discriminability under occlusion by reducing ambiguity in the population representation of the shape identity, and that this is achieved by transmission of the occlusion information via a feedforward-feedback loop. This can be viewed as an extension of the concept proposed in Rao & Ballard (1999), where predictions made by higher visual areas with larger receptive fields enable neurons encoding the surround and the center in V1 to share information; in our model of V4, neurons encoding different features of a shape stimulus, such as curvature and color, share information via predictions made by the higher areas.

The increase in the shape-selective responses of V4 induced by the feedback depends on asymmetric weighting of the top-down and the bottom-up effects, so that the top-down prediction is weighted more strongly for the occluder-selective neurons and the dependency of the shape-selective neuronal responses on the sensory input decreases with added occlusion. Interesting future work could more directly test this weighting of the top-down and the bottom-up effects. For example, the top-down predictive component of our model would be weakened by training with a larger set of noisy shape stimuli under various degrees of occlusion, which will introduce larger variance terms in the feedback prediction. Model predictions for experiments where partially occluded shapes are used for training are given in Appendix S4. Our simulations predict that when training is done with partially occluded shapes, the V4 neurons do not exhibit a delayed increase in shape-selective responses, underlining the significance of initial exposure to unoccluded shapes (Appendix S4). If the variances (Σ1, Σ2) are allowed to be learned as well, using a noisy stimulus set under various degrees of occlusion for training will weaken the top-down influence by increasing the top-down variance Σ2. Alternatively, cooling PFC is another way to place more emphasis on the bottom-up sensory input. Overall, our model predicts a smaller or no increase in shape selectivity during delayed V4 responses in these cases where the effect of the top-down predictive signals is reduced.

For the input-driven response distributions, we have assumed the mean and the covariance to depend linearly on occlusion level. While this assumption keeps our model simple, detailed neuronal responses may be captured more accurately by implementing nonlinear dependence. For example, the example V4 cell in Fig. 2 shows the maximum delayed increase in shape-discriminability when the occlusion level is intermediate. However, in our model, the separation between the initial and the delayed V4 responses increases monotonically with added occlusion, more resembling example V4 cell 3 in Fig. S1A. The response pattern in Fig. 2, on the other hand, can be reproduced by nonlinear dependence on occlusion of the bottom-up mean and covariance (data not shown). Thus, the variability in detailed response patterns across cells may indicate heterogeneous occlusion-dependence functions of individual V4 neurons.

In this way, our model contributes to new understanding of both neurophysiological and computational mechanisms underlying discrimination of partially occluded shapes in V4, suggesting a possible functional contribution of feedback signals.

4.1 Relationship to previous models

Several previous theoretical studies investigated the computational mechanisms for recognition of partially occluded shapes, patterns, and objects (Fukushima, 1987, 2001, 2005; Rao, 1997). However, these are strictly feedforward and often overlook feedback computation, in stark contrast to biological networks, which feature abundant feedback and recurrent connections. One approach is based on an extended version of the neocognitron, a hierarchical, multilayered, feedforward neural network model (Fukushima, 1987, 2001, 2005). This extended neocognitron has an additional “masker layer” which detects occluders by differences in brightness and suppresses them at an early stage. Studies by Rao (1997, 1999) use a Kalman filter model and Bayesian optimal estimation theory, maximizing the posterior probability of the internal states. With a robust optimization method that clips large residual errors, the model effectively segments the occluders from the image, treating them as outliers. The physiological mechanisms underlying the robust optimization method, however, are not known.

There have been a number of other modeling studies of V4 tuning to shape contours based on hierarchical feedforward models of object categorization, which have structural similarity to the ventral visual pathway (Fukushima, 1980; Riesenhuber & Poggio, 1999; Serre et al., 2007; Cadieu et al., 2007; Yamins et al., 2014). These models are also purely feedforward, and while they have had successes in reproducing V4 shape selectivity (Cadieu et al., 2007; Yamins et al., 2014), they lack separate mechanisms to account for occlusion. Unlike these previous models, our model bridges hierarchical predictive coding and experimentally recorded response dynamics in area V4 and PFC.

Our model is focused primarily on the encoding of the partially occluded stimulus, while the underlying behavioral task required animals to report whether the two stimuli presented were the same or different. Other literature has proposed how the sensory representation of the test stimulus may be compared to the memory representation of the reference stimulus (Hayden & Gallant, 2013; Murray et al., 2017; Romo & Salinas, 2003) to derive a behavioral decision, but this is beyond the scope of this paper.

4.2 Learning the shape templates with connection weights

Our model modifies the synaptic weights between V4 and PFC neurons during the preliminary phase, which consists of a few presentations of unoccluded shapes. This step corresponds to the initial learning phase in experiments where the animal discriminates an unoccluded pair of shapes used for the session. In this setup, the fast learning of the shape pair after just a few exposures is achieved by the memory stored in the synaptic weights between V4 and PFC neurons. When partially occluded shapes are used during the preliminary phase, on the other hand, the system learns different values of synaptic weights and the feedback does not improve shape discriminability (see Appendix S4). Fast learning, as attested by the shape discrimination task here, has been observed widely, where new sensory stimuli are easily learned with just a few presentations (Seitz, 2010; Rubin et al., 1997).

Physiological recordings in cortical cells in vitro, however, show only small changes in synaptic strength after a pair of pre- and post-synaptic spikes (Markram et al., 1997; Bi & Poo, 1998; Gerstner et al., 1996), suggesting that neurons learn a repeated stimulus more gradually, after a large number of presentations. Such seemingly contradictory evidence from physiology and behavioral observations can be reconciled by introducing stronger synaptic changes than usually observed in vitro, possibly aided by neuromodulation (Fusi et al., 2005). More recently, it has been proposed that even weak synaptic plasticity can support fast learning in the balanced regime of excitation and inhibition (Yger et al., 2015). Due to the leverage effect from the excitatory and inhibitory balance in this regime, small synaptic modifications applied to many synapses onto a given neuron result in a large effect (Yger et al., 2015).

4.3 Mapping computational units in predictive coding to cortical circuitry

Different algorithms implementing hierarchical predictive coding share the general principle of a generative model: the brain has an internal representation of the world which is actively compared to the actual sensory inputs. However, the precise computational procedures employed by these algorithms as well as their connections to neuronal populations are controversial and vary widely across different studies (Spratling, 2016; Bastos et al., 2012; Bogacz, 2015; Rao & Ballard, 1999; Mumford, 1992; Spratling, 2008).

For example, in our model, the variances of the response distributions of different V4 units given the sensory input or the higher cortical activity are pre-defined to capture the response characteristics in experiments. However, they can also be treated as parameters to be optimized and assigned their most likely values, with a slight modification of the network structure, as done in a few other models of hierarchical predictive coding. In these studies, the variances are interpreted as synaptic weights and are obtained by minimizing the free energy (Bogacz, 2015; Friston & Kiebel, 2009a,b).

There are varied interpretations of the connections between predictive coding algorithms and computations done by cortical circuitry. Cortical areas have laminar structures, and different layers or populations within the cortical area may correspond to different local computational nodes that arise in predictive coding algorithms. However, there is no unifying description of the intra-cortical connectivity and the local computations within a cortical area. For example, the inhibitory feedback connection implemented in the model proposed by Rao & Ballard (1999) is modified in Spratling (2008, 2016) to reflect excitatory feedback signals observed in physiology. In order to avoid negative responses, Spratling (2008, 2016) also replaced additive excitation and subtractive inhibition in Rao & Ballard (1999) by multiplicative and divisive modulations, respectively. In our model, we follow the approach in Rao & Ballard (1999) and implement additive excitation and subtractive inhibition for simplicity.

Within area V4, there surely are multiple neuronal populations across the laminar structures, and each neuronal node may perform different computations, as suggested by earlier studies. In this work, we have shown that V4 neuronal responses to partially occluded shapes are better explained by the optimal representation of responses than by the residual errors between the current estimates and the predictions, and thus correspond to the node that encodes the current estimates. However, neurons whose responses are less dependent on their tuning to stimulus features but more sensitive to novelty of the stimulus may correspond to the unit that computes residual errors. Investigations of specific neuronal populations within V4-PFC circuitry in the context of the corresponding computational nodes in the predictive coding algorithm will provide a better understanding and validation of our model.

Supplementary Material

Acknowledgments

We thank Rajesh Rao, Wyeth Bair, and Joel Zylberberg for many helpful discussions and comments. This work was supported by the Washington Research Foundation Innovation Postdoctoral Fellowship in Neuroengineering to H.C., NEI grant R01EY018839 to A.P., NSF Career Award DMS-1056125 to E.S-B, Vision Core grant P30EY01730 to the University of Washington, and P51 grant OD010425 to the Washington National Primate Research Center.

References

  1. Barbas H, Mesulam MM. Cortical afferent input to the principalis region of the rhesus monkey. Neuroscience. 1985;15:619–637. doi: 10.1016/0306-4522(85)90064-8.
  2. Bastos AM, Usrey WM, Adams RA, Mangun GR, Fries P, Friston K. Canonical microcircuits for predictive coding. Neuron. 2012;76:695–711. doi: 10.1016/j.neuron.2012.10.038.
  3. Bi GQ, Poo MM. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neuroscience. 1998;18:10464–10472. doi: 10.1523/JNEUROSCI.18-24-10464.1998.
  4. Bogacz R. A tutorial on the free-energy framework for modelling perception and learning. Journal of Mathematical Psychology. 2015. doi: 10.1016/j.jmp.2015.11.003.
  5. Bushnell BN, Harding PJ, Kosai Y, Bair W, Pasupathy A. Equiluminance cells in visual cortical area V4. J Neuroscience. 2011;31(35):12398–12412. doi: 10.1523/JNEUROSCI.1890-11.2011.
  6. Bushnell BN, Harding PJ, Kosai Y, Pasupathy A. Partial occlusion modulates contour-based shape encoding in primate area V4. J Neuroscience. 2011;31(11):4012–4024. doi: 10.1523/JNEUROSCI.4766-10.2011.
  7. Bushnell BN, Pasupathy A. Shape encoding consistency across colors in primate V4. J Neurophysiology. 2012;108:1299–1308. doi: 10.1152/jn.01063.2011.
  8. Cadieu C, Kouh M, Pasupathy A, Connor CE, Riesenhuber M, Poggio T. A model of V4 shape selectivity and invariance. J Neurophysiology. 2007;98:1733–1750. doi: 10.1152/jn.01265.2006.
  9. Eghbali R, Pasupathy A, Bair W. Clustering V4 neurons based on their responses to simple shapes. Society for Neuroscience Abstracts. 2016.
  10. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex. 1991;1(1):1–47. doi: 10.1093/cercor/1.1.1-a.
  11. Friston K, Kiebel S. Predictive coding under the free-energy principle. Phil Trans R Soc B. 2009;364:1211–1221. doi: 10.1098/rstb.2008.0300.
  12. Friston K, Kiebel S. Cortical circuits for perceptual inference. Neural Networks. 2009;22:1093–1104. doi: 10.1016/j.neunet.2009.07.023.
  13. Fukushima K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics. 1980;36:193–202. doi: 10.1007/BF00344251.
  14. Fukushima K. Neural network model for selective attention in visual pattern recognition and associative recall. Applied Optics. 1987;26:4985–4992. doi: 10.1364/AO.26.004985.
  15. Fukushima K. Recognition of partly occluded patterns: a neural network model. Biological Cybernetics. 2001;84:251–259. doi: 10.1007/s004220000210.
  16. Fukushima K. Restoring partly occluded patterns: a neural network model. Neural Networks. 2005;18:33–43. doi: 10.1016/j.neunet.2004.05.001.
  17. Fusi S, Drew PJ, Abbott LF. Cascade models of synaptically stored memories. Neuron. 2005;45:599–611. doi: 10.1016/j.neuron.2005.02.001.
  18. Fusi S, Miller EK, Rigotti M. Why neurons mix: high dimensionality for higher cognition. Current Opinion in Neurobiology. 2016;37:66–74. doi: 10.1016/j.conb.2016.01.010.
  19. Fyall AM, El-Shamayleh Y, Choi H, Shea-Brown E, Pasupathy A. Dynamic representation of partially occluded objects in primate prefrontal and visual cortex. eLife. 2017;6:e25784. doi: 10.7554/eLife.25784.
  20. Gerstner W, Kempter R, van Hemmen JL, Wagner H. A neuronal learning rule for sub-millisecond temporal coding. Nature. 1996;383:76–81. doi: 10.1038/383076a0.
  21. Gregoriou GG, Rossi AF, Ungerleider LG, Desimone R. Lesions of prefrontal cortex reduce attentional modulation of neuronal responses and synchrony in V4. Nature Neuroscience. 2014;17:1003–1011. doi: 10.1038/nn.3742.
  22. Hayden BY, Gallant JL. Working memory and decision processes in visual area V4. Frontiers in Neuroscience. 2013;7:18. doi: 10.3389/fnins.2013.00018.
  23. Koch C, Poggio T. Predicting the visual world: silence is golden. Nature Neuroscience. 1999;2:9–10. doi: 10.1038/4511.
  24. Kosai Y, El-Shamayleh Y, Fyall AM, Pasupathy A. The role of visual area V4 in the discrimination of partially occluded shapes. J Neuroscience. 2014;34(25):8570–8584. doi: 10.1523/JNEUROSCI.1375-14.2014.
  25. Lee TS, Mumford D. Hierarchical Bayesian inference in the visual cortex. J Opt Soc Am A Opt Image Sci Vis. 2003;20:1434–1448. doi: 10.1364/josaa.20.001434.
  26. Markram H, Lubke J, Frotscher M, Sakmann B. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science. 1997;275:213–215. doi: 10.1126/science.275.5297.213.
  27. Meyers EM, Freedman DJ, Kreiman G, Miller EK, Poggio T. Dynamic population coding of category information in inferior temporal and prefrontal cortex. J Neurophysiology. 2008;100:1407–1419. doi: 10.1152/jn.90248.2008.
  28. Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annu Rev Neurosci. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167.
  29. Mumford D. On the computational architecture of the neocortex. Biological Cybernetics. 1992;66:241–251. doi: 10.1007/BF00198477.
  30. Murray JD, Jaramillo J, Wang X-J. Working memory and decision making in a fronto-parietal circuit model. bioRxiv. 2017. doi: 10.1523/JNEUROSCI.0343-17.2017.
  31. Namima T, Pasupathy A. Neural responses in the inferior temporal cortex to partially occluded and occluding stimuli. Society for Neuroscience Abstracts. 2016. doi: 10.1523/JNEUROSCI.2992-20.2021.
  32. Ninomiya T, Sawamura H, Inoue K, Takeda M. Multisynaptic inputs from the medial temporal lobe to V4 in macaques. PLOS One. 2012;7(12):e52115. doi: 10.1371/journal.pone.0052115.
  33. Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381:607–609. doi: 10.1038/381607a0.
  34. Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research. 1997;37(23):3311–3325. doi: 10.1016/s0042-6989(97)00169-7.
  35. Pasupathy A, Connor CE. Responses to contour features in macaque area V4. J Neurophysiology. 1999;82:2490–2502. doi: 10.1152/jn.1999.82.5.2490.
  36. Pasupathy A, Connor CE. Shape representation in area V4: position-specific tuning for boundary conformation. J Neurophysiology. 2001;86:2505–2519. doi: 10.1152/jn.2001.86.5.2505.
  37. Pasupathy A, Connor CE. Population coding of shape in area V4. Nature Neuroscience. 2002;5:1332–1338. doi: 10.1038/nn972.
  38. Pasupathy A, Fyall AM, Choi H. Discriminating partially occluded shapes: insights from visual and frontal cortex. Cosyne annual meeting. 2015.
  39. Rao RPN. Correlates of attention in a model of dynamic visual recognition. Advances in Neural Information Processing Systems (NIPS). 1997;10:80–86.
  40. Rao RPN, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience. 1999;2:79–87. doi: 10.1038/4580.
  41. Rao RPN. An optimal estimation approach to visual perception and learning. Vision Research. 1999;39:1963–1989. doi: 10.1016/s0042-6989(98)00279-x.
  42. Rao RPN. Bayesian computation in recurrent neural circuits. Neural Computation. 2004;16:1–38. doi: 10.1162/08997660460733976.
  43. Rao RPN. Bayesian inference and attentional modulation in the visual cortex. NeuroReport. 2005;16:1843–1848. doi: 10.1097/01.wnr.0000183900.92901.fc.
  44. Riesenhuber M, Poggio T. Hierarchical models of object recognition in cortex. Nature Neuroscience. 1999;2:1019–1025. doi: 10.1038/14819.
  45. Rigotti M, Barak O, Warden M, Wang X-J, Daw ND, Miller EK, et al. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497:585–590. doi: 10.1038/nature12160.
  46. Romo R, Salinas E. Flutter discrimination: neural codes, perception, memory and decision making. Nature Reviews Neuroscience. 2003;4:203–218. doi: 10.1038/nrn1058.
  47. Roe AW, Chelazzi L, Connor CE, Conway BR, Fujita I, Gallant JL, et al. Toward a unified theory of visual area V4. Neuron. 2012;74:12–29. doi: 10.1016/j.neuron.2012.03.011.
  48. Rubin N, Nakayama K, Shapley R. Abrupt learning and retinal size specificity in illusory-contour perception. Curr Biol. 1997;7:461–467. doi: 10.1016/s0960-9822(06)00217-x.
  49. Rust NC, Stocker AA. Ambiguity and invariance: two fundamental challenges for visual processing. Current Opinion in Neurobiology. 2010;20:382–388. doi: 10.1016/j.conb.2010.04.013.
  50. Schein SJ, Desimone R. Spectral properties of V4 neurons in the macaque. J Neuroscience. 1990;10:3369–3389. doi: 10.1523/JNEUROSCI.10-10-03369.1990.
  51. Seitz AR. Sensory learning: rapid extraction of meaning from noise. Curr Biol. 2010;20:R643–R644. doi: 10.1016/j.cub.2010.06.017.
  52. Serre T, Oliva A, Poggio T. A feedforward architecture accounts for rapid categorization. Proc Natl Acad Sci. 2007;104:6424–6429. doi: 10.1073/pnas.0700622104.
  53. Spratling MW. Predictive coding as a model of biased competition in visual attention. Vision Research. 2008;48(12):1391–1408. doi: 10.1016/j.visres.2008.03.009.
  54. Spratling MW. A review of predictive coding algorithms. Brain and Cognition. 2016. doi: 10.1016/j.bandc.2015.11.003.
  55. Srinivasan MV, Laughlin SB, Dubs A. Predictive coding: A fresh view of inhibition in the retina. Proc R Soc Lond B Biol Sci. 1982;216:427–459. doi: 10.1098/rspb.1982.0085.
  56. Ungerleider LG, Galkin TW, Desimone R, Gattass R. Cortical connections of area V4 in the macaque. Cereb Cortex. 2008;18:477–499. doi: 10.1093/cercor/bhm061.
  57. Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci. 2014;111:8619–8624. doi: 10.1073/pnas.1403112111.
  58. Yger P, Stimberg M, Brette R. Fast learning with weak synaptic plasticity. J Neuroscience. 2015;35(39):13351–13362. doi: 10.1523/JNEUROSCI.0607-15.2015.
  59. Yuille A, Kersten D. Vision as Bayesian inference: analysis by synthesis? Trends in Cognitive Sciences. 2006;10:301–308. doi: 10.1016/j.tics.2006.05.002.
  60. Zeki SM. Colour coding in rhesus monkey prestriate cortex. Brain Res. 1973;53:422–427. doi: 10.1016/0006-8993(73)90227-8.
  61. Zylberberg J, Murphy JT, DeWeese MR. A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of V1 simple cell receptive fields. PLoS Comp Bio. 2011;7(10):e1002250. doi: 10.1371/journal.pcbi.1002250.
