Abstract
Filling-in at the blind spot is a perceptual phenomenon in which the visual system fills the informational void, arising from the absence of retinal input at the optic disc, with the surrounding visual attributes. It is known that during filling-in, nonlinear neural responses that correlate with perception are observed in the early visual areas, but our knowledge of the underlying neural mechanism of filling-in at the blind spot is far from complete. In this work, we present a fresh perspective on the computational mechanism of the filling-in process within the framework of hierarchical predictive coding, which provides a functional explanation for a range of neural responses in the cortex. We simulated a three-level hierarchical network and observed its response while stimulating it with different bar stimuli across the blind spot. We find that the predictive-estimator neurons that represent the blind spot in the primary visual cortex exhibit an elevated nonlinear response when the bar stimulates both sides of the blind spot. Using the generative model, we also show that these responses represent filling-in completion. All these results are consistent with the findings of psychophysical and physiological studies. We further demonstrate that the tolerance of filling-in qualitatively matches the experimental findings for non-aligned bars. We discuss this phenomenon in the predictive coding paradigm and show that all our results can be explained by taking into account the efficient coding of natural images, along with feedback and feed-forward connections that allow priors and predictions to co-evolve to arrive at the best prediction. These results suggest that the filling-in process could be a manifestation of the general computational principle of hierarchical predictive coding of natural images.
Introduction
Filling-in at the blind spot is one example of how the brain interpolates across an informational void caused by a deficit of visual input from the retina. Because of the absence of photoreceptors at the optic disc, the retina is unable to send the corresponding signal to the brain and thereby hides a portion of the visual field. This concealed region of the visual field is known as the blind spot. Yet we never notice any odd patch in our visual field, even in monocular vision; rather, we see the complete scene, filled up in accordance with the surrounding visual attributes [1]. This completion is known as perceptual filling-in, or simply filling-in. In addition to the blind spot, filling-in also occurs in other conditions of deficient visual input, e.g. filling-in at artificial and natural retinal scotomas [2, 3]. Beyond input deficits, filling-in also occurs in visual illusions such as neon color spreading, the Craik-O’Brien-Cornsweet illusion, and Kanizsa shapes, and in steady-fixation conditions such as the Troxler effect (for a review, see [4]).
Many psychophysical and physiological studies have been performed to gain insight into the neural mechanism of perceptual completion at the blind spot. These studies suggest that filling-in is an active process: some neural process is involved and takes place mainly in the early visual cortex [5–7]. For example, studies on monkeys show that perceptually correlated neural activities are evoked in the deep layers of the primary visual cortex, in the region that retinotopically corresponds to the blind spot (the BS region), when filling-in completion occurs [5, 6]. In another experiment, Matsumoto and Komatsu [7] showed that some neurons in the BS region in the deep layers of the primary visual cortex (BS neurons), which possess larger receptive fields extending beyond the blind spot, exhibit a nonlinear elevated response when a long moving bar crosses over the blind spot and perceptual completion occurs (see Fig 1).
Although some attempts have been made to understand the computational mechanism of completion of illusory contours and surfaces [8–12], little work has been devoted to the computational mechanism of filling-in completion at the blind spot. A recent study [13] suggested a computational mechanism for completion of the bar in terms of a complex interaction of velocity-dependent pathways in the visual cortex under the framework of regularization theory. However, how this suggestion fits into a general coding principle of the visual cortex is not clear. Here we propose that filling-in completion at the blind spot naturally follows from the computational principle of hierarchical predictive coding (HPC) of natural images, which has recently gained growing support as a general coding principle of the visual cortex [14–24] (for an excellent review, see [25]).
The root of hierarchical predictive coding lies in probabilistic hierarchical generative models and the efficient coding of natural images. In such probabilistic frameworks, the job of the visual system is to infer or estimate the properties of the world from signals coming from the receptors [26–28]. In the HPC framework, this job is hypothesized to be accomplished by a concurrent prediction-correction mechanism along the hierarchy of the visual system. Accordingly, each higher visual area (say V2) attempts to predict the response at the area below it (say V1) on the basis of learned statistical regularities, and sends that prediction signal to the lower area through feedback connections. In response to this top-down information, the lower area sends a residual error signal to the higher area, through feed-forward connections, to correct the next prediction. This idea is based on the anatomical architecture of the visual system, which is hierarchically organized and reciprocally connected [29]. The probabilistic generative model, in the HPC framework, accounts for learning the statistical regularities found in natural images and for generating predictions of the input based on that learning. Recently, several neuronal tuning properties in different visual areas, such as the lateral geniculate nucleus (LGN), primary visual cortex (V1), and middle temporal area (MT), have been explained using this framework [17, 20, 21, 23]. For example, in the standard HPC model, Rao [14] suggested that the extra-classical receptive field properties of neurons in V1 could be understood in terms of the predictive feedback signal from the secondary visual cortex (V2), which is generated in a larger context and against the backdrop of the learned statistical regularities of natural scenes. We speculated that a similar mechanism could also explain filling-in completion across the blind spot.
In this work, we conducted simulation studies involving horizontal bars on a three-level (LGN-V1-V2) HPC model network with a blind spot, which was emulated by removing the feed-forward (LGN-V1) connection. In our first investigation, we employed shifting bar stimuli as described in [7] (see Fig 1) to study the properties of our model network, and recorded the model predictive-estimator neurons (PE neurons) in the BS region of V1. We found that these neurons exhibit a similar nonlinear response and represent the filling-in completion when the bar crosses the blind spot. In a second investigation, we presented two separate bar segments at opposite ends of the model blind spot and varied their alignment to probe the tolerance of completion. We found that filling-in completion is best when the bars are perfectly aligned; the completion survives small degrees of misalignment but fades out quickly as misalignment increases. These results are consistent with the findings of psychophysical experiments [30, 31] and therefore suggest that the filling-in process could naturally arise from the computational principle of hierarchical predictive coding (HPC) of natural images.
Methods
Hierarchical Predictive coding of natural images
General Model and Network Architecture
As discussed in the previous section, the problem of vision can be considered an inference or estimation problem, in which an organism tries to estimate the hidden physical causes (object attributes such as shape, texture, luminance, etc.) behind the image that the organism receives as input. In the HPC framework [14], it is assumed that image generation in the outer world involves hierarchical, multilevel, spatial, and temporal interactions between the physical causes. The goal of the visual system is thus to estimate (or internally represent) these multilevel hidden physical causes efficiently, which the visual system accomplishes using a recurrent prediction-correction mechanism along its hierarchy (see Fig 2a).
In this framework, on the arrival of an input, predictive-estimator modules (PE modules) at each visual processing level generate a prediction (or estimate) on the basis of the learned statistical regularities of natural scenes. Each higher area (say V2) then sends the generated prediction to its immediate lower level (say V1) through feedback connections and in return receives an error signal, through feed-forward connections, which is used to correct the current estimate. An equilibrium state is achieved after a few prediction-correction cycles, at which point the estimate matches the input signal. This optimum estimate is regarded as the representation of the input at that level. The optimum estimates achieved at the different levels of the network together constitute the perception of the input image.
In general, a single PE module (see Fig 2b) consists of: (i) predictive-estimator neurons (PE neurons), which represent the estimate of the current input signal I with the response (state) vector r; (ii) neurons carrying the prediction signal Ur (for the input I) to the lower level through feedback connections, whose synapses encode the efficacy matrix U; (iii) neurons carrying the feed-forward error signal (I − Ur) from the lower level to the higher level, whose synapses encode the rows of the matrix UT; and (iv) error-detecting neurons, which carry the residual error signal (r − rtd) to the higher level, where rtd is the prediction from the higher level.
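The four signal types listed above can be sketched as a minimal data structure. This is an illustrative reconstruction, not the authors' code; the class and method names (`PEModule`, `prediction`, `bottom_up_error`, `top_down_error`) are our own, while the symbols U, r, I, and rtd follow the text.

```python
import numpy as np

# Minimal sketch of a single predictive-estimator (PE) module, using the
# notation of the text: U is the efficacy (basis) matrix, r the state vector,
# I the input, and r_td the top-down prediction from the level above.
class PEModule:
    def __init__(self, n_input, n_neurons, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(scale=0.1, size=(n_input, n_neurons))
        self.r = np.zeros(n_neurons)          # PE neuron responses (state)

    def prediction(self):
        # Feedback signal U r sent down to the lower level.
        return self.U @ self.r

    def bottom_up_error(self, I):
        # Residual (I - U r) carried up by feed-forward connections.
        return I - self.prediction()

    def top_down_error(self, r_td):
        # Residual (r - r_td) sent by error-detecting neurons to the level above.
        return self.r - r_td

mod = PEModule(n_input=16, n_neurons=8)
err = mod.bottom_up_error(np.ones(16))
print(err.shape)  # (16,)
```

With r initialized to zero, the prediction Ur is zero and the bottom-up error simply equals the input, which is the starting point of the prediction-correction cycle.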
Network dynamics and learning rule
The dynamics, the learning rules, and hence the above-mentioned architecture of a general PE module stem directly from probabilistic estimation methods. In the Bayesian framework, they originate from the maximum a posteriori (MAP) approach: maximizing the posterior probability P(r, rtd, U|I), which is proportional to the product of the generative models P(I|r, U) and P(rtd|r) and the prior probabilities P(r) and P(U), with respect to r and U provides the dynamics and the learning rule, respectively. Equivalently, in the framework of information theory, the minimum description length (MDL) approach leads to the same results by minimizing the coding length E, equal to the negative log of the posterior probability defined above, with respect to r and U (for details, see [14] and the supporting information of [22]).
By assuming that the probability distributions P(I|r, U) and P(rtd|r) are Gaussian with zero mean and variances σ2 and σtd2 respectively, the total coding length E can be written as

E = (1/σ2)(I − Ur)T(I − Ur) + (1/σtd2)(r − rtd)T(r − rtd) + g(r) + h(U)    (1)
here, g(r) and h(U) are the negative logs of the prior probabilities P(r) and P(U), respectively. Minimizing the coding length E with respect to r (using the gradient descent method) provides the dynamics of the PE module as

dr/dt = −(k1/2) ∂E/∂r = (k1/σ2) UT(I − Ur) + (k1/σtd2)(rtd − r) − (k1/2) g′(r)    (2)
here, k1 is a rate parameter that governs the rate of descent towards a minimum of E, and UT is the transpose of the weight matrix U. The steady state of this dynamical equation provides an optimum estimate, which is regarded as the representation of the input. The coding length E can roughly be seen as the mean square error between the input and output levels of a PE module, subject to the constraints of the prior probabilities. Minimizing the coding length is thus equivalent to optimizing the estimate by recurrently matching it to both the “sensory-driven” input from the lower area and the “context-driven” prediction signal from the higher area. The prediction signal Ur is a linear combination of the basis vectors Ui, where Ui, the ith column of the matrix U, represents the receptive field of the ith neuron, and the weighting coefficient ri represents the response of the ith neuron. The visual representation of the prediction Ur corresponding to the optimum estimate r is, in this study, termed the “perceptual image.”
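The relaxation to a steady state described above can be sketched as a simple Euler integration of Eq (2), using the sparse-prior derivative g′(r) of Eq (4). This is an illustrative reconstruction under our own assumptions: the function name `settle`, the parameter values, and the toy dimensions are not the paper's; only the update rule follows the text.

```python
import numpy as np

# Euler integration of Eq (2):
#   dr/dt = (k1/sigma^2) U^T (I - U r) + (k1/sigma_td^2)(r_td - r) - (k1/2) g'(r),
# with the kurtotic-prior derivative g'(r_i) = 2*alpha*r_i/(1 + r_i^2) of Eq (4).
# All parameter values here are illustrative, not the paper's.
def settle(U, I, r_td, k1=0.1, sigma2=1.0, sigma_td2=10.0, alpha=0.05, steps=500):
    r = np.zeros(U.shape[1])
    for _ in range(steps):
        g_prime = 2 * alpha * r / (1 + r**2)
        dr = (k1 / sigma2) * U.T @ (I - U @ r) \
           + (k1 / sigma_td2) * (r_td - r) - (k1 / 2) * g_prime
        r = r + dr
    return r

rng = np.random.default_rng(1)
U = rng.normal(scale=0.3, size=(12, 6))
I = U @ np.array([1.0, 0, 0, 0, 0, 0])    # input generated by one basis vector
r = settle(U, I, r_td=np.zeros(6))
print(np.argmax(np.abs(r)))  # the coefficient of the matching basis vector dominates
```

With the input generated from a single basis vector, the relaxed state concentrates its response on that vector's coefficient, shrunk slightly by the prior terms, which is the sparse optimum estimate the text describes.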
Furthermore, minimizing the coding length E with respect to U using the gradient descent method provides the learning rule for the basis matrix U as

dU/dt = −(k2/2) ∂E/∂U = (k2/σ2)(I − Ur)rT − (k2/2) h′(U)    (3)
here k2 is the learning rate, which operates on a slower time scale than the rate parameter k1, and rT is the transpose of the state vector r. This learning rule can be seen as Hebbian in type. In this study, the prior probability P(r) on the state vector r is chosen according to sparse coding, where it is assumed that the visual system encodes any incoming signal with a small set of neurons from the available larger pool [28]. The kurtotic prior distribution, corresponding to g(r) = α Σi log(1 + ri2), constrains the dynamics toward a sparse representation of the input. This distribution gives us

g′(ri) = 2αri/(1 + ri2)    (4)
which is used in Eq (2). The prior probability distribution P(U) has been chosen here to be of Gaussian type, which finally gives us

h′(U) = 2λU    (5)
which is used in Eq (3). Here α and λ are variance-related parameters.
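A single step of the learning rule, combining Eq (3) with the Gaussian-prior derivative of Eq (5), can be sketched as follows. This is an illustrative reconstruction; the function name `update_U` and the toy values are our own assumptions, not the paper's code.

```python
import numpy as np

# One discrete step of the Hebbian-like rule of Eq (3), with h'(U) = 2*lambda*U
# from Eq (5), so that dU = k2 * ((1/sigma^2)(I - U r) r^T - lambda * U).
# Parameter values are illustrative only.
def update_U(U, I, r, k2=0.05, sigma2=1.0, lam=0.0025):
    error = I - U @ r                       # residual error at the lower level
    return U + k2 * ((1.0 / sigma2) * np.outer(error, r) - lam * U)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(9, 4))
r = np.array([1.0, 0.0, 0.0, 0.0])          # only neuron 0 is active
I = np.ones(9)
U_new = update_U(U, I, r)
# The column driven by the active coefficient moves toward the input pattern.
print(np.linalg.norm(I - U_new[:, 0]) < np.linalg.norm(I - U[:, 0]))  # True
```

Because the update is an outer product of the residual error with the response vector, only the basis vectors of active neurons are reshaped, which is the Hebbian character the text points out; the −λU term implements the Gaussian prior's weight decay.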
Simulation
Network
In this work we simulated a three-level linear hierarchical predictive coding network (see Fig 3). In this network, level 1, which is equivalent to V1, consists of 9 PE modules. These modules receive input from level 0 and send their output to the solitary module at level 2. Level 0 is equivalent to the LGN and level 2 is equivalent to V2. The PE module at level 2 therefore receives input from all nine level 1 PE modules and sends a feedback signal back to all of them. This architecture reflects the fact that visual areas higher in the hierarchy operate on larger spatial scales.
Each of the nine PE modules at level 1 consists of 64 PE neurons, 144 prediction-carrying neurons, 64 afferent error-carrying neurons, and 64 error-detecting neurons that convey the residual error to level 2. The level 2 module consists of 169 PE neurons, 576 prediction-carrying neurons, and 169 error-carrying neurons.
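The way the nine level 1 modules tile the input (nine 12 × 12 patches overlapping by 3 pixels within a 30 × 30 image, as described under Training below) can be sketched as a simple extraction routine. The function name `tile_patches` and the stride arithmetic are our own illustration of the stated geometry.

```python
import numpy as np

# Tile a 30x30 input into nine 12x12 patches overlapping by 3 pixels
# (stride 12 - 3 = 9), one patch per level 1 PE module.
def tile_patches(image, size=12, stride=9):
    patches = []
    for i in range(0, image.shape[0] - size + 1, stride):
        for j in range(0, image.shape[1] - size + 1, stride):
            patches.append(image[i:i + size, j:j + size].ravel())
    return np.stack(patches)

patches = tile_patches(np.zeros((30, 30)))
print(patches.shape)  # (9, 144)
```

Each flattened patch has 144 pixels, matching the 144 prediction-carrying neurons per level 1 module listed above.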
Training
Six natural images (Fig 4a) of size 512 × 512 pixels were used for training after pre-processing. The pre-processing involved DC removal and filtering of the images with a circularly symmetric whitening/lowpass filter with spatial frequency profile W(f) = f exp(−(f/f0)4) (see [28, 32]). The cutoff frequency f0 was taken to be 200 cycles/image. This pre-processing has been argued to emulate the filtering at the LGN [33]. One thousand variance-normalized batches of 100 image patches of size 30 × 30 pixels, extracted from randomly selected locations in randomly selected pre-processed images, were given as input to the network. A single 30 × 30-pixel image consisted of nine tiled 12 × 12-pixel image patches, overlapping by 3 pixels (see Fig 4b), which were fed to the corresponding level 1 PE modules. For each batch of image patches, the network was allowed to reach its steady state (according to Eq (2)), and the average of these states was used to update the efficacies of the neurons (according to Eq (3)), which were initially assigned random values. During training, to prevent the efficacy vectors Ui (columns of U, or rows of UT) from growing without bound, the gain (L2 norm) of each Ui was adapted, at gain adaptation rate α, so that the variance of each ri remained at the desired level (for details see [32]). Parameter values used in this study are: k1 = 1, k2 = 3, σ2 = 3, α = 0.05 at level 1 and 0.1 at level 2, λ = 0.0025, γ = 0.02. Level 1 was trained first and then level 2.
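The whitening/lowpass pre-filter can be sketched as a frequency-domain multiplication. This is an illustrative implementation under our own assumptions (square images, radial frequency measured in cycles/image, `whiten` as the function name); only the profile W(f) = f exp(−(f/f0)⁴) and the DC removal come from the text.

```python
import numpy as np

# Apply the whitening/lowpass filter W(f) = f * exp(-(f/f0)^4) in the
# frequency domain, after DC removal; f is radial spatial frequency in
# cycles/image and f0 = 200 cycles/image as in the text. Assumes a square image.
def whiten(image, f0=200.0):
    n = image.shape[0]
    fx = np.fft.fftfreq(n) * n                      # frequencies in cycles/image
    f = np.sqrt(fx[None, :]**2 + fx[:, None]**2)    # radial frequency grid
    W = f * np.exp(-(f / f0)**4)
    spectrum = np.fft.fft2(image - image.mean())    # DC removal
    return np.real(np.fft.ifft2(spectrum * W))

img = np.random.default_rng(0).random((64, 64))
out = whiten(img)
print(abs(out.mean()) < 1e-9)  # DC component removed: True
```

Since W(0) = 0, the filter itself also suppresses any residual DC, and the f factor flattens (whitens) the roughly 1/f amplitude spectrum of natural images up to the lowpass cutoff.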
Blind spot implementation
To mimic the blind spot, the feed-forward connections in a certain area were removed from the model network, which had been pre-trained with the usual feed-forward connections. The removal was implemented by setting to zero the efficacies of the early feed-forward (level 0-level 1) neurons that carry the error signal corresponding to the middle region (of size 8 × 8) of the input patches (of size 30 × 30). This “pre-training”, i.e. training before the creation of the blind spot, captures the fact that the active neurons in the deep layers (5/6) corresponding to filling-in have been reported to be of binocular type. These neurons were found to respond to inputs from both eyes and hence possess binocular receptive fields. Additionally, these neurons also exhibit greater sensitivity to inputs from the other (non-BS) eye [6, 7]. It is therefore natural to assume that, in normal binocular vision, the feed-forward input from the non-BS eye will cause the receptive fields of these deep-layer neurons to develop.
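The blind-spot emulation described above amounts to zeroing the feed-forward efficacies for the central region of the input. A minimal sketch, with our own function name `make_blind_spot` and a placeholder efficacy matrix:

```python
import numpy as np

# Zero the feed-forward efficacies (rows of U^T) corresponding to the central
# 8x8 region of a 30x30 input patch, so no error signal from that region
# reaches level 1. UT has one row per neuron and one column per input pixel.
def make_blind_spot(UT, patch=30, bs=8):
    UT = UT.copy()
    mask = np.zeros((patch, patch), dtype=bool)
    lo = (patch - bs) // 2
    mask[lo:lo + bs, lo:lo + bs] = True     # central blind-spot region
    UT[:, mask.ravel()] = 0.0               # silence feed-forward input
    return UT

UT = np.ones((5, 900))                      # 5 neurons, 30*30 = 900 input pixels
UT_bs = make_blind_spot(UT)
print(int((UT_bs == 0).sum()))  # 5 * 64 = 320 zeroed efficacies
```

Because only the feed-forward efficacies are zeroed, the feedback (prediction) pathway into the blind-spot region is left intact, which is exactly what lets the top-down estimate prevail there.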
Results
To ascertain whether the computational mechanism of HPC could account for filling-in completion across the blind spot, we conducted a pair of experiments by stimulating the trained HPC model network with bar stimuli. The HPC network was allowed to learn the synaptic weights of the model neurons by exposure to natural image patches under the constraint of sparseness of the model neuron responses (see Methods). The learned synaptic weights of the neurons carrying the feed-forward signal, for one of the modules at level 1 and for level 2, are shown in Fig 5. The weighting profiles at level 1 (Fig 5a) resemble the Gabor-like receptive fields of V1, similar to the results reported earlier in several studies [14, 20, 28]. The weighting profiles at level 2 (Fig 5b) resemble more abstract visual features: long bars, curves, etc. The blind spot was emulated in the network by removing feed-forward connections (see Methods), whereas the training was performed with the feed-forward connections intact. We designate the network with the blind spot as the BS network and the one without it as the non-BS network.
Filling-in of shifting bar
Both the BS and non-BS networks were exposed to images of a horizontal bar of different lengths. One end of the bar was fixed at a position outside the blind spot, whereas the position of the other end was varied (by one pixel at each step) across the blind spot. Images of the bar at six different end positions are shown in Fig 6a. The response vector r of the PE neurons in the central module of the model network (call it the BS module) at level 1 and in the sole module at level 2 was recorded for the different end positions of the bar.
Fig 7 shows bar plots of the responses of the 64 neurons in the BS module at level 1 for six different bar positions in both model networks (BS and non-BS). The comparison shows that almost the same small set of neurons responded in both networks. The receptive fields of the highly responsive neurons in this set possess horizontal bar-like structures. We plotted the responses of some of these highly responsive neurons against the bar position (varying by one pixel) (see Fig 8), which shows that these neurons exhibited an elevated response when the varying end of the bar crossed the blind spot. These elevated responses in the BS network come reasonably close to the maximum responses exhibited by these neurons in the non-BS network. The closeness of the responses indicates that the representation of objects in the BS network is similar to that in the non-BS network. This is reflected in the corresponding “perceptual images” (see Fig 6b) reconstructed using the generative process. The response profiles of the level 2 neurons are shown in Fig 9. The most active PE neurons at level 2 exhibit similar responses to the level 1 neurons and possess horizontal bar-like receptive fields, which is to be expected.
It is evident from these results that, in the case of the BS network, as long as the bar end resides inside the blind spot, the responses of the neurons in the BS module remain constant and relatively low (Figs 7 and 8), which results in the perception of a bar of constant length on one side of the blind spot. On the other hand, when the bar crosses the blind spot, the responses are elevated significantly and filling-in completion occurs. This can also be seen in the relative deviation of the response of each neuron (typically the highly responsive neurons) between the two networks (Fig 7) when the bar end crosses the blind spot. These results are consistent with the findings of neurophysiological studies on macaque monkeys [7].
It is evident from Fig 8 that when filling-in occurs, the response profile changes abruptly in a nonlinear fashion. To verify the explicit correlation between the nonlinear response and filling-in completion, the responses of the most responsive neurons were examined by exposing the BS network to different stimulus combinations (a, b, c, and ab), as described in Fig 10. The stimuli are shown in the inset at the bottom of the figure and the corresponding responses are presented as bar plots above each stimulus. As shown in Fig 10, the response to the ab stimulus is significantly larger than the sum of the responses to a and b (shown as a+b), even though stimuli a and b separately stimulated areas of the receptive field similar to those stimulated by the ab stimulus. This indicates that the abrupt change in the magnitude of the response during filling-in completion cannot be explained by stimulation of the receptive field extending out from the opposite side of the blind spot. In other words, the abrupt response increase (nonlinearity) is correlated with perceptual completion and is not predictable from a simple summation rule.
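The superadditivity test above can be summarized by a simple ratio. The index below and the numeric values in the example are our own illustration, not recorded data from the experiment.

```python
# Superadditivity check used to diagnose filling-in: compare the response to
# the joint stimulus "ab" with the sum of the responses to "a" and "b" alone.
def nonlinearity_index(r_ab, r_a, r_b):
    # > 1 indicates a superadditive (AND-gate-like) response;
    # = 1 indicates simple linear summation.
    return r_ab / (r_a + r_b)

# Illustrative numbers only: a strongly superadditive neuron.
print(nonlinearity_index(r_ab=2.4, r_a=0.5, r_b=0.6) > 1.0)  # True
```

An index well above 1, as reported for the BS neurons in Fig 10, is what rules out the simple summation account of the elevated response.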
Filling-in for misaligned bars
Two bar segments were presented on opposite sides of the blind spot. One of the segments was kept fixed, whereas the other was shifted along the vertical direction (see Fig 11a). We recorded the responses of the PE neurons in the BS module at level 1 and generated the corresponding “perceptual images”. The results presented in Fig 11b show that the bar appears completed when both segments are aligned, but the filling-in fades away as misalignment increases. This result indicates that filling-in completion is strongly favored by perfect alignment and has some degree of tolerance for misalignment. Similar results were reported in earlier psychophysical studies [30, 31].
To understand the mechanism of filling-in, recall that in HPC the feed-forward connections propagate up the residual error, corresponding to the current prediction made by the higher area, to improve the next prediction. The optimum estimate, at which the prediction closely matches the “driving sensory” input as well as the “contextual signal” from the higher area, producing a minimum prediction error, is then depicted as the percept of the input. The blind spot, however, is characterized by the absence of such feed-forward connections. The estimate made by the higher areas therefore prevails in the absence of an error signal, and this provides the ground for filling-in completion at the blind spot.
Neurons at V2 learn regular features such as long bars and curves from natural scenes, and operate on a larger area and context. Therefore, the initial estimate, which is based on these learned regular features, prevails and becomes the optimum estimate for inputs that match those regular features in the surround of the blind spot. This process initializes the filling-in at V2. In the absence of feed-forward connections in the BS region, the corresponding local optimum estimate at level 1 evolves by matching the “context-driven” feedback signal from level 2, thereby locally capturing the course of the completion process at level 2. Thus, the properties of filling-in are largely determined by how well the statistics of the input stimuli around the blind spot match the natural statistics learned by the network: the better the match, the higher the chance of completion.
For example, in the bar-shifting experiment, while the moving end of the bar resides inside the blind spot, the incoming sensory input, that of a short bar on one side of the blind spot, deviates considerably from the learned statistical regularity, in which bars are usually longer (extending across the blind spot) [14]. This results in non-completion of the bar, and the PE neurons in the BS module whose responses represent the bar exhibit a low response. On the other hand, when the bar crosses the blind spot, the likelihood of a match between the input signal and the learned regular features suddenly increases, leading to the abrupt elevation of the responses of the PE neurons that encode the bar in the BS module. This process resembles an AND-gate and is reflected in the nonlinearity of the response profiles shown above (Figs 8 and 10).
In the context of the second investigation, for aligned bar segments, a continuous bar is a likely estimate in the landscape of similar features learned from natural scenes. This likelihood favors completion of the aligned bar segments. On the other hand, two non-aligned bars (the non-aligned case in Fig 11a) are less likely to be part of a continuous object in the dictionary of learned features, and this is reflected in their non-completion. These observations suggest that the learned statistical regularities of natural objects, along with the prediction-correction mechanism, play a significant role in filling-in completion.
We have presented some of the results in the form of “perceptual images”, generated from the activities of the PE neurons, and have not related these results to visual angle, which is the common way of representing results in visual psychophysics. The reason is that, since the extents of the receptive fields and the blind spot are represented in pixels, their relation to the actual shape and dimensions of the blind spot (and receptive fields), and hence to a visual-angle representation, is difficult to establish.
Discussion
In this study, we have investigated the computational mechanism of filling-in at the blind spot. We postulated that it could be understood in terms of hierarchical predictive processing in the visual cortex, and therefore conducted simulation studies on a three-level HPC model network, investigating filling-in completion using bar stimuli. The blind spot was emulated in the network by removing the bottom-up feed-forward connections, whereas the training was performed on a network with the usual top-down and bottom-up connections. In the first study, we recorded the responses of the PE neurons of the BS module at V1 while shifting a long bar across the blind spot, and additionally generated the corresponding perceptual counterpart using the generative model. The nonlinear response of the PE neurons, as well as the nature of the filling-in of the bar segments, was in agreement with experimental findings [7]. In the second study, a pair of bar segments was presented on opposite sides of the blind spot with varying alignment. We found that the filling-in completion that occurs for perfectly aligned bar segments also occurs for small degrees of misalignment, but deteriorates as the misalignment increases. These results are also in good agreement with the physiological and psychophysical results reported earlier [7, 31].
Previous studies have suggested hierarchical predictive processing of natural images as a general unified coding principle of the visual system [14, 17, 18, 20, 34]. This study proposes that the same coding principle could also account for filling-in at the blind spot: for an input stimulus around the blind spot, the higher area (V2) generates a unified estimate of the input (including the estimate corresponding to the blind spot region) on the basis of the learned statistical regularities of natural images. This estimate remains uncorrected owing to the absence of error-carrying feed-forward connections in the BS region of V1, and therefore the local optimum estimate is achieved essentially by the top-down prediction. Influenced by the learned statistical regularities, the higher area predicts a long continuous bar across the blind spot, and this results in the perception of completion. The nonlinearity observed in the responses of the PE neurons, and hence the properties of filling-in, result from the degree of similarity between the statistics of the stimuli around the blind spot and the natural image statistics. This study thus provides further support for predictive coding as a general computational principle of the visual cortex.
Some related studies [8, 10] have suggested a role for cortico-cortical (V2-V1) interaction in the filling-in of illusory contours and surfaces. Neumann [9] proposed that the filling-in of illusory contours could be the outcome of a modulation mechanism of the feedback signal from V2, which enhances the favorable response profiles of feature-detecting neurons at V1, mainly in the superficial layers, in the context of the larger contour coded at V2. This model is therefore limited in explaining completion across the blind spot, where the activity is found mainly in the deep layers of V1. In another recent study [35], the authors attempted to explain the nonlinear behavior of neurons during filling-in in terms of the interaction of top-down and bottom-up signals in a Bayesian framework, in which a feed-forward signal carrying the “prediction match” plays the crucial role. In this study, by contrast, we have demonstrated similar response profiles, along with the other properties of filling-in, under a simple unifying framework of hierarchical predictive processing, in which the feed-forward signal carries the “prediction error” rather than the “prediction match”, and this error determines the activities of the PE neurons. These PE neurons, in this framework, hypothetically reside in the deep layers of the cortex [16, 36], which is consistent with the physiological findings.
Regarding the “pre-training” mentioned in the Methods section, one could argue that, in the absence of feed-forward input, the feedback signal from the higher area might itself cause associative receptive fields to develop, which could provide the basis for an internal representation in the BS area. But this is not the case for the neurons representing the blind spot, for which, as discussed above, feed-forward connections from the other (non-BS) eye appear to exist in normal binocular vision and to drive the development of the receptive fields of the deep-layer neurons. One might further suggest that these neurons receive relatively reduced input strength in the BS region, since they receive input from only one eye rather than both, and that this could lead to different weighting profiles. Without going into the details of input integration, in which the relative strength can take values from one half to one depending on the assumed integration rule (linear or nonlinear), we note that even a reasonable reduction of input is unlikely to produce any qualitative change in the learned receptive fields, because the nature of the receptive fields is governed mainly by the statistical features of the input. We therefore argue that this situation does not alter the generality of our approach.
In this work, we have suggested a functional explanation of filling-in at the blind spot, which has remained largely unexplained to date, under a simple linear predictive coding framework. Although this study provides qualitative insight into the phenomenon, quantitative insight could be gained by incorporating a more detailed HPC model. Other recent studies [21, 23, 24] have extended the standard framework and demonstrated that it can account for various physiological and psychophysical results. With suitable modification in line with the requirements of the blind spot, future studies in these detailed HPC frameworks could be useful for quantitative estimates of filling-in at the blind spot. Moreover, the filling-in completion of a bar is governed mainly by the learned statistics of contrast information (edges, boundaries, etc.) found in natural scenes. The positive outcome of our investigation therefore suggests that surface filling-in at the blind spot could be understood under a similar mechanism, given a proper surface representation in the hierarchical probabilistic framework, which remains a present challenge for visual science. This study does not reject a possible role for intracortical interactions within V1 in filling-in completion; there could be some other prediction-correction pathway (or more than one) within V1 that contributes to filling-in based on the contextual information surrounding the blind spot.
In conclusion, recent studies on filling-in at the blind spot reveal that neural activities in the BS area, in the deep layers of V1, are associated with filling-in completion. For example, nonlinear neuronal activities have been reported in the shifting-bar completion experiment. Here, we have explained these activities in terms of hierarchical predictive coding of natural images and, moreover, demonstrated that they represent filling-in. In the case of misaligned bars, perceptual results corroborate our finding that the tolerance of filling-in varies with the alignment of the bars presented on opposite sides of the blind spot. Together, these results suggest that filling-in could be a manifestation of a hierarchical predictive coding principle, and that its nature could be guided predominantly by the learned statistical regularities of natural scenes.
Data Availability
All relevant data are within the paper.
Funding Statement
This work was supported by Department of Atomic Energy, Govt. of India. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Ramachandran V. Blind spots. Scientific American. 1992;p. 86–91.
- 2. Ramachandran VS, Gregory RL. Perceptual filling in of artificially induced scotomas in human vision. Nature. 1991;350(6320):699–702. 10.1038/350699a0
- 3. Gerrits HJ, Timmerman GJ. The filling-in process in patients with retinal scotomata. Vision Research. 1969;9(3):439–442. 10.1016/0042-6989(69)90092-3
- 4. Komatsu H. The neural mechanisms of perceptual filling-in. Nature Reviews Neuroscience. 2006;7(3):220–231. 10.1038/nrn1869
- 5. Fiorani Júnior M, Rosa MG, Gattass R, Rocha-Miranda CE. Dynamic surrounds of receptive fields in primate striate cortex: a physiological basis for perceptual completion? Proceedings of the National Academy of Sciences of the United States of America. 1992;89(18):8547–8551. 10.1073/pnas.89.18.8547
- 6. Komatsu H, Kinoshita M, Murakami I. Neural responses in the retinotopic representation of the blind spot in the macaque V1 to stimuli for perceptual filling-in. The Journal of Neuroscience. 2000;20(24):9310–9319.
- 7. Matsumoto M, Komatsu H. Neural responses in the macaque V1 to bar stimuli with various lengths presented on the blind spot. Journal of Neurophysiology. 2005;93(5):2374–2387. 10.1152/jn.00811.2004
- 8. Neumann H, Pessoa L, Hansen T. Visual filling-in for computing perceptual surface properties. Biological Cybernetics. 2001;85(5):355–369.
- 9. Neumann H, Sepp W. Recurrent V1-V2 interaction in early visual boundary processing. Biological Cybernetics. 1999;81(5–6):425–444. 10.1007/s004220050573
- 10. Grossberg S, Mingolla E. Neural dynamics of form perception: boundary completion, illusory figures, and neon color spreading. Psychological Review. 1985;92(2):173–211. 10.1037/0033-295X.92.2.173
- 11. Grossberg S, Kelly F. Neural dynamics of binocular brightness perception. Vision Research. 1999;39(22):3796–3816. 10.1016/S0042-6989(99)00095-4
- 12. Grossberg S, Mingolla E, Ross WD. Visual brain and visual perception: how does the cortex do perceptual grouping? Trends in Neurosciences. 1997;20(3):106–111. 10.1016/S0166-2236(96)01002-8
- 13. Satoh S, Usui S. Computational theory and applications of a filling-in process at the blind spot. Neural Networks. 2008;21(9):1261–1271. 10.1016/j.neunet.2008.05.001
- 14. Rao RPN, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience. 1999;2(1):79–87. 10.1038/4580
- 15. Lee TS, Mumford D. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A: Optics, Image Science, and Vision. 2003;20(7):1434–1448. 10.1364/JOSAA.20.001434
- 16. Friston K. A theory of cortical responses. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences. 2005;360(1456):815–836. 10.1098/rstb.2005.1622
- 17. Jehee JFM, Rothkopf C, Beck JM, Ballard DH. Learning receptive fields using predictive feedback. Journal of Physiology Paris. 2006;100(1–3):125–132. 10.1016/j.jphysparis.2006.09.011
- 18. Hohwy J, Roepstorff A, Friston K. Predictive coding explains binocular rivalry: an epistemological review. Cognition. 2008;108(3):687–701. 10.1016/j.cognition.2008.05.010
- 19. Spratling MW. Predictive coding as a model of biased competition in visual attention. Vision Research. 2008;48(12):1391–1408. 10.1016/j.visres.2008.03.009
- 20. Jehee JFM, Ballard DH. Predictive feedback can account for biphasic responses in the lateral geniculate nucleus. PLoS Computational Biology. 2009;5(5):e1000373. 10.1371/journal.pcbi.1000373
- 21. Spratling MW. Predictive coding as a model of response properties in cortical area V1. The Journal of Neuroscience. 2010;30(9):3531–3543. 10.1523/JNEUROSCI.4911-09.2010
- 22. Huang Y, Rao RPN. Predictive coding. Wiley Interdisciplinary Reviews: Cognitive Science. 2011;2(5):580–593.
- 23. Spratling MW. Predictive coding accounts for V1 response properties recorded using reverse correlation. Biological Cybernetics. 2012;106(1):37–49. 10.1007/s00422-012-0477-7
- 24. Boerlin M, Machens CK, Denève S. Predictive coding of dynamical variables in balanced spiking networks. PLoS Computational Biology. 2013;9(11):e1003258. 10.1371/journal.pcbi.1003258
- 25. Clark A. Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences. 2013;36(3):181–204. 10.1017/S0140525X12000477
- 26. Mumford D. Neuronal architectures for pattern-theoretic problems. In: Large-Scale Theories of the Cortex. Cambridge, MA: MIT Press; 1994.
- 27. Dayan P, Hinton GE, Neal RM, Zemel RS. The Helmholtz machine. Neural Computation. 1995;7(5):889–904. 10.1162/neco.1995.7.5.889
- 28. Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381(6583):607–609. 10.1038/381607a0
- 29. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex. 1991;1(1):1–47.
- 30. Araragi Y. Anisotropies of linear and curvilinear completions at the blind spot. Experimental Brain Research. 2011;212(4):529–539. 10.1007/s00221-011-2758-0
- 31. Araragi Y, Nakamizo S. Anisotropy of tolerance of perceptual completion at the blind spot. Vision Research. 2008;48(4):618–625. 10.1016/j.visres.2007.12.001
- 32. Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research. 1997;37(23):3311–3325. 10.1016/S0042-6989(97)00169-7
- 33. Atick JJ. Could information theory provide an ecological theory of sensory processing? Network: Computation in Neural Systems. 1992;3(2):213–251. 10.1088/0954-898X_3_2_009
- 34. Ballard DH, Jehee J. Dynamic coding of signed quantities in cortical feedback circuits. Frontiers in Psychology. 2012;3:254. 10.3389/fpsyg.2012.00254
- 35. Hosoya H. Multinomial Bayesian learning for modeling classical and nonclassical receptive field properties. Neural Computation. 2012;24(8):2119–2150. 10.1162/NECO_a_00310
- 36. Rao RPN, Sejnowski TJ. Predictive coding, cortical feedback, and spike-timing dependent plasticity. In: Probabilistic Models of the Brain. 2002;p. 297.