Abstract
We propose a computational model of a simple cell with push-pull inhibition, a property that is observed in many real simple cells. It is based on an existing model called Combination of Receptive Fields or CORF for brevity. A CORF model uses as afferent inputs the responses of model LGN cells with appropriately aligned center-surround receptive fields, and combines their output with a weighted geometric mean. The output of the proposed model simple cell with push-pull inhibition, which we call push-pull CORF, is computed as the response of a CORF model cell that is selective for a stimulus with preferred orientation and preferred contrast minus a fraction of the response of a CORF model cell that responds to the same stimulus but of opposite contrast. We demonstrate that the proposed push-pull CORF model improves signal-to-noise ratio (SNR) and achieves further properties that are observed in real simple cells, namely separability of spatial frequency and orientation as well as contrast-dependent changes in spatial frequency tuning. We also demonstrate the effectiveness of the proposed push-pull CORF model in contour detection, which is believed to be the primary biological role of simple cells. We use the RuG (40 images) and Berkeley (500 images) benchmark data sets of images with natural scenes and show that the proposed model outperforms, with very high statistical significance, the basic CORF model without inhibition, Gabor-based models with isotropic surround inhibition, and the Canny edge detector. The push-pull CORF model that we propose is a contribution to a better understanding of how visual information is processed in the brain as it provides the ability to reproduce a wider range of properties exhibited by real simple cells. As a result of push-pull inhibition a CORF model exhibits an improved SNR, which is the reason for a more effective contour detection.
Introduction
Visual information is of great importance for humans and animals. In macaques, for instance, 55% of the neocortex is dedicated to processing visual information [1], which is 5 to 20 times more than the resources dedicated to any other kind of sensory information.
The studies in [2]–[4] were the first breakthrough in the understanding of neurons in area V1 of the visual cortex. The authors distinguished three types of neurons that they called simple, complex and hypercomplex cells. Their work inspired many researchers to study and unveil the properties of other kinds of neurons in the same and other areas of the visual cortex [5], [6].
The visual cortex of the brain may be understood as being organized in a hierarchy [7], which is composed of layers of neurons that perform similar as well as varied operations. Neurophysiologists have identified two main pathways that process visual information, the so-called dorsal and ventral streams or, as they are also referred to, the “where” and “what” pathways, respectively. The dorsal stream is responsible for motion analysis and spatial arrangement while the ventral stream performs, essentially, object detection and recognition. The complexity of neuronal selectivity increases when going up the hierarchy. For instance, in the bottom layer of the ventral stream, neurons in area V1 respond to bars and edges, as well as spatial frequency, color, motion and disparity, while at the higher end, neurons in area IT respond to whole objects independently of changes in location on the retina, stimulus size, contrast, color and aspect ratio (related to depth rotation invariance) [8], [9].
The ongoing findings of such neurophysiological studies have been the inspiration to computationally simulate how visual information is analyzed in the brain. During the last three decades, this has been the focus of many research groups in the computer vision community. Their work may contribute not only to more robust techniques but also to a better understanding of how the brain processes visual information. Computational neuroscience and modeling address the big questions in computer vision by mimicking the human visual system as well as by providing a testing ground for hypotheses on how the visual cortex works. The first approach to model some properties of simple and complex cells of the type reported by Hubel and Wiesel was proposed in [10]. Computational neuroscientists have since been adding layers of functionality to that pioneering work. Some of those works model simple cells [11] or hierarchies of simple and complex cells [12]. Other works have added new neural types and functionalities, such as lateral connections for contour grouping [13], [14] or neurons that are selective to shapes [15], to name a few from the extensive modeling literature.
In this work we focus on neurons in area V1 that respond to edges and bars. These neurons integrate responses of cells that reside in the lateral geniculate nucleus (LGN), an intermediate area between the eye and the visual cortex. In area V1, there are three main types of neuron that respond to bars and edges, referred to as simple, complex and hypercomplex neurons. A simple neuron responds to a bar or an edge of a given orientation at a specific position in its receptive field. A complex neuron is also orientation-selective but its response is invariant to the location of the preferred stimulus within its receptive field. It is usually considered as integrating responses from simple neurons [3] or LGN neurons [16]. Finally, hypercomplex (also known as end-stopped) cells are sensitive to the terminations of edges or bars [17].
Simple cells are the most studied type of neuron in neurophysiology, and their detailed properties are very well known today. Besides orientation selectivity, they respond to gratings [5] and exhibit an orientation bandwidth which is invariant to the contrast of a stimulus. Another property that is typical of simple cells is called cross orientation suppression. This means that if two stimuli are presented at the same time, one of preferred orientation and the other one of orthogonal orientation, the response of the concerned simple cell decreases with increasing contrast of the orthogonally oriented stimulus [18].
While the 2D Gabor function [11] has gained particular popularity as a model of a simple cell, it fails to reproduce contrast invariant orientation tuning and cross orientation suppression. A novel computational model of a simple cell was proposed in [19], called CORF (Combination of Receptive Fields), that exhibits these two important properties. The authors demonstrated that the CORF model outperforms the Gabor function model in a contour detection task [20]. The response of that CORF model is based on excitatory synapses by a collection of afferent model LGN cells, the receptive fields of which are co-linearly aligned.
A CORF model takes as input the responses of a group of model LGN cells with center-surround receptive fields that are aligned along a row. The colinear arrangement of center-on receptive fields on one side and in parallel to a similar colinear arrangement of center-off receptive fields on the other side determines the orientation selectivity of a CORF model simple cell. This is in line with a recent exhaustive study [21], which found that the geometrical arrangement in the visual space of population receptive fields of geniculate inputs can predict the dominant orientation and spatial phase preferences of the simple cells in a cortical column. The response of a CORF model simple cell is computed as the weighted geometric mean of afferent LGN input. This AND-type operation follows the hypotheses of Hubel and Wiesel [22] as well as Marr and Hildreth [23] in that a simple cell fires only when all the afferent LGN cells with appropriately aligned receptive fields are activated. While the biological underlying mechanism is still an open research question, the AND-type operation proposed in the CORF model turned out to be essential to achieve contrast invariant orientation tuning and cross orientation suppression, as they could not be reproduced by an OR-type operation.
A classical receptive field of a simple cell is a region of the visual field where the presence of a visual stimulus with preferred contrast, size and orientation triggers the firing of the concerned cell. For instance, a simple cell that is selective for a vertical edge has a receptive field which is divided into two main areas, vertically oriented and elongated, parallel to each other, called the ON and OFF sub-regions. It fires when a vertical edge is within its receptive field and the light and dark parts of the stimulus are appropriately located on the ON and OFF sub-regions of the receptive field, respectively.
In neurophysiology, it is well known that simple cells receive what is called antiphase or push-pull inhibition [24]–[29]. A push-pull response of a simple cell with classical receptive field is achieved when two stimuli of preferred orientation but of opposite contrast evoke responses of the opposite sign; the stimulus of preferred contrast evokes a push (positive) response and the stimulus of opposite contrast evokes a pull (negative) response. Some simple cells are also known to have non-classical receptive fields [30]–[33] which receive inhibition from their surrounding. In [34] a computational model of a simple cell with surround inhibition was proposed, which is based on Gabor functions.
A popular model of the push-pull response of a simple cell is depicted in Fig. 1. While there is not yet explicit biological evidence of the involved wiring, the model continues to receive strong neurophysiological experimental support [24], [26], [34]–[40]. It consists of a cortical neuron which receives excitation from a relay of thalamic LGN cells with center-surround receptive fields of preferred polarity, as well as inhibition from another cortical neuron, which receives input from LGN cells with center-surround receptive fields of opposite polarity.
There is neurophysiological evidence that push-pull inhibition is the most dominant form of inhibition received by simple cells [26], [29], [41]–[43]. This type of inhibition can be so strong that it may completely suppress the activation of a simple cell [41]. While the speculative feedforward push-pull model mentioned above has been evaluated with experimental data in neurophysiology, to the best of our knowledge, it has not yet been implemented as a computational model and evaluated in contour detection, which is assumed to be the biological role of simple cells.
We propose a push-pull CORF model of a simple cell with antiphase inhibition that takes as input the responses of two CORF model cells of the type proposed in [19], one with preferred polarity and the other one with opposite polarity, and computes its response as a function of the difference between their responses. We explore whether a push-pull CORF model exhibits the following two biological properties: separability of spatial frequency and orientation, and sensitivity of spatial frequency tuning to contrast [44], [45]. Moreover, we study the effectiveness of push-pull inhibition with regard to signal-to-noise ratio and to a contour detection application. We also compare this model with other biologically and non-biologically inspired contour operators.
The paper is organized as follows. First, we present the push-pull CORF model followed by experiments that demonstrate that it exhibits important properties of simple cells. Then, we present the experimental results in contour detection for two benchmark data sets of images with natural scenes. Finally, we provide a discussion about some aspects of the proposed model and draw our conclusions.
Computational Model
Overview
Fig. 1 illustrates the main setup of the push-pull CORF model of a simple cell that we propose. The concentric circles illustrate center-on (light central region with a dark surround) and center-off (dark central region with light background) receptive fields of model LGN cells. We use the CORF model that was proposed in [19] to model the colinear spatial arrangement of the receptive fields of model LGN cells. Its response is computed as the weighted geometric mean of the responses of the involved model LGN cells. The upper group of center-surround receptive fields is aligned in a colinear manner and with a polarity that is appropriate for the preferred stimulus shown at the bottom. The lower group corresponds to another CORF model which takes input from a group of model LGN cells of opposite polarity. Its response suppresses (or pulls) the excitatory (or push) response that is achieved with a CORF model of preferred polarity. The combined responses of these two model cells are then used to activate the corresponding model simple cell.
In the following sub-sections we explain the implementation details of the proposed push-pull CORF model.
Implementation
We denote by S a CORF model simple cell that is selective for vertical edges, of the type shown in Fig. 2d, that we configure with the trainable method proposed in [19].
$$S = \{(\delta_i, \sigma_i, \rho_i, \phi_i) \mid i = 1, \dots, n\} \tag{1}$$
where every four-tuple represents the properties of a pool of afferent model LGN cells, which we call sub-unit. We model an LGN cell by a difference-of-Gaussians (DoG) function, which has been evaluated many times in neuroscience as an appropriate model LGN cell [46]. In particular, δi represents the polarity of the center-surround receptive fields (−1 for center-off, and 1 for center-on) of a pool of DoG functions, σi represents the standard deviation of the outer Gaussian function of the involved DoG functions (the standard deviation of the inner Gaussian function is half of that of the outer Gaussian function), and (ρi, φi) are the polar coordinates of the sub-unit's center with respect to the receptive field's center of the concerned CORF model cell.
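The DoG model of an LGN cell described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the unit-volume normalisation of each Gaussian are assumptions, while the polarity δ and the inner standard deviation of half the outer one follow the text.

```python
import math


def gaussian2d(x, y, sigma):
    """Isotropic 2D Gaussian, normalised to unit volume (an assumption)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (
        2.0 * math.pi * sigma * sigma
    )


def dog(x, y, sigma, delta):
    """Difference-of-Gaussians value at offset (x, y) from the field centre.

    sigma: standard deviation of the outer Gaussian; the inner Gaussian
           uses sigma / 2, as stated in the text.
    delta: polarity, +1 for centre-on and -1 for centre-off.
    """
    return delta * (gaussian2d(x, y, 0.5 * sigma) - gaussian2d(x, y, sigma))
```

With this sign convention a center-on cell (δ = 1) is positive at its center and negative in its surround, and a center-off cell is the exact negative of it.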
The response of a CORF model cell at location , which we denote by , is achieved by combining the responses of the n afferent sub-units by weighted geometric mean. This computation is explained in detail in [19]. Fig. 2a illustrates the receptive field structure of a CORF model cell and Fig. 2e shows the response image that it achieves to the preferred stimulus shown in Fig. 2d.
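To make the AND-type character of the weighted geometric mean concrete, here is a minimal sketch. The function name is hypothetical, the weights are taken as given positive numbers (their actual definition is in [19]), and the sub-unit responses are assumed to be non-negative.

```python
import math


def weighted_geometric_mean(responses, weights):
    """Weighted geometric mean of non-negative sub-unit responses.

    Returns 0 as soon as one sub-unit is silent, which is the
    AND-type behaviour: every afferent sub-unit must be active.
    """
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    if any(r <= 0.0 for r in responses):
        return 0.0
    total_weight = sum(weights)
    # Work in the log domain to avoid underflow of long products.
    log_sum = sum(w * math.log(r) for r, w in zip(responses, weights))
    return math.exp(log_sum / total_weight)
```

A single silent sub-unit drives the output to zero, whereas an arithmetic mean (an OR-type combination) would still produce a response.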
The excitatory and inhibitory regions within the receptive field of a simple cell may either overlap or be separated in the direction orthogonal to the orientation preference of the cell [47]. We refer to the orthogonal distance between a pool of center-on and a pool of center-off model LGN cells as the separation index, which we denote by B. We consider the receptive field structure that results from the automatic configuration of a CORF model cell, such as the one shown in Fig. 2a, to have a separation index . Below we study the properties of the model for values of the separation index larger than : .
From the set S that corresponds to , we form a new set that defines another CORF model simple cell, which has the same preference for vertical orientations but has a separation index :
(2) |
where , ), , , when and when . Fig. 3 illustrates the geometrical relationship between one pair of and its counterpart .
The value of the parameter β affects the strength of the response to the preferred stimulus as well as the spatial frequency and orientation bandwidth of the concerned CORF model cell: the response to the preferred stimulus and the spatial frequency decrease, while the orientation bandwidth increases, with an increasing value of β (Fig. 4).
We use set to define a new CORF model cell that is selective for vertical edges with opposite contrast:
(3) |
The receptive field of a CORF model is in antiphase to the one of . Push-pull inhibition is the result of combining the responses of two models, S (push) and (pull), defined above. We use a non-negative β value only for the inhibitory part in order to achieve an orientation bandwidth that is broader than that of the excitation, a property that is supported by neurophysiological evidence [48], [49].
We denote by a push-pull CORF model simple cell and define it as a pair:
(4) |
For β>0 the inhibitory CORF model has a smaller spatial frequency than the excitatory counterpart. An alternative way to achieve a similar effect is to use an inhibitory CORF model that has afferent model LGN cells with larger receptive fields (i.e. larger σ values) than those of the excitatory CORF model. We choose to work with the parameter β because it provides more flexibility to the model.
We compute the response of a push-pull CORF model cell at location () by subtracting a factor of the pull response from the push response , and denote it by :
(5) |
where the parameter k represents the pull strength of the inhibition.
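The subtraction described above can be sketched as follows. The function names are hypothetical, and the half-wave rectification (clipping negative results to zero) is an assumption of this sketch rather than something stated in the text.

```python
def push_pull_response(push, pull, k):
    """Push response minus k times the pull response.

    The clipping at zero is an assumption of this sketch.
    """
    return max(0.0, push - k * pull)


def push_pull_map(push_map, pull_map, k):
    """Apply push_pull_response to two response maps of equal size."""
    return [
        [push_pull_response(p, q, k) for p, q in zip(push_row, pull_row)]
        for push_row, pull_row in zip(push_map, pull_map)
    ]
```

Setting k = 0 recovers the basic CORF response without inhibition, which is the baseline used in the comparisons.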
Push-pull inhibition and signal-to-noise ratio
In the following we investigate the effect of push-pull inhibition on the signal-to-noise ratio (SNR) of computed neural responses. For this purpose we compare the SNR values of the responses of CORF models with and without inhibition to synthetic test images.
We generate a test image by summing an image of a vertical bright-to-dark edge with full contrast and a noise image, Fig. 5(a–c). We use the method proposed by [50] to generate a band-limited noise image as a superposition of a constant value N and 100 sinusoidal gratings of randomly selected orientations, all with the same given spatial wavelength w. The rationale of using band-limited noise is that it is particularly effective for masking of contours due to the responses it elicits from orientation-selective model neurons. We set the amplitude of the gratings as one third of the given average noise luminance N. The resulting test image has an edge contrast C defined as C = 1/N.
Fig. 5e illustrates the response map obtained by a CORF model cell without inhibition to the preferred stimulus shown in Fig. 5b. For the same noiseless stimulus an equivalent result is achieved by a push-pull CORF model cell that we propose. The maximum responses are achieved along the edge and they rapidly decrease with an increasing deviation from the edge until they disappear. The label b in Fig. 5f indicates the width of the band around the edge that contains responses greater than half of the maximum response.
We create nine test images by using three contrast values () and three values of w (). For all the locations of a test image we apply two CORF model cells, one without inhibition and the other with push-pull inhibition () and obtain two response maps. For this experiment both CORF models have the common parameter σ set to 2 and they both result in a band of width pixels to a noiseless edge stimulus of preferred orientation.
For each map, we then compute the average of the responses of a model cell along the band of width b that surrounds the edge and call it the response to signal . Similarly, we compute the average of the responses of the same model cell in the remaining noisy areas and call it the response to noise . Finally, we compute the SNR in decibels as follows:
(6) |
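The measurement described above can be sketched as follows for a vertical edge. The function name and arguments are hypothetical, and the decibel factor of 20 (amplitude convention) is an assumption; under a power convention the factor would be 10.

```python
import math


def snr_db(response_map, edge_col, band_width, factor=20.0):
    """SNR in decibels for a response map to a vertical edge.

    Pixels within band_width / 2 columns of edge_col count as signal,
    all remaining pixels count as noise. The factor of 20 is an
    assumption of this sketch.
    """
    half = band_width / 2.0
    signal, noise = [], []
    for row in response_map:
        for x, value in enumerate(row):
            (signal if abs(x - edge_col) <= half else noise).append(value)
    mean_signal = sum(signal) / len(signal)
    mean_noise = sum(noise) / len(noise)
    return factor * math.log10(mean_signal / mean_noise)
```

The ratio is undefined when the mean noise response is zero, so a practical implementation would need to handle that case explicitly.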
Fig. 6 shows the synthetic test images that we use along with the corresponding response maps that are obtained with the two types of CORF model cells. These experimental results clearly show that the proposed push-pull CORF model cell improves the SNR substantially.
Tolerance to Rotation
The model configured above has an orientation preference for bright-to-dark vertical edges, Fig. 2(d–e). This preference is determined from a user-specified prototype edge by a configuration process that is thoroughly explained in [19]. We form a new set that describes a CORF model simple cell to be selective for edges that have an orientation of ψ radians:
(7) |
In order to obtain a response that is tolerant to any orientation we take the maximum value of push-pull CORF models with different orientation preference at a given location ():
(8) |
where Ψ is a set of orientations: . A value of is sufficient as a push-pull CORF model cell achieves an orientation bandwidth at half amplitude of , Fig. 4.
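The two steps above, rotating a configured model and taking the maximum over orientations, can be sketched as follows. The sketch assumes that a model is a list of (δ, σ, ρ, φ) tuples and that rotation adds ψ to every polar angle; `response_fn` is a hypothetical callable that evaluates a model at the location of interest.

```python
import math


def rotate_model(subunits, psi):
    """Return a copy of the model with every polar angle offset by psi."""
    return [
        (delta, sigma, rho, (phi + psi) % (2.0 * math.pi))
        for delta, sigma, rho, phi in subunits
    ]


def orientation_tolerant_response(subunits, orientations, response_fn):
    """Maximum response over all rotated versions of the model.

    response_fn is a hypothetical callable that maps a list of
    sub-unit tuples to the response at the location of interest.
    """
    return max(response_fn(rotate_model(subunits, psi)) for psi in orientations)
```

Only the polar angle changes under rotation, so the polarity, scale and radius of each sub-unit are reused for every orientation.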
Testing Some Properties of Simple Cells
Separability of spatial frequency and orientation
The majority of simple cells exhibit an orientation tuning that is separable (or independent) of spatial frequency [51]. However, there are other cells whose orientation tuning is affected by the spatial frequency of a stimulus [44], [45].
We explore the separability properties of the proposed push-pull CORF model. Fig. 7a shows a response map of a CORF model cell without inhibition (, ) to gratings of different frequency and orientation. We computed two measurements, and si, that were used in [51] in order to quantify the separability between spatial frequency and orientation. The quantity is the squared correlation between measured and predicted spatial frequency-orientation. Predicted values are obtained under the assumption that both features (spatial frequency and orientation) are independent. The other quantity is related to how much the first singular vector reconstructs the original matrix after singular value decomposition. Both quantities range from 0 (non-separable) to 1 (separable). We refer to [51] for further technical details on the rationale of these quantities. We obtained a value of 0.96 for and a value of 0.99 for si. Such high values (very close to 1) mean that the spatial frequency and orientation are almost perfectly separable. Fig. 7b shows a response map which we obtain by adding moderate inhibition (, ), and it results in and . This scenario is very similar to the average over 52 neurons reported in [51]. Fig. 7c shows another response map for much stronger inhibition (, ), which results in and . These experiments indicate that the separability of spatial frequency and orientation tuning decreases as the inhibition strength increases.
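One common way to turn "how much the first singular vector reconstructs the matrix" into a number is the ratio of the squared largest singular value to the sum of all squared singular values. The sketch below uses that definition as an assumption (the exact formula of [51] is not reproduced here) and estimates the largest singular value by power iteration, assuming a non-negative response map.

```python
import math


def separability_index(matrix, iterations=200):
    """Squared largest singular value divided by the squared Frobenius norm.

    Assumes a non-negative response map, so that the all-ones start
    vector is not orthogonal to the leading singular vector.
    """
    rows, cols = len(matrix), len(matrix[0])
    frobenius_sq = sum(v * v for row in matrix for v in row)
    if frobenius_sq == 0.0:
        return 0.0
    v = [1.0] * cols
    for _ in range(iterations):
        # One step of power iteration on A^T A.
        u = [sum(row[j] * v[j] for j in range(cols)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(cols)) for row in matrix]
    return sum(x * x for x in u) / frobenius_sq
```

A map that is an outer product of a frequency tuning curve and an orientation tuning curve yields 1, and the value drops as the two tunings become interdependent.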
The studies in [45] and [51] share a common finding: they report some simple cells whose preferred spatial frequency varies with orientation and other cells whose preferred spatial frequency is independent of the orientation of the grating. Next, we demonstrate how we can achieve both phenomena with the proposed model by simply changing the push-pull inhibition factor k in Eq. 8. In Fig. 8 we show the activity of the proposed model that achieves comparable behaviour to the two most extreme cases from the work of [45]. When no inhibition is applied (, ) we obtain a model cell whose preferred spatial frequency is completely independent of the grating orientation (top) as in the case of simple cell 3 studied in [45]. On the other hand, if we add push-pull inhibition (, ) (bottom) we obtain a model cell whose preferred spatial frequency is dependent on the orientation of the grating as in cell 16 studied in [45].
Spatial frequency tuning sensitive to contrast
Some simple cells in visual cortex have a spatial frequency tuning that is sensitive to contrast [52]. We can also achieve this property by incorporating a sublinear function, such as the sigmoid function, to the responses of model LGN cells that provide input to CORF model cells.
The resulting CORF model cells with and without inhibition show a dependence of spatial frequency tuning on contrast (Fig. 9).
Application to Contour Detection
In the following, we evaluate the proposed push-pull CORF model in a contour detection task. First, we explain how we transform a given image of a natural scene into a binary contour map and then we present a quantitative procedure to evaluate the quality of the resulting contour map.
Finally, we compare the performance of the proposed model to several other computational models, including the basic CORF model without inhibition, the Gabor Filter model of a simple cell with and without surround inhibition, the Gabor energy model of a complex cell with and without surround inhibition, as well as to the classical Canny edge detector.
Data sets and ground truth
We use two benchmark data sets that were created by the Universities of Groningen (RuG: the data set is online: http://www.cs.rug.nl/~imaging) and Berkeley. The RuG data set was originally introduced in [53] for the evaluation of the Gabor (energy) filter model with non-classical receptive field. It consists of 40 colour images (of size 512×512 pixels) of objects in natural scenes. Fig. 10 (first row) illustrates four examples of images taken from this data set, and Fig. 10 (second row) illustrates the corresponding ground truth contour maps that are hand drawn by a person. The ground truth images depict only the contours of objects (and shadows) and omit the sporadic contours of textured background.
The Berkeley data set consists of 500 images (of size 481×321 or 321×481 pixels) of objects in complex scenes. Fig. 11 (first row) shows four examples of images taken from this data set. While this data set was mainly developed for the evaluation of segmentation algorithms, it has also been used to evaluate various contour detection operators. Each image in the Berkeley data set is complemented with a collection of five ground truth contour maps which were hand drawn by five different persons. Fig. 11 (second row) illustrates the ground truth of superimposed contour maps that correspond to the images in the first row. The bolder the contour is the better the agreement is among the involved human observers.
Next, we explain how we obtain binary contour maps from the operators that we use here for comparison. Subsequently, we define the performance measures that we use to quantify the quality of the resulting contour maps with respect to the given ground truth images.
Binary contour map
We apply a classical two-step procedure in computer vision that was proposed by [54] and [55] to obtain a binary contour map from the output of the concerned model. The first step consists of edge thinning by non-maximum suppression to determine the ridges in the given response image. Then, we apply hysteresis thresholding to obtain a binary contour map. The latter step requires a high and a low threshold value. Similar to the work in [19] we set the low threshold value to a fraction (0.5) of the high threshold. For a given image, we set the high threshold to be the lowest value of the strongest ζ pixels in the thinned response image. The given value of the parameter ζ is a fraction of the total number of pixels in the image. The resulting binary map contains the strongest fraction ζ of contour pixels together with any connected ones that are achieved by hysteresis thresholding.
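The thresholding step can be sketched as follows, assuming that non-maximum suppression has already produced the thinned response image. The function name, the rounding of the pixel count and the use of 8-connectivity are assumptions of this sketch.

```python
from collections import deque


def hysteresis_contours(thinned, zeta, low_fraction=0.5):
    """Binary contour map from a thinned response image.

    The high threshold is the lowest value among the strongest
    zeta fraction of pixels; the low threshold is low_fraction * high.
    """
    height, width = len(thinned), len(thinned[0])
    values = sorted((v for row in thinned for v in row), reverse=True)
    count = max(1, int(round(zeta * height * width)))
    high = values[count - 1]
    low = low_fraction * high
    contour = [[False] * width for _ in range(height)]
    queue = deque()
    # Seed with all strong pixels.
    for y in range(height):
        for x in range(width):
            if thinned[y][x] > 0 and thinned[y][x] >= high:
                contour[y][x] = True
                queue.append((y, x))
    # Grow along 8-connected neighbours that exceed the low threshold.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and not contour[ny][nx]
                        and thinned[ny][nx] > 0
                        and thinned[ny][nx] >= low):
                    contour[ny][nx] = True
                    queue.append((ny, nx))
    return contour
```

Because the high threshold is derived from a fraction of pixels rather than an absolute value, the same ζ can be used across images with different response ranges.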
The images in the third to the seventh row of Fig. 10 and of Fig. 11 show the contour maps of the proposed push-pull CORF model, the basic CORF model without inhibition, the Gabor and Gabor energy models with isotropic surround inhibition and the classical Canny edge detector for the RuG and Berkeley data sets, respectively. These maps are obtained for certain values of the high threshold parameter that are explained below.
Quantitative performance measure
A binary contour map consists of two unbalanced sets of pixels, a minority set of contour pixels and a majority set of non-contour pixels.
We use the Matthews' correlation coefficient (mcc) as a quantitative measure to compare such unbalanced binary maps, which are obtained by some contour operators, with the corresponding ground truth. This performance measure, which is appropriate even when the concerned classes are unbalanced, considers the number of correctly detected contour pixels (true positives or TP), the number of pixels that are incorrectly detected as contour pixels (false positives or FP), the number of correctly detected background pixels (true negatives or TN) and the number of incorrectly missed contour pixels (false negatives or FN):
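$$mcc = \frac{TP/N - S \times P}{\sqrt{P S (1 - S)(1 - P)}} \tag{9}$$
where N = TN + TP + FN + FP, S = (TP + FN)/N, and P = (TP + FP)/N.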
The mcc values vary between −1 and +1. A value of +1 means perfect prediction, a value of 0 means random prediction, while a value of −1 indicates a completely wrong prediction.
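For illustration, the Matthews correlation coefficient can be computed directly from the four pixel counts in its standard confusion-matrix form. The function name is hypothetical, and returning 0 when the denominator vanishes is a convention assumed here.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Degenerate map (e.g. no contour pixels at all); convention assumed.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, this measure does not reward a detector for labelling every pixel as background, which matters when contour pixels are a small minority.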
We use the method described by [19] to deal with inexact contour localizations between the given ground truth and binary contour maps.
Experimental setup
In our experiments we perform various evaluations and comparisons. First, we determine the best β value and inhibition factor k for the model that we propose. This is achieved by running a systematic set of experiments on the RuG data set, each time using a different combination of the following parameters: 21 values of the scale parameter (), five ζ values (), five β values ( and 15 inhibition factors (). For we use three radii (), for we use four radii () and for we use five radii (). These ρ values are selected in such a way that the resulting orientation bandwidth at half amplitude is . For each combination of parameters we compute the mean mcc () value for all the 40 images in the RuG data set. The maximum is achieved for , , and . The contour maps shown in Fig. 10 (third row) are obtained with these parameter values. For the Berkeley data set we do not search for the best β and k parameter values but we use the same ones (, ) that were determined from the RuG data set.
Next, we compare the proposed push-pull CORF-based operator (CORF+PP) to the basic CORF-based operator without inhibition. This experiment allows us to understand the effectiveness of the addition of push-pull inhibition. Furthermore, we compare our model with an alternative inhibitory model of a simple cell called Gabor filter with isotropic surround inhibition (GF+II). For the sake of completeness, we also make a comparison with the Gabor energy filter model with isotropic inhibition (GEF+II), which is a computational model of a complex cell in area V1 with non-classical receptive field inhibition. For the Gabor-based operators, it was shown in [53] that isotropic surround inhibition is more effective in contour detection than anisotropic surround inhibition. Finally, we compare our results with the classical Canny edge detector.
The five operators that we compare share a common parameter, namely the scale parameter σ. For the CORF-based operators σ represents the standard deviation of the outer Gaussian function of the DoG filters that provide input, for the Gabor-based operators it represents the standard deviation of the envelope Gaussian function and for the Canny edge detector it represents the standard deviation of a Gaussian smoothing kernel.
For the Gabor-based operators (GF+II, GEF+II), we set the wavelength and the spatial aspect ratio as suggested by [56]. Furthermore, we set the inhibition factor of the Gabor-based operators as it yielded the maximum value for the RuG data set. We consider 12 orientations (in intervals of ) for the CORF- and Gabor-based operators.
Results
For every input image we apply the five above-mentioned contour detection operators with 21 different values of the parameter σ () and five values of the parameter ζ ().
Finally, we compute the value for each value combination of parameters σ and ζ and for each data set. Table 1 reports the parameter values of σ and ζ that contribute to the maximum value. In the fourth to the seventh row of Fig. 10 and Fig. 11 we show the binary contour maps of the CORF, GF+II, GEF+II and Canny operators with the parameter values reported in Table 1 for the RuG and Berkeley data sets, respectively.
Table 1. The best parameters for the five evaluated operators.
Data set | CORF+PP σ | CORF+PP ζ | CORF σ | CORF ζ | GF+II σ | GF+II ζ | GEF+II σ | GEF+II ζ | Canny σ | Canny ζ |
RuG | 2.2 | 0.1 | 4.8 | 0.1 | 3.6 | 0.1 | 2.8 | 0.2 | 2.4 | 0.1 |
Berkeley | 2.2 | 0.3 | 3.6 | 0.2 | 3.4 | 0.3 | 2.0 | 0.3 | 2.0 | 0.2 |
The values of parameters σ and the fraction ζ of minimum pixels (from thinned images) to generate the resulting binary contour maps, which contribute to the maximum for the operators that we apply to the RuG and Berkeley data sets.
Fig. 12 shows four scatter plots that illustrate pairwise comparisons between the proposed CORF+PP operator and the other four state-of-the-art operators for the RuG data set. The labels on the x-axis are the RuG image names in descending order of the corresponding mcc value that is achieved with the proposed push-pull CORF model. We compare the mcc values of each image that are achieved with the values of parameters σ and ζ reported in Table 1. For the majority of the images, the proposed operator achieves a better mcc value. In particular, out of the 500 images of the Berkeley data set, the proposed CORF+PP operator achieves better performance in 434, 377, 451, and 437 cases in comparison to the CORF-based operator without inhibition, GF+II, GEF+II and Canny edge detector, respectively.
On a statistical level, we apply a right-tailed paired-samples t-test to the pairs of mcc values that are achieved by the proposed CORF+PP operator and by each of the other four operators. The CORF+PP operator that we propose outperforms all four other operators (CORF, GF+II, GEF+II and Canny) with high statistical significance for both the RuG and the Berkeley data sets.
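The test statistic behind such a right-tailed paired-samples t-test can be sketched as follows. This is a minimal illustration and not the code used in the study; the function name `paired_t_statistic` and the score lists in the example are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired-samples t-test.

    scores_a and scores_b hold one score per image, in the same image order.
    A large positive t supports the right-tailed hypothesis that the mean of
    scores_a exceeds the mean of scores_b.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both operators must be scored on the same images")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two image pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the per-image differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-image mcc values for two operators (illustration only).
proposed = [0.50, 0.42, 0.61, 0.38, 0.55]
baseline = [0.45, 0.40, 0.52, 0.39, 0.47]
t_value, dof = paired_t_statistic(proposed, baseline)
print(t_value, dof)
```

The right-tailed p-value is the probability that a Student's t variable with `dof` degrees of freedom exceeds `t_value`. A statistics package can supply it directly, for example `scipy.stats.ttest_rel(proposed, baseline, alternative="greater")` in recent SciPy versions.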
In order to test the generalization ability of the above experimental method, we perform a 10-fold cross validation on the Berkeley data set. For each fold we consider nine different sets of 50 images, and for each operator we apply a grid search to determine the σ and ζ parameter values that yield the maximum average mcc score across the (9 × 50 =) 450 training images. It turns out that for all 10 folds and for each operator we obtain the same σ and ζ parameter values as those reported in Table 1 for the entire data set of 500 images. This result demonstrates the generalization ability of the applied experimental setup. Moreover, the fact that for the Berkeley data set we use the β and k parameter values that were determined from the RuG data set demonstrates the generalization ability of the proposed CORF detector with push-pull inhibition.
In an iterative procedure we perform a grid search over every possible combination of nine sets of images, such that in each iteration we leave a different set out of consideration. This procedure is performed for the five operators. For the 10 grid searches, the threshold parameters of the operators remain constant and match the ones reported in Table 1 for the whole data set. The scale parameter remains constant only for the proposed CORF detector with push-pull inhibition (σ = 2.2), GF+II (σ = 3.4) and GEF+II (σ = 2). For the basic CORF operator without inhibition the scale parameter is 3.6 for six grid searches and 3.8 for the remaining four. For the same six and four grid searches the scale parameter of the Canny operator is 2 and 2.2, respectively.
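The leave-one-set-out grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions: per-image mcc values are taken to be precomputed for every (σ, ζ) pair and stored in a hypothetical dictionary `scores`, the images are assumed to be split into consecutive sets of equal size, and the function names are made up for this example.

```python
from statistics import mean


def best_parameters(scores, image_ids):
    """Return the (sigma, zeta) pair with the highest mean score over image_ids.

    scores maps each (sigma, zeta) pair to a list of per-image scores.
    """
    return max(scores, key=lambda params: mean(scores[params][i] for i in image_ids))


def leave_one_set_out(scores, n_images, n_folds=10):
    """Run one grid search per fold, each time leaving one set of images out.

    Returns the list of selected (sigma, zeta) pairs, one per fold.
    """
    fold_size = n_images // n_folds
    folds = [list(range(f * fold_size, (f + 1) * fold_size)) for f in range(n_folds)]
    selected = []
    for held_out in range(n_folds):
        # Training images: all sets except the one that is left out.
        train_ids = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        selected.append(best_parameters(scores, train_ids))
    return selected


# Hypothetical scores for 20 images and two parameter pairs (illustration only).
toy_scores = {
    (2.2, 0.3): [0.5] * 20,
    (3.6, 0.2): [0.4] * 20,
}
print(leave_one_set_out(toy_scores, n_images=20, n_folds=10))
```

With 500 images and 10 folds, each grid search in this sketch would use 450 training images. Comparing the returned pairs across folds shows whether the selected parameters are stable.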
Discussion
In contrast to other computational models of simple cells, in particular the ones that rely on the Gabor function [11] and difference-of-Gaussians [15], [57]–[59], the proposed push-pull CORF model cell is anatomically more realistic as it uses as afferent inputs the responses of model LGN cells, rather than pixel intensities as projected on the retina.
In other studies we demonstrated that by using orientation-selective filters as afferent inputs we can form models that achieve qualitatively similar responses to shape-selective neurons in area V4, and showed that such models can be effectively used in various computer vision applications [60], [61].
The push-pull CORF model cell that we propose differs from the Gabor-based models with non-classical receptive field (nCRF) inhibition in two main aspects. First, the proposed model uses one model cell with opposite polarity to provide inhibition to the concerned model simple cell. Second, the receptive fields of the inhibitory neuron and simple cell models overlap each other. Depending on the parameter setting, the overlap is either complete, or the receptive field of the inhibitory model neuron expands in all directions from the center, resulting in a bigger receptive field than that of the simple cell but with the same center. In contrast, nCRF models receive inhibition as a function of the total responses of many model cells that lie outside (no overlap) the receptive field of the model cell at hand. This is also known as contextual modulation.
In previous work [19], it was shown that a CORF model without inhibition exhibits contrast invariant orientation tuning, cross orientation suppression and response saturation, three properties that are typical of simple cells. Here, we demonstrate that by adding push-pull inhibition we can extend the number of reproduced properties that are observed in real simple cells. These include the relationship between spatial frequency and orientation tuning, and spatial frequency selectivity that is sensitive to contrast. As a matter of fact, push-pull inhibition may be at the heart of an ongoing discussion in neurophysiology. A CORF model without inhibition exhibits orientation tuning that is independent of spatial frequency [51], but when we add push-pull inhibition the resulting model exhibits less separability between orientation tuning and spatial frequency. Similarly, by changing the strength of push-pull inhibition we can control the sensitivity of spatial frequency tuning to contrast.
We demonstrated by quantitative experiments that the addition of push-pull inhibition systematically improves the signal-to-noise ratio. This is the reason why a contour operator based on the proposed model outperforms the one without inhibition with high statistical significance. The highest improvement is achieved in images with a highly textured (noisy) background, such as the images shown in Fig. 10(a–c), Fig. 11a and Fig. 11c. For images that consist of only perceptually salient objects without noise, the two operators give the same result. The contour detection experiments also demonstrate that the proposed implementation of push-pull inhibition is more effective than Gabor-based models with nCRFs. Similarly, it outperforms the popular Canny edge detector.
The proposed model is conceptually simple and easy to implement. A push-pull response is computed as the response of a CORF model with preferred polarity minus a factor of the response of another CORF model with the same orientation but opposite polarity.
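The subtraction described above can be sketched in a few lines. This is a minimal illustration, assuming the two CORF response maps have already been computed and are given as nested lists of equal shape; the function and parameter names are hypothetical, and any further processing of the result (for instance rectification or thinning) is outside this sketch.

```python
def push_pull_response(preferred, opposite, inhibition_factor):
    """Combine two CORF response maps into a push-pull response map.

    preferred: responses of the CORF model with the preferred polarity.
    opposite: responses of the CORF model with the same orientation but
        opposite polarity.
    inhibition_factor: strength of the inhibition (hypothetical parameter name).
    """
    if len(preferred) != len(opposite) or any(
        len(row_p) != len(row_o) for row_p, row_o in zip(preferred, opposite)
    ):
        raise ValueError("both response maps must have the same shape")
    return [
        [p - inhibition_factor * o for p, o in zip(row_p, row_o)]
        for row_p, row_o in zip(preferred, opposite)
    ]


# Hypothetical 2x2 response maps (illustration only).
pref = [[1.0, 0.5], [0.0, 2.0]]
opp = [[0.5, 0.5], [1.0, 0.0]]
print(push_pull_response(pref, opp, 0.5))  # [[0.75, 0.25], [-0.5, 2.0]]
```

Locations where the opposite-polarity model responds strongly are suppressed, while locations where only the preferred-polarity model responds are left unchanged.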
Conclusions
Push-pull inhibition provides the ability to construct models of a wider range of real simple cells with various properties that cannot be reproduced by other computational models. Besides orientation selectivity, cross-orientation suppression, contrast-invariant orientation tuning and response saturation, the proposed method can be used to implement a model cell in which the relationships between spatial frequency selectivity, orientation tuning and contrast are controlled by the strength of push-pull inhibition.
In addition, a push-pull CORF model cell improves SNR substantially, and outperforms other brain-inspired (Gabor-based) contour operators and the classical Canny edge detector.
Acknowledgments
We would like to thank TNO (Netherlands Organization for Applied Scientific Research) for partially supporting the research hours of George Azzopardi.
Funding Statement
The authors would like to thank TNO (Netherlands Organization for Applied Scientific Research) for partially supporting the research hours of George Azzopardi. TNO had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Kandel E, Schwartz J, Jessell T (2000) Principles of neural science, 4th ed. McGraw-Hill.
- 2. Hubel D, Wiesel T (1959) Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology 148: 574–591.
- 3. Hubel D, Wiesel T (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160: 106–154.
- 4. Hubel D, Wiesel T (1968) Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195: 215–243.
- 5. Orban GA (2008) Higher order visual processing in macaque extrastriate cortex. Physiological Reviews 88: 59–89.
- 6. Krüger N, Janssen P, Kalkan S, Lappe M, Leonardis A, et al. (2013) Deep Hierarchies in the Primate Visual Cortex: What Can We Learn For Computer Vision? IEEE Transactions on Pattern Analysis and Machine Intelligence 35: 1847–1871.
- 7. Felleman D, Van Essen D (1991) Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1: 1–47.
- 8. Tanaka K (2003) Columns for complex visual object features in the inferotemporal cortex: clustering of cells with similar but slightly different stimulus selectivities. Cerebral Cortex 13: 90–99.
- 9. Gross CG (2008) Inferior temporal cortex. 3: 7294.
- 10. Fukushima K (1980) Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36: 193–202.
- 11. Daugman JG (1985) Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Optical Society of America, Journal, A: Optics and Image Science 2: 1160–1169.
- 12. Riesenhuber M, Poggio T (2000) Models of object recognition. Nature Neuroscience 3 Suppl: 1199–1204.
- 13. Craft E, Schütze H, Niebur E, von der Heydt R (2007) A neural model of figure–ground organization. Journal of Neurophysiology 97: 4310–4326.
- 14. Weidenbacher U, Neumann H (2009) Extraction of surface-related features in a recurrent model of V1-V2 interactions. PLoS ONE 4: e5909.
- 15. Rodríguez-Sánchez A, Tsotsos J (2012) The roles of endstopped and curvature tuned computations in a hierarchical representation of 2D shape. PLoS ONE 7: 1–13.
- 16. Mel BW, Ruderman DL, Archie KA (1998) Translation-invariant orientation tuning in visual complex cells could derive from intradendritic computations. The Journal of Neuroscience 18: 4325–4334.
- 17. Kato H, Bishop P, Orban G (1978) Hypercomplex and simple/complex cells classifications in cat striate cortex. Journal of Neurophysiology 1071–1095.
- 18. Morrone MC, Burr D, Maffei L (1982) Functional implications of cross-orientation inhibition of cortical visual cells. I. Neurophysiological evidence. Proceedings of the Royal Society of London Series B Biological Sciences 216: 335–354.
- 19. Azzopardi G, Petkov N (2012) A CORF computational model of a simple cell that relies on LGN input outperforms the Gabor function model. Biological Cybernetics 1–13.
- 20. Azzopardi G, Petkov N (2012) Contour detection by CORF operator. In: Villa AE, Duch W, Érdi P, Masulli F, Palm G, editors, Artificial Neural Networks and Machine Learning ICANN 2012, Lecture Notes in Computer Science, Springer Berlin Heidelberg, volume 7552, pp. 395–402.
- 21. Jin JZ, Wang YS, Swadlow HA, Alonso JM (2011) Population receptive fields of on and off thalamic inputs to an orientation column in visual cortex. Nature Neuroscience 14: 232–238.
- 22. Hubel D, Wiesel T (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology-London 160: 106–154.
- 23. Marr D, Hildreth E (1980) Theory of Edge Detection. Proceedings of the Royal Society of London Series B, Biological Sciences 207: 187–217.
- 24. Palmer L, Davis T (1981) Receptive-field structure in cat striate cortex. Journal of Neurophysiology 46: 260–276.
- 25. Heggelund P (1981) Receptive-field organization of simple cells in cat striate cortex. Experimental Brain Research 42: 89–98.
- 26. Ferster D (1988) Spatially opponent excitation and inhibition in simple cells of the cat visual cortex. The Journal of Neuroscience 8: 1172–1180.
- 27. Hirsch J, Alonso J, Reid R, Martinez L (1998) Synaptic integration in striate cortical simple cells. Journal of Neuroscience 18: 9517–9528.
- 28. Borg-Graham L, Monier C, Fregnac Y (1998) Visual input evokes transient and strong shunting inhibition in visual cortical neurons. Nature 393: 369–373.
- 29. Anderson J, Carandini M, Ferster D (2000) Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex. Journal of Neurophysiology 84: 909–926.
- 30. Rizzolatti G, Camarda R (1975) Inhibition of Visual Responses of Single Units in Cat Visual Area of Lateral Suprasylvian Gyrus (Clare-Bishop Area) by Introduction of a second visual stimulus. Brain Research 88: 357–361.
- 31. Nelson J, Frost B (1978) Orientation-selective inhibition from beyond classic visual receptive field. Brain Research 139: 359–365.
- 32. Knierim J, Van Essen D (1992) Neuronal responses to static texture patterns in area-V1 of the alert macaque monkey. Journal of Neurophysiology 67: 961–980.
- 33. Jones H, Grieve K, Wang W, Sillito A (2001) Surround suppression in primate V1. Journal of Neurophysiology 86: 2011–2028.
- 34. Petkov N, Westenberg MA (2003) Suppression of contour perception by band-limited noise and its relation to non-classical receptive field inhibition. Biological Cybernetics 88: 236–246.
- 35. Heggelund P (1986) Quantitative studies of enhancement and suppression zones in the receptive-field of simple cells in cat striate cortex. Journal of Physiology-London 373: 293–310.
- 36. Jones J, Palmer L (1987) The two-dimensional spatial structure of simple receptive-fields in cat striate cortex. Journal of Neurophysiology 58: 1187–1211.
- 37. Tolhurst D, Dean A (1987) Spatial summation by simple cells in the striate cortex of the cat. Experimental Brain Research 66: 607–620.
- 38. DeAngelis G, Ohzawa I, Freeman R (1995) Receptive-field dynamics in the central visual pathways. Trends in Neuroscience 18: 451–458.
- 39. Hirsch J (2003) Synaptic physiology and receptive field structure in the early visual pathway of the cat. Cerebral Cortex 13: 63–69.
- 40. Hirsch J, Martinez L (2006) Circuits that build visual cortical receptive fields. Trends in Neuroscience 29: 30–39.
- 41. Hirsch JA, Alonso JM, Reid RC, Martinez LM (1998) Synaptic integration in striate cortical simple cells. The Journal of Neuroscience 18: 9517–9528.
- 42. Borg-Graham LJ, Monier C, Fregnac Y (1998) Visual input evokes transient and strong shunting inhibition in visual cortical neurons. Nature 393: 369–373.
- 43. Ferster D, Miller KD (2000) Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience 23: 441–471.
- 44. Vidyasagar T, Sigüenza J (1985) Relationship between orientation tuning and spatial frequency in neurones of cat area 17. Experimental Brain Research 57: 628–631.
- 45. Webster MA, De Valois RL (1985) Relationship between spatial-frequency and orientation tuning of striate-cortex cells. JOSA A 2: 1124–1132.
- 46. Casagrande V, Norton T (1991) The lateral geniculate nucleus: A review of its physiology and function, volume 4. MacMillan Press, London, 41–84 pp.
- 47. Martinez L, Wang Q, Reid R, Pillai C, Alonso J, et al. (2005) Receptive field structure varies with layer in the primary visual cortex. Nature Neuroscience 8: 372–379.
- 48. Liu Bh, Li Yt, Ma Wp, Pan Cj, Zhang LI, et al. (2011) Broad Inhibition Sharpens Orientation Selectivity by Expanding Input Dynamic Range in Mouse Simple Cells. Neuron 71: 542–554.
- 49. Li Yt, Ma Wp, Li Ly, Ibrahim LA, Wang Sz, et al. (2012) Broadening of Inhibitory Tuning Underlies Contrast-Dependent Sharpening of Orientation Selectivity in Mouse Visual Cortex. Journal of Neuroscience 32: 16466–16477.
- 50. Petkov N, Westenberg M (2003) Suppression of contour perception by band-limited noise and its relation to nonclassical receptive field inhibition. Biological Cybernetics 88: 236–246.
- 51. Mazer JA, Vinje WE, McDermott J, Schiller PH, Gallant JL (2002) Spatial frequency and orientation tuning dynamics in area V1. Proceedings of the National Academy of Sciences 99: 1645–1650.
- 52. Sceniak MP, Hawken MJ, Shapley R (2002) Contrast-dependent changes in spatial frequency tuning of macaque V1 neurons: effects of a changing receptive field size. Journal of Neurophysiology 88: 1363–1373.
- 53. Grigorescu C, Petkov N, Westenberg M (2003) Contour detection based on nonclassical receptive field inhibition. IEEE Transactions on Image Processing 12: 729–739.
- 54. Canny J (1986) A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8: 679–698.
- 55. Sonka M, Hlavac V, Boyle R (1999) Image processing, analysis, and machine vision. Pacific Grove, CA: Brooks/Cole.
- 56. Petkov N (1995) Biologically motivated computationally intensive approaches to image pattern-recognition. Future Generation Computer Systems 11: 451–465.
- 57. Hawken M, Parker A (1987) Spatial properties of neurons in the monkey striate cortex. Proceedings of the Royal Society of London, series B, Biological Sciences 231: 251–288.
- 58. Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Object recognition with cortex-like mechanism. IEEE Transactions on Pattern Analysis and Machine Intelligence 29: 411–426.
- 59. Rodríguez-Sánchez A, Tsotsos J (2011) The importance of intermediate representations for the modeling of 2D shape detection: Endstopping and curvature tuned computations. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 4321–4326.
- 60. Azzopardi G, Petkov N (2013a) Trainable COSFIRE Filters for Keypoint Detection and Pattern Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 35: 490–503.
- 61. Azzopardi G, Petkov N (2013b) Automatic detection of vascular bifurcations in segmented retinal images using trainable COSFIRE filters. Pattern Recognition Letters 34: 922–933.