Serre et al. 10.1073/pnas.0700622104.

Supporting Information

Files in this Data Supplement:

SI Text
SI Table 1
SI Figure 4
SI Figure 5
SI Figure 6
SI Figure 7
SI Figure 8
SI Figure 9
SI Table 2




SI Figure 4

Fig. 4. The role of units in different areas. Comparison between different layers of the model on the animal- vs. nonanimal-categorization task. The poor performance of the model after lesioning V4 is likely due to the resulting loss of invariance to position and scale. The "bypass route only" condition corresponds to an implementation of the model in which V4 was lesioned; the "direct route" condition corresponds to an implementation in which the route from V2 to the posterior inferotemporal cortex that bypasses V4 was lesioned. Performance for all model implementations was estimated over n = 10 random splits.
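
The caption reports performance over n = 10 random splits. A minimal sketch of that evaluation protocol follows, assuming a 50/50 train/test split and using a hypothetical nearest-class-mean rule on toy 1-D features as a stand-in for the classifier applied to model-unit responses. The function names, split fraction, and seed are illustrative choices, not the authors' setup.

```python
import random
import statistics


def nearest_mean_accuracy(train, test):
    """Fit a nearest-class-mean rule on (feature, label) pairs; return test accuracy.

    Hypothetical stand-in for the classifier trained on model-unit responses.
    """
    labels = {y for _, y in train}
    means = {lab: statistics.mean(x for x, y in train if y == lab) for lab in labels}
    correct = 0
    for x, y in test:
        predicted = min(means, key=lambda lab: abs(x - means[lab]))
        correct += predicted == y
    return correct / len(test)


def evaluate_over_splits(data, n_splits=10, train_fraction=0.5, seed=0):
    """Return (mean accuracy, standard error) over n_splits random partitions.

    n_splits must be at least 2 so that the standard deviation is defined.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        scores.append(nearest_mean_accuracy(shuffled[:cut], shuffled[cut:]))
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, sem
```

Each lesioned variant of the model would be scored with the same protocol, so differences between variants are not driven by a single lucky split.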





SI Figure 5

Fig. 5. The effect of image orientation. Comparison of performance (d') between human observers (Left, n = 14) and the model (Right) in three experimental conditions: upright, rotated by 90°, and inverted (rotated by 180°). Human observers and the model are similarly robust to image rotations.
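
Performance here is expressed as d'. The caption does not say how d' was computed; the sketch below assumes the standard signal-detection definition, d' = z(H) - z(F), where H is the hit rate, F the false-alarm rate, and z the inverse of the standard normal CDF. As an illustration it uses the model's 82% hits and 18% false alarms quoted in Fig. 6.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(F), z = inverse standard-normal CDF.

    Rates must lie strictly between 0 and 1: NormalDist.inv_cdf raises
    StatisticsError at 0 or 1, so perfect scores need a correction first.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


print(round(d_prime(0.82, 0.18), 2))  # prints 1.83
```

Because d' separates sensitivity from response bias, it lets the human and model panels be compared even if the two differ in how readily they answer "animal".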





SI Figure 6

Fig. 6. A comparison between the model and human observers (hit rates) under different mask conditions. The upper and lower bounds on human-level performance (n = 21) are given by the no-mask and the immediate-mask conditions, respectively. The average accuracy (percent correct) of human observers in the 20-ms SOA, 50-ms SOA, 80-ms SOA, and no-mask conditions was 59%, 79%, 86%, and 91%, respectively (all significantly above chance; t test, P < 0.01), compared with 82% for the model (18% false alarms). The model matches human observers for SOAs between 50 ms and 80 ms. Error bars indicate the standard error; they are not directly comparable between the model (computed over n = 20 random runs) and humans (computed over n = 21 observers).





SI Figure 7

Fig. 7. An estimate of the timing of feedback loops in the ventral stream of primates (based on refs. 47 and 48). We assume that the typical latency from one stage to the next is ≈10-20 ms and that feedforward and back projections have similar conduction times (45). The first number corresponds to latencies for monkeys and is assumed to constitute a lower bound on the latencies for humans. The second number adds 50% to the first and is assumed to constitute a "typical" value for humans (S. Thorpe, personal communication).
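
The two assumptions in the caption (≈10-20 ms per stage, equal feedforward and feedback conduction times) reduce to simple arithmetic. The sketch below assumes a loop travels n_stages hops up the hierarchy and the same number of hops back; the function names are hypothetical, and the figure's actual per-loop values are not reproduced here.

```python
def feedback_loop_latency_ms(n_stages, per_stage_ms):
    """Round-trip time for a loop spanning n_stages hops forward and back.

    Assumes feedforward and feedback projections have equal conduction times.
    """
    return 2 * n_stages * per_stage_ms


def human_estimate_ms(monkey_ms):
    """'Typical' human value: the monkey lower bound plus 50%."""
    return 1.5 * monkey_ms


# Loop between two adjacent stages, at both ends of the 10-20 ms range.
for per_stage in (10, 20):
    monkey = feedback_loop_latency_ms(1, per_stage)
    print(per_stage, monkey, human_estimate_ms(monkey))
# prints:
# 10 20 30.0
# 20 40 60.0
```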





SI Figure 8

Fig. 8. A close-up view of the model from the S1 to the C2 stage. The input image (gray value) is first analyzed by an array of functionally organized S1 units at all locations and several scales. At the next C1 stage, a local max-pooling operation is taken over retinotopically organized S1 units with the same preferred orientation and at neighboring positions and scales, increasing invariance to 2D transformations. S2 units then combine, through a tuning operation, the responses of several C1 units with different preferred orientations, which increases the complexity of the optimal stimulus; S2 units are selective for features of moderate complexity (53) (examples shown in yellow). Although only one type of S2 unit is shown, by considering different combinations (learned from natural images) of C1 units in the model, we obtain K_S2 ≈ 2,000 different types of S2 units. S2 units are also organized in feature maps such that every location in the visual field is analyzed by all K_S2 types of S2 units at different scales. A local max-pooling operation is performed over S2 units with the same selectivity over neighboring positions and scales to yield the C2 unit responses. C2 units have been shown to match well the tuning and invariance properties of cells in V4 (see ref. 4, pp. 28-36) in response to different stimulus sets (17-19).
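
The alternation of max-pooling (C stages) and tuning (S stages) described above can be sketched in a few functions. This is a toy illustration under stated assumptions, not the model's implementation: responses are nested lists indexed as maps[scale][feature][y][x], all scales share one spatial size, pooling windows do not overlap, the caption's "tuning operation" is taken to be Gaussian tuning with a hypothetical width sigma, and the S2 prototype is passed in by hand rather than learned from natural images.

```python
import math


def local_max_pool(maps, pool_size, scale_band=2):
    """C stage: max over pool_size x pool_size positions and scale_band
    adjacent scales, separately for each feature type (e.g. orientation)."""
    n_scales, n_feat = len(maps), len(maps[0])
    height, width = len(maps[0][0]), len(maps[0][0][0])
    pooled = []
    for s0 in range(0, n_scales - scale_band + 1, scale_band):
        band_out = []
        for f in range(n_feat):
            rows = []
            for y0 in range(0, height - pool_size + 1, pool_size):
                rows.append([
                    max(maps[s][f][y][x]
                        for s in range(s0, s0 + scale_band)
                        for y in range(y0, y0 + pool_size)
                        for x in range(x0, x0 + pool_size))
                    for x0 in range(0, width - pool_size + 1, pool_size)])
            band_out.append(rows)
        pooled.append(band_out)
    return pooled


def s2_tuning(c1, prototype, patch_size, sigma=1.0):
    """S stage: Gaussian tuning of a patch of C1 responses (all orientations
    at once) to a stored prototype. Output is s2[scale][0][y][x], a single
    S2 type, so local_max_pool can be reused for the C2 stage."""
    out = []
    for scale_maps in c1:
        n_ori = len(scale_maps)
        if len(prototype) != n_ori * patch_size ** 2:
            raise ValueError("prototype length must be n_ori * patch_size**2")
        height, width = len(scale_maps[0]), len(scale_maps[0][0])
        rows = []
        for y0 in range(height - patch_size + 1):
            row = []
            for x0 in range(width - patch_size + 1):
                patch = [scale_maps[o][y][x]
                         for o in range(n_ori)
                         for y in range(y0, y0 + patch_size)
                         for x in range(x0, x0 + patch_size)]
                dist2 = sum((p - w) ** 2 for p, w in zip(patch, prototype))
                row.append(math.exp(-dist2 / (2 * sigma ** 2)))
            rows.append(row)
        out.append([rows])
    return out
```

Chaining local_max_pool (S1 to C1), s2_tuning (C1 to S2), and local_max_pool again (S2 to C2) reproduces the S1 → C1 → S2 → C2 sequence for one S2 type; the full model repeats the S2 step for each of the K_S2 prototypes.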





SI Figure 9

Fig. 9. The population of S1 units (corresponding to simple cells in primary visual cortex): 2 phases × 4 orientations × 17 sizes (or, equivalently, peak frequencies). Only units at one phase are shown, but the population also includes filters of the opposite phase. Receptive field sizes range between 0.2° and 1.1° (typical values for cortex range between ≈0.1° and 1°; see refs. 10 and 11). Peak frequencies are in the range between 1.6 and 9.8 cycles/deg.
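
A 2 × 4 × 17 S1 population can be sketched as a bank of Gabor filters, a common model of simple-cell receptive fields. The caption fixes only the counts; everything else below is a placeholder assumption: sizes of 7 to 39 pixels in steps of 2 (17 sizes), sigma = 0.4 × size, wavelength = sigma / 0.8, aspect ratio gamma = 0.3, phases 0 and π for the opposite-phase pair, and zero-mean, unit-norm normalization. With a hypothetical scale of 35 pixels per degree, 7 and 39 pixels correspond to 0.2° and about 1.1°.

```python
import math


def gabor(size, theta, phase, sigma, wavelength, gamma=0.3):
    """Return a size x size Gabor filter (list of rows), zero mean, unit L2 norm."""
    half = size // 2
    values = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier varies along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + phase))
        values.append(row)
    flat = [v for row in values for v in row]
    mean = sum(flat) / len(flat)
    norm = math.sqrt(sum((v - mean) ** 2 for v in flat))
    return [[(v - mean) / norm for v in row] for row in values]


def s1_bank():
    """2 phases x 4 orientations x 17 sizes = 136 filters keyed by (size, theta, phase)."""
    bank = {}
    sizes = range(7, 40, 2)                             # 7, 9, ..., 39 px (assumed)
    orientations = [k * math.pi / 4 for k in range(4)]  # 0, 45, 90, 135 degrees
    phases = [0.0, math.pi]                             # a filter and its opposite-phase twin
    for size in sizes:
        sigma = 0.4 * size          # placeholder value
        wavelength = sigma / 0.8    # placeholder value
        for theta in orientations:
            for phase in phases:
                bank[(size, theta, phase)] = gabor(size, theta, phase, sigma, wavelength)
    return bank
```

Shifting the phase by π simply negates the filter, which is why the figure can show one phase and imply the other.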