Author manuscript; available in PMC: 2015 Oct 1.
Published in final edited form as: Neuron. 2014 Sep 18;84(1):55–62. doi: 10.1016/j.neuron.2014.08.043

A channel for 3D environmental shape in anterior inferotemporal cortex

Siavash Vaziri 1,2, Eric T Carlson 1,2, Zhihong Wang 2, Charles E Connor 2,3
PMCID: PMC4247160  NIHMSID: NIHMS624610  PMID: 25242216

SUMMARY

Inferotemporal cortex (IT) has long been studied as a single pathway dedicated to object vision, but connectivity analysis reveals anatomically distinct channels, through ventral superior temporal sulcus (STSv) and dorsal/ventral inferotemporal gyrus (TEd, TEv). Here, we report a major functional distinction between channels. We studied individual IT neurons in monkeys viewing stereoscopic 3D images projected on a large screen. We used adaptive stimuli to explore neural tuning for 3D abstract shapes ranging in scale and topology from small, closed, bounded objects to large, open, unbounded environments (landscape-like surfaces and cave-like interiors). In STSv, most neurons were more responsive to objects, as expected. In TEd, surprisingly, most neurons were more responsive to 3D environmental shape. Previous studies have localized environmental information to posterior cortical modules. Our results show it is also channeled through anterior IT, where extensive cross-connections between STSv and TEd could integrate object and environmental shape information.

INTRODUCTION

“Inferotemporal cortex” (IT) is a general designation for the final stages in the ventral, “what” pathway of monkey and human visual cortex, which progresses anteriorly through the inferior temporal lobe (Ungerleider and Mishkin, 1982; Mishkin et al., 1983). A recent meta-analysis of anatomical evidence concluded that monkey IT actually comprises three channels, with very different connectivities to prefrontal cortex, basal ganglia, and medial temporal lobe structures (Kravitz et al., 2012). Here, we discovered a major functional distinction between two of these channels, STSv and TEd.

As in our recent studies of 3D object shape coding (Yamane et al., 2008; Hung et al., 2012), we used an adaptive stimulus algorithm to localize and densely sample sparse IT tuning functions within the virtually infinite domain of 3D shape. The algorithm was initialized with random, abstract shapes, which evolved through multiple generations, guided by differential neural responses to stimuli. Stimuli were segregated into two independent lineages, which could then be compared to evaluate repeatability and significance of results. This is an efficient search method for characterizing pure shape coding, free from semantic confounds and statistical biases in photographic stimuli.

Here, for the first time, we tested stimuli ranging continuously from small object shapes to large environmental shapes. “Object” stimuli were topologically spheroidal—closed (i.e. like a ball rather than a bowl) and bounded (completely contained) within the field of view. “Environmental” stimuli were topologically planar—open and extending beyond the field of view, like landscapes. Overall surface curvature of some environments, like small caves, could imply closure behind the viewer. Since the goal was to contrast responses to objects and environments, the adaptive method was critical for a fair comparison between optimized object stimuli and optimized environmental stimuli. A random selection of objects and environments could easily fail to sample the maximum response range in the dominant category. In addition, the parallel, independent lineages were essential for testing consistency of any response differences.

While previous IT research has focused on objects, we found that the large majority of neurons in the TEd channel were more responsive to environmental shape. These responses were critically dependent on 3D shape cues, especially texture flows and shading. In contrast to TEd, the majority of neurons in the STSv channel were more responsive to object shape. This object/environment dichotomy is a functional counterpart to differences in anatomical connectivity with other brain regions. In particular, object-sensitive STSv, unlike TEd, provides major inputs to ventrolateral prefrontal cortex (VLPFC) regions involved in object working memory and orbitofrontal cortex (OFC) regions that process object value (Kravitz et al., 2012).

The response bias in TEd shows that environmental shape information, which has been localized previously to more posterior and dorsal cortical modules (Epstein and Kanwisher, 1998; Epstein et al., 1999; Park and Chun, 2009; Ward et al., 2010; Nasr et al. 2011; Kornblith et al. 2013; Kravitz et al., 2011b), is also propagated through the final stages of the ventral pathway. As with face and color information (Tsao et al. 2003; Lafer-Sousa and Conway, 2013), environmental shape information remains at least partially segregated at these advanced processing stages. (Our anatomical sampling range largely spared regions where face and color modules have been identified.) However, the strong cross-connectivity between STSv and TEd would support integration of object information into environmental contexts.

RESULTS

We studied 76 TEd neurons and 65 STSv neurons in 2 monkeys performing a passive fixation task. We measured responses to abstract 3D shapes, projected stereoscopically on a screen subtending 77° of visual angle horizontally and 61° vertically. Position and structure in depth were conveyed by binocular disparity (stereopsis), shading (consistent with a light source from the viewing direction at infinite distance), and surface texture (hexagonal grids applied to the stimulus surface). 3D stimulus shapes were generated by randomly distorting an initially spherical NURBS (non-uniform rational b-spline) surface. Stimuli varied in scale and topology (Fig. 1A) from small, closed objects subtending a few degrees of visual angle to large, open surfaces resembling landscapes or cave interiors (these were the visible portions of large stimuli extending beyond the screen boundaries).

Figure 1.


Stimuli and example results. (A) Example stimulus shape presented at six scales. Inset values specify the maximum visual angle subtended by the stimulus. The fixation point (red dot) remains at the same point on the stimulus surface. As scale increases, the stimulus extends beyond the screen borders, and the visible portion becomes landscape-like. (B) Example morphing transformations. From left to right: surface distortion removal, x-axis rotation, translation/rotation of latitudinal ring, change in surface distortion height function, y-axis rotation. (C) High-response stimuli for example TEd neurons. The coronal MRI image shows the approximate extent of TEd (cyan tint) and recording location of example neurons (colored dots). For each neuron, high-response stimuli from independent lineages (left and right columns) are indicated by arrows of the corresponding color. The border color around each stimulus indicates response rate (averaged across the 750 ms presentation period and 5 repetitions) based on the scale bars at right, which index the minimum-to-maximum response range of each neuron in sp/s. As in our previous studies of IT shape coding (Yamane et al., 2008; Hung et al. 2012), we observed that high-response stimuli differed in global shape but shared partial structure, consistent with fragment-based ensemble coding of shape. In these previous studies, we used multidimensional parameterization of surface and medial axis shape to construct linear/nonlinear models of 3D shape fragment tuning validated by correspondence across lineages. Similar analyses can be applied to these data, but will require extensive presentation in a separate report. Precise characterization of shape tuning is not critical for our purpose here, which is to analyze overall selectivity for stimulus scale. The essential cross-validation here is between scale-tuning functions in separate lineages. (D) Scale-tuning functions for example TEd neurons. 
Average response (+/− SEM) across all stimuli as a function of stimulus scale is plotted for each example (indexed by color) in each lineage (left vs. right). The correlations between scale-tuning functions across lineages are given by the r values in (C). These correlations were each highly significant (p < 0.001). Baseline activity +/− SEM for each example neuron (colored points next to vertical axis) was averaged across randomly scheduled null stimulus presentation periods during the fixation task. (E) High-response stimuli for example STSv neurons. Details as in (C). (F) Scale-tuning functions for example STSv neurons. Details as in (D). See also Fig. S1.

We studied each neuron with two independently evolving stimulus lineages in order to verify repeatability of results. The first generation of each lineage comprised 40 random 3D shapes. Successive stimulus generations included “descendant” stimuli that were partially morphed versions of “ancestor” stimuli from previous generations. The “fitness” of an ancestor stimulus—its probability of producing one or more descendants in the current generation—was defined by its neural response level, averaged over 5 presentations. Descendants were created by morphing the shape, position, orientation, and/or scale of the ancestor (Fig. 1B). Scale changed probabilistically, but the distribution of scales within each generation was constrained for equal sampling of object-sized stimuli (subtending 4–22° of visual angle), environmental stimuli (subtending 80–142°, i.e. greater than screen width), and intermediate stimuli (subtending 22–80°). Neurons were typically studied with 5–8 stimulus generations, for a total of 400–640 stimuli.
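The scale ranges and stimulus counts described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and assigning the boundary values (exactly 22° and exactly 80°) to the intermediate class is an assumption based on the ">80°" and "<22°" criteria used in the analyses.

```python
STIMULI_PER_LINEAGE = 40  # stimuli per lineage in each generation (from the text)
LINEAGES = 2              # independent lineages per neuron (from the text)


def scale_class(diameter_deg: float) -> str:
    """Classify a stimulus by the visual angle it subtends, in degrees."""
    if not 4.0 <= diameter_deg <= 142.0:
        raise ValueError("outside the 4-142 degree range sampled in the study")
    if diameter_deg < 22.0:
        return "object"
    if diameter_deg > 80.0:
        return "environment"
    # Assumption: boundary values fall in the intermediate class.
    return "intermediate"


def total_stimuli(generations: int) -> int:
    """Number of stimuli shown to one neuron over a given number of generations."""
    return generations * STIMULI_PER_LINEAGE * LINEAGES


print(scale_class(10.0), scale_class(50.0), scale_class(120.0))
print(total_stimuli(5), total_stimuli(8))
```

With 5 to 8 generations, the count reproduces the 400 to 640 stimuli per neuron stated above.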

TEd neurons were typically selective for large structures extending past the screen boundary (i.e. subtending >80° of visual angle), as illustrated by the examples in Figs. 1C and D (see also Fig. S1A). Response rates (averaged across the 750 ms presentation period) of these example neurons were maximal in the environmental scale range and fell to background levels for object-scale stimuli. Scale-tuning functions were strongly correlated across lineages (r values in Fig. 1C), demonstrating consistent evolution in the adaptive stimulus algorithm. The response bias of each neuron toward environmental stimuli (subtending >80°) over object stimuli (subtending <22°) was highly significant (p < 0.0001, Wilcoxon rank-sum test).
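The cross-lineage comparison of scale-tuning functions amounts to a Pearson correlation between two response-versus-scale curves. The sketch below shows that computation under stated assumptions: the six-point binning and the response values are hypothetical, since the binning of the scale axis is not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean responses (sp/s) at six increasing scales, one list per lineage.
lineage_1 = [2.0, 3.0, 4.0, 9.0, 18.0, 25.0]
lineage_2 = [1.0, 2.5, 5.0, 8.0, 20.0, 23.0]
r = pearson_r(lineage_1, lineage_2)
print(f"cross-lineage r = {r:.3f}")
```

A high r indicates that the two independently evolved lineages converged on the same scale preference.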

In contrast, STSv neurons were typically selective for small, object-like stimuli with closed, bounded surfaces, as illustrated by the examples in Figs. 1E and F (see also Fig. S1B). Response rates of these example neurons were maximal in the object scale range and fell to background levels for environmental stimuli. Scale-tuning functions were strongly correlated across lineages (Fig. 1F), and the response bias of each neuron toward object stimuli was highly significant (p < 0.0001, Wilcoxon rank-sum test). Responsiveness in STSv to 3D, object-like stimuli, defined in part by their surface curvatures, is consistent with previous studies (Janssen et al., 2000; Yamane et al., 2008).

Differential processing of object vs. environmental shape between the two channels was strong and significant at the population level (Fig. 2). The scale-tuning function for each neuron in our sample is depicted by a colored strip (Fig. 2A), in which color represents the stimulus range (red = objects, green = environments) and brightness represents average response strength at each point along the stimulus continuum. Neurons are ordered according to recording location around the circumference of IT in the coronal plane. Thus, from left to right, recording locations progress laterally across ventral bank STS, from fundus to lip, and then ventro-medially across the surface of the gyrus (see MRI images). In the anterior-posterior direction, these recording locations spanned a range from +8 to +22 mm (anterior to the interaural line), mostly between the expected locations of the middle and anterior face and color patches. Only two recording sites fell within the expected location of the AL face patch (Fig. S2A) (Freiwald & Tsao, 2010). There were no significant anterior-posterior or medio-lateral trends in scale tuning in either STS or TEd (Fig. S2A caption).

Figure 2.


Population results. (A) The scale-tuning function of each neuron (n = 141) is plotted as a vertical strip. Color represents scale, as in Fig. 1D,F, ranging from small objects (red) to large environments (green). Brightness at each point along the vertical strip is proportional to the average normalized response strength across all stimuli at the corresponding scale in the upper half of the neuron’s response range. Neurons are arranged along the horizontal axis based on their positions in the coronal plane along the STSv and TEd channels (see MRI images and arrows). (B) For each neuron, we plot the result of a Wilcoxon rank-sum test based on the 10 highest response environmental stimuli (subtending > 80°) as compared to the 10 highest response objects (subtending < 22°). The rank-sum value can vary from 55 (if the environmental stimuli occupy all the lower ranks, 1–10) to 155 (if the environmental stimuli occupy all the higher ranks, 11–20). Dashed lines indicate significance thresholds (p < 0.05, two-tailed). Significant neurons are plotted in red (object) or green (environment). Neurons are ordered along the horizontal axis as in (A). (C) Distributions of Pearson correlations between scale-tuning functions in the two independent lineages for each neuron. Filled bars represent significant correlations (p < 0.01). (D) Average normalized response levels in STSv (red) and TEd (green) as a function of stimulus diameter. The shaded regions represent the 95% confidence interval of average response values. See also Fig. S2.

There was, however, a major difference between channels (Fig. 2B). The majority of STSv neurons (49/65; 75%) were significantly (p < 0.05, Wilcoxon rank-sum test) more responsive to object stimuli (red dots, Fig. 2B). Only 8 STSv neurons were significantly more responsive to environmental stimuli (green dots). Conversely, the majority of TEd neurons (50/76; 66%) were significantly (p < 0.05, Wilcoxon rank-sum test) more responsive to environmental stimuli (green dots). Only 16 TEd neurons were significantly more responsive to object stimuli (red dots). In both channels, scale-tuning functions were highly correlated across independent stimulus lineages (Fig. 2C). This correlation was significant (p < 0.01) for 60/65 STSv neurons and 71/76 TEd neurons. The average normalized response across STSv neurons (Fig. 2D, red) was highest in the small object range and declined gradually to a minimum at stimulus diameters near 70°. In contrast, the average normalized response across TEd neurons (Fig. 2D, green) was low up to 70°, then climbed abruptly, at precisely the point at which stimuli became large enough to exceed the viewing frame, thus becoming topologically open and unbounded, i.e. environmental. Neurons in both channels were highly selective for stimulus shape within these different ranges (Figs. S1C,D). We conclude that the STSv channel was predominantly sensitive to object shape, as expected; the TEd channel contained some object-responsive neurons, but was predominantly sensitive to environmental shape.
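The rank-sum statistic underlying this comparison (10 highest-response environmental stimuli versus 10 highest-response objects, Fig. 2B) can be illustrated as follows. This is a sketch of the statistic only, with hypothetical response values; it uses average ranks for ties, which is the conventional choice but is an assumption here, and it does not compute the significance thresholds.

```python
def rank_sum(env_responses, obj_responses):
    """Sum of the ranks held by environmental responses in the pooled sample.

    Tied values share the average of the ranks they span.
    """
    pooled = [(r, "env") for r in env_responses] + [(r, "obj") for r in obj_responses]
    pooled.sort(key=lambda pair: pair[0])
    total = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Items i..j-1 occupy 1-based ranks i+1..j; their average is (i + 1 + j) / 2.
        average_rank = (i + 1 + j) / 2.0
        n_env = sum(1 for k in range(i, j) if pooled[k][1] == "env")
        total += average_rank * n_env
        i = j
    return total


# Hypothetical extremes: every environmental response above every object response,
# and the reverse.
low_group = [float(v) for v in range(1, 11)]
high_group = [float(v) for v in range(11, 21)]
print(rank_sum(high_group, low_group))  # environment-preferring extreme
print(rank_sum(low_group, high_group))  # object-preferring extreme
```

The two extremes correspond to the bounds of 155 and 55 given in the Fig. 2B caption.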

Since the prevalence of environmental shape processing in TEd was unexpected, we performed a number of additional tests (when recording time permitted) to confirm that TEd neurons were responding to 3D environmental shape rather than some associated, lower-level stimulus factor. We selected a high-response 3D environmental stimulus from the adaptive search and tested how responses to this stimulus depended on multiple factors. First, we compared responses to equivalent 2D stimuli, in which 3D structure was destroyed by eliminating binocular disparities, shading, and texture flows (alternatively by flattening the hexagonal grid onto the plane of the projection screen, by randomizing the orientations of the constituent texture lines, or by removing texture entirely). Destroying 3D structure in this way largely abolished responses in most cases (Fig. 3A; similar results for STSv neurons are shown in Fig. S3A). Thus, these neurons were specifically responsive to shape-in-depth, which is especially definitive for environments. This shows that responses did not depend simply on the presence of a large 2D shape or texture. It also shows that responses did not depend simply on stimulation of more peripheral parts of the visual field. In other words, large receptive fields, while certainly necessary, are not sufficient to explain selective responses to environmental stimuli. This was further confirmed by testing responses to high-response object stimuli at peripheral locations (Fig. S3B).

Figure 3.


Control tests on TEd neurons with significant selectivity for environmental stimuli. Example results are presented for a single TEd neuron (cyan dot in population plots). (A) Sensitivity to 3D shape-in-depth vs. 2D shape. One high-response and one low-response environmental stimulus were selected from the adaptation experiment. Modulation strength is the response difference divided by the maximum. 3D modulation strength (x-axis) is based on the original stimuli with all depth cues. 2D modulation strength (y-axis) was based on stimuli with no disparity cues, no shading, and either fronto-parallel hexagonal texture, random line texture, or no texture (silhouettes), whichever produced the highest modulation value. Stimuli for the example neuron are shown next to the corresponding axes, with borders indicating response rate (see scale bar). Removing cues for shape-in-depth largely abolished differential responses. The average 3D modulation strength of 0.86 was significantly greater than the average 2D modulation strength of −0.041 (paired t-test, p < 0.0001). (B) Responses do not depend on texture density. Responses to the original texture density (medium) are plotted against the vertical scale. Responses remained consistent when texture density decreased (low) or increased (high). (C) Scale-tuning test. The high-response environmental stimulus for the example neuron was presented at 6 scales, ranging from object to environmental. (The original stimulus from the adaptive experiment was the largest scale, at right.) Preference for environmental-scale stimuli was maintained even for this optimal shape. Similar consistency of scale tuning for optimal shapes was observed for other neurons in TEd (Fig. S3C) and STSv (Fig. S3D). (D) Consistency of response across lighting directions. Responses to the original lighting direction (from the direction of the viewer, at infinite distance) are plotted against the vertical scale. 
Responses remained consistent for spotlights positioned to the right, left, bottom, or top, which produced substantially different image contrast patterns. See also Fig. S3.
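The modulation strength defined in (A) can be written out explicitly. The sketch below assumes that "the maximum" means the larger of the two responses being compared; that reading, the function name, and the example firing rates are all assumptions for illustration.

```python
def modulation_strength(high_stim_response, low_stim_response):
    """Response difference between the two selected stimuli, divided by the larger response.

    Arguments are mean firing rates (sp/s) for the stimuli chosen as high-response
    and low-response in the adaptive experiment. The value can be negative when the
    nominally low-response stimulus evokes the larger response.
    """
    peak = max(high_stim_response, low_stim_response)
    if peak <= 0:
        raise ValueError("modulation strength is undefined when both responses are zero")
    return (high_stim_response - low_stim_response) / peak


# Hypothetical rates: strong selectivity with all depth cues, none with cues removed.
print(modulation_strength(30.0, 4.0))  # 3D condition
print(modulation_strength(5.0, 5.2))   # 2D condition, slightly negative
```

Under this definition a value near 1 means the low-response stimulus evoked almost no response relative to the high-response stimulus, and a value near 0 means the two were indistinguishable.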

We further tested dependence on 3D cues by presenting all combinations of binocular disparity, shading, and texture flows (Fig. S3C). Texture flow was the most significant 3D cue for environmental stimuli (F(1,21) = 16.8; p < 0.002, 3-way repeated measures ANOVA), followed by shading (F(1,21) = 10.4; p < 0.005). Stereoscopic cues did not have a significant effect on environmental shape responses (F(1,21) = 1.8; p = 0.194), which makes sense because binocular disparity differences become negligible at large distances and thus do not provide precise information about large-scale shape-in-depth. Texture flows provide strong cues for 3D shape independent of stereoscopic depth (Ben-Shahar and Zucker, 2001, 2003; Li and Zaidi, 2000, 2004).

The low responses to 2D stimuli that preserved but distorted textures help address the concern that neurons might simply respond to higher spatial frequencies in environmental stimulus textures (Rajimehr et al., 2011). To further rule out this concern, we tested responses when the spatial frequency of the texture in the 3D stimuli was decreased or increased by a factor of 2 (Fig. 3B). These manipulations did not significantly affect responses (F(2,32) = 0.16; p = 0.85), showing that the neurons were not merely sensitive to a specific spatial frequency range.

Another concern could be that IT responses are known to be somewhat size-invariant (Ito et al., 1995; Brincat and Connor, 2004). Given this, high response shapes might by chance evolve at a particular scale, but might evoke equally strong responses at a very different scale. This concern is addressed by the strong and in most cases (131/141) significant (p < 0.01) correlation between scale-tuning functions across entirely independent lineages (Fig. 2C). But we also selected high response stimuli for some neurons and tested them across the entire scale range (see example in Fig. 3C). Scale-selectivity remained strong and consistent with the results of the adaptive experiment in all cases (Fig. S3D,E).

We also tested the effect of lighting direction, which determines the pattern of shading across the stimulus. During the adaptive search procedure, the implicit light source was at infinite distance from the viewer’s direction. We additionally tested responses when the light source was above, below, to the right, or to the left. The results (Fig. 3D) show that the 3D environmental shape selectivity of these TEd neurons was remarkably consistent (F(4,84) = 0.64, p = 0.64) across these manipulations, which produced very different image contrast patterns.

DISCUSSION

Our results reveal a surprising specialization for environmental shape processing in the middle channel (TEd) of anterior IT cortex in macaque monkeys. The majority of TEd neurons were substantially more responsive to large, open, unbounded 3D surface shapes. These responses were critically dependent on depth cues, especially texture flows (Ben-Shahar and Zucker, 2001, 2003; Li and Zaidi, 2000, 2004). This makes sense since environments are more defined by 3D shape-in-depth than by 2D self-occlusion boundaries. Texture flows are likely to be an essential 3D shape cue for environments, which occupy more distant depth ranges where binocular disparity differences become negligible. A previous study showed that TEd neurons are not sensitive to stereoscopic depth in object-size stimuli and concluded that TEd represents only 2D shape (Janssen et al., 2000). To the contrary, we found that TEd is very sensitive to 3D shape, but on the scale of environments, a scale on which stereoscopic depth cues provide little shape information.

TEd selectivity for environments contrasted with STS selectivity for objects, which have long been considered the primary domain of IT processing. This is a clear functional distinction between the anatomically defined channels in the anterior ventral pathway described by Kravitz and colleagues (2012). These channels differ in connectivity with prefrontal cortex, basal ganglia, and medial temporal lobe memory structures. Our finding that STS is more specialized for object shape is consistent with its stronger connection to object-sensitive ventrolateral prefrontal cortex (Kravitz et al., 2012). It should be noted that we observed some selectivity for objects in TEd as well, consistent with prior studies of object processing in this region.

Channeling of environmental and object shape processing in anterior IT is a new feature of modular organization in ventral pathway cortex. Face and color modules occur at intervals along the inferior temporal lobe (Tsao et al., 2003; Moeller et al., 2008; Lafer-Sousa and Conway, 2013). Our anterior-posterior sampling range fell mostly in between reported locations for middle and anterior face and color patches. Our sampling also largely spared the lip of the sulcus, where face patches tend to be located. Thus, our findings expand on rather than conflict with prior understanding of anterior IT modularity.

Our discovery of environmental shape processing in anterior IT contrasts with previous work that localized processing of place information (scenes and buildings) mainly to more posterior and dorsal visual regions. Beginning with the identification of the parahippocampal place area (PPA) (Epstein and Kanwisher, 1998; Epstein et al., 1999), human fMRI studies have revealed multiple posterior/dorsal areas involved in structural and/or semantic processing of place information (Epstein 2008; Park and Chun, 2009; Ward et al., 2010; Epstein and Ward, 2010; Kornblith et al. 2013; Park et al., 2011; Mullally and Maguire, 2011; Kravitz et al., 2011a; Harel et al., 2013). Posterior scene-sensitive patches in macaque monkey cortex have been identified with fMRI, and specialization of individual neurons in posterior IT (TEO and/or TFO) for place processing has recently been confirmed by Kornblith and colleagues (Kornblith et al., 2013; see also Sato and Nakamura, 2003). These posterior IT neurons were particularly sensitive to surface textures, analogous to our finding in TEd that texture flows were the most critical cues for shape-in-depth.

Our data reveal that in addition to these posterior modules, information about place structure is also processed in anterior stages of the ventral pathway. The ventral pathway is specialized for fine shape discrimination, and thus could support accurate recognition of familiar environments (rooms, neighborhoods, landscapes) even when the objects or landmarks they contain are rearranged or removed. Perhaps more importantly, object information must ultimately be organized within representations of the spatial environment to support scene comprehension and physical interactions with the world. The strong connectivity between object-sensitive STS and environment-sensitive TEd would enable such integration. Consistent with this idea, MacEvoy and Epstein (2011) have shown that, in human cortex, ventral pathway area LO, but not PPA, combines information about scenes and scene-associated objects.

Our findings are consistent with some fMRI evidence for more anterior sensitivity to scene information. In addition to finding a homologue to human PPA in monkey posterior temporal cortex (mPPA), Nasr and colleagues (2011) showed some activation in more anterior locations (Nasr et al., Fig. 8). In humans, a recent study shows that semantic information about scene categories is represented beyond the boundaries of previously described functional regions (Stansbury et al., 2013). It is also worth noting that neural recording can reveal substantial selectivity in regions where fMRI differences are weak or absent: Kornblith and colleagues (2013) found that neural place selectivity was prominent not only in a lateral place patch (LPP), which emerged in an fMRI contrast between places and other stimuli, but also in a more medial place patch (MPP), which was identified only through stimulation of LPP.

Our results might also relate to the parallel organization of large and small object representations in human cortex described by Konkle and Oliva (2012). They found that smaller objects are represented in a superior/lateral swath of the ventral pathway, while larger objects are represented in a more inferior/medial swath. This organization depended on the real-world size of the familiar object stimuli, not on their retinal size. Their result seems consistent with our finding that small, abstract objects are represented in a more superior channel and large environments in a more inferior channel. Representations of large objects and large environments are likely to co-localize, since they are statistically related in the natural world. Also, both depend on inputs from more peripheral parts of the visual field. Anterior IT does not appear to be retinotopic (Lafer-Sousa and Conway, 2013), but the ventral location of large object/environment processing might reflect the ventral origin of peripheral upper field inputs from early visual cortex (Kravitz et al. 2012). Under natural viewing conditions where nearby objects are foveated at ground level, the upper field periphery is likely to contain the most information about surrounding environmental structure.

Our results leave open the question of whether environmental information in TEd is more structural or categorical/semantic in nature. These two domains are difficult to dissociate because they are statistically correlated in the natural world and necessarily entwined at the neural processing level and in perception (Kourtzi and Connor, 2011). Our stimuli were entirely abstract, and thus serve to show that there is at least some level of structural processing in TEd. But further experiments involving photographs of familiar environments would be required to distinguish representation of structure from structural processing in support of place recognition and categorization. It is also conceivable that photographic object and scene stimuli would reveal a different pattern of anatomical organization. Our results establish a clear dichotomy in structural processing, but semantic representation of familiar stimuli can be independent of structural metrics, including scale (Konkle and Oliva, 2012).

EXPERIMENTAL PROCEDURES

Behavioral task, stimulus presentation, and electrophysiological recording

Two head-restrained male rhesus monkeys (Macaca mulatta) were trained to maintain fixation within 1° (radius) of a 0.1° diameter spot for 4 s to obtain a juice reward. Eye position was monitored with an infrared eye tracker (EyeLink). 3D shape stimuli were rendered with shading, surface texture, and binocular disparity cues using OpenGL. Two precisely aligned high-resolution (1400 × 1050) color projectors were used to back-project differentially polarized right and left eye images on a screen subtending 77° of visual angle horizontally and 61° vertically with the monkey seated at a viewing distance of 55 cm. The fixation target appeared at screen center, with one of 4 right/left image disparities corresponding to fixation at 55 cm (screen depth), 100 cm, 10 m, or 100 m. The stimulus was always tangent to the fixation point, at 0 disparity, so the monkey was always fixating a point on the stimulus surface. During fixation, randomly ordered stimuli were flashed on the screen with a period of 1 s (250 ms pre-stimulus blank, 750 ms stimulus presentation), for a total of 4 stimuli per trial. Prior to each experimental session, binocular fusion and stereoscopic depth perception were verified with a random dot stereogram saccade target detection task. The electrical activity of well-isolated single neurons was recorded with epoxy-coated tungsten electrodes (Microprobe or FHC). Action potentials of individual neurons were amplified and electrically isolated using a Tucker-Davis Technologies recording system. Recording positions were determined on the basis of structural magnetic resonance images and the sequence of sulci and response characteristics observed while lowering the electrode. All animal procedures were approved by the Johns Hopkins Animal Care and Use Committee and conformed to US National Institutes of Health and US Department of Agriculture guidelines.
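The screen geometry stated above can be checked with the standard visual-angle relation. The sketch below assumes a flat, fronto-parallel screen centered on the line of sight; the function names are hypothetical, and the physical screen dimensions it prints are derived values, not figures reported in the text.

```python
import math

VIEWING_DISTANCE_CM = 55.0  # viewing distance given in the text


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent on a flat surface that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle subtended by a centered extent of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


width = extent_cm(77.0)   # about 87.5 cm
height = extent_cm(61.0)  # about 64.8 cm
print(f"implied screen size: {width:.1f} x {height:.1f} cm")
print(f"aspect ratio: {width / height:.2f} (projector: {1400 / 1050:.2f})")
```

The implied aspect ratio is close to the 4:3 ratio of the 1400 × 1050 projectors, so the stated angles and viewing distance are mutually consistent under this assumption.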

Stimulus construction and adaptive algorithm

Each stimulus was based on a 3D ellipsoidal shape defined by a dense polar grid of NURBS (non-uniform rational b-splines) control points, which can be rendered as a smooth surface with the OpenGL NURBS utility. Large scale shape variation was created by randomly altering the positions, orientations, sizes and aspect ratios of 2–5 latitudinal rings in the polar grid and interpolating the positions of intervening NURBS control points. Smaller scale variation in surface shape was created by defining points and paths where surface height was altered with random polynomial functions, creating hills, valleys, ridges, canyons, etc. Global surface smoothness was varied to span from the curvilinear structure of natural landscapes to the rectilinear structure of manmade environments.

Each successive generation in the adaptation procedure comprised 40 stimuli in each of 2 lineages, for a total of 80 stimuli. In each generation, 80% of stimuli were partially morphed descendants of ancestors drawn from the entire pool of previously tested stimuli, and 20% were new, randomly defined stimuli. The probability of a given stimulus producing a descendant was based on its average response rate. The current response range of the neuron was divided into 5 percentile ranges. Thirty percent of ancestors were from the top range (90–100% of maximum response), twenty percent each from the three middle ranges (70–90, 50–70, 30–50%), and ten percent from the bottom range (0–30%).
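
The ancestor selection rule above can be sketched in Python as follows. This is a minimal illustration, not the original implementation: it assumes that each band is defined as a fraction of the current maximum response, that empty bands are skipped with the remaining shares renormalised, and that ancestors are drawn with replacement. The names `BANDS`, `band_index`, and `choose_ancestors` are hypothetical.

```python
import random

# Share of ancestors drawn from each band of the response range (from the text).
# Each entry is (lower bound as a fraction of the maximum response, share).
BANDS = [(0.9, 0.30), (0.7, 0.20), (0.5, 0.20), (0.3, 0.20), (0.0, 0.10)]

STIMULI_PER_LINEAGE = 40     # from the text
DESCENDANT_FRACTION = 0.8    # from the text; the rest are new random stimuli


def band_index(response, max_response):
    """Index into BANDS for a response, as a fraction of the maximum."""
    frac = response / max_response if max_response > 0 else 0.0
    for i, (lower, _share) in enumerate(BANDS):
        if frac >= lower:
            return i
    return len(BANDS) - 1  # responses below zero fall in the bottom band


def choose_ancestors(responses, n_descendants, rng):
    """Pick one ancestor per descendant from {stimulus id: mean response}."""
    max_response = max(responses.values())
    pools = [[] for _ in BANDS]
    for stim_id, rate in responses.items():
        pools[band_index(rate, max_response)].append(stim_id)
    # Assumption: empty bands are skipped and remaining shares renormalised.
    usable = [i for i, pool in enumerate(pools) if pool]
    weights = [BANDS[i][1] for i in usable]
    ancestors = []
    for _ in range(n_descendants):
        band = rng.choices(usable, weights=weights, k=1)[0]
        ancestors.append(rng.choice(pools[band]))
    return ancestors


n_descendants = round(STIMULI_PER_LINEAGE * DESCENDANT_FRACTION)  # 32 of 40
n_random = STIMULI_PER_LINEAGE - n_descendants                    # 8 of 40
```

For example, `choose_ancestors(responses, n_descendants, random.Random(0))` returns 32 stimulus identifiers for one lineage, drawn mostly from the highest-response band when that band is populated.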

To produce each descendant, 1–3 morphing transformations were applied. The number and magnitude of morphing transformations were probabilistic functions of response rate, to produce more alteration of low response ancestors and less alteration of high response ancestors. The probabilities of specific morphing transformations were heavily weighted toward scaling, since this produced the transition from object to environmental shapes. The scale distribution within each generation was constrained for equal sampling in the object range (4–22° of visual angle), intermediate range (22–80°), and environmental range (80–142°). The morphing probabilities were scaling 35%, fixation depth 5%, rotation 10% (equal probability for x-, y-, and z-axis rotation), translation 10%, longitudinal scaling of the NURBS grid 5%, translation/rotation of one latitudinal ring 5%, change in elliptical aspect ratio of one latitudinal ring 5%, global latitudinal curvature 5%, global longitudinal curvature 5%, height function of a surface distortion 5%, path shape of a surface distortion 5%, position of a surface distortion 2.5%, removal of a surface distortion 2.5%. All stimuli were adjusted in depth such that the surface was tangent to the fixation point at 0 disparity. Closer fixation depths were prohibited when they would cause the stimulus surface to intersect the surface of the projection.
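
The listed morphing probabilities sum to 100%, so they can be used directly as sampling weights. The Python sketch below draws 1–3 transformations for one descendant. The transformation probabilities come from the text; the mapping from response rate to the number of transformations (`count_weights`) is a hypothetical placeholder, since the text states only that lower-response ancestors were altered more, and drawing transformations with replacement is likewise an assumption. All identifiers are illustrative.

```python
import random

# Probability (%) of each morphing transformation, as listed in the text.
MORPH_PROBS = {
    "scaling": 35.0,
    "fixation_depth": 5.0,
    "rotation": 10.0,
    "translation": 10.0,
    "longitudinal_grid_scaling": 5.0,
    "ring_translation_rotation": 5.0,
    "ring_aspect_ratio": 5.0,
    "global_latitudinal_curvature": 5.0,
    "global_longitudinal_curvature": 5.0,
    "distortion_height_function": 5.0,
    "distortion_path_shape": 5.0,
    "distortion_position": 2.5,
    "distortion_removal": 2.5,
}


def count_weights(relative_response):
    """Hypothetical weights for applying 1, 2, or 3 transformations.

    relative_response is in [0, 1]; low values favor more transformations.
    The actual function used in the study is not specified in the text.
    """
    r = min(max(relative_response, 0.0), 1.0)
    return [1.0 + 2.0 * r, 2.0, 1.0 + 2.0 * (1.0 - r)]


def sample_morphs(relative_response, rng):
    """Draw the transformations to apply to one ancestor."""
    count = rng.choices([1, 2, 3], weights=count_weights(relative_response), k=1)[0]
    names = list(MORPH_PROBS)
    weights = [MORPH_PROBS[name] for name in names]
    # Assumption: transformations are drawn independently, with replacement.
    picks = rng.choices(names, weights=weights, k=count)
    # Rotation axis is chosen with equal probability among x, y, and z.
    return [p + "_" + rng.choice("xyz") if p == "rotation" else p for p in picks]
```

Calling `sample_morphs(0.2, random.Random(0))` returns a list of one to three transformation names for a low-response ancestor; magnitudes would be drawn separately and are not modeled here.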

Data analysis and statistics

Response rates for each stimulus were averaged across the 750 ms presentation period and across 5 repetitions. Baseline response was averaged over randomly scheduled null stimulus presentations during the adaptive experiment. Baseline was subtracted from each stimulus response for the population analyses presented in Fig. 2. Selectivity for object vs. environmental stimuli was characterized with a Wilcoxon rank sum test. Pearson correlation was used to evaluate consistency of scale-tuning across independent lineages. Repeated measures ANOVA was used to evaluate the effects of control test manipulations. Details of analyses are described in Results.
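
The core computations named above can be sketched with the Python standard library. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: the rank sum test uses a normal approximation with tie correction and no continuity correction, spike counts are assumed to be taken within the 750 ms presentation window, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, mean

PRESENTATION_S = 0.750  # stimulus presentation period from the text


def mean_rate(spike_counts, baseline=0.0, window_s=PRESENTATION_S):
    """Mean firing rate (spikes/s) across repetitions, minus baseline."""
    return mean(c / window_s for c in spike_counts) - baseline


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (z, p); z is negative when x tends to be smaller than y.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values are tied; test is undefined")
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
```

In this sketch, `rank_sum_test` would be applied to one neuron's responses to object-range versus environmental-range stimuli, and `pearson_r` to scale-tuning curves obtained from the two independent lineages.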

Supplementary Material

HIGHLIGHTS.

  • TEd, a channel in the ventral visual pathway, processes 3D environmental shape

  • The STSv channel in anterior IT processes 3D object shape

Acknowledgments

We thank William Nash, William Quinlan, Lei Hao, and Virginia Weeks for technical assistance.

This work was supported by NIH Grant #EY024028.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Ben-Shahar O, Zucker SW. On the perceptual organization of texture and shading flows: From a geometrical model to coherence computation. Computer Vision and Pattern Recognition, 2001. CVPR 2001; Proceedings of the 2001 IEEE Computer Society Conference on (Vol. 1); IEEE; 2001. pp. I–1048.
  2. Ben-Shahar O, Zucker SW. The perceptual organization of texture flow: A contextual inference approach. IEEE Trans Pattern Anal Mach Intell. 2003;25:401–417.
  3. Epstein RA. Parahippocampal and retrosplenial contributions to human spatial navigation. Trends Cogn Sci. 2008;12:388–396. doi: 10.1016/j.tics.2008.07.004.
  4. Epstein R, Harris A, Stanley D, Kanwisher N. The parahippocampal place area: recognition, navigation, or encoding? Neuron. 1999;23:115–125. doi: 10.1016/s0896-6273(00)80758-8.
  5. Epstein R, Kanwisher N. A cortical representation of the local visual environment. Nature. 1998;392:598–601. doi: 10.1038/33402.
  6. Epstein RA, Ward EJ. How reliable are visual context effects in the parahippocampal place area? Cerebral Cortex. 2010;20:294–303. doi: 10.1093/cercor/bhp099.
  7. Freiwald WA, Tsao DY. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science. 2010;330:845–851. doi: 10.1126/science.1194908.
  8. Harel A, Kravitz DJ, Baker CI. Deconstructing visual scenes in cortex: gradients of object and spatial layout information. Cerebral Cortex. 2013;23:947–957. doi: 10.1093/cercor/bhs091.
  9. Hung CC, Carlson ET, Connor CE. Medial axis shape coding in macaque inferotemporal cortex. Neuron. 2012;74:1099–1113. doi: 10.1016/j.neuron.2012.04.029.
  10. Ito M, Tamura H, Fujita I, Tanaka K. Size and position invariance of neuronal responses in monkey inferotemporal cortex. J Neurophysiol. 1995;73:218–226. doi: 10.1152/jn.1995.73.1.218.
  11. Janssen P, Vogels R, Orban GA. Selectivity for 3D shape that reveals distinct areas within macaque inferior temporal cortex. Science. 2000;288:2054–2056. doi: 10.1126/science.288.5473.2054.
  12. Konkle T, Oliva A. A real-world size organization of object responses in occipitotemporal cortex. Neuron. 2012;74:1114–1124. doi: 10.1016/j.neuron.2012.04.036.
  13. Kornblith S, Cheng X, Ohayon S, Tsao DY. A Network for Scene Processing in the Macaque Temporal Lobe. Neuron. 2013;79:766–781. doi: 10.1016/j.neuron.2013.06.015.
  14. Kourtzi Z, Connor CE. Neural representations for object perception: structure, category, and adaptive coding. Annu Rev Neurosci. 2011;34:45–67. doi: 10.1146/annurev-neuro-060909-153218.
  15. Kravitz DJ, Peng CS, Baker CI. Real-world scene representations in high-level visual cortex: it’s the spaces more than the places. J Neurosci. 2011a;31:7322–7333. doi: 10.1523/JNEUROSCI.4588-10.2011.
  16. Kravitz DJ, Saleem KS, Baker CI, Mishkin M. A new neural framework for visuospatial processing. Nature Reviews Neuroscience. 2011b;12:217–230. doi: 10.1038/nrn3008.
  17. Kravitz DJ, Saleem KS, Baker CI, Ungerleider LG, Mishkin M. The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn Sci. 2012;17:26–49. doi: 10.1016/j.tics.2012.10.011.
  18. Lafer-Sousa R, Conway BR. Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex. Nat Neurosci. 2013;16:1870–1878. doi: 10.1038/nn.3555.
  19. Li A, Zaidi Q. Perception of three-dimensional shape from texture is based on patterns of oriented energy. Vision Research. 2000;40:217–242. doi: 10.1016/s0042-6989(99)00169-8.
  20. Li A, Zaidi Q. Three-dimensional shape from non-homogeneous textures: carved and stretched surfaces. Journal of Vision. 2004;4:860–878. doi: 10.1167/4.10.3.
  21. Nasr S, Liu N, Devaney KJ, Yue X, Rajimehr R, Ungerleider LG, Tootell RB. Scene-selective cortical regions in human and nonhuman primates. Journal of Neuroscience. 2011;31:13771–13785. doi: 10.1523/JNEUROSCI.2792-11.2011.
  22. MacEvoy SP, Epstein RA. Constructing scenes from objects in human occipitotemporal cortex. Nat Neurosci. 2011;14:1323–1329. doi: 10.1038/nn.2903.
  23. Mishkin M, Ungerleider LG, Macko KA. Object vision and spatial vision: two cortical pathways. Trends Neurosci. 1983;6:414–417.
  24. Moeller S, Freiwald WA, Tsao DY. Patches with links: a unified system for processing faces in the macaque temporal lobe. Science. 2008;320:1355–1359. doi: 10.1126/science.1157436.
  25. Park S, Chun MM. Different roles of the parahippocampal place area (PPA) and retrosplenial cortex (RSC) in panoramic scene perception. Neuroimage. 2009;47:1747–1756. doi: 10.1016/j.neuroimage.2009.04.058.
  26. Park S, Brady TF, Greene MR, Oliva A. Disentangling scene content from spatial boundary: complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. Journal of Neuroscience. 2011;31:1333–1340. doi: 10.1523/JNEUROSCI.3885-10.2011.
  27. Mullally SL, Maguire EA. A new role for the parahippocampal cortex in representing space. Journal of Neuroscience. 2011;31:7441–7449. doi: 10.1523/JNEUROSCI.0267-11.2011.
  28. Rajimehr R, Devaney KJ, Bilenko NY, Young JC, Tootell RB. The “parahippocampal place area” responds preferentially to high spatial frequencies in humans and monkeys. PLOS Biology. 2011;9:e1000608. doi: 10.1371/journal.pbio.1000608.
  29. Sato N, Nakamura K. Visual response properties of neurons in the parahippocampal cortex of monkeys. J Neurophysiol. 2003;90:876–886. doi: 10.1152/jn.01089.2002.
  30. Stansbury DE, Naselaris T, Gallant JL. Natural scene statistics account for the representation of scene categories in human visual cortex. Neuron. 2013;79:1025–1034. doi: 10.1016/j.neuron.2013.06.034.
  31. Tsao DY, Freiwald WA, Knutsen TA, Mandeville JB, Tootell RB. Faces and objects in macaque cerebral cortex. Nat Neurosci. 2003;6:989–995. doi: 10.1038/nn1111.
  32. Ungerleider LG, Mishkin M. Two cortical visual systems. In: Ingle DJ, Goodale MA, Mansfield RJW, editors. Analysis of Visual Behaviour. Cambridge, MA: MIT Press; 1982. pp. 549–586.
  33. Ward EJ, MacEvoy SP, Epstein RA. Eye-centered encoding of visual space in scene-selective regions. Journal of Vision. 2010;10:1–12. doi: 10.1167/10.14.6.
  34. Yamane Y, Carlson ET, Bowman KC, Wang Z, Connor CE. A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nat Neurosci. 2008;11:1352–1360. doi: 10.1038/nn.2202.
