eLife. 2019 May 7;8:e44422. doi: 10.7554/eLife.44422

Dissociable laminar profiles of concurrent bottom-up and top-down modulation in the human visual cortex

Samuel JD Lawrence 1, David G Norris 1,2, Floris P de Lange 1
Editors: Christian Büchel3, Michael J Frank4
PMCID: PMC6538372  PMID: 31063127

Abstract

Recent developments in human neuroimaging make it possible to non-invasively measure neural activity from different cortical layers. This can potentially reveal not only which brain areas are engaged by a task, but also how. Specifically, bottom-up and top-down responses are associated with distinct laminar profiles. Here, we measured lamina-resolved fMRI responses during a visual task designed to induce concurrent bottom-up and top-down modulations via orthogonal manipulations of stimulus contrast and feature-based attention. BOLD responses were modulated by both stimulus contrast (bottom-up) and by engaging feature-based attention (top-down). Crucially, these effects operated at different cortical depths: Bottom-up modulations were strongest in the middle cortical layer and weaker in deep and superficial layers, while top-down modulations were strongest in the superficial layers. As such, we demonstrate that laminar activity profiles can discriminate between concurrent top-down and bottom-up processing, and are diagnostic of how a brain region is activated.

Research organism: Human

eLife digest

Recent advances in brain imaging have made it possible to map brain activity in areas of tissue less than a millimeter in size. This resolution offers particular advantages for studying the brain’s outer surface, the cortex. The cortex is traditionally divided into several layers, each containing different types and arrangements of neurons. New high-resolution machines can now visualize the activity in individual layers of cortex, and this can reveal whether the layers also have different roles.

In humans, a large area in the cortex is devoted to vision. Our visual cortex receives sensory information that arrives from the eyes via the optic nerve. This is known as bottom-up processing. But what we see depends on more than just incoming sensory information: it also relies on where we focus our attention, and on our expectations about how things should look. Many optical illusions, for example, work because the brain attempts to decipher an ambiguous visual signal based on previous experiences. This use of existing knowledge to interpret sensory input is called top-down processing.

Using high-resolution brain scanning, Lawrence et al. show that bottom-up and top-down processing occur in different layers of visual cortex. Healthy volunteers viewed a series of images while lying inside a brain scanner. Lawrence et al. changed the contrast of the images to alter the volunteers’ bottom-up processing: this affected activity in the middle layer of visual cortex. To adjust their top-down processing, the volunteers were asked to attend to different features of the images on different trials: these changes in attention had more effect in the layers on either side of the middle layer. This suggests that bottom-up processing occurs in the middle layer of visual cortex, whereas top-down processing takes place in the layers above and below.

The findings by Lawrence et al. will help to better measure activity in cortical layers using modern brain imaging techniques. With further technological improvements, it may become possible to image each layer in the brain in more detail, in particular for other areas that support complex cognitive processes.

Introduction

Using ‘ultra-high field’ MRI systems of 7T and above, it has become possible to non-invasively measure fMRI responses at lamina-resolved spatial resolutions in humans (Dumoulin et al., 2018; Koopmans et al., 2011; Polimeni et al., 2010). This has allowed researchers to ask new questions about the functional organization of the human brain, and examine communication between brain areas in more detail than previously possible (Kuehn and Sereno, 2018). One important promise of laminar fMRI is its potential ability to distinguish between bottom-up and top-down BOLD responses. While these are spatially amalgamated at standard imaging resolutions (Lawrence et al., 2017; Self et al., 2017), they are expected to be expressed at different cortical depths. Bottom-up connections between brain areas are known to target the granular layer 4, at middle cortical depths, while top-down connections target deeper and superficial layers but largely avoid layer 4 (Anderson and Martin, 2009; Felleman and Van Essen, 1991; Rockland and Pandya, 1979). It should therefore be possible to tease apart the bottom-up and top-down contributions to a stimulus-driven BOLD response by examining that response across cortical depth.

Previous laminar fMRI studies suggest that this is indeed the case. For example, stimulus-driven responses in visual cortex have been shown to be strongest at middle depths (Koopmans et al., 2010), while top-down signals embodying contextual inference, prediction, attention and working memory operate at deep and/or superficial, but not middle, cortical depths (Klein et al., 2018; Kok et al., 2016; Lawrence et al., 2018; Muckli et al., 2015; Scheeringa et al., 2016). Similar results for top-down influences have also been reported for auditory (De Martino et al., 2015) and motor cortex (Huber et al., 2017). Whilst these results are encouraging, these studies have typically measured top-down signals in the absence of a bottom-up response. The rationale for this choice is clear, as bottom-up drive could affect all cortical layers due to quick communication between layers (Self et al., 2013) and blurring from spatial hemodynamics (Uğurbil et al., 2003; Uludağ and Blinder, 2018; Yacoub et al., 2005) could obscure layer-specific top-down effects. However, being constrained to measuring top-down responses in isolation limits the potential power of laminar fMRI experiments for exploring brain function. If the overall BOLD response to a stimulus could be separated into its bottom-up, stimulus-driven component and a top-down, modulatory component, this would open the door for increasingly complex task design in laminar fMRI experiments.

Here we measured lamina-resolved fMRI responses from human participants as they viewed visual stimuli and were required to attend to a specific stimulus feature (orientation). Our stimulus paradigm was designed to elicit concurrent bottom-up and top-down modulations of the stimulus-driven response through orthogonal manipulations of stimulus contrast (bottom-up) and feature-based attention (top-down), both of which are known to influence early visual cortex responses (Boynton et al., 1999; Himmelberg and Wade, 2019; Kamitani and Tong, 2005; Martinez-Trujillo and Treue, 2004; Saenz et al., 2002; Treue and Martínez Trujillo, 1999).

We predicted that response modulations driven by attention would operate at different cortical depths to those driven by changes in stimulus contrast. Specifically, contrast modulations were expected to be largest at middle cortical depths, as increases in contrast should be associated with stronger bottom-up input to the granular layer (Hubel and Wiesel, 1972; Rockland and Pandya, 1979). Top-down influences are generally expected to be strongest in the deep and/or superficial cortical depths (agranular layers, Lawrence et al., 2017). However, previous research into laminar effects of attention has produced varied results. Studies by van Kerkoerle et al. (2017) and Klein et al. (2018) report largely agranular effects of attention in V1 while others report effects in all layers of V1 (Denfield et al., 2018; Hembrook-Short et al., 2017). Further studies have also reported attention effects in all layers of V4 (Nandy et al., 2017) and the superficial layers of primary auditory cortex (De Martino et al., 2015). Moreover, previous laminar attention studies have employed spatial or object-based attention, but the laminar circuits involved in feature-based attention have, to our knowledge, not yet been studied. It is therefore not clear whether we should expect feature-based attention to modulate responses in all layers or only agranular layers. Critically, both eventualities yield the prediction that modulations from feature-based attention should be more strongly expressed in agranular layers compared to those from stimulus contrast, which should only be strong in the granular layer.

To preview, we found that fMRI responses in the early visual regions (V1-V3) were strongly modulated by changes in stimulus contrast and feature-based attention, and that these effects were indeed expressed at different cortical depths. As predicted, attentional modulations were more strongly expressed in agranular layers, particularly the superficial layers, while stimulus contrast modulations were largest in the granular layer.

Results

We report laminar-resolved fMRI responses from the early visual cortex (V1-V3) of 24 healthy human subjects while they viewed a series of plaid stimuli comprising two orthogonal sets of bars (one oriented 45°, hereafter referred to as ‘clockwise’, the other oriented 135°, hereafter referred to as ‘counter-clockwise’; see Figure 1). Plaids were presented in blocks of 8 stimuli, during which participants monitored changes in bar width of either the clockwise or counter-clockwise bars (Kamitani and Tong, 2005). Importantly, both sets of bars varied in width independently of each other, meaning attention had to be focused on the cued orientation to succeed at the task. Stimulus contrast was also manipulated: each block of stimuli comprised either high (80%) or low (30%) contrast plaid stimuli. As such, for each stimulus block participants attended to either clockwise or counter-clockwise bars within high or low contrast plaid stimuli, which provided our top-down and bottom-up task manipulations, respectively.

Figure 1. Task design.

Plaid stimuli were presented in a block design. During stimulus blocks, eight stimuli were presented at a rate of 0.5 Hz (1.75 s on, 0.25 s off). Subjects were required to respond to each stimulus (except for the first in each block), indicating whether the bars in the cued orientation were thicker or thinner compared to the previously presented stimulus. Attention was cued by the colored fixation dot: red = clockwise, green = counter-clockwise. Stimulus blocks were preceded by an attention cue and followed by performance feedback and an inter-block interval. See Materials and methods for more information on the task and stimuli.

Bottom-up and top-down modulations of the BOLD response

Subjects were able to focus their attention on one set of oriented bars within a plaid and accurately discriminate changes in bar width between stimuli. On average, subjects performed at 83.5% correct (SD = 2.3) for low and 84.5% correct (SD = 1.5) for high contrast stimuli. Task difficulty was controlled by separate staircases for high and low contrast stimuli to match task difficulty across contrast levels. Despite this, the numerically small difference in task performance was significant (t [22]=2.52, p=0.019). To assess the effects of attention and stimulus contrast on brain responses, we divided visually active voxels within V1, V2 and V3 into subpopulations with a strong preference for clockwise orientations over counter-clockwise or vice versa (Albers et al., 2018; see Materials and methods). It was expected that voxels would respond more strongly during blocks in which their preferred orientation was attended, and that all voxels would respond more strongly to higher contrast stimuli.
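How such a split into orientation-preferring subpopulations might look in practice is sketched below. Selection by t statistic and the mask size of 500 voxels per orientation follow the figure supplements; the sign convention (positive t means a clockwise preference), the function name `select_orientation_masks` and the simulated t map are hypothetical and only serve the illustration.

```python
import numpy as np

def select_orientation_masks(t_values, n_per_orientation=500):
    """Pick the most selective voxels for each orientation.

    t_values: 1-D array with one t statistic per visually active voxel for an
    assumed contrast "clockwise > counter-clockwise".
    Returns (clockwise_idx, counter_clockwise_idx) as index arrays.
    """
    t_values = np.asarray(t_values, dtype=float)
    order = np.argsort(t_values)                      # ascending t
    ccw_idx = order[:n_per_orientation]               # most negative t values
    cw_idx = order[::-1][:n_per_orientation]          # most positive t values
    return cw_idx, ccw_idx

# Simulated t map standing in for a localizer result (not real data).
rng = np.random.default_rng(0)
t_map = rng.normal(size=5000)
cw, ccw = select_orientation_masks(t_map)
```

The two index sets can then be used to average the BOLD signal separately for voxels whose preferred orientation was attended and for voxels whose preferred orientation was ignored.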

As expected, BOLD responses in early visual cortex were modulated by both subjects’ attention towards a specific orientation and changes in stimulus contrast (see Figure 2). Responses to high contrast stimuli were significantly higher than low contrast stimuli across V1-V3 (F [23, 1]=35.57, p=4.00e−6). The size of this effect varied across areas (F [30.2, 1.3]=46.53, p=1.77e−8), being larger in V1 than V2 and V3. Voxel responses were also higher when their preferred orientation was attended, compared to when the orthogonal orientation was attended (F [23, 1]=25.67, p=4.00e−5). This effect also varied across visual areas (F [46, 2]=4.91, p=0.012), being slightly smaller in V1 compared to V2 and V3. Overall, therefore, our paradigm was successful in inducing strong modulations of stimulus-driven BOLD responses using bottom-up (contrast) and top-down (feature-based attention) task manipulations.

Figure 2. BOLD modulations from feature-based attention and stimulus contrast.

Average BOLD signal change in orientation-selective voxels from V1-V3 combined, and V1, V2 and V3 separately. In all areas responses to high contrast stimuli (darker bars) were higher than to low contrast stimuli (lighter bars). Responses were also higher in voxels that preferred the orientation that was attended (red bars) compared to those that preferred the ignored orientation (blue bars). Error bars show within-subjects standard error. See text for statistical details.

Figure 2—figure supplement 1. Layer-specific BOLD signal changes for each experimental condition.

Layer-specific BOLD signal change in orientation-selective voxels from V1, V2 and V3. In all areas responses to high contrast stimuli (darker bars) were higher than to low contrast stimuli (lighter bars). Responses were also higher in voxels that preferred the orientation that was attended (red bars) compared to those that preferred the ignored orientation (blue bars). For each condition signal change from deep, middle and superficial cortex is plotted from left to right (signified by D, M, S). Error bars show within-subjects standard error.

Dissociable laminar profiles of bottom-up and top-down response modulations

Next, we determined whether the effects of feature-based attention and stimulus contrast on BOLD responses varied across cortical depth, and whether they did so differently from each other. To this end we computed separate BOLD time courses specific to three equal volume gray matter depth bins defining deep, middle and superficial cortex (Lawrence et al., 2018; van Mourik et al., 2018a, see Materials and methods for more information). Depth-specific time courses were normalized to remove overall differences in signal intensity between layers (Figure 3—figure supplements 3 and 4). Note that this normalization was not critical to the results reported (Figure 3—figure supplement 5). Normalized depth-specific time courses were analyzed to compare the laminar profile of activity modulations resulting from top-down attention and bottom-up stimulus contrast. To get an overall picture of depth-specific modulations across the visual cortex, we first combined voxels from V1, V2 and V3 for this analysis.
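The normalization step can be illustrated with a minimal sketch. It assumes, as the figure supplements describe, that each depth bin's time course is z scored on its own. The array layout, the function name `zscore_within_layers` and the use of the population standard deviation are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def zscore_within_layers(timecourses):
    """z-score each depth bin's time course separately.

    timecourses: array of shape (n_layers, n_timepoints).
    """
    tc = np.asarray(timecourses, dtype=float)
    mean = tc.mean(axis=1, keepdims=True)   # per-layer mean
    sd = tc.std(axis=1, keepdims=True)      # per-layer SD (ddof=0, an assumption)
    return (tc - mean) / sd

# Toy data: the same response shape with larger amplitude and baseline
# towards the superficial bin (rows: deep, middle, superficial).
t = np.linspace(0.0, 2.0 * np.pi, 50)
shape = np.sin(t)
raw = np.stack([1.0 * shape + 100.0, 1.5 * shape + 110.0, 3.0 * shape + 130.0])
normalized = zscore_within_layers(raw)
```

In this toy case the three normalized rows coincide, which is the intended effect: overall amplitude differences between depth bins are removed while the shape of each time course is kept.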

Response modulations from both feature-based attention and stimulus contrast were clearly present in depth-specific time courses (Figure 3A–D). In order to fairly compare laminar profiles across conditions, we used data from the same time points (highlighted in Figure 3A & C), which comprised the peak of the BOLD response during a block of stimuli and during which both the effects of attention (F [23, 1]=19.95, p=1.76e−4) and contrast (F [23, 1]=35.98, p=4.00e−6) were significant. Within this time window, the effect of feature-based attention on neural responses was present at all cortical depths and was largest in the superficial layers (Figure 3B).

Figure 3. Laminar organization of top-down and bottom-up response modulations in V1-V3 combined.

(A) Average BOLD time course for a block of stimuli, averaged across cortical depth bins. Responses from voxels that preferred the attended orientation (red line) and voxels that preferred the unattended orientation (blue dash) are plotted separately. (B) Average difference between attended and unattended BOLD signals from the highlighted time points in panel A, plotted separately for cortical depth bins. (C) Average BOLD time courses for blocks of stimuli containing high (dark gray line) and low contrast (light gray dash) stimuli. (D) Average difference between responses to high and low contrast stimuli from the time points highlighted in panel C, plotted separately for cortical depth bins. (E) Group average scores indicating whether the effects of attention (red bar) and contrast (gray bar) were stronger in the agranular or granular layers. Scores were computed by taking the average attention or contrast effect from the deep and superficial layers (left-most and right-most data point in panels B and D, respectively) and subtracting the attention or contrast effect from the middle layer (middle data point in panels B and D, respectively). A positive score indicates a more agranular response, negative indicates more granular. Asterisks denote a significant paired sample t test (p=0.005, see text for details). (F) Difference between agranular – granular scores (panel E) for attention and contrast conditions for each individual subject. A positive score indicates that attention modulations were stronger in agranular layers compared to contrast modulations, which was the case for 20 out of 24 subjects. All error bars show within-subject standard error.

Figure 3—figure supplement 1. Effect of mask size on results.

In our main analysis, we constructed ROIs for V1, V2 and V3 comprising the 1000 most selective voxels (500 that preferred clockwise and 500 that preferred counter-clockwise). Here we conducted a series of control analyses to ensure our choice of 1000 voxels did not bias our results. (A) Agranular-granular scores for the effects of feature-based attention (red) and stimulus contrast (black) for a range of different mask sizes. (B) Difference between agranular-granular scores for the effects of attention and contrast plotted in panel A. A positive difference indicates that the effect of attention was more agranular compared to the effect of contrast. Though effect sizes varied with mask size, being smaller at the extremes (likely due to increased noise in laminar estimates when using few voxels and the inclusion of less selective voxels when using a large number of voxels), attention was more agranular than contrast for all mask sizes, as was the case in our main analysis. In both panels, error bars depict within-subject standard error.
Figure 3—figure supplement 2. Results using re-sampled orientation preference masks designed to sample from all cortical depths equally.

In our original analysis we selected voxels that were maximally responsive to and selective for our stimuli. This included restricting ROIs to only include voxels that exhibited a significant response to the stimuli presented in the stimulus localizer. Due to stronger overall signal in superficial cortex, this restriction created a sampling bias where our masks included more voxels that maximally overlapped with the superficial depth bin compared to other bins: on average in V1, 334 voxels (SD = 97) maximally overlapped with the superficial bin, compared to 141 (SD = 43) for middle and 175 (SD = 55) for deep. Similarly, for V2 there were 331 voxels (SD = 117) for superficial, 146 (SD = 50) for middle and 189 for deep (SD = 56) and in V3 there were 379 voxels (SD = 132) for superficial, 162 (SD = 55) for middle and 177 (SD = 54) for deep. Note that the total number of voxels reported here for each visual area do not add up to the total analyzed for each visual area (1000). This is because there were also a number of voxels that fell within white matter and CSF, which helped the spatial GLM estimate responses for these depth bins that fell outside the gray matter, but responses from these depths were not analyzed further. To check our results were not dependent on this sampling bias, we conducted the following control analysis. We resampled our orientation preference masks by randomly removing voxels until there was an equal number that maximally overlapped with each gray matter depth bin. This resulted in 132 voxels (SD = 40) for each depth bin in V1, 137 (SD = 46) for each depth in V2, and 141 (SD = 45) for each depth in V3. We recomputed our analyses using these control masks that sampled evenly from all cortical depths, which revealed very similar results. The effect of attention was significant during the highlighted time points (stats, panel A). 
The effect of attention varied across depth (F [46, 2]=3.55, p=0.037, panel B), being larger in superficial compared to middle (t [23]=2.05, p=0.052) and deep cortex (t [23]=2.25, p=0.034). The effect of stimulus contrast was significant in the highlighted time window (stats, panel C). The effect of contrast varied significantly across depth (F [46, 2]=5.73, p=0.006), being larger in the middle depth bin compared to deep (t [23]=3.20, p=0.004, panel D). The effect of attention was significantly stronger in the agranular layers compared to stimulus contrast (t [23]=2.37, p=0.026, panel E). Overall, the results of this control analysis were similar to our main analysis. All error bars depict within-subjects standard error.
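The resampling described in this control analysis can be sketched as follows. The sketch assumes each voxel carries a label for the gray matter depth bin it maximally overlaps with and randomly discards voxels until all bins are equally represented. The function name `balance_depth_bins`, the integer coding of the bins and the random seed are hypothetical; the example counts are the group averages reported above for V1.

```python
import numpy as np

def balance_depth_bins(bin_labels, rng):
    """Randomly drop voxels until every depth bin has the same count.

    bin_labels: 1-D integer array giving, per voxel, the depth bin it
    maximally overlaps with (assumed coding: 0=deep, 1=middle, 2=superficial).
    Returns the sorted indices of the voxels that are kept.
    """
    bin_labels = np.asarray(bin_labels)
    bins = np.unique(bin_labels)
    n_keep = min(int(np.sum(bin_labels == b)) for b in bins)  # smallest bin
    kept = [
        rng.choice(np.flatnonzero(bin_labels == b), size=n_keep, replace=False)
        for b in bins
    ]
    return np.sort(np.concatenate(kept))

rng = np.random.default_rng(1)
# 175 deep, 141 middle and 334 superficial voxels (V1 group averages).
labels = np.array([0] * 175 + [1] * 141 + [2] * 334)
keep = balance_depth_bins(labels, rng)
counts = np.bincount(labels[keep])
```

Because the smallest bin differs from subject to subject, applying such a procedure per subject would presumably yield a group-average retained count below the smallest group-average bin size, in line with the 132 voxels per bin reported for V1.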
Figure 3—figure supplement 3. Raw layer-specific time courses for each condition and visual area.

Raw group average time courses are plotted for each condition (from top to bottom: low contrast attended, low contrast unattended, high contrast attended, high contrast unattended) and for each visual area (from left to right: V1-V3 combined, V1, V2, V3). Time courses from deep (black lines), middle (dark gray lines) and superficial (light gray lines) layers are plotted separately. These average time courses were computed using raw layer-specific time courses, before they were normalized using within-layer z scoring (described in Materials and methods), showing the bias in signal strength towards superficial cortex in the raw data. Error bars show within-subject standard error.
Figure 3—figure supplement 4. Removal of layer-specific BOLD bias.

As expected, there was a bias in overall BOLD amplitude towards superficial cortex (see Figure 3—figure supplement 3). For the analyses reported in our main paper, we removed this bias by z scoring data within layers, after we had estimated layer-specific responses from the pre-processed voxel data. The figure shows normalized layer-specific time courses from V1, V2 and V3, averaged across experimental conditions. The bias in BOLD amplitude in superficial cortex is almost completely removed.
Figure 3—figure supplement 5. Main results using raw layer-specific time courses with no normalization.

Our main results (shown in Figure 3) were obtained using layer-specific time courses that were normalized by z scoring data within layers. It was possible that normalizing in this manner could have impacted our effect sizes of interest. This is because the standard deviation across a time course, the denominator in a z score calculation, is somewhat dependent on the size of response fluctuations caused by our manipulations of attention and contrast. To determine whether this was the case, we repeated all the analyses from Figure 3 on raw layer-specific time courses without any normalization. The figure shows that both the attention (panel A) and contrast (panel C) effect sizes appear very similar to those in Figure 3, indicating that any effect of within-layer z scoring on effect sizes was negligible. Layer-specific effects (panel B, D–F) are also highly similar to those reported in our main analysis in Figure 3. All error bars depict within-subjects standard error.
Figure 3—figure supplement 6. Differences in trial-to-trial variance between cortical layers.

To see if there were differences in signal variance between cortical layers, we computed trial-to-trial standard deviation in our (normalized) layer-specific time courses for each of the 10 time points within a single trial, and then computed the average standard deviation across time points. The group average standard deviation for each layer clearly shows a difference in overall signal variance between layers, where variance was higher in deeper cortex (F [46, 2]=11.41, p=9.5e-5). It therefore appears that signals we measured from deeper cortex had overall lower signal strength and larger variance. Error bars show within-subject standard error.
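The variance measure described here can be written compactly. The sketch below assumes one layer's data are arranged as a trials-by-time-points array with 10 time points per trial, takes the standard deviation across trials at each time point and then averages over time points. The function name `trial_to_trial_sd`, the use of the sample standard deviation (`ddof=1`) and the simulated noise levels are assumptions for illustration.

```python
import numpy as np

def trial_to_trial_sd(trials):
    """Average, over time points, of the across-trial standard deviation.

    trials: array of shape (n_trials, n_timepoints) for one depth bin.
    """
    trials = np.asarray(trials, dtype=float)
    sd_per_timepoint = trials.std(axis=0, ddof=1)  # variability across trials
    return sd_per_timepoint.mean()                 # collapse over time points

# Simulated example: 40 trials x 10 time points, noisier in the "deep" bin.
rng = np.random.default_rng(2)
deep = rng.normal(scale=1.5, size=(40, 10))
superficial = rng.normal(scale=1.0, size=(40, 10))
sd_deep = trial_to_trial_sd(deep)
sd_superficial = trial_to_trial_sd(superficial)
```

Comparing the resulting value across depth bins, one number per subject and layer, gives the kind of summary that the reported test across layers operates on.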
Figure 3—figure supplement 7. Example cross section of V1 mask.

Axial (left), coronal (middle) and sagittal (right) views of a cross section of one example subject’s V1 mask. Superficial, middle and deep voxels are colored in green, blue and red, respectively. Note that in our spatial GLM approach (described in Materials and methods), voxels’ contributions to layer-specific time courses were weighted by their proportion overlap with each layer. For the purposes of visualization in this figure, voxels are labeled according to which layer they maximally overlapped with.
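The weighting of voxels by their overlap with each layer can be illustrated with a generic least-squares sketch of a spatial GLM. It assumes a design matrix whose rows are voxels and whose columns hold each voxel's proportional overlap with five depth bins (white matter, deep, middle, superficial and CSF), and solves for one time course per bin. The function name `layer_timecourses`, the simulated overlaps and the noise-free data are hypothetical, and the authors' actual implementation may differ.

```python
import numpy as np

def layer_timecourses(overlap, voxel_data):
    """Estimate layer time courses from voxel data by least squares.

    overlap:    (n_voxels, n_layers) proportion of each voxel in each layer.
    voxel_data: (n_voxels, n_timepoints) voxel time courses.
    Solves voxel_data ~ overlap @ layer_tc independently per time point.
    """
    layer_tc, *_ = np.linalg.lstsq(overlap, voxel_data, rcond=None)
    return layer_tc  # shape (n_layers, n_timepoints)

# Simulated example: 200 voxels, 5 depth bins, 20 time points.
rng = np.random.default_rng(3)
true_tc = rng.normal(size=(5, 20))
overlap = rng.dirichlet(np.ones(5), size=200)   # rows sum to 1
voxel_data = overlap @ true_tc                  # noise-free mixture
estimated = layer_timecourses(overlap, voxel_data)
```

The point of this formulation is that a voxel straddling two layers contributes to both estimates in proportion to its overlap, rather than being assigned wholesale to one layer as in the visualization.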
Figure 3—figure supplement 8. Layout of orientation-selective voxels in V1.

Example V1 clockwise (blue-light blue) and counter-clockwise (red-yellow) masks for one representative subject. Brighter colors indicate stronger selectivity for the preferred orientation (t statistic, see Materials and methods for more details).
Figure 3—figure supplement 9. Example layer-specific masks and statistical maps.

(A) Axial (left), coronal (middle) and sagittal (right) views of a cross section of one example subject’s V1 mask. Superficial, middle and deep voxels are colored in green, blue and red, respectively. Note that in our spatial GLM approach (described in Materials and methods), voxels’ contributions to layer-specific time courses were weighted by their proportion overlap with each layer. For the purposes of visualization in this figure, voxels are labeled according to which layer they maximally overlapped with. (B) Statistical map showing voxel t values for the high contrast >low contrast stimuli from the main analysis for the same slices in panel A. (C) Statistical map showing voxel t values for attended >unattended orientation from the main analysis for the same slices in panel A. (D) Zoom in on a region of cortex highlighted by the white dashed boxes in sagittal views in panels A-C. By comparing voxels depths (left), contrast (middle) and attention (right) t maps, it can be seen that middle-depth voxels tend to show larger contrast modulation, while superficial voxels are more modulated by attention.

There was a trend of activity differences between layers induced by the attentional manipulation (F [46, 2]=2.82, p=0.070). Unpacking this, the attentional modulation was significantly stronger in the superficial layers compared to the middle (t [23]=2.11, p=0.046) and deep layers (t [23]=2.15, p=0.042), while there was no significant difference in the strength of the attentional modulation between the deep and middle layers (t [23]=0.36, p=0.723). Modulations from changes in stimulus contrast were organized quite differently, peaking at middle depths (Figure 3D). Indeed, contrast modulations varied significantly across depth (F [46, 2]=8.43, p=0.001), being largest at middle compared to deep (t [23]=3.79, p=0.001) and superficial (t [23]=3.56, p=0.002) depths. Critically, the organization of contrast-related modulations across depth differed significantly from that of modulations caused by feature-based attention, as shown by a source (bottom-up, top-down) × layer (deep, middle, superficial) interaction (F [46, 2]=4.39, p=0.018). As such, the laminar profiles of response modulations across the early visual cortex were dependent on whether those modulations were bottom-up or top-down in origin.

We predicted that top-down effects were more likely to be expressed in agranular layers compared to bottom-up effects. To explicitly test for this, we computed a score that described whether experimental effects were more agranular or granular. This was done by averaging the effect of feature-based attention (or contrast) across the superficial and deep depth bins (agranular) and subtracting the effect in the middle bin (granular) from that average. As such, a positive score indicates a largely agranular effect, while a negative score indicates a granular effect. As predicted, feature-based attention effects were more agranular compared to stimulus contrast (Figure 3E). This difference was significant (t [23]=3.11, p=0.005), and 20 of our 24 subjects showed an effect in this direction (Figure 3F). Therefore, it appears that top-down contributions to response modulations were stronger in the agranular layers compared to bottom-up contributions, which were strongest in the granular layer. As can be seen from Figure 3B, the agranular profile of attention was driven by the fact that the attentional modulation was strongest in the superficial layers.
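The score and the paired comparison can be made concrete with a short sketch. Following the definition in the Figure 3 caption, the score is the mean of the deep and superficial effects minus the middle effect, so positive values indicate a more agranular profile. The function names and the per-subject effect sizes below are invented for illustration and are not data from the study.

```python
import numpy as np

def agranular_granular_score(deep, middle, superficial):
    """Mean of deep and superficial effects minus the middle effect."""
    deep, middle, superficial = (np.asarray(x, dtype=float)
                                 for x in (deep, middle, superficial))
    return (deep + superficial) / 2.0 - middle

def paired_t(a, b):
    """Paired-samples t statistic for two equally long score vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Hypothetical effects for four subjects; columns: deep, middle, superficial.
attention = np.array([[0.10, 0.10, 0.16],
                      [0.08, 0.09, 0.15],
                      [0.12, 0.10, 0.14],
                      [0.09, 0.11, 0.17]])
contrast = np.array([[0.20, 0.30, 0.22],
                     [0.18, 0.27, 0.20],
                     [0.21, 0.31, 0.25],
                     [0.19, 0.28, 0.21]])

att_score = agranular_granular_score(*attention.T)
con_score = agranular_granular_score(*contrast.T)
n_positive = int(np.sum(att_score - con_score > 0))  # subjects in predicted direction
t_stat = paired_t(att_score, con_score)
```

Counting subjects with a positive difference mirrors the per-subject summary in Figure 3F, and the t statistic mirrors the paired comparison reported in the text.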

Laminar profiles are similar across early visual areas

We next explored how modulations from feature-based attention and stimulus contrast varied across cortical depth within visual areas, and potential differences in organization between areas. We estimated depth-specific effects of attention and contrast for V1, V2 and V3 using the same methods applied to the three areas combined (Materials and methods). Similar to our original analysis, variation in the effect of attention across depth over V1-V3 (Figure 4A) did not reach significance (F [46, 2]=2.54, p=0.090), and attention depth profiles were similar across areas (F [69.67, 3.03]=0.44, p=0.778). The effect of contrast did vary across cortical depth (F [36.28, 1.58]=7.52, p=0.004), peaking at middle depths (Figure 4B), but this profile was not significantly different between the three areas (F [92, 4]=1.39, p=0.244). When directly contrasting these two modulatory factors, there was an overall, area-independent difference between feature-based attention and stimulus contrast that approached significance (F [46, 2]=2.74, p=0.075), but no significant differences between areas (F [80.38, 3.50]=1.00, p=0.407).

Figure 4. Laminar organization of top-down and bottom-up response modulations in V1, V2 and V3.

(A) Average difference between depth-specific time courses in voxels that preferred the attended orientation and voxels that preferred the unattended orientation in V1 (orange), V2 (blue) and V3 (green). (B) Average difference between depth-specific time courses from blocks containing high and low contrast stimuli for V1, V2 and V3. (C) Average difference between attention modulations (taken from panel A) and contrast modulations (from panel B) in the agranular layers (average of deep and superficial bins) and granular layer (middle bin) for V1, V2 and V3. A positive score indicates a more agranular response, negative indicates more granular. All error bars show within-subject standard error.

We also computed scores describing how agranular or granular effects of attention and contrast were within V1, V2 and V3 (Figure 4C). In general, modulations from feature-based attention were more agranular compared to those from stimulus contrast (F [23, 1]=5.48, p=0.028), and this was consistent across visual areas (F [46, 2]=0.51, p=0.607). Overall, these results show highly similar behavior of the three early visual regions (V1, V2, V3) that we examined, in terms of both their bottom-up and top-down laminar activation profiles.

Discussion

We measured laminar fMRI responses from the human visual cortex during a visual task designed to induce bottom-up and top-down response modulations via orthogonal manipulations of stimulus contrast and feature-based attention. BOLD responses were strongly modulated by both feature-based attention and stimulus contrast, and these effects were expressed at different cortical depths. Effects of stimulus contrast were considerably larger at middle cortical depths compared to deep and superficial depths, while effects of feature-based attention were more even across depth, peaking in superficial cortex. Moreover, by comparing the strength of attention and contrast modulation in agranular versus the granular layers, we found that attention effects were expressed more strongly in the agranular layers (specifically the superficial layers) compared to effects from stimulus contrast, which were more granular.

Our results show clear differences in how bottom-up and top-down aspects of perceptual processing affect brain responses across cortical depth and are consistent with the anatomical organization of feedforward and feedback connections between brain areas (Rockland and Pandya, 1979). To our knowledge, our study also provides the first report of how visual cortex responses are modulated by feature-based attention at the laminar level. Most importantly, we demonstrate that laminar fMRI methods can be used to examine both the bottom-up and top-down components of the overall BOLD response as they co-occur during the processing of a stimulus. Previous laminar fMRI studies have either measured depth-specific effects in the absence of a physical stimulus (Kok et al., 2016; Lawrence et al., 2018; Muckli et al., 2015), or in the presence of a stimulus that was held constant (De Martino et al., 2015; Klein et al., 2018). By orthogonally manipulating stimulus contrast and feature-based attention, we have shown that top-down effects can be separated from concurrent bottom-up modulations driven by the stimulus. This opens the door for future studies to further examine the dynamic interactions between bottom-up and top-down processing that occur in the context of stimulus processing.

Top-down modulations of the BOLD response were expressed at all cortical depths relatively evenly, slightly peaking in the superficial layers. This partly contrasts with our previous study, which observed top-down activation of V1 during visual working memory that was strong in the agranular layers, but much weaker in the middle layer (Lawrence et al., 2018). The most obvious difference between the two studies is the presence of a physical stimulus during top-down modulation in this study, while there was no stimulus during working memory in our previous study. Each stimulus is expected to trigger a large response in the middle layer of V1 driven by bottom-up connections from the LGN (Hubel and Wiesel, 1972). Interestingly, influences of feature-based attention have been reported in the LGN before (Ling et al., 2015; Schneider, 2011), suggesting that this bottom-up signal could carry attentional modulations, consistent with our data. Electrophysiological studies of laminar effects of attention report mixed results regarding the involvement of the granular layer of V1 in attention. van Kerkoerle et al. (2017) report increased spike rate and current sinks with attention that were largest in the agranular layers. In particular, Van Kerkoerle et al. report strong attentional modulations in the deep layers compared to the middle layer, which was not the case in our data. Other studies (Denfield et al., 2018; Hembrook-Short et al., 2017) report attentional effects on spike rate in all layers of V1. However, it should be noted that these studies utilized spatial attention as opposed to feature-based attention, making it unclear how comparable their results are to our study. More research is required to further elucidate the laminar circuits involved in different modes of attentional control.

The relatively similar strength of attentional modulations across cortical layers highlights an important aspect of our task design. Taken in isolation, the laminar profile of feature-based attention we report could be viewed as difficult to interpret, as there are no obvious differences between cortical depths. Crucially, however, the comparison of this profile to one derived from a manipulation of stimulus contrast revealed clear differences in how the visual cortex is modulated depending on the source of the modulation. We encourage future laminar studies exploring top-down responses to also include a bottom-up manipulation as a point of comparison, as the laminar organization of a BOLD activity difference between conditions on its own can be challenging to interpret (Self et al., 2017).

It is possible that we found top-down effects were similar across cortical depth due to the blurring of BOLD responses across depth bins from spatial hemodynamics. Our task involved repeated presentation of a series of visual stimuli, which is expected to cause large swathes of stimulus-related activity in V1 that starts in layer four and quickly spreads to other layers (Self et al., 2013). This activity is in turn expected to be spatially blurred in the BOLD response by venous draining towards the pial surface, which smooths responses across cortical depth, causing stronger responses at superficial depths (Uğurbil et al., 2003; Uludağ and Blinder, 2018; Yacoub et al., 2005). It is therefore possible that repeated visual stimulation could have effectively washed out depth-specific responses, increasing the likelihood of experimental effects being uniform across depth. That said, any influence of spatial hemodynamics should be consistent across experimental conditions, and therefore accounted for in our calculation of bottom-up/top-down modulations via a subtraction of the responses to different contrast/attention conditions. Indeed, the strikingly distinct laminar profile of stimulus contrast effects that clearly peaked in the middle layers indicates that our analysis could account for the influence of hemodynamics. Nevertheless, obtaining accurate depth estimates of BOLD responses continues to be the biggest challenge in laminar fMRI. Recent developments in modeling spatial aspects of the BOLD response for applying a spatial deconvolution to BOLD data (Markuerkiaga et al., 2016, ISMRM, abstract; Marquardt et al., 2018) and improved measurement protocols (Huber et al., 2017) could help to alleviate this issue.

We show that modulations of stimulus-driven responses were similar across areas within the early visual cortex. For stimulus contrast, this is consistent with a purely stimulus-driven effect that changes response amplitude at early, subcortical levels and is inherited through the visual system via bottom-up connections targeting layer 4 (Hubel and Wiesel, 1972; Rockland and Pandya, 1979). With regards to attention, there is little work addressing laminar differences between visual brain areas. Nandy et al. (2017) report attentional modulations in all layers of V4, consistent with our findings in extrastriate areas V2 and V3, as well as V1, but they do not provide a comparison to other brain areas. Buffalo et al. (2011) report attentional modulation of gamma and alpha oscillations in deep and superficial cortex that were similar in V1, V2 and V4. Though they did not measure from granular layer neurons, and thus cannot comment on whether attentional modulations occurred in all layers or only agranular layers, the similarity of results across visual brain areas appears consistent with our study. However, we again note that how these results compare to our own is unclear as these studies used spatial attention, not feature-based attention, as well as a variety of electrophysiological measurements with an unclear relation to the BOLD signal. For future studies, laminar fMRI is well suited to exploring laminar differences between brain areas as it affords simultaneous measurements over larger areas of cortex compared to electrophysiological methods.

In conclusion, we have shown that fMRI responses in visual cortex are strongly modulated by changes in stimulus contrast and feature-based attention, and that these effects operate at different cortical depths. Top-down modulations from attention were overall stronger in agranular layers (specifically the superficial layers) compared to those from stimulus contrast, which were strongest in the granular layer. We have shown that, in a task where bottom-up and top-down influences are manipulated independently, the overall BOLD response can be separated into top-down and bottom-up components by examining how these effects are organized across depth. Future studies can use similar strategies to further explore the dynamic interactions between bottom-up and top-down processing that occur in perception and cognition.

Materials and methods

Participants

Twenty-six healthy participants (all right-handed, nine males, mean age 25.5, age range 19–47) with normal or corrected-to-normal vision completed the experiment. This sample size (N = 26) provided us with 80% power to detect one-sided experimental effects that had at least medium effect size (Cohen’s d > 0.6). All gave written informed consent and the study was approved by the local ethics committees (CMO region Arnhem-Nijmegen, The Netherlands, and ethics committee of the University Duisburg-Essen, Germany, protocol CMO 2014/288). Participants were reimbursed for their time at the rate of €10 per hour. All participants completed a 1 hr 3T fMRI retinotopic mapping session, a 1 hr psychophysics session, and a 1 hr 7T fMRI session for the main task. The experiment and analysis plan were preregistered on the Open Science Framework (https://osf.io/46adc/).

Retinotopic mapping

Retinotopic mapping data were acquired and analyzed using identical methods to those reported in our previous study (Lawrence et al., 2018). In brief, brain responses to rotating wedge and expanding ring checkerboard stimuli were acquired using a Siemens 3T Trio MRI system (Siemens, Erlangen, Germany) with a 32-channel head coil and a T2*-weighted gradient-echo EPI sequence (TR 1500 ms, TE 40 ms, 68 slices, 2 mm isotropic voxels, multi-band acceleration factor 4). One high resolution anatomical image was also acquired with a T1-weighted MP-RAGE sequence (TR 2300 ms, TE 3.03 ms, 1 mm isotropic voxels, GRAPPA acceleration factor 2). Anatomical data were automatically segmented into white matter, gray matter and CSF using FreeSurfer (http://surfer.nmr.mgh.harvard.edu/). Functional data were analyzed using the phase encoded approach in MrVista (http://white.stanford.edu/software/). Polar angle and eccentricity data were visualized on an inflated cortical surface and the boundaries of V1, V2 and V3 were drawn manually using established criteria (Engel et al., 1994; Sereno et al., 1995; Wandell et al., 2007).

Psychophysics procedure

During the psychophysics session subjects completed the same visual task (Figure 1) that was used in the 7T main task fMRI session. Plaid stimuli were programmed in MATLAB (MathWorks, Natick, MA) and presented using PsychToolbox (Brainard, 1997) on a 24 inch BenQ XL2420T monitor (http://www.benq.eu/product/monitor/, resolution 1920 × 1080, refresh rate 120 Hz). Plaids comprised orthogonally oriented sets of bars (one set black, one set white), overlaid on top of each other. Areas of overlap between bars were made mid-gray (the same as the background), to facilitate mental separation of the two component stimuli. Subjects viewed the stimuli from a chin rest mounted 70 cm from the display and were instructed to fixate on a central fixation dot (0.5 degrees of visual angle across) at all times. Plaids were presented centrally behind an annulus mask (inner radius one degree, outer radius eight degrees) and had a spatial frequency of 1 cycle/degree and random phase. Stimulus edges were softened with a linear ramp that started 0.5 degrees from the edge of the mask.

The task used a block design. Stimulus blocks were preceded by an attention cue that lasted 2 s, where attention was cued by the color of the fixation dot (red = attend clockwise, green = attend counter-clockwise). The fixation dot remained red/green for the duration of the stimulus block. Stimulus blocks comprised a series of 8 plaid stimuli presented sequentially at a rate of 0.5 Hz (1.75 s on, 0.25 s off). Subjects’ task was to press one of two buttons indicating whether the bars in the attended orientation were thicker or thinner than they were in the previously presented stimulus. Subjects were instructed to attend, but not respond to, the first stimulus in each block (as there was no preceding stimulus to compare to) and to respond to all remaining stimuli within the block. Subjects were allowed to respond at any time during stimulus presentation or the inter-stimulus interval, but the trial was marked as incorrect if they did not respond before the next stimulus was presented. Bar width for clockwise and counter-clockwise bars varied independently from each other, meaning attention had to be focused on the cued orientation in order to succeed at the task.

Changes in bar width between stimuli were controlled using a QUEST (Watson and Pelli, 1983) staircase function targeting 80% correct performance, which was updated after each individual trial. Bars within the first plaid presented in each block had a bar width of 0.2 degrees ± a random increment between 0 and 0.02 degrees. For the remaining stimuli bar width was equal to the width of the previously presented stimulus ± an increment decided by the staircase. For both sets of bars, the direction of width increments was pseudo-randomized such that they were positive for four stimuli in each block and negative for four stimuli, presented in a random order. Stimulus luminance polarities were held constant within blocks but randomized between blocks, ensuring that both positive and negative luminance polarities were presented the same number of times for each experimental condition. After a stimulus block, the fixation dot turned black and was presented for 1 s. This was followed by performance feedback presented as a mark out of 7 for correct trials in the previous block, presented for 1 s. A 2 s attention cue then preceded the onset of the next stimulus block.

Subjects completed 24 blocks of the task, at which point the discrimination threshold for 80% correct performance was recorded for use in the main fMRI task. This process was performed once using high contrast stimuli (80% Michelson contrast), and once using low contrast stimuli (30% Michelson contrast), meaning separate thresholds were estimated for two contrast levels, which were used to match task difficulty across contrast levels in the fMRI experiment.

fMRI data acquisition

fMRI data for the main experiment were acquired using a Siemens Magnetom 7T MRI system (Siemens, Erlangen, Germany) with a commercial RF head coil (Nova Medical, Inc, Wilmington, MA, USA) with one transmit (TX) and 32 receive (RX) channels and a gradient coil (Type AS095, Siemens Healthcare, Erlangen, Germany) with 38 mT/m gradient strength and 200 mT/m/ms slew rate. Functional data were acquired with a T2*-weighted 3D gradient-echo EPI sequence (Poser et al., 2010; TR 3408 ms, TE 28 ms, 0.8 mm isotropic voxels, 16° flip angle, 192 × 192 × 38.4 mm FOV, GRAPPA acceleration factor 4). Shimming was performed using the standard Siemens shimming procedure for 7T. Anatomical data were acquired with an MP2RAGE sequence (Marques et al., 2010; TR 5000 ms, TE 2.04 ms, voxel size 0.8 mm isotropic, 240 × 240 mm FOV, GRAPPA acceleration factor 2) yielding two inversion contrasts (TI 900 ms, 4° flip angle and TI 3200 ms, 6° flip angle), which were combined to produce a T1-weighted image. We also acquired a T2-weighted HASTE scan that was used to identify the calcarine sulcus to aid functional slice positioning (TR 3230 ms, TE 67 ms, seven coronal slices, 0.625 × 0.625 × 5.10 mm voxels). Stimuli were programmed and displayed using the same methods described for the psychophysics session onto a rear-projection screen using an EIKI (EIKI, Rancho Santa Margarita, CA) LC-X71 projector (1024 × 768 resolution, refresh rate 60 Hz), viewed via a mirror (view distance ~130 cm).

fMRI procedure

Each subject completed 3 runs of the main task. The task was identical to the psychophysics session, except blocks of high and low contrast stimuli were randomly interleaved rather than presented in separate sessions, and timings were adjusted to sync with volume acquisition: The attention cue preceding a stimulus block was presented for 1.04 s, stimulus blocks lasted 16 s, followed by 1 s of fixation, 0.5 s of feedback, and a 15.54 s inter-block interval with a black fixation dot to allow the BOLD response to return to baseline before the next block. Changes in bar width for high and low contrast blocks were controlled by separate staircases, which were given starting estimates equal to contrast-specific discrimination thresholds measured in the psychophysics session plus a 20% increment. Due to a problem with recording button responses in one session, the behavioral results reported in the Results section were calculated using data from the remaining 23 subjects. Five volumes were acquired per stimulus block, and five volumes between blocks, with 16 blocks in a single 555.5 s run. This run time also includes three dummy volumes that were discarded from the start of each run to allow for signal stabilization.
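The timings above can be checked against the volume acquisition with a little arithmetic. The sketch below is illustrative only: it uses the durations, TR, block count and dummy-volume count stated in the text, and the variable names are ours.

```python
# Check that the block timings line up with the functional TR (values from the text).
TR = 3.408  # s, repetition time of one functional volume

cue, stimulus, fixation, feedback, inter_block = 1.04, 16.0, 1.0, 0.5, 15.54

# One full cycle: attention cue, stimulus block, fixation, feedback, inter-block interval.
block_cycle = cue + stimulus + fixation + feedback + inter_block
volumes_per_cycle = block_cycle / TR

# A run holds 16 block cycles plus three dummy volumes at the start.
n_blocks, n_dummy = 16, 3
run_duration = n_blocks * block_cycle + n_dummy * TR

print(f"cycle: {block_cycle:.2f} s = {volumes_per_cycle:.1f} volumes")
print(f"run:   {run_duration:.1f} s")
```

Under these values the cue plus stimulus block spans 17.04 s (five volumes), as does the fixation, feedback and inter-block period, which agrees with the five-plus-five volume structure and the 555.5 s run length given above.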

After the main task, subjects completed an orientation localizer scan that was used to measure voxel-wise orientation preference. Single sets of oriented bars (i.e., one stimulus component from the plaid stimuli presented in isolation) were presented in an AoBo block design. Stimulus blocks were 13.6 s long (4 TR) and separated by rest blocks of the same length. During a stimulus block bars were repeatedly presented with the same orientation at a rate of 2 Hz (250 ms on, 250 ms off). Stimuli were presented at 100% contrast, luminance polarity was reversed with each stimulus presentation, and phase was randomized. Stimulus blocks alternated between blocks of clockwise bars (45°) and blocks of counter-clockwise bars (135°). A total of 16 stimulus blocks were presented in a 446.4 s run; the first three volumes were again discarded. During the scan subjects maintained fixation and pressed a button every time the fixation dot flashed white for 0.25 s (1 to 4 flashes per block).

Preprocessing of fMRI data

We used the same data processing pipeline as our previous study (Lawrence et al., 2018). Functional volumes were cropped so that only the occipital lobe remained, and spatially realigned within and then between runs using SPM8 (http://www.fil.ion.ucl.ac.uk/spm). Finally, data were highpass filtered using FEAT (fMRI Expert Analysis Tool) v6.00 (https://fsl.fmrib.ox.ac.uk/fsl) with a cut off of 55 s to remove low frequency scanner drift.

7T anatomical data were segmented into white matter, gray matter and CSF using FreeSurfer’s (http://surfer.nmr.mgh.harvard.edu/) automated procedure. The white and gray matter surfaces were then aligned to the mean functional volume using a standard rigid body registration (Greve and Fischl, 2009) followed by a recursive non-linear distortion correction that has been described previously (Lawrence et al., 2018; van Mourik et al., 2018a).

Definition of functional masks

We defined orientation-selective masks in V1, V2 and V3 using methods we have described previously (Lawrence et al., 2018). Note that the voxel selection procedure described here was applied to data from the orientation localizer scan; a data set that was independent from the main task. In brief, a GLM was applied to functional localizer data using FEAT v6.00 (https://fsl.fmrib.ox.ac.uk/fsl) to identify voxels that responded significantly to all stimuli presented in the localizer (z > 2.3, p<0.05). For two subjects, there were very few voxels within the visual cortex that survived this cluster correction, indicating they had failed to remain alert for the duration of the experiment, and so we did not make any further use of their data. Next we contrasted responses to clockwise and counter-clockwise bars, and created masks of 1000 voxels per area, containing the 500 voxels with the most positive t values in this contrast (prefer clockwise) and the 500 with the most negative t values (prefer counter-clockwise). This was done separately for V1, V2 and V3. In any cases where there were fewer than 500 voxels within an area that met the required criteria for being visually active and having an orientation preference, we used as many voxels as did fulfill the criteria. To ensure that our results did not depend on how many and which voxels we chose to include in our masks, we ran a battery of control analyses using an array of different mask sizes (Figure 3—figure supplement 1–9). Although effect sizes varied across mask sizes, all produced effects in the same direction as our main analysis.
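The voxel selection step can be sketched as follows. This is a minimal illustration on simulated values, not the code used in the study. The function name and its arguments are hypothetical, and we assume that "having an orientation preference" can be read as the sign of the clockwise-minus-counter-clockwise t value.

```python
import numpy as np

def select_orientation_voxels(t_values, visually_active, n_per_group=500):
    """Return indices of voxels preferring clockwise and counter-clockwise bars.

    t_values: clockwise-minus-counter-clockwise t value per voxel (1D array).
    visually_active: boolean mask of voxels that passed the localizer threshold.
    """
    t_values = np.asarray(t_values, dtype=float)
    active = np.asarray(visually_active, dtype=bool)

    # Candidates must be visually active and show the right sign of preference
    # (assumption: sign of t defines the preferred orientation).
    cw_candidates = np.flatnonzero(active & (t_values > 0))
    ccw_candidates = np.flatnonzero(active & (t_values < 0))

    # Keep the strongest preferences, at most n_per_group per population;
    # if fewer voxels qualify, all qualifying voxels are kept.
    cw = cw_candidates[np.argsort(t_values[cw_candidates])[::-1][:n_per_group]]
    ccw = ccw_candidates[np.argsort(t_values[ccw_candidates])[:n_per_group]]
    return cw, ccw


rng = np.random.default_rng(0)
t = rng.normal(size=5000)          # simulated t values, not real data
active = rng.random(5000) < 0.6    # simulated "visually active" mask
cw_idx, ccw_idx = select_orientation_voxels(t, active)
```

Changing `n_per_group` gives a simple way to repeat an analysis over different mask sizes, in the spirit of the control analyses mentioned above.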

Quantification of effects of feature-based attention and stimulus contrast

Overall effects of feature-based attention and changes in stimulus contrast were quantified using a temporal GLM applied using FEAT v6.00 (https://fsl.fmrib.ox.ac.uk/fsl) on the preprocessed functional data. Each of the four experimental conditions (attend clockwise high contrast, attend clockwise low contrast, attend counter-clockwise high contrast, attend counter-clockwise low contrast) was modeled as a separate regressor of interest and contrasted against baseline to estimate % signal changes associated with each condition. This was applied to orientation-selective masks from V1-V3 combined, and also to each area separately. % signal changes are shown in Figure 2. Signal changes associated with attended orientations were calculated by averaging responses from clockwise preferring voxels to attend clockwise blocks and counter-clockwise preferring voxels to attend counter-clockwise blocks. Likewise, unattended responses were calculated by averaging responses from clockwise preferring voxels to attend counter-clockwise blocks and vice versa.

Estimation of laminar responses

Laminar-specific time courses were estimated using the open fMRI analysis toolbox (van Mourik et al., 2018b) as we have described previously (Lawrence et al., 2018). In brief, segmented cortical meshes were divided into five depth bins: white matter, three equivolume gray matter bins, and CSF. The proportion of overlap between each voxel within our orientation-selective masks and these five bins was estimated, creating a matrix of depth weights describing the laminar organization of a population of voxels. These weights were regressed against the functional data from the same voxels to produce a single time course for each depth bin representative of the average response across the population at that cortical depth. This process was applied separately to the clockwise and counter-clockwise preferring voxel populations from V1-V3 combined to examine overall laminar activity across the visual cortex (Figure 3). We also did the same for V1, V2 and V3 separately to examine differences in laminar organization between areas (Figure 4).
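The regression of depth weights against voxel data can be illustrated with an ordinary least-squares fit. The sketch below runs on simulated data and does not reproduce the toolbox implementation. The array names, the noise level and the use of `numpy.linalg.lstsq` are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_bins, n_volumes = 1000, 5, 160

# Hypothetical depth weights: each row is the fraction of one voxel's volume
# lying in each bin (white matter, deep, middle, superficial, CSF); rows sum to 1.
weights = rng.dirichlet(np.ones(n_bins), size=n_voxels)

# Simulated "true" time course per bin; voxel data are weighted mixtures plus noise.
true_courses = rng.normal(size=(n_bins, n_volumes))
voxel_data = weights @ true_courses + 0.1 * rng.normal(size=(n_voxels, n_volumes))

# Least squares across voxels, solved for every volume at once:
# voxel_data ≈ weights @ depth_courses
depth_courses, *_ = np.linalg.lstsq(weights, voxel_data, rcond=None)

# Rows 1-3 are the three gray matter bins (deep, middle, superficial).
gray_matter_courses = depth_courses[1:4]
```

Because every voxel contributes according to its overlap with each bin, partial-volume voxels still inform the estimate rather than being assigned wholesale to a single depth.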

Normalization of layer-specific responses

It is well established that gradient-echo BOLD suffers from a bias in signal strength whereby responses in superficial cortex are stronger than responses from deep cortex (Koopmans et al., 2010; Uğurbil et al., 2003; Uludağ and Blinder, 2018; Yacoub et al., 2005). This bias can be seen clearly in our raw data (see Figure 2—figure supplement 1; Figure 3—figure supplement 3). We attempted to alleviate this issue by converting time courses specific to deep, middle and superficial cortical layers to z scores, normalizing differences in overall signal strength between layers. The z scoring was performed on layer-specific time courses from a single run of the main task, meaning it was performed within layers, across all experimental conditions and within runs. This procedure removed overall amplitude and variance differences between layers, while preserving within-layer differences between conditions. This had the effect of making overall signal changes between depth bins very similar (Figure 3—figure supplement 4), while preserving potential differences between depth bins that are due to experimental manipulations (rather than large differences in overall signal change). Of note, none of our results critically depend on this normalization step (Figure 3—figure supplement 5), but it allowed us to interpret those results in the absence of large-scale response differences between layers that are present in the raw data.
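The within-layer, within-run normalization can be sketched as below. This is an illustration on simulated time courses. The function name is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not specify it.

```python
import numpy as np

def zscore_within_run(layer_courses):
    """Z score each layer's time course across all volumes of a single run.

    layer_courses: array of shape (n_layers, n_volumes) for one run.
    """
    layer_courses = np.asarray(layer_courses, dtype=float)
    mean = layer_courses.mean(axis=1, keepdims=True)
    std = layer_courses.std(axis=1, ddof=0, keepdims=True)  # assumption: ddof=0
    return (layer_courses - mean) / std


# Simulated run: one underlying response, with larger gain and offset toward
# the superficial bin to mimic the gradient-echo bias (not real data).
rng = np.random.default_rng(2)
response = rng.normal(size=160)
raw = np.vstack([1.0 * response + 100,   # deep
                 1.5 * response + 110,   # middle
                 2.5 * response + 130])  # superficial
z = zscore_within_run(raw)
```

In this toy case the three normalized time courses become identical, because they differ only in gain and offset. Differences between conditions within a layer survive the normalization, because all conditions are scaled by the same layer-specific mean and standard deviation.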

Quantification of laminar-specific effects of feature-based attention and stimulus contrast

We analyzed time courses specific to deep, middle and superficial gray matter depth bins in the following way to quantify depth-specific effects of feature-based attention and stimulus contrast. Z scored, depth-specific time courses were split into segments of 10 volumes each, corresponding to one stimulus block (five volumes) followed by an inter-block interval (five volumes). To examine effects of feature-based attention, we computed an average attended time course by averaging responses for each block from the voxels that preferred the cued orientation in that block (i.e. prefer clockwise for attend clockwise blocks and prefer counter-clockwise for attend counter-clockwise blocks), and an average unattended time course by averaging responses from voxels that preferred the ignored orientation for each block (i.e. prefer clockwise for attend counter-clockwise blocks and prefer counter-clockwise for attend clockwise blocks). To examine effects of stimulus contrast, we averaged responses from both populations of voxels, regardless of orientation preference, averaging across all high contrast blocks and low contrast blocks to produce separate average time courses for high and low contrast stimuli. This analysis procedure was performed separately on time courses from the three gray matter depth bins within each subject, and then a group average was calculated. Figure 3A&C show group average time courses for each experimental condition, averaged across gray matter bins. The strength of modulations from feature-based attention and stimulus contrast was quantified as the difference between condition-specific time courses during the peak of the stimulus-driven response (highlighted in Figure 3A&C); these differences are plotted for each depth bin in Figure 3B&D. Finally, we computed a score to describe the extent to which an effect of interest was expressed in the agranular or granular layers. This was achieved by averaging the effect of attention or stimulus contrast (Figure 3B&D) from the superficial and deep gray matter bins (agranular) and subtracting the middle bin (granular). A positive score therefore indicates a mostly agranular effect, while a negative score indicates a granular effect. The procedure described here was applied first to voxels from all visual areas combined (Figure 3), and then V1, V2 and V3 separately (Figure 4).
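The block segmentation, the peak-window difference and the agranular minus granular score can be sketched as follows. All function names are hypothetical and the numbers are invented for illustration. The default peak window (volumes 2 to 5 of each segment) is an assumption based on the time points listed under Statistical testing (6.8 to 17 s), with volume onsets counted in multiples of the TR from block onset.

```python
import numpy as np

def split_into_blocks(course, volumes_per_block=10):
    """Reshape one run's time course (dummy volumes removed) into blocks x volumes."""
    return np.asarray(course, dtype=float).reshape(-1, volumes_per_block)

def modulation(course_a, course_b, peak=slice(2, 6)):
    """Mean difference between two block-averaged time courses in the peak window."""
    diff = np.asarray(course_a, dtype=float) - np.asarray(course_b, dtype=float)
    return diff[peak].mean()

def agranular_minus_granular(effect_by_depth):
    """effect_by_depth: modulation strengths ordered deep, middle, superficial."""
    deep, middle, superficial = effect_by_depth
    return (deep + superficial) / 2 - middle


# Hypothetical per-depth modulations (arbitrary units, not values from the study).
attention_effect = np.array([0.10, 0.10, 0.14])
contrast_effect = np.array([0.20, 0.35, 0.22])

attention_score = agranular_minus_granular(attention_effect)  # positive: more agranular
contrast_score = agranular_minus_granular(contrast_effect)    # negative: more granular
```

With these invented inputs the attention score is positive and the contrast score negative, matching the sign convention described above.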

Statistical testing

Overall effects of feature-based attention and stimulus contrast (Figure 2) were assessed using a visual area (V1/V2/V3) x contrast (high/low) x attention (attended/unattended) repeated measures ANOVA. Note that, though we plot the results from V1-V3 combined in Figure 2, the ANOVA was performed on data from the three areas separately so that it would incorporate differences between areas.

The effects of feature-based attention and stimulus contrast in laminar-specific time courses from V1-V3 combined were quantified by examining the difference between attended and unattended (or high and low contrast) time courses during the peak of the stimulus-driven response during a block of stimuli (highlighted in Figure 3A&C). These were assessed using separate condition (attended/unattended or high/low contrast) x time point (6.8/10.2/13.6/17 s) repeated measures ANOVAs. These tests were performed on time courses averaged across depth bins that are plotted in Figure 3A&C. Depth-specific time courses were analyzed in the same way, and the difference between attended/unattended and high/low contrast for each depth are plotted in Figure 3B&D, respectively. We investigated whether these laminar profiles were different from each other using a modulation (attention/contrast) by depth (deep/middle/superficial) repeated measures ANOVA. A significant interaction (see Results) revealed the profiles were different from each other, so we examined them independently with one-way repeated measures ANOVAs (levels: deep/middle/superficial). In the cases that the main effect of depth was significant (i.e., for stimulus contrast), differences between depths were examined with paired-samples t tests. Finally, the difference between agranular – granular scores for attention and stimulus contrast was assessed using a paired-samples t test.
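As an illustration of the final comparison, the paired-samples t statistic on agranular minus granular scores can be computed directly from the per-subject differences. The sketch below uses simulated scores for 24 hypothetical subjects. The function name, means and spreads are invented and do not reflect the study's data.

```python
import numpy as np

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for matched arrays."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = diff.size
    # Standard error of the mean difference uses the sample SD (ddof=1).
    t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return t, n - 1


# Simulated agranular-minus-granular scores per subject (not real data).
rng = np.random.default_rng(3)
attention_scores = rng.normal(loc=0.02, scale=0.1, size=24)
contrast_scores = rng.normal(loc=-0.10, scale=0.1, size=24)

t_stat, dof = paired_t(attention_scores, contrast_scores)
```

A statistics package would normally be used to obtain the corresponding p value; for example, `scipy.stats.ttest_rel` returns the same statistic together with a two-sided p value by default.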

Laminar-specific effects of attention and stimulus contrast were also compared between visual areas (Figure 4). Differences between laminar profiles of attention and contrast and between areas were assessed with a visual area (V1/V2/V3) x modulation (attention/contrast) x depth (deep/middle/superficial) repeated measures ANOVA. The modulation x depth interaction approached significance (see Results), and we chose to examine the effects of attention and contrast using separate ANOVAs so that we might relate these results to those obtained from V1-V3 combined. As such, we examined whether the effects of attention and contrast varied across depth and visual area using separate visual area (V1/V2/V3) x depth (deep/middle/superficial) repeated measures ANOVAs. Finally, differences in agranular – granular scores between conditions and areas were assessed using a modulation (attention/contrast) x visual area (V1/V2/V3) repeated measures ANOVA. For all the ANOVAs we conducted, in cases where the assumption of sphericity was violated the degrees of freedom were adjusted using a Huynh-Feldt correction.

Acknowledgements

We are grateful to Julia Pottkämper for valuable assistance in data collection and to Matthias Fritsche and Benedikt Ehinger for helpful comments on an earlier version of the manuscript. This work was supported by The Netherlands Organisation for Scientific Research (NWO Vidi grant 452-13-016) and the EC Horizon 2020 Program (ERC starting grant 678286, ‘Contextvision’), both awarded to FPdL.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Floris P de Lange, Email: floris.delange@donders.ru.nl.

Christian Büchel, University Medical Center Hamburg-Eppendorf, Germany.

Michael J Frank, Brown University, United States.

Funding Information

This paper was supported by the following grants:

  • Netherlands Organisation for Scientific Research Vidi grant 452-13-016 to Floris P de Lange.

  • European Research Council Starting Grant 678286 CONTEXTVISION to Floris P de Lange.

Additional information

Competing interests

Reviewing editor, eLife.

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Resources, Supervision, Methodology, Writing—review and editing.

Conceptualization, Formal analysis, Supervision, Funding acquisition, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Ethics

Human subjects: All participants gave written informed consent and the study was approved by the local ethics committees (CMO region Arnhem-Nijmegen, The Netherlands, and ethics committee of the University Duisburg-Essen, Germany). Protocol CMO 2014/288.

Additional files

Transparent reporting form
DOI: 10.7554/eLife.44422.017

Data availability

Data and code used for stimulus presentation and analysis are available online at the Donders Research Data Repository: https://data.donders.ru.nl/collections/di/dccn/DSC_3018028.04_752.

The following dataset was generated:

Lawrence S, Norris DG, de Lange FP. 2019. Dissociable laminar profiles of bottom-up and top-down modulation in the human visual cortex. Donders Repository. DSC_3018028.04_752

References

  1. Albers AM, Meindertsma T, Toni I, de Lange FP. Decoupling of BOLD amplitude and pattern classification of orientation-selective activity in human visual cortex. NeuroImage. 2018;180:31–40. doi: 10.1016/j.neuroimage.2017.09.046.
  2. Anderson JC, Martin KA. The synaptic connections between cortical areas V1 and V2 in macaque monkey. Journal of Neuroscience. 2009;29:11283–11293. doi: 10.1523/JNEUROSCI.5757-08.2009.
  3. Boynton GM, Demb JB, Glover GH, Heeger DJ. Neuronal basis of contrast discrimination. Vision Research. 1999;39:257–269. doi: 10.1016/S0042-6989(98)00113-8.
  4. Brainard DH. The psychophysics toolbox. Spatial Vision. 1997;10:433–436. doi: 10.1163/156856897X00357.
  5. Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R. Laminar differences in gamma and alpha coherence in the ventral stream. PNAS. 2011;108:11262–11267. doi: 10.1073/pnas.1011284108.
  6. De Martino F, Moerel M, Ugurbil K, Goebel R, Yacoub E, Formisano E. Frequency preference and attention effects across cortical depths in the human primary auditory cortex. PNAS. 2015;112:16036–16041. doi: 10.1073/pnas.1507552112.
  7. Denfield GH, Ecker AS, Shinn TJ, Bethge M, Tolias AS. Attentional fluctuations induce shared variability in macaque primary visual cortex. Nature Communications. 2018;9:2654. doi: 10.1038/s41467-018-05123-6.
  8. Dumoulin SO, Fracasso A, van der Zwaag W, Siero JCW, Petridou N. Ultra-high field MRI: advancing systems neuroscience towards mesoscopic human brain function. NeuroImage. 2018;168:345–357. doi: 10.1016/j.neuroimage.2017.01.028.
  9. Engel SA, Rumelhart DE, Wandell BA, Lee AT, Glover GH, Chichilnisky EJ, Shadlen MN. fMRI of human visual cortex. Nature. 1994;369:525. doi: 10.1038/369525a0.
  10. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex. 1991;1:1–47. doi: 10.1093/cercor/1.1.1.
  11. Greve DN, Fischl B. Accurate and robust brain image alignment using boundary-based registration. NeuroImage. 2009;48:63–72. doi: 10.1016/j.neuroimage.2009.06.060.
  12. Hembrook-Short JR, Mock VL, Briggs F. Attentional modulation of neuronal activity depends on neuronal feature selectivity. Current Biology. 2017;27:1878–1887. doi: 10.1016/j.cub.2017.05.080.
  13. Himmelberg MM, Wade AR. Eccentricity-dependent temporal contrast tuning in human visual cortex measured with fMRI. NeuroImage. 2019;184:462–474. doi: 10.1016/j.neuroimage.2018.09.049.
  14. Hubel DH, Wiesel TN. Laminar and columnar distribution of geniculo-cortical fibers in the macaque monkey. The Journal of Comparative Neurology. 1972;146:421–450. doi: 10.1002/cne.901460402.
  15. Huber L, Handwerker DA, Jangraw DC, Chen G, Hall A, Stüber C, Gonzalez-Castillo J, Ivanov D, Marrett S, Guidi M, Goense J, Poser BA, Bandettini PA. High-Resolution CBV-fMRI allows mapping of laminar activity and connectivity of cortical input and output in human M1. Neuron. 2017;96:1253–1263. doi: 10.1016/j.neuron.2017.11.005.
  16. Kamitani Y, Tong F. Decoding the visual and subjective contents of the human brain. Nature Neuroscience. 2005;8:679–685. doi: 10.1038/nn1444.
  17. Klein BP, Fracasso A, van Dijk JA, Paffen CLE, Te Pas SF, Dumoulin SO. Cortical depth dependent population receptive field attraction by spatial attention in human V1. NeuroImage. 2018;176:301–312. doi: 10.1016/j.neuroimage.2018.04.055.
  18. Kok P, Bains LJ, van Mourik T, Norris DG, de Lange FP. Selective activation of the deep layers of the human primary visual cortex by Top-Down feedback. Current Biology. 2016;26:371–376. doi: 10.1016/j.cub.2015.12.038.
  19. Koopmans PJ, Barth M, Norris DG. Layer-specific BOLD activation in human V1. Human Brain Mapping. 2010;31:1297–1304. doi: 10.1002/hbm.20936.
  20. Koopmans PJ, Barth M, Orzada S, Norris DG. Multi-echo fMRI of the cortical laminae in humans at 7 T. NeuroImage. 2011;56:1276–1285. doi: 10.1016/j.neuroimage.2011.02.042.
  21. Kuehn E, Sereno MI. Modelling the human cortex in three dimensions. Trends in Cognitive Sciences. 2018;22:1073–1075. doi: 10.1016/j.tics.2018.08.010.
  22. Lawrence SJD, Formisano E, Muckli L, de Lange FP. Laminar fMRI: applications for cognitive neuroscience. NeuroImage. 2017 doi: 10.1016/j.neuroimage.2017.07.004.
  23. Lawrence SJD, van Mourik T, Kok P, Koopmans PJ, Norris DG, de Lange FP. Laminar organization of working memory signals in human visual cortex. Current Biology. 2018;28:3435–3440. doi: 10.1016/j.cub.2018.08.043.
  24. Ling S, Pratte MS, Tong F. Attention alters orientation processing in the human lateral geniculate nucleus. Nature Neuroscience. 2015;18:496–498. doi: 10.1038/nn.3967.
  25. Markuerkiaga I, Barth M, Norris DG. A cortical vascular model for examining the specificity of the laminar BOLD signal. NeuroImage. 2016;132:491–498. doi: 10.1016/j.neuroimage.2016.02.073.
  26. Marquardt I, Schneider M, Gulban OF, Ivanov D, Uludağ K. Cortical depth profiles of luminance contrast responses in human V1 and V2 using 7 T fMRI. Human Brain Mapping. 2018;39:2812–2827. doi: 10.1002/hbm.24042.
  27. Marques JP, Kober T, Krueger G, van der Zwaag W, Van de Moortele PF, Gruetter R. MP2RAGE, a self bias-field corrected sequence for improved segmentation and T1-mapping at high field. NeuroImage. 2010;49:1271–1281. doi: 10.1016/j.neuroimage.2009.10.002.
  28. Martinez-Trujillo JC, Treue S. Feature-based attention increases the selectivity of population responses in primate visual cortex. Current Biology. 2004;14:744–751. doi: 10.1016/j.cub.2004.04.028.
  29. Muckli L, De Martino F, Vizioli L, Petro LS, Smith FW, Ugurbil K, Goebel R, Yacoub E. Contextual feedback to superficial layers of V1. Current Biology. 2015;25:2690–2695. doi: 10.1016/j.cub.2015.08.057.
  30. Nandy AS, Nassi JJ, Reynolds JH. Laminar organization of attentional modulation in macaque visual area V4. Neuron. 2017;93:235–246. doi: 10.1016/j.neuron.2016.11.029.
  31. Polimeni JR, Fischl B, Greve DN, Wald LL. Laminar analysis of 7T BOLD using an imposed spatial activation pattern in human V1. NeuroImage. 2010;52:1334–1346. doi: 10.1016/j.neuroimage.2010.05.005.
  32. Poser BA, Koopmans PJ, Witzel T, Wald LL, Barth M. Three dimensional echo-planar imaging at 7 tesla. NeuroImage. 2010;51:261–266. doi: 10.1016/j.neuroimage.2010.01.108.
  33. Rockland KS, Pandya DN. Laminar origins and terminations of cortical connections of the occipital lobe in the rhesus monkey. Brain Research. 1979;179:3–20. doi: 10.1016/0006-8993(79)90485-2.
  34. Saenz M, Buracas GT, Boynton GM. Global effects of feature-based attention in human visual cortex. Nature Neuroscience. 2002;5:631–632. doi: 10.1038/nn876.
  35. Scheeringa R, Koopmans PJ, van Mourik T, Jensen O, Norris DG. The relationship between oscillatory EEG activity and the laminar-specific BOLD signal. PNAS. 2016;113:6761–6766. doi: 10.1073/pnas.1522577113.
  36. Schneider KA. Subcortical mechanisms of feature-based attention. Journal of Neuroscience. 2011;31:8643–8653. doi: 10.1523/JNEUROSCI.6274-10.2011.
  37. Self MW, van Kerkoerle T, Supèr H, Roelfsema PR. Distinct roles of the cortical layers of area V1 in figure-ground segregation. Current Biology. 2013;23:2121–2129. doi: 10.1016/j.cub.2013.09.013.
  38. Self MW, van Kerkoerle T, Goebel R, Roelfsema PR. Benchmarking laminar fMRI: neuronal spiking and synaptic activity during top-down and bottom-up processing in the different layers of cortex. NeuroImage. 2017 doi: 10.1016/j.neuroimage.2017.06.045.
  39. Sereno MI, Dale AM, Reppas JB, Kwong KK, Belliveau JW, Brady TJ, Rosen BR, Tootell RB. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science. 1995;268:889–893. doi: 10.1126/science.7754376.
  40. Treue S, Martínez Trujillo JC. Feature-based attention influences motion processing gain in macaque visual cortex. Nature. 1999;399:575–579. doi: 10.1038/21176.
  41. Uludağ K, Blinder P. Linking brain vascular physiology to hemodynamic response in ultra-high field MRI. NeuroImage. 2018;168:279–295. doi: 10.1016/j.neuroimage.2017.02.063.
  42. Uğurbil K, Toth L, Kim DS. How accurate is magnetic resonance imaging of brain function? Trends in Neurosciences. 2003;26:108–114. doi: 10.1016/S0166-2236(02)00039-5.
  43. van Kerkoerle T, Self MW, Roelfsema PR. Layer-specificity in the effects of attention and working memory on activity in primary visual cortex. Nature Communications. 2017;8:13804. doi: 10.1038/ncomms13804.
  44. van Mourik T, Koopmans PJ, Norris DG. Improved cortical boundary registration for locally distorted fMRI. bioRxiv. 2018a doi: 10.1101/248120.
  45. van Mourik T, van der Eerden JPJM, Bazin P-L, Norris DG. Laminar signal extraction over extended cortical areas by means of a spatial GLM. PLOS ONE. 2018b;14:e0212493. doi: 10.1371/journal.pone.0212493.
  46. Wandell BA, Dumoulin SO, Brewer AA. Visual field maps in human cortex. Neuron. 2007;56:366–383. doi: 10.1016/j.neuron.2007.10.012.
  47. Watson AB, Pelli DG. QUEST: a bayesian adaptive psychometric method. Perception & Psychophysics. 1983;33:113–120. doi: 10.3758/BF03202828.
  48. Yacoub E, Van De Moortele PF, Shmuel A, Uğurbil K. Signal and noise characteristics of hahn SE and GE BOLD fMRI at 7 T in humans. NeuroImage. 2005;24:738–750. doi: 10.1016/j.neuroimage.2004.09.002.

Decision letter

Editor: Christian Büchel
Reviewed by: Christian Büchel

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Dissociable laminar profiles of concurrent bottom-up and top-down modulation in the human visual cortex" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Christian Büchel as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Michael Frank as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As you can see from the comments below, the reviewers found the data and the study of high interest. However, they have also indicated that they need to see more details to fully judge the validity of the claims. This refers to the details of normalization (Reviewer #2) and overall quality control issues (Reviewer #3), including the wish to see more raw data (Reviewers #2 and #3). Reviewer #2 also questions whether the task employed is a classical feature-based attention task. This would also need to be addressed. Finally, Reviewer #3 wonders if the pre-selection of voxels is statistically valid ('double dipping').

Note from Deputy Editor: eLife prefers supplemental data to be supplements to specific figures, so that all data are included in the body of the paper. Please see if your existing supplemental figures can be thought of as supplements to figures in the paper, and whether the additional data asked for in the paper can also be included in that form. You will see that some eLife papers have 2 or 3 supplements to some figures, so that the relevant additional data are grouped with the primary result. And of course, there is nothing to prevent you from just adding additional figures to the paper. We prefer that controls be viewed as an important part of the paper.

Reviewer #1:

This is the first study combining high resolution, layer resolving fMRI at 7T with a robust and elegant task that allows the investigation of top-down and bottom-up processing. The methodology is sound and the authors have successfully established layer resolved fMRI. Based on retinotopically identified areas they investigate how attending to a particularly oriented grating CW or CCW is represented in early visual areas (V1, V2 and V3). The bottom-up manipulation was implemented by changes in contrast. The top-down manipulation was feature attention, i.e. detecting changes in bar width. In a first step the authors identified voxels in the mapped areas that preferentially responded to either orientation. They then investigated both main effects, i.e. high vs. low stimulus contrast and attended vs. unattended orientation. Finally, they analyzed whether these main effects are stronger in a specific cortical layer (deep, middle and superficial). With respect to feature based attention they observed a trend that the superficial layer showed the strongest modulation, but this was not significant (p=0.07). In contrast, the bottom up contrast effect showed a significantly smaller main effect in the middle layer. They can also show a significant interaction (i.e. layer by type of attention). In a final step they investigated whether these effects differ between V1, V2 and V3. This analysis revealed no relevant differences in the observed patterns.

This is an interesting study showing that changes in stimulus contrast are predominantly represented in middle cortical layers. The study further suggests that feature based attention shows the opposite effect, yet this was only a trend. This is a clear data set and a very interesting result.

The paradigm employed is in essence a 2x2 factorial design with the factors bottom-up (i.e. stimulus contrast) and top-down (feature attention). Although Figure 2 suggests that there is no interaction, I was wondering whether (i) any voxels show such an interaction and (ii) whether this interaction would be differently expressed in different layers. Along these lines, Figure 4 collapses single main effects into a difference score, which does not allow the reader to interpret the full data. I agree that this might clutter the figure, but the authors should add a supplemental figure showing for each layer the responses to all 4 conditions without subtraction as a bar graph or in other words providing Figure 2 for each of the 3 layers.

Reviewer #2:

The authors present a 7T fMRI study examining whether top-down dependent processes (such as feature-based attention) can be dissociated from bottom-up processes (difference in contrast) in the different layers of early visual cortex. Many of the conclusions appear to rely on the computation of z-scores per layer and I have several methodological questions about the quantification, some of which may undermine the conclusions (but I hope the authors can address them – my point #1). I also have some issues with the behavioral paradigm (point #2) and whether this is truly a feature-based attention paradigm. The laminar differences in the attention effect are quite small but the comparison between agranular and granular layers might possibly be interesting. However, the authors use a distinction between granular and agranular (deep + superficial) in their analysis that does not describe the results well. The attentional effects are in fact strongest in the superficial layers and weakest in the deep layers, and intermediate in layer 4. It does therefore not make much sense to average across the superficial and deep layers, and the results are actually quite different from previous studies studying spiking activity in monkeys so that one wonders if laminar fMRI using the present methodology is a valid approach.

1) The authors indicate (subsection “Quantification of laminar-specific effects of feature-based attention and stimulus contrast”) that the BOLD signal in superficial layers is stronger than in the deeper layers, which is generally thought to be caused by the direction of blood flow in cortex. However, the raw data (before normalization) are not shown, and the normalization steps that were carried out to correct for these differences in BOLD amplitude remained a bit obscure. This is an important point because I suspect that the laminar profile might actually reflect the choices that are made for the normalization.

As I understand it, the authors used the magnitude of the visually driven activity for normalization when they write "we converted time courses within depth bins to z scores". Does this imply that they normalized to the magnitude of visually driven activity per layer? If so, it seems somewhat surprising to see differences in visually driven activity between the layers, and in particular when considering Figure 3—figure supplement 4 suggesting that normalization was done per layer. I do understand that this can come out, because of the comparison between the activity elicited by high and low-contrast stimuli. But it is not immediately clear to me how one should interpret that difference, which should depend on the contrast response function of the voxels, not on visually driven response. Are the results in Figure 3—figure supplement 4 obtained by pooling across the lower and higher contrasts? If the contrast response function is flat around the contrasts that are chosen, one might expect a small difference and a larger difference if the contrast response function is steeper. However, the authors seem to interpret the slightly stronger activity in layer 4 as evidence for a feedforward effect, and I am not sure if this interpretation is supported by the data, given these normalization issues.

– These normalization issues are aggravated when one also considers the s.d. (i.e. the variability) of activity across trials, a term appearing in the denominator when computing z-scores. Again, the outcome of the analysis may now become sensitive to arbitrary choices, which have not been well described. Are these z-scores computed per condition? Or across conditions? All conditions? In one possible scenario, the z-scores are computed across all stimuli (both low and high contrast stimuli; a similar argument holds for attended vs. non-attended stimuli). In that case, the outcome of the normalization depends on the effect size of the contrast manipulation which will contribute to the overall variance of activity across trials, and hence will contribute to the denominator when computing z-scores. In the most extreme case, contrast/attention explains a large fraction of the variance, and part of the effect of the contrast manipulation would be removed during the computation of z-scores because of the normalization. I am not sure if these problematic issues arose during the analysis, but the computation of z-scores and the effect sizes before normalization are not described in sufficient detail for a proper evaluation.

– The variance in activity might differ across the layers; have the authors also investigated these effects?

– If these normalization issues can be solved, which I hope to be the case, I would like to see a thorough discussion of a rational approach to normalize for the strength of BOLD signals in the different layers, how this affects the difference in activity elicited by low and high contrast stimuli, attended and non-attended stimuli and the possible issues that can occur when computing z-scores.

– I can imagine that systematic approaches to this problem must exist. If not, the authors may be in an excellent position to propose such an approach.

– I think it should be made clear already in the Results section that the laminar profiles have been z-scored within depth bins. This is important for interpretation of the results and on my initial reading I was wondering why there was no overall bias towards the superficial layers. I would also like to see a figure showing the non-z-scored BOLD signal changes for the different attention/contrast conditions to get a better impression of the data.

– In the Discussion the authors remark "That said, any influence of spatial hemodynamics should be consistent across experimental conditions, and therefore accounted for in our calculation of bottom-up/top-down modulations via a subtraction of the responses to different contrast/attention conditions." I was a bit confused, what do they mean with a subtraction?

– Is the increase in BOLD in the superficial layers a property of the chosen EPI sequence?

– Did the degree of orientation tuning differ between the areas? If yes, did that impact on the results?

– Figure 3F: not all subjects had more activity if the stimulus was attended. Is that a reliable within subject effect, opposite for some subjects than what was expected? Or does this reflect noise in the quality of the data of individual subjects?
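The concern in point 1 about z-scores computed across conditions can be illustrated with a small simulation. The following is a minimal sketch of that hypothetical scenario only, not the authors' pipeline: the layers, effect sizes, noise levels and helper names (`simulate_layer`, `zscored_difference`) are assumptions made up for illustration. It shows that when trials from both conditions share one z-score, the between-condition difference is divided by a standard deviation that itself grows with the effect, so larger raw effects are compressed.

```python
import random
import statistics


def zscore(values):
    """Z-score a list of values using its own mean and population SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def condition_difference(low, high):
    """Mean response difference (high minus low) between two conditions."""
    return statistics.fmean(high) - statistics.fmean(low)


def simulate_layer(effect, noise_sd, n_trials=2000, seed=0):
    """Hypothetical single-layer responses to low- and high-contrast trials."""
    rng = random.Random(seed)
    low = [rng.gauss(0.0, noise_sd) for _ in range(n_trials)]
    high = [rng.gauss(effect, noise_sd) for _ in range(n_trials)]
    return low, high


def zscored_difference(low, high):
    """Z-score across all trials of both conditions, then take the difference."""
    z = zscore(low + high)
    return condition_difference(z[:len(low)], z[len(low):])


# Two hypothetical layers with identical noise but a fourfold difference
# in the raw contrast effect (illustrative numbers, not measured data).
small_low, small_high = simulate_layer(effect=1.0, noise_sd=1.0, seed=1)
large_low, large_high = simulate_layer(effect=4.0, noise_sd=1.0, seed=2)

raw_ratio = (condition_difference(large_low, large_high)
             / condition_difference(small_low, small_high))
z_ratio = (zscored_difference(large_low, large_high)
           / zscored_difference(small_low, small_high))

print(f"raw effect ratio (large/small):      {raw_ratio:.2f}")
print(f"z-scored effect ratio (large/small): {z_ratio:.2f}")
```

With equal trial counts per condition, the pooled variance is the within-condition variance plus a quarter of the squared effect, so the z-scored difference can never exceed 2 however large the raw effect becomes. Whether this scenario applies depends on how the z-scores were actually computed, which is the question the reviewer raises.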

2) The authors frame their paradigm as a feature-based attention paradigm. However, the design of the stimulus is quite different to a typical feature-based attention paradigm and I doubt whether the participants required feature-based attention to solve the task. For example, if I'm being cued to attend to the clockwise bars, my strategy would be to fixate one of the bars of the appropriate color (e.g. white) and monitor that bar for width changes. Eye-movements weren't monitored as far as I can tell, but even if the participants did maintain fixation then this is still more reminiscent of a spatial-attention task. True feature-based attention requires the participant to attend to a feature in one part of space, and this modulates activity related to that feature in another part of space. The interpretation of the paradigm is not just of semantic interest but has a large impact on much of the discussion and the relevance and impact of this work, so I would like to hear the authors' thoughts on this.

3) In the previous Current Biology paper, the orientation-mask had quite an interesting spatial distribution in V1. Are these the same subjects? If not, was the same distribution observed?

– In that paper in V1 bottom up signals are stronger in deep and superficial and weaker in L4 – their Figure 3B. Can you explain the difference?

4) I think that the pooling across deep and superficial layers and then report "agranular layers" is misleading, given that the results for layer 4 are more similar to the deep than to the superficial layers. The dissimilarity between the low attention modulation in the deep layers in the present study with the previous monkey studies should be discussed.

– The conclusion in the Discussion first paragraph "Moreover, our results pointed to stronger attention modulations in agranular cortical layers compared to contrast effects, which were strongest in the granular layer." seems to be a bit too optimistic to me.

Reviewer #3:

This study demonstrates clear modulations of relative layer-specific activation with bottom-up (contrast modulation) and top-down (attention modulation) tasks. Specifically, in early visual regions, bottom-up modulation was seen mostly in middle layers and top-down modulation was mostly seen in superficial layers, however with less modulation in middle layers. This is a well thought out experiment that certainly collapses challenging data in a manner that comes close to convincingly revealing the hypothesized modulations. However, as detailed in the specific comments, I have concerns with the fact that no actual maps nor raw BOLD activation laminar profiles nor even selected ROIs were shown, leaving the reader completely in the dark as to the data quality. It's mentioned that because there were two task modulations and a comparison between the two, large pial vessel effects were eliminated. This, I would argue, is not entirely true as the baseline blood volume not only modulates the degree of BOLD signal change but can also have a secondary effect of enhancing the BOLD signal change difference as a function of underlying blood volume. Lastly, since the contrast modulation was used to create the ROIs and then used in the analysis, this paper has the potential for falling into the statistically unsound "double dipping" trap that would elevate the effects at least for the bottom-up modulations. The way around this would be to demonstrate that these differences are mappable onto the cortical layer architecture. If this cannot be done, then it is questionable whether or not the data are of sufficient quality to make any conclusive statements.
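The "double dipping" concern can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a reanalysis of the study: the voxel count, the 10% selection threshold and the variable names are hypothetical, and the data are pure noise. It compares an effect estimated from the same data used to select voxels with an effect estimated from an independent half of the data.

```python
import random
import statistics

rng = random.Random(0)
n_voxels = 5000

# Hypothetical per-voxel contrast estimates from two independent halves of
# the data. Both are pure noise, so the true effect is zero everywhere.
half_a = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
half_b = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

# Select the 10% of voxels with the largest estimate in half A.
n_selected = n_voxels // 10
ranked = sorted(range(n_voxels), key=lambda i: half_a[i], reverse=True)
selected = ranked[:n_selected]

# Circular estimate: evaluate the effect on the data used for selection.
circular_estimate = statistics.fmean([half_a[i] for i in selected])

# Independent estimate: evaluate the same voxels on held-out data.
independent_estimate = statistics.fmean([half_b[i] for i in selected])

print(f"effect in selection data:   {circular_estimate:.2f}")
print(f"effect in independent data: {independent_estimate:.2f}")
```

In this noise-only setting the circular estimate is strongly positive while the independent estimate stays near zero, which is why selection and testing are usually kept on separate data or on orthogonal contrasts.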

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for submitting your article "Dissociable laminar profiles of concurrent bottom-up and top-down modulation in the human visual cortex" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. As a consequence, I am delighted to say that we are happy, in principle, to publish a suitably revised version.

Essential revisions:

As you can see, both reviewers still have issues with the paper. However, the editors feel that these concerns can be addressed by (i) adding a figure showing layer specific activation (Reviewer #3), (ii) changing the wording as suggested by Reviewer #2 in critical parts of the manuscript, and (iii) further discussing the pooling of deep and superficial layers (Reviewer #2).

Should these revisions convince the editors, we are in principle happy to publish this paper.

Reviewer #2:

I am still a bit mixed about this paper. On the one hand, the authors convincingly addressed all issues that I had with the normalization.

On the other hand, my issue with the grouping of superficial and deep layers into an "agranular compartment" was not addressed satisfactorily. As I mentioned in my first review, the attentional effects (or the ratio with bottom-up effects) in fact seem strongest in the superficial layers and weakest in the deep layers, and intermediate in layer 4. It does therefore not make much sense to average across the superficial and deep layers. The results are actually quite different from previous studies on spiking activity in monkeys, so that one wonders if laminar fMRI using the present methodology reflects the underlying neurophysiology.

Several misleading statements remained in the revision:

– In the Abstract: "top-down modulation is significantly stronger in deep and superficial layers than top down effects", which is not shown for the deep layers but only by using the misleading grouping into "agranular layers". I suspect that the effects are driven by the superficial layers.

– Same at the end of the Introduction, final sentence.

– In reality there are no clear differences in top-down effects between the layers (subsection “Dissociable laminar profiles of bottom-up and top-down response modulations”) but it is only if a comparison (subtraction) is made to the bottom-up effects.

– This is also visible in Figure 4A, where the weakest attention effects are present in the deep layers, the strongest in the superficial layers and the granular layers are intermediate.

– Discussion section: "We have shown that, in a task where bottom-up and top-down influences are manipulated independently, the overall BOLD response can be separated into top-down and bottom-up components by examining how these effects are organized across depth." I think that this is an overstatement. The only reliable laminar difference seems to be in the bottom up response across layers.

– It should also be clarified consistently that these effects are driven by a difference in the contrast sensitivity rather than by a difference in the attention effects between layers.

What is the rationale of the grouping of superficial and deep layers? Is it the wish to replicate the non-human primate studies?

I would recommend that the three laminar compartments stay separate throughout the analysis (e.g. Figure 3E, F), and also in the Abstract and in the Discussion. It seems conceivable that in such an analysis with three laminar compartments, there is a difference in the ratio between top-down and bottom-up effects between superficial layers and the granular layers, but no such difference between the granular and deep layers. Such a discrepancy with the non-human primate work would also be a valuable outcome, and useful for future studies that plan to use laminar fMRI.

Reviewer #3:

Overall, I appreciate that the authors put in a tremendous amount of work to address all the questions from all three reviewers. I am satisfied with all the answers except this answer:

"Our revised manuscript includes three additional figures displaying raw data to allow the reader to better assess the data quality. (1) A figure showing raw, layer-specific BOLD time courses for each experimental condition and each ROI (Figure 3—figure supplement 3)"

This is a time course but not a map of actual layer activity.

"(2) A replication of our main results obtained by performing our analysis to raw data that had not been normalized, showing that the steps we took to minimize the impact of overall BOLD signal differences between layers, i.e. z scoring data within layers, were not critical to our results (Figure 3—figure supplement 5)."

This is certainly appreciated but not a map of layer specific activity.

"(3) A cross section of V1 from a representative example subject, where voxels are color coordinated based on which layer they belong to (Figure 3—figure supplement 7). We thank the reviewer for these helpful suggestions."

This is a mask and not an activation map of layer specific activity.

eLife. 2019 May 7;8:e44422. doi: 10.7554/eLife.44422.022

Author response


Reviewer #1:

[…] This is an interesting study showing that changes in stimulus contrast are predominantly represented in middle cortical layers. The study further suggests that feature based attention shows the opposite effect, yet this was only a trend. This is a clear data set and a very interesting result.

The paradigm employed is in essence a 2x2 factorial design with the factors bottom-up (i.e. stimulus contrast) and top-down (feature attention). Although Figure 2 suggests that there is no interaction, I was wondering whether (i) any voxels show such an interaction and (ii) whether this interaction would be differently expressed in different layers.

We thank the reviewer for this interesting suggestion. We ran an interaction analysis in FSL to probe whether there were voxels that showed an attention x contrast interaction. We found no evidence for significant clusters of voxels showing an interaction that replicated across participants, and no significant clusters at the group level. We therefore conclude that, as Figure 2 suggests, there were no voxels that exhibited an attention x contrast interaction.
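For readers unfamiliar with how an interaction is tested in a 2x2 design, the logic can be sketched as a difference of differences. The sketch below is an illustration only: the analysis described above was run voxelwise in FSL, whereas this example uses made-up per-participant cell means and hypothetical helper names (`interaction_contrast`, `one_sample_t`) to show the contrast itself.

```python
import math
import statistics


def interaction_contrast(cell_means):
    """Difference of differences for a 2x2 attention x contrast design.

    cell_means maps (attention, contrast) to a mean response, with attention
    in {"attended", "unattended"} and contrast in {"high", "low"}.
    """
    attention_at_high = (cell_means[("attended", "high")]
                         - cell_means[("unattended", "high")])
    attention_at_low = (cell_means[("attended", "low")]
                        - cell_means[("unattended", "low")])
    return attention_at_high - attention_at_low


def one_sample_t(values):
    """t statistic testing whether the mean of values differs from zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))


def cells(att_high, unatt_high, att_low, unatt_low):
    """Build the cell-mean dictionary for one hypothetical participant."""
    return {
        ("attended", "high"): att_high,
        ("unattended", "high"): unatt_high,
        ("attended", "low"): att_low,
        ("unattended", "low"): unatt_low,
    }


# Illustrative numbers only (not data from the study): both main effects are
# present in every participant, but the attention effect does not depend
# systematically on contrast.
participants = [
    cells(2.1, 1.8, 1.3, 0.9),
    cells(2.4, 2.0, 1.5, 1.2),
    cells(1.9, 1.7, 1.0, 0.8),
    cells(2.2, 1.9, 1.4, 1.0),
    cells(2.6, 2.2, 1.6, 1.3),
]

contrasts = [interaction_contrast(p) for p in participants]
t_value = one_sample_t(contrasts)

print("per-participant interaction contrasts:", [round(c, 2) for c in contrasts])
print(f"group t statistic: {t_value:.2f}")
```

A group statistic near zero, as with these made-up values, corresponds to additive effects of attention and contrast; a reliably non-zero value would indicate that the attention effect changes with stimulus contrast.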

Along these lines, Figure 4 collapses single main effects into a difference score, which does not allow the reader to interpret the full data. I agree that this might clutter the figure, but the authors should add a supplemental figure showing for each layer the responses to all 4 conditions without subtraction as a bar graph or in other words providing Figure 2 for each of the 3 layers.

We appreciate the suggestion and have added this figure to the supplementary materials (Figure 2—figure supplement 1).

Reviewer #2:

[…] 1) The authors indicate (subsection “Quantification of laminar-specific effects of feature-based attention and stimulus contrast”) that the BOLD signal in superficial layers is stronger than in the deeper layers, which is generally thought to be caused by the direction of blood flow in cortex. However, the raw data (before normalization) are not shown, and the normalization steps that were carried out to correct for these differences in BOLD amplitude remained a bit obscure. This is an important point because I suspect that the laminar profile might actually reflect the choices that are made for the normalization.

As I understand it, the authors used the magnitude of the visually driven activity for normalization when they write "we converted time courses within depth bins to z scores". Does this imply that they normalized to the magnitude of visually driven activity per layer? If so, it seems somewhat surprising to see differences in visually driven activity between the layers, and in particular when considering Figure 3—figure supplement 4 suggesting that normalization was done per layer. I do understand that this can come out, because of the comparison between the activity elicited by high and low-contrast stimuli. But it is not immediately clear to me how one should interpret that difference, which should depend on the contrast response function of the voxels, not on visually driven response. Are the results in Figure 3—figure supplement 4 obtained by pooling across the lower and higher contrasts? If the contrast response function is flat around the contrasts that are chosen, one might expect a small difference and a larger difference if the contrast response function is steeper. However, the authors seem to interpret the slightly stronger activity in layer 4 as evidence for a feedforward effect, and I am not sure if this interpretation is supported by the data, given these normalization issues.

We apologize for not adequately explaining the rationale and method for z scoring our data in the original manuscript. As has been documented before (Koopmans et al., 2010; Uǧurbil et al., 2003; Uludağ and Blinder, 2018; Yacoub et al., 2005), we found large differences in overall BOLD signal strength between cortical layers, where responses were strongest in superficial cortex. This can now clearly be seen in a new supplementary figure (Figure 3—figure supplement 3) which shows raw BOLD time courses for each experimental condition before any normalization. We wanted to investigate the strength of signal modulations from attention and stimulus contrast within layers but were not interested in differences in overall signal strength between layers. We therefore chose to normalize BOLD responses from each cortical layer separately. This was done by converting layer-specific time courses from each run of the fMRI experiment to z scores, meaning z scoring was performed within layers but across all experimental conditions. We reasoned that this should normalize differences in overall signal strength between layers, while preserving differences between conditions within layers. Z scoring within each run should also reduce the impact of differences in signal quality or intensity between runs. Comparing the raw layer-specific time courses shown for each condition in Figure 3—figure supplement 3 to the normalized layer-specific time courses (averaged across conditions) in Figure 3—figure supplement 4 makes it clear that this approach removed most of the overall signal intensity differences between layers. We now include a new supplementary figure showing our main results without normalization, determined by repeating our analysis approach without converting layer-specific time courses to z scores (Figure 3—figure supplement 5). The final results look very similar to those reported in the main manuscript including the z scoring procedure. This shows that this normalization had little effect on layer-specific response differences between experimental conditions, but allows us to consider those differences in the absence of large differences in overall signal strength between layers, which are partly due to blood draining through the cortex (Uludağ and Blinder, 2018) rather than our experimental manipulations. We have now also added a new section to the Materials and methods that more clearly describes our reasoning behind the z scoring approach and how it was implemented (“Normalization of layer-specific responses”).
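The within-layer normalization described above can be illustrated with a minimal Python sketch. It assumes each run is stored as a mapping from layer name to a raw time course spanning all conditions. The function names, the toy values and the choice of the population standard deviation are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev

def zscore(time_course):
    """Convert one layer's time course from one run to z scores.

    Assumption: population standard deviation; the source does not say
    which estimator was used. A constant time course would divide by zero.
    """
    mu = mean(time_course)
    sd = pstdev(time_course)
    return [(x - mu) / sd for x in time_course]

def normalize_layers(runs):
    """Z score each layer separately within each run, across all conditions.

    runs: list of dicts mapping layer name -> raw time course (hypothetical layout).
    """
    return [{layer: zscore(tc) for layer, tc in run.items()} for run in runs]

# Toy data (hypothetical): the superficial signal is the deep signal scaled by 3,
# mimicking a stronger overall BOLD response in superficial cortex.
deep = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0]
superficial = [3.0 * x for x in deep]
normalized = normalize_layers([{"deep": deep, "superficial": superficial}])
```

Because the toy superficial signal is a scaled copy of the deep signal, the two normalized time courses coincide, while the pattern of differences between time points within each layer is kept.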

The reviewer also asks whether the time courses plotted in Figure 3—figure supplement 4 are averaged across contrast conditions, and whether voxel contrast response functions might be flat across the contrast levels we chose. The data in Figure 3—figure supplement 4 are indeed averaged across contrast conditions as well as across attention conditions. Although we did not measure full contrast response functions, we do not expect them to be flat across the contrast levels of 80% and 30% that we chose, given previously measured response functions (Buracas and Boynton, 2007). We found a strong effect of stimulus contrast on BOLD responses (Figure 2), and the effect of contrast can also be seen in the raw data by comparing time courses from low and high contrast conditions in Figure 3—figure supplement 3. We believe this provides sufficient evidence that voxel responses were modulated by stimulus contrast as we expected.

These normalization issues are aggravated when one also considers the s.d. (i.e. the variability) of activity across trials, a term appearing in the denominator when computing z-scores. Again, the outcome of the analysis may now become sensitive to arbitrary choices, which have not been well described. Are these z-scores computed per condition? Or across conditions? All conditions? In one possible scenario, the z-scores are computed across all stimuli (both low and high contrast stimuli- a similar argument holds for attended vs. non-attended stimuli). In that case, the outcome of the normalization depends on the effect size of the contrast manipulation which will contribute to the overall variance of activity across trials, and hence will contribute to the denominator when computing z-scores. In the most extreme case, contrast/attention explains a large fraction of the variance, and part of the effect of the contrast manipulation would be removed during the computation of z-scores because of the normalization. I am not sure if these problematic issues arose during the analysis, but the computation of z-scores and the effect sizes before normalization are not described in sufficient detail for a proper evaluation.

We again apologize for not describing our z scoring procedure with sufficient clarity in the original manuscript. We hope that our manuscript revisions and response to point #1 have resolved this issue. We would again highlight that the replication of our main results without z scoring in Figure 3—figure supplement 5 should now make it clear to the reader that our choice of normalization was not critical to our results.

– The variance in activity might differ across the layers, have the authors also investigated these effects?

To interrogate this issue, we computed trial-to-trial standard deviation in our (normalized) layer-specific time courses for each of the 10 time points within a single trial, and then computed the average standard deviation across time points. The group average standard deviation for each layer (Figure 3—figure supplement 6) clearly shows a difference in overall signal variance between layers, where variance was higher in deeper cortex (F [46, 2] = 11.41, p = 9.5e-5). It therefore appears that signals we measured from deeper cortex had overall lower signal strength and larger variance. Given that our main results show an overall difference in the organization of bottom-up and top-down modulations across all layers, we do not believe higher signal variance in deeper cortex should greatly impact the interpretation of our data. In addition, the similarity of main results using raw (not normalized) data (Figure 3—figure supplement 5) to those using data that were z scored within layers (Figure 2) indicates that differences in variance between layers did not affect our calculation of average responses from the z scored data. Nevertheless, we now include this analysis in the supplementary materials as we consider it useful information for the reader and we thank the reviewer for the suggestion.
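The variability measure described in this response can be sketched as follows. This is a minimal illustration that assumes trials are stored as equal-length lists of samples for one layer; the function name, the toy values and the use of the sample standard deviation are assumptions, and the toy trials have 2 time points rather than the 10 used in the study.

```python
from statistics import mean, stdev

def trial_variability(trials):
    """Average trial-to-trial standard deviation across time points.

    trials: list of trials for one layer, each a list of samples
    (one per time point within the trial).
    """
    n_timepoints = len(trials[0])
    # Standard deviation across trials, computed separately at each time point
    sd_per_timepoint = [
        stdev([trial[t] for trial in trials]) for t in range(n_timepoints)
    ]
    # Collapse to a single value per layer by averaging over time points
    return mean(sd_per_timepoint)

# Toy data (hypothetical): three trials with two time points each
toy_trials = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
variability = trial_variability(toy_trials)
```

Computing this value once per layer and subject would give the inputs for a comparison across layers of the kind reported above.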

– If these normalization issues can be solved, which I hope to be the case, I would like to see a thorough discussion of a rational approach to normalize for the strength of BOLD signals in the different layers, how this affects the difference in activity elicited by low and high contrast stimuli, attended and non-attended stimuli and the possible issues that can occur when computing z-scores.

– I can imagine that systematic approaches to this problem must exist. If not, the authors may be in an excellent position to propose such an approach.

We have added a new section to the Materials and methods, which describes our reasoning for the z scoring procedure and how it was implemented. This passage is also pasted below:

“Normalization of layer-specific responses

It is well established that gradient-echo BOLD suffers from a bias in signal strength whereby responses in superficial cortex are stronger than responses from deep cortex (Koopmans et al., 2010; Uǧurbil et al., 2003; Uludağ and Blinder, 2018; Yacoub et al., 2005). This bias can be seen clearly in our raw data (see raw time courses in Figure 3—figure supplement 3). […] Of note, none of our results critically depend on this normalization step (Figure 3—figure supplement 5), but it allowed us to interpret those results in the absence of large-scale response differences between layers that are present in the raw data.”

– I think it should be made clear already in the Results section that the laminar profiles have been z-scored within depth bins. This is important for interpretation of the results and on my initial reading I was wondering why there was no overall bias towards the superficial layers. I would also like to see a figure showing the non-z-scored BOLD signal changes for the different attention/contrast conditions to get a better impression of the data.

We now state more clearly in the Results section how the data have been normalized:

“Depth-specific time courses were normalized to remove overall differences in signal intensity between layers (see Figure 3—figure supplement 3 and 4). Note that this normalization was not critical to the results reported (Figure 3—figure supplement 5). Normalized depth-specific time courses were analyzed to compare the laminar profile of activity modulations resulting from top-down attention and bottom-up stimulus contrast.”

We also provide a new supplementary figure showing raw layer-specific time courses for each experimental condition and visual area (Figure 3—figure supplement 3).

– In the Discussion the authors remark "That said, any influence of spatial hemodynamics should be consistent across experimental conditions, and therefore accounted for in our calculation of bottom-up/top-down modulations via a subtraction of the responses to different contrast/attention conditions." I was a bit confused, what do they mean with a subtraction?

Here we refer to how we quantified the strength of modulations from stimulus contrast and feature-based attention. The effect of contrast was established by subtracting the averaged trial response to low contrast stimuli from the response to high contrast stimuli. Similarly, we quantified the effect of attention as the response to the unattended orientation subtracted from the response to the attended orientation. This procedure is described in the legend for Figure 2 and in the Materials and methods section (Quantification of effects of feature-based attention and stimulus contrast). We expect that the influence of spatial hemodynamics is consistent across contrast levels, and across attention conditions, meaning they should effectively be accounted for in this subtraction, leaving only the influence of contrast or attention.
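The subtraction logic can be made concrete with a short Python sketch. It assumes one averaged trial response per cell of the 2x2 design and that each effect is computed after averaging over the other factor; the condition labels, helper name and numbers are hypothetical. The constant added at the end stands in for a condition-independent influence and is modeled as purely additive, which is an assumption of this sketch.

```python
from statistics import mean

def main_effect(responses, factor_index, level_a, level_b):
    """Mean response at level_a minus mean response at level_b,
    averaging over the levels of the other factor."""
    a = mean(v for k, v in responses.items() if k[factor_index] == level_a)
    b = mean(v for k, v in responses.items() if k[factor_index] == level_b)
    return a - b

# Hypothetical averaged trial responses keyed by (attention, contrast)
responses = {
    ("attended", "high"): 2.0,
    ("unattended", "high"): 1.5,
    ("attended", "low"): 1.0,
    ("unattended", "low"): 0.5,
}
attention_effect = main_effect(responses, 0, "attended", "unattended")
contrast_effect = main_effect(responses, 1, "high", "low")

# Add the same offset to every condition (assumed additive, condition-independent)
shifted = {k: v + 0.25 for k, v in responses.items()}
shifted_attention = main_effect(shifted, 0, "attended", "unattended")
shifted_contrast = main_effect(shifted, 1, "high", "low")
```

In this toy case the offset leaves both effects unchanged, which is the sense in which a condition-independent contribution drops out of the subtraction.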

– Is the increase in BOLD in the superficial layers a property of the chosen EPI sequence?

Indeed, increased BOLD strength in superficial cortex is a feature of gradient echo EPI. This has been reported in detail in previous research (e.g. Koopmans et al., 2010; Turner, 2002; Uǧurbil et al., 2003; Uludağ and Blinder, 2018; Yacoub et al., 2005). Other sequences such as spin echo EPI or 3D GRASE are less susceptible to this bias (De Martino et al., 2013; Uludaǧ et al., 2009), but they also provide much weaker overall signal-to-noise ratio (SNR) compared to gradient echo EPI (Moerel et al., 2017). Recent developments in laminar cerebral blood volume imaging appear very promising (Huber et al., 2017), but these are difficult to implement for the visual system and may also have lower SNR.

– Did the degree of orientation tuning differ between the areas. If yes, did that impact on the results?

Our experiment only included two orthogonal orientations, meaning we cannot extrapolate full orientation tuning profiles for each area. Instead we have examined the average selectivity of V1, V2, V3 for clockwise over counter-clockwise (or vice versa). This was computed as the average unsigned t value from our orientation-selective masks for the clockwise > counter-clockwise contrast applied to the orientation localizer data. The average t values for V1, V2 and V3 were numerically similar (see Author response image 1). However, the small variation in selectivity across areas was significant (Huynh-Feldt corrected F (32.76, 1.42) = 4.50, p =.029), which appears to be driven by slightly weaker selectivity in V3 compared to V2 or V1. We believe this small difference in orientation selectivity between areas is unlikely to have had a large impact on our results, especially given that we found no differences between visual areas in our main analysis.

Author response image 1. Differences in orientation selectivity across areas.

The average unsigned t values for the clockwise > counter-clockwise contrast applied to the orientation localizer data are plotted for all voxels within orientation-selective masks in V1, V2 and V3.
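As a rough illustration of the selectivity measure used here, the sketch below averages absolute t values over the voxels inside a mask. The data layout (parallel lists of t values and boolean mask entries), the function name and the numbers are hypothetical.

```python
from statistics import mean

def mean_unsigned_t(t_values, mask):
    """Average absolute t value over voxels selected by a boolean mask."""
    return mean(abs(t) for t, keep in zip(t_values, mask) if keep)

# Hypothetical clockwise > counter-clockwise t values for four voxels
t_values = [2.5, -3.0, 0.5, -1.0]
# Hypothetical orientation-selective mask: the third voxel is excluded
mask = [True, True, False, True]
selectivity = mean_unsigned_t(t_values, mask)
```

Taking the absolute value means that voxels preferring either orientation contribute equally to the estimate.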

– Figure 3F: not all subjects had more activity if the stimulus was attended. Is that a reliable within subject effect, opposite for some subjects than what was expected? Or does this reflect noise in the quality of the data of individual subjects?

Figure 3F does not show the difference in activity between attended and unattended conditions. Rather, it is an illustration of how consistently the result in Figure 3E was shown by individual subjects. Figure 3E plots average scores representing the extent to which the effects of attention and contrast were more strongly expressed in the agranular (deep and superficial) or granular (middle) layers. These were defined as the average effect of attention (or contrast) in the deep and superficial layers minus the effect in the middle layer. As such, a positive score indicates a stronger agranular effect, while a negative score indicates a granular effect. Figure 3E shows that, on average, the effect of attention was significantly more agranular compared to stimulus contrast, which was more granular. Figure 3F then shows the difference between attention and contrast scores plotted in Figure 3E for each individual subject, where a positive score indicates that the attention effect was more agranular, and a negative score indicates that the contrast effect was more agranular for that subject. Overall, this plot is a demonstration of how consistent the group laminar profiles for attention and contrast reported in the rest of the figure were across individual subjects.
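The scores described for Figure 3E and 3F can be written out as a small Python sketch. The definition (mean of deep and superficial effects minus the middle effect) follows the text; the dictionary layout, function name and effect sizes are hypothetical.

```python
def agranular_minus_granular(effect):
    """Average of the deep and superficial effects minus the middle effect.

    Positive values indicate a more agranular profile, negative values a
    more granular one.
    """
    return (effect["deep"] + effect["superficial"]) / 2 - effect["middle"]

# Hypothetical per-layer effect sizes for one subject
attention = {"deep": 0.25, "middle": 0.25, "superficial": 0.5}
contrast = {"deep": 0.5, "middle": 1.0, "superficial": 0.5}

attention_score = agranular_minus_granular(attention)  # 0.125
contrast_score = agranular_minus_granular(contrast)    # -0.5

# Per-subject value of the kind plotted in Figure 3F: positive means the
# attention effect was more agranular than the contrast effect
subject_difference = attention_score - contrast_score  # 0.625
```

Note that the score pools the deep and superficial layers, so on its own it cannot show which of the two drives a positive value.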

2) The authors frame their paradigm as a feature-based attention paradigm. However, the design of the stimulus is quite different to a typical feature-based attention paradigm and I doubt whether the participants required feature-based attention to solve the task. For example, if I'm being cued to attend to the clockwise bars, my strategy would be to fixate one of the bars of the appropriate color (e.g. white) and monitor that bar for width changes. Eye-movements weren't monitored as far as I can tell, but even if the participants did maintain fixation then this is still more reminiscent of a spatial-attention task. True feature-based attention requires the participant to attend to a feature in one part of space, and this modulates activity related to that feature in another part of space. The interpretation of the paradigm is not just of semantic interest but has a large impact on much of the discussion and the relevance and impact of this work, so I would like to hear the authors' thoughts on this.

The paradigm we used is the same paradigm described in the landmark paper by Kamitani and Tong (2005; cited >1,500 times). These authors refer to the paradigm as a feature-based attention paradigm, given that subjects were attending to one of two features of the plaid stimulus, each feature being a set of oriented bars. It does not appear possible to complete the task using only spatial attention because (1) the two features largely spatially overlap and (2) the phase of each element within the plaid is randomized between stimuli, meaning the spatial location of the bars themselves changes from trial to trial. It is true that we were not able to record eye movements during the fMRI experiment, but Kamitani and Tong ruled out the contribution of eye movements to the same paradigm using a control experiment. We have pasted their description of this control experiment below:

“We did an additional control experiment to address whether eye movements, orthogonal to the attended orientation, might account for the enhanced responses to the attended grating by inducing retinal motion. The visual display was split into left and right halves, and activity from corresponding regions of the contralateral visual cortex was used to decode the attended orientation in each visual field (Supplementary Figure 4). Even when the subject was instructed to pay attention to different orientations in the plaids of the left and right visual fields simultaneously, cortical activity led to accurate decoding of both attended orientations. Because eye movements would bias only one orientation in the whole visual field, these results indicate that the attentional bias effects in early visual areas are not due to retinal motion induced by eye movements.” (Kamitani and Tong, 2005, p. 683).

3) In the previous Current Biology paper the orientation-mask had quite an interesting spatial distribution in V1. Are these the same subjects? If not, was the same distribution observed?

Different subjects were used in the current paper. We observed a similar distribution to our previous study, with clockwise-preferring voxels being largely clustered in the inferior portion of V1, and counter-clockwise-preferring voxels mostly in the superior portion of V1. The distribution has also been observed and discussed in several other papers (e.g. Freeman et al., 2011; Swisher et al., 2010). We have added a new figure that shows the layout of orientation-selective voxels in V1 for a representative subject (Figure 3—figure supplement 8).

– In that paper in V1 bottom up signals are stronger in deep and superficial and weaker in L4 – their Figure 3B. Can you explain the difference?

The figure that the reviewer refers to (Figure 3B of Lawrence 2018) describes a rather different comparison, i.e. stimulus>baseline. In this case, there were in fact no differences between the different layers for this (p = 0.648), suggesting that the entire cortical column was equally activated by presenting a stimulus (vs. no stimulus). Here we show that increasing the contrast of the stimulus (high vs. low contrast comparison) leads to a stronger increase in the middle layer. In other words, here we describe a much subtler modulation of neural activity by increasing contrast, rather than a difference between a stimulus and a blank screen.

4) I think that pooling across deep and superficial layers and then reporting "agranular layers" is misleading, given that the results for layer 4 are more similar to the deep than to the superficial layers. The dissimilarity between the low attention modulation in the deep layers in the present study and the previous monkey studies should be discussed.

The data we used to compute the agranular – granular scores plotted in Figure 3E are plotted in full in panels B and D, which show the strength of effects of attention and stimulus contrast within each layer separately. Exactly how the scores in panel E were computed is also clearly described in the figure legend, with reference to the data plotted in B and D. The pooling across agranular layers for the purposes of Figure 3E should therefore be clearly accessible to the reader, by inspecting the different panels of the same figure. Testing the extent to which bottom-up and top-down effects were expressed in the agranular versus the granular layers was directly relevant to our initial hypothesis. We discuss the discrepancies between our feature-based attention effect (which was strong in all layers, peaking in the superficial layers) and those reported in previous monkey studies on (spatial) attention. As the reviewer suggests, we now also highlight in this Discussion the differences in results with regards to the deep layer, specifically:

“In particular, Van Kerkoerle et al. report strong attentional modulations in the deep layers compared to the middle layer, which was not the case in our data.”

– The conclusion in the Discussion first paragraph "Moreover, our results pointed to stronger attention modulations in agranular cortical layers compared to contrast effects, which were strongest in the granular layer." seems to be a bit too optimistic to me.

Although the effect of attention did not significantly vary across cortical layers when considered in isolation, it was stronger in the agranular layers compared to a bottom-up effect of stimulus contrast, as we show in Figure 3E. It is this comparison that we intended to speak to in the quoted statement, and we now make this clearer in the manuscript:

“Moreover, by comparing the strength of attention and contrast modulation in agranular versus the granular layers, we found that attention effects were expressed more strongly in the agranular layers compared to effects from stimulus contrast, which were more granular.”

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #2:

I am still a bit mixed about this paper. On the one hand, the authors convincingly addressed all issues that I had with the normalization.

On the other hand, my issue with the grouping of superficial and deep layers into an "agranular compartment" was not addressed satisfactorily. As I mentioned in my first review, the attentional effects (or the ratio with bottom-up effects) in fact seem strongest in the superficial layers and weakest in the deep layers, and intermediate in layer 4. It does therefore not make much sense to average across the superficial and deep layers. The results are actually quite different from previous studies on spiking activity in monkeys, so that one wonders if laminar fMRI using the present methodology reflects the underlying neurophysiology.

We appreciate the reviewer’s point. We now mention more clearly, throughout the manuscript, that the attentional effect is most strongly present in the superficial layer (rather than the superficial and deep layers). The analysis of granular vs. agranular layers was a priori and theoretically motivated, in view of the fact that bottom-up input activates the granular layer 4 whereas top-down modulations avoid the granular layer but innervate the superficial and deep layers instead. We would therefore like to keep this analysis. We do however now spell out clearly throughout the manuscript that the difference in activity profile between the contrast and attention modulations is driven by a larger activity modulation in the superficial layers (attention) vs. middle layer (contrast).

The link with spiking activity in monkeys is difficult. To our knowledge, there hasn’t been a single study measuring layer-specific activity modulations due to feature-based attention in monkeys. Also, the link between spiking activity (reflecting output) and BOLD signal modulations (predominantly reflecting input) is not fully understood. For all these reasons, one cannot expect a 1:1 correspondence in findings between species and methods.

Several misleading statements remained in the revision:

– In the Abstract: "top-down modulation is significantly stronger in deep and superficial layers than top down effects", which is not shown for the deep layers but only by using the misleading grouping into "agranular layers". I suspect that the effects are driven by the superficial layers.

– Same at the end of the Introduction, final sentence.

– In reality there are no clear differences in top-down effects between the layers (subsection “Dissociable laminar profiles of bottom-up and top-down response modulations”) but it is only if a comparison (subtraction) is made to the bottom-up effects.

– This is also visible in Figure 4A, where the weakest attention effects are present in the deep layers, the strongest in the superficial layers and the granular layers are intermediate.

– Discussion section: "We have shown that, in a task where bottom-up and top-down influences are manipulated independently, the overall BOLD response can be separated into top-down and bottom-up components by examining how these effects are organized across depth." I think that this is an overstatement. The only reliable laminar difference seems to be in the bottom up response across layers.

– It should also be clarified consistently that these effects are driven by a difference in the contrast sensitivity rather than by a difference in the attention effects between layers.

We have adapted the statements to ensure that they do not mislead the readers and properly reflect the data. Specifically:

Abstract: “Bottom-up modulations were strongest in the middle cortical layer and weaker in deep and superficial layers, while top-down modulations were strongest in the superficial layers.”

Introduction: “As predicted, attentional modulations were more strongly expressed in agranular layers, particularly the superficial layers, while stimulus contrast modulations were largest in the granular layer”.

Results: “Therefore, it appears that top-down contributions to response modulations were stronger in the agranular layers compared to bottom-up contributions, which were strongest in the granular layer. As can be seen from Figure 3B, the agranular profile of attention was driven by the fact that the attentional modulation was strongest in the superficial layers.”

Discussion: “Moreover, by comparing the strength of attention and contrast modulation in agranular versus the granular layers, we found that attention effects were expressed more strongly in the agranular layers (specifically the superficial layers) compared to effects from stimulus contrast, which were more granular.”

Conclusion: “Top-down modulations from attention were overall stronger in agranular layers (specifically the superficial layers) compared to those from stimulus contrast, which were strongest in the granular layer.”

What is the rationale of the grouping of superficial and deep layers? Is it the wish to replicate the non-human primate studies?

We have explicated this in reply to point #1.

I would recommend that the three laminar compartments stay separate throughout the analysis (e.g. Figure 3E, F), and also in the Abstract and in the Discussion. It seems conceivable that in such an analysis with three laminar compartments, there is a difference in the ratio between top-down and bottom-up effects between superficial layers and the granular layers, but no such difference between the granular and deep layers. Such a discrepancy with the non-human primate work would also be a valuable outcome, and useful for future studies that plan to use laminar fMRI.

We thank the reviewer for this suggestion. The analysis in which the three laminar compartments are kept separate was already included in the manuscript [i.e., the modulation (attention/contrast) x depth (deep/middle/superficial) repeated measures ANOVA]. The fact that this analysis showed a significant modulation x depth interaction provides formal statistical support for the notion that there is a difference in the ratio between top-down and bottom-up effects between the layers. We have now further unpacked this interaction by showing that the bottom-up effect varied significantly across depth (F [46, 2] = 8.43, p =.001), being largest at middle compared to deep (t [23] = 3.79, p =.001) and superficial (t [23] = 3.56, p =.002) depths. For the top-down effect, there was a statistical trend of activity differences between the layers (F [46, 2] = 2.82, p =.070). Unpacking this, we observed that the attention effect was significantly stronger in the superficial layers compared to the middle (t [23] = 2.11, p =.046) and deep layers (t [23] = 2.15, p =.042), while there was no difference in the strength of the attention effect between the deep and middle layers (t [23] = 0.36, p =.723). These post-hoc tests further corroborate the notion that is put forward by the reviewer, that the attentional effect is strongest in the superficial layers (rather than the deep layers). We have added these further post-hoc tests to the manuscript and we have tried to better qualify the nature of the attentional modulation throughout the manuscript.
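For readers unfamiliar with this kind of post-hoc comparison, the sketch below computes a paired t statistic between two depths across subjects. It assumes the reported tests were paired comparisons within subjects, which the degrees of freedom suggest but the text does not state explicitly. The function name and toy values are hypothetical, and the toy data have 3 subjects rather than the 24 implied by t [23]. The p value is omitted because the Python standard library does not provide the t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean within-subject difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical attention effects per subject at two depths
superficial = [2.0, 4.0, 6.0]
middle = [1.0, 2.0, 3.0]
t_value, dof = paired_t(superficial, middle)
```

A statistics package that implements the t distribution would be needed to turn the statistic and degrees of freedom into a p value.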

Reviewer #3:

Overall, I appreciate that the authors put in a tremendous amount of work to address all the questions from all three reviewers. I am satisfied with all the answers except this answer:

"Our revised manuscript includes three additional figures displaying raw data to allow the reader to better assess the data quality. (1) A figure showing raw, layer-specific BOLD time courses for each experimental condition and each ROI (Figure 3—figure supplement 3)"

This is a time course but not a map of actual layer activity.

We have now added a map of actual layer activity, Figure 3—figure supplement 9.

Associated Data

    Data Citations

    1. Lawrence S, Norris DG, de Lange FP. 2019. Dissociable laminar profiles of bottom-up and top-down modulation in the human visual cortex. Donders Repository. DSC_3018028.04_752

    Supplementary Materials

    Transparent reporting form
    DOI: 10.7554/eLife.44422.017

    Data Availability Statement

    Data and code used for stimulus presentation and analysis are available online at the Donders Research Data Repository: https://data.donders.ru.nl/collections/di/dccn/DSC_3018028.04_752.

    The following dataset was generated:

    Lawrence S, Norris DG, de Lange FP. 2019. Dissociable laminar profiles of bottom-up and top-down modulation in the human visual cortex. Donders Repository. DSC_3018028.04_752


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
