The Journal of Neuroscience. 2022 Nov 16;42(46):8709–8715. doi: 10.1523/JNEUROSCI.0605-22.2022

Attentional Enhancement of Tracked Stimuli in Early Visual Cortex Has Limited Capacity

Nika Adamian 1, Søren K Andersen 1,2
PMCID: PMC9671574  PMID: 36202616

Abstract

Keeping track of the location of multiple moving objects is one of the well documented functions of visual attention. However, the mechanism of attentional selection that supports such continuous tracking is unclear. In particular, it has been proposed that target selection in early visual cortex occurs in parallel, with tracking errors arising because of attentional limitations at later processing stages. Here, we examine whether, instead, total attentional capacity for enhancement of early visual processing of tracked targets is shared between all attended stimuli. If the magnitude of attentional facilitation of multiple tracked targets was a key limiting factor of tracking ability, then one should expect it to drop systematically with increasing set-size of tracked targets. Human observers (male and female) were instructed to track two, four, or six moving objects among a pool of identical distractors. Steady-state visual evoked potentials (SSVEPs) recorded during the tracking period revealed that the processing of tracked targets was consistently amplified compared with the processing of the distractors. The magnitude of this amplification decreased with increasing set size, and at lateral occipital electrodes it closely followed inverse proportionality to the number of tracked items, suggesting that limited attentional resources must be shared among the tracked stimuli. Accordingly, the magnitude of attentional facilitation predicted the behavioral outcome at the end of the trial. Together, these findings demonstrate that the limitations of multiple object tracking (MOT) across set-sizes stem from the limitations of top-down selective attention already at the early stages of visual processing.

SIGNIFICANCE STATEMENT The ability to selectively attend to relevant features or objects is the key to flexibility of perception and action in the continuously changing environment. This ability is demonstrated in the multiple object tracking (MOT) task where observers monitor multiple independently moving objects at different locations in the visual field. The role of early attentional enhancement in tracking was previously acknowledged in the literature; however, the limitations on tracking were thought to arise during later stages of processing. Here, we demonstrate that the strength of attentional facilitation depends on the number of tracked objects and predicts successful tracking performance. Thus, it is the limitations of attentional enhancement at the early stages of visual processing that determine behavioral performance limits.

Keywords: attention, EEG, frequency tagging, multiple object tracking, spatial attention, steady-state visual evoked potentials

Introduction

Human observers are capable of keeping track of multiple independently moving objects in their visual surround, even in the presence of identical distractors. This ability, ubiquitous in everyday life, is studied in the laboratory setting using the multiple object tracking (MOT) paradigm (Pylyshyn and Storm, 1988; Cavanagh and Alvarez, 2005; Scholl, 2009). A central question in this field is what limits the capacity to track moving objects. Originally, it was proposed that this ability relies on four parallel preattentional mechanisms (FINSTs; Pylyshyn and Storm, 1988) and is thus limited to about four objects. However, later work demonstrated a smooth trade-off between the number of tracked objects and their speed, with participants being able to concurrently track as many as eight slow objects, but only a single fast one (Alvarez and Franconeri, 2007). Accordingly, it was proposed that multiple object tracking relies on the flexible allocation of attentional resources rather than a fixed preattentional architecture (Chen et al., 2013; Franconeri et al., 2013).

Neuroimaging studies investigating set-size effects in MOT found only parietal, but not early visual, brain areas to be sensitive to the number of tracked targets (Culham et al., 1998; Jovicich et al., 2001). This is congruent with a more recent EEG study, which found that sustained attentional modulation of tracked targets in early visual areas was predictive of behavioral responses at the end of the trial, although its magnitude was independent of the number of targets. Attentional modulation of target processing therefore seems necessary for tracking, although limitations of tracking capacity seem to arise from higher stages of processing (Störmer et al., 2013).

This view, however, seems at odds with studies of divided attention to static locations. While multiple objects located noncontiguously can be attended through multiple foci (Awh and Pashler, 2000; Müller et al., 2003), studies have overwhelmingly demonstrated costs associated with dividing attention (Castiello and Umiltà, 1990; McMains and Somers, 2004, 2005). This includes studies where steady-state visual evoked potentials (SSVEPs) were used to continuously measure attentional allocation in early visual cortex (Toffanin et al., 2009; Andersen et al., 2013; Adamian et al., 2019), as in the MOT study by Störmer et al. (2013), which however found no such trade-off. If multifocal attention required for tracking was an extension of divided spatial attention, we would expect that as the number of tracked targets grows, the magnitude of attentional facilitation decreases.

A possible explanation for a lack of set-size effects on attentional target modulation could be that observers group targets into a virtual polygon (Yantis, 1992; Merkel et al., 2014, 2017) and attention enhances this grouped representation equally regardless of the number of constituent targets. Such an explanation however leaves it unclear why grouping would only benefit moving targets and not divided attention to static locations. If this was the case, it might signify the intriguing possibility that the mechanism of selection of moving objects is qualitatively different from the mechanism of static selection (Cavanagh et al., 2014).

Here, we investigated whether attentional enhancement of tracked targets in early visual cortex, as measured by steady-state visual evoked potentials (SSVEPs), is subject to capacity limits.

Participants tracked two, four, or six moving objects among identical distractors. Targets and distractors flickered at different frequencies, driving separate SSVEPs and thereby allowing for the simultaneous examination of attentional allocation to each stimulus type. Importantly, stimulus displays were identical across target number conditions, and flicker frequencies were matched to the stimuli such that the number of frequency-tagged stimuli was not confounded with the number of tracked objects.

If limited attentional resources are distributed among the tracked targets, we should observe a decline in attentional selectivity with increasing set-size. Importantly, if this reflected a strictly limited resource, then the magnitude of attentional modulation should be inversely proportional to the number of tracked targets. Finally, if the bottleneck of tracking performance includes early visual cortex, attentional selection should be predictive of successful tracking.

Materials and Methods

Participants

Twenty-two members of the student community of University of Aberdeen participated in the study (10 female, four left-handed, 21–24 years old). They gave written informed consent and were compensated £10 for their time. All participants reported normal color vision and normal or corrected-to-normal visual acuity. The study was approved by the Ethics Committee of the School of Psychology at University of Aberdeen.

Data from five participants were excluded. Two of them withdrew from the study before completion, and a further three datasets were excluded because over 50% of trials in at least one condition were rejected as a result of EEG recording artifacts and performance. The final sample included 17 participants.

Stimuli and procedure

Stimuli were created using MATLAB (MathWorks Inc.) and the Cogent Graphics package. They were presented in a dimly lit room on a 20-inch CRT monitor with 640 × 480-pixel screen resolution and a refresh rate of 120 Hz. Participants were seated at a viewing distance of ∼60 cm (head position was not restrained by a chin rest) and instructed to maintain their gaze at the fixation point [1.1 degrees of visual angle (dva)] in the center of the screen. Stimuli were presented against a mid-gray (29 cd/m²) background within a centrally positioned light-gray elliptical field (38.5 cd/m², 33.4 dva width, 25.3 dva height) and consisted of 12 identical red discs (12.2 cd/m², 3.8 dva diameter). Target and probe cues were given by outlining each disc in black. Feedback was given by displaying smaller green (correct) or dark red (incorrect) discs on top of the cued discs (Fig. 1).

Figure 1.


Illustration of the trial sequence in the Attend six condition. Tracking targets and probe items were circled during the cueing period and the probing period, respectively. Throughout the trial, targets and distractors flickered at designated frequencies (see Fig. 2). Note that in the experiment probes were presented and responses were collected consecutively. Black arrows during the tracking period represent motion vectors and were not displayed in the experiment.

At the beginning of each trial, the flickering discs were randomly positioned inside the elliptical viewing area, with the constraint that all discs were separated by at least 2.1 dva from each other and the edge. Depending on the condition, two, four or six discs were outlined to mark them as to-be-tracked items. After 1250 ms, the outlines disappeared and the discs started moving in randomly chosen linear trajectories at a constant speed of 3.3 dva/s, bouncing off the borders of the viewing area (including the fixation cross) or each other at physically realistic angles. To prevent targets from overlapping and to reduce the potential effect of crowding, the discs were surrounded by an invisible boundary (1 dva wider than the disc itself) which determined when discs bounced off the aperture or other discs. Thus, starting positions and motion directions of the discs were random, but motion during the trials was deterministic and predictable. The tracking period lasted for 4000 ms, after which the discs stopped moving and two of the discs were sequentially outlined. Participants were asked to report by key press whether each of the outlined discs was a target or a distractor. Targets and distractors were probed with 50% probability on each trial to maintain guessing chance at 50%. After responding to both probes, participants received visual (probed discs were filled with either green or red color) and auditory (high-pitched beep if both responses were correct or low-pitched beep otherwise) feedback. Summary feedback was also given after each block of trials.

Throughout the trial, all 12 discs flickered at their designated frequencies. Irrespective of the condition, two discs flickered at 10.9 Hz, two discs flickered at 13.3 Hz and the remaining eight discs flickered at 12 Hz (see Fig. 2). In all set size conditions the cued to-be-tracked discs included either both 10.9-Hz or both 13.3-Hz items. In addition, when four or six items were tracked, the remaining discs were selected from those flickering at 12 Hz. This allocation of frequency tags allowed us to compare attentional enhancement of tracked targets across set size conditions while controlling for the overall number of presented targets.
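The allocation of frequency tags can be illustrated with a short sketch. It assumes, which the text does not state, that each flicker cycle spans a whole number of frames at the 120-Hz refresh rate (11, 10, and 9 frames); the labels "low", "mid", and "high" and all function names are hypothetical.

```python
# Assumption: each flicker cycle spans a whole number of video frames.
REFRESH_HZ = 120
FRAMES_PER_CYCLE = {"low": 11, "mid": 10, "high": 9}  # hypothetical labels
DISCS_PER_TAG = {"low": 2, "mid": 8, "high": 2}       # counts taken from the text


def tag_frequency(frames_per_cycle, refresh_hz=REFRESH_HZ):
    """Flicker frequency in Hz for a cycle lasting `frames_per_cycle` frames."""
    return refresh_hz / frames_per_cycle


def targets_per_tag(set_size, target_tag):
    """Number of tracked discs carrying each tag for a given set size.

    Both discs at `target_tag` ("low" or "high") are targets; any further
    targets are drawn from the eight 12-Hz ("mid") discs.
    """
    counts = {"low": 0, "mid": 0, "high": 0}
    counts[target_tag] = 2
    counts["mid"] = set_size - 2
    return counts


for tag, frames in FRAMES_PER_CYCLE.items():
    print(tag, round(tag_frequency(frames), 1), "Hz")
print(targets_per_tag(6, "low"))
```

Under this assumption the three cycle lengths give 10.9, 12.0, and 13.3 Hz, and the number of discs carrying each tag stays fixed while only the number of tracked discs changes.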

Figure 2.


Top, Allocation of flicker frequencies to the stimuli. Each circle represents an item in the multiple object tracking task. Bold outline denotes tracking targets in each condition, with colors corresponding to different conditions. Number inside each circle is its flicker frequency in hertz. In all conditions, stimulation included 12 moving discs, which were flickering at three distinct frequencies. Two out of 12 discs always flickered at 10.9 Hz and two always flickered at 13.3 Hz. These discs were always assigned to both be either targets or distractors (e.g., in the Attend two condition, 10.9-Hz items were either targets (blue condition in the figure) or distractors (red condition in the figure)). The remaining discs flickered at 12 Hz and were either all distractors (in the Attend two condition) or a mix of targets and distractors (Attend four and Attend six conditions). Bottom, Grand-averaged amplitude spectrum of a wide cluster of 10 temporo-occipital electrodes obtained by Fourier transformation and zero-padded to 16,384 points. Insets are zoomed-in views of the spectra focused on the two frequencies of interest: 10.9 and 13.3 Hz. The 12-Hz amplitude peak is expectedly large given that eight discs flickered at 12 Hz. Color coding corresponds to the conditions depicted in the top panel.

There was a total of 336 trials delivered in eight blocks of 42 trials each. There were three set size conditions (Attend two, Attend four, and Attend six) which were duplicated for each target stimulation frequency (i.e., Attend two where targets flicker at 10.9 Hz and Attend two where targets flicker at 13.3 Hz), totaling 56 trials per condition. The latter manipulation occurred without participants knowing about it. Trials of different conditions were presented in a randomized order with each block containing seven trials from each condition. The same starting positions (and hence trajectories) of all objects were repeated once for each of the six conditions in every block thus keeping physical stimuli identical across attentional conditions.

Data analysis

Behavioral data analysis

Accuracy was analyzed as a function of set size. For this and all the subsequent analyses, responses in each trial were classified as correct if both probed discs were identified correctly, and incorrect otherwise. Therefore, guessing chance on the trial level was 25%. Accuracy rates (percentage correct) were submitted to one-way repeated measures ANOVA. All ANOVA analyses in this study were conducted with Greenhouse–Geisser correction for nonsphericity.
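The trial-level guessing rate follows from treating the two probe responses as independent guesses, each correct with probability 0.5. A minimal sketch, with a hypothetical function name:

```python
def chance_all_correct(n_probes, p_single=0.5):
    """Probability of answering every probe correctly by guessing alone."""
    return p_single ** n_probes


print(chance_all_correct(1))  # one probe
print(chance_all_correct(2))  # two probes, as in this task
```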

EEG acquisition and preprocessing

EEG data were recorded using an ActiveTwo amplifier system (Biosemi) from 64 Ag/AgCl electrodes at a sampling rate of 256 Hz. The default 10–20 electrode locations were modified by moving electrodes from positions T7/8 and F5/6 to PO9/10 and I1/2 to enhance spatial resolution of posterior locations. Eye movements and blinks were monitored by electrooculographic recordings from supraorbital and infraorbital right eye electrodes (vertical EOG) and outer canthi of both eyes (horizontal EOG). EEG data were processed using the EEGLAB toolbox (Delorme and Makeig, 2004) as well as custom MATLAB (MathWorks Inc.) routines.

Epochs were extracted from 400 to 3900 ms after motion onset. Epochs with blinks and eye movements (larger than 20 μV) were removed, as well as epochs when eye movements occurred during the cueing period. The averaged EOG traces after artifact removal indicated that remaining gaze position deviations from fixation were smaller than 0.8° (estimated following the method described by Mangun and Hillyard, 1991).

Epoch mean and linear trend were removed from each epoch. The remaining epochs were submitted to an automated preprocessing routine (Junghöfer et al., 2000) which replaces artifact-contaminated sensors with statistically weighted spherical interpolation or rejects entire trials if too many sensors are contaminated by artifacts. The average trial rejection rate was 15.7% (±6.55%) of trials across participants and conditions. The average number of interpolated channels was 3.81% (±1.32%). The remaining trials were subjected to scalp current density (SCD) transformation by means of spherical spline interpolation (Perrin et al., 1989).
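Removal of the epoch mean and linear trend can be sketched as an ordinary least-squares line fit that is subtracted from the epoch. This is an illustrative stand-in written with the Python standard library, not the routine used in the study; the function name is hypothetical.

```python
def remove_mean_and_trend(epoch):
    """Subtract the least-squares straight line (offset plus slope) from an epoch."""
    n = len(epoch)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(epoch) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(epoch))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(epoch)]


# A pure ramp with an offset is removed entirely.
ramp = [5.0 + 0.25 * t for t in range(8)]
print(remove_mean_and_trend(ramp))
```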

Based on the distribution of amplitude and phase of SSVEPs across electrodes (see Fig. 3), we identified two clusters of electrodes for further analysis. The midline occipital cluster included electrodes POz and Oz, and the lateral parieto-occipital cluster included electrodes P5/6, P7/8, and PO7. The electrodes were selected based on the overall signal amplitude and subsequently grouped into clusters based on phase similarity within and phase dissimilarity between clusters, an approach introduced by Andersen et al. (2012). Both subsequent analyses, of average and of single-trial SSVEP amplitudes, were performed on the two electrode clusters separately. Importantly, both electrode selection and clustering were performed on EEG data averaged across conditions and therefore reflected overall SSVEP signal strength and timing rather than any potential condition differences.

Figure 3.


Topographical maps and phase coherence of SSVEPs. For each stimulation frequency: (top) grand mean scalp current density (SCD) map of SSVEP amplitudes averaged across conditions. Maximum amplitudes were obtained at midline occipital and lateral parieto-occipital sites. Note that the 12-Hz map has a different scale. Middle, Grand mean SSVEP phase map averaged across conditions. Cluster borders were clearly defined by the phase differences. All phases were rotated to align Oz electrodes to minus π/2 radians. Bottom, Phase coherence for all pairs of electrodes averaged across participants and conditions. Phase coherence was defined as the cosine of the phase difference between the two electrodes of each pair: a value close to 1 corresponds to an almost identical phase of the two electrodes of the pair. Triangles and asterisks on topographical maps indicate electrode locations included in the midline-occipital and lateral parieto-occipital cluster respectively.
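The phase coherence measure and the phase rotation described in this caption can be sketched for a single pair of electrodes as follows. The complex coefficients are hypothetical values chosen for illustration, and the function names are not taken from the authors' code.

```python
import cmath
import math


def phase_coherence(coef_a, coef_b):
    """Cosine of the phase difference between two complex Fourier coefficients."""
    return math.cos(cmath.phase(coef_a) - cmath.phase(coef_b))


def rotate_to_reference(coefs, reference, target_phase=-math.pi / 2):
    """Rotate all coefficients so that `reference` has phase `target_phase`."""
    rotation = cmath.exp(1j * (target_phase - cmath.phase(coefs[reference])))
    return {name: c * rotation for name, c in coefs.items()}


# Hypothetical coefficients (amplitude, phase in radians) for three electrodes.
coefs = {
    "Oz": cmath.rect(1.0, 0.4),
    "POz": cmath.rect(0.8, 0.5),
    "PO7": cmath.rect(0.6, 2.0),
}
rotated = rotate_to_reference(coefs, "Oz")
print(phase_coherence(coefs["Oz"], coefs["POz"]))  # similar phase, close to 1
print(phase_coherence(coefs["Oz"], coefs["PO7"]))  # dissimilar phase
```

Because a common rotation shifts every phase by the same amount, phase differences, and hence coherence values, are unchanged by the alignment step.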

The analysis of average SSVEP amplitudes was performed only on trials with correct responses. This ensures that the SSVEP amplitudes reflect the expected number of tracked targets at a particular frequency, since trials with partially correct or fully incorrect responses are likely to be contaminated by tracking errors, such as “dropping” some of the targets and only tracking a subset of the cued targets. The average number of remaining epochs per condition and participant was 35 (±6.5). SSVEP amplitudes at frequencies of interest (10.9, 12.0, and 13.3 Hz) were obtained from averaged epochs as the absolute value of the complex Fourier coefficients for each frequency, condition, and participant. Single-trial SSVEP analysis included epochs preceding both correct and incorrect responses for subsequent classification. Single-trial SSVEP amplitudes were computed for each frequency, participant, and trial by projecting complex Fourier coefficients within each condition onto their mean phase (Andersen et al., 2008; Störmer et al., 2013; Adamian et al., 2019). This yields the contribution of each individual trial to the phase-locked SSVEP amplitude.
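The two amplitude measures can be sketched as follows, using only the Python standard library. The epochs are synthetic 12-Hz cosines with hypothetical amplitudes; the epoch length of 896 samples corresponds to the 3500-ms window at 256 Hz, and the normalization by 2/N is an assumption made so that a pure sinusoid returns its own amplitude.

```python
import cmath
import math

FS_HZ = 256          # sampling rate from the text
EPOCH_SAMPLES = 896  # 3500 ms (400 to 3900 ms after motion onset) at 256 Hz


def fourier_coefficient(signal, freq_hz, fs_hz=FS_HZ):
    """Complex Fourier coefficient of `signal` at exactly `freq_hz`."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * t / fs_hz)
                for t, x in enumerate(signal))
    return 2 * total / n


def average_amplitude(epochs, freq_hz):
    """Amplitude of the epoch-averaged signal (phase-locked SSVEP amplitude)."""
    n = len(epochs[0])
    mean_epoch = [sum(ep[t] for ep in epochs) / len(epochs) for t in range(n)]
    return abs(fourier_coefficient(mean_epoch, freq_hz))


def single_trial_amplitudes(epochs, freq_hz):
    """Project each trial's coefficient onto the mean phase across trials."""
    coefs = [fourier_coefficient(ep, freq_hz) for ep in epochs]
    mean_coef = sum(coefs) / len(coefs)
    unit = mean_coef / abs(mean_coef)  # undefined if the mean coefficient is zero
    return [(c * unit.conjugate()).real for c in coefs]


def make_epoch(amplitude, freq_hz=12.0, phase=0.3):
    """Synthetic epoch: a cosine with hypothetical amplitude and phase."""
    return [amplitude * math.cos(2 * math.pi * freq_hz * t / FS_HZ + phase)
            for t in range(EPOCH_SAMPLES)]


epochs = [make_epoch(1.0), make_epoch(3.0)]
print(average_amplitude(epochs, 12.0))
print(single_trial_amplitudes(epochs, 12.0))
```

By linearity of the Fourier transform, the mean of the projected single-trial amplitudes equals the amplitude of the averaged epoch, which is the sense in which each projection is a trial's contribution to the phase-locked amplitude.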

Analysis of average SSVEP amplitudes

The goal of this analysis was to identify whether the magnitude of attentional modulation depends on set size, and more specifically whether it is inversely proportional to the number of tracked targets. First, SSVEP amplitudes within the electrode cluster were averaged across conditions for each participant and frequency separately. To make SSVEP amplitudes comparable across frequencies and participants, they were then rescaled to a mean of 1.0 by dividing individual amplitudes (for each condition, participant, and frequency) by the mean over all six conditions. Finally, rescaled amplitudes were averaged across frequencies to yield mean SSVEP amplitudes for every condition and participant (see Andersen et al., 2008, 2011, 2013 for the rescaling method applied to other SSVEP studies). This procedure was performed for each electrode cluster separately. The resulting amplitudes were submitted to a repeated measures ANOVA with factors Attention (Attended vs Unattended) and Set size (Two, Four, and Six).
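The rescaling step can be sketched for a single participant and electrode cluster as follows. The amplitude values are invented for illustration, and the dictionary layout and function names are assumptions rather than the authors' implementation.

```python
def rescale(amps_by_condition):
    """Divide each condition's amplitude by the mean over all conditions."""
    mean_amp = sum(amps_by_condition.values()) / len(amps_by_condition)
    return {cond: amp / mean_amp for cond, amp in amps_by_condition.items()}


def mean_rescaled(amps_by_freq):
    """Rescale within each frequency, then average across frequencies."""
    rescaled = [rescale(amps) for amps in amps_by_freq.values()]
    conditions = list(rescaled[0])
    return {c: sum(r[c] for r in rescaled) / len(rescaled) for c in conditions}


# Hypothetical amplitudes for one participant; keys are (attention, set size).
amps_by_freq = {
    10.9: {("att", 2): 1.30, ("att", 4): 1.15, ("att", 6): 1.10,
           ("unatt", 2): 0.95, ("unatt", 4): 1.00, ("unatt", 6): 1.00},
    13.3: {("att", 2): 0.65, ("att", 4): 0.58, ("att", 6): 0.54,
           ("unatt", 2): 0.48, ("unatt", 4): 0.50, ("unatt", 6): 0.49},
}
result = mean_rescaled(amps_by_freq)
print(result)
```

After rescaling, the six condition values of each frequency average to 1.0, so frequencies with very different raw amplitudes contribute equally to the mean.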

SSVEP amplitudes at 12 Hz were analyzed separately as they were not manipulated by the attentional conditions in the same way as 10.9- and 13.3-Hz items. Among the eight items flickering at 12 Hz in Set size condition Two, no items were attended; in Set size condition Four, two items were attended; and in Set size condition Six, four items were attended. These SSVEP amplitudes were separately rescaled and submitted to a repeated measures ANOVA with factor Set size (Zero, Two, Four).

Finally, we tested whether attentional selectivity in each cluster can be described as inversely proportional to set size. To this end, we scaled a 1/n function (where n is the number of targets) to match the average attentional modulation of individual participants. Scaling was done in the form of a one-parameter fit where the intercept of the 1/n function was determined by the average attentional modulation of individual participants' SSVEP amplitudes across the three set size conditions. We then tested whether the empirically observed attentional modulations deviated from the hypothetical 1/n function on the group level. Evidence in favor of the null hypothesis that the observed values did not deviate from the 1/n predictions was assessed with a one-way repeated measures Bayesian ANOVA of residuals (Rouder et al., 2009) with factor Set size.
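One reading of the one-parameter fit is that the 1/n function is scaled so that its mean across the three set sizes equals a participant's mean observed attentional modulation. The sketch below follows that reading; the observed values are hypothetical and the function name is illustrative.

```python
SET_SIZES = (2, 4, 6)


def fit_inverse_proportional(observed):
    """Scale c/n so its mean over set sizes matches the mean observed modulation.

    `observed` maps set size -> attentional modulation for one participant.
    Returns the scale c, the predictions c/n, and the residuals.
    """
    mean_obs = sum(observed[n] for n in SET_SIZES) / len(SET_SIZES)
    mean_inv = sum(1 / n for n in SET_SIZES) / len(SET_SIZES)
    scale = mean_obs / mean_inv
    predicted = {n: scale / n for n in SET_SIZES}
    residuals = {n: observed[n] - predicted[n] for n in SET_SIZES}
    return scale, predicted, residuals


# Hypothetical modulations (attended minus unattended rescaled amplitude).
observed = {2: 0.30, 4: 0.18, 6: 0.07}
scale, predicted, residuals = fit_inverse_proportional(observed)
print(scale, predicted, residuals)
```

With this scaling the residuals of each participant sum to zero by construction, so the subsequent test asks whether their pattern across set sizes departs systematically from the 1/n shape.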

Single-trial analysis

The goal of the single-trial analysis was to test whether attentional selection as indexed by SSVEP amplitudes is predictive of behavioral performance on a trial-by-trial basis. Single-trial amplitude values were rescaled following the same procedure as in the previous analysis, and attentional selection was computed as the difference between the rescaled attended and unattended amplitude. For each trial, the electrode with the highest attentional modulation within each cluster was selected to enhance the sensitivity of the analysis. Note that the criterion used for electrode selection (overall attentional modulation) is independent from the correlation with the behavioral outcome of the trials. We then performed a median split of trials for each subject based on the magnitude of attentional modulation of single-trial SSVEPs (i.e., separated trials with attentional effects below and above the median) and calculated mean accuracy rates for trials with low and high attentional modulation, which were then compared statistically. In addition, we conducted multilevel logistic regression using the glmer function in the lme4 package for R (Bates et al., 2015). Response correctness on the trial-by-trial level was regressed on the fixed effect of attentional modulation (difference between rescaled attended and unattended amplitude). Participant intercepts were included as random effects.
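The median split can be sketched for one participant as follows. The trial values are invented, and the handling of trials that fall exactly on the median (assigned to the high group here) is an assumption, since the text does not specify it.

```python
import statistics


def median_split_accuracy(trials):
    """Accuracy for trials below versus at or above the median modulation.

    `trials` is a list of (attentional_modulation, correct) pairs for one
    participant. Trials exactly at the median go to the high group (assumption).
    """
    median = statistics.median(mod for mod, _ in trials)
    low = [correct for mod, correct in trials if mod < median]
    high = [correct for mod, correct in trials if mod >= median]
    return sum(low) / len(low), sum(high) / len(high)


# Hypothetical single-trial modulations and response correctness.
trials = [(0.1, False), (0.2, True), (0.5, True), (0.9, True)]
low_acc, high_acc = median_split_accuracy(trials)
print(low_acc, high_acc)
```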

Raw data, summary data and analysis code are available at https://osf.io/a36kw/.

Results

Participants made more errors when tracking more objects (F(2,32) = 52.453, p < 10⁻⁶, η² = 0.54). Tracking accuracy (Fig. 4A) was reduced when tracking four compared with two discs (t(16) = 4.82, p = 0.002) and was lowest when tracking six discs (four vs six: t(16) = 8.41, p < 10⁻⁶).

Figure 4.


A, Mean accuracy rates for three set size conditions. B, Rescaled grand mean SSVEP amplitudes at midline occipital and lateral parieto-occipital electrode clusters. C, Attentional modulation (difference between SSVEP amplitudes in Attended and Unattended conditions) and predicted attentional modulation under the assumption of inverse proportionality. D, Residuals between observed data and 1/n prediction for both electrode clusters. Error bars denote within-subject 95% confidence intervals (Morey, 2008) in panels A–C and between-subject 95% confidence intervals in panel D.

Average SSVEP amplitudes were larger when the objects eliciting them were targets rather than distractors (main effect of attention; midline occipital: F(1,16) = 57.77, p < 10⁻⁶, η² = 0.35; lateral parieto-occipital: F(1,16) = 27.18, p < 10⁻⁵, η² = 0.30; Fig. 4B), confirming that processing of tracked targets is prioritized at the level of early visual cortex. Attentional modulation was larger when fewer targets were attended in both clusters (Set size × Attention; midline occipital: F(2,32) = 3.70, p = 0.036, η² = 0.07; lateral parieto-occipital: F(2,32) = 3.86, p = 0.03, η² = 0.07). Post hoc pairwise comparisons between the amplitudes at set sizes two and six showed statistically significant modulation of Attended amplitudes (midline cluster: t(17) = 2.34, p = 0.03; lateral cluster: t(17) = 2.26, p = 0.04) but not the Unattended ones (midline cluster: t(17) = −1.28, p = 0.2; lateral cluster: t(17) = −0.66, p = 0.5).

SSVEP amplitudes elicited by the 12-Hz items were not significantly modulated by set size (midline cluster: F(2,32) = 0.35, p = 0.71, η² = 0.013; lateral cluster: F(2,32) = 0.296, p = 0.75, η² = 0.011).

Differences between the observed SSVEP amplitudes and values predicted by inverse proportionality did not significantly deviate from zero in any of the individual conditions (see Table 1 and Fig. 4D). Bayesian repeated measures ANOVA (Wagenmakers and Lee, 2014) revealed no evidence in favor of the 1/n prediction in the midline cluster (Table 1), with data 1.9 times more likely under the null hypothesis of no deviations from the 1/n prediction. However, in the lateral cluster the data provided moderate evidence (BF01 = 6.3) in favor of the 1/n prediction. These results suggest that the SSVEP amplitudes in the lateral parieto-occipital cluster reflect allocation of a strictly limited resource. While the modulation of SSVEP amplitudes in the midline cluster is also set size dependent, it does not exhibit inverse proportionality to set size to the same degree as in the lateral parieto-occipital cluster.

Table 1.

Summary of the tests of deviation between 1/n prediction and observed attentional effects

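| Cluster | Set size | Degrees of freedom | t statistic | p value | BF10 |
|---|---|---|---|---|---|
| Midline occipital | Two | 17 | −1.39 | 0.18 | 0.52 |
| | Four | 17 | 1.54 | 0.14 | |
| | Six | 17 | −0.186 | 0.86 | |
| Lateral parieto-occipital | Two | 17 | −0.351 | 0.73 | 0.16 |
| | Four | 17 | 0.0907 | 0.93 | |
| | Six | 17 | 0.243 | 0.81 | |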

p-values not corrected for multiple comparisons.
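Table 1 reports Bayes factors as BF10, the evidence for a deviation from the 1/n prediction relative to the null hypothesis of no deviation. Evidence for the null is the reciprocal, BF01 = 1/BF10. A minimal sketch using the tabulated values (the function name is hypothetical):

```python
def bf01_from_bf10(bf10):
    """Evidence for the null relative to the alternative: BF01 = 1 / BF10."""
    return 1.0 / bf10


# BF10 values as tabulated (rounded to two decimals in Table 1).
TABLE_BF10 = {"midline occipital": 0.52, "lateral parieto-occipital": 0.16}

for cluster, bf10 in TABLE_BF10.items():
    print(cluster, bf01_from_bf10(bf10))
```

The reciprocals are about 1.9 and about 6.25; small differences from values quoted in the text are consistent with rounding of the tabulated Bayes factors.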

To test whether limits of attentional selection in early visual cortex are linked to tracking performance, we examined whether attentional modulation of SSVEP amplitudes predicts accuracy on a trial-by-trial basis. Figure 5 shows that on trials with lower-than-average attentional selection in the lateral parieto-occipital cluster participants were more likely to produce an error response (low vs high selection: t(50) = −2.67, p = 0.01, d = 0.34). This pattern was not observed in the midline occipital cluster (t(50) = −0.85, p = 0.4, d = 0.1). Mixed model logistic regression confirmed that attentional selection in the lateral cluster predicts performance (OR: 1.05 [CI: 1.01–1.09], p = 0.01).

Figure 5.


Mean accuracy rates for trials with large and small single-trial SSVEP amplitude modulations. Trials with larger attentional modulation in the lateral electrode cluster demonstrate higher performance. Error bars are within-subject 95% confidence intervals.

Discussion

Our results demonstrate that multiple object tracking is closely associated with attentional selection in early visual cortex, and that tracking errors are predicted by these early attentional capacity limitations. We used the frequency-tagging technique and a set size manipulation to concurrently measure allocation of attention to a varied number of tracked targets and distractors. The SSVEP signal has two main generators located in primary visual cortex (V1) and in motion-sensitive MT (Di Russo et al., 2007; Störmer et al., 2013). Accordingly, we identified two clusters of electrodes, midline occipital and lateral parieto-occipital, which both exhibited set-size dependency of attentional enhancement and are located over V1 and MT topographically. Both primary visual cortex and MT have the shortest response latencies to a visual stimulus, thus activity in both clusters reflects chronologically early stages of cortical processing of visual information (Lamme and Roelfsema, 2000).

We found that relative attentional enhancement at these early stages decreases with set size, supporting the idea that attentional capacity for neural enhancement is shared between all tracked stimuli. Further, we confirmed that in the lateral parieto-occipital cluster, likely reflecting activity in motion-sensitive area MT, this dependency was inversely proportional to the number of tracked targets as well as predictive of performance. Together these findings demonstrate that during tracking, attention operates in a capacity limited manner already at the early stages of processing, and that these limits significantly contribute to the outcome of tracking.

Our key finding, the set-size dependency of attentional modulation, seems to conflict with a previous SSVEP study of multiple object tracking (Störmer et al., 2013), which also found a continuous attentional boost to target processing, but no set-size dependency of this effect. There are a number of differences between the two implementations of multiple object tracking tasks which can explain this discrepancy. First, our study probed a wider range of set sizes (two, four, and six here vs five and seven in Störmer et al., 2013), which resulted in a stronger manipulation of attention during tracking. Second, we kept the physical stimuli identical between different set-size conditions. The assignment of tagging frequencies to stimuli allowed us to differentiate target-induced and distractor-induced SSVEPs across set size conditions without changing the physical number of tagged and presented items. This feature of the experimental design ensured that there were no changes in spatial interference between targets and distractors that could affect attentional selectivity (Franconeri et al., 2010) and differs from Störmer et al. (2013), where participants tracked either five out of 10 or seven out of 14 presented objects. Last, we employed a more stringent test of participants' performance by probing two instead of one item after each trial, reducing the chance of classifying guessed responses as correctly tracked from 50% to 25%. Together, these features of our task make it more sensitive to the changes in attentional selectivity between conditions.

While set-size dependency was present in both electrode clusters, in the lateral parieto-occipital cluster (P7/PO7, P5/6, P8) attentional selectivity was inversely proportional to the number of tracked targets and predictive of performance at the end of the tracking period. Given the known cortical sources of SSVEPs in primary visual cortex and in MT (Di Russo et al., 2007; Störmer et al., 2013), it is likely that the lateral parieto-occipital cluster predominantly reflects MT activity. Thus, our study provides evidence that limitations of tracking capacity are reflected in visual cortex well before the posterior parietal areas, where set size dependency was demonstrated earlier (Jovicich et al., 2001). However, our findings are not necessarily inconsistent with Jovicich et al. (2001), in that limitations of top-down modulation of visual processing might arise from the brain structures producing these attentional top-down control signals.

Interestingly, early attentional enhancement of targets was demonstrated to be hemifield specific, i.e., targets are selected in left and right visual fields independently (Störmer et al., 2014). Future studies could expand this finding to test whether set size dependency in early visual processing exhibits the same pattern.

The Attend two condition differed from Attend four and six conditions in that all targets flickered at one frequency that was distinct from the flicker frequency of all distractors. If flicker frequency were a useful cue for attentional selection, this might have facilitated selection in the Attend two condition. However, control experiments in previous studies have specifically tested for this possibility and consistently found attentional selection to be unaffected by differences in flicker frequency of targets and distractors (Müller et al., 2006; Störmer et al., 2013).

In line with studies of divided attention to static locations, we observed costs associated with dividing attention between multiple stimuli. Accordingly, the strongest effect of attentional selection on SSVEP amplitudes was observed when attention was spread across only two objects (relative amplitude enhancement of 34% or d = 1.49). This effect size is larger than those observed in other studies where SSVEPs were measured as spatial attention was split into two foci (Andersen et al., 2013, 24% or d = 1.06; Adamian et al., 2019, 27% or d = 1.1). The obvious difference between these studies is that the MOT task engages spatial attention only, while the other studies use divided spatial attention as a means of performing a feature-based task. However, our previous studies demonstrated that concurrent deployment of attention to different dimensions, such as color and orientation (Andersen et al., 2015) or color and space (Adamian et al., 2019), is independent (i.e., it does not incur costs), and thus one might expect the magnitude of attentional enhancement to be equal in the MOT task and in other tasks where spatial attention was divided. However, the present results suggest that MOT engages spatial attention above and beyond what is expected from splitting it into static foci. It is possible that continuously attending to a moving object is easier than keeping one's attention on a static one because of the bottom-up signal constantly provided by the object that successively changes position. Potentially related evidence shows that attention moves faster when it pursues a moving object compared with shifting between static locations (Horowitz et al., 2004; Hogendoorn et al., 2007). It should be noted, though, that the same explanation could be used to argue that motion has the potential to increase distractor saliency too.
Finally, it is also possible that the requirement to covertly shift and sustain attention ∼5 dva into the periphery on each side while maintaining central fixation in static experiments hampers the effect of spatial attention on SSVEPs. When fixation is not required during MOT, observers tend to look at the central point between the targets (Fehd and Seiffert, 2008, 2010). If a similar strategy is employed during covert tracking, over the course of the trial attention will approach or coincide with fixation, relaxing the requirement of dissociating overt and covert focus and freeing attentional resources. In summary, attention to moving and static stimuli seems qualitatively similar in that divided attention incurs costs in both cases, but attention to moving stimuli may produce quantitatively larger effects.

To sum up, the present study demonstrated that during multiple object tracking attention operates in a capacity-limited manner already at the early stages of visual processing. The magnitude of attentional enhancement enjoyed by the tracked targets in early visual areas decreases with their number. In addition, we identified a distinct bilateral group of electrodes in which attentional selection is inversely proportional to the number of tracked targets. The magnitude of this selection also predicts successful tracking performance, further solidifying the role of early visual cortex in supporting spatiotemporal attention that keeps track of multiple moving objects.

Footnotes

This work was supported by the Biotechnology and Biological Sciences Research Council Grant BB/P002404/1 (to S.K.A.) and the Leverhulme Early Career Fellowship ECF-2020-488 (to N.A.). We thank Alex O. Holcombe and Christian Merkel for their many insightful comments and suggestions. We also thank Rafael Lemarchand for his help in data collection.

The authors declare no competing financial interests.

References

  1. Adamian N, Slaustaite E, Andersen SK (2019) Top-down attention is limited within but not between feature dimensions. J Cogn Neurosci 31:1173–1183.
  2. Alvarez GA, Franconeri SL (2007) How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. J Vis 7:14. 10.1167/7.13.14
  3. Andersen SK, Hillyard SA, Müller MM (2008) Attention facilitates multiple stimulus features in parallel in human visual cortex. Curr Biol 18:1006–1009. 10.1016/j.cub.2008.06.030
  4. Andersen SK, Fuchs S, Müller MM (2011) Effects of feature-selective and spatial attention at different stages of visual processing. J Cogn Neurosci 23:238–246. 10.1162/jocn.2009.21328
  5. Andersen SK, Müller MM, Martinovic J (2012) Bottom-up biases in feature-selective attention. J Neurosci 32:16953–16958. 10.1523/JNEUROSCI.1767-12.2012
  6. Andersen SK, Hillyard SA, Müller MM (2013) Global facilitation of attended features is obligatory and restricts divided attention. J Neurosci 33:18200–18207. 10.1523/JNEUROSCI.1913-13.2013
  7. Andersen SK, Müller MM, Hillyard SA (2015) Attentional selection of feature conjunctions is accomplished by parallel and independent selection of single features. J Neurosci 35:9912–9919. 10.1523/JNEUROSCI.5268-14.2015
  8. Awh E, Pashler H (2000) Evidence for split attentional foci. J Exp Psychol Hum Percept Perform 26:834–846. 10.1037/0096-1523.26.2.834
  9. Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67:1–48. 10.18637/jss.v067.i01
  10. Castiello U, Umiltà C (1990) Size of the attentional focus and efficiency of processing. Acta Psychol (Amst) 73:195–209. 10.1016/0001-6918(90)90022-8
  11. Cavanagh P, Alvarez GA (2005) Tracking multiple targets with multifocal attention. Trends Cogn Sci 9:349–354. 10.1016/j.tics.2005.05.009
  12. Cavanagh P, Battelli L, Holcombe A (2014) Dynamic attention. In: The Oxford handbook of attention. Oxford: Oxford University Press.
  13. Chen WY, Howe PD, Holcombe AO (2013) Resource demands of object tracking and differential allocation of the resource. Atten Percept Psychophys 75:710–725. 10.3758/s13414-013-0425-1
  14. Culham JC, Brandt SA, Cavanagh P, Kanwisher NG, Dale AM, Tootell RBH (1998) Cortical fMRI activation produced by attentive tracking of moving targets. J Neurophysiol 80:2657–2670. 10.1152/jn.1998.80.5.2657
  15. Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 134:9–21. 10.1016/j.jneumeth.2003.10.009
  16. Di Russo F, Pitzalis S, Aprile T, Spitoni G, Patria F, Stella A, Spinelli D, Hillyard SA (2007) Spatiotemporal analysis of the cortical sources of the steady-state visual evoked potential. Hum Brain Mapp 28:323–334. 10.1002/hbm.20276
  17. Fehd HM, Seiffert AE (2008) Eye movements during multiple object tracking: where do participants look? Cognition 108:201–209. 10.1016/j.cognition.2007.11.008
  18. Fehd HM, Seiffert AE (2010) Looking at the center of the targets helps multiple object tracking. J Vis 10:19.1–13. 10.1167/10.4.19
  19. Franconeri SL, Jonathan SV, Scimeca JM (2010) Tracking multiple objects is limited only by object spacing, not by speed, time, or capacity. Psychol Sci 21:920–925. 10.1177/0956797610373935
  20. Franconeri SL, Alvarez GA, Cavanagh P (2013) Flexible cognitive resources: competitive content maps for attention and memory. Trends Cogn Sci 17:134–141. 10.1016/j.tics.2013.01.010
  21. Hogendoorn H, Carlson TA, Verstraten FAJ (2007) The time course of attentive tracking. J Vis 7:2. 10.1167/7.14.2
  22. Horowitz TS, Holcombe AO, Wolfe JM, Arsenio HC, DiMase JS (2004) Attentional pursuit is faster than attentional saccade. J Vis 4:585–603. 10.1167/4.7.6
  23. Jovicich J, Peters RJ, Koch C, Braun J, Chang L, Ernst T (2001) Brain areas specific for attentional load in a motion-tracking task. J Cogn Neurosci 13:1048–1058. 10.1162/089892901753294347
  24. Junghöfer M, Elbert T, Tucker DM, Rockstroh B (2000) Statistical control of artifacts in dense array EEG/MEG studies. Psychophysiology 37:523–532.
  25. Lamme VAF, Roelfsema PR (2000) The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci 23:571–579. 10.1016/S0166-2236(00)01657-X
  26. Mangun GR, Hillyard SA (1991) Modulations of sensory-evoked brain potentials indicate changes in perceptual processing during visual-spatial priming. J Exp Psychol Hum Percept Perform 17:1057–1074. 10.1037/0096-1523.17.4.1057
  27. McMains SA, Somers DC (2004) Multiple spotlights of attentional selection in human visual cortex. Neuron 42:677–686. 10.1016/s0896-6273(04)00263-6
  28. McMains SA, Somers DC (2005) Processing efficiency of divided spatial attention mechanisms in human visual cortex. J Neurosci 25:9444–9448. 10.1523/JNEUROSCI.2647-05.2005
  29. Merkel C, Stoppel CM, Hillyard SA, Heinze HJ, Hopf JM, Schoenfeld MA (2014) Spatio-temporal patterns of brain activity distinguish strategies of multiple-object tracking. J Cogn Neurosci 26:28–40. 10.1162/jocn_a_00455
  30. Merkel C, Hopf JM, Schoenfeld MA (2017) Spatio-temporal dynamics of attentional selection stages during multiple object tracking. NeuroImage 146:484–491. 10.1016/j.neuroimage.2016.10.046
  31. Morey RD (2008) Confidence intervals from normalized data: a correction to Cousineau (2005). Tutor Quant Methods Psychol 4:61–64. 10.20982/tqmp.04.2.p061
  32. Müller MM, Malinowski P, Gruber T, Hillyard SA (2003) Sustained division of the attentional spotlight. Nature 424:309–312. 10.1038/nature01812
  33. Müller MM, Andersen S, Trujillo NJ, Valdés-Sosa P, Malinowski P, Hillyard SA (2006) Feature-selective attention enhances color signals in early visual areas of the human brain. Proc Natl Acad Sci U S A 103:14250–14254. 10.1073/pnas.0606668103
  34. Perrin F, Pernier J, Bertrand O, Echallier JF (1989) Spherical splines for scalp potential and current density mapping. Electroencephalogr Clin Neurophysiol 72:184–187. 10.1016/0013-4694(89)90180-6
  35. Pylyshyn ZW, Storm RW (1988) Tracking multiple independent targets: evidence for a parallel tracking mechanism. Spat Vis 3:179–197. 10.1163/156856888x00122
  36. Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16:225–237. 10.3758/PBR.16.2.225
  37. Scholl BJ (2009) What have we learned about attention from multiple-object tracking (and vice versa)? In: Computation, cognition, and Pylyshyn, pp 49–77. Cambridge: The MIT Press.
  38. Störmer VS, Winther GN, Li SC, Andersen SK (2013) Sustained multifocal attentional enhancement of stimulus processing in early visual areas predicts tracking performance. J Neurosci 33:5346–5351. 10.1523/JNEUROSCI.4015-12.2013
  39. Störmer VS, Alvarez GA, Cavanagh P (2014) Within-Hemifield competition in early visual areas limits the ability to track multiple objects with attention. J Neurosci 34:11526–11533. 10.1523/JNEUROSCI.0980-14.2014
  40. Toffanin P, de Jong R, Johnson A, Martens S (2009) Using frequency tagging to quantify attentional deployment in a visual divided attention task. Int J Psychophysiol 72:289–298. 10.1016/j.ijpsycho.2009.01.006
  41. Wagenmakers EJ, Lee MD, eds (2014) Bayesian model comparison. In: Bayesian cognitive modeling: a practical course, pp 101–117. Cambridge: Cambridge University Press.
  42. Yantis S (1992) Multielement visual tracking: attention and perceptual organization. Cogn Psychol 24:295–340. 10.1016/0010-0285(92)90010-y

Articles from The Journal of Neuroscience are provided here courtesy of Society for Neuroscience
