bioRxiv [Preprint]. 2025 Sep 5:2025.06.06.658316. Originally published 2025 Jun 10. [Version 2] doi: 10.1101/2025.06.06.658316

Post-Saccadic Disruption of Semantic Category Information in Naturalistic Scenes

Yong Min Choi 1,2,*, Tzu-Yao Chiu 1, Julie D Golomb 1

Abstract

During natural vision, people make saccades to efficiently sample visual information from complex scenes. However, a substantial body of evidence has shown impaired visual information processing around the time of a saccade. It remains unclear how saccades affect the processing of high-level visual attributes, such as semantic category information, which are essential for navigating dynamic environments and supporting complex behavioral goals. Here, we investigated whether and how the processing of semantic category information in naturalistic scenes is altered immediately after a saccade. Through both human behavioral and neuroimaging studies, we compared semantic category judgments (Experiments 1A and 1B) and neural representations (Experiment 2) for scene images presented at different time points following saccadic eye movements. In the behavioral experiments, we found a robust reduction in scene categorization accuracy when the scene image was presented within 50 ms after saccade completion. In the neuroimaging experiment, we examined neural correlates of semantic category information in the visual system using fMRI multivoxel pattern analysis (MVPA). We found that scene category representations embedded in the neural activity patterns of the parahippocampal place area (PPA) were degraded for images presented with a short (0-100 ms) compared to a long post-saccadic delay (400-600 ms), despite no corresponding reduction in overall activation levels. Together, these findings reveal that post-saccadic disruption extends beyond basic visual features to high-level visual attributes of naturalistic scenes, highlighting a limitation of visual information processing in the short post-saccadic period before executing the next saccade.

Keywords: eye movements, scene perception, scene category, spatial frequency, visual stability

Introduction

When viewing complex visual scenes, people make saccadic eye movements (rapid, ballistic shifts of fixation to different spatial locations) to efficiently sample visual information (Najemnik & Geisler, 2005; Rayner, 2009; Renninger et al., 2007; Samonds et al., 2018; Yarbus, 1967). In many cases, saccades are functionally beneficial, projecting relevant information onto the retinal region with the highest spatial resolution and maximizing information gain while reducing perceptual uncertainty (Renninger et al., 2007).

Although people are often unaware of any instability, numerous studies have documented various perceptual changes around the time of saccadic eye movements. During a saccade, visual input is transiently suppressed, a phenomenon known as saccadic suppression, which supports the conscious impression of seamless stability by minimizing retinal blur (Benedetto & Morrone, 2017; Burr et al., 1994; Idrees et al., 2020; Kleiser et al., 2004). Beyond suppression effects, perception can also be distorted shortly before and after saccadic eye movements. For instance, saccades introduce perisaccadic compression, a transient distortion in spatial geometry that causes systematic mislocalization of visual objects flashed shortly before saccade onset (Ross et al., 1997; Hamker et al., 2008). Saccades also introduce perceptual instability. Because a saccade results in a displacement of retinal input, retinotopic neural representations must be remapped with each saccade (Duhamel et al., 1992; Neupane et al., 2020; Zirnsak & Moore, 2014), and this process is not perfectly efficient or instantaneous (Golomb & Mazer, 2021; Golomb & Kanwisher, 2012b). Thus, the processing of objects presented immediately after a saccade can be disrupted, including increased feature interference from objects in different locations (Golomb et al., 2014; Dowd & Golomb, 2020). If individual objects and features can be distorted immediately following a saccade, what might that mean for perception of rich, naturalistic scenes? Here we focus specifically on errors of post-saccadic perception because each saccade introduces new visual input to the retina, requiring the visual system to process novel visual information within a few hundred milliseconds before executing the next saccade.

Despite existing evidence of altered low-level perception following saccadic eye movements, it remains unclear how saccadic eye movements impact the encoding of more abstract visual information critical for everyday behavior. Visual scenes are extremely complex: multiple layers of low- and high-level visual properties are spatially organized, with redundancy and regularity, to form a meaningful scene (Geisler, 2008; Kersten, 1987; Malcolm et al., 2016). While some studies have used naturalistic stimuli to examine the perceptual consequences of saccades, they have largely focused on low-level properties such as local contrast (Dorr & Bex, 2013) or the spatial frequency content of scene images (Kwak et al., 2024). However, successful behavior in complex visual environments requires encoding both low-level and high-level attributes, such as semantic category, navigability, and action affordance (see Malcolm et al., 2016, for a review).

How might saccadic eye movements influence the subsequent encoding of semantic category information (e.g., mountain, city, highway, etc.) from naturalistic scene images? One possibility is that the processing of semantic category information may be resilient to post-saccadic interference due to the redundant visual cues in natural scenes (Geisler, 2008; Kersten, 1987; Võ et al., 2019). Because semantic category information could be extracted from either basic-level (Castelhano & Henderson, 2008; Oliva & Schyns, 2000; Walther & Shen, 2014) or complex visual properties, such as spatial layout (Ross & Oliva, 2011) or global summary statistics (Greene & Oliva, 2009; Oliva & Torralba, 2006), previous findings on basic-level visual features may not readily generalize to the semantic category information in naturalistic scene images. Alternatively, considering the linkage between processing of basic visual features and semantic category information (Groen et al., 2013; 2017), semantic category representations may be disrupted post-saccadically analogously to the processing of basic-level visual features.

Another intriguing alternative is that post-saccadic disruptions of semantic category representations may be more nuanced, perhaps depending on the spatial frequency conveying the scene contents. A prominent theory of rapid scene perception, the Coarse-to-Fine (CtF) model, suggests distinct roles of low and high spatial frequencies (Hegdé, 2008; Schyns & Oliva, 1994): The low spatial frequency (LSF) information conveys an abstract and coarse summary of a scene image (e.g., global layout) through the rapid magnocellular pathway, while the high spatial frequency (HSF) information carries finer details of a scene image (e.g., object details) through the relatively slower parvocellular pathway (Kauffmann et al., 2014). Given the short post-saccadic period available to process the full spectrum of spatial frequency information, the visual system may preferentially use LSF to encode scene attributes, resulting in the processing of HSF visual information being more vulnerable to post-saccadic disruption.

The current study examined whether, and how, the processing of semantic category information in naturalistic scenes is disrupted immediately after saccadic eye movements. We addressed this question using complementary behavioral and neuroimaging approaches. First, we tested behavioral performance on an explicit semantic categorization task for scene images (sampled from beach, city, forest, highway, mountain, and office categories) presented at varying time points after a saccade. Next, to assess underlying neural encoding of semantic category information in an orthogonal task, we examined the neural representation of semantic categories using functional Magnetic Resonance Imaging (fMRI) combined with multi-voxel pattern analysis (MVPA). In both cases, to examine whether the influence of post-saccadic delay is modulated by the spatial frequency conveying scene content, scene images were filtered with different spatial frequency filters to contain either low or high spatial frequency information. Previewing the findings, both behavioral and neural results indicated that the processing of semantic scene category information is transiently impaired when images are presented shortly after a saccade. These findings highlight a fundamental tradeoff in active scene perception: while saccades serve a functional benefit by projecting relevant information onto the retinal region with the highest acuity, they can also incur brief consequences for perception.

Results

Experiment 1

Human participants performed a gaze-contingent behavioral task in which they made a guided saccade and then reported the category of a naturalistic scene image presented after the saccade (Figure 1A). To examine how scene categorization performance varies over the post-saccadic period, we manipulated the delay between saccade offset and scene onset (Post-saccadic delay condition). In Experiment 1A, scenes appeared either 5 ms or 500 ms after the saccade. In Experiment 1B, five logarithmically spaced delays were used (5, 16, 50, 158, and 500 ms) to capture finer-grained temporal dynamics. Additionally, to test whether spatial frequency modulates the effect of post-saccadic delays on scene categorization performance, we presented scene images filtered to contain full-spectrum (FS), high spatial frequency (HSF), or low spatial frequency (LSF) information (Figure 1B). Trials in the FS condition were used only to gauge overall performance and to exclude subjects.

Figure 1.

Experiment 1 design. (A) Trial sequence for the behavioral experiments (Experiments 1A and 1B). (B) Example scene images from different scene categories: original images in the top row, followed by scene images filtered with different spatial frequency filters.

Experiment 1A.

Scene categorization accuracy exceeded chance level (0.16) for both HSF (mean = 0.59, sd = 0.12) and LSF (mean = 0.62, sd = 0.12) conditions. We compared accuracies between the two post-saccadic delay conditions (5 ms vs. 500 ms) and two SF conditions (HSF vs. LSF) by performing a 2 × 2 repeated-measures ANOVA (Figure 2A). We found a significant main effect of post-saccadic delay (F(1,20) = 17.11, p < .001, ηp² = .46, BFincl = 189.06), with lower categorization accuracy in the 5 ms compared to the 500 ms post-saccadic delay condition. However, we found no significant main effect of the SF condition (F(1,20) = 2.11, p = .162, ηp² = .09, BFincl = 0.76), nor a significant interaction between the delay and SF conditions (F(1,20) = 0.07, p = .790, ηp² = .004, BFincl = 0.31). These findings suggest that the processing of semantic category information is disrupted when a scene image is presented briefly following a saccadic eye movement, regardless of the spatial frequency conveying the scene content.

Figure 2.

Scene categorization task results for (A) Experiment 1A and (B) Experiment 1B. Scene categorization accuracy was compared across two post-saccadic delay conditions in Experiment 1A and five post-saccadic delay conditions in Experiment 1B. In both figures, faint gray lines indicate categorization accuracy for the LSF (coarse dots) and HSF (fine dots) conditions, while the solid black line indicates scene categorization accuracy collapsed across SF conditions. For Experiment 1B, categorization accuracy in the 500 ms post-saccadic delay condition was used as a baseline and is depicted with the gray region representing standard error. Error bars indicate within-subject standard errors. Note: chance level is 0.16 for the 6-AFC scene categorization task.

Experiment 1B.

Scene categorization accuracy pooled over delay conditions exceeded chance level (0.16) for both HSF (mean = 0.62, sd = 0.10) and LSF (mean = 0.59, sd = 0.12) conditions. The 5 (post-saccadic delay condition) × 2 (SF condition) repeated-measures ANOVA (Figure 2B) revealed a significant main effect of the post-saccadic delay condition (F(4,68) = 7.15, p < .001, ηp² = .30, BFincl = 75.54), with no significant interaction with spatial frequency (F(4,68) = 0.53, p = .713, ηp² = .03, BFincl = 0.07), consistent with Experiment 1A. While the Bayesian evidence supported a main effect of spatial frequency condition (BFincl = 18.25), it did not reach significance with the frequentist approach (F(1,17) = 3.78, p = .069, ηp² = .18).

As pre-registered, we then conducted post-hoc t-tests after collapsing the spatial frequency condition (Figure 2B, solid black line). Specifically, scene categorization accuracy in the 500 ms post-saccadic delay condition was considered the baseline for recovered performance (Figure 2B, gray region) and compared with the other, shorter post-saccadic delay conditions (5, 16, 50, 158 ms). We found significantly lower categorization accuracy in the 5 ms condition (t(17) = −2.87, p = .011, d = −0.68, BF10 = 5.04) compared to the 500 ms baseline. Additionally, though it did not reach significance based on the corrected alpha value (.0125), scene categorization accuracy was also lower in the 16 ms post-saccadic delay condition compared to the baseline (t(17) = −2.71, p = .015, d = −0.64, BF10 = 3.80). However, scene categorization accuracy was not significantly different from the baseline in the 50 ms (t(17) = 0.73, p = .476, d = 0.17, BF10 = 0.31) and 158 ms post-saccadic delay conditions (t(17) = 1.29, p = .214, d = 0.30, BF10 = 0.50). Combined, these results demonstrate the time course of semantic category representation in the post-saccadic period, characterized by a significant drop in scene categorization performance shortly following saccade offset and a rapid recovery back to baseline within 50 ms after saccade offset.

Experiment 1B Exploratory analyses.

For the above analyses we defined saccade offset in a real-time gaze-contingent manner, as the time when the distance between the current gaze location and the saccade target location became smaller than 2°. While this method is commonly used in the literature, it likely underestimates saccade offset time, such that the eye may still be moving for a brief period after this marker. Indeed, when we performed post-hoc analyses calculating eye movement velocity at different time points relative to scene onset, eye movement velocity at scene onset was higher with short post-saccadic delays (Figure 3A). Thus, the decreased scene categorization accuracy in shorter post-saccadic delay trials could potentially be attributed to residual eye movement, which can smear the visual image projected onto the retina.

Figure 3.

Exploratory analysis results for Experiment 1B, accounting for eye movement velocity and the saccade detection algorithm. (A) Eye movement velocity (dva/sec; y-axis) as a function of time relative to scene onset (x-axis), plotted for each post-saccadic delay condition (rows) from a single exemplar subject. Gray boxes indicate the duration of scene presentation. (B) Group-level scene categorization accuracy for short (5, 16 ms), intermediate (50, 158 ms), and long (500 ms) post-saccadic delay trials, after excluding trials in which eye movement velocity exceeded 25 dva/s at scene onset. (C) Post-saccadic delay at scene onset for each condition, calculated using saccade offset times from the Eyelink 1000’s online parsing system for each subject. Black vertical lines indicate the intended post-saccadic delay. (D) Group-level scene categorization accuracy compared across post-saccadic delay conditions based on the re-calculated post-saccadic delays, grouped into four post-saccadic delay bins. Error bars indicate within-subject standard errors.

To investigate whether retinal shifts of visual input were responsible for reduced scene categorization performance, we excluded trials on which the eyes were still moving at scene onset (>25 °/sec), and compared categorization accuracy for short (5, 16 ms), intermediate (50, 158 ms), and long (500 ms) post-saccadic delay trials (Figure 3B). A one-way repeated-measures ANOVA revealed a significant main effect of post-saccadic delay (F(2,34) = 12.99, p < .001, ηp² = .43, BFincl = 305.89), characterized by significantly lower categorization accuracy for short (5, 16 ms) post-saccadic delay trials compared to intermediate (t(17) = −4.89, pbonf < .001, d = −1.15, BF10 = 6060.12) and long post-saccadic delay trials (t(17) = −3.69, pbonf = .002, d = −0.87, BF10 = 7.09), with no significant difference between intermediate and long post-saccadic delay trials (t(17) = 0.28, pbonf = .721, d = 0.28, BF10 = 0.44). These results indicate that the observed post-saccadic drop in scene categorization accuracy was not due to the confound of residual eye movement.

In addition, we also employed an alternative algorithm to detect saccade onset and offset. Using the online parsing system built into the Eyelink 1000, we re-calculated the trial-wise post-saccadic delay (Figure 3C). The majority of re-calculated post-saccadic delays (histograms) were shorter than the intended post-saccadic delays (black vertical lines), suggesting that this method provides a stricter definition of the post-saccadic delay at each stimulus onset. We then labeled each trial based on its re-calculated post-saccadic delay into four post-saccadic delay groups (0-16 ms, 16-50 ms, 50-158 ms, 158-1000 ms; Figure 3D). A one-way repeated-measures ANOVA again revealed a significant main effect of post-saccadic delay (F(3,51) = 4.38, p = .008, ηp² = .205, BFincl = 5.42). Post-hoc analysis found lower scene categorization accuracy in 0-16 ms post-saccadic delay trials compared to the 16-50 ms (t(17) = −3.12, pbonf = .019, d = −0.73, BF10,U = 3.34) and 50-158 ms trials (t(17) = −3.12, pbonf = .018, d = −0.74, BF10,U = 8.64), with a marginal difference compared to the 158-1000 ms post-saccadic delay trials (t(17) = −2.47, pbonf = .10, d = −0.58, BF10,U = 2.42). These exploratory analyses revealed impaired semantic category information for scene images presented immediately after a saccadic eye movement, an impairment that is neither attributable to a smeared retinal image nor specific to the saccade detection method used in the main analysis.

Experiment 2

In Experiment 2, we adopted a neuroimaging approach, using functional Magnetic Resonance Imaging (fMRI) and multi-voxel pattern analysis (MVPA; Haxby et al., 2001), to assess whether and how neural representations of semantic scene category information are altered following saccades. Specifically, if scene content processing is disrupted post-saccadically, this should be reflected in degraded decoding of scene category information within scene-selective brain regions such as the parahippocampal place area (PPA; Epstein & Kanwisher, 1998).

Comparing neural indicators of semantic scene category information in the absence of an explicit categorization task is particularly useful for ruling out alternative explanations for the reduced behavioral performance observed in Experiment 1. For example, non-perceptual factors such as interference with decision-making (Matsumiya & Furukawa, 2023) or motor planning and execution (Pashler et al., 1993; Richardson et al., 2013) may be responsible for the reduced performance in short delay conditions and/or the absence of an interaction with spatial frequency. By examining neural evidence of scene category representation in the absence of such task demands, Experiment 2 provides more direct evidence for the perceptual disruption of scene content during the post-saccadic period.

In the fMRI scanner, subjects followed a fixation dot and performed a 1-back task on sequentially presented scene images, pressing a button only when the current image was identical to the one shown on the previous trial (Figure 4A). Similar to the behavioral experiments, we aimed to compare scene images presented with either short or long delays after the saccade cue onset (Post-saccadic delay condition). To manipulate post-saccadic delay, we integrated high-temporal-resolution eye tracking with the fMRI system. However, unlike the fully gaze-contingent behavioral experiments, where stimulus presentation on each trial was contingent on online eye-tracking data, the fMRI trial sequence had to be pre-scheduled and time-locked to the scanner’s repetition time. Therefore, we recorded high-resolution eye-tracking data during each fMRI trial and used it post hoc to select for inclusion those trials in which the scene onset time fell within the designated short (0-100 milliseconds) or long (400-600 milliseconds) post-saccadic delay windows. To maximize trial inclusion, we first measured each subject’s average saccade reaction time during a pre-scan session (Figure 4B), and adjusted the saccade cue onset timing for that subject in their fMRI session to maximize the likelihood that scenes would appear within these windows (see Materials and Methods for details). The scene image stimuli were drawn from two nature scene categories (beach and mountain) and two urban scene categories (city and highway), and were filtered with two low spatial frequency bands (LSF1 and LSF2) and two high spatial frequency bands (HSF1 and HSF2), matching the hierarchical structure of the scene category manipulation.

Figure 4.

Experiment 2 design. (A) Trial sequence. (B) Illustrative distribution of saccade reaction times (ms) – the delay from saccade cue onset to saccade offset – measured during the pre-scan session. We identified a 100 ms range containing the highest concentration of saccade reaction times (gray horizontal bar). The upper bound of this range (black vertical bar) was defined as the optimal saccade reaction time (optSRT) and used to set the delay between the saccade cue and scene onset in the scan session. (C) Illustrative distribution of post-saccadic delays – the delay between saccade offset and scene onset – measured during the scan session. By presenting the scene image either optSRT or optSRT + 500 ms after saccade cue onset, trial-wise post-saccadic delays followed a bimodal distribution, maximizing the number of short (0-100 ms; blue) or long (400-600 ms; green) post-saccadic delay trials, which were selected through post-hoc analyses of eye-tracking data and included in the main analysis. (D) Each scene image was labeled as one of four scene category conditions and four spatial frequency conditions.

Post-saccadic disruption of scene category information in PPA.

To quantify the amount of scene category information (natural vs. urban) represented in PPA, we performed a multi-voxel pattern analysis (MVPA; Haxby et al., 2001; Golomb & Kanwisher, 2012a). Specifically, we constructed representational similarity matrices (RSMs) and tested whether voxel-wise activation patterns in PPA were more similar between trials featuring the same scene category than between trials with different categories (Figure 5; see MVPA analysis section for details).
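For concreteness, the core of this analysis can be sketched in a few lines of MATLAB (a minimal illustration under stated assumptions, not our actual analysis code; the variable names are hypothetical): given a trials-by-voxels matrix of activity patterns and a vector of category labels, the category information index is the mean pairwise correlation among same-category trial pairs minus that among different-category pairs.

    % Illustrative sketch of the RSM-based category-information index.
    % Assumes `patterns` is an [nTrials x nVoxels] matrix of activity patterns
    % and `category` is an [nTrials x 1] label vector (hypothetical names).
    rsm     = corr(patterns');                 % trial-by-trial Pearson correlation RSM
    sameCat = category == category';           % logical mask of same-category pairs
    offDiag = ~eye(size(rsm));                 % exclude self-correlations on the diagonal
    catInfo = mean(rsm(sameCat & offDiag)) - mean(rsm(~sameCat & offDiag));
    % catInfo > 0 indicates that same-category trials evoke more similar
    % activity patterns than different-category trials.

A positive difference score indicates that category identity is reflected in the multi-voxel pattern even when overall activation levels are matched.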

We found that activity patterns were more similar for images of the same scene category than for different categories, across all spatial frequency and post-saccadic delay conditions (ps < .021, BF10s > 2.97; Figure 6A), indicating significant semantic scene category representation encoded in PPA. Next, to assess how scene category information (indexed by the same-minus-different category difference scores) was influenced by post-saccadic delay and its interaction with spatial frequency, we calculated scene category representation on 8 × 8 RSMs (correlations within each delay × SF condition) and conducted a 2 (Post-saccadic delay condition) × 2 (Spatial frequency condition) repeated measures ANOVA (Figure 6B). The ANOVA showed no significant interaction (F(1,16) = 1.622, p = .221, ηp² = .09, BFincl = 0.61) or main effect of spatial frequency (F(1,16) = 3.38, p = .085, ηp² = .17, BFincl = 2.11). Nevertheless, there was a significant main effect of post-saccadic delay condition (F(1,16) = 9.99, p = .006, ηp² = .38, BFincl = 1.07), reflecting reduced scene category information in the short compared to the long post-saccadic delay trials. A post-hoc analysis of simple main effects revealed reduced scene category information in short compared to long post-saccadic delay trials in the HSF condition (F(1) = 6.94, p = .018, d = −0.64, BF10 = 3.31), but not in the LSF condition (F(1) = 0.26, p = .615, d = −0.12, BF10 = 0.28).

Figure 6.

fMRI analysis results in PPA. (A) Average correlation between same scene category pairs (white bars) and different category pairs (black bars), showing more similar neural activity patterns between same scene category trials than different scene category trials. (B) Scene category representation in PPA, obtained by subtracting the average correlation between different scene category pairs from that between same category pairs, separately for spatial frequency conditions (LSF vs. HSF) and post-saccadic delay conditions (Short vs. Long). (C) Univariate analysis result showing overall activation level in each condition. Error bars indicate within-subject standard error.

Next, to better capture the broader effect of post-saccadic delay, we performed an additional MVPA separating the RSM only by delay (16 × 16 cell RSMs) to calculate scene category information at each delay regardless of spatial frequency. A paired-samples t-test confirmed significantly lower scene category information in the short post-saccadic delay trials (M = 0.053) compared to the long post-saccadic delay trials (M = 0.078; t(16) = −3.84, p = .001, d = −0.93, BF10 = 27.64).

Disrupted neural activity pattern without reduced activation.

Is the reduction in semantic category representation in neural activity patterns driven by an overall reduction in PPA activation to scene images? We conducted a standard univariate analysis averaging beta estimates in PPA (Figure 6C). A 2 (Post-saccadic delay condition) × 2 (Spatial frequency condition) repeated measures ANOVA revealed no significant main effect of post-saccadic delay (F(1,16) = 0.86, p = .366, ηp² = .051, BFincl = 0.41), no significant interaction (F(1,16) = 0.001, p = .974, ηp² = .00, BFincl = 0.31), and no main effect of spatial frequency (F(1,16) = 0.023, p = .881, ηp² = .001, BFincl = 0.25). The absence of a post-saccadic delay effect on univariate activation suggests that overall activation to visual scene stimuli remains intact after a saccade, even when the neural activity pattern encoding the semantic scene content is disrupted.

Consistent patterns in PPA subregions along the anterior-posterior axis

Motivated by the functional distinction within PPA along the anterior-posterior axis (Baldassano et al., 2016; Berman et al., 2017), we further investigated whether subregions of PPA along the anterior-posterior axis are differentially influenced after a saccade. We performed a 2 (Post-saccadic delay condition) × 2 (Spatial frequency condition) × 2 (PPA subregion) repeated measures ANOVA (Figure 7A). Consistent with the post-saccadic disruption of scene category information in PPA as a whole, we found a main effect of post-saccadic delay on scene category information (F(1,16) = 7.17, p = .016, ηp² = .31, BFincl = 1.37), without a significant interaction between post-saccadic delay and spatial frequency (F(1,16) = 1.50, p = .238, ηp² = .09, BFincl = 1.35). Moreover, there was significantly higher scene category information in the HSF compared to the LSF condition (F(1,16) = 4.772, p = .044, ηp² = .23, BFincl = 32.05). Importantly, we did not find any two-way or three-way interaction effects involving PPA subregion (ps > .67, BF10s < 0.24), suggesting no functional distinction between anterior and posterior PPA concerning the post-saccadic processing of semantic category information. Consistent with the univariate results for the PPA overall, a 2 × 2 × 2 repeated measures ANOVA on univariate activation (Figure 7B) found no significant main effects or interactions involving subregion (ps > .40, BFincls < 0.46).

Figure 7.

fMRI analysis results in anterior and posterior PPA. (A) Scene category representation in anterior (left) and posterior (right) PPA, obtained by subtracting the average correlation between different scene category pairs from that between same category pairs, separately for spatial frequency conditions (LSF vs. HSF) and post-saccadic delay conditions (Short vs. Long). (B) Univariate analysis result showing overall activation level in each condition, separately for anterior (left) and posterior (right) PPA. Error bars indicate within-subject standard error.

Spatial frequency information in early visual cortex

Finally, while our primary focus is on post-saccadic representations of semantic scene content (scene category information), the spatial frequency manipulation also allowed us to examine post-saccadic processing of basic-level visual features in complex scene images (i.e., spatial frequency information). Similar to above, we conducted both MVPA and univariate analyses, now examining the amount of spatial frequency information (LSF vs. HSF) in the early visual cortex (EVC). As shown in Figure 8A, the MVPA tested whether there was significant information in the pattern of EVC responses to differentiate whether a scene contained high vs. low spatial frequency content. EVC exhibited significant spatial frequency information in both short (t(16) = 3.69, p = .002, d = 0.90, BF10 = 21.126) and long post-saccadic delay trials (t(16) = 2.58, p = .02, d = 0.63, BF10 = 3.04). Although the magnitude was numerically higher in the long delay condition, a paired-samples t-test revealed no significant effect of post-saccadic delay on spatial frequency representation (t(16) = −0.98, p = .34, d = −0.24, BF10 = 0.38). The univariate analysis also found no significant difference between the short and long post-saccadic delay conditions in EVC (Figure 8B; t(16) = −0.92, p = .370, d = −0.224, BF10 = 0.361). Additional analyses calculating spatial frequency information within the low (LSF1 vs. LSF2) or high (HSF1 vs. HSF2) spatial frequency bands were not significant in EVC (Supplementary Figure 6).

Figure 8.

fMRI analysis results in EVC. (A) Neural representation of spatial frequency information was compared between the short and long post-saccadic delay conditions. (B) Univariate activation was compared between the short and long post-saccadic delay conditions. Error bars indicate within-subject standard error.

General Discussion

The current study used a combination of behavioral and neuroimaging approaches to investigate an understudied aspect of naturalistic visual scene perception: whether representations of semantic scene category information are briefly altered in the period immediately following a saccadic eye movement. Our behavioral experiments revealed significantly diminished scene categorization accuracy when the scene image was presented following the shortest post-saccadic delays (<50 ms), compared to after longer delays. Moreover, in the fMRI experiment, we assessed neural representations of semantic category in the scene-selective region PPA using MVPA, and found analogously disrupted semantic category representations for scene images presented with short (0-100 ms) compared to longer post-saccadic delays (400-600 ms). The degraded neural representation even in the absence of an explicit semantic task rules out non-perceptual explanations such as decision-making interference (Matsumiya & Furukawa, 2023) or motor planning (Pashler et al., 1993; Richardson et al., 2013), underscoring a genuine disruption of semantic category representation in the post-saccadic period.

Furthermore, the fMRI data revealed no effect of post-saccadic delay on univariate activation in PPA, suggesting that saccades interfere with representations of scene content (neural pattern encoding) rather than reducing overall activity. The lack of an activation difference argues against the possibility that residual eye movements restrict the amount of visual information reaching the system at early processing stages. Moreover, it may indicate that PPA still recognized the visual input as a ‘scene’, while the detailed semantic content was not fully processed post-saccadically. This suggests an interesting correspondence with prior findings that people are often surprisingly insensitive to trans-saccadic changes in scene details (Choi et al., 2025; Henderson & Hollingworth, 2003; Kwak et al., 2024), while maintaining a coherent conscious percept of the visual scene.

Together, these findings demonstrate that high-level visual attributes of naturalistic scenes are vulnerable to disruption following saccades, despite the redundancy and regularity of naturalistic scene images (Geisler, 2008; Kersten, 1987; Malcolm et al., 2016; Võ et al., 2019). Adding to the known functional benefits of saccadic eye movements during active exploration of visual scenes, these results suggest that saccades may also carry brief costs for subsequent visual information processing.

The effect of spatial frequency conveying semantic category information.

In all experiments, we manipulated the spatial frequency content of scene stimuli to examine its influence on semantic category processing in the post-saccadic period. In particular, inspired by the Coarse-to-Fine (CtF) model (Hegdé, 2008; Schyns & Oliva, 1994), we hypothesized that the rapid processing of high-level scene attributes may rely more on LSF information, making HSF images more susceptible to post-saccadic disruption. On the other hand, some studies of saccadic suppression have found stronger suppression (i.e., reduced sensitivity) for LSF compared to HSF stimuli (Burr et al., 1994; Idrees et al., 2020; Kleiser et al., 2004), which would predict the opposite pattern in our study.

Our fMRI study revealed interesting effects of spatial frequency. First, we found overall stronger semantic category representations for HSF compared to LSF scene images, especially in long post-saccadic delay trials, consistent with prior work suggesting that scene content may be predominantly conveyed by HSF information (Berman et al., 2017; Kauffmann et al., 2015; Rajimehr et al., 2011). Interestingly, it was HSF scene images, not LSF scene images, that exhibited a significant reduction in semantic category representation in short post-saccadic delay trials, although the interaction effect was not significant. While the relative preservation of semantic information for LSF scenes under short delays is consistent with the prediction grounded in the CtF model, the current findings alone are insufficient to conclude whether the visual system preferentially relies on LSF information in post-saccadic scene perception. Future research could clarify how the visual system differentially processes spatial frequency information during the immediate post-saccadic period.

The greater post-saccadic impairment for HSF scene images does seem inconsistent with the stronger saccadic suppression observed for LSF compared to HSF stimuli (Burr et al., 1994; Idrees et al., 2020; Kleiser et al., 2004). This discrepancy may reflect distinctive processing of localized objects versus naturalistic scenes (Boucart et al., 2013; Hasson et al., 2002; Levy et al., 2001; Malach et al., 2002). While object recognition relies on central vision with high spatial resolution, scene processing remains robust in peripheral vision (Boucart et al., 2013) and even with low-pass filtered images (Nuthmann, 2013, 2014). Indeed, scene-selective voxels are clustered medially in the ventral temporal cortex and exhibit a preference for peripheral visual input (Grill-Spector & Weiner, 2014; Hasson et al., 2002; Levy et al., 2001; Malach et al., 2002). Taken together, the distinct patterns of post-saccadic visual perception, modulated by spatial frequency, may reflect an optimized use of different spatial frequencies around the time of saccadic eye movements for more efficient scene processing.

Unlike the fMRI experiment, the behavioral experiments did not find a corresponding effect of spatial frequency information, possibly due to insufficient sensitivity of the categorization task to capture subtle effects of low-level image statistics. The visual environment is highly complex and redundant (Geisler, 2008; Kersten, 1987; Võ et al., 2019). When explicitly categorizing scenes, observers may rely on a variety of cues, including basic features (Castelhano & Henderson, 2008; Oliva & Schyns, 2000; Walther & Shen, 2014), spatial layout (Ross & Oliva, 2011), or global summary statistics (Greene & Oliva, 2009; Oliva & Torralba, 2006), potentially obscuring subtle effects of spatial frequency.

The absence of disrupted spatial frequency information in EVC

Interestingly, in contrast to prior behavioral findings showing impaired sensitivity to basic-level visual features like contrast (Dorr & Bex, 2013) or spatial frequency (Kwak et al., 2024) in naturalistic scenes, our fMRI results revealed no significant effect of post-saccadic delay on the neural representation of spatial frequency information in early visual cortex. One possible explanation is that the duration of the scene image in our fMRI study (100 ms) was sufficiently long to allow adequate processing of basic-level visual information even when accounting for post-saccadic disruption. Using neuroimaging techniques with superior temporal resolution (e.g., EEG, MEG), previous studies have examined the time course of processing different attributes of naturalistic visual stimuli (Dima et al., 2018; Fakche et al., 2024). Specifically, a recent MEG experiment showed the neural representation of object color emerging around 100 ms after saccade offset, followed by category-level information around 145 ms (Fakche et al., 2024). Such rapid processing of basic-level visual features may have allowed them to escape post-saccadic interference in our experimental design, particularly on trials where the scene image was presented at the later end of the short-delay window.

Other high-level visual attributes during naturalistic scene processing

While we focused on semantic category information in naturalistic scenes, this attribute does not capture the full range of high-level visual attributes necessary for interacting with the environment, such as action affordance and navigability (Epstein & Baker, 2019; Malcolm et al., 2016). For example, the stronger degradation of semantic category information when viewing HSF scene images may not generalize to other attributes (e.g., action affordance, navigability), considering literature suggesting distinct, flexible usage of spatial frequency information depending on task demands (Wiesmann et al., 2021). Moreover, compared to some of these other attributes, semantic category is a more stable attribute over time. While the semantic category of the current visual scene generally does not change across eye movements, navigable paths, defined in egocentric coordinates, change with each fixation and must be continuously updated across saccades (Wang & Spelke, 2000; Bonner & Epstein, 2017), as do the action affordances of objects (Medendorp et al., 2008; Henriques et al., 1998; Batista et al., 1999). Future research could explore how saccades affect these more dynamic scene attributes and how the visual system interacts with motor networks to enable seamless perception and action in naturalistic environments (Goodale, 2011; Tagliabue & McIntyre, 2012).

Lastly, our findings raise a fundamental question: how do individuals navigate complex visual environments effortlessly despite disruptions in high-level visual processing after saccades? Decades of research have identified multiple mechanisms supporting trans-saccadic perceptual stability, spanning neural (Duhamel et al., 1992; Wurtz, 2008), cognitive (MacKay, 1973), and visual (Binda & Morrone, 2018) levels. While the majority of these theories were built upon the stability of basic visual properties, such as spatial displacement (Deubel et al., 1996) or changes in the surface features of isolated objects (Weiß et al., 2015), there is increasing recognition of the need to test stability mechanisms in more ecologically valid contexts (Choi et al., 2025). Here, we leveraged complementary behavioral and neural evidence to demonstrate disrupted processing of a high-level visual attribute, semantic category information, when viewing naturalistic scene images. Our results further underscore the need for future research to explore trans-saccadic perception in naturalistic settings with dynamic task demands to fully understand how the brain achieves a coherent visual experience in real-world contexts.

Materials and Methods

Experiment 1

Pre-registration Statement

Experiment 1A was not explicitly pre-registered; however, it was a modification of a similar experiment we had pre-registered (https://osf.io/az9c7), retaining the core motivation, sample size, and design. Experiment 1B was pre-registered, including its rationale, design, and analysis plan (https://osf.io/h8dmu). Any additional analyses beyond the pre-registration are reported as exploratory.

Participants

As pre-registered in our preliminary experiment, we set 18 subjects as the minimum sample size for Experiment 1A. This was based on a previous study (Perfetto et al., 2020) testing scene categorization performance between LSF and HSF images. We performed a Bayesian analysis on their data (Experiment 2, which reported no significant difference between conditions, t(17) = 0.034, p = 0.97) and found moderate support for the null hypothesis (BF10 < 0.228). Thus, we planned to collect at least 18 participants and apply the Bayesian optional stopping rule (Rouder, 2014), continuing data collection in sets of three subjects to counterbalance spatial frequency condition order (see Stimuli section) until the Bayes factor indicated sufficient evidence either for (BF10 > 3) or against (BF10 < 0.333) our key effect of interest: the interaction between spatial frequency (HSF vs. LSF) and post-saccadic delay (Short vs. Long). The maximum sample size was set at 36.

Ultimately, data from 21 participants (12 women, 9 men; M = 20.86 years, SD = 4.89) were included in the final analysis for Experiment 1A. Two additional participants completed the experiment but were excluded: one because categorization accuracy for FS images (49.2%) fell below the preregistered exclusion criterion of 50%, and the other due to a system error that caused longer post-saccadic delays than intended. Experiment 1B followed the same sample size plan, and data from 18 participants (13 women, 5 men; M = 19.00 years, SD = 2.37) were collected, with no additional data collection needed given sufficient Bayesian evidence. All participants had normal or corrected-to-normal vision and received either course credit or monetary compensation for participation ($15/hour). Experiments 1A and 1B were approved by the Ohio State University Behavioral and Social Sciences Institutional Review Board, and written informed consent was obtained from all participants.

Experiment design

Experiment 1A.

Subjects participated in a gaze-contingent behavioral experiment, where they were instructed to follow a fixation dot with their eyes and perform a 6-AFC (i.e., beach, city, forest, highway, mountain, and office) scene categorization task on a briefly presented scene image. Each trial started with an initial fixation dot located at one corner of an imaginary 10° × 10° square centered on the screen (Figure 1A). Once participants successfully fixated on the initial fixation for more than 1000 ms, the fixation dot disappeared and immediately reappeared at a different corner of the imaginary square (saccade cue). Subjects were instructed to make an eye movement toward the saccade cue as quickly and accurately as possible. Eye position was monitored in real time, and saccade completion was defined as the moment when gaze position entered a 2° window around the saccade target (note that additional post-hoc analyses were conducted with alternative methods of defining saccade completion). After a variable post-saccadic delay, a large scene image (28° × 21°) was presented. In Experiment 1A, the scene image was presented either 5 ms or 500 ms after the recorded saccade offset (the 5 ms and 500 ms post-saccadic delay conditions, respectively). The scene image was always presented for 50 ms, followed by a noise mask (500 ms). After the mask disappeared, subjects reported the category of the scene image using a keyboard: S, D, F, J, K, and L. Correspondence between the six keys and the six scene categories was randomly assigned for each subject. Feedback for slow saccade reaction times was presented at the end of each trial if the saccade reaction time for the current trial was longer than 500 ms (“Eye movement too slow!”). Feedback for category reports was provided for 1000 ms only in practice trials (“Correct” or “Incorrect”). Subjects pressed the spacebar to continue to the next trial. Note that in the gaze-contingent design, the current trial was aborted and restarted after calibration if the subject failed to fixate on the initial fixation within 5 seconds of its onset, or failed to maintain fixation more than three times.
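A rough MATLAB sketch of this gaze-contingent check is shown below, using standard Eyelink Toolbox and Psychtoolbox calls (an illustration under stated assumptions, not the experiment code; targX, targY, ppd, and eyeIdx are hypothetical variables for the target position in pixels, pixels per degree, and the tracked eye index):

    % Poll gaze samples until the eye lands within 2 deg of the saccade target.
    saccadeDone = false;
    while ~saccadeDone
        if Eyelink('NewFloatSampleAvailable') > 0
            smp = Eyelink('NewestFloatSample');              % latest gaze sample
            distDeg = hypot(smp.gx(eyeIdx) - targX, ...
                            smp.gy(eyeIdx) - targY) / ppd;   % gaze-target distance in deg
            saccadeDone = distDeg < 2;                       % saccade completion criterion
        end
    end
    tSaccadeOffset = GetSecs;   % reference time for scheduling the post-saccadic scene onset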

Experiment 1A included 24 practice trials using only full-spectrum (FS) scene images (see Stimuli section for details). Images were presented for 150 ms in the first practice trial, with the presentation duration linearly reduced to 29 ms over the course of the 24 trials to familiarize participants with the task. During the main session, scene images were always presented for 50 ms and belonged to one of three spatial frequency conditions: full-spectrum (FS), high spatial frequency (HSF), or low spatial frequency (LSF). The experiment followed a 3 (Spatial Frequency: FS, HSF, LSF) × 2 (Post-Saccadic Delay: 5 ms, 500 ms) × 6 (Scene Category) design, with each condition repeated 10 times, resulting in 360 trials presented in random order across six blocks.

Experiment 1B.

Experiment 1B was modified from Experiment 1A to examine the time course of post-saccadic scene processing by adding intermediate post-saccadic delay conditions, resulting in five post-saccadic delay conditions logarithmically spaced between 5 and 500 ms (5, 16, 50, 158, and 500 ms). Additionally, only LSF and HSF scene images, but not FS scene images, were used in both the practice and main sessions to maximize the number of trials for the conditions of interest in a single session. Moreover, trials advanced automatically without requiring a spacebar press, and the saccade direction was always either horizontal or vertical, instead of diagonal, to ensure a consistent saccade distance across trials. The main session followed a 2 (Spatial Frequency: LSF, HSF) × 5 (Post-Saccadic Delay: 5, 16, 50, 158, 500 ms) × 6 (Scene Category) design, with each condition repeated 10 times, totaling 600 trials, presented in random order across 10 blocks.

Stimuli

Scene images and the MATLAB code to filter the spatial frequency of scene images were modified from Perfetto et al. (2020). To create LSF and HSF scene images, grayscaled scene images were deconstructed using a two-dimensional Fast Fourier Transform (FFT) and filtered with a low-pass (< 1 cycle per degree; cpd) or high-pass (> 6 cpd) SF filter with a 2nd-order Butterworth-shaped boundary. The choice of the 2nd-order Butterworth filter and frequency cutoffs was based on Perfetto et al. (2020), where scene categorization accuracy was comparable between HSF and LSF scene images (Experiment 1B; t(17) = 0.034, p = 0.97). The unfiltered full-spectrum image and the HSF and LSF versions of a single image were jointly contrast-normalized (Figure 1B). The scene image was presented at a size of 28° × 21° in the behavioral experiments.
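The filtering step can be illustrated with a minimal MATLAB sketch (assuming even image dimensions and a pixels-per-degree value ppd; the actual code was modified from Perfetto et al. (2020) and is not reproduced here):

    % Frequency-domain Butterworth filtering of a grayscale image `img`.
    [h, w]   = size(img);
    [fx, fy] = meshgrid((-w/2:w/2-1) / w, (-h/2:h/2-1) / h);  % cycles per pixel
    radius   = hypot(fx, fy) * ppd;                 % radial frequency in cycles per degree
    n        = 2;                                   % 2nd-order Butterworth boundary
    lowPass  = 1 ./ (1 + (radius / 1).^(2*n));      % < 1 cpd low-pass gain
    highPass = 1 - 1 ./ (1 + (radius / 6).^(2*n));  % > 6 cpd high-pass gain
    spec     = fftshift(fft2(img));                 % centered 2D spectrum
    lsfImg   = real(ifft2(ifftshift(spec .* lowPass)));   % LSF version
    hsfImg   = real(ifft2(ifftshift(spec .* highPass)));  % HSF version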

We created 50 noise images to be randomly presented as masks across trials. To create noise images that contain the low-level visual properties of scene images without identifiable category-specific features, we first calculated the average amplitude spectrum across all 432 scene images. Then, we performed an inverse FFT using the average amplitude spectrum and 50 random phase matrices to create 50 mask images. The 50 mask images were jointly contrast-normalized and rescaled to a range between 0.2 and 0.8.
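A hedged MATLAB sketch of this mask-generation procedure (assuming imgs is an [h × w × 432] stack of grayscale scene images; variable names are illustrative):

    % Average amplitude spectrum across all scene images.
    avgAmp = mean(abs(fft2(imgs)), 3);           % fft2 operates plane-wise on 3-D arrays
    masks  = zeros(h, w, 50);
    for k = 1:50
        phi = angle(fft2(rand(h, w)));           % random phase with Hermitian symmetry
        masks(:,:,k) = real(ifft2(avgAmp .* exp(1i * phi)));
    end
    % Joint rescaling of all 50 masks to the [0.2, 0.8] range.
    masks = 0.2 + 0.6 * (masks - min(masks(:))) ./ (max(masks(:)) - min(masks(:)));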

There were 72 exemplar scene images (800 × 600 pixels) for each of the six scene categories (beach, city, forest, highway, mountain, and office; Figure 1B). In Experiment 1A, the 72 scene images for each scene category were divided into six groups: one group (12 scenes) was presented during the practice session, while the remaining five groups (60 scenes) were used in the main session. During the main session, each image was shown only once. Potentially, scene images could be identified more easily when filtered with either low or high spatial frequency bands. For example, forest scene images full of trees should be easier to recognize when filtered with high spatial frequency because of the prevalence of high spatial frequency information at vertical orientations (i.e., trees). To address this concern, we counterbalanced the three spatial frequency conditions across the scene images presented during the main session between subjects. Specifically, for every group of three subjects, we used the same 12 scenes for practice trials, and the 60 scene images of the main session were divided into three sets of 20 images to be assigned to the three spatial frequency conditions. In Experiment 1B, a scene image was randomly selected on each trial from the 72 exemplars of a given scene category.

Apparatus

Experiments 1A and 1B were performed using MATLAB (The MathWorks, Natick, MA) with the Psychophysics Toolbox (Version 3 extension; Brainard, 1997; Kleiner, 2007; Pelli, 1997). Either the left or right eye position was monitored at a sampling rate of 1000 Hz using an EyeLink 1000 eye-tracking system mounted on the desk, controlled by the Eyelink MATLAB Toolbox (Cornelissen et al., 2002). The eye-tracking system was calibrated using a nine-point grid method at the beginning of the experiment and between trials if necessary.

Experiments were performed in a desktop setting with a 24.5-inch LCD monitor (ASUS ROG PG258Q) connected to an NVIDIA GeForce RTX 2060, running at a 240 Hz refresh rate with a resolution of 1920 × 1080 pixels, located 63 cm in front of the participants (39 pixels per degree of visual angle). Stimuli were presented on a gray background (114 cd/m²) throughout the experiment.
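As a sanity check on this conversion (assuming a standard 16:9 panel, for which a 24.5-inch diagonal implies a display width of roughly 54.3 cm): 1920 px / 54.3 cm ≈ 35.4 px/cm, and 1° of visual angle at a 63 cm viewing distance subtends 2 × 63 × tan(0.5°) ≈ 1.10 cm, yielding approximately 38.9, i.e., 39 pixels per degree.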

Behavioral data analysis

To examine post-saccadic processing of semantic category information, it is critical to validate the actual stimulus duration and stimulus onset latency relative to the detected saccade offset through post-hoc analysis of the eye-tracking data. We confirmed that errors in these temporal manipulations were negligible. Experiment 1A showed scene durations close to 50 ms (mean = 48.47, sd = 1.81) and precise post-saccadic delays in the 5 ms (mean = 3.73, sd = 1.76) and 500 ms post-saccadic delay trials (mean = 500.15, sd = 1.50) for all subjects. Experiment 1B likewise showed accurate scene image durations (mean = 48.51, sd = 1.56) and post-saccadic delays in the 5 ms (mean = 3.51, sd = 1.50), 16 ms (mean = 14.52, sd = 1.46), 50 ms (mean = 48.84, sd = 1.54), 158 ms (mean = 156.71, sd = 1.54), and 500 ms post-saccadic delay trials (mean = 498.57, sd = 1.51).

In Experiment 1A, scene categorization accuracy for FS scene images was used to exclude subjects. The main analysis used data from the LSF and HSF condition trials; scene categorization accuracies were compared with a 2 (Post-saccadic delay) × 2 (Spatial frequency) repeated measures ANOVA to examine the effect of the saccade on subsequent scene perception and the modulating effect of spatial frequency. As pre-registered, in Experiment 1B, we conducted a 5 (Post-saccadic delay) × 2 (Spatial frequency) repeated measures ANOVA to test the main effect of post-saccadic delay and how the spatial frequency condition modulates that main effect. We preregistered that, if we found a significant main effect of post-saccadic delay on scene categorization accuracy but no interaction effect, we would collapse across the spatial frequency condition and perform post hoc t-tests. Specifically, scene categorization accuracy in the 500 ms post-saccadic delay condition (baseline) was compared with the remaining four shorter post-saccadic delay conditions (5, 16, 50, 158 ms). For each pairwise t-test, we used a critical alpha value of .0125, accounting for the number of t-tests performed (i.e., Bonferroni correction). Lower categorization accuracy compared to the 500 ms delay condition would indicate disrupted processing of semantic scene category information.

In addition to the pre-registered analyses, we tested whether the decreased categorization accuracy in shorter post-saccadic delay conditions could be attributed to residual eye movement after saccade offset. First, we calculated eye movement velocity (°/sec) at each time point using a 10 ms sliding window for each trial. Then, we excluded trials in which eye movement velocity at scene onset was faster than 25 °/sec. Because of the small number of remaining trials with shorter post-saccadic delays, we compared scene categorization accuracy across short (5 and 16 ms), intermediate (50 and 158 ms), and long (500 ms) post-saccadic delay trials. Second, to generalize the result to a different saccade detection algorithm, we calculated the post-saccadic delay for each trial based on the built-in online parsing system of the Eyelink 1000, which incorporates eye movement velocity and acceleration to define saccade onset and offset. Using the re-calculated post-saccadic delays, we separated trials into four post-saccadic delay groups (0-16 ms, 16-50 ms, 50-158 ms, and 158-1000 ms) and compared mean categorization accuracy.
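An illustrative MATLAB computation of the sliding-window velocity (assuming gx and gy are 1000 Hz gaze traces in degrees and sceneOnsetIdx is the sample index of scene onset; variable names are hypothetical):

    % Eye movement velocity over a 10 ms sliding window (10 samples at 1000 Hz).
    win = 10;
    vel = nan(size(gx));
    for t = (win/2 + 1):(numel(gx) - win/2)
        dx = gx(t + win/2) - gx(t - win/2);      % displacement across the window
        dy = gy(t + win/2) - gy(t - win/2);
        vel(t) = hypot(dx, dy) / (win / 1000);   % deg/sec
    end
    keepTrial = vel(sceneOnsetIdx) <= 25;        % flag trials below the 25 deg/sec cutoff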

Experiment 2

Participants

Seventeen subjects (14 women, 3 men, 0 nonbinary; mean age = 23.56 years, SD = 3.84) with normal or corrected-to-normal vision completed Experiment 2 (fMRI study). The sample size (N = 17) for Experiment 2 was determined through an a priori power analysis using G*Power version 3.1.9.6 (Faul et al., 2009) based on a previous study (Berman et al., 2017), in which scene category decoding accuracy for high spatial frequency scene images in PPA was significantly above chance level (t(9) = 2.68, p = .025, d = 0.85). The power analysis estimated a required sample size of 17 for this effect size with a significance criterion (α) of 0.05 and a power of 0.9. All participants provided informed consent and were pre-screened for MRI eligibility. The study protocol was approved by the Ohio State University Biomedical Sciences Institutional Review Board.

Experiment design

Subjects completed a 0.5-hour pre-scan session outside of the fMRI scanner and a 2-hour scan session in the fMRI scanner on different days. In both sessions, subjects were asked to follow a fixation dot and perform a 1-back task on sequentially presented scene images (Figure 4A). On each trial, the initial fixation dot was presented at one corner of an imaginary square (7° × 7°) for 1000 ms. Then, the initial fixation dot disappeared, and a saccade cue was presented at a new fixation location displaced horizontally or vertically by 7°, followed by the presentation of a large, full-field scene image for 100 ms. The task was to compare the scene image on the current trial to the one seen on the immediately prior trial (1-back task). Subjects were instructed to press a button only when a completely identical scene image was repeated, based on both content and spatial frequency, and to withhold the button press otherwise. The stimulus onset asynchrony (SOA) between trials was 4 seconds (50%), 6 seconds (33%), or 8 seconds (17%).

As a critical manipulation, we varied the timing of the saccade cue onset relative to scene onset across trials (Post-saccadic delay condition), such that the scene was presented after either a short (0-100 milliseconds) or long (400-600 milliseconds) post-saccadic delay. To achieve this, we used a different approach than the online gaze-contingent design employed in the behavioral experiments. In the fMRI experiment, the scene onsets had to be pre-determined and time-locked to the scanner’s repetition time (TR; 1,800 ms). Thus, we employed an approach where we measured the average saccadic reaction time for each subject in advance, and used this to individually adjust the time of saccade cue onset to maximize the number of trials where the scene images would be presented at the intended post-saccadic delays. We then performed post-hoc analyses of eye-tracking data for each subject to select trials where the scene image was actually presented within the intended short or long post-saccadic delay windows.

Specifically, we recorded saccade reaction times (SRTs), defined as the delay from saccade cue onset to saccade offset, in the pre-scan session (Figure 4B). From the SRT distribution, we identified a 100 ms time window encompassing the majority of saccade reaction times (thick gray line in Figure 4B) and used the upper end of this window as the optimal saccade reaction time (optSRT; black arrow in Figure 4B). During the scan session, the saccade cue was presented either optSRT or optSRT + 500 ms before the pre-determined time of scene onset, corresponding to the short and long post-saccadic delay conditions, respectively. For example, if a participant's optSRT was 250 ms and the scene image was scheduled to appear 4,000 ms after trial onset, the saccade cue appeared at either 3,750 ms (short delay) or 3,250 ms (long delay) after trial onset. With this approach, each subject's actual post-saccadic delays followed a bimodal distribution, with peaks located between approximately 0–100 ms and 400–600 ms (Figure 4C). For each subject, trials that actually fell within the intended short or long post-saccadic delay windows were selected through post-hoc analyses of the eye-tracking data and included in the main analysis (Figure 4C, colored portion of histograms; see Supplementary Figure 1 for individual subjects). Note that, due to individual variability in saccade onset latency, the number of included short and long delay trials differed between subjects (Supplementary Figure 1).
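
The cue-scheduling logic could be sketched as follows, assuming pre-scan SRTs in milliseconds; `estimate_opt_srt` is a hypothetical helper implementing one way to find the densest 100 ms window:

```python
import numpy as np

def estimate_opt_srt(srts_ms, window_ms=100):
    """Find the 100 ms window containing the most SRTs and
    return its upper end (the optSRT)."""
    srts = np.sort(np.asarray(srts_ms))
    # count SRTs falling in [t, t + window_ms) for each candidate start t
    counts = [np.sum((srts >= t) & (srts < t + window_ms)) for t in srts]
    best_start = srts[int(np.argmax(counts))]
    return best_start + window_ms  # upper end of the densest window

def cue_onset(scene_onset_ms, opt_srt_ms, delay='short'):
    """Saccade cue time relative to trial onset for a given delay condition."""
    lead = opt_srt_ms if delay == 'short' else opt_srt_ms + 500
    return scene_onset_ms - lead

# e.g., optSRT = 250 ms, scene scheduled 4000 ms after trial onset:
# cue_onset(4000, 250, 'short') -> 3750; cue_onset(4000, 250, 'long') -> 3250
```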

The scan session included 8 runs, each consisting of 108 trials, of which 12 were repeat trials (repetition rate of 11.11%) on which subjects had to press '1'; these trials were removed from further analysis. The remaining 96 non-repeat trials comprised 3 repetitions of each combination of 2 post-saccadic delay conditions (short and long) × 4 scene categories (mountain, beach, highway, and city) × 4 SF conditions (LSF1, LSF2, HSF1, and HSF2). The pre-scan session included two practice runs identical to the scan session runs, except that visual feedback (300 ms) was provided during the first practice run (green: correct / red: incorrect).

Additionally, the scan session included one functional localizer run to localize early visual cortex and scene-selective regions (see Region-of-Interest selection section for details). The functional localizer run included 11 blocks: four object blocks, four scene blocks, and three fixation blocks. The order of blocks was counterbalanced across participants. In each block, 20 images were presented sequentially at the screen center (17.42° × 17.42°), each for 400 ms followed by a 500 ms delay. Participants performed a 1-back task with a repetition rate of 10%.

Stimuli

To optimize stimuli for our fMRI design, we made a few changes to the scene stimuli from Experiment 1. Most critically, we used four scene category conditions and four spatial frequency conditions, each with a hierarchical structure of superordinate and subordinate levels. The set of scene images for Experiment 2 (Figure 4D) contained four subordinate scene categories (beach, mountain, city, and highway), affiliated with two superordinate scene categories (nature and urban). Scene images were grayscaled and filtered to contain either low or high spatial frequency information. To match the hierarchical structure of the scene category manipulation, we used four spatial frequency ranges, two low and two high: LSF1 (< 0.8 cpd low-pass filter), LSF2 (< 1.6 cpd low-pass filter), HSF1 (4–5 cpd band-pass filter), and HSF2 (7–10 cpd band-pass filter). To enhance the recognizability of the HSF-filtered scene images, which contained a narrower range of spatial frequencies than those used in the behavioral experiments, we applied an additional image processing step and computed the absolute values of the HSF-filtered images, which produces more naturalistic and familiar images resembling line drawings (Perfetto et al., 2020, Experiment 3). The four SF-filtered images from a single scene were jointly contrast-normalized and equated in mean luminance. Scene images were presented at a size of 23.23° × 17.42° in both the pre-scan and scan sessions to fully cover the entire screen.
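
A minimal sketch of the filtering step, assuming a grayscale image array and a pixels-per-degree factor `ppd`; the exact filter shapes used in the study (e.g., Gaussian roll-off vs. hard cutoffs) may differ from this idealized version:

```python
import numpy as np

def sf_filter(img, ppd, low_cpd, high_cpd=None):
    """FFT-based low-pass (< low_cpd) or band-pass (low_cpd..high_cpd)
    filter; cutoffs in cycles per degree (cpd)."""
    h, w = img.shape
    fy = np.fft.fftfreq(h) * ppd            # vertical frequencies (cpd)
    fx = np.fft.fftfreq(w) * ppd            # horizontal frequencies (cpd)
    FX, FY = np.meshgrid(fx, fy)
    radius = np.hypot(FX, FY)               # radial spatial frequency (cpd)
    if high_cpd is None:                     # low-pass: keep below low_cpd
        mask = radius < low_cpd
    else:                                    # band-pass: keep low..high
        mask = (radius >= low_cpd) & (radius <= high_cpd)
    spec = np.fft.fft2(img - img.mean())     # remove DC before filtering
    return np.real(np.fft.ifft2(spec * mask))

# Four SF conditions with the cutoffs reported in the text:
# LSF1: sf_filter(img, ppd, 0.8)
# LSF2: sf_filter(img, ppd, 1.6)
# HSF1: np.abs(sf_filter(img, ppd, 4, 5))   # absolute value -> line-drawing look
# HSF2: np.abs(sf_filter(img, ppd, 7, 10))
```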

The fixation dot was configured as an inner circle (0.3° diameter) with a thick outline (0.2° width). During the inter-trial interval, the white inner circle was surrounded by a black outline. One second prior to saccade cue onset, the fixation dot changed to a black circle with a white outline (i.e., initial fixation onset; Figure 4A) to encourage subjects to fixate on the dot.

Apparatus

The pre-scan and scan sessions of Experiment 2 were both run using MATLAB (The MathWorks, Natick, MA) with the Psychophysics Toolbox (Version 3 extension; Brainard, 1997; Kleiner, 2007; Pelli, 1997). The pre-scan session of Experiment 2 used the same setup as Experiment 1.

The scan session of Experiment 2 was carried out in a Siemens Prisma 3-T MRI scanner with an integrated Total Imaging Matrix (TIM) system using a 32-channel phased-array receiver head coil, located at the OSU Center for Cognitive and Behavioral Brain Imaging. Functional data were acquired using a T2*-weighted gradient-echo sequence (repetition time = 1,800 ms, echo time = 28 ms, flip angle = 70°). We used multiband whole-brain coverage aligned to the AC–PC line (72 slices, 2 × 2 × 2 mm voxels, 10% gap, multiband factor = 3). Before the functional scans, a T1-weighted magnetization-prepared rapid gradient echo anatomical scan was collected at 1 mm³ resolution. Visual stimuli were presented on a rear-projection screen driven by a 3-chip DLP projector with a refresh rate of 60 Hz and a spatial resolution of 1280 × 1024 pixels. Participants lay in the scanner and viewed the stimuli from a distance of 74 cm via a mirror tilted 45° above the head coil. An EyeLink 1000 eye-tracking system was positioned to monitor the right eye through the mirror at a sampling rate of 1000 Hz. To prevent the head coil from blocking the view of the right eye, subjects were repositioned slightly to the right when necessary. The eye-tracking system was calibrated using a nine-point grid at the beginning of the experiment and between runs if necessary.

fMRI data analysis

Preprocessing.

fMRI data from the functional localizer run and the main task runs were both corrected for slice acquisition time and head motion, and registered into Talairach space (Talairach & Tournoux, 1988) using BrainVoyager QX (Brain Innovation, Maastricht, The Netherlands; Goebel et al., 2006). Different pre-processing steps were then applied to the functional localizer data and the main task data.

The fMRI data from the functional localizer run were pre-processed with temporal filtering (GLM Fourier, two cycles) and spatial smoothing using a 4-mm FWHM Gaussian kernel. A whole-brain random-effects GLM was then applied to estimate beta coefficients for fixation, scene, and object blocks.

For fMRI data from the main task, we calculated single-trial beta estimates for each scene image onset (i.e., 864 trials) using the GLMsingle toolbox (Prince et al., 2022), which was developed to optimize the estimation of single-trial fMRI responses using advanced denoising techniques (Kay et al., 2013; Rokem & Kay, 2020). Spatial smoothing was not performed because we planned to run MVPA. Moreover, no temporal filtering was applied before running GLMsingle, because GLMsingle accounts for baseline signal drift within runs by incorporating polynomial regressors into the model (Kay et al., 2013; Prince et al., 2022). The design matrices used for GLMsingle included 16 conditions (4 spatial frequency conditions × 4 scene category conditions), without the post-saccadic delay condition, to avoid systematic differences in estimated trial-wise beta coefficients between short and long post-saccadic delay trials.
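
A sketch of this step using the Python port of GLMsingle, under the assumptions noted in the comments (the variable names and the MATLAB/Python choice are ours, not necessarily the authors'):

```python
from glmsingle.glmsingle import GLM_single

# data: list of 4-D arrays (X x Y x Z x time), one per run
# design: list of (time x 16) binary matrices, one per run; the 16 columns
# code 4 spatial frequency x 4 scene category conditions. Post-saccadic
# delay is deliberately NOT modeled, so trial-wise betas are estimated
# identically for short and long delay trials.
stimdur = 0.1  # scene duration (s)
tr = 1.8       # repetition time (s)

glm = GLM_single()  # defaults include polynomial drift regressors
results = glm.fit(design, data, stimdur, tr, outputdir='glmsingle_out')
# single-trial betas of the final model (key names per the GLMsingle docs)
betas = results['typed']['betasmd']
```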

Region-of-Interest selection.

We defined functional regions of interest (ROIs) for individual subjects using GLM contrasts between scene, object, and fixation blocks from the functional localizer run. As the primary scene-selective ROI, we localized the parahippocampal place area (PPA; R. Epstein & Kanwisher, 1998) using a scenes > objects contrast. Clusters of voxels showing significant activation were selected in volume space, with thresholds adjusted individually across subjects (most subjects: p < 0.01; a few subjects: up to p < 0.035; Supplementary Figure 2).

For an exploratory analysis motivated by the functional distinction along the anterior-posterior axis of PPA (Baldassano et al., 2016; Berman et al., 2017; Epstein & Baker, 2019; Steel et al., 2024), we further divided each participant's PPA into anterior (aPPA) and posterior (pPPA) subregions containing equal numbers of voxels. For supplemental analyses, we also defined additional scene-selective regions: the retrosplenial complex (RSC; R. A. Epstein, 2008; O'Craven & Kanwisher, 2000) and the occipital place area (OPA; Dilks et al., 2013; Nakamura, 2000).

For low-level visual analyses, an early visual cortex (EVC) ROI was localized using an all conditions > fixation contrast (similar to Golomb & Kanwisher, 2012a). Specifically, we applied a [scenes & objects] > fixation contrast, with a variable threshold (p < 0.000001 to p < 0.005; Supplementary Figure 2) and knowledge of anatomical boundaries (Wandell et al., 2007) to localize a region roughly including V1-V3 for each subject.

MVPA analysis.

We used multi-voxel pattern analysis (MVPA) to quantify semantic scene category information in PPA using representational similarity calculated from correlation matrices (Haxby et al., 2001; Golomb & Kanwisher, 2012a). Trials that met the eye-tracking inclusion criteria for one of the two delays were coded into 32 conditions (Figure 5A): 2 post-saccadic delay conditions (short and long) × 4 scene category conditions (beach, mountain, city, and highway) × 4 spatial frequency conditions (LSF1, LSF2, HSF1, and HSF2). We then calculated a 32 × 32 representational similarity matrix (RSM) using the split-half correlation method. First, we split the 8 runs into two groups of runs. Single-trial beta estimates were averaged across all trials with the same condition label within each group of runs. Then, for each group of runs separately, we normalized each voxel’s response by subtracting the beta coefficient averaged across conditions from the beta estimates of each condition. Lastly, the voxel-wise beta coefficients for each of the 32 conditions in one group of runs were correlated with each of the 32 conditions in the other group of runs, and Pearson’s r values were converted to z-scores using Fisher’s z transformation, generating a 32 × 32 correlation matrix for PPA (Figure 5B). All subsequent analyses were performed on the z-scored data.
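
A sketch of the RSM construction, assuming condition-averaged beta patterns `b1` and `b2` (voxels × 32 conditions) from the two groups of runs:

```python
import numpy as np

def split_half_rsm(b1, b2):
    """32 x 32 representational similarity matrix from two run groups.
    b1, b2: (n_voxels, 32) condition-averaged beta patterns."""
    # normalize: subtract each voxel's mean response across conditions
    b1 = b1 - b1.mean(axis=1, keepdims=True)
    b2 = b2 - b2.mean(axis=1, keepdims=True)
    rsm = np.zeros((32, 32))
    for i in range(32):
        for j in range(32):
            r = np.corrcoef(b1[:, i], b2[:, j])[0, 1]  # Pearson's r
            rsm[i, j] = np.arctanh(r)                   # Fisher's z transform
    return rsm
```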

Notably, when splitting the data into two groups of runs, we first followed the conventional odd-versus-even split-half method (Haxby et al., 2001). However, because we excluded 1-back repeat trials and trials with a post-saccadic delay outside the range of either the short or long post-saccadic delay window, the number of trials remaining per condition in each group of runs not only varied but was sometimes zero. If any condition in either group of runs had no included trials, we randomly re-divided the eight runs into two groups until there was at least one trial for every condition (Supplementary Figure 3), as sketched below. There were typically more trials in the long delay conditions than in the short delay conditions. To account for this imbalance, we performed a control analysis in which we down-sampled the long delay trials to match the number of short delay trials for each condition per group of runs (e.g., equal numbers of trials between conditions 1 and 17, or 2 and 18, etc. in Supplementary Figure 3). The pattern of results remained the same (Supplementary Figure 4), and we therefore focus on the results without down-sampling.
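
The re-splitting step might look like the following sketch, assuming per-trial `runs` and `conditions` labels for the included trials (names hypothetical; assumes at least one valid split exists):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_runs(runs, conditions, n_runs=8, n_conditions=32):
    """Randomly split the runs 4/4 until every condition has at least
    one trial in each half (fallback from the odd/even split)."""
    run_ids = np.arange(1, n_runs + 1)
    while True:
        half_a = rng.choice(run_ids, size=n_runs // 2, replace=False)
        in_a = np.isin(runs, half_a)
        ok = all(np.any(in_a & (conditions == c)) and
                 np.any(~in_a & (conditions == c))
                 for c in range(1, n_conditions + 1))
        if ok:
            return in_a  # boolean mask: True = group A, False = group B
```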

From the RSM, we quantified the amount of scene category information (Nature vs. Urban) in the short or long delay trials, separately for LSF and HSF trials (Figure 5C). First, we divided the 32 × 32 correlation matrix into four 8 × 8 subsets, each corresponding to a combination of post-saccadic delay condition (short vs. long) and spatial frequency condition (LSF vs. HSF). Then, we calculated the average representational similarity (average z-score) for the cells corresponding to same and different scene category pairs, and took their difference as an index of scene category representation. For example, to calculate scene category information (Nature vs. Urban) in the short post-saccadic delay HSF trials, we selected the subset of RSM cells corresponding to the short post-saccadic delay and HSF conditions (Figure 5C, third from the left). Then, we subtracted the average similarity of the different scene category pairs (Figure 5C, black cells) from that of the same scene category pairs (Figure 5C, white cells). If the voxel-wise response pattern is more similar (i.e., higher correlation) between conditions sharing the same scene category than between conditions with different scene categories (a significantly positive value after subtraction), this indicates that the neural activity pattern in PPA contains a representation of scene category information (Haxby et al., 2001). In the main text, we report the results of our primary analyses on the PPA; in the supplement, we include exploratory analyses investigating scene category information in other scene-selective brain regions (i.e., RSC, OPA; Supplementary Figure 5).
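
The category-information index itself reduces to a simple same-minus-different contrast over an 8 × 8 subset; a sketch, with a hypothetical condition ordering (beach, mountain, city, highway at each of two SF sub-bands):

```python
import numpy as np

# superordinate label of the 8 conditions within one delay x SF subset
labels = np.array(['nature', 'nature', 'urban', 'urban'] * 2)

def category_info(rsm_subset, labels):
    """Mean z-scored similarity of same-category cells minus
    different-category cells (cell selection follows Figure 5C)."""
    same = labels[:, None] == labels[None, :]
    return rsm_subset[same].mean() - rsm_subset[~same].mean()
```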

We also examined how the representation of a low-level visual feature, spatial frequency, is influenced post-saccadically by performing an analogous MVPA quantifying spatial frequency information, focusing on early visual cortex (EVC).

Note that for our main analyses, we focused primarily on superordinate-level scene category (Nature vs. Urban) and spatial frequency (LSF vs. HSF), combining across subordinate-level scene category and spatial frequency information to improve power and MVPA performance. Nevertheless, we observed similar results when decoding subordinate-level scene category (Beach vs. Mountain vs. City vs. Highway). Additionally, we examined subordinate-level spatial frequency information in EVC, separately for the low (LSF1 vs. LSF2) and high (HSF1 vs. HSF2) spatial frequency bands (Supplementary Figure 6). We also report analyses separating the RSM only by delay (16 × 16 RSMs) to explore effects of post-saccadic delay on scene category information regardless of spatial frequency.

For all statistical testing, we used both frequentist and Bayesian approaches implemented in JASP software (Version 0.14; JASP, 2017).

Supplementary Material

Supplement 1
media-1.pdf (3.6MB, pdf)

Acknowledgements

This work was supported by the National Institutes of Health [NIH R01-EY025648 (JG)] and the National Science Foundation [NSF 1848939 (JG)].

Footnotes

Conflict of Interest statement

The authors declare no conflicts of interest concerning the authorship or publication of this article.

References

1. Baldassano C., Esteva A., Fei-Fei L., & Beck D. M. (2016). Two Distinct Scene-Processing Networks Connecting Vision and Memory. eNeuro, 3(5), ENEURO.0178-16.2016. 10.1523/ENEURO.0178-16.2016
2. Batista A. P., Buneo C. A., Snyder L. H., & Andersen R. A. (1999). Reach Plans in Eye-Centered Coordinates. Science, 285(5425), 257–260. 10.1126/science.285.5425.257
3. Benedetto A., & Morrone M. C. (2017). Saccadic suppression is embedded within extended oscillatory modulation of sensitivity. Journal of Neuroscience, 37(13), 3661–3670. 10.1523/JNEUROSCI.2390-16.2016
4. Berman D., Golomb J. D., & Walther D. B. (2017). Scene content is predominantly conveyed by high spatial frequencies in scene-selective visual cortex. PLoS ONE, 12(12), 1–16. 10.1371/journal.pone.0189828
5. Binda P., & Morrone M. C. (2018). Vision During Saccadic Eye Movements. Annual Review of Vision Science, 4, 193–213. 10.1146/annurev-vision-091517-034317
6. Bonner M. F., & Epstein R. A. (2017). Coding of navigational affordances in the human visual system. Proceedings of the National Academy of Sciences, 114(18), 4793–4798. 10.1073/pnas.1618228114
7. Boucart M., Moroni C., Thibaut M., Szaffarczyk S., & Greene M. (2013). Scene categorization at large visual eccentricities. Vision Research, 86, 35–42. 10.1016/j.visres.2013.04.006
8. Brainard D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436. 10.1163/156856897X00357
9. Burr D. C., Morrone M. C., & Ross J. (1994). Selective suppression of the magnocellular visual pathway during saccadic eye movements. Nature, 371(6497), 511–513. 10.1038/371511a0
10. Castelhano M. S., & Henderson J. M. (2008). The influence of color on the perception of scene gist. Journal of Experimental Psychology: Human Perception and Performance, 34(3), 660–675. 10.1037/0096-1523.34.3.660
11. Choi Y. M., Chiu T.-Y., Ferreira J., & Golomb J. D. (2025). Maintaining visual stability in naturalistic scenes: The roles of trans-saccadic memory and default assumptions. Cognition, 262, 106165. 10.1016/j.cognition.2025.106165
12. Cornelissen F. W., Peters E. M., & Palmer J. (2002). The Eyelink Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox. Behavior Research Methods, Instruments, and Computers, 34(4), 613–617. 10.3758/BF03195489
13. Deubel H., Schneider W. X., & Bridgeman B. (1996). Postsaccadic target blanking prevents saccadic suppression of image displacement. Vision Research, 36(7), 985–996. 10.1016/0042-6989(95)00203-0
14. Dilks D. D., Julian J. B., Paunov A. M., & Kanwisher N. (2013). The occipital place area is causally and selectively involved in scene perception. Journal of Neuroscience, 33(4), 1331–1336. 10.1523/JNEUROSCI.4081-12.2013
15. Dima D. C., Perry G., & Singh K. D. (2018). Spatial frequency supports the emergence of categorical representations in visual cortex during natural scene perception. NeuroImage, 179, 102–116. 10.1016/j.neuroimage.2018.06.033
16. Dorr M., & Bex P. J. (2013). Peri-Saccadic Natural Vision. The Journal of Neuroscience, 33(3), 1211–1217. 10.1523/JNEUROSCI.4344-12.2013
17. Dowd E. W., & Golomb J. D. (2020). The Binding Problem after an eye movement. Attention, Perception, and Psychophysics, 82(1), 168–180. 10.3758/s13414-019-01739-y
18. Duhamel J.-R., Colby C. L., & Goldberg M. E. (1992). The Updating of the Representation of Visual Space in Parietal Cortex by Intended Eye Movements. Science, 255(5040), 90–92. 10.1126/science.1553535
19. Epstein R. A. (2008). Parahippocampal and retrosplenial contributions to human spatial navigation. Trends in Cognitive Sciences, 12(10), 388–396. 10.1016/j.tics.2008.07.004
20. Epstein R. A., & Baker C. I. (2019). Scene Perception in the Human Brain. Annual Review of Vision Science, 5(1), 373–397. 10.1146/annurev-vision-091718-014809
21. Epstein R., & Kanwisher N. (1998). A cortical representation of the local visual environment. Nature, 392(6676), 598–601. 10.1038/33402
22. Fakche C., Hickey C., & Jensen O. (2024). Fast Feature- and Category-Related Parafoveal Previewing Support Free Visual Exploration. The Journal of Neuroscience, 44(49), e0841242024. 10.1523/JNEUROSCI.0841-24.2024
23. Faul F., Erdfelder E., Buchner A., & Lang A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. 10.3758/BRM.41.4.1149
24. Geisler W. S. (2008). Visual perception and the statistical properties of natural scenes. Annual Review of Psychology, 59, 167–192. 10.1146/annurev.psych.58.110405.085632
25. Goebel R., Esposito F., & Formisano E. (2006). Analysis of functional image analysis contest (FIAC) data with BrainVoyager QX: From single-subject to cortically aligned group general linear model analysis and self-organizing group independent component analysis. Human Brain Mapping, 27(5), 392–401. 10.1002/hbm.20249
26. Golomb J. D., & Kanwisher N. (2012a). Higher Level Visual Cortex Represents Retinotopic, Not Spatiotopic, Object Location. Cerebral Cortex, 22(12), 2794–2810. 10.1093/cercor/bhr357
27. Golomb J. D., & Kanwisher N. (2012b). Retinotopic memory is more precise than spatiotopic memory. Proceedings of the National Academy of Sciences, 109(5), 1796–1801. 10.1073/pnas.1113168109
28. Golomb J. D., L'Heureux Z. E., & Kanwisher N. (2014). Feature-binding errors after eye movements and shifts of attention. Psychological Science, 25(5), 1067–1078. 10.1177/0956797614522068
29. Golomb J. D., & Mazer J. A. (2021). Visual remapping. Annual Review of Vision Science, 7(1), 257–277. 10.1146/annurev-vision-032321-100012
30. Greene M. R., & Oliva A. (2009). Recognition of natural scenes from global properties: Seeing the forest without representing the trees. Cognitive Psychology, 58(2), 137–176. 10.1016/j.cogpsych.2008.06.001
31. Grill-Spector K., & Weiner K. S. (2014). The functional architecture of the ventral temporal cortex and its role in categorization. Nature Reviews Neuroscience, 15(8), 536–548. 10.1038/nrn3747
32. Groen I. I. A., Ghebreab S., Prins H., Lamme V. A. F., & Scholte H. S. (2013). From Image Statistics to Scene Gist: Evoked Neural Activity Reveals Transition from Low-Level Natural Image Structure to Scene Category. Journal of Neuroscience, 33(48), 18814–18824. 10.1523/JNEUROSCI.3128-13.2013
33. Groen I. I. A., Silson E. H., & Baker C. I. (2017). Contributions of low- and high-level properties to neural processing of visual scenes in the human brain. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1714). 10.1098/rstb.2016.0102
34. Hamker F. H., Zirnsak M., Calow D., & Lappe M. (2008). The peri-saccadic perception of objects and space. PLoS Computational Biology, 4(2), e31. 10.1371/journal.pcbi.0040031
35. Hasson U., Levy I., Behrmann M., Hendler T., & Malach R. (2002). Eccentricity Bias as an Organizing Principle for Human High-Order Object Areas. Neuron, 34(3), 479–490. 10.1016/S0896-6273(02)00662-1
36. Haxby J. V., Gobbini M. I., Furey M. L., Ishai A., Schouten J. L., & Pietrini P. (2001). Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science, 293(5539), 2425–2430. 10.1126/science.1063736
37. Hegdé J. (2008). Time course of visual perception: Coarse-to-fine processing and beyond. Progress in Neurobiology, 84(4), 405–439. 10.1016/j.pneurobio.2007.09.001
38. Henderson J. M., & Hollingworth A. (2003). Global transsaccadic change blindness during scene perception. Psychological Science, 14(5), 493–497. 10.1111/1467-9280.02459
39. Henriques D. Y. P., Klier E. M., Smith M. A., Lowy D., & Crawford J. D. (1998). Gaze-Centered Remapping of Remembered Visual Space in an Open-Loop Pointing Task. The Journal of Neuroscience, 18(4), 1583–1594. 10.1523/JNEUROSCI.18-04-01583.1998
40. Idrees S., Baumann M. P., Franke F., Münch T. A., & Hafed Z. M. (2020). Perceptual saccadic suppression starts in the retina. Nature Communications, 11(1), 1977. 10.1038/s41467-020-15890-w
41. Kauffmann L., Ramanoël S., Guyader N., Chauvin A., & Peyrin C. (2015). Spatial frequency processing in scene-selective cortical regions. NeuroImage, 112, 86–95. 10.1016/j.neuroimage.2015.02.058
42. Kauffmann L., Ramanoël S., & Peyrin C. (2014). The neural bases of spatial frequency processing during scene perception. Frontiers in Integrative Neuroscience, 8. https://www.frontiersin.org/articles/10.3389/fnint.2014.00037
43. Kay K., Rokem A., Winawer J., Dougherty R., & Wandell B. (2013). GLMdenoise: A fast, automated technique for denoising task-based fMRI data. Frontiers in Neuroscience, 7. 10.3389/fnins.2013.00247
44. Kersten D. (1987). Predictability and redundancy of natural images. Journal of the Optical Society of America A, 4(12), 2395–2400. 10.1364/josaa.4.002395
45. Kleiner M. (2007). What's new in Psychtoolbox-3? Perception, 36(ECVP Abstract Suppl).
46. Kleiser R., Seitz R. J., & Krekelberg B. (2004). Neural Correlates of Saccadic Suppression in Humans. Current Biology, 14(5), 386–390. 10.1016/j.cub.2004.02.036
47. Kwak Y., Penner E., Wang X., Saeedpour-Parizi M. R., Mercier O., Wu X., Murdison S., & Guan P. (2024). Saccade-Contingent Rendering. ACM SIGGRAPH 2024 Conference Papers, 1–9. 10.1145/3641519.3657420
48. Levy I., Hasson U., Avidan G., Hendler T., & Malach R. (2001). Center–periphery organization of human object areas. Nature Neuroscience, 4(5), 533–539. 10.1038/87490
49. MacKay D. M. (1973). Visual Stability and Voluntary Eye Movements. In Jung R. (Ed.), Central Processing of Visual Information A: Integrative Functions and Comparative Data (pp. 307–331). Springer. 10.1007/978-3-642-65352-0_5
50. Malach R., Levy I., & Hasson U. (2002). The topography of high-order human object areas. Trends in Cognitive Sciences, 6(4), 176–184. 10.1016/S1364-6613(02)01870-3
51. Malcolm G. L., Groen I. I. A., & Baker C. I. (2016). Making Sense of Real-World Scenes. Trends in Cognitive Sciences, 20(11), 843–856. 10.1016/j.tics.2016.09.003
52. Matsumiya K., & Furukawa S. (2023). Perceptual decisions interfere more with eye movements than with reach movements. Communications Biology, 6(1), 882. 10.1038/s42003-023-05249-4
53. Medendorp W. P., Beurze S. M., Van Pelt S., & Van Der Werf J. (2008). Behavioral and cortical mechanisms for spatial coding and action planning. Cortex, 44(5), 587–597. 10.1016/j.cortex.2007.06.001
54. Musel B., Kauffmann L., Ramanoël S., Giavarini C., Guyader N., Chauvin A., & Peyrin C. (2014). Coarse-to-fine categorization of visual scenes in scene-selective cortex. Journal of Cognitive Neuroscience, 26(10), 2287–2297. 10.1162/jocn_a_00643
55. Najemnik J., & Geisler W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
56. Nakamura K., Kawashima R., Sato N., Nakamura A., Sugiura M., Kato T., Hatano K., Ito K., Fukuda H., Schormann T., & Zilles K. (2000). Functional delineation of the human occipito-temporal areas related to face and scene processing: A PET study. Brain, 123(9), 1903–1912. 10.1093/brain/123.9.1903
57. Neupane S., Guitton D., & Pack C. C. (2020). Perisaccadic remapping: What? How? Why? Reviews in the Neurosciences, 31(5), 505–520. 10.1515/revneuro-2019-0097
58. Nuthmann A. (2013). On the visual span during object search in real-world scenes. Visual Cognition, 21(7), 803–837. 10.1080/13506285.2013.832449
59. Nuthmann A. (2014). How do the regions of the visual field contribute to object search in real-world scenes? Evidence from eye movements. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 342–360. 10.1037/a0033854
60. O'Craven K. M., & Kanwisher N. (2000). Mental imagery of faces and places activates corresponding stimulus-specific brain regions. Journal of Cognitive Neuroscience, 12(6), 1013–1023. 10.1162/08989290051137549
61. Oliva A., & Schyns P. G. (2000). Diagnostic Colors Mediate Scene Recognition. Cognitive Psychology, 41(2), 176–210. 10.1006/cogp.1999.0728
62. Oliva A., & Torralba A. (2006). Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155B, 23–36. 10.1016/S0079-6123(06)55002-2
63. Pashler H., Carrier M., & Hoffman J. (1993). Saccadic eye movements and dual-task interference. The Quarterly Journal of Experimental Psychology Section A, 46(1), 51–82. 10.1080/14640749308401067
64. Pelli D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442. 10.1163/156856897X00366
65. Perfetto S., Wilder J., & Walther D. B. (2020). Effects of spatial frequency filtering choices on the perception of filtered images. Vision (Switzerland), 4(2), 1–16. 10.3390/vision4020029
66. Prince J. S., Charest I., Kurzawski J. W., Pyles J. A., Tarr M. J., & Kay K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11, e77599. 10.7554/eLife.77599
67. Rajimehr R., Devaney K. J., Bilenko N. Y., Young J. C., & Tootell R. B. H. (2011). The "Parahippocampal Place Area" Responds Preferentially to High Spatial Frequencies in Humans and Monkeys. PLOS Biology, 9(4), e1000608. 10.1371/journal.pbio.1000608
68. Rayner K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506. 10.1080/17470210902816461
69. Renninger L. W., Verghese P., & Coughlan J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3), 1–17. 10.1167/7.3.6
70. Richardson B. A., Cluff T., Lyons J., & Balasubramaniam R. (2013). An eye-to-hand magnet effect reveals distinct spatial interference in motor planning and execution. Experimental Brain Research, 225(3), 443–454. 10.1007/s00221-012-3384-1
71. Rokem A., & Kay K. (2020). Fractional ridge regression: A fast, interpretable reparameterization of ridge regression. GigaScience, 9(12), giaa133. 10.1093/gigascience/giaa133
72. Ross J., Morrone M. C., & Burr D. C. (1997). Compression of visual space before saccades. Nature, 386(6625), 598–601.
73. Ross M. G., & Oliva A. (2011). Estimating perception of scene layout properties from global image features. Journal of Vision, 10(1), 2. 10.1167/10.1.2
74. Samonds J. M., Geisler W. S., & Priebe N. J. (2018). Natural image and receptive field statistics predict saccade sizes. Nature Neuroscience, 21(11), 1591–1599. 10.1038/s41593-018-0255-5
75. Schyns P. G., & Oliva A. (1994). From Blobs to Boundary Edges: Evidence for Time- and Spatial-Scale-Dependent Scene Recognition. Psychological Science, 5(4), 195–200. 10.1111/j.1467-9280.1994.tb00500.x
76. Steel A., Silson E. H., Garcia B. D., & Robertson C. E. (2024). A retinotopic code structures the interaction between perception and memory systems. Nature Neuroscience. 10.1038/s41593-023-01512-3
77. Tagliabue M., & McIntyre J. (2012). Eye-hand coordination when the body moves: Dynamic egocentric and exocentric sensory encoding. Neuroscience Letters, 513(1), 78–83. 10.1016/j.neulet.2012.02.011
78. Talairach J., & Tournoux P. (1988). Co-planar stereotaxic atlas of the human brain: 3-Dimensional proportional system: An approach to cerebral imaging. New York: Thieme Medical Publishers.
79. Võ M. L.-H., Boettcher S. E., & Draschkow D. (2019). Reading scenes: How scene grammar guides attention and aids perception in real-world environments. Current Opinion in Psychology, 29, 205–210. 10.1016/j.copsyc.2019.03.009
80. Walther D. B., & Shen D. (2014). Nonaccidental Properties Underlie Human Categorization of Complex Natural Scenes. Psychological Science, 25(4), 851–860. 10.1177/0956797613512662
81. Wandell B. A., Dumoulin S. O., & Brewer A. A. (2007). Visual field maps in human cortex. Neuron, 56(2), 366–383. 10.1016/j.neuron.2007.10.012
82. Wang R. F., & Spelke E. S. (2000). Updating egocentric representations in human navigation. Cognition, 77(3), 215–250. 10.1016/S0010-0277(00)00105-0
83. Weiß K., Schneider W. X., & Herwig A. (2015). A "blanking effect" for surface features: Transsaccadic spatial-frequency discrimination is improved by postsaccadic blanking. Attention, Perception, & Psychophysics, 77(5), 1500–1506. 10.3758/s13414-015-0926-1
84. Wiesmann S. L., Caplette L., Willenbockel V., Gosselin F., & Võ M. L.-H. (2021). Flexible time course of spatial frequency use during scene categorization. Scientific Reports, 11(1), Article 1. 10.1038/s41598-021-93252-2
85. Wolbers T., & Büchel C. (2005). Dissociable retrosplenial and hippocampal contributions to successful formation of survey representations. Journal of Neuroscience, 25(13), 3333–3340. 10.1523/JNEUROSCI.4705-04.2005
86. Wurtz R. H. (2008). Neuronal mechanisms of visual stability. Vision Research, 48(20), 2070–2089. 10.1016/j.visres.2008.03.021
87. Yarbus A. L. (1967). Eye Movements During Fixation on Stationary Objects. In Yarbus A. L. (Ed.), Eye Movements and Vision (pp. 103–127). Springer US. 10.1007/978-1-4899-5379-7_4
88. Zirnsak M., & Moore T. (2014). Saccades and shifting receptive fields: Anticipating consequences or selecting targets? Trends in Cognitive Sciences, 18(12), 621–628. 10.1016/j.tics.2014.10.002
