Psychophysiology. Author manuscript; available in PMC 2025 May 1.
Published in final edited form as: Psychophysiology. 2024 Jan 5;61(5):e14503. doi: 10.1111/psyp.14503

Multiple mechanisms of visual prediction as revealed by the timecourse of scene-object facilitation

Cybelle M Smith 1, Kara D Federmeier 1,2,3

Abstract

Not only semantic knowledge but also recently learned arbitrary associations have the potential to facilitate visual processing in everyday life—for example, knowledge of a (moveable) object’s location at a specific time may facilitate visual processing of that object. In our prior work, we showed that previewing a scene can facilitate processing of recently associated objects at the level of visual analysis (Smith & Federmeier, 2020, Journal of Cognitive Neuroscience, 32(5), 783–803). In the current study, we assess how rapidly this facilitation unfolds by manipulating scene preview duration. We then compare our results to studies using well-learned object–scene associations in a first-pass assessment of whether systems consolidation might speed up high-level visual prediction. In two ERP experiments (N = 60), we had participants study categorically organized novel object–scene pairs in an explicit paired associate learning task. At test, we varied contextual pre-exposure duration, both between (200 vs. 2500 ms) and within subjects (0–2500 ms). We examined the N300, an event-related potential component linked to high-level visual processing of objects and scenes, and found that N300 effects of scene congruity increase with longer scene previews, up to approximately 1–2 s. Similar results were obtained for response times and in a separate component-neutral ERP analysis of visual template matching. Our findings contrast with prior evidence that scenes can rapidly facilitate visual processing of commonly associated objects. This raises the possibility that systems consolidation might mediate different kinds of predictive processing with different temporal profiles.

Keywords: associative learning, EEG, N300, prediction, systems consolidation, visual object recognition

1 |. INTRODUCTION

Visual object processing is facilitated when objects are embedded in supportive (vs. incongruent) contexts (Biederman et al., 1982; Davenport & Potter, 2004; Wolfe et al., 2011). Human research on scene–object contextual facilitation has largely focused on well-learned associations (e.g., a pot viewed in a kitchen scene; see for example, Bar, 2004; Truman & Mudrik, 2018; Võ et al., 2019). However, more recently acquired associations also present an opportunity for speeding visual object recognition (Turk-Browne, 2019); that is, knowing that a pot was recently placed in the garage could also ease its identification. In keeping with this, we previously showed that visual processing is facilitated for objects presented after a recently associated (and thus “congruent”) scene context (Smith & Federmeier, 2020). However, we still do not understand the underlying mechanisms or temporal dynamics of visual contextual facilitation for newly associated stimuli. Well-learned scene–object associations are believed to be rapidly accessed in the service of visual object recognition (Bar et al., 2006; Guillaume et al., 2016). Can scenes also rapidly influence visual processing of objects based on recently learned associations? We explore this question in humans by manipulating the onset delay between a scene and an object that was (or was not) recently paired with that scene and measuring resulting changes in congruency effects. As we discuss, our results speak to both the mechanisms and neural systems involved in context-based visual facilitation.

1.1 |. Scenes rapidly facilitate visual object processing for well-learned natural associations

Converging evidence suggests that scene information can rapidly modulate visual object processing (Bar et al., 2006; Joubert et al., 2008; Truman & Mudrik, 2018). Congruent scenes speed superordinate categorization of embedded visual objects, as illustrated by RT distributions on a go/no-go animal judgment task diverging as early as 300 ms post-stimulus-onset (Joubert et al., 2008). Scene gist information is extracted within 200 ms (reviewed in Larson et al., 2014) and likely contributes to these early effects, as corroborated by MEG data (Bar et al., 2006). Bar and colleagues inferred that prefrontal activity leads inferotemporal activity in predicting visual object recognition, all within 200 ms. These and related findings led them to attribute scene–object facilitation effects to top-down bias signals, which generate predictive pre-activation of visual object features. However, human scalp EEG studies typically do not detect facilitatory effects of scene–object congruency until around 200 ms or later (cf. Guillaume et al., 2016). Nonetheless, EEG studies are also generally consistent with the idea that scene information influences visual object processing with little to no delay after an initial feedforward pass through the visual hierarchy.

Across a wide range of task demands, scene–object ERP facilitation effects for well-learned/natural associations are observed between roughly 200 and 500 ms following object onset (Draschkow et al., 2018; Ganis & Kutas, 2003; Mudrik et al., 2010, 2014; Truman & Mudrik, 2018; Võ & Wolfe, 2013). This time range includes the N300 component, linked to high-level visual processing (Schendan, 2019), and the N400 component, linked to amodal or multi-modal semantic processing (Kutas & Federmeier, 2011).1 We focus particularly on the N300, due to its closer ties to vision. The N300 is associated with visual object recognition (Doniger et al., 2000; Schendan & Kutas, 2002), shape perception (reviewed in Schendan & Ganis, 2015), and canonical/non-canonical viewpoint manipulations of objects (McPherson & Holcomb, 1999; Schendan & Kutas, 2003). It also is associated with analogous effects for visual scenes and may index visual template matching (Kumar et al., 2021). Notably, the N300 has been claimed to index visual predictive coding and top-down memory-based influences on visual processing (Kumar et al., 2021; Schendan & Ganis, 2012, 2015), and its generators likely include human lateral occipital cortex (e.g., Doniger et al., 2000). It is thus interesting that N300 effects of scene congruency on object processing are observed in cases where the scene precedes the object by only a short interval (e.g., 300 ms, Ganis & Kutas, 2003) or even with concurrent object–scene presentation (Mudrik et al., 2010, 2014; Truman & Mudrik, 2018). Although more controlled comparisons are required, the overall literature suggests that there is not much sensitivity to scene preview duration in paradigms wherein scenes are displayed before naturally associated objects (cf. Demiral et al., 2012). This is consistent with a model in which scene information begins impacting visual object processing as soon as an initial bottom-up pass through the visual hierarchy is complete. Importantly, however, as we outline below, this may not match the time-course for facilitation of newly associated scene–object pairs, a hypothesis we test in the current study.

1.2 |. Scenes likely influence processing of recently associated objects via the medial temporal lobe (MTL)

Although some researchers suggest a possible role of the medial temporal lobe in congruency effects for naturally co-occurring object–scene pairs (Guillaume et al., 2016), many models focus on cortically consolidated pathways (Bar et al., 2006; Schendan & Ganis, 2012, 2015). This stands in contrast with research on recently learned high-level visual associations, which points to a strong role for the MTL (primate models: Higuchi & Miyashita, 1996; Murray et al., 1993; Sakai & Miyashita, 1991; see Albright, 2012 for review; rodent models: reviewed in Winters et al., 2010; human studies: Hannula et al., 2007; Kok & Turk-Browne, 2018; Schapiro et al., 2014). For example, human lesion studies have shown that MTL damage can impair paired-associate and statistical learning of high-level visual stimuli, including scene-face, shape-shape and scene-scene associates (Hannula et al., 2007; Schapiro et al., 2014). Human fMRI studies have also suggested that the hippocampus can encode predicted abstract shape information (Kok & Turk-Browne, 2018), and human intracranial EEG data suggest a role for the MTL in visual working memory (Boran et al., 2022). Although the MTL is not a primary generator of the N300 and may not directly contribute to the N300 response observed at the scalp, the idea that MTL activity may indirectly influence N300 facilitation effects under some circumstances is plausible. To start, human hippocampal local field potentials detected using intracranial recordings respond to high-level visual stimuli at roughly the same latency as N300 congruity effects (Kreiman et al., 2000) and are sensitive to similar manipulations (Sehatpour et al., 2008). Furthermore, the MTL is densely connected with, and itself contains, areas associated with high-level visual scene and object processing (see Huang et al., 2021 and Ma et al., 2022 for a recent assessment of human MTL structural and functional connectivity). Degree of MTL reliance may in turn have important consequences for the timing of scene facilitation effects on visual object processing.

1.3 |. The timing of contextual facilitation for recently learned associates

Eye-tracking and ERP evidence provide an initial hint that contextual facilitation effects on visual processing for recently learned associates may be contingent on cue-target delay (Hannula et al., 2006, 2007), and correspondingly, that reliance on the MTL may introduce a delay in the impact of top-down feedback from cue to target. For example, in one eye-tracking study, Hannula et al. (2007) manipulated preview duration in a paired associate learning paradigm in which participants learned novel associations between faces and scenes. With prior presentation of the associated scene (for 3 s), participants preferentially viewed faces that were paired with the scene over previously presented but mismatching (re-paired) faces. This effect emerged approximately 500–750 ms post-stimulus-onset and was delayed by 1 s when scenes and faces were presented concurrently. It was also absent in patients with damage to the MTL, 5 out of 6 of whom had damage selective to the hippocampus. In another study, Hannula et al. (2006) adapted this paradigm for use with ERPs, and detected an N300 effect of face-scene mismatch using a 3 s scene preview. Hannula et al. noted that such early ERP effects are generally absent from match/re-pair manipulations in short-term associative memory paradigms using word pairs (e.g., Donaldson & Rugg, 1998). They thus inferred that the N300 effect might be contingent on scene preview in the same way as the preferential viewing effect observed with eye-tracking. However, preview time was never manipulated within Hannula et al.’s paradigm to directly test the hypothesis that N300 effects are contingent on sufficient preview time for recently paired associates.

In a 2020 study, our lab adapted Hannula and colleagues’ paradigm to study novel object–scene pairs (Smith & Federmeier, 2020). Similar to learning of face-scene pairs, learning of novel object–scene pairs is likely to be MTL dependent, as the MTL is believed to play a role in the learning and predictive processing of a wide variety of high-level visual associates (Turk-Browne, 2019), and because there are dense connections between the hippocampus and both object-selective inferotemporal cortex and scene-selective parahippocampal cortex via the perirhinal and entorhinal cortices (Huang et al., 2021; Rolls et al., 2022). We further boosted the likelihood of MTL-reliance by ensuring that pre-existing associative relationships between objects and scenes were low, both through counterbalancing across multiple scene–object pairings and through the use of novel objects. We replicated Hannula et al.’s N300 match/mismatch effect, using a 2500 ms scene preview. In addition, we found that a distinct pattern of ERP mismatch effects could be elicited by visually distorting the presented object relative to the object that was learned, while keeping object identity intact. Finding reliable sensitivity to the specific visual properties of the presented object relative to the object exemplar associated with the scene at study strengthened our conclusion that participants’ visual representation of the object, and not merely the object’s identity, was processed in a facilitated way due to the presence of an associated scene. In the current study, having linked Hannula et al.’s MTL-dependent associative facilitation effects more strongly to visual processing than before, we now ask whether our ERP visual facilitation effects are modulated by scene preview duration.

1.4 |. Experimental design and predictions

We report the results of two EEG experiments. In the first experiment, we had participants study novel object–scene pairs, with a 200 ms scene preview prior to object onset at test. We can then compare the results of this study to Smith and Federmeier (2020), effectively running a between-subjects comparison of two different scene preview durations: 200 and 2500 ms. We specifically chose 200 ms as the scene preview duration in Experiment 1 to provide sufficient time for participants to extract scene gist information (and possibly additional details; cf. Fei-Fei et al., 2007) while restricting their ability to prepare to view the upcoming object. In our second experiment, we then manipulated scene preview duration continuously within-subjects, between 0 and 2500 ms. To capture the impact of the scene on visual processing of the object, we used as our primary dependent measures N300 modulations and component-neutral ERP visual similarity effects. In addition, we assessed N400 effects as a measure of semantic priming, and late positive complex (LPC) and response time (RT) data as metrics of the decision-making processes involved in determining contextual congruity.

If we observe sensitivity to scene preview duration on our primary dependent measures, this suggests that it takes time for scenes to facilitate visual processing of newly associated objects. To the extent that scene–object facilitation is more immediate for well-learned associates (based on our read of the literature), this could suggest that systems-level consolidation modulates the time course of contextual facilitation effects in high-level visual processing. That is to say, entering a visual environment might rapidly support recognition of semantically associated objects but may only support recognition of recently associated objects after a delay (and via different mechanisms dependent on the MTL).

Degree of sensitivity to scene preview also constrains the underlying mechanisms of any observed scene–object facilitation. A high degree of sensitivity to scene preview duration would be consistent with a predictive preactivation account of scene–object facilitation. Prediction-like preparatory activity has been observed prior to onset of a variety of anticipated visual stimuli (e.g., Bell et al., 2016; Lewis-Peacock & Postle, 2008), including in neural firing patterns following visual paired associate learning in macaque IT cortex (Sakai & Miyashita, 1991), and appears similar in nature to delay period maintenance activity in visual working memory paradigms (Boran et al., 2022; Luria et al., 2016). If scenes do predictively preactivate recently associated visual object features, then we expect that with only 200 ms of preview time (Experiment 1), congruency effects linked to visual processing should be notably reduced compared to the effects we previously documented with 2500 ms preview time. The results of Experiment 2, in turn, can help reveal the speed with which preactivated information comes online and if/when the accessibility of predictive information peaks or asymptotes before target onset.

If no sensitivity to scene preview duration is observed, this would be consistent with a unitized representational account of scene–object facilitation.2 That is, it is possible that following associative learning, viewing a scene evokes a pattern of neural activity that reflects visual feature-level information about both the scene and (to some degree) the associated object, immediately and concurrently. That the rapid activation of unitized representations might be possible is supported by electrophysiological studies in macaques, which have identified portions of perirhinal cortex that exhibit concurrent activation of cue and target representations after multi-session training on a visual paired associate task (Fujimichi et al., 2010). Also, some models of hippocampally dependent statistical learning, including successor representation models, employ unitized representations such that the cue and target come to be co-activated within the hippocampus following repeated exposure (Gershman, 2018; Schapiro et al., 2016). Furthermore, evidence that incongruous visual objects interfere with rapid scene gist extraction (Joubert et al., 2007) has corroborated the idea that scene processing and object processing are rapid, parallel and interactive (Bar, 2004), and that “feedforward co-activation” of objects and scenes drives scene–object mismatch effects (Guillaume et al., 2016). Thus, the results of our scene duration manipulation are additionally informative because they have the potential to rule out a “unitized representational” predictive processing account if sensitivity to scene preview duration is detected.

2 |. EXPERIMENT 1

In our first experiment, following the design in Smith and Federmeier (2020), we ask participants to study scene–object pairs and then in a subsequent test phase to indicate whether the presented object matches the scene. Critically, we adjust scene preview duration in the test phase to 200 ms, down from 2500 ms in Smith and Federmeier (2020). By directly comparing results of the current experiment with those presented in Smith and Federmeier (2020), we can examine sensitivity of ERP visual facilitation effects to cue-target delay.

2.1 |. Method

2.1.1 |. Participants

Data are reported from 24 participants (mean age 22, range 18–29; 9 males), all native English-speaking University of Illinois undergraduates, who were compensated with payment. One additional participant was replaced due to excessive trial loss. All participants provided written informed consent, according to procedures established by the IRB at the University of Illinois. Handedness was assessed using the Edinburgh inventory (Oldfield, 1971). All participants were right-handed; mean score: .84, where 1 denotes strongly right-handed and −1 strongly left-handed. Eight reported having left-handed family members. No participants had major exposure to languages other than English prior to the age of 5, and none had a current diagnosis of any neurological or psychiatric disorder or brain damage or was using neuroactive drugs. All had normal or corrected-to-normal vision for the distances used in the experiment. All participants also passed a behavioral criterion for inclusion: They showed significant sensitivity in their response distribution to the match/distortion versus mismatch conditions (Pearson’s Chi-squared statistic, all individual participant p’s < .001; conditions described further below). Participants were randomly assigned to one of 24 experimental lists.

2.1.2 |. Materials

Overview

Materials and counterbalancing are identical to Smith and Federmeier (2020) but are reproduced below. Novel objects resembling either biological organisms (“germs”) or mechanical devices (“machines”) were paired with natural scenes. Objects were organized hierarchically such that germs and machines could be subdivided into major categories, then subcategories, then exemplars (i.e., distortions). At study, each major object category was consistently paired with a scene type (e.g., germ category 1 with beaches, germ category 2 with forests) for any given participant. In the test phase, participants viewed each scene from the study phase, followed by an object that exactly matched what they had studied with that scene, a distortion of that object, an object from a different subcategory (that had been associated with that scene type, but not that specific scene, at study), or an object from a different major category (that would never have appeared on that scene type at study). Across the full set of participants, all object types were paired with all scene types, and objects and scenes were never repeated in the study phase. Details of stimulus development and counterbalancing follow. See Figure 1 for an overview of the stimulus structure and experimental design.

FIGURE 1. Overview of stimulus structure and experimental design. (a) Exemplars from each of the six scene categories: beaches, forests, mountains, city streets, highways and offices. (b) Prototypes for each of the 18 subcategories of germ objects, divided into 6 major categories. (c) Prototypes for each of the 18 subcategories of machine objects, divided into 6 major categories. (d) Category structure of the object images: each class of objects (germs or machines) has 6 major categories, and each major category has 3 subcategories. 24 exemplars are generated per subcategory. (e) Examples of feature continua used to generate exemplars from germ and machine object prototypes. Each prototype was manipulated to generate three multi-feature continua. Each continuum was used to generate 8 exemplars. Thus, a total of 3 × 8 = 24 exemplars were generated per subcategory prototype. (f) Experiment 1 experimental conditions. (g) Procedure for generating test trials in the within category mismatch and between category mismatch conditions. Figure reproduced with permission from Smith and Federmeier (2020).

Scenes

Scenes depicted one of six categories: beaches, forests, mountains, city streets, highways and offices. Scenes were drawn from a pool of 288 images, 48 per category, which were previously normed as being highly representative of their respective scene types and which had been rescaled to 800 × 600 pixels (see Torralbo et al., 2013 for norming details).

Objects

Line drawings of novel object prototypes for biological organisms (“germs”) or mechanical devices (“machines”) were created by an artist with the aid of Adobe Photoshop to maintain a consistent set of visual textures. Within the two classes of germs and machines, drawings were further organized into six major categories, each with three subcategories. Thus, there were 18 subcategories of germs and 18 subcategories of machines, each with a single representative prototype image. Major categories shared aspects of their visual structure and texture, as well as homologous parts. From each subcategory prototype image, 24 exemplar images were derived by changing the relative positions, proportions and orientations of the object parts.

To create the object exemplars, each object prototype image was scanned and roughly centered, then further manipulated using Photoshop and the animation software Unity. For each object prototype, we altered the prototype image along three continuously changing parameter scales, with the prototype image at the center of each scale. We then took snapshots of the distorted images at various points along each parameter scale to generate three continuously varying sets of eight exemplars, with four exemplars on either side of the central prototype within each set. We thus produced 24 total exemplars per object, from which we sampled to generate the stimulus lists. We never displayed the original prototype images in the experiment. For germs, parameter changes induced global distortions, such as gradually twisting the body or changing its width, while for machines, which were assumed to have a more rigid body, parameter changes affected the size and relative positions of component parts, which were first extrapolated from the prototype images using Photoshop layers to avoid unnatural gaps. Often, several parameters would change at once along each scale to heighten the visual dissimilarity among exemplars. All novel object line drawings were resized to 274 × 274 pixels.

Counterbalancing and experimental conditions

A total of 24 experimental lists of stimuli were used, identical to those in Smith and Federmeier (2020). Each list was assigned to a single participant. Lists were grouped into 6 sets of 4. Lists within each set maintained a fixed study trial correspondence between the 6 categories of novel objects within each class (germs or machines), and the 6 scene categories. For any given list (and participant), each germ or machine type would only ever appear on a single scene category (e.g., “beaches”) at study, and this relationship would hold for all 3 subtypes of the object major category. The mapping between object and scene categories was then systematically rotated across the 6 sets, so that over the full set of participants every object category was associated with every scene category at study. For example, the first four participants would always see exemplars from germ category 1 and machine category 1 with beaches during the study phase, while the second set of four participants would see exemplars from germ category 1 and machine category 1 with highways.

Each list consisted of 288 study and 288 test pairs of novel object and scene stimuli. Stimuli within each list were organized into 18 blocks of 16 study pairs followed by 16 test pairs. Blocks alternated between all germs and all machines; the first block for each list was always germs. The same set of 288 unique scenes was used across all lists; scenes were presented exactly once at study and once in the corresponding test phase in each list. Each set of 4 lists had 288 unique test object exemplars that were used across all 4 lists; these objects were randomly drawn (without replacement) from the total set of possible exemplars, within constraints of the experimental design. Test objects were never repeated within a list but were sometimes repeated across lists, such that some test object exemplars were used more often over the full counterbalancing than others. 793 out of 864 possible novel object exemplars (24 exemplars/object × 3 objects/category × 6 categories/class × 2 classes) were presented as test objects over the full set of participants.

Within each block, an approximately equal number of pairs involving each scene category and object major category were presented at both study and test. All 16 scenes presented at study within a block were presented in a different pseudorandom order in the corresponding, and immediately following, test phase. In the test phase of each block, 4 of the 16 test trials were assigned to each of 4 experimental conditions (see Figure 1):

  1. Exact match: test object exactly matched the object paired with the scene at study

  2. Distortion: test object was a different exemplar of the same subtype of object presented with the scene at study

  3. Within-category mismatch: test object was a different subtype of object within the same major category as the object presented with the scene at study

  4. Between-category mismatch: test object was a different major category of object from that presented with the scene at study (and thus, belonged to a category of object that would never have been studied with that scene category previously; e.g., an object category that had only ever appeared on offices at study, now appearing on a beach at test).

To illustrate, a participant might study a particular object (a) on a beach (Figure 1f, study phase), an object (b) from a different subcategory on a different beach (Figure 1g, left, study phase), and an object (c) from a different major category on a mountain (Figure 1g, right, study phase). For the corresponding test phase, a match trial could be the same exemplar of object (a) on the same beach as study. A distortion trial could be the same beach studied with object (a) but, instead of the original exemplar of object (a), a variant along the continuum in Figure 1e is presented. A within category mismatch could be a new exemplar of object (b) presented on the beach that had gone with object (a). A between category mismatch trial could be a new exemplar of object (c) (previously presented on a mountain) on the beach that had gone with object (a). Other than in the exact match condition, the exact object exemplar presented at test never matched that presented in the preceding study phase (i.e., a distortion derived from the same object prototype was used; see Figure 1e). Because there were only 16 trials per study phase, only 16 of 18 objects (6 categories × 3 objects/category, given the object class) were presented in each study phase. Half of the between category mismatches in the test phase were created by swapping two object subcategories from different major categories that had been presented in the preceding study phase (see Figure 1g, right). The other half were created by introducing an object category that had not been presented in the preceding study phase. Within category mismatches were always created by swapping object subcategories within a major category that had been presented in the preceding study phase (see Figure 1g, left).

Each list contained 72 trials per condition. Across lists, each test object and each scene appeared in each of the four experimental conditions an equal number of times. In fact, identical test object–scene pairs were used across the first 3 experimental conditions; only in the fourth experimental condition (between category mismatch) was it necessary to shuffle the specific pairings of test object and scene. Trial order within each block was pseudo-randomized, such that no more than two trials corresponding to each condition were presented in a row, and no more than 3 trials mapping onto a “same” response (i.e., trials in the exact match or distortion conditions) occurred in a row at test. Response hand was counterbalanced across lists.
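For concreteness, these ordering constraints can be implemented by rejection sampling over candidate permutations, as in the R sketch below (a hypothetical reconstruction for illustration; the original list-generation code is not reproduced here):

set.seed(1)
conditions <- rep(c("match", "distortion", "within", "between"), each = 4)

longest_run <- function(x) max(rle(x)$lengths)
longest_true_run <- function(x) {
  r <- rle(x)
  if (any(r$values)) max(r$lengths[r$values]) else 0
}

repeat {
  trial_order <- sample(conditions)
  # "same" responses correspond to exact match and distortion trials
  same_response <- trial_order %in% c("match", "distortion")
  # constraints: at most 2 identical conditions in a row,
  # and at most 3 "same"-response trials in a row
  if (longest_run(trial_order) <= 2 &&
      longest_true_run(same_response) <= 3) break
}
trial_order  # one admissible 16-trial test order

Rejection sampling is adequate here because the constraints are loose relative to the number of admissible orders, so few candidate permutations are discarded.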

2.1.3 |. Procedure

Participants passively studied the paired scenes and novel objects and then were tested by being asked to indicate, for each in a new set of pairs, whether the object matched the presented scene. Study and test phases were organized into 18 study-test blocks, between which the participant was encouraged to take a break. All breaks were self-paced.

In each study phase, 16 scene–object pairs were presented, each beginning with a white fixation cross on a black background presented for 350–550 ms (duration jittered to reduce the impact of anticipatory slow potentials on the timelocked waveform; the fixation cross remained on screen for the remainder of the trial). Next, the scene alone was presented centrally for 2500 ms on a black background. Right after the scene appeared, participants were allowed to move their eyes to take in the scene; however, 1800 ms into scene presentation, the fixation cross brightened to indicate that the participant should fixate in the center of the screen in preparation for object presentation. 700 ms later (2500 ms after scene onset), a white square containing the object appeared in the center of the screen, super-imposed on top of the scene, for 2500 ms. A screen with the word “***BLINK***” was then displayed for 2000 ms (preceded and followed by 50 ms of blank screen), in order to encourage the participant to blink between trials (Figure 2).

FIGURE 2. Experiment 1 procedure.

In the test phase immediately following each study phase, 16 scene–object pairs, repeating all 16 scenes from the study phase, were displayed. Participants were asked not to move their eyes for the entire test trial duration. Similar to the study trials, each scene was first displayed by itself for 200 ms on a black background, followed by 2500 ms during which the test object was displayed centrally on top of the scene, again embedded in a white square. Participants were asked to wait until the object–scene pair was replaced by a question mark in the center of the screen to respond; the question mark remained on screen until a response was made. Participants had 3 response options: the object on this scene is (1) the same object that was studied with this scene, (2) not the same object, but “could have gone with” this scene, and (3) not the same object and could not have gone with this scene. Participants were told that if the object was only slightly visually distinct from what they remembered studying (e.g., had a different body position or proportions), they should still respond (1). They were also told that an object and scene “could go together” if they believed that pair looked similar to other study items and could hypothetically be presented in an upcoming trial of the experiment, even if they knew that they had not studied it. Participants were never told that there was a structured relationship among the object and scene categories. Participants were explicitly instructed that each test phase only covered materials studied in the immediately preceding study phase and that testing was non-cumulative across blocks. Prior to the main experiment, participants were given a practice block of 4 study and 4 test trials, which used different but qualitatively similar object and scene images to those in the main study.

During recording, participants were seated 100 cm away from the computer in a comfortable chair. The visual angle of the scenes was 13.6° by 9.3° and that of the object images was 4.6° by 4.2°. The recording session lasted approximately 90 min. Afterwards, for a subset of the objects (one prototype image from each of the 6 major categories of germs and machines), participants indicated which scene categories the object had been associated with during the experiment by circling one or more of 6 scene category labels.

2.1.4 |. EEG data acquisition and preprocessing

The electroencephalogram (EEG) was recorded from 26 silver/silver-chloride electrodes evenly spaced over the scalp. The sites are midline prefrontal (MiPf), left and right medial prefrontal (LMPf and RMPf), left and right lateral prefrontal (LLPf and RLPf), left and right medial frontal (LMFr and RMFr), left and right mediolateral frontal (LDFr and RDFr), left and right lateral frontal (LLFr and RLFr), midline central (MiCe), left and right medial central (LMCe and RMCe), left and right mediolateral central (LDCe and RDCe), midline parietal (MiPa), left and right mediolateral parietal (LDPa and RDPa), left and right lateral temporal (LLTe and RLTe), midline occipital (MiOc), left and right medial occipital (LMOc and RMOc), and left and right lateral occipital (LLOc and RLOc). The midline central (MiCe) electrode was placed where the “Cz” electrode would appear using the international 10–20 system. Eye movements were monitored via a bipolar montage of electrodes on the outer canthus of each eye. Blinks were detected by an electrode below the left eye. Impedances were kept below 5 kΩ. Signals were amplified with a 0.02–250 Hz bandpass using a BrainVision amplifier and digitized at 1000 Hz. Data were referenced online to the left mastoid and rereferenced offline to the average of the left and right mastoids. Each trial consisted of a 1000 ms epoch preceded by a 200 ms prestimulus baseline. Trials contaminated by eye movements, blinks or other recording artifacts were rejected offline. Artifact rejection procedures using subject-specific threshold parameters resulted in average trial loss of 22.2% for the exact match condition, 21.9% for the distortion condition, 23.1% for the within-category mismatch condition, and 21.1% for the between category mismatch condition. A digital lowpass Butterworth IIR filter with a 30 Hz half-amplitude cut-off and 12 dB/octave roll-off was applied prior to statistical analysis. Prior to permutation-based cluster analysis, data were further down-sampled to 100 Hz.
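For concreteness, the digital filtering and down-sampling steps can be sketched in R with the signal package, as below. This is a minimal sketch assuming a single-channel epoch; the lab's actual filter implementation, including how the half-amplitude cutoff is realized, may differ.

library(signal)

fs <- 1000                        # sampling rate (Hz)
bf <- butter(2, 30 / (fs / 2))    # 2nd-order (~12 dB/octave) Butterworth
                                  # lowpass with cutoff near 30 Hz

eeg <- rnorm(1200)                # placeholder epoch: -200 to 1000 ms at 1000 Hz
eeg_filt <- filtfilt(bf, eeg)     # zero-phase application of the lowpass
eeg_100 <- eeg_filt[seq(1, length(eeg_filt), by = 10)]  # to 100 Hz, as used
                                                        # for cluster analysis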

2.1.5 |. Analysis

EEG dependent measures

Dependent measures derived from the EEG consisted of mean amplitudes over time windows and electrode sites selected to capture particular components of interest and are identical to Smith and Federmeier (2020) to maximize comparability. Dependent measures of interest are mean amplitudes at: 250–349 ms over fronto-central sites to capture the N300, 350–499 ms over centro-parietal sites for the N400, and 500–699 ms and 700–899 ms over posterior sites to capture early and late time windows of the LPC. Because the N300 was the primary measure of interest, and because its distribution has been variably characterized over the literature, we took the conservative approach of measuring effects at all fronto-central sites (16 total); the N400 and LPC were characterized at 8 sites each, focused around each component’s typical distribution and reducing topographic overlap with other components. See Figure 3 for precise electrode selection for each dependent measure.
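As a minimal sketch, single-trial mean amplitudes of this kind can be computed in R from an epochs array (trials × channels × samples); the array contents, the 100 Hz time base, and the abbreviated site list below are placeholders for illustration only:

set.seed(1)
chans <- c("MiPf", "LMPf", "RMPf", "LMFr", "RMFr", "MiCe")
epochs <- array(rnorm(288 * length(chans) * 120),
                dim = c(288, length(chans), 120),
                dimnames = list(NULL, chans, NULL))
time_ms <- seq(-200, 990, by = 10)       # sample times at 100 Hz

n300_window <- which(time_ms >= 250 & time_ms < 350)
n300_sites <- c("MiPf", "LMPf", "RMPf", "LMFr", "RMFr")  # subset of the 16 sites

# mean amplitude in the N300 window, per trial and channel
n300_amp <- apply(epochs[, n300_sites, n300_window], c(1, 2), mean)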

FIGURE 3. Timing and electrode sites of ERP dependent measures of interest. Reproduced with permission from Smith and Federmeier (2020).

Predictors of interest

Key within-subjects predictors of EEG amplitude included match condition and visual similarity to the target object. In a between-subjects analysis combining the current data set with that in Smith and Federmeier (2020), we assessed interactions between scene preview time at test (200 vs. 2500 ms), and the effects of match condition and visual similarity to target. We also conducted an exploratory time-domain cluster analysis examining effects of match condition in the current experiment, the results of which are presented in Figure S1.

Visual distance computation

For our analysis of visual similarity effects, we computed visual distance between the observed and predicted object image as follows. First, an initial set of V1-like features was generated for each object image using the model in Pinto et al. (2008). This model (code available at https://github.com/npinto/v1s-0.0.4_scene) generates visual features by filtering the input image using a series of oriented Gabor filters with varying spatial frequencies. Object images were resized to 150 × 150 pixels using the imagemagick (https://imagemagick.org/) resize function prior to feature generation. We followed the standard Pinto et al. (2008) model pipeline, as follows. A 3 × 3 pixel low pass box filter was applied prior to feature extraction. Two-dimensional Gabor filters were generated using 6 different spatial frequencies (1/2, 1/3, 1/4, 1/6, 1/11, 1/18 cycles/pixel) crossed with 16 different orientations equally spaced between 0 and 15/16 π, for a total of 96 filters. Filters had a zero-mean and Euclidean norm of one and used a fixed Gaussian envelope (standard deviation of 9 cycles/pixel in both directions) and fixed phase (0). Filter kernels were set to a fixed size of 43 × 43 pixels. After convolving the Gabor filter kernels with the preprocessed input image, the output image was thresholded by setting negative feature values to 0 and values greater than 1 to 1. Features were then normalized by taking each 3 × 3 pixel area in the filtered image and subtracting the mean value across all filters and pixels corresponding to that 3 × 3 pixel area, and then dividing by the norm of these same values (except when the norm was less than 1). Finally, the filtered images were low-pass filtered again using a 17 × 17 pixel boxcar filter, and then down-sampled to 30 × 30 pixels. This procedure, which used the same parameter settings as Pinto et al. (2008), resulted in 30 × 30 × 96 = 86,400 features per input image. We then reduced the dimensionality of the feature space. First, features with variance close to zero were removed to avoid numeric issues while scaling, and feature values were mean-centered and scaled to unit variance. Next, PCA was applied to reduce the dimensionality of the feature space to 766 while maintaining an explained variance ratio of 0.999. Visual distance was then defined as the Euclidean distance between the presented object image and the target (studied) object image in this feature space, for each trial. Visual distance was grand mean centered prior to model fitting.
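The dimensionality-reduction and distance steps can be summarized in R as follows. This sketch assumes the 86,400 V1-like features per image have already been extracted with the Pinto et al. (2008) pipeline; a small random placeholder matrix is used so the example runs standalone:

set.seed(1)
features <- matrix(rnorm(50 * 2000), nrow = 50)  # placeholder: images x features

keep <- apply(features, 2, var) > 1e-8   # drop features with near-zero variance
X <- scale(features[, keep])             # mean-center and scale to unit variance

pc <- prcomp(X, center = FALSE, scale. = FALSE)
expl <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
k <- which(expl >= 0.999)[1]             # retain 99.9% explained variance
scores <- pc$x[, 1:k]

# Euclidean visual distance between presented image i and studied image j
visual_distance <- function(i, j) sqrt(sum((scores[i, ] - scores[j, ])^2))
visual_distance(1, 2)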

Statistical analysis of behavioral data

Behavioral accuracy in the test phase was assessed by collapsing across the Match and Distortion conditions and collapsing across the Within Category Mismatch and Between Category Mismatch conditions. Responding “match” to a Match or Distortion condition trial and “possible mismatch” or “impossible mismatch” to a mismatch trial was considered correct. Accuracy was compared across scene preview time conditions using a Welch two-sample t-test. Behavioral analyses of response distributions were conducted using logistic regression modeling in R, as sketched below. A logistic regression model predicting the probability of response “impossible mismatch” was fit with a fixed effect of condition (exact match or distortion vs. within category mismatch vs. between category mismatch), subject random intercepts, and by-subjects random slopes of match condition, which, due to convergence issues, only contrasted the between category mismatch condition versus all other match conditions. Nested model comparisons were used to assess response sensitivity to mismatch type.
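The structure of this model can be sketched with lme4 in R as below, using simulated stand-in data; variable names and coding here are illustrative rather than the authors' exact code:

library(lme4)
set.seed(1)

df <- data.frame(subject = factor(rep(1:24, each = 144)))
df$condition <- factor(sample(c("match_dist", "within", "between"),
                              nrow(df), replace = TRUE))
df$between_vs_other <- as.numeric(df$condition == "between")
df$resp_impossible <- rbinom(nrow(df), 1,
                             plogis(-0.9 + 2.6 * df$between_vs_other))

# the by-subjects slope contrasts only between category mismatch vs. all
# other conditions, mirroring the simplification described above
full    <- glmer(resp_impossible ~ condition +
                   (1 + between_vs_other | subject),
                 data = df, family = binomial)
reduced <- glmer(resp_impossible ~ 1 +
                   (1 + between_vs_other | subject),
                 data = df, family = binomial)
anova(reduced, full)  # nested comparison: sensitivity to mismatch type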

Post-test categorization performance was also assessed using logistic regression. A logistic regression model was fit predicting the probability of indicating that a scene category was associated with a presented object, with a fixed effect of ground-truth match to the depicted object, crossed random intercepts for subject and scene (response choice), and a by-subjects random effect of ground-truth match to the depicted object. Nested model comparisons were used to assess the main effect of match. Additional predictors of scene preview duration and its interaction with ground-truth match were added, and nested model comparisons were used to assess whether sensitivity to match interacted with scene preview duration.

Statistical analysis of EEG data.

We followed the component-based analysis approach in Smith and Federmeier (2020), important aspects of which are reproduced below. We also extended the analysis to directly compare the current dataset with that in Smith and Federmeier (2020). Only behaviorally correct and artifact free trials were included. Linear mixed effects models were fit to the individual trial data, including fixed effects of:

  1. Condition (contrasting match, distortion, within, and between category conditions)

  2. Response type (for within and between category mismatch conditions only, since only one response type was considered correct for match and distortion conditions); condition (within vs. between) was moderately associated with response (“possible” vs. “impossible” mismatch) in the current experiment; Cramér’s V = 0.43.

  3. The interaction between condition and response

Models also included crossed random intercepts of subject, item (object + scene), and channel, and by subjects random slopes of condition, response and their interaction. We also compared this model to one with an additional fixed effect, which contrasted between category mismatch swap trials (generated by swapping the condition of object images that had been presented in the immediately preceding study phase) and between category mismatch new trials (generated by presenting an object image that had not been presented in the immediately preceding study phase). The equation in R notation for the full model for single trial ERP amplitude, except where otherwise noted, is thus:

single_trial_ERP_amplitude ~
  match_condition*object_recency
  + match_condition*response
  + (1 + match_condition*response | subject)
  + (1 | object_scene_pair)
  + (1 | channel)

Here, match_condition is a factor with 4 levels: Match, Distortion, Within Category Mismatch, and Between Category Mismatch. For this analysis, when examining the N300 time-window only, random slopes combined the Match and Distortion conditions into a single factor level due to convergence issues. The object recency variable could only differ in the Between Category Mismatch condition and was valued at 1 if the object was presented in the immediately preceding study phase, and 0 otherwise (this predictor was removed when examining effects of condition collapsing across recency, and when recency was found not to improve model fit). The response variable could only differ within the two mismatch conditions, given that only behaviorally correct trials were included in the analysis, and denoted the difference between “possible mismatch” and “impossible mismatch” responses to these trials. Due to convergence issues, we were not able to fit the maximally complex model:

single_trial_ERP_amplitude ~
  match_condition*object_recency*response
  + (1 + match_condition*object_recency*response | subject)
  + (1 | object_scene_pair)
  + (1 | channel)

Specifically, we did not include a by-subjects random slope of object recency or a fixed effect for the interaction between response and object recency (which would only impact model estimates for Between Category Mismatch trials). For the N300 time window, we also excluded a by-subjects random slope of Distortion Condition (vs. Match). Including any of these variables led to a failure to converge.

For assessing effects of condition, response, and object recency within Experiment 1, models were fit to only the data collected in Experiment 1. For between-subjects comparisons of preview time (incorporating data from Smith & Federmeier, 2020), a new set of models was fit to the combined 200 ms scene preview and 2500 ms scene preview data. These new models had an identical model structure to that above, except that they excluded effects of object recency and included a fixed effect of scene preview duration and its interactions with match and response:

single_trial_ERP_amplitude ~
  match_condition*response
  + preview_duration*match_condition*response
  + (1 + match_condition*response | subject)
  + (1 | object_scene_pair)
  + (1 | channel)

Subject was defined by participant identity and not by subject list. The same random effects structure was used for all time windows in this analysis.

In a separate set of analyses, we added an additional fixed effect to the models used above: visual distance from the target object. Models did not distinguish between the two types of between category violations (new vs. swap). In our between-subjects analysis, fixed effects of both visual distance to target and its interaction with preview duration were added to the model.

All models were fit using maximum likelihood estimation. Fixed effects were initially tested using likelihood ratio tests with nested model comparisons. The reduced model removed the effect of interest from the full model as selectively as possible, and model fit to the data was compared between the reduced and full models, accounting for changes to overall model complexity, using a chi-square difference test computed with the anova function in the R stats package. For example, when testing for an interaction between scene preview duration and match condition, only that interaction term was removed, but the main effect of scene preview duration and all interaction terms between scene preview duration and other variables were still included in the reduced model. For analyses of the data from Experiment 1 alone, follow-up comparisons of condition means were conducted using the contest function in the lmerTest package in R, with family-wise error rate corrected p-values and the Satterthwaite approximation for the degrees of freedom. Since there was little observed sensitivity to response type, pairwise condition contrasts collapse across response type using linear combinations of beta weights. That is, although the model included a fixed effect of response type, variability in response type was allowed to impact estimates of pairwise differences among the condition means. However, when comparing across Experiment 1 and the dataset in Smith and Federmeier (2020), beta weights for condition and its interactions were assessed without collapsing across response type, nested model comparisons were used to assess fixed effects, and p-values for follow-up comparisons to better understand trends in the data remain uncorrected for multiple comparisons. The distortion condition was included in all models, but, as in Smith and Federmeier (2020), never differed reliably from the match condition. For ease of reporting, we thus describe only contrasts among the exact match and mismatch conditions.
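For illustration, the sketch below runs a deliberately simplified version of this pipeline in R on simulated stand-in data: a likelihood-ratio comparison of nested models, followed by a pairwise contrast computed with lmerTest's contest function using the Satterthwaite approximation. The full by-subjects slope structure described above is omitted for brevity:

library(lmerTest)
set.seed(1)

trials <- expand.grid(subject = factor(1:12), item = factor(1:48))
trials$match_condition <- factor(sample(c("match", "distortion", "within",
                                          "between"),
                                        nrow(trials), replace = TRUE),
                                 levels = c("match", "distortion",
                                            "within", "between"))
trials$amplitude <- rnorm(nrow(trials)) -
  0.5 * (trials$match_condition == "between")   # simulated mismatch negativity

full    <- lmer(amplitude ~ match_condition + (1 | subject) + (1 | item),
                data = trials, REML = FALSE)    # maximum likelihood, as in text
reduced <- lmer(amplitude ~ 1 + (1 | subject) + (1 | item),
                data = trials, REML = FALSE)
anova(reduced, full)   # chi-square difference test for the condition effect

# follow-up contrast (between category mismatch vs. match), Satterthwaite df
L <- setNames(rep(0, length(fixef(full))), names(fixef(full)))
L["match_conditionbetween"] <- 1
contest(full, L)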

Sample size and multiple comparisons correction.

Sample size for Experiment 1 was identical to Smith and Federmeier (2020) and was originally determined with reference to Hannula et al. (2006). We have refrained from correcting for multiple comparisons across the four separate component-based time-windows. We include statistics on the later time-windows, but caution that the presence of any earlier effects may impact the interpretability of later components. Reported p values are uncorrected for multiple comparisons except where stated explicitly. When assessing pairwise condition contrasts within an ERP time-window in our initial within-subjects analysis of the Experiment 1 data, we controlled for the family-wise error rate by applying Bonferroni-Holm correction using the p.adjust function in the R stats package. Two families of contrasts were considered for correction: one family of all 6 possible pairwise contrasts involving the four main match conditions, and a separate family of all 7 pairwise match condition contrasts that involved breaking down the between category mismatch condition into “new” and “swap” subconditions. Multiple comparisons were corrected for in our exploratory cluster analysis reported in the supplementary information using permutation testing. Cluster significance was computed by comparing the sum of the t values within each cluster to the distribution of the maximum sum of t values cluster score over a random permutation baseline, α = .025 (equivalent to a two-sided test with α = .05, since both positive and negative clusters were computed), n = 2000 repetitions.
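The cluster-mass permutation logic can be sketched for a single channel as follows; this is a toy illustration using sign-flip permutations on simulated difference waves, not the exact implementation used here:

set.seed(1)
n_sub <- 20; n_time <- 120
diffs <- matrix(rnorm(n_sub * n_time), n_sub, n_time)  # subject x time
                                                       # condition differences

tvals <- function(d) apply(d, 2, function(x) t.test(x)$statistic)

cluster_masses <- function(tv, thresh = qt(.975, n_sub - 1)) {
  above <- tv > thresh          # positive clusters; repeat with tv < -thresh
  r <- rle(above)               # for negative clusters
  ends <- cumsum(r$lengths); starts <- ends - r$lengths + 1
  mapply(function(s, e) sum(tv[s:e]), starts[r$values], ends[r$values])
}

obs <- cluster_masses(tvals(diffs))
perm_max <- replicate(2000, {
  flips <- sample(c(-1, 1), n_sub, replace = TRUE)  # flip each subject's sign
  m <- cluster_masses(tvals(diffs * flips))
  if (length(m)) max(m) else 0
})
p_vals <- sapply(obs, function(m) mean(perm_max >= m))  # compare to alpha = .025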

2.2 |. Results

2.2.1 |. Behavioral: Online accuracy

As in Smith and Federmeier (2020), participants discriminated well among test objects that matched and mismatched the presented scene. Accuracy was computed after collapsing the exact match and distortion conditions (which were treated as a “match”) and the two mismatch conditions along with the two different mismatch responses (“possible” and “impossible”). Mean accuracy was 80.3%, range 59.7%–93.4%. Comparing the current dataset to the dataset in Smith and Federmeier (2020), accuracy did not differ from that obtained when scene preview was 2500 ms (Welch two-sample t-test, t = .45, df = 45.6, p = .67). Participants were also sensitive to the type of mismatch. A logistic regression model predicting the probability of response “impossible mismatch” with a fixed effect of condition (exact match or distortion vs. within category mismatch vs. between category mismatch), using subject random intercepts, by-subjects random effects of match condition, and nested model comparisons, confirmed sensitivity to mismatch type: intercept = −0.938, β = 2.57, SE = .324, z = 7.934, χ²(1) = 30.92, p < .001. Figure 4a shows the mean response distribution across subjects for each condition. The distribution of responses was comparable to that seen when scene preview was 2500 ms (Smith & Federmeier, 2020).

FIGURE 4. Experiment 1 behavioral results. (a) Proportion of responses by condition in the online memory task. Participants were reliably sensitive to condition, and responded to distortions similarly to the exact match condition, as instructed. (b) Confusion matrix of scene–object category associations indicated at post-test. Scenes circled by participants were the associated scene category for the displayed object 82%–92% of the time, demonstrating explicit knowledge of the scene–object category mapping.

Participants were numerically more likely to respond to between category mismatches as being impossible, compared to Smith and Federmeier (2020; β = −0.76, SE = .405, z = −1.87, χ²(1) = 3.40, p < .1). There was no difference in probability of responding “match” to the exact match and distortion conditions across the current experiment and Smith and Federmeier (2020; due to convergence issues, a simplified random effects structure was used that removed the by-subjects random slope for between category mismatch vs. other match conditions; χ²(1) < 1).

2.2.2 |. Behavioral: Posttest categorization

On the posttest, participants demonstrated explicit knowledge of the categorical mapping among objects and scenes. Participants were more likely to indicate that objects were associated with a scene category when the two had been paired at study. Figure 4b shows the normalized confusion matrix indicating the probability that a scene category, if circled, belonged to the correct scene category for the depicted object. This was assessed with a logistic regression model predicting the probability of circling a scene category, with a fixed effect of match, crossed random intercepts for subject and scene (response choice), a by-subjects random effect of match, and nested model comparisons: intercept = −4.79, β = 8.16, SE = .869, z = 9.390, χ²(1) = 43.87, p < .001. When results were compared with Smith and Federmeier (2020), there were no significant interactions with experiment (probability of circling a scene type was similar for both matching and mismatching scene types across experiments, |z|’s < 1).

2.2.3 |. ERP analysis: Match condition

N300

Figure 5 shows ERP responses to the three conditions. To examine the N300, voltages were separately averaged across time for each trial from 250 to 349 ms at each of 16 frontal and central sites. There was a main effect of condition (χ²(3) = 12.95, p < .01), which remained when recency was accounted for by including between category mismatch swap versus new as a fixed effect (χ²(3) = 9.33, p < .05; due to convergence issues, the random slope for the main effect of response needed to be dropped for this test). Accounting for recency improved model fit (χ²(1) = 9.45, p < .01). Follow-up comparisons revealed that between category mismatches were more negative than matches (diff = −1.20 μV, F(1,28.7) = 9.2, p < .05), but within category mismatches were not significantly so (diff = −0.93 μV, F(1,23.0) = 3.1, n.s.). Between and within category mismatches did not differ from each other (diff = −0.27 μV, F < 1). Among between category mismatches, new trials were more negative than swap trials (diff = −1.03 μV, F(1,3240) = 9.2, p < .05). When examined separately, between category mismatch new trials were significantly more negative than matches (diff = −1.71 μV, F(1,41.1) = 16.0, p < .01), but swap trials were not (diff = −0.68 μV, F(1,39.9) = 2.5, n.s.). There was no main effect of response nor interaction between match condition and response among mismatch trials (response: χ²(1) < 1; response × match condition: χ²(1) = 2.53, n.s.). Thus, in the N300 time window there was a more negative response to new items (which were only used in the between category mismatch condition) compared to repeated items. There was a tendency for mismatches to be numerically more negative than matches, as in Smith and Federmeier (2020), but effect sizes were smaller and driven by recency.

FIGURE 5. Experiment 1 match versus mismatch conditions at 12 representative sites (scalp locations indicated at bottom right). An additional 15 Hz low pass filter was applied after averaging for display purposes.

N400

To examine the N400, voltages were separately averaged across time for each trial from 350 to 499 ms at 8 central and parietal sites. There was a main effect of condition (χ²(3) = 14.15, p < .01) that persisted even when recency was accounted for by including between category mismatch swap versus new as a fixed effect (χ²(3) = 9.11, p < .05; accounting for recency improved model fit, χ²(1) = 6.36, p < .05). Follow-up comparisons revealed that between category mismatches were more negative than the match condition (diff = −1.66 μV, F(1,26.7) = 12.0, p < .05), but within category mismatches were not significantly so (diff = −1.03 μV, F(1,17.6) = 4.94, n.s.). Between and within category mismatches did not significantly differ from each other (diff = −0.63 μV, F(1,23.5) = 1.6, n.s.). Among between category mismatches, new trials were more negative than swap trials (diff = −0.99 μV, F(1,3126) = 6.7, p < .05). New between category mismatch trials were also more negative than matches (diff = −2.16 μV, F(1,36.2) = 17.5, p < .01), but swap trials were not significantly more negative than matches (diff = −1.17 μV, F(1,35.2) = 5.1, n.s.). There was no main effect of response nor interaction between match condition and response among mismatch trials (response: χ²(1) < 1; response × match condition: χ²(1) = 1.09, n.s.). Like the preceding N300 time window, effects of match were driven by recency. In Smith and Federmeier (2020), we had also observed effects of both match and recency on the N400, with similar effect sizes.

LPC

To examine the LPC, voltages were separately averaged across time for each trial from 500 to 699 ms (early LPC) and from 700 to 899 ms (late LPC) at each of 8 posterior sites. In the early time window, there was a main effect of condition (χ²(3) = 11.99, p < .01), which persisted even when recency was accounted for by including between category mismatch swap versus new as a fixed effect (χ²(3) = 10.56, p < .05; accounting for recency did not significantly improve model fit, χ²(1) < 1). Both between and within category mismatches were more negative than matches (between: diff = −1.46 μV, F(1,25.9) = 7.9, p < .05; within: diff = −1.59 μV, F(1,19.5) = 11.5, p < .05). Between and within category mismatches did not significantly differ from each other (diff = 0.13 μV, F < 1). There was no main effect of response nor interaction between condition and response among mismatch trials (both χ²(1) < 1). This pattern differs substantively from that seen in Smith and Federmeier (2020), which found only an effect of response type and no effects of match condition (Figure 6).

FIGURE 6.

FIGURE 6

Comparing effects of match condition across scene preview durations: Experiment 1 versus Smith and Federmeier (2020). Difference waves of within category mismatch − match and between category mismatch − match, plotted separately by scene preview duration (200 vs. 2500 ms). An additional 5 Hz low pass filter was applied following averaging for display purposes.

In the late time window, there again was a main effect of match condition (χ²(3) = 10.15, p < .05), which persisted even when recency was accounted for by including between category mismatch swap versus new as a fixed effect (χ²(3) = 9.96, p < .05; accounting for recency did not significantly improve model fit, χ²(1) < 1). Within category mismatches were more negative than matches and between category mismatches (within − match: diff = −1.73 μV, F(1,16.7) = 15.4, p < .01; between − within: diff = 2.21 μV, F(1,23.8) = 20.1, p < .001). Between category mismatches did not significantly differ from matches and were numerically more positive (diff = 0.48 μV, F(1,24.4) = 1.0, n.s.). Smith and Federmeier (2020) also found numerically more positive waveforms elicited by between compared with within category mismatches in this time window. There was no main effect of response nor interaction between condition and response among mismatch trials (response: χ²(1) = 1.52; response × match condition: χ²(1) = 1.92; p’s > .1). Again, this differs from Smith and Federmeier (2020), which showed a continued effect of response type in the late (as in the early) LPC window.

Between-subjects comparison: 200 versus 2500 ms scene preview

To examine interactions with scene preview time, we directly compared the current experiment (200 ms scene preview) with the dataset in Smith and Federmeier (2020; 2500 ms scene preview). See Figure 6.

For the N300, there was a significant interaction between match condition and scene preview time (χ²(3) = 13.98, p < .01). The between category mismatch − exact match contrast was larger given a longer scene preview (β = −1.12, SE = .55, χ²(1) = 3.94, p < .05, uncorrected). Among mismatch trials, there was no significant interaction of scene preview time with response, or with response × match condition (scene preview × response: χ²(1) < 1; scene preview × response × match condition: χ²(1) = 2.04, n.s.).

For the N400, there were no significant interactions with scene preview time (scene preview × match condition: χ²(3) = 2.54; scene preview × response: χ²(1) = 1.90; scene preview × response × match condition: χ²(1) < 1; p’s > .1).

In the early LPC window, there was a significant interaction between match condition and scene preview time (χ²(3) = 11.93, p < .01). Given a long preview time, there was no reliable effect of match condition, with the within and between category mismatch conditions numerically more positive than the match condition. With a short preview, within and between category mismatches were significantly more negative than the match condition. Thus, the [within category mismatch − match] and [between category mismatch − match] contrasts were significantly more positive given a long scene preview (scene preview × (within − match): β = 3.27, SE = 1.03, χ²(1) = 9.04, p < .01; scene preview × (between − match): β = 2.29, SE = .76, χ²(1) = 8.24, p < .01; p-values uncorrected). There was also a significant interaction between scene preview time and response, reflecting that “possible” mismatch responses elicited a more negative waveform than “impossible” responses with long scene previews, but not short scene previews (β = −2.84, SE = 1.34, χ²(1) = 4.34, p < .05).

In the late LPC time window, there was no significant interaction between scene preview time and match condition (χ²(3) = 5.54, n.s.). There was, however, a significant interaction between scene preview time and response, again reflecting that “possible” mismatch responses elicited a more negative waveform than “impossible” responses with long scene previews, but not short scene previews (β = −3.47, SE = 1.54, χ²(1) = 4.88, p < .05).

2.2.4 |. ERP analysis: Target similarity

Following Smith and Federmeier (2020), we also conducted a separate set of analyses in order to assess whether low-level visual features of the scene-congruent object were accessed in memory even when it was not displayed (see Figure 7). Beta values are reported as 1000 times the original estimates.
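For concreteness, a minimal sketch of how such a model could be specified, assuming lme4 and hypothetical variable names (the released analysis code, linked in the Data Availability Statement, is authoritative):

library(lme4)

# Single-trial ERP amplitude as a function of visual distance to the target,
# crossed with experiment (200 vs. 2500 ms preview); random effects follow
# the structure described for the match analyses. Names are hypothetical.
m_sim <- lmer(
  amplitude ~ visual_distance * experiment
    + (1 + visual_distance | subject)
    + (1 | object_scene_pair)
    + (1 | channel),
  data = combined_trials
)

# Slopes on visual_distance are small in native units; as noted above,
# reported betas are the raw estimates multiplied by 1000.
fixef(m_sim)["visual_distance"] * 1000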

FIGURE 7.

FIGURE 7

Experiment 1 ERP averages by visual distance bin. Distance = 0 indicates the Exact Match condition, while higher distances indicate that the presented object was more visually distinct from the target object at test. Only behaviorally correct and artifact-free trials were included. An additional 5 Hz low pass filter was applied prior to plotting. Sites used to generate each plot are indicated at bottom left.

There was a significant interaction between experiment (2500 vs. 200 ms scene preview time) and the effect of visual distance to target, for all components of interest (N300: β = −2.255, SE = .426, χ²(1) = 27.96, p < .001; N400: β = −4.124, SE = .602, χ²(1) = 46.83, p < .001; early LPC: β = −3.926, SE = .599, χ²(1) = 42.94, p < .001; late LPC: β = −3.265, SE = .627, χ²(1) = 27.06, p < .001). Given a 2500 ms scene preview, the waveform was more positive for objects more similar to the target across all four components; given a 200 ms scene preview (the current experiment), the effect was smaller (N300: β = −1.509, SE = .485, χ²(1) = 9.67, p < .01), absent (N400 and late LPC: χ²(1)’s < 1), or significantly reversed in direction (early LPC: β = 1.399, SE = .631, χ²(1) = 4.90, p < .05). See supplementary information for further comparison of the two experiments, including discussion of differences in the correlation structure and a breakdown by response and match condition.

2.3 |. Discussion

With a short preview time, match effects primarily reflected stimulus recency in the N300 and N400 time windows and became more robust and less sensitive to recency during the early LPC. In the late LPC window, within category mismatches were more negative than both matches and between category mismatches. There were no interactions with response on any of the four components.

When directly compared with Smith and Federmeier (2020), 2500 ms scene previews were found to elicit larger N300 mismatch effects at the test object than 200 ms scene previews. N400 match effect size did not differ across short and long preview times. Having a short (200 ms) scene preview also led to the emergence of match effects (of a similar form to those seen for the N400) in the early LPC time window. The combined pattern of smaller mismatch effects at an early point in time, and larger ones at a later point in time, suggests a delay in the timing with which the brain appreciates the match/mismatch distinction when the participant is given less time to process the scene context prior to viewing the test object. In addition, the early and late LPC were sensitive to the participant’s response choice independent of condition only when a long preview time was given. This may partially reflect the tighter correlation between mismatch type and response type in the current experiment, compared with Smith and Federmeier (2020; Cramér’s V = 0.43 vs. 0.27). Alternatively, it is consistent with a delay in response-related processing.

The results of the target similarity analysis corroborate the patterns in the match analysis in suggesting that short preview times provide less evidence for preparatory processing. With a short preview time, effects of target similarity were smaller or absent, suggesting participants were less likely to have activated a representation of the target object in response to the scene.

3 |. EXPERIMENT 2

In Experiment 2, we used a within-subjects parametric design to replicate and extend the findings of Experiment 1, in which we saw that N300 match-mismatch effects are enhanced by giving participants an extended contextual preview prior to viewing the object at test. By treating scene preview duration as a continuous parameter (0–2500 ms), we can (1) estimate how much scene preview time is needed to maximize the contextual benefit and (2) explore whether our findings extend to a case where there is temporal uncertainty about when the object image will appear following the context scene.

3.1 |. Method

3.1.1 |. Participants

Data are reported from 36 participants (mean age 20, range 18–25; 10 males), all native English-speaking University of Illinois undergraduates, who were compensated with payment. Three additional participants were replaced due to excessive trial loss (1) or poor behavioral performance on the online task (2). The criterion for poor behavioral performance was showing no significant sensitivity to match condition (Pearson’s Chi-squared test). All participants provided written informed consent, according to procedures established by the IRB at the University of Illinois. Handedness was assessed using the Edinburgh inventory (Oldfield, 1971). All participants were right-handed; mean score: .78, where 1 denotes strongly right-handed and −1 strongly left-handed. Fifteen reported having left-handed family members. No participants had major exposure to languages other than English prior to the age of 5, and none had a current diagnosis of any neurological or psychiatric disorder or brain damage or was using neuroactive drugs. All reported normal or corrected-to-normal vision for the distances used in the experiment. Participants were randomly assigned to one of 36 experimental lists.

3.1.2 |. Materials

Similar materials were used as in Experiment 1, with the following changes. Thirty-six lists of experimental stimuli (drawing from the same set as in Experiment 1) were generated such that each list consisted of 16 blocks of study/test trials. Each study and test phase contained 18 object–scene pairs. Test trials were divided evenly into three of the four object match conditions from Experiment 1: exact match, within category mismatch, and between category mismatch. Unlike in Experiment 1, all mismatch trials were created by recombining objects and scenes that had been presented in the preceding study phase. The same exemplar object images presented in the study phase were presented at test. Blocks alternated between germ and machine objects. Whether the first block of trials consisted of germs or machines was counterbalanced across participants. In addition, the amount of time scenes were presented prior to object onset at test (scene preview duration) was varied continuously between 0 and 2500 ms, orthogonal to match condition. Test trials were pseudorandomized such that no match condition was ever presented more than 3 times in a row. Furthermore, no object major category or scene type was ever presented more than twice in a row at study or test.
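As one illustration, the run-length constraints above can be verified with a few lines of R (a sketch only; data frame and column names are hypothetical, not from the released code):

# Longest run of identical consecutive values in a trial sequence
max_run <- function(x) max(rle(as.character(x))$lengths)

# No match condition more than 3 times in a row at test
stopifnot(max_run(test_trials$match_condition) <= 3)

# No object major category or scene type more than twice in a row at study or test
stopifnot(max_run(study_trials$object_category) <= 2)
stopifnot(max_run(test_trials$scene_type) <= 2)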

3.1.3 |. Procedure

The task and procedure were similar to Experiment 1, with the following changes. As in Experiment 1, participants were instructed to respond “match” if they felt that the presented object image in the test phase was a distorted version of the image they had studied with that scene in the study phase. And, as in Experiment 1, a distortion trial was included in the practice. However, no distortion trials were presented in the main experiment. Unlike in Experiment 1, participants were instructed to respond as quickly and accurately as possible as soon as the object appeared in each test phase trial. On the posttest, all 36 object prototypes were shown, and participants were asked to match each with one or more of 6 scene type labels (whereas, in Experiment 1, participants responded to only a subset of the objects). Lastly, unlike in Experiment 1, the center of the scene was occluded throughout study and test with a white square of the same dimensions as the object image. This change made the onset of the object image less jarring given that the precise timing of onset was no longer predictable. It also controlled for the possibility that duration of access to centrally presented scene information was driving sensitivity to scene preview duration in our previous experiments (Figure 8).

FIGURE 8.

FIGURE 8

Experiment 2 design and conditions.

3.1.4 |. EEG data acquisition and preprocessing

EEG data acquisition and preprocessing were the same as in Experiment 1. Artifact rejection procedures using subject-specific threshold parameters resulted in average trial loss of 27.0% for the exact match condition, 22.0% for the within category mismatch condition, and 22.4% for the between category mismatch condition.

3.1.5 |. Analysis

Statistical analysis of behavioral data

Because participants were asked to respond immediately, rather than at a delay as in Experiment 1, the impact of match condition and preview time on response time was also assessed. Linear mixed effects models were fit to test whether log response time was predicted as a function of match condition, response, mean centered scene preview duration (in seconds), and their interactions. Random intercepts of item (scene + object) and subject were included in the model, as well as by-subjects random slopes of condition, response, and their interaction. Nested model comparisons were used to test individual fixed effects, in a forward model selection procedure. Only behaviorally correct trials were examined.
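In R, the response-time model and a forward-selection step might look like the following sketch (assuming lme4; variable and data frame names are hypothetical):

library(lme4)

# Log RT as a function of match condition and response, with the random
# effects structure described above; correct trials only
m0 <- lmer(
  log_rt ~ match_condition * response
    + (1 + match_condition * response | subject)
    + (1 | object_scene_pair),
  data = correct_trials
)

# Forward step: add mean-centered preview duration (s) and test it with a
# nested model comparison (anova() refits with ML for the likelihood-ratio test)
m1 <- update(m0, . ~ . + preview_duration_c)
anova(m0, m1)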

Statistical analysis of EEG data

Effects of match condition and preview time on our ERP dependent measures were assessed similarly to Experiment 1. However, scene preview time at test was now treated as a continuous within-subjects predictor. Mixed effects models assessing EEG dependent measures were fit with the following effects structure (in R notation), except where stated otherwise:

single_trial_ERP_amplitude ~ 
match_condition*response*preview_duration
+ (1 + match_condition*response | subject)
+ (1 | object_scene_pair)
+ (1 | channel)

Models included the following fixed effects:

  1. match condition (contrasting match, within, and between category conditions)

  2. response type (for within and between category mismatch conditions only); condition (within vs. between) was moderately associated with response (“possible” vs. “impossible” mismatch); Cramér’s V = 0.22.

  3. the interaction between match condition and response

  4. mean centered scene preview duration as a continuous linear predictor, in seconds

  5. the interaction between match condition and scene preview duration

  6. the interaction between response and scene preview duration

  7. the 3-way interaction between match condition, response, and scene preview duration

The same random effects structure was used as for Experiment 1, with random intercepts of subject, item (scene + object), and channel, and by-subjects random slopes of match condition, response, and match condition × response. Our attempt to include a by-subjects random slope of scene preview duration led to convergence issues.

We corrected for multiple comparisons using the Bonferroni-Holm procedure over the following families of contrasts (a minimal sketch of the correction follows the list):

  1. The three paired condition contrasts (as for Experiment 1, proportions of different response types were added back in to create response weighted linear combinations when examining overall differences among conditions)

  2. The six paired condition contrasts broken down by response

  3. Four different measures of the effect of scene preview duration for each of the four match versus mismatch contrasts, broken down by response.
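The correction itself is standard; in R, it reduces to p.adjust() applied within each family. The p-values below are placeholders, not values from our analyses:

# Holm correction over one hypothetical family of contrasts
p_family <- c(
  between_vs_match  = .004,
  within_vs_match   = .021,
  between_vs_within = .300
)
p.adjust(p_family, method = "holm")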

3.2 |. Results

3.2.1 |. Behavioral: Online accuracy

Initial behavioral analyses were conducted to screen participants for poor behavioral performance. All but two participants included in the study showed significant sensitivity to the match versus mismatch contrast, collapsing across mismatch type and “possible” versus “impossible” mismatch response (Pearson’s Chi-squared test for 2 × 2 contingency table, p’s < .01). The remaining two participants were sensitive to between category mismatches versus other trials, but responded similarly to matches and within category mismatches, again collapsing across “possible” and “impossible” mismatch responses (Pearson’s Chi-squared test for 3 × 2 contingency table, p’s < .01).

For subsequent online behavioral analyses, trials were excluded for which no response was registered within the first 5000 ms of object onset (37 total trials across the experiment, 0.36% of the data). Mean accuracy was 76.1% (range 58.5%–96.9%). Participants as a group were sensitive to the type of mismatch and were more likely to respond that between category mismatches were “impossible” than that within category mismatches were “impossible” (intercept = −1.742, β = 1.492, SE = .197, z = 7.587, χ²(1) = 34.12, p < .001). Figure 9a shows the mean response distribution across subjects for each condition. The probability of responding correctly to match trials increased with increasing scene preview duration (β = 0.161, SE = .069, z = 2.337, χ²(1) = 5.44, p < .05). However, sensitivity to mismatch type did not improve with increasing scene preview (based on two models respectively predicting the probability of a “possible” or “impossible” mismatch response, examining interactions between scene preview duration and each mismatch condition ID variable, |z|’s < 1).

FIGURE 9.

FIGURE 9

Experiment 2 behavioral results. (a) Proportion of responses by condition in the online memory task. (b) Response time is sensitive to match condition and scene preview duration. (c) Confusion matrix of scene–object category associations indicated at post-test.

3.2.2 |. Behavioral: Response time

Within category and between category mismatches were both responded to more slowly than exact matches (within − exact: β = 0.250, SE = .019, χ²(1) = 62.07, p < .001; between − exact: β = 0.197, SE = .017, χ²(1) = 54.41, p < .001). “Possible” responses did not differ reliably from “impossible” responses (χ²(1) = 1.62, n.s.). Within category mismatches were responded to more slowly than between category mismatches (β = 0.053, F(1,89.9) = 16.56, χ²(1) = 11.54, p < .001).

The more time participants had to preview the context scene prior to test object onset, the faster their response times (β = −0.096, SE = .006, χ²(1) = 256.0, p < .001). This effect was modulated by match condition (χ²(2) = 14.51, p < .001). Longer preview times sped up the confirmation of match trials more than the rejection of mismatch trials (effect of preview time on match trials: β = −0.129, SE = .011; decrease in preview facilitation for mismatches relative to matches: within category, β = 0.041, SE = .014; between category, β = 0.051, SE = .014). Among mismatches, there was no significant further modulation of the preview facilitation effect by whether a “possible” or “impossible” response was given (χ²(1) = 2.00, n.s.). Figure 9b plots response time as a function of scene preview duration and match condition.

3.2.3 |. Behavioral: Posttest categorization

As in Experiment 1, participants showed sensitivity to the association between object and scene types at study. Figure 9c shows the normalized confusion matrix indicating the probability that a circled scene category was the correct scene category for the depicted object. This was again assessed with a logistic regression model predicting the probability of circling a scene category, with a fixed effect of match, crossed random intercepts for subject and scene (response choice), a by-subjects random effect of match, and nested model comparisons. As in Experiment 1, matching scene types were selected more often than mismatching scene types (intercept = −3.45, β = 5.85, SE = .450, z = 12.99, χ²(1) = 61.55, p < .001).
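A sketch of this posttest model in R, assuming lme4 and hypothetical names for the long-format posttest data:

library(lme4)

# Probability of circling a scene label as a function of whether it matched
# the depicted object's studied scene type
post_fit <- glmer(
  circled ~ match
    + (1 + match | subject)
    + (1 | scene_choice),
  family = binomial, data = posttest_long
)

# Nested model comparison for the fixed effect of match
post_null <- update(post_fit, . ~ . - match)
anova(post_null, post_fit)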

3.2.4 |. ERP analysis: Match by preparation time

ERP results associated with match condition as a function of preview time are shown in Figures 10, 11, and 12. All four component models revealed a numeric or significant three-way interaction among match condition, response, and scene preview duration (N300: χ²(2) = 5.57, p < .1; N400: χ²(2) = 89.77, p < .001; early LPC: χ²(2) = 54.11, p < .001; late LPC: χ²(2) = 39.08, p < .001). Therefore, the interaction between scene preview duration and the mismatch − match effect is reported by mismatch type, both collapsing across response types and separately by response type (within vs. between category mismatch type × “possible” vs. “impossible” response). To aid in interpretation, corresponding mismatch − match effect estimates at the mean scene preview duration are also reported. Because changing scene preview duration led to more or less overlap between the scene- and object-elicited ERPs, we do not attempt to interpret the main effect of scene preview duration.
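One way to obtain the condition estimates at the mean preview duration reported below is via estimated marginal means; the following is a sketch assuming the emmeans package and a fitted model m_erp from the formula given in the Analysis section (variable names hypothetical):

library(emmeans)

# Because preview duration is mean-centered, evaluating at 0 gives estimates
# at the mean preview duration (1264 ms), averaged over response type
emm <- emmeans(m_erp, ~ match_condition, at = list(preview_duration = 0))

# Mismatch - match contrasts against the exact match reference level
contrast(emm, method = "trt.vs.ctrl", ref = 1)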

FIGURE 10.

FIGURE 10

Experiment 2 test object match versus mismatch condition ERPs at 12 representative sites (scalp locations indicated at bottom right). An additional 15 Hz low pass filter was applied after averaging for display purposes.

FIGURE 11.

FIGURE 11

Experiment 2 ERP waveforms time-locked to test object onset, plotted separately by match condition and scene preview duration (500 ms bins). ERP waveforms aggregated across 5 central electrode sites, indicated at right. Additional 15 Hz low pass filter applied after averaging for display purposes.

FIGURE 12.

FIGURE 12

Experiment 2 estimated effect size of the N300 mismatch − match effect by scene preview duration bin. Estimates computed for each preview time bin as the unweighted average of four separate mismatch − match effects: within category mismatch − exact match and between category mismatch − exact match, each separately assessed for “impossible” and “possible” responses. Modeled over 16 frontal channels in the N300 time window using a linear mixed effects model including 5 × 500 ms scene preview duration bins. Error bars show 2 × approximate standard error.

N300

At the mean scene preview duration (1264 ms), between category mismatches were more negative than exact matches (diff = −1.27 μV, F(1,47.2) = 7.0, p < .05), to a similar degree across response types (interaction of the between category mismatch − match effect with response type: F < 1). Within category mismatches did not differ significantly from between category mismatches (between − within: diff = −0.41 μV, F < 1) and were numerically more negative than matches (diff = −0.86 μV, F(1,32.4) = 3.6, n.s.), to a similar degree across response types (interaction of the within category mismatch − match effect with response type: F < 1). The mismatch − match effect was larger (more negative) given a longer preview duration, for both mismatch types and both response types (see Table 1 for a summary).

TABLE 1.

Preview duration modulation of match-mismatch effect by mismatch and response type (behaviorally correct trials only: “possible mismatch” or “impossible mismatch”).

Mismatch type               Response     Preview duration effect (β, μV/s)   df1   df2      F      p
Within category mismatch    Possible     −0.908                              1     94,771   69.9   <.001
Within category mismatch    Impossible   −0.781                              1     90,640   24.1   <.001
Between category mismatch   Possible     −1.061                              1     5704     11.9   <.001
Between category mismatch   Impossible   −1.416                              1     5747     21.0   <.001

The interaction of N300 effect size with scene preview duration is of particular interest in the current study. We took advantage of our treatment of scene preview duration as a continuous parameter spanning 0–2500 ms to obtain an approximate estimate of the amount of contextual pre-exposure time needed to show a reliable mismatch − match effect on the N300, given our design. Thus, we fit a separate mixed effects model predicting N300 amplitude, this time treating preview time as an unordered binned predictor with the following factor levels (in ms): [0, 500], (500, 1000], (1000, 1500], (1500, 2000], (2000, 2500]. For each preview time bin, we then computed the unweighted average of four mismatch − match effects: within category “possible” response, within category “impossible” response, between category “possible” response, between category “impossible” response. By taking the unweighted average, we compensated for any fluctuations in the proportions of mismatch type and response type across preview time bins. This gross measure of the N300 mismatch − match effect became significant (uncorrected for multiple comparisons) at preview durations above approximately 1000 ms and remained so thereafter ([0, 500] ms: diff = −0.14 μV, F < 1; (500, 1000] ms: diff = −0.63 μV, F(1,73.3) = 1.87, n.s.; (1000, 1500] ms: diff = −1.01 μV, F(1,71.7) = 4.97, p < .05; (1500, 2000] ms: diff = −2.06 μV, F(1,69.6) = 20.9, p < .001; (2000, 2500] ms: diff = −1.50 μV, F(1,72.4) = 10.8, p < .01). See Figure 12 for a graphical illustration, and the sketch below for how the binned model can be specified.
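A sketch of the binned model in R (assuming lme4; variable and data frame names are hypothetical):

library(lme4)

# Bin preview duration: [0,500], (500,1000], ..., (2000,2500] ms
trials$preview_bin <- cut(
  trials$preview_ms,
  breaks = c(0, 500, 1000, 1500, 2000, 2500),
  include.lowest = TRUE
)

m_bins <- lmer(
  n300_amplitude ~ match_condition * response * preview_bin
    + (1 + match_condition * response | subject)
    + (1 | object_scene_pair)
    + (1 | channel),
  data = trials
)

# The per-bin unweighted average of the four mismatch - match effects can then
# be computed as a linear combination of the fixed-effect estimates.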

N400

At the mean scene preview duration, mismatches were estimated to be more negative than exact matches (within category mismatch − match: diff = −1.49 μV, F(1,33.4) = 13.8, p < .01; between category mismatch − match: diff = −1.45 μV, F(1,47.5) = 7.4, p < .05). This numeric trend held across response types (“possible” within: diff = −1.08 μV, F(1,32.8) = 5.38, n.s.; “impossible” within: diff = −2.51 μV, F(1,26.7) = 11.5, p < .05; “possible” between: diff = −1.85 μV, F(1,34.3) = 6.77, p < .1; “impossible” between: diff = −1.06 μV, F(1,38.9) = 3.29, n.s.). Within and between category mismatches did not differ significantly for either response type (“possible”: F(1,34.3) = 1.52; “impossible”: F(1,30.9) = 2.80; p’s > .1).

As was true for the N300, the N400 mismatch − match effect was larger (more negative) given a longer preview duration for both mismatch types and both response types (“possible” within category mismatch: β = −0.344 μV/s, F(1, 46172) = 5.23, p < .05; “impossible” within category mismatch: β = −2.585 μV/s, F(1, 44455) = 137.8, p < .001; “possible” between category mismatch: β = −0.938 μV/s, F(1, 5220) = 7.07, p < .05; “impossible” between category mismatch: β = −1.140 μV/s, F(1, 5252) = 10.4, p < .01).

Early LPC

At the mean scene preview duration, between category mismatches were more negative than matches (between category mismatch − match: diff = −1.43 μV, F(1,33.4) = 15.9, p < .01). Within category mismatches were numerically more negative than matches (diff = −0.64 μV, F(1,46.2) = 1.5, n.s.), and there was no reliable difference by mismatch type (between − within: diff = 0.78 μV, F(1,52.8) = 3.5, n.s.). Most combinations of mismatch and response type were numerically more negative than matches (“possible” within: diff = −1.49 μV, F(1,32.3) = 15.7, p < .01; “possible” between: diff = −1.62 μV, F(1,39.4) = 6.64, p < .1; “impossible” within: diff = −1.26 μV, F(1,29.1) = 3.52, n.s.). However, “impossible” between category mismatches were numerically more positive than exact matches (diff = 0.34 μV, F < 1).

Effects did not vary by preview time for items that were given a “possible” response (“possible” within: F(1, 45455) = 1.10; “possible” between: F < 1; p’s > .1). For “impossible” within category mismatches, the mismatch − match effect was more negative given a longer scene preview (β = −1.692 μV/s, F(1, 43048) = 59.2, p < .001), and the same pattern was seen numerically for “impossible” between category mismatches (β = −0.559 μV/s, F(1, 5174) = 2.83, n.s.).

Late LPC

At the mean scene preview duration, within category mismatches were more negative than matches (diff = −0.89 μV, F(1,33.8) = 5.7, p < .05). Between category mismatches were numerically more positive than matches (diff = 0.58 μV, F(1,44.2) = 1.0, n.s.). Broken down by response type, “possible” within category mismatches were more negative than exact matches (diff = −1.24 μV, F(1,33.3) = 10.9, p < .05), but no other condition-by-response combination differed from exact matches (“possible” between: diff = −0.20 μV, F < 1; “impossible” within: diff = −0.01 μV, F < 1; “impossible” between: diff = 1.36 μV, F(1,38.8) = 3.5, n.s.).

“Impossible” mismatch − match contrasts became more negative with increasing scene preview duration, but not “possible” mismatch − match contrasts (“impossible” within: β = −1.009 μV/s, F(1, 45005) = 20.4, p < .001; “impossible” between: β = −0.845 μV/s, F(1, 5187) = 6.27, p < .05; “possible” within: β = −0.090 μV/s, F < 1; “possible” between: β = 0.230 μV/s, F < 1).

3.3 |. Discussion

Earlier ERP measures (targeting the N300 and N400) were robustly sensitive to category-level mismatches between the presented and target object at test. Specifically, mismatching objects elicited a more negative waveform than matching objects at these latencies. Moreover, this sensitivity increased when participants were given more time to process the context scene prior to the onset of the test object, replicating the pattern shown between subjects in Experiment 1. We further estimated that N300 match effects emerged and stabilized after roughly 1000 ms of contextual pre-exposure time given the current design and sample size.

Later ERP measures (targeting the early and late LPC) showed a more complex pattern of sensitivity, reflecting the interaction of stimulus and response-related processing. When participants ultimately decided a mismatch was “impossible” given the context scene, the mismatch − match effect in this interval became more negative the more time they had been given to process the scene in advance. When they ultimately decided the mismatch was “possible,” however, the duration of contextual preview no longer affected the amplitude of the mismatch − match effect.

4 |. GENERAL DISCUSSION

In the current paper, we examined sensitivity to scene preview duration in a scene–object paired associate learning paradigm, with the goal of understanding the temporal dynamics of contextual facilitation of visual processing for newly learned associations and thereby shedding light on the mechanisms underlying those context effects. We found substantial sensitivity to scene preview duration not only in behavior, but also in visual processing, as revealed in the size of congruency effects on the N300 and in the strength of sensitivity (in ERP amplitude) to subtle visual distortions of the presented versus expected object. These results suggest that scenes facilitate visual processing of recently associated objects only at a delay, possibly through MTL-mediated mechanisms. The fact that this pattern is different from that previously documented for well-learned associations, which seem to show more immediate context effects, suggests that well-established and recently acquired associations yield context effects via different mechanisms of memory retrieval. Below, we discuss the major findings and their implications in turn.

4.1 |. Wide-spread effects of scene-preview duration on processing of recently associated visual objects

Across two experiments, shorter scene previews yielded diminished or delayed contextual congruency effects for newly associated visual objects. Behaviorally, shorter scene previews resulted in slower response times and a lower probability of correctly identifying matching objects in Experiment 2. Shorter scene previews also tended to evoke smaller match/mismatch effects on earlier ERP components (the N300), and if anything, larger effects on later components (the LPC), consistent with an overall latency shift in peak match sensitivity. N300 mismatch effects were lower in amplitude with shorter preview times across both experiments, and residual N300 effects at 200 ms scene preview in Experiment 1 were driven by stimulus recency. In contrast, shorter scene previews yielded larger match/mismatch effects in the early LPC window (500–699 ms) in Experiment 1, when responding was not immediate.3 Although response time effects alone imply a disruption of facilitatory processing given shorter scene preview time, the pattern of ERP effects allows us to make inferences about the specific stages of processing that have been affected. Because the N300 has been theoretically linked to visuo-structural processing, attenuation of N300 match/mismatch effects at shorter latencies suggests a disruption of facilitation for higher-level visual processing of the presented object.

However, we go beyond component interpretation to tie our results to visual processing. We also found that a component-neutral EEG index of visual feature priming is attenuated at short scene preview durations. This measure consists of a graded effect on ERP amplitude of the degree of visual similarity between the presented and expected object, using a V1-like feature space. This measure accentuates differences among object exemplars based on their visual properties and is thus distinct from semantic or categorical contrasts. At 200 ms scene preview in Experiment 1, we found evidence only for a small visual similarity effect in the expected direction in the N300 spatio-temporal region of interest. However, we previously detected robust effects across the N300, N400, and LPC ROIs with a 2500 ms scene preview (Smith & Federmeier, 2020). Our component-neutral analysis thus further supports the claim that visual processing facilitation of a contextually congruent object is contingent on cue-target delay for recently learned associates. Across all our measures, this widespread dependence on cue-target delay has several implications for the theoretical interpretation of visual associative facilitation effects, as outlined in the following sections.

4.2 |. Systems consolidation may impact the nature and timing of contextual facilitation effects

Our finding that contextual effects for recently learned associations depend strongly on scene preview duration contrasts with previous studies using well-established scene–object relationships, which show N300 facilitation for contextually congruent objects even with no scene preview (Mudrik et al., 2010, 2014; Truman & Mudrik, 2018). Given that effects for well-established relationships likely arise through the use of cortically consolidated knowledge, we believe this distinction highlights the presence of multiple predictive sources underlying the N300 response, and contextual facilitation effects more generally. Future work should systematically manipulate scene preview duration for cortically consolidated scene–object associations to further explore differences and similarities with our current findings. For example, the presence of N300 facilitation effects at zero delay does not preclude the possibility that N300 facilitation effects may still be enhanced by scene preview. However, there are several reasons to believe that cortically consolidated associative relationships may be coactivated more rapidly than MTL-dependent activity in visual cortex.4

There are suggestions from the literature that when the hippocampus is recruited, feedback to visual areas should be slower than that observed with strictly cortical feedback. Mainly, the MTL may receive feedforward input rapidly, but take time to retrieve associations. Evidence that the MTL receives input rapidly comes from the timing of its visual evoked responses and its connectivity profile. The hippocampus responds rapidly to visual input, including during the N300 time window (Kreiman et al., 2000; Sehatpour et al., 2008), and has extensive connectivity to high-level visual cortex (e.g., perirhinal cortex and parahippocampal cortex via entorhinal cortex) and some connections to lower-level visual cortex, including V1, in humans (Huang et al., 2021; Maller et al., 2019). However, hippocampus-mediated retrieval is theorized to be time-locked to the theta rhythm (roughly 3–10 Hz), such that pattern completion is carried out primarily during the theta peak, when hippocampal subregion CA3 sends strong inputs to subregion CA1 (Hasselmo et al., 2002; Schapiro et al., 2017). This proposed theta-dependence could add an additional delay of up to roughly 150 ms, even if retrieval were completed in a single cycle. As it turns out, some forms of memory retrieval appear to unfold over multiple theta cycles. One recent model of episodic memory retrieval, for example, suggests that sensory inputs reach the MTL within 500 ms but that the hippocampus primarily facilitates cortical memory reinstatement between 500 and 1500 ms post stimulus onset (Staresina & Wimber, 2019). Although implicit visual feedback could be more rapid than explicit memory retrieval, Staresina and Wimber’s timeline could plausibly match the data collected in our current study, inasmuch as N300 match/mismatch effects were only reliable after 500 ms, and response time facilitation appeared to asymptote at roughly 1500 ms scene preview.
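As a rough check on that figure: if pattern completion must await the next theta peak, the worst-case wait is about one full cycle, T = 1/f; at the middle of the cited 3–10 Hz band (f ≈ 7 Hz), T = 1/7 s ≈ 143 ms, in line with the roughly 150 ms estimate above.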

Conversely, there are reasons to believe visual associative knowledge may be brought online more rapidly once it is cortically consolidated. High-level cortical areas may receive feed-forward visual input through direct cortical and cortico-thalamic connections and can rapidly begin sending feedback signals upon activation. Bar and colleagues have suggested that visual processing may be influenced by prefrontal responses to visual inputs, which peak as early as 130 ms post stimulus onset (Bar et al., 2006), and prefrontal areas are known to play a critical role in cortical consolidation of associative knowledge (see, e.g., van Kesteren et al., 2012). Within high-level visual cortex, Brandman and Peelen (2017) used findings from fMRI and MEG to estimate that scene-based facilitation of object processing is dependent on scene-selective cortex and peaks within the N300 time window. This is, of course, in keeping with broader empirical and theoretical motivations for believing that high-level cortical feedback drives the N300 response itself in paradigms relying on familiar objects and well-learned contextual associations. Huang et al. (2021) have also uncovered stronger-than-expected white matter connectivity among inferotemporal cortex, perirhinal cortex, and parahippocampal cortex, suggesting direct cortical pathways for rapid object–scene interactivity. Thus, multiple convergent sources of evidence suggest that high-level cortical feedback can rapidly shape visual processing. Moreover, with cortical consolidation, to the extent that low-level visual statistical associations are also learned, these could be encoded directly within low-level visual cortex, further speeding processing. Future work should explore whether the effects we observed for novel associations show reduced sensitivity to cue-target delay 1 week after learning, which would allow time for cortical consolidation. Such a finding would further strengthen the claim that inferred differences in contextual facilitation effects across the literature reflect different stages of systems consolidation.

The relative importance of specific MTL structures in visual statistical learning appears to vary by stimulus type and may also vary across single- versus multi-session studies. This suggests possible limitations on the generalizability of our findings. For example, the human hippocampus does not encode Gabor patches predicted by an auditory stimulus, although it does encode predicted shape information (Kok et al., 2020). Also, lesion studies in monkeys and rodents suggest that hippocampal damage does not substantially impair object–object (and abstract shape–shape) visual paired associate learning and memory (Murray et al., 1993; Winters et al., 2010), but perirhinal lesions do. It is thus possible that hippocampal encoding of predicted shape information in humans does not reflect a direct causal contribution. Alternatively, it may be that the hippocampus plays a more central role in single-session learning, given that most animal studies require extensive training over multiple sessions. In any case, it is possible that the temporal dynamics uncovered here might not generalize from scene–object pairs to other stimulus types or to multi-session studies. It is also unclear to what extent associative learning is yoked to particular stimulus features. Our use of novel objects raises the question of generalizability to well-known objects, since participants in our study learned the visual properties and category structure of a set of novel objects at the same time as the object–scene associations. Future work should confirm that our findings generalize to well-known objects paired with previously unassociated scenes.

4.3 |. Dissociable activation time-courses for scenes and objects suggest non-unitized representations drive cortical pyramidal activity for recently learned associates

“Unitized” associative representations have been found in both perirhinal cortex (Fujimichi et al., 2010) and the hippocampus (Stachenfeld et al., 2017). A unitized representation is inferred to exist when a single neuron fires at an elevated rate in response to both a stimulus and its associate at a similar delay. However, we use the term more broadly to include distributed representations encoded at the population level that contain overlapping information about two different associated stimuli at the same point in time (similar to the distributed modeling approach adopted in Schapiro et al., 2013). That is, rather than moving between two different population states to encode the association dynamically, the association is directly encoded within a single state. Thus, after associative learning, a cue and target stimulus come to elicit population responses that resemble one another more than they did before learning through a subset of shared activity patterns. Also, there is a directional bias in the convergence from cue to target; that is, over the course of learning, cues come to elicit responses that resemble pre-learning targets, but target representations remain comparatively stable. The primary difference between a unitized representation account and a predictive preactivation account is thus specification of timing. The unitized representation account specifies that the prediction is now part of the cue representation itself, and, as such, part of the target representation becomes active at the same time as the cue representation. Of course, only part of the predicted target representation is automatically activated; it is thus possible that additional processes may subsequently activate the predicted target representation in full.
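To make the population-level version of this idea concrete, the toy simulation below (ours, purely illustrative, not an analysis from this study) shows how cue–target pattern similarity rises after learning when the cue pattern absorbs a fraction of the target pattern while the target itself stays stable:

set.seed(1)
n_units <- 200

target_pre <- rnorm(n_units)   # target population pattern, pre-learning
cue_pre    <- rnorm(n_units)   # cue population pattern, pre-learning

alpha       <- 0.4                                  # fraction of target absorbed into cue
cue_post    <- (1 - alpha) * cue_pre + alpha * target_pre
target_post <- target_pre                           # target stays comparatively stable

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(cue_pre, target_pre)    # near 0: unrelated before learning
cosine(cue_post, target_post)  # clearly positive: cue now carries target features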

Work on unitized representations has raised the question of whether scene–object associations are encoded as a unitized representation in MTL or high-level association cortex, and whether such a representation could primarily drive visual processing facilitation of objects by scenes. If the observed facilitation effects on the N300 were driven entirely by unitized representations in high level visual cortex and/or the MTL, then we would not have expected to see the degree of sensitivity to scene preview duration that we did. This is under the assumption that core recognition and visual and semantic memory processes for scenes are completed within roughly 500 ms, and that access to a unitized representation that can influence N300 generators (believed to include lateral occipital cortex) would be completed within that time. Note that it remains likely that at some point in time after viewing a scene, the MTL or another region does encode recently learned scene–object associations. However, this associative knowledge is not always brought online in time to drive activity in the neural generators of the N300. From this, we can infer that knowledge of an associative link between a scene and an as-yet-unseen object is unlikely to influence basic visuostructural processing of the scene, as might be expected in a successor representation style predictive model.

In lieu of a unitized representational account, we propose that predictive processing unfolds gradually over time in the form of ramping/maintenance activity that encodes features of the predicted object. This would seem consistent with a similar conclusion reached by Sherman et al. (2022) for scene–scene pairs. Sherman et al. recorded local field potentials from visually responsive cortex in human participants as they passively viewed statistically associated scenes. They then showed each scene one at a time in a post-exposure task that was used to train a scene classifier. The classifier could identify scene type for the predicted, but not yet presented, scene above chance from 268 to 534 ms5 post cue onset. Noting that the currently presented scene was decodable earlier in time than the predicted but as yet unpresented scene, Sherman and colleagues suggested a mechanism of predictive preactivation relying on non-unitized representations. However, Sherman et al. trained their classifier on non-contiguous timepoints selected for being maximally informative over a broad time window (roughly 130–530 ms) while participants viewed individual scenes following statistical learning.6 This makes it difficult to determine the extent to which different types of scene information, or early feedforward versus later feedback processes, may have contributed to classifier performance. We think our findings add to a converging picture that predictive pre-activation with anticipatory activity encoding the predicted target (dissociable from activity encoding the predictive cue) is a common pattern in high-level visual prediction for recently learned associations.

4.4 |. Manipulations of cue-target delay are informative and underutilized

We hope our work also illustrates the empirical value of employing systematic manipulations of cue-target delay to inform theoretical interpretations of contextual facilitation effects. Manipulations of cue-target delay can be combined with multiple methods in cognitive neuroscience (eyetracking, EEG, fMRI, etc.). We feel they have been underused given the high level of interest in predictive processing and contextual facilitation across the field. There are inherent limitations on interpretability when cues and targets are presented at a fixed delay, or when the delay is long and variability in delay is not assumed to impact effect size (as with standard fMRI event-related paradigms). Mainly, it is often unclear to what extent observed facilitation effects are the result of cue presentation, target anticipation, or cue-target integration. This study illustrates how researchers can begin to systematically tease apart these disparate sources of variation for the wide variety of contextual facilitation effects observed in the literature.

5 |. CONCLUSION

Across two experiments, we found strong evidence that it takes time for scenes to facilitate visual processing of recently associated objects. N300 scene–object congruity effects and component-neutral effects of visual template matching are attenuated, and response time in a match/mismatch decision is lengthened, when scene preview time is under 500 ms. This stands in contrast with the near-instantaneous congruency effects that have been observed when scene–object associations are well learned. Our results thus suggest changes in how the brain stores and retrieves visual associative relationships over the course of systems consolidation.

Supplementary Material

Supplemental Info
Supp Fig 1
Supp Fig 2
Supp Fig 3
Supp Fig 4
Supp Table 1

Funding information

James S. McDonnell Foundation award; National Institute on Aging, Grant/Award Number: AG2630; National Science Foundation Graduate Research Fellowship Program, Grant/Award Number: 1144245

Footnotes

1

Some researchers see these effects as less distinct and have described effects in this time window as occurring on an “N300-N400 complex” (e.g., Draschkow et al., 2018; Võ & Wolfe, 2013) or suggested that visual processes indexed by the N300 may continue during the N400 time window (Schendan, 2019).

2

It is also possible that the facilitatory effects that have been observed in behavior and/or ERPs do not reflect predictive processing, but instead arise through integrative processing of the conjunction of the observed scene and object, which does not onset until the object has been presented. In this case, we would also expect limited or no sensitivity to scene preview duration. As discussed, basic scene information, including scene gist, can be extracted within about 200 ms; if integrative processing of scene–object pairs is mostly driven by such rapidly extractable information, we would therefore not expect to find sensitivity to scene preview duration over the range tested here (200–2500 ms).

3

This pattern was not observed in Experiment 2. Notably, in Experiment 1, the early LPC was not sensitive to response outcome, but it was in Experiment 2, suggesting sensitivity to the task demands of immediate versus delayed responding.

4

We posit that the MTL may influence visual processing via feedback connections, but do not suggest that the N300 response is directly generated by the MTL. Rather, we link our study to the MTL via the strong literature suggesting MTL dependence for sensitivity to recently learned arbitrary associations. Future work relying on electrophysiology in nonhuman primates and/or ECoG in human patients, for example, may be better able to directly confirm the presence and nature of feedback dynamics between the MTL and N300 generators in inferotemporal cortex.

5

This time-window for analysis was pre-determined and is not a latency estimate.

6

Interestingly, many maximally informative points were selected from the N300 time window.

SUPPORTING INFORMATION

Additional supporting information can be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

Data and analysis code are available at Harvard Dataverse: https://doi.org/10.7910/DVN/JFXXKI.

REFERENCES

  1. Albright TD (2012). On the perception of probable things: Neural substrates of associative memory, imagery, and perception. Neuron, 74(2), 227–245. 10.1016/j.neuron.2012.04.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bar M (2004). Visual objects in context. Nature Reviews Neuroscience, 5(8), 617–629. 10.1038/nrn1476 [DOI] [PubMed] [Google Scholar]
  3. Bar M, Kassam KS, Ghuman AS, Boshyan J, Schmid AM, Dale AM, Hämäläinen MS, Marinkovic K, Schacter DL, Rosen BR, & Rosen BR (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences, 103(2), 449–454. 10.1073/pnas.0507062103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bell AH, Summerfield C, Morin EL, Malecek NJ, & Ungerleider LG (2016). Encoding of stimulus probability in macaque inferior temporal cortex. Current Biology, 26(17), 2280–2290. 10.1016/j.cub.2016.07.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Biederman I, Mezzanotte RJ, & Rabinowitz JC (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14(2), 143–177. 10.1016/0010-0285(82)90007-X [DOI] [PubMed] [Google Scholar]
  6. Boran E, Hilfiker P, Stieglitz L, Sarnthein J, & Klaver P (2022). Persistent neuronal firing in the medial temporal lobe supports performance and workload of visual working memory in humans. NeuroImage, 254, 119123. 10.1016/j.neuroimage.2022.119123 [DOI] [PubMed] [Google Scholar]
  7. Brandman T, & Pellen MV (2017). Interaction between scene and object processing revealed by human fMRI and MEG decoding. Journal of Neuroscience, 37(32), 7700–7710. 10.1523/JNEUROSCI.0582-17.2017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Davenport JL, & Potter MC (2004). Scene consistency in object and background perception. Psychological Science, 15(8), 559–564. 10.1111/j.0956-7976.2004.00719.x [DOI] [PubMed] [Google Scholar]
  9. Demiral ŞB, Malcolm GL, & Henderson JM (2012). ERP correlates of spatially incongruent object identification during scene viewing: Contextual expectancy versus simultaneous processing. Neuropsychologia, 50(7), 1271–1285. 10.1016/j.neuropsychologia.2012.02.011 [DOI] [PubMed] [Google Scholar]
  10. Donaldson DI, & Rugg MD (1998). Recognition memory for new associations: Electrophysiological evidence for the role of recollection. Neuropsychologia, 36(5), 377–395. 10.1016/S0028-3932(97)00143-7 [DOI] [PubMed] [Google Scholar]
  11. Doniger GM, Foxe JJ, Murray MM, Higgins BA, Snodgrass JG, Schroeder CE, & Javitt DC (2000). Activation time-course of ventral visual stream object-recognition areas: High density electrical mapping of perceptual closure processes. Journal of Cognitive Neuroscience, 12(4), 615–621. 10.1162/089892900562372 [DOI] [PubMed] [Google Scholar]
  12. Draschkow D, Heikel E, Võ ML-H, Fiebach CJ, & Sassenhagen J (2018). No evidence from MVPA for different processes underlying the N300 and N400 incongruity effects in object–scene processing. Neuropsychologia, 120, 9–17. 10.1016/j.neuropsychologia.2018.09.016 [DOI] [PubMed] [Google Scholar]
  13. Fei-Fei L, Iyer A, Koch C, & Perona P (2007). What do we perceive in a glance of a real-world scene? Journal of Vision, 7(1), 10. 10.1167/7.1.10 [DOI] [PubMed] [Google Scholar]
  14. Fujimichi R, Naya Y, Koyano KW, Takeda M, Takeuchi D, & Miyashita Y (2010). Unitized representation of paired objects in area 35 of the macaque perirhinal cortex. European Journal of Neuroscience, 32(4), 659–667. 10.1111/j.1460-9568.2010.07320.x [DOI] [PubMed] [Google Scholar]
  15. Ganis G, & Kutas M (2003). An electrophysiological study of scene effects on object identification. Cognitive Brain Research, 16(2), 123–144. 10.1016/S0926-6410(02)00244-6 [DOI] [PubMed] [Google Scholar]
  16. Gershman SJ (2018). The successor representation: Its computational logic and neural substrates. Journal of Neuroscience, 38(33), 7193–7200. 10.1523/JNEUROSCI.0151-18.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Guillaume F, Tinard S, Baier S, & Dufau S (2016). An ERP investigation of object–scene incongruity. Journal of Psychophysiology, 32(1), 20–29. 10.1027/0269-8803/a000181 [DOI] [Google Scholar]
  18. Hannula DE, Federmeier KD, & Cohen NJ (2006). Event-related potential signatures of relational memory. Journal of Cognitive Neuroscience, 18(11), 1863–1876. 10.1162/jocn.2006.18.11.1863 [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Hannula DE, Ryan JD, Tranel D, & Cohen NJ (2007). Rapid onset relational memory effects are evident in eye movement behavior, but not in hippocampal amnesia. Journal of Cognitive Neuroscience, 19(10), 1690–1705. 10.1162/jocn.2007.19.10.1690 [DOI] [PubMed] [Google Scholar]
  20. Hasselmo ME, Bodelón C, & Wyble BP (2002). A proposed function for hippocampal theta rhythm: Separate phases of encoding and retrieval enhance reversal of prior learning. Neural Computation, 14(4), 793–817. 10.1162/089976602317318965 [DOI] [PubMed] [Google Scholar]
  21. Higuchi S-I, & Miyashita Y (1996). Formation of mnemonic neuronal responses to visual paired associates in inferotemporal cortex is impaired by perirhinal and entorhinal lesions. Proceedings of the National Academy of Sciences, 93(2), 739–743. 10.1073/pnas.93.2.739 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Huang C-C, Rolls ET, Hsu C-CH, Feng J, & Lin C-P (2021). Extensive cortical connectivity of the human hippocampal memory system: Beyond the “what” and “where” dual stream model. Cerebral Cortex, 31(10), 4652–4669. 10.1093/cercor/bhab113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Joubert OR, Fize D, Rousselet GA, & Fabre-Thorpe M (2008). Early interference of context congruence on object processing in rapid visual categorization of natural scenes. Journal of Vision, 8(13), 11–1118. 10.1167/8.13.11 [DOI] [PubMed] [Google Scholar]
  24. Joubert OR, Rousselet GA, Fize D, & Fabre-Thorpe M (2007). Processing scene context: Fast categorization and object interference. Vision Research, 47(26), 3286–3297. 10.1016/j.visres.2007.09.013
  25. Kok P, Rait LI, & Turk-Browne NB (2020). Content-based dissociation of hippocampal involvement in prediction. Journal of Cognitive Neuroscience, 32(3), 527–545. 10.1162/jocn_a_01509
  26. Kok P, & Turk-Browne NB (2018). Associative prediction of visual shape in the hippocampus. Journal of Neuroscience, 38, 6888–6899. 10.1523/JNEUROSCI.0163-18.2018
  27. Kreiman G, Koch C, & Fried I (2000). Category-specific visual responses of single neurons in the human medial temporal lobe. Nature Neuroscience, 3(9), 946–953. 10.1038/78868
  28. Kumar M, Federmeier KD, & Beck DM (2021). The N300: An index for predictive coding of complex visual objects and scenes. Cerebral Cortex Communications, 2(2), tgab030. 10.1093/texcom/tgab030
  29. Kutas M, & Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647. 10.1146/annurev.psych.093008.131123
  30. Larson AM, Freeman TE, Ringer RV, & Loschky LC (2014). The spatiotemporal dynamics of scene gist recognition. Journal of Experimental Psychology: Human Perception and Performance, 40(2), 471–487. 10.1037/a0034986
  31. Lewis-Peacock JA, & Postle BR (2008). Temporary activation of long-term memory supports working memory. Journal of Neuroscience, 28(35), 8765–8771. 10.1523/JNEUROSCI.1953-08.2008
  32. Luria R, Balaban H, Awh E, & Vogel EK (2016). The contralateral delay activity as a neural measure of visual working memory. Neuroscience & Biobehavioral Reviews, 62, 100–108. 10.1016/j.neubiorev.2016.01.003
  33. Ma Q, Rolls ET, Huang C-C, Cheng W, & Feng J (2022). Extensive cortical functional connectivity of the human hippocampal memory system. Cortex, 147, 83–101. 10.1016/j.cortex.2021.11.014
  34. Maller JJ, Welton T, Middione M, Callaghan FM, Rosenfeld JV, & Grieve SM (2019). Revealing the hippocampal connectome through super-resolution 1150-direction diffusion MRI. Scientific Reports, 9(1), 2418. 10.1038/s41598-018-37905-9
  35. McPherson WB, & Holcomb PJ (1999). An electrophysiological investigation of semantic priming with pictures of real objects. Psychophysiology, 36(1), 53–65. 10.1017/S0048577299971196
  36. Mudrik L, Lamy D, & Deouell LY (2010). ERP evidence for context congruity effects during simultaneous object–scene processing. Neuropsychologia, 48(2), 507–517. 10.1016/j.neuropsychologia.2009.10.011
  37. Mudrik L, Shalgi S, Lamy D, & Deouell LY (2014). Synchronous contextual irregularities affect early scene processing: Replication and extension. Neuropsychologia, 56, 447–458. 10.1016/j.neuropsychologia.2014.02.020
  38. Murray EA, Gaffan D, & Mishkin M (1993). Neural substrates of visual stimulus-stimulus association in rhesus monkeys. Journal of Neuroscience, 13(10), 4549–4561. 10.1523/JNEUROSCI.13-10-04549.1993
  39. Oldfield RC (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9(1), 97–113. 10.1016/0028-3932(71)90067-4
  40. Pinto N, Cox DD, & DiCarlo JJ (2008). Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1), e27. 10.1371/journal.pcbi.0040027
  41. Rolls ET, Deco G, Huang C-C, & Feng J (2022). The effective connectivity of the human hippocampal memory system. Cerebral Cortex, 32(17), 3706–3725. 10.1093/cercor/bhab442
  42. Sakai K, & Miyashita Y (1991). Neural organization for the long-term memory of paired associates. Nature, 354(6349), 152–155. 10.1038/354152a0
  43. Schapiro AC, Gregory E, Landau B, McCloskey M, & Turk-Browne NB (2014). The necessity of the medial temporal lobe for statistical learning. Journal of Cognitive Neuroscience, 26(8), 1736–1747. 10.1162/jocn_a_00578
  44. Schapiro AC, Rogers TT, Cordova NI, Turk-Browne NB, & Botvinick MM (2013). Neural representations of events arise from temporal community structure. Nature Neuroscience, 16(4), 486–492. 10.1038/nn.3331
  45. Schapiro AC, Turk-Browne NB, Botvinick MM, & Norman KA (2017). Complementary learning systems within the hippocampus: A neural network modelling approach to reconciling episodic memory with statistical learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1711), 20160049. 10.1098/rstb.2016.0049
  46. Schapiro AC, Turk-Browne NB, Norman KA, & Botvinick MM (2016). Statistical learning of temporal community structure in the hippocampus. Hippocampus, 26(1), 3–8. 10.1002/hipo.22523
  47. Schendan HE (2019). Memory influences visual cognition across multiple functional states of interactive cortical dynamics. In Psychology of learning and motivation (Vol. 71, pp. 303–386). Elsevier. 10.1016/bs.plm.2019.07.007
  48. Schendan HE, & Ganis G (2012). Electrophysiological potentials reveal cortical mechanisms for mental imagery, mental simulation, and grounded (embodied) cognition. Frontiers in Psychology, 3, 329. 10.3389/fpsyg.2012.00329
  49. Schendan HE, & Ganis G (2015). Top-down modulation of visual processing and knowledge after 250 ms supports object constancy of category decisions. Frontiers in Psychology, 6, 1289. 10.3389/fpsyg.2015.01289
  50. Schendan HE, & Kutas M (2002). Neurophysiological evidence for two processing times for visual object identification. Neuropsychologia, 40(7), 931–945. 10.1016/S0028-3932(01)00176-2
  51. Schendan HE, & Kutas M (2003). Time course of processes and representations supporting visual object identification and memory. Journal of Cognitive Neuroscience, 15(1), 111–135. 10.1162/089892903321107864
  52. Sehatpour P, Molholm S, Schwartz TH, Mahoney JR, Mehta AD, Javitt DC, Stanton PK, & Foxe JJ (2008). A human intracranial study of long-range oscillatory coherence across a frontal-occipital-hippocampal brain network during visual object processing. Proceedings of the National Academy of Sciences, 105(11), 4399–4404. 10.1073/pnas.0708418105
  53. Sherman BE, Graves KN, Huberdeau DM, Quraishi IH, Damisah EC, & Turk-Browne NB (2022). Temporal dynamics of competition between statistical learning and episodic memory in intracranial recordings of human visual cortex. Journal of Neuroscience, 42, 9053–9068. 10.1523/JNEUROSCI.0708-22.2022
  54. Smith CM, & Federmeier KD (2020). Neural signatures of learning novel object–scene associations. Journal of Cognitive Neuroscience, 32(5), 783–803. 10.1162/jocn_a_01530
  55. Stachenfeld KL, Botvinick MM, & Gershman SJ (2017). The hippocampus as a predictive map. Nature Neuroscience, 20(11), 1643–1653. 10.1038/nn.4650
  56. Staresina BP, & Wimber M (2019). A neural chronometry of memory recall. Trends in Cognitive Sciences, 23(12), 1071–1085. 10.1016/j.tics.2019.09.011
  57. Torralbo A, Walther DB, Chai B, Caddigan E, Fei-Fei L, & Beck DM (2013). Good exemplars of natural scene categories elicit clearer patterns than bad exemplars but not greater BOLD activity. PLoS One, 8(3), e58594. 10.1371/journal.pone.0058594
  58. Truman A, & Mudrik L (2018). Are incongruent objects harder to identify? The functional significance of the N300 component. Neuropsychologia, 117, 222–232. 10.1016/j.neuropsychologia.2018.06.004
  59. Turk-Browne NB (2019). The hippocampus as a visual area organized by space and time: A spatiotemporal similarity hypothesis. Vision Research, 165, 123–130. 10.1016/j.visres.2019.10.007
  60. van Kesteren MT, Ruiter DJ, Fernández G, & Henson RN (2012). How schema and novelty augment memory formation. Trends in Neurosciences, 35(4), 211–219. 10.1016/j.tins.2012.02.001
  61. Võ ML-H, Boettcher SE, & Draschkow D (2019). Reading scenes: How scene grammar guides attention and aids perception in real-world environments. Current Opinion in Psychology, 29, 205–210. 10.1016/j.copsyc.2019.03.009
  62. Võ ML-H, & Wolfe JM (2013). Differential electrophysiological signatures of semantic and syntactic scene processing. Psychological Science, 24(9), 1816–1823. 10.1177/0956797613476955
  63. Winters BD, Saksida LM, & Bussey TJ (2010). Implications of animal object memory research for human amnesia. Neuropsychologia, 48(8), 2251–2261. 10.1016/j.neuropsychologia.2010.01.023
  64. Wolfe JM, Võ ML-H, Evans KK, & Greene MR (2011). Visual search in scenes involves selective and nonselective pathways. Trends in Cognitive Sciences, 15(2), 77–84. 10.1016/j.tics.2010.12.001

Associated Data

Supplementary Materials

Supp Fig 1
Supp Fig 2
Supp Fig 3
Supp Fig 4
Supp Table 1
Supplemental Info

Data Availability Statement

Data and analysis code are available at Harvard Dataverse: https://doi.org/10.7910/DVN/JFXXKI.
