Visual Memories Bypass Normalization

Ilona M Bloem; Yurika L Watanabe; Melissa M Kibbe; Sam Ling

doi:10.1177/0956797617747091

2018 Mar 29;29(5):845–856.

PMCID: PMC5945309 NIHMSID: NIHMS958433 PMID: 29596038

Abstract

How distinct are visual memory representations from visual perception? Although evidence suggests that briefly remembered stimuli are represented within early visual cortices, the degree to which these memory traces resemble true visual representations remains something of a mystery. Here, we tested whether both visual memory and perception succumb to a seemingly ubiquitous neural computation: normalization. Observers were asked to remember the contrast of visual stimuli, which were pitted against each other to promote normalization either in perception or in visual memory. Our results revealed robust normalization between visual representations in perception, yet no signature of normalization occurring between working memory stores—neither between representations in memory nor between memory representations and visual inputs. These results provide unique insight into the nature of visual memory representations, illustrating that visual memory representations follow a different set of computational rules, bypassing normalization, a canonical visual computation.

Keywords: visual memory, normalization, visual perception, psychophysics

Visual memory allows us to briefly retain information we have just seen, even though our visual input changes rapidly from moment to moment. What are the qualitative properties of representations stored within visual memory? A prevailing theory, the sensory recruitment hypothesis, posits that visual memories are retained by maintaining visual information within visual cortices in the absence of visual input (Christophel, Klink, Spitzer, Roelfsema, & Haynes, 2017; Harrison & Tong, 2009; Offen, Schluppeck, & Heeger, 2009; Pasternak & Greenlee, 2005; Serences, Ester, Vogel, & Awh, 2009). Indeed, the contents of visual memory appear to share some properties with true visual representations (Harrison & Tong, 2009; Pasternak & Greenlee, 2005; Serences et al., 2009; Sneve, Alnæs, Endestad, Greenlee, & Magnussen, 2011; Supèr, Spekreijse, & Lamme, 2001; Tanaka & Sagi, 1998; Xing, Ledgeway, McGraw, & Schluppeck, 2013). For instance, neuroimaging studies have demonstrated that information about the remembered stimulus is still evident in the ensemble pattern of activity within striate cortex, so much so that a classifier trained on responses to viewed stimuli generalizes reasonably well to the voxel activity patterns corresponding to a remembered orientation (Harrison & Tong, 2009) or contrast (Xing et al., 2013). This suggests that visual memory and visual perception share a representational structure. However, it remains unknown whether representations stored within visual memory function like visual representations.

To address this, we tested whether visual memory representations abide by the same rules as visual perception, examining the degree to which representations in visual memory undergo one of the most essential computations that support perception: divisive normalization. Under divisive normalization, the neural response to a stimulus is attenuated by the presence of neighboring responses (Carandini & Heeger, 2012; Heeger, 1992). Models of normalization have long served as cornerstone principles for computational accounts of early vision (Carandini & Heeger, 2012; Heeger, 1992; Ling & Blake, 2012) and have been shown to generalize to a variety of other sensory modalities and cognitive processes (Rabinowitz, Willmore, Schnupp, & King, 2011; Rangel & Clithero, 2012), suggesting that normalization may serve as a canonical neural computation (Carandini & Heeger, 2012). Interestingly, apparently unrelated modulatory processes, such as attention, have been theorized to act by co-opting the same neural machinery to alter the relative gain of responses to selected information (Herrmann, Montaser-Kouhsari, Carrasco, & Heeger, 2010; Reynolds & Heeger, 2009). Does normalization act on visual memories?

To examine whether the contents of visual memory undergo contrast normalization, we leveraged a classic demonstration of this computation at work within primary visual cortex: center-surround suppression. In center-surround suppression, the response to a stimulus is dampened by stimulation in its surrounding region, an effect that has been linked to decreases in perceived contrast (Shushruth et al., 2013; Xing & Heeger, 2001; Zenger-Landolt & Heeger, 2003) and that emerges naturally from divisive normalization. Another hallmark of divisive normalization is its feature tuning: stimuli with similar features suppress each other's responses more than stimuli with dissimilar features do, implying that surround suppression is mediated by orientation-specific inhibitory interactions within early visual areas (Shushruth et al., 2013). If visual perception and visual memory truly succumb to the same neural computations, then stimulation presented in the region surrounding an item retained in memory should also attenuate its remembered contrast. Evidence for center-surround suppression in memory would indicate that visual memory representations, like visual representations, are pooled by normalization.

In Experiment 1, we investigated the degree to which surrounding visual stimulation can influence an actively maintained visual memory representation of a center contrast stimulus. To test for normalization within visual perception, we presented the surround stimulus simultaneously with the center stimulus (simultaneous condition); to test for normalization within visual memory, we instead presented the surround stimulus sequentially, during the maintenance interval (sequential condition). We observed surround suppression only when center and surround were presented simultaneously during visual encoding; visual memory representations were left unaffected by the potentially normalizing influence of a surrounding stimulus presented during retention. In Experiment 2, we tested whether normalization operates between multiple representations stored within visual memory. To do so, we measured the degree to which representations stored in visual memory compete with each other by asking observers to retain both the center and surround stimuli, which were presented either simultaneously or sequentially. We again found suppression when center and surround were presented simultaneously but no signature of contrast normalization between memory representations of sequentially presented stimuli, suggesting that visual memory representations do not interact like true visual representations. Taken together, these results suggest that visual memory fails to take advantage of a neural computation that could potentially mediate between competing neural representations, which is striking considering the limited capacity of visual memory (Alvarez & Cavanagh, 2004; Todd & Marois, 2004).

Experiment 1: Normalization Between Visual Memory and Vision

Method

Observers

Twelve healthy adult volunteers between the ages of 20 and 31 years (6 female; mean age = 24.1), with normal or corrected-to-normal vision, participated in Experiment 1. A minimum sample size of 12 was chosen a priori on the basis of sample sizes of comparable studies (Kiyonaga & Egner, 2016; Xing & Heeger, 2001), and a power calculation indicated that the current sample size yielded a statistical power greater than 90%. All observers provided written informed consent and were reimbursed for their time. The Boston University Institutional Review Board approved the study.

Stimuli

Stimuli were generated using MATLAB (Release 2013b; The MathWorks, Natick, MA) in conjunction with the Psychophysics Toolbox (Pelli, 1997), rendered on a PC running Ubuntu 14.04 LTS, and presented on a gamma-corrected CRT monitor (1,400- × 1,050-pixel resolution; 60 Hz refresh rate). Observers were seated comfortably with their heads in a chin rest at a viewing distance of 68 cm from the screen and were instructed to maintain steady fixation throughout all experimental trials. Stimuli consisted of foveally presented oriented gratings (spatial frequency = 3 cycles/°; randomized spatial phase) on a uniform gray background (mean luminance = 52.05 cd/m²). In each trial, the center stimulus (subtending 1° of visual angle) had a random orientation (between 1° and 180°) and varied from trial to trial in its contrast (five contrast levels, logarithmically spaced between 10% and 75% Michelson contrast; Fig. 1a).
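Logarithmic spacing means that successive contrast levels differ by a constant ratio rather than a constant difference. The following minimal Python sketch computes such a set, assuming the endpoints are exactly 10% and 75% Michelson contrast; the helper names `log_spaced` and `michelson` are illustrative and are not taken from the authors' MATLAB code.

```python
import math

def log_spaced(lo, hi, k):
    """Return k values evenly spaced in log units between lo and hi (inclusive)."""
    step = (math.log(hi) - math.log(lo)) / (k - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(k)]

def michelson(l_max, l_min):
    """Michelson contrast of a grating with peak luminance l_max and trough l_min."""
    return (l_max - l_min) / (l_max + l_min)

levels = log_spaced(0.10, 0.75, 5)
print([round(100 * c, 1) for c in levels])  # [10.0, 16.5, 27.4, 45.3, 75.0]

# A 50%-contrast grating on the 52.05 cd/m^2 background swings from 0.5x to 1.5x the mean.
print(round(michelson(52.05 * 1.5, 52.05 * 0.5), 3))
```

Under this assumption, each level is about 1.65 times the previous one. These intermediate values are computed here for illustration and are not reported in the text.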

Fig. 1. — Stimuli and example trial sequences from Experiment 1. Each stimulus (a) was composed of one of three different surround configurations at five different center contrast levels (10%–75% contrast). Example trial sequences are shown for the simultaneous (b) and the sequential (c) conditions. Observers viewed a center stimulus for 1,000 ms, which varied from trial to trial in contrast and orientation. In both conditions, observers were required to match the contrast of the probe to the remembered center stimulus after a 2,200-ms retention interval. During the simultaneous condition, the center stimulus was enveloped by a full-contrast surround stimulus, which had orientation content that was either collinearly or orthogonally oriented to the center. In the sequential condition, this surround stimulus was moved into the retention interval. After every interval in which a stimulus could appear, a counterphase flickering, full-contrast checkerboard masking stimulus was presented to reduce any lingering afterimages. Stimuli are modified for illustrative purposes.

Depending on the experimental condition, an oriented surround stimulus (spatial frequency = 3 cycles/°; inner diameter = 1.08°; outer diameter = 3°; randomized spatial phase; 100% Michelson contrast) was presented either simultaneously or sequentially with the center stimulus. The sequential condition was designed to ensure that any normalization-driven suppression we might observe was due not simply to suppression during perceptual encoding but to normalization during visual memory retention. To minimize lingering afterimages, we always followed the presentation of both center and surround stimuli directly with a brief, counterphase flickering, full-contrast checkerboard masking stimulus (diameter = 3°; presented for 200 ms at 40 Hz).

Procedure

Behavioral performance was measured with a method-of-adjustment contrast replication task. The general outline of the task was the same in the simultaneous and sequential conditions (Figs. 1b and 1c). First, a randomly oriented center grating target was presented for 1,000 ms, and observers were asked to remember its contrast. After a retention interval (2,200 ms), we presented a probe grating that matched the orientation of the center grating but differed in spatial phase and contrast. Note that the maintenance duration of the center contrast was identical in both conditions. Each stimulus presentation was followed by a brief, full-contrast checkerboard stimulus (200 ms at 40 Hz) to prevent the stimulus from evoking a negative afterimage. Observers turned a knob (PowerMate; Griffin Technology, Nashville, TN) to match the contrast of the probe to the contrast of the center stimulus held in memory. Once satisfied with the replicated contrast, observers proceeded to the next trial.

There were no time constraints for responses (mean duration = 3.08 s, SD = 1.13 s); instead, the precision of replication performance was stressed throughout the experiment. Observers practiced the task before the start of the experiment to become familiar with the knob. For each of the two experimental conditions (simultaneous and sequential), observers performed a total of six runs of 120 trials (~15 min) each, resulting in 48 repetitions for each contrast-surround configuration. Observers participated in four sessions of data collection, with each session occurring on a separate day. Within a session, only one of the two experimental conditions was tested, and the order of the experimental conditions over sessions was counterbalanced across observers.
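The repetition count can be checked directly. Each condition has 6 runs × 120 trials = 720 trials, and there are 5 contrasts × 3 surround configurations (collinear, orthogonal, none) = 15 configurations, so 720 / 15 = 48. The sketch below also generates shuffled runs. It assumes that each run contains every configuration equally often (8 times), which the text does not state. `make_run` is a hypothetical helper.

```python
import itertools
import random
from collections import Counter

# Counts reported for Experiment 1, per experimental condition.
RUNS_PER_CONDITION = 6
TRIALS_PER_RUN = 120
CONTRAST_LEVELS = 5
SURROUNDS = ("collinear", "orthogonal", "none")  # three surround configurations

configurations = list(itertools.product(range(CONTRAST_LEVELS), SURROUNDS))  # 15 pairs
trials_per_condition = RUNS_PER_CONDITION * TRIALS_PER_RUN                  # 720
reps_per_configuration = trials_per_condition // len(configurations)        # 48

def make_run(rng: random.Random) -> list:
    """One run of 120 trials, assuming each configuration appears equally often (8x)."""
    reps_per_run = TRIALS_PER_RUN // len(configurations)
    trials = configurations * reps_per_run  # new list; configurations is not mutated
    rng.shuffle(trials)                     # interleave configurations randomly
    return trials

rng = random.Random(1)
runs = [make_run(rng) for _ in range(RUNS_PER_CONDITION)]
counts = Counter(trial for run in runs for trial in run)
print(trials_per_condition, len(configurations), reps_per_configuration)  # 720 15 48
```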

Simultaneous condition

In the simultaneous condition, we examined the influence of divisive normalization on perceived contrast by introducing a full-contrast surround stimulus (100% Michelson contrast), which was presented simultaneously with the center grating. This surrounding stimulus could have the same orientation as the center (collinear condition) or could be oriented 90° relative to the center grating's orientation (orthogonal condition; Fig. 1a). Observers were instructed that the surrounding stimuli were irrelevant and that they should attend to and remember only the center stimulus's contrast. Trials without a surrounding stimulus (no-surround condition), matched in total duration to the surround trials, were interleaved throughout the experiment to provide a baseline measure of contrast-replication precision (Fig. 1b).

Sequential condition

In the sequential condition, we examined whether a visual memory representation can undergo normalization similarly to perception by moving the full-contrast surround stimulus into the retention interval. As in the simultaneous condition, the surround could be collinearly or orthogonally oriented relative to the center grating but was presented 1,000 ms after the offset of the center stimulus and was displayed for 1,000 ms. Observers were told that the surrounding stimuli were irrelevant and were instructed to focus only on retaining the center stimulus’s contrast. To obtain a baseline measure (no-surround condition) for contrast-replication precision, we also measured perceived contrast of the center stimulus in the absence of the surrounding stimulus during the retention interval (Fig. 1c).
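The reported durations fit together exactly under two assumptions: the 2,200-ms retention interval is measured from center offset, and the 200-ms masks fall inside it. The text does not state either point explicitly. Under those assumptions, the sequential surround trial runs 1,000 ms gap + 1,000 ms surround + 200 ms mask = 2,200 ms before the probe appears. A minimal Python sketch of both schedules follows. `trial_timeline` and the constant names are hypothetical.

```python
from typing import List, Optional, Tuple

# Durations (ms) taken from the text; mask placement within the retention interval is assumed.
CENTER_MS = 1000          # center (plus surround, if simultaneous) on screen
MASK_MS = 200             # checkerboard mask following each stimulus
RETENTION_MS = 2200       # center offset -> probe onset
SURROUND_DELAY_MS = 1000  # sequential: surround onset after center offset
SURROUND_MS = 1000        # sequential: surround duration

Event = Tuple[str, int, Optional[int]]  # (label, onset_ms, offset_ms or None)

def trial_timeline(condition: str) -> List[Event]:
    """Event schedule for one surround trial in Experiment 1 (hypothetical helper)."""
    if condition not in ("simultaneous", "sequential"):
        raise ValueError(f"unknown condition: {condition}")
    label = "center+surround" if condition == "simultaneous" else "center"
    events: List[Event] = [(label, 0, CENTER_MS),
                           ("mask", CENTER_MS, CENTER_MS + MASK_MS)]
    if condition == "sequential":
        s_on = CENTER_MS + SURROUND_DELAY_MS
        s_off = s_on + SURROUND_MS
        events += [("surround", s_on, s_off), ("mask", s_off, s_off + MASK_MS)]
    # The probe stays up until the observer finishes adjusting (no time limit).
    events.append(("probe", CENTER_MS + RETENTION_MS, None))
    return events

for cond in ("simultaneous", "sequential"):
    print(cond, trial_timeline(cond))
```

In both conditions the probe appears at the same time after center offset, consistent with the statement that the maintenance duration was identical.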

Model-fitting procedure

Perceived contrast of the center stimulus in both the simultaneous and sequential conditions was formalized within the normalization framework. The normalization model proposes that the neural response to a stimulus consists of an excitatory component divided by an inhibitory component (Carandini & Heeger, 2012; Heeger, 1992). We assumed that perceived contrast scales proportionally to the signal-to-noise ratio of the underlying contrast response function (Herrmann et al., 2010; Ling & Blake, 2012), so that under this framework, changes in the neural contrast response function directly affect an observer's perceived contrast for a stimulus. The neural response to an isolated center stimulus, R_a, can be formally expressed as

R_{a}(c_{a}) = \frac{c_{a}^{n}}{c_{a}^{n} + C_{50}^{n}} \qquad (1)

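where c_a corresponds to the center-stimulus contrast in the absence of a surround stimulus; C50 is the semisaturation contrast, that is, the contrast at which the response reaches half its maximum (the inflection point of the function when plotted against log contrast); and n is the exponent of the nonlinear transducer, which determines the steepness of the function.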
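Equation 1 can be written directly as a Python function. This minimal sketch takes contrasts as proportions (0 to 1). The parameter values below are hypothetical and were chosen only to show that the response equals 0.5 at c_a = C50 and grows more steeply as n increases.

```python
def response_no_surround(c_a: float, c50: float, n: float) -> float:
    """Equation 1: response to an isolated center of contrast c_a (proportion, 0-1)."""
    return c_a ** n / (c_a ** n + c50 ** n)

# Hypothetical parameter values, chosen only for illustration.
C50, N = 0.30, 2.0
for c in (0.10, 0.30, 0.75, 1.00):
    print(f"c = {c:.2f}  R = {response_no_surround(c, C50, N):.3f}")
```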

We extended Equation 1 to include surround suppression, as described in previous work (Xing & Heeger, 2001). The neural response to the test center stimulus when enveloped by a surround stimulus, R_t, can be formally expressed as

R_{t}(c_{t}) = \frac{c_{t}^{n}}{c_{t}^{n} + \gamma\, c_{s}^{n} + C_{50}^{n}} \qquad (2)

where c_t corresponds to the center stimulus contrast, c_s is the contrast of the surround stimulus (here fixed to 100% contrast), and γ is a parameter that represents the degree of normalization induced by the surround.
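Equation 2 differs from Equation 1 only by the extra term γ·c_s^n in the normalization pool. Setting γ = 0 therefore recovers the no-surround response, and any γ > 0 lowers the center response. The minimal Python sketch below uses hypothetical values for C50, n, and γ, with the surround fixed at 100% contrast as in Experiment 1.

```python
def response_no_surround(c, c50, n):
    """Equation 1."""
    return c ** n / (c ** n + c50 ** n)

def response_with_surround(c_t, c_s, c50, n, gamma):
    """Equation 2: the surround adds gamma * c_s**n to the divisive normalization pool."""
    return c_t ** n / (c_t ** n + gamma * c_s ** n + c50 ** n)

# Hypothetical parameters; surround contrast fixed at 100% (c_s = 1.0).
C50, N, C_S = 0.30, 2.0, 1.0
for gamma in (0.0, 0.05, 0.15):
    r = [response_with_surround(c, C_S, C50, N, gamma) for c in (0.10, 0.27, 0.75)]
    print(gamma, [round(x, 3) for x in r])
```

In this framing, a stronger collinear than orthogonal suppression appears as a larger fitted γ for the collinear surround.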

To fit our data, we assumed that the underlying contrast response for the center stimulus was the same in the no-surround and surround conditions, with only γ free to describe the influence of the surround on perceived center contrast. We used MATLAB's fminsearch function to optimize the parameter estimates for C50, n, and γ using nonlinear regression, separately for each individual observer and separately for the simultaneous and sequential conditions. The fit was performed jointly across all surround conditions: the no-surround condition constrained C50 and n, and two independent γ parameters captured the differences in normalization evoked by the collinear and orthogonal surrounds.
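The following Python sketch illustrates the structure of this fit, with several simplifications. A two-stage grid search stands in for MATLAB's `fminsearch`, which is a Nelder-Mead simplex optimizer, and replaces the joint fit described above. Perceived contrast is taken to equal the model response, which amounts to a proportionality constant of 1; this is an assumption about the readout. With C50 = 0.30 and n = 1, this readout biases reports in the same direction as described in the Results: 10% maps to 0.25 and 75% to about 0.71. The function `fit_observer` and the synthetic, noise-free data are hypothetical.

```python
from typing import Dict, List, Sequence

def r_center(c: float, c50: float, n: float, gamma: float = 0.0, c_s: float = 1.0) -> float:
    """Equations 1 and 2; gamma = 0 gives the no-surround response."""
    return c ** n / (c ** n + gamma * c_s ** n + c50 ** n)

def sse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Sum of squared errors between predicted and observed perceived contrast."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))

def fit_observer(contrasts: Sequence[float], data: Dict[str, List[float]]) -> Dict[str, float]:
    """Grid-search fit; assumes perceived contrast == model response (hypothetical readout)."""
    # Stage 1: C50 and n from the no-surround condition alone.
    c50_grid = [i / 100 for i in range(1, 101)]  # 0.01 .. 1.00
    n_grid = [i / 20 for i in range(1, 81)]      # 0.05 .. 4.00
    _, c50, n = min(
        (sse([r_center(c, a, b) for c in contrasts], data["none"]), a, b)
        for a in c50_grid for b in n_grid)
    # Stage 2: one gamma per surround condition, with C50 and n held fixed.
    # Restricted to [0, 1] here; an unconstrained optimizer could also return gamma < 0.
    gamma_grid = [i / 1000 for i in range(0, 1001)]
    fit = {"C50": c50, "n": n}
    for cond in ("collinear", "orthogonal"):
        errors = [(sse([r_center(c, c50, n, g) for c in contrasts], data[cond]), g)
                  for g in gamma_grid]
        fit[cond] = min(errors)[1]
    return fit

# Synthetic, noise-free "observer" generated from known parameters (not real data).
contrasts = [0.10, 0.165, 0.274, 0.453, 0.75]
true = {"C50": 0.30, "n": 1.0, "collinear": 0.15, "orthogonal": 0.05}
data = {
    "none": [r_center(c, true["C50"], true["n"]) for c in contrasts],
    "collinear": [r_center(c, true["C50"], true["n"], true["collinear"]) for c in contrasts],
    "orthogonal": [r_center(c, true["C50"], true["n"], true["orthogonal"]) for c in contrasts],
}
print(fit_observer(contrasts, data))
```

Recovering known parameters from synthetic data is a basic sanity check for any such fitting routine before applying it to observers' estimates.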

Results

We first confirmed that the contrast of a stimulus could be reliably retained within visual memory by analyzing contrast estimates in the no-surround condition. Observers’ subjective reports of the center contrast retained in visual memory in the absence of a surround stimulus were near veridical: Measures of apparent contrast closely matched the objective contrast of the stimulus, albeit with a slight bias; specifically, lower contrasts were remembered as slightly higher than reality, and higher contrasts were remembered as slightly lower (Figs. 2a and 2b).

Fig. 2. — Results from Experiment 1. Perceived contrast of the center stimuli is shown separately for the (a) simultaneous and (b) sequential conditions. Observers' estimates of the center stimulus contrast were near veridical (indicated by the dashed line). Data points reflect the apparent contrast estimates across all contrast levels, averaged over observers (N = 12), for the three different surround conditions (collinear, orthogonal, and no surround). Error bars denote ±1 SEM (note that in some cases the error bars are smaller than the data points). Schematics above the graphs illustrate the general experimental design. Normalization strength estimates (c) were derived from the normalization model. Parameter estimates illustrate the influence of the surround (collinear and orthogonal) on perceived contrast of the center stimulus for both the simultaneous and sequential conditions (see the Supplemental Material available online for additional parameter estimates). Error bars denote ±1 SEM.

When the center grating was simultaneously enveloped by a surrounding stimulus, we found substantial suppression of the center's remembered contrast across all contrast levels (Fig. 2a), the signature of normalization-driven surround suppression within early visual areas. This attenuation in apparent contrast was evident both when the orientation content of the surrounding stimulus matched that of the center (collinear condition) and when it did not (orthogonal condition). The magnitude of perceptual suppression depended on the match between center and surround: the collinear condition engendered stronger suppression than the orthogonal condition (Fig. 2a; see also Figs. S1 and S3a in the Supplemental Material available online).

The previous results established that our stimulus configurations gave rise to multiple signatures of divisive normalization when presented simultaneously. However, does a visual memory representation of the actively maintained center contrast also succumb to contrast normalization when a surround stimulus is instead presented during the retention interval? Our results revealed that the presence of the surround stimulus during the retention interval did not affect the remembered contrast of the center stimulus, for either the collinear or the orthogonal configuration (Fig. 2b; see also Figs. S2 and S3a in the Supplemental Material). This null effect does not appear to reflect noisier memory reports: the precision of responses was highly comparable between the simultaneous and sequential conditions (Fig. S5 in the Supplemental Material). In a separate experiment, we confirmed that differences in the onset timing of the surround stimulus between the simultaneous and sequential conditions did not account for the differences in suppression between these conditions (Fig. S3b in the Supplemental Material).

Fig. 3. — Stimuli and example trial sequences from Experiment 2. Stimuli (a) were composed of a center and a surround stimulus that both varied in contrast. Each component could be one of four contrast levels (10%–75% contrast). Example trial sequences are shown for the simultaneous (b) and sequential (c) conditions. The contrast of both center and surround stimuli had to be remembered, and after a retention period, observers were asked to match the contrast of the probe to either the center or surround that had been held in memory. Counterphase flickering, full-contrast masks were presented to reduce any lingering afterimages. Stimuli are modified for illustrative purposes.

To quantify the degree of normalization brought about by the surround in both the simultaneous and sequential conditions, we fitted the perceived contrast estimates with a variant of the normalization model (Carandini & Heeger, 2012; Heeger, 1992; Xing & Heeger, 2001; see Equation 2). The model fit all individual observers' data well (mean R² = .92, SD = .03; Fig. S4 in the Supplemental Material), capturing both the slight compression of perceived contrast for visual stimuli and the suppression in the presence of the surround. Normalization strength, as indexed by the normalization constant, γ, differed substantially depending on whether the surround was presented simultaneously or sequentially. Paired-samples t tests confirmed this difference for both the collinear surround, t(11) = 6.37, p < .001 (95% confidence interval, CI = [0.11, 0.23], d = 1.84), and the orthogonal surround, t(11) = 4.57, p = .001 (95% CI = [0.06, 0.16], d = 1.32).
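In the values reported here and in the following paragraphs, each effect size satisfies d ≈ t/√12. This is the relation expected if d is Cohen's d_z, the mean of the paired differences (or of the one-sample values) divided by their standard deviation. That reading is inferred from the numbers, not stated by the authors. The minimal Python sketch below computes both quantities from paired data and checks the reported pairs. The input lists are made-up γ values, not data from the study.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic, Cohen's d_z, and degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    m, s = mean(diffs), stdev(diffs)  # stdev uses the n - 1 denominator
    t = m / (s / math.sqrt(n))
    d_z = m / s                       # hence t = d_z * sqrt(n)
    return t, d_z, n - 1

# Hypothetical gamma estimates for four observers (illustration only).
t, d_z, df = paired_t([0.20, 0.30, 0.25, 0.10], [0.00, 0.05, 0.10, 0.00])
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")

# Reported (t, d) pairs from Experiment 1, N = 12: compare d with t / sqrt(12).
reported = [(6.37, 1.84), (4.57, 1.32), (5.63, 1.63), (3.61, 1.04),
            (5.32, 1.53), (-2.80, -0.81), (-2.07, -0.60)]
for t_val, d_val in reported:
    print(t_val, d_val, round(t_val / math.sqrt(12), 3))
```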

Specifically, the model fits revealed that with competing visual stimulation in the simultaneous condition, normalization strength, γ, was substantially greater than zero across our observers (Fig. 2c; see also Fig. S4). Right-tailed one-sided t tests and Jeffreys-Zellner-Siow Bayes factors (JZS BFs; BayesFactor package for R; see Morey & Rouder, 2011) confirmed these results for both the collinear surround, t(11) = 5.63, p < .001 (95% CI = [0.11, ∞], d = 1.63, estimated JZS BF₁₀ = 399.46), and the orthogonal surround, t(11) = 3.61, p = .002 (95% CI = [0.05, ∞], d = 1.04, estimated JZS BF₁₀ = 24.42). Moreover, normalization strength, γ, was greater in the collinear surround condition compared with the orthogonal surround condition, confirming orientation-tuned divisive normalization for visual representations, t(11) = 5.32, p < .001 (paired-samples t test, 95% CI = [0.04, 0.09], d = 1.53).

However, when fitting the normalization model to the sequential conditions, we found no evidence for suppression between memory stores and visual inputs, as indexed by the normalization constant, γ (Fig. 2c; see also Fig. S4). Right-tailed one-sided t tests confirmed these results for both the collinear surround, t(11) = −2.80, p = .99 (95% CI = [−0.02, ∞], d = −0.81, estimated JZS BF₁₀ = 0.10, JZS BF₂₀ = 7.49), and the orthogonal surround, t(11) = −2.07, p = .97 (95% CI = [−0.03, ∞], d = −0.60, estimated JZS BF₁₀ = 0.11, JZS BF₂₀ = 2.67). The right-tailed one-sided t test was motivated by our a priori hypothesis that normalization should suppress the perceived contrast of the center. Although the visual memory condition hints at a subtle increase in perceived contrast, such an increase is not predicted by divisive normalization and might reflect an attractor bias toward the irrelevant surround stimulus presented during the maintenance period, a memory bias that has been observed for other visual features (Rademaker, Bloem, De Weerd, & Sack, 2015). Furthermore, there was no signature of orientation-tuned normalization, t(11) = −0.03, p = .979 (paired-samples t test, 95% CI = [−0.01, 0.01], d = −0.01).
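A right-tailed test reports P(T ≥ t). A negative t in the direction opposite to the hypothesized suppression therefore yields p close to 1, as with p = .99 and p = .97 above. Python's standard library has no Student t distribution, so this minimal sketch integrates the t density numerically with Simpson's rule. In practice, `scipy.stats.t.sf(t, df)` would be the usual choice. The helper names are illustrative, and this is not the authors' analysis code.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tailed_p(t: float, df: int, steps: int = 20000) -> float:
    """P(T >= t) = 0.5 - integral of the pdf from 0 to t (Simpson's rule; any sign of t)."""
    h = t / steps  # negative step when t < 0, giving a signed integral
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# t values reported for Experiment 1, df = 11.
for t_val in (5.63, 3.61, -2.80, -2.07):
    print(t_val, round(right_tailed_p(t_val, 11), 4))
```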

Fig. 4. — Results from Experiment 2. Perceived contrast of the center stimuli is shown separately for the (a) simultaneous and (b) sequential conditions. Data points reflect the apparent center contrast estimates across all contrast levels, averaged over observers (N = 10), for each surround contrast condition (10%–75% surround). Dashed black lines indicate veridical contrast estimation. Error bars denote ±1 SEM (note that in some cases the error bars are smaller than the data points). Schematics above the graphs illustrate the general experimental design. Normalization strength estimates (c) were derived from the normalization model. Parameter estimates illustrate the influence of the surround on perceived contrast of the center stimulus for both the simultaneous and sequential conditions (perception = blue; visual memory = red; see the Supplemental Material for additional parameter estimates). Error bars denote ±1 SEM.

Discussion

Experiment 1 suggests that visual memory representations are immune to divisive normalization induced by visual stimulation during retention. However, our visual memory condition may have failed to elicit normalization simply because observers could ignore the sequentially presented stimulus. Previous work has shown that only attended memory representations elicit a decodable neural representation, suggesting that memory representations in different attentional states might be supported by different mechanisms (LaRocque, Riggall, Emrich, & Postle, 2016). Although Experiment 1 suggests that normalization may not occur between visual memory representations and visual inputs, normalization might still operate between two attended visual memories stored within early visual areas. In our second experiment, we tested this hypothesis, asking whether multiple memory representations stored within early visual areas undergo normalization-driven competition.