Abstract
A key unanswered question about working memory is the nature of interference between items. At one extreme of existing theories, interference occurs between any two items because of a general capacity limit. At another extreme, interference depends on the similarity between particular features of different items. We examine this question in three experiments by presenting two sets of items on each trial, comprising tones or colors, with three levels of similarity between the two sets: cross-modal, unimodal with different marking features (two different musical instruments or shapes), and unimodal with the same marking feature. Another question is the extent to which the entry of presented items into working memory is obligatory or optional, which we examined by requiring retention of the first, the second, or both sets of stimuli for a recognition test shortly after the presentation of the two sets. The combination of the set similarity and attention manipulations allows us to draw conclusions about the nature of working-memory storage. The findings were not entirely in accord with any pre-existing theory. The effects of feature similarity were present in both modalities but more pronounced for sounds, whereas the detrimental effects of attention to both sets for retention occurred only for visual stimuli. Based on the findings we suggest a new, hybrid conception of working memory storage.
Keywords: Attention, Modalities, Working Memory, Asymmetry across Modalities, Working Memory Capacity
Working memory refers to the limited information that individuals can hold in mind at once (Cowan, 2017). It is important in problem-solving, learning, and communication (e.g., Conway et al., 2005; Cowan, 2016). A key issue in studying working memory has been the extent to which it depends on a general holding place across domains (e.g., Cowan, 1988, 2001, 2019; Cowan, Saults, & Blume, 2014; Doherty et al., 2019; Morey, Cowan, Morey, & Rouder, 2011; Uittenhove, Chaabi, Camos, & Barrouillet, 2019; Vergauwe, Barrouillet, & Camos, 2010) versus specialized stores for particular information types, such as one store for acoustic or verbal material and another for visual or spatial material (e.g., Baddeley, 1986; Cocchini, Logie, Della Sala, MacPherson, & Baddeley, 2002; Doherty & Logie, 2016; Fougnie, Zughni, Godwin, & Marois, 2015). Even within a modality such as acoustic or visual, there could be greater interference between items when they have features more similar to one another (Cowan, 1988; Nairne, 1990; Oberauer & Lin, 2017). In three experiments, we examine questions about working memory capacity by presenting two sets of items on every trial, with the sets varying in their degree of similarity to each other.
In our procedure, one or both sets of four items are to be retained for subsequent recognition of a probe item (present in a set to be retained or absent from both sets; Figure 1). The goal is to assess the roles of interference and attention in maintenance. In the general-purpose storage view, the holding place is considered to be attention-related. All kinds of items are eligible for storage in a general, capacity-limited focus of attention (Cowan, 2019), and/or for serial, attention-based refreshing to counteract decay (Barrouillet & Camos, 2014; Cowan, 1992; Lemaire, Pageot, Plancher, & Portrat, 2018; Vergauwe et al., 2010). In contrast to the general-storage view, separate-storage theorists suggest that maintenance of information in specialized stores does not require much attention (e.g., Baddeley, 1986; Logie, 2016).
Figure 1.
A detailed illustration of a trial. Either of the two sets of memoranda could be a set of colored disks, a set of colored triangles, a series of cello notes, or a series of piano notes, with no identical items. Other differences between trials were the requirement to remember the first set, the second set, or both sets, depending on the trial block; the probed set being the first or second; and the probe being the same as one of the stimuli in the set or different from all of them. The trial sequence at the top reflects the general procedure, and the detailed presentations of auditory and visual sets are shown in the second and third rows, respectively. The differences between the three experiments (E1, E2, E3) are marked within parentheses: E1 includes simultaneous visual arrays and no mask before the probe, E2 includes simultaneous visual arrays and a bimodal mask before the probe, and E3 includes visual sequences and a mask before the probe.
Recently, the separation between theories has become less clear-cut as the views have migrated toward one another. Cowan et al. (2014) found that some maintenance of working memory information seems separate for different kinds of materials. Conversely, from a separate-storage viewpoint, the importance of attention has been acknowledged (Allen, Baddeley, & Hitch, 2017; Hu, Allen, Baddeley, & Hitch, 2016). The present work suggests a need to re-assess both the role of inter-item interference by modalities or features and the role of attention in working memory maintenance.
The Present Study
Here we examine memory for the colors of different shapes and the frequencies of tones produced by different instruments, similar to those used in two previous studies of dual-set working memory (Cowan et al., 2018, Experiment 2; Morey et al., 2011). The use of nonverbal items in both modalities eliminates the role of verbal knowledge, while still reflecting stimuli that should be processed by different modular stores (e.g., Williamson, Baddeley, & Hitch, 2010). In our procedure, the qualities to be judged, visual color and tone frequency, are continuous (though perceived in color or note categories), with no duplications within a set. Concomitant, untested marking features (shape and instrument) are categorical and identical within a set. These marking features, along with temporal separation of the two sets, specify the candidate set to be matched to a probe item.
In the general procedure that we used in three experiments (Figure 1), two sets of stimuli are followed by a probe prompting the participant to determine if it had appeared in one probed set. There are three conditions of similarity between stimulus sets. They can be in different modalities (e.g., a set of colored triangles and a series of cello notes), different in marking features within a modality (e.g., a set of colored triangles and a set of colored disks), or of the same feature (e.g., two sets of colored triangles). This allows us to study the interference between items of the same or different modalities or marking features.
Instructions for a trial block determine whether the participant is to retain the first, the second, or both sets for the subsequent recognition test. Therefore, we can learn whether any observed inter-set interference is caused by attending to both sets or occurs automatically even when only one set is to be retained. In most previous work with dual modalities of stimuli for working memory (e.g., Cowan et al., 2014; Fougnie et al., 2015), single-modality instructions allowed participants to tune perception to only visual or only auditory stimuli. In our procedure, instructions on attention only indicate the position of the set to be retained (first or second on the trial) but do not reveal its modality. Therefore, performance should be governed solely by factors in working memory retention rather than any effects of perceptual expectations and tuning.
Analyses and Hypotheses for Shared and Specialized Working-Memory Storage
We aim to investigate specific hypotheses about retention of one or two stimulus sets in working memory. Some of these hypotheses can be examined best with an accuracy measure. Others can be examined best by estimating the number of items retained from each set and then partitioning working memory into portions dedicated to each set separately and a portion shared between sets. Below, we show how the measures will be used.
Accuracy with different factor combinations.
We will examine our results first in terms of accuracy to answer questions about three key variables: (1) the attention condition, i.e., whether one or both sets were to be attended; (2) the set probed, i.e., whether the first or second set was tested; and (3) the set similarity, i.e., whether they came from different modalities (the cross-modal case), from the same modality but with different marking features of shape or instrument (different-feature unimodal case), or from the same modality with the same marking feature (same-feature unimodal case). These analyses will be conducted separately for visual and acoustic probes to simplify the interpretation.
We examine whether attention to both sets is detrimental to accuracy on the tested set, with a positive result predicted by the general-storage theory, which states that attention is involved in the retention of information in working memory. A set similarity effect should indicate whether the modality of the unprobed set is important, and/or whether the similarity of marking features within a modality matters. Similarity may interact with attention if interference occurs during an active process of retention when both sets must be retained. No interaction between set similarity and attention is expected according to the separate-storage theory because interference occurs through the automatic entry of presented materials into working memory. Finally, interference could be greater when the first set is probed, with retroactive interference from overwriting (e.g., Nairne, 1990) or distraction by the second set.
Components of working memory.
Cowan et al. (2014) showed that it is possible to use estimates of the number of items in working memory to determine the magnitude of shared and dedicated storage for two sets (Figure 2). The “central”, shared storage can be reallocated depending on the attention condition, and the “peripheral”, set-specific storage is always dedicated to one stimulus type. The first step is to estimate the number of items in working memory, k, in each condition. For the current experimental procedure, the assumption is that participants either recognize the probe and answer that it is an old item, or else guess “old” at a certain rate. The guessing rate drops out of the formula (Cowan et al., 2013), and combining correct recognition of the old probes and false recognition of new probes yields

$$k = N \times \frac{p(\text{"old"} \mid \text{old}) - p(\text{"old"} \mid \text{new})}{1 - p(\text{"old"} \mid \text{new})},$$

where N is the number of items in the probed set; p("old"∣new) is the proportion of new-probe trials in which the participant incorrectly gave an “old-item” response; and so on.
Figure 2.

Analysis of the k values into portions. Each circle represents the k value for one set when attended alone. The overlapping portion is the “central” portion of working memory that can be allocated to either set depending on instructions and the remaining, peripheral portions cannot be allocated. See text for how these three portions are estimated from the data.
One can then partition working memory into three portions, with one portion always dedicated to Set 1, another portion to Set 2, and a central portion allocated to Set 1 or 2 or split between sets, depending on attention conditions. For example, consider trials in which the first set comprised piano notes and the second set, colored triangles. One can construct a Venn diagram (Figure 2) representing working memory contents. In the example, the left circle could represent $k_{\text{attend piano}}$ and the right circle, $k_{\text{attend triangles}}$, under attend-one-set conditions. The total capacity available when both sets are to be retained, $k_{\text{attend both}}$, can be represented by the entire area of the three individual portions, with the central portion being shared by the two sets. In this example, Cowan et al. (2014) showed that the central portion could be estimated as:

$$\text{central} = k_{\text{attend piano}} + k_{\text{attend triangles}} - k_{\text{attend both}}$$
This equation is the sum of the two single-attention capacities (large circles in Figure 2) minus the total capacity when both must be retained. Subtracting this central portion from each of the attend-one-set capacities results in the peripheral portions, always dedicated to one set regardless of attention conditions. The peripheral portion for piano in the example is:

$$\text{peripheral}_{\text{piano}} = k_{\text{attend piano}} - \text{central}$$
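For example, if the average total capacity for piano notes and triangle colors in attend-both trials, $k_{\text{attend both}}$, is 4.5 items, it could reflect the sum of 1.3 piano notes and 1.5 triangle colors retained regardless of the attention condition, and 1.7 items that could be allocated to piano notes, triangle colors, or some combination of them depending on the attention condition. In general, for two stimulus sets A and B,

$$\text{central}_{AB} = k_{\text{attend } A} + k_{\text{attend } B} - k_{\text{attend both}}$$

and

$$\text{peripheral}_{A} = k_{\text{attend } A} - \text{central}_{AB},$$

with the peripheral portion for set B obtained in the same way from $k_{\text{attend } B}$.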
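The estimation steps described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the k formula is the form implied by the recognize-or-guess assumption stated in the text (with four items per set), and clipping negative components to zero follows the rule given later in the statistical analysis section.

```python
def estimate_k(p_old_given_old, p_old_given_new, n_items=4):
    """Estimate items in memory under the recognize-or-guess assumption.

    Assumed model: hits = k/N + (1 - k/N) * g and false alarms = g,
    which solves to k = N * (hits - fa) / (1 - fa).
    """
    if p_old_given_new >= 1.0:
        raise ValueError("k is undefined when the false-alarm rate is 1")
    return n_items * (p_old_given_old - p_old_given_new) / (1.0 - p_old_given_new)


def partition(k_attend_a, k_attend_b, k_attend_both):
    """Split capacity into central and peripheral portions.

    k_attend_a, k_attend_b: k for each set when attended alone.
    k_attend_both: total k across both sets in attend-both trials.
    Negative estimates are clipped to zero (assumption: applied to each
    component after computing it from the unclipped central value).
    """
    central = k_attend_a + k_attend_b - k_attend_both
    peripheral_a = k_attend_a - central
    peripheral_b = k_attend_b - central
    return {
        "central": max(0.0, central),
        "peripheral_a": max(0.0, peripheral_a),
        "peripheral_b": max(0.0, peripheral_b),
    }


# Worked example from the text: 1.3 + 1.7 = 3.0 piano notes when attended
# alone, 1.5 + 1.7 = 3.2 triangle colors when attended alone, 4.5 in total.
example = partition(k_attend_a=3.0, k_attend_b=3.2, k_attend_both=4.5)
```

With the values from the worked example, the sketch recovers a central portion of about 1.7 items and peripheral portions of about 1.3 (piano) and 1.5 (triangles).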
Cowan et al. (2014) deployed this model in the case of one set of verbal items accompanied by an array of visual objects, with the central portion reflecting working memory that is not modality-specific. Here we also deploy it in all types of set similarity. Given that the two sets appear in separate groups, peripheral portions can exist in addition to a central portion even in the same-feature unimodality condition. For example, when the two sets are of the same kind, some information about each set might be rapidly memorized as two conglomerates (Cowan, 2019), forming the peripheral portions, whereas any memory mechanism shared among sets would count as the central portion.
These portions could be affected by the set similarity, and different theories make different predictions about the effect. If the central portion does not change with the similarity between sets, this is as expected under the notion that a focus of attention can hold a certain number of any kind of ungrouped items (Cowan, 1988, 2019). If the peripheral portions are much larger when the two sets are in different modalities (i.e., tones and colors) and consequently do not interfere with one another, this finding accords with multicomponent theories (e.g., Baddeley, 1986; Williamson et al., 2010). If, however, the sizes of the peripheral portions depend primarily on whether the two sets have different features, even within a modality, that accords with expectations of interference theorists (e.g., Nairne, 1990; Oberauer & Lin, 2017). Finally, differences between results for the two modalities would be expected according to the hypothesis of an asymmetry in which there is more use of attention in visual retention compared to auditory retention (cf. Morey & Mall, 2012; Vergauwe et al., 2010).
In the first experiment, we presented arrays of colored shapes and series of tones. Experiment 2 was designed similarly except that a post-perceptual, bimodal mask was presented to eliminate lingering sensory memory. Finally, in Experiment 3, we presented both modalities as sequences to determine whether differences between modalities in the first two experiments could be attributed to presentation methods.
Experiment 1
Methods
Participants.
We recruited 30 participants to be consistent with previous research with comparable stimuli (Cowan et al., 2018; Morey et al., 2011). The 30 student participants were recruited from an introductory psychology course as part of the course requirement. They were split into two groups according to odd versus even participant number. Group 1 had 15 participants (6 female and 9 male, M=18.6 years, SD=0.88) and Group 2 had 15 participants (9 female and 6 male, M=18.5 years, SD=0.72). The two groups were formed to counterbalance the experimental blocks, as we explain in the Procedure section.
Apparatus and Stimuli.
Auditory stimuli included 21 musical notes of the C major scale from C3 to B5 (whole-note steps), which were played by synthesized piano and cello (42 sounds in total). The sounds were generated by Online Sequencer (onlinesequencer.net) and then trimmed using Audacity (audacityteam.org) to keep the first 250 ms. The waveform envelopes include onset ramps of about 25 ms for piano and about 100 ms for cello. A set of auditory stimuli consisted of 4 different notes by the same instrument that were played in succession with 250 ms gaps in between, for a total of 2 s for the auditory presentation (Figure 1). The sounds were presented with an intensity of 65-75 dB(A) using Audio-Technica ATH-M50WH Studio Monitor Headphones.
Visual stimuli were presented on a 17-inch cathode ray tube monitor (1024 by 768 pixels). A set of visual items comprised arrays of 4 colored disks or filled equilateral triangles of 0.75° visual angle in radius, always with a peak pointing upward. These items, always the same shape within an array, were randomly positioned in the center area of 9.8° × 7.3° visual angle and were separated by at least 2 visual degrees between the centers of any two items. The colors, including the new-item probe, were randomly drawn from a color wheel with at least 30 degrees of difference between items on a trial, with 360 possible colors. The whole array was displayed for 500 ms.
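The color constraint can be illustrated with a short sketch. The source states only the constraints (360 possible colors, at least 30 degrees between any two items on a trial); the sequential sampling with restart below, and the names `circular_distance` and `sample_hues`, are assumptions made for illustration and not the authors' procedure.

```python
import random


def circular_distance(a, b, wheel=360):
    """Shortest angular distance between two hues on the color wheel."""
    d = abs(a - b) % wheel
    return min(d, wheel - d)


def sample_hues(n_hues, min_sep=30, wheel=360, rng=None):
    """Draw n_hues integer hues that are pairwise at least min_sep apart.

    Hues are picked one at a time from the positions still allowed;
    if no position remains, the whole draw is restarted.
    """
    rng = rng or random.Random()
    while True:
        chosen = []
        for _ in range(n_hues):
            allowed = [
                h for h in range(wheel)
                if all(circular_distance(h, c, wheel) >= min_sep for c in chosen)
            ]
            if not allowed:
                break  # dead end: restart the draw
            chosen.append(rng.choice(allowed))
        else:
            return chosen


# Two visual sets of four colors plus one new-item probe: nine hues in all.
hues = sample_hues(9, rng=random.Random(0))
```

Drawing all hues for a trial in one call keeps the new-item probe separated from every presented color as well.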
The individual time of each item presentation is identical in the auditory and visual conditions but, as a consequence, the total study time is much longer for auditory stimuli. These presentation methods are in accord with the usual finding of greater temporal acuity for working memory in audition and greater spatial acuity for working memory in vision (Penney, 1989). They are typical of presentation methods in the field (see Cowan, 2001) and produced comparable performance levels across modalities.
Procedure.
All participants were tested individually in the same sound-attenuated room. An experimenter explained the instructions and stayed in the room until the first set of 8 practice trials was finished. Then there was an opportunity to ask questions, and the participant was left to complete the experiment (240 trials). No participant raised questions when asked.
On each trial, two sets of stimuli were presented (Figure 1), followed by a single probe item to be judged present in or absent from the tested set. If the probe was absent, it differed from the stimuli in the tested set only by the tested feature: color (if visual) or frequency (if acoustic). When a probe was absent from the tested set, it differed from all items presented on that trial regardless of the attention condition. The experiment had three types of trial blocks. In one type, the participants were always probed regarding the first set on each trial, and in a second type, the second set of stimuli was always probed. In the third type of trial block, the probe was randomly selected from either set on each trial, so the participants had to remember both sets until a retro-cue indicated which set would be tested. All stimuli in both sets were unique within each trial; there were no duplications in a trial. The block type was explained to the participants at the beginning of each block and was reiterated before each trial in the block. Each experiment included two blocks probing the first set (each with 8 practice and 32 test trials) and two comparable blocks probing the second set. There was one block in which either set could be probed, with 16 practice and 64 test trials. Participants received a block order of either [probe first, probe second, probe either, probe second, probe first] or [probe second, probe first, probe either, probe first, probe second] depending on the parity of their participant number.
The modalities and marking features of the two stimulus sets were uniformly randomized, which means the participants could have any one of four stimulus types in Set 1 and, independently, any one of the same four stimulus types in Set 2. Each combination of stimulus type was presented twice in the 32 test trials in each block probing the first or second set, and four times in the 64 test trials in the block probing either set. Each color or sound frequency within a trial was drawn randomly without replacement from the allowable stimuli as described above. If the trial had no new item as probe, the probe was randomly drawn from the presented items; otherwise it was randomly drawn from the rest of the allowable stimuli.
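The trial composition described above can be sketched as follows. The labels and function names are hypothetical; the sketch only reproduces the stated counts (16 combinations of stimulus types, each presented twice in a 32-trial block or four times in the 64-trial block) and shows how each combination maps onto a set-similarity level.

```python
import itertools
import random

# Hypothetical labels for the four stimulus types and their modalities.
MODALITY = {
    "disks": "visual",
    "triangles": "visual",
    "cello": "auditory",
    "piano": "auditory",
}


def build_block(repeats, rng=None):
    """Return a shuffled list of (set1_type, set2_type) test trials."""
    rng = rng or random.Random()
    combos = list(itertools.product(MODALITY, repeat=2))  # 4 x 4 = 16
    trials = [combo for combo in combos for _ in range(repeats)]
    rng.shuffle(trials)
    return trials


def set_similarity(set1_type, set2_type):
    """Classify a pair of stimulus types into the three similarity levels."""
    if MODALITY[set1_type] != MODALITY[set2_type]:
        return "cross-modal"
    if set1_type != set2_type:
        return "different-feature"
    return "same-feature"


single_set_block = build_block(repeats=2, rng=random.Random(1))  # 32 trials
either_set_block = build_block(repeats=4, rng=random.Random(1))  # 64 trials
```

Under uniform sampling of the 16 combinations, half of the trials are cross-modal and a quarter each are different-feature and same-feature unimodal trials.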
The procedure for each trial is shown in Figure 1. All instructions and stimuli were displayed on a medium grey background. A trial began with a fixation cross for 500 ms. Then the two sets of stimuli were presented with another 500 ms fixation screen in between. For auditory stimuli, each note was played for 250 ms followed by a 250 ms silence, which made the whole set last 2 s. The 4 colored shapes were displayed all at once for 500 ms. After the two sets of stimuli, a screen indicating which set would be probed was presented for 500 ms. The probe was presented for 250 ms immediately after this retro-cue. A blank screen was shown while an auditory probe was played. For a visual probe, the colored disk or triangle in question was placed in the center of the screen during silence. After the probe, a question mark was shown on the screen until a response from the participant was received. The participant was to press “A” if the probe appeared in the probed set or “B” if it was absent. Then a feedback screen showed a check mark (for correct) or cross (for incorrect) until the participant pressed the spacebar to continue to the next trial.
Statistical analysis.
We rely on null hypothesis statistical testing, with pairwise Holm–Bonferroni post-hoc tests to follow up on significant main effects in ANOVAs for factors with more than two levels. We also provide results of Bayesian analyses. The Bayes Factor provided is the ratio of the likelihood of each model including an effect to the likelihood of the identical model excluding only that effect, according to the default settings of JASP with matched models selected (JASP Team, 2019). In the test of an interaction, the only models tested are those including all main effects comprising the tested interaction. Like Rouder, Morey, Speckman, & Province (2012, p. 361), the Bayesian analyses use a Cauchy prior distributed as π(μ, σ²) = 1/σ². A resulting Bayes Factor (BF) no smaller than 3 is conventionally taken as evidence for an effect, and BF ≤ 0.33 is regarded as evidence against an effect, with more extreme values indicating stronger evidence.
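The two decision rules named in this paragraph can be sketched as follows. This is an illustration of the standard Holm step-down procedure and of the stated Bayes Factor thresholds, not a reproduction of JASP's internals; the function names and the example p values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down test: return a reject/retain flag per p value.

    p values are tested from smallest to largest against alpha / (m - rank);
    testing stops at the first comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


def interpret_bf(bf_incl):
    """Apply the thresholds stated in the text (BF >= 3, BF <= 0.33)."""
    if bf_incl >= 3:
        return "evidence for the effect"
    if bf_incl <= 0.33:
        return "evidence against the effect"
    return "inconclusive"


# Three hypothetical pairwise comparisons among similarity levels.
decisions = holm_bonferroni([0.01, 0.04, 0.03])
```

Here the smallest p value (0.01) passes its threshold of 0.05/3, but the next (0.03) fails against 0.05/2, so only the first comparison is declared significant.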
Any individual estimate of a component (central or peripheral) smaller than zero was set to zero. Such adjustments were necessary to prevent dividing by zero and theoretically impossible negative estimates, so one cannot interpret the absolute value of the estimates, only the differences between conditions.
Results
Proportion correct.
The proportion correct in each condition, averaged across trials in which the probe was present versus absent from the stimulus sets, is given in Table 1, with effects of set similarity and attention summarized in Figure 3. Clearly, the pattern of results is different for auditory and visual test probes.
Table 1.
Proportion correct (and SD) for each condition in three experiments.
| Experiment | Set Attended | Set Similarity | Auditory Probe, Set 1 | Auditory Probe, Set 2 | Visual Probe, Set 1 | Visual Probe, Set 2 |
|---|---|---|---|---|---|---|
| 1 (No Mask) | Both | AV | 0.71 (0.22) | 0.76 (0.23) | 0.66 (0.20) | 0.73 (0.17) |
| 1 (No Mask) | Both | Different | 0.69 (0.26) | 0.64 (0.26) | 0.53 (0.22) | 0.65 (0.27) |
| 1 (No Mask) | Both | Same | 0.56 (0.28) | 0.74 (0.23) | 0.65 (0.26) | 0.60 (0.27) |
| 1 (No Mask) | One | AV | 0.67 (0.16) | 0.78 (0.14) | 0.64 (0.12) | 0.75 (0.13) |
| 1 (No Mask) | One | Different | 0.59 (0.21) | 0.71 (0.17) | 0.66 (0.16) | 0.75 (0.16) |
| 1 (No Mask) | One | Same | 0.55 (0.22) | 0.73 (0.16) | 0.63 (0.17) | 0.74 (0.17) |
| 2 (Mask) | Both | AV | 0.65 (0.20) | 0.66 (0.20) | 0.59 (0.19) | 0.71 (0.18) |
| 2 (Mask) | Both | Different | 0.61 (0.28) | 0.54 (0.25) | 0.63 (0.31) | 0.63 (0.30) |
| 2 (Mask) | Both | Same | 0.51 (0.28) | 0.65 (0.31) | 0.58 (0.31) | 0.59 (0.24) |
| 2 (Mask) | One | AV | 0.64 (0.11) | 0.66 (0.15) | 0.70 (0.11) | 0.66 (0.15) |
| 2 (Mask) | One | Different | 0.63 (0.15) | 0.65 (0.16) | 0.63 (0.19) | 0.67 (0.17) |
| 2 (Mask) | One | Same | 0.58 (0.16) | 0.65 (0.17) | 0.62 (0.17) | 0.73 (0.22) |
| 3 (Sequential, with mask) | Both | AV | 0.68 (0.16) | 0.70 (0.20) | 0.67 (0.20) | 0.77 (0.21) |
| 3 (Sequential, with mask) | Both | Different | 0.60 (0.27) | 0.62 (0.29) | 0.63 (0.23) | 0.76 (0.21) |
| 3 (Sequential, with mask) | Both | Same | 0.56 (0.23) | 0.64 (0.28) | 0.64 (0.27) | 0.71 (0.22) |
| 3 (Sequential, with mask) | One | AV | 0.65 (0.14) | 0.70 (0.16) | 0.77 (0.13) | 0.76 (0.10) |
| 3 (Sequential, with mask) | One | Different | 0.57 (0.13) | 0.67 (0.18) | 0.72 (0.15) | 0.77 (0.13) |
| 3 (Sequential, with mask) | One | Same | 0.59 (0.19) | 0.62 (0.19) | 0.65 (0.17) | 0.73 (0.17) |
Note. “AV” stimuli refer to a visual set and an auditory set, in either order. “Different” refers to two sets of stimuli within a modality but of different types (for visual objects, a set of colored disks and a set of colored triangles; for sounds, a set of cello notes and a set of piano notes), whereas “same” would be exemplified by two sets of piano notes. The Probe condition specifies the modality and the set being probed.
Figure 3.

Proportion correct in each condition of each experiment (X axis). Top panels, auditory probe; bottom panels, visual probe. Left-hand panels, set similarity effect with the levels as the graph parameter; right-hand panels, attention effect with the levels as the graph parameter. Error bars are standard errors of the mean.
For probes in each modality, an ANOVA was conducted with 3 within-participant factors as described in the introduction: attention condition (attend one or both sets), set similarity (cross-modal, unimodal different-marking-feature, or unimodal same-marking-feature), and probed set (first or second). Significant effects are shown in Table 2 and the complete results of the analysis are shown in the online supplement. For auditory probes, there was no effect of attention but there was an effect of set similarity, with much better performance in cross-modal than unimodal trials. In Holm–Bonferroni pairwise post-hoc tests, cross-modal trials differed from both same-feature and different-feature unimodal trials while the latter two did not differ. In contrast, for visual probes, it was the effect of attention that was significant, with better performance on single-set than on dual-set attention (Figure 3). There was no effect of set similarity, although there was a trend (with p = .095 and ηp² = .083, versus ηp² = .133 for the significant acoustic case).
Table 2.
Significant results from frequentist and Bayesian ANOVAs for visual and auditory proportion correct in each experiment and across experiments.
| Exp | Probe Modality | Factors | df (effect) | df (error) | F | ηp² | p | BFincl |
|---|---|---|---|---|---|---|---|---|
| 1 | Auditory | Set Similarity | 2 | 58 | 4.436 | 0.133 | 0.016 | 7.654 |
| 1 | Auditory | Probed Set | 1 | 29 | 21.273 | 0.423 | < .001 | 3116.044 |
| 1 | Auditory | Set Similarity ✻ Probed Set | 2 | 58 | 4.207 | 0.127 | 0.020 | 2.288 |
| 1 | Visual | Attention Condition | 1 | 28 | 12.553 | 0.31 | 0.001 | 4.416 |
| 1 | Visual | Probed Set | 1 | 28 | 13.689 | 0.328 | < .001 | 43.883 |
| 2 | Visual | Attention Condition | 1 | 29 | 4.749 | 0.141 | 0.038 | 0.840 |
| 2 | Visual | Probed Set | 1 | 29 | 5.015 | 0.147 | 0.033 | 0.578 |
| 3 | Auditory | Set Similarity | 2 | 58 | 6.795 | 0.19 | 0.002 | 4.267 |
| 3 | Visual | Set Similarity | 2 | 58 | 4.021 | 0.122 | 0.023 | 0.961 |
| 3 | Visual | Probed Set | 1 | 29 | 14.677 | 0.336 | < .001 | 121.748 |
| All | Auditory | Set Similarity | 2 | 174 | 12.523 | 0.126 | < .001 | 2554.998 |
| All | Auditory | Probed Set | 1 | 87 | 22.54 | 0.206 | < .001 | 2985.694 |
| All | Auditory | Set Similarity ✻ Probed Set | 2 | 174 | 4.701 | 0.051 | 0.010 | 1.704 |
| All | Auditory | Set Similarity ✻ Attention Condition ✻ Probed Set | 2 | 174 | 3.887 | 0.043 | 0.022 | 0.727 |
| All | Auditory | Experiment | 2 | 87 | 4.059 | 0.085 | 0.021 | 0.575 |
| All | Visual | Set Similarity | 2 | 172 | 5.421 | 0.059 | 0.005 | 1.957 |
| All | Visual | Attention Condition | 1 | 86 | 17.064 | 0.166 | < .001 | 111.895 |
| All | Visual | Probed Set | 1 | 86 | 32.089 | 0.272 | < .001 | 32013.930 |
| All | Visual | Set Similarity ✻ Attention Condition ✻ Probed Set | 2 | 172 | 3.923 | 0.044 | 0.022 | 2.584 |
| All | Visual | Experiment | 2 | 86 | 5.139 | 0.107 | 0.008 | 1.772 |
Note. Exp=experiment. All factors are within-participant except Experiment. The online supplement presents the entire ANOVA results. BFincl refers to the likelihood of models with the effect in question included, compared to the matched models without that effect.
In both modalities, as shown in Table 1, there was a pronounced disadvantage when Set 1 was tested compared to Set 2 (auditory: M = .63 vs. .73; visual: M = .63 vs. .70), indicating a strong effect of retroactive interference caused by overwriting or attentional distraction.
The only interaction in these analyses was set similarity with probed set for auditory probes. The mean (and SEM) for the three levels of similarity when the first set was probed were: .68 (.03) for cross-modal sets; .62 (.03) for different-feature sets (piano vs. cello); and .55 (.04) for same-feature sets. In contrast, when the second set was probed these means were .77 (.02), .68 (.03), and .74 (.03). In this interaction, degrees of similarity thus had a greater effect when the first set was probed, as one would expect from feature overwriting.
Theoretical analysis of components.
As described above, we used the data for probe-present and probe-absent trials to estimate the parameter k, the number of items present in working memory. Then we used k values in different attention conditions to estimate the shared central component and the set-specific peripheral components of working memory. The component estimates and statistical analysis results are shown in Table 3 and Table 4, respectively.
Table 3.
Mean (and SD) estimated number of items in each component of working memory, by condition.
| Stimuli Modalities | Stimuli Types | Central | Auditory Peripheral | Visual Peripheral | Total Dual-set Capacity |
|---|---|---|---|---|---|
| Experiment 1 (Arrays of Colors, Series of Tones, No Mask) | | | | | |
| Cross-modal | | 0.85 (0.97) | 1.90 (1.10) | 1.32 (0.93) | 4.07 (1.24) |
| Auditory | Different | 1.41 (1.51) | 1.07 (1.13) | - | 3.54 (1.78) |
| Auditory | Same | 1.42 (1.82) | 0.67 (0.92) | - | 2.76 (1.98) |
| Visual | Different | 2.24 (2.08) | - | 0.62 (0.92) | 3.47 (1.74) |
| Visual | Same | 1.73 (2.10) | - | 0.67 (0.90) | 3.06 (1.82) |
| Experiment 2 (Like Experiment 1 but with Post-Perceptual Mask) | | | | | |
| Cross-modal | | 1.16 (1.39) | 1.11 (0.96) | 0.97 (0.84) | 3.24 (0.95) |
| Auditory | Different | 1.63 (2.06) | 0.77 (0.96) | - | 3.16 (1.97) |
| Auditory | Same | 1.97 (2.32) | 0.39 (0.61) | - | 2.75 (1.99) |
| Visual | Different | 1.21 (1.67) | - | 0.70 (0.90) | 2.60 (1.86) |
| Visual | Same | 1.86 (1.99) | - | 0.50 (0.86) | 2.86 (1.86) |
| Experiment 3 (Serial Presentation in both Modalities, with Mask) | | | | | |
| Cross-modal | | 0.98 (1.10) | 1.44 (0.93) | 1.42 (0.98) | 3.85 (0.99) |
| Auditory | Different | 1.51 (1.77) | 0.58 (0.79) | - | 2.67 (1.66) |
| Auditory | Same | 2.20 (2.51) | 0.40 (0.72) | - | 3.00 (2.23) |
| Visual | Different | 1.76 (1.85) | - | 0.94 (1.06) | 3.63 (1.54) |
| Visual | Same | 1.58 (1.78) | - | 0.79 (0.93) | 3.16 (1.57) |
Note. Central and peripheral components estimated according to the formulas provided by Cowan et al. (2014). The total capacity shown is the sum of central and peripheral components. When both sets were of the same modality, the listed peripheral component estimate must be multiplied by two when calculating the total capacity.
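The doubling rule in the note can be checked against the Experiment 1 means in Table 3 with a short sketch. The helper name `total_capacity` is hypothetical, and small discrepancies are expected because the tabled means are rounded to two decimals.

```python
def total_capacity(central, peripherals):
    """Total dual-set capacity: central portion plus one peripheral per set."""
    return central + sum(peripherals)


# Experiment 1 means from Table 3.
# Cross-modal: one auditory and one visual peripheral portion.
cross_modal = total_capacity(0.85, [1.90, 1.32])      # tabled total: 4.07

# Unimodal: the single listed peripheral estimate counts once per set,
# i.e. it is multiplied by two.
auditory_different = total_capacity(1.41, [1.07] * 2)  # tabled total: 3.54
auditory_same = total_capacity(1.42, [0.67] * 2)       # tabled total: 2.76
```

The recomputed totals agree with the tabled totals to within about 0.01 item.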
Table 4.
Frequentist and Bayesian ANOVA results for peripheral and central components in each experiment and across experiments
| Expt. | Component | Factors | df (effect) | df (error) | F | ηp² | p | BFincl |
|---|---|---|---|---|---|---|---|---|
| 1 | Peripheral | Set Similarity | 2 | 58 | 15.507 | 0.348 | < .001 | 175406.549 |
| 1 | Peripheral | Modality | 1 | 29 | 4.631 | 0.138 | 0.040 | 2.987 |
| 1 | Peripheral | Set Similarity ✻ Modality | 2 | 58 | 2.359 | 0.075 | 0.104 | 0.390 |
| 1 | Central | Type | 4 | 116 | 2.656 | 0.084 | 0.036 | 1.432 |
| 2 | Peripheral | Set Similarity | 2 | 58 | 6.166 | 0.175 | 0.004 | 39.995 |
| 2 | Peripheral | Modality | 1 | 29 | 0.06 | 0.002 | 0.808 | 0.159 |
| 2 | Peripheral | Set Similarity ✻ Modality | 2 | 58 | 0.349 | 0.012 | 0.707 | 0.149 |
| 2 | Central | Type | 4 | 116 | 1.188 | 0.039 | 0.320 | 0.142 |
| 3 | Peripheral | Set Similarity | 2 | 58 | 15.124 | 0.343 | < .001 | 104179.819 |
| 3 | Peripheral | Modality | 1 | 29 | 3.878 | 0.118 | 0.059 | 0.999 |
| 3 | Peripheral | Set Similarity ✻ Modality | 2 | 58 | 1.258 | 0.042 | 0.292 | 0.298 |
| 3 | Central | Type | 4 | 116 | 1.724 | 0.056 | 0.149 | 0.336 |
| All | Peripheral | Set Similarity | 2 | 174 | 34.778 | 0.286 | < .001 | 5.592e+13 |
| All | Peripheral | Set Similarity ✻ Experiment | 4 | 174 | 1.101 | 0.025 | 0.358 | 0.062 |
| All | Peripheral | Modality | 1 | 87 | 0.31 | 0.004 | 0.579 | 0.111 |
| All | Peripheral | Modality ✻ Experiment | 2 | 87 | 4.613 | 0.096 | 0.012 | 4.261 |
| All | Peripheral | Set Similarity ✻ Modality | 2 | 174 | 2.979 | 0.033 | 0.053 | 0.398 |
| All | Peripheral | Set Similarity ✻ Modality ✻ Experiment | 4 | 174 | 0.376 | 0.009 | 0.825 | 0.036 |
| All | Peripheral | Experiment | 2 | 87 | 3.258 | 0.07 | 0.043 | 0.483 |
| All | Central | Type | 4 | 348 | 3.236 | 0.036 | 0.013 | 1.398 |
| All | Central | Type ✻ Experiment | 8 | 348 | 1.1 | 0.025 | 0.362 | 0.047 |
| All | Central | Experiment | 2 | 87 | 0.062 | 0.001 | 0.940 | 0.040 |
Note. Expt.=experiment. All factors are within-participant except Experiment.
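The effect sizes and degrees of freedom in Table 4 are linked by standard ANOVA identities, which can serve as a consistency check. The sketch below assumes that no sphericity correction was applied to the degrees of freedom (the integer values in the table are consistent with that); the function names are illustrative and not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    weighted_f = f_value * df_effect
    return weighted_f / (weighted_f + df_error)


def within_factor_df(n_levels, n_participants, n_groups=1):
    """Degrees of freedom for the main effect of one within-participant factor.

    n_groups is the number of between-participant groups (here, experiments
    pooled in the combined analysis); it is 1 for a single experiment.
    """
    df_effect = n_levels - 1
    df_error = df_effect * (n_participants - n_groups)
    return df_effect, df_error


# Set similarity (3 levels) in a single experiment with 30 participants.
print(within_factor_df(3, 30))
# Set similarity in the pooled analysis: 90 participants in 3 experiments.
print(within_factor_df(3, 90, n_groups=3))
# Effect size for set similarity in Experiment 1 (F = 15.507, df = 2, 58).
print(round(partial_eta_squared(15.507, 2, 58), 3))
```

Applied to the tabled F values, the identity reproduces the reported ηp² values to three decimals, for example 0.348 for set similarity in Experiment 1 and 0.286 in the pooled analysis.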
Each analysis of the central component included only one within-participant factor, the type of stimulus combination (cross-modal, different-feature auditory, same-feature auditory, different-feature visual, and same-feature visual). There was a significant effect in which the central component was smaller with cross-modal stimuli than with the other types. Pairwise post-hoc tests confirmed only that the cross-modal case had a lower central component than two visual sets with different shapes.
For the peripheral component analysis, there were two within-participant factors: the set similarity and the probe modality (auditory or visual). As Table 4 indicates, this analysis showed significant effects of set similarity and modality: the peripheral component was largest when the two sets were presented in different modalities and was larger for auditory probes. However, there was no interaction.
Discussion
The first experiment showed some effects of each of three variables: attention, set similarity, and the probed set. These results do not fully conform with any of the major theories of working memory. The effect of similarity, in which cross-modal trials produced better performance than unimodal trials, was predicted by multicomponent models (e.g., Baddeley, 1986), but this effect was significant only for auditory probes. The effect of attention, in which dual-set attention was detrimental to performance, was predicted by attention-based principles (e.g., Cowan, 1988), but the effect was only found for visual probes. Finally, poorer performance occurred when the first set was probed. That effect was exacerbated for auditory stimuli in same-marking-feature trials, and it was in accord with overwriting principles (e.g., Cowan, 1988; Nairne, 1990). The difference between modalities in which there was more of an effect of attention for visual probes is in accord with views in which there is an asymmetry between modalities in the use of attention (Morey & Mall, 2012).
Before interpreting these results, we conducted two more experiments to explore stimulus factors. In the second experiment, a bimodal mask was presented (cf. Cowan et al., 2014) to eliminate any contribution of lingering sensory memory at the time of test.
Experiment 2
The bimodal mask added to this experiment (Figure 1) comprises a multi-colored square and a noise presented simultaneously after both sets of stimuli. Given that sensory memory is considered to be modality-specific and not influenced by attention, eliminating any lingering sensory memory or otherwise fragile but large-capacity trace (Endress & Potter, 2014; Landman, Spekreijse, & Lamme, 2003) could reduce the effect of the set similarity, making the memory representation more abstract. In doing so, it could also potentially increase the requirement for attention in cross-modal conditions, because attention-based mnemonic processing might have to replace sensory memory in preserving a useful working memory representation.
Methods
Aspects of the method and analyses are mostly the same as in Experiment 1. The only exceptions are explained below.
Participants.
As in Experiment 1, the 30 participants were students receiving credit in an introductory psychology course. Group 1 (11 female and 4 male, M=19.33 years, SD=1.25) and Group 2 (10 female and 5 male, M=20.33 years, SD=3.59) each had 15 participants. The block order was counterbalanced across the two groups as in Experiment 1.
Apparatus, Stimuli and Procedure.
The arrangement was the same as in Experiment 1 except for two differences (Figure 1). First, we were concerned that performance was better for disks than for triangles. Previously the triangles and the disks had the same radius of 0.75° visual angle but, as a result, the area of each triangle was 0.73 deg² of visual angle, whereas the area of each disk was 1.77 deg². In the second experiment, the triangles and disks were normalized to have the same area of 1.77 deg² to equalize the two stimulus types.
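The two areas reported for Experiment 1 follow from simple geometry. The sketch below assumes that the triangles were equilateral and that their "radius" refers to the circumscribed circle; this assumption is not stated in the text, although it reproduces the reported values. The function names are illustrative.

```python
import math

# Area of an equilateral triangle per unit squared circumradius: 3*sqrt(3)/4.
TRIANGLE_FACTOR = 3 * math.sqrt(3) / 4


def disk_area(radius):
    """Area of a disk, in squared degrees of visual angle."""
    return math.pi * radius ** 2


def equilateral_triangle_area(circumradius):
    """Area of an equilateral triangle inscribed in a circle of this radius."""
    return TRIANGLE_FACTOR * circumradius ** 2


def circumradius_for_area(area):
    """Circumradius an equilateral triangle needs in order to have this area."""
    return math.sqrt(area / TRIANGLE_FACTOR)


radius = 0.75  # degrees of visual angle, as in Experiment 1
print(f"disk area:     {disk_area(radius):.2f} deg^2")
print(f"triangle area: {equilateral_triangle_area(radius):.2f} deg^2")
print(f"triangle circumradius matching the disk: "
      f"{circumradius_for_area(disk_area(radius)):.2f} deg")
```

Under this assumption, a triangle would need a circumradius of roughly 1.17° to match the 1.77 deg² area of the disk.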
Second, a bimodal mask was added after both sets of stimuli were presented (and after a blank 500-ms pause) but before the test set indicator was presented. The visual mask was a rectangle consisting of 18 rows × 20 columns of squares with an edge length of 0.75° visual angle, making the entire mask 15° wide and 13.5° tall. Each square within the mask was colored with one of the 360 possible colors, and the whole mask showed all possible colors in a haphazard arrangement. The auditory mask consisted of 4 notes played by both synthesized cello and piano simultaneously. The four notes were the lowest note C3 (130.813 Hz), a mid-low note C4 (261.626 Hz), a mid-high note C5 (523.251 Hz), and the highest note B5 (987.767 Hz). The auditory mask and the visual mask were presented at the same time. After 250 ms the auditory mask stopped, while the visual mask continued to be shown for another 250 ms. The set-cue and the probe were shown afterwards (Figure 1).
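The mask dimensions and note frequencies given above can be reproduced with a few lines of arithmetic. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz and the common MIDI numbering in which C4 = 60; neither is stated in the text, but both are consistent with the reported frequencies. All names are illustrative.

```python
A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4 under the C4 = 60 convention


def note_frequency(midi_number):
    """Equal-tempered frequency in Hz for a MIDI note number."""
    return A4_HZ * 2 ** ((midi_number - A4_MIDI) / 12)


# The four mask notes, with MIDI numbers under the assumed convention.
MASK_NOTES = {"C3": 48, "C4": 60, "C5": 72, "B5": 83}

for name, midi_number in MASK_NOTES.items():
    print(f"{name}: {note_frequency(midi_number):.3f} Hz")

# Visual mask geometry: 18 rows x 20 columns of 0.75-degree squares.
ROWS, COLS, EDGE_DEG = 18, 20, 0.75
mask_width = COLS * EDGE_DEG    # degrees of visual angle
mask_height = ROWS * EDGE_DEG   # degrees of visual angle
n_squares = ROWS * COLS         # one square per possible color
print(f"mask: {mask_width} x {mask_height} deg, {n_squares} squares")
```

The grid contains exactly 360 squares, which is why each of the 360 possible colors can appear once in the mask.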
Results
Proportion correct.
Figure 3 shows the means for each condition in the main analyses of proportion correct, depicted also in Table 1. Although the pattern of means for similarity and attention looks similar to Experiment 1, comparable analyses (Table 2) show that the mask affected the auditory modality substantially: for auditory probes, there were no significant effects of any of the main variables (attention, set similarity, or probed set). In contrast, for visual probes, the effects were comparable to those of Experiment 1, with main effects of attention (better performance with single-set attention than with dual-set attention) and probed set (poorer performance on the first set, M=.63, than on the second set, M=.67).
Theoretical analysis of components.
Analyses of components similar to those of Experiment 1 showed fewer effects in this experiment. There was no difference between types in the central component after adding the mask (Tables 3 and 4). In the analysis of peripheral components, there was no effect of probe modality, but set similarity still had an effect, with the largest peripheral components on cross-modal trials, smaller components on different-feature unimodal trials, and the smallest peripheral components on same-feature unimodal trials. Thus, the more the sets differ, the more capacity increases because of the increase in the peripheral components. Post-hoc tests confirmed only that the cross-modal trials had higher peripheral components than the same-feature trials.
Discussion
With sensory memory removed by the bimodal mask, effects of retroactive acoustic interference were reduced compared to Experiment 1. Although both modalities have a kind of sensory memory generally lasting several seconds (Cowan, 1988, 1995), they may function differently; for example, partial report procedures show evidence of the persistence of useful sensory memory for characters that does not last as long in vision (less than 1 s: Sperling, 1960) as it does in audition (4 s: Darwin, Turvey, & Crowder, 1972). This may account for why, in Experiment 1, performance in cross-modal trials appears substantially higher in audition than in vision (Figure 3), whereas the levels of performance in the present experiment seem better matched.
In both experiments, there was a role of attention for visual probes and of set similarity for the peripheral components. The set similarity factor, however, does not appear to concern only modality but also partially involves the marking feature within a modality (tone instrument or visual shape), which is in accord with interference theories (e.g., Nairne, 1990; Oberauer & Lin, 2017).
Before drawing conclusions, we noted that one of our decisions about the stimulus presentation was debatable. We presented visual items in arrays and acoustic items as a sequence. By doing so, we hoped to minimize the amount of interference between sets based on representational similarity. Both modalities cannot be presented as concurrent arrays without relatively poor auditory perception (Saults & Cowan, 2007). Extending the visual array presentation time to 2 s to match the sounds might allow groups of items (Jiang, Chun, & Olsen, 2004) to be converted to learned chunks (Cowan, 2019).
If both modalities were presented as sequences, there would be potential interference between the comparable serial positions of an auditory set and a visual set, which could be to the disadvantage of multicomponent theories that specify little interference between such sets in different modalities. Nevertheless, presenting both sets as sequences does allow the schedule of presentations to be comparable across sets. This led to the third experiment, using a bimodal mask as in Experiment 2, to examine whether the pattern of results would be similar to that experiment despite sequential visual presentation.
Experiment 3
Methods
Aspects of the method and analyses are mostly the same as in Experiment 2. The only exceptions are explained below.
Participants.
The 30 participants were recruited and counterbalanced in the same manner as for the first two experiments. Group 1 (12 female and 3 male, M=22 years, SD=2.8) and Group 2 (11 female and 4 male, M=22.87 years, SD=3.26) each had 15 participants.
Apparatus, Stimuli and Procedure.
Experiment 3 followed the same procedure as Experiment 2 except for one difference (see Figure 1). The 4 items in a visual array were shown one after another rather than in a simultaneous array. Each colored shape was shown on screen at a random location for 250 ms followed by a 250-ms blank screen. An array thus lasted 2 s.
Results
Proportion correct.
Figure 3 shows the results for proportion correct, as does Table 1 in more detail. Note that the pattern of results for auditory and visual probes is generally consistent across all three experiments. In that pattern, similarity and test order effects can occur for both modalities but attention effects are restricted to the visual case.
For the auditory probe analyses, as in Experiments 1 and 2, there was no effect of attention. As in Experiment 2, there was no effect of the probed set, either. In this experiment, however, as in Experiment 1, there was an effect of set similarity (Table 2). Pairwise post-hoc tests confirmed that cross-modal trials had a higher proportion correct than both different- and same-feature unimodal trials, with no difference between the latter two.
The visual probe analyses show that, unlike the first two experiments in which a visual set was presented as a concurrent array, there was no effect of attention. Even though Figure 3 shows a pattern that resembles the effect in the other experiments, the detrimental effect of dual-set attention did not reach significance, though there was a trend (p=.092, ηp²=.095). The need for shared attention to the two sets may be larger in Experiments 1 and 2 because of the presentation of visual items in a concurrent array.
Unlike the other experiments, there was a significant effect of set similarity for visual probes, and there is a considerable difference in performance levels between the different- and same-feature trials (Figure 3). However, post-hoc tests established only that the cross-modal trials produced significantly better performance than same-marking-feature trials. Finally, once more, performance on visual probes was worse when the first set was probed (M=.68) than when the second set was probed (M=.75).
Theoretical analysis of components.
In this experiment with visual and auditory materials both presented as slow sequences, the effect of trial type on the central component of working memory was not evident. In the analysis of peripheral components, the effect of probe modality was not significant, whereas set similarity had a significant effect (Table 4). The reason for the effect can be seen clearly in Table 3. The peripheral components were larger for both modalities in the cross-modal condition than in the two unimodal conditions. Post-hoc tests confirmed this for cross-modal compared to both different-feature and same-feature trials, and the two unimodal conditions did not differ.
Discussion
In this experiment, unlike the previous ones, there was no significant effect of attending to one versus two sets on visual set performance. This suggests that effects of attention may come into play at least partly in the encoding of brief, concurrent arrays into working memory during a memory load, or holding onto them during further encoding.
On the other hand, the change in visual presentation method from the previous experiments also resulted in the effect of set similarity becoming significant, unlike the first two experiments (Table 5). In the analysis of proportion correct (see Figure 3), a difference between the marking features of two sets, with one set of disks and one set of triangles, seemed helpful. One possibility is that the visual marking feature (shape) was well-encoded only in the present experiment. If it was encoded in all experiments, another possible explanation for the difference in results between experiments might be that the visual sets in Experiment 3 were presented in a temporally drawn-out fashion. That presentation method would reduce their temporal distinctiveness from one another, making it more useful to have a non-temporal means to avoid confusion between two visual sets.
Table 5.
Main distinguishing features of each experimental condition and main effects
| Expt. | Vis. Display | Mask | Similarity effect? | Dual-task cost? | Second Set > First? |
|---|---|---|---|---|---|
| Auditory Probe | | | | | |
| 1 | Array | No | Yesᵃ | No | Yes |
| 2 | Array | Yes | No | No | No |
| 3 | Sequence | Yes | Yesᵃ | No | No |
| All | n.a. | n.a. | Yesᵃ | No | No |
| Visual Probe | | | | | |
| 1 | Array | No | No | Yes | Yes |
| 2 | Array | Yes | No | Yes | Yes |
| 3 | Sequence | Yes | Yesᵇ | No | Yes |
| All | n.a. | n.a. | Yesᶜ | Yes | Yes |
Note. Expt.=experiment, Aud.=auditory, Vis.=visual. An effect is listed as Yes if p<.05 in the corresponding analysis of proportion correct. The all-experiments analysis included no significant interactions of Experiment with any other factor.
ᵃ Difference primarily between cross-modal and within-modality trials.
ᵇ Difference primarily between different-marking-feature and same-marking-feature trials.
ᶜ Changes between cross-modal, different-marking-feature, and same-marking-feature trials.
It is also worth noting that set similarity has a significant effect again for auditory probes. There is the possibility that the longer, serial presentation of visual stimuli allows better consolidation, which could free up attention for auditory stimuli and increase performance on those stimuli, though chance fluctuations between experiments cannot be ruled out. In the analysis of components, an auditory and a visual set, as opposed to two auditory or two visual sets, also proved helpful to performance overall.
Analyses Across Experiments
Proportion correct.
The three experiments show the same basic asymmetry, with the effects of the attention condition restricted to responses to visual stimuli (Figure 3). There are, however, differences in the statistical results of the experiments that could result from either experimental design differences or insufficient power in some instances. For greater statistical power, we carried out analyses of proportion correct with all data included by making experiment a between-participant factor along with the three within-participant factors of attention condition, set similarity, and probed set. This analysis, shown at the bottom of Table 2, produced significant effects of experiment for both probe modalities, but there was no interaction between experiment and any other factor.
For auditory probes, there were effects of set similarity and probed set. Pairwise post-hoc tests on similarity showed that the cross-modal case produced better performance than either unimodal trial type within the auditory modality. There was no effect of attention except a three-way interaction of attention, set similarity, and probed set.
In contrast, for visual probes, there were not only effects of set similarity and probed set, but also a robust effect of attention (as well as, again, the three-way interaction of attention, set similarity, and probed set). Post-hoc tests on set similarity confirmed a significant effect of the difference between cross-modal trials and same-feature trials. The results suggest that although both modalities of working memory are sensitive to interference, visual working memory is more dependent on attention, in keeping with Morey and Mall (2012).
Although the same three-way interaction occurred for both probe modalities, inspection of the means in Table 1 showed that the patterns are noticeably different in the two probe modalities. With an auditory probe, in one-set attention trials, there was an advantage for probing the second set as compared to the first set. The advantage grew slightly across degrees of set similarity (cross-modal, .06; different-feature unimodal, .08; same-feature unimodal, .09). In attend-both-sets trials, however, the pattern was much more dependent on similarity (.03, −.03, .13). With a visual probe, the one-set attention trials show similar growth in the second-set advantage across levels of similarity (.02, .06, .10). When attending to both sets, however, the pattern reverses, showing more second-set advantage when the sets are dissimilar (.10, .08, .01). This pattern difference between the two modalities further supports the point that there is a fundamental difference in the role of attention for auditory versus visual probe trials.
Analysis of components.
In an analysis of components, the central component was examined with a between-participant factor of experiment and with the within-participant factor of trial type (cross-modal, different-feature auditory, same-feature auditory, different-feature visual, and same-feature visual). The central component showed no effect of experiment, but there was an effect of type. Post-hoc pairwise comparisons confirmed only that the cross-modal condition produced a smaller central component than two auditory sets of the same musical instrument.
The peripheral components also were examined in an analysis with the additional between-participant factor of experiment and within-participant factors of set similarity and probe modality. There was an overall effect of set similarity. Post-hoc tests confirmed differences between all three levels, with peripheral components largest for cross-modal trials, smaller for different-feature unimodal trials, and smallest for same-feature trials. There was also an effect of the experiment and an interaction of that effect with probe modality. Across levels of similarity, in the three experiments the mean auditory peripheral component was 1.21, 0.76, and 0.81 items, respectively, whereas the mean visual peripheral component was .87, .72, and 1.05 items. This pattern could occur because in Experiment 1, not including a mask mainly helped auditory memory, whereas in Experiment 3, sequential presentation of visual items primarily improved visual memory.
General Discussion
Present Method and Findings
We examined short-term recognition performance on two sets of materials when (1) either set could consist of colored objects or a tone series; (2) attention was directed toward the first, second, or both sets of items; (3) the modalities of the set(s) to be attended were unknown until they were presented; and (4) sets within a modality had the same or different marking features. Consequently, the basis of dual-task costs is clearer here than in previous work; it is confined to working memory retention.
The main findings are summarized in Table 5. The clearest conclusion is that attention works differently for auditory and visual stimuli, especially with typical presentations in the field (auditory sequences; rapid visual arrays). The detrimental effect of dual-set attention as a main effect of the proportion correct was confined to trials with visual probes, and the disadvantage of probing the first set was only prominent for visual probes after eliminating sensory memory. In contrast, detrimental interference caused by having two sets of same-modality items was found more consistently with auditory probes.
The disadvantage of first-set probes in the visual modality is to be expected if there is attentional distraction from encoding of the second set regardless of whether it is to be retained for a possible test. Therefore, the first-set disadvantage for visual probes may indicate that attention was important for them even in Experiment 3, in which the visual items were presented sequentially. In that experiment, the disadvantage originated not from trying to retain both sets but from distraction caused by the second set. The distraction effect might also be related to the same-marking-feature disadvantage for visual stimuli in this experiment only (Table 1). In terms of Allen et al. (2017), having to attend to two sets creates executive distraction from encoding both sets into working memory (attention-sharing between sets), but the second set also creates perceptual distraction. The asymmetry between modalities regarding attention (Morey & Mall, 2012) thus seems to encompass both executive attention and perceptual attention.
Stimulus Discriminability
Conclusions might be tempered by whether the sets in the two modalities were matched for performance levels. The mean proportion correct for probing Set 2 when only that set was to be retained can be taken as the best estimate of the discriminability of the stimuli within working memory because there is minimal overwriting or distraction. The levels were closely matched in Experiment 1 (auditory, M=.74; visual, M=.75) and almost matched in Experiment 2 (.65 & .60). The match was not as close in Experiment 3, in which all sets were presented as series (.66 & .75). Inspection of Figure 1 shows that the effects of similarity and attention nevertheless have comparable patterns across experiments and levels of performance.
Comparison to Related Literature
According to an extreme modular view, one might obtain no detrimental effect of attempting to remember a second set of colors or tones while retaining items from a first set in the opposite modality. According to an extreme attention-based view, in contrast, one might obtain capacity levels with a fixed limit (based on prior work, about 3 items in adults) that is the same regardless of whether those items come from one or two sets. Prior results suggest an intermediate solution (Cowan et al., 2018, with colors and tones; Cowan et al., 2014, 2018, with colors and words).
These studies, however, did not compare unimodal to cross-modal trials. Cowan and Morey (2007) included such trials using verbal and visual items with a procedure in which attention was determined by a post-cue followed by a 3-s retention interval. They found a partial overlap between modalities. There are studies with other kinds of stimuli in which the two sets in different modalities were always to be remembered but the load in each set varied (Katus & Eimer, 2018; Fougnie et al., 2015; Uittenhove et al., 2019). Among these studies, only Uittenhove et al. (2019) found a cross-modality cost of working memory load in recall. Cocchini et al. (2002) found a small effect that they did not consider important, which seems debatable.
Compared to previous studies, our design offers two advantages. First, we included trials with sets not only in the same or different modalities but also with the same or different marking features within a modality (musical instrument or visual shape). Second, we manipulated attention in a way that eliminated pre-tuning of perception to the stimuli (see also Cowan & Morey, 2007). With both experimental factors in place, we find an intermediate solution in which attention is more important for the visual modality, both in terms of deliberate use of attention and distraction from an unattended second set (Tables 1 & 5). Interference effects are more pronounced between sets in the same modality, especially for auditory probes. However, interference is alleviated to some extent when there are different marking features within a modality, especially for visual stimuli (Figure 2).
Theoretical Implications
Recent studies involving an adversarial collaboration between proponents of a modular view of working memory and proponents of attention-based views (Doherty et al., 2019; Rhodes et al., 2019) provide discussions suggesting that either kind of view could be fine-tuned to account for data like ours. The process of identifying the best model may involve tuning different models to match new data, which tends to lead toward convergence of the models (Cowan et al., 2020). Here we discuss what versions of the modular and attention-based views seem to be needed to account for the present results.
Modular views.
The typical modular view has included separate visuospatial and verbal storage buffers (e.g., Baddeley, 1986). In such a view, greater interference is expected between sets of items in the same modality than between sets of cross-modality items, because sets in different modalities are likely to be stored in different buffers. Colors are acceptable within the visuospatial store. The buffer that can include tones might or might not be the same as the phonological buffer. Although there are similarities in how verbal and tonal information is processed, such as acoustic similarity effects in both cases (Williamson et al., 2010), there are also differences. The effectiveness of suppressing the mental conversion of visual notes into acoustic representations is different for speech versus singing (Schendel & Palmer, 2007), and the dominant representation of the tonal aspects of language is in the right hemisphere, versus the left hemisphere for phonological aspects (Jia, Tsang, Huang, & Chen, 2013). In principle, a tonal buffer could be added.
In a modular view, effects of attention to more than one set are expected only under special circumstances. They can occur if the working memory loads are large enough to require involvement of something other than modality-specific stores. If, for example, there are attention-demanding strategies that are necessary in rehearsing particular stimuli, then these strategies could conflict for colors and tones. Although attention-demanding strategies are not part of the traditional modular model, recent research by proponents of modular views has successfully investigated the role of attention and distraction in the retention of visual materials (e.g., Allen, Baddeley, & Hitch, 2017). In that view, attention is added to the separate, modality-specific modules of the system and is tied into central executive function.
Alternatively, if both the tone buffer and the visual buffer fill up, verbal coding might be used to retain some of the items of both types, leading to a capacity issue for the phonological buffer (an account consistent with a modular view in which there is no central executive function and no overall sharing of a general attentional resource; cf. Logie, 2016; Vandierendonck, 2016). In the present case, colors could be named, but it is difficult to understand how tones could be converted to a verbal form, except that they might be retained through covert humming tapping into the phonological buffer (cf. Williamson et al., 2010).
It is not clear to us what would have to be done within a modular view to account for an asymmetrical effect of attention-sharing, affecting visual performance but leaving acoustic performance relatively undisturbed (and for visual and verbal stimuli, cf. Morey et al., 2012, 2013). In a modular view, there would be little interference between modalities (Cocchini et al., 2002). However, a kind of rehearsal mechanism used for visual maintenance (Baddeley, 1986) theoretically could be more susceptible to interruption by attention-switching between stimulus sets, with less attentional demand of tone rehearsal by humming.
Attention-based views.
In the view of Cowan (1988, 1995, 1999, 2005/2016, 2019), what we call working memory storage is a combination of two mechanisms. The first is the collection of items held temporarily in the activated portion of long-term memory, even if that information consists of synaptic or neurochemical information instead of neural activation as was originally assumed (Cowan, 2019; Masse, Rosen, & Freedman, 2020; Rose, 2020). The second is a subset of activated long-term memory that is in the focus of attention, which is limited to several (about 3) separate, coherent ideas or chunks of information at once. The activated portion of long-term memory is readily retrievable into the focus of attention, which is needed for voluntary recognition responses as in the present research. In the activated portion of long-term memory, new information overwrites older information if it shares similar features, and this can occur on a feature-by-feature basis. This principle can explain why colors of objects are overwritten by other colors, and tone pitch is overwritten by other tones. These features can be overwritten regardless of the shape or timbre features that sometimes separate one set of stimuli from another. Why the effects of these marking features are small is uncertain, but the small magnitude can be explained on the grounds that the temporal grouping of the two sets already provides enough separation to narrow search sets and guide responses.
The asymmetry in effects of sharing attention could occur, in this view, because humming is a backup process for tone memory that does not require much attention, whereas no backup process applies to the maintenance of visual items. Therefore, sharing attention has a much stronger effect on visual items.
Comparison of views.
The present results do not clearly choose between modular and attention-based views, which has been the case in other dual-task circumstances as well (Doherty et al., 2019; Rhodes et al., 2019). As in those other recent studies, though, the data do constrain both kinds of theories (cf. Cowan et al., 2020). Neither theory would necessarily have predicted an asymmetry in the effects of shared attention on color memory versus tone memory, although asymmetries have been observed before for visual versus verbal stimuli (Morey et al., 2012, 2013; Vergauwe et al., 2010).
To account for the asymmetry, the attention-based account seems to have an advantage over any version of the modular account in which there are no attention processes shared between modalities. The reason is that, if there are no contributions of attention per se, then one must posit that both modalities share some other resource, most likely to be a verbalization process. However, past research has shown that concurrent verbalization has no discernable effect on the retention of visual arrays of colors (Morey & Cowan, 2004, 2005). Considerable recent research suggests that when verbal rehearsal is not possible, attention is needed, and that this is typically the case for visual stimuli (Gray et al., 2017; Morey & Bieler, 2013; Souza & Oberauer, 2017).
Conclusion
We have examined all combinations of visual-visual, visual-auditory, auditory-visual, and auditory-auditory dual-set sequences, with the task of retaining the colors or tone pitches of one or both sets for immediate recognition. Unlike previous studies, we eliminated a confounding factor in which same- but not different-modality sets can both seem relevant to the same test probe item. To avoid this situation, on some same-modality trials we allowed different marking features (colored disks or triangles; cello or piano notes) to assist recall and separate the sets. The data show that there are effects of the modalities of the two sets, with worse performance when the modalities are the same. There was only a limited advantage of a marking feature within each set, which was seen most clearly when both sets were presented as sequences. In addition to this inter-set interference factor, we observed a detrimental effect of dual-set attention only when the task involved recognition of visual objects, even when paired with a set of tones. This asymmetry in the effects of attention presents a challenge to theories of working memory and constrains various kinds of theories. We believe it is to the advantage of theories that allow a key role for a limited focus of attention. In future work, we plan to vary the tested features within each modality to place modular and attention-based theories in stronger contrast to one another.
Supplementary Material
Open Practices Statement.
The materials and data for this paper can be found on the Open Science Framework at https://osf.io/8ywaf/ (Nelson Cowan and Yu Li, "Visual and acoustic working memory with feature markers").
Acknowledgments
We thank Levi Doyle-Barker, Bret Glass, Mariah Hawkins, Justin Moore, and Jingyuan (JuJu) Ye for assistance. This research was funded by NIH Grant R01-HD021338 to Cowan.
References
- Allen RJ, Baddeley AD, & Hitch GJ (2017). Executive and perceptual distraction in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 43, 1677–1693.
- Baddeley AD (1986). Working memory. Oxford, England: Clarendon Press.
- Barrouillet P, & Camos V (2014). Working memory: Loss and reconstruction. London, UK: Psychology Press.
- Clark KM, Hardman K, Schachtman TR, Saults JS, Glass BA, & Cowan N (2018). Tone series and the nature of working memory capacity development. Developmental Psychology, 54, 663–676.
- Cocchini G, Logie RH, Della Sala S, MacPherson SE, & Baddeley AD (2002). Concurrent performance of two memory tasks: Evidence for domain-specific working memory systems. Memory & Cognition, 30, 1086–1095.
- Conway ARA, Kane MJ, Bunting MF, Hambrick DZ, Wilhelm O, & Engle RW (2005). Working memory span tasks: A methodological review & user's guide. Psychonomic Bulletin & Review, 12, 769–786.
- Cowan N (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychological Bulletin, 104, 163–191.
- Cowan N (1992). Verbal memory span and the timing of spoken recall. Journal of Memory and Language, 31, 668–684.
- Cowan N (1995). Attention and memory: An integrated framework. Oxford Psychology Series (No. 26). New York: Oxford University Press.
- Cowan N (1999). An embedded-processes model of working memory. In Miyake A & Shah P (Eds.), Models of Working Memory: Mechanisms of active maintenance and executive control (pp. 62–101). Cambridge, U.K.: Cambridge University Press.
- Cowan N (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185.
- Cowan N (2005/2016). Working memory capacity. Hove, East Sussex, UK: Psychology Press. [Classic edition: 2016]
- Cowan N (2016). Working memory maturation: Can we get at the essence of cognitive growth? Perspectives on Psychological Science, 11, 239–264.
- Cowan N (2017). The many faces of working memory and short-term storage. Psychonomic Bulletin & Review, 24, 1158–1170.
- Cowan N (2019). Short-term memory based on activated long-term memory: A review in response to Norris (2017). Psychological Bulletin, 145, 822–847.
- Cowan N, Belletier C, Doherty JM, Jaroslawska AJ, Rhodes S, Forsberg A, Naveh-Benjamin M, Barrouillet P, Camos V, & Logie RH (2020). How do scientific views change? Notes from an extended adversarial collaboration. Perspectives on Psychological Science, 15, 1011–1025.
- Cowan N, Blume CL, & Saults JS (2013). Attention to attributes and objects in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(3), 731–747. doi: 10.1037/a0029687
- Cowan N, Li Y, Glass B, & Saults JS (2018). Development of the ability to combine visual and acoustic information in working memory. Developmental Science, 21, e12635, 1–14. doi: 10.1111/desc.12635
- Cowan N, & Morey CC (2007). How can dual-task working memory retention limits be investigated? Psychological Science, 18, 686–688.
- Cowan N, Saults JS, & Blume CL (2014). Central and peripheral components of working memory storage. Journal of Experimental Psychology: General, 143, 1806–1836.
- Darwin CJ, Turvey MT, & Crowder RG (1972). An auditory analogue of the Sperling partial report procedure: Evidence for brief auditory storage. Cognitive Psychology, 3, 255–267.
- Doherty JM, Belletier C, Rhodes S, Jaroslawska AJ, Barrouillet P, Camos V, Cowan N, Naveh-Benjamin M, & Logie RH (2019). Dual-task costs in working memory: An adversarial collaboration. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45, 1529–1551.
- Doherty JM, & Logie RH (2016). Resource-sharing in multiple-component working memory. Memory & Cognition, 44, 1157–1167.
- Endress AD, & Potter MC (2014). Large capacity temporary visual memory. Journal of Experimental Psychology: General, 143, 548–565.
- Fougnie D, Zughni S, Godwin D, & Marois R (2015). Working memory storage is intrinsically domain specific. Journal of Experimental Psychology: General, 144, 30–47.
- Gray S, Green S, Alt M, Hogan T, Kuo T, Brinkley S, & Cowan N (2017). The structure of working memory in young school-age children and its relation to intelligence. Journal of Memory and Language, 19, 183–201.
- Hitch GJ, Burgess N, Towse JN, & Culpin V (1996). Temporal grouping effects in immediate recall: A working memory analysis. Quarterly Journal of Experimental Psychology, 49A, 116–139.
- JASP Team (2019). JASP (Version 0.11.1) [Computer software].
- Jia S, Huang J, Chen H-C, & Tsang Y-K (2013). Right hemisphere advantage in processing Cantonese level and contour tones: Evidence from dichotic listening. Neuroscience Letters, 556, 135–139. doi: 10.1016/j.neulet.2013.10.014
- Jiang Y, Chun MM, & Olson IR (2004). Perceptual grouping in change detection. Perception & Psychophysics, 66, 446–453.
- Katus T, & Eimer M (2018). Independent attention mechanisms control the activation of tactile and visual working memory representations. Journal of Cognitive Neuroscience, 30, 644–655.
- Landman R, Spekreijse H, & Lamme VAF (2003). Large capacity storage of integrated objects before change blindness. Vision Research, 43, 149–164.
- Lemaire B, Pageot A, Plancher G, & Portrat S (2018). What is the time course of working memory attentional refreshing? Psychonomic Bulletin & Review, 25, 370–385.
- Logie RH (2016). Retiring the central executive. Quarterly Journal of Experimental Psychology, 69, 2093–2109.
- Masse NY, Rosen MC, & Freedman DJ (2020). Reevaluating the role of persistent neural activity in short-term memory. Trends in Cognitive Sciences, 24, 242–258.
- Morey CC, & Bieler M (2013). Visual short-term memory always requires attention. Psychonomic Bulletin & Review, 20, 163–170.
- Morey CC, & Cowan N (2004). When visual and verbal memories compete: Evidence of cross-domain limits in working memory. Psychonomic Bulletin & Review, 11, 296–301.
- Morey CC, & Cowan N (2005). When do visual and verbal memories conflict? The importance of working-memory load and retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 703–713.
- Morey CC, Cowan N, Morey RD, & Rouder JN (2011). Flexible attention allocation to visual and auditory working memory tasks: Manipulating reward induces a tradeoff. Attention, Perception, & Psychophysics, 73, 458–472.
- Morey CC, & Mall JT (2012). Cross-domain costs during concurrent verbal and spatial serial memory tasks are asymmetric. Quarterly Journal of Experimental Psychology, 65, 1777–1797.
- Morey CC, Morey RD, van der Reijden M, & Holweg M (2013). Asymmetric cross-domain interference between two working memory tasks: Implications for models of working memory. Journal of Memory and Language, 69, 324–348.
- Nairne JS (1990). A feature model of immediate memory. Memory & Cognition, 18, 251–269.
- Oberauer K, & Lin HY (2017). An interference model of visual working memory. Psychological Review, 124, 21–59. doi: 10.1037/rev0000044
- Penney CG (1989). Modality effects and the structure of short-term verbal memory. Memory & Cognition, 17, 398–422.
- Pratte MS (2018). Iconic memories die a sudden death. Psychological Science, 29, 877–887.
- Ricker TJ, Sandry J, Vergauwe E, & Cowan N (2020). Do familiar memory items decay? Journal of Experimental Psychology: Learning, Memory, and Cognition, 46, 60–76.
- Rhodes S, Jaroslawska AJ, Doherty JM, Belletier C, Naveh-Benjamin M, Cowan N, Camos V, Barrouillet P, & Logie RH (2019). Storage and processing in working memory: Assessing dual task performance and task prioritization across the adult lifespan. Journal of Experimental Psychology: General, 148, 1204–1227.
- Rose N (2020). The dynamic processing model of working memory. Current Directions in Psychological Science, 29, 378–387.
- Treisman M, & Rostron AB (1972). Brief auditory storage: A modification of Sperling's paradigm. Acta Psychologica, 36, 161–170.
- Rouder JN, Morey RD, Speckman PL, & Province JM (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56, 356–374.
- Saults JS, & Cowan N (2007). A central capacity limit to the simultaneous storage of visual and auditory arrays in working memory. Journal of Experimental Psychology: General, 136, 663–684.
- Schendel ZA, & Palmer C (2007). Suppression effects on musical and verbal memory. Memory & Cognition, 35, 640–650. doi: 10.3758/BF03193302
- Souza AS, & Oberauer K (2017). The contributions of visual and central attention to visual working memory. Attention, Perception, & Psychophysics, 79, 1897–1916. doi: 10.3758/s13414-017-1357-y
- Sperling G (1960). The information available in brief visual presentations. Psychological Monographs, 74 (Whole No. 498).
- Hu Y, Allen RJ, Baddeley AD, & Hitch GJ (2016). Executive control of stimulus-driven and goal-directed attention in visual working memory. Attention, Perception, & Psychophysics, 78, 2164–2175.
- Uittenhove K, Chaabi L, Camos V, & Barrouillet P (2019). Is working memory storage intrinsically domain-specific? Journal of Experimental Psychology: General, 148, 2027–2057.
- Vandierendonck A (2016). A working memory system with distributed executive control. Perspectives on Psychological Science, 11, 74–100.
- Vergauwe E, Barrouillet P, & Camos V (2010). Do mental processes share a domain general resource? Psychological Science, 21, 384–390.
- Vogel EK, Woodman GF, & Luck SJ (2006). The time course of consolidation in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 32, 1436–1451.
- Williamson VJ, Baddeley AD, & Hitch GJ (2010). Musicians’ and nonmusicians’ short-term memory for verbal and musical sequences: Comparing phonological similarity and pitch proximity. Memory & Cognition, 38, 163–175. doi: 10.3758/MC.38.2.163
- Xu M, Fu Y, Yu J, Zhu P, Shen M, & Chen H (2020). Source information is inherently linked to working memory representation for auditory but not for visual stimuli. Cognition, 197, 104160.
- Zhang W, & Luck SJ (2009). Sudden death and gradual decay in visual working memory. Psychological Science, 20, 423–428.