Encoding location and serial order in auditory working memory: evidence for separable processes

Franco Delogu; Tanja C W Nijboer; Albert Postma

doi:10.1007/s10339-012-0442-3

2012 May 23;13(3):267–276.

PMCID: PMC3402662 PMID: 22618605

Abstract

In this study, we investigated the interactions between temporal and spatial information in auditory working memory. In two experiments, participants were presented with sequences of sounds originating from different locations in space and were then asked to recall either their position or their serial order. In Experiment 1, attention during encoding was manipulated by contrasting ‘pure’ blocks (i.e., location-only or serial-order-only trials) with ‘mixed’ blocks (i.e., different percentages of spatial and serial-order trials). In Experiment 2, ‘pure’ blocks were contrasted with blocks in which spatial and serial-order trials were intermixed with a third task requiring a semantic categorization of the sounds. Results from both experiments showed that, whereas serial-order recall is linearly affected by the simultaneous encoding of a concurrent feature, the recall of position is mostly unaffected by concurrent feature encoding. In contrast, overall performance was lower for spatial recall than for serial recall. We conclude that the serial order and the location of items appear to be encoded independently in auditory working memory. Serial order is easier to recall but strongly affected by the processing of concurrent item dimensions, whereas item location is more difficult to recall but relatively automatic, as shown by its strong resistance to interfering dimensions during encoding.

Keywords: Automatic encoding, Attention, Spatial, Serial order, Auditory working memory, Environmental sounds

Introduction

When keeping track of events in memory, we have to remember what happened, where it happened, and when it happened. Do we maintain these different dimensions of a stimulus in an integrated representation in working memory, or do we keep separate traces for each of these domains?

On the one hand, numerous studies have indicated that information about the identity, the location, and the serial order of perceptual objects can be encoded independently in working memory (WM). For example, evidence for separate encoding of objects and locations in visuospatial WM has been found repeatedly (e.g., Klauer et al. 2004; see also Zimmer 2008 for a review). Serial order per se also appears to be dissociable from the type of information contained in the item sequence (Amiez and Petrides 2007; Kesner et al. 1994; Milner et al. 1991) and from the location of items within the sequence (Dutta and Nairne 2003; Healy 1975). Moreover, recent neuroimaging studies suggest that the encoding of item identity, location, and serial order is mediated by different brain regions (see Courtney et al. 2007 for a review).

On the other hand, several studies have shown that these event features can be integrated into unified memory representations through feature-binding mechanisms (see, e.g., Prabhakaran et al. 2000; Jiang et al. 2000; Maybery et al. 2009). Accordingly, Baddeley’s revised working memory model includes the episodic buffer, a component responsible for integrating different kinds of information into short-term multidimensional representations (Baddeley 2000).

Until recently, however, not all binding processes had been studied in depth. While feature–feature binding (see the seminal works of Treisman 1999 and Luck and Vogel 1997) and feature–location binding (see, among others, Prabhakaran et al. 2000) have been investigated extensively, less attention has been devoted to the mechanisms that bind the serial order and the location of items. A notable exception is a recent study by Gmeindl et al. (2011), which examined how serial-order information is bound during encoding to either item identity or item location. The authors compared memory span tasks for locations and letters in conditions in which serial order was either task-relevant or task-irrelevant. Participants failed to detect changes in serial order more often in the spatial task than in the identity task. Moreover, when participants were not required to remember serial order, they tended to recall the correct serial order for item identity, but not for item location. The authors argued that verbal identity and spatial information are maintained through different rehearsal mechanisms: a serial rehearsal for verbal information and a multilocation configural rehearsal for spatial information. They concluded that serial order is more efficiently bound to the identity of the stimuli than to specific spatial positions. By contrast, other studies indicate that verbal and spatial stimuli show similar or functionally equivalent serial position curves (Smyth and Scholey 1996; see also Parmentier 2011 for a review). Such equivalence suggests that, as with verbal encoding, the spatial encoding of a sequence of locations does involve serial processing.

Cross-domain interference between spatial and temporal features was investigated by Dutta and Nairne (2003). Their participants selectively attended to either spatial or temporal (serial) information during a speeded classification task while ignoring irrelevant variation along the other dimension. Participants could selectively ignore temporal or spatial variation when no recall of the irrelevant dimension was required, but they suffered interference when information from both dimensions had to be remembered (Dutta and Nairne 2003). Similar results were reported by van Asselen et al. (2006), who asked participants to recall either the serial order or the exact individual positions of sequentially presented visual items. To investigate the automaticity of spatiotemporal integration, they manipulated attention toward each of the two dimensions by biasing the expectancy of a spatial or a temporal task across different blocks of trials. In two ‘pure’ blocks, participants were presented exclusively with temporal or spatial trials. In two ‘mixed’ blocks, the majority of trials (80 %) probed one dimension (temporal or spatial) and the remaining trials (20 %) probed the other. Accuracy was higher for the expected task than for the less-expected task in both the spatial and the temporal domains. The authors concluded that attention plays an important role during the encoding of both the location and the serial order of visual objects (van Asselen et al. 2006). Performance in the 20 % condition was nevertheless clearly above chance, suggesting that the unattended feature is also partially encoded automatically.

Taken together, these studies offer a first indication that integrating the serial and spatial information of items, with or without simultaneous encoding of item identity, is markedly demanding. Spatial–temporal binding seems to be more difficult than identity–temporal binding (Gmeindl et al. 2011), and the simultaneous maintenance of the serial order and location of items is effortful (Dutta and Nairne 2003) and significantly modulated by how attentional resources are distributed across the two dimensions during encoding (van Asselen et al. 2006).

Because these previous studies used only visual stimuli, it is unknown whether such location–order binding costs generalize to other sensory modalities. A comparison with the auditory domain is particularly relevant, as there is strong evidence that vision is dominant in spatial processing (Kubovy 1988; Morein Zamir et al. 2003), whereas audition is dominant in temporal and sequential processing (Kubovy 1988; Conway et al. 2009). Such a primacy of temporal over spatial processing in the auditory domain could therefore affect location–order binding.

In line with the foregoing, in the current study we investigated how spatial and temporal information are combined in auditory working memory encoding. We modified the experimental paradigm of van Asselen et al. (2006), adapting it to the auditory modality and adding new experimental conditions. We presented participants with two blocks of trials that were either exclusively temporal or exclusively spatial, and with two blocks in which the majority of trials (80 %) probed one domain (temporal or spatial) and a minority (20 %) probed the other (spatial or temporal; hence, less expected). Most importantly, we added a fifth block in which the expectations of recalling the spatial and the temporal dimensions were equal (50–50 %). Because participants did not know which of the two tasks they were going to perform, they were forced to encode and maintain in memory all types of information: the serial order, the spatial location, and, of course, the identity of the auditory items.

We hypothesized that if the location and serial order of auditory items were automatically integrated in a joint representation in auditory WM, no differences in their recall should be found as a function of the amount of attention dedicated to the target dimension during encoding. On the contrary, if the two dimensions were not automatically integrated, the intention to learn should play a significant role, and accuracy should increase as a function of the amount of attention devoted to the target feature.

Moreover, one of the two features, either the position or the serial order of items, may be encoded more primarily and more automatically than the other. In that case, we expect dual encoding to have smaller negative effects on the recall of this primary feature. We tested these hypotheses in two experiments. In the first experiment, participants expected to recall only the items’ location, only their serial order, or both dimensions. In the second experiment, we contrasted trials in which participants expected to recall only location or only serial order with trials that also required encoding a non-spatial, non-temporal stimulus dimension.

Experiment 1

In this first experiment, we aimed at testing whether information about item location and item serial order are automatically integrated in auditory WM encoding. Sequences of auditory stimuli were presented from five different locations, and participants were asked to recall either their location or their serial order. Attention was manipulated by varying the proportion of serial-order and spatial trials between blocks (i.e., 100, 80, 50, and 20 % of location versus serial-order trials).

Method

Participants

Twenty students of the University of Utrecht (mean age: 25.5 years (SD: 5.9), 11 females) participated in exchange for course credits or a small amount of money. All participants reported normal hearing and they were all right-handed.

Apparatus

Five loudspeakers were used to present the auditory sequences (see Fig. 1). They were positioned 30° apart in azimuth, at angles of −60°, −30°, 0°, +30°, and +60° (0° corresponds to the direction faced by the participant). The loudspeakers were placed at approximately the head height of the seated participant (1.25 m above the ground), at a distance of 1 m from the participant’s head. A sixth loudspeaker (hereafter, the test loudspeaker) was positioned behind the participant (180°), at the same height as the other five and approximately 60 cm from the participant’s head. Sound-absorbing curtains were arranged on the wall to minimize sound reflections. All sounds were presented at an average level of 70 dB. A response box was placed in front of the participant, with the keys arranged ergonomically to reduce muscular tension and fatigue. Sounds were presented through the loudspeakers by an 8-channel audio card controlled by a custom-written Matlab (The Mathworks, MA) script.
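To make the loudspeaker layout concrete, the short Python sketch below converts the azimuths and distances given above into head-centred coordinates and computes the straight-line spacing between adjacent loudspeakers. The coordinate convention (x to the right, y straight ahead) and the exact 0.6 m test-loudspeaker distance are assumptions for illustration; the text gives only an approximate value.

```python
import math

# Azimuths (degrees) of the five presentation loudspeakers, as described above;
# 0° is straight ahead, negative values are to the participant's left.
AZIMUTHS_DEG = [-60, -30, 0, 30, 60]
FRONT_RADIUS_M = 1.0   # distance of the front loudspeakers from the head
TEST_RADIUS_M = 0.6    # assumed distance of the test loudspeaker at 180°


def speaker_xy(azimuth_deg: float, radius_m: float) -> tuple:
    """Head-centred coordinates in metres: x to the right, y straight ahead."""
    a = math.radians(azimuth_deg)
    return (radius_m * math.sin(a), radius_m * math.cos(a))


def chord_m(separation_deg: float, radius_m: float) -> float:
    """Straight-line distance between two sources on a circle of given radius."""
    return 2 * radius_m * math.sin(math.radians(separation_deg) / 2)


front = [speaker_xy(az, FRONT_RADIUS_M) for az in AZIMUTHS_DEG]
test = speaker_xy(180, TEST_RADIUS_M)

if __name__ == "__main__":
    for az, (x, y) in zip(AZIMUTHS_DEG, front):
        print(f"{az:+4d} deg -> x={x:+.3f} m, y={y:+.3f} m")
    print(f"test speaker -> x={test[0]:+.3f} m, y={test[1]:+.3f} m")
    print(f"adjacent-speaker spacing: {chord_m(30, FRONT_RADIUS_M):.3f} m")
```

At a 1 m radius, sources 30° apart are roughly 0.52 m apart (2 × sin 15°).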

Stimuli and tasks

The stimuli consisted of a set of 40 environmental sounds, described in a previous study (see Delogu et al. 2009). Environmental sounds, rather than spoken words, were chosen to allow a more direct comparison with previous visual-domain studies that also used non-linguistic stimuli (pictures) (see, e.g., van Asselen et al. 2006). All sounds lasted 2 s and belonged to one of three semantic categories: human sounds (e.g., baby crying, person coughing), animal sounds (e.g., cat meowing, bird chirping), and tool sounds (e.g., car engine, telephone ring). The sounds were presented in sequences of five, each originating from a different loudspeaker. Each sequence contained a random selection of items, with the constraint that a sound could not appear in a sequence if it had been presented in either of the two preceding sequences. The experiment included five blocks of trials: two blocks of 10 sequences each (‘pure’ blocks), in which participants performed only one task (either the location or the serial-order task) throughout the block; two blocks of 20 sequences each (80–20 % blocks), in which participants recalled one feature in 80 % of the trials and the alternative feature in the remaining 20 %; and one block of 16 sequences (50–50 % block), in which participants performed the location task in 50 % of the trials and the serial-order task in the remaining 50 %. The order of the five blocks was counterbalanced across participants.
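The sequence constraint described above can be sketched as a small generator. This is a minimal illustration, not the authors' Matlab script: uniform random sampling, the assignment of loudspeakers by random permutation, and the names `make_sequences` and `LOOKBACK` are assumptions, and any balancing of semantic categories is not modelled.

```python
import random

N_SOUNDS = 40     # size of the environmental-sound set
SEQ_LEN = 5       # sounds per sequence, one per loudspeaker
N_SPEAKERS = 5
LOOKBACK = 2      # a sound may not reappear within the two preceding sequences


def make_sequences(n_sequences: int, rng: random.Random) -> list:
    """Return sequences of (sound_id, speaker_index) pairs.

    Sounds are drawn uniformly (an assumption) from those not used in the
    previous LOOKBACK sequences; loudspeakers are assigned by a random
    permutation so that each sound in a sequence comes from a different one.
    """
    sequences = []
    for _ in range(n_sequences):
        recent = {s for seq in sequences[-LOOKBACK:] for s, _ in seq}
        allowed = [s for s in range(N_SOUNDS) if s not in recent]
        sounds = rng.sample(allowed, SEQ_LEN)
        speakers = rng.sample(range(N_SPEAKERS), SEQ_LEN)
        sequences.append(list(zip(sounds, speakers)))
    return sequences
```

For example, `make_sequences(76, random.Random(1))` would yield enough sequences for all five blocks (10 + 10 + 20 + 20 + 16).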

Procedure

Participants were first trained to use the five keys to indicate either the position of a sound (the leftmost key corresponding to the leftmost loudspeaker and the rightmost key to the rightmost loudspeaker) or its serial order (the leftmost key corresponding to the first sound in the sequence and the rightmost key to the last). Before the experiment, they also performed an auditory localization task in which they indicated the position of 100 sounds, each originating randomly from one of the five loudspeakers. Accuracy in this localization task was high (mean: 90 %, SD: 4 %), indicating that the azimuthal separation between sound sources was easy to detect. Since the smallest detectable change in azimuthal position is always below 4° for the positions included in our task (Mills 1958), the 30° separation was safely above sensory threshold. This separation was chosen as a compromise between two competing needs: (1) sufficient discriminability between loudspeakers and (2) limiting the use of easy categorical labels to encode item positions (e.g., cardinal points with 45°).

During the experiment, detailed auditory instructions indicating which task participants were about to perform (serial order versus location) were presented before each block. In the two 80–20 % blocks, participants were explicitly told which feature they would be asked to recall in the majority of trials, and that in a small proportion of trials they would be asked to perform the alternative task. In the 50–50 % block, participants were explicitly told that neither feature was prevalent and that, to maximize their chances of correct recall, they should attend to both features during sequence presentation. The task probabilities (100, 80, 50, and 20 %) were explicitly and unequivocally explained to participants in order to bias their attention toward either the spatial or the serial-order features during encoding. In this way, for each task, we operationally defined four attention conditions: full attention (100 %), partially diverted attention (80 %), divided attention (50 %), and marginal attention (20 %).
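The block sizes and percentages determine how many test sequences fall into each of the four attention conditions. The sketch below tallies them; the labels `location` and `order` and the names `BLOCKS` and `trials_per_condition` are hypothetical, and the counts follow directly from the block descriptions given in the Stimuli and tasks section.

```python
from collections import Counter

# (feature emphasised in the block, number of sequences, % of trials on that feature)
BLOCKS = [
    ("location", 10, 100),  # pure location block
    ("order", 10, 100),     # pure serial-order block
    ("location", 20, 80),   # 80 % location, 20 % order
    ("order", 20, 80),      # 80 % order, 20 % location
    ("location", 16, 50),   # 50-50 % block (which label is listed first is arbitrary)
]
OTHER = {"location": "order", "order": "location"}
CONDITION_NAMES = {100: "full", 80: "partially diverted", 50: "divided", 20: "marginal"}
SOUNDS_PER_SEQUENCE = 5


def trials_per_condition(blocks):
    """Count sequences per (feature, expectancy %) cell across all blocks."""
    counts = Counter()
    for feature, n_seq, pct in blocks:
        n_expected = n_seq * pct // 100
        counts[(feature, pct)] += n_expected
        if n_seq > n_expected:
            counts[(OTHER[feature], 100 - pct)] += n_seq - n_expected
    return counts


if __name__ == "__main__":
    for (feature, pct), n in sorted(trials_per_condition(BLOCKS).items()):
        print(f"{feature:8s} {pct:3d} % ({CONDITION_NAMES[pct]} attention): "
              f"{n} sequences, {n * SOUNDS_PER_SEQUENCE} responses")
```

Under this reading, each task has 10 sequences at 100 %, 16 at 80 %, 8 at 50 %, and 4 at 20 %; with five test sounds per sequence, each 20 % cell rests on 20 responses per participant.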

Participants triggered each learning sequence by pressing a key on the response box and then listened to the five sounds. After the learning sequence, an instruction word (either ‘ORDER’ or ‘POSITION’) was played from the test loudspeaker, indicating which feature had to be recalled. All the sounds of the learning sequence were then presented again, one by one in random order, from the test loudspeaker. After each test sound, participants indicated, depending on the condition, either the location or the serial position that sound had occupied in the learning sequence (see Fig. 1). The experiment lasted approximately 70 min. This procedure for assessing serial-order memory differs substantially from traditional methods such as the immediate serial recall (ISR) task. We used it because the serial-order task had to be integrated with the object-location memory task under different attentional conditions.
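The trial structure and scoring rule described above can be summarized in a few lines. The function name `score_trial`, the 0 to 4 key coding, and the `respond` callback are hypothetical conveniences for illustration; in the experiment, responses were collected on a hardware response box.

```python
import random


def score_trial(sequence, probe, respond, rng):
    """Proportion correct for one trial.

    sequence: (sound_id, speaker) pairs in presentation order; speakers and
              keys are coded 0 (leftmost / first) to 4 (rightmost / last).
    probe:    "POSITION" or "ORDER", announced after the learning sequence.
    respond:  hypothetical callback (sound_id, probe) -> key index 0..4.
    """
    if probe not in ("POSITION", "ORDER"):
        raise ValueError(f"unknown probe: {probe!r}")
    # Correct key for each sound: its loudspeaker or its serial position.
    answer = {
        sound: (speaker if probe == "POSITION" else serial_pos)
        for serial_pos, (sound, speaker) in enumerate(sequence)
    }
    # Test sounds are replayed one by one, in random order, from the test speaker.
    test_order = list(answer)
    rng.shuffle(test_order)
    n_correct = sum(respond(sound, probe) == answer[sound] for sound in test_order)
    return n_correct / len(sequence)


if __name__ == "__main__":
    seq = [(7, 2), (3, 0), (12, 4), (5, 1), (20, 3)]
    speaker_of = dict(seq)

    def always_position(sound, probe):
        # A responder who reports the loudspeaker whatever the probe.
        return speaker_of[sound]

    rng = random.Random(0)
    print(score_trial(seq, "POSITION", always_position, rng))  # 1.0
    print(score_trial(seq, "ORDER", always_position, rng))     # 0.0
```

Because every key is a possible answer for each test sound, random guessing would be expected to score about one in five per trial.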

Analysis

A two-way repeated-measures ANOVA with the factors feature (location vs. serial order) and expectancy (100, 80, 50, and 20 %) was performed on the mean percentage of correct responses. The Greenhouse–Geisser correction was applied whenever the assumption of sphericity was violated. Fisher’s LSD test was used for post hoc comparisons. Two participants were excluded from the final analysis because their performance, in at least one condition, was more than 2 standard deviations below the group average. F (2, 21) = 13.74.
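The exclusion criterion can be sketched as follows. Two details are assumptions, since the text does not specify them: the group average and SD are computed separately for each condition, and the sample SD (n − 1 denominator) is used. The function name `excluded_participants` is hypothetical.

```python
from statistics import mean, stdev


def excluded_participants(scores):
    """scores: {participant: {condition: proportion correct}}.

    Flags a participant if, in at least one condition, their score lies more
    than 2 SD below the group mean for that condition (per-condition mean and
    sample SD are assumptions).
    """
    conditions = next(iter(scores.values())).keys()
    flagged = set()
    for cond in conditions:
        values = [s[cond] for s in scores.values()]
        cutoff = mean(values) - 2 * stdev(values)
        flagged |= {p for p, s in scores.items() if s[cond] < cutoff}
    return flagged
```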

Results

A main effect of feature was found, F (1, 19) = 11.73, p = 0.003, ηp² = 0.382, showing that the serial order of items was easier to recall than their location. Expectancy also yielded a main effect, F (3, 57) = 16.89, p < 0.001, ηp² = 0.47, indicating that recall accuracy for either feature increased with the expectation of having to recall that feature.

Importantly, a significant interaction between feature and expectancy was obtained, F (3, 57) = 5.21, p = 0.003, ηp² = 0.215. As Fig. 2 shows, task expectancy strongly and linearly influenced accuracy in the serial-order task, while affecting the location task only marginally. Post hoc pairwise comparisons showed that accuracy in the 100 % location condition, where participants only had to memorize the location of sounds, was equivalent to accuracy in the 80 % location condition, where attention during encoding was partially diverted toward serial order (p = 0.434), and to the 50 % location condition, where participants were instructed to encode both features (p = 0.922). The only condition in which spatial recall was significantly impaired was the 20 % condition, in which location recall was least expected (p = 0.041 for the 100 vs. 20 % comparison).
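The reported effect sizes can be checked against the F ratios: partial eta squared satisfies ηp² = F·df_effect / (F·df_effect + df_error). The sketch below applies this standard identity to the three reported effects. The identity is unaffected by the Greenhouse–Geisser correction, which rescales both degrees of freedom by the same factor.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error, reported effect size) for Experiment 1
REPORTED = {
    "feature": (11.73, 1, 19, 0.382),
    "expectancy": (16.89, 3, 57, 0.47),
    "feature x expectancy": (5.21, 3, 57, 0.215),
}

if __name__ == "__main__":
    for name, (f, d1, d2, reported) in REPORTED.items():
        print(f"{name}: computed {partial_eta_squared(f, d1, d2):.3f}, reported {reported}")
    # computed values: 0.382, 0.471, 0.215
```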

Fig. 2: Accuracy in the location and serial-order tasks by level of expectancy (100, 80, 50, and 20 %). *Error bars* represent standard errors of the mean

In contrast, in the serial-order task, accuracy showed a linear effect of expectancy: the more expected the recall of serial order was during encoding, the better the serial recall. Post hoc tests (p < 0.05 for all pairwise comparisons between serial-order conditions) showed that accuracy became progressively poorer as the probability of performing the alternative task increased. Notably, accuracy in the 20 % condition was above chance level for both tasks: two separate one-sample t tests indicated that accuracy differed significantly from chance in both the location task, t (19) = 5.8, p < 0.001, and the serial-order task, t (19) = 6.6, p < 0.001.
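A one-sample t test against chance can be sketched with the Python standard library alone. The chance value of 1/5 is an assumption based on the five response keys available for each test sound; the function name `one_sample_t` is hypothetical, and `scipy.stats.ttest_1samp` offers an equivalent test that also returns a p value.

```python
import math
from statistics import mean, stdev

CHANCE = 1 / 5  # assumed: five response keys, one correct key per test sound


def one_sample_t(scores, mu=CHANCE):
    """t statistic and degrees of freedom for H0: mean accuracy == mu."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two participants")
    standard_error = stdev(scores) / math.sqrt(n)
    return (mean(scores) - mu) / standard_error, n - 1
```

With 20 participants, the degrees of freedom are 19, matching the reported t (19) values.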

A post-experimental questionnaire was administered to all participants to collect descriptive data on the strategies used to maintain items and on the perceived difficulty of the location and serial-order tasks, and to assess how attention influenced the subjective difficulty of the tasks. Results are summarized in Fig. 3.

Fig. 3: Post-experimental questionnaire. *Error bars* represent standard errors of the mean

Discussion Experiment 1

In Experiment 1, we investigated whether location and serial-order information are integrated in working memory. We compared recall accuracy in blocks in which participants were asked to attend to only one dimension (full attention blocks) with accuracy in blocks in which recall of one of the two dimensions was prevalent (80–20 % blocks) and in blocks in which participants had to attend to both features (divided attention blocks).

Accuracy in the full attention blocks was markedly higher in the temporal task (86 %) than in the spatial task (65 %). This result appears to support the notion that the auditory modality is specifically tuned to processing serial-order information. Alternatively, it could be due to differences between spatial and serial-order rehearsal: whereas serial order can be maintained by rehearsing the item identities alone, a rehearsal that is likely to be serial even when not required (cf. Gmeindl et al. 2011), maintaining item location requires additional, specifically spatial processing. A final possibility is that the low accuracy in the location task reflects the generally lower spatial sensitivity of the auditory modality. It should be noted, though, that in the localization trials reported in the Method section, perceptual discrimination of sound location was quite high with the present set-up. Notably, van Asselen et al. (2006) observed higher performance (80 %) in a visual location task than we found in the auditory location task (65 %), even though they used longer stimulus sequences (7 items). Admittedly, the present experimental set-up differed from previous studies in more respects than stimulus modality alone.

Concerning the overall influence of attention, participants were more accurate when expecting to recall only one feature rather than both. This result suggests that, in auditory WM, the formation of an integrated representation of serial-order and spatial information is not automatic, consistent with the visual-domain studies of Dutta and Nairne (2003) and van Asselen et al. (2006). The post-experimental questionnaire revealed that the perceived difficulty of the task increased progressively as attention was reduced.

More interestingly, our results indicate that attention during encoding affects the serial-order task more than the location task: item location memory appears less affected by expectancy than serial-order memory. As tolerance of concurrent processing loads is a sign of automaticity (Andrade and Meudell 1993; Ellis 1990), we argue that spatial encoding is more automatic than serial-order encoding in the auditory domain. This result is consistent with previous findings in the visual domain, where spatial information has been shown to be processed relatively automatically even when attention is focused on other features (Köhler et al. 2001; Olson and Marshuetz 2005). Notably, however, this last result is inconsistent with the expected primacy of temporal and sequential processing over spatial processing in the auditory domain (Kubovy 1988; Conway et al. 2009).

From the results of Experiment 1, we cannot determine why dual encoding does not impair sound localization. It could be that the specific kind of concurrent information (i.e., serial order) does not interfere with spatial processing, or that the encoding of sound position resists interference from any concurrent information, serial or not. Similarly, we cannot decide whether serial-order encoding is specifically affected by concurrent spatial information or would also be affected by non-spatial information. To verify whether these asymmetric effects of attention on spatial and serial-order recall depend on the type of concurrent information, we conducted a second experiment in which spatial and serial-order encoding were each combined with the encoding of a non-spatial, non-serial feature of the stimulus.

Experiment 2

In Experiment 1, serial-order and location information served both as target features and as interfering memory loads. Consequently, it is difficult to establish why spatial encoding was not impaired by the concurrent encoding of serial-order information. In Experiment 2, location and serial-order information had to be encoded either alone or together with a third, independent feature: the semantic category of the environmental sounds, which could belong either to the group of living things (i.e., human or animal sounds) or to the group of non-living things (i.e., tool sounds). We hypothesized that if, even when combined with this new concurrent information, spatial encoding suffered less interference from a concurrent memory load than serial-order encoding did, we could conclude that spatial information is encoded more automatically than serial order in auditory WM.