Author manuscript; available in PMC: 2020 Aug 1.
Published in final edited form as: Psychol Bull. 2019 Aug;145(8):822–847. doi: 10.1037/bul0000199

Short-term Memory Based on Activated Long-term Memory: A Review In Response to Norris (2017)

Nelson Cowan 1
PMCID: PMC6650160  NIHMSID: NIHMS1026827  PMID: 31328941

Abstract

Short-term memory (STM), the limited information temporarily in a state of heightened accessibility, includes just-presented events and recently-retrieved information. Norris (2017) argued for a prominent class of theories in which STM depends on the brain keeping a separate copy of new information, and against alternatives in which the information is held only in a portion of long-term memory (LTM) that is currently activated (aLTM). Here I question premises of Norris’ case for separate-copy theories in the following ways. (1) He did not allow for implications of the common assumption (e.g., Cowan, 1999; Cowan & Chen, 2009) that aLTM can include new, rapidly-formed LTM records of a trial within an STM task. (2) His conclusions from pathological cases of impaired STM along with intact LTM are tenuous; these rare cases can be explained by impairments in encoding, processing, or retrieval related to LTM rather than passive maintenance. (3) Although Norris reasonably allowed structured pointers to aLTM instead of separate copies of the actual item representations in STM, the same structured pointers may well be involved in long-term learning. (4) Last, models of STM storage can serve as the front end of an LTM learning system rather than being separate. I summarize evidence for these premises and an updated version of an alternative theory in which storage depends on aLTM (newly clarified), and, embedded within it, information enhanced by the current focus of attention (Cowan, 1988, 1999), with no need for a separate STM copy.

Keywords: short-term memory, working memory, long-term memory, activation, capacity


Recently, Norris (2017; henceforth Norris) reviewed the evidence for several alternative theoretical views regarding the mechanisms of short-term memory (STM), the limited information held in mind only temporarily, with special attention to the serial recall of lists. He suggested in the title and throughout the article that, after the many years of research in this field, STM and long-term memory (LTM), the vast store of information learned over a lifetime, are still different. According to Norris, this difference between STM and LTM must include not merely a portion of LTM that is in a special, activated state (e.g., Atkinson & Shiffrin, 1971; Cowan, 1988, 1999; Norman, 1968; Ruchkin, Grafman, Cameron, & Berndt, 2003; Shiffrin, 1975), a suggestion that can be traced back to the beginning of the field of experimental psychology (see Raaijmakers & Shiffrin, 2003). According to Norris (and cf. Baddeley, 2003), STM must include a separate copy of the information or, at least, a set of temporary pointers to the relevant LTM items to represent the structure of the materials in the set to be remembered.

Given the great importance of STM for human information processing, understanding the basis of STM is theoretically crucial. In the present response to Norris, I argue that a separate copy of the information is not needed. Further, in the absence of a separate copy, pointers are indeed needed, but they need not be separate from the long-term learning system.

Among the theories that Norris disputed is the embedded-processes theoretical framework (Cowan, 1988, 1999), in which STM is conceived as activated long-term memory (aLTM) and, embedded within it, more-processed information comprising up to several separate items or ideas in the focus of attention (FoA) concurrently. The dispute interested me but, when I first read Norris’ article, I was unconcerned about our differences in opinion. Later, though, I noticed that misconceptions about the alternative models stated by Norris were repeated frequently and enthusiastically among researchers with similar views, as strong evidence for the separate-copy theory. I believe that these misconstrued points considerably distort the debate. The purposes of the present reply are thus to describe the points overlooked by Norris and misconceptions arising from them, and to present an updated version of the embedded-processes theory, with no separate copy of information outside of the LTM system, in light of the last 30 years of evidence, and to show that it remains viable.

Organization of the Reply

First, I describe the points raised by Norris, and then the key distinctions between three views to be compared: the separate-copy view articulated by Norris, a unitary memory view that Norris critiques, and my embedded-processes view that he also critiques. Next, I introduce replies to key points raised by Norris (see Table 1 summary). Last, I explain in more detail a viable alternative conception of processing with no separate copy of information in STM, depicted in Figure 1.

Table 1.

Responses to the Arguments for a Separate Copy of Information in STM

| Description of Argument | Argument for Separate Copy (with key references) | Response Against Separate Copy (with key references) |
| --- | --- | --- |
| 1. Storage of new configurations is needed in STM | Activated long-term memory (aLTM) has no way to know about the spatial or temporal configuration of stimulus elements (Baddeley, 2003; Norris, 2017) | Everyone recognizes there must be new, rapid learning of information in STM tasks (e.g., Keppel & Underwood, 1962), and the newly-learned information is typically still in an activated state, aLTM, at the time of test (Cowan, 1999). |
| 2. Token representations cannot be represented in aLTM, only types | aLTM cannot represent separate tokens of the same type, as in the series 1-7-1; this is the problem of two (Jackendoff, 2002; Norris, 2017) | aLTM includes rapid learning of information, and therefore can include the same episodic information about tokens that one adds to LTM (Cowan, 1999; Nairne & Neath, 2001) |
| 3. No extant model of STM performance based on aLTM | Unlike separate-copy mathematical models of STM (e.g., Burgess & Hitch, 2006; Henson, 1998), there is no well-specified aLTM model of STM (Norris, 2017) | Including new learning as part of aLTM changes the need, because separate-STM-copy theories might be reclassified as the front end of long-term learning. Many long-term learning models exist. A few models deal explicitly with aspects of aLTM and new learning (Anderson & Matessa, 1997; Cowan et al., 2012). |
| 4. STM recall differs from LTM recall in its properties | Hebb (1961) learning doesn’t work when only alternate items are repeated between trials (Cumming, Page, & Norris, 2003) so order cues differ. STM operates with phonological information; LTM, with semantic information. | There is evidence that long-term learning with repetition heavily relies on item-item associations (Zaromb et al., 2006), not just item-position as implied by Cumming et al. LTM with reduced interference looks more similar to STM (Dewar et al., 2010; Ecker et al., 2015a, 2015b). Unlike the usual procedures, STM can use semantic information (Potter, 1993) and LTM can be made to use phonological cues when such cues are best suited to the encoding context (Morris et al., 1977). Order retention suffers in dyslexia within both STM and LTM (Martinez Perez et al., 2013; Szmalec et al., 2011). |
| 5. Separate STM and LTM decay rates in models for a reason | Models of STM include 2 decay rates, fast and slow (Burgess et al., 2006; Oberauer et al., 2013) | Episodic LTM learning can be rapid (Wixted et al., 2014, 2018) and the data do not strongly support the need for both decay rates. |
| 6. Small periods of time play an important role in STM | Effects of speed of rehearsal (Baddeley et al., 1975) and free time for refreshing (Barrouillet et al., 2011) implicate processes to counteract temporal decay. | Decay is practically non-existent for well-learned items presented in slow series (Oberauer & Lewandowsky, 2008), which can remain active in aLTM in large numbers (Endress & Potter, 2014; Wolfe, 2012), whereas decay over several seconds is found for poorly-learned items presented quickly or in brief spatial arrays (Ricker, 2015; Ricker & Cowan, 2014). Unused time between items may result in the opportunistic use of the hippocampus for further consolidation (Mednick et al., 2011). |
| 7. Failure of Hebb-like effect for visual arrays, further dissociating STM from LTM | Spatial arrays do not yield learning in recognition tasks (Logie, Brockmole, & Vandenbroucke, 2009), suggesting no clear LTM representation formed | Learning may be impeded because a different subset of the spatial array is entered into STM on each trial, and/or because change trials introduce interference with the representation |
| 8. Variable binding must be encoded into STM | It is not enough to retain STM information about binding; roles must be retained, e.g., one mention of dog as agent and another as patient in the same sentence | Patients with hippocampal damage and LTM deficiency also show a deficit in variable binding, in sentence comprehension requiring variable binding for pronoun assignment (Kurczek et al., 2013) |
| 9. Neuropathological deficits distinguish STM from LTM | Deficits in STM with preserved LTM show that STM cannot simply be a portal for LTM, but rather a separate copy of information (Warrington & Shallice, 1969) | Specific deficits in STM performance could come from deficient processes specific to STM maintenance (e.g., rehearsal: Cowan, 1988; or other kinds of deficient coding: Cermak, 1997; Morey, 2018; Morey et al., in press; Ruchkin et al., 2003). Also, the LTM procedures used have not closely matched the STM procedures used. |
| 10. Tasks are impure measures of either STM or LTM | Deficits in STM that accompany LTM damage are restricted to supraspan lists, where LTM learning occurs (Jeneson & Squire, 2012) | LTM learning may make use of the focus of attention once for subspan lists but reiteratively for supraspan lists (Rhodes & Cowan, 2018), and the reiterative process could be impaired. |
| 11. Neuroimaging as a correlation fallacy | The finding of STM activity in the same areas known to mediate LTM does not imply that this LTM activity causes STM maintenance. It is just a correlation (Baddeley, 2003) | The scientific method seeks the most parsimonious and adequate theory that can accommodate all of the evidence, including correlations and causation. The neuroscientific evidence for the embedded-processes approach includes correlational neuroimaging-behavior correspondences (e.g., Chein & Fiez, 2010; Cowan, 2011; Cowan et al., 2011; Kalm & Norris, 2017; Lewis-Peacock et al., 2012; Li et al., 2014; Majerus et al., 2016; Öztekin et al., 2008) and causal TMS evidence (Postle et al., 2006; Rose et al., 2016). |

Figure 1.

A simplified sketch of the embedded-processes model in a brain context. DLPFC=dorsolateral prefrontal cortex; IPS = intraparietal sulcus. Modeled after the proposals of Cowan (1995, 1999) with refinements from more recent literature on the focus of attention or FoA (literature on the IPS discussed in the article, e.g., Cowan et al., 2011) and on the locus of covert verbal rehearsal (Chein & Fiez, 2001, 2010). Dashed arrows represent attention-related processes, including central executive control of the contents of the FoA, pointers from the FoA to currently-attended aLTM, and connectivity to the hippocampus and adjacent areas for permanent storage of new LTM. Dotted lines represent processes that may operate outside of attention, including a covert verbal rehearsal activity making use of, and perpetuating, verbal information in aLTM, and hippocampal activities using aLTM and FoA input to record new memories that also alter the LTM regions. It is not yet clear whether unattended aLTM elements are to be represented primarily by synaptic weighting information that is invisible to fMRI (Christophel et al., 2018) or by neural activity that does not include the attention circuit (Rose et al., 2016). Some new concepts and episodes possibly might form in LTM regions with FoA involvement, but they might not survive permanently without normal hippocampal function.

It is important to identify the set of tasks under consideration. There is fairly good agreement about immediate recall tasks that can be considered indices of STM, including serial recall of lists, free recall of lists, recognition of items from lists or spatial arrays, and probed item reproduction (Cowan, 2017a; Oberauer et al., 2018). The present discussion will concentrate on these tasks and, as Norris did, will place the most emphasis on serial recall of lists. I will also suggest how common processing mechanisms may be shared by very different kinds of tasks.

Points Raised by Norris

Eleven key points raised by Norris are listed in Table 1. (1) First, Norris asserted that a new configuration of information can be saved in STM only with a separate store containing a separate copy of the information or a separate reference to it via a set of pointers, not just storage in aLTM. The latter was conceived as a temporarily very accessible state of a small amount of information from LTM. So, for example, if one tries to remember the sentence, “Three penguins jumped off of the rocks,” there could be activated representations of penguins, rocks, and jumping, but aLTM was said to be unable to form a new configuration from the familiar elements. (2) It was stated that multiple tokens of the same type cannot be represented in aLTM. If one saw the series 5-1-5, the digits (types) 1 and 5 could be represented but not the two separate tokens of the digit 5, so reconstruction of the list would supposedly be impossible. (3) It was stated that, although there are a number of computational and theoretical models of STM, none exists based only on aLTM. (4) It was stated that STM recall differs from LTM recall in its properties (for example, with phonological confusions typical of STM recall and semantic confusions typical of LTM), so that the two must have different storage mechanisms. (5) It was stated that computational models have included separate rates of memory loss over time, or decay, for STM and LTM, suggesting that they are based on different storage mechanisms. (6) It was suggested that small periods of time play an important role in STM recall (for example, the rate at which an individual can rehearse words or refresh information using attention, and the time available to do so), which would be true of an STM store but presumably not aLTM. (7) The Hebb effect (Hebb, 1961) was said to dissociate STM retention from LTM learning of the same information. For verbal sequences, STM retention and LTM learning both can succeed, but for visual arrays, STM retention can succeed where LTM learning fails, so STM and LTM learning appear to have different properties overall. (8) It was stated that a separate STM store is needed to keep track of variable binding, such as the associations between instances of the word some and the two verbs in the sentence, “Some left and some stayed.” (9) It was stated that neuropathological deficits distinguish STM from LTM storage. One can find patients with STM deficits but normal LTM, or LTM deficits but normal STM, so they presumably cannot be based on the same storage medium. (10) Tasks are impure measures of either STM or LTM so, if one finds evidence of both kinds of memory in a single procedure, this does not imply that they rely on a common storage mechanism. (11) Finally, Norris suggested that although there is neuroimaging evidence seeming to point to aLTM sites for information being retained in STM, this evidence does not indicate that the aLTM sites are responsible for STM retention; neuroimaging results are seen as correlations with behaviors, not necessarily causes. These premises will be reconsidered here (not completely in the order Norris used), after key distinctions between different views are discussed.

Key Distinctions between Different Views

In this section, three kinds of approaches to STM are described: a separate-STM-copy approach, a unitary-memory approach, and an embedded-processes approach. The last approach, the one adopted here, is further clarified in terms of three issues: decay versus interference, central executive function and attention, and the role of rapid new learning. These distinctions between views are described to set the stage for a reply to Norris’ points and a description of how an alternative, the embedded-processes view, can account for the relevant evidence.

Separate-STM-Copy Approaches

Norris proposed two versions of the separate-copy STM view. In one version, the information from the environment and, as needed, from LTM is copied into STM. Even if the ability to form new long-term memories is destroyed through brain damage, the ability to form temporary, new STM structures presumably can be preserved. In a second version of the theory that Norris proposed, it is not the structured set of item representations that is entered into STM, but temporary pointers to LTM contents, indicating which items have been presented, organized in a way that describes their relation to one another. A further preference Norris stated for both varieties is that the stores should be separate for different types of information (specifically phonological, visual-spatial, and multimodal episodic information). It was allowed that some kind of activation of LTM could exist, perhaps underpinning a sense of familiarity, but it was asserted that this aLTM would be insufficient to represent new relationships revealed in the memory trial that did not exist in LTM beforehand.

It is noteworthy that there is another, very different theoretical approach in which it has been suggested that a separate copy of information is needed for STM, one without different storage modules for different content areas but with temporary associations (Oberauer, 2009; Oberauer, Souza, Druey, & Gade, 2013), and that approach too will be briefly discussed and critiqued.

Unitary Memory Theories

Unitary memory theories (e.g., Brown, Neath, & Chater, 2007; Surprenant & Neath, 2009) hold that only one set of mechanisms and rules is needed to account for all of memory. The difference between memory in the short term and memory in the long term is said to lie in the presence of much more interference from other trials and intervening events in a delayed memory test, including both proactive interference (from material presented before the stimuli to be recalled) and retroactive interference (from material presented after the stimuli to be recalled, but before their recall). These approaches clearly have no separate STM copy.

Embedded-processes Approach

In this approach (Cowan, 1988, 1999, 2001, 2005/2016, 2010), memory is represented by LTM along with a subset of features that are in a temporarily activated state, making these items more rapidly and reliably accessible than other items in LTM. This aLTM was originally defined by the notion that features, when they are not mentally rehearsed or refreshed, decay to the point of becoming useless within several seconds, involving a loss of activation. Now, based on recent findings to be discussed (e.g., Oberauer & Lewandowsky, 2008; Ricker, 2015; Ricker & Cowan, 2014), it seems clear that the rate of loss is quite variable, depending on how completely the information was processed and consolidated into memory when it was presented.

Within aLTM, a subset of the information is highlighted by the FoA, which includes more processed, integrated information limited to about 3 to 5 independent, coherent units or chunks. STM performance is presumably based on items in the FoA supplemented by information in aLTM (including new learning, and also information recently retrieved from LTM), which is included in the FoA as needed, up to a limit of several separate chunks. In keeping with the notion that concepts of temporary memory were “evolving” (stated in the title of Cowan, 1988), the concept of aLTM is still evolving, as I will show.

Decay and interference.

The embedded-processes approach shares with the multi-component approaches the assumption of time-based decay, absent from unitary approaches. However, the embedded-processes approach shares with the unitary approach a great reliance on principles of interference between items when their features are similar (e.g., Nairne, 1990), though there are unanswered questions about the nature of interference (e.g., Hintzman, 2016).

In contrast to the embedded-processes and unitary approaches, multi-component approaches including that of Norris hold that the amount of interference depends on whether two sources of materials are saved in the same STM storage module (e.g., both verbal), in which case they interfere with one another a great deal, or in different modules (e.g., one verbal, one spatial), in which case they hardly interfere with one another. The embedded-processes approach, however, considers features of many types (e.g., acoustic, phonological, orthographic, spatial, visual, touch, taste, and smell) and their combinations (e.g., the spatial locations of sounds). These features would not all fit neatly into a few modules so, for the sake of simplicity, there is no attempt at a taxonomy of aLTM substores. When the decay rate is very slow, as for well-processed items, the information remains activated until a critical amount of interference occurs.

Central executive function and attention.

The embedded-processes approach has a greater reliance on the FoA concept than either of the other approaches. It shares with the multi-component approach the notion of central executive processes that control the flow of information between parts of the system. In the seminal work of Baddeley and Hitch (1974) that stimulated the field of working memory, the central executive included memory of abstract information, which would be similar to the FoA, but that memory was eliminated later (e.g., Baddeley, 1986). The multi-component model was then missing some mnemonic capabilities that must exist, which were later assigned to the episodic buffer (Baddeley, 2000). Its STM capabilities that were highlighted were STM for binding (association) of features or items across modalities and integrated semantic information. These are capabilities that the embedded-processes approach handles through information included in the FoA and linked to aLTM.

Role of rapid new learning.

A key property of the embedded-processes approach that Norris did not discuss is that information can be learned quite quickly, so that newly learned structures (such as the serial positions of list items, spatial positions of array items, or binding of items to semantic roles) are processed by the FoA and are concurrently learned, resulting in new aLTM material that can be used on the trial (though learning may be imperfect and later retrieval depends on interference and on retrieval cues). As in the second version of Norris’ theory, the FoA can be interpreted as holding pointers to the information in aLTM, structured to represent the new information. Unlike Norris’ theory, though, this information also alters LTM. This learning capability was made clear by Cowan (1999) and expounded upon by Cowan and Chen (2009). Norris cited the latter without commenting on the learning property.

This reply provides the argument for a system in which all storage of information in normal individuals contributes to new learning in LTM. The pointers that Norris described could be seated in the FoA in an embedded-processes conception, and the result would include new learning that would guide both STM and LTM task performance. Counter-arguments from Norris will be critiqued, and the most important points are summarized in Table 1.

Reply to Norris: aLTM with New Long-term Learning

Two overarching points need to be addressed. First, problems brought up by Norris regarding complex structures in STM, exemplified here by a discussion of types and tokens, can be addressed by new learning, without a separate STM copy of information. I will discuss how that can occur. Second, I will address a number of stated objections to the assumption in the embedded-processes approach that there is no separate STM copy of information.

How Is Structure Encoded Into STM?

The problem for encoding STM stated by Norris, versus resolution of the problem through new learning.

Norris noted that STM must include information not only about what items are currently to be remembered, but also what their relationship to one another is in the material to be remembered. A key example he offered is the distinction between types and tokens. As Norris explained, memory for the series 6-3-6 could not be encoded in aLTM, which presumably would be able to include only the types (categories) 6 and 3, but not two separate tokens (instances) of the same type (6) in first and third positions, as is needed to allow correct serial recall. Presumably, according to Norris, a separate mechanism would be needed to indicate that the digit 6 appears twice, once before and once after the digit 3.

Although this example shows the inadequacy of aLTM defined totally as the activation of information learned prior to the presentation of the series to be remembered, that is not how I have defined it, and I have not found anyone who thought pre-existing aLTM was sufficient for STM performance. In my discussion of the issue, aLTM also includes information newly learned about the current ensemble of information, still in an active form at the time of recall or recognition (Table 1, Point 1). Cowan (1999) stated it as follows:

Finally, there is one important qualification of the statement that working memory contains activated elements of long-term memory. Most stimulus situations in life include novel combinations of familiar features. In memory the elements are activated independently, but the particular links between those elements are often novel. The current combination of elements may, however, be stored as a new long-term memory trace. Declarative memories are said to be encoded only with the presence of attention, whereas procedural memories might be encoded more automatically, provided that sufficient attention is devoted to the task to allow the relevant stimulus features to be processed… (Cowan, 1999, p. 89)

Thus, the concept of rapid long-term learning has long been part of the embedded-processes view and is not an ad hoc invention in response to the criticism of Norris or others such as Logie and Della Sala (2003). Moreover, given that rapid LTM learning acting in concert with the FoA serves a function similar to the episodic buffer (Baddeley, 2000), it is useful to realize that my 1999 description, based on a 1997 conference presentation, was not formulated in response to the episodic buffer but, rather, may have highlighted some of the rationale leading to that buffer. Baddeley (2000) and I thus have addressed the need for flexibility in coordinating and binding different kinds of units in STM, but with my earlier conception being less modular in nature.

Cowan & Chen (2009; mis-cited as 2008 by Norris) further emphasized this need for long-term learning to be considered as part of aLTM:

We address the question of whether information in short-term memory can be conceived as the activated portion of long-term memory. The main problem for this conception is that short-term memory must include new associations between items that are not already present in long-term memory (or sometimes between items and serial positions). Relevant evidence is obtained from a task in which new word pairings are taught and then embedded within a short-term serial recall task. We conclude that rapid long-term learning occurs in short-term memory procedures, and that this rapid learning can explain the retention of new associations. (Cowan & Chen, 2009, p. 86)

If aLTM can include newly-learned associative information, then it can represent new sequences, even including repetitions of a token within a list, as these clearly can be learned (Table 1, Point 2).
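The contrast can be made concrete with a small Python sketch. It is purely illustrative and is not a model from the literature: the function names and the choice to represent a newly learned binding as a (position, item) pair are assumptions made here for the example. It shows that a set of activated types cannot distinguish 6-3-6 from 3-6-3, whereas rapidly learned item-position bindings can support correct serial recall, including the repeated token.

```python
def activated_types(series):
    """Pre-existing aLTM only: the set of activated types, with no tokens or order."""
    return set(series)


def learn_bindings(series):
    """New-learning aLTM (hypothetical encoding): one item-position binding per token."""
    return {(position, item) for position, item in enumerate(series, start=1)}


def recall_from_bindings(bindings):
    """Serial recall by reading the bindings out in position order."""
    return [item for _, item in sorted(bindings)]


series = [6, 3, 6]

# Type activation alone is identical for two different lists.
types_only = activated_types(series)
same_types = types_only == activated_types([3, 6, 3])

# Item-position bindings keep the two tokens of 6 apart.
bindings = learn_bindings(series)
recalled = recall_from_bindings(bindings)

print(same_types)  # True
print(recalled)    # [6, 3, 6]
```

Nothing in the sketch requires a store separate from the one that holds the bindings afterward; the same records that support immediate recall are simply the newest entries in memory.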

To describe this learning further, new LTM is largely episodic in nature, but multiple episodes with commonalities can be combined in the brain to form new semantic concepts in LTM (cf. Anderson & Ross, 1980; Watkins & Kerkar, 1985). For the future, it might be clearer if a different term is used to describe this new-learning source of aLTM such as, perhaps, new-learning aLTM. It would account for new associations, such as binding items to their serial and spatial positions and binding features within items (e.g., which shape was presented in which color), as well as semantic roles filled by items (e.g., in the sentence, The officer lost the gun, the officer’s role as an agent or actor and the gun’s role as a patient or object of the action).

Is a Separate STM Copy Necessary?

Here I respond to Norris’ arguments for a separate copy of information in STM, showing how STM storage nevertheless could be identified with aLTM storage, with STM as the portal for LTM learning as in classical conceptions (e.g., Atkinson & Shiffrin, 1968). To do so, I consider evidence to address Norris’ concerns about learning, neuropathology, neuroimaging, cognitive modeling of STM supporting an aLTM approach, and modularity versus non-modularity of storage.

Norris’ separate stores versus rapid, new learning.

Norris suggested that STM and LTM stores are separate. He did acknowledge that new LTM traces are formed rapidly. For example, he mentioned (p. 995) that “Implicit learning occurs even in tasks which ostensibly only require STM.” Nevertheless, he did not consider that this newly-learned information might be the basis of STM performance in total, encoding such details as repeated digits in a list. Why not? One reason might be that the information that can be retrieved in an STM task greatly exceeds what is remembered later in a delayed task, seeming to suggest that STM performance includes a separate copy of information that is then lost and unavailable for a delayed task. However, one can instead explain discrepancies between immediate and delayed memory by interference (cf. Table 1, Points 4 & 7).

Interference effects and new learning.

Norris may not have allowed for the full implications of interference in memory. Regarding the reasons why information might be present in the short term and gone in the longer term, researchers espousing a single-store, unitary memory model (e.g., Bjork & Whitten, 1974; Brown, Neath, & Chater, 2007; Crowder, 1982; McGeoch, 1932; Keppel & Underwood, 1962; Nairne, 2002; Surprenant & Neath, 2009) have a point. Information about an event memorized when it occurred may persist in memory but, as time goes on and stimuli accrue, the retrieval of that information can become more difficult. A key principle of forgetting noted by these theorists is interference. The ability to engage in delayed recall depends upon the right recall cues to retrieve the correct information, despite other prior trials with similar materials acting as interference. Suppose, for example, n trials of immediate recall are followed by a delayed request to retrieve information from Trial x. Information from Trials 1…x–1 acts as proactive interference and information from Trials x+1…n acts as retroactive interference. From a unitary memory view, the passage of time between immediate-recall Trial x and the delayed recall of it may increase proactive interference by making Trial x less temporally distinct compared to prior trials (e.g., Glenberg & Swanson, 1986; though unexpected temporal intervals within lists do not seem to facilitate list recall as shown by Nimmo & Lewandowsky, 2005, 2007; Parmentier, King, & Dennis, 2006), and it allows more trials to contribute to retroactive interference, which may be more potent than proactive interference because of overwriting of features (e.g., Nairne, 1990).
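The partition of trials in this example can be written out explicitly. The following Python sketch is only illustrative; the function name `interference_sources` is hypothetical, and the code merely labels which trials count as proactive or retroactive sources for a given target, without modelling forgetting itself.

```python
def interference_sources(n, x):
    """Split trials 1..n around target Trial x into proactive and retroactive sources."""
    if not 1 <= x <= n:
        raise ValueError("target trial x must lie between 1 and n")
    proactive = list(range(1, x))            # Trials 1 ... x-1 precede the target
    retroactive = list(range(x + 1, n + 1))  # Trials x+1 ... n follow the target
    return proactive, retroactive


proactive, retroactive = interference_sources(n=6, x=4)
print(proactive)    # [1, 2, 3]
print(retroactive)  # [5, 6]
```

At the moment of immediate recall, Trial x is the latest trial, so the retroactive list is empty; it grows only as later trials accrue before the delayed test.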

Even STM theorists do not completely deny that newly-learned information contributes to memory in STM tasks. Importantly, Burgess and Hitch (2006), upon which Norris relied, noted:

The context signals are ambiguous with respect to the traditional distinction between STM and LTM. They are responsible for aspects of ISR [immediate serial recall] traditionally associated with LTM, such as learning over repetitions and position-specific intrusions… as well as effects of temporal grouping within STM…. In the revised model, we have sought to make clear their role in the transition of order information from STM to LTM. Thus, in the absence of repetition of lists, context signals play a role in maintaining order in STM (but not a crucial one, given that recovery from inhibition during presentation provides an alternative ordering mechanism, see Burgess & Hitch, 1999)… and mediate effects of temporal grouping, and long-term connection strengths from context sets to item nodes are not reliably strengthened. When order information is reliably repeated, however, a context set becomes associated with the repeated pattern and effectively provides a form of long-term memory for that sequence. (Burgess & Hitch, 2006, p. 646)

Where my interpretation differs from Burgess and Hitch (and Norris) is in the understanding of non-repeated information. Burgess and Hitch’s statement that a long-term connection is not reliably strengthened with a single presentation cannot be taken literally, or else connection strengths would never build up over repetitions. An alternative is that single presentations do provide rapid long-term learning that is often sufficient for immediate recall, given that there is not much interference from other trials. This single-presentation learning would be insufficient for delayed recall, for which there is more interference that can (1) cause inactivation of the newly-learned information, and (2) make it difficult to know which information needs to be re-activated. Burgess and Hitch, as well as Norris, rest their case for a separate STM copy on differences between STM storage and LTM learning, such as the absence of long-term benefits when only every other item in a list is repeated (Cumming, Page, & Norris, 2003). This absence of benefits from partial repetition could be explained, however, if retrieval of the information is not simply a matter of associations of items with serial positions, but also inter-item associations. For example, when Zaromb et al. (2006) carried out free recall with some of the same items repeated between lists, the repeated items were recalled well, but they also caused more intrusions from the items that were neighbors of the repeated items in the previous, recent lists, suggesting inter-item associations based on contiguity in the list.

Variable binding and new learning.

Norris was also concerned about the issue of variable binding between items in STM:

…consider how we might maintain a coherent representation of a sentence such as ‘The young boy saw the boy who was singing.’ Here the problem is not simply representing the order of the words, or even that there are two tokens of the word boy, but appreciating that there are two different boys, one of whom is singing and one of whom is young.” It’s necessary to both represent multiple tokens and the bindings between each of those tokens and other components of the sentence. However, this cannot be achieved solely by coactivating and associating existing representations (as assumed by Cowan & Chen, 2008 [sic – should be 2009]…This might seem to present a severe problem for Oberauer’s three state model…(Norris, 2017, p. 1000)

This kind of representation would be impossible using only pre-learned aLTM, but it is quite naturally accomplished when one adds rapid new learning as part of aLTM on a trial (Table 1, Point 8). The fact that Cowan and Chen (2009) did not consider cases of variable binding in no way implies that their proposed mechanism would be unsuitable to handle it. It would seem unparsimonious to have a dedicated STM module (or modules) that could not only order phonological, lexical, and object units but could also represent abstract roles such as two different boys with different attributes, when the same apparatus is then needed for long-term learning as well. Parsimony points toward a theory in which STM and LTM binding are created by the same mechanism.

Evidence for STM as a portal for LTM.

One way to distinguish a separate STM copy from STM as a portal for LTM would be if only STM tasks revealed a capacity limit, but that does not appear to be the case (Table 1, Point 2). Nairne and Neath (2001) presented lists of 2–9 words to be rated on their pleasantness. Following a 5-minute period filled with a geometric task, there was a surprise memory task in which each list was re-presented with the words in alphabetical order, the task being to reproduce the previous list order. If LTM learning follows different rules than STM, one might not expect a capacity limit in this task. If, however, immediate experience conveys episodes of limited length or complexity to LTM, performance should depend on the length of the original list. That was the finding, with performance declining dramatically with the original list length and with about half of the words correct for 5-word lists. The findings suggest that each list formed a new episodic record that could be reactivated later, with the STM limits affecting how much was incidentally learned from each list. Cowan, Donnell, and Saults (2013) similarly presented lists of 3, 6, or 9 nouns for an orienting task in which the participant was to select the most interesting word in each list. Later, a surprise recognition test was presented in which the task was to determine whether two words came from nearby serial positions of the same list or nearby serial positions of different lists. This task was accomplished with better accuracy when the words came from 3-word lists compared to longer lists.

Norris’ neuropathologies of separate stores, versus neuropathologies of separate processes.

Norris asserted the existence of separate STM and LTM storage based on neurological damage cases in which STM performance is lost with preserved LTM performance, or vice versa. Before evaluating this argument, I would note that the arena of neurological damage seems like quite a tricky one that may be misleading, given the present state of the art. Most brain lesions are messy, not confined to one functionally-defined brain region or area. Each patient’s damage is unique. It is difficult to avoid approaching the patient with a biased, self-confirmatory view. Investigators often ignore cases that do not fit the pattern they are looking for, considering those cases to be impure or uninteresting. Time with a patient is usually short, and patients cannot always complete the desired tasks correctly; it can be difficult to test thoroughly. Thus, one must proceed cautiously.

Interpretation of patients with dissociations.

The results of patients with STM-LTM dissociations may have to be accounted for differently than Norris did (Table 1, Point 9). Consider the well-known patients with medial temporal lobe damage who have impaired delayed-recall performance with intact immediate recall (e.g., Scoville & Milner, 1957). If there were a separate copy of information in STM, then it would be possible to have damaged LTM with preserved STM as Norris suggested. However, Norris’ favored theory involved STM pointers to LTM information and temporary structure of the pointers as the basis of STM. There is a potential problem with the latter account as applied to the patients. If LTM is damaged, then the pointers could be pointing at damaged information, yet immediate recall of subspan lists is preserved. How can this happen? The pointer theory seems to require that the information needed for immediate recall is temporarily present in memory, in the same neural tissue that ordinarily would lay down new LTM traces, but then is not permanently saved because of damage to the LTM consolidation system. A further possibility, though, is that the pointers themselves (and their temporary structure) are not unique to STM, but also serve as the basis of the LTM learning system in healthy control participants.

An obvious prediction from this aLTM account of immediate memory performance is that, if STM encoding is damaged, then there must be impairment also in LTM learning. In apparent contrast to this view is the evidence supporting preserved LTM learning along with STM impairment (Basso, Spinnler, Vallar, & Zanobio, 1982; Saffran & Marin, 1975; Shallice & Vallar, 1990; Shallice & Warrington, 1970; Vallar & Baddeley, 1984; Vallar, Di Betta, & Silveri, 1997; Vallar & Papagno, 1995; Vallar, Papagno, & Baddeley, 1991; Warrington, Logue, & Pratt, 1971; Warrington & Shallice, 1969) as noted by Norris. A key point that must be made, however, is as follows. According to theories without a separate STM copy, such as the aLTM theory of Cowan (1988, 1999), the role of LTM is different in immediate and delayed tasks. In immediate tasks, the special distinctiveness of the most recently-presented set of memoranda makes that set easily retrievable whereas, in delayed recall, the retrieval task is plagued by interference from other trials. In the memory representation, after a filled delay, there is a stream of memories marked by time and other distinguishing aspects of context, but these cues are not always sufficient to select the right memory to be retrieved (e.g., Bjork & Whitten, 1974; Brown et al., 2007; Glenberg & Swanson, 1986). Under that logic, it is possible to have a kind of neural damage that impedes retrieval of information from newly-formed LTM in delayed recall, without much harm to the ability to retrieve the same information from aLTM in immediate recall, when interference from other trials is minimized and activation has not yet decayed.

The implications of new LTM formation in STM tasks are far-reaching. Norris (p. 993) argued that there must be a separate STM store partly on the basis of a review of medial temporal lobe damage and LTM deficits by Jeneson and Squire (2012). He noted that those patients show STM deficits only “with supraspan stimuli that exceed the capacity of STM.” With these supraspan stimuli, it is clear that a learning mechanism is defective; there is a steep shelf of performance separating the subspan from supraspan lists. What is not clear is whether there is a separate STM storage mechanism as Norris supposes, or whether the STM capacity limit applies for another reason (Table 1, Point 10). Specifically, according to Cowan (1988, 1999, 2001, 2005/2016), the capacity limit of STM is in the amount of information that can be held in the FoA at once, not a separate copy of the information but a privileged state in which up to 3 or 4 integrated objects or ideas are held. Responding for up to that number of objects can occur directly from the information in the FoA or from simple learning based on it, whereas recall of supraspan lists requires that the FoA be used reiteratively, to overcome capacity limits. In that reiterative process, some information is off-loaded into aLTM as a newly-learned structure so the FoA can then grapple with additional information. The information held with the FoA could be described as a structured set of pointers, in keeping with Norris but, unlike Norris’ conception, it would also serve as a portal to LTM learning. For example, to learn the list of digits 739482, the individual might memorize 739, then 48, and then the association between these segments as 739-48, subsequently incorporating the last digit to encode 739-48-2. 
That reiterative process (see Rhodes & Cowan, 2018) would presumably be available for immediate recall and the products would be permanently stored, although massive interference from other trials would often preclude its delayed recall. LTM damage could impede the reiterative process, affecting only supraspan performance.
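The reiterative encoding of 739482 described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of the embedded-processes model: the function name, the `capacity` parameter, the default value of 3, and the list standing in for newly learned aLTM structures are all hypothetical.

```python
FOA_CAPACITY = 3  # assumed limit on units taken into the focus of attention at once


def encode_reiteratively(segments, capacity=FOA_CAPACITY):
    """Fold list segments one at a time into a growing, newly learned structure.

    Returns the successive structures off-loaded into aLTM (a hypothetical
    stand-in: a plain Python list of strings).
    """
    altm = []
    structure = None
    for segment in segments:
        if len(segment) > capacity:
            raise ValueError(f"segment {segment!r} exceeds the assumed capacity")
        # Associate the structure learned so far with the newly attended segment.
        structure = segment if structure is None else f"{structure}-{segment}"
        altm.append(structure)  # off-load so attention can take in more input
    return altm


print(encode_reiteratively(["739", "48", "2"]))
# ['739', '739-48', '739-48-2']
```

Each pass through the loop corresponds to one use of the FoA. A list that fits within the assumed capacity needs only a single pass, whereas a supraspan list needs several, which is the step that this account suggests could be disrupted by LTM damage.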

Role of interference.

In support of this account of a persistent LTM representation of new information that is difficult to retrieve after the passage of time, some studies show that, if interference can be greatly reduced during the encoding process, then delayed recall can be improved in amnesic individuals. For example, Dewar, Fernandez Garcia, Cowan, and Della Sala (2009) found that densely amnesic patients could recall dramatically more of a word list after a 9-min retention interval when the first 6 min of that time period were spent in a quiet, dark room before 3 min of interfering material were introduced, compared to when the interfering material was introduced earlier within the 9-min retention period. The benefit of a no-interference memory consolidation period does not appear to depend on covert rehearsal during that period (Cowan, Beschin, & Della Sala, 2004; Dewar, Alber, Cowan, & Della Sala, 2014) and, in healthy older adults at least, it has been shown to persist at least a week after learning (Dewar, Alber, Butler, Cowan, & Della Sala, 2012). There are also studies indicating advantages of removing interference before, as well as after, the memoranda, for typical adults (Ecker, Brown, & Lewandowsky, 2015; Ecker, Tay, & Brown, 2015).

Processing-based accounts of neuropathologies of memory.

Optimal retrieval cues may tend to differ in immediate versus delayed recall. Therefore, if something were damaged in the kinds of encoding mechanisms needed for delayed recall, then one could expect STM task performance loss with preserved LTM performance (e.g., Warrington & Shallice, 1969). As Norris pointed out, short-term recall tends to benefit from phonological cues, whereas long-term recall tends to benefit from semantic cues. This distinction, however, need not result from separate stores. If one is planning immediate recall of a short list of words, for example, it may not be necessary to encode the list in a semantic or elaborative manner, inasmuch as the current phonological and lexical units can seem temporally distinct. In delayed recall, in contrast, there may be a greater retrieval problem if the participant received many lists that have overlapping phonological and/or lexical properties, and it could help to encode not only phonological properties, but also semantic properties that make each list unique – the novel combinations of semantic representations in each list. This logic of avoiding retrieval interference in delayed recall is consistent with evidence that contributed to a levels-of-processing account of memorability (Craik & Lockhart, 1972; Craik & Tulving, 1975), indicating better long-term memorability of stimuli originally encoded with semantic, and not only phonological or physical, properties of the stimuli.

Cowan (1988, 1995, 1999) reviewed evidence that there are, in fact, semantic aspects of stimuli encoded in short-term recall and phonological or physical aspects in long-term recall; what sets these kinds of recall apart, I would argue, is the combination of the current distinctiveness and types of interference from which the retrieval process must occur. This combination can be manipulated, for example making phonological information more useful than semantic information in long-term recall when the phonological information is more appropriate to the retrieval task (e.g., Morris, Bransford, & Franks, 1977).

Other theorists have accounted for STM damage with intact LTM not from a loss of a separate STM copy of the information (e.g., Warrington & Shallice, 1969), but from selective loss of other processes that differentiate short- and long-term recall (e.g., Cermak, 1997). Cowan (1988) offered one such alternative account:

I have discussed evidence that the short- and long-term stores cannot be distinguished on the basis of phonemic versus semantic content. The alternative view that was proposed is that the control processes associated with the two stores differ. The subject described by Shallice and Warrington may have had a deficiency in one or more of the control processes used to enhance short-term storage (e.g., covert articulation). This would also explain why the short-term memory deficit in this subject was later found to occur primarily for verbal items and why visually presented verbal items did not result in acoustic confusions as they do in normal subjects (Warrington & Shallice, 1972). These factors suggest that the parallel-stores model is not necessary to account for the results. (Cowan, 1988, p. 182)

A selective deficit in STM thus theoretically could occur even if the STM storage medium is neurally embedded within LTM. It also could occur if there is partial damage to the memory consolidation system, which could prove insufficient in its initial consolidation in real time but could repair or improve the consolidation later. In this vein, it has been suggested that the hippocampal system returns to consolidate memories when it is not engaged in processing new input (Mednick, Cai, Shuman, Anagnostaras, & Wixted, 2011). To my knowledge, given the various unknowns about the rare patients with a selective STM deficit, none of the evidence seems to rule out this approach based on selectively damaged control processes. Ruchkin et al. (2003, p. 711) made a similar argument, for example noting that “Romani and Martin (1999) reported that individuals with a semantic short-term memory deficit also have difficulty forming semantic but not phonological long-term memories, whereas individuals with a phonological short-term memory deficit show the reverse pattern of difficulty. Therefore, when the nature of the representations is taken into account, the neuropsychological evidence for distinct short-term and long-term memory stores is not compelling.”

Importantly, in the studies of list recall in patients with memory deficits, different materials are presented for memory in the short and long term. Suppose that, in an experiment, participants were told, “I am going to present a series of digits and I want you to repeat them now and also remember them for later.” It does not seem likely that an individual with selective STM impairment would be unable to repeat the digits immediately, yet show no deficit compared to typical individuals repeating the digit list the next day. Rather, immediate memory tests involve materials rather devoid of possible elaborative encoding features, or of time to use them well, and a deficit is obtained given poor consolidation of the list structure. Typical participants carry out memory maintenance for immediate recall presumably by repeatedly retrieving the material using covert rehearsal or attention (e.g., Camos et al., 2011), which can keep the list items in an activated state, albeit without great improvement of the representation. Attempting the same kinds of processes, patients with an STM deficiency would allow the materials to lose activation. Long-term memory tests typically involve cues that can lead to reasonable levels of recognition and recall later, which requires materials that can be encoded as a new, rich LTM structure that can be noticed and is thereby relatively easy to retrieve later, and STM patients may have preserved elaborative rehearsal processes for these richer materials.

Loss of verbal STM with preserved LTM has also been explained recently on the basis of another type of possible processing deficit, impaired mappings of verbal input to motor output, again without resorting to the notion of a separate STM copy (Morey, Rhodes, & Cowan, 2019). Similarly, Morey (2018) suggested that cases of visual or spatial STM damage are not reported in a manner that can clearly implicate a damaged visual or spatial STM store per se, as opposed to affiliated capabilities used in the tasks examined. In sum, unlike what Norris and others have claimed, the neuropathological literature does not appear definitive as a source of evidence for a separate STM copy.

Neuropathology and variable binding.

There is even some evidence for a convergence of STM and LTM mechanisms for the variable binding situations discussed earlier, and by Norris. Kurczek, Brown-Schmidt, and Duff (2013) examined the use of STM to interpret pronouns in participants with or without hippocampal damage. An example (p. 142) is the passage, Melissa is playing violin for [Debbie/Danny] as the sun is shining overhead. She is wearing a blue/purple dress. Remembering the names seems critical to interpreting the referent of the pronoun she. The healthy control participants and control patients with ventromedial prefrontal lesions were able to use the gender of the second-named person to determine who was intended by the pronoun and, when both names were female, these participants strongly tended to assign she to the first name. Patients with hippocampal lesions, however, did not clearly make these distinctions; the use of STM in processing appears to have been deficient. This is a kind of memory that Baddeley (2000) would have attributed to the episodic buffer, but here it can be seen that the long-term learning system is needed for the task. Baddeley’s episodic buffer may be handling what are, in reality, products of the long-term episodic learning system applied to STM situations.

Norris’ view of neuroimaging evidence as correlational, versus neuroimaging evidence for STM as activated LTM.

Norris discusses various forms of evidence dwelling on neural activity. Activity typically related to the encoding and long-term storage of a particular kind of information in functional magnetic resonance imaging (fMRI) studies shows up also during STM tasks. This kind of finding has been used to argue that the basis of STM is aLTM, but Norris points out (p. 998) that “The fact that LTM activity can be decoded during short-term retention interval does not imply that those LTM representations are responsible for short-term retention.” That false implication was called a correlation fallacy, in which a correlate is unjustifiably assumed to be causal.

Although it is a reasonable point to be careful not to interpret correlation as causation, and Norris rightly considers neuroimaging evidence to be correlational with respect to behavior, such correlations are still useful in distinguishing between theories (Table 1, Point 11). For example, we have a theory of gravity based largely on planetary motion that we cannot manipulate. Correlations do not prove causation, but they do point to good places to look for possible causation, as researchers typically assume, for example, when they use structural equation models. Neuroimaging evidence in fact paints a story that seems friendly to the notion that there is a common mechanism for STM storage and new LTM learning during the course of an STM trial.

Interpretation of the neuroimaging evidence.

Recent research goes beyond a correlation between memory and neural activation in several ways, including observations of when neural activation appears, disappears, and reappears. The research takes advantage of multivoxel pattern analysis, a technique in which one can examine patterns of activation specific to certain kinds of stimuli. In a neuroimaging procedure used by Lewis-Peacock, Drysdale, Oberauer, and Postle (2012), for example, two types of stimuli to be remembered are presented on a trial (e.g., a word and different orientations of bars), followed by a cue that a recognition probe for one of these stimuli is about to be presented (e.g., a word). The recognition probe and response are then followed by a second cue, forewarning of another probe either in the same modality or in the other modality. When the first cue indicates that a particular type of item (e.g., bars) is not immediately needed, it has been found that the pattern for that type of item subsides to baseline. If, however, the second cue indicates that that type of item will soon be needed, its activity pattern has been found to revive. Thus, the information not currently needed, but possibly needed later in the trial, is preserved in a dormant or inactive form. The distinguishing features of the information are heavily based on posterior brain regions that are active in the initial encoding of stimuli of different sorts, suggesting that information highlighted by the FoA could be a reactivation of neural patterns present when the items were initially perceived and memorized. Elsewhere in the brain, there may be active neural patterns also for items that are needed but are currently not attended (Christophel, Iamshchinina, Yan, Allefeld, & Haynes, 2018).

In fMRI research on serial order STM, Kalm and Norris (2017) recently found frontal and temporal regions that are active during both the encoding and the recall of the serial order of pictures. Given what is known about the brain, it is possible that there is a temporal region representing serial order regardless of the domain of the stimuli, and that the frontal region is involved in the process of memorizing serial order relations. If the serial order STM storage mechanism doubled as an LTM learning mechanism, one might have expected activation including or surrounding the hippocampus as well, given the aforementioned, well-known relation between hippocampal activity and long-term learning (e.g., Mednick et al., 2011), but hippocampal activity is known to be difficult to detect unless one is looking for it specifically, and it could have been overlooked in this study. Elsewhere, there is evidence suggesting hippocampal involvement in order memory. For example, Öztekin, McElree, Staresina, and Davachi (2008) presented lists of 5 letters in tasks of item recognition and judgment of recency and, in both tasks, found fMRI evidence of the involvement of both the frontal-parietal attention network and the hippocampus (cf. Ekstrom, Copara, Isham, Wang, & Yonelinas, 2011).

Causal neural evidence of aLTM involvement.

Rose et al. (2016) have now taken the field closer to a causal model of behavioral activation, and a clearer idea of the neural substrate of behavioral activation. In particular, for a type of item potentially needed later in the trial but not needed currently (e.g., faces), transcranial magnetic stimulation (TMS) of the appropriate area in the posterior cortex brings back the telltale neural pattern and brings back the behavioral sign of its presence in the FoA. Magnetic stimulation does not bring back the pattern of an item that is definitely no longer needed for that trial. These results suggest a dormant but still relevant status that we might identify as aLTM, with the revived neural pattern indicating inclusion in the FoA.

Norris’ claim that there is no explicit model of aLTM, versus actual models including new learning.

Norris implied that there are no models of STM as aLTM designed to account for data in detail, such as serial position functions in recall (Table 1, Point 3). My own embedded-processes model was depicted as somehow mutating over time repeatedly to account for new behavioral data, in such a manner that, by the time of Cowan and Chen (2008), there was very little remaining reliance on aLTM to carry out the work of STM recall:

Given that not all short-term storage in Cowan’s model is supported solely by activated LTM, the crucial question then is what is the remaining force of the claim that STM is activated LTM? Is there any part of the process of retaining information over the short-term that can be served simply by activating LTM? One factor that makes it hard to answer this question is the absence of a computational specification of what it means for LTM to be activated, and of how that activation then supports memory. As Cowan’s position has evolved to accommodate a broader range of behavioral data, it has had to respect the fact that very little of that data can be explained purely in terms of activation. (Norris, 2017, p. 996)

The only relevant shift in my model that I can think of is that the point about aLTM including new learning, and thus remaining critical in all sorts of STM tasks, may not have been made clearly until Cowan (1999). Most of the other concerns may come from this early statement having been missed. Moreover, the extra something that Norris was looking for in addition to aLTM, something to represent serial order information, may have been present in the Cowan (1988) model all along. It was described in the form of the FoA and its functions in conjunction with LTM. The FoA was said to have a small capacity (3–5 items: Cowan, 2001) in comparison to aLTM limited not by capacity but by decay. It was stated (Cowan, 1988, p. 171) that the central executive processes worked with the FoA to carry out, among other functions, “problem-solving activities including principled long-term memory retrieval and a recombination of short-term memory units to form new associations” and (p. 177) that “The central executive calls up additional relevant information and forms broader associations among the stimuli and between the stimuli and prior memories.”

Formal embedded-processes model.

I have acknowledged that accounting for serial order information is difficult, and have sometimes taken the approach of trying to examine capacity with serial order concerns removed. Thus, Chen and Cowan (2009) found constant capacity of recall across lists comprising multi-word chunks of 1 or 2 words (in the latter case learned through repetition), using a scoring method in which serial order errors were ignored; and Cowan, Rouder, Blume, and Saults (2012) found a capacity parameter based on recognition of single words within lists of 1-, 2-, or 3-word sequences that were familiar based on their semantics and on idiomatic expressions (e.g., ball; garbage truck; leather brief case), which once more did not require serial order information, except for the rapid learning of semantically viable chunks like leather brief case. So, in my own work I have not much tackled the basis of serial order information. In that sense, my modeling efforts (culminating in Cowan et al., 2012) have in fact focused on aspects of aLTM that could be based on temporary activation of already-known units, without worrying about most of the contribution of new learning of inter-item structure during the trial itself. Cowan et al. showed that this kind of information is modeled well by a constant capacity within an individual of about 3 chunks on average, supplemented by an additional contribution of information from newly-learned aLTM for single-word chunks. Despite this focus on item information in the formal model, what is needed in principle for memory of serial order and other inter-item structure in the stimulus set (e.g., spatial arrangement of an array) can, I would argue, be accounted for by new learning that becomes available as part of aLTM by the time a response is required on the trial.

There are exceptions to my not having dealt with structure of a memory set. Most relevant is an investigation (Cowan, Saults, Elliott, & Moreno, 2002) in which nine-digit lists were recalled starting at Serial Positions 1, 4, or 7 depending on the recall cue. The results showed that output interference explained the typical serial position function of serial recall and showed that for triads recalled first, the serial position function looked very much like the typical free recall function (cf. Bhatarah, Ward, Smith, & Hayes, 2009; Grenfell-Essam, Ward, & Tan, 2017; Ward, Tan, & Grenfell-Essam, 2010).

Interpretation of models of serial order.

I appreciate the many sophisticated, rigorous attempts that investigators have made to account for and model serial order information (Anderson, Bothell, Lebiere, & Matessa, 1998; Anderson & Matessa, 1997; Botvinick & Plaut, 2006; Brown, Neath, & Chater, 2007; Brown, Preece, & Hulme, 2000; Burgess & Hitch, 1992, 1999, 2006; Farrell, 2012; Farrell & Lewandowsky, 2002, 2004; Grossberg & Pearson, 2008; Henson, 1998; Houghton, 1990; Hurlstone, Hitch, & Baddeley, 2013; Lewandowsky & Farrell, 2008; Lewandowsky & Murdock, 1989; Nairne, 1990; Page & Norris, 1998, 2009) and order in free recall (Howard & Kahana, 2002; Lohnas, Polyn, & Kahana, 2015; Polyn, Norman, & Kahana, 2009; Sederberg, Howard, & Kahana, 2008). The viability of an approach involving aLTM with new learning does not depend on coming up with a separate serial order memory model specifically within the embedded-processes framework, inasmuch as an adequate model of serial order memory in STM formulated by another investigator could also serve as the long-term learning mechanism.

The case against this view that the embedded-processes approach could adopt a previous model of serial order in STM is essentially that much is remembered in the short term that is forgotten in the long term (see Norris). My counter-argument is that all information that is used in STM tasks may, in some ways, enter and alter LTM, but that one cannot expect to see this happening because the information becomes more difficult to retrieve after long, filled retention intervals. The most common assumption throughout the history of cognitive psychology is inevitable transmission of some information from every STM episode to LTM (e.g., Atkinson & Shiffrin, 1968; Broadbent, 1958; Schacter, 1987). Consequently, it seems to me that the onus for assessing this STM-to-LTM transmission must be borne by investigators holding all views, pro and con. For example, in future work, models of long-term episodic learning (e.g., Gers, Schmidhuber, & Cummins, 2000; Wixted et al., 2014, 2018; Wörgötter & Porr, 2005) could be compared to STM models to find out if they are compatible.

In my further discussion I will focus on a particular model, that of Burgess and Hitch (1999, as modified in 2006) for reasons to be explained shortly. Hurlstone and Hitch (2018), along with Norris, recently discussed the literature on models of serial order memory. Among the models (which I cited earlier), the foremost principle is competitive queueing, a process in which there are two layers of nodes (simulated neural centers), a parallel planning layer along with a competitive choice layer. In a first step, relative levels of activation are established for nodes in the parallel planning layer, and then the competitive choice layer determines which node wins the activation contest. For the present purposes, the most critical model, which was based on competitive queueing, is that of Burgess and Hitch (2006). It is critical because their model addresses the interplay between STM and LTM. In the model, each item in a list to be remembered is encoded along with associations to its context, and the context includes serial position and grouping cues, as well as cues from learned information. These associations between items and context guide the parallel planning layer. The item-context associations are said to decay at two different rates, a rapid rate that is used to explain transient phenomena, and a slow rate that is used to explain longer-term phenomena. Burgess and Hitch (2006, p. 630) noted that “The strengthening process has two components, one large-amplitude and short-lived, the other small-amplitude but slowly decaying and so more cumulative.” The justification for this distinction between two types of learning with different decay rates (Table 1, Point 5) was that learning of repeated lists (the Hebb effect: Hebb, 1961) was not influenced by phonological factors said to be specific to the STM process. For example, doing a serial recall task while repeating a word over and over (articulatory suppression) did not alter the rate of learning of repeated lists, but did worsen recall at every stage of learning. A slow decay process was needed even for the longer-term learning because the rate of learning depended on how many non-repeated lists separated the repeated lists.
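
The two-step competitive queueing process described above can be sketched in a few lines. This is a generic illustration, not the Burgess and Hitch (2006) implementation: the function name, the activation values, the Gaussian noise, and the removal of each winner after output (response suppression, a device commonly used in models of this class but not described above) are all assumptions made for the example, and list items are assumed to be unique.

```python
import random

def competitive_queueing_recall(items, planning_activations, noise_sd=0.0, seed=0):
    """Recall a list by repeated winner-take-all selection over a planning layer."""
    rng = random.Random(seed)
    # Parallel planning layer: every item is active at once, to differing degrees.
    activations = dict(zip(items, planning_activations))
    output = []
    while activations:
        # Competitive choice layer: the node with the highest (noisy) activation wins.
        noisy = {item: a + rng.gauss(0.0, noise_sd) for item, a in activations.items()}
        winner = max(noisy, key=noisy.get)
        output.append(winner)
        # Assumed response suppression: the winner no longer competes.
        del activations[winner]
    return output

# A steadily decreasing activation gradient yields correct forward recall.
correct = competitive_queueing_recall(["A", "B", "C", "D"], [1.0, 0.8, 0.6, 0.4])

# If two neighbouring activations are reversed, the output shows a transposition.
transposed = competitive_queueing_recall(["A", "B", "C", "D"], [0.8, 1.0, 0.6, 0.4])
```

With `noise_sd` above zero, items with similar planning activations will occasionally swap places, which is how models of this kind produce transposition errors between neighbouring serial positions.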

The model of Burgess and Hitch (2006) does a lot, but what is at issue here is the stipulation of two decay rates. One problem with these rates is that, for word lists, it has been difficult to find any direct evidence of decay at all. Oberauer and Lewandowsky (2008) imposed variable delays between items in the recall period of a serial recall task, sometimes filled with concurrent articulatory suppression and attention distraction, and still found little or no decay. To be sure, there are theories of serial recall that depend on decay and counteracting refreshing and rehearsal processes to explain time-dependent aspects of recall, including the ability to recall as much as one can recite in about 2 s (Baddeley, Thomson, & Buchanan, 1975), the ability to recall more when a higher proportion of time between items is free of distraction (Barrouillet, Portrat, & Camos, 2011), and both constraints together (Camos et al., 2011). An alternative possibility that can account for the same results (Table 1, Point 6) is that more free time allows better consolidation of the memory trace, reducing the rate of decay (Rhodes & Cowan, 2018; Ricker & Cowan, 2014; Ricker, 2015).

One supposed difficulty for a single process for serial memory in STM and LTM is that neither articulatory suppression nor phonological similarity affects sequence learning, though they affect the overall level of STM recall (Burgess & Hitch, 2006). This finding is taken to indicate that the phonological loop of STM is separate from the long-term serial learning process (Table 1, Point 4). Another possible interpretation, however, is that the long-term learning of serial order that is observed is based on whatever distinctive features of the stimuli are clearest. With randomly ordered lists of words selected to avoid phonological confusions, the lexical (including both morphological and phonological) features may tend to be clearest. In contrast, with lists selected to include many phonological confusions between words, the clearest features may be the semantic ones, and it may be those that form the basis of long-term learning of serial order. In both conditions, therefore, only one learning process would take place. There are results that would be difficult to explain with the full Burgess and Hitch model that seem easier to explain with this notion of distinctive features subserving learning; an example is the finding that phonological neighborhood effects make a difference for serial order STM (Clarkson, Roodenrys, Miller, & Hulme, 2017). There are other models suggesting that LTM learning becomes involved in complex working-memory tasks in which items to remember are interspersed with processing episodes, and in simple list memory tasks that exceed capacity (e.g., Unsworth & Engle, 2007).

A second supposed difficulty for a single process for serial order in STM and LTM is that the number of non-repeated lists between the repeated lists makes a difference for how rapidly the repeated lists are learned (Burgess & Hitch, 2006; Melton, 1963). This effect could be a matter of interference between repeated and non-repeated lists, which seems much more likely than some kind of slow decay rate. With the slow decay rate idea, it would be difficult to explain word sequence learning that is permanent, in keeping with what is usually thought about LTM.

One potential difference between STM and LTM learning is that list recall can seem to be based on item-position associations in STM, sometimes resulting in intrusions from a previous list (e.g., Henson, 1998), whereas the Hebb effect is not found with only alternating items preserved between lists (Burgess & Hitch, 2006). This discrepancy, however, may occur because the Hebb effect depends on many repetitions of a list. Early learning after one exposure may consist of item-position associations, which are of limited efficacy in the long term given interference from other lists, whereas continued learning with additional exposures may result in a reorganization of the material into a list-wide structure.

Another potential problem for a single-process approach is that one might think that if immediate recall were based on long-term learning, recall would become impossible after several trials because of the buildup of proactive interference. What is missing from this conception is the notion that the newly-learned information is in an activated state, which makes aLTM more accessible and less likely to be interfered with than items in dormant, inactive LTM, especially because the FoA and maintenance strategies help to maintain the activation. In procedures in which the activation cannot be maintained because of interference that is introduced, there is indeed extreme loss that occurs when proactive interference emerges after several trials (Keppel & Underwood, 1962).

Assessing another model with an STM separate copy of information.

Oberauer (2009) proposed a model in which there are two different mechanisms for short-term retention and long-term retrieval of information. The general gist of the discussion was that simple associations exist in STM but are insufficient to account for long-term learning of roles that items can fill. In a key example, Oberauer explained the following.

To recover the fact that the pastor calmed the businessman (in one particular time and place), the system needs a mechanism to tell that the association of ‘businessman’ with patient belongs together with the association of ‘pastor’ to agent (as well as the association of information about time and place to the roles of time and place). In other words, there must be a mechanism to associate pair-wise content-role associations with each other. Because associations are not themselves representations, it is not obvious how they can be associated together. Therefore, long-term learning of structural information cannot simply consist of translating the bindings in WM into corresponding associations one-to-one. (Oberauer, 2009, p. 78)

Oberauer went on to explain how chunking of information can handle this complex learning. I agree with the complexity of the learning, but the assumption that this same complexity is not needed in STM seems erroneous, and may be an illusion promoted by the greater simplicity of materials we typically present for immediate memory tasks. In a key immediate-memory task like comprehending a spoken sentence, one must set up these roles immediately and not wait for some slow learning process. Thus, the short-term and long-term tasks may depend on the same rapid learning of complex roles. When a model was constructed to account for declarative and procedural working memory performance (Oberauer, Souza, Druey, & Gade, 2013), it had a fast, STM-modification process called binding and a slow, LTM-modification process called learning, but it was found that slow learning played little role in the computational model and could be safely set to zero. There is no way that this finding can be taken to indicate that long-term learning is unimportant, given trial-to-trial carryover effects that were obtained; rather, it suggests that there could be just one learning parameter, which operates rapidly. Much of the learned information may be in LTM but later unavailable because of massive interference. Further experimentation and modeling of this sort would be useful to confirm a single learning rate.

Modeling STM as rapid new learning.

To theorists favoring the separate-STM-copy mechanism, the rapid-learning process described by Cowan (1999) may seem fanciful. Isn’t it a sleight of hand cooked up just to make the aLTM theory work? Far from being that, it is essential to account for the evidence and most theorists already rely upon it, including second-copy theorists, although they may not always realize it. Norris repeatedly mentioned that certain phenomena require LTM without explicitly noting that it is new configural learning that he is talking about. For example, in the discussion of an fMRI study by Öztekin, Davachi, and McElree (2010), it was stated (Norris, pp. 997–998) that “As has already been shown, hippocampal activity is to be expected in STM tasks simply because LTM cannot be turned off in cases where it might not be needed. In fact, LTM is very likely to be needed in this study, as a 12-item list will be well beyond the normal span of STM.” Yes, but this requires new learning of the items in the list. How do we know that the same mechanism is not responsible for learning shorter lists as well, but with less noticeable hippocampal activity given an easier task? I have already addressed reasons why STM and LTM results based on a common storage mechanism would still differ.

The definition of aLTM within models of STM.

Norris suggested that the definition of activation in aLTM is unclear:

If STM is supported exclusively by activated LTM, it seems reasonable to ask what computational function is performed by activation that would enable it to encode, maintain, and retrieve information from STM. This is a fundamental and largely unrecognized problem with all models invoking activated LTM. Although the core explanatory concept is activation, there is no explicit definition of what activation means. In the memory literature the term activation often refers to the deployment of a limited capacity resource that can be used to support WM (Anderson, 1983; Cantor & Engle, 1993; Just & Carpenter, 1992). However, there is no computational definition of activation of LTM that would explain how that ‘activation’ might be sufficient to maintain representations in STM…it simply is not at all clear what it means to say that STM might be activated LTM. (Norris, 2017, pp. 998–999)

Ideally, there would not be a single definition of activation but separate definitions for behavioral and neural evidence that perfectly co-occur, much as one can define fire both in terms of a chemical oxidation reaction and in terms of its outward signs in the forms of heat and light. Cowan’s (1988, 1999) conception of activation was in terms of an accessibility that had no capacity limit but supposedly decayed away in somewhat less than a minute in the absence of any maintenance strategy or interference. Decay would be observable both behaviorally and, presumably, neurally. Originally, my conception was probably largely derived from a book I read in college, Hebb (1949), describing cell assemblies that underlie thoughts via reverberating neural circuits for concepts, presumably only until the circuit runs out of some physiological resources and activation collapses, making the representation dormant. In that conception, the cell assembly is an LTM concept that carries with it aLTM as an activated state of neural reverberation. (I see from an interview in The Psychologist, September 2008, volume 21, p. 832, that Alan Baddeley also listed Hebb, 1949 as a primary inspiration.)

Recent research requires refinement of the notion of activation, however. If decay as a function of time is not easily observable for series of well-learned and well-encoded stimuli (cf. Oberauer & Lewandowsky, 2008), then what we are left with is the ready availability of an item or sequence from the time it was originally encoded or last retrieved to the time when interference makes it no longer readily available. It would be lost not as a pure function of the passage of time without maintenance activities, but as a function of time in a manner dependent on interference. If a short list is unique, it will be memorized in a stable manner but, if the same short list is easily confusable with other recent lists, then it will not last long in activation without maintenance strategies (Keppel & Underwood, 1962; Melton, 1963; Peterson & Peterson, 1959).

The kind of activation that is supposedly capacity-limited (Anderson, 1983; Cantor & Engle, 1993; Just & Carpenter, 1992) differs from my own conception of capacity-unlimited activation and, in retrospect, may be the same thing as the information driven by current or very recent attention (Lewis-Peacock et al., 2012; Rose et al., 2017). The ready availability of this kind of activation has been conceived as something that results in faster retrieval than other information (e.g., McElree, 2001; Gilchrist & Cowan, 2011). Given the limits of attentional vigilance, this kind of activation can be subject to loss as a function of time, i.e., a kind of decay. It may correspond to neural activation, whereas capacity-unlimited activation may be defined by synaptic weights (Rose et al., 2017; but see Christophel et al., 2018).

Activation, then, is simply the degree of availability for retrieval. The retrieval of information that was just encoded would be similar in form to the retrieval of information from long-term memory. A capacity-limited portion of that retrieved information is held in attention and probably produces neural activity, and also influences synaptic weights. The rest of the recently-retrieved information exists outside of attention, possibly as a set of these enhanced synaptic weights for the concepts involved. The temporary enhancement may remain until there is interference, but poorly-encoded or poorly-consolidated information may not establish very clear synaptic weights, in which case there is rapid forgetting of those items (rapid decay) when they are not attended (e.g., Ricker & Cowan, 2014).

Consideration of LTM serial order learning mechanisms and LTM-related physiology (e.g., Gers et al., 2000; Wixted et al., 2014; Wörgötter & Porr, 2005) may place constraints on these theories of temporary, STM contextual maintenance, if in fact STM maintenance and LTM learning of the information are one and the same. This identification of short-term and long-term memory seems to be implied by one leading theoretical approach (Anderson & Matessa, 1997; Anderson et al., 1998).

Norris’ modularity of stores, versus non-modularity.

Norris articulated evidence for separate phonological, visual-spatial, and episodic buffers (cf. Baddeley, 2000) or storage modules. In principle, either modular or non-modular storage could reflect either separate copies of information into STM or new learning in LTM. However, the modular view seems poorly suited to new learning, inasmuch as experience and memory generally involve inter-relations of features from various modalities. Therefore, it is important to question the evidence for modularity. Norris (p. 993) stated that “The critical evidence for a phonological store with a limited duration is that while memory confusions at short retention intervals are primarily phonological in nature, confusions at longer retention intervals tend to be semantic.” Visual interference typically harms visual-spatial item recall more than verbal, and phonological interference typically harms verbal recall more than visual-spatial (Baddeley, 1986), though there can also be generation of a visual code for printed materials (Logie, Della Sala, Wynn, & Baddeley, 2000). Evidence for the episodic buffer included the point that, although the STM patient PV had an auditory word span of one item, the patient could remember a meaningful sentence comprising up to five words (Vallar & Baddeley, 1984) presumably using this buffer.

It might seem awkward for separate STM buffers to be combined somehow into more general LTM episodic memories. Cowan (1988, 1999) takes issue with the general value of a rigid taxonomy with just three stores, while acknowledging modality- and code-based processing differences. Cowan’s alternative suggestion is that there are many codes saved in STM (acoustic, visual, tactile, phonological, orthographic, semantic, and so on) and that there is no strong evidence that these different codes can be subsumed under just a few discrete stores. The conception of Cowan was more of a soup of activated features of a variety of types, each subject to decay and interference from subsequent items with similar features in aLTM. The activated information can include associations between features, as they are organized into objects and events. Many newly-learned associations, though, may be too weak to last long before decaying out of activation, at which point retrieval of the information from LTM becomes unlikely without especially strong cues. Instead of several buffers, there might be micro-buffers for many specific types of features, but we do not know enough about them to suggest a taxonomy of them at present. The organization of information of multiple modalities and codes into coherent objects and events occurs in the FoA, which serves many functions similar to the episodic buffer.

Norris suggested that imaging results show a commonality in activation between perception and visual STM (Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009) but that

… at the moment it is far from clear how the neuroimaging data can be mapped onto cognitive models of visual STM. For example, some have argued that there is more than one visual STM. Basing his argument primarily on neuropsychological data Logie (2003) has suggested that there may be one store for spatial layout, and one that supports the ability to manipulate mental images, and to retain dynamic information such as sequences. (Norris, 2017, p. 1001)

Rather than a modular visual-spatial store that then must be subdivided, it seems simpler and more natural to suggest, along with Cowan (1988, 1999), that stimuli give rise to multiple types of features in memory and that interference with each kind of feature occurs when a potentially interfering item, linked to the target item by temporal proximity or some other association of context, shares similar types of features. Moreover, I have suggested that interference is more damaging when the LTM representation of the target item is not well-established; there will be greater interference and more decay for memory of unfamiliar characters or pure tones differing in frequency (Cowan et al., 1997; Ricker et al., 2010) than for known words or familiar objects (Oberauer & Lewandowsky, 2008; Endress & Potter, 2014; Wolfe, 2012).

According to Morey (2018; Morey & Bieler, 2013) there is an asymmetry that could be interpreted to indicate that verbal STM is more modular than visual STM, in that cross-modal distraction has a much more severe influence on verbal memory than it does on visual memory. Memory for lists of verbal materials tends not to decay in the absence of interference (Oberauer & Lewandowsky, 2008), whereas memory for arrays of visual items decays markedly over a 12-s period (Ricker & Cowan, 2010). I believe, though, that the asymmetry may not be a function of the modality per se but of the amount of learning behind the representation. We all have learned verbal stimuli very intensively over our lifetimes, which can accelerate new learning. For example, if one knows the words “brick” and “fish,” one can form a new image of a brick with a fish on top, or a new phonological form, “brick-fish.” The visual stimuli in the studies examined seem to be abstract, with novel or arbitrary combinations of features such as color, shape, and location, which do not lend themselves to rapid new learning that is distinct from trial to trial. When enough time is devoted to taking in and consolidating novel visual stimuli, they tend to decay from memory much more slowly (Ricker, 2015; Ricker & Cowan, 2014; Ricker & Hardman, 2017). When known visual objects are used, a host of them can be represented in aLTM (Endress & Potter, 2014; Wolfe, 2012) instead of the 3 or so found with meaningless novel objects (e.g., Luck & Vogel, 1997). When the acoustic stimuli are pure tones from a continuum rather than learned verbal items, they do not last long in STM (Cowan, Saults, & Nugent, 1997; Keller, Cowan, & Saults, 1995).

When evaluating new learning, the conditions of learning must be taken into account. Sometimes repetition of items in STM tasks will produce a retrievable representation, as in the Hebb effect in which a list that is repeated over a number of trials within a serial recall task will start to be memorized (Hebb, 1961). Other times, the new learning of a repeated set may not be strong enough to overcome interference. For example, Logie, Brockmole, and Vandenbroucke (2009) found little learning of a repeated array of colored shapes when the repeated array was interspersed with new-array trials. Given that the array size was above capacity, participants may encode a different subset of the repeated array into memory on each trial, in which case the encoded subset would not be psychologically identical from trial to trial, impeding learning. In contrast to visual arrays, lists of words may result in a growing portion of the list learned each trial, given the sequential presentation of known items that can be combined to form distinct new conglomerates. What appears to be a modality asymmetry actually may depend on differences in the familiarity of the materials and stability of the learning conditions built into the task, and strict modularity of storage may be unnecessary in any modality.

Norris argued for modularity under a pointer system and against a single system pointing to any kind of information, stating (p. 1003) that “…the behavioral data show that the qualitative behavior of the different stores is different. At the very least they have different time-courses. If short-term retention is controlled by a single pointer system, then all short-term storage should have the same time-course.” The evidence for the different time courses was not clear to me. Assuming for the moment that it is true, there are multiple reasons why a single pointer system could produce different time courses of serial order memory loss in different modalities. This could occur, for example, because there is differential interference or knowledge for different kinds of information represented. In sum, storage may not depend on separate STM modules that contain copies of information in memory or separate pointers, but on an LTM rapid-learning system with currently activated items and features, with newly-learned structure and bindings between elements, some of which are enhanced at any moment by the FoA.

An Emerging Conception of STM Based on aLTM and Attention

In what follows, I sketch my current, updated embedded-processes conception of the memory system, in which there is no separate copy of information other than mechanisms that also participate in long-term learning; evidence supporting that notion (see Figure 1); and suggestions for the research needed to assess this view. A first subsection shows how separate STM and LTM functions can be derived from a common storage basis. Various subordinate issues are considered (separate response patterns for STM and LTM, no separate STM copy, aLTM in behavior and neuroscience, rapid long-term learning contributing to aLTM, and variable-rate decay of aLTM representations). A second subsection deals with the role of attention and executive function in directing aLTM and new learning. Two subordinate issues are considered (the behavioral function of the FoA, and the neural underpinning of the working memory system considering both aLTM and the FoA together). A third subsection considers the consistency of this emerging conception with the recently reviewed, broader literature on benchmarks of STM.

Separate STM and LTM Functions from a Common Storage Basis

Separate response patterns for STM and LTM.

I am in agreement with Norris that STM and LTM are not simply implications of a single, unitary memory system. Evidence in favor of some sort of separation comes from research showing different patterns of serial position effects and interference effects for immediate-recall versus delayed-recall-following-distraction types of tasks, both in free recall (Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, & Usher, 2005) and in serial recall (Cowan, 1995, Chapter 4; Cowan, Wood, & Borne, 1994).

No separate STM copy of information.

The difference between STM and LTM need not be in terms of a separate copy of information dedicated to STM. Rather, the difference is whether a memory task response can be made using information about the materials to be remembered that has survived in an activated form since presentation (aLTM), or whether the response can only be made on the basis of information retrieved from an inactive state in LTM, returning it to an aLTM state. The latter imposes more daunting amounts of interference with retrieval, requiring more specific retrieval cues.

The best prior evidence for a separate copy of information in STM was the finding (e.g., Warrington & Shallice, 1969) that individuals could show STM loss with intact LTM, along with the more prevalent case of LTM loss with preserved STM (e.g., Baddeley & Warrington, 1970; Scoville & Milner, 1957). Alternative accounts of these findings have been based on the notion that representations in aLTM (with new learning) underlie both STM and LTM performance, but with different processes for maintenance and/or retrieval of memory for STM versus LTM (Cowan, 1988; Morey et al., 2018, 2019). In the absence of interference near the memoranda, LTM loss can be greatly reduced (e.g., Cowan et al., 2004; Dewar et al., 2009).

Norris embraced the evidence from selective STM loss to support a separate-copy theory, but later suggested that what may be saved as a separate copy is not the representations per se, but pointers to them. That kind of hypothesis, however, could be complicated if the preserved pointers would have damaged or inaccessible representations at which to point. This is not a fatal problem with the separate-pointer theory, but clarification is needed.

Activated LTM: relating behavior to neuroscience.

The definition of LTM activation here is based primarily on behavioral evidence, with hypothesized neurophysiological substrates of activation that are still uncertain. Behaviorally, activation refers to a temporary heightened availability of the material. It can include new learning contributing to aLTM, which allows it to include activation representing contextual features such as the serial or spatial positions of items, or their semantic relations to one another. Activation can be lost through decay if the important features of the material are not well-learned, and activation can be lost through the presentation of additional items with related features.

One can use the accuracy and latency of responses in a memory task to plot the decline of activation during a retention interval. The decline in these measures marks the loss of activation to an asymptotic level. Activation can be measured from the onset of either a perceived item, or a remembered item retrieved from LTM (cf. Fukuda & Woodman, 2017). If memorability declines over time and then a cue reminding the participant about the presentation or retrieval context increases the retrievability of the item again, that item is considered to have been inactive and to have been reactivated into an aLTM state. Activation precludes the need for contextual retrieval cues because the context is current.

A good experiment to explicate this definition of activation is one by Wickens, Moody, and Dow (1981). Wickens et al. produced memory search functions for pre-memorized lists that included a main effect on reaction time, which was longer when the list had to be retrieved from LTM just when it was needed as opposed to when enough time for retrieval from LTM was provided before the list was needed. There was also an effect of the number of list items on search time, as usual, which was not altered by the LTM retrieval effect. Retrieval from LTM can be seen to have preceded an STM search process (which in this context means searching through those representations that have been activated from LTM).
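
The additive pattern just described, in which LTM retrieval raises the whole search function without changing its slope, can be written as a simple two-stage prediction. This is a sketch of the logic only: the function name and all parameter values (in milliseconds) are hypothetical and are not estimates from Wickens et al. (1981).

```python
def predicted_search_rt(set_size, needs_ltm_retrieval,
                        base_ms=400.0, slope_ms=40.0, retrieval_cost_ms=200.0):
    """Predicted reaction time for searching a pre-memorized list.

    Stage 1 (only if the list is dormant): retrieve the list from LTM,
    which adds a fixed cost regardless of list length.
    Stage 2: search the now-activated list, which costs slope_ms per item.
    """
    rt = base_ms + slope_ms * set_size
    if needs_ltm_retrieval:
        rt += retrieval_cost_ms
    return rt

# The retrieval cost is the same at every list length (a main effect only).
cost_at_2 = predicted_search_rt(2, True) - predicted_search_rt(2, False)
cost_at_4 = predicted_search_rt(4, True) - predicted_search_rt(4, False)
```

Because the retrieval stage is completed before the search stage begins, the two effects add rather than interact, which is the signature taken here to indicate that search operates on representations already activated from LTM.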

Information about serial or spatial position in a set of item presentations or events, which can be obtained only from information apprehended on the trial itself, is also said to be held in new aLTM formed from newly-learned information. Estes (1972) proposed that information about serial position was lost due to perturbations over time, leading to noisier position coding, and Nairne (1992) showed that this process matches what happens to the serial order of serial recall as a function of the delay interval. Cues to the list-presentation context presumably can result in the renewed sharpening of the serial position representation. As a demonstration of the principle, imagine that one is told that the initial letters of the names of the colors of the rainbow can be encoded as the name of a fictitious individual, ROY G. BIV. Later, if one is asked to recall the colors of the rainbow in order, the relevant information should be dormant in LTM and the clue “fictitious individual” may be of assistance in restoring the information to an aLTM state.
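
The idea that perturbations over time produce noisier position coding can be illustrated with a simplified sketch in the spirit of Estes (1972). The simplifications are assumptions made for the example: each item is treated independently, on every time step its coded position drifts to each neighbouring position with probability `theta / 2`, and the function names and parameter values are hypothetical rather than taken from Estes (1972) or Nairne (1992).

```python
def perturb_step(dist, theta):
    """Apply one time step of perturbation to a distribution over list positions."""
    n = len(dist)
    new = [0.0] * n
    for pos, p in enumerate(dist):
        stay = p
        for neighbor in (pos - 1, pos + 1):
            if 0 <= neighbor < n:  # items at the list ends can only drift inward
                moved = p * theta / 2
                new[neighbor] += moved
                stay -= moved
        new[pos] += stay
    return new

def position_distribution(true_pos, list_length, theta, steps):
    """Probability that an item is coded at each position after a delay."""
    dist = [0.0] * list_length
    dist[true_pos] = 1.0  # position is coded perfectly at presentation
    for _ in range(steps):
        dist = perturb_step(dist, theta)
    return dist

# Position coding for the middle item of a 5-item list after short and long delays.
short_delay = position_distribution(true_pos=2, list_length=5, theta=0.2, steps=1)
long_delay = position_distribution(true_pos=2, list_length=5, theta=0.2, steps=6)
```

The probability mass spreads outward from the true position as the delay grows, so errors are mostly near transpositions at first and become more widely scattered later.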

Future evidence may well require fine-tuning of the definition of activation, which is offered as a first pass. For example, it is unclear whether variations in decay rates are absolute or functional. According to absolute variation, one would have to identify a hidden (e.g., neural) variable showing the amount of loss. For a newly-presented item there might, for example, be a larger proportion of relevant neural cells returning to a non-firing state every second if the item is presented only briefly (resulting in poor consolidation), compared to a longer presentation. According to functional (relative) variation, the loss of the hidden variable would be the same no matter what the degree of consolidation, but a better-consolidated representation would be useful for a particular task longer than a poorly-consolidated representation. At present, it seems impossible to tell the difference between these flavors of the term “decay” but future research might tell.
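
The difficulty of telling the two flavors of decay apart can be made concrete with a small sketch. Everything here is an assumption made for illustration: activation is treated as decaying exponentially, usefulness for a task is treated as activation remaining above a fixed threshold, and all numerical values are hypothetical. In the absolute case the hidden variable is lost faster for a poorly consolidated item; in the functional (relative) case the loss rate is identical, but the better-consolidated item starts from a stronger representation.

```python
import math


def usable_time(initial, rate, threshold):
    """Seconds until exponentially decaying activation falls to the threshold."""
    if initial <= threshold:
        return 0.0
    return math.log(initial / threshold) / rate


THRESHOLD = 0.2  # hypothetical level below which the item cannot support the task

# Absolute variation: same starting level, but the hidden variable itself
# is lost faster for the poorly consolidated item.
absolute = {
    "poor": usable_time(initial=1.0, rate=0.4, threshold=THRESHOLD),
    "good": usable_time(initial=1.0, rate=0.1, threshold=THRESHOLD),
}

# Functional variation: identical loss rate, but the better-consolidated
# item starts stronger and therefore stays above threshold longer.
functional = {
    "poor": usable_time(initial=1.0, rate=0.2, threshold=THRESHOLD),
    "good": usable_time(initial=5.0, rate=0.2, threshold=THRESHOLD),
}
```

In both cases the better-consolidated item remains usable longer, so behavioral retention data alone would not separate the two accounts; only a measure of the hidden variable itself could do so.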

The neural substrate of aLTM is currently unclear. On one hand, some fMRI studies suggest that neural activity representing a particular kind of item can be observed if the item is immediately needed and therefore is in the focus of attention, but not if the information is needed only later in the trial (e.g., Emrich, Riggall, LaRocque, & Postle, 2013; Lewis-Peacock et al., 2012). According to those studies, aLTM more broadly might be represented neurally in the form of synaptic weights that are not visible to fMRI (Rose et al., 2016).

For the future, what would be helpful is a common fMRI procedure to examine both item and order encoding in the brain (cf. Healy, 1974), both in immediate recall and in delayed recall. For example, one could present a series of pictures one at a time in different screen locations; the prediction is that, if the task is one of recalling the spatial locations, the Lewis-Peacock et al. (2012) visual areas of activation should show up, and perhaps the Christophel et al. (2018) areas; whereas, if the task is one of recalling the serial order of presentation, the Kalm and Norris (2017) areas should show up. The areas of activation of the representations should be similar in immediate or delayed recall, even though there should be other differences in the activity patterns reflecting the different strategies needed for processing in immediate versus delayed recall situations.

The hypothesis that the representation of order is comparable in STM and LTM leads to the prediction that it should be possible to become deficient at short- and long-term order recall with relatively well-preserved short- and long-term item recognition. There is some evidence for impaired order memory with preserved item memory, but to my knowledge it has only been tested separately in STM procedures (Cowan et al., 2017; Martinez Perez, Majerus, & Poncelet, 2013; Martinez Perez, Majerus, Mahot, & Poncelet, 2012) versus delayed recognition and recall procedures (Mayes et al., 2001). However, the STM order-specific deficits involved dyslexic individuals, including adults (Martinez Perez et al., 2013), and another study showed a deficit in Hebbian long-term order learning in dyslexic adults (Szmalec, Loncke, & Page, 2011), evidence that is at least consistent with a general STM-LTM order learning mechanism that is capable of being damaged separately from item information.

Rapid long-term learning contributing to aLTM.

It is proposed that newly-learned information contributes to the aLTM that an individual uses in recall tasks. Supporting this proposal, take as an example a phenomenon that helped establish the concept of STM, that of Peterson and Peterson (1959). They found that a printed trigram to be remembered, followed by counting backward from a high number by 3’s, as a distraction, led to drastically worsening recall of the trigram as the counting interval went from 0 to 18 s. The interpretation was that STM of the trigram was lost during the backward counting because rehearsal was not possible. Their evidence was soon removed from the arsenal of findings clearly supporting distinct STM and LTM stores, because of what Keppel and Underwood (1962) found. The latter did a more fine-grained analysis of this procedure and found that, in the first few trials, there was almost no forgetting as a function of the distraction-filled delay. That result seems to require the notion that recall of the trigram after the delay can be accomplished by retrieval of a newly-learned representation from LTM that has become inactive during the delay. If there have been even a few previous trials, however, recall of this representation becomes impractical, suggesting that it is a rather weak representation that is quite susceptible to proactive interference.

What happens at short delays is still open to debate. According to a unitary-memory view, at short intervals the most recent trigram is temporally distinct from the others, allowing excellent recall, but it loses its distinctiveness as a function of time. According to a two-store view, after proactive interference has set in, recall at short delays can still be based on STM. Supporting that view is evidence of some decay in a single-trial experiment (Baddeley & Scott, 1971). If it is decay that takes place, however, it is not a simple decay; the rate of loss of the first consonant in the Peterson and Peterson (1959) procedure depends on how many consonants are being held concurrently (Melton, 1963). Regardless of which view one takes, with or without the involvement of some kind of STM function, it still seems necessary to acknowledge the rapid long-term learning of each list in order to explain the proactive interference effects, in contrast to the naive view that Peterson and Peterson (1959) had tapped into a pure measure of STM. Subsequent studies adopted the modified view in which tests are not pure, for example in Waugh and Norman’s (1965) celebrated theory in which only the end of a long list is retrieved using STM.

Fundamentally, given that almost everyone now acknowledges that new long-term learning takes place rapidly, a key question becomes whether that long-term learning might be the same thing as the mechanisms that mediate maintenance of new contextually-specific information in STM, such as serial order in a list. If there is no distraction to interfere with that newly-learned information, it may remain in aLTM throughout the trial.

Variable-rate aLTM decay.

My view has evolved from the possibility of decay at a fixed rate to the notion of variable-rate decay depending on the amount of learning of the material. Cowan’s (1988, 1995) concept of aLTM was a simple one in which activation was defined by a state in which the represented ideas were currently highly available, presumably because of neural activity that decayed to zero within about a minute. This conception was backed by experiments on memory for unattended speech in which the access to memory showed marked loss if attention to the speech was delayed by distraction for up to 10 s before a cue to turn attention to the speech (e.g., Cowan, Lichty, & Grove, 1990). The conception includes decay not only of item information, but also some associative information. Thus, experiments on memory of spatial arrays of characters showed that when information was lost over an unfilled delay, it did not fade uniformly as in an acid bath; instead, items first became less attached to the correct array locations (Mewhort, Campbell, Marchetti, & Campbell, 1981). It was later shown that memory for unattended speech decayed in a way that affected recall in a serial order task (Cowan, Nugent, Elliott, & Saults, 2000).

Now, we are faced with evidence that under some circumstances there is no decay, even in the presence of concurrent distraction and articulatory suppression (Oberauer & Lewandowsky, 2008). Ricker and Cowan (2014) and Rhodes and Cowan (2018) suggest that the rate of decay is diminished as the LTM representation of the item being maintained is strengthened through new learning. The relation between STM storage and later long-term recall would depend on the processes involved in retention. For example, Craik, Gardiner, and Watkins (1970) examined final free recall and recognition of items that had been presented earlier within lists for immediate free recall. The most recent items from immediate recall were later recalled and recognized relatively poorly. This pattern makes sense if a strong LTM representation did not have to be formed for those most recent items, which could be recalled first in immediate recall. When more interference is added to immediate recall tasks, which is the case for the earlier-presented items, a stronger LTM representation is formed.

In complex span tasks, processing episodes are placed between the items to be recalled, so storage has to be accomplished in a way that overcomes interference. According to a separate-STM view, one might expect that there would only be enough capability to use the STM storage to hold the words to be recalled, with other mechanisms doing the processing. It would be an extra burden to memorize the list items. Yet, better long-term recall would be expected if the mechanism used to recall the list items is in fact long-term learning. McCabe (2008) examined unexpected delayed recall of items that had been presented earlier in simple, immediate serial recall or in immediate recall in a complex span task, and found better delayed recall in the latter case (for earlier serial positions particularly), which favors the notion that the mechanism used to allow immediate recall was a long-term learning mechanism.

Some theories seem to depend on decay to explain why the amount of information we can recall in order from a verbal list is approximated by the amount we can recite in 2 s (Baddeley et al., 1975), or why the amount we can recall is linearly related to the proportion of time between items that is not filled with distraction (Barrouillet et al., 2011; Camos et al., 2011). In these theories, rehearsal or attention is used to refresh the representation before it is lost through decay. It is possible that these are situations in which the representation is not well-established and therefore is in need of refreshment, but there seems to be a clash with the finding of Oberauer and Lewandowsky (2008) that there is no decay with seemingly comparable presentation rates. An alternative possibility that might resolve differences between studies is that free time can be used to establish better LTM representations, reducing the decay rate to improve recall.
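
The two quantitative regularities just described can be written down as a brief sketch. The function names, the example speech rate, and the intercept and slope are hypothetical illustrations, not values taken from the cited studies. The first function expresses the approximation that ordered recall matches what can be recited in about 2 s; the other two express a linear relation between recall and the proportion of inter-item time that is free of distraction.

```python
def span_from_speech_rate(items_per_second, window_s=2.0):
    """Approximate ordered recall as the number of items recitable in ~2 s."""
    return items_per_second * window_s


def free_time_proportion(busy_s, total_s):
    """Proportion of the inter-item interval not filled with distraction."""
    return 1.0 - busy_s / total_s


def span_from_free_time(free_proportion, intercept, slope):
    """Linear relation between recall and free time (hypothetical parameters)."""
    return intercept + slope * free_proportion


# Someone who can recite 3 items per second would be predicted to recall
# about 6 items in order.
predicted_span = span_from_speech_rate(items_per_second=3.0)

# With a hypothetical intercept of 2 and slope of 4, each equal step in
# free time yields an equal step in predicted recall.
spans = [span_from_free_time(p, intercept=2.0, slope=4.0) for p in (0.25, 0.5, 0.75)]
```

Note that neither relation, by itself, says whether free time counteracts decay or instead supports the construction of a better LTM representation; the same equations are compatible with both readings.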

Role of Attention and Executive Function in Directing aLTM and New Learning

Although Cowan (1988, 1999) did not stipulate a separate copy of information in STM, the theory depended fundamentally on processing and storage mechanisms other than just aLTM: central executive function to control the flow of information, and the FoA as a temporary seat of retention of pointers to a small number of processed objects, items, chunks, or events, leading to STM retention and new LTM learning. It is important to understand these attention-related mechanisms to appreciate how the system could operate with these mechanisms in place, without a separate copy of information in STM. The FoA differs from a specialized store with limited capacity in that a limited attention must be shared between storage and other sorts of processing that do not require storage. Examples are the disruption of visual working memory by covert verbal retrieval (Ricker, Cowan, & Morey, 2010) or by tone identification (Stevanovski & Jolicoeur, 2007), slowing of Necker Cube reversals by a letter string memory load (Intaitė, Koivisto, & Castelo-Branco, 2014), and loss of letter string memory due to arithmetic required during a retention interval (Doherty et al., in press; Rhodes et al., in press).

Cowan (1988, 1999) proposed that attention is governed partly through the deliberate intervention of executive function and partly by recruitment of attention to abrupt changes in the environment that require a change in the current neural model of the environment. This neural model presumably describes the characteristics and regularities of the current environment on any level that has been successfully processed (including physical and semantic features).

Behavioral function of the FoA.

In various venues, I have proposed that a basic function of the FoA is to inter-associate elements that are represented concurrently to form new series, new structures, new concepts, and so on (Cowan, 1995, 1999, 2001, 2005, 2010; Cowan & Chen, 2008). According to this view, a pointer system is expected in which a structured set of references to information in aLTM would be established, which would result in new aLTM learning (Cowan et al., 2013). It could include such features as variable binding (cf. Norris) though I have not discussed that point. It would be expected that the pointer system would work across modalities, which would make it somewhat related to Baddeley’s (2000) episodic buffer conception except that it would cause long-term learning of the structures established in STM.

Cowan, Saults, and Blume (2014) tested the possibility that items from an array or series that are apprehended by the FoA remain there throughout a trial, but the results did not turn out that way. There was some conflict between memory for an array of colored objects and a series of spoken or printed verbal items, but not nearly as much conflict as one would expect if both sets were continually held in the FoA (cf. Baddeley & Hitch, 1974). Cowan et al. suggested that a set of items is apprehended with the FoA and then off-loaded into new LTM representations (see also Rhodes & Cowan, 2018). Although serial order memory was not tested, binding between features within an object was tested (color-shape and verbal item-voice binding) and it was found that the amount of new aLTM formed was smaller than it was for isolated features, but still not negligible. Each binding may require retention of two elements to be bound, severely limiting the binding capability (cf. Treisman & Gelade, 1980). The FoA would also be limited in terms of the serial order information held; capacity presumably rises to several items only because people make use of rehearsal and new learning to help hold some of the bound features or contextual information, such as serial positions, in aLTM without constant use of attention.

When sufficient FoA involvement is available, it is possible for performance to benefit from irrelevant but still-activated LTM representations, while the costs of using them are limited because the FoA selectively uses task-relevant information. For example, Oberauer, Awh, and Sutterer (2017) showed that a visual memory task (recall of colors of objects by reproduction on a color wheel) yielded proactive facilitation between trials but not proactive interference.

Neural underpinning of the working memory system with STM based on aLTM, new learning, and the FoA.

Cowan (1995) proposed neural involvement in working memory and STM from the standpoint of an embedded-process approach, based largely on prior research on brain damage. It was suggested that the inferior parietal areas serve as a primary site for the FoA, whereas it is controlled by frontal lobe regions. Various posterior regions that process information of various types also serve as the substrate for aLTM, including temporal regions for acoustic and speech-related information and occipital areas for visual information, with association areas storing abstract information. This scheme was supported partly by findings of more awareness-related deficits with parietal damage (including hemispatial neglect and anosognosia, the unawareness of a serious deficit such as paralysis of a limb) and control-related deficits with frontal damage, in addition to many studies of selective processing and topic-specific memory deficits with posterior cortical damage.

Figure 1 depicts proposals regarding the brain representation of working memory according to Cowan (1995, 1999). Note that there is no separate copy of information outside of aLTM, which includes newly-formed LTM that is still in an active state. There are pointers from the IPS to aLTM areas that are currently attended. The structure of attended information (e.g., serial position arrangement of items in a list) is represented by structure in the set of pointers, and that structured information can be entered into aLTM (possibly directly; probably with the involvement of the hippocampus and surrounding regions). The remainder of this section justifies these statements about the role of attention in relation to aLTM and new learning.

Chein and Fiez (2010) carried out an fMRI study in which several theories of STM and working memory were compared on the basis of manipulations of irrelevant speech, irrelevant broadband noise bursts, and articulatory suppression during a task of memory for 7 letters. The brain responses to the task and reactions to impediments to memory strongly supported the embedded process account, and were deemed to be inconsistent with the three other accounts examined (multiple-component, object-oriented episodic record, and feature models). After the printed presentation of a list of letters one at a time, in the response period one letter was presented as a probe and the participant was to write the letter that followed the probe in the list. The key issue was whether these sources of interruption affected the same brain regions, as would be expected according to most of the models, but not the embedded processes model. The logic of the examination was summarized as follows:

A simple logic can be employed to form predictions. Effects having a common source should influence the fMRI signal during working memory processing in the same way. That is, the pattern of brain activity observed under conventional (quiet) working memory conditions should be modified in the same way by separate irrelevant information effects that derive from the same source. By contrast, effects having different sources should accordingly have dissociable consequences for brain activity. Such influences on brain function may materialize as an alteration of the signal magnitude or temporal processing within the typical working memory network, or as a shift in the neuroanatomical substrates of performance (i.e., a change in the set of regions activated during working memory). (Chein & Fiez, 2010, p. 121–122)

This reasoning seems comparable to the dissociation logic used by researchers who deal with neural deficits in memory, discussed earlier (e.g., Scoville & Milner, 1957; Warrington & Shallice, 1969). According to theories with a separate copy of STM information, such as the multiple-component model, suppression and irrelevant information should act similarly. Instead, the key finding by Chein and Fiez was (p. 117) that “Within a principally frontal and left-lateralized network of brain regions, articulatory suppression caused an increase in activity during item presentation, whereas both irrelevant speech and nonspeech caused relative activity reductions during the subsequent delay interval.” The interpretation from an embedded-processes standpoint was that articulatory suppression made phonological rehearsal unavailable, forcing participants to carry out more attention-demanding encoding, whereas irrelevant items during maintenance acted as distractions that caused items to be lost from the FoA.

Todd and Marois (2004) administered a task of STM for items in a spatial array using fMRI and found load-dependent activity specific to the IPS. Although they considered this activity to be a locus of visual memory, from the theory of Cowan (1995, 1999), the IPS could be a more general area for maintenance of abstract information from any modality. To examine this possibility, Cowan et al. (2011) presented both acoustic letters and a visual spatial array of colors and varied the amount of material to be remembered. Although many areas of the brain responded to the memory load in one modality or another, robust activity for a memory load in either modality was specific to the left IPS. It was present during both encoding and maintenance periods. In a further analysis of the same data, Li et al. (2014) found that the IPS was functionally connected with posterior regions that differed depending on whether it was visual or verbal information that was maintained on a given trial.

Using a functional connectivity analysis, Majerus et al. (2006) showed that the left IPS modulates attention not only to item information in a verbal list, but also to order information. They stated (p. 880) that “during order STM, the left IPS was functionally connected to serial/temporal order processing areas in the right IPS, premotor and cerebellar cortices, while during item STM, the left IPS was connected to phonological and orthographic processing areas in the superior temporal and fusiform gyri.” This result suggests that there is a special brain network for encoding and remembering serial order but, again, this network’s product could be recorded as new LTM learning rather than as a separate copy of information for the short term. The IPS would serve as a key part of a “hub” of attention (namely the FoA) to work with areas known to process information. Although interpretation of a simple correlation can be considered a fallacy as Norris suggested, the more detailed the relation becomes, the more plausible an explanation becomes. For example, the fact that Cowan et al. (2011) found that only the left IPS clearly responded to a memory load of either the visual or the verbal types during maintenance does not necessarily mean that the left IPS is involved in maintenance activities. However, several other, convergent types of findings strengthen the hypothesis.

First, the neural performance has been related to behavior. Cowan et al. (2011) found that the level of activity in the left IPS was correlated with STM performance, but this was significantly so only for the activity near the end of the retention interval. Working memory performance accuracy was also related to the neural area that was active across verbal and visual stimuli in another study (Chein, Moore, & Conway, 2011).

Second, functional connectivity results show more about the STM process. These results (Li et al., 2014; Majerus et al., 2006) show that the IPS region works in combination with areas where processing of the information can be expected, in STM tasks.

Third, multivoxel pattern analysis of fMRI results provides much more specificity to the observations. The nature of specific stimuli in STM tasks (e.g., faces, orientations, directions of movement, semantic information) can be identified with better-than-chance accuracy from these patterns, in different posterior regions of the brain as expected from neural damage evidence (e.g., Emrich et al., 2013; Lewis-Peacock et al., 2012), but only for items that are currently being used and not for items needed later in the trial (but see Christophel et al., 2018). In contrast to the posterior regions, in the IPS one can detect not the stimulus type but the memory load, abstractly defined. Specifically, Majerus et al. (2016) found that a classification algorithm trained to detect the memory load from verbal input also succeeded with visual input, and that visual training of the algorithm likewise worked for verbal input. The key activity was, as expected, in the IPS. Gosseries et al. (2018) found more information about how the IPS works, in that the amount of IPS activity depended not only on the memory load, but also on how similar the memory items were to one another; there was much more activity in response to a direction of motion to be remembered when a series to be remembered included three directions of motion and the correct serial position had to be selected, compared to when the series to be remembered included only one direction of motion along with two colors. Forming and maintaining associations between items and their serial positions would be an instance of binding that, according to the general principle established by Treisman and Gelade (1980), would be attention-demanding and limited in capacity.
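
The logic of cross-modal load decoding can be sketched with a toy example. This is not the classifier or the data of Majerus et al. (2016); it uses a simple nearest-centroid rule and purely synthetic four-feature "patterns" in which, by assumption, two features carry an abstract load signal shared across modalities and two features carry modality-specific information. All names and numerical values are hypothetical.

```python
import random


def make_trial(rng, load, modality):
    """Synthetic pattern: features 0-1 carry load, features 2-3 carry modality."""
    signal = 2.0 if load == "high" else 0.0
    base = [
        signal,
        signal,
        3.0 if modality == "verbal" else 0.0,
        3.0 if modality == "visual" else 0.0,
    ]
    return [x + rng.gauss(0.0, 0.5) for x in base]  # add Gaussian noise


def centroid(rows):
    """Mean pattern across a set of trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_decode(train_modality, test_modality, n_per_class=20, seed=0):
    """Train load centroids on one modality, then classify load in the other."""
    rng = random.Random(seed)
    centroids = {
        load: centroid([make_trial(rng, load, train_modality) for _ in range(n_per_class)])
        for load in ("low", "high")
    }
    correct = 0
    for load in ("low", "high"):
        for _ in range(n_per_class):
            trial = make_trial(rng, load, test_modality)
            guess = min(centroids, key=lambda k: sq_dist(trial, centroids[k]))
            correct += guess == load
    return correct / (2 * n_per_class)


verbal_to_visual = cross_decode("verbal", "visual")
visual_to_verbal = cross_decode("visual", "verbal")
```

Because the modality-specific features shift a test pattern equally far from both load centroids, classification in this toy case is driven by the shared load features alone. Transfer across modalities therefore succeeds only to the extent that an abstract load code exists, which is the inference drawn from the IPS findings.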

Fourth, there is convergent evidence from other methodologies. Using electro- and magnetoencephalography to examine oscillatory brain activity, Palva, Monto, Kulashekhar, and Palva (2010) showed that the frontal-parietal network known to be involved in attention-related activity displayed increasing neural synchrony as a function of the memory load, but that (p. 7580) “individual behavioral [visual working memory] capacity was predicted by synchrony in a network in which the intraparietal sulcus was the most central hub.” Anderson, Ferguson, Lopez-Larson, and Yurgelun-Todd (2010, p. 20110) note that “The intraparietal sulcus (IPS) region is uniquely situated at the intersection of visual, somatosensory, and auditory association cortices, ideally located for processing of multisensory attention.” They go on to use MRI to show that there are specialized parts of the IPS linked to different modalities and that the part of the IPS that is active is related to the modality of processing being accomplished, in non-memory tasks. Cabeza et al. (2011) argued that the overlapping activation of this network in perception and memory supports an attention-to-memory account, as in the present account.

Fifth, neuroimaging can be accompanied by an experimental manipulation that causes temporary functional lesions, namely TMS. Some findings using this technique have strengthened the neural conception from the embedded processes point of view. Postle et al. (2006) showed that although the frontal-parietal network tends to be activated as a unit, one can distinguish the functions of its frontal and parietal parts using TMS. A repetitive TMS pulse was applied to the prefrontal cortex or the superior parietal lobule during short-term retention (remembering a list of letters to be recalled) or short-term retention with an added processing task (remembering a list of letters and alphabetizing them mentally before recall). Parietal TMS interfered with performance of either task, which is to be expected if the parietal areas maintain the content or pointers to it, whereas frontal stimulation interfered only with performance on the task that included alphabetization, which is to be expected if the frontal area does not actually maintain the information but is engaged in the task of reorganizing it.

Summary.

The embedded processes model of Cowan (1995, 1999) provides a way of thinking about STM and working memory that is friendly to behavioral and neural data, is plausible, and does not depend on any kind of STM storage that is not also shared with LTM. It seems like the more parsimonious account and, if others wish to propose a separate copy of information in STM that is not shared with LTM, they must accept partial responsibility for the burden of proof beyond just their favored interpretation of data.

Consistency of This Account with the Broader Literature

Oberauer et al. (2018) recently addressed the working memory / STM community at large in an attempt to arrive at benchmark behavioral findings that any theoretical account would have to handle in order to be consistent with the literature. The present conception does not include explicit accounts for all of these benchmarks but, to my knowledge, is consistent with them. Below I will comment on Oberauer et al.’s 13 categories of benchmarks.

Benchmarks of Category 1 are about set size effects, which are explicitly accounted for based on the capacity limit in the FoA to about 3–4 chunks of information in adults, but with some of the new information sometimes rapidly memorized so that the total capacity of the system can exceed this FoA capacity under the right circumstances (Cowan et al., 2012, 2014).

Benchmarks of Category 2 are about retention interval and presentation duration. Although there are clear effects of these variables, there are also unresolved inconsistencies in the literature, in which studies find marked decay or memory loss across several seconds (e.g., Baddeley & Scott, 1971; Ricker, 2015; Ricker & Cowan, 2010; Zhang & Luck, 2009) or no decay at all (e.g., Oberauer & Lewandowsky, 2008). Here I am proposing a novel resolution, suggesting that the variable rates of decay can be accounted for by the degree of rapid new learning (also called consolidation of the information into working memory), which can depend on the presentation duration (Ricker, 2015; Ricker & Cowan, 2014; Ricker & Hardman, 2017). When context is needed (e.g., serial or spatial position), new learning is all that is available and the amount of it can vary; hence, the rate of decay and amount preserved long-term can vary (Cowan et al., 2013; Nairne, 1992; Nairne & Neath, 2001). Findings that have been attributed to processes that counteract decay (e.g., Baddeley et al., 1975; Barrouillet et al., 2011; Camos et al., 2011) can be explained instead by construction of a more stable long-term representation (and a better aLTM for the task) using rehearsal and/or attention as tools in this construction.

Benchmarks of Category 3 are about effects of serial positions in lists. Here, one assumption has to be that early list items lead to the best long-term learning, probably because of more access to attention than later items, a factor that plays a major role in extant theories of serial position effects in STM, as in the primacy model of Page and Norris (1998), though these models do not rely on long-term learning as a mechanism of STM. Late list items can be recalled early on in free recall of a list, allowing them to avoid both input and output retroactive interference, whereas this interference allows considerable loss in serial recall (Cowan et al., 2002). Nevertheless, the upturn at the end of the list even in serial recall could occur because the unfilled interval after the last item allows especially good new learning (consolidation) of that list item and its serial position in the list, or its status as last item.

Benchmarks of Category 4 have to do with error patterns, and I have discussed perturbations in item-serial position coding as a type of forgetting (e.g., Nairne, 1992). These can occur when new learning of the list is imperfect. I have not dealt with many kinds of errors, but neither have I contradicted the many theories that attempt to deal with them; I suggest only that the mechanisms in these theories are not, to my knowledge, incompatible with the notion that STM coding is accompanied by new LTM learning.

Benchmarks of Category 5 have to do with multiple demands on working memory, such as combination of a verbal set with a visual set, which have played a large role in my attention-plus-aLTM-with-new-learning account (e.g., Cowan et al., 2014). The noted asymmetry between verbal and visual memory, with more ubiquitous interference with visual memory, was accounted for here with the notion that it is the degree of learning of each set that determines how well memory for that set is maintained in the face of interference.

A finding that dual tasks interrupt feature information (e.g., red) and binding information (e.g., the square is red) equally (e.g., Allen, Baddeley, & Hitch, 2006) would seem curious, inasmuch as retention of binding information is completely dependent on new learning in my approach, whereas feature information could be retained through activation of previously-learned features in aLTM. However, feature memory is typically at a higher level of performance, so that the proportion of information lost because of a secondary task is larger for memory of binding information than of feature information (Cowan et al., 2014).
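
The arithmetic behind this last point can be shown with a short sketch. The accuracy values are hypothetical and are chosen only to illustrate the reasoning; they are not results from Allen et al. (2006) or Cowan et al. (2014), and no correction for guessing is applied. If a secondary task removes the same absolute amount from feature memory and from binding memory, the proportional loss is larger for binding because binding memory starts from a lower baseline.

```python
def proportion_lost(baseline, dual_task):
    """Share of baseline performance that is lost under the secondary task."""
    return (baseline - dual_task) / baseline


# Hypothetical proportions correct, for illustration only. The secondary
# task removes the same absolute amount (0.10) in both cases.
feature_loss = proportion_lost(baseline=0.90, dual_task=0.80)  # about 0.11
binding_loss = proportion_lost(baseline=0.60, dual_task=0.50)  # about 0.17
```

An equal absolute decrement is thus compatible with a greater relative cost for binding information, which is what would be expected if binding depends more heavily on new learning.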

Benchmarks of Category 6 have to do with auditory distraction and are handled in a manner subsumed within the response to Category 5. When irrelevant stimuli during a serial recall task cannot be ignored, they may cause distraction and they may also inappropriately be entered into the series of pointers signifying the serial positions of list items to be remembered.

Benchmarks of Category 7 have to do with word length effects based on the number of syllables in the list items. Words with more syllables are more distinctive in recall but also cause more extended interference with each other in the list, which can lead to a short-word advantage because of output interference in immediate recall, or to a long-word advantage when interference is added in a delayed recall task (Cowan et al., 1994).

Benchmarks of Category 8 have to do with similarity effects, which are easily explained on the grounds that aLTM representations interfere with each other to the extent that they share features (Cowan, 1988; Nairne, 1990).

Benchmarks of Category 9 have to do with distinctiveness and grouping. In the current approach, they are all related to how easy or difficult it is to form new aLTM representations.

Benchmarks of Category 10 have to do with prioritization of information in working memory. They are clearly compatible with an attention-based approach in which aLTM is constructed and perpetuated with the assistance of the FoA to a degree related to the prioritization of the information.

Benchmarks of Category 11 have to do with knowledge effects, of obvious relevance to the aLTM approach and not incompatible with that approach. The embedded-processes approach could be used to predict error patterns (Category 4) that Oberauer et al. (2018) did not consider. In an immediate recall task, it is not sufficient to pool all sources of LTM together in one’s answer, but confusions do occur between aLTM activation from the newly-formed episodic record and aLTM from other sources. One key example of this happening is in the false memory procedure of Roediger and McDermott (1995). Suppose one receives a list of many words for recall related to shirt, without actually receiving the word shirt itself. The newly-formed episodic record in LTM includes words that were presented (e.g., perhaps button, sleeve, blouse, etc.). However, these words together activate the word shirt indirectly and that aLTM representation sometimes leads to false recall despite its inappropriate source of activation. This kind of example shows that the status of being activated can influence performance of the system without absolute clarity of whether old information or new learning was the source of activation.
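The false-memory example can be sketched as a toy activation computation. Everything in the sketch is an assumption made for illustration: the association strengths, the additive summation rule, and the threshold are hypothetical and are not parameters of the embedded-processes approach or of Roediger and McDermott (1995). The point is only that an unpresented word can reach the same activated status as presented words while having a different source of activation.

```python
from collections import defaultdict

# Hypothetical association strengths from studied words to related words.
ASSOCIATIONS = {
    "button": {"shirt": 0.4, "sew": 0.3},
    "sleeve": {"shirt": 0.5, "arm": 0.3},
    "blouse": {"shirt": 0.6, "skirt": 0.2},
}


def activate(studied, associations, threshold=1.0):
    """Return {word: (activation, sources)} for words at or above threshold."""
    activation = defaultdict(float)
    sources = defaultdict(set)
    for word in studied:
        # Direct activation from the newly formed episodic record.
        activation[word] += 1.0
        sources[word].add("episodic")
        # Indirect activation spreading to associated words.
        for neighbor, strength in associations.get(word, {}).items():
            activation[neighbor] += strength
            sources[neighbor].add("associative")
    return {w: (a, sources[w]) for w, a in activation.items() if a >= threshold}


result = activate(["button", "sleeve", "blouse"], ASSOCIATIONS)
for word, (level, source) in sorted(result.items()):
    print(word, round(level, 2), sorted(source))
```

In this sketch the word shirt is a candidate for recall even though its only source of activation is associative, whereas weakly associated words such as arm remain below threshold.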

Benchmarks of Category 12 have to do with individual differences, which are outside of the scope of this article for the most part but have been treated at length within the embedded-processes approach. They have been discussed with respect to capacity and dual tasks in the case of child development (e.g., Cowan, 2016) and aging (e.g., Rhodes et al., in press).

Last, Benchmarks of Category 13 have to do with neuroscience, discussed here throughout.

Summary

An aLTM representation must include rapid new learning and is subject to loss through decay that occurs at a rate that is abated as a better representation of the material is established. There is also interference based on the similarity between the features of items in aLTM and new incoming items, and that interference also is presumably abated by improvement of the representation of the information in memory, such as noticing patterns in the information or possibly even some beneficial effects, for recognition at least, of rote mental repetition (Glenberg, Smith, & Green, 1977).
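One way to picture a decay rate that is abated by better learning is the following minimal sketch. The text does not commit to a functional form, so the exponential decay and the rule that divides a base rate by one plus the degree of learning are assumptions made purely for illustration, as are all parameter values.

```python
import math


def decay_rate(base_rate: float, learning: float) -> float:
    """Assumed rule: better learning (>= 0) lowers the decay rate."""
    if learning < 0:
        raise ValueError("learning must be non-negative")
    return base_rate / (1.0 + learning)


def activation(t: float, base_rate: float, learning: float) -> float:
    """Assumed exponential decay of aLTM activation over t seconds."""
    return math.exp(-decay_rate(base_rate, learning) * t)


# Hypothetical values: same retention interval, different degrees of learning.
for learning in (0.0, 1.0, 3.0):
    print(learning, round(activation(5.0, 0.2, learning), 3))
```

Under these assumptions, a well-learned representation retains more activation after the same delay than a poorly learned one, which is the qualitative pattern described above.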

Concluding Observations

Although there is no evidence explicitly ruling out the idea of separate STM and LTM stores, this idea seems like an added complication to a model that almost certainly must include LTM in both active and dormant forms and attention to at least some of the currently active forms. Given the possibilities brought up here, there is good evidence for the sort of embedded-processes model of Cowan (1988) if several additional premises are granted: (1) the inclusion in aLTM of newly-formed LTM representations of episodes such as stimuli on the current trial (Cowan, 1999); (2) neural damage to mnemonic control processes that are critical for typical STM tasks but less so for typical LTM tasks, without damage to storage per se (Cermak, 1997; Cowan, 1988; Morey et al., 2019); (3) a neural basis of LTM activation that may not involve neuronal firing, but other physiological factors that affect the current synaptic weights (Rose et al., 2016); and (4) a variable rate of decay depending on the degree of learning of the information (Ricker, 2015; Ricker & Cowan, 2014). The behavioral evidence favors a heavy role of LTM in STM procedures in a way that is easily handled by assuming that the STM information is, in fact, aLTM with some of that information attended.

The separate-copy version of STM could come in at least two varieties: either as a stand-alone STM, or as part of a system that also includes the activated portion of LTM, or aLTM. The latter type of model could, of course, handle all of the findings that could be handled with only aLTM, so the important question is whether there is any evidence that requires this added complexity in the model. I do not wish to suggest that the mysteries of behavioral and neural coding of STM are now settled. Much is still unknown, and there are complexities. Take, for example, the “problem of two” noted above (cf. Jackendoff, 2002), the need to represent types and tokens. Even without a separate-copy type of STM, there may be multiple copies of the information in the mind, for example in the hippocampus and also in the neocortex. Alternatively, there may only be one copy of the information, in the neocortex, and the hippocampal system might include only pointers to that information; the hippocampus would do its job only in connection with the neocortex, never on its own. Similarly, when the series 131 is presented, the intraparietal sulcus involved in the attention to information (Cowan, 2011) could include pointers to posterior areas reflecting types (e.g., a pointer to 1 and another pointer to 3), thus not doing the full job of holding an episodic record, or could reflect tokens (with pointers to 1, 3, and 1 in different serial positions). The brain research may not yet have included the conditions needed to discriminate between these possibilities. We do know that the intraparietal sulcus seems to work hard to keep items distinct from one another, so that the representation of three directions of motion in sequence involved more activity than sequences with one direction of motion and two colors (Gosseries et al., 2018).
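The contrast between type-based and token-based pointers for the series 131 can be illustrated with a minimal data-structure sketch. The variable and function names are hypothetical, and the sketch makes no claim about how the intraparietal sulcus actually codes information; it shows only which information each scheme preserves.

```python
series = [1, 3, 1]

# Type-based pointers: one pointer per distinct item.
# Repetition and serial order are not represented.
type_pointers = set(series)

# Token-based pointers: one pointer per serial position,
# stored as (position, item) pairs.
token_pointers = list(enumerate(series, start=1))


def reconstruct(tokens):
    """Rebuild the presented series from (position, item) pointers."""
    return [item for _, item in sorted(tokens)]


print(type_pointers)
print(token_pointers)
print(reconstruct(token_pointers))
```

Type pointers alone cannot distinguish 131 from 311 or 13, whereas token pointers suffice to reconstruct the episodic record of the series.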

I hope to have shown that the question of what is meant by LTM activation can be answered by combining a priori, behavioral, and neural considerations, leading to a known definition and some unknown but potentially knowable details. The mechanism of aLTM including new learning seems both viable and important for understanding cognition. I also hope to have shown that it is an unresolved empirical issue to assess two theoretical alternatives: an activated portion of LTM with new learning, or that plus an added, separate STM copy. The exact meaning of activation and of the two alternatives may change as the effort to test them continues; changing definitions is a legitimate part of the progression of a science (Cowan, 2017b).

Public Significance Statement.

In short-term memory (STM), some suggest that a separate, temporary copy of events in the brain is used. I argue that one copy suffices. Temporarily activated long-term memory, including rapid, new learning guided by attention, describes STM and helps elucidate memory disorders.

Acknowledgments

This paper was written with support from NIH Grant 4R01-HD-21338.

References

  1. Allen RJ, Baddeley AD, & Hitch GJ (2006). Is the binding of visual features in working memory resource-demanding? Journal of Experimental Psychology: General, 135, 298–313. [DOI] [PubMed] [Google Scholar]
  2. Anderson JR (1983). The architecture of cognition. Cambridge, MA: Harvard University Press. [Google Scholar]
  3. Anderson JR, Bothell D, Lebiere C, & Matessa M (1998). An integrated theory of list memory. Journal of Memory and Language, 38, 341–380. [Google Scholar]
  4. Anderson JR, & Matessa M (1997). A production system theory of serial memory. Psychological Review, 104, 728–748. [Google Scholar]
  5. Anderson JR, & Ross BH (1980). Evidence against a semantic-episodic distinction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 6, 441–466. [Google Scholar]
  6. Anderson JS, Ferguson MA, Lopez-Larson M, & Yurgelun-Todd D (2010). Topographic maps of multisensory attention. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 107, 20110–20114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Atkinson RC, & Shiffrin RM (1968). Human memory: A proposed system and its control processes. In Spence KW & Spence JT (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2, pp. 89–195). New York: Academic Press. [Google Scholar]
  8. Atkinson RC, & Shiffrin RM (1971). The control of short-term memory. Scientific American, 224, 82–90. [DOI] [PubMed] [Google Scholar]
  9. Baddeley AD (1986). Working memory. Oxford Psychology Series #11. Oxford: Clarendon Press. [Google Scholar]
  10. Baddeley AD (2000). The episodic buffer: a new component of working memory? Trends in Cognitive Sciences, 4, 417–423. [DOI] [PubMed] [Google Scholar]
  11. Baddeley AD (2003). New data: Old pitfalls. Commentary on Ruchkin, Grafman, Cameron & Berndt. Behavioral and Brain Sciences, 26, 728–729. [DOI] [PubMed] [Google Scholar]
  12. Baddeley AD, & Hitch G (1974). Working memory. In Bower GH (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–89). New York: Academic Press. [Google Scholar]
  13. Baddeley AD, & Scott D (1971). Short term forgetting in the absence of proactive inhibition. Quarterly Journal of Experimental Psychology, 23, 275–283. [Google Scholar]
  14. Baddeley AD, Thomson N, & Buchanan M (1975). Word length and the structure of short term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575–589. [Google Scholar]
  15. Baddeley AD, & Warrington EK (1970). Amnesia and the distinction between long-and short-term memory. Journal of Verbal Learning and Verbal Behavior, 9, 176–189. [Google Scholar]
  16. Barrouillet P, Portrat S, & Camos V (2011). On the law relating processing to storage in working memory. Psychological Review, 118, 175–192. [DOI] [PubMed] [Google Scholar]
  17. Basso A, Spinnler H, Vallar G, & Zanobio ME (1982). Left hemisphere damage and selective impairment of auditory verbal short-term memory: A case study. Neuropsychologia, 20, 263–274. [DOI] [PubMed] [Google Scholar]
  18. Bhatarah P, Ward G, Smith J, & Hayes L (2009). Examining the relationship between free recall and immediate serial recall: similar patterns of rehearsal and similar effects of word length, presentation rate and articulatory suppression. Memory & Cognition, 37, 689–713. [DOI] [PubMed] [Google Scholar]
  19. Bjork RA, & Whitten WB (1974). Recency sensitive retrieval processes in long term free recall. Cognitive Psychology, 6, 173–189. [Google Scholar]
  20. Botvinick MM, & Plaut DC (2006). Short-term memory for serial order: A recurrent neural network model. Psychological Review, 113, 201–233. [DOI] [PubMed] [Google Scholar]
  21. Broadbent DE (1958). Perception and communication. New York: Pergamon Press. [Google Scholar]
  22. Brown GDA, Neath I, & Chater N (2007). A temporal ratio model of memory. Psychological Review, 114, 539–576. [DOI] [PubMed] [Google Scholar]
  23. Brown GDA, Preece T, & Hulme C (2000). Oscillator-based memory for serial order. Psychological Review, 107, 127–181. [DOI] [PubMed] [Google Scholar]
  24. Burgess N, & Hitch GJ (1992). Towards a network model of the articulatory loop. Journal of Memory and Language, 31, 429–460. [Google Scholar]
  25. Burgess N, & Hitch G (1999). Memory for serial order: A network model of the phonological loop and its timing. Psychological Review, 106, 551–581. [Google Scholar]
  26. Burgess N, & Hitch GJ (2006). A revised model of short-term memory and long-term learning of verbal sequences. Journal of Memory and Language, 55, 627–652. [Google Scholar]
  27. Cabeza R, Mazuz YS, Stokes J, Kragel JE, Woldorff MG, Ciaramelli E, Olson IR, & Moscovitch M (2011). Overlapping parietal activity in memory and perception: Evidence for the attention to memory model. Journal of Cognitive Neuroscience, 23, 3209–3217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Camos V, Mora G, & Oberauer K (2011). Adaptive choice between articulatory rehearsal and attentional refreshing in verbal working memory. Memory & Cognition, 39, 231–244. [DOI] [PubMed] [Google Scholar]
  29. Cantor J, & Engle RW (1993). Working-memory capacity as long-term memory activation: An individual-differences approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1101–1114. [DOI] [PubMed] [Google Scholar]
  30. Cermak LS (1997). A positive approach to viewing processing deficit theories of amnesia. Memory, 5, 89–98. [DOI] [PubMed] [Google Scholar]
  31. Chein JM, & Fiez JA (2001). Dissociation of verbal working memory system components using a delayed serial recall task. Cerebral Cortex, 11, 1003–1014. [DOI] [PubMed] [Google Scholar]
  32. Chein JM, & Fiez JA (2010). Evaluating models of working memory through the effects of concurrent irrelevant information. Journal of Experimental Psychology: General, 139, 117–137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Chein JM, Moore AB, & Conway ARA (2011). Domain-general mechanisms of complex working memory span. NeuroImage, 54, 550–559. [DOI] [PubMed] [Google Scholar]
  34. Chen Z, & Cowan N (2009). Core verbal working memory capacity: The limit in words retained without covert articulation. Quarterly Journal of Experimental Psychology, 62, 1420–1429. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Christophel TB, Iamshchinina P, Yan C, Allefeld C, & Haynes J-D (2018). Cortical specialization for attended versus unattended working memory. Nature Neuroscience, 21, 494–496. [DOI] [PubMed] [Google Scholar]
  36. Clarkson L, Roodenrys S, Miller LM, & Hulme C (2017). The phonological neighbourhood effect on short-term memory for order. Memory, 25, 391–402. [DOI] [PubMed] [Google Scholar]
  37. Cowan N (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychological Bulletin, 104, 163–191. [DOI] [PubMed] [Google Scholar]
  38. Cowan N (1992). Verbal memory span and the timing of spoken recall. Journal of Memory and Language, 31, 668–684. [Google Scholar]
  39. Cowan N (1995). Attention and memory: An integrated framework. Oxford Psychology Series, No. 26. New York: Oxford University Press. (Paperback edition: 1997) [Google Scholar]
  40. Cowan N (1999). An embedded-processes model of working memory. In Miyake A & Shah P (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 62–101). Cambridge, U.K.: Cambridge University Press. [Google Scholar]
  41. Cowan N (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185. [DOI] [PubMed] [Google Scholar]
  42. Cowan N (2005). Working memory capacity. Hove, East Sussex, UK: Psychology Press. [Psychology Press and Routledge Classic Edition with new foreword, 2016] [Google Scholar]
  43. Cowan N (2010). The magical mystery four: How is working memory capacity limited, and why? Current Directions in Psychological Science, 19, 51–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Cowan N (2011). The focus of attention as observed in visual working memory tasks: Making sense of competing claims. Neuropsychologia, 49, 1401–1406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Cowan N (2016). Working memory maturation: Can we get at the essence of cognitive growth? Perspectives on Psychological Science, 11, 239–264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Cowan N (2017a). The many faces of working memory and short-term storage. Psychonomic Bulletin & Review, 24, 1158–1170. [DOI] [PubMed] [Google Scholar]
  47. Cowan N (2017b). Working memory, the information you are now thinking of. In Wixted J (Ed.), Learning and memory: A comprehensive reference, 2nd edition (pp. 147–161). Elsevier. [Google Scholar]
  48. Cowan N, Beschin N, & Della Sala S (2004). Verbal recall in amnesiacs under conditions of diminished retroactive interference. Brain, 127, 825–834. [DOI] [PubMed] [Google Scholar]
  49. Cowan N, & Chen Z (2009). How chunks form in long-term memory and affect short-term memory limits. In Page M & Thorn A (Eds.), Interactions between short-term and long-term memory in the verbal domain (pp. 86–107). Hove, UK: Psychology Press. [Google Scholar]
  50. Cowan N, Donnell K, & Saults JS (2013). A list-length constraint on incidental item-to-item associations. Psychonomic Bulletin & Review, 20, 1253–1258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Cowan N, Hogan TP, Alt M, Green S, Cabbage KL, Brinkley S, & Gray S (2017). Short-term memory in childhood dyslexia: Deficient serial order in multiple modalities. Dyslexia, 23, 209–233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Cowan N, Li D, Moffitt A, Becker TM, Martin EA, Saults JS, & Christ SE (2011). A neural region of abstract working memory. Journal of Cognitive Neuroscience, 23, 2852–2863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Cowan N, Lichty W, & Grove TR (1990). Properties of memory for unattended spoken syllables. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 258–269. [DOI] [PubMed] [Google Scholar]
  54. Cowan N, Nugent LD, Elliott EM, & Saults JS (2000). Persistence of memory for ignored lists of digits: Areas of developmental constancy and change. Journal of Experimental Child Psychology, 76, 151–172. [DOI] [PubMed] [Google Scholar]
  55. Cowan N, Rouder JN, Blume CL, & Saults JS (2012). Models of verbal working memory capacity: What does it take to make them work? Psychological Review, 119, 480–499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Cowan N, Saults JS, & Blume CL (2014). Central and peripheral components of working memory storage. Journal of Experimental Psychology: General, 143, 1806–1836. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Cowan N, Saults JS, Elliott EM, & Moreno M (2002). Deconfounding serial recall. Journal of Memory and Language, 46, 153–177. [Google Scholar]
  58. Cowan N, Saults JS, & Nugent LD (1997). The role of absolute and relative amounts of time in forgetting within immediate memory: The case of tone pitch comparisons. Psychonomic Bulletin & Review, 4, 393–397. [Google Scholar]
  59. Cowan N, Wood NL, & Borne DN (1994). Reconfirmation of the short-term storage concept. Psychological Science, 5, 103–106. [Google Scholar]
  60. Craik F, Gardiner JM, & Watkins MJ (1970). Further evidence for a negative recency effect in free recall. Journal of Verbal Learning and Verbal Behavior, 9, 554–560. [Google Scholar]
  61. Craik FIM, & Lockhart RS (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684. [Google Scholar]
  62. Craik FIM, & Tulving E (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268–294. [Google Scholar]
  63. Crowder RG (1982). The demise of short term memory. Acta Psychologica, 50, 291–323. [DOI] [PubMed] [Google Scholar]
  64. Cumming N, Page MPA, & Norris D (2003). Testing a positional model of the Hebb effect. Memory, 11, 43–63. [DOI] [PubMed] [Google Scholar]
  65. Davelaar EJ, Goshen-Gottstein Y, Ashkenazi A, Haarmann HJ, & Usher M (2005). The demise of short-term memory revisited: Empirical and computational investigations of recency effects. Psychological Review, 112, 3–42. [DOI] [PubMed] [Google Scholar]
  66. Dewar M, Alber J, Butler C, Cowan N, & Della Sala S (2012). Brief wakeful resting boosts new memories over the long term. Psychological Science, 23, 955–960. [DOI] [PubMed] [Google Scholar]
  67. Dewar M, Alber J, Cowan N, & Della Sala S (2014). Boosting long-term memory via wakeful rest: Intentional rehearsal is not necessary, consolidation is sufficient. PLOS One, 9(10), e109542, 1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Dewar M, Fernandez Garcia Y, Cowan N, & Della Sala S (2009). Delaying interference enhances memory consolidation in amnesic patients. Neuropsychology, 23, 627–634. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Doherty JM, Belletier C, Rhodes S, Jaroslawska AJ, Barrouillet P, Camos V, Cowan N, Naveh-Benjamin M, & Logie RH (in press). Dual-task costs in working memory: An adversarial collaboration. Journal of Experimental Psychology: Learning, Memory, and Cognition. [DOI] [PMC free article] [PubMed]
  70. Ecker UK, Brown GD, & Lewandowsky S (2015a). Memory without consolidation: Temporal distinctiveness explains retroactive interference. Cognitive Science, 39, 1570–1593. [DOI] [PubMed] [Google Scholar]
  71. Ecker UK, Tay JX, & Brown GD (2015b). Effects of prestudy and poststudy rest on memory: Support for temporal interference accounts of forgetting. Psychonomic Bulletin & Review, 22, 772–778. [DOI] [PubMed] [Google Scholar]
  72. Ekstrom AD, Copara MS, Isham EA, Wang W, & Yonelinas AP (2011). Dissociable networks involved in spatial and temporal order source retrieval. NeuroImage, 56, 1803–1813. [DOI] [PubMed] [Google Scholar]
  73. Emrich SM, Riggall AC, LaRocque JJ, & Postle BR (2013). Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory. The Journal of Neuroscience, 33, 6516–6523. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Endress AD, & Potter MC (2014). Large capacity temporary visual memory. Journal of Experimental Psychology: General, 143, 548–566. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Estes WK (1972). An associative basis for coding and organization in memory. In Melton AW & Martin E (Eds.), Coding processes in human memory (pp. 161–190). Washington, DC: Winston. [Google Scholar]
  76. Farrell S (2012). Temporal clustering and sequencing in short-term memory and episodic memory. Psychological Review, 119, 223–271. [DOI] [PubMed] [Google Scholar]
  77. Farrell S, & Lewandowsky S (2002). An endogenous distributed model of ordering in serial recall. Psychonomic Bulletin & Review, 9, 59–79. [DOI] [PubMed] [Google Scholar]
  78. Farrell S, & Lewandowsky S (2004). Modelling transposition latencies: Constraints for theories of serial order memory. Journal of Memory and Language, 51, 115–135. [Google Scholar]
  79. Fukuda K, & Woodman GF (2017). Visual working memory buffers information retrieved from visual long-term memory. Proceedings of the National Academy of Sciences (PNAS), 114, 5306–5311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Gers FA, Schmidhuber J, & Cummins F (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12, 2451–2471. [DOI] [PubMed] [Google Scholar]
  81. Gilchrist AL, & Cowan N (2011). Can the focus of attention accommodate multiple separate items? Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 1484–1502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Glenberg AM, Smith SM, & Green C (1977). Type I rehearsal: Maintenance and more. Journal of Verbal Learning and Verbal Behavior, 16, 339–352. [Google Scholar]
  83. Glenberg AM, & Swanson NC (1986). A temporal distinctiveness theory of recency and modality effects. Journal of Experimental Psychology: Learning, Memory, & Cognition, 12, 3–15. [DOI] [PubMed] [Google Scholar]
  84. Gosseries O, Yu Q, LaRocque JJ, Starrett MJ, Rose NS, Cowan N, & Postle BR (2018). Parietal-occipital interactions underlying control- and representation-related processes in working memory for nonspatial visual features. Journal of Neuroscience, 38, 4357–4366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Grenfell-Essam R, Ward G, & Tan L (2017). Common modality effects in immediate free recall and immediate serial recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43, 1909–1933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Grossberg S, & Pearson LR (2008). Laminar cortical dynamics of cognitive and motor working memory, sequence learning and performance: Towards a unified theory of how the cerebral cortex works. Psychological Review, 115, 677–732. [DOI] [PubMed] [Google Scholar]
  87. Harrison SA, & Tong F (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458, 632–635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Healy AF (1974). Separating item from order information in short-term memory. Journal of Verbal Learning and Verbal Behavior, 13, 644–655. [Google Scholar]
  89. Hebb DO (1949). Organization of behavior. New York: Wiley. [Google Scholar]
  90. Hebb DO (1961). Distinctive features of learning in the higher animal. In Delafresnaye JE (Ed.), Brain mechanisms and learning (pp. 37–46). New York: Oxford University Press. [Google Scholar]
  91. Henson RN (1998). Short-term memory for serial order: The Start-End Model. Cognitive Psychology, 36, 73–137. [DOI] [PubMed] [Google Scholar]
  92. Hintzman DL (2016). Is memory organized by temporal contiguity? Memory & Cognition, 44, 365–375. [DOI] [PubMed] [Google Scholar]
  93. Houghton G (1990). The problem of serial order: A neural network model of sequence learning and recall. In Dale R, Mellish C, & Zock M (Eds.), Current research in natural language generation (pp. 287–319). London: Academic Press. [Google Scholar]
  94. Howard MW, & Kahana MJ (2002). A distributed representation of temporal context. Journal of Mathematical Psychology, 46, 269–299. [Google Scholar]
  95. Hurlstone MJ, & Hitch GJ (2018). How is the serial order of a visual sequence represented? Journal of Experimental Psychology: Learning, Memory, and Cognition, 44, 167–192. [DOI] [PubMed] [Google Scholar]
  96. Hurlstone MJ, Hitch GJ, & Baddeley AD (2013). Memory for serial order across domains: An overview of the literature and directions for future research. Psychological Bulletin, 140, 339–373. [DOI] [PubMed] [Google Scholar]
  97. Intaitė M, Koivisto M, & Castelo-Branco M (2014). The linear impact of concurrent working memory load on dynamics of Necker cube perceptual reversals. Journal of Vision, 14, 13. doi: 10.1167/14.1.13 [DOI] [PubMed] [Google Scholar]
  98. Jackendoff R (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford, UK: Oxford University Press. [DOI] [PubMed] [Google Scholar]
  99. Jeneson A, & Squire LR (2012). Working memory, long-term memory, and medial temporal lobe function. Learning & Memory, 19, 15–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Just MA, & Carpenter PA (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122–149. [DOI] [PubMed] [Google Scholar]
  101. Kalm K, & Norris D (2017). A shared representation of order between encoding and recognition in visual short-term memory. NeuroImage, 155, 138–146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Keller TA, Cowan N, & Saults JS (1995). Can auditory memory for tone pitch be rehearsed? Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 635–645. [DOI] [PubMed] [Google Scholar]
  103. Keppel G, & Underwood BJ (1962). Proactive inhibition in short term retention of single items. Journal of Verbal Learning and Verbal Behavior, 1, 153–161. [Google Scholar]
  104. Kurczek J, Brown-Schmidt S, & Duff M (2013). Hippocampal contributions to language: Evidence of referential processing deficits in amnesia. Journal of Experimental Psychology: General, 142, 1346–1354. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Lewandowsky S, & Farrell S (2008). Short-term memory: New data and a model. The Psychology of Learning and Motivation, 49, 1–48. [Google Scholar]
  106. Lewandowsky S, & Murdock BB Jr. (1989). Memory for serial order. Psychological Review, 96, 25–57. [Google Scholar]
  107. Lewis-Peacock JA, Drysdale AT, Oberauer K, & Postle BR (2012). Neural evidence for a distinction between short-term memory and the focus of attention. Journal of Cognitive Neuroscience, 24, 61–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Li D, Christ SE, & Cowan N (2014). Domain-general and domain-specific functional networks in working memory. Neuroimage, 102, 646–656. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Logie RH (2003). Spatial and visual working memory: A mental workspace. Psychology of Learning and Motivation, 42, 37–78. [Google Scholar]
  110. Logie RH (2016). Retiring the central executive. Quarterly Journal of Experimental Psychology, 69, 2093–2109. [DOI] [PubMed] [Google Scholar]
  111. Logie RH, Brockmole JR, & Vandenbroucke ARE (2009). Bound feature combinations in visual short-term memory are fragile but influence long-term learning. Visual Cognition, 17, 160–179. [Google Scholar]
  112. Logie RH, & Della Sala S (2003). Working memory as a mental workspace: Why activated long-term memory is not enough. Behavioral and Brain Sciences, 26, 745–746. [Google Scholar]
  113. Logie RH, Della Sala S, Wynn V, & Baddeley AD (2000). Visual similarity effects in immediate verbal serial recall. Quarterly Journal of Experimental Psychology, 53A, 626–646. [DOI] [PubMed] [Google Scholar]
  114. Lohnas LJ, Polyn SM, & Kahana MJ (2015). Expanding the scope of memory search: Modeling intralist and interlist effects in free recall. Psychological Review, 122, 337–363. [DOI] [PubMed] [Google Scholar]
  115. Luck SJ, & Vogel EK (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281. [DOI] [PubMed] [Google Scholar]
  116. Majerus S, Cowan N, Péters F, Van Calster L, Phillips C, & Schrouff J (2016). Cross-modal decoding of neural patterns associated with working memory: Evidence for attention-based accounts of working memory. Cerebral Cortex, 26, 166–179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Majerus S, Poncelet M, Van der Linden M, Albouy G, Salmon E, Sterpenich V, Vandewalle G, Collette F, & Maquet P (2006). The left intraparietal sulcus and verbal short term memory: Focus of attention or serial order? NeuroImage, 32, 880–891. [DOI] [PubMed] [Google Scholar]
  118. Martinez Perez T, Majerus S, & Poncelet M (2013). Impaired short-term memory for order in adults with dyslexia. Research in Developmental Disabilities, 34, 2211–2233. [DOI] [PubMed] [Google Scholar]
  119. Martinez Perez T, Majerus S, Mahot A, & Poncelet M (2012). Evidence for a specific impairment of serial order short-term memory in dyslexic children. Dyslexia, 18, 94–109. [DOI] [PubMed] [Google Scholar]
  120. Mayes AR, Isaac CL, Holdstock JS, Hunkin NM, Montaldi D, Downes JJ, MacDonald C, Cezayirli E, & Roberts JN (2001). Memory for single items, word pairs, and temporal order of different kinds in a patient with selective hippocampal lesions. Cognitive Neuropsychology, 18, 97–123. [DOI] [PubMed] [Google Scholar]
  121. McCabe DP (2008). The role of covert retrieval in working memory span tasks: Evidence from delayed recall tests. Journal of Memory and Language, 58, 480–494. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. McElree B (2001). Working memory and focal attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 817–835. [PMC free article] [PubMed] [Google Scholar]
  123. McGeoch JA (1932). Forgetting and the law of disuse. Psychological Review, 39, 352–370. [Google Scholar]
  124. Mednick SC, Cai DJ, Shuman T, Anagnostaras S, & Wixted JT (2011). An opportunistic theory of cellular and systems consolidation. Trends in Neurosciences, 34, 504–514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Melton AW (1963). Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 2, 1–21. 10.1016/S0022-5371(63)80063-8 [DOI] [Google Scholar]
  126. Mewhort DJK, Campbell AJ, Marchetti FM, & Campbell JID (1981). Identification, localization, and “iconic memory”: An evaluation of the bar probe task. Memory & Cognition, 9, 50–67. [DOI] [PubMed] [Google Scholar]
  127. Morey CC (2018). The case against specialized visual-spatial short-term memory. Psychological Bulletin, 144, 849–883. [DOI] [PubMed] [Google Scholar]
  128. Morey CC, & Bieler M (2013). Visual short-term memory always requires attention. Psychonomic Bulletin & Review, 20, 163–170. [DOI] [PubMed] [Google Scholar]
  129. Morey CC, Rhodes S, & Cowan N (2019). Sensory-motor integration and brain lesions: Progress toward explaining domain-specific phenomena within domain-general working memory. Cortex, 112, 149–161. [DOI] [PubMed] [Google Scholar]
  130. Morris CD, Bransford JD, & Franks JJ (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533. [Google Scholar]
  131. Nairne JS (1990). A feature model of immediate memory. Memory & Cognition, 18, 251–269. [DOI] [PubMed] [Google Scholar]
  132. Nairne JS (1992). The loss of positional certainty in long-term memory. Psychological Science, 3, 199–202. [Google Scholar]
  133. Nairne JS (2002). Remembering over the short term: The case against the standard model. Annual Review of Psychology, 53, 53–81. [DOI] [PubMed] [Google Scholar]
  134. Nairne JS, & Neath I (2001). Long-term memory span. Behavioral and Brain Sciences, 24, 134–135. [Google Scholar]
  135. Nimmo LM, & Lewandowsky S (2005). From brief gaps to very long pauses: Temporal isolation does not benefit serial recall. Psychonomic Bulletin & Review, 12, 999–1004. [PubMed] [Google Scholar]
  136. Nimmo LM, & Lewandowsky S (2007). Distinctiveness revisited: Unpredictable temporal isolation does not benefit short-term serial recall of heard or seen events. Memory & Cognition, 34, 1368–1375. [DOI] [PubMed] [Google Scholar]
  137. Norman DA (1968). Toward a theory of memory and attention. Psychological Review, 75, 522–536. [Google Scholar]
  138. Norris D (2017). Short-term memory and long-term memory are still different. Psychological Bulletin, 143, 992–1009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Oberauer K (2009). Design for a working memory. Psychology of learning and motivation: Advances in research and theory, 51, 45–100. [Google Scholar]
  140. Oberauer K, Awh E, & Sutterer DW (2017). The role of long-term memory in a test of visual working memory: Proactive facilitation but no proactive interference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43, 1–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  141. Oberauer K, & Lewandowsky S (2008). Forgetting in immediate serial recall: Decay, temporal distinctiveness, or interference? Psychological Review, 115, 544–576. [DOI] [PubMed] [Google Scholar]
  142. Oberauer K, Lewandowsky S, Awh E, Brown GDA, Conway A, Cowan N, Donkin C, Farrell S, Hitch GJ, Hurlstone M, Ma WJ, Morey CC, Nee DE, Schweppe J, Vergauwe E, & Ward G (2018). Benchmarks for models of working memory. Psychological Bulletin, 144, 885–958. [DOI] [PubMed] [Google Scholar]
  143. Oberauer K, Souza AS, Druey M, & Gade M (2013). Analogous mechanisms of selection and updating in declarative and procedural working memory: Experiments and a computational model. Cognitive Psychology, 66, 157–211. [DOI] [PubMed] [Google Scholar]
  144. Öztekin I, McElree B, Staresina B, & Davachi L (2008). Working memory retrieval: Contributions of left prefrontal cortex, left posterior parietal cortex and hippocampus. Journal of Cognitive Neuroscience, 21, 581–593. [DOI] [PMC free article] [PubMed] [Google Scholar]
  145. Öztekin I, Davachi L, & McElree B (2010). Are representations in working memory distinct from representations in long-term memory? Neural evidence in support of a single store. Psychological Science, 21, 1123–1133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  146. Page MPA, & Norris D (1998). The primacy model: A new model of immediate serial recall. Psychological Review, 105, 761–781. [DOI] [PubMed] [Google Scholar]
  147. Page MPA, & Norris D (2009). A model linking immediate serial recall, the Hebb repetition effect and the learning of phonological word forms. Philosophical Transactions of the Royal Society, 364B, 3737–3753. [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Palva JM, Monto S, Kulashekhar S, & Palva S (2010). Neuronal synchrony reveals working memory networks and predicts individual memory capacity. Proceedings of the National Academy of Sciences (PNAS), 107, 7580–7585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  149. Parmentier FBR, King S, & Dennis I (2006). Local temporal distinctiveness does not benefit auditory verbal and spatial serial recall. Psychonomic Bulletin & Review, 13, 458–465. [DOI] [PubMed] [Google Scholar]
  150. Peterson LR, & Peterson MJ (1959). Short-term retention of individual verbal items. Journal of Experimental Psychology, 58, 193–198. [DOI] [PubMed] [Google Scholar]
  151. Polyn SM, Norman KA, & Kahana MJ (2009). Task context and organization in free recall. Psychological Review, 116, 129–156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  152. Postle BR, Ferrarelli F, Hamidi M, Feredoes E, Massimini M, Peterson M, Alexander A, & Tononi G (2006). Repetitive transcranial magnetic stimulation dissociates working memory manipulation from retention functions in the prefrontal, but not posterior parietal, cortex. Journal of Cognitive Neuroscience, 18, 1712–1722. [DOI] [PubMed] [Google Scholar]
  153. Potter MC (1993). Very short-term conceptual memory. Memory & Cognition, 21, 156–161. [DOI] [PubMed] [Google Scholar]
  154. Raaijmakers JGW, & Shiffrin RM (2003). Models versus descriptions: Real differences and language differences. Behavioral and Brain Sciences, 26, 753. [Google Scholar]
  155. Rhodes S, & Cowan N (2018). Attention in working memory: Attention is needed but it yearns to be free. Annals of the New York Academy of Sciences, 1424, 52–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
  156. Rhodes S, Jaroslawska AJ, Doherty JM, Belletier C, Naveh-Benjamin M, Cowan N, Camos V, Barrouillet P, & Logie RH (in press). Storage and processing in working memory: Assessing dual task performance and task prioritization across the adult lifespan. Journal of Experimental Psychology: General. [DOI] [PMC free article] [PubMed]
  157. Ricker TJ (2015). The role of short-term consolidation in memory persistence. AIMS Neuroscience, 2 (4), 259–279. DOI: 10.3934/Neuroscience.2015.4.259 [DOI] [Google Scholar]
  158. Ricker TJ, & Cowan N (2010). Loss of visual working memory within seconds: The combined use of refreshable and non-refreshable features. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1355–1368. [DOI] [PMC free article] [PubMed] [Google Scholar]
  159. Ricker TJ, & Cowan N (2014). Differences between presentation methods in working memory procedures: A matter of working memory consolidation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 417–428. [DOI] [PMC free article] [PubMed] [Google Scholar]
  160. Ricker TJ, Cowan N, & Morey CC (2010). Visual working memory is disrupted by covert verbal retrieval. Psychonomic Bulletin & Review, 17, 516–521. [DOI] [PMC free article] [PubMed] [Google Scholar]
  161. Ricker TJ, & Hardman KO (2017). The nature of short-term consolidation in visual working memory. Journal of Experimental Psychology: General, 146, 1551–1573. [DOI] [PubMed] [Google Scholar]
  162. Roediger HL III, & McDermott KB (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814. [Google Scholar]
  163. Romani C, & Martin R (1999). A deficit in the short-term retention of lexical-semantic information: Forgetting words but remembering a story. Journal of Experimental Psychology: General, 128, 56–77. [DOI] [PubMed] [Google Scholar]
  164. Rose NS, LaRocque JJ, Riggall AC, Gosseries O, Starrett MJ, Meyering EE, & Postle BR (2016). Reactivation of latent working memories with transcranial magnetic stimulation. Science, 354, 1136–1139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  165. Rouder JN, Morey RD, Cowan N, Zwilling CE, Morey CC, & Pratte MS (2008). An assessment of fixed-capacity models of visual working memory. Proceedings of the National Academy of Sciences (PNAS), 105, 5975–5979. [DOI] [PMC free article] [PubMed] [Google Scholar]
  166. Ruchkin DS, Grafman J, Cameron K, & Berndt RS (2003). Working memory retention systems: A state of activated long-term memory. Behavioral and Brain Sciences, 26, 709–777. [DOI] [PubMed] [Google Scholar]
  167. Saffran EM, & Marin OS (1975). Immediate memory for word lists and sentences in a patient with deficient auditory short-term memory. Brain and Language, 2, 420–433. [DOI] [PubMed] [Google Scholar]
  168. Schacter DL (1987). Implicit memory: History and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501–518. [DOI] [PubMed] [Google Scholar]
  169. Scoville WB, & Milner B (1957). Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery and Psychiatry, 20, 11–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  170. Sederberg PB, Howard MW, & Kahana MJ (2008). A context-based theory of recency and contiguity in free recall. Psychological Review, 115, 893–912. [DOI] [PMC free article] [PubMed] [Google Scholar]
  171. Serences JT, Ester EF, Vogel EK, & Awh E (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20, 207–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  172. Shallice T, & Vallar G (1990). The impairment of auditory-verbal short-term storage. In Shallice T & Vallar G (Eds.), Neuropsychological impairments of short-term memory (pp. 11–53). Cambridge, UK: Cambridge University Press. [Google Scholar]
  173. Shallice T, & Warrington EK (1970). Independent functioning of verbal memory stores: A neuropsychological study. Quarterly Journal of Experimental Psychology, 22, 261–273. [DOI] [PubMed] [Google Scholar]
  174. Shiffrin RM (1975). The locus and role of attention in memory systems. In Rabbitt PMA & Dornic S (Eds.), Attention and performance V (pp. 168–193). New York: Academic Press. [Google Scholar]
  175. Stevanovski B, & Jolicoeur P (2007). Visual short-term memory: Central capacity limitations in short-term consolidation. Visual Cognition, 15, 532–563. [Google Scholar]
  176. Surprenant AM, & Neath I (2009). The nine lives of short-term memory. In Thorn A & Page M (Eds.), Interactions between short-term and long-term memory in the verbal domain (pp. 16–43). Hove, UK: Psychology Press. [Google Scholar]
  177. Szmalec A, Loncke M, & Page MPA (2011). Order or disorder? Impaired Hebb learning in dyslexia. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 1270–1279. [DOI] [PubMed] [Google Scholar]
  178. Todd JJ, & Marois R (2004). Capacity limit of visual short-term memory in human posterior parietal cortex. Nature, 428, 751–754. [DOI] [PubMed] [Google Scholar]
  179. Treisman AM, & Gelade G (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136. [DOI] [PubMed] [Google Scholar]
  180. Unsworth N, & Engle RW (2007). The nature of individual differences in working memory capacity: Active maintenance in primary memory and controlled search from secondary memory. Psychological Review, 114, 104–132. [DOI] [PubMed] [Google Scholar]
  181. Vallar G, & Baddeley AD (1982). Short-term forgetting and the articulatory loop. Quarterly Journal of Experimental Psychology, 34A, 53–60. [Google Scholar]
  182. Vallar G, & Baddeley AD (1984). Fractionation of working memory: Neuropsychological evidence for a phonological short-term store. Journal of Verbal Learning and Verbal Behavior, 23, 151–161. [Google Scholar]
  183. Vallar G, Di Betta AM, & Silveri MC (1997). The phonological short-term store-rehearsal system: Patterns of impairment and neural correlates. Neuropsychologia, 35, 795–812. [DOI] [PubMed] [Google Scholar]
  184. Vallar G, & Papagno C (1995). Neuropsychological impairments of short-term memory. In Baddeley AD, Wilson BA, & Watts FN (Eds.), Handbook of memory disorders. New York, NY: Wiley. [Google Scholar]
  185. Vallar G, Papagno C, & Baddeley AD (1991). Long-term recency effects and phonological short-term memory. A neuropsychological case study. Cortex, 27, 323–326. [DOI] [PubMed] [Google Scholar]
  186. Ward G, Tan L, & Grenfell-Essam R (2010). Examining the relationship between free recall and immediate serial recall: The effects of list length and output order. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1207–1241. [DOI] [PubMed] [Google Scholar]
  187. Warrington EK, Logue V, & Pratt RT (1971). The anatomical localisation of selective impairment of auditory verbal short-term memory. Neuropsychologia, 9, 377–387. [DOI] [PubMed] [Google Scholar]
  188. Warrington EK, & Shallice T (1969). The selective impairment of auditory verbal short-term memory. Brain: A Journal of Neurology, 92, 885–896. [DOI] [PubMed] [Google Scholar]
  189. Warrington EK, & Shallice T (1972). Neuropsychological evidence of visual storage in short-term memory tasks. Quarterly Journal of Experimental Psychology, 24, 30–40. [DOI] [PubMed] [Google Scholar]
  190. Watkins MJ, & Kerkar SP (1985). Recall of a twice presented item without recall of either presentation: Generic memory for events. Journal of Memory and Language, 24, 666–678. [Google Scholar]
  191. Waugh NC, & Norman DA (1965). Primary memory. Psychological Review, 72, 89–104. [DOI] [PubMed] [Google Scholar]
  192. Wickens DD, Moody MJ, & Dow R (1981). The nature and timing of the retrieval process and of interference effects. Journal of Experimental Psychology: General, 110, 1–20. [Google Scholar]
  193. Wixted JT, Goldinger SD, Squire LR, Kuhn JR, Papesh MH, Smith KA, Treiman DM, & Steinmetz PN (2018). Coding of episodic memory in the human hippocampus. PNAS, 115, 1093–1098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  194. Wixted JT, Squire LR, Jang Y, Papesh MH, Goldinger SD, Kuhn JR, Smith KA, Treiman DM, & Steinmetz PN (2014). Sparse and distributed coding of episodic memory in neurons of the human hippocampus. PNAS, 111, 9621–9626. [DOI] [PMC free article] [PubMed] [Google Scholar]
  195. Wolfe JM (2012). Saved by a log: How do humans perform hybrid visual and memory search? Psychological Science, 23, 698–703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  196. Wörgötter F, & Porr B (2005). Temporal sequence learning, prediction, and control: A review of different models and their relation to biological mechanisms. Neural Computation, 17, 245–319. [DOI] [PubMed] [Google Scholar]
  197. Zaromb FM, Howard MW, Dolan ED, Sirotin YB, Tully M, Wingfield A, & Kahana MJ (2006). Temporal associations and prior-list intrusions in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 792–804. [DOI] [PubMed] [Google Scholar]
  198. Zhang W, & Luck SJ (2009). Sudden death and gradual decay in visual working memory. Psychological Science, 20, 423–428. [DOI] [PMC free article] [PubMed] [Google Scholar]
