Abstract
Connectionist and dynamical systems approaches explain human thought, language and behavior in terms of the emergent consequences of a large number of simple non-cognitive processes. We view the entities that serve as the basis for structured probabilistic approaches as sometimes useful but often misleading abstractions that have no real basis in the actual processes that give rise to linguistic and cognitive abilities or the development of these abilities. While structured probabilistic approaches can be useful in determining what would be optimal under certain assumptions, we suggest that approaches such as the connectionist and dynamical systems approaches, which focus on explaining the mechanisms giving rise to cognition, will be essential in achieving a full understanding of cognition and development.
Emergence of Structure in Cognition
Emergence is ubiquitous in nature: Consider the complex structure of an ant hill. It can have an elaborate structure, with a complex network of passageways leading from deep underground to 25 feet into the sky. One might suppose that ants possess a blueprint for creating such structures, but something far simpler is in play [1]. Ants are sensitive to certain gasses within their nests; when these gasses build up they move grains of dirt to the outside. This activity lets the gasses escape and has the byproduct of creating the elaborate structure of the nest.
Likewise, human thoughts and utterances have rich and complex structure that, in our view, is also the emergent consequence of the interplay of much simpler processes. The emergentist view contrasts with the approach advocated in the companion article [2], in which cognizing agents are viewed as optimal inferencing machines, coming to cognitive tasks with a structured space of hypotheses and a prior probability distribution over them. Observations provide a means of evaluating the hypotheses and selecting the one that has the highest posterior probability. Work within the structured probabilistic framework is often thought to address an abstract level of analysis akin to Marr’s computational level [3], with consideration of the actual cognitive processes being deferred until the computational-level theory is fully worked out.
The danger, of course, is that if the high-level description is wrong—that is, if the behaving child or adult is not actually engaged in the formulation and selection of hypotheses—then focusing on these constructs would be misleading. It may give rise to an enterprise, much like Chomsky’s Competence-based universal grammar approach to language [4], in which researchers focus on searching for entities that may exist only as descriptive abstractions, while ignoring those factors that actually shape behavior (See Box 1).
Box 1. Parallel Pitfalls of Computational-Level and Competence Approaches.
Structured probabilistic inference models include the following elements:
Formulation of any given problem as one of probabilistic inference.
Commitment to selecting the correct knowledge structure over which probabilities can be assessed and updated.
Abstraction from details of behavior and brain because the theory is usually pitched at Marr’s computational level.
A broader perspective on this approach is provided by looking at its closely-related precursor, Chomsky’s Competence-based approach to linguistics [4], whose foundational assumptions included the following:
Formulation of the goal of the field as characterizing a language user’s knowledge.
Commitment to selecting the correct grammar as the representation that explains such facts.
Abstraction from details of behavior and brain because the theory is pitched at the competence level.
In both cases, the goal is an abstract characterization; linkage to performance is a promissory note, seldom redeemed in practice.
Thus, structured probabilistic models of cognition can be understood as competence theories. As such they inherit problems that have become apparent with this approach, including:
The problem formulation is not neutral. If learners are not trying to ‘select the correct grammar’ or ‘the correct structure’ for a domain, approaching the problem as if they were would be misleading.
The commitment to a knowledge representation is not neutral. Commitments to particular choices can lead researchers into blind alleys. Commitment to grammar formalisms radically constrained how other issues were addressed. Acquisition became the problem of converging on a grammar; performance the question of how grammar is used; neurolinguistics the study of how grammar is represented in the brain. The role of grammatical theory has greatly diminished over the years, because of the research program’s lack of progress.
Treating levels of analysis as independent is counterproductive. It may be difficult or impossible to relate the high level computational/competence theory back to facts about behavior and the brain. Conversely, considering implementation/performance issues may lead to a different high-level formulation of a problem.
The levels of description and competence/performance approaches also introduce an uncomfortable extra degree of freedom with respect to data. Facts that are consistent with the theory are embraced; facts that conflict with the theory are relegated to as yet undeveloped “algorithmic-implementational” or “performance” theories.
Explanations of behavior that ignore mechanism and implementation are likely to fall short. For example, a recent study [5] has found that people can exploit a causal framing scenario to make normatively correct, explicit inferences in a contingency learning task if they are given ample time to make explicit predictions. However, when the same contingencies govern events to which participants must respond very quickly, they seem to learn according to a process akin to simple connection weight adjustment. Thus, different mechanisms appear to underlie learning of the very same probabilistic contingencies in the explicit prediction versus quick response variants of the task, yet the statistical structure of the two tasks, and thus the computational-level analysis of what would be optimal in the two situations, is the same.
To be clear, the disagreement between emergentist approaches and structured probabilistic approaches is not about the relevance of probability in characterizing human behavior—both approaches share an emphasis on statistical regularities in the learning environment and on variability in human performance. Indeed, emergentist models often optimize their probabilistic behavior by learning to match probabilistic outputs to the statistical structure of the experiences on which they are trained [6, 7]. The disagreement is also not about advocating a purely bottom-up versus top-down research strategy, as it is our view that science is best served by pursuing integrated accounts that span multiple levels of analysis simultaneously. Rather, the dispute between the two approaches concerns the utility of treating cognition as if its goal and outcome is the selection of one or the other structured statistical model, whether it be a probabilistic grammar, a mutation hierarchy, or a specific causal Bayes network [8, 9, 10]. From our perspective, the hypotheses, hypothesis spaces and data structures of the structured probabilistic approach are not the building-blocks of an explanatory theory. Rather, they are sometimes helpful but often misleading approximate characterizations of the emergent consequences of the real underlying processes. Likewise, the entities over which these hypotheses are predicated—such as concepts, words, morphemes, syllables, and phonemes—are themselves best understood as sometimes useful but sometimes misleading approximations (See Box 2).
Box 2. The Units Problem in Language and Cognition.
Language is usually characterized in terms of discrete units such as phonemes, morphemes, and sentences. Such units are compatible with probabilistic inference models that employ structured representations. For example, recognizing a speech sound could be construed as a Bayesian inference problem in which the hypotheses are alternative phonemes and the task is to pick the one that is most probable given the input [35]. The utility of this approach depends in part on the validity of the units as descriptions of linguistic structure. Herein lies a problem.
All of these units can be intuitively motivated using apparently clear cases. Phonemes are illustrated by minimal pairs such as PEN and TEN. Morphemes are minimal units of meaning as in FARM-FARMER. Such units provide useful terminology for describing and comparing. However, it would be a mistake to take them as the units involved in acquiring and using language.
In actual spoken language, units such as phonemes and syllables are matters of degree. There is almost no ‘t’ in ‘softly’, but more of one in ‘swiftly’ [36]; words such as ‘memory’ have more than two syllables but fewer than three [37]. Morphology presents a similar problem. There are cases in which the meaning of a complex word appears to be compositional (prefabricate), others where there is no compositionality at all (corner), and still others (predict, prefer) in which the parts appear to contribute to, but do not fully determine, the meaning of the whole [38]. Data suggest that people are sensitive to the gradations, in that intermediate cases produce intermediate morphological priming effects [39], indicating that morphological status is a matter of degree. For years, syntactic theory treated sentences as grammatical or ungrammatical. However, the borderline cases are legion [40]. In light of such observations, many linguists have turned to formalisms that admit degrees of well-formedness [41, 42]. However, these systems still generally require commitments to a set of units over which degrees of well-formedness can be computed. Similar issues arise in any effort to create a taxonomy of concepts or meanings for words.
In connectionist models, there is no fixed vocabulary of representational units. The internal representations are graded patterns with varying degrees of distinctness, compositionality, and context sensitivity [43, 44, 45]. These characteristics make connectionist models different from a mere “implementation” of an idealized linguistic theory.
The remaining sections consider two very different cognitive domains that have been modeled as emergent phenomena using connectionist and dynamical systems approaches. In each case, we argue that it is unnecessary, and may even lead research astray, to characterize the situation in terms of structured probabilistic inference. In Box 3 we list examples of other linguistic, developmental, and cognitive domains where the phenomena have been captured within emergentist approaches.
Box 3. Examples of Emergent Phenomena in Language, Development, and Cognition.
Language
Past-Tense Inflection and Single Word Reading
Systematic linguistic knowledge (e.g., the past-tense of BAKE is BAKED) is often attributed to the operation of explicit rules, with violations (TAKE/TOOK) relegated to separate, item-specific storage [46]. Connectionist approaches in domains including past-tense inflection [47, 48] and single word reading [44, 49] have emphasized instead that linguistic structure is graded rather than all-or-none, and that the relevant empirical phenomena are better captured by an integrated system in which all types of items are represented and processed.
Sentence Processing
Classical approaches assume an innate module imbued with Universal Grammar as the basis for acquisition of syntactic knowledge. However, Elman [21, 50] addressed the acquisition of syntax in a simple and generic connectionist model called the Simple Recurrent Network (SRN; see Figure). Work by Elman and others has shown how SRNs can assign representations to words capturing their syntactic and semantic roles in sentences and respect subtle regularities including long-distance dependencies without explicit syntactic rules [51]. Related models learn to comprehend sentences and stories [52, 53, 54].
Development
Stage Transitions
It has been common to characterize development as occurring through a series of discrete stages. However, there are many signs that stage transitions are graded rather than discrete [55, 56]. Connectionist models address such transitions as consequences of non-linearities in multi-layer networks. Effects of connection-weight changes in such networks exhibit accelerations and plateaus capturing stage-like phenomena [57, 58].
U-Shaped Developmental Trajectories
Young babies held upright appear to walk, but this behavior ceases long before self-supported walking. Classical accounts explain the disappearance as reflecting development of top-down inhibition [59]. More recent research shows that the disappearance reflects an increase in the mass of the child’s legs as they develop [60]. The approach correctly predicts that walking can be evoked after its apparent disappearance with appropriate adjustments to counterbalance the effects of increased leg mass.
Cognitive Processes
Semantic Cognition
A connectionist model [61] accounted for apparent modular representation of living things versus artifacts as an emergent consequence of (a) modular representation of visual and functional properties and (b) greater importance of functional properties for artifacts and of visual properties for living things. See also [20, 27].
Executive Functions and Short Term Memory
The control of behavior by task and previous context is disrupted in individuals with brain lesions in a wide range of brain areas, even though such control has been ascribed to special modules in the frontal lobes [62]. Botvinick and Plaut [63] observed that when complex behaviors have been acquired by a generic SRN, diffuse damage leaves stereotyped action patterns intact but disrupts ‘control’ by task and context, suggesting that such control may be an emergent function distributed over contributing brain areas. Their model also learns hierarchically structured tasks without explicitly representing hierarchical structure. Botvinick and Plaut [64] applied a similar model to a range of short-term memory phenomena that other approaches interpret as evidence for slots in short-term memory. In their model, the phenomena arise without explicit slots.
The A-not-B error: Absence of a Hypothesis or Emergent Consequence of the Dynamics of Motor Behavior?
The A-not-B task was introduced by Piaget [11] to measure the development of the object concept: the belief that objects exist independent of one’s own actions. In the canonical form of the task (see Figure 1), after searching for an object at one location, then seeing it hidden at a new location, 8–10 month old infants reach back to that first location, whereas older infants reach correctly to the new location. Although the A-not-B task has not been an explicit focus of research within the structured probabilistic framework, the situation is traditionally described in a way that is fully consistent with it: on this view, the phenomenon reflects the absence of (or perhaps a low prior probability for) the hypothesis that the object exists independently of the child’s actions; the younger child, lacking such a hypothesis, reaches to the place where his actions previously led him to find the object [11, 12].
Figure 1.
Top: The A-not-B task. On the A trials, an experimenter hides an object repeatedly in one location (A), for example under a lid. The infant watches the hiding, a delay of several seconds is imposed, and then the hiding box is pushed close to the infant and the infant is allowed to reach to the hiding location and retrieve the object. This is repeated several times – hiding in location A, delay, infant retrieval of the object. On the critical B trial, the experimenter hides the object in a new adjacent location (B), under a second lid. After the delay, the infant is allowed to reach. Bottom Left: A DFT simulation of activation in the dynamic field on a B trial. The activation rises at the B location during the hiding event, but then, due to the cooperativity in the field and memory for previous reaches, activation begins to rise at A during the delay and the start of the reach, inhibiting the activation at B and resulting in a simulated reach to A. Bottom right: A baby in a posture-shift A-not-B task.
Experimental data favors an alternative, emergentist account of performance in the A-not-B task that has been developed within Dynamic Field Theory (DFT) [13, 14]. This account explains the error through general processes of goal-directed reaching (and indeed is a variant of one model of adult reaching behavior). The model consists of a dynamic field, shown in Figure 1, which corresponds to the activation within a population of neuron-like units, each dynamically representing the direction of a reach. The field integrates multiple sources of relevant information—the immediate events (e.g., hiding the toy), the lids or covers on the table, and the direction of past reaches. The internal activations that produce a directional reach are themselves dynamic events, with rise times, decay rates, amplitudes, and varying spatial resolution. Consequently, the model predicts—and experiments have confirmed—fine-grained stimulus, timing, and task effects [13, 14]. Because the explanation derives from general models of goal-directed action that are not specific to this task nor to this developmental period, the model makes predictions (tested and confirmed) about similar phenomena (and perseverations) at ages younger than, and considerably older than, the typical age range examined in the standard task [15, 16]. Indeed, using this model as a guide, experimenters can make the error come and go predictably—by changing the delay, by heightening the attention-grabbing properties of the covers or the hiding event, and by increasing and decreasing the number of prior reaches to A [13, 14, 16, 17].
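The qualitative logic of this account can be illustrated with a deliberately reduced sketch. The Python code below is not the DFT model of [13, 14]: it collapses the continuous field to two sites (A and B), uses a hard activation threshold, and every name and parameter value (`simulate_b_trial`, `cooperativity`, `memory_per_reach`, the input strengths, the time constant) is an assumption chosen only for illustration. It shows how a transient cue at B, a persistent memory input at A, and the length of the delay can jointly determine the direction of the simulated reach.

```python
def simulate_b_trial(prior_reaches_to_a=4, delay_steps=50, cooperativity=0.0):
    """Two-site caricature of a dynamic field on a B trial (illustrative only).

    All parameter values are assumptions, not values taken from [13, 14].
    Returns the site with the higher activation when the reach begins,
    together with the final activations at A and B.
    """
    resting = -5.0            # resting level of both sites
    task_input = 1.0          # constant input from the two lids
    memory_per_reach = 0.5    # input at A contributed by each earlier reach
    cue_strength = 8.0        # transient input at B while the toy is hidden
    cross_inhibition = 1.0    # inhibition sent by an active site to the other
    tau = 10.0                # time constant in simulation steps
    hiding_steps = 30         # duration of the hiding event

    memory_at_a = memory_per_reach * prior_reaches_to_a
    u_a = resting
    u_b = resting

    def active(u):
        # Hard threshold: a site interacts only when its activation is positive.
        return 1.0 if u > 0.0 else 0.0

    for step in range(hiding_steps + delay_steps):
        cue_at_b = cue_strength if step < hiding_steps else 0.0
        drive_a = (resting + task_input + memory_at_a
                   + cooperativity * active(u_a)
                   - cross_inhibition * active(u_b))
        drive_b = (resting + task_input + cue_at_b
                   + cooperativity * active(u_b)
                   - cross_inhibition * active(u_a))
        # Euler step: each activation relaxes toward its current drive.
        u_a += (drive_a - u_a) / tau
        u_b += (drive_b - u_b) / tau

    reach = "A" if u_a > u_b else "B"
    return reach, u_a, u_b


if __name__ == "__main__":
    print(simulate_b_trial(delay_steps=50))                      # long delay
    print(simulate_b_trial(delay_steps=0))                       # no delay
    print(simulate_b_trial(delay_steps=50, cooperativity=6.0))   # self-sustaining B
    print(simulate_b_trial(prior_reaches_to_a=0, delay_steps=50))
```

Under these assumed settings, the simulated reach goes to A after a long delay following several reaches to A, and goes to B when the delay is removed, when there have been no prior reaches to A, or when `cooperativity` is high enough for the activation at B to sustain itself through the delay.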
The DFT-based model accounts for a wide range of findings showing that variables unrelated to beliefs about the existence of objects can affect the A-not-B error. The model has also been used to predict (correctly) that a reach back to A will occur in some situations when there is no toy hidden [17]. Furthermore, because the dynamic field is viewed as a motor planning field, and thus is tied to the body-centric nature of neural motor plans [17], the model also makes the novel prediction that perseverative errors should disappear if the motor plan needed for reaching to B is distinctly different from that for reaching to A. One experiment achieved this by shifting the posture of the infant [17, 19; see Figure 1].
Because the error can occur even when no object is hidden and can disappear with changes to the infant’s posture, explanations based on beliefs about objects seem largely irrelevant to understanding A-not-B behavior. What is developing is a complex dynamic system, and it is this system that governs intelligent behavior, not any concepts, hypotheses, or inferences that some ascribe to the child’s thinking.
Connectionist vs. Structured Probabilistic Approaches to Semantic Cognition
We consider next a domain that both approaches have addressed, that of semantic cognition. Under the structured probabilistic approach [9], the acquisition of semantic knowledge is viewed as the inductive problem of deciding which of several alternative conceptual structures is most likely to have generated the observed properties of a set of items in a domain. This computation requires specification of considerable initial knowledge—specifically, (i) knowledge of the hypothesis space, the space of possible concepts and structures for relating concepts, and (ii) prior distributions over both the concepts and the structures. A similar approach has been taken to characterizing language acquisition [8].
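To make concrete what this kind of computation involves, the following Python sketch selects among a small, pre-specified set of candidate structures by posterior probability. It is a toy construction and not the model of [9]: the items, features, candidate partitions, uniform prior, and noise level (`ITEMS`, `FEATURES`, `HYPOTHESES`, `PRIOR`, `NOISE`) are all hypothetical, and the likelihood simply assumes that items in the same cluster tend to share a feature value.

```python
from math import prod

# Hypothetical toy domain: four items and four binary features.
ITEMS = ["robin", "sparrow", "salmon", "trout"]
FEATURES = {
    "has_wings": [1, 1, 0, 0],
    "can_fly":   [1, 1, 0, 0],
    "has_gills": [0, 0, 1, 1],
    "can_swim":  [0, 0, 1, 1],
}

# Pre-specified hypothesis space: each hypothesis partitions the item indices.
HYPOTHESES = {
    "birds_vs_fish":  [[0, 1], [2, 3]],
    "mixed_pairs":    [[0, 2], [1, 3]],
    "single_cluster": [[0, 1, 2, 3]],
}
PRIOR = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}  # assumed uniform
NOISE = 0.1  # assumed probability that an item departs from its cluster's value


def cluster_likelihood(values):
    """Probability of the observed values if the cluster's underlying value
    is 1 or 0 with equal probability and each item copies it with noise."""
    p_if_one = prod((1 - NOISE) if v == 1 else NOISE for v in values)
    p_if_zero = prod(NOISE if v == 1 else (1 - NOISE) for v in values)
    return 0.5 * p_if_one + 0.5 * p_if_zero


def likelihood(partition):
    """Probability of all feature columns given a partition of the items."""
    return prod(
        cluster_likelihood([column[i] for i in cluster])
        for column in FEATURES.values()
        for cluster in partition
    )


def posterior():
    """Normalized posterior probability of each candidate structure."""
    unnormalized = {
        name: PRIOR[name] * likelihood(partition)
        for name, partition in HYPOTHESES.items()
    }
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


if __name__ == "__main__":
    post = posterior()
    print(post)
    print("selected structure:", max(post, key=post.get))
```

Note that everything the learner can conclude is fixed in advance by the contents of `HYPOTHESES` and `PRIOR`; the data only redistribute probability among the structures already listed.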
Our fundamental disagreement with this approach concerns the fact that the alternative structured representations over which a probabilistic choice must be made generally do not, and perhaps cannot, adequately capture real-world domain structure. For example, a hierarchical taxonomic model that has been fit to natural kinds [9] fails to take account of the presence of partial homologies across separate branches of the hierarchy, such that predatory birds, fish, and mammals tend to share one set of properties while prey of each kind tend to share others. While strict homology might be captured by assigning parallel structures, partial homologies would have to be force-fit. Similarly, a context-free grammar may provide a better fit to a corpus of sentences than some alternatives [8], but such grammars miss subtler probabilistic dependencies easily captured in connectionist models [21, 22].
Connectionist models take a fundamentally different approach: the task of the model is not to choose from a set of pre-specified alternative structures, but to learn a set of real-valued weights on connections among neuron-like processing units that supports the generation of appropriate, context-sensitive, conditional expectations. Discrepancies between predicted and observed outcomes provide feedback for learning, in the form of gradual weight adjustment (see Figure 2). Related items tend to evoke similar internal representations, thereby supporting generalization, although the system can use context to learn different similarity relations among the same sets of items when appropriate [20]. Similar approaches are used in connectionist models of semantic learning and language acquisition [21, 22].
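The core idea of error-driven, gradual weight adjustment can be sketched in a few lines of Python. This is a minimal illustration under strong simplifying assumptions, not the network of [20]: it has a single layer of weights and no hidden or representation layer, so it cannot show learned internal representations, and the items, contexts, attributes, learning rate, and number of epochs are hypothetical. Each item-context input produces graded expectations over attributes, and the discrepancy between expected and observed attributes drives a small weight change.

```python
import math

ITEMS = ["robin", "salmon", "oak"]
CONTEXTS = ["can", "has"]
ATTRIBUTES = ["grow", "move", "fly", "swim", "wings", "scales", "leaves"]

# Hypothetical training environment: valid completions of "item context ...".
FACTS = {
    ("robin", "can"): {"grow", "move", "fly"},
    ("salmon", "can"): {"grow", "move", "swim"},
    ("oak", "can"): {"grow"},
    ("robin", "has"): {"wings"},
    ("salmon", "has"): {"scales"},
    ("oak", "has"): {"leaves"},
}


def encode(item, context):
    """One unit per item, one per context, plus a constant bias input."""
    return ([1.0 if item == i else 0.0 for i in ITEMS]
            + [1.0 if context == c else 0.0 for c in CONTEXTS]
            + [1.0])


def predict(weights, inputs):
    """Graded expectation (between 0 and 1) for every attribute."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in weights]


def initial_weights():
    n_inputs = len(ITEMS) + len(CONTEXTS) + 1
    return [[0.0] * n_inputs for _ in ATTRIBUTES]


def train(epochs=500, learning_rate=0.5):
    """Sweep repeatedly through the training set, nudging each weight in
    proportion to the discrepancy between expected and observed output."""
    weights = initial_weights()
    for _ in range(epochs):
        for (item, context), valid in FACTS.items():
            inputs = encode(item, context)
            outputs = predict(weights, inputs)
            for row, attribute, out in zip(weights, ATTRIBUTES, outputs):
                error = (1.0 if attribute in valid else 0.0) - out
                for k, x in enumerate(inputs):
                    row[k] += learning_rate * error * x
    return weights


def mean_abs_error(weights):
    """Average discrepancy between expectations and valid completions."""
    errors = []
    for (item, context), valid in FACTS.items():
        outputs = predict(weights, encode(item, context))
        errors += [abs((1.0 if a in valid else 0.0) - out)
                   for a, out in zip(ATTRIBUTES, outputs)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print("error before training:", mean_abs_error(initial_weights()))
    print("error after training: ", mean_abs_error(train()))
```

At no point does the procedure enumerate or choose among candidate structures; whatever regularities it comes to respect are carried entirely by the real-valued weights.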
Figure 2.
Top, Left: the connectionist network used by Rogers and McClelland [20], first used by Rumelhart and Todd [70], to explore the emergence of structure from experience. The network is trained by presenting item-context input pairs (e.g. robin can) and then propagating activation forward (to the right) to activate units standing for possible completions of simple three-term propositions. Learning occurs by comparing the output to a pattern representing the valid completions (in this case, move/grow/fly), then adjusting connection weights throughout the network to reduce the discrepancy between the network’s output and the valid completions. Learning occurs gradually, affecting how different items are represented at the Representation layer, and also at the subsequent Hidden layer, where the representations are shaded by context. This gradual learning produces progressive differentiation. Bottom, Left: At first the network treats all items similarly, as shown in the hierarchical clustering analysis of the patterns of activation at the representation layer. As learning progresses over successive sweeps through the set of item-context-output training patterns, the network first differentiates the plants from the animals and later differentiates the different types of animals and different types of plants. Upper right: The middle panel shows the similarity structure in the learned representation layer patterns in a different way for a larger set of items, while the flanking panels show how this similarity structure is re-organized in different contexts. Note that in the can context, the plants are all represented as similar, because they all do the same thing (they just grow). Bottom right: Naming response of the network when the input is ‘goat’ at different points in training. Note the transient tendency to activate ‘dog’ before the correct response ‘goat’ is acquired. In this instance, the network was trained in an environment where dogs were more frequent than any other type of animal. Before the dog is differentiated from other animal types, the network treats all animals the same, naming them all with the most common animal name, dog. As differentiation occurs the correct name of the goat is finally learned. All panels reproduced with permission from [20].
Although the continuous space of possible weight sets for a given connectionist network could be seen as analogous to the “hypothesis space” of the structured probabilistic approach, there are several key differences. First, unlike the structured probabilistic approach [9], there is no restriction to a set of possible structure types, so that structures that do not exactly match any idealized type can be represented. Second, there is never a discrete decision to select one structure over another—the network’s current set of weights may approximate one structure or a blend of structures. Finally, learning simply involves the gradual refinement and elaboration of knowledge based on each new experience, and thus is far more constrained than the arbitrarily complex computation typically allowed by structured probabilistic approaches for computing the optimal structure from the entire corpus of relevant experiences.
A final point of comparison concerns inductive biases, which play a role in both approaches. Whereas the hypothesis spaces of the structured probabilistic approach impose both general and domain-specific (content-based) biases, work within the connectionist approach has typically focused on the discovery of structure using only domain-general biases derived from properties of the learning procedure and network architecture [7, 20]. While content-based constraints can be built into connectionist models, connectionist work has focused on generic constraints that foster the discovery of structure, whatever that structure may be, across a range of domains and content types [7, 20]. Yet, despite using only domain-general constraints, the connectionist model of semantic learning [20] explains evidence others [23, 24] use to argue that children rely on innate domain-specific constraints. The model can acquire domain-specific patterns of responding: It can rely, for example, on shape over color for semantic judgments in one domain while relying on color over shape in another (see also [25]). Like children [26], the model can also rely on different types of similarity among the same set of items in different contexts (e.g., taxonomically-defined similarity for biological properties, but a one-dimensional similarity space for judgments about size; see Figure 2). The model also exhibits patterns of conceptual change that mirror phenomena reported in the literature, including (i) a progressive differentiation in development (Figure 2); (ii) the advantage of basic level concepts in many situations but (iii) the elimination of the basic-level advantage in expertise; (iv) transient overgeneralization and illusory correlations in development and (v) the progressive disintegration of semantic knowledge in semantic dementia [27, 28]. Models cast at a competence level have not addressed most of these phenomena.
In short, the need to select among a pre-specified set of alternative structure types in [9] forces semantic representation into an ill-fitting procrustean bed. The connectionist model of semantic cognition shows that this is unnecessary. While further development of this model will certainly be required [Box 4; 29], the model in its current form already shows that conceptual knowledge can emerge from a constrained learning process, without prior domain-specific knowledge and without requiring pre-specification of possible knowledge structures or selection among them.
Box 4. Outstanding Questions.
What types of network architectures best promote the discovery of structure?
To what extent are generic constraints sufficient to promote acquisition of domain-specific structure?
When do the advantages of conforming knowledge to a specific structural form outweigh the disadvantages? Does expertise increase or decrease conformity to specific structural forms?
When do humans truly engage in explicit hypothesis selection, and how can we distinguish such cases from situations in which they are gradually adapting implicit forms of knowledge such as connection weights in response to experience?
Conclusion
Far from being functionally equivalent or simply different levels of description, different theoretical frameworks lead to different conclusions about the nature of cognitive development, the kinds of questions that a cognitive theory should address, and how explanations of different domains of behavior should be unified. The structured probabilistic approach takes the stand that it is critical to specify the goal of cognitive processes at an abstract, computational- or competence-level of analysis before it makes sense to be concerned with the performance characteristics of particular algorithms or hardware implementations. Although this stance does not preclude explicit implementation, the properties of the machinery that implements the computations are not considered theoretically relevant. By contrast, the emergentist approach to understanding cognition, exemplified by dynamical systems and connectionist models, emphasizes the importance of specifying the actual mechanisms that underlie human cognitive performance, ultimately in terms of their neural implementation. The latter approach welcomes consideration of more abstract levels of description, and numerous research efforts have benefited considerably from integrating theories across levels [30, 31], but not at the expense of mechanism (Box 5).
Box 5. Emergentist Approaches Address Function and Mechanism: Response to Griffiths et al. [2].
We view Griffiths and colleagues’ arguments for their top-down, structured probabilistic models approach and against our emergentist one as misguided in at least three important respects.
The characterization of our view
The authors suggest that, whereas their approach is “top-down,” ours is “bottom-up.” Actually, we emphasize function, algorithm and implementation equally and seek accounts that span levels. We use dynamical systems and connectionist networks because they provide tools for addressing questions at all of these levels, including function. The “function-first” approach will go astray if it makes incorrect assumptions about what the functions and goals actually are. In fact, we question many of their assumptions about function—for example, that the goal of language acquisition is to induce grammatical rules, or that the goal of semantics is to induce a structure representing relations among concepts. If these are not the right problems, the question of how to solve them optimally is moot. Mechanistic commitments place important constraints on the kinds of computations that are easy or natural, and thus provide information about what functions are actually computed. Thus, attention to mechanism can provide clues to function, just as attention to function can provide clues to mechanism.
The characterization of human abilities
The authors assume that human behavior is rational, and that cognition is compositional and recursive. In so doing they seem to over-estimate and mischaracterize human cognitive abilities. For instance, they suggest that people can radically reconfigure their beliefs on the basis of a single statement—for example, hearing “Dolphins are not fish but mammals” reorganizes their knowledge of animals. Although people can memorize arbitrary facts, deep conceptual reorganization occurs gradually over years, and coexists with knowledge of inconsistent facts. Human behavior is also notoriously susceptible to biases and heuristics that can lead to violations of rationality. To the extent that such behaviors are explained post hoc by “rational” models, the models are under-constrained—it is too easy to come up with a post-hoc rational characterization of any particular human behavior. To be useful, a theoretical account must explain not only why people excel at some cognitive abilities but also why they fail at others.
The characterization of the capacities of emergentist models
Several of Griffiths et al.’s statements about the limitations of emergentist models are incorrect. Contra their statements, such models can: (i) exploit information provided by natural language or social context [65], (ii) account for rapid learning and generalization of new words [34, 66], (iii) explain why people sometimes generalize in an all-or-none fashion and sometimes in a graded fashion [67], (iv) explain nonlinearities in children’s lexical development [68, 69], and (v) explain why people generalize differently in different contexts [20]. Though emergentist models are constrained in what they can do easily, we view this as an advantage. The constraints arise from a commitment to mechanisms similar to those that implement real minds; thus they provide useful clues as to how real minds solve important cognitive problems.
The commitment to mechanism is both principled and pragmatic. On the principled side, cognitive processing emerges out of evolutionary and developmental pressures and constraints that include the limited capabilities of biologically realizable hardware and the real-time demands of the environment. For example, biological vision cannot have evolved solely as an in-principle response to the abstract problem of seeing; it was also constrained by what could evolve from pre-visual biological precursors and operate in real time. Thus, the fundamental nature of cognitive processing is shaped by the performance characteristics of the underlying mechanism, and approaches that abstract away from such information run a serious risk of missing critical aspects of the problem under consideration.
On the pragmatic side, attention to both the strengths and limitations of specific implementation details has led to valuable theoretical advances that would have been unavailable to an approach operating only at a competence level of analysis. A clear case in point concerns the observation that distributed connectionist networks suffer “catastrophic interference” to old knowledge when forced to rapidly learn new inconsistent knowledge without the chance to rehearse the old knowledge [32, 33]. Such rapid learning is possible using very sparse representations, but this compromises the ability to learn the underlying statistical structure of experiences, thereby undermining generalization. The competing demands of rapid learning of new knowledge versus the gradual discovery of underlying structure are consequences of the connectionist implementation of learning and memory. This competition led McClelland, McNaughton and O’Reilly [34] to propose that these functions are subserved by distinct but complementary memory systems (hippocampus and neocortex, respectively), with the former helping to consolidate knowledge in the latter over time. There are other possible implementations of mechanisms of learning and memory in which these demands do not conflict. Thus there is no basis for understanding the contrasting properties and coordinated operation of hippocampus and neocortex without committing to properties of the mechanism.
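The interference described above can be illustrated with a toy example. The sketch below is not a reproduction of the simulations in [32, 33] or of the model in [34]; it assumes a single linear output unit trained with the delta rule, and the patterns, learning rate, and function names are hypothetical choices made only for illustration. It shows that an old association is degraded when an overlapping, inconsistent new association is learned without rehearsal, and that the degradation is avoided either by interleaving old and new items or by using non-overlapping (sparse) input patterns.

```python
def predict(weights, inputs):
    """Output of a single linear unit."""
    return sum(w * x for w, x in zip(weights, inputs))


def train(weights, examples, epochs=200, rate=0.1):
    """Delta-rule training on the given examples; returns a new weight list."""
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights


def squared_error(weights, example):
    inputs, target = example
    return (target - predict(weights, inputs)) ** 2


# Overlapping (distributed) patterns: both items use the second input unit,
# but they call for different outputs (illustrative assumption).
old_item = ([1, 1, 0, 0], 1.0)
new_item = ([0, 1, 1, 0], 0.0)

start = [0.0] * 4
after_old = train(start, [old_item])

# New item learned alone, with no rehearsal of the old item.
sequential = train(after_old, [new_item])
# New item learned while the old item is rehearsed alongside it.
interleaved = train(after_old, [old_item, new_item])

# Non-overlapping (sparse) patterns: each item has its own input unit.
sparse_old = ([1, 0, 0, 0], 1.0)
sparse_new = ([0, 0, 1, 0], 0.0)
sparse_sequential = train(train(start, [sparse_old]), [sparse_new])

print("old-item error after learning old item:", squared_error(after_old, old_item))
print("old-item error after sequential learning:", squared_error(sequential, old_item))
print("old-item error after interleaved learning:", squared_error(interleaved, old_item))
print("old-item error with sparse patterns:", squared_error(sparse_sequential, sparse_old))
```

The sketch does not capture the cost of sparse representations noted above, namely the loss of generalization based on shared structure; it only shows why overlap between patterns is the source of the interference.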
In summary, we advocate an integrated approach to cognition in which functional considerations are grounded in, and informed by, the performance characteristics of the underlying neural implementation.
Box Figure.

Elman’s Simple Recurrent Network. Each rectangle represents a pool of simple processing units, and each dashed arrow represents a set of learnable connections from the units in one pool to the units in another. A stream of items is presented to the input layer of the network, one after another. For each item, the task is to predict the next item. The pattern on the hidden layer from processing the previous item is copied back to the context layer, thereby allowing context to influence the processing of the next incoming item. Reproduced with permission from [21].
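The flow of activation described in this caption can be sketched in code. The following is a minimal illustration of the forward pass only, not Elman's implementation [21]: the class name, layer sizes, logistic hidden units, softmax output, and random untrained weights are all assumptions made for the example, and the learning procedure that adjusts the connections is omitted.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.5):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    """Turn raw output activations into a probability distribution."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRecurrentNetwork:
    """Forward pass only: input and context feed hidden, hidden feeds output."""

    def __init__(self, n_items, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_input = make_matrix(n_hidden, n_items, rng)     # input -> hidden
        self.w_context = make_matrix(n_hidden, n_hidden, rng)  # context -> hidden
        self.w_output = make_matrix(n_items, n_hidden, rng)    # hidden -> output
        self.context = [0.0] * n_hidden  # empty context before the first item

    def step(self, item_index):
        """Process one item and return a distribution over the next item."""
        one_hot = [0.0] * len(self.w_input[0])
        one_hot[item_index] = 1.0
        from_input = matvec(self.w_input, one_hot)
        from_context = matvec(self.w_context, self.context)
        hidden = [1.0 / (1.0 + math.exp(-(a + b)))
                  for a, b in zip(from_input, from_context)]
        # Copy the hidden pattern back to the context layer for the next item.
        self.context = list(hidden)
        return softmax(matvec(self.w_output, hidden))


net = SimpleRecurrentNetwork(n_items=4, n_hidden=5)
predictions = [net.step(i) for i in (0, 1, 0)]
```

In this sketch item 0 is presented twice, and the two resulting predictions differ because the context layer holds a different pattern on each occasion; this is the sense in which context influences the processing of the next incoming item.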
Glossary
- Connectionism
An approach to modeling cognition based on the idea that the knowledge underlying cognitive activity is stored in the connections among neurons. In connectionist models, knowledge is acquired by using an experience-driven connection adjustment rule to alter the strengths of connections among neuron-like processing units.
- Dynamical system
A mathematical formalization that describes the time evolution of physical and cognitive states. Examples include the mathematical models that describe the swinging of a clock pendulum, the flow of water in a pipe, the movement of the limbs of a walking organism, and the drift that occurs in working memory towards or away from special points in the state space.
- Dynamical field theory
Originally formulated as a theory of movement preparation, in which movement parameters are represented by distributions of activation defined over metric spaces, the theory has recently been extended to address cognitive function. Dynamical fields are formalizations of how neural populations represent the continuous dimensions that characterize perceptual features, movements, and cognitive decisions, and dynamical field theory specifies how activity in such neural populations evolves over time.
- Emergentist approaches
Approaches to modeling cognition based on the idea that the structure seen in overt behavior and the patterns of change observed in behavior reflect the operation of subcognitive processes such as propagation of activation and inhibition among neurons and adjustment of strengths of connections between them. Emergentist approaches contrast with symbolic approaches, including structured probabilistic models, in which cognition is modeled directly at the level of manipulation of symbols and symbolic structures such as propositions and rules.
- Semantic cognition
A cognitive domain encompassing knowledge of the properties of objects and their relationships to other objects, as well as the acquisition of such knowledge and its use in guiding inference.
- Structured probabilistic models
Models that specify that cognitive activity involves the use of probabilistic information to select among and specify the parameters of particular structural forms, which specify relationships among items represented by discrete symbols.
- Universal grammar
A hypothetical construct that arose in the context of generative grammar. A universal grammar, if one existed, would be a system of rules that characterized all of the world’s languages.
Contributor Information
James L. McClelland, Department of Psychology, Stanford University, Building 420, 450 Serra Mall, Stanford, CA 94305, USA
Matthew M. Botvinick, Department of Psychology and Princeton Neuroscience Institute, Princeton University, Green Hall, Princeton, NJ 08504, USA
David C. Noelle, School of Engineering and School of Social Sciences, University of California, Merced, 5200 North Lake Road, Merced, CA 95343, USA
David C. Plaut, Dept. of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Timothy T. Rogers, Department of Psychology, University of Wisconsin-Madison, 1202 West Johnson Street, Madison, WI 53706, USA
Mark S. Seidenberg, Department of Psychology, University of Wisconsin-Madison, 1202 West Johnson Street, Madison, WI 53706, USA
Linda B. Smith, Department of Psychological and Brain Sciences, Indiana University, 1101 East 10th Street, Bloomington, IN 47405, USA
References
- 1. Johnson SB. Emergence: The connected lives of ants, brains, cities, and software. New York: Scribner’s; 2001.
- 2. Griffiths TL, Chater N, Kemp C, Perfors A, Tenenbaum J. Probabilistic models of cognition: Exploring the laws of thought. Trends in Cognitive Sciences. 2010;XX:xxx–xxx. doi: 10.1016/j.tics.2010.05.004.
- 3. Marr D. Vision. San Francisco, CA: W. H. Freeman; 1982.
- 4. Chomsky N. Aspects of the theory of syntax. Cambridge, MA: MIT Press; 1965.
- 5. Sternberg D, McClelland JL. When should we expect indirect effects in human contingency learning? In: Taatgen NA, van Rijn H, editors. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2009. pp. 206–211.
- 6. Movellan JR, McClelland JL. Learning continuous probability distributions with symmetric diffusion networks. Cognitive Science. 1993;17:463–496.
- 7. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313:504–507. doi: 10.1126/science.1127647.
- 8. Perfors A, Tenenbaum JB, Regier T. Poverty of the stimulus? A rational approach. In: Proceedings of the 28th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates; 2006. pp. 663–668.
- 9. Kemp C, Tenenbaum JB. Structured statistical models of inductive reasoning. Psychological Review. 2009;116:20–58. doi: 10.1037/a0014282.
- 10. Sobel D, Tenenbaum J, Gopnik A. Children’s causal inferences from indirect evidence: backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science. 2004;28:303–333.
- 11. Piaget J. The construction of reality in the child. New York: Basic Books; 1954.
- 12. Baillargeon R. How do infants learn about the physical world? Current Directions in Psychological Science. 1994;3:133–140.
- 13. Thelen E, Schöner G, Scheier C, Smith LB. The dynamics of embodiment: A field theory of infant perseverative reaching. Behavioral and Brain Sciences. 2001;24:1–86. doi: 10.1017/s0140525x01003910.
- 14. Clearfield MW, Dineva E, Smith LB, Diedrich FJ, Thelen E. Cue salience and infant perseverative reaching: Tests of the dynamic field theory. Developmental Science. 2009;12:26–40. doi: 10.1111/j.1467-7687.2008.00769.x.
- 15. Clearfield MW, Diedrich FJ, Smith LB, Thelen E. Young infants reach correctly in A-not-B tasks: On the development of stability and perseveration. Infant Behavior & Development. 2006;29:435–444. doi: 10.1016/j.infbeh.2006.03.001.
- 16. Spencer JP, Smith LB, Thelen E. Tests of a dynamic systems account of the A-not-B error: The influence of prior experience on the spatial memory abilities of two-year-olds. Child Development. 2001;72:1327–1346. doi: 10.1111/1467-8624.00351.
- 17. Smith LB, Thelen E, Titzer R, McLin D. Knowing in the context of acting: The task dynamics of the A-not-B error. Psychological Review. 1999;106:235–260. doi: 10.1037/0033-295x.106.2.235.
- 18. Diedrich FJ, Thelen E, Smith LB, Corbetta D. Motor memory is a factor in infant perseverative errors. Developmental Science. 2000;3:479–494.
- 19. Lew A, Hopkins B, Owen L, Green M. Postural change effects on infants’ AB task performance: Visual, postural, or spatial? Journal of Experimental Child Psychology. 2007;97:1–3. doi: 10.1016/j.jecp.2006.12.009.
- 20. Rogers TT, McClelland JL. Semantic cognition: A parallel distributed processing approach. Cambridge, MA: MIT Press; 2004.
- 21. Elman JL. Finding structure in time. Cognitive Science. 1990;14:179–211.
- 22. Rohde DLT, Plaut DC. Language acquisition in the absence of explicit negative evidence: How important is starting small? Cognition. 1999;72:67–109. doi: 10.1016/s0010-0277(99)00031-1.
- 23. Keil FC. Constraints on knowledge and cognitive development. Psychological Review. 1981;88:197–227.
- 24. Gelman R. First principles organize attention to and learning about relevant data: Number and the animate/inanimate distinction as examples. Cognitive Science. 1990;14:79–106.
- 25. Colunga E, Smith LB. From the lexicon to expectations about kinds: a role for associative learning. Psychological Review. 2005;112:347–382. doi: 10.1037/0033-295X.112.2.347.
- 26. Gelman SA, Markman EM. Categories and induction in young children. Cognition. 1986;23:183–209. doi: 10.1016/0010-0277(86)90034-x.
- 27. Rogers TT, Lambon Ralph MA, Garrard P, Bozeat S, McClelland JL, Hodges JR, Patterson K. The structure and deterioration of semantic memory: A neuropsychological and computational investigation. Psychological Review. 2004;111:205–235. doi: 10.1037/0033-295X.111.1.205.
- 28. McClelland JL, Rogers TT, Patterson K, Dilkina KN, Lambon Ralph MA. Semantic cognition: Its nature, its development, and its neural basis (Chap. 72) In: Gazzaniga M, editor. The cognitive neurosciences IV. Boston, MA: MIT Press; 2009.
- 29. Rogers TT, McClelland JL. A simple model from a powerful framework that spans levels of analysis. Behavioral and Brain Sciences. 2008;31:729–749.
- 30. Botvinick MM, An J. Goal-directed decision making in prefrontal cortex: A computational framework. In: Koller D, Bengio Y, Schuurmans D, Bottou L, Culotta A, editors. Advances in Neural Information Processing Systems. Red Hook, NY: Curran Associates, Inc; 2008. pp. 169–176.
- 31. McClelland JL, Chappell M. Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review. 1998;105:724–760. doi: 10.1037/0033-295x.105.4.734-760.
- 32. McCloskey M, Cohen NJ. Catastrophic interference in connectionist networks: The sequential learning problem. In: Bower GH, editor. The psychology of learning and motivation. New York: Academic Press; 1989. pp. 109–165.
- 33. Ratcliff R. Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review. 1990;97:285–308. doi: 10.1037/0033-295x.97.2.285.
- 34. McClelland JL, McNaughton BL, O’Reilly RC. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review. 1995;102:419–457. doi: 10.1037/0033-295X.102.3.419.
- 35. Feldman NH, Griffiths TL, Morgan JL. The influence of categories on perception: Explaining the perceptual magnet effect as optimal statistical inference. Psychological Review. 2009;116:752–782. doi: 10.1037/a0017196.
- 36. Hay J. Lexical frequency in morphology: Is everything relative? Linguistics. 2001;39:1041–1070.
- 37. Bybee J, McClelland JL. Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review. 2005;22:381–410.
- 38. Bybee JL. Morphology: A study of the relation between meaning and form. Philadelphia: John Benjamins; 1985.
- 39. Gonnerman LM, Seidenberg MS, Andersen E. A distributed connectionist approach to morphology: Evidence from graded semantic and phonological effects in lexical priming. Journal of Experimental Psychology: General. 2007;136:323–345. doi: 10.1037/0096-3445.136.2.323.
- 40. Culicover PW. Syntactic nuts: Hard cases in syntax. Volume 1, Foundations of Syntax. Oxford: Oxford University Press; 1999.
- 41. Prince A, Smolensky P. Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell; 2004.
- 42. Smolensky P, Legendre G. The harmonic mind: From neural computation to Optimality-theoretic grammar, Vol. 1: Cognitive architecture; Vol. 2: Linguistic and philosophical Implications. Cambridge, MA: MIT Press; 2006.
- 43. Rumelhart DE, Smolensky P, McClelland JL, Hinton GE. the PDP research group. Schemata and sequential thought processes in PDP models. In: McClelland JL, Rumelhart DE, editors. Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2. Cambridge, MA: MIT Press; 1986. pp. 7–57.
- 44. Plaut DC, McClelland JL, Seidenberg MS, Patterson K. Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review. 1996;103:56–115. doi: 10.1037/0033-295x.103.1.56.
- 45. Plaut DC, Gonnerman LM. Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes. 2000;15:445–485.
- 46. Pinker S, Ullman MT. The past and future of the past tense. Trends in Cognitive Sciences. 2002;6(11):456–463. doi: 10.1016/s1364-6613(02)01990-3.
- 47. Rumelhart DE, McClelland JL. On learning past tenses of English verbs. In: Rumelhart DE, McClelland JL, editors. Parallel distributed processing: Explorations in the micro-structure of cognition. Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press; 1986.
- 48. Joanisse MF, Seidenberg MS. Impairments in verb morphology following brain injury: a connectionist model. Proceedings of the National Academy of Sciences. 1999;96:7592–7597. doi: 10.1073/pnas.96.13.7592.
- 49. Seidenberg MS, McClelland JL. A distributed, developmental model of word recognition and naming. Psychological Review. 1989;96:523–568. doi: 10.1037/0033-295x.96.4.523.
- 50. Elman JL. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning. 1991;7:195–224.
- 51. Servan-Schreiber D, Cleeremans A, McClelland JL. Graded state machines: The representation of temporal contingencies in simple recurrent networks. Machine Learning. 1991;7:161–193.
- 52. St John MF, McClelland JL. Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence. 1990;46:217–257.
- 53. St John MF. The story gestalt: A model of knowledge-intensive processes in text comprehension. Cognitive Science. 1992;16:271–306.
- 54. Rohde DLT. A Connectionist Model of Sentence Comprehension and Production. Unpublished PhD thesis. Pittsburgh, PA: School of Computer Science, Carnegie Mellon University; 2002.
- 55. Jansen BRJ, van der Maas HLJ. Evidence for the phase transition from Rule I to Rule II on the balance scale task. Developmental Review. 2001;21:450–494.
- 56. Ferretti RP, Butterfield EC. Are children’s rule-assessment classifications invariant across instances of problem types? Child Development. 1986;57:1419–1428.
- 57. McClelland JL. Parallel distributed processing: Implications for cognition and development. In: Morris R, editor. Parallel distributed processing: Implications for psychology and neurobiology. New York: Oxford University Press; 1989. pp. 8–45.
- 58. Schapiro AC, McClelland JL. A connectionist model of a continuous developmental transition in the balance scale task. Cognition. 2009;110:395–411. doi: 10.1016/j.cognition.2008.11.017.
- 59. Denny-Brown D. The cerebral control of movement. Springfield, IL: Charles C. Thomas; 1966.
- 60. Thelen E, Smith LB. A dynamic systems approach to the development of cognition and action. Cambridge, MA: The MIT Press; 1994.
- 61. Farah MJ, McClelland JL. A computational model of semantic memory impairment: Modality-specificity and emergent category-specificity. Journal of Experimental Psychology: General. 1991;120:339–357.
- 62. Schwartz MF, Reed ES, Montgomery MW, Palmer C, Mayer NH. The quantitative description of action disorganization after brain damage: A case study. Cognitive Neuropsychology. 1991;8:381–414.
- 63. Botvinick M, Plaut DC. Doing without schema hierarchies: A recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review. 2004;111:395–429. doi: 10.1037/0033-295X.111.2.395.
- 64. Botvinick M, Plaut DC. Short-term memory for serial order: A recurrent neural network model. Psychological Review. 2006;113:201–233. doi: 10.1037/0033-295X.113.2.201.
- 65. Hirsh-Pasek K, Golinkoff RM, Hollich G. An emergentist coalition model for word learning: mapping words to objects is a product of the interaction of multiple cues. In: Hirsh-Pasek, Golinkoff, editors. Becoming a Word Learner: a Debate on Lexical Acquisition. New York: Oxford Press; 2000. pp. 136–164.
- 66. Mayor J, Plunkett K. A neurocomputational account of taxonomic responding and fast mapping in early word learning. Psychological Review. 2010;117(1):1–31. doi: 10.1037/a0018130.
- 67. McClelland JL, Patterson K. Rules or connections in past-tense inflections: What does the evidence rule out? Trends in Cognitive Sciences. 2002;6(11):465–472. doi: 10.1016/s1364-6613(02)01993-9.
- 68. Marchman V, Bates E. Continuity in lexical and morphological development: A test of the critical mass hypothesis. Journal of Child Language. 1994;21:339–366. doi: 10.1017/s0305000900009302.
- 69. MacWhinney B. Models of the emergence of language. Annual Review of Psychology. 1998;49:199–227. doi: 10.1146/annurev.psych.49.1.199.
- 70. Rumelhart DE, Todd PM. Learning and connectionist representations. In: Meyer DE, Kornblum S, editors. Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience. MIT Press; 1993. pp. 3–30.


