Published in final edited form as: Lang Learn Dev. 2011 Oct 12;7(4):279–286. doi: 10.1080/15475441.2011.605309

The Modularity Issue in Language Acquisition: A Rapprochement? Comments on Gallistel and Chomsky

Elissa L. Newport
PMCID: PMC3377485  NIHMSID: NIHMS351879  PMID: 22737043

One of the central controversies of our field is whether language is ‘special’: Is there a separate and distinct faculty of mind that represents and controls language (and also, analogously, is there a separate faculty for other central aspects of human cognition, such as faces, music, physical causality, etc)? Or, rather, is language derived and fully explainable via more general principles of human cognition?

A few facts form the basic groundwork for answering this question:

  1. Human language is organized in a strikingly different way from the communication systems of other species, even non-human primates.

  2. Human language is organized differently from other aspects of human cognition, with structure and principles that are, to varying degrees, different from those of music, space, and face perception.

  3. There are a number of universal principles that characterize human languages, and a number of dimensions along which human languages vary in very limited ways.

  4. Even humans are not equally capable of acquiring languages at all stages of life. Infants and young children are spectacular language learners; adults are not.

  5. Reasonably localized parts of the human brain are responsible for human language, analogous to the localization of function found for the sensory and motor systems.

In light of these (fairly) widely accepted facts, how could it be the case that human language derives only from general cognitive principles? If no other cognitive system has precisely the same structure and organization as human language, then either there must be some ‘special’ constraints that have been added to the cognitive constraints on language (the modularist or semi-modularist view), or there must be a unique combination of cognitive functions that are not together involved in any other task and that together constrain human language to be structured the way it is (a non-modularist view). The latter is not impossible – human language involves a particular combination of thought and sequential expression, as Chomsky describes, that may call on a unique combination of cognitive skills for its learning, production, and online processing. One of a similar pair of alternatives must account for species differences: if no other species has the same type of communication system as human language, then it must be that one or more special constraints have appeared in humans – either special language-specific adaptations, or advanced cognitive skills that non-human species lack.

The issue, then, is whether there are any ‘special’ constraints, ones that participate in and constrain language learning and processing, that are invoked only for language, and that are not part of any other cognitive system.

Interestingly, I found the arguments of both Gallistel and Chomsky, who are proponents of the view that language and other significant cognitive functions are each modular and ‘special,’ to weigh almost as much in favor of the opposing view. Randy Gallistel’s fascinating paper takes us through a number of cases in animal cognition where non-human species appear to display extremely complex cognitive skills, whose central properties involve the use of symbolic computation and a predicate-argument type of representation of events. Does this apparent similarity to some of the functions underlying human language tell us that cognitive systems are modular? Or, rather, does it suggest that there are even more striking similarities between human language and other aspects of cognition than we previously understood? For me, the research Gallistel reviews highlights the latter more than the former.

Similarly, Noam Chomsky’s important paper – surprisingly – for me provided a profound argument that one should think about the complex and ornate characteristics of human language as deriving from more fundamental and perhaps widespread underlying computational principles. Chomsky argues that the structure of human language derives from two types of constraints: the nature of thought (is this thought special to language, or is it simply special to humans?) and the pressures of externalization. On his view, the nature of thought is non-linear; it is hierarchical and recursive. His hypothesis about language is that it acquires its linear organization in the process of being externalized – at the sensory-motor interface, presumably in accord with pressures supplied by the nature of the articulation process, and perhaps also from the perceptual process applied by the listener.1 An overriding constraint applied to externalization is minimal computation, the constraint that there should be minimal computational complexity in the relationship of the hierarchical representation and its linearization. For me, this view, though more elegant, more detailed, and more beautifully articulated than any that I know among non-modularists, is nonetheless surprisingly similar to what the most promising non-modularist approaches are trying to argue as well.
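To make the hierarchy-to-string relation concrete, here is a toy sketch – my illustration, not Chomsky’s formalism. A nested phrase structure is flattened into a word string by a simple left-to-right traversal; the embedding depths that distinguish words in the hierarchy are invisible in the linearized output. The grammar fragment and sentence are invented for illustration.

```python
# Toy illustration of externalization: a hierarchical representation is
# flattened into a linear word string, discarding structural information.
# (An invented fragment, not a claim about any particular grammatical theory.)

sentence = (
    "S",
    ("NP", ("Det", "the"), ("N", "bird")),
    ("VP", ("V", "sang"),
           ("NP", ("Det", "a"), ("N", "song"))),
)

def linearize(node):
    """Left-to-right traversal: one minimal mapping from tree to string."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [word for child in children for word in linearize(child)]

def depth_of(node, target, depth=0):
    """Embedding depth of a word -- hierarchical information lost in the string."""
    if isinstance(node, str):
        return depth if node == target else None
    for child in node[1:]:
        found = depth_of(child, target, depth + 1)
        if found is not None:
            return found
    return None

print(" ".join(linearize(sentence)))                           # the bird sang a song
print(depth_of(sentence, "sang"), depth_of(sentence, "song"))  # 3 4: distinct in the
# hierarchy, but nothing in the output string records the difference
```

The point of the toy is only that many distinct hierarchies can map onto the same string, which is why minimal computation at this interface is a substantive constraint rather than a triviality.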

Below I will briefly outline what I think are these promising non-modularist views, and then return to where I think this putative similarity of views leaves our field.

The non-modularity position: What is the alternative to what Chomsky and Gallistel favor?

Undoubtedly there are the radical proposals: positions arguing that nothing (of any kind) is innate, that languages have no universally shared principles or structures, and that language acquisition is just the learning of lexical and constructional forms. But this doesn’t seem to me to be the dominant non-modular view, and certainly not the most compelling or likely one. Most non-modularists – thanks to the profound importance of Chomsky’s work and its enormous impact on our field – believe that there are striking universal principles that constrain language structure and also that there are innate abilities of humans that are foundational for language acquisition and language processing. However, one can agree that there are innate abilities required for language and yet not be certain whether these abilities are specific to language. Though many nativists also believe in modularity, the question of innateness and that of modularity are in principle distinct (cf. Keil, 1990, for discussion).

In addition, few non-modularists think that all of perception or cognition is homogeneous, characterized by the same principles of organization throughout. Rather, the question is where and how to divide up the differing components of cognition/perception – and, in particular, whether language will turn out to be one of the components proper or whether it is best described as the outcome of interactions among the other components and their constraints. Certainly most cognitive psychologists believe that there is a difference in organizational principles between the visual and auditory systems, among iconic or echoic memory, short-term memory, and long-term memory, and between implicit learning and explicit problem solving. The non-modularist’s question is whether language qualifies as one of those systems that handles information in its own way, or whether its characteristics are the outcome of squeezing information in, through, and out of the others.

A mild version of the non-modularist view is that there may be some elements or principles specific to language – perhaps the basic primitives (e.g. features, syllables) at the base of the system – and also perhaps some characteristics of the system that have become grammaticized or conventionalized within the life of the individual or the species. But the non-modularist believes that relatively little of the structure of language falls into the specialized type of constraint, whereas many or most aspects of language derive from the interaction of other modules or systems of cognition/perception.

In more detail, a widely held non-modular view is something like this:

  1. There must be some (few) innate primitives, with which one begins the acquisition process. At least some lowest-level primitives must be present at the start, to distinguish different types of patterned stimuli from one another: edges and points of light for vision, pitch classes and intervals for music, phonetic features or units like the phone or the syllable for speech. This assumption is shared with modularists. But beyond this, from the non-modularist’s perspective, the structure and patterning in language might be acquired by computational mechanisms or organizational principles that are shared with other domains (though not necessarily the same for every domain of cognition or every type of learning, and not necessarily as simple or without structure as associationism).

  2. There must also be some basic principles that organize the elements in each domain. As already noted, these are not necessarily extremely simple or without structure themselves. For example, most modern statistical learning views include a more complex set of expectations about what can be learned via statistical mechanisms than the simple local co-occurrences that Chomsky (1957) showed long ago would not be adequate for language. For language, these include the computation of mutual information, entropy, conditional probability, contingency, or predictiveness (rather than simple co-occurrence) between elements that are either local or slightly distant from one another, computed over structural or hierarchical rather than linear distance. Many approaches also assume that these computations are done recursively, organizing words into phrases and phrases into more complex phrases (Gomez & Gerken, 1999; Newport & Aslin, 2004; Saffran & Wilson, 2003; Takahashi & Lidz, 2008; Thompson & Newport, 2007). (A toy computational sketch of one such statistic follows this list.)

    In many ways these are the same assumptions as those of modularists; but for a non-modular view, the hypothesis would be that these organizational and computational principles are, at least to some degree, used by humans to organize more than just language. At least the simplest aspects of statistical learning appear to be performed similarly for both language and non-language materials (Fiser & Aslin, 2002; Saffran, Johnson, Aslin, & Newport, 1999). Hierarchical phrase structure, while supremely characteristic of human language, is clearly not unique to language (cf. Lashley, 1951) and is utilized in descriptions of the complex organization of motor behavior, music, and many other domains. The empirical question here is whether these organizational principles are, in detail, the same for language and these other arenas – or whether it is too glib or too vague to say these organizational principles are similar across domains. The generative tradition in language has given us an elegant and detailed articulation of how these principles work themselves out in language; whether the same principles apply in detail to any other domain remains to be seen, since few comparably sophisticated analyses have ever been done of other complex cognitive domains.

  3. There may also be some constraints on language from the production, comprehension, and learning systems that process language as it is learned and used, and whose limitations a successful language must observe in order for it to be maintained over time (cf. Bever, 1970; Hawkins, 1994, 2004; Jaeger, 2010; Levy, 2005; Newport, 1981; Newport & Aslin, 2004). There are a number of proposals regarding how real-time processing and production might affect and constrain linguistic structure. For example, Bever (1970) suggested that linear sequencing in language (such as the relative positions of the determiner, adjectives, and head noun within a noun phrase) is shaped, at least in part, by perceptual strategies that make sentences easier to parse. Hawkins (1994, 2004) suggests that grammars are conventionalizations of processing preferences, thus combining the probabilistic character of processing constraints and biases with the more crisp and regular character of grammars. Jaeger (2010) has suggested that speakers manage information flow to listeners, utilizing linguistic structures chosen in part with an aim toward uniform information density over time. (A toy sketch of this idea also follows this list.) While this account is partially focused on the selection of which optionally available structures are utilized in real-time interactions, it also attempts to explain how structures might shift over time under the pressures of this constraint (Jaeger & Tily, 2011). Newport and collaborators (Hudson Kam & Newport, 2009; Newport, 1981, 1990; Newport & Aslin, 2004) have suggested a number of ways in which constraints on learning might come to shape language structure, including constraints on the distance between related elements in word formation and also constraints on the complexity and regularity of linguistic constructions due to the cognitive limitations of the young children who carry the language through time. Like UG, constraints of these types can help to explain not only the structure and acquisition of spoken languages, but also the acquisition of sign languages, the emergence of languages with reduced or absent input, and the changes in acquisition that occur over maturation.
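To make point 2 concrete, here is a minimal sketch of one such statistic: transitional (conditional) probability between adjacent syllables, of the kind used in the statistical learning studies cited above. Everything concrete here – the nonce words, the stream length, and the 0.5 boundary threshold – is invented for illustration rather than taken from any of the cited experiments.

```python
import random
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Hypothetical familiarization stream: four invented trisyllabic nonce words,
# concatenated with no pauses, so word boundaries are marked only statistically.
random.seed(0)
words = ["bidaku", "padoti", "golabu", "tupiro"]
stream = [w[i:i + 2] for w in (random.choice(words) for _ in range(200))
          for i in (0, 2, 4)]  # CV syllables: "bidaku" -> "bi", "da", "ku"

pair_counts = Counter(pairwise(stream))
first_counts = Counter(stream[:-1])

def tp(x, y):
    """Transitional probability TP(y | x) = count(x followed by y) / count(x)."""
    return pair_counts[(x, y)] / first_counts[x]

# Within-word transitions are deterministic here (TP = 1.0); transitions across a
# word boundary are not (TP ~ 0.25), so dips in TP mark candidate boundaries.
segmented, current = [], [stream[0]]
for x, y in pairwise(stream):
    if tp(x, y) < 0.5:  # invented threshold
        segmented.append("".join(current))
        current = []
    current.append(y)
segmented.append("".join(current))
print(sorted(set(segmented)))  # the four nonce words, recovered without any lexicon
```

Note that nothing in this computation is language-specific: the same code runs unchanged over tones or visual shapes, which is exactly the kind of cross-domain parallel at issue (Fiser & Aslin, 2002; Saffran, Johnson, Aslin, & Newport, 1999).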
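And to make point 3’s information-density idea concrete, here is an equally schematic sketch in the spirit of Jaeger’s (2010) proposal: per-word surprisal under a toy bigram model, with the variance of surprisal standing in for (non-)uniformity of information density. The miniature corpus, the add-one smoothing, and the example utterances are all invented; the actual analyses use large corpora and regression models, not anything this crude.

```python
import math
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Invented miniature corpus; real work estimates these probabilities from
# large corpora rather than a toy sample like this.
corpus = ("i think that the story is true i said that the story was long "
          "i knew the story i said the ending was true").split()

bigram_counts = Counter(pairwise(corpus))
prev_counts = Counter(corpus[:-1])
vocab_size = len(set(corpus))

def surprisal(prev, word):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    p = (bigram_counts[(prev, word)] + 1) / (prev_counts[prev] + vocab_size)
    return -math.log2(p)

def surprisal_variance(utterance):
    """Variance of per-word surprisal: lower = more uniform information density."""
    s = [surprisal(p, w) for p, w in pairwise(utterance)]
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# The optional complementizer "that" (Jaeger's case): UID predicts speakers favor
# whichever variant spreads information more evenly over the utterance.
with_that = "i said that the story was long".split()
without_that = "i said the story was long".split()
print(surprisal_variance(with_that), surprisal_variance(without_that))
```

The design choice to compare variants by surprisal variance is one simple operationalization of ‘uniform information density over time’; it is meant only to show that the proposal is a computable constraint rather than a metaphor.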

Of course it is not clear whether the framework outlined above will be adequate to explain all the richness of linguistic structure, as it must if we are to avoid invoking additional principles specific to language. My point here is not to say that we know yet whether this type of complex non-modularist approach will succeed, but only to say that there are some linguistically informed ideas under discussion.

Is this type of non-modular position biologically implausible?

Is the non-modular position I have outlined a very unlikely or atypical way for a biological system to be built? Is there a ‘biological norm’ for systems regarding how they are organized and how they are built, a typical way in which biological systems are organized – and therefore a way in which we can presume language is also organized? In his paper in the Cognitive Neurosciences volume, Gallistel (2000) argues that the biological norm is a set of specialized modular systems, each adapted for solving a specific problem and therefore operating with its own distinct organizational principles, not particularly similar to any of the others. He suggests that this style of organization is so ubiquitous that we can and should expect all learning mechanisms – perhaps all cognitive systems – to be similarly highly specialized and unique in their organizational principles. From the examples I know best, however, it seems that biological systems are not all the same in this regard, that they are not all highly specialized or modularized. Rather, they are remarkably varied in how they constrain learning within their range – suggesting that the correct story for language cannot be predicted from a generalization about other systems. Here are a few examples that I believe differ in kind from the ones Gallistel so compellingly describes.

Imprinting is sometimes characterized as a specialized biological system, one which responds to its own specialized stimuli (Mommy duck or Mother goose) and which learns from these best examples how to identify species members and potential mates. However, while some investigators have described imprinting as specialized in this way, others (Bateson, 1979; Hess, 1973) have described imprinting very differently: as the outcome of a small number of simple, general, and probabilistic tendencies that, when combined with experience, produce the apparently specialized learning as an outcome. On this alternate conception, the simple and not-very-specialized constraints are these: Ducks and geese are precocious walkers and are innately programmed to follow moving objects. They will follow virtually any moving object, but they have a preference for objects that are round over those of other shapes. And then they learn the characteristics of objects that they follow. As some objects become familiar, those that are not familiar begin to produce fear and flight. Not much is innate in this system – a few natural motor tendencies and a few stimulus preferences – and not much of what is innate is ‘knowledge of the species.’ But through the interaction of these sparse and general tendencies, knowledge of the species does usually emerge.

In seminal work on birdsong learning, the main hypothesis has been that song learners (e.g. white-crowned sparrows, zebra finches) have an innate template of their species song that serves as the filter and the guide to the details of the song they would learn (Marler, 1970), a concept similar to earlier and richer notions of UG for human language. An alternative way of thinking about song learning (Nordeen, Holtzman & Nordeen, 2009), however, is that conspecific song – or certain elements of that song – simply activates the dopaminergic reward system to unusually high levels of activity, thereby stimulating more readily the normal, general neural mechanisms of associative learning in the auditory system (Hebbian learning, LTP, etc). In accord with this hypothesis but in contrast to template expectations, zebra finches reared by Bengalese finches will learn the Bengalese finch song, which is very different in structure and does not fit the zebra finch template (Eales, 1987; Immelmann, 1969). Song learning – and therefore its explanation – also varies dramatically across species. Some species specialize in learning a very wide range of sounds, including (in the mockingbird) learning to imitate other species’ calls or even (in the lyre bird) learning not only other species’ calls, but also learning to produce shockingly accurate renditions of the camera shutter sounds and chain saw noises that surround them in modern life (Attenborough, 1998).

A different type of case arises in audition in crickets. Often we define modularity in part by the difference between general sensory or cognitive functions and the workings of a focused learning system; but this contrast is obliterated in an unexpected way in the female cricket. Female crickets’ entire auditory system is reduced to the range of frequencies in which male cricket mating chirps are produced. Is this a modular system, or a general one? Oddly, the ‘general’ system has become the ‘specialized’ one, so there is no longer any distinction between the two. For me, this example challenges the modular/non-modular distinction, making it less than clear quite how to categorize all biological systems one way or another.

As the complement of these examples, there are also modularized and coherent systems of knowledge for which there are probably not biological modules as their underpinnings. For individuals who have achieved very high levels of expertise, chess, juggling, music, knowledge of dinosaurs or cars, even language A as opposed to language B form highly practiced domains of expertise – bodies of knowledge that are organized and activated in an integrated way, can be localized in the brain, and likely can be lost through stroke or brain injury, separately from other aspects of cognition. This suggests that the characteristics of tightly integrated modular systems are not necessarily indicative of innate biological modularity.

Taken together, these examples suggest that nature finds many ways of building adapted systems, and that we can only understand how each of them works by studying them with a varied range of ideas regarding the nature of the architecture underneath.

Back to language: Concluding points

To return to the focus of this commentary, there are two main points I’ve tried to make here. First, I have sketched a version of a non-modular view of language – perhaps more appropriately called a semi-modular view of language – that is not quite what Chomsky’s paper argues against but is, in my opinion, a version worth further consideration and research. One crucial set of empirical questions is whether the constraints of externalization – the constraints of minimal computation, as well as the complex and best-formulated constraints on linguistic structure from processing, production, and learning – are specific to language or apply to any other domains of complex pattern learning or complex hierarchically organized but serially ordered behavior. An answer to this question requires, in part, that we take up an investigation of some interesting non-linguistic domains with the sophistication and the analytic tools that modern linguistic theory has brought to the study of language, an enterprise that our field has not often engaged in (though see Lerdahl & Jackendoff, 1983, as well as more recent work in music cognition, for a model).

Second, I have suggested that Gallistel’s fascinating article might be viewed as supporting striking similarities across species and across cognitive domains, and that the non-modularist view I have outlined above does not seem so different from the modularist one that Chomsky espouses. As Chomsky notes in his paper, in the early days of generative grammar there was more extensive and particular substantive content to UG – a very ‘rich set of assumptions about the genetic component of language’; and thus the modularist and non-modularist positions were more starkly different from one another. In more recent work, however, as UG has become smaller and framed in terms of more general principles, the modular view does not seem to me so very different from those of the other side. A crucial remaining difference, of course, is that UG is always more clearly and formally articulated, whereas those of us (certainly including myself) who have thought seriously about a less modular view of language have only been able to approach the problem with some generic thoughts (regarding adjacent versus non-adjacent dependencies, limits of short-term memory, and the like), accompanied by a wish and a promise that these cognitive approaches might someday be adequate to approach the formal elegance of the linguistic descriptions they are accountable for. Certainly there is no doubt that the two views differ greatly in the precision they bring to the problem. But this doesn’t tell us which is correct – it only means that we must work harder, and perhaps more collaboratively across these differing perspectives, to find out where the truth lies and how we might discover it.

Acknowledgments

Many thanks to Noam Chomsky and Randy Gallistel for their stimulating papers, to Dan Swingley for organizing the Society for Language Development Symposium, and to Ernie Nordeen, Florian Jaeger, Jeff Runner, Susan Goldin-Meadow, Lila Gleitman, and Dick Aslin for their detailed comments on this paper and their thoughtful and stimulating discussion of these issues over many years. Supported in part by NIH grants DC00167 and HD37082.

Footnotes

1. In broad strokes, this is like the position articulated by Liberman (1970), who suggested that grammar is the outcome of the mismatch between the structure of thought and the workings of the mouth and ear – that grammar is the system that links these two very different types of structure and process to one another.

References

  1. Attenborough D. The life of birds. BBC Worldwide; 1998.
  2. Bateson P. How do sensitive periods arise and what are they for? Animal Behaviour. 1979;27:470–486.
  3. Bever TG. The cognitive basis for linguistic structures. In: Hayes JR, editor. Cognition and the development of language. New York: Wiley; 1970. pp. 279–362.
  4. Eales LA. Do zebra finch males that have been raised by another species still tend to select a conspecific song tutor? Animal Behaviour. 1987;35:1347–1355.
  5. Fiser J, Aslin RN. Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences. 2002;99:15822–15826. doi: 10.1073/pnas.232472899.
  6. Gomez RL, Gerken LA. Artificial grammar learning by one-year-olds leads to specific and abstract knowledge. Cognition. 1999;70:109–135. doi: 10.1016/s0010-0277(99)00003-7.
  7. Hawkins JA. Efficiency and complexity in grammars. Oxford: Oxford University Press; 2004.
  8. Hess EH. Imprinting. New York: Van Nostrand Reinhold; 1973.
  9. Hudson Kam CL, Newport EL. Getting it right by getting it wrong: When learners change languages. Cognitive Psychology. 2009;59:30–66. doi: 10.1016/j.cogpsych.2009.01.001.
  10. Immelmann K. Song development in the zebra finch and other estrildid finches. In: Hinde RA, editor. Bird vocalisations. Cambridge: Cambridge University Press; 1969. pp. 61–74.
  11. Jaeger TF. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology. 2010;61(1):23–62. doi: 10.1016/j.cogpsych.2010.02.002.
  12. Jaeger TF, Tily H. Language processing complexity and communicative efficiency. WIREs Cognitive Science. 2011;2(3):323–335. doi: 10.1002/wcs.126.
  13. Keil FC. Constraints on constraints: Surveying the epigenetic landscape. Cognitive Science. 1990;14:135–168.
  14. Lashley KS. The problem of serial order in behavior. In: Jeffress LA, editor. Cerebral mechanisms in behavior. New York: Wiley; 1951. pp. 112–131.
  15. Lerdahl F, Jackendoff R. A generative theory of tonal music. Cambridge: MIT Press; 1983.
  16. Levy R. Probabilistic models of word order and syntactic discontinuity. Doctoral dissertation, Stanford University; 2005.
  17. Liberman AM. The grammars of speech and language. Cognitive Psychology. 1970;1:301–323.
  18. Marler P. A comparative approach to vocal learning: Song development in white-crowned sparrows. Journal of Comparative and Physiological Psychology. 1970;71(2, Pt 2):1–25.
  19. Newport EL. Constraints on structure: Evidence from American Sign Language and language learning. In: Collins WA, editor. Aspects of the development of competence. Minnesota Symposia on Child Psychology, Vol. 14. Hillsdale, NJ: Erlbaum; 1981.
  20. Newport EL, Aslin RN. Learning at a distance: I. Statistical learning of non-adjacent dependencies. Cognitive Psychology. 2004;48:127–162. doi: 10.1016/s0010-0285(03)00128-2.
  21. Nordeen EJ, Holtzman DA, Nordeen KW. Increased Fos expression among midbrain dopaminergic cell groups during birdsong tutoring. European Journal of Neuroscience. 2009;30:662–670. doi: 10.1111/j.1460-9568.2009.06849.x.
  22. Saffran JR, Wilson DP. From syllables to syntax: Multi-level statistical learning by 12-month-old infants. Infancy. 2003;4:273–284.
  23. Takahashi E, Lidz J. Beyond statistical learning in syntax. Proceedings of Generative Approaches to Language Acquisition; 2008.
  24. Thompson SP, Newport EL. Statistical learning of syntax: The role of transitional probability. Language Learning and Development. 2007;3(1):1–42.
