Abstract
The language sciences – Linguistics, Psycholinguistics, and Computational Linguistics – have not been broadly represented at the Cognitive Science Society meetings of the past 30 years, but they are an important part of the heart of cognitive science. This article discusses several major themes that have dominated the controversies and consensus in the study of language and suggests the most pressing issues of the future. These themes include differences among the language science disciplines in their view of numbers and symbols and of modular and distributed cognition; and the need for an increasing prominence of questions concerning language and the brain.
Keywords: language sciences, linguistics, psycholinguistics, statistical learning, probabilistic approaches, symbols and rules, modularity, language and the brain
In this paper I will discuss the past and the future of the interdisciplinary language sciences: the set of disciplines that study language, including Linguistics, Psycholinguistics, and Computational Linguistics. Researchers in these fields are not typically the same people who attend the Cognitive Science Society meetings, so the programs of the last 30 years of the Cognitive Science Society would not represent this part of cognitive science very well. Rather than addressing the past and future thirty years at the Cognitive Science Society meetings, then, I would like to discuss issues and dimensions of change in this field more broadly over the last 30 years, as well as those that might be prominent in the future. I hope that this will stimulate readers to think also about dimensions of focus and change they see in the past and those they think will characterize the future.
The issues I will address are: numbers and symbols; modular and distributed cognition; a very quick point on methodologies - changes from print to real-time real-world language; and I will end by discussing language and the brain.
Numbers and Symbols
Let me start with numbers and symbols. The disciplines that study language have an extremely interesting difference in their first principles about qualitative versus quantitative, or symbolic versus numerical kinds of representations and processes, and this has led to a very interesting set of interactions and shifts - but also arguments and disagreements - in the field of language sciences.
Though there are certainly exceptions, I would characterize Formal Linguistics as, by and large, a field that takes as its first principle that representations and processes are not quantitative but are composed of symbols and rules (Chomsky, 1965, 1981, 1995; Marcus, 2001). Over the last thirty years there have been many changes in theories - changes in individual theories (for example, in Chomskian theories), and the flourishing of many other kinds of theories - but most of them share this one characteristic: that they are not inherently statistical, probabilistic, or quantitative, but rather presume that the medium of representation and the nature of linguistic processes involve symbols and rules.
An extremely interesting exception – perhaps more accurately, an extremely interesting approach to this very issue – is the work of Alan Prince and Paul Smolensky on Optimality Theory (2004). In this approach, there is still a non-statistical type of representation - a set of rules and principles that always apply and that apply universally. However, by having a ranking system by which the principles interact with each other, one gets effects that in other theories arise from probabilistic or quantitative interactions among soft non-symbolic tendencies.
Aside from OT, though, most of Linguistics has taken a non-quantitative approach as a methodological assumption - and in a moment I will say that this has also been a claim about the nature of cognition.
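The ranking idea behind Optimality Theory can be illustrated with a minimal sketch. It assumes that each candidate output form is described only by how many times it violates each constraint, and that strict ranking amounts to comparing these violation counts in ranked order. The function name `ot_winner` and the violation counts below are hypothetical and chosen for illustration; they are not taken from Prince and Smolensky's own analyses.

```python
from typing import Dict, List


def ot_winner(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> str:
    """Return the candidate that is optimal under a strict constraint ranking.

    Candidates are compared on the highest-ranked constraint first; lower-ranked
    constraints matter only when all higher-ranked ones tie.
    """
    def profile(name: str) -> tuple:
        # Violation counts listed in ranking order (missing constraint = 0 violations).
        return tuple(candidates[name].get(c, 0) for c in ranking)

    # Tuple comparison in Python is lexicographic, which mirrors strict ranking.
    return min(candidates, key=profile)


# Hypothetical violation counts for three candidate outputs of an input like /pat/.
candidates = {
    "pat":  {"NoCoda": 1, "Max": 0, "Dep": 0},  # keeps the final consonant
    "pa":   {"NoCoda": 0, "Max": 1, "Dep": 0},  # deletes the final consonant
    "pata": {"NoCoda": 0, "Max": 0, "Dep": 1},  # inserts a final vowel
}

for ranking in (["Max", "Dep", "NoCoda"],
                ["NoCoda", "Dep", "Max"],
                ["NoCoda", "Max", "Dep"]):
    print(" >> ".join(ranking), "->", ot_winner(candidates, ranking))
```

Every constraint applies to every candidate in the same way; only the ranking changes, and with it the winning form. No weights or probabilities are involved, which is the sense in which the approach remains non-statistical.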
Psycholinguistics has undergone a striking change on this issue over the past thirty years. In the field of Psychology, everything is probabilistic; it’s the way one grows up thinking as a psychologist. Even perceiving a light or a tone isn’t thought of as a discrete event but rather as a probabilistic phenomenon, at the core of which is Signal Detection Theory. In contrast to Linguistics, this probabilistic nature of cognition isn’t thought of as a performance problem that one could separate from knowledge, which is inherently non-quantitative. Rather, as a psychologist one thinks about a system that is inherently probabilistic in responding to stimulation, with natural variability in input and output as well as in storage mechanisms. These probabilistic characteristics are conceptualized as the real, true underlying nature of the system. Thirty years ago, psycholinguistics stood between these two traditions: one way of thinking about things from the linguistic point of view and the other from the psychological point of view. In the past thirty years, much of the field has been focused on the tension between rules and symbols versus statistics and quantitative, probabilistic phenomena. See, for example, movement in many parts of psycholinguistics from rules to connectionism to statistical learning (Aslin, Newport & Saffran, 1998; Elman, 2009; Marcus, 2001; Marcus et al., 1999; Mehler et al., 2006; Saffran, Aslin, & Newport, 1996; Seidenberg, 1997; Seidenberg et al., 2003). Today there is still a tension in the field, indeed opposition, with some people claiming that there are rules and others claiming that there are statistics.
Computational Linguistics has undergone some interesting changes on this same dimension, with symbolic AI dominating previously but much recent work (though not all, of course) being statistical (Charniak, 1993; Eisner, 2002). Some important recent research in computational modeling and computational linguistics takes a Bayesian approach that combines or creates a hybrid of these two types, or an approach using Expectation Maximization that also involves their combination (for example, comparing symbolic grammars by assessing the probability that linguistic data might be produced by one grammar versus another) (Eisner, 2002; Goldwater, Griffiths, & Johnson, 2009).
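The idea of comparing symbolic grammars by the probability that each would produce the observed data can be made concrete with a small worked example. This is a toy sketch under strong assumptions: each "grammar" is reduced to a probability distribution over two word-order patterns, the utterances are treated as independent, and the names, priors, and probabilities are all hypothetical. It is not the model of any of the papers cited above.

```python
import math
from typing import Dict, List


def log_likelihood(grammar: Dict[str, float], data: List[str]) -> float:
    """Log probability of the data under a grammar, assuming independent utterances."""
    total = 0.0
    for utterance in data:
        p = grammar.get(utterance, 0.0)
        if p == 0.0:
            return float("-inf")  # the grammar cannot generate this utterance
        total += math.log(p)
    return total


def posterior(grammars: Dict[str, Dict[str, float]],
              priors: Dict[str, float],
              data: List[str]) -> Dict[str, float]:
    """Posterior over grammars by Bayes' rule: P(G | D) is proportional to P(D | G) P(G)."""
    log_scores = {name: math.log(priors[name]) + log_likelihood(g, data)
                  for name, g in grammars.items()}
    best = max(log_scores.values())
    if best == float("-inf"):
        raise ValueError("no grammar assigns nonzero probability to the data")
    # Subtract the maximum before exponentiating for numerical stability.
    unnormalized = {name: math.exp(s - best) for name, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {name: v / z for name, v in unnormalized.items()}


# Two hypothetical grammars that differ in their preferred word order.
grammars = {
    "mostly_SVO": {"S V O": 0.8, "S O V": 0.2},
    "mostly_SOV": {"S V O": 0.2, "S O V": 0.8},
}
priors = {"mostly_SVO": 0.5, "mostly_SOV": 0.5}
data = ["S V O", "S V O", "S O V", "S V O"]

print(posterior(grammars, priors, data))
```

The hypotheses being compared are discrete, symbolic objects, while the comparison itself is probabilistic; this is one simple sense in which such approaches combine the two kinds of description.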
In formal linguistics, the notion that knowledge is made of symbols and rules is not just a methodological approach. Perhaps the most interesting aspect of Chomskian linguistics has been the notion that this is a claim about the nature of the mind: that underlyingly the mind is not probabilistic, that cognition comprises symbols and rules (Chomsky, 1965). The controversy surrounding this claim still divides parts of linguistics from much of psycholinguistics, again with an opposition between the two approaches. There are, however, many ways that differing parts of the field think about statistics versus rules. Some investigators characterize statistics versus rules as different types of computation, while some have argued that there are different cognitive modules for the two (a statistical learning module and a rule module; cf. Marcus et al., 1999) - and some have suggested that there are different localizations for these functions in the brain (Peña et al., 2003). Some have proposed hybrids or dual kinds of representations (Pinker, 2000), while others have argued that they are more unified. I have suggested that there may be a sharpening process during learning, one that takes the statistics of sounds and words as the input for learning, but (at least in children) sharpens and regularizes the outcome so that the product behaves more like a rule (Hudson Kam & Newport, 2009; Newport, 1999).
A continuing question for the future is how humans maintain what appear to be these two different types of knowledge. I would suggest that there are some kinds of performance that exhibit one and the other at the same time. People are clearly sensitive to element frequency, bigram frequency, conditional probabilities, and more – not only for language but for most of what they perceive and learn: an amazing array of statistical aspects of the input they experience. At the same time, they also behave in a symbolic way - and (especially children) look like they formulate rule systems, obey principles, and form integrated systems of knowledge (Newport, 1999; Singleton & Newport, 2004; Trueswell & Gleitman, 2007; Wonnacott, Newport & Tanenhaus, 2008). One of the challenges for the future is to figure out how to integrate these types of knowledge in our descriptions of cognition, rather than argue about them.
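The statistics mentioned above (element frequency, bigram frequency, and conditional probability) can be computed from a sequence in a few lines. The sketch below is illustrative only: the two nonsense words, the two-letter syllabification, and the function name `transitional_probabilities` are hypothetical and are not the materials or procedure of any study cited here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transitional_probabilities(stream: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(next = b | current = a) for every adjacent pair in the stream."""
    # Element frequency, counted over positions that have a following element.
    unigram = Counter(stream[:-1])
    # Bigram frequency over adjacent pairs.
    bigram = Counter(zip(stream, stream[1:]))
    return {(a, b): n / unigram[a] for (a, b), n in bigram.items()}


def syllabify(word: str) -> List[str]:
    """Split a nonsense word into two-letter syllables (an assumption of this toy)."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


# A hypothetical continuous stream built from two made-up words.
words = ["tokibu", "gopila", "tokibu", "tokibu", "gopila", "gopila", "tokibu"]
stream = [syllable for w in words for syllable in syllabify(w)]

tp = transitional_probabilities(stream)
for pair in [("to", "ki"), ("ki", "bu"), ("bu", "go"), ("bu", "to")]:
    print(pair, round(tp[pair], 2))
```

In this toy stream, transitions inside a word are perfectly predictable, while transitions across a word boundary are not, so the conditional probabilities alone carry information about where the words are.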
Modularity
A related issue that I want to mention more briefly, an issue of interest throughout cognition but perhaps nowhere so centrally as in the study of language, is modularity. This issue arises in the study of language in two forms: First, is language different from non-linguistic cognition? That is, is language itself a modularized cognitive function? Second, within language, are there distinct and modularized components of linguistic knowledge and processing? That is, for example, is phonology separate from syntax and semantics? And if modularized, are there fundamentally different kinds of representations and operations that characterize each of those domains? In terms of processing, are there processes that operate on these types of information in strictly sequential fashion, or do they all combine and interact simultaneously?
Thirty years ago, there were widely held notions in the field such as ‘speech is special,’ and most researchers believed that language was different and distinct from other cognitive functions (Liberman, 1973; Fodor, 1983). It still is the case that people in some parts of the field talk about the language module, or UG (the acronym referring to a modularized kind of knowledge of language). But a great deal of the field has moved to thinking about interactive constraints on linguistic performance, and about linguistic structure arising from cognitive constraints on learning and real-time processing (Bever, 1970; Hawkins, 1994, 2007; Seidenberg, 1997; Tanenhaus & Trueswell, 2006). Again, I think these are issues that need to be resolved and brought together in the future.
A methodological point
A very brief mention of one change in our field that is methodological: It is surprising to remember how much time we spent thirty years ago in psycholinguistics looking at individual words and printed text. In contemporary psycholinguistics, much of the field now investigates real-time sentence and discourse processing. There are more eye trackers per square foot in my department than one can possibly imagine. Psycholinguists, computational linguists, and formal linguists all do corpus analyses of real speech. A funny example that came to mind as I was preparing this paper is that the basis for what are now called the Brown Corpus and the Penn Treebank was, when I was a graduate student, originally called Kucera & Francis. Kucera & Francis’ volume (1967) was the output of a cadre of graduate students sifting through voluminous amounts of text so that psycholinguists would have word frequency norms for controlling experimental materials. The aim of the project, indeed, was a published word frequency list. In more recent times, this enterprise has been turned on its head. The focus has become the massive texts from which the word frequency counts were derived, rather than the word frequencies themselves; these texts have been digitized and syntactically labeled, and have become the basis of much current-day corpus analysis. That is an interesting shift also.
Language and the Brain
The last issue I want to focus on, which I think will be much of the future of the language sciences, is: how is the brain organized with respect to language? This has not been the primary focus of the last thirty years in the study of language. Of course there has been some work on language and the brain for many decades, but I think it is fair to say that this has not been a main focus of the field. This is in part because language is the privileged domain of humans, so the most revealing approaches of cellular/molecular or systems neuroscience have not been available for the study of language. But more recently, with fMRI, MEG, NIRS, and other imaging techniques available and widely used, there are methods that are beginning to stimulate many researchers – including many who didn’t previously work on language and the brain – to start thinking about the problem.
The issue I want to close with is that I think we need to think carefully and in novel ways about what might be the reasonable hypotheses for how language is organized in the brain. This in turn raises a more general question about localization of function for higher cognitive systems. If one looks at any standard neuroscience textbook, one can find depictions of the localization of function for those systems in the brain that are fairly well understood. In the sensory and motor systems, one finds clear organizational patterns for localization of function, with topographic maps that display a patterned layout of the world of stimulation (e.g., the visual field) or the world of motor output (e.g., the hand and arm) onto localized and adjacent pieces of the brain. Even tonotopic auditory cortex, which doesn’t have a spatial mapping to the outer world, is organized in a patterned way, with tone frequencies marching down primary auditory cortex.
How do we develop hypotheses about the neural organization for language, or for any higher cognitive system, if we take these as our best examples of what we know about the brain? What would be candidate hypotheses for language? Many people have thought that the modules of a linguistic grammar would be mapped onto locations of the brain, with a spot for syntax and a spot for semantics (Friederici, 2002). That might be true, but it should not be the only hypothesis that we are thinking about. (Indeed, in my reading of the literature, it is not working out that well so far, with neural activation often quite widespread throughout the left hemisphere language areas for many different types of linguistic tasks.) Perhaps there is a dictionary that runs down the temporal lobe. There actually is quite a bit of interesting evidence that there are spots in the temporal lobe that are involved with tool words as contrasted with animal words (Caramazza & Mahon, 2006). But it doesn’t seem very likely that words from A-Z in the dictionary will be organized alphabetically down the temporal lobe. We ought to be thinking carefully and broadly about what the best hypotheses might be for how language is organized in the brain. David Plaut and Marlene Behrmann also spoke at this Cognitive Science Society meeting about some of these issues, suggesting that localization of cognitive functions might arise not from the inherent localization of cognitive modules such as language or face perception, but rather from the interaction of the multiple cognitive or perceptual processes that underlie the task of interest. I want to second their general point: we need some new ways of thinking about how language might be organized in the brain, and also some consideration of how the layout of other perceptual and cognitive functions might play a role in shaping the topography of language in the brain.
We also need new ways of thinking about, and testing, how neural circuitry might accomplish the kinds of generalization and symbolic processes that language entails. While there are some approaches that have taken on this important problem, researchers who focus on the representational side of language have often not been part of the enterprise. My wish for the future is that we might collaborate on addressing these problems of utmost mutual interest.
Summary and Conclusions
In sum, my agenda for the future would be that we must continue to think about how to integrate rules and statistics rather than to conceptualize them as opposing issues; and we must think in new ways about how the brain might be organized in higher cognitive systems. In addition, we need to address how neural circuits might compute and represent the kind of information relevant to language, concepts and other aspects of high-level cognition. I hope in the next Cognitive Science Symposium, thirty years hence, we will have solved these simple problems and can stew about some new ones.
Acknowledgments
During the preparation of this paper my work was supported by NIH grant DC00167.
References
- Caramazza A, Mahon BZ. The organization of conceptual knowledge in the brain: The future’s past and some future directions. Cognitive Neuropsychology. 2006;23:13–38. doi: 10.1080/02643290542000021.
- Charniak E. Statistical language learning. Cambridge, MA: The MIT Press; 1993.
- Chomsky N. Aspects of the theory of syntax. Cambridge, MA: The MIT Press; 1965.
- Chomsky N. Lectures on government and binding. Dordrecht: Foris Publications; 1981.
- Chomsky N. The Minimalist Program. Cambridge, MA: The MIT Press; 1995.
- Eisner J. Introduction to the special section on linguistically apt statistical methods. Cognitive Science. 2002;26:235–237.
- Fodor J. The modularity of mind. Cambridge, MA: The MIT Press; 1983.
- Friederici AD. Toward a neural basis of auditory sentence processing. Trends in Cognitive Sciences. 2002;6:78–84. doi: 10.1016/s1364-6613(00)01839-8.
- Goldwater S, Griffiths TL, Johnson M. A Bayesian framework for word segmentation: Exploring the effects of context. Cognition. 2009;112:21–54. doi: 10.1016/j.cognition.2009.03.008.
- Hawkins JA. A performance theory of order and constituency. New York: Cambridge University Press; 1994.
- Hawkins JA. Processing typology and why psychologists need to know about it. New Ideas in Psychology. 2007;25:87–107.
- Hudson Kam CL, Newport EL. Getting it right by getting it wrong: When learners change languages. Cognitive Psychology. 2009;59:30–66. doi: 10.1016/j.cogpsych.2009.01.001.
- Kucera H, Francis WN. Computational analysis of present-day American English. 1967.
- Marcus GF. The algebraic mind: Integrating connectionism and cognitive science. Cambridge, MA: MIT Press; 2001.
- Marcus GF, Vijayan S, Bandi Rao S, Vishton PM. Rule learning by seven-month-old infants. Science. 1999;283:77–80. doi: 10.1126/science.283.5398.77.
- Mehler J, Peña M, Nespor M, Bonatti L. The soul of language does not use statistics: reflections on vowels and consonants. Cortex. 2006;42:846–854. doi: 10.1016/s0010-9452(08)70427-1.
- Newport EL. Reduced input in the acquisition of signed languages: Contributions to the study of creolization. In: DeGraff M, editor. Language creation and language change: Creolization, diachrony, and development. Cambridge: MIT Press; 1999.
- Peña M, Maki A, Kovačić D, Dehaene-Lambertz G, Koizumi H, Bouquet F, Mehler J. Sounds and Silence: An optical topography study of language recognition at birth. Proceedings of the National Academy of Sciences. 2003;100:11702–5. doi: 10.1073/pnas.1934290100.
- Pinker S. Words and rules: The ingredients of language. New York: Perennial; 2000.
- Prince A, Smolensky P. Optimality Theory: Constraint interaction in generative grammar. New York: Blackwell; 2004.
- Saffran JR, Aslin RN, Newport EL. Statistical learning by 8-month-old infants. Science. 1996;274:1926–1928. doi: 10.1126/science.274.5294.1926.
- Seidenberg MS. Language acquisition and use: Learning and applying probabilistic constraints. Science. 1997;275:1599–1604. doi: 10.1126/science.275.5306.1599.
- Seidenberg MS, MacDonald MC, Saffran JR. Are there limits to statistical learning? Science. 2003;300:51–52.
- Singleton JL, Newport EL. When learners surpass their models: The acquisition of American Sign Language from inconsistent input. Cognitive Psychology. 2004;49:370–407. doi: 10.1016/j.cogpsych.2004.05.001.
- Tanenhaus MK, Trueswell JC. Eye movements and spoken language comprehension. In: Traxler MJ, Gernsbacher MA, editors. Handbook of Psycholinguistics. 2. Elsevier Press; 2006.
- Trueswell JC, Gleitman LR. Learning to parse and its implications for language acquisition. In: Gaskell G, editor. Oxford Handbook of Psycholinguistics. Oxford: Oxford Univ. Press; 2007.
- Wonnacott E, Newport EL, Tanenhaus MK. Acquiring and processing verb argument structure: Distributional learning in a miniature language. Cognitive Psychology. 2008;56:165–209. doi: 10.1016/j.cogpsych.2007.04.002.
