Spoken word recognition is central to language, as words link sound, articulation, and spelling to meaning and syntax. Decades of work have yielded a consensus that the mechanisms of word recognition can be described as a form of competition among candidates that is tightly coupled to the unfolding speech input (1). However, recent research has pushed beyond the modal listener (monolingual, normal-hearing, neurotypical adults) to observe novel profiles of recognition that may help listeners become more flexible (2), for example, by altering the dynamics of word recognition in difficult listening conditions, or by keeping options open in case a mistake is made and they must switch to a new interpretation. Villameriel et al. (3) extend this enterprise with a unique group of bilinguals who use both a spoken and a signed language. They offer compelling evidence for direct interaction between the two languages during recognition (a key issue in multilingualism) and reveal pathways of information processing that may be critical for understanding how listeners flexibly recognize words in challenging circumstances.
Words unfold rapidly over time. Consequently, there is a brief period where the input is consistent with many candidates. For instance, the onset of “wizard” (wi-) is consistent with “window,” “wizard,” “which,” and others. Decades of research have concluded that modal listeners cope with this by using a form of competition (1). At word onset, multiple candidates are briefly coactivated and compete. Competition is modulated as more input arrives, until, eventually, one candidate remains. These dynamics can be visualized with the visual world paradigm (VWP) (Fig. 1A and ref. 4), as used in ref. 3. The VWP uses eye movements in a simple word recognition task to trace the dynamics of word recognition as processing unfolds over time. Classic results (4) show that competition is tightly coupled to the unfolding input: Early in recognition, listeners fixate both target words and onset competitors (e.g., “wizard” and “window”), and only later do they consider rhymes (e.g., “lizard”).
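The competition dynamics described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model used in refs. 1 or 4: the four-word lexicon, the rough phoneme transcriptions, the function name `activation_over_time`, and the `mismatch_penalty` value are all hypothetical, and a simple normalization of scores stands in for competition among candidates.

```python
# Toy illustration only. The lexicon, transcriptions, and penalty value are
# hypothetical assumptions, not taken from the studies cited in the text.
LEXICON = {
    "wizard": ["w", "ɪ", "z", "ɚ", "d"],
    "window": ["w", "ɪ", "n", "d", "oʊ"],
    "which": ["w", "ɪ", "tʃ"],
    "lizard": ["l", "ɪ", "z", "ɚ", "d"],
}


def activation_over_time(heard, lexicon=LEXICON, mismatch_penalty=0.5):
    """Return one dict of normalized activations per phoneme heard so far."""
    trace = []
    for t in range(1, len(heard) + 1):
        raw = {}
        for word, phones in lexicon.items():
            score = 1.0
            for i in range(t):
                # Each position that fails to match the input scales the
                # word's support down; matching positions leave it unchanged.
                if not (i < len(phones) and phones[i] == heard[i]):
                    score *= mismatch_penalty
            raw[word] = score
        # Normalizing makes candidates compete for a fixed total activation.
        total = sum(raw.values())
        trace.append({word: score / total for word, score in raw.items()})
    return trace


if __name__ == "__main__":
    for step, activations in enumerate(activation_over_time(LEXICON["wizard"]), 1):
        print(step, {w: round(a, 2) for w, a in activations.items()})
```

In this sketch the onset competitor “window” starts level with the target and falls away once /z/ arrives, whereas the rhyme “lizard” starts lower and overtakes “window” only late in the word, which mirrors the ordering of onset and rhyme effects described above.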
Fig. 1.
(A) Canonical profile of fixations over time for monolingual word recognition after hearing “wizard” (14). (B) A processing model based on the standard model of word recognition for bilinguals in two spoken languages. Gray circles indicate phonemes not yet heard. Here, the /m/, /ɑ/, and /ɹ/ in the input directly activate words in both English (“marker”) and Russian (“marka”). (C) The processing model consistent with ref. 3. Here, the bottom-up input for “vino” cannot activate “bruja.” The only available route is via a word→word pathway. (D) Fixations to targets and cohort competitors over time in Normal Hearing (NH) adults and in postlingually deaf adults who use Cochlear Implants (CIs) (14). CI users are slower to fully commit to the target and rule out competitors, and they continue fixating competitors even when they have selected the target, suggesting a sustained activation profile. (E) Prelingually deaf CI users (16) show much larger delays in target fixations. (F) Because lexical access is delayed, cohorts show less competition. By the time these listeners begin lexical access for “wizard,” they have heard some information to rule out “window,” yielding a “wait-and-see” profile.
This framing offers a clear avenue for examining a fundamental issue in bilingualism: Do listeners actively engage both languages during recognition, or are the lexica for each language functionally separate? Seminal studies by Marian and colleagues (see ref. 5 for review) used the VWP to show that words are activated in both languages. For example, when Russian/English bilinguals hear “marka” (stamp), they briefly look at a “marker” before settling on the target, even though the Russian word for “marker” (flomaster) has little overlap with marka. This implies the English word “marker” is active, despite English being irrelevant for the task.
Standard models of word recognition argue that bottom-up input obligatorily activates all words consistent with it. The logical consequence of this is that “mark-” should activate both Russian and English words (Fig. 1B) without requiring true cross-talk between the lexica. However, for a speaker of both a spoken and a signed language (a bimodal bilingual), this issue is avoided: both languages cannot be activated by the same bottom-up input. For example, the phonemes /v/, /i/, /n/, and /o/ in the Spanish “vino” (wine) are completely unrelated to the manual properties of the sign for “witch” (bruja). Thus, there is no reason to activate “bruja” after hearing “vino” based on input alone. Nonetheless, Villameriel et al. (3) observe such cross-modal activation. This occurs because the sign for “vino” overlaps with the sign for “bruja” in its physical characteristics (hand shape and location). Their study thus offers a capstone to a series of studies on this unique population (6, 7), showing that cross-language activation arises even if the bottom-up input cannot support parallel activation. This strongly supports true cross-talk between the lexica (Fig. 1C).
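The logic of the word→word route in Fig. 1C can be made concrete with a small Python sketch. It is a hypothetical toy network, not the model or analysis of ref. 3: the feature codes (`"H1"`, `"L1"`, `"M1"`, and so on) are placeholders rather than descriptions of the real signs, the spoken forms are orthographic stand-ins, “mesa” is an invented unrelated control item, and `lateral_weight` is an assumed parameter. The only property taken from the text is that the signs for “vino” and “bruja” share hand shape and location.

```python
# Hypothetical toy network. Feature codes are placeholders, spoken forms are
# orthographic stand-ins, and "mesa" is an invented unrelated control.
SPOKEN = {"vino": "vino", "bruja": "bruxa", "mesa": "mesa"}
SIGNS = {
    "vino": {"handshape": "H1", "location": "L1", "movement": "M1"},
    "bruja": {"handshape": "H1", "location": "L1", "movement": "M2"},
    "mesa": {"handshape": "H2", "location": "L2", "movement": "M3"},
}


def prefix_overlap(heard, form):
    """Proportion of shared initial segments (bottom-up support)."""
    shared = 0
    for a, b in zip(heard, form):
        if a != b:
            break
        shared += 1
    return shared / max(len(heard), len(form))


def sign_similarity(s1, s2):
    """Proportion of sign parameters two signs have in common."""
    return sum(s1[k] == s2[k] for k in s1) / len(s1)


def activate(heard, lateral_weight):
    """Return (spoken activation, sign activation) for a heard word."""
    spoken_act = {w: prefix_overlap(heard, form) for w, form in SPOKEN.items()}
    # Step 1: each spoken word passes activation to its own sign.
    direct = {w: lateral_weight * spoken_act[w] for w in SIGNS}
    # Step 2: active signs spread activation to similar signs.
    sign_act = {}
    for w in SIGNS:
        incoming = sum(
            direct[o] * sign_similarity(SIGNS[o], SIGNS[w])
            for o in SIGNS
            if o != w
        )
        sign_act[w] = direct[w] + lateral_weight * incoming
    return spoken_act, sign_act


if __name__ == "__main__":
    print(activate("vino", lateral_weight=0.5))
    print(activate("vino", lateral_weight=0.0))
```

With the lateral pathway switched off (`lateral_weight=0.0`), “bruja” receives no activation at all, because the speech input gives it no bottom-up support. With the pathway on, “bruja” becomes active while the unrelated control does not.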
While the study of Villameriel et al. (3) offers compelling evidence for interactions across languages, it also reveals a pathway for information flow that may be fundamental to word recognition even in monolinguals: information flow within the lexical level by which words directly activate other words (Fig. 1C, dashed path connecting “vino” and “bruja”). We’ve long known that semantic overlap facilitates spreading activation (8), but here we see evidence of word→word spreading activation based on phonological similarity. This appears to challenge the common assumption that word recognition should maximize efficiency. Activating a second lexicon, even when the bottom-up signal does not necessitate it, vastly increases the amount of competition that must be resolved. Indeed, some research is premised on the idea that multilinguals require cognitive resources to effectively suppress activation across languages (9).
However, what if this increased competition should be embraced, not avoided? Efficiency is not the only functional goal of word recognition—recognition must also be flexible. All listeners (or signers) face the possibility of perceptual errors. For example, when hearing “lizard,” if the /l/ is misheard as /w/, the listener may erroneously decide the word was “wizard” and will have to revise their choice when future context describes a reptile. However, a system that delays committing to one word and keeps options available may make revision easier (e.g., ref. 10). In fact, listeners often consider competitors that should have been ruled out by the input. For example, hearing “tack” activates its anadrome, “cat,” which should have been ruled out early by the phoneme /t/ at word onset (11); hearing “trombone” activates “bone,” which should have been ruled out by the initial “trom-” (12). Thus, listeners keep a variety of options available, even those that are not a strict temporal match to the input. But how? The ability to spread activation directly from word to word may override the strict, bottom-up temporal order of activation imposed by the input and help preserve flexibility.
While psycholinguistics has often focused on typical-hearing listeners in ideal listening conditions, people face a variety of challenging situations, such as background noise; Zoom calls with gaps; speech that is too quiet, too fast, or accented; and so forth. Further, millions of hearing-impaired listeners face auditory distortions from hearing loss, or from hearing devices like cochlear implants. Such situations require flexibility, but this may take multiple forms. First, under moderately challenging conditions, listeners maintain activation for competitors for longer than usual (Fig. 1D) (13–15). This keeps competitors accessible for potential revision. Second, under more difficult conditions, listeners pause word recognition entirely, delaying lexical access until more input is available. This results in overall reduced activation for onset competitors (15–17) and likely reduces the probability of an error. This so-called “wait-and-see” profile differs from the immediate lexical activation of modal listeners by decoupling the internal dynamics of lexical access from the temporally unfolding input. This is analogous to results with anadromes and embedded words (11, 12), and dovetails with the lack of temporal order effects in the spoken–signed cross-activation finding of ref. 3. Both profiles may leverage word→word pathways to maintain activation for competitors or perform such temporal decoupling to increase flexibility in contexts that require it.
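The effect of delaying lexical access on onset competition can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the analysis used in refs. 15–17: the two-word lexicon, the rough transcriptions, the function name `competitor_trace`, and the `delay` and `mismatch_penalty` parameters are all hypothetical. The sketch assumes that no candidate is active until `delay` phonemes have been heard, after which all buffered input is evaluated at once.

```python
# Toy illustration only. Lexicon, transcriptions, and parameters are
# hypothetical assumptions, not values from the cited studies.
LEXICON = {
    "wizard": ["w", "ɪ", "z", "ɚ", "d"],
    "window": ["w", "ɪ", "n", "d", "oʊ"],
}


def competitor_trace(heard, delay, lexicon=LEXICON, mismatch_penalty=0.2):
    """Normalized activations per time step; zero until `delay` has passed."""
    trace = []
    for t in range(1, len(heard) + 1):
        if t <= delay:
            # Lexical access has not started yet: nothing is active.
            trace.append({word: 0.0 for word in lexicon})
            continue
        raw = {}
        for word, phones in lexicon.items():
            score = 1.0
            for i in range(t):
                # All input buffered so far is evaluated at once.
                if i >= len(phones) or phones[i] != heard[i]:
                    score *= mismatch_penalty
            raw[word] = score
        total = sum(raw.values())
        trace.append({word: score / total for word, score in raw.items()})
    return trace


def peak(trace, word):
    """Highest activation a word reaches across the trace."""
    return max(step[word] for step in trace)


if __name__ == "__main__":
    heard = LEXICON["wizard"]
    print("immediate:", round(peak(competitor_trace(heard, delay=0), "window"), 3))
    print("delayed:  ", round(peak(competitor_trace(heard, delay=2), "window"), 3))
```

In this sketch, immediate access lets “window” reach the same activation as “wizard” early on, whereas a two-phoneme delay means that disambiguating input has already arrived when access begins, so the competitor peaks lower while the target ends at the same level.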
These results raise questions that must be answered and point to functional goals of word recognition that have not been widely considered. First, we must determine whether such word→word cross-talk is specific to multilinguals, or whether it arises in monolinguals as well. This is crucial for asking whether these connections may support the kind of flexibility we describe. Second, we must examine how such pathways emerge developmentally. We now know that cross-language competition effects can emerge early in the course of second language (L2) acquisition (18); one possibility is that they arise, in part, from the need for translation (or from learning environments that stress translation). Third, phonologically related words are classically presumed to exert inhibitory, not facilitatory, influences on each other (19). The work of Villameriel et al. (3) suggests an additional excitatory role, and it is unclear how to reconcile their finding with this claim. The answers to these questions may challenge models of word recognition, but they may also offer new dimensions on which to extend these models to account for challenging listening.
Villameriel et al. (3) also suggest functional goals for listeners. Word recognition is often framed as a process of rapidly suppressing competitors. However, flexibility may be just as important as efficiency. Our discussion has focused on the ability of the lexical system to access the correct meaning of a word, to revise if necessary, or to change in the face of difficult listening. However, flexible word recognition may also entail other functional goals. In bilinguals, for instance, preserving activation for words across languages may support useful functions like translation or code switching. Thus, looking beyond the modal case, the system may need to reinvent itself in response to a variety of internal and external conditions to successfully achieve the language user’s goals.
The relatively small and unique population of bimodal bilingual listeners studied by Villameriel et al. (3) and others (6) has generated fundamental insights that highlight the diversity of mechanisms by which all language users solve the problems of word recognition and has raised crucial questions for our understanding of language in all listeners.
Acknowledgments
Preparation of this manuscript was supported by NIH Grant DC008089 awarded to B.M., and Grant DC000242 awarded to B.M. and Bruce Gantz.
Footnotes
The authors declare no competing interest.
See companion article, “Cross-modal and cross-language activation in bilinguals reveals lexical competition even when words or signs are unheard or unseen,” 10.1073/pnas.2203906119.
References
- 1. Dahan D., Magnuson J. S., “Spoken word recognition” in Handbook of Psycholinguistics, Traxler M., Gernsbacher M., Eds. (Elsevier, ed. 2, 2006), pp. 249–283.
- 2. McMurray B., Apfelbaum K. S., Colby S., Tomblin J. B., Understanding language processing in variable populations on their own terms: Towards a functionalist psycholinguistics of individual differences, development and disorders. PsyArxiv [Preprint] (2022). https://psyarxiv.com/zp4aw/ (Accessed 5 August 2022).
- 3. Villameriel S., Costello B., Giezen M., Carreiras M., Cross-modal and cross-language activation in bilinguals reveals lexical competition even when words or signs are unheard or unseen. Proc. Natl. Acad. Sci. U.S.A. 119, e2203906119 (2022).
- 4. Allopenna P., Magnuson J. S., Tanenhaus M. K., Tracking the time course of spoken word recognition using eye-movements: Evidence for continuous mapping models. J. Mem. Lang. 38, 419–439 (1998).
- 5. Chabal S., Marian V., “In the mind’s eye: Eye-tracking and multi-modal integration during bilingual spoken-language processing” in Attention and Vision in Language Processing, Mishra R. K., Srinivasan N., Huettig F., Eds. (Springer India, New Delhi, 2015), pp. 147–164.
- 6. Shook A., Marian V., Bimodal bilinguals co-activate both languages during spoken comprehension. Cognition 124, 314–324 (2012).
- 7. Villameriel S., Costello B., Dias P., Giezen M., Carreiras M., Language modality shapes the dynamics of word and sign recognition. Cognition 191, 103979 (2019).
- 8. Meyer D. E., Schvaneveldt R. W., Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. J. Exp. Psychol. 90, 227–234 (1971).
- 9. Bialystok E., Craik F. I. M., Luk G., Bilingualism: Consequences for mind and brain. Trends Cogn. Sci. 16, 240–250 (2012).
- 10. Kapnoula E. C., Edwards J., McMurray B., Gradient activation of speech categories facilitates listeners’ recovery from lexical garden paths, but not perception of speech-in-noise. J. Exp. Psychol. Hum. Percept. Perform. 47, 578–595 (2021).
- 11. Toscano J. C., Anderson N. D., McMurray B., Reconsidering the role of temporal order in spoken word recognition. Psychon. Bull. Rev. 20, 981–987 (2013).
- 12. Luce P. A., Cluff M. S., Delayed commitment in spoken word recognition: Evidence from cross-modal priming. Percept. Psychophys. 60, 484–490 (1998).
- 13. Brouwer S., Bradlow A. R., The temporal dynamics of spoken word recognition in adverse listening conditions. J. Psycholinguist. Res. 45, 1151–1160 (2015).
- 14. Farris-Trimble A., McMurray B., Cigrand N., Tomblin J. B., The process of spoken word recognition in the face of signal degradation. J. Exp. Psychol. Hum. Percept. Perform. 40, 308–327 (2014).
- 15. Smith F. X., McMurray B., Lexical access changes based on listener needs: Real-time word recognition in continuous speech in cochlear implant users. Ear Hear. 43, 1487–1501 (2022).
- 16. McMurray B., Farris-Trimble A., Rigler H., Waiting for lexical access: Cochlear implants or severely degraded input lead listeners to process speech less incrementally. Cognition 169, 147–164 (2017).
- 17. Klein K., Walker E., McMurray B., Delayed lexical access and cascading effects on spreading semantic activation during spoken word recognition in children with hearing aids and cochlear implants: Evidence from eye-tracking. Ear Hear, in press. PsyArxiv [Preprint] (2021). https://psyarxiv.com/mdzn7/.
- 18. Sarrett M., Shea C., McMurray B., Within- and between-language competition in adult second language learners: Implications for language proficiency. Lang. Cogn. Neurosci. 37, 165–181 (2022).
- 19. Dahan D., Magnuson J. S., Tanenhaus M. K., Hogan E., Subcategorical mismatches and the time course of lexical access: Evidence for lexical competition. Lang. Cogn. Process. 16, 507–534 (2001).

