Reconsidering the role of temporal order in spoken word recognition

Joseph C Toscano; Nathaniel D Anderson; Bob McMurray

doi:10.3758/s13423-013-0417-0

Author manuscript; available in PMC: 2014 Oct 1.

Published in final edited form as: Psychon Bull Rev. 2013 Oct;20(5):10.3758/s13423-013-0417-0. doi: 10.3758/s13423-013-0417-0


PMCID: PMC3812303 NIHMSID: NIHMS451328 PMID: 23456328

Abstract

Models of spoken word recognition assume that words are represented as sequences of phonemes. We evaluated this assumption by examining phonemic anadromes, words that share the same phonemes but differ in their order (e.g., sub and bus). Using the visual world paradigm, we found that listeners show more fixations to anadromes (e.g., sub when bus is the target) than to unrelated words (well) and to words that share the same vowel but not the same set of phonemes (sun). This contrasts with predictions of existing models and suggests that words are not defined as a strict sequence of phonemes.

Keywords: spoken word recognition, speech perception, temporal order, eye-tracking, visual world paradigm

A significant problem in understanding spoken word recognition is that speech unfolds over time. This leads to temporary ambiguity: information early in the signal is often insufficient to identify the intended word, since it is consistent with many different words (Marslen-Wilson, 1987). For example, when hearing tack, after only /tæ/ is heard, many completions are possible (tack, tap, taxi). A related issue involves temporal order: the order of elements in a word seems to be important for distinguishing it from other words. For example, sub and bus consist of the same phonemes but can be distinguished because those phonemes occur in different sequences. There has been little research on the effects of temporal order, even though it is assumed to be fundamental to lexical representations and is implemented quite differently in different models (Gaskell & Marslen-Wilson, 1997; McClelland & Elman, 1986; Norris & McQueen, 2008; Grossberg, 2003). The present study examines its role in spoken word recognition by asking whether phonemes in incorrect positions contribute to lexical access.

Previous studies on temporary ambiguity are relevant to this problem. These studies have led to a consensus that listeners access potential lexical candidates from the earliest moments of a word (Allopenna, Magnuson, & Tanenhaus, 1998; Marslen-Wilson & Zwitserlood, 1989); they consider multiple words in parallel (Luce & Pisoni, 1998; Marslen-Wilson, 1987); they update the words under consideration as subsequent information arrives (Dahan & Gaskell, 2007; Frauenfelder, Scholten, & Content, 2001); and words compete with each other for recognition (Dahan, Magnuson, Tanenhaus, & Hogan, 2001; Luce & Pisoni, 1998).

This work also makes the strong assumption that words are defined as phoneme sequences.¹ This seems intuitive: if phoneme order distinguishes words, it makes sense for the system to represent them in this way. The large body of evidence showing strong competition from onset-matching competitors (cohorts) provided some of the earliest evidence for this. Since cohorts are more active than other competitors during early portions of the signal, this result appears to support slot-based representations in which early parts of the signal are matched to early parts of words in the lexicon. Indeed, the COHORT model (Marslen-Wilson, 1987) suggested that recognition was all-or-none; words that did not match at onset did not compete for activation. However, empirical work showing competition from onset-mismatching words has challenged this assumption (Connine et al., 1993; Allopenna et al., 1998). As a consequence, current models allow activation to depend on the degree of match in each phoneme position but continue to use a slot-based scheme (McClelland & Elman, 1986; Gaskell & Marslen-Wilson, 1997; Luce & Pisoni, 1998; Luce, Goldinger, Auer, & Vitevitch, 2000; Norris & McQueen, 2008). In NAM and Shortlist B, for example, graded activation is implemented using phoneme confusion data to partially activate words that are phonologically similar within each slot. In TRACE, this is accomplished through the use of graded features for each phoneme. However, while current models allow for this graded activation, they all incorporate the idea that the input is matched to lexical items slot-by-slot.

The present study challenges this assumption, asking whether words are even represented as sequences. This is loosely inspired by work on visual word recognition suggesting that letter order may be only coarsely coded (Chambers, 1979; Grainger & Whitney, 2004). Nonwords with transposed letters (JUGDE) prime their original words (JUDGE), while the same-sized mismatch without transposition (JULPE) does not (Perea & Lupker, 2003); this also extends to non-adjacent transpositions (e.g., CANISO/CASINO: Perea & Lupker, 2004). This implies that letters in incorrect positions may still activate the correct target word. This does not mean that readers completely ignore letter order, and there are limitations to these effects (Guerrara & Forster, 2008; Hannagan, Dupoux, & Christophe, 2011). However, it suggests that printed words are not represented as strict letter sequences. While spoken and written words differ in both temporal demands and the nature of the input (the forms of letters are the same in all positions, while the forms of phonemes are not), this raises the possibility that order need not be fully represented.

Abandoning a slot-based approach for spoken words may appear to create problems (e.g., distinguishing sub from bus). However, given that phonemes vary acoustically with word position and that listeners are sensitive to fine-grained differences (McMurray, Tanenhaus, & Aslin, 2002), they may not need to represent words in a slot-based format. Sub and bus may be distinguished because word-initial /b/ is acoustically different from word-final /b/. Under such a system, listeners might show strong cohort effects, since when they hear a word-initial /b/, its acoustic properties map more strongly onto words with the word-initial allophone of /b/ and less well onto words with the word-final allophone. Critically, these effects would not arise from slot-based representations.

We tested this by examining phonemic anadromes, words like sub and bus that contain the same phonemes in the opposite order. Since many models tolerate some mismatch at onset (Gaskell & Marslen-Wilson, 1997; Luce & Pisoni, 1998; McClelland & Elman, 1986; Luce et al., 2000; Norris & McQueen, 2008), they might predict activation for anadromes over unrelated words that share no phonological features (on the basis of the shared vowel). However, they predict that this should be the same as activation for non-anadromes that have a similar degree of mismatch (e.g., sun would compete with bus just as well as sub does). None predict competition from anadromes due to phonemes in incorrect positions (e.g., the /b/ in sub leading to activation for bus). We confirmed that this is the case for TRACE (Supplemental Material S1). Thus, existing models suggest that sub might compete weakly with bus because of the similarity of the words within each slot, since the matching vowels (in the second slot) could drive some activation (though it has never been empirically demonstrated that a single word-medial phoneme can drive competition). Given this, we must also compare anadromes (sub when bus is the target) to words having the same onset and vowel (sun).
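The contrast between these predictions can be illustrated with a toy calculation. The sketch below is not an implementation of TRACE, NAM, or Shortlist B; it only counts matching phonemes, once position by position and once ignoring position. The broad transcriptions and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical broad transcriptions for one item-set (an assumption).
LEXICON = {
    "bus": ["b", "ʌ", "s"],
    "sub": ["s", "ʌ", "b"],
    "sun": ["s", "ʌ", "n"],
    "well": ["w", "ɛ", "l"],
}


def slot_match(a, b):
    """Count positions at which the two words have the same phoneme."""
    return sum(x == y for x, y in zip(a, b))


def order_free_match(a, b):
    """Count shared phonemes regardless of position (multiset overlap)."""
    return sum((Counter(a) & Counter(b)).values())


def is_anadrome(a, b):
    """True when the two words differ and one is the reverse of the other."""
    return a != b and a == b[::-1]


target = LEXICON["bus"]
# For each competitor: (slot-based count, order-free count) against "bus".
scores = {
    word: (slot_match(target, phones), order_free_match(target, phones))
    for word, phones in LEXICON.items()
    if word != "bus"
}
```

With bus as the target, the slot-based count is 1 for both sub and sun (only the vowel matches), whereas the order-free count is 3 for sub and 2 for sun. A difference in looks between the anadrome and the overlap therefore separates the two kinds of account.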

Sub-phonemic overlap among the consonants could also play a role. While sub and bus have little overlap, consider tack and cat. In TRACE, the features for /t/ are similar to /k/. In NAM and Shortlist B, the /t/ would be partially confusable with /k/. Therefore, any word with some confusability in each slot could compete. Tap could compete with cat, and if /p/ and /k/ are equally distant from /t/, tap and tack could be equally active. Critically, this activation still derives from slot-based representations: phonemes or features in the wrong slot do not drive the effect. Thus, to establish anadrome activation we also need to look at word-pairs containing initial consonants with minimal phonological overlap.

We used the visual world paradigm (Allopenna et al., 1998; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995) to ask whether anadromes are activated during lexical access, and if so, whether part of this activation can be attributed to phonemes whose position in the input does not match the corresponding position in the competitor word. Participants heard a target word in the presence of four pictures representing the target, possible lexical competitors, and unrelated words. Each item-set contained a base-word (e.g., sub), its anadrome (bus), a cohort (sun), and an unrelated word (well). Critically, on some trials, anadromes can be compared to words that overlap only in the vowel (e.g., when bus was the target, differences in looks to sub and sun), allowing a test of effects of the vowel and partially-matching initial consonants. While ideally the word list would include only anadromes with minimal overlap among the bracketing consonants, there were not sufficient picturable pairs in English to construct a large list. Thus our list consists of words with similar initial consonants (e.g., tack/cat), which are useful for establishing activation for anadromes, and words that are minimally overlapping (sub/bus), which can confirm that effects are driven by phonemes in different positions.

Methods

Participants

Thirty University of Iowa undergraduates participated. Participants were native English speakers, reported normal or corrected-to-normal vision, provided informed consent, and received course credit or monetary compensation.

Design

Participants heard spoken words and used a computer mouse to select a corresponding picture. Stimuli consisted of 16 sets of four words (Appendix A). Each set contained a base-word (sub), its anadrome (bus), a cohort (sun), and an unrelated word (well).

Across trials, each word in an item-set occurred an equal number of times. This created four trial-types. On Cohort/Anadrome trials (when the base-word, sub, was the target) cohort and anadrome competitors were present with the target and an unrelated word. On Anadrome/Overlap trials (bus was the target) an anadrome (sub, the former base-word) and unrelated word (well) were present; the former cohort (sun) was termed an overlap since it shared only an overlapping vowel and a consonant in the wrong position. Cohort/Overlap trials (e.g., sun was the target) contained a cohort (sub), overlap (bus), and unrelated word (well). Finally, on Unrelated trials (e.g., well was the target) all three competitors were phonologically unrelated. Table 1 shows an example item-set and the role of each word across trial-types. Each word was repeated nine times, yielding 576 trials. Participants completed the experiment in an hour-long session.

Table 1.

Example item-set by trial-type.

Trial-type	sub	sun	bus	well
Cohort/Anadrome	Target	Cohort	Anadrome	Unrelated
Cohort/Overlap	Cohort	Target	Overlap	Unrelated
Anadrome/Overlap	Anadrome	Overlap	Target	Unrelated
Unrelated	Unrelated	Unrelated	Unrelated	Target
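The design arithmetic and the role assignments in Table 1 can be sketched as a small trial-list builder. The structure below is an illustration only: the dictionary layout and the names `ROLES`, `TRIAL_TYPES`, and `build_trials` are hypothetical, and trial randomization is left out because the text does not describe it.

```python
from itertools import product

# Role of each displayed word, keyed by which word serves as the target
# (rows of Table 1). Keys name the word's position within its item-set.
ROLES = {
    "base": {"base": "Target", "cohort": "Cohort",
             "anadrome": "Anadrome", "unrelated": "Unrelated"},
    "cohort": {"base": "Cohort", "cohort": "Target",
               "anadrome": "Overlap", "unrelated": "Unrelated"},
    "anadrome": {"base": "Anadrome", "cohort": "Overlap",
                 "anadrome": "Target", "unrelated": "Unrelated"},
    "unrelated": {"base": "Unrelated", "cohort": "Unrelated",
                  "anadrome": "Unrelated", "unrelated": "Target"},
}

TRIAL_TYPES = {
    "base": "Cohort/Anadrome",
    "cohort": "Cohort/Overlap",
    "anadrome": "Anadrome/Overlap",
    "unrelated": "Unrelated",
}


def build_trials(n_sets=16, n_reps=9):
    """One trial per item-set x target word x repetition (unshuffled)."""
    trials = []
    for item_set, target, rep in product(range(n_sets), ROLES, range(n_reps)):
        trials.append({
            "item_set": item_set,
            "target": target,
            "repetition": rep,
            "trial_type": TRIAL_TYPES[target],
            "roles": ROLES[target],
        })
    return trials


trials = build_trials()
```

With 16 item-sets, four possible targets per set, and nine repetitions, the list contains 576 trials, 144 of each trial-type.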


Stimuli

Auditory stimuli were recorded by a female talker in a sound-attenuated room. Recordings were made at 44.1 kHz on a Kay Elemetrics Computerized Speech Lab 4300B. Several tokens of each word were recorded and the best exemplar was selected. The mean word duration was 457 ms (SD=74 ms).

Visual stimuli were color drawings prepared using a standard procedure in the McMurray lab. For each word, a set of candidate clipart images were obtained. These were viewed by focus-groups of lab personnel to select the most prototypical image and guide subsequent editing to make pictures representative and uniform (McMurray, Samelson, Lee, & Tomblin, 2010).

Procedure

An SR Research Eyelink II eye-tracker was calibrated using the standard 9-point procedure. On each trial, participants saw four 200×200-pixel pictures, one in each corner of a 19-inch CRT monitor. At the center, a small blue dot was displayed, which turned red after 750 ms. The participant then clicked on the dot and heard the auditory stimulus 100 ms later (via Sennheiser HD555 headphones). This ensured that both the mouse cursor and the participant’s gaze were centered when the auditory stimulus began. The trial ended when the participant clicked on the corresponding referent.

Every 45 trials, drift correction was performed, and participants were given the opportunity to take a break. Stimulus presentation and data collection were handled by the SR Research Experiment Builder package and Eyelink control software.

Eye-movement analysis

As in prior experiments (McMurray et al., 2010; McMurray et al., 2002), the eye-movement record was parsed into saccades and fixations by the Eyelink software. Adjacent saccades and fixations were combined into a “look” extending from saccade-onset to fixation-offset. Boundaries around the images were extended by 100 pixels. This maintained substantial space between ports (Horizontal: 630 pixels; Vertical: 374 pixels) while compensating for noise.
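The boundary extension can be sketched as a point-in-rectangle check. The screen resolution (1280×1024) and the 25-pixel margin between each picture and the screen edge are assumptions, since neither is stated in the text. They were chosen because, together with the 200-pixel pictures and 100-pixel extension, they reproduce the reported 630-pixel and 374-pixel gaps between ports.

```python
SCREEN_W, SCREEN_H = 1280, 1024  # assumed resolution (not stated in the text)
MARGIN = 25                      # assumed picture-to-edge distance in pixels
PIC = 200                        # picture size in pixels (from the Procedure)
PAD = 100                        # boundary extension in pixels (from the text)


def port(left, top):
    """Padded region (x0, y0, x1, y1) for a picture whose corner is (left, top)."""
    return (left - PAD, top - PAD, left + PIC + PAD, top + PIC + PAD)


far_x = SCREEN_W - MARGIN - PIC  # left edge of the right-hand pictures
far_y = SCREEN_H - MARGIN - PIC  # top edge of the bottom pictures

PORTS = {
    "top_left": port(MARGIN, MARGIN),
    "top_right": port(far_x, MARGIN),
    "bottom_left": port(MARGIN, far_y),
    "bottom_right": port(far_x, far_y),
}


def classify(x, y):
    """Name of the port containing gaze position (x, y), or None."""
    for name, (x0, y0, x1, y1) in PORTS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


# Space left between padded ports under these assumptions.
h_gap = PORTS["top_right"][0] - PORTS["top_left"][2]
v_gap = PORTS["bottom_left"][1] - PORTS["top_left"][3]
```

Under these assumed values, a gaze sample at the screen center falls in no port, so fixations on the central dot are not counted as looks to any picture.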

We used a standard area-under-the-curve approach for data analysis (Allopenna et al., 1998; McMurray et al., 2002) with the time-window starting 200 ms after stimulus onset (since it takes ≈200 ms to plan an eye-movement) and ending at 1417 ms (the average RT plus 200 ms).² Proportions of looks were analyzed with linear mixed-effects (LME) models using the lme4 package in R (Bates & Sarkar, 2011). The dependent variable was proportion of looks, transformed with the empirical logit for use with linear models. All models used object-type as the only fixed effect, which was binary and coded as −0.5/+0.5.
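The windowing and transformation steps can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes looks are stored as (onset, offset, object) tuples in milliseconds, uses the common form of the empirical logit, log((y + 0.5) / (n − y + 0.5)), and treats each millisecond in the window as one observation. All function names and the example trial are hypothetical.

```python
import math

WINDOW_START, WINDOW_END = 200, 1417  # ms after stimulus onset (from the text)


def pad_final_look(looks, end=WINDOW_END):
    """Extend the last look to the window end when the trial ended earlier."""
    if looks and looks[-1][1] < end:
        onset, _, name = looks[-1]
        return looks[:-1] + [(onset, end, name)]
    return looks


def ms_on_object(looks, obj, start=WINDOW_START, end=WINDOW_END):
    """Milliseconds inside [start, end) spent looking at obj.

    looks: list of (onset_ms, offset_ms, object_name) tuples for one trial.
    Portions of a look outside the window are truncated.
    """
    total = 0
    for onset, offset, name in looks:
        if name == obj:
            total += max(0, min(offset, end) - max(onset, start))
    return total


def empirical_logit(y, n):
    """Empirical logit of y observations out of n; finite even when y is 0 or n."""
    return math.log((y + 0.5) / (n - y + 0.5))


# Centered contrast coding for a binary object-type predictor.
CODING = {"unrelated": -0.5, "anadrome": +0.5}

# Hypothetical trial that ended with a click at 1300 ms.
looks = [(0, 250, "center"), (250, 600, "cohort"),
         (600, 800, "anadrome"), (800, 1300, "target")]
looks = pad_final_look(looks)

n = WINDOW_END - WINDOW_START
y_anadrome = ms_on_object(looks, "anadrome")
y_unrelated = ms_on_object(looks, "unrelated")
elog_anadrome = empirical_logit(y_anadrome, n)
elog_unrelated = empirical_logit(y_unrelated, n)
```

The 0.5 added to numerator and denominator keeps the transform defined for objects that were never fixated, which is common for unrelated items.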

In considering the random-effects structures of the models, we found that adding by-subject slopes did not significantly improve model fit, while by-item random slopes did. Models with only random intercepts for items showed clearer effects; models that were more sensitive to item-level variation using random slopes did not uniformly show effects. This suggests that our effects generalized robustly across subjects, but not necessarily across items. This was not unexpected, since our design does not provide sufficient power to detect small by-item effects, and there are a number of sources of variability across items. Thus, we report here results from both models with only random intercepts and models with random slopes (see Supplemental Material S2 for complete details of all statistical models). To be clear, both models included random effects for subjects and items. Significance was evaluated using the chi-square goodness-of-fit test comparing models with and without object-type as a fixed effect.
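A chi-square comparison of nested models is commonly carried out as a likelihood-ratio test; whether this matches the authors' exact procedure is an assumption. The sketch below computes the statistic and p-value for one added parameter (df = 1), using the identity that the upper-tail probability of a chi-square variable with one degree of freedom equals erfc(sqrt(x / 2)). The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Compare two nested models that differ by one parameter (df = 1).

    Returns (chi-square statistic, p-value).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of chi-square with 1 df, via the normal tail.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Hypothetical log-likelihoods for models without and with object-type.
stat, p = likelihood_ratio_test(-1052.3, -1046.1)
```

Because the random-intercept and random-slope models differ in their random-effects structure, each comparison of this kind would be run separately within a given structure.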

Results

Participants performed well, averaging 99.3% correct (SD=1.1%). We eliminated the few trials on which they selected the incorrect referent (M=3.9 trials/participant, SD=6.5). Mean RT for correct trials was 1217 ms (SD=642 ms).

To determine whether anadromes competed for recognition, we first examined the proportion of looks to the anadrome and unrelated objects on Cohort/Anadrome trials. Figure 1A shows the timecourse of looks to each competitor on these trials. The earliest fixations were directed to the target and cohort, since at that time, the stimulus was consistent with both. Fixations to the anadrome were initiated later and were greater than those to the unrelated object. This was confirmed statistically in an LME model with random intercepts and object-type (anadrome vs. unrelated) as the only fixed effect (b=0.025, SE=0.006, p<0.001), although this was not significant in the random-slope model (b=0.025, SE=0.017, p=0.131).

Figure 1. **(A)** Timecourse of looks to each object on Cohort/Anadrome trials. **(B)** Timecourse of looks on Cohort/Overlap trials. Dashed lines indicate boundaries of analysis time window.

Since words that mismatch at onset can also be activated (Allopenna et al., 1998), the anadrome effect could be driven by the overlapping vowel and/or shared features at onset (e.g., the /k/ in cat shares features with the /t/ in tack, since both are voiceless stops). Such a model also predicts activation for tap (an overlap competitor), when cat is the target. We examined this with a second LME model, which compared looks to overlap and unrelated objects on Cohort/Overlap trials (Figure 1B). The proportion of looks to the two objects was not significantly different in either the random-intercept model (b=0.001, SE=0.006, p=0.847) or the random-slope model (b=0.001, SE=0.006, p=0.847). This suggests that overlaps were not activated more than unrelated items.

We next asked whether anadromes also received more looks than overlaps. Here, we considered two approaches to analyzing this difference. One follows the analyses above, which examines looks within a single trial-type, holding the auditory stimulus constant and examining fixations to different visual objects in the display. An alternative is to compare looks to the same visual object when it serves as a different type of competitor. For example, we can compare looks to bus when it serves as an anadrome on Cohort/Anadrome trials (with sub as the target) with looks to bus as an overlap on Cohort/Overlap trials (with sun as the target). This controls for differences in the visual stimulus that might drive differences in eye-movements, gives us twice as many trials from which to draw our data, and may make it easier to detect small competitor effects, since it uses trials with a cohort in the display (Dahan et al., 2001).³

First, we used the within-trial-type approach to look at differences between anadromes and overlaps on Anadrome/Overlap trials (Figure 2A). We found a marginal effect of object-type with more looks to anadromes than to overlaps in the random-intercept model (b=0.011, SE=0.006, p=0.092), which appeared restricted to a few time ranges (300–500 ms and 800–1200 ms); this was not significant in the random-slope model (b=0.011, SE=0.016, p=0.523). One reason we might not see clear anadrome-overlap differences here is that, for many item-sets, the competitor onsets share multiple features with the target, which could heighten overlap activation. If cat is the target, the initial phonemes of tack and tap (anadrome and overlap) differ from cat in just one feature. For these item-sets, we might expect that overlaps are activated more by their shared phonological features with the target.

Figure 2. **(A)** Timecourse of looks to each object on Anadrome/Overlap trials. **(B)** Timecourse for the item-sets with three-feature differences between the onset of the target and competitors.

Therefore, a stronger test of our hypothesis is to examine differences in anadrome and overlap looks for pairs whose consonants share no features (e.g., for bus [target], sub [anadrome], and sun [overlap], the competitors’ initial phoneme differs from the target’s in voicing, place, and manner). For these item-sets, activation for the overlap and anadrome would not be enhanced by shared features at onset. If we observe more activation for anadromes than overlaps for these item-sets, it would suggest that competition is driven by the presence of phonemes that match the target but are in the wrong position.

A fourth LME model examined this by comparing anadrome and overlap looks on Anadrome/Overlap trials for the four item-sets that differed in all three features (Figure 2B). The random-intercept model showed an effect of object-type with more looks to anadromes (b=0.055, SE=0.015, p<0.001); this was not significant in the random-slope model (b=0.055, SE=0.040, p=0.183). This suggests that our effects cannot be due solely to graded activation of phonemes in slot-based representations of words—phonemes in the incorrect position play an additional role.

Finally, we examined four LME models corresponding to the comparisons above but holding the visual stimulus constant and comparing across trial-types. Each of the four random-intercept models showed an effect of object-type: (1) anadrome > unrelated (b=0.044, SE=0.006, p<0.001); (2) overlap > unrelated (b=0.016, SE=0.005, p=0.003); (3) anadrome > overlap using all item-sets (b=0.028, SE=0.006, p<0.001); and (4) anadrome > overlap using the item-sets with no feature overlap (b=0.046, SE=0.012, p<0.001); and two of the random-slope models showed an effect: (1) anadrome > unrelated (b=0.044, SE=0.018, p=0.024); and (3) anadrome > overlap using all item-sets (b=0.028, SE=0.013, p=0.045). Thus, these results show a clear advantage for anadromes over overlaps, and they also reveal a small effect of overlap competitors that we did not observe in the within-trial-type analyses.

Discussion

We found robust evidence that anadromes are activated more than unrelated words during spoken word recognition. This is consistent with previous work showing activation for lexical competitors that mismatch at onset (e.g., rhymes) and adds to the set of competitors that listeners consider during word recognition.

There are two potential causes of this effect. One possibility is that words are represented in a slot-based way and that a matching phoneme in the second position (the vowel) and/or a mismatching, but similar phoneme in the first position (the initial consonant) drove the effect. However, we only found mixed evidence that overlaps were activated at all, and we found much stronger evidence that anadromes received more fixations than overlaps when there was maximal difference in the consonants. When overlap activation was not heightened by featural match between the overlap and the target, a clear anadrome effect could be seen.

This suggests that words are not represented in a slot-like format and that phonemes in the wrong position may activate competitors (e.g., the /s/ in the first position of the input could activate bus). Thus, in contrast to existing models of word recognition, anadromes are activated as a consequence of phonemes in the incorrect position.

Given this, why don’t listeners confuse words with their anadromes? One possibility is that fine-grained phonetic detail could serve as a proxy for temporal order. The acoustic forms of consonants vary with syllable position—a syllable-initial /b/ contains rising formants and a short VOT, while a syllable-final /b/ contains falling formants, a closure, and a release burst. While both have similar gross spectral shapes (Blumstein & Stevens, 1979), they differ in many details. As a result, a fine-grained description of sub will differ from bus, even if phoneme order is ignored. This can explain why cohorts are more active than other competitors: a word-initial allophone of /b/ is a better match to bus and bun than to sub. We should note, however, that this is only a hypothesis at this point.

The fact that word recognition is sensitive to fine-grained detail (McLennan & Luce, 2005; McMurray et al., 2002), including allophonic detail (Ranbom & Connine, 2007), means that listeners could use subtle acoustic differences to distinguish anadromes. At the same time, the coarse spectral similarity will lead to some parallel activation. Such a model is quite different from most models of spoken word recognition, in that lexical representations have no inherent temporal order. Rather, words would be more like a bundle of fine-grained acoustic cues. This detailed acoustic information is preserved in exemplar models (Goldinger, 1998), though the memory traces for whole words are typically thought to include temporal order.

Although this differs from current models, it could be implemented in similar architectures using a model in which fine-grained cues are mapped directly to words (independently of order). A version of TRACE with a single time alignment and a richer input might show anadrome effects, as could architectures like normalized recurrence (Spivey, 2007), which has been used to model phonetic categorization (McMurray & Spivey, 2000). These types of models could show classic online processing effects, while distinguishing anadrome-pairs using fine-grained detail. In the meantime, these results challenge current models and the assumption that words are represented in terms of phoneme order, and they challenge the field to develop creative alternatives to representing temporal order in models of word recognition.

Supplementary Material

13423_2013_417_MOESM1_ESM

NIHMS451328-supplement-13423_2013_417_MOESM1_ESM.pdf (485.6 KB, pdf)

**(A)** Timecourse of looks to a given object when it served as a target (on Anadrome/Overlap trials), anadrome (Cohort/Anadrome trials), and overlap (Cohort/Overlap trials). **(B)** Timecourse for item-sets with three-feature differences between target and competitor onsets.

Acknowledgments

We would like to thank Jennifer Cole, Gary Dell, Simon Fischer-Baum, and Deborah Gagnon for insightful comments, and Matt Goldrick and James McQueen for helpful comments on earlier drafts. We also thank Dan McEchron for assistance with data collection and processing. This research was supported by a Beckman Institute postdoctoral fellowship to J.C.T., NIH Grant DC008089 to B.M., and HD044458 to Gary Dell.

Appendix

Table A1.

List of words used in the experiment. The final column indicates the number of feature differences (voicing, place, and manner) in the initial consonant between the anadrome and target.

Base-word	Anadrome	Cohort	Unrelated	Feature differences
tack	cat	tap	mill	1
pit	tip	pig	wine	1
puck	cup	putt	mail	1
pack	cap	pan	chess	1
reed	deer	reel	phone	1
face	safe	faith	lime	1

tug	gut	tub	mice	2
side	dice	sign	witch	2
lute	tool	loon	judge	2
mug	gum	mud	fish	2
mad	dam	map	shoes	2
sick	kiss	sip	jam	2

lip	pill	lid	cow	3
leak	keel	leash	gem	3
leap	peel	leaf	moon	3
sub	bus	sun	well	3
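The feature-difference column can be reproduced by counting mismatches in voicing, place, and manner between the initial consonants of each base-word and its anadrome. The sketch below assumes a conventional classification of each consonant (with /r/ treated as alveolar); the authors' exact feature system is not given, so the `FEATURES` table is an assumption.

```python
# Conventional (voicing, place, manner) descriptions; an assumption about
# the classification behind the table.
FEATURES = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "g": ("voiced", "velar", "stop"),
    "f": ("voiceless", "labiodental", "fricative"),
    "s": ("voiceless", "alveolar", "fricative"),
    "m": ("voiced", "bilabial", "nasal"),
    "l": ("voiced", "alveolar", "lateral"),
    "r": ("voiced", "alveolar", "approximant"),
}

# (base-word, anadrome, base onset, anadrome onset, value listed in Table A1)
PAIRS = [
    ("tack", "cat", "t", "k", 1), ("pit", "tip", "p", "t", 1),
    ("puck", "cup", "p", "k", 1), ("pack", "cap", "p", "k", 1),
    ("reed", "deer", "r", "d", 1), ("face", "safe", "f", "s", 1),
    ("tug", "gut", "t", "g", 2), ("side", "dice", "s", "d", 2),
    ("lute", "tool", "l", "t", 2), ("mug", "gum", "m", "g", 2),
    ("mad", "dam", "m", "d", 2), ("sick", "kiss", "s", "k", 2),
    ("lip", "pill", "l", "p", 3), ("leak", "keel", "l", "k", 3),
    ("leap", "peel", "l", "p", 3), ("sub", "bus", "s", "b", 3),
]


def feature_differences(c1, c2):
    """Number of features (voicing, place, manner) on which c1 and c2 differ."""
    return sum(a != b for a, b in zip(FEATURES[c1], FEATURES[c2]))


computed = {base: feature_differences(o1, o2) for base, _, o1, o2, _ in PAIRS}
mismatches = [base for base, _, _, _, listed in PAIRS if computed[base] != listed]
```

Under this classification the computed counts agree with all 16 listed values, and the four three-feature item-sets (lip, leak, leap, sub) are the ones used in the minimal-overlap analyses.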


Footnotes

¹ We use phonemes here only as a convenience to describe the input; we are not implying any representational status for them in the recognition process.

² For trials longer than 1417 ms, looks were truncated. For trials shorter than 1417 ms, the final fixation position was used to fill the remaining time period, assuming that listeners had settled on an interpretation when they clicked on the referent. This approach has been used in several previous studies (McMurray et al., 2002; 2010).

³ For the Anadrome/Unrelated and Overlap/Unrelated comparisons, this requires us to compare trials with a cohort to those without one. Thus, we initially examined the within-trial-type comparisons for those effects.

Contributor Information

Joseph C. Toscano, Email: jtoscano@illinois.edu.

Nathaniel D. Anderson, Email: nandrsn3@illinois.edu.

Bob McMurray, Email: bob-mcmurray@uiowa.edu.

References

Allopenna P, Magnuson JS, Tanenhaus MK. Tracking the time course of spoken word recognition using eye-movements: Evidence for continuous mapping models. J Memory Lang. 1998;38:419–439.
Bates D, Sarkar D. lme4: Linear mixed-effects models using S4 classes. 2011.
Blumstein SE, Stevens KN. Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants. J Acoust Soc Am. 1979;66:1001–1017. doi: 10.1121/1.383319.
Chambers SM. Letter and order information in lexical access. J Verbal Learning Verbal Behav. 1979;18:225–241.
Connine CM, Blasko D, Titone D. Do the beginnings of spoken words have a special status in auditory word recognition? J Memory Lang. 1993;32:193–210.
Dahan D, Gaskell MG. The temporal dynamics of ambiguity resolution: Evidence from spoken-word recognition. J Memory Lang. 2007;57:483–501. doi: 10.1016/j.jml.2007.01.001.
Dahan D, Magnuson JS, Tanenhaus MK, Hogan E. Subcategorical mismatches and the time course of lexical access: Evidence for lexical competition. Lang Cog Processes. 2001;16:507–534.
Frauenfelder U, Scholten M, Content A. Bottom-up inhibition in lexical selection: Phonological mismatch effects in spoken word recognition. Lang Cog Processes. 2001;16:563–607.
Gaskell MG, Marslen-Wilson W. Integrating form and meaning: a distributed model of speech perception. Lang Cog Processes. 1997;12:613–656.
Goldinger SD. Echoes of echoes? An episodic theory of lexical access. Psychol Rev. 1998;105:251–279. doi: 10.1037/0033-295x.105.2.251.
Grainger J, Whitney C. Does the huamn mnid raed wrods as a wlohe? Trends Cog Sci. 2004;8:58–59. doi: 10.1016/j.tics.2003.11.006.
Grossberg S. Resonant neural dynamics of speech perception. J Phonetics. 2003;31:423–445.
Guerrara C, Forster K. Masked form priming with extreme transposition. Lang Cog Processes. 2008;23:117–142.
Hannagan T, Dupoux E, Christophe A. Holographic string encoding. Cog Sci. 2011;35:79–118. doi: 10.1111/j.1551-6709.2010.01149.x.
Luce PA, Goldinger SD, Auer ET, Vitevitch MS. Phonetic priming, neighborhood activation, and PARSYN. Percept Psychophys. 2000;62:615–625. doi: 10.3758/bf03212113.
Luce PA, Pisoni DB. Recognizing spoken words: The neighborhood activation model. Ear Hearing. 1998;19:1–36. doi: 10.1097/00003446-199802000-00001.
Marslen-Wilson W. Functional parallelism in spoken word recognition. Cognition. 1987;25:71–102. doi: 10.1016/0010-0277(87)90005-9.
Marslen-Wilson W, Zwitserlood P. Accessing spoken words: The importance of word onsets. J Exp Psychol: Human Percept Perform. 1989;15:576–585.
McClelland JL, Elman JL. The TRACE model of speech perception. Cog Psychol. 1986;18:1–86. doi: 10.1016/0010-0285(86)90015-0.
McLennan C, Luce PA. Examining the time course of indexical specificity effects in spoken word recognition. J Exp Psychol: Learning Memory Cognition. 2005;31:306–321. doi: 10.1037/0278-7393.31.2.306.
McMurray B, Samelson VS, Lee SH, Tomblin JB. Eye-movements reveal the time-course of online spoken word recognition language impaired and normal adolescents. Cog Psychol. 2010;60:1–39. doi: 10.1016/j.cogpsych.2009.06.003.
McMurray B, Spivey MJ. The categorical perception of consonants: the interaction of learning and processing. Proc Chicago Linguistics Society. 2000;34:205–220.
McMurray B, Tanenhaus MK, Aslin RN. Gradient effects of within-category phonetic variation on lexical access. Cognition. 2002;86:B33–B42. doi: 10.1016/s0010-0277(02)00157-9.
Norris D, McQueen JM. Shortlist B: A Bayesian model of continuous speech recognition. Psychol Rev. 2008;115:357–395. doi: 10.1037/0033-295X.115.2.357.
Perea M, Lupker SJ. Does jugde activate COURT? Transposed-letter similarity effects in masked associative priming. Memory Cognition. 2003;31:829–841. doi: 10.3758/bf03196438.
Perea M, Lupker SJ. Can CANISO activate CASINO? Transposed-letter similarity effects with nonadjacent letter positions. J Memory Language. 2004;51:231–246.
Ranbom LJ, Connine CM. Lexical representation of phonological variation in spoken word recognition. J Memory Lang. 2007;57:273–298.
Spivey MJ. The continuity of mind. New York: Oxford University Press; 2007.
Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM, Sedivy JC. Integration of visual and linguistic information in spoken language comprehension. Science. 1995;268:1632–1634. doi: 10.1126/science.7777863.

Associated Data

Supplementary Materials

13423_2013_417_MOESM1_ESM

NIHMS451328-supplement-13423_2013_417_MOESM1_ESM.pdf (485.6 KB, PDF)
