Interaction of Knowledge Sources in Spoken Word Identification

Aita Salasoo; David B Pisoni

doi:10.1016/0749-596X(85)90025-7

Author manuscript; available in PMC: 2012 Dec 4.

Published in final edited form as: J Mem Lang. 1985 Apr;24(2):210–231. doi: 10.1016/0749-596X(85)90025-7

PMCID: PMC3513696 NIHMSID: NIHMS418766 PMID: 23226691

Abstract

A gating technique was used in two studies of spoken word identification that investigated the relationship between the available acoustic–phonetic information in the speech signal and the context provided by meaningful and semantically anomalous sentences. The duration of intact spoken segments of target words and the location of these segments at the beginnings or endings of words in sentences were varied. The amount of signal duration required for word identification and the distribution of incorrect word responses were examined. Subjects were able to identify words in spoken sentences with only word-initial or only word-final acoustic–phonetic information. In meaningful sentences, less word-initial information was required to identify words than word-final information. Error analyses indicated that both acoustic–phonetic information and syntactic contextual knowledge interacted to generate the set of hypothesized word candidates used in identification. The results provide evidence that word identification is qualitatively different in meaningful sentences than in anomalous sentences or when words are presented in isolation: That is, word identification in sentences is an interactive process that makes use of several knowledge sources. In the presence of normal sentence context, the acoustic–phonetic information in the beginnings of words is particularly effective in facilitating rapid identification of words.

It is now well documented that both acoustic–phonetic information from the speech signal and other nonsensory sources of knowledge contribute to spoken word identification (see Bagley, 1900; Cole & Rudnicky, 1983). The listener’s knowledge of morphology, syntax, and semantics may be collectively labeled context. The problem of how context is used to support perception and comprehension is one of the most important questions in the field of language processing today (e.g., Cole & Rudnicky, 1983; Grosjean, 1980; Marslen-Wilson & Welsh, 1978; Marslen-Wilson & Tyler, 1980; Stanovich & West, 1981; Swinney, 1979).

Early studies of context effects in speech processing employed degraded stimuli (e.g., Miller, Heise, & Lichten, 1951; Miller & Isard, 1963). With impoverished speech signals such as sentences presented against high levels of noise, listeners can extract the linguistic content of the message if they have access to normal semantic and syntactic information. When these top-down knowledge sources are experimentally removed or modified in some way (Miller & Isard, 1963), listeners’ identification suffers substantially. If we consider the contribution of distortion and masking of the speech signal by miscellaneous sources of noise, for example, coughs, traffic, faulty telephone lines, then it follows that non-acoustic knowledge sources also contribute to the identification of words in normal fluent speech processing situations.

More recently, a variety of tasks have been used to address the mechanisms of context effects. One particular approach has been to observe the effect of various context manipulations on measures that index the listener’s relative dependence on bottom-up acoustic–phonetic information.

In this spirit, Marslen-Wilson and Tyler (1980) found that words in normal sentences were recognized before half of their acoustic–phonetic signal had become available to listeners. They also observed that identification times (to a specified word target) increased in anomalous sentences that had no coherent semantic interpretations, compared to normal sentences. Identification times were longest for words in scrambled lists of unrelated words. Several studies have used a gating paradigm, which measures the minimum acoustic–phonetic input required for word identification (Grosjean, 1980; Cotton & Grosjean, 1984). Listeners in these studies required less stimulus information to identify words in sentence contexts than when the same words occurred in isolation. The locus and mechanisms producing these context effects are not well understood, although they are the focus of much current research (cf. Cole & Rudnicky, 1983).

In this paper, we investigate the sources of knowledge that are employed in spoken word identification. We will assume that at some stage in perceiving and processing the speech input, words are identified or recognized by listeners. The word identification process involves a number of component stages including word recognition, lexical access and retrieval, and response execution. We will adopt the term word identification to stand for the conscious belief (and a response contingent on that belief) that a particular word has just occurred. We will reserve the term word recognition for the results of the low-level sensory pattern-matching process that is assumed to occur upon hearing a spoken word. By lexical access we mean contact of an internal representation derived directly from the speech input with a lexical representation (i.e., a word) in memory and retrieval or activation of that item in working memory (Pisoni, 1981).

In recent years, questions about the mechanisms responsible for context effects in speech processing have focused on the issue of autonomous versus interactive processing (Cairns, 1982; Marslen-Wilson & Tyler, 1980; Norris, 1982a, 1982b; Swinney, 1982; Tyler & Marslen-Wilson, 1982a, 1982b) and on the special status given to word-initial phonetic segments in lexical access (Cole & Jakimik, 1980; Garrett, 1978; Cairns, 1982). According to the autonomy principle (e.g., Swinney, 1982, p. 164), lexical processing consists of “a set of isolable, autonomous substages, where these substages constitute domain specific processing modules.” Of immediate interest to us is the a priori assumption made by several investigators that lexical access is autonomous or context independent (e.g., Forster, 1979; Marslen-Wilson & Tyler, 1980; Swinney, 1982). In autonomous accounts of spoken word identification, the effects of higher-order context are assumed to be postlexical in nature (Forster, 1979).

In the original account of cohort theory, Marslen-Wilson and Welsh (1978) assumed that the initial set (or cohort) of word candidates is fully determined by the word-initial acoustic–phonetic input alone (see also Tyler & Marslen-Wilson, 1982a, 1982b). According to Marslen-Wilson’s “principle of bottom-up priority” (e.g., Marslen-Wilson & Tyler, 1980; Tyler & Marslen-Wilson, 1982b), the cohort-establishing processes (involved in lexical access as we have defined it) are viewed as context independent, that is, autonomous, using only word-initial acoustic–phonetic information. Once the set of word-initial cohorts is activated, subsequent bottom-up acoustic–phonetic information and all other sources of information (including syntactic and semantic constraints) are used to deactivate incompatible word candidates. According to this view, a word is identified “optimally” at the point where it becomes “uniquely distinguishable from all of the other words in the language beginning with the same sound sequence” (Tyler & Marslen-Wilson, 1982b, p. 175).

The autonomy principle gives special status to the initial acoustic–phonetic information in the speech signal for directing the word identification process. The cohort theory makes strong claims about the role of word-initial acoustic–phonetic information, namely, that an entire set of lexical candidates that share word-initial phonetic information with the spoken stimulus word is directly activated during word identification. Similarly, in Forster’s (1976) autonomous search model, a master file in the lexicon contains all the phonetic, syntactic, and semantic information associated with a word token that is subjected to postaccess comparison and identification decision processes. Entry to the master file proceeds, in the case of spoken language input, only via a peripheral phonetic file based on acoustic–phonetic information in the signal. As Swinney (1982) has pointed out, in both accounts, it is assumed a priori that initial processing is autonomous.

Tyler and Marslen-Wilson (1982b) have located the effects of context in postlexical decision stages of processing, almost by default. This conclusion is based on observations that word-initial fragments are influential in postperceptual tasks. For example, in both visual and auditory domains, production and memory for words when their initial segments are presented are superior to performance based on their final segments (e.g., Bruner & O’Dowd, 1958; Nooteboom, 1981). However, the possibility of perceptual context effects in continuous spoken word identification cannot be ruled out by these data.

Grosjean’s (1980) recent analysis of listeners’ responses in an identification task constituted the first attempt to operationalize Marslen-Wilson’s concept of a word-initial cohort. Grosjean used a gating paradigm in which signal duration was varied: Listeners tried to identify a target word after fragments of the speech signal had been presented. Incorrect responses included not only acoustically similar words but also word candidates guided by semantic constraints and word frequency. Grosjean (1980) interpreted his response data as evidence against the claim that only acoustic–phonetic information controlled the distribution of lexical candidates generated before a word is identified. Instead, Grosjean (1980) suggested that both acoustic and nonacoustic sources of knowledge interact to select potential word candidates in lexical access. This view is similar to the interactive logogen model proposed by Morton (1969, 1979).

Using a task based on Grosjean’s gating technique, we examined the interaction of contextual and sensory information in the identification of content words in sentences. We replaced multiple word targets in sentences with envelope-shaped noise and asked listeners to identify all the words. Our gating task is an “off-line” procedure that employs unusual sentence stimuli. Nevertheless, the task appears to be sensitive to the amount of sensory input available at various points during the time course of spoken word identification. It is possible that additional strategies not used normally in speech comprehension, such as hypothesis testing and guessing, may also be reflected in our task. We believe, however, that our results are compatible with those from “on-line” techniques and therefore can provide converging evidence about the availability of various knowledge sources used to identify words in spoken sentences.

In two gating experiments reported below, we investigate the interactive assumption that normal spoken word identification processes require the presence of semantic and syntactic context. More specifically, we focus on the a priori nature of the autonomy assumption in word identification and the special status given to word-initial acoustic–phonetic information in cohort theory.

Experiment 1

The present study used a gating paradigm to investigate the knowledge sources employed in the identification of words in spoken sentences. Three questions were addressed: First, is word-initial acoustic–phonetic information obligatory for word identification in sentences? Second, how does the reliance on acoustic–phonetic information differ between normal, meaningful sentences and syntactically normal but semantically anomalous sentences? And, finally, how is the distribution of incorrect responses to “word-initial” or “word-final” information related to the amount of signal duration required for identifying spoken words?

Our terminology and procedure differ somewhat from previous studies using various gating techniques.¹ In our task, subjects were required to identify content words in short sentences after each presentation of the sentences. On the first trial, the waveform of each target word was replaced completely by envelope-shaped noise. This noise effectively removed all segmental acoustic–phonetic cues, while at the same time preserving the natural prosodic and duration information. On each consecutive trial, 50-millisecond increments of the original waveform replaced selected parts of the noise in each target word. The number of consecutive 50-millisecond increments increased on successive repetitions of the sentence until, on the final trial, the entire waveform of the original word was presented. Two aspects of our procedure are novel. First, in constructing the stimuli, we replaced the non-presented part of the signal with envelope-shaped noise instead of with silence. Second, we used multiple target words in sentences to simulate the demands of normal, continuous word identification in processing fluent speech. This procedure contrasts with the use of a single target item that has often been the final word in a sentence (e.g., Grosjean, 1980; Cotton & Grosjean, 1984). Our use of the term “gate” refers to the duration of the presented segment of the intact speech signal for each of the target words.

Method

Subjects

The subjects were 194 introductory psychology students who received course credit for their participation. All subjects were native speakers of English with no reported hearing loss. None of the subjects had participated in previous experiments using speech or speech-like materials.

Materials

From two sets of experimental materials, three context conditions were generated: words in meaningful sentences, words in anomalous sentences, and the same words presented in isolation. Eight Harvard psychoacoustic sentences (Egan, 1948) were chosen for the meaningful context condition. These sentences, for example, “The stray cat gave birth to kittens,” cover a range of syntactic structures and are balanced according to word frequency and phonological density counts in English usage. The Harvard sentences are typical of active declarative sentences in English; they are meaningful and syntactically normal. The second context condition consisted of eight sentences selected from a set of materials originally developed for use in the evaluation of the intelligibility of synthesized speech (Nye & Gaitenby, 1974; Pisoni, 1982). These materials, known as the Haskins sentences, are syntactically normal and contain high-frequency words. Unlike the Harvard sentences, however, they are semantically anomalous, for example, “The end home held the press.” As such, the Haskins sentences represent a class of semantically impoverished sentences. The two sets of materials will be referred to as meaningful (Harvard) and anomalous (Haskins) sentences, respectively.

All the content words from the meaningful and anomalous sentences served as targets.² In the third condition, the target words were excised from the intact spoken sentences and presented in isolation. This condition serves as a control for the contribution of any sentence context per se in the word identification process (Miller et al., 1951).

Two properties of the target words were varied orthogonally: first, the amount of acoustic–phonetic information in the waveform, defined by gate duration; and second, the location of that information within the word, defined by gating direction. Stimulus duration was varied in 50-millisecond increments between successive trials. The two levels of gating direction were forward, with increasing amounts of signal, left-to-right, from the word beginning, and backward, with increasing amounts of signal, right-to-left, from the end of the word.

Audio tapes of the original sentences, read by a male speaker, were sampled at 10 kHz, low-pass filtered at 4.8 kHz, digitized, and stored on disk by a PDP-11/34 computer. The beginnings and endings of target words were located with a digital waveform editor. The gated conditions of the target words in each sentence were produced by simply replacing the appropriate number of consecutive 50-millisecond intervals with envelope-shaped noise (Horii, House, & Hughes, 1971). For each digital sample of the waveform, the direction of the amplitude was reversed at random while the absolute value of the amplitude and the RMS energy were preserved. This procedure maintained the amplitude and durational cues of the original speech waveform, while at the same time obliterating the fine spectral information (i.e., formant structure) needed for segmental identification.
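The sign-reversal step described above can be illustrated with a minimal sketch. It assumes the waveform is held as a plain list of sample amplitudes; the function names and the toy waveform are hypothetical and are not taken from the original stimulus-preparation software.

```python
import math
import random


def envelope_shaped_noise(samples, seed=None):
    """Reverse the sign of each sample at random, keeping its magnitude.

    Because every |amplitude| is unchanged, the amplitude envelope and the
    RMS energy of the segment are preserved, while the fine spectral
    structure is scrambled.
    """
    rng = random.Random(seed)
    return [s if rng.random() < 0.5 else -s for s in samples]


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Toy waveform (hypothetical values, for illustration only).
waveform = [0.0, 0.4, 0.9, 0.3, -0.5, -0.8, -0.2, 0.1]
noise = envelope_shaped_noise(waveform, seed=1)
```

Comparing `rms(waveform)` with `rms(noise)` shows why the procedure keeps amplitude and durational cues intact: only the polarity of individual samples changes.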

For each original sentence, two sequences of experimental sentences were produced, one each for the forward and backward gating conditions. In both sets of materials, the first and last trials were identical: On the first trial, all target words were completely replaced by noise, while on the last trial the original intact spoken sentence was presented. The remaining trials contained acoustic–phonetic information increasing in 50-millisecond increments from the beginning (for a forward-gated sentence) or ending (for a backward-gated sentence) of each target word. Figure 1 shows spectrograms of the first, third, fifth, and last trials (from top to bottom) of forward-gated and backward-gated sequences of one of the meaningful sentences used in the experiment. The isolated word conditions were created by excising words from the gated sentences (cf. Pollack & Pickett, 1963). Forward-gated and backward-gated sequences were created separately for each target word in isolation. Each isolated gated word presentation was treated as a trial.
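The construction of a gated sequence for one target word might be sketched as follows. The sketch assumes the 10 kHz sampling rate and 50-millisecond increment reported above; the function names are hypothetical, and drawing fresh noise on every trial is an assumption made for simplicity.

```python
import random

SAMPLE_RATE_HZ = 10_000  # sampling rate reported in the text
GATE_MS = 50             # gate increment reported in the text


def sign_flip_noise(samples, rng):
    """Random sign reversal of each sample (envelope-shaped noise)."""
    return [s if rng.random() < 0.5 else -s for s in samples]


def gated_trials(word, direction, seed=0):
    """Return successive versions of one target word (a list of samples).

    Trial 0 is all noise; each later trial restores one more 50-ms gate of
    the intact waveform, from the word onset ("forward") or from the word
    offset ("backward"); the last trial is the intact word.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    rng = random.Random(seed)
    step = SAMPLE_RATE_HZ * GATE_MS // 1000  # 500 samples per gate
    n = len(word)
    n_gates = -(-n // step)                  # ceiling division
    trials = []
    for k in range(n_gates + 1):
        intact = min(k * step, n)            # samples of intact signal
        noise = sign_flip_noise(word, rng)
        if direction == "forward":
            trials.append(word[:intact] + noise[intact:])
        else:
            trials.append(noise[:n - intact] + word[n - intact:])
    return trials


# A hypothetical 120-ms "word": 1200 samples at 10 kHz.
word = [float(i + 1) for i in range(1200)]
forward = gated_trials(word, "forward")
backward = gated_trials(word, "backward")
```

For this toy word the sequence has four trials per direction, with 0, 50, 100, and 120 milliseconds of intact signal.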

Fig. 1 — Sample speech spectrograms of forward-gated and backward-gated sentences. Increasing signal duration is shown left-to-right (forward gating) and right-to-left (backward gating), respectively, for the target words of one test sentence.

Sixteen experimental tapes were created from the digitally stored stimuli using a 12-bit D/A converter and a Crown 800 Series tape recorder. For each gating direction two counterbalanced random orders of the eight sentence (or word) sequences were generated in each context condition. Each sequence progressed from the stimulus with shortest gate duration to the stimulus with the longest gate duration.

Procedure

Small groups of subjects were tested simultaneously in a sound-treated experimental room. Each group heard one experimental tape at 77 dB SPL peak levels over TDH-39 matched and calibrated earphones. Between 20 and 26 subjects heard each context type for each gating direction.

Subjects were told that they would hear a single sentence or word on each trial, and that each stimulus would be repeated, becoming clearer on each subsequent presentation. Subjects were instructed to write down the word or words they heard after each presentation. Subjects were encouraged to guess if they were not certain of a particular word or words. The experimenter stopped the tape recorder manually via a remote control unit after each trial and continued only when all subjects had finished writing their responses on prepared answer sheets. In the isolated word condition, the tape ran without interruption; the intertrial interval was 3 seconds long and cue tones were used to signal the start of a new stimulus sequence.

In the sentence conditions, response sheets contained the function words in their correct locations along with separate blank spaces for the content words in each experimental sentence. Subjects, therefore, had some access to top-down knowledge provided by the information on the answer sheets, for example, the possible form class of words following function words and the number of words in the sentences. Subjects were required to respond to each word after each stimulus presentation with either a word or an “X” if they could not identify a word.

Results and Discussion

Two types of dependent measures were obtained. First, we computed the “identification point” for all target words. This was defined as the duration of the signal present on the trial during which the word was first correctly identified and then continued to be identified correctly thereafter by a subject.³ Second, we analyzed subjects’ incorrect word responses in order to examine the response distribution of word candidates that was generated in a given gating condition before a given target word was identified correctly. We considered these response distributions to be an empirical measure of the structure of the underlying cohort (cf. Grosjean, 1980). That is, we assumed that the incorrect word responses before a word was identified would reflect the pool of potential word candidates hypothesized during on-line word identification.
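As an illustration of the identification-point definition, the following sketch scores one subject's responses to one target word. It assumes responses are compared with the target by exact string match and that gate durations are listed per trial in milliseconds; the function name and the example data are hypothetical.

```python
def identification_point(responses, target, gate_ms):
    """Gate duration at which the target is first identified correctly and
    stays correct on every later trial; None if the final response is wrong.
    """
    if len(responses) != len(gate_ms):
        raise ValueError("one gate duration is needed per response")
    point = None
    # Walk backward from the last trial while responses remain correct.
    for response, duration in zip(reversed(responses), reversed(gate_ms)):
        if response != target:
            break
        point = duration
    return point


# Hypothetical protocol: an early correct response that is later abandoned
# does not count, so the identification point here is the 200-ms trial.
responses = ["X", "cap", "cat", "cot", "cat", "cat"]
gates = [0, 50, 100, 150, 200, 250]
point = identification_point(responses, "cat", gates)
```

Scanning from the final trial backward enforces the requirement that the word continue to be identified correctly thereafter.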

Initially, the data from the Harvard and Haskins material sets were analyzed separately for gating-direction effects and then planned comparisons between the material sets were carried out. The data for the words in each sentence position were pooled across the eight test sentences. This was done to test for serial-order effects, that is, the possibility that recognizing words early in the sentence might influence the identification of words occurring later in the same sentence (Cole & Jakimik, 1980). The results for the analyses of identification points and the incorrect word responses are examined separately below.

Identification Point Results

The identification point data are shown in Figure 2 for words in meaningful and anomalous sentences (and their isolated controls) in the left- and right-hand panels, respectively. In each panel, identification points are shown for words in each sentence position for both gating directions; mean identification points averaged over all sentence positions are shown in the far right of each panel. The means of the measured physical durations of words at each sentence position are also included as a baseline for comparison.

Fig. 2 — Identification points for words in meaningful and semantically anomalous sentences expressed as milliseconds of signal duration in each sentence position in Experiment 1. Forward-gated and backward-gated words (triangles and squares) are shown for each sentence context and in isolation (filled and open symbols). The measured physical durations of the target words at each sentence position are marked with X’s and dotted lines.

The raw identification points were converted into proportions of the mean word duration in each sentence position to compensate for differences in duration as a function of syntactic structure.⁴ Statistical analyses were then carried out on arcsin transformations of the proportions. Analyses of variance by subject and treatment were performed. Gating direction and sentence position (or subjects) were treated as fixed and random factors, respectively, and min F′ statistics were calculated (Clark, 1973). Unless otherwise stated, all significance levels are less than p = .01.
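The transformation and the min F′ statistic can be sketched as below. The text does not state which arcsine variant was used, so the common form asin(√p) is assumed here, and clipping proportions to [0, 1] is likewise an assumption. The min F′ formulas follow Clark (1973); the function names are hypothetical.

```python
import math


def proportion_of_duration(identification_ms, mean_word_ms):
    """Identification point as a proportion of the mean word duration."""
    return identification_ms / mean_word_ms


def arcsine_transform(p):
    """Arcsine-square-root transform of a proportion (assumed variant).

    Proportions are clipped to [0, 1] (an assumption), since asin is only
    defined on that range after the square root.
    """
    p = min(max(p, 0.0), 1.0)
    return math.asin(math.sqrt(p))


def min_f_prime(f1, f2, df_err1, df_err2):
    """min F' and its denominator degrees of freedom (Clark, 1973).

    f1, f2: F ratios from the subject-wise and treatment-wise analyses.
    df_err1, df_err2: their respective error degrees of freedom.
    """
    value = (f1 * f2) / (f1 + f2)
    df_den = (f1 + f2) ** 2 / (f1 ** 2 / df_err2 + f2 ** 2 / df_err1)
    return value, df_den


# Hypothetical inputs, for illustration only.
p = proportion_of_duration(100.0, 400.0)
transformed = arcsine_transform(p)
stat, df = min_f_prime(10.0, 10.0, 20, 20)
```

Because min F′ can never exceed the smaller of the two F ratios, it provides a conservative test that treats both subjects and items as random.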

Four major findings were obtained for words in the meaningful sentence contexts. First, backward-gated words required greater signal duration for identification than forward-gated words. This was true for words in a sentence context, min F′(1,78) = 6.56, and for their isolated controls, min F′(1,90) = 7.30, respectively. Second, word identification required more signal duration in isolation than in meaningful sentences, t(78) = 6.22: The mean difference was 96 milliseconds. Third, the presence of normal sentence context decreased the listeners’ dependence on the acoustic–phonetic information in the signal compared to isolated word identification. The amount of signal duration required to identify the isolated words excised from the meaningful sentences correlated positively with their measured durations, r(38) = .89. No such relationship was observed for the identification points and durations of the same words in the meaningful sentence context, r(38) = .24, p > .10. Fourth, no main effect of sentence position was observed, F(4,44) = 1.67, p > .1. This result suggests that the words that occurred earlier in a sentence apparently conveyed no predictive information that enabled listeners to identify the following words at shorter gate durations.

The identification point data for words from the anomalous (Haskins) materials can be summarized as follows: First, forward-gated words in isolation required less signal for identification than backward-gated words, min F′(1,80) = 13.84, but this was not true in the anomalous context, min F′(1,56) = 1.26, p > .10. No main effect of anomalous context compared to the isolated control condition was observed. Second, the presence of anomalous sentence context did not reduce the listeners’ dependence on the acoustic–phonetic information in the signal: Identification points correlated positively with the measured word duration for both words in anomalous contexts and their isolated controls, r(30) = .92, and r(30) = .76, respectively. Third, no sentence position effect was found in the presence of anomalous contexts: Neither the subject-wise nor the treatment-wise test was significant (F_s (3,129) < 1.0, p > .44 and F_t (3,29) = 1.43, p > .25).⁵ It appears that the anomalous sentence context somehow inhibits the normal reliance on acoustic–phonetic information for word identification.

The identification data for words in both meaningful and anomalous sentences are shown in Figure 3 as proportions of the mean word durations in each sentence position. No differences were observed for the isolated words: Words excised from the Harvard and Haskins sentences were identified with .83 and .81 of the mean word duration, respectively. However, in the anomalous sentence context, .72 of the mean word duration was needed for identification, while only .56 was necessary to identify the words in the meaningful sentences, t(82) = 3.31. Of the four context conditions, only in the anomalous sentence context was no advantage for word-initial acoustic–phonetic information over word-final acoustic–phonetic information observed. This result suggests that an anomalous sentence context prevents the listener from using word-initial information in the same way as it is used in normal sentence environments. In this case, lexical access processes are misdirected and listeners require more signal duration to identify words.

Fig. 3 — Identification points for words in meaningful and semantically anomalous sentences expressed as proportions of the actual measured word duration for each sentence position in Experiment 1.

Analysis of the Response Distributions

We were also interested in the distribution and structural organization of the incorrect word candidates generated by listeners before they correctly identified the target words in the different context conditions. The number of different incorrect word responses proposed by at least one subject was examined as a measure of response output. Analyses of variance with gating direction and sentence position as factors and sentences as repeated measures were performed on these output measures.
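The response-output measure can be illustrated with a short sketch that pools protocols across subjects for one target word. It assumes "X" marks a blank response, as in the answer sheets, and that responses are compared after lowercasing and trimming; the function name and data are hypothetical.

```python
def distinct_incorrect_responses(responses_by_subject, target, blank="X"):
    """Set of different incorrect words proposed by at least one subject.

    Blank responses and correct identifications are excluded.
    """
    target_norm = target.strip().lower()
    blank_norm = blank.strip().lower()
    candidates = set()
    for responses in responses_by_subject:
        for response in responses:
            word = response.strip().lower()
            if word and word != blank_norm and word != target_norm:
                candidates.add(word)
    return candidates


# Hypothetical protocols from three subjects for the target "cat".
protocols = [
    ["X", "cap", "cat"],
    ["cot", "cap", "cat"],
    ["X", "X", "cat"],
]
candidates = distinct_incorrect_responses(protocols, "cat")
```

The size of the returned set corresponds to the count of different incorrect word responses for that target; a word proposed by several subjects is counted once.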

The mean number of different word responses in each sentence position in the meaningful and anomalous sentences (excluding correct identification responses) is shown in Figure 4. For the words in meaningful sentences, a gating-direction effect was found, F(1,14) = 4.94, p < .05: Word-final acoustic–phonetic information yielded more word candidate responses than word-initial acoustic–phonetic information. This was not observed for words in the anomalous sentences, F(1,14) < 1.0, p > .56. Sentence position effects were found for both context types, F(4,56) = 5.75 and F(3,42) = 8.68. However, the nature of these position effects differs substantially. In the meaningful sentences, fewer incorrect responses were proposed for words that occurred later in a sentence than for words that occurred earlier in the sentence. This is shown in the decreasing slope of the two curves in the left-hand panel of Figure 4. In contrast, in the anomalous sentences, more word candidates were generated for words in the second and fourth sentence positions than in the first and third positions. These data parallel the variations in the identification point data for the anomalous context condition (see Figure 2 and footnote 5). In fact, a significant correlation was observed between identification points and the number of incorrect word responses for the anomalous context, r(30) = .96. (This finding was not observed for the identification points and number of incorrect word candidates in the meaningful context, r(38) = .26, p > .10.)

Fig. 4 — Number of incorrect word responses generated for forward-gated and backward-gated words (triangles and squares) in each sentence position for the meaningful and semantically anomalous sentences, based on data from 42 and 45 subjects, respectively, in Experiment 1.

Finally, more incorrect lexical candidates were proposed in the semantically anomalous sentences than in the meaningful sentences, t(70) = 5.41. This result occurred despite the fact that the sentence frame was always fixed for the anomalous sentences, which we expected to aid identification. Analyses of the number of word candidates support our interpretation that the presence of an anomalous context interferes with the normal integration of acoustic–phonetic and contextual information that leads to word identification in sentences.

To examine the structure of the word candidate responses in greater detail, analyses of the hypothesized sources of knowledge underlying the proposed word candidates were carried out for each subject’s response protocol. Each incorrect word response was categorized as originating from one of three possible sources: (1) acoustic–phonetic analysis of the signal; (2) “syntactic” contextual information; and (3) “other” sources (nonwords, words from an inappropriate form class, or intrusions). Incorrect word candidates were classified as belonging to only one of these three categories. In this scoring procedure, preference was given to acoustic–phonetic information as a knowledge source, so that the remaining two categories contained no candidates that were phonetically similar to the target word.⁶ Thus, we selected a conservative measure of the nonsensory contributions to the set of words hypothesized before a word was identified correctly. Within the “acoustic” category, our scoring procedure prevented discrimination of contextually appropriate and inappropriate word candidates.

We assumed that in the meaningful sentences, normal pragmatic, semantic, and syntactic constraints were used by listeners to identify the target words. In contrast, we assumed that no normal pragmatic or semantic relations could be derived from the anomalous sentences. In fact, whatever semantic cues might be generated would most likely be incompatible with the normal syntactic cues in this condition. The criterion we adopted for scoring membership in the “syntactic” knowledge category was based solely on appropriate form class (in the absence of correct acoustic–phonetic information) for both meaningful and semantically anomalous sentences. We note that semantic and syntactic sources of knowledge were not differentiated within the “syntactic” knowledge category. Although this knowledge source could not be used in the isolated word identification task, response distributions in those conditions were nevertheless scored for this category also, simply as a baseline control measure.

Finally, the “other” category of incorrect responses contained primarily response intrusions from other sentences or from other serial positions in the same test sentence. Also included in this category were phonemically dissimilar nonwords and words from inappropriate form classes. Since the “other” word responses constituted less than 4% of the total incorrect responses, they were omitted from statistical analyses performed on the response distributions.
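The priority ordering of the three categories can be sketched as a simple decision rule. The two predicates are passed in as parameters because the actual criteria for phonetic similarity and form class are not reproduced here; both predicates, the function name, and the toy examples are hypothetical. Blank responses are assumed to be tallied separately.

```python
def classify_incorrect_response(response, target, sounds_like, same_form_class):
    """Assign an incorrect word response to exactly one knowledge source.

    Acoustic-phonetic similarity is checked first, so the "syntactic" and
    "other" categories never contain phonetically similar candidates.
    """
    if sounds_like(response, target):
        return "acoustic"
    if same_form_class(response, target):
        return "syntactic"
    return "other"


# Toy predicates (assumptions for illustration, not the scoring criteria).
def shares_first_letter(response, target):
    return response[:1] == target[:1]


TOY_NOUNS = {"cat", "cap", "dog"}


def both_nouns(response, target):
    return response in TOY_NOUNS and target in TOY_NOUNS


labels = [
    classify_incorrect_response(w, "cat", shares_first_letter, both_nouns)
    for w in ["cap", "dog", "ran"]
]
```

Here "cap" is scored as acoustic even though it also matches the target's form class, which mirrors the conservative treatment of nonsensory contributions.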

Table 1 shows the distribution of incorrect responses over all sentence positions and gate durations. Data from the three categories of incorrect word candidates and from blank responses are given as percentages of the total number of incorrect responses. In all conditions, the majority of incorrect responses were blanks. However, our treatment addresses the distributions of acoustic and syntactic word candidates, since they can be related to underlying sources of knowledge available to the listener, whereas such a relation is difficult to establish for other kinds of erroneous responses.

TABLE 1.

Incorrect Response Distributions in Experiment 1 as Percentages of Total Errors

Response	Meaningful context		Isolated control	
	Forward	Backward	Forward	Backward
Blank	83.2	82.2	88.6	87.0
Acoustic–phonetic	8.5	8.5	7.9	9.4
Syntactic	4.9	7.1	2.0	2.0
Other	3.4	1.3	1.5	1.6
Response	Anomalous context		Isolated control	
	Forward	Backward	Forward	Backward
Blank	85.4	86.9	87.5	88.5
Acoustic–phonetic	10.8	8.7	9.6	9.1
Syntactic	2.7	3.2	1.2	1.1
Other	1.1	1.2	1.7	1.3


Two major results were observed in the analyses of the response distributions. First, more word candidates were based on acoustic than on syntactic information. This was true for Harvard words both in meaningful contexts, F(1,28) = 9.83, and in isolation, F(1,28) = 286.68, and for Haskins words both in their anomalous contexts, F(1,28) = 225.59, and in isolation, F(1,28) = 428.11, respectively. Second, the role of nonacoustic, top-down knowledge sources in the response distribution increased in both meaningful and anomalous contexts compared to the isolated word conditions: For both the Harvard and Haskins words, more syntactically based word responses were made in the sentence context than in isolation, F(1,28) = 5.75, p < .02 and F(1,28) = 13.63, respectively. This result suggests that there are no syntactic constraints operating in the isolated control conditions. The observed facilitation from the anomalous context suggests that subjects may have taken advantage of the sentence frame similarity between all the Haskins sentences. Thus, semantically anomalous sentence context appears to inhibit normal efficient use of knowledge sources to identify words correctly, but in the process, the context causes more incorrect candidates, particularly from nonacoustic sources, to be considered. When incorrect word candidates were generated, they were based, in large part, on the acoustic–phonetic information present in the speech signal. However, the results for the syntactically based candidates stretch the spirit of “bottom-up priority,” whereby the set of possible word candidates is initially defined exclusively by sensory input (Tyler & Marslen-Wilson, 1982b).

When the data for the words in meaningful sentences and in isolation are compared, the effects of the meaningful sentence context are readily apparent. The relative contribution of acoustic information did not differ in isolation and in meaningful context, F(1,28) = 1.18, p > .28. In contrast, more syntactically based word candidates were generated in the sentence context than in isolation, F(1,28) = 5.75, p < .02. A significant Gating-Direction × Sentence Context interaction was observed for the syntactically based responses, F(1,28) = 7.77. When only word-final acoustic–phonetic information was present in the meaningful sentences, the number of responses based on correct syntactic knowledge increased.

It appears that the presence of meaningful sentence contexts operates to expand the role of syntactic information: In particular, the presence of meaningful context co-occurring with the absence of word-initial acoustic–phonetic information appears to increase the reliance on syntactic contextual information in hypothesizing word candidates. Thus, subjects are able to identify words from word-final information in sentences, but they do this in ways that are qualitatively and quantitatively different from identification based on word-initial information.

Our results from identification points and response distributions support three broad claims. First, the acoustic–phonetic information contained in the beginnings of words is more useful to listeners than the information contained in the ends of words. However, word identification is possible without information from the beginnings of words. This result calls into question the obligatory status given to word-initial information in a number of recent accounts of spoken word identification (Cole & Jakimik, 1980; Forster, 1976, 1979; Marslen-Wilson & Welsh, 1978; Tyler & Marslen-Wilson, 1982a, 1982b). Second, meaningful sentence contexts support faster, more efficient, and qualitatively different identification processes than semantically anomalous sentence contexts or the presentation of words in isolation. Third, we have demonstrated a reliable relationship between the structure of the set of incorrect word responses and the final product of the word identification process. When less dependence on acoustic–phonetic information exists, as in most normal sentence processing situations, identification occurs faster and more accurately. In our study, listeners identified forward-gated words in meaningful sentences with the shortest gates and the fewest incorrect lexical responses. Thus, both accuracy and temporal measures of perceptual processing allowed by our gating procedure support the claim that spoken word identification in sentences is most efficient when word-initial information and normal sentential constraints are present.

Experiment 2

The aim of Experiment 2 was threefold: first, to replicate the effects observed in the first study; second, to study the changes in the distribution of word candidates over increasing signal durations; and third, to investigate the effects specific to the successive presentation procedure used in Experiment 1. In this experiment, we made one additional assumption, namely, that the sources of knowledge used in lexical access and word identification would be differentially informative at various points in the time course of the word identification process. Thus, we expected some variation in the balance of lexical responses based on different knowledge sources as a function of the time course of the word identification process. These changes should be reflected in the set of word candidates generated at different gate durations. These assumptions yield several predictions that can be used to test the autonomous character of lexical access processes as they have been specified in cohort theory (Marslen-Wilson & Welsh, 1978; Tyler & Marslen-Wilson, 1982b). According to the cohort theory, the original set of word candidates is activated solely by the acoustic–phonetic information in approximately the first 175–200 milliseconds of the speech signal at the beginnings of words. By this view, decreases in the set of hypothesized words during the time course of word identification are seen as the consequence of both the sensory input and the top-down syntactic and semantic knowledge available from the context. A word is identified or recognized when all but one lexical candidate are deactivated by the interaction of these two knowledge sources (see Grosjean, 1980; Tyler & Marslen-Wilson, 1982b).
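The cohort account summarized above can be made concrete with a toy sketch. It is not the model as specified by Marslen-Wilson and colleagues: letters stand in for acoustic–phonetic segments, `onset_len` is a hypothetical stand-in for the initial 175–200 ms of signal, and `context_ok` is an assumed predicate representing top-down syntactic and semantic constraints. The key property is that context can only remove candidates from the cohort; it never adds them.

```python
def initial_cohort(lexicon, onset):
    """Candidates activated solely by the word-initial input."""
    return {w for w in lexicon if w.startswith(onset)}


def narrow(cohort, heard_so_far, context_ok):
    """Deactivate candidates that mismatch the input or the context."""
    return {w for w in cohort if w.startswith(heard_so_far) and context_ok(w)}


def identify(lexicon, word_input, context_ok, onset_len=2):
    """Return (word, segments_needed), or (None, len(word_input)) if unresolved."""
    cohort = initial_cohort(lexicon, word_input[:onset_len])
    for n in range(onset_len, len(word_input) + 1):
        cohort = narrow(cohort, word_input[:n], context_ok)
        if len(cohort) == 1:
            # Identification: all but one candidate have been deactivated.
            return next(iter(cohort)), n
    return None, len(word_input)


# Toy lexicon and context, invented purely for illustration.
lexicon = {"cap", "captain", "capital", "cat"}
no_context = lambda w: True
context = lambda w: w in {"captain", "cat"}

print(identify(lexicon, "captain", no_context))  # ('captain', 4)
print(identify(lexicon, "captain", context))     # ('captain', 3)
```

In this toy case the contextual constraint allows identification one segment earlier, but every candidate ever considered was licensed by the word-initial input. A candidate that appears without such acoustic support has no source in this scheme.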

In the present experiment, we predicted that at very short gate durations, when minimal acoustic–phonetic segmental information is available in the speech signal, a larger proportion of word candidates should be based on other knowledge sources than at longer gate durations. By this prediction, interactive processes can provide input to the set of lexical candidates generated before a word is consciously identified. If, on the other hand, semantic and syntactic knowledge can only be used to eliminate incorrect (acoustic–phonetic based) word candidates, as Marslen-Wilson and his colleagues claim, then any nonacoustic syntactic word candidates occurring in the response distribution should simply represent random noise. To this end, we examined the time course or growth of word candidates hypothesized by listeners in terms of changes in the distribution of word candidates over successive gate durations.
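The quantity at stake in this prediction can be sketched as follows. This is a minimal illustration, assuming each incorrect response has already been scored into one of the categories used in Experiment 1 and tagged with its gate duration; the function name, the category labels, and the `(gate_ms, category)` layout are all hypothetical. It computes, for each gate duration, the share of lexical candidates (acoustic plus syntactic) that were syntactically based.

```python
from collections import defaultdict


def syntactic_share_by_gate(scored):
    """Map each gate duration to the proportion of lexical candidates
    (acoustic + syntactic) that were syntactically based.

    `scored` is an iterable of (gate_ms, category) pairs; blanks and
    "other" responses are ignored because they are not lexical candidates
    tied to a knowledge source.
    """
    totals = defaultdict(int)
    syntactic = defaultdict(int)
    for gate_ms, category in scored:
        if category in ("acoustic", "syntactic"):
            totals[gate_ms] += 1
            if category == "syntactic":
                syntactic[gate_ms] += 1
    return {g: syntactic[g] / totals[g] for g in sorted(totals)}
```

Under the interactive account, this share should be larger at the shortest gates than at longer ones; under a strictly bottom-up account, it should show no systematic change with gate duration.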

Several methodological questions were also addressed in the present experiment. First, the procedure of successive gated sentence trials (used in Experiment 1 and by Grosjean, 1980), each with the presentation of greater and greater signal durations, may have influenced subjects’ word identification responses artifactually. Repeated presentations of the same signal with short gate durations may have led to facilitation in terms of the amount of signal duration required for word identification. On the other hand, subjects may have developed specialized response strategies during successive presentations of a test sentence. Having available their responses from earlier trials on their answer sheets may have influenced subjects’ subsequent responses on later presentations of the same test signal. Conceivably, subjects may have been reluctant to change some word candidates, even when additional acoustic–phonetic information was present. To determine the validity of the procedure used in Experiment 1, several procedural and design changes were made. First, in a within-subject design, each subject heard each test sentence only once and every subject heard both meaningful and anomalous sentence contexts mixed in both forward and backward gating conditions. Second, we were interested in the effects of the syntactic information contained in the printed sentence frames on subject answer sheets in Experiment 1. The data from the sentence context and isolated word conditions in Experiment 1 suggest that syntactic knowledge plays only a minimal role in spoken word identification processes. Subjects may have used general linguistic knowledge, as opposed to specific contextual knowledge gained from a bottom-up parsing analysis of each stimulus input (Garrett, 1978). 
If this was indeed the case, then the function word sentence frames, for example, “The _____ _____ _____ in the _____,” would not be instrumental in providing subjects with syntactic information specific to each test sentence in Experiment 1. In the present study, therefore, subjects had no visual information about the semantic or syntactic structure of the sentences: Subjects simply wrote down whatever words they heard after each sentence presentation.