Author manuscript; available in PMC: 2011 Dec 1.
Published in final edited form as: Mem Cognit. 2010 Dec;38(8):1137–1146. doi: 10.3758/MC.38.8.1137

Repetition is easy: Why repeated referents have reduced prominence

Tuan Q. Lam, Duane G. Watson
PMCID: PMC3057424  NIHMSID: NIHMS277006  PMID: 21156876

Abstract

The repetition and predictability of a word in a conversation are two factors that are believed to affect whether or not it is emphasized: predictable, repeated words are less acoustically prominent than unpredictable, new words. However, because predictability and repetition are correlated, it is unclear whether speakers lengthen unpredictable words to facilitate comprehension or whether this lengthening is the result of difficulties in accessing a new (non-repeated) lexical item. In this paper, we investigate the relationship between acoustic prominence, repetition, and predictability in a description task. In Experiment 1, we find that repeated referents are produced with reduced prominence, even when these referents are unexpected. In Experiment 2, we find that predictability and repetition both have independent effects on duration and intensity. However, word duration was primarily determined by repetition, and intensity was primarily determined by predictability. The data are most consistent with an account in which multiple cognitive factors influence the acoustic prominence of a word.


In conversation, certain words “stand out” more than others. Acoustically, these words are prominent because they are produced with greater intensity, a higher fundamental frequency (F0), or longer duration than expected. Traditionally, prosodic prominence has been described in two different ways. In one tradition, prominence is defined as a linguistic construct called a pitch accent, which occurs on words that are new or focused. Pitch accents are typically marked with a change in F0, and different types of pitch accents play different roles in the discourse (e.g. Pierrehumbert, 1980; Pierrehumbert & Hirschberg, 1990). Prominence has also been described in terms of its acoustic-phonetic form: prominence correlates with increases in F0, duration, intensity, and intelligibility (Bard et al., 2000; Bell et al., 2003; Bell, Brenier, Gregory, Girand, & Jurafsky, 2009; Fowler & Housum, 1987; Jurafsky, Bell, Gregory, & Raymond, 2001; Watson, Arnold, & Tanenhaus, 2007). These different correlates of prominence often covary; however, they do not perfectly co-occur (e.g. Bard et al., 2000). Within this tradition, variation in acoustic form has typically been discussed in terms of the predictability of the word or syllable, or in terms of the psychological factors involved in producing or understanding the word. The current study investigates which factors drive apparent effects of predictability and discourse status on acoustic prominence. In particular, it examines whether effects of redundancy on acoustic prominence arise because speakers alter their speech to maintain a uniform information profile, or because of retrieval and preparation processes in language production.

A number of factors have been shown to correlate with prominence, such as repetition (Fowler & Housum, 1987; Bard & Aylett, 1999; Bard et al., 2000; Aylett & Turk, 2004; Pluymaekers, Ernestus, & Baayen, 2005a; Bell et al., 2009), frequency (Gregory, Raymond, Bell, Fosler-Lussier, & Jurafsky, 1999; Fosler-Lussier & Morgan, 1999; Jurafsky et al., 2001; Pluymaekers, Ernestus, & Baayen, 2005b), and transitional probability (Bell et al., 2009; Jurafsky et al., 2001; Kidd & Jaeger, 2008). For example, Fowler and Housum (1987) found that previously mentioned words in a corpus of recorded speech are shorter and less intelligible than words that have not been previously mentioned. Similarly, in recorded speech generated from a referential communication task, Bard and Aylett (1999) found that repeated words are less intelligible to listeners than non-repeated words, and other work has shown that listeners interpret prominence as a cue to new information in on-line sentence processing (e.g. Dahan, Tanenhaus, & Chambers, 2002). Lexical frequency is also linked with prominence (Zipf, 1929). High-frequency words are produced with shorter durations than low-frequency words (Gregory et al., 1999; Fosler-Lussier & Morgan, 1999; Jurafsky et al., 2001; Pluymaekers, Ernestus, & Baayen, 2005b). Lexical frequency also affects affix duration: affixes attached to infrequent words are longer than affixes attached to more frequent words (Pluymaekers, Ernestus, & Baayen, 2005b). Finally, transitional probability can also affect prominence (Gregory et al., 1999; Jurafsky et al., 2001; Pluymaekers, Ernestus, & Baayen, 2005a; Kidd & Jaeger, 2008). When the transitional probability of a word is high, the acoustic realization of the word is reduced compared to when the transitional probability is low (Jurafsky et al., 2001).

Theories of acoustic prominence

While it is clear that repetition, frequency, transitional probability, and predictability in general all affect prosodic prominence, it is less clear why these effects exist. One proposal is that prominence differences are the result of speakers optimizing the acoustic signal for comprehension (Aylett & Turk, 2004; Frank & Jaeger, 2008; Lieberman, 1963; Fowler & Housum, 1987). If words are new, infrequent, and less predictable, then they may be difficult to identify in running speech. As a result, speakers may articulate these words more clearly to facilitate processing by the listener. Repeated, frequent, and predictable words are readily identifiable, so there is less need to articulate these words carefully (Fowler & Housum, 1987).

More recently, some accounts of prominence have appealed to information theoretic principles to explain differences in prominence across words. One example is Aylett & Turk’s (2004) Smooth Signal Redundancy Hypothesis. According to this account, effects of repetition, frequency and predictability on duration can be reduced to one thing: language redundancy (Aylett & Turk, 2004). Language redundancy is the predictability of a syllable, word, or syntactic structure in a linguistic context. It is modulated by a number of factors including lexical frequency, syntax, and pragmatics. In this framework, effects of repetition can also be linked to redundancy. Because speakers are more likely to refer to previously mentioned referents than new referents (Arnold, 1998), the repeated mention of a word is more predictable than the mention of a new word.

According to the Smooth Signal Redundancy Hypothesis, speakers attempt to produce a signal in which the amount of redundancy remains relatively constant throughout production. Aylett & Turk (2004) propose that prosodic prominence’s primary role is to smooth the information profile of a word. This is accomplished by reduction of syllable duration when a word is redundant: expected words are produced with shorter durations and unexpected words are produced with longer durations so that the amount of information conveyed is evenly distributed over time. Thus, the Smooth Signal Redundancy Hypothesis can explain the effects of repetition, frequency, and contextual predictability on prominence reduction.
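The quantity at the center of information theoretic accounts is how unexpected a word is in context, usually expressed as surprisal, -log2 p(word | context), in bits. The sketch below is only an illustration, not Aylett & Turk's (2004) model: it assumes a hypothetical linear mapping in which each word receives a base duration plus a fixed number of milliseconds per bit of surprisal, so redundant (high-probability) words come out shorter than unexpected ones. The parameter names and values (`base_ms`, `ms_per_bit`) are made up for the example.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information content, in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def predicted_duration_ms(p: float, base_ms: float = 250.0, ms_per_bit: float = 30.0) -> float:
    """Hypothetical duration: a fixed base plus extra time per bit of surprisal.

    base_ms and ms_per_bit are illustrative assumptions, not fitted values.
    Lower-probability (less redundant) words receive more time, which spreads
    information more evenly over the signal.
    """
    return base_ms + ms_per_bit * surprisal_bits(p)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(f"p={p:.1f}  surprisal={surprisal_bits(p):.2f} bits  "
              f"duration={predicted_duration_ms(p):.0f} ms")
```

With these made-up parameters, a word with probability 0.5 (1 bit) is assigned 280 ms, and lower-probability words are assigned progressively longer durations.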

While information theoretic accounts argue that reduction is for the benefit of the listener, this does not entail that speakers explicitly model individual listeners, with specific knowledge linked to a particular listener. Rather, speakers may be modeling a generic listener (Isaacs & Clark, 1987) who they expect will have a certain set of beliefs or expectations. Taken to the extreme, speakers may be modeling predictability for any listener in general (Brown & Dell, 1987). Proponents of this approach argue that it is a model of communication at the computational level, rather than the algorithmic level, in a Marr-like (1982) framework, and they are agnostic as to how it is implemented psychologically (e.g. Aylett & Turk, 2003; Frank & Jaeger, 2004; Jaeger, in press). Critically, whether a specific listener or a more generic listener is being modeled, the approach at its core depends on smoothing the signal for a listener.

An alternative view is that the link between predictability and acoustic prominence is the result of speaker-internal production processes, and not a means by which the speaker facilitates processing for the listener (Bard et al., 2000; Bell et al., 2009). Most researchers agree that speech production is a multi-step process beginning with message formulation, followed by grammatical encoding, and finishing with phonological encoding (e.g. Bock & Levelt, 1994). During message formulation, speakers formulate the semantic content of what they are going to say. During grammatical encoding, speakers select the lexical items that convey their message, compute word order, and insert lexical items (like function words) required by the grammar of their language. After grammatical encoding, speakers must encode the linguistic material into a phonological representation.

Proponents of a speaker internal account of prominence argue that prominence is linked to the amount of activation associated with a word in lexical retrieval (e.g. Bell et al., 2009). The speed at which lexical items are retrieved in the course of language production is regulated by factors like word frequency, repetition, and contextual predictability: frequent, repeated, and predictable words are retrieved more quickly than infrequent, non-repeated, and less predictable words (Griffin & Bock, 1998; Jescheniak & Levelt, 1994). Bell et al. (2009) propose that the speed of lexical retrieval is linked to articulatory planning, such that words that are retrieved quickly are articulated more quickly while words that are retrieved slowly are produced slowly. Bell et al. argue that this coordination between speed of lexical retrieval and articulatory planning is a strategy used by the production system to maximize fluent speech.

Thus, under a lexical retrieval account, repeated, predictable, and frequent words are reduced because they are retrieved quickly, while non-repeated, unpredictable, and infrequent words are more prominent because they are retrieved more slowly. For example, when an entity is new to the discourse, it will typically be less expected (Arnold, 1998). As a result, the lexical item representing that entity will have very little activation, and retrieval will be slower than if the word had already been highly activated (Wingfield, 1968; Jescheniak & Levelt, 1994). This should result in longer articulation of the word. By contrast, a given (previously mentioned) entity will already have been activated and may be easier to retrieve than when it was first mentioned, either because its activation is maintained or because activation decays slowly (Dell, 1990). Ultimately, this would lead to reduction of the word.
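The retrieval-based chain of reasoning in this paragraph (a mention boosts activation, activation decays slowly, and higher activation means faster retrieval and a shorter word) can be made concrete with a toy model. Everything in it is a hypothetical assumption for illustration, not Bell et al.'s (2009) implementation: producing a word sets its activation to 1.0, activation decays exponentially back toward a resting level with an arbitrary half-life, and word duration grows linearly as activation falls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLexicon:
    """Toy activation model; all parameter values are illustrative assumptions."""
    resting: float = 0.1        # activation of a word that has not been produced
    half_life_s: float = 5.0    # time for the boost above resting to halve
    base_ms: float = 300.0      # duration of a fully activated word
    slowdown_ms: float = 100.0  # extra duration per unit of missing activation
    last_production: dict = field(default_factory=dict)  # word -> time (s) last produced

    def activation(self, word: str, t: float) -> float:
        """Activation of `word` at time t (seconds)."""
        if word not in self.last_production:
            return self.resting
        elapsed = t - self.last_production[word]
        remaining = 0.5 ** (elapsed / self.half_life_s)
        # Decay from full activation (1.0) back toward the resting level.
        return self.resting + (1.0 - self.resting) * remaining

    def produce(self, word: str, t: float) -> float:
        """Produce `word` at time t and return its hypothetical duration in ms."""
        a = self.activation(word, t)
        duration = self.base_ms + self.slowdown_ms * (1.0 - a)
        self.last_production[word] = t  # production fully re-activates the word
        return duration


if __name__ == "__main__":
    lex = ToyLexicon()
    print(lex.produce("axe", 0.0))      # new word: low activation, longer
    print(lex.produce("axe", 2.0))      # repeated 2 s later: still active, shorter
    print(lex.produce("penguin", 2.0))  # another new word: longer again
```

The model has no notion of how expected a word was: repetition alone shortens it, which is the property that separates this account from the information theoretic one.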

Although Bell et al.’s (2009) proposal centers on lexical access as the primary determinant of word duration, in principle, facilitation of processing at other levels of production could also affect word duration. If the linguistic message is easily formulated because it is frequent, because of the referential context, or because it is repeated, one might expect reduction. Similarly, if the phonological form of a word is highly activated, this might also lead to reduction of that form. At the heart of all of these proposals is a desire to maximize fluent speech through feed-forward mechanisms from earlier stages of production to the articulatory planning stage. We return to the locus of these potential effects in the General Discussion.

Finally, a third potential account of acoustic prominence is the multiple source view (Watson, 2010). Rather than assuming that acoustic prominence has a single source, the multiple source view holds that the acoustic realization of a word is the product of many factors, including difficulty in speech production as well as marking information for a listener. These different factors may affect different aspects of the acoustic signal. This approach differs from the information theoretic and lexical access accounts, which make specific predictions about reduction (i.e. the shortening of a word’s duration). Under the multiple source account, a word’s fundamental frequency, duration, and intensity may each be affected by different factors in different ways. Evidence for this view comes from both the production and the comprehension literature. Watson, Arnold, & Tanenhaus (2008) found that difficult moves in games of Tic Tac Toe are produced with longer duration than moves that are easy or predictable, whereas moves that are important to the game, like a winning move or blocking a winning move, are produced with greater intensity than those that are not. In comprehension, listeners weight some acoustic cues more heavily than others in determining linguistic structure, suggesting that these cues might have different underlying sources (Isaacs & Watson, in press; Isaacs & Watson, 2009). For example, listeners use the F0 slope over a word rather than raw duration in detecting prominence that is linked to discourse status (Isaacs & Watson, in press). Isaacs & Watson (2009) found that intensity, and not duration, contributes to meta-linguistic judgments of acoustic prominence, even though both correlate with prominence in production. Thus, acoustic prominence might result both from speaker-internal production mechanisms and from facilitating comprehension, and these factors may affect the acoustic signal in different ways.

While all three accounts predict that predictability, lexical frequency, and repetition will influence acoustic prominence, this paper focuses on repetition. This is because a lexical retrieval account and an information theoretic account can potentially make differing predictions about whether repeated words will be reduced. Under a lexical retrieval account, repetition is critical for reduction. Words that have been produced before are reduced because they are easy to repeat. In contrast, under the information theoretic approach, reduction is driven primarily by how expected a word is. One way to test whether reduction is the result of information theoretic principles or the result of processes related to lexical retrieval and planning is to test whether effects of repetition and predictability are independent. If prominence is the product of smoothing the information profile of a word for the listener, unexpected words should be lengthened, even if they have been repeated. If prominence is partly the result of speaker-internal lexical retrieval processes, speakers should reduce previously mentioned words and lengthen new words, independent of whether the word is expected or not. Residual activation stemming from the previous production should cause reduction, even if the word is unexpected. Lastly, the multiple source account allows for the possibility that both theories may play some role in acoustic prominence, possibly in different ways.

Although we have discussed the acoustic correlates of prominence very generally, it is important to note that the information theoretic account and lexical retrieval account make predictions about duration in particular. Under information theoretic accounts, changing word duration is the means by which speakers alter the word’s information profile. Similarly, under lexical retrieval theories, the difficulty of producing a word affects the word’s length. In the experiments below, differences in duration will be used to adjudicate between these theories although we also measure F0 and intensity to determine whether they too are linked to repetition and predictability. The latter is critical for testing the multiple source account, as it is possible that predictability and repetition might both have effects on acoustic prominence, but they may occur along different acoustic dimensions.

Previous studies of repetition and predictability have relied primarily on corpus data (e.g. Aylett & Turk, 2004; Bell et al., 2009; Jurafsky et al. 2001). However, in natural speech, repetition and predictability are highly correlated: repeated words are more predictable than non-repeated words (Arnold, 1998). Thus, it is difficult to know whether or not effects of repetition and predictability are the result of similar cognitive processes. In Experiments 1a and 1b, we address this question by altering the correlation between predictability and repetition that exists in natural speech. In a picture description task, contexts were created in which repeating a word was unexpected, and producing a new word was expected (Experiment 1a). Contexts were also created in which repeating a word and producing a new word were equally expected, in order to elicit responses for comparison with Experiment 1a (Experiment 1b). If repeated reference causes reduction even when that target is less predictable, then this would provide support for theories that attribute reduction to factors in lexical retrieval. If repeated, less predictable target words are produced with longer duration than non-repeated, predictable words, this would support a redundancy avoidance account: predictable words are reduced to facilitate robust communication for the listener. If both repetition and predictability play a role, this would provide evidence for the multiple source account.

Experiment 1a & 1b

Participants

Sixty-three undergraduate students from the University of Illinois at Urbana-Champaign participated in this experiment to earn credit in a psychology course (32 in Experiment 1a, and 31 in Experiment 1b). All participants were native speakers of American English. Five participants had to be excluded from the analysis. One participant failed to produce the second utterance on repeated trials. Another two participants were excluded because of a recording error. The remaining two participants used pronouns on repeated mention trials, which made it impossible to compare prominence on repeated and non-repeated trials.

Materials

The participant’s task was to describe events on a computer screen to a confederate. On each trial, the same two pictures appeared on the screens of both the participant (the director) and the confederate (the matcher). One of the objects shrank, and then one of the objects flashed.

The stimuli were taken from a set of twelve images from Rossion & Pourtois (2001). These images were a colorized version of images originally created by Snodgrass & Vanderwart (1980). The images were used to generate six pairs of images. Images were paired so as to avoid semantic and phonetic relatedness. The image pairs were presented side by side in the center of the screen (see Figure 1). Each image pair appeared 18 times during the experiment, for a total of 108 trials. Which image appeared on the left or right was counterbalanced such that each image appeared on both sides an equal number of times. Items were randomized within sets of 12 trials such that in each set, each image pair appeared twice to counterbalance the image location.
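The counterbalancing scheme in this paragraph (six pairs, each shown once in each left/right arrangement within every set of 12 trials, for 9 × 12 = 108 trials) can be sketched as follows. The item labels are placeholders, and shuffling with Python's `random` module is an assumption made for illustration; the experiment itself was run in MATLAB.

```python
import random


def build_trial_list(pairs, n_sets=9, seed=None):
    """Return (left_image, right_image) tuples, shuffled within sets of 2 * len(pairs).

    Each set shows every pair once in each left/right arrangement, so across
    all sets every image appears equally often on each side.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_sets):
        block = []
        for a, b in pairs:
            block.append((a, b))
            block.append((b, a))
        rng.shuffle(block)  # randomize order within this set only
        trials.extend(block)
    return trials


if __name__ == "__main__":
    # Hypothetical labels for the six image pairs.
    pairs = [(f"img{i}a", f"img{i}b") for i in range(1, 7)]
    trials = build_trial_list(pairs, seed=1)
    print(len(trials), trials[:3])
```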

Figure 1. An illustration of a typical trial from Experiment 1 in the non-repeated condition. Note: The numbers indicate the order of events. The sun indicates the flashing event.

Repetition of one of the objects in the task and the likelihood of it being mentioned were both manipulated. Repetition was manipulated by varying whether the same object engaged in a shrinking and flashing event. On repeated mention trials, one of the images shrank and then the same image flashed. On non-repeated mention trials, one of the images shrank and then the other image flashed. An example of a trial from the non-repeated condition is shown in Figure 1.

Typical utterances are presented in (1) and (2).

  1. Repeated noun

    The axe is shrinking…The axe is flashing.

  2. Non-repeated noun

    The penguin is shrinking…The axe is flashing.

To manipulate speaker expectations about which object on the screen would flash, a training block was followed by a test block. The training block consisted of 96 trials and the test block consisted of 12 trials. In the training block of Experiment 1a, repeated mention trials were much less predictable than non-repeated mention trials: only six of the 96 training trials were repeated mention trials. The order of the conditions was pseudo-randomly permuted such that no repeated mention trial occurred within five trials of another repeated mention trial, and the first eight trials were all non-repeated mention trials. In the training block of Experiment 1b, repeated mention trials and non-repeated mention trials were equally likely, and the order of the conditions was randomly permuted. The training block was randomized separately for each participant, so every participant saw a different item order during training.
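One way to generate a training order that satisfies the Experiment 1a constraints is sketched below. It assumes that the phrase within five trials means at least five non-repeated trials must separate any two repeated trials (the text does not pin this down), and it places the six repeated trials uniformly at random among all orders that satisfy the constraints, rather than by trial and error. The function and parameter names are hypothetical.

```python
import random


def training_order(n_trials=96, n_repeated=6, n_initial=8, min_between=5, seed=None):
    """Condition labels for a training block with rare, spaced-out repeated trials.

    Constraints (assumed interpretation of the text):
      * the first `n_initial` trials are non-repeated;
      * at least `min_between` non-repeated trials separate any two repeated trials.
    Sampling positions in a compressed range and then inserting the required
    spacing makes every valid arrangement equally likely.
    """
    rng = random.Random(seed)
    free_slots = n_trials - n_initial - min_between * (n_repeated - 1)
    if free_slots < n_repeated:
        raise ValueError("constraints cannot be satisfied")
    picks = sorted(rng.sample(range(free_slots), n_repeated))
    repeated_at = {n_initial + p + min_between * i for i, p in enumerate(picks)}
    return ["repeated" if t in repeated_at else "non-repeated" for t in range(n_trials)]


if __name__ == "__main__":
    order = training_order(seed=7)
    print([i for i, c in enumerate(order) if c == "repeated"])
```

For Experiment 1b, assuming an exact 48/48 split, shuffling 48 repeated and 48 non-repeated labels with no further constraints would correspond to the random permutation described.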

The purpose of the test block was to determine how repetition and the expectations established in the training block affected speakers’ productions. The transition between the training and test blocks was not marked, so participants were unaware of it. During the test block, participants saw each pair of images twice, once in the repeated condition and once in the non-repeated condition. As in the training block, the order of items in the test block was randomized, with the constraint that no pair of images appeared twice in a row. In Experiment 1a, there were 30 unique lists, so that each participant was exposed to a unique order of trials during the test block. The order in which items were presented in the test block was matched across Experiments 1a and 1b.

The experiment was programmed in MATLAB with the Psychophysics Toolbox (version 2.54). Participant utterances were recorded at a sampling rate of 44 kHz.

Procedure

Before the experiment, participants watched a video in which two research assistants completed the task. The video served both to instruct participants on how to complete the task and to prime the construction “the Noun 1 is shrinking…the Noun 2 is flashing.” Participants were then told that they would play the role of the director while the research assistant played the role of the matcher. The director sat at a computer facing away from the matcher’s computer. The pair completed six practice trials before beginning the experiment proper. The practice trials used the same images as the experimental trials, and each pair was used only once. All practice trials were non-repeated mention trials.

At the beginning of each trial, two images appeared. After 500 ms, one of the images shrank. Two seconds after the shrinking event, either the same image or the other image flashed. The participant was instructed to describe each event as soon as he knew what was happening. Meanwhile, the matcher clicked one of four buttons to make her screen match the director’s screen; the buttons corresponded to the shrinking and flashing events for each image. The trial ended when the matcher told the director that she had finished matching the second event. Occasionally, the speaker accidentally misnamed an object. On those trials, the matcher said that she did not have the object, thereby prompting the director to correct his utterance. Otherwise, there was no explicit feedback aside from confirmation that the trial was complete. The pair completed 108 trials, of which the last 12 were recorded and labeled using Praat, a speech analysis program developed by Boersma & Weenink (2005). For the target words in the first and second utterances, we measured mean F0 over the word, the maximum F0 excursion over the word, the minimum F0 over the word, word duration, and mean intensity. We also computed, for each condition, the proportion of the total utterance duration taken up by the target word (hereafter, target proportion).
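Once word and utterance boundaries have been labeled and pitch and intensity tracks extracted (the authors used Praat; the input format assumed here is hypothetical), the per-word measures listed above reduce to simple summaries over analysis frames. The sketch assumes one F0 value per frame, with `None` for unvoiced frames, and averages intensity frames directly in dB. Praat offers other averaging methods for intensity, so values could differ slightly from Praat's own output.

```python
from statistics import mean


def word_measures(f0_hz, intensity_db, word_start_s, word_end_s, utt_start_s, utt_end_s):
    """Summarize one target word.

    f0_hz        -- F0 per frame within the word; None marks unvoiced frames
    intensity_db -- intensity per frame within the word
    Boundaries are in seconds; the utterance span is used for target proportion.
    """
    voiced = [f for f in f0_hz if f is not None]
    if not voiced:
        raise ValueError("word contains no voiced frames")
    word_dur = word_end_s - word_start_s
    return {
        "duration_ms": 1000.0 * word_dur,
        "target_proportion": word_dur / (utt_end_s - utt_start_s),
        "f0_mean_hz": mean(voiced),
        "f0_max_hz": max(voiced),
        "f0_min_hz": min(voiced),
        "intensity_mean_db": mean(intensity_db),
    }


if __name__ == "__main__":
    # Made-up frames for illustration only.
    m = word_measures(
        f0_hz=[None, 150.0, 170.0, 160.0, None],
        intensity_db=[74.0, 79.0, 80.0, 78.0, 72.0],
        word_start_s=0.40, word_end_s=0.80,
        utt_start_s=0.00, utt_end_s=1.00,
    )
    print(m)
```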

Results

The data were analyzed using linear mixed-effects regression models with crossed random effects for subjects and items, fit with the lmer function in the lme4 package in R (Baayen, 2008). Like ANOVA, this method accounts for the variance due to subjects and items; unlike ANOVA, it can model the variance of multiple random factors simultaneously (Baayen, 2008). Likelihood ratio tests were used to compare models with random intercepts only against models that also included random slopes. Random slopes did not significantly improve model fit for any reported model, so only random-intercept models are reported. Reported p-values were obtained from Markov chain Monte Carlo (MCMC) sampling using the languageR package (Baayen, 2008). We compared productions of the second target word across conditions. For trials in the repeated condition, we also compared the first target word in the trial (Noun 1) with the second target word (Noun 2). All predictor variables were contrast coded, and because the design was balanced, the predictors were centered around the mean.
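The authors fit their models with `lmer` in R; the sketch below does not refit anything. It only illustrates, in standard-library Python, two steps named in this paragraph: coding a two-level factor as -0.5/+0.5 (which is centered on zero when the design is balanced), and the likelihood ratio test for nested models, where the statistic 2 × (logLik_full - logLik_reduced) is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters. The chi-square tail is computed with its closed form for integer degrees of freedom, and the log-likelihood values in the usage line are made up.

```python
import math


def contrast_code(levels, reference):
    """Code a two-level factor as -0.5 (reference level) / +0.5 (other level)."""
    return [-0.5 if lv == reference else 0.5 for lv in levels]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series (empty when df == 1).
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * j + 1)
    return p


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (statistic, p-value) for nested models differing by `df` parameters."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    print(contrast_code(["repeated", "non-repeated"] * 2, reference="repeated"))
    # Hypothetical log-likelihoods: intercept-only vs. added correlated random slope.
    print(likelihood_ratio_test(-1000.0, -998.0, df=2))
```

Adding a correlated random slope typically adds two parameters (a variance and a covariance). Because a variance cannot be negative, the chi-square reference distribution tends to be conservative for such tests.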

In Experiment 1a, there were effects of repetition but no effects of predictability. Non-repeated Noun 2’s were produced with greater duration (t=3.421, p<0.001, β=17.0 S.E.=4.96) than repeated Noun 2’s; however, there was no significant difference for intensity (t<1). There were also no significant differences in F0 across conditions for Noun 2’s. The target proportion in the non-repeated condition was also significantly greater than the target proportion in the repeated condition (t= 2.969, p<0.001, β=0.0112 S.E.=0.00377). The means for Noun 2 across repetition conditions are presented in Table 1. In repeated trials, Noun 1 was produced with greater duration (t=3.78, p<0.001, β=20.9 S.E.=5.54) and intensity (t=4.98, p<0.0001, β=1.08 S.E.=0.216) than Noun 2. Noun 1 was also produced with higher maximum F0 (t=2.98, p<0.01, β=13.2 S.E.=4.434), higher minimum F0 (t=4.91, p<0.001, β=18.7 S.E.=3.82) and higher mean F0 (t=6.13, p<0.001, β=14.6 S.E.=2.38) than Noun 2. The means for Noun 1 and Noun 2 in the repeated condition are presented in Table 2.

Table 1.

Experiments 1a and 1b: Noun 2 summary, M (SE)

                     Experiment 1a                 Experiment 1b
Metric               Non-repeated   Repeated       Non-repeated   Repeated
Duration (ms)        393 (13.8)     376 (14.3)     356 (10.3)     339 (10.7)
Target proportion    .424 (.008)    .413 (.008)    .420 (.007)    .406 (.007)
Intensity (dB)       78.7 (1.14)    78.6 (1.08)    77.6 (0.97)    77.0 (1.00)
Average F0 (Hz)      167 (9.02)     165 (9.14)     156 (8.94)     155 (8.19)
F0 Maximum (Hz)      196 (11.3)     194 (11.2)     179 (10.7)     181 (10.7)
F0 Minimum (Hz)      141 (8.07)     140 (8.23)     136 (7.36)     135 (7.19)

Values in parentheses represent standard errors of the means. Note: In Experiment 1a, repeated nouns have low predictability (6.25% of trials) while non-repeated nouns have high predictability (93.75% of trials). In Experiment 1b, repeated and non-repeated nouns are equally likely.

Table 2.

Experiments 1a and 1b: means of Noun 1 and Noun 2 in the repeated condition, M (SE)

                     Experiment 1a                 Experiment 1b
Metric               Noun 1         Noun 2         Noun 1         Noun 2
Duration (ms)        397 (14.0)     376 (13.8)     352 (10.6)     339 (10.7)
Intensity (dB)       79.7 (1.10)    78.6 (1.08)    78.5 (0.953)   77.0 (1.00)
Average F0 (Hz)      180 (10.1)     165 (9.14)     165 (9.60)     155 (8.19)
F0 Maximum (Hz)      207 (12.0)     194 (11.2)     188 (10.7)     181 (10.7)
F0 Minimum (Hz)      159 (8.98)     140 (8.23)     145 (8.25)     135 (7.19)

Values in parentheses represent standard errors of the means. Note: In Experiment 1a, repeated Noun 2s have low predictability (6.25% of trials). In Experiment 1b, repeated Noun 2s have relatively higher predictability (50% of trials).

In Experiment 1b, non-repeated Noun 2’s were produced with greater duration (t=3.62, p<0.001, β=16.9 S.E.=4.68) and intensity (t=2.83, p<0.01, β=0.645 S.E.=0.228) than repeated Noun 2’s. The target proportion was also significantly greater in the non-repeated condition (t=3.89, p<0.0001, β=0.0139 S.E.=0.00356). As in Experiment 1a, there were no significant differences in F0 across conditions for Noun 2. In repeated trials, Noun 1 was produced with greater duration (t=2.758, p<0.01, β=12.7 S.E.=4.62) and intensity (t=5.06, p<0.0001, β=1.52 S.E.=0.231) than Noun 2. Noun 1 was also produced with a higher average F0 (t=4.07, p<0.05, β=10.7 S.E.=2.64) and a higher minimum F0 (t=3.05, p=0.01, β=10.8 S.E.=3.53). Maximum F0 did not differ significantly between Noun 1 and Noun 2.

Discussion

In both experiments, repeated nouns were less prominent than non-repeated nouns, which supports a lexical retrieval account of prosodic prominence. According to such accounts, repeated words should be reduced because they have been previously activated and are therefore easier to retrieve for a subsequent production. The same pattern held for overall utterance duration and for target proportion. Surprisingly, Noun 2’s did not differ in F0 across conditions. In repeated trials, F0 was higher for the first noun than for the second, though this may reflect the pitch declination that typically occurs over a sequence of related utterances. It is possible that the descriptive nature of the task led participants to vary their pitch less than they would have in a more interactive setting.

These results are less consistent with information theoretic approaches. According to information theory based accounts, the predictability manipulation in Experiment 1a should have led to reduced duration of non-repeated, expected nouns and lengthening of repeated, unexpected nouns. In fact, the reverse occurred.

Note, however, that although the duration results support the view that prominence results from lexical retrieval processes in production, the intensity results differed across experiments. A post-hoc test for a condition by experiment interaction on the Noun 2 intensity data yielded a marginally significant interaction (t=1.77, p=0.07, β=0.561 S.E.=0.317). In Experiment 1a, intensity did not differ between the repeated, unexpected condition and the non-repeated, expected condition, but in Experiment 1b, non-repeated Noun 2’s were produced with greater intensity than repeated Noun 2’s. This pattern suggests that the absence of an intensity difference in Experiment 1a may have been due to the predictability manipulation. The marginal interaction thus suggests that predictability and repetition may be two separate factors that influence the production of prominence, a possibility that is most consistent with the multiple source view of prominence outlined in the introduction.
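As a rough descriptive check on this interaction, the difference of differences can be computed directly from the Noun 2 intensity means in Table 1. This uses rounded cell means rather than the mixed model, so it only approximates the reported estimate (β=0.561).

```python
# Noun 2 mean intensity (dB), copied from Table 1 (rounded cell means).
NOUN2_INTENSITY_DB = {
    ("1a", "non-repeated"): 78.7,
    ("1a", "repeated"): 78.6,
    ("1b", "non-repeated"): 77.6,
    ("1b", "repeated"): 77.0,
}


def repetition_effect(experiment: str) -> float:
    """Non-repeated minus repeated Noun 2 intensity within one experiment."""
    return (NOUN2_INTENSITY_DB[(experiment, "non-repeated")]
            - NOUN2_INTENSITY_DB[(experiment, "repeated")])


# Condition-by-experiment interaction as a difference of differences.
interaction_db = repetition_effect("1b") - repetition_effect("1a")

if __name__ == "__main__":
    print(f"1a effect: {repetition_effect('1a'):.1f} dB")
    print(f"1b effect: {repetition_effect('1b'):.1f} dB")
    print(f"interaction: {interaction_db:.1f} dB")
```

The repetition effect on intensity is about 0.1 dB in Experiment 1a and 0.6 dB in Experiment 1b, a difference of about 0.5 dB, similar in size to the model estimate.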

It is possible that predictability and repetition both affected intensity, but that the effect of predictability was not detectable because the two factors were not independently manipulated. Within these experiments, it is difficult to determine whether this occurred, because in Experiment 1a predictability and repetition were negatively correlated: repeated trials were unexpected, and non-repeated trials were expected. We address this issue in Experiment 2.

Experiment 2

The goal of Experiment 2 was to test whether or not predictability and repetition have separate, independent effects on the acoustic realization of a word. A shortcoming of Experiment 1a was that predictability and repetition were negatively correlated, so the effects of one factor might have obscured effects of the other.

In Experiment 2, we altered the task used in Experiment 1 so that predictability and repetition could be independently manipulated. Participants were presented with an array of twelve images. As in Experiment 1, one image shrank and then another image flashed. However, the second event was preceded by a probabilistic cue as to which object would flash. In 92% of trials, a circle appeared around the image that flashed. At the beginning of the experiment, participants were told that the circle would usually but not always indicate which object would flash. This made it possible to independently manipulate whether a given word was repeated across events and whether it was expected by the speaker. An information theoretic account predicts that expected words will be produced with shorter durations than unexpected words. This account also predicts that there should be no differences between repeated and non-repeated words because the reliability of the cue was the same in repeated mention and non-repeated mention trials. In contrast, a lexical retrieval account predicts that repetition, rather than predictability, should affect reduction. This account predicts that repeated words should be produced with less prominence than non-repeated words. Finally, the multiple source account predicts that both repetition and predictability affect prominence and that these effects may be realized in different ways acoustically.
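To see why an information theoretic account predicts an effect of the cue but not of repetition, one can compute the surprisal of the flashing object given the circle. The sketch takes the 92% cue validity and the 12-image array from the text and adds one assumption not stated there: when the cue is wrong, the remaining 8% probability is spread evenly over the other 11 images. Under that assumption, a cued (expected) target carries about 0.12 bits and an uncued (unexpected) target about 7.1 bits, and neither value depends on whether the target was just mentioned.

```python
import math

CUE_VALIDITY = 0.92  # from the text: the circled image flashed on 92% of trials
N_IMAGES = 12        # 3 x 4 array


def target_surprisal_bits(expected: bool, validity: float = CUE_VALIDITY,
                          n_images: int = N_IMAGES) -> float:
    """Surprisal of the flashing image given the circle cue.

    ASSUMPTION: on invalid-cue trials, each of the other n_images - 1 images
    is equally likely to flash.
    """
    if expected:
        p = validity
    else:
        p = (1.0 - validity) / (n_images - 1)
    return -math.log2(p)


if __name__ == "__main__":
    # Repetition is not an input: the cue is equally reliable on repeated and
    # non-repeated trials, so this account predicts no repetition effect.
    for expected in (True, False):
        print(expected, round(target_surprisal_bits(expected), 2))
```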

Method

Participants

Forty-five undergraduate students from the University of Illinois at Urbana-Champaign participated in this study in exchange for course credit. All participants were native speakers of American English. Four participants were excluded because of recording errors, and one was excluded for failing to follow the instructions, leaving 40 participants in the analyses.

Materials

As in Experiments 1a and 1b, images were taken from the set of colored images of Rossion & Pourtois (2001). Ninety-six images from this set were used, 72 of which served as targets in critical trials. On each trial, a 3 × 4 array of images was displayed on a computer screen using MATLAB with the Psychophysics Toolbox (version 3.0).

As in Experiments 1a and 1b, there were two events on each trial: one of the images shrank and one of the images flashed. Predictability and repetition were crossed in a 2 × 2 factorial design. Predictability was manipulated by circling a potential target for the second utterance after the shrinking event and before the flashing event. On predictable (expected) trials, the circled image flashed; on unpredictable (unexpected) trials, it did not. Repetition of the target word was manipulated by having either the same object or different objects shrink and flash. Crossing the two factors yielded four conditions: repeated-expected, repeated-unexpected, non-repeated-expected, and non-repeated-unexpected.

At the beginning of a trial, 12 images appeared on the screen. After one second, one of the images shrank. One second later, one of the images was circled. The circle remained on the screen for 500 ms and then disappeared, and after another 500 ms, one of the images flashed. Which images shrank and were circled depended on the condition of the trial. In repeated-mention trials, the image that shrank was also the image that flashed. In non-repeated-mention trials, the image that shrank was not the image that flashed. In expected trials, the circled image was the image that eventually flashed. In unexpected trials, the circled image was not the image that eventually flashed. An illustration of a trial from the non-repeated-unexpected condition is shown in Figure 2.
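The trial structure above can be summarized as a small data model. The sketch below is illustrative only: it assumes the fixed onsets stated in the Method, and `make_trial` is a hypothetical helper that assigns the shrink, circle, and flash roles to array positions for each cell of the 2 × 2 design. It also assumes, since the text does not specify it, that on unexpected trials the circle falls on a third image that neither shrank nor flashes.

```python
from dataclasses import dataclass
from itertools import product

# Event onsets in seconds from trial start, following the Method text.
TIMELINE = [
    (0.0, "12-image array appears"),
    (1.0, "one image shrinks"),
    (2.0, "one image is circled"),
    (2.5, "circle disappears"),
    (3.0, "one image flashes"),
]


@dataclass(frozen=True)
class Trial:
    repeated: bool  # True if the shrinking image is also the flashing image
    expected: bool  # True if the circled image is the flashing image
    shrink: int     # array position (0-11) of the shrinking image
    circle: int     # array position of the circled image
    flash: int      # array position of the flashing image

    @property
    def label(self) -> str:
        rep = "repeated" if self.repeated else "non-repeated"
        exp = "expected" if self.expected else "unexpected"
        return f"{rep}-{exp}"


def make_trial(repeated: bool, expected: bool, a: int, b: int, c: int) -> Trial:
    """Hypothetical helper: assign event roles from three distinct array positions."""
    if len({a, b, c}) != 3 or not all(0 <= p < 12 for p in (a, b, c)):
        raise ValueError("need three distinct positions in the 3 x 4 array")
    shrink = a
    flash = a if repeated else b
    # Assumption: on unexpected trials the circle falls on a third image
    # that neither shrank nor flashes.
    circle = flash if expected else c
    return Trial(repeated, expected, shrink, circle, flash)


# The four cells of the 2 x 2 design, as (repeated, expected) pairs.
CONDITIONS = list(product([True, False], repeat=2))

for repeated, expected in CONDITIONS:
    t = make_trial(repeated, expected, a=0, b=5, c=9)
    print(f"{t.label:24s} shrink={t.shrink} circle={t.circle} flash={t.flash}")
```

Representing each trial by three roles makes the design logic explicit: repetition is a constraint on `shrink` and `flash`, and expectedness is a constraint on `circle` and `flash`, so the two factors can vary independently.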

Figure 2.


An example of a typical trial in Experiment 2 from the non-repeated expected condition. Note: The numbers indicate the order of events in the trial. The sun indicates the flashing event.

There were six critical trials in each condition, for a total of 24 critical trials. There were also 120 filler trials, all with non-repeated, expected targets; these fillers reinforced the predictability manipulation. Overall, the cue was valid on 132 of 144 trials, or roughly 92% of the time. Trial order was pseudo-randomized so that no two critical trials appeared in succession. On critical trials, the shrinking, circled, and flashing images had not previously served as targets, although they may have appeared previously as filler images. After serving as targets on a critical trial, these images could appear again as targets on filler trials. Because of the potential for order effects, two pseudo-randomized trial orders were used. Within each order, critical items were counterbalanced across conditions using a Latin square, resulting in eight lists in total. Target location was counterbalanced so that targets appeared in each of the 12 array locations an equal number of times in both critical and filler trials.
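The counts above can be checked, and the list construction sketched, with a few lines of code. This is a sketch under stated assumptions, not the authors' materials script: `latin_square_lists` uses a standard rotation scheme, and `order_trials` is one hypothetical way to satisfy the "no two critical trials in succession" constraint (placing each critical trial in a different gap between shuffled fillers). The procedure actually used to generate the two orders is not described.

```python
import random
from fractions import Fraction

CONDITIONS = ["repeated-expected", "repeated-unexpected",
              "non-repeated-expected", "non-repeated-unexpected"]
CRITICAL_PER_CONDITION = 6
N_FILLERS = 120  # all non-repeated-expected, so the circle is always valid on fillers
N_ORDERS = 2     # pseudo-randomized trial orders

n_critical = CRITICAL_PER_CONDITION * len(CONDITIONS)
n_total = n_critical + N_FILLERS
# The cue is valid on every filler and on the two "expected" critical conditions.
n_valid_cue = N_FILLERS + 2 * CRITICAL_PER_CONDITION
cue_reliability = Fraction(n_valid_cue, n_total)
print(n_critical, n_total, n_valid_cue, f"{float(cue_reliability):.1%}")


def latin_square_lists(n_items: int, conditions: list[str]) -> list[list[str]]:
    """List `shift` assigns item i to conditions[(i + shift) % k], so every
    item appears in every condition across the k lists."""
    k = len(conditions)
    return [[conditions[(i + shift) % k] for i in range(n_items)]
            for shift in range(k)]


def order_trials(critical: list, fillers: list, rng: random.Random) -> list:
    """Shuffle trials so that no two critical trials are adjacent: each critical
    trial goes into a different gap before, between, or after the shuffled fillers."""
    fillers = fillers[:]
    critical = critical[:]
    rng.shuffle(fillers)
    rng.shuffle(critical)
    gaps = set(rng.sample(range(len(fillers) + 1), len(critical)))
    remaining = iter(critical)
    order = []
    for gap in range(len(fillers) + 1):
        if gap in gaps:
            order.append(next(remaining))
        if gap < len(fillers):
            order.append(fillers[gap])
    return order


lists = latin_square_lists(n_critical, CONDITIONS)
print(len(lists) * N_ORDERS, "lists")
```

Run as written, this prints `24 144 132 91.7%` and `8 lists`, matching the 132 of 144 valid cues (11/12) and the eight lists (two orders × four Latin-square rotations) described above.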

Procedure

Before beginning the experiment, participants were shown a video that demonstrated the task. This video was used to prime participants to use the construction “the Noun 1 is shrinking … the Noun 2 is flashing.” The video also informed participants about the probabilistic cue to the flashing event: they were told that the circled image was frequently the image that flashed. After watching the video, participants completed eight practice trials. Five of these were from the non-repeated-expected condition, and the other three were from the remaining conditions (repeated-expected, repeated-unexpected, and non-repeated-unexpected). These three conditions were included so that participants would not be surprised when they encountered them in the experiment proper.

Following the practice trials, participants immediately began the experiment. Unlike in Experiments 1a and 1b, in which the participant addressed a matcher, in Experiment 2 the speaker was alone in the room and progressed through the experiment at his or her own pace. At the beginning of a trial, one of the images shrank, and the participant described this event. Then one of the images was circled. Finally, one of the images flashed, and the participant described this event. After describing the flashing event, the participant pressed a key to begin the next trial. Participants completed 144 consecutive trials, of which the 24 critical trials were recorded. The target word was the name of the flashing object (Noun 2). For each target word, we measured mean F0, maximum F0, minimum F0, intensity, and duration. We also computed the proportion of the total duration of the utterance occupied by the target word in each condition.
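The target-proportion measure is a simple ratio. The sketch below assumes that word and utterance durations have already been measured in milliseconds (for example, from hand-labeled boundaries); `target_proportion` and `condition_means` are hypothetical helper names, and the durations in the example are invented for illustration rather than taken from the study.

```python
from collections import defaultdict
from statistics import mean


def target_proportion(target_ms: float, utterance_ms: float) -> float:
    """Fraction of the utterance's total duration occupied by the target noun."""
    if utterance_ms <= 0 or not 0 <= target_ms <= utterance_ms:
        raise ValueError("target duration must lie within a positive utterance duration")
    return target_ms / utterance_ms


def condition_means(trials):
    """Average target proportion per condition.

    `trials` is an iterable of (condition, target_ms, utterance_ms) tuples.
    """
    by_condition = defaultdict(list)
    for condition, target_ms, utterance_ms in trials:
        by_condition[condition].append(target_proportion(target_ms, utterance_ms))
    return {condition: mean(values) for condition, values in by_condition.items()}


# Hypothetical durations, not data from the study: the same 450 ms noun takes up
# a larger share of the utterance when the surrounding words are shorter.
print(target_proportion(450, 1125))  # 0.4
print(target_proportion(450, 1000))  # 0.45
```

Because it is a ratio, this measure can increase either when the noun is lengthened or when the rest of the utterance is compressed, which is why it can pattern differently from raw noun duration.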

Results

Two targets (screw and refrigerator) were removed from the analysis because participants named them inconsistently. Of the trials involving the remaining targets, 47 were removed because of errors in naming the target. This led to a loss of 5.34% of the total trials.
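The 5.34% figure can be checked with simple arithmetic. The sketch below rests on an assumption that the text does not state explicitly: each of the 40 retained participants named each of the 24 critical targets exactly once. Under that assumption, 5.34% corresponds to the 47 naming errors as a share of the 880 trials with retained targets, rather than of all 960 critical trials.

```python
N_ANALYZED = 45 - 4 - 1        # participants retained (see Participants)
N_CRITICAL = 24                # critical trials per participant
N_DROPPED_TARGETS = 2          # screw, refrigerator
N_NAMING_ERRORS = 47

# Assumption: each retained participant named each critical target exactly once.
all_trials = N_ANALYZED * N_CRITICAL
remaining = N_ANALYZED * (N_CRITICAL - N_DROPPED_TARGETS)
print(all_trials, remaining)                                                  # 960 880
print(f"{N_NAMING_ERRORS / remaining:.2%} of trials with retained targets")   # 5.34%
print(f"{N_NAMING_ERRORS / all_trials:.2%} of all critical trials")           # 4.90%
```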

Means are presented in Table 3. The data were analyzed using linear mixed-effects regression with subjects and items as random effects. All predictor variables were contrast coded and centered. As in Experiments 1a and 1b, likelihood ratio tests were used to compare models and select the best-fitting random-effects structure. As before, random slopes did not significantly improve fit for any reported model, so all reported models include random intercepts only.
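Two ingredients of this analysis can be illustrated without a full mixed-model library: contrast coding with centering, and the likelihood ratio test used to compare nested models. The sketch below assumes ±0.5 codes (the exact code values are not reported) and takes model log-likelihoods as given inputs; it does not reproduce the model fitting itself. `chi2_sf` uses the standard closed forms of the chi-square upper tail for integer degrees of freedom, and the log-likelihoods in the example are hypothetical.

```python
import math
from statistics import mean


def contrast_code(levels: list[str], high: str) -> list[float]:
    """Code a two-level factor as +0.5 (`high`) / -0.5, then center on the sample mean."""
    raw = [0.5 if level == high else -0.5 for level in levels]
    center = mean(raw)
    return [x - center for x in raw]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(loglik_simple: float, loglik_complex: float, extra_params: int):
    """Compare nested models: 2 * (logLik_complex - logLik_simple) ~ chi-square(extra_params)."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: does adding a random slope (variance + covariance,
# i.e. 2 extra parameters) improve fit over a random-intercept model?
stat, p = likelihood_ratio_test(-1000.0, -999.0, extra_params=2)
print(f"chi2(2) = {stat:.2f}, p = {p:.3f}")
```

Centering matters when cells are unbalanced (for example, after trial exclusions): it keeps each main-effect coefficient interpretable as an average effect across the levels of the other factor.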

Table 3.

Experiment 2 Noun 2 Summary

                   Non-repeated                 Repeated
Metric             Expected      Unexpected     Expected      Unexpected
Duration (ms)      446 (13.5)    458 (11.9)     421 (10.9)    428 (12.1)
Noun proportion    .401 (.006)   .420 (.006)    .398 (.005)   .404 (.006)
Intensity (dB)     58.4 (0.81)   59.2 (0.88)    58.2 (0.82)   58.7 (0.91)
Average F0 (Hz)    170 (7.49)    177 (7.62)     173 (6.84)    168 (6.70)
F0 maximum (Hz)    206 (9.28)    220 (9.54)     213 (10.0)    198 (8.87)
F0 minimum (Hz)    146 (6.51)    149 (6.71)     145 (5.84)    144 (6.07)

Values in parentheses represent standard errors of the means.

Both repetition and expectedness were reliable predictors of intensity, and a model including both factors fit better than a model with either factor alone. Non-repeated words had greater intensity than repeated words (t = 2.73, p < 0.01, β = 0.4802, SE = 0.176), and unexpected words had greater intensity than expected words (t = 3.77, p < 0.001, β = 0.6617, SE = 0.176). The relative size of the regression coefficients suggests that predictability had a larger effect than repetition on intensity.

Only repetition was a reliable predictor of raw noun duration. Non-repeated words were longer than repeated words (t = 6.58, p < 0.0001, β = 30.0, SE = 4.55). Predictability did not reliably predict raw duration.

There were effects of both repetition and predictability on target proportion. Non-repeated words had a greater target proportion than repeated words (t = 5.37, p < 0.0001, β = 0.0145, SE = 0.00270), and unexpected words had a greater target proportion than expected words (t = 2.70, p < 0.01, β = 0.0073, SE = 0.00270). For target proportion, the relative size of the regression coefficients suggests that repetition had a larger effect than predictability.
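The coefficient comparisons above can be made concrete using only the reported values. In the sketch below, each t statistic is approximately β/SE (small discrepancies reflect rounding of the reported numbers), and the ratio of the two coefficients within a measure indicates which factor had the larger estimated effect. This comparison is meaningful only within a measure, because both predictors share that measure's units and the same coding; it is a descriptive comparison, not a formal test.

```python
# (measure, predictor) -> (beta, SE, reported t), copied from the Results above.
EFFECTS = {
    ("intensity", "repetition"): (0.4802, 0.176, 2.73),
    ("intensity", "predictability"): (0.6617, 0.176, 3.77),
    ("duration", "repetition"): (30.0, 4.55, 6.58),
    ("proportion", "repetition"): (0.0145, 0.00270, 5.37),
    ("proportion", "predictability"): (0.0073, 0.00270, 2.70),
}

for (measure, predictor), (beta, se, t_reported) in EFFECTS.items():
    print(f"{measure:10s} {predictor:14s} beta/SE = {beta / se:5.2f} (reported t = {t_reported})")


def coefficient_ratio(measure: str) -> float:
    """Predictability beta divided by repetition beta for one measure."""
    return EFFECTS[(measure, "predictability")][0] / EFFECTS[(measure, "repetition")][0]


print(f"intensity:  predictability / repetition = {coefficient_ratio('intensity'):.2f}")
print(f"proportion: predictability / repetition = {coefficient_ratio('proportion'):.2f}")
```

A ratio above 1 (about 1.38 for intensity) means the predictability effect was estimated to be larger than the repetition effect, and a ratio below 1 (about 0.50 for target proportion) means the reverse.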

For maximum F0 and average F0, there was a significant interaction between repetition and expectedness (maximum F0: t = 2.954, p < 0.01, β = 25.1, SE = 8.51; average F0: t = 2.362, p < 0.05, β = 8.57, SE = 3.65). Post hoc paired t-tests revealed that non-repeated nouns had a higher average F0 than repeated nouns in the unexpected condition (t(39) = 3.367, p < 0.001), whereas expected nouns showed no difference across repetition conditions (t(39) = 0.088, p = 0.93).
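The shape of this interaction can be read directly from the cell means in Table 3 as a difference of simple effects. With ±0.5 coding, a saturated model fitted to the cell means would have an interaction coefficient equal to this difference of differences. The mixed-model estimates reported above (β = 8.57 and 25.1) are not expected to match these descriptive values exactly, because they account for subject and item variability and for excluded trials. The ±0.5 coding is an assumption, since the exact code values are not reported.

```python
# Cell means in Hz from Table 3: (repetition, expectedness) -> mean.
AVG_F0 = {("non-repeated", "expected"): 170, ("non-repeated", "unexpected"): 177,
          ("repeated", "expected"): 173, ("repeated", "unexpected"): 168}
MAX_F0 = {("non-repeated", "expected"): 206, ("non-repeated", "unexpected"): 220,
          ("repeated", "expected"): 213, ("repeated", "unexpected"): 198}


def simple_effect(cells: dict, expectedness: str) -> float:
    """Non-repeated minus repeated mean at one level of expectedness."""
    return cells[("non-repeated", expectedness)] - cells[("repeated", expectedness)]


def interaction_contrast(cells: dict) -> float:
    """Difference of simple effects: unexpected-condition effect minus expected-condition effect."""
    return simple_effect(cells, "unexpected") - simple_effect(cells, "expected")


for name, cells in [("average F0", AVG_F0), ("maximum F0", MAX_F0)]:
    print(f"{name}: unexpected {simple_effect(cells, 'unexpected'):+d} Hz, "
          f"expected {simple_effect(cells, 'expected'):+d} Hz, "
          f"difference {interaction_contrast(cells):+d} Hz")
```

For average F0, the repetition effect is +9 Hz for unexpected words but -3 Hz for expected words, which is the pattern the post hoc tests describe.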

Discussion

The results from Experiment 2 suggest that repetition and predictability play independent roles in the production of prosodic prominence. Both factors affected the intensity of the target word and the proportion of the utterance it occupied, although only repetition affected its raw duration. Predictability was the stronger predictor of intensity, whereas repetition was the stronger predictor of duration. These data are most consistent with the multiple source account.

The data from Experiment 2 suggest that effects of repetition obscured effects of predictability in Experiment 1a, particularly for intensity. Recall that in Experiment 1a, the two factors were placed in opposition, making it difficult to know whether each contributed independently to the acoustics of the target word.

Interestingly, there was a reliable interaction between predictability and repetition for F0: unexpected words were produced with higher F0 in the non-repeated condition than in the repeated condition, but expected words were produced with similar F0 in the repeated and non-repeated conditions. This suggests that the lack of an effect in Experiment 1 was not due to the nature of the task. This interaction was not predicted by any of the theories discussed above, and it suggests that both factors may need to be present to trigger a higher F0. Future work will need to investigate why effects on F0 appear to be qualitatively different from effects on duration and intensity.

Finally, one potential concern is that because there was no overt listener (unlike in Experiment 1), these data may not be useful for evaluating information theoretic approaches. Two points are worth noting. First, as discussed in the introduction, the information theoretic frameworks that have been proposed do not require that a specific listener be present. Frank & Jaeger (2008) are agnostic as to whether a specific listener is modeled, and argue that these information theoretic principles apply to communication more generally. Second, despite the absence of a listener, there were clear effects of predictability, so the manipulation was strong enough to elicit acoustic differences. However, these differences appeared primarily in intensity rather than duration.

General Discussion

In the experiments reported in this paper, both repetition and predictability influenced the duration and intensity of target words. Repetition primarily affected word duration, and predictability primarily affected word intensity. These results are most consistent with a multiple source view of acoustic prominence: the prominence of a word is affected both by production factors, such as the ease of lexical access, and by the marking of unpredictable information. In English, these two factors appear to affect the acoustic properties of a word in different ways.

These findings are consistent with previous results in the literature. Watson, Arnold, & Tanenhaus (2007) found a dissociation between intensity and duration in games of Tic Tac Toe that depended on the likelihood and importance of a game move. Baker & Bradlow (in press) found that in clear speech, second mentions are more reduced for high-frequency words than for low-frequency words, which is consistent with repetition and frequency effects having different underlying sources.

The duration results cannot be explained by an information theoretic account alone. Such accounts hold that speakers adjust word durations according to the redundancy of the elements in an utterance so that listeners can parse it more readily. In particular, speakers should lengthen less predictable words and shorten predictable ones. However, in Experiment 1a, repeated words were reduced even when they were unexpected. In Experiment 2, where predictability and repetition were manipulated independently, predictability had no reliable effect on raw target word duration. Although predictability did affect one measure of duration (target proportion), this effect was weaker than the effect of repetition. Thus, predictability alone does not appear sufficient to account for prominence differences in duration.

One potential concern is that the predictability manipulation here differs from the types of linguistic predictability that have typically been discussed in the literature. Linguistic predictability has been claimed to incorporate lexical and syntactic frequency, n-gram probabilities, and previous mention, all of which are properties of a language that native speakers must learn through a lifetime of experience with it. In contrast, the manipulation of predictability in the current study is based on the predictability of events in a task. The representations that underlie this type of task-based predictability may differ from those that underlie predictability based on stored linguistic experience. However, stored sources of predictability are necessarily based on input the speaker has received while interacting with his or her environment and language community. Thus, in principle, there is no reason why the predictability manipulated in this task should differ from longer-term linguistic predictability, except that it is based on more recent experience.

Although the manipulation of predictability did not significantly affect the raw duration of the target word, this does not mean that speakers do not optimize some aspects of speech for the listener. First, speakers did increase the proportion of the utterance occupied by unexpected target words, as information theoretic accounts predict. This can be achieved either by lengthening the target word or by shortening the words around it; in both cases, the target word may be perceived as more prominent. Although this effect was weak relative to the effect of repetition, it may reflect some optimization of the signal for the listener. Second, speakers could be improving word intelligibility by increasing the intensity of less predictable words. Intensity may be a more useful cue for a listener than duration, because changes in duration have many other sources, including marking prosodic boundaries, producing disfluencies, marking the rhythm of a sentence, and distinguishing between vowels of different lengths, to name just a few. These factors may make duration too noisy a signal for detecting unpredictable information, whereas intensity may be a less crowded channel for conveying information about predictability. Of course, the data here also do not rule out the possibility that changes in duration can be explained by information theoretic accounts of other aspects of linguistic structure.

One question this study did not address is where in the production process these effects arise. Previous work suggests that the repetition effect is not realized at the phonological level: repeating a word leads to reduction, but saying a word and then its homonym does not (Fowler, 1988). Homonyms are words that are identical in spelling and sound but have different meanings. Because homonyms are identical in sound, their production should be identical at the level of phonological encoding. This suggests that the repetition effect arises at a higher level of production than phonological encoding, potentially at the level of lexical selection or message formulation. Fowler (1988) also failed to find a repetition effect when repeated words were produced in a list. Although the discrepancy between Fowler’s (1988) findings and the present results is puzzling, one possible explanation is that word list production does not engage the same production processes, such as message planning, as situated language use. If reduction is linked to ease of processing at higher stages of production, one might not expect to see reduction in the production of word lists. At the very least, both of Fowler’s findings suggest that effects of repetition may be driven by factors that operate earlier in production than phonological encoding. In addition, work showing that reduction and decreased intelligibility can occur even when a word was previously produced by a different speaker suggests that these effects are not necessarily rooted in phonological encoding (Bard et al., 2000; Anderson & Howarth, 2002).

It is also unclear at what level of production effects of predictability arise. One possibility is that feed-forward connections from the message formulation level to the level of articulation modulate intensity depending on the predictability of the word. Judgments of predictability could come either from explicitly modeling the listener’s expectations or from estimating the listener’s knowledge on the basis of the speaker’s own assessment of predictability (e.g., Brown & Dell, 1987; Horton & Keysar, 1996). This relates to a broader debate in psycholinguistics about the extent to which speakers design their utterances for the listener. Future work in this domain will need to determine the exact mechanism that underlies marking unpredictable information with intensity.

More generally, these data suggest that prominence is not a unitary linguistic or psychological construct. Different factors can play a role in whether a word is produced with prominence, and this prominence can be realized in different ways. In contrast to previous work, which has typically found increases in duration and intensity co-occurring in natural speech, we have found that intensity is more strongly linked to speaker expectation while duration is more strongly linked to repetition. These data suggest that the prominence of a word can potentially have multiple sources (Watson, 2010).

Acknowledgments

We would like to thank Allison Potter, Cara Ader, Nicole Nash, Ashley Turk, and Carl Ruark for helping with data collection and transcription. We would also like to thank Jennifer Arnold for assistance in formulating the initial ideas for this project and Gary Dell and the members of the Communication and Language Lab for helpful comments on this project. This project was supported by a grant from the National Institutes of Health R01 DC008774.

References

  1. Anderson AH, Howarth B. Referential form and word duration in video-mediated and face-to-face dialogues. In: Bos J, Foster ME, Matheson C, editors. Proceedings of the sixth workshop on the semantics and pragmatics of dialogue. Edinburgh: University of Edinburgh; 2002. pp. 13–20.
  2. Arnold JE. Reference form and discourse patterns. Unpublished doctoral dissertation, Stanford University; 1998.
  3. Aylett M, Turk A. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech. 2004;47:31–56. doi: 10.1177/00238309040470010201.
  4. Baker RE, Bradlow AR. Variability in word duration as a function of probability, speech style, and prosody. Language and Speech. (in press). doi: 10.1177/0023830909336575.
  5. Bard EG, Anderson AH, Sotillo C, Aylett M, Doherty-Sneddon G, Newlands A. Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language. 2000;42(1):1–22.
  6. Bard EG, Aylett MP. The disassociation of deaccenting, givenness, and syntactic role in spontaneous speech. Proceedings of the 1999 International Conference on Spoken Language Processing; 1999. pp. 1753–1756.
  7. Baayen RH. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press; 2008.
  8. Bell A, Brenier J, Gregory M, Girand C, Jurafsky D. Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language. 2009;60(1):92–111.
  9. Bell A, Jurafsky D, Fosler-Lussier E, Girand C, Gregory ML, Gildea D. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America. 2003;113(2):1001–1024. doi: 10.1121/1.1534836.
  10. Boersma P, Weenink D. Praat: Doing phonetics by computer (Version 4.5.14) [Computer program]. 2007. Retrieved from http://www.praat.org/
  11. Dahan D, Tanenhaus MK, Chambers CG. Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language. 2002;47:292–314.
  12. Dell GS. Effects of frequency and vocabulary type on phonological speech errors. Language and Cognitive Processes. 1990;5:313–349.
  13. Fowler CA, Housum J. Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language. 1987;26:489–504.
  14. Fowler CA. Differential shortening of repeated context words produced in various communicative contexts. Language and Speech. 1988;31:307–319. doi: 10.1177/002383098803100401.
  15. Fosler-Lussier E, Morgan N. Effects of speaking rate and word predictability on conversational pronunciations. Speech Communication. 1999;29:137–158.
  16. Frank A, Jaeger TF. Speaking rationally: Uniform information density as an optimal strategy for language production. Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci08); 2008. pp. 933–938.
  17. Gregory ML, Raymond WD, Bell A, Fosler-Lussier E, Jurafsky D. The effects of collocational strength and contextual predictability in lexical production. CLS-99. Chicago: University of Chicago; 1999. pp. 151–166.
  18. Griffin ZM, Bock JK. Constraint, word frequency, and relationship between lexical processing levels in spoken word production. Journal of Memory and Language. 1998;38:313–338.
  19. Horton WS, Keysar B. When do speakers take into account common ground? Cognition. 1996;59(1):91–117. doi: 10.1016/0010-0277(96)81418-1.
  20. Isaacs AM, Watson DG. Accent detection is a slippery slope: Direction and rate of F0 change drives listeners’ comprehension. Language and Cognitive Processes. (in press). doi: 10.1080/01690961003783699.
  21. Isaacs AM, Watson DG. Speakers and listeners don’t agree: Audience design in the production and comprehension of acoustic prominence. Poster presented at CUNY 2009: Conference on Human Sentence Processing; Davis, CA; 2009.
  22. Jaeger TF. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology. (in press). doi: 10.1016/j.cogpsych.2010.02.002.
  23. Jescheniak JD, Levelt WJM. Word frequency effects in speech production: Retrieval of syntactic information and phonological form. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1994;20:824–843.
  24. Jurafsky D, Bell A, Gregory M, Raymond WD. Probabilistic relations between words: Evidence from reduction in lexical production. In: Bybee J, Hopper P, editors. Frequency and the emergence of linguistic structure. Amsterdam: Benjamins; 2001. pp. 229–254.
  25. Kidd C, Jaeger TF. Prosodic phrasing and function word pronunciation. Spoken presentation given at Experimental and Theoretical Advances in Prosody; Ithaca, NY: Cornell University; 2008 Apr.
  26. Lieberman P. Some effects of the semantic and grammatical context on the production and perception of speech. Language and Speech. 1963;6:172–175.
  27. Onishi KH, Chambers KE, Fisher C. Learning phonotactic constraints from brief auditory exposure. Cognition. 2002;83:B13–B23. doi: 10.1016/s0010-0277(01)00165-2.
  28. Pluymaekers M, Ernestus M, Baayen RH. Articulatory planning is continuous and sensitive to informational redundancy. Phonetica. 2005a;62:146–159. doi: 10.1159/000090095.
  29. Pluymaekers M, Ernestus M, Baayen RH. Lexical frequency and acoustic reduction in spoken Dutch. Journal of the Acoustical Society of America. 2005b;118:2561–2569. doi: 10.1121/1.2011150.
  30. Rossion B, Pourtois G. Revisiting Snodgrass and Vanderwart's object database: Color and texture improve object recognition. Journal of Vision. 2001;1(3):413a.
  31. Saffran JR, Aslin RN, Newport EL. Statistical learning by 8-month old infants. Science. 1996;274:1926–1928. doi: 10.1126/science.274.5294.1926.
  32. Snodgrass JG, Vanderwart M. A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory. 1980;6(2):174–215. doi: 10.1037//0278-7393.6.2.174.
  33. Warker JA, Dell GS. Speech errors reflect newly learned phonotactic constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2006;32:387–398. doi: 10.1037/0278-7393.32.2.387.
  34. Watson DG. The many roads to prominence: Understanding emphasis in conversation. In: Ross B, editor. The Psychology of Learning and Motivation. Vol. 52. Burlington: Academic Press; 2010. pp. 163–183.
  35. Watson DG, Arnold JE, Tanenhaus MK. Tic Tac TOE: Effects of predictability and importance on acoustic prominence in language production. Cognition. 2007;106:1548–1557. doi: 10.1016/j.cognition.2007.06.009.
  36. Wingfield A. Effect of frequency on identification and naming objects. American Journal of Psychology. 1968;81:226–234.
  37. Zipf GK. Relative frequency as a determinant of phonetic change. Harvard Studies in Classical Philology. 1929;15:1–95.
