Perception of formulaic and novel expressions under acoustic degradation

C Sophia Rammell; Diana Van Lancker Sidtis; David B Pisoni

doi:10.1075/ml.16019.ram

Author manuscript; available in PMC: 2019 May 10.

Published in final edited form as: Ment Lex. 2018 Mar 15;12(2):234–262. doi: 10.1075/ml.16019.ram

PMCID: PMC6510503 NIHMSID: NIHMS993238 PMID: 31080525

Abstract

Background:

Formulaic expressions, including idioms and other fixed expressions, comprise a significant proportion of discourse. Although much has been written about this topic, controversy remains about their psychological status. An important claim about formulaic expressions, that they are known to native speakers, has seldom been directly demonstrated. This study tested the hypothesis that formulaic expressions are known and stored as whole unit mental representations by performing three perceptual experiments.

Method:

Listeners transcribed two kinds of spectrally-degraded spoken sentences, half formulaic, and half novel, newly created expressions, matched for grammar and length. Two familiarity ratings, usage and exposure, were obtained from listeners for each expression. Text frequency data for the stimuli and their constituent words were obtained using a spoken corpus.

Results:

Participants transcribed formulaic more successfully than literal utterances. Usage and familiarity ratings correlated with accuracy, but formulaic utterances with low ratings were also transcribed correctly. Phrase types differed significantly in text frequency, but word frequency counts did not differentiate the two kinds of expressions.

Discussion:

These studies provide new converging evidence that formulaic expressions are encoded and processed as whole units, supporting a dual-process model of language processing, which assumes that grammatical and formulaic expressions are differentially processed.

Keywords: formulaic language, corpus analysis, speech perception, dual-process model

Interest in formulaic language has grown in the past few decades. Scholars use an array of terms and have described various categories of formulaic expressions, along with their characteristic properties (Van Lancker & Rallon, 2004; Wray, 2002; Wulff, 2008). Formulaic expressions include conversational speech formulas, idioms, proverbs, pause fillers, counting, swearing, and other conventional and multiword units. Some examples are He’s got his head in the clouds, I’ll get back to you later, Cat got your tongue?, and Gosh darn it. Despite differences between these expressions, all have two important characteristics in common: they are not newly created utilizing grammatical rules to combine words and they are known to speakers of a language community. Detailed linguistic analyses of formulaic expressions reveal that formulaic expressions and other multiword expressions form categories along continua (Tannen & Öztek, 1981; Van Lancker, 1975; Kecskes, 2003, 2007; Wray, 2002; Ellis, 2012) in association with such characteristics as attitudinal and emotional nuance, degree of cohesion, more or less nonliteral meaning, dependency on context, optional or obligatory status in social contexts, and semantic transparency (Jarema, Busson, Nikolova, Tsapkin, & Libben, 1999). Nonetheless, these expressions, despite their differences, have in common that they are not newly created and they are recognized as known by speakers in a community.

Formulaic expressions make up a large part of language use, with estimates of proportions in normal discourse from 25–70% (Foster, 2001; Hill, 2001; Van Lancker Sidtis, 2014) and total counts between 100,000 and 300,000 (Jackendoff, 1995; Kuiper, 2009). They perform an assortment of communicative functions, including conveying nonliteral meanings and cultural memes, humor, interpersonal bonding, attitudinal and emotional expression, sociological group identity, and language play. Understanding how formulaic expressions are processed, stored, and retrieved from memory can contribute to more complete models of language processing.

Formulaic expressions typically have a stereotyped form,¹ conventionalized meaning (usually beyond the direct lexical meaning), and an appropriate context (with requirements for formality and register), all of which are immediately recognizable to native speakers of a language (Fillmore, 1979; Pawley & Syder, 1983; Kuiper, 2006). Second language speakers, in acquiring the form, meaning, and contextual contingencies of formulaic expressions, face a considerable challenge (Paquot & Granger, 2012); producing a formulaic expression with a replaced lexical item or nonstandard prosodic contour is generally taken to be a second language speaker error or a humorous gesture (Kuiper, 2007; Bell, 2012; Millar, 2011). Child language acquisition schedules differ for novel and formulaic expressions (e.g., Gleason & Weintraub, 1976; Gleason, 1980; Kempler, Van Lancker, Marchman & Bates, 1999; Nippold, 1998; Peters, 1983; Locke, 1993; Perkins, 1999). Evaluation and treatment in speech-language pathology are best informed by distinguishing between loss and rehabilitation of formulaic or novel language (Van Lancker Sidtis, 2012b, 2014; Stahl & Van Lancker Sidtis, 2015).

In the linguistic sciences, formulaic expressions, or “formulemes,” have been studied using surveys and sentence completion tasks (Van Lancker & Rallon, 2004; Van Lancker Sidtis, Cameron, Bridges, & Sidtis, 2015), word association (Clark, 1970), interpretation and recognition (Gibbs, 1980; Libben & Titone, 2008; Cutting & Bock, 1997; Osgood & Housain, 1974; Van Lancker Sidtis, 2003), language acquisition schedules (Nippold, 1998; Reuterskiöld & Van Lancker Sidtis, 2012; Pickens & Pollio, 1979), auditory/acoustic measures (Van Lancker, Canter & Terbeek, 1981; Lieberman, 1963; Yang & Van Lancker Sidtis, 2016), and speech errors (Kuiper, Van Egmond, Kempen, & Sprenger, 2007; Nooteboom, 2011). A variety of psycholinguistic designs, including eye tracking and response times, have aimed at discovering principles that distinguish the two kinds of language, formulaic and newly created (e.g., Siyanova-Chanturia, Conklin, & Schmitt, 2011; Underwood, Schmitt, & Galpin, 2004; Swinney & Cutler, 1979). Corpus linguistics and computational approaches have focused on collocation frequencies (Moon, 1998a,b; Biber, 2009; Conrad & Biber, 2004) and mutual information scores (Lin, 1999; Lin & Adolphs, 2009; Paquot & Granger, 2012; Wulff, 2008). Corpus linguistic approaches have used Latent Semantic Analysis (Schone & Jurafsky, 2001) and semantic similarity measures (Bannard, Baldwin, & Lascarides, 2003).

Many studies have suggested that formulaic expressions have unitary structure. Early studies revealed differences in pronunciation and perception between matched novel and formulaic exemplars (Lieberman, 1963; Van Lancker et al., 1981). As part of their stereotyped form, formulaic expressions have been shown to exhibit phonological coherence, which may be thought of as a surrogate indicator of holistic structure (Hallin & Van Lancker Sidtis, 2015). Lin and others (Lin, 2010; Lin & Adolphs, 2009) have proposed that these expressions form a single intonation unit. In similar fashion, previous research has shown that formulaic expressions, under controlled conditions, are uttered faster and more fluently than novel language (Erman, 2007; Lin, 2010; Wray, 2002; Van Lancker et al., 1981; Hallin & Van Lancker Sidtis, 2015; Tabossi, Fanari & Wolf, 2009), again suggesting unitary structure. Other distinguishing characteristics are loudness, distinctive voice quality, and temporal cues such as initial shortening and phrase-final lengthening (Yang, Ahn, & Van Lancker Sidtis, 2015; Yang & Van Lancker Sidtis, 2016). These studies address the structure and physical characteristics of formulaic expressions, but do not directly probe knowledge of the expressions and their place in memory or information processing. Numerous studies have probed various constituent and usage properties of one important category of formulaic expressions (FEs), idioms (Cacciari & Tabossi, 1988; Nunberg, Sag, & Wasow, 1994), referencing their varying properties such as literalness and transparency (Titone & Connine, 1994, 1999). These studies have led to proposals of mental representation that distinguish holistic, word-like storage of non-decomposable subtypes from a configurational format (Caillies & Butcher, 2007). However, many of the characteristics inhering in idioms do not apply to other kinds of FEs. This study utilized a range of FEs, including idioms and other kinds (see methods below), that differ from novel expressions in only one parameter: they, as a unit, are known to speakers.

The properties of the broad constituency of formulaic language are well accounted for by the proposal of a dual-process model of language (Van Lancker Sidtis, 2012a; Wray & Perkins, 2004; Erman & Warren, 2000). In this model, formulaic expressions and newly created, novel expressions differ in how they are learned, processed, and stored. Novel expressions are processed and analyzed in real time using stored lexical and morphological units organized according to grammatical rules. Formulaic expressions, in contrast, at some level of mental representation, may be accessed from stored traces as whole, precompiled units (Horowitz & Manelis, 1973; Osgood & Housain, 1974). Related to the dual-processing model is the “hybrid” model, which suggests that idioms may have at least two kinds of representations, one in holistic profile and another in compositional form (Sprenger, Levelt, & Kempen, 2006). This view accommodates psycholinguistic results showing that language users, in experimental studies, can process constituent elements of formulaic expressions at phonological and lexical levels. Yet in these studies, the status of many kinds of formulaic expressions as relatively unitary in some stage or level of mental representation is attested (Osgood & Housain, 1974; Swinney & Cutler, 1979; Conklin & Schmitt, 2008; Horowitz & Manelis, 1973; Siyanova-Chanturia et al., 2011). While any verbal object can be decomposed in various ways, the hybrid view seems to provide the best characterization for idioms: “idioms are represented and retrieved as units that can interact” with compositionality and other factors (Libben & Titone, 2008, p. 1117). This general perspective also forms a foundation for the larger class of FE variants utilized in this study.

The hypothesis tested in the present set of experiments is that formulaic expressions are known to the native speaker, and that they are known (stored) as single, holistic units in at least one level or stage of mental representation (Bolinger, 1976, 1977). Specifically, we predicted that, under acoustic degradation conditions, formulaic expressions will be correctly perceived more often than novel expressions because they are familiar and are stored in long-term memory as unitary holistic units. Exposure to degraded and incomplete perceptual information will suffice to elicit the associated, stored unitary form. Further, transcribed responses will fit the original, entire formulaic target more accurately than the matched, original complete novel target. For this report, the formulaic expressions chosen for study are conversational speech formulas, idioms, proverbs, lexical bundles, and other conventional expressions (see Appendix A). Our interest here is to establish empirical evidence for the proposal that native speakers know and process these expressions in a way that makes them fundamentally different in mental representation from newly created, novel utterances. Three experiments were conducted.

Experiment 1

Method

Subjects

Participants were native speakers of English with no known speech or hearing disorders at the time of testing. Twenty-two subjects (F = 9, M = 13) completed a two-part experimental protocol. The mean age of the subjects was 18.9 years, with a range of 18–21 years. Participants in all three experiments were recruited using the Indiana University Psychological and Brain Sciences Departmental Volunteer Subject Pool. Subjects were all undergraduate students at Indiana University enrolled in introductory psychology classes.

Stimuli

The stimulus materials consisted of 140 meaningful English spoken sentences, produced with natural expression by a native speaker of American English. Half of the sentences were formulaic and half were novel sentences matched for lexical syllable structure, length, and grammatical construction (Appendix A). Forty-four sentences were taken from the Familiar and Novel Language Comprehension task (FANL-C) (Kempler & Van Lancker, 1996). The remaining sentences were selected from a matched idiom-novel expression list compiled for use in a previous study. The formulaic stimuli fell into several categories. About half were classical idioms (e.g., That’s the way the cookie crumbles; Straight from the horse’s mouth). The rest consisted of proverbs (When the cat’s away, the mice will play; Don’t burn your bridges behind you), lexical bundles (None of your business; On the other hand), conversational speech formulas (And now for something completely different; I should be so lucky), and numerous other conventional utterances (A little of that goes a long way; You never had it so good; That’s for me to know and you to find out; No sooner said than done). The members of this heterogeneous array of FEs have in common that they are not newly created and, in our view, they are known to speakers in their canonical (“formuleme”) form.

Analyses were performed to determine the text frequency of the expressions and their constituent words. Spoken corpus data from the Corpus of Contemporary American English (COCA) were used to obtain and compare whole phrase frequency and individual content word frequency from the phrases for a subset of 41 pairs of test expressions (used in Experiment 3; Davies, 2008). The median frequency in COCA of the entire set of formulaic phrases was 3, and the median frequency of novel phrases was 0.073. The mean frequency for formulaic expressions, 102.24, was strongly influenced by one outlier (“On the other hand,” frequency = 3832). An analysis was also performed to assess raw frequencies across both types of expressions. For each expression and for the sum of the content word frequencies in each expression, a measure of ln(frequency + 1) was calculated (Baayen & Hendrix, 2011; Baayen, Milin, Durdevic, Hendrix, & Marelli, 2011; see Appendix B).
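The ln(frequency + 1) transform described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that the content-word measure is the log of the summed counts plus one, which is how the description reads; it is not the authors' analysis code.

```python
import math

def log_frequency(count):
    """Return ln(count + 1); adding 1 keeps zero corpus counts defined."""
    if count < 0:
        raise ValueError("frequency counts must be non-negative")
    return math.log1p(count)

def summed_content_word_log_frequency(word_counts):
    """ln(sum of content-word counts + 1) for one expression (assumed reading)."""
    return log_frequency(sum(word_counts))

# Whole-phrase counts reported in the text: the median formulaic count (3)
# and the outlier "On the other hand" (3832).
median_formulaic = log_frequency(3)
outlier = log_frequency(3832)
```

On this scale the outlier count of 3832 maps to about 8.25 and the median count of 3 to about 1.39, which shows how the transform limits the influence of extreme counts on the comparison.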

A repeated-measures ANOVA performed on the natural log measure of the frequency values of the content words and the whole phrase by type of expression revealed that the frequency of occurrence of the individual content words in both formulaic and novel expressions was the same. However, the types of expressions differed in frequency: the formulaic expressions occurred more frequently in the COCA corpus than the novel expressions. A significant interaction of type of expression and expression versus content word frequency was obtained in the ANOVA, F(1, 40) = 21.681, p < 0.001 (see Appendix B).

There are many different ways of degrading a speech signal to reduce performance, such as filtering or adding white noise or multi-talker babble (Pisoni, 1996). In the present study, an acoustic simulation of a four-channel cochlear implant was used (Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). Cochlear-implant simulated speech is an easy way to manipulate the intelligibility of the speech signal over a range from very degraded to less degraded. The .wav files of all of the stimuli were processed using Angel Sim to create a 4-channel cochlear implant simulation of sinewave vocoded speech (TigerCIS 2012). The original speech signal was first band-pass filtered into four frequency bands based on Greenwood’s function at a frequency range of 200–7000 Hz and filter slope of 24 dB/oct (Greenwood, 1990). The input and output signals were matched in frequency range and filter slope. Then the amplitude envelope was derived from each filter band using a low-pass filter with a cutoff frequency of 160 Hz and a roll-off of 24 dB/oct. Residual spectral information was removed and replaced with either white noise or sine waves (Dorman & Loizou, 1997, 1998). This form of vocoded speech maintains the original speech envelope but removes the temporal fine structure. This approach to speech signal degradation was used in the present studies because it reduced speech intelligibility to levels close to the threshold of identification accuracy (Shannon et al., 1995; Shannon, Fu, & Galvin, 2004).
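To make the band-splitting step concrete, the following Python sketch derives band edges for a four-channel analysis between 200 and 7000 Hz from Greenwood's function. It uses the human constants from Greenwood (1990) (A = 165.4, a = 2.1, k = 0.88, with position expressed as a proportion of cochlear length) and assumes the bands are equally spaced along that position axis. The exact edges used by the simulation software are not reported here, so the function names and the spacing rule are illustrative assumptions.

```python
import math

# Greenwood (1990) constants for the human cochlea; x is the proportion
# of cochlear length measured from the apex (0..1).
A, ALPHA, K = 165.4, 2.1, 0.88

def greenwood_hz(x):
    """Characteristic frequency (Hz) at relative cochlear position x."""
    return A * (10 ** (ALPHA * x) - K)

def greenwood_pos(f):
    """Inverse map: relative cochlear position for frequency f (Hz)."""
    return math.log10(f / A + K) / ALPHA

def band_edges(n_channels=4, f_lo=200.0, f_hi=7000.0):
    """Edges of n_channels bands equally spaced in cochlear position (assumed rule)."""
    x_lo, x_hi = greenwood_pos(f_lo), greenwood_pos(f_hi)
    step = (x_hi - x_lo) / n_channels
    return [greenwood_hz(x_lo + i * step) for i in range(n_channels + 1)]

edges = band_edges()  # five edge frequencies delimiting four bands
```

Each adjacent pair of edges would define one band-pass filter. In a sinewave vocoder, the envelope extracted from a band would then modulate a sine carrier placed within that band.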

Procedure

All three experiments consisted of two parts: a sentence recognition task and a familiarity rating task based on usage or exposure. The participants were naïve to the purpose of the study and IRB procedures were followed. Prior to beginning Experiment 1, participants were informed that the sentences they would hear over their headphones had been processed by a computer and the speech would sound degraded. Five practice sentences taken from another test protocol were played first to familiarize the listeners with spectrally degraded speech (Nilsson, Soli, & Sullivan, 1994). Listeners then completed a speech recognition transcription task with the test stimuli under a 4-channel acoustic simulation. Stimuli were presented in random order. Listeners heard each sentence only once and were asked to type what they heard at the end of each presentation using a computer keyboard. Transcription tasks have a venerable history in providing a “window” onto mental knowledge for speech and language and for assessing listeners’ abilities to process speech and language samples.

In the second phase, the participants heard the same set of sentences again in unprocessed form, one time each, in a different random order, for a rating task. Listeners rated each utterance on a scale of 1 to 3 (1 = I never use this sentence, 2 = I sometimes use this sentence, 3 = I often use this sentence). The entire experiment took between 45 minutes and one hour.

Scoring

Two methods of scoring were used. First, the transcription of the whole phrase responses was scored as correct or incorrect by whether all keywords were present in the correct order. Second, analyses of total words correct in each phrase were calculated. Usage ratings were scored on a scale of 1–3.
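The two scoring methods can be sketched in Python as follows. The tokenization rule, the function names, and the treatment of "words correct" as a multiset overlap between response and target are assumptions made for illustration; the handling of misspellings and the choice of keywords are not specified above.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens, normalizing curly apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower().replace("’", "'"))

def keywords_in_order(response, keywords):
    """Whole-phrase score: True if every keyword occurs in the response in order."""
    tokens = iter(tokenize(response))
    # `kw in tokens` consumes the iterator, so each keyword must be found
    # after the previous one.
    return all(kw in tokens for kw in (k.lower() for k in keywords))

def words_correct(response, target):
    """Word-level score: number of target words also present in the response."""
    overlap = Counter(tokenize(response)) & Counter(tokenize(target))
    return sum(overlap.values())

# Hypothetical transcription of the stimulus "None of your business".
whole = keywords_in_order("none of the business", ["none", "business"])
partial = words_correct("none of the business", "None of your business")
```

Under these assumptions the hypothetical response counts as correct on the whole-phrase measure, because both keywords appear in order, and earns three of four words on the word-level measure.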

Results

Overall, participants correctly transcribed entire formulaic expressions more often than novel expressions under acoustic degradation (Figure 1). Formulaic expressions were correctly transcribed in 57.9% of cases; novel expressions were correctly transcribed in only 32.7% of cases. The difference was significant using a paired-samples t-test, t(21) = −12.95, p < 0.001. Given that the utterances were carefully matched and the constituent words did not differ in frequency, this is a large difference in performance.

Figure 1. Percent correct sentence transcription by type of expression. Novel expressions are shown on the left, formulaic expressions on the right. Error bars represent standard error of the mean.

All 22 participants showed the predicted effect. To assess the individual variation on this task, difference scores in percent correct between formulaic and novel expressions were calculated for each subject (Figure 2). The magnitude of the difference scores ranged from 2 to 38. When the phrases were rescored by total number of words correct from each phrase, the difference was also significant using a paired-samples t-test, t(2652) = −91.817, p < 0.001.
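As an illustration of how per-subject difference scores feed a paired-samples t-test, the sketch below computes t and its degrees of freedom from two matched lists of percent-correct scores. The input values are toy numbers invented for the example, not data from this study. Taking the difference as novel minus formulaic is an assumption, chosen only because it yields a negative t like the values reported above.

```python
import math
from statistics import mean, stdev

def paired_t(formulaic, novel):
    """Paired-samples t statistic and degrees of freedom.

    Differences are novel minus formulaic (assumed direction).
    """
    if len(formulaic) != len(novel) or len(formulaic) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [n - f for f, n in zip(formulaic, novel)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1

# Toy percent-correct scores for three hypothetical subjects.
t_value, df = paired_t([60, 55, 58], [30, 35, 32])
```

With 22 subjects, the same computation gives 21 degrees of freedom, matching the whole-phrase test reported above.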

Figure 2. Difference scores between formulaic and novel expression accuracy by participant. Scores are listed in ascending order. X-axis numbers represent subject numbers.

Expressions selected as “never use” had a mean transcription accuracy of 37.3%, “sometimes use” a mean accuracy of 64.3%, and “often use” a mean accuracy of 66.1%. Paired-samples t-tests revealed statistical differences between percent correct averages for stimuli reported as “I never use this sentence” and “I sometimes use this sentence”, t(17) = −10.31, p < 0.001, and “never use” and “often use”, t(16) = −3.58, p = 0.002. The difference between “sometimes use” and “often use” was not significant (Figure 3). The lack of a significant difference between “sometimes use” and “often use” could arise from participants’ reluctance to classify phrases as “often use”. “Never use” was chosen most frequently by subjects, 67.3% of the time. “Sometimes use” was selected for 24.4% of expressions, and “often use” was very infrequently chosen (8.3%). This can be explained by a description of formulaic language as consisting of a very large available repertory, only a small fragment of which is actively used by the individual user. These usage subsets differed across language users. These results suggest that many more formulaic expressions are known and can be recognized than are actively used.

Figure 3. Percent correct transcription by familiarity rating for both expression types.

Importantly, the usage ratings employed in Experiment 1 differed by type of expression. Formulaic expressions were more frequently rated as “often use” (14.6%) or “sometimes use” (37.3%) than novel expressions, which were rated as “sometimes use” in 11.4% and “often use” in 2.0% of cases. On the other hand, novel expressions were significantly more frequently rated as “never use” at 86.6%, while formulaic expressions were rated as “never use” in 48.1% of cases (χ² = 428.892, df = 2, p < 0.001) (Figure 4).

Figure 4. Familiarity rating by type of expression.

Beyond a relationship with usage ratings, formulaic expressions were always transcribed more accurately than novel expressions, regardless of usage rating (Figure 5). A logistic regression analysis was conducted to predict whole phrase transcription accuracy using usage ratings and expression type as predictors. Both predictors were significant: usage rating (χ² = 89.526, df = 1, p < 0.001) and expression type (χ² = 67.542, df = 1, p < 0.001). Figure 5 also shows that 50% of formulaic expressions rated as “never use” were correctly recognized, again implying a large repertory of known expressions independent of usage ratings.

Figure 5. Accuracy as a function of usage familiarity rating and type of expression. Error bars represent standard error of the mean.

Discussion

As hypothesized, listeners correctly transcribed spectrally-degraded formulaic expressions more often than novel expressions. Subjects also reported higher usage ratings for formulaic than novel expressions. These results are consistent with predictions based on a dual-process model of language (Lounsbury, 1963; Erman & Warren, 2000; Perkins, 1999; Sinclair, 1987; Van Lancker Sidtis, 2012a). According to this model, formulaic language is processed and stored differently than novel language. These results also indicate that subjects transcribed many expressions correctly that were rated as “never use.” This observation led to a restructuring of the familiarity rating task in the next experiment reported below.

Experiment 2

A second experiment was performed to ensure that subjects could transcribe these phrases correctly without any acoustic degradation. We expected that expressions which are not spectrally degraded would be correctly transcribed at ceiling performance levels. We replaced the usage scale of familiarity, used in Experiment 1, with a 7-point exposure rating scale.