Consonantal F₀ perturbation in American English involves multiple mechanisms

Yi Xu; Anqi Xu

doi:10.1121/10.0004239

2021 Apr 29;149(4):2877–2895. doi: 10.1121/10.0004239

PMCID: PMC8087449 PMID: 33940879

Abstract

In this study, we revisit consonantal perturbation of F₀ in English, taking into particular consideration the effect of alignment of F₀ contours to segments and the F₀ extraction method in the acoustic analysis. We recorded words differing in consonant voicing, manner of articulation, and position in syllable, spoken by native speakers of American English in both statements and questions. In the analysis, we compared methods of F₀ alignment and found that the highest F₀ consistency occurred when F₀ contours were time-normalized to the entire syllable. Applying this method, along with using syllables with nasal consonants as the baseline and a fine-detailed F₀ extraction procedure, we identified three distinct consonantal effects: a large but brief (10–40 ms) F₀ raising at voice onset regardless of consonant voicing, a smaller but longer-lasting F₀ raising effect by voiceless consonants throughout a large proportion of the following vowels, and a small lowering effect of around 6 Hz by voiced consonants, which was not found in previous studies. Additionally, a brief anticipatory effect was observed before a coda consonant. These effects are imposed on a continuously changing F₀ curve that is either rising-falling or falling-rising, depending on whether the carrier sentence is a statement or a question.

I. INTRODUCTION

When a non-sonorant consonant occurs in a speech utterance, the vibration of the vocal folds is affected in two major ways. First, voicing may be interrupted, resulting in a break of otherwise continuous fundamental frequency (F₀) trajectory. This can be referred to as a horizontal disruption or voice break. Second, F₀ around the voice break may be raised or lowered because of the consonant. This is usually known as consonantal perturbation of F₀ (Hombert et al., 1979; Ohala, 1974). Other names include pitch skip (Haggard et al., 1970; Hanson, 2009), micro F₀ (Kohler, 1990), and CF0 (Kingston, 2007; Kirby and Ladd, 2016). We will refer to the raising and lowering effects as vertical perturbation in order to distinguish them from the effects of voice break. This distinction is necessary because research on the effects of consonants on F₀ over the past decades has focused predominantly on vertical perturbation, while the effects of voice break have received much less attention. As will be demonstrated, the assessment and interpretation of vertical perturbation is contingent on the treatment of voice break in F₀ measurement. In particular, full consideration of voice break may help answer four critical questions: (a) Are there both raising of F₀ by voiceless consonants and lowering of F₀ by voiced consonants? (b) Are there multiple mechanisms that jointly contribute to F₀ perturbation? (c) Are there both carryover and anticipatory F₀ perturbations? And (d) is F₀ perturbation affected by intonation?

A. Vertical perturbation and macro vs micro F₀

As early as in the middle of the last century, House and Fairbanks (1953) measured mean F₀ averaged across the entire vowel in English and found that it was higher after voiceless consonants than after voiced consonants.¹ A similar finding was made by Lehiste and Peterson (1961) with peak F₀ as the measurement. Lea (1973) investigated the time course of the consonant perturbation and found that F₀ first rose after a voiceless consonant and then decreased throughout the vowel, while the opposite was true of voiced consonants. Hombert (1978) and Hombert et al. (1979) also reported a rise-fall dichotomy in the mean F₀ curves, as shown in Fig. 1, which has since been often cited as the prototypical dichotic consonantal perturbation of F₀. Later studies, however, started to show a more complex picture. Ohde (1984) and Silverman (1984) reported that F₀ fell after all obstruent consonants regardless of their voicing. Hanson (2009) applied an improved method to examine the time course of F₀ perturbation by including nasal consonants as the baseline. She found that F₀ was raised after voiceless consonants but not lowered after voiced ones. However, the rise-fall dichotomy remains a widely accepted notion, especially in its use as a key trigger for tonogenesis (Chen et al., 2017; Evans et al., 2018; Gao and Arai, 2019; Hill, 2019).

FIG. 1. — Average F₀ values of vowels following English voiced and voiceless bilabial stops in real time, aligned at vowel onset (adapted from Fig. 1 in Hombert *et al.*, 1979).

There has been less work on the anticipatory F₀ perturbation by consonants. Hombert et al. (1979) found no perturbation effect on the preceding vowels and Lehiste and Peterson (1961) reported that there was no consistent effect for English. Kohler (1982), however, found that F₀ was lowered before voiced stops in contrast with voiceless stops when the sentence intonation is falling but not in sentences with either monotone or rising intonation. Silverman (1984) also reported a dichotomy in the preceding vowels according to consonant voicing.

As summarized above, there is still no clear consensus on vertical perturbation either as a carryover or anticipatory effect. In fact, two major issues remain unresolved. The first is the underlying cause of vertical perturbation. Two mechanisms have been proposed. The first is the aerodynamic hypothesis (Ladefoged, 1967), according to which the release of a voiceless stop is accompanied by a high rate of airflow across the glottis, which would increase the rate of vocal fold vibration. During a voiced consonant, on the other hand, the flow of air across the glottis is reduced, thus lowering pitch. The chief argument against this view is that the observed perturbatory effect lasts too long to be due to an aerodynamic effect. Löfqvist et al. (1995) have shown that the release of voiceless consonants is indeed accompanied by increased airflow, but only for a brief period of time, whereas vertical F₀ perturbation can last for at least 100 ms (Hombert et al., 1979).

An alternative hypothesis is that there is an adjustment of the tension of the vocal folds during the production of the consonant depending on voicing (Halle and Stevens, 1971). This is supported by electromyography (EMG) recordings that show higher cricothyroid (CT) activity during voiceless consonants than during voiced consonants (Dixit, 1975; Löfqvist et al., 1989). Also, significant voicing differences have been found in the vertical position of the larynx (Ewan and Krones, 1974) and the pharyngeal cavity (Bell-Berti, 1975; Westbury, 1983). The changes in the tension of the vocal folds would affect phonation threshold (Berry et al., 1996). In addition, the changes in laryngeal height would affect transglottal pressure (Hanson and Stevens, 2002). Both types of changes would help to stop voicing for voiceless consonants and sustain voicing for voiced consonants, but both of them would also affect F₀. The problem with this hypothesis is in fact part of the second unresolved issue about vertical perturbation: do voiced consonants actually lower F₀ or do they have no effects on F₀? So far there is no clear evidence that F₀ is lowered after voiced obstruents due to vocal folds slackening or larynx lowering. Hanson (2009) finds that F₀ following phonologically voiced stops in English is actually slightly higher than the nasal baseline. Kirby and Ladd (2016) reported that even for French and Italian voiced consonants (which are phonetically prevoiced consonants), there was only a marginal F₀ lowering after the oral closure according to the mean F₀ contours, and the effect was not statistically significant. These results have been further replicated in Kirby et al. (2020).

The above two possibilities have been considered as the only two alternative mechanisms so far. There is a third possibility that has not been contemplated before, however. That is, it is also possible that an aerodynamic effect and the effect of vocal fold tension both occur, but they differ in temporal scale. The aerodynamic effect may occur right after voice onset, but fade away quickly (Löfqvist et al., 1995), while the vocal fold tension effect may have a slow onset, but last longer (Hanson, 2009).

One of the reasons for the lack of consensus is that the observation of vertical perturbation may be affected by the method of its assessment. Silverman (1986) points out that the effect of consonantal perturbation cannot be properly understood unless the underlying intonation is well controlled. For example, if a consonant happens to occur in the course of a rising intonation, the F₀ rise after the consonant release may not be entirely due to the consonant. He further reports that, once the underlying intonation is taken into consideration, there is no more rise-fall dichotomy due to stop voicing in English because F₀ falls after both voiced and voiceless stops, except that the fall in the former is shallower than in the latter. Silverman's argument is shadowed by the notion of macro versus micro F₀ (Kohler, 1982, 1990), the first of which refers to stress and intonation, and the second to segmental effects. Kohler (1982) reported that in German the F₀ divergence after voiced and voiceless consonants was large in rising or monotone contours but not in falling contours, while the effect of voicing of a following stop in F₀ was observable only in falling contours.

It is not always obvious what an underlying intonation looks like around a consonant, however. Although one could infer it from the F₀ trajectories before and after the consonant, it is also possible that a sharp pitch turn takes place right before, after, or even during the consonant. When that happens, the assessment of vertical perturbation becomes tricky. What is needed is a careful consideration of the relation between underlying intonation and voice break.

B. Voice break and F₀-syllable alignment

In a sentence consisting of only vowels and sonorant consonants, like the Mandarin phrase /hei1 ni2 li3 mao4/ (black woolen hat) in Fig. 2(a) (where the numbers indicate the high, rising, low, and falling tones, respectively), the F₀ trajectory would be largely smooth and continuous throughout the utterance. This is because the tension of the vocal folds, which is mainly responsible for F₀, cannot change instantaneously. A voluntary pitch change of just one semitone would take over 100 ms to complete on average (Xu and Sun, 2002). Once obstruent consonants occur in an utterance, continuous F₀ is interrupted by the voice breaks during the constriction and sometimes also during the release, as is the case with the Mandarin expression /shan1 qiong2 shui3 jin4/ (no way out) in Fig. 2(b). A question then arises as to whether the voice break also interrupts the continuous adjustment of vocal fold tension. This question might seem unwarranted, as how can there be F₀ adjustment when there is no voicing? Continuous adjustment of F₀ regardless of voicing is nonetheless possible if F₀ control and voicing control are relatively independent of each other. The control of fundamental frequency mainly relies on adjusting vocal fold tension by rotating the thyroid cartilage at its joints with the cricoid cartilage (Hollien, 1960), which mainly involves the antagonistic contraction of the CT and the thyroarytenoid (TA) muscles, supplemented with the adjustment of laryngeal height and subglottal pressure by the contraction of the thyrohyoid, sternohyoid, and omohyoid muscles (Atkinson, 1978). Voicing control, on the other hand, is done by abduction and adduction of the vocal folds, which mainly involves the lateral cricoarytenoid (LCA) and the interarytenoid muscles (Farley, 1996; Zemlin, 1968). The relative independence of F₀ and voicing control makes it possible to adjust the tension of the vocal folds even when they are not vibrating.

A further issue is how exactly F₀ contours should be aligned relative to the syllable. It has been shown that the F₀ contour of a syllable in English is a movement toward an underlying pitch target associated with lexical stress as well as other concurrent functions (Fry, 1958; Liu et al., 2013; Xu and Xu, 2005). It is further shown that such target approximation movement is synchronized with the syllable in English (Prom-on et al., 2009; Xu and Prom-on, 2014; Xu and Xu, 2005), just like in Mandarin (Xu, 1998, 1999), i.e., starting from the syllable onset and ending by syllable offset (Xu and Wang, 2001; Xu, 2020).

Assuming that the target approaching F₀ movement is indeed synchronized with the syllable in English, the full effect of voice break would be most clearly seen by using sonorant consonants like nasals as the reference, as they allow F₀ to be fully continuous with little vertical perturbation (Xu, 1999; Xu and Xu, 2005). Figure 3 is an illustration based on data from the present study. Here, the solid curve represents the F₀ contour of a syllable with a nasal onset, and the dotted and dashed curves represent those in syllables with voiced and voiceless initial stops, respectively. All the contours are aligned by the onset of the consonant closure on the left and by the offset of the vowel on the right. The time in between is normalized across all the contours. As can be seen, F₀ in both stops starts much later than in the nasal, but they also differ from each other in timing, because voiceless stops have longer voice onset time (VOT) than voiced consonants. What is important is that the estimated vertical perturbation would be different if the alignment of F₀ contours is changed. If the onset of the non-sonorant consonant contours is shifted leftward, the magnitude of the estimated perturbation would increase. Furthermore, if the onset of voiceless consonants is shifted leftward to align with the voiced consonants, the difference between them in perturbation would also increase. Therefore, how F₀ onsets are aligned to each other is a potential confound in the assessment of vertical perturbation.

FIG. 3. — (Color online) Schematic illustrations of different procedures of measuring vertical F₀ perturbation. The curves represent F₀ contours in syllables that start with a nasal consonant (solid), a voiced consonant (dotted), or a voiceless consonant (dashed). In (a), time is normalized across the syllable; in (b) time is actual time, aligned at the syllable onset; and in (c), time is normalized across the consonant closure and the vowel, respectively.

In previous studies (Chen, 2011; Chen et al., 2017; Lea, 1973; Hombert, 1978; Jun, 1996; Ohde, 1984), including also those that have used nasal consonants as reference (Hanson, 2009; Kirby and Ladd, 2016; Kirby et al., 2020), F₀ contours have always been aligned at the onset of the vowel when estimating F₀ perturbation, as in Fig. 3(c). They differ only in terms of whether there are additional alignment points and whether time-normalization is applied. Some studies applied fixed time windows for the F₀ contours under comparison: 80 ms in Chen (2011), 100 ms in Jun (1996), and 150 ms in Hanson (2009). Instead of fixed time windows, Kirby and Ladd (2016) and Kirby et al. (2020) aligned the F₀ contours at vowel onset and offset, and then applied time-normalization across the vowel. The same method was also used by Gao and Arai (2019). By aligning F₀ contours at vowel onset, however, the potential effects of voice break on the assessment of vertical perturbation cannot be seen. Part of the goal of the present study is therefore to find this missing information by considering alternative alignments such as those shown in Figs. 3(a) and 3(b).

A further methodological issue is the quality of F₀ trajectory extraction. The finding of two different kinds of F₀ perturbation in the present study may help to explain the low consensus on the rise-fall dichotomy between voiced and voiceless stops in previous studies. Those that do not catch the initial jumps (House and Fairbanks, 1953; Lehiste and Peterson, 1961; Lea, 1973; Hombert et al., 1979; Hanson, 2009) tend to report a simple voicing contrast, with F₀ following voiceless stops being higher than F₀ following voiced stops. When the initial jumps are preserved, F₀ is observed to fall after both types of consonants (Ohde, 1984; Silverman, 1984; Hanson, 2009³). In our statistical comparison of the initial jump of voiced and voiceless stops, the conventional way of F₀ processing that removes the abrupt F₀ shift with trimming and smoothing led to a statistically significant voicing contrast. However, when the initial jump was preserved, the F₀ following voiced and voiceless obstruent consonants was statistically indistinguishable.

C. The present study

The present study is designed to answer the four critical questions raised in Sec. I by assessing the size and manner of vertical perturbation based on direct comparisons of syllable-wise F₀ contours both before and after the consonant closure. The new approach takes a more careful consideration of alignment and time normalization than has been done before, based on a number of assumptions. First, as discussed in the above section, the adjustment of vocal fold tension should be continuous (rather than in a temporary halt) during the consonant closure. Second, each syllable should have a targeted pitch pattern or pitch target in English as one of its articulatory goals, and this pitch target is associated with word stress as well as other concurrent functions (Fry, 1958; Liu et al., 2013; Xu and Xu, 2005). Third, the F₀ movement toward the pitch targets is fully synchronized with the syllable in English (Prom-on et al., 2009; Xu and Prom-on, 2014; Xu and Xu, 2005), as it is in Mandarin (Xu, 1998, 1999).

Another major source of discrepancy in previous reports of perturbation is the technical precision in F₀ extraction. Earlier studies compared F₀ values at a few acoustic landmarks or averaged across a long interval (House and Fairbanks, 1953; Lehiste and Peterson, 1961). Later experiments have often used autocorrelation with large smoothing windows to extract F₀ contours (Kingston, 2007; Kirby and Ladd, 2016). These methods are not highly sensitive to brief changes in fundamental frequency. As shown by Ohde (1984), brief pitch spikes can often be found at consonant offsets when F₀ is computed directly from vocal cycles. Those spikes are consistent with the F₀ falls at the voice onset reported by Silverman (1984). When using F₀ extraction algorithms with sizable smoothing windows, the spikes might be missed entirely, or smoothed into the following contour, creating the appearance of a long-lasting perturbation (see Fig. 1). In order to catch any consistent but brief perturbations, there is a need to extract F₀ directly from vocal cycles, as will be described in Sec. II D.

II. METHOD

A. Stimuli

The stimuli (Table I) were chosen to allow variation of a target consonant within a varying linguistic context. Target consonants were nasals, voiced and voiceless fricatives, stops and stop-sonorants, and voiceless affricates. These were embedded in CV syllables, CVC syllables with a nasal as the first consonant, and CVCV words with either a nasal or a lateral as the first consonant. The target words were embedded in the carrier sentences “I should say W next time.” and “Should I say W next time?” The carriers were chosen to prevent the target consonants from being resyllabified with surrounding contexts (Xu, 1998).

TABLE I.

Words used as stimuli, in different syllable structures and word length.

Consonant type	CV voiceless	CV voiced	CVC voiceless	CVC voiced	CVCV voiceless	CVCV voiced
Nasal		nay		name		Mamie
Fricative	say	they	mace	nave	Macy	Maisie
Stop	tay	day	make	Meig	Laky	lady
Stop-sonorant	tray	dray				
Affricate	Che					

B. Subjects

Subjects were four women and four men, all residents of New Haven, CT, and mostly students at Yale University. Their ages ranged from 20 to 54 years (from 20 to 24, excluding one subject), and all were native speakers of General American English. One subject, who had no difficulty with the task, had received six months of speech therapy as a young child, to treat a minor lisp. Otherwise, no speech or language disorders were reported.

C. Recording procedure

The recording was done in a soundproof studio at Haskins Laboratories, New Haven, CT. Subjects sat before a computer screen, on which one stimulus sentence appeared at a time. They read each sentence out loud into a head-mounted microphone and were recorded digitally onto the hard drive of an Apple Macintosh computer. Each sentence was presented five times. To elicit a narrow focus on the target word, we presented it in all capital letters and instructed subjects to emphasize it. Other intonational patterns, noticeable pauses, or voicing anomalies (most commonly creaky voice) rendered some tokens unusable. When this was noticed during the recording, the subject was asked to repeat the sentence. Some problems were not noticed, however, and occasionally both instances of a repeated token turned out to be usable, so the actual number of tokens was in some cases more or less than five.

D. Pitch extraction and processing

Phonetic data were extracted using a special version of ProsodyPro (Xu, 2013), a Praat (Boersma and Weenink, 2020) script for large-scale analysis of speech prosody. The script first used Praat's To PointProcess function to mark all the vocal cycles. The marked cycles were then manually rectified before being converted to F₀ curves. Segment boundaries were manually labeled at the onset of consonant closure and at the onset of vowel formants in both the target word and part of the carrier (… say __ next…), as illustrated in Fig. 4.

FIG. 4. — (Color online) An example of segmentation of consonantal and vocalic intervals.

In the case of the sentence “I should say name next time,” the boundary between [m] and [n] was not always easy to determine from the waveform or the spectrogram. Sometimes there was a faint burst that accompanied the labial release, and this was marked as the boundary, as shown in Fig. 5(a). Otherwise, the boundary was marked in the center of geminated nasal murmur [Fig. 5(b)].

FIG. 5. — (Color online) (a) An example of a burst at labial release between [m] and [n]. (b) An example of an arbitrary boundary in the middle of a nasal geminate.

Further analyses were performed using a custom-written version of ProsodyPro. The F₀ curves were trimmed with an algorithm described in Xu (1999), to remove sharp spikes. The vocal cycle next to a silent interval longer than 33 ms was exempted from this trimming to preserve the sharp spikes that consistently occur at voice onset and offset (based on the assumption that normal F₀ would not go below 30 Hz). The statistical analysis was conducted in R (R Core Team, 2020) using linear mixed-effects models fitted with lme4 (Bates et al., 2015), with emmeans (Lenth et al., 2020) for post hoc tests. Random intercepts for SUBJECT and by-SUBJECT random slopes for fixed effects were then incorporated maximally (Barr et al., 2013). Subsequently, potential fixed effects were added. A fixed effect was retained only if a likelihood-ratio test showed that the model including it fit better than the less specified model without it.
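The relation between marked vocal cycles, per-cycle F₀, and the 33 ms silence criterion can be sketched as follows. This is only an illustration under stated assumptions and not the ProsodyPro implementation: the function name `cycles_to_f0`, the tuple layout, and the choice to stamp each F₀ value at the cycle midpoint are hypothetical, and the trimming algorithm of Xu (1999) itself is not reproduced. The sketch only flags the cycles that would be exempted from trimming because they border a voice break.

```python
def cycles_to_f0(pulse_times, max_gap=0.033):
    """Convert vocal-cycle marks (in seconds) to per-cycle F0 samples.

    Returns a list of (time, f0_hz, next_to_silence) tuples, one per cycle.
    An interval longer than max_gap (33 ms, i.e. below roughly 30 Hz) is
    treated as a silent interval rather than as a vocal cycle.
    Assumption: utterance edges are not counted as silent intervals here.
    """
    intervals = [t1 - t0 for t0, t1 in zip(pulse_times, pulse_times[1:])]
    samples = []
    for i, period in enumerate(intervals):
        if period > max_gap:
            continue  # a voice break, not a vocal cycle
        before_silent = i > 0 and intervals[i - 1] > max_gap
        after_silent = i + 1 < len(intervals) and intervals[i + 1] > max_gap
        midpoint = pulse_times[i] + period / 2
        # F0 of a single cycle is the reciprocal of its period
        samples.append((midpoint, 1.0 / period, before_silent or after_silent))
    return samples


# Hypothetical pulse marks: three cycles, an 80 ms voice break, three cycles
pulses = [0.000, 0.005, 0.010, 0.015, 0.095, 0.099, 0.104, 0.109]
for time, f0, exempt in cycles_to_f0(pulses):
    print(f"{time:.4f} s  {f0:6.1f} Hz  exempt from trimming: {exempt}")
```

Computing F₀ cycle by cycle in this way, with no smoothing window, is what allows a brief spike at voice onset or offset to survive into the contour.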

III. RESULTS

A. Graphical comparison of F₀ contours

Before deciding what measurements to take for statistical analysis, we first made direct comparisons of the F₀ contours to identify major differences between the conditions. Figure 6 shows examples of mean F₀ contours by individual subjects, with Fig. 6(a) showing those of the target word /nay/ in a statement and Fig. 6(b) in a question. The vertical differences in F₀ are large, with female subjects tending to have higher fundamental frequencies. There are some differences in the location of the F₀ peaks. Regardless of the differences in the vertical level and the peak location, however, all speakers show similar general patterns.

FIG. 6. — (Color online) (a), (b) Sample mean F₀ contours for the target word “nay” embedded in declarative (left, a) and interrogative (right, b) sentences.

Figure 7 shows mean F₀ contours with different ways of alignment and normalization. F₀ of CV syllables and parts of the carrier sentence in statements are aligned at vowel voice onset (a), syllable onset (b), syllable offset (c), and normalized across the entire syllable with alignment at both syllable edges (d). For display purposes only, each contour is an average across all repetitions by all subjects of the given stimulus. When averaging, each segment of each token is sampled at 20 evenly spaced points. In the real-time plots, the mean time and F₀ of each of the points were averaged across repetitions and speakers. For the time-normalized plots, the mean time of each type of consonant was recalculated with reference to the mean time of nasals to align these points at both syllable onset and offset. The average plots in Figs. 7–9 reliably represent our data (see the supplementary material² for individual plots for all participants).
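The sampling and averaging step described above can be sketched as follows. The helper names (`interp`, `resample_segment`, `mean_contour`) and the token dictionary layout are hypothetical, and the sketch makes two simplifying assumptions: the 20 points include both segment edges, and the segment is voiced throughout (as in a nasal-initial syllable), so that linear interpolation between F₀ samples is meaningful.

```python
def interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def resample_segment(times, f0, start, end, n_points=20):
    """Sample an F0 track at n_points evenly spaced times in [start, end]."""
    step = (end - start) / (n_points - 1)
    return [interp(start + k * step, times, f0) for k in range(n_points)]


def mean_contour(tokens, n_points=20):
    """Average time-normalized contours point by point across tokens."""
    resampled = [
        resample_segment(t["times"], t["f0"], t["start"], t["end"], n_points)
        for t in tokens
    ]
    return [sum(column) / len(column) for column in zip(*resampled)]


# Two hypothetical tokens of different duration, averaged at 5 points each
tokens = [
    {"times": [0.0, 0.1, 0.2], "f0": [100.0, 120.0, 110.0], "start": 0.0, "end": 0.2},
    {"times": [0.0, 0.2, 0.4], "f0": [200.0, 240.0, 220.0], "start": 0.0, "end": 0.4},
]
print(mean_contour(tokens, n_points=5))
```

Because every token contributes the same number of points regardless of its duration, averaging is done at matching relative positions within the segment rather than at matching absolute times.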

FIG. 7. — (Color online) (a)–(d). Mean F₀ contours in target CV syllables (also showing parts of the carrier sentence) with different types of consonants in declarative sentences. The methods of alignment and time-normalization are specified below each plot. The vertical lines indicate the alignment points, and the symbolic markers indicate segment boundaries. The consonants having the same manner of articulation are in paired colours with different grayscale values. The voiced consonants are darker than their voiceless counterparts.

FIG. 8. — (Color online) (a)–(d) Mean F₀ contours of vowels following target consonants in CV syllables (also showing parts of the carrier sentence) with different types of consonants in interrogative sentences. The methods of alignment and time-normalization are specified below each plot. The vertical lines indicate the alignment points, and the symbolic markers indicate segment boundaries. The consonants having the same manner of articulation are in paired colours with different grayscale values. The voiced consonants are darker than their voiceless counterparts.

FIG. 9. — (Color online) Mean F₀ contours of vowels following target consonants in CVC syllables [(a) and (b)] and CVCV [(c) and (d)] and parts of carrier sentences. The time points of consonants are normalized with reference to the mean time points of nasals. Carrier sentence is declarative [left, (a) and (c)] or interrogative [right, (b) and (d)]. The vertical lines indicate the alignment points and the symbolic markers indicate segment boundaries. The consonants having the same manner of articulation are in paired colours with different grayscale values. The voiced consonants are darker than their voiceless counterparts.

In order to establish an appropriate reference level, we plotted F₀ curves using the syllable-wise alignment and conventional alignment methods employed in previous research. As can be seen in Fig. 7, methods of alignment and time-normalization both have clear consequences. When aligned at voice onset [Fig. 7(a)] following previous studies (Lea, 1973; Hombert, 1978; Ohde, 1984; Jun, 1996; Hanson, 2009; Chen, 2011), the F₀ curves of different consonants vary greatly both before and after the consonants. Aligning the F₀ contours at syllable onset [Fig. 7(b)] results in variations at the end of the syllable and the following contexts. When the F₀ contours are aligned at both vowel onset and offset [Fig. 7(c)], as done in Kirby and Ladd (2016), Kirby et al. (2020), and Gao and Arai (2019), the amount of cross-consonant F₀ difference is as large as in Fig. 7(a). Time normalizing F₀ curves between the onset and offset of the target syllable [Fig. 7(d)] seems to exhibit the least variable F₀ patterns across consonant types both within the target syllable and in the surrounding carrier sentences. In the following analysis, therefore, we will focus on comparing F₀ contours time-normalized with respect to the syllable.

Looking more closely at Fig. 7(d), we can see that, with the exception of voiced fricatives, F₀ is first perturbed upward by non-sonorant consonants relative to the nasal baseline, although there are also apparent differences in voice onset time between various types of consonants. Afterward, for most of the consonant types, F₀ drops sharply toward the nasal baseline and starts to shadow its contour shape for the rest of the syllable. However, for voiceless stops, surprisingly, F₀ first rises rather than falls, and then also starts to shadow the nasal contour. Besides the initial drop or rise, there are also apparent differences between the consonant types in subsequent overall F₀ height, with voiceless consonants generally having higher F₀ than voiced consonants. These height differences, though gradually reducing over time, persist all the way to the end of the vowel.

Figure 8 displays F₀ contours in questions with various alignment and time-normalization schemes. Again, F₀ is perturbed upward after all non-nasal segments, although there is much variation in terms of perturbation size. After this initial jump, like in statements, F₀ quickly drops toward the nasal baseline and starts to shadow its shape for the rest of the syllable duration. Interestingly, voiceless stops again show the smallest perturbation/jump among the voiceless consonants. But unlike in statements, F₀ drops rather than rises after the initial jump. Presumably, the initial jump, though small in size, has raised F₀ much higher than the targeted low F₀ represented by the nasal contour. Also, like in statements, the overall F₀ height after the initial jump is higher in voiceless consonants than in voiced consonants.

Figure 9 shows F₀ contours of CVC [Figs. 9(a) and 9(b)] and CVCV [Figs. 9(c) and 9(d)] syllables with part of the carrier sentences in statements and questions. In both cases, the target consonant is the second consonant in the sequences. These syllables enable the examination of anticipatory effects of obstruent consonants on the preceding F₀ within and across syllable boundaries. For CVC syllables, as can be seen in Figs. 9(a) and 9(b), pre-closure F₀ of non-sonorant consonants in statements inevitably drops sharply after reaching a peak. But before those drops, the overall F₀ height is raised in all cases relative to the nasal baseline. Interestingly, here the consonants seem to be grouped by voicing in statements. Similar overall raising of F₀ height by coda consonants is also seen in questions, except that there are no sharp drops before consonant closure. In contrast, for CVCV syllables, as shown in Figs. 9(c) and 9(d), the F₀ contours of vowels preceding the target consonants do not seem to diverge in either statements or questions. Instead, the lack of the anticipatory effect appears to parallel what we have seen in Figs. 7 and 8 for CV syllables, where the F₀ of vowels in the carrier words converges regardless of the upcoming consonants.

To summarize the graphical comparison, with F₀ contours of nasal consonants as the baseline, a number of initial observations can be made. First, non-sonorant initial consonants seem to exert two kinds of perturbations: (a) an abrupt initial jump in F₀ at voice onset, followed by either a sharp drop or rise (voiceless stop in statement), and (b) a sustained raising (voiceless consonant) or lowering of F₀ height throughout the rest of the syllable. Second, non-sonorant coda consonants also seem to exert two kinds of perturbations: (a) an abrupt drop in F₀ right before voice offset in statements, and (b) a raising of F₀ that extends back toward the midpoint of the vowel. Finally, aspiration, especially in stops, seems to reduce the magnitude of the initial jump. This has led to a rise rather than a drop of F₀ immediately after voice onset in a statement. In the next section, we will run statistical tests on the raw data to verify the visual observations.

B. Statistical analysis

The graphical comparison of F₀ contours shows initial indication of three different kinds of influences by initial consonants on F₀: (a) a voice break that interrupts continuous F₀, (b) a brief yet sometimes large jump relative to the nasal baseline, and (c) a long lasting raising or lowering effect, also relative to the nasal baseline. To closely examine these influences, closure duration, onset F₀, F₀ jump, F₀ elbow, elbow jump, and offset F₀ of all the repetitions by each speaker were measured and analysed, as illustrated in Fig. 10. For voiceless consonants, the closure duration equals VOT, while for voiced consonants, it is the time elapsed between the oral closure and the onset of the following vowel (thus disregarding any voicing during closure). Onset F₀ is the conventional way of observing initial consonantal perturbation, which is the first F₀ point at the onset of the vowel. F₀ jump is a new measurement not used in previous studies, which indicates the difference between onset F₀ and the F₀ of nasal baseline at the same relative time in normalized time, in the same intonation. Similar to F₀ jump, elbow jump is another new measurement that indicates the difference between F₀ elbow and the F₀ of nasal baseline in the same intonation at the same relative time in normalized time, where F₀ elbow is the F₀ turning point after the initial F₀ jump. Finally, offset F₀ is the F₀ at the end of the vowel preceding a target consonant, which evaluates whether the perturbation effects last until the end of the syllable.

FIG. 10. — Illustration of onset F₀, F₀ jump, F₀ elbow, elbow jump, and offset F₀.
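The two baseline-relative measurements defined above (F₀ jump and elbow jump) amount to subtracting the nasal baseline, read off at the same normalized time, from the measured F₀. The following is a minimal sketch of that subtraction, assuming the baseline is available as a sampled contour and that linear interpolation between samples is acceptable; the function names and the example values are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate `values` (sampled at ascending `times`) at time `t`."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled range")
    # Index of the right-hand neighbour, clamped so that t == times[-1] works.
    i = min(bisect_right(times, t), len(times) - 1)
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def jump_relative_to_baseline(t_norm, f0_hz, base_times, base_f0):
    """F0 (Hz) minus the nasal baseline at the same normalized time.

    Applied to onset F0 this gives F0 jump; applied to F0 elbow it gives
    elbow jump. Both contours must share the same normalized time axis
    and come from the same intonation condition.
    """
    return f0_hz - interpolate(base_times, base_f0, t_norm)


# Hypothetical nasal baseline on a 0..1 normalized syllable time axis.
baseline_t = [0.0, 0.25, 0.5, 0.75, 1.0]
baseline_f0 = [150.0, 156.0, 160.0, 158.0, 150.0]

# Hypothetical obstruent token whose vowel starts at normalized time 0.4.
f0_jump = jump_relative_to_baseline(0.4, 196.0, baseline_t, baseline_f0)
```

Because the baseline is taken from the same intonation condition, intonation-driven differences in absolute F₀ cancel in the subtraction, which is the motivation given above for preferring the relative measures.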

1. Carryover effect

a. Consonant closure duration.

As we can see from Figs. 7 and 8, there are noticeable differences in closure time between various classes of consonants, and the shape of F₀ contours at the beginning of the following vowels is influenced by the duration of the closure. The longer the closure, the greater the magnitude of the initial F₀ perturbation, except for voiced stops. Table II lists means and standard deviations of closure duration of consonants in CV syllables separated by consonant types and intonation contexts. For the sake of data balance, statistical analysis was performed only on the stops, fricatives, and stop-sonorants that are minimal pairs. In a set of linear mixed models, CVOICE (voiced, voiceless), CMANNER (stop, fricative, and stop-sonorant), INTONATION (statement, question), and their interactions were included as potential fixed effects. CVOICE improves the fit of the model (χ² = 24.077, df = 1, p < 0.001): voiceless consonants tend to have longer closures than voiced consonants. CMANNER (χ² = 18.255, df = 2, p < 0.001) also significantly predicts closure duration. The post hoc comparison showed that stop-sonorants have longer closures than fricatives (p < 0.001) and stops (p = 0.046). Meanwhile, the closure duration of stops is longer than that of fricatives (p = 0.005). INTONATION (χ² = 2.591, df = 1, p = 0.108) does not significantly improve the model. The interaction between CVOICE and CMANNER (χ² = 10.861, df = 2, p = 0.004) is significant. When the consonant is voiceless, the contrast in closure duration between stops and fricatives is not significant (p = 0.895), but the contrast is significant in voiced consonants (p = 0.004).
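The p values reported with each χ² statistic can be checked by hand if one assumes, as the wording "improves the fit of the model" suggests, that each statistic is a likelihood-ratio statistic referred to a chi-square distribution with the stated degrees of freedom. The sketch below computes the upper-tail probability using only the Python standard library; `chi2_sf` is a hypothetical helper name, and the snippet does not fit any mixed model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("need x >= 0 and a positive integer df")
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    # Step up two degrees of freedom at a time with the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), where a = df / 2.
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Statistics quoted in the text for closure duration.
p_cvoice = chi2_sf(24.077, 1)       # reported as p < 0.001
p_intonation = chi2_sf(2.591, 1)    # reported as p = 0.108
p_interaction = chi2_sf(10.861, 2)  # reported as p = 0.004
```

Small discrepancies in the third decimal place are to be expected, since the χ² values in the text are themselves rounded.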

TABLE II.

Means (standard deviations) of closure duration (ms), onset F₀ (Hz), and F₀ jump (Hz).

Consonant type	Statement			Question
	Closure duration	Onset F₀	F₀ jump	Closure duration	Onset F₀	F₀ jump
Nasal	118 (21)	156 (43)	NA	117 (24)	148 (46)	NA
Voiced stop	122 (31)	174 (46)	18 (9)	118 (27)	170 (50)	22 (12)
Voiced fricative	102 (27)	157 (48)	2 (14)	99 (32)	152 (48)	4 (11)
Voiced stop-sonorant	134 (21)	163 (44)	7 (9)	119 (35)	158 (52)	10 (14)
Voiced consonant (excluding nasal)	119 (24)	165 (50)	9 (8)	112 (30)	160 (50)	12 (12)
Voiceless stop	175 (30)	177 (46)	13 (19)	171 (32)	166 (41)	18 (15)
Voiceless fricative	172 (26)	209 (52)	46 (24)	164 (23)	193 (51)	45 (15)
Voiceless stop-sonorant	189 (27)	192 (42)	27 (20)	175 (20)	178 (43)	30 (12)
Voiceless affricate	184 (29)	206 (47)	40 (15)	179 (26)	188 (51)	39 (24)
Voiceless consonant	179 (26)	196 (45)	32 (14)	172 (24)	182 (45)	33 (12)

The realisation of voicing in English consonants is influenced by linguistic contexts such as word position, adjacent consonants, and lexical stress (Davidson, 2016). Table III lists the percentages of phonetically voiced tokens among all phonologically voiced consonants. Voicing is more likely to begin during the constriction for voiced fricatives and voiced stop-sonorants than for voiced stops. Most of the voiced stops are realized as voiceless unaspirated stops (72%), while the percentages of phonetically voiceless fricatives (33%) and stop-sonorants (56%) are much lower. In addition, there are individual differences in voicing implementation. One of the speakers (F4) consistently devoiced all the voiced consonants, but the initial perturbation still differs substantially after voiced and voiceless consonants (see supplementary material² for by-speaker plots). For four of the speakers (F2, F3, M3, and M4), F₀ rises after voiceless stops, exhibiting a distinct pattern from other voiceless consonants (see supplementary material² for by-speaker plots).

TABLE III.

Percentages of phonetically voiced tokens in phonologically voiced stops, fricatives, and stop sonorants.

Consonant type	Intonation	F1	F2	F3	M1	M2	M3	M4
Stop	Statement	0	100	0	100	0	80	20
Stop	Question	20	60	0	60	0	100	20
Fricative	Statement	100	100	100	100	100	100	100
Fricative	Question	100	100	100	100	40	100	100
Stop-sonorant	Statement	20	100	20	100	20	100	80
Stop-sonorant	Question	40	100	20	100	20	100	60

b. Onset F₀ and F₀ jump.

As shown in the previous section, closure duration varies with voicing. These variations may affect F₀ at vowel onset, as seen in Figs. 7 and 8. The conventional way of only measuring onset F₀ does not take closure duration into consideration, which may have potentially exaggerated or masked true vertical perturbation. Here, we compare the onset F₀ of stop consonants measured by the conventional pitch-processing method based on autocorrelation with F₀ trimming and smoothing and by our new method (i.e., without trimming and smoothing). As can be seen in Fig. 11, when F₀ trimming and smoothing is applied, the onset F₀ differs by a large amount after voiced stops and voiceless stops. However, when F₀ is obtained without trimming and smoothing, the first few pitch values are very similar regardless of voicing feature.

FIG. 11. — (Color online) Schematic comparisons of F₀ perturbation following voiced and voiceless obstruent consonants when applied with (solid) and without (dotted) trimming and smoothing pitch processing.

The distributions of the onset F₀ and F₀ jump following voiced and voiceless stops obtained by different pitch processing methods are shown in Fig. 12. A clear distinction of voicing feature can be seen in the trimmed onset F₀, while no such effect is observable in the untrimmed onset F₀ and F₀ jump. We ran statistical tests on the onset F₀ and F₀ jump obtained by the two methods to see whether the pitch extraction and processing method had a significant impact. The main effect of CVOICE is significant only in the model for the trimmed onset F₀ (χ² = 8.386, df = 1, p = 0.003), and not for either the untrimmed onset F₀ (χ² = 0.008, df = 1, p = 0.930) or the untrimmed F₀ jump (χ² = 0.799, df = 1, p = 0.371). The results indicate that the contrast between F₀ following voiced and voiceless stops is exaggerated when trimming and smoothing are applied.

FIG. 12. — (Color online) Boxplots of trimmed onset F₀ (Hz) (left, a) and untrimmed onset F₀ (Hz) (centre, b) and untrimmed F₀ jump (Hz) (right, c) of vowels following voiced and voiceless stop consonants.

Following the new method, we further evaluated the initial perturbation of other consonant types by measuring both onset F₀ and F₀ jump, as summarized in Table II. As can be seen, the standard deviation (SD) of onset F₀ (SD = 51) is larger than that of F₀ jump (SD = 27) across different conditions. This is further confirmed in Fig. 13, where the boxplots show that F₀ jump is more consistent, i.e., has smaller variance, than onset F₀ in both statements and questions, especially for voiceless consonants.

FIG. 13. — (Color online) Boxplots of onset F₀ (Hz) (left, a) and F₀ jump (Hz) (right, b) of vowels following target consonants across voicing and intonation contexts.

The main effect of CVOICE is significant in the model for onset F₀ (χ² = 10.491, df = 1, p = 0.001) and F₀ jump (χ² = 8.398, df = 1, p = 0.004). Voiceless consonants show a greater onset F₀ as well as F₀ jump than voiced consonants. In contrast, CMANNER does not seem to have an impact on either onset F₀ (χ² = 4.268, df = 2, p = 0.118) or F₀ jump (χ² = 5.016, df = 2, p = 0.081). Further, INTONATION is non-significant for either onset F₀ (χ² = 2.664, df = 1, p = 0.103) or F₀ jump (χ² = 1.751, df = 1, p = 0.186).

The interaction between CVOICE and CMANNER is significant for both onset F₀ (χ² = 102.260, df = 4, p < 0.001) and F₀ jump (χ² = 104.950, df = 4, p < 0.001). As demonstrated in Fig. 14, the voicing contrast is more salient in fricatives (onset F₀: p < 0.001; F₀ jump: p < 0.001) and stop-sonorants (onset F₀: p < 0.001; F₀ jump: p = 0.012) than in stops (onset F₀: p = 1.000; F₀ jump: p = 0.968). It is worth noting that the interaction between CVOICE and INTONATION is significant in the model for onset F₀ (χ² = 8.136, df = 2, p = 0.017), whereas F₀ jump is not affected by the interaction (χ² = 1.751, df = 1, p = 0.186). As seen in Fig. 13, the onset F₀ of voiceless consonants is marginally higher in statements than in questions (p = 0.097), but that of voiced stops is similar across intonation (p = 0.786). For F₀ jump, which results from subtracting the nasal baseline from onset F₀, the interference from the interaction between voicing and intonation is eliminated.

FIG. 14. — (Color online) Interaction between voicing and manner of articulation in onset F₀ (left, a) and F₀ jump (right, b). Nasals and affricates are excluded.

What remains unclear is whether the voicing contrast in the initial perturbation is due to F₀ raising by voiceless consonants or F₀ lowering by voiced consonants. We plotted a histogram of F₀ jump for all consonant types in Fig. 15. As can be seen, except for voiceless stops, nearly all the F₀ jumps of voiceless consonants are above zero, which suggests a significant F₀ raise relative to nasals. And, interestingly, F₀ jumps in voiced stops are also distributed largely above zero. In contrast, voiced fricatives and voiced stop-sonorants contain both negative and positive values. This indicates that voiced stops significantly raise F₀ at vowel onset relative to the nasal baseline, just like voiceless consonants, which is consistent with the findings of Ohde (1984) and Silverman (1984). In other words, instead of F₀ lowering versus F₀ raising, voiced and voiceless stops differ only in the magnitude of F₀ raising as far as F₀ jumps are concerned.

FIG. 15. — (Color online) Histographic distributions of F₀ jump values by consonant type. The upper panel shows distributions of F₀ jump for voiced consonants and the lower panel for voiceless consonants. In each plot, the dashed vertical line marks the zero point on the x axis.

c. F₀ elbow and elbow jump.

As can be seen in Figs. 7 and 8, the initial F₀ jump does not last long and the F₀ trajectories of different consonants gradually converge toward the nasal baseline after a sharp turn. The turning point (F₀ elbow) occurs around 41 ms (SD = 22) after vowel onset. However, it is not the case that an F₀ elbow occurs after vowel onset in every utterance. The count and the height of F₀ elbow and elbow jump (the difference between F₀ elbow and the F₀ of nasal baseline in the same intonation at the same relative time point in normalized time, cf. Fig. 10) are summarized in Table IV. Figure 16 shows values of F₀ elbow and elbow jump in different voicing and intonation conditions. As in the case of onset F₀ and F₀ jump, more variance can be seen in F₀ elbow (SD = 45) than in elbow jump (SD = 15). We fitted separate models for F₀ elbow and elbow jump with CVOICE (voiced, voiceless), CMANNER (stop, fricative, stop-sonorant), INTONATION (statement, question), and their interactions as potential fixed effects. The main effect of CVOICE is significant on F₀ elbow (χ² = 17.339, df = 1, p < 0.001) and elbow jump (χ² = 9.270, df = 1, p = 0.002): Voiceless consonants have higher F₀ elbow values than voiced consonants. CMANNER does not improve the fit of the model for either F₀ elbow (χ² = 0.442, df = 2, p = 0.801) or elbow jump (χ² = 0.348, df = 2, p = 0.175). F₀ elbow differs across intonation patterns (χ² = 6.406, df = 1, p = 0.011): it is higher in declarative sentences than in interrogative sentences. In contrast, INTONATION does not significantly predict elbow jump (χ² = 1.074, df = 1, p = 0.3). Similar to the results of onset F₀ and F₀ jump presented earlier, the interaction between CVOICE and INTONATION significantly improves the fit of the model for F₀ elbow (χ² = 6.806, df = 1, p = 0.009) but not for elbow jump (χ² = 1.271, df = 2, p = 0.530). The F₀ elbow of voiceless consonants is higher in statements than in questions (p = 0.002), whereas that of voiced consonants does not differ significantly across intonation (p = 0.082) (see Fig. 16).
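The text does not spell out how the turning point was located, so the following is only an illustrative stand-in: it approximates the F₀ elbow as the sample at which the slope of the contour changes most sharply, and it reports no elbow when the contour never bends by more than a threshold. The function name, the threshold value, and the example contours are all hypothetical.

```python
def find_elbow(times_ms, f0_hz, min_bend_hz_per_ms=0.2):
    """Return (time, F0) at the sharpest change of slope, or None if there is no clear bend.

    Assumption: the elbow is approximated by the largest absolute difference
    between the slopes of adjacent segments. This is a stand-in heuristic,
    not the procedure used in the study.
    """
    if len(times_ms) != len(f0_hz) or len(times_ms) < 3:
        return None
    # Slope of each segment between consecutive samples, in Hz per ms.
    slopes = [
        (f0_hz[i + 1] - f0_hz[i]) / (times_ms[i + 1] - times_ms[i])
        for i in range(len(times_ms) - 1)
    ]
    best_index, best_bend = None, min_bend_hz_per_ms
    for i in range(1, len(slopes)):
        bend = abs(slopes[i] - slopes[i - 1])  # slope change at sample i
        if bend > best_bend:
            best_index, best_bend = i, bend
    if best_index is None:
        return None
    return times_ms[best_index], f0_hz[best_index]


# Hypothetical contour: a steep fall after voice onset that levels off.
elbow = find_elbow([0, 10, 20, 30, 40, 50, 60],
                   [200.0, 185.0, 170.0, 160.0, 158.0, 157.0, 156.0])

# Hypothetical contour with a constant slope: no elbow is reported.
no_elbow = find_elbow([0, 10, 20, 30], [180.0, 175.0, 170.0, 165.0])
```

Returning `None` for contours without a clear bend mirrors the observation above that an F₀ elbow does not occur in every utterance, which is why Table IV reports counts alongside the means.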

TABLE IV.

The number of F₀ elbow/total available tokens and means (standard deviations) (in Hz) by intonational patterns and consonant types.

Consonant type	Statement			Question
	Count	F₀ elbow	Elbow jump	Count	F₀ elbow	Elbow jump
Voiced stop	22(40)	161(42)	1(14)	18(39)	139(35)	−4(10)
Voiced fricative	26(40)	161(41)	6(13)	27(40)	144(41)	0(10)
Voiced stop-sonorant	17(38)	167(39)	−13(13)	24(39)	150(45)	−1(6)
Voiced consonants (excluding nasal)	65(118)	163(40)	0(15)	69(118)	145(41)	−1(9)
Voiceless stop	21(40)	188(50)	13(17)	17(37)	157(37)	9(10)
Voiceless fricative	21(39)	160(39)	8(12)	16(40)	144(44)	−1(7)
Voiceless stop-sonorant	25(38)	184(43)	8(16)	14(39)	163(43)	11(16)
Voiceless affricate	29(38)	196(47)	12(18)	13(40)	162(41)	7(13)
Voiceless consonants	96(155)	183(46)	10(16)	60(156)	156(41)	6(13)

FIG. 16. — (Color online) Boxplots of F₀ elbow (a) and elbow jump (b) separated by consonant voicing and intonation context. See Fig. 10 for definitions of F₀ elbow and elbow jump.

Figure 17 shows the values of elbow jump for each consonant type. Even after the abrupt initial F₀ jump, there are still clear differences between the F₀ values after voiced and voiceless consonants. Compared with the distribution of F₀ jump (Fig. 15), the raising effects by voiceless consonants have reduced while the lowering effects of voiced consonants have become more evident.

FIG. 17. — (Color online) Histographic distributions of elbow jump values by consonant type. The upper panel shows distributions of elbow jump for voiced consonants and the lower panel for voiceless consonants. In each plot, the dashed vertical line marks the zero point on the x axis.

d. Offset F₀.

As seen in Figs. 7 and 8, the differences in F₀ across consonant types do not end at the F₀ elbows but are sustained through the rest of the syllable. Remarkably, the divergence in offset F₀ between voiced and voiceless consonants is due not only to the upward F₀ shifts following voiceless consonants but also to the downward F₀ shifts following voiced consonants. Means and standard deviations of offset F₀ under different conditions are provided in Table V. Offset F₀ following voiced consonants is considerably lower than the nasal baseline, whereas it is close to the nasal baseline following voiceless consonants. We ran a series of linear mixed models to test whether the voicing contrast remains statistically significant by the end of the syllable. CVOICE (voiced, voiceless) improves the fit of the model (χ² = 6.654, df = 1, p = 0.010): The offset F₀ of vowels following voiceless consonants is higher than that of vowels following voiced consonants. However, neither CMANNER (stop, fricative, stop-sonorant: χ² = 3.365, df = 2, p = 0.186) nor INTONATION (statement, question: χ² = 1.367, df = 1, p = 0.242) shows significant effects on the offset F₀. The results, therefore, indicate that the F₀ height difference due to voicing lasts until the end of the syllable.

TABLE V.

Means (standard deviations) of offset F₀ (Hz) following different types of consonants in declarative and interrogative carrier sentences.

Consonant type	Statement	Question
Nasal	168(61)	181(51)
Voiced stop	164(55)	176(48)
Voiced fricative	169(59)	178(52)
Voiced stop-sonorant	161(56)	172(46)
Voiced consonants (excluding nasals)	164(56)	176(47)
Voiceless stop	168(60)	183(49)
Voiceless fricative	168(60)	182(52)
Voiceless stop-sonorant	168(59)	183(53)
Voiceless affricate	173(62)	184(53)
Voiceless consonants	169(60)	183(52)

2. Anticipatory effect

a. Effect of syllable boundary.

The consonantal perturbation may impact not only the F₀ of the following vowel but also that of the preceding vowel. As shown in Figs. 9(a) and 9(b), F₀ contours of vowels preceding the coda consonants in CVC syllables do not converge. In contrast, vowels before the target consonants in CV syllables have very close F₀ values (Figs. 7 and 8), which is similar to the first vowels in CVCV syllables where the second consonant is an obstruent, as shown in Figs. 9(c) and 9(d). The means and standard deviations of offset F₀ for vowels in CVC syllables and for the first vowels in CV and CVCV syllables are listed in Table VI. We performed statistical analysis on the vowel offset F₀ with CVOICE (voiced, voiceless), CMANNER (stop, fricative), INTONATION (statement, question), and their interactions as potential fixed effects. In CVC syllables, the main effect of CVOICE (χ² = 10.018, df = 1, p = 0.002) is significant. The F₀ at the vowel offset is higher when followed by voiceless consonants than by voiced consonants. Neither CMANNER (χ² = 1.172, df = 1, p = 0.279) nor INTONATION (χ² = 1.061, df = 1, p = 0.303) significantly predicts the offset F₀. The interaction between CMANNER and INTONATION (χ² = 21.760, df = 2, p < 0.001) is significant: the contrast between stops and fricatives is more pronounced in questions (p < 0.001) than in statements (p = 0.095). In short, voicing and manner of articulation of coda consonants influence the F₀ of vowels right before the closure, and the effect interacts with sentence intonation.

TABLE VI.

Means (standard deviations) of offset F₀ (Hz) of vowels in CVC syllables, first vowels in CVCV syllables before syllable boundaries and first vowels in CV syllables before word boundaries in declarative and interrogative sentences.

Consonant type	Statement			Question
	CV	CVC	CVCV	CV	CVC	CVCV
Nasal	152(45)	175(53)	190(52)	150(45)	171(52)	166(51)
Voiced stop	152(42)	167(52)	191(50)	147(46)	176(50)	165(47)
Voiced fricative	148(43)	162(58)	191(53)	145(47)	180(52)	174(50)
Voiced stop-sonorant	151(45)	NA	NA	142(40)	NA	NA
Voiced consonants (excluding nasal)	150(43)	164(55)	191(51)	145(44)	178(51)	169(49)
Voiceless stop	147(44)	190(59)	188(51)	146(45)	180(54)	164(47)
Voiceless fricative	152(46)	182(52)	194(52)	150(49)	199(56)	169(49)
Voiceless stop-sonorant	149(42)	NA	NA	144(41)	NA	NA
Voiceless affricate	152(47)	NA	NA	150(47)	NA	NA
Voiceless consonants	150(44)	186(55)	191(51)	148(45)	190(55)	167(48)

When the syllable boundary is not a word boundary, as in the case of offset F₀ in the first vowel of the CVCV syllable, the main effects of CMANNER (χ² = 5.507, df = 1, p = 0.019) and INTONATION (χ² = 5.905, df = 1, p = 0.015) are significant, while the main effect of CVOICE (χ² = 0.227, df = 1, p = 0.634) is not. No trace of F₀ differences at vowel offset before voiceless and voiced consonants was observed before syllable boundaries.

For vowel offset F₀ preceding CV syllables, when the syllable boundary between the target consonant and the preceding vowel is also a word boundary, the main effects of CVOICE (χ² = 0.056, df = 1, p = 0.814), CMANNER (χ² = 0.728, df = 2, p = 0.695), and INTONATION (χ² = 0.779, df = 1, p = 0.378) are not significant, and neither are the two-way and three-way interactions. The anticipatory F₀ perturbation is also missing here, just as in CVCV syllables. If we combine the findings of offset F₀ in vowels before obstruent consonants in the CV, CVC, and CVCV syllables, it seems clear that anticipatory F₀ modulation at vowel offset is only present within a syllable.

b. Time course of anticipatory F₀ perturbation in CVC syllables.

As seen in Figs. 9(a) and 9(b), in CVC syllables, F₀ contours vary visibly with different types of coda consonants. The differences are the greatest right before the consonant closure, which then gradually reduce leftward and eventually converge to the nasal baseline. Figure 18 plots the time course of the anticipatory F₀ perturbation effect in vowels preceding voiced and voiceless consonants in five in-syllable positions. We can see that F₀ is higher preceding voiceless consonants than preceding voiced consonants. The closer to the target consonant, the more prominent the contrast is. To examine the time course of the anticipatory effect, we fitted linear mixed models with TIME (five levels: onset, 1/4, 1/2, 3/4 of the vowel duration, and offset) being incorporated as a potential categorical fixed effect. In addition, CVOICE (voiced, voiceless), CMANNER (stop, fricative, stop-sonorant), INTONATION (statement, question), and their interactions are included as potential fixed effects. Detailed results of the linear mixed models can be found in Appendix A. The interaction between CVOICE and TIME is significant (χ² = 72.277, df = 4, p < 0.001). Post hoc comparisons show that the difference in the F₀ of vowels before voiced and voiceless consonants is significant only at the very end of the syllable (p < 0.001), but not at the beginning (p = 0.995), 1/4 (p = 0.990), 1/2 (p = 1.000), or 3/4 (p = 0.181) of the vowel duration. Overall, the results indicate that there is an anticipatory F₀ perturbation effect that emerges from the very end of the vowel.

FIG. 18. — (Color online) F₀ at five relative locations in the vowels preceding voiced consonants (nasals excluded) and voiceless consonants. Error bars show the standard errors.

IV. DISCUSSION

The present study aims at achieving an accurate assessment of the nature and scope of the consonantal perturbation of F₀ by testing a number of methodological measures: (1) applying a nasal baseline as the reference; (2) using syllable-wise time-normalization to align F₀ contours in different syllable structures; (3) calculating F₀ cycle-by-cycle without smoothing with a large window; and (4) controlling underlying intonation in carriers spoken as either statements or questions. With these methods, we have found evidence that there are two rather different types of perturbations. One is a brief, yet sometimes large, F₀ jump at the vowel onset relative to the nasal baseline, and the other is a long-lasting raising or lowering of F₀ that persists all the way to the end of the syllable. In addition, we have also observed a brief anticipatory perturbation of F₀ before a coda consonant.

A. Large brief perturbations

From Figs. 7(d) and 8(d), we can see that the initial F₀ at vowel onset is in most cases well off the nasal baseline. We measured this initial deviation of F₀ in two different ways: onset F₀ (absolute F₀) and F₀ jump (relative to nasal baseline). Statistical results show a significant effect of consonant voicing on both onset F₀ and F₀ jump, but no effect of manner of consonant articulation. Onset F₀ is more variable than F₀ jump as a consequence of the interaction between consonant voicing and sentence intonation (see Fig. 13). The onset F₀ values of voiceless consonants are higher in statements than in questions. After this jump, in each case, F₀ quickly turns toward a trajectory that shadows the nasal baseline for the rest of the syllable. Despite the shadowing, in most cases, the long-term trajectories stay away from the nasal baseline, with the general tendency of higher F₀ after voiceless consonants and lower F₀ after voiced consonants. Thus, the initial jumps seem to be rather different from the longer-lasting effects. Figures 7(d) and 8(d) further show that, surprisingly, F₀ jump is much smaller after voiceless stops than after other voiceless consonants. In Fig. 7(d), after the release of a voiceless stop, F₀ even rises up to join the cluster of voiceless trajectories that are elevated well above the nasal baseline (which, as mentioned in Sec. III B 1 a, occurred in four of the eight speakers). This further implies that the initial jump is likely due to a different mechanism from the longer-term effects.

The first possibility is that the initial F₀ jump is due to an aerodynamic effect (Ladefoged, 1967). In that hypothesis, the buildup of oral pressure during a voiced stop reduces the pressure drop across the vocal cords, thus decreasing F₀ in the following vowel. In a voiceless stop, especially if it is aspirated, the high transglottal airflow at the release creates a boosted Bernoulli force, leading to increased F₀ in the following vowel (Hombert et al., 1979). However, the present data show that large F₀ jumps occur after the release of both voiced and voiceless obstruents. Moreover, at even greater odds with the aerodynamic hypothesis, voiceless stops show much smaller F₀ jumps than the other voiceless obstruents (Table II). This is unexpected under the aerodynamic account, given the finding of Löfqvist et al. (1995) that the level of airflow is greater after a voiceless stop than after a voiced stop.

Another possibility is that much of the F₀ jump could be due to a brief falsetto vibration (Xu, 2019). That is, the initial vibration at voice onset after an obstruent may involve only the outer (mucosal) layer of the vocal folds (Titze, 1994), which has a higher natural frequency than the main body of the vocal folds, due to its smaller mass (Miller et al., 2002). At the moment of voice onset, transglottal airflow is going through a sharp drop as the vocal folds are quickly being adducted for voicing. The adduction process has to first involve the outer layers of the folds before engaging the main body, and a vibration involving only the outer layer would generate F₀ at the falsetto register rather than the chest register (Titze, 1994). Falsetto vibration has been suggested to happen at utterance offsets, where F₀ is often observed to jump up abruptly in breach of the ongoing downward intonation contour (Xu, 2019). This brief falsetto vibration hypothesis would predict that the level of F₀ jump is related to the speed of vocal fold adduction at voice onset, as falsetto vibration is more likely to happen when the adduction speed is relatively slow. This would be the case in voiceless fricatives, which likely require precise control of transglottal airflow. As shown in Table II, voiceless fricatives indeed have the largest F₀ jumps in both statements and questions. The brief falsetto vibration hypothesis would also predict that the magnitude of F₀ jump can vary positively with boundary strength. We analyzed the F₀ following the medial consonant in CVCV syllables (see Appendix B for the descriptive statistics and Appendix C for the results of the linear mixed models). Compared with the initial consonant at the word boundary in CV syllables, the closure duration of the medial consonant is much shorter and the magnitude of F₀ jump is also smaller in CVCV syllables.

The brevity of the initial F₀ jump makes it tricky to capture in F₀ analysis, however, as illustrated in Fig. 19. All the F₀ contours in the figure were generated by taking the inverse of every vocal period to obtain the raw F₀, and then applying a trimming algorithm (Xu, 1999) to prune very local spikes. They differ only in (a) whether the trimming is applied across silent intervals (edge-trimmed), and (b) whether a smoothing filter is applied after trimming. In Fig. 19(a), trimming was not applied across silent intervals longer than 33 ms (i.e., when F₀ would go below 30 Hz). With this method (which was used in the present study), the large F₀ jumps (relative to the nasals) as well as the sharp drops are clearly visible. In Fig. 19(b), trimming was again not applied across silent intervals, but a 70-ms triangular filter was applied to smooth the raw F₀. As a result, the initial jumps and the following drops are now much smaller. In Fig. 19(c), trimming was applied across silent intervals before smoothing. As can be seen, the large F₀ drops have now mostly disappeared, although the F₀ jumps are still clearly visible. With the new method, the large initial F₀ jumps can be found for all the speakers, despite some differences in magnitude (see supplementary material² for by-speaker plots).
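To make the effect of these processing choices concrete, the sketch below derives raw F₀ as the inverse of each vocal period, splits the contour wherever a period exceeds 33 ms, and applies a triangular smoothing window. It is a simplified illustration under stated assumptions: the trimming algorithm of Xu (1999) is not reproduced, the 70 ms figure is taken to be the total base width of the triangle, and all function names and example values are hypothetical.

```python
def raw_f0_segments(pulse_times_s, max_period_s=0.033):
    """Cycle-by-cycle F0 (Hz) from glottal pulse times, split at voice breaks.

    Each F0 value is the inverse of one vocal period. A period longer than
    `max_period_s` (33 ms, i.e., F0 below about 30 Hz) is treated as a silent
    interval and starts a new segment.
    """
    segments, current = [], []
    for start, end in zip(pulse_times_s, pulse_times_s[1:]):
        period = end - start
        if period > max_period_s:
            if current:
                segments.append(current)
            current = []
        else:
            current.append((end, 1.0 / period))
    if current:
        segments.append(current)
    return segments


def triangular_smooth(times_s, f0_hz, width_s=0.070):
    """Weighted moving average with triangular weights of total width `width_s`."""
    half = width_s / 2.0
    smoothed = []
    for t in times_s:
        num = den = 0.0
        for u, f in zip(times_s, f0_hz):
            w = max(0.0, 1.0 - abs(u - t) / half)  # weight is 1 at u == t
            num += w * f
            den += w
        smoothed.append(num / den)
    return smoothed


# Hypothetical pulse train: voicing, a 200 ms voice break, then voicing again.
segments = raw_f0_segments([0.000, 0.005, 0.010, 0.015, 0.215, 0.220, 0.226])

# Hypothetical post-release contour with a brief jump at voice onset.
times = [0.00, 0.01, 0.02, 0.03, 0.04]
smoothed = triangular_smooth(times, [220.0, 180.0, 170.0, 165.0, 162.0])
```

In this toy contour the onset value of 220 Hz is pulled well toward its neighbours after smoothing, which illustrates how a filter wider than the jump itself (reported above as lasting roughly 10 to 40 ms) can flatten the initial perturbation.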

The finding of two different kinds of F₀ perturbation in the present study may help to explain the low consensus on the rise-fall dichotomy between voiced and voiceless stops in previous studies. Those that do not catch the initial jumps (House and Fairbanks, 1953; Lehiste and Peterson, 1961; Lea, 1973; Hombert et al., 1979) tend to report a simple voicing contrast, with F₀ following voiceless stops being higher than that following voiced stops. When the initial jumps are preserved, an F₀ fall after both types of consonants is observed (Ohde, 1984; Silverman, 1984; Hanson, 2009). In our statistical comparison of the initial jump of voiced and voiceless stops, the removal of the abrupt F₀ shift with trimming and smoothing led to a statistically significant voicing contrast. When the initial jump was preserved, however, the F₀ following voiced and voiceless stops was statistically indistinguishable.

The present data also show that the brief perturbation lasts only around 41 ms (SD = 22), after which there is frequently a turning point where the initial perturbation fades away and the F₀ of all consonants starts to shadow the nasal baselines. At the F₀ turning point (F₀ elbow and elbow jump), voiceless consonants show higher absolute F₀ than voiced consonants, and the difference is more prominent in statements than in questions [Fig. 16(a)]. When measured in terms of elbow jump, which is relative to the nasal baseline, F₀ shows less variance and is not influenced by the sentence intonation [Fig. 16(b)]. Again, similar to the case of onset F₀ versus F₀ jump, voicing contrast at the F₀ turning point, though large in magnitude, is masked by sentence intonation due to greater variability than elbow jump. The syllable-wise alignment with the nasals eliminates the interference of intonation, which leads to higher consistency in F₀ jump and elbow jump.

B. Sustained carryover perturbation

After the F₀ turning point, a smaller upward perturbation is still evident when comparing voiceless consonants with voiced consonants. This effect has a magnitude of around 8 Hz, and it progressively diminishes till the end of the syllable. Furthermore, the distribution of this effect is different from that of the larger initial effect. While the initial effect shows varying magnitudes after different obstruent consonants, the sustained effect shows little difference in magnitude between consonants. The sustained effect is consistent with the vocal fold tension mechanism proposed by Halle and Stevens (1971). That is, in a voiceless obstruent the vocal folds are stiffened to impede glottal vibration during the consonant closure, while in a voiced obstruent the vocal folds are slackened to facilitate glottal vibration. Previous studies, however, have not been able to find clear evidence of F₀ lowering in English voiced obstruents (Hanson, 2009). In the present study, we observed an increasing downward perturbation after the initial perturbation. The lowering effect reaches around 13 Hz after stop-sonorants at the F₀ elbow. It then gradually declines to 5 Hz after voiced stops and 8 Hz after stop-sonorants compared with nasals at the syllable offset. No such perturbation is found after voiced fricatives. Unlike even the longer-lived upward perturbation, this effect shows no sign of abating for stop-sonorants even at the end of our measurement, which was on average 194 ms from the release of the target consonant. Not only is this consistent with Halle and Stevens' (1971) hypothesis that the vocal folds are slackened to maintain voicing during a long oral closure when the transglottal pressure drop is quickly reduced below the phonation threshold (Berry et al., 1996), but it is also the first evidence that the voicing contrast is long lasting.
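The offset figures quoted above can be related to Table V by subtracting the nasal mean from each consonant type's mean within each intonation. The sketch below does this arithmetic on the rounded means printed in Table V; it works on group means only, not on the token-level data analysed in the mixed models, and the dictionary and function names are hypothetical.

```python
# Mean offset F0 (Hz) copied from Table V, as (statement, question).
OFFSET_F0 = {
    "nasal": (168, 181),
    "voiced stop": (164, 176),
    "voiced fricative": (169, 178),
    "voiced stop-sonorant": (161, 172),
    "voiceless consonants": (169, 183),
}


def offset_shift(consonant_type):
    """Offset F0 minus the nasal baseline per intonation, and the mean of the two (Hz)."""
    base = OFFSET_F0["nasal"]
    diffs = tuple(v - b for v, b in zip(OFFSET_F0[consonant_type], base))
    return diffs, sum(diffs) / len(diffs)


voiced_stop = offset_shift("voiced stop")
stop_sonorant = offset_shift("voiced stop-sonorant")
voiceless = offset_shift("voiceless consonants")
```

On these rounded means, voiced stops sit 4 to 5 Hz and voiced stop-sonorants 7 to 9 Hz below the nasal baseline at syllable offset, while voiceless consonants stay within about 2 Hz of it, in line with the magnitudes described in the text.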

C. Anticipatory perturbation by obstruent coda consonants

As shown in Figs. 9(a) and 9(b), there are also two kinds of F₀ perturbations by coda consonants. Right before the closure of an obstruent coda, there is a very brief lowering of F₀, which is small in magnitude. Further back in time, there is a much greater perturbation: F₀ preceding voiceless coda consonants is higher than F₀ preceding voiced coda consonants. The raising effect starts to appear at the midpoint of the vowel and continues toward the coda closure, but it does not reach statistical significance until the very last measurement point (Fig. 18). The F₀ contours in CVCV syllables before the second C and those before CV syllables, however, do not differ from one another. Thus, the anticipatory F₀ perturbation does not apply across syllable boundaries.

The findings on anticipatory F₀ perturbation by coda consonants should be taken with caution, however, because they are potentially biased by difficulties in the alignment of obstruent and nasal contours. First, we marked the offsets of final obstruents at the resumption of voicing, if there was any voice break. The oral release, which often precedes the resumption of voicing, would be earlier when the coda is voiceless than when it is voiced. Second, there are significant differences in syllable duration due to the well-known pre-consonantal voicing effect in English (House and Fairbanks, 1953; House, 1961), which might have affected the phonetic implementation of the base F₀ contours. The average duration of target words is 380 ms with final nasals, 398 ms with final voiced stops, 408 ms with final voiceless stops, 411 ms with final voiced fricatives, and 442 ms with final voiceless fricatives. Since our method of measuring perturbation depends on the alignment of obstruent curves to nasals, errors in the placement of a syllable boundary in the nasal contour would result in misalignment to all corresponding obstruents, which would create gaps between the curves that are not due to actual perturbation but are measured as such. Judging from Figs. 9(a) and 9(b), however, even with adjustments in alignment, F₀ before voiceless consonants would still be higher in both statements and questions. Nevertheless, further studies are necessary to fully resolve this issue.
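
The alignment concern can be made concrete with a small sketch. It samples one and the same contour twice at matching time-normalized points, once with the correct syllable offset and once with the offset misplaced, and reports the difference. The contour, the 20 ms error, and the function names are hypothetical values chosen only for illustration.

```python
from typing import Callable, List


def apparent_gap(contour: Callable[[float], float], syl_start: float,
                 syl_end: float, boundary_error: float,
                 n_points: int = 10) -> List[float]:
    """Difference (Hz) between a contour sampled with the correct syllable
    offset and the same contour sampled with the offset misplaced by
    boundary_error seconds, at matching time-normalized points."""
    gaps = []
    for i in range(n_points):
        p = i / (n_points - 1)  # normalized position within the syllable, 0..1
        t_true = syl_start + p * (syl_end - syl_start)
        t_bad = syl_start + p * (syl_end + boundary_error - syl_start)
        gaps.append(contour(t_true) - contour(t_bad))
    return gaps


# Hypothetical contour: a steady rise of 100 Hz/s starting at 180 Hz.
def rising(t: float) -> float:
    return 180.0 + 100.0 * t


# Syllable of 400 ms whose offset is marked 20 ms too late.
gaps = apparent_gap(rising, 0.0, 0.4, boundary_error=0.02)
print(gaps[0], gaps[-1])  # no gap at the onset, about -2 Hz at the offset
```

In this toy case the gap grows linearly toward the misplaced boundary, so a purely segmental labeling error looks like a small perturbation that is largest near the coda. The size of the artifact scales with the slope of the contour and with the boundary error.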

V. CONCLUSION

The present study is a further effort to improve the understanding of consonantal perturbation of F₀. Recent studies (Hanson, 2009; Kirby and Ladd, 2016; Kirby et al., 2020) have already shown reduced support for the simple rise-fall dichotomy of F₀ movement after voiced versus voiceless consonants (Hombert et al., 1979) illustrated in Fig. 1. These studies have demonstrated the importance of using F₀ of syllables with sonorant onsets as baseline when assessing the perturbation effect by obstruent consonants. The present study has explored further improvements of methodology by first using the entire syllable as the domain of F₀ alignment and time-normalization rather than the conventional alignment of F₀ contours at vowel voice onset. Furthermore, we tried to improve the precision of F₀ extraction by converting F₀ from individual vocal cycles without heavy smoothing. With these methods, we were able to observe, for the first time, three distinct kinds of vertical F₀ perturbations. The first is a large but brief raising effect immediately after most of the consonants, which we interpret as likely due to the vibration of only the outer layer of the vocal folds immediately after the consonant release. The second is a longer-sustained increase in F₀ both before and after voiceless consonants, which is likely due to an increase in the tension of the vocal folds to inhibit voicing during the voiceless consonant. The third is a sustained downward perturbation after voiced stops and stop-sonorant clusters, which is probably due to the slackening of the vocal folds for the sake of sustaining voicing during the stop closure.
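
As a rough illustration of cycle-by-cycle F₀ extraction, the sketch below converts the times of successive vocal pulses into one F₀ value per cycle, without any smoothing. The function name, the input format, and the `max_period` threshold used to skip voice breaks are assumptions made for this example, not a description of the tool used in the study.

```python
from typing import List, Sequence, Tuple


def f0_from_pulses(pulse_times: Sequence[float],
                   max_period: float = 0.02) -> List[Tuple[float, float]]:
    """Return (time in s, F0 in Hz) pairs, one per vocal cycle.

    pulse_times: times (s) of successive vocal pulses, in ascending order.
    max_period:  intervals longer than this (s) are treated as voice breaks
                 and skipped; 0.02 s corresponds to an assumed 50 Hz floor.
    """
    track = []
    for start, end in zip(pulse_times, pulse_times[1:]):
        period = end - start
        if period <= 0 or period > max_period:
            continue  # not a plausible vocal cycle
        # F0 is the reciprocal of the cycle duration, placed at the cycle midpoint.
        track.append(((start + end) / 2.0, 1.0 / period))
    return track


# Invented pulse times: two 5 ms cycles, a 100 ms voice break, one 4 ms cycle.
pulses = [0.000, 0.005, 0.010, 0.110, 0.114]
for t, hz in f0_from_pulses(pulses):
    print(f"{t:.4f} s  {hz:.1f} Hz")
```

Because each value comes from a single cycle, a brief excursion lasting only a few cycles at voice onset stays visible in the track, whereas a long analysis window or heavy smoothing would tend to average it away.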

The alignment method used in the present study is based on the assumption that the underlying pitch targets associated with a syllable are synchronized with the entire syllable rather than with only the syllable rhyme (Xu and Liu, 2006; Xu, 2020). Based on this assumption, while voice breaks may mask continuous F₀ contours, they do not interrupt the underlying laryngeal movements that produce them. The assessment of the vertical F₀ perturbation by consonants should therefore treat voice breaks as internal to the syllable. The hypothetical nature of the synchronization assumption, however, means that the findings of the present study are also provisional and open to alternative interpretations.

ACKNOWLEDGMENTS

We would like to thank Andrew Wallace for helping to design the experimental stimuli, conducting the recording, performing the initial data processing, and contributing to an early version of the manuscript. The present work was supported by NIDCD (Grant No. R01 DC03902) and the Leverhulme Trust (RPG-2019-241).

APPENDIX A

Table VII shows statistical results of the anticipatory F₀ perturbation in CVC syllables.

TABLE VII.

Likelihood ratio tests of linear mixed models for the F₀ of vowels preceding target consonants in CVC syllables. Significant effects are indicated in bold.

Fixed effects	Chi-square	df	p
CVOICE	2.063	1	0.151
CMANNER	0.063	1	0.802
INTONATION	2.950	1	0.086
TIME	29.714	4	<0.001
CVOICE:CMANNER	14.866	3	0.002
CVOICE:INTONATION	8.257	2	0.016
CVOICE:TIME	72.277	4	<0.001
CMANNER:INTONATION	6.044	1	0.014
CMANNER:TIME	8.381	4	0.079
INTONATION:TIME	154.21	4	<0.001
CVOICE:CMANNER:INTONATION	10.748	1	0.001
CVOICE:CMANNER:TIME	17.103	8	0.029
CVOICE:INTONATION:TIME	1.701	4	0.791
CMANNER:INTONATION:TIME	34.927	4	<0.001
CVOICE:CMANNER:INTONATION:TIME	2.690	8	0.952
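
For readers who want to check how the p values in such a table relate to the reported chi-square statistics and degrees of freedom, the following sketch computes the upper-tail chi-square probability with the Python standard library only. The function name `chi2_sf` is hypothetical, and the code is an independent illustration rather than the statistical workflow used for the table.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer df, using the finite-sum closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1/2) / (1*3*...*(2j-1))
    term = math.sqrt(x)  # j = 1 term
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


# Rows taken from Table VII: (effect, chi-square, df)
rows = [("CVOICE", 2.063, 1), ("CVOICE:INTONATION", 8.257, 2),
        ("CMANNER:TIME", 8.381, 4), ("CVOICE:CMANNER:TIME", 17.103, 8)]
for name, chi2, df in rows:
    print(f"{name}: p = {chi2_sf(chi2, df):.3f}")
```

For these four rows the computed values round to the p values listed in the table (0.151, 0.016, 0.079, and 0.029).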


APPENDIX B

Table VIII shows means of closure duration, onset F₀, and F₀ jump in CVCV syllables.

TABLE VIII.

Means (standard deviations) of closure duration (ms), onset F₀ (Hz), and F₀ jump (Hz) across consonant types and sentence type in CVCV syllables.

	Statement			Question
Consonant type	Closure duration (ms)	Onset F₀ (Hz)	F₀ jump (Hz)	Closure duration (ms)	Onset F₀ (Hz)	F₀ jump (Hz)
Nasal	69(10)	173(55)	NA	63(13)	187(54)	NA
Voiced stop	35(11)	178(50)	−7(16)	35(9)	170(45)	−6(13)
Voiced fricative	76(17)	170(53)	−7(20)	74(18)	199(64)	8(30)
Voiced consonant (excluding nasal)	55(25)	174(51)	−7(18)	55(24)	185(57)	1(24)
Voiceless stop	108(15)	177(55)	9(20)	98(17)	211(58)	16(27)
Voiceless fricative	124(13)	188(61)	24(21)	112(13)	216(55)	18(24)
Voiceless consonant	116(16)	182(53)	16(22)	105(17)	213(57)	17(25)


APPENDIX C

See Table IX for statistical results of F₀ jump in CVCV syllables.

TABLE IX.

Likelihood ratio tests of linear mixed models for the F₀ jump of vowels following target consonants in CVCV syllables. Significant effects are indicated in bold.

Fixed effects	Chi-square	df	p
CVOICE	16.870	1	<0.001
CMANNER	9.683	1	0.002
INTONATION	0.891	1	0.345
CVOICE:CMANNER	0.171	1	0.680
CVOICE:INTONATION	3.316	2	0.191
CMANNER:INTONATION	0.895	2	0.639
CVOICE:CMANNER:INTONATION	11.275	5	0.046


Footnotes

^¹

Although the same paper also included figures that show F₀ contours in syllables with voiced onset stops are similar to those in syllables with sonorant onset, this figure that gives the impression of a robust dichotomy is the most referred to.

^²

See supplementary material at https://doi.org/10.1121/10.0004239 for individual plots for all participants.

^³

In Hanson (2009), some of the initial jumps seem to be captured but others are not.

References

1. Atkinson, J. E. (1978). “Correlation analysis of physiological factors controlling fundamental frequency,” J. Acoust. Soc. Am. 63(1), 211–222. doi:10.1121/1.381716
2. Barr, D. J., Levy, R., Scheepers, C., and Tilly, H. J. (2013). “Random effects structure for confirmatory hypothesis testing: Keep it maximal,” J. Mem. Lang. 68, 255–278. doi:10.1016/j.jml.2012.11.001
3. Bates, D., Mächler, M., Bolker, B. M., and Walker, S. C. (2015). “Fitting linear mixed-effects models using lme4,” J. Stat. Softw. 67(1), 1–48. doi:10.18637/jss.v067.i01
4. Bell-Berti, F. (1975). “Control of pharyngeal cavity size for English voiced and voiceless stops,” J. Acoust. Soc. Am. 57, 456–461. doi:10.1121/1.380468
5. Berry, D. A., Herzel, H., Titze, I. R., and Story, B. H. (1996). “Bifurcations in excised larynx experiments,” J. Voice 10, 129–138. doi:10.1016/S0892-1997(96)80039-7
6. Boersma, P., and Weenink, D. (2020). “Praat: Doing phonetics by computer (version 6.0.21) [computer program],” http://www.praat.org/ (Last viewed June 06, 2020).
7. Chen, Y. (2011). “How does phonology guide phonetics in segment–F0 interaction?,” J. Phon. 39(4), 612–625. doi:10.1016/j.wocn.2011.04.001
8. Chen, S., Zhang, C., McCollum, A. G., and Wayland, R. (2017). “Statistical modelling of phonetic and phonologised perturbation effects in tonal and non-tonal languages,” Speech Commun. 88, 17–38. doi:10.1016/j.specom.2017.01.006
9. Davidson, L. (2016). “Variability in the implementation of voicing in American English obstruents,” J. Phon. 54, 35–50. doi:10.1016/j.wocn.2015.09.003
10. Dixit, R. P. (1975). “Neuromuscular aspects of laryngeal control, with special reference to Hindi,” Ph.D. thesis, University of Texas at Austin, Austin, TX.
11. Evans, J., Yeh, W. C., and Kulkarni, R. (2018). “Acoustics of tone in Indian Punjabi,” Trans. Philos. Soc. 116, 509–528. doi:10.1111/1467-968X.12135
12. Ewan, W. G., and Krones, R. (1974). “Measuring larynx movement using the thyroumbrometer,” J. Phon. 2(4), 327–335. doi:10.1016/S0095-4470(19)31302-6
13. Farley, G. R. (1996). “A biomechanical laryngeal model of voice F0 and glottal width control,” J. Acoust. Soc. Am. 100(6), 3794–3812. doi:10.1121/1.417218
14. Fry, D. B. (1958). “Experiments in the perception of stress,” Lang. Speech 1, 126–152. doi:10.1177/002383095800100207
15. Gao, J., and Arai, T. (2019). “Plosive (de-)voicing and f0 perturbations in Tokyo Japanese: Positional variation, cue enhancement, and contrast recovery,” J. Phon. 77, 100932. doi:10.1016/j.wocn.2019.100932
16. Haggard, M., Ambler, S., and Callow, M. (1970). “Pitch as a voicing cue,” J. Acoust. Soc. Am. 47, 613–617. doi:10.1121/1.1911936
17. Halle, M., and Stevens, K. N. (1971). “A note on laryngeal features,” MIT Q. Prog. Rep. 101, 198–212.
18. Hanson, H. M. (2009). “Effects of obstruent consonants on fundamental frequency at vowel onset in English,” J. Acoust. Soc. Am. 125, 425–441. doi:10.1121/1.3021306
19. Hanson, H. M., and Stevens, K. N. (2002). “A quasiarticulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn,” J. Acoust. Soc. Am. 112, 1158–1182. doi:10.1121/1.1498851
20. Hill, N. (2019). The Historical Phonology of Tibetan, Burmese, and Chinese (Cambridge University Press, Cambridge, UK).
21. Hollien, H. (1960). “Vocal pitch variation related to changes in vocal fold length,” J. Speech Lang. Hear. Res. 3, 150–156. doi:10.1044/jshr.0302.156
22. Hombert, J.-M. (1978). “Consonant types, vowel quality, and tone,” in Tone: A Linguistic Survey, edited by V. A. Fromkin (Academic, New York), pp. 77–107.
23. Hombert, J.-M., Ohala, J. J., and Ewan, W. (1979). “Phonetic explanation for the development of tones,” Language 55, 37–58. doi:10.2307/412518
24. House, A. S. (1961). “On vowel duration in English,” J. Acoust. Soc. Am. 33(9), 1174–1178. doi:10.1121/1.1908941
25. House, A. S., and Fairbanks, G. (1953). “The influence of consonant environment upon the secondary acoustical characteristics of vowels,” J. Acoust. Soc. Am. 25, 105–113. doi:10.1121/1.1906982
26. Jun, S.-A. (1996). “Influence of microprosody on macroprosody: A case of phrase initial strengthening,” Technical Report No. 92, University of California at Los Angeles, Los Angeles, CA.
27. Kingston, J. (2007). “Segmental influences on F0: Automatic or controlled?,” in Tones and Tunes, Volume 2: Experimental Studies in Word and Sentence Prosody, edited by C. Gussenhoven and T. Riad (Mouton de Gruyter, Berlin, Germany), pp. 171–201.
28. Kirby, J. P., and Ladd, D. R. (2016). “Effects of obstruent voicing on vowel F0: Evidence from ‘true voicing’ languages,” J. Acoust. Soc. Am. 140(4), 2400–2411. doi:10.1121/1.4962445
29. Kirby, J. P., Ladd, D. R., Gao, J., and Elliott, Z. (2020). “Elicitation context does not drive F0 lowering following voiced stops: Evidence from French and Italian,” J. Acoust. Soc. Am. 148, EL147–EL152. doi:10.1121/10.0001698
30. Kohler, K. J. (1982). “F0 in the production of fortis and lenis plosives,” Phonetica 39, 199–218. doi:10.1159/000261663
31. Kohler, K. J. (1990). “Macro and micro F0 in the synthesis of intonation,” in Papers in Laboratory Phonology Volume 1: Between the Grammar and Physics of Speech, edited by J. Kingston and M. E. Beckman (Cambridge University Press, Cambridge, UK), pp. 115–138.
32. Ladefoged, P. (1967). Three Areas of Experimental Phonetics (Oxford University Press, London).
33. Lea, W. A. (1973). “Segmental and suprasegmental influences on fundamental frequency contours,” in Consonant Types and Tone, edited by L. M. Hyman (University of Southern California, Los Angeles, CA), pp. 15–70.
34. Lehiste, I., and Peterson, G. E. (1961). “Some basic considerations in the analysis of intonation,” J. Acoust. Soc. Am. 33, 419–425. doi:10.1121/1.1908681
35. Lenth, R., Singmann, H., Love, J., Buerkner, P., and Herve, M. (2020). “Estimated marginal means, aka least-squares means (version 1.3.1),” https://CRAN.R-project.org/package=emmeans (Last viewed June 26, 2020).
36. Liu, F., Xu, Y., Prom-on, S., and Yu, A. C. L. (2013). “Morpheme-like prosodic functions: Evidence from acoustic analysis and computational modeling,” J. Speech Sci. 3, 85–140.
37. Löfqvist, A., Baer, T., McGarr, N. S., and Story, R. S. (1989). “The cricothyroid muscle in voicing control,” J. Acoust. Soc. Am. 85, 1314–1321. doi:10.1121/1.397462
38. Löfqvist, A., Koenig, L. L., and McGowan, R. S. (1995). “Vocal tract aerodynamics in /aCa/ utterances: Measurements,” Speech Commun. 16, 49–66. doi:10.1016/0167-6393(94)00049-G
39. Miller, D. G., Švec, J. G., and Schutte, H. K. (2002). “Measurement of characteristic leap interval between chest and falsetto registers,” J. Voice 16(1), 8–19. doi:10.1016/S0892-1997(02)00066-8
40. Ohala, J. J. (1974). “A mathematical model of speech aerodynamics,” in Proceedings of the Speech Communication Seminar, April 1–3, Stockholm, Sweden, pp. 65–72.
41. Ohde, R. N. (1984). “Fundamental frequency as an acoustic correlate of stop consonant voicing,” J. Acoust. Soc. Am. 75(1), 224–230. doi:10.1121/1.390399
42. Prom-on, S., Xu, Y., and Thipakorn, B. (2009). “Modeling tone and intonation in Mandarin and English as a process of target approximation,” J. Acoust. Soc. Am. 125(1), 405–424. doi:10.1121/1.3037222
43. R Core Team (2020). “R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (version 3.1.1),” http://www.R-project.org/ (Last viewed June 22, 2020).
44. Silverman, K. E. A. (1984). “F0 perturbations as a function of voicing of pre-vocalic and post-vocalic stops and fricatives, and of syllable stress,” in Proceedings of the Autumn Conference of the Institute of Acoustics, November 4–6, Windermere, UK, pp. 445–452.
45. Silverman, K. E. A. (1986). “F0 segmental cues depend on intonation: The case of the rise after voiced stops,” Phonetica 43, 76–91. doi:10.1159/000261762
46. Titze, I. R. (1994). Principles of Voice Production (Prentice-Hall, Englewood Cliffs, NJ).
47. Westbury, J. R. (1983). “Enlargement of the supraglottal cavity and its relation to stop consonant voicing,” J. Acoust. Soc. Am. 73, 1322–1336. doi:10.1121/1.389236
48. Xu, Y. (1998). “Consistency of tone-syllable alignment across different syllable structures and speaking rates,” Phonetica 55, 179–203. doi:10.1159/000028432
49. Xu, Y. (1999). “Effects of tone and focus on the formation and alignment of F0 contours,” J. Phon. 27, 55–105. doi:10.1006/jpho.1999.0086
50. Xu, Y. (2013). “ProsodyPro—A Tool for Large-scale Systematic Prosody Analysis,” in Proceedings of the Tools and Resources for the Analysis of Speech Prosody (TRASP 2013), August 30, Aix-en-Provence, France, pp. 7–10.
51. Xu, Y. (2019). “Prosody, tone and intonation,” in The Routledge Handbook of Phonetics, edited by W. F. Katz and P. F. Assmann (Routledge, London), pp. 314–356.
52. Xu, Y. (2020). “Syllable is a synchronization mechanism that makes human speech possible,” PsyArXiv. doi:10.31234/osf.io/9v4hr
53. Xu, Y., and Liu, F. (2006). “Tonal alignment, syllable structure and coarticulation: Toward an integrated model,” Ital. J. Linguist. 18, 125–159.
54. Xu, Y., and Prom-on, S. (2014). “Toward invariant functional representations of variable surface fundamental frequency contours: Synthesizing speech melody via model-based stochastic learning,” Speech Commun. 57, 181–208. doi:10.1016/j.specom.2013.09.013
55. Xu, Y., and Sun, X. (2002). “Maximum speed of pitch change and how it may relate to speech,” J. Acoust. Soc. Am. 111(3), 1399–1413. doi:10.1121/1.1445789
56. Xu, Y., and Wang, Q. E. (2001). “Pitch targets and their realization: Evidence from Mandarin Chinese,” Speech Commun. 33(4), 319–337. doi:10.1016/S0167-6393(00)00063-7
57. Xu, Y., and Xu, C. X. (2005). “Phonetic realization of focus in English declarative intonation,” J. Phon. 33, 159–197. doi:10.1016/j.wocn.2004.11.001
58. Zemlin, W. (1968). Speech and Hearing Science: Anatomy and Physiology (Prentice-Hall, Englewood Cliffs, NJ).

