Abstract
Voice quality has been defined variously in the literature, ranging most broadly from states or postures of the glottis and vocal tract in general to a narrower definition that refers to characteristics of vocal fold vibration during voiced phonation. Linguists have traditionally broken the voicing continuum into five basic categories based on the roles they play in a language’s phonology: (spread) voiceless, breathy, modal, creaky, (constricted) voiceless. Of these, the three central states, breathy, modal, and creaky, are relevant to voice quality as discussed in this work. Voice qualities can be modelled as an interaction between subglottal pressure, degree of vocal fold approximation (aperture), longitudinal tension of the vocal folds (stiffness), and medial compression of the vocal folds (thickness). Breathy voicing is achieved with high glottal aperture, low stiffness, and low thickness, resulting in noise, low pitch, and increased spectral tilt. Creaky voicing has differing realizations depending on its linguistic role; prototypical creaky voice has low aperture, low stiffness, and high thickness, resulting in irregular and lower pitch and decreased spectral tilt. Several other types of creaky voice quality exist, including 1) glottal fry, 2) multiply pulsed voice, and 3) nonconstricted creak. In this paper, we focus on creaky voice broadly defined and concentrate on its distribution in North American English. While not contrastive, it plays an important role in phonology and in a wide variety of other discourse, pragmatic, and social functions. In this context we present some of our current research into segmental and social factors relating to creaky voicing. We find a correlation between vowel height and creaky voicing. We also find evidence that voice quality is used by men to index gender in conversational speech. Our findings bear on the debate about the sociolinguistic uses of voice quality.
Keywords: Phonetics, voice quality, phonation, acoustics, indexicality
1. Introduction
The term voice quality can be used to indicate a wide variety of language specific or idiosyncratic ‘phonetic settings’ or ‘speech postures’ involving articulators throughout the vocal tract as in Laver (1980) or Mennen et al. (2010), but in this work we use the term more narrowly to refer to the quality of vibration of the vocal folds during voiced phonation. This definition is sometimes also referred to in the literature as phonation type or state of the glottis as in Gordon and Ladefoged (2001). Within this fairly narrow definition, a wide variety of voice qualities appear in the scholarly literature spanning the functional range from emotion to voice disorders to singing as noted in Laver (1980), Sapienza et al. (2011), Kreiman and Sidtis (2011), and Sundberg (1987), but linguistic phoneticians have traditionally situated voice quality within the voicing continuum with five basic categories based on roles they play in a language’s phonology: (spread) voiceless, breathy, modal, creaky, (constricted) voiceless as defined by Ladefoged (1971). The five states are illustrated in Figure 1. At one end of the continuum lie the voiceless-aspirated stops and voiceless fricatives where the vocal folds are spread far enough apart to prevent vibration, and at the other end lies the glottal stop and voiceless glottalized and ejective stops where there is sufficient vocal fold constriction to prevent vibration. The intermediate three categories, which are relevant to the definition of voice quality in this work, involve quasi-periodic vibration of the vocal folds (voicing) with varying vibratory qualities.
Figure 1. Five voice qualities and their relationship to glottal aperture
As reviewed in Gordon and Ladefoged (2001), voice quality is a contrastive segmental feature of phonemes in many languages, such as the breathy voiced nasal series in Hindi, Newar and Tsonga, or the creaky voiced nasal series in Montana Salish, Hupa and Kashaya. Similarly, the breathy allophone of /h/ and the creaky allophone of /ʔ/ in many dialects of English can be seen as segmental. In other languages, voice quality is suprasegmental: it is part of voice register systems in Mon-Khmer languages as described in Ferlus (1979), is associated with tone register in a variety of languages as reviewed in Blankenship (2002), or serves as prosodic marking as described in Dilley et al. (1996) and Garellek (2013).
In this paper, we will concentrate on linguistic uses of voice quality, narrowly defined to refer only to voiced phonation types, in North American English. We will discuss segmental and prosodic uses, as well as some pragmatic and sociolinguistic uses. Because creaky voice dominates the literature on linguistic uses of voice quality in North American English, it features more prominently in our discussion of linguistic uses, and we focus our experimentation in this work on creaky voice (broadly defined and to the exclusion of pressed voice and breathy voice). Our paper consists of four components: 1) a review of the basic physiology of phonatory control for voice quality, 2) a review of the types and features of voice quality, 3) a review of linguistic uses of voice quality in North American English, and 4) presentation of the results of our own research on voice quality in conversational speech.
2. Basic physiology and acoustic signatures of phonatory control
In voiced phonation, the vocal folds open and close from their bottom towards their top in a smooth gliding action known as the “mucosal wave”. The vocal ligament and the muscle of the body of the vocal fold (the thyroarytenoid) are relatively static, and movement of the surface of the vocal fold relies on the mucosal surface gliding over the underlying ligament as described in Kreiman and Sidtis (2011). Articulation of voice quality can be modelled as an interaction between subglottal pressure, degree of vocal fold adductive tension (aperture), longitudinal tension of the vocal folds (stiffness), and medial compression of the vocal folds (thickness) as modelled by Zhang (2016) and illustrated in Figure 2.
Figure 2. Illustration of adductive tension, longitudinal tension, and medial compression
For vocal fold vibration to take place several conditions must be met: 1) sufficient approximation of the folds, 2) sufficient but not too much longitudinal tension and medial compression of the folds, and 3) a sufficient drop in pressure across the glottis – greater sub-glottal than supra-glottal pressure.
The five categories of voicing noted in Ladefoged (1971) can also be described in terms of their open quotient – the portion of the glottal cycle during which the glottis is open, as described in Klatt and Klatt (1990). At the spread-voiceless endpoint the open quotient is 1 (100% open, no cycle) and at the constricted-voiceless endpoint the open quotient is 0 (100% closed, no cycle). At the mid-point is modal voicing with moderate adductive tension, moderate medial compression, moderate longitudinal tension, and a .5 open quotient. Modal voicing is characterized by a wide pitch range, no turbulent noise, and a relatively linear spectral tilt (the decrease in energy as frequency rises). Relative to modal voicing, breathy voicing (sometimes referred to as murmur) is characterized by higher glottal aperture (an open quotient of .66) with concomitant turbulent flow, lower longitudinal tension, and decreased medial compression, resulting in an acoustic signal characterized by noise, by a lower vibratory rate (perceived as lower pitch), and by increased spectral tilt as described in Kreiman and Gerratt (2012) and Kreiman et al. (2014). Creaky voicing has differing realizations depending on its linguistic role as described in Keating et al. (2015). Prototypical creaky voice has increased adductive tension, a .33 open quotient, lower longitudinal tension, and increased medial compression, resulting in a signal with an irregular and lower vibratory rate (perceived as noise and lower pitch) and with decreased (flat) spectral tilt. Figure 3 shows three spectrograms illustrating breathy, modal and prototypically creaky voice qualities.
Figure 3. Spectrograms (5000 Hz, 195 ms) of a woman saying /mæ/ in “matches” in conversation
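The open quotient values given above can be made concrete with a small calculation. The Python sketch below is illustrative only: the function name `open_quotient` and the 10 ms cycle durations are assumptions introduced here, and using the same cycle length for all three qualities is a simplification, since breathy and creaky voicing typically have a lower vibratory rate than modal voicing.

```python
def open_quotient(open_duration_ms: float, cycle_duration_ms: float) -> float:
    """Return the fraction of one glottal cycle during which the glottis is open."""
    if cycle_duration_ms <= 0:
        raise ValueError("cycle duration must be positive")
    if not 0 <= open_duration_ms <= cycle_duration_ms:
        raise ValueError("open duration must lie within the cycle")
    return open_duration_ms / cycle_duration_ms


# Hypothetical 10 ms cycles with open phases chosen to match the values in the text.
examples = {
    "breathy": open_quotient(6.6, 10.0),
    "modal": open_quotient(5.0, 10.0),
    "creaky": open_quotient(3.3, 10.0),
}

for label, oq in examples.items():
    print(f"{label}: open quotient = {oq:.2f}")
```

An open quotient of 1 or 0 corresponds to the two voiceless endpoints, where there is no vibratory cycle to measure.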
Keating and her colleagues (2015) have identified several other types of voice quality which are typically labelled creaky, but which differ from prototypical creak on one or more dimensions including: 1) vocal fry, 2) multiply pulsed voice, 3) aperiodic voice, and 4) nonconstricted creak, also referred to as breathy creak by Laver (1980) or as Slifka voice based on the work of Slifka (2000; 2006). Vocal fry is characterized by a low vibratory rate, high adductive tension, damped cycles (low open quotient), moderate medial compression, low longitudinal tension, and decreased (flattened) spectral tilt. Multiply pulsed voice is characterized by alternation in vocal cycle durations resulting in an alternating vibratory rate, a rough voice quality with significant perceived noise, a low open quotient, and a decreased (flattened) spectral tilt. Aperiodic voice is characterized by extreme periodic irregularity with a significant amount of perceived noise, low longitudinal tension, and increased adductive tension resulting in a low open quotient and flattening of the spectral tilt. Nonconstricted creak is characterized by an irregular vibratory rate with a low longitudinal tension, low adductive tension (akin to that found in breathy voicing), but high medial compression resulting in a signal with a perceived low and irregular pitch and significant glottal turbulence perceived as breathiness. The different sub-categories of creaky voice are associated with different linguistic, prosodic, pragmatic contexts, and speaker-specific characteristics in North American English as noted in Garellek (2012) and Keating et al. (2015).
3. Acoustic classification of voice quality
As Keating and her colleagues (2015) note, because what is typically labelled creaky voice comes in several different sub-categories, there is no single best acoustic measure for classification. Rather, multiple complementary measures should be used. Any study employing a single measure should therefore be interpreted with caution: it may be that group X has more creak than group Y, but it may also be that group Y has an equal amount of creak of a type that escapes the single measure applied. Spectral tilt, the drop in harmonic intensity with increasing frequency, can be measured with H1*-H2* (corrected 1st harmonic minus the corrected 2nd harmonic) as described in Shue (2010). The correction accounts for the effect of vowel formant peaks on harmonic intensity and for vocal tract length. Breathy voice has the steepest and creaky voice has the shallowest tilt. Jitter, a measure of cycle-to-cycle duration variability, is greatest for creaky voice and least for modal voice. Shimmer, the cycle-to-cycle variation of amplitude, is greatest for creaky voice and least for modal voice. Intensity (RMS), the average amplitude of the glottal cycle, is greatest for modal voice and least for creaky voice. Harmonics-to-noise ratio, a measure of periodic energy relative to aperiodic energy, is greatest for modal voice and least for breathy voice. Similarly related to periodic energy, cepstral peak prominence, the narrowness and prominence of the harmonics relative to the average energy, is greatest for modal voice and least for creaky voice. Fundamental frequency (ƒ0), the average cycle frequency, is depressed in both breathy and creaky voice relative to modal voice. Finally, variance of pitch tracks, a measure developed in Panfili (2018) which calculates the sum of differences across pitch-tracking algorithms, is greatest for creaky voice and lowest for modal voice.
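To illustrate how a few of these measures are computed, the Python sketch below uses one common definition of local jitter and shimmer (the mean absolute difference between consecutive cycles divided by the mean value) and an uncorrected H1-H2 in dB. These formulas, the function names, and the cycle values are assumptions for illustration; the text does not specify which jitter or shimmer variant was used, and the formant correction behind H1*-H2* is not implemented here.

```python
import math
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def h1_minus_h2_db(h1_amplitude, h2_amplitude):
    """Uncorrected H1-H2 in dB from linear amplitudes of the first two harmonics."""
    if h1_amplitude <= 0 or h2_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(h1_amplitude / h2_amplitude)


# Hypothetical glottal cycle durations (ms) and peak amplitudes (arbitrary units).
modal_periods = [5.0, 5.0, 5.1, 5.0, 4.9, 5.0]
creaky_periods = [9.0, 12.5, 8.0, 14.0, 9.5, 13.0]
modal_peaks = [1.00, 0.98, 1.01, 0.99, 1.00, 1.02]
creaky_peaks = [0.60, 0.35, 0.70, 0.30, 0.55, 0.40]

# Jitter applies the perturbation measure to durations, shimmer to amplitudes.
jitter = {
    "modal": local_perturbation(modal_periods),
    "creaky": local_perturbation(creaky_periods),
}
shimmer = {
    "modal": local_perturbation(modal_peaks),
    "creaky": local_perturbation(creaky_peaks),
}

for quality in ("modal", "creaky"):
    print(f"{quality}: jitter = {jitter[quality]:.3f}, shimmer = {shimmer[quality]:.3f}")
```

With these made-up cycles the creaky stretch shows higher jitter and shimmer than the modal one, matching the direction described above. A steeper spectral tilt corresponds to a larger positive H1-H2 value.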
4. Uses of voice quality in North American English
While not contrastive in North American English (excluding the voiceless endpoints of the voicing continuum), voice quality plays an important role in phonology through consonantal allophony and through prosodic marking. Syllable-final /t/ (and in some dialects /p/) has a glottal stop allophone word-finally and pre-consonantally as described in Pierrehumbert (1995). Glottal stops are inserted before word-initial vowels and variably before word-initial resonants (nasals, liquids, and glides) as documented in Dilley et al. (1996). As Klatt and Klatt (1990) have documented, glottal stops are often realized as creaky voicing, especially in conversational speech, so glottal stop allophony and insertion are significant sources of creaky voicing. Intervocalic /h/ is often realized as breathy voicing, particularly in conversational speech.
Various studies have documented that non-modal voice quality is used to mark a variety of levels of prosodic boundaries, including Dilley et al. (1996), Zhuang et al. (2008), and Garellek (2013). Creaky voicing often occurs at the onset of prosodic units, and Dilley et al. (1996) found that it is hierarchical in its probability of occurrence: phrase initial > phrase medial > word initial. They also found that accentuation is marked by creaky voicing and interacts with hierarchical prosodic edge marking: pitch-accented > non-accented. Klatt and Klatt (1990) found that utterance-final vowels typically terminate in either creaky or breathy voicing. Kreiman (1980) and Redi and Shattuck-Hufnagel (1982) found that phrase-final creaky voicing typically extends over a longer portion of the vowel than creaky voicing associated with the glottal-stop allophone of /t/. Further, Garellek and Seyfarth (2016) found that phrase-final creaky voicing is often realized as nonconstricted creak and can be further distinguished from glottal-stop creak (prototypical creak) using cepstral peak prominence as a measure. Speakers of North American English are aware of these patterns; Crowhurst (2018) conducted perceptual experiments that found that listeners align creaky voicing with phrase-final position to segment rhythmic sequences into smaller, foot- or word-sized units.
Over the past two decades, creaky voicing has also been shown to contribute to the marking of a wide variety of other discourse, pragmatic, and social functions as described in a variety of studies, including Henton (1989), Klatt and Klatt (1990), Kreiman et al. (2005), Esling and Edmondson (2010), Gobl and Ní Chasaide (2010), Yuasa (2010), Pépiot (2014), and Podesva and Callier (2015). Of recent interest is the association of voice quality with gender identity and other social-indexical features. For example, Henton (1989), Klatt and Klatt (1990), and Kreiman et al. (2005) found a greater use of creaky voicing amongst male-identifying speakers than amongst female-identifying speakers, while Yuasa (2010) and Podesva and Callier (2015) found the opposite pattern. The disagreement in the literature suggests caution when attributing observed patterns in a particular data set to broad social categories like gender.
Widely cited work, such as that of Byrd (1994) and Dilley et al. (1993), finding that women use creak more than men, is interpreted in subsequent literature as broadly gender indexing. Byrd (1994) measured a variety of phonetic variables in the TIMIT corpus of Garofolo et al. (1993), which consists of sentences read by speakers from several different dialect regions of North America. She found that women were more likely to insert glottal stops than men. This is often cited as indicating that women in the corpus use more creaky voicing, and because glottal stop insertion is often realized as creaky voice this is likely the case. Byrd found that in general women exhibited more carefully and precisely articulated speech in the corpus: they spoke at a slower rate, they released their stops more frequently, and they used the flap allophone of /t/ less. Therefore, the higher frequency of glottal stop insertion could easily be attributed to prosodic fortition effects of the type noted in Dilley et al. (1993) and to glottal reinforcement of /t/ and /p/ in word-final position. Dilley et al. (1993) is also often cited as evidence that women use creaky voice more than men in North American English to index gender. In this study, the authors investigated patterns of glottal stop insertion and creaky voicing in newscasters reading scripts. Overall, they found patterns that highlighted the prosodic boundaries hierarchically; however, they found profound individual differences in the overall amount of glottalization, ranging from 13% to 44% of the total speech. Moreover, the study was not designed to be a gender study: it had only 5 readers (3 of whom were women, and one of the two men was the 13% speaker), so any broad generalization to gender indexing is tenuous. Similarly, work by Henton (1989) indicating that women use breathy voice quality to index gender has been criticized by Podesva and Callier (2015), amongst others, for over-interpreting its results.
Kreiman et al. (2005) note that voice quality is also an individual trait correlated with a variety of factors: height and age, behavioural traits such as smoking or public speaking, and physiological and psychological states such as mood, stress level, and health. Podesva and Callier (2015) provide a review of recent studies of voice quality as a gender-indexing feature and find that there is high individual and contextual variability with significant interactions with gender, age, prosodic, and segmental factors. Indeed, Eckert (2008) points out that current notions of indexicality are fluid and context dependent, so we should expect that if speakers are indexing gender using voice quality, they might very well do it differently in different studies.
5. Current study
In research at the UW Linguistic Phonetic Laboratory we have used the ATAROS Corpus of Freeman et al. (2014) to investigate voice quality in conversational speech. In the corpus, participants in dyads engage in a series of collaborative problem-solving tasks. The speech is hand transcribed and force aligned at the phone level, and a subset of the conversations are hand annotated for various speech acts related to stance. Two studies described here advocate for a more nuanced approach to treatments of creaky voice in linguistic and social contexts.
5.1. Study 1: Vowel height and voice quality
The first study, Panfili (2018), investigates the interaction of vowel height and probability of creaky and breathy voicing. In it, the speech of a subset of 8 conversational dyads (5 FF, 3 FM) totalling 95 minutes was coded for the presence of creaky, breathy, and modal voicing on stressed vowels in content words (2,459 vowels total). Overall, the greatest percentage of vowels were perceived as having modal voicing, while breathy voicing rarely occurred. There was a large and statistically reliable effect of vowel height on voice quality: creaky voicing was much more likely to occur on low vowels as illustrated in Figure 4.
Figure 4. Proportion of modal, creaky, and breathy voice by vowel height (low > high)
The vowel-height effect is consistent for both men and women, and each group had roughly the same amount of creaky voicing overall. This result supports previous findings that segmental factors influence the distribution of voice qualities and underscores the importance of taking segmental factors into consideration in observations about voice quality. It is most likely related to the same physiological conditions that underlie intrinsic fundamental frequency (iƒ0), the finding that high vowels have higher ƒ0 than low vowels as discussed in Whalen et al. (1998) and as studied physiologically in Vilkman and Karma (1989). While the physiological mechanisms are not fully understood, there is evidence that the higher ƒ0 in high vowels relative to low vowels results from lengthening and thinning of the vocal folds (longitudinal tension), which is a result of the thyroid cartilage rising and tilting forward: advancing the tongue root in high vowels advances the embedded hyoid bone, which is attached to the superior horns of the thyroid cartilage, causing the cartilage to rise and tilt forward. The tilting elongates and thins the vocal folds, resulting in a lower local mass along their length and thereby increasing their vibratory frequency. This increased longitudinal tension runs counter to creaky voicing, which is achieved through lowered longitudinal tension.
5.2. Study 2: Dyad gender makeup and convergence in voice quality
The second study investigates the interaction of dyadic gender makeup and the probability of creaky voicing using conversational dyads. The study is based on a set of 20 dyads, 12 gender-matched dyads and 8 mixed-gender dyads (FM), resulting in a total of approximately 300 minutes of speech. Creaky voiced speech was classified using Drugman et al.’s (2014) Artificial Neural Network (ANN) with a set of seven acoustic features which are designed to capture the various dimensions of non-modal voicing described earlier in this paper. To reduce the influence of phonological and lexical effects, the output of the classifier was modelled to exclude function words, word-initial vowels (with glottal stop insertion), and word- and syllable-final /t/ (with glottal reinforcement and glottal stop replacement). To verify the accuracy of the automatic classification, its output was compared to a smaller set of hand-annotated dyads using a Pearson correlation coefficient. While the ANN output is noisier than the hand annotations, as expected, the test indicated a significant overall correlation with the hand-annotated data (ρ=0.67, p<0.001).
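The verification step can be sketched as a Pearson correlation between paired creak rates. The Python below is a minimal illustration, not the authors' procedure: the text does not say at what granularity the classifier output and the hand annotations were paired, so the per-speaker proportions here are hypothetical, as are the function and variable names.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical per-speaker proportions of creaky vowels (made-up values).
hand_annotated = [0.10, 0.22, 0.15, 0.30, 0.08, 0.18]
classifier_output = [0.14, 0.20, 0.21, 0.27, 0.12, 0.13]

coefficient = pearson_correlation(hand_annotated, classifier_output)
print(f"correlation = {coefficient:.2f}")
```

The sketch reports only the coefficient; assessing significance, as in the p-value reported above, would need an additional test.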
On the whole, in this data men produced slightly more creaky voiced speech; however, the effect size is small and therefore we interpret it as not meaningful given the noise involved in automatic classification of voicing. When examined by dyadic makeup, more interesting results emerge. Women were very consistent in their probability of using creaky voicing regardless of the dyad type. This indicates that, at least in this kind of collaborative task with a stranger, women are not changing their use of voice quality as the social setting changes. On the other hand, men showed a significant effect of dyadic makeup on their use of creaky voicing – they exhibited a significantly higher probability of creaky voice when interacting with women than they did when interacting with men. This result is illustrated in Figure 5 which plots standard deviations in creaky voice rate by dyad type. There are two interesting points this graph highlights. The first is that men are much more likely to use creaky voice when they are talking to women than when they are talking to men in this task. The second is that there is much more variation in the mixed gender dyad for men as well. The variation could be caused by between speaker, individual, variation, it could be caused by within speaker changes over time, or some combination thereof.
Figure 5. Standardized creaky voice rate by dyad type.
The influence of dyadic makeup on the male speakers’ use of creaky voicing is consistent with the proposal by Podesva and Callier (2015) and others that voice quality is used to index gender, and that it is fluid and dependent on context.
In an effort to delve deeper into the social role of voice quality in the ATAROS dyadic conversations, we explored convergence (also known as accommodation or entrainment) in voice quality. To do this we used two measures employed in Levitan and Hirschberg (2011): proximity and convergence. Proximity is a measure of the similarity of the speech patterns of a speaker to those of an interlocutor across an entire conversation, while convergence and divergence signal an increase or decrease in similarity of speech patterns over the course of a conversational session. Convergence across the session is determined by calculating the difference in partner similarity between an early and a late window of the conversation. Convergence is considered evident if partners show significantly smaller differences in creak rate in the later window of the conversation. The results of the convergence metric are shown in Figure 6.
Figure 6. Proximity scores for men and women. A positive score represents convergence.
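The early-versus-late comparison behind the convergence measure can be sketched as follows. This Python is a simplified illustration under stated assumptions: the score is taken to be the partner gap in creak rate in the early window minus the gap in the late window, so that a positive score represents convergence. The function names and the binary creak flags are hypothetical, and the significance testing described in the text is omitted.

```python
def creak_rate(creak_flags):
    """Proportion of analysed units (e.g. vowels) flagged as creaky (1) rather than not (0)."""
    if not creak_flags:
        raise ValueError("window contains no analysed units")
    return sum(creak_flags) / len(creak_flags)


def convergence_score(a_early, b_early, a_late, b_late):
    """Early partner gap minus late partner gap; positive means the partners grew more similar."""
    early_gap = abs(creak_rate(a_early) - creak_rate(b_early))
    late_gap = abs(creak_rate(a_late) - creak_rate(b_late))
    return early_gap - late_gap


# Hypothetical creak flags for two partners in an early and a late window.
speaker_a_early = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
speaker_b_early = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
speaker_a_late = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
speaker_b_late = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

score = convergence_score(speaker_a_early, speaker_b_early, speaker_a_late, speaker_b_late)
print(f"convergence score = {score:.2f}")
```

In this made-up dyad, speaker A starts with far more creak than speaker B and ends close to B's rate, so the score is positive. Swapping the early and late windows gives a negative score, which corresponds to divergence.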
The first thing to note is that women on the whole converge less than men and that they show a significant amount of variance – since convergence is a change measure, the variance is between speakers. The distribution of the variance indicates that some women converge (positive numbers), while others diverge (negative numbers), but on the whole the gender makeup of the dyad does not affect convergence or divergence. Men show significantly higher convergence rates than women in both same- and mixed-gender dyads, and all of the scores are positive, but there is no statistically reliable difference between the same- and mixed-gender dyads for men either. While men show about the same amount of variance as women in the same-gender dyads, indicating between-speaker variance, they are much more consistent in the amount of convergence in mixed-gender dyads.
Taken together the overall creaky voice rate and the convergence results paint an interesting picture, and one that does not agree with recent assertions and findings that North American women have more creaky voicing in their speech than men, and that they are using voice quality to index gender. In our study, men and women use about the same percentage of creaky voice quality, but men use significantly more than women when they are conversing with women rather than with men. Furthermore, on the whole men show more convergence than women in the use of creaky voice quality. The convergence study helps to clarify the large amount of variance in the men’s use of creaky voicing in mixed dyads; it appears that at the outset men use more creaky voicing when beginning a task with a woman and then decrease the amount of creaky voicing to match that of the women’s speech. We interpret this result to indicate that men are using creaky voice quality to index gender when beginning a task with a stranger who is a woman, but decrease creakiness over time as they become more familiar with their interlocutor. This finding highlights the fluid nature of voice quality and underscores the importance of considering the task, time course, and the social context of the conversation.
6. Conclusion
Taken on the whole, these two studies highlight the complexity of voice quality use and the care that must be taken in interpreting results. In one, a segmental factor – vowel height – is shown to have a large and reliable effect on the probability that a vowel has creaky voicing in conversation. Therefore, researchers studying voice quality may want to ensure that their sample is not skewed towards one vowel height or another. Furthermore, it underscores the importance of controlling or balancing for well-established segmental and prosodic factors related to changes in voice quality when conducting research on social or pragmatic effects on creaky or breathy voicing. In the other, male speakers are no more or less likely to use creaky voicing than their female counterparts overall; however, much of the male creaky voicing derives from the fact that they are significantly more likely to use creak in mixed dyads than in male-male ones. Interestingly, female speakers in this corpus appear to be unaffected by the gender makeup of the dyad. This finding supports the social-indexical role for voice quality modulation, as described in Podesva and Callier (2015). However, this finding is most likely related to the kind of task that the speakers were engaged in: a dyad made up of strangers solving specific problems together. It is likely that under other conditions the women in the study would exhibit more creaky voicing, or would have been influenced by the makeup of the conversational dyad. On the whole it argues for careful balancing of segmental and prosodic factors together with careful control (statistical or experimental) of the social and pragmatic variables in experiments probing voice quality.
Contributor Information
Richard Wright, University of Washington.
Courtney Mansfield, University of Washington.
Laura Panfili, University of Washington.
References
- Blankenship Barbara. “The timing of nonmodal phonation in vowels”, Journal of Phonetics, 30, 163–191, 2002. DOI: 10.1006/jpho.2001.0155
- Byrd Dani. “Relations of sex and dialect to reduction”, Speech Communication, 15: 39–54, 1994. DOI: 10.1016/0167-6393(94)90039-6
- Crowhurst Megan. “The influence of varying vowel phonation and duration on rhythmic grouping biases among Spanish and English speakers”. Journal of Phonetics, 66, 82–89, 2018. DOI: 10.1016/j.wocn.2017.09.001
- Dilley Laura, and Shattuck-Hufnagel Stefanie. “Variability in glottalization of word onset vowels in American English”. In Proceedings of the XIIIth International Congress of Phonetic Sciences, Stockholm, pp. 586–589, 1995.
- Drugman Thomas, Kane John, and Gobl Christer. “Data-driven detection and analysis of the patterns of creaky voice”. Computer Speech & Language, 28(5), 1233–1253, 2014. DOI: 10.1016/j.csl.2014.03.002
- Eckert Penelope. “Variation and the indexical field”. Journal of Sociolinguistics, 12(4), 453–476, 2008.
- Esling John H., and Edmondson Jerold A. “Acoustical analysis of voice quality for sociophonetic purposes”. In Di Paolo M, & Yaeger-Dror M (Eds.), Sociophonetics: A student’s guide, chapter 11. London: Routledge, 2010.
- Ferlus Michel. “Formation des registres et mutations consonantiques dans les langues mon-khmer”. Mon-Khmer Studies, 8, 1–76, 1979.
- Freeman Valerie, Chan J, Levow Gina-Anne, Wright Richard, Ostendorf Mari, and Zayats Victoria. “Manipulating stance and involvement using collaborative tasks: An exploratory comparison”, in Proceedings of Interspeech, 2014.
- Garellek Marc. “The timing and sequencing of coarticulated non-modal phonation in English and White Hmong”. Journal of Phonetics, 40, 152–161, 2012. DOI: 10.1016/j.wocn.2011.10.003
- Garellek Marc. Production and perception of glottal stops. Ph.D. thesis, UCLA, 2013.
- Garellek Marc, and Seyferth Scott. “Acoustic differences between English /t/ glottalization and phrasal creak”, Proceedings of Interspeech, 1055–1058, 2016.
- Gobl Christer, and Ní Chasaide Ailbhe. “Voice source variation and its communicative functions”. In: Hardcastle W, Laver J, Gibbon F (eds), The Handbook of Phonetic Sciences (Second Edition). Oxford: Blackwell, 378–423, 2010.
- Gordon Matthew, and Ladefoged Peter N. “Phonation types: a cross-linguistic overview”. Journal of Phonetics, 29, 383–406, 2001. DOI: 10.1006/jpho.2001.0147
- Henton Caroline G. “Sociophonetic aspects of creaky voice”, Journal of the Acoustical Society of America, 86: S26, 1989. DOI: 10.1121/1.2027434
- Garofolo John S., Lamel Lori F., Fisher William M., Fiscus Jonathan G., Pallett David S., Dahlgren Nancy L., and Zue Victor. TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Web Download. Philadelphia: Linguistic Data Consortium, 1993.
- Keating Patricia, Garellek Marc, and Kreiman Jody. “Acoustic properties of different kinds of creaky voice”, in Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, 2015.
- Klatt Dennis H., and Klatt Laura C. “Analysis, synthesis and perception of voice quality variations among female and male talkers”, Journal of the Acoustical Society of America, 87: 820–857, 1990. DOI: 10.1121/1.398894
- Kreiman Jody. “Perception of sentence and paragraph boundaries in natural conversation”, Journal of Phonetics, 10, 163–175, 1982. DOI: 10.1016/S0095-4470(19)30955-6
- Kreiman Jody, and Gerratt Bruce R. “Perceptual interaction of the harmonic source and noise in voice”. Journal of the Acoustical Society of America, 131, 492–500, 2012. DOI: 10.1121/1.3665997
- Kreiman Jody, and Sidtis Diana. Foundations of Voice Studies. Oxford: Wiley-Blackwell, 2011.
- Kreiman Jody, Gerratt Bruce R., Garellek Marc, Samlan Robin, and Zhang Zhaoyan. “Toward a unified theory of voice production and perception”. Loquens, e009, 2014. DOI: 10.3989/loquens.2014.009
- Kreiman Jody, Vanlancker-Sidtis Diana, and Gerratt Bruce R. “Perception of voice quality”. In Pisoni D and Remez R (eds.), Handbook of speech perception (pp. 338–362). Malden, MA: Blackwell, 2005.
- Ladefoged Peter N. Preliminaries to linguistic phonetics. Chicago: University of Chicago, 1971. DOI: 10.7208/chicago/9780226221892.001.0001
- Laver John. The Phonetic Description of Voice Quality. Cambridge: Cambridge University Press, 1980.
- Levitan Rivka, and Hirschberg Julia. “Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions”. Proceedings of the Twelfth Annual Conference of the International Speech Communication Association, 2011.
- Mennen Ineke, Scobbie James, de Leeuw Esther, Schaeffler Sonja, and Schaeffler Felix. “Measuring language-specific phonetic settings”, Second Language Research, Vol. 26, No. 1, Special Issue: “Innovative and quantitative methods for bilingualism research”, 13–41, 2010. DOI: 10.1177/0267658309337617
- Panfili Laura M. Cross-Linguistic Acoustic Characteristics of Phonation: A Machine Learning Approach. Ph.D. dissertation, University of Washington, 2018.
- Pépiot Erwan. “Male and female speech: a study of mean f0, f0 range, phonation type and speech rate in Parisian French and American English speakers”. Speech Prosody 7, May 2014, Dublin, Ireland, pp. 305–309, 2014.
- Pierrehumbert Janet. “Prosodic effects on glottal allophones”. In Vocal Fold Physiology: Voice Quality Control (Fujimura O & Hirano M, editors), 39–60. San Diego: Singular Publishing Group, 1995.
- Podesva Robert, and Callier Patrick. “Voice Quality and Identity”. Annual Review of Applied Linguistics, 35, 2015. DOI: 10.1017/S0267190514000270
- Redi Laura, and Shattuck-Hufnagel Stefanie. “Variation in the realization of glottalization in normal speakers”. Journal of Phonetics, 29, 407–429, 2001. DOI: 10.1006/jpho.2001.0145
- Sapienza Christine, Hicks Douglas, and Ruddy Bari Hoffman. “Voice disorders”. In Anderson NB, and Shames GH (Eds.), Human Communication Disorders: An Introduction (8th ed., pp. 202–237). Boston: Pearson, 2011.
- Shue Yen-Liang. The voice source in speech production: Data, analysis and models. Ph.D. dissertation, UCLA, 2010.
- Slifka Janet. Respiratory constraints on speech production at prosodic boundaries. Ph.D. dissertation, MIT, 2000.
- Slifka Janet. “Some physiological correlates to regular and irregular phonation at the end of an utterance”. Journal of Voice, 20, 171–186, 2006. DOI: 10.1016/j.jvoice.2005.04.002
- Sundberg Johan. The Science of the Singing Voice. DeKalb, IL: Northern Illinois University Press, 1987. DOI: 10.1121/1.399243
- Vilkman Erkki, and Karma Pekka. “Vertical hyoid displacement and fundamental frequency of phonation”, Acta Otolaryngologica, 108:1–2, 142–151, 1989. DOI: 10.3109/00016488909107406
- Whalen Douglas H., Gick Bryan, Kumada Masanobu, and Honda Kiyoshi. “Cricothyroid activity in high and low vowels: exploring the automaticity of intrinsic F0”, Journal of Phonetics, 27, 125–142, 1998.
- Yuasa Ikuko Patricia. “Creaky Voice: A New Feminine Voice Quality for Young Urban-Oriented Upwardly Mobile American Women?”. American Speech, 85(3), 2010. DOI: 10.1215/00031283-2010-018
- Zhang Zhaoyan. “Cause-effect relationship between vocal fold physiology and voice production in a three-dimensional phonation model”. Journal of the Acoustical Society of America, 139, 1493–1507, 2016. DOI: 10.1121/1.4944754
- Zhuang Xiaodan, and Hasegawa-Johnson Mark. “Towards interpretation of creakiness in Switchboard”. In Proceedings of Speech Prosody, 37–40, 2008.
