The effect of bilingualism on production and perception of vocal fry

Lady Catherine Cantor-Cutiva; Pasquale Bottalico; Jossemia Webster; Charles Nudelman; Eric J Hunter

doi:10.1016/j.jvoice.2021.06.002

Author manuscript; available in PMC: 2024 Nov 1.

Published in final edited form as: J Voice. 2021 Jul 21;37(6):970.e1–970.e10. doi: 10.1016/j.jvoice.2021.06.002

PMCID: PMC8770720 NIHMSID: NIHMS1726518 PMID: 34301440

Abstract

Aims:

(1) Determine the difference in vocal fry phonation in English and Spanish productions among bilingual young adults, (2) Characterize the effect of spoken language and native language on vocal fry production among English-Spanish bilingual speakers, (3) Identify the effect of first and second language knowledge of the listener in the voice perceptual assessment, and (4) Define the effect of the environment of the assessment (in situ vs. online), in the voice perceptual assessment.

Method:

Exploratory cross-sectional study of 34 bilingual (Spanish-English) speakers and six inexperienced listeners. Participating speakers produced two speech samples (one in English and one in Spanish). Six inexperienced monolingual and bilingual listeners performed the voice perceptual assessment of vocal fry, General grade of hoarseness, and Roughness using a 4-point rating scale.

Results:

Bilingual speakers used vocal fry more often when they were speaking in English (around 3%) compared with their production in Spanish (around 2%). Bilingual native English speakers used vocal fry more often during their productions in both languages compared with bilingual native Spanish speakers. Bilingual listeners had the highest agreement when identifying vocal fry in both languages.

Conclusions:

Differences in production of vocal fry between native speakers of American English and native speakers of Spanish may be evidence of transferring of vocal behavior (such as vocal fry) from one language to the second one. In addition, being a bilingual listener may have an important effect on the perceptual identification of voice quality in English and Spanish, as well as vocal fry in English.

Keywords: perceptual assessment of voice, vocal fry, bilingualism

INTRODUCTION

According to the Center for Immigration Studies, almost 52% of the immigrant population in the United States in 2014 came from Latin America (Camarota, 2016), where the most common language is Spanish. This long-standing immigration of native Spanish speakers (dating from the 19th century, with the Adams-Onís Treaty of 1819 with Spain and the Treaty of Guadalupe Hidalgo in 1848) has established Spanish as the most widely spoken non-English language in the United States (Rumbaut & Massey, 2013). This increased number of Spanish speakers in the United States may be one reason for an increased number of English-Spanish bilingual speakers. Native Spanish speakers often learn English in order to communicate and adjust to life in the United States (Gordon, 1964), while the high exposure to Spanish in the United States likely motivates native English speakers (born in the United States and without Hispanic roots) to learn Spanish as a second language (L2) to interact with monolingual Spanish speakers. Over the past decade, research on bilingualism has become common and has increased our understanding of voice production and perception among bilingual speakers, though rarely in a combined study.

Voice Production in Bilingual Speakers

Regarding voice production, previous studies on cognitive correlates and on speech and voice production have found differences between bilingual and monolingual speakers (Adesope et al., 2010; Gunnerud et al., 2020; Hambly et al., 2013). Adesope et al. (2010) suggested that bilingual speakers appear to have a heightened metalinguistic awareness and are therefore more aware of differences in syntactic rules and phonological systems across languages. Other research focused more on speech production has reported differences in voice onset time (VOT) according to the age of initial learning of English (before 12 years old vs. after 12 years old) (Thornburgh & Ryalls, 1998) or the language spoken (English vs. Spanish) (Balukas & Koops, 2015), in long-term average spectrum (LTAS) among Catalan-Spanish bilingual speakers (Bruyninckx et al., 1994), and in fundamental frequency (fo) among bilingual English-Russian speakers (Altenberg & Ferrand, 2006a), English-Cantonese speakers (Ng et al., 2012), and Welsh-English speakers (Ordin & Mennen, 2017).

The production of speech and vocal behaviors acquired in the native language (L1) during productions in the second language (L2) may be an indication of either code-switching or a cross-linguistic influence. For example, code-switching seems to occur when two or more languages are mixed in discourse without a change of listener or topic (El Bolock et al., 2020; Poplack, 2001). Additionally, there can be a transfer of a vocal production behavior (such as VOT), where the style of one language (e.g., Spanish) influences the standard production of a second learned language (e.g., English) (Balukas & Koops, 2015; Elias et al., 2017).

Nevertheless, there is a dearth of studies investigating differences in vocal behaviors such as vocal fry (which is also a vocal register) among bilingual English-Spanish speakers. Moreover, while research on VOT has indicated that a vocal behavior from one language can be transferred to a second one (Balukas & Koops, 2015; Elias et al., 2017), vocal fry seems to have a strong American English component and may not translate. Additionally, vocal fry may be such a language-specific vocal behavior that it contributes to an increased (or decreased) occurrence of voice disorders among bilingual speakers. Previous research has reported that bilingual English-Spanish speakers had slightly decreased shimmer and increased harmonics-to-noise ratio (HNR) compared with monolingual English speakers (Cantor-Cutiva et al., 2019), which may be interpreted as bilingual speakers having better voice quality. Although it is not clear which bilingualism-related factors contribute to these differences between monolinguals and bilinguals, vocal fry could be one contributing factor, since it seems to be English specific (for English-Spanish bilingual speakers).

Voice Perception in Bilingual Speakers

Regarding voice perception, the influence of listener variables (e.g., gender, language proficiency) on the perceptual assessment of the voice of bilingual speakers has been examined, though not as completely as bilingual production. One study showed no significant distinctions between the ratings of monolingual and bilingual listeners when assessing nasality in the English and Spanish speech of bilingual children (Watterson et al., 2013). Another study, including Japanese and American listeners, found no significant differences between the two groups in the use of the Grade and Roughness scales for the perceptual assessment of the voices of monolingual English speakers (Yamaguchi et al., 2003). In a study in which monolingual listeners and two different populations of bilingual listeners rated the presence of disordered voice production in a group of 10 females, each with a mild, moderate, or severe voice disorder or with no voice disorder, one bilingual population rated individuals with severe voice disorders more negatively than the other bilingual population and the monolingual population (Altenberg & Ferrand, 2006b); however, the authors point out that the results indicate a potential cultural difference in the acceptability of poor voice quality rather than a bilingual difference in the perception of the severity of voice quality itself. In terms of vocal fry perception, Crowhurst (2018) found that while fry is not as common in Spanish production, vocal fry is a perceptually salient linguistic feature for Spanish speakers. For this reason, vocal fry may be a useful vocal behavior to assist in the study of bilingual English-Spanish speakers from both the voice production and the voice perception perspectives.

Vocal Fry Production

Vocal fry production has been defined as the vocal register with fundamental frequency values below 70 Hz, often described perceptually as a “creaky voice” or sounding similar to the “popping of corn” (Blomgren et al., 1998; Hollien et al., 1966; McGlone & Shipp, 1971; Michel, 1968). Previous studies on vocal fry among native English speakers have concluded that vocal fry may be used in communication as a syntactic cue, particularly since the occurrence of vocal fry seems to be more frequent at the end of paragraphs and sentences (Davidson, 2019; Henton & Bladon, 1987; Kreiman, 1982; Redi & Shattuck-Hufnagel, 2001; Wolk et al., 2012). This placement may also offer paralinguistic information to the listener about turn-taking during conversations (Ogden, 2001). Vocal fry has also been suggested to have sociolinguistic purposes related to an expression of emotions (such as surprise or admiration), a mark of hesitation (Carlson et al., 2006), or (in the case of females using it) as a strategy to appear more masculine to move forward in a male-dominant society (Parker & Borrie, 2018; Yuasa, 2010).

Studies on cross-linguistic influence have demonstrated that, in bilingual speakers, a cross-linguistic influence in lexical stress may occur in the transfer of articulatory patterns from L1 (native language) to L2 (second language), which results in a foreign language accent and affects individual phonemes and suprasegmental features (Bohn, 1995; Ellis & Ellis, 1994; Flege, 1995; Strange, 1995; Wayland et al., 2006). Speakers of English as a second language seem to learn to use vocal fry to offer cues to English lexical stress (Gibson, 2017), whereas speakers of Spanish as a second language transfer fry phonation to their productions in Spanish, where fry phonation is not common. Nevertheless, it is still unknown how this transfer happens and what its consequences are for the voice acoustic parameters and vocal health of bilingual speakers.

Few studies exist examining the occurrence of vocal fry among adult female speakers exposed to two languages. Gibson et al. (2017) showed a higher occurrence of fry phonation when repeating English nonwords compared with the repetition of Spanish nonwords. Consequently, vocal fry use was linked with lexical representations of the English language (Gibson et al., 2017; Gibson & Summers, 2018). However, the speech material used in the Gibson et al. (2017) study was composed of nonwords in both English and Spanish and, consequently, may not be a good proxy for the conversational use of vocal fry among bilingual English-Spanish speakers. Therefore, there is a need for studies that investigate the occurrence of vocal fry among bilingual English-Spanish speakers in either conversational or conversational-like contexts to identify the use of this vocal register in their daily communication. This information will help to confirm whether the use of vocal fry is linked with linguistic or sociolinguistic motivations related to representations of English. Moreover, it will help to understand the possible variations in the vocal health of bilingual speakers associated with adaptations related to speaking a second language. This information is important considering that, for instance, bilingual speakers could be misdiagnosed with roughness when using vocal fry in Spanish.

Although there are some indications about the influence of knowing or speaking an L2 on voice production and perception, there is a dearth of studies investigating the differences in voice production and perception (modal phonation and fry phonation) among bilingual English-Spanish speakers. To the best of the authors' knowledge, it is currently not clear whether language knowledge influences the perceptual identification of decreased voice quality, which could cause misdiagnosis of a voice disorder in bilingual Spanish/English-speaking clients.

Therefore, in order to better understand these issues, the current cross-sectional study of 34 bilingual English-Spanish speakers and 6 inexperienced listeners was designed with the following four aims: (1) Determine the difference in vocal fry phonation in English and Spanish productions among bilingual young adults, (2) Characterize the effect of spoken language and native language on vocal fry production among English-Spanish bilingual speakers, (3) Identify the effect of first and second language knowledge of the listener in the voice perceptual assessment, and (4) Define the effect of the environment of the assessment (in situ vs. online), in the voice perceptual assessment. While these aims are broad, a broad look is necessary to expand the growing body of knowledge of voice production and perception among bilingual English-Spanish speakers.

METHOD

Design and Participants

This study was performed between the winter of 2017 and the spring of 2018 under approval of the Institutional Human Research Protection Program. English-Spanish bilingual speakers were invited to participate in this cross-sectional study. Inclusion criteria for the speakers were no history of hearing, speech, or voice problems and no history of voice therapy. Of the thirty-four bilingual English-Spanish speakers enrolled in this study, approximately 38% were from Texas (n=13), 32% were from the Midwestern United States (n=8 from Michigan, 2 from Illinois, 2 from Ohio), 2 participants were from Florida, 1 from New Hampshire, and 1 from Georgia, and around 15% were from Latin-American countries (n=5 from Colombia and 1 from Mexico). Among the participating speakers, 13 were American native English speakers (38%), whereas 21 were native Latin-American Spanish speakers (61%). Native English speakers were those participants who reported having learned English first and then Spanish, whereas native Spanish speakers learned Spanish first and then English. Gender distribution among bilingual speakers was 22 females and 12 males, with a mean age of 24 years (SE=1.13).

In addition, in order to assess the reliability of the classification used in this study of native English and native Spanish speakers, a foreign “accentedness” assessment (Rogers et al., 2006) was performed by two bilingual Spanish-English speakers (one native Spanish speaker from Peru, and one native English speaker from Michigan, US). The listeners evaluated the recorded Spanish and English samples (respectively) for foreign accent using a 10-point scale (0= no accent, 9= very heavy accent). The raters were blind to the self-reported native language of the participants. After the accent assessment, the agreement between the self-report and the score of the listener was assessed by Receiver Operating Characteristic (ROC) curves, whereby the area under the curve (AUC) would reflect the level of accuracy by which the score of the listener agrees with the report of the participants. An AUC of 0.5 would reflect a complete absence of any agreement, an AUC of 1 would reflect a perfect agreement, and an AUC of 0.70–0.80 would be considered good accuracy (Fawcett, 2006). The results of this analysis indicated good agreement between the self-report and the listener score (AUC=0.7 for the English native speaker, and AUC=0.8 for the Spanish native speaker). Therefore, it is likely that the participants were correctly classified as native English and native Spanish speakers.
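The ROC analysis described above can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen sample from one group receives a higher accent score than a randomly chosen sample from the other group (ties counted as one half), which is equivalent to the area under the ROC curve. The function name `auc_from_scores` and the toy ratings are assumptions for illustration; they are not the study's data, and the study's own software is not reproduced here.

```python
def auc_from_scores(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation).

    scores_positive: listener scores for samples self-reported as non-native
    scores_negative: listener scores for samples self-reported as native
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0      # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5      # ties contribute one half
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical accent ratings on the 0-9 scale (not the study's data)
non_native_scores = [6, 7, 4, 8]
native_scores = [1, 0, 4, 2]
auc = auc_from_scores(non_native_scores, native_scores)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 from this function corresponds to scores that do not separate the two groups at all, and 1.0 to perfect separation, matching the interpretation given in the text.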

Participating listeners were undergraduate college students. Inclusion criteria for the listeners were no previous experience in perceptual evaluation of voice and normal hearing (self-reported). Six inexperienced listeners (2 monolinguals and 4 bilinguals; 5 females and 1 male) were trained in the perceptual identification of vocal fry, with two dimensions of the GRBAS Scale (G and R) also included. During training, we reviewed the definitions of the dimensions and provided accuracy feedback and anchor samples. After two training sessions (of about 2 hours each), trained listeners judged all samples for vocal fry, G score (general grade of hoarseness), and R score (roughness) using a 4-point rating scale (0 for normal, 1 for slight presence of the factor, 2 for moderate presence of the factor, and 3 for severe presence of the factor).

Because previous research has reported that the environment where judgments are made influences the listener’s internal standards (Kreiman et al., 2007), listeners were located in two different environments. In this study, we define environment based on three elements: physical location, the way of accessing the voice samples, and the way of receiving the training. Four out of six listeners (2 bilinguals and 2 monolinguals) participated in the study at the same location where the experiment was run (in situ listeners) with in-person training. These listeners had access to the audio files played on two laptops located in a room with low background noise. The remaining 2 listeners (bilinguals) were in a different location, participating in training online and performing their ratings at a distance (online listeners). These listeners had access to the audio files online. Since 2 out of the 6 listeners were in a different location, performance bias was likely. Performance bias may happen when the investigators give more attention or feedback to one group of subjects in the study. In order to avoid this bias, the first author was always present, either in person or available online, when the listeners were performing their perceptual assessments. Therefore, all the listeners received the same feedback.

Data collection procedures

Informed consent form and Questionnaire

At the beginning of the session, participating speakers were invited to read and sign the informed consent form. After signing the informed consent form, participating speakers filled out a short questionnaire and recorded two voice samples (one in Spanish and one in English).

The questionnaire used for this study included questions on gender and age (socio-demographics) and on self-reported native language. Participants also reported whether they had had hearing, speech, or voice problems, and whether they had received voice therapy.

Voice samples

Participating speakers were asked to read aloud two texts, one in each language. For this study, we used “The Rainbow Passage” (Fairbanks, 1960) for English productions, and “El Caballero de la Armadura Oxidada” (The Knight in Rusty Armor, a standardized text in Spanish) (Bermudez de Alvear, 2003) for the Spanish productions. Both speech samples were produced nine times under different “virtual-simulated” acoustic conditions. The order of production of each task (Spanish vs. English), as well as the order of presentation of the nine virtual-simulated scenarios, was randomized to control for any unknown confounding variables related to task order. In this manuscript, we do not analyze the effect of the “virtual-simulated” acoustic condition on the vocal fry production of the bilingual speakers, but all the samples are included in the perceptual voice assessment by the listeners. Each participating listener was requested to listen to 788 samples (9 conditions × 34 speakers × 2 languages = 612 samples, plus 30% randomly repeated samples to assess intra-rater reliability, minus 8 damaged repeated samples).
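The sample count per listener can be checked with a few lines of arithmetic. The sketch below assumes that the 30% of repeated samples is taken from the 612 unique recordings and rounded to the nearest whole sample; the rounding rule is an assumption, since the text does not state it.

```python
CONDITIONS = 9          # virtual-simulated acoustic conditions
SPEAKERS = 34           # bilingual speakers
LANGUAGES = 2           # English and Spanish
REPEAT_FRACTION = 0.30  # share of samples repeated for intra-rater reliability
DAMAGED_REPEATS = 8     # damaged repeated samples that were excluded

unique_samples = CONDITIONS * SPEAKERS * LANGUAGES
# Assumption: the number of repeats is rounded to the nearest whole sample.
repeated_samples = round(unique_samples * REPEAT_FRACTION)
total_per_listener = unique_samples + repeated_samples - DAMAGED_REPEATS

print(unique_samples, repeated_samples, total_per_listener)  # 612 184 788
```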

Equipment

The equipment used in this study has been reported in previous publications on vocal fry among monolingual speakers (Cantor-Cutiva et al., 2017, 2018). We used an omnidirectional microphone (M2211, NTi Audio, Tigard, OR, USA) placed at a fixed distance of 30 cm from the mouth of the participant to record the speech samples. The signal acquired by the microphone was split in two signals: the first output was for direct recording and the second output for creating the virtual acoustic environment. The mixed signal (virtual environment by combining three noise types and three reverberation times) was played back to the participant using headphones (SRH840, Shure, Niles, IL, USA).

Percentage of automatically detected vocal fry

The production of vocal fry was investigated across all the voice samples. The percentage of automatically detected vocal fry was calculated in four steps. The first two steps were reported in a previous study (Cantor-Cutiva et al., 2017). First, by means of an analysis technique for the automatic detection of vocal fry (Ishi et al., 2008), local power peaks were identified as candidate vocal fry pulses. Second, the ratio between the duration of the segments recognized as vocal fry and the full voice sample duration (multiplied by 100) was taken as the overall percentage of vocal fry in both speech samples (i.e., the Rainbow Passage and El Caballero de la Armadura Oxidada). Third, we calculated the time Dose (Dt) to quantify the total time of vocal fold vibration in seconds (Titze et al., 2003). The time Dose percentage (Dt%) was calculated as the percentage of the total voicing time over the total duration of the voice productions in both languages (Spanish and English). The voicing frames were determined by means of Praat 5.4/5.4.17 (Netherlands) using two criteria: (1) for the fundamental frequency, a lower bound of 30 Hz and an upper bound of 400 Hz, and (2) a voicing threshold equal to 0.45 and a silence threshold equal to 0.03. A frame was rated as unvoiced if it had an intensity below the voicing threshold or a local peak below the silence threshold. Fourth, the percentage of vocal fry over the voiced speech was calculated as the ratio of the overall vocal fry percentage to the time Dose percentage (multiplied by 100).
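Steps two to four above amount to three ratios, which the following minimal sketch makes explicit. It assumes the fry duration, voiced duration, and total sample duration have already been obtained (from the detection algorithm and from Praat, respectively); the function name `fry_percentages` and the example durations are hypothetical.

```python
def fry_percentages(fry_duration_s, voiced_duration_s, total_duration_s):
    """Return (overall fry %, time dose %, fry % over voiced speech).

    All durations are in seconds and are assumed to be measured upstream.
    """
    if total_duration_s <= 0 or voiced_duration_s <= 0:
        raise ValueError("total and voiced durations must be positive")
    # Step 2: fry segments relative to the full sample duration
    overall_fry_pct = 100.0 * fry_duration_s / total_duration_s
    # Step 3: time dose percentage (voicing time relative to full duration)
    dt_pct = 100.0 * voiced_duration_s / total_duration_s
    # Step 4: ratio of the two percentages, multiplied by 100
    fry_over_voiced_pct = 100.0 * overall_fry_pct / dt_pct
    return overall_fry_pct, dt_pct, fry_over_voiced_pct


# Hypothetical sample: 60 s recording, 30 s voiced, 0.9 s detected as fry
overall, dt, fry_voiced = fry_percentages(0.9, 30.0, 60.0)
print(f"overall fry = {overall:.1f}%, Dt% = {dt:.1f}%, fry over voiced = {fry_voiced:.1f}%")
```

Because the total duration cancels out, the step-four value is equivalent to the fry duration divided by the voiced duration, multiplied by 100.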

Reliability of the percentage of automatically detected vocal fry

The perceptual assessment of two bilingual English-Spanish trained listeners was used to assess the reliability of the automatic method for the detection of vocal fry. After listening to either the text in English (Rainbow Passage) or the text in Spanish (El Caballero de la Armadura Oxidada), trained listeners determined the presence or absence of vocal fry during the production using a 4-point rating scale (0 to 3). Audio files of the speech samples were randomly presented to the two listeners. For the statistical analysis, a dichotomous variable was used, with subjects having a score for persistence of vocal fry of one or above considered to be subjects with vocal fry production. Assessment of intra-rater reliability showed good intra-listener agreement for both listeners (kappa coefficient=0.7 for both). Assessment of inter-rater reliability showed moderate agreement (kappa coefficient=0.6). After the perceptual assessment, the agreement between the automatic detection method of vocal fry (as independent variable) and the perceptual identification of vocal fry was assessed by Receiver Operating Characteristic (ROC) curves, whereby the area under the curve (AUC) reflects the level of accuracy with which the automatic detection method can detect perceptually identified vocal fry. The results of this analysis indicated good agreement between the perceptual identification of vocal fry and the percentage of vocal fry calculated by means of the automatic detection method (AUC=0.8). Therefore, we can conclude that the automatic detection method was a good approach for the identification of occurrences of fry.
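The dichotomization and agreement steps can be sketched as follows. The text reports only a "kappa coefficient", so the use of Cohen's kappa for two raters is an assumption here, as are the function names and the toy ratings.

```python
def dichotomize(scores, threshold=1):
    """Map 0-3 persistence scores to 1 (fry present) or 0 (fry absent)."""
    return [1 if s >= threshold else 0 for s in scores]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters (or two sessions) over the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's marginal proportions
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 0-3 ratings from two listeners for eight samples
rater_a = dichotomize([0, 1, 2, 0, 3, 0, 1, 0])
rater_b = dichotomize([0, 2, 1, 1, 3, 0, 0, 0])
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Passing a listener's first and second ratings of the same samples gives an intra-listener value; passing two different listeners' ratings gives an inter-listener value.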

Perceptual assessment of voice production

Audio files of the two texts (English and Spanish) were randomly presented to the six listeners. All listeners performed their ratings in a quiet room using the same type of headphones (SRH840, Shure, Niles, IL, USA). Similar to a previous study (Cantor-Cutiva et al., 2018), all listeners were asked to rate the presence or absence of vocal fry in the productions. For each audio file, the listeners indicated whether they perceived vocal fry in the speech production and, if so, how “persistent” it was (from 0 to 3, with 0 being no vocal fry and 3 being persistent presence of vocal fry). In addition, we asked the listeners to rate the overall grade of hoarseness (G) and roughness (R) in all the voice samples (from 0 to 3, with 0 being no presence of the factor and 3 being severe presence of the factor). These two factors are part of the five scores of the GRBAS Scale (Yu et al., 2002). However, we only used G and R because these two dimensions have been found to have high internal consistency in the GRBAS scale (de Bodt et al., 1997). In order to avoid fatigue, listeners spent a maximum of one hour per session listening to and rating the voice samples.

Reliability

Intraclass correlation was used to assess the intra- and inter-listener agreement among the participating raters. In order to maximize intra- and inter-listener reliability for the perceptual assessment, all the listeners were trained by one experienced speech-language pathologist (first author). For the intra-reliability assessment, 30% of the voice recordings were randomly selected to be rated a second time by each rater. Therefore, each listener listened to a total of 788 samples.
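For readers unfamiliar with the intraclass correlation, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rating), from the two-way ANOVA mean squares. The text does not state which ICC form was computed, so the choice of ICC(2,1) is an assumption, as are the function name and the toy ratings.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per sample, one column per rater."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 samples and 2 raters, no missing values")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for samples (rows), raters (columns), and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical 0-3 vocal fry ratings: five samples rated by three listeners
toy_ratings = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 3, 2],
    [0, 1, 0],
    [3, 3, 2],
]
print(f"ICC(2,1) = {icc_2_1(toy_ratings):.2f}")
```

For intra-listener reliability, the two columns would hold a single listener's first and repeated ratings of the same samples.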

Statistical Analysis

SPSS 22 software was used for all statistical analyses. Considering the objectives of this study, the statistical analysis was organized in two steps. First, the analysis of bilingualism and vocal fry production was performed. Second, we analyzed the effect of bilingualism on voice perception.

For the analysis of the effect of bilingualism on vocal fry production, we assessed differences in the occurrence of vocal fry per language by means of the General Linear Model Repeated Measures (GLM). This method is recommended when the dataset includes the same measurement made several times on each subject. Afterwards, the Shapiro-Wilk test was used to assess normality of the dependent variable (percentage of vocal fry). As a final step, Generalized Estimating Equations (GEEs) were used to determine whether spoken language was associated with vocal fry percentage. Independent variables with a p-value lower than 0.20 in the univariate analyses were included in the multivariate analysis in order to avoid residual confounding (Maldonado & Greenland, 1993), and were only retained when the p-value reached the conventional level of significance of 0.05. The magnitude of the association was expressed by the beta (β) and its standard error (SE).
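The variable-selection rule described above (entry at p < 0.20, retention at p < 0.05) can be sketched as a simple filter. The function name is hypothetical, and the example uses the p-values reported in Table 1 purely to illustrate the screening step; it does not refit any model.

```python
def screen_predictors(p_values, threshold):
    """Return the names of predictors whose p-value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# p-values as reported in Table 1 (rounded to two decimals in the source)
univariate_p = {"Female": 0.22, "Age": 0.74, "Spanish language": 0.00}

# Entry criterion for the multivariate analysis
candidates = screen_predictors(univariate_p, threshold=0.20)
# Retention criterion (conventional significance level)
retained = screen_predictors(
    {name: univariate_p[name] for name in candidates}, threshold=0.05
)
print(candidates, retained)
```

In a full analysis the retention step would use the p-values from the multivariate model rather than the univariate ones; they coincide in this sketch only because a single predictor passes the entry criterion.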

For the analysis of the effect of bilingualism on voice perception, three dependent variables and two independent variables were defined. The dependent variables were discrete variables (from 0 to 3) and included vocal fry, Grade, and Roughness. The independent variables were dichotomous and included the listener’s language knowledge (yes-bilinguals, no-monolinguals) and the environment of the assessment (in situ, online). In the factor analysis, we included the variable gender (male, female) as an independent variable. Our dataset contained no missing values because the listeners rated all the samples. First, the intraclass correlation was calculated to assess the agreement within and between the perceptual assessments by the raters (G and R scores and production of vocal fry). Second, a factor analysis was performed to identify those factors that best explained the variance among the perceptual ratings. Third, Generalized Estimating Equations (GEEs) with a multinomial distribution were used to determine whether being a bilingual listener was associated with the perceptual identification of vocal fry, G score, and R score in English and Spanish. The magnitude of the association was expressed as the beta (β) and its standard error (SE).

RESULTS

Bilingualism and vocal fry production

Occurrence of vocal fry in English and Spanish

The percentage of automatically detected vocal fry in the voice frames in English and Spanish produced by English-Spanish bilingual speakers is shown in Figure 1. On average, participants produced vocal fry in about 2.3% of their productions. Native English speakers used vocal fry more often during their productions in both languages, English (mean=4.6%; SE=0.4) and Spanish (mean=2.9%; SE=0.3), compared with native Spanish speakers (mean in English=1.9%; mean in Spanish=0.9%).

Figure 1. Mean vocal fry percentage by language spoken and native language.

Differences in use of vocal fry in English and Spanish among bilingual young adults

The results of the General Linear Model Repeated Measures (GLM) indicate a statistically significant difference in the production of vocal fry between English and Spanish (F=274.38; p-value<0.01). Speakers used vocal fry more often when they were speaking in English (mean=3%, SE=0.2) compared with their production in Spanish (mean=2%, SE=0.1). In addition, significant differences were identified in the use of vocal fry between native English bilingual speakers and native Spanish bilingual speakers.
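As a consistency check on the reported means, the per-language and overall averages can be approximated from the group means and group sizes given earlier (13 native English and 21 native Spanish speakers). The sketch assumes each speaker contributes equally to the pooled mean; this is a reader's arithmetic check, not the model-based estimate from the GLM.

```python
N_NATIVE_ENGLISH = 13
N_NATIVE_SPANISH = 21

# Reported mean vocal fry percentages by native language and language spoken
MEAN_FRY = {
    ("native English", "English"): 4.6,
    ("native English", "Spanish"): 2.9,
    ("native Spanish", "English"): 1.9,
    ("native Spanish", "Spanish"): 0.9,
}


def pooled_mean(language_spoken):
    """Speaker-weighted mean fry percentage for one spoken language."""
    total = (
        N_NATIVE_ENGLISH * MEAN_FRY[("native English", language_spoken)]
        + N_NATIVE_SPANISH * MEAN_FRY[("native Spanish", language_spoken)]
    )
    return total / (N_NATIVE_ENGLISH + N_NATIVE_SPANISH)


english_mean = pooled_mean("English")
spanish_mean = pooled_mean("Spanish")
overall_mean = (english_mean + spanish_mean) / 2

print(f"English: {english_mean:.2f}%, Spanish: {spanish_mean:.2f}%, overall: {overall_mean:.2f}%")
```

Under this assumption the pooled values round to about 3% in English, 2% in Spanish, and 2.3% overall, in line with the figures reported in the text.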

Effect of spoken language on vocal fry production among English-Spanish bilingual speakers

The univariate analysis of the association of language spoken, gender, and age with the production of vocal fry showed that speaking Spanish was associated with a reduction of 1.7% in the production of vocal fry among bilingual English-Spanish speakers. Since voice production is a multidimensional phenomenon, we investigated the effect of age and gender on the production of vocal fry. No significant association of either gender or age with the production of vocal fry was found among the participating bilingual English-Spanish speakers (Table 1).

Table 1.

Association between spoken language, gender, and age with vocal fry among bilingual English-Spanish speakers

Parameter	B	SE	p-value
Female	−6.28	5.14	0.22
Age	−0.07	0.2	0.74
Spanish language	−1.70*	0.42	0.00


B= Beta; SE= Standard Error

Bilingualism and perceptual assessment of voice

Intra-Reliability of voice perceptual assessment

Table 2 shows that Intra-rater reliability was less variable for judgments of vocal fry than judgments for Grade (G) and Roughness (R) in English (intraclass correlation coefficient [ICC] R01= 0.91, R02= 0.88, R03= 0.69, R04= 0.32, R05= 0.80, R06= 0.70). One of the monolingual listeners got the lowest ICC (R04), whereas one of the bilingual listeners got the highest ICC (R01).

Table 2.

Intra-reliability of six participating raters per language (English and Spanish)

RATER	ENGLISH			SPANISH
RATER	Grade	Roughness	Vocal Fry	Grade	Roughness	Vocal Fry
R01 (bilingual)	0.69	0.76	0.91	0.66	0.79	0.71
R02 (monolingual)	0.78	0.67	0.88	0.73	0.61	0.76
R03 (bilingual)	0.70	0.51	0.69	0.42	0.14	0.75
R04 (monolingual)	−0.127	−0.205	0.32	0.61	−0.042	0.14
R05 (bilingual)	0.44	0.11	0.80	0.16	−0.138	0.78
R06 (bilingual)	0.80	0.25	0.70	0.47	0.56	0.68


Inter-Reliability of perceptual assessment of voice

Table 3 shows the results of the intraclass correlation coefficients for Inter-rater reliability. The four bilingual listeners had the highest agreement when identifying vocal fry in both languages (intraclass correlation coefficient [ICC] English= 0.84, Spanish= 0.80). In contrast, bilingual listeners had low reliability when identifying the overall grade of hoarseness (G) and roughness (R) in English. The lowest Inter-rater reliability was found on the perception of the G score among the two monolingual listeners (ICC English= 0.02, Spanish= 0.17).

Table 3.

Inter-reliability of six participating raters per language (English and Spanish)

Parameter	ENGLISH		SPANISH
	Bilinguals	Monolinguals	Bilinguals	Monolinguals
Grade score	0.26	0.02	0.41	0.17
Roughness score	0.35	0.20	0.37	0.24
Vocal Fry	0.84	0.68	0.80	0.50


Factor analysis of bilingualism and perceptual assessment of voice quality

Figures 2, 3 and 4 show the analysis of the factors that best explain the variance among the perceptual ratings. The figures demonstrate that being a bilingual listener accounted for approximately 50% of the variance for the three assessed aspects (Grade, Roughness, Vocal fry). The second most important factor was the language spoken in the audio file (Rainbow Passage vs. Caballero de la Armadura Oxidada), followed by location (in situ vs. online), and gender of the listener (female vs. male).

Figure 2. — Factor analysis of variance of perceptually identified vocal fry.

Figure 3. — Factor analysis of variance of Grade scores

Figure 4. — Factor analysis of variance of Roughness scores

Association between bilingualism and voice perceptual assessment

Table 4 shows the results of the association analysis, by means of Generalized Estimating Equations, between being a bilingual listener and the perceptual identification of vocal fry, overall grade of hoarseness (G score), and roughness (R score). Bilingual listeners identified hoarseness and roughness in both languages (English and Spanish), and vocal fry in English, more often than monolingual listeners: ratings for the G score (B = 1.89 in English and B = 0.90 in Spanish), the R score (B = 1.43 in English and B = 1.22 in Spanish), and vocal fry in English (B = 0.54) were statistically significantly higher among bilinguals than among monolinguals. There was no statistically significant association between being a bilingual listener and the perceptual identification of vocal fry in Spanish (B = −0.24, p = 0.48).

Table 4.

Associations between being a bilingual listener and voice quality perceptual assessment

Speech production in English
Variable	Grade score			Roughness score			Vocal Fry
	Beta	SE	p-value	Beta	SE	p-value	Beta	SE	p-value
Being a bilingual listener	1.89*	0.43	0.00	1.43*	0.14	0.00	0.54*	0.11	0.00

Speech production in Spanish
Variable	Grade score			Roughness score			Vocal Fry
	Beta	SE	p-value	Beta	SE	p-value	Beta	SE	p-value
Being a bilingual listener	0.90*	0.11	0.00	1.22*	0.09	0.00	−0.24	0.34	0.48
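
The relation between the Beta, SE, and p-value columns of Table 4 can be illustrated with a short worked calculation. The sketch below assumes that the p-values come from a two-sided Wald z-test (z = Beta / SE), which is a common default for Generalized Estimating Equations output but is not stated in this section. The helper name `wald_test` is hypothetical. Under that assumption, the Spanish vocal fry estimate gives a p-value that rounds to the reported 0.48, and the remaining estimates give p-values far below 0.005, consistent with their being reported as 0.00 after rounding.

```python
import math


def wald_test(beta, se):
    """Return (z, two-sided p) for a Wald z-test of a single coefficient.

    Assumption: the coefficient is tested against zero with a standard
    normal reference distribution.
    """
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# (Beta, SE) pairs copied from Table 4.
table4 = {
    "English, Grade": (1.89, 0.43),
    "English, Roughness": (1.43, 0.14),
    "English, Vocal Fry": (0.54, 0.11),
    "Spanish, Grade": (0.90, 0.11),
    "Spanish, Roughness": (1.22, 0.09),
    "Spanish, Vocal Fry": (-0.24, 0.34),
}

for label, (beta, se) in table4.items():
    z, p = wald_test(beta, se)
    print(f"{label}: z = {z:.2f}, p = {p:.2g}")
```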


DISCUSSION

In this study, we aimed to (1) Determine the difference in vocal fry phonation in English and Spanish productions among bilingual young adults, (2) Characterize the effect of spoken language and native language on vocal fry production among English-Spanish bilingual speakers, (3) Identify the effect of first and second language knowledge of the listener in the voice perceptual assessment, and (4) Define the effect of the environment of the assessment (in situ vs. online), in the voice perceptual assessment. These broad aims provide a means to expand the growing body of knowledge on voice production and perception among bilingual English-Spanish speakers.

Concerning the first and second aims, production of vocal fry appears to be more common in English than in Spanish among bilingual young adults. These results agree with previous studies. For example, Gibson et al. (2017) found a higher percentage of vocal fry in English than in Spanish among adult female speakers exposed to two languages, regardless of their proficiency in the second language (L2). A similar pattern has been reported for another language pair: Benoist-Lucy & Pillot-Loiseau (2013) reported a significantly higher production of vocal fry in English than in French for bilingual speakers.

One explanation for the increased use of vocal fry in English among bilingual young adults is the virtuous cycle hypothesis (Gibson et al., 2017), which states that vocal fry use might simply be the result of being exposed to this vocal register. Under this hypothesis, bilingual speakers include information such as vocal fry in their lexical representations of English even when it is not part of the sound system of the language. Gibson et al. (2017) suggest that bilinguals who are exposed (in some proportion) to both English and Spanish use vocal fry more in English than in Spanish because fry phonation would be “encoded in the English lexical representations” of the speakers regardless of their proficiency in either language. Therefore, production of vocal fry in English would not be exclusively related to linguistic and sociolinguistic purposes but would also be associated with the lexical representation of English.

Another possible explanation for the increased use of vocal fry in English is a language dependency of the phonatory process. Considering “voice quality” as language-dependent would imply that both intrinsic and extrinsic features, as defined by Laver (1987), should be considered. “Voice quality” would then be not only the product of specific anatomic and physiologic characteristics of the speaker (intrinsic features), but also of long-term muscular adjustments of the vocal apparatus “acquired perhaps by social imitation” (extrinsic features) (Laver, 1987). Along these lines, non-native English speakers learn not just the supra-laryngeal settings (articulation) of English but also the laryngeal settings (phonation) (Benoist-Lucy & Pillot-Loiseau, 2013). Since it has been reported that a low proportion of adult second-language learners achieve “native-speaker competence” (Ho, 1986), it is likely that native Spanish speakers tend to produce vocal fry less often than native English speakers. Future research is needed to confirm this hypothesis.

In the current study, bilingual native speakers of American English tended to produce vocal fry in both English and Spanish more often than bilingual native speakers of Latin-American Spanish. Since previous research has reported that learning a second language implies transferring attributes from the first language to the second language (Kroll et al., 2012), it would be expected that bilingual native English speakers would produce vocal fry more often in English and Spanish than bilingual native Spanish speakers. Moreover, it would be expected that bilingual native Spanish speakers would not produce vocal fry because its use in Spanish in conversational contexts has not been reported (probably as a result of the lack of use of this vocal register in Latin-American Spanish). Nevertheless, future studies that include other ways to assess a speaker’s proficiency in the second language are needed to confirm this hypothesis.

Concerning the third and fourth aims, results from the present study suggest that bilingual listeners may have higher inter-rater reliability than monolinguals. For the perceptual assessment of vocal fry, the bilingual listeners had the highest agreement in both languages, with an ICC of 0.84 for English and 0.80 for Spanish, compared with the monolingual listeners’ ICC of 0.68 for English and 0.50 for Spanish. Previous research concluded that differences in ratings among listeners are due to the listeners’ trouble in isolating single aspects of the voice (Ehrlich et al., 2018; Kreiman & Gerratt, 2000). Our results, along with those of previous studies, suggest that bilingual listeners may have improved skills in language processing. Nevertheless, future studies with a larger number of listeners are advisable to confirm this hypothesis.

The factor analysis conducted in the study indicates that being a bilingual listener accounted for approximately 50% of the variance for vocal fry, general grade of hoarseness, and roughness. Therefore, language knowledge accounted for a significant amount of the variance for the perceptual identification of these three aspects of voice. Bilingual listeners may have higher accuracy with perceptual identification of these perceptual parameters due to their exposure to both languages. Hearing both Spanish and English in a variety of contexts may assist bilingual listeners in being consciously aware of the vocal qualities of each language. This is supported by findings from a recent study comparing the results of a discrimination task for bilingual and monolingual speakers, in which bilingual children performed significantly better than monolingual children in the discrimination task (Levi, 2018). The results suggest that bilingual children may have better voice processing due to a bilingual advantage in the social aspect of speech perception, which may confirm that bilingual listeners have improved skills in processing information about the speaker.

In the present study, we found that bilingual listeners identified higher incidences of hoarseness (G) and roughness (R) in English and Spanish, as well as vocal fry in English. This implies that bilingual listeners were more likely to perceptually identify these qualities during their ratings, whether these identifications were accurate or not. Improved performance on identifying these voice qualities, when compared to monolingual listeners, suggests that the bilingual raters in the study may have better skills in identifying voice production characteristics compared to the monolingual raters.

Another potential reason why bilingual listeners may have higher accuracy in their perceptual identification of voice is neurological differences between monolinguals and bilinguals. Results from recent studies indicate a greater signal change in the left superior lobe for bilinguals (compared to monolinguals), suggesting that bilingual speakers may have more effortful language processing (Coderre et al., 2016; Hayakawa & Marian, 2019). For bilingual speakers, strong cognitive control is necessary in order to code-switch between languages, and the left dorsolateral prefrontal cortex is activated during this process (DeLuca et al., 2019; Rodriguez-Fornells et al., 2006). A possible implication of this increased language processing is that bilingual listeners may be able to identify more advanced language cues than monolinguals.

Limitations

There are some limitations in the current study that should be acknowledged. First, the cross-sectional design does not allow insight into the causality of the reported associations. Therefore, we have no information on the relation over time between the language spoken and the production of vocal fry and voice perception. Second, the small sample of listeners in our study does not allow generalization of the reported associations. Nevertheless, the findings provide a valuable preliminary understanding, and further studies with larger samples are important to corroborate the current results. Third, personal data on bilingual proficiency and native language (L1) relied on a questionnaire; future studies should include objective measures such as formal language tests. Fourth, the automatic detection method does not indicate where in an utterance fry phonation occurs, and, therefore, it was not possible to locate the production of vocal fry within the speech material. Future studies are needed to explore objectively the location of vocal fry within utterances in connected speech in monolingual and bilingual speakers. This information would make it possible to identify patterns of production of vocal fry in different languages. Fifth, the methodological approach was not completely representative of a conversational context, which limits the identification of either linguistic or sociolinguistic motivations to use vocal fry related to representations of English. Future studies are advised to use more ecologically valid approaches.

Conclusion

This study reported on the effects of a rater’s language knowledge on the ability to perceive voice quality and vocal fry in bilingual speakers. The results of this study point to two interesting features of the voice production of bilingual speakers. First, because of the linguistic purpose of vocal fry in English, the high production of vocal fry in both languages (English and Spanish) among native English speakers may indicate a code-switching effect in these individuals. This aspect is of special interest for the analysis of voice quality in bilingual speakers because it offers indications about the speech patterns of these speakers. Second, differences in production of vocal fry between native speakers of American English and native speakers of Spanish may be evidence of transfer of vocal behavior (such as vocal fry) from the first language to the second. This information might be taken into consideration when planning speech and voice therapies with bilingual speakers.

From the results, we can conclude that being a bilingual listener may have an important effect on the perceptual identification of voice quality (G, hoarseness, and R, roughness, from the GRBAS scale) in English and Spanish, as well as vocal fry in English. This information might have implications for planning speech and voice therapies with bilingual speakers and for developing training programs in perceptual evaluation.

Acknowledgments

Thank you to the multiple subjects who participated in this study. Thanks also to Luis Garcia (Migrant Student Services Director at Michigan State University), Elias Lopez (College Assistance Migrant Program (CAMP) Associate Director at Michigan State University), Jessica Navarro and the CAMP program team at Michigan State University, and Stirling Witthoeft for various supporting roles in the research. The authors appreciate the generous use of Carlos Ishi’s vocal fry automatic detection script.

Footnotes

Declaration of Interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper. The research reported in this publication was supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award Number R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES

Adesope O, Lavin T, Thompson T, & Ungerleider C. (2010). A systematic review and meta-analysis of the cognitive correlates of bilingualism. Review of Educational Research, 80(2), 207–245.
Altenberg EP, & Ferrand CT. (2006a). Fundamental frequency in monolingual English, bilingual English/Russian, and bilingual English/Cantonese young adult women. Journal of Voice, 20(1), 89–96.
Altenberg EP, & Ferrand CT. (2006b). Perception of individuals with voice disorders by monolingual English, bilingual Cantonese–English, and bilingual Russian–English women. Journal of Speech, Language, and Hearing Research.
Balukas C, & Koops C. (2015). Spanish-English bilingual voice onset time in spontaneous code-switching. International Journal of Bilingualism, 19(4), 423–443.
Benoist-Lucy A, & Pillot-Loiseau C. (2013). The Influence of language and speech task upon creaky voice use among six young American women learning French. 2395–2399.
Bermudez de Alvear R. (2003). Exploración clínica de los trastornos de la voz, el habla y la audición: Pautas y protocolos asistenciales. Ediciones Aljibe. http://books.google.nl/books?id=471KAAAACAAJ
Blomgren M, Chen Y, Ng M, & Gilbert H. (1998). Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers. The Journal of the Acoustical Society of America, 103(5), 2649–2658.
Bohn O-S. (1995). Cross-language speech perception in adults: First language transfer doesn’t tell it all. Speech Perception and Linguistic Experience: Issues in Cross-Language Research, 279–304.
Bruyninckx M, Harmegnies B, Llisterri J, & Poch-Olivé D. (1994). Language-induced voice quality variability in bilinguals. Journal of Phonetics, 22, 19–31.
Camarota S. (2016). Immigrants in the United States, 2016. http://cis.org/Immigrants-in-the-United-States
Cantor-Cutiva LC, Bottalico P, & Hunter E. (2018). Factors associated with vocal fry among college students. Logopedics Phoniatrics Vocology, 43(2), 73–79. doi:10.1080/14015439.2017.1362468
Cantor-Cutiva LC, Bottalico P, Ishi C, & Hunter E. (2017). Vocal Fry and Vowel Height in Simulated Room Acoustics. Folia Phoniatrica et Logopaedica, 69(3), 118–124. doi:10.1159/000481282
Cantor-Cutiva LC, Bottalico P, Nudelman C, Webster J, & Hunter E. (2019). Do Voice Acoustic Parameters Differ Between Bilingual English-Spanish Speakers and Monolingual English Speakers During English Productions? Journal of Voice.
Carlson R, Gustafson K, & Strangert E. (2006). Cues for hesitation in speech synthesis. Interspeech 2006.
Coderre EL, Smith JF, Van Heuven WJ, & Horwitz B. (2016). The functional overlap of executive control and language processing in bilinguals. Bilingualism (Cambridge, England), 19(3), 471.
Crowhurst MJ. (2018). The influence of varying vowel phonation and duration on rhythmic grouping biases among Spanish and English speakers. Journal of Phonetics, 66, 82–99. doi:10.1016/j.wocn.2017.09.001
Davidson L. (2019). The effects of pitch, gender, and prosodic context on the identification of creaky voice. Phonetica, 76(4), 235–262.
de Bodt M, Wuyts F, Van de Heyning P, & Croux C. (1997). Test-retest study of the GRBAS scale: Influence of experience and professional background on perceptual rating of voice quality. Journal of Voice, 11(1), 74–80.
DeLuca V, Rothman J, Bialystok E, & Pliatsikas C. (2019). Redefining bilingualism as a spectrum of experiences that differentially affects brain structure and function. Proceedings of the National Academy of Sciences, 116(15), 7565–7574.
Ehrlich B, Lin L, & Jiang J. (2018). Concatenation of the Moving Window Technique for Auditory-Perceptual Analysis of Voice Quality. American Journal of Speech-Language Pathology, 27(4), 1426–1433. doi:10.1044/2018_AJSLP-17-0103
El Bolock A, Khairy I, Abdelrahman Y, Vu NT, Herbert C, & Abdennadher S. (2020). Who, When and Why: The 3 Ws of Code-Switching. International Conference on Practical Applications of Agents and Multi-Agent Systems, 83–94.
Elias V, McKinnon S, & Milla-Muñoz Á. (2017). The effects of code-switching and lexical stress on vowel quality and duration of heritage speakers of Spanish. Languages, 2(4), 29.
Ellis R, & Ellis RR. (1994). The study of second language acquisition. Oxford University.
Fairbanks G. (1960). The Rainbow passage. In Voice and articulation drillbook (p. 127). Harper & Row.
Fawcett T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.
Flege J. (1995). Second-language speech learning: Theory, findings, and problems. Speech Perception and Linguistic Experience.
Gibson T. (2017). The role of lexical stress on the use of vocal fry in young adult female speakers. Journal of Voice, 31(1), 62–66.
Gibson T, & Summers C. (2018). A perceptual study of cross-linguistic influence on vocal fry use in women exposed to two languages. International Journal of Bilingual Education and Bilingualism, 1–13.
Gibson T, Summers C, & Walls S. (2017). Vocal Fry Use in Adult Female Speakers Exposed to Two Languages. Journal of Voice, 31(4), 510.e1–510.e5. doi:10.1016/j.jvoice.2016.11.006
Gordon M. (1964). Assimilation in American life: The role of race, religion, and national origins. Oxford University Press on Demand.
Gunnerud HL, Ten Braak D, Reikerås EKL, Donolato E, & Melby-Lervåg M. (2020). Is bilingualism related to a cognitive advantage in children? A systematic review and meta-analysis. Psychological Bulletin, 146(12), 1059.
Hambly H, Wren Y, McLeod S, & Roulstone S. (2013). The influence of bilingualism on speech production: A systematic review. International Journal of Language & Communication Disorders, 48(1), 1–24.
Hayakawa S, & Marian V. (2019). Consequences of multilingualism for neural architecture. Behavioral and Brain Functions, 15(1), 6.
Henton C, & Bladon A. (1987). Creak as a sociophonetic marker. In Language, speech and mind: Studies in honour of Victoria A. Fromkin. Hyman L & Li, editors.
Ho D. (1986). Two contrasting positions on second-language acquisition: A proposed solution. International Review of Applied Linguistics in Language Teaching, 24, 35–47.
Hollien H, Moore P, Wendahl R, & Michel J. (1966). On the Nature of Vocal Fry. Journal of Speech and Hearing Research, 9(2), 245–247.
Ishi C, Sakakibara K-I, Ishiguro H, & Hagita N. (2008). A method for automatic detection of vocal fry. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 47–56.
Kreiman J. (1982). Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics, 10(2), 163–175.
Kreiman J, & Gerratt B. (2000). Sources of listener disagreement in voice quality assessment. J Acoust Soc Am, 108(4), 1867–1876.
Kreiman J, Gerratt B, & Ito M. (2007). When and why listeners disagree in voice quality assessment tasks. J Acoust Soc Am, 122(4), 2354–2364.
Kroll J, Dussias P, Bogulski C, & Valdes Kroff J. (2012). 7 Juggling Two Languages in One Mind: What Bilinguals Tell Us About Language Processing and its Consequences for Cognition. Psychology of Learning and Motivation-Advances in Research and Theory, 56, 229.
Laver J. (1987). Individual features in voice quality [Doctor of Philosophy]. University of Edinburgh.
Levi S. (2018). Another bilingual advantage? Perception of talker-voice information. Bilingualism (Cambridge, England), 21(3), 523.
Maldonado G, & Greenland S. (1993). Simulation Study of Confounder-Selection Strategies. American Journal of Epidemiology, 138(11), 923–936.
McGlone R, & Shipp T. (1971). Some physiologic correlates of vocal-fry phonation. Journal of Speech, Language, and Hearing Research, 769–775.
Michel J. (1968). Fundamental frequency investigation of vocal fry and harshness. Journal of Speech, Language, and Hearing Research, 590–594.
Ng M, Chen Y, & Chan E. (2012). Differences in vocal characteristics between Cantonese and English produced by proficient Cantonese-English bilingual speakers—A long-term average spectral analysis. Journal of Voice, 26(4), e171–e176.
Ogden R. (2001). Turn-holding, turn-yielding and laryngeal activity in Finnish talk-in-interaction. Journal of the International Phonetics Association, 31(1), 139–152.
Ordin M, & Mennen I. (2017). Cross-Linguistic Differences in Bilinguals’ Fundamental Frequency Ranges. Journal of Speech, Language, and Hearing Research: JSLHR, 60(6), 1493–1506. doi:10.1044/2016_JSLHR-S-16-0315
Parker M, & Borrie S. (2018). Judgments of intelligence and likability of young adult female speakers of American English: The influence of vocal fry and the surrounding acoustic-prosodic context. Journal of Voice, 32(5), 538–545.
Poplack S. (2001). Code-switching (linguistic). International Encyclopedia of the Social and Behavioral Sciences, 2062–2065.
Redi L, & Shattuck-Hufnagel S. (2001). Variation in the realization of glottalization in normal speakers. Journal of Phonetics, 29(4), 407–429.
Rodriguez-Fornells A, De Diego Balaguer R, & Münte TF. (2006). Executive control in bilingual language processing. Language Learning, 56, 133–190.
Rogers C, Lister J, Febo D, Besing J, & Abrams H. (2006). Effects of bilingualism, noise, and reverberation on speech perception by listeners with normal hearing. Applied Psycholinguistics, 27(3), 465–485.
Rumbaut R, & Massey D. (2013). Immigration & language diversity in the United States. Daedalus, 142(3), 141–154.
Strange W. (1995). Speech perception and linguistic experience: Issues in cross-language research. York Press.
Thornburgh D, & Ryalls J. (1998). Voice onset time in Spanish-English bilinguals: Early versus late learners of English. Journal of Communication Disorders, 31(3), 215–229.
Titze I, Svec J, & Popolo P. (2003). Vocal Dose Measures: Quantifying Accumulated Vibration Exposure in Vocal Fold Tissues. Journal of Speech, Language, and Hearing Research, 46(4), 919–932.
Watterson T, Lewis KE, Murdock T, & Cordero KN. (2013). Reliability and validity of nasality ratings between a monolingual and bilingual listener for speech samples from English-Spanish-Speaking children. Folia Phoniatrica et Logopaedica: Official Organ of the International Association of Logopedics and Phoniatrics (IALP), 65(2), 91–97. doi:10.1159/000353809
Wayland R, Landfair D, Li B, & Guion SG. (2006). Native Thai speakers’ acquisition of English word stress patterns. Journal of Psycholinguistic Research, 35(3), 285.
Wolk L, Abdelli-Beruh N, & Slavin D. (2012). Habitual Use of Vocal Fry in Young Adult Female Speakers. Journal of Voice, 26(3), e111–e116.
Yamaguchi H, Shrivastav R, Andrews M, & Niimi S. (2003). A comparison of voice quality ratings made by Japanese and American listeners using the GRBAS scale. Folia Phoniatrica et Logopaedica, 55(3), 147–157.
Yu P, Revis J, Wuyts FL, Zanaret M, & Giovanni A. (2002). Correlation of Instrumental Voice Evaluation with Perceptual Voice Analysis Using a Modified Visual Analog Scale. Folia Phoniatrica et Logopaedica, 54(6), 271–281.
Yuasa I. (2010). Creaky voice: A new feminine voice quality for young urban-oriented upwardly mobile American women? American Speech, 85(3), 315–337.

[R1] Adesope O, Lavin T, Thompson T, & Ungerleider C. (2010). A systematic review and meta-analysis of the cognitive correlates of bilingualism. Review of Educational Research, 80(2), 207–245. [Google Scholar]

[R2] Altenberg EP, & Ferrand CT. (2006a). Fundamental frequency in monolingual English, bilingual English/Russian, and bilingual English/Cantonese young adult women. Journal of Voice, 20(1), 89–96. [DOI] [PubMed] [Google Scholar]

[R3] Altenberg EP, & Ferrand CT (2006b). Perception of individuals with voice disorders by monolingual English, bilingual Cantonese–English, and bilingual Russian–English women. Journal of Speech, Language, and Hearing Research. [DOI] [PubMed] [Google Scholar]

[R4] Balukas C, & Koops C. (2015). Spanish-English bilingual voice onset time in spontaneous code-switching. International Journal of Bilingualism, 19(4), 423–443. [Google Scholar]

[R5] Benoist-Lucy A, & Pillot-Loiseau C. (2013). The Influence of language and speech task upon creaky voice use among six young American women learning French. 2395–2399. [Google Scholar]

[R6] Bermudez de Alvear R. (2003). Exploración clínica de los trastornos de la voz, el habla y la audición: Pautas y protocolos asistenciales. Ediciones Aljibe. http://books.google.nl/books?id=471KAAAACAAJ [Google Scholar]

[R7] Blomgren M, Chen Y, Ng M, & Gilbert H. (1998). Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers. The Journal of the Acoustical Society of America, 103(5), 2649–2658. [DOI] [PubMed] [Google Scholar]

[R8] Bohn O-S (1995). Cross-language speech perception in adults: First language transfer doesn’t tell it all. Speech Perception and Linguistic Experience: Issues in Cross-Language Research, 279–304. [Google Scholar]

[R9] Bruyninckx M, Harmegnies B, Llisterri J, & Poch-Olivé D. (1994). Language-induced voice quality variability in bilinguals. Journal of Phonetics, 22, 19–31. [Google Scholar]

[R10] Camarota S. (2016). Immigrants in the United States, 2016. http://cis.org/Immigrants-in-the-United-States [Google Scholar]

[R11] Cantor-Cutiva LC, Bottalico P, & Hunter E. (2018). Factors associated with vocal fry among college students. Logopedics Phoniatrics Vocology, 43(2), 73–79. 10.1080/14015439.2017.1362468 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Cantor-Cutiva LC, Bottalico P, Ishi C, & Hunter E. (2017). Vocal Fry and Vowel Height in Simulated Room Acoustics. Folia Phoniatrica et Logopaedica, 69(3), 118–124. 10.1159/000481282 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] Cantor-Cutiva LC, Bottalico P, Nudelman C, Webster J, & Hunter E. (2019). Do Voice Acoustic Parameters Differ Between Bilingual English-Spanish Speakers and Monolingual English Speakers During English Productions? Journal of Voice. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Carlson R, Gustafson K, & Strangert E. (2006). Cues for hesitation in speech synthesis. Interspeech 2006. [Google Scholar]

[R15] Coderre EL, Smith JF, Van Heuven WJ, & Horwitz B. (2016). The functional overlap of executive control and language processing in bilinguals. Bilingualism (Cambridge, England), 19(3), 471. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] Crowhurst MJ (2018). The influence of varying vowel phonation and duration on rhythmic grouping biases among Spanish and English speakers. Journal of Phonetics, 66, 82–99. 10.1016/j.wocn.2017.09.001 [DOI] [Google Scholar]

[R17] Davidson L. (2019). The effects of pitch, gender, and prosodic context on the identification of creaky voice. Phonetica, 76(4), 235–262. [DOI] [PubMed] [Google Scholar]

[R18] de Bodt M, Wuyts F, Van de Heyning P, & Croux C. (1997). Test-retest study of the GRBAS scale: Influence of experience and professional background on perceptual rating of voice quality. Journal of Voice, 11(1), 74–80. [DOI] [PubMed] [Google Scholar]

[R19] DeLuca V, Rothman J, Bialystok E, & Pliatsikas C. (2019). Redefining bilingualism as a spectrum of experiences that differentially affects brain structure and function. Proceedings of the National Academy of Sciences, 116(15), 7565–7574. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] Ehrlich B, Lin L, & Jiang J. (2018). Concatenation of the Moving Window Technique for Auditory-Perceptual Analysis of Voice Quality. American Journal of Speech-Language Pathology, 27(4), 1426–1433. 10.1044/2018_AJSLP-17-0103 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] El Bolock A, Khairy I, Abdelrahman Y, Vu NT, Herbert C, & Abdennadher S. (2020). Who, When and Why: The 3 Ws of Code-Switching. International Conference on Practical Applications of Agents and Multi-Agent Systems, 83–94. [Google Scholar]

[R22] Elias V, McKinnon S, & Milla-Muñoz Á. (2017). The effects of code-switching and lexical stress on vowel quality and duration of heritage speakers of Spanish. Languages, 2(4), 29. [Google Scholar]

[R23] Ellis R, & Ellis RR (1994). The study of second language acquisition. Oxford University. [Google Scholar]

[R24] Fairbanks G. (1960). The Rainbow passage. In Voice and articulation drillbook (p. 127). Harper & Row. [Google Scholar]

[R25] Fawcett T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. [Google Scholar]

[R26] Flege J. (1995). Second-language speech learning: Theory, findings, and problems. Speech Perception and Linguistic Experience. [Google Scholar]

[R27] Gibson T. (2017). The role of lexical stress on the use of vocal fry in young adult female speakers. Journal of Voice, 31(1), 62–66.

[R28] Gibson T, & Summers C. (2018). A perceptual study of cross-linguistic influence on vocal fry use in women exposed to two languages. International Journal of Bilingual Education and Bilingualism, 1–13.

[R29] Gibson T, Summers C, & Walls S. (2017). Vocal Fry Use in Adult Female Speakers Exposed to Two Languages. Journal of Voice, 31(4), 510.e1–510.e5. doi:10.1016/j.jvoice.2016.11.006

[R30] Gordon M. (1964). Assimilation in American life: The role of race, religion, and national origins. Oxford University Press on Demand.

[R31] Gunnerud HL, Ten Braak D, Reikerås EKL, Donolato E, & Melby-Lervåg M. (2020). Is bilingualism related to a cognitive advantage in children? A systematic review and meta-analysis. Psychological Bulletin, 146(12), 1059.

[R32] Hambly H, Wren Y, McLeod S, & Roulstone S. (2013). The influence of bilingualism on speech production: A systematic review. International Journal of Language & Communication Disorders, 48(1), 1–24.

[R33] Hayakawa S, & Marian V. (2019). Consequences of multilingualism for neural architecture. Behavioral and Brain Functions, 15(1), 6.

[R34] Henton C, & Bladon A. (1987). Creak as a sociophonetic marker. In Language, speech and mind: Studies in honour of Victoria A. Fromkin. Hyman L & Li, editors.

[R35] Ho D. (1986). Two contrasting positions on second-language acquisition: A proposed solution. International Review of Applied Linguistics in Language Teaching, 24, 35–47.

[R36] Hollien H, Moore P, Wendahl R, & Michel J. (1966). On the Nature of Vocal Fry. Journal of Speech and Hearing Research, 9(2), 245–247.

[R37] Ishi C, Sakakibara K-I, Ishiguro H, & Hagita N. (2008). A method for automatic detection of vocal fry. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 47–56.

[R38] Kreiman J. (1982). Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics, 10(2), 163–175.

[R39] Kreiman J, & Gerratt B. (2000). Sources of listener disagreement in voice quality assessment. J Acoust Soc Am, 108(4), 1867–1876.

[R40] Kreiman J, Gerratt B, & Ito M. (2007). When and why listeners disagree in voice quality assessment tasks. J Acoust Soc Am, 122(4), 2354–2364.

[R41] Kroll J, Dussias P, Bogulski C, & Valdes Kroff J. (2012). Juggling Two Languages in One Mind: What Bilinguals Tell Us About Language Processing and its Consequences for Cognition. Psychology of Learning and Motivation-Advances in Research and Theory, 56, 229.

[R42] Laver J. (1987). Individual features in voice quality [Doctor of Philosophy]. University of Edinburgh.

[R43] Levi S. (2018). Another bilingual advantage? Perception of talker-voice information. Bilingualism (Cambridge, England), 21(3), 523.

[R44] Maldonado G, & Greenland S. (1993). Simulation Study of Confounder-Selection Strategies. American Journal of Epidemiology, 138(11), 923–936.

[R45] McGlone R, & Shipp T. (1971). Some physiologic correlates of vocal-fry phonation. Journal of Speech, Language, and Hearing Research, 769–775.

[R46] Michel J. (1968). Fundamental frequency investigation of vocal fry and harshness. Journal of Speech, Language, and Hearing Research, 590–594.

[R47] Ng M, Chen Y, & Chan E. (2012). Differences in vocal characteristics between Cantonese and English produced by proficient Cantonese-English bilingual speakers—A long-term average spectral analysis. Journal of Voice, 26(4), e171–e176.

[R48] Ogden R. (2001). Turn-holding, turn-yielding and laryngeal activity in Finnish talk-in-interaction. Journal of the International Phonetic Association, 31(1), 139–152.

[R49] Ordin M, & Mennen I. (2017). Cross-Linguistic Differences in Bilinguals’ Fundamental Frequency Ranges. Journal of Speech, Language, and Hearing Research: JSLHR, 60(6), 1493–1506. doi:10.1044/2016_JSLHR-S-16-0315

[R50] Parker M, & Borrie S. (2018). Judgments of intelligence and likability of young adult female speakers of American English: The influence of vocal fry and the surrounding acoustic-prosodic context. Journal of Voice, 32(5), 538–545.

[R51] Poplack S. (2001). Code-switching (linguistic). International Encyclopedia of the Social and Behavioral Sciences, 2062–2065.

[R52] Redi L, & Shattuck-Hufnagel S. (2001). Variation in the realization of glottalization in normal speakers. Journal of Phonetics, 29(4), 407–429.

[R53] Rodriguez Fornells A, De Diego Balaguer R, & Münte TF (2006). Executive control in bilingual language processing. Language Learning, 56, 133–190.

[R54] Rogers C, Lister J, Febo D, Besing J, & Abrams H. (2006). Effects of bilingualism, noise, and reverberation on speech perception by listeners with normal hearing. Applied Psycholinguistics, 27(3), 465–485.

[R55] Rumbaut R, & Massey D. (2013). Immigration & language diversity in the United States. Daedalus, 142(3), 141–154.

[R56] Strange W. (1995). Speech perception and linguistic experience: Issues in cross-language research. York Press.

[R57] Thornburgh D, & Ryalls J. (1998). Voice onset time in Spanish-English bilinguals: Early versus late learners of English. Journal of Communication Disorders, 31(3), 215–229.

[R58] Titze I, Svec J, & Popolo P. (2003). Vocal Dose Measures: Quantifying Accumulated Vibration Exposure in Vocal Fold Tissues. Journal of Speech, Language, and Hearing Research, 46(4), 919–932.

[R59] Watterson T, Lewis KE, Murdock T, & Cordero KN (2013). Reliability and validity of nasality ratings between a monolingual and bilingual listener for speech samples from English-Spanish-Speaking children. Folia Phoniatrica et Logopaedica: Official Organ of the International Association of Logopedics and Phoniatrics (IALP), 65(2), 91–97. doi:10.1159/000353809

[R60] Wayland R, Landfair D, Li B, & Guion SG (2006). Native Thai speakers’ acquisition of English word stress patterns. Journal of Psycholinguistic Research, 35(3), 285.

[R61] Wolk L, Abdelli-Beruh N, & Slavin D. (2012). Habitual Use of Vocal Fry in Young Adult Female Speakers. Journal of Voice, 26(3), e111–e116.

[R62] Yamaguchi H, Shrivastav R, Andrews M, & Niimi S. (2003). A comparison of voice quality ratings made by Japanese and American listeners using the GRBAS scale. Folia Phoniatrica et Logopaedica, 55(3), 147–157.

[R63] Yu P, Revis J, Wuyts FL, Zanaret M, & Giovanni A. (2002). Correlation of Instrumental Voice Evaluation with Perceptual Voice Analysis Using a Modified Visual Analog Scale. Folia Phoniatrica et Logopaedica, 54(6), 271–281.

[R64] Yuasa I. (2010). Creaky voice: A new feminine voice quality for young urban-oriented upwardly mobile American women? American Speech, 85(3), 315–337.
