The Journal of the Acoustical Society of America
2024 Mar 25;155(3):2209–2220. doi: 10.1121/10.0025381

The effect of native language and bilingualism on multimodal perception in speech: A study of audio-aerotactile integration

Haruka Saito 1,b), Mark Tiede 2, D H Whalen 3,4, Lucie Ménard 1
PMCID: PMC10965246  PMID: 38526052

Abstract

Previous studies of speech perception revealed that tactile sensation can be integrated into the perception of stop consonants. It remains uncertain whether such multisensory integration can be shaped by linguistic experience, such as the listener's native language(s). This study investigates audio-aerotactile integration in phoneme perception for English and French monolinguals as well as English-French bilingual listeners. Six-step voice onset time continua of alveolar (/da/-/ta/) and labial (/ba/-/pa/) stops constructed from both English and French end points were presented to listeners who performed a forced-choice identification task. Air puffs were synchronized to syllable onset and randomly applied to the back of the hand. Results show that stimuli with an air puff elicited more “voiceless” responses for the /da/-/ta/ continuum by both English and French listeners. This suggests that audio-aerotactile integration can occur even though the French listeners did not have an aspiration/non-aspiration contrast in their native language. Furthermore, bilingual speakers showed larger air puff effects compared to monolinguals in both languages, perhaps due to bilinguals' heightened receptiveness to multimodal information in speech.

I. INTRODUCTION

Multimodality in speech perception has been increasingly well-documented over the years, particularly in the audio-visual domain (e.g., Holler and Levinson, 2019, or Keough et al., 2019, for a review). More recently, new experimental techniques and equipment have started to highlight the role of tactile information in multisensory integration (Ito et al., 2009; Sato et al., 2010; Lametti et al., 2012; Trudeau-Fisette et al., 2019; Gick and Derrick, 2009; Gick et al., 2010; Derrick and Gick, 2013; Bicevskis et al., 2016; Keough, 2019; Goldenberg et al., 2022). For example, skin stretch applied by a robot synchronized with the presentation of confusable vowels on a continuum shifts listener perception toward the end point target associated with the corresponding facial position (Ito et al., 2009).

Directly relevant to the current study, Gick and Derrick (2009) reported that aero-tactile stimuli can alter listener perception of voicing contrasts such as /t/ and /d/ presented in noise, causing participants to report hearing /t/ more frequently with a puff of air applied to their hand or neck. They argued that this shift resulted from the sensation of the puff on the skin being perceived through sensory integration as the air expulsion of plosives (i.e., aspiration). Subsequent studies have shown that the application of air puffs to a distal part of the body such as the ankle can also produce the integration effect (Derrick and Gick, 2013), that audio-aerotactile integration has a temporal window similar to audio-visual integration (Gick et al., 2010), and that visual-aerotactile integration can also occur (Bicevskis et al., 2016). Addressing the possibility that the effect could be driven by general tactile sensation, Gick and Derrick (2009, supplementary material) tested the same stimulation sites using the application of a synchronized metal plunger, but found no evidence for integration, and concluded that the effect is specific to aero-tactile stimulation.

One of the questions yet to be answered is whether a listener's multisensory integration in speech is shaped by linguistic experience, specifically by the language(s) they have acquired. Life experience, in general, has been shown to facilitate the development of multisensory integration: adults show greater audio-visual integration (Dupont et al., 2005) and audio-tactile integration (Trudeau-Fisette et al., 2019) than children, and prelinguistic infants do not yet show signs of audio-aerotactile integration (Keough, 2019). However, studies on whether adult speakers of one language (or set of languages) integrate multiple avenues of sensory information differently than speakers of another have yielded inconsistent results.

The possibility that a perceiver's native language experience can shape their capacity for multisensory integration has predominantly been investigated for the audio-visual modalities, particularly regarding the McGurk effect (McGurk and MacDonald, 1976; Rosenblum, 2019). Several studies have found that non-English-speaking participants—Japanese (Sekiyama and Tohkura, 1991, 1993), Mandarin (Sekiyama, 1997; Hayashi and Sekiyama, 1998), Cantonese (Burnham and Lau, 1998), and Hebrew (Aloufy et al., 1996)—are less susceptible to the McGurk effect (i.e., less affected by visual cues) compared to English-speaking participants. To explain these results, two types of hypotheses have been proposed: On the one hand, the linguistic hypothesis argues that some properties of participants' native language reduce the need for visual cues; for example, lexical tones involve little visible movement, hence speakers of tone languages may be less sensitive to visual cues (Hayashi and Sekiyama, 1998). On the other hand, the cultural hypothesis assumes that participants' native culture may affect their preference for looking directly into the speaker's face, potentially limiting access to relevant visual cues (Sekiyama and Tohkura, 1993).

However, other studies have failed to replicate the influence of the native language on audio-visual integration, finding no difference in susceptibility to the McGurk effect between English and Mandarin participants (Chen and Hazan, 2007, 2009; Wang et al., 2009; Magnotti et al., 2015). A statistical simulation study suggested that this inconsistency may be due to the small sample sizes (Magnotti and Beauchamp, 2018), a possible factor considering that recent studies have shown considerable individual differences in the McGurk effect (Nath and Beauchamp, 2012; Strand et al., 2014; Basu Mallick et al., 2015; Magnotti et al., 2020). For instance, in Basu Mallick et al. (2015), out of 165 English-speaking participants, susceptibility to the McGurk effect ranged widely from 0% to 100%, with the majority consistently either almost always or rarely experiencing it. This variability correlated with participants' lipreading proficiency (Strand et al., 2014), but not with their perceptual accuracy in decoding noisy speech (Magnotti et al., 2020). Additionally, differential activation patterns in the left superior temporal sulcus (STS) have been observed among individuals with varying susceptibility to the McGurk effect (Nath and Beauchamp, 2012).

Furthermore, linguistic experience may influence how multimodal information is processed. There is evidence suggesting that speech perception in bilinguals is generally more multimodal compared to monolinguals. Marian et al. (2018) demonstrated that both early and late bilinguals experienced the McGurk effect more frequently than monolinguals. Bilinguals' increased attention to visual cues has been observed even in infancy (Pons et al., 2015; Havy and Zesiger, 2021). Additionally, in adulthood, bilinguals exhibit better performance in identifying languages solely through visual information (Soto-Faraco et al., 2007). Several explanations have been proposed to account for bilinguals' tendency to rely on visual cues, including using them as a strategy to decipher complex linguistic input (Marian et al., 2018), or compensating for auditory deficits, as even proficient bilinguals often exhibit disadvantages in listening tasks under adverse conditions (Mayo et al., 1997), which may act to increase their reliance on other modalities.

The aim of the current study is to further understand the relationship between linguistic experience and multisensory integration in speech by extending the research to audio-aerotactile integration. We investigated three groups of participants: English monolinguals, French monolinguals, and English-French bilinguals. We made these choices based on the systematic differences in the voice onset time (VOT) structure between English and French.

In English, voiceless stops in the initial position are produced with a long-lag VOT and aspiration, while voiced stops typically have a short-lag VOT. In French, on the other hand, voiceless stops in the initial position involve a short-lag VOT and are described as unaspirated (although there is typically a small aspirated portion; Caramazza and Yeni-Komshian, 1974), while voiced stops are predominantly produced with pre-voicing. If the tactile sensation of air puffs is perceptually associated with aspiration, as assumed by the original study (Gick and Derrick, 2009), we can predict that the degree of aero-tactile integration will be correlated with the degree of airflow in the linguistic contrast being examined. This relationship has been demonstrated in a three-way Thai VOT contrast, where aero-tactile integration was observed between voiceless aspirated and unaspirated stops, but not in the contrast between voiceless unaspirated and pre-voiced stops (Goldenberg, 2019). Similarly, in Mandarin stimuli, only a few pairs of stops that involved a large airflow difference were affected by air puff presentation (Derrick et al., 2019). Based on these findings, one possible outcome is that we may not observe evidence of aero-tactile integration when French listeners perceive French voiceless vs voiced contrasts, unlike their English counterparts, given the relatively low perceptual salience for airflow in that contrast in their language.

However, another possibility is that aero-tactile integration could still be observed among French listeners. Stop consonants, by definition, involve a blockage of airflow and subsequent expulsion of air. If the aero-tactile sensation of air puffs is perceptually associated with auditory signals for air expulsion itself, not exclusively with audible air turbulence like aspiration, then it is possible that aero-tactile information may contribute to the perception of voiceless stops in French. In the case of French, where little aspiration is exploited, the aero-tactile sensation of air puffs might be associated with auditory signals of weaker air expulsion (i.e., short-lag VOT) simply because there is no stronger candidate (i.e., long-lag VOT with aspiration) in their linguistic repertoire. If so, the process of aero-tactile integration may be more flexible than previously thought: native language aspiration may be irrelevant for aero-tactile integration, as listeners can associate and utilize aero-tactile information within the contextual range of their language experience.

In addition to comparing English and French monolingual listeners, we also investigated the effect of English-French bilingual experience on aero-tactile integration. One possibility is mutual influence between the two languages that bilinguals command. VOT, the focus of the current study, has been shown to be relatively easily influenced by language learning experiences (e.g., Tice and Woodley, 2012). For example, if there is indeed a difference in the usage of aero-tactile information between English and French monolinguals (e.g., French listeners not showing aero-tactile integration while English listeners do), it is possible that we may observe a more subtle effect in bilinguals perceiving English stimuli compared to English monolinguals due to the influence of French, and vice versa when perceiving French stimuli. Alternatively, we may observe evidence that the bilingual experience itself facilitates multisensory perception, regardless of the properties of the two languages.

To summarize, our research questions are:

  • (1)

    Does a listener's native language affect their susceptibility to audio-aerotactile integration? Specifically, do French listeners, despite the lack of strong air turbulence (i.e., aspiration) for stop consonants in their native language, nonetheless integrate aero-tactile information into the perception of these consonants?

  • (2)

    Does bilingual experience increase or decrease susceptibility to audio-aerotactile integration?

II. METHODS

A. Participants

A total of 106 participants took part in the study. These included 35 English monolinguals [mean age = 23.1, standard deviation (SD) = 7.1; 22 females, 11 males, 2 others], 36 French monolinguals (mean age = 30.8, SD = 7.1; 23 females, 13 males), and 35 English-French bilingual listeners (mean age = 25.4, SD = 7.5; 22 females, 12 males, 1 other). All were recruited in Montreal, Canada. English monolingual participants were mainly recruited from students and graduates of English-speaking universities such as McGill and Concordia University, while French monolingual participants were recruited from French-speaking universities including Université du Québec à Montréal and Université de Montréal. Originally, there were two additional participants in the English monolingual group and three additional participants in the bilingual group, but they were excluded from the analyses for various reasons.1 This study was approved by the institutional ethics committee (see Ethics Approval), and all participants provided informed consent and received monetary compensation for their time.

Although our monolingual participants self-reported that they were not fluent users of any language other than their native language, most of them reported having been exposed to one or more foreign languages as part of their formal education. Specifically, 35 out of 36 French participants reported having studied English, while 20 out of 35 English participants reported having studied French. We acknowledge that our participants may not meet the strict definition of “monolingual,” and therefore we administered the Language Experience and Proficiency Questionnaire (LEAP-Q: Marian et al., 2007) to quantify their language proficiencies and exposure to each language. Table I presents a comparison of selected LEAP-Q items among the three groups. As shown in the table, both groups of monolinguals exclusively used their native language at home and predominantly with friends. However, French monolinguals reported greater exposure to their second language, English, primarily through media consumption (e.g., watching TV, films, videos). These items were later analyzed to examine their correlation with the observed effects of air puffs (see Sec. III A 3).

TABLE I.

Selected items from LEAP-Q. Current exposure may not add up to 100% as exposure to a third language is not shown. “Current interaction” and “Current watching” specify how often they engaged in such activities at the time of the experiment.

English monolinguals French monolinguals English-French bilinguals
English French (L2) English (L2) French English French
AOA (age of acquisition) 1.2 8.8 8.5 0.9 2.7 2.2
Current exposure (%) 88.4 9.0 16.6 79.5 59.0 39.0
Self-reported proficiency: speaking (0 = none, 10 = perfect) 9.4 4.0 5.4 9.2 9.1 8.9
Self-reported proficiency: comprehension (0 = none, 10 = perfect) 9.5 5.2 6.1 9.6 9.4 9.2
Current interaction with family (0 = never, 10 = always) 8.5 0.3 0.7 8.3 5.4 5.7
Current interaction with friends (0 = never, 10 = always) 9.2 1.1 2.7 8.6 7.5 5.6
Current watching TV/film/video (0 = never, 10 = always) 8.7 1.0 6.1 6.7 8.7 3.7

Bilingual participants were recruited from both English- and French-speaking universities. They self-identified as bilingual speakers of English and French without advanced knowledge of a third language, and they reported daily exposure to both languages at the time of the experiment. Bilingual participants also completed the LEAP-Q questionnaire. The average age of acquisition (AOA) was 2.7 for English (SD = 2.8, Max = 10) and 2.2 for French (SD = 2.4, Max = 9). They self-reported that they were currently exposed to English approximately 59% of the time (SD = 23.0) and to French approximately 39% of the time (SD = 23.1), with the remaining 2% attributed to other languages. Bilingual participants reported using both languages at home and indicated comparable proficiencies in both languages (see Table I).

B. Stimuli

1. Audio stimuli

The audio stimuli consisted of four continua, each containing six intermediate steps between the end points /ta/ and /da/ as well as /pa/ and /ba/ in both English and French. The stimuli were created by recording each end point produced by two bilingual speakers (a male for the coronal series and a female for the labial series), who produced each consonant-vowel (CV) syllable six times. The continua were then generated using the method of Winn (2020) and his supplemental Praat script. The /ta/-/da/ continua for both English and French were created using the same male speaker, while the /pa/-/ba/ continua for both languages were from the same female speaker. This decision was made to ensure that the stimuli were as comparable as possible between English and French, which was a priority for the current study.

To better match listeners' perceptual sensitivity, the steps in VOT were spaced on a logarithmic rather than a linear scale (see Table II). This reflects the fact that listeners perceive the difference between VOTs of 60 and 100 ms as smaller than the difference between VOTs of 0 and 40 ms (Rosen and Howell, 1981). The synthesized stimuli were validated through an identification and goodness rating task performed by participants who did not take part in the main experiment. Fourteen English monolingual listeners and twelve French monolingual listeners, drawn from the same population as those in the main experiment, were recruited to perform the validation task online. The task was administered on the Gorilla Experiment Builder (www.gorilla.sc), a cloud-based platform that allowed participants to access the task via a web browser. Participants were asked to sit in a quiet room at home and wear headphones. The task was to identify the sound they heard by choosing between two options (/ta/ or /da/ for alveolar stops, and /pa/ or /ba/ for bilabial stops), and then to rate the goodness of the stimulus on a scale from 1 to 7. All listeners heard all four continua, which were presented separately in different blocks.

TABLE II.

VOT in milliseconds (ms) for the six steps in the continua used in the main experiment. Positive values indicate the duration between the burst of the stop consonant and the onset of the following vowel /a/, whereas negative values indicate the duration of pre-voicing added before the burst.

(Unit: ms) English /d-t/ French /d-t/ English /b-p/ French /b-p/
Step 1 19 −41 14 −52
Step 2 22 −10 19 −18
Step 3 28 8 26 1
Step 4 37 19 37 13
Step 5 51 25 51 19
Step 6 72 28 72 23
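The logarithmic spacing of the steps can be illustrated with a short sketch. This is an illustration of the principle only: it interpolates the English /d-t/ end points from Table II at equal ratios and does not reproduce the published intermediate values exactly, and the French continua, whose steps include negative (pre-voiced) VOTs, cannot be generated from a single ratio in this way.

```python
import math

def log_spaced_steps(vot_start, vot_end, n_steps):
    """Interpolate VOT values at equal ratios (logarithmic spacing).

    Equal steps in log(VOT) place long-VOT stimuli further apart in
    milliseconds than short-VOT stimuli, mirroring listeners' reduced
    sensitivity to VOT differences at long VOTs.
    """
    ratio = (vot_end / vot_start) ** (1 / (n_steps - 1))
    return [vot_start * ratio ** i for i in range(n_steps)]

# English /d-t/ end points from Table II: 19 ms and 72 ms.
steps = [round(v) for v in log_spaced_steps(19, 72, 6)]
print(steps)  # gaps between successive steps widen toward the long-VOT end
```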

The results of the validation task are presented in Fig. 1. Each continuum showed a steep change in responses at the intermediate steps (steps 4 or 5), where the goodness rating was also lower than at both ends. Although eight steps were created and tested in the validation task, the steps at both ends, specifically steps 1 and 2 as well as steps 7 and 8, showed little difference in terms of listener responses. Accordingly, we removed steps 1 and 8 to reduce the number of trials, resulting in the six-step continua used in the main experiment. Table II presents the VOT for the six steps in each continuum.

FIG. 1.


(Color online) Stimuli validation test. The blue lines indicate the percentage of the “voiceless” response pooled for all participants in the validation task. The red lines indicate the goodness rating on a scale from 1 to 7. Error bars indicate the standard error.

2. Aero-tactile stimuli

The equipment used to deliver air puff stimuli was identical to that described in Goldenberg et al. (2022). Pressurized air from an air compressor was regulated by a solenoid valve, ensuring a consistent air flow of 5 Standard Liters per Minute2 (SLPM) when released from the valve regardless of the actual pressure, and delivered to participants through vinyl tubing. The air flow at the exit point of the tubing was monitored using a flow meter to ensure consistency. The coordination, timing, and duration of the air puffs were programmed using a custom MATLAB script, which also recorded participant responses. The onset of the air puffs was synchronized with the onset of the burst of stop consonants. For each continuum, the air puff duration was set to 70 ms for English stimuli and 30 ms for French stimuli, matching the duration of the longest VOT in that continuum. The air puffs were administered to the back of the participant's right hand between the thumb and the forefinger, with the exit point of the tubing positioned approximately 3 cm above the hand.

C. Procedures

1. Air puff detection task (screening)

The purpose of the preliminary task was to confirm that participants could perceive the sensation of air puffs on their hands without being able to detect them through other means such as sound, vision, or equipment vibration. Participants were seated in a sound-proof room and wore inner-ear headphones along with sound protection earmuffs to block out any noise generated by the air puff equipment.

In the first block, participants were presented with 40 trials consisting of a short tone. Twenty of these trials were accompanied by either a 70 or a 30 ms air puff, while the other twenty had no air puff. Participants positioned their right hand under the air tubing exit point. They were instructed to press a designated key on a keypad using their left hand if they detected the presence of air puffs, and another key if they did not. All participants responded accurately (>95%) in this block.

In the second block, the task was identical except that participants placed their hand away from the tubing, on their lap, to prevent them from feeling the airflow. They were asked to press a key when they felt the presence of air puffs through any means and another key when they did not detect anything. For the participants included in the analyses, the average accuracy was 50.2% (max = 62.5%), as most of them consistently pressed the “no air puffs” key for all trials, demonstrating that they could not perceive the presence of air puffs without direct contact on their hand. However, three additional participants achieved above 90% accuracy in this block by reporting that they could hear a subtle background mechanical noise when the air puffs were released. These participants were excluded from the analyses.

2. Forced-choice identification task

This part of the experiment consisted of two sessions (English or French stimuli) of a forced-choice identification task, conducted on the same day with a 15-min break in between. The order of the sessions was counterbalanced. Participants were instructed to place their right hand under the tube exit point to feel the air puffs during the task. In each session, participants were presented with auditory stimuli drawn from coronal (/da/ to /ta/) or bilabial (/ba/ to /pa/) CV continua and were asked to identify each stimulus as either voiceless (/ta/ or /pa/) or voiced (/da/ or /ba/) by pressing one of two designated keys on a keypad using their left hand. Each stimulus was presented nine times with air puffs and nine times without air puffs, in pseudo-randomized order, divided into three blocks. The coronal and bilabial blocks were alternated. In total, each session involved 216 trials (2 continua × 6 continuum steps × 9 repetitions × 2 puff conditions). Therefore, a participant performed a total of 432 trials after completing both English and French sessions.
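The per-session design can be sketched as follows. This is a schematic reconstruction only: the actual experiment further divided trials into three blocks with alternating coronal and bilabial sub-blocks, a constraint omitted from this simple shuffle.

```python
import itertools
import random

# One session: 2 continua x 6 steps x 9 repetitions x 2 puff conditions = 216 trials.
continua = ["coronal", "bilabial"]   # /da/-/ta/ and /ba/-/pa/
steps = range(1, 7)                  # six VOT steps
repetitions = range(9)               # nine presentations per condition
puff_conditions = [True, False]      # with or without an air puff

trials = list(itertools.product(continua, steps, repetitions, puff_conditions))
random.shuffle(trials)               # pseudo-randomized presentation order

print(len(trials))  # 216 per session; 432 across the English and French sessions
```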

Both monolingual and bilingual participants followed the same procedure, except for the language used by the experimenter during the experiment. English and French monolinguals were administered the entire experiment in their native language, while the experimenter used either French or English to communicate with bilingual participants, depending on the stimuli presented in each session (to put them in either the “English” or “French” language mode, as outlined by Grosjean, 2008). As a result, bilingual participants were aware of the language of the stimuli they were listening to, while monolingual participants assumed that all stimuli were in their native language. Additionally, bilingual participants were asked to watch videos in the language that they were going to be presented with during the 15-min break to familiarize themselves with the upcoming session (e.g., a participant who listened to English stimuli in the first session was asked to watch videos spoken in French during the break).

III. RESULTS

A. French and English monolingual listeners

1. Effect of air puffs

In this analysis, we present the results for the continua from the listeners' native language. Figure 2 shows the results of a logistic regression modeling the percentage of listener responses indicating “voiceless” (either /ta/ or /pa/) in the presence and absence of air puffs. The x-axis represents the VOT steps in the stimuli, with stimulus one being the most “voiced” and stimulus six being the most “voiceless.”

FIG. 2.


(Color online) The results of the forced-choice identification task. The x-axis represents the stimulus number, with step 1 being the most “voiced” stimulus and step 6 being the most “voiceless” one. The y-axis represents the percentage of “voiceless” responses pooled across all participants in the group. The psychometric functions displayed in the figure were calculated using the psignifit 4 toolbox (Schütt et al., 2016). The lines perpendicular to the x-axis indicate the crossover point at which the probability of a “voiceless” answer was 50%.

A generalized linear mixed-effects model (GLMM) was built for each continuum separately using the lme4 package (Bates et al., 2015) in R (R Core Team, 2018). Participant response (RESP, voiceless or voiced, coded as 1 or 0) was predicted by the presence or absence of air puffs (PUFF, where the absence was the reference level) and VOT steps (STEP, coded as 1 to 6, where step 1 was the reference level), with random intercepts by participants. STEP was treated as a categorical variable rather than a continuous one because, although the steps were ordered, the effect of puff was not expected to increase with step order: the continua were designed so that any effect of puff would be largest at the middle steps and smaller at both end points. The interaction effect between PUFF and STEP was considered but not included, as its addition did not improve model fits.3 The absence of an interaction was somewhat unexpected, given that, by design, the effect of PUFF was more likely to be observed in the confusable middle of the continuum (STEPS 3 and 4) rather than at the end points. Despite this expectation, the interaction effect did not reach statistical significance, perhaps due to the small effect of PUFF compared to STEP. Furthermore, we indeed observed increased “voiceless” responses not only at STEPS 3 and 4 but also at steps closer to the end points (STEPS 2 and 5), especially notable in the case of French /d-t/ (see Fig. 2, upper right).

The results for the PUFF effect are shown in Table III. A significant effect of PUFF was found for English alveolar /d-t/ and French alveolar /d-t/, while no such effect was found for English bilabial /b-p/ and French bilabial /b-p/.

TABLE III.

Estimates for the effect of PUFF obtained from GLMMs built for each continuum presented to English or French listeners using the formula RESP ∼ PUFF + STEP + (1|ID). The estimates for STEP are not shown.

Estimate Std.Error z-value p-value Odds ratio
English listeners /d-t/ 0.428 0.152 2.814 0.005 1.53
French listeners /d-t/ 0.602 0.144 4.186 <0.001 1.83
English listeners /b-p/ 0.150 0.115 1.303 0.193 1.16
French listeners /b-p/a 0.083 0.121 0.680 0.496 1.09
a For consistency with other continua, the estimate from the model without an interaction term [RESP ∼ PUFF + STEP + (1|ID)] is presented here.
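The odds ratios in Table III are simply the exponentiated PUFF coefficients: because the model is logistic, exp(estimate) gives the multiplicative change in the odds of a “voiceless” response when an air puff is present. A quick check against the table:

```python
import math

# PUFF estimates (log-odds) from Table III.
puff_estimates = {
    "English /d-t/": 0.428,
    "French /d-t/": 0.602,
    "English /b-p/": 0.150,
    "French /b-p/": 0.083,
}

# exp(0.428) ~ 1.53: an air puff multiplies the odds of a "voiceless"
# response by about 1.5 for English listeners on the /d-t/ continuum.
odds_ratios = {k: round(math.exp(b), 2) for k, b in puff_estimates.items()}
print(odds_ratios)
```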

2. Individual models

To investigate how the effect of air puffs might vary among individual participants, we fitted psychometric functions to the individual data using the psignifit 4 toolbox for Python (Schütt et al., 2016). The estimated parameters were threshold (the point on the step scale where the probability of a “voiceless” answer was 50%) and width (as defined by Schütt et al., 2016, the difference between the 95th and 5th percentiles on the step scale; a narrower width indicates a steeper category boundary). The guess and lapse rates (the lower and upper asymptotes of the function) were set at zero. A psychometric function was fitted to each participant's responses for each continuum, separately for the condition with and without air puffs. The threshold for the “with air puffs” function was then subtracted from the threshold for the “without air puffs” function, so that a positive difference indicated a shift toward “voiceless” with air puffs. The threshold differences for the /da/-/ta/ and /ba/-/pa/ continua were then averaged.
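The two fitted parameters can be made concrete with a plain logistic sigmoid (an illustrative simplification; psignifit 4 estimates its parameters with Bayesian inference and supports other sigmoid shapes). The threshold values below are hypothetical stand-ins for one participant's fits, not data from the study.

```python
import math

def psychometric(x, threshold, width):
    """P("voiceless") at continuum step x for a logistic sigmoid.

    threshold: step where P = 0.5.
    width: distance between the steps where P = 0.05 and P = 0.95
           (a narrower width means a steeper category boundary).
    """
    scale = width / (2 * math.log(19))  # 5%-95% span of a logistic is 2*ln(19)*scale
    return 1 / (1 + math.exp(-(x - threshold) / scale))

# Hypothetical thresholds for one participant on one continuum.
threshold_without_puff = 3.8
threshold_with_puff = 3.5

# Subtracting "with" from "without": a positive difference means the boundary
# moved toward the voiced end with puffs, i.e., more "voiceless" responses.
threshold_diff = threshold_without_puff - threshold_with_puff
print(round(threshold_diff, 2))  # 0.3
```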

Figure 3 presents the individual participants' threshold differences for each group. Across all continua, 25 out of 35 (71.4%) English listeners and 25 out of 36 (69.4%) French listeners showed a positive threshold difference, indicating a perceptual shift toward “voiceless” when air puffs were present.

FIG. 3.


The individual threshold differences for all groups. Each data point represents one participant, averaged across all the continua tested. Positive differences indicate a perceptual shift toward voicelessness when air puffs were present, while negative differences indicate a shift toward voicedness.

To illustrate the sharpness of individual participant category boundaries, Table IV displays means and standard deviations for widths (slopes) of individual models, pooled for each group and continuum (note that we only use data for the English and French monolingual groups here and will discuss bilingual data in Sec. III B 2). A linear mixed-effects model (LMM) with GROUP (English or French, the former as the reference level), CONTINUUM (alveolar /d-t/ or bilabial /b-p/, the former as the reference level), and the interaction between the two as fixed effects, and with random intercepts by participant, showed a significant effect of CONTINUUM (t = 3.35, p = 0.001) and no effect for GROUP (t = 0.02, p = 0.997) or the interaction (t = −0.75, p = 0.452). This shows that both groups of participants had less distinct category boundaries for the /b-p/ continua than for the /d-t/ continua.

TABLE IV.

Means and SDs of the widths of individual models. A narrower width indicates a steeper category boundary.

GROUP CONTINUUM MEAN WIDTH (SD)
English /ta/-/da/ 2.06 (0.95)
/pa/-/ba/ 3.01 (1.50)
French /ta/-/da/ 2.07 (0.72)
/pa/-/ba/ 2.72 (1.68)
Bilingual (EN stimuli) /ta/-/da/ 2.02 (0.58)
/pa/-/ba/ 3.20 (1.97)
Bilingual (FR stimuli) /ta/-/da/ 2.23 (1.55)
/pa/-/ba/ 4.04 (2.44)

3. Language background

To investigate whether the effect of air puffs observed in French listeners was related to their exposure to English, correlations were calculated between individual threshold differences (as described in Sec. III A 2, where a larger difference represents a larger effect of air puffs) and each of the following: AOA (age of acquisition), current exposure, and self-reported proficiency in English. None of the three variables showed a significant correlation with threshold differences (r = −0.13 for AOA, r = 0.06 for current exposure, and r = 0.07 for proficiency), suggesting that limited exposure to English as a second language did not affect the magnitude of aero-tactile integration.

B. French–English bilingual listeners

1. Effect of air puffs

Figure 4 displays the percentages of English-French bilingual listeners' “voiceless” responses in the presence and absence of air puffs for the continua from both English and French. A separate GLMM was built for each of the four continua. The fixed effects and random effect structure were identical to the models built for monolingual participants: participant response was predicted by PUFF and STEP, with random intercepts by participant. We observed a significant effect of PUFF for all four continua (Table V).

FIG. 4.

(Color online) The results of the forced-choice identification task presented to bilingual participants. The x-axis represents the stimulus number, while the y-axis represents the percentage of “voiceless” responses pooled across all participants in the group. The vertical lines indicate the crossover (equal probability) points. The psychometric functions were calculated using the psignifit 4 toolbox (Schütt et al., 2016).

TABLE V.

Estimates for the effect of PUFF obtained from GLMMs built for each continuum presented to bilingual listeners using the formula RESP ∼ PUFF + STEP + (1|ID). The estimates for STEP are not shown.

	Estimate	Std. Error	z-value	p-value	Odds ratio
Bilingual: English /d-t/	0.698	0.146	4.795	<0.001	2.01
Bilingual: French /d-t/	0.319	0.129	2.480	0.013	1.38
Bilingual: English /b-p/	0.305	0.113	2.629	0.009	1.36
Bilingual: French /b-p/	0.215	0.101	2.131	0.033	1.24
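As a quick sanity check, the odds ratios in Table V are simply the exponentiated PUFF estimates (GLMM coefficients on the log-odds scale converted to the odds scale):

```python
import numpy as np

# PUFF estimates from Table V; exp(estimate) recovers each odds ratio.
estimates = {
    "Bilingual: English /d-t/": 0.698,
    "Bilingual: French /d-t/": 0.319,
    "Bilingual: English /b-p/": 0.305,
    "Bilingual: French /b-p/": 0.215,
}
for label, b in estimates.items():
    print(f"{label}: OR = {np.exp(b):.2f}")
# e.g., exp(0.698) = 2.01: air puffs double the odds of a
# "voiceless" response on the English /d-t/ continuum.
```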

2. Individual models

Psychometric functions were also fitted to the individual bilingual participants' data using the same procedure as for the monolingual participants. When averaged for all continua in both languages, 26 out of 35 bilingual participants (74.2%) showed a positive threshold difference, indicating that their perception of plosives was biased toward voicelessness by the presence of air puffs (Fig. 3).
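The individual fits in the study used the psignifit 4 toolbox; a minimal stand-in using an ordinary logistic function and scipy illustrates how a threshold (the 50% point) and a "width" (here taken as the distance between the 5% and 95% points, one common convention) can be extracted from identification proportions. The response proportions below are invented for the example:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, m, s):
    # m: threshold (50% point); s: spread of the sigmoid
    return 1.0 / (1.0 + np.exp(-(x - m) / s))

steps = np.arange(1, 7)  # 6-step VOT continuum
p_voiceless = np.array([0.02, 0.05, 0.30, 0.75, 0.95, 0.99])

(m, s), _ = curve_fit(logistic, steps, p_voiceless, p0=[3.5, 0.5])

# Width as the 5%-to-95% interval of the fitted sigmoid:
# a narrower width indicates a steeper category boundary.
width = 2.0 * s * np.log(19.0)
print(f"threshold = {m:.2f}, width = {width:.2f}")
```

A threshold difference of the kind analyzed in the text would then be the with-puff threshold minus the without-puff threshold, fitted separately per condition.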

To examine whether bilingual participants' category boundaries were as acute as those of their monolingual counterparts, the widths of the individual models were examined (Table IV). An LMM with BILINGUAL (monolingual or bilingual, the former as the reference level; the monolingual group included both English and French participants), CONTINUUM (alveolar /d-t/ or bilabial /b-p/, the former as the reference level), and the interaction between the two as fixed effects, and with random intercepts by participant, revealed a significant interaction between BILINGUAL and CONTINUUM (t = 2.05, p = 0.042) and a main effect of CONTINUUM (t = 6.19, p < 0.001), with no effect of BILINGUAL (t = 0.22, p = 0.822). This indicates that bilingual participants showed more diffuse category boundaries than monolingual participants for the /b-p/ continua, while their category boundaries were comparable to those of monolingual participants for the /d-t/ continua.

3. Language background

To investigate whether bilingual participants' language dominance affected the magnitude of the effect of air puffs, correlations were calculated between individuals' threshold differences (again, a larger difference represents a larger effect of air puffs) and each of the following indexes of English language dominance: the difference in AOA (subtracting English AOA from French AOA), the difference in current exposure (subtracting French exposure from English exposure), and the difference in proficiency (subtracting French proficiency from English proficiency, after averaging scores for speaking and comprehension). Individual participants' threshold differences, averaged over all continua, did not show a significant correlation with any of these indexes of English language dominance (r = −0.11 for AOA, r = 0.22 for current exposure, and r = −0.04 for proficiency).

IV. DISCUSSION

The current study investigated audio/aero-tactile integration in the perception of plosives for three groups of participants with different language backgrounds. Our findings can be summarized as follows. First, we found similar results for English and French monolingual listeners: their perception was shifted toward “voiceless” by the presence of air puffs on the hand, at least for the /d-t/ continuum (but not significantly for the /b-p/ continuum, which we discuss later in the paper). Second, bilingual participants integrated aero-tactile information into the perception of plosives in both languages, and they seemed to do so to a greater degree than their monolingual counterparts, considering that they showed an effect of air puffs for all continua tested, including the /b-p/ continua.

A. Does a listener's native language affect audio/aero-tactile integration?

Previous studies have found an effect of air puffs on the perception of plosives for participants whose native language featured strong air turbulence (i.e., aspiration during long-lag VOT), such as English (Gick and Derrick, 2009; Goldenberg et al., 2022), Thai (Goldenberg, 2019), or Mandarin Chinese (Derrick et al., 2019). The assumption behind the selection of these languages was that only perceivers who had been exposed to such air turbulence in their linguistic experience would have the sensitivity to integrate appropriately timed aero-tactile information into speech perception. The current findings contradict this assumption by demonstrating that even perceivers whose native language was French, in which aspiration is not systematically utilized in the target contrasts, showed evidence of aero-tactile integration. This suggests that aero-tactile integration may be more flexible than previously assumed, for example in the original study (Gick and Derrick, 2009), where air puffs were intended specifically to simulate aspiration. The pattern of results shown here suggests rather that perceivers can utilize airflow information on the skin in speech perception within the contextual range of their own linguistic repertoire. That is, the presence of an air puff may also facilitate the perception of voiceless non-aspirated consonants relative to voiced consonants, because the former still involve more air expulsion than the latter, even though this difference is smaller than that distinguishing aspirated from non-aspirated consonants.

These findings raise the question of how specific one's experience needs to be in order to learn the association between different modalities. If an association between the sensation of air on the skin and the sound of air expulsion in plosive consonants had to be formed in this specific context, one would need the real-life experience of receiving someone's (or one's own) breath on the hand coinciding with the release of voiceless consonants. However, because French voiceless consonants feature only small-scale air expulsion, it is unlikely that French listeners had exactly this experience during linguistic interactions with others or while babbling/murmuring to themselves, even with the hand coincidentally near the mouth. Multimodal integration, then, may not require prior experience of the two events (i.e., the airflow sensation on the skin and the sound of air expulsion) occurring simultaneously in the exact same context (i.e., when perceiving plosive consonants). This is consistent with previous findings that perceivers can integrate aero-tactile stimuli even when they are applied to distal targets on the body; for example, air puff stimuli applied to the ankle were found to trigger aero-tactile integration just like those applied to the hand (Derrick and Gick, 2013). Apparently, listeners are able to generalize the association between airflow and the sound of air expulsion learned in other contexts to the perception of plosives, though it is unclear whether those contexts are linguistic or non-linguistic.

The current study also challenges some classic studies on audio-visual integration that argue that a perceiver's native language affects their ability to integrate multimodal information in speech (e.g., Sekiyama and Tohkura, 1991, 1993), while it is in agreement with more recent studies where the frequency of multimodal integration did not differ between different language groups (Chen and Hazan, 2007, 2009; Wang et al., 2009; Magnotti et al., 2015). We observed parallel results for both English and French listeners, contrary to our expectation for a larger effect for English listeners if the amount of airflow usage in their native language shapes their ability to integrate aero-tactile information. Interestingly, French listeners even showed a numerically larger effect size (odds ratio) of air puffs in the /d-t/ continuum (1.83 for French listeners compared to 1.53 for English listeners), suggesting that the magnitude of airflow usage was not driving sensitivity to the effect.

One could argue that these results may not reflect how English and French monolinguals typically perceive speech, given where participants were recruited. Our participants resided in Montreal, Canada, at the time of recruitment, where bilingualism is common. Therefore, some of our French “monolinguals” may have had more exposure to English than French listeners residing elsewhere (although exposure to English for our French listeners appeared to be mainly through media such as TV rather than real-life interaction). To examine this possibility, we analyzed participants' language background surveys, which revealed that none of AOA, amount of exposure, or proficiency in English correlated with the magnitude of the effect of air puffs among French listeners. It is important to note that we specifically recruited participants who reported not being proficient in their second language, which may explain why we did not find a correlation, as they were all relatively inexperienced in English. This analysis suggests that, for “monolingual” participants, limited exposure to a second language (such as learning English as a second language in school or media consumption in English) was not a significant factor in how they engage in multimodal integration.

It is important to note that although there were no group differences, we did observe large individual differences in the magnitude of aero-tactile integration, consistent with previous studies of audio-visual integration such as the McGurk effect (Nath and Beauchamp, 2012; Strand et al., 2014; Basu Mallick et al., 2015). Across all three groups of participants, approximately 70% to 75% of participants showed a shift toward perceiving stimuli coincident with air puffs as “voiceless,” while the remainder showed no shift or a shift in the opposite direction (small for most of these participants, though three demonstrated a distinct shift toward “voiced,” beyond 2 SDs from the mean; see Fig. 3). The underlying sources of this individual variability in multimodal integration are still unclear (see Brown et al., 2018, for cognitive abilities and Sekiyama et al., 2014, for age, for example), although dermal sensitivity to air may account for some of the variance in the case of air puff stimuli (Derrick and Gick, 2013). The main conclusion to be drawn from the current study, nevertheless, is that an individual's native language does not appear to be a major factor in explaining the extent to which multimodal integration is experienced.

Finally, we observed an effect of air puffs in one of the continua tested (the /d-t/ continua) but not in the other (the /b-p/ continua) for both English and French monolingual listeners. We speculate that this discrepancy may be due to the specific characteristics of the stimuli used (note that both the English and French /b-p/ stimuli were created from the same speaker and thus had similar characteristics). Despite the stimulus validation process prior to the experiment, the /b-p/ continua appeared to be more difficult for participants to perceive. This can be observed in the width (slope) parameter of the individual models, as participants showed significantly larger widths (indicating a less acute slope) for the /b-p/ continua than for the /d-t/ continua. Additionally, some participants reported having difficulty perceiving the /b-p/ continua, commenting that they heard only a vowel /a/ without any preceding consonant. This suggests that the burst and subsequent air turbulence of the labial plosive consonant were relatively challenging to perceive after manipulation, which may have contributed to the non-significant results for the /b-p/ stimuli. Although the difference in perceptual difficulty among the continua was not intentional, the finding that neither English nor French listeners showed an effect of air puffs for the less perceptible /b-p/ continua once again demonstrates that the two groups were comparable in their ability to utilize airflow sensation in speech perception.

B. Does bilingual experience increase, or decrease, audio/aero-tactile integration?

In Sec. IV A, we argued that monolingual listeners' native language experience may not affect how they experience multimodal perception. However, the finding that English–French bilingual participants showed an effect of air puffs for all continua tested, including the more “difficult” /b-p/ continua for which monolinguals showed no such effect, suggests that a listener's linguistic experience does matter after all. Bilingualism may increase the likelihood of experiencing audio-aerotactile integration in speech, which is compatible with a previous study in which bilinguals were found to be more susceptible to audio-visual integration (Marian et al., 2018). These apparently contradictory findings for monolinguals and bilinguals point to the possibility that managing the cognitive demands and potential difficulties associated with bilingualism, rather than specific linguistic properties of a given language, may facilitate the tendency to engage in multimodal integration.

The potential difficulty associated with bilingualism in the perception of stop consonants involves the “double phonemic boundary” (García-Sierra et al., 2009): bilinguals may maintain distinct boundaries for acoustically similar phonemes,4 such as stop consonants in English and French, appropriately choosing which boundary to activate in a given context. This shift between “language modes” (Grosjean, 2008) can occur very rapidly (Casillas and Simonet, 2018) and be triggered by various contextual factors, including phonemic (Casillas and Simonet, 2018) or acoustical (García-Sierra et al., 2021) cues, as well as non-auditory cues such as lexical context (Hazan and Boulakia, 1993). The constant, real-time need for multimodal resources to determine “language mode” in speech perception may predispose bilinguals, if not compel them, to be receptive to non-auditory cues.

More broadly, there is also a developmental hypothesis regarding bilinguals' propensity for multimodal perception. For instance, bilingual infants have in some studies been found to be more attentive to visual speech cues compared to monolingual infants (Pons et al., 2015; Havy and Zesiger, 2021), although other studies have failed to find differences (e.g., Tsang et al., 2018). Considering these results, some have argued that the increased attention may result from bilinguals' experience of learning and controlling two languages, with the concomitant complexity of linguistic input (Marian et al., 2018; Pons et al., 2015). During the early stages of language acquisition, bilinguals may develop a strategy to allocate more attention or weight to redundant information in order to decipher complex input, and this strategy may persist throughout their lifespan. This hypothesis implies that bilinguals can also be attentive to other relevant information besides visual cues, and the current results provide evidence that they are susceptible to tactile information at a generally higher level than monolinguals. This supports the view that bilingual speech perception may be fundamentally more multimodal compared to that of monolingual counterparts.

Another hypothesis regarding bilingual susceptibility to multimodal integration is that bilinguals may have functional auditory deficits as a consequence of controlling the phonotactics of two languages. Bilingual listeners tend to show less accurate perception than monolingual counterparts, especially under adverse conditions (Mayo et al., 1997), which may, in turn, increase reliance on non-auditory speech cues. In the current experiment, there were two potentially “adverse” conditions for bilingual participants. First, they were asked to switch languages between sessions, which forced them to use two different language models that may mutually interfere within a short amount of time (there was only a 15-min break between sessions). Second, the perception of the /ba/-/pa/ continuum was more “difficult,” and bilingual participants indeed exhibited less steep category boundaries than monolingual participants. Thus, the hypothesis of auditory deficits also seems plausible, considering that bilingual participants integrated air puffs into perception of the /ba/-/pa/ continuum while monolinguals did not. It is interesting to note that the inaudible burst of the /ba/-/pa/ continuum may have prevented monolinguals from utilizing aero-tactile information by obscuring the relevance of the air puffs to the stimuli, while bilinguals still opted to utilize aero-tactile information to compensate for the degraded auditory signal. Therefore, degraded auditory information alone may not guarantee that listeners will experience multimodal integration. We speculate that bilingual listeners' auditory disadvantage under certain circumstances may create an optimal condition for sensory integration to occur, but it may not be the determining factor for experiencing multimodal integration.

Finally, it is interesting to note that although the bilingual group overall showed greater sensitivity to air puffs, the number of individuals in each group who showed a perceptual shift was comparable: 26 out of 35 bilingual participants exhibited a perceptual shift, compared with 25 out of 35 English and 25 out of 36 French monolingual participants. In other words, bilingual experience may increase the magnitude of multisensory integration for those who are already susceptible to it, but it may not increase the number of individuals who experience it. When it comes to whether an individual experiences multisensory integration, bilinguals still appear to be subject to the same variability as their monolingual counterparts. This suggests that while bilingual experience may be one factor affecting how multimodal cues are processed in speech, there are nonetheless individual factors that may explain more of the variability in multimodal integration than language factors do.

V. CONCLUSION

In conclusion, our findings suggest that native language experience affects the likelihood of engaging in multimodal integration in complex ways. The presence of audio/aero-tactile integration in both English and French monolingual listeners suggests a lack of any specific language effect, while the increased use of aero-tactile information by the bilinguals suggests a role for language experience as a general modulating effect. Although the utilization of the target modality (i.e., air flow information in our case) may vary between languages, the native language(s) acquired by perceivers do not seem to be the determining factor. However, being bilingual and controlling two languages tends to increase the magnitude of multimodal integration. Therefore, our results indicate that linguistic experience can partly shape how people utilize multiple modalities in their speech perception, but this effect is not necessarily due to specific events occurring in their native language.

ACKNOWLEDGMENTS

This work was supported by NIH Grant No. DC-002717 to Haskins Laboratories and the Yale Child Study Center, an Insight grant from the Social Sciences and Humanities Research Council of Canada, and a Discovery grant from the Natural Sciences and Engineering Research Council of Canada.

a)

Portions of this work were presented in “A cross-linguistic study of audio-aerotactile perceptual integration using voicing continua,” Program of 183rd Meeting of the Acoustical Society of America, Nashville, TN, December 2022.

Footnotes

1

One English participant and two bilingual participants were excluded from the study due to their failure to pass the screening test (see procedures: air puff detection test). Another English participant was excluded because they appeared to be unable to concentrate during the experiment. Additionally, there was one bilingual participant whose responses were considered atypical. They consistently (over 99% of the time) categorized a stimulus as “voiceless” when air puffs were present, and as “voiced” when air puffs were absent. We decided to exclude this participant as we suspected that they did not fully comprehend the task, which required responding based on auditory cues rather than the presence or absence of air puffs.

2

The air flow rate of 5 SLPM used in our experiment is lower than the average air flow associated with stop consonants in CV syllables (Isshiki and Ringel, 1964). However, participants were able to detect it easily, as confirmed by their high accuracy in the air puff detection task (see the following section). Furthermore, Goldenberg et al. (2022) observed audio-aerotactile integration for English /b-p/ and /g-k/ contrasts using identical equipment and the same 5 SLPM setting. Therefore, we deemed this air flow rate sufficient to induce integration.

3

A model with an interaction term, RESP ∼ PUFF * STEP + (1|ID), and one without it, RESP ∼ PUFF + STEP + (1|ID), were built for each continuum before a likelihood ratio test was performed. For all continua except French /b-p/, the result was non-significant (p > 0.05), so the more parsimonious model (without the interaction) was selected. For the French listeners' /b-p/ continuum, the model with the interaction term was preferred (p < 0.01), because we observed more “voiced” responses at STEPs 4 and 6 but more “voiceless” responses at STEP 3. Nevertheless, a post-hoc test (emmeans) revealed no significant PUFF effect at any STEP.
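This model comparison is a standard likelihood ratio test between nested models. A simplified sketch follows, using plain logistic regressions on simulated data (and thus omitting the random intercept included in the GLMMs above); all values are invented for the example:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(2)

# Simulated identification data: "voiceless" responses driven by
# continuum STEP and the presence of an air PUFF, no interaction.
n = 2000
step = rng.integers(1, 7, size=n)
puff = rng.integers(0, 2, size=n)
eta = -3.5 + 1.0 * step + 0.4 * puff          # log-odds
resp = rng.random(n) < 1 / (1 + np.exp(-eta))
df = pd.DataFrame({"resp": resp.astype(int), "step": step, "puff": puff})

full = smf.logit("resp ~ puff * step", df).fit(disp=0)
reduced = smf.logit("resp ~ puff + step", df).fit(disp=0)

# Likelihood ratio statistic: 2 * (logL_full - logL_reduced),
# compared against chi-square with 1 df (one extra parameter).
lr = 2 * (full.llf - reduced.llf)
p = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p:.3f}")
```

A non-significant p here would motivate keeping the simpler additive model, as the footnote describes for most continua.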

4

Maintaining two distinct boundaries does not necessarily indicate that bilinguals exhibit monolingual-like perception in both languages. In our data, bilinguals' perceptual thresholds without air puffs were closer to STEP 4 for both French /d-t/ and French /b-p/ continua (Fig. 4), while those of French monolinguals were closer to STEP 3 (Fig. 2). When considering the results for English continua, where bilinguals exhibited comparable thresholds to monolinguals, this may suggest that our bilingual participants, as a group, were slightly more English-dominant, as indicated by their current language exposure (59% English and 39% French).

AUTHOR DECLARATIONS

Conflict of Interest

The authors have no conflicts of interest to declare.

Ethics Approval

Ethics approval for collecting data from human participants was granted by the Research Ethics Review Committee Involving Human Subjects (CIEREH) at Université du Québec à Montréal (QC, Canada). All participants provided informed consent before participating in the experiment.

DATA AVAILABILITY

The data that support the findings of this study are openly available on the Open Science Framework (https://osf.io/4r2cm/).

References

1. Aloufy, S., Lapidot, M., and Myslobodsky, M. (1996). “Differences in susceptibility to the ‘blending illusion’ among native Hebrew and English speakers,” Brain Lang. 53(1), 51–57. 10.1006/brln.1996.0036
2. Basu Mallick, D., Magnotti, J. F., and Beauchamp, M. S. (2015). “Variability and stability in the McGurk effect: Contributions of participants, stimuli, time, and response type,” Psychon. Bull. Rev. 22, 1299–1299. 10.3758/s13423-015-0817-4
3. Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). “Fitting linear mixed-effects models using lme4,” J. Stat. Softw. 67(1), 1–48. 10.18637/jss.v067.i01
4. Bicevskis, K., Derrick, D., and Gick, B. (2016). “Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives,” J. Acoust. Soc. Am. 140(5), 3531–3539. 10.1121/1.4965968
5. Brown, V. A., Hedayati, M., Zanger, A., Mayn, S., Ray, L., Dillman-Hasso, N., and Strand, J. F. (2018). “What accounts for individual differences in susceptibility to the McGurk effect?,” PLoS One 13(11), e0207160. 10.1371/journal.pone.0207160
6. Burnham, D., and Lau, S. (1998). “The effect of tonal information on auditory reliance in the McGurk effect,” paper presented at the AVSP'98 International Conference on Auditory–Visual Speech Processing, Terrigal, Australia.
7. Caramazza, A., and Yeni-Komshian, G. H. (1974). “Voice onset time in two French dialects,” J. Phon. 2, 239–245. 10.1016/S0095-4470(19)31274-4
8. Casillas, J. V., and Simonet, M. (2018). “Perceptual categorization and bilingual language modes: Assessing the double phonemic boundary in early and late bilinguals,” J. Phon. 71, 51–64. 10.1016/j.wocn.2018.07.002
9. Chen, Y., and Hazan, V. (2007). “Language effects on the degree of visual influence in audiovisual speech,” in Proceedings of the 16th International Congress of Phonetic Sciences, August 6–10, Saarbrücken, Germany.
10. Chen, Y., and Hazan, V. (2009). “Developmental factors and the non-native speaker effect in auditory–visual speech perception,” J. Acoust. Soc. Am. 126(2), 858–865. 10.1121/1.3158823
11. Derrick, D., and Gick, B. (2013). “Aerotactile integration from distal skin stimuli,” Multisens. Res. 26(5), 405–416. 10.1163/22134808-00002427
12. Derrick, D., Heyne, M., O'Beirne, G., and Hay, J. (2019). “Aero-tactile integration in Mandarin,” in Proceedings of the 19th International Congress of Phonetic Sciences, August 5–9, Melbourne, Australia, pp. 3508–3512.
13. Dupont, S., Aubin, J., and Ménard, L. (2005). “A study of the McGurk effect in 4- and 5-year-old French Canadian children,” ZAS Papers Ling. 40, 1–17. 10.21248/zaspil.40.2005.254
14. García-Sierra, A., Diehl, R. L., and Champlin, C. (2009). “Testing the double phonemic boundary in bilinguals,” Speech Commun. 51(4), 369–378. 10.1016/j.specom.2008.11.005
15. García-Sierra, A., Schifano, E., Duncan, G. M., and Fish, M. S. (2021). “An analysis of the perception of stop consonants in bilinguals and monolinguals in different contexts: A range-based language cueing approach,” Atten. Percept. Psychophys. 83, 1878–1896. 10.3758/s13414-020-02183-z
16. Gick, B., and Derrick, D. (2009). “Aero-tactile integration in speech perception,” Nature 462, 502–504. 10.1038/nature08572
17. Gick, B., Ikegami, Y., and Derrick, D. (2010). “The temporal window of audio-tactile integration in speech perception,” J. Acoust. Soc. Am. 128, EL342–EL346. 10.1121/1.3505759
18. Goldenberg, D. (2019). “Audio-tactile integration in speech perception: Effects of aero-tactile information on the perception of voicing in American English and Thai,” Ph.D. thesis, Yale University, New Haven, CT.
19. Goldenberg, D., Tiede, M. K., Bennett, R. T., and Whalen, D. H. (2022). “Congruent aero-tactile stimuli bias perception of voicing continua,” Front. Hum. Neurosci. 16, 879981. 10.3389/fnhum.2022.879981
20. Grosjean, F. (2008). Studying Bilinguals (Oxford University Press, Oxford, UK).
21. Havy, M., and Zesiger, P. E. (2021). “Bridging ears and eyes when learning spoken words: On the effects of bilingual experience at 30 months,” Dev. Sci. 24, e13002. 10.1111/desc.13002
22. Hayashi, Y., and Sekiyama, K. (1998). “Native-foreign language effect in the McGurk effect: A test with Chinese and Japanese,” in Proceedings of AVSP'98, December 4–7, Terrigal, Australia, pp. 61–66.
23. Hazan, V. L., and Boulakia, G. (1993). “Perception and production of a voicing contrast by French-English bilinguals,” Lang. Speech 36, 17–38. 10.1177/002383099303600102
24. Holler, J., and Levinson, S. C. (2019). “Multimodal language processing in human communication,” Trends Cogn. Sci. 23, 639–652. 10.1016/j.tics.2019.05.006
25. Isshiki, N., and Ringel, R. (1964). “Air flow during the production of selected consonants,” J. Speech Hear. Res. 7, 233–244. 10.1044/jshr.0703.233
26. Ito, T., Tiede, M., and Ostry, D. J. (2009). “Somatosensory function in speech perception,” Proc. Natl. Acad. Sci. U.S.A. 106, 1245–1248. 10.1073/pnas.0810063106
27. Keough, M. (2019). “The role of prior experience in the integration of aerotactile speech information,” Ph.D. thesis, University of British Columbia, Vancouver, BC.
28. Keough, M., Derrick, D., and Gick, B. (2019). “Cross-modal effects in speech perception,” Annu. Rev. Linguist. 5(1), 49–66. 10.1146/annurev-linguistics-011718-012353
29. Lametti, D. R., Nasir, S. M., and Ostry, D. J. (2012). “Sensory preference in speech production revealed by simultaneous alteration of auditory and somatosensory feedback,” J. Neurosci. 32(27), 9351–9358. 10.1523/JNEUROSCI.0404-12.2012
30. Magnotti, J. F., Basu Mallick, D., Feng, G., Zhou, B., Zhou, W., and Beauchamp, M. S. (2015). “Similar frequency of the McGurk effect in large samples of native Mandarin Chinese and American English speakers,” Exp. Brain Res. 233(9), 2581–2586. 10.1007/s00221-015-4324-7
31. Magnotti, J. F., and Beauchamp, M. S. (2018). “Published estimates of group differences in multisensory integration are inflated,” PLoS One 13(9), e0202908. 10.1371/journal.pone.0202908
32. Magnotti, J. F., Dzeda, K. B., Wegner-Clemens, K., Rennig, J., and Beauchamp, M. S. (2020). “Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation,” Cortex 133, 371–383. 10.1016/j.cortex.2020.10.002
33. Marian, V., Blumenfeld, H. K., and Kaushanskaya, M. (2007). “The language experience and proficiency questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals,” J. Speech Lang. Hear. Res. 50, 940–967. 10.1044/1092-4388(2007/067)
34. Marian, V., Hayakawa, S., Lam, T., and Schroeder, S. (2018). “Language experience changes audiovisual perception,” Brain Sci. 8(5), 85. 10.3390/brainsci8050085
35. Mayo, L. H., Florentine, M., and Buus, S. (1997). “Age of second-language acquisition and perception of speech in noise,” J. Speech Lang. Hear. Res. 40, 686–693. 10.1044/jslhr.4003.686
36. McGurk, H., and MacDonald, J. (1976). “Hearing lips and seeing voices,” Nature 264, 746–748.
37. Nath, A. R., and Beauchamp, M. S. (2012). “A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion,” NeuroImage 59(1), 781–787. 10.1016/j.neuroimage.2011.07.024
38. Pons, F., Bosch, L., and Lewkowicz, D. J. (2015). “Bilingualism modulates infants' selective attention to the mouth of a talking face,” Psychol. Sci. 26, 490–498. 10.1177/0956797614568320
39. R Core Team (2018). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria).
40. Rosen, S. M., and Howell, P. (1981). “Plucks and bows are not categorically perceived,” Percept. Psychophys. 30(2), 156–168. 10.3758/BF03204474
41. Rosenblum, L. D. (2019). “Audiovisual speech perception and the McGurk effect,” in Oxford Research Encyclopedia of Linguistics, edited by M. Aronoff (Oxford University Press, Oxford, UK).
42. Sato, M., Cavé, C., Ménard, L., and Brasseur, A. (2010). “Auditory-tactile speech perception in congenitally blind and sighted adults,” Neuropsychologia 48(12), 3683–3686. 10.1016/j.neuropsychologia.2010.08.017
43. Schütt, H. H., Harmeling, S., Macke, J. H., and Wichmann, F. A. (2016). “Painfree and accurate Bayesian estimation of psychometric functions for (potentially) overdispersed data,” Vision Res. 122, 105–123. 10.1016/j.visres.2016.02.002
44. Sekiyama, K. (1997). “Cultural and linguistic factors in audiovisual speech processing: The McGurk effect in Chinese subjects,” Percept. Psychophys. 59, 73–80. 10.3758/BF03206849
45. Sekiyama, K., Soshi, T., and Sakamoto, S. (2014). “Enhanced audiovisual integration with aging in speech perception: A heightened McGurk effect in older adults,” Front. Psychol. 5, 323. 10.3389/fpsyg.2014.00323
  • 46. Sekiyama, K. , and Tohkura, Y. (1991). “ McGurk effect in non‐English listeners: Few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility,” J. Acoust. Soc. Am. 90, 1797–1805. 10.1121/1.401660 [DOI] [PubMed] [Google Scholar]
  • 47. Sekiyama, K. , and Tohkura, Y. (1993). “ Inter-language differences in the influence of visual cues in speech perception,” J. Phon. 21(4), 427–444. 10.1016/S0095-4470(19)30229-3 [DOI] [Google Scholar]
  • 48. Soto-Faraco, S. , Navarra, J. , Weikum, W. M. , Vouloumanos, A. , Sebastián-Gallés, N. , and Werker, J. F. (2007). “ Discriminating languages by speech-reading,” Percept. Psychophys. 69(2), 218–231. [DOI] [PubMed] [Google Scholar]
  • 49. Strand, J. , Cooperman, A. , Rowe, J. , and Simenstad, A. (2014). “ Individual differences in susceptibility to the McGurk effect: Links with lipreading and detecting audiovisual incongruity,” J. Speech. Lang. Hear. Res. 57(6), 2322–2331. 10.1044/2014_JSLHR-H-14-0059 [DOI] [PubMed] [Google Scholar]
  • 50. Tice, M. , and Woodley, M. (2012). “ Paguettes & bastries: Novice French learners show shifts in native phoneme boundaries,” UC Berkeley PhonLab Annu. Rep. 8(8), 72–75. 10.5070/P79h18t4rz [DOI] [Google Scholar]
  • 51. Trudeau-Fisette, P. , Ito, T. , and Ménard, L. (2019). “ Auditory and somatosensory interaction in speech perception in children and adults,” Front. Hum. Neurosci. 13, 344. 10.3389/fnhum.2019.00344 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Tsang, T. , Atagi, N. , and Johnson, S. P. (2018). “ Selective attention to the mouth is associated with expressive language skills in monolingual and bilingual infants,” J. Exp. Child Psychol. 169, 93–109. 10.1016/j.jecp.2018.01.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Wang, Y. , Behne, D. M. , and Jiang, H. (2009). “ Influence of native language phonetic system on audio-visual speech perception,” J. Phon. 37(3), 344–356. 10.1016/j.wocn.2009.04.002 [DOI] [Google Scholar]
  • 54. Winn, M. B. (2020). “ Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script,” J. Acoust. Soc. Am. 147, 852–866. 10.1121/10.0000692 [DOI] [PubMed] [Google Scholar]

Data Availability Statement

The data that support the findings of this study are openly available on the Open Science Framework (https://osf.io/4r2cm/).


Articles from The Journal of the Acoustical Society of America are provided here courtesy of Acoustical Society of America