Abstract
Purpose
Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, the oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants.
Method
Lip movements were recorded from 25 mothers as they spoke to their infants and to other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequencies of each vowel were also measured.
Results
Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech.
Conclusion
The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech.
Keywords: infant-directed speech, speech development, lip movement, facial movement, motion capture
When interacting with infants, adults significantly modify their communication style through changes to language, speech, and gesture. These adaptations may have a number of positive influences on early development, including the facilitation of child–parent bonding, attention, affect control, and speech and language development (Bernstein Ratner, 1986; Dominey & Dodane, 2004; Fernald et al., 1989; Kuhl et al., 1997). Identifying both the auditory and visual features of infant-directed speech (IDS) is an essential step toward understanding the contribution of environmental stimulation to the development of speech and language.
IDS is conveyed acoustically through voice and speech patterns and visually through facial, head, and body movements. The acoustic features of IDS are well established. Relative to adult-directed speech (ADS), IDS is characterized by a slowing of rate, an increase in pause frequency and duration, and an increase in the mean and range of fundamental frequency (Amano, Nakatani, & Kondo, 2006; Fernald & Simon, 1984; Grieser & Kuhl, 1988; Katz, Cohn, & Moore, 1996; Stern, Spieker, Barnett, & MacKain, 1983; Swanson, Leonard, & Gandour, 1992), as well as an increase in the acoustic distance between vowels (Kuhl et al., 1997). Although typically developing children use both auditory and visual information for learning to comprehend and produce spoken language, the characteristics of the visual component of IDS, such as facial and lip movements, have not been identified.
Facial motion during speech provides infants with a rich and salient source of speech and language cues. Research on multimodal perception of speech in adults has clearly demonstrated that watching a speaker’s facial and head movements markedly improves the speed and accuracy of sound discrimination as well as auditory comprehension, particularly in noisy environments (e.g., Bernstein, Takayanagi, & Auer, 2004; Grant & Seitz, 2000; MacLeod & Summerfield, 1987; Munhall, Jones, Callan, Kuratate, & Vatikiotis-Bateson, 2004; Rosenblum, Johnson, & Saldaña, 1996; Sumby & Pollack, 1954; van Wassenhove, Grant, & Poeppel, 2005). The improved processing afforded by visible speech can be explained, in part, by the complementary and redundant cues for place and manner of articulation, duration, prosody, rhythmicity, and intensity (e.g., Cho, 2005; Edwards, Beckman, & Fletcher, 1991; Erickson, 1998; Summerfield, 1987; Summers, 1987).
The mouth may be a particularly potent visual stimulus for gaining an infant’s attention and then providing complementary and redundant speech cues. In early speech learning, watching the mouth is also important for establishing links between the visual and acoustic representations of speech sounds and, potentially, for learning speech through imitation (Kuhl & Meltzoff, 1982; Patterson & Werker, 2003). For example, research on young infants has demonstrated that infants learn to recognize the oral postures that accompany different vowels prior to the onset of well-developed vocalizations (Kuhl & Meltzoff, 1982; Patterson & Werker, 2003). These early associations between facial and speech combinations may be a consequence of infants’ strong attentional bias toward moving faces (Biringen, 1987; Cohn & Elmore, 1988; Slater & Kirby, 1998; Toda & Fogel, 1993) and their precocious ability to learn associations between moving objects seen and synchronously heard (Gogate & Bahrick, 2001).
One relatively unexplored possibility is that parents exaggerate the visible aspects of speech to facilitate early speech and language learning. Findings from studies of hand and body gestures during IDS suggest that gestures directed toward infants are timed to direct the child’s attention to important cues regarding the structure of language (Brand, Baldwin, & Ashburn, 2002). Exaggerated oral movements may serve a similar function in early development.
Of particular interest to this investigation is the possibility that parents exaggerate their oral movements to enhance the distinction among vowels. Several studies on the acoustic characteristics of IDS have reported that parents maximized the acoustic distinction among different vowels when speaking to their infants (Burnham, Kitamura, & Vollmer-Conna, 2002; DeBoer, 2003; Kuhl et al., 1997). Such exaggerations in speech are referred to as hyperarticulations (Lindblom, 1990) and are known to enhance speech clarity and intelligibility (Payton, Uchanski, & Braida, 1994; Picheny, Durlach, & Braida, 1985; Smiljanic & Bradlow, 2005). It is not known whether parents similarly hyperarticulate their oral movements to enhance the visual distinction among vowels. As displayed in Figure 1, parents could easily hyperarticulate vowel sounds in the visual domain by, for example, increasing the lip opening for open vowels such as “a” and the lip spread for “ee,” and by exaggerating lip rounding for “oo.”
Figure 1.

Hypothetical lip-shape changes made by mothers to maximize the visual contrast among vowels. Each vowel is represented by its hypothetical location in lip-shape space as defined by its vertical (lip opening) and horizontal (lip spread) aperture.
In this investigation we used three-dimensional (3D) motion capture technology to describe how mothers modify their articulatory movements for vowels when communicating with their infants. We addressed the following four experimental questions: (a) Do mothers exaggerate articulatory movements during IDS relative to how they articulate during ADS? (b) Are lip shapes for vowels more distinctive during IDS than during ADS? (c) Are there individual differences among mothers in the degree of articulatory exaggeration? (d) Across mothers, is there an association between the degree of exaggeration in the facial movement and acoustic characteristics of IDS?
Method
Participants
The participants were 25 English-speaking mother–infant dyads who were enrolled in a study on early speech motor development. The mothers were between 23 and 42 years of age (M = 32.9, SD = 5.32), and the infants included 12 males and 13 females. Postsecondary maternal education levels ranged from 0 to 10 years (M = 4.1, SD = 2.3). The data from two additional mothers were not included because they tended to look downward, which placed their faces outside the cameras’ fields of view. All mothers had negative histories of neurologic or severe visual impairment and showed no evidence of speech, language, or voice disorders.
All infants were between 9 and 10 months of age. We selected this age because mothers are likely to be very active in modeling articulations at this stage of development when their children are learning to produce sounds and recognize words. More specifically, by the second half of the first year infants are in the early stages of learning to understand words (e.g., Fenson et al., 1994) and to produce vowels in babble (Kent & Murray, 1982; Robb, Chen, & Gilbert, 1997; Rvachew, Slawinski, Williams, & Green, 1996). Moreover, prior studies show that acoustic features of IDS, such as increased F0 (Amano et al., 2006), are produced throughout the first year of a child’s life. Because IDS may change depending on the infant’s developmental status (Englund & Behne, 2006), future work will need to investigate infants at different stages of development, particularly during the first half of infancy when the foundations of speech perception are established.
All children were from monolingual, English-speaking homes. All infants had negative histories of neurologic or visual impairment. All infants performed at age level on the Battelle Developmental Inventory, second edition (Newborg, 2005) when they returned for a follow-up visit at 12 months of age. On the day of data collection, all mothers passed a binaural 25-dB pure-tone screening at 1000, 2000, and 4000 Hz.
Speaking Conditions
Facial movements were recorded from each mother during four different speaking conditions: (a) storytelling to infant, (b) storytelling to adult, (c) story reading to infant, and (d) story reading to adult. Both tasks were designed to elicit the target words: beet, bat, boot, and Bobby. These words were chosen because their medial vowels have well-defined acoustic and visual targets that circumscribe the boundaries of vowel space (see Figure 1). For the storytelling task, mothers were provided with an illustrated book without words and were instructed to tell the story depicted in the pictures about a little boy named Bobby and his experiences playing baseball and picking beets in the garden. For the reading task, mothers were given the same illustrated book with words designed for young children. The storytelling and reading tasks were intended to provide two different contexts for eliciting IDS, with the former less constrained than the latter. The order of tasks was fixed so that mothers were consistently producing a story rather than retelling one. Because the storytelling task was unscripted, the number of repetitions of each word varied across subjects (M = 9, SD = 3.4).
During the ADS task, baseline measures of articulatory movements and speech were recorded while each subject read and told the story to a laboratory assistant. During the IDS task, mothers were instructed to perform the same tasks while speaking to their babies as they typically do. For the IDS tasks, mothers were positioned facing the child, who was either secured in an infant seat approximately 2 ft in front of the mother or, in several cases, sitting on the mother’s lap. In these cases, the infants were positioned in a way that permitted them to view both their mother’s face and the book. To minimize self-consciousness on the part of the mother, the motion capture cameras were partly camouflaged by a wall-sized mural that depicted a jungle scene (see Figure 2). In addition, mothers and infants were separated from the investigators by a black curtain that surrounded the data collection area.
Figure 2.
Schematic of the laboratory setup during infant- (top panel) and adult-directed (bottom panel) speaking conditions.
Audio and Lip Movement Recordings
Digital audio recordings (Fs = 44.1 kHz, 16-bit linear PCM) were made throughout the entire session using a professional-quality lapel microphone that was mounted on each mother’s forehead. The microphone was head mounted to ensure that the mic-to-mouth distance was kept constant during the entire data collection session. Movements of the lower and upper lip were captured in 3D at 120 frames per second using an 8-camera optical motion capture system (Eagle Digital System, Motion Analysis Corp.). Prior to data collection, the system was calibrated according to manufacturer specifications.
The motion capture system tracked the movement of spherical reflective markers (approximately 2 mm in diameter) that were illuminated with an infrared light source. Fifteen markers were placed on each mother’s face in the following regions: forehead, eyebrow, nose, lips, and jaw. Only the upper lip (UL), lower lip (LL), right corner (RC), and left corner (LC) markers were studied. An example of the marker placement is displayed in Figure 3. The UL and LL markers were located midline on the vermilion border of each lip. The LL marker represented the combined motions of the jaw and lower lip. The RC and LC markers were located near the right and left oral commissures, just lateral to the labionasal fold.
Figure 3.
Example of marker placements used to record lip movements. RC = right corner; UL = upper lip; LC = left corner; LL = lower lip.
The 3D positional data from each marker were expressed relative to a room-based coordinate system. Following position tracking, the movement signals were digitally low-pass filtered (flp = 10 Hz) using a zero-phase-shift forward and reverse digital filter (Butterworth, 8 pole). Two signals were derived from the 3D time histories of the lip markers: (a) lip separation, defined as the 3D Euclidean distance between UL and LL, and (b) lip spread, defined as the 3D Euclidean distance between the markers at the mouth corners (i.e., RC and LC).
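A minimal sketch of this signal derivation, assuming the marker trajectories are available as NumPy arrays of shape (n_frames, 3) sampled at 120 Hz; the variable names and array layout are illustrative assumptions, not the study's actual processing code.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 120.0        # motion capture frame rate (Hz)
F_LOWPASS = 10.0  # low-pass cutoff (Hz)

def lowpass_zero_phase(marker_xyz, fs=FS, fc=F_LOWPASS, order=8):
    """Zero-phase (forward-reverse) Butterworth low-pass filter, applied per axis."""
    b, a = butter(order, fc / (fs / 2.0), btype="low")
    return filtfilt(b, a, marker_xyz, axis=0)

def lip_signals(ul, ll, rc, lc):
    """Return lip separation (UL-LL distance) and lip spread (RC-LC distance), in mm."""
    ul, ll, rc, lc = (lowpass_zero_phase(m) for m in (ul, ll, rc, lc))
    separation = np.linalg.norm(ul - ll, axis=1)  # vertical aperture time history
    spread = np.linalg.norm(rc - lc, axis=1)      # horizontal aperture time history
    return separation, spread
```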
Speaking Rate
We computed the speaking rate in words per minute (WPM) for the entire book reading by dividing the number of words produced by the time in minutes that it took to read the book. This measure included all of the pauses in the speech sample. Occasional comments produced by the mothers during reading were not included in the calculation of speaking rate.
Acoustic Analyses
We measured the maximum fundamental frequency (F0) and duration of each vowel to validate that the participants were producing IDS and to determine whether the mothers who exhibited IDS in the acoustic domain also exhibited exaggerated facial movements.
Vocal pitch
Increased vocal pitch is a well-established feature of IDS (Amano et al., 2006; Fernald et al., 1989; McRoberts & Best, 1997; Swanson et al., 1992). Fernald and Kuhl (1987) suggested that F0 changes are particularly salient for conveying communicative intentions in IDS. In the current study, we obtained F0 contours for the vowel in each target word using the autocorrelation algorithm in TF32 (Milenkovic, 2004), followed by occasional hand correction of mistrackings. We calculated the maximum F0 value from each pitch contour.
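The study used TF32's autocorrelation tracker with occasional hand correction; as an illustrative stand-in only, the sketch below calls Praat's autocorrelation pitch analysis through the parselmouth library. The file name, vowel interval, and pitch floor/ceiling values are assumptions.

```python
import numpy as np
import parselmouth

def max_f0(wav_path, t_start, t_end, floor=100.0, ceiling=600.0):
    """Maximum F0 (Hz) within a vowel interval [t_start, t_end] of a recording."""
    snd = parselmouth.Sound(wav_path).extract_part(from_time=t_start, to_time=t_end)
    pitch = snd.to_pitch(pitch_floor=floor, pitch_ceiling=ceiling)
    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]  # drop unvoiced frames (reported as 0 Hz)
    return float(np.max(f0)) if f0.size else np.nan

# Hypothetical usage: max_f0("mother01_ids.wav", 12.34, 12.52)
```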
Vowel duration
The duration of each vowel was determined acoustically as an indicator of speaking rate differences between IDS and ADS. Vowels were segmented on the basis of a spectrographic display using TF32. Boundaries were determined visually by the onset and offset of acoustic energy associated with both voicing and formants. Estimates of acoustic boundaries were confirmed through audio playback.
Vocal RMS
We also obtained the maximum root-mean-square (RMS) level (in dB) for the entire reading passage and each vowel segment using TF32. We used this measure to determine whether IDS is also associated with increased speech intensity. Because articulatory movements and muscle activity are known to increase with increased speech intensity (Dromey & Ramig, 1998; Schulman, 1989; Wohlert & Hammen, 2000), changes in speech intensity may account for potential increases in articulatory displacement during IDS.
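A minimal sketch of a windowed RMS-level measure of this kind, assuming a mono NumPy signal; the window length and dB reference are assumptions, and the study's actual values were obtained with TF32.

```python
import numpy as np

def max_rms_db(x, fs, win_s=0.02, ref=1.0):
    """Maximum short-time RMS level (dB re. `ref`) over non-overlapping windows."""
    win = int(win_s * fs)
    n = (len(x) // win) * win
    frames = x[:n].astype(float).reshape(-1, win)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(np.max(rms) / ref + 1e-12)
```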
Kinematic Analyses of Extent of Articulatory Range of Motion During Storybook Reading
As a global measure of articulatory working space, two-standard-deviation (2SD) ellipsoids were fit around the 3D motion paths that were recorded during each mother’s reading of the entire storybook. The ellipsoid algorithm calculated the two-standard-deviation boundaries along the derived principal axis of motion and two additional orthogonal axes. The volume (mm3) defined by the 2SD ellipsoid served as a measure of articulatory working space, which was compared across the IDS and ADS conditions.
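One way to implement such an ellipsoid fit, shown as a sketch under the assumption that each marker's motion path is an (n_frames, 3) array: the principal axes are obtained from the covariance of the 3D positions, and the 2SD ellipsoid volume follows the standard formula (4/3)πabc.

```python
import numpy as np

def working_space_volume(path_3d, n_sd=2.0):
    """Volume (mm^3) of the n_sd ellipsoid along the principal axes of a 3D motion path."""
    centered = path_3d - path_3d.mean(axis=0)
    cov = np.cov(centered, rowvar=False)   # 3 x 3 covariance of the marker positions
    eigvals = np.linalg.eigvalsh(cov)      # variances along the three principal axes
    semi_axes = n_sd * np.sqrt(eigvals)    # n_sd standard deviations per axis
    return (4.0 / 3.0) * np.pi * np.prod(semi_axes)
```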
The lip kinematic data that were recorded during the entire reading passage were also used to quantify changes in mouth shape across speaking conditions. For this analysis, we measured the maximum vertical (i.e., the maximum 3D Euclidean distance between the UL and LL markers) and horizontal aperture (i.e., the maximum 3D Euclidean distance between the RC and LC markers) across the entire passage. We took these measures, in addition to the working space measure, to determine specific changes in lip shape. We presumed that the maximum vertical and horizontal aperture would be sensitive to potential exaggerations of lip opening and spreading, respectively, during vowel opening. Although the features that best capture lip protrusion are multidimensional and not fully understood, a study by Fromkin (1964) on vowel lip shape suggests that horizontal lip aperture effectively encodes information about the rounding feature of vowels. This study also suggested that, across vowels, changes in horizontal lip aperture are moderately to strongly coupled with anterior–posterior LL movements. In the current study we used maximum aperture, instead of other measures, such as average or standard deviation of aperture, because this measure is less affected by the observed differences in speaking rate across speaking tasks. We also recorded the duration of the reading passages to test for differences in speaking rate across conditions.
Kinematic Analyses of Vowel-Specific Changes
As displayed in Figure 4, for each word, we analyzed lip opening during the transition from the initial bilabial consonant to the medial vowel. We used audio playback from the digital video, which was time aligned with the kinematic data, to identify the lip movements associated with each vowel. The onset of lip opening for the vowel was the minimum in a signal that represented the 3D Euclidean distance between the UL and LL during bilabial closure. The offset of lip opening for the vowel was the point associated with the maximum 3D distance between the lips during the vowel. Once each opening gesture was identified, it was measured for maximum vertical aperture (i.e., the maximum 3D Euclidean distance between the UL and LL markers) and maximum horizontal aperture (i.e., the maximum 3D Euclidean distance between the RC and LC markers).
Figure 4.
Example of lip kinematics. The top panel shows the downsampled acoustic recording of a mother saying “bat.” Vertical aperture, as displayed in the middle panel, represents the three-dimensional (3D) Euclidean distance (in mm) between the upper and lower lip markers. Horizontal aperture, as displayed in the bottom panel, represents the 3D Euclidean distance (in mm) between the markers located near the left and right corners of the lips. The maximum vertical and horizontal apertures were recorded for each target vowel during infant- and adult-directed speech.
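A sketch of this gesture-level measurement, assuming the filtered aperture signals from the earlier sketch and an approximate frame index of the bilabial closure taken from the time-aligned audio/video; the fixed search window (0.5 s at 120 frames per second) is an assumption added for illustration.

```python
import numpy as np

def opening_gesture(separation, spread, closure_idx, search_frames=60):
    """Return (max vertical aperture, max horizontal aperture) for one vowel opening."""
    # Onset: minimum UL-LL distance near the bilabial closure.
    a = max(closure_idx - search_frames // 2, 0)
    onset = a + int(np.argmin(separation[a:a + search_frames]))
    # Offset: maximum UL-LL distance during the following vowel opening.
    offset = onset + int(np.argmax(separation[onset:onset + search_frames]))
    max_vertical = float(separation[offset])
    max_horizontal = float(np.max(spread[onset:offset + 1]))
    return max_vertical, max_horizontal
```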
We used these kinematic descriptors of lip shape to characterize potentially important visual features (i.e., lip spreading, rounding, and vertical separation) used to discriminate among some vowel categories. More specifically, if the mothers’ goal was to produce lip shapes with maximal visual cue contrast for their infants, they would be expected to exaggerate the specific features that distinguish each vowel. For example, an increase in vertical lip aperture would be expected during bat and Bobby, an increase in horizontal aperture would be expected for beet, and a decrease in lip opening would be expected during an exaggerated version of lip rounding for boot. An exaggerated boot may also be produced with a decrease in both vertical and horizontal aperture for rounding.
To determine whether lip shapes for vowels became more distinctive during IDS than during ADS, we computed the Euclidean distance between vowels in lip-shape work space (lip separation vs. lip spread). Lip-shape work space (see Figure 1) was defined by the maximum vertical distance between the UL and LL (vertical aperture) and the maximum horizontal distance between the corners of the lip (horizontal aperture) during vowel opening for each word.
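A minimal sketch of this vowel-separation measure: each vowel is a point (horizontal aperture, vertical aperture), and the contrast between two vowels is the Euclidean distance between those points. The example values are hypothetical (rounded from the IDS group means in Table 1) and serve only to show the computation.

```python
import numpy as np
from itertools import combinations

def lip_space_distances(vowel_points):
    """Pairwise distances (mm) between vowels located in (horizontal, vertical) space."""
    return {
        (v1, v2): float(np.linalg.norm(np.asarray(p1) - np.asarray(p2)))
        for (v1, p1), (v2, p2) in combinations(vowel_points.items(), 2)
    }

# Hypothetical per-condition means (horizontal mm, vertical mm):
ids = {"ae": (64.0, 37.6), "a": (63.7, 32.7), "i": (65.1, 28.9), "u": (63.5, 24.9)}
print(lip_space_distances(ids))
```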
Acoustic Analyses of Vowel-Specific Changes
The first (F1) and second (F2) formants were measured for each target vowel using a broad-band spectrogram (260-Hz bandwidth). The onset of the vowel was identified as the first glottal pulse following the release of the burst, and the offset was identified as the last glottal pulse apparent in F1. We used Praat 5.1 software (Boersma & Weenink, 2009) to extract the F1 and F2 time histories of each vowel using the recommended default settings for extracting formants from the speech of female talkers (i.e., maximum formant = 5000 Hz, window length = 0.025 s, pre-emphasis = 50 Hz). We reviewed all formant time histories visually prior to analysis. In the rare event of mistrackings (< 20 samples), the data analysts were instructed to obtain a different sample of the vowel from the recordings. The extracted formant trajectories were then imported into a custom MATLAB program that calculated the maximum F1 and F2 values within the mid-80% section of the vowel. This middle section was isolated to minimize coarticulation effects from flanking consonants. We used the maximum frequency value because it paralleled the kinematic measures (i.e., maximum vertical and horizontal aperture) used to quantify lip opening during each vowel.
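As an illustrative stand-in for this step, the sketch below calls Praat's Burg formant tracker through the parselmouth library with the settings listed above and takes the maxima within the middle 80% of a vowel; the file name, the vowel interval, and the sampling of 50 time points are assumptions.

```python
import numpy as np
import parselmouth

def max_f1_f2(wav_path, t_on, t_off):
    """Maximum F1 and F2 (Hz) within the middle 80% of a vowel interval [t_on, t_off]."""
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(maximum_formant=5000.0,
                                   window_length=0.025,
                                   pre_emphasis_from=50.0)
    margin = 0.10 * (t_off - t_on)                     # trim 10% from each edge
    times = np.linspace(t_on + margin, t_off - margin, 50)
    f1 = [formants.get_value_at_time(1, t) for t in times]
    f2 = [formants.get_value_at_time(2, t) for t in times]
    return float(np.nanmax(f1)), float(np.nanmax(f2))
```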
Measurement Reliability
We randomly selected 90 acoustic recordings for reanalysis of maximum fundamental frequency, vowel duration, and maximum RMS intensity. The intrarater reliability across all three measures was very high (Cronbach’s α = .99). The intrarater reliability of the speaking rate measure was determined by reanalyzing 10 files. The difference between the first and second measurements was 1.45 WPM. We did not perform reliability analysis on the measures of articulatory working space, maximum vertical aperture, maximum horizontal aperture, and formant frequencies because these measures were algorithmically identified.
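The agreement index reported above can be reproduced with the standard Cronbach's alpha formula; a minimal sketch, assuming the repeated measurements are arranged as a tokens-by-occasions array (e.g., first vs. second measurement of the same 90 tokens). The variable names are illustrative.

```python
import numpy as np

def cronbach_alpha(ratings):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                       # number of measurement occasions
    item_vars = ratings.var(axis=0, ddof=1)    # variance of each occasion
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1.0)) * (1.0 - item_vars.sum() / total_var)
```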
Statistical Analyses
We analyzed the effects of speaking condition (IDS vs. ADS) and vowel (/i, a, ae, u/) using a two-way repeated measures analysis of variance (ANOVA) for each kinematic measure to test for potential changes in lip shape as a function of speaking task. Measures obtained across the multiple repetitions of each word were averaged for each subject. We used the Holm–Sidak method to test all pairwise multiple comparisons when significant differences were found. We used an overall alpha level of .05 for all statistical testing. Data from the storytelling and reading tasks were collapsed because an ANOVA revealed no significant differences in lip separation and spread between these tasks.
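A minimal sketch of how such an analysis could be set up in Python with statsmodels. The study's statistical software is not named, so the long-format data frame, its column names, and the paired-t implementation of the Holm–Sidak follow-ups are assumptions rather than the authors' exact procedure.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multitest import multipletests

def analyze(df):
    # Assumed columns: subject, condition ("IDS"/"ADS"), vowel, vertical_aperture.
    aov = AnovaRM(df, depvar="vertical_aperture", subject="subject",
                  within=["condition", "vowel"], aggregate_func="mean").fit()
    print(aov)

    # Pairwise IDS vs. ADS comparisons per vowel, Holm-Sidak corrected.
    pvals, labels = [], []
    for vowel, sub in df.groupby("vowel"):
        wide = sub.pivot_table(index="subject", columns="condition",
                               values="vertical_aperture", aggfunc="mean")
        _, p = stats.ttest_rel(wide["IDS"], wide["ADS"])
        pvals.append(p)
        labels.append(vowel)
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm-sidak")
    return dict(zip(labels, zip(p_adj, reject)))
```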
Results
Speaking Rate
Speaking rate in WPM was slower during IDS (M = 107.11, SEM = 3.55) than during ADS (M = 122.53, SEM = 5.04), F(1, 24) = 14.11, p < .001.
F0, Duration, and Intensity
Vowel maximum F0 and duration during each speaking condition are displayed in the top panel and bottom panel of Figure 5, respectively. Maximum F0 was 57.32 Hz higher during IDS than during ADS, F(1, 24) = 5.45, p < .05. The samples of the entire reading passage were, on average, 38 s longer in duration during IDS than during ADS, F(1, 18) = 34.40, p < .001. Single vowels were, on average, 25 ms longer in duration during IDS than during ADS, F(1, 24) = 4.82, p < .001. The maximum RMS intensity of the vowels was not significantly different across the speaking conditions for the entire reading passage or for single vowels.
Figure 5.
The average change in fundamental frequency (Hz) and duration (ms). Error bars represent standard error across participants’ mean change in fundamental frequency.
Extent of Lip Movement Across Tasks During Book Reading
The working spaces (mm3), as measured by the 2SD ellipsoids, for all of the mouth markers (see Figure 3) were significantly larger during IDS than during ADS. Post hoc analysis of task effects revealed significantly greater working spaces for all articulators during IDS than during ADS: mean difference for UL = 611.60 mm3, p < .001; mean difference for LL = 1537.92 mm3, p < .003; mean difference for RC = 990.04 mm3, p < .001; mean difference for LC = 611.59 mm3, p < .003. Summary statistics for vertical and horizontal lip aperture for the entire reading passage and the target vowels are reported in Table 1. No task effects were observed for the maximum horizontal aperture measure; however, the maximum vertical aperture was significantly larger during IDS than during ADS, q(1, 17) = 5.51, p < .05.
Table 1.
Mean vertical and horizontal lip aperture as a function of speaking condition and target word.
| Stimulus | Vertical aperture (mm): ADS, M (SD) | Vertical aperture (mm): IDS, M (SD) | Horizontal aperture (mm): ADS, M (SD) | Horizontal aperture (mm): IDS, M (SD) |
|---|---|---|---|---|
| Reading passage | 38.99 (5.90) | 43.90 (6.57) | 69.78 (5.56) | 70.53 (5.76) |
| /ae/ | 35.17 (5.28) | 37.59 (5.84) | 63.94 (5.21) | 63.99 (5.21) |
| /a/ | 31.08 (5.84) | 32.72 (5.07) | 63.67 (5.49) | 63.69 (5.60) |
| /i/ | 28.17 (3.89) | 28.94 (3.91) | 65.28 (4.89) | 65.08 (5.11) |
| /u/ | 24.30 (4.60) | 24.86 (4.59) | 63.60 (5.18) | 63.53 (5.11) |
Note. ADS = adult-directed speech; IDS = infant-directed speech.
Vowel-Specific Changes in Orofacial Movements Across Tasks
Data from two participants were excluded from the ANOVA model because motion-tracking errors left their data sets missing at least one condition. Vertical lip aperture was significantly larger during IDS than during ADS, F(1, 24) = 64.85, p < .001. Differences between lip apertures for IDS and ADS are shown for each target word in Figure 6. Mean differences between IDS and ADS are displayed for all the mothers’ data combined and for a subset of the data representing the 10 mothers who exhibited the greatest increase in vertical aperture. The data in Figure 6 suggest that, regardless of the degree of exaggeration, participants tended to exaggerate low vowels more than high vowels. Post hoc comparisons of the data set containing data from all of the mothers revealed significant differences in vertical lip aperture for /ae/ in bat, t(24) = 7.10, p < .05; /a/ in Bobby, t(24) = 5.53, p < .05; and /i/ in beet, t(24) = 2.44, p < .05. Horizontal lip aperture did not differ significantly between IDS and ADS for any vowel.
Figure 6.
Mean differences observed in vertical aperture (in mm) between IDS and ADS. Error bars indicate the standard error of the mean. Means are displayed for all 25 mothers and for the 10 mothers who exhibited the greatest exaggeration in lip opening.
We observed statistically significant differences across mothers in the extent to which they exaggerated their articulatory movements during IDS, F(1, 24) = 18.94, p < .001. We further examined across-subject differences in articulatory exaggeration during IDS for the movements associated with the word bat. We selected this word because it was associated with the greatest articulatory change across speaking conditions. In Figure 7, the difference in vertical aperture between IDS and ADS is plotted as a percentage change of vertical aperture (from the minimum separation during consonantal closure to the maximum separation during vowel opening) during the opening gesture for each vowel. Seven of the mothers exhibited a change of 4% or less, and the remaining mothers fell along a continuum, with a 47% increase representing the greatest exaggeration of vertical lip aperture during IDS.
Figure 7.
Maximum vertical aperture of the upper and lower lips for the vowel /a/ during IDS, expressed as the percentage change in vertical aperture (from oral closure for the consonant to oral opening for the vowel) relative to the ADS condition. For ease of interpretation, mothers were ranked from lowest to highest.
The data from the 10 mothers who exhibited the greatest change in F0 during IDS as compared with ADS are displayed in Figure 6. We plotted these data separately to identify aspects of articulatory exaggeration that may have been obscured by the group data. The results based on these 10 mothers’ data were similar to those based on the entire data set (i.e., all 25 mothers), with the increase in lip-shape distinctiveness among vowels during IDS primarily driven by an increase in the vertical aperture of /a/ and /ae/.
Vowel Formant Changes Across Tasks
Maximum F1 and F2 values for each target vowel during IDS and ADS are displayed in Figure 8. Maximum F1 values were significantly larger during IDS than during ADS for all the target vowels: /ae/, t(24) = 7.10, p < .05; /a/, t(24) = 5.53, p < .05; /i/, t(24) = 2.44, p < .05; and /u/, t(24) = 7.10, p < .05. Maximum F2 values were significantly larger only for /ae/, t(24) = 7.10, p < .05, and /a/, t(24) = 7.10, p < .05.
Figure 8.
Acoustic vowel space observed during IDS and ADS. Error bars represent the standard error across participants’ means.
Correlations Between Acoustic and Kinematic Variables
We performed analyses to determine whether the mothers who exaggerated in the kinematic domain also exaggerated acoustic aspects of speech. For each subject, we computed the average difference between IDS and ADS for vertical lip aperture (mm), maximum F0 (Hz), and duration (ms) for /ae/. The change in vertical aperture was moderately associated with the changes in maximum F0, r(25) = .45, p < .03; duration, r(25) = .53, p < .001; and maximum F1, r(21) = .44, p < .04; but not with the change in maximum F2.
Distinction Among Vowel Lip Shapes
To determine whether mothers enhanced the distinction among vowel lip shapes during IDS, we compared the Euclidean distance between all possible vowel pairs in lip-shape space across speaking conditions. The location of each vowel in lip-shape space was determined by its horizontal (x-axis) and vertical (y-axis) aperture during maximum opening. Four of the six vowel pairs were significantly farther apart in lip-shape space during IDS than during ADS: /ae/–/a/ = 2.0 mm, t(24) = 2.0, p < .05; /ae/–/u/ = 2.94 mm, t(24) = 3.34, p < .05; /ae/–/i/ = 2.16 mm, t(24) = 2.37, p < .05; and /a/–/u/ = 1.95 mm, t(24) = 2.15, p < .05. These findings suggest that the increased vertical lip aperture during IDS primarily increased the distance between the low vowels (i.e., /ae/ and /a/) and between these vowels and the other two vowels (i.e., /i/ and /u/).
Discussion
Lip Movements Were Exaggerated During IDS
The primary purpose of this investigation was to determine whether mothers modify their lip movements when communicating with their infants. We hypothesized that IDS would be produced with larger mouth openings than ADS. To examine vowel-specific changes in lip shape, we measured vertical and horizontal lip aperture during IDS and ADS for two low vowels (i.e., /a/ and /ae/) and two high vowels (i.e., /i/ and /u/), which were embedded in target words. The primary findings were that mouth opening was significantly larger during IDS than during ADS and that vertical lip aperture for low vowels was larger during IDS than during ADS. In contrast to this kinematic-based finding, all of the vowels (i.e., high and low) produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during ADS. The findings for changes in F1 and F2 frequencies were consistent with the mouth-opening data; changes in F1 and F2 frequencies across speaking conditions appeared to be greater for low vowels than for high vowels. The 25 mothers varied considerably in the degree of articulatory exaggeration, with mothers who exhibited acoustic exaggerations of F0 also tending to exhibit exaggerated lip movements.
Contrary to our hypothesis, articulatory exaggerations during IDS were only in the degree of vertical aperture and not in other features of lip shape, such as spreading and possibly rounding, as indicated by the finding of no differences across the speaking conditions in horizontal aperture. The observed increase in vertical aperture did, however, increase the distinction among the four vowels in lip-shape space. Important issues to consider are the auditory–visual consequences of articulatory exaggerations in IDS and their potential to facilitate early speech learning.
Mothers Varied in the Extent to Which They Exaggerated Their Lip Movements
The significant changes in fundamental frequency and speaking rate across speaking tasks suggest that the mothers in this study produced IDS during the experimental task. Therefore, the experimental paradigm appeared to elicit IDS even though the speaking tasks were more structured than in a typical child–mother interaction and the mothers were required to remain seated during the entire data collection session. As suggested by the data in Figure 7, the extent of exaggeration varied considerably among mothers. The factors that determine the extent to which a given mother produces IDS are not fully known but potentially include the mother’s and child’s mood and personality, the infant’s responsiveness, and the familial or ethnic culture. In addition, in the current study, as in others, it is possible that some mothers’ willingness to produce IDS was suppressed by their awareness of being observed.
Speculation About Enhancements of Articulatory Exaggerations to Early Speech Learning
Despite its apparent lack of specificity with regard to visible features of vowels, the exaggerated lip opening observed during IDS could produce visual and acoustic enhancements that facilitate the early learning of speech.
Potential visual enhancements
Increasing lip opening may be a simple, effective strategy for focusing the child’s attention on the face. Many studies have shown that children have a strong attentional bias toward moving faces over still faces (Biringen, 1987; Cohn & Elmore, 1988; Toda & Fogel, 1993). This effect is so robust it has been labeled the still-face effect (see Adamson & Frick, 2003, for a review).
Exaggerated lip openings may also convey enhanced articulatory cues for vowels by modifying the luminance of the mouth and increasing the visibility of the teeth and tongue (Erber, 1974; Rosenblum et al., 1996; Summerfield, MacLeod, McGrath, & Brooke, 1989). A study of the articulatory movements during the consonant /b/ suggests that the best visual exemplars of this sound, as judged by adults, are ones that are produced with greater lower lip displacements and speeds (Hall, Green, Moore, & Kuhl, 1999). Vowels produced during IDS may similarly have time-varying visual cues that make them good exemplars. For example, the slowing of speech during IDS may convey important duration cues that emphasize the distinction among vowels or even consonants (Klatt, 1976).
Why were the exaggerations only in the vertical aperture of low vowels? The absence of changes in horizontal aperture across speaking conditions might be interpreted as suggesting that the mothers were not exaggerating to enhance specific sound contrasts (i.e., /i/ vs. /u/). Increased lip spreading and rounding (via changes to horizontal aperture) may be unnecessary because these sounds are already visually distinct (Montgomery & Jackson, 1983). In contrast, exaggerating the articulatory cues for /a/ and /ae/ may be productive because the lip shapes for these vowels are less distinguishable than are those of the other vowels (Montgomery & Jackson, 1983). Another possibility is that lip exaggerations are primarily implemented through increases in jaw opening, with little contribution of lower lip movement. In this case, changes to lip shape would only be in the vertical aperture.
The absence of exaggerations in lip spread and protrusion could also be due to the insensitivity of the horizontal lip aperture measure to small changes in lip rounding and spreading. Lip protrusion, in particular, is multidimensional, and the most sensitive measure of lip protrusion is currently unknown. An early study by Fromkin (1964) suggests that horizontal lip aperture and anterior–posterior lip movements are moderately to strongly coupled during the production of English vowels. Measures of lip movement along the anterior–posterior dimension have been used previously to quantify lip protrusion during speech (e.g., Goffman, Smith, Heisler, & Ho, 2008; Perkell, Matthies, Svirsky, & Jordan, 1993). The data from these studies suggest that some speakers only minimally protrude their lips during the production of /u/. For example, half of the subjects in Perkell et al.’s (1993) study showed negligible lip protrusion in their investigation of the rounded vowel /u/ using the anterior–posterior lip movement measure. In addition, similar to the current study, an investigation conducted by Goffman and colleagues (2008) showed only small differences (approximately 1 mm) in anterior–posterior lip protrusion between the words beet and boot for adult talkers. Additional studies are needed to determine the most sensitive measures of lip rounding and the speech contexts that elicit a strong rounding feature.
Even if the exaggerated vertical lip movements of IDS do not convey enhanced articulatory cues for specific sounds, they may convey prosodic information. Many studies have shown that visual prosody that is conveyed through head and face movements (Beckman, Edwards, & Fletcher, 1992; Erickson, 1998; Harrington, Fletcher, & Roberts, 1995; Summers, 1987) significantly facilitates the perception of speech (Munhall et al., 2004; Rosenblum et al., 1996). For infants, the strong marking of prosody may be particularly effective for facilitating the perceptual segmentation among sounds, syllables, words, and phrases (Jusczyk, Hohne, & Mandel, 1995; Kemler Nelson, Hirsh-Pasek, Jusczyk, & Wright Cassidy, 1989; Thiessen, Hill, & Saffran, 2005). Additional research is required to determine whether the observed changes in lip movement during IDS are qualitatively similar to those used to mark stress.
Potential acoustic/auditory enhancements
The observation that mothers slow their speaking rate during IDS corroborates prior findings of increased vowel durations during IDS (Andruski & Kuhl, 1996; Bernstein Ratner & Luberoff, 1984; Fernald & Simon, 1984; Kuhl et al., 1997; Uther, Knoll, & Burnham, 2007). The slowing of speech during IDS may not only afford extra processing time for the infant but also yield hyperarticulated acoustic vowel targets (Moon & Lindblom, 1994; Turner, Tjaden, & Weismer, 1995), which have been reported in prior studies of vowel spectral changes during IDS (Burnham et al., 2002; DeBoer, 2003; Kuhl et al., 1997). Prior research has also shown that articulatory movement of vowels becomes slightly exaggerated when speech is slowed (Dromey & Ramig, 1998; Mefferd & Green, 2010).
One possibility is that mothers may have exaggerated their lip openings for low vowels to enhance the acoustic distinction among vowels during IDS, an effect that has been reported previously in the literature (Burnham et al., 2002; DeBoer, 2003; Kuhl et al., 1997). The observed increase in both F1 and F2 frequencies is consistent with the findings of prior acoustic studies of IDS (Burnham et al., 2002; Kuhl et al., 1997; Liu, Kuhl, & Tsao, 2003). Consistent with some of these studies was the observation of greater F1 change across conditions in low vowels than in high vowels (Burnham et al., 2002; Kuhl et al., 1997). This acoustic change associated with IDS is the expected consequence of the observed articulatory change, an increase in vertical lip aperture (Lindblom & Sundberg, 1971; Stevens, 2000; Stevens & House, 1955).
One unexpected finding was that IDS was not produced with greater acoustic intensity than ADS. In general, larger lip openings are expected to transmit speech energy more efficiently than smaller ones (Fairbanks, 1950). Moreover, many studies have shown that when talkers are asked to speak loudly they exaggerate their articulatory movements (Dromey & Ramig, 1998; Schulman, 1989; Tasko & McClean, 2004). Although speech loudness changes during IDS have rarely been investigated, a recent study of Jamaican talkers observed no significant speech-intensity differences between IDS and a citation speaking task (Beckford Wassink, Wright, & Franklin, 2007). One perceptual study of synthesized IDS observed that infants are less interested in amplitude modulations than F0 modulations (Fernald & Kuhl, 1987). Therefore, mothers may not exaggerate their speech intensity during IDS because their infants are not particularly responsive to such changes.
Motherese as an Affective Speaking Mode
The final consideration is that the articulatory changes observed during IDS may be a by-product of facial expressions that are unique to, and perhaps exaggerated during, IDS (e.g., Chong, Werker, Russell, & Carroll, 2003). Mothers’ facial expressions are an important stimulus for affect attunement and for engaging the child in meaningful social interaction (Kaplan, Bachorowski, Smoski, & Hudenko, 2002; Murray & Trevarthen, 1985; Stern, 1985; Trainor, Austin, & Desjardins, 2000). The heightened affect characteristic of IDS may also enhance children’s motivation to communicate (Kitamura & Burnham, 1998; Locke, 1993) and may be especially salient to infants. Future work is needed to elucidate how lip shapes for different vowels are affected by overlaid facial expressions.
Study Limitations
One limitation of the current work is the restrictive laboratory setting in which the speech samples were collected. Outside the laboratory, mothers may exaggerate their speech to a greater extent than what was observed in this study and in ways that are qualitatively different from those observed in this laboratory study. In addition, although we positioned the infants in a way to maximize the likelihood that they would gaze at their mothers’ faces, gazing patterns were not monitored. Mothers may be more inclined to exaggerate when their infants are actively staring at their mouths. Moreover, the method used to record mouth movement relied on the use of facial markers. Although the infants did not seem to be preoccupied with the markers, how their presence influenced the communication between infant and mother is uncertain. Finally, additional work is needed to determine whether the observed small changes in lip opening are perceptible to young infants.
Summary
Understanding the role of IDS in shaping early communication may have important implications for both theory and clinical practice. The current findings suggest that exaggerated lip movements are a characteristic of IDS, particularly during the production of low vowels. Mothers varied along a continuum in the extent to which they exaggerated their articulatory movements. The pattern of lip-shape exaggerations did not provide strong support for the hypothesis that mothers were producing exemplar visual models of vowels during IDS. Additional research is required to understand the potential significance of these exaggerations on communication development and to determine whether the observed exaggerations are consistent with those observed in less restrictive data collection environments.
Acknowledgments
This work was supported by the National Institute on Deafness and Other Communication Disorders Grants R03DC004643 and R01DC006463. We thank Lacey LaBarge, Cynthia Didion, Cara Ullman, Lindsey Fairchild, Paige Mueller, and Kelli Raber for their assistance with data collection and analysis.
Contributor Information
Jordan R. Green, University of Nebraska—Lincoln
Ignatius S. B. Nip, San Diego State University, San Diego, CA
Erin M. Wilson, Waisman Center, University of Wisconsin—Madison
Antje S. Mefferd, Wichita State University, Wichita, KS
Yana Yunusova, University of Toronto, Toronto, Ontario, Canada.
References
- Adamson LB, Frick JE. The still face: A history of a shared experimental paradigm. Infancy. 2003;4:451–473. [Google Scholar]
- Amano S, Nakatani T, Kondo T. Fundamental frequency of infants’ and parents’ utterances in longitudinal recordings. The Journal of the Acoustical Society of America. 2006;119:1636–1647. doi: 10.1121/1.2161443. [DOI] [PubMed] [Google Scholar]
- Andruski J, Kuhl PK. The acoustic structure of vowels in mothers’ speech to infants and children. In: Bunnell T, Idsardi W, editors. Proceedings of the Fourth International Conference on Spoken Language Processing. Wilmington, DE: Alfred I. du Pont Institute; 1996. pp. 1545–1548. [Google Scholar]
- Beckford Wassink A, Wright RA, Franklin AD. Intraspeaker variability in vowel production: An investigation of motherese, hyperspeech, and Lombard speech in Jamaican speakers. Journal of Phonetics. 2007;35:363–379. [Google Scholar]
- Beckman ME, Edwards J, Fletcher J. Prosodic structure and tempo in a sonority model of articulatory dynamics. In: Docherty GJ, Ladd DR, editors. Papers in laboratory phonology II: Segment, gesture, prosody. New York, NY: Cambridge University Press; 1992. pp. 68–86. [Google Scholar]
- Bernstein LE, Takayanagi S, Auer ET., Jr Auditory speech detection in noise enhanced by lipreading. Speech Communication. 2004;44:5–18. [Google Scholar]
- Bernstein Ratner N. Durational cues which mark clause boundaries in mother–child speech. Journal of Phonetics. 1986;14:303–309. [Google Scholar]
- Bernstein Ratner N, Luberoff A. Cues to post-vocalic voicing in mother–child speech. Journal of Phonetics. 1984;12:285–289. [Google Scholar]
- Biringen ZC. Infant attention to facial expressions and facial motion. Journal of Genetic Psychology. 1987;148:137–133. doi: 10.1080/00221325.1987.9914543. [DOI] [PubMed] [Google Scholar]
- Boersma P, Weenink D. Praat: Doing phonetics by computer (Version 5.1.12) [Computer program] 2009 Retrieved from http://www.praat.org/
- Brand RJ, Baldwin DA, Ashburn LA. Evidence for “motionese”: Modifications in mothers’ infant-directed action. Developmental Science. 2002;5:72–83. [Google Scholar]
- Burnham D, Kitamura C, Vollmer-Conna U. What’s new, pussycat? On talking to babies and animals. Science. 2002 May 24;296:1435. doi: 10.1126/science.1069587. [DOI] [PubMed] [Google Scholar]
- Cho T. Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /a,i/ in English. The Journal of the Acoustical Society of America. 2005;117:3867–3878. doi: 10.1121/1.1861893. [DOI] [PubMed] [Google Scholar]
- Chong SCF, Werker JF, Russell JA, Carroll JM. Three facial expressions mothers direct to their infants. Infant and Child Development. 2003;12:211–232. [Google Scholar]
- Cohn JF, Elmore M. Effect of contingent changes in mothers’ affective expression on the organization of behavior in 3-month-old infants. Infant Behavior and Development. 1988;11:493–505. [Google Scholar]
- DeBoer B. Emergence of sound systems through self-organization. In: Knight C, Studdert-Kennedy M, Hurford JR, editors. The evolutionary emergence of language: Social function and the origins of linguistic form. Cambridge, England: Cambridge University Press; 2003. pp. 177–198. [Google Scholar]
- Dominey PF, Dodane C. Indeterminacy in language acquisition: The role of child directed speech and joint attention. Journal of Neurolinguistics. 2004;17:121–145. [Google Scholar]
- Dromey C, Ramig LO. The effect of lung volume on selected phonatory and articulatory variables. Journal of Speech, Language, and Hearing Research. 1998;41:491–502. doi: 10.1044/jslhr.4103.491. [DOI] [PubMed] [Google Scholar]
- Edwards J, Beckman ME, Fletcher J. Articulatory kinematics of final lengthening. The Journal of the Acoustical Society of America. 1991;89:369–382. doi: 10.1121/1.400674. [DOI] [PubMed] [Google Scholar]
- Englund K, Behne D. Changes in infant directed speech in the first six months. Infant and Child Development. 2006;15:139–160. [Google Scholar]
- Erber NP. Effects of angle, distance, and illumination on visual reception of speech by profoundly deaf children. Journal of Speech and Hearing Research. 1974;17:99–112. doi: 10.1044/jshr.1701.99. [DOI] [PubMed] [Google Scholar]
- Erickson D. Effects of contrastive emphasis on jaw opening. Phonetica. 1998;55:147–169. doi: 10.1159/000028429. [DOI] [PubMed] [Google Scholar]
- Fairbanks G. A physiological correlative of vowel intensity. Speech Monograph. 1950;22:390–395. [Google Scholar]
- Fenson L, Dale P, Reznick S, Bates E, Thal D, Pethick S. Variability in early communicative development. Monographs of the Society for Research in Child Development. 1994;59 Serial No. 242. [PubMed] [Google Scholar]
- Fernald A, Kuhl P. Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development. 1987;10:279–293. [Google Scholar]
- Fernald A, Simon T. Expanded intonation contours in mothers’ speech to newborns. Developmental Psychology. 1984;20:104–113. [Google Scholar]
- Fernald A, Taeschner T, Dunn J, Papousek M, Boysson-Bardies B, Fukui I. A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language. 1989;16:477–501. doi: 10.1017/s0305000900010679. [DOI] [PubMed] [Google Scholar]
- Fromkin V. Lip positions in American English vowels. Language and Speech. 1964;7:215–225. [Google Scholar]
- Goffman L, Smith A, Heisler L, Ho M. The breadth of coarticulatory units in children and adults. Journal of Speech, Language, and Hearing Research. 2008;51:1424–1437. doi: 10.1044/1092-4388(2008/07-0020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gogate L, Bahrick LE. Intersensory redundancy and seven-month-old infants’ memory for arbitrary syllable–object relations. Infancy. 2001;2:219–231. doi: 10.1006/jecp.1998.2438. [DOI] [PubMed] [Google Scholar]
- Grant KW, Seitz P. The use of visible speech cues for improving auditory detection of spoken sentences. The Journal of the Acoustical Society of America. 2000;108:1197–1208. doi: 10.1121/1.1288668. [DOI] [PubMed] [Google Scholar]
- Grieser DL, Kuhl PK. Maternal speech to infants in a tonal language: Support for universal prosodic features in motherese. Developmental Psychology. 1988;24:14–20. [Google Scholar]
- Hall MD, Green J, Moore CA, Kuhl PK. Contribution of articulatory kinematics to visual perception of stop consonants. In: Kuhl PK, Crum L, editors. Proceedings of the 2nd Convention of the European Acoustics Association: Forum Acusticum and 137th Meeting of the Acoustical Society of America (#4aSCb15) Woodbury, NY: Acoustical Society of America; 1999. [Google Scholar]
- Harrington J, Fletcher J, Roberts C. Coarticulation and the accented/unaccented distinction. Journal of Phonetics. 1995;23:305–322. [Google Scholar]
- Jusczyk P, Hohne E, Mandel D. Picking up regularities in the sound structure of the native language. In: Strange W, editor. Speech perception and linguistic experience issues in cross-language research. Timonium, MD: York Press; 1995. pp. 91–119. [Google Scholar]
- Kaplan PS, Bachorowski JA, Smoski MJ, Hudenko WJ. Infants of depressed mothers, although competent learners, fail to learn in response to their own mother’s infant-directed speech. Psychological Science. 2002;13:268–271. doi: 10.1111/1467-9280.00449. [DOI] [PubMed] [Google Scholar]
- Katz GS, Cohn JF, Moore CA. A combination of vocal F0 dynamic and summary features discriminates between three pragmatic categories of infant directed speech. Child Development. 1996;67:205–217. doi: 10.1111/j.1467-8624.1996.tb01729.x. [DOI] [PubMed] [Google Scholar]
- Kemler Nelson D, Hirsh-Pasek K, Jusczyk PW, Wright Cassidy K. How prosodic cues in motherese might assist language learning. Journal of Child Language. 1989;16:53–68. doi: 10.1017/s030500090001343x. [DOI] [PubMed] [Google Scholar]
- Kent RD, Murray AD. Acoustic features of infant vocalic utterances at 3, 6, and 9 months. The Journal of the Acoustical Society of America. 1982;72:353–365. doi: 10.1121/1.388089. [DOI] [PubMed] [Google Scholar]
- Kitamura C, Burnham D. The infant’s response to maternal vocal affect. In: Rovee-Collier C, editor. Advances in infancy research. Norwood, NJ: Ablex; 1998. pp. 221–236. [Google Scholar]
- Klatt DH. Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. The Journal of the Acoustical Society of America. 1976;59:1208–1221. doi: 10.1121/1.380986. [DOI] [PubMed] [Google Scholar]
- Kuhl PK, Andruski JE, Chistovich IA, Chistovich LA, Kozhevnikova EV, Ryskina VL, Lacerda F. Cross-language analysis of phonetic units in language addressed to infants. Science. 1997 Aug 1;277:684–686. doi: 10.1126/science.277.5326.684. [DOI] [PubMed] [Google Scholar]
- Kuhl PK, Meltzoff AN. The bimodal perception of speech in infancy. Science. 1982 Dec 10;218:1138–1141. doi: 10.1126/science.7146899. [DOI] [PubMed] [Google Scholar]
- Lindblom B. Explaining phonetic variation: A sketch of the H&H theory. In: Hardcastle W, Marchal A, editors. Speech production and speech modeling. Dordrecht, The Netherlands: Kluwer Academic; 1990. pp. 403–409. [Google Scholar]
- Lindblom BEF, Sundberg JEF. Acoustical consequences of lip, tongue, jaw, and larynx movement. The Journal of the Acoustical Society of America. 1971;50:1166–1179. doi: 10.1121/1.1912750. [DOI] [PubMed] [Google Scholar]
- Liu HM, Kuhl PK, Tsao FM. An association between mothers’ speech clarity and infants’ discrimination skills. Developmental Science. 2003;6:F1–F10. [Google Scholar]
- Locke JL. The child’s path to spoken language. Cambridge, MA: Harvard University Press; 1993. [Google Scholar]
- MacLeod A, Summerfield A. Quantifying the contribution of vision to speech perception in noise. British Journal of Audiology. 1987;21:131–141. doi: 10.3109/03005368709077786. [DOI] [PubMed] [Google Scholar]
- McRoberts GW, Best CT. Accommodation in mean f0 during mother–infant and father–infant vocal interactions: A longitudinal case study. Journal of Child Language. 1997;24:719–736. doi: 10.1017/s030500099700322x. [DOI] [PubMed] [Google Scholar]
- Mefferd AS, Green JR. Articulatory-to-acoustic relations in response to speaking rate and loudness manipulations. Journal of Speech, Language, and Hearing Research. 2010;53:1206–1219. doi: 10.1044/1092-4388(2010/09-0083). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Milenkovic P. TF32 [Computer software] University of Wisconsin—Madison, Department of Electrical and Computer Engineering; 2004. [Google Scholar]
- Montgomery AA, Jackson PL. Physical characteristics of the lips underlying vowel lipreading performance. The Journal of the Acoustical Society of America. 1983;73:2134–2144. doi: 10.1121/1.389537. [DOI] [PubMed] [Google Scholar]
- Moon SJ, Lindblom B. Interaction between duration, context, and speaking style in English stressed vowels. The Journal of the Acoustical Society of America. 1994;96:40–55. [Google Scholar]
- Munhall K, Jones J, Callan D, Kuratate T, Vatikiotis-Bateson E. Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science. 2004;15:133–137. doi: 10.1111/j.0963-7214.2004.01502010.x. [DOI] [PubMed] [Google Scholar]
- Murray L, Trevarthen C. Emotional regulations of interactions between two-month-olds and their mothers. In: Field TM, Fox NA, editors. Social perception in infants. Norwood, NJ: Ablex; 1985. pp. 177–197. [Google Scholar]
- Newborg J. Battelle Developmental Inventory. 2. Itasca, IL: Riverside; 2005. [Google Scholar]
- Patterson RL, Werker JF. Two-month-old infants match phonetic information in lips and voice. Developmental Science. 2003;6:191–196. [Google Scholar]
- Payton K, Uchanski R, Braida L. Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. The Journal of the Acoustical Society of America. 1994;95:1581–1592. doi: 10.1121/1.408545. [DOI] [PubMed] [Google Scholar]
- Perkell JS, Matthies ML, Svirsky MA, Jordan MI. Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: A pilot “motor equivalence” study. The Journal of the Acoustical Society of America. 1993;93:2948–2961. doi: 10.1121/1.405814. [DOI] [PubMed] [Google Scholar]
- Picheny MA, Durlach NI, Braida LD. Speaking clearly for the hard of hearing: I. Intelligibility differences between clear and conversational speech. Journal of Speech and Hearing Research. 1985;28:96–103. doi: 10.1044/jshr.2801.96. [DOI] [PubMed] [Google Scholar]
- Robb MP, Chen Y, Gilbert HR. Developmental aspects of formant frequency and bandwidth in infants and toddlers. Folia Phoniatrica et Logopaedica. 1997;49:88–95. doi: 10.1159/000266442. [DOI] [PubMed] [Google Scholar]
- Rosenblum LD, Johnson JA, Saldaña HM. Visual kinematic information for embellishing speech in noise. Journal of Speech and Hearing Research. 1996;39:1159–1170. doi: 10.1044/jshr.3906.1159. [DOI] [PubMed] [Google Scholar]
- Rvachew S, Slawinski EB, Williams M, Green C. Formant frequencies of vowels produced by infants with and without early onset otitis media. Canadian Acoustics. 1996;24:19–28. [Google Scholar]
- Schulman R. Articulatory dynamics of loud and normal speech. The Journal of the Acoustical Society of America. 1989;85:295–312. doi: 10.1121/1.397737. [DOI] [PubMed] [Google Scholar]
- Slater A, Kirby R. Innate and learned perceptual abilities in the newborn infant. Experimental Brain Research. 1998;123:90–94. doi: 10.1007/s002210050548. [DOI] [PubMed] [Google Scholar]
- Smiljanic R, Bradlow AR. Production and perception of clear speech in Croatian and English. The Journal of the Acoustical Society of America. 2005;118:1677–1688. doi: 10.1121/1.2000788. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stern DN. Affect attunement. In: Call JD, Galenson E, Tyson RL, editors. Frontiers of infant psychiatry. Vol. 2. New York, NY: Basic Books; 1985. pp. 3–14. [Google Scholar]
- Stern DN, Spieker S, Barnett RK, MacKain K. The prosody of maternal speech: Infant age and context related changes. Journal of Child Language. 1983;10:1–15. doi: 10.1017/s0305000900005092. [DOI] [PubMed] [Google Scholar]
- Stevens KN. Acoustic phonetics. Cambridge, MA: MIT Press; 2000. [Google Scholar]
- Stevens KN, House AS. Development of a quantitative description of vowel articulation. The Journal of the Acoustical Society of America. 1955;27:484–493. [Google Scholar]
- Sumby WH, Pollack I. Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America. 1954;26:212–215. [Google Scholar]
- Summerfield Q. Some preliminaries to a comprehensive account of audio–visual speech perception. In: Campbell R, Dodd B, editors. Hearing by eye. Hillsdale, NJ: Erlbaum; 1987. pp. 3–51. [Google Scholar]
- Summerfield Q, MacLeod AM, McGrath M, Brooke NM. Lips, teeth, and the benefits of lipreading. In: Young AW, Ellis HD, editors. Handbook of research on face processing. Amsterdam, The Netherlands: North-Holland; 1989. pp. 223–233. [Google Scholar]
- Summers WV. Effects of stress and final-consonant voicing on vowel production: Articulatory and acoustic analyses. The Journal of the Acoustical Society of America. 1987;82:847–863. doi: 10.1121/1.395284. [DOI] [PubMed] [Google Scholar]
- Swanson LA, Leonard LB, Gandour J. Vowel duration in mothers’ speech to young children. Journal of Speech and Hearing Research. 1992;35:617–625. doi: 10.1044/jshr.3503.617. [DOI] [PubMed] [Google Scholar]
- Tasko SM, McClean MD. Variations in articulatory movement with changes in speech task. Journal of Speech, Language, and Hearing Research. 2004;47:85–100. doi: 10.1044/1092-4388(2004/008). [DOI] [PubMed] [Google Scholar]
- Thiessen ED, Hill EA, Saffran JR. Infant-directed speech facilitates word segmentation. Infancy. 2005;7:53–71. doi: 10.1207/s15327078in0701_5. [DOI] [PubMed] [Google Scholar]
- Toda S, Fogel A. Infant response to the still-face situation at 3 and 6 months. Developmental Psychology. 1993;29:532–538. [Google Scholar]
- Trainor LJ, Austin CM, Desjardins RN. Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science. 2000;11:188–195. doi: 10.1111/1467-9280.00240. [DOI] [PubMed] [Google Scholar]
- Turner GS, Tjaden K, Weismer G. The influence of speaking rate on vowel space and speech intelligibility for individuals with amyotrophic lateral sclerosis. Journal of Speech and Hearing Research. 1995;38:1001–1013. doi: 10.1044/jshr.3805.1001. [DOI] [PubMed] [Google Scholar]
- Uther M, Knoll MA, Burnham D. Do you speak E-NG-L-I-S-H? A comparison of foreigner- and infant-directed speech. Speech Communication. 2007;17:2–7. [Google Scholar]
- van Wassenhove V, Grant KW, Poeppel D. Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences USA. 2005;102:1181–1186. doi: 10.1073/pnas.0408949102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wohlert A, Hammen V. Lip muscle activity related to speech rate and loudness. Journal of Speech, Language, and Hearing Research. 2000;43:1229–1239. doi: 10.1044/jslhr.4305.1229. [DOI] [PubMed] [Google Scholar]