Author manuscript; available in PMC: 2010 Jan 1.
Published in final edited form as: J Exp Child Psychol. 2008 Oct 1;102(1):40–59. doi: 10.1016/j.jecp.2008.08.002

Developmental shifts in children’s sensitivity to visual speech: A new multimodal picture word task

Susan Jerger *,#,ˆ, Markus F Damian +, Melanie J Spence *,#, Nancy Tye-Murray ˆ,*, Herve Abdi *
PMCID: PMC2612128  NIHMSID: NIHMS83157  PMID: 18829049

Abstract

This research developed a Multimodal Picture Word Task for assessing the influence of visual speech on phonological processing by 100 children between 4 and 14 yrs of age. We assessed how manipulation of seemingly to-be-ignored auditory (A) and audiovisual (AV) phonological distractors affected picture naming without participants consciously trying to respond to the manipulation. Results varied in complex ways as a function of age and the type and modality of the distractors. Results for congruent AV distractors yielded an inverted U-shaped function with a significant influence of visual speech in 4-yr-olds and 10-14-yr-olds, but not in 5-9-yr-olds. In concert with dynamic systems theory, we proposed that the temporary loss of sensitivity to visual speech reflected a reorganization of relevant knowledge and processing sub-systems, particularly phonology. We speculated that this reorganization may be associated with 1) formal literacy instruction and 2) developmental changes in multimodal processing and auditory perceptual, linguistic, and cognitive skills.

Keywords: Picture-Word Task, Audiovisual Speech Perception, U-Shaped Developmental Function, Phonological Processing, Picture-Word Interference, Picture Naming, Multimodal Speech Processing, Dynamic Systems Theory

Speech communication by adults is naturally a multimodal event in which auditory and visual speech are integrated obligatorily. This basic property of mature speech perception is dramatically illustrated by McGurk effects (McGurk & MacDonald, 1976). In the McGurk task, individuals hear a syllable whose onset has one place of articulation while seeing a talker simultaneously mouthing a syllable whose onset has a different place of articulation, e.g., auditory /ba/ and visual /ga/. Adults typically experience the illusion of perceiving /da/ or /ða/, a blend of the auditory and visual inputs. The McGurk illusion is consistent with the idea that auditory and visual speech interact prior to the classification of phonetic features, such as place of articulation (Green, 1998). Integrating auditory and visual speech without conscious effort clearly has adaptive value. Seeing a talker’s face facilitates listening in noisy soundscapes and in clear environments containing unfamiliar or complex content (Arnold & Hill, 2001; MacLeod & Summerfield, 1987; Massaro, 1998).

Multimodal Speech Perception in Infants and Children

In contrast to performance in adults, multimodal speech perception in children is not well understood. The literature suggests that, at least in some respects, infants are more inclined to perceive speech multimodally than children. For example, infants demonstrate multimodal integration (Burnham & Dodd, 2004; Rosenblum, Schmuckler, & Johnson, 1997). When infants are habituated to a McGurk-like stimulus (auditory /ba/ and visual /ga/) and then presented with either an auditory /ba/, /da/, or /ða/, they respond as if the auditory /ba/ is unfamiliar and the integrated percepts of /da/ or /ða/ are familiar. Infants also detect equivalent phonetic information in auditory and visual speech (Kuhl & Meltzoff, 1982; Patterson & Werker, 1999, 2003). When infants hear a vowel while watching side-by-side images of two talkers, one mouthing the heard vowel and one mouthing a different vowel, they look significantly longer at the talker whose articulatory movements match the heard speech. Such findings suggest that the correspondences between auditory and visual speech are recognized without extensive perceptual-motor experience. These findings have recently been complicated by the observation that results in infants can be inconsistent, depending on testing and stimulus conditions (Desjardins & Werker, 2004). Nonetheless, the evidence as a whole continues to suggest that visual speech may play an important role in learning the phonological structure of spoken language (Dodd, 1979, 1987; Locke, 1993; Mills, 1987; Weikum et al., 2007).

In contrast to the infant and adult literatures, the child literature emphasizes that visual speech has less influence on speech perception by children. In their initial research, McGurk and MacDonald (1976) noted that significantly fewer children than adults show an influence of visual speech on perception. In response to one type of McGurk stimulus (auditory /ba/ - visual /ga/), the percentage of individuals who reported hearing /ba/ (auditory capture) was 40-60% for children but only 10% for adults. This pattern of results has been replicated and extended to other tasks (Desjardins, Rogers, & Werker, 1997; Dupont, Aubin, & Menard, 2005; Hockley & Polka, 1994; Massaro, 1984; Massaro, Thompson, Barron, & Laren, 1986; Sekiyama & Burnham, 2004; Wightman, Kistler, & Brungart, 2006). Overall, the results are consistent with the idea that performance is dominated by auditory input in children and visual input in adults, agreeing with the general observation of a bias toward the auditory modality in young children (Sloutsky & Napolitano, 2003).

Children’s visual speech perception improves with increasing age, but the time course of developmental change is not well understood. A few studies have observed benefit from visual speech by the pre-teen/teenage years (Conrad, 1977; Dodd, 1977, 1980; Hockley & Polka, 1994), with one report citing an earlier age of 8 years (Sekiyama & Burnham, 2004). Developmental improvement has been attributed to experience in producing speech, changes in the emphasis and perceptual weight given to visual speech cues, and age-related advances in speechreading and/or linguistic skills, perhaps consequent on educational training (Desjardins et al., 1997; Green, 1998; Massaro et al., 1986; Sekiyama & Burnham, 2004).

Age-Related vs Task Demand Effects

The nature and extent of audiovisual speech perception appears to differ in children versus infants and adults. Some investigators have cautioned, however, that the observed performance differences, particularly between infants and children, may not reflect age-related change in multimodal speech processing. Instead, the differences may be experimentally induced effects of varying procedures, stimuli, and task demands (Bjorklund, 2005; Desjardins et al., 1997; Fernald, Swingley, & Pinto, 2001; Green, 1998). One point is that infants’ knowledge has been assessed indirectly via procedures such as looking time, whereas children’s knowledge has been assessed directly via a variety of off-line tasks requiring voluntary, conscious retrieval of knowledge and formulation of responses during a post-stimulus interval. The concepts of indirect and direct tasks are demarcated herein on the basis of task instructions, as recommended by Merikle and Reingold (1991). Indirect measures do not direct participants’ attention to the experimental manipulation of interest, whereas direct measures unambiguously instruct participants to respond to the experimental manipulation. The extent to which age-related differences in multimodal speech processing reflect developmental change versus varying task demands remains an important unresolved issue.

The purpose of this research was to assess the influence of visual speech on phonological processing by children with an indirect approach, namely the Multimodal Picture Word Task. The task was adapted from the Children’s Cross-Modal Picture-Word Test of Jerger, Martin, & Damian (2002) and is appropriate for a broad range of ages. Our experimental tasks qualify as indirect measures because we assess how manipulation of seemingly to-be-ignored distractors affects performance, without the participants being informed of, or consciously trying to respond to, the manipulation. The value of an indirect approach for studying visual speech has been demonstrated previously by research showing an indirect effect of visual speech on performance in adults who had difficulty directly identifying the visual speech stimuli (Jordan & Bevan, 1997). Facial expressions also seem to indirectly influence judgments of vocal speech expressions (happy-fearful) in individuals with severe impairments in directly processing facial expressions (de Gelder, Pourtois, Vroomen, & Bachoud-Levi, 2000). These results provide specific evidence that performance on direct and indirect tasks may differ. We propose that more precisely detailed visual speech representations are required for direct tasks requiring conscious access and retrieval of information relative to indirect tasks. Below we briefly describe the original cross-modal task and our new adaptation.

Children’s Multimodal Picture-Word Task

In the Cross-Modal Picture-Word Task (Jerger et al., 2002b), children are asked to name a picture while attempting to ignore a nominally irrelevant auditory distractor. The relationship between the picture-distractor pairs is varied systematically to be congruent, conflicting, or neutral. The dependent measure is the speed of picture naming, and the goal is to determine whether congruent or conflicting relationships speed up or slow down naming, respectively, relative to neutral, or baseline, relationships. Relative to the pictures, the entire set of distractors represents phonologically onset-related, semantically related, and unrelated items. More specifically, the phonologically related distractors consist of onsets that are congruent, conflicting in place of articulation, or conflicting in voicing (e.g., the picture “pizza” coupled with “peach,” “teacher,” and “beast,” respectively). The semantic distractors consist of categorically related and unrelated pairs (e.g., the picture “pizza” coupled with “hotdog” and “horse,” respectively); the unrelated distractors are composed of vowel-nucleus onsets (e.g., the picture “pizza” coupled with “eagle”).

The onset of the distractor is varied to occur before or after the onset of the picture; this temporal offset is referred to as the stimulus onset asynchrony (SOA). Whether the distractor influences picture naming depends upon the SOA and the type of distractor. With regard to SOA, the effect of an onset-related phonological distractor is typically greater when the distractor lags the onset of the picture (Damian & Martin, 1999; Schriefers, Meyer, & Levelt, 1990). With regard to the type of distractor, phonologically related distractors speed naming when the onsets are congruent but slow naming when the onsets conflict in place or voicing, relative to unrelated distractors. When congruent or conflicting auditory distractors speed up or slow down naming, performance is assumed to reflect crosstalk between speech production and perception (Levelt et al., 1991).

Figure 1 illustrates this crosstalk for a lagging distractor in terms of the stages of processing characterizing production (top line) and perception (bottom line). In the figure, a speaker is naming the picture, “pizza,” while hearing the phonologically congruent distractor “peach.” The stages of processing for producing and perceiving speech proceed in opposite directions. Whether discrete, interactive, or cascaded, most models of picture naming assume the following stages: 1) conceptual processing and activation of a set of meaning-related lexical items; 2) output phonological processing of the selected item; and 3) articulatory motor programming and output. In terms of perceiving speech, processing of the distractor is assumed to consist of the following stages: 1) input auditory/phonetic processing; 2) input phonological processing with activation of a set of phonologically related items; and 3) lexical-semantic and conceptual processing of the selected item. The output and input phonological processes are typically assumed to be separable interacting systems (Martin, Lesch, & Bartha, 1999).

Figure 1. Simplified stages of processing for speech production (top line) and perception (bottom line) for a speaker naming the picture, “pizza,” while hearing the phonologically congruent distractor, “peach.”

The crosstalk between speech production and perception is assumed to occur when the picture naming process is occupied with output phonology and the distractor perceptual process is occupied with input phonology. Congruent distractors are assumed to speed picture naming by activating input phonological representations whose activation spreads to output phonological representations, thus allowing speech segments to be selected more rapidly during the naming process. Conflicting distractors are assumed to slow naming by activating conflicting output phonological representations that compete with the picture’s output phonology for control of the response. A novel contribution of the current research was the presentation of distractors both auditorily and audiovisually.

Our new picture-word task should provide an estimate of multimodal speech processing that is less sensitive to developmental differences in task demands such as the conscious access and retrieval of information required by direct procedures (Bertelson & de Gelder, 2004). That said, performance on both indirect and direct multimodal speech tasks remains susceptible to developmental changes in a variety of cognitive-linguistic skills. Below we detail our primary and some secondary research questions and predict how developmental changes in relevant cognitive-linguistic factors may impact selected components of our task.

Research Questions and Predicted Results

Our primary research question concerned whether and how visual speech may enhance phonological processing by children over the age range of 4 to 14 yrs relative to auditory speech alone. In agreement with Campbell (1988), we view visual speech as an extra phonetic resource, perhaps adding another type of phonetic feature, that should enhance facilitatory and interference effects relative to auditory speech only. Possible age-related influences on children’s sensitivity to visual speech may be predicted in terms of interactive developmental changes in 1) input/output coding processes, 2) phonological representational knowledge, and 3) general information processing.

Input/Output Coding Processes

Evidence indicates that younger children with less mature perceptual skills process auditory speech cues less efficiently. Relative to adults, they require a greater amount of, and a higher fidelity of, input for auditory word recognition (Cameron & Dillon, 2007; Elliott, Hammer, & Evan, 1987). These results suggest that younger children may need to rely on visual speech to supplement their less efficient processing of auditory speech cues. Visual speech might enhance phonological effects on performance by providing additional phonetic information along with speech envelope information that aids extraction of the auditory cues. Facial expressions might also supplement less efficient auditory speech processing by providing more easily interpreted nonverbal information that promotes understanding of the intent of what was heard (Doherty-Sneddon & Kent, 1996).

The above evidence about input coding processes leads to the prediction that visual speech will enhance phonological processing by younger children. Some proposals about output coding processes bolster this prediction, suggesting that younger children with less mature articulatory proficiency observe visual speech disproportionately in order to cement their knowledge of the relation between articulatory gestures and their acoustic consequences (Dodd, 1987; Gordon, 1990). With regard to predicting performance across a broad age range, the evidence suggests there may be developmental shifts in the processing weights assigned to the auditory and visual speech modalities. This in turn may cause apparent developmental shifts in children’s sensitivity to visual speech (Brainerd, 2004). The time course of developmental effects is difficult to predict from the literature.

Phonological Representational Knowledge

A broad literature suggests that younger children have less detailed phonological representations and less efficient mapping of acoustic information onto those representations (see Snowling & Hulme, 1994). This evidence predicts that the additional phonetic information provided by visual speech will enhance phonological effects on performance in younger children. With regard to performance across a broader age range, some developmental changes in phonological representational knowledge seem to occur at about 5 or 6 years of age. First, data suggest that phonological processes become sufficiently proficient at about this age to begin supporting the use of “inner” speech for learning, remembering, and problem solving (Conrad, 1971). Second, and perhaps more importantly, the initiation of literacy instruction at about this age triggers dramatic changes in phonological skills (Bentin, Hammer, & Cahan, 1991; Morrison, Smith, & Dow-Ehrensberger, 1995; de Gelder & Morais, 1995; Morais, Bertelson, Cary, & Alegria, 1986). Some authorities propose that as children’s experience of phonemes shifts from coarticulated, nondistinct speech elements to separable, distinct written elements, phonological knowledge and awareness of phonemes become more highly detailed and specified (see Anthony & Francis, 2005, and Bryant, 1995, for discussion). The timeframe required for systematizing the knowledge gained during literacy learning for a language such as English, with its complicated print-sound mappings, is estimated at about 3 years (Anthony & Francis, 2005).

Overall, phonological knowledge appears to reorganize into a more elaborated, systematized, and robust resource that supports a wider range of activities, such as reading and using inner speech to think and reason, from roughly 6 to 9 years of age. To the extent that the phonological knowledge supporting visual speech processing is not as readily accessed and/or retrieved during this process of restructuring, we may observe a developmental shift in children’s sensitivity to visual speech during this time period. An intimate link between visual speech skills and the phonological knowledge gained by becoming literate is supported by findings that older individuals with reading disorders exhibit significantly less influence of visual speech on performance and unusually poor visual speechreading skills (de Gelder & Vroomen, 1998a; Ramirez & Mann, 2005).

Information Processing

Information processing skills have been addressed in terms of general attentional resources, multimodal stimuli, and face processing. First, with regard to general resources, to the extent that phonological representational knowledge undergoes restructuring as discussed above, this reorganization may demand a disproportionate share of a child’s limited processing capacity. To the extent that overloading the available information processing resources creates an obstacle to processing visual speech, we may see less influence of visual speech on performance in the age range from 6 to 9 years.

A number of general processing mechanisms may also be enhanced by external cues, and younger children with immature processing skills may benefit disproportionately from such cues. For example, some experts propose that visual speech acts as a type of “alerting” or “motivational” mechanism (Arnold & Hill, 2001; Campbell, 2006). This viewpoint suggests that visual speech may boost attention, orienting, arousal, and/or motor preparedness, which would aid detection, discrimination, and rapid information processing (Wickens, 1974). This evidence predicts that younger children may benefit from visual speech due to processing-enhancing mechanisms that boost less mature skills. Available evidence does not allow prediction of the time course of the developmental effects.

With regard to multimodal stimuli, theorists propose that development consists of transitioning from processing multimodal inputs more holistically to true multimodal integration of differentiated sensory modalities (see Lickliter & Bahrick, 2004, for review). This predicts that we may observe developmental shifts in children’s sensitivity to visual speech because of transitions in the processing of auditory and visual speech inputs from a supramodal to a modality-specific manner. The time course of developmental effects is difficult to predict.

Finally, with regard to face processing, evidence suggests that the talker’s face is encoded during speechreading (Campbell & de Haan, 1998). An association between speechreading and face processing is supported by the observation that patients with severe face processing deficits due to prosopagnosia may show a loss of visual speechreading ability (de Gelder & Vroomen, 1998b). Children have some difficulties in processing faces and the full range of facial expressions up to about the preteen-teenage years (Campbell, Walker, & Baron-Cohen, 1995; Carey, Diamond, & Woods, 1980; Durand, Gallay, Seigneuric, Robichon, & Baudouin, 2007; Mondloch, Geldart, Maurer, & LeGrand, 2003). Face-to-face communication may also hinder, rather than help, performance on some types of tasks in children (Doherty-Sneddon et al., 2000; Doherty-Sneddon, Bonner, & Bruce, 2001). This latter finding may be related to the more general phenomenon of gaze aversion, in which individuals reduce environmental stimulation in an attempt to reduce cognitive load and enhance processing (Glenberg, Schroeder, & Roberson, 1998). This predicts that we may observe a developmental shift in the influence of visual speech on phonological processing due to a transition in the processing of the facial context of visual speech around the preteen-teenage years. In short, multiple complex, interactive factors may produce developmental shifts in children’s sensitivity to visual speech.

In addition to our primary research question, secondary questions addressed whether phonologically related distractors consistently speed phonological processing when they are congruent and slow processing when they are conflicting, relative to the baseline distractors, and whether the magnitude of phonological effects on performance declines systematically with age. The scant evidence from cross-modal picture-word tasks in children indicates that phonologically related auditory distractors consistently facilitate picture naming when they are congruent and disrupt naming when they are conflicting relative to a baseline condition (Brooks & MacWhinney, 2000; Jerger et al., 2002b; Jerger, Lai, & Marchman, 2002a). Effects on performance are more pronounced in younger than in older children. We expect the experimental manipulations of our multimodal approach to produce comparable effects on phonological processing, thus allowing us to address our primary question in a sensitive manner. Results will contribute new evidence about how phonological processing is influenced by visual speech over a broad range of ages on the same task and whether results on an indirect task mirror results across studies in the literature on direct tasks.

Method

Participants

Participants were 100 children, 50 girls and 50 boys, ranging in age from 4 yr 3 mos to 14 yr 0 mos. The racial distribution was 85% White, 5% Asian, 3% Black, 3% Indian, and 4% Multiracial, with 12% of Hispanic ethnicity. The children were divided into five groups of 20 each according to age, namely 4-yr-olds, 5-yr-olds, 6-7-yr-olds, 8-9-yr-olds, and 10-14-yr-olds. The rationale for grouping by variable age intervals was that speech development in children is a nonlinear process in which developmental growth is more active and changing in earlier years than in later years (American Speech-Language-Hearing Association, 2008). Thus, as age progresses, one can group children by larger age intervals while maintaining reasonably homogeneous speech skills. The criteria for participation were a) no diagnosed or suspected disabilities and b) English as the native language. All children passed standardized or laboratory measures establishing the normalcy of hearing sensitivity, visual acuity (including corrected to normal), visual perception, spoken word recognition, vocabulary skills, articulatory proficiency, phoneme discrimination, and oral-motor function. The average Hollingshead (1975) social strata score was 1.5, which is consistent with major business and professional socioeconomic status.

With regard to pronunciation of the names of the pictures, all participants pronounced the onsets accurately. The offsets of the picture names were also pronounced correctly except by 19 children, of whom 53% were 4-yr-olds, 26% were 5-yr-olds, 16% were 6-7-yr-olds, and 5% were 8-9-yr-olds. These children mispronounced either the /th/ in teeth, the /mp/ in pumpkin, the /r/ in deer, or the /z/ in pizza during speeded naming. With regard to identification of the auditory distractors, all children showed near-ceiling performance on an auditory only task. With regard to visual speechreading skills, scores on the Children’s Audiovisual Enhancement Test (Tye-Murray & Geers, 2001) improved noticeably with age. Visual-only performance scored in terms of words averaged about 4% in the 4- to 5-yr-olds, 15% in the 6- to 9-yr-olds, and 23% in the 10- to 14-yr-olds. Visual-only performance for word onsets scored in terms of visemes, or the smallest distinguishable units of speech defined by lip movements (Fisher, 1968), averaged about 35% in the 4- to 5-yr-olds, 58% in the 6- to 9-yr-olds, and 73% in the 10- to 14-yr-olds.

Materials and Instrumentation for Picture-Word Task

Stimulus Preparation

All stimuli were recorded by an 11-year-old boy. He wore a solid navy shirt and lipgloss and looked directly into the camera. The rationale for a child talker was to increase attention and interest for child participants. Our informal experience with children and formal evidence in infants (Bahrick, Netto, & Hernandez-Reif, 1998) suggest a strong preference for child over adult faces. The recording setting was the Audiovisual Stimulus Preparation Laboratory of the University of Texas at Dallas with recording equipment, sound-proofing, and supplemental lighting and reflectors. The talker started and ended each utterance with a neutral face/closed mouth position. The full facial image and upper chest of the talker were recorded. Full facial image stimuli yield more accurate speechreading performance (Greenberg & Bode, 1968), supporting the idea that facial movements other than the mouth area may contribute to speechreading (Munhall & Vatikiotis-Bateson, 1998).

The audiovisual recordings were digitized via a Macintosh G4 computer with Apple FireWire, Final Cut Pro, and QuickTime software. Color video was digitized at 30 frames/sec with 24-bit resolution at 720 × 480 pixel size. Auditory input was digitized at a 22 kHz sampling rate with 16-bit amplitude resolution. The pool of utterances was edited to an average RMS level of −14 dB. The average fundamental frequency was 202 Hz.
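For concreteness, the sketch below (Python) illustrates RMS level normalization of the kind described above. It assumes dB relative to full scale and waveforms already scaled to [−1, 1], details the text does not specify; it is an illustration, not the authors’ editing software.

```python
import numpy as np

TARGET_DB = -14.0  # average RMS level reported for the edited utterances

def rms_db(x: np.ndarray) -> float:
    """RMS level of a [-1, 1] waveform, in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)))

def normalize_rms(x: np.ndarray, target_db: float = TARGET_DB) -> np.ndarray:
    """Scale the waveform so its RMS level equals target_db."""
    gain = 10.0 ** ((target_db - rms_db(x)) / 20.0)
    return np.clip(x * gain, -1.0, 1.0)  # clip guards against overflow
```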

Stimulus Onset Asynchrony (SOA)

The colored pictures were scanned into a computer and edited to achieve objects of a similar size and complexity on a white background. The size of each picture was edited to be the width of the face at eye level. Each picture was pasted onto the upper chest of the talker in exactly the same time frame for both auditory and audiovisual items. The pictures were pasted twice to form SOAs of −165 ms and +165 ms (the onset of the distractor was 165 ms, or 5 frames, before or after the onset of the picture, respectively). To be consistent with current practice, we defined a distractor’s onset on the basis of its auditory onset. Technically, a picture can be pasted onto an audiovisual stimulus only at the beginning of a frame (every 33 ms). To illustrate our pasting strategy, we will use an imaginary SOA of 0 ms (simultaneous picture-distractor onsets). The goal was that the onset of a picture should be in the frame nearest the auditory onset. Thus, if the auditory onset fell in the first half of a frame, we pasted the picture at the beginning of that frame; if the auditory onset fell in the last half of a frame, we pasted the picture at the beginning of the following frame. This strategy yielded an average SOA with a maximum variability of about 16 ms, as sketched below.
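The following sketch (Python) captures our reading of this frame-rounding rule; the function names are illustrative. The closing assertion simply checks that the rounding error never exceeds half a frame, consistent with the 16 ms figure above.

```python
# Pictures can be pasted only at frame boundaries of the 30 frames/sec
# video (one frame ~ 33.3 ms), so the picture is placed at the boundary
# nearest the desired onset.

FRAME_MS = 1000.0 / 30.0  # duration of one video frame at 30 frames/sec

def paste_frame_index(target_onset_ms: float) -> int:
    """Frame boundary nearest the target onset: onsets in the first half of
    a frame round down to that frame; onsets in the last half round up."""
    return round(target_onset_ms / FRAME_MS)

def realized_error_ms(target_onset_ms: float) -> float:
    """Signed difference between the realized and intended picture onsets."""
    return paste_frame_index(target_onset_ms) * FRAME_MS - target_onset_ms

# The rounding error is bounded by half a frame (~16.7 ms), matching the
# "maximum variability of about 16 ms" reported above.
assert all(abs(realized_error_ms(t)) <= FRAME_MS / 2 + 1e-9
           for t in range(0, 1000))
```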

In the literature, leading and lagging SOAs are reported both in combination and in isolation. With regard to using SOAs in combination, researchers have chosen this approach when they were interested in tracking the time course of phonological or semantic activation. These results have yielded interesting differences between the different types of distractors. More specifically, findings have shown that a lagging SOA of roughly 100-200 ms tends to maximize any phonological effects on performance due to interactions between input and output phonology (Damian & Martin, 1999; Schriefers et al., 1990; see Figure 1). When phonological distractors are presented at a leading SOA of about 100-200 ms, on the other hand, phonological effects on performance are typically very small. In this latter case, less interaction is attributed to less temporal overlap between the two types of phonology, with activation of the input phonological representations decaying prior to the output phonological encoding of the picture. In contrast to these findings, results for semantic distractors have yielded the opposite pattern of interaction. Semantic effects on performance are typically negligible at lagging SOAs and prominent at leading SOAs. With regard to a focus on only one SOA, researchers have chosen this approach when they wished to investigate differing effects on performance produced by differing types of phonological or semantic distractors. In this case, they typically focus on the SOA maximizing any effect, i.e., lagging for phonological distractors and leading for semantic distractors. Research questions about the time course of activation versus the effects of differing types of distractors are usually reported in separate papers although all data may be gathered simultaneously, particularly in children, due to the difficulties and expense of recruiting and testing the participants. It is also the case that an inconsistent relationship between the picture-distractor pairs is viewed as boosting listeners’ attempts to disregard the distractors. In this paper aimed at explicating the developmental course of children’s sensitivity to visual speech, we focused only on the phonological distractors at the lagging SOA.

Pictures and Distractors

Development of the specific test items and conditions comprising the Children’s Cross-Modal Picture-Word Test has been detailed previously (Jerger et al., 2002b). The pictured objects of this study are the same pictures as used previously; the distractors, however, differ. Table 1A (Appendix) details the individual picture and distractor-word items. Table 2A summarizes linguistic statistics for the phonology pictures and distractors. In brief, the test materials are of high familiarity, high concreteness, high imagery, high phonotactic probability, low word frequency, and early age of acquisition (Carroll & White, 1973; Coltheart, 1981; Cortese & Fugett, 2004; Dale & Fenson, 1996; Gilhooly & Logie, 1980; Morrison, Chappell, & Ellis, 1997; Nusbaum, Pisoni, & Davis, 1984; Snodgrass & Vanderwart, 1980; Vitevitch & Luce, 2004). The onsets of the pictures always began with /b/, /p/, /t/, or /d/, coupled with the vowels /i/ or /ʌ/. Previous research has established that speechreading performance for these onsets is equivalent for /i/ and /ʌ/ vowel contexts (Owens & Blazek, 1985). Our rationales for selecting the onsets were twofold. First, the onsets represent developmentally early phonetic achievements and reduced articulatory demands (Dodd, Holm, Hua, & Crosbie, 2003; Smit, Hand, Freilinger, Bernthal, & Bird, 1990). To the extent that phonological development is a dynamic process, with knowledge improving from 1) unstable, minimally specified, and harder-to-access/retrieve representations to 2) stable, robustly detailed, and easier-to-access/retrieve representations, it seems important for an initial study to assess early-acquired phonemes that children are more likely to have mastered (see McGregor, Friedman, Reilly, & Newman, 2002, for similar reasoning about semantic knowledge).

Table 1A.

Pictures and auditory/auditory-visual distractors of Children’s Multimodal Picture-Word Test.

Phonology Items

Pictured objects (n = 8): Bees, Bus, Deer, Duck, Pizza, Pumpkin, Teeth, Tongue

Distractors (n = 18): /b/ onsets — Beach, Beast, Buckle, Butterfly; /d/ onsets — Demon, Detective, Dumbo, Dumptruck; /p/ onsets — Peacock, Peach, Potato, Puddle; /t/ onsets — Teacher, Tee-shirt, Tomato, Tugboat; baseline (vowel) onsets — Eagle, Onion

Semantic Filler Items

Pictured objects: Boot, Dog, Doll, Pants, Dress, Pickle, Pizza, Tiger, Cheese, Lemon

Distractors: Bear, Bed, Cat, Flag, Glove, Horse, Hotdog, Puppet, Shirt, Slipper, Worm

Note: Phonology pictures were administered in the presence of three types of distractors (congruent, one feature conflicting in place, and one feature conflicting in voicing onsets: e.g., “Bus”-“Buckle”; “Bus”-“Dumptruck”; “Bus”-“Potato”), plus the baseline distractor “Onion” for /ʌ/ vowel-nucleus pictures or “Eagle” for /i/ vowel-nucleus pictures. Filler-item pictures were presented in the presence of two types of distractors (semantically related and unrelated: e.g., “Boot”-“Slipper”; “Boot”-“Flag”).

Table 2A.

Linguistic statistics for picture and distractor word items of the phonology condition for the Children’s Multimodal Picture-Word Test. Cell entries are means for the pictures (n = 8) and distractors (n = 18); the number of items contributing to each mean appears in parentheses.

Concreteness (very concrete = 7 or 700)
 Overall average (7 pt or adjusted 7 pt): Pictures 6.27 (5); Distractors 5.80 (12)
 Gilhooly & Logie (1980), end pt = 7: Pictures 6.69 (2); Distractors 6.30 (6)
 Coltheart (1981), end pt = 700: Pictures 617.20 (5); Distractors 569.08 (12)

Imagery (high imageability = 7 or 700)
 Overall average (7 pt or adjusted 7 pt): Pictures 6.43 (7); Distractors 5.93 (14)
 Morrison et al. (1997), end pt = 7: Pictures 6.38 (5); Distractors 6.21 (7)
 Coltheart (1981), end pt = 700: Pictures 622.20 (5); Distractors 586.85 (13)
 Cortese & Fugett (2004), end pt = 7: Pictures 6.68 (6); Distractors 6.33 (3)

Word familiarity (very familiar = 5, 7, or 700)
 Overall average (7 pt or adjusted 7 pt): Pictures 5.41 (8); Distractors 5.53 (15)
 Morrison et al. (1997), end pt = 5: Pictures 2.57 (5); Distractors 3.03 (7)
 Snodgrass & Vanderwart (1980), end pt = 5: Pictures 3.05 (5); Distractors 2.98 (7)
 Coltheart (1981), end pt = 700: Pictures 543.20 (5); Distractors 517.62 (13)
 Nusbaum et al. (1984), end pt = 7: Pictures 6.93 (6); Distractors 6.97 (13)

Age of acquisition (13+ yrs = 7 or 9)
 Overall average (7 pt or adjusted 7 pt): Pictures 2.41 (6); Distractors 2.85 (10)
 Morrison et al. (1997), end pt = 7: Pictures 2.22 (5); Distractors 2.69 (7)
 Carroll & White (1973), end pt = 9: Pictures 3.19 (4); Distractors 3.40 (4)
 Gilhooly & Logie (1980), end pt = 7: Pictures 2.23 (2); Distractors 2.92 (6)

Word frequency
 Toddler data: Dale & Fenson (1996), proportion of children understanding/producing words at 30 mos: Pictures 88.79 (7); Distractors 78.93 (4)
 Adult data: Kucera & Francis,* printed occurrences per million: Pictures 29.57 (7); Distractors 18.64 (14)

Word recognition
 Jerger et al. (2007), percent of children recognizing words from 6 alternative picture choices:
  Preschool: Pictures —**; Distractors 91.90 (18)
  Elementary: Pictures —**; Distractors 98.89 (18)

Phonotactic probability, Vitevitch and Luce (2004)
 Positional segment frequency, sum: Pictures .1731 (8); Distractors .2159 (18)
 Positional segment frequency, onset: Pictures .0580 (8); Distractors .0524 (18)
 Position-specific biphone frequency, sum: Pictures .0065 (8); Distractors .0115 (18)
 Position-specific biphone frequency, onset: Pictures .0023 (8); Distractors .0025 (18)

Note. Each overall average was obtained by averaging data across resources for each item and then averaging across mean item values for each subset. Numbers of items contributing to averages across resources are presented in parentheses. No overall average could be determined for word frequency. pt = point; yrs = years.

* As cited in Coltheart (1981).

** Picture readability in young children was established previously (Jerger et al., 2002b).

Second, the onsets represent variations in place of articulation (/b/-/d/ versus /p/-/t/) and voicing (/b/- /p/ versus /d/- /t/), two phonetic features that are traditionally thought to be differentially dependent on auditory vs visual speech. Previous findings, based on lip or lower-face visual images, indicate that place of articulation is easier to discriminate visually whereas voicing is easier to discriminate auditorily (Miller & Nicely, 1955; Owens & Blazek, 1985). Each picture was administered in the presence of the four types of distractors described previously, namely congruent, one-feature conflicting in place of articulation, one-feature conflicting in voicing, and vowel-onset baseline distractors.

Experimental Instrumentation

To administer picture-word items, the video track of the QuickTime movie file was routed to a high-resolution computer monitor and the auditory track was routed through a speech audiometer to a loudspeaker. For audiovisual trials, each trial contained 1000 ms of the talker’s still neutral face and upper chest, followed by presentation of one colored picture on the chest and an audiovisual utterance of one distractor word, followed by 1000 ms of still neutral face and the colored picture. For auditory only trials, each trial contained 1000 ms of still neutral face and upper chest; followed by a continuation of the still neutral face, presentation of one colored picture on the chest, and an auditory only utterance of one distractor word; followed by 1000 ms continuation of still face and the colored picture. Each picture was pasted in exactly the same time frame for both auditory and audiovisual items. Thus, the only difference between the auditory and audiovisual conditions was that the auditory items had a neutral face whereas the audiovisual items had a dynamic face.

The computer monitor and the loudspeaker were mounted on an adjustable height table directly in front of the child at a distance of approximately 90 cm. To name each picture, children spoke into a unidirectional microphone mounted on an adjustable stand. To obtain naming latency, the computer triggered a counter/timer with better than 1 ms resolution at the initiation of a movie file. The timer was stopped by the onset of the child’s vocal response into the microphone, which was fed through a stereo mixing console amplifier and 1 dB step attenuator to a voice-operated relay (VOR). A pulse from the VOR stopped the timing board via a data module board. We verified that the VOR was not triggered by the auditory distractors. The counter timer values were corrected for the amount of silence in each movie file before the onset of the picture. Naming times were digitally recorded for offline analysis in all children with flawed pronunciations.
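As a concrete illustration of this timing correction, a minimal sketch follows (Python); the names and example values are hypothetical, not taken from the laboratory software.

```python
# Illustrative sketch: the counter/timer starts when the movie file
# launches, so the lead-in before the picture appears must be subtracted
# from the raw timer value to express latency relative to picture onset.

def naming_latency_ms(raw_timer_ms: float, picture_onset_in_movie_ms: float) -> float:
    """Naming latency measured from picture onset, not movie onset."""
    return raw_timer_ms - picture_onset_in_movie_ms

# Hypothetical example: the voice-operated relay stops the timer 3200 ms
# after movie onset and the picture appeared 1000 ms into the movie, so
# the corrected naming latency is 2200 ms.
assert naming_latency_ms(3200.0, 1000.0) == 2200.0
```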

Procedure

Participants were tested in two separate sessions, approximately 12 days apart, one for auditory testing and one for audiovisual testing. The modality of the first and second sessions was counterbalanced across participants. The first session always began with a practice task. A tester showed each picture on a 5 × 5 inch card, asking children to name the picture and teaching them the target names of any pictures named incorrectly. Next the tester flashed some picture cards quickly and modeled speeded naming. The child was asked to copy the tester for another few pictures. Speeded naming practice trials alternated between tester and child until the child was naming pictures fluently, particularly without saying “a” before names. The second session always began with a mini-practice task.

The experimental trials consisted of two practice items followed by presentation of all of the pictures with each type of speech distractor in a random order within one unblocked condition (see Starreveld, 2000, for discussion). No individual picture or word distractor was allowed to reoccur without at least two intervening trials. The children sat at a child-sized table in a double-walled sound-treated booth. The tester sat at a computer workstation and a co-tester sat alongside the children, keeping them on task. Each trial was initiated by the tester’s pushing the space bar (out of the participant’s sight). Children were instructed to name each picture and disregard the speech distractor. They were told that “Andy” (pseudonym) was wearing a picture on his chest, and he wanted to know what it was. They were to say the name as quickly as possible while still saying it correctly. The microphone was placed approximately 12 inches from the child’s mouth without blocking his or her view of the monitor. If necessary, the child’s speaking level, the position of the microphone or child, and/or the setting on the 1 dB step attenuator between the microphone and VOR were adjusted to ensure that the VOR was triggering reliably. The intensity level of the distractors was approximately 70 dB SPL, as measured at the imagined center of the participant’s head with a sound level meter.

Measures

The dependent measures were picture-naming times in the presence of both the auditory and audiovisual distractors. The picture-distractor pairs represented congruent, conflicting in place of articulation, conflicting in voicing, and neutral (i.e., baseline) relationships. With regard to the characteristics of these data, 5.52% of all trials were excluded or missing for the following reasons. Naming responses that were more than 3 standard deviations from an item’s conditional mean were discarded. This procedure excluded 1.68% of trials. Naming responses that were flawed, on the other hand, were deleted on-line and re-administered after intervening items. The percentage of overall trials judged to be flawed (e.g., lapses of attention, squirming out of position, triggering the microphone in a flawed manner) was 17.45%, ranging from 24.48% in the younger children to 7.35% in the older children. The percentage of missing trials remaining at the end because the re-administered trial was also flawed was 6.35% in the younger children and less than 1% in the older children, averaging 3.84% of overall trials.
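A minimal sketch of the 3-standard-deviation exclusion rule is given below (Python with pandas); the column names are our assumptions about a trial-level data layout, not the authors’ actual variables.

```python
import pandas as pd

def exclude_outliers(trials: pd.DataFrame, sd_cutoff: float = 3.0) -> pd.DataFrame:
    """Drop naming times more than sd_cutoff SDs from their item's
    conditional mean (here taken as the picture x condition x modality
    cell; "picture", "condition", "modality", "rt_ms" are assumed names)."""
    cell = trials.groupby(["picture", "condition", "modality"])["rt_ms"]
    z = (trials["rt_ms"] - cell.transform("mean")) / cell.transform("std")
    return trials[z.abs() <= sd_cutoff]
```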

Results

Analysis Plan

Naming times were analyzed with factorial mixed-design analyses of variance, regression analyses, and t-tests (Abdi, Edelman, Valentin, & Dowling, 2009). The overall set of variables comprised a between-subjects factor (5 age groups) and within-subjects factors representing the modality of the distractor (auditory versus audiovisual) and the type of condition (congruent, conflicting in place of articulation, conflicting in voicing, or baseline). The problem of multiple comparisons was controlled with the False Discovery Rate (FDR) procedure (Benjamini & Hochberg, 1995; Benjamini, Krieger, & Yekutieli, 2006). The FDR approach controls the expected proportion of false positive findings among rejected hypotheses. A value of the approach is its demonstrated applicability to repeated measures designs. For the experimental conditions, we quantified the degree of facilitation and interference from congruent and conflicting onsets, respectively, with adjusted naming times. Adjusted times were derived by subtracting each participant’s vowel baseline naming times from his or her congruent and conflicting times, as done in our previous studies (Jerger et al., 2002a & b). This approach controls for developmental differences in detecting and responding to stimuli and allows each picture to serve as its own control, without affecting the differences among the types of distractors.
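For readers who want the two analysis devices in computable form, the sketch below (Python) shows 1) the adjusted-naming-time computation and 2) the Benjamini-Hochberg FDR step-up rule. The data layout and condition labels are our assumptions; the FDR function implements the standard published procedure, not the authors’ specific software.

```python
import numpy as np
import pandas as pd

def adjusted_times(trials: pd.DataFrame) -> pd.DataFrame:
    """Subtract each participant's baseline (vowel-onset) mean naming time,
    per modality, from his or her mean in each experimental condition.
    Negative values indicate facilitation; positive values, interference."""
    means = (trials.groupby(["subject", "modality", "condition"])["rt_ms"]
                   .mean()
                   .unstack("condition"))
    experimental = means[["congruent", "place", "voicing"]]
    return experimental.sub(means["baseline"], axis=0)

def fdr_reject(pvals, q: float = 0.05) -> np.ndarray:
    """Benjamini-Hochberg FDR: reject the k smallest p-values, where k is
    the largest rank i (1-based) with p_(i) <= (i / m) * q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```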

Baseline Condition

Figure 2 shows average naming times in the age groups for the vowel onset distractors presented in the auditory vs audiovisual modalities. Naming times for the /i/ and /ʌ/ onsets were statistically equivalent; results are collapsed across vowels. Omnibus statistical analysis of the data included one between-subjects factor (age groups) and one within-subjects factor (modality: auditory vs audiovisual). Results indicated that age significantly affected overall naming times, F (4, 95) = 23.109, p < .0001. No other significant effects or interactions were observed.

Figure 2. Average absolute naming latencies in five age groups for vowel onset, baseline distractors, presented in the auditory versus audiovisual modalities.

To obtain a more precise understanding of the effects of age, we carried out a multiple regression analysis. Results indicated a significant decrease in naming times with increasing age, F (1,196) = 142.182, p one way < .0001. A linear trend accounted for approximately 99% of the age-related decline in naming times for each modality. The slopes of the auditory and audiovisual functions (-10 vs -9 ms/mos, respectively) did not differ statistically, indicating that naming times improved in a comparable way with age for each mode. The intercepts of the auditory and audiovisual functions (2524 vs 2471 ms, respectively) also did not differ. In the presence of homogeneous slopes, equivalent intercepts indicate equivalent absolute naming times (i.e., the auditory developmental function was not shifted relative to the audiovisual function). Naming times collapsed across modality decreased from approximately 2035 ms in the 4-yr-olds to 1140 ms in the 10-14-yr-olds. An age-related improvement in absolute picture naming times agrees with previous findings (Brooks & MacWhinney, 2000; Jerger et al., 2002a & b; Jescheniak, Hahne, Hoffman, & Wagner, 2006; Melnick, Conture, & Ohde, 2003).
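The trend analyses reported here and in the following sections reduce to fitting polynomial trends to group means and asking what proportion of between-group variability each trend accounts for. A minimal sketch is given below (Python; the array names are hypothetical, and the inputs would be group means, not the study data).

```python
import numpy as np

def trend_r2(age_mos: np.ndarray, mean_rt_ms: np.ndarray, degree: int) -> float:
    """Proportion of between-group variability accounted for by a
    polynomial trend of the given degree (1 = linear, 2 = quadratic)."""
    coefs = np.polyfit(age_mos, mean_rt_ms, deg=degree)
    fitted = np.polyval(coefs, age_mos)
    ss_res = np.sum((mean_rt_ms - fitted) ** 2)
    ss_tot = np.sum((mean_rt_ms - np.mean(mean_rt_ms)) ** 2)
    return 1.0 - ss_res / ss_tot

# Usage: trend_r2(group_mean_ages, group_mean_naming_times, degree=1) for a
# linear trend. np.polyfit with deg=1 also returns the slope directly, in
# ms/month when age is expressed in months (cf. the -10 and -9 ms/mos
# slopes reported above).
```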

Experimental Conditions

Initially, we conducted an omnibus analysis of the experimental conditions with one between-subjects factor (age groups) and two within-subjects factors (1: modality: auditory vs audiovisual; 2: condition: congruent, conflicting in place, vs conflicting in voicing). Results indicated that adjusted naming times were significantly influenced by age, F (4, 95) = 3.795, p = .007, and condition, F (2, 190) = 201.684, p < .001. The effect of condition on performance varied in complex ways, however, as a function of age and the modality of the distractor, with a significant condition × age group interaction, F (8, 190) = 6.242, p < .001, condition × modality interaction, F (2, 190) = 3.260, p = .041, and condition × age group × modality interaction, F (8, 190) = 2.463, p = .015. No other significant effects were observed. These complex interactions were probed by analyzing each condition separately.

Congruent Condition

Figure 3 shows the degree of facilitation as quantified by adjusted naming times for auditory and audiovisual congruent distractors in the age groups. The zero baseline of the ordinate represents naming times for the vowel onset baseline distractors (Figure 2). Results of multiple regression analysis did not indicate a significant general effect of age on adjusted naming times. However, the individual developmental curves characterizing the auditory and audiovisual functions differed significantly from each other, F (1,196) = 3.952, p one way = .024. Whereas a quadratic trend accounted for the largest proportion (74%) of age-related variability for audiovisual distractors, a linear trend accounted for the largest proportion (89%) of the variability for auditory distractors. For audiovisual distractors, both the linear and quadratic trends were significant, F (1,95) = 2.79, p one way = .049, and F (1,95) = 8.60, p one way = .002, respectively. For auditory distractors, the linear trend approached significance, F (1,95) = 2.28, p one way = .066. Previous results on the Cross-Modal Picture-Word Task with auditory distractors have consistently shown a greater degree of facilitation for younger than older children (Brooks & MacWhinney, 2000; Jerger et al., 2002a & b; Jescheniak et al., 2006).

Figure 3. Congruent Distractors. Degree of facilitation for auditory versus audiovisual modalities as quantified by adjusted naming latencies in five age groups. The zero baseline of the ordinate represents naming times for vowel onset baseline distractors (Fig. 2). A larger negative value indicates more facilitation.

Multiple t-tests with the FDR method controlling for multiplicity indicated that the degree of facilitation was significantly greater for audiovisual than auditory distractors in the 4-yr-olds and 10-14-yr-olds. All other groups showed equivalent degrees of facilitation for both types of distractors. Multiple t-tests with the FDR method assessing whether the adjusted naming times differed significantly from zero indicated significant facilitation in the 4-yr-olds, 5-yr-olds, and 10-14-yr-olds for audiovisual distractors and in the 4- and 5-yr-olds for auditory distractors. FDR results in the 6-7 and 8-9-yr-olds for auditory distractors approached significance.

Conflicting-in-Voicing Condition

Figure 4 shows the degree of interference as quantified by adjusted naming times in the age groups for auditory and audiovisual distractors conflicting in voicing. Again, the zero baseline of the ordinate represents naming times for the vowel onset baseline distractors. Results of multiple regression indicated a significant decrease in interference with increasing age, F (1,196) = 24.049, p one way < .0001. The degree of age-related change differed significantly, however, for the auditory and audiovisual functions, with the developmental trajectory significantly steeper for the audiovisual modality, F (1,196) = 3.580, p one way = .030. A linear trend accounted for 90% of the between-group variability for audiovisual distractors, but only 55% of the variability for auditory distractors. The trends were significant for both functions: audiovisual, F (1,95) = 24.46, p one way < .0001, and auditory, F (1,95) = 5.76, p one way = .009. The curvilinear trends did not achieve significance. The slopes of the functions were -2 ms/mos for the audiovisual mode but only -1 ms/mos for the auditory mode. The nonparallel slopes of the functions rendered testing differences between the intercepts irrelevant.

Figure 4. Conflicting in Voicing Distractors. Degree of interference for auditory versus audiovisual modalities as quantified by adjusted naming latencies in five age groups. The zero baseline of the ordinate represents naming times for vowel onset baseline distractors (Fig. 2). A larger positive value indicates more interference.

Multiple t-tests with the FDR method controlling for multiplicity indicated significantly greater interference from audiovisual than auditory distractors in the 4-yr-olds. All other groups showed equivalent degrees of interference for both types of distractors. Multiple t-tests with the FDR method assessing whether all adjusted naming times differed significantly from zero indicated a significant degree of interference in all groups for both auditory and audiovisual distractors. These findings agree with previous findings for auditory only conflicting-in-voicing distractors (Jerger et al., 2002a & b).

Conflicting-in-Place Condition

Figure 5 shows the degree of interference as quantified by adjusted naming times in the age groups for auditory and audiovisual distractors conflicting in place. Results of multiple regression analysis indicated a significant decrease in interference with age, F (1,197) = 15.579, p one way < .0001. A linear trend accounted for 85% of the between-group variability for audiovisual distractors and 66% of the variability for auditory distractors. The trends were significant for both functions: audiovisual, F (1,95) = 13.95, p one way < .0001, and auditory, F (1,95) = 4.18, p one way = .022. The curvilinear trends did not achieve significance. The developmental functions for the audiovisual and auditory modalities were characterized by statistically equivalent slopes (-1 ms/mos) and intercepts (254 ms). Homogeneous slopes and intercepts signify comparable adjusted naming times and a uniform degree of age-related decline for the two modalities. That said, the notable trend suggesting greater interference from the audiovisual distractors in the younger children, particularly the 5-yr-olds, clearly seems worth mentioning. Finally, results of multiple t-tests with the FDR method indicated significant interference in all groups for both auditory and audiovisual distractors. These findings are consistent with previous findings for auditory only conflicting-in-place distractors (Jerger et al., 2002a & b).

Figure 5. Conflicting in Place Distractors. Degree of interference for auditory versus audiovisual modalities as quantified by adjusted naming latencies in five age groups. The zero baseline of the ordinate represents naming times for vowel onset baseline distractors (Fig. 2). A larger positive value indicates more interference.

Discussion

This research modified the Children’s Cross-Modal Picture-Word Task (Jerger et al., 2002b) into a multimodal procedure for assessing indirectly the influence of visual speech on phonological processing. Results varied as a function of age and the type and modality of the distractors in complex ways. For distractors conflicting in place of articulation, the groups showed statistically equivalent interference for the auditory and audiovisual distractors. There was a notable trend suggesting greater interference from the audiovisual distractors in the younger children, particularly the 5-yr-olds, but we lacked the statistical power to detect the effect when correcting for multiple comparisons. The degree of interference decreased significantly with increasing age in an equivalent manner across modality.

For distractors conflicting in voicing, the degree of interference also decreased significantly with increasing age, but the auditory and audiovisual functions exhibited significantly different developmental trajectories. The degree of age-related change was greater for the audiovisual function because the 4-yr-olds showed significantly greater interference from audiovisual than auditory distractors whereas all other groups showed equivalent degrees of interference from both types of distractors. Although results in the 4-yr-olds appear inconsistent with the literature indicating that voicing is difficult to discriminate visually on the lips (Tye-Murray, 1998), they are coherent with more recent data suggesting that some visemes, such as /p/ versus /b/, may be more readily discriminated visually when individuals view full facial images as used herein (Bernstein, Iverson, & Auer, 1997, as cited in Bernstein, Demorest, & Tucker, 2000) (See Footnote 1).

For the congruent distractors, the auditory and audiovisual functions exhibited significantly different developmental trajectories. The audiovisual function showed a unique, significant quadratic trend due to audiovisual distractors producing significantly greater facilitation than auditory distractors in the 4-yr-olds and 10-14-yr-olds but not in the other age groups. The degree of facilitation also varied considerably. Congruent distractors produced significant facilitation in the 4-yr-olds, 5-yr-olds, and 10-14-yr-olds for audiovisual distractors and in the 4- and 5-yr-olds for auditory distractors, with results in the 6-7 and 8-9-yr-olds of borderline significance.

Our results showing a pronounced influence of visual speech on performance for congruent and conflicting distractors in the 4-yr-olds are difficult to relate to the literature because previous studies have pooled results for 4-yr-olds with older ages. To the extent that the previous amalgamated data reflect the performance of 4-yr-olds accurately, our results on an indirect task disagree with the results obtained on direct testing measures (Desjardins et al., 1997; Dupont et al., 2005; Massaro, 1984; Massaro et al., 1986). Further research on indirect vs direct testing approaches in 4-yr-olds is warranted. Our results showing a lack of influence of visual speech on performance in the 5-9-yr-olds agree with previous data on a variety of direct testing measures (Desjardins et al., 1997; Dupont et al., 2005; Hockley & Polka, 1994; Massaro, 1984; Massaro et al., 1986; McGurk & MacDonald, 1976; Sekiyama & Burnham, 2004; Wightman et al., 2006). Children within this age range are less influenced by visual speech on both indirect and direct tasks. The data support the conclusion that the negative findings in children of 5-9 yrs of age represent an age-related effect, rather than differences in the task demands of indirect and direct procedures.

Some previous investigators have attributed the reduced influence of visual speech on performance in children to their poorer speechreading abilities (Massaro et al., 1986; Wrightman et al., 2006). A relation between the influence of visual speech on performance and speechreading skills seems undeniably reasonable. That said, such a relationship cannot explain the developmental shifts noted in our research. As detailed in the Methods section, visual-only speechreading scores were comparable in the 4- and 5-yr-olds (4% words; 35% visemes) and poorer in these children than in the 6- to -9-yr-olds (15% words; 58% visemes). Thus, speechreading skills cannot account for our results indicating a greater influence of visual speech on performance in the 4-yr-olds than in the 5-9-yr-olds.

Our findings indicating that congruent audiovisual distractors produced significantly more facilitation than congruent auditory distractors in the 10-14-yr-olds agree with results in the literature (Conrad, 1977; Dodd, 1977, 1980; Hockley & Polka, 1994). A novel finding of this research was that conflicting audiovisual distractors did not produce significantly more interference than conflicting auditory distractors in the 10-14-yr-olds. A significant influence of visual speech on the degree of facilitation but not on the degree of interference may be associated with the more advanced cognitive abilities of preteen and teenage children and adults, particularly their capacity for inhibiting conflicting information and resisting interference (Bjorklund, 2005; Jerger, Pearson, & Spence, 1999). Finally, we should note that previous investigators have compared children's performance to adult performance rather than to that of 10-14-yr-olds. We also tested a group of college students (18-38 yrs); their data were not included because the pattern of results in the 10-14-yr-olds was adult-like, mirroring the findings in the college students.

The most intriguing finding of the present study was the inverted U-shaped developmental function observed for congruent audiovisual distractors: performance showed an influence of visual speech at the youngest and oldest ages but not at the intermediate ages. As noted earlier, speechreading skills improved consistently with age. Why, then, was performance in the intermediate-aged children less influenced by visual speech, as seen in Figure 3? U-shaped functions have been carefully scrutinized by dynamic systems theorists (Smith & Thelen, 2003), who propose that the plateau of the U-shaped trajectory reflects a period of transition rather than an actual loss of the influence of visual speech on performance. The idea is that the components of early skills are softly assembled behaviors, i.e., malleable configurations that reorganize over time into more mature, stable, and flexible forms (Gershkoff-Stowe & Thelen, 2004). The dynamic systems model also assumes that developmental change typically arises from multiple interactive factors rather than from any single factor. From this viewpoint, the temporary decline in the influence of visual speech on performance reflects a reorganization of relevant knowledge and processing sub-systems in response to internal and environmental forces. Using knowledge mechanisms that are in a period of significant dynamic growth may require more resources and overload the processing system, resulting in a temporary decrease in processing efficiency.

With regard to developmental changes in face processing, our results do not seem consistent with the proposal that the lack of influence of visual speech on performance reflects difficulties in processing faces and the full range of facial expressions before the preteen-teenage years (Campbell et al., 1995; Carey et al., 1980; Mondloch et al., 2003). The age range associated with the establishment of adult-like face processing skills does correspond closely to the age range showing the reestablishment of an influence of visual speech on performance, 10-14 yrs. That said, an account based on immaturities in face processing seems contradicted by the data of the current 4-yr-olds, who showed a pronounced influence of visual speech on performance.

With regard to developmental changes in input/output processing skills, our results are consistent with the transitions in the processing weights of the auditory versus visual speech modalities proposed in the Introduction. An important idea of dynamic systems theory is that the two ends of the U-shaped trajectory, although they appear to reflect identical outcomes in the 4-yr-olds and 10-14-yr-olds, may not reflect identical underlying mechanisms. With regard to information processing attentional resources, our results are consistent with the proposal that visual speech may act as an external cue that disproportionately benefits information processing in preschool children with less mature skills, creating an indirect influence of visual speech on performance. With regard to multimodal processing, the U-shaped results are consistent with the proposal of developmental shifts due to transitions from processing multimodal inputs in an undifferentiated, holistic manner to processing them in a modality-specific manner.

Our results are also consistent with the proposal that phonological representational knowledge reorganizes during the kindergarten-early elementary school years. The age at which literacy instruction begins and speech coding becomes sufficient to support inner speech for learning, remembering, and problem solving, about 5-6 years, is uncannily similar to the age at which an effect of visual speech on phonological processing seemed to disappear in the current study. Likewise, the age range over which our results showed a lack of influence of visual speech on performance corresponds closely to the estimate that about 3 years are required to systematize the knowledge gained during literacy learning for a language such as English (Anthony & Francis, 2005). To the extent that temporary periods of reorganization and dynamic growth are characterized by less robust processing systems and decreases in processing efficiency, the influence of visual speech may vary as a function of the processing demands of different tasks. Higher demand tasks that stress processing may reveal developmental shifts more readily than lower demand tasks that do not create the same degree of stress. Future research should explore the effects of visual speech on performance with tasks that manipulate information processing requirements.

In sum, a complex array of factors may influence the processing of multimodal stimuli. The U-shaped developmental function for congruent audiovisual distractors might reflect any or all of the above considerations, with the possible exception of immaturities in face processing. Multimodal speech processing clearly involves diverse component processes whose study requires a multidisciplinary perspective.

Acknowledgments

This work was supported by the National Institute on Deafness and Other Communication Disorders, grant DC-00421 to the University of Texas at Dallas. We thank Dr. Alice O'Toole for her generous advice and assistance in recording our audiovisual stimuli and interpreting data. We appreciate the thoughtful comments of Dr. Virginia Marchman on an earlier version of the paper. We thank the children and parents who participated, and the students who assisted, namely Shaumika Ball, Karen Banzon, Katie Battenfield, Sarah Joyce Bessonette, K. Meaghan Dougherty, Irma Garza, Stephanie Hirsch, Kelley Leach, Anne Pham, Lori Pressley, and Anastasia Villescas (data collection, analysis, and/or presentation), and Derek Hammons (computer programming).

Footnotes

1. To probe Bernstein and colleagues' suggestion, the speechreadability of the onsets of the distractors (/b/, /d/, /p/, /t/) in terms of place of articulation (labial vs. not labial) and voicing (voiced vs. not voiced) was assessed in a pilot study with 20 normal adults (13 females, 7 males) ranging in age from 19 to 32 years (M = 22.6 yrs). The students watched visual-only presentations of the distractors, intermixed with filler items, and classified each onset as lips/not lips or as voiced/not voiced. The order of the two classification tasks was counterbalanced across students, and immediately before each task the students classified a practice list with feedback. Classification of the distractors' onsets averaged 90.5% correct (range = 70-100%) for place of articulation and 65.0% correct (range = 45-80%) for voicing. Classification was significantly above chance both for labial place of articulation, t(19) = 22.84, p < .0001, and for voicing, t(19) = 7.55, p < .0001. The visual speech information available in distractors conflicting in voicing thus appears sufficient to influence performance significantly on a direct classification task in adults and on our indirect task in 4-yr-olds. More research is needed in this area.
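For the interested reader, the chance comparison can be reconstructed; the following is a minimal sketch, assuming a standard one-sample t-test of mean proportion correct against the 50% chance level of a binary classification (the per-participant standard deviation s is not reported in this footnote):

\[
t = \frac{\bar{p} - .50}{s/\sqrt{n}}, \qquad df = n - 1 = 19,
\]

where \(\bar{p}\) is the group mean proportion correct (.905 for place of articulation; .650 for voicing) and \(n = 20\) participants, consistent with the reported degrees of freedom.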


References

1. Abdi H, Edelman B, Valentin D, Dowling W. Experimental design and analysis for psychology. Oxford: Oxford University Press; 2009, in press.
2. American Speech-Language-Hearing Association. Typical speech and language development. 2008. Retrieved March 28, 2008, from http://www.asha.org/public/speech/development/default.htm.
3. Anthony J, Francis D. Development of phonological awareness. Current Directions in Psychological Science. 2005;14:255–259.
4. Arnold P, Hill F. Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology. 2001;92:339–355.
5. Bahrick L, Netto D, Hernandez-Reif M. Intermodal perception of adult and child faces and voices by infants. Child Development. 1998;69:1263–1275.
6. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological). 1995;57:289–300.
7. Benjamini Y, Krieger A, Yekutieli D. Adaptive linear step-up procedures that control the false discovery rate. Biometrika. 2006;93:491–507.
8. Bentin S, Hammer R, Cahan S. The effects of aging and first grade schooling on the development of phonological awareness. Psychological Science. 1991;2:271–274.
9. Bernstein L, Demorest M, Tucker P. Speech perception without hearing. Perception & Psychophysics. 2000;62(2):233–252. doi: 10.3758/bf03205546.
10. Bertelson P, de Gelder B. The psychology of multimodal perception. In: Spence C, Driver J, editors. Crossmodal space and crossmodal attention. Oxford: Oxford University Press; 2004. pp. 141–177.
11. Bjorklund D. Children's thinking: Cognitive development and individual differences. 4th ed. Belmont, CA: Wadsworth/Thomson Learning; 2005.
12. Brainerd C. Dropping the other U: An alternative approach to U-shaped developmental functions. Journal of Cognition and Development. 2004;5:81–88.
13. Brooks P, MacWhinney B. Phonological priming in children's picture naming. Journal of Child Language. 2000;27:335–366. doi: 10.1017/s0305000900004141.
14. Bryant P. Phonological and grammatical skills in learning to read. In: de Gelder B, Morais J, editors. Speech and reading: A comparative approach. Hove, East Sussex: Erlbaum (UK)/Taylor & Francis; 1995. pp. 249–266.
15. Burnham D, Dodd B. Auditory-visual speech integration by prelinguistic infants: Perception of an emergent consonant in the McGurk effect. Developmental Psychobiology. 2004;44:209–220. doi: 10.1002/dev.20032.
16. Cameron S, Dillon H. Development of the Listening in Spatialized Noise-Sentences Test (LISN-S). Ear & Hearing. 2007;28:196–211. doi: 10.1097/AUD.0b013e318031267f.
17. Campbell R. Tracing lip movements: Making speech visible. Visible Language. 1988;22:32–57.
18. Campbell R. Audio-visual speech processing. In: Brown K, Anderson A, Bauer L, Berns M, Hirst G, Miller J, editors. The encyclopedia of language and linguistics. Amsterdam: Elsevier; 2006. pp. 562–569.
19. Campbell R, DeHaan E. Repetition priming for face speech images: Speech-reading primes face identification. British Journal of Psychology. 1998;89:309–323.
20. Campbell R, Walker J, Baron-Cohen S. The development of differential use of inner and outer face features in familiar face identification. Journal of Experimental Child Psychology. 1995;59:196–210.
21. Carey S, Diamond R, Woods B. Development of face recognition: A maturational component? Developmental Psychology. 1980;16:257–269.
22. Carroll JB, White MN. Age-of-acquisition norms for 220 picturable nouns. Journal of Verbal Learning and Verbal Behavior. 1973;12:563–576.
23. Coltheart M. The MRC Psycholinguistic Database. 1981. Retrieved August 9, 2006, from http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm.
24. Conrad R. The chronology of the development of covert speech in children. Developmental Psychology. 1971;5:398–405.
25. Conrad R. Lipreading by deaf and hearing children. British Journal of Educational Psychology. 1977;47:60–65. doi: 10.1111/j.2044-8279.1977.tb03001.x.
26. Cortese M, Fugett A. Imageability ratings for 3,000 monosyllabic words. Behavior Research Methods, Instruments, & Computers. 2004;36(3):384–387. doi: 10.3758/bf03195585.
27. Dale P, Fenson L. Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers. 1996;28:125–127.
28. Damian M, Martin R. Semantic and phonological codes interact in single word production. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1999;25:345–361. doi: 10.1037//0278-7393.25.2.345.
29. de Gelder B, Morais J. Speech and reading: One side to two coins. In: de Gelder B, Morais J, editors. Speech and reading: A comparative approach. Hove, East Sussex: Erlbaum (UK)/Taylor & Francis; 1995. pp. 1–13.
30. de Gelder B, Pourtois G, Vroomen J, Bachoud-Levi A. Covert processing of faces in prosopagnosia is restricted to facial expressions: Evidence from cross-modal bias. Brain and Cognition. 2000;44:425–444. doi: 10.1006/brcg.1999.1203.
31. de Gelder B, Vroomen J. Impaired speech perception in poor readers: Evidence from hearing and speech reading. Brain and Language. 1998a;64:269–281. doi: 10.1006/brln.1998.1973.
32. de Gelder B, Vroomen J. Impairment of speech-reading in prosopagnosia. Speech Communication. 1998b;26:89–96.
33. Desjardins R, Rogers J, Werker J. An exploration of why preschoolers perform differently than do adults in audiovisual speech perception tasks. Journal of Experimental Child Psychology. 1997;66:85–110. doi: 10.1006/jecp.1997.2379.
34. Desjardins R, Werker J. Is the integration of heard and seen speech mandatory for infants? Developmental Psychobiology. 2004;45:187–203. doi: 10.1002/dev.20033.
35. Dodd B. The role of vision in the perception of speech. Perception. 1977;6:31–40. doi: 10.1068/p060031.
36. Dodd B. Lip reading in infants: Attention to speech presented in- and out-of-synchrony. Cognitive Psychology. 1979;11:478–484. doi: 10.1016/0010-0285(79)90021-5.
37. Dodd B. Interaction of auditory and visual information in speech perception. British Journal of Psychology. 1980;71:541–549. doi: 10.1111/j.2044-8295.1980.tb01765.x.
38. Dodd B. The acquisition of lipreading skills by normally hearing children. In: Dodd B, Campbell R, editors. Hearing by eye: The psychology of lipreading. London: Erlbaum; 1987. pp. 163–175.
39. Dodd B, Holm A, Hua Z, Crosbie S. Phonological development: A normative study of British English-speaking children. Clinical Linguistics & Phonetics. 2003;17:617–643. doi: 10.1080/0269920031000111348.
40. Doherty-Sneddon G, Bonner L, Bruce V. Cognitive demands of face monitoring: Evidence for visuospatial overload. Memory & Cognition. 2001;29:909–919. doi: 10.3758/bf03195753.
41. Doherty-Sneddon G, Kent G. Visual signals and the communication abilities of children. Journal of Child Psychology and Psychiatry. 1996;37:949–959. doi: 10.1111/j.1469-7610.1996.tb01492.x.
42. Doherty-Sneddon G, McAuley S, Bruce V, Langton S, Blokland A, Anderson A. Visual signals and children's communication: Negative effects on task outcome. British Journal of Developmental Psychology. 2000;18:595–608.
43. Dupont S, Aubin J, Menard L. A study of the McGurk effect in 4- and 5-year-old French Canadian children. ZAS Papers in Linguistics. 2005;40:1–17.
44. Durand K, Gallay M, Seigneuric A, Robichon F, Baudouin J. The development of facial emotion recognition: The role of configural information. Journal of Experimental Child Psychology. 2007;97:14–27. doi: 10.1016/j.jecp.2006.12.001.
45. Elliott L, Hammer M, Evan K. Perception of gated, highly familiar spoken monosyllabic nouns by children, teenagers and older adults. Perception & Psychophysics. 1987;42:150–157. doi: 10.3758/bf03210503.
46. Fernald A, Swingley D, Pinto J. When half a word is enough: Infants can recognize spoken words using partial phonetic information. Child Development. 2001;72:1003–1015. doi: 10.1111/1467-8624.00331.
47. Fisher C. Confusions among visually perceived consonants. Journal of Speech & Hearing Research. 1968;11:796–804. doi: 10.1044/jshr.1104.796.
48. Gershkoff-Stowe L, Thelen E. U-shaped changes in behavior: A dynamic systems perspective. Journal of Cognition and Development. 2004;5:11–36.
49. Gilhooly KJ, Logie RH. Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation. 1980;12(4):395–427.
50. Glenberg A, Schroeder J, Robertson D. Averting the gaze disengages the environment and facilitates remembering. Memory & Cognition. 1998;26:651–658. doi: 10.3758/bf03211385.
51. Godfrey J, Syrdal-Lasky A, Millay K, Knox C. Performance of dyslexic children on speech perception tests. Journal of Experimental Child Psychology. 1981;32:401–424. doi: 10.1016/0022-0965(81)90105-3.
52. Gordon P. Perceptual-motor processing in speech. In: Proctor R, Reeve T, editors. Stimulus-response compatibility. Amsterdam: North-Holland/Elsevier Science; 1990.
53. Goswami U, Ziegler J, Richardson U. The effects of spelling consistency on phonological awareness: A comparison of English and German. Journal of Experimental Child Psychology. 2005;92:345–365. doi: 10.1016/j.jecp.2005.06.002.
54. Green K. The use of auditory and visual information during phonetic processing: Implications for theories of speech perception. In: Campbell R, Dodd B, Burnham D, editors. Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech. Hove, UK: Taylor & Francis; 1998. pp. 3–25.
55. Greenberg H, Bode D. Visual discrimination of consonants. Journal of Speech & Hearing Research. 1968;11:869–874. doi: 10.1044/jshr.1104.869.
56. Hayden A, Bhatt R, Reed A, Corbly C, Joseph J. The development of expert face processing: Are infants sensitive to normal differences in second-order relational information? Journal of Experimental Child Psychology. 2007;97:85–98. doi: 10.1016/j.jecp.2007.01.004.
57. Hockley N, Polka L. A developmental study of audiovisual speech perception using the McGurk paradigm. Journal of the Acoustical Society of America. 1994;96:3309.
58. Hollingshead A. Four factor index of social status. New Haven, CT: Yale University, Department of Sociology; 1975.
59. Jerger S, Bessonette SJ, Davies KL, Battenfield K. Unpublished data. University of Texas at Dallas; 2007.
60. Jerger S, Lai L, Marchman V. Picture naming by children with hearing loss: II. Effect of phonologically-related auditory distractors. Journal of the American Academy of Audiology. 2002;13:478–492.
61. Jerger S, Martin R, Damian M. Semantic and phonological influences on picture naming by children and teenagers. Journal of Memory and Language. 2002;47:229–249.
62. Jerger S, Pearson D, Spence M. Developmental course of auditory processing interactions: Garner interference and Simon interference. Journal of Experimental Child Psychology. 1999;74:44–67. doi: 10.1006/jecp.1999.2504.
63. Jescheniak J, Hahne A, Hoffmann S, Wagner V. Phonological activation of category coordinates during speech planning is observable in children but not in adults: Evidence for cascaded processing. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2006;32:373–386. doi: 10.1037/0278-7393.32.3.373.
64. Jordan T, Bevan K. Seeing and hearing rotated faces: Influences of facial orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance. 1997;23:388–403. doi: 10.1037//0096-1523.23.2.388.
65. Kuhl P, Meltzoff A. The bimodal perception of speech in infancy. Science. 1982;218:1138–1141. doi: 10.1126/science.7146899.
66. Levelt W, Schriefers H, Vorberg D, Meyer A, Pechmann T, Havinga J. The time course of lexical access in speech production: A study of picture naming. Psychological Review. 1991;98:122–142.
67. Lickliter R, Bahrick L. Perceptual development and the origins of multisensory responsiveness. In: Calvert G, Spence C, Stein B, editors. The handbook of multisensory processes. Cambridge, MA: MIT Press; 2004. pp. 643–654.
68. Locke J. The child's path to spoken language. Cambridge, MA: Harvard University Press; 1993.
69. MacLeod A, Summerfield Q. Quantifying the contribution of vision to speech perception in noise. British Journal of Audiology. 1987;21:131–141. doi: 10.3109/03005368709077786.
70. Martin R, Lesch M, Bartha M. Independence of input and output phonology in word processing and short-term memory. Journal of Memory and Language. 1999;41:3–29.
71. Massaro D. Children's perception of visual and auditory speech. Child Development. 1984;55:1777–1788.
72. Massaro D. Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press; 1998.
73. Massaro D, Thompson L, Barron B, Laren E. Developmental changes in visual and auditory contributions to speech perception. Journal of Experimental Child Psychology. 1986;41:93–113. doi: 10.1016/0022-0965(86)90053-6.
74. McGregor K, Friedman R, Reilly R, Newman R. Semantic representation and naming in young children. Journal of Speech, Language, and Hearing Research. 2002;45:332–346. doi: 10.1044/1092-4388(2002/026).
75. McGurk H, MacDonald J. Hearing lips and seeing voices. Nature. 1976;264:746–748. doi: 10.1038/264746a0.
76. Melnick K, Conture E, Ohde R. Phonological priming in picture naming of young children who stutter. Journal of Speech, Language, and Hearing Research. 2003;46:1428–1443. doi: 10.1044/1092-4388(2003/111).
77. Meltzoff A, Moore M. Explaining facial imitation: A theoretical model. Early Development and Parenting. 1997;6:179–192. doi: 10.1002/(SICI)1099-0917(199709/12)6:3/4<179::AID-EDP157>3.0.CO;2-R.
78. Merikle P, Reingold E. Comparing direct (explicit) and indirect (implicit) measures to study unconscious memory. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1991;17:224–233.
79. Miller G, Nicely P. An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America. 1955;27:338–352.
80. Mills A. The development of phonology in the blind child. In: Dodd B, Campbell R, editors. Hearing by eye: The psychology of lipreading. London: Erlbaum; 1987. pp. 145–161.
81. Mondloch C, Geldart S, Maurer D, LeGrand R. Developmental changes in face processing skills. Journal of Experimental Child Psychology. 2003;86:67–84. doi: 10.1016/s0022-0965(03)00102-4.
82. Morais J, Bertelson P, Cary L, Alegria J. Literacy training and speech segmentation. Cognition. 1986;24:45–64. doi: 10.1016/0010-0277(86)90004-1.
83. Morrison C, Chappell T, Ellis A. Age of acquisition norms for a large set of object names and their relation to adult estimates and other variables. Quarterly Journal of Experimental Psychology. 1997;50A(3):528–559.
84. Morrison F, Smith L, Dow-Ehrensberger M. Education and cognitive development: A natural experiment. Developmental Psychology. 1995;31:789–799.
85. Munhall K, Vatikiotis-Bateson E. The moving face during speech communication. In: Campbell R, Dodd B, Burnham D, editors. Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech. Hove, UK: Psychology Press; 1998. pp. 123–139.
86. Nusbaum HC, Pisoni DB, Davis CK. Sizing up the Hoosier Mental Lexicon: Measuring the familiarity of 20,000 words. Research on Speech Perception Progress Report No. 10. 1984:357–376.
87. Owens E, Blazek B. Visemes observed by hearing-impaired and normal-hearing adult viewers. Journal of Speech & Hearing Research. 1985;28:381–393. doi: 10.1044/jshr.2803.381.
88. Patterson M, Werker J. Matching phonetic information in lips and voice is robust in 4.5-month-old infants. Infant Behavior & Development. 1999;22:237–247.
89. Patterson M, Werker J. Two-month-old infants match phonetic information in lips and voice. Developmental Science. 2003;6:191–196.
90. Ramirez J, Mann V. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy. Journal of the Acoustical Society of America. 2005;118:1122–1133. doi: 10.1121/1.1940509.
91. Rosenblum L, Schmuckler M, Johnson J. The McGurk effect in infants. Perception & Psychophysics. 1997;59:347–357. doi: 10.3758/bf03211902.
92. Schriefers H, Meyer A, Levelt W. Exploring the time course of lexical access in language production: Picture-word interference studies. Journal of Memory and Language. 1990;29:86–102.
93. Sekiyama K, Burnham D. Issues in the development of auditory-visual speech perception: Adults, infants, and children. Interspeech-2004. 2004:1137–1140.
94. Sloutsky V, Napolitano A. Is a picture worth a thousand words? Preference for auditory modality in young children. Child Development. 2003;74:822–833. doi: 10.1111/1467-8624.00570.
95. Smit A, Hand L, Freilinger J, Bernthal J, Bird A. The Iowa articulation norms project and its Nebraska replication. Journal of Speech & Hearing Disorders. 1990;55:779–798. doi: 10.1044/jshd.5504.779.
96. Smith L, Thelen E. Development as a dynamic system. Trends in Cognitive Sciences. 2003;7:343–348. doi: 10.1016/s1364-6613(03)00156-6.
97. Snodgrass JG, Vanderwart M. A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory. 1980;6(2):174–215. doi: 10.1037//0278-7393.6.2.174.
98. Snowling M, Hulme C. The development of phonological skills. Philosophical Transactions of the Royal Society of London, Series B. 1994;346:21–27. doi: 10.1098/rstb.1994.0124.
99. Starreveld P. On the interpretation of onsets of auditory context effects in word production. Journal of Memory and Language. 2000;42:497–525.
100. Tye-Murray N. Foundations of aural rehabilitation. San Diego: Singular Publishing Group; 1998.
101. Tye-Murray N, Geers A. Children's audio-visual enhancement test. St. Louis, MO: Central Institute for the Deaf; 2001.
102. Vitevitch MS, Luce PA. A web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments, & Computers. 2004;36:481–487. doi: 10.3758/bf03195594.
103. Weikum W, Vouloumanos A, Navarra J, Soto-Faraco S, Sebastian-Galles N, Werker J. Visual language discrimination in infancy. Science. 2007;316:1159. doi: 10.1126/science.1137686.
104. Wickens C. Temporal limits of human information processing: A developmental study. Psychological Bulletin. 1974;81:739–755.
105. Wightman F, Kistler D, Brungart D. Informational masking of speech in children: Auditory-visual integration. Journal of the Acoustical Society of America. 2006;119:3940–3949. doi: 10.1121/1.2195121.
