Abstract
Two experiments assessing event-related potentials in 5-month-old infants were conducted to examine neural correlates of attentional salience and efficiency of processing of a visual event (woman speaking) paired with redundant (synchronous) speech, nonredundant (asynchronous) speech, or no speech. In Experiment 1, the Nc component associated with attentional salience was greater in amplitude following synchronous audiovisual as compared with asynchronous audiovisual and unimodal visual presentations. A block design was utilized in Experiment 2 to examine efficiency of processing of a visual event. Only infants exposed to synchronous audiovisual speech demonstrated a significant reduction in amplitude of the late slow wave associated with successful stimulus processing and recognition memory from early to late blocks of trials. These findings indicate that, for 5-month-old infants, events that provide intersensory redundancy elicit enhanced neural responsiveness indicative of greater attentional salience and more efficient stimulus processing as compared with the same events when they provide no intersensory redundancy.
Keywords: intersensory perception, event-related potentials, infancy, attention
Introduction
It is well established that multimodal stimulation is highly salient and promotes heightened attention, perceptual processing and memory in human infants and adults as well as nonhuman animal infants (Bahrick & Lickliter, 2000; 2002; 2012; Lewkowicz, 2000; Lickliter & Bahrick, 2000). What accounts for the attentional salience of multimodal stimulation? Multimodal stimulation often provides intersensory redundancy, the synchronous co-occurrence of the same amodal information (e.g., rhythm, tempo, intensity changes) across two or more sense modalities (Bahrick & Lickliter, 2000). Infants are routinely exposed to redundant (e.g., synchronous faces and voices) and nonredundant (e.g., faces or voices alone, or one person’s face moving out of synchrony with another person’s voice) stimulation in their everyday lives.
Bahrick and Lickliter (2000, 2002) have proposed the intersensory redundancy hypothesis (IRH), a model that describes the central role of selective attention to intersensory redundancy (i.e., temporal synchrony) in guiding early perceptual and cognitive development. Research has demonstrated that selective attention to intersensory redundancy is a cornerstone of perceptual learning and early cognitive development (Bahrick & Lickliter, 2000, 2012; Lewkowicz, 2000). The IRH proposes that temporal synchrony across two or more sensory systems promotes attention to redundantly specified properties of objects and events (e.g., rhythm, tempo, affect) at the expense of other nonredundantly specified properties, particularly in early development when attentional resources are most limited.
Findings at the behavioral level have provided support for the intersensory redundancy hypothesis. For example, Bahrick and Lickliter (2000) habituated 5-month-old infants to a hammer tapping a complex rhythm in redundant (synchronous audiovisual) versus nonredundant (unimodal visual, unimodal auditory, or asynchronous audiovisual) stimulation. Only infants habituated to the rhythm in the synchronous condition demonstrated dishabituation (i.e., increased looking) to a change in rhythm. Similarly, Flom & Bahrick (2007) found that infants were able to discriminate the affect of a woman speaking by 4 months of age when exposed to redundant stimulation (synchronous audiovisual speech) but not nonredundant stimulation (unimodal visual, unimodal auditory or asynchronous audiovisual speech). Discrimination of affect during unimodal visual and unimodal auditory speech emerged later in development. Thus, the redundant, synchronous presentation of affective information across the auditory and visual sense modalities enhanced infants’ perceptual processing of amodal information, a process referred to as intersensory facilitation (Bahrick & Lickliter, 2002, 2012). The consistent finding (Bahrick, Flom, & Lickliter, 2002; Bahrick & Lickliter, 2000, 2012; Flom & Bahrick, 2007) that young infants are able to abstract amodal stimulus properties of redundant audiovisual stimuli at an earlier age than the same amodal stimulus properties can be abstracted from nonredundant audiovisual stimuli indicates that intersensory redundancy fosters enhanced perceptual processing in early infancy.
What are the underlying mechanisms that support enhanced perceptual processing of redundant information in early development? To date most developmental research has focused on the behavioral level. Behavioral findings indicate that infants are able to perceive amodal information provided by multimodal sensory stimulation at a very early age (for reviews see Bahrick & Pickens, 1994; Lewkowicz, 2000; Lickliter & Bahrick, 2000; Walker-Andrews, 1997). Research suggests that this ability guides young infants’ selective attention and is fundamental to their unitary perception of meaningful events (see Bahrick & Lickliter, 2002, 2012; Gibson & Pick, 2000). Although a large body of research demonstrates impressive intersensory processing skills in human and nonhuman animal infants (Bremner, Lewkowicz, & Spence, 2012; Calvert, Spence, & Stein, 2004; Lewkowicz & Lickliter, 1994), there is currently little understanding of the mechanisms underlying this ability. For example, it is not known if redundantly presented information is processed more efficiently because intersensory redundancy serves as a salient, attention-getting stimulus (Cohen, 1972) or if redundant information is simply easier to process, but no more salient, than nonredundant information. There is also little known about the neural processes involved in processing intersensory redundancy in infancy.
The majority of what we do know about neural processes involved in multimodal perception in early development is based on comparative work with non-human animal subjects (e.g., Jay & Sparks, 1984; Stein, Meredith, & Wallace, 1994; Wallace & Stein, 1997; Wallace, Wilkinson, & Stein, 1996). This line of research has demonstrated “superadditive” effects of multimodal stimulation on firing rates of neurons in the superior colliculus of young cats (Wallace & Stein, 1997) and monkeys (Jay & Sparks, 1984; Wallace, Wilkinson, & Stein, 1996). Furthermore, comparative work with cats and primates indicates that multisensory neurons are distributed throughout the cerebral cortex, including areas classically viewed as unisensory domains (for reviews see Ghazanfar & Schroeder, 2006; Stein & Stanford, 2008). Cortical areas commonly identified as multisensory regions in monkeys include the superior temporal sulcus (STS; Bruce, Desimone, & Gross, 1981; Hikosaka, 1993), the intraparietal complex (IP; Mazzoni et al., 1999; Linden et al., 1999), and the frontal cortex (Benevento et al., 1977).
Neuroimaging studies using human participants have also demonstrated that these areas are involved in multimodal processing in adulthood (e.g., Beauchamp, 2005; Gobbelè et al., 2003; Lutkenhoner et al., 2002). For example, research indicates the STS is actively involved in processing audiovisual speech in adult participants (Senkowski, Saint-Amour, Gruber, & Foxe, 2008), and recent fMRI work (Marchant, Ruff, & Driver, 2012) demonstrates a significantly higher BOLD response to synchronous audiovisual stimuli compared to asynchronous audiovisual stimuli in the STS, superior temporal gyrus, thalamus, and putamen. Furthermore, Laurienti and colleagues (2003) found increased BOLD responses in the anterior cingulate gyrus and medial prefrontal cortex to matching (or congruent) audiovisual stimuli compared to non-matching (incongruent) audiovisual stimuli. Interestingly, studies utilizing cortical source analyses with infant participants indicate that these frontal areas are likely sources of the Nc ERP component associated with infant attention and visual preferences (Reynolds, Courage, & Richards, 2010; Reynolds & Richards, 2005). Thus, research across species demonstrates that multiple cortical and subcortical areas are involved in multimodal processing; however, little is known about neural processing of multimodal stimulation in infancy, due in part to practical and ethical concerns related to use of standard neuroimaging techniques (e.g., PET, fMRI) with human infants (Reynolds & Richards, 2009). Here we focus on one aspect of neural processing of multimodal stimulation in infancy, the neural underpinnings of attention to intersensory redundancy.
Although behavioral (i.e., habituation) findings indicate that intersensory redundancy promotes selective attention to and perceptual processing of amodal stimulus properties in infancy, the underlying neural processes are relatively unknown and the point in the information processing stream at which facilitation occurs cannot be determined based on behavioral findings alone. The ERP is a particularly useful measure for examining component processes (e.g., orienting, attention, memory) involved in perceptual and cognitive processing that potentially occur within the course of a single look (see Reynolds & Guy, 2012). The ERP represents voltage oscillations in the electroencephalogram (EEG) that are time-locked to a specific event of interest. The ERP is averaged across trials to increase the signal-to-noise ratio in the EEG, and components can be identified in the averaged waveform that are associated with specific aspects of perceptual and cognitive processing. ERP work with adult and infant participants indicates that components associated with early auditory and visual processing are significantly greater in amplitude following multimodal audiovisual stimulus presentations when compared to the sum of unimodal auditory and visual presentations (e.g., Giard & Peronnet, 1999; Hyde, Jones, Porter, & Flom, 2010; Molholm et al., 2002; Santangelo, Van der Lubbe, Olivetti-Berlardinelli, & Postma, 2008).
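To make the averaging step concrete, a minimal sketch of stimulus-locked epoching and trial averaging is given below (Python/NumPy); the array layout, variable names, and parameter values other than the −200 ms to 1.75 s epoch window used in the present analyses are illustrative assumptions rather than details of any published pipeline.

```python
import numpy as np

def average_erp(eeg, onsets, sfreq=250.0, tmin=-0.2, tmax=1.75):
    """Average stimulus-locked EEG epochs into an ERP.

    eeg    : array of shape (n_channels, n_samples), continuous EEG in microvolts
    onsets : sample indices of stimulus onsets
    tmin/tmax : epoch window in seconds relative to stimulus onset
    """
    start, stop = int(tmin * sfreq), int(tmax * sfreq)
    # Keep only onsets whose full epoch fits within the recording.
    onsets = [o for o in onsets if o + start >= 0 and o + stop <= eeg.shape[1]]
    epochs = np.stack([eeg[:, o + start:o + stop] for o in onsets])
    # Activity that is not time-locked to the stimulus tends to average toward
    # zero across trials, which is what improves the signal-to-noise ratio.
    return epochs.mean(axis=0)
```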
In infant ERP research, the Nc component has been shown to be associated with visual attention and stimulus salience (Courchesne, Ganz, & Norcia, 1981; de Haan & Nelson, 1997; Reynolds et al., 2005, 2010; Richards, 2003). The Nc is a negatively-polarized ERP component that occurs between 350 and 750 ms after stimulus onset over midline electrodes. A common finding across early infant ERP studies was that Nc is greater in amplitude following oddball (or rare) stimulus presentations than following standard stimulus presentations (Courchesne, 1977; Courchesne, Ganz, & Norcia, 1981; Karrer & Ackles, 1987; 1988; Karrer & Monti, 1995; Nikkel & Karrer, 1994). More recent findings indicate that Nc amplitude is impacted by stimulus salience as opposed to frequency of presentation or novelty per se (de Haan & Nelson, 1997, 1999; Reynolds & Richards, 2005; Reynolds, Courage, and Richards, 2010). Reynolds, Courage, and Richards (2010) integrated a behavioral measure of infant visual preferences (i.e., paired comparison trials) into an ERP study and found that Nc is greatest in amplitude to the infant’s preferred stimulus regardless of novelty or familiarity. These findings indicate that the Nc component is associated with infant visual attention and varies in amplitude based on stimulus salience (Courchesne et al., 1981; Nelson, 1994; Reynolds et al., 2010; Richards, 2003). Thus, if intersensory redundancy recruits infant attention, then Nc would be expected to be greater in amplitude to redundant multimodal stimulation than nonredundant stimulation.
The late slow wave (LSW) is believed to reflect stimulus encoding, and differential amplitude of the LSW based on stimulus type is an electrophysiological index of infant recognition memory (see review, de Haan, 2007). The LSW typically occurs from 1 to 2 s following stimulus onset over temporal and frontal leads. A consistent finding across studies is that the LSW demonstrates a reduction in amplitude with repeated stimulus presentations that is indicative of stimulus encoding (de Haan & Nelson, 1999; Reynolds, Guy, & Zhang, 2011; Snyder, 2010; Snyder, Webb, & Nelson, 2002; Snyder et al., 2010; Webb, Long, & Nelson, 2005; Wiebe et al., 2006). Given the behavioral findings demonstrating enhanced processing of redundant multimodal stimuli in infancy (Bahrick, Flom, & Lickliter, 2002; Bahrick & Lickliter 2000; Flom & Bahrick, 2007), infants would be expected to require less exposure to a redundant multimodal stimulus in comparison to a nonredundant stimulus (multimodal or unimodal) in order to demonstrate a significant reduction in LSW amplitude. Such an effect would provide evidence of more efficient processing of redundant stimuli at the neural level.
In this paper, we describe two studies designed to test neural mechanisms underlying the salience and enhanced processing of intersensory redundancy. For consistency with prior behavioral research in this area (e.g., Flom and Bahrick, 2007), we exposed 5-month-old infants to videos of a woman speaking providing intersensory redundancy (synchronous audiovisual speech) versus no redundancy (asynchronous audiovisual or unimodal visual speech). Five-month-old infants are skilled at discriminating synchrony from asynchrony and at detecting amodal properties, including rhythm, tempo, and affective information common to faces and voices (Bahrick & Lickliter, 2012). Experiment 1 was designed to test the attentional salience of intersensory redundancy as reflected by the amplitude of the Nc component. If intersensory redundancy provided by multimodal stimulation is highly salient and captures infant attention (consistent with the IRH, Bahrick & Lickliter, 2000), then infants should show a greater amplitude Nc component to stimuli depicting intersensory redundancy (synchronous audiovisual) than to stimuli depicting no intersensory redundancy (asynchronous audiovisual, or unimodal visual). Experiment 2 was designed to learn more about mechanisms underlying intersensory processing. If intersensory redundancy promotes enhanced perceptual processing in comparison to nonredundant stimulation, then this should be reflected by significant changes in the LSW with repeated exposure to a redundant multimodal stimulus. If infants demonstrate enhanced intersensory processing due to deeper levels of attentional engagement, then changes in LSW amplitude over time should be paired with greater amplitude Nc to redundant audiovisual stimuli.
Experiment 1: Intersensory Redundancy and Attentional Salience as assessed by the Nc Component
In Experiment 1, we utilized high-density EEG to examine the impact of multimodal (synchronous and asynchronous) audiovisual and unimodal visual stimulus presentations on ERP components associated with visual attention and face processing. We tested infants at 5 months of age for consistency with previous behavioral work in the area (e.g., Bahrick & Lickliter, 2000). We exposed infants to repeated presentations of a woman speaking a short phrase under conditions depicting three stimulus types: unimodal visual (video of a woman speaking with no soundtrack), synchronous audiovisual (video of a woman speaking with synchronous soundtrack), and asynchronous audiovisual (video of a woman speaking with temporally asynchronous soundtrack).
Three studies (Grossmann, Striano, & Friederici, 2006; Hyde, Jones, Flom, & Porter, 2011; Vogel, Monesson, & Scott, 2012) have examined the effects of audiovisual face-voice pairings on the Nc component in infant participants. Findings from these studies have been somewhat inconsistent regarding the effects of congruency across auditory and visual stimulus components. For example, Grossmann and colleagues (2006) found that infants demonstrate greater negativity to face-voice pairings conveying incongruent emotional information compared to face-voice pairings conveying congruent emotional information. In contrast, Vogel and colleagues (2012) found that infants demonstrate greater amplitude Nc, indicating greater attention, to face-voice pairings conveying congruent emotional information.
Vogel and colleagues (2012) speculated that inconsistency across studies in the direction of congruency effects may be due to differences in stimuli used or task design. Grossmann and colleagues (2006) first presented a face displaying a happy or angry facial expression and then presented an audio clip of a woman speaking in a happy or angry tone. The ERPs were time-locked to the audio presentation of the woman speaking. In contrast, Vogel and colleagues first played an audio clip of a woman speaking in a happy or sad tone, and then presented a face displaying a happy or sad facial expression. The ERPs were time-locked to the visual presentation of the face, which is typical of most studies examining Nc. Thus, the inconsistency may have been due to the fact that in one study the audio component always served as the source of information for detecting congruity or incongruity (Grossmann et al., 2006), and in the other study the video component always served as the source of information for detecting congruity or incongruity (Vogel et al., 2012). It is also worthwhile to note that the “negative component” analyzed in response to the audio clips in Grossmann and colleagues’ (2006) study was actually a relatively slight negative deflection that occurred following a high amplitude positive-going change in the ERP waveform. Thus, their “negative component” was actually positively-polarized, and the “greater negativity” observed in response to incongruent stimuli could just as easily be interpreted as a greater amplitude positive component occurring in response to congruent stimuli. Since the audio and video components of the face-voice pairings were not presented in synchrony in either of these studies, the stimuli used did not provide intersensory redundancy.
In the only published study to date to specifically examine the effects of redundancy provided through audiovisual synchrony on infant ERPs, Hyde and colleagues (2011) exposed 5-month-old infants to synchronous and asynchronous audiovisual presentations of a woman speaking. In the synchronous condition, infants heard an audio clip paired simultaneously with a matching video clip (i.e., video and audio of a woman saying, “Hi baby”). In the asynchronous condition, the infants heard the same audio clip paired with the simultaneous presentation of a non-matching video clip (i.e., video clip showed a woman mouthing, “You’re such a beautiful baby”). In contrast to what would be expected based on the behavioral literature (Bahrick & Lickliter, 2000; Flom & Bahrick, 2007), the authors found that infants demonstrated greater amplitude Nc to the asynchronous face-voice pairings compared to the synchronous face-voice pairings, and concluded that the greater amplitude Nc to asynchronous audiovisual stimuli reflected detection of a novel stimulus category.
Although their conclusion regarding this effect may be correct, Hyde and colleagues did not fully balance the audio and visual components in their synchronous and asynchronous conditions. The phrase used as the audio component, “Hi baby,” remained constant across all stimulus presentations. Thus, the video clip of a woman mouthing, “You’re such a beautiful baby,” only occurred in the asynchronous condition and was incongruent with the ongoing, repeated auditory presentations of the phrase, “Hi baby.” Given that Nc is greater in amplitude to low-frequency or oddball stimuli (Courchesne et al., 1977, 1981; Reynolds & Richards, 2005) and the amplitude of Nc is likely influenced by overall procedural context (Richards, 2003), the presentation of the non-matching video clip may have led to an oddball effect occurring against the standard presentation of the auditory stimulus (i.e., the “Hi baby” phrase) in Hyde and colleagues’ (2011) study.
We used a balanced design for our synchronous and asynchronous stimulus conditions in the current study, utilizing two different phrases for both the video and audio components of our stimuli to avoid creating a “standard” stimulus and potential “oddball” effects. Consistent with the intersensory redundancy hypothesis and behavioral findings (Bahrick & Lickliter, 2000, 2002, 2012), we predicted that with the greater level of control in the current study, infants would show greater attention to synchronous audiovisual presentations and this would be associated with greater amplitude Nc when compared to asynchronous audiovisual and unimodal visual trials. Greater amplitude Nc in the synchronous compared to asynchronous condition would allow us to rule out the possibility that the differences across groups were simply based on additive effects (audio plus visual as compared with visual only). Amount and type (auditory and visual) of stimulation were equated across synchronous and asynchronous conditions and only the redundancy differed between them.
A secondary goal of Experiment 1 was to conduct a more exploratory and descriptive analysis of the potential impact of intersensory redundancy on ERP components associated with face processing and speech perception in infancy. The ERP components of interest for these analyses included the N290 and P400 components associated with face processing (e.g., de Haan, Johnson, & Halit, 2007; Farroni, Csibra, Simion, & Johnson, 2002; Halit, de Haan, & Johnson, 2003); and the auditory P1 and N250 components associated with speech perception (Benasich et al., 2006; Rivera-Gaxiola, Klarman, Garcia-Sierra, & Kuhl, 2005; Rivera-Gaxiola, Silva-Pereya, & Kuhl, 2005).
The N290 and P400 have been identified as ERP components related to face-processing in infancy (de Haan, Johnson, & Halit, 2007). The N290 is a negatively-polarized component that occurs over midline and posterior electrodes with peak latency between 290 and 350 ms after stimulus onset (Halit, de Haan, & Johnson, 2003). By 3 months of age, the N290 is greater in amplitude to faces than noise (Halit, Csibra, Volein, & Johnson, 2004). The P400 is a positive-going component that occurs over posterior midline and lateral electrodes and reaches peak amplitude between 390 and 450 ms after stimulus onset. By 6 months of age, the P400 has a shorter latency to peak in response to faces than objects (de Haan & Nelson, 1999), and by 12 months of age, the P400 is shorter in latency to upright versus inverted human faces (Halit et al., 2003).
The auditory P1 component is the first positive peak (also referred to as P150) in the ERP waveform that occurs across the scalp with a peak latency between 150 and 250 ms after stimulus onset. The P1 is sensitive to native and non-native speech contrasts in 6- and 12-month-olds (Rivera-Gaxiola, Klarman, Garcia-Sierra, & Kuhl, 2005; Rivera-Gaxiola, Silva-Pereya, & Kuhl, 2005), and is similar in latency to the auditory P2 component that has been shown to be associated with auditory recognition memory in newborn infants (e.g., de Regnier, Nelson, Thomas, Wewerka, & Georgieff, 2000; de Regnier, Wewerka, Georgieff, Mattia, & Nelson, 2002). ERP studies on audiovisual speech perception in infants have had inconsistent results. For example, some studies have found a reduction in amplitude of early auditory components in response to phonemes preceded by (Bristow et al., 2008) or paired with (Kushnerenko, Teinonen, Volein, & Csibra, 2008) congruent visual cues, paralleling findings with adults (van Wassenhove, Grant, & Poeppel, 2005). In contrast, Hyde and colleagues (2011) found increased amplitude of early auditory components in response to speech paired with congruent visual cues compared to speech paired with incongruent visual cues.
With respect to our secondary analyses of face processing components (N290 and P400) and speech processing components (auditory P1), we made no specific predictions regarding differential effects of redundant and nonredundant audiovisual stimuli. However, due to the potential additive effects of combining auditory and visual stimulation, we predicted that both audiovisual conditions (synchronous and asynchronous) would be associated with greater amplitude ERP across these components compared to the unimodal visual condition. Because the auditory and visual components of the asynchronous stimulus we used were spatially co-located and contained synchronous stimulus onset, we reasoned that basic multimodal additive effects would occur in both audiovisual conditions, but the predicted attention-related effect of intersensory redundancy on Nc amplitude would only occur in the synchronous audiovisual condition.
Method
Participants
A sample of 15 infants (9 male, 6 female) was tested at 5 months of age. Infants were tested within one week of reaching 22 weeks of age. Only infants born full term (at least 38 weeks gestation) without complications and of normal birth weight were recruited. Participants were drawn from a predominantly Caucasian and middle-class population. The ethnic/racial distribution of participants was: 14 Caucasian (not Hispanic), and 1 Biracial. An additional 23 infants were tested, but not included in the final sample due to fussiness/distractibility (N = 6), excessive artifact in the EEG (N = 13), and technical problems (N = 4). This level of attrition falls within the typical range of 50–75% for infant ERP studies (DeBoer, Scott, & Nelson, 2007).
Apparatus
Participants were positioned on their parent’s lap in a sound-attenuated room. They were seated 55 cm away from a 27″ color LCD monitor (Dell 2707 WFP) with a 60-Hz refresh rate. Speakers were positioned directly behind the monitor for presenting the auditory components of bimodal stimulus presentations. A digital camcorder (Sony DCR-HC28) was located just below the monitor in order to judge infant visual fixations. Fixations were judged online using a video feed to a computer in the experiment control room, adjacent to the testing room. The video was recorded using Netstation software (Electrical Geodesics Incorporated; EGI), which was also used to record the EEG data and to synchronize the EEG with the video.
Stimuli
Test Stimuli
Infants were exposed to three different stimulus types: unimodal visual, synchronous audiovisual, and asynchronous audiovisual speech. Importantly, two exemplars (depicting different phrases) were used for each stimulus type resulting in a total of 6 test stimuli. The unimodal visual stimuli consisted of dynamic videos without soundtracks, the synchronous audiovisual stimuli consisted of dynamic videos with temporally matching soundtracks, and the asynchronous audiovisual stimuli consisted of dynamic videos with temporally mismatching soundtracks. All three stimulus types consisted of a female adult actress reciting one of two phrases (“Come over here by me!” or “Where’s the baby going?”) in infant-directed speech using positive affect. For the asynchronous audiovisual condition, the soundtracks were swapped across the two videos. For example, the video depicting the actress saying, “Come over here by me!” was accompanied by the soundtrack, “Where’s the baby going?” and vice versa. This presentation provided a somewhat stringent test of redundancy/synchrony detection in that the audiovisual onset and offset synchrony were preserved in both conditions (the soundtracks to both occurred only while the faces were visible and moving rather than beginning before or terminating after the movement in the asynchronous condition) and only the internal temporal synchrony of the movements of speech with respect to the temporal structure of the sounds of speech was incongruent during asynchronous presentations. All stimuli were 1700 ms in duration and subtended a 33° vertical by 39° horizontal visual angle. The soundtracks of the audiovisual stimuli were presented at 60 dB at the position of the infant during testing. The videos consisted of close-up footage of the actress’ face (from the neck-line up). A single actress, positioned in front of a blue-gray background, was used for all stimuli. The stimuli were drawn from the positive affect subset of stimuli used in Flom and Bahrick (2007).
Sesame Street characters
Videos of Sesame Street characters were used as attractor stimuli. The Sesame Street videos covered a 15° square area centered on the monitor.
Procedure
Infants were held on a parent’s lap approximately 55 cm from the center of the computer monitor. They were fitted with an EGI sensor net and impedances were measured. The test phase consisted of repeated presentations of the unimodal visual, synchronous audiovisual, and asynchronous audiovisual stimuli. The stimuli were presented for 1700 ms, followed by a blank blue-gray screen with a random duration of 950 to 1200 ms. Stimulus type presentations were equally distributed across trials in random order. Stimulus presentations were initiated only when the infant was judged to be looking at the monitor. During periods of distraction, the Sesame Street videos were presented as attractor stimuli; subsequent stimulus presentations were always preceded by a blank screen for at least 500 ms. The procedure continued until the infant became tired or fussy (approximately 10 min on average).
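As a schematic illustration of this gaze-contingent trial structure (not the actual E-Prime protocol), the control logic might be sketched as follows; the callables passed in (infant_is_looking, present, show_attractor, show_blank) are hypothetical placeholders.

```python
import random
import time

def run_test_phase(stimuli, infant_is_looking, present, show_attractor, show_blank,
                   max_minutes=10):
    """Schematic sketch of the gaze-contingent trial loop (illustrative only)."""
    end_time = time.time() + max_minutes * 60
    while time.time() < end_time:
        if not infant_is_looking():
            show_attractor()                      # Sesame Street attractor during distraction
            show_blank(0.5)                       # blank screen for at least 500 ms
            continue
        stimulus = random.choice(stimuli)         # stimulus types equally likely, random order
        present(stimulus, 1.7)                    # 1700-ms presentation
        show_blank(random.uniform(0.95, 1.2))     # blank blue-gray screen, 950-1200 ms
```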
Fixation Judging
In addition to judging infant fixations online for the purpose of experimental control during testing, fixations were also judged offline by a trained rater to determine if the participant was looking during each ERP trial. ERP trials in which the infant was not looking at any point during the stimulus presentation were not included in analyses.
EEG recording and analyses
The Electrical Geodesics Incorporated (EGI) Geodesic EEG System 300 (GES 300) 128 channel EEG recording system was used. The EGI Netstation program was used for A/D sampling, data storage, zero and gain calibration for each channel, and measuring impedances. Electrodes were adjusted until impedance values ranging from 10 to 50 kΩ were achieved. The Netstation program received serial communication from a Dell Workstation used to control the experimental protocol with E-Prime 2.0 software (Psychology Software Tools, Inc.). The sampling rate of the EEG was 250 Hz (4 ms samples) and band-pass filters were set from 0.1 to 100 Hz, with 20K amplification. EEG recordings were referenced to the vertex and algebraically re-referenced to the average reference.
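For illustration, re-referencing to the average reference amounts to subtracting the across-channel mean from every channel at each sample; the NumPy sketch below shows the operation (it is not the Netstation implementation).

```python
import numpy as np

def rereference_to_average(eeg):
    """Re-reference vertex-referenced EEG to the average reference.

    eeg : array of shape (n_channels, n_samples)
    Subtracting the mean across channels at each sample is algebraically
    equivalent to referencing every channel to the average of all channels.
    """
    return eeg - eeg.mean(axis=0, keepdims=True)
```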
The EEG recordings were inspected for artifacts (i.e., blinks, saccades, movement artifact, and drift) and poor recordings using the Netstation review system. Individual channels were marked bad within trials if such artifacts occurred. Segments in which more than 10% of the channels were marked bad were eliminated from the analysis. For trials that were retained for the ERP analysis, individual channels marked bad were replaced using a spherical spline interpolation (Perrin, Pernier, Bertrand, Giard, & Echallier, 1987; Srinivasan, Tucker, & Murias, 1998). Only those participants who retained enough ERP trials per condition (i.e., at least 10 trials) for stable ERP averages following EEG editing were included in the final dataset (DeBoer, Scott, & Nelson, 2007). The number of trials included in the averages did not differ significantly (p > .10) across stimulus types (Ms = 16.4 asynchronous audiovisual, 16.9 synchronous audiovisual, and 14.9 unimodal visual).
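The trial-editing logic can be approximated by the sketch below; the ±100-μV peak-to-peak criterion and function names are illustrative assumptions, since the actual editing relied on visual inspection with the Netstation review system and spherical spline interpolation of bad channels.

```python
import numpy as np

def edit_epochs(epochs, threshold_uv=100.0, max_bad_fraction=0.10):
    """Flag bad channels per trial and drop trials with too many bad channels.

    epochs : array of shape (n_trials, n_channels, n_samples), in microvolts
    Returns the retained trials and a per-trial boolean mask of bad channels,
    which would then be repaired (e.g., by spherical spline interpolation).
    """
    # A channel is marked bad within a trial if its peak-to-peak range is extreme
    # (a crude stand-in for blink, saccade, movement, and drift detection).
    ptp = epochs.max(axis=2) - epochs.min(axis=2)   # (n_trials, n_channels)
    bad = ptp > threshold_uv
    keep = bad.mean(axis=1) <= max_bad_fraction     # drop trials with >10% bad channels
    return epochs[keep], bad[keep]
```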
ERP averages were calculated from 200 ms before stimulus onset through 1.75 s after stimulus onset. For increased stability, we analyzed the ERP averaged across multiple channels. Nc peak amplitude and latency to peak were analyzed from 350 – 750 ms following stimulus onset at midline frontal (4, 10, 11, 16, 18, 19), midline central (7, 31, 55, 80, 106), and midline parietal (61, 62, 67, 72, 77, 78) electrode locations. For the N290 component, mean amplitude from 190 – 290 ms following stimulus onset was analyzed at left occipital (65, 69, 70), midline occipital (74, 75, 82), and right occipital (83, 89, 90) electrode clusters. For the P400 component, we analyzed mean amplitude from 300 – 500 ms following stimulus onset examining the same electrode locations as the N290 analysis. Electrodes were chosen for the analyses based on past research in the area and visual inspection of the grand average ERP waveforms (DeBoer, Scott, & Nelson, 2007).
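These component measures reduce to simple window operations on the cluster-averaged waveform. The sketch below assumes the 250-Hz sampling rate and 200-ms prestimulus baseline described above; the helper names are illustrative.

```python
import numpy as np

SFREQ = 250.0     # sampling rate in Hz
BASELINE = 0.2    # 200-ms prestimulus period included in each epoch

def window_indices(tmin, tmax):
    """Convert a post-stimulus window (in seconds) to sample indices within the epoch."""
    return int((BASELINE + tmin) * SFREQ), int((BASELINE + tmax) * SFREQ)

def nc_peak(erp_cluster):
    """Peak (minimum) amplitude and latency of Nc in the 350-750 ms window.

    erp_cluster : 1-D array, ERP averaged across the electrodes in a cluster.
    """
    lo, hi = window_indices(0.350, 0.750)
    segment = np.asarray(erp_cluster)[lo:hi]
    i = int(segment.argmin())                       # Nc is negative-going
    latency_ms = (lo + i) / SFREQ * 1000 - BASELINE * 1000
    return segment[i], latency_ms

def mean_amplitude(erp_cluster, tmin, tmax):
    """Mean amplitude in a window, e.g., 0.190-0.290 s for N290 or 0.300-0.500 s for P400."""
    lo, hi = window_indices(tmin, tmax)
    return np.asarray(erp_cluster)[lo:hi].mean()
```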
Design for Statistical Analysis
The design included the experimental factors of stimulus type (unimodal visual, synchronous bimodal, asynchronous bimodal) and electrode location (levels varied by component) as repeated measures. Repeated-measures ANOVAs were used in all analyses and the Greenhouse-Geisser correction was applied in cases of violations of the assumption of sphericity. For significant effects, follow-up analyses were conducted using one-way ANOVAs and paired-samples t-tests. Effect sizes (ηp2) are reported for all significant effects, and an alpha level of .05 was used for all significance tests.
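As one way to carry out this analytic step, the sketch below submits a long-format table of Nc peak amplitudes to a two-way repeated-measures ANOVA using statsmodels; the values are simulated placeholders (not study data), the column names are illustrative, and the Greenhouse-Geisser adjustment and follow-up comparisons are not shown.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated placeholder data: one row per infant x stimulus type x electrode region,
# holding the Nc peak amplitude (in microvolts) for that cell.
rng = np.random.default_rng(0)
stimuli = ["unimodal_visual", "synchronous_av", "asynchronous_av"]
regions = ["frontal", "central", "parietal"]
rows = [(s, stim, reg, rng.normal(-10, 3))
        for s in range(1, 16) for stim in stimuli for reg in regions]
df = pd.DataFrame(rows, columns=["subject", "stimulus", "electrode", "nc_amp"])

# 3 (stimulus type) x 3 (electrode location) repeated-measures ANOVA.
res = AnovaRM(df, depvar="nc_amp", subject="subject",
              within=["stimulus", "electrode"]).fit()
print(res.anova_table)
```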
Results
Primary Analyses: The Nc Component
Our primary analyses assessed the salience of intersensory redundancy as reflected by the Nc component. The amplitude of the Nc component was expected to be greater for redundant (synchronous) than nonredundant (both asynchronous and unimodal visual) stimuli if intersensory redundancy is the basis for the salience of multimodal stimulation in early development. To analyze peak (minimum) amplitude of Nc, we conducted a two-way ANOVA with electrode location (3: midline frontal, midline central, midline parietal) and stimulus type (3: unimodal visual, synchronous bimodal, asynchronous bimodal) as within-subjects factors. There was a significant main effect for electrode location, F(2, 28) = 22.90, p < .001, ηp2 = .621, with greater amplitude Nc at parietal electrodes than central and frontal electrodes. This main effect was qualified by a significant electrode by stimulus type interaction, F(4, 56) = 5.12, p < .001, ηp2 = .268. A follow-up ANOVA on parietal electrodes revealed a significant main effect for stimulus type. Consistent with our predictions, infants demonstrated greater amplitude Nc to synchronous audiovisual (M = −16.44 μV) than asynchronous audiovisual (M = −13.10 μV, p = .019) and unimodal visual (M = −11.99 μV, p = .028) stimuli (see Figure 1).
We analyzed latency to peak for the Nc component using the same statistical approach as above and found similar effects. There was a significant interaction of electrode and stimulus type, F(4, 56) = 3.34, p = .016, ηp2 = .193. At parietal electrodes, infants demonstrated shorter latency to peak Nc for multimodal stimulus presentations (M = 490.76 ms and 489.50 ms for asynchronous and synchronous respectively) compared to unimodal visual presentations (M = 558.04 ms; p < .05 for both comparisons).
Secondary Analyses
We conducted secondary analyses of ERP components involved in face and speech processing. After visual inspection of the grand average waveforms, we focused these analyses on the N290 and P400 components involved in face processing in infancy, and the auditory P1 involved in speech processing. The N290 and P400 were analyzed at occipital electrodes, and the auditory P1 was analyzed at anterior temporal electrode sites. We predicted significant differences between both audiovisual conditions compared to the unimodal visual condition due to the additive effects of combining auditory and visual stimuli.
The N290 Component
We analyzed the mean amplitude and peak latency of the N290 using a two-way ANOVA with electrode location (3: left occipital, midline occipital, right occipital) and stimulus type (3: unimodal visual, synchronous audiovisual, asynchronous audiovisual) as within-subjects factors. For mean amplitude, there were significant main effects of electrode location, F(1, 28) = 5.92, p < .01, ηp2 = .297, and stimulus type, F(2, 28) = 5.02, p = .016, ηp2 = .264. Infants demonstrated greater amplitude N290 at midline occipital electrodes (M = −9.19 μV) than left occipital (M = −3.46 μV; p < .01) and right occipital (M = −4.96 μV; p < .05) electrodes. Infants also demonstrated greater amplitude N290 on multimodal trials than unimodal visual trials (p < .05 for both comparisons; see Figure 2). No differences were found between the synchronous and asynchronous audiovisual conditions, and there were no significant effects related to stimulus type for latency to peak for the N290 component.
The P400 Component
We analyzed the mean amplitude and latency to peak of the P400 using the same statistical approach as our analysis of the N290 component. For amplitude analyses, there was a significant main effect of stimulus type, F(2, 28) = 5.94, p < .01, ηp2 = .298. Similar to the N290 effect, infants demonstrated greater negativity in P400 amplitude on multimodal trials when compared to unimodal visual trials (p < .05 for both comparisons; see Figure 2). However, inspection of Figure 2 indicates that these differences were possibly due to the N290 effect, as the amount of change occurring in the waveforms from the peak of the N290 to the peak of the P400 was similar across conditions. Thus, we conducted a follow-up analysis examining the peak-to-peak change in amplitude that occurred from the peak of the N290 to the peak of the P400. In the peak-to-peak analysis, no differences were found based on stimulus type. There were no significant effects for latency to peak for the P400 component.
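The peak-to-peak follow-up measure simply subtracts the N290 trough from the P400 peak within their respective windows; a brief illustrative sketch, assuming the same 250-Hz sampling rate and 200-ms baseline as above, is given below.

```python
import numpy as np

def n290_to_p400_peak_to_peak(erp_cluster, sfreq=250.0, baseline=0.2):
    """Peak-to-peak change from the N290 trough (190-290 ms) to the P400 peak (300-500 ms).

    erp_cluster : 1-D ERP averaged over an occipital electrode cluster; the epoch
    is assumed to include a 200-ms prestimulus baseline.
    """
    erp = np.asarray(erp_cluster)
    idx = lambda t: int((baseline + t) * sfreq)
    n290 = erp[idx(0.190):idx(0.290)].min()   # N290 is negative-going
    p400 = erp[idx(0.300):idx(0.500)].max()   # P400 is positive-going
    return p400 - n290
```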
The Auditory P1 component
We analyzed mean amplitude and latency to peak of the auditory P1 from 190–390 ms at left anterior temporal (34, 35, 39, 40, 41), and right anterior temporal (103, 109, 110, 115, 116) electrodes. There was a main effect for stimulus type, F(2, 28) = 5.78, p < .01, ηp2 = .292. Infants demonstrated greater amplitude auditory P1 to both asynchronous audiovisual (M = 6.00 μV, p = .005) and synchronous audiovisual (M = 5.45 μV, p = .038) stimuli than to unimodal visual stimuli (M = 1.00 μV). No differences were found between the asynchronous and synchronous audiovisual conditions (see Figure 3).
Discussion
Infants were exposed to redundant (synchronous) audiovisual, nonredundant (asynchronous) audiovisual, and unimodal visual presentations of a woman speaking, and ERP components associated with attention, face processing, and auditory processing were examined. Our main hypothesis for Experiment 1, consistent with the IRH, was that the salience of multimodal stimulation was based on intersensory redundancy and this attentional salience would be reflected by the Nc component. Thus, we predicted that the Nc component would be greater in amplitude following synchronous audiovisual presentations when compared to asynchronous audiovisual and unimodal visual presentations. This prediction was supported by an interaction of electrode and stimulus type on Nc amplitude. At midline parietal electrodes, infants demonstrated greater amplitude Nc in the synchronous audiovisual condition than the asynchronous audiovisual and unimodal visual conditions. Additionally, the latency to peak of the Nc component was shorter for both audiovisual conditions than the unimodal visual condition. These findings indicate greater sensitivity to multimodal presentations than unimodal presentations, and greater allocation of attention to redundant multimodal than to nonredundant multimodal and unimodal stimuli. These findings provide novel information about neural mechanisms underlying the facilitating effects of intersensory redundancy on infant attention.
Descriptive analyses of face processing components revealed that infants demonstrated greater amplitude of the N290 component following synchronous and asynchronous audiovisual presentations compared to unimodal visual presentations. While the results of our N290 analysis indicate that multimodal stimulation (regardless of synchrony) may enhance face processing in infants, we cannot rule out the possibility that this multimodal effect is simply due to linear superposition of the electrical activity associated with auditory and visual stimulation as opposed to enhanced responsiveness. Without a unimodal auditory condition, we cannot determine if this effect is superadditive. We chose not to include a unimodal auditory condition because it is generally advised that researchers limit the number of stimulus types to two or three in infant ERP research to avoid excessively high attrition rates (e.g., DeBoer et al., 2007). Infants also demonstrated greater amplitude auditory P1 in both audiovisual conditions compared to the unimodal visual condition. This was expected given the lack of auditory stimulation in the unimodal visual condition.
Experiment 2: Intersensory Redundancy and Processing Efficiency as assessed by the LSW and Nc Component
The findings from Experiment 1 indicate that redundant audiovisual stimuli elicit greater amplitude Nc than nonredundant audiovisual and unimodal visual stimuli. This enhanced neural responsiveness associated with attention may serve as a neural mechanism underlying the intersensory facilitation of perceptual learning and recognition memory that has been consistently found in infant habituation studies (e.g., Bahrick, Flom, & Lickliter, 2002; Bahrick & Lickliter, 2000; Flom & Bahrick, 2007). Experiment 2 was designed to examine the influence of redundant and nonredundant audiovisual stimuli on the LSW associated with stimulus processing (i.e., encoding) and recognition memory in infancy. A reduction in the amplitude of the LSW with repeated stimulus exposure is associated with recognition memory of a fully processed stimulus (de Haan, 2007; de Haan & Nelson, 1997, 1999; Reynolds, Guy, & Zhang, 2011). For example, Snyder (2010) found that infants who demonstrated a significant reduction in LSW amplitude at anterior temporal electrodes following repeated presentation of a single stimulus were more likely to show evidence of recognition memory for the previously viewed stimulus in behavioral testing than infants who showed no reduction in LSW amplitude.
In Experiment 2, we exposed 5-month-old infants to repeated presentations of a single stimulus (either synchronous audiovisual or asynchronous audiovisual) and utilized a block design (where the same trial type was presented across three blocks) to allow for comparison of the amplitude of the LSW across early to late trials. We utilized a between-subjects design and only presented a single stimulus to infants in each group to avoid potential interference effects from other stimulus types. Consistent with behavioral findings indicating enhanced perceptual processing and learning of redundant multimodal stimuli (e.g., Bahrick & Lickliter, 2000, 2002, 2012), we predicted that infants would demonstrate reduced amplitude LSWs across early to late trials in the synchronous audiovisual condition, but that no differences would be found in LSW amplitude across early to late trials in the asynchronous audiovisual condition. Based on the results of Experiment 1, we also predicted that infants would demonstrate greater amplitude Nc in the synchronous audiovisual condition than the asynchronous audiovisual condition. Taken together, these findings would reveal neural underpinnings of greater attention to and enhanced processing of redundant audiovisual stimuli compared to nonredundant audiovisual stimuli.
Method
Participants
Twenty-two 5-month-old infants (10 male, 12 female) were tested. Recruitment and inclusion criteria were the same as for Experiment 1. The ethnic/racial distribution of participants was: 20 Caucasian (not Hispanic), 1 Hispanic, and 1 Biracial. An additional 20 infants were tested, but not included in the final sample due to fussiness/distractibility (N = 9), excessive artifact in the EEG (N = 9), and technical problems (N = 2).
Stimuli
The stimuli were identical to those used in Experiment 1 with the exception that infants were only exposed to a single test stimulus as opposed to 6 test stimuli (i.e., 2 exemplars from 3 stimulus types in Experiment 1). Infants were exposed to either an exemplar from the synchronous audiovisual stimulus type or an exemplar from the asynchronous audiovisual stimulus type.
Procedure
The procedure was identical to Experiment 1 with the exception that participants were shown only a single, repeated stimulus, with half the participants receiving a synchronous audiovisual stimulus and half an asynchronous audiovisual stimulus. A block design was utilized. Infants were exposed to a total of 90 stimulus presentations consisting of 3 blocks of 30 trials each. The procedure lasted approximately 10 minutes.
Fixation Judging
Fixation Judging was done in the same manner as Experiment 1.
EEG recording and analyses
The general approach to EEG recording was the same as that used in Experiment 1. Only those participants who retained enough ERP trials per block (i.e., at least 10 trials) for stable ERP averages following EEG editing were included in the final dataset. Infants were more likely to become bored or fussy toward the end of the procedure, thus only a few infants contributed 10 artifact-free trials on the third block (trials 61 through 90). To avoid unreasonably high attrition, blocks 2 and 3 were combined into a “late” block for comparison with the “early” block of trials (i.e., block 1 – trials 1 through 30). Because the number of trials included in ERP averages can affect the amplitude of the ERP waveform, equal numbers of trials were included in the averages for the early and late blocks for each participant. For example, if an infant contributed 20 good trials to the ERP average for the early block (block 1), the first 20 artifact-free trials from blocks 2 and 3 were used in that participant’s ERP average for the late block. Using this blocking procedure, the average number of trials per block was 19.64 (SD = 3.76) for the synchronous audiovisual condition and 20.00 (SD = 4.84) for the asynchronous audiovisual condition.
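The trial-equating rule can be stated compactly as the sketch below; the function and variable names are illustrative, and "usable" trials are those surviving the artifact editing and fixation screening described for Experiment 1.

```python
def split_early_late(usable_trials, block_size=30, min_trials=10):
    """Form equal-sized early and late trial sets for one participant.

    usable_trials : sorted trial numbers (1-90) that survived artifact editing
                    and fixation screening.
    Early = usable trials from block 1 (trials 1-30); late = the first equally
    many usable trials from blocks 2 and 3 (trials 31-90). Returns None if the
    participant does not meet the minimum-trial criterion.
    """
    early = [t for t in usable_trials if t <= block_size]
    late_pool = [t for t in usable_trials if t > block_size]
    n = min(len(early), len(late_pool))
    if n < min_trials:
        return None
    return early[:n], late_pool[:n]
```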
Nc amplitude was examined using the same electrodes and approach used in Experiment 1. For the LSW, mean amplitude from 1 – 1.5 s following stimulus onset was analyzed at left anterior temporal (41, 45, 46, 50, 57), and right anterior temporal (100, 101, 102, 103, 108) electrodes. Electrode locations were chosen for the analysis based on past research and visual inspection of the grand average waveforms.
Design for Statistical Analysis
The design for the LSW analysis included stimulus type (synchronous audiovisual, asynchronous audiovisual) as a between-subjects factor, and block (early, late) and electrode location (left temporal, right temporal) as within-subjects factors. A mixed ANOVA was used for a full factorial analysis and follow-up analyses were done using one-way ANOVAs and paired-samples t-tests.
Results
Primary Analyses: The Late Slow Wave
A three-way ANOVA was conducted on mean amplitude of the LSW with electrode location (left temporal, right temporal) and block (early, late) as within-subjects factors, and stimulus type (synchronous audiovisual, asynchronous audiovisual) as a between-subjects factor. There was a significant interaction between block and electrode location, F(1, 20) = 5.44, p < .03, ηp2 = .214. Infants demonstrated greater amplitude LSWs, t(21) = −2.77, p = .012, at right temporal electrodes during the early block of trials (M = −5.08 μV) than during the late block of trials (M = .45 μV). Additionally, in the early block of trials, infants demonstrated greater amplitude LSWs at right temporal (M = −5.08 μV) compared to left temporal (M = .16 μV) electrodes, t(21) = −2.33, p = .03.
In support of our primary hypothesis that only infants in the synchronous audiovisual condition would demonstrate a reduction in amplitude of the LSW across early to late blocks of trials, there was a significant interaction between block and condition, F(1, 20) = 4.94, p = .04, ηp2 = .198. Infants in the synchronous audiovisual condition demonstrated a significant reduction, t(10) = −2.61, p = .026, in the amplitude of the LSW at temporal electrodes from early (M = −4.70 μV) to late trials (M = −2.65 μV). For the asynchronous condition, no differences, t(10) = .296, p = .773, were found in the amplitude of the LSW across early (M = −.22 μV) to late trials (M = −.62 μV). Furthermore, in the early block of trials, the amplitude of the LSW was significantly greater in the synchronous condition than in the asynchronous condition, t(20) = 2.73, p = .01. No differences in LSW amplitude were found across conditions in the late block of trials. Interestingly, when analyzing the LSW separately within each hemisphere, infants in the synchronous audiovisual condition (see top right panel of Figure 4) showed a significant reduction in LSW amplitude from early to late trials at right temporal electrodes, t(10) = −2.75, p = .02. In contrast, infants in the asynchronous audiovisual condition (see bottom left panel of Figure 4) showed a non-significant increase in LSW amplitude from early to late trials at left temporal electrodes, t(10) = 1.993, p = .07. No differences were found across early to late trials for any other comparisons (all ps > .10).
Secondary Analyses: The Nc Component
A three-way ANOVA was conducted on Nc peak amplitude with electrode location (frontal, central, parietal) and block (early, late) as within-subjects factors, and stimulus type (synchronous audiovisual, asynchronous audiovisual) as a between-subjects factor. There was a significant main effect for electrode location, F(2, 40) = 49.20, p < .001, ηp2 = .711, similar to Experiment 1. Nc was greater in amplitude at midline parietal electrodes than midline central and midline frontal electrodes. Replicating the results of Experiment 1, planned comparisons revealed a significant effect of stimulus type on Nc amplitude at midline frontal and central electrodes, F(1, 20) = 4.63, p = .044, ηp2 = .188, with infants demonstrating greater amplitude Nc to synchronous audiovisual (M = −6.79 μV) compared to asynchronous audiovisual stimuli (M = −1.74 μV).
Discussion
Experiment 2 examined the influence of redundant (synchronous) and nonredundant (asynchronous) audiovisual stimuli on attention and stimulus processing in 5-month-old infants. We predicted that intersensory redundancy would be associated with a reduction in amplitude of the LSW across repeated stimulus presentations and with greater amplitude Nc. Our findings supported both of these predictions. Infants in the synchronous audiovisual condition showed a reduction in amplitude of the LSW across early to late trials, which is indicative of effective stimulus encoding resulting in recognition of the repeated stimulus (e.g., de Haan, 2007; Snyder, 2010; Reynolds, Guy, & Zhang, 2011). No significant differences were found in LSW amplitude across early to late trials for infants in the asynchronous audiovisual condition. These findings indicate that infants processed redundant audiovisual stimuli more efficiently than nonredundant audiovisual stimuli. In addition, infants demonstrated greater amplitude Nc in the synchronous condition compared to the asynchronous condition. This finding replicates the Nc results from Experiment 1, and is consistent with past research demonstrating greater amplitude Nc in response to congruent face-voice pairings compared to incongruent face-voice pairings (Vogel et al., 2012). Taken together, these findings indicate that redundant audiovisual stimulation is associated with greater attention allocation (Experiment 1) and more efficient stimulus processing (Experiment 2) than nonredundant audiovisual stimulation.
General Discussion
The findings from the current study are the first to demonstrate support for the intersensory redundancy hypothesis at the neural level in human infants, and reveal new information above and beyond that provided by the behavioral literature about the mechanism underlying enhanced processing of intersensory redundancy in early infancy. Attentional salience was indexed using amplitude of the Nc component. Infants demonstrated greater amplitude of the Nc component to synchronous audiovisual stimulation compared to asynchronous audiovisual stimulation in both Experiments 1 and 2. These stimulus types provided the same amount and type of stimulation and differed only in terms of intersensory redundancy, the temporal relations between the audible and visual stimulation. This finding thus indicates that intersensory redundancy per se is salient to infants. Infants also showed greater Nc amplitude to synchronous audiovisual as compared with unimodal visual stimulation in Experiment 1. The Nc component is ubiquitous in the infant ERP literature and has been found to be associated with visual attention and stimulus salience (e.g., Ackles, 2008; Courchesne et al., 1981; Reynolds & Richards, 2005; Reynolds, Courage, & Richards, 2010; Richards, 2003). For example, infants demonstrate greater Nc amplitude to stimuli they visually prefer, and increased amplitude of Nc has been proposed to reflect activation of a general arousal system involved in attention (Reynolds, Courage, & Richards, in press). The finding that Nc is greater in amplitude when heart rate measures are indicative of attention (Reynolds et al., 2010; Richards, 2003) provides further support for this proposal.
Efficiency of stimulus processing was indexed by analyzing changes in the amplitude of the LSW across blocks of trials, indicative of recognition memory. The LSW has been proposed to reflect stimulus encoding or an updating of working memory for a partially processed stimulus (de Haan & Nelson, 1997). Thus, a reduction of the LSW with repeated exposure is indicative of successful encoding and recognition of the repeated or familiar stimulus (e.g., Snyder, 2010). Infants in Experiment 2 were exposed to a single stimulus across trial blocks, composed of either synchronous or asynchronous audiovisual speech. If intersensory redundancy promotes enhanced perceptual processing, then infants in the synchronous but not the asynchronous audiovisual condition were expected to show a significant reduction in the amplitude of the LSW across early to late trials. An interaction of stimulus condition by block confirmed this prediction. Only infants in the synchronous condition demonstrated a significant reduction in the amplitude of the LSW across early to late trials. No differences were found in LSW amplitude across early to late trials for infants in the asynchronous audiovisual condition. The current findings indicate that intersensory redundancy available in multimodal stimulation is not only salient but promotes more efficient perceptual processing than nonredundant stimulation. These findings converge with those of behavioral studies using habituation with similar stimuli. For example, Flom and Bahrick (2007) have shown that intersensory redundancy facilitates discrimination of affect in videos of women speaking, with discrimination demonstrated 3 months earlier in development (i.e., at 4 months as opposed to 7 months) in synchronous audiovisual speech as compared with unimodal visual speech.
One other ERP study (Hyde et al., 2011) also found that 5-month-old infants discriminated synchronous and asynchronous audiovisual stimuli. In contrast to our findings, infants in their study showed greater amplitude Nc to asynchronous audiovisual stimuli than to synchronous audiovisual stimuli. The authors concluded that the greater amplitude Nc to asynchronous audiovisual stimuli reflected detection of a novel stimulus category (consistent with early interpretations that the Nc component is associated with novelty detection, e.g., Courchesne et al., 1981). However, due to the lack of a balanced design in their stimulus set (i.e., the asynchronous condition provided a visual stimulus not used in the synchronous condition), their findings may have been driven by oddball effects on Nc amplitude as opposed to asynchrony per se. Hyde and colleagues (2011) also did not report the number of trials included in their ERP averages across experimental conditions, a factor which is known to affect peak amplitude and the signal-to-noise ratio of the averaged waveform (Luck, 2005). Thus, due to a number of potential confounds, the results of their analysis of the effects of intersensory redundancy on infant attention remain inconclusive. In a recent study (Reynolds, Zhang, & Guy, in press) examining look duration in 3-, 6-, and 9-month-old participants, infants looked significantly longer at both synchronous and asynchronous audiovisual stimuli in comparison to unimodal visual stimuli. No differences were found across audiovisual conditions. Taken together, these findings indicate that both types of multimodal stimuli are highly salient in infancy.
In the current study, we utilized a fully balanced stimulus set to control for potential oddball (or frequency) effects. Asynchrony was achieved, similar to Hyde et al. (2011), by playing the soundtrack to one phrase while presenting the visual speech of another phrase. However, in our study, the audio and visual phrases were counterbalanced so that infants received both phrases in both the synchronous and asynchronous conditions. Infants demonstrated greater amplitude Nc to synchronous audiovisual stimuli compared to asynchronous audiovisual stimuli in both Experiments 1 and 2. These findings are consistent with behavioral evidence of the attentional salience of intersensory redundancy as well as recent fMRI work with adults (Marchant, Ruff, & Driver, 2012) demonstrating significantly greater BOLD response to synchronous audiovisual stimuli compared to asynchronous audiovisual stimuli.
The use of a block design in Experiment 2 allowed us to directly examine the impact of synchronous and asynchronous audiovisual stimuli on the efficiency of stimulus processing by assessing change in the LSW across blocks. There was a significant interaction between stimulus condition and block: only infants in the synchronous condition showed a significant reduction in the amplitude of the LSW over time, and no significant differences in LSW amplitude were found from early to late trials in the asynchronous condition. There was also an interaction between electrode location and block. In the early block of trials, the LSW was greater in amplitude at right temporal electrodes than at left temporal electrodes. Additionally, the reduction in amplitude of the LSW from early to late blocks for the synchronous group was most evident at right temporal electrodes (see Figure 4). These findings are consistent with past work indicating that the LSW to faces is often lateralized over right-hemisphere electrode sites (de Haan, Johnson, & Halit, 2007). Taken together, these results provide evidence that exposure to redundant audiovisual stimuli produces intersensory facilitation of attention to and processing of face-voice pairings in 5-month-old infants.
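For readers who want the shape of this analysis in concrete form, the sketch below fits a condition (between subjects) by block (within subjects) mixed-design ANOVA to simulated LSW mean amplitudes. The use of the pingouin package, the simulated values, and the column names are assumptions made for illustration; the sketch does not reproduce the study's data or statistics.

```python
# Minimal sketch: condition x block mixed ANOVA on simulated LSW amplitudes.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(2)
rows = []
for condition in ("synchronous", "asynchronous"):
    for i in range(10):
        infant = f"{condition}_{i}"
        early = rng.normal(6.0, 1.5)
        # Build in an early-to-late reduction for the synchronous group only.
        drop = 4.0 if condition == "synchronous" else 0.0
        late = early - drop + rng.normal(0.0, 1.5)
        rows.append({"infant": infant, "condition": condition,
                     "block": "early", "lsw_amp": early})
        rows.append({"infant": infant, "condition": condition,
                     "block": "late", "lsw_amp": late})
df = pd.DataFrame(rows)

# The condition x block interaction term mirrors the comparison reported above.
aov = pg.mixed_anova(data=df, dv="lsw_amp", within="block",
                     subject="infant", between="condition")
print(aov[["Source", "F", "p-unc"]])
```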
In sum, findings from the present study provide new information about the neural processes underlying the enhanced processing of redundant multimodal stimuli in infancy. The enhanced responsiveness of infants to redundant audiovisual stimuli begins at the level of attention, as indicated by greater amplitude Nc to synchronous audiovisual stimuli compared to asynchronous audiovisual and unimodal visual stimuli. This enhanced attention response is followed by changes over time in the LSW that are indicative of recognition memory for a fully processed stimulus. The current findings converge with those of behavioral studies to demonstrate that naturalistic, synchronous, multimodal events are highly salient because they provide intersensory redundancy, which attracts infant attention to amodal stimulus properties and enhances information processing during early development.
Acknowledgments
The authors wish to thank Dantong Zhang for her assistance with data collection and data processing. We are especially grateful to the parents and infants who participated in this study. Support for this research was provided in part by NICHD grant R03 HD05600 awarded to GR; NICHD grants K02 HD064943 and R01 HD053776 awarded to LB; and NSF grant BCS1057898 awarded to RL.
References
- Ackles PK. Stimulus novelty and cognitive-related ERP components of the infant brain. Perceptual and Motor Skills. 2008;106:3–20. doi: 10.2466/pms.106.1.3-20.
- Bahrick LE, Flom R, Lickliter R. Intersensory redundancy facilitates discrimination of tempo in 3-month-old infants. Developmental Psychobiology. 2002;41:352–363. doi: 10.1002/dev.10049.
- Bahrick LE, Lickliter R. Intersensory redundancy guides attentional selectivity and perceptual learning in infancy. Developmental Psychology. 2000;36(2):190–201. doi: 10.1037//0012-1649.36.2.190.
- Bahrick LE, Lickliter R. Intersensory redundancy guides early perceptual and cognitive development. In: Kail R, editor. Advances in child development and behavior. Vol. 30. New York: Academic Press; 2002. pp. 153–187.
- Bahrick LE, Lickliter R. The role of intersensory redundancy in early perceptual, cognitive, and social development. In: Bremner A, Lewkowicz DJ, Spence C, editors. Multisensory development. Oxford, England: Oxford University Press; 2012. pp. 183–206.
- Bahrick LE, Pickens J. Amodal relations: The basis for intermodal perception and learning in infancy. In: Lewkowicz DJ, Lickliter R, editors. The development of intersensory perception: Comparative perspectives. Hillsdale, NJ: Erlbaum; 1994. pp. 205–233.
- Beauchamp MS. Statistical criteria in fMRI studies of multisensory integration. Neuroinformatics. 2005;3:93–113. doi: 10.1385/NI:3:2:093.
- Benasich AA, Choudhury N, Friedman JT, Realpe-Bonilla T, Chojnowska C, Gou Z. The infant as a prelinguistic model for language learning impairments: Predicting from event-related potentials to behavior. Neuropsychologia. 2006;44:396–411. doi: 10.1016/j.neuropsychologia.2005.06.004.
- Benevento LA, Fallon J, Davis BJ, Rezak M. Auditory-visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology. 1977;57:849–872. doi: 10.1016/0014-4886(77)90112-1.
- Bremner A, Lewkowicz DJ, Spence C. Multisensory development. New York: Oxford University Press; 2012.
- Bristow D, Dehaene-Lambertz G, Mattout J, Soares C, Gliga T, Baillet S, Magin J. Hearing faces: How the infant brain matches the face it sees with the speech it hears. Journal of Cognitive Neuroscience. 2008;21:905–921. doi: 10.1162/jocn.2009.21076.
- Bruce C, Desimone R, Gross CG. Visual properties of neurons in a polysensory area in the superior temporal sulcus of the macaque. Journal of Neurophysiology. 1981;46(2):369–384. doi: 10.1152/jn.1981.46.2.369.
- Calvert G, Spence C, Stein BE. Handbook of multisensory processes. Cambridge, MA: MIT Press; 2004.
- Cohen LB. Attention-getting and attention-holding processes of infant visual preferences. Child Development. 1972;43:869–879.
- Courchesne E. Event-related brain potentials: Comparison between children and adults. Science. 1977;197:589–592. doi: 10.1126/science.877575.
- Courchesne E, Ganz L, Norcia AM. Event-related brain potentials to human faces in infants. Child Development. 1981;52:804–811.
- de Haan M. Visual attention and recognition memory in infancy. In: de Haan M, editor. Infant EEG and event-related potentials. New York: Psychology Press; 2007. pp. 101–144.
- de Haan M, Johnson MH, Halit H. Development of face-sensitive event-related potentials during infancy. In: de Haan M, editor. Infant EEG and event-related potentials. New York: Psychology Press; 2007. pp. 77–99.
- de Haan M, Nelson CA. Recognition of the mother’s face by six-month-old infants: A neurobehavioral study. Child Development. 1997;68:187–210.
- de Haan M, Nelson CA. Brain activity differentiates face and object processing in 6-month-old infants. Developmental Psychology. 1999;35:1113–1121. doi: 10.1037//0012-1649.35.4.1113.
- DeBoer T, Scott LS, Nelson CA. Methods for acquiring and analyzing infant event-related potentials. In: de Haan M, editor. Infant EEG and event-related potentials. New York: Psychology Press; 2007. pp. 5–37.
- deRegnier RA, Nelson CA, Thomas KM, Wewerka S, Georgieff MK. Neurophysiologic evaluation of auditory recognition memory in healthy newborn infants and infants of diabetic mothers. Journal of Pediatrics. 2000;137:777–784. doi: 10.1067/mpd.2000.109149.
- deRegnier RA, Wewerka S, Georgieff MK, Mattia F, Nelson CA. Influences of post-conceptional age and postnatal experience on the development of auditory recognition memory in the newborn infant. Developmental Psychobiology. 2002;41:216–225. doi: 10.1002/dev.10070.
- Farroni T, Csibra G, Simion F, Johnson MH. Eye contact detection in humans from birth. Proceedings of the National Academy of Sciences of the USA. 2002;99(14):9602–9605. doi: 10.1073/pnas.152159999.
- Flom R, Bahrick LE. The development of infant discrimination of affect in multimodal and unimodal stimulation: The role of intersensory redundancy. Developmental Psychology. 2007;43:238–252. doi: 10.1037/0012-1649.43.1.238.
- Flom R, Bahrick LE. The effects of intersensory redundancy on attention and memory: Infants’ long-term memory for orientation in audiovisual events. Developmental Psychology. 2010;46(2):428–436. doi: 10.1037/a0018410.
- Ghazanfar AA, Schroeder CE. Is neocortex essentially multisensory? Trends in Cognitive Sciences. 2006;10:278–285. doi: 10.1016/j.tics.2006.04.008.
- Giard MH, Peronnet F. Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience. 1999;11:473–490. doi: 10.1162/089892999563544.
- Gibson EJ, Pick AD. An ecological approach to perceptual learning and development. New York: Oxford University Press; 2000.
- Gobbelé R, Schürmann M, Forss N, Juottonen K, Buchner H, Hari R. Activation of the human posterior parietal and temporoparietal cortices during audiotactile interaction. NeuroImage. 2003;20:503–511. doi: 10.1016/s1053-8119(03)00312-4.
- Grossmann T, Striano T, Friederici AD. Crossmodal integration of emotional information from face and voice in the infant brain. Developmental Science. 2006;9:309–315. doi: 10.1111/j.1467-7687.2006.00494.x.
- Halit H, Csibra G, Volein A, Johnson MH. Face-sensitive cortical processing in early infancy. Journal of Child Psychology and Psychiatry. 2004;45(7):1228–1234. doi: 10.1111/j.1469-7610.2004.00321.x.
- Halit H, de Haan M, Johnson MH. Cortical specialization for face processing: Face-sensitive event-related potential components in 3- and 12-month-old infants. NeuroImage. 2003;19:1180–1193. doi: 10.1016/s1053-8119(03)00076-4.
- Hikosaka K. The polysensory region in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. Biomedical Research (Tokyo). 1993;14:41–45.
- Hyde DC, Jones BL, Flom R, Porter CL. Neural signatures of face-voice synchrony in 5-month-old human infants. Developmental Psychobiology. 2011;53(4):359–370. doi: 10.1002/dev.20525.
- Hyde DC, Jones BL, Porter CL, Flom R. Visual stimulation enhances auditory processing in 3-month-old infants and adults. Developmental Psychobiology. 2010;52(2):181–189. doi: 10.1002/dev.20417.
- Jay MF, Sparks DL. Auditory receptive fields in primate superior colliculus shift with changes in eye position. Nature. 1984;309:345–347. doi: 10.1038/309345a0.
- Karrer R, Ackles PK. Visual event-related potentials of infants during a modified oddball procedure. In: Johnson R, Rohrbaugh JW, Parasuraman R, editors. Current trends in event-related potential research. Amsterdam: Elsevier Science Publishers; 1987. pp. 603–608.
- Karrer R, Ackles PK. Brain organization and perceptual/cognitive development in normal and Down syndrome infants: A research program. In: Vietze P, Vaughan HG Jr, editors. The early identification of infants with developmental disabilities. Philadelphia: Grune & Stratton; 1988. pp. 210–234.
- Karrer R, Monti LA. Event-related potentials of 4–7-week-old infants in a visual recognition memory task. Electroencephalography and Clinical Neurophysiology. 1995;94:414–424. doi: 10.1016/0013-4694(94)00313-a.
- Kushnerenko E, Teinonen T, Volein A, Csibra G. Electrophysiological evidence of illusory audiovisual speech percept in human infants. Proceedings of the National Academy of Sciences of the USA. 2008;105:11442–11445. doi: 10.1073/pnas.0804275105.
- Laurienti PJ, Wallace MT, Maldjian JA, Susi CM, Stein BE, Burdette JH. Cross-modal sensory processing in the anterior cingulate and medial prefrontal cortices. Human Brain Mapping. 2003;19:213–233. doi: 10.1002/hbm.10112.
- Lewkowicz DJ. The development of intersensory temporal perception: An epigenetic systems/limitations view. Psychological Bulletin. 2000;126:281–308. doi: 10.1037/0033-2909.126.2.281.
- Lewkowicz DJ, Lickliter R. The development of intersensory perception: Comparative perspectives. Hillsdale, NJ: Erlbaum; 2004.
- Lickliter R, Bahrick LE. The development of infant intersensory perception: Advantages of a comparative convergent-operations approach. Psychological Bulletin. 2000;126:260–280. doi: 10.1037/0033-2909.126.2.260.
- Linden JF, Grunewald A, Andersen RA. Responses to auditory stimuli in macaque lateral intraparietal area. II. Behavioral modulation. Journal of Neurophysiology. 1999;82:343–358. doi: 10.1152/jn.1999.82.1.343.
- Luck SJ. An introduction to the event-related potential technique. Cambridge, MA: MIT Press; 2005.
- Lütkenhöner B, Lammertmann C, Simoes C, Hari R. Magnetoencephalographic correlates of audiotactile interaction. NeuroImage. 2002;15:509–522. doi: 10.1006/nimg.2001.0991.
- Marchant JL, Ruff CC, Driver J. Audiovisual synchrony enhances BOLD responses in a brain network including multisensory STS while also enhancing target-detection performance for both modalities. Human Brain Mapping. 2012;33:1212–1224. doi: 10.1002/hbm.21278.
- Mazzoni P, Bracewell RM, Barash S, Andersen RA. Spatially tuned auditory responses in area LIP of macaques performing delayed memory saccades to acoustic targets. Journal of Neurophysiology. 1996;75:1233–1241. doi: 10.1152/jn.1996.75.3.1233.
- Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ. Multisensory auditory-visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research. 2002;14:115–128. doi: 10.1016/s0926-6410(02)00066-6.
- Nelson CA. Neural correlates of recognition memory in the first postnatal year of life. In: Dawson G, Fischer K, editors. Human behavior and the developing brain. New York, NY: Guilford Press; 1994. pp. 269–313.
- Nelson CA, Collins PF. Event-related potential and looking-time analysis of infants’ responses to familiar and novel events: Implications for visual recognition memory. Developmental Psychology. 1991;27:50–58.
- Nelson CA, Collins PF. Neural and behavioral correlates of visual recognition memory in 4- and 8-month-old infants. Brain and Cognition. 1992;19:105–121. doi: 10.1016/0278-2626(92)90039-o.
- Nikkel L, Karrer R. Differential effects of experience on the ERP and behavior of 6-month-old infants: Trends during repeated stimulus presentation. Developmental Neuropsychology. 1994;10:1–11.
- Perrin F, Pernier J, Bertrand O, Giard MH, Echallier JF. Mapping of scalp potentials by surface spline interpolation. Electroencephalography and Clinical Neurophysiology. 1987;66(1):75–81. doi: 10.1016/0013-4694(87)90141-6.
- Quinn PC, Westerlund A, Nelson CA. Neural markers of categorization in 6-month-old infants. Psychological Science. 2006;17:59–66. doi: 10.1111/j.1467-9280.2005.01665.x.
- Reynolds GD, Courage ML, Richards JE. Infant attention and visual preferences: Converging evidence from behavior, event-related potentials, and cortical source localization. Developmental Psychology. 2010;46:886–904. doi: 10.1037/a0019670.
- Reynolds GD, Courage ML, Richards JE. The development of attention. In: Reisberg D, editor. Oxford Handbook of Cognitive Psychology. Oxford: Oxford University Press; (in press)
- Reynolds GD, Guy MW. Brain–behavior relations in infancy: Integrative approaches to examining infant looking behavior and event-related potentials. Developmental Neuropsychology. 2012;37(3):210–225. doi: 10.1080/87565641.2011.629703.
- Reynolds GD, Guy MW, Zhang D. Neural correlates of individual differences in infant visual attention and recognition memory. Infancy. 2011;16(4):368–391. doi: 10.1111/j.1532-7078.2010.00060.x.
- Reynolds GD, Richards JE. Familiarization, attention, and recognition memory in infancy: An ERP and cortical source localization study. Developmental Psychology. 2005;41:598–615. doi: 10.1037/0012-1649.41.4.598.
- Reynolds GD, Richards JE. Cortical source localization of infant cognition. Developmental Neuropsychology. 2009;34(3):312–329. doi: 10.1080/87565640902801890.
- Reynolds GD, Zhang D, Guy MW. Infant attention to dynamic audiovisual stimuli: Look duration from 3 to 9 months of age. Infancy. (in press)
- Richards JE. Attention affects the recognition of briefly presented visual stimuli in infants: An ERP study. Developmental Science. 2003;6:312–328. doi: 10.1111/1467-7687.00287.
- Rivera-Gaxiola M, Klarman L, Garcia-Sierra A, Kuhl PK. Neural patterns to native and non-native speech contrasts in 11-month-old American infants. NeuroReport. 2005;16:495–498. doi: 10.1097/00001756-200504040-00015.
- Rivera-Gaxiola M, Silva-Pereyra J, Kuhl PK. Brain potentials to native and non-native speech contrasts in 7- and 11-month-old American infants. Developmental Science. 2005;8:162–172. doi: 10.1111/j.1467-7687.2005.00403.x.
- Santangelo V, Van der Lubbe RHJ, Olivetti Belardinelli M, Postma A. Multisensory integration affects ERP components elicited by exogenous cues. Experimental Brain Research. 2008;185:269–277. doi: 10.1007/s00221-007-1151-5.
- Senkowski D, Saint-Amour D, Gruber T, Foxe JJ. Look who’s talking: The deployment of visuo-spatial attention during multisensory speech processing under noisy environmental conditions. NeuroImage. 2008;43(2):379–387. doi: 10.1016/j.neuroimage.2008.06.046.
- Snyder K. Neural correlates of encoding predict infants’ memory in the paired-comparison procedure. Infancy. 2010;15:487–516. doi: 10.1111/j.1532-7078.2009.00015.x.
- Snyder K, Garza J, Zolot L, Kresse A. Electrophysiological signals of familiarity and recency in the infant brain. Infancy. 2010;15:487–516. doi: 10.1111/j.1532-7078.2009.00021.x.
- Snyder K, Webb SJ, Nelson CA. Theoretical and methodological implications of variability in infant brain response during a recognition memory paradigm. Infant Behavior and Development. 2002;25:466–494.
- Srinivasan R, Tucker DM, Murias M. Estimating the spatial Nyquist of the human EEG. Behavior Research Methods, Instruments, & Computers. 1998;30:8–19.
- Stein BE, Meredith MA, Wallace M. Development and neural basis of multisensory integration. In: Lewkowicz DJ, Lickliter R, editors. The development of intersensory perception: Comparative perspectives. Hillsdale, NJ: Erlbaum; 1994. pp. 81–105.
- Stein BE, Stanford TR. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience. 2008;9:255–266. doi: 10.1038/nrn2331.
- Vogel M, Monesson A, Scott LS. Building biases in infancy: The influence of race on face and voice emotion matching. Developmental Science. 2012;15:359–372. doi: 10.1111/j.1467-7687.2012.01138.x.
- Walker-Andrews AS. Infants’ perception of expressive behaviors: Differentiation of multimodal information. Psychological Bulletin. 1997;121:437–456. doi: 10.1037/0033-2909.121.3.437.
- Wallace MT, Stein BE. Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience. 1997;17:2429–2444. doi: 10.1523/JNEUROSCI.17-07-02429.1997.
- Wallace MT, Wilkinson LK, Stein BE. Representation and integration of multiple sensory inputs in primate superior colliculus. Journal of Neurophysiology. 1996;76:1246–1266. doi: 10.1152/jn.1996.76.2.1246.
- Webb SJ, Long JD, Nelson CA. A longitudinal investigation of visual event-related potentials in the first year of life. Developmental Science. 2005;8:605–616. doi: 10.1111/j.1467-7687.2005.00452.x.
- Wiebe SA, Cheatham CL, Lukowski AF, Haight JC, Muehleck AJ, Bauer PJ. Infants’ ERP responses to novel and familiar stimuli change over time: Implications for novelty detection and memory. Infancy. 2006;9:21–44.