Published in final edited form as: Dev Psychol. 2019 Sep 16;55(12):2491–2504. doi: 10.1037/dev0000822

Sharing Sounds: The Development of Auditory Joint Engagement During Early Parent-Child Interaction

Lauren B Adamson 1, Roger Bakeman 1, Katharine Suma 1, Diana L Robins 2

Abstract

Joint engagement—the sharing of events during social interactions—is an important context for early learning. To date, the sharing of topics that are only heard has not been systematically documented. To describe the development of auditory joint engagement, 48 child-parent dyads were observed 5 times from 12 to 30 months during semi-naturalistic play. Reactions to four types of sounds—overheard speech about the child, instrumental music, animal calls, and mechanical noises—were observed before parents reacted, as parents scaffolded shared listening, and after the sound ceased. Before parents reacted, even 12-month-old infants readily alerted and oriented to the sounds; over time they increasingly tried to share new sounds with their parents. When parents then joined in sharing a sound, periods of auditory joint engagement often ensued, increasing from two thirds of the 12-month observations to almost ceiling level at the 18- through 30-month observations. Overall, the developmental course and structure of auditory joint engagement and of joint engagement with multimodal objects and events are remarkably similar. Symbol-infused auditory joint engagement occurred rarely at first but increased steadily. Children's labeling of the sound and parents' language scaffolding also increased linearly, while child pointing towards the sound rose until 18 months and then declined. Future studies should address variations in the development of auditory joint engagement, whether autism spectrum disorder affects how toddlers share sounds, and the role auditory joint engagement may play in gestural and language development.

Keywords: Joint engagement, parent-child interaction, reaction to sound, shared listening, toddlers, gestures and speech


Joint engagement—the active sharing of objects and events with a partner during social interactions—emerges during infancy (Bakeman & Adamson, 1984) and continues to change form and focus well into early childhood (Adamson, Bakeman, & Deckner, 2004; Adamson, Bakeman, Deckner, & Nelson, 2014). Research on joint engagement has deep roots in seminal developmental theories on early symbol formation (e.g., Piaget, 1962; Vygotsky, 1978; Werner & Kaplan, 1963) that spread broadly through contemporary accounts of early communication and language development (e.g., Adamson, 1996; Bruner, 1983; Nelson, 1996; Tomasello, 2003). There are now volumes devoted to the discussion of how joint engagement and related aspects of joint attention, such as skills related to initiating and responding to bids for shared attention, are relevant to a wide range of philosophical and psychological questions (see, e.g., Eilan, 2005; Moore & Dunham, 1995; Seemann, 2011).

Yet, despite all of this interest in joint engagement, relatively little is known about how children and their caregivers share sounds during early social interactions. To date, research has focused almost exclusively on shared attention to objects that may be simultaneously seen, touched, and heard. When an investigation primarily considers only one modality, it is almost always vision (Rossano, Carpenter, & Tomasello, 2012; see Butterworth & Jarrett, 1991, for a classic account, and Bottema-Beutel, 2016, for a recent review). Recently there have been compelling calls to supplement the study of visual joint engagement with studies of the role of touch (Botero, 2016) and vocal and postural cues (Akhtar & Gernsbacher, 2008) during social interactions. Nevertheless, there is still a noteworthy paucity of research that documents how young children and their partners share listening to the array of ephemeral sounds, such as background speech and environmental sounds, that can provide an invisible focus for an interaction.

The Construct of Auditory Joint Engagement

In the study reported here, we sought to begin to correct this oversight by charting the developmental course of auditory joint engagement—the active sharing of audible events during social interactions—from 12 to 30 months of age. We did this by systematically observing whether and how commonly occurring sounds, including speech, music, animal calls, and mechanical noises, are shared during on-going play. By focusing on sounds, we gained a novel view of how young children and caregivers negotiate the auditory world and of how sharing sounds can play a central role in the emergence of symbols. Although a child may point in order to visually direct attention toward the sound source, symbols including language (e.g., “where’s that piano!”) and symbolic gestures (e.g., pretending to play a piano) are often crucial to clarify what is heard. Moreover, in contrast to visible objects that tend to persist, sounds tend to be impermanent as they rapidly change quality and meaning or vanish entirely. Thus, concentrating on the emergence of auditory joint engagement may highlight the early development of displacement (Brown, 1973) during which symbols allow shared topics to expand from concrete objects in the here and now to events displaced in time and space (Adamson & Bakeman, 2006).

There are already several indications that sharing sounds plays a distinctive role in parent-child interactions. These include the rich, albeit quite limited, corpus of seminal observational studies that demonstrate how the rhythms of speech and music can organize the flow of communication during early infancy (e.g., Stern, 1985), how adults may modify their speech to capture and sustain infants' attention (e.g., Fernald et al., 1989), and how young children learn spoken words more readily when adults position them within shared attention (e.g., Tomasello & Farrar, 1986). Moreover, there are descriptions of how sounds play a central role as parents play with young children with low or no vision (Alfaro et al., 2017; Bigelow, 2003) and of how blind parents interact with a sighted infant (Adamson, Als, Tronick, & Brazelton, 1977). There is also experimental evidence that toddlers can take into account their partner's appreciation of sound. By two years of age, children appear to consider what others have previously heard (Moll, Carpenter, & Tomasello, 2014) and by at least age three, they seem to understand the implications of sharing sounds with partners as they try to conceal events (Melis, Call, & Tomasello, 2010) or not wake a baby (Williamson, Brooks, & Meltzoff, 2014) and begin to display understanding of diverse desires and beliefs that focus exclusively on sounds (Hasni, Adamson, Williamson, & Robins, 2017).

Observing Auditory Joint Engagement

Nevertheless, there have been surprisingly few attempts to document how young children and parents actively share sounds that neither of them produce during interactions. One exception is a pilot study that showed how toddlers reacted to and shared a bird call during a playful interaction with their parents (Oyabu, 2006; Oyabu & Adamson, 2000) by adding a new scene—the bird scene—to the Communication Play Protocol (CPP; Adamson & Bakeman, 2016), a semi-naturalistic observational protocol that consists of a series of scenes that facilitate parent-child interactions in different communicative contexts. The bird scene consisted of three 1- to 2-min phases: first, the bird call played briefly and the parent was asked not to take notice; second, the bird call played again and the parent was asked to try to share it with the child; and third, the bird call vanished. During the first phase toddlers oriented to the bird call and they often sought to share it with their mothers using points and words. Then, when the mother tried to share the bird call, all of the children jointly engaged with both her and the sound. Finally, when the sound vanished, many of the toddlers and mothers continued to share the bird call.

Intrigued by these findings of how readily toddlers and their parents wove sounds into joint engagement, we formulated a new auditory version of the CPP, the Communication Play Protocol-Auditory (CPP-A), that blended the experimental procedure of presenting specific sounds in various locations (Grieco-Calub, Litovsky, & Werner, 2008; Olsho, Koch, Halpin, & Carter, 1987) with the CPP. In the CPP-A, four new auditory scenes were added to the CPP, each of which presented one of four types of sounds, specifically speech, instrumental music, animal calls, and mechanical noises. As in the bird scene, each auditory scene consisted of three phases so that we could observe (a) the child’s initial awareness of and attempts to initiate joint engagement to a sound before the caregiver reacts (which we labeled the ignore-sound phase); (b) the duration and form of the child’s auditory joint engagement once the caregiver displays interest in the sound (the share-sound phase); and (c) the representation and sharing of the sound after it vanished (the post-sound phase). In addition to the four auditory scenes, the CPP-A included three five-min interactive scenes from the original CPP that were interspersed between the auditory scenes.

The Developmental Trajectory of Auditory Joint Engagement

Our primary aim was to describe the developmental trajectory of auditory joint engagement. To do this, we observed children and their caregivers in the CPP-A five times, spacing the first three sessions—when infants were 12, 15, and 18 months old—to observe auditory joint engagement as first words emerge, and the last two—when toddlers were 24 and 30 months old—to observe auditory joint engagement as vocabularies expand. Our repeated observations using the CPP-A let us pose questions—producing four sets of hypotheses—about developmental changes in how young children initially reacted to the sounds, in how they shared them with their parents, in how their parents supported this sharing, and finally in what the children did when a sound disappeared.

Children’s initial reaction to the sounds.

We expected that infants, even at 12 months of age, would display interest in the sounds. Thus, we did not expect to observe hypersensitivity, as when a person covers their ears to block the sound or becomes distressed when the sound begins (Hypothesis 1.1). Moreover, we expected that infants, even at 12 months of age, would alert and orient to the sound during the ignore-sound phase (Hypothesis 1.2). These expectations are consistent with research on early auditory development that demonstrates that even newborns alert and turn their heads towards a sound source (e.g., Alegria & Noirot, 1978; Clifton, 1992), suggesting that they begin to localize sounds and to coordinate auditory and visual spaces right from birth (Butterworth & Castillo, 1976). Moreover, by 12 months of age the components of auditory attention (arousal, orienting, selective attention, and sustained attention) appear to be well established, with variations attributable primarily to motivation and to the voluntary direction and regulation of attention (Gomes, Molholm, Christodoulou, Ritter, & Cowan, 2000; Werner & Rubel, 2012).

However, we also expected that there might be subtle differences in initial alerting and orienting due to the location of the sound source relative to the child, such that children might be more apt to alert and to orient to sounds that initially emanated from the spaces beside rather than behind them (Hypothesis 1.3a). This expectation is consistent with research on the effect of target location on visual attention (e.g., Butterworth & Jarrett, 1991) and on turning towards sounds (Kezuka, Amano, & Reddy, 2017) that found that young infants are most adept at turning towards voices and castanets when they are located to the side rather than behind them, presumably because it is easier to visualize peripheral rather than hidden spaces. However, we also expected that as toddlers begin to develop the capacity to represent space (Butterworth & Jarrett, 1991), differences in alerting and orienting associated with a sound’s location would become less pronounced with age (Hypothesis 1.3b).

Sharing sounds.

In contrast to our expectation that children at every age point we observed would listen to the sounds when they were initially presented, we expected that the occurrence of auditory joint engagement in the share-sound phase would increase from 12 to 30 months of age (Hypothesis 2.1) and that its form and focus would change. In concert with studies of joint engagement development in multimodal contexts (Adamson et al., 2004; Bakeman & Adamson, 1984; Bottema-Beutel, 2016; Carpenter, Nagell, & Tomasello, 1998), we focused on three aspects of auditory joint engagement.

First, we observed how actively the children tried to involve the parent in shared listening, expecting increases with age both in attempts to share the sound in the ignore-sound phase and in coordinated joint engagement—joint engagement during which the child attends to both the shared sound and the parent—in the share-sound phase (Hypothesis 2.2).

Second, we also expected that children would attend to sounds with their parents—who are well positioned to influence their child’s experience of the sound—without explicitly attending to the parent. Thus, as with multimodal joint engagement (Adamson et al., 2004; Bakeman & Adamson, 1984), we hypothesized that supported auditory joint engagement—joint engagement during which the child attends to the shared sound but not explicitly to the caregiver—during the share-sound phase would increase at least until 18 months (Hypothesis 2.3a) and, at all age points, would occur more often than coordinated auditory joint engagement (Hypothesis 2.3b).

Third, we systematically observed the children's use of symbols related to the sound. We expected that symbol-infused auditory joint engagement—a state during which the child actively attends to symbols as well as the shared sound—would not occur in the share-sound phase when infants were 12 months old but would become increasingly evident over time, especially between 18 and 30 months of age (Hypothesis 2.4).

In addition to systematically observing whether joint engagement occurred, we asked if there were developmental changes in how children communicated about the sounds in the share-sound phase. We expected that there would be an increase in how often the children spoke about the sound, especially how often they produced a word that labeled the sound and located its source, and in how often they produced iconic gestures as they communicated about the sound (Hypothesis 2.5a). In contrast, we expected that their production of points might initially increase but then begin to wane as they increasingly used words and iconic gestures to represent the sound (Hypothesis 2.5b).

Parents’ contributions.

We expected that parents would readily comply with our instructions to ignore the sound and the child's reaction to it in the ignore-sound phase and to share the sound with the child in the share-sound phase (this was a fidelity check, not a hypothesis). We also expected that, as they tried to share the sound, parents would successfully scaffold the child's auditory joint engagement in the share-sound phase (Hypothesis 3.1) and that over time parents' scaffolding would increasingly focus on language related to the sound (Hypothesis 3.2).

Sharing sounds after they end.

We expected that over time children and their parents would increasingly focus on the disappearance of a sound in the post-sound phase, rarely displaying auditory joint engagement at 12 months but increasingly thereafter, albeit less often than when the sound was present (Hypothesis 4.1). We also expected that representation of the no longer audible sound would increase over time as evidenced by increases in symbol-infused joint engagement, in child speaks and gestures, and in parent language scaffolding (Hypothesis 4.2).

Auditory Joint Engagement with Different Sounds

Our second research aim was to examine the effect of the type of sound on how infants attend to and share sounds. Drawing from studies of early auditory processing, we selected categories of sounds that infants by one year of age would likely have organized into meaningful, identifiable auditory "objects" (Lass, Eastham, Parrish, Scherbick, & Ralph, 1982) and that, as we note above, would likely not evoke hypersensitivity. Two of the sound types, animal calls and mechanical noises, are environmental sounds that gain meaning through their association with specific objects such as an animal or a machine (Ballas & Howard, 1987; Cummings et al., 2006). The other two sound types, third-party speech and instrumental music, are often contrasted with environmental sounds (cf. Ballas & Howard, 1987) and with each other (cf. Trehub, Trainor, & Unyk, 1993). Our interest in exploring the possibility that the type of sound influences attention to and sharing of sounds was piqued by findings that speech processing is differentiated from other sound processing early in infancy (Lloyd-Fox, Blasi, Mercure, Elwell, & Johnson, 2012) and that young infants may prefer listening to speech over nonspeech sounds (Vouloumanos & Curtin, 2014, but see Kezuka et al., 2017). Thus we expected that speech might attract more initial alerting and orienting in the ignore-sound phase than the environmental sounds of animal calls and mechanical noises (Hypothesis 5.1). In contrast, because young children might treat environmental sounds—but not third-party speech—as emanating from hidden objects that they could potentially find, see, and share during a social interaction, we expected that environmental sounds might be shared more often than speech, a hypothesis that we assessed by comparing attempts to share interest in the sound, auditory joint engagement, and child speaks and gestures about the sound during the share-sound phase (Hypothesis 5.2).

Method

Participants

Participating in this study were 48 full-term, typically developing children (24 boys, 24 girls), each with a parent (45 mothers, 3 fathers). This sample size allows repeated measures age effects of pη2 = .077 or above to be detected as statistically significant at p < .05. The sample was recruited from the community using strategies such as posting flyers in online community forums to seek volunteers for the study. We did not enroll a dyad if the parent reported that the child did not have significant daily exposure to English or that the child had experienced a significant medical complication or developmental problem at or since birth, including being born before 36 weeks gestation. Because our focus was on the child's reaction to auditory stimuli, we confirmed at the time of recruitment that the child had not failed his or her most recent hearing screening. At the onset of each visit, we confirmed that the child was not currently being treated for an ear infection. As detailed in Table 1, the parents were generally well-educated, with some racial and ethnic diversity reflective of our urban location; over half reported household incomes over $100,000. Their mean age was 35.2 years (range 26.7–50.2).

Table 1. Characteristics of Children and Their Parents

Characteristic           Category                    N
Child sex                Male                       24
                         Female                     24
Parent (play partner)    Mother                     45
                         Father                      3
Child race/ethnicity     White non-Hispanic         32
                         Black non-Hispanic          5
                         Multi-racial non-Hispanic   5
                         Hispanic White              5
                         Hispanic multi-racial       1
Parent race/ethnicity    White non-Hispanic         40
                         Black non-Hispanic          5
                         Multi-racial non-Hispanic   1
                         Hispanic White              2
                         Hispanic multi-racial       0
Parent employment        Full-time                  18
                         Part-time                  10
                         Not employed               20
Parent education         Only high school            0
                         Some college                2
                         4-year degree              19
                         Some post college           1
                         Graduate degree            26
Parent language          English only               41
                         English + other language    7
Household income         $30K–$50K                   4
                         $50K–$100K                 17
                         $100K or more              26
                         None given                  1

Note. N = 48.

Observations were scheduled to occur when children were aged 12, 15, 18, 24, and 30 months; 40 dyads completed all five visits and 8 dyads completed four (2 missed the 15-month visit, 3 the 24-month visit, and another 3 the 30-month visit); mean ages at the five visits were 12.6, 15.2, 18.3, 24.2, and 30.1 months and ranges were 12.0–13.9, 14.7–15.8, 17.7–19.6, 23.5–25.1, and 29.2–31.4. Nine additional dyads participated in the first visit but were not included in the final sample; seven missed more than one of the subsequent four visits, one missed one of the subsequent visits and had incomplete CPP-A data for another, and one lacked expressive words at 30 months of age, which was a major developmental concern to his parents and pediatrician. The study, titled The Developmental Course of Auditory Joint Engagement, was approved by Georgia State University's institutional review board (IRB number H14441).

Observing Auditory Joint Engagement

The Communication Play Protocol-Auditory (CPP-A).

In order to observe how young children react to and share various sounds with their parents, we modified the CPP (Adamson & Bakeman, 2016). Like the CPP, the CPP-A is a semi-naturalistic observational procedure; it took place in a 3.0 m by 4.6 m playroom with ample space for the parent and child to move around or be seated either on beanbags or at a child-size table. As in the CPP, we used the metaphor of a play to explain our procedure to the parent. Thus, we told the parent that we were interested in observing how the child communicates and so they would be performing a play that encourages communication. The play contains a series of conditions (scenes) with the parent (the supporting actor) and the child (the star). Before each scene, the researcher (director) enters the playroom (stage) briefly to provide the parent with a cue card that specifies a plot and, when applicable, the props for the scene, as well as suggestions about how the parent might act, but we do not provide a script. For the auditory scenes, we explicitly suggested that the parent ignore the sound and the child's reaction to it when it was first played (the ignore-sound phase) and then that the parent try to share the sound with the child (the share-sound phase) when it occurred the second time. The CPP-A usually took 40 min to complete.

Scenes.

The CPP-A included seven scenes: the four newly designed auditory scenes interspersed with three 5-min multimodal scenes from the original CPP. Each auditory scene focused on one of four types of sounds—speech, music, animal, or mechanical—that was played over one of the six hidden speakers that lined the periphery of the playroom. During the first presentation, the 30-s ignore-sound phase, the sound was presented without parent response. Then, after a brief (15 s) interlude with no sound, a 60-s share-sound phase occurred during which the same sound was presented with the parent responding. Each scene then ended with a 60-s post-sound phase during which the parent and child interacted after the sound had ceased. Before each auditory scene, the director brought in a new set of toys conducive to free play and provided the parent with a cue card that named the specific sound and asked the parent not to react to the sound or to the child's reaction to the sound during the first presentation and to try to share the sound with the child during the second presentation. No specific request was made for the period after the sound's offset. The three multimodal scenes (Help me, Turn-taking, and Container) were designed to facilitate interactions that focused on three communicative functions: requesting, turn-taking, and commenting. In the current study, they were interspersed between the auditory scenes in order to provide a separation between them and were not used to derive data.

The order of the four auditory scenes and the order of the three multimodal scenes were randomized across the five visits. We also randomized the props for each scene using five different bins of props, one for each visit. In each bin, there were props for the three multimodal scenes that matched the scene’s plot (three interesting toys on a shelf for requesting, three easily shared toys for turn taking, and five toys in a container for commenting) and props for the four auditory scenes (puzzle, a pretend toy such as a picnic basket with food, a toy that required the parent’s help such as a fishing pole game, and a book and stuffed animal) to play with during the scene.

Sounds.

The auditory scenes featured four types of sounds, with two exemplars per sound type that were alternated between visits: third-party speech (about the child traveling to or leaving the playroom), instrumental music (piano or guitar), animal calls (bird or cat), and mechanical noises (train or motorcycle). All sounds were pre-recorded. The speech sound was customized before each session by recording two women talking about the child, using the child’s name once every 5 to 10 sec. During a pre-session sound check, sound volume was adjusted to average within the range of 65–75 decibels. The order of the four sound types was randomized across visits.

Audio stimuli were presented from one of six speakers (Pure Acoustics Lord HT770 4” White Cube Speakers) mounted on the playroom’s walls 2.3 m from the floor. The speakers were powered by an amplifier (Dayton DTA-1 Class T Digital AC/DC Mini Amplifier) that ran through a speaker selector (Pyle PSS6 6 Channel High Power Stereo Speaker Selector). Audio files were stored on iTunes and played on an Apple iPod Touch.

A research assistant positioned behind a one-way mirror controlled the presentation of the sounds. After the child settled into play with new props, a process that usually took approximately 30 s, she activated a recording of the sound, using the speaker location designated prior to the session. The appropriate speaker was the one located behind the child, to the child's right side, or to the child's left side at the start of the scene and did not change over the course of the scene. Location was randomized before the visit across the four auditory scenes with the constraint that the sound played once from each location before a location was repeated. The recording played the designated sound for 30 s (the ignore-sound phase), was silent for 15 s, and then repeated the sound for 60 s (the share-sound phase). After the sound played for a second time, the director waited 60 s (the post-sound phase) before entering the playroom.
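To make the staging concrete, the sketch below illustrates one way the scene timing and the constrained location randomization just described could be implemented. It is a minimal sketch in Python, not the software used in the study; all function and variable names are hypothetical, and only the phase durations and the constraint that each location play once before any repeats are taken from the text above.

```python
import random

# Phase durations (in seconds) from the protocol description above;
# the sound plays only during the ignore-sound and share-sound phases.
PHASES = [("ignore-sound", 30, True),
          ("interlude", 15, False),
          ("share-sound", 60, True),
          ("post-sound", 60, False)]
LOCATIONS = ["left", "right", "behind"]

def randomize_locations(n_scenes=4):
    """Assign a speaker location to each auditory scene so that the sound
    plays once from each location before any location is repeated."""
    first_pass = random.sample(LOCATIONS, len(LOCATIONS))  # all three, shuffled
    repeats = [random.choice(LOCATIONS) for _ in range(n_scenes - len(LOCATIONS))]
    return first_pass + repeats

def scene_plan(sound_types):
    """Randomize scene order, then attach a location and the fixed timeline."""
    order = random.sample(sound_types, len(sound_types))
    return [{"sound": sound, "location": location, "timeline": PHASES}
            for sound, location in zip(order, randomize_locations(len(order)))]

if __name__ == "__main__":
    for scene in scene_plan(["speech", "music", "animal", "mechanical"]):
        print(scene["sound"], scene["location"])
```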

Video Recording.

The CPP-A was video recorded from separate cameras that captured complementary views of the dyad’s activities without restricting their position on stage. Two cameras (Canon XA25) were operated by research assistants behind the one-way mirrors located on opposite walls of the playroom. The third camera—a GoPro Hero3+ Silver Edition—was embedded in a modified GoPro Head Mount that the parent wore on the forehead and often captured a clear view of the child’s facial expressions and gaze.

Coding

Trained observers coded a scene by viewing video from the three camera angles simultaneously, typically three times at regular speed; this usually took about 20 min per scene. Behavior was characterized by applying 16 codes (see Figure 1) to the video records of the sound scenes. Three codes focused on the child's independent reaction to the sound (hypersensitivity, alerting, orienting). Four additional codes characterized the child's shared engagement with the sound. The first, attempts to share sound, was a binary code used during the ignore-sound phase to note whether the child tried to initiate joint engagement. The other three were 4-point rating items adapted from the Joint Engagement Rating Inventory (JERI; see Adamson, Bakeman, Deckner, & Nelson, 2012) to characterize joint engagement with the sound in the share-sound and post-sound phases, when the parent was available to share the sound. To be considered joint engagement, both the child and the parent had to be actively attending to the sound together so that the parent's actions influenced the child's experience of the sound. Our joint engagement codes were coordinated joint engagement—the child shared the sound with the parent while also actively attending to the parent, usually by glancing to him or her; supported joint engagement—the child shared the sound with the parent but did not explicitly acknowledge the parent; and symbol-infused joint engagement—the child attended to the shared sound and to symbols such as words and iconic gestures. In addition, six codes focused on the child's specific communicative acts about the sound (speaks, including labels and location words; gestures, including points and iconic gestures).

Figure 1. Coding Scheme

Three binary codes focused on the parent's contribution. Parent compliance (coded yes if the parent followed our instructions to ignore the sound and any child reaction to it during the ignore-sound phase and to share the sound with the child in the share-sound phase) served as a fidelity check on the CPP-A protocol. The two other codes focused on the parent's scaffolding of the child's engagement with the sound and on the parent's language scaffolding, which not only encouraged the child's engagement with the sound but also emphasized language about the sound (e.g., "That's a cat!", "Hear the guitar!").

Codes were used for all three phases except alerts to sound and orients to sound (not used for the post-sound phase), attempts to share sound (not used for the share-sound and post-sound phases), and the joint engagement items and parent scaffolding codes (not used for the ignore-sound phase).

Reliability.

To check observer agreement on the variables, 54 of the 232 visits (23%) were randomly selected to assess reliability, with the constraints that they be spread in equal proportion across visits and that all four auditory scenes had been completed. Observers rated each such visit without being aware that it had been selected for double coding. Their records were then compared with those of a trained master coder. Agreement statistics—percentage agreement, kappa, and estimated percent accuracy—for our 2×2 kappa tables are given in Table 2 (for hypersensitivity, which was coded for the scene and not by phase, kappa = .72 and estimated accuracy = 98%). For 2×2 tables, relatively low values of kappa are expected and are nonetheless acceptable (Bakeman & Quera, 2011; Bakeman, Quera, McArthur, & Robinson, 1997); kappa values were all .65 or above except for language scaffolding, which was .60. Our goal was at least 90% estimated observed accuracy and, with one exception (language scaffolding at 88%), values were 90% or better (for a description of estimated observed accuracy, see Bakeman, 2018; Bakeman et al., 1997; Gardner, 1995).

Table 2. Percentage of Scenes Coded Yes for Each Code by Phase and Reliability Statistics

Code                                Ignore-sound  Share-sound  Post-sound  % agree  Kappa  Est. % accuracy
Child attention to sound
  Alerts to sound                        93            88           –          94     .72        96
  Orients to sound                       81            73           –          89     .78        94
Child joint engagement with sound
  Attempts to share sound                26             –           –          94     .72        96
  Auditory joint engagement (b)           –            79          26          83     .65        90
  Coordinated AJE (a)                     –            42          12          88     .67        92
  Supported AJE (a)                       –            65          19          83     .65        90
  Symbol-infused AJE (a)                  –            41          17          89     .69        93
Child communicative acts
  Speaks related to sound                22            36          17          97     .90        98
    If yes, a label                       8            18           6          90     .76        94
  Gestures related to sound              17            32          14          95     .82        97
    If yes, points                       14            22           9          97     .91        98
Parent
  Scaffolding                             –            99          45          92     .79        95
    If yes, language scaffolding          –            84          27          80     .60        88

Note. Percentages are for 928 scenes (all scenes for all dyads at all visits, with 8 missing visits). Reliability statistics are for 54 double-coded sessions (see text for details).

(a) Rated 1–4 initially and then binary recoded to yes (rated 2–4) or no (rated 1).
(b) Coded yes if either coordinated or supported auditory joint engagement was coded yes.
(–) Code not used for that phase (see text).
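For readers who want to reproduce the agreement statistics, the sketch below shows how percentage agreement and Cohen's kappa are computed from a 2×2 observer-agreement table. The estimated percent accuracy reported above comes from the simulation-based model of Bakeman et al. (1997) and is not reproduced here; the counts in the example are made up for illustration, not taken from the study.

```python
def agreement_stats(yes_yes, yes_no, no_yes, no_no):
    """Percentage agreement and Cohen's kappa for a 2x2 agreement table.

    yes_yes and no_no count scenes where both coders agreed;
    yes_no and no_yes count disagreements between coder A and coder B."""
    total = yes_yes + yes_no + no_yes + no_no
    p_observed = (yes_yes + no_no) / total
    # Chance agreement from the marginal proportions of each coder
    p_chance = (((yes_yes + yes_no) / total) * ((yes_yes + no_yes) / total) +
                ((no_yes + no_no) / total) * ((yes_no + no_no) / total))
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return 100 * p_observed, kappa

# Hypothetical counts: 40 agreed-yes, 3 + 4 disagreements, 53 agreed-no
percent_agree, kappa = agreement_stats(40, 3, 4, 53)
print(f"{percent_agree:.0f}% agreement, kappa = {kappa:.2f}")
```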

Variables.

For analysis, the three joint engagement rating items were recoded as binary (no = 1, engagement not observed; yes = 2–4, some to considerable engagement observed). An additional code—auditory joint engagement—was created and coded yes if either coordinated or supported auditory joint engagement was coded yes, thus representing total auditory joint engagement. Codes for child communicative acts concerning iconic gestures and speaks about sound location were not analyzed further due to low counts: for the ignore-sound, share-sound, and post-sound phases, their counts were 1, 22, and 3 (0.1%, 2.4%, and 0.3% of 928 scenes) and 19, 42, and 62 (2.0%, 4.5%, and 6.7% of 928 scenes), respectively. Thus, in addition to child hypersensitivity to sound and parent complies with instructions, which were coded by scene and not by phase, and including the created variable but excluding the two just described, our analyses focused on 13 binary variables derived from the CPP-A. These variables are listed in Table 2, along with the percentage of scenes coded yes for each variable in each phase (if the variable was coded for that phase).
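As a concrete illustration of this recoding, the brief sketch below recodes a 1–4 rating to binary and derives the composite auditory joint engagement variable; it is a hypothetical rendering, not the study's analysis code.

```python
def rating_to_binary(rating):
    """Recode a 1-4 rating to binary: 1 (not observed) -> no; 2-4 -> yes."""
    return rating >= 2

def total_auditory_joint_engagement(coordinated, supported):
    """Coded yes if either coordinated or supported AJE is coded yes."""
    return rating_to_binary(coordinated) or rating_to_binary(supported)

assert total_auditory_joint_engagement(1, 3) is True   # supported only
assert total_auditory_joint_engagement(1, 1) is False  # neither observed
```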

Data Analysis

To analyze effects of age, we computed for each visit how many of the four scenes were coded yes for a specific code; thus scores could vary from 0 to 4. To analyze effects of location, we computed for each location (side or behind) at each age the proportion of scenes coded yes; thus scores could vary from 0 to 1 (proportions were used because our protocol called for randomizing three locations—left, right, and behind—across four scenes, so the number of side and behind scenes was not always equal for a session). To analyze effects of sound type, we computed for each of the four types the number of visits coded yes; thus scores could vary from 0 to 5.

As noted above, eight dyads were missing one visit. For repeated-measures analyses, we substituted sample means for the missing scores. This is simple, straightforward, and, as Widaman (2006) noted, leaves sample means of the non-missing values unchanged. Thus, data were derived from 928 scenes (48 dyads, 5 visits, 4 scenes per visit less 4 scenes each for the 8 dyads missing one visit). Additionally, due to experimenter error, 3 dyads were missing the behind location for the 30-month visit; these dyads were excluded from analyses of location.
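The sketch below illustrates the scoring and missing-data handling described in the two preceding paragraphs: counting, per dyad and visit, how many of the four scenes were coded yes (0–4) and substituting the per-visit sample mean for a dyad's missing visit. The per-scene record layout is assumed for illustration and is not the study's actual data structure.

```python
def visit_scores(scenes, code):
    """Per (dyad, visit) count of scenes coded yes for `code` (range 0-4).

    `scenes` is assumed to be a list of dicts, one per coded scene, with
    'dyad' and 'visit' keys plus one boolean entry per code."""
    scores = {}
    for scene in scenes:
        key = (scene["dyad"], scene["visit"])
        scores[key] = scores.get(key, 0) + int(scene[code])
    return scores

def fill_missing_with_mean(scores, dyads, visits):
    """Substitute the sample mean of observed scores for a missed visit;
    as noted above, this leaves the per-visit means of the non-missing
    values unchanged (Widaman, 2006)."""
    filled = {}
    for visit in visits:
        observed = [scores[(d, visit)] for d in dyads if (d, visit) in scores]
        mean = sum(observed) / len(observed)
        for d in dyads:
            filled[(d, visit)] = scores.get((d, visit), mean)
    return filled
```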

We used repeated measures analyses of variance to detect statistically significant differences among mean scores for visits, sound types, and sound locations within phases, decomposing age (i.e., visit) effects into linear, quadratic, and cubic trends and using Tukey post hoc tests to characterize differences between sound types. Effect sizes were assessed with partial eta squared (pη2). Significant linear trends indicated steady increases over visits, quadratic trends usually indicated initial increases followed by decreases (an inverted U), and cubic trends indicated lower initial scores, rapid acceleration, and higher later scores (an S curve)—understanding that trends can be combined, that is, effect sizes for more than one trend can be noteworthy and statistically significant.
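One conventional way to carry out such a trend decomposition is with orthogonal polynomial contrast coefficients, testing each per-dyad contrast score against zero and converting t to partial eta squared via pη2 = t²/(t² + df). The sketch below uses the standard coefficients for five equally spaced levels (most ANOVA packages do the same, even though the visits here were at 12, 15, 18, 24, and 30 months) and simulated data; it is an illustration of the technique, not the authors' analysis script.

```python
import numpy as np

# Orthogonal polynomial contrast coefficients for five equally spaced levels
CONTRASTS = {
    "linear":    np.array([-2, -1, 0, 1, 2], dtype=float),
    "quadratic": np.array([2, -1, -2, -1, 2], dtype=float),
    "cubic":     np.array([-1, 2, 0, -2, 1], dtype=float),
}

def trend_tests(data):
    """Test each polynomial trend in an (n_dyads, 5) array of visit scores.

    For each contrast c, compute per-dyad contrast scores data @ c, run a
    one-sample t test of their mean against zero (equivalent to the
    single-df within-subject contrast), and report t and partial eta squared."""
    n = data.shape[0]
    df = n - 1
    results = {}
    for name, c in CONTRASTS.items():
        scores = data @ c
        t = scores.mean() / (scores.std(ddof=1) / np.sqrt(n))
        results[name] = {"t": t, "partial_eta_sq": t**2 / (t**2 + df)}
    return results

# Example with simulated 0-4 scores for 48 dyads across five visits
rng = np.random.default_rng(0)
simulated = rng.integers(0, 5, size=(48, 5)).astype(float)
print(trend_tests(simulated))
```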

Results

Preliminaries

Fidelity check.

Parents almost always complied with our instructions to ignore the sound and the child's reaction to it in the ignore-sound phase and to attempt to share the sound with the child in the share-sound phase. Non-compliance was coded for only five of the 928 ignore-sound phases and for only two of the share-sound phases.

Child hypersensitivity.

No child appeared unduly hypersensitive to the sounds, supporting Hypothesis 1.1. Of 928 scenes, only 22 were coded yes for hypersensitivity (3 speech, 2 music, 3 animal, and 14 mechanical sound scenes); moreover, only 11 children were ever coded yes, and none for more than two of their five visits.

Reactions to Sounds During the Ignore-Sound Phase

Alerting and orienting.

Children almost always alerted to the sounds during the ignore-sound phase—alert was coded yes for 859 of the 928 scenes (see Table 2)—and all children were coded yes for at least one scene at each visit, and usually more. On average and at all ages, children alerted to the sound in over 3.5 of 4 scenes and oriented to the sound in over 3.0 of 4 scenes (see Figure 2). A repeated measures analysis of variance showed no statistically significant effect of age across visits (pη2 = .015 and .027, p = .60 and .26, for alerts and orients, respectively), supporting Hypothesis 1.2.

Figure 2. Effect of age on alerts, orients, and shares attention to the sound for the ignore-sound phase. Bars represent the mean number of scenes coded yes (0–4) over the four sound types, separately for the 12–30-month visits. N = 45; error bars are standard errors of the mean.

Effect of sound location.

Children appeared to orient more, but not alert more, to sounds coming from the side as opposed to from behind, partially supporting Hypothesis 1.3a, and in ways that did not change with age, thus not supporting Hypothesis 1.3b. Figure 3 shows means for alerts and orients for the ignore-sound phase, separately for sounds from the side and from behind. Analyses of variance for location with age as the repeated measure showed an effect of location for orients (pη2 = .19, p = .004) but not for alerts (pη2 = .013, p = .44), no statistically significant age effect for either alerts or orients, and no statistically significant interaction. For orients, there was a higher probability of orienting to sound when presented from the side at four of the five visits (the exception was the 15-month visit), resulting in an overall significant main effect for location.

Figure 3. Effect of sound location on alerts and orients to sound for the ignore-sound phase. Bars represent the proportion of scenes coded yes for alerts and orients for sound coming from the side or behind, separately for the 12–30-month visits. N = 45; error bars are standard errors of the mean.

Sharing Sounds During the Ignore-Sound and Share-Sound Phases

Total auditory joint engagement during the share-sound phase increased from 12 to 18 months and then stabilized, generally supporting Hypothesis 2.1 (see Figure 4). Analyses of variance showed that the effect size for the increasing linear trend was greater than for the inverted-U quadratic trend; pη2 = .42 and .19, p < .001 and = .002.

Figure 4. Effect of age on total and coordinated, supported, and symbol-infused auditory joint engagement (AJE). Bars represent the mean number of scenes coded yes (0–4) over the four sound types, separately for the 12–30-month visits. N = 48; error bars are standard errors of the mean.

Children’s attempts to share the sound increased with age during the ignore-sound phase (see Figure 2), as did coordinated auditory joint engagement in the share-sound phase (see Figure 4), supporting Hypothesis 2.2. Analyses of variance showed significant increasing linear trends; pη2 = .71 and .30, p < .001 for both.

Supported versus coordinated auditory joint engagement.

Supported auditory joint engagement during the share-sound phase increased from 12 to 18 months and then stabilized, generally supporting Hypothesis 2.3a (see Figure 4). Analyses of variance showed that the effect size for the increasing linear trend was greater than for the inverted-U quadratic trend (pη2 = .35 and .073, p < .001 and = .060), albeit with a less pronounced inverted U than for total auditory joint engagement. Supported auditory joint engagement also occurred more often than coordinated auditory joint engagement at all ages during the share-sound phase, supporting Hypothesis 2.3b (see Figure 4). Analysis of variance with coordinated versus supported and age as repeated measures showed a significant coordinated versus supported main effect (pη2 = .57, p < .001).

Symbol-infused auditory joint engagement.

Symbol-infused auditory joint engagement during the share-sound phase was minimal at 12 months of age but increased thereafter, supporting Hypothesis 2.4 (see Figure 4). Analysis of variance showed a significant linear trend; pη2 = .88; p < .001.

Child Communicative Acts During the Share-Sound Phase

With some qualification, children's speaking about the sound, including labeling, increased with age in the share-sound phase, supporting Hypothesis 2.5a (see Figure 5). For both, the strongest increases began at 18 months. We could not determine, however, whether iconic gestures increased with age; as noted earlier, they were coded too seldom for meaningful analysis. For both speaks and, if speaks, a label, the strongest trend was increasing linear, although qualified by weaker quadratic and, for speaks, cubic trends; pη2 = .92, .16, .19 and .79, .11, .035; p < .001, = .005, = .002 and < .001, = .022, = .20, respectively.

Figure 5. Effect of age on child speaks including labels, on gestures including points, and on parent scaffolding including scaffolding for language. Bars represent the mean number of scenes coded yes (0–4) over the four sound types, separately for the 12–30-month visits. N = 48; error bars are standard errors of the mean.

Pointing increased from 12 to 18 months and then decreased in the share-sound phase, supporting Hypothesis 2.5b (see Figure 5). The effect size for the inverted-U quadratic was greater than for the increasing linear trend; pη2 = .23 versus .12, p < .001 versus .015.

Parent Scaffolding During the Share-Sound Phase

Parents scaffolded in essentially all scenes at all ages in the share-sound phase (further evidence that they complied with our instructions), supporting Hypothesis 3.1 (see Figure 5). Moreover, parent scaffolding for language increased over time in the share-sound phase, supporting Hypothesis 3.2. Its increasing linear trend pη2 was .27, p < .001.

Sharing Sounds in the Post-Sound Phase

Total auditory joint engagement increased from 12 to 18 months and then decreased somewhat in the post-sound phase, supporting Hypothesis 4.1 (see Figure 4). Analyses of variance showed roughly equal effect sizes for the increasing linear and the inverted-U quadratic trends; pη2 = .35 and .27, p < .001 for both.

With some qualification, symbol-infused joint engagement (see Figure 4), child speaks and gestures, and parent language scaffolding (see Figure 5) all increased with age in the post-sound phase, generally supporting Hypothesis 4.2. Symbol-infused joint engagement increased from 18 to 30 months (increasing linear component pη2 = .47; p < .001). Child speaks about the sound was near zero at 12 and 15 months and then increased to occurring in about 1 of 4 scenes from 18–30 months (increasing linear trend pη2 = .55, p < .001, but quadratic, cubic, and quartic trend pη2 = .10–.15, p = .006–.025). Gestures and language scaffolding increased to 18 months and decreased thereafter (inverted-U quadratic pη2 = .38 and .22, p < .001 for both).

Speech Versus Environmental Sounds During the Ignore-Sound and Share-Sound Phases

To address our second research aim—examining the effect of the type of sound on sharing sounds—we asked whether the type of sound affected children's alerting and orienting in the ignore-sound and share-sound phases. In the ignore-sound phase children were more likely to alert and orient to speech and mechanical sounds than to music and animal sounds; in the share-sound phase, alerting and orienting to speech did not differ reliably from alerting and orienting to the other sounds (see Figure 6). We had hypothesized that speech would attract more initial alerting and orienting than environmental sounds, but this was true for only one of the environmental sounds (animal calls) in the ignore-sound phase, thus providing at best only weak support for Hypothesis 5.1.

Figure 6. Effect of sound type on alerts, orients, shares attention to the sound, auditory joint engagement (AJE), speaks, and gestures for the ignore-sound and share-sound phases, as appropriate. Bars represent the mean number of visits coded yes (0–5) over the five visits, separately for the speech (Sp), music (Mu), animal (An), and mechanical (mK) sound types. N = 48; error bars are standard errors of the mean. Means for the lightest, medium, and darkest bars differ at p < .05 per Tukey post hoc tests, whereas means for the dotted bars do not differ significantly from the others.

We had also hypothesized that environmental sounds might be shared more often than speech sounds as assessed by attempts to share interest in the sound, auditory joint engagement, and child speaks and gestures about the sound. Indeed, each of these variables was coded less frequently for speech than for animal or mechanical sounds, thus supporting Hypothesis 5.2 (see Figure 6). At least during the share-sound phase, music seemed similar to speech; auditory joint engagement, speaks, and gestures were all coded less frequently for music than for animal or mechanical sounds.

Order Effects

Although our primary focus was on the effects of children's age and sound type on children's behavior, it is worth asking whether children became less responsive as the scenes progressed, especially in the ignore-sound phase when the parent did not share the child's interest in the sound. When we examined the key behaviors of alerts, orients, and attempts to share interest in the sound in the ignore-sound phase and auditory joint engagement in the share-sound phase, we found a marginally significant decrease for orients and a statistically significant decrease for auditory joint engagement (linear trend pη2 = .005, .071, .035, and .14; p = .62, .06, .20, and .008, respectively), suggesting a carry-over effect, at least for auditory joint engagement in the share-sound phase and possibly for orients during the ignore-sound phase. Nonetheless, for both orients and auditory joint engagement the overall effect size was greater for sound type than for order (pη2 of .30 vs. .032 and .17 vs. .064).

Discussion

This longitudinal study shows how a sound—be it a piece of instrumental music, a mechanical noise, an animal call, or an overheard conversation—can become a topic during early parent-child interactions. To do this, we isolated sound as a potential shared topic by blending traditional experimental procedures that vary sound location and type into the semi-naturalistic Communication Play Protocol. Parents were willing and able to follow our instructions, and toddlers appeared to take in stride hearing sounds playing from unseen sources, almost always alerting to them even before their parents attended to them and almost never showing dismay. Thus, we were able to systematically observe almost a thousand playful scenes of toddlers and parents sharing sounds that we then used to describe the development of auditory joint engagement and to probe differences in the way various sounds are shared.

The Developmental Course of Auditory Joint Engagement

As we hypothesized, over time toddlers increasingly tried to share new sounds with their parent, albeit to no immediate avail given our instructions to their parents in the ignore-sound phase. When parents were available to share the sound, periods of auditory joint engagement often ensued, increasing in frequency from two thirds of the observations at 12 months to almost ceiling level from 18 months through 30 months, when our observations ended.

The developmental course of joint engagement with sounds appears remarkably similar to the emergence of joint engagement with multimodal objects and events (Adamson et al., 2004; Bakeman & Adamson, 1984; Carpenter et al., 1998). Both auditory and multimodal joint engagement occur well before children are verbal, and by 18 months both are well-established, occurring at almost every opportunity. Moreover, the structure of auditory and multimodal joint engagement is strikingly similar. As we predicted, auditory joint engagement occurred both with and without the child actively engaging with the parent, with periods of supported joint engagement occurring more often than periods of coordinated joint engagement. Thus, children often experience sounds, as they do tangible objects, that are enriched by the parent's actions and language without having to expend attention acknowledging the parent or actively sharing affect, an asymmetrical arrangement of engagement that may facilitate learning (Adamson et al., 2004; Bloom, 1993). Furthermore, as in multimodal joint engagement (Adamson et al., 2004), we found that the occurrence of symbol-infused joint engagement rose markedly as children aged. At 12 months, it was rarely observed; by 30 months of age, children almost always attended to words and/or iconic gestures when they shared sounds. Using symbols while focusing on a shared topic allows the shared topic to extend beyond the immediate context into a representational space that can include the past and future aspects of the shared topic (e.g., "That piano sounds like the one at Grandma's.") as well as mental states (e.g., "I really like that motorcycle!"; Adamson & Bakeman, 2006; Werner & Kaplan, 1963). We observed instances of such extension in the post-sound phase when symbol-infused joint engagement was observed with a sound that had ceased.

Commonalities in the developmental course of auditory and multimodal joint engagement suggest that sounds and tangible objects might have much in common when they serve as the focus of a parent-toddler interaction. Indeed, our findings support the notion that toddlers may often treat sounds as if they were part of a potentially visible whole located just outside of view. Most notably, when a sound began toddlers almost always turned to look towards its possible source; indeed, at times they even went so far as to race over and peer out of one of the playroom's one-way windows. The clarity of these spontaneous reactions to the sound suggests that young children regarded intangible sounds as part of multisensory objects (Murray, Lewkowicz, Amedi, & Wallace, 2016; see also Werker, 2018, for a related discussion about infants' rich multisensory appreciation of speech). Moreover, similar to the lateral advantage for visual search behavior (Butterworth & Jarrett, 1991), the location of the sound source affected spatial hearing, with toddlers more often orienting to sounds that emanated from the side rather than from behind. This finding is consistent with the long-held notion that spatial hearing involves an attempt to place the sound's hidden source within visual space, which is harder for infants to explore when it is behind them and hence invisible (Butterworth & Castillo, 1976; Clifton, 1992). However, we did not find, as we had hypothesized, that this lateral advantage decreased with age as children's representational skills increased, as it likely does when children search for visible targets. One possible explanation is that representing the hidden source of sounds in invisible space is more difficult than representing soon-to-be-seen visible objects. However, given that even 12-month-olds oriented to sounds emanating from behind in 70% of the ignore-sound phases, explanations involving the motoric and motivational differences between turning partially as compared to completely around also seem credible.

As expected, and consistent with studies of communicative acts during joint engagement with multimodal objects (Bakeman & Adamson, 1986), over time children spoke more often and, more specifically, they more often labeled sounds using the name of the sound source. Moreover, parents actively scaffolded language learning as they shared sounds, doing so in 3 of 4 scenes even with 12-month-old infants. Gestures related to the sound also increased, but interestingly, this increase was likely due to an increase in conventional gestures rather than pointing. It is noteworthy that children point towards a sound source at all. To date, studies of early pointing have not isolated sound as the potential target of pointing (see Colonnesi, Stam, Koster, & Noom, 2010, for a review). Here we observed pointing to sounds at least once in 181 of the 928 share-sound phases we staged, documenting that it does occur during parent-child interactions. Moreover, albeit at a relatively low rate, we observed points to a sound that had ceased at least once in 84 of 928 scenes, adding to observations of pointing to once-visible absent objects (Bohn, Zimmermann, Call, & Tomasello, 2018; Liszkowski, Carpenter, & Tomasello, 2007). But, as we predicted, points to sound followed an intriguing inverted-U shaped developmental trajectory that we have not seen reported previously: in all three phases, pointing increased from 12 to 18 months, as has often been noted for visual and multimodal targets, but then declined. Thus, it appears that as children began to talk about the sounds, the naming of objects gradually supplanted deictic gestures as a way to communicate about them.

Sharing Different Types of Sounds

There were subtle but intriguing differences in how attractive and shareable toddlers found the different types of sounds we presented. To some extent, these differences can be interpreted using the distinction between speech and environmental sounds. Our expectation that speech would initially attract more attention than environmental sounds was partially supported, since children did alert and orient to speech more often than to animal calls in the ignore-sound phase. However, both speech and mechanical noises seemed irresistible, provoking attention in essentially all of the ignore-sound scenes in which they were played. One intriguing explanation for speech's appeal is that the presence of the child's own name in the overheard conversation induced the so-called "cocktail party effect" that has been observed even in young infants (Bernier & Soderstrom, 2018; Mandel, Jusczyk, & Pisoni, 1995; Newman, 2005). Nevertheless, finding that the sound of a motorcycle or a train held a similarly powerful appeal for toddlers adds an unexpected caveat to the prominent finding that speech is particularly salient to infants even at birth (e.g., Vouloumanos & Werker, 2007).

Yet, despite speech's propensity to attract toddlers' attention, it was less likely than environmental sounds to become the topic of joint engagement and the referent for children's words and gestures. The possibility that toddlers thought that animal sounds and mechanical noises emanated from potentially tangible objects helps explain this finding. But it may also reflect in part that, as parents scaffolded joint engagement, they were more likely to encourage labeling of the entity that produced an environmental sound than of the people whose conversation was overheard, doing so both when the sound was playing and after it stopped. In any case, these findings suggest that environmental sounds may be especially auspicious referents for early language, ones that can encourage not only the acquisition of conventional labels like cat and train but also the playful production of onomatopoeia labels like meow (Laing, 2014; Perry, Perlman, Winter, Massaro, & Lupyan, 2018).

Comparing reactions to music and to other sounds underscores why instrumental music seems to elude ready classification, being considered by some to be an environmental sound (Lass et al., 1982) but not by others (Ballas & Howard, 1987), and why it has long been recognized (Trehub et al., 1993) that music processing is multifaceted even during early infancy. We found that music, like speech, was often less likely than environmental sounds to be the shared topic of joint engagement or the target of communicative acts, but like animal sounds, it did not evoke as much initial alerting and orienting as speech and mechanical noises. But perhaps most compelling, we observed a remarkable range of reactions to music from seeming indifference to dramatic dance and singing performances, indicating the need for additional studies of variations in how music is introduced and appreciated during early childhood.

Concluding Comments

The current study provides the foundation for additional research on the developmental course and variability of auditory joint engagement as well as its significance for language acquisition. Broadening the age span to include observations before 12 months might reveal more fully the emergence of auditory coordinated joint engagement as young infants first begin to attend both to a shared topic and to their partners. Continuing to observe past 30 months would help describe how children come to produce iconic gestures to refer to sounds during interactions.

One limitation of the current study is that the parents who graciously participated with their typically developing toddlers in five sessions over a period of 18 months offered us a relatively narrow view of variations in auditory joint engagement. Especially informative may be observations in non-Western cultures that may more actively encourage infants to share sounds, including Japan, where Oyabu first observed reactions to bird song (Oyabu, 2006; Oyabu & Adamson, 2000). In addition, observing children whose developmental difficulties might affect hearing would further broaden our view of variations in auditory joint engagement. We are currently expanding this view by observing auditory joint engagement in toddlers diagnosed with autism spectrum disorder (ASD), who often display both joint attention difficulties (Adamson, Bakeman, Suma, & Robins, 2019; Mundy, 2016) and unusual reactions to sounds, such as a selective lack of response to their own name, defensive reactions to loud sounds, and fascination with specific mechanical sounds (e.g., Wiggins, Robins, Bakeman, & Adamson, 2009; for a review, see O’Connor, 2012; cf. Uljarević et al., 2017).

Finally, the current study now positions us to probe how auditory joint engagement might facilitate language development (Adamson & Dimitrova, 2014). In line with Akhtar’s (2004) caution that third-party conversations should not be overlooked as an important context for early word learning, future studies might profitably examine how different characteristics of third-party speech, such as the presence of names, the identity of the speakers, and the degree to which that speech has the prosody of child-directed talk, affect children’s attention to and sharing of this speech. Moreover, given studies indicating that the quality of multimodal joint engagement predicts language outcomes (e.g., Hirsh-Pasek et al., 2015), we look forward to learning whether auditory joint engagement, given its sole focus on sound, is a particularly powerful predictor of early language acquisition.

Acknowledgments

This research was supported by a grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD035612). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Child Health and Human Development or the National Institutes of Health. Diana L. Robins is co-owner of M-CHAT, LLC, which receives royalties from companies that incorporate the M-CHAT(-R) into commercial products. No royalties were collected in relation to the current study.

The authors thank Anita Hasni, Ashley Kellerman, Sarah Vogt, Claire Cusack, Lindsey Grossman, Jeri Wheeler, and Dominique La Barrie for their help with this research. We also gratefully acknowledge the parents and children who participated in this study.

Footnotes

Portions of the study were presented at the 2016 biennial International Congress of Infant Studies, New Orleans, LA.

References

1. Adamson LB (1996). Communication development during infancy. Boulder, CO: Westview.
2. Adamson LB, Als H, Tronick E, & Brazelton TB (1977). The development of social reciprocity between a sighted infant and her blind parents. Journal of the American Academy of Child Psychiatry, 16, 194–207.
3. Adamson LB, & Bakeman R (2016). The Communication Play Protocol: Capturing variations in language development. Perspectives of the ASHA Special Interest Groups, SIG 12, 1(4), 164–171. doi: 10.1044/persp1.SIG12.164
4. Adamson LB, & Bakeman R (2006). The development of displaced speech in early mother-child conversations. Child Development, 77, 186–200. doi: 10.1111/j.1467-8624.2006.00864.x
5. Adamson LB, Bakeman R, & Deckner DF (2004). The development of symbol-infused joint engagement. Child Development, 75, 1171–1187. doi: 10.1111/j.1467-8624.2004.00732.x
6. Adamson LB, Bakeman R, Deckner DF, & Nelson PB (2012). Rating parent-child interactions: Joint engagement, communication dynamics, and shared topics in autism, Down syndrome, and typical development. Journal of Autism and Developmental Disorders, 42, 2622–2635. doi: 10.1007/s10803-012-1520-1
7. Adamson LB, Bakeman R, Deckner DF, & Nelson PB (2014). From interactions to conversations: The development of joint engagement during early childhood. Child Development, 85, 941–955. doi: 10.1111/cdev.12189
8. Adamson LB, Bakeman R, Suma K, & Robins DL (2019). An expanded view of joint attention: Skill, engagement, and language in typical development and autism. Child Development, 90(1), e1–e18. doi: 10.1111/cdev.12973
9. Adamson LB, & Dimitrova N (2014). Joint attention and language development. In Brooks P & Kempe V (Eds.), Encyclopedia of language development (pp. 299–304). Los Angeles, CA: Sage Publications.
10. Akhtar N (2004). Contexts of early word learning. In Hall DG & Waxman S (Eds.), Weaving a lexicon (pp. 485–507). Cambridge, MA: MIT Press.
11. Akhtar N, & Gernsbacher MA (2008). On privileging the role of gaze in infant social cognition. Child Development Perspectives, 2, 59–65. doi: 10.1111/j.1750-8606.2008.00044.x
12. Alegria J, & Noirot E (1978). Neonate orientation behavior towards human voice. International Journal of Behavioral Development, 1, 291–312. doi: 10.1177/016502547800100401
13. Alfaro AU, Morash VS, Lei D, & Orel-Bixler D (2017). Joint engagement in infants and its relationship to their visual impairment measurements. Infant Behavior & Development. doi: 10.1016/j.infbeh.2017.05.010
14. Bakeman R (2018). KappaAcc: Deciding Whether Kappa is Big Enough by Estimating Observer Accuracy (Technical Report 28). Retrieved from Georgia State University website: http://bakeman.gsucreate.org/DevLabTechReport28.pdf
15. Bakeman R, & Adamson LB (1984). Coordinating attention to people and objects in mother-infant and peer-infant interaction. Child Development, 55, 1278–1289. doi: 10.2307/1129997
16. Bakeman R, & Adamson LB (1986). Infants’ conventionalized acts: Gestures and words with mothers and peers. Infant Behavior and Development, 9, 215–230. doi: 10.1016/0163-6383(86)90030-5
17. Bakeman R, & Quera V (2011). Sequential analysis and observational methods for the behavioral sciences. Cambridge, UK: Cambridge University Press.
18. Bakeman R, Quera V, McArthur D, & Robinson BF (1997). Detecting sequential patterns and determining their reliability with fallible observers. Psychological Methods, 2, 357–370. doi: 10.1037/1082-989X.2.4.357
19. Ballas JA, & Howard J (1987). Interpreting the language of environmental sounds. Environment and Behavior, 19, 91–114. doi: 10.1177/0013916587191005
20. Bernier DE, & Soderstrom M (2018). Was that my name? Infants’ listening in conversational multi-talker backgrounds. Journal of Child Language, 45, 1439–1449. doi: 10.1017/S0305000918000247
21. Bigelow AE (2003). The development of joint attention in blind infants. Development and Psychopathology, 15, 259–275. doi: 10.1017/S095457940300014
22. Bloom L (1993). The transition from infancy to language. Cambridge, England: Cambridge University Press.
23. Botero M (2016). Tactless scientists: Ignoring touch in the study of joint attention. Philosophical Psychology, 29, 1200–1214. doi: 10.1080/09515089.2016.1225293
24. Bottema-Beutel K (2016). Associations between joint attention and language in autism spectrum disorder and typical development: A systematic review and meta-regression analysis. Autism Research, 9, 1021–1035. doi: 10.1002/aur.1624
25. Brown R (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
26. Bruner J (1983). Child’s talk: Learning to use language. New York: W. W. Norton.
27. Bohn M, Zimmermann L, Call J, & Tomasello M (2018). The social-cognitive basis of infants’ reference to absent entities. Cognition, 177, 41–48. doi: 10.1016/j.cognition.2018.03.024
28. Butterworth G, & Castillo M (1976). Coordination of auditory and visual space in newborn human infants. Perception, 5, 155–160. doi: 10.1068/p050155
29. Butterworth G, & Jarrett N (1991). What minds have in common is space: Spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology, 9, 55–72. doi: 10.1111/j.2044-835X.1991.tb00862.x
30. Carpenter M, Nagell K, & Tomasello M (1998). Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development, 63(4, Serial No. 255), 1–174.
31. Clifton RK (1992). The development of spatial hearing in human infants. In Werner LA & Rubel EW (Eds.), Developmental psychoacoustics (pp. 135–157). Washington, DC: American Psychological Association. doi: 10.1037/10119-005
32. Colonnesi C, Stam GJJM, Koster I, & Noom MJ (2010). The relation between pointing and language development: A meta-analysis. Developmental Review, 30, 352–366. doi: 10.1016/j.dr.2010.10.001
33. Cummings A, Čeponienė R, Koyama A, Saygin AP, Townsend J, & Dick F (2006). Auditory semantic networks for words and natural sounds. Brain Research, 1115, 92–107. doi: 10.1016/j.brainres.2006.07.050
34. Eilan N (Ed.) (2005). Joint attention: Communication and other minds: Issues in philosophy and psychology. New York: Oxford University Press.
35. Fernald A, Taeschner T, Dunn J, Papousek M, De Boysson-Bardies B, & Fukui I (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16, 477–501. doi: 10.1017/S0305000900010679
36. Gardner W (1995). On the reliability of sequential data: Measurement, meaning, and correction. In Gottman JM (Ed.), The analysis of change (pp. 339–359). Hillsdale, NJ: Erlbaum.
37. Gomes H, Molholm S, Christodoulou C, Ritter W, & Cowan N (2000). The development of auditory attention in children. Frontiers in Bioscience, 5, d108–120. doi: 10.2741/Gomes
38. Grieco-Calub TM, Litovsky RY, & Werner LA (2008). Using the observer-based psychophysical procedure to assess localization acuity in toddlers who use bilateral cochlear implants. Otology & Neurotology, 29, 235–239. doi: 10.1097/MAO.0b013e31816250fe
39. Hasni AA, Adamson LB, Williamson RA, & Robins DL (2017). Adding sound to theory of mind: Comparing children’s development of mental-state understanding in the auditory and visual realms. Journal of Experimental Child Psychology, 164, 239–249. doi: 10.1016/j.jecp.2017.07.009
40. Hirsh-Pasek K, Adamson LB, Bakeman R, Owen MT, Golinkoff RM, Pace A, Yust PKS, & Suma K (2015). The contribution of early communication quality to low-income children’s language success. Psychological Science, 26, 1071–1083. doi: 10.1177/0956797615581493
41. Kezuka E, Amano S, & Reddy V (2017). Developmental changes in locating voice and sound in space. Frontiers in Psychology, 8, 1–11. doi: 10.3389/fpsyg.2017.01574
42. Laing CE (2014). A phonological analysis of onomatopoeia in early word production. First Language, 34, 387–405.
43. Lass NJ, Eastham SK, Parrish WC, Scherbick KA, & Ralph DM (1982). Listeners’ identification of environmental sounds. Perceptual and Motor Skills, 55, 75–78. doi: 10.2466/pms.1982.55.1.75
44. Liszkowski U, Carpenter M, & Tomasello M (2007). Pointing out new news, old news, and absent referents at 12 months of age. Developmental Science, 10, F1–F7. doi: 10.1111/j.1467-7687.2006.00552.x
45. Lloyd-Fox S, Blasi A, Mercure E, Elwell CE, & Johnson MH (2012). The emergence of cerebral specialization for the human voice over the first months of life. Social Neuroscience, 7, 317–330. doi: 10.1080/17470919.2011.614696
46. Mandel DR, Jusczyk PW, & Pisoni DB (1995). Infants’ recognition of the sound patterns of their own names. Psychological Science, 6, 314–317. doi: 10.1111/j.1467-9280.1995.tb00517.x
47. Melis AP, Call J, & Tomasello M (2010). 36-month-olds conceal visual and auditory information from others. Developmental Science, 13, 479–489. doi: 10.1111/j.1467-7687.2009.00892.x
48. Moll H, Carpenter M, & Tomasello M (2014). Two- and 3-year-olds know what others have and have not heard. Journal of Cognition and Development, 15(1), 12–21. doi: 10.1080/15248372.2012.710865
49. Moore C, & Dunham P (Eds.) (1995). Joint attention: Its origins and role in development. Hillsdale, NJ: Erlbaum.
50. Mundy PC (2016). Autism and joint attention: Development, neuroscience, and clinical fundamentals. New York: Guilford Press.
51. Murray MM, Lewkowicz DJ, Amedi A, & Wallace MT (2016). Multisensory processes: A balancing act across the lifespan. Trends in Neurosciences, 39, 567–579. doi: 10.1016/j.tins.2016.05.003
52. Nelson K (1996). Language in cognitive development: Emergence of the mediated mind. Cambridge: Cambridge University Press.
53. Newman RS (2005). The cocktail party effect in infants revisited: Listening to one’s name in noise. Developmental Psychology, 41, 352–362. doi: 10.1037/0012-1649.41.2.352
54. O’Connor K (2012). Auditory processing in autism spectrum disorder: A review. Neuroscience and Biobehavioral Reviews, 36, 836–854. doi: 10.1016/j.neubiorev.2011.11.008
55. Olsho LW, Koch EG, Halpin CF, & Carter EA (1987). An observer-based psychoacoustic procedure for use with young infants. Developmental Psychology, 23, 627–640. doi: 10.1037/0012-1649.23.5.627
56. Oyabu Y (2006, June). The emergence of joint auditory attention: A comparison with joint visual attention. In Oyabu Y & Adamson LB (Co-Chairs), Broadening views of joint attention in mother-toddler interactions. Paper presented at the meeting of the International Society on Infant Studies, Kyoto, Japan.
57. Oyabu Y, & Adamson LB (2000, March). A trial study of joint auditory attention of one and two year olds. Presentation at the meeting of the Developmental Psychology Society of Japan, Tokyo, Japan.
58. Özçalışkan S, & Goldin-Meadow S (2009). When gesture-speech combinations do and do not index linguistic change. Language and Cognitive Processes, 24, 190–217.
59. Perry LK, Perlman M, Winter B, Massaro DW, & Lupyan G (2018). Iconicity in the speech of children and adults. Developmental Science, 21, e12572. doi: 10.1111/desc.12572
60. Piaget J (1962). Play, dreams and imitation in childhood. New York: W. W. Norton.
61. Rossano F, Carpenter M, & Tomasello M (2012). One-year-old infants follow others’ voice direction. Psychological Science, 23, 1298–1302. doi: 10.1177/0956797612450032
62. Seemann A (2011). Joint attention: New developments in psychology, philosophy of mind, and social neuroscience. Cambridge, MA: MIT Press.
63. Stern DN (1985). The interpersonal world of the infant: A view from psychoanalysis and developmental psychology. New York: Basic Books.
64. Tomasello M (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
65. Tomasello M, & Farrar MJ (1986). Joint attention and early language. Child Development, 57, 1454–1463. doi: 10.2307/1130423
66. Trehub S, Trainor L, & Unyk AM (1993). Music and speech processing in the first year of life. Advances in Child Development and Behavior, 24, 1–35. doi: 10.1016/S0065-2407(08)60298-0
67. Uljarević M, Baranek G, Vivanti G, Hedley D, Hudry K, & Lane A (2017). Heterogeneity of sensory features in autism spectrum disorder: Challenges and perspectives for future research. Autism Research, 10, 703–710. doi: 10.1002/aur.1747
68. Vouloumanos A, & Curtin S (2014). Foundational tuning: How infants’ attention to speech predicts language development. Cognitive Science, 38, 1675–1686. doi: 10.1111/cogs.12128
69. Vouloumanos A, & Werker JF (2007). Listening to language at birth: Evidence for a bias for speech in neonates. Developmental Science, 10, 159–171. doi: 10.1111/j.1467-7687.2007.00549.x
70. Vygotsky LS (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
71. Werker J (2018). Perceptual beginnings to language acquisition. Applied Psycholinguistics, 39, 703–728. doi: 10.1017/S0142716418000152
72. Werner H, & Kaplan B (1963). Symbol formation. New York: Wiley.
73. Werner LA, & Rubel EW (Eds.) (1992). Developmental psychoacoustics. Washington, DC: American Psychological Association. doi: 10.1037/10119-000
74. Widaman KF (2006). Missing data: What to do with or without them. In McCartney K, Burchinal MR, & Bub KL (Eds.), Best practices in quantitative methods for developmentalists (pp. 64–97). Monographs of the Society for Research in Child Development, 71(3, Serial No. 285). doi: 10.1037/a0028418
75. Wiggins LD, Robins DL, Bakeman R, & Adamson LB (2009). Sensory abnormalities as distinguishing symptoms of autism spectrum disorders in young children. Journal of Autism and Developmental Disorders, 39, 1087–1091. doi: 10.1007/s10803-009-0711-x
76. Williamson RA, Brooks R, & Meltzoff AN (2014). The sound of social cognition: Toddlers’ understanding of how sound influences others. Journal of Cognition and Development, 16, 252–260. doi: 10.1080/15248372.2013.824884