Author manuscript; available in PMC: 2010 Sep 1.
Published in final edited form as: Child Dev. 2009 Sep-Oct;80(5):1360–1375. doi: 10.1111/j.1467-8624.2009.01338.x

Live Action: Can Young Children Learn Verbs From Video?

Sarah Roseberry 1, Kathy Hirsh-Pasek 2, Julia Parish-Morris 3, Roberta Michnick Golinkoff 4
PMCID: PMC2759180  NIHMSID: NIHMS138262  PMID: 19765005

Abstract

The availability of educational programming aimed at infants and toddlers is increasing, yet the effect of video on language acquisition remains unclear. Three studies of 96 children aged 30–42 months investigated their ability to learn verbs from video. Study 1 asked whether children could learn verbs from video when supported by live social interaction. Study 2 tested whether children could learn verbs from video alone. Study 3 clarified whether the benefits of social interaction remained when the experimenter was shown on a video screen rather than in person. Results suggest that younger children only learn verbs from video with live social interaction while older children can learn verbs from video alone. Implications for verb learning and educational media are discussed.


Can infants and toddlers learn language from video? A recent upsurge in children’s “edutainment” media (Brown, 1995) has promised to enhance cognitive and language skills in infants and toddlers (Garrison & Christakis, 2005). Despite the prevalence of toddler-directed educational programming and high rates of television viewing among very young children (Zimmerman, Christakis & Meltzoff, 2007a), only a handful of studies have investigated whether children under 3 years can learn words from video. These studies yield mixed results, with most research suggesting that children learn language through responsive and interactive exchanges with adults and other children rather than by passive viewing (Krcmar, Grela & Lin, 2007; Naigles, Bavin & Smith, 2005). The current research explores mechanisms that drive word learning in young children. In particular, we focus on how children acquire verbs, the gateway to grammar. Can children learn verbs when passively exposed to video, or only when they interact with a person while watching videos?

Two distinct lines of research are relevant to the question of whether children can learn language from video. The first, correlational, investigates the relation between television usage patterns and language outcomes. The second, experimental, involves laboratory studies that use video displays to test for vocabulary acquisition.

Research on television viewing and word learning outcomes in children under age three is rare, but at least two relevant studies exist. One compares individual youngsters’ viewing habits while the other is a laboratory-based word learning experiment. The first is a multi-state survey by Zimmerman, Christakis, and Meltzoff (2007b), who asked parents of 8- to 24-month-olds about the content and frequency of their children’s exposure to screen media. A standardized parent-report language measure was also given over the phone. Results revealed that the vocabulary sizes of 8- to 16-month-olds varied inversely as a function of the number of hours of television that they watched per day. That is, infants who spent more time with screen media had smaller vocabulary sizes than same-aged peers who watched less television. Notably, no association between television use and vocabulary was found in children aged 17 to 24 months. In conjunction, these results might indicate a developmental trajectory in the ability to manage input from screen media, or that the process of language learning becomes more robust over time.

The second study, conducted by Krcmar and colleagues (2007), investigated word learning from video in a laboratory setting. Using a repeated measures design, researchers presented novel objects and their names to children aged 15 to 24 months in four ways: A real life interaction with an adult that used joint attention; a real life interaction that incorporated discrepant adult and child attention; an adult on video; and a clip from a children’s television show, Teletubbies. Following each training, children were asked to fast map the novel label onto the object that was named. Results revealed that when children were trained using children’s television shows, 62% of children aged 22 to 24 months selected the target object among four distracters while only 23% of children aged 15 to 21 months were successful. In contrast, word learning was significantly better in both older and younger children when taught by a live person using joint attention (93% and 52% accurate, respectively). Thus, even though all children learn language best in the context of social interaction, older toddlers may learn words from video alone. Taken together, these studies suggest that as children age and have more experience with television, they are better able to conquer the “video deficit” (Anderson & Pempek, 2005) to learn words.

In contrast to the dearth of research on how young children learn words from video, there is a significant body of data examining television and language outcomes in older children. For example, Singer and Singer (1998) demonstrated in a naturalistic study that 3- and 4-year-old children learn nouns from Barney & Friends. Another study suggests social interaction improves recall from educational videos among 4-year-olds (Reiser, Tessmer & Phelps, 1984). In this experimental study, half of the children watched an educational program while participating in a content-related interaction with a live experimenter, while the other half of the children simply watched the video. Three days later, children who interacted with an adult while viewing were better able to identify numbers and letters that had been presented in the video. Finally, Rice and Woodsmall (1988) showed children an animated program in a naturalistic setting that contained 20 novel words from a variety of word classes (e.g., nouns, verbs, adjectives). Results of pre- and posttests suggested that both 3- and 5-year-olds learned to pair a novel word heard on television with a picture that depicted that word. Even here, 5-year-old children learned only 5 out of a possible 20 words, while 3-year-old children learned only 2 out of 20 words.

Although several studies of television and language have been conducted, the literature as a whole has several limitations. First, only two studies to date have examined video and word learning outcomes in children younger than 3 years of age, despite the recent push for educational programming targeting that demographic. Second, video language studies with very young children have primarily investigated noun learning (e.g., Krcmar et al., 2007). To become sophisticated speakers, children must also acquire verbs, adjectives, and prepositions. In particular, verbs enable children to speak about a dynamic world of relations between objects, and may form the bridge into grammar (Marchman & Bates, 1994). Verbs are also, however, more difficult to learn than nouns (Bornstein et al., 2004; Gentner, 1982; Hirsh-Pasek & Golinkoff, 2006). Finally, it is critical to go beyond correlational research to better understand the relationship between language used on video and vocabulary acquisition (Linebarger & Walker, 2005).

Are there any other sources of information that can shed light on children’s ability to learn from video? In addition to studies examining the direct effects of television on language outcomes, there are a number of language experiments that use video as a stimulus at training or test but were not specifically designed to test the effects of television viewing, per se.

Many developmental studies use video paradigms to teach children new linguistic concepts through controlled exposures or to test children’s inherent knowledge of language. Although the videos used in these studies do not approximate the quality of those used in commercial programming, results nonetheless indicate that children under age 3 are able to learn information from video displays. The practice of using video clips to teach new words is particularly useful in studies of verb learning, because dynamic scenes best illustrate the meaning of an action word. For example, studies have used the Intermodal Preferential Looking Paradigm to investigate children’s emerging grammar (Gertner, Fisher & Eisengart, 2006; Hirsh-Pasek, Golinkoff & Naigles, 1996). In this video-based paradigm, children were able to link a transitive (e.g., Cookie Monster is blicking Big Bird) or intransitive (e.g., Cookie Monster is blicking) verb with a video depicting the appropriate action at 28 months of age, even when the verb was unfamiliar.

Recent research has explored whether children can use televised social cues to distinguish words for intentional actions like pursuing from words for unintentional actions like wandering (Poulin-Dubois & Forbes, 2002; 2006). Toddlers watched videos that depicted pairs of intentional and unintentional actions performed by a real person. The actions were perceptually similar, and therefore distinguishable only by the videotaped actor’s social cues indicating intent or lack of intent. Results suggested that by 27 months of age, children succeeded at a matching task using relatively subtle social cues like eye gaze and facial expression, even when those social cues were presented on video.

Other studies using video paradigms further illuminate the optimal conditions for learning from video, and suggest a strong role for social cues, particularly during infancy. For example, young children are better able to reproduce an action after a delay when the demonstration is presented live, rather than on video (Barr, Muentener, Garcia, Fujimoto & Chavez, 2007). However, when 12- to 21-month-olds saw videotaped actions twice as many times as the live action, they imitated just as well as children who had seen the live demonstration with fewer repetitions. In addition to repetition, several studies indicate that contingent social interactions also enable children to learn from video (Nielsen, Simcock & Jenkins, in press; Troseth, Saylor & Archer, 2006). For example, Nielsen and colleagues (in press) demonstrated that providing children with a contingent social partner who interacted via a closed-circuit video increased children’s imitation. In fact, children were just as likely to imitate a contingent partner on video as they were to imitate a live model. Thus, although children typically learn better from live social interactions than from video (see also Conboy, Brooks, Taylor, Meltzoff & Kuhl, 2008; Kuhl, Tsao & Liu, 2003; Kuhl, 2004), repetition and contingent social interactions improve video-based learning. Finally, a study by Naigles and colleagues (2005) also highlighted the importance of social interaction when using video to test children’s verb learning. First, novel verbs were presented to 22- and 27-month-old children in interactive play settings. Here, children were able to interact with the live experimenter and even perform the actions themselves while hearing the novel verb sixteen times. Then, videotaped scenes were used to test children’s ability to match each novel verb with the corresponding target action. Children were able to match a novel verb to the correct action when two scenes were presented side-by-side on a video screen, but only when the novel verb was presented in a rich syntactic context (e.g., “Where is she lorping the ball?”) and not in a bare context (e.g., “Where’s lorping?”). Importantly, although Naigles and colleagues relied on an interactive teaching session and used video only for testing, their results suggest that children are able to transfer knowledge about novel verbs to a video format. Although these studies do not speak to the specific ability of children to learn words from video, their results suggest that social interaction and repetition are central features in children’s successful training and learning from video in other domains (e.g., imitation; Barr et al., 2007; Nielsen et al., in press; Troseth et al., 2006), and that social interaction during training allows young children to generalize their knowledge of novel verbs to video (Naigles et al., 2005). These conclusions are useful in considering the demands of learning words from video.

The current research adds to our knowledge in four ways. First, we study verbs rather than nouns. This move is important because verbs are the architectural centerpiece of language and ultimately control the shape of sentences. Second, word learning demands more than matching words to objects or actions (e.g., Brandone, Pence, Golinkoff & Hirsh-Pasek, 2007; Krcmar et al., 2007). Rather, a child who truly understands a word forms a category of the referent and can extend the word to a new, never-before-seen instance of the referent. For example, after a child sees two or three different-colored cups and hears the label cup, they should be able to label a green cup even if they have never seen one before. Third, most existing research on television and language outcomes is focused on children aged 3 years and older. As the television industry increases programming for children under age three, however, it is imperative for both researchers and the public to understand how television might affect language at this sensitive age. Finally, we examine the role of social support, which prior research underscores as important for verb learning (Brandone et al., 2007; Naigles et al., 2005; Tomasello & Akhtar, 1995), particularly in the case of video-mediated learning (Poulin-Dubois & Forbes, 2002; 2006).

This research is among the first to explore the utility of television for childhood language learning by combining the television and language learning literatures that have, up to now, presented mixed evidence about the effect of television on language. We use highly engaging videos similar to the commercial programming used in research on the direct effects of television on language, and we couple these with the tightly controlled conditions that are characteristic of laboratory-based word learning studies. In this way, we hope to provide children with the optimal video verb-learning experience. We hypothesize that 2-year-olds will be able to learn verbs from video, but only when the video is accompanied by live social interaction. Three studies examine these claims.

Study 1: Do children learn verbs from video in an optimal learning environment?

In light of the finding that social cues, live or demonstrated on video, help children learn verbs (e.g., Naigles et al., 2005; Poulin-Dubois & Forbes, 2002, 2006), the first study is designed to optimize the word-learning situation. We provide children with stimuli from child-oriented commercial programming, and supplement the video with interactive teaching by a live experimenter. When provided with video and social support, will children be able to learn verbs?

Method

Participants

Forty monolingual, English-reared children were tested, twenty in each of two age groups (half male and half female): 30 to 35 months (mean age = 33.74 months; range = 30.66–35.66) and 36 to 42 months (mean age = 39.36 months; range = 36.21–42.87). Data from an additional 5 children were discarded for prematurity (2), bilingualism (1), low attention (1), and experimenter error (1). Children included in the final sample were predominantly white and from middle-class homes in suburban Philadelphia.

Paradigm and dependent variable

Children were tested using the Intermodal Preferential Looking Paradigm (IPLP; see Figure 1; Golinkoff, Hirsh-Pasek & Cauley, 1987; Hirsh-Pasek & Golinkoff, 1996). The IPLP is an established method of studying language acquisition in infants and toddlers. Children see a series of events on a video screen, and are shown two different events (one on each side of a split screen) at test. For example, the video might show a mother performing an action with her child (e.g., bouncing her child up and down on her lap). The scene is accompanied by audio presenting a novel word such as blicking (e.g., Look, she’s blicking!). At test, the same child is shown a split screen wherein the action from training appears on one side of the screen and a scene depicting another novel action appears on the other side of the screen (e.g., a mother turning her child around). The child hears, “Where is blicking?” A number of studies show that if children learn that the word blick refers to a particular novel action, then they prefer looking at the action labeled by the word blick at test. Comprehension is measured by looking time in seconds to the matching versus the non-matching screen.

Figure 1. The Intermodal Preferential Looking Paradigm (see Hollich et al., 2000).

Design and procedure

Videos were created using high-quality clips from Sesame Beginnings, a video series produced by Sesame Workshop for children as young as six months (Hassenfeld, Nosek & Clash, 2006a; 2006b). Children were seated on a parent’s lap, four feet in front of the video monitor. Parents were instructed to close their eyes during the video, and all parents complied. An experimenter sat in a chair next to the parent/child dyad, also with eyes closed, except during the prepared live interaction sequence. The task was presented in four phases: Introduction, Salience, Training, and Testing (see Table 1). With the exception of the Training Phase, each phase was presented entirely in video format. Each video segment was separated by a 3-second “centering trial” in which attention was redirected to the middle of the video monitor. Centering trials showed a baby laughing and were accompanied by audio instructions to look at the screen (“Wow! Look up here!”). All children recentered for at least 0.75 seconds.

Table 1.

Sequence of Phases for Studies 1, 2 and 3.

Trial type | Audio | Left side of screen | Center of screen | Right side of screen
Center (a) | Hey, look up here! | | smiling baby |
Introduction phase | This is Cookie Monster. Do you see Cookie Monster? | Cookie Monster smiling | |
Introduction phase | This is Cookie Monster. Do you see Cookie Monster? | | | Cookie Monster smiling
Center | Wow, what's up here? | | smiling baby |
Salience phase | Hey, what's going on up here? | Baby wezzling | | Baby playing on parent's leg
Center | Cool, what's up here? | | smiling baby |
Training phase (b) (4 presentations) | Look at Cookie Monster wezzling! He's wezzling! Cookie Monster is wezzling! | | Cookie Monster wezzling |
Center | Find wezzling! | | smiling baby |
Test trial 1 | Where is wezzling? Can you find wezzling? Look at wezzling! | Baby wezzling | | Baby playing on parent's leg
Center | Find wezzling! | | smiling baby |
Test trial 2 | Where is wezzling? Can you find wezzling? Look at wezzling! | Baby wezzling | | Baby playing on parent's leg
Center | Now find glorping! | | smiling baby |
Test trial 3 (new verb trial) | Where is glorping? Can you find glorping? Look at glorping! | Baby wezzling | | Baby playing on parent's leg
Center | Find wezzling again! | | smiling baby |
Test trial 4 (recovery trial) | Where is wezzling? Can you find wezzling? Look at wezzling! | Baby wezzling | | Baby playing on parent's leg

(a) Centering trials were 3 seconds in duration; every other trial was 6 seconds in duration.

(b) In Study 1, two of the Training Phase trials comprised the live action sequence and two trials used video clips. In Study 2, all four Training Phase trials were conducted via video clips. Finally, the Training Phase in Study 3 consisted of two video clips of an experimenter and two video clips from Sesame Beginnings.

In the Introduction Phase, children were shown one of four possible characters (e.g., Cookie Monster). Two of the characters were Sesame puppets and two were real babies. The Introduction Phase acquainted children with the single character who would perform the target action, because children learn verbs better when they are familiar with the agent in a scene (Kersten, Smith & Yoshida, 2006). Video clips in the Introduction Phase appeared twice: first on one half of the screen, and then on the other. The left/right presentation order was counterbalanced across condition. This demonstrated to the child that actions could appear on either side of the screen. Importantly, the video clips used to introduce the characters were not included in any other phase of the video.

During the Salience Phase, children saw a preview of the exact test clips used during the Test Phase. Measuring looking time to this split-screen presentation before training allows detection of a priori preferences for either of the paired clips. Lack of an a priori preference in the Salience Phase indicates that differences in looking time to one clip or another at test are due to the effect of the Training Phase.

Next, the Training Phase was presented in four 6-second trials. In each of the trials, children saw the same character from the Introduction Phase perform one of four possible actions (see Table 2 for novel words and descriptions). During the first two trials, the experimenter sitting next to the parent and child presented children with a novel word, and the second two trials used video to present children with the same novel word. For the live interaction sequence, the videotape was programmed to show a solid black screen for 15 seconds (the equivalent of two 6-second trials plus one 3-second centering trial). When the black screen appeared, the experimenter produced the same puppet (or doll) as was seen in the Introduction Phase and performed an action with the puppet (or doll) while presenting a novel verb six times using full syntax (three times for each of two training trials; e.g., “Look at Cookie Monster wezzling! He’s wezzling! Cookie Monster is wezzling!”). The experimenter was trained to use infant-directed speech and to look at the child during the live action sequence. Following the live interaction sequence, children saw two 6-second clips of the same animated character (e.g., Cookie Monster) demonstrating the same action (e.g., wezzling) that they had just seen with the live experimenter. Each video clip was accompanied by pre-recorded audio that matched the script used during the live interaction. Thus, children heard the novel verb 12 times during the Training Phase. Nonsense words are commonly used instead of real words in language training studies (e.g., Brandone et al., 2007; Hirsh-Pasek et al., 1996; Naigles et al., 2005) and were used in the current study to control for children’s previous word knowledge.

Table 2.

Description of the Actions, the Nonsense Verbs They Were Paired With, and the Closest English Equivalent.

Nonsense word | English equivalent | Description
Frep | Shake | Character moves object in hand from side to side rapidly
Blick | Bounce | Parent moves character up and down on knee
Wezzle | Wiggle | Character rotates torso and arms rapidly
Twill | Swing | Parent holds character in arms and rotates from side to side

Finally, the Test Phase consisted of four trials. In each of the trials, a split-screen simultaneously presented two novel clips. One of the clips showed a new actor (e.g., a baby) performing the same action that children saw during the Training Phase (e.g., wezzling). The other clip showed the same new actor (e.g., the baby) doing a different action, never seen before (e.g., glorping). Children who saw a Sesame puppet during the Training Phase saw a real baby at test and children who saw a real baby during the Training Phase saw a Sesame puppet at test. Importantly, and a strength of this design, all conditions required children to extend their verb knowledge to a new actor.

Of the four Test Trials, trials 1 and 2 were designed to test children’s ability to generalize the trained verb to an action performed by a novel actor (e.g., from puppet to baby or from baby to puppet). In this Extension Test, pre-recorded audio asked children to find the action presented during training, using the novel word (e.g., “Where is wezzling? Can you find wezzling? Look at wezzling!”). If children learned the target verb, they should look at the matching action screen during each of the first two test trials.

Test Trials 3 and 4, the Stringent Test, provided an additional, more stringent test of word learning (Hollich, Hirsh-Pasek & Golinkoff, 2000) by asking whether children have truly mapped the novel verb to the particular novel action. Here, we test whether children will accept any verb for the action presented during training or whether children expect the trained verb to accompany the trained action. Based on the theory of mutual exclusivity (Markman, 1989), children should prefer attaching only one verb to any given action. Thus, Test Trial 3, the new verb trial, asked children to find a novel action that was not labeled during training (“Where is glorping? Can you find glorping? Look at glorping!”; glorping and spulking were the terms used for the non-trained verbs). If children truly learned the target verb (e.g., wezzling), they should not look toward the action previously labeled wezzling during the new verb trial, and should look instead toward the unfamiliar action, hereafter the non-matching action. That is, if children thought that the wezzling action had already been named, they should now look toward the non-matching action. Test Trial 4, the recovery trial, asked children to renew their attention to the trained action by asking for it again by name (“Where is wezzling? Can you find wezzling? Look at wezzling!”). If children have mapped the original verb to the originally named action, they should redirect their gaze to the original action.

In sum, if children learned the target word during the Training Phase, they should look more at the side of the screen that showed wezzling during the Extension Test (trials 1 and 2). If children were able to do more than simply extend, that is, if they were also able to use mutual exclusivity and show that they remembered the action’s name, then they should show a quadratic pattern of looking in the Stringent Test (trials 3 and 4; see Figure 2). In other words, children should look toward the matching screen in Test Trials 1 and 2, look away in Test Trial 3 (the new verb trial), and look back to the matching screen during Test Trial 4 (the recovery trial). This v-shape in visual fixation time across Test Trials 1 and 2 (the Extension Test), Test Trial 3, and Test Trial 4 forms a quadratic pattern of looking. Because the Stringent Test presupposes successful extension of the novel word, analyses for the Stringent Test were conducted only when children were able to extend the novel word in the Extension Test.

Figure 2. Expected quadratic pattern for the Stringent Test of Verb Learning.

Following the four Test Trials, the entire sequence was repeated for a second novel verb (e.g., blicking). The child’s assignment to one of four possible conditions determined the particular verbs a child was exposed to during the Training and Test Phases. All test trials for a given verb used an identical set of video clips and all conditions were counterbalanced in a between-subjects design (see Table 1). For example, wezzling appeared both as the novel verb in the first position and as the novel verb in the second position, depending on condition. Finally, in half of the conditions, puppets were used to train the novel verb while babies were used in test sequences. In the other half of conditions, babies were featured in the videos used to train the novel verb while puppets were used in the test clips.

Coding

Each child’s head and shoulders were videotaped for offline coding of gaze duration. Gaze direction was also coded during phases where the child saw a split-screen. Each participant’s gaze direction and duration were coded twice by a coder blind to condition and experimental hypotheses. A second blind coder recoded 20% of the videotapes. Reliability within and between coders was r = .97 and .96, respectively.

Several studies in the literature suggest that older children respond very quickly to auditory instructions at test (e.g., “Where is wezzling?”), and tend to make a discriminatory response during the first 2 seconds of a trial (Fernald et al., in press; Gertner et al., 2006; Hollich & George, 2008; Meints, Plunkett, Harris & Dimmock, 2002; Swingley, Pinto & Fernald, 1999). Longer test trials give older children enough time to make a discriminatory response and then to visually explore both sides of the screen. Measures of looking time for older children that are averaged across such a long trial are likely to mask the response and reflect instead the visual exploration after the response (Fernald et al., in press). Therefore, although the current test trials were arranged to show children the full 6-second video clip, we elected to code gaze information during the first 2 seconds of each test trial in addition to the standard coding. Separate coders were used for the standard coding and the two-second coding.

For each child, we calculated the percentage of looking time to either side of the screen during salience trials and test trials. For the salience trials, we divided the number of seconds spent looking at the target action by the total number of seconds spent looking at the video screen. For test trials, percentages were calculated for the first 2 seconds of the trial only. Any score greater than 50% indicated that the child spent more time looking toward the target action than the non-target action, while any score less than 50% indicated that the child spent more time looking at the non-target action.
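As a concrete illustration of this calculation, the sketch below computes the percentage measure from coded gaze durations. It is a hypothetical reconstruction, not the coding software used in the study; the function name and example values are invented for illustration.

```python
# Hypothetical sketch (not the study's coding software): computing the
# percentage of looking time to the target action from coded gaze durations.

def percent_to_target(seconds_on_target: float, seconds_on_nontarget: float):
    """Percentage of on-screen looking time spent on the target action."""
    total_on_screen = seconds_on_target + seconds_on_nontarget
    if total_on_screen == 0:
        return None  # the child never looked at the screen during this window
    return 100.0 * seconds_on_target / total_on_screen

# Example: during the first 2 seconds of a test trial, a child looks 1.4 s at
# the matching action and 0.4 s at the non-matching action.
print(percent_to_target(1.4, 0.4))  # ~77.8%, i.e., above the 50% chance level
```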

Results

A preliminary multivariate analysis of variance (MANOVA) indicated no effect of gender on mean looking time and no a priori preference for one test clip over the other in any condition during the Salience Phase. Thus, the data were collapsed across gender and condition.

Extension Test of verb learning

Because Test Trials 1 and 2 were identical, both asking the child to find the trained action using the trained verb, data from Test Trials 1 and 2 were averaged separately for the first and second verbs. Averaging two test trials increases the reliability of children’s responses. Because each child saw two verbs, a 2 (age group: 30–35 months, 36–42 months) x 2 (verb position: 1st verb, 2nd verb) repeated measures analysis of variance (ANOVA) was used to determine the effect of age group and verb position on learning. Results indicate a main effect of verb position, F(1,38) = 4.75, p < .05, ηp2 = .11, but no main effect of age group, p > .05, ηp2 = .04, and no interaction effect, p > .05, ηp2 = .01. This suggests that children reacted differently to the first verb than they did to the second verb across both age groups, and that the pattern of reaction was the same for both groups (e.g., if the younger children reacted a certain way to the first verb, so did the older children). Due to the absence of an age effect, data were pooled across age groups for further analyses.
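An analysis of this form could be reproduced along the following lines. This is an illustrative sketch only, not the authors’ analysis script; the data file, the column names (child_id, age_group, verb_position, pct_to_match), and the use of the pingouin package are assumptions made for the example.

```python
# Illustrative sketch of the 2 (age group) x 2 (verb position) analysis; the
# file name and column names are hypothetical, not the authors' materials.
import pandas as pd
import pingouin as pg  # provides a mixed (between-within) ANOVA

# One row per child per verb position, with the averaged percentage looking
# time to the matching action from Test Trials 1 and 2.
df = pd.read_csv("study1_extension_test.csv")

aov = pg.mixed_anova(data=df, dv="pct_to_match",
                     within="verb_position", subject="child_id",
                     between="age_group")
print(aov)  # F, p, and partial eta-squared for each effect
```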

To decipher the difference between children’s performance on the first and second verbs, we conducted planned paired-sample t-tests comparing children’s looking times to the matching action versus the non-matching action for the verbs in both positions. Results indicated that children looked equally toward the matching and non-matching action for the first verb, t(39) = .77, p > .05, but looked significantly longer to the matching action than the non-matching action for the second verb, t(39) = 4.67, p < .001. This mean looking time was also significantly greater than chance (50%), ps < .001. Thus, the training trials changed children’s looking behavior for the second verb in the sequence. Children who did not initially prefer to look at either action over the other (as demonstrated by chance rates of looking during the salience trials) now preferred to look at the matching action for the second verb.
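The planned comparisons can be illustrated in the same hypothetical framework. Because each percentage score is computed over looking to the matching plus the non-matching action, comparing matching with non-matching looking amounts to testing the percentage scores against the 50% chance level; the file and column names below are again assumed, not the authors’ materials.

```python
# Hypothetical follow-up tests, using the same assumed data file and column
# names as the ANOVA sketch above (not the authors' code).
import pandas as pd
from scipy import stats

df = pd.read_csv("study1_extension_test.csv")
v1 = df.loc[df["verb_position"] == 1, "pct_to_match"]  # first verb, one score per child
v2 = df.loc[df["verb_position"] == 2, "pct_to_match"]  # second verb, one score per child

# Test each verb's percentage-to-match scores against the 50% chance level.
print(stats.ttest_1samp(v1, 50.0))  # first verb: expected to be non-significant
print(stats.ttest_1samp(v2, 50.0))  # second verb: expected to exceed chance
```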

The Extension Test provides evidence of learning in both age groups, but only for the second verb. To determine whether children succeeded in the stronger test of verb learning, we further analyzed the looking patterns for the second verb only.

Stringent Test of verb learning

Recall that Test Trial 3 (new verb trial) asked children to find the action labeled by a new novel verb and Test Trial 4 (recovery trial) asked children to find the action that was labeled during training. This strong test of verb learning used the average looking time during the first two test trials, looking time in Test Trial 3 (new verb trial) and looking time in Test Trial 4 (recovery trial) to examine whether a quadratic pattern emerged from the three measures. Although children’s responses in Test Trials 1 and 2 (Extension Test) were averaged, Test Trials 3 and 4 ask different questions and success in these trials would be evidenced by looking in opposite directions. Therefore, the test trials in the Stringent Test were analyzed individually.

Data were analyzed using a repeated measures one-way ANOVA with three conditions (extension test, new verb trial, recovery trial). A significant quadratic pattern emerged, F(1,39) = 6.16, p < .05, ηp2 = .14, indicating that children learned the second verb, even by the standards of the strong test of verb learning (see Figure 3). Furthermore, paired samples t-tests revealed that children looked equally to both sides of the screen in Test Trial 3, t(39) = 0.02, p > .05, but children looked significantly longer toward the matching screen than the non-matching screen in Test Trial 4, t(39) = 3.23, p < .05. Thus, children succeeded in the Extension Test, showed no preference in the new verb trial, and again preferred the matching screen in the recovery trial.
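One common way to test such a predicted v-shaped pattern across three within-subject measures is a single-degree-of-freedom quadratic contrast, sketched below with hypothetical file and column names; this is an illustrative reconstruction, not the authors’ analysis script.

```python
# Illustrative quadratic-trend test over three within-subject measures per child
# (averaged Extension Test, new verb trial, recovery trial); the file and column
# names are hypothetical, not the authors' materials.
import pandas as pd
from scipy import stats

df = pd.read_csv("study1_stringent_test.csv")

# Orthogonal quadratic contrast weights (+1, -2, +1): a dip in Test Trial 3
# followed by recovery in Test Trial 4 loads positively on this contrast.
quad = df["extension_avg"] - 2 * df["new_verb_trial"] + df["recovery_trial"]

# A one-sample t-test of the contrast scores against zero; for a single-df
# contrast, F = t**2, so this corresponds to the quadratic term of the ANOVA.
t, p = stats.ttest_1samp(quad, 0.0)
print(f"quadratic contrast: t = {t:.2f}, F = {t**2:.2f}, p = {p:.3f}")
```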

Figure 3. Percentage Looking Time to the Trained Action in Studies 1, 2, and 3. Error Bars Represent the Standard Error of the Mean.

* denotes a significant percentage of looking time to target during individual test trials

** denotes significant quadratic patterns

n.s. denotes quadratic patterns that are not significant

Discussion

Our results reveal that children as young as 30 to 35 months of age can learn verbs from a combination of video and social interaction. This finding is consistent with prior research, and extends it in several ways. First, our test of verb learning requires children to extend word meaning from one agent to another, rather than simply glue a word onto a scene (which may be mere association rather than word learning). For example, children who fail to extend might learn that Cookie Monster can wezzle, but not that a baby can also wezzle. The current study used a particularly difficult extension task, asking children to extend verb knowledge from puppets to humans and vice versa. Although several researchers have noted that children’s ability to generalize increases as linguistic ability increases (Forbes & Farrar, 1995; Maguire, Hirsh-Pasek, Golinkoff, & Brandone, 2008), few studies have used extension.

Second, in addition to extension, the current study offered a Stringent Test of verb learning. In Trials 3 and 4, children were asked to look away from the trained action upon hearing a new novel word and then redirect their attention to the trained action. While the extension task ensures that word learners are able to generalize their knowledge of the novel verb, the stringent test of verb learning asks children to constrain their generalization to the trained action only and exclude other referents. Success in this task indicates that children in the current study acquired a sophisticated understanding of the trained verb through video and live action. Interestingly, although results from Test Trial 3 (the new verb trial) suggest that children did not look significantly toward either side of the screen, children did show a significant quadratic pattern of looking. In this case, the v-shaped quadratic pattern of looking indicates that children succeeded in the Extension Test, then looked away from the trained action in Test Trial 3 (but did not prefer either side of the screen), and finally recovered their looking toward the trained action in Test Trial 4 (the recovery trial). In general, data from individual children followed this pattern, as opposed to the expected quadratic pattern that would have required children to significantly look away from the trained action in Test Trial 3 (see Figure 2). Thus, even though children did not attach the novel verb in the new verb trial to the non-matching action, they were nonetheless able to shift their attention away from the trained action. Notably, children’s familiarity with either the trained or the untrained action did not appear to inhibit (or drive) learning. If familiarity with an action had prevented children from attaching a second label to it, we would be more likely to see such an effect with the trained actions than the untrained actions. Whereas the nonsense words for the trained actions all had one-word English equivalents (see Table 2), the non-matching actions shown opposite the trained actions in the Salience Phase and Test Phase were not easily described by a single word in English (e.g., a child lying on the parent’s leg while the parent swings the leg up and down). Importantly, that children show evidence of verb learning in the current study suggests that familiarity with the trained actions did not impede children’s ability to learn a verb.

Finally, we combined videos and live action to provide children with a rich but complex learning environment. The videos used in the current study were taken from the Sesame Beginnings series and contained an action in an intricate setting. For instance, the video clip showing Cookie Monster wezzling places him in a living room, in front of a couch, with framed pictures in the background. Thus, children were challenged to zoom in on the action while faced with potentially distracting stimuli. The live action sequence in the current study was also limited in length and scope. Specifically, each live action sequence lasted 15 seconds, which was timed to encompass two training trials, and comprised only a small portion of the total experiment. In addition, the live action sequence was included between video clips, meaning that children had to redirect attention several times during the experiment. Switching between the modalities of live interaction and video may have been difficult for children, as evidenced by improved performance on the second verb. Success on the second verb in the sequence, regardless of the particular verb itself, may indicate that children required the first verb sequence to acclimate themselves to the demands of the procedure.

The current study suggests that children gleaned specific information from a complex situation to learn a verb. Moreover, both younger and older children were able to generalize a novel word to a new actor and show evidence of constraining novel word use. This first study, however, does not allow us to separate the effect of live action from the effect of video on word learning. Study 2 was designed to disentangle these variables.

Study 2: Do children learn verbs from video alone?

Both younger and older children showed evidence of verb learning in Study 1 when video displays were accompanied by social interaction. To determine if the live action sequence contributed to learning, it was important to evaluate verb learning through video alone. Given that previous research highlights the role of social interaction in verb learning, especially in younger children (e.g., Naigles et al., 2005), we predict that only the older children will be able to learn verbs from video alone.

Method

Participants

Participants included 20 monolingual children, balanced for gender, aged 30 to 35 months (mean age = 33.00, range = 30.09–35.60) and 20 children aged 36 to 42 months (mean age = 39.39, range = 36.21–41.90). An additional 6 children were discarded due to fussiness (3), experimenter error (1), parental interference (1), and side bias (1). As in Study 1, children were predominantly white and from middle-class homes in suburban Philadelphia.

Design

The design of Study 2 was identical to Study 1 with two important exceptions. First, all four Training Phase clips were televised, and second, no experimenter was present in the viewing room during the study. All four training trials used an identical video clip from Sesame Beginnings because recent research suggests that initial verb learning is improved when children are not distracted by multiple novel characters (Maguire et al., 2008). This second study was matched to the first study for the number of exposures to the novel verb, the specific audio script, and the total length of video to allow for direct comparison. All conditions were fully counterbalanced and videotapes were coded as in Study 1.

Results

As in Study 1, no gender differences emerged based on preliminary tests. Children also did not show a priori preferences during the Salience Phase in any condition, so data were collapsed across gender and condition. For each child, a percentage of looking time to each side of the screen was calculated for each salience trial and for the first two seconds of each test trial. Again, we conducted both the Extension Test and the Stringent Test of verb learning.

Extension Test of verb learning

Data from Test Trials 1 and 2 were averaged separately for the first and second verbs. Using the average looking time to the matching action for both verbs, a repeated measures ANOVA revealed a significant main effect of age group on verb learning, F(1,38) = 10.69, p < .01, ηp2 = .22, but no main effect of verb order, p > .05, ηp2 = .01, and no interaction between age group and verb order, p > .05, ηp2 = .01. This result indicates that younger children performed differently than older children on the test trials, but that each age group performed similarly on both verbs. The data were pooled across verb position for further analyses.

Planned paired-samples t-tests suggest that only older children looked significantly longer toward the matching than the non-matching action during Test Trials 1 and 2, t(19) = 4.36, p < .001. Older children’s looking time toward the matching action was also significantly different from chance (50%), ps < .05. In contrast, younger children looked equally to the matching and non-matching actions, ps > .05. Together, these standard tests of verb learning suggest that only older children were able to learn a verb with video alone. To determine if older children succeeded in the stronger test of word learning, we further analyzed data from the older children only.

Stringent Test of verb learning

We conducted a strong test of verb learning using the average of Test Trials 1 and 2, Test Trial 3 (the new verb trial), and Test Trial 4 (the recovery trial). Data for the first and second verb were combined. A repeated measures one-way ANOVA with three levels (extension test, new verb trial, recovery trial) was used to examine the quadratic pattern of looking time. No significant quadratic pattern emerged, ps > .05, suggesting that the older age group was not successful in the stringent test of verb learning (see Figure 3). Interestingly, paired samples t-tests indicate that older children do not show a significant preference for either screen in Test Trial 3, t(39) = 0.85, p > .05, but do look significantly longer to the matching screen in Test Trial 4, t(39) = 3.59, p < .05. Although these results might indicate the presence of a quadratic pattern, the differences among the three measures were not large enough to produce a significant quadratic pattern.

Thus, while older children succeeded in the Extension Test and were able to “glue” the word onto the target action even when the actor was different, they did not glean sufficient information from video alone to succeed in a Stringent Test of verb learning.

Discussion

As found in prior studies, Study 2 demonstrates that children older than 3 years show some evidence of verb learning from video alone (Reiser et al., 1984; Rice & Woodsmall, 1988; Singer & Singer, 1998), but that children younger than 36 months do not. It is noteworthy, however, that learning in this case was especially difficult; children had to extend the label for an action from a Sesame Street puppet to a human or from a human to a Sesame Street puppet. Children’s ability to move between people and puppets in a word learning task suggests greater flexibility in the process of language acquisition than has been previously demonstrated. Furthermore, although older children in this study were able to demonstrate the ability to extend a novel word to a new actor, even these older children failed at the Stringent Test of verb learning. A relatively older age of acquisition combined with difficulty in the Stringent Test of verb learning indicates that video alone does not provide enough information for children to gain more than a cursory understanding of novel verbs.

Although the results of Study 2 indicate that video alone is insufficient for young children to learn a verb, perhaps video was equally useless to children in Study 1. It is possible that younger children in Study 1 used only the live action sequence to learn a verb and did not gain any supplemental information from video. If this were true, younger children in Study 1 would have learned a verb, both by Extension and the Stringent Test of verb learning, from only 15 seconds of live interaction and six presentations of the novel verb. Such a finding would be noteworthy in and of itself and is unprecedented in the verb learning literature. Future studies should further investigate this possibility. At minimum, however, the current results suggest that adding a short, constrained live action sequence to video was important for verb learning.

Study 3: Does social interaction merely provide more information?

Results from Studies 1 and 2 suggest that children are better able to learn verbs when they experience real-life social demonstrations as well as video displays, but what if the social interaction was only useful because it provided an example of a different agent during training? This is a possible explanation because children in Study 1 saw two different kinds of exemplars of the target action (live and televised) during training whereas participants in Study 2 only saw one type of exemplar (televised). Although the live interaction was matched in time and content to the corresponding video training in Study 2, the experimenter was an integral part of the target action in Study 1, but not in Study 2. The purpose of Study 3 was to tease apart the effect of the live interaction from the effect of having two types of exemplars of the target action.

Method

Participants

Since only the younger age group showed differential performance in Study 1 and Study 2, only that age group was tested in Study 3. Eighteen children aged 30 to 35 months were recruited. Of these children, two were discarded due to fussiness, leaving a final sample of 16 children (M=33.08, range=29.66–35.21). Children in the final sample were predominantly white and from middle-class homes in suburban Philadelphia.

Design

Study 3 used the same video sequence as Study 1. In place of the first two training trials, a video clip of the experimenter appeared on the video screen, performing the actions with puppets or dolls as in the live interaction. In essence, the participant saw the same presentation as in the live interaction component of Study 1, except that the experimenter appeared on the video screen instead of live. The experimenter faced the camera and appeared life-size on the video screen. Length of training and audio were matched to both Study 1 and Study 2. If seeing an action demonstrated by two agents was responsible for increased learning in Study 1, results should mirror those from Study 1, and the younger age group should now show enhanced learning. If, however, some aspect of the live interaction, per se, was the driving force behind better performance in Study 1, the results of the control study should be the same as Study 2: the younger group should demonstrate no evidence of verb learning. We predict the latter.

Results

Preliminary analyses revealed no gender differences and no a priori preference for either test clip in any condition, ps > .05, so data were pooled across gender and condition for further analyses.

Extension Test of verb learning

As in our previous studies, data from Test Trials 1 and 2 were averaged separately for the first and second verbs. A one-way repeated measures ANOVA revealed no main effect of verb position (i.e., first verb vs. second verb), so data from verb 1 and verb 2 were averaged for further analysis.

Paired-samples t-tests comparing looking time to the matching versus the non-matching actions revealed no significant looking patterns, ps>.05, and looking time toward the target was not significantly different than chance, ps>.05.

Thus, children in this younger age group failed to learn a verb from video even when two distinct sources (experimenter and animated character) presented the information. We did not analyze the results of the Stringent Test of verb learning in this study since children did not succeed in the Extension Test of verb learning.

Discussion

Study 3 was designed to clarify the results of Studies 1 and 2 by pinpointing the mechanism behind improved verb learning in Study 1. Was better performance due to some aspect of social interaction, or did children learn better because they saw more than one example of the action being labeled? If children succeeded in Study 1 because they learned the verb from two sources, then it should not matter whether the information is delivered live or on video. If, however, the results of Study 1 emerged due to some benefit of live social interaction per se, then the children should fail in Study 3 when they see an experimenter on the video screen. This was, in fact, the result. Although the experimenter’s positive affect and child-directed speech were preserved in both the live (Study 1) and televised (Study 3) experiments, children succeeded in learning a verb in Study 1 and failed to learn a verb in Study 3. These results suggest that some aspect of the live demonstration in Study 1, although it was highly controlled, was responsible for better verb learning.

General Discussion

Three studies revealed that it is possible for children to learn action words from video, but that specific conditions are necessary to promote such learning in children under age 3. We posed two questions: First, are children able to learn verbs at an early age when provided with video and social support? Second, can children learn verbs from video alone? We correctly predicted that younger children would be able to learn a verb from video when given support in the form of a live demonstration, but that they would not learn a verb from watching video alone.

Our finding that young children only learned a verb when video was supplemented by live interaction is consistent with prior research emphasizing the importance of social cues to word learning (Baldwin, 1991; Brandone et al., 2007; Krcmar et al., 2007; Naigles et al., 2005; Sabbagh & Baldwin, 2001; Tomasello & Akhtar, 1995). Impressively, children in Study 1 not only extended a new verb to a new actor in a novel scene, but also passed a more stringent test of word learning that required them to remember the originally trained word-action pairing over time and to resist mapping a new word (e.g., glorping) onto the trained action. This test of verb learning goes beyond much of what is offered in the literature. Additionally, Study 2 affirmed that only older children can learn a verb without social support, but that even then they cannot pass the stringent test of word learning. Study 3 underscores that it is likely social support, and not simply varied types of exposure to the actions and words, that is the key to verb learning from video. These findings raise a number of questions about learning from video and have implications for research on both television and language.

On the Nature of Social Interaction

The live action sequence in Study 1 was designed to provide children with social interaction during video viewing, and thus create an environment where the child was an active, rather than passive, viewer. Although the social interaction was minimal and tightly controlled, children seemed to benefit nonetheless from its inclusion. Why did the live action matter? Several aspects of the live social interaction may have been beneficial to children.

First, although attention during familiarization was equal across conditions, it is possible that children’s arousal was heightened during the live action sequence. Research suggests that arousal may be an important factor in infants’ ability to encode and remember information (Kuhl, 2007). The current research did not measure children’s arousal, but future studies might explore arousal as a mechanism by which children learn through live interaction.

Second, it is possible that the experimenter provided additional verbal and non-verbal information during the live action sequences. For instance, the live action sequence provided in vivo labeling of the target action, which may have unintentionally provided the child with subtle contingencies, even in this controlled situation. The script used in the live action sequence was identical to the audio used in the videotaped training trial and the experimenter was instructed to follow the script regardless of the child’s behavior. In any interaction, however, some unintentional contingencies are possible. That is, the volume of the experimenter’s voice might have been louder when the child was inattentive, or the experimenter might have shown signs of recognition if the child spoke. Although these subtle contingencies were not revealed in a review of the videotaped interactions, it is reasonable to assume that some contingencies occur in even the most controlled interactions. Furthermore, the experimenter’s non-verbal behavior during the live action sequence may have been a source of additional information. Specifically, the experimenter’s eye gaze unavoidably differed in the televised and live action sequences. Baldwin and Tomasello (1998) suggest that eye gaze is one of the elements that create a common ground for the participant and the experimenter. This common ground, they argue, forms the basis for social learning. Although the experimenter displayed positive affect and used child-directed speech in both the live and televised conditions, the live experimenter looked directly at the child whereas the televised experimenter looked toward the camera, as if it were the child. Thus, the experimenter on video may not have made eye contact with the child whereas the live experimenter was able to naturally establish eye contact with the child. If children used the experimenter’s eye gaze to direct their attention to the action named by the novel word, this would help explain the differential performance observed across the live and videotaped studies. In sum, children may have used either the verbal or non-verbal cues present in the live action sequence. Because the current studies do not allow us to determine the relative effects of social labeling and eye gaze during the live interaction, it is unclear whether children used both verbal and non-verbal cues or whether only one element would have enabled children’s verb learning.

Third, although Study 3 was designed to control for the number of sources presenting the novel verb, the mode of presentation was not controlled. That is, children in Study 1 saw the verb presented both live and on a video screen whereas children in Study 3 saw the verb presented by an experimenter on a video screen and by a character on a video screen. Thus, Study 1 used both live action and televised displays to present the verb, but Study 3 used two forms of video (an experimenter and a character) to demonstrate the verb. Further research is needed to disentangle these factors, but at minimum, we can conclude that a video medium alone is insufficient for young children’s verb learning.

Finally, while the current research directly tests verb learning from social interaction and video as well as from video alone, we did not examine children’s potential to learn verbs from social interaction alone. This choice was made because of the copious research that has shown that children can and do learn entire languages (including verbs) based on social interaction alone. Thus, we presume that children in a “live only” condition would be able to learn verbs better than with any combination of live action and video (e.g., Childers & Tomasello, 2002; Tomasello & Akhtar, 1995).

On the Nature of Word Learning

The results of this study are relevant to at least two disparate literatures: the television viewing literature and the word learning literature. Our results suggest that children older than 3 years may be able to learn words from television programs, and that children as young as 30 months may also learn words from video when provided with live social support. However, our results do not speak to the utility of word learning through watching television for children younger than 30 months of age. In fact, pilot data for the current study indicated that children under the age of 30 months were unable to learn a verb from video, even with live social support. Several studies in the television literature may explain why children younger than 30 months fail to learn from video while older children succeed. For example, a naturalistic observational study of children viewing a commercial television program revealed that children younger than 30 months only glanced occasionally toward the video display; by 30 months, however, children began to watch the video screen intentionally (Anderson & Levin, 1976). Although the participants in that study were younger than the target audience for the television program, the current research suggests that even with clips from age-appropriate programming, children younger than 30 months have difficulty using video as a source of information. Thus, it is possible that at around 30 months, children begin to develop the cognitive capacities needed to view television effectively. Troseth and DeLoache (1998) suggest this is the case, noting that infants younger than 30 months are not readily able to understand video displays as a symbolic source of information. Although many studies have shown that even very young infants attend to televised displays (Anderson & Levin, 1976; Schmitt, 2001), the current research suggests that video may not be an effective educational tool until 30 months of age.

We found that children under the age of 3 required social support to learn a verb from video. Many educational programs encourage parents to view television with their children, and our study is among the first to demonstrate that even mild (and relatively non-contingent) social interaction helps children learn verbs from video. The results of our studies also provide an important link between television research with older children and research with younger children. Until now, research exploring television and language in infants and toddlers has left a gap between 24 months and 3 years; if the ability to learn from video develops over time, this is a critical juncture. As the educational television market continues to target younger children, the facilitative effects of social support require continued investigation.

The current research also has several important implications for the word learning literature. First, the test of word learning used in these studies was more demanding than the typical tests used in most word learning studies, which require only linking a word to a specific object or action. Rather than asking children to identify the exact agent/action pairing seen during training, all of our test trials required children to extend a novel verb to a new agent: children who learned a verb from puppets during training had to transfer it to babies at test, and vice versa. Both younger and older children succeeded at this demanding task when social interaction accompanied the televised displays.

In addition to the extension test of word learning, we included a more stringent test. This stronger test not only asked children to generalize the learned verb to a novel character, but also probed the limits children place on the meaning of the trained verb (e.g., Mutual Exclusivity; Markman, 1989; see also Golinkoff, Mervis & Hirsh-Pasek, 1994). Based on the principle of Mutual Exclusivity, we expected that children who had learned the verb would look away from the trained action when they heard an alternative verb, demonstrating their unwillingness to assign two verbs to one action. This pattern of looking would demonstrate robust word learning. We found evidence of Mutual Exclusivity in both age groups, but only when the video display was accompanied by live social support. Although some studies have used similarly extensive tests of word learning (Brandone et al., 2007; Golinkoff, Jacquet, Hirsh-Pasek & Nandakumar, 1996; Hollich et al., 2000; Pruden, Hirsh-Pasek, Golinkoff & Hennon, 2006), this method is not widely used in the field. We believe success on this stringent test of word learning is indicative of secure word knowledge.
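
To make the logic of this stringent test concrete, the following minimal sketch (in Python) illustrates how a Mutual Exclusivity prediction could, in principle, be scored from looking times in a preferential looking task. It is not the authors’ actual analysis; the function name, trial structure, and all looking-time values below are hypothetical and chosen only for illustration.

    # Illustrative only: hypothetical looking times (in seconds) for one child.
    # Mutual Exclusivity predicts less looking to the trained action when an
    # alternative (new) verb is heard than when the trained verb is heard.

    def proportion_to_trained_action(time_on_trained, time_on_other):
        """Proportion of total looking time directed at the trained action."""
        total = time_on_trained + time_on_other
        return time_on_trained / total if total > 0 else 0.0

    # Hypothetical data from one test trial of each type.
    p_trained_verb = proportion_to_trained_action(4.2, 1.5)   # child hears trained verb
    p_new_verb = proportion_to_trained_action(2.0, 3.6)       # child hears alternative verb

    print(f"Looking to trained action when trained verb heard: {p_trained_verb:.2f}")
    print(f"Looking to trained action when new verb heard:     {p_new_verb:.2f}")
    print("Pattern consistent with Mutual Exclusivity:", p_new_verb < p_trained_verb)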

A second contribution to the word learning literature comes from our evidence that children are able to learn verbs from video. Although verbs are the building blocks of grammar, researchers have only recently begun to study verb acquisition in children (Brandone et al., 2007; Gertner et al., 2006; Hirsh-Pasek et al., 1996; Poulin-Dubois & Forbes, 2002, 2006). Given that children much younger than 30 months of age are able to learn object words from video (e.g., Krcmar et al., 2007), it is surprising that children cannot learn verbs from video, even with optimal social support, until they are older. Although children have verbs in their natural vocabularies from a very young age, the current research corroborates previous findings that verbs remain more difficult for children to learn than nouns (e.g., Bornstein et al., 2004; Gentner, 1982; Golinkoff et al., 1996). This research is a crucial step toward understanding language development in multiple contexts. Furthermore, by moving beyond the study of nouns, we begin to understand the complex process by which children become sophisticated speakers.

Conclusion

Taken together, these studies explore language learning in relatively uncharted territory by asking how children learn verbs from video with and without social support. Our results indicate that children older than 3 years can learn verbs from video alone, whereas children younger than 3 learn verbs from video only when it is accompanied by live social interaction. These results confirm prior findings that social interaction plays a critical role in word learning and contribute to our understanding of young children’s ability to learn from video. However, social interaction provides a wealth of information for children, and our research is only a first step toward pinpointing the specific mechanism within social support that makes a difference for children’s word learning from video. Future research might further explore which aspect(s) of social interaction make the key difference when an experimenter teaches a child live versus on video.

Considering that television programming marketed toward children aged 3 years and younger has proliferated in recent years, our findings serve as both a boon and a warning. On the positive side, 3-year-olds can learn verbs from video with or without social support. On the negative side, children under age 3 do not appear to gain any verb learning benefit from watching videos unless they are given social support. Although findings on learning from video are mixed, the preponderance of research in the language literature suggests that live social interaction is the most fertile ground for language development.

Acknowledgments

This research was supported by NICHD grant 5R01HD050199 and NSF grant BCS-0642529.

We thank Nora Newcombe for her valuable comments on earlier versions of this paper and we also thank Wendy Shallcross for her assistance in data collection.

Contributor Information

Sarah Roseberry, Temple University.

Kathy Hirsh-Pasek, Temple University.

Julia Parish-Morris, Temple University.

Roberta Michnick Golinkoff, University of Delaware.

References

1. Anderson DR, Levin SR. Young children’s attention to Sesame Street. Child Development. 1976;47:806–811.
2. Anderson DR, Pempek TA. Television and very young children. American Behavioral Scientist. 2005;48:505–522.
3. Baldwin D. Infants’ contribution to the achievement of joint reference. Child Development. 1991;62:875–890.
4. Baldwin DA, Tomasello M. Word learning: A window on early pragmatic understanding. In: Clark EV, editor. The proceedings of the twenty-ninth annual child language research forum. Chicago, IL; 1998. pp. 3–23.
5. Barr R, Muentener P, Garcia A, Fujimoto M, Chavez V. The effect of repetition on imitation from television during infancy. Developmental Psychobiology. 2007;49:196–207. doi: 10.1002/dev.20208.
6. Bornstein MH, et al. Cross-linguistic analysis of vocabulary in young children: Spanish, Dutch, French, Hebrew, Italian, Korean, and American English. Child Development. 2004;75:1115–1139. doi: 10.1111/j.1467-8624.2004.00729.x.
7. Brandone AC, Pence KL, Golinkoff RM, Hirsh-Pasek K. Action speaks louder than words: Young children differentially weight perceptual, social, and linguistic cues to learn verbs. Child Development. 2007;78:1322–1342. doi: 10.1111/j.1467-8624.2007.01068.x.
8. Brown E. That’s edutainment. New York, NY: McGraw Hill; 1995.
9. Childers JB, Tomasello M. Two-year-olds learn novel nouns, verbs, and conventional actions from massed or distributed exposures. Developmental Psychology. 2002;38:967–978.
10. Conboy BT, Brooks R, Taylor Meltzoff A, Kuhl PK. Joint engagement with language tutors predicts brain and behavioral responses to second-language phonetic stimuli. Poster presented at the 16th International Conference on Infant Studies; March 2008; Vancouver, Canada.
11. Fernald A, Zangl R, Portillo AL, Marchman VA. Looking while listening: Using eye movements to monitor spoken language comprehension by infants and young children. In: Sekerina IA, Fernández EM, Clahsen H, editors. Developmental psycholinguistics: On-line methods in children’s language processing. Amsterdam: John Benjamins; in press. pp. 97–135.
12. Forbes JN, Farrar MJ. Learning to represent word meaning: What initial training events reveal about children’s developing action verb concepts. Cognitive Development. 1995;10:1–20.
13. Garrison MM, Christakis DA. A teacher in the living room? Educational media for babies, toddlers, and preschoolers. The Henry J. Kaiser Family Foundation; 2005.
14. Gentner D. Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In: Kuczaj SA, editor. Language development: Vol. 2. Language, thought, and culture. Hillsdale, NJ: Erlbaum; 1982. pp. 301–334.
15. Gertner Y, Fisher C, Eisengart J. Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science. 2006;17:684–691. doi: 10.1111/j.1467-9280.2006.01767.x.
16. Golinkoff RM, Hirsh-Pasek K, Cauley KM. The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language. 1987;14:23–45. doi: 10.1017/s030500090001271x.
17. Golinkoff RM, Jacquet RC, Hirsh-Pasek K, Nandakumar R. Lexical principles may underlie the learning of verbs. Child Development. 1996;67:3101–3119.
18. Golinkoff RM, Mervis CB, Hirsh-Pasek K. Early object labels: The case for a developmental lexical principles framework. Journal of Child Language. 1994;21:125–155. doi: 10.1017/s0305000900008692.
19. Hassenfeld J, Nosek D (Producers), Clash K (Co-Producer/Director). Beginning together [Motion picture]. Available from Sesame Workshop, One Lincoln Plaza, New York, NY 10023; 2006a.
20. Hassenfeld J, Nosek D (Producers), Clash K (Co-Producer/Director). Make music together [Motion picture]. Available from Sesame Workshop, One Lincoln Plaza, New York, NY 10023; 2006b.
21. Hirsh-Pasek K, Golinkoff RM. The origins of grammar: Evidence from early language comprehension. Cambridge, MA: MIT Press; 1996.
22. Hirsh-Pasek K, Golinkoff RM. Action meets word: How children learn verbs. New York: Oxford University Press; 2006.
23. Hirsh-Pasek K, Golinkoff RM, Naigles L. Young children’s use of syntactic frames to derive meaning. In: Hirsh-Pasek K, Golinkoff RM, editors. The origins of grammar: Evidence from early language comprehension. Cambridge, MA: MIT Press; 1996. pp. 123–158.
24. Hollich G, George K. Modification of preferential looking to derive individual differences. Poster presented at the 16th International Conference on Infant Studies; March 2008; Vancouver, Canada.
25. Hollich GJ, Hirsh-Pasek K, Golinkoff RM, with Hennon E, Chung HL, Rocroi C, Brand RJ, Brown E. Breaking the language barrier: An emergentist coalition model for the origins of word learning. Monographs of the Society for Research in Child Development. 2000;65(3, Serial No. 262).
26. Kersten AW, Smith LB, Yoshida H. Influences of object knowledge on the acquisition of verbs in English and Japanese. In: Hirsh-Pasek K, Golinkoff RM, editors. Action meets word: How children learn verbs. New York: Oxford University Press; 2006. pp. 499–524.
27. Krcmar M, Grela BG, Lin Y. Can toddlers learn vocabulary from television? An experimental approach. Media Psychology. 2007;10:41–63.
28. Kuhl PK. Is speech learning ‘gated’ by the social brain? Developmental Science. 2007;10:110–120. doi: 10.1111/j.1467-7687.2007.00572.x.
29. Kuhl PK. Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience. 2004;5:831–843. doi: 10.1038/nrn1533.
30. Kuhl PK, Tsao F, Liu H. Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences. 2003;100:9096–9101. doi: 10.1073/pnas.1532872100.
31. Linebarger DL, Walker D. Infants’ and toddlers’ television viewing and language outcomes. American Behavioral Scientist. 2005;48:624–645.
32. Maguire MJ, Hirsh-Pasek K, Golinkoff RM, Brandone AC. Focusing on the relation: Fewer exemplars facilitate children’s initial verb learning and extension. Developmental Science. 2008;11:628–634. doi: 10.1111/j.1467-7687.2008.00707.x.
33. Marchman VA, Bates E. Continuity in lexical and morphological development: A test of the critical mass hypothesis. Journal of Child Language. 1994;21:339–366. doi: 10.1017/s0305000900009302.
34. Markman EM. Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press; 1989.
35. Meints K, Plunkett K, Harris PL, Dimmock D. What is ‘on’ and ‘under’ for 15-, 18-, and 24-month-olds? Typicality effects in early comprehension of spatial prepositions. British Journal of Developmental Psychology. 2002;20:113–130.
36. Naigles LR, Bavin EL, Smith MA. Toddlers recognize verbs in novel situations and sentences. Developmental Science. 2005;8:424–431. doi: 10.1111/j.1467-7687.2005.00431.x.
37. Nielsen M, Simcock G, Jenkins L. The effect of social engagement on 24-month-olds’ imitation from live and televised models. Developmental Science; in press. doi: 10.1111/j.1467-7687.2008.00722.x.
38. Poulin-Dubois D, Forbes JN. Toddlers’ attention to intentions-in-action in learning novel action words. Developmental Psychology. 2002;38:104–114.
39. Poulin-Dubois D, Forbes JN. Word, intention, and action: A two-tiered model of action word learning. In: Hirsh-Pasek K, Golinkoff RM, editors. Action meets word: How children learn verbs. New York: Oxford University Press; 2006. pp. 499–524.
40. Pruden SM, Hirsh-Pasek K, Golinkoff RM, Hennon EA. The birth of words: Ten-month-olds learn words through perceptual salience. Child Development. 2006;77:266–280. doi: 10.1111/j.1467-8624.2006.00869.x.
41. Reiser RA, Tessmer MA, Phelps PC. Adult-child interaction in children’s learning from “Sesame Street.” Educational Communication & Technology Journal. 1984;32:217–223.
42. Rice M, Woodsmall L. Lessons from television: Children’s word-learning while viewing. Child Development. 1988;59:420–429. doi: 10.1111/j.1467-8624.1988.tb01477.x.
43. Sabbagh MA, Baldwin D. Learning words from knowledgeable versus ignorant speakers: Link between preschoolers’ theory of mind and semantic development. Child Development. 2001;72:1054–1070. doi: 10.1111/1467-8624.00334.
44. Schmitt KL. Infants, toddlers, and television: The ecology of the home. Zero to Three. 2001;22:17–23.
45. Singer JL, Singer DG. Barney & Friends as entertainment and education: Evaluating the quality and effectiveness of a television series for preschool children. In: Asamen JK, Berry GL, editors. Research paradigms, television, and social behavior. Thousand Oaks, CA: Sage Publications; 1998. pp. 305–367.
46. Swingley D, Pinto JP, Fernald A. Continuous processing in word recognition at 24 months. Cognition. 1999;71:73–108. doi: 10.1016/s0010-0277(99)00021-9.
47. Tomasello M, Akhtar N. Two-year-olds use pragmatic cues to differentiate reference to objects and actions. Cognitive Development. 1995;10:201–224.
48. Troseth GL, DeLoache JS. The medium can obscure the message: Young children’s understanding of video. Child Development. 1998;69:950–965.
49. Troseth GL, Saylor MM, Archer AH. Young children’s use of video as socially relevant information. Child Development. 2006;77:786–799. doi: 10.1111/j.1467-8624.2006.00903.x.
50. Zimmerman FJ, Christakis DA, Meltzoff AN. Television and DVD/video viewing in children younger than 2 years. Archives of Pediatrics & Adolescent Medicine. 2007a;161:473–479. doi: 10.1001/archpedi.161.5.473.
51. Zimmerman FJ, Christakis DA, Meltzoff AN. Associations between media viewing and language development in children under age 2 years. Journal of Pediatrics. 2007b;151:364–368. doi: 10.1016/j.jpeds.2007.04.071.