A little labeling goes a long way: Semi-supervised learning in infancy

Alexander LaTourrette; Sandra R Waxman

doi:10.1111/desc.12736

Author manuscript; available in PMC: 2020 Jan 1.

Published in final edited form as: Dev Sci. 2018 Sep 18;22(1):e12736. doi: 10.1111/desc.12736


PMCID: PMC6294654 NIHMSID: NIHMS985901 PMID: 30157311

Abstract

There is considerable evidence that labeling supports infants’ object categorization. Yet in daily life, most of the category exemplars that infants encounter will remain unlabeled. Inspired by recent evidence from machine learning, we propose that infants successfully exploit this sparsely labeled input through “semi-supervised learning.” Providing only a few labeled exemplars leads infants to initiate the process of categorization, after which they can integrate all subsequent exemplars, labeled or unlabeled, into their evolving category representations. Using a classic novelty preference task, we introduced 2-year-old infants (n = 96) to a novel object category, varying whether and when its exemplars were labeled. Infants were equally successful whether all exemplars were labeled (fully supervised condition) or only the first two exemplars were labeled (semi-supervised condition), but they failed when no exemplars were labeled (unsupervised condition). Furthermore, the timing of the labeling mattered: when the labeled exemplars were provided at the end, rather than the beginning, of familiarization (reversed semi-supervised condition), infants failed to learn the category. This provides the first evidence of semi-supervised learning in infancy, revealing that infants excel at learning from exactly the kind of input that they typically receive in acquiring real-world categories and their names.

Keywords: language acquisition, conceptual development, category learning, semi-supervised learning, language, thought

The powerful connection between language and human cognition begins in infancy. Even before infants begin to produce their first words, language supports fundamental cognitive capacities (Feigenson & Halberda, 2008; Ferguson & Lew-Williams, 2016; Ferry, Hespos, & Waxman, 2010; Xu, 2002; for a review, see Perszyk & Waxman, 2018). Infant object categorization serves as a strong illustration of this link: decades of research reveal that naming facilitates infants’ object categorization. When infants are introduced to a series of distinct objects belonging to the same object category, they successfully identify that category if the objects are each named with the same novel word (e.g., “Look at the dax!”), but they fail to do so if these objects remain unnamed (Balaban & Waxman, 1997; Waxman & Braun, 2005; Waxman & Markow, 1995). Naming also shapes the boundaries of the object categories infants form. When presented with a set of objects that vary along a continuous distribution, infants use labels to infer the underlying categories. If all the objects are named with the same novel word, infants form a single inclusive object category; in contrast, if the objects on opposite sides of the continuum receive different names, infants form two contrastive categories (Althaus & Westermann, 2016; Havy & Waxman, 2016; Plunkett, Hu, & Cohen, 2008). This link between naming and object categories, which develops within infants’ first year, may play an important role in early category acquisition.

In infants’ everyday experiences, however, their input differs dramatically from the input provided in carefully controlled experiments in the infant laboratory. Infants do not typically hear a name for every exemplar they encounter. Instead, the vast majority of exemplars remain unlabeled. Even a caregiver who readily labels novel objects as they appear (e.g., “Look, a grasshopper!”) will not label all the objects within the infant’s view (e.g., the leaf on which the grasshopper has alighted, the surrounding trees, the bird passing overhead). Nor will the caregiver label each grasshopper the infant later encounters. Complicating the task further, every object is a member of multiple categories that overlap in scope (e.g., grasshopper, bug, insect, animal; green; jumping), yet only a few such categories, if any, will be named in a given encounter. In addition, there is considerable variation, both within and across cultures, in how and how often adults label objects for infants (e.g., Cartmill et al., 2013; Gaskins, 1999; Lieven, 1994; Rogoff, Mistry, Göncü, & Mosier, 1993; Shneidman & Goldin-Meadow, 2012). Finally, even when a caregiver does name an object within the infant’s view, identifying the intended referent is often still quite difficult (Cartmill et al., 2013; Gillette, Gleitman, Gleitman, & Lederer, 1999).

In sum, there can be little doubt that the clear and consistent naming episodes that have proven so advantageous in the infant laboratory do not characterize infants’ experiences in the natural world. For many categories, of course, this poses no problem: infants can successfully learn certain categories exclusively from unlabeled exemplars (e.g., Quinn, Eimas, & Rosenkrantz, 1993). This sort of unsupervised learning may also support infants’ ability to learn words referring to objects with which they have prior experience (Clerkin, Hart, Rehg, Yu, & Smith, 2017; Ramscar, Yarlett, Dye, Denny, & Thorpe, 2010). Particularly for more difficult categories, however, unambiguous labeling can play a powerful role in infant word learning (Tomasello & Farrar, 1986; Waxman, 1990, 1998; Woodard, Gleitman, & Trueswell, 2016). While rare, these moments in which infants successfully link labels and referents can dramatically boost word learning when they occur (Stevens, Gleitman, Trueswell, & Yang, 2016; Trueswell, Medina, Hafri, & Gleitman, 2013).

How, then, can we reconcile the power of labels in infants’ categorization with their relative scarcity in infants’ input? Can infants use the labels they do hear to learn new object categories? One possibility is that providing even a small set of labeled exemplars is sufficient to spark the acquisition of object categories: labeled exemplars may serve as a foundation for learning from subsequent, unlabeled exemplars. This strategy, known as “semi-supervised learning” (SSL), has been documented extensively in machine learning (for reviews, see Chapelle, Schölkopf, & Zien, 2006; Zhu, 2005; Zhu & Goldberg, 2009). Typically, semi-supervised learning algorithms employ a two-step process. First, a small set of labeled exemplars is provided: this allows the algorithm to form initial estimates of the categories to be learned. Next, the algorithm is given a much larger set of unlabeled exemplars: this permits it to adjust the category boundaries to reflect the full distribution of exemplars. While there are many different mechanisms by which an algorithm might make this adjustment, they all share the goal of gradually refining categories to account for the unlabeled data (Zhu, 2005). For instance, one widely used approach is expectation-maximization (Dempster, Laird, & Rubin, 1977). Essentially, the algorithm predicts each unlabeled exemplar’s category by comparing it to the category estimates derived from the labeled exemplars. Next, the algorithm incorporates these unlabeled exemplars into their predicted categories, weighting each exemplar’s impact on its category by the algorithm’s confidence in that prediction. This process then repeats, using the new category representations, until the algorithm reaches an optimal set of category boundaries, informed by both labeled and unlabeled exemplars. Notably, this process typically relies on the algorithm processing the labeled exemplars first.
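The two-step process just described can be illustrated with a minimal sketch of semi-supervised expectation-maximization. This is not a model used in the present study; it is an illustration under several assumptions: exemplars are reduced to a single numeric feature, there are exactly two categories with equal prior weight, each category is modeled as a Gaussian, and the names `semi_supervised_em`, `init_sd`, and `min_sd` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def semi_supervised_em(labeled, unlabeled, n_iter=50, init_sd=1.0, min_sd=0.05):
    """labeled: list of (feature, category) pairs, category in {0, 1}.
    unlabeled: list of features. Returns (means, sds) per category."""
    cats = (0, 1)
    # Step 1: initial category estimates come from the labeled exemplars only.
    means = []
    for k in cats:
        xs = [x for x, c in labeled if c == k]
        means.append(sum(xs) / len(xs))
    sds = [init_sd, init_sd]  # assumed starting spread

    for _ in range(n_iter):
        # E-step: soft-assign each unlabeled exemplar to the categories,
        # using the current category estimates (equal priors assumed).
        resp = []
        for x in unlabeled:
            p = [gaussian_pdf(x, means[k], sds[k]) for k in cats]
            total = p[0] + p[1]
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])

        # M-step: re-estimate each category from labeled exemplars (weight 1)
        # plus unlabeled exemplars weighted by the confidence of assignment.
        for k in cats:
            weighted = [(x, 1.0) for x, c in labeled if c == k]
            weighted += [(x, r[k]) for x, r in zip(unlabeled, resp)]
            w_sum = sum(w for _, w in weighted)
            means[k] = sum(w * x for x, w in weighted) / w_sum
            var = sum(w * (x - means[k]) ** 2 for x, w in weighted) / w_sum
            sds[k] = max(math.sqrt(var), min_sd)
    return means, sds


# Two labeled exemplars, followed by ten unlabeled ones (made-up values).
labeled = [(0.0, 0), (4.0, 1)]
unlabeled = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
means, sds = semi_supervised_em(labeled, unlabeled)
print(means, sds)
```

In this toy run the category means start at the labeled values (0.0 and 4.0) and are pulled toward the clusters of unlabeled exemplars, which is the sense in which unlabeled data refine a category first established by labels.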

Semi-supervised learning has proven successful in machine learning across myriad content domains including text classification (Goldberg & Zhu, 2006; Nigam, McCallum, Thrun, & Mitchell, 2000; J.-M. Xu, Fumera, Roli, & Zhou, 2009) and object/person identification (Balcan et al., 2005; Guillaumin, Verbeek, & Schmid, 2010). For example, Lu, Ting, Little, and Murphy (2013) compared fully supervised and semi-supervised learning algorithms tasked with identifying individual players in a video recording of the 2010 NBA Championship series. Although the fully supervised and semi-supervised algorithms ultimately achieved equivalent accuracy in the task, the number of labeling episodes required differed dramatically: the fully supervised algorithm required 10 times as many labeled exemplars (20,000) as the semi-supervised algorithm (2,000). Clearly, then, SSL offers a powerful solution to the challenge of learning when the available unlabeled exemplars far outnumber the available labeled exemplars.

Moreover, the benefits of SSL are evident not only in machines but also in adults (for a review, see Gibson, Rogers, & Zhu, 2013). Adults successfully learn novel categories by integrating a small set of labeled exemplars with a subsequent, larger set of unlabeled exemplars (Gibson et al., 2013; Kalish, Rogers, Lang, & Zhu, 2011; Zhu, Rogers, Qian, & Kalish, 2007). Indeed, Lake and McClelland (2011) estimated that when acquiring new categories, their adult participants weighted unlabeled exemplars at least 40% as heavily as labeled exemplars. Although the conditions under which adults most successfully use SSL are still under investigation (cf. McDonnell, Jew, & Gureckis, 2012; Rogers, Gibson, Harrison, & Zhu, 2010; Vandist, De Schryver, & Rosseel, 2009), adults appear to readily engage in semi-supervised learning, drawing on both labeled and unlabeled exemplars to learn new categories.

Recall that, like these adults, most infants typically receive a few high-quality labeled exemplars amidst a larger set of unlabeled exemplars when learning object categories. In principle, then, semi-supervised learning appears to be a natural fit for early category learning (e.g., Kalish et al., 2011; McDonnell et al., 2012; Zhu, 2005). Unfortunately, however, developmental evidence is scant. We are aware of only one investigation of SSL in children: Kalish, Zhu, and Rogers (2015) documented successful SSL in 7- to 8-year-olds, with suggestive but more equivocal performance in 4- to 6-year-olds.

At issue, then, is whether infants take advantage of SSL in their first few years of life, as they acquire many new object categories and their names. There are several promising hints to suggest that they can. First, infants successfully learn object categories from a mixture of labeled and unlabeled exemplars, although it is unclear from this prior work whether infants integrated the unlabeled exemplars or simply ignored them (Balaban & Waxman, 1997; Waxman & Markow, 1995). Second, substantial evidence suggests the effect of naming extends beyond the exemplars that have been named: naming directs infants’ attention to commonalities among objects and, moreover, influences their interpretation of as-yet-unnamed objects (Althaus & Mareschal, 2014; Althaus & Plunkett, 2015; Waxman & Braun, 2005; Waxman & Markow, 1995). Perhaps, then, exposure to labeled exemplars provides infants with a robust foundation for learning from subsequent, unlabeled exemplars. That is, perhaps infants, like SSL algorithms, use the labeled exemplars to classify subsequent unlabeled exemplars and incorporate them into an evolving category representation.

Here, we test this hypothesis directly. We focus on 2-year-olds, an age at which infants rapidly acquire object categories and their names. We adapt a classic object categorization task (Experiment 1) to explore SSL in infants (Experiments 2 and 3). We examine infants’ object categorization in 3 distinct conditions: fully supervised learning (FSL: all exemplars labeled), unsupervised learning (USL: no exemplars labeled), and semi-supervised learning (SSL: only the first few exemplars labeled). If infants benefit from SSL, then performance in the SSL and FSL conditions should be comparable. Alternatively, if SSL provides insufficient support for infant categorization, then performance in the SSL and USL conditions should be comparable.

Experiment 1

Building upon previous work (e.g., Waxman & Markow, 1995), we engaged infants in a two-step categorization task. First, during familiarization, we introduced infants to six exemplars from one object category. Then, at test, we presented two new exemplars: one from the now-familiar category and one from a novel category. All infants viewed the same objects; what varied across conditions was whether the familiarization exemplars were labeled. For half the infants, all the exemplars were labeled (FSL); for the remaining infants, all exemplars were unlabeled (USL). We expected that infants in the FSL, but not the USL, condition would successfully form object categories.

Method

Participants.

Forty-eight infants (23 female) between 25 and 30 months of age (M = 26.8, SD = 1.25) from predominantly college-educated, white families living in the Greater Chicago area participated. Five additional infants were excluded prior to analysis for technical issues (2), parental interference (1), and failing to accumulate at least 2500ms of looking during test (2). All infants attended to both exemplars during the test phase.

Apparatus.

A Tobii T60XL eyetracker was used for stimulus presentation and data collection. The eyetracker has a sampling rate of 60 Hz, and a display size of 57.3 × 45 cm.

Materials.

Auditory stimuli.

Two naming phrases (“Look at the modi!” / “Look—it’s a modi!”) and two non-naming phrases (“Look at that!” / “Look over here!”) were produced by a female using infant-directed speech and recorded in a sound isolation booth.

Visual stimuli.

See Figure 1. Two sets of novel objects, designed by Havy and Waxman (2016), served as visual stimuli. First, two pairs of colorful, creature-like objects were created. The two objects in each pair were then morphed together, yielding two perceptual continua of objects. As a result, images varied along a variety of dimensions, including color, overall body shape, and feature details. By sampling from these continua at 20% intervals, we obtained 6 regularly distributed, continuously varied category exemplars for each of the two categories. These served as familiarization exemplars. At test, infants saw two new exemplars: a new member of the familiar category and a member of a novel category. For the familiar category exemplar, we selected the midpoint of the familiar continuum. For the novel category exemplar, we generated another two continuous categories and selected their midpoints. Each of these novel category exemplars was then paired with a familiar category test exemplar, and the novel exemplar’s coloring was altered to match the familiar exemplar’s.

Procedure.

All infants saw the same images. During familiarization, infants viewed 6 different exemplars from one of the two continuous categories, counterbalanced across infants. Exemplars were presented in one of four pseudo-random orders; to prevent infants from forming spurious generalizations, the first two exemplars were always drawn from different sides of the continuum (cf. Gerken & Quam, 2017). Exemplars appeared on either the left or right side of the screen, with the initial side counterbalanced across infants, and were approximately 600 × 750 pixels. Each exemplar was presented once for 3 seconds.

Infants were randomly assigned to either the FSL or USL condition. In the FSL condition, each familiarization exemplar was paired with a labeling phrase containing a novel noun (e.g., “Look at the modi!”); in the USL condition, each exemplar was paired with a non-labeling phrase (e.g., “Look at that!”) (see Figure 1).

After familiarization, an attention-getter appeared at the center of the screen (10 seconds), followed immediately by the test trial (20 seconds). At test, all infants saw two exemplars: a new exemplar from the familiar category and an exemplar from a novel category. These were presented side-by-side and in silence; their side placement was counterbalanced. We analyzed each infant’s first 5 seconds of looking to the test objects.¹

Data Preparation.

We calculated each infant’s novelty preference score (looking to the novel exemplar divided by looking to novel and familiar exemplars). Because this calculation yields a bounded proportion, we used an empirical logit transformation to ensure the data are suitable for analysis with linear models.
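As a rough illustration of this data preparation step, the sketch below computes a novelty preference score and an empirical logit. The text does not give the exact formula used for the transformation; the version here assumes the common form log((novel + 0.5) / (familiar + 0.5)) applied to counts of gaze samples, and the function names are hypothetical.

```python
import math


def novelty_preference(novel_ms, familiar_ms):
    """Proportion of looking directed to the novel exemplar."""
    total = novel_ms + familiar_ms
    if total <= 0:
        raise ValueError("no looking recorded to either exemplar")
    return novel_ms / total


def empirical_logit(novel_samples, familiar_samples):
    """Assumed form of the empirical logit: log odds of looking to the
    novel exemplar, with 0.5 added to each count so that the value stays
    finite even when one count is zero."""
    return math.log((novel_samples + 0.5) / (familiar_samples + 0.5))


# Hypothetical infant: 3000 ms to the novel and 2000 ms to the familiar
# exemplar, i.e. 180 and 120 gaze samples at a 60 Hz sampling rate.
print(novelty_preference(3000, 2000))
print(empirical_logit(180, 120))
```

The raw proportion is bounded between 0 and 1, whereas the transformed score is unbounded and equals zero when looking is split evenly, which is why it is better suited to linear models.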

In addition to novelty preferences, we examined how infants’ attention evolved over the course of the test phase. We predicted that their looking patterns would diverge over time, with infants in the FSL condition showing greater attention to the novel object than those in the USL condition. We employed a cluster-based permutations analysis (see Maris & Oostenveld, 2007) to identify when, if ever, performance in the two conditions diverged significantly and to do so without inflating the overall Type I error rate (for other examples, see Dautriche, Swingley, & Christophe, 2015; de Carvalho, Dautriche, & Christophe, 2016; Hahn, Snedeker, & Rabagliati, 2015). The analysis was implemented with the eyetrackingR package (Dink & Ferguson, 2015). To begin, we created 25ms bins and compared performance across conditions within each bin using a t-test. For adjacent time-bins which yielded a significant result (based on alpha = .05), the t-statistics for those bins were summed together, creating a cumulative t-statistic representing the overall size of that divergence. The use of alpha = .05 as the threshold here represents a conservative choice, ensuring that any reported divergences will be large in scale. Finally, to evaluate the probability of observing these divergences by chance, we performed 1000 simulations in which the condition labels were randomly shuffled. By evaluating the divergences in infant data against this chance-based distribution, we obtain a p-value, estimating the likelihood that each divergence might have occurred by chance.²
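The logic of this analysis can be sketched as follows. The study itself used the eyetrackingR package in R; the Python code below is only an illustrative re-implementation under assumptions: a pooled-variance two-sample t statistic per bin, a critical value `t_crit` supplied by the caller, clusters defined as runs of adjacent same-sign bins beyond that value, and synthetic data. All function names are hypothetical.

```python
import math
import random


def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    if pooled_var == 0:
        return 0.0
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def find_clusters(group_a, group_b, t_crit):
    """Sum t over each run of adjacent bins where |t| exceeds t_crit."""
    clusters, current = [], None
    for i in range(len(group_a[0])):
        t = t_stat([s[i] for s in group_a], [s[i] for s in group_b])
        sign = (t > t_crit) - (t < -t_crit)  # +1, -1, or 0
        if sign != 0 and current is not None and current["sign"] == sign:
            current["end"] = i
            current["mass"] += t
        elif sign != 0:
            current = {"start": i, "end": i, "sign": sign, "mass": t}
            clusters.append(current)
        else:
            current = None
    return clusters


def cluster_permutation_test(group_a, group_b, t_crit, n_perm=1000, seed=0):
    """Attach a permutation p-value to each observed cluster."""
    rng = random.Random(seed)
    observed = find_clusters(group_a, group_b, t_crit)
    pooled = list(group_a) + list(group_b)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign condition labels
        perm = find_clusters(pooled[:len(group_a)], pooled[len(group_a):], t_crit)
        null_max.append(max((abs(c["mass"]) for c in perm), default=0.0))
    for c in observed:
        c["p"] = sum(m >= abs(c["mass"]) for m in null_max) / n_perm
    return observed


# Synthetic data: 8 participants per group, 10 time bins of proportion
# looking; group A looks more to the novel object in bins 4 to 7.
rng = random.Random(1)
group_a = [[rng.gauss(0.5, 0.05) + (0.3 if 4 <= i <= 7 else 0.0)
            for i in range(10)] for _ in range(8)]
group_b = [[rng.gauss(0.5, 0.05) for _ in range(10)] for _ in range(8)]
# 2.145 is the two-tailed .05 critical value of t with 14 degrees of freedom.
clusters = cluster_permutation_test(group_a, group_b, t_crit=2.145)
for c in clusters:
    print(c)
```

Because each shuffled dataset contributes only its largest cluster to the chance distribution, a single p-value is obtained per observed cluster rather than one per time bin, which is how the procedure avoids inflating the Type I error rate.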

Results

Familiarization.

Infants in both conditions were highly attentive to the visual stimuli: looking time did not differ significantly between the USL (M = 14.94 sec., SD = 1.91) and FSL (M = 13.86 sec., SD = 3.00) conditions, t(46) = 1.48, p = .14, d = .43.

Test.

Preliminary analyses yielded no effect of age, sex, exemplar order, exemplar left/right position, the category learned, or whether the novel test exemplar occurred on the same side as the final familiarization exemplar on infants’ novelty preferences in any experiment, ps > .10. Therefore, here and in all subsequent analyses, we collapse across these factors.

As predicted, infants in the FSL condition (M = .59, SD = .15) successfully learned the object category, revealing a significant novelty preference, t(23) = 3.05, p = .006, d = .62, but those in the USL condition (M = .49, SD = .18) performed at chance levels, t(23) = .39, p = .70, d = .08 (see Figure 2). Moreover, performance in the two conditions was significantly different, t(46) = 2.27, p = .028, d = .66.

Figure 2. Mean novelty preferences by condition. Infants in the FSL and SSL conditions revealed reliable novelty preferences, p < .05. Infants in the USL and Reversed SSL conditions performed at chance levels. Error bars represent standard errors of the mean.

Infants’ looking patterns over the course of the test phase also varied significantly as a function of condition (see Figure 3). A cluster-based permutations analysis revealed that performance in the FSL and USL conditions diverged significantly, from 3450ms to 3850ms after test onset, p = .038, with stronger novelty preferences in the FSL condition throughout this window.

Discussion

Infants in the FSL, but not USL, condition successfully acquired the novel category. This extends evidence from previous work (Balaban & Waxman, 1997; Waxman & Braun, 2005; Waxman & Markow, 1995) to a new age group and to entirely novel, continuous categories. This outcome, evident in both the novelty preference and time-course measures, provides a firm foundation for assessing infants’ success in a semi-supervised condition.

Experiment 2

Our next goal was to test infant categorization in a semi-supervised learning condition. In the SSL condition, familiarization began with two labeled exemplars, followed by four unlabeled exemplars. If infants are capable of SSL, then they should successfully categorize in this condition, mirroring infants in the FSL condition (Experiment 1). But if a semi-supervised environment provides insufficient support, then infants in the SSL condition should mirror the chance-like performance of infants in the USL condition (Experiment 1).