Abstract
Adults readily coordinate on temporary pacts about how to refer to things in conversation. Young children are also capable of forming pacts with peers given appropriate experimenter intervention. Here, we investigate whether parents may spontaneously provide a similar kind of scaffolding with U.S. children in a director–matcher task (N = 201, 49% female; ages 4, 6, 8). In Experiment 1, we show that parents initiate more clarification exchanges with younger children who, in turn, are more likely to adopt labels introduced by the parent. We then examine whether the benefit of such scaffolding acts primarily through children's difficulties with comprehension (Experiment 2) or production (Experiment 3). Our findings suggest that parents primarily scaffold pacts by easing children's production difficulties, modeling cooperative communication.
Children do not just learn the sounds, words, and structures of language—they learn how to use language to communicate (Clark, 2009; Hockett & Hockett, 1960). A core aspect of communication is our sensitivity to common ground. We produce and interpret language for each other in light of our shared history, and expect others to do the same (Brown‐Schmidt, 2009; Brown‐Schmidt et al., 2015; Bruner, 1985; Clark, 1996). For example, the same entity can be referred to as Max with family members, but our dog or our pug would be more appropriate with strangers who cannot be expected to know his name (Brown, 1958). When the meaning of an utterance is unclear, or when an intended referent does not have a conventional label, interlocutors will engage in negotiation, collaboratively arriving at a shared conversational pact about how to think and talk about the intended referent (Clark & Wilkes‐Gibbs, 1986; Haber et al., 2019; Hawkins et al., 2020). These pacts are temporary agreements about referent names, and adults fluently form, revise, and track them over time with different partners. For instance, Brennan and Clark (1996) showed that adults will use the contextually appropriate level of specificity for a referent (e.g., “loafer” rather than “shoe” when another shoe is present), carry it forward into a new context with the same conversational partner (even when the initial competitor is removed), then revert to the simpler “shoe” when a new partner is introduced.
While adults readily form pacts with one another, studies examining peer interactions between children have yielded more mixed results (see Stephens & Matthews, 2014 for a review). In a classic set of studies, Krauss and Glucksberg (1977) asked pairs of age‐matched children to play a director–matcher game that required them to refer to novel objects by themselves (see also Glucksberg et al., 1966; Glucksberg & Krauss, 1967; Krauss & Glucksberg, 1969). Across repeated interactions with the same novel objects, 5‐ and 6‐year‐old children showed very little evidence of pact formation. The matcher's accuracy in selecting the correct referent remained low, and the director did not increase the efficiency of their referential expressions as adults do. These classic results are perhaps surprising from a contemporary perspective, given that children engage in relatively fluent conversation at this age and are able to track at least some aspects of common ground when engaging in scripted interactions with an adult experimenter as a partner (Vasil, 2023), including visual perspective (Khu et al., 2020; Nadig & Sedivy, 2002; Nilsen & Graham, 2009) and referential precedence (Akhtar et al., 1996; Graham et al., 2014; Matthews et al., 2010; Ostashchenko et al., 2019; Yoon et al., 2021). However, as Köymen et al. (2014, p. 2335) note, “establishing referential pacts with adult experimenters who provide scripted utterances might undermine the spontaneity and the mutual agreement on a referential term, since children are simply presented with referential terms and are not actively creating referential pacts themselves.” Even more surprising, then, are recent studies of spontaneous interactions between pairs of children as young as 4 years old, who are in some cases able to get pacts off the ground by themselves (Branigan et al., 2016; Köymen et al., 2014), most notably in nonverbal gesturing paradigms (Bohn et al., 2019; Lister et al., 2021).
Why do pairs of children apparently succeed in some studies but fail in others? One key factor may be the presence or absence of targeted input or feedback from adult experimenters. For example, Köymen et al. (2014) explicitly prompted 4‐ to 6‐year‐old participants to ask clarification questions when their partner was underinformative (asking “do you know what he/she means?” and saying “You can ask him/her” when they said no), mediated negotiation over pacts (“I say we give a name to this boy/girl. Do you have any idea?”), and provided immediate feedback when they failed to use the pact (“And who are they?”). Similar effects of experimenter feedback were noted as far back as Krauss and Glucksberg (1977) (“Children in the first, third, and fifth grades did improve, with the help of comments from the experimenter pointing out the mistakes they had been making”). Developmental changes were also observed in these more recent studies, indicating that the underlying mechanisms may be graded and continue to develop into middle childhood. For example, Köymen et al. (2014) found that 6‐year‐olds were more sensitive to referential pacts than 4‐year‐olds, who were only reliably sensitive when the pacts were proper nouns, even in the presence of extensive prompting. And while Branigan et al. (2016) found that 8‐ to 10‐year‐old participants were able to coordinate on pacts given explicit feedback about their accuracy, their error rates were still higher in absolute terms than those of adults. Thus, relatively targeted experimenter intervention may be required for children to successfully construct conversational pacts with their peers; the ability to reliably initiate the requisite speech acts on their own may not develop until later.
In everyday conversation, children do not have access to explicit mediation from experimenters. It is possible, however, that parents may spontaneously scaffold referential pacts with their children in a similar way. Parents are not only more linguistically capable than their children but also have fine‐grained representations about their children's knowledge and conversational abilities (Leung et al., 2021). Parental scaffolding may not only help facilitate effective communication in the local context but also serve to model dialogue skills that children may gradually adopt more globally. Our investigation thus begins in Experiment 1 by asking what parents may be doing spontaneously to support the formation of conversational pacts when talking with their children. We do not aim here to make direct comparisons between parent–child interactions and peer‐to‐peer interactions. Instead, we focus on the parent–child setting and consider three potential obstacles to initially establishing pacts where adult scaffolding may help:
Utterance‐level comprehension. In order to identify the target of a referential utterance, children must determine whether the information contained in it is a good description of each potential referent. There is extensive evidence that by 4 years, children can track partner‐specific perspectives during comprehension; for example, Nadig and Sedivy (2002) found that children use information from previous interactions to look toward the correct target image as their partner’s speech unfolds (Akhtar et al., 1996; see also, Bleijlevens et al., 2023; Bohn et al., 2021). However, it is possible that in contexts with more complex referring expressions for novel objects (like the tangrams we use), children may struggle to identify the target of an utterance, even if it would be perfectly informative for an adult listener (Speer, 1984).
Utterance‐level production. Alternatively, the primary source of communicative difficulty may lie on the production side. Children may generate referential expressions that are either overly idiosyncratic or insufficiently informative for their conversational partner to correctly identify the intended referent (Beal, 1988; Beal & Flavell, 1983; Rabagliati & Robertson, 2017). For familiar objects, conventional labels may provide a good‐enough default (Goldberg & Ferreira, 2022; Koranda et al., 2022), but more challenging contexts with unfamiliar objects may place additional demands on production. Children are also known to struggle to determine which referential expressions would be more or less informative in context (Beal & Flavell, 1982; Whitehurst & Sonnenschein, 1978), even for idiosyncratic expressions they themselves produced at an earlier time (Asher & Oden, 1976; Robinson & Robinson, 1977).
Discourse‐level interaction. Finally, the difficulty may lie beyond the level of single utterances, implicating the interactive discourse processes that link together production and comprehension. For example, children may experience no more or less difficulty in production or comprehension than adults, but struggle to notice and actively signal to their partners that they have not understood each other. Adults readily use back‐channels and other vocal affirmations to signal that they have understood a potentially ambiguous utterance (Clark & Bernicot, 2008) and will ask clarifying questions when they feel they have been unable to understand (Clark & Wilkes‐Gibbs, 1986; Schegloff, 2007). However, young children may have trouble proactively initiating these speech acts when necessary, and errors may persist, preventing pacts from getting off the ground (Anderson et al., 1994; Beal, 1987).
EXPERIMENT 1: FORMING PACTS IN AN INTERACTIVE COMMUNICATION GAME
It is clear from existing studies that children are receptive to explicit scaffolding from experimenters (Köymen et al., 2014). For example, Deutsch and Pechmann (1982) showed that repeated clarification questions eventually elicited unambiguous descriptions from 6‐ and 9‐year‐olds and, to a lesser extent, from 3‐year‐olds. A more recent training study by Matthews et al. (2007) replicated this finding in 2‐ to 4‐year‐olds and further showed transfer to a different communication task. But it is unclear whether, or how, adults spontaneously provide such input over the course of unmediated parent–child conversations, and whether such input actually enables the formation of conversational pacts (rather than generically increasing informativeness).
One possibility is that parents naturally adopt some of the interactive strategies developed by experimenters in these training studies: pointing out when references are insufficiently informative, asking for clarification about targeted features, and helping children to reconceptualize referential targets within the expressiveness of their developing vocabulary (Clark, 2018; Nikolaus et al., 2022). In our first experiment, we asked parents and their 4‐, 6‐, and 8‐year‐old children, as well as control pairs of adults, to play an adapted version of Krauss and Glucksberg's (1969) director–matcher game. We examined (1) whether parents' contributions enable conversational pacts to form at all, (2) what strategies parents use to scaffold pact formation if so, and (3) whether the nature of these pacts changes over development as children become more adult‐like users of language.
Method
Participants
Children (ages 4, 6, and 8) and their parents were recruited from a database of families in the local community to achieve a planned sample of 60 parent–child pairs (20 per age group). This sample size was chosen based on the logistical feasibility of in‐person recruitment and is comparable with other recent studies of parent–child interaction (e.g., Leung et al., 2021). A total of 75 children and their parents participated. Data from 12 pairs were dropped due to failure to complete the study, leaving a final sample of 63 pairs. There were 24 four‐year‐olds, 20 six‐year‐olds, and 19 eight‐year‐olds, along with their parents, in our sample. Of the included children, 31 were female (49%) and 32 were male (51%). Of the 60 participants who responded to our demographic questionnaire about race, 33 children were White (52%), 14 were Black or African American (22%), two were Asian (3%), and 11 were multiracial (17%). As a comparison group, a convenience sample of adult participants was also recruited from a Psychology Department subject pool to achieve a planned control group size of 20 adult–adult pairs.
Stimuli
Twelve solid black images of tangrams were normed for pairwise similarity by an independent group of 60 participants on Amazon Mechanical Turk. Each of these participants made 22 pairwise similarity judgments on a scale from 1 to 100. Based on these similarity ratings (M = 42.3, SD = 26.1), the 10 tangrams with the highest overall dissimilarity were selected for use as stimuli (Figure 1a). To ensure that the game would not be too difficult, we designed contexts such that the foil was never too similar to the target. To do so, we rank‐ordered all tangram pairs from least to most similar and chose foils to minimize similarity while ensuring that each image appeared as a foil four times.
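As a quick check on the norming design, the arithmetic below counts the unordered tangram pairs and the total number of judgments collected. The even split of judgments across pairs is an assumption made here for illustration; the text does not state how pairs were assigned to raters.

```python
from math import comb

n_tangrams = 12
n_raters = 60
judgments_per_rater = 22

n_pairs = comb(n_tangrams, 2)                     # unordered pairs of distinct tangrams
total_judgments = n_raters * judgments_per_rater  # all ratings collected
# Average ratings per pair, ASSUMING every pair was rated equally often.
mean_ratings_per_pair = total_judgments / n_pairs

print(n_pairs, total_judgments, mean_ratings_per_pair)  # 66 1320 20.0
```

Under that assumption, each of the 66 pairs would have received about 20 ratings.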
FIGURE 1.

(a) Parents (orange) and children (blue) played a repeated reference game with a set of 10 tangram images. To measure the respective contributions of parents and children to the pacts that were eventually established, we ensured that half of the images were described first by the parent, and half of the images were described first by the child. (b) Two of these figures were presented as the context on each trial. One was the target and the other was the foil. The director (here, the parent) was asked to refer to the target (privately highlighted in a box) so that the matcher (here, the child) could distinguish it from the foil. (c) Each tangram appeared as the target once per block, and each dyad played the game for four blocks. Parents and children alternated roles on each trial.
Design and procedure
Pairs of participants were brought into the laboratory to play a cooperative director–matcher game. Adult participants and parents provided written consent and children provided verbal consent prior to beginning the game. Parent–child pairs were compensated $10 and a small toy or book for their time. Adult participants received $5 each or course credit for their participation. Participants were seated in front of iPads at opposite ends of a table, with a divider preventing them from seeing each other's screens. This divider did not fully occlude participants; they could still see their partner's face, although they were explicitly instructed to use words only and we did not observe participants relying heavily on gestural or facial cues in practice.
Participants were told that they would take turns playing director and matcher roles. On each trial, exactly two tangrams appeared on their screens. One of these tangrams was the target, and the other was the foil. Pairs were told that the director's task was to describe the target image, privately indicated by a blue border, and the matcher's task was to select one of the two images on their screen based on the director's description (Figure 1b). Participants were aware that both screens showed the same two images, but possibly in different locations on the screen (left–right position was randomized). Before beginning the experiment, participants played six practice trials with images of common fruits and vegetables. There were no time limits for trials, and participants were not given specific instructions on what they could or could not say. To prevent matchers from selecting a target too early, touches were disabled for the first 1500 ms of each trial.
The experiment consisted of four repetition blocks of 10 trials each (Figure 1c). Each tangram was the target once per block. We constructed the trial sequence to ensure that participants both alternated roles from trial to trial and alternated roles for each target from block to block. For each participant pair, we randomly divided the tangrams into two sets of five: The adult was assigned one set to describe on the first block, and the child was assigned the other set. These sets were interleaved on the first block, such that players alternated roles. On each subsequent block, these sets were swapped such that each tangram was described by each participant exactly twice over the course of the experiment.
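The sequencing constraints above can be sketched in a few lines. This is a minimal illustration rather than the authors' actual experiment code: the function name, the dictionary layout, and the choice to have the parent direct the first trial of every block are all assumptions.

```python
import random

def build_trial_sequence(tangrams, n_blocks=4, seed=None):
    """Return a list of trials with alternating directors (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(tangrams)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Random split into the set each partner describes on the first block.
    sets = {"parent": shuffled[:half], "child": shuffled[half:]}
    trials = []
    for block in range(1, n_blocks + 1):
        parent_targets = list(sets["parent"])
        child_targets = list(sets["child"])
        rng.shuffle(parent_targets)
        rng.shuffle(child_targets)
        # Interleave the two sets so that roles alternate on every trial.
        # ASSUMPTION: the parent directs the first trial of each block.
        for p_target, c_target in zip(parent_targets, child_targets):
            trials.append({"block": block, "director": "parent", "target": p_target})
            trials.append({"block": block, "director": "child", "target": c_target})
        # Swap sets so each tangram changes director on the next block.
        sets = {"parent": sets["child"], "child": sets["parent"]}
    return trials

trials = build_trial_sequence([f"T{i}" for i in range(10)], seed=1)
print(len(trials))  # 40
```

Because each block contains an even number of trials, starting every block with the same role preserves strict trial-to-trial alternation across block boundaries.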
On each trial, the target tangram appeared with exactly one foil selected from the set of nine other tangrams. Targets appeared with a different foil on different repetition blocks. To ensure that the game would not be too difficult for young children, tangrams most similar to the target (based on similarity norms) did not appear in the same context. To discourage participants from using spatial language (e.g., “left side”), the target and foil were shown in randomized order across the two iPads. When the matcher selected an image, it became colorful and a pleasant sound played. Importantly, neither the matcher nor the director received explicit feedback about accuracy: The same sound played whether the selection was correct or not.
Preprocessing
Sessions were videotaped and subsequently transcribed using Datavyu (Datavyu Team, 2014), an open‐source coding program. Each video was transcribed by one researcher and checked by a different researcher. Checking involved watching the video alongside the transcript and correcting any typos or errors in transcription. Utterances were manually coded as part of a given trial or unrelated to the game (e.g., “sit down please”), and unrelated utterances were removed before analysis. For the purposes of analyzing turn‐taking, a conversational turn was defined by a clear end to speech by the speaker. Some conversational turns did not constitute full sentences, in cases where the partner interrupted. Transcribers were instructed that backchannels (e.g., yes, mmhm, I see) should not count as interruptions unless they led the speaker to stop talking. If the speaker continued speaking without pause, the utterance was transcribed as a single conversational turn.
Results
We characterized developmental differences using three measures of communicative behavior. First, we examined accuracy to evaluate whether children were able to succeed at the reference game in collaboration with their parents. Second, we examined conversational turn‐taking behavior to evaluate how interactive dialogue may contribute to success. Third, we examined the number of words produced by each partner on each turn to evaluate the efficiency of pacts.
Performance accuracy
We began by analyzing task performance across age groups. Because pairs of adults were consistently at ceiling throughout the task, we focused on the performance of parent–child pairs. We constructed a mixed‐effects logistic regression predicting whether the matcher successfully chose the correct referent on each trial. The model included fixed effects of director identity (parent or child), (numeric) age, and repetition block. It also included random intercepts for each tangram and pair of participants, and random slopes of repetition block and director for each pair of participants (Table S1).
Initial accuracy was well above chance for all age groups, the lowest being 83% correct (confidence interval: [76%, 90%]) for 4‐year‐old directors, indicating that even young children can succeed in this referential task with their parents. We also found a significant main effect of age (β = .34, z = 2.99, p = .003): Pairs with younger children performed significantly worse than pairs with older children. Critically, however, accuracy improved significantly over the four repetition blocks for all groups (β = .49, z = 3.48, p < .001; Figure 2a). Intriguingly, accuracy was also slightly lower when children were the directors (β = −.38, z = −2.18, p = .030), suggesting a potential asymmetry in performance across roles.
FIGURE 2.

(a) Accuracy and (b) number of dialogue exchanges per trial, broken down by whether the child (blue) or the adult (orange) was the director. Error bars are 95% CIs.
Interactive dialogue exchanges
If the ability of children of different ages to successfully establish reference depends on interactive scaffolding provided by their parents, we would expect additional dialogue exchanges for younger children. We quantified dialogue exchanges by counting the total number of distinct turns of continuous speech on each trial and constructed a (Poisson) mixed‐effects model predicting the number of exchanges with the same effect structure reported in the previous section (Table S2). Consistent with previous work (Clark & Wilkes‐Gibbs, 1986), and replicated in our adult‐adult control condition, we found a significant main effect of repetition: Fewer dialogue turns were required on later trials (β = −.05, z = −3.03, p = .002). In line with our predictions, we also found a significant main effect of age (β = −.08, z = −4.01, p < .001). Pairs with 4‐year‐old children took roughly one additional turn at each point in the experiment than pairs with older children, who more closely resembled pairs of adults (Figure 2b). The increased levels of interactivity between parents and young children provide an interesting contrast with previous studies showing lower interactivity between peer dyads of young children interacting without parental scaffolding (Anderson et al., 1994; see Appendix A for a preliminary analysis of the content of these exchanges).
Reduction in length of referential expression
A key signature of successful communication among adults is an increase in efficiency over repeated reference (Clark & Wilkes‐Gibbs, 1986). As pairs form conceptual pacts, they are able to communicate the same meaning using fewer words. Our control sample of adults replicated this classic effect (β = −.27, z = −7.30, p < .001). Here, we asked whether parents and children of different ages spontaneously reduce their referential expressions in the same way (Figure 3). We measure efficiency as the total number of words produced by the director on a given trial, up until a selection is made by the matcher, with fewer words indicating greater efficiency. Note that the total number of words produced on a trial is correlated with the number of dialogue exchanges examined above (r = .60).
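The two per-trial measures (number of dialogue turns and number of director words) can be computed from a transcript as in the minimal sketch below. The transcript layout, the function name, and whitespace tokenization are assumptions made for illustration; the actual coding pipeline is not described at this level of detail.

```python
def trial_measures(turns, director):
    """Compute (number of turns, director word count) for one trial.

    turns: ordered list of (speaker, utterance) pairs, truncated at the
    matcher's selection. ASSUMPTION: words are split on whitespace.
    """
    n_turns = len(turns)
    director_words = sum(
        len(utterance.split())
        for speaker, utterance in turns
        if speaker == director
    )
    return n_turns, director_words

# Hypothetical trial with the parent as director.
example_trial = [
    ("parent", "the one that looks like a rabbit"),
    ("child", "with the ears"),
    ("parent", "yes with two long ears"),
]
print(trial_measures(example_trial, director="parent"))  # (3, 12)
```

Only the director's words enter the efficiency measure, while every turn by either partner counts toward the exchange measure, which is one reason the two are correlated but not identical.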
FIGURE 3.

Total number of words in referential expressions produced by children and parents over the course of interaction.
Using a mixed‐effects Poisson regression to account for count data, we predicted the number of words used by the director on each trial, including fixed effects of age, repetition block, and director identity (parent vs. child) as well as all of their interactions. We also included random intercepts and slopes for repetition block at the tangram level and maximal random structure at the dyad level (i.e., intercept, slopes for repetition block and director identity, and their interaction; Barr et al., 2013; see Table S3). All variables were centered to allow interpretation of lower order terms as effects at the average level of the other terms. We found significant main effects of repetition block (β = −.15, z = −6.20, p < .001), director identity (β = .12, z = 5.89, p < .001), and age (β = −.10, z = −3.71, p < .001). All else being equal, directors used fewer words over subsequent repetitions, children used fewer words than their parents, and pairs with older children used fewer words than pairs with younger children. However, these main effects were qualified by several pairwise interactions. First, while parents on average used more words as director than their children did, we found a significant interaction with the child's age (β = −.03, z = −2.54, p = .011). This gap between parent and child utterance length was largest at age 4 but nearly disappeared by age 8. Second, we found that parents reduced their utterance length over time more strongly than children did, holding age group constant (β = −.02, z = −2.26, p = .024). Third, we found an interaction between age group and repetition block, with pairs including older children showing stronger reduction overall (β = −.03, z = −2.27, p = .023).
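For readers who prefer the model written out, one way to express the structure described above is given below. This is a sketch under stated assumptions: the log link is the conventional choice for Poisson regression but is not stated in the text, and the indices (i for trials, j for dyads, k for tangrams) and symbol names are introduced here for illustration only.

```latex
\begin{aligned}
w_{ijk} &\sim \mathrm{Poisson}(\lambda_{ijk}) \\
\log \lambda_{ijk} &= (\beta_0 + u_{0j} + v_{0k})
  + (\beta_1 + u_{1j} + v_{1k})\,\mathrm{block}_{i}
  + (\beta_2 + u_{2j})\,\mathrm{director}_{i}
  + \beta_3\,\mathrm{age}_{j} \\
&\quad + (\beta_4 + u_{3j})\,\mathrm{block}_{i}\,\mathrm{director}_{i}
  + \beta_5\,\mathrm{block}_{i}\,\mathrm{age}_{j}
  + \beta_6\,\mathrm{director}_{i}\,\mathrm{age}_{j}
  + \beta_7\,\mathrm{block}_{i}\,\mathrm{director}_{i}\,\mathrm{age}_{j}
\end{aligned}
```

Here w is the director's word count, the u terms are dyad-level random effects (intercept, block slope, director slope, and their interaction), and the v terms are tangram-level random effects (intercept and block slope), each set assumed to be multivariate normal with mean zero. Age varies only between dyads, so it carries no dyad-level random slope.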
Who introduces pacts and who adopts them?
Our results so far demonstrate that children are able to converge on increasingly accurate and efficient pacts with their parents. What might allow children to coordinate with their parents but not with their peers (Krauss & Glucksberg, 1977)? A classical explanation is that children are rigid and lack the ability to adapt to their partner: they have a strong preference for a particular idiosyncratic description and are not sensitive to the possibility that their partner may not understand it (e.g., “this one looks like mommy's dress”). Under this hypothesis, children fail with other children because they each stubbornly continue to use mutually incomprehensible expressions, and only succeed with their parent as a result of the parent's flexibility. Another possibility is that young children may be able to adapt successfully but are simply unable to generate good enough initial candidate labels to get the process off the ground. In this case, pairs of children may fail because neither partner can generate good enough labels to start the pact‐formation process, while children and parents succeed because parents seed the first good candidate label. These accounts make different predictions about who is adapting to whom: do pacts originate with children, or with adults?
We distinguish these accounts by quantitatively analyzing the natural‐language transcripts. For each word in the final description of a tangram, we checked whether it had appeared in an earlier referential expression for that tangram. We noted the first trial where it appeared, and who was director when it was produced.1 The proportion of words originating with the child and parent is shown in Figure 4. We observed an asymmetry: The words used by children on the final repetition were more likely to have originated with their parents than the words used by parents were to originate with their children. In addition, this gap appeared to close with older groups, with parents more likely to adopt words introduced by older children.
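The word-tracing procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the tokenizer, the data layout, and the treatment of words that first appear in the final description (attributed here to the final director) are assumptions.

```python
import re

def tokenize(text):
    # ASSUMPTION: lowercase alphabetic tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z']+", text.lower())

def trace_word_origins(descriptions):
    """Trace each word of the final description back to its first use.

    descriptions: chronological list of (block, director, text) tuples for
    ONE tangram within one dyad; the last entry is the final repetition.
    """
    *earlier, (final_block, final_director, final_text) = descriptions
    origins = []
    for word in dict.fromkeys(tokenize(final_text)):  # unique words, order kept
        # Default: the word is new on the final repetition (assumption).
        first_block, introduced_by = final_block, final_director
        for block, director, text in earlier:
            if word in tokenize(text):
                first_block, introduced_by = block, director
                break
        origins.append({
            "word": word,
            "first_block": first_block,
            "introduced_by": introduced_by,
            "from_partner": introduced_by != final_director,
        })
    return origins

# Hypothetical history for one tangram.
history = [
    (1, "parent", "it looks like a rabbit"),
    (2, "child", "the bunny with ears"),
    (3, "parent", "the bunny rabbit"),
    (4, "child", "bunny rabbit"),
]
for row in trace_word_origins(history):
    print(row["word"], row["first_block"], row["introduced_by"], row["from_partner"])
```

In this made-up example, the child's final description reuses "rabbit", which the parent introduced on the first block, alongside "bunny", which the child introduced.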
FIGURE 4.

Probability of words used on the final round first occurring with the child or the parent. Error bars are 95% CIs.
We tested this hypothesized interaction using a mixed‐effects logistic model predicting whether each word appearing on the final repetition for each tangram was introduced by the current director or by their partner. We included fixed effects of age group and the director identity (parent or child), as well as random intercepts for each pair of participants and each tangram (Table S4). We found a significant main effect of director identity, with the words used by children more likely to originate with their partner than the words used by parents (β = .37, z = 4.76, p < .001). Additionally, we found a weak but significant interaction between director and age, indicating that this asymmetry was smaller for older children (β = −.10, z = −2.19, p = .028). Thus, parents, especially parents of younger children, appear to be the source of the labels that persist in successful conceptual pacts.
Summary and discussion
In Experiment 1, we adapted the classic tangram director–matcher paradigm developed by Clark and Wilkes‐Gibbs (1986) to examine conversational pact formation among parent–child dyads. We found that even 4‐year‐old children and their parents can successfully coordinate on pacts, but for these youngest children, parents adaptively provide multiple sources of support and scaffolding. Even 4‐year‐old children readily adopted labels introduced by their parents and interactively refined their descriptions in response to spontaneous parent‐initiated scaffolding (see Appendix B for further analysis of how children may constrain pacts). Overall, 4‐ to 8‐year‐old children and their parents exhibited patterns similar to adult pairs in terms of reduction in length of referring expressions and exchange turns.
Importantly, we observed successful pact formation even in the absence of explicit task feedback: the same sound played whether or not the matcher's selection was correct. We expect that adding explicit (serial) feedback would help facilitate pact formation (Fishbein & Osborne, 1971). It is also possible that the neutral sound was mistakenly interpreted by participants as positive feedback, leading to confusion. However, the fact that parent–child pairs were so successful under these conditions, with accuracy continuing to improve across blocks, suggests that children and parents were largely able to rely on self‐initiated communicative feedback to form pacts.
EXPERIMENT 2: COMPREHENSION
If parental scaffolding is able to overcome the communicative challenges faced by young children, then what, exactly, could those challenges be? Children as young as 3 years old are able to effectively track and maintain existing pacts (Graham et al., 2014; Matthews et al., 2010); hence, we suggest that the difficulties, in practice, may lie in the initial establishment of pacts. One possible difficulty is that children's poor comprehension abilities may prevent pacts from getting off the ground. That is, young children may be unable to understand or accommodate their partner's initial description of a novel tangram when it does not align with their own way of conceptualizing that tangram, preventing its uptake as a pact. In this case, parent–child dyads' communicative success in our reference game may be due to parents' ability to align to their children's idiosyncratic point of view and adapt their referring expressions accordingly. A second possible difficulty may stem from children's production‐side abilities. In other words, children may have difficulty generating sufficiently descriptive referring expressions on their own, given their limited vocabulary and other processing constraints, but are able to recognize a good description when they hear it, and adopt that description as a pact going forward.
In Experiment 2, we explicitly tested the first of these two hypotheses. Although there is already a large body of work demonstrating that children are able to accurately interpret referring expressions by the age of four (Brandt et al., 2016; Davies et al., 2021; Morisseau et al., 2013; Nadig & Sedivy, 2002; Nilsen & Graham, 2009), our task in Experiment 1 may have posed additional challenges. The images may have been more abstract or unfamiliar, and the referring expressions may have been more complex than those in prior comprehension studies. Here, we validate what is known from the existing literature on early referring expression comprehension by providing the exact descriptions produced by participants in Experiment 1 to naive groups of children and adults, in the exact same referential contexts. If comprehension is the root of children's difficulty, we might expect that naive children as comprehenders would be equally unable to interpret all referring expressions (regardless of whether they were originally produced by adults or children) while naive adults as comprehenders would have no difficulty. Conversely, if naive children are able to understand the referring expressions produced by adults as well as adults are, then we may expect the source of the difficulty to lie elsewhere.
Methods
Participants
We recruited 355 adults from Amazon Mechanical Turk. Participants were compensated 30¢ for a short task. Data from 24 participants were dropped due to failure to pass an attention check, leaving a final sample of 331 adults. Additionally, we planned to recruit a sample of 200 children (ages 4 to 8) from a school and a museum in the Chicago area, with the relatively large sample size based on the classroom sizes of the local school. However, due to the COVID‐19 pandemic, we were forced to terminate data collection early with only 78 children skewed toward the older end of the age range (16 children between four and six, 34 children age seven, and 29 children age eight). Children received a sticker for their participation. We were not able to obtain further demographic information for these participants. Our partner classrooms have not yet allowed the resumption of data collection at time of submission, so we decided to move forward with our existing smaller‐than‐planned sample.
Stimuli
To conceal the identity of the original speaker, the first author and four research assistants produced new audio recordings by reading the Experiment 1 transcripts in a uniform vocal style. All recordings were by female native English speakers. We drew utterances from the first and fourth (final) round of the reference game. We removed disfluencies and isolated the speaker's original referring expression on each trial (i.e., we excluded additional information provided in response to the listener's questions or prompting). This process produced unique audio stimuli, two for each item of each game from Experiment 1. These stimuli were broken into 118 unique “item sets” containing 10 recordings each, such that each tangram appeared as the target once in each set. Our randomization was set up such that a unique item set would be shown to each participant, and then repeated once all 118 sets were used. Due to forced termination of data collection, only 78 sets were used for child participants. Across adult participants, all 118 sets were shown at least once. On each trial, the target tangram appeared with the same foil it was paired with on the corresponding trial in Experiment 1. The stimuli sets were counterbalanced such that each participant encountered exactly five utterances originally generated by a parent and five by a child, and five utterances from the first round and five from the final round.
Procedure
Participants were placed in the role of the listener and presented a sequence of 10 audio recordings—a single referring expression for each target tangram, in a randomized order. Participants were instructed to click the intended referent based on the audio they hear. On each trial, two tangram images were displayed side by side (left and right order randomized). At the beginning of each trial, the audio recording played once. To reduce possible learning effects, participants did not receive any feedback after their response. Before participants began the experiment, we ensured their audio was working. We did not allow participants to proceed past the consent page without clicking a “play” button that asked them to “type the number 86 into the box.” Additionally, to detect participants who were not following instructions, we included an attention check which simply asked people to click “the one on the left.” Child participants provided verbal consent prior to the start of the experiment, and parents provided written consent. The children's version of the experiment did not include the initial “play” button and attention check. While children completed the task independently on an iPad, an experimenter was nearby to ensure that the audio was working. We measured response time as time elapsed between the completion of the audio recording and the response. Participants were not allowed to respond prior to the completion of the audio recording.
Design
We used a 2 × 2 factorial design manipulating the age of the producer (parent vs. child) and comprehender (adult vs. child). The age of the producer was a within‐subjects manipulation while the age of the comprehender was a between‐subjects manipulation: each comprehender was exposed to utterances originally produced by both adults and children. We predicted that if children struggle to form conceptual pacts primarily due to comprehension difficulties, then adult comprehenders should be highly accurate across the board (regardless of whether the referring expression was originally produced by an adult or child), while child comprehenders should uniformly struggle.
Results
Children and adults comprehend descriptions at similar levels
To test whether reference game performance may be attributed to comprehension difficulties, we compared how well naive groups of children and adults were able to determine the referent of the same expressions heard by listeners in Experiment 1 (see Appendix C for more comprehensive analyses). We focused on accuracy, as response times are not directly comparable between the web interface (for adults) and laboratory interface (for children). We constructed a mixed‐effects logistic regression model predicting trial‐by‐trial accuracy, including a fixed effect of comprehender group (child vs. adult) and random intercepts and slopes for each source game (see Footnote 2). We found no significant difference in overall accuracy across child and adult comprehenders (adult accuracy = 0.87, 95% CI [0.86, 0.88]; child accuracy = 0.85, 95% CI [0.83, 0.88]; z = 1.49, p = .14; see Figure 5). Consistent with this finding, a nested model comparison provided no support for a model with an effect of comprehender group over an intercept‐only model, χ2(3) = 2.28, p = .52 (see Footnote 3).
FIGURE 5.

Results for Experiment 2. Both naive adults (left) and children (right) were able to more accurately interpret referring expressions originally produced by parents (orange) than children (blue) in Experiment 1. No overall difference was observed across the two comprehender groups. Error bars are 95% confidence intervals.
Summary and discussion
In Experiment 2, we asked whether children may struggle to initially establish pacts due to comprehension‐side difficulties. If parents' initial referring expressions are too complex or the images are too ambiguous, children may be unable to determine the referent and thus unable to get a pact off the ground. Surprisingly, we found no difference between the ability of naive adults and children to comprehend referential expressions from Experiment 1. It is possible that this null result is attributable to our incomplete sample, which was biased toward older children, and that a more balanced sample with younger comprehenders would reveal reliable differences (Figure S2). At the same time, although children may not be substantially worse at comprehending utterances, our data did suggest that referential expressions originally generated by younger children were harder for naive adults and children alike to comprehend (Appendix C). That is, both adults and children were less likely to find the intended referent after hearing a description originally produced by a 4‐year‐old child. Our findings so far, then, suggest that children's difficulty establishing pacts may primarily lie outside comprehension processes, and point toward possible developmental changes in children's ability to produce sufficiently informative descriptions by themselves.
EXPERIMENT 3: PRODUCTION EXPERIMENT
In Experiment 2, we found that naive children and adults were able to comprehend referential expressions equally well. That is, to the extent that children contributed to poorer group performance in Experiment 1, their contribution is not well‐explained by the comprehension hypothesis. At the same time, however, we found some evidence consistent with the production hypothesis: Messages originally produced by children were less comprehensible for listeners of any age. While this source effect is intriguing, it was difficult to disentangle the original messages from the interactive parent–child context in which they were produced. For example, it is possible that children were receiving interactive scaffolding that prompted them toward more comprehensible expressions (Grigoroglou & Papafragou, 2019). Or, conversely, it is possible that children were relying on their parents to take on more of the division of labor of interpretation (Hawkins et al., 2021) and were thus producing expressions that are less comprehensible to a naive audience than they were actually capable of.
In Experiment 3, we remove the interactive context to more directly assess children's ability to produce referential expressions for novel tangram objects in different contexts. To be clear, this study does not aim to address the ability of children to track partner‐specific pacts, which is already well‐established; it examines the conditions that may prevent a pact from being established in the first place, which require grounding in a successful referential act. We used a 2 × 2 design aiming to tease apart two related explanations for poor production performance. First, to assess the extent to which production difficulties stem from pragmatic reasoning (i.e., the ability to recognize that an accessible label is not sufficiently informative in context), we manipulate whether the foil is more or less similar to the target. Second, to assess the extent to which production difficulties stem from impoverished lexical priors (i.e., the ability to access candidate labels for a given referent), we manipulate whether the target objects are familiar photographs or novel tangram shapes (Horton & Gerrig, 2002). To the extent that children fail to produce context‐sensitive utterances for familiar objects with accessible labels, we may expect that pragmatic reasoning is a bottleneck on performance. To the extent that there is more variability in the utterances produced by children than by adults, we may expect that their lexical priors play a larger role.
Methods
Participants
We recruited 100 adult participants from Amazon Mechanical Turk. All participants gave informed consent prior to the start of the study and were compensated 60¢. Detailed demographics were not collected, but we constrained our recruitment to only participants in the US. We also recruited 60 children aged 4–8 years old (M = 6.35) to participate in the study. Our sample consisted of 4 four‐year‐olds, 17 five‐year‐olds, 10 six‐year‐olds, 12 seven‐year‐olds, and 17 eight‐year‐olds. Families received $5 electronic gift cards for their participation. Data on race and ethnicity were not collected, but participants were recruited from a database of families that reflect the overall racial/ethnic makeup of the Chicago area. The study was conducted online over Zoom. Parents provided informed written and verbal consent, and children provided verbal consent.
Stimuli and design
We used 37 pictures of familiar objects and 24 pictures of tangrams. Familiar objects were drawn from an image set used by Degen et al. (2020). We drew from eight different basic‐level categories that we expected to be familiar to children: bears, birds, cars, candy, dogs, fish, shirts, and tables. Meanwhile, tangrams were drawn from a public royalty‐free set available on https://www.1001freedownloads.com/. Each participant provided labels for a total of eight familiar targets and eight tangram targets. The rest of the images were used as foils when constructing contexts.
We used a factorial design, manipulating both the novelty of the stimuli (familiar images vs. tangram images) and the similarity of the foil to the target in the context (close contexts vs. far contexts). For far contexts, the target and competitor had low semantic overlap (i.e., far away in semantic space; Figure 6, top row). For close contexts, the target and competitor had high semantic overlap (i.e., close in semantic space; Figure 6, bottom row). For familiar images, we operationalized semantic overlap in terms of the basic‐level category: Close trials involved two objects in the same basic‐level semantic category (e.g., Pug and German Shepherd), and far trials used different basic‐level categories (e.g., Pug and Rabbit). For tangram images, close and far competitors were determined through an independent norming study where adult participants produced labels for a number of tangrams in isolation, and we constructed close or far contexts that (respectively) maximized or minimized naming agreement, the proportion of responses that overlapped for a pair of tangrams (Zettersten & Lupyan, 2020).
FIGURE 6.

Objects in Experiment 3 were either familiar images (left) or tangram images (right) and could appear in a far context (top) or in a close context (bottom).
Procedure
Adults
Adults were instructed to describe the object in the blue box by typing one or two words into a text box. In addition to the close and far conditions, adults also provided labels for each object in a third isolation condition (Figure 6). Trials appeared in contiguous blocks of the same type (e.g., eight trials in a row of “familiar” objects in “close” contexts), and the order of the six blocks was fully randomized across participants, for a total of 48 trials. At the start of each block, they were told whether they would see one picture (isolation condition), or two pictures (close and far conditions), and reminded that they should type in a description that would help another participant identify the target with a blue border.
Children
After parents gave consent and children assented to participate, children completed several warm‐up trials to introduce them to the game. Children were then told that they would continue to play a labeling game and that their responses would be shown to another person who did not know which object was in the blue border. To ease children into the task, we always began with a block of familiar objects in either the close or far condition, but otherwise the four blocks were randomized, for a total of 32 trials. They gave responses only for the close and far conditions; there was no isolation condition. Children's productions in all conditions were typed into a text box in real time by an experimenter during their participation. At the end of the experiment, we additionally included a manipulation check block to gauge individual differences in sensitivity to referential ambiguity (e.g. Beal & Belgrad, 1990; see Appendix D for further procedural details).
Preprocessing
We cleaned the text input by applying the following procedure across the combined data set. First, we corrected typos and removed stop words (e.g., determiners such as “a,” “the”). Second, we lemmatized all entries to remove spurious differences between tenses and plurals of the same root form, which may lead to spurious inflation of edit distances. Third, we manually removed phrases or frames that repeated across descriptions (e.g., if a participant said “a person who is…” on every trial, we removed that phrase), which may lead to spurious variability in descriptions across participants. We also manually standardized the word order such that, for example, “person running” would be transformed to “running person”. Fourth, we removed spaces and collapsed multiple words together into a single token (e.g., “German Shepherd” was tokenized to “germanshepherd”). We use the fully cleaned data for our analyses, but the rawer lemmatized data can be found on our OSF page. While adult participants typed in their own responses, children's responses were entered by an experimenter. When children's descriptions were overly long, experimenters prompted children to simplify them by asking, “Can you say that in one or two words?” Children were only prompted once, regardless of whether they simplified their expression.
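The automatic parts of this cleaning procedure can be illustrated with a minimal sketch. It is not the authors' pipeline: the stop-word list, the toy lemma table, and the function name `clean_label` are all hypothetical stand-ins, and the manual steps (typo correction and word-order standardization) are left out.

```python
import re

# Hypothetical stop-word list and lemma table, for illustration only;
# the original analysis may have used a different tool and vocabulary.
STOP_WORDS = {"a", "an", "the"}
LEMMAS = {"dogs": "dog", "shepherds": "shepherd", "running": "run", "ran": "run"}


def clean_label(raw, repeated_frame=""):
    """Normalize one typed description into a single collapsed token."""
    text = raw.lower().strip()
    # Drop a frame that the participant repeated on every trial.
    if repeated_frame:
        text = text.replace(repeated_frame.lower(), " ")
    # Keep alphabetic words only, then remove stop words.
    words = [w for w in re.findall(r"[a-z]+", text) if w not in STOP_WORDS]
    # Lemmatize with the toy lookup table (unknown words pass through).
    words = [LEMMAS.get(w, w) for w in words]
    # Collapse the remaining words into one token.
    return "".join(words)


print(clean_label("The German Shepherds"))
print(clean_label("a person who is running", repeated_frame="a person who is"))
```

Under these assumptions, the first call yields the collapsed token "germanshepherd", matching the example in the text. A real analysis would replace the lookup table with a proper lemmatizer.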
Results
We focus on two primary hypotheses. First, could children be failing to take into account the referential context when deciding what to say, leading to more ambiguous or underinformative referring expressions than adults? Second, could children have more uncertainty over possible acceptable labels for novel objects, leading to higher variation than adults (Cycowicz et al., 1997; Lachman et al., 1974)? These hypotheses are not mutually exclusive. Indeed, the corresponding mechanisms (pragmatic reasoning and lexical priors) are both implicated in recent production models (e.g., Hawkins et al., 2022; Murthy et al., 2022). The first analysis was preregistered as confirmatory while the second was preregistered as exploratory.
Children are differentially sensitive to referential context
To test our first hypothesis, we examined the extent to which the same participant produced different utterances across the far versus close contexts. We began by considering a simple ‘exact match’ criterion, coding an item as 1 if the participant used the same label for that item in both contexts and 0 if they used different labels. This criterion is conservative in the sense that it will miss a number of near‐ or partial‐matches, giving a lower‐bound for overlap. We modeled the binary variable of context overlap using a mixed‐effects logistic regression. We included fixed effects for age cohort (child vs. adult) and target type (familiar vs. novel), as well as their interaction. To control for the fact that longer utterance strings are less likely to exactly match by chance, we also included a term for the average length of the close and far labels for that speaker and item. The most complex random effects structure that converged only included random intercepts at the participant level.
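As a rough illustration of this coding scheme, the sketch below builds one row per participant and item, with the binary exact-match outcome and the length covariate. The response data are invented, and measuring length in characters is an assumption, since the text does not specify the unit.

```python
def exact_match(close_label, far_label):
    """Return 1 if the cleaned labels are identical across contexts, else 0."""
    return int(close_label == far_label)


def mean_label_length(close_label, far_label):
    """Average length of the two labels, in characters (an assumed unit)."""
    return (len(close_label) + len(far_label)) / 2


# Hypothetical cleaned labels: (participant, item) -> (close label, far label)
responses = {
    ("p1", "pug"): ("pug", "dog"),
    ("p1", "table"): ("table", "table"),
    ("p2", "pug"): ("smalldog", "dog"),
}

# One row per participant and item, ready to pass to a regression model.
rows = [
    {
        "participant": participant,
        "item": item,
        "match": exact_match(close, far),
        "mean_length": mean_label_length(close, far),
    }
    for (participant, item), (close, far) in responses.items()
]

for row in rows:
    print(row)
```

A table of this shape could then be passed to any mixed-effects logistic regression routine, with `match` as the outcome.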
We found a significant interaction between age group and target type (Figure 7a). Although adults displayed similar rates of context‐sensitivity for familiar and novel objects (proportion of exact matches, familiar: .35; novel: .35), children were nearly twice as likely to provide the exact same label across contexts for a familiar object as for a novel one (novel: .34; familiar: .60). Similar results were obtained using “softer” measures such as edit distance, which is the number of edits required to turn one string into the other (see Figure S5). In other words, while adults appropriately modulated their utterances across contexts, children often produced the same description for familiar targets across contexts (see Appendix E for evidence that these context‐insensitive descriptions were in fact underinformative in close contexts).
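The “softer” edit‐distance measure can be sketched as follows. This assumes the standard Levenshtein distance (insertions, deletions, and substitutions each cost one edit); the text does not state which variant was used, and the function name is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b.
    """
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(edit_distance("dog", "dog"))
print(edit_distance("dog", "smalldog"))
```

Unlike the exact‐match criterion, this measure gives partial credit: “dog” and “smalldog” differ by five edits, whereas two unrelated labels of the same lengths would typically differ by more.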
FIGURE 7.

(a) Context‐sensitivity for children and adults. Children were nearly twice as likely to give the same description across contexts for familiar objects as for tangrams, while adults modulated their descriptions equally for both target types. (b) Nameability for children and adults. Children produced more variable labels for tangrams than adults. Error bars are 95% CIs.
Familiar objects elicit less variable names
We tested the second hypothesis by examining the distribution of labels at the population level. First, we hypothesized that both adults and children will produce a fairly narrow, high‐agreement range of labels for familiar objects, yielding highly concentrated distributions. Second, we hypothesized that children will use a broader range of different labels for tangrams than adults, yielding a less concentrated distribution with less agreement among different children. We considered several measures of concentration, but we focused primarily on the proportion of unique labels to total labels, which is simple and interpretable (see Footnote 4). For example, suppose that from a pool of 40 participants, 20 said ‘bird’, 10 said ‘dancer’, and the remaining 10 chose other labels that were all distinct from one another. Then we would have 12 unique labels overall, and a proportion of 12/40 = 0.30. Meanwhile, if all 40 participants chose different labels, we would have 40/40 = 1; and at the other extreme, if all 40 participants chose the same label, we would have 1/40 = 0.025.
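The worked example above can be checked with a few lines of code. This is only a sketch of the proportion as described in the text; the function name and the placeholder labels (`other0`, `other1`, ...) are illustrative.

```python
def proportion_unique(labels):
    """Number of distinct labels divided by the total number of labels."""
    return len(set(labels)) / len(labels)


# 20 participants say "bird", 10 say "dancer", 10 give mutually distinct labels.
mixed = ["bird"] * 20 + ["dancer"] * 10 + [f"other{i}" for i in range(10)]
all_different = [f"label{i}" for i in range(40)]
all_same = ["bird"] * 40

print(proportion_unique(mixed))          # 12 unique labels out of 40
print(proportion_unique(all_different))  # 40 unique labels out of 40
print(proportion_unique(all_same))       # 1 unique label out of 40
```

Note that higher values of this proportion correspond to lower agreement across participants.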
Because population agreement metrics necessarily aggregate over individual participants for each target, we constructed our mixed‐effects regression model at the item level. Given our findings of differential context‐sensitivity in the previous section, we limited this analysis to the “far” condition, where the distributions of adult and child labels are more comparable (aggregating across close and far yields qualitatively similar results). We predicted agreement as a function of age group (adult vs. child) and target type (familiar vs. tangram), including random intercepts at the target level. First, we observed a main effect of target type, with less agreement on tangram labels than familiar object labels. This is in line with our hypothesis that agreement would be higher for familiar objects with commonly known canonical labels. Importantly, however, we also found a significant interaction with age group. The labels produced by different children in our sample were about as variable as adults' for familiar targets. However, children as a group produced a more variable set of labels for novel tangrams than adults did (Figure 7b).
Summary and discussion
In Experiment 3, we asked whether difficulty in production may explain why young children may struggle to establish referential pacts with peers while succeeding with parents. Comparing adult and child responses in our production task revealed that children were not as sensitive to referential context, providing descriptions that did not always distinguish the intended referent from its foil (e.g., saying “table” when both a dining table and a picnic table were present). Importantly, this effect was found on trials containing familiar items that children are likely to have accessible labels for, so these results were unlikely to be explained by simple vocabulary constraints. In other words, children's ability to produce context‐sensitive utterances may be constrained by limitations in their pragmatic reasoning.
At the same time, we found that children produce a more variable set of labels for tangram shapes than adults. One possible explanation is that children have a less stable prior over possible names for the tangram shapes due to lower “codability” or “nameability”: there is no single existing convention (Hupet et al., 1991; Zettersten & Lupyan, 2020). That is, variability could be driven less by pragmatic reasoning and more by sampling under uncertainty: they may be producing utterances from a more spread‐out or idiosyncratic lexical prior (Bonawitz et al., 2014; Denison et al., 2013). Another possibility is that children have weaker priors over how to categorize or conceptually interpret these abstract shapes, upstream of labeling, leading to more creative and less “streamlined” interpretations than adults. Further work is required to distinguish whether this variability exists in conceptual representations of abstract depictions, at the level of lexical‐semantic conventions, or for these tangrams in particular.
Why might children struggle to modulate their descriptions for familiar objects? One possibility is that the basic‐level category label of a familiar object (e.g., “table”) is too salient, and children have difficulty suppressing this label in favor of a more informative description. Further studies probing the alternatives that children consider, or testing the salience of various object labels, could provide a stronger test of this possibility. Another possibility is that younger children are simply less sensitive to referential ambiguity (e.g., Beal & Flavell, 1982; Robinson & Robinson, 1977). In Appendix F, we present preliminary evidence from a manipulation check linking the ability to recognize referential ambiguity and the ability to generate appropriately informative expressions based on context. Children who were able to recognize that an ambiguous expression would be unhelpful were more likely to show context sensitivity in their own ability to generate descriptions for familiar objects. However, it remains unclear how either of these individual differences across children is reflected in different parental strategies. Parents have extensive, well‐calibrated knowledge about their children's developing communication abilities, and further research should compare parents against other adults with less specific knowledge about a particular child.
GENERAL DISCUSSION
Successful communication crucially depends on the ability to establish common ground and coordinate on conversational pacts with partners. A long literature of developmental work dating back to Krauss and Glucksberg (1977) has revealed surprising failures to interactively form adult‐like pacts even among children who are sensitive to partner‐specific perspective and referential precedent in interactions with adult experimenters (Akhtar et al., 1996; Graham et al., 2014; Khu et al., 2020; Matthews et al., 2010; Nadig & Sedivy, 2002; Nilsen & Graham, 2009; Ostashchenko et al., 2019; Yoon et al., 2021). Here, we turned to parent–child interactions for insight, asking whether parents may be adopting some of the strategies used by experimenters to elicit pacts in recent studies (e.g., Köymen et al., 2014). We found that the obstacles to pact formation are, to some extent, resolved through spontaneous parental scaffolding: parents of younger children engaged in more conversational turns, asked more clarification questions, produced longer referential expressions, and adopted less complex labels. We then conducted two additional studies to disentangle the contribution of comprehension and production processes, determining that children's production is likely to present the root problem that parental scaffolding is adapted to overcome.
Our studies contribute to the debate over why young children may struggle to form referential pacts with peers in some cases and not in others, especially given extensive evidence that they are capable of tracking pacts introduced by adult experimenters (Graham et al., 2014; Matthews et al., 2010). One classic family of explanations implicates theory of mind development: children may fail to take their partner's perspective into account and stubbornly produce idiosyncratic descriptions that are not meaningful to anyone else (Krauss & Glucksberg, 1977). Another family of explanations implicates sensitivity to informativeness given the demands of production: Children may be focused on generating an appropriate standalone label for the target while neglecting to consider whether it might also apply to the foil (Speer, 1984). Our study emphasizes a third possibility. Young children may be unable to produce a good‐enough label for an unfamiliar object on their own, due to weak lexical priors for what to call unfamiliar objects, and unable to initiate clarification. Once presented with a good‐enough label, and prompted to confirm that they understand it, however, they are flexible enough to readily adopt and use it themselves on future trials.
Our paradigm differs in important ways from those used in prior work and it is possible that pairs of children would, hypothetically, succeed at forming conversational pacts without experimenter intervention in our variant. For example, our task used contexts containing only two tangram images per trial, whereas classic studies used contexts of 6 or 12. This choice could have reduced the cognitive load in our variant. On the other hand, participants were instructed to communicate verbally without the ability to use non‐verbal modalities like gesture that they could rely upon in other paradigms (Bohn et al., 2019; Holler, 2022; Lister et al., 2021), making the task more challenging. There are other reasons to believe that pairs of children would still struggle in our variant, including the production failures we independently observed in Experiment 3 and the observation that younger children relied more heavily on parent‐generated descriptions in Experiment 1. Further experiments are necessary to replicate classic child–child phenomena under our paradigm, or replicate our parent–child phenomena under classic paradigms.
Taken together, these studies contribute to the growing understanding of parental scaffolding in language development more broadly. As children encounter novel concepts and situations they do not yet have words to describe, they must be particularly attuned to the distribution of labels that appear to be acceptable to competent speakers. Beyond the benefits of child‐directed speech (The ManyBabies Consortium, 2020) and careful tailoring of labels to the child's vocabulary (Leung et al., 2021), adults may spontaneously adopt some of the higher‐order discourse strategies examined in training studies (Matthews et al., 2007). These more interactive forms of scaffolding go beyond a single utterance to draw attention to sources of ambiguity and expose the need to actively check for mutual understanding. The everyday process of parents and children trying to understand one another may in this way scaffold not only the formation of conversational pacts but also communicative development more broadly.
Supporting information
Data S1.
Leung, A., Yurovsky, D., & Hawkins, R. D. (2025). Parents spontaneously scaffold the formation of conversational pacts with their children. Child Development, 96, 546–561. 10.1111/cdev.14186
Footnotes
1. To match different forms of the same word (e.g., “jumping” vs. “jumped”) we first lemmatized each word. We also filtered out stop words (“the”, “with”), as well as common words that were not part of the pacts (“person”, “box”), and excluded words that appeared for the first time on the final repetition of each target.
2. Because we do not have child comprehension data for every audio clip, we could not estimate random slopes at this finer level of granularity. All behavior within a game is non‐independent, so we believe this coarser level of grouping at source games is natural. However, see Table S6 for a Bayesian approach to estimating these effects on the subset of recordings where both comprehender groups were available.
3. Because null hypothesis significance testing is unable to provide positive evidence for the null hypothesis, we also ran a Bayesian regression using the brms package. We used the Savage–Dickey method and found a Bayes Factor of 9.5, indicating moderate support in favor of the null hypothesis. We used a weakly informative Student‐t prior on the coefficient (with degrees of freedom and scale chosen following Gelman et al., 2008), but this result was robust to other choices.
4. In principle, an appropriate metric of spread across labels would be the information theoretic quantity of entropy: H = −Σ P(w) log P(w), where the sum runs over labels w and P(w) is the proportion of participants producing label w. However, our empirical distributions are highly sparse, with many labels appearing only once. Estimates of entropy are therefore somewhat sensitive to the choice of statistical estimator (i.e., how much to regularize with pseudo‐counts) while being less numerically interpretable. Alternative metrics include “modal agreement” (Brodeur et al., 2010, 2014), the proportion of participants that produce the most common label, and Simpson's diversity index (Majid et al., 2018; Majid & Burenhult, 2014; Simpson, 1949), which can be interpreted as the probability that two independently sampled labels will match.
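The alternative concentration metrics named in the last footnote can be sketched on the same kind of label data. This is an illustration under stated assumptions rather than the authors' code: entropy is computed with the plug‐in estimator (no pseudo‐counts) in bits, and Simpson's index is computed as the sum of squared proportions, i.e., sampling with replacement. All function names are illustrative.

```python
import math
from collections import Counter


def label_probabilities(labels):
    """Empirical proportion of participants producing each distinct label."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Plug-in Shannon entropy in bits (no regularization)."""
    return -sum(p * math.log2(p) for p in label_probabilities(labels))


def modal_agreement(labels):
    """Proportion of participants who produced the most common label."""
    return max(label_probabilities(labels))


def simpson_index(labels):
    """Probability that two labels sampled with replacement match."""
    return sum(p * p for p in label_probabilities(labels))


# 20 "bird", 10 "dancer", and 10 mutually distinct placeholder labels.
labels = ["bird"] * 20 + ["dancer"] * 10 + [f"other{i}" for i in range(10)]
print(entropy(labels), modal_agreement(labels), simpson_index(labels))
```

With sparse data like these, the ten singleton labels contribute more than half of the plug‐in entropy, which illustrates why the estimate is sensitive to how rare labels are regularized.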
DATA AVAILABILITY STATEMENT
All data and code for these analyses are available at https://osf.io/vkug8/. Studies 2–3 were preregistered at https://osf.io/vkug8/registrations; the preregistration for Study 1 is unavailable due to an archiving error on OSF. Videos of experiment sessions are available on Databrary.
REFERENCES
- Akhtar, N., Carpenter, M., & Tomasello, M. (1996). The role of discourse novelty in early word learning. Child Development, 67(2), 635–645.
- Anderson, A. H., Clark, A., & Mullin, J. (1994). Interactive communication between children: Learning how to make language work in dialogue. Journal of Child Language, 21(2), 439–463.
- Asher, S. R., & Oden, S. L. (1976). Children's failure to communicate: An assessment of comparison and egocentrism explanations. Developmental Psychology, 12(2), 132–139.
- Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
- Beal, C. R. (1987). Repairing the message: Children's monitoring and revision skills. Child Development, 58, 401–408.
- Beal, C. R. (1988). Children's knowledge about representations of intended meaning. In J. W. Astington, P. L. Harris, & D. R. Olson (Eds.), Developing theories of mind (pp. 315–325). Cambridge University Press.
- Beal, C. R., & Flavell, J. H. (1982). Effect of increasing the salience of message ambiguities on kindergartners' evaluations of communicative success and message adequacy. Developmental Psychology, 18(1), 43–48.
- Beal, C. R., & Flavell, J. H. (1983). Young speakers' evaluations of their listener's comprehension in a referential communication task. Child Development, 54, 148–153.
- Bleijlevens, N., Contier, F., & Behne, T. (2023). Pragmatics aid referent disambiguation and word learning in young children and adults. Developmental Science, 26(4), e13363.
- Bohn, M., Kachel, G., & Tomasello, M. (2019). Young children spontaneously recreate core properties of language in a new modality. Proceedings of the National Academy of Sciences of the United States of America, 116(51), 26072–26077.
- Bohn, M., Le, K. N., Peloquin, B., Köymen, B., & Frank, M. C. (2021). Children's interpretation of ambiguous pronouns based on prior discourse. Developmental Science, 24(3), e13049.
- Bonawitz, E., Denison, S., Griffiths, T. L., & Gopnik, A. (2014). Probabilistic models, learning algorithms, and response variability: Sampling in cognitive development. Trends in Cognitive Sciences, 18(10), 497–500.
- Brandt, S., Lieven, E., & Tomasello, M. (2016). German children's use of word order and case marking to interpret simple and complex sentences: Testing differences between constructions and lexical items. Language Learning and Development, 12(2), 156–182.
- Branigan, H. P., Bell, J., & McLean, J. F. (2016). Do you know what I know? The impact of participant role in children's referential communication. Frontiers in Psychology, 7, 1–15.
- Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(6), 1482–1493.
- Brodeur, M. B., Dionne‐Dostie, E., Montreuil, T., & Lepage, M. (2010). The bank of standardized stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PLoS One, 5(5), e10773.
- Brodeur, M. B., Guérard, K., & Bouras, M. (2014). Bank of standardized stimuli (BOSS) phase II: 930 new normative photos. PLoS One, 9(9), e106953.
- Brown, R. (1958). How shall a thing be called? Psychological Review, 65(1), 14–21.
- Brown‐Schmidt, S. (2009). Partner‐specific interpretation of maintained referential precedents during interactive dialog. Journal of Memory and Language, 61(2), 171–190.
- Brown‐Schmidt, S., Yoon, S. O., & Ryskin, R. A. (2015). People as contexts in conversation. In B. Ross (Ed.), Psychology of learning and motivation (Vol. 62, pp. 59–99). Elsevier.
- Bruner, J. (1985). Child's talk: Learning to use language. Child Language Teaching and Therapy, 1(1), 111–114.
- Clark, E. V. (2009). First language acquisition. Cambridge University Press.
- Clark, E. V. (2018). Conversation and language acquisition: A pragmatic approach. Language Learning and Development, 14(3), 170–185.
- Clark, E. V., & Bernicot, J. (2008). Repetition as ratification: How parents and children place information in common ground. Journal of Child Language, 35(2), 349–371.
- Clark, H. H. (1996). Using language. Cambridge University Press.
- Clark, H. H., & Wilkes‐Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1–39.
- Cycowicz, Y. M., Friedman, D., Rothstein, M., & Snodgrass, J. G. (1997). Picture naming by young children: Norms for name agreement, familiarity, and visual complexity. Journal of Experimental Child Psychology, 65(2), 171–237.
- Datavyu Team. (2014). Datavyu: A video coding tool. Databrary project. New York University. http://datavyu.org
- Davies, C., Lingwood, J., Ivanova, B., & Arunachalam, S. (2021). Three‐year‐olds' comprehension of contrastive and descriptive adjectives: Evidence for contrastive inference. Cognition, 212, 104707.
- Degen, J., Hawkins, R. D., Graf, C., Kreiss, E., & Goodman, N. D. (2020). When redundancy is useful: A Bayesian approach to “overinformative” referring expressions. Psychological Review, 127(4), 591–621.
- Denison, S., Bonawitz, E., Gopnik, A., & Griffiths, T. L. (2013). Rational variability in children's causal inferences: The sampling hypothesis. Cognition, 126(2), 285–300.
- Deutsch, W., & Pechmann, T. (1982). Social interaction and the development of definite descriptions. Cognition, 11(2), 159–184.
- Fishbein, H. D., & Osborne, M. (1971). The effects of feedback variations on referential communication of children. Merrill‐Palmer Quarterly of Behavior and Development, 17(3), 243–250.
- Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.‐S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383.
- Glucksberg, S., & Krauss, R. M. (1967). What do people say after they have learned how to talk? Studies of the development of referential communication. Merrill‐Palmer Quarterly of Behavior and Development, 13(4), 309–316.
- Glucksberg, S., Krauss, R. M., & Weisberg, R. (1966). Referential communication in nursery school children: Method and some preliminary findings. Journal of Experimental Child Psychology, 3(4), 333–342.
- Goldberg, A. E., & Ferreira, F. (2022). Good‐enough language production. Trends in Cognitive Sciences, 26, 300–311.
- Graham, S. A., Sedivy, J., & Khu, M. (2014). That's not what you said earlier: Preschoolers expect partners to be referentially consistent. Journal of Child Language, 41(1), 32–48.
- Grigoroglou, M., & Papafragou, A. (2019). Interactive contexts increase informativeness in children's referential communication. Developmental Psychology, 55(5), 951–966.
- Haber, J., Baumgärtner, T., Takmaz, E., Gelderloos, L., Bruni, E., & Fernández, R. (2019). The PhotoBook dataset: Building common ground through visually‐grounded dialogue. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1895–1910. https://doi.org/10.18653/v1/P19-1184
- Hawkins, R. D., Frank, M. C., & Goodman, N. D. (2020). Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 44, e12845.
- Hawkins, R. D., Franke, M., Frank, M. C., Goldberg, A. E., Smith, K., Griffiths, T. L., & Goodman, N. D. (2022). From partners to populations: A hierarchical Bayesian account of coordination and convention. Psychological Review, 130(4), 977–1016.
- Hawkins, R. D., Gweon, H., & Goodman, N. D. (2021). The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective. Cognitive Science, 45(3), e12926.
- Hockett, C. F., & Hockett, C. D. (1960). The origin of speech. Scientific American, 203(3), 88–97.
- Holler, J. (2022). Visual bodily signals as core devices for coordinating minds in interaction. Philosophical Transactions of the Royal Society B, 377(1859), 20210094.
- Horton, W. S., & Gerrig, R. J. (2002). Speakers' experiences and audience design: Knowing when and knowing how to adjust utterances to addressees. Journal of Memory and Language, 47(4), 589–606.
- Hupet, M., Seron, X., & Chantraine, Y. (1991). The effects of the codability and discriminability of the referents on the collaborative referring procedure. British Journal of Psychology, 82(4), 449–462.
- Khu, M., Chambers, C. G., & Graham, S. A. (2020). Preschoolers flexibly shift between speakers' perspectives during real‐time language comprehension. Child Development, 91(3), e619–e634.
- Koranda, M. J., Zettersten, M., & MacDonald, M. C. (2022). Good‐enough production: Selecting easier words instead of more accurate ones. Psychological Science, 33(9), 1440–1451.
- Köymen, B., Schmerse, D., Lieven, E., & Tomasello, M. (2014). Young children create partner‐specific referential pacts with peers. Developmental Psychology, 50(10), 2334–2342.
- Krauss, R. M., & Glucksberg, S. (1969). The development of communication: Competence as a function of age. Child Development, 40, 255–266.
- Krauss, R. M., & Glucksberg, S. (1977). Social and nonsocial speech. Scientific American, 236(2), 100–105.
- Lachman, R., Shaffer, J. P., & Hennrikus, D. (1974). Language and cognition: Effects of stimulus codability, name‐word frequency, and age of acquisition on lexical reaction time. Journal of Verbal Learning and Verbal Behavior, 13(6), 613–625.
- Leung, A., Tunkel, A., & Yurovsky, D. (2021). Parents fine‐tune their speech to children's vocabulary knowledge. Psychological Science, 32(7), 975–984.
- Lister, C. J., Burtenshaw, T., Walker, B., Ohan, J. L., & Fay, N. (2021). A cross‐sectional test of sign creation by children in the gesture and vocal modalities. Child Development, 92(6), 2395–2412.
- Majid, A., & Burenhult, N. (2014). Odors are expressible in language, as long as you speak the right language. Cognition, 130(2), 266–270.
- Majid, A., Roberts, S. G., Cilissen, L., Emmorey, K., Nicodemus, B., O'Grady, L., Woll, B., LeLan, B., De Sousa, H., Cansler, B. L., Shayan, S., de Vos, C., Senft, G., Enfield, N. J., Razak, R. A., Fedden, S., Tufvesson, S., Dingemanse, M., Ozturk, O., … Levinson, S. C. (2018). Differential coding of perception in the world's languages. Proceedings of the National Academy of Sciences of the United States of America, 115(45), 11369–11376.
- Matthews, D., Lieven, E., & Tomasello, M. (2007). How toddlers and preschoolers learn to uniquely identify referents for others: A training study. Child Development, 78(6), 1744–1759.
- Matthews, D., Lieven, E., & Tomasello, M. (2010). What's in a manner of speaking? Children's sensitivity to partner‐specific referential precedents. Developmental Psychology, 46(4), 749–760.
- Morisseau, T., Davies, C., & Matthews, D. (2013). How do 3‐ and 5‐year‐olds respond to under‐ and over‐informative utterances? Journal of Pragmatics, 59, 26–39.
- Murthy, S. K., Griffiths, T. L., & Hawkins, R. D. (2022). Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task. Cognition, 225, 105152.
- Nadig, A. S., & Sedivy, J. C. (2002). Evidence of perspective‐taking constraints in children's on‐line reference resolution. Psychological Science, 13(4), 329–336.
- Nikolaus, M., Prévot, L., & Fourtassi, A. (2022). Communicative feedback as a mechanism supporting the production of intelligible speech in early childhood. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (pp. 771–778). eScholarship University of California.
- Nilsen, E. S., & Graham, S. A. (2009). The relations between children's communicative perspective‐taking and executive functioning. Cognitive Psychology, 58(2), 220–249.
- Ostashchenko, E., Deliens, G., Geelhand, P., Bertels, J., & Kissine, M. (2019). Referential processing in 3‐ and 5‐year‐old children is egocentrically anchored. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(8), 1387.
- Rabagliati, H., & Robertson, A. (2017). How do children learn to avoid referential ambiguity? Insights from eye‐tracking. Journal of Memory and Language, 94, 15–27.
- Robinson, E. J., & Robinson, W. P. (1977). Children's explanations of communication failure and the inadequacy of the misunderstood message. Developmental Psychology, 13(2), 156–161.
- Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis (Vol. 1). Cambridge University Press. https://doi.org/10.1017/CBO9780511791208
- Simpson, E. H. (1949). Measurement of diversity. Nature, 163(4148), 688.
- Speer, J. R. (1984). Two practical strategies young children use to interpret vague instructions. Child Development, 55, 1811–1819.
- Stephens, G., & Matthews, D. (2014). Referential pacts in child language development. In I. Arnon, M. Casillas, C. Kurumada, & B. Estigarribia (Eds.), Language in interaction (pp. 175–190). John Benjamins Publishing Company.
- The ManyBabies Consortium. (2020). Quantifying sources of variability in infancy research using the infant‐directed‐speech preference. Advances in Methods and Practices in Psychological Science, 3(1), 24–52.
- Vasil, J. (2023). A new look at young children's referential informativeness. Perspectives on Psychological Science, 18(3), 624–648.
- Whitehurst, G. J., & Sonnenschein, S. (1978). The development of communication: Attribute variation leads to contrast failure. Journal of Experimental Child Psychology, 25(3), 490–504.
- Yoon, S. O., Jin, K., Brown‐Schmidt, S., & Fisher, C. L. (2021). What's new to you? Preschoolers' partner‐specific online processing of disfluency. Frontiers in Psychology, 11, 612601.
- Zettersten, M., & Lupyan, G. (2020). Finding categories through words: More nameable features improve category learning. Cognition, 196, 104135.
Supplementary Materials
Data S1.
