PLOS One. 2020 Mar 11;15(3):e0219874. doi: 10.1371/journal.pone.0219874

Decision-making flexibility in New Caledonian crows, young children and adult humans in a multi-dimensional tool-use task

Rachael Miller 1,*,#, Romana Gruber 2,#, Anna Frohnwieser 1,*, Martina Schiestl 2,3, Sarah A Jelbert 4, Russell D Gray 2,3, Markus Boeckle 1, Alex H Taylor 2, Nicola S Clayton 1
Editor: Juliane Kaminski 5
PMCID: PMC7065838  PMID: 32160191

Abstract

The ability to make profitable decisions in natural foraging contexts may be influenced by an additional requirement of tool-use, due to increased levels of relational complexity and additional work-effort imposed by tool-use, compared with simply choosing between an immediate and delayed food item. We examined the flexibility for making the most profitable decisions in a multi-dimensional tool-use task, involving different apparatuses, tools and rewards of varying quality, in 3–5-year-old children, adult humans and tool-making New Caledonian crows (Corvus moneduloides). We also compared our results to previous studies on habitually tool-making orangutans (Pongo abelii) and non-tool-making Goffin’s cockatoos (Cacatua goffiniana). Adult humans, cockatoos and crows, unlike children and orangutans, refrained from selecting a tool when it was not necessary, which was the more profitable choice in this situation. Adult humans, orangutans and cockatoos, unlike crows and children, were able to refrain from selecting non-functional tools. By contrast, the birds, but not the primates tested, struggled to attend to multiple variables (where two apparatuses, two tools and two reward qualities were presented simultaneously) without extended experience. These findings indicate: (1) in a similar manner to humans and orangutans, New Caledonian crows and Goffin’s cockatoos can flexibly make profitable decisions in some decision-making tool-use tasks, though the birds may struggle when tasks become more complex; (2) children and orangutans may have a bias to use tools in situations where adults and other tool-making species do not.

Introduction

Effective decision-making ensures that individuals achieve goal-directed behaviour [1–3]. In natural foraging contexts, individuals must take several aspects into account simultaneously when making profitable decisions, such as whether to travel further afield for higher quality foods and how they can access extractable foods, for instance, through the use of tools. Such decisions may therefore be influenced by work-effort sensitivity, the level of perceived risk, attention to the functionality of available tools, and the quality of food available [4, 5].

One aspect that underlies decision-making is self-control—the capacity to suppress immediate drives in favour of delayed rewards [6]. One approach to measuring self-control is the use of delay of gratification tasks, where subjects have to wait longer and/or work harder to obtain a more valuable outcome [7]. In human children, delay of gratification is developmentally influenced, shows high individual variation and correlates with measures of success in later life, such as social and academic competence [8], though see a recent study [9]. Self-control skills emerge in infancy [10, 11] and develop throughout toddlerhood and pre-school age [12, 13], improving significantly between ages 3 and 5 [14, 15] and further beyond 5 years.

Delayed gratification is also likely important for non-human species in a number of contexts, including foraging and social interactions. A prominent example is tool-use, where animals may have to forgo immediate gratification when they choose to use a tool to gain access to high value but out-of-reach food, rather than low value, freely accessible food. One possibility, therefore, which is predicted by the technical intelligence hypothesis [16], is that self-control is enhanced in tool-using species compared to non-tool using species. Specifically, tool-using species may have evolved better self-control abilities so they can make more efficient decisions when foraging with their tools. However, evidence for this hypothesis is mixed.

The ability to forgo an immediate reward for a delayed one has been found in various non-human animals, including primates and birds (see [17] for a recent review of self-control in crows, parrots and non-human primates), though it has primarily been tested in non-tool-using contexts. To date, no clear comparisons can be drawn between the self-control abilities of tool-using and non-tool-using species, partly because different methodologies have been used for different species. For example, in tasks not involving tool-use, where subjects choose between two food qualities, one available immediately and one following a delay, tool-using capuchin monkeys (Sapajus apella) performed comparably to tool-using great apes (Pan paniscus, Pan troglodytes), and outperformed non-tool-using marmosets (Callithrix jacchus) and tamarins (Saguinus oedipus) [18, 19]. However, non-tool-using spider monkeys (Ateles geoffroyi) actually outperform capuchins [20]. While some species are able to use and/or make tools in the lab, for instance, Goffin’s cockatoos [21, 22], there is limited evidence that they do so in the wild [23]. Therefore, we refer to these species as ‘non-tool-making’. A number of non-tool-making species, including carrion crows (Corvus corone), common ravens (Corvus corax) and Goffin’s cockatoos, perform similarly to tool-making chimpanzees at non-tool-using delay tasks [24–27]. Furthermore, non-tool-making Eurasian jays (Garrulus glandarius) and California scrub-jays (Aphelocoma californica) are able to overcome current motivational needs in non-tool-use contexts; i.e. during caching and when food-sharing with a partner [28–30].

Tasks involving delayed gratification in a tool-use context have been tested in separate single-species studies, primarily using a variety of paradigms, in tool-making primate species (e.g. chimpanzees [31], capuchins [32], orangutans [5]) and in non-tool-making bird species (Goffin’s cockatoo [33] and common ravens [34]). In these studies, the choice is typically between getting a lesser reward without a tool versus using a tool to gain a better reward. Experience with tool-use improves performance in a delayed gratification tool-use task in capuchin monkeys, where subjects could either immediately consume rod-shaped food items or carry them to an apparatus to use them to extract a food of high quality [32]. Chimpanzees and orangutans selected a tool over a least preferred food and range of toys when the tool could be used later to suck fruit soup from a bowl [31]. Non-tool-making bird species have also shown delayed gratification ability in a tool-use context. Ravens were able to consistently select the correct tool over distractor items including an immediate reward to open a box and obtain a reward, even when the box was missing for up to 17 hours [34], though see [35, 36]. Goffin’s cockatoos were able to overcome immediate drives in favour of future gains in performance on a delayed gratification tool-use task [33].

However, very few studies have used comparable methodology with both tool-making and non-tool-making species in tool-using tasks involving delayed gratification, with a focus on the ability to flexibly make the most profitable decision. Additionally, benchmarking against child and adult human performance is crucial, given that past tests of physical cognition have found differences between human and non-human species’ performances, which suggest that the task may not measure the ability that it is designed to test, or only measures it in one species but not the other. This can be due to differences in perceptual abilities or assumptions about human performance. One prominent example of this is the trap-tube task [37–41], in which subjects have to push or pull food from a horizontal tube while avoiding a trap. When humans were faced with the inverted trap tube, where the trap was now on top of the horizontal tube and therefore inactive, they failed to avoid the trap [42]. This task had been previously presented as a key test of causal understanding in humans and other species. Thus, researchers should not assume that failure in a task indicates poorer performance of one species over another, without having tested both species using the same (or at least closely comparable) methodology.

Here, we tested children aged 3–5 years, adult humans and tool-making New Caledonian crows [43, 44] on a multi-dimensional tool-use task, requiring the use of two different types of tools, apparatuses and rewards varying in quality as determined by a preference test. We focused on the ability to make the most profitable decisions across five conditions where reward quality, tool functionality and work effort were manipulated. We adapted and extended an experimental paradigm previously tested on non-tool-making Goffin’s cockatoos [33] and tool-making orangutans [5]. Subjects were required to make binary choices between two tools (stick or stone) or a tool and a reward (most or least preferred) to use in one of two apparatuses. We refer to the previous findings in cockatoos and orangutans, in addition to the present tested species, though note some minor differences in methodology between species, as outlined in the methods and discussion. We can therefore make tentative comparisons between species, with a primary focus on exploring each species’ ability to make profitable decisions in multiple contexts that each requires use of a tool. We expected that–similar to cockatoos and orangutans [5, 33]–New Caledonian crows and humans would be able to show flexibility in their ability to make profitable decisions in this multi-dimensional paradigm. Additionally, delayed gratification and tool functionality understanding appear to develop in children at different ages and increase with age [45, 46], but have not previously been tested simultaneously. Thus, we expected children’s ability to solve these tasks to increase with age. Therefore, we were also able to provide novel insight into the developmental trajectory and inter-relation between these abilities in children.

Materials and methods

Ethics statement

The methods of this study were carried out in accordance with the relevant guidelines and regulations. The study and related experimental protocols were approved and conducted under the European Research Council Executive Agency Ethics Team (application: 339993-CAUSCOG-ERR) and University of Cambridge Psychology Research Ethics Committee (pre. 2013.109). Informed written consent was obtained from legal guardians prior to child participation, and from adult subjects. The parents of the children identified in the Supplementary Video gave their informed written consent for this information to be published. The New Caledonian crow research was conducted under approval from the University of Auckland Animal Ethics Committee (reference number 001823) and from the Province Sud with permission to work on Grande Terre, New Caledonia, and to capture and release crows.

While care was taken to make the methodology as similar as possible between the humans and crows, please note that there were some differences in methodology between species as outlined in the methods and discussion sections, which limits the opportunity to directly compare the species with one another [47]. We therefore focus primarily on individual species performance in these tasks.

Subjects

New Caledonian crows

Bird subjects were 6 New Caledonian crows caught from the wild (at location 21.67°S 165.68°E) on Grande Terre, New Caledonia. The birds were held temporarily in captivity on Grande Terre for non-invasive behavioural research purposes from April to August 2017. There were 4 males and 2 females, based on sexual size dimorphism [48], of which 4 were adults and 2 were juveniles (less than 1 year old), based on beak colouration [48]. Due to the small number of birds held at the field site at any time, it was not possible to include a larger sample, and juvenile and adult crows were grouped for the analysis. The birds were housed in small groups, consisting of two to four individuals per group, in a ten-compartment outdoor aviary, with approximately 7x4x3m per compartment, containing a range of natural enrichment materials, like logs, branches, sea shells and pine cones. Subjects were tested individually in temporary visual isolation from the group. The birds were not generally food deprived, and the daily diet consisted of meat, dog food, eggs and fruit, with water available ad libitum. The birds were trained to stand on weighing scales for a small food reward to regularly monitor their weight, and all birds remained at or above capture weight during their stay in captivity. The birds were acclimatized to the aviaries in April and trained for the experiment in May–July. All birds completed the full study in August 2017. At the end of their research participation, birds were released at their capture site(s). A previous study indicated that New Caledonian crows housed temporarily in a similar situation as the present study successfully reintegrated into the wild after release [49].

Children and adult humans

Child subjects were 88 children aged between three and five years old: 29 3-year olds (Mean: 3.69 years; Range: 3.31–3.93 years), 30 4-year olds (M: 4.45 years; R: 4–4.96 years) and 29 5-year olds (M: 5.42 years; R: 5–5.93 years), of which 46 were male and 42 were female. We chose the age range of 3–5 years for children as previous research suggests significant improvements in self-control ability within this range [14, 15]. Children were recruited and tested at five nurseries and primary schools in England, serving predominantly white, middle-class communities, between January and February 2018. Twenty adult subjects were recruited via the Cambridge psychology research sign-up system and tested in April 2018; they were primarily undergraduate and postgraduate students at the University of Cambridge, of whom 3 were male and 17 were female. Adults received £5 for participation in the study. All adults and children tested completed the full study. Humans were tested individually in temporary visual isolation, though for some of the very young children, a member of staff was present in the room, but did not interact with the child.

Materials

The ‘stone-apparatus’ was a box made of Perspex (10x5cm) with a vertical tube on top (8x3cm) and a platform inside that collapsed when a heavy object, i.e., a stone, was dropped (Fig 1). To prevent subjects from inserting the stick into the stone-apparatus to release the platform and get access to the reward, the vertical tube had a 30° slant, which made the release of the platform with a stick impossible. We used two different ‘stick-apparatuses’ for the humans and the crows, though both apparatuses were functionally the same, as they required a stick to contact a reward and move it to the left or right. The minor variation in the apparatus structure was due to the testing equipment available in New Caledonia. The ‘stick-apparatus’ for the humans was similar to that in [50]: a transparent box with a central opening hole (17x17cm), where the reward sits on a small platform on the slanted plate within the box. For the crows, we used a ‘stick-apparatus’ consisting of a horizontally orientated Perspex tube (10x3cm) that rested on two Perspex pillars (5cm high). The crows could operate it with a wooden stick, raking the reward towards themselves so that it dropped into their reach (Fig 1A and 1B). To prevent the subject from inserting the stone into the stick-apparatus to try to obtain the reward, the entrance hole was too small and narrow on both apparatus types for stone insertion. Therefore, only the stone was functional in the stone-apparatus and only the stick was functional in the stick-apparatus.

Fig 1. Stone-apparatus (left) and stick-apparatus (right): (a) shows the stick-apparatus for the crows and (b) that for the humans, each with both tools; the functional tool is indicated by a green hook and the non-functional tool by a red cross.

The crow experiment was run in a similar manner to the Goffin’s cockatoo study [50], using the same apparatuses, tools and protocol for training and testing, in order to enable comparisons of performance in each bird species. However, we made further adaptations to the crow study, by extending the previous study, as detailed below. The human experiments were run as closely as possible to the bird experiments, using the same apparatuses, tools and protocols. Fewer trials were run for the humans than birds due to practical reasons like restrictions on session length and number for the children. The reward types also differed between species and groups. The rewards used for the crows were meat as the most preferred food reward and a piece of apple as the least preferred food reward, following reward preference testing. The crow rewards differed from the cockatoos (nuts), due to the differing diets of these two species. The child rewards were three types of stickers of similar sizes: most preferred (animal stickers), least preferred (white, square stickers) and ‘medium’ stickers (round, yellow, smiley face stickers) as determined by the reward preference test and piloting. The reward for the adults was money (£), with 10p as least preferred, 50p as most preferred and 20p as medium reward, represented by white, square, laminated tokens of the same size as the stickers used for the children.

New Caledonian crow experiment

Procedure

During training and testing, the subjects were presented with a binary choice between two tools or a tool and a reward, on the left and right side of the apparatus, with the side of presentation semi-randomly balanced across trials.
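As an illustration of what semi-random side balancing can look like in practice, the following minimal sketch draws a left/right sequence for one session. The text does not define "semi-randomly", so the two constraints used here (equal numbers of left and right presentations, and no more than `max_run` identical sides in a row) are assumptions, as are the function name and its defaults.

```python
import random

def balanced_sides(n_trials=12, max_run=3, seed=None):
    """Return a list of 'L'/'R' sides for one session.

    Assumed constraints (not stated in the paper): equal counts of
    left and right, and no run of identical sides longer than max_run.
    """
    if n_trials % 2 or max_run < 1:
        raise ValueError("n_trials must be even and max_run at least 1")
    rng = random.Random(seed)
    sides = ["L"] * (n_trials // 2) + ["R"] * (n_trials // 2)
    while True:
        rng.shuffle(sides)
        # Find the longest run of identical consecutive sides.
        longest = run = 1
        for prev, cur in zip(sides, sides[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        if longest <= max_run:
            return list(sides)

print("".join(balanced_sides(seed=1)))
```

Rejection sampling keeps the sketch short; for a 12-trial session a valid sequence is found after very few shuffles.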

Training

Unlike the cockatoos in the previous study, the crows were wild-caught and not comfortable with close human presence, therefore we adapted the methodology slightly for the crows. The choice for training and testing was presented inside two of five drawers resting on a table and could be operated by the experimenter behind a visual barrier (See S2 Video). The crows could sit on an elevated perch diagonally to the rear of the table and a perch in front of the drawers, which allowed them to inspect the contents from a distance and close up, pick up the tools and operate the apparatuses. Depending on the condition, either one apparatus was placed in front of the middle drawer, and the tools were presented in the drawers left and right of it, resting on a piece of foam so that the crow could easily pick it up, or both apparatuses were placed left and right of the middle drawer, and the tools were presented in the middle drawer. The drawers were pulled back if the crow either successfully received the reward or made a wrong connection between tool and apparatus. They were never pulled back when the subject was operating the apparatus to avoid disturbing the subject.

There were two steps to the training phase. In Step 1, the subject had the opportunity to learn tool use, until they could reliably retrieve the reward. The crows received training to drop stones into the stone-apparatus as per previous studies using this apparatus (e.g. [51, 52]). After each bird had dropped the stone into the tube 20 times without any mistakes, they moved on to the next training step. As the crows were natural tool users, they did not require any pre-training for stick use, though they were habituated to the stick-apparatus and had to retrieve the reward from the apparatus 20 times.

In Step 2, we checked that the reward quality preferences were viewed as such by the subject in a reward preference test, with 11 sessions of 12 trials each until the subject selected the most preferred reward over the least preferred one in 80% of binary choices. 9 sessions were run prior to testing, one session during testing prior to running the tool selection quality allocation condition, and a final session after all tests were completed. Subjects were presented with similar sized pieces of meat, bread, dog food and apple. Although subjects showed minor individual preferences between meat, bread and dog food, all subjects consistently selected these items over apple. Hence, meat was selected as the most preferred reward and apple as the least preferred reward for all birds.

Testing

In the tool selection condition, the subject should select the functional tool from the choice of both tools–one functional and one non-functional to the presented apparatus–to obtain the reward inside the apparatus (Fig 2A). In the motivation condition, the subject should avoid work effort by selecting the immediately available most preferred reward. The choice was between the functional tool and most preferred reward, with the apparatus containing the exact same most preferred reward (Fig 2B). In the quality allocation condition, the subject should select the functional tool over the immediately available least preferred reward, but take the immediately available most preferred reward. For each apparatus, the choice was between the functional tool and least preferred reward with the most preferred reward inside the apparatus, or between the tool and most preferred reward with the least preferred reward inside the apparatus (Fig 2C). In the tool functionality condition, subjects should select the tool over the least preferred reward only when the tool was functional. For each apparatus, the choice was between the functional tool and the least preferred reward or the non-functional tool and the least preferred reward, with the most preferred reward inside the apparatus in both cases (Fig 2D). In the tool selection quality allocation condition, subjects should select the functional tool for the appropriate apparatus that contained the most preferred reward. In this condition, all task components were present, with both tools present, and the most preferred reward either in the stone-apparatus and the least preferred reward in the stick-apparatus, or the other way around (Fig 2E).

Fig 2. All conditions.

(a) Tool Selection Condition: both tools present, most preferred food (MPF) inside; (b) Motivation Condition: functional tool present, MPF inside and outside; (c) Quality Allocation Condition: functional tool present, either MPF inside (left), and the least preferred food (LPF) outside, or MPF outside and LPF inside (right); (d) Tool Functionality Condition: functional tool present (left) or non-functional tool present (right), MPF inside and LPF outside; (e) Tool Selection Quality Allocation Condition: both apparatuses present with both tools, MPF in stone-apparatus (left), or stick-apparatus (right), LPF in other apparatus; (f) Apparatus Functionality Condition: both apparatuses and both tools present, MPF in stone- (left) or stick- (right) apparatus, other apparatus empty; (g) Apparatus Choice Condition: only one (functional) tool is present, both apparatuses presented and baited with MPF.

In the apparatus functionality condition, we explored whether crows could choose the correct tool for the correct apparatus, by presenting subjects with both apparatuses and both tools, but only one apparatus was baited with most preferred food, while the other apparatus was empty (Fig 2F). In the apparatus choice condition, subjects should choose the correct apparatus for the available tool, with both apparatuses presented and baited with most preferred food, though only one tool was present (Fig 2G). The apparatus functionality and apparatus choice conditions were new additions to the [50] cockatoo study. These new conditions were included for the crows, as both bird species struggled with the tool selection quality allocation condition, so we aimed to explore whether giving more experience when both apparatuses were present at the same time, and further sessions of the tool selection quality allocation condition, could improve the crows’ performance in this final condition.
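The "correct" choice in each condition follows from two simple rules, which the sketch below makes explicit. It is an illustrative encoding only, not code used in the study: the function names, the numeric reward ranks and the dictionary layout are assumptions introduced here. The tool-to-apparatus mapping follows the Materials section.

```python
# Reward ranks (assumed encoding): higher is better, 0 = empty apparatus.
EMPTY, LPF, MPF = 0, 1, 2

# Only the stone works in the stone-apparatus, only the stick in the stick-apparatus.
FUNCTIONAL_TOOL = {"stone-apparatus": "stone", "stick-apparatus": "stick"}

def tool_vs_reward(tool, apparatus, inside, immediate):
    """Profitable option when one tool and one immediate reward are offered.

    The tool is worth choosing only if it works in the apparatus AND the
    reward inside is strictly better than the immediately available one.
    """
    functional = FUNCTIONAL_TOOL[apparatus] == tool
    return "tool" if functional and inside > immediate else "reward"

def tool_vs_tool(contents):
    """Profitable tool when both tools are offered.

    contents maps each presented apparatus to the rank of the reward inside.
    """
    best_apparatus = max(contents, key=contents.get)
    return FUNCTIONAL_TOOL[best_apparatus]

# Motivation: same reward inside and outside, so take the immediate reward.
print(tool_vs_reward("stick", "stick-apparatus", inside=MPF, immediate=MPF))
# Tool functionality with a non-functional tool: take the least preferred reward.
print(tool_vs_reward("stone", "stick-apparatus", inside=MPF, immediate=LPF))
# Tool selection quality allocation: pick the tool for the apparatus holding MPF.
print(tool_vs_tool({"stone-apparatus": MPF, "stick-apparatus": LPF}))
```

Under this encoding, the motivation, quality allocation and tool functionality conditions are all instances of `tool_vs_reward`, while tool selection, tool selection quality allocation and apparatus functionality are instances of `tool_vs_tool`.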

Children and adult humans experiments

Training

We did not use the drawers for the human study to present the choices, but rather presented the choices on the table in front of the subject, in the same way that the choice was presented to the cockatoos. The human training was the same as the crows, except the humans received fewer trials per learning step than the crows. Specifically, in step 1, the human subject was shown how to use the tool and then could try for 1 trial per apparatus. In step 2, the subject could try to use the non-functional tool for up to 10 seconds, before it was replaced with the functional tool, which they could use to obtain the reward in 1 trial per apparatus. In step 3, three reward preference trials were run (1 trial = most vs. less preferred reward, 1 trial = tissue vs most preferred reward, 1 trial = tissue vs less preferred reward) with an additional trial run at the end of the test session (most vs. less preferred reward) to confirm that this key preference still held. In step 4, eight trials were run with the apparatus containing a medium preferred reward and the choice between the functional tool and a piece of tissue, or the apparatus containing the piece of tissue and the choice between tool and medium preferred reward.

The human experiment also included a verbal command during training–“you can have the immediately available item now or the tool to try to use later”, randomising which item was mentioned first between trials. During all trials, if the subject chose the tool over the immediately available item in any trial, they had to wait before they were allowed to use it as the experimenter pulled the apparatus back out of the subject’s reach for 5 seconds and then pushed it back into reach and said ‘go’, whether their choice was correct or incorrect. We included this command in the human experiment during piloting, after we discovered that the children preferred to select the tool in the motivation condition, where the choice was between the functional tool and the immediately available most preferred reward, with the exact same most preferred reward inside the apparatus. We aimed to explore whether this command may incur a small cost of selecting the tool for the humans.

Testing

The human test procedure was the same as the crow one, other than reducing the trials per condition for the humans (see below).

Crows and humans: Test trials

For the crows, in tool selection, apparatus functionality and apparatus choice conditions, the crows received a minimum of 2 sessions of 12 trials each until 18 of 24 trials were correct in 2 consecutive sessions. In motivation and tool selection quality allocation conditions, the crows received 2 sessions of 12 trials each per condition. In quality allocation and tool functionality conditions, the crows received 4 sessions of 12 trials each per condition. The incorrect choice in each condition resulted in no reward (tool selection, tool functionality, apparatus functionality, apparatus choice conditions), least preferred reward (tool selection quality allocation, quality allocation, tool functionality conditions) or the most preferred reward (motivation condition). Within every condition, the trials were randomized across session and individuals. For the humans, in the tool selection, motivation and tool selection quality allocation conditions, humans received two trials (one per apparatus) per condition. In the quality allocation and tool functionality conditions, humans received four trials per condition (two per apparatus). We did not run the additional two conditions on apparatus functionality and apparatus choice with humans, as, unlike the birds, piloting indicated that humans did not struggle with the tool selection quality allocation condition.
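The crow criterion for the tool selection, apparatus functionality and apparatus choice conditions can be written as a short check. This sketch assumes the criterion is pooled, i.e. that at least 18 correct choices are required across any two consecutive 12-trial sessions; the text could also be read differently, and the function name is hypothetical.

```python
def session_reaching_criterion(session_scores, needed=18):
    """Return the 1-based number of the session at which criterion is first met.

    session_scores: number of correct trials per session, in order.
    Assumed reading of the criterion: the two most recent consecutive
    sessions together contain at least `needed` correct trials.
    Returns None if criterion is never met.
    """
    for i in range(1, len(session_scores)):
        if session_scores[i - 1] + session_scores[i] >= needed:
            return i + 1
    return None

print(session_reaching_criterion([10, 9]))     # met after the minimum of 2 sessions
print(session_reaching_criterion([8, 9, 10]))  # met at session 3
```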

Crows and humans: Test order

The crows and humans were each divided into 2 subgroups to control for learning effects, each subgroup receiving a different order of conditions. The first group received the order tool selection–motivation–tool functionality–quality allocation–tool selection quality allocation; the second group received the order tool selection–quality allocation–motivation–tool functionality–tool selection quality allocation. All crows then received the same order of the additional tests designed to further explore performance in the tool selection quality allocation condition: another 2 sessions of tool selection quality allocation–apparatus functionality–apparatus choice–tool selection quality allocation (sessions 5–8).

Data analysis

We recorded the choice per trial for each subject as ‘correct’ or ‘incorrect’. All test sessions were coded live as well as being video-recorded (unless parental consent requested otherwise for the children). 10% of trials, i.e. 240 trials for the crow data and 242 trials for the human data, were coded from video by a second observer and compared to the live coding, finding significant agreement for the human data (κ = 1; p < .001) and the crow data (κ = .818; p < .001). Example trials can be found in S1 and S2 Videos.
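For readers unfamiliar with the agreement statistic, Cohen's κ compares the observed proportion of matching codes with the proportion expected by chance from each coder's marginal frequencies. The sketch below is a generic implementation of that textbook formula, not the software used in the study, and the example codes are made up for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must rate the same, non-empty set of trials")
    n = len(coder_a)
    # Observed agreement: share of trials coded identically.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (observed - expected) / (1 - expected)

# Hypothetical codes (1 = correct, 0 = incorrect) with one disagreement.
live = [1, 1, 1, 1, 0, 0, 0, 0]
video = [1, 1, 1, 0, 0, 0, 0, 0]
print(cohens_kappa(live, video))  # 0.75
```

Identical codings give κ = 1, as reported for the human data.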

We conducted Generalized Linear Mixed Models (GLMM: [53]) in R (version 2.15.0; R Core Team, 2014), using the R packages lme4 [54] and nlme [55] (for GLMM), MASS [56] (allows for negative binomial), gamlss [57] (beta distributed data) and multcomp [58] (post-hoc tests within GLMM), to assess which factors influenced success rate in the children and New Caledonian crows. Success was a binary variable indicating whether the subject correctly solved the trial (1) or not (0) and was entered as the dependent variable in the models. For the crows, we included the random effect of subject ID and the random slopes for trial and condition within subject ID, with fixed effects of condition (1–5), apparatus type (stick/stone), gender (male/female), trial number (1–12) and age (adult/juvenile). When the problem of non-convergence occurred, the complexity of the model was reduced by dropping single factors separately until convergence was reached while keeping maximum complexity of the model. The resulting model included condition, sex, trial, age, and (1 + trial|ID) as random effect. For the children, we included the random effect of 1+Trial+Condition|ID, with fixed effects of age in decimal years (continuous: ages 3–5 in individual years), condition (1–5), gender (male/female), trial number (1–14) and the interaction between age and condition. Non-convergence was handled in the same way as for the crows. The resulting model included condition, gender, trial, age, and (1 | ID) as random effect. We used deviance information criteria to compare the full model (all predictor variables, random effects) firstly with a null model, and then with reduced models to test each of the effects of interest [59]. The null model consisted of random effects and no predictor variables. The reduced model comprised all effects present in the full model, except the effect of interest [59]. Post-hoc comparisons of factors with more than two categories were calculated with Tukey correction for multiple comparisons.

For the crows, we then analysed the data in a comparable way to the cockatoo and orangutan data, using non-parametric two-tailed statistics, namely 1-sample Wilcoxon tests and Mann-Whitney U tests run in SPSS version 21. The cockatoo and orangutan data were obtained from the Supplementary Materials in [5, 50]. For the children, we ran further analyses using exact two-tailed Binomial tests to assess success rate in each condition, for each apparatus type separately, and age class (3–5 years). We focussed on group-level analyses in order to ease any comparisons between species; however, the individual-level analyses for the crows, using two-tailed Binomial tests, can be found in S1 Table, with crow subject information in S2 Table.
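An exact two-tailed Binomial test against chance can be computed directly from the binomial probability mass function. The sketch below is a generic illustration rather than the authors' analysis script; it assumes the common convention that the two-tailed p-value sums the probabilities of all outcomes that are no more likely than the observed one, and the function name is hypothetical.

```python
from math import comb

def binom_two_tailed(successes, n, p=0.5):
    """Exact two-tailed binomial p-value for `successes` out of `n` trials.

    Convention (an assumption here): add up the probability of every
    outcome whose probability does not exceed that of the observed count.
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)

# Illustrative input: 18 correct choices in 24 binary trials, chance = 0.5.
print(round(binom_two_tailed(18, 24), 4))  # 0.0227
```

With a chance level of 0.5 the distribution is symmetric, so this convention is equivalent to doubling the one-sided tail probability.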

Results

New Caledonian crows

For the crow data, the full model differed significantly from the null model (χ2 = 198.67, df = 94, p < .001). We found a significant main effect of condition (Estimate = -3.580, χ2 = -6.005, p < .001) on success rate (correct vs. incorrect choice; see S3 and S4 Tables for multiple comparisons of the factor condition). The crows generally performed well in the tool selection, motivation and quality allocation conditions, and performed poorly in the tool functionality and tool selection quality allocation conditions (Table 1). Specifically, in the tool selection condition, when choosing the correct tool for the presented apparatus, the crows chose correctly significantly above chance with both apparatus types combined (Table 1) and the stick-apparatus alone, though not with the stone-apparatus alone. In the motivation condition and quality allocation condition, the crows chose correctly significantly above chance level with both apparatus types combined (Table 1) and with the stick- and stone-apparatus alone.

Table 1. Performance across all conditions for the crows across both apparatuses.

Results are from Wilcoxon 1-sample signed-rank tests (chance value = 50%). Significant p-values (< .05) are highlighted in bold. S1–8 denotes the sessions included for that condition.

Condition T+ P
Tool selection (S1&2) 21 .026
Motivation 21 .02
Quality Allocation 21 .027
Tool functionality 21 .223
Tool selection quality allocation (S1&2) 19 .074
Tool selection quality allocation (S3&4) 21 .027
Tool selection quality allocation (S5-8) 21 .027
Apparatus functionality 21 .028
Apparatus choice 21 .027
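
The T+ values in Table 1 can be read with the help of a small sketch of the 1-sample Wilcoxon signed-rank test against a 50% chance level. It assumes six subjects (the N given for the crow models in S3 Table) and uses hypothetical per-subject percentages; the function names are also illustrative. It computes both the exact two-tailed p-value and the normal approximation without tie or continuity correction. Whether the reported p-values are asymptotic is an assumption, not something stated in the text.

```python
import math
from itertools import product

def signed_rank_t_plus(scores, chance=50.0):
    """Return (T+, n): sum of ranks of positive differences from chance.

    Scores equal to chance are dropped; tied absolute differences share
    their mean rank.
    """
    diffs = [s - chance for s in scores if s != chance]
    ordered = sorted(abs(d) for d in diffs)

    def rank(value):
        first = ordered.index(value) + 1
        return first + (ordered.count(value) - 1) / 2.0

    return sum(rank(abs(d)) for d in diffs if d > 0), len(diffs)

def exact_two_tailed_p(t_plus, n):
    """Exact p-value by enumerating all 2^n sign patterns (no ties)."""
    mean = n * (n + 1) / 4.0
    deviation = abs(t_plus - mean)
    hits = 0
    for signs in product((0, 1), repeat=n):
        t = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if abs(t - mean) >= deviation - 1e-12:
            hits += 1
    return hits / 2**n

def normal_approx_two_tailed_p(t_plus, n):
    """Asymptotic p-value without tie or continuity correction."""
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = abs(t_plus - mean) / sd
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical percentages correct for six subjects, all above chance.
scores = [75, 83, 67, 92, 58, 71]
t_plus, n = signed_rank_t_plus(scores)
p_exact = exact_two_tailed_p(t_plus, n)
p_approx = normal_approx_two_tailed_p(t_plus, n)
print(t_plus, n, p_exact, round(p_approx, 4))  # 21.0 6 0.03125 0.0277
```

With six subjects, T+ = 21 is the maximum possible value and means that every subject scored above chance. Under these assumptions the approximation gives a p-value close to .028, and ties between subjects would shift it slightly.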

In the tool functionality condition, when the correct choice was either the least preferred reward over the non-functional tool or the functional tool over the least preferred reward, the crows did not select correctly significantly above chance, either with both apparatus types combined (Table 1) or with the stick- or stone-apparatus alone. In the tool selection quality allocation condition, when all task components were present at once, looking at sessions 1 and 2 only, the crows did not select correctly significantly above chance with both apparatus types combined (Table 1), nor with either apparatus type alone.

After the first two sessions, unlike in the cockatoo study [50], the crows in the present study were given a further two sessions to see whether additional experience would improve performance. In these two further sessions of the tool selection quality allocation condition, the crows selected correctly above chance with both apparatus types combined (Table 1) and with the stone-apparatus, but not with the stick-apparatus alone. Following this, again unlike the cockatoos, the crows then received further experience of both apparatuses being presented at once in the apparatus functionality and apparatus choice conditions. In the apparatus functionality condition, where both apparatuses and tools were present but only one apparatus was baited, the crows selected correctly significantly above chance with both apparatus types combined and the stone-apparatus alone, though not the stick-apparatus alone. In the apparatus choice condition, where both apparatuses were baited and presented with only one tool, the crows selected correctly significantly above chance (Table 1). Following these two additional conditions, the crows received four additional sessions of the tool selection quality allocation condition. Across sessions 5–8, the crows selected correctly significantly above chance with both apparatus types combined and singly (Table 1).

Children and adult humans

For the child data, the full model differed significantly from the null model (χ2 = 259.1, df = 114, p < .001). We found a significant interaction effect of age and the tool functionality condition (χ2 = 2.469, p = 0.014) as well as the motivation condition (χ2 = -2.463, p = 0.014) on success rate in children (correct vs. incorrect choice; S5 Table). Success rate increased with age in the tool functionality and motivation conditions and was also significantly poorer in both of these conditions for all ages, compared with the other conditions.

When combining all conditions, only the 4- and 5-year olds selected correctly above chance across all trials, while the 3-year olds did not (Table 2). Within conditions, the 3-year olds did not select correctly above chance in any condition, except for the tool selection quality allocation condition (Table 2). The 4-year olds and 5-year olds performed well and comparably to one another, selecting correctly above chance within the tool selection, quality allocation and tool selection quality allocation conditions (Table 2). 3- to 5-year olds showed a non-significant trend to select incorrectly in the motivation condition, i.e. to select the tool even though the rewards immediately available and inside the apparatus were exactly the same (both rewards were most preferred). 3- to 5-year olds did not select correctly above chance in the tool functionality condition. In adults, we found that subjects selected the correct choice above chance across all conditions and within each condition separately (Table 2).

Table 2. Correct choices (%) in tool selection, tool selection quality allocation and motivation conditions within each apparatus type for each age (3–5 years) for children and for adult humans.

P-values calculated from exact two-tailed binomial tests. Significant p-values are highlighted in bold. NS = not significant with a Bonferroni correction. “Incorrect”–selected incorrect choice above chance.

Age (in years) / apparatus type | Tool selection: % | Tool selection: p | Tool selection quality allocation: % | Tool selection quality allocation: p | Motivation: % | Motivation: p
3 – stone | 68 | .071 | 72 | .024 | 32 | .07
3 – stick | 81 | .001 | 86 | < .001 | 26 | .011 (incorrect)
4 – stone | 87 | < .001 | 83 | < .001 | 27 | .016 (incorrect)
4 – stick | 100 | < .001 | 83 | < .001 | 37 | .2
5 – stone | 90 | < .001 | 97 | < .001 | 24 | .008 (incorrect)
5 – stick | 97 | < .001 | 93 | < .001 | 34 | .14
Adult – stone | 100 | < .001 | 100 | < .001 | 80 | .005
Adult – stick | 100 | < .001 | 100 | < .001 | 85 | .003

In the tool functionality condition for children, 3-, 4- and 5-year old children selected significantly above chance when the tool presented was functional, i.e. they chose the functional tool over the least preferred reward (Table 3). When the tool presented was non-functional, 3 to 5-year olds significantly selected incorrectly above chance, i.e. they incorrectly chose the non-functional tool over the least preferred reward, while adults chose correctly (Table 3). In the quality allocation condition for children, 5-year olds selected correctly significantly above chance in all trials. The 3- and 4-year olds selected correctly significantly above chance when the most preferred reward was inside the apparatus and the correct choice was the tool, but when the least preferred reward was inside the apparatus, so the correct choice was the immediately available most preferred reward, 3-year olds did not have a significant preference for the correct choice and 4-year olds chose correctly with the stick apparatus but not the stone apparatus (Table 3). Adults chose correctly in all conditions (Table 3).

Table 3. Correct choices (%) in tool functionality and quality allocation conditions in each apparatus type for children aged 3–5 years and adult humans.

P-values calculated from exact two-tailed binomial tests. NS = not significant with a Bonferroni correction. Significant p-values are highlighted in bold. “Incorrect”–selected incorrect choice above chance.

Age (in years) / apparatus type | Tool functionality, functional choice: % | p | Tool functionality, non-functional choice: % | p | Quality allocation, least preferred reward inside apparatus: % | p | Quality allocation, most preferred reward inside apparatus: % | p
3 – stick | 90 | < .001 | 20 | .001 (incorrect) | 47 | .856 | 87 | < .001
3 – stone | 93 | < .001 | 13 | < .001 (incorrect) | 57 | .585 | 90 | < .001
4 – stick | 90 | < .001 | 27 | .016 (incorrect) | 73 | .016 | 93 | < .001
4 – stone | 87 | < .001 | 20 | .001 (incorrect) | 67 | .099 | 90 | < .001
5 – stick | 90 | < .001 | 28 | .024 (incorrect) | 86 | < .001 | 97 | < .001
5 – stone | 90 | < .001 | 72 | .024 (incorrect) | 90 | < .001 | 93 | < .001
Adult – stick | 100 | < .001 | 100 | < .001 | 100 | < .001 | 100 | < .001
Adult – stone | 100 | < .001 | 95 | < .001 | 100 | < .001 | 100 | < .001

Comparison with previous studies in Goffin’s cockatoos and orangutans

We compare the performance of young children, adult humans and tool-making New Caledonian crows tested in the present study, with that of tool-making orangutans [5] and non-tool making Goffin’s cockatoos [50] tested in previous studies. As there were minor methodological differences between species, and therefore necessary differences in analyses, we make only tentative species comparisons, rather focusing on performance per species as above. We focus on performance across apparatus types (stick- and stone- apparatus) rather than separately, though present the crow and cockatoo comparisons for each apparatus type in S6 and S7 Tables. We found that adult humans selected correctly above chance in all conditions. Crows, cockatoos, orangutans and children aged 4–5 selected correctly significantly above chance in the tool selection and quality allocation conditions. Children aged 3–5 and orangutans performed poorly in the motivation condition, while crows and cockatoos performed well. Children aged 3–5 and crows performed poorly in the tool functionality condition, while cockatoos and orangutans performed well. Children aged 3–5 and orangutans performed well in the tool selection quality allocation condition, while crows and cockatoos performed poorly in this condition (Fig 3).

Fig 3. Mean percentage of correct trials for each condition for New Caledonian crows, children aged 3–5 years, adult humans (present study), Goffin’s cockatoos [50] and orangutans [5].

Asterisks (*) indicate significant selection of the correct choice within a condition, from exact two-tailed binomial tests for the children (each age) and adults, and from 1-sample Wilcoxon signed-rank tests for the crows, cockatoos and orangutans. The dashed horizontal line indicates chance level.

Discussion

Our study was designed to investigate the ability of tool-making New Caledonian crows, 3-5-year-old children and adult humans to make profitable decisions requiring effective decision-making, delayed gratification and work-effort sensitivity in a tool-use context. We found that these species performed in a similar manner to non-tool-making Goffin’s cockatoos [50] and tool-making orangutans [5] tested in previous studies. They were able to flexibly select between reward items of differing quality and tools (including non-functional tools) relative to the context of each condition. As there were differences between species in training, reward types and apparatuses, we cannot compare species directly, and so focus on each species’ performance within its own study.

First, we found that the crows and cockatoos performed well in most conditions. The exceptions were the tool functionality condition, in which the cockatoos performed well but the crows did not, and the tool selection quality allocation condition, in which both crows and cockatoos performed poorly. In this latter condition, all task components were present and the most profitable choice was the tool for the apparatus containing high-quality food, while ignoring the other tool for the second apparatus containing low-quality food. The orangutans and humans performed significantly above chance, while the birds did not, in this condition. However, additional experience (not available in the cockatoo study) improved performance for the crows in this condition.

Second, the birds and adult humans performed significantly above chance, while the 3–5 year old children and orangutans did not, in the motivation condition, where the most profitable choice was to take the preferred reward that was immediately available on the table, rather than use a tool to obtain the same preferred reward from inside an apparatus, indicating work-effort sensitivity. Third, the cockatoos, orangutans and adult humans performed significantly above chance, while the crows and children did not, in the tool functionality condition, where the most profitable choice was to take an immediately available reward rather than try to use a non-functional tool, which would not result in a reward, indicating an ability to take into account tool functionality while making decisions. Specifically, children and crows often chose the non-functional tool over an immediately available reward.

That 3-year old children struggled with most problems requiring the ability to delay gratification and in understanding how to succeed on various tasks echoes past child development work, such as [14, 46]. However, our results also suggest that experience may improve tool-related delayed gratification performance, given that the 3-year olds passed the tool selection quality allocation condition. The birds’ performance in the four tasks that the 3-year olds struggled with, and particularly the motivation condition, which 4-5-year olds and orangutans also struggled with due to an apparent preference for using tools, is particularly intriguing, though note that the birds received more trials in this condition than children. Generally, these performances indicate that these tasks are non-trivial in terms of motivation and the cognition required and so illustrate the high levels of self-control that these species possess, even when solving tool problems. The motivation condition, where children and orangutans continued to select the tool even when the same high-quality reward was immediately available, suggests that they, though not the adult humans or birds, may have a bias towards tool-use. That is, when all things are equal, our result suggests children and orangutans prefer to use a tool to get a reward over the more efficient choice of directly taking an equal reward. However, once tool-use leads to different types of food, as in the quality allocation condition, orangutans and 4-5-year olds, though not 3-year olds, appear able to overcome the desire to immediately select the tool. This could be caused by an inability to inhibit taking a tool before evaluating the reward options or a different understanding of the goal of the task, i.e. exploring the tools and apparatuses instead of focussing on acquiring the rewards.

Another possibility is that there may have been an issue of contra-freeloading, i.e. that the use of the tool was a reward in itself. In many trials, children tried to use the incorrect tool to acquire the reward inside the apparatus and were given a chance to do so until they gave up; however, some children also realised their mistake after making the wrong choice and confirmed so verbally without trying to use the non-functional tool or when receiving a non-preferred reward. It is unusual that contra-freeloading would influence the children and orangutans, though not the other groups tested. As the crows were wild birds, unlike the cockatoos and orangutans, they may have had more of a food-shortage mentality, i.e. be less likely to contra-freeload. However, a recent study by McCoy et al. [60] showed that after tool-use, New Caledonian crows are more optimistic towards a task compared with when they had not used tools, which shows that tool-use in itself is rewarding for the crows, and so does not provide support for the contra-freeloading hypothesis in relation to this study. Similarly, the cockatoos, which were captive birds, were able to refrain from selecting the tool when it was not necessary. As children of all ages from 3 to 5, rather than just the youngest children, continued to select the tool even when it was not required, this suggests that the issue is not a misunderstanding that a tool is needed whenever one is available. Further work is clearly needed to investigate the possibility of a bias towards using tools that even other tool-making species, such as New Caledonian crows, do not have. Such a bias could potentially explain differences between the tool behaviours of great apes and other species, just as a bias in the motivation to cooperate may explain differences between chimpanzee and human cooperation [61]. When given a choice between working together or alone for the same reward, children, though not chimpanzees, show a preference for working together [61], though recently one kea also showed this bias [62].

In a surprising result, both crows and children, though not the cockatoos, orangutans or adult humans, performed poorly in the tool functionality condition, as they selected the non-functional tool over the immediate reward. These results cannot be explained by a tool functionality understanding issue, given these subjects reliably discriminated between functional and non-functional tools in the tool selection task. There are a number of other potential hypotheses. First, crows and children may have a drive to ‘give it a go’ and try to make non-functional tools work, rather than accept a low-value reward, due to past experience that sometimes tools that appear not to work can be made to function. Thus, both these subject groups may have higher persistence than cockatoos, who have less experience of such situations. However, the success of human adults and orangutans counts against this hypothesis, as they would likely have as much, if not more experience, that non-functional tools sometimes can be made to work.

One possibility is potential species differences in the perceived reward value associated with each particular test. The rewards did differ between subject groups, as they were selected to be most appropriate and desirable for each group. Hence, the crows and children may have selected the non-functional tool over the least preferred reward due to the low preferential value assigned to the latter type. For the cockatoos, the food rewards contrasted were the most preferred and the third (least) preferred reward. In comparison, the crows were presented with meat as most preferred reward and apple as least preferred reward, as they showed consistent preference for meat over apple. However, the quality of the rewards used for the other subject groups also differed from one another, yet they selected correctly in this condition. A final possibility is that these results are directly due to differences in self-control. Despite the large amounts of experience crows and children had in deciding when to use a tool and when a tool was not functional, these subjects may have struggled to inhibit picking up a non-functional, previously rewarded tool in this task. As described above, in the motivation condition, children continued to select a (functional) tool over a high-quality reward, indicating that tool-use itself may have been rewarding for them, which could have been the case even when the tool was non-functional. Cockatoos, orangutans and adults, in contrast, may simply have had sufficient self-control to ignore this tool. Future work incorporating both tool-related and non-tool-use delayed gratification paradigms across species is required to further explore this possibility.

The clearest difference in performance between the primates and birds appeared in the tool selection quality allocation condition. Here, both the crows and cockatoos struggled, while orangutans, 3-to 5-year old children and adult humans performed well. This was also the only condition that 3-year old children passed. The authors in a previous study [50] suggested that the poor cockatoo performance in this condition reflected possible information processing limitations, with subjects being unable to focus their attention on all relevant cues at once [63]. Additionally, chimpanzees, bonobos and orangutans struggled when facing a trap-tube problem requiring simultaneously considering two spatial object-object relations (e.g. tool-reward), which the authors suggested may be due to cognitive overload in the attentional system [64]. In the present study, we found that the crows also struggled with this condition, at least within the first two sessions, though three individuals did perform above chance. We included two new conditions for the crows that were not used in the cockatoo study [50], where the subjects gained more experience of selecting the correct tool or apparatus when both apparatuses were present, before further testing in the tool selection quality allocation condition. We found that this additional experience did improve crow performance in this condition. Our findings support the suggestion by [50]: without extended experience, birds, though not great apes–at least in this context, though not necessarily in others [64]—may have issues in attending to multiple variables at once. This may affect problem-solving performance in other physical cognition tasks.

New Caledonian crows and Goffin’s cockatoos show comparable performance in some cognitive tasks, such as tool-manufacture in the lab and flexible problem-solving skills [21, 22, 38, 44, 65, 66]. They both show high levels of object manipulation in and outside the foraging context [65] and make intrinsically structured object combinations [67]. New Caledonian crows often use stick tools in the wild, which may explain why they performed better using the stick-apparatus than the stone-apparatus in the tool selection condition. However, it does not explain why they performed better with the stone-apparatus than the stick-apparatus in the apparatus functionality condition. Importantly though, only the crows routinely make and use tools in the wild [43], while there is limited evidence that cockatoos make tools in the wild [23], though they can do so proficiently in the lab [22].

Despite this, we found that the crows performed similarly to the cockatoos. This result is in line with studies indicating that non-tool-making ravens perform comparably with tool-making chimpanzees in a delayed gratification tool-use context [34]. It also corresponds with previous findings that New Caledonian crows do not have higher levels of motor self-regulation–defined as stopping a pre-potent but counter-productive movement–than non-tool making carrion crows [68]. Clearly, there are several potential explanations for why the cockatoos performed well in the tool functionality condition, while the crows struggled, including different experiences (cockatoos were hand reared with extensive testing experience, while the crows were wild-caught), minor variation in the procedure, differences in persistence or self-control.

In conclusion, we found that 3 to 5-year old children, adult humans and tool-making New Caledonian crows were able to make profitable decisions requiring tool-use and the ability to delay gratification, make effective decisions and judge work-effort, by adapting and extending a paradigm used previously in non-tool making Goffin’s cockatoos [50] and tool-making orangutans [5]. Our findings suggest that: (1) crows and cockatoos show self-control and can perform similarly to adult humans and older children on a range of tool related delayed gratification tasks, but struggle when they have to attend to multiple cues thereby increasing task complexity; (2) children and orangutans may have a bias to use tools that other tool-making species may not have. Future work extending these findings should offer valuable insight into how the cognition behind self-control in corvids, parrots and primates evolves.

Supporting information

S1 Table. Number of correct trials for all constellations of the tests for the crows for each individual.

In the tool selection and apparatus functionality condition the total number of trials (in brackets) varied between individuals, depending on when they reached the criterion. P-values calculated from exact binomial tests. Significant p-values highlighted in bold. MPR = most preferred reward. Sessions 1 to 2 in the tool selection and tool selection quality allocation condition are for comparability with the Goffin’s cockatoos; * p < 0.05, ** p < 0.01, *** p < 0.001.

(DOCX)

S2 Table. Crow subject information.

(DOCX)

S3 Table. Generalized linear mixed models on factors affecting the number of correct trials in crows.

N = 6. Significant p-values are highlighted in bold.

(DOCX)

S4 Table. Posthoc comparison of conditions of crow data with Tukey correction for multiple comparison.

(DOCX)

S5 Table. Generalized linear mixed models on factors affecting the number of correct trials in children aged 3–5 years, with age in years.

N = 88. Significant p-values are highlighted in bold.

(DOCX)

S6 Table. Posthoc comparison of conditions of children data with Tukey correction for multiple comparison.

(DOCX)

S7 Table. Performance across all conditions for the crows with each apparatus singly.

Results are from Wilcoxon 1-sample signed-rank tests (chance value = 50%). Significant p-values (< 0.05) are highlighted in bold.

(DOCX)

S8 Table. Comparison of performance within conditions between crows and cockatoos.

Results reflect Mann Whitney U-tests. Significant p-values highlighted in bold.

(DOCX)

S1 Video

(MP4)

S2 Video

(MP4)

Acknowledgments

We would like to thank the staff, parents and children at Sutton V.A. Lower School, St Andrew’s C of E Primary School, Under Fives Roundabout, Histon Early Years Centre and Patacake Day Nursery for their participation in this study. Thank you to Ian Millar for help in apparatus construction and to Alizée Vernouillet for video coding. Thanks to Province Sud for the permission to work in New Caledonia, and to Dean M. and Boris C. for granting property access for catching and releasing the crows.

Data Availability

The full data set is available on Figshare: https://figshare.com/s/8e1670783219ddc4561e

Funding Statement

The study was funded by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement No. 3399933, awarded to N.S.C. (PI). R.G., M.S. and A.H.T. received funding from a Royal Society of New Zealand Rutherford Discovery Fellowship and a Prime Minister’s MacDiarmid Emerging Scientist Prize awarded to A.H.T. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Diamond A. Executive functions. Annual review of psychology. 2013;64:135–68. doi: 10.1146/annurev-psych-113011-143750
  • 2. Santos LR, Rosati AG. The evolutionary roots of human decision making. Annual review of psychology. 2015;66:321–47. doi: 10.1146/annurev-psych-010814-015310
  • 3. McCormack T, Atance CM. Planning in young children: A review and synthesis. Developmental Review. 2011;31(1):1–31.
  • 4. De Petrillo F, Ventricelli M, Ponsi G, Addessi E. Do tufted capuchin monkeys play the odds? Flexible risk preferences in Sapajus spp. Animal cognition. 2015;18(1):119–30. doi: 10.1007/s10071-014-0783-7
  • 5. Laumer I, Auersperg AM, Bugnyar T, Call J. Orangutans (Pongo abelii) make flexible decisions relative to reward quality and tool functionality in a multi-dimensional tool-use task. PloS one. 2019;14(2):e0211031. doi: 10.1371/journal.pone.0211031
  • 6. Nigg JT. Annual Research Review: On the relations among self‐regulation, self‐control, executive functioning, effortful control, cognitive control, impulsivity, risk‐taking, and inhibition for developmental psychopathology. Journal of child psychology and psychiatry. 2017;58(4):361–83. doi: 10.1111/jcpp.12675
  • 7. Beran MJ, Rossettie MS, Parrish AE. Trading up: Chimpanzees (Pan troglodytes) show self-control through their exchange behavior. Animal cognition. 2016;19(1):109–21. doi: 10.1007/s10071-015-0916-7
  • 8. Mischel W, Shoda Y, Rodriguez MI. Delay of gratification in children. Science. 1989;244(4907):933–8. doi: 10.1126/science.2658056
  • 9. Watts TW, Duncan GJ, Quan H. Revisiting the Marshmallow Test: A Conceptual Replication Investigating Links Between Early Delay of Gratification and Later Outcomes. Psychological science. 2018:0956797618761661.
  • 10. Diamond A, Goldman-Rakic PS. Comparison of human infants and rhesus monkeys on Piaget's AB task: Evidence for dependence on dorsolateral prefrontal cortex. Experimental brain research. 1989;74(1):24–40. doi: 10.1007/bf00248277
  • 11. Rothbart M, Derryberry D, Posner M. A psychobiological approach to the development of temperament. Temperament: Individual differences at the interface of biology and behavior. 1994:83–116.
  • 12. Carlson SM, Davis AC, Leach JG. Less is more: Executive function and symbolic representation in preschool children. Psychological science. 2005;16(8):609–16. doi: 10.1111/j.1467-9280.2005.01583.x
  • 13. Kochanska G, Murray KT, Harlan ET. Effortful control in early childhood: continuity and change, antecedents, and implications for social development. Developmental psychology. 2000;36(2):220.
  • 14. Hughes C. Executive function in preschoolers: Links with theory of mind and verbal ability. British Journal of Developmental Psychology. 1998;16(2):233–53.
  • 15. Macdonald JA, Beauchamp MH, Crigan JA, Anderson PJ. Age-related differences in inhibitory control in the early school years. Child Neuropsychology. 2014;20(5):509–26. doi: 10.1080/09297049.2013.822060
  • 16. Byrne RW. The technical intelligence hypothesis: an additional evolutionary stimulus to intelligence. Machiavellian intelligence II, ed Whiten A & Byrne R. 1997:289–311.
  • 17. Miller R, Boeckle M, Jelbert SA, Frohnwieser A, Wascher CA, Clayton NS. Self‐control in crows, parrots and nonhuman primates. Wiley Interdisciplinary Reviews: Cognitive Science. 2019:e1504. doi: 10.1002/wcs.1504
  • 18. Addessi E, Paglieri F, Focaroli V. The ecological rationality of delay tolerance: insights from capuchin monkeys. Cognition. 2011;119(1):142–7. doi: 10.1016/j.cognition.2010.10.021
  • 19. Addessi E, Paglieri F, Beran MJ, Evans TA, Macchitella L, De Petrillo F, et al. Delay choice versus delay maintenance: Different measures of delayed gratification in capuchin monkeys (Cebus apella). Journal of Comparative Psychology. 2013;127(4):392. doi: 10.1037/a0031869
  • 20. Amici F, Aureli F, Call J. Fission-fusion dynamics, behavioral flexibility, and inhibitory control in primates. Current Biology. 2008;18(18):1415–9. doi: 10.1016/j.cub.2008.08.020
  • 21. Auersperg AM, von Bayern AM, Weber S, Szabadvari A, Bugnyar T, Kacelnik A. Social transmission of tool use and tool manufacture in Goffin cockatoos (Cacatua goffini). Proc R Soc B. 2014;281(1793):20140972. doi: 10.1098/rspb.2014.0972
  • 22. Auersperg AM, Szabo B, von Bayern AM, Kacelnik A. Spontaneous innovation in tool manufacture and use in a Goffin’s cockatoo. Current Biology. 2012;22(21):R903–R4. doi: 10.1016/j.cub.2012.09.002
  • 23. Osuna-Mascaró A, Auersperg A. On the brink of tool use? Could object combinations during foraging in a feral Goffin’s cockatoo (Cacatua goffiniana) result in tool innovations. Anim Behav Cogn. 2018;5:229–34.
  • 24. Hillemann F, Bugnyar T, Kotrschal K, Wascher CA. Waiting for better, not for more: corvids respond to quality in two delay maintenance tasks. Animal behaviour. 2014;90:1–10. doi: 10.1016/j.anbehav.2014.01.007
  • 25. Auersperg AM, Laumer I, Bugnyar T. Goffin cockatoos wait for qualitative and quantitative gains but prefer ‘better’ to ‘more’. Biology letters. 2013;9(3):20121092. doi: 10.1098/rsbl.2012.1092
  • 26. Dufour V, Wascher CA, Braun A, Miller R, Bugnyar T. Corvids can decide if a future exchange is worth waiting for. Biology Letters. 2011:rsbl20110726.
  • 27. Dufour V, Pelé M, Sterck E, Thierry B. Chimpanzee (Pan troglodytes) anticipation of food return: coping with waiting time in an exchange task. Journal of Comparative Psychology. 2007;121(2):145. doi: 10.1037/0735-7036.121.2.145
  • 28. Cheke LG, Clayton NS. Eurasian jays (Garrulus glandarius) overcome their current desires to anticipate two distinct future needs and plan for them appropriately. Biology Letters. 2011:rsbl20110909.
  • 29. Ostojić L, Legg EW, Brecht KF, Lange F, Deininger C, Mendl M, et al. Current desires of conspecific observers affect cache-protection strategies in California scrub-jays and Eurasian jays. Current Biology. 2017;27(2):R51–R3. doi: 10.1016/j.cub.2016.11.020
  • 30. Ostojić L, Legg EW, Shaw RC, Cheke LG, Mendl M, Clayton NS. Can male Eurasian jays disengage from their own current desire to feed the female what she wants? Biology letters. 2014;10(3):20140042. doi: 10.1098/rsbl.2014.0042
  • 31. Osvath M, Osvath H. Chimpanzee (Pan troglodytes) and orangutan (Pongo abelii) forethought: self-control and pre-experience in the face of future tool use. Animal cognition. 2008;11(4):661–74. doi: 10.1007/s10071-008-0157-0
  • 32. Evans TA, Westergaard GC. Self-control and tool use in tufted capuchin monkeys (Cebus apella). Journal of Comparative Psychology. 2006;120(2):163. doi: 10.1037/0735-7036.120.2.163
  • 33. Laumer I, Bugnyar T, Auersperg A. Flexible decision-making relative to reward quality and tool functionality in Goffin cockatoos (Cacatua goffiniana). Scientific reports. 2016;6:28380. doi: 10.1038/srep28380
  • 34.Kabadayi C, Osvath M. Ravens parallel great apes in flexible planning for tool-use and bartering. Science. 2017;357(6347):202–4. 10.1126/science.aam8138 [DOI] [PubMed] [Google Scholar]
  • 35.Redshaw J, Taylor AH, Suddendorf T. Flexible planning in ravens? Trends in cognitive sciences. 2017;21(11):821–2. 10.1016/j.tics.2017.09.001 [DOI] [PubMed] [Google Scholar]
  • 36.Dickerson KL, Ainge JA, Seed AM. The role of Association in pre-schoolers’ solutions to “spoon tests” of future planning. Current Biology. 2018;28(14):2309–13. e2. [DOI] [PubMed] [Google Scholar]
  • 37.Seed AM, Tebbich S, Emery NJ, Clayton NS. Investigating physical cognition in rooks, Corvus frugilegus. Current Biology. 2006;16(7):697–701. 10.1016/j.cub.2006.02.066 [DOI] [PubMed] [Google Scholar]
  • 38.Taylor AH, Hunt GR, Medina FS, Gray RD. Do New Caledonian crows solve physical problems through causal reasoning? Proceedings of the Royal Society of London B: Biological Sciences. 2009;276(1655):247–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Tebbich S, Seed AM, Emery NJ, Clayton NS. Non-tool-using rooks, Corvus frugilegus, solve the trap-tube problem. Animal cognition. 2007;10(2):225–31. 10.1007/s10071-006-0061-4 [DOI] [PubMed] [Google Scholar]
  • 40.Visalberghi E, Limongelli L. Lack of comprehension of cause€ ffect relations in tool-using capuchin monkeys (Cebus apella). Journal of Comparative Psychology. 1994;108(1):15 10.1037/0735-7036.108.1.15 [DOI] [PubMed] [Google Scholar]
  • 41.Reaux JE, Povinelli DJ. The trap-tube problem Folk physics for apes: a chimpanzee’s theory of how the world works Oxford University Press, Oxford: 2000:108–31. [Google Scholar]
  • 42.Silva FJ, Page DM, Silva KM. Methodological-conceptual problems in the study of chimpanzees’ folk physics: how studies with adult humans can help. Animal Learning & Behavior. 2005;33(1):47–58. [DOI] [PubMed] [Google Scholar]
  • 43.Hunt GR. Manufacture and use of hook-tools by New Caledonian crows. Nature. 1996;379(6562):249. [Google Scholar]
  • 44.Taylor AH, Hunt GR, Gray RD. Context-dependent tool use in New Caledonian crows. Biology letters. 2012;8(2):205–7. 10.1098/rsbl.2011.0782 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Cheke LG, Loissel E, Clayton NS. How do children solve Aesop's Fable? PloS one. 2012;7(7):e40574 10.1371/journal.pone.0040574 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Zelazo PD, Müller U, Frye D, Marcovitch S, Argitis G, Boseovski J, et al. The development of executive function in early childhood. Monographs of the society for research in child development. 2003:i–151. [DOI] [PubMed] [Google Scholar]
  • 47.Leavens DA, Bard KA, Hopkins WD. The mismeasure of ape social cognition. Animal cognition. 2017:1–18. 10.1007/s10071-016-1051-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kenward B, Rutz C, Weir AA, Chappell J, Kacelnik A. Morphology and sexual dimorphism of the New Caledonian crow Corvus moneduloides, with notes on its behaviour and ecology. Ibis. 2004;146(4):652–60. [Google Scholar]
  • 49.Hunt GR. Social and spatial reintegration success of New Caledonian crows (Corvus moneduloides) released after aviary confinement. The Wilson Journal of Ornithology. 2016;128(1):168–73. [Google Scholar]
  • 50.Laumer I, Bugnyar T, Auersperg AM. Flexible decision-making relative to reward quality and tool functionality in Goffin cockatoos (Cacatua goffiniana). Scientific reports. 2016;6:28380 10.1038/srep28380 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Miller R, Jelbert SA, Taylor AH, Cheke LG, Gray RD, Loissel E, et al. Performance in Object-Choice Aesop's Fable Tasks Are Influenced by Object Biases in New Caledonian Crows but not in Human Children. Plos One. 2016;11(12). doi: ARTN e0168056 10.1371/journal.pone.0168056 PubMed PMID: WOS:000389587100265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Jelbert SA, Taylor AH, Cheke LG, Clayton NS, Gray RD. Using the Aesop's Fable paradigm to investigate causal understanding of water displacement by New Caledonian crows. PloS one. 2014;9(3):e92895 10.1371/journal.pone.0092895 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Baayen RH. Analyzing linguistic data: A practical introduction to statistics using R: Cambridge University Press; 2008. [Google Scholar]
  • 54.Bates D, Maechler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software. 2015;67(1):1–48. doi: papers2://publication/doi/10.18637/jss.v067.i01. [Google Scholar]
  • 55.Pinheiro J, Bates D, DebRoy S, Sarkar D. Linear and nonlinear mixed effects models. R package version. 2006:3.1–73.
  • 56.Venables WN, Ripley BD. Modern applied statistics with S-PLUS: Springer Science & Business Media; 2013. [Google Scholar]
  • 57.Mayr A, Fenske N, Hofner B, Kneib T, Schmid M. Generalized additive models for location, scale and shape for high dimensional data—a flexible approach based on boosting. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2012;61(3):403–27. [Google Scholar]
  • 58.Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biometrical Journal: Journal of Mathematical Methods in Biosciences. 2008;50(3):346–63. [DOI] [PubMed] [Google Scholar]
  • 59.Forstmeier W, Schielzeth H. Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse. Behavioral Ecology and Sociobiology. 2011;65(1):47–55. 10.1007/s00265-010-1038-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.McCoy DE, Schiestl M, Neilands P, Hassall R, Gray RD, Taylor AH. New caledonian crows behave optimistically after using tools. Current Biology. 2019;29(16):2737–42. e3. 10.1016/j.cub.2019.06.080 [DOI] [PubMed] [Google Scholar]
  • 61.Rekers Y, Haun DB, Tomasello M. Children, but not chimpanzees, prefer to collaborate. Current Biology. 2011;21(20):1756–8. 10.1016/j.cub.2011.08.066 [DOI] [PubMed] [Google Scholar]
  • 62.Heaney M, Gray RD, Taylor AH. Keas perform similarly to chimpanzees and elephants when solving collaborative tasks. PloS one. 2017;12(2):e0169799 10.1371/journal.pone.0169799 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Rowe C, Healy SD. Measuring variation in cognition. Behavioral Ecology. 2014;25(6):1287–92. [Google Scholar]
  • 64.Völter CJ, Call J. The cognitive underpinnings of flexible tool use in great apes. Journal of Experimental Psychology: Animal Learning and Cognition. 2014;40(3):287. [DOI] [PubMed] [Google Scholar]
  • 65.Auersperg AM, Kacelnik A, von Bayern AM. Explorative learning and functional inferences on a five-step means-means-end problem in Goffin’s cockatoos (Cacatua goffini). PloS one. 2013;8(7):e68979 10.1371/journal.pone.0068979 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Auersperg AM, Von Bayern AM, Gajdon GK, Huber L, Kacelnik A. Flexibility in problem solving and tool use of kea and New Caledonian crows in a multi access box paradigm. PLoS One. 2011;6(6):e20231 10.1371/journal.pone.0020231 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Auersperg AM, van Horik JO, Bugnyar T, Kacelnik A, Emery NJ, von Bayern AM. Combinatory actions during object play in psittaciformes (Diopsittaca nobilis, Pionites melanocephala, Cacatua goffini) and corvids (Corvus corax, C. monedula, C. moneduloides). Journal of Comparative Psychology. 2015;129(1):62 10.1037/a0038314 [DOI] [PubMed] [Google Scholar]
  • 68.Teschke I, Wascher C, Scriba MF, von Bayern AMP, Huml V, Siemers B, et al. Did tool-use evolve with enhanced physical cognitive abilities? Phil Trans R Soc B. 2013;368(1630):20120418 10.1098/rstb.2012.0418 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Juliane Kaminski

3 Sep 2019

PONE-D-19-18492

Decision-making flexibility in New Caledonian crows, young children and adult humans in a multi-dimensional tool-use task

PLOS ONE

Dear Dr Frohnwieser,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

As you will see, both reviewers are mainly positive but have several comments which I would ask you to attend to. Here I would like to highlight Reviewer 1's comment on the species comparison being potentially weakened by the fact that species received a different number of trials. Rev 1 also has several statistical comments I would like you to attend to, and Rev 2 has several comments which you will find helpful when revising your introduction and discussion.

==============================

We would appreciate receiving your revised manuscript by Oct 18 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Juliane Kaminski

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this article, the authors examined flexible decision-making in the context of tool-use tasks in New Caledonian crows and human children and adults. Comparisons with previous studies with orangutans and cockatoos are made. While the results are interesting, I do not think the results justify the conclusions drawn with regard to the species comparisons. Additionally, there are a number of issues with the statistical analyses (as detailed below).

There are differences in the procedure (e.g. the trial number) and the statistical analysis (which seems to be a consequence of the difference in the procedure) which critically undermine the comparison between crows and humans. The crows had more opportunities to learn across trials compared to humans. At various locations throughout the manuscript, birds are compared to humans (e.g., lines 38, 612 and 616) without acknowledging the differences in procedure (and as a consequence the possibility that similar performance levels might reflect different cognitive underpinnings).

Line 347: Interobserver reliability: Kappa and the number of observations and more details about the reliability coding (did the same person do the live coding and the reliability coding? was the reliability coder naïve with respect to the research questions?) should be added.

Line 350-364: were random slopes included in the GLMMs (i.e. random slopes of condition, apparatus type, and trial number)? There is evidence that without a maximal random slopes structure GLMMs can be overconfident (Barr, Levy, Scheepers, Tily 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. J Mem Lang 68(3), 255-278).

Line 363: which variables were specified as “control variables”?

Line 378: The reported chi-squared value “-0.32” does not seem to be correct (if the p value is <0.001). Looking at table S3 shows that “-0.32” is probably the estimate.

Line 378: was condition included as a factor (i.e., as a dummy-coded categorical variable)? Given that there is only one estimate for condition (table S3), condition seems to be entered as a covariate, which would be difficult to justify. Could the authors please specify how condition was entered in the model? Assuming that it is treated as a factor why are no pairwise comparisons reported compared to a reference category (e.g. in Table S3)?

Line 426: the main effect of age has a limited meaning in the presence of the interaction term (it refers to the age effect with condition at its reference category). For that reason, I would recommend to just report the interaction term here. Besides, how did the authors calculate a likelihood ratio test (LRT) for age (in the presence of an interaction term)?

Line 426: typo “=,”

Line 429-430: a difference between the conditions is reported without any supporting inferential statistics (e.g. post-hoc tests).

Table 2, table S7 and Lines 486, 491: are multiple observations per individual included here in these binomial tests? Given that multiple trials per condition and subject were conducted, it seems to be the case (looking at the data file supports this interpretation). This is a case of pseudo-replication. Condition and apparatus type need to be analyzed separately to avoid pseudo-replication.

Table 3 seems to be redundant and just a dichotomized version of Figure 3.

Discussion section: Even though the authors acknowledge the differences in procedure between the different species, they draw comparisons here between children and the birds. However, while the birds received 24 trials in the motivation condition, for example, the children received 2 trials. Given that we do not know what children’s performance would look like after 24 trials, I am not sure what can be said about the differences in performance between humans and crows. For example, I am not sure whether children would continue to choose the tool in motivation trials after 24 trials (and/or when children would receive a food reward).

Supplementary videos: it looks like the experimenter touched both tools and looked at the child when children made their choice. Were any precautions taken to guard against the possibility of inadvertent cueing?

Reviewer #2: This is a very interesting and methodologically sound study. However, it has remained unclear what the focus of the research is (self-control or a broader focus also on understanding of tool functionality), and this has to be made clearer in the introduction as well as the discussion. The results can in parts be presented more concisely. I had some comments on the analysis. My comments can be found in the attached file.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review for PONE-D-19-18492.docx

PLoS One. 2020 Mar 11;15(3):e0219874. doi: 10.1371/journal.pone.0219874.r002

Author response to Decision Letter 0


17 Jan 2020

Response to Reviewers

Dear Dr Kaminski,

Following an invitation to revise and re-submit (PONE-D-19-18492), we would like to re-submit our manuscript entitled ‘Decision-making flexibility in New Caledonian crows, young children and adult humans in a multi-dimensional tool-use task’.

We wish to thank you and the two reviewers for the helpful and constructive comments. We have now fully revised the manuscript and accompanying documents in accordance with these comments. Please find responses to each comment in this response to reviewers document – please note that line numbers correspond with the tracked changes version of the manuscript.

We hope that following our revisions, you will consider our manuscript for publication in PLOS ONE.

Yours Sincerely,

Rachael Miller, Romana Gruber, Anna Frohnwieser, Martina Schiestl, Sarah A. Jelbert, Russell D. Gray, Markus Boeckle, Alex H. Taylor, Nicola S. Clayton

Reviewers' comments:

Reviewer #1: In this article, the authors examined flexible decision-making in the context of tool-use tasks in New Caledonian crows and human children and adults. Comparisons with previous studies with orangutans and cockatoos are made. While the results are interesting, I do not think the results justify the conclusions drawn with regard to the species comparisons.

We now present the results of the two studies as separate studies and do not directly compare the data. We now draw conclusions that are based on our data.

Additionally, there are a number of issues with the statistical analyses (as detailed below).

We changed the statistical analysis according to the reviewers' comments (see below).

There are differences in the procedure (e.g. the trial number) and the statistical analysis (which seems to be a consequence of the difference in the procedure) which critically undermine the comparison between crows and humans. The crows had more opportunities to learn across trials compared to humans. At various locations throughout the manuscript, birds are compared to humans (e.g., lines 38, 612 and 616) without acknowledging the differences in procedure (and as a consequence the possibility that similar performance levels might reflect different cognitive underpinnings).

A paragraph was added at the beginning of the Materials and Methods section (lines 142-146) and the Discussion section (lines 530-532) to acknowledge the differences in methodology between the species.

Line 347: Interobserver reliability: Kappa and the number of observations and more details about the reliability coding (did the same person do the live coding and the reliability coding? was the reliability coder naïve with respect to the research questions?) should be added.

Cohen’s Kappa and the number of observations have been added (lines 370-375). The live coding and video coding were conducted by different observers; however, due to the nature of the experiment (i.e. being able to tell from the video whether a trial was solved correctly or not), the second observer was not completely naïve to the research question.
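
For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen's Kappa can be computed from two observers' codes of the same trials. The function name and the example codes are hypothetical and are not taken from the study's data or analysis scripts.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes of the same trials."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty code lists")
    n = len(rater_a)
    # Observed agreement: share of trials coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes (1 = correct choice, 0 = incorrect), not the study's data.
live_codes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
video_codes = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
kappa = cohens_kappa(live_codes, video_codes)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually reported together with the number of double-coded observations.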

Line 350-364: were random slopes included in the GLMMs (i.e. random slopes of condition, apparatus type, and trial number)? There is evidence that without a maximal random slopes structure GLMMs can be overconfident (Barr, Levy, Scheepers, Tily 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. J Mem Lang 68(3), 255-278).

We now include random slopes and intercepts by using (1+Trial | Individual) as the random-effects term. Because of non-convergence, we had to reduce the complexity of the model and drop apparatus as a fixed factor and condition from the random slopes. In cases of non-convergence, we dropped each factor separately until convergence was reached while keeping the maximum complexity of the model, as Reviewer 2 suggested.
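
As a reading aid, one way to write out a model with the random-effects term (1+Trial | Individual) is sketched below. It assumes a binary outcome with a logit link and fixed effects of condition and trial number. These assumptions and all symbols are illustrative, since the response does not spell out the full model formula.

```latex
\operatorname{logit} P(y_{it} = 1)
  = \beta_0 + \beta_{\mathrm{cond}(i,t)} + \beta_T \,\mathrm{Trial}_{it}
  + b_{0i} + b_{1i}\,\mathrm{Trial}_{it},
\qquad
(b_{0i},\, b_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma)
```

Here $y_{it}$ is the choice of individual $i$ on trial $t$, $b_{0i}$ is the random intercept and $b_{1i}$ the random slope of trial number for that individual, and $\Sigma$ is their $2 \times 2$ covariance matrix.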

Line 363: which variables were specified as “control variables”?

We no longer use the term “control variables”, only the terms “predictor variables” and “random effects”.

Line 378: The reported chi-squared value “-0.32” does not seem to be correct (if the p value is <0.001). Looking at table S3 shows that “-0.32” is probably the estimate.

Correct, we changed this now to present the z-value.
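
A short sketch of why the sign matters: a chi-squared statistic cannot be negative, whereas a model estimate and its Wald z-value can. The snippet below computes a z-value and its two-sided p-value from an estimate and a standard error. Only the estimate of -0.32 comes from the review; the standard error is invented for illustration.

```python
import math

def wald_z_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # Two-sided p-value under the standard normal:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# The standard error 0.08 is a hypothetical value, not taken from Table S3.
z, p = wald_z_test(-0.32, 0.08)
```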

Line 378: was condition included as a factor (i.e., as a dummy-coded categorical variable)? Given that there is only one estimate for condition (table S3), condition seems to be entered as a covariate, which would be difficult to justify. Could the authors please specify how condition was entered in the model? Assuming that it is treated as a factor why are no pairwise comparisons reported compared to a reference category (e.g. in Table S3)?

It was entered as a factor. We now report the values for all categories and the initial results of the model as well as the pairwise comparison with Tukey post-hoc test in the supplements.
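
To illustrate the reviewer's point, the sketch below shows treatment (dummy) coding of a factor: a factor with k levels yields k - 1 indicator columns, and therefore k - 1 estimates relative to the reference category, rather than a single estimate. The level names other than "motivation" are hypothetical placeholders.

```python
def treatment_code(levels, reference):
    """Return indicator-column names and a 0/1 row for every level of a factor."""
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    # One indicator column per non-reference level: k levels give k - 1 columns.
    columns = [level for level in levels if level != reference]
    rows = {level: [int(level == col) for col in columns] for level in levels}
    return columns, rows

# "condition_B" and "condition_C" are placeholder names, not the study's labels.
conditions = ["motivation", "condition_B", "condition_C"]
columns, rows = treatment_code(conditions, reference="motivation")
```

The reference level is coded as all zeros, so each remaining coefficient is a contrast against it. Pairwise post-hoc tests then cover the comparisons that do not involve the reference level.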

Line 426: the main effect of age has a limited meaning in the presence of the interaction term (it refers to the age effect with condition at its reference category). For that reason, I would recommend to just report the interaction term here. Besides, how did the authors calculate a likelihood ratio test (LRT) for age (in the presence of an interaction term)?

We now report the interaction term only. We used comparisons of deviance information criteria and report this in the analysis section now.

Line 426: typo “=,”

This has been corrected.

Line 429-430: a difference between the conditions is reported without any supporting inferential statistics (e.g. post-hoc tests).

We now report the inferential statistics on the data.

Table 2, table S7 and Lines 486, 491: are multiple observations per individual included here in these binomial tests? Given that multiple trials per condition and subject were conducted, it seems to be the case (looking at the data file supports this interpretation). This is a case of pseudo-replication. Condition and apparatus type need to be analyzed separately to avoid pseudo-replication.

This analysis was rerun with condition and apparatus type separately to avoid pseudo-replication.

Table 3 seems to be redundant and just a dichotomized version of Figure 3.

Table 3 has been removed from the manuscript.

Discussion section: Even though the authors acknowledge the differences in procedure between the different species, they draw comparisons here between children and the birds. However, while the birds received 24 trials in the motivation condition, for example, the children received 2 trials. Given that we do not know how children’s performance would look like after 24 trials I am not sure what can be said about the differences in performance between humans and crows. For example, I am not sure whether children would continue to choose the tool in motivation trials after 24 trials (and/or when children would receive a food reward).

We added a sentence to acknowledge this difference in procedure (line 558).

Supplementary videos: it looks like the experimenter touched both tools and looked at the child when children made their choice. Were any precautions taken to guard against the possibility of inadvertent cueing?

Great care was taken to ensure the two tools/stickers were placed on the table simultaneously. In trials where the choice had to be explained further (“you can have the immediately available item now or the tool to try to use later”), the order in which the two items were mentioned was randomised between trials, so that each option was mentioned first/last in multiple trials. We made this clearer in the methods section (line 327-328).

Reviewer #2: This is a very interesting and methodologically sound study. However, it has remained unclear what the focus of the research is (self-control or a broader focus also on understanding of tool functionality), and this has to be made clearer in the introduction as well as discussion. The results can in parts be presented more concisely. I had some comments on the analysis. My comments can be found in the attached file.

________________________________________

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No


Review for

PLOS ONE

Decision-making flexibility in New Caledonian crows, young children and adult humans in a multi-dimensional tool-use task

This study set out to investigate the ability of human adults, children, and New Caledonian crows to make profitable decisions in a variety of tool-use contexts. In addition, the authors compare their results with the findings from a previous study on cockatoos and orangutans which used a similar methodology. Therefore, one of the additions that the current paper brings to the literature is the use of a comparable methodology across a variety of species, including tool-using and non-tool-using ones.

The authors predicted that the crows would show a similar performance as the performance in the orangutans and cockatoos, but had no directed hypothesis about the performance of the human children. The key message that the authors draw from their results is that crows, human adults and children from 4 years of age can make profitable decisions in their studied tool-use context; however, crows show a decreased performance when they had to attend to several cues in parallel and 3-year-old children showed chance performance in almost all conditions. Lastly, the authors found a strong bias for selecting a tool, even if this choice was less efficient, across children of all ages.

Thank you for your comment. We have added a directed hypothesis concerning the children’s performance (line 136-137).

The article is appropriately structured and clearly presented. However, I have one small comment on one of the subheadings in the result section (“Performance in New Caledonian crows, Goffin’s cockatoos, orangutans, children and adult humans”) which is formulated a bit too vaguely (see below). The paper title does reflect the content and attracts attention. All in all, the article fits with the scope of PLOS ONE.

Thank you for your comment. The subheading has been changed to be clearer (line 497).

Please find my detailed comments below.

Introduction

The introduction lacks a little structure. Several concepts are introduced within the first page (effective decision making, self-control, delay of gratification, tool use, delay of gratification in tool use), which makes it hard to figure out what the primary focus of the current paper is. The overall area of interest is effective decision-making and it is stated that one aspect underlying this ability is self-control. The authors also mention other aspects such as work-effort sensitivity and attention to the functionality of the tools. Even though the latter two concepts are mentioned further below in the paper as well, the introduction suggests that the main focus of the authors is on self-control/delay of gratification. However, as the methods section shows, this is not the case as only some of the studied conditions are explicitly looking at delay of gratification. Others look at the perception of tool functionality and work-effort sensitivity. Therefore, the introduction could be rephrased in stating that several aspects of decision-making in a tool-use context are being investigated. Otherwise, if the focus of the introduction remains heavily on self-control, the choice of that many conditions (some of which do not involve delay of gratification) remains unjustified. This will imply that the authors give some more information on what we already know about work-effort sensitivity and attention to the functionality of tools in the studied species (especially crows and human children).

What I would also find interesting is some statement on how these different concepts of decision-making (self-control, work-effort sensitivity, sensitivity to tool functionality, etc) relate to each other.

What is also lacking at the end of the introduction or at the beginning of the methods section is an account of how the researchers are aiming to measure delay of gratification, work-effort sensitivity, and the sensitivity to the functionality of the tool. That is, here the different conditions/comparisons between conditions can be introduced (in the current paper, the conditions are just described in the methods section, but it remains somewhat implicit what the conditions are designed to measure).

These comments have been addressed throughout the introduction and more detail was added on the different measures that were used in the study.

More specific comments I had are:

Lines 58/59: “There is significant improvement by 4 years old and above (14), with some studies showing improvement between ages 3 and 5.” This statement is a bit unclear as it doesn’t become clear what the improvement looks like.

In addition, “There is significant improvement by 4 years” seems to describe almost the same as “improvement between 3 and 5”, and this redundancy is even more pronounced because the sentence before this one already states that self-control is present in toddlerhood and preschool age. Thus, I suggest rephrasing this sentence to state clearly which improvement the cited papers have shown and what the developmental trajectory looks like. Also note that there is not only evidence for an increase in self-control between 3 and 5, but also even beyond 5 years of age (see e.g., Jacqui A. Macdonald, Miriam H. Beauchamp, Judith A. Crigan & Peter J. Anderson (2014) Age-related differences in inhibitory control in the early school years, Child Neuropsychology: A Journal on Normal and Abnormal Development in Childhood and Adolescence, 20:5, 509-526).

We have clarified this sentence (line 61).

Lines 71/72: I am not entirely convinced by the authors’ conclusion that “no clear pattern has emerged between the self-control abilities of tool-using and non-tool-using species”. While it is true that no conclusion can yet be drawn from studies looking into self control in non-tool-using contexts (because – as the authors rightly state – species have rarely been tested on the same tasks), the studies on animals’ performance in tasks involving delayed gratification in a tool-use context (lines 86-100), might be somewhat clearer. It seems to emerge that animals of different species are able to delay gratification and to select a tool over getting an immediate reward regardless of whether the animal belongs to a tool-using species or not. The authors should discuss the possibility that the existing literature might suggest that there is no relationship between self-control abilities and being a tool user (or not) in delay of gratification tasks involving tool use. Alternatively, if the authors think the results of the previous studies are truly inconclusive, they should explain more clearly why they think that (if there is a reason other than that different tasks were used).

This sentence has been clarified and some information added (line 74-76).

A related point, what is missing before the paragraph starting in line 68 is a bit of theoretical background, i.e., whether researchers have had a priori expectations to find that tool-users have better self control (and if so, why they expect that).

We now include the background theory as to why tool-users may have better self-control abilities (lines 66-70).

Also related to this part in the text, I think it would be great to move the point stated in line 101 (that “very few studies have used comparable methodology”) further up, to line 71, i.e., the authors could say that no clear pattern has emerged yet, also because different methodology has been used. This is an important point and can be made way earlier in the text.

We now state the use of different methodologies earlier in the manuscript (lines 74-76).

Lines 105/106: “found differences between human and non-human species’ performances, which suggest that the task may not measure the ability that it is designed to test”. This sentence is a bit too vague – could you please specify what kind of differences were found? After all, you seem to describe not mere performance differences, but actually differences in how the task is approached/perceived by the subjects?

Relatedly, line 106, “may not measure the ability that it is designed to test”. While possible, this is not always true. Isn’t it more likely that when task differences occur (see the performance differences in the trap tube task which the authors mention) that the task is indeed measuring what it is supposed to measure in one species, but not in the other one? It seems that in most of the cases, the issue is that a task has been designed to measure an ability in one species and then fails to transfer rather than it not measuring the desired ability in either of the species. Thus, I suggest to rephrase the sentence to reflect this possibility.

This has been rephrased and clarified (line 110-119).

At some point in the introduction it might be nice to have a brief explanation of why we should know about decision-making abilities in a tool-use context specifically. While the authors describe that a tool-use context is special (by having more relational complexity and more work-effort), they fail to describe why it is interesting to investigate delay of gratification in a tool-use context. What insight can this give us on different species’ cognition? Is it because of the greater ecological validity? One could argue that if one is interested in self-control only, one should revert to non-tool-use tasks as these would not introduce another potential source of variance. I would like to see the authors countering this potential criticism.

A justification for this has been added to the introduction (line 66-76).

Lines 125/126: More information should be given on the different onset times of delayed gratification and tool functionality - the link to what has previously been investigated is rather short. Some more explanation here might also help to justify the choice of the age range, which is missing in the methods section (see below).

We added a justification for the chosen age range in the methods section (line 174-176).

Lines 126-128: Unlike for the non-human species, the authors state no directed hypotheses about children’s performance in the task. The authors should explain why not. Can one deduce from the previous literature on the different onset times of tool functionality understanding on the one hand and delay of gratification on the other hand help to formulate a hypothesis?

We have added a directed hypothesis for the children’s performance (line 136-137).

Methods

The methodology is sound. The statistical design seems mostly appropriate, however I have a few questions/comments on the design and analysis (see below). The procedures have mostly been described sufficiently, however, some details are missing that allow the study to be replicated (see below).

Line 110: Please provide a justification for why this age range of children (3-5) was chosen.

This was added in line 174.

Line 135: how was the sample size for the crows determined? Did the authors have a target number?

As the crows were wild caught, the number of individuals available for the study depended on how many crows we had for the season (always between 8 and 10 individuals) and which ones were reliable to work with and succeeded in previous training stages. This lowered the number to six reliable working crows. This is now stated in the text (line 155-157).

Line 139: The authors state that they were testing juvenile and adult crows. What is the implication of this? Is it assumed that juveniles already have the same abilities as adults? Why did the authors not decide to only test adult crows? Does a mix of juveniles and adults not matter? How much were practical issues decisive? More information on these questions should be given in order to help the reader understand whether having juveniles and adults in the sample is not an issue at all or should be seen as a concern.

We agree that ideally this study should have used enough adult and juvenile crows to make meaningful comparisons between the two age groups. However, as mentioned above, the crows were wild caught and only few animals were available, thus making it impossible to reduce the sample size further by differentiating by age. We have added more information in the methods section (line 154-157).

Line 142: can the authors add information on how big the caught family groups were?

We now state the sizes of the groups: “The birds were housed in small groups, consisting of two to four individuals per group, in a ten-compartment outdoor aviary, with approximately 7x4x4m per compartment, containing a range of natural enrichment materials, like logs, branches, and pine cones” (lines 157-160).

Line 156: how was the sample size for the child sample determined? Was there an a-priori sample size calculation?

We estimated that children would score roughly 80 percent correct across our three age groups of 3-, 4- and 5-year-olds. Based on this, we calculated with the G*Power software for binomial tests that we would need 28 children per age group. Effect size g = 0.3; α = 0.05; estimated sample size = 28 for a one-tailed binomial test per age group.

Line 160: how was the sample size for the adult sample determined?

We estimated that adults would score roughly 90 percent correct. Based on this, we calculated with the G*Power software for binomial tests that we would need 13 adults, which we exceeded by recruiting 20 adults. Effect size g = 0.4; α = 0.05; estimated sample size = 13 for a one-tailed binomial test.
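
The sample-size reasoning in the two replies above can be checked with a short sketch of an exact one-tailed binomial power calculation. This is an illustration under stated assumptions, not the G*Power procedure itself: it assumes a chance level of 0.5, takes the effect size g as the difference between the expected proportion and chance (0.5 + 0.3 = 0.8 for children, 0.5 + 0.4 = 0.9 for adults), and uses hypothetical function names. The target power is not stated in the reply, so the sketch only reports the power achieved at the stated sample sizes.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, p0, alpha):
    """Smallest k with P(X >= k | p0) <= alpha; returns n + 1 if no k <= n qualifies."""
    for k in range(n + 2):
        if upper_tail(n, k, p0) <= alpha:
            return k


def power(n, p0, p1, alpha):
    """Probability of reaching the critical value when the true proportion is p1."""
    return upper_tail(n, critical_value(n, p0, alpha), p1)


# Assumed chance level of 0.5; g is taken as (expected proportion - chance).
for label, n, p1 in [("children", 28, 0.8), ("adults", 13, 0.9)]:
    k = critical_value(n, 0.5, 0.05)
    print(f"{label}: n={n}, critical successes={k}, power={power(n, 0.5, p1, 0.05):.3f}")
```

Because the binomial distribution is discrete, the achieved significance level is usually below the nominal 0.05, and power does not necessarily increase smoothly with sample size.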

Line 171: would the authors consider moving figure S1 to the main manuscript? It would benefit the understanding of the experiment tremendously to have a picture of the apparatuses readily available. If not, I would like to ask the authors to move S1 to the beginning of their supplementary materials, before the result tables (just to stay in chronological order).

The figure S1 has now been moved to the main manuscript as Figure 1 (line 205).

Lines 173-181: the information that two different stick apparatuses were used for humans and NC crows should come before the description of the apparatuses; just makes this section easier to understand.

We have now moved the information as to why two different stick apparatuses were used before the description of the apparatuses, “We used two different ‘stick-apparatus’ for the humans and the crows, both apparatuses were functionally the same, as they required a stick to contact a reward and move it to the left or right. The minor variation in the apparatus structure was due to the testing equipment available in New Caledonia. The ‘stick-apparatus’ for the humans…” (lines 191-194).

Line 179: again, it would be nice if figure S1 was in the main manuscript rather than in the supplementary material.

Figure S1 is now in the main manuscript as Figure 1 (line 205).

Line 179: as the authors refer to S1a Fig, they should also refer to S1b Fig (e.g., in line 175).

We now mention Figure 1b (line 199).

lines 335/336: please add the explanation for why subjects were divided into two subsets with a different order of conditions.

The reason for the different order of conditions has now been added (lines 359-367).

347: please explain why an unusually low number of videos (10%) were coded for reliability – usually 20 – 25% are used?

10% is the usual number of videos coded for reliability in our field. We added information on how many individual trials were double coded and Cohen’s kappa (line 370-375).
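
For readers unfamiliar with the agreement statistic mentioned above, a minimal sketch of Cohen's kappa for two coders follows. The choice labels and codings are hypothetical illustrations, not data from the study, and the function name is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters coding the same trials."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of trials")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters coded independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four trials by two coders.
coder_1 = ["tool", "tool", "food", "food"]
coder_2 = ["tool", "tool", "food", "tool"]
print(cohens_kappa(coder_1, coder_2))
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than expected by chance, which is why it is more informative than raw percentage agreement.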

355-360: Why were no random slopes (for condition, apparatus, trial number) included into the model? According to Barr, Levy, Scheepers and Tily (2013) one should try to construct a maximal full model in order to keep type 1 error rate minimal. I suggest re-running the models and – if they lead to non-convergence – simplifying the models according to a pre-determined process.

References: Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.

We now include random slopes but had to exclude apparatus from the model for convergence reasons. Also see comments to Reviewer 1.
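
The reviewer's suggestion of a pre-determined simplification process can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: the drop order, the function names and the `fit` callback are assumptions, and the formula strings only mimic the (1 + Trial | Individual) notation used in the replies. In practice `fit` would wrap a real model-fitting call and report whether it converged.

```python
def random_part(slopes, group="Individual"):
    """Build a random-effects term such as '(1 + Trial | Individual)'."""
    return f"(1 + {' + '.join(slopes)} | {group})" if slopes else f"(1 | {group})"


def simplify_until_converged(slopes, drop_order, fit):
    """Start from the maximal structure; drop slopes in a fixed order until fit() succeeds."""
    remaining = list(slopes)
    to_drop = list(drop_order)
    while True:
        formula = random_part(remaining)
        if fit(formula):
            return formula
        if not to_drop:
            return None  # nothing left to drop and still no convergence
        term = to_drop.pop(0)
        if term in remaining:
            remaining.remove(term)


def fake_fit(formula):
    """Stand-in for a model fit: pretends only the simplest slope structure converges."""
    return "Condition" not in formula and "Apparatus" not in formula


result = simplify_until_converged(
    slopes=["Condition", "Apparatus", "Trial"],
    drop_order=["Apparatus", "Condition"],  # assumed order, fixed before fitting
    fit=fake_fit,
)
print(result)
```

Fixing the drop order before looking at any results keeps the final model from depending on which simplification happens to give the most favourable outcome.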

There is no report on model stability nor any model diagnostics checks. These analyses have to be run and included in the main manuscript or the supplementary material.

We were not able to find ways of calculating model stability and model diagnostic checks for binomial data with random effects. If the reviewer has a suggestion for how to do so, we are happy to proceed in the suggested way.

Line 358/359: There are two issues with age.

1. Usually in developmental studies on young children, when age is entered as continuous variable into models, this means that age in months is used. If children are assigned a year value (3, 4, 5) this is usually referred to as using age as a categorical variable. As I understand, in the current manuscript children were assigned a year value, which is equivalent to using a categorical variable. Why did the authors choose to label this a continuous variable? The variable with the three levels is arguably too coarse to be labelled continuous.

2. Why in the first place did the authors decide to bin age into the three levels? I suggest using the truly continuous variable (age in years and months) instead. Why binning one of the few continuous variables that developmental psychologists have and thus losing information? In addition, binning age implies that a child aged 3y11m is different from a child aged 4y1m, whereas a child aged 4y1m is put into the same category as a child with 4y11m. This seems unintuitive. It is suggested to rerun the analyses with age as a continuous variable.

We now use age in years as a decimal, calculated from the exact age of each child. Due to changes in the analysis, Tables S5 and S6 have been removed from the supplementary materials. Additionally, the analysis does not provide any additional insight into the performance, as the reviewer noted. We therefore decided to exclude the information completely.

Results

The results were partly not easy to understand. Some information in the text is overlapping with information given in table 1. It remained unclear why some results were presented in the main text and others were moved to the supplementary section. Table 2 is also rather difficult to grasp (see below).

We have clarified the results section and tables by removing overlapping information and changing the sub-headings.

Lines 430/431: relating to the comment above on the measurement level on age, it is unclear why the authors present the results on age for both age as a truly continuous (age in months) and a categorical (age in years) variable. The authors should consider using only the continuous variable from the start or to clearly justify their decision in the methods section.

We now only include the age as a continuous variable and excluded the rest from the MS and the Supplements.

Lines 431-432: Do the authors have any references in favour of excluding subjects in order to create more distinct age classes? To me, it seems that there is no reason to get rid of datapoints (especially given that the sample is not very large to begin with). In order to circumvent excluding datapoints, one could just go with age as a continuous variable for all analyses.

Thank you for the comment, we now only do this and do not exclude any datapoints.

Line 363: it is unclear which variables in the GLMM are the control variables. I assume that one is gender, but it’s unclear whether for the children age was treated as a control variable or predictor. Please specify.

We don’t use control variables but only predictor variables and random factors.

Line 364: the reference given after this sentence seems to be wrong or at least it is unclear how it relates. The reference “52” given below in the reference list is “Göckeritz S, Schmidt MF, Tomasello M. Young children's creation and transmission of social norms. Cognitive Development. 2014;30:81-95.” Please correct and check the correctness of the references throughout the text.

We now cite the correct paper, which is reference number 59.

Lines 365-367: It is not explained what was analysed using the two-tailed statistics; which comparisons were made? Please clarify.

We now clarify in the data analysis section that we calculated success rate for each condition and age class 3-5 years.

Line 367: Please explain why for non-parametric tests a different software was used.

We ran only the Wilcoxon test in SPSS because of applicability and data structure.

Line 369: could the authors clarify which question the exact two-tailed Binomial tests address? “assessing success rates” is quite vague and could also be achieved by studying the descriptive data.

We calculated whether children were performing above chance within each age class and condition.
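
A minimal sketch of an exact two-tailed binomial test against chance is given below. It assumes a chance level of 0.5 and the common definition of the two-tailed p-value as the summed probability of all outcomes no more likely than the observed one; the counts in the example are hypothetical and are not taken from the study. Each count should represent one independent observation, in line with the pseudo-replication concern raised by Reviewer 1.

```python
from math import comb


def binomial_two_tailed(successes, n, p=0.5):
    """Exact two-tailed p-value: total probability of outcomes no more likely than the observed one."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance so that ties are not lost to floating-point rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical example: 22 correct choices out of 28 independent observations.
p_value = binomial_two_tailed(22, 28)
print(f"p = {p_value:.4f}")
```

With a chance level of 0.5 the distribution is symmetric, so this definition coincides with doubling the smaller one-tailed probability.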

Lines 382-387 and following (e.g. lines 411-412)/Table 1: Would the authors consider adding to Table 1 the performance for each apparatus separately? In the text, the data for both “across apparatuses” and “for each apparatus separately” are presented together, and in between there are references to the table. It was a bit odd that the table would only present part of the data.

We moved references to Table 1 in the text so they are less confusing, but have kept Table 1 as it was to give an overview of the most important data (performance in each condition).

Lines 382-407 seem to describe in large parts what is displayed in table 1 and Figure 2. This is repetitive and the authors should consider presenting these results in a more concise manner, e.g., by cutting some parts of the text. In any case, I would recommend removing those sentences explaining which was the correct choice in which condition. This was already explained in the methods section and in Figure 1.

We have made this section less redundant by removing parts of the text.

Line 426: since the interaction between age and condition is significant, the main effect of age is not interpretable (as it is dependent on condition) and should thus not be presented. Instead, only the interaction effect should be reported.

We now only present the interaction term.

Line 428: “success rate increases with age” – this sentence implies a main effect of age, but this is not what was found, Please rephrase so that the text correctly describes the relationship between age and condition. For this, displaying the results visually would be beneficial for the understanding of the reader.

We are now more specific about the result.

Line 430: Why was the interaction not further investigated? The methods section mentioned the posthoc tests using the multcomp package, but the results from these analyses don’t seem to appear in the results section. Please add.

We have now added the multcomp results to the supplement.

Lines 433-434. “Across all conditions, only the 4 and 5-year old children selected correctly above chance across all trials”; later on the authors explain “3 to 5-year olds did not select correctly above chance in the tool functionality condition.” These two sentences are contradicting and the first sentence is potentially misleading. It suggests that 5-year-olds performed above chance in all conditions, while a look into the table and the second sentence exclude the tool functionality condition. To avoid a misunderstanding and to highlight the fact that the results are not as clear-cut as the first sentence might suggest, this first sentence should be modified accordingly to accurately describe the data.

This sentence was reworded, as it referred to the analysis of all conditions combined, not individual analyses of each condition (line 459).

Table S7: would the authors consider including the data displayed in this table into Table 2 in the main text? This way, the performance in all conditions could be more easily compared by the reader.

Table S7 was moved to the manuscript and is now Table 3.

Table 2: Instead of displaying the uncorrected p-values and then adding “NS” if the values got insignificant after the Bonferroni correction, would the authors consider just presenting the p-values obtained after the correction? This would make it a lot easier to quickly grasp the results pattern when looking at the table. What benefit does including the non-corrected values have?

Thank you for the suggestion, but we prefer to keep this information in the table so that the reader can fully understand the data.
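
The two presentations discussed here, uncorrected p-values with a significance flag versus corrected p-values, can be compared with a short sketch of the Bonferroni correction. The p-values are hypothetical and the function name is an assumption; the correction multiplies each p-value by the number of tests and caps the result at 1.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values from three tests.
uncorrected = [0.004, 0.02, 0.3]
corrected = bonferroni(uncorrected)

for raw, adj in zip(uncorrected, corrected):
    flag = "" if adj < 0.05 else " (NS after correction)"
    print(f"uncorrected p = {raw:.3f}, corrected p = {adj:.3f}{flag}")
```

Reporting both columns side by side, as in this printout, is one way to keep the uncorrected values available while making the corrected pattern easy to read.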

Lines 467-477 and Table 3: Why is children’s performance in the tool functionality condition displayed collapsed across the two subconditions? It was found that children indeed chose correctly when the tool was functional, so the conclusion that children performed poorly in the tool functionality condition might not accurately describe their ability to perceive affordances and use tools.

Due to suggestions by Reviewer 1 Table 3 was removed from the manuscript. We discussed children’s performance in the tool functionality test in the discussion section. While children picked the functional tool when the choice was between the tool and a low quality reward in the “functional” sub-condition, they also picked the tool when it was non-functional, indicating that they did not understand or take into account tool functionality when making their choice, but made it based on reward quality alone.

Discussion and conclusions

Line 497: it is stated that the study showed that the studied species were able to make profitable decisions in tasks requiring delayed gratification in a tool-use context. This phrasing suggests that all studied conditions involved a delay of gratification component, which does not seem to be the case. The authors should carefully rephrase this sentence at the beginning of this study; this is an important sentence which many readers will read first after the abstract. As it is formulated at the moment, the sentence suggests that the study was a delay of gratification /self-control study, which I wouldn’t have called it. The subsequent sentence indeed states that the studied populations “were able to flexibly select between reward items of differing quality and tools […] relative to the context of each condition.” – this has nothing to do with delay of gratification and shows that the focus of the paper was broader (effective decision-making). This is also evident by how the following sections in the discussion are structured: First the results on tool functionality are presented, then on motivation, and only then the delay of gratification part is presented. The authors need to be clearer in describing whether self-control/delay of gratification is their main focus of the paper or just one of several (and if there are several aspects, are the other ones related to the requirements of effective decision-making or are these aspects on the motivations and perceptions of affordances in tool-use contexts?). This is unclear in both the discussion and the introduction and needs to be more streamlined.

We have made changes to both the introduction and discussion in order to make the focus of this study clearer. Specifically, we changed the first paragraph of the introduction to include work-effort sensitivity and effective decision-making (line 52-53).

Lines 514/515: the authors state that children failed in the tool functionality condition. However, if I understood the results section correctly, the results could be differentiated between the two sub-conditions (i.e., when the tool was functional vs not functional). This is an interesting finding and should be acknowledged in the discussion. The way this is currently formulated does not represent the data accurately (very broad brush).

We have added additional information and discussion about the tool functionality condition (line 619-622).

Lines 522-527: Can the authors elaborate on why the “birds’ performance at the four tasks that the 3-year olds struggled with, and particularly the motivation condition, which 4- 5-year olds and orangutans also struggled with” suggest that the cognitive demands of these conditions were “non-trivial”? Could these results also just be explained by a strong bias of choosing the tool in the children, which would not comment on whether or not the cognitive demands are difficult (i.e., it is about motivation, not cognition)?

This has been changed.

Lines 533-534: “4-5-year olds, though not 3-year olds, appear able to overcome the desire to immediately select the tool”. This might suggest that 3-year-olds lack the ability to overcome this desire. However, the authors should consider that the reason why the 3-year-olds struggled could be that they perceived the task differently, for example as a task in which they try to explore the novel materials and tools instead of one in which getting the reward was the primary goal.

This has been added (line 568-571).

Line 555: another reason why children performed badly at the tool functionality task could not only be the “drive to give it a go”, trying to make the non-functional tool work, but it could also be that the children know well in advance that the tool will not work, but select it because it is fun manipulating the tool and exploring the apparatus with it. Could the authors provide information on what the children were attempting to do with the tool (did they just select it, were they trying to actually get the reward?) or were children not given the chance to try out the non-functional tool in the test?

This has been added (line 573-577).

In the result section it was found that performance sometimes differed between the two apparatuses. This should be at least mentioned in the discussion, preferably also discussed.

This has been added (line 647-651).

The authors provide possible explanations of why there would be a difference between crow and cockatoo performance, but an explanation of the difference in the conditions in which the 3-year-olds struggle and the birds succeed does not seem to be given. The authors should also comment on this difference.

We have extended the paragraph comparing child and crow performance (line 551-560).

The authors have presented many different potential explanations for the findings in the various conditions. However, given the emphasis of self-control in the introduction, could the authors make a concluding statement in the discussion on what was learned with regard to the self-control abilities of the studied groups, if anything at all can be said with certainty?

References to self-control have been added in the conclusion.

It would be nice if the authors could make a bit clearer in the discussion that xyz was their main focus and abc is what they found. At the moment, the discussion is rather driven by what has been found, e.g., in lines 574 and following, the results from the tool selection quality allocation condition are presented as the most prominent finding and for the first time the ability to process information is discussed, which doesn’t seem to be a strong focus of the paper at all.

We have clarified the focus of the paper at the start of the discussion (line 524-527).

Minor issues:

- Line 50: comma before „and the quality”

A comma has been added before “and the quality” (line 52).

- Line 51: I suggest to start a new paragraph for the sentence beginning with “One aspect that underlies”; that way, the first paragraph describes decision-making, whereas the second one focuses on self-control/delay of gratification, and will help the reader follow the structure and arguments of the paper.

We now start a new paragraph with “One aspect that underlies…” (line 54)

- Line 58: another good review article for the development of inhibitory control in children, which the authors could refer to, is: Garon, N., Bryson, S. E., & Smith, I. M. (2008). Executive Function in Preschoolers: A Review Using an Integrative Framework. Psychological Bulletin, 134(1), 31–60.

This reference has been added.

- Lines 64-67: The word “therefore” seems to be misplaced. “Therefore” seems to imply that the argument that decision-making involving tool use may require increased levels of complexity results from what has been described in the previous sentence; however, this is not the case. Rather, there seem to be two separate issues involved in decision-making involving tool use: 1) tool use requires delay of gratification and 2) tool use requires an increased level of relational complexity, as well as work-effort. Relatedly, the next sentence starting with “For example” doesn’t seem to be an example of what has been explained in the previous sentence. How is the need to pay attention to tool functionality linked to an increased relational complexity? I understand that paying attention to the tool per se (compared to only paying attention to the reward) increases relational complexity, but this seems to be regardless of whether or not attention is paid to tool functionality. Neither does the given example fit to the statement that tool use may require additional work-effort. I suggest that the authors rephrase this or make it clearer in how far their example represents either of the two issues mentioned in the previous sentence (increased relational complexity or additional work-effort).

This section has been rephrased (line 66 onwards).

- Lines 68/69: “has been found in various non-human species, including primates and birds” – the nomenclature is not quite correct here – the authors use the label “species” and when they continue with their examples (birds, primates) the phrasing seems to suggest that primates and birds are labels for species. Please try to be as scientifically correct as possible and try to rephrase.

This has been rephrased (line 71-73).

- Lines 77 and 79: “tool-making” – this is the first time the authors use the label “tool-making” – they started the paragraph differentiating between tool-using and non-tool-using animals and then there is the word “tool-making”. It doesn’t become clear whether these labels are used interchangeably (they shouldn’t). Is there a reason for the switch in the labels? If so, this has to be explained, as otherwise this leaves the reader potentially confused. If there is no reason, I suggest using just one label throughout the manuscript (presumable “(non-) tool-using” as the paper focuses on tool use, not on tool manufacture) in order to not introduce any ambiguity or confusion.

The paragraph was restructured to define the terms tool-using and tool-making before referring to them (line 81-86).

- Line 84: please insert “;” after “i.e.”

This has been changed.

- Line 110: comma after “humans”?

- Line 144: comma after “branches”?

We did not use Oxford commas throughout the manuscript and have thus not added these commas.

- Line 122: “require” instead of “requires”?

- This has been changed.

- Line 170: remove comma in “box, was”

- This has been changed.

- Line 170: comma after “i.e.”

- This has been changed.

- Line 214: “Supplemental Video” – please specify which video (2 in this case)

This has been added.

- Lines 351/352: please provide references for the R packages

- References have been added.

- Line 367: “data were” instead of “data was”

- This has been changed.

- Line 372: “in S1 Table” instead of “i S1 Table”

- This has been changed.

- Lines 377/378 and 425-427: replace “X” with “χ”

- This has been changed.

- Line 378 and 426-427: italicize “p”-values

- Line 378: insert space between “<” and “0.001”

- Lines 378/379 and 426-427: remove the first zero in “0.001” – as the value is bound between 0 and 1, there does not need to be a 0 before the decimal point

- Table 1: remove zeros before the decimal points

These changes to the p-values have been made throughout the results section.
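
The p-value formatting conventions requested in the items above (no leading zero, three decimal places, a space after “<”, and “< .001” as the smallest reported bound) can be sketched in a few lines of Python. This is only an illustration of the convention: the function name `format_p` is hypothetical and is not taken from the authors' analysis scripts, and italicizing “p” is left to the typesetting step.

```python
def format_p(p: float) -> str:
    """Format a p-value without a leading zero, to three decimal places.

    Hypothetical helper illustrating the reviewer's requested convention;
    it is not part of the authors' analysis code.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        # Values below the reporting floor are given as a bound,
        # with a space between "<" and the number.
        return "p < .001"
    text = f"{p:.3f}"          # e.g. "0.042"
    if text.startswith("0"):
        text = text[1:]        # drop the leading zero: ".042"
    return f"p = {text}"


if __name__ == "__main__":
    for value in (0.0004, 0.042, 1.0):
        print(format_p(value))
```

A value that rounds to exactly one keeps its leading digit (“p = 1.000”), since only a leading zero is removed.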

- Table 1: It is not explained what “S1&2” stands for

- This has been added.

- Lines 387-392: “For the motivation condition, the correct choice was the immediately available most preferred reward over the functional tool, with the same most preferred reward inside the apparatus. For the quality allocation condition, the correct choice was the immediately available most preferred reward over the tool when the least preferred reward was inside the apparatus, or the tool over the least preferred reward, when the most preferred reward was inside the apparatus.” This was already explained in the methods section and it’s unclear why this is repeated here. Delete.

- This section has been removed.

- Table 2:

o please remove the zero before the decimal point all in p-values.

o Insert a space between “<” and the respective p-values

o The authors write “<.0001” several times. The convention is to use .001 (3 decimal places) and the authors should consider changing this.

o In the column “tool functionality”, “P” should be “p”

These changes have been made.

o Why are some p-values italicized? This is not explained in the figure caption and needs to be added.

We have now italicized all p-values throughout the manuscript.

- Line 451: “-“ after “3” and “4”

- This has been changed.

- Line 457: “-“ after “3”

- This has been changed.

- Lines 463-464: This heading is too vague. Isn’t this section more about a comparison? The performance of the crows and the humans was already presented in the previous sections, so labelling this section “performance in NC crows, …children, and adult humans” is potentially confusing. I suggest rephrasing the heading so that it is more specific and describes exactly what the following section is about

- This heading has been changed to “Comparison with previous studies in Goffin’s cockatoos and orangutans”

- Lines 456-466: “we illustrate the performance…with that” – this seems grammatically incorrect. Better: we compare performance…with that?

- This has been changed.

- Line 497: remove comma after “humans”

- This has been changed.

- Line 519: “succeed” instead of “success”

- This has been changed.

- Line 526: “species” instead of “species’”

- This has been changed.

- Line 539: comma after “mentality”

- This has been added

- Line 584: “three” instead of “3”

- This has been changed.

- Line 684: the “S” in “Psychological science” should be capital letter

- Supplementary Material S3 and S4 Table: In the caption of both tables, replace “models” with “model”

- This has been changed.

Decision Letter 1

Juliane Kaminski

12 Feb 2020

Decision-making flexibility in New Caledonian crows, young children and adult humans in a multi-dimensional tool-use task

PONE-D-19-18492R1

Dear Dr. Frohnwieser,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

As you will see there are some minor comments from one of the reviewers regarding some typos etc. I think your manuscript will benefit if you attend to these comments.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Juliane Kaminski

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed all my previous comments. I recommend publication pending some minor adjustments (see below):

Line 222: "The ‘stick-apparatus’ for the humans was like [50]": consider rephrasing this sentence.

Lines 417 and 423: “we included the random effect of subject 1+Trial+Condition|ID”: I would suggest something along the following lines: “we included the random effect of subject ID and the random slopes of trial and condition within subject ID”

Lines 422 and 428: “random effect” (not “random factor”)

Line 502: why is a z statistic reported here? It does not match the z-value in Table S4. It should probably be the chi-squared value.

Table S4-1 is insufficiently described and labelled. I did not understand which comparisons were made in the different rows.

Table S3-1: layout could be improved.

Table S3 and S4: empty tables are still included

Reviewer #2: The authors address all comments in a satisfying manner and I recommend this manuscript for publication. I have a few remaining minor issues (line numbers relate to the track-changed manuscript):

Introduction, line 135: “decisions in across five conditions” – delete either “in” or “across”

Introduction, line 146: “each require” – not entirely sure about this, but the word “each” seems to imply singular, therefore “require” should be “requires”?

Introduction, lines 149-152: “delayed gratification and tool functionality understanding appear to develop in children at different ages [45, 46], but have not previously been tested simultaneously. Thus, we expected children’s ability to solve these tasks to increase with age.” The second sentence (stating that an increase in task performance is expected with age) does not necessarily follow from the first (that delayed gratification and tool functionality understanding start developing at different ages). In order for this conclusion to make sense, the authors would need to add that while delayed gratification and tool functionality understanding emerge at different ages, there is an increase in these skills over childhood.

Methods, line 219: “two different “stick apparatus” – “apparatus” needs to be changed to plural, “apparatuses”

Methods, line 279 and 286: the authors use “step 1” in the former and “Step 1” in the latter sentence – needs to be consistent

Methods, line 394: “received the following order of” – maybe “following” not needed as the order is given in the same sentence?

Analysis, lines 423 and 428: The random effect for child is once labelled “ID” and once “individual” – should be consistent

Analysis, line 452: “for multiple comparison” – “comparison” needs to be changed to plural “comparisons”

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Juliane Kaminski

21 Feb 2020

PONE-D-19-18492R1

Decision-making flexibility in New Caledonian crows, young children and adult humans in a multi-dimensional tool-use task

Dear Dr. Frohnwieser:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Juliane Kaminski

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Number of correct trials for all constellations of the tests for the crows for each individual.

    In the tool selection and apparatus functionality condition the total number of trials (in brackets) varied between individuals, depending on when they reached the criterion. P-values calculated from exact binomial tests. Significant p-values highlighted in bold. MPR = most preferred reward. Sessions 1 to 2 in the tool selection and tool selection quality allocation condition are for comparability with the Goffin’s cockatoos; * p < 0.05, ** p < 0.01, *** p < 0.001.

    (DOCX)

    S2 Table. Crow subject information.

    (DOCX)

    S3 Table. Generalized linear mixed models on factors affecting the number of correct trials in crows.

    N = 6. Significant p-values are highlighted in bold.

    (DOCX)

    S4 Table. Posthoc comparison of conditions of crow data with Tukey correction for multiple comparison.

    (DOCX)

    S5 Table. Generalized linear mixed models on factors affecting the number of correct trials in children aged 3–5 years, with age in years.

    N = 88. Significant p-values are highlighted in bold.

    (DOCX)

    S6 Table. Posthoc comparison of conditions of children data with Tukey correction for multiple comparison.

    (DOCX)

    S7 Table. Performance across all conditions for the crows with each apparatus singly.

    Results reflect results of Wilcoxon 1-sample signed ranks tests–chance value = 50%. Significant p-values (<0.05) highlighted in bold.

    (DOCX)

    S8 Table. Comparison of performance within conditions between crows and cockatoos.

    Results reflect Mann Whitney U-tests. Significant p-values highlighted in bold.

    (DOCX)

    S1 Video

    (MP4)

    S2 Video

    (MP4)

    Attachment

    Submitted filename: Review for PONE-D-19-18492.docx

    Data Availability Statement

    The full data set is available on Figshare: https://figshare.com/s/8e1670783219ddc4561e


    Articles from PLoS ONE are provided here courtesy of PLOS
