Author manuscript; available in PMC: 2024 Sep 10.
Published in final edited form as: Dev Sci. 2023 Sep 4;27(2):e13445. doi: 10.1111/desc.13445

Controlling the input: How one-year-old infants sustain visual attention

Andres Mendez 1,2, Chen Yu 3, Linda B Smith 4
PMCID: PMC11384333  NIHMSID: NIHMS2021167  PMID: 37665124

Abstract

Traditionally, the exogenous control of gaze by external saliencies and the endogenous control of gaze by knowledge and context have been viewed as competing systems, with late infancy seen as a period of strengthening top-down control over the vagaries of the input. Here we found that one-year-old infants control sustained attention through head movements that increase the visibility of the attended object. Freely moving one-year-old infants (n=45) wore head-mounted eye trackers and head motion sensors while exploring sets of toys of the same physical size. The visual size of the objects, a well-documented salience, varied naturally with the infant’s moment-to-moment posture and head movements. Sustained attention to an object was characterized by tight control of head movements that created and then stabilized a visual size advantage for the attended object. The findings show a collaboration between exogenous and endogenous attentional systems and suggest new hypotheses about the development of sustained visual attention.

Keywords: infancy, sustained attention, salience, top-down control


After their first birthday, infants make considerable strides in withstanding distraction (e.g., Rosen, Amso, & McLaughlin, 2019; Rothbart et al., 2011). One context used to measure attentional control is active toy play (Ruff & Lawson, 1990; Wass et al, 2018; Yu & Smith, 2016). Like many other everyday contexts, the visual world of toy play is busy with many potential targets. Within this context, sustained visual attention, measured as the frequency of looks to a single target that last multiple seconds, predicts in-task learning (Yu & Smith, 2012; Yu, Suanda & Smith, 2019). Sustained attention has also been related to later developments in attention and self-control (Brandes-Aitken et al., 2019; Fisher, 2019; Johansson et al., 2015; Ruff et al., 1990). By some accounts, infants’ everyday visual attention constitutes a training ground for the prefrontal cortex and the neural circuitry of executive functions (Badre & Nee, 2018; Rosen et al., 2019). For these reasons, infant attention in active, free-flowing contexts has become an important empirical target for understanding the development of self-regulatory processes (Brandes-Aitken et al, 2019; Thompson & Steinbeis, 2020).

Two attentional systems

Visual attention is generally understood as controlled by two competing neural systems: exogenous and endogenous (Baluch & Itti, 2011; Colombo & Cheatham, 2006; Desimone & Duncan, 1995; Wass et al, 2018; Wolfe, 2020). The exogenous system guides gaze direction to targets based on low-level stimulus saliences such as contrast, luminance, visual size, and motion (Itti et al., 1998; Proulx & Egeth, 2008; Wolfe & Horowitz, 2017). These attention-grabbing saliences are often discussed in terms of their evolutionary value as alerts to possible danger (Itti & Koch, 2001). In contrast, the endogenous system guides gaze direction to targets based on context and the perceiver’s goals and is often driven by well-learned cues that rapidly direct attention to context-relevant information (Rossi et al., 2009; Werchan & Amso, 2020). The exogenous system is often referred to as “bottom-up” because saliences in the input itself drive gaze. The endogenous system is referred to as “top-down” because of the role of the prefrontal cortex and experience-based cues in guiding attention.

Much of what is known about these two systems – in infants (e.g., Rosen et al, 2019; Werchan & Amso, 2020) and in adults (e.g., Itti & Koch, 2001; Rossi et al, 2009) – derives from well-controlled experiments in which brief discrete trials pit cued targets (top-down signal) and salient distractors (bottom-up competitor) against each other. In contrast, in the free-flowing and extended temporal contexts of everyday attention, including infant object play, saliences and attentional goals are uncontrolled. They may sometimes compete but also may redundantly organize attention to the same target. Thus, in studies of infant attention during free-flowing object play, the main experimental focus has not been on the details of competition and relative saliences but on the duration of visual attention to a single object; in this context, it is this sustained attention, lasting multiple seconds, that predicts learning (Yu et al, 2019) and longer-term outcomes (e.g., Ruff et al, 1990; Yu et al, 2019). The predictive value of sustained attention in active vision makes sense: a long multi-second look to a single toy requires the infant to withstand any emerging distractions.

A closed-loop control hypothesis

Active everyday visual attention also differs from typical laboratory experiments in the role of body movements (Luo & Franchak, 2020). Most contemporary experiments present stimuli as 2-dimensional arrays on a screen and constrain body movements other than eye movements (see Fisher, 2019). In everyday attention tasks, perceivers freely move in a dynamic 3-dimensional space. The geometry of the 3D world and the body mean that the perceiver’s movements directly influence the image received by the retina. Small head and posture adjustments can have large effects on the image received at the retina (Ellis et al., 1989), particularly on the spatial layout of the objects in the image. Changes in vantage point can bring partially occluded objects into full view or totally obscure competing objects. Moving the head closer to a target of interest makes that target visually larger and can also block the view of a competitor object. In brief, changes in the spatial relation between the perceiver’s head and objects in the world directly determine the visual sizes of potential targets for attention.

The link between body movements and the visual sizes of objects in the input image is relevant to sustained attention during object play because visual size increases the visibility of the target. Visual size is also a well-documented bottom-up salience that powerfully attracts gaze in laboratory studies of adult attention (e.g., Anderson, Seemiller and Smith, 2022; Borji, Sihite & Itti, 2013; Wolfe & Horowitz, 2017) and appears to do so in infants as well (Cohen, 1972; Guan & Corbetta, 2012). During free-flowing toy play, uncontrolled infant body movements could create volatile variability in the visual sizes of potential targets making sustained attention more difficult. In contrast, controlled body movements by the infant could be used to create and control the visual sizes of targets, giving the selected target a salience advantage over competitors. In brief, if infants behaviorally control the spatial relation between the eyes and a selected target, they could behaviorally control the visual sizes of the target and potential competitors and, in so doing, limit distraction in the service of sustaining gaze on the selected target.

This control-the-input-to-control-attention hypothesis aligns with Ruff’s (Ruff & Lawson, 1990) seminal observations on focused attention in young children: she noted that focused multi-second concentration on an object was often accompanied by a stilled body and head with the target of interest close to the head (see also, Yu & Smith, 2012). The hypothesis is also consistent with evidence showing that infants and young children shift posture and move their heads as well as eyes when they look at targets of interest (e.g., Borjon, Abney, Yu & Smith, 2021; Kretch et al, 2014; Luo & Franchak, 2020). Other research suggests that infants (and their parents) handle objects in ways that may also systematically increase the visual size of the object (e.g., Anderson et al, 2022; Burling & Yoshida, 2019; Smith et al, 2011; Yu & Smith, 2012). However, there is as yet no evidence that directly links sustained gaze during infant object play to the visual size of the selected target, nor is there evidence that infants behaviorally control the visual sizes of the target and distractors during sustained attention. Here, we provide this evidence.

One might ask whether a system that controls attention by controlling sensory input counts as a form of top-down control. One might also ask how controlling the input to control attention relates to the development of top-down and covert control of attention, as evident in adults (Posner, 2008). We will consider these larger theoretical issues in the general discussion in light of the empirical findings. However, we note that the proposed mechanism posits a collaboration between the endogenous and exogenous systems in the form of a closed-loop control system: a circular process through which behavior generates momentary input and behavior is continually controlled to maintain a property of the visual input. Such closed-loop solutions to attentional tasks are common in many biological organisms other than humans (Biswas et al, 2018; Hoffman et al, 2013; Kleinfeld et al, 2006; Taub & Yovel, 2020) and have also been shown to be influenced by context and learning, which could be interpreted as “top-down” control (Bechtel & Bich, 2021; Biswas et al, 2018).

Rationale for the empirical approach.

We measured sustained attention during toy play in one-year-old infants because this is the age at which sustained attention during toy play is first systematically observed (Lansink, Mintz, & Richards, 2003; Ruff & Capozzoli, 2003). Our preliminary studies, as well as studies by others, indicate that infants at this age show less sustained attention when playing alone than with a known social partner (Wass et al, 2018). Accordingly, the infant’s caregiver was present and told to act naturally with their infant during the four test trials. Our preliminary studies also indicated that infants’ sustained engagement with toys was most frequent when toys were first introduced. Accordingly, we used very brief toy-play trials, each 45 seconds in duration, the same play period used in prior studies in which infants explored toys one at a time with no competing toys (Pereira et al, 2010). Here we added competitors (3 toys for play) on each trial.

Infants wore a hat outfitted with three head-mounted sensors: a scene camera that captured the image projected from the world to the head, an eye camera that tracked gaze within the head-camera scene, and a motion sensor that captured the movement of the head relative to the world. The toys used for the play trials were all of comparable physical volume. Thus, differences in the visual size of the objects in the scene-camera images were determined solely by the vantage point and proximity of the infant’s head to the toys in play. The principal analyses focus on the visual sizes of the toys in the infant point-of-view (POV) image. There are many other bottom-up saliences that attract human gaze to a target (Itti, Koch, & Niebur, 1998; Wolfe & Horowitz, 2017); however, many of these (e.g., contrast) are not directly controllable by the perceiver through momentary body movements (see Anderson, Seemiller, & Smith, 2022). Visual size, directly affected by the momentary spatial relation of the head to the object, provides a tractable and known bottom-up salience. Therefore, we analyzed the visual sizes of objects during sustained attention to provide a first test of a closed-loop control solution to attention in a busy and changing visual world.

The main empirical hypothesis tested is this: if infants behaviorally control the visual size of a selected target to sustain attention, then sustained gaze should co-occur with a behaviorally maintained visual size advantage for the target over competitors. Accordingly, the main analyses focus on the onset, duration, and offset of increases in visual size as a function of the onset, duration, and offset of gaze during sustained attention. The additional critical question is how control of the input during sustained attention is accomplished. Because the spatial relation between the eye and the object in the world determines the visual size of the object in the infant’s view, the most direct path to creating and then maintaining a visual size advantage for a selected target is through head movements and head stabilization. Accordingly, we also measured the velocity of head movements with respect to the onset, duration, and offset of sustained gaze and the visual size of the target.

Parents often scaffold infant attention to objects, and the degree to which they do so predicts the amount of infant sustained attention in the task (e.g., Suarez-Rivera, et al, 2019; Sun & Yoshida, 2022) as well as longer-term outcomes in executive control and self-regulation (see Marciszko et al, 2020; Rosen et al, 2019). We do not directly measure this scaffolding during our brief 45-second play trials. However, we report analyses of parent as well as infant handling of objects in relation to infant sustained attention. In the general discussion, we consider the implications of the present findings for how parent scaffolding may relate to infant control of visual attention through infant control of the input.

Methods

Experimental procedures

Participants.

Forty-five infants (25 male, mean age 14.1 months, SD = 1.4) met the inclusion criterion of contributing head-mounted eye-tracking data for all trials. An additional 15 recruited infants contributed no data because they did not contribute eye-tracking data for all four trials. Thirty of the 45 infants contributed both eye-tracking and head motion data. Failures to contribute head-mounted eye-tracking data for all 4 trials were due to infant fussiness, removal of the headgear by the infant during the trials, refusal to wear the headgear from the beginning, or equipment failure. Failure to contribute motion data (while contributing eye-tracking data) was based on the experimenter’s decision during data collection. Before starting the experiment, the experimenter needed to place the headgear and then adjust the eye camera, head camera, and motion sensor, typically in that order. Multiple adjustments of the headgear invited infant hand actions directed to the headgear in the moment or caused later removal of the gear by the infant. Accordingly, the goal was to adjust the motion sensor in one or two attempts. If the motion sensor could not easily be adjusted properly, the experiment proceeded with the collection of eye-tracking data but not head motion data. The infants were recruited from an opt-in database for developmental research. Participants in the database broadly represent the demographics of Monroe County, Indiana, but over-sample minority groups (68% European American, 10% African American, 7% Asian American, 7% Latino, 7% Mixed race) and consist predominantly of working- and middle-class families. Recruitment and procedures were approved by the Institutional Review Board of Indiana University (protocol number 0808000094).

Recording Equipment.

The head-mounted eye tracker (Positive Science LLC, http://www.positivescience.com) included an infrared camera mounted on the head and pointed at the infant’s right eye, which recorded eye images, and a scene camera that captured events in the world from the infant’s point of view (Figure 1A). The scene camera’s visual field was 108° diagonal. Together, the eye and scene cameras yielded the gaze location (x and y) within the scene at a sampling rate of 30 Hz. A wired motion capture sensor (Polhemus Liberty) was affixed to the hat at the infant’s right temple and collected rotational position data (roll, pitch, and yaw) at 60 Hz. An overhead bird’s-eye camera also provided good views of hand actions by the infant and parent.

Figure 1. The visual size of objects in the infant view.


A. Infant in head-mounted eye-tracker. B. Proportion of all frames in the corpus with 1, 2 or 3 of the toys present (at least 1 pixel) in the image. Dots show the proportion of images with 1, 2 or 3 toys present for individual infants. C. Corpus mean relative visual size (RS) of the largest toy and the two other toys in the image; dots show individual participant means. D. Corpus mean of absolute visual size (AS) for the largest and two other toys; dots show individual participant means. E. The relative (RS) and absolute (AS) visual size of all toys present (420,325) in the corpus of (on-task) frames. The largest object in 4 example head-camera images is labeled near its RS by AS coordinates in the scatterplot.

Play context and toys.

Infants and parents sat across from each other at a small table (61cm × 91cm × 64cm). The infant sat alone on a chair that allowed leaning and rotation of the body and torso. The infant (and parent) wore a white smock, and the walls and table were painted white. This setup served two purposes: first, the all-white background dampened in-room distractors beyond the toys themselves; second, the white background aided the algorithmic measures used to determine the presence and visual sizes of the toys in the infant head camera images.

Each infant played with three toys on each of the trials. The three toys for each trial were each a different uniform color: blue, green, or red. The toys were novel, designed by the experimenters, and constructed of various materials and moveable parts to engage 1-year-old infants. The volume of each toy was 300 cm³. Different infants played with different sets of 6 toys. In total, across the 45 infants, 18 different toys were used. See Figure 1E for examples.

Procedure.

The headgear was placed on the infant and adjusted while an experimenter and parent kept the infant occupied with a toy with a spinning light. This attention-grabbing toy was selected to calibrate the eye-tracking system and was not used in the experiment. On average, fifteen calibration points at different locations on the play table were collected; the experimenter directed the infant’s attention toward the toy while a technician recorded the attended moment used in later off-line eye-tracking calibration.

Parents were told that the goal of the experiment was to study how infants explored novel objects and to interact with their infant as they normally would. Each infant received two unique sets of 3 toys, each set presented twice across the four trials in the order ABAB, with each infant’s two sets randomly assigned as set A or B. Trial onsets were about 1.5 minutes apart; this made for smoother transitions in play from one trial to the next. Onset was defined as the time point at which all three objects for that trial were on the play table. Only the first 45 sec after onset was considered the trial proper and included as data.

Data processing and coding

Objects in the image.

The presence and image size of the objects in the infant head camera image were algorithmically determined using computer vision techniques (Smith, Yu & Pereira, 2011; Yu et al., 2009). The algorithm delivered the number of pixels present for each toy object in the image. The validity of the automatic coding results was assessed by asking two human coders to annotate a small proportion of the data (~ 1200 frames); the comparison of hand coding with the image processing results yielded 91% frame-by-frame agreement across all measures.

Measures of visual size.

Because the objects were the same physical size, the visual size of the objects in the head-camera images depended on occlusions and distance to the head camera. The absolute visual size (AS) of each object present in a frame was calculated as its visual angle, that is, as an estimate of the angle that the in-view object subtends at the eye. The diameter of the (adult) fovea is typically estimated at 5.2 degrees (Jonas et al, 2015). The head camera videos are 480 pixels in height by 640 pixels in width, which at 72 dpi translates to 16.9 cm in height by 22 cm in width. The average distance of the eye to the table center for infants sitting on the chair was 44.5 cm. The degrees in a single pixel of the head camera were therefore calculated as:

$$\theta_{pix} = \frac{\arctan\left(\dfrac{0.5 \times 16.9}{44.5}\right)}{0.5 \times 480}$$
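As an arithmetic check, the per-pixel angle can be evaluated directly from the values given above (16.9 cm image height, 480 pixels, 44.5 cm viewing distance). The Python sketch below is illustrative only: the constant and function names are ours, and it does not attempt the further step of converting an object’s pixel count into degrees, which the text does not fully specify.

```python
import math

# Values stated in the Methods; the names are illustrative, not the authors'.
IMAGE_HEIGHT_CM = 16.9   # 480 px at 72 dpi
IMAGE_HEIGHT_PX = 480
EYE_TO_TABLE_CM = 44.5   # mean eye-to-table-center distance


def degrees_per_pixel(height_cm=IMAGE_HEIGHT_CM,
                      height_px=IMAGE_HEIGHT_PX,
                      distance_cm=EYE_TO_TABLE_CM):
    """Half-height visual angle in degrees divided by half the pixel height."""
    half_angle_deg = math.degrees(math.atan((0.5 * height_cm) / distance_cm))
    return half_angle_deg / (0.5 * height_px)


theta_pix = degrees_per_pixel()
print(f"theta_pix = {theta_pix:.4f} degrees per pixel")
```

With these values the half-height angle is roughly 10.75°, so each pixel spans roughly 0.045°.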

An object’s relative visual size (RS) was calculated directly from the number of image pixels belonging to each object in each frame, as the proportion of all object pixels in the frame that belonged to that object. For both AS and RS, an in-play object with no pixels in the head camera image (i.e., one that was out of view) was entered as 0 in size. Because RS is the number of pixels belonging to one object divided by the sum of all pixels of the 3 objects in play, the expected value of RS, if all three objects were in an image and of the same visual size, is .33. If the infant’s vantage point is such that only one object is in the image, the RS is 1.00 (since 100% of all object pixels in the image belong to that object). Thus, for each toy in play on a given trial, the RS could vary from 0 (the toy has no pixels in the image) to 1.00 (the toy is the only in-view toy in that head-camera frame). The AS of an object could in principle vary from 0 to 90° if the infant positioned the object and head such that the object was so close that its pixels filled the camera’s field of view. However, in the collected data the largest visual size of a single object was 28°.
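The RS definition above reduces to a per-frame normalization of pixel counts. A minimal Python sketch follows, assuming one frame’s pixel counts for the three in-play objects are already available (0 for an out-of-view object); the function name is hypothetical, and the handling of a frame with no object pixels at all is our assumption, since the text does not address that case.

```python
def relative_sizes(pixel_counts):
    """Relative visual size (RS) of each in-play object in one frame.

    pixel_counts: pixel totals for the objects in play (0 = out of view).
    Returns the proportion of all object pixels belonging to each object.
    """
    total = sum(pixel_counts)
    if total == 0:
        # Assumption: with no object pixels in the frame, every object scores 0.
        return [0.0 for _ in pixel_counts]
    return [count / total for count in pixel_counts]


print(relative_sizes([1200, 1200, 1200]))  # three equal objects: each about .33
print(relative_sizes([5000, 0, 0]))        # only one object in view: RS = 1.0
```

Because RS always sums to 1 across the objects in play, it captures competition among objects, whereas AS captures each object’s size in isolation.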

Gaze data.

Frame-by-frame gaze was determined for each image with respect to three regions-of-interest (ROIs) defined precisely as the pixels belonging to each of the three objects in play. Trained coders, naive to the hypotheses and goals of this study, indicated when the gaze crosshair fell on a pixel belonging to a toy object. Because the three toys were three different primary colors and differed from skin tones and the white background, this could be done with accuracy. Reliability was computed between two independent coders on eleven randomly selected dyads. Coders coded 25% of each dyad’s frames, making judgments on 2,790 frames per dyad on average. The inter-coder reliability of eye-gaze coding by these highly trained coders ranged from 82% to 95% across individual subjects, with an overall Cohen’s kappa of 0.75.
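Cohen’s kappa corrects raw frame-by-frame agreement for the agreement expected by chance from each coder’s label frequencies. A minimal Python sketch follows, assuming each coder’s judgments are equal-length lists of frame labels (for example a toy color, or None when gaze was not on a toy); it is an illustration of the statistic, not the authors’ reliability script.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(coder_a)
    # Observed agreement: share of frames with identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # and we return 1.0 by convention (an assumption of this sketch).
        return 1.0
    return (observed - expected) / (1.0 - expected)


print(cohens_kappa(["red", "red", "green", "green"],
                   ["red", "green", "green", "green"]))
```

Here raw agreement is .75 but chance agreement is .50, so kappa is .5; this is why kappa is typically lower than percent agreement.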

Looks.

A look to an object was defined as a continuous stream of frames in which gaze was directed to pixels belonging to the same object. For look duration and stability measures, frames with gaze directed to the same object were combined into the same look if separated by no more than one frame (.033 sec).
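The look definition can be sketched as run-length grouping of per-frame gaze labels, with runs to the same object merged across gaps of at most one frame. This Python sketch assumes labels are one entry per 30 Hz frame (an object identifier, or None when gaze is not on a toy) and that an intervening frame on a different toy breaks the look; the text does not state how that case was handled, and all names are hypothetical.

```python
from itertools import groupby

FRAME_RATE_HZ = 30
MAX_GAP_FRAMES = 1  # gaps of at most one frame (about .033 s) are bridged


def extract_looks(labels, max_gap=MAX_GAP_FRAMES):
    """Group per-frame gaze labels into looks.

    labels: one entry per frame, the gazed-to object or None (no toy).
    Returns (object, start_frame, end_frame_exclusive) tuples.
    """
    # Collapse the frame sequence into runs of identical labels.
    runs, index = [], 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, index, index + length))
        index += length

    looks = []
    for label, start, end in runs:
        if label is None:
            continue  # frames without gaze on a toy never start a look
        previous = looks[-1] if looks else None
        # Assumption: only a gap with no gaze on any other toy is bridged,
        # because a look to another toy becomes looks[-1] and blocks merging.
        if previous and previous[0] == label and start - previous[2] <= max_gap:
            looks[-1] = (label, previous[1], end)  # extend the earlier look
        else:
            looks.append((label, start, end))
    return looks


def look_durations_s(looks, rate_hz=FRAME_RATE_HZ):
    """Convert frame spans to durations in seconds."""
    return [(end - start) / rate_hz for _, start, end in looks]
```

For example, the labels red, red, None, red form a single four-frame look, whereas two frames without gaze on a toy split it in two.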

Head motion.

We measured the velocity of head movements from the 3 rotational coordinates (in degrees) delivered by the motion sensor (Borjon et al, 2021; Richards & Hunter, 1997; Rosander & Von Hofsten, 2000). Rotational velocity was calculated by dividing the difference in each of the 3 rotational coordinates between successive 60 Hz samples by the change in time. The key experimental question with respect to the control-loop hypothesis concerns the amount of head movement rather than its specific direction, because movements in any direction may alter proximity, vantage point, and occlusions, depending on the 3-dimensional composition of the world scene. Further, the motion sensor could not be quickly attached and aligned to match the XYZ coordinates of the world. Accordingly, we used an aggregated measure of momentary rotational velocity: the Euclidean norm of the 3 rotational velocities. Momentary rotational velocities at or above the 99th percentile for each subject were excluded.
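The speed measure described above can be sketched as follows, assuming each sample is a (roll, pitch, yaw) tuple in degrees at 60 Hz. Two details are our assumptions rather than the authors’ stated method: angle differences are wrapped into [-180, 180) so that a yaw change from 179° to -179° counts as 2°, and the 99th percentile uses linear interpolation. All names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 60.0


def wrap_deg(delta):
    """Map an angle difference into [-180, 180) degrees (assumed handling)."""
    return (delta + 180.0) % 360.0 - 180.0


def head_speeds(samples, rate_hz=SAMPLE_RATE_HZ):
    """Rotational speed (deg/s) between successive (roll, pitch, yaw) samples."""
    speeds = []
    for prev, cur in zip(samples, samples[1:]):
        # Difference divided by the sampling interval, i.e. multiplied by the rate.
        rates = [wrap_deg(c - p) * rate_hz for p, c in zip(prev, cur)]
        # Aggregate the three rotational velocities into one magnitude.
        speeds.append(math.sqrt(sum(r * r for r in rates)))
    return speeds


def percentile(values, q):
    """Linear-interpolation percentile, q in 0..100 (assumed method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def trim_extreme(speeds, q=99.0):
    """Drop one subject's speeds at or above that subject's q-th percentile."""
    cutoff = percentile(speeds, q)
    return [v for v in speeds if v < cutoff]
```

Using the norm rather than any single axis keeps the measure independent of how the sensor happened to be oriented on the hat, which matches the rationale given for an aggregated measure.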

Infant and parent handling of objects.

Because infant and parent handling of objects could systematically affect the visual sizes of the objects in play, human coders determined, frame-by-frame, if a hand, and whose hand, was in contact with a toy. Coders had simultaneous access to both the infant perspective view and the view from the overhead camera but primarily relied on the overhead camera for coding hands in contact with an object. A second coder independently coded a randomly selected 25% of the frames of five dyads; agreement ranged from .76 to .90 for individual dyads, with an overall Cohen’s kappa of 0.90.

Images analyzed.

Each infant contributed 5400 frames (45 sec × 4 trials × 30 Hz), a total corpus of 243,000 frames across the 45 infants.

Statistical analyses.

Analyses were conducted at the corpus level. The dependent measures were frame-by-frame measures of visual size of all objects in the images, gaze directed to those objects, durations of gaze to the same object, and head velocity. From these data, we determined the onsets, durations, and offsets of increases in the visual size of objects, of continuous gaze to a single object, and head velocity in relation to gaze onset, sustained gaze and gaze offset.

Mixed-effects models were fit using the lme4 package in R (Version 3.6.1; Bates et al., 2014), with measures of object size as the fixed effect predicting look duration. The nlme package in R (Version 3.1–160; Pinheiro et al., 2017) was used to study temporal changes in head speed. Corpus-level error bars were calculated as per Campbell (2017). Individual infants and the specific toy objects were entered as random effects in all analyses.

Open Data.

The de-identified data for all reported analyses, the individual comparisons in the time-series analyses, and the supplementary analyses of looks below our threshold criterion for sustained attention are available at https://osf.io/tcd35/.

Results

The input.

For the first set of analyses, we analyzed frames (68% of the total frames) in which the infant’s gaze was directed to one of the toys (excluding gaze to the parent’s face and to targets in the room that were not toys). The goal of this first analysis is to describe the stimulus, that is, the visual sizes of potential targets in the images in front of infants’ faces. Figure 1B, C and D show summary statistics of the corpus means (bars) and participant means (dots) of the proportion of frames with 1, 2 or 3 objects in the image (Figure 1B) and of the RS and AS measures of the visual size of objects with at least one pixel in the image (Figure 1C, D). Figure 1E shows the main result: the RS and AS for each object present (at least one pixel) in an image, with blue dots indicating the objects that were the visually largest in each image and grey dots indicating the remaining objects. As is apparent, the visual sizes of potential targets vary markedly. The AS of objects varied from under 1° to 28°; in 99% of cases, the AS was equal to or less than 15° (the maximum value shown in Figure 1E). The RS of the individual objects varied from .00 (the object was not in the image) to 1.00 (the object was the only one in the image). The visually largest object in a frame ranged from an RS of .35 (slightly larger than the expected size if all three objects were present and equal in visual size) to 1.00. On average, the largest object in an image had an RS that accounted for .63 of all object pixels (SE = .17), making it just under twice the combined visual size of the other two objects. The visually largest object in an image had an average absolute visual size of 4.8° (SE = 1.8). Figure 1B and 1E use letter labels to align the RS and AS coordinates of 4 individual objects that were the largest in 4 different infant-egocentric images.
AS and RS are both based on pixel size and are thus necessarily correlated (r² = 0.51). However, as is apparent in Figure 1E, objects with the same AS can differ markedly in their size relative to the other objects in the image (RS). In sum, in the uncontrolled but natural context of exploring toys, the stimulus arrays in front of infants’ faces present highly variable within-image visual-size saliences and thus potential bottom-up competition among potential targets and distractors.

Defining sustained attention.

Figure 2 shows the frequency distribution of all infant look durations, with their well-known skew (Borjon et al, 2021; Suarez-Rivera, Smith, & Yu, 2019). Most looks are very brief glances that last a fraction of a second. But there is a long tail of much longer multi-second looks to a single object. In the following analyses, we concentrate on this long tail of looks to a single object that last 3 seconds or longer. Although looks of 3 seconds or longer constitute only 19% of individual looks, they account for 66% of the total time that the toddlers spent looking at an object. That is, the clear majority of looking time to objects occurs within looks that are 3 seconds or longer. We focus on these very long multi-second looks to a single object because they are characteristic of infant looking behavior during object exploration, because these sustained looks predict learning in the task (Pereira et al, 2054; Schroer & Yu, 2022; Yu & Smith, 2022), and because they predict longer-term developmental outcomes in visual attention, self-regulation, and executive function (e.g., Brandes-Aitken et al, 2019; Frick et al, 2018 et al, 2019; Ruff et al, 1990). It has been suggested that brief and sustained looks result from different underlying processes (Richards, 2010); however, there is limited evidence on this issue and no established empirical criteria for distinguishing different kinds of looks or for locating the break point between them. Therefore, we took a conservative but established approach (Suarez-Rivera et al., 2019; Wass et al, 2018; Yu, Suanda & Smith, 2019), using a 3-second threshold.
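The contrast between the share of looks and the share of looking time follows from the skewed distribution. A small Python sketch, assuming look durations in seconds are already available (the function name is hypothetical and the example data are made up, not from the study), computes both shares for a given threshold.

```python
SUSTAINED_THRESHOLD_S = 3.0


def sustained_summary(durations_s, threshold_s=SUSTAINED_THRESHOLD_S):
    """Share of looks, and of total looking time, from looks >= threshold."""
    if not durations_s:
        raise ValueError("no looks")
    long_looks = [d for d in durations_s if d >= threshold_s]
    share_of_looks = len(long_looks) / len(durations_s)
    share_of_time = sum(long_looks) / sum(durations_s)
    return share_of_looks, share_of_time


# Toy data, not from the study: three brief glances and one 4-second look.
print(sustained_summary([0.4, 0.6, 1.0, 4.0]))
```

In this toy example one look in four (25%) carries two thirds of the looking time, mirroring the pattern of 19% of looks carrying 66% of the time.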

Figure 2. Sustained Attention.


A. Histogram (frequency normalized as proportion of all looks to an object) of durations of unbroken looks to a single toy object. Bin size is 250 msec. The insets show the corpus means of RS and AS for looks below and above the 3-second threshold for sustained attention; dots show participant means. Although looks maintained 3 seconds or longer to a single object are only 19% of all individual looks, they constitute 66% of the total time that infants direct gaze at a play object.

At face value, sustained looking at a single object for longer than 3 seconds should be difficult if there are other competing saliences. The insets in Figure 2 show preliminary evidence that salience advantages, which could make attention to a target easier to sustain, may be associated with sustained looks. The means for both RS (proportion of all object pixels in the image belonging to the gazed-to object) and AS (absolute size in degrees of the gazed-to object) are greater for looks above the 3-second threshold than for looks below it. For RS, above-threshold long looks: Mlong = 0.56, SElong = 0.009; below-threshold shorter looks: Mshort = 0.48, SEshort = 0.005; β = 1.59, z = 6.39, p < 0.001. For AS, above-threshold long looks: Mlong = 5.96, SElong = 0.075; below-threshold shorter looks: Mshort = 5.36, SEshort = 0.037; β = .14, z = 4.90, p < 0.001. However, the critical questions for the control hypothesis concern the dynamics of looking behavior. Do infants behaviorally create and control (that is, maintain) the observed salience advantage associated with sustained gaze to an object? We answer this question through a series of analyses of all looks to a single object that lasted 3 seconds or longer. We make no claim that there are categorical differences between looks that fall above and below the selected threshold. The Supplementary Information provides two comparisons of looks above and below the selected threshold. Figure S1 compares the proportion of short and long looks during which the gazed-to object was the largest in the image. Because the below-threshold shorter looks are a mixture of very different durations, from fractions of a second to nearly 3 seconds, Figure S2 shows the dynamics of the target’s relative size for looks using 5 different thresholds, suggesting that the below-threshold looks may also consist of a mixture of different kinds of looks.
Determining the criteria for separating different kinds of looks measured via head-mounted eye-tracking in freely moving perceivers remains an open and critical question for the field.
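RS, as defined above, is a ratio of pixel counts: the pixels of the gazed-to object over the pixels of all objects in the frame. The Python sketch below assumes that each head-camera frame has already been segmented into an integer label mask in which 0 marks non-object pixels and each toy has its own positive label; this encoding and the helper names are assumptions for illustration, not the study's data format. It also includes a helper for the "largest object in the image" comparison mentioned for Figure S1. AS would additionally require the scene camera's angular calibration and is not sketched here.

```python
from collections import Counter
from typing import Sequence

BACKGROUND = 0  # assumed label for pixels that belong to no toy


def object_pixel_counts(label_mask: Sequence[Sequence[int]]) -> Counter:
    """Count pixels per object label, ignoring background pixels."""
    return Counter(px for row in label_mask for px in row if px != BACKGROUND)


def relative_size(label_mask: Sequence[Sequence[int]], target: int) -> float:
    """RS: share of all object pixels in the frame that belong to `target` (NaN if no object in view)."""
    counts = object_pixel_counts(label_mask)
    total = sum(counts.values())
    return counts[target] / total if total else float("nan")


def target_is_largest(label_mask: Sequence[Sequence[int]], target: int) -> bool:
    """True if `target` is in view and no other object covers more pixels (ties count as largest)."""
    counts = object_pixel_counts(label_mask)
    return counts[target] > 0 and counts[target] == max(counts.values())


frame = [
    [0, 1, 1],
    [2, 2, 0],
    [1, 3, 0],
]
print(relative_size(frame, target=1))  # 3 of 6 object pixels -> 0.5
print(target_is_largest(frame, target=2))  # False
```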

The dynamics of visual size and sustained attention.

We computed the RS and AS of the attended object from 2 seconds before to 2 seconds after the onset and the offset of each sustained look and compared those values to the baselines for RS and AS, respectively. The baselines were computed to instantiate the null hypothesis that looking at each moment is random with respect to the visual sizes of the objects in the image projected to the eye. Specifically, the moment-to-moment baseline RS and AS were determined individually for each sustained look by randomly selecting one in-view object, and its visual size, from the ordered series of within-look frames. The expected baseline (if object visual size does not matter) was then calculated frame by frame across all looks as the expected mean visual size from 2 seconds before to 2 seconds after onset and, separately, offset. Figures 3A and 3B show the marked increases in RS and AS, respectively, at the onset of a sustained look and the marked decreases at its offset.
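The comparison above pairs an observed trace (the attended object's size, averaged across looks at each aligned frame) with a null-hypothesis trace built from randomly chosen in-view objects. The Python sketch below is one possible reading under stated assumptions: every look is represented as the same number of onset- or offset-aligned frames, each frame maps hypothetical object ids to their visual size (RS or AS), an out-of-view object contributes 0, and the random object is drawn anew in every frame. The text does not specify whether the draw is per frame or per look, so that choice is an assumption.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence

Frame = Dict[int, float]  # object id -> visual size (RS or AS) in one frame
Look = Sequence[Frame]    # frames aligned to look onset (or offset); same length for every look


def observed_trace(looks: Sequence[Look], targets: Sequence[int]) -> List[float]:
    """Corpus mean, frame by frame, of the attended object's size (0 when it is out of view)."""
    n_frames = len(looks[0])
    return [mean(look[t].get(target, 0.0) for look, target in zip(looks, targets))
            for t in range(n_frames)]


def baseline_trace(looks: Sequence[Look], rng: random.Random) -> List[float]:
    """Null-hypothesis trace: in every frame, use the size of a randomly chosen in-view object."""
    n_frames = len(looks[0])
    per_look = [[frame[rng.choice(sorted(frame))] if frame else 0.0 for frame in look]
                for look in looks]
    return [mean(trace[t] for trace in per_look) for t in range(n_frames)]


# One hypothetical two-frame look in which object 1 is attended.
looks = [[{1: 0.8, 2: 0.2}, {1: 0.6, 2: 0.4}]]
print(observed_trace(looks, targets=[1]))  # [0.8, 0.6]
print(baseline_trace(looks, random.Random(0)))
```

Because the baseline is sampled, a seeded `random.Random` keeps it reproducible; averaging over many looks (or repeated draws) makes it approach the mean size of the in-view objects.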

Figure 3. Onset and offset of visual size advantages relative to the onset and offset of sustained attention.


A. Corpus mean (calculated frame by frame) of the relative size (RS, the proportion of object pixels in a frame belonging to the attended object) from 2 seconds before to 2 seconds after the onset and offset of all looks 3 seconds or longer. B. Corpus mean of the absolute size (AS, degrees subtended) of the attended object from 2 seconds before to 2 seconds after the onset and offset of looks 3 seconds or longer. In both A and B, the dashed line indicates the expected baseline if looking is independent of the RS or AS of the attended object. The shaded area indicates the standard error of the corpus mean.

We used the time-series approach of a series of t-tests first introduced by Allopenna, Magnuson & Tanenhaus (1998; see also Yu & Smith, 2012) to answer the statistical question about onset: at what point in a series of comparisons does a measure first differ from the baseline and then remain above it for a pre-specified duration? Although the Type 1 error rate for each comparison in the series is set at .05, the overall Type 1 error rate for a rise that persists for N comparisons after the first rise is much smaller (approximated by .05^N). The same approach answers the statistical question that is critical for the offset of a look: at what point in a series of t-tests does a measure that has differed from baseline for N comparisons first return to, and remain at, baseline for a pre-specified duration? Across the corpus of all sustained looks, we computed frame-by-frame pairwise comparisons (30 per second) of RS with its baseline and of AS with its baseline, for the window from 2 seconds before to 2 seconds after onset and for the window from 2 seconds before to 2 seconds after offset, with individual looks as the random variable. The RS of the attended object (Figure 3A) first reliably rose above baseline, and stayed above it, at 100 msec before look onset. Thus, the increase in RS for the attended object was nearly synchronous with the onset of the look itself. The advantage remained reliable for 2 seconds (p < .00001 for the overall measure of the onset of increased RS). The observed RS of the target object first reliably decreased to baseline, and then remained at baseline, from 200 msec after offset (p < .0001 for the overall measure). Thus, the onset and offset of the relative size advantage for the selected target were temporally coordinated with the onset and offset of sustained attention to the target.
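The onset and offset criteria above reduce to two steps: a per-frame test of the observed measure against its baseline, with looks as the unit of analysis, and a search for the first run of at least N consecutive frames with the required outcome. The Python sketch below is a minimal illustration, not the authors' code. To stay dependency-free it converts each one-sample t statistic to a two-sided p-value with a normal approximation, which is reasonable only when there are many looks; with SciPy available, `scipy.stats.ttest_1samp` gives exact t-distribution p-values. For scale, if N consecutive frames were independent, a run of N = 3 frames at α = .05 would have a chance probability of 0.05³ = 0.000125; adjacent frames are autocorrelated, so this is only an approximation.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def one_sample_t(diffs: Sequence[float]) -> float:
    """t statistic testing mean(diffs) against 0; one diff per look (needs at least 2 looks)."""
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(diffs)))


def above_baseline_flags(observed: Sequence[Sequence[float]],
                         baseline: Sequence[Sequence[float]],
                         alpha: float = 0.05) -> List[bool]:
    """observed/baseline are [look][frame]; one flag per frame: is the measure reliably above baseline?"""
    flags = []
    for t in range(len(observed[0])):
        tval = one_sample_t([o[t] - b[t] for o, b in zip(observed, baseline)])
        p_two_sided = math.erfc(abs(tval) / math.sqrt(2))  # normal approximation to the t distribution
        flags.append(tval > 0 and p_two_sided < alpha)
    return flags


def first_sustained_run(flags: Sequence[bool], value: bool, min_run: int) -> Optional[int]:
    """Index of the first frame that starts >= min_run consecutive frames equal to `value`, else None."""
    run_start = None
    for t, flag in enumerate(flags):
        if flag == value:
            if run_start is None:
                run_start = t
            if t - run_start + 1 >= min_run:
                return run_start
        else:
            run_start = None
    return None
```

For an onset-aligned window, `first_sustained_run(flags, True, N)` locates the first sustained rise; for an offset-aligned window, `first_sustained_run(flags, False, N)` locates the first sustained return to baseline. At 30 frames per second, a frame index converts to msec relative to onset or offset as `(index - zero_index) * 1000 / 30`.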

AS, the absolute size of the target, provides direct evidence of the spatial relation of the attended object to the infant's head. Using the same statistical approach as for RS, the AS of the attended object first reliably differed from baseline at 66 msec before look onset and did not return to baseline for the following 2 seconds of comparisons; the AS of the target first decreased to, and then remained at, baseline at 166 msec after offset (p < .0001 for both overall measures of onset and offset). In sum, the target's absolute visual size (AS) increases just ahead of the onset of the look and remains stable until just after the offset of the look. This indicates that the distance and vantage point of the attended object relative to the infant's head changed at the start of sustained attention and were maintained throughout the look.

A visual size advantage (as well as other saliences) could cause infants to select a target for sustained attention. However, the very small lead of the increase in visual size relative to gaze onset makes it unlikely that a visual size advantage for the to-be-attended object was a systematic cause of the initial shift of gaze to the object. If the onset of the target's visual size advantage were treated as the stimulus onset, the interval from stimulus onset to gaze onset (100 msec) would fall well below the minimum time, estimated in controlled experimental studies at 250 to 500 msec depending on the complexity of the array, for infants to execute a shift of gaze to a target (e.g., Marchman & Fernald, 2008; Oakes & Luck, 2013; Yu et al, 2012). Instead, the brief lead of the visual size increase relative to the gaze shift likely reflects the well-documented fact that infants and children (unlike adults) often move the head before the eyes when re-directing gaze (e.g., Borjon et al, 2021; Luo & Franchak, 2020; Nakagawa & Sukigara, 2013; Schmitow et al., 2013).

Head movements.

Head movements provide a direct path for creating and controlling a visual size advantage for targets of interest. If infants increase the visual size of an object at the start of sustained attention and then maintain that advantage for the duration of the look by controlling the spatial relation of their head to the object, there should be a rapid head movement near gaze onset followed by a stilling of the head within the look. Figure 4A shows the mean momentary aggregate rotational velocity for the 30 infants who contributed head motion data, for the period from 2 seconds before to 2 seconds after look onset and for the period from 2 seconds before to 2 seconds after look offset. As is apparent, infants' heads move markedly around the onset and offset of a look. To quantify the reliability of this increase in head movement at the onset and offset of sustained attention, we calculated the mean rotational velocity of the head in 500 msec windows (a window size around gaze onset that corresponds to previously reported temporal relations between coordinated head movements and gaze shifts in toddlers; Borjon et al, 2021; Luo & Franchak, 2020; Nakagawa & Sukigara, 2013; Schmitow et al., 2013). As shown in Figure 4B, one 500 msec window was centered on look onset [window 0]. This yields 7 windows spanning from 1.75 seconds before onset [windows −3, −2, −1] to 1.75 seconds after onset [windows +1, +2, +3], and likewise 7 windows spanning from 1.75 seconds before to 1.75 seconds after the offset of a long look. We compared head velocity in each window with that in the next window, using a preset alpha of .001 to correct for the six comparisons. For onset, the only reliable comparisons were from window −1 to 0 and from 0 to +1: head velocity in the window centered on look onset differed from that in the just preceding window (p < .0001) and from that in the just following window (p < .0001). No other comparisons in the series of six were reliable. The head moves at the start of a look and then stills. For offset, head velocity in the window centered on offset was reliably higher than in the just preceding window (window −1 to 0, p < .0001) and in the just following window (0 to +1, p < .0001). No other adjacent comparisons approached significance. The head moves rapidly at the onset and offset of the look to a target and is stilled during the look.
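The windowing step above can be made concrete with a short sketch. It assumes a head-velocity trace with timestamps in seconds relative to look onset (or offset), and it assumes that "aggregate" rotational velocity means the magnitude of the three-axis angular velocity vector from the motion sensor; the text does not define the aggregate, so both the definition and the function names are hypothetical. Note that the preset alpha of .001 is stricter than a Bonferroni correction for six comparisons (.05 / 6 ≈ .0083).

```python
import math
from statistics import mean
from typing import Dict, Sequence


def aggregate_rotational_velocity(gx: float, gy: float, gz: float) -> float:
    """Magnitude of the 3-axis angular velocity vector (assumed definition of 'aggregate')."""
    return math.hypot(gx, gy, gz)


def window_means(times_s: Sequence[float], velocity: Sequence[float],
                 width_s: float = 0.5, n_side: int = 3) -> Dict[int, float]:
    """Mean velocity in 2*n_side+1 windows of width_s; window 0 is centred on t = 0 (onset or offset)."""
    out = {}
    for k in range(-n_side, n_side + 1):
        lo = k * width_s - width_s / 2
        hi = lo + width_s
        vals = [v for t, v in zip(times_s, velocity) if lo <= t < hi]
        out[k] = mean(vals) if vals else float("nan")
    return out


def adjacent_differences(means: Dict[int, float]) -> Dict[str, float]:
    """Change from each window to the next, e.g. '-1->0'; with 7 windows this gives the 6 comparisons."""
    keys = sorted(means)
    return {f"{a}->{b}": means[b] - means[a] for a, b in zip(keys, keys[1:])}


# Hypothetical trace with one sample per window and a burst of head motion at t = 0.
times = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
speeds = [10, 10, 10, 50, 10, 10, 10]
print(adjacent_differences(window_means(times, speeds)))
```

In the study, each adjacent pair would then be tested across participants or looks; the sketch stops at the per-window summaries that feed those tests.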

Figure 4. Head movements at the onset, offset and during sustained attention.


A. Corpus mean of aggregated rotational velocity, computed frame by frame from 2 seconds before to 2 seconds after the onset and offset of sustained attention (looks 3 seconds or longer in duration). The shaded area indicates the standard error of the corpus mean. B. The seven 500 msec windows centered on the onset and offset of the looks, used for analyses of the precision of head movements at onset and offset. C. The temporal windows before and after look onset and look offset used for analyses of head velocity within sustained attention compared with the period before onset and the period after offset.

Is the head stilled more during the look than before it? If infants inhibit head movements during sustained attention to maintain the visual size advantage, the head should be more still during the look than in the period before the head movement at the start of the look. To test this hypothesis, illustrated in Figure 4C, we compared head movements in the 2 seconds before onset with those in the 2 seconds after onset (that is, during the look), excluding the 500 msec window of rapid head movement at onset. We did the same for the 2 seconds of the look just before offset and the 2 seconds after look offset, again excluding the 500 msec window around offset. The key question is whether head movements during the look are smaller than those outside the sustained look. Head movements in the 2-second window before onset (M = 26.25, SE = 1.11) were faster than those in the 2-second window after onset, that is, during the look (M = 18.72, SE = .822; B = 7.64, t(145) = 7.54, p < .0001). Likewise, head movements in the 2 seconds before offset, that is, during the look (M = 17.47, SE = .62), were slower than those in the 2 seconds after offset (M = 25.29, SE = .97; B = 7.87, t(146) = 5.75, p < .001). After one-year-old infants move their head to direct gaze to an object for a long look, they still the head during the look to a greater degree than before or after the head movements that initiate and end the look. This result implicates control of head movements in the service of sustaining attention.
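The before/after comparison above can be sketched per look as two window means with the central 500 msec removed. The Python sketch below assumes timestamps in seconds relative to look onset (or offset) and assumes that both 2-second windows are measured from t = 0, so that the excluded ±250 msec is cut from their inner edges; the text does not say whether the windows instead start at the edge of the excluded period, and the function names are hypothetical. The reported B and t values came from a fitted model, so the sketch stops at the per-look quantities such a model would take as input.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def outside_vs_inside(times_s: Sequence[float], velocity: Sequence[float],
                      span_s: float = 2.0, exclude_s: float = 0.25) -> Tuple[float, float]:
    """Mean velocity in [-span_s, -exclude_s) and in (exclude_s, span_s], with t = 0 at onset or offset."""
    before = [v for t, v in zip(times_s, velocity) if -span_s <= t < -exclude_s]
    after = [v for t, v in zip(times_s, velocity) if exclude_s < t <= span_s]
    return mean(before), mean(after)


def paired_differences(per_look: Sequence[Tuple[float, float]]) -> List[float]:
    """Before-minus-after difference per look; positive means slower head motion after t = 0."""
    return [before - after for before, after in per_look]


# Hypothetical onset-aligned samples: the burst at t = 0 falls in the excluded window.
print(outside_vs_inside([-1.0, 0.0, 1.0], [26.0, 90.0, 18.0]))  # (26.0, 18.0)
```

For onset-aligned data, the "after" mean is the within-look value; for offset-aligned data, the "before" mean is. A stilled head during the look therefore predicts positive differences at onset and negative differences at offset.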

Object handling.

Infants (and their parents) explored the objects by handling them. These activities could create and sustain visual size advantages if they systematically brought handled objects close to the infant's head, or if handling systematically caused the infant to lean in for a closer look at the start of sustained attention. For the 2-second windows before and after the onset and offset of each sustained look, we determined the frames in which the looked-to object was in contact with the hands of the parent or infant. We calculated the expected baselines of infant and parent object handling by randomly selecting one object from the corpus of frames. The baseline thus instantiates the null hypothesis that object handling was independent of the object and of infant sustained attention. We used the same time-series analysis (Allopenna et al, 1998) to determine changes in object handling at the onset and offset of sustained attention. As shown in Figure 5, the dynamics of hand contact by both infants and caregivers are not systematically related to the onset and offset of infant sustained gaze to an object. The first significant increase in infant handling of the attended object occurred 733 msec after the onset of sustained gaze. Infant handling did not decrease to baseline within the ±2-second window around look offset. Thus, infant handling of an attended object was common for some portion of the time that gaze was directed to the object, but it was not systematically associated with the onset and offset of the look, and therefore not systematically associated with the onset and offset of the visual size advantages or with the head movements at the onset and offset of sustained looks to an object. The onset and offset of parent handling were also not coordinated with the onset and offset of the infant's look to the object. The likelihood of parent handling of an object to which an infant sustained gaze was reliably above baseline for the entire ±2-second window around look onset but returned to baseline and reliably stayed at baseline from 766 msec onward. In brief, although handling and moving objects occurred before, during, and after sustained attention to the handled object, handling is not a systematic proximal cause of the increased visual size at look onset, of the maintenance of that salience advantage, of the breaking of the visual size advantage at look offset, or of the head movements at the start and end of a sustained gaze to an object.

Figure 5. Handling of objects at onset and offset of sustained attention.


A. Mean handling by the infant (determined frame by frame) of the attended object from 2 seconds before to 2 seconds after the onset and offset of looks 3 seconds or longer. The dashed line indicates the expected baseline if looking is independent of infant handling. B. Mean handling by the parent (determined frame by frame) of the attended object from 2 seconds before to 2 seconds after the onset and offset of looks 3 seconds or longer. The dashed line indicates the expected baseline if looking is independent of parent handling. For handling, the shaded area indicates the standard error of the mean proportion of participants handling the looked-to object in each frame.

Discussion

Moving heads and moving objects during infant toy play create a context of highly variable input with respect to the visual sizes of potential targets for attention. However, during a sustained look to an object, the visual size of the target is tightly controlled by infant behavior. Infants control the visual size of the attended object in the input during sustained attention by controlling their head movements and thus the spatial relation between the head and the object. They systematically move the head at the start of sustained attention in a way that makes the target visually larger than competitors, inhibit head movements during the look and thus maintain the salience advantage, and break both the look and the salience advantage by moving the head. This solution to sustained attention has not been explicitly considered in the developmental literature. But as Gibson (1979) argued, the behavior of freely moving perceivers creates the visual input, defines the perceptual task that must be solved, and provides the solution. Perception by freely moving perceivers is also the environment in which evolutionarily constrained developmental processes operate. In the following discussion, we consider the implications of infants' behavioral control of the input for the development of top-down control of attention.

Collaboration

Is infant closed-loop behavioral control of sustained attention a form of bottom-up or top-down control? Head movements that increase the visual size of the target occur at the same time as the infant directs gaze to the object. Thus, whatever factors determine the shift in gaze to what will become the target of sustained attention also determine the synchronous head movements that change the spatial relation of the head to the target, making the target larger than competitors in the infant's field of view. We do not know the factors that cause infants to shift gaze to the target. Those factors could include other bottom-up saliences, memories of past experiences, or other momentary goals of the infant. Nonetheless, the head movements that create and sustain a visual size advantage for targets of sustained attention emerge at the same moment as the decision to look. These observations suggest that there may be no easy partition of infant attention into distinct top-down and bottom-up processes. In a theoretical paper on infant visual attention, Rosen, Amso and McLaughlin (2019) argued that in everyday attention, exogenous and endogenous systems are collaborative and not separable as moment-to-moment influences on attention: bottom-up saliences alert, initiate interest, and create top-down attentional goals. Here we show that bottom-up salience is also behaviorally controlled by the infant when the infant displays sustained visual interest in an object.

The closed-loop hypothesis and the present findings provide new insights into several previous results. Ruff's seminal work (Ruff & Lawson, 1990) defined focused attention not only by the duration of looks but also by a stilled head and a stilled body. The present results suggest that controlled body movements may be critical because they stabilize a visual advantage of the target over distractors. Two studies (Pereira et al, 2014; Yu & Smith, 2012) of infants' point-of-view experiences have revealed a visual signature of parent naming moments that are effective for learning object names. Pereira et al (2014) found that 18-month-old infants were more likely to learn name-object associations when the parent naming event co-occurred with a visual experience in which the referent was visually large and centered in the infant's head-centered field of view for an extended duration of 3 to 5 seconds around the heard name, a duration that meets the threshold for sustained attention currently used in the literature. Yu & Smith (2012) showed that infants were more likely to learn object names when their head was not moving during parent naming events. In brief, the signature sensory-motor properties of infant learning of object names are also the signature sensory-motor properties of sustained attention (see Yu et al, 2019).

The present findings may also increase understanding of the developmental paths from “sticky” gaze in infants younger than 4 months of age (Colombo & Cheatham, 2006; Kulke, Atkinson & Braddick, 2015) to sustained attention as observed in infants 9 months of age and older. On the surface, the two phenomena appear similar: long gaze to a continuing target that is not easily disrupted by a competitor. However, in young infants, re-orienting gaze to a new peripheral target is a positive predictor of later cognitive development (e.g., Colombo, 1997; Colombo & Cheatham, 2006), whereas in infants older than 9 months of age, sustaining gaze on a target is a positive predictor of later cognitive outcomes (e.g., Fisher, 2019; Rosen et al, 2019; see also Geeraerts et al. 2019). When young infants successfully “unstick” gaze to re-orient to a new target, they often move their heads before the gaze shift, and these movements have been proposed (Robertson, et al, 2001) to unlock gaze from the prior target. The one-year-old infants in the present study also moved their heads to end sustained gaze, in a way that may function to disrupt the salience advantage that had helped sustain gaze so that they could look elsewhere. Together, these observations suggest that the increasing autonomy and control of motor systems in the first two years of life may play a critical role in the development of cognitive control systems. This hypothesis (see also Gottwald, et al., 2016; Thompson & Steinbeis, 2020) is consistent with the use of atypical sensory-motor behaviors (including difficulty with head stabilization) as biomarkers of attentional deficits in children (Berger et al, 2019; Friedman et al, 2005). The hypothesis is also consistent with research on the neural underpinnings of attention that reveals overlaps between the brain networks that plan motor behaviors and those that control the spatial direction of attention (e.g., Miller & Cohen, 2001; Van Ede, et al., 2019).

A developmental hypothesis

Figure 6 illustrates a closed-loop system connecting visual input, brain activation, and behavior (see Byrge et al, 2014; Chiel & Beer, 1997). The input at each moment perturbs ongoing brain activations across multiple networks, bottom-up and top-down, activating memories, goals, and behaviors. These activations within the infant brain can affect the world (the next moment of input) through the infant's own behavior. Momentary inputs and the brain activations they elicit incrementally support, tune, and train neural circuitry (Byrge et al, 2014) and likely do so for the circuitry relevant to the development of self-regulatory processes (Rosen et al, 2019). By adulthood, internal mechanisms of top-down attentional control are sufficiently strong that attentional tasks can be solved covertly, with no supporting external behaviors, not even shifts in eye gaze (Posner, 1980). However, for adults and infants alike, attention in most everyday tasks may typically exploit the closed loop of brain, behavior, and input (Foulsham et al, 2011). Considerable research shows that adults also look with their whole bodies and purposely alter the sensory input to support attention and the extraction of task-relevant information (e.g., Anderson et al, 2022; Clark, 2008; Risko & Gilbert, 2016). Given these observations, we propose a developmental hypothesis with two parts. First, behavioral control of the visual input emerges as infants begin to better control their motor systems (Adolph & Franchak, 2017). Second, this emergent period of behavioral control of the input plays a key role in the development of executive function.

Figure 6. Illustration of a hypothesized closed-loop mechanism for sustained visual attention and its possible role in the brain networks responsible for the development of self-regulatory processes.


At each moment, visual input perturbs ongoing brain activations across multiple networks, activating memories and generating context-specific goals and behavior. The generated behavior selects and structures the spatial layout of visual information at the next time step. The behavior of social partners also selects and structures the visual input for the infant. Momentary input incrementally supports, tunes, and trains neural circuitry relevant to sustained attention during infant play and to other self-regulatory processes.

Critically, and as illustrated in Figure 6, mature social partners also directly influence, through their behavior, the visual input to infants and thus the infant's looking behavior. The closed-loop system illustrated in Figure 6 provides a pathway through which parent behavior may scaffold infant visual attention and sustained attention. Past work shows that the key component of parent behavior is responsiveness to the child's own interests during play, with potent effects of parent looks to, talk about, and touches of the object to which the infant is already attending (Fay‐Stammbach et al, 2014; Suarez-Rivera et al, 2019; Yu & Smith, 2016). Thus, parent behavior that is responsive to and coordinated with infant behavior may be a particularly potent booster of the infant's ongoing momentary interest in objects, triggering and helping the infant sustain the behaviors (inhibiting head movements) that maintain a visibility advantage for objects during sustained attention. The present finding that infants control their bodies to control the input, and thereby control visual attention, raises multiple empirically testable hypotheses about the development of sustained attention in infancy, during the period in which individual differences in attention begin to predict later developments in executive control.

Supplementary Material

supplementary information

Research Highlights.

  • One-year-old infants wore head-mounted eye trackers and motion sensors while exploring multiple novel objects of the same physical size.

  • Infants systematically moved their heads to make the object of current interest visually larger than competitors and stabilized the head to maintain that advantage during sustained attention.

  • The onset, maintenance, and offset of sustained visual attention were time-locked to the onset, maintenance, and offset of a visual size advantage of the attended object over competitors.

  • The findings implicate a closed-loop control system: Infants control visual attention by behaviorally controlling the visual input.

Acknowledgments:

The authors thank Steven Emlinger, Seth Foster, Charlene Tay, Melissa Hall, and Char Wozniak as well as the past and present members of the Cognitive Development Laboratory and the Computational Cognition and Learning Laboratory for contributions to data collection, management, analysis, and interpretation. The research was supported by grants from the National Institute for Child Health and Human Development to Chen Yu and L.B. Smith (R01HD074601, R01HD104624) and from the National Eye Institute to L.B. Smith (R01EY032897). Andrés H. Méndez was supported by the Agencia Nacional de Investigación e Innovación (POS_NAC_2016_1_130861). Participant recruitment and the research protocol were approved by the Institutional Review Board of Indiana University (protocol number 0808000094). All authors confirm we have no financial or other interests that constitute a conflict of interest.

Data availability statement:

The de-identified data, code for analyses, and supplementary analyses are available on the Open Science Framework https://osf.io/tcd35/.

References

  1. Adolph KE, & Franchak JM (2017). The development of motor behavior. Wiley Interdisciplinary Reviews: Cognitive Science, 8(1–2). [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Allopenna PD, Magnuson JS, & Tanenhaus MK (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of memory and language, 38(4), 419–439. [Google Scholar]
  3. Anderson BA, Kim H, Kim AJ, Liao MR, Mrkonja L, Clement A, & Grégoire L (2021). The past, present, and future of selection history. Neuroscience & Biobehavioral Reviews, 130, 326–350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Anderson EM, Seemiller ES, & Smith LB (2022). Scene saliencies in egocentric vision and their creation by parents and infants. Cognition, 229, 105256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Arnold AJ, Liddy JJ, Harris RC, & Claxton LJ (2020). Task‐specific adaptations of postural sway in sitting infants. Developmental Psychobiology, 62(1), 99–106. [DOI] [PubMed] [Google Scholar]
  6. Baluch F, & Itti L (2011). Mechanisms of top-down attention. Trends in neurosciences, 34(4), 210–224. [DOI] [PubMed] [Google Scholar]
  7. Bates D, Mächler M, Bolker B, & Walker S (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823. [Google Scholar]
  8. Bechtel W, & Bich L (2021). Grounding cognition: heterarchical control mechanisms in biology. Philosophical Transactions of the Royal Society B, 376(1820), 20190751. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Berger SE, Harbourne RT, & Guallpa Lliguichuzhca CL (2019). Sit still and pay attention! Trunk movement and attentional resources in infants with typical and delayed development. Physical and Occupational Therapy in Pediatrics, 39(1), 48–59. [DOI] [PubMed] [Google Scholar]
  10. Bertenthal B, & Von Hofsten C (1998). Eye, head and trunk control: The foundation for manual development. Neuroscience and Biobehavioral Reviews, 22(4), 515–520. [DOI] [PubMed] [Google Scholar]
  11. Biswas D, Arend LA, Stamper SA, Vágvölgyi BP, Fortune ES, & Cowan NJ (2018). Closed-loop control of active sensing movements regulates sensory slip. Current Biology, 28(24), 4029–4036. [DOI] [PubMed] [Google Scholar]
  12. Brandes-Aitken A, Braren S, Swingler M, Voegtline K, & Blair C (2019). Sustained attention in infancy: A foundation for the development of multiple aspects of self-regulation for children in poverty. Journal of experimental child psychology, 184, 192–209. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Borji A, Sihite DN, & Itti L (2013). What stands out in a scene? A study of human explicit saliency judgment. Vision research, 91, 62–77. [DOI] [PubMed] [Google Scholar]
  14. Borjon JI, Abney DH, Yu C, & Smith LB (2021). Head and eyes: Looking behavior in 12-to 24-month-old infants. Journal of vision, 21(8). doi: 10.1167/jov.21.8.18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Burling JM, & Yoshida H (2019). Visual constancies amidst changes in handled objects for 5‐to 24‐month‐old infants. Child development, 90(2), 452–461. [DOI] [PubMed] [Google Scholar]
  16. Byrge L, Sporns O, & Smith LB (2014). Developmental process emerges from extended brain–body–behavior networks. Trends in cognitive sciences, 18(8), 395–403. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Campbell R, (2017). raacampbell/shadedErrorBar, https://github.com/raacampbell/shadedErrorBar. [Google Scholar]
  18. Chiel HJ, & Beer RD (1997). The brain has a body: adaptive behavior emerges from interactions of nervous system, body, and environment. Trends in neurosciences, 20(12), 553–557. [DOI] [PubMed] [Google Scholar]
  19. Clark A (2008). Where brain, body and world collide. In Material agency (pp. 1–18). Springer, Boston, MA. [Google Scholar]
  20. Cohen LB (1972). Attention-getting and attention-holding processes of infant visual preferences. Child Development, 869–879. [PubMed] [Google Scholar]
  21. Colombo J (1997). Individual differences in infant cognition: methods, measures, and models. In Developing Brain Behaviour (pp. 339–385). Academic Press [Google Scholar]
  22. Colombo J, & Cheatham CL (2006). The emergence and basis of endogenous attention in infancy and early childhood. Advances in child development and behavior, 34, 283–322. [DOI] [PubMed] [Google Scholar]
  23. Desimone R, & Duncan J (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193–222. [DOI] [PubMed] [Google Scholar]
  24. DiCarlo JJ, & Cox DD (2007). Untangling invariant object recognition. Trends in cognitive sciences, 11(8), 333–341. [DOI] [PubMed] [Google Scholar]
  25. Ellis R, Allport DA, Humphreys GW, & Collis J (1989). Varieties of object constancy. The Quarterly Journal of Experimental Psychology Section A, 41(4), 775–796. [DOI] [PubMed] [Google Scholar]
  26. Fay‐Stammbach T, Hawes DJ, & Meredith P (2014). Parenting influences on executive function in early childhood: A review. Child development perspectives, 8(4), 258–264. [Google Scholar]
  27. Fisher AV (2019). Selective sustained attention: a developmental foundation for cognition. Current opinion in psychology, 29, 248–253. [DOI] [PubMed] [Google Scholar]
  28. Foulsham T, Walker E, & Kingstone A (2011). The where, what and when of gaze allocation in the lab and the natural environment. Vision research, 51(17), 1920–1931. [DOI] [PubMed] [Google Scholar]
  29. Frick MA, Forslund T, Fransson M, Johansson M, Bohlin G, & Brocki KC (2018). The role of sustained attention, maternal sensitivity, and infant temperament in the development of early self‐regulation. British Journal of Psychology, 109(2), 277–298. [DOI] [PubMed] [Google Scholar]
  30. Friedman AH, Watamura SE, & Robertson SS (2005). Movement-attention coupling in infancy and attention problems in childhood. Developmental Medicine and Child Neurology, 47(10), 660–665. [DOI] [PubMed] [Google Scholar]
  31. Geeraerts SB, Hessels RS, Van der Stigchel S, Huijding J, Endendijk JJ, Van den Boomen C, … & Deković M (2019). Individual differences in visual attention and self-regulation: A multimethod longitudinal study from infancy to toddlerhood. Journal of experimental child psychology, 180, 104–112. [DOI] [PubMed] [Google Scholar]
  32. Garon M, Bryson S, Smith I (2008). Executive function in preschoolers: A review using an integrative framework. Psychological Bulletin, 134 (2008), pp. 31–60. [DOI] [PubMed] [Google Scholar]
  33. Gibson JJ (1979). The ecological approach to visual perception. New York: Houghton Mifflin.
  34. Gottwald JM, Achermann S, Marciszko C, Lindskog M, & Gredebäck G (2016). An embodied account of early executive-function development: prospective motor control in infancy is related to inhibition and working memory. Psychological science, 27(12), 1600–1610.
  35. Guan Y, & Corbetta D (2012). What grasps and holds 8-month-old infants' looking attention? The effects of object size and depth cues. Child Development Research, 2012.
  36. Hackman DA, Gallop R, Evans GW, & Farah MJ (2015). Socioeconomic status and executive function: Developmental trajectories and mediation. Developmental science, 18(5), 686–702.
  37. Hofmann V, Sanguinetti-Scheck JI, Künzel S, Geurten B, Gómez-Sena L, & Engelmann J (2013). Sensory flow shaped by active sensing: sensorimotor strategies in electric fish. Journal of Experimental Biology, 216(13), 2487–2500.
  38. Itti L, & Koch C (2001). Computational modelling of visual attention. Nature reviews neuroscience, 2(3), 194–203.
  39. Itti L, Koch C, & Niebur E (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259. doi: 10.1109/34.730558.
  40. Johansson M, Marciszko C, Gredebäck G, Nyström P, & Bohlin G (2015). Sustained attention in infancy as a longitudinal predictor of self-regulatory functions. Infant Behavior and Development, 41, 1–11.
  41. Jonas RA, Wang YX, Yang H, Li JJ, Xu L, Panda-Jonas S, & Jonas JB (2015). Optic disc-fovea angle: the Beijing Eye Study 2011. PLoS One, 10(11), e0141771.
  42. Kretch KS, Franchak JM, & Adolph KE (2014). Crawling and walking infants see the world differently. Child development, 85(4), 1503–1518.
  43. Kleinfeld D, Ahissar E, & Diamond ME (2006). Active sensation: insights from the rodent vibrissa sensorimotor system. Current opinion in neurobiology, 16(4), 435–444.
  44. Kulke L, Atkinson J, & Braddick O (2015). Automatic detection of attention shifts in infancy: Eye tracking in the fixation shift paradigm. PloS one, 10(12), e0142505.
  45. Lansink JM, Mintz S, & Richards JE (2000). The distribution of infant attention during object examination. Developmental Science, 3(2), 163–170.
  46. Luo C, & Franchak JM (2020). Head and body structure infants’ visual experiences during mobile, naturalistic play. Plos one, 15(11), e0242009.
  47. Nakagawa A, & Sukigara M (2013). Variable coordination of eye and head movements during the early development of attention: A longitudinal study of infants aged 12–36 months. Infant Behavior and Development, 36(4), 517–525.
  48. Marchman VA, & Fernald A (2008). Speed of word recognition and vocabulary knowledge in infancy predict cognitive and language outcomes in later childhood. Developmental science, 11(3), F9–F16.
  49. Marciszko C, Forssman L, Kenward B, Lindskog M, Fransson M, & Gredebäck G (2020). The social foundation of executive function. Developmental Science, 23(3), e12924.
  50. Miller EK, & Cohen JD (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167–202.
  52. Pereira AF, James KH, Jones SS, & Smith LB (2010). Early biases and developmental changes in self-generated object views. Journal of vision, 10(11), 22–22.
  53. Pereira AF, Smith LB, & Yu C (2014). A bottom-up view of toddler word learning. Psychonomic Bulletin & Review, 21(1), 178–185. doi: 10.3758/s13423-013-0466-4.
  54. Posner MI (1980). Orienting of attention. Quarterly journal of experimental psychology, 32(1), 3–25.
  55. Proulx MJ, & Egeth HE (2008). Biased competition and visual search: the role of luminance and size contrast. Psychological research, 72(1), 106–113. 10.1007/s00426-006-0077-
  56. Richards JE (2010). The development of attention to simple and complex visual stimuli in infants: Behavioral and psychophysiological measures. Developmental Review, 30(2), 203–219.
  57. Richards JE, & Hunter SK (1997). Peripheral stimulus localization by infants with eye and head movements during visual attention. Vision Research, 37(21), 30.
  58. Risko EF, & Gilbert SJ (2016). Cognitive offloading. Trends in cognitive sciences, 20(9), 676–688.
  59. Rosander K, & von Hofsten C (2000). Visual-vestibular interaction in early infancy. Experimental Brain Research, 133(3), 321–333.
  60. Robertson SS, Bacher LF, & Huntington NL (2001). The integration of body movement and attention in young infants. Psychological science, 12(6), 523–526.
  61. Rosen ML, Amso D, & McLaughlin KA (2019). The role of the visual association cortex in scaffolding prefrontal cortex development: A novel mechanism linking socioeconomic status and executive function. Developmental cognitive neuroscience, 39, 100699.
  62. Rossi AF, Pessoa L, Desimone R, & Ungerleider LG (2009). The prefrontal cortex and the executive control of attention. Experimental brain research, 192(3), 489–497.
  63. Rothbart MK, Sheese BE, Rueda MR, & Posner MI (2011). Developing mechanisms of self-regulation in early life. Emotion review, 3(2), 207–213.
  64. Ruff HA (1990). Individual differences in sustained attention during infancy. Individual differences in infancy: Reliability, stability, prediction, 247–270.
  65. Ruff HA, & Capozzoli MC (2003). Development of attention and distractibility in the first 4 years of life. Developmental psychology, 39(5), 877.
  66. Ruff HA, & Lawson KR (1990). Development of sustained, focused attention in young children during free play. Developmental psychology, 26(1), 85.
  67. Ruff HA, Lawson KR, Parrinello R, & Weissberg R (1990). Long‐term stability of individual differences in sustained attention in the early years. Child development, 61(1), 60–75.
  68. Schmitow C, Stenberg G, Billard A, & von Hofsten C (2013). Using a head-mounted camera to infer attention direction. International Journal of Behavioral Development, 37(5), 468–474.
  69. Schroer SE, & Yu C (2022). Looking is not enough: Multimodal attention supports the real-time learning of new words. Developmental Science, e13290.
  70. Sun L, & Yoshida H (2022). Why the parent’s gaze is so powerful in organizing the infant’s gaze: The relationship between parental referential cues and infant object looking. Infancy.
  71. Smith LB, Yu C, & Pereira AF (2011). Not your mother’s view: The dynamics of toddler visual experience. Developmental science, 14(1), 9–17.
  72. Suarez-Rivera C, Smith LB, & Yu C (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental psychology, 55(1), 96.
  73. Taub M, & Yovel Y (2020). Segregating signal from noise through movement in echolocating bats. Scientific Reports, 10(1), 1–10.
  74. Thompson A, & Steinbeis N (2020). Sensitive periods in executive function development. Current Opinion in Behavioral Sciences, 36, 98–105.
  75. Tu HF, Lindskog M, & Gredebäck G (2022). Attentional control is a stable construct in infancy but not steadily linked with self-regulatory functions in toddlerhood. Developmental Psychology.
  76. Van Ede F, Chekroud SR, Stokes MG, & Nobre AC (2019). Concurrent visual and motor selection during visual working memory guided action. Nature Neuroscience, 22(3), 477–483.
  77. Wass SV, Noreika V, Georgieva S, Clackson K, Brightman L, Nutbrown R, … & Leong V (2018). Parental neural responsivity to infants’ visual attention: How mature brains influence immature brains during social interaction. PLoS biology, 16(12), e2006328.
  78. Wass SV, Clackson K, Georgieva SD, Brightman L, Nutbrown R, & Leong V (2018). Infants' visual sustained attention is higher during joint play than solo play: is this due to increased endogenous attention control or exogenous stimulus capture? Developmental science, 21(6), e12667.
  79. Wass SV, Whitehorn M, Haresign IM, Phillips E, & Leong V (2020). Interpersonal neural entrainment during early social interaction. Trends in cognitive sciences, 24(4), 329–342.
  80. Werchan DM, & Amso D (2020). Top-down knowledge rapidly acquired through abstract rule learning biases subsequent visual attention in 9-month-old infants. Developmental Cognitive Neuroscience, 42, 100761.
  81. Wolfe JM (2020). Visual search: How do we find what we are looking for? Annual review of vision science, 6(1), 539–562.
  82. Wolfe JM, & Horowitz TS (2017). Five factors that guide attention in visual search. Nature Human Behaviour, 1(3), 1–8.
  83. Yu C, & Smith LB (2012). Embodied attention and word learning by toddlers. Cognition, 125(2), 244–262.
  84. Yu C, & Smith LB (2016). The social origins of sustained attention in one-year-old human infants. Current biology, 26(9), 1235–1240.
  85. Yu C, Smith LB, Shen H, Pereira AF, & Smith T (2009). Active information selection: Visual attention through the hands. IEEE transactions on autonomous mental development, 1(2), 141–151.
  86. Yu C, Suanda SH, & Smith LB (2019). Infant sustained attention but not joint attention to objects at 9 months predicts vocabulary at 12 and 15 months. Developmental science, 22(1), e12735.

Associated Data


Supplementary Materials

supplementary information

Data Availability Statement

The de-identified data, code for analyses, and supplementary analyses are available on the Open Science Framework https://osf.io/tcd35/.
