Abstract
Researchers are acutely interested in how people engage in social interactions and navigate their environment. However, in striving for experimental or laboratory control, we often instead present individuals with representations of social and environmental constructs and infer how they would behave in more dynamic and contingent interactions. Mobile eye-tracking (MET) is one approach to connecting the laboratory to the experienced environment. MET superimposes gaze patterns, captured through head- or eyeglass-mounted cameras pointed at the eyes, onto footage from a separate camera that captures the visual field. As a result, MET allows researchers to examine the world from the point of view of the individual in action. This review touches on the methods and questions that can be asked with this approach, illustrating how MET can provide new insight into social, behavioral, and cognitive processes from infancy through old age.
Keywords: Mobile eye-tracking, stationary eye-tracking, visual attention, development, learning
Many researchers are acutely interested in how people engage in social interactions and navigate their environment. To help experimentally control experiences in the laboratory and standardize individual events for analysis, researchers often take contingent three-dimensional people, objects, and events and create flattened two-dimensional “stimuli.” As such, we typically generate data that rarely reflect how we observe social information and process the environment (Redcay & Schilbach, 2019). We have a limited sense of how the mechanisms and processes we are interested in play out in the actual circumstances of our everyday lives. Our two-dimensional stimuli do not do a good job of capturing “second-person” (Redcay & Schilbach, 2019) or “person-centered” (Fu & Pérez-Edgar, 2019) attention mechanisms. Individuals do not typically respond to a steady stream of controlled pre-determined stimuli. Rather, they create for themselves an “experienced environment” (Pérez-Edgar & Fox, 2018) by selectively and idiosyncratically engaging with only a subset of the world around them. In our work, we wish to capture how the individual navigates and takes in their environment from their own point of view.
Central to our personalized experience is attention, which acts as a domain-neutral mechanism for development, learning, and daily functioning. Attention links experiences across time and contexts, setting the boundaries for stimuli that will be attended to, processed, interpreted, and acted upon (Amso & Scerif, 2015). Thus, it is a gateway for social, behavioral, cognitive, and neural processes. Here, our focus is on visual attention, which reflects an important source of environmental information for most individuals. For research purposes, the goal is often to capture the focus of attention via eye gaze. We interpret gaze direction (where you are looking) as the focus of your visual attention (Colombo, 2001). Visual attention, in turn, is thought to reflect the focus of your current active processing.
From the first days of life, as our visual acuity steadily improves, we use vision to take in and make sense of the world. As we develop more complex and sophisticated motor skills, we continue to use vision to guide our actions (Franchak, 2020). Visual attention can help create the informational and experiential pipeline for development and learning. Visual exploration, often coupled with behavioral exploration, is opportunistic. Items appear in the world unexpectedly, or take on new meanings as our goals shift. This exploratory process may become more purposeful and guided with age. A similar evolution takes place as people learn a new skill and transition from novices to experts. However, we are still captured by the salient and the unexpected well into adulthood and well after we master a specific set of skills.
Currently, researchers typically rely on the perspective of a researcher observing from a distance with a camera or extract indirect information through behavioral responses elicited with computer-based tasks. We can come a bit closer to seeing the world through the eyes of another person using mobile eye-tracking (MET). MET uses a cap-mounted camera (for infants; Franchak, Kretch, Soska, & Adolph, 2011) or cameras embedded inside eye-glasses (Fu & Pérez-Edgar, 2019). With the eye-glasses, one camera looks out into the world while two cameras point to each eye. By superimposing the signals from all three cameras we can track where, and when, individuals gaze at specific aspects of their environment (Figure 1). Then, from gaze, we can loop back to infer processes of attention. While MET has been used for decades to capture attention processes in adults (Isaacowitz, Livingstone, Harris, & Marcotte, 2015), its use from the earliest months of life is a relatively new phenomenon (Franchak, 2017).
Figure 1.

Recordings from a mobile eye-tracking paradigm. The room recording (left) and the mobile eye-tracking recording (right) are synchronized offline after the study visit. Researchers can code the child’s eye gaze and the adult research assistant’s behavior from the eye-tracking recording, as well as the child’s bodily behavior from the room recording. Hence, the paradigm enables researchers to examine the child’s visual attention processes embedded in active social interaction.
Attention is a Domain-General Mechanism
Attention is a complex multi-component processing system. Researchers often divide attention into three distinct, but interwoven, components (Posner & Rothbart, 2007): Orienting, Vigilance, and Executive Attention. These three core areas of functioning allow a person moving through her busy environment to notice an important event (vigilance), shift attention to the event (orienting), and then decide if she needs to act (executive). We can get a sense of each component by examining patterns of visual attention over time and comparing them to the surrounding pattern of events and behaviors. For example, given the choice between climbing a structure to get a toy, or getting a parent to help, we can observe a child’s visual attention before, during, and after the choice is enacted to infer decision-making processes. Over time, we can see how people gain greater flexibility in responding to the world and a wider range of options when needing to respond and regulate (Pérez-Edgar, Taber-Thomas, Auday, & Morales, 2014). With children in particular we can see how infants and toddlers are “captured” by the environment, while the older child can recruit visual attention to meet a challenge, as needed (Shulman et al., 2009).
Why do we need MET?
The scope of the current review does not allow us to discuss in depth the wide range of methods and measures that can be used to capture and assess attention (see Fu & Pérez-Edgar, 2019 for a more thorough discussion). These methods, particularly when used together in the same study with the same participants, can help us better triangulate how attention emerges, is deployed, and then influences broad patterns of thought and behavior (LoBue et al., in press). Of course, every method has its limitations. For example, if we videotape a person to capture their visual attention, we can observe semi-naturalistic movements. However, we need to constrain the environment and set our cameras up in such a way that it will be obvious to us where the person is looking. Capturing these data can require laborious coding procedures, only to generate fairly coarse approximations. If the scene is ambiguous or if the person moves out of the correct camera angle, the information can be lost entirely.
Researchers can gain control and precision by using stationary eye-tracking (SET) to capture visual attention. In SET an infrared signal, often mounted on a computer, detects a person’s pattern of eye gaze as she views stimuli on the screen. The eye gaze detection is much more precise than what can be tracked via video and it does not require an overt behavioral response from the participant. For example, with a camera, we can tell that a person may be looking at a face. With SET we can now track eye-gaze as it moves from the eyes to the mouth to the ears and back again (Oakes, 2012). However, we are still limited to stimuli that can be presented on a screen and the person must sit still.
Fundamentally, SET cannot provide us with a sense of the person’s self-generated visual experience. When we present a single or limited set of stimuli on a computer screen, we can capture attention in the absence of person-driven alternatives. We are less able to see individual differences in selection and engagement. Thus, our tasks deliberately and systematically take away choice (Ladouce, Donaldson, Dudchenko, & Ietswaart, 2017).
With MET we have the opportunity to capture the real-time dynamic relation between attention, self-regulation, and behavior (Fu & Pérez-Edgar, 2019). As researchers, our goal is often to observe, predict, and, when needed, modify behavior—broadly defined across motoric, cognitive, and socioemotional domains. These behaviors, in turn, are embedded in a world that is reciprocal, where there is an exchange between people and things, and events are unpredictable, since one event cannot perfectly predict what will happen next. MET provides an additional tool for improving ecological validity in our research while taking a novel person-centered approach.
MET allows us to depict the ebb and flow of attention patterns in real-time as the surrounding events and the child’s own behavior unfold. In addition, MET can be used to capture patterns of attention in the first months of life. In this way, MET broadens the age window for capturing visual attention processes, from birth to old age, and across levels of disability.
How do we use MET?
The first step in deploying a relatively novel technique is to define the parameters needed to generate reliable and valid data. In contrast to SET tasks, in MET the depth and distance of the objects and events that participants are looking at can vary from moment to moment (Franchak, 2017). Calibration accuracy can be compromised when objects are either closer or farther than the distance between the participant and the targets used for initial calibration. Hence, the initial calibration accuracy is unlikely to hold for the entire recording. To address this issue, it is important to assess potential changes in accuracy by performing multiple validation procedures in which a target is presented at different viewing distances. The calibration process is illustrated in Slone et al. (2018) and an example of the validation procedure is presented here (http://bit.ly/MET_OSF).
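The logic of such a validation check can be sketched in a few lines: compare the recorded gaze location against a known target location at each validation point and express the offset in degrees of visual angle. This is a minimal illustration, not vendor software; the target coordinates, gaze samples, and degrees-per-pixel conversion below are all hypothetical placeholders.

```python
import math

# Hypothetical validation records: at several points in the session the
# participant fixates a target whose position in the scene-camera image
# is known. All numbers are illustrative.
validations = [
    {"distance_m": 0.5, "target_px": (640, 360), "gaze_px": (652, 371)},
    {"distance_m": 1.0, "target_px": (640, 360), "gaze_px": (628, 355)},
    {"distance_m": 2.0, "target_px": (640, 360), "gaze_px": (601, 338)},
]

# Conversion from pixels to degrees of visual angle; this depends on the
# scene camera's field of view and resolution, and is assumed here.
DEG_PER_PIXEL = 0.06

def angular_error(target, gaze, deg_per_px=DEG_PER_PIXEL):
    """Approximate gaze error in degrees of visual angle."""
    dx, dy = gaze[0] - target[0], gaze[1] - target[1]
    return math.hypot(dx, dy) * deg_per_px

# Report accuracy separately at each viewing distance, since calibration
# obtained at one distance may degrade at another.
for v in validations:
    err = angular_error(v["target_px"], v["gaze_px"])
    print(f"{v['distance_m']:.1f} m: {err:.2f} deg error")
```

Tracking error separately at each distance makes it possible to flag recording segments where accuracy has drifted beyond an acceptable threshold.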
MET also allows us to make more fine-grained studies of dynamic processes, observing micro-longitudinal trajectories within a task. This work also increases our ability to use innovative analytic approaches that capture dynamic relations (Hollenstein, 2013) and individual time-series (Corbetta, Guan, & Williams, 2012). New approaches are needed because MET data quickly become complicated. As noted, MET captures attention selection and attention in motion. As such, we cannot necessarily predict what a person will attend to and when and how often they will (or will not) attend to a particular target. As a result, we have to first characterize the person-by-person idiosyncratic behaviors and attention patterns we see. This can be done by coding frame by frame across a visit or by coding for specific events (e.g., child looks at a stranger’s face) and then coding for the behaviors and attention patterns that are evident before and after the event.
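The event-based coding strategy described above can be sketched as a simple windowing operation over frame-by-frame annotations: given a coded gaze stream and the frame at which an event occurs, summarize where the participant looked in the windows just before and after. The AOI labels, frame rate, and event below are hypothetical, chosen only to make the sketch concrete.

```python
# Minimal sketch of event-locked summaries of MET data, assuming gaze has
# already been coded frame by frame into categorical AOI labels.
FPS = 30  # assumed scene-camera frame rate

# Hypothetical per-frame AOI codes: 45 frames on a toy, then 30 on a
# stranger's face, then 60 on the parent.
gaze = ["toy"] * 45 + ["stranger_face"] * 30 + ["parent"] * 60
events = [{"label": "stranger_speaks", "frame": 45}]

def window_summary(gaze, event_frame, window_s=1.0, fps=FPS):
    """Proportion of frames on each AOI in windows before and after an event."""
    w = int(window_s * fps)

    def props(frames):
        total = len(frames) or 1
        return {aoi: frames.count(aoi) / total for aoi in set(frames)}

    before = gaze[max(0, event_frame - w):event_frame]
    after = gaze[event_frame:event_frame + w]
    return props(before), props(after)

before, after = window_summary(gaze, events[0]["frame"])
print(before)  # {'toy': 1.0}
print(after)   # {'stranger_face': 1.0}
```

The same routine can be run over every coded event in a visit, yielding the person-by-person attention profiles the text describes.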
For example, state space grids (Hollenstein, 2013; Lewis, Lamey, & Douglas, 1999) can illustrate how two sets of variables, each with multiple components, can “move within a figurative space”. Plotting the movement across the grid allows researchers to observe, and quantify, the interplay between constructs or behaviors of interest over the course of a task or interaction (Figure 2). They can also help extract transitional patterns that suggest sequences of behavior moving across states.
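The bookkeeping behind a state space grid reduces to tallying time spent in each joint state and the transitions between states. A minimal sketch, assuming two synchronized categorical streams coded at the same intervals; the child AOI and parent behavior codes below are illustrative, not the actual coding scheme from the studies cited.

```python
from collections import Counter

# Two synchronized categorical streams, one observation per coded interval.
# Codes are hypothetical placeholders.
child_aoi       = ["puzzle", "puzzle", "parent",    "parent",    "puzzle"]
parent_behavior = ["teach",  "teach",  "directive", "directive", "praise"]

# Each moment is a joint state: (where the child looks, what the parent does).
states = list(zip(child_aoi, parent_behavior))

# Node size in the grid: total time spent in each joint state.
time_in_state = Counter(states)

# Lines between nodes: movements from one joint state to a different one.
transitions = Counter(
    (a, b) for a, b in zip(states, states[1:]) if a != b
)

print(time_in_state)
print(transitions)
```

From these tallies one can derive the quantities discussed below, such as attractor strength (e.g., mean duration per visit to a given state).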
Figure 2.

Illustration of intraindividual variability of dyadic behavior using two state space grids, each showing a parent-child dyad. Solid lines reflect movement across event nodes. Dotted lines connect event nodes prior to missing events to nodes that follow the missing events. Parent Ref. = Parent Reference (wherever the parent was pointing/referencing during the task); Pos. Reinforce = Positive Reinforcement.
What are we beginning to learn with MET?
To illustrate one use of MET, MacNeill (2019) focused on the contours of parent-child interactions to examine how parental traits and behaviors may be transmitted, and shared, with the child in the moment. A strong body of work has shown that over-solicitous and over-protective parenting can increase anxiety among children (Hastings & Rubin, 1999), particularly if they are at heightened risk due to temperamental behavioral inhibition (BI; Hastings, Rubin, Smith, & Wagner, 2019). MacNeill (2019) had parents and 5- to 7-year-old children complete a difficult puzzle task (http://bit.ly/MET_OSF). Areas of interest (AOIs) derived from MET were plotted on one axis, while the other axis captured parenting behaviors noted within the course of the parent-child puzzle task. Figure 2 shows the patterns of dyadic behavior for two unique parent-child dyads, capturing time spent at each square or state, representing the intersection of an AOI with a specific behavior. The size of each node reflects the amount of time spent in that state, and transitions between states are illustrated by the lines that connect the nodes. Using these grids, we can identify patterns of dyadic exchanges that occur over the course of the task or interaction, potentially revealing dyadic processes that may influence development. In this study, MacNeill (2019) examined attractors, or states that pull the dyadic system from other states under particular conditions (Thelen & Smith, 2006). The data indicated that child BI was positively associated with parent-focused/controlling parenting attractor strength (i.e., longer amounts of time at each visit to states where the child looked at the parent and the parent engaged in directive or intrusive behavior), but only when the parent also reported higher levels of anxiety.
Researchers quickly saw that MET can lead us to question many of the basic assumptions we had about the ways in which we navigate the environment and interact with others. For example, both children and adults spend far less time looking at each other’s faces than we would predict based on our everyday intuitions and studies that rely on SET (Franchak et al., 2011; Jung, Zimmerman, & Pérez-Edgar, 2018; MacNeill, 2019). These patterns clearly show that visual attention—where we are looking—cannot capture the full breadth of attentional processes that are central to our everyday behavior. Anyone who has ever turned left to go to work, forgetting that they were supposed to turn right that day and first go to the bank, knows that we can see and process visual input without actually internalizing the meaning behind that input.
For example, Yu and Smith (2013) found that an infant’s visual gaze to hands can be a better indicator of engagement than looking at a partner’s face, depending on the task. At the other end of the age spectrum, Isaacowitz and colleagues (Isaacowitz et al., 2015) found no age difference in bias toward positive valence when participants selected from the environment with MET, although SET studies had consistently shown a positivity bias in older adults. And, in between, Fu and colleagues (Fu, Nelson, Borge, Buss, & Pérez-Edgar, 2019) had 5- to 7-year-old children complete a standard attention bias task with reaction time and SET measures. The same children also engaged in a standardized interaction with an adult stranger modified from Buss (2011). An association with BI was only evident in the MET task. In addition, a relation between the SET task performance and MET behavior was only significant with increasing levels of BI.
(Interim) Conclusions
Clearly, our daily environments are more complex than the protocols we construct in the research laboratory. Although we gain experimental control, we lose a connection to the world we wish to better understand. Social interactions pull for practical knowledge and experience with past social encounters. As such, individuals will engage in social and communicative signaling, above and beyond the specific task or message, in a way not evident in non-interactive contexts such as viewing a movie or pressing buttons in response to social stimuli.
The growing MET literature can work to help bridge this gap. As equipment becomes lighter and more resistant to calibration loss, we can gather more data, over more time, as individuals roam more broadly. Thus we can continue to apply MET to new contexts, including spider phobics confronting their fear (Lange, Tierney, Reinhardt-Rutland, & Vivekananda-Schmidt, 2004), adolescents taking in feedback during a stressful speech (Woody et al., 2019), and a young child exploring a children’s museum (Jung et al., 2018). In this way we can better capture the ultimate prize—understanding the mechanisms and processes that help us become who we are, embedded in our specific contexts, through our own eyes. This is the idiosyncratic, and embodied, experienced environment (Pérez-Edgar & Fox, 2018).
Acknowledgements:
Preparation of the manuscript was supported by a National Institute of Mental Health grant R21 MH111980 (to K.P.-E.), a National Institute of Child Health and Human Development T32 HD007376, Carolina Consortium on Human Development Training Program (to L.M.), and internal funding from the Research Institute at Nationwide Children’s Hospital (X.F.).
References
- Amso D, & Scerif G (2015). The attentive brain: Insights from developmental cognitive neuroscience. Nature Reviews Neuroscience, 16, 606–619.
- Buss KA (2011). Which fearful toddlers should we worry about? Context, fear regulation, and anxiety risk. Developmental Psychology, 47, 804–819.
- Colombo J (2001). The development of visual attention in infancy. Annual Review of Psychology, 52, 337–367.
- Corbetta D, Guan Y, & Williams JL (2012). Infant eye-tracking in the context of goal-directed actions. Infancy, 17, 102–125.
- Franchak JM (2017). Using head-mounted eye tracking to study development. In Hopkins B, Geangu E, & Linkenauger S (Eds.) (2nd ed.). Cambridge: Cambridge University Press.; This review outlines the ways in which we can use mobile eye tracking to capture developmental processes.
- Franchak JM (2020). The ecology of infants’ perceptual-motor exploration. Current Opinion in Psychology, 32, 110–114.
- Franchak JM, Kretch KS, Soska KC, & Adolph KE (2011). Head-mounted eye tracking: A new method to describe infant looking. Child Development, 82(6), 1738–1750.; This is one of the first studies to apply mobile eye-tracking to motor processes and movement in infants.
- Fu X, Nelson EE, Borge M, Buss KA, & Pérez-Edgar K (2019). Stationary and ambulatory attention patterns are differentially associated with early temperamental risk for socioemotional problems: Preliminary evidence from a multimodal eye-tracking investigation. Development and Psychopathology, 31, 971–988.; Empirical study comparing attention patterns across RT-based measures, stationary eye tracking, and mobile eye tracking.
- Fu X, & Pérez-Edgar K (2019). Threat-related attention bias in socioemotional development: A critical review and methodological considerations. Developmental Review, 51, 31–57.; A thorough theoretical and empirical analysis of methodologies available for assessing patterns of attention, particularly in children.
- Hastings PD, & Rubin KH (1999). Predicting mothers’ beliefs about preschool-aged children’s social behavior: Evidence for maternal attitudes moderating child effects. Child Development, 70(3), 722–741.
- Hastings PD, Rubin KH, Smith KA, & Wagner NJ (2019). Parenting behaviorally inhibited and socially withdrawn children. In Bornstein MH (Ed.), Handbook of Parenting.
- Hollenstein T (2013). State space grids: Depicting dynamics across development. Cham: Springer.; This book illustrates and explains the use of state space grids to capture dynamic and interactive processes.
- Isaacowitz DM, Livingstone KM, Harris JA, & Marcotte SL (2015). Mobile eye tracking reveals little evidence for age differences in attentional selection for mood regulation. Emotion, 15(2), 151–161.
- Jung YJ, Zimmerman HT, & Pérez-Edgar K (2018). A methodological case study with mobile eye-tracking of child interaction in a science museum. TechTrends, 62(5), 509–517.
- Ladouce S, Donaldson DI, Dudchenko PA, & Ietswaart M (2017). Understanding minds in real-world environments: Toward a mobile cognition approach. Frontiers in Human Neuroscience, 10, 694.
- Lange W, Tierney K, Reinhardt-Rutland A, & Vivekananda-Schmidt P (2004). Viewing behaviour of spider phobics and non-phobics in the presence of threat and safety stimuli. British Journal of Clinical Psychology, 43(3), 235–243.
- Lewis MD, Lamey AV, & Douglas L (1999). A new dynamic systems method for the analysis of early socioemotional development. Developmental Science, 2, 457–475.
- LoBue V, Reider L, Kim E, Burris JL, Oleas D, Buss KA, & Pérez-Edgar K (in press). Making sense of the blooming, buzzing confusion: The importance of using multiple outcome measures in infant research. Infancy.
- MacNeill L (2019). Attention processes in context: A multi-method assessment of how parenting and temperament contribute to the development of attention. (PhD dissertation), The Pennsylvania State University, University Park, PA.
- Oakes LM (2012). Advances in eye tracking in infancy research. Infancy, 17, 1–8.
- Pérez-Edgar K, & Fox NA (2018). Next steps: Behavioral inhibition as a model system. In Pérez-Edgar K & Fox NA (Eds.), Behavioral Inhibition (pp. 357–372). Springer.
- Pérez-Edgar K, Taber-Thomas B, Auday E, & Morales S (2014). Temperament and attention as core mechanisms in the early emergence of anxiety. In Lagattuta K (Ed.), Children and Emotion: New Insights into Developmental Affective Science (Vol. 26, pp. 42–56). Karger Publishing.
- Posner MI, & Rothbart MK (2007). Research on attention networks as a model for the integration of psychological science. Annual Review of Psychology, 58, 1–23. doi: 10.1146/annurev.psych.58.110405.085516
- Redcay E, & Schilbach L (2019). Using second-person neuroscience to elucidate the mechanisms of social interaction. Nature Reviews Neuroscience, 20, 495–505.
- Shulman GL, Astafiev SV, Franke D, Pope DLW, Snyder AZ, McAvoy MP, & Corbetta M (2009). Interaction of stimulus-driven reorienting and expectation in ventral and dorsal frontoparietal and basal ganglia-cortical networks. Journal of Neuroscience, 29, 4392–4407. doi: 10.1523/JNEUROSCI.5609-08.2009
- Slone LK, Abney DH, Borjon JI, Chen CH, Franchak JM, Pearcy D, … Smith LB (2018). Gaze in action: Head-mounted eye tracking of children’s dynamic visual attention during naturalistic behavior. JoVE (Journal of Visualized Experiments), 141, e58496.
- Thelen E, & Smith LB (2006). Dynamic systems theories. In Damon W & Lerner RM (Eds.), Handbook of Child Psychology (pp. 258–313).
- Woody ML, Rosen D, Allen KB, Price RB, Hutchinson E, Amole MC, & Silk JS (2019). Looking for the negative: Depressive symptoms in adolescent girls are associated with sustained attention to a potentially critical judge during in vivo social evaluation. Journal of Experimental Child Psychology, 179, 90–102.
- Yu C, & Smith LB (2013). Joint attention without gaze following: Human infants and their parents coordinate visual attention to objects through eye-hand coordination. PLoS ONE, 8(11), e79659.
