2026 Apr 14;33(4):136. doi: 10.3758/s13423-026-02908-5

The role of (observed) gaze behaviour in identity recognition

Lindsay M Peterson 1, Colin W G Clifford 1, Colin J Palmer 2
PMCID: PMC13079541  PMID: 41979830

Abstract

Recognising familiar people is an important function of the human visual system that supports daily social interactions. In addition to static visual cues (e.g., face shape and texture), biological motion cues such as idiosyncratic facial motion and gait contribute to identity recognition. Surprisingly, recent research has indicated that an individual’s eye movements exhibit idiosyncratic patterns that contain identifiable information, though it is largely unknown to what extent human observers exploit this information. The current study measures the sensitivity of human observers to idiosyncratic gaze behaviours when identifying familiar faces. In two experiments, participants familiarised themselves with faces that were generated from three-dimensional scans of human heads. As participants examined each face, the face’s eye movements were animated using eye-tracking data from real observers such that the face appeared to be looking around the room. Participants were then tested on how much they had learned about each identity’s gaze behaviours. In Experiment 1 (N = 40), we found a small but significant effect of gaze behaviour in spontaneously biasing the perceived facial identity even when other visual cues to facial identity were available. In Experiment 2, participants (N = 51) could discriminate the identity of a face based purely on its eye movements after explicit instructions to rely on gaze behaviour to distinguish different individuals. Our results demonstrate that an individual’s dynamic gaze behaviours can inform how others recognise them, expanding current understanding of the visual cues that contribute to identity recognition and the sensitivity of human observers to others’ attentional behaviour.

Keywords: Face recognition, Gaze perception, Gaze behaviour, Eye-tracking, Biological motion

Introduction

Recognising familiar people is a critical component of daily human functioning that facilitates social interactions. The visual system relies on a range of facial cues to recognise a person, including their face shape and skin pigment, internal features such as the nose and mouth, and external features such as hair (Abudarham et al., 2019; Burton et al., 1999; Ellis et al., 1979). Non-face cues also support person recognition. In particular, biological motion – natural patterns of movement generated by humans and animals – is a reliable, dynamic non-face cue used for person recognition (Yovel & O’Toole, 2016). For example, detecting idiosyncrasies in gait and gestures can contribute to identifying a known person (Loula et al., 2005; O’Toole et al., 2002; Yovel & O’Toole, 2016). The current study investigates whether gaze behaviour can influence identity recognition, as a form of biological motion that is intimately linked to our perception of another person’s attentional state.

The way that a person visually examines their environment tends to be salient to us in part because it can be indicative of their current state of mind and future goals (Baron-Cohen, 1997). Much of the gaze-perception literature has focused on the static element of gaze (e.g., perceiving or following the direction of a person’s gaze), but the temporal dynamics of gaze may be just as important to us as observers for understanding another person’s mental state. The temporal order of fixations carries information about the gazer’s environment: watching a social interaction through the ‘eyes’ of another whose fixations have been temporally scrambled significantly impacts the interpretation of the interaction (Bush et al., 2015). Saccades (rapid eye movements between fixations) occur up to five times per second while viewing an image (Martin et al., 2018; Wolfe, 2007), and these dynamic behaviours regulate the conversation flow in dyadic interactions (Degutyte & Astell, 2021). Indeed, human observers have a sense of the dynamic gaze behaviours that are socially appropriate given the context (Landmann et al., 2024).

There is reason to think that this sensitivity of human observers to the spatiotemporal dynamics of other people’s gaze behaviour might also extend to the discrimination of the gaze behaviours of different individuals. In neurotypical observers, there is evidence of significant individual differences in gaze behaviour that are stable across time (Berlijn et al., 2022; de Haas et al., 2019; Guy et al., 2019; Mehoudar et al., 2014) and partially heritable, such that identical twins tend to share a greater similarity in how they free-view naturalistic images when compared to dizygotic twins or other individuals (Constantino et al., 2017; Kennedy et al., 2017). Eye movements carry sufficient identifiable information that computer algorithms can be developed to identify an individual from their gaze patterns (termed ‘gaze fingerprinting’; Crockford et al., 2023), but can human observers recognise the dynamic gaze behaviours of different individuals? Ziman et al. (2023) reported that observers could learn to correctly distinguish two people from their eye movement data with ~ 60% accuracy when eye movements were represented as a dynamic marker on an image. Earlier studies using a similar methodological approach differed somewhat in their findings: while observers could discriminate their own gaze patterns from random or computer-generated gaze patterns, they found it difficult to discriminate their own gaze patterns from another person’s (Clarke et al., 2017; Foulsham & Kingstone, 2013; Võ et al., 2016). Importantly, however, these past studies did not assess the role of eye-movement behaviours in face recognition, but rather how participants can learn to identify patterns in eye-movement data when presented in an abstract manner (e.g., as a moving dot superimposed on an image). In daily interactions, the input to our visual system associated with (observed) gaze behaviours consists of the biological motion cues produced directly by the movement of our partner’s eyes and face. Are these dynamic visual cues associated with gaze behaviour a component of how we perceive facial identity?

The aim of the current study was to measure the influence of (observed) gaze behaviour on the perception of facial identity. We presented participants with highly realistic computer-generated faces that could move their eyes as if they were looking around the room. The eyes of the faces were animated using eye-tracking data recorded from real human observers such that the eye movements of one (computer-generated) face could be controlled using the eye-movement data recorded from one specific (real) person while the eye movements of other (computer-generated) faces were controlled using the eye-movement data recorded from different (real) people. In Experiment 1, we measured sensitivity at discriminating the identity of two people when participants were not explicitly instructed to rely on eye movements to make their judgements and when other established cues to face identity were also available (e.g., face shape and texture). In the learning phase of the experiment, participants saw animations of faces performing eye movements, as if they were looking around the room, and judged the identity of the face and how closely the face was paying attention to them. In the subsequent test phase, participants viewed blends of the two original faces (facial morphs) that varied along a continuum of relative identity strength. Each morphed face performed gaze behaviours associated with both identities presented during the learning phase. Participants again judged the identity of the face and how closely the face was paying attention to them. We hypothesised that if participants implicitly learn about gaze behaviours associated with a particular identity, then this learning would be evident in facial identity judgements in the testing phase. For example, we anticipated that when the facial identity was ambiguous (i.e., a 50/50 morph of the two original faces), perceived identity would be biased towards the identity corresponding to the eye movements. The specific eye movements performed by a face identity during the learning phase were always different from those presented during the test phase, despite being recorded from the same human observer (across different trials of a free-viewing task). Hence, the task was designed to test recognition of idiosyncrasies in a person’s style of looking behaviour, rather than merely testing recognition of a previously seen pattern of eye movements.

Experiment 1

Method

Participants

Forty participants (mean age = 18.3 years, min. = 17 years, max. = 22 years; 32 females, eight males) completed the experiment. An additional two participants completed the experiment, but their data were not recorded due to a technical error. Participants were undergraduate students enrolled in a first-year psychology course at the University of New South Wales (UNSW) Sydney and had self-reported normal or corrected-to-normal vision. No further demographic information was collected from participants, and no screening surveys were used in the experiment. Participants gave informed consent prior to beginning the experiment. The recruitment and experimental procedures were approved by the Human Research Ethics Committee, School of Psychology, UNSW Sydney. A post hoc sensitivity analysis indicated that the experiment had 95% power to detect a medium effect (Cohen’s d ≈ 0.6) with α = .05.

Stimuli and apparatus

Three-dimensional (3D) face models

We used three-dimensional (3D) models of human heads from the 3D Scan Store (https://www.3dscanstore.com/) that had been created from high-definition scans of real individuals. These models were imported into the computer graphics rendering program Blender (version 3.6.2) in their ‘real’ size, with the models having an inter-pupillary distance of approximately 6.3 cm (which corresponds to the average inter-pupillary distance reported in large samples; Dodgson, 2004; Fesharaki et al., 2012). We had eight models in total: two female pairs (referred to as pair #1 and #2) and two male pairs (pair #3 and #4). We refer to identity #1 in pair #1 as identity 1.1, identity #2 in pair #1 as identity 1.2, and so on. Each face identity can be seen in the far left and right columns of Fig. 1. The 3D models shared the same topology, meaning that the mesh of a 3D model, which defines its 3D shape, could be merged with another face mesh in Blender to create faces that were morphs of two identities. The textures of the two models were morphed in MATLAB and imported into Blender to apply to the morphed 3D mesh. We created morphs ranging from 10 to 90% in 5% increments. The morph percentages are relative to the amount of the second identity present in the morphed face. For example, a 25% morph of pair #2 is a morphed face composed of 75% identity 2.1 and 25% identity 2.2. Examples of the morphed faces for each identity pair are shown in the middle three columns of Fig. 1.
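The morph is a linear blend between two face representations that share a common topology. As a minimal sketch (not the authors’ Blender/MATLAB pipeline), assuming `mesh_a` and `mesh_b` are arrays of corresponding vertex coordinates or texture values:

```python
import numpy as np

def morph(mesh_a, mesh_b, pct_b):
    """Linearly blend two aligned face representations (vertex arrays or
    texture maps that share a common layout). pct_b is the percentage of
    the second identity present in the morph, as defined in the text."""
    w = pct_b / 100.0
    return (1.0 - w) * np.asarray(mesh_a) + w * np.asarray(mesh_b)

# A 25% morph contains 25% of identity #2 and 75% of identity #1:
blended = morph(np.zeros(3), np.ones(3), 25)  # every element is 0.25
```

The same weighting is applied to both the 3D mesh and the texture, so shape and surface cues morph in lockstep.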

Fig. 1

Examples of the stimuli from Experiment 1. There were four identity pairs: two female pairs (#1 and #2) and two male pairs (#3 and #4). Participants were randomly assigned to an identity pair, with equal numbers of participants seeing each pair. Each (unmorphed) face identity can be seen in the far left and right columns. We created morphed faces within each face-identity pair, where the morph percentage is relative to the amount of identity #2 present in the face. For example, a 25% morph face for pair #4 indicates that the morphed face is 25% identity 4.2 and 75% identity 4.1

Animating gaze behaviour

The eye movements of the faces were controlled by the eye-tracking data from the Object and Semantic Images and Eye-tracking (OSIE) dataset (Xu et al., 2014). The OSIE dataset contains the recorded eye movements of 15 human observers who freely viewed 700 images of everyday objects and scenes. The eye-tracking data collected by Xu et al. (2014) were processed to define each participant’s scanpath for each image as a set of fixations (x–y coordinates corresponding to the participant’s point of fixation in an image). Note that the velocity threshold for a saccade to be detected was 22°/s and any unstable fixations (i.e., < 100 ms in duration) were discarded to reduce noise in the eye-tracking data (Xu et al., 2014). As such, the eye movements of the faces in this study consist of a set of fixations but do not include finer-scale eye movements or microsaccades.

We selected four pairs of observers from the OSIE dataset to animate the eyes of the face models in Fig. 1. To maximise the differences in gaze behaviour within a pair, we used an entropy analysis to determine the four most dissimilar pairs of observers in the OSIE dataset based on their scanpaths. For each image in the OSIE dataset, we calculated the dissimilarity between each pair of observers by binning the x–y coordinates of the fixations in each scanpath into 192 bins of 50 × 50 pixels across the 800 × 600-pixel image. The durations of each fixation within a bin were made proportional to the total duration of the scanpath, and the distribution of duration-weighted fixations was used to calculate the entropy. Mutual information, calculated as the difference between the conditional entropy (i.e., the average entropy of the two scanpaths) and total entropy for a pair of scanpaths (i.e., the entropy derived from the two scanpaths combined), provided a measure of dissimilarity. Note that mutual information was normalised by the total entropy. By averaging the dissimilarity across the 700 images we obtained a 15 × 15 observer dissimilarity matrix. We then determined the four most dissimilar pairs by comparing the mean dissimilarity of each observer to every other observer. The selected pairs of observers were assigned to the face pairs; for example, the eye movements of identity 1.1 (i.e., the face in the top left panel of Fig. 1) were controlled by observer #7 from the OSIE dataset. Observers #7 and #2 were assigned to pair #1, observers #15 and #5 to pair #2, observers #6 and #4 to pair #3, and observers #12 and #3 to pair #4. We refer to the OSIE observer (eye-movement data) assigned to each facial identity as the gaze identity.
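The entropy-based dissimilarity described above can be sketched as follows. This is a reconstruction from the description rather than the authors’ code, so the scanpath format (a list of (x, y, duration) fixations) and the exact sign and normalisation conventions are assumptions:

```python
import numpy as np

def duration_weighted_histogram(scanpath, width=800, height=600, bin_px=50):
    """Bin a scanpath's fixations into a 16 x 12 grid (192 bins of
    50 x 50 pixels), weighting each fixation by its duration relative
    to the scanpath's total duration."""
    hist = np.zeros((height // bin_px, width // bin_px))
    total = sum(d for _, _, d in scanpath)
    for x, y, d in scanpath:
        row = min(int(y) // bin_px, hist.shape[0] - 1)
        col = min(int(x) // bin_px, hist.shape[1] - 1)
        hist[row, col] += d / total
    return hist.ravel()

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def scanpath_dissimilarity(scanpath_a, scanpath_b):
    """Dissimilarity of two scanpaths: the entropy of the combined
    distribution minus the average single-scanpath entropy, normalised
    by the combined entropy. Identical scanpaths give 0, while
    non-overlapping scanpaths give positive values."""
    pa = duration_weighted_histogram(scanpath_a)
    pb = duration_weighted_histogram(scanpath_b)
    combined = (pa + pb) / 2.0
    h_combined = entropy(combined)
    h_average = (entropy(pa) + entropy(pb)) / 2.0
    return (h_combined - h_average) / h_combined
```

Averaging this quantity over the 700 images for every pair of the 15 observers yields the 15 × 15 dissimilarity matrix from which the four most dissimilar pairs were drawn.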

We randomly selected scanpaths for 50 images from the OSIE dataset for each identity pair. In Blender, the eyes of each face model were rotated to track the location of a gaze target. This target was located 66 cm in front of the model to match the viewing distance at which the eye-tracking data were recorded in the OSIE dataset (Xu et al., 2014; see Fig. 2A). The position of the target on this depth plane was given by the x–y coordinates from the eye-tracking data (see Figs. 2C–E). Each fixation in an animation was displayed for the duration recorded for that fixation in the scanpath. Additional fixations were appended to the beginning and end of each scanpath, such that the animation always began and ended with a 500-ms central fixation (i.e., the face looking directly ahead). Within a face-identity pair, each face was rendered performing the 50 scanpaths from its assigned gaze identity. This resulted in 100 stimuli for each face-identity pair. As described previously, morphed faces were generated for each face-identity pair. There were 17 morph levels: 10–90% in 5% increments. For each morphed face, eight images were selected from the OSIE dataset and each morphed face was rendered performing the associated scanpaths for each gaze identity within a face pair. This resulted in 272 morphed stimuli in total for each identity pair. Examples of the animated stimuli can be found in the Online Supplementary Material.
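Mapping a recorded fixation onto the gaze target’s position on the depth plane amounts to converting image pixels into centimetres relative to the plane’s centre. A sketch under assumed display geometry (the OSIE screen’s pixel pitch is not given in the text, so `px_per_cm` is a placeholder):

```python
def fixation_to_target(x_px, y_px, img_w=800, img_h=600,
                       px_per_cm=20.0, depth_cm=66.0):
    """Convert an OSIE fixation (pixels, origin at the image's top-left
    corner) into a gaze-target position (cm) on a plane 66 cm in front
    of the face. px_per_cm is an assumed display resolution, not a
    value from the paper."""
    x_cm = (x_px - img_w / 2.0) / px_per_cm  # rightward offset from centre
    y_cm = (img_h / 2.0 - y_px) / px_per_cm  # upward offset (image y grows downward)
    return (x_cm, y_cm, depth_cm)

# The image centre maps to a target straight ahead of the face:
centre = fixation_to_target(400, 300)  # (0.0, 0.0, 66.0)
```

In Blender, the eyes can then be driven by a track-to constraint targeting an object placed at these coordinates for each fixation’s duration.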

Fig. 2

The experimental stimuli were three-dimensional (3D) models of human faces that could look around the room. (A) A screenshot of the stimulus setup in Blender. The face fixated on a depth plane 66 cm in front of them and the x–y coordinates of the fixation point were determined by the eye-tracking data from the OSIE dataset. Note that an image from the OSIE dataset has been added to the depth plane such that the plane is visible in this figure; the images were not visible in the experimental stimuli. (B) An example of the eye-tracking data from the OSIE dataset. The red dots show each fixation point from this particular scanpath, with the numbers next to each dot conveying the temporal order of the fixations. (C–E) The eyes of the face were animated using the eye-tracking data from the OSIE dataset. The eyes in this example are animated with the scanpath displayed in (B). The first, fourth, and seventh fixations of the scanpath are shown in panels (C), (D), and (E), respectively. The video of the face performing the complete scanpath from (B) can be seen in the Online Supplementary Material

In Blender, each model was illuminated by a 60-cm square plane light source that emitted light uniformly across its surface. This light source was positioned 1 m from the model and can be seen on the left in Fig. 2A. The virtual camera was positioned 57 cm in front of the model and had a spatial resolution of 1,280 pixels × 720 pixels. Note that the position of the virtual camera was consistent with participants’ viewing distance during the experiment (~ 60 cm), such that the perspective onto the faces rendered in the images was consistent with the viewing conditions used in the experiment. The rendering engine Cycles was used to render the images in Blender. During the experiment, the stimuli were presented in colour on linearised Display++ monitors (Cambridge Research Systems) with a refresh rate of 120 Hz.

Design and procedure

Participants completed the experiment in a darkened booth, seated approximately 57 cm from the monitor with their head stabilised by a chin rest. At this viewing distance, the faces were approximately ‘life-sized’ with an on-screen inter-pupillary distance of ~ 6.3 cm (Dodgson, 2004; Fesharaki et al., 2012). This viewing distance also corresponds to the interpersonal distance that is preferred for an acquaintance or close friend (Sorokowska et al., 2017). Before beginning the task, participants were instructed that they would be learning to recognise two people: Sam and Charlie. These names were selected as these are gender-neutral names in the Australian context in which the study was conducted (i.e., Sam is a common nickname for both the masculine Samuel and feminine Samantha; Charlie is a nickname for the masculine Charles or feminine Charlotte). The instructions explained that on each trial, Sam or Charlie would appear on the screen and move their eyes as if they were looking around the testing booth. Participants were instructed to indicate whether the person on-screen was Sam or Charlie and to rate how closely the person was paying attention to them on a 7-point Likert scale (‘Not at all’ to ‘Very closely’). The purpose of this rating response was to prompt participants to pay attention to the eye movements of the face without explicitly stating that this could be a cue useful for identity recognition.

The first or learning phase of the experiment consisted of 100 trials, with 50 scanpaths performed by each facial identity. The faces presented in this phase were always the original, non-morphed faces (i.e., 0% morph and 100% morph, see far left and right column of Fig. 1) with each face consistently paired with the same gaze identity. Participants were randomly assigned to a face/gaze identity pair, and the assignment of the names ‘Sam’ and ‘Charlie’ to each identity was randomised across participants. The presentation order of the stimuli was also randomised for each participant, and participants had the opportunity to take a rest break halfway through the learning phase. Participants received feedback on the correctness of their response via ‘Correct’ or ‘Incorrect’ appearing on the screen after their response was recorded.

The second or testing phase of the experiment consisted of 288 trials: 136 scanpaths performed by each gaze identity plus 16 catch trials. For the catch trials, participants viewed unmorphed faces with the congruent gaze identity, with eight trials each for Sam and Charlie. The scanpaths used to animate the faces’ eye movements in the test phase had not been seen in the learning phase. In other words, these were scanpaths recorded from the same two (human) observers as those used in the learning phase, but when looking at different images in the OSIE dataset. Participants indicated whether the face looked more like Sam or Charlie and did not receive feedback on the correctness of their response. Participants also rated how closely they felt the face was paying attention to them. Participants could take a break halfway through the testing phase and were debriefed by the experimenter upon completing the experiment.

Analysis

The experiment produced two sets of data for each participant: data from the learning phase and the testing phase. The identity discrimination responses from the learning phase were summarised as proportion of correct responses and, as this was not a particularly difficult task, we anticipated that participants would perform quite well. We did not plan to conduct any statistical analyses on these data though the descriptive statistics served as a way to check whether participants were paying attention during the experiment (and not randomly pressing buttons, for example). Likewise for the rating data from the learning phase, we did not have specific hypotheses for the ratings of how closely the face was paying attention to the participant, and this question was included in the task to prompt participants to pay attention to the eye movements of the face.

For the testing phase, there were eight data points for each facial morph level for each gaze identity. These data points were converted to the proportion of trials in which a participant indicated that the face looked like identity #2, where identity #2 could be Sam or Charlie depending on the random name assignment. A logistic curve was fit to the data with parameters for the point of subjective equality (PSE), the slope of the function, and the lapse rate. Our hypothesis centred on the PSE: the point on the morphed identity continuum at which the fitted proportion of responses of the two identities was equal. If gaze behaviour does not influence identity recognition, we would expect no difference between the PSEs associated with each gaze identity. For example, if the PSEs for both functions (i.e., the responses for the morphs paired with each gaze identity) are 50%, this indicates that when 50% of face identity #2 is present in the morphed face, participants are equally likely to judge the face as Sam or Charlie. However, if gaze behaviour does influence identity recognition, the PSEs for each gaze identity would be significantly different from one another. Keeping with the example, for morphed faces paired with gaze identity #1, we would expect the PSE to be greater than 50% as the perceived identity is biased towards the gaze identity and therefore, a greater percentage of face identity #2 is required for participants to be equally likely to judge that face as Sam or Charlie. Conversely, for morphed faces paired with gaze identity #2, the PSE would be less than 50% because participants are biased by the gaze identity and a smaller percentage of face identity #2 is necessary in the morphed face for participants to be judging the facial identity at chance.
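A minimal version of this fit can be sketched as follows. The parameterisation (a logistic core compressed by a symmetric lapse rate) and the coarse least-squares grid search are illustrative choices; the text does not specify the estimator used:

```python
import numpy as np

def logistic(x, pse, slope, lapse):
    """Proportion of 'identity #2' responses as a function of morph
    level x (%), with PSE, slope, and a symmetric lapse rate."""
    core = 1.0 / (1.0 + np.exp(-slope * (x - pse)))
    return lapse + (1.0 - 2.0 * lapse) * core

def fit_pse(morph_levels, prop_id2, lapse=0.02):
    """Grid search for the PSE and slope minimising squared error
    (a stand-in for a maximum-likelihood psychometric fit)."""
    best_err, best_pse, best_slope = np.inf, None, None
    for pse in np.arange(30.0, 70.5, 0.5):
        for slope in np.arange(0.05, 1.05, 0.05):
            err = np.sum((logistic(morph_levels, pse, slope, lapse) - prop_id2) ** 2)
            if err < best_err:
                best_err, best_pse, best_slope = err, pse, slope
    return best_pse, best_slope

# A PSE above 50% means more of identity #2 is needed before responses
# are split 50/50, i.e., perception is biased towards identity #1.
levels = np.arange(10, 95, 5)  # the 17 morph levels, 10-90% in 5% steps
```

Fitting the two gaze-identity conditions separately and comparing the resulting PSEs implements the test described above.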

Results

All participants had accuracy equal to or greater than 85% for the identity discrimination in the learning phase and the catch trials in the testing phase, indicating that participants successfully learnt the names associated with each face. The supersubject data for face identity judgements in the testing phase are depicted in Fig. 3A. As anticipated, participants were more likely to judge that a face was face identity #2 when a greater proportion of identity #2 was present in the morphed face (i.e., in terms of shape and texture cues). In contrast, there was a much weaker effect of gaze behaviour on facial identity judgements. Each participant’s PSEs are plotted in Fig. 3B, contrasting facial identity judgements when the same facial morphs performed gaze behaviours consistent with either gaze identity #1 or gaze identity #2. If participants were biased by gaze behaviour when judging facial identity, the PSEs for gaze identity #1 should be relatively greater than the PSEs for gaze identity #2 and fall below the dashed line in Fig. 3B. Averaged across participants, the PSE for gaze identity #1 (M = 51.19, SEM = 1.30) was slightly greater than the PSE for identity #2 (M = 50.10, SEM = 1.24). The difference between the PSE for each gaze identity (across face pairs) was significantly greater than zero (t(39) = 2.17, p = .036, 95% CI [0.07, 2.10], Cohen’s d = 0.34).

Fig. 3

The results from Experiment 1. (A) The supersubject data from the testing phase. The percentage of facial identity #2 present in the morphed face is on the horizontal axis and the proportion of trials in which the face was reported to be identity #2 is on the vertical axis. The blue circles represent the observed data for trials where the morphed face’s eye movements were controlled by gaze identity #1, and the orange circles correspond to trials where the face’s gaze was controlled by gaze identity #2. The blue and orange solid lines represent the best-fitting logistic functions. (B) The points of subjective equality (PSEs) for each gaze identity, with the circle markers depicting the PSEs for participants assigned to identity pair #1, square markers depicting identity pair #2, diamond markers depicting identity pair #3, and triangle markers depicting identity pair #4. If gaze behaviour biased the perceived facial identity, then the PSEs for gaze identity #1 should be relatively greater than the PSEs for gaze identity #2 and fall below the dashed line. The fitted data for each participant can be found in the Online Supplementary Material. A plot showing the relationship between each participant’s PSE difference (i.e., the PSE for identity #1 minus the PSE for identity #2) and the difference in the Likert ratings for each learned identity (i.e., the mean rating for identity #1 minus the mean rating for identity #2) can also be found in the Online Supplementary Material

However, the bootstrapped 95% confidence intervals for the PSE difference for the supersubject data indicate that the difference was not significantly different from zero (95% CI [−0.02, 1.81]). We also calculated the bootstrapped 95% confidence intervals around the PSE difference for each identity pair (i.e., those shown in Fig. 1) and found that the difference was significantly greater than zero for pair #3 (95% CI [0.23, 3.77]) but not for the other three pairs (pair #1: 95% CI [−0.23, 3.38]; pair #2: 95% CI [−0.89, 2.67]; pair #4: 95% CI [−2.71, 1.41]). We note that the t-test with the individual participant data takes the between-subject variability (i.e., the noise along the line of unity in Fig. 3B) into account, while the supersubject analysis does not. As such, the discrepancy between the results based on the individual participant data and the supersubject data is possibly due to the masking of between-subject variability in the supersubject data rendering this analysis less powerful. Furthermore, the inconsistency in the PSE difference across identity pairs in the supersubject data could reflect variation in the discriminability of facial features or gaze behaviour across pairs. In summary, we find some evidence for a significant effect of identity-specific eye movements on facial recognition, but this effect may only be significant when discriminating between certain facial identities using sufficiently dissimilar gaze behaviours and is clearly very small compared to the influence of static face features on facial identity judgements (i.e., face shape and texture).
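The percentile bootstrap behind these intervals can be sketched as follows; the resampling unit (participants’ PSE differences here) and the percentile method are assumptions about the exact scheme used:

```python
import numpy as np

def bootstrap_ci(pse_diffs, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean
    PSE difference (gaze identity #1 minus gaze identity #2): resample
    the differences with replacement, recompute the mean each time, and
    take the outer percentiles of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(pse_diffs, dtype=float)
    boot_means = np.array([
        rng.choice(diffs, size=diffs.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

An interval that excludes zero (as for pair #3 above) is read as a significant bias; an interval straddling zero (as for the supersubject data) is not.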

Experiment 2

Experiment 1 produced limited evidence of identity-specific eye movements spontaneously biasing facial recognition. That the effect observed was relatively weak is perhaps unsurprising given that traditional cues to facial identity (e.g., face shape and texture) were available to participants. Furthermore, participants were not explicitly instructed to base their judgements of identity on the face’s gaze behaviour; thus, the effect observed represents spontaneous or implicit learning of individual differences in gaze behaviour when observing others. The aim of Experiment 2 was to measure participants’ sensitivity at discriminating the identity of two people based on their gaze behaviour when facial shape and texture could not be relied upon to discriminate identity. Participants were explicitly instructed to scrutinise the eye movements of the two people as they would be tested on how much they learned about the gaze behaviours. Eye-movement data recorded from the same two individuals were later presented via a computer-generated face that did not vary across trials in shape or textural features, and participants were tasked with identifying which of the two earlier-presented identities the gaze behaviour of the face was consistent with. We theorised that explicitly prompting participants to focus on gaze behaviour would elicit the best possible performance on the identity discrimination task.

Method

Participants

Fifty-one participants (mean age = 19.5 years, min. = 18 years, max. = 25 years; 39 females, 12 males) completed the experiment. The sample size was informed by a power analysis that indicated approximately 50 participants would be required to detect a small effect (Cohen’s d ≈ 0.4) with 90% power and α = .05. The recruitment procedure was as described for Experiment 1.

Stimuli and apparatus

The same 3D head models from Experiment 1 were used to create the stimuli, and the OSIE dataset was again used to animate the eyes of each face. The pairing of facial identities remained the same, as well as the gaze identity associated with each face. The key difference in Experiment 2 was that only 50% morph faces were created, rather than the larger range of 10–90% facial morphs used in Experiment 1. In particular, all faces presented in the test phase of the experiment were 50% morphs so that participants were now only able to base their responses on changes in the face’s eye movements across trials, rather than changes in static face features (e.g., face shape and texture). Examples of the stimuli from Experiment 2 can be seen in Fig. 4. Similar to Experiment 1, 50 scanpaths were randomly selected from the OSIE dataset for the learning phase, and each unmorphed face was rendered performing the scanpaths from its associated gaze identity. Another 100 images were randomly selected from the OSIE dataset and the associated scanpaths were used to animate the gaze of each morphed face. Within a pair, a scanpath was not repeated across the morphed and non-morphed faces, such that a scanpath performed by a particular individual was never seen twice by a participant. The morphed face performed the scanpaths from the gaze identities assigned to the facial-identity pair. This resulted in 200 stimuli for each morphed face.

Fig. 4

Examples of the stimuli from Experiment 2. There were four identity pairs: two female pairs (#1 and #2) and two male pairs (#3 and #4). Participants only saw one identity pair and identity pairs were randomly assigned to each participant. In the first part of the experiment, participants learned to recognise each identity. On each trial, one identity would appear and look around the screen. In the second part of the experiment, participants would see the 50% morph of the two faces (shown in the right column) and judge whether the face looked around in a manner that was more similar to identity #1 or identity #2

Design and procedure

The experiment had two phases: the learning phase and the testing phase. In the learning phase, participants were introduced to the two identities (Sam and Charlie) and instructed that they would be learning to recognise them. On each trial, Sam or Charlie would appear on-screen and their eyes would move as if they were looking around the room. Participants were instructed to indicate whether they saw Sam or Charlie on each trial, and to pay close attention to the way Sam or Charlie looked around, as they would be tested on how much they had learnt about Sam and Charlie’s looking behaviours in the second part of the experiment.

The learning phase consisted of 100 trials in total, with 50 trials for each facial identity. The name assigned to each identity (i.e., Sam or Charlie) was randomised for each participant. On each trial, the identity appeared on the screen and performed the gaze behaviour. A response screen then appeared with the response prompt: “Did you see Sam or Charlie? Press the left arrow key for Sam or the right arrow key for Charlie.” Participants received feedback on the correctness of their response with ‘Correct’ or ‘Incorrect’ appearing on the screen following their response. The order of the trials was randomised, and participants were given the opportunity to take a break halfway through the learning phase.

Following the completion of the learning phase, participants were informed that they would now be tested on how much they had learnt about Sam and Charlie’s gaze behaviour. In the testing phase, participants would see a new facial identity (a 50% morph of the two facial identities shown in the learning phase), but its eye movements would still be controlled by either Sam or Charlie. Participants indicated whether this new person looked around the room in a way that was more like Sam or Charlie. There were 200 trials in total, with half of the gaze behaviours performed by Sam and half by Charlie, using scanpath exemplars for these identities that had not been seen in the learning phase (i.e., scanpaths from the OSIE dataset that were recorded from the same human observers when looking at different images). Hence, accurate performance in the test phase relied on participants learning differences in the style of looking behaviour displayed by the two identities during the learning phase, rather than simply recognising specific scanpaths that they had seen before. The order of trials was randomised for each participant, and there was a rest break halfway through the testing phase.

Upon completing the experiment, a small text box appeared on-screen that prompted participants to respond to the question: “How did you decide whether the person's eye movements were more like Sam's or Charlie's?” Once the participant had entered their response, the experiment was finished and the participant was debriefed by the experimenter.

Analysis

The data from the testing phase were analysed using signal detection theory. We calculated the proportion of identity #1 responses on the trials where the gaze behaviour was performed by identity #1 (‘hits’) and on trials where the gaze behaviour was performed by identity #2 (‘false alarms’). This allows for the separation of discrimination sensitivity from participants’ overall tendency to report that a particular person (i.e., identity #1 or #2) was performing the looking behaviour (i.e., their decision criterion or response bias). Discrimination sensitivity (d’) was given by:

d′ = z(P_H) − z(P_FA)

where the z-score for the false alarm rate is subtracted from the z-score for the hit rate (Kingdom & Prins, 2010). Note that the maximum possible discrimination sensitivity value was 5.15. The decision criterion (C) was calculated from:

C = −0.5 [z(P_H) + z(P_FA)]

where an unbiased participant would have a criterion of zero, and negative values correspond to a participant having a bias towards judging the gaze behaviour as being performed by identity #1 on a given trial (Stanislaw & Todorov, 1999).
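The two measures above can be computed directly in Python using the standard-normal inverse CDF. In the sketch below, extreme hit and false-alarm rates are clamped to the common 1/(2N) convention; the paper does not state which adjustment was used, but this convention reproduces the stated ceiling of 5.15 given 100 trials per gaze identity.

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """d-prime and criterion C from trial counts. Extreme rates are
    clamped to 1/(2N) (an assumed convention; the authors' exact
    adjustment is not specified)."""
    z = NormalDist().inv_cdf
    n_signal = hits + misses                     # identity #1 gaze trials
    n_noise = false_alarms + correct_rejections  # identity #2 gaze trials
    p_hit = min(max(hits / n_signal, 1 / (2 * n_signal)),
                1 - 1 / (2 * n_signal))
    p_fa = min(max(false_alarms / n_noise, 1 / (2 * n_noise)),
               1 - 1 / (2 * n_noise))
    d_prime = z(p_hit) - z(p_fa)
    criterion = -0.5 * (z(p_hit) + z(p_fa))
    return d_prime, criterion

# With 100 trials per gaze identity, a perfect scorer's rates clamp to
# 0.995 and 0.005, reproducing the stated ceiling: d' = 2 * z(0.995) ≈ 5.15.
d_max, _ = sdt_measures(100, 0, 0, 100)
```

Note that an observer responding at chance (equal hit and false-alarm rates) yields d′ = 0, and a symmetric response pattern yields C = 0, matching the interpretation of the criterion given above.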

Results

Discrimination sensitivity and criterion for each participant are shown in Fig. 5. Overall, accuracy (M = 54.83%, SEM = 0.86%) and sensitivity (M = 0.23, SEM = 0.04) were quite poor, indicating that the task was generally a difficult one, but there is evidence that some participants could successfully discriminate idiosyncrasies in looking behaviour. Discrimination sensitivity was significantly greater than zero (t(50) = 5.03, p <.001, Cohen’s d = 0.70), a medium-to-large effect indicating that participants were able to discriminate identity based on looking behaviour to some extent. We note that there is some variation in the mean sensitivity values across identity pairs (pair #1: 0.22; pair #2: 0.55; pair #3: 0.02; pair #4: 0.12). We also note the range of d’ values depicted in Fig. 5A: the minimum d’ is −0.44 and the maximum is 1.30, with approximately 20% of participants having a d’ greater than 0.5 and 75% having a d’ greater than zero. Taken together, our results indicate that participants can learn to recognise an individual’s pattern of looking behaviour, even when that looking behaviour is derived simply from eye movements recorded during the free viewing of naturalistic images. As the eye movements during the learning and test phases were always different (though recorded from the same two real observers while looking at different naturalistic images in a free-viewing task), these results reveal an ability of our participants to learn ‘styles’ of looking behaviour when viewing another person’s face rather than simply recognising previously seen animations. The criterion was not significantly different from zero for any of the identity pairs (Fig. 5B), suggesting that participants were not biased towards selecting one identity over another (pair #1: t(13) = −1.39, p =.187; pair #2: t(11) = 0.06, p =.958; pair #3: t(11) = 0.06, p =.952; pair #4: t(12) = −1.40, p =.188).
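As a sanity check on the group-level statistics, note that for a one-sample t test against zero the t statistic equals Cohen’s d multiplied by √N. A short Python sketch makes the relation explicit; the d′ values used here are illustrative placeholders, not the study’s data.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_effect(values):
    """Cohen's d and t statistic for a one-sample test against zero:
    d = mean / sd, and t = d * sqrt(N)."""
    d = mean(values) / stdev(values)
    return d, d * sqrt(len(values))

# Illustrative placeholder d' values (NOT the study's data):
d_eff, t_stat = one_sample_effect([0.1, 0.4, -0.2, 0.6, 0.3, 0.25, 0.0, 0.5])

# Checking the reported statistics against each other: d = 0.70 with
# N = 51 gives t = 0.70 * sqrt(51) ≈ 5.0, consistent with t(50) = 5.03.
t_check = 0.70 * sqrt(51)
```

The agreement between the reported effect size and t value confirms the two statistics describe the same one-sample comparison.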

Fig. 5.

The results from Experiment 2. Sensitivity at discriminating gaze identity is shown in (A) and response bias is shown in (B). The black markers represent the data from each participant (the circle markers depict participants assigned to identity pair #1, square markers depict identity pair #2, diamond markers depict identity pair #3, and triangle markers depict identity pair #4), the horizontal black bar is the median, the lower and upper boundaries of the white box are the 25th percentile and 75th percentile, and the black lines extending from the box represent the range of the data, excluding outliers. The probability density estimate of the data is shown in blue

Discussion

Facial recognition is supported by a number of visual cues, including holistic facial structure, skin pigmentation, and internal and external features of the face (Abudarham et al., 2019; Burton et al., 1999; Ellis et al., 1979). However, successful identity recognition often goes beyond these traditional visual cues and involves non-face cues as well as personal knowledge of the individual and the emotional response associated with them (Gobbini & Haxby, 2007; Gobbini et al., 2004). In general, dynamic cues such as facial motion, gestures, and gait are known to support recognition (Loula et al., 2005; O’Toole et al., 2002), though the extent to which gaze, as a form of biological motion, influences recognition and informs the internal representation of an individual is unclear. The current study contributes to this literature by demonstrating that human observers can learn about an individual’s idiosyncratic gaze behaviours, both spontaneously and when explicitly instructed to do so, and that this learning can influence identity recognition (though this effect was only found for some identities in Experiment 1). These results lay a foundation for future studies to examine further how the dynamics of (observed) gaze behaviours inform person recognition and other aspects of person perception.

The current results are consistent with eye-tracking studies that have highlighted that individual differences in gaze behaviour are stable over time (Berlijn et al., 2022; de Haas et al., 2019; Guy et al., 2019; Mehoudar et al., 2014). These differences in eye movements across individuals are not simply random variation but rather idiosyncratic gaze behaviours carrying identifiable information about an individual, such that computer algorithms can be developed to identify individuals from their eye-movement data (Crockford et al., 2023). Few studies have examined gaze-based identity recognition (in human observers), and these studies suggest that discrimination of identity from eye-movement data is possible but limited (Clarke et al., 2017; Foulsham & Kingstone, 2013; Võ et al., 2016; Ziman et al., 2023). However, the aforementioned studies used an abstract stimulus in which eye movements were represented by a dynamic dot on an image. In contrast, this study demonstrates that eye movements can contribute to identity recognition when observed naturalistically as patterns of facial motion, even when more traditional and robust cues to facial identity are present.

Participants reported that different aspects of each face’s gaze behaviour influenced their responses, including the number of fixations, fixation duration, and the spatial distribution of fixations. Some participants also associated an emotional state with each identity. For example, participants described associating slower and subtler eye movements with a particular person, and this person was perceived as calm and relaxed. Conversely, a person who became associated with more rapid eye movements was described as anxious, nervous, and erratic. These results suggest an interesting avenue for future research: If we can recognise individual patterns of gaze behaviour, how does this influence social judgements made about the individual? Previous research has identified a link between gaze and an individual’s emotional state (e.g., Mathews et al., 2003), though much of this research has used static stimuli that do not convey the dynamic nature of gaze. Recent research from Landmann et al. (2024) revealed that affective evaluations are made about specific gaze behaviours. For example, observers tended to rate fast blinking and gaze shifts more negatively than slower eye movements, with the former behaviours associated with restlessness and inattentiveness. These evaluations of gaze were modulated by the emotional context: for example, downwards gaze was deemed more socially appropriate during an emotionally negative conversation (Landmann et al., 2024). Together with the current study, these results suggest that future research focusing on social evaluations of an individual’s dynamic gaze behaviour may deepen understanding of gaze perception and its role in social cognition.

Another potential avenue for future research is to test whether providing context for the target of the face’s gaze enhances the influence of gaze behaviour on the perceived facial identity. In the current study, participants were informed that the face was moving its eyes as if it were looking around the room. In reality, the eye movements were driven by eye-tracking data recorded from human observers as they freely viewed a set of naturalistic images. As such, any learning about the face’s gaze patterns is likely due more to the mechanics of the eye movements, such as the number of fixations or their dispersion, than to the alignment of the fixation pattern with the subject’s environment. The stimuli used in the current study could be adapted, however, to include an image being examined by the face, similar to the stimuli used by Pantelis and Kennedy (2017), which consisted of a human face behind a semi-transparent screen that displayed images. The participant could then see both the face’s eye movements and the image it was examining. This type of stimulus could be used to examine whether individual differences in the types of visual features that attract attention (e.g., low-level vs. semantic-level features of the visual environment; Wang et al., 2015) are salient to observers when viewing others and contribute to person perception. Since Ziman et al. (2023) suggest that an individual’s eye movements can be recognised with visual context, it is possible that the small effects reported in the current study would be bolstered by providing additional information about the gazer’s visual environment. Alternatively, a face’s eye movements could be controlled by eye-tracking data recorded from within the experimental testing room, such that participants are viewing how another person interacts with the shared visual environment.

The current study investigated the extent to which human observers can discriminate the dynamic gaze behaviours of different individuals. We report that eye movements can serve as a motion cue for identity recognition, though discriminating identity based solely on eye movements is challenging. Our research opens up a number of different avenues for future research that will be informative for understanding gaze perception, visual attention, and social cognition. One question is whether the demographics of an observer and those they are interacting with influence how gaze behaviours are perceived. For example, future studies could examine how gender-related biases may affect the results reported in the current study, given that the gender of a face can influence how others perceive both its direction of gaze and facial expression (Slepian et al., 2011). Another avenue for future work is the role of neurotypicality, which was not examined in the current study but is likely to influence the effects reported here. For example, individuals diagnosed with autism spectrum disorder (ASD) demonstrate atypical visual attention, often fixating less on semantic image content (such as faces) than neurotypical observers (Amso et al., 2014; Wang et al., 2015) and displaying difficulty identifying the target of another’s gaze (Pantelis & Kennedy, 2017). This raises a question of both how the perception and interpretation of dynamic gaze behaviour differs across individuals and groups, and how differences between people in their gaze behaviour are perceived by others.

Authors’ contributions

LMP: conceptualisation, methodology, formal analysis, investigation, writing – original draft, writing – review and editing, visualisation, project administration. CWGC: conceptualisation, methodology, resources, writing – review and editing, supervision, funding acquisition. CJP: conceptualisation, methodology, resources, writing – review and editing, supervision, funding acquisition.

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions. This research was supported by an Australian Research Council Discovery Project grant (DP200100003 to CJP and CWGC).

Data availability

The data and research materials are available by contacting the corresponding author. Access to some research materials may require purchasing third-party software assets. Supplementary material can be accessed via the Open Science Framework at: https://osf.io/fu8yp

Code availability

The code is available by contacting the corresponding author.

Declarations

Conflicts of interest/Competing interests

The authors have no known conflict of interest to disclose.

Ethics approval

Approved by Human Research Ethics Committee, School of Psychology, UNSW Sydney (Project #5733).

Consent to participate

Informed consent was obtained from all participants.

Consent for publication

Informed consent was obtained from all participants.

Open practices statement

The study was not preregistered. The data and research materials are available by contacting the corresponding author. Access to some research materials may require purchasing third-party software assets.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Abudarham, N., Shkiller, L., & Yovel, G. (2019). Critical features for face recognition. Cognition, 182, 73–83. 10.1016/j.cognition.2018.09.002
  2. Amso, D., Haas, S., Tenenbaum, E., Markant, J., & Sheinkopf, S. J. (2014). Bottom-up attention orienting in young children with autism. Journal of Autism and Developmental Disorders, 44, 664–673.
  3. Baron-Cohen, S. (1997). Mindblindness: An essay on autism and theory of mind. MIT Press.
  4. Berlijn, A. M., Hildebrandt, L. K., & Gamer, M. (2022). Idiosyncratic viewing patterns of social scenes reflect individual preferences. Journal of Vision, 22(13), 1–18. 10.1167/jov.22.13.10
  5. Burton, A. M., Wilson, S., Cowan, M., & Bruce, V. (1999). Face recognition in poor-quality video: Evidence from security surveillance. Psychological Science, 10(3), 243–248. 10.1111/1467-9280.00144
  6. Bush, J. C., Pantelis, P. C., Morin Duchesne, X., Kagemann, S. A., & Kennedy, D. P. (2015). Viewing complex, dynamic scenes “through the eyes” of another person: The gaze-replay paradigm. PLoS ONE, 10(8), Article e0134347. 10.1371/journal.pone.0134347
  7. Clarke, A. D., Mahon, A., Irvine, A., & Hunt, A. R. (2017). People are unable to recognize or report on their own eye movements. Quarterly Journal of Experimental Psychology, 70(11), 2251–2270. 10.1080/17470218.2016.1231208
  8. Constantino, J. N., Kennon-McGill, S., Weichselbaum, C., Marrus, N., Haider, A., Glowinski, A. L., & Jones, W. (2017). Infant viewing of social scenes is under genetic control and is atypical in autism. Nature, 547(7663), 340–344.
  9. Crockford, S. K., Satta, E., Severino, I., Fiacchino, D., Vitale, A., Bertelsen, N., ... & Lombardo, M. V. (2023). Detection of idiosyncratic gaze fingerprint signatures in humans. bioRxiv. 10.1101/2023.09.18.558217
  10. de Haas, B., Iakovidis, A. L., Schwarzkopf, D. S., & Gegenfurtner, K. R. (2019). Individual differences in visual salience vary along semantic dimensions. Proceedings of the National Academy of Sciences, 116(24), 11687–11692. 10.1073/pnas.1820553116
  11. Degutyte, Z., & Astell, A. (2021). The role of eye gaze in regulating turn taking in conversations: A systematized review of methods and findings. Frontiers in Psychology, 12, 1–22. 10.3389/fpsyg.2021.616471
  12. Dodgson, N. A. (2004). Variation and extrema of human interpupillary distance. In Stereoscopic displays and virtual reality systems XI (Vol. 5291, pp. 36–46). SPIE. 10.1117/12.529999
  13. Ellis, H. D., Shepherd, J. W., & Davies, G. M. (1979). Identification of familiar and unfamiliar faces from internal and external features: Some implications for theories of face recognition. Perception, 8(4), 431–439. 10.1068/p080431
  14. Fesharaki, H., Rezaei, L., Farrahi, F., Banihashem, T., & Jahanbkhshi, A. (2012). Normal interpupillary distance values in an Iranian population. Journal of Ophthalmic & Vision Research, 7(3), 231–234.
  15. Foulsham, T., & Kingstone, A. (2013). Where have eye been? Observers can recognise their own fixations. Perception, 42(10), 1085–1089. 10.1068/p7562
  16. Gobbini, M. I., & Haxby, J. V. (2007). Neural systems for recognition of familiar faces. Neuropsychologia, 45(1), 32–41. 10.1016/j.neuropsychologia.2006.04.015
  17. Gobbini, M. I., Leibenluft, E., Santiago, N., & Haxby, J. V. (2004). Social and emotional attachment in the neural representation of faces. NeuroImage, 22(4), 1628–1635. 10.1016/j.neuroimage.2004.03.049
  18. Guy, N., Azulay, H., Kardosh, R., Weiss, Y., Hassin, R. R., Israel, S., & Pertzov, Y. (2019). A novel perceptual trait: Gaze predilection for faces during visual exploration. Scientific Reports, 9(1), Article 10714. 10.1038/s41598-019-47110-x
  19. Kennedy, D. P., D’Onofrio, B. M., Quinn, P. D., Bölte, S., Lichtenstein, P., & Falck-Ytter, T. (2017). Genetic influence on eye movements to complex scenes at short timescales. Current Biology, 27(22), 3554–3560. 10.1016/j.cub.2017.10.007
  20. Kingdom, F. A. A., & Prins, N. (2010). Psychophysics: A practical introduction. Academic Press.
  21. Landmann, E., Breil, C., Huestegge, L., & Böckler, A. (2024). The semantics of gaze in person perception: A novel qualitative-quantitative approach. Scientific Reports, 14(1), Article 893. 10.1038/s41598-024-51331-0
  22. Loula, F., Prasad, S., Harber, K., & Shiffrar, M. (2005). Recognizing people from their movement. Journal of Experimental Psychology: Human Perception and Performance, 31(1), 210–220. 10.1037/0096-1523.31.1.210
  23. Martin, J. G., Davis, C. E., Riesenhuber, M., & Thorpe, S. J. (2018). Zapping 500 faces in less than 100 seconds: Evidence for extremely fast and sustained continuous visual search. Scientific Reports, 8, Article 12482. 10.1038/s41598-018-30245-8
  24. Mathews, A., Fox, E., Yiend, J., & Calder, A. (2003). The face of fear: Effects of eye gaze and emotion on visual attention. Visual Cognition, 10(7), 826–835. 10.1080/13506280344000095
  25. Mehoudar, E., Arizpe, J., Baker, C. I., & Yovel, G. (2014). Faces in the eye of the beholder: Unique and stable eye scanning patterns of individual observers. Journal of Vision, 14(7), 1–11. 10.1167/14.7.6
  26. O’Toole, A. J., Roark, D. A., & Abdi, H. (2002). Recognizing moving faces: A psychological and neural synthesis. Trends in Cognitive Sciences, 6(6), 261–266. 10.1016/S1364-6613(02)01908-3
  27. Pantelis, P. C., & Kennedy, D. P. (2017). Deconstructing atypical eye gaze perception in autism spectrum disorder. Scientific Reports, 7(1), 1–10. 10.1038/s41598-017-14919-3
  28. Slepian, M. L., Weisbuch, M., Adams, R. B., Jr., & Ambady, N. (2011). Gender moderates the relationship between emotion and perceived gaze. Emotion, 11(6), 1439–1444. 10.1037/a0026163
  29. Sorokowska, A., Sorokowski, P., Hilpert, P., Cantarero, K., Frackowiak, T., Ahmadi, K., Alghraibeh, A. M., Aryeetey, R., Bertoni, A., Bettache, K., Blumen, S., Błażejewska, M., Bortolini, T., Butovskaya, M., Castro, F. N., Cetinkaya, H., Cunha, D., David, D., David, O. A., … Pierce, J. D. (2017). Preferred interpersonal distances: A global comparison. Journal of Cross-Cultural Psychology, 48(4), 577–592. 10.1177/0022022117698039
  30. Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. 10.3758/BF03207704
  31. Võ, M.-H., Aizenman, A. M., & Wolfe, J. M. (2016). You think you know where you looked? You better look again. Journal of Experimental Psychology: Human Perception and Performance, 42(10), 1477. 10.1037/xhp0000264
  32. Wang, S., Jiang, M., Duchesne, X. M., Laugeson, E. A., Kennedy, D. P., Adolphs, R., & Zhao, Q. (2015). Atypical visual saliency in autism spectrum disorder quantified through model-based eye tracking. Neuron, 88(3), 604–616. 10.1016/j.neuron.2015.09.042
  33. Wolfe, J. M. (2007). Guided Search 4.0: Current progress with a model of visual search. In W. D. Gray (Ed.), Integrated models of cognitive systems (pp. 99–119). Oxford Academic. 10.1093/acprof:oso/9780195189193.003.0008
  34. Xu, J., Jiang, M., Wang, S., Kankanhalli, M. S., & Zhao, Q. (2014). Predicting human gaze beyond pixels. Journal of Vision, 14(1), 1–20. 10.1167/14.1.28
  35. Yovel, G., & O’Toole, A. J. (2016). Recognizing people in motion. Trends in Cognitive Sciences, 20(5), 383–395. 10.1016/j.tics.2016.02.005
  36. Ziman, K., Kimmel, S. C., Farrell, K. T., & Graziano, M. S. (2023). Predicting the attention of others. Proceedings of the National Academy of Sciences, 120(42), Article e2307584120. 10.1073/pnas.2307584120



Articles from Psychonomic Bulletin & Review are provided here courtesy of Springer
