PLOS One. 2022 Feb 4;17(2):e0263594. doi: 10.1371/journal.pone.0263594

How to choose the size of facial areas of interest in interactive eye tracking

Antonia Vehlen 1, William Standard 1, Gregor Domes 1,*
Editor: Guido Maiello 2
PMCID: PMC8815978  PMID: 35120188


Advances in eye tracking technology have enabled the development of interactive experimental setups to study social attention. Since these setups differ substantially from the eye tracker manufacturer’s test conditions, validation is essential with regard to the quality of gaze data and other factors potentially threatening the validity of this signal. In this study, we evaluated the impact of accuracy and area of interest (AOI) size on the classification of simulated gaze (fixation) data. We defined AOIs of different sizes using the Limited-Radius Voronoi-Tessellation (LRVT) method, and simulated gaze data for facial target points with varying accuracy. As hypothesized, we found that accuracy and AOI size had strong effects on gaze classification. In addition, these effects were not independent and differed in falsely classified gaze inside AOIs (Type I errors; false alarms) and falsely classified gaze outside the predefined AOIs (Type II errors; misses). Our results indicate that smaller AOIs generally minimize false classifications as long as accuracy is good enough. For studies with lower accuracy, Type II errors can still be compensated to some extent by using larger AOIs, but at the cost of more probable Type I errors. Proper estimation of accuracy is therefore essential for making informed decisions regarding the size of AOIs in eye tracking research.


Eye tracking, especially in its video-based form, has become a standard method for investigating visual attention in many research areas, including social neuroscience [1, 2], psychopharmacology [3–5] and virtual reality [6, 7]. In recent years, several studies have questioned common reporting standards in this field, or the lack of data quality evaluations regarding the specific setup in use [8, 9]. For example, some studies report only the overall accuracy of the hardware as determined by the manufacturer under ideal test conditions. The issue of data quality and inadequate reporting standards seems to be increasingly relevant as the field advances to develop more naturalistic, interactive or face-to-face eye tracking applications [10–12]. Not only do these setups deviate further in design from the manufacturer’s test conditions, but when used in naturalistic interactions, other factors can affect the accuracy, such as movements accompanying facial expressions, speech or varying viewing distances.

Area of interest (AOI) based gaze classification is a popular approach for analyzing gaze data [13, 14]. This approach determines whether a gaze point or a fixation is directed at a predefined region around a target point. In face perception, for example, the target would be another person’s face, while potential AOIs would be the eye region or mouth. Researchers have taken this approach to study, for example, the gaze behavior of participants with (sub-)clinical social anxiety or autism [11, 15, 16]. For these applications in potentially interactive scenarios, automatic procedures to generate AOIs seem beneficial [13], as manual generation can be time-consuming depending on the recording duration. In addition, the uniformity of target stimuli in these setups allows the use of standardized, published procedures (e.g., the Limited-Radius Voronoi-Tessellation method, LRVT; [17]), thus ensuring the comparability of research results. In this method, AOI size can be easily regulated by adjusting the limiting radius.

The impact of AOI size on gaze classification has been investigated [13, 14]. In those studies, suggestions were made that AOIs on sparse stimuli should be large enough to be robust to noise [13], and that oversized AOIs are problematic when considering falsely positive classified gaze data [14]. While the problem of inadequate AOI sizes seems to be known, guidelines on how to choose the most appropriate size of AOIs remain vague. In addition, it is conceivable that the choice of AOI sizes depends on the gaze data’s accuracy. In general, we can assume that low accuracy would require larger AOIs to ensure valid classified gaze points. However, it is unclear how accuracy and AOI size interact in affecting classification performance, and whether false-negatives and false-positives are affected differentially.

With the current study, we aimed to investigate the gaze classification performance depending on the accuracy of the detected gaze position (spatial offset between detected and real gaze position) and the AOI size (LRVT with different radii) with simulated gaze data in order to derive guidelines for selecting AOIs and their size in interactive (face-to-face) eye tracking applications. Thereby, we focus on classification performance with respect to false-positives (falsely classified inside a specific AOI; Type I error; false alarms) and false-negatives (falsely classified outside a specific AOI; Type II error; misses) to derive recommendations for choosing AOI size depending on accuracy. Specifically, we expected accuracy and AOI size to independently influence classification performance, and for the two factors to interact such that AOI size would demonstrate a greater impact on classification performance when accuracy is low. Along with these recommendations, we present a software tool that enables gaze data to be generated with a given accuracy, the visualization of gaze data on AOIs of different sizes, and the evaluation of the resulting classification of gaze data.



Facial stimuli from the Face Research Lab London (version 3; [18]) served as the basis for this simulation. We selected the following four stimuli to represent different ethnic groups: 005 (male, Asian, 28 years old), 012 (male, white, 24 years old), 025 (female, African American, 21 years old) and 134 (female, white, 21 years old). A picture with a neutral facial expression and direct gaze was chosen for each stimulus. Stimuli were resized and rescaled to 480 x 480px resembling the size of a real person sitting at a viewing distance of approx. 130cm recorded at 1920 x 1080px resolution. After rescaling, the facial stimuli covered an average area of 7.8 by 5.7°, which corresponds to the size of a real face in a face-to-face conversation at the aforementioned viewing distance of approx. 130cm. For display in the figures, we created another stimulus that was not used for the simulation. The individual pictured in Fig 1 and Figs 3 to 5 has provided written informed consent (as outlined in PLOS consent form) to publish their image alongside the manuscript.

Fig 1. Visualization of gaze data simulation and areas of interest (AOI) definition.


(A) Visualization of the three gamma functions used to generate gaze data with three levels of accuracy (0.5°, 1.0° & 1.5°) and examples of the simulated fixations for the left eye of a facial stimulus as the facial target. Each red dot represents the averaged fixation location of 30 simulated fixations; a total of n = 100 data sets were simulated. (B) Visualization of the three steps of the automatic AOI construction process. 1. Facial landmarks from OpenFace. 2. AOI center points derived from the facial landmarks. 3. Resulting AOIs using the Limited-Radius Voronoi-Tessellation (LRVT) method (example with 2.0° radius). Note OF = OpenFace [19]. The stimulus shown in A and B was created for illustrative purposes only and is not part of the stimulus set used in the study (see Methods section).

Fig 3. Effect of gaze data accuracy and AOI size on classification performance of simulated fixations (on eyes, nose & mouth) averaged over all facial AOIs.


(A) Visualization of the AOI sizes 1.0° and 2.0° drawn around the blue center points and the simulated fixation points in red (n = 100, averaged over 30 simulated fixations; accuracy 0.5°). (B) Green indicates correct classification within the corresponding AOI, orange and gray indicate misclassifications as fixations within the other AOIs, or no AOI at all. Effect of gaze data accuracy and AOI size on classification performance of simulated fixations (on the forehead). (C) Visualization of AOIs with 1.0° and 2.0° radius and the simulated fixation points on the forehead in red (n = 100, averaged over 30 simulated fixations; accuracy 0.5°). (D) Green indicates the correct classification outside any AOI, orange and gray indicate misclassifications as fixations within the AOIs of the eye region or within the other AOIs. The stimulus shown in A and C was created for illustrative purposes only and is not part of the stimulus set used in the study (see Methods section).

Fig 5. Effect of viewing distance on stimulus size and fixation point deviation.


(A) Visualization of the stimulus size over three viewing distances (90cm, 130cm & 170cm) with AOIs covering the same facial areas (radius 4.6cm; in degrees of visual angle: 2.9°, 2.0° & 1.6°, respectively) and simulated fixation points on the left eye. (B) Visualization of the interaction between visual angle and viewing distance on fixation point deviation. In both figures (A & B) each red dot represents the averaged fixation location of 30 simulated fixations with an accuracy of 1.5°; a total of n = 100 data sets were simulated. The stimulus shown in A and B was created for illustrative purposes only and is not part of the stimulus set used in the study (see Methods section).

Gaze data simulation

The goal of our gaze data simulation was to mimic a standard test procedure with multiple participants and several runs of a gaze validation on facial features, i.e., the instructed sequential fixation of specific target points on a facial stimulus. For each target, fixations lasting one second each were simulated at a 120 Hz recording frequency. The coordinates of five facial features (left eye, right eye, nose, mouth & forehead) were chosen as targets. Each target point corresponded to an AOI center point, except for the forehead point, for which no AOI was generated. Both target points and AOI center points were determined using OpenFace landmarks (Fig 1B; [19]).

To simulate a realistic gaze data set for a group of (simulated) participants, the fixation points around the facial targets were determined in four steps. (1) Mean accuracy, sample size and number of runs were specified. (2) Each simulated participant was assigned a base accuracy drawn from a generalized gamma distribution around the specified mean accuracy. The standard deviation was set to 0.5 times the mean accuracy and the skewness to 0.6. (3) A random offset angle around the target point was chosen for each simulated participant. (4) Offsets per target were created for each simulated participant depending on the number of runs by varying the individual base accuracy according to a normal distribution with a standard deviation of 0.15 times the base accuracy. Runs with accuracy values that fell more than three standard deviations from the base accuracy were recalculated. This procedure allowed us to account for within- (Step 4) and between-subjects (Steps 1–3) variance. We applied the above-mentioned method to simulate data with mean accuracy values of 0.5°, 1.0° and 1.5°. The distribution of the simulated gaze data for the three accuracy levels is shown in Fig 1A.
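
The four steps can be sketched in a few lines of Python. This is a minimal illustration and not the published tool: the function name `simulate_offsets` and its parameters are hypothetical, and the sketch assumes an ordinary gamma distribution for the base accuracy (standard deviation 0.5 times the mean, which implies a skewness of 1.0) in place of the generalized gamma distribution with skewness 0.6 described above.

```python
import math
import random


def simulate_offsets(mean_acc_deg, n_participants, n_runs, seed=0):
    """Return offsets[participant][run] = (dx, dy) in degrees of visual angle."""
    rng = random.Random(seed)
    # ASSUMPTION: ordinary gamma distribution with SD = 0.5 * mean.
    # This yields a skewness of 1.0, not the 0.6 of the generalized gamma.
    sd = 0.5 * mean_acc_deg
    shape = (mean_acc_deg / sd) ** 2   # mean^2 / variance
    scale = sd ** 2 / mean_acc_deg     # variance / mean
    offsets = []
    for _ in range(n_participants):
        # Step 2: base accuracy of this simulated participant.
        base = rng.gammavariate(shape, scale)
        # Step 3: one random offset direction per participant.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        runs = []
        for _ in range(n_runs):
            # Step 4: vary the base accuracy per run (SD = 0.15 * base) and
            # redraw values that fall more than three SDs from the base.
            run_sd = 0.15 * base
            acc = rng.gauss(base, run_sd)
            while abs(acc - base) > 3.0 * run_sd:
                acc = rng.gauss(base, run_sd)
            runs.append((acc * math.cos(angle), acc * math.sin(angle)))
        offsets.append(runs)
    return offsets


# Step 1: mean accuracy, sample size and number of runs.
offsets = simulate_offsets(mean_acc_deg=1.0, n_participants=100, n_runs=30)
```

Adding each offset to a target point gives one simulated fixation per run. Because the offset direction is fixed per participant, the runs of one participant cluster on one side of the target, as a systematic offset would.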

A total of 100 participants were simulated with 30 face validation runs for the three accuracy values and four stimuli, resulting in 36000 data sets. Each face validation run consists of a fixation for each facial target point computed by averaging the gaze samples from one second of recording at a frequency of 120 Hz. Data was simulated with an in-house tool written in Python 3.7. The tool can be downloaded here:

Definition of AOIs

We used the LRVT method [13] with different radii for facial features and a face ellipse to automatically define AOIs and vary their size. In the first step, facial landmarks (eyes, nose & mouth) were obtained using OpenFace [19] (Fig 1B). Second, AOI centers were either derived directly from the facial landmarks (nose & mouth) or by calculating the midpoint between two landmarks (left & right eye). The face ellipse’s center was created by calculating the midpoint between the left and right eye corners (x-coordinate) and 1.5 times the distance between facial landmark 8 and 33 (y-coordinate) (Fig 1B). In the final step, AOIs were defined by applying the LRVT method with three different radii (1.0, 1.5 & 2.0°) for the facial features, and OpenFace landmarks were used to define the face ellipse. The ellipse’s horizontal radius is the smaller distance between the face’s center point and facial landmark 0 or 16 (x-coordinate). The vertical radius is the distance between the face center and facial landmark 8 (y-coordinate). To assess the effect of accuracy and AOI size on gaze classification performance, the AOI radius of 4° proposed in the literature as being robust to noise (imprecision of the signal) [13, 17] was adjusted to a 130cm viewing distance, resulting in a radius of approx. 2.0° (~4.6cm). This was necessary to ensure that the AOIs covered the same facial area. Additionally, this radius was reduced twice (to 1.5° [~3.4cm] & 1.0° [~2.3cm]) to test the effect of different AOI sizes on classification performance.
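
The classification rule behind LRVT can be illustrated as follows: a gaze point is assigned to the AOI with the nearest center point, but only if it lies within the limiting radius of that center. The sketch below is a minimal illustration under stated assumptions. The function name `classify_lrvt` and the AOI center coordinates are hypothetical and were not taken from the stimuli or from OpenFace output.

```python
import math

# Hypothetical AOI centers in degrees of visual angle (not from the study).
CENTERS = {
    "left_eye": (-1.1, 0.0),
    "right_eye": (1.1, 0.0),
    "nose": (0.0, -1.3),
    "mouth": (0.0, -2.4),
}


def classify_lrvt(point, centers, radius):
    """Return the nearest AOI name if `point` lies within `radius`, else None."""
    def distance(name):
        cx, cy = centers[name]
        return math.hypot(point[0] - cx, point[1] - cy)

    nearest = min(centers, key=distance)  # Voronoi cell of the nearest center
    return nearest if distance(nearest) <= radius else None


# A forehead point about 1.56 deg from either eye center in this layout.
forehead = (0.0, 1.1)
print(classify_lrvt((-1.0, 0.2), CENTERS, 1.0))  # left_eye
print(classify_lrvt(forehead, CENTERS, 1.0))     # None
print(classify_lrvt(forehead, CENTERS, 2.0))     # an eye AOI (false alarm)
```

With these illustrative coordinates, a perfectly measured forehead fixation stays unclassified at a 1.0° radius but falls into an eye AOI at 2.0°.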

Data analyses

As a prerequisite for aggregating the fixations across stimuli, we performed a two-way analysis of variance (ANOVA) with the between-subject factors stimulus (005, 012, 025 & 134) and accuracy (0.5°, 1.0° & 1.5°), to test whether the percentage of correctly classified fixation points differed as a function of facial stimulus used for the simulation.

To investigate the influence of accuracy and AOI size (LRVT with different radii) on false-negatives (Type II error; misses), we analyzed the number of fixation points directed to one of the four facial AOIs that were misclassified as belonging to a different AOI, or to no AOI at all (rest of face & surrounding). We chose to visualize the effect using confusion matrices and bar plots, and analyzed the gaze data descriptively.
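
A confusion matrix of this kind can be tallied from pairs of intended target and assigned label. The following is a minimal sketch; the function name `confusion_percentages`, the label strings and the example classifications are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def confusion_percentages(pairs, targets, labels):
    """Return {target: {label: percent}} from (target, assigned_label) pairs."""
    counts = Counter(pairs)
    table = {}
    for target in targets:
        total = sum(counts[(target, label)] for label in labels)
        table[target] = {
            label: (100.0 * counts[(target, label)] / total) if total else 0.0
            for label in labels
        }
    return table


TARGETS = ["left_eye", "right_eye", "nose", "mouth"]
# The last two labels hold unclassified fixations (no AOI at all).
LABELS = TARGETS + ["rest_of_face", "surrounding"]

# Made-up classifications of five fixations aimed at the nose.
pairs = [("nose", "nose")] * 3 + [("nose", "mouth"), ("nose", "rest_of_face")]
table = confusion_percentages(pairs, TARGETS, LABELS)
print(table["nose"]["nose"])  # 60.0
```

Entries where target and label match are correct classifications. All other entries for a target count as false-negatives, split into misclassification (another AOI) and non-classification (rest of face or surrounding).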

The effect of accuracy and AOI size on false-positives (Type I error; false alarms) was tested by simulating fixation points on the forehead of the facial stimuli for which no AOI had been defined. Classification was correct when no AOI was detected, whereas false-positives occurred when fixation points were misclassified as belonging to one of the AOIs. Again, bar plots were created to visualize the effect of the independent variables, and analyses were performed at the descriptive level.

Last, the effect of accuracy on false-negatives (Type II error; misses) was further tested by analyzing fixation points simulated on the different AOIs as being directed towards or away from the face. Classification was correct when the fixations were detected within the face ellipse.
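
Whether a fixation lies within the face ellipse can be checked with the standard ellipse inequality. The sketch below assumes an axis-aligned ellipse. The function name `in_ellipse` is hypothetical, and the radii in the example are the average values reported in the Results (horizontal 2.80°, vertical 4.22°), with the ellipse center placed at the origin for illustration.

```python
def in_ellipse(point, center, rx, ry):
    """Return True if `point` lies inside or on an axis-aligned ellipse."""
    dx = (point[0] - center[0]) / rx
    dy = (point[1] - center[1]) / ry
    return dx * dx + dy * dy <= 1.0


FACE_CENTER = (0.0, 0.0)   # illustrative origin, in degrees of visual angle
RX, RY = 2.80, 4.22        # average horizontal and vertical radius

print(in_ellipse((1.0, 1.0), FACE_CENTER, RX, RY))  # True
print(in_ellipse((2.9, 0.0), FACE_CENTER, RX, RY))  # False
```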


The two-way ANOVA for the percentage of correctly classified fixation points revealed a non-significant main effect of facial stimulus, F(3, 1188) = 2.06, p = .104, η²G < .01, a significant main effect of accuracy, F(2, 1188) = 275.77, p < .001, η²G = .32, and a non-significant interaction effect, F(6, 1188) = 0.37, p = .900, η²G < .01, resulting in the aggregation of classification data across stimuli.

Classification of fixations on facial feature AOIs

Fixations simulated on eyes, nose, and mouth with high quality, i.e., accuracy values of 0.5°, were correctly classified in 96.7 to 100.0 percent of cases. Misclassification and non-classification of fixations (false-negatives; Type II error; misses), in turn, occurred in only 0 to 3.4 percent of cases. In this condition, the classification performance was largely independent of AOI size (Fig 2 right column along the vertical axis). With reduced accuracy, differences emerged in the classification performance as a function of AOI size (Fig 2 left and middle column along the vertical axis). Fixations with an accuracy of 1.5° were on average correctly classified in 33.4 to 79.8 percent of the cases. Large AOIs with a radius of 2.0° resulted in correct classification above the chance level for most AOIs (left eye, right eye & mouth), whereas the classification of fixation points on the nose AOI was more evenly distributed across all other AOIs, resulting in the highest percentages of false-negatives (12.9 to 48.4%). Reducing AOI size, on the other hand, is associated with fewer misclassified fixation points, at the expense of an increase in false-negatives in terms of non-classification (rest of face & surrounding: 0 to 43.0%).

Fig 2. Confusion matrices of the fixation classification performance as a function of gaze data accuracy and AOI size.


The percentages in the diagonal represent correctly classified fixation points, while the percentages outside the diagonal represent misclassified fixation points. The percentages in the last two rows correspond to the number of unclassified fixation points (rest of face & surrounding).

In the second step, we averaged the classification performance over all facial target points (Fig 3B). Concerning false-negative classifications, AOI size seems irrelevant for accuracy values below or equal to 0.5°. For scenarios with accuracy values over 1.0°, larger AOIs result in more correctly classified fixation points at the expense of a slightly increased percentage of misclassified fixation points. Smaller AOIs, on the other hand, lead to fewer misclassified fixation points, but also to a slight reduction in correctly classified fixation points attributable to 35% unclassified data-quality-dependent fixation points.

Classification of fixations outside AOIs

To investigate the effect of accuracy and AOI size on false-positives (Type I error; false alarms), gaze data were simulated on a target point for which no AOI had been defined. In this particular case, we simulated fixations on the forehead of the facial stimuli to recreate a situation in which someone tries to hide their avoidance of eye contact by fixating on the forehead. The mean distance between the forehead target point and AOI center points of the eyes was approx. 1.56°. Fig 3D shows that large AOIs in combination with good gaze data quality (0.5° accuracy) result in almost all fixation points being misclassified (inflation of Type I error). As expected, most fixation points were falsely classified as belonging to the left or right eye. Smaller AOIs in combination with good data quality, on the other hand, drastically lower the number of false-positives, with over 90% of fixation points being correctly classified outside any specific AOI. The effect of accuracy is inverse: smaller AOIs profit from better accuracy in terms of Type I errors, whereas for larger AOIs the error increases slightly with better accuracy.

Classification of fixations on the face

The largest meaningful AOI for facial stimuli can be assumed to be the whole face, represented by an ellipse drawn around the entire face with no specific facial features discriminated. To investigate the effect of accuracy on false-negatives (misses), gaze data were simulated on four facial target points (left eye, right eye, nose & mouth) and classified as being directed towards or away from the face. The face ellipse had an average vertical radius of about 4.22° and a horizontal radius of 2.80°. Accuracy values up to 1.0° of visual angle allow nearly error-free classification of fixation points (0.5° accuracy: 100% correct classification; 1.0° accuracy: approx. 99% correct classification), and even accuracy values of 1.5° of visual angle reduce the classification performance by only about 5% (approx. 95% correct classification) (Fig 4B).

Fig 4. Effect of gaze data accuracy on classification performance of simulated fixations (on eyes, nose & mouth) within the face ellipse.


(A) Visualization of the face ellipse and the simulated fixation points with an accuracy of 1.5° (n = 100, averaged over 30 simulated fixations) in red. (B) Green indicates correct classification within the face ellipse compared to orange for misclassification outside the face ellipse. The stimulus shown in A was created for illustrative purposes only and is not part of the stimulus set used in the study (see Methods section).


In the present simulation study, we investigated the effect of the accuracy of the detected gaze position and the AOI size (LRVT with different radii) on gaze data classification for facial stimuli in a (simulated) interactive eye tracking setup. As hypothesized, we found that the AOI size’s effect on classification performance depends strongly on accuracy. Differentiating this effect in terms of Type I (false alarms) and Type II errors (misses), we found that AOI size is irrelevant for Type II errors when the accuracy is better than 1.0°. The picture changed when accuracy was worse than the threshold of 1.0° within the present setup: larger AOIs raised classification accuracy, leading to fewer Type II errors. On the other hand, if we consider the Type I error (falsely classified inside a predefined AOI), the definition of smaller AOIs seems appropriate for all the levels of accuracy simulated here. However, the advantages of smaller AOIs decline slightly with an accuracy worse than 1.0°. Here the probability of misclassification or Type I errors increases.

Our findings regarding Type II error are consistent with previous accounts of the interaction between accuracy and AOI size on gaze data classification performance [20, 21]. Moreover, a systematic observation of gaze data classification performance on facial stimuli using the same AOI definition as in the present study concluded that larger AOIs are a noise-robust solution [13]. Therefore, larger AOIs might be a better choice with accuracy worse than 1.0°, but larger AOIs are also associated with more Type I errors (false alarms) when accuracy is low.

This distinction seems particularly relevant when examining clinical disorders such as social phobia or autism with such interactive eye tracking setups, for which small spatial differences in attention allocation are characteristic [22, 23]. We simulated such a scenario by including fixations directed at a target point on the forehead of the facial stimuli for which no AOI had been defined. We chose this scenario because people suffering from social interaction disorders often report employing strategies to normalize their gaze behavior in social interactions, such as looking between the eyes or at their interaction partner’s forehead. Our simulation results are in line with another publication that recommended smaller AOIs to prevent inflated Type I errors [14]. However, by examining a wide range of accuracy values in the current study, we additionally observed that the superiority of smaller AOIs decreases in conjunction with reduced accuracy. For subject samples where abnormal gaze behavior is expected, the Type I error problem could be transformed into a Type II error problem by adding an AOI on the forehead.

Given the differential effects on Type I and Type II errors, medium-sized AOIs would represent a compromise, but when we consider accuracy values worse than 1.0°, on average only about 50% of fixations were classified correctly in both simulations. We therefore included a simulation designed for scenarios with low accuracy that sacrifices facial feature distinction in favor of classification performance. While 1.5° accuracy resulted in a classification performance around chance level when differentiating facial features, the face ellipse as the largest meaningful AOI in face perception resulted in over 95% of the fixations being classified correctly even with accuracy as low as 1.5°.

Based on the present simulation study, the recommendations for the choice of facial AOI size in interactive eye tracking setups can be summarized as follows:

  • Inflated Type I errors (falsely classified gaze points inside an AOI; false alarms) can be prevented by using small AOIs, such as radii of 1.0°, regardless of accuracy

  • As previously published and recommended, the use of larger AOIs prevents the inflation of the Type II error (falsely classified gaze points outside an AOI; misses), especially for accuracy values above 1.0°

  • If both error types (Type I & Type II errors) are to be compensated for in a setup with accuracy of 1.0° or better, smaller AOIs appear to be the better choice

  • If both error types (Type I & Type II errors) are to be compensated for in a setup with accuracy values worse than 1.0°, we suggest not using AOIs to distinguish facial features, but instead a face ellipse as an indicator of facial gaze
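
The last two recommendations can be expressed as a simple decision rule. This is only a sketch: the function name `recommend_aoi` and the returned strings are hypothetical, and the 1.0° threshold is assumed to apply to setups comparable to the one simulated here (viewing distance of approx. 130cm).

```python
def recommend_aoi(accuracy_deg, threshold_deg=1.0):
    """Map an estimated accuracy (deg) to the AOI strategy summarized above."""
    if accuracy_deg <= threshold_deg:
        # Accuracy of 1.0 deg or better: small feature AOIs limit both errors.
        return "facial-feature AOIs with a small radius (e.g. 1.0 deg)"
    # Worse accuracy: give up feature discrimination, keep face vs. non-face.
    return "face ellipse only (no facial-feature AOIs)"


for accuracy in (0.5, 1.0, 1.5):
    print(accuracy, "->", recommend_aoi(accuracy))
```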

While the investigator can choose the AOI size freely, accuracy is the limiting factor. Hence, it seems crucial not only to maximize accuracy as much as possible, but also to estimate accuracy based on a data set recorded under the specific conditions. Ideally, the study protocol includes distinct trials that instruct the participants to fixate predefined points in the tracking area, so that accuracy can be properly calculated (validation procedures). If this is possible, one could follow the recommendations made in this study to choose the appropriate AOI size. In situations where a proper validation procedure is unfeasible, e.g., because of limited time and resources, adjusting another factor can mitigate the impact of accuracy on classification performance: the viewing distance.

In this study, we used a viewing distance of approx. 130cm to simulate an interactive eye tracking setup with a typical viewing distance for face-to-face interactions. Two factors change when reducing the viewing distance: (1) the ratio of the facial stimulus size to the surrounding in the visual field, that is, the face covers a larger area (Fig 5A), and thereby the inter-AOI distances increase, and (2) the span of fixation point deviation due to a reduced visual angle decreases, that is, the spatial accuracy increases (Fig 5B). The first aspect is also influenced by the face’s actual size, e.g., in studies with toddlers the face occupies a smaller area in the viewing field even at constant viewing distance. Here, a change in inter-AOI distance (see AOI span; [13]) affects the influence of accuracy and AOI size on classification performance. However, the effect might be small in studies with an adolescent or adult sample due to the limited variation in face sizes. We thus conclude that lower accuracy can be partially compensated for by using shorter viewing distances, provided the setup allows for this. Longer viewing distances or situations with stimuli observed at varying viewing distances, on the other hand, should be treated with caution. Note that the visual angle values used in the present study need to be adjusted to differing viewing distances for meaningful comparison between studies. For example, if the viewing distance is reduced to 90cm, the AOI sizes must be increased by 0.9° from the original 2.0° radius to cover the same facial areas (Fig 5A).
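
The relation between viewing distance, visual angle and physical size can be made explicit with basic trigonometry. The sketch below assumes that a radius is measured from the line of sight, so that angle = atan(size / distance). The function names `cm_to_deg` and `deg_to_cm` are hypothetical.

```python
import math


def cm_to_deg(size_cm, distance_cm):
    """Visual angle (deg) of a radius of `size_cm` seen from `distance_cm`."""
    return math.degrees(math.atan(size_cm / distance_cm))


def deg_to_cm(angle_deg, distance_cm):
    """Physical radius (cm) covered by `angle_deg` at `distance_cm`."""
    return distance_cm * math.tan(math.radians(angle_deg))


for distance in (90, 130, 170):
    # AOI radius needed (deg) to cover the same 4.6cm on the face.
    aoi_radius_deg = cm_to_deg(4.6, distance)
    # Deviation on the face (cm) caused by a 1.5 deg accuracy offset.
    deviation_cm = deg_to_cm(1.5, distance)
    print(f"{distance}cm: AOI radius {aoi_radius_deg:.2f} deg, "
          f"1.5 deg offset = {deviation_cm:.2f}cm")
```

The AOI radius in degrees grows as the viewing distance shrinks, while the same angular offset covers less of the face, which is why shorter viewing distances partially compensate for lower accuracy.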

One potential study limitation is that our simulation relied on the assumption that gaze data follow a single distribution for all generated data sets. Although the overall parameters (deviation and skewness) largely concur with the gaze data recorded in the real setup underlying the present simulation [12], it is possible that other recording conditions, populations or interaction paradigms will lead to different gaze data distributions. Therefore, we have made the gaze data simulation tool used in this study freely available and encourage repeating or modifying the simulation process.

The fact that our evaluation was based on fixations can be considered as a further study limitation. Defining fixations eliminates the influence of precision and thus improves classification performance. This has little effect in setups with eye trackers demonstrating good to very good precision, but can lead to significantly different results in less precise eye trackers. In such cases, fixation classification is already affected by the reduced precision [24] and thus indirectly influences the gaze classification performance.

Furthermore, one might question the use of AOIs in general due to their subjective nature and the fact that the definition can be very time-consuming especially with moving AOIs in interactive setups [25]. We tried to minimize these practical disadvantages by employing a validated automatic AOI-definition method that takes a Voronoi-Tessellation approach to avoid subjectivity, and the OpenFace tool to save time. Nevertheless, data-driven approaches based on neural networks [26, 27] are a potential alternative approach when gaze data quality is uncertain or difficult to estimate, an approach that could be evaluated in future studies when used in interactive eye tracking setups.

The present study thus might help to further improve reporting standards in eye tracking research [20, 21, 28]. Furthermore, the freely available simulation tool used in the present study can be used to support the validation process of novel eye tracking setups.

Eye tracking during real social interactions is a powerful tool to examine social attention and behavior in both healthy and clinical populations. Accuracy in these setups can be compromised by movements caused by speech, facial expressions, or head rotations. It is thus essential to validate novel interactive eye tracking setups carefully. The proper estimation of accuracy is an important prerequisite for informed decisions regarding the size of AOIs used to analyze data. The results of the present simulation study indicate that smaller AOIs minimize false classifications (both Type I and Type II errors) as long as the accuracy is sufficient. For studies with lower accuracy, Type II errors (misses) can still be compensated to some extent by using larger AOIs, but at the cost of making Type I errors (false alarms) more likely. When accuracy is low, facial feature discrimination is better omitted and larger AOIs, such as a face ellipse, should be preferred to enable valid AOI classification.

Data Availability

The data sets created and analysed in this study can be downloaded here: or reproduced using the software tool described, which can also be downloaded from this website.

Funding Statement

The study was in part supported by grants from the German Research Foundation (DO1312/5-1) to GD and the Trier University Research Priority Program “Psychobiology of Stress”, funded by the State Rhineland-Palatinate. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


  • 1. Kaiser D, Jacob GA, van Zutphen L, Siep N, Sprenger A, Tuschen-Caffier B, et al. Biased attention to facial expressions of ambiguous emotions in borderline personality disorder: An eye-tracking study. J Pers Disord. 2019;33: 1–28. doi: 10.1521/pedi_2018_32_325
  • 2. Mojzisch A, Schilbach L, Helmert JR, Pannasch S, Velichkovsky BM, Vogeley K. The effects of self-involvement on attention, arousal, and facial expression during social interaction with virtual others: A psychophysiological study. Soc Neurosci. 2006;1: 184–195. doi: 10.1080/17470910600985621
  • 3. Domes G, Steiner A, Porges SW, Heinrichs M. Oxytocin differentially modulates eye gaze to naturalistic social signals of happiness and anger. Psychoneuroendocrinology. 2013;38: 1198–1202. doi: 10.1016/j.psyneuen.2012.10.002
  • 4. Lischke A, Berger C, Prehn K, Heinrichs M, Herpertz SC, Domes G. Intranasal oxytocin enhances emotion recognition from dynamic facial expressions and leaves eye-gaze unaffected. Psychoneuroendocrinology. 2012;37: 475–481. doi: 10.1016/j.psyneuen.2011.07.015
  • 5. Reilly JL, Lencer R, Bishop JR, Keedy S, Sweeney JA. Pharmacological treatment effects on eye movement control. Brain Cogn. 2008;68: 415–435. doi: 10.1016/j.bandc.2008.08.026
  • 6. Rubin M, Minns S, Muller K, Tong MH, Hayhoe MM, Telch MJ. Avoidance of social threat: Evidence from eye movements during a public speaking challenge using 360°- video. Behav Res Ther. 2020;134: 103706. doi: 10.1016/j.brat.2020.103706
  • 7. Vatheuer CC, Vehlen A, Dawans B von, Domes G. Gaze behavior is associated with the cortisol response to acute psychosocial stress in the virtual TSST. J Neural Transm. 2021;128: 1269–1278. doi: 10.1007/s00702-021-02344-w
  • 8. Dalrymple KA, Manner MD, Harmelink KA, Teska EP, Elison JT. An examination of recording accuracy and precision from eye tracking data from toddlerhood to adulthood. Front Psychol. 2018;9: 803. doi: 10.3389/fpsyg.2018.00803
  • 9. Niehorster DC, Santini T, Hessels RS, Hooge ITC, Kasneci E, Nyström M. The impact of slippage on the data quality of head-worn eye trackers. Behav Res Methods. 2020;52: 1140–1160. doi: 10.3758/s13428-019-01307-0
  • 10. Hessels RS, Cornelissen THW, Hooge ITC, Kemner C. Gaze behavior to faces during dyadic interaction. Can J Exp Psychol. 2017;71: 226–242. doi: 10.1037/cep0000113
  • 11. Grossman RB, Zane E, Mertens J, Mitchell T. Facetime vs. screentime: Gaze patterns to live and video social stimuli in adolescents with ASD. Sci Rep. 2019;9: 12643. doi: 10.1038/s41598-019-49039-7
  • 12. Vehlen A, Spenthof I, Tönsing D, Heinrichs M, Domes G. Evaluation of an eye tracking setup for studying visual attention in face-to-face conversations. Scientific Reports. 2021;11: 2661. doi: 10.1038/s41598-021-81987-x
  • 13. Hessels RS, Kemner C, van den Boomen C, Hooge ITC. The area-of-interest problem in eyetracking research: A noise-robust solution for face and sparse stimuli. Behavior Research Methods. 2016;48: 1694–1712. doi: 10.3758/s13428-015-0676-y
  • 14. Orquin JL, Ashby NJS, Clarke ADF. Areas of interest as a signal detection problem in behavioral eye-tracking research: Areas of interest as a signal detection problem. J Behav Dec Making. 2016;29: 103–115. doi: 10.1002/bdm.1867
  • 15. Cañigueral R, Hamilton F de CA. The role of eye gaze during natural social interactions in typical and autistic people. Front Psychol. 2019;10: 560. doi: 10.3389/fpsyg.2019.00560
  • 16. Hessels RS, Holleman GA, Cornelissen THW, Hooge ITC, Kemner C. Eye contact takes two–Autistic and social anxiety traits predict gaze behavior in dyadic interaction. J Exp Psychopathol. 2018; 1–17. doi: 10.5127/jep.062917
  • 17. Hessels RS, Benjamins JS, Cornelissen THW, Hooge ITC. A validation of automatically-generated areas-of-interest in videos of a face for eye-tracking research. Front Psychol. 2018;9: 1367. doi: 10.3389/fpsyg.2018.01367
  • 18. DeBruine L, Jones B. Face Research Lab London Set. 2017. Available: cles/Face_Research_Lab_London_Set/5047666
  • 19. Amos B, Ludwiczuk B, Satyanarayanan M. OpenFace: A general-purpose face recognition library with mobile applications. CMU-CS-16-118, CMU School of Computer Science; 2016.
  • 20. Holmqvist K, Nyström M, Mulvey F. Eye tracker data quality: What it is and how to measure it. Proceedings of the Symposium on Eye Tracking Research and Application. Santa Barbara, California: ACM Press; 2012. pp. 45–52. doi: 10.1145/2168556.2168563
  • 21. Orquin JL, Holmqvist K. Threats to the validity of eye-movement research in psychology. Behav Res Methods. 2018;50: 1645–1656. doi: 10.3758/s13428-017-0998-z
  • 22. Chita-Tegmark M. Attention allocation in ASD: A review and meta-analysis of eye-tracking studies. J Autism Dev Disord. 2016;3: 209–223. doi: 10.1007/s40489-016-0077-x
  • 23. Chen NTM, Clarke PJF. Gaze-based assessments of vigilance and avoidance in social anxiety: A review. Curr Psychiatry Rep. 2017;19: 59. doi: 10.1007/s11920-017-0808-4
  • 24. Wass SV, Smith TJ, Johnson MH. Parsing eye-tracking data of variable quality to provide accurate fixation duration estimates in infants and adults. Behav Res. 2013;45: 229–250. doi: 10.3758/s13428-012-0245-6
  • 25. Rim NW, Choe KW, Scrivner C, Berman MG. Introducing point-of-interest as an alternative to area-of-interest for fixation duration analysis. PLOS ONE. 2021;16: e0250170. doi: 10.1371/journal.pone.0250170
  • 26. Fuhl W, Bozkir E, Hosp B, Castner N, Geisler D, Santini TC, et al. Encodji: Encoding gaze data into emoji space for an amusing scanpath classification approach;). Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications. Denver, Colorado: ACM Press; 2019. pp. 1–4. doi: 10.1016/j.ydbio.2019.12.010
  • 27. Castner N, Kuebler TC, Scheiter K, Richter J, Eder T, Huettig F, et al. Deep semantic gaze embedding and scanpath comparison for expertise classification during OPT viewing. ACM Symposium on Eye Tracking Research and Applications. Stuttgart Germany: ACM Press; 2020. pp. 1–10. doi: 10.1145/3379155.3391320
  • 28. Carter BT, Luke SG. Best practices in eye tracking research. Int J Psychophysiol. 2020;155: 49–62. doi: 10.1016/j.ijpsycho.2020.05.010

Decision Letter 0

Guido Maiello

6 Oct 2021

PONE-D-21-24588

Computer-generated facial areas of interest in eye tracking research: A simulation study

PLOS ONE

Dear Dr. Domes,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

 Two expert reviewers have evaluated your work. Both reviewers are very positive and provide thoughtful and detailed comments that I am convinced will further strengthen your paper. I believe it will be possible to address all reviewer comments, and I look forward to receiving your revised manuscript. 

Please submit your revised manuscript by Nov 20 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at When you're ready to submit your revision, log on to and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in to enhance the reproducibility of your results. assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on Read more information on sharing protocols at

We look forward to receiving your revised manuscript.

Kind regards,

Guido Maiello

Academic Editor


Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at and

2. Please note that according to our submission guidelines, outmoded terms and potentially stigmatizing labels should be changed to more current, acceptable terminology. To this effect, please change "Caucasian" to "white" or "of (western) European descent".

3. Thank you for stating the following in the Funding Section of your manuscript:

“The study was in part supported by grants from the German Research Foundation (DO1312/5-1) and Trier University Research Priority Program “Psychobiology of Stress”, funded by the State Rhineland-Palatinate.”

We note that you have provided additional information within the Funding Section. Please note that funding information should not appear in other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The study was in part supported by grants from the German Research Foundation (DO1312/5-1) to GD and the Trier University Research Priority Program “Psychobiology of Stress”, funded by the State Rhineland-Palatinate. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. We note that Figures 1A and 1B, 3 and 4 include an image of a [patient / participant / in the study].

As per the PLOS ONE policy on papers that include identifying, or potentially identifying, information, the individual(s) or parent(s)/guardian(s) must be informed of the terms of the PLOS open-access (CC-BY) license and provide specific permission for publication of these details under the terms of this license. Please download the Consent Form for Publication in a PLOS Journal. The signed consent form should not be submitted with the manuscript, but should be securely filed in the individual's case notes. Please amend the methods section and ethics statement of the manuscript to explicitly state that the patient/participant has provided consent for publication: “The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details”.

If you are unable to obtain consent from the subject of the photograph, you will need to remove the figure and any other textual identifying information or case descriptions for this individual.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes


2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes


3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes


4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes


5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this simulation study, the authors investigate the relation between AOI size, accuracy of the gaze position signal and (in)correct classification of gaze to facial feature AOIs. The topic is relevant, and the simulations are sensible. I am therefore enthusiastic about the paper. I have only one real major comment (the first point below). However, I have a number of additional comments and suggestions that do not disqualify the relevance of the paper, but which I think can tremendously strengthen the paper and its impact.

1. As the authors currently phrase it, they investigate AOI size and accuracy of the gaze position signal. However, the third major 'factor' is the inter-AOI distance (or the size of the facial stimulus). That is, the recommendations on page 13 only hold for facial stimuli of the size used in the present study. The problem is scalable: if the inter-AOI distance doubles, and the inaccuracy doubles, the recommendation for AOI size also doubles. I strongly urge the authors to consider phrasing their problem as the combination of AOI size, inaccuracy, AND inter-AOI distance (e.g. operationalised using the AOI span measure in Hessels et al., 2016, BRM, but of course there are other ways of quantifying this). This has at least the following two advantages:

- The recommendations can be made more generic; i.e. as a suggestion for AOI size given both a known accuracy and inter-AOI distance.

- The authors can discuss reasonable inter-AOI distance to be expected in interactive setups, and the fact that this is often close to the inaccuracies that may be expected (in my experience with dual eye-tracking setups, inter-AOI distances of 2 deg and inaccuracies of 1-1.5 deg are not uncommon).

2. The analyses reported in Figure 3 and 4 (and the corresponding Results sections) contain 3 accuracy levels (0.5, 1.0, 1.5 deg) and (for figure 3) three AOI sizes (1.0, 1.5, and 2.0 deg). Why are these accuracy levels and AOI sizes used? Given that it is a simulation study, why not simulate the entire range from 0 deg (or 0.3 which seems to be roughly the lower limit for modern video-based eye trackers) to 4 deg (or something like twice/thrice the AOI span)? Such an analysis would allow readers to pick the AOI size for a given prop. (in)correct classification. (I understand that for the confusion matrices, it is not practical to do a continuous analysis)

3. In many instances, the authors write "data ..." for something that could be made explicit. A number of examples:

- "Data accuracy". Do the authors mean accuracy of the gaze position signal?

- "Data validity". I am unsure what this means. Data cannot be 'invalid' to me. Do the authors mean that the conclusion that one looks at a certain facial feature based on those data would be invalid?

- "Data simulation". Exactly what is meant with data here?

- "Data distribution". Do the authors mean the distribution of fixation positions? Or the distribution of the inaccuracy?

I suggest making all instances of "data ..." explicit whenever possible.

4. I am struggling to follow along with exactly what was simulated (e.g. lines 108-122). Perhaps a flowchart could help here, or at least some redundancy in the writing. Can it be made explicit here how many fixations were simulated for each AOI/stimulus/accuracy values/participants/etc.

5. Although I am trained as a psychologist and have been bombarded with the terms Type I and Type II, I always forget them. I understand the authors' choice for using these terms, but the reader (at least this one) could be helped by redundancy at some instances (e.g. double-coding it with other terms such as 'hit' or 'false alarm', or writing the explanation in parentheses).

6. The authors use the LRVT radius as an operationalisation for AOI size. This could also be reiterated at several locations in the text when the authors write AOI size. In my experience, it is not obvious that the AOI size is operationalised by a radius for novice readers.

Details in order of appearance:

Title. The title seems to misrepresent the topic: it's about choosing AOI size, not the fact that the AOIs are computer-generated. Why not something like:

- How to choose area-of-interest size for facial areas in interactive eye tracking?

- Facial areas of interest in eye-tracking research: How to choose AOI size

- (I'm sure there are many other great alternatives)

l.95-99. This seems incomplete: it depends also on the physical size of the screen on which it is displayed. This could be inferred from the size in degrees, but redundancy may be beneficial to the reader here.

l.105-106. Rephrase sentence. It seems grammatically off.

Figure 1. Is each participant represented by one red dot? If so, can this be made explicit?

l.134-136. What was the motivation for the 4 deg radius in previous research? And how was that used to choose the 2 deg radius here?

l.168. What is 'condition' here? Maybe I missed it, but here I already forgot. Can this be made explicit?

Section 3.2. This analysis depends drastically on the distance between the simulated fixation point and the AOI borders. However, this information is not given. Can this be given?

Section 3.3. The analysis here is uninformative without a description of the AOI size. Can this be made explicit?

l.258-261. It could be made clear that for studies investigating such participants groups or such viewing-strategies, it thus makes sense to consider an AOI for that particular location (or, alternatively, to take an AOI-free approach to check for such potential strategies).

Reviewer #2: see document - apparently I have to write 200 characters here. Lorem Ipsum sum, the rest I forgot. I liked the paper btw. - still eighty characters to go! Now it is only thirty characters... ok here we go


6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Benedikt Ehinger



Submitted filename: r0_2021.docx

PLoS One. 2022 Feb 4;17(2):e0263594. doi: 10.1371/journal.pone.0263594.r002

Author response to Decision Letter 0

29 Dec 2021

Response to the editor

2. Please note that according to our submission guidelines, outmoded terms and potentially stigmatizing labels should be changed to more current, acceptable terminology. To this effect, please change "Caucasian" to "white" or "of (western) European descent".

Response: We thank the editor for pointing this out. We changed the term “Caucasian” to “white”.

“The following four stimuli were selected to represent different ethnic groups: 005 (male, Asian, 28 years old), 012 (male, white, 24 years old), 025 (female, African American, 21 years old) and 134 (female, white, 21 years old).”

3. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The study was in part supported by grants from the German Research Foundation (DO1312/5-1) to GD and the Trier University Research Priority Program “Psychobiology of Stress”, funded by the State Rhineland-Palatinate. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Response: We have removed the text referring to funding from the manuscript and want to stick to the current funding statement.

4. We note that Figures 1A and 1B, 3 and 4 include an image of a [patient / participant / in the study]. As per the PLOS ONE policy on papers that include identifying, or potentially identifying, information, the individual(s) or parent(s)/guardian(s) must be informed of the terms of the PLOS open-access (CC-BY) license and provide specific permission for publication of these details under the terms of this license. Please download the Consent Form for Publication in a PLOS Journal. The signed consent form should not be submitted with the manuscript, but should be securely filed in the individual's case notes. Please amend the methods section and ethics statement of the manuscript to explicitly state that the patient/participant has provided consent for publication: “The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details”.

If you are unable to obtain consent from the subject of the photograph, you will need to remove the figure and any other textual identifying information or case descriptions for this individual.

Response: For illustration purposes, we have created our own stimulus. The person depicted has signed the PLOS consent form.

“For display in the figures, we created another stimulus that was not used for the simulation. The individual pictured in Fig 1 and Fig 3 to 5 has provided written informed consent (as outlined in PLOS consent form) to publish their image alongside the manuscript.” (p. 5)

Responses to the reviewers’ comments

Reviewer #1

In this simulation study, the authors investigate the relation between AOI size, accuracy of the gaze position signal and (in)correct classification of gaze to facial feature AOIs. The topic is relevant, and the simulations are sensible. I am therefore enthusiastic about the paper. I have only one real major comment (the first point below). However, I have a number of additional comments and suggestions that do not disqualify the relevance of the paper, but which I think can tremendously strengthen the paper and its impact.

Major comment:

1. As the authors currently phrase it, they investigate AOI size and accuracy of the gaze position signal. However, the third major 'factor' is the inter-AOI distance (or the size of the facial stimulus). That is, the recommendations on page 13 only hold for facial stimuli of the size used in the present study. The problem is scalable: if the inter-AOI distance doubles, and the inaccuracy doubles, the recommendation for AOI size also doubles. I strongly urge the authors to consider phrasing their problem as the combination of AOI size, inaccuracy, AND inter-AOI distance (e.g. operationalised using the AOI span measure in Hessels et al., 2016, BRM, but of course there are other ways of quantifying this). This has at least the following two advantages:

• The recommendations can be made more generic; i.e. as a suggestion for AOI size given both a known accuracy and inter-AOI distance.

• The authors can discuss reasonable inter-AOI distance to be expected in interactive setups, and the fact that this is often close to the inaccuracies that may be expected (in my experience with dual eye-tracking setups, inter-AOI distances of 2 deg and inaccuracies of 1-1.5 deg are not uncommon).

Response: We are very grateful for the reviewer’s comment, as it enabled us to make the influence of inter-AOI distance and the viewing distance clearer. We extended the discussion and added a figure to visualize the effect driven by two different phenomena.

“While the investigator can choose the AOI size freely, accuracy is the limiting factor. Hence, it seems crucial not only to maximize accuracy as much as possible, but furthermore to estimate accuracy based on a data set recorded under the specific conditions. Ideally, distinct trials instructing the participants to fixate predefined points in the tracking area are included in the study protocol that allow the proper calculation of data accuracy (validation procedures). If this is possible, one could follow the recommendations made in this study to choose the appropriate AOI size. In situations where a proper validation procedure is unfeasible, because of limited time and resources, adjusting another factor can mitigate the impact of accuracy on classification performance: the viewing distance.

In this study, we used a viewing distance of approx. 130cm to simulate an interactive eye tracking setup with a typical viewing distance for face-to-face interactions. Two factors change when reducing the viewing distance: (1) the ratio of the facial stimulus size to the surrounding in the visual field, that is, the face covers a larger area (Fig 5A) and thereby the inter-AOI distances increase, and (2) the span of fixation point deviation due to a reduced visual angle decreases, that is, the spatial accuracy increases (Fig 5B). The first aspect is also influenced by the face’s actual size, e.g., in studies with toddlers the face occupies a smaller area in the viewing field even at constant viewing distance. Here, a change in inter-AOI distance (see AOI span; 13) affects the influence of accuracy and AOI size on classification performance. However, the effect might be small in studies with an adolescent or adult sample due to the limited variation in face sizes. We thus conclude that lower accuracy can be partially compensated for by using shorter viewing distances, provided the setup allows for this. Longer viewing distances or situations with stimuli observed at varying viewing distances, on the other hand, should be treated with caution. Note that the visual angle values used in our study need to be adjusted to differing viewing distances for meaningful comparison between studies. For example, if the viewing distance is reduced to 90cm, the AOI sizes must be increased by 0.9° from the original 2.0° radius to cover the same facial areas (Fig 5A).

[Fig 5]

Fig 5. Effect of viewing distance on stimulus size and fixation point deviation. (A) Visualization of the stimulus size over three viewing distances (90cm, 130cm & 170cm) with AOIs covering the same facial areas (radius 4.6cm; in degrees of visual angle: 2.9°, 2.0° & 1.6°, respectively) and simulated fixation points on the left eye. (B) Visualization of the interaction between visual angle and viewing distance on fixation point deviation. In both figures (A & B) each red dot represents the averaged fixation location of 30 simulated fixations with an accuracy of 1.5°; a total of n=100 data sets were simulated.” (p. 14-15)
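The relation between viewing distance, physical size and visual angle used in the passage above can be sketched as follows. The snippet assumes the common formula for an extent centered on the line of sight, angle = 2·atan(size / (2·distance)); whether the published tool uses exactly this formula is an assumption on our part, and the function names are hypothetical.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle in degrees subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def extent_cm(angle_deg, distance_cm):
    """Physical extent in cm that subtends angle_deg at distance_cm (inverse of the above)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# The same 4.6 cm AOI radius expressed in degrees at three viewing distances.
for distance in (90, 130, 170):
    print(distance, "cm:", round(visual_angle_deg(4.6, distance), 2), "deg")

# A fixed angular inaccuracy of 1.5 deg expressed as a deviation on the face.
for distance in (90, 130, 170):
    print(distance, "cm:", round(extent_cm(1.5, distance), 2), "cm")
```

Under this assumption, the 4.6 cm radius comes out at roughly 2.9°, 2.0° and 1.6° for 90 cm, 130 cm and 170 cm, and a fixed angular inaccuracy translates into a smaller physical deviation on the face at shorter viewing distances.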

Additional comments:

1. The analyses reported in Figure 3 and 4 (and the corresponding Results sections) contain 3 accuracy levels (0.5, 1.0, 1.5 deg) and (for figure 3) three AOI sizes (1.0, 1.5, and 2.0 deg). Why are these accuracy levels and AOI sizes used? Given that it is a simulation study, why not simulate the entire range from 0 deg (or 0.3 which seems to be roughly the lower limit for modern video-based eye trackers) to 4 deg (or something like twice/thrice the AOI span)? Such an analysis would allow readers to pick the AOI size for a given prop. (in)correct classification. (I understand that for the confusion matrices, it is not practical to do a continuous analysis).

Response: We thank the reviewer for pointing out this opportunity to extend our simulation results. We actually did this at some point when conceptualizing this paper, but refrained from reporting it. We found that the effects of accuracy levels <0.5° and >1.5° on gaze classification performance are marginal, and thus focused on an accuracy range with the greatest effect on gaze classification. If the reader experiences accuracy levels that are not reported in our study, we recommend using the gaze simulation tool published with the study. The tool can be used to adjust parameters such as the desired accuracy and to observe the effects on gaze classification performance.

2. In many instances, the authors write "data ..." for something that could be made explicit. A number of examples:

• "Data accuracy". Do the authors mean accuracy of the gaze position signal?

• "Data validity". I am unsure what this means. Data cannot be 'invalid' to me. Do the authors mean that the conclusion that one looks at a certain facial feature based on those data would be invalid?

• "Data simulation". Exactly what is meant with data here?

• "Data distribution". Do the authors mean the distribution of fixation positions? Or the distribution of the inaccuracy?

• I suggest making all instances of "data ..." explicit whenever possible.

Response: We took the comment of the reviewer very seriously and now specify the definition of data as precisely as possible. Changes were made throughout the manuscript, here we list some examples:

“With the current study, we aimed to investigate the gaze classification performance depending on the accuracy of the detected gaze position (spatial offset between detected and real gaze position) and AOI size (LRVT with different radii) with simulated gaze data in order to derive guidelines for the selection of AOIs and their size in interactive (face-to-face) eye tracking applications.” (p. 4)

“Since these setups differ substantially from the eye tracker manufacturer’s test conditions, validation is essential with regard to the quality of gaze data and other factors potentially threatening the validity of this signal.” (p. 2)

“The goal for our gaze data simulation was to mimic a standard test procedure with multiple participants and several runs of a gaze validation on facial features, i.e. the instructed sequential fixation of specific targets points on a facial stimulus.” (p. 5)

“Although, the overall parameters (deviation and skewness) are largely in accordance with gaze data recorded in the real setup underlying the present simulation (12), it is possible that other recording conditions, populations or interaction paradigms will lead to different gaze data distributions.” (p. 15)

3. I am struggling to follow along with exactly what was simulated (e.g. lines 108-122). Perhaps a flowchart could help here, or at least some redundancy in the writing. Can it be made explicit here how many fixations were simulated for each AOI/stimulus/accuracy value/participant/etc.?

Response: We thank the reviewer for this important remark. We now describe the procedure in more detail to make it easier to understand.

“To simulate a realistic gaze data set for a group of (simulated) participants, the fixation points around the facial targets were determined in four steps. (1) Mean accuracy, sample size and number of runs were specified. (2) Each simulated participant was assigned a base accuracy derived from a generalized gamma distribution around the specified mean accuracy. The standard deviation was set to 0.5 times the mean accuracy and the skewness to 0.6. (3) A random offset angle around the target point was chosen for each simulated participant. (4) Offsets per target were created for each simulated participant depending on the number of runs by varying the individual base accuracy according to a normal distribution with a standard deviation of 0.15 times the base accuracy. Runs with accuracy values that fell outside three standard deviations were recalculated. This procedure allowed us to account for within- (Step 4) and between-subjects (Steps 1-3) variance. We applied the above-mentioned method to simulate data with mean accuracy values of 0.5°, 1.0° and 1.5°. The distribution of the simulated gaze data for the three accuracy levels is found in Fig 1A.

A total of 100 participants were simulated with 30 face validation runs for the three accuracy values and four stimuli, resulting in 36000 data sets. Each face validation run consists of a fixation for each facial target point computed by averaging the gaze samples from one second of recording at a frequency of 120 Hz.” (p. 5-6)
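The four steps quoted above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tool: the function name `simulate_offsets` and the seed are hypothetical, only a single target point is simulated, and a shifted (three-parameter) gamma distribution is swapped in for the generalized gamma distribution, moment-matched to the stated standard deviation (0.5 times the mean accuracy) and skewness (0.6). Redrawing non-positive base accuracies is a further assumption.

```python
import numpy as np

def simulate_offsets(mean_acc, n_participants=100, n_runs=30, seed=0):
    """Return base accuracies and per-run (dx, dy) offsets in degrees."""
    rng = np.random.default_rng(seed)

    # Step 2: base accuracy per participant. ASSUMPTION: a shifted gamma
    # stands in for the generalized gamma; it is moment-matched to
    # SD = 0.5 * mean and skewness = 0.6 (gamma skewness = 2 / sqrt(shape)).
    sd, skew = 0.5 * mean_acc, 0.6
    shape = 4.0 / skew**2
    scale = sd / np.sqrt(shape)
    loc = mean_acc - shape * scale
    base = loc + rng.gamma(shape, scale, size=n_participants)
    while np.any(base <= 0):  # ASSUMPTION: redraw non-positive accuracies
        bad = base <= 0
        base[bad] = loc + rng.gamma(shape, scale, size=int(bad.sum()))

    # Step 3: one random offset direction per simulated participant.
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n_participants)

    # Step 4: per-run accuracy, normal around the base with SD = 0.15 * base.
    mu = np.repeat(base[:, None], n_runs, axis=1)
    sigma = 0.15 * mu
    acc = rng.normal(mu, sigma)
    outside = np.abs(acc - mu) > 3.0 * sigma
    while np.any(outside):  # recalculate runs beyond three standard deviations
        acc[outside] = rng.normal(mu[outside], sigma[outside])
        outside = np.abs(acc - mu) > 3.0 * sigma

    offsets = np.stack([acc * np.cos(angle)[:, None],
                        acc * np.sin(angle)[:, None]], axis=-1)
    return base, offsets

base, offsets = simulate_offsets(mean_acc=1.0)
# participants x runs x accuracy levels x stimuli
n_data_sets = 100 * 30 * 3 * 4
```

Here `offsets` has one row per simulated participant and one column per validation run, and `n_data_sets` reproduces the count of 36000 given in the quoted passage.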

4. Although I am trained as a psychologist and have been bombarded with the terms Type I and Type II, I always forget them. I understand the authors' choice for using these terms, but the reader (at least this one) could be helped by redundancy at some instances (e.g. double-coding it with other terms such as 'hit' or 'false alarm', or writing the explanation in parentheses).

Response: We thank the reviewer for bringing this problem to our attention. We now include more redundancy in the text.

“In addition, these effects were not independent and differed for falsely classified gaze inside AOIs (Type I errors; false alarms) and falsely classified gaze outside the predefined AOIs (Type II errors; misses).” (p.2)

“Thereby, we focus on classification performance with respect to false-positives (falsely classified inside a specific AOI; Type I error; false alarms) and false-negatives (falsely classified outside a specific AOI; Type II error; misses) to derive recommendations for choosing AOI size depending on accuracy.” (p.4)

“To investigate the influence of accuracy and AOI size (LRVT with different radii) on false-negatives (Type II error; misses), we analyzed the number of fixation points directed to one of the four facial AOIs that were misclassified as belonging to a different AOI, or to no AOI at all (rest of face & surrounding). We chose to visualize the effect using confusion matrices and bar plots, and analyzed the gaze data descriptively.

The effect of accuracy and AOI size on false-positives (Type I error; false alarms) was tested by simulating fixation points on the forehead of the facial stimuli for which no AOI had been defined. Classification was correct when no AOI was detected, whereas false-positives occurred when fixation points were misclassified as belonging to one of the AOIs. Again, bar plots were created to visualize the effect of the independent variables, and analyses were performed at the descriptive level.

Last, the effect of accuracy on false-negatives (Type-II error; misses) was further tested by analyzing fixation points simulated on the different AOIs as being directed towards or away from the face. Classification was correct when the fixations were detected within the face ellipse.” (p.8)
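To make the two error types concrete, the following sketch classifies fixation points with a limited-radius Voronoi rule: a point belongs to the nearest AOI center, provided that center lies within the radius, and to no AOI otherwise. The AOI coordinates are hypothetical (chosen so that the forehead point lies roughly 1.56° from both eye centers), and the function names are illustrative rather than taken from the authors' tool.

```python
import numpy as np

AOI_NAMES = ["left_eye", "right_eye", "nose", "mouth"]
# Hypothetical AOI center points in degrees of visual angle.
CENTERS = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.2], [0.0, -2.4]])
FOREHEAD = np.array([0.0, 1.2])  # hypothetical target without an AOI

def classify_lrvt(points, centers, radius):
    """Index of the nearest AOI center, or -1 if it is farther than radius."""
    points = np.atleast_2d(points)
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    nearest = dist.argmin(axis=1)
    inside = dist[np.arange(len(points)), nearest] <= radius
    return np.where(inside, nearest, -1)

def miss_rate(labels, target_index):
    """Type II error: fixations aimed at an AOI but classified elsewhere."""
    return float(np.mean(labels != target_index))

def false_alarm_rate(labels):
    """Type I error: fixations aimed at no AOI but classified inside one."""
    return float(np.mean(labels != -1))

# A fixation landing exactly on the forehead target:
small = classify_lrvt(FOREHEAD, CENTERS, radius=1.0)  # no AOI
large = classify_lrvt(FOREHEAD, CENTERS, radius=2.0)  # falls into an eye AOI
```

With these assumed coordinates, the same forehead fixation is correctly left unclassified at a 1.0° radius but counted as a false alarm at a 2.0° radius, which is the trade-off the quoted passages describe.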

“Fixations simulated on eyes, nose, and mouth with high data quality, e.g., accuracy values of 0.5°, were correctly classified in 96.7 to 100.0 percent of cases. Misclassification and non-classification of fixations (false-negatives; Type II error; misses), in turn, occurred in only 0 to 3.4 percent of cases.” (p.9)

“To investigate the effect of accuracy and AOI size on false-positives (Type I error; false alarms), gaze data were simulated on a target point for which no AOI had been defined. In this particular case, we simulated fixations on the forehead of the facial stimuli to recreate a situation in which someone is trying to avoid eye contact and hides their behavior by fixating on the forehead.” (p.10)

“Differentiating this effect in terms of Type I (false alarms) and Type II errors (misses), we found that AOI size is irrelevant concerning Type II error (falsely classified outside predefined AOIs) for data accuracy better than 1.0°.” (p.12)

“For studies with lower accuracy, Type II errors (misses) can still be compensated to some extent by using larger AOIs, but at the cost of increasing the probability of Type I errors (false alarms).” (p. 16)

5. The authors use the LRVT radius as an operationalisation for AOI size. This could also be reiterated at several locations in the text when the authors write AOI size. In my experience, it is not obvious that the AOI size is operationalised by a radius for novice readers.

Response: Following the reviewer's advice, we have added the information at certain points in the text.

“With the current study, we aimed to investigate the gaze classification performance depending on the accuracy of the detected gaze position (spatial offset between detected and real gaze position) and AOI size (LRVT with different radii) with simulated gaze data in order to derive guidelines for the selection of AOIs and their size in interactive (face-to-face) eye tracking applications.” (p. 4)

“To investigate the influence of accuracy and AOI size (LRVT with different radii) on false-negatives (Type II error; misses), we analyzed the number of fixation points directed to one of the four facial AOIs that were misclassified as belonging to a different AOI, or to no AOI at all (rest of face & surrounding).” (p.8)

“In the present simulation study, we investigated the effect of the accuracy of the detected gaze position and the AOI size (LRVT with different radii) on gaze data classification for facial stimuli in a (simulated) interactive eye tracking setup.” (p. 12)


1. Title. The title seems to misrepresent the topic: it's about choosing AOI size, not the fact that the AOIs are computer-generated. Why not something like:

• How to choose area-of-interest size for facial areas in interactive eye tracking?

• Facial areas of interest in eye-tracking research: How to choose AOI size

• (I'm sure there are many other great alternatives)

Response: We thank the reviewer for these suggestions and have adjusted the title to read:

“How to choose the size of facial areas of interest in interactive eye tracking”

2. L. 95-99. This seems incomplete: it depends also on the physical size of the screen on which it is displayed. This could be inferred from the size in degrees, but redundancy may be beneficial to the reader here.

Response: Our gaze data were simulated from the camera's perspective to the stimuli (above the participants' heads; Vehlen et al., 2021) and are thus independent of monitor size. We did this to simulate realistic gaze data in a face-to-face setup without monitors.

3. L.105-106. Rephrase sentence. It seems grammatically off.

Response: We have reworded the sentence to make it grammatically correct:

“Each target point corresponded to an AOI center point, except for the forehead point, for which no AOI was generated.” (p.5)

4. Figure 1. Is each participant represented by one red dot? If so, can this be made explicit?

Response: We extended the caption to figure 1:

“(A) Visualization of the three gamma functions used to generate gaze data with three levels of accuracy (0.5°, 1.0° & 1.5°) and examples of the simulated fixations for the left eye of a facial stimulus as the facial target. Each red dot represents the averaged fixation location of 30 simulated fixations; a total of n=100 data sets were simulated. (B) Visualization of the three steps of the automatic AOI construction process. 1. Facial landmarks from OpenFace. 2. AOI center points derived from the facial landmarks. 3. Resulting AOIs using the Limited-Radius Voronoi-Tessellation (LRVT) method (example with 2.0° radius). Note OF = OpenFace (19).” (p. 7)

5. L.134-136. What was the motivation for the 4 deg radius in previous research? And how was that used to choose the 2 deg radius here?

Response: We tried to make this reference more explicit by adding the following information:

“To assess the effect of accuracy and AOI size on gaze classification performance, the AOI radius of 4° proposed in the literature as being robust to noise (imprecision of the signal) (13,17) was adjusted to a 131cm viewing distance, resulting in a radius of approx. 2.0° (~4.6cm). This was necessary to ensure that the AOIs covered the same facial area.” (p. 7)
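The stated correspondence between a 2.0° radius and roughly 4.6 cm at a 131 cm viewing distance can be checked with basic trigonometry. The helper below is a minimal sketch; the function name is hypothetical, and it assumes the radius is measured from the line of sight on a plane perpendicular to it.

```python
import math

def visual_angle_to_cm(angle_deg, distance_cm):
    """Extent on a frontoparallel plane subtending angle_deg at distance_cm."""
    return distance_cm * math.tan(math.radians(angle_deg))

radius_cm = visual_angle_to_cm(2.0, 131.0)
```

The result rounds to 4.6 cm, in line with the value quoted above.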

6. L.168. What is 'condition' here? Maybe I missed it, but here I already forgot. Can this be made explicit?

Response: We thank the reviewer for pointing out this inexactness and have revised the section:

“The two-way ANOVA for the percentage of correctly classified fixation points revealed a non-significant main effect of facial stimulus, F(3, 1188) = 2.06, p = .104, η²G < .01, a significant main effect of accuracy, F(2, 1188) = 275.77, p < .001, η²G = .32 and a non-significant interaction effect, F(6, 1188) = 0.37, p = .900, η²G < .01, resulting in the aggregation of classification data across stimuli.” (p. 8)

7. Section 3.2. This analysis depends drastically on the distance between the simulated fixation point and the AOI borders. However, this information is not given. Can this be given?

Response: We completely agree with the reviewer and added the appropriate information. In the process, we noticed an error in the specified sizes of the stimuli, which we have also corrected.

“To investigate the effect of accuracy and AOI size on false-positives (Type I error; false alarms), gaze data were simulated on a target point for which no AOI had been defined. In this particular case, we simulated fixations on the forehead of the facial stimuli to recreate a situation in which someone is trying to avoid eye contact and hides their behavior by fixating on the forehead. The mean distance between the forehead target point and the AOI center points of the eyes was approx. 1.56°.” (p. 10)

“After rescaling, the facial stimuli covered an average area of 7.8 by 5.7°, which corresponds to the size of a real face in a face-to-face conversation at the aforementioned viewing distance.” (p. 5)

8. Section 3.3. The analysis here is uninformative without a description of the AOI size. Can this be made explicit?

Response: We added the requested information.

“To investigate the effect of data accuracy on false-negatives (misses), gaze data were simulated on four facial target points (left eye, right eye, nose & mouth) and classified as being directed towards or away from the face. The face ellipse had an average vertical radius of about 4.22° and a horizontal radius of 2.80°.” (p. 11)
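The face-versus-surrounding decision described here amounts to a point-in-ellipse test. The sketch below uses the average radii from the quoted passage as defaults; the function name and the choice of the ellipse center as the coordinate origin are assumptions made for illustration.

```python
def inside_face_ellipse(x, y, cx=0.0, cy=0.0, rx=2.80, ry=4.22):
    """True if the point (x, y), in degrees, lies within the face ellipse.

    rx is the horizontal radius and ry the vertical radius, in degrees.
    ASSUMPTION: the ellipse is axis-aligned and centered at (cx, cy).
    """
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0

on_face = inside_face_ellipse(0.0, 4.0)   # near the top, still on the face
off_face = inside_face_ellipse(2.9, 0.0)  # just past the horizontal radius
```

A fixation simulated on a facial target counts as a miss in this analysis whenever the test returns `False`.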

9. L.258-261. It could be made clear that for studies investigating such participant groups or such viewing strategies, it thus makes sense to consider an AOI for that particular location (or, alternatively, to take an AOI-free approach to check for such potential strategies).

Response: We extended the discussion accordingly.

“However, by examining a wide range of accuracy values in the current study, we additionally observed that the superiority of smaller AOIs decreases in conjunction with reduced accuracy. For subject samples where abnormal gaze behavior is expected, the Type I error problem could be transformed into a Type II error problem by adding an AOI on the forehead.” (p. 13)

Reviewer #2

In this study, the authors simulate fixations on faces with different accuracies (noise levels of the simulated Eye-Tracker). They then analyze how well fixation location in different ROIs can be estimated using four ROIs and three ROI-sizes. They quantify both type-1 and type-2 errors. They find the expected patterns that high accuracy is good, large ROIs lead to higher type-2 errors, but lower type-1 errors, but importantly, they give a clear framework and open-source tool on how to quantify these intuitions. Consequently, this helps in planning and intuiting eye-tracking face-perception studies with eye-trackers of different accuracy.

As the authors state in the paper, a limitation is that the ROI size, eye-tracking accuracy and stimulus size are all connected via viewing distance, and can be traded off against one another. A limitation in their type-2 error analysis is that it hinges heavily on the placement of the “wrong” fixation point (in this case the forehead). But this is clearly visible from the paper. I encourage the authors to polish their tool a bit, because I think it could be very useful for the community.

I enjoyed reading this paper and I have only some minor comments.


1. Code not available


Response: We thank the reviewer for this suggestion and uploaded the code to Zenodo and GitHub. The links can also be found in the OSF repository.

Minor Comments:

1. Unclear definition of “fixation point”.

As I understand from the data, you check for each simulated sample, whether it is inside or outside the AOI. I understood fixation point more as the average x/y of a fixation. “Gaze sample” could be a better term?

Response: We thank the reviewer for pointing out this inconsistency in our results. We have now based all calculations on fixations and uploaded the data to OSF. As expected, gaze classification improved across all conditions as a result of this change. We discussed this observation in the following manner:

“The fact that our evaluation was based on fixations can be seen as a further study limitation. Defining fixations eliminates the influence of precision and thus improves classification performance. This has little effect in setups with eye trackers demonstrating good to very good precision, but can lead to significantly different results in less precise eye trackers. In such cases, fixation classification is already affected by the reduced precision (24) and thus indirectly influences the gaze classification performance.” (p. 15)

2. Related to the above: Often, accuracies of eye trackers are calculated on fixation level and not on sample, thus counteracting potential influences of bad precision. It did not become clear to me, whether we should count the 120 samples as 120 Fixations, or as a single fixation.

Response: By presenting the results on fixation level, the effect of precision is eliminated.

3. Thanks to the shared data, I reproduced some part of the main analyses of the paper. This replicates Figure 3b, but in reverse order of the x-axis and not as stacked barplots. Note that the means perfectly match Figure 3b – I additionally depicted the variability over subjects, which I think shows the relation to Figure 1a, the gamma distributions of “base” accuracy. Maybe this variability is of interest to the readers as well, quite often we aggregate too much and do not show the variability; but in this case I’m actually not sure.

Response: We thank the reviewer for taking the time to visualize the data. We believe that distribution information is in general very valuable, but for reasons of clarity, we decided to use the original type of visualization. The limited variance due to the fixation-based analysis was another factor that led to our decision.

4. It would be super valuable, if you could host the source code of the simulation tool on github (or other platform + zenodo link for archiving), so that other researchers can build upon your work. Further, there is no documentation on how to use the tool (or did I miss it?), and I couldn’t get it to run properly. If documentation is too time-consuming, a short screencap would already help.

Response: We thank the reviewer for this suggestion. We uploaded the code to GitHub and Zenodo. On OSF we have linked two videos describing the use of the tool.

5. On a similar note, it would be nice to get a short readme file in your, simply depicting what column means what. I think I figured it out, but I was confused at first.

Response: We uploaded the new fixation-based data to OSF and added a detailed readme file.

6. Plotting suggestion: Maybe you can add the sizes of the stimuli (2.8°/1.9°) to one (or all?) stimulus depictions? That would help in setting the 1.5° accuracy better into relation.

Response: We did this for Figures 3 and 4.

7. L319/L320: repetition of “essential”


“It is thus essential to validate novel interactive eye tracking setups carefully. The proper estimation of accuracy is an important prerequisite for informed decisions regarding the size of AOIs used for data analysis.” (p. 15)

8. L113 was => were

Response: This sentence was removed in response to reviewer 1’s third comment.

“Second, each simulated participant was assigned a base accuracy derived from a generalized gamma distribution around the specified mean accuracy.” (p.6)


Submitted filename: Response_to_Reviewers.docx

Decision Letter 1

Guido Maiello

24 Jan 2022

How to choose the size of facial areas of interest in interactive eye tracking


Dear Dr. Domes,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact our press team.

Kind regards,

Guido Maiello

Academic Editor


Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed


2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes


3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes


4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes


5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes


6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: Thank you for addressing all my concerns. This is a super nice and open manuscript. Well done, keep up the good, helpful and important work!


7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Benedikt Ehinger

Acceptance letter

Guido Maiello

26 Jan 2022


How to choose the size of facial areas of interest in interactive eye tracking

Dear Dr. Domes:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact our press team.

If we can help with anything else, please email us.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Guido Maiello

Academic Editor


Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials


    Submitted filename: r0_2021.docx


    Submitted filename: Response_to_Reviewers.docx

    Data Availability Statement

    The data sets created and analysed in this study can be downloaded here: or reproduced using the software tool described, which can also be downloaded from this website.

    Articles from PLoS ONE are provided here courtesy of PLOS