Improving Eye–Computer Interaction Interface Design: Ergonomic Investigations of the Optimum Target Size and Gaze-triggering Dwell Time

Ya-feng Niu; Yue Gao; Ya-ting Zhang; Cheng-qi Xue; Li-xin Yang

doi:10.16910/jemr.12.3.8

2020 Sep 25;12(3). doi: 10.16910/jemr.12.3.8

PMCID: PMC7880147 PMID: 33828737

Abstract

Interactive feedback of interface elements and low spatial accuracy are two key problems in research on eye–computer interaction (ECI) systems. This study addressed these two problems from the perspective of human–computer interaction and ergonomics. Two experiments were conducted to explore the optimum target size and gaze-triggering dwell time of the ECI system. Experimental Series 1 served as the pre-experiment to identify the size that yields a higher task completion rate. Experimental Series 2 served as the main experiment to investigate the optimum gaze-triggering dwell time through a comprehensive evaluation of the task completion rate, reaction time, and NASA-TLX (Task Load Index). In Experimental Series 1, the optimal element size was determined to be 256 × 256 px². Experimental Series 2 concluded that when the dwell time is set to 600 ms, the efficiency of the interface is highest and the task load of subjects is lowest. The results of Experimental Series 1 and 2 have positive effects on improving the usability of the interface. The optimal control size and dwell time obtained from the experiments have reference and application value for interface design and software development of ECI systems.

Keywords: ECI, dwell time, target size, usability, ergonomics

Introduction

Eye–computer interaction (ECI) is an interactive method of controlling a computer or other equipment by eye movements. An eye tracker captures the user's line-of-sight data and identifies the real-time position and trajectory of the gaze point. ECI input commands include fixation, gaze gesture, blink, saccade, and smooth pursuit [1]. ECI has become the main control mode in the fields of head-mounted display (HMD) aiming systems [2] and artificial intelligence [3], and it can also help patients with amyotrophic lateral sclerosis, hemiplegia, and pediatric cerebral palsy to communicate without obstacles. As the first point of interaction, the user interface is one of the most important components of the ECI system, and all ECI input commands are directly related to it. A good ECI interface design can improve user manipulation performance. Both the Windows and iOS operating systems have interface design specifications and standards [4, 5], but they are not entirely practical for the ECI system. The ECI system has two universal problems: “Midas touch” and “low spatial accuracy”. On the one hand, owing to the “Midas touch”, the ECI system cannot accurately distinguish whether the user is looking at a control to interact with it or only to obtain information [6, 7]. As shown in Fig. 1-a, the user intended only to glance at A to get information, not to trigger it; however, the system feedback showed that A was triggered. On the other hand, “low spatial accuracy” results in a large deviation between the eye gaze position and the actual target control position, which leads to a high probability of accidentally triggering adjacent controls. As shown in Fig. 1-b, the user planned to gaze at A and trigger it, but adjacent control B was triggered instead.

Figure 1. Midas touch (left) and low spatial accuracy (right).

To address these two problems, scholars have proposed solutions in terms of the eye movement index [8, 9], the positioning calibration algorithm [10, 11, 12, 13, 14, 15], and multi-channel input [16, 17]. However, these methods required a higher hardware configuration and algorithm accuracy, and they brought new problems such as visual fatigue [18, 19] and a poor interaction experience [20, 21]. Therefore, in the interface design of the eye control system, it remains to be determined which interactive feedback mode brings the highest interaction efficiency. In addition, we need to determine how large the interaction area should be for the recognition rate to be highest. These are the two core issues of this research. The interaction efficiency can be evaluated comprehensively by the reaction time and user workload, and the recognition rate can be obtained from the accuracy rate or error rate.

Interactive feedback mainly involves eye-triggering movements and the dwell time. The main forms of eye-triggering movements are fixation, gaze gesture, blink, and closure, among which fixation is the most basic, widespread, and direct mode, so this research chose gaze as the triggering movement.

The interaction range mainly involves the size and location distribution of the functional control. The size refers to the spatial area of the control. The location distribution is mainly reflected in the saccade orientation, that is, the speed at which the line of sight moves from the currently triggered control to the next control. The faster the speed, the greater the location advantage of the next control. Research on the best interactive feedback form and interaction range could reduce the Midas touch and improve the spatial accuracy to a certain extent.

This research mainly investigated the optimal gaze-triggering dwell time and size of functional controls. The basic gaze interaction model of the ECI system can be used to describe the process intuitively, as shown in Fig. 2. In Step 1, an individual gazes at module A and triggers it; in Step 2, a visual search is conducted to find target module B while ignoring other distractors; and in Step 3, the individual gazes at module B and triggers it. Steps 1 and 3 mainly involve the gaze-triggering dwell time, and Step 2 refers to the saccade orientation at the fastest saccade speed in different spatial positions. In Step 3, the black square is the functional control to be triggered, and gray marks controls that are not to be triggered.

Figure 2. Basic gaze interaction model of the eye–computer interface (ECI) system.

Theoretic foundation of Experimental Series 1

Theoretic foundation of shapes

Murata and Fukunaga's research on the size of ECI controls shows that square and circular controls are more efficient than 1:2 and 1:3 rectangles during interactions [22]. Thus, the controls were set to square in this study, and we also aimed to unify the spacing of controls.

Theoretic foundation of size and position

According to a previous study, the size of controls in the ECI system is generally represented by the visual angle. The visual angle is the angle between the edge of a control and sight when the eye is looking at its center. It is determined by the distance between the eyeball and the screen as well as by the control’s size. The office man–machine instruction manual states that the distance between the eyes and the screen should be no less than 25 inches (63.5 cm) [23]. Feng and Shen suggested that the size of the trigger object should be no less than 1.5° and that the object spacing should be no less than 1.0° [24].
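The relation between control size, viewing distance, and visual angle can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the visual angle is the full angle subtended by the control's side, 2·atan(size / (2·distance)), and it needs a physical pixel pitch, which the text does not report. The value `ASSUMED_PITCH_MM` and both function names are hypothetical; the pitch was chosen only so that a 128 px control viewed from 640 mm gives about 3.36°, the figure quoted in the discussion of Experimental Series 1.

```python
import math

# Assumption: physical size of one pixel in millimetres (not given in the text).
ASSUMED_PITCH_MM = 0.2933


def visual_angle_deg(size_px, distance_mm, pixel_pitch_mm):
    """Full visual angle (degrees) subtended by a control of side size_px,
    viewed head-on at its center from distance_mm."""
    size_mm = size_px * pixel_pitch_mm
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * distance_mm)))


def size_px_for_angle(angle_deg, distance_mm, pixel_pitch_mm):
    """Inverse relation: side length in pixels that subtends angle_deg."""
    size_mm = 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)
    return size_mm / pixel_pitch_mm


# Angle of a 128 px control at a 640 mm viewing distance under the assumed pitch.
angle_128 = visual_angle_deg(128, 640, ASSUMED_PITCH_MM)
```

With a different monitor the pitch changes, so the same pixel size maps to a different visual angle; this is why pixel thresholds derived from angular recommendations are specific to a given screen and viewing distance.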

Based on the above research, the side length of an ECI control should be no less than 49 pixels, and the spacing (pitch) should be no less than 33 pixels. According to The Windows Interface Guidelines for Software Design [4], this research selected four standard sizes as alternatives (64 × 64 px², 128 × 128 px², 256 × 256 px², and 512 × 512 px²), all of which also meet the visual angle requirements. Subsequently, the control size was further filtered according to other standards. The steps for screening the selected control sizes were as follows:

Nine square positions were set in the experimental interfaces for placing controls, evenly distributed in a 3 × 3 grid. In Experimental Series 1, no control was placed in the center position of the interface. Thus, the control positions in the interfaces were named upper left (UL), upper (U), upper right (UR), right (R), below right (BR), below (B), below left (BL), and left (L), giving a total of eight positions, as shown in Fig. 3-a. Controls of different sizes appeared in the center of these eight areas in Experimental Series 1, and the gaps between the controls were not considered in this experiment.

Figure 3. Schematic of the control positions (left) and the four control sizes (right).

For the 512 × 512 px² control, the information capacity of the interface was only 6, so users could obtain only six pieces of information, and the spacing was relatively narrow. This did not meet the requirements of general interface information, so this size was not selected.

During gaze recognition, the feedback point changes in real time. In the ECI system, the default range setting for sight was nearly 30 × 30 px². If the control size were close to this value, the acquisition of key information in the target would be directly interfered with, resulting in low interaction efficiency. Moreover, gaze drift is common in ECI. When a control occupies a limited area, it is relatively difficult to trigger an action by keeping the viewpoint within the control for a certain period, and the difficulty increases as the dwell time increases. In addition, gaze scanning is extremely fast, so participants cannot lock onto the target well if the control is small. For these reasons, the size 64 × 64 px² was not selected. Ultimately, the sizes 128 × 128 px² and 256 × 256 px² were used (Fig. 3-b).
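The information-capacity argument used to reject the 512 × 512 px² size can be illustrated with a short calculation. The sketch below assumes that capacity means the number of non-overlapping square controls that fit on the 1920 × 1080 screen with no spacing between them; the text does not state how the authors counted, although this reading does yield 6 for the largest size. The function name is hypothetical.

```python
def information_capacity(screen_w_px, screen_h_px, control_px):
    """Number of non-overlapping square controls that fit on the screen,
    ignoring any spacing between controls (an assumption of this sketch)."""
    cols = screen_w_px // control_px
    rows = screen_h_px // control_px
    return cols * rows


SCREEN_W, SCREEN_H = 1920, 1080  # screen resolution reported in the Equipment section

# Capacity for each of the four candidate control sizes.
capacity = {
    size: information_capacity(SCREEN_W, SCREEN_H, size)
    for size in (64, 128, 256, 512)
}
```

Under this assumption only 3 columns and 2 rows of 512 px controls fit, which is consistent with the capacity of 6 stated above; adding the recommended spacing would reduce the counts for the smaller sizes further.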

Experimental Series 1

Experimental Series 1 was used as the pre-experiment. By recording the reaction time and accuracy rate under different control sizes, the optimal control size was screened, which paved the way for the formal experiment. The control size was set to two levels, and the position of control in the interface was set to eight levels. Experimental Series 1 adopted a single-factor, two-level experimental design.

There were 10 (repetitions) × 8 (positions) × 2 (128 × 128px², 256 × 256px²) × 20 (participants), giving a total of 3200 trials in the whole of Experimental Series 1. For each single level (128 × 128px², 256 × 256px²), 1600 sets of data were recorded. Each participant needed to complete 10 (repetitions) × 8 (positions) × 2 (levels) for a total of 160 trials, which took approximately 10 minutes.
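The trial counts above follow directly from the factorial structure. The sketch below rebuilds one participant's trial list and checks the totals; the function name, the dictionary keys, and the use of a seeded shuffle are illustrative assumptions, and the grouping of trials into rounds is not modeled.

```python
import random
from itertools import product

POSITIONS = ["UL", "U", "UR", "R", "BR", "B", "BL", "L"]  # eight control positions
SIZES_PX = [128, 256]                                      # two size levels
REPETITIONS = 10
PARTICIPANTS = 20


def build_trials(positions, sizes, repetitions, seed=None):
    """Return one participant's trials (position x size x repetition), shuffled."""
    trials = [
        {"position": pos, "size_px": size}
        for pos, size, _ in product(positions, sizes, range(repetitions))
    ]
    random.Random(seed).shuffle(trials)  # trials appear in random order
    return trials


trials = build_trials(POSITIONS, SIZES_PX, REPETITIONS, seed=0)
trials_per_participant = len(trials)                   # 10 x 8 x 2
total_trials = trials_per_participant * PARTICIPANTS   # all 20 participants
trials_per_level = total_trials // len(SIZES_PX)       # per size level
```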

Participants

Twenty right-handed volunteers participated in Experimental Series 1 and 2. Their ages ranged from 21 to 25, the mean age was 23.1, and the standard deviation was 1.5. They were all undergraduate or postgraduate students from Southeast University. All participants were physically and mentally healthy, had no history of mental illness, and had normal or corrected vision without astigmatism. All of them had experience with using the Tobii eye tracker. The study protocol had been approved by the Southeast University Ethics Committee.

Equipment

The computer ran Windows 10, and the screen resolution was 1920 × 1080 px. The Tobii X2-30 eye tracker is an eye movement tracking device with a 30 Hz sampling rate. It is small and was fixed at the bottom of the screen for the experiment. The device was used to record participants’ eye movement data during the process. The experimental platform was developed in Unity6.0 with the Tobii SDK installation package imported, and it was compiled using C#.

Experimental stimuli

The dependent variables were the reaction time and accuracy rate. The reaction time was directly output by the eye tracker and represented by t_a, which referred to the length of time from the beginning of the trial to the trigger of control A. The accuracy rate was calculated as (number of tasks - number of failed tasks)/number of interfaces. Failed tasks included unintended activations and timeouts. When the residence time of a trial exceeded 10 seconds, the system counted it as a timeout automatically.
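The accuracy rate defined above can be computed from per-trial outcomes as in the following sketch. It assumes one task per interface, so that the number of tasks equals the number of interfaces, and it uses hypothetical outcome labels; neither the labels nor the function name come from the study's software.

```python
FAILED_OUTCOMES = {"unintended", "timeout"}  # both count as failed tasks


def accuracy_rate(outcomes):
    """(number of tasks - number of failed tasks) / number of tasks.

    outcomes: list of labels, each 'success', 'unintended', or 'timeout'.
    Assumes one task per interface.
    """
    n_tasks = len(outcomes)
    if n_tasks == 0:
        raise ValueError("at least one trial outcome is required")
    n_failed = sum(1 for outcome in outcomes if outcome in FAILED_OUTCOMES)
    return (n_tasks - n_failed) / n_tasks


# Synthetic example: 8 successes, 1 timeout, 1 unintended activation.
example_rate = accuracy_rate(["success"] * 8 + ["timeout", "unintended"])
```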

Procedure of Experimental Series 1

Participants were told to sit in front of the screen, with their eyes approximately 640 mm from the screen. The angle between the line of sight and the screen was 27° horizontally and 17° vertically. Each trial in Experimental Series 1 began with a black fixation cross on a white background in the center of the screen for 1000 ms. Then, controls of different sizes, showing eight white letters on black backgrounds, were displayed in the eight positions on the white screen. The eight blocks used in each trial were random but contained control A each time. Participants were asked to find control A and gaze at it for 2000 ms until it turned green (#009944). When participants gazed at a control other than control A, that control turned red, indicating a wrong decision. After participants finished the task with the right decision or failed to finish within 15 seconds, a blank white display appeared for 1000 ms to eliminate visual persistence and allow participants to rest. The procedure of Experimental Series 1 is depicted in Fig. 4.

Figure 4. Flow chart of the Experimental Series 1 trial.
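The gaze-triggering step of the procedure can be sketched as a dwell-time detector over timestamped gaze samples. This is an illustration under stated assumptions, not the study's Unity/C# implementation: the class and function names are hypothetical, the hit test is a simple point-in-square check, and the dwell timer resets whenever the gaze leaves the control, a policy the text does not specify. The dwell threshold and timeout are parameters.

```python
from dataclasses import dataclass


@dataclass
class Control:
    """Square control described by its center and side length, in pixels."""
    cx: float
    cy: float
    size: float

    def contains(self, x, y):
        half = self.size / 2.0
        return abs(x - self.cx) <= half and abs(y - self.cy) <= half


def dwell_trigger(samples, control, dwell_ms, timeout_ms):
    """Scan (t_ms, x, y) samples in time order.

    Returns ('triggered', t) once the gaze has stayed inside the control for
    dwell_ms, ('timeout', timeout_ms) if the timeout passes first, or
    ('incomplete', None) if the samples end before either happens.
    """
    dwell_start = None
    for t, x, y in samples:
        if t > timeout_ms:
            return ("timeout", timeout_ms)
        if control.contains(x, y):
            if dwell_start is None:
                dwell_start = t          # gaze just entered the control
            if t - dwell_start >= dwell_ms:
                return ("triggered", t)
        else:
            dwell_start = None           # assumption: leaving resets the dwell
    return ("incomplete", None)


# Synthetic 30 Hz stream: gaze is off-target for 15 samples, then on target.
target = Control(cx=320, cy=180, size=256)
stream = [(i * 1000 / 30, 900 if i < 15 else 320, 180) for i in range(120)]
result = dwell_trigger(stream, target, dwell_ms=2000, timeout_ms=15000)
```

At a 30 Hz sampling rate the detector can only resolve dwell durations to roughly 33 ms, which is small relative to the dwell thresholds considered in this study.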

All levels of independent variables were included in each round of experiments, and each level had one trial in a total of 40 rounds. Each participant needed to finish two rounds of experiments. After the first round of experiments was completed, there was a rest period. Trials appeared randomly.

Data analysis of Experimental Series 1

1. Reaction time

During data preprocessing, 20 data points corresponding to timeout failures and accidental fixations were removed. The average time taken for task completion is shown in Fig. 5.

The graph shows that the reaction time for both 256 × 256 px² and 128 × 128 px² was around 2 seconds. An independent-samples t-test with unequal variances was used for data analysis. The results showed no significant difference between the reaction times at the two size levels (p = 0.057 > 0.05); that is, reaction time could not be used to select the optimal control size.
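For readers unfamiliar with the unequal-variance (Welch) form of the independent-samples t-test, the statistic and its approximate degrees of freedom can be computed as in the sketch below. The data are synthetic placeholders, not the study's measurements, and the function name is hypothetical. Obtaining a p-value additionally requires the t distribution; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)        # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb                  # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Synthetic illustration only: two small samples of reaction times in seconds.
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(sample_a, sample_b)
```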

2. Accuracy rate

A one-way ANOVA was performed on the accuracy rate data (Fig. 6). The analysis showed a significant difference in the accuracy rate between control sizes. The accuracy rate of the 256 × 256 px² control (0.97) was significantly higher than that of the 128 × 128 px² control (0.82) (F = 3.97, p < 0.001). Therefore, 256 × 256 px² was chosen as the control size.

Figure 6. Line chart showing the results of the accuracy analysis.

Discussion of Experimental Series 1

In Experimental Series 1, the size levels were 128 × 128 px² (3.36° × 3.36°) and 256 × 256 px² (6.63° × 6.63°). Two dependent variables were adopted to select the optimal size: reaction time and accuracy rate. The average reaction times at the 128 × 128 px² and 256 × 256 px² levels were 2.05 and 2.38 s, respectively. According to the variance analysis, the size level had no significant influence on the reaction time, so the reaction time cannot be used as an effective indicator in this experiment. However, as can be seen from Fig. 5, the reaction time at level 128 × 128 px² was longer than that at level 256 × 256 px² in most cases. At the same time, there were certain differences in the response for different control positions. However, the impact of control positions on the efficiency is not discussed in this research.

Itakura and Sakamoto [25] built experimental interfaces with two different control sizes, with control widths of 4° and 6°, respectively. In their study, the accuracy rate was calculated from the gaze deviation: if the deviation in one gaze was greater than 2°, it was considered a triggering failure. The accuracy of the two interfaces was 96.7% and 88%, with the difference attributed to the control size. In Experimental Series 1, the accuracy rate of the 256 × 256 px² size was significantly higher than that of the 128 × 128 px² size (p < 0.05), which was consistent with the conclusion of Itakura and Sakamoto and with the interaction suggestions proposed by Chitty [26]: in the ECI system, control sizes should be as large as possible, while the information capacity and fault tolerance should also be considered.

In addition, the less accurate areas in the interface were at the lower right corner, and the distribution area of the viewpoint before fitting was 225 × 183 px² [27]. In Experimental Series 1, all eight controls were located at the edge of the screen. Compared with controls located in the center, the gaze accuracy of the controls at the lower edge and right edge was significantly reduced (p < 0.05), which may be related to the precision of the eye tracking device [28]. To ensure the accuracy and efficiency of gaze input, the optimal size of a control located at the edge of the screen should be slightly larger. This supports the conclusion of Experimental Series 1.

Fitts' law for ECI is as follows: when human eyes scan toward an object, the viewpoint first moves a large distance in the direction of the target and is then adjusted slowly and slightly over a small distance before settling on the target [29]. The first stage of the saccade is quick. However, when the control is relatively small, the slow adjustment in the second stage lasts longer, which can affect the reaction time to some extent. Murata, Konishi, Moriwaka, and Fukunaga [30] verified the influences of the shape, area, and position of gaze-input controls on the reaction time (pointing time) in the ECI system and how well it fits the modified Fitts' law model. Ware and Mikaelian [31] also fitted Fitts' law to the reaction time model and obtained a relatively ideal result:

Pt = a + b \times \log_{2} (d / s + 1) \qquad (1)

Pt refers to the reaction time, d is the distance between control B and the center of control A, s is the area of control B, a and b are constants, and log₂ (d / s + 1) is defined as the difficulty. In the study of Ware and Mikaelian [31], the data of square controls fit the modified Fitts' law best. This suggested that the reaction time of square controls would decrease significantly as the control size increases. This is another main reason why we chose squares as the stimulus material (the first reason was given in the section “Theoretic foundation of shapes”).
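Equation (1) can be evaluated numerically to see how the predicted time responds to control size. In the sketch below the constants a and b are arbitrary placeholders, not fitted values from any of the cited studies, d and s are taken in whatever consistent units the model was fitted with, and the function names are hypothetical.

```python
import math


def index_of_difficulty(d, s):
    """Difficulty term of Equation (1): log2(d / s + 1)."""
    if d < 0 or s <= 0:
        raise ValueError("d must be non-negative and s must be positive")
    return math.log2(d / s + 1)


def predicted_time(d, s, a, b):
    """Pt = a + b * log2(d / s + 1); a and b are empirically fitted constants."""
    return a + b * index_of_difficulty(d, s)


# Placeholder constants for illustration only.
A_CONST, B_CONST = 0.5, 0.25

# Same distance, two control sizes: the larger control has the lower difficulty.
pt_small = predicted_time(d=3.0, s=1.0, a=A_CONST, b=B_CONST)
pt_large = predicted_time(d=3.0, s=3.0, a=A_CONST, b=B_CONST)
```

For any positive b, increasing s at a fixed d lowers the difficulty term and therefore the predicted time, which is the qualitative behavior the paragraph above relies on.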

The purpose of Experimental Series 1 was to screen the optimal control size. To obtain the optimum gaze-triggering dwell time for the optimal control size, Experimental Series 2 was conducted.

Theoretic foundation of Experimental Series 2

Feng and Shen [24] suggested that the dwell time of the gaze trigger should be 500 ms. Helmert et al. [32] compared the performance of a virtual keyboard in gaze-input typing with dwell times of 350, 500, and 700 ms, considering the KSPC (Key Strokes Per Character) and the character input speed. The results showed that 500 ms was the best solution. In the study by Helmert et al., there was a given task, whereas the present research is largely free of any task context. The appropriate dwell time for gaze control differs by task, such as gaze typing, interface control, and other tasks.

The study of Graf and Krueger [33] showed that fixations can be divided into two types by duration: long fixations (>320 ms), termed voluntary gaze, and short fixations (<240 ms), termed involuntary gaze. Therefore, the lower limit of the dwell time needs to be set at more than 200 ms so as to avoid accidental fixation. According to Sibert and Jacob [34], human eyes usually stabilize the viewpoint on the target object within 200–600 ms after a saccade. In their relevant studies, the dwell time was set to 200 ms. Therefore, based on previous research and human physiological characteristics, Experimental Series 2 set the dwell time range to 200–800 ms with a step size of 200 ms.

Experimental Series 2

Experimental Series 2 was used as the main experiment to investigate the optimum gaze-triggering dwell time through a comprehensive evaluation of the accuracy rate, reaction time, and NASA-TLX (Task Load Index). According to the results of Experimental Series 1, the sizes of controls A and B were set to 256 × 256 px². The purpose of control A was to make participants’ eyes start uniformly from the center of the screen to eliminate the original error. Control B was used as the interactive control. In this experiment, a single-factor, four-level design was adopted. Since the design of Experimental Series 2 was based on the conclusion of Experimental Series 1, the same participants completed the two experiments one week apart. The participants and equipment used in Experimental Series 2 were the same as those used in Experimental Series 1.

There were 10 (repetitions) × 8 (positions) × 4 (200, 400, 600, and 800 ms) × 20 (participants), giving a total of 6400 trials in the whole of Experimental Series 2. For each single level (200, 400, 600, and 800 ms), 1600 sets of experimental data were recorded. Each subject needed to complete 10 (repetitions) × 8 (positions) × 4 (levels), giving a total of 320 trials, which took approximately 20 minutes.