Abstract
This paper concerns human-inspired robotic eye-hand coordination algorithms using custom-built robotic eyes interfaced with a Baxter robot. Eye movement was programmed anthropomorphically based on previously reported research on human eye-hand coordination during grasped object transportation. Robotic eye tests were first performed at the component level, where accurate position and temporal control were achieved. Next, 11 human subjects were recruited to observe the novel robotic system to quantify the ability of robotic eye-hand coordination algorithms to convey two kinds of information to people during object transportation tasks: first, the transported object’s delivery location and, second, the level of care exerted by the robot to transport the object. Most subjects correlated a decreased frequency of gaze fixations on an object’s target location with increased care in transporting the object, although these results were somewhat mixed among the 11 human subjects. Additionally, the human subjects were able to reliably infer the delivery location of the transported object purely from the robotic eye-hand coordination algorithm, with an overall success rate of 91.4%. These results suggest that anthropomorphic eye-hand coordination of robotic entities could be useful in pedagogical or industrial settings.
Keywords: robotics, animatronics, eye-hand coordination, human-robot interaction
I. Introduction
A wealth of information can be communicated between people using eye movements. Currently, robotic eyes do not possess all the capabilities of human eyes, which is a limitation in human-robot interaction where realistic eyes are important to positively impact human perception of the robot [1]. Nevertheless, there have been several studies on the impact of robotic eye motions in social robotics. The use of human-like gaze in robots was found to increase a robot’s persuasive ability as a storyteller and speed up a handoff task from robot to human [2], [3]. However, one study in which a robot attempted to influence a subject’s answer in a guessing game found gender might have influenced which types of gaze, such as averted or constant, were effective [4].
To compare the effects of realism in different media, one study analyzed the performance of three characters (one robotic, one animated and one human) whose eyes were only visible to test subjects [5]. The robot was seen as more credible, informative, and engaging than the animated character and rated comparable to the human in all categories. The conclusions highlighted the ways in which robotic eyes can make social robots better teachers and assistants.
One way in which robotic eyes can be actuated is through the use of servos and wires [6]. In any design where multiple servos are connected to a single eye, the movement of one servo can create an undesired torque on another. One open-source project avoided both of these issues by using rigid 3D printed connections to separate lateral and vertical eye rotation, but its vertical eye rotation speeds were neither specified nor portrayed in videos to be as fast as those of a human eye [7]. The Agile Eye consisted of three DC servo motors and six spherical links per eye [8]. While its speed and range of motion were greater than those of a human eye, the original design could not fit behind an artificial human face. Modifying the design by employing bent links fixed the issue [9]. To simplify the complex design, a team altered the eyes to have two active degrees of freedom (DOF) and one passive DOF [10]. In the process, however, the range of motion was greatly reduced.
Other robotic eye designs do not use motors. One example is an artificial eyeball suspended in fluid within a translucent outer shell [11]. Magnets are fixed within the eyeball, and an eight-coil electromagnetic drive structure is mounted to the outer shell. Small currents are sent to specific coils that result in eye rotation. The eyes’ small form factor makes them ideal replacements for old animatronic eyes because the change does not interfere with the rest of the mechanism; however, this control method can be more challenging to accurately implement. Display eyes using either organic light-emitting diode (OLED) screens or printed optics are also motor-free [12], [13]. However, because OLED eyes naturally glow, they are more fitting for the portrayal of animated characters than that of lifelike humanoid robots.
For robot manipulators to better perform tasks, some have been programmed with robotic hand-eye coordination algorithms. The algorithms rely on feedback from a vision system often consisting of two cameras [14]–[16]. One such algorithm employed trajectory planning and enabled a robot to track and grasp a moving object [14]. For versatile applications, one group sought to create task-independent control using an extended state observer to estimate and thereby account for unknown system factors such as external disturbances [17]. In a case where the robots needed to grasp multiple types of objects, they were taught their movements using thousands of datasets containing example grasp scenarios obtained through testing [18]. Similarly, a study was done in obstacle avoidance that used the movements of human test subject arms under various obstacle location conditions as a model for a humanoid robot’s algorithm [19]. Interestingly, the addition of user gaze to end effector data for machine learning purposes was found to make insignificant improvements in robot performance [20]. Quite recently, a robotic hand-eye coordination system combined with a retina-like vision system was designed in which signals went in both directions between the vision system and the robotic end effector [21]. The addition of the hand-to-eye channel allowed for obstacle detection in the workspace.
This paper presents the design and control of a novel robotic eye mechanism shown in Fig. 1(a) that is incorporated into a 3D scanned (Fig. 1(b)) and printed (Fig. 1(c)) face affixed to a Baxter robot (Fig. 1(d)) [22]. The mechanism is designed to have independent left-right and up-down gaze movements achieved using a servo and DC motor, respectively. These robotic eyes are programmed to move anthropomorphically relative to robot arm motions during experiments where people infer information from the robot based on the eye-hand coordination algorithm. In one case, subjects attempted to interpret the robotic eye movements to infer the delivery location of objects transported by Baxter. In another case, people were asked to infer whether a cup transported by Baxter was completely full or not (the level of care to transport an object) based on the eye movements during transport. To the best knowledge of the authors, this paper details a novel attempt to synchronize robotic eye-hand coordination with the intent of communicating task relevant information to people in a social context.
Fig. 1.
(a) CAD of robotic eye mechanism. (b) Face scan. (c) 3D printed mask. (d) Robotic eye mechanism and mask integrated onto Baxter robot
II. Robotic Eye Mechanism Design
A. Human Inspiration for Robotic Eyes
The average diameter of a human eye is 24mm, and pupil diameter can range from 3mm to 7mm [10], [23]. The human eye’s mechanical range of motion, referred to as the oculomotor range (OMR), can be defined by how far left, right, up and down the average person can rotate their eyes [24]. Different sources define the human OMR using slightly different numbers, ranging from 53° ± 2° to 60° for both left and right eye rotation [10], [24], [25]. However, quick eye movements, or saccades, are limited to the effective oculomotor range (EOMR) of ±45° horizontally [24], [25]. The vertical component of the EOMR is said to be just under ±40° while one study found it to be about 35° up and 47° down [25], [26]. However, 90% of the time, human eyes move within the range of ±20° [9].
The larger a saccade, the greater its duration, peak velocity, and average velocity [27]. The maximum velocity of a saccade is around 600°/s, although few saccades exceed 500°/s [24], [27]–[29]. Saccade durations range between approximately 25ms and 100ms for saccades larger than one degree [27], [28], [30], and saccades take about 200ms to initiate [10].
Multiple conclusions were gleaned from a study that analyzed human eye movement during an object manipulation task [28]. The gaze fixation points always preceded, or led, hand movement, and the hand and object being moved were rarely visually fixated. Landmarks, or areas of interest in the workspace, could be divided into two categories: obligatory and optional. Obligatory landmarks required gaze fixation to complete the task and included the grasp site and target. Optional landmarks were associated with a lower probability of gaze fixation and shorter fixation time lengths. Gaze also exited a landmark around the time of a kinematic event at that landmark. For instance, gaze left the grasp site just before the hand grasped the object.
Two task-based experiments that involved tea-making and sandwich-making concluded that gaze was typically directed to an object one second or less before contact, with some lead times being as large as two seconds [31]–[33]. They also found within-action saccades, or saccades that occurred within the uninterrupted manipulation of a single object, averaged eight degrees and peaked at 40°, which fits within the aforementioned EOMR of ±45°.
Other studies analyzed the effects task difficulty had on gaze, one concluding that the amount of time subjects fixate on a landmark is directly related to the landmark’s complexity [34]. In a study on driving, increased task difficulty led to increased concentration in gaze at the center of the road, more time looking at the informative display, and more frequent glances to the display [35]. Interestingly, the specific way in which a subject’s eyes move while performing a task, such as number of saccades to a target, could vary from person to person [36].
B. Mechanical Design of Robotic Eyes
Based on these observations of human eyes, the robotic eye design parameters were determined. The design requirement for maximum saccade velocity was 500°/s. The lateral range of motion was set to ±35°, which is below the average human EOMR but well beyond the ±20° range in which the eyes operate 90% of the time. Because Baxter usually performs tasks in which the lower and especially the upper EOMR limits are rarely used, the vertical range of motion was chosen to be ±20°.
To realize these parameters, a mechanism was designed which used a DC motor for vertical motion and a servo for lateral motion (Fig. 2(a)). The spherical eyes are 24mm in diameter, comparable to the size of human eyes. A two-piece shell secures each eye, acting as a socket for the sphere. L-brackets connect the shells to a trapezoidal base plate, and standoff blocks are used to adjust the distance between the eyes and plate. Each eyeball has a cylindrical extrusion with an eyebolt screwed into the circular face. Eyebolts screwed into either end of a cylindrical bar are lined up with those connected to the eyes. Two clevis pins with press-fit screws form pin connections between the bar and eyes.
Fig. 2.
(a) Isometric view of robotic eye design assembly. (b) The DC motor actuation causes vertical motion of the eyes, which is measured by the encoder. (c) A slot enables sliding contact during lateral and vertical motor actuation. (d) Servo motor actuation causes the DC motor and eye assembly to rotate laterally in the slot.
The bar connecting the eyes goes through a slot that is part of the DC motor housing. Two set screw shaft collars keep the motor housing at the center of the bar (Fig. 2(b)). The bar is free to move along the length of the slot as the eyes rotate to look up and down (Fig. 2(c)). The DC motor and rotary encoder shaft are fixed to either end of a threaded rod by set screw shaft couplers. The encoder slides up and down along a track attached to extrusions from a hollow disk that sits on top of the plate. The rod is threaded through a press-in nut fixed into the encoder track. When the motor shaft rotates, it threads the rod through the press-in nut, causing the motor housing and encoder to move up or down depending on whether the shaft is rotating counterclockwise or clockwise, respectively.
The servo enables the eyes to look left and right. The servo acts like a windshield wiper, rotating the motor-encoder assembly along the trapezoidal plate within the arced slot (Fig. 2(d)). A potentiometer was mounted below the plate to measure the servo’s position.
C. Robotic Eye Kinematics
To control the gaze location of the eyes, the location of coordinate frame {3} was measured with the potentiometer and encoder (Fig. 3(a)). Kinematic equations were defined to determine where this measured coordinate frame {3} (coincident with the centroid of the connecting bar) needs to be for the eyes to look at a particular point in space (Fig. 3). This was accomplished by defining another coordinate frame {1} between the eyeball centroids. Aligning {3} with {1} effectively points the eyes toward the target gaze location {2}.
Fig. 3.
(a) Robotic eye reference frames. (b) Baxter’s origin. (c) Vector angles and coordinates of reference frames.
In this paper, the target gaze location {2} is based on Baxter’s left gripper location. To establish this relationship, it was necessary to use the coordinate frame {0} as the global reference frame, which is Baxter’s origin shown in Fig. 3(b) [37]. The notation $^{a}\mathbf{P}_{b}$ used in the following calculations denotes the vector position of frame {b} relative to frame {a}. The desired location of the bar centroid relative to the midpoint between the eyes, $^{1}\mathbf{P}_{3}$, for the eyes to look at {2}, along with the bar centroid’s displacements in x, y, and z, are found as follows.
First, the constant $^{0}\mathbf{P}_{1}$ was chosen by measuring how high above Baxter’s base frame the eyes would be mounted. The gaze fixation point was initialized as a point straight ahead of and level with the eyes. The position of the bar centroid relative to the eyeball midpoint, $^{1}\mathbf{P}_{3}$, was then initialized as the servo’s moment arm length, 37mm, entirely in the negative x-direction. Note that the magnitude of the vector between {1} and {3} is always 37mm. Calculations begin with finding $^{1}\mathbf{P}_{2}$ using Equation 1.
$$^{1}\mathbf{P}_{2} = {}^{0}\mathbf{P}_{2} - {}^{0}\mathbf{P}_{1} \tag{1}$$
The magnitude of the vector between the bar centroid and gripper position, $\left|{}^{3}\mathbf{P}_{2}\right|$, is found using Equation 2.
$$\left|{}^{3}\mathbf{P}_{2}\right| = \left|{}^{1}\mathbf{P}_{2}\right| + r \tag{2}$$
Equations 3 and 4 are equivalent definitions of the unit vector in the direction of $^{3}\mathbf{P}_{2}$, where α, β, and γ are the angles from the x-, y-, and z-axes, respectively, to $^{3}\mathbf{P}_{2}$ in Fig. 3(c). Because {1}, {2}, and {3} are collinear, this unit vector is identical to that of $^{1}\mathbf{P}_{2}$.
$$\hat{\mathbf{u}} = \frac{{}^{3}\mathbf{P}_{2}}{\left|{}^{3}\mathbf{P}_{2}\right|} = \frac{{}^{1}\mathbf{P}_{2}}{\left|{}^{1}\mathbf{P}_{2}\right|} \tag{3}$$
$$\hat{\mathbf{u}} = \begin{bmatrix} \cos\alpha & \cos\beta & \cos\gamma \end{bmatrix}^{T} \tag{4}$$
The components of Equations 3 and 4 were equated and rewritten to solve for α, β, and γ as shown in Equation 5.
$$\alpha = \cos^{-1}\!\left(\frac{x}{\left|{}^{1}\mathbf{P}_{2}\right|}\right),\qquad \beta = \cos^{-1}\!\left(\frac{y}{\left|{}^{1}\mathbf{P}_{2}\right|}\right),\qquad \gamma = \cos^{-1}\!\left(\frac{z}{\left|{}^{1}\mathbf{P}_{2}\right|}\right) \tag{5}$$

where $(x, y, z)$ are the components of $^{1}\mathbf{P}_{2}$.
The three angles give the orientation of the vector that passes through {1}, {2}, and {3}. Because Equation 5 can be written for any pair of the {1}, {2}, and {3} coordinates, the same direction cosines apply over the fixed distance between {1} and {3}, which results in Equation 6. The eyes’ radius of rotation, r, is the same length as the servo’s moment arm and is defined in Fig. 4.
Fig. 4.
Eyeball rotation about its centroid in the XZ-plane.
$$^{1}\mathbf{P}_{3} = -r\begin{bmatrix} \cos\alpha & \cos\beta & \cos\gamma \end{bmatrix}^{T} \tag{6}$$
The value of $^{0}\mathbf{P}_{3}$ is next found using

$$^{0}\mathbf{P}_{3} = {}^{0}\mathbf{P}_{1} + {}^{1}\mathbf{P}_{3} \tag{7}$$
Taking the components of the desired position found in Equation 7 and subtracting the bar centroid’s current coordinates gives Δx, Δy, and Δz of the bar centroid:
$$\begin{bmatrix} \Delta x & \Delta y & \Delta z \end{bmatrix}^{T} = {}^{0}\mathbf{P}_{3,\mathrm{desired}} - {}^{0}\mathbf{P}_{3,\mathrm{current}} \tag{8}$$
It is important to guarantee that all gaze fixation points are within the eyes’ vertical range of motion. To that end, all gaze fixation points were chosen such that L in Fig. 4 remained within 11mm. This choice imposes a soft constraint of 35° on the eyes’ vertical range, just under their full vertical range of motion of 40°. The eyes’ angular range of motion, φ, is related to L by
$$\varphi = 2\sin^{-1}\!\left(\frac{L}{r}\right) \tag{9}$$
To keep the eyes within their lateral range of motion, the servo angle was constrained through software.
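To make the gaze kinematics concrete, the following is a minimal Python sketch of Equations (1)–(9) as reconstructed above. The helper names and example coordinates are illustrative assumptions, as is the reuse of the direction cosines of $^{1}\mathbf{P}_{2}$ for $^{3}\mathbf{P}_{2}$ (justified by the collinearity of {1}, {2}, and {3}); the actual implementation ran on the Arduino.

```python
import numpy as np

R = 0.037  # radius of rotation r: the 37 mm servo moment arm (metres)

def desired_bar_displacement(p0_1, p0_2, p0_3_current):
    """Eqs. (1)-(8): displacement of the bar centroid {3} so the eyes fixate {2}."""
    p1_2 = p0_2 - p0_1                      # Eq. (1): target relative to the eye midpoint
    d3_2 = np.linalg.norm(p1_2) + R         # Eq. (2), shown for completeness; not needed below
    u = p1_2 / np.linalg.norm(p1_2)         # Eqs. (3)-(5): direction cosines of the gaze line
    p1_3 = -R * u                           # Eq. (6): bar centroid sits opposite the gaze direction
    p0_3_desired = p0_1 + p1_3              # Eq. (7)
    return p0_3_desired - p0_3_current      # Eq. (8): (dx, dy, dz) commanded to the motors

def vertical_range_deg(L):
    """Eq. (9): angular range phi produced by a vertical bar travel of +/- L."""
    return np.degrees(2.0 * np.arcsin(L / R))

if __name__ == "__main__":
    p0_1 = np.array([0.10, 0.00, 0.80])     # assumed eye mount location in Baxter's frame {0}
    p0_2 = np.array([0.70, 0.20, 0.30])     # example left-gripper position {2}
    p0_3 = p0_1 + np.array([-R, 0.0, 0.0])  # bar centroid with the eyes looking straight ahead
    print(desired_bar_displacement(p0_1, p0_2, p0_3))
    print(vertical_range_deg(0.011))        # L = 11 mm -> about 34.6 deg, the ~35 deg soft limit
```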
III. Experimental Methods
A. Integrating Robotic Eyes with Baxter
The Baxter Robot has two arms with seven degrees of freedom that are outfitted with parallel grippers [38]. Robotic eye motions were programmed based on the location of Baxter’s left hand during object transportation. A laptop was used to communicate with Baxter and execute all necessary code. This required a Linux distribution to run Baxter, so Ubuntu 14.04 was installed on the laptop. Baxter’s recommended software framework, the Robot Operating System (ROS) Indigo, was also installed. An Ethernet cable was used for communication between the laptop and Baxter [39].
Additional requirements to run the eye assembly were Arduino software and Python 2.7. For communication from the laptop to the Arduino Uno R3, the Uno was connected to the laptop’s USB port.
The servo for lateral movement, the potentiometer and the rotary encoder were all powered directly through the Arduino Uno. The DC motor for vertical eye motion required a 12V power supply and was powered by a BK Precision 1672 variable power supply.
A SCANIFY 3D scanner by Fuel3D was used to take images of a human face at multiple angles. Using Fuel3D Studio software, the images were spliced together to create a 3D mesh of the face as shown in Fig. 1(b). The mesh was then used to model a 3D mask in SolidWorks. Grooves were designed in the back of the mask to guide where the robotic eyeball assembly would mate with the face. After converting the part file to an STL, the mask was printed on an Ultimaker 3 using Cura software. It was printed out of PLA with twenty percent infill at full scale and was then painted beige (Fig. 1(c)).
To mount the robotic eye assembly, two L-brackets connect the main plate to the two mounting supports. The mounting supports mirror one another and fit over the top corners of Baxter’s display monitor as shown in Fig. 1(d). A Velcro strap is then used to affix the mask atop Baxter’s display screen.
B. Robotic Communication
Fig. 5 shows the communication between all components. Baxter’s desired poses are input into the PC via ROS. The Arduino controls the motors for lateral and vertical eye motions (servo and DC motor, respectively) and receives the feedback from both their angular sensors (potentiometer and encoder, respectively). The motor driver sends the Arduino’s speed commands to the DC motor. Arduino publishes coordinates such as desired gripper position and actual gaze fixation point to ROS topics that are subsequently written to text files on the laptop for verification and analysis after experiments.
Fig. 5.
Communication between robotic components.
Joint angles of Baxter’s arms were obtained through the joint recorder; however, forward kinematics were required to solve for the gripper’s location relative to Baxter’s base frame. Therefore, Python code was written utilizing the forward position kinematics function in the Baxter PyKDL package [40]. The function returns an array of seven numbers, the first three being the left gripper’s x, y, and z coordinates (XHAND, YHAND, ZHAND, respectively). The three coordinates were published separately to three different ROS topics. The fraction by which Baxter’s gripper was open (g), with 0 being fully closed and 1 being fully open, was published to a fourth topic. ROS topics were written to text files for data collection and analysis of the robotic eye assembly’s performance.
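As an illustration of this data path, the sketch below (Python 2, ROS Indigo style) calls the forward position kinematics function from the Baxter PyKDL package and publishes the gripper coordinates and opening fraction. The topic names, the 20 Hz rate, and the use of baxter_interface.Gripper for the opening fraction are assumptions, not the authors’ exact code.

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import Float64
import baxter_interface
from baxter_pykdl import baxter_kinematics

rospy.init_node('gripper_state_publisher')
kin = baxter_kinematics('left')               # forward kinematics for the left limb
gripper = baxter_interface.Gripper('left')

pubs = {axis: rospy.Publisher('/eyes/hand_' + axis, Float64, queue_size=10)
        for axis in ('x', 'y', 'z')}
pub_g = rospy.Publisher('/eyes/gripper_open', Float64, queue_size=10)

rate = rospy.Rate(20)                         # assumed 20 Hz update rate
while not rospy.is_shutdown():
    pose = kin.forward_position_kinematics()  # [x, y, z, qx, qy, qz, qw]
    for axis, value in zip(('x', 'y', 'z'), pose[:3]):
        pubs[axis].publish(Float64(value))
    pub_g.publish(Float64(gripper.position() / 100.0))  # 0 = fully closed, 1 = fully open
    rate.sleep()
```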
C. Arduino Program for Robotic Eye-Hand Coordination
The Arduino code used these gripper coordinates (XHAND, YHAND, ZHAND) in conjunction with the eye kinematics equations (1)–(9) when calculating instructions for the servo and DC motors for lateral and vertical eye movements, respectively (Fig. 6). This was done so that the eye movements would correlate to the gripper position during object transportation tasks based on the robot operational mode (described subsequently in Section III.D).
Fig. 6.
Robotic eye-hand coordination algorithm diagram.
The ‘servo.write()’ command was used to control the servo motor in Arduino. The potentiometer was used to measure the servo’s current angle and verify closed-loop control of the lateral eye motion. If the servo’s measured angle was within one degree of the desired angle (θ, Fig. 2(d)), motion was halted. The time between commands was set to a minimum of 2ms as a software constraint so that the servo would not move faster than its manufacturer-specified maximum speed.
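The closed-loop lateral control can be summarized by the fragment below. This is only a Python sketch of the logic (the real controller was an Arduino C++ sketch using servo.write()), and the one-degree step size and the callable sensor/actuator hooks are assumptions.

```python
import time

ANGLE_TOL_DEG = 1.0       # halt once the measured angle is within one degree of the target
MIN_CMD_PERIOD_S = 0.002  # at least 2 ms between commands: the software speed limit

def drive_servo_to(target_deg, read_pot_deg, write_servo_deg):
    """Rate-limited closed-loop lateral eye motion (logic sketch only)."""
    while abs(read_pot_deg() - target_deg) > ANGLE_TOL_DEG:
        measured = read_pot_deg()
        step = 1.0 if target_deg > measured else -1.0  # assumed one-degree increments
        write_servo_deg(measured + step)
        time.sleep(MIN_CMD_PERIOD_S)                   # enforce the minimum command period
```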
For vertical motor control, a loop function for the DC motor first calculated the number of motor rotations required to cover the instructed Δz (Fig. 2(b)). This function used the pitch of the threaded rod coupled to the DC motor shaft to convert the desired vertical displacement of the DC motor (Δz) into the necessary number of motor rotations. The loop then calculated the time allotted for acceleration as a set portion of the predefined move time, with the motor’s maximum angular velocity constrained to within the motor’s specifications. Motion of the vertical motor was halted when the measured position was within 0.1mm of the commanded position.
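The vertical-motion planning reduces to a unit conversion and a simple time split, sketched below. The thread pitch and acceleration fraction are placeholder values, since the paper does not state them.

```python
THREAD_PITCH_MM = 0.8  # placeholder lead of the threaded rod (mm of travel per motor revolution)
ACCEL_FRACTION = 0.2   # placeholder portion of the move time spent accelerating
POS_TOL_MM = 0.1       # halt once the measured vertical position error is within 0.1 mm

def vertical_move_plan(delta_z_mm, move_time_s):
    """Convert a commanded vertical displacement into motor revolutions and an
    accelerate/cruise/decelerate time split (logic sketch only)."""
    revolutions = delta_z_mm / THREAD_PITCH_MM
    t_accel = ACCEL_FRACTION * move_time_s
    t_cruise = move_time_s - 2.0 * t_accel
    return revolutions, t_accel, t_cruise
```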
A subroutine was written to verify the orientation of the eyeball assembly at each time step. It uses the servo and encoder outputs to calculate two points: the true location of the bar centroid {3} and the actual gaze fixation point {2}. The gaze of the eyes can be at an infinite number of points along a vector normal to the pupil. To reduce this vector ($^{3}\mathbf{P}_{2}$ in Fig. 3) to one point, the x-value of the instructed point (X) and the x-value of the actual gaze fixation point (XEYES) were both equated to the gripper’s x-coordinate (XHAND). Therefore, the instructed and actual points lie in a plane normal to the x-direction, making them possible to compare.
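Under the stated convention that the verification point shares the gripper’s x-coordinate, the projection can be written as the short sketch below. It assumes the bar centroid {3} and eye midpoint {1} positions have already been recovered from the servo and encoder readings.

```python
import numpy as np

def actual_gaze_point(p0_3_measured, p0_1, x_hand):
    """Intersect the measured gaze line (through {3} and {1}) with the plane
    x = XHAND to obtain a single, comparable gaze fixation point."""
    direction = p0_1 - p0_3_measured                # gaze direction: from the bar centroid through the eyes
    t = (x_hand - p0_3_measured[0]) / direction[0]  # line parameter where x equals XHAND
    return p0_3_measured + t * direction            # (XHAND, YEYES, ZEYES)
```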
An Arduino sketch was used to execute the functions in the .cpp files; one .cpp and one .h file were written for each of the Denavit-Hartenberg parameters, lateral motor control, vertical motor control, and verification routines. Additionally, the sketch contained four subscribers that listened to the ROS topics published by the Python code, as well as eight publishers for time, the desired gaze fixation point (X, Y, Z), the actual gaze fixation point (YEYES, ZEYES), and the left gripper location (YHAND, ZHAND).
D. Modes of Robotic Eye-Hand Coordination
The eyes were programmed to operate under one of five unique modes. There were two parameters that were varied with each mode. The first main difference among modes was the amount of time that the gaze fixation point of the eyes temporally led the gripper location along the trajectory of the gripper. In this way, the gaze fixation points of the eyes would foreshadow the actual location of the gripper during object transportation. The second parameter that was varied among the eye-hand coordination modes was the frequency of eye saccades with which the eyes fixated the target delivery location during object transportation.
(1). Tracking Mode
In the tracking mode, the eyes gazed at the gripper at all times in order to evaluate the eyes’ ability to follow Baxter’s left gripper. Thus, there was no temporal offset of gaze along the trajectory of the gripper and no saccades away from the gripper. In the Arduino code, the gripper’s coordinates were equated to the desired gaze location and used in the equations detailed in Section II to calculate the displacement of the bar centroid {3}, so that the eyes always fixated the gripper in tracking mode.
(2). Block Mode
Block mode was created based on the human eye-hand coordination research described in Section II and was specifically made to run when Baxter would grasp and transport an object that requires little care, such as a solid block. To program this in a human-like way, recall that eye movement can lead hand movement by as much as two seconds and that gaze exits a landmark approximately at the time of a kinematic event at that landmark [28], [31]–[33].
Therefore, in block mode, the eyes would look at the grasp site (the original location of the object) until the gripper was thirty percent closed, just before it grabbed the object. The eyes would then saccade between the delivery target location (where the object was to be delivered at the end of the task) and a point 1.5 seconds temporally ahead of the left gripper location (along the trajectory of the gripper motion), as sketched below. In this manner, the eye motions would foreshadow the motion of the object transported by the Baxter robot. Finally, the eyes would look forward after the object was delivered, as the gripper opened, to indicate task completion.
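A compact way to express this gaze scheduling is the Python sketch below. The recorded trajectory arrays, the interpolation step, and the fixate_target flag that triggers saccades to the delivery target are illustrative assumptions rather than the authors’ implementation.

```python
import numpy as np

LEAD_TIME_S = 1.5  # in block mode, gaze leads the gripper by 1.5 s along its trajectory

def block_mode_gaze(t_now, traj_times, traj_points, target_xyz, fixate_target):
    """Return the commanded gaze point: either the delivery target (during a
    saccade) or the point the gripper will reach LEAD_TIME_S from now."""
    if fixate_target:
        return np.asarray(target_xyz)
    t_ahead = min(t_now + LEAD_TIME_S, traj_times[-1])   # clamp to the end of the trajectory
    # interpolate the recorded gripper trajectory at the future time, per axis
    return np.array([np.interp(t_ahead, traj_times, traj_points[:, i]) for i in range(3)])
```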
(3). Full Glass Mode
Full glass mode differed from block mode by altering both the length of the temporal offset that the eye gaze led gripper motion along the trajectory of the gripper, and the frequency of saccades towards the delivery location during object transportation. The goal was to change these two parameters to create the impression that Baxter was handling an object with greater care. For instance, one would handle a glass full of liquid more carefully than one would handle a solid block, which is how the mode names were derived.
Based on prior research, the hypothesis was made that increased gaze concentration was associated with an increased level of care taken to complete more difficult tasks [35]. Therefore, the temporal offset that eye gaze led gripper motion along the trajectory of the gripper for full glass mode was made smaller than that of block mode (one second versus 1.5 seconds) to keep robotic eye gaze closer to the gripper location while transporting the object.
In the same study, an informative display was fixated more frequently and for longer periods as task difficulty increased [35]. This observation matched the conclusion of another study that landmark fixations and task complexity are directly related [34]. Therefore, the robotic eyes were programmed to saccade to the delivery location during object transportation more frequently in full glass mode than in block mode.
(4). Modes I and II
Modes I and II were slightly modified versions of block mode and full glass mode, respectively. These modes were designed for the tests involving human subjects discussed in Section III.G. The first set of tests was designed to investigate whether the human subjects would correlate a specific mode of eye-hand coordination with the level of care Baxter exerted during object delivery, such as with a glass full of liquid or with a solid block. The second set of tests was to determine if the subjects could infer the delivery location of an object transported by Baxter based on the mode of eye-hand coordination. The tests in which these two modes were used required subjects to make inferences with no knowledge of the reasoning behind the modes’ programming. Therefore, the names “Mode I” and “Mode II” were chosen in place of descriptive names that would have influenced the subjects’ perceptions of the robotic system.
Mode I was programmed so that the gaze fixation point of the eyes temporally led the gripper location by two seconds. Mode I had only one saccade towards the object delivery location which occurred just before the gripper initially grasped the object.
Mode II was programmed so that the gaze fixation point of the eyes temporally led the gripper location by 1.5 seconds. Mode II had four saccades directed towards the object delivery location during transport of the object. (see video S1 in the supplementary material for a visual illustration of these modes).
E. Creating Baxter’s Arm Movements
Baxter was taught to perform object manipulation tasks in Python utilizing the joint recorder, joint trajectory action server, and joint trajectory file playback included in Baxter’s software [41]. With the joint recorder running in a terminal, Baxter’s left gripper was manually moved at the desired speed while opening and closing the gripper when needed to grasp and release an object. For Baxter to repeat the trained movement, the joint trajectory action server was run in one terminal, followed by the joint trajectory file playback in another. Every task that Baxter was taught consisted of the same general process: Baxter would reach for an object with its left hand, pick it up, transport it, set it down, release it, and raise the left arm.
Prior to experiments with human subjects, three arm trajectories were programmed to evaluate the performance of the robotic system: circular, parabolic, and linear (Fig. 7). The circular trajectory program was written so the left gripper would traverse a circular path parallel to the YZ plane twice clockwise and twice counterclockwise. Afterwards, another object transportation task was programmed for the experiments with human subjects, described subsequently.
Fig. 7.
Circular, parabolic, and linear trajectories programmed for the Baxter’s gripper to quantify the performance of the different eye-hand coordination modes.
F. Testing Robotic System Performance
Seven tests were performed with the robotic system to ensure reliability of control. The eyes were tested in tracking mode for the circular trajectory. Next, the tracking, block, and full glass modes were tested in conjunction with the parabolic and linear trajectories (Table 1). The arm trajectory and mode of eye movement implemented were used in naming each test (e.g. the parabolic block test is abbreviated as par. block). The error was defined as the distance between the instructed and actual gaze fixation points.
Table 1.
List of robotic eye tests to quantify performance of the robotic eye-hand coordination algorithm in bench top tests. The trajectory column indicates the motion of Baxter’s gripper (see also Fig. 7) while the mode column indicates the method of eye-hand coordination.
| Test Number | Trajectory | Mode |
|---|---|---|
| 1 | Circular | Tracking |
| 2 | Parabolic | Tracking |
| 3 | Parabolic | Block |
| 4 | Parabolic | Full Glass |
| 5 | Linear | Tracking |
| 6 | Linear | Block |
| 7 | Linear | Full Glass |
G. Testing Human-Inspired Robotic Eye-Hand Coordination Algorithms with 11 Human Subjects
Experiments were conducted in which Baxter transported a cup while task relevant information was communicated to human test subjects purely through robotic eye-hand coordination. The experiment had two parts: the care level test and the delivery location test. The goal of the former was to communicate the level of care with which Baxter appeared to be handling an object, from which subjects would infer if the object delivered by Baxter required a high or low level of care. The goal of the delivery location test was to communicate the target delivery location of the object transported by Baxter to human subjects purely by eye-hand coordination.
For these experiments, 11 participants between the ages of 20 and 35 years old were recruited (9 male, 2 female). The protocol was approved by the Institutional Review Board of Florida Atlantic University in accordance with the Declaration of Helsinki. All subjects signed a written informed consent form.
The initial grasp site of the cup and four object delivery positions, or targets, were located on a horizontal board that sat elevated on a table. The four 90mm diameter delivery location targets were equidistantly spaced 21mm apart and labeled A, B, C, and D (Fig. 8(a)). To keep subjects from getting distracted by objects beyond the test bench, a curtain was erected behind Baxter. Subjects were also asked to wear noise cancelling earphones during testing to block out sounds that could distract them or influence their responses. Facing Baxter, subjects were asked to sit on a stool 185cm back and 35cm to the right of the midpoint between targets B and C as shown in Fig. 8(b).
Fig. 8.
(a) Baxter workspace setup for experiments with human subjects. The 3D printed cup was initially grasped in the location shown, and then transported to one of four delivery location targets labeled as A, B, C, D. The subjects attempted to ascertain the delivery location through the robotic eye-hand coordination algorithm. (b) Subjects were seated on the stool to observe the robotic system during these experiments.
1). Level of Care During Object Transportation Test
The care level test analyzed each subject’s preconceived notion of how a person’s eyes move when they handle objects requiring different levels of care. In every trial, Baxter moved one of two cups along the same parabolic path from the grasp site to the target location labeled “D.” One cup was empty and the other filled with elastic rubber, both designed in a way to make it impossible for the human subject to see which was which during testing. The PLA cups were 60mm in diameter and were 3D printed on an Ultimaker 3 (see Fig. 9).
Fig. 9.
(Left) Rear view of empty and full cups. (Right) The design of the cups prevented visually determining which cup was empty or full.
Depending on which cup Baxter was given to move, the eyes were commanded to move in either Mode I or Mode II described in Section III.D and shown in Fig. 10. The hypothesis was that the human subjects would tend to associate Mode I with the empty cup and Mode II with the full cup.
Fig. 10.
Robotic eye movement in relation to Baxter’s gripper position for Mode I (top) and Mode II (bottom) of the care level test.
At the beginning of the experiment, both cups were shown to the human subject, and he/she was asked to infer which mode of robotic eye-hand coordination was associated with each cup. This part of the experiment consisted of ten trials with each of the 11 subjects, arranged in a pseudo-random fashion but consistent among subjects, so that each subject saw five trials of each mode. The human subject was asked whether the cup was “empty” or “full” after each trial. This part of the experiment was conducted without training; therefore, the subjects had to make their decisions based on their own notion of how a person’s eyes move when handling an object with a high or low level of care.
The nonparametric Wilcoxon rank sum U-test was used to statistically analyze human perception of each of the care level test modes. The p-value indicates whether the human perceptions of Modes I and II are samples from continuous distributions with equal medians (the null hypothesis). If the p-value is greater than 0.05, there is not enough evidence to reject the null hypothesis, in which case the human perceptions of Modes I and II are the same or very similar. On the other hand, if the p-value is less than 0.05, then the human perceptions of Modes I and II are significantly different, indicating that the level of care in transporting an object could be conveyed by robotic eye-hand coordination.
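For illustration, this kind of analysis can be reproduced with a rank-sum test as sketched below. The response vectors are made-up placeholders (1 = judged “full”, 0 = judged “empty”), not the study’s data, and scipy is used here although the paper does not state which software performed the test.

```python
import numpy as np
from scipy.stats import ranksums

# Placeholder responses for one subject: 1 = cup judged "full", 0 = cup judged "empty".
mode_i_responses = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])
mode_ii_responses = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0])

stat, p_value = ranksums(mode_i_responses, mode_ii_responses)
print(p_value)  # p < 0.05 would indicate Modes I and II were perceived differently
```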
2). Inferring Object Delivery Location Test
For the second part of the experiment, the subjects were asked to ascertain the object delivery location based on the robotic eye-hand coordination algorithm. First, the human subjects were trained by observing eight object deliveries by Baxter to targets in this sequence: A, B, C, D, A, B, C, D. In all cases, Baxter first lifted the object from the grasp site to the elevated point in the middle of the target positions along a linear path (Fig. 11). During this upward lifting arm movement, the eyes temporally led the gripper location by 1.5 seconds with five saccades directed toward the delivery location target. The process took 15 seconds, after which the eyes would look forward for three seconds. The eyes would then look at the delivery location while Baxter delivered the object to the target. The process is shown in Fig. 11 for target D (see also supplemental video attachment).
Fig. 11.
(Left) Delivery target test trajectories. (Right) Robotic eye movement in relation to Baxter’s gripper position for delivery to target D in the delivery target test where eye saccades towards the delivery location foreshadowed the arm motion.
Subjects were told during training that in the actual testing they would have to guess the target position (A, B, C, or D) while the object was lifted toward the midway point centrally located above the table, before the descending arm motion toward the delivery location. Thus, arm motion alone could not convey the delivery location. The three seconds during which Baxter looked forward without moving after the ascending arm motion, but prior to the descending arm motion, acted as a buffer if the subject was delayed in answering. For testing, 20 trials were arranged to test the ability of the 11 human subjects to infer information from the robotic eye-hand coordination algorithm: Baxter delivered the cup to each of the four delivery locations five times in a pseudo-random fashion, and the sequence of delivery locations was consistent among all 11 subjects.
IV. Results
A. Robotic Eye Error Tracking Performance
Error analyses were performed for lateral, vertical, and overall performance in all seven tests. Lateral motor motion for the parabolic tracking test (test 2) is shown in Fig. 12(a). In this test, the eyes were instructed to look forward until the gripper position data began publishing, after which the eyes tracked the location of the robotic hand. A similar analysis is shown in Fig. 12(b) for the parabolic block mode test (test 3), which included saccades to the delivery location. As in Fig. 12(a), the labels indicate where the eyes were instructed to look throughout the task (forward, at the grasp site, at the target, or temporally offset ahead of the hand). The labels for gaze fixation sites were abbreviated as follows: “For” for forward, “Grasp” for grasp site, “T” for target, and “O” for a point temporally offset forward along the trajectory of the gripper.
Fig. 12.
(a) Lateral motor data from the parabolic tracking trajectory, (test 2, see also Table 1). (b) Lateral motor data from the parabolic block mode trajectory (test 3) showing corresponding eye and gripper positions. (c) Corresponding vertical motor data from test 3. (d) The average error and error below which 90% of the data fell for the y-direction and (e) z-direction all fell within reasonable levels.
When the eyes moved laterally, they sometimes lagged behind their instructed gaze fixation point; however, such errors never exceeded 4°, and the average lateral motor error was less than 2°. The vertical motor’s accuracy was even better, with only brief overshoots during large saccades, as shown in the vertical motor analysis of test 3 in Fig. 12(c).
In Cartesian space, the average tracking error was less than 50mm in all seven tests, and the error below which 90% of the data points fell was also acceptable in both the y and z-directions: Fig. 12(d), Fig. 12(e), respectively.
B. Human-Robot Experiments
1). Level of Care During Object Transportation Test
The results of the care level test are presented as a pie chart (Fig. 13). Most subjects associated Mode II with the empty cup and Mode I with the full cup. Specifically, 36% of responses associated Mode I with a full cup while 14% associated Mode I with an empty cup. Similarly, 33% associated Mode II with an empty cup while 17% associated Mode II with a full cup. While eight of the eleven subjects were consistent in their answers across all ten trials, three subjects occasionally vacillated between which mode was associated with the empty or full cup. This indecisiveness resulted in slightly different percentages being associated with Mode I: Full and Mode II: Empty as well as Mode I: Empty and Mode II: Full; however, it had minimal impact on the overall results.
Fig. 13.
Pie chart showing that human perception of whether a cup transported by Baxter was full or empty is more often correlated to a specific mode of robotic eye-hand coordination (see also Table 2).
The resulting p-values from the U-test on the human perception of robotic eye-hand coordination to convey the level of care while transporting an object show that human perceptions of Modes I and II for a full cup are significantly different, as are those for an empty cup (Table 2). In other words, the modes of robotic eye-hand coordination had a statistically significant impact on human perception of whether the cup was full or empty, even though the subjects could not see the contents of the cup. The p-values greater than 5% for Mode I: Full/Mode II: Empty and Mode I: Empty/Mode II: Full reflect the fact that most subjects consistently associated one mode with the full glass and the other mode with the empty glass.
Table 2.
P-values from nonparametric Wilcoxon rank sum U-test related to the human subjects’ perceptions of the level of care test during object transportation.
| | Mode I Full | Mode I Empty |
|---|---|---|
| Mode II Full | p = 0.0014 | p = 0.4577 |
| Mode II Empty | p = 0.5724 | p = 8.2e-04 |
2). Inferring Object Delivery Location Test
The overall accuracy of the human subjects in inferring the target delivery location of objects transported by Baxter was 91.4%, which is represented as a confusion matrix (Fig. 14). Combining data from all 11 subjects resulted in 220 trials, with each delivery location accounting for 55 trials, or one quarter, of the total. For target A, five trials were confused as target B, while for target B three trials were confused as target A and four trials were confused as target C. Target C was the most accurately identified of the four targets, with only two trials confused as target D. Target B was accurately identified 87.3% of the time, the lowest accuracy of the four targets. For target D, five trials were confused as target C. In every incorrect trial, the guessed target was directly adjacent to the actual delivery location, indicating that even the few wrong answers given by the human subjects were not spatially far from the correct location.
Fig. 14.
Confusion matrix showing results of delivery location test. Subjects were able to reliably infer the delivery location with a 91.4% level of accuracy overall.
V. Discussion
The robotic eye design and its capabilities have the potential to benefit industries and applications that employ human-robot interaction. Their accuracy in gaze and ability to convey information through eye-hand coordination would make them useful for teaching applications. For instance, given a series of objects of various shapes and colors, a child could be taught by a robot to pick up specific objects to learn object geometry or attributes [42].
In the entertainment industry, theme parks have been increasingly breaking the fourth wall between their animatronics and park goers, allowing people to become a part of the story instead of just watching scenes unfold. By including a camera within the eyes, animatronics could interact face-to-face with visitors, holding their gaze and pointing out areas of interest or foreshadowing upcoming events accurately through eye-hand coordination strategies.
Human subjects in this paper were reliably able to infer the target delivery location of the grasped object purely from the anthropomorphic eye-hand coordination algorithm with a 91.4% level of accuracy. However, the differences between algorithm Modes I and II were not so uniformly associated with the level of care while transporting an object (full cup or empty cup), even though the results were statistically significant (Table 2).
One factor affecting the variability in human perception of the two different modes is likely related to the differences in eye-hand coordination strategies observed among human subjects during studies involving different pointing [36] and object manipulation tasks [33]. Because human subjects employ different eye-hand coordination strategies themselves, it is likely that they would have different interpretations of motion control strategies in a robotic agent. A further study in which subject eye movement is recorded as they move objects with high and low levels of care would provide the direct comparison necessary to determine whether the variability in human eye-hand coordination is consistent with that of human perceptions of robotic eye-hand coordination during object transport. In some situations, it may be useful for the gaze of the robot to vary from person to person or from task to task [4], [43].
The utility of robotic eye-hand coordination in this paper could also be extended to assistive robots, whose gaze could help their users better understand what the robot will do next, resulting in greater trust in the robot [4]. For example, one publication found that the gaze of a humanoid robot significantly affected human trust in the robot and had a positive impact on the performance of difficult tasks [44]. Additionally, ‘favored’ people who were gazed at more by a robot docent in a museum-like setting were found to be more attentive relative to the ‘unfavored’ people who were not looked at as often by the robot [45]. Robotic gaze has also been used to convey whether something a character does in a story is right or wrong [2], to indirectly indicate where a person should grab an object in a handoff task [3], and to convince people that they are correct or incorrect in their answers [4]. The results of this paper extend the use of gaze alone as a useful tool to the eye-hand control strategy as a potentially persuasive action in the repertoire of robotic agents, particularly if it were integrated with the ability to infer meaning from human eye motions and hand gestures [46]. As the internet of robotic things becomes increasingly integrated within our society [47], the findings in this paper could have strong implications in a broad range of fields, ranging from robotic therapy assistants for children with autism spectrum disorder to educational and collaborative robots (for a review of potential application areas, see [48]).
VI. Conclusion
Anthropomorphic control of robotic eye-hand coordination has been explored in this paper using novel robotic eyes integrated with a Baxter robot. Both position control for tracking and temporal control for human-like eye movement were achieved during object transportation experiments.
When used in a social context, it was found that 11 human subjects were able to associate the modes of robotic eye-hand coordination with the level of care the robot used in transporting a cup that was either full or empty. On average, more subjects associated fewer saccades towards the delivery location with an increased level of care. The human subjects were also able to reliably infer the delivery location of transported objects by the human-like eye-hand coordination algorithm with a 91.4% level of accuracy.
Supplementary Material
VII. Acknowledgment
Research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R01EB025819. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This research was also supported by the National Institute on Aging under 3R01EB025819-04S1, National Science Foundation award #1317952, and Department of Energy contracts TOA#0000332969 and TOA#0000403076.
Footnotes
Compliance with Ethical Standards: The authors declare that they have no conflict of interest.
VIII. References
- [1].Geller T, “Overcoming the Uncanny Valley,” IEEE Comput. Graph. Appl, vol. 28, no. 4, pp. 11–17, 2008. [DOI] [PubMed] [Google Scholar]
- [2].Ham J, Cuijpers RH, and Cabibihan JJ, “Combining Robotic Persuasive Strategies: The Persuasive Power of a Storytelling Robot that Uses Gazing and Gestures,” Int. J. Soc. Robot, vol. 7, no. 4, pp. 479–487, 2015. [Google Scholar]
- [3].Moon A et al. , “Meet me where i’m gazing: How shared attention gaze affects human-robot handover timing,” Proc. 2014 ACM/IEEE Int. Conf. Human-robot Interact. - HRI ‘14, pp. 334–341, 2014. [Google Scholar]
- [4].Stanton CJ and Stevens CJ, “Don’t Stare at Me: The Impact of a Humanoid Robot’s Gaze upon Trust During a Cooperative Human–Robot Visual Task,” Int. J. Soc. Robot, vol. 9, no. 5, pp. 745–753, 2017. [Google Scholar]
- [5].Kidd CD and Breazeal C, “Effect of a robot on user perceptions,” 2004 IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IEEE Cat. No.04CH37566), vol. 4, no. January 2004, pp. 3559–3564, 2004. [Google Scholar]
- [6].Ramos L, Valencia S, Verma S, Zornoza K, Morris M, and Tosunoglu S, “Robotic Face to Simulate Humans Undergoing Eye Surgery,” 2017. [Google Scholar]
- [7].“Eye Mechanism | InMoov.” [Online]. Available: http://inmoov.fr/eye-mechanism/. [Accessed: 15-Apr-2018].
- [8].Gosselin CM and Hamel J-F, “The agile eye: a high-performance three-degree-of-freedom camera-orienting device,” Proc. 1994 IEEE Int. Conf. Robot. Autom, pp. 781–786, 1994. [Google Scholar]
- [9].Bang YB, Paik JK, Shin BH, and Lee C, “A three-degree-of-freedom anthropomorphic oculomotor simulator,” Int. J. Control Autom. Syst, vol. 4, no. 2, pp. 227–235, 2006. [Google Scholar]
- [10].Pateromichelakis N et al. , “Head-eyes system and gaze analysis of the humanoid robot Romeo,” IEEE Int. Conf. Intell. Robot. Syst, no. April 2015, pp. 1374–1379, 2014. [Google Scholar]
- [11].Bassett K, Hammond M, and Smoot L, “A fluid-suspension, electromagnetically driven eye with video capability for animatronic applications,” 9th IEEE-RAS Int. Conf. Humanoid Robot. HUMANOIDS09, pp. 40–46, 2009. [Google Scholar]
- [12].Irmler H et al. , “United States Patent: System and method for generating realistic eyes,” US 8,651,916 B2, 2014. [Google Scholar]
- [13].Brockmeyer E, Poupyrev I, and Hudson S, “Papillon: Designing Curved Display Surfaces With Printed Optics,” Proc. 26th Annu. ACM Symp. User interface Softw. Technol. - UIST ‘13, pp. 457–462, 2013. [Google Scholar]
- [14].Allen PK, Timcenko A, Yoshimi B, and Michelman P, “Automated Tracking and Grasping of a Moving Object with a Robotic Hand-Eye System,” 1993.
- [15].Hager GD, Chang W-C, and Morse AS, “Robot hand-eye coordination based on stereo vision,” IEEE Control Syst, vol. 15, no. 1, pp. 30–39, February. 1995. [Google Scholar]
- [16].Hong W and Slotine J-JE, “Experiments in hand-eye coordination using active vision,” in Experimental Robotics IV, London: Springer-Verlag, 1997, pp. 130–139. [Google Scholar]
- [17].Su J, Ma H, Qiu W, and Xi Y, “Task-Independent Robotic Uncalibrated Hand-Eye Coordination Based on the Extended State Observer,” vol. 34, no. 4, pp. 1917–1922, 2004. [DOI] [PubMed] [Google Scholar]
- [18].Levine S, Pastor P, Krizhevsky A, Ibarz J, and Quillen D, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” Int. J. Rob. Res, vol. 37, no. 5, pp. 421–436, 2018. [Google Scholar]
- [19].Lukic L, Santos-Victor J, and Billard A, “Learning robotic eye–arm–hand coordination from human demonstration: a coupled dynamical systems approach,” Biol. Cybern, vol. 108, no. 2, pp. 223–248, April. 2014. [DOI] [PubMed] [Google Scholar]
- [20].Razin Y and Feigh K, “Learning to Predict Intent from Gaze During Robotic Hand-Eye Coordination,” Proc. 31st Conf. Artif. Intell. (AAAI 2017), pp. 4596–4602, 2017. [Google Scholar]
- [21].Chao F et al. , “Enhanced Robotic Hand–Eye Coordination Inspired From Human-Like Behavioral Patterns,” IEEE Trans. Cogn. Dev. Syst, vol. 10, no. 2, pp. 384–396, June. 2018. [Google Scholar]
- [22].Olson ST, “Human-inspired robotic hand-eye coordination,” Florida Atlantic University, 2018. [Google Scholar]
- [23].Atchison DA, “Optics of the Human Eye,” Ref. Modul. Mater. Sci. Mater. Eng, pp. 1–19, 2017. [Google Scholar]
- [24].Guitton D and Volle M, “Gaze control in humans: eye-head coordination during orienting movements to targets within and beyond the oculomotor range.,” J. Neurophysiol, vol. 58, no. 3, pp. 427–459, 1987. [DOI] [PubMed] [Google Scholar]
- [25].Misslisch H, Tweed D, and Vilis T, “Neural constraints on eye motion in human eye-head saccades.,” J. Neurophysiol, vol. 79, no. 2, pp. 859–869, 1998. [DOI] [PubMed] [Google Scholar]
- [26].Freedman EG, “Coordination of the eyes and head during visual orienting,” Exp. Brain Res, vol. 190, no. 4, pp. 369–387, October. 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [27].Bahill AT, Clark MR, and Stark L, “The main sequence, a tool for studying human eye movements,” Math. Biosci, vol. 24, no. 3–4, pp. 191–204, 1975. [Google Scholar]
- [28].Johansson RS, Westling G, Bäckström A, and Flanagan R, “Eye – Hand Coordination in Object Manipulation,” J. Neurosci, vol. 21, no. 17, pp. 6917–6932, 2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Bronstein AM and Kennard C, “Predictive eye saccades are different from visually triggered saccades,” Vision Res, vol. 27, no. 4, pp. 517–520, 1987. [DOI] [PubMed] [Google Scholar]
- [30].Robinson DA, “The Mechanics of Human Saccadic Eye Movement,” J. Physiol, pp. 245–264, 1964. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [31].Land M, Mennie N, and Rusted J, “The roles of vision and eye movements in the control of activities of daily living,” Perception, vol. 28, no. 11, pp. 1311–1328, 1999. [DOI] [PubMed] [Google Scholar]
- [32].Hayhoe M, “Vision Using Routines: A Functional Account of Vision,” Vis. cogn, vol. 7, no. 1–3, pp. 43–64, 2000. [Google Scholar]
- [33].Land MF and Hayhoe MM, “In what ways do eyemovements contribute to everyday activities?,” Vision Res, vol. 41, no. 25–26, pp. 3559–3565, 2001. [DOI] [PubMed] [Google Scholar]
- [34].Iqbal ST and Bailey BP, “Using Eye Gaze Patterns to Identify User Tasks,” Grace Hopper Celebr. Women Comput, p. 6, 2004. [Google Scholar]
- [35].Victor TW, Harbluk JL, and Engström JA, “Sensitivity of eye-movement measures to in-vehicle task difficulty,” Transp. Res. Part F Traffic Psychol. Behav, vol. 8, no. 2 SPEC. ISS., pp. 167–190, 2005. [Google Scholar]
- [36].Smith BA, Ho J, Ark W, and Zhai S, “Hand eye coordination patterns in target selection,” Proc. Symp. Eye Track. Res. Appl. - ETRA ‘00, pp. 117–122, 2000. [Google Scholar]
- [37].“Baxter User Guide for Intera 3.0 Software,” 2014. [Online]. Available: http://mfg.rethinkrobotics.com/mfg-mediawiki-1.22.2/images/1/12/Baxter_User_Guide_for_Intera_3.0.0.pdf.
- [38].“Baxter Collaborative Robot Tech Specs | Rethink Robotics.” [Online]. Available: https://www.rethinkrobotics.com/baxter/tech-specs/. [Accessed: 24-Aug-2018].
- [39].“Workstation Setup - sdk-wiki.” [Online]. Available: http://sdk.rethinkrobotics.com/wiki/Workstation_Setup. [Accessed: 15-Apr-2018].
- [40].“Baxter PyKDL - sdk-wiki.” [Online]. Available: http://sdk.rethinkrobotics.com/wiki/Baxter_PyKDL. [Accessed: 15-Apr-2018].
- [41].“Joint Trajectory Playback Example - sdk-wiki.” [Online]. Available: http://sdk.rethinkrobotics.com/wiki/Joint_Trajectory_Playback_Example. [Accessed: 22-Jun-2018].
- [42].Biondi M, Boas DA, and Wilcox T, “On the other hand: Increased cortical activation to human versus mechanical hands in infants.,” Neuroimage, vol. 141, pp. 143–153, November. 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [43].Haji Fathaliyan A, Wang X, and Santos VJ, “Exploiting Three-Dimensional Gaze Tracking for Action Recognition During Bimanual Manipulation to Enhance Human–Robot Collaboration,” Front. Robot. AI, vol. 5, p. 25, April. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Stanton C, Stevens C, “Robot Pressure: The impact of robot eye gaze and lifelike bodily movements upon decision-making and trust,” International Conference on Social Robotics, p. 330–339, 2014 [Google Scholar]
- [45].Karreman D, Bradford G, Dijk E, Lohse M, and Evers V, “Picking favorites: the influence of robot eye-gaze on interactions with multiple users,” 2013. IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan [Google Scholar]
- [46].Miyauchi D, Nakamura A, Kuno Y, “Bidirectional eye contact for human-robot communication,” IEICE Trans. Inf. & Syst, vol. E88-D, p. 2509–2516, 2005 [Google Scholar]
- [47].Simoens P, Dragone M and Saffiotti A. “The Internet of Robotic Things: A review of concept, added value and applications,” International Journal of Advanced Robotic Systems 2018; 15: 1729881418759424. [Google Scholar]
- [48].Admoni H and Scassellati B, “Social eye gaze in human-robot interaction: a review,” Journal of Human-Robot Interaction, vol. 6, p. 25–63, 2017 [Google Scholar]