Author manuscript; available in PMC: 2017 Jun 1.
Published in final edited form as: IEEE Trans Neural Syst Rehabil Eng. 2015 Sep 3;24(6):682–691. doi: 10.1109/TNSRE.2015.2475724

Robot-mediated Imitation Skill Training for Children with Autism

Zhi Zheng 1, Eric M Young 2, Amy R Swanson 3, Amy S Weitlauf 4, Zachary E Warren 5, Nilanjan Sarkar 6
PMCID: PMC4965236  NIHMSID: NIHMS797216  PMID: 26353376

Abstract

Autism spectrum disorder (ASD) impacts 1 in 68 children in the US, with tremendous individual and societal costs. Technology-aided intervention, and more specifically robotic intervention, has gained momentum in recent years due to the inherent affinity of many children with ASD towards technology. In this paper, we present a novel robot-mediated intervention system for imitation skill learning, which is considered a core deficit area for children with ASD. The Robot-mediated Imitation Skill Training Architecture (RISTA) is designed so that it can operate either completely autonomously or in coordination with a human therapist, depending on the intervention need. Experimental results are presented from small user studies validating system functionality, assessing user tolerance, and documenting subject performance. Preliminary results show that this novel robotic system draws more attention from children with ASD and teaches gestures more effectively than a human therapist. While no broad generalized conclusions can be made about the effectiveness of RISTA based on our small user studies, initial results are encouraging and justify further exploration in the future.

Index Terms: Autism Spectrum Disorder, Imitation Skill Training, Robot and Autism, Robot-mediated Intervention

I. Introduction

Autism spectrum disorder (ASD) is a common disorder associated with enormous financial and human costs [1]. The cumulative ASD literature suggests that early intensive behavioral interventions are effective for many children [2]. However, an unfortunate reality remains that many families and service systems struggle to access appropriate services due to resource limitations [3]. There is an urgent need for the development and application of novel treatment paradigms capable of earlier, more effective impact on the core vulnerabilities and pivotal skills associated with ASD across resource-strained environments. Increasingly, researchers have proposed advanced technologies, including robotic systems, as potential mechanisms for addressing these limits of ASD intervention [4]. Despite the hypothesized theoretical benefits of such applications, major challenges remain in realizing robotic systems that are (1) capable of robust autonomous functioning, (2) relevant and important to the core features of ASD at appropriate points in development, and (3) potentially realistic as cost-effective intervention systems outside of highly specialized research environments.

The emergent robotic and technological literature has demonstrated that many individuals with ASD show a preference for robot-like characteristics over non-robotic toys [5] and in some circumstances even respond faster when cued by robotic movement rather than human movement [6]. Although this research has focused on school-aged children and adults, the downward extension of this preference for robotic and technological stimuli is promising, as many very young children with, or at risk for, ASD preferentially orient to nonsocial contingencies rather than biological motion [7]. Several recent approaches have highlighted the potential of robotic tools for enhancing social interactions. Scassellati and his colleagues [4] demonstrated that preschool and school-aged children with ASD spoke more to an adult confederate when asked to do so by a robot than when asked by another adult or by a computer. Similarly, Goodrich et al. [8] reported that a low-dose exposure to a humanoid robot yielded enhanced positive interactions for children with ASD. Feil-Seifer and Mataric [9] found that contingent activation of a robot during interactions yielded immediate short-term improvement in social interactions with adults.

A major limitation of, and challenge to, the majority of technological intervention systems studied to date has been their reliance on static performance-based protocols. In this regard, systems have often been programmed to operate within a narrow set of responses, or have been remotely operated and unable to realize dynamic autonomous interaction. Recent work has suggested that robotic intervention systems designed to operate in “closed-loop” form [10] and targeted to early pivotal skills for children with ASD [11] may represent more promising dynamic interaction. The development of adaptive interaction, realizing within-system changes in response to measured behaviors, is important for the individualization of meaningful technological interventions, as needed by young children with ASD [12]. Only a few studies of adaptive technological interaction for children with ASD have appeared in the literature: proximity-based closed-loop robotic interaction [13, 14], haptic interaction [15], adaptive robot-assisted play [16], and video-game responses to physiological signals [17]. However, the paradigms explored had limited direct relevance to the core deficits of ASD at young ages and instead focused on proof-of-concept tasks and games or on school-aged children.

In the current work, we describe the development and initial application of a non-invasive intelligent robotic intervention system capable of dynamic and individualized interaction, with potential relevance to improving imitation skills for young children with ASD. Imitation involves translating from the perspective of another individual to oneself, and creating a representation of that individual's primary representation of the world [18]. Although the exact reasons for the imitation impairment associated with ASD are still unclear, evidence suggests that this impairment may be related to the basic ability to map the actions of others onto an imitative match by oneself [19]. Imitation is a critically important social communication skill that emerges early in life, and it is theorized to play an important role in the development of cognitive, language, and social skills [20]. Children with ASD show marked impairments in imitation, and such deficits have been tied to a host of associated neurodevelopmental and learning challenges over time [21].

Only a few preliminary robotic studies on imitation learning for children with ASD have been reported. Duquette et al. [22] compared the impact of the mobile robot Tito with that of a human therapist on the imitation behavior of children with ASD: the participants paired with the robot demonstrated more shared attention and imitated more facial expressions, while the participants paired with the human imitated more body movements. Bugnariu et al. [23] developed a method to quantify imitation using a robot, kinematic data, and a Dynamic Time Warping algorithm. Cabibihan et al. [24] claimed that imitation skills taught with the aid of humanoid robots had the potential to generalize to interactions with people. Robins et al. [25] conducted a longitudinal study in which children freely interacted with the humanoid robot Robota; the participants exhibited diverse imitation behaviors after repeated exposure. Srinivasan et al. [26] conducted an interaction study with a small humanoid robot, Isobot, and found that task-specific imitation and generalized praxis performance improved for a group of typically developing children and one child with ASD.

The imitation intervention literature suggests that intervention approaches are most effective when children show sustained engagement with a variety of objects, when the approaches can be delivered within intrinsically motivating settings, and when careful adaptation to small gains and shifts can be incorporated over longer intervals of time [21].

The contribution of this work is two-fold. First, we present a novel autonomous Robot-mediated Imitation Skill Training Architecture, or RISTA, specifically designed for children with ASD, including a new gesture recognition algorithm that can assess imitated gestures in real time and provide dynamic feedback. Second, we present a preliminary user study to demonstrate the tolerability and usefulness of robotic interaction using RISTA with a group of both typically developing children and children with ASD. Our initial concepts and results were presented in [27].

In what follows, we first introduce the robot-mediated imitation skill training system architecture in Section II. Section III describes the gesture representation method used in this study. Section IV discusses the gesture recognition algorithm embedded in the system. The experiments and results are described in Section V. Finally, Section VI discusses the potential and limitations of the current study.

II. Robot-Mediated Imitation Skill Training Architecture (RISTA)

The Robot-mediated Imitation Skill Training Architecture (RISTA) for children with ASD that we present in this paper has a humanoid robot as a task administrator, a camera for gesture recognition, a gesture recognition algorithm to assess the imitated gesture, and a feedback mechanism to encourage interaction (Fig. 1). Existing literature [28, 29] suggests, at least in preliminary form, the ability of humanoid robotic interaction systems to capture the interest of some children with ASD in a manner that could potentially be leveraged into meaningful intervention approaches. RISTA teaches imitation skills by first having the robot demonstrate a target gesture and ask the child to imitate it, then assessing the imitated gesture, and finally providing relevant feedback – all autonomously and in a closed-loop manner. An interesting feature of this system is that the robot can be replaced by a human therapist within the architecture when needed, without altering the rest of the system components such as the gesture recognition and feedback modules, allowing the system to be used for co-robotic intervention.

Fig. 1. RISTA system architecture.

The RISTA architecture is illustrated in Fig. 1 and comprises four important modules. The robotic gesture demonstration (RGD) module demonstrates a gesture via the robot and is implemented on the humanoid robot NAO [30]. NAO has 25 degrees of freedom, 2 flashing LED “eyes”, 2 speakers, and a synthetic childlike voice. The imitated gesture sensing (IGS) module senses the gesture imitated by the child and is implemented using the Microsoft Kinect [31], which can track a person's skeleton with an average accuracy of 5.6 mm in 3D space [32, 33]. The supervisory controller (SC) is the primary control mechanism for RISTA and is designed based on a timed automata model [34]. We use a timed automata model for the SC because it fits well with the Finite State Machine (FSM) based method that we use for gesture recognition and with the state-based predesigned behavior libraries that we utilize for robot gestures. The SC manages the component communications, handles the experimental logic, and embeds a gesture recognition algorithm that can recognize a partial or completed gesture. It instructs the robot to show a target gesture to the child; once the target gesture is completed, the robot asks the child to imitate it. The SC continuously monitors the IGS and evaluates the child's imitative performance for feedback. Based on this performance, the SC may instruct the robot either to give rewards or to aid the child with reinforcement components and approximations of the gestures within their motor movements. The feedback provided by the robot was predefined by autism clinicians for every state in the FSMs and stored in the system software library. The SC continues this procedure in a closed-loop manner for a specified duration and collects data to evaluate the efficacy of the trials.
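To make this closed-loop flow concrete, the sketch below outlines one plausible shape of such a supervisory loop in Python. It is a minimal illustration under our own assumptions, not the actual RISTA implementation; the class and method names (Robot, SkeletonSensor, GestureRecognizer, demonstrate, update) are hypothetical stand-ins for the RGD, IGS, and recognition components.

```python
import time

# Hypothetical stand-ins for the RGD (robot), IGS (Kinect), and recognition components.
class Robot:
    def demonstrate(self, gesture): print(f"Robot demonstrates '{gesture}'")
    def say(self, text): print(f"Robot says: {text}")
    def show_partial(self, state): print(f"Robot re-enacts partial state {state} ('you were here...')")

class SkeletonSensor:
    def read_frame(self):
        return {}  # a real system would return smoothed joint positions here

class GestureRecognizer:
    def __init__(self): self.best_state = 0
    def update(self, frame): return False  # True once the full gesture is recognized

def supervisory_loop(robot, sensor, gestures, time_limit=5.0):
    """Closed-loop cycle: demonstrate -> monitor imitation -> reward or corrective feedback."""
    for gesture in gestures:
        robot.demonstrate(gesture)
        robot.say("You do it!")
        recognizer, deadline, completed = GestureRecognizer(), time.time() + time_limit, False
        while time.time() < deadline and not completed:
            completed = recognizer.update(sensor.read_frame())
            time.sleep(1 / 30)  # Kinect frame rate
        if completed:
            robot.say("Good job!")                      # verbal reward
        else:
            robot.show_partial(recognizer.best_state)   # targeted feedback on the partial gesture

supervisory_loop(Robot(), SkeletonSensor(), ["raise one hand", "wave"], time_limit=0.2)
```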

The Graphical User Interface (GUI) is designed so that it can be easily operated by an experimenter, e.g., a therapist, who may not be technologically savvy. The head pose estimation, the skeleton tracking, and the participant's real-time video are displayed for observation.

During a trial, NAO provides imitation prompts in the form of recorded verbal scripts, mirroring movements, and gestural movements for imitation. Given the gesture prompts, the participant's response is sensed by a Microsoft Kinect at 30 frames per second (fps). Skeleton data from the Kinect are processed with a Holt double exponential smoothing filter to reduce glitches and jitter. The Kinect SDK face tracking functions fit a 3D convex mesh to the participant's head and provide the 3D position and orientation of the head within the Kinect frame. The head pose is then used to estimate whether the participant's attention is on the robot or on the human therapist. If the participant moved out of the Kinect tracking zone during interaction, a signal was triggered to suspend the robotic action, and RISTA did not proceed with the interaction until the child returned to the appropriate region and the Kinect resumed its tracking.
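For illustration, the following is a minimal sketch of the core Holt double exponential smoothing recursion applied to a single joint coordinate. The Kinect SDK's filter exposes additional jitter and prediction parameters; the smoothing constants and sample trace below are assumptions for the example, not values from the study.

```python
def holt_smooth(samples, alpha=0.5, beta=0.5):
    """Core Holt double exponential smoothing of a 1-D signal (e.g., one joint coordinate).

    alpha weights the new observation (smaller = smoother, more lag);
    beta weights the trend (velocity) update.
    """
    level, trend = samples[0], 0.0
    smoothed = [level]
    for x in samples[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)    # smoothed position
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smoothed velocity
        smoothed.append(level)
    return smoothed

# A noisy wrist y-coordinate trace sampled at 30 fps (illustrative values).
print(holt_smooth([0.90, 0.95, 0.91, 1.02, 0.98, 1.10, 1.05, 1.15]))
```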

In this work we chose a set of arm gestural movements for imitation skill learning, which were: 1) raising one hand, 2) waving, 3) raising two hands, and 4) reaching arms out to the side. These four gestures were intentionally selected due to the low motor skill requirements they presented to participating children. These gestures were also selected to avoid motor limitations of the humanoid robot (e.g., challenges crossing midline and adequately positioning fingers/digits). However, the capabilities of RISTA are not limited to these gestures alone. Rather, they represent the kinds of gestures that a therapist might use for imitation skill learning intervention.

III. Gesture Representation

A. Gesture representation for the human

We first defined a set of variables that mathematically capture each of the four chosen gestures described above. The gesture variables are shown in Table I, and Fig. 2 shows some of these variables with respect to the Kinect frame. Each gesture was broken down into several salient parts, where each part was represented as a state (Fig. 3). Note that this decomposition is not unique; it was designed based on common sense and with input from several autism clinicians.

Table I. Gesture Variables of Gesture Representation and Recognition.

Symbol Definition
sw Vector pointing from shoulder to wrist
ew Vector pointing from elbow to wrist
wy y coordinate of the wrist joint
ey y coordinate of the elbow joint
sy y coordinate of the shoulder joint
a1 Angle between sw and negative y axis
a2 Angle between sw and yz plane, when sw in negative z direction (arm pointing forward)
a3 Angle between ew and xy plane
a4 Angle between sw and positive x axis for right arm, angle between sw and negative x axis for left arm.
a5 Angle between sw and xy plane, sw with positive x direction for right arm, and with negative x direction for left arm.
wes Angle between the upper arm and the forearm
D x direction movement
H y direction raised height
Titem Threshold for distances or angles

Fig. 2. Gesture variables demonstration in Kinect frame.

Fig. 3. Gesture states of the four gestures in this study.
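To illustrate how variables such as a1, a3, and wes in Table I can be computed from 3D joint positions, here is a minimal NumPy sketch. The joint coordinates and helper names are hypothetical; this is an illustration of the geometry, not the RISTA code.

```python
import numpy as np

def angle_between(u, v):
    """Angle (radians) between two 3-D vectors."""
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def gesture_variables(shoulder, elbow, wrist):
    """Compute a subset of the Table I variables from 3-D joint positions in the Kinect frame."""
    shoulder, elbow, wrist = map(np.asarray, (shoulder, elbow, wrist))
    sw = wrist - shoulder                                 # shoulder-to-wrist vector
    ew = wrist - elbow                                    # elbow-to-wrist vector
    a1 = angle_between(sw, np.array([0.0, -1.0, 0.0]))    # angle between sw and the negative y axis
    a3 = np.arcsin(abs(ew[2]) / np.linalg.norm(ew))       # angle between ew and the xy plane
    wes = angle_between(shoulder - elbow, wrist - elbow)  # angle between upper arm and forearm
    return {"a1": a1, "a3": a3, "wes": wes, "wy": wrist[1], "ey": elbow[1], "sy": shoulder[1]}

# Hypothetical joint positions (meters) for a raised right arm.
print(gesture_variables(shoulder=[0.20, 0.50, 2.0], elbow=[0.25, 0.75, 2.0], wrist=[0.28, 1.00, 2.0]))
```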

B. Gesture representation for the robot

In order to implement the same gestures on the robot so that it could demonstrate them, each gesture was carefully designed by specifying its joint angle trajectories, and each gesture was stored in a library from which the supervisory controller could select and play. Part of our imitation skill training paradigm included gesture mirroring by the robot; in other words, the robot was sometimes required to copy a participant's gesture. The skeleton tracking module of the Kinect was used to acquire a participant's arm joint angles, which were then mapped to the corresponding joint angles of the robot. If a participant's joint angles fell outside the robot's workspace, the robot's angles were set to their maximum attainable values.
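A minimal sketch of this mapping step, assuming hypothetical joint names and limits (the values shown are placeholders, not NAO's actual specifications): each tracked human joint angle is saturated to the robot's reachable range before being commanded.

```python
# Hypothetical joint limits (radians); placeholders, not NAO's actual specifications.
ROBOT_JOINT_LIMITS = {
    "RShoulderPitch": (-2.0, 2.0),
    "RElbowRoll": (0.0, 1.5),
}

def map_to_robot(human_angles):
    """Clamp each tracked human joint angle into the robot's reachable range."""
    robot_angles = {}
    for joint, angle in human_angles.items():
        lo, hi = ROBOT_JOINT_LIMITS[joint]
        robot_angles[joint] = min(max(angle, lo), hi)  # saturate at the nearest attainable value
    return robot_angles

# An out-of-range human pose is saturated to the robot's limit before being commanded.
print(map_to_robot({"RShoulderPitch": -2.4, "RElbowRoll": 0.8}))
```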

IV. Gesture Recognition by the Robot

An interactive robot-mediated imitation skill training system must be able to dynamically provide feedback to the participants, similar to the way a therapist does during intervention. The robot's feedback depends on the accuracy and speed of the gesture recognition algorithm. In addition, given the target population, it was quite likely that the participants would not be able to completely imitate all the gestures. In order to scaffold participant skills, the robot needed to recognize partially completed gestures, detect which components of a gesture required attention, and provide specific feedback to improve the detected deficiency.

In order to achieve these goals, we designed a rule-based finite state machine (FSM) method to recognize gestures. While there are several powerful probabilistic methods for gesture recognition, such as Hidden Markov Models [35] and particle filtering [36], we chose a rule-based method to avoid computational complexity and the difficulty of generating a training data set given the young age of our participants: it is difficult for a young child to repeat a standard gesture accurately multiple times to create training data. Recognition accuracy was of utmost importance in this task, since the robot should not provide erroneous feedback to the children, which might confuse or frustrate them.

FSMs have been widely used to model and recognize gestures [37]. We chose an FSM method because each gesture can be broken down into a number of intermediate states, so that the recognition algorithm can precisely detect a partial gesture and thus allow the robot to provide more targeted feedback. We designed an FSM representation for each gesture and defined a region of interest (ROI) in which each FSM would be activated. These ROIs are defined in Eqs. (1)-(3). For example, the wave-gesture FSM was only activated when the participant's arm was raised in front of the torso, the wrist was higher than the shoulder, and the forearm was pointing upwards. The input variables to the gesture recognition FSMs are computed from skeleton coordinates. Five sliding windows (1-5 seconds) were used to segment the FSM input data, and these windows were updated every frame. In this way, the maximum completion time for a gesture was set to 5 seconds; although a gesture usually lasts 2-3 seconds, we allowed additional time for flexibility.

$ROI_{Wave} = \{a_1 < T_{ang1},\ a_2 < T_{ang2},\ w_y > s_y,\ a_3 < T_{ang3}\}$. (1)
$ROI_{RaiseHand(s)} = \{a_1 < T_{ang3},\ a_2 < T_{ang2}\}$. (2)
$ROI_{ReachArmsOut} = \{a_1 < T_{ang3},\ a_4 < T_{ang3},\ a_5 < T_{ang2},\ sw_y < 0,\ (sw_x > 0 \text{ (right arm) or } sw_x < 0 \text{ (left arm)})\}$. (3)
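As a concrete illustration of how an ROI gate might be evaluated per frame, here is a minimal sketch for Eq. (1) using the threshold values listed in Section IV; the function name and the sample inputs are hypothetical.

```python
import math

# Threshold values listed in Section IV (radians).
T_ANG1, T_ANG2, T_ANG3 = math.pi / 2, 3 * math.pi / 4, math.pi / 6

def in_wave_roi(a1, a2, a3, wy, sy):
    """Eq. (1): the wave FSM is activated only while all four conditions hold."""
    return a1 < T_ANG1 and a2 < T_ANG2 and wy > sy and a3 < T_ANG3

print(in_wave_roi(a1=0.4, a2=0.3, a3=0.2, wy=1.10, sy=0.90))  # all conditions hold -> FSM active
print(in_wave_roi(a1=0.4, a2=0.3, a3=0.2, wy=0.70, sy=0.90))  # wrist below shoulder -> FSM idle
```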

We briefly discuss how the FSM works for each gesture.

A. Raising one hand

This gesture includes: a) raising a hand until the wrist is higher than the elbow; b) continuing to raise the hand and stretching the arm until the elbow is higher than the shoulder; and c) stretching the arm slightly further until it becomes straight (Fig. 3a). Fig. 4a shows its FSM representation.

Fig. 4. FSM model for gesture recognition.

L1 to L3 in Fig. 4a are the states describing the three stages for the left arm, and R1 to R3 are those for the right arm. C1L to C4L are the guard conditions for the left arm, and C1R to C4R are those for the right arm. The guard conditions at the same state level take the same form for both arms, since a participant can perform this gesture with either arm, so the definitions below are not repeated for the two arms separately. Using the variables in Table I, C1 to C3 are defined as:

$C_1 = \{w_y > e_y \text{ for } n \text{ continuous frames}\} \wedge \{H > T_{height1}\}$, (4)
$C_2 = \{e_y > s_y \wedge wes > T_{ang1} \text{ for } n \text{ continuous frames}\}$, (5)
$C_3 = \{wes > T_{ang2} \wedge |\pi - a_1| < T_{ang3} \text{ for } n \text{ continuous frames}\}$. (6)

A gesture is considered successful if it is done within an appropriate time period. At the end of the time window, the guard condition “time out” (TO) is provided to terminate the recognition process if the arm has not reached the next state.

This gesture is graded on a 5-point scale according to the following rule. Raising one hand should be done with only one arm, so if only one arm is raised to state 3 while the other arm stays below state 2, the score is 4; if the raised hand is additionally held still in state 3 for a certain amount of time, the score is 5. There are also scores for partial completion: if one or both arms are raised to state 1 but no further, the score is 1; if one or both arms are raised to state 2, the score is 2; and if one arm is in state 3 while the other arm is in state 2 or 3, the score is 3.
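A minimal sketch of how this per-arm FSM and scoring rule could be coded, assuming the guard conditions C1-C3 of Eqs. (4)-(6) have already been evaluated as booleans for each frame; the class and function names are hypothetical, not the RISTA implementation.

```python
class ArmFSM:
    """Per-arm FSM for 'raising one hand': a state advances when its guard condition
    (C1, C2, or C3 of Eqs. (4)-(6)) holds for n consecutive frames."""
    def __init__(self, n=10):
        self.state, self.n, self.streak = 0, n, 0

    def update(self, guards):
        """guards: (C1, C2, C3) booleans evaluated for the current frame."""
        if self.state < 3 and guards[self.state]:
            self.streak += 1
            if self.streak >= self.n:
                self.state, self.streak = self.state + 1, 0
        else:
            self.streak = 0
        return self.state

def score_raise_one_hand(left_state, right_state, held_still=False):
    """5-point rule from the text: one arm in state 3 with the other below state 2 scores 4
    (5 if held still); one arm in state 3 with the other in state 2 or 3 scores 3;
    otherwise the score equals the highest state reached (1 or 2, 0 if nothing)."""
    hi, lo = max(left_state, right_state), min(left_state, right_state)
    if hi == 3 and lo < 2:
        return 5 if held_still else 4
    if hi == 3:
        return 3
    return hi

# Example: the right arm satisfies C1, then C2, then C3; the left arm stays down.
left, right = ArmFSM(n=3), ArmFSM(n=3)
for guards in [(True, False, False)] * 3 + [(False, True, False)] * 3 + [(False, False, True)] * 3:
    right.update(guards)
print(score_raise_one_hand(left.state, right.state, held_still=True))  # -> 5
```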

B. Waving one hand

This gesture includes (Fig. 3b): a) raising a hand; b) moving the hand higher than the shoulder; c) waving the raised hand to one side; and d) waving the same hand to the other side. Fig. 4b shows its FSM representation.

C1 to C4 are defined as follows,

$C_1 = \{w_y > e_y \text{ for } n \text{ continuous frames}\} \wedge \{H > T_{height2}\}$, (7)
$C_2 = \{e_y > s_y \text{ for } n \text{ continuous frames}\}$, (8)
$C_3 = \{D > T_{dis} \text{ in one direction}\}$, (9)
$C_4 = \{D > T_{dis} \text{ in the other direction}\}$. (10)

In this case, if one or both arms are raised to state 1, the gesture receives a score of 1, and both arms reaching state 2 leads to a score of 2. Waving should also be done with only one arm, so if one arm reaches state 2, 3, or 4 while the other stays below state 2, the performance receives a score between 3 and 5.

C. Raising two hands

This gesture is similar to raising one hand (Fig. 3c), so its FSM graph is the same as Fig. 4a; however, it requires raising both hands. If an arm reaches state 1, 2, or 3, it receives a score of 1, 2, or 3, respectively, and if the hand is held still in state 3, it receives a score of 4. Since this gesture should be done with both arms, the final score is the average of the left- and right-arm scores. For example, if only one arm is fully raised and the other is not raised at all, the final score is 2.

D. Reaching arms out

Reaching arms out follows these steps (Fig. 3d): a) raise the arms up to shoulder level; b) move the raised arms sideways; and c) stretch them out to the sides.

Its FSM graph is the same as that in Fig. 4a, except for the guard conditions, which are:

$C_1 = \{|\pi/2 - a_1| < T_{ang4} \text{ for } n \text{ continuous frames}\} \wedge \{H > T_{height3}\}$, (11)
$C_2 = \{|a_4| < T_{ang5} \text{ for } n \text{ continuous frames}\}$, (12)
$C_3 = \{wes > T_{ang6} \text{ for } n \text{ continuous frames}\}$. (13)

The score for either arm equals the state it reaches. As with raising two hands, the final score is the average of the left- and right-arm scores.

All the gestures use a similar FSM structure but with different guard conditions. All the parameters are adjustable for different application environments and user groups. For the user study we used the following values: n = 10; Theight1 = 20 cm; Theight2 = 20 cm; Theight3 = 10 cm; Tdis = 20 cm; Tang1 = π/2; Tang2 = 3π/4; Tang3 = π/6; Tang4 = π/6; Tang5 = π/6; Tang6 = π/4. These values were chosen by the clinicians and engineers involved in this project based on the abilities of the participant group. It is important that the gesture recognition algorithm runs in parallel with the robot's gesture demonstration task: this way, even if a participant finishes a gesture before the robot finishes its own gesture prompt, a reward is given and the robot stops prompting. If the robot were to continue prompting the child (even though the child had already completed the required gesture) and give rewards only after the prompting was over, the child might become frustrated.
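The sketch below is a toy illustration of this parallel structure, with recognition running in a background thread and the demonstration loop stopping early once the gesture is flagged as complete; the timings and function names are hypothetical.

```python
import threading, time

gesture_done = threading.Event()

def recognize():
    """Stand-in recognizer thread: pretend the child completes the gesture after 1 s."""
    time.sleep(1.0)
    gesture_done.set()

def demonstrate(duration=3.0):
    """Robot prompt loop that stops early if the gesture has already been recognized."""
    start = time.time()
    while time.time() - start < duration:
        if gesture_done.is_set():
            print("Gesture recognized early: stop prompting and give the reward.")
            return
        time.sleep(0.1)  # keep showing the prompt
    print("Prompt finished without early completion.")

threading.Thread(target=recognize, daemon=True).start()
demonstrate()
```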

V. Materials & Methods

We conducted a user study to assess user acceptance and performance of RISTA. The study was approved by the Vanderbilt Institutional Review Board (IRB). The experiment room is shown in Fig. 5. The participant was seated about 120 cm from the Kinect and 150 cm from the administrator.

Fig. 5. Schematic of the Experiment Room.

A. Participants for the user study

Twelve children with ASD and 10 typically developing (TD) children were originally recruited to participate in this experiment. However, 4 children with ASD and 2 TD children did not complete the study: two children with ASD refused to sit in the experiment chair and thus did not start the experiment, and two other children with ASD and the two TD children exhibited mild distress during the protocol and were withdrawn from the study. Group characteristics of the participants who completed the study are shown in Table II. The participants in the ASD group had received a clinical diagnosis of ASD based on DSM-IV-TR [38] criteria from a licensed psychologist, met the spectrum cut-off of the Autism Diagnostic Observation Schedule Comparison Score (ADOS CS) [39], and had existing registry data regarding cognitive abilities from the Mullen Scales of Early Learning, Early Learning Composite (MSEL) [40]. Parents of participants in both groups completed the Social Responsiveness Scale–Second Edition (SRS-2) [41] and the Social Communication Questionnaire Lifetime Total Score (SCQ) [42] to index current ASD symptoms.

Table II. Characteristics of Participants.

Mean (SD) ADOS CS MSEL SRS-2 SCQ Age (Year)
ASD 7.63 (1.69) 64.75 (22.11) 75.29 (12.62) 17.88 (6.58) 3.83 (0.54)
TD NA NA 42.75 (10.08) 3.88 (2.95) 3.61 (0.64)

B. Task and protocol

We wanted to assess how a RISTA-based robotic system compares with human therapist-based imitation training. We hypothesized that the robotic system would elicit imitation performance and garner interest from children with ASD at least as well as a human therapist. To test this hypothesis, we conducted 2 human-administered sessions and 2 robot-administered sessions for each participant. In each group, half of the participants followed the order robot session 1, human session 1, robot session 2, human session 2, while the other half followed the order human session 1, robot session 1, human session 2, robot session 2. The human administrator was not present in the robot sessions, and the robot was not present in the human-administered sessions. Each session tested 2 gestures, and each gesture was tested in 2 trials; all 4 gestures were exhaustively tested in a randomized order. We compared participants' performance between the robot-administered and human-administered sessions in terms of 1) gesture imitation performance and 2) attention towards the administrator (robot or human).

In the robot sessions, as shown in Fig. 6, prior to practicing each gesture, the robot initiated a 15-second mirroring interaction segment with the verbal prompt, “Let's play! I will copy you!”. In this segment, the robot copied the participant's arm gesture to the best of its motor capability. This was designed to maintain the children's interest in the robot and provided a break between the imitation training of different gestures. Following this segment, the child was asked to imitate the gestures of the robot in two trials.

Fig. 6. Flow of the Imitation Training Procedure for Each Gesture.

In Trial 1, the robot said, “Okay! Now you copy me. Look at what I am doing!”, demonstrated the gesture twice, and then prompted, “You do it!”. The proposed gesture recognition was initiated immediately upon the first demonstration and ended 5 seconds after the second demonstration. As soon as the participant imitated the gesture correctly, the trial was terminated with verbal praise, “Good job!”, and the system recorded the performance score. Otherwise, the system provided feedback on the participant's approximation, if applicable, and recorded the best score the participant achieved. Consider the gesture “raising one hand” as an example: if the participant did not raise his or her arm high enough within the given time limit, the robot would move its own arm to the participant's best raised position and say, “You were here”, and then raise its arm further to the desired height while saying, “Higher!”.

Trial 2 included Stage A and Stage B (Fig. 6). If the participant succeeded in Trial 1, Trial 2 executed Stage A in Normal mode, which repeated the procedure of Trial 1. Otherwise, Trial 2 executed Stage A in Mirroring mode, in which the robot mirrored the participant's motion after the gesture demonstration; for instance, if the participant was waving, the robot would wave its arm to follow the child. This mirroring helped the children check their own performance. If the child imitated the gesture successfully in Stage A, a verbal reward was given and Stage B was omitted. Otherwise, after the robot told the child where he or she went wrong, Stage B was presented: it provided the final two gesture demonstrations and another 2 seconds after the demonstrations as the final response time. Without Stage B, the child might become frustrated, since no chance would be left to try the gesture again; however, this procedure should not be repeated too many times, since the child could lose interest in performing the same gesture continuously.
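For reference, the trial flow of Fig. 6 can be summarized as a small decision procedure. The sketch below is a hypothetical rendering under our own assumptions, with the robot's actions reduced to print statements and the recognition result supplied by a callable.

```python
def run_gesture_training(imitated_ok):
    """imitated_ok: callable returning True/False per imitation attempt,
    standing in for the real-time FSM recognition result."""
    print("Mirroring segment: 'Let's play! I will copy you!'")

    # Trial 1: demonstrate the gesture twice, then evaluate the imitation.
    print("Trial 1: demonstrate twice, then 'You do it!'")
    trial1_success = imitated_ok()
    if trial1_success:
        print("'Good job!' -> record score")

    # Trial 2, Stage A: Normal mode if Trial 1 succeeded, otherwise Mirroring mode.
    mode = "Normal" if trial1_success else "Mirroring"
    print(f"Trial 2, Stage A ({mode} mode)")
    if imitated_ok():
        print("'Good job!' -> Stage B omitted")
        return
    # Stage B: corrective feedback, final two demonstrations, short final response window.
    print("Trial 2, Stage B: tell the child what went wrong, final demonstrations")
    print("'Good job!'" if imitated_ok() else "Record best partial score")

# Example: the child fails Trial 1 and Stage A, then succeeds in Stage B.
attempts = iter([False, False, True])
run_gesture_training(lambda: next(attempts))
```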

In the human-administered sessions, the supervisory controller computed all of the information needed by the human administrator, just as it would for the robot in the robot-administered sessions, including the grading of the participant's imitation performance and how to respond to the participant. These messages were projected on the wall behind the participant, so the human administrator could read and follow the instructions while still looking in the participant's direction. The human administrator did not make any personal judgments.

Eye gaze approximated via head pose served as a coarse indicator of a person's attention and was estimated by the Kinect tracking module. We assumed that the participant's attention was on the administrator if his or her head pose was oriented towards the attention box discussed in Section VI.A.

VI. Results

A. System validation results

In order to validate the accuracy of the proposed gesture recognition algorithm, 7 adults and 3 typically developing (TD) children were recruited. Each participant performed each of the 4 gestures 10 times under the experimental conditions. Each participant was also instructed to perform some non-specific movements during testing and to slightly shift their front-facing posture between gestures to create a naturalistic condition. The gesture recognition algorithm classified the performed gestures into one of the 4 categories or a “not recognized” category. These recognition results were compared with the subjective ratings of a therapist, yielding an overall accuracy of 98% (1.5% false positives and 0.5% false negatives) [27]. In the few cases where recognition failed, the cause was mainly Kinect tracking failure when the subjects quickly shifted their postures.

The system also inferred a participant's attention to the task administrator (i.e., either the robot or the human therapist) based on where he or she was looking. The robot's height is similar to the height of the human therapist's upper body, so a box of 85.77 cm × 102.42 cm around the robot, or around the upper body of the human therapist, was set as the target attention region. Gaze was approximated based on head pose estimation. To test the attention inference method, the same participants were first asked to look, with their natural head pose, at the bounding box covering the region where an administrator would stand; these head poses were recorded as baseline data. The participants were then asked to look away from and back to the region 10 times. Their raw head poses were normalized by the baseline values, and the rectified poses were checked to determine whether they were oriented towards the administrator region. The results show that 91% of the time (2% false positives, 7% false negatives) the head pose correctly indicated gaze towards the administrator region. In the user study, all participants' natural head poses were calibrated in the same way.
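A minimal sketch of this baseline-normalized head-pose check, assuming a hypothetical yaw/pitch representation and an assumed angular tolerance (the tolerance value below is illustrative, not the study's):

```python
import math

ATTENTION_TOLERANCE = math.radians(15)  # assumed angular tolerance (illustrative)

def calibrate_baseline(poses):
    """Average (yaw, pitch) recorded while the participant looks at the administrator region."""
    yaws, pitches = zip(*poses)
    return sum(yaws) / len(yaws), sum(pitches) / len(pitches)

def attending(pose, baseline, tol=ATTENTION_TOLERANCE):
    """True if the baseline-rectified head pose is oriented towards the administrator region."""
    return abs(pose[0] - baseline[0]) < tol and abs(pose[1] - baseline[1]) < tol

baseline = calibrate_baseline([(0.05, -0.02), (0.03, 0.00), (0.04, -0.01)])
print(attending((0.10, 0.05), baseline))  # within tolerance -> attending
print(attending((0.80, 0.05), baseline))  # head turned well away -> not attending
```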

B. Preferential attention towards the administrator

Attention to the administrator is a marker for eventual learning within intervention paradigms. On average, the ASD group paid attention to the robot and the human therapist for 55.01 (SD: 28.42) and 43.32 (SD: 25.47) seconds per session, respectively. The TD group paid attention to the robot and the human therapist for 61.35 (SD: 28.89) and 47.02 (SD: 17.30) seconds per session, respectively. The duration of a session depended on the participant's performance in that session, and participants differed in imitation ability. The ASD group required similar amounts of time to complete the tasks in the robot sessions (mean = 105.52, SD = 24.47 seconds) and the human sessions (mean = 104.35, SD = 23.51 seconds), while the TD group required more time to complete the robot sessions (mean = 99.69, SD = 29.76 seconds) than the human sessions (mean = 86.67, SD = 28.64 seconds). Therefore, the ratio of the duration of attention on the administrator to the total session length was used as a normalized measure of how much attention the participants paid to the administrator; these ratios are shown in Table III for both groups.

Table III. The Ratio of the Duration of Attention on the Administrator to the Total Session Time (%).

Mean(SD) Robot session Human session
ASD 52.38% (24.23%) 41.38% (21.27%)
TD 63.50% (23.53%) 61.59% (29.34%)

Participants in the ASD group spent 11% more of their session time attending to the robot than to the human therapist, while participants in the TD group paid similar attention to the robot and the human therapist across sessions. However, a Wilcoxon signed-rank test shows that the differences were not statistically significant for either group, with p = 0.0663 for the ASD group and p = 0.7367 for the TD group. These results, although only approaching significance, support part of our hypothesis in that the children with ASD paid more attention to the robot administrator than to the human administrator, which indicates the potential for such a system to garner interest in imitation training.
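For reference, the paired attention ratios can be compared with a Wilcoxon signed-rank test as sketched below (assuming SciPy is available); the numbers are illustrative placeholders, not the study data.

```python
from scipy.stats import wilcoxon

# Illustrative paired attention ratios (robot vs. human sessions) for one group.
robot_ratio = [0.62, 0.48, 0.55, 0.71, 0.40, 0.58, 0.66, 0.50]
human_ratio = [0.45, 0.50, 0.42, 0.60, 0.38, 0.47, 0.52, 0.44]

stat, p = wilcoxon(robot_ratio, human_ratio)  # paired, non-parametric comparison
print(f"Wilcoxon signed-rank: statistic={stat}, p={p:.4f}")
```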

C. Gesture Imitation performance

Next, we analyzed the demonstrated imitation skills of both groups in the human- and robot-administered sessions. The score of each gesture in each trial was normalized to [0, 10]. Table IV(a) lists the performance of the ASD and TD groups across sessions: for each participant, the imitation scores for all 4 gestures in both Trial 1 and Trial 2 of the robot-administered and human-administered sessions were summed to show overall performance. A trial-by-trial analysis is presented in Table IV(b), where the average scores of the robot-administered Trial 1 (R1) and Trial 2 (R2), as well as the human-administered Trial 1 (H1) and Trial 2 (H2), were computed for each group. Fig. 7 shows the group performance on each gesture, where G1 to G4 represent raising one hand, raising two hands, waving, and reaching arms out, respectively.

Table IV. Imitation Performance Results—Gesture Scores.

(a). General Session Performance Results

Mean (SD) Robot session scores Human session scores
ASD 27.31 (32.07) 19.75 (13.64)
TD 43.75 (28.26) 44.79 (31.98)
(b). Performance Results in Trials

Mean (SD) R1 scores R2 scores H1 scores H2 scores
ASD 3.17 (4.43) 3.66 (4.31) 1.46 (2.55) 3.47 (3.67)
TD 5.49 (4.32) 5.45 (4.37) 5.70 (4.50) 5.50 (4.73)
(c). Wilcoxon Signed-Rank p Values on Trial vs. Trial Performance within Group

Trial vs. Trial ASD TD Trial vs. Trial ASD TD
R1 vs. R2 0.2793 0.9893 H1 vs. H2 0.0006 0.6318
R1 vs. H1 0.0494 0.6946 H2 vs. R2 0.8669 0.9811
(d). Mann-Whitney U Test p Values on ASD vs. TD Trial Performance

Group vs. Group R1 R2 H1 H2
ASD vs. TD 0.0277 0.0905 0.0003 0.1473

Fig. 7. Group performance of individual gestures.

Consistent with previously demonstrated imitation deficits in individuals with ASD, the results show that the ASD group was less successful than the TD group in general. Participants in the ASD group performed better in the robot-administered sessions than in the human-administered sessions, especially in Trial 1 for G1 to G3. Among the 4 gestures, waving received the lowest scores in both groups due to its complexity. Participants in the TD group did not show much difference across trials in any of the sessions.

Table IV(c) gives the p values of the within-group imitation performance comparisons. The ASD group's performance in the robot sessions was not statistically significantly different between Trial 1 and Trial 2, whereas in the human sessions the difference was significant. The ASD group's robot-administered Trial 1 performance was significantly better than its human-administered Trial 1 performance, while the two Trial 2 performances were not significantly different. The TD group's performance was similar across all trials. However, when Trial 1 and Trial 2 were combined, the comparison of robot-administered versus human-administered sessions was non-significant for both the ASD (p = 0.5781) and TD (p = 0.6406) groups. Table IV(d) lists the ASD versus TD group comparisons across trials; the main differences were in Trial 1 of both the robot-administered and human-administered sessions.

These results show that, in the robot-administered sessions, children with ASD demonstrated better imitation skills more quickly than they did in the human-administered sessions. Initial interactions with the human therapist yielded significantly lower imitation scores than initial interactions with the robot in Trial 1. In Trial 2, the difference between the human- and robot-administered session scores decreased, but the average scores for the human sessions were still lower. There was no significant difference for the TD group across conditions throughout the whole experiment.

VII. Discussion

In this paper, we presented the design and development of a novel robot-mediated imitation skill training system, RISTA, with potential relevance to core areas of deficit in young children with ASD. RISTA accommodates either a robot or a human therapist as administrator. It detected the participant's imitation performance in real time and fed this information back to the administrator for adaptive intervention. Within this proof-of-concept experiment, we also replicated previous findings demonstrating that young children with ASD paid more attention to the robot administrator and performed better in robot-facilitated imitation tasks than in human-administered sessions under the same experimental protocol.

A particular strength of RISTA is its non-invasive configuration, which does not require the participants to wear any physical sensors. This is extremely important for young children with ASD, who can find wearable sensors uncomfortable and distracting. Another important contribution of this work relates to our modeling method, which affords closed-loop interaction. The FSM-based gesture recognition method that we designed allowed us to obtain real-time evaluations of participant performance and to provide adaptive and individualized feedback at different levels of imitation completion. This approach is further bolstered by the fact that the FSM recognition method does not require specific training data to be gathered from children prior to participation.

In terms of performance within the system, most children with ASD and TD children were able to respond with some degree of accuracy to prompts delivered by the humanoid robot and by the human administrator within the standardized protocol. Children with ASD paid more attention to the robot than to the human administrator, a finding replicating previous work suggesting attentional preferences for robotic interactions over brief intervals of time. Further, some young children with ASD seemed to demonstrate enhanced performance in response to robotic prompts compared with those delivered by human counterparts. This suggests that robotic systems capable of successfully capitalizing on baseline enhancements in non-social attention preference might be utilized to meaningfully enhance skills related to core symptoms of ASD. Although this work does not demonstrate generalization beyond the experimental sessions, the documented preferential attention could potentially be harnessed to drive towards such an outcome. Future work examining more in-depth prompt and reinforcement strategies, upgrading and incorporating the system into a formal clinical study, including more gestures, and combining gesture imitation with other meaningful daily tasks would likely enhance future applications of this system.

There are several methodological limitations of the current study that are important to highlight. The small sample size and the limited time frame of interaction restrict our ability to realistically comment on the value and ultimate clinical utility of this system as applied to young children with ASD. Further, the brief exposure of the current paradigm, in combination with the unclear baseline skills of the participating children, cannot answer whether the heightened attention paid to the robotic system, or the performance differences observed across conditions, are simply artifacts of novelty or reflect a more characteristic pattern of preference that could be harnessed over time. We also did not explore test-retest reliability in this preliminary study. Regarding gesture recognition, the Kinect has a limited range and thus constrains the set of gestures that can be used for imitation tasks; extending this range should be explored to improve the system's capabilities. In addition, developing an optimization algorithm for autonomous selection of the parameters of the FSM-based recognition would further enhance the system. Another important technical limitation was the approximation of attention with head pose. It must be emphasized that head orientation approximating gaze or attention does not necessarily equate to actual eye gaze or, by extension, attention; however, such data provide a coarse proxy for documenting feasibility. In terms of the robot, NAO was not suitable for very fast-paced motion due to its limited motor ability. It was programmed to provide intermittent verbal prompts and rewards but did not engage the participants in continuous verbal communication. There could be some benefit in engaging the children continuously through verbal communication; however, in this study we judged that continuous verbal communication might distract the participants from imitating the gestures.

Despite these limitations, this work is, to our knowledge, the first to design and empirically evaluate the usability, feasibility, and preliminary efficacy of a non-invasive, closed-loop interactive robotic technology capable of modifying its responses based on within-system measurements of performance on imitation tasks with young children. Movement in this direction introduces the possibility of technological intervention tools that are not simple response systems, but systems capable of necessary and more sophisticated adaptations. Our platform represents a move toward realistic deployment of technology capable of accelerating and priming a child for learning in key areas of deficit.

Acknowledgments

This work was supported in part by a grant from the Vanderbilt Kennedy Center (Hobbs Grant), the National Science Foundation under Grant 1264462, and the National Institutes of Health under Grants 1R01MH091102-01A1 and 5R21MH103518-02. The work also received core support from NICHD (P30HD15052) and NCATS (UL1TR000445-06).

Biographies


Zhi Zheng (S'09) received the B.S. degree in biomedical engineering from Xidian University, Xi'an, China, in 2008, and the M.S. degree in electrical engineering from Vanderbilt University, Nashville, TN, USA, in 2013. Since then, she has been working toward the Ph.D. degree in electrical engineering at Vanderbilt University. Her current research interests include human–computer interaction, social robotics, affective computing, and computer vision.


Eric M. Young received the B.Eng. and M.S. degrees in mechanical engineering from Vanderbilt University, Nashville, TN, in 2015, while researching assistive robotics with intention-detecting capabilities.

He is currently working toward the PhD degree in mechanical engineering with the Haptics Group at the University of Pennsylvania, where he is continuing to investigate methods of interpreting and applying forces during physical human-robot interaction.


Amy R. Swanson received the M.A. degree in social science from the University of Chicago, Chicago, IL, USA, in 2006.

Currently she is a Research Analyst at Vanderbilt Kennedy Center's Treatment and Research Institute for Autism Spectrum Disorders, Nashville, TN, USA.


Amy S. Weitlauf received the Ph.D. degree in Psychology from Vanderbilt University, Nashville, TN, USA, in 2011.

She completed her pre-doctoral internship at the University of North Carolina, Chapel Hill. She then returned to Vanderbilt, first as a postdoctoral fellow and now as an Assistant Professor of Pediatrics at Vanderbilt University Medical Center. She is a Licensed Clinical Psychologist who specializes in the early diagnosis of autism spectrum disorder.


Zachary E. Warren received the Ph.D. degree from the University of Miami, Miami, FL, USA, in 2005.

He completed his postdoctoral fellowship at the Medical University of South Carolina. Currently he is an Associate Professor of Pediatrics and Psychiatry at Vanderbilt University. He is the Director of the Treatment and Research Institute for Autism Spectrum Disorders at the Vanderbilt Kennedy Center.


Nilanjan Sarkar (S'92–M'93–SM'04) received the Ph.D. degree in mechanical engineering and applied mechanics from the University of Pennsylvania, Philadelphia, PA, USA, in 1993.

In 2000, Dr. Sarkar joined Vanderbilt University, Nashville, TN, where he is currently a Professor of mechanical engineering and electrical engineering and computer science. His current research interests include human–robot interaction, affective computing, dynamics, and control.

Contributor Information

Zhi Zheng, Email: zhi.zheng@vanderbilt.edu, Electrical Engineering and Computer Science Department, Vanderbilt University, Nashville, TN 37212 USA.

Eric M. Young, Email: yoeric@seas.upenn.edu, Mechanical Engineering Department, University of Pennsylvania, Philadelphia, PA 19104, USA.

Amy R. Swanson, Email: amy.r.swanson@vanderbilt.edu, Treatment and Research in Autism Disorder, Vanderbilt University Kennedy Center, Nashville, TN 37203 USA.

Amy S. Weitlauf, Email: amy.s.weitlauf@vanderbilt.edu, Treatment and Research in Autism Disorder, Vanderbilt University Kennedy Center, Nashville, TN 37203 USA.

Zachary E. Warren, Email: zachary.e.warren@vanderbilt.edu, Treatment and Research in Autism Disorder, Vanderbilt University Kennedy Center, Nashville, TN 37203 USA.

Nilanjan Sarkar, Email: nilanjan.sarkar@vanderbilt.edu, Mechanical Engineering Department, Vanderbilt University, Nashville, TN 37212 USA.

References

1. Amendah D, Grosse SD, Peacock G, Mandell DS. The economic costs of autism: A review. In: Autism Spectrum Disorders. Oxford University Press; 2011.
2. Weitlauf AS, McPheeters ML, Sathe N, Travis R, Aiello R, et al. Therapies for Children With Autism Spectrum Disorder: Behavioral Interventions Update. Comparative Effectiveness Review No. 137, AHRQ Publication No. 14-EHC036-EF; 2014 Jun.
3. Warren Z, Vehorn A, Dohrmann E, Newsom C, Taylor JL. Brief report: Service implementation and maternal distress surrounding evaluation recommendations for young children diagnosed with autism. Autism. 2013;17:693–700. doi: 10.1177/1362361312453881.
4. Scassellati B, Admoni H, Mataric M. Robots for use in autism research. Annual Review of Biomedical Engineering. 2012;14:275–294. doi: 10.1146/annurev-bioeng-071811-150036.
5. Robins B, Dautenhahn K, Dubowski J. Does appearance matter in the interaction of children with autism with a humanoid robot? Interaction Studies. 2006;7:509–542.
6. Pierno AC, Mari M, Lusher D, Castiello U. Robotic movement elicits visuomotor priming in children with autism. Neuropsychologia. 2008;46:448–454. doi: 10.1016/j.neuropsychologia.2007.08.020.
7. Klin A, Lin DJ, Gorrindo P, Ramsay G, Jones W. Two-year-olds with autism orient to nonsocial contingencies rather than biological motion. Nature. 2009;459:257–261. doi: 10.1038/nature07868.
8. Goodrich MA, Colton MA, Brinton B, Fujiki M. A case for low-dose robotics in autism therapy. Presented at the 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI); 2011.
9. Feil-Seifer D, Matarić MJ. Toward socially assistive robotics for augmenting interventions for children with autism spectrum disorders. Experimental Robotics. 2009:201–210.
10. Bekele ET, Lahiri U, Swanson AR, Crittendon JA, Warren ZE, Sarkar N. A step towards developing adaptive robot-mediated intervention architecture (ARIA) for children with autism. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2013;21:289–299. doi: 10.1109/TNSRE.2012.2230188.
11. Warren ZE, Zheng Z, Swanson AR, Bekele E, Zhang L, Crittendon JA, et al. Can robotic interaction improve joint attention skills? Journal of Autism and Developmental Disorders. 2013:1–9. doi: 10.1007/s10803-013-1918-4.
12. Yoder PJ, McDuffie AS. Treatment of responding to and initiating joint attention. In: Charman T, Stone W, editors. Social & Communication in Autism Spectrum Disorders: Early Identification, Diagnosis, & Intervention. New York: Guilford; 2006. pp. 117–142.
13. Greczek J, Kaszubksi E, Atrash A, Matarić MJ. Graded cueing feedback in robot-mediated imitation practice for children with autism spectrum disorders. In: Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2014); Edinburgh, Scotland, UK; 2014 Aug.
14. Feil-Seifer D, Mataric M. Automated detection and classification of positive vs. negative robot interactions with children with autism using distance-based features. In: Proceedings of the 6th ACM/IEEE International Conference on Human-Robot Interaction; New York, NY: ACM Press; 2011.
15. Amirabdollahian F, Robins B, Dautenhahn K, Ji Z. Investigating tactile event recognition in child-robot interaction for use in autism therapy. Presented at the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2011. doi: 10.1109/IEMBS.2011.6091323.
16. Francois D, Dautenhahn K, Polani D. Using real-time recognition of human-robot interaction styles for creating adaptive robot behaviour in robot-assisted play. Presented at the IEEE Symposium on Artificial Life; 2009.
17. Liu C, Conn K, Sarkar N, Stone W. Physiology-based affect recognition for computer-assisted intervention of children with Autism Spectrum Disorder. International Journal of Human-Computer Studies. 2008;66:662–677.
18. Williams JH, Whiten A, Suddendorf T, Perrett DI. Imitation, mirror neurons and autism. Neuroscience & Biobehavioral Reviews. 2001;25:287–295. doi: 10.1016/s0149-7634(01)00014-8.
19. Whiten A, Brown J. Imitation and the reading of other minds: Perspectives from the study of autism, normal children and non-human primates. In: Intersubjective Communication and Emotion in Early Ontogeny. 1998:260–280.
20. Ingersoll B. Brief report: Effect of a focused imitation intervention on social functioning in children with autism. Journal of Autism and Developmental Disorders. 2012;42:1768–1773. doi: 10.1007/s10803-011-1423-6.
21. Ingersoll B. Pilot randomized controlled trial of Reciprocal Imitation Training for teaching elicited and spontaneous imitation to children with autism. Journal of Autism and Developmental Disorders. 2010;40. doi: 10.1007/s10803-010-0966-2.
22. Duquette A, Michaud F, Mercier H. Exploring the use of a mobile robot as an imitation agent with children with low-functioning autism. Autonomous Robots. 2008;24:147–157.
23. Ranatunga I, Beltran M, Torres NA, Bugnariu N, Patterson RM, Garver C, et al. Human-robot upper body gesture imitation analysis for autism spectrum disorders. In: Social Robotics. Springer; 2013. pp. 218–228.
24. Cabibihan JJ, Javed H, Ang M Jr, Aljunied SM. Why robots? A survey on the roles and benefits of social robots in the therapy of children with autism. International Journal of Social Robotics. 2013;5:593–618.
25. Robins B, Dautenhahn K, Boekhorst R, Billard A. Robotic assistants in therapy and education of children with autism: can a small humanoid robot help encourage social interaction skills? Universal Access in the Information Society. 2005;4:105–120.
26. Srinivasan SM, Lynch KA, Bubela DJ, Gifford TD, Bhat AN. Effect of interactions between a child and a robot on the imitation and praxis performance of typically developing children and a child with autism: A preliminary study. Perceptual & Motor Skills. 2013;116:885–904. doi: 10.2466/15.10.PMS.116.3.885-904.
27. Zheng Z, Das S, Young EM, Swanson A, Warren Z, Sarkar N. Autonomous robot-mediated imitation learning for children with autism. In: 2014 IEEE International Conference on Robotics and Automation (ICRA); 2014. pp. 2707–2712.
28. Zheng Z, Zhang L, Bekele E, Swanson A, Crittendon J, Warren Z, et al. Impact of robot-mediated interaction system on joint attention skills for children with autism. In: 2013 IEEE International Conference on Rehabilitation Robotics (ICORR); Seattle, WA; 2013.
29. Wainer J, Dautenhahn K, Robins B, Amirabdollahian F. A pilot study with a novel setup for collaborative play of the humanoid robot KASPAR with children with autism. International Journal of Social Robotics. 2014;6:45–65.
30. Aldebaran Robotics. Available: http://www.aldebaran-robotics.com/en/
31. Microsoft Kinect for Windows. Available: http://www.microsoft.com/en-us/kinectforwindows/
32. Livingston M, Sebastian J, Ai Z, Decker JW. Performance measurements for the Microsoft Kinect skeleton. In: 2012 IEEE Virtual Reality Short Papers and Posters (VRW); 2012. pp. 119–120.
33. Obdrzalek S, Kurillo G, Ofli F, Bajcsy R, Seto E, Jimison H, et al. Accuracy and robustness of Kinect pose estimation in the context of coaching of elderly population. In: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2012. pp. 1188–1193. doi: 10.1109/EMBC.2012.6346149.
34. Lee EA, Seshia SA. Introduction to Embedded Systems: A Cyber-Physical Systems Approach. 2011.
35. Wilson AD, Bobick AF. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1999;21:884–900.
36. Deutscher J, Blake A, Reid I. Articulated body motion capture by annealed particle filtering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2000. pp. 126–133.
37. Mitra S, Acharya T. Gesture recognition: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 2007;37:311–324.
38. Diagnostic and Statistical Manual of Mental Disorders: Quick Reference to the Diagnostic Criteria from DSM-IV-TR. 4th ed. Washington, DC: American Psychiatric Association; 2000.
39. Lord C, Rutter M, DiLavore P, Risi S, Gotham K, Bishop S. Autism Diagnostic Observation Schedule, 2nd edition (ADOS-2). Torrance, CA: Western Psychological Services; 2012.
40. Mullen EM. Mullen Scales of Early Learning: AGS Edition. Circle Pines, MN: American Guidance Service; 1995.
41. Constantino JN, Gruber CP. The Social Responsiveness Scale. Los Angeles, CA: Western Psychological Services; 2002.
42. Rutter M, Bailey A, Lord C. The Social Communication Questionnaire. Los Angeles, CA: Western Psychological Services; 2010.
