2022 Sep 16:1–23. Online ahead of print. doi: 10.1007/s10209-022-00914-7

Toward automatic motivator selection for autism behavior intervention therapy

Nur Siyam 1, Sherief Abdallah 1
PMCID: PMC9483340  PMID: 36160369

Abstract

Children with autism spectrum disorder (ASD) usually show little interest in academic activities and may display disruptive behavior when presented with assignments. Research indicates that incorporating motivational variables during interventions results in improvements in behavior and academic performance. However, the impact of such motivational variables varies between children. In this paper, we aim to address the problem of selecting the right motivator for children with ASD using reinforcement learning by adapting to the most influential factors impacting the effectiveness of the contingent motivator used. We model the task of selecting a motivator as a Markov decision process problem. The design of the states, actions, and rewards considers the factors that impact the effectiveness of a motivator based on applied behavior analysis as well as learners’ individual preferences. We use a Q-learning algorithm to solve the modeled problem. Our proposed solution is then implemented as a mobile application developed for special education plan coordination. To evaluate the motivator selection feature, we conduct a study involving a group of teachers and therapists and assess how the added feature aids the participants in their decision-making process of selecting a motivator. Preliminary results indicated that the motivator selection feature improved the usability of the mobile app. Analysis of the algorithm performance showed promising results and indicated improvement of the recommendations over time.

Keywords: Special education, Autism, Markov decision processes, Reinforcement learning, Behavior intervention, Intervention therapy

Introduction

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with an occurrence rate between 1 and 2% [19]. ASD is characterized by challenges in communication and social interactions, repetitive and stereotyped behaviors, and limited interests [56]. While there is no known cure for ASD, early intervention has proven to be effective in improving cognitive abilities, language, and adaptive behavior [22].

One challenging aspect of early intervention is students’ lack of interest in academic activities or homework assignments. Students with ASD may resort to behaving disruptively to avoid academic tasks [35]. Such disruptive behaviors are major barriers to the attainment of the educational goals described in the student’s individualized education program (IEP), and they are likely to worsen if left untreated. However, research indicates that incorporating motivational variables into the intervention of a child with ASD leads to improvements in core symptoms of autism and in academic areas [35, 56]. The use of motivational variables, often referred to as reinforcement learning, has been refined into a structured treatment system called applied behavior analysis (ABA). ABA-based treatment approaches use reinforcement learning to promote desired behaviors (such as eye contact) and diminish atypical behaviors (such as repetitive body movements) [56]. To differentiate between reinforcement learning as a machine learning paradigm and reinforcement learning as part of ABA-based treatment approaches, we refer to the latter in this paper as “the use of motivators.”

In this paper, we propose the “motivator selection” function and add it to a mobile app used by teachers and therapists. The mobile app, “IEP-Connect,” was developed to enable information sharing between the different parties involved in the intervention of students with special education needs and disabilities (SEND). IEP-Connect uses the individualized education plan (IEP) as the main point of coordination [62]. The app allows teachers, therapists, and parents to record details of daily sessions and classes regarding students’ progress toward the IEP goals, including behavior monitoring data and motivators used. This information is shared among the participating parties to improve learning and therapy outcomes, as suggested by earlier research [59]. While sharing information is crucial to the learning process, teachers and therapists need to navigate through the data to determine the right motivator to use with their student in the current session, which may be time-consuming and cause information overload [60]. This motivates a more data-centric approach that systematically evaluates the space of available motivators to determine the best action, which is what we propose in this paper.

The aim of our work is to improve learning activities and sessions for students with ASD by providing an adaptive decision support system, delivered through a mobile app, that recommends motivators to teachers and therapists during a learning session. Using the mobile app, teachers record problematic behavior and the system suggests a motivator to be used with the learner. In this study, we consider contingent rewards, which are used to reward children when they meet a specified goal and to provide positive reinforcement when a task is well done [35]. Thus, this study aims to answer the following research questions:

RQ1: Does adding the “motivator selection” feature to the IEP-Connect app increase the usability of the app?

RQ2: Does student motivation significantly increase when using the “motivator selection” feature compared to the traditional motivator selection methods?

RQ3: When using the RL algorithm for the “motivator selection”, does reward significantly increase over time?

RQ4: When using the RL algorithm for the “motivator selection”, does the episode length (number of steps) significantly decrease over time?

The contribution of this work can be summarized as follows:

  1. We model the selection of a motivator as an MDP problem, considering the factors that impact the effectiveness of a motivator according to ABA-based methods and the trade-off between different motivators;

  2. We attempt to solve the motivator selection problem (MSP) using reinforcement learning. While RL is a widely used method for decision support systems, it has not previously been applied to motivator recommendation;

  3. We improve the functionality of the IEP-Connect mobile app by adding the “motivator selection” feature. Results from the usability study indicated that the mobile app achieved an excellent usability rating and high user satisfaction. These promising results highlight the potential of mobile technologies and artificial intelligence to improve the learning and behavior of children with special educational needs.

In the next section, we formally define the motivator selection problem (MSP) that our proposed system aims to address. We then present the related work by highlighting the most relevant concepts of behavior intervention for autism and then explore the literature on digitized behavior intervention and the use of reinforcement learning in special education. In the following section, we map the motivator selection problem into a Markov decision process (MDP) and use Q-learning to solve the modeled problem. Subsequently, we lay out our methodology. We then present the evaluation results in terms of users’ satisfaction and system performance. We conclude the paper with a discussion and implications for practice, research, and future work.

Motivator selection problem (MSP)

Identifying and assessing potential motivating stimuli for students with SEND, and evaluating the methods for maintaining the effectiveness of these motivators, have received interest in the literature [46]. As one of the main issues encountered in the therapy of children with ASD is the lack of motivation, many studies aimed to identify the factors that impact the children’s response to different motivators [56].

Children with ASD do not typically show a preference for social stimuli (such as smiles or praise) or affective stimuli (such as a picture of a crying face), which makes it hard to reinforce new behaviors through the natural environment [56]. That is why parents and teachers tend to rely on edible motivators as an alternative. While various studies showed that many skills can be established using edible stimuli, there are many downsides to delivering edible motivators [52]. For instance, providers tend to use food as a common motivator in ABA-based treatment. Food thereby becomes a therapeutic tool that influences eating behaviors even when the child is not responding to a biological need, resulting in a tendency to overeat. Evidence suggests that children with ASD are at a higher risk of unhealthy weight compared to other children [44]. Despite this, therapists, teachers, and parents continue to use calorically dense foods as motivators to influence behavior change. Frequent use of food as a motivator has many undesirable side effects that are mostly absent from other motivators, including long-term health consequences such as weight gain and dental cavities, interruption of activities to administer the motivator, and satiation [44].

Satiation is a decrease in the effectiveness of a stimulus as a motivator when it is used repeatedly or for a long period of time [48]. Research indicates that children satiate more often on edible motivators, which has prompted further investigation into the advantages of other types of motivators such as tokens and sensory stimuli [44, 52]. However, satiation is a negative consequence of using any kind of motivator repeatedly. To address satiation, therapists and teachers are required to continually vary stimuli and introduce motivators from different modalities, vary the schedules of reinforcement, provide children with choices, and use technology to access preferred items to increase motivation [46, 48]. Many other factors were considered in the literature, such as the benefits of sensory stimuli in promoting greater interaction between the child and the environment [52], the reinforcing value of providing choice [46], and the effects of using preferred (specially assessed) motivators [56].

One of the important steps for successful deployment of ABA-based interventions is the identification of motivator preferences for each student [35]. However, evidence suggests the need for frequent assessment to reflect changes in preferences over time. Motivators for students with autism are sensitive to changing characteristics across time and children, meaning that a motivator that is powerful in one context might not be effective in other settings or with other children [23]. Thus, there is a need for continuous identification and assessment of effective motivators, which requires time and effort from teachers, therapists, and parents. Our research aims to provide an automated system that addresses that need.

The MSP is challenging because various factors impact the effectiveness of a motivator, including the child herself, the teacher, the subject being taught, the time of day, and the type of disruptive behavior, among others. Therapists trained in ABA-based treatment approaches usually choose a specific motivator considering the history of student behavior, the current behavior, and some internal approximation of the outcome of possible future therapy decisions. They develop reinforcement sampling menus or lists that help them match motivators to each identified problem behavior (e.g., aggression, stereotypy, non-compliance) according to the antecedents of the behavior (e.g., change of activity, denied access to a preferred item), context variables (e.g., time of day, subject), and the behavior function (e.g., attention seeking, escape). Additionally, therapists take into consideration the student’s preferred motivators, which are usually identified using motivator assessments [49]. In academics and behavior, teachers and therapists use these data to individualize instruction on a student-by-student basis according to the student’s exceptional learning and behavior needs [26]. The left side of Fig. 1 shows the steps in the behavior intervention strategy (ABA) that therapists follow to devise their plans.

Fig. 1.

Fig. 1

Behavior intervention strategy

However, therapists and teachers face many challenges in developing such plans. First, devising behavior intervention plans requires training and experience. Studies show that the supply of certified ABA providers is low [77]. Also, while therapists and interventionists may have the needed knowledge, other teachers, especially those in general education classrooms, rely on behavior therapists to recommend appropriate intervention plans [32]. This requires continuous communication and coordination to prevent the misuse of motivators and to ensure the intervention works as intended. While effective communication is always sought by all parties, it is often lacking due to time constraints or denied access [59]. Second, as with most clinical treatments, the effect of the motivator used with the student is uncertain (non-deterministic). This uncertainty makes it hard to plan ahead, as attempting to predict the effect of a series of treatments over time compounds the uncertainty [13]. The process of navigating student-based data and planning the right intervention is not only time-consuming but also requires considering various variables and continuous access to research and heuristic investigations [13]. Moreover, there has been a good deal of contradictory evidence regarding the use of “extrinsic” motivation to engage and motivate students [56]. In this paper, we follow the prevailing position that the application of planned positive motivation is a critical element of teaching children with ASD. Teachers, therefore, are required to sustain students’ motivation using both “intrinsic” and “extrinsic” reinforcement. Motivation is an internal “intrinsic” psychological state. Reinforcement, on the other hand, can be intrinsic to the task, extrinsically applied, or both. The challenge is, therefore, in deciding when the use of different types of motivators is effective or even necessary [29].

Related work

The increased rate of ASD diagnosis in recent years [19] has fueled research in the area of machine learning with the purpose of improving the learning experience of those affected. The focus of research has been mainly on developing academic or social skills learning applications [25, 53], improving diagnosis efficiency [37], and modeling social and behavioral aspects of ASD [66]. However, we are not aware of any research that applied reinforcement learning to solve the MSP. The following sections review related work on digitized behavior management and reinforcement learning, in an attempt to identify the gap between available technology tools and the need to provide solutions for therapy recommendations in special education.

Digitized behavior intervention

There are many available applications that allow therapists, teachers, and parents to monitor the behavior of children with special needs [43, 73]. These applications allow the people involved in the intervention of children with ASD to track, store, and share important information. This information is used to plan interventions, monitor progress toward IEP objectives, and generate reports. While these applications are very helpful and a good replacement for paper-based data collection, the data collected in special needs settings are usually complex, unstandardized, and incomplete [43]. Many studies suggested using data mining techniques to support intervention decisions [70]. For instance, Burns et al. [17] developed a mobile app that employed association rule mining to reveal patterns in behavior causes and effects to inform therapist decisions. In that study, parents use a mobile app to collect Antecedent, Behavior and Consequence (ABC) data, and the data mining techniques aim to identify cause-and-effect patterns in behavior to enable therapists to improve intervention. Linstead et al. [40] introduced the autism management platform (AMP), an integrated healthcare information system for managing data related to the diagnosis and treatment of children with ASD. The authors developed a mobile application to facilitate information and multimedia sharing between parents and clinicians. The system also includes a web interface and an analytics platform, allowing specialists to mine patient data in real time. The analytics platform uses machine learning techniques to provide users with personalized data searching preferences. Bhuyan et al. [15] studied temporal data to identify factors that aid caregivers in creating an effective intervention plan and to predict the right treatment based on the data in other contexts.

Previous studies also focused on using mobile technology to help children with ASD and their caregivers regulate challenging behaviors. For instance, Crutchfield et al. [21] evaluated the impact of the I-Connect app on stereotypy in adolescents with ASD in a school setting. Préfontaine et al. [51] developed the iSTIM app to support parents of younger children with ASD in reducing stereotypy behavior. The app was evaluated and found successful in regulating stereotypy when used by trained researchers as well as by parents without the required ABA training [71]. In another related study, Begoli et al. [10] aimed to develop a computational representation of ABA to serve as a reasoning foundation for intelligent-agent mediated therapies by formulating ABA concepts as a process ontology. Concepts relevant to the reasoning and operational functions of the agents (e.g., rewarding and prompting) were represented in the ontology and then formalized in a Belief-Desire-Intention (BDI) reasoning framework. Such formalization is feasible because of the procedural, repetitive, and prescriptive nature of ABA [9].

Reinforcement learning (RL)

As a subfield of machine learning, RL has been widely implemented, resulting in its increasing applicability to real-life problems and decision support systems [3, 75]. For instance, RL has been used to improve the delivery of personalized care by optimizing medication choices, medicine doses, and intervention timings [42]. In the healthcare and therapy domain, data are characterized by their high dimensionality and complex interdependencies [27]. RL has the potential to automatically explore various treatment options by analyzing patient data to derive a policy for personalized therapy without the need for pre-established rules [42].

Recommender systems have also been leveraged using RL. Recommender systems based on RL have the advantage of updating their policies during online interaction, which enables the system to generate recommendations that best suit users’ evolving preferences [78]. Examples include news recommendation [79], music recommendation [31], and personalized learning systems [57].

RL has proven to be an appropriate framework for interaction modeling and for optimizing problems that can be formulated as MDPs. The advantage of such methods is the ability to model the stochastic variation of outcomes as transition probabilities between states and actions [72]. RL and MDPs have been successfully applied to personalized learning systems [54, 57], intelligent tutoring systems [8, 65], adaptive serious games for ASD [33], and robot-assisted therapy [72]. For instance, Bennane [12] automated the selection of the content of a tutoring system and its pedagogical approach to provide differentiated instruction. Similarly, Shawky and Badawi [57] used RL to build an intelligent environment that provides learners with suitable content and adapts to the learner’s evolving states. Khabbaz et al. [33] proposed an adaptive serious game for rating social ability in children with ASD using RL. The game adapts itself to the level of the child with ASD by adjusting the difficulty of the activities. In the field of robot-assisted therapy, Tsiakas et al. [72] proposed an interactive RL framework that adapts to the user’s preferences and refines its learned policy when coping with new users.

In this work, we aim to develop an app that can be used by any of the child’s caregivers in any setting. Moreover, we aim to provide teachers and therapists with a tool that facilitates intervention planning once a problematic behavior is detected, by recommending motivators using RL. Unlike previous studies, we rely on online learning rather than on previously collected data. While online learning does not benefit from an offline repetitive training period, it allows the model to adjust its policies to match the non-stationary environment and the individuality of each child with SEND [41].

Solving the MSP

The aim of this work is to leverage the power of RL to solve the problem of selecting the best motivator for each intervention session. We first model the MSP as a Markov decision process (MDP) problem. By using MDPs, the proposed model can explicitly model future rewards, which will benefit the motivator recommendation accuracy significantly in the long run. MDPs can address many of the challenges faced in therapy decision-making. We then apply RL by using Q-learning to solve the modeled problem.

Markov decision processes (MDP)

The MSP can be formulated as an MDP. An MDP is a standard formalization of sequential decision-making, which is widely used for applications where an autonomous agent interacts with its surrounding environment through actions. An MDP can be defined as a four-tuple (S,A,P,R), where S is a set of states called the state space, A is a set of actions called the action space, P is the state transition function, which is the probability of transitioning between every pair of states given an action, and R is the reward function that assigns an immediate reward after transitioning to a new state due to an action [69].

The agent, which is situated in the therapist’s or teacher’s mobile application, interacts with the environment at discrete time steps. In our setting, a time step occurs each time a therapist records a behavior in the mobile application. At each time step, the agent receives a state St from the environment, drawn from a set of possible states S. Based on this state, the agent selects an action At from the set of actions A valid in state St. Actions in our setting are motivators the therapist can use to motivate the student. Based in part on the agent’s action, the agent finds itself in a new state St+1 one time step later. The environment also provides the agent a scalar reward Rt+1 from a set of possible rewards R. The reward in our setting depends on whether, and to what degree, the student becomes motivated, among other factors explained in the next sections. The transition (st, at, rt+1, st+1) is stored in memory M. Ultimately, the system aims to enhance the learning and therapy experience of the child by recommending the right motivator [69].

The agent-environment interaction produces a trajectory of experience consisting of state-action-reward tuples. Actions influence immediate rewards as well as future states and, therefore, future rewards. When the agent takes an action in a state, the transition dynamics function p(s′, r | s, a) formalizes the state transition probability: it gives the probability of transitioning to state s′ with reward r from state s when taking action a.
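A minimal Python sketch of one step of this loop is shown below. The `env`, `agent`, and `memory` interfaces are hypothetical illustrations of the description above, not names from the authors' implementation:

```python
from typing import Hashable, List, Tuple

# One experience tuple (s_t, a_t, r_{t+1}, s_{t+1}).
Transition = Tuple[Hashable, Hashable, float, Hashable]


def interaction_step(env, agent, memory: List[Transition]) -> Hashable:
    """One discrete time step of the agent-environment loop described above."""
    state = env.observe()                      # therapist records a behavior -> S_t
    action = agent.select(state)               # recommended motivator -> A_t
    reward, next_state = env.respond(action)   # student responsiveness -> R_{t+1}, S_{t+1}
    memory.append((state, action, reward, next_state))  # store transition in memory M
    return next_state
```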

Modeling MSP as an MDP

Research suggests that, in various clinical settings, modeling treatment decisions through MDPs is effective and can yield better results than therapists’ intuition alone [13]. However, there are no previous attempts to model the MSP as an MDP. Careful formulation of the problem and of the state/action space is essential to obtain satisfactory results and to satisfy the Markov assumption that the current time point (t) depends only on the previous time point (t − 1) [69]. Figure 1 shows how the MSP, represented by the ABA intervention, is mapped to an MDP in this work. The following subsections describe how each component of the MDP is used to model the MSP.

State

One of the most challenging and critical issues in designing the MDP model is to properly identify the factors that influence the effectiveness of a motivator, especially when these factors may differ from one child to another. The personalization of intervention can be achieved by carefully determining the features that represent the state space [57]. Through a careful review of the research investigating motivational stimuli for students with ASD, the features outlined in Table 1 were considered.

Table 1.

Features representing the state space

Feature Description Number of values References
Contextual features
Antecedent event (trigger) Event or activity that immediately preceded a problem behavior (alone, given a direction or demand, transitioned to new activity, denied access to an item) 4 Bhuyan et al. [15], Stichter et al. [67]
Time of Day Time of the day the problem behavior occurred (morning, noon, evening) 3 Burns et al. [17]
Subject The aim of this feature is to account for the place and person the problem behavior occurred with (academic subjects, therapy sessions, home) 8 Burns et al. [17]
Behavior
Behavior The problem behavior that requires intervention, grouped into seven categories (aggression, self-injury, disruption, elopement, stereotypy, tantrums, non-compliance) 7 Stevens et al. [66]
Behavior Function The reason the behavior is occurring (sensory stimulation, escape, access to attention, access to tangibles) 4 Alstot and Alstot [4]
History
Last unsuccessful motivator The ID of the last motivator used that was not successful in motivating the student within an episode, including an option for “none” 7
Motivator past usage The number of times each motivator was used within a week, grouped into categories of < 5, 5–10, > 10. This factor is composed of six features according to the number of motivator groups (actions) available (edibles, sensory, activities, tokens, social, choice) 36 Çetin [20]

Therapists and teachers aim to identify appropriate interventions for multiple settings. However, these interventions may fail if no attention is given to contextual differences [67]. Contextual features such as antecedent events, time of day, and location (where and with whom) all impact the child’s response to a proposed intervention, and therefore inform the choice of optimal motivators. Moreover, while interventionists aim to track and remediate problem behaviors, understanding the reason behind the occurrence of a behavior is as essential as the behavior itself for creating appropriate behavior plans [55].

Problem behaviors in special education are numerous and diverse. In this study, challenging behaviors are grouped into eight widely observed categories [66]: aggression (e.g., hitting, biting), self-injury (e.g., head-banging, hitting walls), disruption (e.g., yelling, knocking things over), elopement (e.g., wandering, escaping), stereotypy (e.g., rocking, hand-flapping), tantrums (e.g., crying, screaming), non-compliance (e.g., whining, defying orders), and obsession (e.g., constantly talking about the same topic).

Keeping track of the last ineffective motivator used is essential in our problem definition to maintain the Markov property, whereby the future state and reward depend only on the current state and action [69]. We consider this feature part of the state to prevent suggesting the same motivator repeatedly. Moreover, we keep track of the number of times a motivator group was used to prevent satiation [44, 52]. While studies have shown that extrinsic reward does not directly harm a child’s intrinsic motivation [18], we consider the repeated long-term use of tangible rewards, such as edibles or tokens, to have a negative impact when not carefully administered, and therefore to warrant limits [74].
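To illustrate, the sketch below encodes the Table 1 features as a hashable state suitable for indexing a tabular Q-function. The category labels are abbreviations of the table entries for illustration only, not identifiers from the authors' implementation:

```python
from dataclasses import dataclass, astuple
from typing import Tuple

# Categorical values paraphrased from Table 1 (labels are illustrative).
ANTECEDENTS = ("alone", "demand", "transition", "denied_access")   # 4 values
TIMES_OF_DAY = ("morning", "noon", "evening")                      # 3 values
FUNCTIONS = ("sensory", "escape", "attention", "tangibles")        # 4 values
USAGE_BINS = ("low", "medium", "high")   # weekly usage bin per motivator group


@dataclass(frozen=True)
class MSPState:
    antecedent: str
    time_of_day: str
    subject: str                 # one of eight subject/place categories
    behavior: str                # one of the problem-behavior categories
    behavior_function: str
    last_unsuccessful: str       # motivator ID within the episode, or "none"
    usage: Tuple[str, ...]       # one usage bin per motivator group

    def key(self):
        """Hashable key for the Q-table."""
        return astuple(self)
```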

Actions

There has been controversy regarding what type of reward best motivates children with SEND to follow routines and complete academic tasks without negatively impacting their future behavior. Nevertheless, there is strong evidence that rewarded children report higher intrinsic motivation than non-rewarded ones [18].

However, the dilemma regarding which motivator is best suited to each intervention remains. Many factors impact the choice of the right contingent reward (motivator) during a therapy or academic session. According to ABA techniques, there is a need to address what happens before the behavior, what the behavior itself is, and what is done immediately after the behavior. In this study, the goal is to recommend an action (contingent motivator) that can be given to the student after completing a certain task or complying with a certain command. The teacher or therapist needs to decide which motivator to use from a list of six motivator categories (see Table 2): edibles, sensory, activities, tokens, social, and choice [20]. For example, if a student is yelling to get the teacher’s attention, the teacher may promise the student a favorite food item (edible) if the student stops yelling and completes her task. Alternatively, the teacher may assign a leadership role (social) as a motivator once the student is done with the activity. If another student is wandering to escape a task, the teacher may promise extra computer time (activity) once the student completes the task at hand. Therapists also consider the long-term effect of the motivator. For example, edible items, especially unhealthy choices, should be avoided. Repetitive use of the same motivator should also be avoided to prevent satiation. Experienced interventionists sometimes use the same motivator for a specific period of time to establish a routine but change it later to prevent the student’s dependency on that particular reward to complete the tasks.

Table 2.

Motivators categories

Motivator Description
Edible Food items, such as fruits, snacks, and juice
Sensory Items or activities that provide pleasure to the senses of the child, such as listening to music, sitting in a rocking chair, or playing with sand
Activity Activities may include drawing, playing with the computer, or jumping on a trampoline
Token Tangible items that the child values, such as stickers, money, or stars on an honor chart
Social Attention or interaction with another person, such as high-fives, smiles, and praise
Choice Giving the child the chance to choose between two different items or methods, such as asking whether she prefers to use a pencil or crayons to write
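A direct encoding of this action space, following the six categories of Table 2 (the enum and its names are our illustrative sketch, not the authors' code):

```python
from enum import Enum


class Motivator(Enum):
    """The six motivator categories of Table 2, used as the action space."""
    EDIBLE = "edible"
    SENSORY = "sensory"
    ACTIVITY = "activity"
    TOKEN = "token"
    SOCIAL = "social"
    CHOICE = "choice"
```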

Rewards

The reward in our problem definition is the measure of student motivation after introducing the motivator. We adopt in this study the subjective measure of responsiveness proposed by Koegel and Egel [36] as shown in Table 3. The teacher or therapist rates the student’s responsiveness after introducing a motivator and carrying out an activity.

Table 3.

Scale of child’s responsiveness (adapted from Koegel and Egel [36])

Output Description Reward
Negative Child continues problem behavior (tantrums, kicking, screaming) or does not comply with instructions and engages in behavior unrelated to the activity (rocking, yawning, tapping) − 1
Neutral Complies with instructions but tends to get restless or loses attention + 2
Positive Performs task readily. Attends to task quickly, smiles while doing the task, and presents appropriate behavior + 4
Rejected recommendation The user rejects the motivator recommendation and does not introduce it to the child − 0.25
Edible item The motivator selected was an edible item − 1
Token item The motivator selected was a token item − 0.5

Each student-responsiveness category results in the agent receiving a reward, as shown in Table 3. The agent receives a reward of − 1 if the motivator did not work or the student’s response was negative. Alternatively, it receives + 2 if the response was neutral, or + 4 if the response was positive. If the caregiver chooses not to follow the recommendation, the reward is − 0.25. In formulating the problem, we also aim to balance two competing objectives: receiving positive responsiveness from the student, and limiting long-term exposure to unhealthy items. The notion of “safe reinforcement learning” has been proposed in the literature, especially for recommender systems that aim to balance users’ satisfaction with the avoidance of recommending harmful items, such as violent movies [30]. Therefore, the agent receives a penalty of − 1 when recommending edibles and − 0.5 when recommending tokens.
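Putting Table 3 together, a possible reward function looks like the sketch below. We assume the edible/token penalties combine additively with the responsiveness reward, which the paper does not state explicitly:

```python
# Responsiveness rewards from Table 3.
RESPONSE_REWARD = {"negative": -1.0, "neutral": 2.0, "positive": 4.0}


def msp_reward(accepted: bool, response: str, motivator: Motivator) -> float:
    """Scalar reward combining student responsiveness with 'safe RL' penalties."""
    if not accepted:
        return -0.25                   # caregiver rejected the recommendation
    r = RESPONSE_REWARD[response]
    if motivator is Motivator.EDIBLE:
        r -= 1.0                       # penalty for recommending edibles
    elif motivator is Motivator.TOKEN:
        r -= 0.5                       # smaller penalty for tokens
    return r
```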

Q-learning

To solve the proposed MDP problem, we used a Q-learning algorithm with an epsilon-greedy (ε-greedy) policy and a decaying exploration rate. Q-learning is an off-policy, value-based RL algorithm that aims to find the best action to take given the current state, seeking to learn a policy that maximizes the total reward. Q-learning is considered off-policy because the Q-function learns from actions chosen according to a behavior policy that differs from the policy being updated. A policy is equivalent to an ABA-based intervention protocol, with the advantage of capturing more individualized details of students. In our case, the agent chooses actions according to an ε-greedy policy while learning the optimal policy. ε-greedy is a method for balancing exploration and exploitation, where epsilon (ε) is the probability of exploring (i.e., choosing a random action) rather than exploiting (i.e., choosing the optimal action). The policy is derived from a table that maps all state-action pairs to value estimates. While following the ε-greedy policy, the agent exploits with probability (1 − ε) and explores with probability ε. This probability decays over time at some rate as the agent learns more about the environment: the agent becomes “greedy” in terms of exploiting, and the probability of exploration shrinks. Once the agent is “well-trained,” it can select the best action given the state; this is described as acting according to an optimal policy [69].

The value Q(s, a) estimates the score that state s receives under action a; it is updated according to Eq. 1 (the Q-learning update), which is based on Bellman’s optimality equation [11].

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right) \tag{1}$$

where α is the learning rate, r is the observed reward, s′ is the new state, γ < 1 is the discount factor for future rewards, and max_{a′} Q(s′, a′) is the estimate of the maximum reward that can be obtained by taking some future action a′ in state s′. The learning process can continue for any number of episodes. In our case, an episode ends when the student becomes motivated. The Q-learning algorithm can be found in “Appendix 1.”
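In code, the tabular update of Eq. 1 can be written as in the sketch below, using the study's settings of α = 0.1 and γ = 0.95 (reported later in this section) and a zero-initialized Q-table matching the cold start described below:

```python
from collections import defaultdict

Q = defaultdict(float)  # Q(s, a) values, zero-initialized (cold start)


def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.95) -> None:
    """Eq. 1: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```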

While it may seem straightforward to apply standard learning algorithms to learn the agent’s optimal policy and then use it to recommend motivators to the user, this approach cannot be applied to our problem in practice. Unlike traditional reinforcement learning tasks such as Atari games [47], therapy recommendation tasks cannot interact with the user repeatedly to obtain arbitrary amounts of experience for updating the policy toward an optimal one [38]. Moreover, there are no previously collected data on which to train the algorithm offline before the online interaction. Therefore, we do not vary the experimental parameters in this study.

Additionally, this study starts from a “cold start,” as all values in the Q-table were set to zero before the deployment phase. A cold start can be problematic because it burdens users with many interactions before enough experience is collected for learning [76]. On the other hand, online learning is beneficial for therapy recommendations due to the highly dynamic nature of children’s preferences and responses to intervention. Moreover, online learning allows us to obtain the user’s feedback by tracking whether the suggested motivator was used or not [5, 41].

Each episode starts when a caregiver records a behavior on the mobile app. Then, it is terminated by reaching the final state in which the student becomes motivated. We use a learning rate α of 0.1 and a discount γ of 0.95. We apply an ε-greedy policy that starts with a high ε of 0.9 to encourage state exploration. Then, it decays exponentially with a rate of 0.99 until it reaches 0.05.
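A sketch of ε-greedy selection with the stated schedule (start at 0.9, decay rate 0.99, floor at 0.05) is shown below. Whether ε decays per recommendation or per episode is not specified in the paper, so the sketch decays it per recommendation; it reads the Q-table defined in the earlier sketch:

```python
import random


class EpsilonGreedy:
    """epsilon-greedy motivator selection with exponentially decaying epsilon."""

    def __init__(self, actions, eps=0.9, decay=0.99, eps_min=0.05):
        self.actions = list(actions)
        self.eps, self.decay, self.eps_min = eps, decay, eps_min

    def select(self, state):
        if random.random() < self.eps:                            # explore
            action = random.choice(self.actions)
        else:                                                     # exploit
            action = max(self.actions, key=lambda a: Q[(state, a)])
        self.eps = max(self.eps_min, self.eps * self.decay)       # decay epsilon
        return action
```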

As shown in Fig. 3, at each time step t, the therapist or teacher records a behavior instance and requests a motivator recommendation. The agent takes the feature representation of the current state and recommends a motivator using the ε-greedy policy. The caregiver then administers the intervention and provides feedback by rating the response of the student. Alternatively, the caregiver can choose not to use the recommended motivator if it is deemed inappropriate, or skip the recommendation if the item is not available (e.g., edible items) or cannot be applied to the current activity (e.g., choice). When the agent chooses to exploit, it selects the action with the highest Q(s, a) for the observed state in the Q-table. Otherwise, the agent “explores” by selecting a random action.

Fig. 3.

Fig. 3

“Motivator Selection” feature

Methodology

To evaluate the proposed model, we deployed a mobile app that can be used by caregivers. This mobile app has been developed and tested in our previous study as an app that facilitates communication and coordination between different parties involved in the therapy and learning of students with ASD [62]. In this study, we added to the mobile app the feature of selecting a motivator. This feature can be used by the teacher, the therapist, and the parent during a learning or therapy session.

To test the usability of the app and the efficiency of the proposed RL model, we asked 12 teachers and therapists of students registered in a private school to maintain behavioral monitoring data using the mobile app without the “Motivator Selection” feature for four weeks. In this phase, caregivers decided which motivator to use for each case according to their knowledge and experience and recorded this information, along with the success of the motivator, in the app (see Fig. 2). After the four-week period, we administered the System Usability Scale (SUS) questionnaire, a quick measure of the perceived usability of a system [16]. We then updated the app to include the “Motivator Selection” feature and asked the caregivers to use it for four weeks as well (see Fig. 3). When the users (therapists or teachers) use the “Motivator Selection” feature, they first record the problem behavior, the antecedent of the behavior, and the behavior function by choosing from drop-down lists (Fig. 3a). After clicking the “Suggest Motivator” button, the RL algorithm suggests a motivator to be used (Fig. 3b). If the user “declines” the suggested motivator, the algorithm suggests a new one. If the user “accepts” the suggested motivator, the mobile app prompts the user to choose the student’s response to the motivator from a drop-down list (Fig. 3c). Once the response is entered, the app takes the user back to the “Motivator Suggestion” screen (Fig. 3a) to record a new behavior, choose another student, or exit the app. We then administered the SUS again and compared the results to the SUS score previously obtained to answer the first research question [62]. To evaluate the performance of the Q-learning algorithm, we first applied a Chi-square test to answer the second research question, and then applied correlation and regression analysis to answer the third and fourth research questions.

Fig. 2.

Fig. 2

Behavior monitoring without the “Motivator Selection” feature

The study design was limited by how teachers were assigned to classes. As most participating teachers and therapists taught all of the participating students, it was hard to conduct controlled trials and avoid spillovers. Spillovers occur when a treatment affects those in the control group [6]. In our case, there was no way to assign the participants randomly to two separate groups. If students were assigned to groups, a student might have a teacher who teaches students in both groups, which would prevent us from measuring the true usability value of the feature. Conversely, if teachers were assigned to groups, they might teach students who have teachers from both groups, which would prevent us from measuring the impact of the RL policy on student motivation. Therefore, this study follows a one-group pretest–posttest quasi-experimental research design, in which the same variables (i.e., usability of the app and student motivation) are measured in one group of participants before and after using the “Motivator Selection” feature [34]. The term “quasi” indicates that the design resembles experimental research; however, since participants are not randomly assigned to different groups, quasi-experimental design is considered non-experimental research and lacks the advantages of a control group. An advantage of this method, on the other hand, is that it can conveniently assess an intervention on the target participants. Moreover, this design allows for statistical analysis of the data using recognized methods [68].

Participants

Consent was obtained from 12 parents of children with ASD studying at the same private school. The consent covered using the child’s data in the mobile app and allowing teachers and therapists to use the app to track the academic progress and behavior of the child. The children’s ages ranged from 6 to 9 years, and they were all diagnosed with a middle-range (Level 2) to severe (Level 3) form of ASD (see Table 4). All participating students attended self-contained classrooms for students with SEND in a private K-12 school. Self-contained classrooms are usually separated from general education classrooms but located within the same school building [63]. All participating children met the following criteria: (1) attended self-contained classrooms for students with SEND, (2) had an IEP, and (3) had a comprehensive behavioral plan. These students were taught by special education teachers in addition to specialized behavioral and occupational therapists.

Table 4.

Learners’ demographics

Category N
Learners’ gender
Female 1
Male 11
Learner’s age
6 3
7 5
8 2
9 2
Level of ASD
Level 2 5
Level 3 7

The teachers and therapists of these twelve students were invited to participate in the study by assessing the quality of the recommendations they received from the app. This resulted in the participation of ten teachers and two therapists (see Table 5). The participating therapists had extensive experience in the use of behavioral therapy techniques with students with SEND in general, and with students with ASD in particular. The participating teachers, on the other hand, had not received any formal training in behavioral therapy techniques but had varied experience dealing with children with ASD in terms of tracking behavior and using motivators in class.

Table 5.

Teachers’ and therapists’ demographics

Category N
Gender
Female 11
Male 1
Nationality
Egyptian 7
Jordanian 3
Syrian 2
Job description
Subject teacher 10
Occupational Therapist 1
Behavior Specialist 1

Ethical considerations

One challenge in applying RL in experimental settings is exploration. In other domains, such as game playing and movie recommendation, experiments can be repeated as many times as needed. In our clinical setting, the RL agent has to learn online with limited previously collected data. Using trial and error to explore all possible states may conflict with therapy and education ethics [42]. However, in this study, the caregiver has the ability to dismiss any suggested motivator, either because of its unavailability or because of the caregiver’s belief that the suggested motivator will not be effective. The motivator suggestion therefore aims to augment the decision-making process of the therapist, rather than replace it.

Moreover, the process of ABA-based therapy requires that the therapist vary the motivator choices, which sometimes requires trying motivators that may not work. As the caregiver maintains control over which motivators to use, this study does not pose any identifiable or foreseeable risk to any participant beyond normal daily risks.

Finally, the use of the app does not require caregivers to change any of the normal learning activities and environments where participating students usually engage.

Usability evaluation

To evaluate the usability of the application with the RL algorithm, and to answer the first research question, we conducted a user study by administering the SUS questionnaire (see “Appendix 2”). The SUS questionnaire was developed in 1996 by Brooke as a quick measure of the perceived usability of a system [16]. Numerous studies have confirmed the validity and reliability of the questionnaire for various platforms and with limited numbers of participants [7, 39]. The questionnaire was presented to participants in two forms, an English version and an Arabic version [2]. To calculate the SUS score, the score contribution of each item was calculated; the average score across all participants’ responses is the value of the system usability [16].
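For reference, standard SUS scoring works as sketched below: odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the summed contributions are scaled by 2.5 to the 0–100 range [16]; the reported usability is the mean over participants. The function names are ours:

```python
def sus_score(ratings):
    """Score one participant's ten 1-5 SUS ratings on the 0-100 scale."""
    assert len(ratings) == 10
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,... vs 2,4,6,...
                     for i, r in enumerate(ratings)]
    return 2.5 * sum(contributions)


def mean_sus(all_ratings):
    """System usability: the average SUS score over all participants."""
    scores = [sus_score(r) for r in all_ratings]
    return sum(scores) / len(scores)
```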

RQ1: Does adding the “motivator selection” feature to the IEP-Connect app increase the usability of the app?

To answer the first research question, we compared the average SUS score for the IEP-Connect app before using the “Motivator Selection” feature, which was 80.42, with the average SUS score after introducing the feature, which was 84.38. The score improved, as shown in Fig. 4.

Fig. 4.

Fig. 4

New SUS mean score compared to the SUS mean score of the previous iteration

Q-learning algorithm performance evaluation

Compared to other machine learning approaches, RL lacks an agreed-upon standard for performance evaluation [42]. While this problem is not unique to RL, it is harder to address than in other machine learning settings that rely on accuracy and precision/recall as performance indicators. Calculating the precision and accuracy of an algorithm usually requires an offline dataset divided into training and testing sets. As this study does not benefit from an offline dataset, the effectiveness of the proposed algorithm is evaluated through statistical and qualitative analysis [68].

Descriptive summary

During the first four weeks, participants manually recorded the use of motivators through the behavior monitoring page in the app. In this phase, teachers and therapists presented students with motivators according to the treatment approaches they usually follow. They also recorded whether the motivator was effective or not (Fig. 2). During this period, teachers and therapists entered 490 behavior monitoring entries. Of those entries, 223 (45.5%) contained motivators that worked and 267 (54.5%) contained motivators that did not work. Table 6 presents the descriptive summary for the dataset collected before using the RL algorithm.

Table 6.

Descriptive summary for “no-algorithm” dataset variables

Variable Total Per Mean SD Min Max
Users 12 Research
Students 12 Research
Days 30 Research
Entries 490 User (therapists and teachers) 40.833 40.653 6 141

In the following four weeks, participants were asked to use the “motivator selection” feature in the app to aid them in identifying the right motivator rather than choosing motivators manually. In this phase, teachers and therapists recorded the behavior problem, the antecedent, and the behavior function (Fig. 3), and the app would suggest a motivator to be used. The therapist or teacher could decide whether to use the suggested motivator or reject it and ask for another suggestion. After using a motivator, the user would record the student’s response to the motivator. Each entry is an instance where the app suggested a motivator. If the user “declines” a motivator (i.e., chooses not to use it), the app immediately proposes a new one, and the newly recommended motivator becomes a new entry. This continues until the user “accepts” a motivator (i.e., uses the suggested motivator). If the accepted motivator is effective (i.e., the student becomes motivated), the episode ends. If the accepted motivator is not effective (i.e., the student does not become motivated), the user can request another motivator or record a new behavior problem. An episode consists of the steps in which a motivator was “declined” or not effective, up to and including the step in which the motivator used is marked as effective.

Table 7 presents the descriptive summary for the dataset collected after deploying the RL algorithm. An episode comprises all the steps until the student becomes motivated, including motivators that were dismissed and not used. This resulted in 598 episodes, with an average of 2.06 steps per episode. Figure 5 shows the percentage of effective, not effective, and declined motivators when using the motivator selection feature compared to when therapists and teachers selected the motivator by themselves.

Table 7.

Descriptive summary for the “algorithm” dataset variables

Variable Total Per Mean SD Min Max
Users 12 Research
Students 12 Research
Days 32 Research
Entries 1231 User (therapists and teachers) 102.58 98.72 8 376
Episodes 598 Day 38.469 26.39 1 91
Steps (episode length) Episode 2.06 2.35 1 22
Reward − 2.00 4.00
Reward sum Episode 2.68 1.478 − 4.50 4.00

Fig. 5.

Fig. 5

Percentage of effectiveness of motivator with and without using the motivator selection feature

Algorithm evaluation

Three statistical techniques were applied to evaluate the performance of the Q-learning algorithm: Chi-square test, correlation, and regression analysis. Significance level was set by the researcher at α = 0.05.

RQ2: Does student motivation significantly increase when using the “motivator selection” feature compared to the traditional motivator selection methods?

To compare the effectiveness of selecting motivators with and without the “motivator selection” feature, only the entries that included a motivator that was actually used are considered. This resulted in 671 entries, of which 602 were motivators that worked and 69 were motivators that did not work. To answer the research question, a Chi-square test of independence was conducted to test whether the application of the algorithm has a significant effect on motivating students. The test revealed that applying the algorithm is significantly associated with more motivated students, Pearson χ2 = 265.16, p < 0.001. The cross-tabulation in Table 8 indicates that 45.5% of students were motivated without applying the algorithm, while 89.6% were motivated with the algorithm applied. The contingency coefficient of 0.432, p < 0.001, indicates that students being motivated is significantly associated with the algorithm’s application, and that this association is relatively strong.

Table 8.

Cross-tabulation of algorithm versus status

Status Total
Not motivated Motivated
Algorithm
 Without 267 223 490
54.5% 45.5% 100.0%
 With 69 594 663
10.4% 89.6% 100.0%
Total 336 817 1153
29.1% 70.9% 100.0%

Pearson χ2 = 265.16, p < 0.001. Symmetric measures: contingency coefficient = 0.432, p < 0.001

0 cells (0.0%) have an expected count less than 5. The minimum expected count is 142.79
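The reported statistic can be reproduced from the Table 8 counts, for example with SciPy as sketched below (passing correction=False yields the uncorrected Pearson chi-square; this is our illustration, not the authors' analysis script):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Cross-tabulation from Table 8: rows = without/with algorithm,
# columns = not motivated / motivated.
observed = np.array([[267, 223],
                     [69, 594]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}, dof = {dof}")    # chi2 ~ 265.16
print(f"minimum expected count = {expected.min():.2f}")  # ~ 142.79, as reported
```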

A binomial logistic regression was also performed to ascertain the effect of the algorithm’s application on the likelihood that students would be motivated. The logistic regression model was statistically significant, χ2(1) = 273.337, p < 0.001, as shown in Table 9. The model explained 30.1% (Nagelkerke R2) of the variance in students’ motivation and correctly classified 74.7% of cases. Students were 10.307 times more likely to be motivated when the algorithm was applied than when it was not.

RQ3: When using the RL algorithm for the “motivator selection”, does reward significantly increase over time?

Table 9.

Summary of logistic regression analysis for algorithm application predicting students’ motivation

Variable B SE Wald df Sig Exp(B)
Algo(1) 2.333 0.156 222.987 1 0.000 10.307
Constant − 0.180 0.091 3.940 1 0.047 0.835

− 2LL = 1118.132, Nagelkerke R2 = 0.301, χ2 = 273.337, df = 1, p < 0.001

Classification accuracy = 74.7%

The reward data for the applied algorithm were plotted against time to make the pattern of the reward data easy to observe, as shown in the scatterplot in Fig. 6. Correlation analysis revealed a significant positive association between reward and sequence, r = 0.081, p = 0.036. Although the association is weak in magnitude, it is positive and statistically significant, indicating that reward significantly increases over time. This allows us to proceed to a regression analysis that shows how reward increases over time.

Fig. 6.

Fig. 6

Scatterplot of reward over time

Regression analysis of reward

One of the aims of the study is to find whether rewards increase over time, i.e., to test the hypothesis: H1: Time (sequence) has a positive impact on rewards. Given the dependent variable (reward) and the independent variable (time sequence), a simple linear regression analysis was conducted to investigate how reward is predicted over time. The results of the regression analysis, reported in Table 10, show that the regression model is significant, F(1,661) = 4.412, p = 0.036, and explains 0.7% of the variance in reward. The t test of the regression coefficient shows a significant predictor, t = 2.101, p = 0.036. That is, time, which refers to each step taken to apply the algorithm, is a significant contributor to the positive change in reward. In other words, each step taken to apply the algorithm would increase reward by 0.00037 (or from 0.000024 to 0.00071). Using G*Power 3.1.9.7 [24], effect size and test power were estimated. With 663 cases, α = 0.05, and R2 = 0.007, the effect size η2 was estimated to be 0.007 and the power (1 − β) to be 0.579.

Table 10.

Regression analysis summary for sequence predicting reward

Variable B 95% CI β t p
(Constant) 2.386 [2.125, 2.647] 17.944 < 0.001
Sequence 0.000367 [0.000024, 0.000710] 0.081 2.101 0.036

R2 = 0.007, R2adj. = 0.005, CI = confidence interval for B
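A simple way to run such an analysis is sketched below with statsmodels (the function and variable names are ours, not from the authors' analysis): it regresses the reward series on its time sequence and returns the slope, its p-value, and R2.

```python
import numpy as np
import statsmodels.api as sm


def reward_trend(rewards):
    """Regress reward on its time sequence 1..n; returns slope, p-value, R^2."""
    sequence = np.arange(1, len(rewards) + 1)
    X = sm.add_constant(sequence)                    # intercept + slope terms
    fit = sm.OLS(np.asarray(rewards, float), X).fit()
    return fit.params[1], fit.pvalues[1], fit.rsquared
```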

Deeper regression analysis of reward

Considering that a reward higher than 2 is already high, a simple linear regression analysis was conducted using only the data for rewards of 2 or below. The analysis produced a significant regression model, F(1,257) = 7.519, p = 0.007, with R2 = 0.028. The correlation coefficient between reward and sequence is 0.169, stronger than the 0.081 obtained for the full data. Moreover, the regression coefficient of 0.001 indicates that low rewards significantly increase over time; that is, for each new step, reward increases by 0.001 (Table 11). The regression line is shown in Fig. 7. Using G*Power 3.1.9.7 [24], effect size and test power were estimated. With 259 cases, α = 0.05, and R2 = 0.028, the effect size η2 was estimated to be 0.029 and the power (1 − β) to be 0.777.

RQ4: When using the RL algorithm for the “motivator selection”, does the episode length (number of steps) significantly decrease over time?

Table 11.

Regression analysis summary for sequence predicting reward (< 3)

Variable B 95% CI β t p
(Constant) 0.475 [0.136, 0.813] 2.764 0.006
Sequence 0.000627 [0.000177, 0.001077] 0.169 2.742 0.007

R2 = 0.028, R2adj. = 0.025, CI = confidence interval for B

Fig. 7.

Fig. 7

Scatterplot of reward (< 3) over time

To answer this question, the scatterplot of episode length against time was investigated first to check whether episode length shows a negative pattern over time (see Fig. 8). The scatterplot shows a negative pattern, as the fit line moves downward. A correlation analysis revealed a significant negative association between episode length and sequence, r = − 0.161, p < 0.001. This encouraged running a regression analysis to discover how episode length decreases over time.

Fig. 8.

Fig. 8

Scatterplot of episode length against time

Regression analysis of episode length

A simple linear regression analysis was conducted to investigate how episode length changes with sequence. A significant regression equation was found, F(1,596) = 15.917, p < 0.001, with an R² of 0.026. The predicted episode length is equal to 2.758 − 0.001 × (sequence) steps. That is, episode length decreases by 0.001 steps (95% CI [0.000518, 0.001524]) with each episode (Table 12), indicating that an episode requires fewer steps over time. Using G*Power 3.1.9.7 [24], effect size and test power were estimated: with 598 cases, α = 0.05, and R² = 0.026, the effect size η² was estimated to be 0.027 and the power (1 − β) to be 0.979.
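As a worked check of the fitted line, using the Table 12 coefficients and assuming the sequence index runs from 1 to 598 (the number of cases):

\[
\widehat{\text{length}} = 2.758 - 0.001021 \times \text{sequence}
\]
\[
\widehat{\text{length}}(1) \approx 2.757, \qquad \widehat{\text{length}}(598) \approx 2.758 - 0.611 = 2.147
\]

that is, the expected episode length shrinks by roughly 0.6 steps over the observed period.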

Table 12. Regression analysis summary for sequence predicting episode length

Variable     B            95% CI                     β        t        p
(Constant)   2.758        [2.366, 3.149]                      13.841   < 0.001
Sequence     −0.001021    [−0.001524, −0.000518]     −0.161   −3.990   < 0.001

R² = 0.026, R²adj. = 0.024; CI = confidence interval for B

Deeper regression analysis of episode length

As episodes of 1 or 2 steps are already short, a regression analysis was conducted using only episodes with more than 2 steps. The analysis revealed a stronger association between episode length and sequence. The resulting regression model was significant, F(1,126) = 4.951, p = 0.028, with R² = 0.038. Moreover, the regression coefficient B = −0.002 indicates that with each new episode, the number of steps significantly decreases by 0.002 (Table 13). This relationship is shown in Fig. 9. Using G*Power 3.1.9.7 [24], effect size and test power were estimated: with 128 cases, α = 0.05, and R² = 0.038, the effect size η² was estimated to be 0.040 and the power (1 − β) to be 0.607.

Table 13. Regression analysis summary for sequence predicting episode length (> 2)

Variable     B            95% CI                     β        t        p
(Constant)   6.482        [5.312, 7.652]                      10.965   < 0.001
Sequence     −0.001903    [−0.003596, −0.000210]     −0.194   −2.225   0.028

R² = 0.038, R²adj. = 0.030; CI = confidence interval for B

Fig. 9 Scatterplot of episode length (> 2) against time

Discussion

This study aimed to address the problem of selecting motivators to be used in a learning setting with learners with ASD. To this end, the problem of selecting a motivator was modeled as an MDP. The factors that impact the effectiveness of a motivator were considered based on applied behavior analysis as well as learners’ individual preferences. The states, actions, and rewards were designed through careful consideration of the research that investigates motivation stimuli for learners with ASD. The MDP was then solved using reinforcement learning and added as a feature to a mobile app (IEP-Connect) to aid therapists and teachers in selecting the right motivator to use.

Prior to adding the “motivator selection” feature to the IEP-Connect mobile app, teachers and therapists used the app to record the learner’s progress toward the learning objectives, as well as the learner’s behavior along with the motivators the therapists selected according to their own methods. To test the usability of the mobile app with the added “motivator selection” feature, we compared the SUS scores before and after adding the feature. The SUS can be considered a standard instrument for iterative software usability evaluation [7]. In our study, SUS scores were collected as part of each iteration of the mobile app and related to the new features added to the software. This provided an effective way of determining whether the tool design becomes more or less usable with each iteration. However, no statistical analysis was performed on the SUS scores to determine whether the change was significant, because participants’ answers to the SUS questionnaires were anonymous, making a pre/posttest statistical analysis impossible. Another limitation of the method used is that quasi-experimental methods are susceptible to threats to internal validity, especially those related to observing the same participants over time. Without a control group, it cannot be concluded with certainty that the treatment is what caused the change in scores [68]. On the other hand, the positive trends identified in this study indicate that the app usability is headed in the right direction, especially since the introduction of new features in a technology tool usually results in an initial drop in usability scores as users become accustomed to the new addition [7].
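For context, a SUS score is computed from the ten item responses using the standard scoring rule [16]; the sketch below shows that rule, with made-up responses rather than study data:

```python
# Standard SUS scoring: odd items contribute (response - 1), even items
# contribute (5 - response); the sum is scaled to a 0-100 range.
def sus_score(responses):                       # ten responses on a 1..5 scale
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # illustrative input -> 85.0
```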

We then investigated whether students’ motivation increases when using the RL model compared to when therapists and teachers selected the motivator without the aid of the “motivator selection” feature. While this study is limited by the number of participants as well as the study period, the analysis of entries gave us insights into how well the new feature supported the decision-making process of the mobile app users. When comparing the number of administered motivators that worked when the therapists used their own methods versus when they used the mobile app feature, the results indicated that using the “motivator selection” feature significantly increased the number of times learners were motivated. While these results demonstrate the feasibility of RL to support the decision-making process with regard to the motivator selection problem, the reliability of these results requires further investigation, because the “motivator selection” feature was used by a homogeneous group of users from the same context. Many factors might affect the results if the study were to be conducted in other contexts. One of the main factors that may impact the results is the experience of the users. In this study, ten of the participants did not have experience in ABA-based methods and usually relied on the therapists to recommend motivators to use. As communication between the different people involved with the intervention of a learner with ASD is considered a challenge [61, 62], teachers usually find themselves using the same motivator repeatedly, and the repeated administration of a motivator causes satiation. The RL model, on the other hand, was designed in a way that prevents suggesting the same motivator repeatedly, as sketched below.
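A minimal sketch of this design goal, under our own assumptions rather than the exact implementation, is to exclude the most recently administered motivator from the greedy choice:

```python
# Greedy motivator choice that skips the last administered motivator,
# falling back to the full action set if it is the only option.
def choose_motivator(state, actions, last_action, Q):
    candidates = [a for a in actions if a != last_action] or actions
    return max(candidates, key=lambda a: Q.get((state, a), 0.0))

# Hypothetical usage, with a Q-table stored as a dict:
Q = {("off_task", "social"): 1.2, ("off_task", "edible"): 2.0}
print(choose_motivator("off_task", ["social", "edible"], "edible", Q))  # social
```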

Another point worth discussing concerns the “declined” suggestions when using the “motivator selection” feature. The number of entries that were “declined” was 560. This high number of declined suggestions has several causes. First, teachers and therapists declined the suggested motivator when the suggested item was not available. The highest number of declined motivators was for “edible” motivators: edible motivators might be easy to administer at home, but in school they are not available unless previously planned. Other items were declined when the teacher or therapist believed that the suggested motivator would not work or would be hard to administer at that moment. Moreover, teachers declined motivators they were not familiar with or had no experience administering.

While the number of declined suggestions can be considered high, they were not counted as “Not Effective” motivators in the analysis for various reasons. First, these motivators were not tested to determine whether they were effective in a particular setting. Moreover, we cannot obtain comparable data on the motivators that were considered and not used when the therapists and teachers chose the motivators themselves. Additionally, the aim of this study is not to measure the accuracy of the recommendations. Rather, we hypothesize that the use of the motivator selection feature in the mobile app, in tandem with the judgment of the therapist, will result in the use of motivators that work more often than when the therapist does not use this feature. Thus, we assume that the RL-based model can augment the therapists’ and teachers’ capabilities rather than replace their role. However, as with any application of machine learning to intervention and therapy, establishing the long-term impact of the proposed model on learners’ motivation requires longer study periods and a larger number of participants.

Another point worth discussing is the reward system proposed in this study. The penalties imposed on the use of edible and token motivators may have decreased the percentage of successful motivators. That is because the use of edibles, despite being harmful in the long run, is usually very effective in the short term [52]. Moreover, the use of token economies has proven to be very effective in behavior intervention for learners with autism [45]. Nevertheless, we aimed in this study to encourage teachers and therapists to administer social and activity-based motivators more often [52].

Finally, the regression analysis of the data obtained when using the RL algorithm aimed to investigate whether the episode reward improves and the episode length declines over time as the model is trained. The results showed that the changes in reward and episode length over time are significant, though small. This limitation is a result of the short period of time over which the algorithm was trained. The low R² value indicates that the independent variable (sequence) does not explain much of the variation in the dependent variable (reward), regardless of significance; that is, sequence accounts for little of the variance in reward. The R² value can be improved by extending the study period, as we believe that with the continued use of the algorithm through the app, the performance of the algorithm will continue to improve. Moreover, the effect size is considered small. However, even small effect sizes can have scientific or clinical significance, depending on the study field [28].

Implications

This research is part of continuous work in developing a mobile application that facilitates collaboration and communication between different stakeholders working with students with SEND.

The study took place during the COVID-19 pandemic, which resulted in disruptions such as schools closing or shifting to distance learning. This unique context highlighted the issues in information sharing and collaboration between school and home regarding children’s academic and behavioral plans [64]. Moreover, it was reported that during distance learning, students in general, and students with special needs in particular, felt less motivated and engaged with their lessons [14]. Additionally, behavioral plans had to change according to whether the student was in a distance, blended, or on-site learning program. However, the challenges in behavior management coordination are not exclusive to distance learning settings: many studies reported issues in coordination between school and home, where communication is often one-sided [59]. Moreover, teachers and therapists often lack the time and resources needed to train parents on effective behavior management techniques [64].

With so many challenges in place, it became more apparent that an innovative and dynamic way of communication and progress monitoring should be adopted. Moreover, as parents’ responsibility for managing their children’s behavior increased, the need for specialized decision support systems for parents or caregivers without the required experience became increasingly apparent. The use of the IEP-Connect app with its “motivator selection” feature can address the inconsistency in behavior management techniques between home and school.

The results presented demonstrate the feasibility of RL to support the decision-making process of caregivers of students with ASD with regard to the motivator selection problem. Despite its early stage of development, the proposed “motivator selection” feature performed significantly better than the conventional motivator selection methods previously in use. A distinct advantage of RL over other machine learning approaches is that users’ preferences guide the choices of the RL algorithm, which in turn optimizes the reward function, resulting in higher reward values over time. The results indicate that episode rewards increased over time, which indicates that the agent learned to maximize its total earned reward. However, the episode reward did not level out at a high per-episode value, meaning that the agent is not yet behaving optimally at every state. This is due to the large state space compared to the number of episodes the agent experienced. We believe that with the continued use of the algorithm through the app, the performance of the algorithm will continue to improve until it acts optimally at every state.

The proposed RL algorithm was based on scientific research that considered ABA practices, health concerns, and satiation. Moreover, the algorithm encouraged caregivers to incorporate motivational components in academic tasks and give children a choice, which resulted in higher motivation rates. These results add to the existing literature regarding the use of motivators for behavior management. Moreover, the results indicate that the introduction of the “motivator selection” feature increased the usability score of the app. The modeling approach using MDPs can be further improved to provide more accurate and personalized motivator selections. The continuous collection of data from ABA sessions will provide valuable data for research on the behavior of children with ASD and how they are motivated in different settings [17]. Data mining techniques can be applied to the collected data to uncover fundamental patterns regarding the optimal motivator magnitude and to predict the likelihood of the child’s response to intervention. For example, children with inconsistent or reduced responses to motivators may require additional training or therapeutic sessions [13, 56]. Moreover, patterns can be extracted from such data to understand and explain problematic and recurring behaviors or their antecedents, which in turn will help minimize the situations in which such behavior occurs. Using data mining techniques can offer capabilities beyond human comprehension, such as considering numerous contextual features [17].

While RL has the potential to be a significant contribution in the area of therapy decision-making for special needs, certain key issues need to be addressed, such as clinical implementation and ethics [42]. RL-based policies face many challenges before they can be deployed to inform clinical decision-making. One of the most common limitations is the difficulty of obtaining training data, in terms of both time and cost. Thus, the number of trajectories obtained is usually limited compared to those resulting from simulations. Besides the limited number of trajectories, another factor that may impact the estimation of the value function in this type of study is that the data collected on different children can be extremely variable, both across children and within a child over time [58]. Since the proposed RL algorithm runs in a real scenario without the benefit of a training period, the large number of possible state–action pairs has to be considered to avoid coverage problems. RL models in clinical settings require iterative refinement to include new data from various sources, as well as longer training periods to increase the system’s knowledge base. Using data from limited sources may result in biased algorithms that do not apply to all scenarios. Additionally, among the ethical concerns raised about AI algorithms is the ownership of generated data and the right to benefit from them [57].

The number of participants and the study period may also be considered limitations. Moreover, the study did not consider a personalized model for each learner due to the large state space compared to the number of participants. Therefore, there is a need for a dynamic adaptation mechanism that allows the agent to efficiently refine its policy toward each learner [72].

Moreover, future work should consider the interpretability of the produced recommendations. The system should be able not only to suggest suitable motivators, but also to explain such choices to the user [50]. “Explainable” AI systems have been shown to increase users’ trust in and acceptance of the system [1]. Other possible directions for future work include modeling the problem as a partially observable MDP in which the data are mapped to a state space that truly represents students’ behavior and context features.

The proposed RL framework can be integrated into a multi-agent ecosystem that aims to improve the coordination among the stakeholders of an IEP. The agents in the system will be able to “learn” over time and adapt to real-world variation. Such systems can aid in the decision-making process of education providers while developing updated knowledge about effective special education practices [13].

Appendix 1

Q-Learning and Algorithm

[Image: Q-learning algorithm pseudocode]
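For reference, a minimal, self-contained sketch of tabular Q-learning with ε-greedy action selection is shown below; the hyperparameter values and the state/action encodings are illustrative assumptions, not the study's exact configuration:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration

Q = defaultdict(float)                 # Q[(state, action)] -> value estimate

def choose_action(state, actions):
    """Epsilon-greedy selection over the candidate motivators."""
    if random.random() < EPSILON:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit

def q_update(state, action, reward, next_state, actions):
    """Q-learning temporal-difference update after observing a transition."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```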

Appendix 2

SUS Questionnaire Items in English and Arabic

See Table 14.

Table 14.

SUS questionnaire items in English and Arabic

[Image: Table 14, SUS questionnaire items in English and Arabic]

Funding

No funding was received for conducting this study.

Availability of data and materials

The data collected and analyzed during the current study are available from the corresponding author on request.

Code availability

The reinforcement learning algorithm code is available from the corresponding author on request.

Declarations

Conflict of interest

The authors declare that there is no conflict of interest related to this work.

Consent to participate

All participants (parents and teachers) provided written informed consent prior to enrollment in the study.

Consent for publication

Not applicable.

Ethics approval

Approval was obtained from the Research Ethics Committee in the British University in Dubai.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Abdi, S., Khosravi, H., Sadiq, S., Gasevic, D.: Complementing educational recommender systems with open learner models. In: Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, pp. 360–365 (2020). 10.1145/3375462.3375520
2. AlGhannam BA, Albustan SA, Al-Hassan AA, Albustan LA. Towards a standard Arabic system usability scale: psychometric evaluation using communication disorder app. Int. J. Hum. Comput. Interact. 2018;34(9):799–804. doi: 10.1080/10447318.2017.1388099
3. Alkashri, Z., Siyam, N., Alqaryouti, O.: A detailed survey of artificial intelligence and software engineering: emergent issues. In: 2020 Fourth International Conference on Inventive Systems and Control (ICISC), pp. 666–672 (2020). 10.1109/ICISC47916.2020.9171118
4. Alstot AE, Alstot CD. Behavior management: examining the functions of behavior. J. Phys. Educ. Recreat. Dance. 2015;86(2):22–28. doi: 10.1080/07303084.2014.988373
5. Arzate Cruz, C., Igarashi, T.: A survey on interactive reinforcement learning: design principles and open challenges. In: Proceedings of the 2020 ACM Designing Interactive Systems Conference, pp. 1195–1209. Association for Computing Machinery (2020). 10.1145/3357236.3395525
6. Baird S, Bohren JA, McIntosh C, Ozler B. Optimal design of experiments in the presence of interference (SSRN Scholarly Paper ID 2900967). Soc. Sci. Res. Netw. 2017. doi: 10.2139/ssrn.2900967
7. Bangor A, Kortum PT, Miller JT. An empirical evaluation of the system usability scale. Int. J. Hum. Comput. Interact. 2008;24(6):574–594. doi: 10.1080/10447310802205776
8. Barnes T, Stamper J. Toward automatic hint generation for logic proof tutoring using historical student data. In: Woolf BP, Aïmeur E, Nkambou R, Lajoie S, editors. Intelligent Tutoring Systems. Berlin: Springer; 2008. pp. 373–382
9. Begoli, E.: Procedural-reasoning architecture for applied behavior analysis-based instructions (2014). https://trace.tennessee.edu/utk_graddiss/2749
10. Begoli E, Ogle CL, Cihak DF, MacLennan BJ. Towards an integrative computational foundation for applied behavior analysis in early autism interventions. In: Lane HC, Yacef K, Mostow J, Pavlik P, editors. Artificial Intelligence in Education. Berlin: Springer; 2013. pp. 888–891
11. Bellman R. Dynamic programming. Science. 1966;153(3731):34–37. doi: 10.1126/science.153.3731.34
12. Bennane A. Adaptive educational software by applying reinforcement learning. Inform. Educ. Int. J. 2013;12(1):13–27
13. Bennett CC, Hauser K. Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 2013;57(1):9–19. doi: 10.1016/j.artmed.2012.12.003
14. Beulah, J.: Progress monitoring through the lens of distance learning (2020). https://red.mnstate.edu/thesis/398
15. Bhuyan, F., Lu, S., Ahmed, I., Zhang, J.: Predicting efficacy of therapeutic services for autism spectrum disorder using scientific workflows. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 3847–3856 (2017). 10.1109/BigData.2017.8258388
16. Brooke J. SUS—a quick and dirty usability scale. Usability Eval. Ind. 1996;189(194):4–7
17. Burns W, Donnelly M, Booth N. Mining for patterns of behaviour in children with autism through smartphone technology. In: Bodine C, Helal S, Gu T, Mokhtari M, editors. Smart Homes and Health Telematics. Berlin: Springer; 2015. pp. 147–154
18. Cameron J, Pierce WD. Reinforcement, reward, and intrinsic motivation: a meta-analysis. Rev. Educ. Res. 1994;64(3):363–423. doi: 10.3102/00346543064003363
19. CDC: Data and statistics on autism spectrum disorder. Centers for Disease Control and Prevention (2020). https://www.cdc.gov/ncbddd/autism/data.html
20. Çetin ME. Determination of reinforcement usage strategies during literacy education of teachers working with students with multiple disabilities in Turkey. Int. J. Educ. Lit. Stud. 2021;9(1):25–32. doi: 10.7575/aiac.ijels.v.9n.1p.25
21. Crutchfield SA, Mason RA, Chambers A, Wills HP, Mason BA. Use of a self-monitoring application to reduce stereotypic behavior in adolescents with autism: a preliminary investigation of I-connect. J. Autism Dev. Disord. 2015;45(5):1146–1155. doi: 10.1007/s10803-014-2272-x
22. Dawson G, Jones EJH, Merkle K, Venema K, Lowy R, Faja S, Kamara D, Murias M, Greenson J, Winter J, Smith M, Rogers SJ, Webb SJ. Early behavioral intervention is associated with normalized brain activity in young children with autism. J. Am. Acad. Child Adolesc. Psychiatry. 2012;51(11):1150–1159. doi: 10.1016/j.jaac.2012.08.018
23. Dyer K. The competition of autistic stereotyped behavior with usual and specially assessed reinforcers. Res. Dev. Disabil. 1987;8(4):607–626. doi: 10.1016/0891-4222(87)90056-4
24. Faul F, Erdfelder E, Buchner A, Lang A-G. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav. Res. Methods. 2009;41(4):1149–1160. doi: 10.3758/BRM.41.4.1149
25. Foster, M.E., Avramides, K., Bernardini, S., Chen, J., Frauenberger, C., Lemon, O., Porayska-Pomsta, K.: Supporting children’s social communication skills through interactive narratives with virtual characters. In: Proceedings of the International Conference on Multimedia - MM ’10, p. 1111 (2010). 10.1145/1873951.1874163
26. Fuchs D, Fuchs LS, Vaughn S. What is intensive instruction and why is it important? Teach. Except. Child. 2014;46(4):13–18. doi: 10.1177/0040059914522966
27. Gräßer F, Beckert S, Küster D, Schmitt J, Abraham S, Malberg H, Zaunseder S. Therapy decision support based on recommender system methods. J. Healthc. Eng. 2017;2017:e8659460. doi: 10.1155/2017/8659460
28. Grace-Martin, K.: Assessing the fit of regression models. The Analysis Factor (2015)
29. Healey JB. Extrinsic reinforcers as a critical component of education for motivating students with special needs. Motiv. Pract. Classr. 2008. doi: 10.1163/9789087906030_009
30. Heger M. Consideration of risk in reinforcement learning. In: Cohen WW, Hirsh H, editors. Machine Learning Proceedings 1994. Burlington: Morgan Kaufmann; 1994. pp. 105–111
31. Hong, D., Li, Y., Dong, Q.: Nonintrusive-sensing and reinforcement-learning based adaptive personalized music recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1721–1724 (2020). 10.1145/3397271.3401225
32. Hudson, H.: Teaching students with autism spectrum disorders: are teachers truly prepared? [Ed.D., Wilmington University (Delaware)] (2020). http://search.proquest.com/pqdtglobal/docview/2425898207/abstract/39201764A57E4FE4PQ/1
33. Khabbaz, A.H., Pouyan, A.A., Fateh, M., Abolghasemi, V.: An adaptive RL based fuzzy game for autistic children. In: 2017 Artificial Intelligence and Signal Processing Conference (AISP), pp. 47–52 (2017). 10.1109/AISP.2017.8324105
34. Kirk, R.E.: Experimental design. In: Handbook of Psychology, 2nd edn. American Cancer Society (2012). 10.1002/9781118133880.hop202001
35. Koegel L, Singh AK, Koegel RL. Improving motivation for academics in children with autism. J. Autism Dev. Disord. 2010;40(9):1057–1066. doi: 10.1007/s10803-010-0962-6
36. Koegel R, Egel A. Motivating autistic children. J. Abnorm. Psychol. 1979;88(4):418–426. doi: 10.1037/0021-843X.88.4.418
37. Kosmicki JA, Sochat V, Duda M, Wall DP. Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning. Transl. Psychiatry. 2015;5(2):e514. doi: 10.1038/tp.2015.7
38. Lei Y, Li W. Interactive recommendation with user-specific deep reinforcement learning. ACM Trans. Knowl. Discov. Data. 2019;13(6):61:1–61:15. doi: 10.1145/3359554
39. Lewis JR, Sauro J. The factor structure of the system usability scale. In: Kurosu M, editor. Human Centered Design. Berlin: Springer; 2009. pp. 94–103
40. Linstead, E., Burns, R., Nguyen, D., Tyler, D.: AMP: a platform for managing and mining data in the treatment of autism spectrum disorder. In: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2545–2549 (2016). 10.1109/EMBC.2016.7591249
41. Liu P, Chen M. Performance evaluation of recommender systems. Int. J. Perform. Eng. 2017;13(8):1246. doi: 10.23940/ijpe.17.08.p7.12461256
42. Liu S, See KC, Ngiam KY, Celi LA, Sun X, Feng M. Reinforcement learning for clinical decision support in critical care: comprehensive review. J. Med. Internet Res. 2020;22(7):e18477. doi: 10.2196/18477
43. Marcu, G., Tassini, K., Carlson, Q., Goodwyn, J., Rivkin, G., Schaefer, K.J., Dey, A.K., Kiesler, S.: Why do they still use paper? Understanding data collection and use in autism education. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 3177–3186 (2013). 10.1145/2470654.2466436
44. Matheson BE, Douglas JM. Overweight and obesity in children with autism spectrum disorder (ASD): a critical review investigating the etiology, development, and maintenance of this relationship. Rev. J. Autism Dev. Disord. 2017;4(2):142–156. doi: 10.1007/s40489-017-0103-7
45. Matson JL, Boisjoli JA. The token economy for children with intellectual disability and/or autism: a review. Res. Dev. Disabil. 2009;30(2):240–248. doi: 10.1016/j.ridd.2008.04.001
46. Mechling LC, Gast DL, Cronin BA. The effects of presenting high-preference items, paired with choice, via computer-based video programming on task completion of students with autism. Focus Autism Other Dev. Disabil. 2006;21(1):7–13. doi: 10.1177/10883576060210010201
47. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–533. doi: 10.1038/nature14236
48. Murphy ES, McSweeney FK, Smith RG, McComas JJ. Dynamic changes in reinforcer effectiveness: theoretical, methodological, and practical implications for applied research. J. Appl. Behav. Anal. 2003;36(4):421–438. doi: 10.1901/jaba.2003.36-421
49. National Center on Intensive Intervention: Reinforcement strategies. U.S. Department of Education, Office of Special Education Programs (2016). https://intensiveintervention.org/intervention-resources/behavior-strategies-support-intensifying-interventions#reinforcement
50. Nunes I, Jannach D. A systematic review and taxonomy of explanations in decision support and recommender systems. User Model. User Adap. Inter. 2017;27(3):393–444. doi: 10.1007/s11257-017-9195-0
51. Préfontaine I, Lanovaz MJ, McDuff E, McHugh C, Cook JL. Using mobile technology to reduce engagement in stereotypy: a validation of decision-making algorithms. Behav. Modif. 2019;43(2):222–245. doi: 10.1177/0145445517748560
52. Rincover A, Newsom CD. The relative motivational properties of sensory and edible reinforcers in teaching autistic children. J. Appl. Behav. Anal. 1985;18(3):237–248. doi: 10.1901/jaba.1985.18-237
53. Roman J, Mehta DR, Sajja PS. Multi-agent simulation model for sequence generation for specially abled learners. In: Satapathy SC, Joshi A, editors. Information and Communication Technology for Intelligent Systems (ICTIS 2017)—Volume 1. Berlin: Springer; 2018. pp. 575–580
54. Sayed WS, Gamal M, Abdelrazek M, El-Tantawy S. Towards a learning style and knowledge level-based adaptive personalized platform for an effective and advanced learning for school students. In: Farouk MH, Hassanein MA, editors. Recent Advances in Engineering Mathematics and Physics. Berlin: Springer; 2020. pp. 261–273
55. Schaeffer, M.: What motivates you: smiles or stickers? Extrinsic vs. intrinsic motivators in a self-contained special education classroom [Thesis, Trinity Christian College] (2018). https://search.proquest.com/openview/137af1da3c1662e00f9f7ff703a82f67/1?pq-origsite=gscholar&cbl=18750&diss=y
56. Schuetze M, Rohr CS, Dewey D, McCrimmon A, Bray S. Reinforcement learning in autism spectrum disorder. Front. Psychol. 2017. doi: 10.3389/fpsyg.2017.02035
57. Shawky D, Badawi A. Towards a personalized learning experience using reinforcement learning. In: Hassanien AE, editor. Machine Learning Paradigms: Theory and Application. Berlin: Springer; 2019. pp. 169–187
58. Shortreed SM, Laber E, Lizotte DJ, Stroup TS, Pineau J, Murphy SA. Informing sequential clinical decision-making through reinforcement learning: an empirical study. Mach. Learn. 2011;84(1):109–136. doi: 10.1007/s10994-010-5229-0
59. Siyam N. Special education teachers’ perceptions on using technology for communication practices. J. Res. Educ. Pract. Theory. 2018;1(2):6–18. doi: 10.5281/zenodo.2537590
60. Siyam N. Factors impacting special education teachers’ acceptance and actual use of technology. Educ. Inf. Technol. 2019;24(3):2035–2057. doi: 10.1007/s10639-018-09859-y
61. Siyam, N.: Using mobile technology for coordinating educational plans and supporting decision making through reinforcement learning in inclusive settings [Thesis, The British University in Dubai (BUiD)] (2021). https://bspace.buid.ac.ae/handle/1234/1879
62. Siyam N, Abdallah S. A pilot study investigating the use of mobile technology for coordinating educational plans in inclusive settings. J. Spec. Educ. Technol. 2021. doi: 10.1177/01626434211033581
63. Spencer TD. Self-contained classroom. In: Volkmar FR, editor. Encyclopedia of Autism Spectrum Disorders. Berlin: Springer; 2013. pp. 2721–2722
64. Spiller, A.N.: Understanding stakeholder communication and coordination for children with behavioral needs [Thesis, University of Michigan] (2020). https://deepblue.lib.umich.edu/handle/2027.42/162555?show=full
65. Stamper J, Eagle M, Barnes T, Croy M. Experimental evaluation of automatic hint generation for a logic tutor. Int. J. Artif. Intell. Educ. 2013;22(1–2):3–17. doi: 10.3233/JAI-130029
66. Stevens, E., Atchison, A., Stevens, L., Hong, E., Granpeesheh, D., Dixon, D., Linstead, E.: A cluster analysis of challenging behaviors in autism spectrum disorder. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 661–666 (2017). 10.1109/ICMLA.2017.00-85
67. Stichter JP, Randolph JK, Kay D, Gage N. The use of structural analysis to develop antecedent-based interventions for students with autism. J. Autism Dev. Disord. 2009;39(6):883–896. doi: 10.1007/s10803-009-0693-8
68. Stratton SJ. Quasi-experimental design (pre-test and post-test studies) in prehospital and disaster research. Prehosp. Disaster Med. 2019;34(6):573–574. doi: 10.1017/S1049023X19005053
69. Sutton RS, Barto AG. Reinforcement Learning: An Introduction, 2nd edn. Cambridge: MIT Press; 2018
70. Thabtah F. Machine learning in autistic spectrum disorder behavioral research: a review and ways forward. Inform. Health Soc. Care. 2019;44(3):278–297. doi: 10.1080/17538157.2017.1399132
71. Trudel L, Lanovaz MJ, Préfontaine I. Brief report: mobile technology to support parents in reducing stereotypy. J. Autism Dev. Disord. 2020. doi: 10.1007/s10803-020-04735-6
72. Tsiakas K, Dagioglou M, Karkaletsis V, Makedon F. Adaptive robot assisted therapy using interactive reinforcement learning. In: Agah A, Cabibihan J-J, Howard AM, Salichs MA, He H, editors. Social Robotics. Berlin: Springer; 2016. pp. 11–21
73. Vannest KJ, Burke MD, Payne TE, Davis CR, Soares DA. Electronic progress monitoring of IEP goals and objectives. Teach. Except. Child. 2011;43(5):40–51. doi: 10.1177/004005991104300504
74. Witzel BS, Mercer CD. Using rewards to teach students with disabilities: implications for motivation. Remed. Spec. Educ. 2003;24(2):88–96. doi: 10.1177/07419325030240020401
75. Yu, C., Liu, J., Nemati, S.: Reinforcement learning in healthcare: a survey. arXiv:1908.08796 (2020)
76. Zhang, C., Wang, S., Aarts, H., Dastani, M.: Using cognitive models to train warm start reinforcement learning agents for human–computer interactions. arXiv:2103.06160 (2021)
77. Zhang YX, Cummings JR. Supply of certified applied behavior analysts in the United States: implications for service delivery for children with autism. Psychiatr. Serv. 2020;71(4):385–388. doi: 10.1176/appi.ps.201900058
78. Zhao, X., Zhang, L., Xia, L., Ding, Z., Yin, D., Tang, J.: Deep reinforcement learning for list-wise recommendations. arXiv:1801.00209 (2019)
79. Zheng, G., Zhang, F., Zheng, Z., Xiang, Y., Yuan, N.J., Xie, X., Li, Z.: DRN: a deep reinforcement learning framework for news recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp. 167–176 (2018). 10.1145/3178876.3185994
