Abstract
Conventional approaches to enhance movement coordination, such as providing instructions and visual feedback, are often inadequate in complex motor tasks with multiple degrees of freedom (DoFs). To effectively address coordination deficits in such complex motor systems, it becomes imperative to develop interventions grounded in a model of human motor learning; however, modeling such learning processes is challenging due to the large DoFs. In this paper, we present a computational motor learning model that leverages the concept of motor synergies to extract low-dimensional learning representations in the high-dimensional motor space and the internal model theory of motor control to capture both fast and slow motor learning processes. We establish the model’s convergence properties and validate it using data from a target capture game played by human participants. We study the influence of model parameters on several motor learning trade-offs such as speed-accuracy, exploration-exploitation, satisficing, and flexibility-performance, and show that the human motor learning system tunes these parameters to optimize learning and various output performance metrics.
Author summary
Examining the learning and acquisition of motor skills in humans when facing complex, high-dimensional tasks is vital for understanding human motor learning, optimizing the performance of human-in-the-loop systems, improving learning outcomes, and facilitating rehabilitation. Toward this goal, we develop a normative model of human motor learning in high-dimensional novel motor tasks and show that it explains experimental data reasonably well. Further, through a model-based investigation, we examine various motor learning trade-offs, such as exploration-exploitation, speed-accuracy, satisficing, and flexibility-performance. These findings provide a foundational insight into how the human brain may balance these trade-offs during learning.
Introduction
Understanding human motor learning and relearning is of critical importance for domains such as motor skill acquisition and rehabilitation. Despite several advances in understanding the theoretical mechanisms underlying motor learning [1], these mechanisms have been studied predominantly with task paradigms that do not consider the high dimensionality of the motor system at multiple levels, termed the ‘degrees of freedom problem’ [2]. Consequently, expanding existing models to investigate motor learning in high-dimensional motor tasks presents non-trivial challenges, such as the simultaneous control of many DoFs and the computational complexity arising from learning in high-dimensional spaces.
A key feature of learning in such high-dimensional motor tasks is handling several trade-offs, such as the classic speed-accuracy trade-off [3], the exploration-exploitation trade-off [4], and the computational simplicity vs. flexibility trade-off [2, 5]. Moreover, cognitive models of decision-making have investigated phenomena such as satisficing [6] under such constraints, but these trade-offs have not been examined in motor learning contexts. In this work, we investigate these trade-offs by developing a model of motor learning in high-dimensional motor tasks. We leverage the idea that the human nervous system uses a small number of motor synergies, defined as coordinated motions of groups of joints, to control and manipulate high-DoF motor systems [7–9]. This allows us to extract low-dimensional learning representations in the high-dimensional motor space, which enables the formulation of a computational model that can tackle the computational complexity of a large number of DoFs.
The approach of Ref [10] is closest to ours, with the key distinction that [10] considers discrete tasks, while we focus on a continuous learning paradigm, which includes adaptation in the presence of continuous visual feedback. This shift in the experimental paradigm necessitates the reformulation of the model from first principles and the inclusion of a human motion perception model. Initial efforts toward modeling motor learning on novel motor tasks in the presence of continuous visual feedback can also be found in [11], where heuristic methods are used for data fitting. Compared to the model proposed in [11], the current work offers a more refined model, a systematic data fitting procedure, a thorough investigation of various motor learning phenomena, and a larger pool of experimental data to validate the efficacy of our model in explaining motor learning behavior.
The goal of this work is to study human motor learning in high-dimensional novel learning tasks through a computational model that can explain various motor learning phenomena. Toward this end, we first develop an integrated dynamic model of human motor learning through the formation of internal representations, including models for perception, and forward and inverse learning. We use the motor synergies extracted from human hand postures to create low-dimensional learning states, thus tackling the issue of increasing computational complexity with increasing DoFs of motor systems. We establish convergence properties of the proposed model and, after fitting human participant data, show that the proposed model explains human motor learning and output performance behavior well. We then use the proposed model to systematically investigate the influence of model parameters on several motor learning trade-offs, including speed-accuracy, exploration-exploitation, satisficing, and flexibility-performance. This analysis reveals how the motor system optimizes the use of synergies to control a large number of degrees of freedom, how it manages various learning trade-offs, and how satisficing behavior is observed in a motor learning setting.
Motor learning experiment
In our experiment, healthy participants learn a novel motor task by playing a target capture game [12]. Each participant wears a data glove, which records the movements of the 19 finger joints. A body-machine interface (BoMI) then projects the 19-dimensional finger movements onto the movement of a cursor on a 2-D computer screen. Specifically, the BoMI projects finger joint velocities to cursor velocity using a matrix $C \in \mathbb{R}^{n \times m}$ such that
$$\dot{x} = C\,\dot{q}. \tag{1}$$
Here, n = 2 (the 2-D computer screen) and m = 19 (the 19 finger joints). Let $q \in \mathbb{R}^{19}$ be the vector of finger joint angles and $x \in \mathbb{R}^{2}$ the cursor position; thus, $x(t) - x(0) = C\,(q(t) - q(0))$. Since the mapping is linear and the mapping matrix C is time-invariant, the joint velocity to cursor velocity mapping and the joint position to cursor position mapping are equivalent, as long as the initial cursor position is the same. Also note that this setup is different from the experimental paradigm undertaken in other studies, for instance [13], where the joint (IMU) angles are mapped to the end-effector velocities of a robot arm being controlled.
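For concreteness, here is a minimal Python sketch of the forward mapping in Eq (1), with an arbitrary stand-in for the calibrated matrix C (all names are illustrative, not from the study's code):

```python
import numpy as np

n, m = 2, 19                     # screen (cursor) and finger-joint dimensions
rng = np.random.default_rng(0)
C = rng.standard_normal((n, m))  # stand-in for the calibrated projection matrix

q_dot = rng.standard_normal(m)   # finger joint velocities
x_dot = C @ q_dot                # cursor velocity on the 2-D screen (Eq (1))

# Linearity and time-invariance imply the position mapping is equivalent:
# x(t) - x(0) = C (q(t) - q(0)).
```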
Participants first engage in a calibration phase, during which they move their fingers freely while avoiding any extreme range of motion. The corresponding hand posture data is collected through the data glove and centered, and then PCA is performed to extract the principal components (PCs). The first two PCs are used as the rows of the projection matrix C specific to each participant. The game bounds are also specialized to each participant to ensure that all points in the game window are reachable (refer to Methods for more details). During the target capture gameplay, participants must move the cursor to prescribed target points, displayed one after another as the centers of squares on a 5 × 5 square grid; a new target is prescribed only after the current target is captured. The participants train on 4 targets (3 outer targets and 1 center target) for 8 sessions, each comprising 60 target capture movements. The sequence of targets prescribed in each session is randomized, but it always consists of 12 center-out movements to each of the 3 outer targets from the center target and 24 movements to the center target (8 from each of the 3 outer targets). Participants are further instructed to try to capture the prescribed target within 2 s of movement onset from the previously captured target position, failing which the target square is highlighted in red to indicate that they have exceeded the time limit. Although there is no maximum time limit to capture the target, a scoring system based on movement time and accuracy is shown at the top of the game window for motivation purposes (refer to [12] for more details).
Through the gameplay, participants learn how to move various finger joints to make the cursor move along a desired trajectory, and in doing so they learn coordinated finger joint movements that are consistent with the projection matrix C (C is unknown to the participants). This experiment involves learning in high-dimensional human motor systems (the finger joint space), whereas the output performance feedback is in the low-dimensional screen space. Since the mapping C from the 19-dimensional joint space to the 2-dimensional screen space is many-to-one, there are multiple solutions to the inverse kinematic mapping due to the large null space of C. This task is redundant in the sense that a desired cursor movement can be achieved with multiple synergistic motions of finger joints.
Model
One way that motor learning in novel environments can occur is through the formation of internal representations, including models for perception, forward learning, and inverse learning [14–16]. Our formulation of these forward and inverse models follows the convention described in [10], which was originally introduced in [17].
There is also evidence that humans can control a high number of DoFs using a small number of coordinated joint movement patterns called synergies [7, 18–21]. We, therefore, decompose the mapping matrix C = WΦ, where $\Phi \in \mathbb{R}^{h \times 19}$ is a matrix of h basic synergies underlying coordinated human finger motions, and $W \in \mathbb{R}^{2 \times h}$ represents the contributions (weights) of these synergies (see Methods: Extracting motor synergies for details on how the synergy matrix Φ is formed). Without loss of generality, the rows of Φ are assumed to be orthogonal, i.e., the synergies contributing to the hand motions lie in orthogonal spaces. While our motor learning model is in a high-dimensional space, these synergies reduce the size of the learning space and enable efficient learning by reducing the amount of exploration.
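The following sketch illustrates this decomposition, assuming orthonormal synergy rows (so the weights can be recovered by projection); a synthetic Φ stands in for the PCA-derived one:

```python
import numpy as np

rng = np.random.default_rng(1)
h, m = 4, 19
Q, _ = np.linalg.qr(rng.standard_normal((m, h)))
Phi = Q.T                        # h x m synergy matrix with orthonormal rows

W = rng.standard_normal((2, h))  # true synergy weights
C = W @ Phi                      # mapping matrix in the span of the synergies

W_rec = C @ Phi.T                # projection recovers W for orthonormal rows of Phi
assert np.allclose(W_rec, W)
```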
Human perception model of BoMI
Since our experimental paradigm consists of humans learning under continuous feedback, we propose a perception model that explains how humans process continuous feedback signals. The BoMI mapping (Eq (1)) is described in terms of joint and cursor velocities. However, we hypothesize that humans interacting with the BoMI perceive these velocities as increments in cursor positions and finger joint angles. Consequently, we re-write (Eq (1)) using filtered joint and cursor position data [22] as
$$\delta x = C\,\delta q, \tag{2}$$
where δx and δq are termed filtered increments in cursor positions, and filtered increments in joint angles, respectively. Here δq evolves as per the dynamic equation
$$\dot{\delta q} = -a\,\delta q + \dot{q} + \xi_q, \tag{3}$$
where the parameter a controls the smoothing weights assigned to previous velocities, and ξq is the perceptual noise (with intensity σq) that captures the inaccuracies in human motion perception (refer to S1 File: Section 1.1 for a detailed derivation). We call a the perceptual recency parameter.
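A discrete-time sketch of this perceptual filter, assuming the reconstructed form of Eq (3) above, with the noise scaled by √dt in the Euler step as is standard for white noise:

```python
import numpy as np

def filter_increments(q_dot_traj, a=10.0, sigma_q=0.1, dt=0.01, seed=0):
    """Filtered joint-angle increments for a (T x m) velocity trajectory."""
    rng = np.random.default_rng(seed)
    T, m = q_dot_traj.shape
    delta_q = np.zeros(m)
    out = np.empty((T, m))
    for t in range(T):
        xi_q = sigma_q * np.sqrt(dt) * rng.standard_normal(m)  # perceptual noise
        delta_q += dt * (-a * delta_q + q_dot_traj[t]) + xi_q  # Eq (3), Euler step
        out[t] = delta_q
    return out
```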
The forward learning model
Learning the forward model corresponds to learning the forward BoMI mapping matrix C, which maps the finger motions to the cursor motion. We represent the participant’s implicit estimate of C at time t by $\hat{C}(t)$. Correspondingly, for the participant’s change in finger joint angles $\delta q$, the participant’s estimated change in cursor position $\hat{\delta x}$ is determined by the forward mapping estimate $\hat{C}(t) = \hat{W}(t)\,\Phi$:
$$\hat{\delta x} = \hat{C}(t)\,\delta q = \hat{W}(t)\,\Phi\,\delta q, \tag{4}$$
where $\hat{W}(t) \in \mathbb{R}^{2 \times h}$ is the estimate of the matrix W. It represents the estimated weights that the human participant assigns to each synergy at time t. The learning space is therefore reduced from $\mathbb{R}^{2 \times 19}$ to $\mathbb{R}^{2 \times h}$, a significant reduction, thus making our proposed model much more tractable. It follows from (Eqs (2) and (4)) that the estimation error
$$e := \delta x - \hat{\delta x} = \bigl(C - \hat{C}(t)\bigr)\,\delta q = -\,\tilde{W}(t)\,\Phi\,\delta q, \tag{5}$$
where $\tilde{W}(t) := \hat{W}(t) - W$ is called the parameter estimation error.
Applying gradient descent on $\tfrac{1}{2}\lVert e \rVert^{2}$ with respect to $\hat{W}$ leads to
$$\dot{\hat{W}} = \gamma\, e\,(\Phi\,\delta q)^{\top}, \tag{6}$$
where (⋅)⊤ represents the transpose and γ > 0 is the rate at which the human participant learns the forward mapping. Accordingly, we call γ the forward learning rate. We posit (Eq (6)), with γ as a tunable parameter, as a model for human forward learning dynamics. Similar models, whose dynamics evolve as a result of reducing some error metric, have been previously used in the motor learning literature [10, 23], thus making our proposed model consistent with the error-based human motor learning paradigm.
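A one-step sketch of this update rule (Eq (6)), with δx, δq, Φ, and Ŵ as defined above:

```python
import numpy as np

def forward_update(W_hat, Phi, delta_x, delta_q, gamma, dt):
    """Euler step of the forward-model learning dynamics (Eq (6))."""
    s = Phi @ delta_q            # joint increments expressed in synergy space
    e = delta_x - W_hat @ s      # sensory prediction error (Eq (5))
    return W_hat + dt * gamma * np.outer(e, s)
```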
The inverse learning model
Learning the inverse model requires identifying the finger joint motions that are needed to drive the cursor on the screen to the right target by achieving the target hand posture.
Given the current cursor position x and the desired cursor position xdes, we define ex = xdes − x. We hypothesize that humans choose their nominal joint velocities u to minimize the cost function
$$J(u) = \frac{1}{2}\,\bigl\lVert \hat{C}\,u - k_P\,e_x \bigr\rVert^{2} + \frac{\mu}{2}\,\lVert u \rVert^{2}, \tag{7}$$
where the minimum of the first term determines a u that makes the human’s estimate of cursor velocity equal to the error-driven proportional feedback kPex, for some kP > 0, and the second term ensures that the joint velocities are not too high, with the parameter μ > 0 determining the admissible joint velocities. We refer to kP and μ as the control and optimality parameters, respectively.
Our model posits that humans compute their joint velocities by performing a gradient descent on J, i.e., $\dot{u} = -\eta\,\nabla_u J + \xi_u$, where η > 0 is the inverse learning rate and ξu is the exploratory noise, modeled as white noise with intensity σu, which captures the inaccuracies in the computation of the gradient and the exploration by humans in the joint velocity space. Upon simplification, the u dynamics are
$$\dot{u} = -\eta\,\Bigl[\bigl(\hat{C}^{\top}\hat{C} + \mu I\bigr)\,u - k_P\,\hat{C}^{\top}e_x\Bigr] + \xi_u. \tag{8}$$
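A corresponding sketch of one Euler step of the inverse-learning dynamics (Eq (8)):

```python
import numpy as np

def inverse_update(u, W_hat, Phi, e_x, eta, k_P, mu, sigma_u, dt, rng):
    """Noisy gradient descent on the cost J of Eq (7)."""
    C_hat = W_hat @ Phi
    grad = C_hat.T @ (C_hat @ u - k_P * e_x) + mu * u            # gradient of J(u)
    xi_u = sigma_u * np.sqrt(dt) * rng.standard_normal(u.shape)  # exploratory noise
    return u - dt * eta * grad + xi_u
```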
Putting everything together, the model of human motor learning (HML) dynamics is
$$\dot{e}_x = -C\,u, \tag{9a}$$
$$\dot{\hat{W}} = \gamma\, e\,(\Phi\,\delta q)^{\top}, \tag{9b}$$
$$\dot{\delta q} = -a\,\delta q + u + \xi_q, \tag{9c}$$
$$\dot{u} = -\eta\,\Bigl[\bigl(\hat{C}^{\top}\hat{C} + \mu I\bigr)\,u - k_P\,\hat{C}^{\top}e_x\Bigr] + \xi_u, \tag{9d}$$
where $e$ is the estimation error from Eq (5), $\hat{C} = \hat{W}\Phi$, and the ex dynamics are used instead of the x dynamics.
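An end-to-end sketch of one trial of these dynamics (Eq (9)), under the same reconstruction assumptions as above; parameter values and the horizon are illustrative, not fitted:

```python
import numpy as np

def simulate_trial(C, Phi, W_hat, x_des, p, T=200, dt=0.01, seed=0):
    """Euler simulation of Eq (9) for one reaching trial; returns (e_x, W_hat)."""
    rng = np.random.default_rng(seed)
    m = C.shape[1]
    W_hat = W_hat.copy()
    e_x = x_des.copy()                       # cursor assumed to start at the origin
    u, delta_q = np.zeros(m), np.zeros(m)
    for _ in range(T):
        e_x -= dt * (C @ u)                                          # (9a)
        xi_q = p["sigma_q"] * np.sqrt(dt) * rng.standard_normal(m)
        delta_q += dt * (-p["a"] * delta_q + u) + xi_q               # (9c)
        delta_x = C @ delta_q                # perceived cursor increment (Eq (2))
        s = Phi @ delta_q
        W_hat += dt * p["gamma"] * np.outer(delta_x - W_hat @ s, s)  # (9b)
        C_hat = W_hat @ Phi
        grad = C_hat.T @ (C_hat @ u - p["k_P"] * e_x) + p["mu"] * u
        u += -dt * p["eta"] * grad \
             + p["sigma_u"] * np.sqrt(dt) * rng.standard_normal(m)   # (9d)
    return e_x, W_hat
```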
Results
We first establish the stability and convergence properties of the proposed HML model. We summarize our results in this section and refer interested readers to S1 File: Section 2 for detailed proofs.
For an initial forward mapping estimate sufficiently close to the actual mapping C, and inverse learning dynamics sufficiently faster than the forward learning dynamics, the model converges to a neighborhood of the equilibria at which the model learns the true weights W associated with the mapping matrix, the cursor reaches the target position, and the hand posture is stationary. Moreover, the size of the neighborhood is a function of, and decreases with, the exploration noise intensity.
Comparing HML model performance with human experiment data
We simulated the proposed HML model with the parameters obtained by fitting experiment data from six healthy human participants to the HML model. The fitted parameters are shown in Table 1 and the fitting procedure is detailed in Materials and Methods. The trajectory data from the HML model was re-sampled at the same time indices as the finger joint angle data from the data glove for direct performance comparisons. We use the Forward Model Error (FME) [10], defined by

$$\mathrm{FME}(t) = \bigl\lVert \hat{C}(t) - C \bigr\rVert_F = \bigl\lVert \bigl(\hat{W}(t) - W\bigr)\Phi \bigr\rVert_F,$$

as the metric to quantify the convergence of the human estimate of the forward mapping matrix, $\hat{C}$, to the actual forward mapping matrix C = WΦ. The two output performance metrics we use to compare the performance of the HML model are reaching error (RE) and straightness of trajectory (SoT). RE in each trial of the human participant data is calculated as the Euclidean norm of the cursor position from the target at the end of the trial. The end of the trial is defined as the instant when the cursor position has not changed by more than 0.0025 units for 15 consecutive samples, or 2 seconds after the start of the movement, whichever is earlier. SoT is defined as an aspect ratio: the maximum perpendicular distance of the trajectory from the straight line joining the start and end points, divided by the straight-line distance between the start and end points.
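Sketches of the two output metrics as used here, assuming a (T × 2) trajectory array (illustrative helper names):

```python
import numpy as np

def reaching_error(traj, target):
    """Euclidean distance of the final cursor position from the target."""
    return np.linalg.norm(traj[-1] - target)

def straightness_of_trajectory(traj):
    """Max perpendicular distance from the start-end chord, over chord length."""
    start, end = traj[0], traj[-1]
    chord = end - start
    L = np.linalg.norm(chord)
    # Perpendicular distance of each point from the start-end line (2-D cross product).
    d = np.abs(chord[0] * (traj[:, 1] - start[1])
               - chord[1] * (traj[:, 0] - start[0])) / L
    return d.max() / L
```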
Table 1. Model parameters.
Fitted parameter values for the 6 subjects.
| Symbol | Parameter | Subject 1 | Subject 2 | Subject 3 | Subject 4 | Subject 5 | Subject 6 |
|---|---|---|---|---|---|---|---|
| γ | Forward learning rate | 0.0664 | 0.0030 | 0.0456 | 0.1398 | 0.0013 | 0.1252 |
| η | Inverse learning rate | 3.1742 | 3.1448 | 1.5383 | 1.9856 | 2.4916 | 0.7131 |
| μ | Optimality parameter | 2.4581 | 3.3056 | 3.3072 | 3.5735 | 3.5382 | 3.9744 |
| kP | Control parameter | 1.3098 | 1.5965 | 3.2714 | 1.8976 | 1.5569 | 2.2515 |
| σu | Exploration noise intensity | 0.8764 | 1.0165 | 1.0082 | 0.9556 | 1.9749 | 0.9298 |
| σq | Perceptual noise intensity | 0.1370 | 0.5451 | 0.0508 | 0.0169 | 0.7118 | 0.0064 |
| a | Perceptual recency parameter | 10 | 10 | 10 | 10 | 10 | 10 |
Fig 1 compares the RE and SoT, and Fig 2 compares the cursor trajectories obtained from the human data with those from the fitted HML model. The results show that our model reproduces the motor learning of human hand motions very closely. Noisy trajectories in the beginning are due to high exploration noise in the initial trials and to the fact that the initially learned mapping leads to much of the energy being expended in the null space of the mapping matrix; trajectories look straighter as the trials progress and the model learns the mapping. The decrease in FME as a function of trials for the subjects (Fig 3) captures the task learning evolution for the human participants.
Fig 1. Performance measures across subjects.
Temporal evolution of (a) reaching error, and (b) straightness of trajectory (both averaged over a moving window of 10 trials) for subjects (red) and the respective fitted HML model (blue) across trials.
Fig 2. Cursor trajectories.
Cursor trajectory data from the fitted model (a), (c) and human experiments (b), (d). As learning progresses through the 8 sessions, the trajectories become closer to a straight line between targets, which the proposed HML model also captures.
Fig 3. Forward model error.
Evolution of forward model error (FME) for the fitted model as a function of trials for all 6 subjects.
Comparative analysis of the HML model’s efficacy in explaining motor learning
Pierella et al. [10] also developed a computational model of motor learning for high-dimensional episodic/discrete tasks. Extending a model from a trial-by-trial motor learning task to a continuous task with visual feedback is non-trivial. Nevertheless, we fit both the model proposed in Ref [10] and the proposed HML model to the experiment data. For both models, Fig 4 reports the RE fitting error for all subjects, which captures the deviation of each model’s RE from the subject’s experimental RE. We claim the following improvements over [10] in designing a normative model of human motor learning. First, the model fitting errors in Fig 4 show the superior performance of the HML model in capturing the output performance of participants. Second, since the HML model captures continuous learning dynamics, it is possible to extract/estimate a larger set of performance metrics, such as straightness of trajectory [12]. Furthermore, our model introduces a human perception model to account for the continuous visual feedback and incorporates the idea of motor synergies, capturing prior knowledge about a participant’s motor movement behavior. The extracted motor synergies also provide a principled way of initializing the mapping matrix for the model evolution: the weights W on the synergies can be initialized instead of initializing the whole C matrix randomly.
Fig 4. Comparing HML model with Ref [10] model.
Comparing the errors in RE curve fitting from the model in Ref [10] to the HML model shows that the model in Ref [10] is not as accurate as the HML model in capturing the RE for this motor learning task.
Investigating trade-offs in motor learning behavior
We showed that the HML model captures human motor learning behavior in novel learning tasks for high-dimensional motor systems very closely. We now conduct a model-based investigation into the influence of parameter variations on various trade-offs in motor learning behavior. Keeping all other parameters at their fitted values (Table 1), the parameters under study are varied over a range, and their effects on the performance metrics are observed. For brevity, we only discuss the parameters that had a significant effect on the performance metrics.
Exploration versus exploitation trade-off
Motor learning requires balancing the trade-off between exploration of the joint space to learn the desired coordinated movement and exploitation of the currently learned dexterity to drive the cursor motion. We define the driving (exploitative) and exploratory efforts as the projections of the joint velocities (Eq (9d)) onto the row space and the null space, respectively, of the (learned) mapping estimate $\hat{C}$ (Eq (9b)). A typical target-reaching trajectory is assumed to comprise ballistic and learning phases [24]. In the ballistic phase, the learned model is used for finger joint movement generation; after this phase, movement corrections based on visual feedback (the learning phase) begin. To better capture the learning of the forward mapping, we compare these two efforts at the end of the ballistic phase of each trial. The end of the ballistic phase is taken as the point within the trial where the norm of the finger-joint velocities is maximal.
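A sketch of this effort split, projecting the joint velocities onto the row space and null space of the current mapping estimate Ĉ:

```python
import numpy as np

def effort_split(u, C_hat):
    """Return (driving, exploratory) efforts as norms of the two projections."""
    P_row = np.linalg.pinv(C_hat) @ C_hat  # orthogonal projector onto the row space
    u_drive = P_row @ u                    # component that can move the cursor
    u_explore = u - u_drive                # component in the null space of C_hat
    return np.linalg.norm(u_drive), np.linalg.norm(u_explore)
```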
We found that the inverse learning rate η has the most pronounced effect on the exploration-versus-exploitation trade-off. An increase in the inverse learning rate in the inverse dynamics (Eq (9d)) initially decreases the energy expended in the null space of $\hat{C}$, thus increasing the driving effort. Conversely, for smaller values of η, exploration dominates exploitation, and the magnitude of the joint velocities’ projection onto the null space of $\hat{C}$ is higher. The distribution of these efforts across trials, shown in Fig 5, captures this effect: the mean exploratory effort decreases and the mean driving effort increases with increasing η.
Fig 5. Effort variation with η.
Distribution of driving and exploratory effort (averaged across 128 Monte Carlo runs) with means and 95% confidence bounds across trials as η is varied around its fitted value 3.1742. While driving effort increases, exploratory effort decreases initially, and both plateau past the fitted η value. One-tailed paired t-tests over the effort values across trials reveal this plateauing effect at a significance level of p < 0.001.
Additionally, running one-tailed paired t-tests on the distributions of driving effort values across trials between pairs of η values shows a significant difference (at significance level p < 0.001) between the distributions for η smaller than its fitted value; the difference between the distributions plateaus as η increases past its fitted value (Fig 5). These trends and the data fits suggest that human motor learning balances the exploration-exploitation trade-off by tuning η such that driving effort is optimally expended towards the task at hand.
Speed versus accuracy trade-off
We define the speed of the response based on the time it takes to capture the target after the start of the movement in each trial. The target is considered captured if the cursor position does not change by more than 0.0025 units for 15 consecutive samples while being within a radius ρx of the target point. The accuracy of the response is defined by the straightness of the cursor trajectory between the targets, i.e., the root-mean-square (RMS) error between the cursor trajectory and the straight line joining the start and end targets in a trial.
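A sketch of this accuracy metric (RMS deviation of a trajectory from the start-end line; names are illustrative):

```python
import numpy as np

def rms_deviation(traj, start, end):
    """RMS perpendicular distance of a (T x 2) trajectory from the chord."""
    chord = end - start
    L = np.linalg.norm(chord)
    d = (chord[0] * (traj[:, 1] - start[1])
         - chord[1] * (traj[:, 0] - start[0])) / L  # signed perpendicular distance
    return np.sqrt(np.mean(d ** 2))
```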
We found that the control parameter kP has the most influence on the speed-accuracy trade-off. An increase in the control intensity should lead to faster trajectories and thus a decrease in the average trial time. Additionally, as kP increases, the driving effort should start to dominate the exploration noise, and thus the trajectories should look straighter. Fig 6 captures this behavior: speed and accuracy go up as kP increases up to a certain value. Comparing the accuracy values across trials around the fitted value of kP (1.3098) to other values of kP using one-tailed paired t-tests shows a significant increase (at significance level p < 0.001) in accuracy around the fitted kP value. This may be an outcome of the motor system optimizing the speed-accuracy trade-off under constrained inverse learning effort.
Fig 6. Speed and accuracy variation with kP.
Across trial distribution (averaged over 128 Monte Carlo runs) of speed and accuracy with means and 95% confidence bounds as kP is varied around its fitted value 1.3098. Accuracy is highest around the fitted value (p < 0.001) and past that speed increases while accuracy decreases.
Satisficing
Satisficing stands for satisfaction and sufficing [6]. In the context of human decision making, it refers to the tendency of humans to settle for ‘good enough’ options rather than seeking optimal ones. In our context, this behavior can be seen in terms of how well the mapping is learned relative to the target size. We model the sufficing level by turning off the learning after a desired threshold on the Forward Model Error (FME) is achieved, which captures how well the weights W on the synergies have been learned. Satisfaction is modeled using a desired upper bound on the reaching error achieved for a fixed trial time, which is effectively captured by the target radius, denoted by ρx. We thus study how the learning behavior changes as these thresholds change.
To study satisficing, we quantify performance by the probabilities with which the trajectories enter targets of different sizes ρx at the end of the prescribed trial time, for different learning thresholds (different lower bounds on FME) and different trial times. Intuitively, for a particular learning level (FME threshold), larger target sizes (higher ρx) lead to better performance (increased success probability). Also, lower FME thresholds are needed at small trial times compared to long trial times to achieve a particular performance level. This is supported by the curves in Fig 7, which show the probabilities with which the trajectories hit targets of different sizes ρx for increasing FME threshold across different trial times. Satisficing behavior is observed as the curves start to plateau at low FME values; that is, lower values of the learning threshold, which enable better learning, do not necessarily improve the success probability.
Fig 7. Satisficing effect.
Probabilities of entering the targets as a function of target size and learning threshold (FME) for different trial times. For smaller trial times, lower learning thresholds (high learning accuracy) are required to achieve high success probabilities for the same target sizes. Satisficing behavior is observed at high learning accuracy (low FME) levels, where learning with higher accuracy does not necessarily increase success probabilities. The zoomed view (right) for probability curves at high learning accuracy (low FME) levels for 1.2s trial time shows the satisficing effect. Curves are average success probabilities with 95% confidence bounds over 1280 Monte Carlo runs of the HML model. Values of ρ are relative to the size of the unit cell of the game grid.
Moreover, the learning curves appear to be sigmoid functions; the slope of the sigmoid decreases as the target size decreases for each trial time. Also, the inflection point moves towards higher FME values as both the target sizes and trial times increase. Furthermore, for lower satisfaction levels (higher ρx values), the highest success probability does not correspond to the highest sufficing levels (lowest FME thresholds), meaning that enforcing better learning can end up hurting overall task performance. This effect is more pronounced at smaller trial times.
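The Monte Carlo estimate underlying such curves can be sketched as follows, where run_trial is an assumed helper that simulates one trial under a given FME learning threshold and returns the final reaching error:

```python
import numpy as np

def success_probability(run_trial, rho_x, n_runs=1280):
    """Estimate P(final reaching error < rho_x) over independent runs."""
    errors = np.array([run_trial(seed) for seed in range(n_runs)])
    return float(np.mean(errors < rho_x))
```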
Flexibility versus performance
We now explore the effect of the number of motor synergies used in the model on the forward model error when the BoMI mapping matrix does not lie in the span of the selected synergies. The higher the number of synergies used, the better an arbitrary mapping matrix can be represented in the space of synergies. However, there is a trade-off, as learning with a higher number of synergies requires higher exploratory effort, captured through the exploration noise intensity σu. This is corroborated by the results in Fig 8, which shows the variation of FME with σu and the number of synergies at the end of session 4. We see minima in FME at fewer than 19 synergies (the maximum number of synergies) across all exploration noise intensities. Thus, using a higher number of synergies can lead to convergence to the true mapping matrix C; however, for limited learning time and limited exploration, using a smaller number of synergies gives better performance, as convergence can be much faster, albeit not to the true C matrix. This effect is more pronounced around the fitted value of σq = 0.1370; for σq = 0.1, using 10 synergies gives the lowest FME.
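A sketch of the sweep behind Fig 8, where run_sessions is an assumed helper that simulates the training sessions with h synergies and noise intensity σu and returns the learned mapping estimate:

```python
import numpy as np

def fme_grid(run_sessions, C, sigma_u_grid, h_grid):
    """FME after training, for each (sigma_u, h) pair."""
    return np.array([[np.linalg.norm(run_sessions(h, s) - C, ord="fro")
                      for h in h_grid] for s in sigma_u_grid])
```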
Fig 8. FME variation with σu and number of synergies.
FME as a function of increasing σu and number of synergies used. For limited training time, using more synergies is not always the most optimal strategy. Minimum FME (blue cells) is achieved at synergies lower than 19.
Discussion
The aim of this work is two-fold: to design a computational model of human motor learning for a de-novo (novel) learning task that requires participants to capture targets on a computer screen through hand-finger motions, and to leverage this model to study various motor learning trade-offs that arise when learning high-dimensional motor tasks. Toward developing a computational model for human motor learning in a high-dimensional continuous learning task, we tackle the issue of high motor DoFs by leveraging motor synergies extracted from the human participant to create low-dimensional learning representations, and we include a perception model to account for the continuous visual feedback during the motor task. We then utilize the internal model theory of motor learning to obtain a computational HML model that comprises fast and slow varying forward and inverse learning models. We also establish the exponential convergence properties of the proposed learning model using singular perturbation arguments. We then fit the experimental data from 6 human participants to the HML model, showing that the proposed model captures human motor learning behavior well. Finally, we use the proposed model to study the following trade-offs in human motor learning: exploration-exploitation, speed-accuracy, satisficing, and flexibility-performance.
Motor synergies to capture high-dimensional motor learning
There is strong evidence for the use of postural synergies by the human nervous system to control high-dimensional motor systems. The literature is replete not only with studies explaining a large number of human hand postures using a small number of postural synergies extracted from kinematic recordings [7, 18], but also with works verifying the encoding of synergistic information for hand postural control at the neural level in human motor cortical regions [25–27], including the use of fMRI brain data to predict hand postures. It is only natural that participants would employ these existing motor synergies while learning to play our target capture game; thus, incorporating synergies in our proposed HML model helps explain the underlying motor learning process more closely. It should be noted, however, that our model can also capture learning in tasks where participants have to learn non-synergistic coordination patterns (for example, flexion of the thumb accompanied by extension of the fingers) simply by not factorizing the mapping matrix into weights and synergies and using C as is.
HML model and its parameters
Studies in the motor learning literature have shown that motor learning in the target capture task cannot be accounted for by mechanisms of motor adaptation, but rather by de-novo motor skill learning, which consists of developing a controller from scratch without interfering with pre-existing controllers. Although there are some theories on what the motor learning process for de-novo learning might look like (for example, the new controller could be assembled through reinforcement learning [28, 29]), in this work we entertain the possibility of de-novo learning taking place through the formation and simultaneous updating of internal models, including those of perception and the forward and inverse models, an idea first introduced in [17].
The coevolution of forward and inverse models of motor learning has a strong footing in the literature [16, 30–32], [33] [Chapter 9]. However, it has mostly been studied for motor adaptation tasks; care must be taken when generalizing these ideas of dynamically evolving internal models from motor adaptation paradigms to motor skill learning [1]. Nevertheless, a few works do explore the coevolution of these internal models in de-novo learning task paradigms [10, 34], of which [10] developed a computational model that could explain motor skill learning in an episodic de-novo learning task. We develop a normative model of de-novo motor learning through gradient descent on two separate error functions [15] for continuous feedback tasks. These error functions are constructed using the sensory prediction errors for the forward model [35, 36], and a weighted combination of the task error (a proxy for motor error) and control energy (to account for input energy expended) for the inverse model [17, 37]. If we remove the perception model and the regularization of the task error function from our model, we exactly recover the model in [10] for trial-by-trial learning tasks. Of course, when going from an episodic task domain to one with continuous feedback during trials, we need some notion of motion perception by human participants. In line with theories of motion perception in humans, which elucidate that the visual system detects motion by spatiotemporal correlation between stimuli [38], we design our perception model as a smoothing of (cursor and finger-joint) velocities over signal history to obtain the filtered increments in these signals. This perception model includes the perceptual recency parameter a, which controls the weights assigned to previous velocities, and the perceptual noise ξq, which accounts for inaccuracies in the filtering process. The motivation for including the optimality parameter μ in the inverse learning model was derived from the likely assumption that a substantial cognitive component is involved in optimizing the motions when learning de-novo tasks. Danziger et al. [39] concluded that, when the cursor is represented as the endpoint of a two-link arm, participants tend to minimize the distance in terms of the arm’s joint angles rather than moving the cursor along straight lines on the screen. The second objective, which minimizes the total change in two-link arm joint angles during a target capture, can be included in the cost function (7) to derive the inverse learning dynamics. The corresponding parameter μ, which weighs the trade-off between the task objective (capturing the target with the robot end-effector) and minimizing the change in two-link arm joint angles, would capture the effect observed in that study. A higher value of the optimality parameter would thus correspond to participants trying more strongly to minimize the total joint angle changes during a target capture. However, further investigation is warranted to support this hypothesis, which is beyond the scope of the current work.
Motor exploration and exploratory noise
Through the motor learning behavior investigation, we identify the parameters that have the most significant effects on the metric trade-offs for the learning task considered in this work. Sternad [40] emphasized the importance of variability in learning novel motor skills by exploring a host of possible solutions in the motor space. This effect is captured by the parameters η and σu in our model. The exploration-versus-exploitation trade-off analysis (see S1 Fig) shows that an increase in the exploration noise intensity σu increases the exploratory effort and, consequently, the exploration of possible solutions in the null space of $\hat{C}$ (equivalently, $\hat{W}\Phi$), making the FME converge faster. In contrast, an increase in η has the opposite effect on exploration while simultaneously increasing the driving effort towards exploitation of the learned policy. We therefore hypothesize that the human motor learning system tries to optimize these parameters; deviations from their optimal values result in a non-optimal distribution of exploratory and driving efforts when learning a motor skill.
Fast and slow learning timescales
Although not a central focus of our work, we also found evidence for multiple timescales of learning. Two separate timescales of learning, fast and slow processes, in motor adaptation tasks have been widely studied in the literature. Prevailing theories [36, 41, 42] suggest that the cerebellum is involved in the initial fast learning, followed by changes in the motor cortex during the slow learning phase [43]. We found a timescale separation between the forward and inverse learning dynamics, corroborated by the parameter fits, where γ ≪ η. This suggests that, in our proposed model, the forward learning dynamics evolve at a slower rate, as learning the forward mapping is a process that continues over the whole course of the task. On the other hand, the inverse learning dynamics evolve at a faster rate because the participant has to quickly adapt to generate the optimal finger joint velocities, which are computed as per the participant’s current estimate of the forward mapping, thus allowing the participant to complete the trial most optimally. However, a careful evaluation needs to be undertaken to verify this hypothesis.
Future directions
Having a computational model of HML in high-dimensional motor systems is crucial to advancing our understanding of the underlying learning mechanisms, generating testable hypotheses, guiding the design of effective interventions, and studying the effects of practice schedules, task complexities, and feedback. Many studies and experiments in the last decade have aimed to design and develop exoskeletons for studying hand joint motions [44], as well as for rehabilitation of hand injuries [45]. Recent works [46, 47] have developed assist-as-needed controllers for the rehabilitation of hand fingers, and another work [48] has shown how adaptation in training tasks can modulate the rate of motor learning and affect rehabilitation. The proposed model is capable of explaining motor learning behaviors more broadly, and we believe the experimental paradigm we used is quite general for several reasons. First, the experiment involves learning to control a high number of degrees of freedom mapped to a low-dimensional task, which is a critical factor in many motor learning tasks. Second, the use of kinematic “synergies” for hand movements is well documented [7], enabling the task to capture various features of the model effectively. We plan to test the generality of the model on a broader set of learning paradigms going forward.
Future work includes leveraging the developed HML model and the insights drawn from it for the design of such assist-as-needed controllers and adaptive training schedules toward optimizing task performance and/or learning. Furthermore, we can generate hypotheses to experimentally validate various motor learning trade-offs. For example, our model suggests satisficing behavior during motor learning. This can be empirically evaluated by varying the target sizes ρx and examining the relationship between the success probabilities and learning threshold levels. Various metrics can be used as a measure of a participant’s learning threshold level, such as straightness of trajectory, reaching error, etc. Another prediction of the model is that, under constrained learning time, using a smaller number of synergies should lead to more robust learning performance. A possible experiment could evaluate the minimum number of synergies employed by a particular participant to learn the task if we keep perturbing/changing the mapping matrix periodically during the experiment. Specifically, by performing PCA on the collected data, we can identify the number of dominant synergies as a function of variation in the mapping.
Methods
Experiment procedure
The human participant data for the current paper is from one of the groups from a previous study [12], specifically the group with full visual feedback. The experiment procedure is briefly outlined below, but for more details interested readers are referred to [12].
An experimental session lasts approximately forty-five minutes and consists of three stages.
Calibration phase: In this phase, we perform calibration to design the forward mapping matrix specific to each participant. Participants are asked to perform a sequence of free finger movements, moving their fingers in as many different ways as possible while carefully avoiding any extreme ranges of motion, while wearing the data glove. The corresponding posture information is recorded until 4000–5000 samples are collected (∼70–90 s), centered around the mean posture, and then PCA is performed. The first two principal components (PCs) are used in the mapping matrix C to map the movements of the finger joints to cursor movement in the x and y directions. The two PCs are also scaled by the square roots of their respective eigenvalues to ensure comparable ease of motion in the two directions. The mean posture is calibrated to map to the center of the 5 × 5 grid.
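A sketch of this calibration step; we assume here that the scaling normalizes each PC by the square root of its eigenvalue, which equalizes the cursor-velocity variance in the two directions (the stated goal); the exact convention may differ in the authors' implementation:

```python
import numpy as np

def calibrate_mapping(postures):
    """postures: (N x 19) glove samples; returns a 2 x 19 projection matrix C."""
    X = postures - postures.mean(axis=0)         # center around the mean posture
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    pcs = Vt[:2]                                 # first two PCs as rows
    eigvals = S[:2] ** 2 / (len(X) - 1)          # PCA eigenvalues
    return pcs / np.sqrt(eigvals)[:, None]       # equalize ease of motion in x, y
```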
Familiarization phase: In this phase, participants are asked to move the cursor around freely in a 5 × 5 unit grid (for a limited time, so that no motor learning takes place before the training phase) to get accustomed to the game and the motions, and to ensure that most of the grid is reachable. The latter is achieved by scaling the game window based on the participant’s cursor movement data, which ensures that participants can easily maneuver the cursor across the full screen. Since the game window units are calculated from the scaling data during the calibration phase, they differ across individuals.
Gameplay (training) phase: Each participant trains for 8 sessions, each comprising 60 trials, on 4 target squares (3 outer targets and 1 center target) with centers located at (0.5, 4.5), (2.5, 0.5), (2.5, 2.5), and (4.5, 4.5) units on the screen. A trial comprises one reaching movement from one target to another. A session always starts after the participant reaches the center target, and a target is considered captured if the glove data samples do not change by more than 2 units (∼1°) for 10 consecutive samples while the cursor is inside the target square. The sequence of targets is randomized in each session, but it always consists of 12 center-out movements to each of the 3 outer targets and 24 movements to the center target (8 from each of the 3 outer targets). A score is displayed at the top of the game window, calculated based on the accuracy of the target capture (proximity to the center of the target square) and the time taken to capture the target. Participants are additionally instructed to reach the center of the targets within 2 s of movement start, failing which the target block is highlighted in red and participants incur a time penalty on the score. The scoring system serves no functional purpose other than motivating the participants.
Extracting motor synergies
The matrix Φ is formed using four motor synergies obtained as the first four principal components from the PCA performed on the centered posture data recorded during the calibration phase. While our model can work with any number of synergies, we choose four because previous studies on hand and finger configuration [7, 49] have shown that four synergies span more than 80% of the finger joint configurations.
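Forming Φ is then a matter of keeping the first h = 4 PCs (a sketch; rows are orthonormal by construction):

```python
import numpy as np

def extract_synergies(postures, h=4):
    """Return the h x 19 synergy matrix Phi from centered calibration data."""
    X = postures - postures.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt are the PCs
    return Vt[:h]
```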
Fitting HML model to human participant data
We use the human participant experiment data to obtain the HML model parameters in (Eq (9)) (summarized in Table 1). The task performance is quantified by two metrics, reaching error (RE) and straightness of trajectory (SoT), to ascertain the performance of the HML model while fitting the data. Reaching error in each trial of the human participant data is calculated as detailed in the Results section. SoT is defined as the ratio of the maximum perpendicular distance of the trajectory from the straight line joining the start and end points to the straight-line distance between the start and end points. Owing to the stochastic nonlinearity of the proposed HML model, we use the multi-objective genetic algorithm NSGA-II [50] to find the optimal parameters over the parameter space, using fRE = ‖REmodel − REdata‖2 and fSoT = ‖SoTmodel − SoTdata‖2 as the two objectives, where the subscripts denote whether the metric is formed using the experiment data or the HML model. The duration of each trial of the HML model was kept consistent with the human experiment data so as to have a fair basis for the objective function calculation. NSGA-II was run for 500 generations, using the simulated binary crossover operator and the polynomial mutation operator with rates 0.7 and 0.2, respectively, over a population size of 100. Out of the min{fRE} fits selected from 10 runs of NSGA-II, the parameter fit with minimum fRE and fSoT < avg{fSoT} is chosen for each subject. Table 1 shows the parameter values obtained for the human participant experiment data using NSGA-II. The perceptual recency parameter a was heuristically chosen to be a large value, ∼O(10), and the target size ρx was chosen to be the same as in the experiments.
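A sketch of this fitting loop using the pymoo library (our choice here for illustration; the authors' implementation may differ, and simulate, re_data, and sot_data are assumed helpers/arrays):

```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.operators.crossover.sbx import SBX
from pymoo.operators.mutation.pm import PM
from pymoo.optimize import minimize

class HMLFit(ElementwiseProblem):
    def __init__(self, re_data, sot_data, simulate):
        # Six free parameters: gamma, eta, mu, k_P, sigma_u, sigma_q.
        super().__init__(n_var=6, n_obj=2, xl=1e-4, xu=5.0)
        self.re_data, self.sot_data, self.simulate = re_data, sot_data, simulate

    def _evaluate(self, x, out, *args, **kwargs):
        re_model, sot_model = self.simulate(x)     # run the HML model with params x
        out["F"] = [np.linalg.norm(re_model - self.re_data),    # f_RE
                    np.linalg.norm(sot_model - self.sot_data)]  # f_SoT

# algorithm = NSGA2(pop_size=100, crossover=SBX(prob=0.7), mutation=PM(prob=0.2))
# res = minimize(HMLFit(re_data, sot_data, simulate), algorithm, ("n_gen", 500))
```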
Supporting information
S1 File. Document containing supplementary HML model details, its convergence analysis, and additional model-based investigation into human motor learning behavior results.
(PDF)
S1 Fig. Distribution of driving and exploratory effort across trials as σu is varied around its fitted value 0.8764. Driving effort is highest around the fitted value of σu (p < 0.01), while exploratory effort increases monotonically with σu.
(TIF)
Data Availability
The data and the code files necessary to reproduce the results in the paper can be found at the GitHub repository: https://github.com/nkur/HMLmodel.
Funding Statement
This work was supported by the NSF Grant CMMI 1940950 (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1940950) received by the authors V.S., X.T., and R.R. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Krakauer JW, Hadjiosif AM, Xu J, Wong AL, Haith AM. Motor learning. Comprehensive Physiology. 2019;9(2):613–663. doi: 10.1002/cphy.c170043 [DOI] [PubMed] [Google Scholar]
- 2. Bernstein N. The Coordination and Regulation of Movements. Oxford: Pergamon Press Ltd; 1967. [Google Scholar]
- 3. Fitts PM. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology. 1954;47(6):381. doi: 10.1037/h0055392 [DOI] [PubMed] [Google Scholar]
- 4. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; 1998. [Google Scholar]
- 5. Flash T, Hochner B. Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology. 2005;15(6):660–666. doi: 10.1016/j.conb.2005.10.011 [DOI] [PubMed] [Google Scholar]
- 6. Simon HA. Rational choice and the structure of the environment. Psychological Review. 1956;63(2):129. doi: 10.1037/h0042769 [DOI] [PubMed] [Google Scholar]
- 7. Santello M, Flanders M, Soechting JF. Postural hand synergies for tool use. Journal of Neuroscience. 1998;18(23):10105–10115. doi: 10.1523/JNEUROSCI.18-23-10105.1998 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Gentner R, Gorges S, Weise D, aufm Kampe K, Buttmann M, Classen J. Encoding of motor skill in the corticomuscular system of musicians. Current Biology. 2010;20(20):1869–1874. doi: 10.1016/j.cub.2010.09.045 [DOI] [PubMed] [Google Scholar]
- 9. Dominici N, Ivanenko YP, Cappellini G, d’Avella A, Mondì V, Cicchese M, et al. Locomotor primitives in newborn babies and their development. Science. 2011;334(6058):997–999. doi: 10.1126/science.1210617 [DOI] [PubMed] [Google Scholar]
- 10. Pierella C, Casadio M, Mussa-Ivaldi FA, Solla SA. The dynamics of motor learning through the formation of internal models. PLoS Computational Biology. 2019;15(12):e1007118. doi: 10.1371/journal.pcbi.1007118 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Kamboj A, Ranganathan R, Tan X, Srivastava V. Towards Modeling Human Motor Learning Dynamics in High-Dimensional Spaces. In: American Control Conference. Atlanta, GA; 2022. p. 683–688.
- 12. Ranganathan R, Adewuyi A, Mussa-Ivaldi FA. Learning to be lazy: Exploiting redundancy in a novel task to minimize movement-related effort. Journal of Neuroscience. 2013;33(7):2754–2760. doi: 10.1523/JNEUROSCI.1553-12.2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Ranganathan R, Lee MH, Padmanabhan MR, Aspelund S, Kagerer FA, Mukherjee R. Age-dependent differences in learning to control a robot arm using a body-machine interface. Scientific Reports. 2019;9(1):1960. doi: 10.1038/s41598-018-38092-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Shadmehr R, Mussa-Ivaldi FA. Adaptive representation of dynamics during learning of a motor task. Journal of Neuroscience. 1994;14(5):3208–3224. doi: 10.1523/JNEUROSCI.14-05-03208.1994 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Krakauer JW, Ghilardi MF, Ghez C. Independent learning of internal models for kinematic and dynamic control of reaching. Nature Neuroscience. 1999;2:1026–1031. doi: 10.1038/14826 [DOI] [PubMed] [Google Scholar]
- 16. Wolpert DM, Miall RC, Kawato M. Internal models in the cerebellum. Trends in Cognitive Sciences. 1998;2(9):338–347. doi: 10.1016/S1364-6613(98)01221-2 [DOI] [PubMed] [Google Scholar]
- 17. Jordan MI, Rumelhart DE. Forward models: Supervised learning with a distal teacher. Cognitive Science. 1992;16(3):307–354. doi: 10.1207/s15516709cog1603_1 [DOI] [Google Scholar]
- 18. Tresch MC, Cheung VC, d’Avella A. Matrix factorization algorithms for the identification of muscle synergies: Evaluation on simulated and experimental data sets. Journal of Neurophysiology. 2006;95(4):2199–2212. doi: 10.1152/jn.00222.2005 [DOI] [PubMed] [Google Scholar]
- 19. Berniker M, Jarc A, Bizzi E, Tresch MC. Simplified and effective motor control based on muscle synergies to exploit musculoskeletal dynamics. Proceedings of the National Academy of Sciences. 2009;106(18):7601–7606. doi: 10.1073/pnas.0901512106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Leo A, Handjaras G, Bianchi M, Marino H, Gabiccini M, Guidi A, et al. A synergy-based hand control is encoded in human motor cortical areas. eLife. 2016;5:e13420. doi: 10.7554/eLife.13420 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Al Borno M, Hicks JL, Delp SL. The effects of motor modularity on performance, learning and generalizability in upper-extremity reaching: a computational analysis. Journal of the Royal Society Interface. 2020;17(167):20200011. doi: 10.1098/rsif.2020.0011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Sastry S, Bodson M. Adaptive Control: Stability, Convergence, and Robustness. Prentice-Hall, Inc.; 1989. [Google Scholar]
- 23. Herzfeld DJ, Vaswani PA, Marko MK, Shadmehr R. A memory of errors in sensorimotor learning. Science. 2014;345(6202):1349–1353. doi: 10.1126/science.1253138 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Rosenbaum DA. Human Motor Control. Academic Press; 2009. [Google Scholar]
- 25. Ehrsson HH, Kuhtz-Buschbeck JP, Forssberg H. Brain regions controlling nonsynergistic versus synergistic movement of the digits: A functional magnetic resonance imaging study. Journal of Neuroscience. 2002;22(12):5074–5080. doi: 10.1523/JNEUROSCI.22-12-05074.2002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Gentner R, Classen J. Modular organization of finger movements by the human central nervous system. Neuron. 2006;52(4):731–742. doi: 10.1016/j.neuron.2006.09.038 [DOI] [PubMed] [Google Scholar]
- 27. Santello M, Baud-Bovy G, Jörntell H. Neural bases of hand synergies. Frontiers in Computational Neuroscience. 2013;7:23. doi: 10.3389/fncom.2013.00023 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Cashaback JG, McGregor HR, Mohatarem A, Gribble PL. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning. PLoS Computational Biology. 2017;13(7):e1005623. doi: 10.1371/journal.pcbi.1005623 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Holland P, Codol O, Galea JM. Contribution of explicit processes to reinforcement-based motor learning. Journal of Neurophysiology. 2018;119(6):2241–2255. doi: 10.1152/jn.00901.2017 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Donchin O, Francis JT, Shadmehr R. Quantifying generalization from trial-by-trial behavior of adaptive systems that learn with basis functions: Theory and experiments in human motor control. Journal of Neuroscience. 2003;23(27):9032–9045. doi: 10.1523/JNEUROSCI.23-27-09032.2003 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Taylor JA, Ivry RB. Flexible cognitive strategies during motor learning. PLoS Computational Biology. 2011;7(3):e1001096. doi: 10.1371/journal.pcbi.1001096 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Gonzalez Castro LN, Monsen CB, Smith MA. The binding of learning to action in motor adaptation. PLoS Computational Biology. 2011;7(6):e1002052. doi: 10.1371/journal.pcbi.1002052 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Busemeyer JR, Diederich A. Cognitive Modeling. Sage; 2010. [Google Scholar]
- 34. Krakauer JW, Mazzoni P. Human sensorimotor learning: adaptation, skill, and beyond. Current Opinion in Neurobiology. 2011;21(4):636–644. doi: 10.1016/j.conb.2011.06.012 [DOI] [PubMed] [Google Scholar]
- 35. Krakauer JW. Motor learning: its relevance to stroke recovery and neurorehabilitation. Current Opinion in Neurology. 2006;19(1):84–90. doi: 10.1097/01.wco.0000200544.29915.cc [DOI] [PubMed] [Google Scholar]
- 36. Tseng YW, Diedrichsen J, Krakauer JW, Shadmehr R, Bastian AJ. Sensory prediction errors drive cerebellum-dependent adaptation of reaching. Journal of Neurophysiology. 2007;98(1):54–62. doi: 10.1152/jn.00266.2007 [DOI] [PubMed] [Google Scholar]
- 37. Abdelghani M, Lillicrap TP, Tweed DB. Sensitivity derivatives for flexible sensorimotor learning. Neural Computation. 2008;20(8):2085–2111. doi: 10.1162/neco.2008.04-07-507 [DOI] [PubMed] [Google Scholar]
- 38. Van Santen JP, Sperling G. Elaborated reichardt detectors. JOSA A. 1985;2(2):300–321. doi: 10.1364/JOSAA.2.000300 [DOI] [PubMed] [Google Scholar]
- 39. Danziger Z, Mussa-Ivaldi FA. The influence of visual motion on motor learning. Journal of Neuroscience. 2012;32(29):9859–9869. doi: 10.1523/JNEUROSCI.5528-11.2012 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Sternad D. It’s not (only) the mean that matters: variability, noise and exploration in skill learning. Current Opinion in Behavioral Sciences. 2018;20:183–195. doi: 10.1016/j.cobeha.2018.01.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Smith MA, Ghazizadeh A, Shadmehr R. Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biology. 2006;4(6):e179. doi: 10.1371/journal.pbio.0040179 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Criscimagna-Hemminger SE, Bastian AJ, Shadmehr R. Size of error affects cerebellar contributions to motor learning. Journal of Neurophysiology. 2010;103(4):2275–2284. doi: 10.1152/jn.00822.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Paz R, Boraud T, Natan C, Bergman H, Vaadia E. Preparatory activity in motor cortex reflects learning of local visuomotor skills. Nature Neuroscience. 2003;6(8):882–890. doi: 10.1038/nn1097 [DOI] [PubMed] [Google Scholar]
- 44. Rashid A, Hasan O. Wearable technologies for hand joints monitoring for rehabilitation: A survey. Microelectronics Journal. 2019;88:173–183. doi: 10.1016/j.mejo.2018.01.014 [DOI] [Google Scholar]
- 45. Zhang F, Hua L, Fu Y, Chen H, Wang S. Design and development of a hand exoskeleton for rehabilitation of hand injuries. Mechanism and Machine Theory. 2014;73:103–116. doi: 10.1016/j.mechmachtheory.2013.10.015 [DOI] [Google Scholar]
- 46. Castiblanco JC, Mondragon IF, Alvarado-Rojas C, Colorado JD. Assist-As-Needed Exoskeleton for Hand Joint Rehabilitation Based on Muscle Effort Detection. Sensors. 2021;21(13):4372. doi: 10.3390/s21134372 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Agarwal P, Deshpande AD. Subject-specific assist-as-needed controllers for a hand exoskeleton for rehabilitation. IEEE Robotics and Automation Letters. 2017;3(1):508–515. doi: 10.1109/LRA.2017.2768124 [DOI] [Google Scholar]
- 48. Agarwal P, Deshpande AD. A framework for adaptation of training task, assistance and feedback for optimizing motor (re)-learning with a robotic exoskeleton. IEEE Robotics and Automation Letters. 2019;4(2):808–815. doi: 10.1109/LRA.2019.2891431 [DOI] [Google Scholar]
- 49. Vinjamuri R, Sun M, Chang CC, Lee HN, Sclabassi RJ, Mao ZH. Dimensionality Reduction in Control and Coordination of the Human Hand. IEEE Transactions on Biomedical Engineering. 2010;57(2):284–295. doi: 10.1109/TBME.2009.2032532 [DOI] [PubMed] [Google Scholar]
- 50. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002;6(2):182–197. doi: 10.1109/4235.996017 [DOI] [Google Scholar]