PLOS Computational Biology. 2021 Apr 19;17(4):e1008481. doi: 10.1371/journal.pcbi.1008481

Hierarchical motor adaptations negotiate failures during force field learning

Tsuyoshi Ikegami 1,2,3,*,#, Gowrishankar Ganesh 1,2,4,#, Tricia L Gibo 2,5, Toshinori Yoshioka 2, Rieko Osu 2,6, Mitsuo Kawato 2
Editor: Adrian M Haith 7
PMCID: PMC8084335  PMID: 33872304

Abstract

Humans have the amazing ability to learn the dynamics of the body and environment to develop motor skills. Traditional motor studies using arm reaching paradigms have viewed this ability as the process of ‘internal model adaptation’. However, the behaviors have not been fully explored in the case when reaches fail to attain the intended target. Here we examined human reaching under two types of force field: one that induces failures (i.e., target errors), and one that does not. Our results show the presence of a distinct failure-driven adaptation process that enables quick task success after failures, before internal model adaptation is complete, but that can result in persistent changes to the undisturbed trajectory. These behaviors can be explained by considering a hierarchical interaction between internal model adaptation and the failure-driven adaptation of reach direction. Our findings suggest that humans negotiate movement failure using hierarchical motor adaptations.

Author summary

How do we improve actions after a movement failure? Although negotiating movement failures is obviously crucial, previous motor-control studies have predominantly examined human movement adaptations in the absence of failures, and it remains unclear how failures affect subsequent movement adaptations. Here we examined this issue by developing a novel force field adaptation task in which the hand movement during arm reaching is perturbed by novel forces that induce a large target error, that is, a failure. Our experimental observations and computational modeling show that, in addition to the popular ‘internal model learning’ process of motor adaptation, humans also utilize a ‘failure-negotiating’ process that enables them to quickly improve movements in the presence of failure, even at the expense of increased arm trajectory deflections. These deflections are subsequently reduced gradually with training once task success has been achieved. Our results suggest that a hierarchical interaction between these two processes is key for humans to negotiate movement failures.

Introduction

Imagine you are practicing golf shots at a driving range, aiming to land the ball on the green with a pre-planned ball trajectory. When the ball follows a different, unintended trajectory but still lands on the green, you will almost automatically correct your next hitting action by accounting for the error in the ball trajectory. However, the correction you make will be very different if the ball goes out of bounds. In that case, you would not just make a large correction to the hitting action but might even change your planned trajectory. Going out of bounds is considered a failure in golf, penalized by an extra shot, and the movement adaptation by humans in the presence of failure is intuitively very different from when a movement has achieved its target.

Failure-driven adaptations by humans have been extensively studied in decision making or cognitive control [1,2], while it has remained unclear how such distinct adaptations driven by failure affect human motor adaptation. Previous studies on motor adaptation have mainly focused on the internal model adaptation that is driven by sensory prediction error (SPE)–the difference between sensory feedback and sensory prediction of a movement [3–5], and/or motor command error [6]. In the last decade, however, evidence has mounted that failure or target error (TE)–the difference between the sensory feedback of the movement endpoint and the target position–makes a distinct, important contribution to motor adaptation [7,8]. The most popular TE-driven (or failure-driven) motor adaptation process is explicit strategy learning [7,9,10], which has been mostly examined during arm reaching adaptation to visuomotor rotations and often quantified by explicit reports of the reaching aiming point [9]. Explicit strategy learning is thought to modify motor performance to reduce TE, independently of SPE [7].

However, it remains unclear how TE-driven motor adaptation relates to SPE-driven motor adaptation (i.e., internal model adaptation). The interaction between explicit strategy learning and internal model adaptation is popularly explained by a two-state model of sensorimotor learning with a different time scale for each state [11], where the two states operate in a ‘flat’ (non-hierarchical) manner and the net adaptation is defined as the sum of the two [9,12]. The fast component of the model has often been suggested to be linked to explicit strategy learning in visuomotor rotation tasks [9,10] as well as force field tasks [10,13,14]. On the other hand, recent studies have shown that the TE modulates the adaptation rate of SPE-driven internal model adaptation [15] or savings [16]. This role of the TE as a modulator of internal model adaptation may suggest a hierarchical interaction between TE-driven and SPE-driven motor adaptations.
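As a point of reference for the ‘flat’ account described above, the following is a minimal sketch of a two-state model in which a fast and a slow state each learn from the same error and the net adaptation is simply their sum. The function name and the retention and learning-rate values are illustrative assumptions; they are not taken from, or fitted to, the present study.

```python
def simulate_two_state(perturbation, a_fast=0.59, b_fast=0.21,
                       a_slow=0.992, b_slow=0.02):
    """Simulate a flat two-state learner over a sequence of trials.

    perturbation: per-trial perturbation strength (arbitrary units).
    a_*: retention factors, b_*: learning rates (illustrative values only).
    Returns the net adaptation on each trial.
    """
    x_fast = 0.0
    x_slow = 0.0
    net = []
    for p in perturbation:
        output = x_fast + x_slow      # flat interaction: the two states are summed
        net.append(output)
        error = p - output            # both states are driven by the same error
        x_fast = a_fast * x_fast + b_fast * error
        x_slow = a_slow * x_slow + b_slow * error
    return net


# 155 trials of a constant perturbation, as in an adaptation phase
net_adaptation = simulate_two_state([1.0] * 155)
print(net_adaptation[0], net_adaptation[-1])
```

Because both states respond to one error signal and are added together, a model of this kind has no mechanism by which one process can set the goal of the other, which is the property the hierarchical account questions.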

Here, using a force field adaptation paradigm with new TE-inducing force fields that perturb the participant’s hand with large forces near the target (Fig 1C), we show that the two adaptation processes in fact interact hierarchically. The development of these new fields was crucial, because the force fields used in most reaching adaptation studies induce minimal TE or failure. For example, the popular velocity-dependent curl force field (VDCF) [17,18] exerts its largest force perturbation in the middle of the reach and minimal perturbation near the target when reaching movements have a bell-shaped velocity profile [19]. This force field thus results in large lateral deviations (LDs) mid-reach in the early adaptation trials, but still allows participants to reach the target even after this large LD (see Fig 2A).
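The contrast between a velocity-dependent and a position-dependent perturbation can be illustrated with a short sketch that evaluates both along a straight minimum-jerk reach. Only the 150 mm reach distance comes from the text; the movement duration, the gains `b` and `k`, and the simple force laws (force proportional to speed, or to distance travelled) are hypothetical stand-ins for the field definitions given in the Methods, and only force magnitudes are considered.

```python
def min_jerk(tau, distance):
    """Position and normalized-time velocity of a minimum-jerk reach, tau in [0, 1]."""
    pos = distance * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = distance * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return pos, vel


def lateral_force_profiles(distance=150.0, duration=0.5, b=0.01, k=0.1, n=101):
    """Return (y, velocity-dependent force, position-dependent force) samples.

    distance is the 150 mm reach; duration, b and k are hypothetical values
    chosen only for illustration.
    """
    samples = []
    for i in range(n):
        tau = i / (n - 1)
        y, vel = min_jerk(tau, distance)
        speed = vel / duration        # hand speed along the reach (mm/s)
        f_velocity = b * speed        # grows with hand speed
        f_position = k * y            # grows linearly with distance travelled
        samples.append((y, f_velocity, f_position))
    return samples


profiles = lateral_force_profiles()
y_peak_velocity = max(profiles, key=lambda s: s[1])[0]
y_peak_position = max(profiles, key=lambda s: s[2])[0]
print(y_peak_velocity, y_peak_position)
```

In this sketch the velocity-dependent force peaks at the mid-point of the reach and vanishes at the target, where the hand comes to rest, whereas the position-dependent force is largest at the target itself.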

Fig 1.


Experiment and force fields: A) Participants made a reaching movement from a start point to a target point while holding a handle of a robot manipulandum. The direct vision of the participant’s hand was occluded by a table while they received visual feedback of their hand position during each trial by a cursor projected on the table. B) A very stiff two-dimensional spring, which was activated when the hand velocity decreased below a threshold of 20 mm/s, ensured that the participant could not make a second corrective movement to reach the target. C) The reaching task was performed in two force fields in Experiment-1 (VDCF and LIPF) and two force fields in Experiment-2 (PSPF and CPVF). The hand force profiles in these force fields are shown as shaded regions while assuming a straight minimum-jerk hand trajectory along x = 0. VDCF is a velocity-dependent force field, while LIPF and PSPF are position-dependent force fields. CPVF is a linear combination of VDCF and LIPF. Please refer to the methods for the mathematical definitions of the fields.

Fig 2.


Trajectory adaptation in Experiment-1: (A, C) The hand trajectories of two representative participants and learning curves in VDCF (A) and LIPF (C) averaged across all participants. Note that the scales differ between x and y axes to clearly show trajectory changes along the x-axis. The light gray shades behind some trajectories represent a schematic image of the force field. The adaptation of the TE and LD are shown by traces with open circles and filled circles, respectively. The first 15 TE and LD values are plotted for every single trial, while the subsequent trials (indicated by thick gray lines at the bottom of the figure) are plotted for every five trials. The shaded gray areas around the lines represent standard errors. The light green zones represent the target width (radius: 7.5 mm). (B, D) The TEs and baseline-subtracted LDs in six trial epochs (1st, 3rd-5th, 136th-155th adaptation trials, and 1st, 3rd-5th, 131st-150th de-adaptation trials) in VDCF (B) and LIPF (D). Gray dots represent data from individual participants. The error bars indicate standard errors. The light green zone in the TE plots represents the target width.

In our study with the novel TE-inducing force fields, we first observed that TE-driven motor adaptation occurs faster than internal model adaptation. Second, and importantly, TE-driven motor adaptation can result in persistent after-effects that are distinct from the after-effects of internal model adaptation. Third, these adaptive behaviors can be well explained by previous models of internal model adaptation only if they incorporate a hierarchical interaction between TE-driven adaptation of the kinematic plan and internal model adaptation. The relation between TE-driven adaptation and internal model adaptation is consistent with the traditional view of hierarchical motor planning of kinematics and dynamics [6,20].

Results

Experiment-1

In Experiment-1, thirty participants were asked to make arm reaching movements to a target 150 mm from the start position (Fig 1A) and to adapt to one of two force fields (Fig 1C): the popular velocity-dependent curl field (VDCF), which does not induce TEs, and the novel, TE-inducing linearly increasing position-dependent (orthogonal) field (LIPF) (see Methods for details). The adaptation phase (155 trials) was followed by the de-adaptation phase (150 trials), in which the participants performed the same task in the null field, as in the baseline session. We randomly assigned the participants to one of the two force fields (n = 15 for each). Their movements were quantified by two variables: TE and LD. The TE was defined as the x-deviation of the endpoint hand position from the target, and the LD was defined as the x-deviation of the hand from the straight line connecting the start and the target, measured at its mid-point (y = 75 mm) (Fig 1A and 1B, and see Methods).
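The two measures can be sketched as follows for a hand path sampled as (x, y) points in mm. This is a minimal illustration that assumes the start is at (0, 0), the target at (0, 150), signed rather than absolute deviations, and linear interpolation between samples; the helper names are hypothetical, and the exact procedure is the one described in the Methods.

```python
def x_at_y(trajectory, y_query):
    """Linearly interpolate the hand's x position where the path first crosses y_query."""
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        if y0 <= y_query <= y1 and y1 > y0:
            frac = (y_query - y0) / (y1 - y0)
            return x0 + frac * (x1 - x0)
    raise ValueError("trajectory never crosses y_query")


def target_error(trajectory, target_x=0.0):
    """TE: x-deviation of the movement endpoint from the target."""
    return trajectory[-1][0] - target_x


def lateral_deviation(trajectory, y_mid=75.0, line_x=0.0):
    """LD: x-deviation from the start-target line at the mid-point of the reach."""
    return x_at_y(trajectory, y_mid) - line_x


# A made-up reach that bows towards -x and ends 5 mm to the left of the target
reach = [(0.0, 0.0), (-10.0, 50.0), (-20.0, 100.0), (-5.0, 150.0)]
print(target_error(reach), lateral_deviation(reach))
```

Keeping the two measures separate matters here because a reach can show a large LD with a small TE (as in the VDCF) or a large TE with a small LD at the mid-point.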

TE changes the trajectory adaptation pattern

Fig 2 shows the time development of hand trajectories, TE (open circle), and LD (filled circle) in the two force fields and subsequent null field. To show immediate and later effects of the initial TE on the adaptation and de-adaptation phases, we analyzed the data in six trial epochs: 1st, 3rd-5th, 136th-155th (i.e., last 20) adaptation trials and the 1st, 3rd-5th, 131st-150th (i.e., last 20) de-adaptation trials (Fig 2B and 2D).

In the VDCF, the trajectory adaptation pattern was similar to those reported in previous studies. The force field perturbed the participants’ hand trajectories considerably in the first adaptation trial (Fig 2A), but their hands could still reach the target, as expected. After the adaptation phase, the participants could fully compensate for the perturbation, and their trajectories became straighter, curving towards the opposite direction by the 155th adaptation trial. In the first de-adaptation trial, their hand trajectories exhibited a large after-effect, deviating in the direction opposite to the force field. By the end of the de-adaptation phase, their trajectories returned to the straight baseline, or null, trajectories (see 148th de-adaptation trial). These results were consistent with what has been observed in previous studies [18,21]. The across-participant averages of the TE (open circle) and LD (filled circle) are shown in the bottom panels of Fig 2A. A large LD induced at the beginning of the adaptation and de-adaptation phases quickly decreased to within the target size (radius = 7.5 mm, light green zone) within the first 10 adaptation and de-adaptation trials, respectively. Importantly, TEs remained relatively small, around or within the target, from the very first adaptation trial and through the following adaptation and de-adaptation phases. In fact, the magnitude of TE was not significantly larger than the target radius in the first adaptation trial (t(14) = 0.284, p = 0.780) or the first de-adaptation trial (t(14) = 0.131, p = 0.897).

On the other hand, the TE-inducing LIPF showed a dramatically different adaptation pattern from the VDCF. In the LIPF, the participants’ hand trajectories in the first adaptation trial (Fig 2C) were perturbed the most around the target, resulting in a large TE (across-participant average of TE in the 1st trial: 112.6 ± 38.0 mm, mean ± s.d.) that was significantly larger than the target (t(14) = 10.700, p = 4.016×10⁻⁸). In the subsequent adaptation trials (see 4th adaptation trial in Fig 2C), the participants’ hand trajectories jumped opposite to the force direction, which ensured that the target was reached, even with a curved trajectory. It is important to note that the magnitude of the LD increased (between the 2nd and 7th adaptation trials) before it gradually decayed after the 7th adaptation trial. Furthermore, the decay was opposite in sign to that in the VDCF. That is, while the LD in the VDCF decays from an initial negative value (i.e., from ‘–x’ towards zero), the decay in the LIPF is from a positive deviation (i.e., from ‘+x’ towards zero), even though the LIPF pushes the hand in the same direction as the VDCF (i.e., towards ‘–x’). Consequently, the decays of the TE and LD are of the same sign in the VDCF, but of opposite signs in the LIPF.

The trajectory change in the de-adaptation phase (1st, 4th, and 147th de-adaptation trials in Fig 2C) was almost a mirror image of that in the adaptation phase. A distinctly large TE (44.3 ± 27.7 mm), again significantly larger than the target (t(14) = 5.140, p = 1.503×10⁻⁴), was induced in the first de-adaptation trial and then decreased monotonically to within the target size by the 10th trial. In contrast, the LD did not show a monotonic decrease. Unlike in the VDCF, the magnitude of the LD first increased and then decreased. Again in the de-adaptation phase, we observed that the decays of the TE and LD were of opposite signs.

To quantify the trajectory adaptation pattern of each group, we performed one-way ANOVAs on the TE and LD values across the trial epochs. The VDCF group showed a significant main effect in LD (F(2.546, 35.649) = 175.179, p = 3.165×10⁻⁵, ηp² = 0.926) but not TE (F(2.152, 30.134) = 2.284, p = 0.116, ηp² = 0.140). Post-hoc Tukey’s tests confirmed that the magnitude of LD monotonically changed during the adaptation (1st vs 136th-155th: p<0.001) and de-adaptation (1st vs 131st-150th: p<0.001) phases.

The LIPF group showed a significant main effect in both TE (F(1.686, 23.600) = 84.204, p = 6.404×10⁻¹¹, ηp² = 0.857) and LD (F(2.601, 36.412) = 73.312, p = 8.660×10⁻¹⁵, ηp² = 0.840). The magnitude of TE monotonically decreased during the adaptation (1st vs 136th-155th: p<0.001) and de-adaptation (1st vs 131st-150th: p<0.001) phases. In contrast, the LD showed a non-monotonic change during the adaptation and de-adaptation phases. The LD increased from the 1st to the 3rd-5th adaptation trials (p<0.001) and then decreased from the 3rd-5th trials to the 136th-155th adaptation trials (p<0.001). Similarly, the LD decreased from the 1st to the 3rd-5th de-adaptation trials (p<0.001), and then increased from the 3rd-5th to the 131st-150th de-adaptation trials (p = 0.008).

The appearance of a new, curved null trajectory after de-adaptation of LIPF

Furthermore, we observed an intriguing phenomenon in the de-adaptation phase of the LIPF. In the case of the VDCF, upon returning to the null field after the adaptation phase, the participants readily lost their adapted trajectories within the first 10 de-adaptation (null) trials (Fig 2A); their trajectories returned to their original null trajectories (observed in the baseline session) as previously reported [21,22]. This was, however, not the case after the LIPF (see 150th de-adaptation trial in Fig 2C). After the de-adaptation phase, the participants’ trajectories remained marginally, yet consistently, deviated from their original null trajectories, even after as many as 150 null trials (~20 min). Fig 3A compares the participant-averaged null trajectories before (blue traces) and after (red traces) exposure to the VDCF or LIPF (first and second plots from left). The LD in the null trajectory differed significantly between before and after exposure to the LIPF (t(14) = 4.224, p = 8.494×10⁻⁴), but not the VDCF (t(14) = 0.774, p = 0.452) (Fig 3B).

Fig 3.


Null trajectories before and after adaptation in the four force fields in Experiment-1 (VDCF and LIPF) and Experiment-2 (PSPF and CPVF). A) The null trajectories averaged across the last 20 trials were compared between the baseline (cyan lines) and de-adaptation (magenta lines) phases. The color shades indicate standard errors. Note that the scales differ between x and y axes to clearly show trajectory differences along the x-axis. B) The baseline-subtracted LDs in the trial epoch from the last 20 (131st-150th) de-adaptation trials in the four force fields. Gray dots represent data from individual participants. The error bars indicate standard errors. * indicates p < 0.05.

Crucially, the deviation of the new null trajectory was in the direction in which the force field perturbed the hand, and not in the direction opposite to the force field, as would generally be expected after exposure to the VDCF. These observations suggest that the new null trajectory may not simply be an after-effect due to a slow de-adaptation to the force field, but rather a consequence of the TEs induced in the first few null (de-adaptation) trials after exposure to the LIPF. To further investigate the cause of the appearance of the new null trajectory, we next conducted two control experiments.

Experiment-2

In Experiment-2, we considered the possibility that the new null trajectory was not a consequence of the TE but was instead induced because the LIPF is a position-dependent field. To rule out this possibility, we examined trajectory adaptation by fifteen participants in the positively skewed position-dependent field (PSPF) (Fig 1C), which is a position-dependent force field that does not induce TEs.

We observed that the magnitude of TE in the first adaptation trial (t(14) = 0.261, p = 0.798) and the first de-adaptation trial (t(14) = 0.097, p = 0.924) in the PSPF was not significantly larger than the target radius, while the LD showed a monotonic change through the adaptation and de-adaptation phases (see S1 Fig and S1 Text). Importantly, the null trajectory in the de-adaptation phase of the PSPF returned to the baseline null trajectory (t(14) = 0.659, p = 0.520) (Fig 3A and 3B). These observations were similar to the behaviors observed during exposure to the VDCF.

Next, to confirm that the new null trajectory is also observed in TE-inducing force fields other than the LIPF, we examined the trajectory adaptation in the combined position- and velocity-dependent field (CPVF) (Fig 1C). We observed that, similar to the LIPF, the CPVF induced a large TE both in the first adaptation trial (73.2 ± 50.0 mm, mean ± s.d.; t(13) = 4.905, p = 2.874×10⁻⁴) and in the first de-adaptation trial (26.0 ± 22.8 mm; t(12) = 5.211, p = 2.178×10⁻⁴). The TEs monotonically reduced until the participant’s hand could reach the target. In contrast, as with the LIPF, the LD clearly decreased only after substantial reductions in the TE during the adaptation and de-adaptation phases (see S2 Fig and S1 Text). Crucially, the participants exhibited a new hand trajectory that was significantly different from their initial null trajectory (t(13) = 3.386, p = 0.0049) even after 150 trials in the de-adaptation phase (Fig 3A and 3B). This result provides further support for the possibility that the new null trajectory is a consequence of the TEs induced at the beginning of the de-adaptation phase.

Experiment-3

Finally, to concretely establish the TEs (at the beginning of the de-adaptation phase) as the cause of the new null trajectory, in Experiment-3 we examined the hand trajectories when the TEs were eliminated in the de-adaptation phase of the LIPF (Experiment-1). Thirty participants took part in Experiment-3. Half (15) of these participants had previously participated in Experiment-1. As in Experiment-1, these participants trained in the LIPF first, followed by the de-adaptation phase. However, in the de-adaptation phase of Experiment-3, they made reaches in the null field in the presence of a partial error clamp (PEC). We refer to this experimental condition as the LIPF-PEC condition, and to the LIPF condition of Experiment-1 (the LIPF followed by the null field) as the LIPF-Null condition. The PEC was implemented as a strong spring that acted over the second half of the movement (y > 75 mm) and pulled the participant’s hand to the target along the x-axis (Fig 4A; see Methods for details). Note that the first half of the movement (y ≤ 75 mm), where the LD is measured, remained unaffected by the PEC. The other half of the participants, who were newly recruited, experienced the LIPF-PEC condition first and then the LIPF-Null condition, to cancel out order effects between the two conditions. We compared the LIPF-PEC condition (Fig 4B, right) with the LIPF-Null condition (Fig 4B, left). Because half of the participants had also taken part in Experiment-1, statistical significance for the data of Experiment-3 was tested with Bonferroni correction for multiple comparisons.
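The logic of the partial error clamp can be sketched as a lateral spring that is switched on only in the second half of the reach. This is a minimal illustration under stated assumptions: the function name, the target location at x = 0 and the stiffness value are hypothetical, and the actual implementation is the one given in the Methods.

```python
def pec_force_x(x, y, target_x=0.0, y_onset=75.0, stiffness=1.0):
    """Lateral force of a sketch partial error clamp (arbitrary force units).

    x, y: hand position in mm. stiffness is a hypothetical value.
    """
    if y <= y_onset:
        # First half of the reach: no clamp force, so the LD measured
        # at y = 75 mm is not affected.
        return 0.0
    # Second half: a spring pulls the hand towards the target along x.
    return -stiffness * (x - target_x)


print(pec_force_x(x=20.0, y=50.0))    # before the mid-point: no force
print(pec_force_x(x=20.0, y=120.0))   # after the mid-point: force directed towards x = 0
```

Restricting the spring to y > 75 mm is what makes the clamp "partial": it attenuates the error at the target while leaving the early part of the trajectory free to express the adapted motor command.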

Fig 4. Effect of attenuation of TE on the de-adaptation trajectory.


(A) After exposure to the LIPF, the participants in the LIPF-PEC condition of Experiment-3 were exposed to the PEC, where a force channel was applied over the second half of the reaching movement to attenuate TEs. (B) The hand trajectories and learning curves of both TE (open circle) and LD (filled circle) are compared between the LIPF-Null (left panel) and LIPF-PEC conditions (right panel). (C) The TE in the first de-adaptation trial (left panel) and the baseline-subtracted LD averaged across the last 20 (131st-150th) de-adaptation trials (right panel) were compared between the two conditions. Gray dots represent data from individual participants. The error bars indicate standard errors. * indicates p < 0.05.

Although the trajectory adaptation to the LIPF was similar between the LIPF-Null (left panel in Fig 4B) and LIPF-PEC conditions (right panel in Fig 4B), a stark difference was observed in the de-adaptation phase in the presence of the PEC. As expected, the TE in the first PEC trial was substantially attenuated compared to the first trial in a normal null field (left panel in Fig 4C; PEC: 8.6 ± 1.9 mm (mean ± s.d.), Null: 39.9 ± 26.5 mm; t(29) = 6.550, corrected p = 7.133×10⁻⁷). On the other hand, the LD in the first de-adaptation trial did not differ between the PEC field and the null field (t(29) = 0.732, uncorrected p = 0.470). However, a difference in LD appeared after the 1st de-adaptation trial: while the LD in the LIPF-Null condition showed large jumps from ‘+x’ to ‘-x’ before decaying to the new null trajectory (similar to Experiment-1), the LD in the LIPF-PEC condition was similar to that in the VDCF condition. In the presence of the PEC, the LD monotonically converged from ‘+x’ through the de-adaptation phase. More importantly, the magnitude of the LD in the last twenty de-adaptation trials in the LIPF-PEC condition was significantly smaller than in the LIPF-Null condition (t(29) = 2.851, corrected p = 0.016; right panel in Fig 4C). Furthermore, the participants’ hand trajectories returned to their initial null trajectories on the application of the PECs (t(29) = 0.283, uncorrected p = 0.779). Overall, the behaviors in the PEC were observed to be the same as in the force fields that do not induce TEs, specifically the VDCF and PSPF (compare the right panel of Fig 4B with Fig 2A).

Moreover, when we analyzed only the half of the participants who took part only in Experiment-3 (for whom no multiple-comparison correction was needed), we confirmed the same results. The TE in the first de-adaptation trial was substantially smaller in the LIPF-PEC condition than in the LIPF-Null condition (t(14) = 4.193, p = 9.020×10⁻⁴). The LD in the last twenty de-adaptation trials was significantly smaller in the LIPF-PEC condition than in the LIPF-Null condition (t(14) = 2.183, p = 0.047), and the hand trajectories in the PEC returned to the original null trajectories (t(14) = 0.637, p = 0.534). Furthermore, the results of Experiment-3 suggest that muscle fatigue is unlikely to account for the formation of the new null trajectory. This is because we observed new null trajectories in the LIPF-Null but not in the LIPF-PEC condition, even though the participants trained in the same LIPF before performing the de-adaptation phase in both conditions. Overall, these results strongly suggest that the TEs after exposure to TE-inducing force fields caused the new null trajectories observed in Experiment-1 and -2.

Hierarchy and model simulation

Our results show that in the presence of failure (TE > target size), the evolution of the trajectories is very different from when there are no TEs (compare Fig 2A and 2C). The reduction of TE is consistently given priority over the reduction of LD (Fig 2C), with the TE decreasing monotonically, even at the cost of a temporary increase of LD over several trials. Finally, adaptation in the presence of failure can induce changes in the undisturbed (null) trajectories (Fig 3).

First, these observations suggest the presence of a TE-driven adaptation process, in addition to the SPE-driven internal model adaptation. Furthermore, the distinct adaptation of the TE and LD in the LIPF, one of which is monotonic while the other is not (Fig 2C), led us to hypothesize a hierarchical interaction between the two processes. To evaluate this hypothesis, we simulated the trajectory adaptation in the VDCF, LIPF-Null, and LIPF-PEC using two sensorimotor adaptation models that consider only the internal model adaptation, with and without the addition of a hierarchical TE-driven adaptation process.

First, we started with the ‘flat’ optimal feedback control model (or the flat OFC model), proposed by Izawa et al. [23] to explain trajectory adaptation in a velocity-dependent force field by combining internal model learning of the force field with optimal feedback control [24]. Second, we used the ‘flat’ V-shaped model (or flat VS model) proposed by Franklin et al. [25], which uses a different algorithm, similar to feedback error learning [6], in which muscle activation changes across trials are determined by a V-shaped learning function under the assumption of a pre-planned desired trajectory. We refer to both models using the prefix ‘flat’ because both consider a single SPE-driven internal model adaptation process to explain motor adaptations. We will show that these models can explain our experimental observations once a ‘hierarchical’ TE-driven adaptation process is appended to their current structure. Please see Methods for details of implementation.

Fig 5 shows the simulations of the VDCF, LIPF-Null, and LIPF-PEC adaptations by the flat OFC and flat VS models. Although the flat OFC model (Fig 5A) and the flat VS model (Fig 5B) qualitatively reproduced the trajectory adaptation in the VDCF well, they reproduced neither the non-monotonic change in LD observed in the LIPF-Null and LIPF-PEC conditions (Fig 5C, 5D, 5E and 5F) nor the persistent curved null trajectory observed in the LIPF-Null condition (Fig 5C and 5D).

Fig 5.


Flat models cannot reproduce LIPF and PEC behaviors: Simulation of trajectory adaptation in the VDCF (A, B), LIPF-Null (C, D), and LIPF-PEC (E, F) conditions, represented by TE (open circle) and LD (filled circle), by the flat OFC (upper panels) and VS models (lower panels). The flat learning models (only internal model adaptation) were unable to reproduce either the non-monotonic change in LD (C, D, E, F) or the curved null trajectory with a persistent deviation after exposure to the LIPF (C, D).

Next, we introduced an additional TE-driven adaptation process to these models. We assumed that this process represents a modification of the kinematic plan when there is a failure (i.e., TE > target size), and added the kinematic plan adaptation process on top of the flat learning models (Fig 6A). We thus refer to these two models as the ‘hierarchical’ OFC model and the ‘hierarchical’ VS model, respectively. The kinematic plan adaptation process was assumed to be activated only in the presence of failure and modulated by TE so that the trajectory is adjusted to change in the direction opposite to the TE. In the absence of failure (i.e., TE < target size), the kinematic plan slowly decays across trials towards the original plan (i.e., the straight direction towards the target). We assumed that the decay stops when the motor cost of the generated reach falls below a small threshold value (see Methods for details of implementation). This assumption was made to reproduce the persistent curved null trajectory.
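The trial-by-trial rule described in this paragraph can be written as a small sketch. It is only an illustration of the three cases (failure, success with remaining motor cost, success with low motor cost): the function name, the scalar `bias` representation of the kinematic plan, and the `gain`, `decay` and `cost_threshold` values are all hypothetical, and only the 7.5 mm target radius comes from the text. The models actually simulated are specified in the Methods.

```python
def update_plan_bias(bias, te, motor_cost, target_radius=7.5,
                     gain=0.1, decay=0.98, cost_threshold=0.5):
    """One-trial update of a scalar kinematic-plan bias (0 = original straight plan).

    te: signed target error on the trial (mm).
    motor_cost: motor cost of the reach just generated (arbitrary units).
    gain, decay and cost_threshold are hypothetical illustration values.
    """
    if abs(te) > target_radius:
        # Failure: shift the plan in the direction opposite to the TE.
        return bias - gain * te
    if motor_cost > cost_threshold:
        # Success, but the reach is still costly: decay towards the original plan.
        return decay * bias
    # Success with low motor cost: the decay stops and the current plan is kept.
    return bias


bias = 0.0
bias = update_plan_bias(bias, te=-100.0, motor_cost=2.0)  # failure trial
bias = update_plan_bias(bias, te=3.0, motor_cost=2.0)     # success, plan decays
bias = update_plan_bias(bias, te=3.0, motor_cost=0.1)     # success, plan is frozen
print(bias)
```

The last branch is what allows a non-zero bias to persist indefinitely in this sketch, which corresponds to the assumption introduced above to account for the persistent curved null trajectory.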

Fig 6. Hierarchical motor adaptation model.


(A) Schematic diagram of the model. The model consists of two adaptation components: the kinematic plan adaptation (magenta box) as a higher component, driven by TE, and the internal model adaptation (light blue box) as a lower component, driven by SPE. In the presence of failure (i.e., TE > target size), the kinematic plan adaptation process becomes active and modifies the planned direction of the hand motion. When the task is successful, the planned direction slowly decays to the original movement direction. (B) The planned direction of the hand motion is implemented as a directional bias (magenta arrow) in the hierarchical OFC model and a desired trajectory (magenta line) in the hierarchical VS model (see Methods for details).

In the hierarchical OFC model, this process was implemented by a direction bias [26] (Fig 6B), which was incorporated into the cost function within the flat OFC model (see Methods for details). In the hierarchical VS model, the initial direction of the desired trajectory (Fig 6B) was modified in the same way as in the hierarchical OFC model (see Methods). By including this TE-driven adaptation process, both models (Fig 7C, 7D, 7E and 7F) could explain all the features of the trajectory adaptation in LIPF-Null and LIPF-PEC, including the non-monotonic change of the LD during the adaptation phase, the appearance of the new null trajectory after de-adaptation in the LIPF-Null, and its absence in the LIPF-PEC. In the absence of failure, as in the VDCF, both models predict the same results as their flat counterparts (Fig 5A and 5B).

Fig 7.


Simulations of trajectory adaptation by the hierarchical models in the VDCF (A, B), LIPF-Null (C, D), and LIPF-PEC (E, F) conditions, represented by TE (open circle) and LD (filled circle), for the hierarchical OFC (upper panels) and VS models (lower panels). The simulated hand trajectories are shown at the top of each panel. The hierarchical learning models (kinematic plan adaptation and internal model adaptation) successfully reproduced the behaviors in all three conditions.

Discussion

We examined the motor adaptation of arm reaching trajectories in force fields that induce failure (TE > target size) at the beginning of the adaptation and de-adaptation phases. First, our results showed that the human motor learning system puts a higher priority on the reduction of TE than LD. In the presence of failure, the LDs did not follow a typical monotonic decrease as reported in previous studies [21,22,27,28]. TE is reduced first, even at the expense of an increased LD (Fig 2C). A monotonic decrease in LD took place only after the TE was reduced to around the target size. Second, the presence of failure in the de-adaptation phase caused the appearance of a new null trajectory that was distinct from the null trajectory observed in the baseline period and persisted even after 150 de-adaptation trials. These observations were successfully reproduced by the hierarchical motor adaptation models that combine a TE-driven kinematic plan adaptation with the internal model adaptation.

The prioritized reduction in TE over LD (Fig 2C and 2D) cannot be explained by internal model adaptation alone, even when considering multiple time scale adaptations such as a two-state model [10–12], because these models predict similar monotonic changes in both TE and LD (like Fig 5). It is important to note that this is also the case when considering the spatiotemporal difference in the error information. If the errors early in a trajectory are less important than those at the end for updating the internal model of the force field, the difference may affect the adaptation rate (i.e. TE may lead to a faster internal model adaptation) but would still not change the adaptation pattern to which the internal model adaptation leads (i.e. monotonic decay of the trajectory). In contrast, the non-monotonic trajectory changes in the presence of TEs suggest the presence of an additional TE-driven kinematic plan adaptation. In our hierarchical motor adaptation models (Fig 6), the kinematic plan adaptation changes the reaching direction in the direction opposite to the TE, which enables a quick reduction in TE, even when it sometimes leads to an increase in LD (Fig 7C, 7D, 7E and 7F). After the TE reduction, we assume that the kinematic plan slowly returns towards the original movement direction (i.e., towards the target). The hierarchical addition of this TE-driven (failure-driven) process enables the models to explain the TE and LD adaptation processes in both no-TE-inducing and TE-inducing force fields.

The appearance of the new null trajectory in the de-adaptation phase can be also explained by the hierarchical dominance of kinematic plan adaptation over internal model adaptation. In our hierarchical models, we assumed that after the motor cost of arm reaching falls below a small threshold value, the decay of the kinematic plan toward the baseline plan stops. This assumption could reproduce the persistent curved null trajectory after de-adaptation in the presence of failure. The models thus suggest that the TE-driven kinematic plan adaptation may determine the steady-state null trajectory to which the internal model adaptation converges. This possibility is strongly supported by Experiment-3 (Fig 4) where the suppression of TE enabled the participants to converge back to their baseline null trajectory. This observation was also successfully reproduced by the hierarchical learning models (Fig 7E and 7F). Our assumption, that the TE-driven kinematic plan adaptation is also affected by the motor cost, is similar to the idea that a desired trajectory of movement may be modified according to the level of interaction force with the environment [29]. It has however not yet been empirically examined and remains an interesting question for future studies.

Motor learning processes like motor memory [30–32] or use-dependent learning [33] make one’s movement similar to the last performed movement. Operant reinforcement learning [34] causes people to select movements for which the task had previously been successfully achieved. These processes may be seen as likely candidates to explain the persistent curved trajectories. However, these processes alone cannot explain why the persistent curved null trajectories do not appear in the no-TE-inducing force fields (VDCF or PSPF) (Fig 3B) or during PEC in Experiment-3 (Fig 4B), in which the participants successfully reached the target with curved null trajectories in the first de-adaptation trials. Our results thus suggest that even if motor memory, use-dependent learning, or operant reinforcement learning is indeed active during the force field adaptation, these processes, unlike kinematic plan adaptation, do not hierarchically interact with internal model adaptation but instead work in a non-hierarchical manner. Likewise, other possible causes like perceptual bias [35,36] or perceptual recalibration [37] of the hand position also cannot readily explain why the persistent curved null trajectories appear only in the TE-inducing but not the no-TE-inducing force fields. If our model included these learning processes or perceptual adaptation processes, it might be able to better explain the behavior. We note, however, that the main purpose of our model simulation is to explain the necessity of an additional TE-driven adaptation process hierarchically interacting with the internal model adaptation, rather than to develop a new model. For this purpose, we chose the two most popular learning models in the current literature and demonstrated the effect of adding the TE-driven process.

A priori, the new null trajectory that we observed in the de-adaptation phase is different from the persistent retention of learned movements that has recently been reported to occur after a reinforcement period in which only binary success or failure feedback is provided [38–41]. There, the retention was measured in no-feedback periods subsequent to reaches, in which movement-related feedback was not available; when the feedback became available, a typical washout process took place, with the movement quickly returning to the baseline level [38]. In contrast, in our study the new null trajectory persists for at least ~20 min of the de-adaptation period even when movement feedback is available and without a reinforcement period. Future work is needed to determine how long the new null trajectory persists or whether it decays very slowly.

Recent studies have identified the presence of distinct explicit and implicit components of adaptation to novel visuomotor rotations [7,9,10]. The explicit components, called explicit strategy learning, have been proposed to be sensitive to task performance or TE, and faster than implicit components represented by internal model adaptation. We believe the TE-driven adaptation process we observed here may (at least partially) be an explicit strategy learning, as it was active only in the presence of failures and fast [10,14] but insensitive to LD (i.e. SPE). However, the key difference between this TE-driven adaptation and the explicit strategy learning previously identified lies in the way the two processes interact with the internal model adaptation. Previous visuomotor rotation studies have often utilized a two-state model to explain the interaction between the explicit strategy adaptation and internal model adaptation [10,42] by assuming that these two adaptation processes interact in a non-hierarchical manner where the net reach trajectory is defined to be the sum of the two. However, in the case of visuomotor rotation tasks, the parameter to be learned by the two adaptation processes is the same–the rotation angle (or its equivalent). In fact, previous force-field studies have similarly looked at the adaptation of a single parameter–the trajectory (quantified by its curvature, deviation, or encompassed area relative to the straight line). The adaptation of a single parameter is well explained by ‘flat’ models, including the “non-hierarchical” two-state model. On the other hand, this is not the case in our force field task, where the two adaptation processes represent changes in distinct parameters (the target and trajectory). The net adaptation behavior in our experiment cannot be explained by the flat models, including the two-state model, in its current formulation. 
Rather, the TE-driven adaptation and the internal model adaptation we observe here seem to be more consistent with the traditional view of hierarchical motor planning of kinematics and dynamics [6,20].

Our hierarchical models are different from the Adaptation Modulation model, a hierarchical motor adaptation model proposed by Kim et al. [15] that could explain the interaction of TE-driven adaptation and SPE-driven adaptation in their visuomotor rotation paradigm. The Adaptation Modulation model increases the adaptation rate of the SPE-driven adaptation process in the presence of failure (TE > target size). In the end, as with the two-state model, this model also considers only adaptation of the internal model (i.e. of the novel visuomotor rotation), although it is modulated by the presence of TE. Thus, the TE-driven process of the Adaptation Modulation model hierarchically determines a temporal feature of the SPE-driven adaptation (i.e., how fast arm trajectories adapt to the novel environment) but not a spatial feature as in our models (i.e., where the adapted trajectories converge). Accordingly, the Adaptation Modulation model can explain our observations in the no-TE-inducing fields but not those in the TE-inducing fields (i.e. the presence of the non-monotonic trajectory change and the new null trajectory). This is also true for two other models proposed by Kim et al. [15]: the Movement Reinforcement and Dual Error models. Both models implement an interaction of SPE-driven and TE-driven processes, but again consider the adaptation of the internal model alone. On the other hand, our model may partially explain their results. Specifically, the TE-driven kinematic plan (by limiting the range of the plan change) can explain the facilitated adaptation observed in Kim et al. [15], although some constraints are necessary. However, as the TE-driven process in Kim et al. [15] modulates adaptive behavior in a completely implicit manner while our TE-driven process may, we believe, work in an explicit manner, these two may be distinct in nature.

Studies have regularly found hierarchical behaviors during cognitive learning and decision making in humans. The brain activations during these hierarchical behaviors have been well explained by hierarchical reinforcement learning (HRL) algorithms [43–48]. The typical role of the higher component in an HRL system is to select a task-goal-oriented sub-goal or option, while the lower component typically selects an action to achieve this goal or sub-goal [44,49–51]. This structure is very similar to the hierarchical motor learning models we suggest here. However, while the previous theoretical and imaging studies have exhibited a hierarchy at the level of cognitive learning in low degrees-of-freedom tasks, our study suggests the presence of similar hierarchical structures for solving large degrees-of-freedom motor learning problems. The higher components active during cognitive learning have been linked to neural systems in the dorsolateral striatum, the dorsolateral prefrontal cortex, the supplementary motor area, the pre-supplementary motor area, and the premotor cortex [44]. On the other hand, the lower components have been related to the ventral striatum and the orbitofrontal cortex, which has strong connections to both the ventral striatum and the dorsolateral prefrontal cortex [44]. Interestingly, most of these areas have been observed to be active during motor learning of point-to-point arm or finger movements as well [52–55], suggesting that the cognitive learning processes and the hierarchical motor learning may operate as subsets of a common HRL structure. However, further studies are required to test this speculation by concretely examining the sharing of neural structures between the two processes.

Before the conclusion, we note two limitations of this study. First, while we manipulated the presence or absence of TE across the force fields, the current experimental design could not control several movement features like the velocity profile, stiffness profile [56], online feedback gain [57], posture at the final position [58] or adaptive movement changes [59,60]. Although we believe it is unlikely that any of these factors alone can consistently explain our two key observations in the TE-inducing force fields (the non-monotonic trajectory change and the new null trajectory), they may partially contribute to these observations. For example, one possibility is that a change in feedback gain induced by a large TE may contribute to shaping the new null trajectory, because feedback control has been suggested to share the internal model used for ‘feedforward’ control [61–63]. Another possibility is that the faster reduction of TE than LD may be boosted by adaptive control that involves online update of the control policy within individual movements [59,60]. Adaptive control may update not only the control policy but also the kinematic plan in the presence of TE. If the update rate of the kinematic plan is greater than that of the control policy, this may result in different trial-by-trial adaptation of TE and LD with faster and slower time scales, respectively. Future studies are needed to examine these possibilities.

Second, the current experimental design cannot determine whether the TE-driven kinematic plan adaptation is an implicit process that automatically compensates for TE or an explicit process that intentionally changes the strategy or the initial reach direction, although we believe the latter to be the case. One promising way to address this question may be to manipulate the participants’ psychological sensitivity to the TE of the same movements, as employed by Kim et al. [15]. Changing the target size or the monetary reward for task success, while keeping other motor factors constant, would be useful to examine whether or not the TE-induced adaptive behaviors observed in our study are explicitly modulated.

The failure (i.e., TE) driven adaptation of the kinematic plan leads to large and fast movement changes that are arguably costly in terms of control and energy [24,64]. It is, therefore, possible that in our daily lives, to reduce the control cost, kinematic plan adaptation remains inactive during the performance of most movements, as they are overlearned and rarely lead to failure. This plan adaptation is likely activated only when there is a (probably unexpected) failure. When a failure is experienced, the kinematic plan adaptation process helps the brain to quickly acquire success or reward, even at the expense of large high energy movement changes, after which it is again left to the internal model adaptation to optimize the movement relative to this new movement plan. Furthermore, task success or failure definitively depends on task requirements. In our study, as TE determines whether the task is successful or not, the participants prioritized TE over LD. However, if participants were instructed that the task goal is to make a reaching trajectory with a certain magnitude of LD, they would prioritize LD over TE. Moreover, when the failure is indicated by a binary (success or failure) feedback but not a signed error feedback like TE, LD may be more prioritized as suggested in a previous study [65]. Importantly, whatever the task goal or the feedback type is, our results suggest that the presence of failure may activate the kinematic plan adaptation to quickly achieve the goal. In conclusion, our study provides behavioral evidence to exhibit that human motor learning is shaped by the hierarchical interactions between the two learning processes; a higher kinematic plan adaptation driven by failure, and a lower internal model adaptation. This hierarchical motor adaptation structure may allow the brain to negotiate unexpected behavioral failures in an ever-changing and diverse environment around us.

Methods

Ethics statement

All experiments involved human participants and were approved by both the ethics committees of Advanced Telecommunication Research Institute (approval numbers: 15–722, 16–722) and National Institute of Information and Communications Technology. All participants signed an institutionally approved consent form.

Participants

A total of seventy-five neurologically normal volunteers (fourteen females and sixty-one males; age 22.70 ± 2.06, mean ± s.d.) participated in the experiments. All participants were right-handed as assessed by the Edinburgh Handedness Inventory [66]. All participants were naïve to the purpose of the experiments. No statistical methods were used to determine sample sizes although the sample sizes used in this study were similar to those in previous studies using similar reaching tasks [9,14,15,23,42].

Apparatus

The participants sat on an adjustable chair while using their right hand to grasp a robotic handle of the twin visuomotor and haptic interface system (TVINS) used to generate the environmental dynamics [67]. Their forearm was secured to a support beam in the horizontal plane and the beam was coupled to the handle. Since the TVINS has two parallel-link direct drive air magnet floating manipulandums, we performed the experiments with two participants at a time. Each manipulandum was powered by two DC direct-drive motors controlled at 2,000 Hz and the participants’ hand position and velocity were measured using optical joint position sensors (4800,000 pulses/rev). The handle was supported by a frictionless air magnet floating mechanism.

A projector was used to display the position of the handle with an open circle cursor (diameter 4 mm) on a horizontal screen board placed above the participant’s arm. The screen board prevented the participants from directly seeing their arm and handle. The participants controlled the cursor representing the hand position by making forward reaching movements (the details will be shown in the next section) from a start circle (10 mm diameter) to a target circle (15 mm diameter), which were displayed on the screen throughout all of the experiments. The start circle was located approximately 350 mm in front of the shoulder joint, and the target was 150 mm away from it.

Task

The participants were instructed to move the cursor from the start circle to the target circle in a period of 400 ± 50 ms. No instructions were given about the trajectory of the reaching movement. Each movement was initiated by audio beeps. Participants were instructed to begin movement on the second beep, 1 s after the first beep. The second beep lasted for 400 ms and could be used as a reference for the instructed movement duration. The cursor was visible only during each trial. After each trial, the participants were provided information about their movement duration and final hand position. Movement duration was defined as the period between the time the cursor exited the start circle and the time it entered the target circle, and was reported as “SHORT”, “LONG” or “OK”. The final hand position was defined as the position at the moment when the hand velocity fell below 20 mm/s. If the final hand position was within the target circle, the inside of the circle turned blue. A third beep, 3 s after the first beep, indicated the termination of the trial, after which the TVINS brought the participant’s hand back to the start circle; the next trial started after a period of 1 s. The inter-trial interval was 8 s.
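The per-trial feedback described above can be sketched as follows. This is only an illustration and not the software used in the experiment: the function names, the coordinate frame (origin at the center of the start circle, +y pointing toward the target), and the inclusive treatment of the 350 ms and 450 ms boundaries are assumptions.

```python
import math

TARGET_DURATION_MS = 400.0
DURATION_TOLERANCE_MS = 50.0
TARGET_CENTER_MM = (0.0, 150.0)  # assumed frame: target 150 mm straight ahead of the start
TARGET_RADIUS_MM = 7.5           # target circle diameter is 15 mm


def duration_feedback(duration_ms):
    """Classify a movement duration as SHORT, LONG or OK (boundaries assumed inclusive)."""
    if duration_ms < TARGET_DURATION_MS - DURATION_TOLERANCE_MS:
        return "SHORT"
    if duration_ms > TARGET_DURATION_MS + DURATION_TOLERANCE_MS:
        return "LONG"
    return "OK"


def is_success(final_x_mm, final_y_mm):
    """True if the final hand position lies within the target circle."""
    dx = final_x_mm - TARGET_CENTER_MM[0]
    dy = final_y_mm - TARGET_CENTER_MM[1]
    return math.hypot(dx, dy) <= TARGET_RADIUS_MM
```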

Force fields

This study used four different force fields: Velocity-dependent curl field (VDCF), Linearly increasing position-dependent (orthogonal) field (LIPF), Positive skew position-dependent (orthogonal) field (PSPF), and Combination of position- and velocity-dependent field (CPVF). There are two TE-inducing force fields (LIPF and CPVF) and two no-TE-inducing force fields (VDCF and PSPF). They are illustrated in Fig 1C and computed using the following equations.

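$$\text{VDCF:}\quad \begin{bmatrix} F_x \\ F_y \end{bmatrix} = B_1 \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}$$

$$\text{LIPF:}\quad \begin{bmatrix} F_x \\ F_y \end{bmatrix} = K_1 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

$$\text{PSPF:}\quad \begin{bmatrix} F_x \\ F_y \end{bmatrix} = K_2\, \frac{\cos(\pi + 40y) + 1}{(\pi + 40y)^5} \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

$$\text{CPVF:}\quad \begin{bmatrix} F_x \\ F_y \end{bmatrix} = K_1 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - B_1 \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}$$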

where $(F_x, F_y)^T$ is the force in newtons exerted on the hand, $(x, y)$ is the hand position in meters relative to the center of the start circle, $(\dot{x}, \dot{y})$ is the hand velocity in meters per second, B1 is 14 Ns/m, and K1 and K2 are 60 and 20868 N/m, respectively.
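As a rough illustration, the four force fields can be written as plain functions of the hand state. The gains are those stated in the text. The signs are assumptions: the VDCF matrix follows the form D = B1[0 −1; 1 0] given in the Simulation section, the LIPF matrix is taken with a positive off-diagonal entry, and the CPVF is taken as the position-dependent term minus the velocity-dependent term. The function names are hypothetical.

```python
import math

# Gains as stated in the text (SI units: m, m/s, N).
B1 = 14.0      # N*s/m
K1 = 60.0      # N/m
K2 = 20868.0   # gain of the PSPF term


def vdcf(x, y, vx, vy):
    """Velocity-dependent curl field, assuming D = B1 * [[0, -1], [1, 0]]."""
    return (-B1 * vy, B1 * vx)


def lipf(x, y, vx, vy):
    """Lateral force growing linearly with forward position y (sign assumed)."""
    return (K1 * y, 0.0)


def pspf(x, y, vx, vy):
    """Positively skewed lateral force profile along y; zero at the start (y = 0)."""
    s = math.pi + 40.0 * y
    return (K2 * (math.cos(s) + 1.0) / s ** 5, 0.0)


def cpvf(x, y, vx, vy):
    """Position-dependent term minus velocity-dependent term (operator assumed)."""
    fx_p, fy_p = lipf(x, y, vx, vy)
    fx_v, fy_v = vdcf(x, y, vx, vy)
    return (fx_p - fx_v, fy_p - fy_v)
```

Under these assumptions the PSPF force is close to zero both at the start and at the target distance (y = 0.15 m), which is consistent with its description as a no-TE-inducing field, whereas the LIPF force is largest at the target.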

Importantly, the hand motion is momentarily constrained to the final hand position where the velocity fell below a low threshold of 20 mm/s by applying a strong stiff two-dimensional spring force (500 N/m) and damper (50 Ns/m). The constraint force is active until the trial ends (lasting for around 1600 ms). This was designed such that participants did not need to continue resisting large force at the movement end (as in LIPF and CPVF) and it prevents them from reaching the target by sub-movements [68,69].

Partial error clamp

This study developed a new error clamp method and used it in Experiment-3. Previous motor learning studies have extensively utilized error clamp methods to assess motor adaptation performance [70]. When the error-clamp was active, the trajectory of the hand was attracted to a straight line joining the start circle to the target by a virtual “channel” (see Fig 4A) in which any motion perpendicular to the straight line was pulled back by a one-dimensional spring (800 N/m) and damper (45 Ns/m). However, in contrast to the previous experiments, the error clamp was applied only over the last part of the hand movement (y >75 mm) such that the first part of the movement where the LD is measured (the details will be shown in a later section) is unaffected by the clamp. Furthermore, the magnitude of the spring was set weaker than that in the previous studies, which allows the hand trajectory to change smoothly (see the hand trajectories for LIPF-PEC condition in Fig 4B). We call this a partial error clamp (PEC).
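A minimal sketch of the partial error clamp force, using the spring and damper constants and the 75 mm onset given above. It assumes a frame in which the straight line from start to target is the y-axis (x = 0), so that the channel acts on x only; the function name is hypothetical.

```python
K_CLAMP = 800.0         # N/m, one-dimensional channel spring
B_CLAMP = 45.0          # N*s/m, channel damper
CLAMP_ONSET_Y = 0.075   # m, the clamp acts only for y > 75 mm


def partial_error_clamp_force(x, y, vx):
    """Lateral (x) channel force in newtons; zero over the first part of the reach."""
    if y <= CLAMP_ONSET_Y:
        # LD is measured in this part of the movement, so no clamp force is applied.
        return 0.0
    # The spring pulls the hand back toward the line x = 0; the damper opposes lateral velocity.
    return -K_CLAMP * x - B_CLAMP * vx
```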

Experiment procedure

Experiment-1

Thirty participants who passed initial screening (see the Participant screening section for details) were randomly assigned to one of two groups (n = 15 each): the VDCF group and the LIPF group (Fig 1C). First, the participants in both groups were given a practice period to acclimatize themselves to the apparatus and task. They were allowed to take their time but were asked to make at least 50 reaching movements in the no-force field environment (null field). All participants finished the practice in fewer than 100 trials. This was followed by two experimental sessions: the baseline and adaptation sessions. In the baseline session, the participants performed 50 trials of reaching movements in the null field. In the adaptation session, after 5 trials in the null field, the participants in the VDCF and LIPF groups performed 155 (adaptation) trials in VDCF and LIPF, respectively, followed by 150 (de-adaptation) trials in the null field. Two-minute rests were taken three times, after the 50th, 100th, and 150th adaptation trials.

Experiment-2

Thirty participants who passed initial screening were randomly assigned to each of the two groups (n = 15 for each): the PSPF group and the CPVF group (Fig 1C). The experimental procedure is the same as Experiment-1.

Experiment-3

Thirty participants took part in Experiment-3. Half of them were the participants assigned to the LIPF group of Experiment-1, who returned to our laboratory at least one week after Experiment-1 and performed Experiment-3. In Experiment-3, unlike Experiment-1, they performed 155 adaptation trials in the LIPF followed by 150 de-adaptation trials in the PEC. This experimental condition is referred to as the LIPF-PEC condition, while the condition that these participants performed in Experiment-1 is called the LIPF-Null condition. To compare these two conditions, we needed to cancel out the order effects of the two experimental conditions. We thus newly recruited another fifteen participants. Those who passed initial screening experienced the LIPF-PEC condition first and then the LIPF-Null condition. The experiments in the two conditions were again separated by at least one week. The experimental procedure in Experiment-3 was the same as that in Experiment-1 except that, in the LIPF-PEC condition, the participants performed the 150 de-adaptation trials in the PEC.

Data analysis

Target error (TE) and lateral deviation (LD) were used to evaluate motor adaptation. The TE was defined as x-deviation of the final hand position from the straight line joining the start circle to the target (Fig 1B). The final hand position was defined as the position at the moment when the hand velocity fell below 20 mm/s. The LD was defined as the x-deviations midway (at 75 mm from the start circle) from the straight line joining the start circle to the target.
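The two measures can be sketched as below for a sampled trajectory. This is an illustration under stated assumptions rather than the analysis code: positions are in meters in a frame with the origin at the center of the start circle and +y toward the target, the hand must first exceed the velocity threshold before the "fell below 20 mm/s" criterion is applied, and LD is read off by linear interpolation at y = 75 mm. All function names are hypothetical.

```python
import math


def final_hand_index(vx, vy, threshold=0.020):
    """Index of the first sample, after movement onset, where hand speed drops below threshold (m/s)."""
    moving = False
    for i, (a, b) in enumerate(zip(vx, vy)):
        speed = math.hypot(a, b)
        if not moving and speed >= threshold:
            moving = True
        elif moving and speed < threshold:
            return i
    return len(vx) - 1  # fallback: last sample


def target_error(x, vx, vy):
    """TE: x-deviation of the final hand position from the start-target line (x = 0)."""
    return x[final_hand_index(vx, vy)]


def lateral_deviation(x, y, y_mid=0.075):
    """LD: x-deviation at y = 75 mm, or None if the reach never gets that far."""
    for i in range(1, len(y)):
        if y[i - 1] < y_mid <= y[i]:
            w = (y_mid - y[i - 1]) / (y[i] - y[i - 1])
            return x[i - 1] + w * (x[i] - x[i - 1])
    return None
```

Returning `None` for reaches shorter than 75 mm mirrors the trial exclusion criterion described in the Data exclusion section.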

To draw the participant-averaged trajectories for each of the VDCF (Experiment-1), LIPF (Experiment-1), PSPF (Experiment-2), and CPVF (Experiment-2) conditions, we sampled the x-axis data at the following sixteen y positions: 7.5 (target radius size), 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, and 150 (target position) mm for each participant. These sampled data were averaged across participants for each y position and plotted in Fig 3A.

All statistical tests conducted in this study were two-tailed with a significance level of 0.05. To examine changes in each of TE and LD during motor adaptation, we separately performed one-way ANOVAs across trial epochs (6 epochs: 1st, 3rd-5th, and 136th-155th adaptation trials and 1st, 3rd-5th, and 131st-150th de-adaptation trials). When the assumption of homogeneity of covariance was violated, the number of degrees of freedom was corrected with the Greenhouse-Geisser procedure. Post-hoc pairwise comparisons were performed using Tukey’s method. For other tests, paired or unpaired t-tests were performed. The ANOVAs were performed using SPSS Statistics ver. 25 (IBM) and the t-tests were performed using MATLAB version R2018b (Mathworks).

Data exclusion

Trials were excluded from the analysis when the reach distance was less than 75 mm as the LD could not be evaluated (Fig 1B). 34 trials (0.098% of the total number of trials) were excluded. Only one participant in the CPVF group was excluded from the analysis because the participant showed unstable trajectory changes over the last 100 de-adaptation trials with at least three large jumps (> 20 mm) across the x-axis as well as an outlying value of the LD over the last 20-de-adaptation trials (outside of 3 s.d. from the mean). 14 participants were thus analyzed for the CPVF group (Experiment-2). Note that for the t-test on the first de-adaptation trial of the CPVF group, the statistical degree of freedom was 12 since the first de-adaptation trial of a participant was excluded due to the trial exclusion criterion.

Participant screening

We screened participants in all the experiments based on trajectory deviation in the baseline sessions. From pilot experiments, we anticipated that the persistent curved null trajectory would appear after adaptation to TE-inducing force fields, as seen in Fig 3. To assess this phenomenon, we wanted to examine how much the curved trajectories differ from null trajectories in the baseline. However, in our pilot experiments some participants showed considerably curved null trajectories (LD of ∼10 mm) in the baseline session, because, for the sake of the research question, we did not give participants any instruction on the reaching trajectory. Thus, to ensure that baseline null trajectories were the same across all the participant groups, only the participants whose LD averaged over the last 20 trials in the baseline session was less than 4.5 mm proceeded to the learning session. In fact, there were no significant differences in the LD in the baseline session across all the groups: the VDCF, LIPF, PSPF, and CPVF groups and the participant group who performed the LIPF-PEC condition first (one-way ANOVA: F(4, 73) = 1.430, p = 0.223, ηp² = 0.077). The screened-out participants afterwards performed similar reaching experiments that were not related to this study, and thus their data were not further analyzed for this study.

Simulation

To explain adaptive behaviors in the VDCF and the LIPF of the Experiment-1, we utilized two motor learning models: one is proposed by Izawa et al. [23], which we refer to as the flat OFC model, and the other is proposed by Franklin et al. [25], which we refer to as the flat VS model. These original models implement only the internal model learning and can explain monotonic trajectory adaptation as observed in the VDCF. However, they cannot explain non-monotonic trajectory adaptation, nor a persistent change in the null trajectory in the LIPF. We thus extended the two models by introducing a TE-driven kinematic plan adaptation that hierarchically interacts with the internal model adaptation (Fig 6A). We referred to the extended models as the hierarchical OFC model and the hierarchical VS model, respectively.

OFC model

The original model (i.e., flat OFC model) utilizes optimal feedback control (OFC) theory [24,71] to simulate reaching trajectories during adaptation to a state-dependent novel force field, based on the concept that motor learning is a process of acquiring a model of the novel environment and using that model to re-optimize movements. Accordingly, in this framework, motor adaptation is characterized by the knowledge of the environment (the novel force field) that the motor system gradually acquires. The external force imposed on the arm is written in the form:

$$F_t = D x_t \tag{1}$$

where $F_t$ and $x_t$ are the external force vector and the current state vector of the plant (arm and environment) at time t, respectively. D is the force matrix (e.g. for VDCF, D = B1[0 −1; 1 0]). To perform the optimal movement in the force field, the motor system needs full knowledge of D, which is assumed to be acquired gradually. The knowledge of D during adaptation is represented in the form:

$$\hat{D} = \alpha D \tag{2}$$

where $\hat{D}$ is the estimated force matrix, and α is the learning parameter, which is assumed to gradually increase from 0 to 1 with adaptation. During adaptation, the motor system predicts the external force using $\hat{D}$ as follows:

$$\hat{F}_t = \hat{D} \hat{x}_t \tag{3}$$

where $\hat{F}_t$ is the predicted external force vector at time t, and $\hat{x}_t$ is the estimated state vector of the plant, obtained through the optimal state estimator (see [71]). Accordingly, the motor system produces the motor command optimized for an environment in which the predicted external force acts on the arm. Only when α = 1 does the system have full knowledge of D and produce the optimal motor commands for the actual environment. When 0 < α < 1, the system has incomplete knowledge of D and produces a sub-optimal movement for the actual environment. Thus, by changing the value of α, Izawa et al. simulated reaching trajectories in several phases of motor adaptation using OFC.

For the hierarchical OFC model, we borrowed the idea of a kinematic bias of movement direction proposed by Mistry et al. [26], which we refer to as directional bias. Mistry et al. extended the cost function of OFC by including a directional bias to explain a directional preference of reaching trajectories observed during motor adaptation to an acceleration-based force field. The directional bias represents the desired direction of movement, which is represented by the form:

$$Q_d = \begin{bmatrix} d_y^2 & -d_x d_y \\ -d_x d_y & d_x^2 \end{bmatrix} \tag{4}$$

where $Q_d$ is the directional bias matrix and $d = [d_x \; d_y]^T$ is the desired direction represented as a unit vector. While the original cost function consists of the error term (first term) and the motor cost term (second term) (Eq 5), the cost function of the hierarchical OFC model (Eq 6) has an additional term related to the directional bias (third term in Eq 6), so that any position or velocity perpendicular to the desired direction is penalized as follows:

$$x_t^T Q_t x_t + u_t^T R u_t \tag{5}$$

$$x_t^T Q_t x_t + u_t^T R u_t + e^{-t/\tau} \left( k_p\, p_t^T Q_d\, p_t + k_v\, v_t^T Q_d\, v_t \right) \tag{6}$$

where $x_t$ is the current state vector of the plant (the arm and environment) at time t, $u_t$ is the motor command vector, $p_t$ and $v_t$ are the position and velocity vectors, respectively, and $k_p$ and $k_v$ are the weights of the bias for position and velocity, respectively. $Q_t$ is the weight matrix of the state cost, and R is the weight matrix of the motor cost. The exponential decay term is included because the directional bias need not exist for the entire motion. In our simulation, these parameters were set as follows: $k_p = k_v = 0.5$, τ = 130 (ms). The cost parameters included in $Q_t$ and R were determined to produce trajectories similar to those in the experiments (see S2 Text). The reaching movement was simulated for $0 \le t \le T + T_H$, where T is the maximum movement completion time and $T_H$ is the time for which the hand was supposed to hold a position at the target after movement completion (see [23]). T and $T_H$ were set to 400 (ms) and 50 (ms), respectively.
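The directional bias term can be illustrated numerically. The sketch below assumes that the bias matrix has the form Q_d = I − d dᵀ for a unit vector d (so that only the component perpendicular to d is penalized), that the decay factor is exp(−t/τ), and that time is expressed in seconds (τ = 0.130 s). The function names are hypothetical, and the state and motor cost terms of the full cost function are omitted.

```python
import math


def directional_bias_matrix(d):
    """Q_d for a unit direction d = (dx, dy): penalizes components perpendicular to d."""
    dx, dy = d
    return [[dy * dy, -dx * dy],
            [-dx * dy, dx * dx]]


def quad_form(Q, v):
    """v^T Q v for a 2x2 matrix Q and a 2-vector v."""
    return sum(v[i] * Q[i][j] * v[j] for i in range(2) for j in range(2))


def bias_cost(p, v, d, t, kp=0.5, kv=0.5, tau=0.130):
    """Directional bias term of the cost at time t (s), for position p and velocity v."""
    Q = directional_bias_matrix(d)
    return math.exp(-t / tau) * (kp * quad_form(Q, p) + kv * quad_form(Q, v))
```

With d pointing straight at the target, a displacement along d incurs no bias cost, while a purely lateral displacement is penalized by its squared magnitude, which is the behavior the text describes.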

Here, we further extended this idea by introducing a directional bias modulated by trial-by-trial TE (upper panel, Fig 6B). The directional bias is inclined in the direction opposite to the TE so as to reduce it. The direction of the directional bias in the i-th trial is represented by $\varphi_i$, the angle from the target direction (clockwise positive). The TE is equivalent to the directional error $\theta_i$, defined as the angle between the direction from the start position to the target and the direction from the start position to the endpoint of the reach. In the presence of TE (i.e., TE > target size), the directional bias is updated according to the directional error as follows:

$$\varphi_{i+1} = b\,\varphi_i - r\,\theta_i \tag{7}$$

where the constant b is the forgetting rate and is set to 0.95. The constant r is the sensitivity of the directional bias update to the directional error and is set to 0.85. The initial value of the directional bias is 0 (i.e., $\varphi_1 = 0$).

In the absence of TE (i.e., TE < target size), we assumed that the directional bias decays gradually across trials back to the original direction towards the target as follows:

$$\varphi_{i+1} = b\,\varphi_i \tag{8}$$

Additionally, we assume that the kinematic plan adaptation is also affected by the motor cost (Eq 6, and see S2 Text) of the generated reach, and that the decay of the directional bias stops (i.e., b = 1) when the cost falls below 0.01. The threshold value was chosen arbitrarily to produce curved null trajectories similar to those in the experiments.

Once a TE greater than the target size occurs, the kinematic bias becomes active. In contrast, if the TEs remain within the target size throughout the experiment, the kinematic bias remains inactive.
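The trial-by-trial rule of Eqs 7 and 8 can be sketched as a single update function. This is a minimal illustration, not the authors' implementation: the function name and argument names are hypothetical, and applying b = 1 only in the no-failure branch is an assumption, since the text states the cost threshold in connection with the decay.

```python
def update_directional_bias(phi, theta, te, target_size, cost,
                            b=0.95, r=0.85, cost_threshold=0.01):
    """Return the directional bias for the next trial.

    phi, theta: current bias angle and directional error (same angular unit).
    te, target_size: target error and target size (same length unit).
    cost: motor cost of the reach just generated.
    """
    if te > target_size:
        # Failure: Eq 7, tilt the bias against the directional error.
        return b * phi - r * theta
    if cost < cost_threshold:
        # Success with low motor cost: decay stops (b = 1), bias is kept.
        return phi
    # Success otherwise: Eq 8, bias decays toward the target direction.
    return b * phi

# Hypothetical values: a failure on trial 1, then a success on trial 2.
phi = 0.0
phi = update_directional_bias(phi, theta=10.0, te=20.0, target_size=7.5, cost=1.0)
phi = update_directional_bias(phi, theta=0.0, te=5.0, target_size=7.5, cost=1.0)
```

With the initial bias at 0, the bias stays at 0 for as long as no failure occurs, which matches the statement that the kinematic bias remains inactive if the TEs never exceed the target size.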

Next, to simulate the internal model adaptation in novel force fields, we changed the value of the learning rate α. In the adaptation phase, α is increased from 0 to 0.8 such that αi = 0.8·log(log(i)+1)/log(log(155)+1) for 1≤i≤155. In the de-adaptation phase, α is decreased from 0.8 to 0 over the first 30 de-adaptation trials, because the de-adaptation process is well known to be much faster than the adaptation process [72]. This was given by αi = 0.8·log(log(i−155)+1)/log(log(30)+1) for 156≤i≤185; αi = 0 for 186≤i≤305. The update rule for α was chosen to reproduce the experimental observations well.
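The schedule for α can be written as a small function of the trial number. The adaptation branch follows the expression in the text. For the de-adaptation branch, the expression as printed rises from 0 to 0.8 over trials 156 to 185, whereas the text describes a decrease from 0.8 to 0; the sketch below therefore assumes the complement of the printed ramp. That choice, and the function and parameter names, are assumptions for illustration only.

```python
import math

def alpha_schedule(i, alpha_max=0.8, n_adapt=155, n_washout=30):
    """Return alpha on trial i (1-based) of the simulated session."""
    if i < 1:
        raise ValueError("trial index must be >= 1")

    def ramp(k, n):
        # Rises from 0 at k = 1 to 1 at k = n.
        return math.log(math.log(k) + 1.0) / math.log(math.log(n) + 1.0)

    if i <= n_adapt:
        # Adaptation phase: expression given in the text.
        return alpha_max * ramp(i, n_adapt)
    if i <= n_adapt + n_washout:
        # De-adaptation phase. Assumption: complement of the printed
        # ramp, so that alpha falls from alpha_max to 0.
        return alpha_max * (1.0 - ramp(i - n_adapt, n_washout))
    return 0.0

# Alpha over the 305 trials of the simulated session.
alphas = [alpha_schedule(i) for i in range(1, 306)]
```

The double logarithm makes α rise steeply over the first few trials and then flatten, so most of the simulated internal model adaptation happens early in the adaptation phase.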

We simulated the reaching trajectory of the arm modeled as a point mass in Cartesian coordinates. The movement distance was 150 mm. B1 and K1 were set to 7 Ns/m and 120 N/m for the simulation of VDCF and LIPF, respectively, to produce trajectories similar to those in the experiments. PEC was applied over the second half of the movement (y > 75 mm) as a one-dimensional spring force (1500 N/m) and damper (100 Ns/m) along the x-axis. We discretized the system dynamics with a time step of Δt = 10 ms and performed the model simulation in a way similar to that introduced by Izawa et al. [23], except for the directional bias modulated by the history of TE. Please see S2 Text for further details of the model (section on the OFC model).
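The PEC described above can be sketched as a force function of hand position and velocity. This is an illustration under stated assumptions: positions are in meters measured from the start position, the spring is centered on the straight start-to-target line (x = 0), and the function and constant names are hypothetical.

```python
import numpy as np

K_PEC = 1500.0   # spring stiffness along x (N/m), from the text
B_PEC = 100.0    # damping along x (N s/m), from the text
Y_ON = 0.075     # PEC is active for y > 75 mm

def pec_force(pos, vel):
    """Return the PEC force [Fx, Fy] in newtons for one time step."""
    x, y = pos
    vx, _ = vel
    if y <= Y_ON:
        # First half of the movement: no PEC force.
        return np.zeros(2)
    # Second half: spring and damper act along x only.
    return np.array([-K_PEC * x - B_PEC * vx, 0.0])

# Hypothetical hand state: 10 mm lateral deviation at y = 100 mm.
print(pec_force([0.010, 0.100], [0.1, 0.3]))
```

In a discretized simulation, this force would be added to the force field term at each 10 ms step before the state update.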

V-shaped model

The original model (i.e., the flat VS model) assumes that the desired trajectory, which the motor system should trace, is a fixed straight line joining the start and target, and that the motor command is gradually corrected to reduce the difference between the actual and desired trajectories, which is defined as the movement error. In simulations with the model, the error is represented in muscle-length coordinates and written as:

$$E = \lambda - \lambda_0 \tag{9}$$

where E is the movement error, that is, the difference between the actual muscle length λ and the desired muscle length $\lambda_0$. This error is used to update the feedforward command to each muscle of the arm on a trial-by-trial basis, based on a simple V-shaped learning function (see S2 Text). The feedforward command for each muscle k is updated from $u_k^i$ to $u_k^{i+1}$ according to the following learning law:

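$$\begin{aligned} u_k^{i+1}(t) &\equiv \left[u_k^i(t) + \Delta u_k^i(t+\phi)\right]_+, \quad [\cdot]_+ \equiv \max\{\cdot, 0\} \\ \Delta u_k^i(t) &= \alpha\,\varepsilon_{k,+}^i(t) + \beta\,\varepsilon_{k,-}^i(t) - \gamma, \quad [\cdot]_- \equiv [-\,\cdot]_+ \\ \varepsilon_k^i(t) &= E_k^i(t) + g_d\,\dot{E}_k^i(t) \end{aligned} \tag{10}$$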

where $E_k^i(t)$ is the stretching/shortening of muscle k at time t in trial i, and Δu is phase-advanced by ϕ > 0, the feedback delay. α and β are the learning parameters (α > β > 0), and γ (> 0) is a constant de-activation parameter. The term $g_d$ (> 0) sets the relative weight of velocity error to length error. By implementing this learning law in a 2-joint, 6-muscle arm model, Franklin et al. [25] and Tee et al. [73] simulated reaching trajectories in a broad range of novel force field environments.
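The learning law of Eq 10 can be sketched for a single muscle sampled on a discrete time grid. This is a minimal illustration rather than the implementation of Franklin et al. [25]: the function name, the representation of the phase advance as a whole number of samples, the rule of holding the last value at the end of the movement, and the parameter values in the example are all assumptions.

```python
import numpy as np

def v_shaped_update(u, E, dE, alpha, beta, gamma, g_d, shift):
    """Return the next-trial feedforward command for one muscle.

    u, E, dE: arrays over time samples (command, length error, and
    time derivative of the length error). shift: phase advance in samples.
    """
    u = np.asarray(u, dtype=float)
    eps = np.asarray(E, dtype=float) + g_d * np.asarray(dE, dtype=float)
    eps_pos = np.maximum(eps, 0.0)    # [eps]_+ : muscle longer than desired
    eps_neg = np.maximum(-eps, 0.0)   # [eps]_- : muscle shorter than desired
    du = alpha * eps_pos + beta * eps_neg - gamma
    # Phase advance du(t + phi). Assumption: hold the last value at the end.
    du_adv = np.concatenate([du[shift:], np.full(shift, du[-1])])
    # Commands cannot be negative: [.]_+ in the first line of Eq 10.
    return np.maximum(u + du_adv, 0.0)

# Hypothetical example: stretch at sample 0, shortening at sample 1.
u_next = v_shaped_update(u=np.zeros(5), E=[1.0, -1.0, 0.0, 0.0, 0.0],
                         dE=np.zeros(5), alpha=2.0, beta=1.0,
                         gamma=0.5, g_d=0.0, shift=0)
print(u_next)
```

Both stretch and shortening raise the command, with stretch weighted more strongly because α > β, which gives the update its V shape. Where the error is zero, the constant γ lowers the command toward zero.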

Here, we extend the flat model by introducing the idea that the desired trajectory (lower panel in Fig 6B), which is represented in Cartesian coordinates, is updated according to the trial-by-trial TE, in a way similar to the hierarchical OFC model. The desired trajectory is described as a curved line with a deflection, dx, at 120 mm from the start position along the y-axis (Fig 6B). Before adaptation, the desired trajectory is the straight line towards the target, that is, dx = 0. In the presence of TE (i.e., TE > target size), dx is updated as follows:

$$\mathrm{dx}_{i+1} = b\,\mathrm{dx}_i - r\,\mathrm{TE}_i \tag{11}$$

where the constant b represents the retention of motor learning and is set to 0.95. The constant r is the sensitivity of the update of dx to the TE in the previous trial and is set to 0.45. In the presence of TE, dx is modulated such that the desired trajectory is deflected in the direction opposite to the trial-by-trial TE. The desired trajectory with deflection dx was calculated as the minimum jerk trajectory with a via-point at [dx 120] (mm) from the start position [74].
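As an illustrative unrolling of Eq 11 (a sketch, not taken from the original text), assume that dx starts at 0 and that the TE exceeds the target size on each of the first n trials. The deflection is then an exponentially weighted sum of the past target errors:

```latex
\mathrm{dx}_{n+1} = -r \sum_{k=1}^{n} b^{\,n-k}\, \mathrm{TE}_k ,
\qquad b = 0.95,\quad r = 0.45 .
```

Under these assumptions, an error made m trials earlier contributes with weight $r\,b^{m}$, so the most recent failures dominate the deflection of the desired trajectory.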

In the absence of TE (i.e., TE < target size), we assumed that the desired trajectory decays gradually across trials back to the original straight line towards the target as follows:

$$\mathrm{dx}_{i+1} = b\,\mathrm{dx}_i \tag{12}$$

We again assume that the kinematic plan adaptation is affected by the motor cost of the generated reach, which is calculated as the average muscle tension across all 6 muscles during the movement (see S2 Text). When the cost falls below 350, the decay of the desired trajectory stops, i.e., b = 1. The threshold value was again chosen arbitrarily to produce curved null trajectories similar to those in the experiments.

In the simulation, the desired trajectory was converted from Cartesian to muscle space to apply it to the learning law (Eq 10). The start and target positions were at [0, 350] and [0, 500] (mm) in Cartesian coordinates (where [0, 0] is at the shoulder joint), respectively. The reach duration was 400 ms. For simplicity, all noise parameters were set to zero. B1 and K1 (see the section on force fields) were set to 20 Ns/m and 120 N/m, respectively, to produce trajectories similar to those in the experiments. PEC was applied over the second half of the movement (y > 75 mm) as a one-dimensional spring force (2500 N/m) and damper (1000 Ns/m) along the x-axis. We performed the model simulation in the same way as that introduced by Franklin et al. [25], except that the desired trajectory is modulated by the history of endpoint error. Please see S2 Text for further details of the model (section on the V-shaped model).

Supporting information

S1 Text. Trajectory adaptation in Experiment-2.

(DOCX)

S2 Text. Detail for the simulation.

(DOCX)

S1 Fig

Trajectory adaptation in Experiment-2: (A, C) The hand trajectories of two representative participants and learning curves in PSPF (A) and CPVF (C) averaged across all participants. Note that the scales differ between x and y axes to clearly show trajectory changes along the x-axis. The light gray shades behind some trajectories represent a schematic image of the force field. The adaptation of the TE and LD are shown by traces with open circles and filled circles, respectively. The first 15 TE and LD values are plotted for every single trial, while the subsequent trials (indicated by thick gray lines at the bottom of the figure) are plotted for every five trials. The shaded gray areas around the lines represent standard errors. The light green zones represent the target width (radius: 7.5 mm). (B, D) The TEs and baseline-subtracted LDs in six trial epochs (1st, 3rd-5th, 136th-155th adaptation trials, and 1st, 3rd-5th, 131st-150th de-adaptation trials) in PSPF (B) and CPVF (D). Gray dots represent data from individual participants. The error bars indicate standard errors. The light green zone in the TE plots represents the target width.

(TIF)

S2 Fig

Simulation results for trajectory adaptation in CPVF, represented by TE (open circle) and LD (filled circle) by the flat (A, B)/hierarchical (C, D) OFC (upper panels) and VS models (lower panels). The flat learning models (only internal model adaptation) were unable to reproduce either the non-monotonic change in LD or the curved null trajectory with a persistent deviation after exposure to CPVF. However, the hierarchical OFC models (kinematic plan learning and internal model adaptation) successfully reproduced both.

(TIF)

Acknowledgments

We thank Ms. Yuka Furukawa and Ms. Naoko Katagiri for help in recruiting the participants. We thank Dr. Jun Izawa and Dr. Tee for providing the codes used in their studies.

Data Availability

Data and codes for all experiments and simulations are freely available in the Dryad repository at the URL: https://doi.org/10.5061/dryad.5x69p8d2f [75].

Funding Statement

TI was supported by JSPS KAKENHI Grant #26750387 (https://www.jsps.go.jp/j-grantsinaid/). RO was supported by KAKENHI Grants #17H02128 and #20H05482. MK was supported by AMED Grant #JP20dm0307008 (https://www.amed.go.jp/), and JST ERATO Grant #JPMJER1801 (https://www.jst.go.jp/). TY was supported by a contract with the National Institute of Information and Communications Technology, entitled ‘Development of network dynamics modeling methods for human brain data simulation systems’ (https://www.nict.go.jp/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Botvinick MM. Hierarchical reinforcement learning and decision making. Curr Opin Neurobiol. 2012;22(6):956–62. doi:10.1016/j.conb.2012.05.008
  • 2. Sugrue LP, Corrado GS, Newsome WT. Choosing the greater of two goods: neural currencies for valuation and decision making. Nat Rev Neurosci. 2005;6(5):363–75. doi:10.1038/nrn1666
  • 3. Shadmehr R, Smith MA, Krakauer JW. Error correction, sensory prediction, and adaptation in motor control. Annu Rev Neurosci. 2010;33:89–108. doi:10.1146/annurev-neuro-060909-153135
  • 4. Tseng YW, Diedrichsen J, Krakauer JW, Shadmehr R, Bastian AJ. Sensory prediction errors drive cerebellum-dependent adaptation of reaching. J Neurophysiol. 2007;98(1):54–62. doi:10.1152/jn.00266.2007
  • 5. Wolpert DM, Diedrichsen J, Flanagan JR. Principles of sensorimotor learning. Nat Rev Neurosci. 2011;12(12):739–51. doi:10.1038/nrn3112
  • 6. Kawato M, Furukawa K, Suzuki R. A hierarchical neural-network model for control and learning of voluntary movement. Biol Cybern. 1987;57(3):169–85. doi:10.1007/BF00364149
  • 7. McDougle SD, Ivry RB, Taylor JA. Taking Aim at the Cognitive Side of Learning in Sensorimotor Adaptation Tasks. Trends Cogn Sci. 2016;20(7):535–44. doi:10.1016/j.tics.2016.05.002
  • 8. Krakauer JW, Hadjiosif AM, Xu J, Wong AL, Haith AM. Motor Learning. Compr Physiol. 2019;9(2):613–63. doi:10.1002/cphy.c170043
  • 9. Taylor JA, Krakauer JW, Ivry RB. Explicit and Implicit Contributions to Learning in a Sensorimotor Adaptation Task. J Neurosci. 2014;34(8):3023–32. doi:10.1523/JNEUROSCI.3619-13.2014
  • 10. McDougle SD, Bond KM, Taylor JA. Explicit and Implicit Processes Constitute the Fast and Slow Processes of Sensorimotor Learning. J Neurosci. 2015;35(26):9568–79. doi:10.1523/JNEUROSCI.5061-14.2015
  • 11. Smith MA, Ghazizadeh A, Shadmehr R. Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biol. 2006;4(6):e179. doi:10.1371/journal.pbio.0040179
  • 12. Lee JY, Schweighofer N. Dual adaptation supports a parallel architecture of motor memory. J Neurosci. 2009;29(33):10396–404. doi:10.1523/JNEUROSCI.1294-09.2009
  • 13. Keisler A, Shadmehr R. A shared resource between declarative memory and motor memory. J Neurosci. 2010;30(44):14817–23. doi:10.1523/JNEUROSCI.4160-10.2010
  • 14. Schween R, McDougle SD, Hegele M, Taylor JA. Explicit strategies in force field adaptation. bioRxiv. 2019:694430. doi:10.1101/694430
  • 15. Kim HE, Parvin DE, Ivry RB. The influence of task outcome on implicit motor learning. Elife. 2019;8. doi:10.7554/eLife.39882
  • 16. Leow LA, Marinovic W, de Rugy A, Carroll TJ. Task errors drive memories that improve sensorimotor adaptation. J Neurosci. 2020. doi:10.1523/JNEUROSCI.1506-19.2020
  • 17. Osu R, Hirai S, Yoshioka T, Kawato M. Random presentation enables subjects to adapt to two opposing forces on the hand. Nat Neurosci. 2004;7(2):111–2. doi:10.1038/nn1184
  • 18. Shadmehr R, Mussa-Ivaldi FA. Adaptive representation of dynamics during learning of a motor task. J Neurosci. 1994;14(5 Pt 2):3208–24. doi:10.1523/JNEUROSCI.14-05-03208.1994
  • 19. Morasso P. Spatial control of arm movements. Exp Brain Res. 1981;42(2):223–7. doi:10.1007/BF00236911
  • 20. Hollerbach MJ, Flash T. Dynamic interactions between limb segments during planar arm movement. Biol Cybern. 1982;44(1):67–77. doi:10.1007/BF00353957
  • 21. Lackner JR, Dizio P. Rapid adaptation to Coriolis force perturbations of arm trajectory. J Neurophysiol. 1994;72(1):299–313. doi:10.1152/jn.1994.72.1.299
  • 22. DiZio P, Lackner JR. Congenitally blind individuals rapidly adapt to coriolis force perturbations of their reaching movements. J Neurophysiol. 2000;84(4):2175–80. doi:10.1152/jn.2000.84.4.2175
  • 23. Izawa J, Rane T, Donchin O, Shadmehr R. Motor adaptation as a process of reoptimization. J Neurosci. 2008;28(11):2883–91. doi:10.1523/JNEUROSCI.5359-07.2008
  • 24. Todorov E, Jordan MI. Optimal feedback control as a theory of motor coordination. Nat Neurosci. 2002;5(11):1226–35. doi:10.1038/nn963
  • 25. Franklin DW, Burdet E, Tee KP, Osu R, Chew CM, Milner TE, et al. CNS learns stable, accurate, and efficient movements using a simple algorithm. J Neurosci. 2008;28(44):11165–73. doi:10.1523/JNEUROSCI.3099-08.2008
  • 26. Mistry M, Theodorou E, Schaal S, Kawato M. Optimal control of reaching includes kinematic constraints. J Neurophysiol. 2013;110(1):1–11. doi:10.1152/jn.00794.2011
  • 27. Schmidt RA, Lee TD. Motor control and learning: a behavioral emphasis. 4th ed. Champaign, IL: Human Kinetics; 2005. vi, 537 p.
  • 28. Krakauer JW, Pine ZM, Ghilardi MF, Ghez C. Learning of visuomotor transformations for vectorial planning of reaching trajectories. J Neurosci. 2000;20(23):8916–24. doi:10.1523/JNEUROSCI.20-23-08916.2000
  • 29. Chib VS, Patton JL, Lynch KM, Mussa-Ivaldi FA. Haptic identification of surfaces as fields of force. J Neurophysiol. 2006;95(2):1068–77. doi:10.1152/jn.00610.2005
  • 30. Ganesh G, Haruno M, Kawato M, Burdet E. Motor memory and local minimization of error and effort, not global optimization, determine motor behavior. J Neurophysiol. 2010;104(1):382–90. doi:10.1152/jn.01058.2009
  • 31. Kodl J, Ganesh G, Burdet E. The CNS stochastically selects motor plan utilizing extrinsic and intrinsic representations. PLoS One. 2011;6(9):e24229. doi:10.1371/journal.pone.0024229
  • 32. Ganesh G, Burdet E. Motor planning explains human behaviour in tasks with multiple solutions. Robotics and Autonomous Systems. 2013;61(4):362–8.
  • 33. Diedrichsen J, White O, Newman D, Lally N. Use-dependent and error-based learning of motor behaviors. J Neurosci. 2010;30(15):5159–66. doi:10.1523/JNEUROSCI.5406-09.2010
  • 34. Huang VS, Haith A, Mazzoni P, Krakauer JW. Rethinking motor learning and savings in adaptation paradigms: model-free memory for successful actions combines with internal models. Neuron. 2011;70(4):787–801. doi:10.1016/j.neuron.2011.04.012
  • 35. Vindras P, Desmurget M, Prablanc C, Viviani P. Pointing errors reflect biases in the perception of the initial hand position. J Neurophysiol. 1998;79(6):3290–4. doi:10.1152/jn.1998.79.6.3290
  • 36. Ostry DJ, Darainy M, Mattar AA, Wong J, Gribble PL. Somatosensory plasticity and motor learning. J Neurosci. 2010;30(15):5384–93. doi:10.1523/JNEUROSCI.4571-09.2010
  • 37. Modchalingam S, Vachon CM, 't Hart BM, Henriques DYP. The effects of awareness of the perturbation during motor adaptation on hand localization. PLoS One. 2019;14(8):e0220884. doi:10.1371/journal.pone.0220884
  • 38. Shmuelof L, Huang VS, Haith AM, Delnicki RJ, Mazzoni P, Krakauer JW. Overcoming motor "forgetting" through reinforcement of learned actions. J Neurosci. 2012;32(42):14617–21. doi:10.1523/JNEUROSCI.2184-12.2012
  • 39. Galea JM, Mallia E, Rothwell J, Diedrichsen J. The dissociable effects of punishment and reward on motor learning. Nat Neurosci. 2015;18(4):597–602. doi:10.1038/nn.3956
  • 40. Codol O, Holland PJ, Galea JM. The relationship between reinforcement and explicit control during visuomotor adaptation. Sci Rep. 2018;8(1):9121. doi:10.1038/s41598-018-27378-1
  • 41. Holland P, Codol O, Oxley E, Taylor M, Hamshere E, Joseph S, et al. Domain-Specific Working Memory, But Not Dopamine-Related Genetic Variability, Shapes Reward-Based Motor Learning. J Neurosci. 2019;39(47):9383–96. doi:10.1523/JNEUROSCI.0583-19.2019
  • 42. Miyamoto YR, Wang S, Smith MA. Implicit adaptation compensates for erratic explicit strategy in human motor learning. Nat Neurosci. 2020;23(3):443–55. doi:10.1038/s41593-020-0600-3
  • 43. Botvinick MM. Hierarchical models of behavior and prefrontal function. Trends Cogn Sci. 2008;12(5):201–8. doi:10.1016/j.tics.2008.02.009
  • 44. Botvinick MM, Niv Y, Barto AG. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition. 2009;113(3):262–80. doi:10.1016/j.cognition.2008.08.011
  • 45. Ribas-Fernandes JJ, Solway A, Diuk C, McGuire JT, Barto AG, Niv Y, et al. A neural signature of hierarchical reinforcement learning. Neuron. 2011;71(2):370–9. doi:10.1016/j.neuron.2011.05.042
  • 46. Badre D, Doll BB, Long NM, Frank MJ. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron. 2012;73(3):595–607. doi:10.1016/j.neuron.2011.12.025
  • 47. Badre D, Frank MJ. Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: evidence from FMRI. Cereb Cortex. 2012;22(3):527–36. doi:10.1093/cercor/bhr117
  • 48. Kawato M, Samejima K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr Opin Neurobiol. 2007;17(2):205–12. doi:10.1016/j.conb.2007.03.004
  • 49. Merel J, Botvinick M, Wayne G. Hierarchical motor control in mammals and machines. Nat Commun. 2019;10(1):5489. doi:10.1038/s41467-019-13239-6
  • 50. Barto AG, Sutton RS. Reinforcement learning. The MIT Press; 1998.
  • 51. Barto AG, Mahadevan S. Recent advances in hierarchical reinforcement learning. Discrete Event Systems journal. 2003;13:44–77.
  • 52. Diedrichsen J, Hashambhoy Y, Rane T, Shadmehr R. Neural correlates of reach errors. J Neurosci. 2005;25(43):9919–31. doi:10.1523/JNEUROSCI.1874-05.2005
  • 53. Shadmehr R, Holcomb HH. Neural correlates of motor memory consolidation. Science. 1997;277(5327):821–5. doi:10.1126/science.277.5327.821
  • 54. Diedrichsen J, Criscimagna-Hemminger SE, Shadmehr R. Dissociating timing and coordination as functions of the cerebellum. J Neurosci. 2007;27(23):6291–301. doi:10.1523/JNEUROSCI.0061-07.2007
  • 55. Imamizu H, Kawato M. Neural correlates of predictive and postdictive switching mechanisms for internal models. J Neurosci. 2008;28(42):10751–65. doi:10.1523/JNEUROSCI.1106-08.2008
  • 56. Burdet E, Osu R, Franklin DW, Milner TE, Kawato M. The central nervous system stabilizes unstable dynamics by learning optimal impedance. Nature. 2001;414(6862):446–9. doi:10.1038/35106566
  • 57. Cluff T, Scott SH. Rapid feedback responses correlate with reach adaptation and properties of novel upper limb loads. J Neurosci. 2013;33(40):15903–14. doi:10.1523/JNEUROSCI.0263-13.2013
  • 58. Scheidt RA, Ghez C. Separate adaptive mechanisms for controlling trajectory and final position in reaching. J Neurophysiol. 2007;98(6):3600–13. doi:10.1152/jn.00121.2007
  • 59. Crevecoeur F, Thonnard JL, Lefevre P. A Very Fast Time Scale of Human Motor Adaptation: Within Movement Adjustments of Internal Representations during Reaching. eNeuro. 2020;7(1). doi:10.1523/ENEURO.0149-19.2019
  • 60. Braun DA, Aertsen A, Wolpert DM, Mehring C. Learning optimal adaptation strategies in unpredictable motor tasks. J Neurosci. 2009;29(20):6472–8. doi:10.1523/JNEUROSCI.3075-08.2009
  • 61. Maeda RS, Cluff T, Gribble PL, Pruszynski JA. Feedforward and Feedback Control Share an Internal Model of the Arm’s Dynamics. J Neurosci. 2018;38(49):10505–14. doi:10.1523/JNEUROSCI.1709-18.2018
  • 62. Hayashi T, Yokoi A, Hirashima M, Nozaki D. Visuomotor Map Determines How Visually Guided Reaching Movements are Corrected Within and Across Trials. eNeuro. 2016;3(3). doi:10.1523/ENEURO.0032-16.2016
  • 63. Wagner MJ, Smith MA. Shared internal models for feedforward and feedback control. J Neurosci. 2008;28(42):10663–73. doi:10.1523/JNEUROSCI.5479-07.2008
  • 64. Scott SH. Optimal feedback control and the neural basis of volitional motor control. Nat Rev Neurosci. 2004;5(7):532–46. doi:10.1038/nrn1427
  • 65. Cashaback JGA, McGregor HR, Mohatarem A, Gribble PL. Dissociating error-based and reinforcement-based loss functions during sensorimotor learning. PLoS Comput Biol. 2017;13(7):e1005623. doi:10.1371/journal.pcbi.1005623
  • 66. Oldfield RC. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia. 1971;9(1):97–113.
  • 67. Ganesh G, Takagi A, Osu R, Yoshioka T, Kawato M, Burdet E. Two is better than one: Physical interactions improve motor performance in humans. Sci Rep. 2014;4:3824. doi:10.1038/srep03824
  • 68. Elliott D, Helsen WF, Chua R. A century later: Woodworth’s (1899) two-component model of goal-directed aiming. Psychol Bull. 2001;127(3):342–57. doi:10.1037/0033-2909.127.3.342
  • 69. Novak KE, Miller LE, Houk JC. Kinematic properties of rapid hand movements in a knob turning task. Exp Brain Res. 2000;132(4):419–33. doi:10.1007/s002210000366
  • 70. Scheidt RA, Reinkensmeyer DJ, Conditt MA, Rymer WZ, Mussa-Ivaldi FA. Persistence of motor adaptation during constrained, multi-joint, arm movements. J Neurophysiol. 2000;84(2):853–62. doi:10.1152/jn.2000.84.2.853
  • 71. Todorov E. Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Comput. 2005;17(5):1084–108. doi:10.1162/0899766053491887
  • 72. Shadmehr R, Wise SP. The computational neurobiology of reaching and pointing. Cambridge, Massachusetts: The MIT Press; 2005.
  • 73. Tee KP, Franklin DW, Kawato M, Milner TE, Burdet E. Concurrent adaptation of force and impedance in the redundant muscle system. Biol Cybern. 2010;102(1):31–44. doi:10.1007/s00422-009-0348-z
  • 74. Flash T, Hogan N. The coordination of arm movements: an experimentally confirmed mathematical model. J Neurosci. 1985;5(7):1688–703. doi:10.1523/JNEUROSCI.05-07-01688.1985
  • 75. Ikegami T, Ganesh G, Gibo LT, Yoshioka T, Osu R, Kawato M. Data for: Hierarchial motor adaptations negotiate failures during force field learning [Internet]. Dryad. 2021. Available from: https://doi.org/10.5061/dryad.5x69p8d2f
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008481.r001

Decision Letter 0

Adrian M Haith, Samuel J Gershman

17 Dec 2020

Dear Dr Ikegami,

Thank you very much for submitting your manuscript "Hierarchical motor adaptations negotiate failures during force field learning" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers all agreed that the experiments and model make a potentially valuable contribution to our understanding of motor adaptation. However, they also highlighted a number of limitations of the work and some concerns related to the presentation. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

The reviewers also made a number of suggestions for possible experiments to help address some of the current limitations of the paper. These suggestions would undoubtedly improve the paper if you choose to pursue them. However, it may not be essential to include any additional experiments in your revision if instead you believe the concerns can be adequately addressed by more clearly acknowledging and discussing the limitations of the existing approach.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Adrian M Haith

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: Ikegami and colleagues investigate patterns of adaptation to different types of force fields across several experiments. Adaptation to standard velocity dependent force fields and position dependent force fields are studied. These perturbations differ notably by their magnitude near the end of movement, such that the velocity dependent force field vanishes near the target when velocity decreases, whereas the position dependent force field builds up and increases throughout the reach. As a consequence, there is a difference in end-point error, which is larger for the position-dependent force fields. Different patterns of adaptation and after effects are documented, and it is argued that the terminal target error following position-dependent force fields is a “failure signal”, which plays a key role in adaptation. To account for their results, the authors put forward a computational model that includes changes in the kinematic plan which aims at correcting for the target error, in addition to the adaptation of an internal model to cancel sensory prediction errors.

This is an excellent paper. It provides a comprehensive and rigorous analysis of a robust and reproducible aspect of adaptation to position dependent force fields, and highlights that adaptation strategies may process different error signals during planning and control stages. From a computational perspective, the study is a convincing demonstration that kinematic plans and adaptation can be studied in closed loop control models. This major contribution helps bring together the often-separated fields of human feedback control and motor adaptation.

In my view, the paper could be published as is. I only have one major concern and a number of minor comments for which I am interested to hear the authors' point of view. Some of them may require clarifications or short discussion points:

Major

My only major criticism is that the movement kinematics varied a lot across experiments and conditions, leading to the possibility that some aspects linked to online control also play a role. For instance, in Experiment 1, the magnitude of the position dependent FF produced larger deviations, and we do not know what would happen if the lateral deviations were comparable across force fields. The same concern holds for the control experiment #3: the virtual spring indeed reduces the target error but it also has an impact on many other kinematic parameters. Of course, it is very difficult, if at all possible, to only alter target error while matching all other kinematic parameters, but perhaps a complementary analysis such as subsampling trials with comparable kinematic errors to compare after effects would be useful; otherwise a note of caution seems necessary.
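
One simple way such a subsampling analysis could be set up is sketched below. It is only an illustration of the idea, not an analysis from the manuscript: the function name `match_by_overlap` is hypothetical, the deviation values are made up, and keeping trials inside the shared range of the two conditions is just one of several possible matching criteria.

```python
import numpy as np

def match_by_overlap(dev_a, dev_b):
    """Return indices of trials whose deviation lies in the range shared by both conditions."""
    dev_a = np.asarray(dev_a, dtype=float)
    dev_b = np.asarray(dev_b, dtype=float)
    lo = max(dev_a.min(), dev_b.min())  # lower edge of the shared range
    hi = min(dev_a.max(), dev_b.max())  # upper edge of the shared range
    keep_a = np.flatnonzero((dev_a >= lo) & (dev_a <= hi))
    keep_b = np.flatnonzero((dev_b >= lo) & (dev_b <= hi))
    return keep_a, keep_b

# Made-up peak lateral deviations (mm) for illustration only.
velocity_field = [8.0, 12.0, 15.0, 20.0, 24.0]
position_field = [18.0, 22.0, 30.0, 41.0, 55.0]

keep_v, keep_p = match_by_overlap(velocity_field, position_field)
print("velocity-field trials kept:", keep_v.tolist())
print("position-field trials kept:", keep_p.tolist())
```

After effects would then be compared only for the retained trials, at the cost of a smaller sample in each condition.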

Specific:

Line 64: “motor movement” is a strange formulation.

Intro, paragraph around line 75: The authors imply a strong link between explicit strategies, TE-mediated adaptation, and the fast state of motor adaptation which to my knowledge is not clearly established. One critical aspect is the impact of the paradigm, for instance the explicit strategy that counters target errors following visuomotor rotations to my knowledge does not directly apply to force field learning (perhaps it is a conclusion of this paper). Likewise, it is not clear to me that the implicit adaptation is only linked to the slow state. I would suggest clarifying the argument if I am missing anything, or that these concepts be presented with a bit more caution, keeping in mind that differences across adaptation paradigms (rotations and force fields) may hinder the correspondence between fast-slow, and explicit-implicit components.

Related to the previous point, the possibility of rapid feedback adaptation (Crevecoeur et al., 7(1) ENEURO.0149-19.2019 1–16) is clearly linked to the fast state, but it is an SPE adaptation component in the authors’ terminology and likely linked to implicit adaptation as it produces a systematic, one-trial lag after effect in a random scenario.

Lines 91-93: It is a bit of an overstatement that velocity dependent force fields have limited impact in end-point error. Often end-point corrections are not stable and it is not clear which error signal can be used in the case of non-zero terminal velocity or time out errors. Pls consider clarifying that although there is a clear difference in terms of TE magnitude, other aspects of behaviour, such as velocity and ability to stabilise, may also differ.

Lines 108-109: Although it is well defined in the Method section I would recommend giving more info about the force field at this stage (curl field? Orthogonal?)

I was under the impression that the hierarchical nature of the proposed model was not well motivated or not strongly supported in the data. It is fair to assume that a kinematic plan is selected prior to the derivation of the control law, and that there can be some sequential organisation of kinematic (TE) and dynamic (SPE) adaptation. But this is potentially a property of the paradigm in which the target is fixed, producing an apparent hierarchy. However, there is evidence for online adaptation to changes in target location (Braun et al., J Neurosci, 29(20):6472–6478), thus it is conceivable that in such a case the update of the kinematic plan be performed downstream of the selection of a control law based on sensory prediction errors. Is the hierarchy truly necessary, or can one imagine that the selection of the kinematic plan and the adaptation of the control law be processed in parallel?

Again on the latter point: the apparent hierarchy is potentially induced by different timescales such that the kinematic plan for a fixed target changes from trial to trial whereas adaptation based on sensory prediction errors uses continuous signals. However, in Crevecoeur et al (eNeuro, cited above) and in Braun et al (JNS, cited above), it is suggested that both sensory prediction error and kinematic plans can be updated within movement. Pls consider discussing the potential impact of the time-scales of adaptation to TE and SPE in the hierarchical organisation of the different adaptive components.

Figure 3: the case strongly rests on the fact that long-lasting after effects in terms of lateral deviation characterise washout after position dependent force fields. It seems to be a robust and reproducible feature of the data and clearly the authors’ model can capture this aspect of behaviour, but is it fair to say that it is not clear why it happened in the first place? That it reveals differences between TE and SPE adaptation is fair, but I was under the impression that this finding was not clearly expected a priori.

Please clarify the learning rule or update rule for alpha, was it fitted to the data? Is there a learning algorithm in the sense that the internal model is deduced, or is it just based on partial compensation of the force field with alpha lesser than one?
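
The distinction raised in this question can be made concrete with a small sketch. It contrasts a fixed compensation factor with a generic error-driven, trial-by-trial update of that factor. The update rule, the parameter names (`retention`, `learning_rate`), and all values are assumptions chosen for illustration; they are not taken from the manuscript and say nothing about which option the authors used.

```python
def residual_force(field_force, alpha):
    """Force left uncompensated when the internal model cancels a fraction alpha of the field."""
    return (1.0 - alpha) * field_force

def learn_alpha(n_trials, retention=1.0, learning_rate=0.2, alpha0=0.0):
    """Generic single-state update: alpha moves toward full compensation (alpha = 1)."""
    alphas = [alpha0]
    for _ in range(n_trials):
        error = 1.0 - alphas[-1]  # fraction of the field not yet compensated
        alphas.append(retention * alphas[-1] + learning_rate * error)
    return alphas

# Option 1: fixed partial compensation (alpha chosen by hand or fitted).
print(residual_force(10.0, 0.8))

# Option 2: alpha learned across trials from the remaining error.
alphas = learn_alpha(20)
print(round(alphas[-1], 3))
```

In the first case alpha is a free parameter; in the second it emerges from the error history, which is the sense of "learning algorithm" the question seems to have in mind.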

Pls consider also sharing the simulation code to help the community playing with models of human adaptation and control.

Respectfully submitted,

F. Crevecoeur

Reviewer #2: In this well-written study, the authors ask how motor internal models of environmental dynamics and task error (success/failure) information interact to induce motor learning in humans. To this end, they designed and ran a series of well-thought-out behavioral experiments, which entail participants reaching in various force-fields with their upper limb. While all the force-fields trigger learning at the internal model level, they were cleverly designed to manipulate the amount of task error participants would be exposed to. This enabled the authors to assess the qualitative relationship between internal model and task error. Next, they modelled the observed human behavior with optimal control models to assess the compatibility of the data with distinct working hypotheses: first, that internal model adaptation relies on sensorimotor prediction error alone (“flat” models) or second, that it relies on both sensorimotor prediction error and task error (“hierarchical” models).

While I particularly appreciate the experimental designs and the thoughtful choice of force-fields to tease apart sensory prediction error and task error influence on internal model adaptation, I have two main concerns as detailed below. First, that the behavioral results are presented too much at face value, and that additional controls may benefit the soundness of the results; and second that the modelling section does not really give a “fighting chance” to alternative models, meaning that it does not provide a lot of additional information on top of what the behavioral experiments tell us.

Major points

Lines 78-79: Miyamoto et al (2020) do not claim that the explicit and implicit components of adaptation occur “in parallel” but on the contrary, that they interact. In that sense, that citation would be more fitting several lines down (lines 80-83), where the authors mention work suggesting that task error and sensorimotor prediction error interact. This also applies at lines 475-476.

Behavioral experiments: the LIPF group shows much larger TE errors at the end of the movement than the VDCF group showed LD errors (nearly 4 times more on average based on figure 1). This discrepancy is less pronounced in the PSPF versus CPVF comparison but still present (nearly 2 times more on average from figure S1). This consistent discrepancy may lead to increased feedback gains, or increased stiffness through co-contraction or postural control specifically for the TE groups to help reduce errors. This may explain baseline differences, at least in theory. For instance, a change in postural control may lead to a change in baseline trajectory to accommodate the biomechanics of the arm (Sergio & Scott, Exp Brain Res, 1998). An increase in feedback gains may lead to changes in internal models as hypothesized in Miyamoto, Kawato, Setoyama and Suzuki (1988, Neural Networks), and shown in humans (Maeda, Cluff, Gribble and Pruszynski, 2018, J Neurosci). The larger errors also suggest participants in these conditions experience stronger forces, which may lead to increased muscle fatigue and explain differences in the new baseline. Ideally the force-fields should be tuned so that the maximal lateral deviation at the distance of interest (LD for VDCF and TE for PIPF) is controlled and matching across conditions. Occasional full channel trials may also help rule out an effect of co-contraction as forces against a channel would signify net torques only. I leave it to the authors to decide whether they would include such a control experiment, provide additional analyses to rule out alternative explanations, or explicitly mention this limitation where they deem appropriate in the paper.

Line 275: “Half (15) of these participants had previously participated in Experiment-1.” Was analysis using this data corrected for multiple comparisons due to being used for several statistical tests? Also, if I understand this right, in line 294 “similar to Experiment-1” is a circular statement since half of the data comes from experiment 1. If the second half shows the same trend, it would be more informative to specify that.

Modelling section: it seems unsurprising and therefore uninformative that a model with two “modules” (a task-error driven module and a sensorimotor-error driven module) can better account for the observed data than a model with only one module, because the data shows two uncorrelated behaviors. Indeed, the authors mention a similar point on lines 712-713. In that sense, the study in its current form supports the incompatibility of the data with a task-error blind (flat) model more than it supports the existence of a hierarchical model of the specific form presented here. Since the behavioral data already make the case for a task-error sensitive model, the current study would greatly benefit from a comparison between different plausible hierarchical models that include task error. The authors already lean in that direction in the discussion on the paragraph at lines 489-503, which was particularly interesting. Indeed, they mention the behavioral data of the current study is incompatible with Kim et al (2019)’s “Adaptation Modulation” model. What about the other two models that Kim and colleagues propose, namely the “Movement Reinforcement” and “Dual Error” models? It may be interesting to simulate each of them and assess their compatibility with the data presented here.

Lines 436 and 439: if the model decays very slowly, then it surely does not “converge” to a new baseline? Could the authors please clarify that point?

Discussion: There is quite a lot of missing work of direct relevance to the current study, although I leave it to the authors’ discretion what they deem interesting. The existence of a “new baseline” following reinforcement in visuomotor rotation tasks has been observed many times before (eg Galea et al. 2015 Nat Neuro, Shmuelof et al 2012 J Neurosci), though it remains partially unexplained (Holland et al 2019, J Neurosci), which makes this study potentially exciting. Possible causes that may be relevant are perceptual bias (Vindras et al 1998) and perceptual recalibration (Modchalingam et al 2019), with interesting parallels to the modelling part of this study. An exciting perspective of the current study is that one or both causes are sensitive to task error, which may also be worth pointing out. In lines 501-503 the authors mention that the task error in the current study may be explicit, but explicitly reinforced reaching directions can still lead to an implicit new baseline (Holland et al 2019, J Neurosci), which may be worth clarifying. Finally, at lines 402-404, the study from Cashaback et al (2017, Plos Comp Biol) may be of interest to the authors, as it asks this question specifically.

Minor points

Line 83: I find the term “parallel interaction” ambiguous, as “parallel” suggests independent processing, that is, it suggests no interaction. Some rewording may benefit the general reader by removing the ambiguity.

Line 297: “PEC-LIPF” I assume the authors meant “LIPF-PEC”?

Figure S1: I believe the legend (open and closed circles) is missing on this figure? I only find it mentioned in the caption.

Line 610: How long was the hand constrained at the end?

Line 756: Though I understand it is indirectly described in the text, it may be clearer to also write down the cost function for the flat models for the reader’s benefit.

Reviewer #3: In the current manuscript, the authors set out to study how target errors may affect learning in force field adaptation. While force field adaptation has generally been considered the outcome of internal model adaptation, work in visuomotor rotations has suggested that learning can also be the result of explicit re-aiming strategies which appear to be driven by task performance errors (i.e., target errors). Surprisingly, the role of target errors in force field adaptation has received little attention. Here, the authors created force field conditions that would result in target errors. They find that the learning curve under these target-error inducing force field conditions is radically different from standard viscous curl field conditions, which don’t necessarily result in target errors. In addition, they find that movements appear to be curved in the direction of the force field following de-adaptation. A hierarchical model that includes directional changes in the kinematic plan, in addition to force field adaptation, can account for the time course of learning and resulting changes in hand trajectories.

I thought the overarching question of if/how task performance errors alter the learning function is interesting and the findings show a clear behavioral difference, and the simple addition of a directional bias to the modeling simulations is impressive. I have two main comments, one can easily be addressed through the paper’s framing while the other is a bit more difficult as it pertains to the experimental approach.

First, I felt that the narrative and interpretation from the authors could be made clearer. I suspect that the authors were attempting to remain agnostic or cautious, which may have muddled their narrative. I was left wondering whether they think this effect is truly driven by some form of an explicit re-aiming strategy (an interpretative narrative) or if their intent was simply to emphasize an effect of the presence of target errors (a descriptive narrative). This extends to the discussion of their modeling efforts where it is unclear if they think adaptation to the kinematic plan is the automatic result of target errors or reflects volitional changes (i.e., explicit strategy). Again, I suspect this is caution on the authors’ part, but I think they need to be “explicit” about it so that the reader can clearly know the authors’ position and, as such, correctly cite the work.

Second, there is a conflation between the physical manipulation and psychological manipulation in the task. We need to be aware that target errors, or task performance errors, are a psychological construction of the participant. Instead of changing the framing of the task, the authors chose to manipulate target errors by the physical properties of the task. In experiment 1, there is a large physical difference between velocity- and position-dependent force field, which would require different control strategies to counter regardless of the psychological evaluation of the participant, yet these are compared as equivalent except for their target error inducing properties. This difference in the physical properties is mitigated by their simulation results, showing that the flat model cannot replicate the learning function of the participants; however, the model when including the directional bias term only qualitatively matches the data. The subtleties that are induced by the physical differences really complicate a convincing, easy-to-see behavioral distinction that could be attributable to a change in task performance criterion/framing.

Their second approach, which is commendable, is to create force field conditions that mix position and velocity perturbations to overcome this issue, but here the data are not convincing either. There’s a numerical (if not significant) bias in de-adaptation in the position-skewed-position dependent field. This could be a spurious finding or it could be a power issue. I would also suggest that these experiments be fully presented in the manuscript rather than only partially (with the majority in the supplemental). Their final approach is to use a partial error clamp over the second half of the reach to remove target errors. Again, though, this requires a change in the physical aspects of the task and, thus, it is conflated with the psychological interpretation of target error.

I would suggest that the authors take an approach where they simply change the participants’ psychological/conceptual framing of task performance instead of through the perturbations. This could easily be accomplished through instruction, changing the target size, providing points or monetary incentive, task performance criterion etc. This would also help clarify if the directional bias they observe is the result of a control strategy in response to position-dependent field induced target error or if the directional bias arises simply from a volitional change in aiming direction by the subject. In the end, I appreciate the intention of the current study and found the model to be surprisingly useful, but I think a more psychological approach to manipulating task performance could overcome confounds with changing the perturbations.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Frederic Crevecoeur

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008481.r003

Decision Letter 1

Adrian M Haith, Samuel J Gershman

24 Mar 2021

Dear Dr Ikegami,

We are pleased to inform you that your manuscript 'Hierarchical motor adaptations negotiate failures during force field learning' has been provisionally accepted for publication in PLOS Computational Biology.

Regarding Reviewer 3's remaining concerns: I agree that there are a number of outstanding questions with regard to how the target errors induced by the force field approach relate to other manipulations that have been used in the past. A more systematic exploration of this question would certainly be valuable going forwards. However, I feel the issue has been adequately discussed in the paper and further experiments are not necessary at this stage.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Adrian M Haith

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: I am satisfied with the revised version of the paper and recommend publication of the manuscript.

Reviewer #2: Response to the authors

After careful reading of the responses the authors provided to my original comments, I find the amendments they made satisfactory in either addressing them directly or mentioning their content explicitly in the final work. Consequently I have no further points to raise and am happy to support it for publication.

Reviewer #3: While I thought that the flow of the manuscript was much improved, I still think the authors have a fundamental problem/conflation in the study design that complicates interpretation. Target errors are inextricably linked to the dynamics of the perturbation. Thus, it is impossible to determine if the pattern of results, especially the changes to the null-field trajectory, are the result of a task-performance sensitive process (i.e., strategy) or a lower-level process concerned with kinematic planning, which may be distinguishable from a strategy. As I said in my previous review, I think this issue could easily be addressed with a follow-up experiment that dissociates target errors from the force perturbation, such as changing the size of the target or jumping the target. Findings from a study like this could help constrain the modeling efforts and scope of speculation in the manuscript.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008481.r004

Acceptance letter

Adrian M Haith, Samuel J Gershman

12 Apr 2021

PCOMPBIOL-D-20-02060R1

Hierarchical motor adaptations negotiate failures during force field learning

Dear Dr Ikegami,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Katalin Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Trajectory adaptation in Experiment-2.

    (DOCX)

    S2 Text. Detail for the simulation.

    (DOCX)

    S1 Fig

    Trajectory adaptation in Experiment-2: (A, C) The hand trajectories of two representative participants and learning curves in PSPF (A) and CPVF (C) averaged across all participants. Note that the scales differ between x and y axes to clearly show trajectory changes along the x-axis. The light gray shades behind some trajectories represent a schematic image of the force field. The adaptation of the TE and LD are shown by traces with open circles and filled circles, respectively. The first 15 TE and LD values are plotted for every single trial, while the subsequent trials (indicated by thick gray lines at the bottom of the figure) are plotted for every five trials. The shaded gray areas around the lines represent standard errors. The light green zones represent the target width (radius: 7.5 mm). (B, D) The TEs and baseline-subtracted LDs in six trial epochs (1st, 3rd-5th, 136th-155th adaptation trials, and 1st, 3rd-5th, 131st-150th de-adaptation trials) in PSPF (B) and CPVF (D). Gray dots represent data from individual participants. The error bars indicate standard errors. The light green zone in the TE plots represents the target width.

    (TIF)

    S2 Fig

    Simulation results for trajectory adaptation in CPVF, represented by TE (open circle) and LD (filled circle) by the flat (A, B)/hierarchical (C, D) OFC (upper panels) and VS models (lower panels). The flat learning models (only internal model adaptation) were unable to reproduce either the non-monotonic change in LD or the curved null trajectory with a persistent deviation after exposure to CPVF. However, the hierarchical OFC models (kinematic plan learning and internal model adaptation) successfully reproduced both.

    (TIF)

    Attachment

    Submitted filename: PlosCompBiol_R1_Replies.docx

    Data Availability Statement

    Data and codes for all experiments and simulations are freely available in the Dryad repository at the URL: https://doi.org/10.5061/dryad.5x69p8d2f [75].


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
