Journal of Biomechanical Engineering
2022 Oct 6;144(12):121008. doi: 10.1115/1.4055680

Generating Human Arm Kinematics Using Reinforcement Learning to Train Active Muscle Behavior in Automotive Research

Sayak Mukherjee, Daniel Perez-Rapela, Jason L. Forman, Matthew B. Panzer
PMCID: PMC10782871  PMID: 36128755

Abstract

Computational human body models (HBMs) are important tools for predicting human biomechanical responses under automotive crash environments. In many scenarios, the prediction of the occupant response will be improved by incorporating active muscle control into the HBMs to generate biofidelic kinematics during different vehicle maneuvers. In this study, we have proposed an approach to develop an active muscle controller based on reinforcement learning (RL). The RL muscle activation control (RL-MAC) approach is a shift from using traditional closed-loop feedback controllers, which can mimic accurate active muscle behavior under a limited range of loading conditions for which the controller has been tuned. Conversely, the RL-MAC uses an iterative training approach to generate active muscle forces for desired joint motion and is analogous to how a child develops gross motor skills. In this study, the ability of a deep deterministic policy gradient (DDPG) RL controller to generate accurate human kinematics is demonstrated using a multibody model of the human arm. The arm model was trained to perform goal-directed elbow rotation by activating the responsible muscles and investigated using two recruitment schemes: as independent muscles or as antagonistic muscle groups. Simulations with the trained controller show that the arm can move to the target position in the presence or absence of externally applied loads. The RL-MAC trained under constant external loads was able to maintain the desired elbow joint angle under a simplified automotive impact scenario, implying the robustness of the motor control approach.

Keywords: motor control, human body model, reinforcement learning, active muscle

Introduction

Motor control mechanisms in humans manage and modify the stiffness of skeletal joints by generating active muscle forces as determined by the central nervous system (CNS). Active muscle forces enable the human body to maintain posture and balance, perform motor tasks, and react to external perturbations. In automotive loading scenarios such as pre-impact bracing in low-severity crashes, active muscle forces may alter the occupant response and injury modes due to the change in body kinematics during the loading phase or by stiffening of the joints. Computational human body models (HBMs) are extensively used in the automotive industry to predict, during the vehicle design phase, the injury-causing responses of occupants and pedestrians in motor vehicle collisions (MVCs). Incorporating active motor control mechanisms into HBMs will help improve our understanding of the mechanisms and tolerances of injury and will help accelerate the development of injury countermeasures.

Previous computational studies in vehicle safety research have widely used two approaches for modeling muscle control in HBMs. The first approach involves the application of a predetermined activation-time history to muscle groups responsible for carrying out specific motions around the joint. This simplified approach has been utilized in both multibody (MB) models [1,2] and finite element (FE) models [3–5] and has demonstrated the effect of muscle activation on the response of HBMs under external loads. Some studies have also derived an optimized activation scheme for muscles to maintain the corresponding joint at a predefined position [6]. Unfortunately, these predetermined activation schemes have limited utility for general-use applications of HBMs, and these approaches cannot be used in loading cases where activation patterns or targeted kinematics are not known a priori.

The second control approach, which has become state-of-the-art in active muscle control for HBMs used in automotive applications, is based on closed-loop feedback (PID) mechanisms that drive muscle activation levels with an error signal based on current and target joint positions or muscle lengths. These controllers are designed to output muscle forces or activation levels that restore the HBM to a desired position. Kistemaker et al. integrated a PID controller into a MB model of the upper arm to reproduce fast human-like elbow motion [7,8], using the error calculated as the difference between the muscle length and a target muscle length (equilibrium point control). The number of muscles was reduced by combining different muscle units into four lumped muscles, with two of the lumped muscles responsible for elbow extension and the remaining two for flexion. The output from the PID controller was the muscle stimulation for each group. One limitation of using target muscle length for control is that it is difficult to correlate the muscle length with different values of the target joint angle. Östh et al. used a similar approach to model musculoskeletal control in a FE model of the arm [9]. In the FE model, nine different muscle units were modeled, and the control approach grouped the muscles as extensors and flexors. The PID controller was developed using the joint angle error as the input, and the control signal was used to derive the muscle activation level for each group. PID controllers have also been used on whole-body HBMs to predict the occupant response during motor vehicle load cases. Östh et al. used three PID controllers with the head, neck, and lumbar spine angles as the error signals for posture control during a frontal crash [10]. Iwamoto et al. used an FE model with PID controllers to control the kinematics of an occupant in a low-severity side impact scenario [11]. Twenty-two PID controllers were modeled to control the joint motion with the joint angles as error signals and with muscles classified into right or left control groups. Martynenko et al. used PID controllers with 370 active muscles of the neck, torso, and upper extremities [12]. The error signal for each PID was based on target muscle lengths to predict the body kinematics of the occupant under a combination of emergency braking and lane change maneuvers. Inkol et al. used nonlinear torque generators at HBM joints in place of controlling Hill-type muscles [13]. Control parameter optimization was performed to simulate athlete performance in golf, cycling, and wheelchair propulsion. Walter et al. [14] developed a hierarchical control architecture with PID controllers to maintain posture and simulate squat movements using a full-body musculoskeletal model with 36 muscles.

Feedback-based PID control mechanisms require precise tuning of the controllers to fit validation data generated by volunteer testing. Although useful under many circumstances, the PID control approach suffers from two major limitations: (1) the controllers are tuned for a limited range of possible external loads and may not output accurate responses beyond the loading scenarios they are tuned for, and (2) the feedback controllers require a predefined muscle recruitment strategy describing how muscles are organized as agonist or antagonist groups for the preferred joint kinematics. The human musculoskeletal system is redundant in nature; there are more actuators (muscles) than kinematic degrees-of-freedom (DOF) at the joints. For simple joints, assumptions can be made regarding extensor and flexor groups; however, for more complex body regions, it is not feasible to isolate individual muscles responsible for motion along any standard joint direction for a generalized response. Rather, movements are carried out by intricate coordination of different muscles activated at different points in time by the CNS [15–17], which is difficult to replicate using linear feedback gains [18,19].

The present study aims to use reinforcement learning (RL) for motor control in HBMs. Deep RL algorithms are recent advances in the field of machine learning that use an iterative approach to train a controller to generate desired outputs through a system of rewards and penalties [20]. Reinforcement learning is a biologically inspired learning routine that allows the controller (called the agent) to identify the optimal sequence of actions to take from a given state of the control environment to achieve a predefined goal [21]. The state refers to the parameters that can be used to define the control environment. Reinforcement learning enables the control model to learn how muscle actuation affects joint kinematics by rewarding good responses and penalizing bad responses. This process is analogous to how children learn to use and coordinate their muscles to eventually interact with the environment around them. Deep RL leverages the performance of neural networks with reinforcement learning algorithms that enable the agent to reach the desired objective [22]. Neural networks (NNs) can be considered function approximators, which map a state-action pair to its corresponding value. In a RL problem, the neural networks are trained to predict the efficacy of each state-action pair and take the best possible action. The efficacy of actions is quantified using a reward function, which rewards desirable actions (actions that move the system closer to its objective) and penalizes undesirable actions.

In this study, a deep deterministic policy gradient (DDPG) agent is used, which belongs to the class of RL algorithms called actor-critic methods [23]. One advantage of DDPG is that it can be used in continuous action spaces [24,25]. DDPG concurrently trains two neural networks to identify and output the best action. The actor network is the policy network that maps the state parameters to actions. The critic network updates the Q-value based on the state parameters and actions from previous time-steps. The Q-value of a state-action pair is the cumulative reward that the agent is expected to receive when it undertakes the given action from the present state. The actor network outputs the action that maximizes the expected reward.
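For reference, the standard DDPG updates from Lillicrap et al. [23] (not restated in this paper) are as follows: the critic parameters θ^Q are regressed toward a bootstrapped target built with slowly updated target networks (primed symbols), and the actor parameters θ^μ follow the deterministic policy gradient over a replay-buffer minibatch of N transitions

y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1}\mid\theta^{\mu'})\mid\theta^{Q'}\big), \qquad L = \frac{1}{N}\sum_i \big(y_i - Q(s_i, a_i\mid\theta^{Q})\big)^2

\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_i \nabla_a Q(s, a\mid\theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}}\, \mu(s\mid\theta^{\mu})\big|_{s_i}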

Deep RL algorithms have demonstrated human-level performance in playing video games [22] and have learned complex games such as Go from scratch [26]. RL-based control has also been used in the field of robotics for navigation in complex environments [27–30]. HBMs present a further control challenge due to the physiological redundancy and nonlinearity of the musculoskeletal system.

Previously, RL control schemes have been used for simulating arm reaching tasks [31–34], for generating motion about the shoulder joint [35], and for maintaining the stability of joints under gravity [36,37]. Deep RL algorithms have also been used to synthesize locomotion using multibody models of humans [38,39] and animals [40], to control the kinematics of a neck FE model in the sagittal plane under a rear impact scenario [41], and to aid in the design and analysis of limb-assistive exoskeletons [42]. Human beings can adapt to changes in external loads [18,19], and the ability of RL agents to replicate such adaptive responses has been studied previously [43]. While previous RL musculoskeletal control studies for motion-related tasks have been performed in detail [31–33,39,44,45], the ability of RL control mechanisms to extend to dynamic events such as automotive impacts, where the response time is much faster and the environment is more chaotic, remains to be studied. The biofidelity of muscle activation patterns corresponding to joint stability in such cases and the effect of changes in the external environment on muscle synergy also need to be investigated.

The current study makes use of DDPG algorithms to model and integrate active muscle control in HBMs with the aim of evaluating the biofidelity of the controller and verifying its adaptability in automotive environments. For this purpose, a MB model of the human arm with the anatomy of an average (50th percentile) male has been developed to demonstrate the utility and implementation of this active muscle modeling approach using a simplistic anatomical model. The reinforcement learned muscle activation controller (henceforth, referred to as RL-MAC) developed in this study was used to simulate the desired motion at the elbow or maintain the stability of the joint in the presence of external impact loads.

Methodology

The human arm MB model was developed in MATLAB R2020b using the Simscape Multibody toolbox. The multibody model was integrated with the MATLAB Reinforcement Learning Toolbox to develop the RL-MAC and to carry out training and simulation of the control model.

Development of the Arm Multibody Model.

A simplified model of a 50th percentile human arm was developed comprising of scapula, humerus, radius, and ulna (Fig. 1(a)). The bones were modeled as rigid bodies. In this study, only the extension–flexion motion about the elbow was considered, thus the glenohumeral joint was constrained. The radius and ulna were also combined into one rigid body, and mass and inertial properties of the lower arm (below the elbow) were applied. In the present study, the focus was on reproducing the elbow extension–flexion motion, thus a revolute joint was defined between radius–ulna and humerus at the elbow with a joint stiffness of 0.6 N·m/rad [9,12]. Joint damping of 0.4 N·ms/rad was also used to prevent any minor oscillations at the neutral position in the passive condition. Previous studies have measured the damping of the elbow joint between 0.2 N·ms/rad and 1 N·ms/rad [46,47]. Popescu et al. also determined that the damping value was almost constant during the entire time of the elbow rotation [48]. The simulation space of the elbow was restricted at the limit of the extension–flexion angle between 0 deg to 160 deg when measured from full extension of the arm.
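As an illustration of the passive joint definition (the authors implemented this in Simscape Multibody; the sketch below is a minimal Python stand-in), the elbow can be represented as a linear torsional spring-damper with hard stops at the range-of-motion limits. The choice of the neutral position as the spring reference angle is an assumption made here for illustration.

```python
import numpy as np

K_JOINT = 0.6                    # elbow joint stiffness, N*m/rad (from the text)
C_JOINT = 0.4                    # elbow joint damping, N*m*s/rad (from the text)
THETA_MIN = 0.0                  # full extension, rad
THETA_MAX = np.deg2rad(160.0)    # flexion limit, rad
THETA_REF = np.deg2rad(90.0)     # assumed spring reference angle (neutral position)

def passive_elbow_torque(theta, theta_dot):
    """Passive elastic and damping torque at the elbow revolute joint."""
    return -K_JOINT * (theta - THETA_REF) - C_JOINT * theta_dot

def clamp_elbow_angle(theta):
    """Keep the simulated elbow angle within the 0-160 deg range of motion."""
    return float(np.clip(theta, THETA_MIN, THETA_MAX))
```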

Fig. 1.

Multibody model of the human arm: (a) rigid bones with elbow joint and (b) MB model with muscles

The muscles in the model were defined with the suitable origin and insertion points [49,50], along the line of action of the muscle forces. Force magnitudes were calculated according to the Hill-type muscle model considering the muscle length and contractile velocity during the simulation.

The Hill-type muscle model provides a means of calculating muscle forces in numerical analyses [51,52] (Fig. 2). A Hill-type muscle consists of a contractile element (CE) simulating the active forces generated by the muscle (FCE) and a passive element (PE), in parallel with the contractile element, which computes the forces due to passive muscle stiffness (FPE).

Fig. 2.

Hill-type muscle model with contractile element and passive element

The forces generated by the muscles are nonlinear in nature and depend on the muscle length and the muscle velocity. The active muscle forces are stimulated by the CNS, which generates the muscle activation levels (at). The activity level or activation varies between 0 (fully passive) and 1 (fully active), and its value is determined by the CNS depending on external loads and the current joint stability.

The Hill-type muscle parameters like normalized force-length (Fl) and force-velocity (Fv) relationships are shown in Fig. 3. The total forces generated by the active part of the muscle are dependent on Fl, Fv, and the activation level (at) [52,53]

F_{CE} = a_t \times F_{max} \times F_l(L) \times F_V(V) \quad (1)

Fig. 3.

Hill-type muscle parameters: (a) force-length relationship of active (Fl) and passive (FP) elements and (b) force-velocity (FV) relationship of the active element

Fmax is the maximum force a muscle can generate, which is a characteristic property of the muscle and is dependent on its anatomical cross section area. The active tension–length curve Fl describes the relationship between the active muscle force and the normalized length of the muscle (L), with the maximum force occurring at the optimal length (Lopt). The FV curve is the relationship between the contractile velocity and FCE. When the velocity is positive, i.e., the muscle elongates, the force (Fv) asymptotes at a value near Fmax.

For the passive element, the force was calculated using an exponential function of length. The passive force (FP) only starts acting when the length of the muscle exceeds the optimum length [54]

F_{PE} = \frac{1}{\exp(K_{sh}) - 1}\left\{\exp\!\left[\frac{K_{sh}}{L_{max}}\,(L - 1)\right] - 1\right\} \quad \text{for } L > 1 \quad (2)

Ksh is a dimensionless parameter influencing the rise of the passive force with length. The total force generated by the muscle is the sum of the magnitudes of passive force and active force.
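A minimal sketch of how Eqs. (1) and (2) combine into the total muscle force is given below (Python, for illustration only; the authors computed these forces within the Simscape model). The placeholder force-length and force-velocity curves and the scaling of the passive term by Fmax are assumptions, since the study used tabulated curves from Ref. [5].

```python
import numpy as np

def hill_muscle_force(a, L, V, f_max, fl_curve, fv_curve, k_sh, l_max):
    """Total Hill-type muscle force = contractile element (Eq. (1)) + passive element (Eq. (2)).

    a: activation in [0, 1]; L: length normalized by the optimal length;
    V: normalized contractile velocity; fl_curve/fv_curve: normalized curves.
    """
    f_ce = a * f_max * fl_curve(L) * fv_curve(V)          # active force, Eq. (1)
    if L > 1.0:                                           # passive force engages beyond L_opt
        f_pe = (np.exp(k_sh / l_max * (L - 1.0)) - 1.0) / (np.exp(k_sh) - 1.0)
    else:
        f_pe = 0.0
    # Scaling the normalized passive term by f_max is an assumption made here.
    return f_ce + f_max * f_pe

# Crude placeholder curves (the study used graph inputs from Ref. [5]):
fl = lambda L: np.exp(-((L - 1.0) / 0.45) ** 2)           # bell-shaped force-length curve
fv = lambda V: float(np.clip(1.0 + 0.5 * V, 0.0, 1.0))    # force drops when shortening (V < 0)

force = hill_muscle_force(a=0.5, L=1.05, V=-0.1, f_max=360.0,
                          fl_curve=fl, fv_curve=fv, k_sh=6.15, l_max=1.5)
```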

The MB model incorporated muscles responsible for both the flexion and the extension motion (Fig. 1(b)). The muscles included in the model are the biceps brachii long head and short head, the brachialis, the brachioradialis, the pronator teres, the extensor carpi radialis longus, and the triceps long head, lateral head, and medial head. Some muscles with large cross sections and wide bone insertion regions were divided into several strands to distribute the muscle forces. The Hill-type muscle parameters, i.e., the normalized force-length (Fl) and force-velocity (Fv) relationships, were defined as curves and were identical for each arm muscle [5,55].

The muscle origin and insertion points in the model were approximated from various sources of available anatomical data [49,50]. The optimum muscle length is the length of the muscle at the neutral position, when the humerus and radius are at right angles to each other [56]. The muscle properties included in the model are tabulated below (Table 1).

Table 1.

Properties of muscles in MB model

Muscles Fmax (N) [9] No. of strands in model Lopt (mm)
Biceps brachii long head 360 1 336
Biceps brachii short head 248 1 327
Brachialis 568 2 166, 156
Brachioradialis 152 1 283
Pronator teres 320 1 157
Extensor carpi rad longus 176 1 312
Triceps long head 456 1 324
Triceps lateral head 360 1 296
Triceps medial head 360 3 239, 211, 170

Note: Optimum length (Lopt) of the muscles is obtained at the neutral position (90 deg flexion angle).

Muscle Control Framework.

As discussed before, the CNS actuates different muscles in coordination to carry out any motion about a joint. Muscle forces are highly nonlinear in nature and are affected by the delay between the neural stimulation generated by the CNS and the actuation of the muscle. Humans can also adapt to changes in external environments during their movements, and Smeets et al. argued that human movement patterns cannot be explained using simple feedback mechanisms [18]. This study explores the feasibility of using the RL-MAC mechanism for muscle control under varied external environments. The RL-MAC used in this paper uses deep NNs, which can efficiently approximate nonlinear behaviors from known predictors [57]. Trained RL agents have also been found to adapt to changes in environments in various applications [58]. All these factors make RL-MAC potentially suitable for HBMs.

Figure 4 shows the RL-MAC framework for controlling arm motion by incorporating a DDPG agent. The RL-MAC reads the state parameters from the arm multibody model. The controller outputs neural stimulations (ut), which are converted to muscle activations (at) using activation dynamics. The resultant activations are applied to the corresponding Hill-type muscles in the MB model, which also obtain the values of muscle length and velocity from the model to compute the muscle forces (Eq. (1)). The active muscle forces are tensile in nature, i.e., the developed forces pull the origin and insertion points toward each other. During the motion, different sets of muscles are activated by the RL-MAC to carry out the relevant joint motion.

Fig. 4.

RL-MAC framework for the arm model

The state of the controller is defined by the elbow joint angle, the joint velocity, the error (difference between the target angle and the current angle), and the muscle activations. The activation (a) builds up in the muscles as a result of the neural stimulation (u) according to the activation dynamics proposed by Zajac [52]:

\frac{da}{dt} = \frac{1}{\tau_{act}}\left[u - (1 - \delta)\,a\,u - \delta\,a\right] \quad (3)

τact is the time constant for generating muscle activity from neural stimulation, which represents the time delay between the neural stimulus (u) and the commencement of muscle activity (a). δ is the ratio τact/τdeact, where τdeact is the deactivation time constant, which controls the lag in the drop of the activation level (a) after the stimulation (u) is reduced. The muscle length and velocity were determined during the simulation by measuring the Euclidean distance between the origin and insertion points of the muscles. The parameters used to calculate the muscle forces are tabulated in Table 2. The total muscle force was calculated as the sum of the magnitudes of the forces generated by the contractile element and the passive element.

Table 2.

Muscle and activation parameters used in MB model

Muscle parameters Value
τact (s) 0.02 [59]
τdeact (s) 0.06 [59]
Minimum activation (ao) 0.005
FL curve Graph input [5]
FV curve Graph input [5]
Ksh 6.15 (flexors) [9]; 3 (extensors)
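A minimal sketch of how Eq. (3) can be advanced in time with the parameters of Table 2 is shown below (Python, explicit Euler integration; the integration scheme and time-step are assumptions for illustration, not the authors' Simscape implementation).

```python
import numpy as np

TAU_ACT = 0.02            # activation time constant, s (Table 2)
TAU_DEACT = 0.06          # deactivation time constant, s (Table 2)
A_MIN = 0.005             # minimum activation level (Table 2)
DELTA = TAU_ACT / TAU_DEACT

def update_activation(a, u, dt):
    """Advance the Zajac activation dynamics of Eq. (3) by one explicit Euler step."""
    da_dt = (u - (1.0 - DELTA) * a * u - DELTA * a) / TAU_ACT
    return float(np.clip(a + dt * da_dt, A_MIN, 1.0))

# Example: activation buildup under a constant stimulation u = 1 for 100 ms
a, dt, history = A_MIN, 0.001, []
for _ in range(100):
    a = update_activation(a, 1.0, dt)
    history.append(a)
```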

The overall architecture of the RL-MAC, which was based on the DDPG agent (Fig. 5), was similar to the one proposed by Lillicrap et al. [23]. The actor network consisted of a feedforward neural network with one hidden layer between the input and the final layer. The input to the hidden layer and the final layer was activated with a rectified linear unit (ReLU) function. The output of the final layer was activated using a sigmoid function, as the action space varies uniformly between 0 and 1 representing the neural stimulation (ut). The number of nodes in the final layer was equal to the number of muscles required to be stimulated. The critic network comprises three layers with the ReLU transfer function after the hidden layer. The input of the critic network included the state parameters and the actions from the actor network. The state observations were activated with a ReLU function before connecting to the hidden layer. The actions from the actor network were connected to the hidden layer skipping the ReLU activation [23]. The critic network outputs the Q-value associated with the state-action pair. Since the Q-value is a scalar, the critic network has one node in the final layer.
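The sketch below translates that architecture description into an untrained numpy forward pass for orientation only; the hidden-layer width, weight initialization, and exact layer sizes are assumptions (the authors built the networks with the MATLAB Reinforcement Learning Toolbox).

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

N_STATE, N_MUSCLES, N_HIDDEN = 12, 9, 64   # state = angle, velocity, error, 9 activations; width assumed

# Actor: state -> hidden (ReLU) -> sigmoid output, one stimulation per muscle (IAMR case)
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_STATE)); b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_MUSCLES, N_HIDDEN)); b2 = np.zeros(N_MUSCLES)

def actor(state):
    return sigmoid(W2 @ relu(W1 @ state + b1) + b2)       # neural stimulations u_t in [0, 1]

# Critic: state passes through a ReLU layer, the action joins at the hidden layer
# without its own activation, and the output is a single scalar Q-value.
Ws = rng.normal(scale=0.1, size=(N_HIDDEN, N_STATE)); bs = np.zeros(N_HIDDEN)
Wh = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN + N_MUSCLES)); bh = np.zeros(N_HIDDEN)
Wq = rng.normal(scale=0.1, size=(1, N_HIDDEN)); bq = np.zeros(1)

def critic(state, action):
    s = relu(Ws @ state + bs)
    h = relu(Wh @ np.concatenate([s, action]) + bh)
    return float(Wq @ h + bq)                             # expected cumulative reward (Q-value)

u = actor(rng.normal(size=N_STATE))                       # example forward pass
q = critic(rng.normal(size=N_STATE), u)
```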

Fig. 5.

Architecture of DDPG agent

In this study, the objective of the RL-MAC was to perform goal-directed motion of the elbow joint within its range of motion and to maintain its stability in the presence of external perturbations. The agent was trained to move the forearm to a target position from any given starting position and to stabilize it at the final position. During the training phase, the controller learns to minimize the error between the current angle and the target angle. In the reward function, the controller is penalized proportionally to the magnitude of the error. The reward function also rewards the agent if it manages to stabilize the elbow angle within 0.1 rad (5.7 deg) of the target value. Due to the redundant nature of the human musculoskeletal system, there are many combinations of muscle activation patterns that can produce the same desired movement. One method commonly used in muscle activation biomechanics to reduce the ambiguity of the muscle activation scheme is to minimize the metabolic cost of active muscle activity [60]. Two metabolic cost functions are most commonly used. The first is the energy cost, which seeks to minimize the total forces generated in the muscles or the work done by the muscles [61,62]. The second is the muscle fatigue or muscle effort cost function, which minimizes the muscle activation over the time of motion [63]. In the control model, we have considered minimizing the activation, but alternative energy costs can also be implemented:

\text{Reward} = -\alpha\,|\text{Error}| - \beta \sum a(t) + \gamma\,\big(|\text{Error}| < 0.1\ \text{rad}\big) \quad (4)

The reward was calculated using Eq. (4) at each time-step of the simulation, while the agent tries to maximize the cumulative reward over the simulation time. α, β, and γ are positive constants that calibrate the different components of the reward function. Equal weightage was assigned to the activation of each muscle in the reward function. During the training of the RL-MAC, an Ornstein–Uhlenbeck (OU) process was used to add noise with a standard deviation of 0.09 for adequate exploration of the action space. OU noise was found to explore the action space better for coordinated actuation of muscles [40]. The RL training was considered to have converged when the average cumulative reward over the most recent 250 iterations reached a predetermined value.
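A minimal sketch of the per-time-step reward of Eq. (4) and of the OU exploration noise is shown below (Python). The weight values, the OU mean-reversion rate, and the time-step are illustrative assumptions; only the 0.1 rad tolerance and the 0.09 noise standard deviation come from the text.

```python
import numpy as np

ALPHA, BETA, GAMMA = 10.0, 1.0, 5.0      # illustrative weights; the paper's values are not stated
TOLERANCE = 0.1                          # stabilization tolerance, rad (about 5.7 deg)

def step_reward(error, activations):
    """Eq. (4): penalize angle error and muscle effort; reward holding the target angle."""
    return (-ALPHA * abs(error)
            - BETA * float(np.sum(activations))
            + GAMMA * float(abs(error) < TOLERANCE))

def ou_noise_step(prev, theta=0.15, sigma=0.09, dt=0.005, mu=0.0):
    """One step of an Ornstein-Uhlenbeck process added to the actor output during
    training (sigma = 0.09 from the text; theta, dt, and mu are assumptions)."""
    return prev + theta * (mu - prev) * dt + sigma * np.sqrt(dt) * np.random.standard_normal(prev.shape)

noise = np.zeros(9)                      # one exploration channel per muscle stimulation
noise = ou_noise_step(noise)
```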

Model Evaluation, Training, and Validation.

Before training the control model using the RL-MAC, the passive structural behavior of the arm MB model was evaluated. A moment varying between −1 N·m and 1 N·m was applied at the elbow revolute joint with the humerus fixed, and the rotation of the forearm was measured. The resultant stiffness of the joint was calculated from the moment-angle data, and the magnitude was verified against published literature [56,64].
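One simple way to extract the passive stiffness from such a moment-angle sweep is a least-squares slope fit; the sketch below uses synthetic data for illustration (the 0.955 N·m/rad value reported in the Results is used here only to generate the fake response).

```python
import numpy as np

# Synthetic moment-angle data standing in for the passive sweep of the elbow joint:
# the moment is varied between -1 and 1 N*m and the resulting rotation is recorded.
moments = np.linspace(-1.0, 1.0, 21)                                # applied elbow moment, N*m
angles = moments / 0.955 + 0.002 * np.random.standard_normal(21)    # fake measured rotation, rad

# Passive joint stiffness = slope of the moment-angle curve (first-order polynomial fit)
stiffness = np.polyfit(angles, moments, deg=1)[0]
print(f"Estimated passive elbow stiffness: {stiffness:.3f} N*m/rad")
```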

After verifying the structural stiffness of the MB model, the control training was carried out with the system of rewards mentioned in Eq. (4). The model was trained in two different scenarios described in the Training Scenario 1: Bare Arm Motion Control and Training Scenario 2: Arm Motion Control Under External Load sections (also summarized in Table 3), and the ability of the trained model to synthesize motion under a novel environment was also evaluated (testing scenario, summarized in Table 3). In both cases, the humerus and the scapula were fixed, and the forearm was free to rotate about the elbow revolute joint. The elbow angle was measured from the arm's extension limit (Fig. 6(a)).

Table 3.

Training and simulation scenarios

Training/simulation case Load applied Model response
Evaluation of passive structural response Torque at the elbow revolute joint Moment-angle response of the joint
Training scenario 1: targeted motion of the forearm with RL-MAC integrated MB model No external loads Angle-time response of the elbow joint
Training scenario 2: targeted motion of the forearm under external loads Point mass attached to the radius and gravity Angle-time response of the elbow joint
Testing scenario: response to novel loads Simplified crash pulse applied to the humerus proximal end Angle-time response of the elbow joint

Fig. 6.

(a) Arm parameters for training, (b) flexor muscle group, and (c) extensor muscle group

Training Scenario 1: Bare Arm Motion Control.

In the first scenario, the MB arm model with the integrated RL-MAC was trained to perform a fast goal-directed motion of the elbow joint from a given initial position to a target joint position. For the purpose of training, two different muscle recruitment strategies were used. In the first recruitment strategy, called group activated muscle recruitment (GAMR), the arm muscles were grouped as extensors and flexors (Figs. 6(b) and 6(c)), and identical activation was applied to the muscles belonging to the same group. The flexor group included the biceps brachii long head and short head, the brachialis, the brachioradialis, the pronator teres, and the extensor carpi radialis longus. The triceps long head, lateral head, and medial head made up the extensor group. For this case, the actor network had two nodes in the output layer prescribing the stimulation for the extensor and flexor groups. The second recruitment strategy was called individual activated muscle recruitment (IAMR), in which each muscle was actuated individually regardless of being an extensor or flexor, with an activation level independent of the activation levels of the other muscles. For this activation scheme, the actor network had nine nodes in the final layer, one for the neural stimulation of each muscle. During the training process, the initial and target angles were randomly varied in the elbow rotation space so that the trained agent does not overfit to any particular set of input data.
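The difference between the two recruitment schemes can be summarized as a mapping from the actor output to per-muscle stimulations; the short Python sketch below is illustrative only (muscle names are abbreviated here).

```python
import numpy as np

FLEXORS = ["biceps_long", "biceps_short", "brachialis", "brachioradialis",
           "pronator_teres", "ecr_longus"]
EXTENSORS = ["triceps_long", "triceps_lateral", "triceps_medial"]
MUSCLES = FLEXORS + EXTENSORS

def gamr_stimulations(action):
    """GAMR: the actor outputs 2 values (flexor group, extensor group); every muscle
    in a group receives the same stimulation."""
    u_flex, u_ext = action
    return np.array([u_flex] * len(FLEXORS) + [u_ext] * len(EXTENSORS))

def iamr_stimulations(action):
    """IAMR: the actor outputs 9 values, one independent stimulation per muscle."""
    return np.asarray(action, dtype=float)

# Example: a GAMR action of (0.3, 0.05) lightly activates every flexor and barely the extensors
u = gamr_stimulations((0.3, 0.05))
```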

After the activation models were trained to convergence, the arm was repositioned to test the model's response to a series of initial and target angles based on independent human volunteer data from the available literature [8]. The validation case used in this study had six male volunteers perform fast targeted extension or flexion of the arm, and the elbow angle-time history of the motion was recorded. Due to the simple nature of the test setup, a similar motion could be replicated in the MB model. Depending on the rotation case, the arm was stabilized at the initial angle for the first 100–125 ms before the target angle was set to the desired value for the rotation. The angle-time data generated by the trained RL-MAC were compared with the volunteer kinematics data to verify its efficacy.

Training Scenario 2: Arm Motion Control Under External Load.

Training was performed using the targeted arm motion detailed in scenario 1, but in this case, a mass was attached to the radius (Fig. 7(a)), and gravity was introduced in the multibody model. The model was trained using the IAMR scheme and the reward function described in Eq. (4). Only changes to the multibody environment were made, and no change was made to the control model for this scenario, i.e., no additional information on the magnitude of the added mass or the direction of gravity was provided to the controller. This enabled the RL-MAC to formulate a response pattern that was independent of the external loads applied. During the training, the added mass was randomly varied between 1 kg and 5 kg (in increments of 1 kg), and the direction of gravity was adjusted every iteration so that the arm was trained to perform the flexion–extension motion both along and against gravity. The results from the trained model in scenario 2 enabled the assessment of the utility of the RL-MAC framework for training the HBM, using the same architecture, for the varied sets of loads generally associated with HBMs.
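A minimal sketch of the per-episode environment randomization described above is given below (Python). The sign convention for the gravity flip and the reuse of the randomized start and target angles from scenario 1 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng()

def sample_scenario2_episode():
    """Randomize the training environment for one episode of scenario 2."""
    added_mass = float(rng.choice([1.0, 2.0, 3.0, 4.0, 5.0]))   # kg, point mass on the radius
    gravity_sign = float(rng.choice([-1.0, 1.0]))               # flex along or against gravity
    gravity = np.array([0.0, gravity_sign * 9.81, 0.0])         # m/s^2, direction convention assumed
    initial_angle = rng.uniform(0.0, np.deg2rad(160.0))         # randomized start angle, rad
    target_angle = rng.uniform(0.0, np.deg2rad(160.0))          # randomized goal angle, rad
    return added_mass, gravity, initial_angle, target_angle
```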

Fig. 7.

(a) Arm model with variable mass and (b) the trained arm model subjected to load case representing simplified automotive impact

Testing Scenario: Response to Novel Environments.

The ability of the RL-MAC framework to produce a robust response within an environment for which it was not explicitly trained was also evaluated in this study. The model trained in the presence of an acceleration field (scenario 2) was subjected to a simplified crash pulse representative of the inertial load experienced by the upper extremity in a typical frontal MVC (Fig. 7(b)) [65]. The radius was pinned with a revolute joint representing the occupant's hand position on the steering wheel. The scapula was free to move in the planar direction. A stiffness of 1000 N/m was applied to the scapula in the vertical direction to effectively reproduce the effect of the lower body weight. The crash pulse was applied to the scapula in the horizontal direction. At the start of the simulation, the arm was set at a neutral position (flexion angle = 90 deg). The period of the pulse was similar to that used by Happee et al. to design a linear controller for two-dimensional arm motion [66]. However, the pulse magnitude applied to the MB scapula (Fig. 7(b)) was measured in the thoracic region in automotive frontal tests [65] and was higher than the constant force field under which the agent was trained. The RL-MAC trained in scenario 2 was evaluated in its ability to stabilize the arm under forward loads (representing frontal collision) and in rearward loads (rear collision). The testing step is important to verify the range of utility of the trained RL-MAC agent.

Results

The response of the passive part of the MB arm model was measured and compared with previously published data. The stiffness of the model with inactivated muscles was found to be 0.955 N·m/rad. The stiffness magnitude measured was close to the values reported by Hayes and Hatze [56], Wiegner and Watts [64], and Howell et al. [67]. The resultant stiffness of the elbow joint with the unactuated muscles is due to the combination of prescribed revolute joint stiffness and passive muscle stiffness. Previous studies have added damping to the muscles to improve the passive and active behavior at the elbow joint [9,68]. In this study, the damping of muscles was not considered; thus, all the nonlinearity at the joint was the result of the joint damping assigned.

For the training cases, the MB model with the RL-MAC was simulated for 600 ms in each episode of the training. The training was distributed over 20 CPUs using the MATLAB Parallel Computing Toolbox. Each episode took 20–30 s to simulate, and the DDPG agent was updated every 5 ms of the training simulation.

Training Scenario 1: Bare Arm Motion Control.

The arm model was trained to reach a randomized target joint angle from any randomized initial angle within the joint range of motion for extension/flexion. The training was carried out with both the IAMR and GAMR schemes. The variation of the average reward with each iteration is displayed in Fig. 8. The average reward value considered for convergence was 1500, but it can vary depending on the control objective and the reward function. The IAMR converged faster than the GAMR scheme. A possible explanation is that, since the muscles are actuated individually, IAMR provides greater overall control of the arm motion. Also, given the randomized nature of the training algorithms, the reward versus episode response varies slightly each time the model is trained.

Fig. 8.

Plot showing the variation of average reward for individual activation (IAMR) and group activation (GAMR) with each episode during training. (The reward averaged over 250 episodes).

During the testing and validation phase of this scenario, the angle-time histories of the movements produced by the trained RL-MAC were compared with the volunteer kinematics angle-time plots and showed excellent agreement with the experimental data (Fig. 9). Both the IAMR and the GAMR schemes could reproduce the motion of the volunteers in both flexion and extension after completion of training. In some cases, the simulated angle was lower or higher than the target angle due to the nature of the reward function, which allowed for an error of ±0.1 rad.

Fig. 9.

Elbow rotation angles by trained RL controllers in response to prescribed target angles and comparison with volunteer data with the individual (IAMR) and group (GAMR) activation scheme: (a) arm in flexion and (b) arm in extension

The muscle activity level for the duration of arm motion was also measured for both the activation strategies (Fig. 10). The IAMR model predicted different activation levels for each muscle during the simulation, and these responses were different than those using the GAMR scheme. In the IAMR scheme, some of the muscles remained inactivated throughout the duration of motion. The activation decreases rapidly for the muscles once the joint has been stabilized at the intended position. The GAMR scheme, which prescribed identical activation to all the muscles of the same group, resulted in long-term, low-magnitude activation for both extensors and flexors trying to balance each other even after the target position was reached.

Fig. 10.

Muscle activation plots for extensor and flexor muscles: (a) arm in flexion: 45–145 deg and (b) arm in extension: 145–45 deg

The muscle activity patterns in the arm simulations are biphasic or triphasic in nature, similar to that expected in fast goal directed movements [69,70]. At the onset of the error signals, the agonist muscles are activated, and after an initial burst, the activation declines. The initial activities of the agonist muscles are directly related to the amount the elbow is required to rotate [71]. Near the completion of the motion, the antagonist muscles are activated, causing deceleration of the limb. In some of the simulations, we also saw a second burst of muscle activity in the agonists [72]. Happee argued that the third phase of muscle activation is essential in some goal directed movements where the antagonist activations have not reduced significantly once the target position is reached [73]. In our simulations, the third activation phase was more prominent in cases where the initial error was low or when a mass was attached to the distal forearm (scenario 2).

Training Scenario 2: Arm Motion Control Under External Loads.

The training of the RL-MAC using the IAMR scheme with the added random loading under scenario 2 converged in 8947 episodes. The trained RL-MAC was tested in its ability to produce extension–flexion motion and adapt to a range of mass attached at the radius (Fig. 11).

Fig. 11.

Rotation time history of the arm in presence of external load: (a) arm in flexion: 45–135 deg and (b) arm in extension: 135–45 deg

The trained model was able to perform the desired motion in the presence of external loads in both extension and flexion of the elbow. The elbow angle remained stable at the end of the simulation for both loading modes. During the flexion motion, the arm was able to carry a weight of up to 4.8 kg, and the activation patterns for each muscle were different for different values of applied load (Fig. 12). For the 4.8 kg flexion case, the activations of the arm flexor muscles were nearly at maximum, indicating that the 4.8 kg weight was the structural load limit of the MB arm model during flexion. The arm could carry up to 10 kg against gravity in extension, even though the mass was limited to 5 kg in training, showing that the trained RL-MAC can actuate muscles for loads different from those for which it has been trained. For extension, the arm extensor muscles achieved a maximum level of actuation while carrying a 10 kg mass.

Fig. 12.

Activation time history of the agonist muscles in presence of external load: (a) arm in flexion: 45–135 deg (solid—1 kg, dashed—4.8 kg) and (b) arm in extension: 135–45 deg (solid—1 kg, dashed—10 kg)

The muscle activation profiles show a similar triphasic pattern. With the increase in mass against gravity, the antagonist activity period was shorter, followed by a noticeable burst of agonist activity in the third phase. The drop in agonist activity after reaching the target angle during flexion, even with the maximum load, was due to a decrease in the moment arm. In contrast, the agonists in the extension motion remained actuated throughout.

The RL-MAC trained in scenario 2 was also simulated in the absence of external loads, and its angle-time response was compared with the volunteer tests and with the scenario 1 RL-MAC trained using the IAMR scheme. The peak velocity of the scenario 2 RL-MAC during the goal-directed motion was lower than that seen in the experiments (Fig. 13). Nevertheless, the elbow stabilized at the target angle at around the same time, 300 ms after the onset of the error signal, for both trained RL agents.

Fig. 13.

Comparison of RL-MAC from training scenario 1 (IAMR) and scenario 2: (a) arm in flexion: 45–145 deg and (b) arm in extension: 145–45 deg

Testing Scenario: Response to Novel Loads.

The model trained in scenario 2 was used to control the arm motion under a simplified impact scenario designed to replicate a driver holding on to the steering wheel and bracing for a frontal MVC. The response generated with the IAMR model trained in scenario 2 under the simplified MVC pulse was compared to the response of a completely passive arm model without active muscles (Fig. 14). The arm was kept at a neutral position (90 deg) at the start of the simulation, followed by application of the MVC crash pulse (Fig. 7(b)) after stabilization of the joint. The crash loading was applied horizontally at the scapula and the humerus proximal end for 100 ms. During the application of the crash pulse, the objective of the active arm model was to maintain the neutral position. The active model was able to maintain the stability of the elbow joint position for both loading conditions. The passive case, however, was unstable for the duration of the application of the load (Fig. 14).

Fig. 14.

Rotation time history comparison of the arm in the presence of external pulse: (a) load applied in rear and (b) load applied forward

Discussion

The present study demonstrated the development and implementation of an active muscle control framework for a simplified human arm model based on recent RL algorithms, with the intention of simulating untrained impulsive loading responses typically seen in the automotive crash environment. RL algorithms have been used in other active human body simulations, typically associated with the control of body motion [31,34,38,42,74,75], but the applicability of RL-MAC to the automotive environment has not been studied.

In this study, training of the RL-MAC was carried out with two different activation schemes. The RL-MAC was trained to minimize the error between the current and the target angle using the minimum possible activation. The activation minimization criterion was included to ensure that the extensor and flexor muscles do not cocontract at the stabilized final position. Previous arm movement control studies have used the contractile element length error during the development of feedback controllers [8,12]. Kistemaker et al. developed a muscle controller based on feedback of the CE element length (λ control) to simulate the same volunteer tests [7,8]. Martynenko et al. used a similar approach to model fast arm movements [12]. In both studies, a presimulation with the passive arm was performed to determine the target CE lengths of the muscles. In a more complicated anatomical system with more DOFs, determining the CE length corresponding to a joint position is complex, and a strategy using CE length as the controller feedback to move from one arbitrary position to another may be challenging to determine. In the current study, we used the elbow angle error for the RL-MAC, representing a proprioceptive signal, to get to a randomly defined target position from a randomly defined initial position [9]. Happee et al. developed a controller for the stability of a head and neck multibody model with both head kinematics feedback and muscle length feedback to maintain the head at the neutral position [76]. However, in that study, the initial and targeted positions (and thus muscle lengths) were identical.

The trained agent could produce the required motion in both extension and flexion from any given start angle, which indicates that the controller was trained to generate arm kinematics independent of the initial position of the forearm. Both activation schemes resulted in trained models that had similar kinematics when tested under unloaded motion scenarios, although evaluation of the activation patterns showed that the individual activation scheme actuates only those muscles that are required for the motion while imparting low activations to the others. In the grouped muscle activation framework, all the muscles belonging to the same group were activated together at the same level, which may result in a higher energy cost of motion. Grouping muscles together is a simplification that may lead to accurate external kinematics but incorrect internal loading on the tissues. The separate muscle scheme is likely more biofidelic. However, further investigation is required to clearly identify the effects of the activation frameworks on the HBM.

The control architecture could also train the arm model when the MB model was modified with an added randomized point mass in a constant gravity field. This information was not provided to the RL agent; instead, the controller had to formulate a generalized response of the muscles, with the same states as in training scenario 1, based on how the model was performing relative to its objective. Apart from the reward function, the RL-MAC framework required negligible user input on the performance of the arm model or on how the various muscles should behave as a system to accomplish an objective. Adding the mass and gravity direction to the RL-MAC state could improve the learning process and the overall kinematics of the arm MB model, as both parameters are important for the control process [17,19]. However, expanding the states to include mass and gravity would potentially overfit the agent to the training case. Instead, the RL-MAC devised a muscle control strategy to respond to the sudden change in mass from kinematics-based feedback. This kind of muscle activity modification is not possible using linear closed-loop controllers, as multiple muscles need to be actuated simultaneously [18,19]. It was also found that linear models for neuromuscular control overfit the training datasets [66]. In the RL-MAC training process, both the initial kinematic parameters and the environmental variables can be modified in each iteration, which can reduce overfitting. The generalized RL-MAC (scenario 2) in the absence of an external force field was found to undershoot the response of the volunteer dataset, but the overall kinematics was similar during the time frame of evaluation (Fig. 13). The initial disparity in response is because the RL-MAC in scenario 2 has not been provided with complete information about the environment and has to rely on the kinematics feedback from the arm MB model to decipher the constant force acting on the forearm in each iteration.

The two training scenarios have demonstrated the ability of the RL framework to actuate independent muscles within a system to achieve a response. This feature will be critical for body regions more complex than the arm, where it is difficult to identify and associate different muscles working in a system to achieve motion in many different DOF. The RL-MAC efficiency was found to depend primarily on how the reward function was defined. Crowder et al. reported that an inaccurate balance between coefficients of the reward function might not be able to train the musculoskeletal system even with an increase in training time [33]. Hence, to improve the control framework, it is desirable to explore how the different components of the reward function affect the controller response and potentially inform the reward function development with various sources of new or existing volunteer data.

Reinforcement learning algorithms have the potential to train active HBMs to control their motion using scenarios for which abundant human volunteer data exist (e.g., lifting weights or exercising) [43], while providing good generalization and response in untrained cases where data are scarce (e.g., muscle response during impact). This is analogous to how humans learn to control their body motion from everyday activities but must still react in some fashion to scenarios they have never encountered (like an automotive crash). In this study, it was shown that the RL-MAC was able to maintain arm stability under conditions comparable to a driver bracing on the steering wheel before an automotive frontal crash pulse. This is notable because the muscle control model with arm kinematics as input was only trained to control its motion in a simple weightlifting scenario (training scenario 2), and the nature of the loads in an impact scenario is substantially different, with higher magnitude forces acting on the elbow joint over a shorter time span. Although this case is a simplified representation of a human's response to a crash, this study demonstrates the potential for a RL controller to respond appropriately under a different set of loads for which it has not been explicitly trained. This potentially makes for a more generalizable muscle control scheme than the traditional PID approaches and eliminates the need for retraining or calibrating the model under multiple load cases, which can be computationally expensive. With RL-MACs, training can be accomplished on a simplified, fast-running model, and the trained controller can be transferred to a more complex and detailed HBM capable of simulating impact and injury. The ability of a trained RL controller to generate a response under novel scenarios is also useful for the development of injury countermeasures and assistive devices [42] and for the training and deployment of biomimetic devices [44,77], for which the nature and magnitudes of the loads acting on the devices in real-world scenarios can differ from the training environment.

The purpose of this study was to implement and demonstrate the utility of the RL-MAC for training active muscle responses in a human body model for automotive scenarios, and to accomplish this, simplifications were made to the human body model. A multibody model of the human arm was developed using Hill-type line muscles, and for this study, the series element in the Hill-type muscle has been assumed to be rigid, as the focus was the development of the controller. Some previous studies have considered a series element in the Hill-type muscle model to represent the stiffness of tendons [78]. However, in a study by Bayer et al., it was shown that the series element of the Hill-type model contributes around 7.6% of the total muscle force [59]. In this study, the passive structural properties of the model were verified with the stiffness data available in the literature, demonstrating that the rigid-tendon assumption did not affect the overall response of the arm MB model or the objective of the study.

The MB model of the arm was simplified with a revolute joint at the elbow; pronation–supination motion about the elbow was not considered. Only the flexion–extension motion of the arm was modeled, mainly to compare the current RL-MAC with volunteer data [8] and with similar arm models driven by PID controllers [8,9,12]. The potential of deep RL controllers will be better understood in body regions like the head and neck, which have multiple nonlinear joints and complex coordination of muscles, making the control problem more complicated [76,79,80]. The development of a control model for such body regions, with its input kinematic parameters and reward functions, will be assessed in future studies. In this study, we have penalized muscle activation to minimize fatigue (Eq. (4)). The effect of implementing other cost functions, such as muscle work [62] or joint energy expenditure [81], also needs to be studied. Further, it also needs to be investigated whether a trained RL-MAC can adapt to changes in anthropometry.

Conclusions

This study implemented a methodology for integrating a robust muscle control mechanism for the elbow joint. The RL-MAC could produce goal-directed arm movement, synthesize the same motion in the presence of a constant force field, and, once trained, react to high-magnitude impulse loads, which provides evidence of its potential for controlling HBMs in commonly encountered chaotic scenarios. Such a control mechanism is important for motor vehicle impact scenarios, which can be injurious and for which collecting volunteer data is therefore not possible.

The current control methodology can be extended to study the response of other more complex body regions with numerous muscles in an automotive environment and can be incorporated at the whole-body level. Active HBMs will be important tools for the development of improved injury countermeasures and protective gear in the future. Furthermore, the RL-MAC framework can also be used in biomechanics applications such as gait and occupational health research.

References

  • [1]. De Jager, M., Sauren, A., Thunnissen, J., and Wismans, J., 1996, “A Global and a Detailed Mathematical Model for Head-Neck Dynamics,” SAE Paper No. 962430. 10.4271/962430
  • [2]. Shewchenko, N., Withnall, C., Keown, M., Gittens, R., and Dvorak, J., 2005, “Heading in Football. Part 2: Biomechanics of Ball Heading and Head Response,” Br. J. Sports Med., 39(Suppl. 1), pp. i26–i32. 10.1136/bjsm.2005.019042
  • [3]. Iwamoto, M., and Nakahira, Y., 2014, “A Preliminary Study to Investigate Muscular Effects for Pedestrian Kinematics and Injuries Using Active THUMS,” Proceedings of the IRCOBI Conference, IRC-14-53, Berlin, Germany, Sept. 10–12, pp. 444–460. http://www.ircobi.org/wordpress/downloads/irc14/pdf_files/53.pdf
  • [4]. Brolin, K., Halldin, P., and Leijonhufvud, I., 2005, “The Effect of Muscle Activation on Neck Response,” Traffic Inj. Prev., 6(1), pp. 67–76. 10.1080/15389580590903203
  • [5]. Panzer, M. B., Fice, J. B., and Cronin, D. S., 2011, “Cervical Spine Response in Frontal Crash,” Med. Eng. Phys., 33(9), pp. 1147–1159. 10.1016/j.medengphy.2011.05.004
  • [6]. Chancey, V. C., Nightingale, R. W., Van Ee, C. A., Knaub, K. E., and Myers, B. S., 2003, “Improved Estimation of Human Neck Tensile Tolerance: Reducing the Range of Reported Tolerance Using Anthropometrically Correct Muscles and Optimized Physiologic Initial Conditions,” SAE Paper No. 2003-22-0008. 10.4271/2003-22-0008
  • [7]. Kistemaker, D. A., 2006, “Control of Fast Goal-Directed Arm Movements,” Ph.D. thesis, Printpartners Ipskamp B.V., Enschede, The Netherlands.
  • [8]. Kistemaker, D. A., Van Soest, A. K. J., and Bobbert, M. F., 2006, “Is Equilibrium Point Control Feasible for Fast Goal-Directed Single-Joint Movements?,” J. Neurophysiol., 95(5), pp. 2898–2912. 10.1152/jn.00983.2005
  • [9]. Östh, J., Brolin, K., and Happee, R., 2012, “Active Muscle Response Using Feedback Control of a Finite Element Human Arm Model,” Comput. Methods Biomech. Biomed. Eng., 15(4), pp. 347–361. 10.1080/10255842.2010.535523
  • [10]. Östh, J., Brolin, K., Carlsson, S., Wismans, J., and Davidsson, J., 2012, “The Occupant Response to Autonomous Braking: A Modeling Approach That Accounts for Active Musculature,” Traffic Inj. Prev., 13(3), pp. 265–277. 10.1080/15389588.2011.649437
  • [11]. Iwamoto, M., Nakahira, Y., and Kimpara, H., 2015, “Development and Validation of the Total Human Model for Safety (THUMS) Toward Further Understanding of Occupant Injury Mechanisms in Precrash and During Crash,” Traffic Inj. Prev., 16(Suppl. 1), pp. S36–S48. 10.1080/15389588.2015.1015000
  • [12]. Martynenko, O. V., Neininger, F. T., and Schmitt, S., 2019, “Development of a Hybrid Muscle Controller for an Active Finite Element Human Body Model in LS-DYNA Capable of Occupant Kinematics Prediction in Frontal and Lateral Maneuvers,” Proceedings of the 26th International Technical Conference on the Enhanced Safety of Vehicles (ESV), Eindhoven, The Netherlands, June 10–13, pp. 1–12. https://www-nrd.nhtsa.dot.gov/departments/esv/26th/
  • [13]. Inkol, K. A., Brown, C., McNally, W., Jansen, C., and McPhee, J., 2020, “Muscle Torque Generators in Multi-Body Dynamic Simulations of Optimal Sports Performance,” Multibody Syst. Dyn., 50(4), pp. 435–452. 10.1007/s11044-020-09747-9
  • [14]. Walter, J. R., Günther, M., Haeufle, D. F., and Schmitt, S., 2021, “A Geometry- and Muscle-Based Control Architecture for Synthesising Biological Movement,” Biol. Cybern., 115(1), pp. 7–37. 10.1007/s00422-020-00856-4
  • [15]. Roh, J., Cheung, V. C., and Bizzi, E., 2011, “Modules in the Brain Stem and Spinal Cord Underlying Motor Behaviors,” J. Neurophysiol., 106(3), pp. 1363–1378. 10.1152/jn.00842.2010
  • [16]. Ma, S., and Feldman, A. G., 1995, “Two Functionally Different Synergies During Arm Reaching Movements Involving the Trunk,” J. Neurophysiol., 73(5), pp. 2120–2122. 10.1152/jn.1995.73.5.2120
  • [17]. Lacquaniti, F., Bosco, G., Gravano, S., Indovina, I., La Scaleia, B., Maffei, V., and Zago, M., 2015, “Gravity in the Brain as a Reference for Space and Time Perception,” Multisensory Res., 28(5–6), pp. 397–426. 10.1163/22134808-00002471
  • [18]. Smeets, J. B. J., Erkelens, C. J., and van der Gon Denier, J. J., 1990, “Adjustments of Fast Goal-Directed Movements in Response to an Unexpected Inertial Load,” Exp. Brain Res., 81(2), pp. 303–312. 10.1007/BF00228120
  • [19]. Happee, R., 1993, “Goal-Directed Arm Movements. III: Feedback and Adaptation in Response to Inertia Perturbations,” J. Electromyogr. Kinesiol., 3(2), pp. 112–122. 10.1016/1050-6411(93)90006-I
  • [20]. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., et al., 2015, “Human-Level Control Through Deep Reinforcement Learning,” Nature, 518(7540), pp. 529–533. 10.1038/nature14236
  • [21]. Sutton, R. S., and Barto, A. G., 2018, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
  • [22]. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M., 2013, “Playing Atari With Deep Reinforcement Learning,” Technical Report, Deepmind Technologies, arXiv:1312.5602. https://arxiv.org/abs/1312.5602
  • [23]. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D., 2015, “Continuous Control With Deep Reinforcement Learning,” Proceedings of the 6th International Conference on Learning Representations, pp. 1–14, arXiv:1509.02971. https://arxiv.org/abs/1509.02971
  • [24]. Wu, X., Liu, S., Zhang, T., Yang, L., Li, Y., and Wang, T., 2018, “Motion Control for Biped Robot Via DDPG-Based Deep Reinforcement Learning,” 2018 WRC Symposium on Advanced Robotics and Automation (WRC SARA), Beijing, China, Aug. 16, pp. 40–45. 10.1109/WRC-SARA.2018.8584227
  • [25]. Islam, R., Henderson, P., Gomrokchi, M., and Precup, D., 2017, “Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control,” CoRR, arXiv:1708.04133. https://arxiv.org/abs/1708.04133
  • [26]. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., et al., 2017, “Mastering the Game of Go Without Human Knowledge,” Nature, 550(7676), pp. 354–359. 10.1038/nature24270
  • [27]. Phaniteja, S., Dewangan, P., Guhan, P., Sarkar, A., and Krishna, K. M., 2017, “A Deep Reinforcement Learning Approach for Dynamically Stable Inverse Kinematics of Humanoid Robots,” 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), Macau, Macao, Dec. 5–8, pp. 1818–1823. 10.1109/ROBIO.2017.8324682
  • [28]. Lobos-Tsunekawa, K., Leiva, F., and Ruiz-del-Solar, J., 2018, “Visual Navigation for Biped Humanoid Robots Using Deep Reinforcement Learning,” IEEE Rob. Autom. Lett., 3(4), pp. 3247–3254. 10.1109/LRA.2018.2851148
  • [29]. Abreu, M., Reis, L. P., and Lau, N., 2019, “Learning to Run Faster in a Humanoid Robot Soccer Environment Through Reinforcement Learning,” Robot World Cup, Springer, Cham, Switzerland, pp. 3–15.
  • [30]. Xu, D., Zhang, Y., Tan, W., and Wei, H., 2021, “Reinforcement Learning Control of a Novel Magnetic Actuated Flexible-Joint Robotic Camera System for Single Incision Laparoscopic Surgery,” 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, May 30–June 5, pp. 1236–1241. 10.1109/ICRA48506.2021.9560927
  • [31]. Fischer, F., Bachinski, M., Klar, M., Fleig, A., and Müller, J., 2021, “Reinforcement Learning Control of a Biomechanical Model of the Upper Extremity,” Sci. Rep., 11(1), pp. 1–15. 10.1038/s41598-021-93760-1
  • [32]. Jagodnik, K. M., Thomas, P. S., van den Bogert, A. J., Branicky, M. S., and Kirsch, R. F., 2016, “Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement,” IEEE Trans. Hum.-Mach. Syst., 46(5), pp. 723–733. 10.1109/THMS.2016.2558630
  • [33]. Crowder, D. C., Abreu, J., and Kirsch, R. F., 2021, “Hindsight Experience Replay Improves Reinforcement Learning for Control of a MIMO Musculoskeletal Model of the Human Arm,” IEEE Trans. Neural Syst. Rehabil. Eng., 29, pp. 1016–1025. 10.1109/TNSRE.2021.3081056
  • [34]. Tahami, E., Jafari, A. H., and Fallah, A., 2014, “Learning to Control the Three-Link Musculoskeletal ARM Using Actor–Critic Reinforcement Learning Algorithm During Reaching Movement,” Biomed. Eng.: Appl., Basis Commun., 26(5), p. 1450064. 10.4015/S1016237214500641
  • [35]. Joos, E., Péan, F., and Goksel, O., 2020, “Reinforcement Learning of Musculoskeletal Control From Functional Simulations,” International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, Oct. 4–8, pp. 135–145. 10.1007/978-3-030-59716-0_14
  • [36]. Min, K., Iwamoto, M., Kakei, S., and Kimpara, H., 2018, “Muscle Synergy–Driven Robust Motion Control,” Neural Comput., 30(4), pp. 1104–1131. 10.1162/neco_a_01063
  • [37]. Iwamoto, M., and Kato, D., 2021, “Efficient Actor-Critic Reinforcement Learning With Embodiment of Muscle Tone for Posture Stabilization of the Human Arm,” Neural Comput., 33(1), pp. 129–156. 10.1162/neco_a_01333
  • [38]. Kidziński, Ł., Mohanty, S. P., Ong, C. F., Hicks, J. L., Carroll, S. F., Levine, S., Salathé, M., and Delp, S. L., 2018, “Learning to Run Challenge: Synthesizing Physiologically Accurate Motion Using Deep Reinforcement Learning,” The NIPS'17 Competition: Building Intelligent Systems, Springer, Cham, Switzerland, pp. 101–120.
  • [39]. Song, S., Kidziński, Ł., Peng, X. B., Ong, C., Hicks, J., Levine, S., Atkeson, C. G., and Delp, S. L., 2021, “Deep Reinforcement Learning for Modeling Human Locomotion Control in Neuromechanical Simulation,” J. Neuroeng. Rehabil., 18(1), pp. 1–17. 10.1186/s12984-021-00919-y
  • [40]. La Barbera, V., Pardo, F., Tassa, Y., Daley, M. A., Richards, C., Kormushev, P., and Hutchinson, J. R., 2021, “OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-Mechanical Locomotion,” CoRR, arXiv:2112.06061. https://arxiv.org/abs/2112.06061
  • [41]. Iwamoto, M., Nakahira, Y., Kimpara, H., Sugiyama, T., and Min, K., 2012, “Development of a Human Body Finite Element Model With Multiple Muscles and Their Controller for Estimating Occupant Motions and Impact Responses in Frontal Crash Situations,” Stapp Car Crash J., 56, pp. 231–268. 10.4271/2012-22-0006
  • [42]. Luo, S., Androwis, G., Adamovich, S., Nunez, E., Su, H., and Zhou, X., 2021, “Robust Walking Control of a Lower Limb Rehabilitation Exoskeleton Coupled With a Musculoskeletal Model Via Deep Reinforcement Learning,” Research Square. 10.21203/rs.3.rs-1212542/v1
  • [43]. Denizdurduran, B., Markram, H., and Gewaltig, M. O., 2022, “Optimum Trajectory Learning in Musculoskeletal Systems With Model Predictive Control and Deep Reinforcement Learning,” Biol. Cybern., epub, pp. 1–16. 10.1007/s00422-022-00940-x
  • [44]. Driess, D., Zimmermann, H., Wolfen, S., Suissa, D., Haeufle, D., Hennes, D., Toussaint, M., and Schmitt, S., 2018, “Learning to Control Redundant Musculoskeletal Systems With Neural Networks and SQP: Exploiting Muscle Properties,” 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, May 21–25, pp. 6461–6468. 10.1109/ICRA.2018.8463160
  • [45]. Qin, W., Tao, R., Sun, L., and Dong, K., 2022, “Muscle-Driven Virtual Human Motion Generation Approach Based on Deep Reinforcement Learning,” Comput. Animation Virtual Worlds, 33(3–4), p. e2092. 10.1002/cav.2092
  • [46]. Cannon, S. C., and Zahalak, G. I., 1982, “The Mechanical Behavior of Active Human Skeletal Muscle in Small Oscillations,” J. Biomech., 15(2), pp. 111–121. 10.1016/0021-9290(82)90043-4
  • [47]. Rack, P. M., 2011, “Limitations of Somatosensory Feedback in Control of Posture and Movement,” Compr. Physiol., R. Terjung, ed., pp. 229–256. 10.1002/cphy.cp010207
  • [48]. Popescu, F., Hidler, J. M., and Rymer, W. Z., 2003, “Elbow Impedance During Goal-Directed Movements,” Exp. Brain Res., 152(1), pp. 17–28. 10.1007/s00221-003-1507-4
  • [49]. Moore, K. L., and Dalley, A. F., 2018, Clinically Oriented Anatomy, Wolters Kluwer India Pvt Ltd., Gurugram, Haryana, India.
  • [50]. Lieber, R. L., Jacobson, M. D., Fazeli, B. M., Abrams, R. A., and Botte, M. J., 1992, “Architecture of Selected Muscles of the Arm and Forearm: Anatomy and Implications for Tendon Transfer,” J. Hand Surg., 17(5), pp. 787–798. 10.1016/0363-5023(92)90444-T
  • [51]. Hill, A. V., 1938, “The Heat of Shortening and the Dynamic Constants of Muscle,” Proc. R. Soc. London, Ser. B, 126(843), pp. 136–195. 10.1098/rspb.1938.0050
  • [52]. Zajac, F. E., 1989, “Muscle and Tendon: Properties, Models, Scaling, and Application to Biomechanics and Motor Control,” Crit. Rev. Biomed. Eng., 17(4), pp. 359–411. https://pubmed.ncbi.nlm.nih.gov/2676342/
  • [53]. Bahler, A. S., Fales, J. T., and Zierler, K. L., 1967, “The Active State of Mammalian Skeletal Muscle,” J. Gen. Physiol., 50(9), pp. 2239–2253. 10.1085/jgp.50.9.2239
  • [54]. Winters, J. M., 1995, “An Improved Muscle-Reflex Actuator for Use in Large-Scale Neuromusculoskeletal Models,” Ann. Biomed. Eng., 23(4), pp. 359–374. 10.1007/BF02584437
  • [55]. Panzer, M., 2006, “Numerical Modelling of the Human Cervical Spine in Frontal Impact,” Master's thesis, University of Waterloo, Waterloo, ON, Canada.
  • [56]. Hayes, K. C., and Hatze, H., 1977, “Passive Visco-Elastic Properties of the Structures Spanning the Human Elbow Joint,” Eur. J. Appl. Physiol. Occup. Physiol., 37(4), pp. 265–274. 10.1007/BF00430956
  • [57]. Lewis, F. W., Jagannathan, S., and Yesildirak, A., 2020, Neural Network Control of Robot Manipulators and Non-Linear Systems, CRC Press, Boca Raton, FL.
  • [58]. Padakandla, S., 2021, “A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments,” ACM Comput. Surv. (CSUR), 54(6), pp. 1–25. 10.1145/3459991
  • [59]. Bayer, A., Schmitt, S., Günther, M., and Haeufle, D. F. B., 2017, “The Influence of Biophysical Muscle Properties on Simulating Fast Human Arm Movements,” Comput. Methods Biomech. Biomed. Eng., 20(8), pp. 803–821. 10.1080/10255842.2017.1293663
  • [60]. Koelewijn, A. D., Heinrich, D., and van den Bogert, A. J., 2019, “Metabolic Cost Calculations of Gait Using Musculoskeletal Energy Models, a Comparison Study,” PLoS One, 14(9), p. e0222037. 10.1371/journal.pone.0222037
  • [61]. Minetti, A. E., and Alexander, R. M., 1997, “A Theory of Metabolic Costs for Bipedal Gaits,” J. Theor. Biol., 186(4), pp. 467–476. 10.1006/jtbi.1997.0407
  • [62]. Umberger, B. R., Gerritsen, K. G., and Martin, P. E., 2003, “A Model of Human Muscle Energy Expenditure,” Comput. Methods Biomech. Biomed. Eng., 6(2), pp. 99–111. 10.1080/1025584031000091678
  • [63]. De Groote, F., Kinney, A. L., Rao, A. V., and Fregly, B. J., 2016, “Evaluation of Direct Collocation Optimal Control Problem Formulations for Solving the Muscle Redundancy Problem,” Ann. Biomed. Eng., 44(10), pp. 2922–2936. 10.1007/s10439-016-1591-9
  • [64]. Wiegner, A. W., and Watts, R. L., 1986, “Elastic Properties of Muscles Measured at the Elbow in Man: I. Normal Controls,” J. Neurol., Neurosurg. Psychiatry, 49(10), pp. 1171–1176. 10.1136/jnnp.49.10.1171
  • [65]. Shaw, G., Parent, D., Purtsezov, S., Lessley, D., Crandall, J., Kent, R., Guillemot, H., Ridella, S. A., Takhounts, E., and Martin, P., 2009, “Impact Response of Restrained PMHS in Frontal Sled Tests: Skeletal Deformation Patterns Under Seat Belt Loading,” Stapp Car Crash J., 53, pp. 1–48. 10.4271/2009-22-0001
  • [66]. Happee, R., de Vlugt, E., and van Vliet, B., 2015, “Nonlinear 2D Arm Dynamics in Response to Continuous and Pulse-Shaped Force Perturbations,” Exp. Brain Res., 233(1), pp. 39–52. 10.1007/s00221-014-4083-x
  • [67]. Howell, J. N., Chleboun, G., and Conatser, R., 1993, “Muscle Stiffness, Strength Loss, Swelling and Soreness Following Exercise-Induced Injury in Humans,” J. Physiol., 464(1), pp. 183–196. 10.1113/jphysiol.1993.sp019629
  • [68]. Wochner, I., Endler, C. A., Schmitt, S., and Martynenko, O. V., 2019, “Comparison of Controller Strategies for Active Human Body Models With Different Muscle Materials,” IRCOBI Conference Proceedings, Florence, Italy, Sept. 11–13, pp. 133–135. https://www.semanticscholar.org/paper/Comparison-of-Controller-Strategies-for-Active-Body-Wochner-Endler/c3e7329a5ffb9e12bc068fcdc5e87eb7c13e3960
  • [69]. Marsden, C. D., Obeso, J. A., and Rothwell, J. C., 1983, “The Function of the Antagonist Muscle During Fast Limb Movements in Man,” J. Physiol., 335(1), pp. 1–13. 10.1113/jphysiol.1983.sp014514
  • [70]. Flament, D., Hore, J., and Vilis, T., 1984, “Braking of Fast and Accurate Elbow Flexions in the Monkey,” J. Physiol., 349(1), pp. 195–202. 10.1113/jphysiol.1984.sp015152
  • [71]. Wadman, W. J., Denier, J. J., Geuze, R. H., and Mol, C. R., 1979, “Control of Fast Goal-Directed Arm Movements,” J. Hum. Mov. Stud., 5, pp. 3–17. https://www.researchgate.net/publication/233391758_Control_of_fast_goaldirected_arm_movements
  • [72]. Hannaford, B., and Stark, L., 1985, “Roles of the Elements of the Triphasic Control Signal,” Exp. Neurol., 90(3), pp. 619–634. 10.1016/0014-4886(85)90160-8
  • [73]. Happee, R., 1992, “Time Optimality in the Control of Human Movements,” Biol. Cybern., 66(4), pp. 357–366. 10.1007/BF00203672
  • [74]. Kolesnikov, S., and Khrulkov, V., 2020, “Sample Efficient Ensemble Learning With Catalyst.RL,” e-print arXiv:2003.14210. https://arxiv.org/abs/2003.14210
  • [75]. Akimov, D., 2019, “Distributed Soft Actor-Critic With Multivariate Reward Representation and Knowledge Distillation,” e-print arXiv:1911.13056. https://arxiv.org/abs/1911.13056
  • [76]. Happee, R., de Bruijn, E., Forbes, P. A., and van der Helm, F. C., 2017, “Dynamic Head-Neck Stabilization and Modulation With Perturbation Bandwidth Investigated Using a Multisegment Neuromuscular Model,” J. Biomech., 58, pp. 203–211. 10.1016/j.jbiomech.2017.05.005
  • [77]. Diamond, A., and Holland, O. E., 2014, “Reaching Control of a Full-Torso, Modelled Musculoskeletal Robot Using Muscle Synergies Emergent Under Reinforcement Learning,” Bioinspiration Biomimetics, 9(1), p. 016015. 10.1088/1748-3182/9/1/016015
  • [78]. Millard, M., Uchida, T., Seth, A., and Delp, S. L., 2013, “Flexing Computational Muscle: Modeling and Simulation of Musculotendon Dynamics,” ASME J. Biomech. Eng., 135(2), p. 021005. 10.1115/1.4023390
  • [79]. Mukherjee, S., Perez-Rapela, D., Forman, J., Virgilio, K., and Panzer, M. B., 2021, “Controlling Human Head Kinematics Under External Loads Using Reinforcement Learning,” IRCOBI Conference Proceedings, Online, Sept. 8–10, pp. 697–698. https://www.researchgate.net/publication/354644237_Controlling_Human_Head_Kinematics_under_External_Loads_Using_Reinforcement_Learning
  • [80]. Ólafsdóttir, J. M., Östh, J., and Brolin, K., 2019, “Modelling Reflex Recruitment of Neck Muscles in a Finite Element Human Body Model for Simulating Omnidirectional Head Kinematics,” IRCOBI Conference Proceedings, Florence, Italy, Sept. 11–13, pp. 308–323. https://www.researchgate.net/publication/336720514_Modelling_Reflex_Recruitment_of_Neck_Muscles_in_a_Finite_Element_Human_Body_Model_for_Simulating_Omnidirectional_Head_Kinematics
  • [81]. Berret, B., Darlot, C., Jean, F., Pozzo, T., Papaxanthis, C., and Gauthier, J. P., 2008, “The Inactivation Principle: Mathematical Solutions Minimizing the Absolute Work and Biological Implications for the Planning of Arm Movements,” PLoS Comput. Biol., 4(10), p. e1000194. 10.1371/journal.pcbi.1000194
