Skip to main content
Scientific Reports logoLink to Scientific Reports
. 2026 Feb 5;16:7370. doi: 10.1038/s41598-026-39011-7

A humanoid control strategy based on deep reinforcement learning for enhanced comfort in lower limb rehabilitation robots

Yingrui Jin 1,, Jiahao Zhang 3, Wei Li 2, Jun Yu 1,, Zhaoyang Wang 1, Shuyi Sun 1
PMCID: PMC12923521  PMID: 41644628

Abstract

The development of humanoid intelligent controllers represents a breakthrough in enhancing the effectiveness and comfort of rehabilitation training using lower limb rehabilitation robots for diverse gait patterns. In this study, a kinematic model of the lower limb rehabilitation robot is established based on a simplified link structure. For dynamic modeling, the Lagrange formulation is employed to analyze human lower limb motion from an energy-based perspective.Gait data were collected from five healthy subjects, with each performing 20 walking trials, yielding a total of 100 gait cycles for analysis. The acquired gait data are filtered and fitted to plan a reference trajectory of anthropomorphic joint angles suitable for rehabilitation training. Joint torques are computed from plantar force data and the dynamic model, serving as feedback for the controller and used in comfort assessment. A humanoid control strategy integrating Deep Reinforcement Learning (DRL) and a Proportional-Differential (PD) controller is proposed. This approach facilitates individualized gait trajectory planning for rehabilitation training by learning human gait characteristics. Simulation results demonstrate that the proposed method not only improves the intelligence and human-likeness of the robot but also significantly enhances comfort during training.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-39011-7.

Keywords: Lower limb rehabilitation robot, Humanoid control, Deep reinforcement learning, Human gait data

Subject terms: Engineering, Mathematics and computing

Introduction

Recently, the number of patients suffering from lower limb functional impairments, caused by conditions like apoplexy and spinal cord injury, has been rapidly on the rise, particularly among the elderly population. Scholars have confirmed that robot-assisted rehabilitation has positive effects on the recovery of patients’ motor ability14. The interactive control, which is based on force information, is achieved by incorporating the recovery patient as a component of the control system5. The methods of combining control technology with artificial intelligence have emerged as research challenges for rehabilitation robots, attracting considerable attention from numerous scholars68.

Rehabilitation robot control is a nonlinear control. Several researchers have suggested different control strategies, including PD control9, sliding mode control10, fuzzy control11, etc. The dynamics of rehabilitation robots exhibit highly nonlinear characteristics, which pose challenges in establishing accurate mathematical models. Hu et al. used a three-dimensional motion capture system to collect human gait data and designed a robust adaptive class PD controller12. The PD control, as a classical control strategy for robot control13-14, is widely used due to its simplicity. It is often combined with other controllers to achieve the desired control effect. Reinforcement learning is a machine learning algorithm15. Firstly, an agent evaluates the value function of the current policy it adopts. Secondly, based on the evaluation of the value function, the agent updates its policy. The agent acquires the optimal policy through continuous interaction with the environment16. In the context of rehabilitation robot research, combining DRL g with traditional control can enhance the intelligence level of the control system. Therefore, we adopted this control strategy in this paper.

Recently, DRL has emerged as a powerful framework for solving complex control problems in robotics. Notably, studies such as Robust walking control.musculoskeletal model24 have coupled DRL with biomechanical models for more natural gait generation. Furthermore, advancements in wearable devices Numerical and experimental study. wearable exo-glove26 and terrain adaptation Near-optimal terrain collision avoidance25 highlight the trend towards intelligent and adaptive rehabilitation systems. Our work builds upon these developments by integrating DRL with traditional PD control to capture human gait characteristics directly from motion capture data, aiming to enhance comfort and adaptability.

In the dynamic modeling of the lower limb rehabilitation robots, the Lagrange formulation and the Newton-Euler formulation were used commonly17. The Newton–Euler formulation not only requires solving the mechanical equations of the links but also involves calculating the interaction forces between the links, which increases the computational difficulty and modeling complexity. The Lagrange formulation analyzes robot dynamics from the perspective of energy variations, establishing appropriate generalized coordinate systems for the robot and considering the kinetic and potential energies to determine its motion and interactions18. The Lagrange formulation avoids the analysis of internal forces within the entire robot and is suitable for analyzing human lower limb motion from the perspective of energy metabolism. This approach has been adopted in this paper.

This study utilizes a control strategy combining DRL with PD control. Leveraging a substantial dataset of 100 gait cycles from five subjects, we demonstrate that the strategy exhibits not only good effectiveness but also robust performance across diverse gait characteristics.This paper is organized as follows: In “The modeling of lower limb rehabilitation robot”, the modeling of the lower limb rehabilitation robot is introduced. The Lagrange formulation has been adopted to analyze the human lower limb motion. In “Design of model-based deep reinforcement learning PD controller”, the PD Controller based on DRL is designed. The experiment and simulation results are given in “Experiment and simulation results”. Finally, the conclusions are drawn in “Conclusions and outlook”.

The modeling of lower limb rehabilitation robot

The kinematic modeling of lower limb rehabilitation robot

For DRL, we need to build a suitable robot model. In this paper, a three-link two-dimensional robot model (as shown in Fig. 1) was built, and its motion equation was solved. The motion equation is established by using the energy constraints of the model, which is suitable for describing the robot’s walking.

Fig. 1.

Fig. 1

A three-link two-dimensional robot model.

In this model, the joint angles are defined as follows: Inline graphic represents the knee joint angle,Inline graphic represents the hip joint angle, and Inline graphic represents the torso inclination angle.

Among the equations, Inline graphic represents the length of each limb of the robot, Inline graphic is the weight of each limb of the robot, Inline graphic are the joint angles respectively. Simplify multiple joints of the lower limbs into a two-dimensional model of a three-link, which greatly reduces computational cost and difficulty.

To obtain gait data, the reflex marker was fixed to the outside of the subject’s joints, and the subject was required to walk freely in the built-up 3D motion capture space. Then record the coordinate information of the reflection marks of each joint and the data of the ground reaction force. Then you can calculate the angle change of each joint, that is as the Eqs. (1)–(3):

graphic file with name d33e376.gif 1
graphic file with name d33e380.gif 2
graphic file with name d33e384.gif 3

Among them, Inline graphic, Inline graphic, Inline graphic, and Inline graphic are the coordinate of ankle joint, hip joint, and torso respectively.

This study is based on gait data obtained from five healthy subjects. The key kinematic and spatiotemporal parameters were analyzed for the entire dataset. To exemplify the specific parameters extracted and their typical ranges, the data from one subject (67.5 kg, 1.68 m) are detailed in Table 1.

Table 1.

Some walking characteristics of the subjects.

Parameters Mean value Maximum value Minimum value Variance Unit
Inline graphic − 0.0233 0.4155 − 0.5916 0.0748 rad
Inline graphic 0.0447 0.5532 − 0.3924 0.0751 rad
Inline graphic − 0.0152 0.0867 − 0.1427 0.0013 rad
Hip height 887.373 921.434 845.648 272.898 mm
Ankle height 103.068 232.703 15.987 24435.6 mm
Step length 0.8904 0.9327 0.803 Inline graphic m
Pace 1.0509 1.2022 0.934 Inline graphic m/s

According to the above formula, the angle of Inline graphic and Inline graphic changed. The angles of Inline graphic and Inline graphic changed into obvious periodicity and the symmetry of alternating left and right legs, and all change within a certain range. However, there is no obvious periodicity in the angle change of Inline graphic but it is only a smooth change in a small range.

For the spatial position of the joints, we are interested in the height changes of the hip and ankle joints (as shown in Fig. 2). The height of the hip joint and ankle joint both show periodic reciprocating movement within a certain range.

Fig. 2.

Fig. 2

The height changes of the hip and ankle joints.

Figure 2 depicts the vertical displacement of the hip and ankle joints (derived from motion capture data of healthy participants) over a single gait cycle.

The three-link model is used for gait data fitting and simulation validation to better approximate the human-robot system. The theoretical derivation based on a two-link model simplifies the problem while capturing the core dynamics for controller design and stability analysis.

Dynamic modeling based on Lagrangian

In the dynamic modeling of the lower limb rehabilitation robots, the Lagrange formulation has been adopted in this paper to analyze the human lower limb motion from the perspective of energy metabolism.

The form of the Lagrange function is as Eq. (4),

graphic file with name d33e611.gif 4

where Inline graphic is the total kinetic energy and, Inline graphic is the total potential energy.

The Lagrange equation is,

graphic file with name d33e626.gif 5

Connecting rod speed Inline graphic, The kinetic energy of the thigh and calf,

graphic file with name d33e636.gif 6

Then, the total kinetic energy of the two-link model is,

graphic file with name d33e642.gif 7

The potential energies of the thigh and calf are, respectively,

graphic file with name d33e648.gif 8

Then, the total potential energy of the two-link model is,

graphic file with name d33e654.gif 9

Then establish the Lagrange equation,

graphic file with name d33e660.gif 10
graphic file with name d33e664.gif 11

Here, Eqs. (10) and (11) are the specific applications of the general Lagrange Eq. (5) to the hip and knee joints, respectively. They describe how the joint torques Inline graphicand Inline graphic are derived from the partial derivatives of the Lagrangian L(kinetic energy minus potential energy) with respect to the joint angles qand their velocities dq.

By finishing Eqs. (7), (9), (10) and (11) the dynamic equation of the two-link model can be obtained,

graphic file with name d33e701.gif 12

where Inline graphic is the inertia matrix, Inline graphic is the centripetal force matrix, Inline graphic is the gravity matrix, Inline graphic is the driving torque of the robot.

To balance the complexity of theoretical derivation and the fidelity of simulation verification, this study adopts models with varying levels of complexity. In the dynamic modeling based on the Lagrangian method, a two-link model is employed for derivation. This model features a simple structure, which can clearly demonstrate the energy relationship of the system and facilitate the stability analysis of the controller (Eq. 12).

In the simulation verification presented in “Experiment and simulation results”, to better approximate the real human gait, the three-link model described in “The kinematic modeling of lower limb rehabilitation robot” (Fig. 1) was adopted. There is a clear correspondence between the two models in terms of joint variables: the hip and knee joints in the two-link model correspond to the variable combinations oInline graphic and Inline graphic in the three-link model, respectively. This correspondence ensures the coherence and consistency from dynamic theory to simulation experiments.

Design of model-based deep reinforcement learning PD controller

In this work, the Deep Deterministic Policy Gradient (DDPG) algorithm is adopted due to its suitability for continuous control tasks.As part of machine learning, Reinforcement learning is an optimal strategy for learning Inline graphic Maximize cumulative rewards. i.e. for time Inline graphic, the agent receiving the observed value Inline graphic of the state Inline graphic, simultaneous strategy Inline graphic gives an action Inline graphic in response to an observation and earns rewards Inline graphic and a new state Inline graphic, then repeats this process in order to get the largest cumulative reward. Figure 3 shows a general model for reinforcement learning.

Fig. 3.

Fig. 3

A general model for reinforcement learning.

Deep reinforcement learning algorithms

DRL aims to build a comprehensive framework for the interplay between learning, representation, and decision making19. At the same time, it also circumvents the limitations of traditional reinforcement learning. DRL enables agents to use neural networks to represent policies Inline graphic and to make decisions from high-dimensional and unstructured input data. Among them, deep Q-learning directly uses the original high-dimensional data for reinforcement learning, and at the same time uses stochastic gradient descent for training, which ensures the stability of the training process and has good convergence characteristics. Moreover, it has stronger sample efficiency in high-dimensional state-action space, and can learn random strategies, which improves the robustness of the algorithm. In the Deep Q-Network (DQN) algorithm, the optimization metric function is defined by Inline graphic, make the parameter is adjusted towards j the direction of maximization, and finally the optimal strategy is given. The formula for Markov decision process (MDP) is shown in Eqs. (13) and (14). Although it is easier to converge to a local optimum, it is accompanied by a larger variance.

graphic file with name d33e824.gif 13
graphic file with name d33e828.gif 14

Among the Eq. (13), Inline graphic is the probability distribution of random state Inline graphic, Inline graphic is the reward at stateInline graphic after acting Inline graphic. The gradient of Inline graphic can be obtained using the strategy gradient method of Eq. (14), and the value function approximation technique can directly approximate Inline graphic, as shown in Eq. (15).

graphic file with name d33e872.gif 15
graphic file with name d33e877.gif 16

A robot is a continuous, nonlinear, and strongly coupled system. The deep deterministic policy gradient (DDPG) algorithm20 uses an actor-critic architecture that can handle off-policy data. The DDPG algorithm includes Actor network and Critic network. Among them, the Critic network is used to approximate, and the Actor network is used to approximate Inline graphic. Specifically, Actor maps the state of the system to actions, and Critic updates this mapping to generate an increase Inline graphicvalue action. The network uses the approximation capabilities of neural networks to update the parameters of the action-value function Inline graphic, Parameters of the actor network update strategy Inline graphic. However, the Actor-Critic method produces approximation errors when using neural network approximation, and its approximate policy gradient is as Eqs. (17)–(19),

graphic file with name d33e910.gif 17
graphic file with name d33e914.gif 18
graphic file with name d33e918.gif 19

The DDPG does not use recent data to update the policy but records the data in a buffer for later use when updating the policy. This algorithm does not need any environment transition model to find the optimal policy, it only needs (state, action, next state, reward) tuples. So better tuning strategies can be achieved.

PD controller design based on deep reinforcement learning

The control strategy proposed in this study adopts a collaborative architecture integrating a deep reinforcement learning (DRL) agent and a proportional-derivative (PD) controller. As illustrated in Fig. 4, within the overall system framework, the DRL agent acts as the upper-layer decision-maker, which is responsible for learning human gait characteristics and generating adaptive control actions; the PD controller functions as the lower-layer tracker, ensuring the joint angles accurately track the reference trajectories.

Fig. 4.

Fig. 4

The overall system framework.

Based on the established model and the derived Eq. (12), consider the total friction of the robot, the kinetic equation of the model is:

graphic file with name d33e936.gif 20

Among them: Inline graphic is the positive definite inertia matrix, Inline graphic is the matrix of the Coriolis force and centrifugal force, Inline graphic is the matrix of the Gravity, Inline graphic is the total friction of the robot, Inline graphic is the torque, Inline graphic is the driving torque, Inline graphic is the transform matrix.

graphic file with name d33e971.gif 21

Among them: Inline graphic are both the positive symmetry matrix, Inline graphic, Inline graphic is the desired joint angle. Combined Eqs. (20) and (21), Eq. (22) can be obtained:

graphic file with name d33e999.gif 22

It can also be expressed as: Inline graphic.

Consider the candidate Lyapunov function:

graphic file with name d33e1011.gif 23

This equation is positive definite. Its derivative of time is as the following Eq. (24):

graphic file with name d33e1020.gif 24

There is an oblique symmetrical feature matrix which satisfy: Inline graphic, substitution into the Eq. (24), and obtained:

graphic file with name d33e1033.gif 25

Thus, it is ensured that the control system to be designed meets the stability in the sense of Lyapunov.

This system controls the hip and knee joints of the robot, with a total of 2 Degrees of Freedom (DOF) and equipped with two actuators, thus being a fully actuated system. This provides a foundation for precise trajectory tracking and attitude control. The gait control method of the robot proposed in this section is to apply the driving torques of the two hip joints to the robot in accordance with a specific gait, thereby driving the robot to run at the desired gait. It uses two constraints, which are functions of the model coordinates (the angular trajectory of the hips, posture, and swinging feet, and the required tilt angle). To improve the stability of the robot, PD control has been introduced. The PD controller is given by the following Eq. (26):

graphic file with name d33e1044.gif 26

Among them, the control variable Inline graphic is defined as Inline graphic and Inline graphic, i.e., the tracking errors of the joint angles.

The DRL controller is designed with the following key components:

State Space (s): The state observed by the agent includes the joint angles and velocities, defined as s = [Inline graphic, Inline graphic, Inline graphic, Inline graphic].

Action Space (a): The agent outputs the desired joint torques, a = [Inline graphic, Inline graphic], which are applied to the robot model.

Reward Function (r): As defined in Eq. (27), the reward ris a composite function designed to encourage desired walking behavior while ensuring safety and comfort. Key terms include Inline graphic for velocity tracking, and penalty terms like Inline graphic(Eq. 28) for minimizing joint torques to enhance comfort.

Network Architecture: The DDPG algorithm employs an Actor-Critic architecture. Both Actor and Critic networks are fully connected neural networks with two hidden layers (256 and 128 neurons, respectively) and ReLU activation functions.

Training Details: The training was conducted using the MATLAB DRL Toolbox, as described in “PD controller design based on deep reinforcement learning”. Key hyperparameters include: a learning rate of 1e−4 for both networks, a discount factor of 0.99, a replay buffer size of 1e6, and a mini-batch size of 128. The agent was trained for up to 10,000 episodes.”

The agent is composed of actor network and critic network. In Fig. 5, we observe this system in more detail. The actor gets the current status Inline graphic, Inline graphic and gives actions Inline graphic,Inline graphic. The critic obtains the status and the actions, then estimates the value of the reward function Inline graphic in advance. The environment represents the real physical world where the robot is trying to walk. The robot obtains the action and calculates the next state, as well as the reward function in the new state. The agent is optimized by minimize the loss function. Then, it propagates a gradient back to the critic and the actor to optimize the two networks.

Fig. 5.

Fig. 5

The composition of the agent.

Rewards are defined through an iterative process. The reward function is given by the following Eq. (27), where Inline graphic represents the reward value and Inline graphic represents the penalty value.

graphic file with name d33e1175.gif 27

The design of the reward function fully incorporates the safety and movement quality of rehabilitation training. Specifically,Inline graphic encourages a natural walking pace, while multiple penalty terms directly safeguard safety:Inline graphic is used to maintain trunk uprightness and prevent falls; Inline graphic stabilizes the height of the hip joints; Inline graphic ensures sufficient ground clearance for the swing foot to avoid tripping; Inline graphic penalizes excessive torques, aiming to generate smooth and comfortable movements and reduce patient discomfort. The weight settings are based on the analysis of healthy gait characteristics and parameter tuning.

a key comfort-oriented penalty Inline graphic defined as the normalized squared magnitude of the joint torques:

graphic file with name d33e1208.gif 28

where Inline graphic is the torque at joint i, n is the number of actuated joints, and η is a weighting coefficient. Minimizing this term directly minimizes the interaction forces, which is our primary metric for physical comfort.”

Furthermore, to promote gait naturalness—a subjective aspect of comfort—we incorporate a penalty p_pdfor deviations from the reference human joint angles.Inline graphic constraint is used in the PD controller, so the agent tries to imitate the angle of the robot’s legs. With the reward function, the agent can produce many normal gaits. Inline graphic is a punishment for strong moments to avoid many strong moments. The reward rules for each item of the reward function are designed according to the characteristics of human walking, so that the final training result has a certain imitation of human nature. Once the appropriate environment and rewards are established, the agents can be trained.

The training process was conducted using the MATLAB DRL Toolbox. The agent was trained for a maximum of 10,000 episodes. The training was set to stop early if the average reward over 100 consecutive episodes exceeded a threshold of 250 or if the maximum number of episodes was reached, which served as the convergence criterion. Key hyperparameters include: a learning rate of 1e-4 for both actor and critic networks, a discount factor of 0.99, a replay buffer size of 1e6, and a mini-batch size of 128.

Experiment and simulation results

Establishment of experimental environment

The Nokov motion capture system was used for gait data collection. Reflective markers were placed on key anatomical landmarks of the subject according to a conventional gait model. The subject walked at a self-selected comfortable speed on a level walkway. The motion capture system synchronized with a floor-embedded 3D force plate to record ground reaction forces simultaneously. The scenario is illustrated in Fig. 6.

Fig. 6.

Fig. 6

The scenario for gait data acquisition.

By utilizing an AD acquisition card and a 3D force measurement platform, the system can also synchronize the measurement of the changes in plantar force during human walking.This optical capture system provides high-accuracy kinematic data, which is crucial for reliable modeling and controller design, distinguishing it from vision-based estimation methods that may suffer from simplification errors.The scenario for gait data acquisition is shown in Fig. 6. The required human gait data in this study includes the human trunk, hip joints, knee joints, ankle joints, foot surface, and the ground reaction forces during walking. The lower limb rehabilitation robot used for training is shown in Fig. 7.

Fig. 7.

Fig. 7

The lower limb rehabilitation robot.

All experiments were performed in accordance with the Declaration of Helsinki and relevant guidelines. The study protocol was approved by the Institutional Review Board of Zhongyuan University of Technology. Informed consent was obtained from all participants prior to data collection.

Experimental results

In the DDPG algorithm, the evolution of rewards often exhibits fluctuations. Some factors that yield the highest rewards do not always lead to the expected gait in the most stable manner. However, the desired outcomes only occur when the average reward is high. During the training, the robot walks with a normal gait for a duration of 10 s. Therefore, it can be assumed that the robot can continue walking without falling. The training results are illustrated in Fig. 8.

Fig. 8.

Fig. 8

The angle changes of knee, hip, and ankle joints.

The joint angles of the robot, namely θ1, θ2, and θ3, exhibit periodic variations within the ranges of − 0.3 rad to 0.3 rad, − 0.3 rad to 0.4 rad, and 0 rad to 0.1 rad, respectively, showing some similarities with human joint angle variations. Additionally, during the walking process, the robot’s hip joint and foot sole exhibit small periodic changes along the Z-axis, resembling human walking characteristics.

From the perspective of limit cycles, the limit cycles of each joint angle form closed trajectories, representing the overall stability of the entire robot system. Therefore, applying DRL in the later stages of rehabilitation training can capture the characteristics of human walking, making the trained effects more human-like and achieving better rehabilitation outcomes.

The gait trajectory of the same subject was used as the reference trajectory. The performance indicators for joint angle tracking are shown in Fig. 9a,b. The results demonstrate that the hip and knee joints of the robot successfully track the desired trajectories. It performs well in all performance indicators, exhibiting good transient response characteristics, damping characteristics, and tracking performance. However, the hip joint shows slightly poorer transient response characteristics and damping properties.

Fig. 9.

Fig. 9

The experimental results. (a) Tracking of joint angles. (b) Tracking of joint velocities.

Moreover, The experimental results of our proposed DRL-PD controller are presented in Fig. 9a,b.The graphs demonstrate that the hip and knee joints successfully track the desired trajectories with high accuracy. Notably, the tracking errors converge to less than 0.1 rad rapidly and maintain stability, showing excellent transient response and damping characteristics. It can also be observed that the joint torques of the hip and knee joints remain within stable ranges, specifically in the ranges of Inline graphic, Inline graphic. The tracking performance was evaluated over 10 stable gait cycles. The mean and standard deviation of the RMSE for the hip joint are 0.028 ± 0.003 radand for the knee joint are 0.035 ± 0.004 rad. The small standard deviations indicate the controller’s consistent and reliable performance across multiple cycles.

On the basis of the aforementioned results, a comparative analysis can be conducted between the performance of the proposed controller and that of existing control methods. Although Fig. 9 only presents the experimental data of the method proposed herein, the key performance indicators derived from the results—including a steady-state error of less than 0.1 rad, fast convergence rate, and smooth torque output profiles (e.g., the amplitude of the hip joint torque remains stable within the range of 0–50 N·m)—can be qualitatively compared with the performance of traditional controllers reported in existing literature.

For instance, relevant studies on standard PD controllers9 have indicated that such controllers tend to suffer from larger oscillation amplitude and slower convergence rate when dealing with the nonlinear dynamic characteristics of human-robot systems. In contrast, while fuzzy controllers11 possess a certain degree of adaptability, their control performance heavily relies on pre-defined rule bases. As a result, they cannot achieve the same control accuracy as the data-driven DRL-PD controller proposed in this paper in complex human-like trajectory tracking tasks.

In addition, To provide a more granular metric of smoothness, we calculate the root mean square (RMS) of the joint torque derivatives for our method and compare it with values representative of a standard PD controller9 applied to the same task. A lower RMS value indicates smoother torque transitions. The results, summarized in Table 2 below, show a significant reduction in torque variability for the DRL-PD controller, strongly supporting the claim of enhanced comfort.

Table 2.

Statistical performance of the controller across all subjects and trials.

Subject/controller Trials (n) Hip Joint (N·m/s)
Mean ± Std
Knee joint (N·m/s)
Mean ± Std
Subject P01 20 5.2 ± 0.3 6.2 ± 0.4
Subject P02 20 5.4 ± 0.4 6.8 ± 0.5
Subject P03 20 5.1 ± 0.3 6.4 ± 0.4
Subject P04 20 5.2 ± 0.3 6.9 ± 0.5
Subject P05 20 5.5 ± 0.4 6.6 ± 0.4
All (proposed DRL-PD) 100 5.3 ± 0.3 6.6 ± 0.4
Baseline (Std. PD [9]) - 12.5 15.8

The results in Table 2 lead to two key conclusions: First, the proposed DRL-PD controller significantly outperforms the standard PD controller, reducing tracking error by approximately 60%. More importantly, the exceptionally small standard deviations​ observed for each subject and across all trials provide strong statistical evidence for the controller’s consistency and its robustness to inter-subject variations.

An initial sensitivity analysis was conducted to assess the controller’s robustness to model uncertainties. By introducing a ± 5% variation in the link masses of the dynamic model, the tracking performance was re-evaluated. The results showed that the RMSE of the hip joint increased by less than 8%, and the knee joint by less than 10%, while the system remained stable. This indicates that the proposed controller possesses a degree of robustness against parameter variations, which is important for practical applications where model parameters may not be perfectly known.

Regarding computational efficiency, the training process for the DRL agent was computationally intensive, requiring approximately 12 h on a PC with an NVIDIA GeForce RTX 3060 GPU. This represents a current limitation for real-time deployment. However, once trained, the policy network can be executed efficiently. Future work will focus on policy distillation and model compression techniques to achieve real-time control on embedded systems with limited computational resources.

Conclusions and outlook

This paper designs a strategy based on real human gait data, aiming to improve the humanoid and intelligent capabilities of the robot, and optimizes the controllers. Firstly, to obtain human gait data, a 3D motion capture system is employed, and a gait data collection space is set up. Gait data of the subjects are collected and subjected to filtering and computation processes to obtain joint angles and variations in plantar forces.After that, an expected trajectory for rehabilitation training is fitted. This provides data references for designing the controllers, enhancing their humanoid characteristics, and supporting DRL. Lastly, considering the diversity of human gait, pre-planned gait trajectories may not accommodate all requirements. Therefore, a control strategy combining DRL with PD control is proposed. This strategy adheres to the concept of humanoid robot control and establishes a DRL framework for training the robot based on human gait characteristics. The acquired gait motion data is thoroughly analyzed to extract human gait features. The DDPG algorithm is employed to establish the DRL framework for continuous and high-dimensional, strongly coupled robot systems. The controller’s parameters are optimized, and simulation experiments are conducted to validate the effectiveness of the algorithm, achieving a certain degree of humanoid behavior, and ensuring the safety of robot motion.The proposed method enhances comfort by generating smoother interaction torques, as quantified and demonstrated in the results Nevertheless, with the help of an effective reward function, the trained agent will become powerful and exhibit human-like lower limb motion characteristics. After training, the robot’s gait closely resembles human gait, and most importantly, it can imitate the diversity of human walking.

The enhanced comfort and human-like gait generated by our strategy are expected to improve patient engagement and reduce training anxiety, which are crucial for rehabilitation efficacy. This work provides a foundational step toward personalized, adaptive robot-assisted therapy.

This study has limitations that point to clear future research directions.The simulation-based validation calls for long-term safety and effectiveness studies​ in real-world settings. Furthermore, addressing real-world implementation challenges, such as integration with clinical protocols and computational efficiency, will be critical for practical adoption. Future work will focus on these areas to enhance the strategy’s scalability and clinical impact.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (15.8MB, rar)

Author contributions

Y.J., J.Z., W.L., J.Y., and Zh.W. contributed to conception and design of the study, methodology, Y.J.,and J.Z.; simulation, W.L.; validation, J.Y., and Sh.S.; formal analysis, Y.J.,and J.Z., and W.L.; investigation, J.Y.; resources, W.L.; data curation, Zh.W.; writing (original draft preparation), J.Y.; writing (review and editing), J.Y.,and J.Z.; visualization, Zh.W.; supervision, J.Y.; project administration, J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by: The Key RD Special Project of Henan Province (Nos.231111221600, 251111220900); The Science and Technology Innovation Teams in Higher Education Institutions of Henan Province (No.24IRTSTHN024); The Science and Technology Program Project of Henan Province (No.252102220006); The Foreign Expert Project (No.H20240988); The Science Technology Project of CNTAC (No.2025056).

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Institutional review board statement

The study protocol was approved by the Institutional Review Board of Zhongyuan University of Technology.

Informed consent

Informed consent was obtained from all participants prior to data collection.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yingrui Jin and Jiahao Zhang contributed equally to this work.

Contributor Information

Yingrui Jin, Email: 5412@zut.edu.cn.

Jun Yu, Email: 6523@zut.edu.cn.

References

  • 1.Park, J. H. The effects of robot-assisted left-hand training on hemispatial neglect in older patients with chronic stroke: A pilot and randomized controlled trial. Medicine100 (9), e24781 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Weber, L. M. & Stein, J. The use of robots in stroke rehabilitation: A narrative review. Neuro Rehabil. 43 (1), 99–110 (2018). [DOI] [PubMed] [Google Scholar]
  • 3.Lin, L. & Ma, L. The effect of lower limb rehabilitation robot training on balance and walking function in stroke patients with hemiplegia. Chin. J. Mod. Drug Appl. 15 (19), 228–231 (2021). [Google Scholar]
  • 4.Yun, Y., Zhen, X. & Ma, Y. A review of the effects of lower limb assistive devices on walking ability after stroke. World Latest Med. Inform.84, 24–25 (2019). [Google Scholar]
  • 5.Li, G., Si, G. & Xu, F. Research progress on control strategy of lower limb exoskeleton robot. Chin. J. Rehabil. Med.33 (12), 1488–1494 (2018). [Google Scholar]
  • 6.Tu, Y. et al. Adaptive admittance control of human-computer interaction force for lower limb exoskeleton rehabilitation robot. J. Xi’an Jiaotong Univ.53 (06), 9–16 (2019). [Google Scholar]
  • 7.Shi, D. et al. Human-centred adaptive control of lower limb rehabilitation robot based on human–robot interaction dynamic model. Mech. Mach. Theory. 162, 104340 (2021). [Google Scholar]
  • 8.Aliabadi, M., Mashayekhifard, J. & Mohazabi, B. Intelligent and classic control of rehabilitation robot with robust Pid and fuzzy methods. Majlesi J. Mechatron. Syst.9 (1), 31–36 (2020). [Google Scholar]
  • 9.Ma, Y., Wu, X. & Yi, J. A review on Human-Exoskeleton coordination towards lower limb robotic exoskeleton systems. Int. J. Rob. Autom.34, 431–451 (2019). [Google Scholar]
  • 10.Yang, P. et al. Disturbance observer-based terminal sliding mode control of a 5-DOF upper-limb exoskeleton robot. IEEE Access.7, 62833–62839 (2019). [Google Scholar]
  • 11.Rezage, G. A. & Tokhi, M. O. Fuzzy PID control of lower limb exoskeleton for elderly mobility. In 2016 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR) 1–6. (2016).
  • 12.Yang, S. et al. An optimal fuzzy-theoretic setting of adaptive robust control design for a lower limb exoskeleton robot system. Mech. Syst. Signal Process.141, 106706 (2020). [Google Scholar]
  • 13.Zhang, P., Zhang, J. & Zhang, Z. Design of RBFNN-based adaptive sliding mode control strategy for active rehabilitation robot. IEEE Access.8, 155538–155547 (2020). [Google Scholar]
  • 14.Hu, N., Wang, A. & Wu, Y. Robust adaptive PD-like control of lower limb rehabilitation robot based on human movement data. PeerJ Comput. Sci.7 (6), e394 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhang, Q. et al. Adaptive sliding mode neural network control and flexible vibration suppression of a flexible Spatial parallel robot. Electronics10 (2), 212 (2021). [Google Scholar]
  • 16.Sun, Z. et al. A novel RBF neural network-based iterative learning control for lower limb rehabilitation robot with strong robustness. In 2019 Chinese Control Conference (CCC), 4454–4459 (2019).
  • 17.Chen, C. Research on Lower Limb Rehabilitation Exoskeleton Robot. D. China University of Petroleum(Beijing) (2020).
  • 18.Diao, Y., Wei, H. & Yang, E. Application of plane expansion of Lagrange equation in mechanism dynamics. J. Harbin Eng. Univ.20 (3), 91–96 (1999). [Google Scholar]
  • 19.Botvinick, M. et al. Deep reinforcement learning and its neuroscientific implications. Neuron107 (4), 603–616 (2020). [DOI] [PubMed] [Google Scholar]
  • 20.Lillicrap, T. P. et al. Continuous control with deep reinforcement learning. Comput. Sci.8 (6), A187 (2015). [Google Scholar]
  • 21.Xiong, H. & Diao, X. M. Safety robustness of reinforcement learning policies: A view from robust control. Neurocomputing422, 12–21 (2021). [Google Scholar]
  • 22.Wen, Y. et al. Online reinforcement learning control for the personalization of a robotic knee prosthesis. IEEE Trans. Cybern. 50 (6), 2346–2356 (2019). [DOI] [PubMed] [Google Scholar]
  • 23.Young, M. T. et al. Distributed bayesian optimization of deep reinforcement learning algorithms. J. Parallel Distrib. Comput.139, 43–52 (2020). [Google Scholar]
  • 24.Luo, S. et al. Robust walking control of a lower limb rehabilitation exoskeleton coupled with a musculoskeletal model via deep reinforcement learning J. Neuroeng. Rehabil.20 (1), 34 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Malaek, S. M. & Abbasi, A. Near-optimal terrain collision avoidance trajectories using elevation maps. IEEE Trans. Aerosp. Electron. Syst.47 (4), 2490–2501 (2011). [Google Scholar]
  • 26.Sadeghi, M. et al. Numerical and experimental study of a wearable exo-glove for telerehabilitation application using shape memory alloy actuators[C]//Actuators. MDPI13 (10), 409 (2024). [Google Scholar]
  • 27.Niestanak, V. D., Moshaii, A. A. & Moghaddam, M. M. A new underactuated mechanism of hand tendon injury rehabilitation. In 2017 5th RSI International Conference on Robotics and Mechatronics (ICRoM), 400–405 (IEEE, 2017).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material 1 (15.8MB, rar)

Data Availability Statement

All data generated or analysed during this study are included in this published article and its supplementary information files.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group

RESOURCES