Front Robot AI. 2021 Sep 24;8:730330. doi: 10.3389/frobt.2021.730330

TABLE 2.

A sampling of reinforcement learning control approaches for continuum robots, listing the algorithm, task, training duration, and reported distance error or success rate. Intended to be exemplary rather than comprehensive.

| Literature | Reinforcement learning algorithm | Task | Training steps/time | Distance error / success rate |
|---|---|---|---|---|
| Thuruthel et al. (2018) | Policy search | 3D position reaching | 8,000 s | Without load: 0.009–0.017 m; with load: 0.022 m |
| Wu et al. (2020) | Q-learning | 2D position reaching | 1,000 iterations | Without load: <0.5 cm; with load: <1 cm |
| You et al. (2017) | Q-learning | 2D position reaching | 1,000 iterations | <10 mm |
| Jiang et al. (2021) | Q-learning | Interaction tasks, including drawer opening and handwheel rotating | 120 iterations (about 60 s) with virtual goals; 20,000 iterations (about 11 h) without | Task success rate: 98.86% |
| Satheeshbabu et al. (2019) | DQN | 3D position reaching | 5,000 episodes ᵃ | 3.05 cm |
| You et al. (2019) | DDQN | 3D position reaching | 100 episodes | 6.58 ± 5.6 mm |
| Ansari et al. (2017b) | Actor–critic | 3D position control | 300 episodes | Not reported |
| Satheeshbabu et al. (2020) | DDPG | 3D path tracking | 10,000 episodes | ≤3 cm |
| Liu et al. (2020) | PPO | 2D tracking with changing goals | 6,400 episodes | Not reported |
ᵃ One episode in reinforcement learning is a sequence of states, actions, and rewards that ends with a terminal state; the duration of one episode depends on the specific task.
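
To make the episode concept concrete, here is a minimal, self-contained sketch of one episode: the agent acts until a terminal state is reached, accumulating reward along the way. The toy 1-D reaching environment and the random policy below are illustrative assumptions for this sketch, not the setup used in any of the cited works.

```python
import random


class Toy1DReachingEnv:
    """Toy task (hypothetical): an agent moves along a line toward a goal position."""

    def __init__(self, goal=5.0, tolerance=0.5, max_steps=100):
        self.goal = goal            # target position (illustrative value)
        self.tolerance = tolerance  # how close counts as "reached"
        self.max_steps = max_steps  # hard cap so every episode terminates

    def reset(self):
        self.position = 0.0
        self.steps = 0
        return self.position        # initial state

    def step(self, action):
        self.position += action
        self.steps += 1
        distance = abs(self.goal - self.position)
        reward = -distance          # closer to the goal yields higher reward
        done = distance < self.tolerance or self.steps >= self.max_steps
        return self.position, reward, done  # next state, reward, terminal flag


env = Toy1DReachingEnv()
state = env.reset()
done = False
episode_return = 0.0

# One episode: a sequence of (state, action, reward) ending at a terminal state.
while not done:
    action = random.uniform(-1.0, 1.0)  # stand-in for a learned policy
    state, reward, done = env.step(action)
    episode_return += reward

print(f"Episode ended after {env.steps} steps, return = {episode_return:.2f}")
```

Counting the number of such loops until a policy performs acceptably gives the "episodes" figures in the table; iteration- or time-based entries count individual training updates or wall-clock training time instead.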