TABLE 2.
Literature | Reinforcement learning algorithm | Task | Training steps/time | Distance error/success rate |
---|---|---|---|---|
Thuruthel et al. (2018) | Policy search | 3D position reaching | 8,000 s | Without load: 0.009∼0.017 m; with load: 0.022 m
Wu et al. (2020) | Q-learning | 2D position reaching | 1,000 iterations | Without load: <0.5 cm; with load: <1 cm
You et al. (2017) | Q-learning | 2D position reaching | 1,000 iterations | <10 mm |
Jiang et al. (2021) | Q-learning | Interaction tasks including drawer opening and handwheel rotation | 120 iterations (about 60 s) with the virtual-goal method; 20,000 iterations (about 11 h) without it | Task success rate 98.86%
Satheeshbabu et al. (2019) | DQN | 3D position reaching | 5,000 episodes^a | 3.05 cm
You et al. (2019) | DDQN | 3D position reaching | 100 episodes | 6.58 ± 5.6 mm |
Ansari et al. (2017b) | Actor–critic | 3D position control | 300 episodes | — |
Satheeshbabu et al. (2020) | DDPG | 3D path tracking | 10,000 episodes | ≤3 cm |
Liu et al. (2020) | PPO | 2D tracking with changing goals | 6,400 episodes | — |
^a An episode in reinforcement learning is a sequence of states, actions, and rewards that ends with a terminal state; the duration of an episode depends on the specific task.
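As a concrete illustration of this definition, the minimal Python sketch below runs one episode in a toy environment: the agent selects actions, receives rewards, and the episode ends when a terminal state is reached. The `ToyReachEnv` class, `run_episode` function, and random policy are hypothetical placeholders introduced purely for exposition; they are not taken from any of the works listed in the table.

```python
import random


class ToyReachEnv:
    """Hypothetical 1D reaching environment, used only to illustrate the episode loop."""

    def __init__(self, goal=5, max_steps=50):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        # Return the initial state.
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # action is -1 or +1; reward is the negative distance to the goal.
        self.pos += action
        self.steps += 1
        reward = -abs(self.goal - self.pos)
        terminal = (self.pos == self.goal) or (self.steps >= self.max_steps)
        return self.pos, reward, terminal


def run_episode(env, policy):
    """One episode: a sequence of (state, action, reward) ending at a terminal state."""
    trajectory = []
    state = env.reset()
    terminal = False
    while not terminal:
        action = policy(state)
        next_state, reward, terminal = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


if __name__ == "__main__":
    random_policy = lambda state: random.choice([-1, 1])
    episode = run_episode(ToyReachEnv(), random_policy)
    print(f"Episode length: {len(episode)} steps, "
          f"return: {sum(r for _, _, r in episode)}")
```

The "episodes" and "iterations" reported in the table count repetitions of such interaction sequences during training, which is why their wall-clock duration is task-dependent.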