| MAMMDP | Multi-Arm Manipulator Markov Decision Process |
| SAC | Soft Actor-Critic |
| HER | Hindsight Experience Replay |
| AI | Artificial Intelligence |
| FMMs | Fast Marching Methods |
| PRM | Probabilistic Roadmap |
| RRT | Rapidly-exploring Random Trees |
| DNN | Deep Neural Network |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient |
| MDP | Markov Decision Process |
| DOF | Degrees of Freedom |
| OBB | Oriented Bounding Box |
| DQN | Deep Q-Network |
| DPG | Deterministic Policy Gradient |
| DDPG | Deep Deterministic Policy Gradient |
| A3C | Asynchronous Advantage Actor-Critic |
| TRPO | Trust Region Policy Optimization |
| MPO | Maximum a Posteriori Policy Optimisation |
| D4PG | Distributed Distributional Deep Deterministic Policy Gradient |
| KL | Kullback-Leibler |