Abstract
Recently, Recurrent Neural Network (RNN) control schemes for redundant manipulators have been extensively studied. These control schemes demonstrate superior computational efficiency, control precision, and control robustness. However, they lack planning completeness. This paper explains why RNN control schemes suffer from this problem. Based on the analysis, this work presents a new random RNN control scheme, which 1) introduces randomness into the RNN to address the planning completeness problem, 2) improves control precision with a new optimization target, and 3) improves planning efficiency through learning from exploration. Theoretical analyses are used to prove the global stability and the planning completeness of the proposed method, and to establish its computational complexity. Software simulations demonstrate the improved robustness against noise, the planning completeness, and the improved planning efficiency of the proposed method over benchmark RNN control schemes. Real-world experiments demonstrate the application of the proposed method.
Keywords: Recurrent Neural Networks, Motion Planning, Redundant Manipulator, Random Neural Networks, Robot
I. INTRODUCTION
Redundant manipulators demonstrate superior dexterity and are widely applied in intelligent robots. However, the redundant manipulator motion planning problem remains challenging; indeed, it has been proven to be PSPACE-hard when obstacles exist [1]. The redundant manipulator motion planning problem is to find the optimal path in the manipulator configuration space that delivers the end-effector to the desired target without breaking constraints [2]. The configuration space $\mathcal{C} \subseteq \mathbb{R}^m$ consists of all feasible arm joint configurations $q$, where $m$ denotes the number of Degrees of Freedom (DoF) of the arm. Given the manipulator model, each joint configuration corresponds to a uniquely defined end-effector pose $r = f(q)$, where $f(\cdot)$ is the kinematic model. Therefore, the configuration space is mapped to the task space [3]. Correspondingly, the $n$-dimensional task space contains all feasible end-effector poses.
Searching for the solution in the configuration space given a task and the kinematic model is known as the kinematic control problem [1]. The problem is often solved in the velocity space because the partial differentiation of the kinematic model linearizes and simplifies the problem to:
$$\dot{r} = J(q)\,\dot{q} \tag{1}$$
where $J(q) = \partial f(q)/\partial q$ is the $n \times m$ Jacobian matrix [2].
For redundant manipulators, an infinite number of solutions satisfy Eqn. 1, because the redundancy $m - n > 0$. The redundancy corresponds to the self motion, which is useful in obstacle avoidance [3]. Existing algorithms utilizing the redundancy for obstacle avoidance can generally be divided into two categories. The first utilizes “Gradient Projection” methods to determine the joint velocity vector $\dot{q}_0$ that corresponds to the self motion of avoiding obstacles [3]. The components of $\dot{q}_0$ that are in the null space of $J$ can be selected by $(I - J^{\dagger}J)$. By adding the selected components to the motion that moves the end-effector, the optimal joint velocity is uniquely defined as $\dot{q} = J^{\dagger}\dot{r} + (I - J^{\dagger}J)\dot{q}_0$, where $J^{\dagger}$ is the Moore-Penrose pseudo-inverse, defined as $J^{\dagger} = J^{T}(JJ^{T})^{-1}$ [2], [4]. The second family of algorithms treats the obstacle avoidance as constraints and converts them into tasks, through which the task space is augmented to uniquely define the solution [2], [4]. Mathematically, let $\dot{r}_c = J_c\,\dot{q} \in \mathbb{R}^s$ denote the constraint of obstacle avoidance, where $s$ is the dimension of the constraints; then the task is augmented, and the solution can be uniquely determined when $s = m - n$, as:
$$\dot{r}_a = \begin{bmatrix} \dot{r} \\ \dot{r}_c \end{bmatrix} = J_a\,\dot{q} \tag{2a}$$
$$\dot{q} = J_a^{-1}\,\dot{r}_a \tag{2b}$$
where $J_a = [J^T, J_c^T]^T$ is the $m \times m$ augmented Jacobian.
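To make the two formulations concrete, the following minimal numpy sketch computes the gradient-projection solution; the function name and the toy 3-DoF Jacobian are ours for illustration, and a full-row-rank Jacobian is assumed so that the pseudo-inverse formula above applies.

```python
import numpy as np

def gradient_projection_velocity(J, r_dot, q0_dot):
    """Gradient-projection solution: task motion plus a null-space
    component (I - J^+ J) q0_dot that can be used for obstacle avoidance."""
    J_pinv = J.T @ np.linalg.inv(J @ J.T)        # Moore-Penrose pseudo-inverse (full row rank)
    null_proj = np.eye(J.shape[1]) - J_pinv @ J  # projector onto the null space of J
    return J_pinv @ r_dot + null_proj @ q0_dot

# Example: a 3-DoF arm tracking a 2-D end-effector velocity.
J = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.4]])
dq = gradient_projection_velocity(J, np.array([0.1, 0.0]), np.array([0.0, 0.0, 0.1]))
print(dq, J @ dq)  # J @ dq reproduces the commanded end-effector velocity
```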
From the previous paragraph we know the problem is mathematically well-defined. However, it is still challenging to solve, because the configuration space is often concave when obstacles exist. Neural network based methods have attracted attention recently, and many have reported significant improvements in robot performance [5]–[18].
Most of these recent studies approach the problem from the control theory perspective and focus on improving control stability [19], [20] or system adaptiveness [21], [22]. In real-world robotic applications, solutions from the motion planning perspective are preferable because of the requirements on applicability, effectiveness, and efficiency. Among the works that approach the problem from the motion planning perspective, some assume obstacle-free environments. For example, Jin et al. proposed a Recurrent Neural Network (RNN) based solution that optimizes motions to maximize the manipulability of redundant manipulators [23]. Li et al. proposed an RNN control scheme to address the cooperative control problem for distributed redundant manipulators [24]. For environments with obstacles, Zhang et al. treated the condition of collision avoidance as an additional constraint and solved the motion planning problem in the velocity space [25]. Guo et al. extended the work into the acceleration space, and their scheme guarantees a minimum acceleration norm [26]. More discussion and comparison on neural network based motion planning can be found in [27]. Although these RNN control schemes have demonstrated improved control precision and efficiency, they model the problem as a constrained optimization; therefore, they suffer from the local minimum problem and lack planning completeness.
This work aims to address the planning incompleteness problem of these neural network based control schemes in environments with obstacles. Unlike these existing works, this work proposes a novel random RNN control scheme, inspired by the recent finding that neural network randomness correlates with superior learning abilities [28]. The proposed method therefore inherits the robustness, computational efficiency, and effectiveness of neural network based control schemes, while also achieving probabilistic planning completeness. Furthermore, through learning in the process of exploration, the proposed method balances the random exploration of environments with the heuristic search and improves planning effectiveness. In summary, the main contributions of this work are:
We propose a novel Recurrent Neural Network (RNN) control scheme to address the planning incompleteness problem. The proposed method inherits the advantages of classical RNN control schemes, including high precision, high efficiency, and high robustness. Meanwhile, it addresses the local minimum problem and achieves guaranteed probabilistic planning completeness.
We introduce a Short Term Memory (STM) model into the proposed method to learn environmental complexity from exploration. The proposed scheme balances the random exploration and the heuristic search to improve planning efficiency.
We prove the global stability and the planning completeness of the proposed method, and analyze its computational complexity.
We study the control precision, the robustness against noise, the planning completeness, and the planning efficiency of the proposed method by comparing it with three other algorithms.
We demonstrate the application of the proposed method in both software simulation and real-world experiments.
The rest of this paper is organized as follows: Section II presents the proposed RNN control scheme in detail. Section III presents the theoretical analyses of the proposed method. Section IV compares the proposed method with three other schemes in simulation experiments, and further verifies it in real-world experiments. The paper concludes with a discussion in Section V.
II. RANDOM RECURRENT NEURAL NETWORK FOR REDUNDANT MANIPULATOR MOTION PLANNING
A. Solving Kinematic Control in Dual Space with Recurrent Neural Networks
Recurrent Neural Networks (RNNs) refer to networks that have inter-layer connections. RNNs are intrinsically parallel and capable of processing sequential or time-varying data [29]. RNNs have shown superior robustness and efficiency in solving Quadratic Programming (QP) problems [30], [31], and thus have been used for redundant manipulator control. The kinematic control of redundant manipulators can be expressed in the form of a QP as:
$$\min_{\dot{q}} \;\; \frac{1}{2}\,\dot{q}^T W \dot{q} + c^T \dot{q} \tag{3a}$$
$$\text{s.t.} \quad J\dot{q} = \dot{r}_d \tag{3b}$$
$$\dot{q} \in \Omega \tag{3c}$$
where the weighting factors $W$ and $c$ are an $m \times m$ real symmetric matrix and a real-valued $m$-dimensional vector, respectively, and $\Omega$ denotes the configuration space of the redundant manipulator.
To simplify the QP problem, we project it into its dual space by designing the Lagrange multiplier $\lambda$ to correspond to the equality constraint (Eqn. 3b). The problem described in Eqn. 3 is simplified to:
$$\max_{\lambda}\;\min_{\dot{q} \in \Omega} \;\; \frac{1}{2}\,\dot{q}^T W \dot{q} + c^T \dot{q} - \lambda^T\big(J\dot{q} - \dot{r}_d\big) \tag{4}$$
The Karush-Kuhn-Tucker condition (Chapter 5.5.3 in [32]) ensures that the solution to Eqn. 4 equals the solution to the following equation:
$$\dot{q} = P_{\Omega}\big(\dot{q} - (W\dot{q} + c - J^T\lambda)\big) \tag{5}$$
where $P_{\Omega}(\cdot)$ is a projection function from $\mathbb{R}^m$ onto $\Omega$, defined as $P_{\Omega}(u) = \arg\min_{v \in \Omega}\,\lVert u - v \rVert$.
Eqn. 5 naturally matches the neural dynamics of a projected Recurrent Neural Network (Eqn. 6), and it can be proved that the equilibrium of the network equals the optimal solution of the system described in Eqn. 3 [33], [34].
$$\epsilon\,\frac{d\dot{q}}{dt} = -\dot{q} + P_{\Omega}\big(\dot{q} - (W\dot{q} + c - J^T\lambda)\big) \tag{6a}$$
$$\epsilon\,\frac{d\lambda}{dt} = \dot{r}_d - J\dot{q} \tag{6b}$$
where ϵ > 0 is a scaling factor. This type of projected RNN has a simple architecture: it is a single-layer neural network in which the neurons are fully connected, and the output of the network passes through a projection function, as explained in Eqn. 9.
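The neural dynamics above can be simulated with simple Euler integration. The following sketch is our own illustration of Eqn. 6 as reconstructed here (function names and parameter values are hypothetical), using the minimum-velocity-norm case $W = I$, $c = 0$ and a box projection:

```python
import numpy as np

def projected_rnn_step(dq, lam, J, rd_dot, W, c, P_omega, eps=1e-2, dt=1e-3):
    """One Euler step of the projected RNN dynamics (Eqns. 6a/6b)."""
    # Eqn. 6a: neuron dynamics driven by the projected gradient of the QP.
    d_dq = (-dq + P_omega(dq - (W @ dq + c - J.T @ lam))) / eps
    # Eqn. 6b: multiplier dynamics enforcing the task constraint J dq = rd_dot.
    d_lam = (rd_dot - J @ dq) / eps
    return dq + dt * d_dq, lam + dt * d_lam

J = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.4]])
rd_dot = np.array([0.1, -0.05])
P_box = lambda u: np.clip(u, -1.0, 1.0)  # a simple box set Omega
dq, lam = np.zeros(3), np.zeros(2)
for _ in range(20000):
    dq, lam = projected_rnn_step(dq, lam, J, rd_dot, np.eye(3), np.zeros(3), P_box)
print(dq, J @ dq)  # J @ dq approaches rd_dot at the network equilibrium
```

At the equilibrium, $J\dot{q}$ approaches $\dot{r}_d$, matching the constraint in Eqn. 3b.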
B. Improving Control Precision and Robustness by Closing the Loop of the RNN Control Scheme
The control precision and the robustness against noise are critical in motion planning because: 1) the control precision defines the minimum safety distance to obstacles, which shrinks the configuration space; and 2) high control precision and robustness against noise ensure the successful execution of planning results. Classical RNN control schemes have demonstrated improvements in the control precision, robustness, and efficiency of redundant manipulator control compared with their Jacobian-inversion based equivalents [35]. However, we identified a problem of error accumulation in these control schemes, and we design a new RNN control scheme that overcomes the error accumulation problem by introducing a new optimization target. We first introduce the new design; the proof of its effectiveness is presented in Section III-A.
Intuitively, feeding the tracking errors $e = r_d - r$ back into the optimization target forces the RNN to minimize the errors. Correspondingly, the new optimization target can be designed as:
$$J\dot{q} = \dot{r}_d + k\,(r_d - r) \tag{7}$$
where $k > 0$ is a weighting factor and $r$ is the end-effector position [36]. Eqn. 7 directs the RNN to minimize the magnitude of $e$, because it implies $\dot{e} = -k\,e$ and therefore $e(t) = e(0)\,e^{-kt} \rightarrow 0$. The control precision and robustness of RNNs based on Eqn. 7 are analyzed theoretically in Subsection III-A and compared empirically in Subsection IV-A.
With the newly defined optimization target (Eqn. 7), Eqn. 4 becomes:
$$\max_{\lambda}\;\min_{\dot{q} \in \Omega} \;\; \frac{1}{2}\,\dot{q}^T W \dot{q} + c^T \dot{q} - \lambda^T\big(J\dot{q} - \dot{r}_d - k(r_d - r)\big) \tag{8}$$
The constraints of joint physical limits can be fulfilled by designing a proper projection function. A projection function has the form of:
$$P_{\Omega}(u_i) = \begin{cases} w_i^- & \text{if } u_i < w_i^- \\ u_i & \text{if } w_i^- \le u_i \le w_i^+ \\ w_i^+ & \text{if } u_i > w_i^+ \end{cases} \tag{9}$$
where we design the boundary conditions as:
$$w_i^+ = \min\big\{w^+,\; c_1\,(q_i^+ - q_i)\big\}, \qquad w_i^- = \max\big\{w^-,\; c_2\,(q_i^- - q_i)\big\} \tag{10}$$
Eqn. 10 ensures the joint limits are met: when a joint approaches the upper ($q^+$) or the lower ($q^-$) bound of its limits, the magnitudes of the two terms, $c_1(q_i^+ - q_i)$ and $c_2(q_i^- - q_i)$, decrease to zero, at a speed regulated by the two positive scaling factors $c_1$ and $c_2$. These two scaling factors can be empirically tuned to clamp the joint accelerations, which improves the control precision for mechanisms with non-negligible inertia, for example, cable-driven manipulators. $w^+$ and $w^-$ are the upper and lower bounds of the joint speed. They can be used to clamp movement speeds, which improves manipulator performance by regulating motion patterns [36].
By substituting Eqns. 8, 9, and 10 into Eqn. 6, we have a new RNN control scheme that meets the constraints of joint limits and joint speeds in obstacle-free environments.
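As an illustration of Eqns. 9 and 10 as reconstructed above, the following sketch (function name and default values ours) implements the saturation projection with joint-limit-aware bounds:

```python
import numpy as np

def joint_limit_projection(u, q, q_min, q_max, w_minus, w_plus, c1=0.5, c2=0.5):
    """Element-wise saturation projection (Eqns. 9 and 10).

    The velocity bounds shrink toward zero as each joint nears its limits,
    so the projected joint velocity cannot drive a joint past q_min/q_max.
    """
    upper = np.minimum(w_plus, c1 * (q_max - q))   # Eqn. 10, upper bound
    lower = np.maximum(w_minus, c2 * (q_min - q))  # Eqn. 10, lower bound
    return np.clip(u, lower, upper)                # Eqn. 9, saturation
```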
C. Obstacle Avoidance and Complete Motion Planning with RNN
Because of the flexibility of the RNN architecture, the obstacle avoidance problem can be addressed both by augmenting the task space and by converting the obstacle avoidance constraints into bounding conditions [25], [26]. The former adopts the scheme explained in Subsection II-A: the Jacobian and the task are augmented by the obstacle avoidance constraints, as explained in Eqn. 2. These methods require the dimension of the extra tasks, $s$, to meet the condition $s \le m - n$. The latter does not have such a limitation. To explain it, we first introduce the concept of “critical points”, as indicated by the green points in Fig. 2. The critical points are the points on the manipulator whose distances to obstacles are equal to or smaller than the safety threshold. Let us denote the critical points by $r_{o,i}$ and the nearest points on the obstacle by $o_{o,i}$, where $i$ indicates the $i$-th critical point. The algorithms in the latter category then avoid a collision by exerting an “escaping velocity” on the critical points.
Fig. 2:

Explanation of Why Classical RNN Control Schemes Suffer from the Local Minimum Problem and Lack Planning Completeness. In this simple 2-dimensional environment, classical RNN control schemes fail to find a valid path to move from the pose indicated by the solid line to the pose indicated by the dashed line, because the configuration space is concave due to the existence of the obstacle.
The escaping velocity denotes the velocity that moves the critical points away from the obstacles:
$$\dot{r}_{o,i} = a\,\frac{r_{o,i} - o_{o,i}}{\lVert r_{o,i} - o_{o,i} \rVert} \tag{11}$$
where $a$ is a semi-positive scaling factor. Any critical point velocity between zero and the escaping velocity will avoid a collision. This method therefore converts the constraints of obstacle avoidance into a closed set in the configuration space, and in the RNN control scheme this set bounds the neuron activities:
$$\Omega_o = \left\{ \dot{q} \;\middle|\; 0 \le J_{o,i}\,\dot{q} \le \dot{r}_{o,i},\;\; \forall i \right\} \tag{12}$$
where $J_{o,i}$ denotes the Jacobian matrix at the $i$-th critical point.
Because in this scheme the obstacle avoidance does not explicitly “consume” the manipulator redundancy, algorithms based on it can deal with an arbitrarily large number of constraints and are more flexible.
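A minimal sketch of the escaping-velocity computation in Eqn. 11, as reconstructed above (function name ours; a small constant guards against division by zero):

```python
import numpy as np

def escaping_velocities(critical_pts, obstacle_pts, a=1.0):
    """Escaping velocities for critical points (Eqn. 11).

    critical_pts : (k, 3) positions r_oi on the manipulator near obstacles
    obstacle_pts : (k, 3) nearest obstacle points o_oi
    Returns unit directions pointing away from the obstacles, scaled by a >= 0.
    """
    diff = critical_pts - obstacle_pts                  # direction away from obstacle
    dist = np.linalg.norm(diff, axis=1, keepdims=True)  # separation distances
    return a * diff / np.maximum(dist, 1e-9)            # avoid division by zero
```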
From Eqn. 12 we know that in classical RNN control schemes, the target globally “drags” the end effector toward it, while obstacles “push” the critical points away. Intuitively, this causes the local minimum problem, and the planning is not guaranteed to be complete. This problem is visualized in Fig. 2: RNN schemes fail to drive the manipulator from the position indicated by the solid red circle to the target indicated by the dashed red circle, because the obstacle makes the configuration space concave and traps the manipulator. The mathematical explanation of this local minimum problem is given in Section III-B.
To address this problem, the manipulator needs the ability to “jump” out of local minima, and neural network randomness can be utilized for this purpose: a recent study demonstrated that randomness correlates with superior learning abilities and has been used to achieve human-level concept learning [28].
To grant the RNN this randomness, we replace the global attraction from the target with random attractions. The random attraction is designed as:
$$J\dot{q} = g\,(r_{rand} - r) \tag{13}$$
where $g$ is a non-zero weighting factor that regulates the attractions and $r_{rand}$ is a randomly sampled point in the task space.
D. Balancing Random Exploration and Heuristic Search with Short Term Memory Model
The local minimum problem is due to the concave configuration space. In sparse environments, the heuristic search has a high probability of success and is efficient, and is therefore preferable. In environments with complex obstacles, the heuristic search is prone to failure, and random exploration is the key to success. However, full knowledge of the environment is often not available, and it is known that computing the configuration space for an environment is computationally expensive [2]. In this work, we balance the heuristic search and the random exploration by learning the environmental complexity online.
This environmental complexity could be learned by probabilistic methods, such as the classical theory of probability or Bayesian methods. The classical theory of probability is simple in concept, as it counts the percentage of failed explorations, but such algorithms are prone to bias [37]. For example, consider a scenario in which the target is inside a corridor: when the manipulator’s initial position is unluckily located, the manipulator reaches obstacles before it reaches the target, and the algorithm degenerates to pure random exploration because collisions are frequent, even though the configuration space is wide and nicely connected. Bayesian methods do not suffer from this problem because of the existence of the prior. However, the prior depends on the configuration space and is difficult to obtain [38], [39].
The shunting Short Term Memory (STM) model is a solution to this online learning problem. STM is a robust and powerful tool that describes how living creatures adapt to environmental changes, and it has been widely used to address the system adaptiveness problem [40]. The shunting STM was derived from the additive Short Term Memory model, which is mathematically described as [40], [41]:
$$\dot{x}_i = -A_i x_i + \sum_j f_j(x_j)\,B_{ji} - \sum_j g_j(x_j)\,C_{ji} + I_i$$
where $-A_i x_i$ is a passive decay, $\sum_j f_j(x_j)B_{ji}$ is the positive feedback, $\sum_j g_j(x_j)C_{ji}$ is the negative feedback, and $I_i$ is the input inspiration (see the detailed explanation in [40]).
Algorithm 1.
Proposed Random RNN Control Scheme
| Input: Target (rT), Manipulator Start Position (r0) | |
| Output: Sequence of Joint States (C) | |
| Init : rs=r0, rd=r0, Ts = {[rs, rd, ∅]}, RRNN†, re‡ | |
| 1: | if (Neural activity of RRNNSTM > rand (0,1)) then |
| 2: | rd = random point in the task space |
| 3: | else |
| 4: | rd = rT |
| 5: | end if |
| 6: | Select the reached goal (rd in Ts) that is closest to rd, and use it as rs |
| 7: | RRNNRNN plans a motion for the given rs and rd, and produces a sequence of commands cs,d |
| 8: | while re>0 do |
| 9: | rr = end effector position |
| 10: | if (rd == rr) then |
| 11: | if (rd == rT) then |
| 12: | Back-trace the control sequence into C |
| 13: | Break |
| 14: | end if |
| 15: | else |
| 16: | rd = rr |
| 17: | end if |
| 18: | Feed collision into RRNNSTM |
| 19: | Insert [rs, rd, cs,d] into Ts |
| 20: | re = re − 1 |
| 21: | end while |
| 22: | return C |
† The architecture of RRNN is explained in Fig. 1; it consists of RRNNRNN and RRNNSTM.
‡ A big integer that controls the maximum allowed exploration time.
The intuition is easy to understand through an analogy of “pain” from collisions. If a collision is fed into an STM, the output of the STM can be seen as the memory of “pain”. In the beginning, the STM has no pain and leans toward the heuristic search. When a collision happens, the neural activity of the STM increases and it tends toward random exploration. The “pain” decays with time, and the STM tends to perform the heuristic search again. In the extreme case where collisions happen every time, the STM is still able to perform the heuristic search, as the decay is continuous and the neuron activities are guaranteed to be unsaturated. From this description we know that, unlike some “windowed” methods, the STM does not explicitly have a fixed term of memory; instead, it decays memory nonlinearly, and old memories are “flushed” by new ones.
A single cell STM is the simplest model that meets our needs, which can be mathematically described as:
$$\dot{x} = -A\,x + (1 - x)\Big(I + \sum_j w_j\,x\Big) \tag{14}$$
where parameter $A$ denotes the passive decay rate; variable $x$ is the neural activity, and $x \in [0, 1]$ is guaranteed; the excitatory inputs (inspirations) to the neuron are $I + \sum_j w_j x$, where $I$ is the excitatory input from the exploration, which can simply be 0 for no collision and 1 for a collision, and $w_j$ denotes the self-excitatory weights. The neural activity of the STM, RRNNSTM, is used to balance the random search with the heuristic search. The proposed RNN control scheme is explained in Fig. 1 and in Algorithm 1.
Fig. 1:

The Architecture of the Proposed Random Recurrent Neural Network. The proposed method consists of a single-layer Recurrent Neural Network (RNN) and a single-cell Short Term Memory (STM) model. The STM learns from exploration and controls the RNN to generate precise, efficient, and robust motion planning results for redundant manipulators.
In Algorithm 1, RRNNRNN denotes the control scheme described in Eqn. 8, 9, 10 and 12, and RRNNSTM is mathematically explained in Eqn. 14.
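For illustration, the following sketch shows the single-cell STM update (Eqn. 14, as reconstructed) together with the exploration decision in line 1 of Algorithm 1; the function names are ours, and the defaults A = 0.9 and w = 0.95 follow the simulation parameters reported in Section IV:

```python
import random

def stm_update(x, collision, A=0.9, w=0.95, dt=0.1):
    """One Euler step of the single-cell shunting STM (Eqn. 14).

    x         : current "pain" activity in [0, 1]
    collision : 1.0 if the last exploration collided, else 0.0
    The (1 - x) shunting term keeps the activity bounded below 1.
    """
    dx = -A * x + (1.0 - x) * (collision + w * x)
    return min(max(x + dt * dx, 0.0), 1.0)

def choose_goal(x, target, sample_task_space):
    """Line 1 of Algorithm 1: high STM activity favors random exploration."""
    if x > random.random():
        return sample_task_space()  # random exploration
    return target                   # heuristic search toward the target
```

Repeated collisions raise the activity $x$ and make random goals more likely, while the continuous decay lets the scheme fall back to the heuristic search once collisions stop.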
III. THEORETICAL ANALYSES
A. Precision and Stability
Subsection II-B states that the proposed RNN control scheme overcomes error accumulation and improves the control precision. The classical RNN control scheme is:
$$\lambda = \lambda_0 + \frac{1}{\epsilon}\,(e - e_0) \tag{15}$$
where $\lambda_0$ and $e_0$ denote the values of $\lambda$ and of the tracking error $e = r_d - r$ at $t = 0$, respectively.
By replacing λ in Eqn. 6 with Eqn. 15 we have:
$$\epsilon\,\frac{d\dot{q}}{dt} = -\dot{q} + P_{\Omega}\Big(\dot{q} - \big(W\dot{q} + c - J^T\lambda_0 - \tfrac{1}{\epsilon}J^T(e - e_0)\big)\Big) \tag{16}$$
From Eqn. 16 we know that, for an arbitrarily chosen time point $t = 0$, the errors accumulate with time as long as $e_0 \neq 0$: only the change of error $e - e_0$ enters the dynamics, so the error present at $t = 0$ is never corrected. With the proposed scheme (Eqn. 7), for any time point $t$, the initial error accumulation $e_0$ is fed back into the controller as:
$$\lambda = \lambda_0 + \frac{1}{\epsilon}\Big(e - e_0 + k\int_0^t e\,d\tau\Big) \tag{17}$$
We can see that the integral feedback term in Eqn. 17 keeps adjusting $\lambda$ until the accumulated errors are canceled out; thus the proposed scheme no longer suffers from the problem.
In order to prove the global stability of the proposed method, we define the Lyapunov function as $V = \frac{1}{2}\,e^T e$, where $e = r_d - r$ denotes the tracking error (used as feedback in Eqn. 7). Then from Eqns. 3a and 5 we know $\dot{r} = J\dot{q}$. Because $r_d$ is the goal, which is a constant, we have:
$$\dot{e} = \dot{r}_d - \dot{r} = -J\dot{q} \tag{18}$$
By substituting Eqn. 18 into the defined Lyapunov function, we have:
$$\dot{V} = e^T\dot{e} = -e^T J\dot{q} \tag{19}$$
Because the defined projection function is a saturation function (Eqn. 9), within its unsaturated region the projected dynamics enforce the constraint of Eqn. 7, so at the network equilibrium we have:
$$J\dot{q} = \dot{r}_d + k\,e = k\,e \tag{20}$$
Therefore, by substituting Eqn. 20 into Eqn. 19, we have $\dot{V} = -e^T(k\,e)$. The above equation is simplified as $\dot{V} = -k\,e^T e$.
From the definition of the norm, we have:
$$\dot{V} = -k\,\lVert e \rVert^2 \le 0 \tag{21}$$
Eqn. 21 can be used with LaSalle’s invariant set principle to prove that $e = 0$ is the only solution to $\dot{V} = 0$, and that the tracking error of the proposed control scheme globally converges to zero [42].
B. Probabilistic Completeness
We first introduce two lemmas and a definition to facilitate the analysis.
Remark 3.1:
The task space of a manipulator is a bounded connected open set $X \subset \mathbb{R}^n$. Because of the existence of obstacles, there is an obstacle space $X_{obs} \subset X$, and the subspace that is reachable by the manipulator becomes $X_{reach} \subseteq X_{free}$, where $X_{free}$ is the free space: $X_{free} = X \setminus X_{obs}$.
Lemma 3.1:
For every point $x_d$ in the reachable space $X_{reach}$, there exists a space $B_d$ for $x_d$ such that, from all points $x \in B_d$, a valid path to $x_d$ can be found by the control scheme described in Eqn. 7.
Proof:
Because the point $x_d$ is reachable by the manipulator, for any point $x \in X_{reach}$ there exists at least one valid path $\sigma: [0, 1] \rightarrow X_{reach}$ with $\sigma(0) = x$ and $\sigma(1) = x_d$. However, because the control scheme defined in Eqn. 7 is consistently attracted by the target $r_d$, only the subset of such paths that corresponds to the non-concave configuration subspace is directly reachable by the control scheme in Eqn. 7. If we define $B_d$ as the set of points from which $x_d$ is directly reachable in this manner, then $B_d \neq \emptyset$ because $x_d \in B_d$, and Lemma 3.1 holds. □
Intuitively, $B_d$ can be imagined as a basin around $x_d$. We know that, for most manipulators, if there were no joint limits, self-collisions, or obstacles, $B_d = X_{reach}$ would hold for any $x_d \in X_{reach}$. These constraints break $X_{reach}$ into basins, which leads to Lemma 3.2.
Lemma 3.2:
For a given manipulator and a given environment, $X_{reach} = \bigcup_d B_d$.
Proof:
Lemma 3.1 proves that $B_d$ exists for every $x_d \in X_{reach}$, and each $B_d$ is a subset of $X_{reach}$ by construction, so $\bigcup_d B_d \subseteq X_{reach}$ holds. Conversely, let $x \in X_{reach}$; because $x$ is located on at least one valid path, as proved in Lemma 3.1, we have $x \in B_d$ for the corresponding target; therefore $X_{reach} \subseteq \bigcup_d B_d$, and therefore $X_{reach} = \bigcup_d B_d$. □
We are now ready to prove that the proposed method achieves probabilistic planning completeness.
Definition 3.1:
For a given target $x_d \in X_{reach}$: if the set of valid paths equals $\emptyset$, the non-existence is reported in finite time; if the set of valid paths is non-empty, the probability of finding a valid path converges to 1 as the exploration time goes to infinity. A planner with these two properties is probabilistically complete.
Theorem 3.1:
For the proposed method, for any $x_d \in X_{reach}$ with a non-empty set of valid paths, a valid path is found in finite time with probability converging to 1.
Proof:
Lemmas 3.1 and 3.2 show that $B_d$ can be reached by the random exploration within finite time, and Theorem 3.1 is immediate. □
In real applications, as described in Algorithm 1, a maximum allowed exploration time needs to be set to ensure the algorithm does not take arbitrarily long to explore. After exceeding this time, the algorithm reports the non-existence of a valid trajectory.
C. Efficiency
If we denote a valid path by its control points as $\sigma = \{x_1, x_2, \ldots, x_k\}$, where $x_i \in B_i$ and $x_k = x_d$, then we have the following theorem.
Theorem 3.2:
For the given path, the probability of finding the path within $n$ samplings grows exponentially with $n$, as $P(L) \ge 1 - e^{-\frac{np}{2}\left(1 - \frac{k}{np}\right)^2}$ for $n > k/p$, where $p = \min_i p_i$ and $p_i$ denotes the probability of sampling a point in $B_i$.
Proof:
Because uniform random sampling is adopted in the proposed method, given the volume of $B_i$, the probability of sampling a point in $B_i$ equals the ratio of the volume of $B_i$ to the volume of the reachable space. However, precisely calculating these volumes is more computationally expensive than the motion planning itself, so here we estimate a bound on the efficiency instead.
If we approximate each $p_i$ with $p$, then the samplings of the basins are independent and identically distributed, following a Bernoulli distribution with parameter $p$. Let $L$ denote the event of achieving the given path within $n$ samplings, and let $X$ denote the number of samples that land in the basins. With the approximation of $p$, $X$ follows a binomial distribution with parameters $n$ and $p$, as:
$$P(X = j) = \binom{n}{j}\,p^j\,(1-p)^{n-j} \tag{22a}$$
$$P(\bar{L}) \le P(X < k) = \sum_{j=0}^{k-1}\binom{n}{j}\,p^j\,(1-p)^{n-j} \tag{22b}$$
$$E[X] = np \tag{22c}$$
Therefore, the Chernoff bound $P\big(X < (1-\delta)\,np\big) \le e^{-np\,\delta^2/2}$, $0 < \delta < 1$, holds [45]. Choosing $\delta = 1 - \frac{k}{np}$ (valid for $n > k/p$), we also have:
$$P(\bar{L}) \le P(X < k) \le e^{-\frac{np}{2}\left(1 - \frac{k}{np}\right)^2} \tag{23a}$$
Because the probability of missing the path is less than $e^{-\frac{np}{2}\left(1 - \frac{k}{np}\right)^2}$, the probability of finding the path is bigger than $1 - e^{-\frac{np}{2}\left(1 - \frac{k}{np}\right)^2}$, which converges to 1 exponentially in $n$. □
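As a numerical sanity check of this bound (our own illustration; the values of k, p, and the sample counts are arbitrary), the following Monte-Carlo sketch compares the empirical probability of collecting at least k basin hits within n uniform samples against the reconstructed Chernoff bound:

```python
import numpy as np

# A path with k = 5 basins, each hit with probability p = 0.05 per sample;
# success = at least k hits within n samples (the binomial approximation
# used in the proof above).
rng = np.random.default_rng(0)
k, p, trials = 5, 0.05, 20000
for n in (100, 200, 400):
    hits = rng.binomial(n, p, size=trials)          # hit counts over many runs
    empirical = np.mean(hits >= k)                  # empirical P(L)
    bound = 1.0 - np.exp(-0.5 * n * p * (1.0 - k / (n * p)) ** 2)  # Eqn. 23a
    print(f"n={n:4d}  empirical P(L)={empirical:.3f}  bound>={bound:.3f}")
```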
IV. ILLUSTRATIVE EXAMPLES AND DISCUSSION
Table I compares RNN control schemes for redundant manipulator motion planning. To the best of our knowledge, the proposed method is the first RNN control scheme that achieves planning completeness. Among all these algorithms, three were selected for comparison with the proposed method, based on similarity. The first algorithm [25] solves the redundant manipulator obstacle avoidance problem in the velocity space; for conciseness, we refer to this work as “Method1” in this paper. The second algorithm [26] addresses the obstacle avoidance problem in the acceleration space, and we refer to it as “Method2”. The third method [44] is not capable of avoiding obstacles, but it demonstrated superior control precision and outstanding robustness in obstacle-free environments. In order to provide a benchmark for the proposed method, we included that work in the comparison and refer to it as “Method3”.
TABLE I:
Comparisons among Recurrent Neural Network based Control Schemes for Redundant Manipulators†.
| | Global Convergence | Theoretical Error | Free of Error Accumulation | Free of Matrix Inversion | Physical Limits Avoidance | Smoothness Optimization | Obstacle Avoidance | Planning Completeness |
|---|---|---|---|---|---|---|---|---|
| Model in [43] | Yes | Nonzero | No | No | No | No | No | No |
| Model in [44] | Yes | Zero | Yes | Yes | Yes | No | No | No |
| Model in [36] | Yes | Zero | Yes | Yes | Yes | Yes | No | No |
| Model in [35] | Yes | Zero | Yes | Yes | Yes | Yes | No | No |
| Model in [25] | Yes | Zero | No | Yes | Yes | No | Yes | No |
| Model in [26] | Yes | Zero | No | Yes | Yes | Yes | Yes | No |
| Proposed | Yes | Zero | Yes | Yes | Yes | Yes | Yes | Yes‡ |
† Algorithms closely related to the proposed method are compared here.
‡ Probabilistic completeness.
The three representative algorithms were compared with the proposed method in simulations based on the Mitsubishi PA10–7C. The PA10 redundant manipulator was chosen because it has 7 DoF and its mechanics are similar to those of a human arm [46].
In the simulation experiments, the parameters of the proposed method were empirically chosen as: k = 100, c1 = c2 = 0.5, A = 0.9 and w = 0.95. For the other algorithms, we followed the references to set up the parameters [25], [26], [44].
A. Control Precision and Robustness against Noise
Because “Method3” is not capable of obstacle avoidance, the four algorithms were compared in obstacle-free environments. Obstacle-free environments also validate that the proposed method balances the random exploration and the heuristic search, and succeeds in planning with a single heuristic search attempt.
In the simulation experiments, the manipulator starts from a known position and tracks a circular trajectory. White Gaussian process noise was injected in order to study the robustness against noise, at three levels of standard deviation: σ = 0.01, σ = 0.05, and σ = 0.25. End-effector trajectory errors are compared in Fig. 3. From the results we can see that, as the noise increases, “Method3” and the proposed method show stronger robustness against noise, due to their immunity to error accumulation. Although the proposed method and “Method3” have similar performance in the figure, Table II shows that the proposed method performs better.
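For reference, a minimal sketch of how such process noise can be injected in simulation; the assumption that the noise enters the commanded joint velocities is ours:

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_command(dq_cmd, sigma):
    """Add white Gaussian process noise to a joint-velocity command."""
    return dq_cmd + rng.normal(0.0, sigma, size=dq_cmd.shape)

dq = np.full(7, 0.1)  # a 7-DoF PA10 joint-velocity command
for sigma in (0.01, 0.05, 0.25):  # the three noise levels of the experiments
    print(sigma, np.round(noisy_command(dq, sigma), 3))
```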
Fig. 3:

Tracking Precision Comparison. Different levels of process noise have been injected to verify the robustness of the RNN control schemes.
TABLE II:
RMS Position Tracking Error With Respect to Various Noise Level.
B. Planning Completeness and Efficiency
Because “Method3” is not capable of obstacle avoidance, only “Method1” and “Method2” were compared with the proposed method.
Two representative scenarios in redundant manipulator planning were adopted in the experiments. Scenario 1 is an environment with a plane-like obstacle, and Scenario 2 has a window-shaped obstacle. In both scenarios, the three algorithms command the PA10 manipulator to reach a target behind the obstacles. The manipulator starts from randomly chosen, known initial positions because both “Method1” and “Method2” suffer from the error accumulation problem. Table III compares the planning success rates, defined as $v_s / v_e$, where $v_s$ denotes the total number of successes and $v_e = 50$ denotes the total number of experiments.
TABLE III:
Planning Success Rate Comparison.
| | Method1 | Method2 | Proposed |
|---|---|---|---|
| Scenario 1 | 92% | 92% | 100% |
| Scenario 2 | 72% | 74% | 100% |
Example planning results in Scenarios 1 and 2 are shown in Fig. 4(a)~(c) and (d)~(f), respectively. In the figure, the semi-transparent blue plane denotes the obstacle; the thick colored lines indicate the manipulator initial configurations and the thin colored lines denote the trajectories. The goal is denoted by a red sphere, and the curved red line segments denote the end-effector trajectories. From Fig. 4 and Table III it is clear that the proposed method addresses the local minimum problem and achieves planning completeness.
Fig. 4:

Example Planning Results in Environments with a Plane-shaped Obstacle or a Window-shaped Obstacle. The semi-transparent planes denote the obstacles, the red globe denotes the target, and the colored lines indicate the manipulator trajectories.
The planning efficiency of the proposed method was demonstrated by comparing it with the control scheme described in [47], which is not capable of learning from exploration. Table IV compares the efficiency and shows how many random explorations and heuristic searches were conducted during planning. Fig. 5 shows how the RRNNSTM neuron activity changes during exploration and adapts to different environments. From Table IV and Fig. 5 it is clear that the proposed method optimizes the exploration per environment and dramatically increases the planning efficiency.
TABLE IV:
Planning Efficiency Comparison. The averaged total numbers of explorations in 50 experiments are compared in the table.
| | Without Learning | | | Proposed | | |
|---|---|---|---|---|---|---|
| | RE | HS | Total | RE | HS | Total |
| Scenario 1 | 20.80 | 21.46 | 42.26 | 19.88 | 2.92 | 22.80 |
| Scenario 2 | 56.66 | 56.60 | 113.26 | 52.76 | 5.42 | 58.18 |
RE: Random Exploration, HS: Heuristic Search, Total=RE+HS
Fig. 5:

Example RRNNSTM Neuron Activity Changes in Exploration. The neural network learns from exploration. In simpler environments, it tends to perform the heuristic search; in complex environments, it leans toward random exploration.
C. Real World Experiment
The proposed method was applied to a Raven II surgical robot. Raven II surgical robots are popular in the robotic surgery community and are deployed at 18 sites worldwide. The challenges of controlling the Raven II, as well as its kinematic model, can be found in [8]. The experiment simulates popular endoscopic robotic surgeries, in which the robot reaches a target through a small orifice. In contrast to existing teleoperated robotic surgeries, the robot starts from an unknown position and autonomously reaches the goal without collision. Fig. 6a shows the Raven II surgical robot, and Fig. 6b explains the experimental setup. A box with a 4 cm square opening represents the surgical environmental obstacle, and the target is denoted by a pin with a red head. The robot trajectory was recorded by a stereo camera. Fig. 6c visualizes the robot trajectory, in which the thick colored lines indicate the manipulator initial configuration and the thin lines indicate the end-effector trajectory. The obstacle position is indicated by the semi-transparent blue plane, and the goal is denoted by a red dot. From the experimental result, it is clear that the proposed method succeeds in such a complex robotic surgery task.
Fig. 6:

Applying the Proposed Method to Simulate Autonomous Robotic Endoscopic Surgery. The Raven II robot automatically reaches the surgical target under the control of the proposed method. (a): the Raven II surgical robot. (b): the experimental setup and the initial manipulator position. The zoomed-in area shows the manipulator reaches the goal. (c): the surgical robot trajectory.
V. CONCLUSION
This paper presents a new RNN control scheme for complete and efficient redundant manipulator motion planning. The proposed scheme addresses the local minimum problem in RNN control schemes by introducing randomness. Moreover, it avoids the low efficiency of pure random exploration through online learning from exploration. Rigorous theoretical analyses establish the precision, the stability, the planning completeness, and the planning efficiency of the proposed method. Simulation experiments demonstrate that the proposed method has better precision and robustness than three other representative RNN schemes; more importantly, it achieves planning completeness and improves planning efficiency. Real-world experiments demonstrate the application of the proposed method in an endoscopic robotic surgery scenario. The proposed method shares the efficiency and robustness of other RNN control schemes while achieving higher control precision. Most importantly, it is the first RNN control scheme that achieves planning completeness in environments with obstacles, which shows the much broader applicability of Recurrent Neural Networks.
ACKNOWLEDGMENT
This work was partially supported by NSF grant IIS-1637444 and NIH grant 5R21EB016122-02.
Contributor Information
Yangming Li, College of Engineering Technology, Rochester Institute of Technology, Rochester, NY, USA 14623. The major part of this work was done when he was with the BioRobotics Lab at University of Washington, Seattle, WA, USA 98195.
Shuai Li, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong.
Blake Hannaford, Department of Electrical Engineering, University of Washington, Seattle, WA, USA 98195.
REFERENCES
- [1] Khatib O, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.
- [2] Craig JJ, Introduction to Robotics: Mechanics and Control. Pearson Prentice Hall, Upper Saddle River, 2005, vol. 3.
- [3] Bejczy AK, “Robot arm dynamics and control,” 1974.
- [4] Bernstein DS, Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton University Press, Princeton, 2005, vol. 41.
- [5] Pan Y, Yang C, Pan L, and Yu H, “Integral sliding mode control: Performance, modification and improvement,” IEEE Transactions on Industrial Informatics, pp. 1–1, 2017.
- [6] Li S and Li Y, “Nonlinearly activated neural network for solving time-varying complex Sylvester equation,” IEEE Transactions on Cybernetics, vol. 44, no. 8, pp. 1397–1407, 2014.
- [7] Liu Y-J and Tong S, “Barrier Lyapunov functions for Nussbaum gain adaptive control of full state constrained nonlinear systems,” Automatica, vol. 76, pp. 143–152, 2017.
- [8] Li Y and Hannaford B, “Gaussian process regression for sensorless grip force estimation of cable-driven elongated surgical instruments,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1312–1319, 2017.
- [9] Gao H, He W, Zhou C, and Sun C, “Neural network control of a two-link flexible robotic manipulator using assumed mode method,” IEEE Transactions on Industrial Informatics, pp. 1–1, 2018.
- [10] Li Z, Huang Z, He W, and Su C-Y, “Adaptive impedance control for an upper limb robotic exoskeleton using biological signals,” IEEE Transactions on Industrial Electronics, vol. 64, no. 2, pp. 1664–1674, 2017.
- [11] Yang C, Jiang Y, He W, Na J, Li Z, and Xu B, “Adaptive parameter estimation and control design for robot manipulators with finite-time convergence,” IEEE Transactions on Industrial Electronics, pp. 1–1, 2018.
- [12] Sun B, Zhu D, and Yang SX, “A bioinspired filtered backstepping tracking control of 7000-m manned submarine vehicle,” IEEE Transactions on Industrial Electronics, vol. 61, no. 7, pp. 3682–3693, 2014.
- [13] Xiao H, Li Z, Yang C, Zhang L, Yuan P, Ding L, and Wang T, “Robust stabilization of a wheeled mobile robot using model predictive control based on neurodynamics optimization,” IEEE Transactions on Industrial Electronics, vol. 64, no. 1, pp. 505–516, 2017.
- [14] Yang C, Jiang Y, Li Z, He W, and Su C-Y, “Neural control of bimanual robots with guaranteed global stability and motion precision,” IEEE Transactions on Industrial Informatics, 2017.
- [15] Pan Y, Wang H, Li X, and Yu H, “Adaptive command-filtered backstepping control of robot arms with compliant actuators,” IEEE Transactions on Control Systems Technology, vol. 26, no. 3, pp. 1149–1156, May 2018.
- [16] Wen G, Chen CLP, and Liu YJ, “Formation control with obstacle avoidance for a class of stochastic multiagent systems,” IEEE Transactions on Industrial Electronics, vol. 65, no. 7, pp. 5847–5855, July 2018.
- [17] He W, Ouyang Y, and Hong J, “Vibration control of a flexible robotic manipulator in the presence of input deadzone,” IEEE Transactions on Industrial Informatics, vol. 13, no. 1, pp. 48–59, 2017.
- [18] Li Y and Hannaford B, “Soft-obstacle avoidance for redundant manipulators with recurrent neural network,” in Intelligent Robots and Systems (IROS), 2018 IEEE/RSJ International Conference on. IEEE, 2018, pp. 1–6.
- [19] Tang L, Liu YJ, and Chen CLP, “Adaptive critic design for pure-feedback discrete-time MIMO systems preceded by unknown backlash-like hysteresis,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–10, 2018.
- [20] Jin L, Li S, Luo X, and Li Y, “Neural dynamics for cooperative control of redundant robot manipulators,” IEEE Transactions on Industrial Informatics, pp. 1–10, 2018.
- [21] Li X, Pan Y, Chen G, and Yu H, “Continuous tracking control for a compliant actuator with two-stage stiffness,” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 1, pp. 57–66, January 2018.
- [22] He W and Dong Y, “Adaptive fuzzy neural network control for a constrained robot using impedance learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 4, pp. 1174–1186, April 2018.
- [23] Jin L, Li S, La HM, and Luo X, “Manipulability optimization of redundant manipulators using dynamic neural networks,” IEEE Transactions on Industrial Electronics, 2017.
- [24] Li S, He J, Li Y, and Rafique MU, “Distributed recurrent neural networks for cooperative control of manipulators: A game-theoretic perspective,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 2, pp. 415–426, 2017.
- [25] Zhang Y and Wang J, “Obstacle avoidance for kinematically redundant manipulators using a dual neural network,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 34, no. 1, pp. 752–759, 2004.
- [26] Guo D and Zhang Y, “Acceleration-level inequality-based MAN scheme for obstacle avoidance of redundant robot manipulators,” IEEE Transactions on Industrial Electronics, vol. 61, no. 12, pp. 6903–6914, 2014.
- [27] Zhang Y and Jin L, Robot Manipulator Redundancy Resolution. John Wiley & Sons, 2017.
- [28] Lake BM, Salakhutdinov R, and Tenenbaum JB, “Human-level concept learning through probabilistic program induction,” Science, vol. 350, no. 6266, pp. 1332–1338, 2015.
- [29] Funahashi K-i and Nakamura Y, “Approximation of dynamical systems by continuous time recurrent neural networks,” Neural Networks, vol. 6, no. 6, pp. 801–806, 1993.
- [30] Nocedal J and Wright SJ, Sequential Quadratic Programming. Springer, 2006.
- [31] Zhang Z and Zhang Y, “Variable joint-velocity limits of redundant robot manipulators handled by quadratic programming,” IEEE/ASME Transactions on Mechatronics, vol. 18, no. 2, pp. 674–686, 2013.
- [32] Boyd S and Vandenberghe L, Convex Optimization. Cambridge University Press, 2004.
- [33] Zhang Y, Wang J, and Xu Y, “A dual neural network for bi-criteria kinematic control of redundant manipulators,” IEEE Transactions on Robotics and Automation, vol. 18, no. 6, pp. 923–931, 2002.
- [34] Xia Y and Wang J, “A dual neural network for kinematic control of redundant robot manipulators,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 31, no. 1, pp. 147–154, 2001.
- [35] Zhang Z, Zheng L, Yu J, Li Y, and Yu Z, “Three recurrent neural networks and three numerical methods for solving a repetitive motion planning scheme of redundant robot manipulators,” IEEE/ASME Transactions on Mechatronics, 2017.
- [36] Li Y, Li S, Miyasaka M, Lewis A, and Hannaford B, “Improving control precision and motion adaptiveness for surgical robot with recurrent neural network,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017, pp. 1–6.
- [37] Kolmogorov AN, Foundations of the Theory of Probability. 1950.
- [38] Bernardo JM and Smith AF, “Bayesian theory,” 2001.
- [39] Li Y, Li S, Song Q, Liu H, and Meng MQ-H, “Fast and robust data association using posterior based approximate joint compatibility test,” IEEE Transactions on Industrial Informatics, vol. 10, no. 1, pp. 331–339, 2014.
- [40] Grossberg S, “Nonlinear neural networks: Principles, mechanisms, and architectures,” Neural Networks, vol. 1, no. 1, pp. 17–61, 1988.
- [41] Li Y, Zhang J, and Li S, “STMVO: biologically inspired monocular visual odometry,” Neural Computing and Applications, vol. 29, no. 6, pp. 215–225, 2018.
- [42] Zhang Y, Wang J, and Xu Y, “A dual neural network for bi-criteria kinematic control of redundant manipulators,” IEEE Transactions on Robotics and Automation, vol. 18, no. 6, pp. 923–931, 2002.
- [43] Guo D and Zhang Y, “Li-function activated ZNN with finite-time convergence applied to redundant-manipulator kinematic control via time-varying Jacobian matrix pseudoinversion,” Applied Soft Computing, vol. 24, pp. 158–168, 2014.
- [44] Li S, Zhang Y, and Jin L, “Kinematic control of redundant manipulators using neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2016.
- [45] Motwani R and Raghavan P, Randomized Algorithms. Chapman & Hall/CRC, 2010.
- [46] Shen W, Gu J, and Ma Y, “3D kinematic simulation for PA10-7C robot arm based on VRML,” in Automation and Logistics, 2007 IEEE International Conference on. IEEE, 2007, pp. 614–619.
- [47] Li Y, Li S, and Hannaford B, “A novel recurrent neural network control scheme for improving redundant manipulator motion planning completeness,” in Robotics and Automation (ICRA), 2018 IEEE International Conference on. IEEE, 2018, p. 16.