PLOS One. 2025 Aug 13;20(8):e0329065. doi: 10.1371/journal.pone.0329065

Research on prostate brachytherapy puncture control strategy based on adaptive PID control with FBG sensors

Jianqiao Li 1, Xuesong Dai 2,3,*, Peng Li 2,4
Editor: Kamal Sharma,5
PMCID: PMC12349729  PMID: 40802812

Abstract

This paper aims to improve the accuracy of prostate brachytherapy robots by developing a needle deflection prediction model and a controlled puncture strategy. To address the difficulty of predicting needle deflection, a prediction model based on the corrective force is proposed. The puncture control strategy has two phases, both relying on the corrective force: preoperative needle trajectory planning and intraoperative approach adjustment. During intraoperative adjustment, a model that predicts and counteracts needle tip deflection ensures that the corrective force is applied accurately. An adaptive PID controller based on reinforcement learning (RL) regulates the corrective force to achieve precise puncture. A dedicated experimental platform was built to validate the puncture control strategy for prostate seed implantation. The average seed implantation error was 1.96 mm, with a standard error of 0.56 mm. The experiments show that the corrective force used in the strategy significantly reduces needle tip deflection and improves seed implantation precision.

Introduction

Prostate cancer has become the second most commonly diagnosed malignant cancer in men and the fifth leading cause of cancer death in men [1]. At present, prostate cancer is treated mainly by radical resection, external beam radiation therapy (EBRT) and low-dose-rate (LDR) prostate brachytherapy (BT), supplemented by other methods to achieve the best treatment outcome [2]. Radioactive seed implantation offers several advantages over radical resection: less trauma, faster recovery, fewer complications and lower hospital costs. Ennis [3] concluded from extensive clinical studies that BT achieves treatment outcomes similar to those of radical resection, and it has become the treatment most desired by patients. In BT, radiation sources are placed inside or near the target area. The continuous radiation emitted by the radioactive seeds disrupts the structure and activity of tumor cells, selectively eliminating them. Compared with conventional surgery such as radical resection, radioactive seed implantation carries lower risks of side effects and offers a better prognosis and quality of life. It has become the standard treatment for early-stage prostate cancer in the United States [4].

In clinical practice, BT is performed mainly by hand using a percutaneous puncture technique. As shown in Fig 1, a puncture needle is guided along a planned path to implant radioactive seeds such as iodine-125 or palladium-103 into the tumor target area. These small sources then emit continuous, short-range radiation into the tumor tissue. The dose distribution required in the target area is prescribed quantitatively and is non-uniform, because it depends on where the tumor lesion lies in each patient. The position of each seed is therefore chosen to meet the dose requirements of the target area.

Fig 1. Current clinical treatment methods of brachytherapy.


However, because of the steep radiation dose gradient [5–7], radioactive seeds must be placed with high precision. Clinically, it is often difficult to place the seeds at their planned positions. The obstacles include limited manual accuracy, unexpected organ motion, and anatomical structures such as bones and blood vessels. Misplaced seeds can leave parts of the tumor target area uncovered and increase the risk of tumor recurrence. Precise implantation of radioactive seeds at the target site has therefore become a critical challenge in seed implantation therapy.

In clinical practice, doctors rotate the needle intermittently to keep it advancing in a straight line. Rotating the needle changes the orientation of the bevel at the needle tip, so that the tip moves in the opposite direction. However, controlling the exact path of the needle tip by hand is difficult [4,5]. Robot-assisted BT has therefore attracted increasing attention in recent years [6].

Research groups pursue precise puncture in several ways: guiding needle rotation, developing needle–tissue interaction models, building needle deflection prediction models and improving needle steering control. The basic needle–tissue interactions, including stiffness force, friction and cutting force, have been studied [7–10]. Needle deflection prediction models are either mechanics-based [8–14] or kinematic [15–20]. Current kinematic models are only weakly related to the properties of the punctured tissue, which leads to discrepancies between the modeled and actual trajectories. Mechanics-based models take tissue properties into account and have produced improved deflection predictions [10,11,13,14,17,21]. These predictions supply the information needed for axial-rotation needle steering in model-based controllers [8,14–22].

During surgery, doctors adjust the needle tip position in one of two ways: 1) rotating the needle body; 2) applying a corrective force near the insertion point [22]. Rotating the needle body is simple, but it can cause the patient's tissue to adhere to the needle, leading to secondary injury. Method 2 requires the doctor to apply a corrective force perpendicular to the insertion direction to steer the needle. Getting the magnitude and timing of this force right demands great skill, and improper application can tear the patient's tissue.

In recent years, robotic puncture has become an active topic in medical robotics research. Lehmann performed robotic puncture experiments on silicone tissue and studied how the corrective force affects puncture precision [23–25]. During puncture, the corrective force is applied directly to the needle body along the direction of needle deflection to reduce the deflection. The advantage of this method is that the corrective force provides a continuous control input. The PID (proportional–integral–derivative) controller used in Ref. [25] computes its output from proportional, integral and derivative terms. It exhibits some lag, which reduces operational efficiency. Research on the second method, corrective-force guidance, is still at an early stage, and control models based on it need better accuracy and real-time performance.

Materials and methods

Needle deflection prediction model

As shown in Fig 2, the left end of the puncture needle is held by a fixed needle guide. Only the needle shaft from point A to point C therefore needs to be modeled, which reduces model complexity and improves computational efficiency. As the puncture depth increases, the length of the needle between A and C also increases, so the needle length is a variable. The corrective force acts on the needle at point B and the cutting force at point C. The needle deflection prediction model is built on the principle of minimum potential energy. Using the Rayleigh–Ritz approach [26], two contributions are converted into a system of linear equations: the energy stored in the needle and the tissue during puncture, and the work done on the needle–tissue system by external forces. This system is then solved for the needle deflection.

Fig 2. Schematic diagram of corrective force affecting needle deflection.


The total potential energy $\Pi(u)$ of the needle–tissue system is expressed as:

$$\Pi(u) = U(u) + V = U_s(u) + U_d(u) + V_l + V_t \tag{1}$$

where $U(u)$ is the energy stored in the system itself, and $V$ is the energy contributed by the lateral (corrective) force and the cutting reaction force. The individual terms are:
- $U_s(u)$, the elastic potential energy due to needle deflection;
- $U_d(u)$, the compression potential energy stored in the tissue as the needle is inserted;
- $V_l$, the energy term due to the work done by the corrective force $F_l$;
- $V_t$, the energy term due to the work done by the X-axis component $F_{cutting,x}$ of the cutting force.

(1) Elastic potential energy of the needle $U_s(u)$

In this paper, axial deformation of the needle is neglected and only its radial deflection is considered. The elastic potential energy $U_s(u)$ due to needle deflection can be expressed as:

$$U_s(u) = \int_0^{l}\frac{EI}{2}\left(\frac{\partial^2 u(z)}{\partial z^2}\right)^2 dz \tag{2}$$

where $E$ is the Young's modulus of the puncture needle, $I$ is the area moment of inertia of the needle cross-section, $l$ is the length of the needle, $u(z)$ is the needle deflection, and $z$ is the puncture depth coordinate.

(2) Tissue compression potential energy $U_d(u)$

When the needle is inserted into the tissue, it deflects and occupies space originally filled by tissue, so the surrounding tissue is compressed. The energy $U_d(u)$ stored in the compressed tissue is expressed as:

$$U_d(u) = \frac{K}{2}\int_{l-d_k}^{l}\left(u(z) - u_t(z)\right)^2 dz \tag{3}$$

where $K$ is the tissue stiffness coefficient, i.e. the stiffness per unit length of the virtual springs in Fig 3. $u_t(z)$ is the measured needle tip path, with $z$ ranging from 0 to $l$, and $d_k$ is the final puncture depth of the needle.

The compressed tissue can be represented by virtual springs that connect the needle shaft to the needle tip trajectory, as shown in Fig 3. According to Eq 3, the elongation of each spring is the difference between two quantities: the position $u(z)$ of the needle shaft deflected by the corrective force, and the needle tip path $u_t(z)$.

Fig 3. Schematic of needle deflection when corrective force is applied.


(3) Work done by the corrective force $V_l$

A corrective force application mechanism applies a force $F_l$ perpendicular to the needle axis at point B. The energy term $V_l$ due to the work done by this force can be expressed as:

$$V_l = -F_l\,u(c_2) \tag{4}$$

where $u(c_2)$ is the deflection of the needle at point B, located at $z = c_2$.

(4) Work done by $F_{cutting,x}$ (the X-axis component of the cutting force)

The X-axis component $F_{cutting,x}$ of the cutting force is the main cause of needle deflection during insertion. It arises from the asymmetric geometry of the beveled needle tip, which compresses the tissue asymmetrically as it passes through. As a result, the needle bends in the direction of the bevel. The orientation of the bevel therefore determines both the sign of $F_{cutting,x}$ and the direction in which the needle deflects.

The energy term due to the work done by $F_{cutting,x}$ is:

$$V_t = -F_{cutting,x}\,u(l) \tag{5}$$

where $u(l)$ is the needle tip deflection.

Note that $u(l)$ and $u_t(z)$ mean different things. The needle tip path $u_t(z)$ is made up of the tip deflections $u(l)$ at previous puncture steps, and therefore depends on the z-coordinate in the horizontal plane. Substituting Eqs 2–5 into Eq 1, the system energy can be expressed as:

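$$\Pi(u) = \begin{cases} \displaystyle\int_0^{l}\frac{EI}{2}\left(\frac{\partial^2 u(z)}{\partial z^2}\right)^2 dz + \frac{K}{2}\int_{l-d_k}^{l}\left(u(z) - u_t(z)\right)^2 dz - F_{cutting,x}\,u(l), & z < d_l \\[8pt] \displaystyle\int_0^{l}\frac{EI}{2}\left(\frac{\partial^2 u(z)}{\partial z^2}\right)^2 dz + \frac{K}{2}\int_{l-d_k}^{l}\left(u(z) - u_t(z)\right)^2 dz - F_l\,u(c_2) - F_{cutting,x}\,u(l), & z > d_l \end{cases} \tag{6}$$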

where $d_l$ is the puncture depth at which the corrective force is applied to the needle.

The Rayleigh–Ritz method is applied to solve the energy-based needle–tissue model above for the needle deflection. In this method, the solution of the governing differential equation is approximated by a finite weighted sum of shape functions:

$$u_n(z) = \sum_{i=1}^{n} q_i(z)\,g_i \tag{7}$$

where $q_i(z)$ is the $i$-th shape function and $g_i$ is the corresponding weighting coefficient. The shape functions $q_i(z)$ are given by [27]:

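$$q_i(z) = \frac{1}{k_i}\left\{\left[\sin\left(\frac{\beta_i z}{l}\right) - \sinh\left(\frac{\beta_i z}{l}\right)\right] - \gamma_i\left[\cos\left(\frac{\beta_i z}{l}\right) - \cosh\left(\frac{\beta_i z}{l}\right)\right]\right\} \tag{8}$$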

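where $\gamma_i$ and $k_i$ are given by:

$$\gamma_i = \frac{\sin\beta_i + \sinh\beta_i}{\cos\beta_i + \cosh\beta_i} \tag{9}$$
$$k_i = \left(\sin\beta_i - \sinh\beta_i\right) - \gamma_i\left(\cos\beta_i - \cosh\beta_i\right) \tag{10}$$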

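where $\beta_i$ are the eigenvalue constants of a clamped-free (cantilever) beam, i.e. the roots of $\cos\beta_i\cosh\beta_i = -1$. The first four are $\beta_1 = 1.875$, $\beta_2 = 4.694$, $\beta_3 = 7.855$ and $\beta_4 = 10.996$, and $\beta_i \approx \pi(i - 1/2)$ for $i > 4$.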
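The shape functions of Eqs 8–10 can be checked numerically. The following Python sketch is an illustration, not the authors' implementation, and the helper names `cantilever_betas` and `shape_function` are hypothetical. It finds the roots with Newton's method on the equivalent, better-scaled form $\cos\beta + 1/\cosh\beta = 0$, starting from $\pi(i - 1/2)$. The usage check then confirms that each $q_i$ vanishes at the clamped end and, with $k_i$ from Eq 10, equals 1 at $z = l$.

```python
import math


def cantilever_betas(n, iters=50):
    """First n roots of cos(b) * cosh(b) = -1 (clamped-free beam).

    Newton's method is applied to cos(b) + 1/cosh(b) = 0, which avoids the
    very large cosh(b) values, starting from the asymptotic guess pi*(i - 1/2).
    """
    roots = []
    for i in range(1, n + 1):
        b = math.pi * (i - 0.5)
        for _ in range(iters):
            f = math.cos(b) + 1.0 / math.cosh(b)
            df = -math.sin(b) - math.tanh(b) / math.cosh(b)  # d/db of 1/cosh(b) is -tanh/cosh
            b -= f / df
        roots.append(b)
    return roots


def shape_function(beta, l):
    """Return q_i(z) of Eq 8 for one beta_i and needle length l."""
    g = (math.sin(beta) + math.sinh(beta)) / (math.cos(beta) + math.cosh(beta))      # gamma_i, Eq 9
    k = (math.sin(beta) - math.sinh(beta)) - g * (math.cos(beta) - math.cosh(beta))  # k_i, Eq 10

    def q(z):
        x = beta * z / l
        return ((math.sin(x) - math.sinh(x)) - g * (math.cos(x) - math.cosh(x))) / k

    return q


betas = cantilever_betas(5)
q1 = shape_function(betas[0], 50.0)  # e.g. a 50 mm free length (illustrative value)
print([round(b, 4) for b in betas], q1(0.0), q1(50.0))
```

Normalizing by $k_i$ makes every shape function equal 1 at the tip, which is what lets the cutting-force term in the Ritz equations collapse to a vector of ones.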

Substituting Eq 7, with the shape functions of Eq 8, into Eq 6 gives:

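$$\Pi(u_n) = \frac{EI}{2}\int_0^{l}\left(\sum_{i=1}^{n} q_i^{(2)}(z)\,g_i\right)^2 dz + \frac{K}{2}\int_{l-d_k}^{l}\left(\sum_{i=1}^{n} q_i(z)\,g_i - u_t(z)\right)^2 dz - F_l\sum_{i=1}^{n} q_i(c_2)\,g_i - F_{cutting,x}\sum_{i=1}^{n} q_i(l)\,g_i \tag{11}$$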

where $q_i^{(2)}(z)$ is the second derivative of $q_i(z)$ with respect to $z$.

$\Pi(u_n)$ attains its minimum when $\partial\Pi(u_n)/\partial g_j = 0$ for $j = 1, \dots, n$. This condition yields a system of linear equations in the weighting coefficients $g_i$, which can then be solved.

Take the partial derivative of $\Pi(u_n)$ with respect to $g_j$. From Eqs 8 and 10, $q_i(l) = q_j(l) = 1$ for all $i$ and $j$, and $q_j(c_2)$ can be evaluated for every $j$. This gives:

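$$\frac{\partial\Pi(u_n)}{\partial g_j} = EI\int_0^{l}\left(\sum_{i=1}^{n} q_i^{(2)}(z)\,g_i\right) q_j^{(2)}(z)\,dz + K\int_{l-d_k}^{l}\left(\sum_{i=1}^{n} q_i(z)\,g_i - u_t(z)\right) q_j(z)\,dz - F_l\,q_j(c_2) - F_{cutting,x} = 0 \tag{12}$$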

Collecting the terms in $g_i$ in Eq 12 gives the simplified form:

$$\sum_{i=1}^{n}\varphi_{ji}\,g_i - \omega_j - \gamma_j - F_{cutting,x} = 0 \tag{13}$$

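where

$$\varphi_{ji} = EI\int_0^{l} q_i^{(2)}(z)\,q_j^{(2)}(z)\,dz + K\int_{l-d_k}^{l} q_i(z)\,q_j(z)\,dz,\qquad \omega_j = K\int_{l-d_k}^{l} u_t(z)\,q_j(z)\,dz,\qquad \gamma_j = F_l\,q_j(c_2).$$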

Writing Eq 13 for $j = 1, \dots, n$ gives the matrix form:

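$$\underbrace{\begin{bmatrix}\varphi_{11} & \cdots & \varphi_{1n}\\ \vdots & \ddots & \vdots\\ \varphi_{n1} & \cdots & \varphi_{nn}\end{bmatrix}}_{\Phi}\underbrace{\begin{bmatrix} g_1\\ \vdots\\ g_n\end{bmatrix}}_{g} = F_l\underbrace{\begin{bmatrix} q_1(c_2)\\ \vdots\\ q_n(c_2)\end{bmatrix}}_{q(c_2)} + F_{cutting,x}\,\mathbf{1}_{n\times 1} + \underbrace{\begin{bmatrix}\omega_1\\ \vdots\\ \omega_n\end{bmatrix}}_{\Omega} \tag{14}$$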

where $\mathbf{1}_{n\times 1}$ is an $n \times 1$ column vector of ones.

Solving Eq 14 for the unknown vector $g$ gives:

$$g = \Phi^{-1}\left(F_l\,q(c_2) + F_{cutting,x}\,\mathbf{1}_{n\times 1} + \Omega\right) \tag{15}$$

Substituting Eq 15 into Eq 7 gives the deflection function $u_n(z)$ of the needle.
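As a rough sketch of Eqs 12–15, the Python code below assembles $\Phi$, $q(c_2)$ and $\Omega$ by numerical quadrature and solves for $g$. It is a hypothetical illustration under stated assumptions, not the authors' code. The following are all choices made here: composite Simpson integration, a small Gaussian-elimination solver, $n = 4$ shape functions, and the function names. With $K = 0$ the model reduces to a bare cantilever, so the tip deflection under a tip load should approach the textbook value $F l^3/(3EI)$, which the usage line exploits.

```python
import math


def cantilever_betas(n):
    roots = []
    for i in range(1, n + 1):
        b = math.pi * (i - 0.5)
        for _ in range(50):  # Newton on cos(b) + 1/cosh(b) = 0
            b -= (math.cos(b) + 1.0 / math.cosh(b)) / (-math.sin(b) - math.tanh(b) / math.cosh(b))
        roots.append(b)
    return roots


def make_shape(b, l):
    """q_i(z) and q_i''(z) from Eqs 8-10, normalized so that q_i(l) = 1."""
    g = (math.sin(b) + math.sinh(b)) / (math.cos(b) + math.cosh(b))
    k = (math.sin(b) - math.sinh(b)) - g * (math.cos(b) - math.cosh(b))

    def q(z):
        x = b * z / l
        return ((math.sin(x) - math.sinh(x)) - g * (math.cos(x) - math.cosh(x))) / k

    def q2(z):
        x = b * z / l
        return (b / l) ** 2 * ((-math.sin(x) - math.sinh(x)) + g * (math.cos(x) + math.cosh(x))) / k

    return q, q2


def simpson(f, a, b, m=400):
    """Composite Simpson rule on [a, b] with an even number m of sub-intervals."""
    if b <= a:
        return 0.0
    h = (b - a) / m
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return s * h / 3.0


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ritz_deflection(EI, K, l, d_k, c2, F_l, F_cut, u_t, n=4):
    """Assemble Eq 14, solve Eq 15, and return u_n(z) of Eq 7 as a function."""
    shapes = [make_shape(b, l) for b in cantilever_betas(n)]
    a = l - d_k  # tissue acts on the inserted length [l - d_k, l], as in Eq 3
    Phi = [[EI * simpson(lambda z: qi2(z) * qj2(z), 0.0, l)
            + K * simpson(lambda z: qi(z) * qj(z), a, l)
            for qi, qi2 in shapes] for qj, qj2 in shapes]  # row j, column i: phi_ji
    rhs = [F_l * qj(c2) + F_cut + K * simpson(lambda z: u_t(z) * qj(z), a, l)
           for qj, _ in shapes]  # F_cut multiplies q_j(l) = 1
    g = solve(Phi, rhs)
    return lambda z: sum(gi * q(z) for gi, (q, _) in zip(g, shapes))


u = ritz_deflection(EI=1.0, K=0.0, l=1.0, d_k=0.5, c2=0.3, F_l=0.0, F_cut=1.0, u_t=lambda z: 0.0)
print(u(1.0), 1.0 / 3.0)  # tip deflection vs. the cantilever value F l^3 / (3 EI)
```

Increasing $K$ adds a positive-definite term to $\Phi$, so for the same cutting force the predicted tip deflection can only shrink.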

Puncture control strategy based on corrective force

This section builds a puncture control strategy on the needle deflection prediction model established in the previous section. The needle tip can deviate from its expected position for two reasons: the prediction model differs somewhat from the actual system, and the puncture is easily disturbed by external factors. Given the complex model characteristics of the system, the usual model-based robust control algorithms are difficult to apply. An ordinary proportional–integral–derivative (PID) controller, on the other hand, has poor robustness and cannot meet the requirements of this system. To overcome model uncertainty and external disturbances, this paper therefore builds an adaptive PID control system based on reinforcement learning (RL-APID). It adjusts the corrective force in real time so that the needle tip reaches the target point.

Preoperative needle trajectory planning

The puncture control strategy consists of two phases, as shown in Fig 4.

Fig 4. Schematic diagram of the overall puncture control strategy.


Phase 1 is the preoperative needle tip trajectory planning stage. The needle deflection prediction model built in the previous section is used to obtain the optimal needle tip path to the target point. It also yields the corresponding puncture parameters: the corrective force $F_l$ and the puncture depth $d_k$.

Phase 2 is the intraoperative puncture control stage. After the corrective force $F_l$ is applied, the intraoperative needle tip trajectory deviates from the preoperatively planned trajectory as the needle is inserted. To monitor and correct these errors accurately, Fiber Bragg Grating (FBG) sensors are embedded in the needle (Fig 5), so that the needle tip position is sensed precisely in real time. FBG sensors are mainly used to measure force, pressure and shape. The reflected wavelength shifts when the fiber is strained by a mechanical load or by a temperature change. The FBG sensors used in this paper are of type OSC1100−05. The data collected by the FBG demodulator are processed in the accompanying Enlight software, which converts them into needle position information. An adaptive PID control strategy based on reinforcement learning then adjusts the magnitude of the corrective force in real time to minimize the puncture error.
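In the setup described above, the conversion from FBG wavelength to needle position is done by the Enlight software. The Python sketch below shows one common textbook route from wavelength shift to shaft deflection; it is not the Enlight algorithm. It rests on several stated assumptions:
- the gratings are temperature-compensated;
- the effective photo-elastic coefficient is $p_e \approx 0.22$, a typical value for silica fiber;
- each fiber sits at a known offset $r$ from the needle's neutral axis;
- curvature is known at sample points along the shaft;
- deflections are small, so $u'' \approx \kappa$, and the shaft is clamped at the first sample.

All function names are hypothetical.

```python
P_E = 0.22  # assumed effective photo-elastic coefficient (typical for silica fiber)


def strain_from_shift(lambda0_nm, lambda_nm, p_e=P_E):
    """Axial strain from a Bragg wavelength shift, assuming temperature compensation:
    (lambda - lambda0) / lambda0 = (1 - p_e) * strain."""
    return (lambda_nm - lambda0_nm) / (lambda0_nm * (1.0 - p_e))


def curvature_from_strain(strain, r_mm):
    """Bending curvature (1/mm) of a fiber at offset r_mm from the neutral axis: strain = kappa * r."""
    return strain / r_mm


def deflection_from_curvature(z_mm, kappa):
    """Integrate curvature twice with the trapezoid rule, assuming zero slope and
    zero deflection at z_mm[0] (small-deflection approximation u'' = kappa)."""
    slope, defl = [0.0], [0.0]
    for k in range(1, len(z_mm)):
        h = z_mm[k] - z_mm[k - 1]
        slope.append(slope[-1] + 0.5 * h * (kappa[k] + kappa[k - 1]))
        defl.append(defl[-1] + 0.5 * h * (slope[k] + slope[k - 1]))
    return defl


# Example: a uniform curvature of 1e-4 1/mm along 100 mm gives a 0.5 mm tip deflection.
z = [10.0 * k for k in range(11)]
print(deflection_from_curvature(z, [1e-4] * len(z))[-1])
```

A real needle carries only a few gratings, so in practice the curvature between them has to be interpolated; the trapezoid rule above simply treats it as piecewise linear.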

Fig 5. Structure diagram of FBG embedded needle.


Before the operation, the desired needle body line segment $\tau$ is first specified, as shown in Fig 6.

Fig 6. Schematic diagram of preoperative needle tip trajectory planning.


The puncture parameters of the optimal needle tip trajectory are obtained from the needle deflection prediction model by minimizing a cost function based on the area $A_e$. $A_e$ is the area enclosed between the expected needle body line segment and the needle body line segment computed by the model from $d_s$ to $d_f$. The space of possible corrective force distribution functions $f_l$ is in general infinite-dimensional, so a simplified force distribution function is chosen to reduce the search space:

$$f_l(d, d_{l,1}, d_{l,2}) = F_{l,c}\left[k(d - d_{l,1}) - k(d - d_{l,2})\right],\qquad d \in (0, d_f) \tag{16}$$

where $k(\cdot)$ is the unit step function.

$F_{l,c}$ is the magnitude of the corrective force, and $d_{l,1}$ and $d_{l,2}$ are the depths at which its application starts and ends. As Eq 16 shows, $f_l$ is a function of $d$, $d_{l,1}$ and $d_{l,2}$. The cost function $R(F_{l,c}, d_{l,1})$ built on Eq 16 is the sum of squared residuals between the desired needle body segment $\tau$ and the needle shape at the final puncture depth:

$$R(F_{l,c}, d_{l,1}) = \sum_{z_\tau \in (d_s, d_f)}\left(u(d_f, z_\tau, F_{l,c}, d_{l,1}) - \tau\right)^2 \tag{17}$$

where $u(d_f, z_\tau, F_{l,c}, d_{l,1})$ is the needle deflection at the final puncture depth, simulated with the needle deflection prediction model.

The inputs of the cost function are the constant corrective force $F_{l,c}$ and the depth $d_{l,1}$ at which it is applied; the optimal parameters are those that minimize $R$. Experiments determined that the corrective force $F_{l,c}$ stops being applied at a depth of 60 mm. The pattern search method is used to find the values of $F_{l,c}$ and $d_{l,1}$ that minimize $R(F_{l,c}, d_{l,1})$. The resulting optimal needle tip trajectory serves as the reference trajectory for the intraoperative puncture control strategy. In a simulation of the preoperative strategy, the origin is taken as the starting point and the expected needle body segment is a straight segment (curvature 0). The computed optimal path requires a corrective force of 2.8 N applied at a depth of 19 mm, as shown in Fig 7.
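The corrective force profile of Eq 16 and a pattern search of the kind named above can be sketched in Python as follows. The search is a generic compass (coordinate) pattern search with step halving, which is one standard variant; the paper does not state which variant or settings it used. The cost in the usage line is a hypothetical quadratic placeholder standing in for $R(F_{l,c}, d_{l,1})$ of Eq 17, because evaluating the real cost requires the full deflection model. All names are illustrative.

```python
def unit_step(x):
    """Unit step k(x) of Eq 16, taken here as 1 for x >= 0."""
    return 1.0 if x >= 0.0 else 0.0


def force_profile(d, F_lc, d_l1, d_l2):
    """Eq 16: constant force F_lc applied for d_l1 <= d < d_l2, zero elsewhere."""
    return F_lc * (unit_step(d - d_l1) - unit_step(d - d_l2))


def pattern_search(cost, x0, step, bounds, tol=1e-6, max_iter=10000):
    """Compass pattern search: try +/- step along each coordinate, accept the first
    improvement, and halve all steps when no move improves the cost."""
    x, fx, step = list(x0), cost(x0), list(step)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            lo, hi = bounds[i]
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] = min(max(x[i] + sign * step[i], lo), hi)  # stay inside the box
                f_trial = cost(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step = [s / 2.0 for s in step]
            if max(step) < tol:
                break
    return x, fx


# Hypothetical placeholder cost over (F_lc in N, d_l1 in mm); not the paper's R.
toy_R = lambda p: (p[0] - 1.5) ** 2 + 0.01 * (p[1] - 30.0) ** 2
x_opt, r_min = pattern_search(toy_R, [0.0, 0.0], [1.0, 10.0], [(0.0, 4.0), (0.0, 60.0)])
print(x_opt, r_min)
```

Pattern search needs only cost evaluations, not gradients. That suits a cost like Eq 17, where each evaluation runs the deflection model.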

Fig 7. Optimal tip path and corresponding correction force.


Intraoperative puncture control strategy

1) Theoretical analysis of online adjustment

During puncture, the corrective force is adjusted according to the error between the preplanned needle tip trajectory and the measured needle tip deflection. In principle, the corrective force predicted in phase 1 (the preoperative planning stage) could be used to control the puncture. However, the prediction accuracy of the needle deflection model is insufficient, because of model errors and possible changes in the physical system. The needle tip deflection measured by the FBG sensors must therefore be fed back and the corrective force recalculated online.

To predict the corrective force that will move the needle tip from its current position to the target point, a reverse needle deflection prediction model is needed. This model computes the corrective force from the required needle deflection.

Reverse needle deflection prediction model: let $\delta_e = u_e(d + \Delta d)$ be the expected tip deflection, to be achieved by applying an as-yet-undetermined corrective force $F_l^*$. Here $\Delta d$ is the feed distance of the needle while the corrective force is applied. The needle tip trajectory $u_t(d)$ up to the current depth $d$ is assumed known from measurement.

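To solve for the unknown corrective force $F_l^*$ that brings the needle tip to the desired deflection, Eq 14 is rearranged in three steps. First, augment it with the vector $\Lambda = [\mathbf{0}_{1\times n},\ \delta_e]^T$. Second, move the term $F_l\,q(c_2)$ to the left-hand side. Third, absorb its coefficients $q_j(c_2)$ into an augmented matrix $\Phi_\Psi$. The additional row requires the tip deflection $u_n(l) = \sum_i q_i(l)\,g_i = \sum_i g_i$ to equal $\delta_e$: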

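$$\underbrace{\begin{bmatrix}\Phi & -q(c_2)\\ \mathbf{1}_{n\times 1}^T & 0\end{bmatrix}}_{\Phi_\Psi}\underbrace{\begin{bmatrix} g\\ F_l^*\end{bmatrix}}_{g_\Psi} = \begin{bmatrix}F_{cutting,x}\,\mathbf{1}_{n\times 1}\\ 0\end{bmatrix} + \underbrace{\begin{bmatrix}\Omega\\ 0\end{bmatrix}}_{\Omega_\Psi} + \underbrace{\begin{bmatrix}\mathbf{0}_{n\times 1}\\ \delta_e\end{bmatrix}}_{\Lambda} \tag{18}$$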

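Solving Eq 18 gives:

$$g_\Psi = \Phi_\Psi^{-1}\left(\begin{bmatrix}F_{cutting,x}\,\mathbf{1}_{n\times 1}\\ 0\end{bmatrix} + \Omega_\Psi + \Lambda\right) \tag{19}$$

and the required corrective force is the last element of $g_\Psi$, i.e. $F_l^* = g^{\Psi}_{n+1}$.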

With these equations, the corrective force required to reach the desired needle deflection $\delta_e$ can be predicted from the parameters $K$ and $F_{cutting,x}$ and the measured needle tip track $u_t(d)$. The advantage of this calculation is that it needs no time-consuming iterative search, which is essential given the per-sample time budget of real-time trajectory replanning. During puncture, the corrective force is removed when either of the following criteria is met:
1) the maximum corrective force $F_{l,max}$ is exceeded (4 N in this paper);
2) the insertion depth exceeds the depth limit $d_{l,2}$ for applying the corrective force ($d > d_{l,2}$, with a maximum $d_{l,2}$ of 60 mm).

These are extreme cases that may occur when operating on the model. If either occurs, the reference force of the corrective force needle guide is set to 0. This completes the reverse needle deflection prediction model.
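The reverse model of Eqs 18–19 and the removal criteria above amount to one augmented linear solve followed by two checks. The Python sketch below illustrates this under assumptions: $\Phi$, $q(c_2)$ and $\Omega$ are taken as already assembled, the helper names are hypothetical, and the force limit is checked on $|F_l^*|$. The usage line recovers a known force by first running the forward model of Eq 15 with made-up matrices.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def inverse_corrective_force(Phi, q_c2, Omega, F_cut, delta_e):
    """Eqs 18-19: unknowns [g_1..g_n, F_l*]; the last row enforces sum(g) = u_n(l) = delta_e."""
    n = len(q_c2)
    A = [list(Phi[j]) + [-q_c2[j]] for j in range(n)] + [[1.0] * n + [0.0]]
    rhs = [F_cut + Omega[j] for j in range(n)] + [delta_e]
    g_psi = solve(A, rhs)
    return g_psi[n], g_psi[:n]


def corrective_force_reference(F_star, depth_mm, F_max=4.0, d_l2=60.0):
    """Removal criteria: reference force 0 if |F*| > F_max or the depth exceeds d_l2."""
    return 0.0 if abs(F_star) > F_max or depth_mm > d_l2 else F_star


# Forward model (Eq 15) with illustrative numbers, then recover the force.
Phi, q, Om, Fc = [[4.0, 1.0], [1.0, 3.0]], [0.5, 0.3], [0.1, 0.05], 0.3
g = solve(Phi, [2.0 * q[j] + Fc + Om[j] for j in range(2)])
F_star, _ = inverse_corrective_force(Phi, q, Om, Fc, sum(g))
print(F_star, corrective_force_reference(F_star, 30.0))
```

Because the solve is direct, its cost is fixed for a given $n$. This is the property the text relies on for real-time replanning.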

2) Intraoperative needle tip position adjustment based on reinforcement learning adaptive PID (RL-APID) control

This paper designs an adaptive PID controller based on reinforcement learning with an Actor–Critic structure. Both the Actor and the Critic are implemented with radial basis function neural networks (RBFNN), which effectively reduces storage requirements and avoids repeated calculations. A new adaptive update rule for the PID controller is then designed on this RBFNN-based Actor–Critic structure.

The main contributions of this part are as follows:
- the one-step-ahead predicted output is considered and the reinforcement signal is redefined, so that the temporal difference (TD) error includes the prediction error;
- the new adaptive update rule is computed from the TD error;
- the proposed scheme is model-free, which makes it well suited to complex practical systems whose accurate mathematical models are difficult to obtain.

(1) Mathematical problem description

To explain the design idea and procedure of RL-APID clearly, first consider the following general discrete-time nonlinear dynamic model:

$$\begin{aligned} x(t+1) &= f(x(t)) + g(x(t))\,u(t)\\ y(t) &= h(x(t), u(t-1)) \end{aligned} \tag{20}$$

where $x(t) \in \mathbb{R}^m$ is the system state at time $t$, $u(t) \in \mathbb{R}^n$ is the control input, and $y(t)$ is the output.

Reinforcement learning does not require the details of the model to be known, so Eq 20 can be written more compactly as:

$$\begin{aligned} x(t+1) &= F(x(t), u(t))\\ y(t) &= h(x(t), u(t-1)) \end{aligned} \tag{21}$$

To apply reinforcement learning control to Eq 21, the system must first satisfy the following two assumptions.

Assumption 1: The state of Eq 21 at time $t+1$ depends only on the state and input at time $t$, and not on states or inputs before time $t$. Eq 21 therefore has the memoryless (Markov) property. This assumption places the problem in the framework of a Markov decision process (MDP), whose goal is to achieve specified objectives through a suitable control policy. This setting closely matches that of reinforcement learning, which makes the assumption essential for combining the control problem with reinforcement learning.

Assumption 2: The signs of the partial derivatives of $h(\cdot)$ with respect to all of its arguments are known, and they coincide with the signs of the system Jacobian matrix.

The closed-loop puncture control system in this paper is easily affected by jumps in the PID derivative term (derivative kick). To reduce their adverse effects, a velocity-form (incremental) PID structure is adopted in which the proportional and derivative terms act on the output $y(t)$ rather than on the error. The discrete-time control law is:

$$u(t) = u(t-1) + K_I(t)\,e(t) - K_P(t)\,\Delta y(t) - K_D(t)\,\Delta^2 y(t) \tag{22}$$

From Eq 22, the control increment is

$$\Delta u(t) = K_I(t)\,e(t) - K_P(t)\,\Delta y(t) - K_D(t)\,\Delta^2 y(t) = K(t)\,\Theta(t) \tag{23}$$

where $K(t) = [K_I(t), K_P(t), K_D(t)]$ is the parameter vector of the adaptive PID controller and $\Theta(t) = [e(t), -\Delta y(t), -\Delta^2 y(t)]^T$ is the augmented system state. $\Delta = 1 - z^{-1}$ is the backward difference operator, i.e. the difference between the current and previous values of a variable. $\Delta^2 y(t)$ therefore expands to

$$\Delta^2 y(t) = \Delta y(t) - \Delta y(t-1) = y(t) - 2y(t-1) + y(t-2) \tag{24}$$

The element $e(t)$ of $\Theta(t)$ is the tracking error between the system reference input and the actual system output:

$$e(t) = y_d(t) - y(t) \tag{25}$$

where $y_d(t)$ is the desired reference input of the system.
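Eqs 22–25 translate directly into code. In the Python sketch below, the function names are hypothetical and the gains are illustrative numbers. The augmented state carries the minus signs so that $\Delta u(t) = K(t)\Theta(t)$ with $K = [K_I, K_P, K_D]$.

```python
def augmented_state(y_d, y, y1, y2):
    """Theta(t) = [e(t), -dy(t), -d2y(t)] from y(t), y(t-1), y(t-2) (Eqs 23-25)."""
    e = y_d - y                 # Eq 25
    dy = y - y1                 # Delta y(t), with Delta = 1 - z^-1
    d2y = y - 2.0 * y1 + y2     # Eq 24
    return [e, -dy, -d2y]


def control_update(u_prev, K, theta):
    """Velocity-form law: u(t) = u(t-1) + K(t) . Theta(t) (Eqs 22-23)."""
    du = sum(k * th for k, th in zip(K, theta))
    return u_prev + du, du


theta = augmented_state(1.0, 0.4, 0.3, 0.1)
u, du = control_update(0.0, [0.5, 1.0, 0.2], theta)  # K = [K_I, K_P, K_D]
print(theta, du)
```

Because the setpoint enters only through $e(t)$, a step in $y_d$ with the output at rest changes $\Delta u$ only through $K_I$. This is how the structure avoids derivative kick.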

The block diagram of the proposed RL-based adaptive PID control method is shown in Fig 8. The input of the Actor–Critic structure is $\Theta(t)$, which is constructed from the trajectory tracking error $e(t)$ and the system output. The Actor tunes the controller online from the observed system state. The Critic receives both the system state and the reward signal $r(t+1)$; it evaluates system performance and outputs the temporal difference (TD) error.

Fig 8. Structure block diagram of adaptive PID control method based on reinforcement learning.


The TD error $\delta_{TD}(t)$ is a key quantity in the design. The purpose of this section is to design a PID control system with a new adaptive law based on the Actor–Critic structure. The design must meet the system's tracking accuracy and robustness requirements.

Adaptive control system design process:

First, define the value function:

$$V(t) = \sum_{i=t}^{\infty}\gamma^{\,i-t}\,r(x(i), u(i)) \tag{26}$$

where $0 < \gamma \le 1$ is the discount (attenuation) factor and $u(t)$ is the control signal. The function $r(x(i), u(i))$ is called the reward or reinforcement signal and is usually designed as a quadratic function.

Eq 26 can be rewritten as

$$V(t) = r(x(t), u(t)) + \gamma\sum_{i=t+1}^{\infty}\gamma^{\,i-(t+1)}\,r(x(i), u(i)) \tag{27}$$

Eq 27 is still an infinite sum that is difficult to evaluate, so it is expressed recursively as

$$V(t) = r(x(t), u(t)) + \gamma\,V(t+1),\qquad V(0) = 0 \tag{28}$$

Eq 28 is also known as the Bellman equation.

Based on the Bellman equation (Eq 28), the TD error is defined as

$$\delta_{TD}(t) = r(x(t), u(t)) + \gamma\,V(t+1) - V(t) \tag{29}$$

If the Bellman equation holds, the TD error satisfies $\delta_{TD}(t) = 0$, and the control signal at each moment can then be regarded as the optimal control policy.
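A small numerical check of Eqs 26–29 shows the connection: when the value function is computed exactly by the recursion of Eq 28, the TD error of Eq 29 vanishes at every step. The sketch below assumes a finite reward sequence with $V = 0$ after the last step, a truncation chosen here for illustration. The function names and reward values are hypothetical.

```python
def discounted_values(rewards, gamma):
    """Backward recursion V(t) = r(t) + gamma * V(t+1) (Eq 28), with V = 0 after the last reward."""
    V = [0.0] * (len(rewards) + 1)
    for t in range(len(rewards) - 1, -1, -1):
        V[t] = rewards[t] + gamma * V[t + 1]
    return V


def td_error(r, V_t, V_next, gamma):
    """Eq 29: delta_TD(t) = r(t) + gamma * V(t+1) - V(t)."""
    return r + gamma * V_next - V_t


rewards = [-1.0, -0.5, -0.25]
V = discounted_values(rewards, 0.9)
print(V[0], [td_error(rewards[t], V[t], V[t + 1], 0.9) for t in range(3)])
```

With an approximate Critic the TD error is generally nonzero, and the learning laws below use it as the error signal to be driven toward zero.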

RBF neural networks are widely used in parameter identification because of their universal approximation ability. This paper implements the Actor–Critic structure with an RBF neural network, whose block diagram is shown in Fig 9.

Fig 9. Block diagram of actor-critic structure.


The network has three layers of neurons: input, hidden and output. The input layer receives the augmented state vector $\Theta(t)$, built from the trajectory tracking error and the system output, and passes it to the hidden layer. The output layer forms weighted sums of the hidden-layer outputs. These sums are the Actor and Critic outputs, i.e. the adaptive control parameters and the value function defined above. The hidden layer uses Gaussian radial basis activation functions, $\Phi(t) = [\Phi_1(t), \dots, \Phi_h(t)]$, with

$$\Phi_j(t) = \exp\left(-\frac{\lVert\Theta(t) - \mu_j(t)\rVert^2}{2\sigma_j^2(t)}\right),\qquad j = 1, 2, \dots, h \tag{30}$$

where $\mu_j$ and $\sigma_j$ are the center and width of the $j$-th radial basis function, and the center vector is defined as

$$\mu_j(t) = [\mu_{1j}, \mu_{2j}, \mu_{3j}]^T \tag{31}$$

The third layer is the output layer. It contains the Actor and the Critic and is formed by simple weighted sums. The adaptive PID parameters output by the Actor are

$$\begin{aligned} K_P(t) &= \sum_{j=1}^{h} w_j^P(t)\,\Phi_j(t)\\ K_I(t) &= \sum_{j=1}^{h} w_j^I(t)\,\Phi_j(t)\\ K_D(t) &= \sum_{j=1}^{h} w_j^D(t)\,\Phi_j(t) \end{aligned} \tag{32}$$

where $w_j^P(t)$, $w_j^I(t)$ and $w_j^D(t)$ are the weights between the $j$-th hidden node and the corresponding Actor outputs. The value function output by the Critic is

$$V(t) = \sum_{j=1}^{h} v_j(t)\,\Phi_j(t) \tag{33}$$

where $v_j(t)$ is the weight between the $j$-th hidden node and the Critic output.
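The forward pass of Eqs 30–33 is sketched below in Python. The Actor and Critic share the same hidden layer, and each output is a plain weighted sum of the Gaussian activations. The function names and example numbers are hypothetical.

```python
import math


def rbf_hidden(theta, centers, widths):
    """Gaussian hidden layer of Eq 30: Phi_j = exp(-||theta - mu_j||^2 / (2 sigma_j^2))."""
    return [math.exp(-sum((a - m) ** 2 for a, m in zip(theta, mu)) / (2.0 * s ** 2))
            for mu, s in zip(centers, widths)]


def actor_critic_forward(theta, centers, widths, wP, wI, wD, v):
    """Return ((K_P, K_I, K_D), V, Phi) using Eq 32 (Actor) and Eq 33 (Critic)."""
    phi = rbf_hidden(theta, centers, widths)

    def dot(w):
        return sum(wj * pj for wj, pj in zip(w, phi))

    return (dot(wP), dot(wI), dot(wD)), dot(v), phi


gains, V, phi = actor_critic_forward(
    [0.0, 0.0, 0.0],                    # Theta(t)
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # centers mu_j (h = 2)
    [1.0, 1.0],                          # widths sigma_j
    [2.0, 0.0], [1.0, 1.0], [0.0, 0.0],  # w^P, w^I, w^D
    [1.0, 0.0])                          # Critic weights v_j
print(gains, V, phi)
```

Sharing the hidden layer means one set of activations serves all four outputs, which is the storage saving the Actor–Critic RBFNN design relies on.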

The network parameters (the hidden-layer centers and widths, and the output weights) are updated by a gradient-descent learning algorithm. First, the reward signal $r(\cdot)$ is defined as:

$$r(x(t), u(t)) = -\frac{1}{2}\left(y_d(t+1) - y(t+1)\right)^2 \tag{34}$$

Then, from Eq 29, the TD error $\delta_{TD}(t)$ can be expressed as:

$$\delta_{TD}(t) = -\frac{1}{2}\left(y_d(t+1) - y(t+1)\right)^2 + \gamma\,V(t+1) - V(t) \tag{35}$$

Based on this definition, the cost function is:

$$J(t) = \frac{1}{2}\,\delta_{TD}^2(t) \tag{36}$$

The Actor output weights are therefore updated by gradient descent on $J(t)$:

$$\begin{aligned} w_j^P(t+1) &= w_j^P(t) - \alpha_{wP}\frac{\partial J(t)}{\partial w_j^P(t)}\\ w_j^I(t+1) &= w_j^I(t) - \alpha_{wI}\frac{\partial J(t)}{\partial w_j^I(t)}\\ w_j^D(t+1) &= w_j^D(t) - \alpha_{wD}\frac{\partial J(t)}{\partial w_j^D(t)} \end{aligned} \tag{37}$$

where $\alpha_{wP}$, $\alpha_{wI}$ and $\alpha_{wD}$ are learning rates. Applying the chain rule to the cost function of Eq 36 gives the gradients in Eq 37:

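$$\frac{\partial J(t)}{\partial w_j^P(t)} = \frac{\partial J(t)}{\partial\delta_{TD}(t)}\,\frac{\partial\delta_{TD}(t)}{\partial y(t+1)}\,\frac{\partial y(t+1)}{\partial u(t)}\,\frac{\partial u(t)}{\partial K_P(t)}\,\frac{\partial K_P(t)}{\partial w_j^P(t)} = -\delta_{TD}(t)\left(y(t) - y(t-1)\right)\Phi_j(t)\,\frac{\partial y(t+1)}{\partial u(t)} \tag{38}$$
$$\frac{\partial J(t)}{\partial w_j^I(t)} = \frac{\partial J(t)}{\partial\delta_{TD}(t)}\,\frac{\partial\delta_{TD}(t)}{\partial y(t+1)}\,\frac{\partial y(t+1)}{\partial u(t)}\,\frac{\partial u(t)}{\partial K_I(t)}\,\frac{\partial K_I(t)}{\partial w_j^I(t)} = \delta_{TD}(t)\,e(t)\,\Phi_j(t)\,\frac{\partial y(t+1)}{\partial u(t)} \tag{39}$$
$$\frac{\partial J(t)}{\partial w_j^D(t)} = \frac{\partial J(t)}{\partial\delta_{TD}(t)}\,\frac{\partial\delta_{TD}(t)}{\partial y(t+1)}\,\frac{\partial y(t+1)}{\partial u(t)}\,\frac{\partial u(t)}{\partial K_D(t)}\,\frac{\partial K_D(t)}{\partial w_j^D(t)} = -\delta_{TD}(t)\left(y(t) - 2y(t-1) + y(t-2)\right)\Phi_j(t)\,\frac{\partial y(t+1)}{\partial u(t)} \tag{40}$$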

Eqs 38–40 all require the system Jacobian $\partial y(t+1)/\partial u(t)$. By Assumption 2 its sign is known, so the Jacobian is handled using the identity

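$$\varepsilon = |\varepsilon|\,\mathrm{sign}(\varepsilon) \tag{41}$$

where $\mathrm{sign}(\cdot)$ is the sign function:

$$\mathrm{sign}(\varepsilon) = \begin{cases} 1, & \varepsilon > 0\\ 0, & \varepsilon = 0\\ -1, & \varepsilon < 0 \end{cases} \tag{42}$$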

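The Jacobian is then written as

$$\frac{\partial y(t+1)}{\partial u(t)} = \left|\frac{\partial y(t+1)}{\partial u(t)}\right|\mathrm{sign}\left(\frac{\partial y(t+1)}{\partial u(t)}\right) \tag{43}$$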

Since $\mathrm{sign}(\partial y(t+1)/\partial u(t))$ is known, the unknown magnitude $|\partial y(t+1)/\partial u(t)|$ can be absorbed into the learning rates $\alpha_{wP}$, $\alpha_{wI}$ and $\alpha_{wD}$ [28]. Similarly, the centers and widths of the hidden-layer radial basis functions can be updated online by the following adaptive laws:

μij(t+1)=μij(t)αμJ(t)μij(t)=μij(t)+αμδTD(t)vj(t)Φj(t)Θi(t)μij(t)σj2(t) (44)
σj(t+1)=σj(t)ασJ(t)σj(t)=σj(t)+ασδTD(t)vj(t)Φj(t)Θi(t)μij(t)2σj3(t) (45)

where: αμ and ασ are the learning rate parameter.

The output weights of the Critic in the RBF network are updated online by the adaptive law

$$v_j(t+1)=v_j(t)-\alpha_v\frac{\partial J(t)}{\partial v_j(t)}=v_j(t)+\alpha_v\,\delta_{TD}(t)\,\Phi_j(t) \tag{46}$$

where $\alpha_v$ is the learning rate of the Critic output weights.
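The three Critic-side laws (Eqs 44–46) share the factor $\delta_{TD}(t)\,v_j(t)\,\Phi_j(t)$. The plain-Python sketch below performs one online step for an RBF Critic. It assumes that every right-hand side uses values at time $t$ (so the centres and widths are updated with the old weight $v_j(t)$) and that, for a multi-dimensional state, the squared distance in Eq 45 is summed over all components $i$. The function name and example values are hypothetical; the learning rates are those used in the experiments reported below.

```python
import math


def critic_update(v, centers, widths, theta, phi, delta_td, a_v, a_mu, a_sigma):
    """One online step of Eqs 44-46 for an RBF Critic.

    Right-hand sides use values at time t, so centres and widths are updated
    with the old weight v_j(t). The width update sums the squared distance
    over all input components (an assumption for multi-dimensional states).
    """
    new_v, new_centers, new_widths = [], [], []
    for v_j, mu_j, s_j, p_j in zip(v, centers, widths, phi):
        g = delta_td * v_j * p_j                      # common factor of Eqs 44-45
        sq = sum((th - mu) ** 2 for th, mu in zip(theta, mu_j))
        new_centers.append([mu + a_mu * g * (th - mu) / s_j ** 2
                            for th, mu in zip(theta, mu_j)])   # Eq 44
        new_widths.append(s_j + a_sigma * g * sq / s_j ** 3)   # Eq 45
        new_v.append(v_j + a_v * delta_td * p_j)               # Eq 46
    return new_v, new_centers, new_widths


# Hypothetical step with one hidden node and a 2-D state.
theta = [1.0, 0.0]
centers, widths, v = [[0.0, 0.0]], [1.0], [0.5]
phi = [math.exp(-1.0 / 2.0)]          # Gaussian kernel value for this state
v, centers, widths = critic_update(v, centers, widths, theta, phi,
                                   delta_td=0.1, a_v=0.35, a_mu=0.027, a_sigma=0.015)
```

With a positive TD error and positive weight, the centre moves toward the current state and the width grows slightly.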

The design steps of the reinforcement learning adaptive PID controller based on the Actor-Critic framework are listed in Table 1. Implementing Algorithm 1 requires setting several essential control parameters.

Table 1. Reinforcement learning adaptive PID controller design steps.

Algorithm 1. Design steps of the reinforcement learning adaptive PID controller based on the Actor-Critic framework
1. Set $t=0$; initialize the control input signal $u(0)$ and the reference input signal $y_d(0)$.
2. Initialize the control parameters $w_j^P$, $w_j^I$, $w_j^D$, $v_j(0)$, $\mu_{ij}(0)$ and $\sigma_j(0)$, and set the learning rates $\alpha_w$, $\alpha_v$, $\alpha_\mu$ and $\alpha_\sigma$.
3. for $t = 1$ : EndTime
4. Measure the system output $y(t)$ and compute the output error $e(t)=y_d(t)-y(t)$.
5. Compute the radial basis functions of the hidden layer of the RBF network (Eq 30).
6. Compute the Actor output at time $t$ by Eq 32 to obtain the PID controller parameters, and compute the Critic value function $V(t)$ at time $t$ by Eq 33.
7. Compute the control increment $\Delta u(t)$ at the current time from the PID parameters.
8. Compute the control signal $u(t)=u(t-1)+\Delta u(t)$, apply it to the controlled puncture system, and obtain the system output $y(t+1)$ at the next time step.
9. From the system output, build the next expanded state $\theta(t+1)=[e(t+1),\Delta y(t+1),\Delta^2 y(t+1)]^T$.
10. Compute the Critic value function $V(t+1)$ at time $t+1$ by Eq 33.
11. Compute the TD error $\delta_{TD}(t)$ by Eq 35.
12. Update the Actor (PID parameter) weights by Eqs 37–40 and the Critic (value function) weights by Eq 46.
13. Update the centers and widths of the RBF kernels by Eqs 44 and 45.
14. end for
15. End of Algorithm 1.
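To show how the steps of Algorithm 1 fit together, the following plain-Python sketch runs the loop on a hypothetical first-order plant $y(t+1)=0.9\,y(t)+0.1\,u(t)$, whose Jacobian sign is positive as Assumption 2 requires. It is not a model of the puncture system: the plant, the three RBF centres, the Gaussian kernel, the incremental law $\Delta u(t)=K_I e(t)-K_P\Delta y(t)-K_D\Delta^2 y(t)$ and the floor on the kernel width are all assumptions. The learning rates and $\gamma$ are the values used in the experiments reported below.

```python
import math


def run_rl_apid(steps=20, y_ref=1.0, gamma=0.90,
                a_w=0.13, a_v=0.35, a_mu=0.027, a_sigma=0.015):
    """Toy run of Algorithm 1 on a hypothetical plant y(t+1) = 0.9 y(t) + 0.1 u(t)."""
    def plant(y, u):
        return 0.9 * y + 0.1 * u                  # hypothetical; dy/du = 0.1 > 0

    jac_sign = 1                                  # Assumption 2: Jacobian sign known
    centers = [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    widths = [1.0, 1.0, 1.0]
    h = len(centers)
    w = {k: [0.0] * h for k in "PID"}             # Actor weights, so K(0) = [0, 0, 0]
    v = [0.0] * h                                 # Critic weights

    def rbf(theta):                               # assumed Gaussian kernels (Eq 30)
        return [math.exp(-sum((a - b) ** 2 for a, b in zip(theta, c)) / (2 * s ** 2))
                for c, s in zip(centers, widths)]

    y2, y1, y0, u_prev = 0.0, 0.0, 0.0, 0.0      # y(t-2), y(t-1), y(t), u(t-1)
    ys, gains = [], []
    for _ in range(steps):
        # Steps 4-6: error, state, hidden layer, PID parameters and V(t)
        e = y_ref - y0
        dy, d2y = y0 - y1, y0 - 2 * y1 + y2
        theta = [e, dy, d2y]
        phi = rbf(theta)
        K = {k: sum(wj * pj for wj, pj in zip(w[k], phi)) for k in "PID"}
        V = sum(vj * pj for vj, pj in zip(v, phi))
        # Steps 7-8: assumed incremental PID law, then apply u(t) to the plant
        u = u_prev + K["I"] * e - K["P"] * dy - K["D"] * d2y
        y_next = plant(y0, u)
        # Steps 9-11: next state, V(t+1) and TD error (Eq 35)
        e_next = y_ref - y_next
        theta_next = [e_next, y_next - y0, y_next - 2 * y0 + y1]
        V_next = sum(vj * pj for vj, pj in zip(v, rbf(theta_next)))
        delta = 0.5 * e_next ** 2 + gamma * V_next - V
        # Step 12: Actor weights (Eqs 37-40), Jacobian replaced by its sign
        common = delta * (-e_next) * jac_sign
        for k, du_dk in (("P", -dy), ("I", e), ("D", -d2y)):
            w[k] = [wj - a_w * common * du_dk * pj for wj, pj in zip(w[k], phi)]
        # Steps 12-13: Critic weights (Eq 46), RBF centres and widths (Eqs 44-45)
        for j in range(h):
            g = delta * v[j] * phi[j]
            sq = sum((a - b) ** 2 for a, b in zip(theta, centers[j]))
            centers[j] = [b + a_mu * g * (a - b) / widths[j] ** 2
                          for a, b in zip(theta, centers[j])]
            # the 1e-3 floor is a safeguard added here, not part of Eq 45
            widths[j] = max(widths[j] + a_sigma * g * sq / widths[j] ** 3, 1e-3)
            v[j] += a_v * delta * phi[j]
        ys.append(y_next)
        gains.append((K["P"], K["I"], K["D"]))
        y2, y1, y0, u_prev = y1, y0, y_next, u
    return ys, gains


ys, gains = run_rl_apid(steps=20)
```

Because the Actor weights start at zero, $K(0)=[0,0,0]^T$ and the first output stays at zero; the positive TD error then raises the integral gain, so the output starts moving toward the reference from the second step on.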

Given the parameters $K$ and $F_{cutting}$ and the measured tip trajectory $u_t(d)$, the reference corrective force $F_l^*$ is calculated from Eq 19. This reference corrective force is the input of the reinforcement learning adaptive PID controller, the correction force measured on the actual system is the feedback, and the controller output is converted into a correction force by the linear drive device. The resulting closed-loop control structure of the end effector is shown in Fig 10.

Fig 10. End effector closed-loop control system structure.

In summary, the puncture control strategy of the transrectal prostate BT robot is shown in Fig 11.

Fig 11. Puncture control strategy of prostate BT.

Results and discussion

In this section, closed-loop control experiments and a comparative analysis are carried out to evaluate the feasibility and robustness of the proposed control method. The robotic puncture platform built for these experiments is shown in Fig 12.

Fig 12. Control system block diagram.

The system consists of a UR5e manipulator, the end effector, an upper (host) computer and a lower computer. The UR5e robotic arm is mainly used for the initial positioning of the end effector. The structure of the robot control system is shown in Fig 13.

Fig 13. Control system block diagram.

To compare puncture strategies on the same puncture object, three groups of experiments were carried out, each puncturing rectum and beef tissue to a depth of 80 mm. The first group punctured without corrective force. The second group applied the corrective force with the traditional PID control method, with initial PID parameters $K(0)=[0,0,0]^T$. The third group applied the corrective force with the proposed reinforcement learning adaptive PID (RL-APID) control method, with main parameters $\alpha_w=0.13$, $\alpha_v=0.35$, $\alpha_\mu=0.027$, $\alpha_\sigma=0.015$ and $\gamma=0.90$. Each group of experiments was repeated 5 times, and the average was taken as the final result, as shown in Figs 14 and 15. Fig 16 shows that without corrective force the needle tip trajectory gradually deviates from the reference trajectory, and that the deviation is significantly reduced once the corrective force is applied.

Fig 14. Comparison of puncture experiments results.

Fig 15. Comparison between the predicted correction force of the model and the actually applied correction force.

Fig 16. Needle tip trajectory error.

The results also show that RL-APID has a smaller tracking error and a more stable dynamic response when the reference trajectory jumps, with no excessive overshoot or jitter, whereas the traditional PID produces relatively large overshoot and jitter at the jump, which hinders smooth puncture. The trajectory tracking error in Fig 16 further shows that RL-APID achieves higher trajectory tracking accuracy, and that the lateral driving force output by RL-APID significantly reduces needle deviation during puncture. Fig 17 shows the adaptive adjustment of the RL-APID parameters during the puncture. As Fig 16 shows, puncture under the reinforcement learning adaptive PID control method reduces the needle deflection by 90% at a puncture depth of 80 mm and achieves higher puncture accuracy.

Fig 17. Adaptive PID parameter variation.

Because seeds implanted in real tissue cannot be observed, the seed implantation experiments were performed in a transparent biomimetic tissue (agar gel), as shown in Fig 18. The relative coordinates of the implantation points were obtained by image processing in MATLAB. Five seeds were implanted in each trial, and the trial was repeated 5 times with the data averaged. Comparing the theoretical and actual seed coordinates gives the deviations listed in Table 2. The average absolute seed implantation error is 1.96 mm with a standard error of 0.56 mm, so the implantation accuracy meets the clinical requirement of 3–6 mm [4].

Fig 18. Biomimetic tissue seed implantation experiment.

Table 2. Seed implantation experiment results.

Seed   Theoretical coordinates (mm)   Actual coordinates (mm)   Deviation (mm)
1      (3.0, 16.0)                    (3.8, 15.7)               1.2
2      (3.0, 32.0)                    (4.2, 33.2)               1.5
3      (3.0, 48.0)                    (4.6, 47.8)               1.8
4      (3.0, 64.0)                    (5.4, 65.2)               2.6
5      (3.0, 80.0)                    (5.6, 78.2)               2.8
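The summary statistics quoted for the implantation experiment can be computed from per-seed deviations with Python's standard `statistics` module, as sketched below. The sketch applies them only to the five deviations listed in Table 2; the text reports averages over five repeated trials, so the values from this single tabulated set need not coincide with the reported 1.96 mm and 0.56 mm. The text does not give the formula behind its standard error, so both the sample standard deviation and the standard error of the mean are computed.

```python
import math
import statistics

# "Deviation" column of Table 2, in mm (one set of five seeds).
deviations_mm = [1.2, 1.5, 1.8, 2.6, 2.8]

mean_mm = statistics.mean(deviations_mm)          # mean implantation error
sd_mm = statistics.stdev(deviations_mm)           # sample standard deviation
sem_mm = sd_mm / math.sqrt(len(deviations_mm))    # standard error of the mean

print(f"mean = {mean_mm:.2f} mm, SD = {sd_mm:.2f} mm, SEM = {sem_mm:.2f} mm")
```

For the tabulated values this prints a mean of 1.98 mm, a sample standard deviation of 0.69 mm and a standard error of the mean of 0.31 mm.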

Conclusions

In this paper, a corrective force-based puncture control strategy is proposed that uses only the corrective force drive to minimize the needle deflection at the final puncture depth. The puncture control strategy has two stages: preoperative needle trajectory planning and intraoperative puncture strategy adjustment. In the preoperative planning stage, the optimal needle tip trajectory and puncture parameters are obtained from the needle deflection prediction model. In the intraoperative adjustment stage, a reverse needle tip deflection prediction model is constructed to compensate the corrective force during the operation, and the application of the corrective force is controlled by either traditional PID control or the reinforcement learning-based adaptive PID control method to achieve accurate puncture. The effectiveness of the puncture control strategy was verified and compared on the experimental platform of the prostate BT robot. The puncture experiments show that the reinforcement learning-based adaptive PID control method effectively reduces needle tip deflection, produces smaller overshoot and jitter than the traditional PID control method, and achieves higher puncture accuracy. In the seed implantation experiment, the average implantation error was 1.96 mm with a standard error of 0.56 mm, which meets the clinical and design requirements.

Supporting information

pone.0329065.s001.docx (12.1KB, docx)

Data Availability

“All relevant data are within the paper and its Supporting Information files. These data can also be accessed at the following link: (https://doi.org/10.6084/m9.figshare.28300652).”

Funding Statement

This work was supported in part by the Natural Science Foundation of Jiangsu Province for Young Scholars under Grant BK20240316.

References

1. Taghizadeh S, Shvydka D, Shan A, Mian OY, Parsai EI. Optimization and experimental characterization of the innovative thermo-brachytherapy seed for prostate cancer treatment. Med Phys. 2024;51(2):839–53. doi: 10.1002/mp.16920
2. Xiao Y, Zeng Y, Han L, Lin G, Ke H, Xu S, et al. A novel simplified transperineal prostate biopsy guided by perineal ultrasound. Br J Radiol. 2024;97(1159):1351–6. doi: 10.1093/bjr/tqae097
3. Stanberry B, Webber-Jones N. Low-dose-rate brachytherapy as a primary treatment for localised and locally advanced prostate cancer: a systematic review of economic evaluations. Prostate Cancer Prostatic Dis. 2025;28(1):23–36. doi: 10.1038/s41391-024-00817-z
4. Valerio M, Emberton M, Eggener SE, Ahmed HU. The challenging landscape of medical device approval in localized prostate cancer. Nat Rev Urol. 2016;13(2):91–8. doi: 10.1038/nrurol.2015.289
5. Zhang Y, Zhang W, Liang Y, Xu Y. Research on mechanism and strategy of high accuracy puncture of prostate. Chinese J Sci Instrum. 2017;38(6):1405–12. doi: CNKI:SUN:YQXB.0.2017-06-012
6. Li H, Wang Y, Li Y, Zhang J. A novel manipulator with needle insertion forces feedback for robot-assisted lumbar puncture. Int J Med Robot. 2021;17(2):e2226. doi: 10.1002/rcs.2226
7. Dai X, Zhang Y, Jiang J, Li B. Image-guided robots for low dose rate prostate brachytherapy: Perspectives on safety in design and use. Int J Med Robot. 2021;17(3):e2239. doi: 10.1002/rcs.2239
8. Simone C, Okamura AM. Modeling of needle insertion forces for robot-assisted percutaneous therapy. In: Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292). 2085–91. doi: 10.1109/robot.2002.1014848
9. Okamura AM, Simone C, O’Leary MD. Force modeling for needle insertion into soft tissue. IEEE Trans Biomed Eng. 2004;51(10):1707–16. doi: 10.1109/TBME.2004.831542
10. Misra S, Reed KB, Schafer BW, Ramesh KT, Okamura AM. Observations and models for needle-tissue interactions. In: 2009 IEEE International Conference on Robotics and Automation, 2009. 2687–92. doi: 10.1109/robot.2009.5152721
11. Kataoka H, Washio T, Audette M, Mizuhara K. A Model for Relations Between Needle Deflection, Force, and Thickness on Needle Penetration. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2001. p. 966–74. doi: 10.1007/3-540-45468-3_115
12. Misra S, Reed KB, Schafer BW, Ramesh KT, Okamura AM. Mechanics of Flexible Needles Robotically Steered through Soft Tissue. Int J Rob Res. 2010;29(13):1640–60. doi: 10.1177/0278364910369714
13. Abolhassani N, Patel RV, Ayazi F. Minimization of needle deflection in robot-assisted percutaneous therapy. Int J Med Robot. 2007;3(2):140–8. doi: 10.1002/rcs.136
14. Khadem M, Rossa C, Usmani N, Sloboda RS, Tavakoli M. A Two-Body Rigid/Flexible Model of Needle Steering Dynamics in Soft Tissue. IEEE/ASME Trans Mechatron. 2016;21(5):2352–64. doi: 10.1109/tmech.2016.2549505
15. Webster RJ, Cowan NJ, Chirikjian G, Okamura AM. Nonholonomic Modeling of Needle Steering. Springer Tracts in Advanced Robotics. Springer Berlin Heidelberg. 2006. p. 35–44. doi: 10.1007/11552246_4
16. Glozman D, Shoham M. Image-Guided Robotic Flexible Needle Steering. IEEE Trans Robot. 2007;23(3):459–67. doi: 10.1109/tro.2007.898972
17. Abayazid M, Roesthuis RJ, Reilink R, Misra S. Integrating Deflection Models and Image Feedback for Real-Time Flexible Needle Steering. IEEE Trans Robot. 2013;29(2):542–53. doi: 10.1109/tro.2012.2230991
18. Fallahi B, Khadem M, Rossa C, Sloboda R, Usmani N, Tavakoli M. Extended bicycle model for needle steering in soft tissue. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015. 4375–80.
19. Patil S, Burgner J, Webster RJ 3rd, Alterovitz R. Needle Steering in 3-D Via Rapid Replanning. IEEE Trans Robot. 2014;30(4):853–64. doi: 10.1109/TRO.2014.2307633
20. Zhao YJ, Liu ZH, Zhang YD, Liu ZQ. Kinematic model and its parameter identification for cannula flexible needle insertion into soft tissue. Adv Mech Eng. 2019;11(6):1687814019852185. doi: 10.1177/1687814019852
21. Lee H, Kim J. Estimation of Needle Deflection in Layered Soft Tissue for Robotic Needle Steering. Advances in Intelligent Systems and Computing. Springer International Publishing. 2015. p. 1133–44. doi: 10.1007/978-3-319-08338-4_82
22. Babaiasl M, Yang F, Swensen JP. Robotic needle steering: state-of-the-art and research challenges. Intel Serv Robotics. 2022;15(5):679–711. doi: 10.1007/s11370-022-00446-2
23. Lehmann T, Sloboda R, Usmani N, Tavakoli M. Model-Based Needle Steering in Soft Tissue via Lateral Needle Actuation. IEEE Robot Autom Lett. 2018;3(4):3930–6. doi: 10.1109/lra.2018.2858001
24. Lehmann T, Rossa C, Usmani N, Sloboda R, Tavakoli M. Deflection modeling for a needle actuated by lateral force and axial rotation during insertion in soft phantom tissue. Mechatronics. 2017;48:42–53. doi: 10.1016/j.mechatronics.2017.10.008
25. Lehmann T, Rossa C, Usmani N, Sloboda RS, Tavakoli M. Intraoperative Tissue Young’s Modulus Identification During Needle Insertion Using a Laterally Actuated Needle. IEEE Trans Instrum Meas. 2018;67(2):371–81. doi: 10.1109/tim.2017.2774182
26. Yongfeng Z, Ziyuan ZHU, Gang W. Thermal Modal Analysis of Doubly Curved Shell Based on Rayleigh-Ritz Method. Trans Nanjing Univ Aeronaut Astronaut. 2022;39(1). doi: 10.16356/j.1005-1120.2022.01.006
27. Kataoka H, Washio T, Chinzei K, Mizuhara K, Simone C, Okamura AM. Measurement of the Tip and Friction Force Acting on a Needle during Penetration. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2002. p. 216–23. doi: 10.1007/3-540-45786-0_27
28. Cimolato A, Driessen JJM, Mattos LS, De Momi E, Laffranchi M, De Michieli L. EMG-driven control in lower limb prostheses: a topic-based systematic review. J Neuroeng Rehabil. 2022;19(1):43. doi: 10.1186/s12984-022-01019-1
