Abstract
The problem of intelligent L2-L∞ consensus design for leader-follower multiagent systems (MASs) under switching topologies is investigated based on switched control theory and fuzzy deep Q learning. The communication topologies are assumed to be time-varying, and the model of MASs under switching topologies is constructed in the framework of switched systems. By employing a linear transformation, the consensus problem of MASs is converted into an L2-L∞ control problem. The consensus protocol is composed of a dynamics-based protocol and a learning-based protocol, where robust control theory and deep Q learning are applied to the two parts to guarantee the prescribed performance and improve the transient performance, respectively. The multiple Lyapunov function (MLF) method and the mode-dependent average dwell time (MDADT) method are combined to give the scheduling interval, which ensures stability and the prescribed attenuation performance. Sufficient conditions for the existence of the consensus protocol are given, and the solutions of the dynamics-based protocol are derived based on linear matrix inequalities (LMIs). Then, the online design of the learning-based protocol is formulated as a Markov decision process, where fuzzy deep Q learning is utilized to compensate for the uncertainties and achieve optimal performance. The variation of the learning-based protocol is modeled as an external perturbation of the dynamics-based protocol; therefore, the convergence of the proposed protocol can be guaranteed by employing nonfragile control theory. In the end, a numerical example is given to validate the effectiveness and superiority of the proposed method.
1. Introduction
In recent years, the coordination control of MASs has attracted considerable attention owing to its broad applications in many fields [1, 2], such as formation control, cooperative attack, and attitude alignment. An MAS consists of a series of agents, which can communicate and interact with each other to accomplish multiple missions and adapt to complex environments [3, 4]. In particular, much attention has been paid to the consensus problem of MASs because of its great potential applications in both economic and military fields. The purpose of consensus control is to construct a relationship between the agents so that their states/outputs reach an agreement. In the past decades, fruitful research studies have emerged to contribute to the development in theory and applications. To mention a few, the problem of distributed formation control for MASs is studied in [5], a time-varying formation design for MASs with disturbances is proposed in [6], and the problem of finite-time consensus for switched nonlinear MASs is investigated in [7].
In practical applications, it is well known that the communication topology among the agents may change dramatically over time to adapt to multiple missions and complex environments [8, 9]; for example, MASs can realize obstacle avoidance and higher flight efficiency through formation transformation [10, 11]. The design flexibility, security, and convergence performance can thereby be improved, which has motivated studies on switching topologies of MASs [1, 12]. Recently, because of the broad potential applications of switching topologies, considerable significant research has been carried out by scholars worldwide. The communication topologies among interacting agents change according to the flight conditions and missions, which can be modeled by switched systems. A switched system consists of a series of continuous-time (or discrete-time) subsystems and a switching signal, which determines the switching strategy between the subsystems. It provides an efficient approach to deal with fast time-varying conditions. Therefore, the switching of topologies can be viewed as switching between subsystems, and it is essential to study the consensus protocol design to make sure the states/outputs converge to the given value. In [13], the problem of time-varying formation control of MASs is investigated, where the communication topology switches among given connected topologies and the switching signal is governed by a Markovian process; the Lyapunov function method is utilized to analyze the convergence. In the work of [14], the event-triggered leader-following consensus problem for multiagent systems with external disturbances is addressed under switching topologies. A novel distributed event-triggered protocol is proposed to realize disturbance rejection based on an extended state observer, and the average dwell time (ADT) method is utilized to ensure the stability of the event-triggered protocol.
In [15], the time-varying practical formation problem is studied for spacecraft, where switching topologies and time delays are taken into consideration. Sufficient conditions are provided to ensure that the error system is convergent, which are derived based on the ADT method. The research studies mentioned above deal with the problem of switching topologies; however, the convergence is guaranteed based on the ADT method, in which common parameters are applied to all subsystems, leading to conservativeness. To obtain tighter bounds on dwell time and improve the design flexibility of the algorithm, the MDADT method has been applied in recent decades. In [16], the MDADT method and the multiple discontinuous Lyapunov function (MDLF) method are combined to analyze the stability of switched systems with unstable modes. Sufficient conditions are established, and the results in the existing literature are covered as a special case; fast switching and slow switching in the framework of MDADT are applied to unstable modes and stable modes, respectively. In [17], a global adaptive control algorithm for switched systems is proposed based on the MDADT method, where the different properties of the subsystems are taken into consideration. The adaptive tracking controller is applied to nonlinear switched systems with external disturbances and unmodeled dynamics, which illustrates the effectiveness and superiority of the MDADT method. In the work of [18], an event-triggered sliding mode controller is proposed. By employing the MDADT method and an event-triggered strategy, less conservative and more practical results are obtained, and sufficient conditions are given to ensure stochastically exponential stability with the aid of the LMI technique. The literature mentioned above has provided fruitful results on consensus protocol design for MASs under switching topologies.
However, stability and convergence are usually ensured by the traditional ADT method, in which the different properties of the subsystems cannot be considered, leading to conservativeness. Therefore, how to obtain less restrictive results is still an open and challenging problem, which has not been fully investigated and has important value and potential applications in practice.
Moreover, in practical environments, there always exist uncertainties and disturbances, which lead to performance degradation and even instability [19, 20]. Therefore, it is essential to investigate the robust consensus problem to improve the performance in uncertain environments [21–23]. In the work of [24], the problem of distributed H∞ containment control for MASs with switching topologies is studied, and an observer-based containment control scheme is proposed. The external disturbance and time delay in the environment are taken into consideration, which makes the scheme more applicable than the traditional method. By employing the Lyapunov function method and the LMI technique, sufficient conditions for existence and the solutions of the control protocol are given in the form of LMIs. In [25], the problem of time-varying formation of second-order discrete-time MASs under switching topologies and time delay is investigated. Sufficient conditions are given to ensure that MASs accomplish the mission of time-varying formation based on the state transformation method, with time delay and uncertainties considered. Compared with the existing literature, the proposed method can overcome the undesirable response caused by time delay and improve the transient performance. In the work of [26], the problem of formation control for tail-sitters in flight mode transitions is studied. Nonlinear dynamics and uncertainties are considered, a robust time-varying formation control protocol is proposed, and it is proven that the tracking errors converge to the origin in finite time. The problem of an L2-gain robust protocol for time-varying output formation-containment of MASs is addressed in [27]. A PID-based output-feedback control protocol is provided to ensure that all followers can track a time-varying formation reference, where communication delays and external disturbances are taken into consideration; the asymptotic stability of the MASs is proved by the Lyapunov function method.
However, as is well known, transient performance and robustness cannot be achieved simultaneously. Therefore, a compromise between transient performance and robustness must be made, which still remains an open and challenging problem.
In addition, with the development of computing capability, intelligent techniques have attracted considerable attention in recent decades [28–30]. They are widely applied in the areas of target recognition, machine vision, robotic systems, and controller design [31, 32], and provide an efficient way to improve the autonomy and design flexibility of a system [33]. The most widely used methods are deep learning and reinforcement learning. Deep reinforcement learning, as a combination of the two, inherits the advantages of both, including the characteristics of self-fitting and self-learning. In the work of [34], the automatic completion of multiple peg-in-hole assembly tasks is realized. Because the traditional method requires an accurate contact model and complex analysis, an intelligent control method is formulated by constructing the task as a Markov decision process, and the deep deterministic policy gradient (DDPG) algorithm is proposed to achieve the optimal policy and avoid risky actions. In [35], a noninteger PID controller is proposed based on the DDPG algorithm. Measurement noises and external disturbances are taken into consideration, and a kinematic controller and a dynamic controller are proposed to achieve optimal performance. The DDPG algorithm compensates for the uncertainties and disturbances in the actor-critic framework, and a numerical example is given to illustrate the effectiveness of the proposed method. Cheng et al. [36] proposed a real-time controller for the problem of fuel-optimal moon landing. Because the traditional method cannot meet the high requirements of real-time performance and autonomy, a deep reinforcement learning algorithm is proposed for real-time optimal control based on an actor-indirect method architecture. Deep neural networks are applied for initial guesses, and the efficiency of the training data is guaranteed.
The literature mentioned above has provided considerable meaningful results in the area of machine learning. However, to the best of the authors' knowledge, the intelligent consensus design for MASs with simultaneous consideration of stability, robustness, and optimal transient performance has not been fully studied yet. It is essential and important to achieve an optimal compromise between robustness and transient performance.
Based on the statement above, it can be inferred that the improvement of autonomy and design flexibility of the system needs to be studied, and the problem of consensus protocol design for MASs under switching topologies has not been fully investigated yet. The design flexibility can be improved by employing tighter bounds on dwell time, because less conservative results can be obtained, which leaves more room to ensure that the switching logic stays in the subsystems with better performance for long enough. Moreover, it is of great importance to combine the advantages of the traditional method and the intelligent technique, which can ensure convergence, robustness, and transient performance simultaneously. Therefore, the problem of intelligent L2-L∞ consensus design of MASs under switching topologies is investigated. The convergence and robustness are guaranteed by the Lyapunov function method and the MDADT method, which are more applicable. The transient performance is improved by fuzzy deep Q learning, in which a fuzzy reward function is proposed for the complex scheduling process. The main contributions of this study can be summarized as follows:
The L2-L∞ consensus protocol of MASs under switching topologies is designed. The problem of L2-L∞ consensus of MASs is converted into the problem of stability analysis for switched systems, which is more applicable than the traditional method. The MDADT method and multiple Lyapunov function method are combined to guarantee the stability and prescribed attenuation performance index, which can obtain tighter bounds on dwell time and less conservative results.
The consensus protocol is composed of the dynamics-based consensus protocol and learning-based consensus protocol. Compared with the traditional method, the proposed strategy can ensure the stability, robustness, and transient performance simultaneously.
The fuzzy reward function is utilized to improve the efficiency of the deep reinforcement learning algorithm. The design of reward function for the traditional method mainly depends on the experience of designer, which will lead to complexity. The fuzzy reward function can improve the data efficiency and ensure optimal performance.
The rest of the study is organized as follows: the preliminaries and problem statement are provided in Section 2; the L2-L∞ consensus protocol is designed in Section 3; the learning-based compensated protocol is developed in Section 4; the numerical example is given in Section 5, which is followed by the conclusions in Section 6.
2. Preliminaries and Problem Statement
In this study, it is supposed that the MAS is composed of a leader labelled as 0 and n followers labelled as 1, 2,…, n. The connection topology among the n followers can be described as a time-varying model with Nf topologies, where each graph in {𝒢1, 𝒢2,…, 𝒢Nf} is undirected and connected and 𝒢σ(k) denotes the active graph. ℋ={1,2,…, n}, n > 1, represents the set of finite nodes. s=σ(k) : [0, ∞)⟶R={1,2,…, Nf} denotes the switching signal, which is a piecewise continuous function of time and takes values in the finite set R. 𝒜σ(k)=(aijσ(k))n×n and ℒσ(k)=(lijσ(k))n×n are the adjacency matrix of the undirected graph 𝒢σ(k) and the Laplacian matrix at time instant k, respectively, where aijσ(k) stands for the element of the adjacency matrix, aijσ(k)=1 represents that node i can obtain information from node j, and lijσ(k) is defined in the following equation.
| (1) |
Then, for given node i ∈ ℋ, we can define the neighbors of node i as 𝒩i,σ(k)={j ∈ ℋ : aijσ(k)=1}.
Another undirected connected graph with n nodes is defined to describe the information transmission between the leader and the followers. Define a diagonal matrix Θσ(k)=diag{θ1σ(k), θ2σ(k),…, θnσ(k)}, where θiσ(k)=1 indicates that node i ∈ ℋ can obtain information from the leader; otherwise, θiσ(k)=0.
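As an illustration, the graph quantities above can be assembled numerically. The topology, pinning pattern, and sizes below are hypothetical, not the paper's example:

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian: l_ii = sum_j a_ij and l_ij = -a_ij for i != j."""
    return np.diag(adj.sum(axis=1)) - adj

# Hypothetical undirected follower topology (n = 4, a path 1-2-3-4).
A1 = np.array([[0, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 0]], dtype=float)
L1 = laplacian(A1)

# Hypothetical pinning pattern: only follower 1 receives the leader's state.
Theta1 = np.diag([1.0, 0.0, 0.0, 0.0])

# L + Theta is symmetric and positive definite when the graph is connected
# and at least one follower is pinned (the property Lemma 1 relies on).
H1 = L1 + Theta1
```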
Therefore, the leader-follower MAS can be described as in the following equations:
| (2) |
| (3) |
where A, B, C, and D are the system matrices with appropriate dimensions, x0(k)=[x01(k), x02(k),…,x0p(k)]T ∈ Rp represents the state vector of the leader, xi(k)=[xi1(k), xi2(k),…,xip(k)]T ∈ Rp is the state of the ith follower, ui(k)=[ui1(k), ui2(k),…,uil(k)]T ∈ Rl is the input of the ith follower, zi(k)=[zi1(k), zi2(k),…,ziq(k)]T ∈ Rq stands for the output of the ith follower, and ωi(k) ∈ Rm denotes the external disturbance belonging to L2[0, ∞). It is supposed that agent i can obtain information from its neighbors and the leader. Therefore, we define υi(k) as the relative state measurement of the ith agent, which can be described as follows:
| (4) |
In this study, the following control input of the ith agent is proposed to ensure leader-follower consensus:
| ui(k)=(Kσ(k)+Kc,σ(k))υi(k), | (5) |
where Kσ(k) is the control gain to be determined by robust control theory, and Kc,σ(k) is the compensation gain obtained by deep Q learning. In this study, the gain parameters Kc,σ(k) are supposed to vary in a finite set with given bounds. Kc,σ(k) can be viewed as an additive perturbation of Kσ(k), which can be described as follows:
| Kc,σ(k)=Mσ(k)Fσ(k)Nσ(k), | (6) |
where Mσ(k) ∈ Rl×lΔ and Nσ(k) ∈ RqΔ×q are known matrices with appropriate dimensions, and Fσ(k) ∈ RlΔ×qΔ is an unknown matrix with Fσ(k)TFσ(k) ≤ I.
For the ith agent, the error of state is defined as ei(k)=xi(k) − x0(k). Then, the closed-loop system can be rewritten as
| (7) |
where e(k)=[e1T(k),…,enT(k)]T, z(k)=[z1T(k),…,znT(k)]T, ω(k)=[ω1T(k),…,ωnT(k)]T, , , and .
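A minimal simulation sketch of the leader-follower error dynamics above, assuming scalar agent dynamics, a standard consensus-type relative measurement, and an illustrative static gain (none of these are the paper's actual matrices or the gains from Theorem 2):

```python
import numpy as np

# Hypothetical data: scalar dynamics x(k+1) = A x(k) + B u(k) with A = B = 1,
# a path topology with follower 1 pinned to the leader, and an illustrative
# static gain K (not the gains obtained from Theorem 2).
A, B, K = 1.0, 1.0, 0.3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
theta = np.array([1.0, 0.0, 0.0, 0.0])

x0 = 1.0                              # leader state (u_0 = 0, so it stays put)
x = np.array([0.0, 0.5, -0.5, 2.0])   # follower states

for _ in range(300):
    # Assumed consensus-type relative measurement:
    # ups_i = sum_j a_ij (x_j - x_i) + theta_i (x_0 - x_i)
    ups = adj @ x - adj.sum(axis=1) * x + theta * (x0 - x)
    x = A * x + B * K * ups           # u_i = K * ups_i

e = x - x0                            # tracking errors e_i = x_i - x_0
```

Under these assumptions the error dynamics reduce to e(k+1) = (I − K(L+Θ))e(k), so the errors decay whenever the spectral radius of I − K(L+Θ) is below one.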
To facilitate the proof, the definitions and lemmas are given as follows.
Definition 1 (see [37]). —
For given switching signal σ(k) and k1 > 0, define Nσ,s(0, k1) as the number of switching instants over the time interval (0, k1). Tσs(0, k1) is set to be the activated time of undirected graph 𝒢s during (0, k1). There exist constant scalars N0 ≥ 0 and τas > 0, such that
(8) Then, τas is called the mode-dependent average dwell time and N0 is the mode-dependent chatter bound. In this study, we set N0=0.
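Definition 1 can be checked numerically for a candidate switching sequence; the schedule and per-mode dwell-time requirements below are hypothetical:

```python
from collections import defaultdict

def satisfies_mdadt(schedule, tau, N0=0.0):
    """Check N_{sigma,s}(0, k1) <= N0 + T_{sigma,s}(0, k1) / tau_as per mode.

    schedule: run-length list of (mode, dwell) pairs covering (0, k1).
    tau: dict mapping mode s to its required MDADT tau_as.
    """
    count = defaultdict(int)      # N_{sigma,s}: activations of mode s
    active = defaultdict(float)   # T_{sigma,s}: total activated time of mode s
    for mode, dwell in schedule:
        count[mode] += 1
        active[mode] += dwell
    return all(count[s] <= N0 + active[s] / tau[s] for s in count)

# A periodic schedule: mode 1 for 3 steps, mode 2 for 2 steps, repeated.
schedule = [(1, 3), (2, 2), (1, 3), (2, 2)]
ok = satisfies_mdadt(schedule, {1: 2.5, 2: 1.5})   # both modes dwell enough
bad = satisfies_mdadt(schedule, {1: 4.0, 2: 1.5})  # mode 1 switches too often
```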
Definition 2 (see [37]). —
Under the control protocol in equation (5), the MASs are said to achieve leader-follower consensus if all agents asymptotically track the state trajectory of the leader, such that
(9)
Definition 3 (see [38]). —
For given constant scalars 0 < δ < 1 and γ > 0, the prescribed L2-L∞ attenuation performance γ is satisfied if the following two conditions hold:
- (1) The MASs in equations (2)-(3) are asymptotically stable when ω(k)=0.
- (2) The following inequality holds for all nonzero ω(k) ∈ l2[0, ∞).
(10)
Lemma 1 (see [35]). —
The matrices ℒσ(k)+Θσ(k) are symmetric and positive definite if and only if the corresponding graphs are connected for all k ≥ 0. Moreover, there exists a transformation matrix Tσ(k) such that the following equation holds.
(11) where λiσ(k), i ∈ ℋ are the nonzero eigenvalues of matrices ℒσ(k)+Θσ(k).
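Since ℒσ(k)+Θσ(k) is symmetric, the transformation in Lemma 1 can be realized by an orthogonal eigendecomposition; a sketch with a hypothetical pinned topology:

```python
import numpy as np

# Hypothetical L_s + Theta_s for a connected, pinned topology (n = 4).
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
Theta = np.diag([1.0, 0.0, 0.0, 0.0])
H = L + Theta

# H is symmetric, so eigh returns an orthogonal T with T^T H T diagonal,
# which is exactly the transformation asserted by Lemma 1.
lam, T = np.linalg.eigh(H)
D = T.T @ H @ T   # diag(lambda_1, ..., lambda_n), all lambda_i > 0
```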
Lemma 2 (see [39]). —
For a given constant a > 0 and real matrices Θ, U, V, W, equation (12) is equivalent to equation (13).
(12)
(13)
Lemma 3 (see [39]). —
For a given symmetric matrix 𝒯 and matrices ℳ, 𝒩, if there exists a constant scalar ε > 0 such that
(14)
Then, the following equation holds for any appropriate ℱ with ℱTℱ ≤ I.
| (15) |
3. Main Results
3.1. L2-L∞ Consensus Protocol Design
In this section, the L2-L∞ consensus protocol is proposed, and the stability and prescribed performance are guaranteed.
Lemma 4 . —
For given constant scalars 0 < δ < 1 and γ > 0, the system in (7) with the control input in (5) is asymptotically stable with L2-L∞ attenuation performance γ if and only if the following equation holds.
(16) where
(17)
Proof —
Substituting equation (17) into (7), one can obtain equation (16). It can be inferred that the transformation matrix Tσ(k) is unique; therefore, we have the following equations.
(18)
(19) It is obvious that the problem of robust consensus protocol design can be converted to the controller design of (16).
Remark 1 . —
The system in equation (16) consists of n independent subsystems of the form in equation (20). Therefore, the stability of equation (7) is equivalent to the stability of the n subsystems in equation (20), and the attempt to ensure the prescribed attenuation performance of (7) can be converted into guaranteeing the attenuation performance of (16).
(20) where , and .
In Theorem 1, the sufficient conditions to guarantee the stability and prescribed attenuation performance index are presented.
Theorem 1 . —
For given constant scalars μs > 1, 0 < δs < 1, and γ > 0, if there exist Lyapunov functions and class 𝒦∞ functions κ1, κ2, then the switched systems in equation (20) with MDADT satisfying equation (25) are globally uniformly asymptotically stable with prescribed L2-L∞ attenuation performance γ, such that
(21)
(22)
(23)
(24)
(25)
Proof —
The entire proof can be divided into two steps.
- (1)
The stability of equation (20).
The switching instants in the time interval (0, k) are set to be k1, k2,…, kt with kt+1=k. Then, (26) holds when ‖ωi(k)‖ ≡ 0.
(26) Together with (22), we can conclude that
(27) Combining with Definition 1, we have
(29) Therefore, the system in (20) with MDADT satisfying (25) is globally uniformly asymptotically stable.
- (2)
The system in equation (20) has prescribed L2-L∞ attenuation performance γ.
Together with equations (22)-(23), one has
(31) Then, one can obtain the equation as follows by iteration.
(32) Substituting the equation above into (8), one can obtain that
(33) According to the conditions μs > 1, 0 < δs < 1, and (25), we have
(34)
(35) Combining equations (32)–(34), one can obtain equation (35).
(36) Together with (24), it is obvious that
(37) which implies that , and the proof is complete.
Corollary 1 . —
For given constant scalars μs > 1, 0 < δs < 1, and γ > 0, if there exist positive-definite matrices Pi,s ∈ Rp×p satisfying equations (37)–(39), then the switched systems in equation (20) with MDADT satisfying equation (25) are globally uniformly asymptotically stable with prescribed L2-L∞ attenuation performance γ.
(38)
(39)
(40)
Proof —
The Lyapunov function is defined as follows:
(41) According to (20) and (39), we can conclude that (38) is equivalent to (24). Along the trajectory of , one has
(42) Together with equations (40)-(41), we have
(43) According to Theorem 1, we can conclude that the system in (20) with MDADT satisfying (25) is globally uniformly asymptotically stable with prescribed L2-L∞ attenuation performance γ.
Based on Theorem 1 and Corollary 1, the solutions of consensus protocol are given in Theorem 2.
Theorem 2 . —
For given constant scalars μs > 1, 0 < δs < 1, γ > 0, as > 0, and εs > 0, if there exist positive-definite matrices Pi,s ∈ Rp×p and matrices Xs ∈ Rl×l, Ys ∈ Rl×q such that equation (43) holds, then the MASs in (2)-(3) with the control input in equation (5) are asymptotically stable with prescribed L2-L∞ attenuation performance γ.
(44) The parameters of the control protocol can then be derived as follows.
(45) where Ξ11=−(1 − δs)Pi,s+εs(λis)2NsTNs, Ξ13=ATPi,s − λisYsTBT.
Proof —
According to Schur complement, it is obvious that equation (43) is equivalent to equation (45).
(46) Define Θs=𝒯s+εs−1ℳsT+εs𝒩sT𝒩s, , and , where , , and .
Together with Lemma 2, we have
(47) Moreover, based on Lemma 3, one has
(48) According to Schur complement, it is obvious that (46) is equivalent to (37), which completes the proof.
4. Compensated Consensus Protocol Design Based on FDQL
In this section, the learning-based consensus protocol is proposed based on deep reinforcement learning, where fuzzy deep Q learning is utilized. The stability and prescribed attenuation performance are guaranteed by robust control theory, and the learning-based control protocol is introduced to improve the transient performance and realize the optimal control policy. The output of the learning-based control protocol can be viewed as an additional variation of the robust consensus protocol. The online scheduling of the control protocol is formulated as a Markov decision process. Therefore, the advantages of robust control theory and deep reinforcement learning are combined.
It is well known that reinforcement learning is composed of state, action, agent, and environment. The state at the kth step is defined as sk, and the chosen action is denoted by ak; then, the reward rk and the next state sk+1 are generated based on the interaction with the environment. Therefore, the optimal control policy can be obtained by maximizing the cumulative reward.
To improve the convergence of consensus protocol, the state is defined as and the action is defined as ak=[Kc,σ(k)].
In deep Q learning, a deep neural network is utilized to approximate the action-state value function Q∗(sk, ak), which can be described as
| Q∗(sk, ak) ≈ f(sk, ak, ω), | (49) |
where f(sk, ak, ω) denotes the function of deep neural networks.
The action is chosen based on the maximum Q value:
| ak=arg maxa Q(sk, a, ω). | (50) |
There exist two neural networks in the deep Q learning algorithm, which share the same structure and are referred to as the critic neural network and the target neural network. The parameters of the critic neural network are updated based on temporal-difference learning. The output of the critic neural network is defined as Q(sk, ak, ω), and the output of the target neural network is defined as Q(sk, ak, ω−). Therefore, the parameters of the critic neural network are updated according to the following equation:
| ω⟵ω+Lr(R+γs maxa′Q(s′, a′, ω−) − Q(sk, ak, ω))∇ωQ(sk, ak, ω), | (51) |
where Lr is the learning rate, γs denotes the discount factor, R represents the reward of state transition from sk to s′ through action ak, and maxa′(Q(s′, a′, ω−)) stands for the maximum Q value of the target neural network.
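A minimal sketch of the critic/target update described above, using a linear Q-function as a stand-in for the deep network (the sizes, learning rate, and discount factor are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n_state, n_action = 4, 3     # illustrative dimensions
Lr, gamma_s = 0.1, 0.9       # learning rate and discount factor

# Linear Q-function as a stand-in for the deep critic: Q(s, a) = w[a] . s
w = rng.normal(scale=0.1, size=(n_action, n_state))   # critic parameters
w_target = w.copy()                                    # target parameters

def q_values(params, s):
    return params @ s

def update(s, a, r, s_next):
    """One TD step toward the target R + gamma_s * max_a' Q(s', a', w-)."""
    td_target = r + gamma_s * np.max(q_values(w_target, s_next))
    td_error = td_target - q_values(w, s)[a]
    w[a] += Lr * td_error * s     # gradient of Q(s, a) w.r.t. w[a] is s

def act(s, eps=0.1):
    """Epsilon-greedy version of a_k = argmax_a Q(s_k, a, w)."""
    if rng.random() < eps:
        return int(rng.integers(n_action))
    return int(np.argmax(q_values(w, s)))
```

In a full implementation the target parameters would be copied from the critic periodically; here they stay fixed only to keep the sketch short.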
It can be inferred that the reward function has an important influence on the final performance. The reward design in traditional deep Q learning mainly depends on the experience of the designers, which cannot achieve optimal performance and increases the design complexity. In this study, a fuzzy system is applied to design the reward function. The input of the fuzzy reward function is divided into five categories, described as VB, B, N, G, and VG, which represent very bad, bad, normal, good, and very good, respectively. In this study, it is supposed that there are four followers; therefore, the inputs of the fuzzy reward system are set to be |e1|, |e2|, |e3|, and |e4|. It can be inferred that each fuzzy set includes 25 rules, and the total number of the fuzzy rules is 75. The output of the fuzzy reward function is limited to the interval [−1,0), and the defuzzifier of the fuzzy reward function is defined as
| (52) |
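A sketch of one plausible fuzzy reward along these lines, with triangular membership functions and centroid defuzzification; the membership shapes, centers, and output singletons are assumptions, not the paper's design:

```python
import numpy as np

# Assumed membership centers over normalized |e| in [0, 1] for the five
# categories VG, G, N, B, VB, and assumed output singletons in [-1, 0).
centers = np.array([0.0, 0.25, 0.5, 0.75, 1.0])         # VG, G, N, B, VB
outputs = np.array([-0.01, -0.25, -0.5, -0.75, -1.0])   # reward singletons

def tri(u, c, half_width=0.25):
    """Triangular membership degree of u in the set centered at c."""
    return np.maximum(0.0, 1.0 - np.abs(u - c) / half_width)

def fuzzy_reward(err, err_max=2.0):
    """Centroid defuzzification of |e| into a reward in [-1, 0)."""
    u = min(abs(err) / err_max, 1.0)   # normalize the error magnitude
    mu = tri(u, centers)               # firing strength of each category
    return float(np.dot(mu, outputs) / mu.sum())
```

Larger tracking errors fire the B/VB sets and drive the reward toward −1, so maximizing the reward pushes the errors toward zero.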
Based on the statement above, the learning-based consensus protocol design algorithm can be summarized as follows:
Remark 2 . —
The FDQN algorithm proposed in this study can improve the transient convergence performance of MASs. The output of the deep Q network is taken as the variation of the parameters of the consensus protocol. As is well known, the design of the reward function in the traditional method depends on the experience of the designers. To overcome this problem, the fuzzy reward function is developed in this study to improve the learning efficiency.
5. Numerical Example
In this section, an example is provided to illustrate the effectiveness of the method. The model of MASs is constructed as follows:
| (53) |
The external disturbance is
| (54) |
The switching topologies are shown in Figure 1. Then, we can obtain the Laplacian matrices as follows:
| (55) |
Figure 1.

Switching topologies of MASs. (a) Interacting topology 𝒢1. (b) Interacting topology 𝒢2.
The parameters of switching topologies are given as follows:
| (56) |
Therefore, we can obtain MDADT according to (25).
| (57) |
It is well known that the ADT method can be viewed as a special case of the MDADT method. Therefore, it can be inferred that τa=max{τas}=0.4266. It is obvious that tighter bounds on dwell time and less conservative results can be obtained. Then, we define the attenuation performance index γ=0.9, and we can obtain the parameters of the consensus protocol based on Theorem 2.
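For reference, a commonly used discrete-time MDADT bound has the form τas* = ln μs/(−ln(1 − δs)); assuming condition (25) takes this standard form, the per-mode bounds and the single common ADT bound can be compared as follows (the μs, δs values are hypothetical, not the paper's):

```python
import math

def mdadt_bound(mu_s, delta_s):
    """Assumed standard discrete-time MDADT bound:
    tau_as* = ln(mu_s) / (-ln(1 - delta_s)), with mu_s > 1, 0 < delta_s < 1."""
    return math.log(mu_s) / (-math.log(1.0 - delta_s))

# Hypothetical per-mode parameters (mu_s, delta_s).
params = {1: (1.1, 0.20), 2: (1.3, 0.35)}
tau = {s: mdadt_bound(mu, d) for s, (mu, d) in params.items()}

# The ADT method must impose one common bound on every mode, so it is at
# least as restrictive as each mode-dependent bound:
tau_adt = max(tau.values())
```

The mode with the slower decay or larger jump factor dictates the common ADT bound, which is why mode-dependent bounds are tighter overall.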
The switching logic is shown in Figure 2. In order to illustrate the effectiveness and superiority of the proposed method, the traditional ADT method and the MDADT method are compared. From the statement above, we know that MDADT can obtain tighter bounds and less conservative results. Moreover, the comparisons of the state responses of the ADT method and the MDADT method are shown in Figures 3–6: the state responses of MASs with ADT switching topologies are shown in Figures 3-4, and the state responses of MASs with MDADT switching topologies are shown in Figures 5-6. We can see that the transient performance of the MDADT method is better than that of the ADT method because the different characteristics of the subsystems are taken into consideration, which improves the design flexibility and makes the method more applicable to practical conditions.
Figure 2.

The switching logic.
Figure 3.

The state response of (x)1 under the ADT method.
Figure 4.

The state response of (x)2 under the ADT method.
Figure 5.

The state response of (x)1 under the MDADT method.
Figure 6.

The state response of (x)2 under the MDADT method.
To validate the superiority of the proposed method, its state responses are shown in Figures 7–11, with the state responses themselves given in Figures 7-8. We can conclude that the transient performance can be improved with the aid of fuzzy deep Q learning, since the advantages of the traditional method and the intelligent method are combined: compared with the traditional method, the transient performance is improved, and compared with the intelligent method, stability and training efficiency are guaranteed. The attenuation performance index is shown in Figure 9, from which we can see that the robustness of the proposed method is ensured. The episode reward response is shown in Figure 10; the reward of the fuzzy deep Q learning algorithm converges to a neighborhood of the origin, which demonstrates the effectiveness of the algorithm in this study. In addition, the response of the action is shown in Figure 11, from which we can see that the learning-based consensus protocol compensates for the additional input caused by the uncertainties.
Figure 7.

The state response of (x)1 with the proposed method.
Figure 8.

The state response of (x)2 with the proposed method.
Figure 9.

The response of attenuation performance index.
Figure 10.

The response of episodes reward.
Figure 11.

The response of the action.
Based on the statement above, we can conclude that the convergence, robustness, and prescribed attenuation performance index are guaranteed. Less conservative results and tighter bounds on dwell time can be obtained by the MDADT method, and the transient performance of the system can be improved by the fuzzy deep Q learning algorithm. It is worth mentioning that the traditional robust method cannot achieve a compromise between robustness and transient performance, and the intelligent method generally cannot guarantee convergence. By employing the proposed method, convergence, robustness, and transient performance are guaranteed simultaneously.
6. Conclusions
The problem of intelligent L2-L∞ consensus design for MASs under switching topologies is investigated in this study. The switching topologies of MASs are modeled in the framework of switched systems, and by employing a linear transformation, the problem of consensus protocol design is converted into an L2-L∞ control problem. To ensure convergence, robustness, and transient performance simultaneously, the proposed consensus protocol is composed of a dynamics-based consensus protocol and a learning-based consensus protocol, which provide the baseline and the compensation of uncertainties, respectively. The baseline consensus protocol is obtained from the dynamics-based design, which is derived based on the MDADT method and the MLF method, and the scheduling interval of the learning-based protocol is given by nonfragile control theory. Then, the learning-based consensus protocol is designed based on the fuzzy deep Q learning algorithm to improve the transient performance and achieve the optimal policy, where the fuzzy reward function is introduced to improve the learning efficiency.
Algorithm 1.

Learning-based control protocol design based on FDQN.
Acknowledgments
This study was cosupported by the National Natural Science Foundation of China (61973253, 62176214, and 62101590), the Aeronautical Science Foundation of China (201907053001), and the National Natural Science Foundation of Shaanxi Province (2019JQ-014 and 2020JQ-481).
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
- 1. Zhao Y., Duan Q., Wen G., Zhang D., Wang B. Time-varying formation for general linear multiagent systems over directed topologies: a fully distributed adaptive technique. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2021;51(1):532–541. doi: 10.1109/tsmc.2018.2877818.
- 2. Zhou S., Dong X., Hua Y., Yu J., Ren Z. Predefined formation-containment control of high-order multi-agent systems under communication delays and switching topologies. IET Control Theory & Applications. 2021;15(12):1661–1672. doi: 10.1049/cth2.12150.
- 3. Zhang D., Tang Y., Ding Z., Qian F. Event-based resilient formation control of multiagent systems. IEEE Transactions on Cybernetics. 2021;51(5):2490–2503. doi: 10.1109/tcyb.2019.2910614.
- 4. Zhang K., Zhou B., Jiang H., Duan G. Finite-time output regulation by bounded linear time-varying controls with applications to the satellite formation flying. International Journal of Robust and Nonlinear Control. 2021;32:1–21. doi: 10.1002/rnc.5832.
- 5. Liu H., Wang Y., Xi J. Completely distributed formation control for networked quadrotors under switching communication topologies. Systems & Control Letters. 2021;147:104841. doi: 10.1016/j.sysconle.2020.104841.
- 6. Wang L., Xi J., He M., Liu G. Robust time-varying formation design for multiagent systems with disturbances: extended-state-observer method. International Journal of Robust and Nonlinear Control. 2020;30(7):2796–2808. doi: 10.1002/rnc.4941.
- 7. Zou W., Shi P., Xiang Z., Shi Y. Finite-time consensus of second-order switched nonlinear multi-agent systems. IEEE Transactions on Neural Networks and Learning Systems. 2020;31(5):1757–1762. doi: 10.1109/tnnls.2019.2920880.
- 8. Yang Y., Si X., Yue D., Tian Y.-C. Time-varying formation tracking with prescribed performance for uncertain nonaffine nonlinear multiagent systems. IEEE Transactions on Automation Science and Engineering. 2021;18(4):1778–1789. doi: 10.1109/tase.2020.3019346.
- 9. Yu J., Dong X., Li Q., Lv J., Ren Z. Distributed adaptive cooperative time-varying formation tracking guidance for multiple aerial vehicles system. Aerospace Science and Technology. 2021;117:106925. doi: 10.1016/j.ast.2021.106925.
- 10. Zou W., Zhou C., Guo J., Xiang Z. Global adaptive leader-following consensus for second-order nonlinear multiagent systems with switching topologies. IEEE Transactions on Circuits and Systems II: Express Briefs. 2021;68(2):702–706. doi: 10.1109/tcsii.2020.3009291.
- 11. Zhang P., Xue H., Gao S., Zhang J. Distributed adaptive consensus tracking control for multi-agent system with communication constraints. IEEE Transactions on Parallel and Distributed Systems. 2021;32(6):1293–1306. doi: 10.1109/tpds.2020.3048383.
- 12. Nguyen T.-M., Qiu Z., Nguyen T. H., Cao M., Xie L. Persistently excited adaptive relative localization and time-varying formation of robot swarms. IEEE Transactions on Robotics. 2020;36(2):553–560. doi: 10.1109/tro.2019.2954677.
- 13. Li B., Wen G., Peng Z., Wen S., Huang T. Time-varying formation control of general linear multi-agent systems under Markovian switching topologies and communication noises. IEEE Transactions on Circuits and Systems II: Express Briefs. 2021;68(4):1303–1307. doi: 10.1109/tcsii.2020.3023078.
- 14. Mu R., Wei A., Li H., Wang Z. M. Event-triggered leader-following consensus for multi-agent systems with external disturbances under fixed and switching topologies. IET Control Theory & Applications. 2020;14(11):1486–1496. doi: 10.1049/iet-cta.2019.0925.
- 15. Yuan Y., Wang Y., Guo L. Sliding-mode-observer-based time-varying formation tracking for multispacecrafts subjected to switching topologies and time-delays. IEEE Transactions on Automatic Control. 2021;66(8):3848–3855. doi: 10.1109/tac.2020.3030866.
- 16. Zhao X., Shi P., Yin Y., Nguang S. K. New results on stability of slowly switched systems: a multiple discontinuous Lyapunov function approach. IEEE Transactions on Automatic Control. 2017;62(7):3502–3509. doi: 10.1109/tac.2016.2614911.
- 17. Niu B., Zhao P., Liu J., Ma H., Liu Y. Global adaptive control of switched uncertain nonlinear systems: an improved MDADT method. Automatica. 2020;115:108872.
- 18. Yan H., Zhang H., Zhan X., Wang Y., Chen S., Yang F. Event-triggered sliding mode control of switched neural networks with mode-dependent average dwell time. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2021;51(2):1233–1243. doi: 10.1109/tsmc.2019.2894984.
- 19. Yu D., Chen C. L. P. Smooth transition in communication for swarm control with formation change. IEEE Transactions on Industrial Informatics. 2020;16(11):6962–6971. doi: 10.1109/tii.2020.2971356.
- 20. Xiao H., Philip Chen C. L. Time-varying nonholonomic robot consensus formation using model predictive based protocol with switching topology. Information Sciences. 2021;567:201–215. doi: 10.1016/j.ins.2021.01.034.
- 21. Ren Y., Hou Z. Robust model-free adaptive iterative learning formation for unknown heterogeneous non-linear multi-agent systems. IET Control Theory & Applications. 2020;14(4):654–663. doi: 10.1049/iet-cta.2019.0738.
- 22. Xu Y., Zhao S., Luo D., You Y. Affine formation maneuver control of high-order multi-agent systems over directed networks. Automatica. 2020;118:109004.
- 23. Liu H., Li S., Li G., Wang H. Robust adaptive control for fractional-order financial chaotic systems with system uncertainties and external disturbances. Information Technology and Control. 2017;46(2):246–259. doi: 10.5755/j01.itc.46.2.13972.
- 24. Wang D., Huang Y., Guo S., Wang W. Distributed H∞ containment control of multi-agent systems over switching topologies with communication time delay. International Journal of Robust and Nonlinear Control. 2020;30(1):1–12. doi: 10.1002/rnc.5055.
- 25. Wang J., Han L., Li X., Dong X., Li Q., Ren Z. Time-varying formation of second-order discrete-time multi-agent systems under non-uniform communication delays and switching topology with application to UAV formation flying. IET Control Theory & Applications. 2020;14(14):1947–1956. doi: 10.1049/iet-cta.2020.0183.
- 26. Liu D., Liu H., Lewis F. L., Valavanis K. P. Robust time-varying formation control for tail-sitters in flight mode transitions. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2021;51(7):4102–4111. doi: 10.1109/tsmc.2019.2931482.
- 27. De Tommasi G., Lui D. G., Petrillo A., Santini S. A L2-gain robust PID-like protocol for time-varying output formation-containment of multi-agent systems with external disturbance and communication delays. IET Control Theory & Applications. 2021;15(9):1169–1184. doi: 10.1049/cth2.12114.
- 28. Wang X., Gu Y., Cheng Y., Liu A., Chen C. L. P. Approximate policy-based accelerated deep reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems. 2020;31(6):1820–1830. doi: 10.1109/tnnls.2019.2927227.
- 29. Peng Z., Wang D., Li T., Han M. Output-feedback cooperative formation maneuvering of autonomous surface vehicles with connectivity preservation and collision avoidance. IEEE Transactions on Cybernetics. 2020;50(6):2527–2535. doi: 10.1109/tcyb.2019.2914717.
- 30. Liu H., Pan Y., Li S., Chen Y. Synchronization for fractional-order neural networks with full/under-actuation using fractional-order sliding mode control. International Journal of Machine Learning and Cybernetics. 2018;9(7):1219–1232. doi: 10.1007/s13042-017-0646-z.
- 31. Li H., Chen S., Izzo D., Baoyin H. Deep networks as approximators of optimal low-thrust and multi-impulse cost in multitarget missions. Acta Astronautica. 2020;166:469–481. doi: 10.1016/j.actaastro.2019.09.023.
- 32. Tan T., Bao F., Deng Y., Jin A., Dai Q., Wang J. Cooperative deep reinforcement learning for large-scale traffic grid signal control. IEEE Transactions on Cybernetics. 2020;50(6):2687–2700. doi: 10.1109/tcyb.2019.2904742.
- 33. Shalumov V. Cooperative online guide-launch-guide policy in a target-missile-defender engagement using deep reinforcement learning. Aerospace Science and Technology. 2020;104:105996.
- 34. Xu J., Hou Z., Wang W., Xu B., Zhang K., Chen K. Feedback deep deterministic policy gradient with fuzzy reward for robotic multiple peg-in-hole assembly tasks. IEEE Transactions on Industrial Informatics. 2019;15(3):1658–1667. doi: 10.1109/tii.2018.2868859.
- 35. Gheisarnejad M., Khooban M. H. An intelligent non-integer PID controller-based deep reinforcement learning: implementation and experimental results. IEEE Transactions on Industrial Electronics. 2021;68(4):3609–3618. doi: 10.1109/tie.2020.2979561.
- 36. Cheng L., Wang Z., Jiang F. Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm. Astrodynamics. 2019;3(4):375–386. doi: 10.1007/s42064-018-0052-2.
- 37. Shi T., Shi P., Zhang L. Distributed L2-L∞ consensus of multi-agent systems under asynchronous switching topologies. International Journal of Control. 2022;95(2):544–553. doi: 10.1080/00207179.2020.1803409.
- 38. Li X., Zhou C., Zhou J., Wang Z., Xia J. Couple-group L2-l∞ consensus of nonlinear multi-agent systems with Markovian switching topologies. International Journal of Control, Automation and Systems. 2019;17(3):575–585. doi: 10.1007/s12555-018-0550-7.
- 39. Zhou J., Wang Y., Zheng X., Wang Z., Shen H. Weighted H∞ consensus design for stochastic multi-agent systems subject to external disturbances and ADT switching topologies. Nonlinear Dynamics. 2019;96(2):853–868. doi: 10.1007/s11071-019-04826-9.