PLoS One. 2024 Jul 24;19(7):e0307767. doi: 10.1371/journal.pone.0307767

Adaptive control for circulating cooling water system using deep reinforcement learning

Jin Xu 1,#, Han Li 1,#, Qingxin Zhang 1,*,#
Editor: Lalit Chandra Saikia
PMCID: PMC11268623  PMID: 39047030

Abstract

Due to the complex internal working process of circulating cooling water systems, most traditional control methods struggle to achieve stable and precise control. Therefore, this paper presents a novel adaptive control structure based on the Twin Delayed Deep Deterministic Policy Gradient algorithm with a reference trajectory model (TD3-RTM). The structure is built on a Markov decision process formulation of the recirculating cooling water system. Initially, the TD3 algorithm is employed to construct a deep reinforcement learning agent. Subsequently, a state space is selected and a dense reward function is designed, considering the multivariable characteristics of the recirculating cooling water system. The agent updates its networks based on the reward values obtained through interactions with the system, thereby gradually aligning its actions with the optimal policy. The TD3-RTM method introduces a reference trajectory model to accelerate the convergence of the agent and to reduce oscillations and instability in the control system. Simulation experiments were then conducted in MATLAB/Simulink. The results show that, compared to PID, fuzzy PID, DDPG and TD3, the TD3-RTM method reduced the transient time in the flow loop by 6.09 s, 5.29 s, 0.57 s, and 0.77 s, respectively, and the integral of absolute error (IAE) by 710.54, 335.1, 135.97, and 89.96, respectively; in the temperature loop, the transient time was reduced by 25.84 s, 13.65 s, 15.05 s, and 0.81 s, and the IAE by 143.9, 59.13, 31.79, and 1.77, respectively. In addition, the overshoot of the TD3-RTM method in the flow loop was reduced by 17.64, 7.79, and 1.29 percentage points, respectively, in comparison with the PID, fuzzy PID, and TD3 controllers.

1 Introduction

Many industrial production processes generate large amounts of waste heat, which must be absorbed promptly by cold water or other liquids to keep production running normally. The cold water used for this purpose is called cooling water in industrial production. To save water resources and reduce energy costs, industrial cooling water is commonly recycled, forming a circulating cooling water system. With the continuous development of modern industry, the circulating cooling water system, as an essential cooling method, is widely used in various production processes, such as pharmaceuticals, electric power, chemicals, metallurgy, and marine engines. Optimizing the control of circulating cooling water systems can improve industrial production efficiency and reduce energy consumption and maintenance costs. Therefore, achieving efficient control of circulating cooling water systems has become an important research topic.

At present, the control methods employed in circulating cooling water systems are predominantly based on traditional PID control [1,2], fuzzy control [3-5], model predictive control (MPC) [6,7], intelligent optimization algorithms [8-10] and other traditional methods. For example, Xia et al. [11] proposed the use of a PID controller and a fuzzy PID controller as the control strategy for the temperature controller of the circulating cooling water system in a fuel cell engine, which resulted in a notable reduction in temperature fluctuations during the water temperature mixing process. Terzi et al. [12] proposed a model predictive control algorithm as the control strategy for an industrial plant's circulating cooling water system, which improved control performance. Zhang et al. [13] coupled an artificial neural network optimized by a genetic algorithm with a heat transfer model of the condenser and air-cooling heat exchanger to optimize and control the mass flow of circulating cooling water in the indirect cooling system of thermal power units, with the objective of enhancing the efficiency of the circulating cooling water system and reducing costs. However, these methods all have certain limitations, making it difficult for them to adapt to the nonlinear dynamic characteristics of the system and the uncertainty in the operation process. To a significant extent, they depend on prior knowledge and necessitate the development of sophisticated system models and the adjustment of parameters. For instance, the PID control method requires manual tuning and cannot accommodate the intricate dynamic changes of the system. Although the fuzzy control method can effectively handle uncertainty, it often requires considerable expertise to design fuzzy rules and may struggle to achieve optimal control. MPC, in contrast, can optimize control strategies based on prediction, thereby enhancing control performance; however, it is computationally demanding, which presents a challenge for real-time, high-frequency control applications, and it is sensitive to model accuracy and measurement precision, which may result in suboptimal performance for unknown systems or in the presence of model errors. Intelligent optimization algorithms exhibit strong adaptability but may become stuck in suboptimal local solutions, thus failing to ensure optimal control of the system.

In recent years, with the continuous development of artificial intelligence technology, theories and techniques such as deep learning and reinforcement learning have been widely applied in many fields, such as games [14,15], robot control [16-18], building energy efficiency [19], natural language processing [20], autonomous driving [21-23], and fault diagnosis [24,25]. Reinforcement learning (RL) is a machine learning method that learns optimal decisions through trial and error. Its powerful nonlinear modeling and adaptive learning abilities have brought new opportunities for controlling the circulating cooling water system. For example, Qiu et al. [26] proposed a model-free optimal control method based on reinforcement learning for building cooling water systems, which gives it broad application prospects in the building sector, where accurate system performance models are generally lacking. Wu et al. [27] proposed a PI controller based on reinforcement learning to control a nonlinear, coupled two-input, two-output vapor compression refrigeration system, realizing adaptive control and improving control performance. Compared with traditional control methods, reinforcement learning can automatically learn the system's dynamic characteristics and operating rules without manual tuning of control parameters and offers better adaptability and intelligence. In addition, multi-agent reinforcement learning can realize collaborative control among multiple circulating cooling water systems and further improve control efficiency and stability. For example, Fu et al. [28] proposed a multi-agent deep reinforcement learning method for building cooling water system control to optimize the load distribution, cooling tower fan frequency, and cooling pump frequency of different cooling water systems. Furthermore, the safety of industrial processes usually requires solving constrained optimal control (COC) problems. Zhang et al. [29] proposed a new safety-enhanced learning algorithm for COC problems of continuous-time nonlinear systems with unknown dynamics and perturbations. For the uncertainties in the bridge crane system, such as payload mass and unmodeled dynamics, a new model-free online reinforcement learning control method for real-time position adjustment and anti-sway control of bridge cranes was proposed [30], which combines the advantages of adaptive and optimal control and exhibits satisfactory performance without knowledge of the system model. These research results show that reinforcement learning methods have broad application prospects in industrial process control.

In order to ascertain whether deep reinforcement learning methods offer certain advantages over traditional control methods in the recirculating cooling water system, and to address issues such as the inability of traditional control methods to achieve stable and precise control of the controlled system, this paper proposes the design of an adaptive control structure for the recirculating cooling water system with the objective of improving the system’s control performance. This paper makes the following contributions:

1) The design of an adaptive control structure based on the Twin Delayed Deep Deterministic Policy Gradient algorithm under a reference trajectory model (TD3-RTM) enables end-to-end control of the recirculating cooling water system at the simulation level.

2) The state space and reward function were designed to consider the multivariable characteristics of the recirculating cooling water system. A reference trajectory model was introduced to accelerate the convergence speed of the agent and reduce oscillations and instability in the control system.

3) The exploration of the potential application of deep reinforcement learning in the recirculating cooling water system, with the objective of providing references and insights for control problems in the industrial field.

The rest of this paper is organized as follows: Section 2 provides the background, introducing the basics of deep reinforcement learning, the working principle of the circulating cooling water system, and the system model. Section 3 describes the methods, outlining the design of the adaptive control structure based on TD3-RTM. Section 4 presents the experiments and analysis of results. Section 5 concludes the paper and outlines future research directions.

2 Background

The prerequisite for combining the control of a circulating cooling water system with reinforcement learning is establishing a Markov model of the circulating cooling water system. The working principle and model of the circulating cooling water system and Markov decision process (MDP) based on the circulating cooling water system are described below.

2.1 Circulating cooling water system

The circulating cooling water system comprises a temperature sensor, flowmeter, pressure sensor, heat exchanger, electric control valve, manual butterfly valve, check valve, frequency conversion pump, and other equipment. The schematic system diagram is shown in Fig 1.

Fig 1. Schematic diagram of circulating cooling water system.


Cold water flows into the line through the electric regulating valve M1. When the pressure sensor P1 detects that the pressure in the main line exceeds the safety value required by the system, the opening of the electric regulating valve M2 increases and part of the cold water is discharged for pressure relief; M2 is connected to a check valve to prevent backflow. The remaining cold water enters the main line through M1, is measured by the flowmeter and the temperature sensor T2, and then enters the heat exchanger. As heat exchange proceeds, part of the hot water is discharged through the electric regulating valve M4, which is connected to a check valve. The remaining hot water is mixed with the cold water through the electric regulating valve M3. This cycle ensures that the cold water flowing into the heat exchanger has a constant temperature, thus ensuring the stability and safety of the system.

Flow and temperature are crucial control objectives in a circulating cooling water system. To simplify modeling and control complexity, this paper focuses on these two critical variables as the primary targets for controlling the circulating cooling water system. On the other hand, over the past decades, the successful application of single-variable control theory has demonstrated the convenience and effectiveness of using transfer functions to express and analyze control systems. Therefore, transfer function matrices are employed in this paper to describe and analyze circulating cooling water systems.

This paper represents the circulating cooling water system as a multivariable model with two inputs and two outputs, as shown in Fig 2. The input variables are the openings of the electric regulating valves M1 and M3. The output variables are the water flow and temperature into the heat exchanger. The linear transfer functions G11, G12, G21, and G22 represent the relationship between the input and output variables of the system, where the first subscript denotes the output and the second the input. For example, G21 represents the effect of the opening of valve M1 on temperature.

Fig 2. Model of circulating cooling water system.


The transfer functions G11, G12, G21, and G22, which represent the dynamic behavior of the system, need to be identified, and the best pairing of variables needs to be found for the controller design. Therefore, the best-paired variables were found by selecting different variable pairs and observing the regulation state of the system during the experiment, and input and output data were then collected with the system at steady state. The collected data were first preprocessed to remove outliers and noise, and the transfer function model G(S) of the circulating cooling water system [31] was then obtained using the MATLAB System Identification Toolbox, as shown in Eq 1.

G(S) = \begin{bmatrix} \dfrac{0.7541S + 0.002914}{S^2 + 0.08358S + 0.0002578} & \dfrac{24.65S + 0.02572}{S^2 + 2.529S + 0.003538} \\[1.5ex] \dfrac{4.721\times10^{-5}S - 3.809\times10^{-6}}{S^2 + 0.2304S + 4.309\times10^{-14}} & \dfrac{0.354S + 0.0006877}{S^2 + 1.189S + 0.002565} \end{bmatrix} \quad (1)
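For readers who wish to reproduce the simulation, the identified model can be entered directly as a 2x2 transfer function matrix. The sketch below is a minimal example assuming the Control System Toolbox; the sign of the first-order numerator term of G21 follows the reconstruction of Eq 1.

```matlab
% Identified 2x2 transfer function matrix of the circulating cooling water
% system (Eq 1): rows = outputs (flow F, temperature T),
% columns = inputs (valve openings M1, M3).
num = {[0.7541 0.002914],      [24.65 0.02572];
       [4.721e-05 -3.809e-06], [0.354 0.0006877]};
den = {[1 0.08358 0.0002578],  [1 2.529 0.003538];
       [1 0.2304 4.309e-14],   [1 1.189 0.002565]};
G = tf(num, den);   % requires Control System Toolbox

% Quick check: open-loop step responses of all four channels
step(G);
```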

2.2 Markov decision model of circulating cooling water system

The mathematical foundation and modeling tool of reinforcement learning is the MDP. An MDP usually comprises a state space s, an action space a, a state transition function P, a reward function r, and a discount factor γ. At any time step t, the agent first observes the current state s_t of the environment and the corresponding reward value r_t. Based on this state and reward information, the agent takes an action a_t and obtains the next state s_{t+1} and reward r_{t+1} from the environment. The interaction between the reinforcement learning agent and the environment in the control system is shown in Fig 3.

Fig 3. Agent and environment interaction process.


In control system terminology, the "agent" refers to the designed controller, and the "environment" refers to the system outside the controller, which in this paper is the circulating cooling water system. The policy represents the optimal control behavior sought by the designer. As shown in Fig 3, in the interaction between the agent and the environment, the state s represents the features and parameters measured by the sensors in the circulating cooling water system, such as flow and temperature. The action a represents the opening values of the electric regulating valves determined by the agent based on the current state of the system. The reward r is the feedback obtained by the agent after taking a specific action in a specific state; rewards are used to evaluate the quality of the agent's behavior and to guide decision-making in different states. In the context of the circulating cooling water system, rewards measure the control effectiveness and performance of the system. In deep reinforcement learning, the state transition function P is often unknown, so the agent estimates the transition probabilities through interaction with the environment and learns and optimizes the control strategy accordingly. The design of the control strategy based on deep reinforcement learning relies on the design of the state, action, and reward function, and on the reinforcement learning algorithm; Section 3 describes this design in detail.

3 Methods

In this study, a deep reinforcement learning approach is used to design an adaptive controller for the circulating cooling water system. In deep reinforcement learning, the neural network is used as the value function or parameterized policy, while the gradient optimization method is used to optimize the loss. Here, the twin delayed deep deterministic policy gradient [32] (TD3) algorithm, which is an actor-critic framework to deal with continuous action space problems, is employed to optimize the control parameters in the circulating cooling water system.

The TD3 algorithm is an actor-critic deep reinforcement learning algorithm that extends the Deep Deterministic Policy Gradient (DDPG) algorithm [33]. Since the value network of DDPG tends to overestimate the action-value function, TD3 improves on DDPG in three respects: clipped double Q-learning alleviates the overestimation of the critic network; adding clipped, normally distributed noise to the output action of the target policy network improves the robustness and smoothness of the algorithm; and updating the policy network and the three target networks less frequently than the value network reduces the variance of the approximate action-value function, yielding a better policy. The TD3 algorithm was selected for controlling the circulating cooling water system in this study because of its capability to handle continuous action spaces, its twin Q networks that mitigate overestimation bias, and its delayed policy updates and soft target updates that reduce function approximation errors, thereby delivering more stable and precise control. Furthermore, TD3's deep neural networks can effectively model the complex nonlinear and multivariable characteristics of the system, facilitating real-time adaptation and optimized control and thereby enhancing system performance.
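As a minimal illustration of how such an agent can be instantiated at the simulation level, the sketch below assumes the MATLAB Reinforcement Learning Toolbox; the observation and action dimensions follow the state and action spaces defined in Section 3.1, and all variable names are illustrative.

```matlab
% Observation: 10-dimensional state vector (Eq 2).
% Action: two valve openings, each in [0, 100].
obsInfo = rlNumericSpec([10 1]);
actInfo = rlNumericSpec([2 1], 'LowerLimit', 0, 'UpperLimit', 100);

% TD3 agent with default twin critics and actor networks
% (the custom networks used in this paper are described in Section 3.2).
agent = rlTD3Agent(obsInfo, actInfo);
```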

Choosing an appropriate deep reinforcement learning algorithm is only part of designing a controller. The design of the states, actions, and rewards is equally crucial in determining the agent's learning capability, control performance, and adaptability to dynamic environments. Thoughtful, well-tailored designs of these elements are essential for successful and efficient learning. The following subsections explain the selection and design of the states, actions, and reward functions.

3.1 Control strategy design

3.1.1 State

The state reflects essential information in the interaction between the agent and the environment, and the selection of the state space directly affects the agent's decision-making and thus the overall control performance of the system. The state should therefore contain sufficient information to describe the current situation. In the circulating cooling water system, where the actuators exhibit nonlinear characteristics and the process gain varies with the manipulated variables, the state space chosen in this study is:

s = \left[\, e_F,\ e_T,\ \textstyle\int e_F\,dt,\ \textstyle\int e_T\,dt,\ F,\ T,\ F_{sp},\ T_{sp},\ a_1,\ a_2 \,\right]^{T} \quad (2)

where e_F = F_sp - F and e_T = T_sp - T are the control errors of the flow and temperature loops, respectively; ∫e_F dt and ∫e_T dt are the corresponding error integrals; F and T are the measured flow and temperature outputs; F_sp and T_sp are the flow and temperature setpoints; and a_1 and a_2 are the manipulated variables of the flow and temperature loops, i.e., the action values output by the agent.
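A minimal sketch of how this observation vector could be assembled at each sampling instant is shown below; the function name, the rectangular integration of the errors, and the sampling-time argument Ts are illustrative assumptions, not part of the original implementation.

```matlab
function [s, intEF, intET] = buildObservation(F, T, Fsp, Tsp, a1, a2, intEF, intET, Ts)
% Assemble the 10-dimensional state vector of Eq 2.
% F, T       - measured flow and temperature
% Fsp, Tsp   - setpoints
% a1, a2     - previous valve openings (agent actions)
% intEF/ET   - running error integrals, Ts - sampling time
eF = Fsp - F;
eT = Tsp - T;
intEF = intEF + eF * Ts;   % rectangular approximation of the integral
intET = intET + eT * Ts;
s = [eF; eT; intEF; intET; F; T; Fsp; Tsp; a1; a2];
end
```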

3.1.2 Action

Actions are the decisions taken by the agent in specific states, and the agent's task is to choose appropriate actions in different states to maximize its long-term reward. In reinforcement learning, actions are determined by the agent's policy; in a control system, they correspond to the manipulated variables applied to the system. In this study, the actions are the opening values of the electric regulating valves in the circulating cooling water system: a_1 is the opening of valve M1 and a_2 is the opening of valve M3. The range of the action values is [0, 100], so the action space is:

a = \left[\, a_1,\ a_2 \,\right]^{T} \quad (3)

3.1.3 Reward

The reward function is a crucial concept in reinforcement learning, which is used to evaluate the performance of an agent in an environment. The reward function is typically a mapping from the state and action space to a real number, representing the desirability of an action taken by the agent in each state. In reinforcement learning, the objective of the agent is to maximize the accumulated reward by interacting with the environment. Therefore, the reward function can be viewed as the objective function of the reinforcement learning task. By adjusting its policy, the agent can attempt to maximize the reward function and learn how to take optimal actions in different states of the environment.

In some reinforcement learning tasks, the reward function is typically designed such that the agent receives a reward only when the output values satisfy the system requirements. This type of reward function is known as a sparse reward function. In simple environments like single-variable systems, using a sparse reward function can still yield good control results. However, in a multivariate system, transferring the state of the system environment to the target state becomes more complex and uncertain than that of a univariate system. Therefore, based on the characteristics of circulating cooling water systems, this paper designs a dense reward function. For the flow loop, the dense reward function is set as follows:

r_1 = \begin{cases} 100, & |e_{Ft}| \le \varphi_1 \\ 1/|e_{Ft}|, & \varphi_1 < |e_{Ft}| \le \varphi_2 \\ -|e_{Ft}|, & |e_{Ft}| > \varphi_2 \end{cases} \quad (4)

where e_{Ft} is the flow error at the current time step, and φ_1 and φ_2 are the thresholds that divide the flow error into intervals. When the error satisfies the system's goal requirements, the agent receives a large reward to encourage the current behavior. In this paper, φ_1 = 0.1 and φ_2 = 5.

Furthermore, the temperature loop reward function is designed in the same way as the flow loop. Therefore, the reward function of the temperature loop is defined as:

r_2 = \begin{cases} 100, & |e_{Tt}| \le \eta_1 \\ 1/|e_{Tt}|, & \eta_1 < |e_{Tt}| \le \eta_2 \\ -|e_{Tt}|, & |e_{Tt}| > \eta_2 \end{cases} \quad (5)

where e_{Tt} is the temperature error at the current time step, and η_1 and η_2 are the thresholds that divide the temperature error into intervals. In this paper, η_1 = 0.1 and η_2 = 2.

Finally, the reward function rt based on the circulating cooling water system is defined as

r_t = r_1 + r_2 \quad (6)
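A minimal MATLAB sketch of Eqs 4-6 is given below, using the thresholds stated above (φ_1 = 0.1, φ_2 = 5, η_1 = 0.1, η_2 = 2); the absolute-value form of the middle and outer branches follows the reconstruction of Eqs 4 and 5, and the function name is illustrative.

```matlab
function rt = denseReward(eF, eT)
% Dense reward of Eqs 4-6 for the flow and temperature loops.
phi1 = 0.1; phi2 = 5;    % flow-error thresholds
eta1 = 0.1; eta2 = 2;    % temperature-error thresholds

if abs(eF) <= phi1
    r1 = 100;
elseif abs(eF) <= phi2
    r1 = 1 / abs(eF);
else
    r1 = -abs(eF);
end

if abs(eT) <= eta1
    r2 = 100;
elseif abs(eT) <= eta2
    r2 = 1 / abs(eT);
else
    r2 = -abs(eT);
end

rt = r1 + r2;            % Eq 6
end
```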

3.2 Network structure and algorithm design

The network structure of the TD3 algorithm comprises four principal components: the Actor network, the Critic network, the Target Actor network, and the Target Critic network. The Actor network generates a policy for continuous actions based on the current state. The Critic network is responsible for estimating the Q-value for the current state and action pair. The Target Actor and Target Critic networks serve as target networks for the Actor and Critic networks, respectively. The Target Actor and Target Critic networks have the same structure as the Actor and Critic networks, respectively, and their parameters are updated through soft updates from the Actor and Critic networks. In this study, the Actor and Critic networks are implemented with three-layer neural networks, comprising 128 and 64 neurons in their respective hidden layers. The rectified linear unit (ReLU) function is employed as the activation function. Furthermore, as the control actuator in the circulating cooling water system is an electric regulating valve with a range of 0 to 100, the output of the Actor network is normalized to the range of [–1, 1] using the tanh function and then scaled using the scaling operation.
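A sketch of an actor network matching this description is shown below, assuming the Deep Learning and Reinforcement Learning Toolboxes; the scaling layer maps the tanh output from [-1, 1] to the valve range [0, 100], and the layer names are illustrative. The critic networks use the same 128/64 hidden-layer structure but take the state and action as joint inputs.

```matlab
actorNet = [
    featureInputLayer(10, 'Name', 'state')     % 10-dimensional observation (Eq 2)
    fullyConnectedLayer(128, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(64, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(2, 'Name', 'fc3')       % two valve openings
    tanhLayer('Name', 'tanh')                   % output normalized to [-1, 1]
    scalingLayer('Name', 'scale', 'Scale', 50, 'Bias', 50)  % map to [0, 100]
    ];
```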

To enhance the exploration and learning capabilities of the agent, this paper introduces a reference trajectory model. This model guides the agent to converge more rapidly to the desired control policy during the learning process, thereby improving the control effectiveness and learning speed of reinforcement learning. The reference trajectory model utilized in this study is:

F_r(s) = \dfrac{1}{\tau_r s + 1} \quad (7)

In addition, in practical applications, setpoints may experience sudden changes or instability, which can lead to unstable performance or oscillations in the control system. By introducing the reference trajectory model, the setpoint signal can be smoothed to make its changes more gradual and smoother, thereby helping to reduce oscillations and instability in the control system. In this paper, τr equals 0.2. The design of the control system is illustrated in Fig 4.
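In practice this amounts to low-pass filtering the setpoint. A minimal sketch with the Control System Toolbox, using τ_r = 0.2 as stated above (signal names are illustrative):

```matlab
tauR = 0.2;
Fr   = tf(1, [tauR 1]);              % reference trajectory model, Eq 7

t         = (0:0.1:20)';             % 0.1 s sampling over a 20 s horizon
Fsp       = 600 * ones(size(t));     % raw flow setpoint step (600 m^3/h)
FspSmooth = lsim(Fr, Fsp, t);        % smoothed setpoint fed to the agent
```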

Fig 4. Control structure of circulating cooling water based on TD3-RTM.


In this study, a controller for the circulating cooling water system is designed based on the TD3 algorithm. Its control strategy is shown in Algorithm 1.

Algorithm 1. TD3 algorithm in circulating cooling water control system.

 Initialize replay buffer M; initialize critic networks Q_θ1 and Q_θ2 with parameters θ1, θ2; initialize actor network π_φ with parameter φ; initialize target network parameters θ'1 ← θ1, θ'2 ← θ2, φ' ← φ.

for each time step t do

 Randomly initialize the flow and temperature setpoints within the range allowed by the system.

 Select action a_t ← π_φ(s_t) + ε, ε ~ N(0, σ); receive reward r_t and observe next state s_{t+1}.

 Store the transition (s_t, a_t, r_t, s_{t+1}) in M.

 Sample a mini-batch of B transitions from M.

 ã_{t+1} ← π_{φ'}(s_{t+1}) + ε, ε ~ clip(N(0, σ), -c, c), c > 0

 y ← r_t + γ min_{i=1,2} Q_{θ'_i}(s_{t+1}, ã_{t+1})

 Update the value networks: θ_i ← argmin_{θ_i} B^{-1} Σ (y - Q_{θ_i}(s_t, a_t))²

 if t mod d then

  Update φ by the deterministic policy gradient:
  ∇_φ J(φ) = B^{-1} Σ ∇_a Q_{θ1}(s_t, a)|_{a = π_φ(s_t)} ∇_φ π_φ(s_t)

  Update the target networks, where ρ is the soft update factor:
  θ'_i ← ρ θ_i + (1 - ρ) θ'_i,   φ' ← ρ φ + (1 - ρ) φ'

 end if

end for

4 Experiments and analysis of results

In the training process of TD3-RTM in this study, the total number of episodes is set to 2000, with a sampling time of 0.1 seconds and a maximum simulation duration of 20 seconds. To enhance the disturbance rejection control performance of the system, random step signals with amplitudes ranging from -5 to 5 are applied at the control ports of the flow and temperature loops at the 15th second. The reference step input signals for the flow (m^3/h) and temperature (°C) are set to [550, 650] and [20, 30], respectively, to achieve robustness to significant setpoint changes in the system. Since TD3-RTM is based on the TD3 algorithm, the primary hyperparameters used in the training process of the TD3 algorithm are shown in Table 1.
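At the simulation level, this training protocol could be configured roughly as in the sketch below; the Reinforcement Learning Toolbox is assumed, the Simulink model and agent block names are placeholders, obsInfo, actInfo and agent refer to the earlier sketches, and option names may differ between releases.

```matlab
% Environment wrapping a Simulink model of the cooling water system
% ('coolingWaterSys' and the agent block path are placeholder names).
env = rlSimulinkEnv('coolingWaterSys', 'coolingWaterSys/RL Agent', obsInfo, actInfo);

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', 200, ...        % 20 s horizon / 0.1 s sample time
    'ScoreAveragingWindowLength', 20, ...
    'StopTrainingCriteria', 'EpisodeCount', ...
    'StopTrainingValue', 2000);

trainingStats = train(agent, env, trainOpts);
```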

Table 1. Hyperparameter settings of Algorithm 1.

Hyperparameters Values
Discount factor, γ 0.995
Mini-batch size 128
Replay buffer size 1e6
Critic learning rate 1e-3
Actor learning rate 5e-4
Target update frequency 10
Exploration model Gaussian noise
Variance, σ 0.2
Variance decay rate 1e-5
Policy update frequency 2
Soft update factor, ρ 5e-3
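The hyperparameters in Table 1 map onto the TD3 agent options roughly as in the sketch below (Reinforcement Learning Toolbox assumed; property names follow R2022b and may differ in other releases, and the table's "variance, σ = 0.2" is interpreted here as the exploration-noise standard deviation).

```matlab
agentOpts = rlTD3AgentOptions( ...
    'SampleTime',             0.1, ...
    'DiscountFactor',         0.995, ...
    'MiniBatchSize',          128, ...
    'ExperienceBufferLength', 1e6, ...
    'TargetUpdateFrequency',  10, ...
    'PolicyUpdateFrequency',  2, ...
    'TargetSmoothFactor',     5e-3);

% Gaussian exploration noise with decay rate 1e-5.
agentOpts.ExplorationModel.StandardDeviation          = 0.2;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;

% Learning rates (critic 1e-3, actor 5e-4); depending on the release these
% may need to be supplied per critic rather than as a single options object.
agentOpts.CriticOptimizerOptions = rlOptimizerOptions('LearnRate', 1e-3);
agentOpts.ActorOptimizerOptions  = rlOptimizerOptions('LearnRate', 5e-4);
```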

All computations were carried out on a standard PC (Windows 11, AMD 4600H CPU @ 3.00 GHz, 16 GB RAM) in MATLAB/Simulink R2022b. To validate the effectiveness of TD3-RTM, comparisons were made with a classical PID controller, a fuzzy PID controller, the DDPG algorithm, and the TD3 algorithm. For a fair comparison, the PID parameters for the classical PID and fuzzy PID controllers were obtained using the Ziegler-Nichols method, and the neural network architecture, number of neurons, and learning rates used in the different deep reinforcement learning algorithms were the same. Each task was run for 2000 episodes, and the experiments were repeated five times with different random seeds. The recorded results represent the average reward over every 20 episodes. The learning curves are shown in Fig 5.
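For reference, the closed-loop Ziegler-Nichols rules compute the PID gains of each loop from its ultimate gain Ku and ultimate period Tu; the sketch below uses placeholder values for Ku and Tu, since the measured values are not reported here.

```matlab
% Classical Ziegler-Nichols (closed-loop) PID tuning rules.
% Ku, Tu must be obtained experimentally for each loop.
Ku = 1.0;  Tu = 1.0;              % placeholder values
Kp = 0.6 * Ku;
Ti = 0.5 * Tu;   Td = 0.125 * Tu;
Ki = Kp / Ti;    Kd = Kp * Td;
C  = pid(Kp, Ki, Kd);             % parallel-form PID controller
```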

Fig 5. Learning curve of the control task.


The results in Fig 5 indicate that TD3-RTM converges to the desired control policy faster and with more stable convergence under different random initial states. Additionally, TD3-RTM achieves a higher total reward after 2000 episodes of learning.

Fig 6. Output results of different controllers at the flow setpoint of 600 m^3/h.


4.1 Step response and disturbance rejection performance simulation experiment

To validate the control effectiveness of TD3-RTM in the circulating cooling water system, a 100-second simulation experiment was conducted with a flow of 600 m^3/h and a temperature of 25°C. At the 60-second and 80-second marks, disturbance signals with amplitudes of 5 and -5 were applied to the control ports of the flow and temperature loops. The control performance of different control methods is shown in Figs 6 and 7.

Fig 7. Output results of different controllers at the temperature setpoint of 25°C.


As shown in Fig 6, in the flow control loop the deep reinforcement learning controllers exhibit faster response and smaller overshoot than the classical PID and fuzzy PID controllers in the step response. TD3-RTM is less affected by external disturbance signals, whereas the DDPG algorithm shows steady-state error and the TD3 algorithm exhibits oscillations. The oscillations of the TD3 algorithm cause continuous changes in the control signal, which can damage the actuators in the circulating cooling water system and lead to instability. As shown in Fig 7, in the temperature control loop the PID controller, fuzzy PID controller, and DDPG algorithm respond relatively slowly, while the TD3 algorithm and TD3-RTM achieve good control performance, and the deep reinforcement learning controllers perform better under external disturbance signals. The performance parameters of the different control methods are listed in Table 2.

Table 2. Comparison of controller performance parameters.

Variables Controllers Rise Time (s) Transient Time (s) Overshoot (%) IAE
Flow PID 1.03 7.10 18.60 753.80
Fuzzy-PID 0.57 6.30 8.75 378.36
DDPG 0.30 1.58 0 179.23
TD3 0.28 1.78 2.25 133.22
TD3-RTM 0.47 1.01 0.96 43.26
Temperature PID 16.54 29.81 0 170.35
Fuzzy-PID 9.32 17.44 0 85.28
DDPG 5.99 18.84 0 58.24
TD3 2.97 4.60 0.21 28.22
TD3-RTM 2.78 3.79 0 26.45

From Table 2, compared to PID, fuzzy PID, DDPG and TD3, the TD3-RTM method reduced the transient time in the flow loop by 6.09 s, 5.29 s, 0.57 s, and 0.77 s, respectively, and the integral of absolute error (IAE) by 710.54, 335.1, 135.97, and 89.96, respectively; in the temperature loop, the transient time was reduced by 25.84 s, 13.65 s, 15.05 s, and 0.81 s, and the IAE by 143.9, 59.13, 31.79, and 1.77, respectively. In addition, the overshoot of the TD3-RTM method in the flow loop was reduced by 17.64, 7.79, and 1.29 percentage points compared with the PID, fuzzy PID, and TD3 controllers, respectively. Generally, for energy-intensive industrial scenarios such as the circulating cooling water system, a controller with a lower IAE and a shorter settling time can save more energy. Although the DDPG algorithm shows a shorter rise time and no overshoot in the flow control loop, it exhibits longer rise and settling times in the temperature control loop, and its IAE is the largest among the three deep reinforcement learning methods. Overall, TD3-RTM demonstrates significant advantages in both control loops.
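The IAE values in Table 2 are the time integrals of the absolute control error over the experiment; a minimal sketch of how such a metric can be computed from logged simulation data (variable names are illustrative):

```matlab
% t : time vector, ysp : setpoint trajectory, y : measured output
e   = ysp - y;                 % control error
IAE = trapz(t, abs(e));        % integral of absolute error
```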

4.2 Tracking performance simulation experiment

To validate the tracking performance of TD3-RTM, this study designed different setpoints for both the flow control loop and the temperature control loop and conducted 300 seconds of simulation experiments. The control effects of different control methods are shown in Figs 8 and 9.

Fig 8. Output results for different controllers in the flow loop at different setpoints.


Fig 9. Output results for different controllers in the temperature loop at different setpoints.


As shown in Fig 8, in the flow control loop, when the setpoint is changed, the DDPG algorithm becomes unstable, and both the PID controller and fuzzy PID controller exhibit significant overshoot and long settling times at different setpoints. On the other hand, the TD3 algorithm and TD3-RTM outperform other methods significantly. In Fig 9, although all controllers can track the given setpoints, their transient responses vary widely. The DDPG algorithm can reach the setpoint at the end of each simulation time, but its settling time is longer, and its performance is inferior to the PID and fuzzy PID controllers. However, TD3-RTM’s performance during setpoint changes is comparable to the TD3 algorithm, with the fastest response speed and good settling time. Overall, TD3-RTM performs well in both control loops and shows excellent potential.

5 Conclusion

This paper presents a novel adaptive control structure based on the Twin Delayed Deep Deterministic Policy Gradient algorithm under a reference trajectory model (TD3-RTM) for addressing complex control problems in recirculating cooling water systems. Initially, the TD3 algorithm is employed to construct a deep reinforcement learning agent, enabling it to select appropriate actions for each loop based on system state features. Additionally, the multivariable characteristics of the recirculating cooling water system necessitate the design of a dense reward function, which enables the agent to receive various rewards through interactions with the environment and update its networks, thereby gradually approaching the optimal policy. Furthermore, the introduction of the reference trajectory model accelerates the convergence of the agent and reduces system oscillations and instability. Simulation results show that, compared to PID, fuzzy PID, DDPG and TD3, the TD3-RTM method reduced the transient time in the flow loop by 6.09 s, 5.29 s, 0.57 s, and 0.77 s, respectively, and the integral of absolute error (IAE) by 710.54, 335.1, 135.97, and 89.96, respectively; in the temperature loop, the transient time was reduced by 25.84 s, 13.65 s, 15.05 s, and 0.81 s, and the IAE by 143.9, 59.13, 31.79, and 1.77, respectively. In addition, the overshoot of the TD3-RTM method in the flow loop was reduced by 17.64, 7.79, and 1.29 percentage points compared with the PID, fuzzy PID, and TD3 controllers, respectively. To further enhance safety and system stability, regular monitoring of system performance and adjustment as necessary are encouraged in practical applications. Furthermore, backup control strategies can be used to handle exceptional circumstances beyond the scope of the deep reinforcement learning algorithm, ensuring the system's stability in extreme conditions.

This research validates the potential of deep reinforcement learning in the circulating cooling water system and offers novel solutions and insights for practical engineering control problems. Although the method proposed in this paper achieves good control performance in simulation experiments and shows advantages over both traditional control methods and other deep reinforcement learning methods, there are some potential limitations, such as applicability limitations, computational resource requirements, hyper-parameter sensitivity, adaptability to environmental variations, and the challenge of practical system validation. Further optimization and extension of the proposed control method can be explored for broader industrial applications, along with investigating other deep reinforcement learning algorithms for complex system control. This will contribute to advancing intelligent control technology in industrial automation, enhancing production efficiency and resource utilization.

Supporting information

S1 Appendix. Data and code from the experiments.

(ZIP)

pone.0307767.s001.zip (1.2MB, zip)

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Kim K-H. Temperature Stabilization of the Klystron Cooling Water at the KOMAC. Journal of the Korean Physical Society. 2018;73(8):1157–62. doi: 10.3938/jkps.73.1157
2. Garciadealva Y, Best R, Gomez VH, Vargas A, Rivera W, Jimenez-Garcia JC. A Cascade Proportional Integral Derivative Control for a Plate-Heat-Exchanger-Based Solar Absorption Cooling System. Energies. 2021;14(13):20. doi: 10.3390/en14134058
3. Liu W-h, Xie Z. Design and Simulation Test of Advanced Secondary Cooling Control System of Continuous Casting Based on Fuzzy Self-Adaptive PID. Journal of Iron and Steel Research International. 2011;18(1):26–30. doi: 10.1016/S1006-706X(11)60006-X
4. Liang YY, Wang DD, Chen JP, Shen YG, Du J. Temperature control for a vehicle climate chamber using chilled water system. Appl Therm Eng. 2016;106:117–24. doi: 10.1016/j.applthermaleng.2016.05.168
5. Jia Y, Zhang R, Lv X, Zhang T, Fan Z. Research on Temperature Control of Fuel-Cell Cooling System Based on Variable Domain Fuzzy PID. 2022;10(3):534. doi: 10.3390/pr10030534
6. Zhao Y, Pistikopoulos E. Dynamic modelling and parametric control for the polymer electrolyte membrane fuel cell system. Journal of Power Sources. 2013;232:270–8. doi: 10.1016/j.jpowsour.2012.12.116
7. Muller CJ, Craig IK. Economic hybrid non-linear model predictive control of a dual circuit induced draft cooling water system. J Process Control. 2017;53:37–45. doi: 10.1016/j.jprocont.2017.02.009
8. Dulce-Chamorro E, Martinez-de-Pison FJ. An advanced methodology to enhance energy efficiency in a hospital cooling-water system. Journal of Building Engineering. 2021;43:102839. doi: 10.1016/j.jobe.2021.102839
9. Liang J, Li L, Li Y, Wang Y, Feng X. Operation optimization of existing industrial circulating water system considering variable frequency drive. Chemical Engineering Research and Design. 2022;186:387–97. doi: 10.1016/j.cherd.2022.08.010
10. Niu D, Liu X, Tong Y. Operation Optimization of Circulating Cooling Water System Based on Adaptive Differential Evolution Algorithm. IJoCIS. 2023;16(1):22.
11. Xia QA, Zhang T, Sun ZF, Gao Y. Design and optimization of thermal strategy to improve the thermal management of proton exchange membrane fuel cells. Appl Therm Eng. 2023;222:11. doi: 10.1016/j.applthermaleng.2022.119880
12. Terzi E, Cataldo A, Lorusso P, Scattolini R. Modelling and predictive control of a recirculating cooling water system for an industrial plant. J Process Control. 2018;68:205–17. doi: 10.1016/j.jprocont.2018.04.009
13. Zhang W, Ma L, Jia B, Zhang Z, Liu Y, Duan L. Optimization of the circulating cooling water mass flow in indirect dry cooling system of thermal power unit using artificial neural network based on genetic algorithm. Appl Therm Eng. 2023;223:120040. doi: 10.1016/j.applthermaleng.2023.120040
14. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature. 2017;550(7676):354–9. doi: 10.1038/nature24270
15. McNamara JM, Houston AI, Leimar O. Learning, exploitation and bias in games. PLOS ONE. 2021;16(2):e0246588. doi: 10.1371/journal.pone.0246588
16. Hwangbo J, Lee J, Dosovitskiy A, Bellicoso D, Tsounis V, Koltun V, et al. Learning agile and dynamic motor skills for legged robots. 2019;4(26):eaau5872. doi: 10.1126/scirobotics.aau5872
17. Ejaz MM, Tang TB, Lu CK. Vision-Based Autonomous Navigation Approach for a Tracked Robot Using Deep Reinforcement Learning. IEEE Sensors Journal. 2021;21(2):2230–40. doi: 10.1109/JSEN.2020.3016299
18. Fernandez-Gauna B, Etxeberria-Agiriano I, Graña M. Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning. PLOS ONE. 2015;10(7):e0127129. doi: 10.1371/journal.pone.0127129
19. Fu Q, Han Z, Chen J, Lu Y, Wu H, Wang Y. Applications of reinforcement learning for building energy efficiency control: A review. Journal of Building Engineering. 2022;50:104165.
20. Le-Khac PH, Healy G, Smeaton AF. Contrastive Representation Learning: A Framework and Review. IEEE Access. 2020;8:193907–34. doi: 10.1109/ACCESS.2020.3031549
21. Al-Qizwini M, Bulan O, Qi X, Mengistu Y, Mahesh S, Hwang J, et al. A Lightweight Simulation Framework for Learning Control Policies for Autonomous Vehicles in Real-World Traffic Condition. IEEE Sensors Journal. 2021;21(14):15762–74. doi: 10.1109/JSEN.2020.3036532
22. Gangopadhyay B, Soora H, Dasgupta P. Hierarchical Program-Triggered Reinforcement Learning Agents for Automated Driving. IEEE Transactions on Intelligent Transportation Systems. 2022;23(8):10902–11. doi: 10.1109/TITS.2021.3096998
23. Ashraf NM, Mostafa RR, Sakr RH, Rashad MZ. Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm. PLOS ONE. 2021;16(6):e0252754. doi: 10.1371/journal.pone.0252754
24. Cao J, Ma J, Huang D, Yu P. Finding the optimal multilayer network structure through reinforcement learning in fault diagnosis. Measurement. 2022;188:110377. doi: 10.1016/j.measurement.2021.110377
25. Wang R, Jiang H, Li X, Liu S. A reinforcement neural architecture search method for rolling bearing fault diagnosis. Measurement. 2020;154:107417. doi: 10.1016/j.measurement.2019.107417
26. Qiu S, Li Z, Li Z, Li J, Long S, Li X. Model-free control method based on reinforcement learning for building cooling water systems: Validation by measured data-based simulation. Energy and Buildings. 2020;218:110055. doi: 10.1016/j.enbuild.2020.110055
27. Wu Y, Xing L, Liu XK, Guo F. A New Solution to the PID18 Challenge: Reinforcement-Learning-based PI Control. 2022 34th Chinese Control and Decision Conference (CCDC); 15–17 Aug 2022.
28. Fu Q, Chen X, Ma S, Fang N, Xing B, Chen J. Optimal control method of HVAC based on multi-agent deep reinforcement learning. Energy and Buildings. 2022;270:112284. doi: 10.1016/j.enbuild.2022.112284
29. Zhang H, Zhao C, Ding J. Robust safe reinforcement learning control of unknown continuous-time nonlinear systems with state constraints and disturbances. J Process Control. 2023;128:103028. doi: 10.1016/j.jprocont.2023.103028
30. Zhang H, Zhao C, Ding J. Online reinforcement learning with passivity-based stabilizing term for real time overhead crane control without knowledge of the system model. Control Engineering Practice. 2022;127:105302. doi: 10.1016/j.conengprac.2022.105302
31. Li T, Liu Y, Chen Z. Design of Gas Turbine Cooling System Based on Improved Jumping Spider Optimization Algorithm. 2022;10(10):909. doi: 10.3390/machines10100909
32. Fujimoto S, Hoof H, Meger D. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning; 2018: PMLR.
33. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. 2015.

Decision Letter 0

Joint Chair Prof Dr Stelios Bekiros

19 Oct 2023

PONE-D-23-24165: Adaptive control for circulating cooling water system using deep reinforcement learning (PLOS ONE)

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 25 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Prof. Dr. Stelios Bekiros, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Additional Editor Comments :

REVIEWER COMMENTS:

This paper presents a deep RL-based control of a circulating cooling water system. The topic has some practical significance, but the novelty of the paper is not enough. However, the decision can be reconsidered if the authors could carefully address all the concerns raised.

1. How does the proposed method ensure the stability of the system?

2. The authors mentioned many successful applications of RL to circulating cooling water system ([26-28]), what is the contribution of this manuscript compared to them? It is suggested that the motivation and contributions should be more emphasized.

3. Since there are many related methods that can also deal with optimal control of unknown systems, it is better to provide a more comprehensive literature review. Please note that the up-to-date of references will contribute to the up-to-date of your manuscript. The studies named: Robust safe reinforcement learning control of unknown continuous-time nonlinear systems with state constraints and disturbances, Journal of Process Control; Online reinforcement learning with passivity-based stabilizing term for real time overhead crane control without knowledge of the system model, Control Engineering Practice, can be used to explain the method in the study or to indicate the contribution in the "Introduction" section. I believe this would further strengthen the introduction and lend support to the methodology used in general.

4. Check the notation system throughout the text. For example, the differential operator in equation (1) and the state in MDP use the same character "s". The transfer function G and the state transition function P should be unified, the current expression is confusing. If a1 and M1 represent the same value, why do the authors use different notations?

5. The control error values in equation (2) are not defined. The error between what? It is suggested that the reference trajectory model be placed in a more appropriate location.

6. What is the difference between the proposed method and TD3?

7. Please improve the quality of all figures and the language.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes


2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes


3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes


4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes


5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper presents a deep RL-based control of a circulating cooling water system. The topic has some practical significance, but the novelty of the paper is not enough. However, the decision can be reconsidered if the authors could carefully address all the concerns raised.

1. How does the proposed method ensure the stability of the system?

2. The authors mentioned many successful applications of RL to circulating cooling water system ([26-28]), what is the contribution of this manuscript compared to them? It is suggested that the motivation and contributions should be more emphasized.

3. Since there are many related methods that can also deal with optimal control of unknown systems, it is better to provide a more comprehensive literature review. Please note that the up-to-date of references will contribute to the up-to-date of your manuscript. The studies named: Robust safe reinforcement learning control of unknown continuous-time nonlinear systems with state constraints and disturbances, Journal of Process Control; Online reinforcement learning with passivity-based stabilizing term for real time overhead crane control without knowledge of the system model, Control Engineering Practice, can be used to explain the method in the study or to indicate the contribution in the "Introduction" section. I believe this would further strengthen the introduction and lend support to the methodology used in general.

4. Check the notation system throughout the text. For example, the differential operator in equation (1) and the state in MDP use the same character "s". The transfer function G and the state transition function P should be unified, the current expression is confusing. If a1 and M1 represent the same value, why do the authors use different notations?

5. The control error values in equation (2) are not defined. The error between what? It is suggested that the reference trajectory model be placed in a more appropriate location.

6. What is the difference between the proposed method and TD3?

7. Please improve the quality of all figures and the language.


6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No


[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jul 24;19(7):e0307767. doi: 10.1371/journal.pone.0307767.r002

Author response to Decision Letter 0


20 Nov 2023

Dear Editors and Reviewers:

Thank you for your comments concerning our manuscript entitled "Adaptive control for circulating cooling water system using deep reinforcement learning" (ID: PONE-D-23-24165). Your comments are very helpful for revising and improving our paper. We have studied them carefully and have made corrections that we hope will meet with your approval. The main corrections in the paper and the responses to the reviewers are as follows:

Comments 1: How does the proposed method ensure the stability of the system?

Response 1: Thank you for your valuable feedback on our submitted paper. We have carefully read your review comments, and in response to your concerns about system stability, we are willing to provide a detailed response.

In this paper, we employ the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, which is designed to improve training stability. By introducing twin Q networks and a delayed update mechanism, we aim to reduce the variance during training and prevent excessive fluctuations in the system. The algorithm employs experience replay, one of the commonly used techniques in deep reinforcement learning; experience replay helps mitigate instability due to sample correlation and improves the system's robustness by reusing previous experience. In this paper, we rationally design the state space and the reward function of the multivariable circulating cooling water system to help the agent perceive the system state more accurately and adjust according to the reward signal. This aims to prevent unstable behaviours during training. In addition, we introduce a reference trajectory model to accelerate convergence and reduce system oscillations during control. This helps the control system approximate the optimal policy more smoothly and improves overall stability.

We encourage regular system performance monitoring in practical applications to enhance safety and system stability further. It is crucial to make adjustments as needed to maintain system stability. Additionally, considering alternative control strategies may be prudent to address specific scenarios that deep reinforcement learning algorithms might not handle effectively. This ensures that the system remains stable even in extreme circumstances.

Based on this, we have made an addition in Section 5. Please refer to the red content in the first paragraph of section 5 on page 13, at line 311~314.

Comments 2: The authors mentioned many successful applications of RL to circulating cooling water system ([26-28]), what is the contribution of this manuscript compared to them? It is suggested that the motivation and contributions should be more emphasized.

Response 2: Thank you for your valuable suggestions on our paper. We understand and value your comments.

Regarding the motivation of this paper: the circulating cooling water system is a complex system with nonlinear, time-delay and multivariable characteristics. Traditional control methods, such as PID controllers, fuzzy control and model predictive control, often struggle to cope with the complex dynamic characteristics of the system and the uncertainty in the operation process, and thus have certain limitations. With the development of artificial intelligence technology, reinforcement learning, as a machine learning method based on trial-and-error learning, offers powerful nonlinear modelling and adaptive learning capabilities. On the one hand, this paper aims to verify whether deep reinforcement learning offers advantages over traditional control methods in circulating cooling water systems; on the other hand, although [26-28] have studied related circulating cooling water systems, research in this field is still neither deep nor comprehensive. This paper therefore adopts a deep reinforcement learning method different from those in [26-28]: the Twin Delayed Deep Deterministic Policy Gradient. The main contributions of this paper are as follows: 1) A deep reinforcement learning controller for circulating cooling water systems was designed based on the TD3 algorithm, achieving end-to-end control and enhancing system stability. 2) The state space and reward function of the circulating cooling water multivariable system were carefully designed, and a reference trajectory model was added to accelerate the convergence of the agent and reduce the oscillations and instability of the control system. 3) The controller design does not require a model or specialized knowledge of the industrial process; random disturbance signals were introduced during simulation training to improve the system's adaptive capabilities. 4) The application of deep reinforcement learning in circulating cooling water systems was explored, providing reference and inspiration for control problems in other industrial domains.
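To make the reward-shaping point in contribution 2 concrete, the purely illustrative sketch below shows one possible dense reward for the two controlled loops (flow and temperature): a negative weighted sum of absolute tracking errors, so the agent receives informative feedback at every step. The weights and the exact functional form are assumptions for exposition, not the reward function used in the manuscript.

```python
def dense_reward(flow_error, temp_error, w_flow=1.0, w_temp=1.0):
    """Reward approaches 0 as both tracking errors shrink; large errors are penalized."""
    return -(w_flow * abs(flow_error) + w_temp * abs(temp_error))

print(dense_reward(0.5, 2.0))    # far from both setpoints -> strongly negative
print(dense_reward(0.01, 0.05))  # near both setpoints    -> close to zero
```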

Based on this, we have made an addition in Section 1. Please refer to the red content in the third paragraph of section 1 on page 3, at line 69~83.

Comments 3: Since there are many related methods that can also deal with optimal control of unknown systems, it is better to provide a more comprehensive literature review. Please note that the up-to-date of references will contribute to the up-to-date of your manuscript. The studies named: Robust safe reinforcement learning control of unknown continuous-time nonlinear systems with state constraints and disturbances, Journal of Process Control; Online reinforcement learning with passivity-based stabilizing term for real time overhead crane control without knowledge of the system model, Control Engineering Practice, can be used to explain the method in the study or to indicate the contribution in the "Introduction" section. I believe this would further strengthen the introduction and lend support to the methodology used in general.

Response 3: Thank you for your comments and suggestions. The literature you recommended is critical. We fully agree and have added this section to the manuscript.

Please refer to the red content in the third paragraph of section 1 on page 3, at line 61~68.

Comments 4: Check the notation system throughout the text. For example, the differential operator in equation (1) and the state in MDP use the same character "s". The transfer function G and the state transition function P should be unified, the current expression is confusing. If a1 and M1 represent the same value, why do the authors use different notations?

Response 4: Thank you for your comments and suggestions. We apologize for the lack of clarity in our previous presentation; your suggestion is essential.

For this reason, we now use "S" to denote the differential operator in Equation (1) and "s" to denote the state in the MDP. The transfer function G is used in control system theory to describe the input-output relationship of a linear time-invariant system, whereas the state transition function P is used in the MDP framework to describe the state transition process between the agent and its environment. In addition, the values of a1 and M1 are indeed the same in this paper; different symbols are used because a1 denotes the action value in reinforcement learning, while M1 denotes the valve opening in the circulating cooling water system. The action value a1 obtained from the trained reinforcement learning algorithm is applied to the circulating cooling water system as the control quantity, and this control quantity is realized through M1.

For the revision details of this question, please refer to the red content in section 2 on page 5, at line 127.

Comments 5: The control error values in equation (2) are not defined. The error between what? It is suggested that the reference trajectory model be placed in a more appropriate location.

Response 5: Thank you for pointing this out. For the revision details of this question, please refer to the red content in section 3 on page 7, at line 177.

The reference trajectory model is placed after the setpoint because, in practical applications, the setpoint may change suddenly or become unstable; adding the reference trajectory model after it smooths the setpoint signal so that its changes are slower and gentler, which helps reduce the oscillations and instability of the control system. Placing the reference trajectory model after the setpoint does not affect the magnitude of the error value.

Comments 6: What is the difference between the proposed method and TD3?

Response 6: Thank you for pointing this out. We are willing to provide further explanation on the issue.

The control algorithm used in our proposed method is TD3. However, we have adjusted the system's control structure to suit the control problems of circulating cooling water systems. Since the setpoints in the circulating water system may change suddenly or become unstable in practical applications, the control system may exhibit unstable performance or oscillations. By introducing a reference trajectory model, the setpoint signal is smoothed so that its changes are slower and gentler, which helps reduce the oscillation and instability of the control system; a minimal sketch of this idea is given below. As the learning curves of the different methods under the same task in Fig. 5 show, the method proposed in this paper obtains higher rewards faster and more stably owing to the addition of the reference trajectory model. Overall, the simulation experiments show that the proposed method achieves better performance and greater potential.
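Concretely, a reference trajectory model can be realized as a simple first-order lag between the raw setpoint and the controller, so that a step change reaches the agent as a smooth ramp. The time constant and setpoint values in this sketch are illustrative assumptions, not the tuned parameters from the manuscript.

```python
import numpy as np

def reference_trajectory(setpoint, dt=0.1, tau=5.0, y0=0.0):
    """First-order smoothing: y[k] = y[k-1] + (dt/tau) * (r[k] - y[k-1])."""
    y = np.empty_like(setpoint, dtype=float)
    y_prev = y0
    for k, r in enumerate(setpoint):
        y_prev = y_prev + (dt / tau) * (r - y_prev)
        y[k] = y_prev
    return y

# Example: a setpoint stepping from 0 to 1 at t = 10 s becomes a smooth trajectory.
t = np.arange(0.0, 60.0, 0.1)
raw_setpoint = np.where(t < 10.0, 0.0, 1.0)
smoothed = reference_trajectory(raw_setpoint)
print(smoothed[95:105].round(3))  # values around the step show the gradual transition
```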

Comments 7: Please improve the quality of all figures and the language.

Response 7: Thank you for your review and valuable comments. We take your suggestions very seriously and have already begun improving the quality of all figures and the language in the paper. We will carefully review your guidance to ensure that all charts and graphs, as well as the presentation of the paper, are clearer, more accurate, and better aligned with academic requirements. We look forward to demonstrating these improvements in the final version.

Thank you again for your guidance and review.

We are looking forward to hearing from you at your earliest convenience. Thanks for your attention and time.

Sincerely,

Qingxin Zhang (Corresponding author)

E-mail: zhy9712_sau@163.com

Shenyang Aerospace University

November 20, 2023

Attachment

Submitted filename: Response to Reviewers.docx

pone.0307767.s002.docx (99.7KB, docx)

Decision Letter 1

Lalit Chandra Saikia

19 Jan 2024

PONE-D-23-24165R1

Adaptive control for circulating cooling water system using deep reinforcement learning

PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 04 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Lalit Chandra Saikia, PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

All the comments of reviewer must be addressed and necessary changes must be done in the revised manuscript.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed most of my concerns and the paper is recommended for acceptance if possible.

Reviewer #2: (No Response)

Reviewer #3: Some potential drawbacks of the proposed deep reinforcement learning control method:

1. Complexity: Deep RL methods introduce significant complexity compared to traditional controllers.

2. Hyperparameters: Fine-tuning hyperparameters like discount factor, learning rate etc. requires expertise.

3. Sample efficiency: Large volumes of experience/data needed to learn optimal policy, may not be feasible in practice.

4. Brittleness: Policies could fail under distribution shifts or novel operating conditions not seen during training.

5. Non-stationary systems: No mechanism provided to continually learn as system dynamics change over time.

6. Interpretability: Learned policies are black-boxes, hard to analyze causes of behavior and ensure robustness.

7. Real system validation: Only simulated tests conducted, performance on real plant with noises/disturbances unknown.

8. Computational cost: Training deep RL agents is computationally expensive requiring specialized hardware.

9. Data requirements: Need sufficient coverage of state-action space in collected data to train policy.

10. Safety: No fail-safes described for scenarios where control deteriorates before retraining can occur.

11. Single objective: Only optimize for one control metric, may negatively impact other important factors.

12. Keywords section is missing.

13. Describe dataset features in more details and its total size and size of (train/test) as a table.

14. Flowchart and algorithm steps need to be inserted.

15. Time spent need to be measured in the experimental results.

16. Limitation Section need to be inserted.

17. All metrics need to be calculated in the experimental results as tables.

18. Address the accuracy/improvement percentages in the abstract and in the conclusion sections, as well as the significance of these results.

19. The architecture of the proposed model must be provided

20. The authors need to make a clear proofread to avoid grammatical mistakes and typo errors.

21. The authors need to add recent articles in related work and update them.

22. Add future work in last section (conclusion) (if any)

23. Enhance the clarity of the Figures by improving their resolution.

24. To improve the Related Work and Introduction sections authors are recommended to review this highly related research work paper:

a) Building an Effective and Accurate Associative Classifier Based on Support Vector Machine

b) A survey on improving pattern matching algorithms for biological sequences

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Tarek Abd El-Hafeez

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jul 24;19(7):e0307767. doi: 10.1371/journal.pone.0307767.r004

Author response to Decision Letter 1


25 Feb 2024

Response to Reviewer Comments

Thank you very much for taking the time to review this manuscript. We will carefully consider and provide detailed answers to your questions. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the resubmitted files.

Comments: Some potential drawbacks of the proposed deep reinforcement learning control method:

1. Complexity: Deep RL methods introduce significant complexity compared to traditional controllers.

2. Hyperparameters: Fine-tuning hyperparameters like discount factor, learning rate etc. requires expertise.

3. Sample efficiency: Large volumes of experience/data needed to learn optimal policy, may not be feasible in practice.

4. Brittleness: Policies could fail under distribution shifts or novel operating conditions not seen during training.

5. Non-stationary systems: No mechanism provided to continually learn as system dynamics change over time.

6. Interpretability: Learned policies are black-boxes, hard to analyze causes of behavior and ensure robustness.

7. Real system validation: Only simulated tests conducted, performance on real plant with noises/disturbances unknown.

8. Computational cost: Training deep RL agents is computationally expensive requiring specialized hardware.

9. Data requirements: Need sufficient coverage of state-action space in collected data to train policy.

10. Safety: No fail-safes described for scenarios where control deteriorates before retraining can occur.

11. Single objective: Only optimize for one control metric, may negatively impact other important factors.

12. Keywords section is missing.

13. Describe dataset features in more details and its total size and size of (train/test) as a table.

14. Flowchart and algorithm steps need to be inserted.

15. Time spent need to be measured in the experimental results.

16. Limitation Section need to be inserted.

17. All metrics need to be calculated in the experimental results as tables.

18. Address the accuracy/improvement percentages in the abstract and in the conclusion sections, as well as the significance of these results.

19. The architecture of the proposed model must be provided

20. The authors need to make a clear proofread to avoid grammatical mistakes and typo errors.

21. The authors need to add recent articles in related work and update them.

22. Add future work in last section (conclusion) (if any)

23. Enhance the clarity of the Figures by improving their resolution.

24. To improve the Related Work and Introduction sections authors are recommended to review this highly related research work paper:

a) Building an Effective and Accurate Associative Classifier Based on Support Vector Machine

b) A survey on improving pattern matching algorithms for biological sequences

Response:

Thank you for your detailed comments and suggestions on our submitted paper. We greatly appreciate your interest in and contribution to our work. We recognize that several of the issues you raise deserve further consideration and improvement.

We greatly appreciate your review and valuable feedback on our work. We acknowledge the challenges you raised (issues 1-11) as common concerns faced by deep reinforcement learning methods in the control field. Your insights will help us refine and apply our approach to practical engineering applications. We will carefully consider your suggestions and strive to address these issues in our future work.

Regarding your point about adding a Keywords section (issue 12), we appreciate your suggestion. However, according to the official template and guidelines we followed, a separate Keywords section is optional. For consistency with the format provided by the journal and based on the precedent set by previously published papers on the journal's website, we did not include a Keywords section in our manuscript. Nevertheless, we have provided the relevant keywords for our paper: Industrial process control, Circulating cooling water system, Deep reinforcement learning, PID controller, and TD3.

In response to issue 13 concerning dataset description, we acknowledge your feedback. Our study did not utilize a specific dataset; instead, we conducted research and experiments based on theoretical models and simulation environments. Therefore, we did not provide specific details about a dataset in the manuscript.

As for your issue 14, we are very grateful for your suggestion; the requested flowchart and algorithm steps have already been included in the manuscript. Please refer to the related content in Figure 4 and Algorithm 1.

Regarding issue 15, we understand your point about time measurement in the experimental results. However, at this stage, we cannot conduct additional time measurements. Therefore, we do not intend to add this information to the manuscript. We will consider incorporating time measurements in future experiments to provide more comprehensive results.

Regarding your issue 16, we appreciate your suggestion. Although our approach achieves good control performance in simulation experiments and shows advantages over both traditional control methods and other deep reinforcement learning methods, we are well aware of some potential limitations, such as applicability constraints, computational resource requirements, hyper-parameter sensitivity, adaptability to environmental variations, and the challenges of practical system validation. Based on this, we have made an addition in Section 5. Please refer to the red content in the second paragraph of section 5 on page 14, at lines 316~320.

Concerning issue 17, we appreciate your recommendation. Performance metrics for the controllers, such as rise time, transient time, and overshoot, have been calculated and compared extensively in Table 2. The data in Table 2 clearly demonstrate the performance differences among the controllers and give readers a comprehensive understanding of the experimental outcomes. If you consider additional performance metrics or further analysis necessary, please let us know, and we will make the corresponding adjustments.

For issue 18, we value your suggestion. The manuscript's abstract, experiments, and conclusion sections thoroughly discuss the significance of our research findings. Thanks again for your advice.

Regarding your issue 19, we appreciate your suggestion. The architecture of the proposed model and of the overall system is provided in the manuscript; please refer to the relevant part of Figure 4.

Concerning your issues 20 and 23, we take your suggestions very seriously and have already begun improving the quality of all figures and the language in the paper. We will carefully review your guidance to ensure that all charts and graphs, as well as the presentation of the paper, are clearer, more accurate, and better aligned with academic requirements. Thank you again for your guidance and review.

Regarding your issue 21, we appreciate your suggestions. Recent articles on the work studied here are fully cited and discussed in the manuscript, and again, we thank you for your suggestions.

Regarding your issue 22, we appreciate and value your suggestion. Regarding the section on future related work, please refer to the red content in the second paragraph of section 5 on page 14, at lines 320~322.

Regarding your issue 24, thank you very much for reviewing our paper and for the advice you provided. We have carefully considered the references you suggested; however, after careful review, they are not directly relevant to the research content of our paper, and we therefore do not intend to cite them for the time being. The Related Work and Introduction sections already cover the literature closely related to our research topic and provide a comprehensive introduction to and analysis of related work in this research area; these references support the paper's background and motivation and give the reader a clear introduction to our research. If you believe other relevant literature should be cited, please provide specific suggestions and we will be happy to discuss them further.

Thank you again for your valuable suggestions!

Attachment

Submitted filename: Response to Reviewers.docx

pone.0307767.s003.docx (98.5KB, docx)

Decision Letter 2

Lalit Chandra Saikia

2 May 2024

PONE-D-23-24165R2

Adaptive control for circulating cooling water system using deep reinforcement learning

PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 13 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Joanna Tindall

Staff Editor

PLOS ONE

on behalf of: 

Lalit Chandra Saikia

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

The manuscript is accepted. All the comments of the reviewers must be addressed.

Comments from Editorial Office: Please address the reviewers' comments as outlined by the Academic Editor above under 'Additional Comments'.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #4: (No Response)

Reviewer #5: All comments have been addressed

Reviewer #6: (No Response)

Reviewer #7: All comments have been addressed

********** 

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #4: Partly

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

********** 

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

********** 

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #4: No

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

********** 

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: No

Reviewer #7: Yes

********** 

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #4: The manuscript proposes a new application of deep reinforcement learning to solve the problem of adaptive control for circulating cooling water systems. The authors have provided a clear explanation of the problem and their proposed solution. Overall, the paper is well-written, and the research question is clearly stated. However, there are some areas that could be improved to enhance the manuscript's clarity and impact.

• The methods section provides a detailed description of the proposed approach, including the deep reinforcement learning algorithm, the simulation environment, and the evaluation metrics. However, it would be helpful to provide more details on the implementation of the algorithm, such as the network architecture, the exploration strategy, and the reward function.

• It would be useful to provide more information on the simulation environment, such as the size and complexity of the system, and how it was validated.

• It would be helpful to provide all metrics in the experimental results as tables.

• I kindly suggest that you address the accuracy and improvement percentages in the abstract and conclusion sections and highlight the significance of these results. This will provide readers with a clear understanding of the impact of your work.

• It would be useful to provide more details on the implementation of the reinforcement learning algorithm, such as the reward function and the network architecture.

• To enhance the manuscript's clarity, I recommend that you add more details about the theoretical models and simulation environments in the experiments section. This will enable readers to better understand the methodology behind your research and potentially replicate your experiments.

Reviewer #5: The paper can be accepted now as the authors have addressed all the comments of the reviewers. The quality of the paper is now overall good.

Reviewer #6: 1. In the abstract part, the method adopted by the author is better than the other 11 control strategies, but the author does not specify the control performance index.

2. There is a syntax error in the introduction, please revise it, between lines 22 and 43.

3. The description between lines 75 and 87 is not appropriate in the introduction, please reconsider.

4. Table 2 should be a three-wire table.

5. The conclusion lacks clarity and should be described objectively.

Reviewer #7: This paper is about the adaptive control for circulating cooling water systems using deep reinforcement learning. There are some issues that the authors have to address:

1. This article aimed to improve the performance of the circulating cooling system. The motivation of this paper should be based on the application. Why is TD3 suitable for the system?

2. What are the differences between the standard TD3 algorithm and the proposed algorithm shown in Algorithm 1? The authors did not provide any details about the improvement.

********** 

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #4: No

Reviewer #5: No

Reviewer #6: No

Reviewer #7: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jul 24;19(7):e0307767. doi: 10.1371/journal.pone.0307767.r006

Author response to Decision Letter 2


20 May 2024

Reviewer #4

Comments1: The methods section provides a detailed description of the proposed approach, including the deep reinforcement learning algorithm, the simulation environment, and the evaluation metrics. However, it would be helpful to provide more details on the implementation of the algorithm, such as the network architecture, the exploration strategy, and the reward function.

Response1: Thank you for your valuable comments on our submitted paper. We agree with your suggestions. Please refer to lines 227-237 in Section 3.2 of the manuscript for details on the network architecture of the algorithm. The exploration strategy using Gaussian noise is presented in Table 1. A detailed description of the reward function used in the algorithm can be found in Section 3.1.3.

Comments2: It would be useful to provide more information on the simulation environment, such as the size and complexity of the system, and how it was validated.

Response2: Thank you for your suggestion. In this study, the system is a two-input, two-output system. We trained the control strategy using the TD3-RTM method and then compared it in detail with traditional PID control, fuzzy PID control, DDPG, and the original TD3 algorithm. To ensure fairness and consistency in the comparison, we set the same initial conditions and disturbance factors. Additionally, we used rise time, settling time, overshoot, and IAE metrics to evaluate the performance of the different control systems.
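For reference, the hedged sketch below shows how two of these metrics (IAE and percentage overshoot) can be computed from a recorded step response. The sampled response is synthetic and only illustrates the calculation; it is not taken from the simulated cooling-water loops.

```python
import numpy as np

def iae(setpoint, response, dt):
    """Integral of Absolute Error, approximated by a Riemann sum."""
    return float(np.sum(np.abs(setpoint - response)) * dt)

def overshoot_percent(setpoint_final, response):
    """Peak overshoot relative to the final setpoint value, in percent."""
    return float(max(response.max() - setpoint_final, 0.0) / setpoint_final * 100.0)

dt = 0.1
t = np.arange(0.0, 30.0, dt)
setpoint = np.ones_like(t)
# Synthetic under-damped response used only to illustrate the metric calculations.
response = 1.0 - np.exp(-0.4 * t) * np.cos(1.2 * t)

print("IAE       :", round(iae(setpoint, response, dt), 3))
print("Overshoot :", round(overshoot_percent(1.0, response), 2), "%")
```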

Comments3: It would be helpful to provide all metrics in the experimental results as tables.

Response3: Thank you for your suggestion. All metrics in the experimental results (rise time, settling time, overshoot, and IAE) are presented in tabular form. Please refer to Table 2.

Comments4: I kindly suggest that you address the accuracy and improvement percentages in the abstract and conclusion sections and highlight the significance of these results. This will provide readers with a clear understanding of the impact of your work.

Response4: Thank you for your valuable suggestion. We have revised the abstract and conclusion sections based on your feedback. Please refer to the relevant content in the manuscript.

Comments5: It would be useful to provide more details on the implementation of the reinforcement learning algorithm, such as the reward function and the network architecture.

Response5: As mentioned in your first question, we provided a detailed description of the reward function used in the algorithm in Section 3.1.3. Please refer to lines 227-237 in Section 3.2 of the manuscript for the algorithm's network architecture.

Comments6: To enhance the manuscript's clarity, I recommend that you add more details about the theoretical models and simulation environments in the experiments section. This will enable readers to better understand the methodology behind your research and potentially replicate your experiments.

Response6: Thank you for your feedback. All simulation experiments in this paper were conducted on a PC running Windows 11, equipped with an AMD 4600H CPU @ 3.00 GHz and 16GB of RAM, using MATLAB/Simulink R2022b. The design of the control system (Figure 4) and the settings of algorithm-related hyperparameters have been thoroughly explained in the article. We believe this information will enable readers to better understand the methods behind our research and replicate our experiments.

Reviewer #6

Comments1: In the abstract part, the method adopted by the author is better than the other control strategies, but the author does not specify the control performance index.

Response1: Thank you for your valuable feedback on our submitted paper. We have revised the abstract to provide a detailed description of the control performance metrics, as per your suggestion. Please review the updated abstract section in the manuscript.

Comments2: There is a syntax error in the introduction, please revise it, between lines 22 and 43.

Response2: Thank you for your feedback. The relevant section has been modified accordingly. Please refer to lines 28-50 in the introduction section for the updated content.

Comments3: The description between lines 75 and 87 is not appropriate in the introduction, please reconsider.

Response3: Thank you for your feedback. The relevant section has been modified accordingly. Please refer to lines 76-92 in the introduction section for the updated content.

Comments4: Table 2 should be a three-wire table.

Response4: Thank you for your suggestion. Table 2 has been revised.

Comments5: The conclusion lacks clarity and should be described objectively.

Response5: Thank you for your suggestion. The conclusion section has been revised and objectively described.

Reviewer #7

Comments1: This article aimed to improve the performance of the circulating cooling system. The motivation of this paper should be based on the application. Why is TD3 suitable for the system?

Response1: Thank you for your valuable feedback on our submitted paper. We chose the TD3 algorithm for controlling the circulating cooling water system because it can handle continuous action spaces, uses twin Q networks to reduce overestimation bias, and employs delayed policy updates and soft target updates to reduce function approximation errors, thereby providing more stable and accurate control; a small numerical illustration of the overestimation point is given below. Additionally, TD3's deep neural networks can effectively model the complex nonlinear and multivariable characteristics of the system, enabling real-time adaptation and optimized control and ultimately enhancing system performance. This content has been added to the new manuscript; please refer to lines 168-174 on page 7.
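As a small numerical illustration of the overestimation point (the noise level here is an arbitrary assumption chosen for exposition): when Q-value estimates carry zero-mean noise, taking the minimum of two independent estimates, as TD3's twin critics do, biases the target downward rather than upward.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q, noise_std, samples = 10.0, 1.0, 100_000

# Two independently noisy estimates of the same true Q-value.
q1 = true_q + rng.normal(0.0, noise_std, samples)
q2 = true_q + rng.normal(0.0, noise_std, samples)

print("single estimator mean error :", round(float(q1.mean() - true_q), 4))
print("min of twin estimators error:", round(float(np.minimum(q1, q2).mean() - true_q), 4))
```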

Comments2: What are the differences between the standard TD3 algorithm, and the proposed algorithm shown in Algorithm 1? The authors did not provide any details about the improvement.

Response2: The standard TD3 algorithm is not different from Algorithm 1; this study did not modify the TD3 algorithm itself. Instead, it introduced a reference trajectory model to the circulating cooling water system and designed an adaptive control structure for the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3-RTM) based on this model, as illustrated in Figure 4. The description of this aspect was not sufficiently clear in the original manuscript, but it has been detailed in the newly submitted manuscript.

Attachment

Submitted filename: Response to Reviewers.docx

pone.0307767.s004.docx (97.3KB, docx)

Decision Letter 3

Lalit Chandra Saikia

11 Jul 2024

Adaptive control for circulating cooling water system using deep reinforcement learning

PONE-D-23-24165R3

Dear Dr. Zhang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Lalit Chandra Saikia, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The paper is recommended for publication.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #6: All comments have been addressed

Reviewer #7: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #6: Yes

Reviewer #7: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #6: Yes

Reviewer #7: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #6: Yes

Reviewer #7: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #6: Yes

Reviewer #7: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #6: This paper has clear logic and reasonable structure, has certain innovation and application value, and it is recommended to be published.

Reviewer #7: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #6: No

Reviewer #7: No

**********

Acceptance letter

Lalit Chandra Saikia

15 Jul 2024

PONE-D-23-24165R3

PLOS ONE

Dear Dr. Zhang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Lalit Chandra Saikia

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Data and code from the experiments.

    (ZIP)

    pone.0307767.s001.zip (1.2MB, zip)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0307767.s002.docx (99.7KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0307767.s003.docx (98.5KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0307767.s004.docx (97.3KB, docx)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
