Abstract
The adoption of continuous pharmaceutical manufacturing has driven increased use of modeling, simulation, and advanced process control strategies. Artificial intelligence (AI) model-based approaches, such as neural network predictive control (NNPC), offer advantages in providing insights, predictions, and process adjustments. However, evaluating the credibility of such models and accurately quantifying their impact on product quality remains challenging. In this study, a digital twin model of a continuous direct compression (CDC) line was developed based on residence time distribution theory. A two-layer neural network model was trained using data from the digital twin to predict system outputs. The NNPC model combined the trained neural network with an optimization block to adjust control signals and minimize tracking error and control effort. A proportional-integral-derivative (PID) controller was also developed for comparison. The developed neural network model accurately represented the dynamics of the nonlinear system. The tuned NNPC outperformed PID in setpoint tracking (zero overshoot, shorter settling times) and disturbance rejection (≤1.6% peak deviation, settling time of zero) for ±20% and ±50% changes. In conclusion, the NNPC model outperformed PID control in both setpoint tracking and disturbance rejection for the simulated CDC line, underscoring the potential of AI-based control strategies for enhancing product quality and informing regulatory assessment.
Keywords: MPC, Neural Network, pharmaceutical continuous manufacturing, continuous direct compression line, RTD model
1. Introduction
Pharmaceutical manufacturing significantly impacts patient care, as manufacturing process failures may lead to product quality issues and potentially harm patients. In addition, product failures or issues with facilities, equipment, or manufacturing process can potentially result in disruptions to the drug supply (Ventola, 2011). Modernizing manufacturing technology has the capability to enhance the robustness of the manufacturing process with fewer interruptions in production, fewer product failures before or after product distribution, and greater assurance that the drug products manufactured in any given period will provide the expected clinical performance (O’Connor et al., 2016). Supporting and enabling pharmaceutical innovation and modernization is one of the U.S. Food and Drug Administration’s (FDA) key missions in protecting and promoting the public health (Lee et al., 2015). The agency continues to support flexible approaches for manufacturing high-quality pharmaceutical drug products. While the implementation of emerging technology is critical for advancing product design, modernizing pharmaceutical manufacturing, and improving quality, the FDA also recognizes that the adoption of innovative approaches may present challenges to industry (Lee et al., 2015).
The FDA actively supports the adoption of emerging technologies, including continuous manufacturing (CM) and model-based advanced control strategies for CM (FDA, 2025b; O’Connor et al., 2016). In the pharmaceutical industry, CM is an integrated approach to drug production where raw materials are continuously fed into the system, transformed through a series of interconnected unit operations, and the finished product is continuously removed (Domokos et al., 2021; Fisher et al., 2022). CM demonstrates potential to improve agility, flexibility, and robustness in the manufacture of pharmaceuticals (Lee et al., 2015; O’Connor et al., 2016). In contrast to traditional batch methods, CM enables production in an interconnected process with fewer steps and shorter processing times, potentially reducing the production cost and lowering the necessary material inventory. Additionally, CM allows for smaller-scale productions, and CM’s ability to process different amounts of material by simply adjusting the production time makes CM highly versatile in both the clinical and commercial scales without the need for scale-up (Engisch & Muzzio, 2016).
Over the past decade, there have been significant investments in the development of CM processes made by the pharmaceutical industry, equipment manufacturers, and academia. This includes new enabling technologies, advancements in process models, process analytical technology (PAT) tools, and advanced process control (APC) strategies for designing, analyzing, and controlling CM processes (Nagy et al., 2022). Model-based APC, particularly Model Predictive Control (MPC), is gaining traction in the pharmaceutical industry, especially with the shift towards CM. In contrast to traditional control strategies that rely on empirical or trial-and-error approaches, these control strategies leverage mathematical models of the process to predict future system behavior and proactively adjust control inputs, ensuring consistent product quality and optimizing operational efficiency in real time. This proactive approach allows for better management of complex, multi-variable processes and helps maintain critical quality attributes within desired specifications (Chopda et al., 2022; Destro et al., 2024).
Building on these advances, digital twins (DTs) have emerged as a transformative application in pharmaceutical manufacturing (Beke et al., 2021; Chen et al., 2020; Gerogiorgis & Castro-Rodriguez, 2021; Huang, 2023; Jajcevic et al., 2024; Miozza et al., 2024). Unlike traditional process models, which typically represent isolated aspects of a process, DTs are integrated, dynamic virtual replicas that combine multi-physics, multi-scale, probabilistic models with real-time data from PAT tools and manufacturing systems. This enables them not only to predict process performance under different operating conditions and potential disturbances, but also to continuously mirror and assess the actual state of their physical counterparts to help maintain a state of control.
“Industry 4.0” (Arden et al., 2021) represents a shift in pharmaceutical manufacturing toward integrated, autonomous, and self-organizing production systems. Industry 4.0 offers the potential for higher output, improved safety and quality, greater agility and flexibility, and reduced waste. Artificial intelligence (AI) is one example of a potentially disruptive technology driving this transformation. The FDA has recently emphasized the growing role of AI in drug development and manufacturing through several key initiatives. These include the “Discussion Paper: Artificial Intelligence in Drug Manufacturing” (FDA, 2023), which explores how AI can enhance pharmaceutical manufacturing processes by improving process control, increasing efficiency, and supporting quality assurance, while also identifying regulatory and technical challenges. This paper highlighted the need for robust data management practices, approaches for developing and validating AI-based models, and considerations for monitoring AI-based systems that learn and adapt over time. The FDA sought to promote dialogue with stakeholders to ensure the safe, effective, and consistent use of AI-based technologies in drug manufacturing. Complementing this, an analysis of public feedback submitted in response to the FDA’s discussion paper (Das et al., 2025) reviewed perspectives from industry, academia, and other stakeholders. This feedback underscored the need for clear guidelines on AI-based model validation and lifecycle management, expectations for data quality and governance, and strategies to manage continuously learning AI-based systems. Stakeholders emphasized regulatory flexibility to keep pace with rapid AI advances while maintaining robust oversight to protect product quality and patient safety, calling for collaborative dialogue to develop practical, science-based regulatory approaches for AI in pharmaceutical manufacturing.
These efforts are part of the broader CDER Framework for Regulatory Advanced Manufacturing Evaluation (FRAME) Initiative (FDA, 2025c), which aims to modernize regulatory evaluation of advanced manufacturing technologies. Additionally, the FDA has issued the draft guidance “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products” (FDA, 2025a). The draft guidance outlines a risk-based framework for evaluating the credibility of AI used across the drug product lifecycle. The draft guidance discusses expectations for data quality and integrity, documentation of AI model development and performance, and management of risks such as algorithm drift and bias. The guidance also emphasizes the importance of lifecycle oversight for AI models, recommending that sponsors have plans for monitoring, updating, and maintaining models over time. Together, these documents highlight AI’s transformative potential in pharmaceutical manufacturing and regulatory science, while underscoring the importance of addressing technical and regulatory complexities to ensure product quality and patient safety.
Among the applications of AI in drug development and manufacturing, neural networks for APC in CM systems are a promising example (Nagy et al., 2022). AI-based APC allows dynamic control of the process to achieve a desired output in combination with real-time measurements of critical quality attributes using PAT (Nagy et al., 2022; Roggo et al., 2020). Specifically, neural network approaches are trained with numerical algorithms on large datasets collected during production to (1) obtain actionable insight and predictions about processes and operations, (2) adjust processes to maintain optimal operating conditions, and (3) achieve high product quality. Their main advantage is that they can account for the large data sets collected during a production run.
While the use of artificial neural networks in pharmaceutical manufacturing processes has been widely explored in previous research (Nagy et al., 2022), there have been limited investigations into their application in APC. Therefore, the objectives of this study were to: (1) develop an artificial neural network predictive control (NNPC) model for controlling a CM process using a digital twin of the controlled system, (2) optimize and validate the NNPC model, and (3) explore general procedures for evaluating the NNPC model. Specifically, a digital twin of the controlled system, i.e., a simplified continuous direct compression (CDC) line, was represented by a mathematical model based on the residence time distribution (RTD) of the CM process. The digital twin model continuously calculated the outlet composition, such as active pharmaceutical ingredient (API) percent loading, for a given inlet composition. To represent the system’s dynamics, a two-layer neural network was trained using data collected from the digital system’s operation. Based on the trained neural network model, the NNPC model was used to control the digital system. The performance of the predictive controller was evaluated by comparing it with a traditional proportional-integral-derivative (PID) controller in both setpoint change and disturbance rejection scenarios. Simulink (MathWorks Inc., Natick, MA) was used as the platform for developing models, performing simulations, and conducting analyses for the system. This work aimed to provide necessary information and knowledge on the risks of using AI-based APC in continuous manufacturing and the impacts on product quality.
2. Methods
The initial stage of NNPC model development involved constructing a neural network model (see 2.1). The subsequent step involved establishing a controller that employed the neural network model in conjunction with an optimization technique to anticipate the system's future behavior and consistently optimize its performance (see 2.2).
2.1. Neural network substitution of the digital system
With proper training using input (u) and output (y) data collected from system operations, a neural network can capture the dynamics of a system as shown in Fig. 1. Throughout the training process, the neural network undergoes consistent optimization using updated error between the system/process output (yp) and the neural network model output (ym) at each time step.
Figure 1.

Schematic diagram of the digital system substitution by neural network model
The neural network design involves determining the network structure. Feedforward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear relationships between input and output vectors. The linear output layer plays an important role in function fitting or nonlinear regression problems. In this work, the network has one hidden layer with a sigmoid activation function. A network with such a structure can perform as a general function approximator, as it can approximate any function with a finite number of discontinuities, including nonlinear functions, arbitrarily well, given sufficient neurons in the hidden layer. The structure of the neural network model is illustrated in Fig. 2. The system input and output at the previous time step (u(t−1) and yp(t−1), respectively) were used as the inputs of the neural network model. Subsequently, the network predicted the future system output from the previous inputs and outputs of the system. The adaptive linear neuron (ADALINE) network was employed here considering its outstanding performance in prediction (Sun et al., 2021). The tapped delay line (TDL) component was also used to make full use of the ADALINE network (Beale et al., 2020; Bitzer & Omlin, 2000; Kaiser, 1994). The data obtained from system operations were utilized for offline batch training of the neural network. Training the neural network involved tuning the values of the weights and biases, i.e., IW1,1, IW1,2, b1, LW2,1, and b2, to optimize network performance, as defined by the network performance function F. This study employed the mean squared error (MSE), i.e., the average squared error between the network outputs and the target outputs, as the cost function F:
| $F = \mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(y_p(t) - y_m(t)\right)^2$ | (1) |
where N is the total number of time steps used for the training process, yp is the system output (target), and ym is the neural network model output. In this study, the Bayesian regularization algorithm (Burden & Winkler, 2009) was used as the optimization algorithm for F. The algorithm calculated the gradient of the network performance with respect to the network weights. The gradient was calculated using the backpropagation algorithm, which involves performing computations backward through the network (Hagan, 1996). Bayesian regularization minimizes a linear combination of squared errors and weights, and it modifies this linear combination so that at the end of training the resulting network has good generalization qualities (Foresee & Hagan, 1997; MacKay, 1992).
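As an illustration of the model structure and training criterion described above, the following is a minimal sketch of a network with one hidden layer of sigmoid neurons and a linear output neuron, together with the MSE. It assumes a logistic sigmoid and a single delayed input and output; the function names, the argument names, and the mapping of `iw_u` and `iw_y` to IW1,1 and IW1,2 are illustrative assumptions, not the toolbox implementation.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (an assumed choice of sigmoid transfer function)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_next_output(u_prev, y_prev, iw_u, iw_y, b1, lw, b2):
    """Predict the next system output from the previous input and output.

    iw_u, iw_y : hidden-layer weights on the delayed input and delayed output
    b1         : hidden-layer biases (one per hidden neuron)
    lw, b2     : weights and bias of the single linear output neuron
    """
    hidden = [sigmoid(wu * u_prev + wy * y_prev + b)
              for wu, wy, b in zip(iw_u, iw_y, b1)]
    return sum(w * h for w, h in zip(lw, hidden)) + b2


def mean_squared_error(targets, outputs):
    """Average squared error between target outputs and network outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)
```

Training then amounts to searching for the weights and biases that minimize `mean_squared_error` over the recorded input and output data.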
Figure 2.

Structure of the neural network model (Beale et al., 2020)
In the context of neural network training, several factors are of particular interest: the performance of the neural network, denoted as F, the magnitude of the gradient of performance, and the number of validation checks. The latter two factors play a crucial role in determining when to terminate the training process. The gradient typically becomes very small as training reaches a minimum of the performance. If the magnitude of the gradient falls below a pre-defined threshold value, e.g., 1.0E-05, the training process will terminate. Similarly, the number of validation checks represents the number of consecutive iterations in which the validation performance fails to improve. If this number reaches 6 (the default value), training stops. For a comprehensive presentation of the network training and validation results, please refer to 3.1.
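The two termination criteria described above can be sketched as follows. This is a simplified illustration that assumes "fails to improve" means the validation error is not lower than the best value seen so far; the function name and its arguments are hypothetical.

```python
def stopping_reason(grad_norms, val_errors, min_grad=1.0e-5, max_fail=6):
    """Return (epoch, reason) for the first stopping criterion met, else None.

    grad_norms : gradient magnitude recorded at each epoch
    val_errors : validation-set error recorded at each epoch
    min_grad   : gradient threshold below which training terminates
    max_fail   : allowed number of consecutive validation checks without improvement
    """
    best_val = float("inf")
    fails = 0
    for epoch, (grad, val) in enumerate(zip(grad_norms, val_errors)):
        if grad < min_grad:
            return epoch, "gradient"
        if val < best_val:
            best_val = val   # new best validation error: reset the counter
            fails = 0
        else:
            fails += 1       # validation performance failed to improve
            if fails >= max_fail:
                return epoch, "validation"
    return None
```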
2.2. Model-based predictive control
The predictive controller was designed using the NNPC toolbox integrated in Simulink (MathWorks Inc., Natick, MA). The working principles for this predictive controller are summarized as follows:
At each time step, the controller receives and estimates the current state of the system, specifically the system input and output at current time step.
The controller then calculates the sequence of control actions that minimizes the cost function over a finite and receding horizon by solving a constrained optimization problem that relies on a system model (neural network model trained in previous section) and the current state of the system.
Next, the controller applies to the system only the first computed control action, ensuring the control action has time to take effect before the next control action is computed.
This process repeats in subsequent time steps, enabling the controller to continuously adapt and respond to the evolving dynamics of the system.
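The receding-horizon principle summarized above can be sketched as follows. The sketch makes strong simplifying assumptions that are not part of this study: a hypothetical first-order linear model stands in for the trained neural network, the same model is used as the plant, the input is held constant over the horizon, and the optimization is a brute-force search over a grid of candidate inputs.

```python
def predict(y0, u_seq, a=0.9, b=0.1):
    """Hypothetical first-order model y[k+1] = a*y[k] + b*u[k] (stand-in for the network)."""
    y, outputs = y0, []
    for u in u_seq:
        y = a * y + b * u
        outputs.append(y)
    return outputs


def best_first_move(y0, setpoint, horizon, candidates):
    """Pick the candidate input with the smallest squared tracking error over the horizon."""
    def cost(u):
        return sum((setpoint - y) ** 2 for y in predict(y0, [u] * horizon))
    return min(candidates, key=cost)


def run_closed_loop(setpoint=1.2, steps=60, horizon=10):
    """Repeat: optimize over the horizon, apply only the first move, then re-optimize."""
    candidates = [i * 0.02 for i in range(101)]  # inputs from 0.0 to 2.0 (0 to 200%)
    y, history = 1.0, []
    for _ in range(steps):
        u = best_first_move(y, setpoint, horizon, candidates)
        y = predict(y, [u])[0]  # the model doubles as the plant in this sketch
        history.append((u, y))
    return history
```

Calling `run_closed_loop()` returns the applied inputs and resulting outputs; in this toy setting the output approaches the new setpoint of 1.2 while the input stays within its limits.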
The NNPC model consists of a neural network model and an optimization block (see Fig. 3). The role of the optimization block is to adjust the values of the manipulated variable, u′, to minimize the cost function J. Once the optimal value of the control signal u is determined, it is transmitted to the system. The cost function used in the optimization technique over the prediction time horizon is expressed in Eq. (2),
| $J = \sum_{j=N_1}^{N_2}\left(y_r(t+j) - y_m(t+j)\right)^2 + \rho \sum_{j=1}^{N_u}\left(u'(t+j-1) - u'(t+j-2)\right)^2$ | (2) |
where N1 is the minimal prediction horizon of the output, N2 the prediction horizon of the output, Nu the control horizon, u′ the tentative control signal, yr the target/reference response, ym the network model response/prediction, and ρ the control weighting factor. The prediction horizon, N2, defines the number of future samples over which the controller attempts to minimize the cost function J. It should be sufficiently large to capture the transient response and encompass the significant dynamics of the system. A longer prediction horizon (larger N2) increases both performance and computational requirements. The control horizon, Nu, represents the number of free control moves that the controller uses to minimize J over N2. Similar to N2, a longer control horizon (larger Nu) increases both performance and computational requirements, while a smaller Nu promotes faster computations and maintains an internally stable controller. A good rule of thumb for Nu is to set it between 10% and 20% of N2, while ensuring a minimum of 2 to 3 steps. The control weighting factor, ρ, can prioritize the performance goals of the controller by adjusting the tuning weights of the cost function J. Typically, a smaller ρ provides aggressive reference tracking performance, while a larger ρ promotes smoother control moves that enhance robustness.
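To make the roles of the horizons and the weighting factor concrete, the sketch below evaluates a cost of the form commonly used in neural network predictive control: the sum of squared tracking errors from the minimal prediction horizon to the prediction horizon, plus the weighting factor times the sum of squared changes in the tentative control signal over the control horizon. The function name, argument layout, and numbers are illustrative assumptions.

```python
def nnpc_cost(y_ref, y_model, u_moves, u_prev, rho, n1=1):
    """Tracking error plus weighted control effort.

    y_ref, y_model : reference and predicted outputs; element j-1 holds step t+j,
                     so the list length is the prediction horizon N2
    u_moves        : tentative control signal over the control horizon Nu
    u_prev         : control signal applied at the previous step
    rho            : control weighting factor
    n1             : minimal prediction horizon
    """
    n2 = len(y_ref)
    tracking = sum((y_ref[j - 1] - y_model[j - 1]) ** 2 for j in range(n1, n2 + 1))
    u_full = [u_prev] + list(u_moves)
    effort = sum((u_full[k] - u_full[k - 1]) ** 2 for k in range(1, len(u_full)))
    return tracking + rho * effort
```

For example, `nnpc_cost([1, 1, 1], [0.5, 0.8, 1.0], [1.2, 1.1], 1.0, rho=0.05)` combines a tracking term of 0.29 with a control-effort term of 0.05 weighted by 0.05. Raising `rho` increases the penalty on the same control moves, which is why larger values favor smoother inputs.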
Figure 3.

Block diagram of NNPC
The controller requires proper tuning of its parameters, which directly affect the system’s performance. The minimal prediction horizon of the output, N1, was fixed at 1, the default value in the toolbox. The backtracking line search routine was used to locate the minimum of the cost function J in a given direction (Dennis, 1983).
The minimization ceases when the following condition is satisfied,
| (3) |
where the difference in between two successive iterations, and a search parameter which determines sufficient reduction in minimization performance (Jorge Nocedal, 2006). Discussion on the tuning of the controller’s parameters is presented in 3.2.
2.3. Digital twin of controlled system
RTD plays a key role in material traceability by characterizing how material flows through the process, and it is commonly determined experimentally and/or by process modeling (Bhalode et al., 2021; Jelsch et al., 2021). We therefore developed a digital twin of the continuous manufacturing system by capturing the RTD of the dynamic system. Material tracking and process dynamics across the whole system were determined by the convolution integral in combination with the RTD (Bhalode et al., 2021; Engisch & Muzzio, 2016):
| $C_{out}(t) = \int_0^t C_{in}(t-\theta)\,E(\theta)\,d\theta$ | (4) |
where Cout(t) is the outlet composition, Cin(t) the inlet composition, and E(t) the RTD of the unit operation(s).
Applying the generalized equation for equally sized continuously stirred tank reactors (CSTRs) placed in series, we can determine E(t) as
| $E(t) = \frac{t^{n-1}}{(n-1)!\,(\tau/n)^{n}}\exp\left(-\frac{n t}{\tau}\right)$ | (5) |
where τ is the mean residence time and n is the number of CSTRs. Values of τ and n can be determined by fitting the RTD experimental data to the tanks-in-series model. In this work, we assumed a system with τ = 100 s and n = 2 (Hurley et al., 2022; Yang et al., 2023), and the E(t) of the system is described in Fig. 4. This system represents a simplified CDC manufacturing line including one feed frame, one blender and a rotary tablet press (Bekaert et al., 2022). The sample data set, i.e., (u, yp), used for neural network training in 2.1 was generated from the digital twin model based on Eqs. (4–5). Specific to this study, for a CDC line (Roggo et al., 2020), Cin can be the API concentration at the blender’s entrance and Cout is the API concentration in the tablet. Depending on the process to be investigated, u = Cin and yp = Cout.
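The digital twin calculation can be sketched as follows, assuming the standard tanks-in-series form of the RTD and a rectangle-rule discretization of the convolution integral. The default values of 100 s and two tanks follow the caption of Fig. 4; the function names and the time step are illustrative assumptions.

```python
import math


def tanks_in_series_rtd(t, tau=100.0, n=2):
    """RTD E(t) of n equally sized CSTRs in series with total mean residence time tau."""
    if t < 0:
        return 0.0
    tau_i = tau / n  # mean residence time of a single tank
    return t ** (n - 1) / (math.factorial(n - 1) * tau_i ** n) * math.exp(-t / tau_i)


def outlet_composition(c_in, dt, tau=100.0, n=2):
    """Outlet composition as the discrete convolution of the inlet composition with E(t).

    c_in : inlet composition sampled every dt seconds, starting at t = 0
    """
    e = [tanks_in_series_rtd(j * dt, tau, n) for j in range(len(c_in))]
    return [sum(c_in[k - j] * e[j] for j in range(k + 1)) * dt
            for k in range(len(c_in))]
```

For a constant inlet composition, the computed outlet composition rises from zero and approaches the inlet value once several mean residence times have elapsed, which gives a quick sanity check on the discretization.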
Figure 4.

RTD for the tanks-in-series model with a mean residence time of 100 s and two tanks (Jelsch et al., 2021; Su et al., 2019)
3. Results and Discussion
3.1. Neural network model training and validation
When training multilayer neural networks, a common practice involves first dividing the data into three subsets, namely training, validation, and testing data sets, using a typical partition ratio of 0.5, 0.25, and 0.25, respectively. The training set is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The MSE on the validation set is monitored during the training process. The validation set MSE normally decreases during the initial phase of training, as does the training set MSE. However, when the network begins to overfit the data, the MSE on the validation set typically begins to rise. The network weights and biases are stored at the minimum of the validation set MSE. The last subset is the testing set, which is used for verifying the trained weights and biases after training process is completed. The test set MSE is not used during training, but it is useful to compare the performance of different models. Thus, it is of interest to plot the test set MSE during the training process. If the MSE on the test set reaches a minimum at a significantly different iteration number than the validation set MSE, this might indicate a poor division of the data set. It should be noted that the validation process mentioned in this study is part of the neural network model development process and should not be confused with the concept of model validation and life cycle considerations.
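The data division described above can be sketched as follows. The sketch assumes the samples are split into contiguous blocks in their recorded order; whether the division in this study was contiguous or random is not stated, and the function name is hypothetical.

```python
def split_dataset(samples, ratios=(0.5, 0.25, 0.25)):
    """Split samples into training, validation, and testing subsets.

    ratios gives the training and validation fractions; the testing subset
    receives whatever remains, so no sample is dropped by rounding.
    """
    n = len(samples)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test
```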
For the current CDC case study, the neural network was trained for 2000 epochs. As illustrated in Fig. 5 (a), the performance plot shows the MSEs as a function of iteration number. Notably, the MSE decreases substantially in the initial iterations for both the training and validation processes. At the 2000-epoch mark, the network achieved its best validation performance, with an MSE of 2.9596E-08. After 2000 iterations, the testing MSE was 2.24E-06. Consequently, Fig. 5 (a) indicates that no significant issues arose during the training procedure.
Figure 5.

Neural network training and post-training analysis: (a) training performance and (b) regression plot
The next step in validating the network was to generate a regression plot, showing the relationship between the outputs of the network and the targets (see Fig. 5 (b)). The solid line represents the best fit linear regression line between outputs and targets. The R value is an indication of the relationship between the outputs and targets. R = 1 indicates an exact correlation between outputs and targets.
Figure 6 illustrates the response of the resulting system model for training, validation and testing. The errors between system outputs and network outputs are within the range of ±2.0E-03 for training results. The testing errors are in the range of (−1.0E-03, 5.0E-04), indicating that the neural network was well trained. Consequently, the training results confirm that the neural network can represent the nonlinear system accurately. It is worth mentioning that the maximum system input was set to 200% for the training process as initial system input was considered as 100% label claim. Therefore, this study only investigates a possible setpoint change within a range of 0 to 200% (see Fig. 6 Input and Output). If larger setpoint change control is anticipated, the network needs proper retraining accordingly.
Figure 6.

Neural network (a) training, (b) validation and (c) testing response outcomes
3.2. Model calibration
Several control parameters needed to be tuned in the predictive model, including N2, Nu, and ρ. Thus, a sequence of closed-loop control tests was carried out with various control parameters. For all test cases, we defined a fixed setpoint decrease of 50%, of the target system output at time (see Figs. 7 (a) to (d)). The prediction horizon N2 was first determined based on the system output plot shown in Fig. 7 (a) at a fixed control horizon Nu (i.e., ). It was found that when N2 increased to 30, no overshoot was observed during the control process. If , a slight overshoot (approximately 0.4%) can be noticed. The final value of N2 was set to 30 because it resulted in a shorter response time than N2 = 40, and a larger value of N2 required a higher computational cost. The control horizon Nu was set to 10% to 20% of N2. Figure 7 (b) illustrates that the system input fluctuated and could not converge to a steady state within 100 s when , which indicated that other parameters required further tuning. Consequently, the control weighting factor ρ was tuned. The results of tuning ρ are shown in Figs. 7 (c) and (d). In Fig. 7 (c), the value of ρ had minimal influence on the system output, and there was a 0.04% offset regarding the setpoint change. Figure 7 (d) shows the effect of ρ on the system input. When , the input was able to converge to 100% within 100 s, whereas input fluctuations occurred when ρ took other values.
Figure 7.

NNPC model parameter tuning with a 50% decrease of setpoint: (a) system output and (b) input at various values of N2 and Nu; (c) system output and (d) input at various values of ρ
Other control parameters, such as iterations per sample time (IPST) and the search parameter α, were also tuned. Specifically, within the range of 2 to 10, IPST had no significant effect on either the system input or the output (see Figs. 8 (a) and (b)). We selected an IPST of 5 as there was no observable difference between the results when IPST = 5 and IPST = 10. The tuning results of α are shown in Figs. 8 (c) and (d). was used for following control cases considering its control accuracy and computational cost required.
Figure 8.

NNPC parameter tuning results for IPST: (a) system input and (b) output; and for α: (c) system input and (d) output
3.3. System with PID controller
A feedback PID controller was developed for the dynamical system described above as a comparison to the NNPC model. This choice was made because PID control is the predominant control technique in various industrial applications (Borase et al., 2021). The control parameters in the PID block were semi-automatically tuned using the Simulink PID Tuner. PID Tuner provides a fast and widely applicable single-loop PID tuning method for the Simulink PID Controller blocks. With this method, PID controller parameters can be tuned to achieve a robust design with the desired response time. Specifically, the PID Tuner automatically computes a linearized system model seen by the controller. The Tuner identifies the system input and output and uses the current operating point for the linearization. Then, the controller in the PID Tuner can be tuned by manually adjusting design criteria, i.e., response time and transient behavior, and the tuner computes PID parameters that robustly stabilize the system (see Fig. 9). Finally, the tuned parameters of the designed controller are exported back to the PID controller block, from where verification in the closed-loop system can be performed. In this study, two sets of PID parameters were selected for comparison purposes (see Table 1). PID-1 provided the most robust control behavior (smallest overshoot, defined in Eq. (6)), while PID-2 had the shortest response time.
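For readers less familiar with PID control, the following is a minimal discrete-time sketch of a parallel-form PID controller with output limits. It is an illustration under stated assumptions rather than the Simulink PID Controller block: the derivative is unfiltered, no anti-windup is included, and the class name, gains, and time step are hypothetical.

```python
class PID:
    """Parallel-form PID: u = kp*e + ki*integral(e) + kd*de/dt, clamped to [u_min, u_max]."""

    def __init__(self, kp, ki, kd, dt, u_min=0.0, u_max=200.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement):
        """Return the clamped control signal for one sample."""
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(self.u_max, max(self.u_min, u))
```

The default limits of 0 and 200 mirror the input range used in this study; unlike the predictive controller, this controller reacts only to the current and past error and does not use a process model.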
Figure 9.

PID tuning illustration
Table 1.
Tuned PID controller parameters.
| Controller | P | I | D |
|---|---|---|---|
| PID-1 | 1.2412 | 0.010787 | 14.6072 |
| PID-2 | 17.6324 | 0.01551 | 15.648 |
3.4. NNPC vs PID
After the verification and validation of NNPC control parameters, the closed-loop response of the control system was evaluated. We focused on the closed-loop response assessment in two main aspects: setpoint tracking and disturbance rejection. In a closed-loop system, setpoint tracking involves the assessment of the controlled variable response to a change in its setpoint, while disturbance rejection involves the assessment of any process variable’s response to a disturbance in its value. In this study, setpoint tracking performance was assessed using three metrics, i.e., overshoot , rise time and settling time . These metrics were defined as follows,
| (6) |
| (7) |
| (8) |
where the greatest overshoot, the new setpoint, and the system output. is a time function, e.g., represents the time point when do not exceed 2% of the afterwards, and represents the time point that initially changes to .
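As a hedged illustration of how such metrics can be computed from a simulated response, the sketch below assumes that overshoot is expressed as a percentage of the new setpoint for a setpoint increase, and that settling time is the time elapsed from the setpoint change until the output stays within 2% of the new setpoint. These conventions and the function names are assumptions made for the example and may differ from the exact definitions in Eqs. (6) to (8).

```python
def overshoot_percent(y, setpoint):
    """Largest excursion above the new setpoint, as a percentage of the setpoint."""
    return max(0.0, (max(y) - setpoint) / setpoint * 100.0)


def settling_time(t, y, setpoint, t_change=0.0, band=0.02):
    """Time from t_change until y remains within band*setpoint of the setpoint.

    Returns 0.0 if y never leaves the band, and None if y has not settled
    by the end of the record.
    """
    last_outside = None
    for ti, yi in zip(t, y):
        if ti >= t_change and abs(yi - setpoint) > band * setpoint:
            last_outside = ti
    if last_outside is None:
        return 0.0
    for ti in t:
        if ti > last_outside:
            return ti - t_change  # first sample after the last excursion
    return None
```

Under this convention, a response that never leaves the 2% band has a settling time of zero.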
For the purpose of comparison, the system input was limited between 0 and 200% for both NNPC and PID models. The target output was adjusted to a new setpoint at . Fig. 10 (a) compares the NNPC and PID control responses to a 20% setpoint increase in the reference output, resulting in a new setpoint of 120%. The NNPC model adjusted the system input over a wider range than the two PID models, even though all three models had the same range limit on the system input. Thus, NNPC provided the best control of the system output, with a rise time of 28.49 s and a settling time of 45.62 s (see Table 2). Although PID-2 had a similar rise time to the NNPC model, its 27.8% overshoot led to a much longer settling time (200.04 s) than NNPC (45.62 s) (see Table 2). For the 50% setpoint increase (see Fig. 10 (b)), the NNPC model again provided the best control performance, with zero overshoot, a rise time of 58.43 s, and a settling time of 85.32 s. In contrast, PID-2 exhibited the worst performance with a 46.9% overshoot, resulting in a settling time of 287.85 s, which was 105.22 s longer than PID-1 and 202.53 s longer than NNPC (Table 2). Similar results can be observed in Figs. 10 (c) and (d) for decreases in the setpoint. An interesting observation is that for PID-1, the settling time was about 11 s longer for the 20% change than for the 50% change.
Figure 10.

Comparison of NNPC and PID model control results for setpoint changes: (a) +20%, (b) +50%, (c) −20%, and (d) −50%
Table 2.
Controllers’ response assessment comparison regarding setpoint changes.
| Setpoint change | Controller | Overshoot (%) | Rise time (s) | Settling time (s) |
|---|---|---|---|---|
| +20% | PID-1 | 0 | 121.85 | 193.52 |
| +20% | PID-2 | 27.80 | 28.69 | 200.04 |
| +20% | NNPC | 0 | 28.49 | 45.62 |
| +50% | PID-1 | 0 | 116.74 | 182.63 |
| +50% | PID-2 | 46.90 | 58.41 | 287.85 |
| +50% | NNPC | 0 | 58.43 | 85.32 |
| −20% | PID-1 | 0 | 122.15 | 193.12 |
| −20% | PID-2 | 27.80 | 29.31 | 200.08 |
| −20% | NNPC | 0 | 29.58 | 45.64 |
| −50% | PID-1 | 0 | 117.26 | 182.61 |
| −50% | PID-2 | 46.90 | 59.59 | 287.66 |
| −50% | NNPC | 0 | 59.61 | 85.16 |
Table 3.
Controllers’ response assessment comparison regarding disturbance rejection.
| Controller | Peak deviation (%) | Settling time (s) |
|---|---|---|
| | | |
| PID-1 | 37.1 | 485.5 |
| PID-2 | 4.3 | 189.7 |
| NNPC | 1.6 | 0 |
| | | |
| PID-1 | 37.1 | 485.8 |
| PID-2 | 4.4 | 189.8 |
| NNPC | 1.56 | 0 |
| | | |
| PID-1 | 37.1 | 485.5 |
| PID-2 | 4.4 | 189.7 |
| NNPC | 1.4 | 0 |
| | | |
| PID-1 | 37.1 | 485.7 |
| PID-2 | 4.4 | 189.6 |
| NNPC | 1.5 | 0 |
In terms of disturbance rejection, two additional metrics were used to assess the controllers’ performance: the peak deviation (Eq. 9) and the settling time (Eq. 10). The peak deviation is based on the greatest deviation of the system output, and the settling time is measured from the point at which the disturbance was first introduced.
The disturbance was defined as the ratio of an unexpected change in the system input to its original value. A disturbance lasting 100 s was introduced to the system input, and four disturbance values were investigated, i.e., ±20% and ±50%. As shown in Fig. 11, NNPC provided the best disturbance rejection performance. Specifically, in all four cases the NNPC model had a peak deviation of at most 1.6%, which guaranteed a settling time of zero. In contrast, PID-1 exhibited the worst disturbance rejection, with a peak deviation of 37.1% and a settling time of approximately 485.5 to 485.8 s, while PID-2 performed better than PID-1, with a peak deviation of 4.3% to 4.4% and a settling time of approximately 189.6 to 189.8 s (see Table 3).
Figure 11.

Comparison between NNPC and PID control results for disturbance rejection; panels (a) to (d) correspond to the four disturbance values investigated
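The two disturbance-rejection metrics can be evaluated from a sampled output trace. The Python sketch below is a minimal illustration under stated assumptions: peak deviation is taken as the largest absolute deviation from the reference output, expressed as a percentage of the reference, and settling time is the time from the onset of the disturbance until the output last re-enters a ±2% band. The function name `disturbance_metrics`, the 2% band, and the test signals are hypothetical and are not taken from the paper.

```python
import numpy as np


def disturbance_metrics(t, y, y_ref, t_dist, band=0.02):
    """Return (peak_deviation_pct, settling_time) after a disturbance.

    Assumed conventions (not taken from the paper):
      * peak deviation = max |y - y_ref| / |y_ref| * 100 for t >= t_dist,
      * settling time = time from t_dist until the output last re-enters
        a +/- `band` window around y_ref; zero if it never leaves it.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    dev_pct = np.abs(y - y_ref) / abs(y_ref) * 100.0
    after = t >= t_dist
    peak_deviation_pct = float(dev_pct[after].max())

    outside = after & (dev_pct > band * 100.0)
    if not outside.any():
        settling_time = 0.0  # output never left the band
    else:
        last_out = int(np.flatnonzero(outside)[-1])
        settle_idx = min(last_out + 1, len(t) - 1)
        settling_time = float(t[settle_idx] - t_dist)

    return peak_deviation_pct, settling_time


# Hypothetical traces sampled every 1 s around a 100% reference output.
t_demo = np.arange(0.0, 300.0, 1.0)

y_small = np.full_like(t_demo, 100.0)
y_small[50:60] = 101.5          # excursion stays inside the 2% band

y_large = np.full_like(t_demo, 100.0)
y_large[50:80] = 104.0          # excursion leaves the band for 30 s

small_result = disturbance_metrics(t_demo, y_small, 100.0, t_dist=50.0)
large_result = disturbance_metrics(t_demo, y_large, 100.0, t_dist=50.0)
```

Under these assumed definitions, a response whose peak deviation stays inside the band has a settling time of zero by construction, which is consistent with the NNPC rows of Table 3.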
Moreover, in the cases shown in Figs. 11 (b) and (d), although the NNPC model produced a satisfactory system output, the input generated by the model did not reach a steady state (black curves). This indicated that the control parameters of the predictive control model needed further tuning. Possible tuning methods include: (1) reducing the prediction horizon, (2) increasing the iterations per sample time, and (3) adjusting the weighting factor. The NNPC response to these disturbances after further tuning is illustrated in Fig. 12.
Figure 12.

NNPC disturbance response with the finalized control parameters, including IPST = 5
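The role of the weighting factor among the tuning options can be seen from the cost that a predictive controller minimizes at each sample. The sketch below assumes the common quadratic form, namely a sum of squared tracking errors over the prediction horizon plus a weighted sum of squared control moves over the control horizon. The function name `nnpc_cost` and all numerical values are hypothetical, and the cost function used in the paper may differ in detail.

```python
import numpy as np


def nnpc_cost(y_ref, y_pred, u_tentative, u_prev, rho):
    """Quadratic predictive-control cost (assumed form, for illustration).

    J = sum_j (y_ref[j] - y_pred[j])**2 + rho * sum_j (du[j])**2

    y_ref, y_pred : reference and network-predicted outputs over the
                    prediction horizon
    u_tentative   : tentative control signal over the control horizon
    u_prev        : control signal applied at the previous sample
    rho           : control weighting factor
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u_tentative = np.asarray(u_tentative, dtype=float)

    tracking = np.sum((y_ref - y_pred) ** 2)

    # Control moves: difference between consecutive control values,
    # starting from the previously applied input.
    moves = np.diff(np.concatenate(([u_prev], u_tentative)))
    effort = np.sum(moves ** 2)

    return float(tracking + rho * effort)


# Hypothetical numbers: same predicted trajectory, two weighting factors.
cost_low_rho = nnpc_cost([120, 120, 120], [110, 118, 120], [150, 130], 100.0, rho=0.05)
cost_high_rho = nnpc_cost([120, 120, 120], [110, 118, 120], [150, 130], 100.0, rho=0.5)
```

With a larger weighting factor the same sequence of control moves is penalized more heavily, so the optimizer tends to favor smoother inputs at the expense of slower tracking. This is the general trade-off behind adjusting the weighting factor when the input does not settle.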
Overall, the simulation results demonstrated the excellent performance of the NNPC in terms of both setpoint change and disturbance rejection scenarios in comparison with the PID controller.
Based on the above discussion, the general procedure for training and evaluating the NNPC model is summarized as follows:
1. Determine the structure of the neural network, i.e., the size of the hidden layer, the number of delayed system inputs and outputs, and the network parameters. This is done by trial and error, guided by the neural network training results.
2. Determine the prediction horizon and control horizon of the predictive control model. In general, the control horizon is 10% to 20% of the prediction horizon, with a minimum value of 2. The prediction horizon can be tuned based on the closed-loop control performance, specifically the system output and input responses.
3. Tune and verify the other control parameters, such as the control weighting factor, iterations per sample time, and search parameter, based on the closed-loop control results.
Finally, it should be noted that if the above tuning process is done based on setpoint change control results, the tuned parameters need to be verified for a disturbance rejection scenario.
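The rule of thumb for the control horizon in the procedure above can be written as a small helper. This is a sketch only: the function name `suggest_control_horizon` and the default fraction of 15% are assumptions chosen for illustration, and the result is a starting point for tuning, not a final value.

```python
def suggest_control_horizon(prediction_horizon, fraction=0.15, minimum=2):
    """Starting value for the control horizon.

    Implements the rule of thumb that the control horizon is 10% to 20%
    of the prediction horizon, with a minimum value of 2. The default
    fraction (15%) is an arbitrary midpoint chosen for illustration.
    """
    if not 0.10 <= fraction <= 0.20:
        raise ValueError("fraction should lie between 0.10 and 0.20")
    if prediction_horizon < 1:
        raise ValueError("prediction_horizon must be a positive integer")
    return max(minimum, round(fraction * prediction_horizon))


# Hypothetical prediction horizons, in numbers of control intervals.
short_horizon = suggest_control_horizon(7, fraction=0.10)
medium_horizon = suggest_control_horizon(40)
long_horizon = suggest_control_horizon(50, fraction=0.20)
```

The suggested value would then be refined against closed-loop results for both setpoint changes and disturbance rejection, as the procedure notes.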
3.5. Advantages and limitations of current NNPC
The main advantages of NNPC over the PID controllers used in this study are summarized as follows:
- The combination of a neural network and model predictive control captured the system’s complexity better, with the neural network trained on either digital twin data or an experimental data set;
- A two-layer neural network has been mathematically proven capable of representing any nonlinear behavior of a dynamic system, so long as the system is not highly discontinuous;
- The NNPC model offered superior control performance compared to the PID controller;
- The NNPC depended only on the system inputs and outputs, whereas the PID controller required a mathematical model (i.e., transfer function) of the system, as mentioned in the Introduction.
The main limitations of NNPC in comparison with PID control in this study were:
- More parameters in the NNPC model needed proper tuning and verification. For example, the networks were sensitive to the number of neurons in their hidden layers. Too few neurons led to underfitting. Too many neurons led to overfitting, in which all training points were well fitted but the fitted curve oscillated wildly between these points.
- Although multilayer networks may theoretically be capable of performing any linear or nonlinear computation and of approximating any reasonable function arbitrarily well, backpropagation did not always find a solution. Retraining several times may be required.
- A higher computational cost was required, as the NNPC model used a more complex algorithm than PID control.
- The neural network was trained on data generated by the digital twin, i.e., the RTD-based system model, which did not account for process uncertainties. The training data therefore did not fully represent actual manufacturing process conditions, and it would be preferable to train the neural network on experimental data (system inputs and outputs) collected from a manufacturing process.
- The actual behavior and characteristics of the feeder, which adjusts the input flowrate of the material, were not considered in this project. In other words, the feeder may not be able to execute the flowrate adjustment requested by the NNPC control signal because of mechanical limitations and its slower practical response time. For example, as shown in Fig. 10 (a), there might be mechanical restrictions on reducing the input from 200% to 0 in about 20 s. Therefore, the physical behavior of the system should also be taken into consideration before implementing the NNPC model in a real manufacturing process. This last limitation can be addressed by including an upper bound on the rate of change of the control action in the optimization step.
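The idea of bounding the rate of the control action can be illustrated with a simple post-processing step. Note that this sketch clips the requested move after it has been computed, which is a simpler stand-in for adding the bound as a constraint inside the optimization step. The function name `limit_control_move`, the rate limit, and the sample time are hypothetical values and do not describe any specific feeder.

```python
def limit_control_move(u_prev, u_requested, max_rate, dt, u_min=0.0, u_max=200.0):
    """Clip a requested control input to a maximum rate of change.

    u_prev, u_requested : previous and requested inputs (% of nominal)
    max_rate            : largest allowed change per second (%/s), assumed
    dt                  : control interval in seconds
    u_min, u_max        : absolute input limits (0 to 200% in this study)
    """
    max_step = max_rate * dt
    delta = u_requested - u_prev
    # Limit the size of the move, then enforce the absolute input bounds.
    delta = max(-max_step, min(max_step, delta))
    return max(u_min, min(u_max, u_prev + delta))


# Hypothetical feeder that can change its input by at most 2 %/s.
u_limited = limit_control_move(u_prev=200.0, u_requested=0.0, max_rate=2.0, dt=1.0)
```

With these hypothetical numbers, a request to drop the input from 200% to 0 in one step is reduced to a 2% move, so the full reduction would take 100 s. Placing the same bound inside the optimizer would let the controller plan around the limit instead of having its requests truncated.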
Accordingly, the above limitations can be addressed in future work by:
- Employing a more complex controlled system, e.g., a system spanning from the feeders to tableting.
- Optimizing the number of hidden layers and investigating its impact on the final control performance.
- Investigating the performance of the NNPC model under larger disturbances and setpoint changes to explore the limits of the controller’s capability.
- Integrating the equipment dynamics and characteristics into the NNPC model to better understand the performance of the controller.
- Investigating the suitability of other AI algorithms for different CM process control applications.
4. Conclusion
This study successfully developed, optimized, and validated an NNPC model for a simplified CDC process, demonstrating its superior performance in setpoint tracking and disturbance rejection compared to traditional PID control. The findings highlight the potential of AI-based APC to manage complex, nonlinear system dynamics inherent in continuous pharmaceutical manufacturing. By providing a framework for developing and evaluating such models, this work offers valuable insights for establishing model credibility, supporting the broader modernization of pharmaceutical manufacturing, and aligning with key regulatory initiatives.
Despite these promising results, significant challenges remain. The lack of standardized mechanisms for neural network model verification and validation presents a hurdle for regulatory submissions, underscoring a critical area for future research. Further work is also needed to integrate equipment dynamics and ensure the model can be deployed for real-time control by aligning its time steps with physical hardware. Addressing these limitations will be crucial for the practical implementation of AI-based control systems and could substantially enhance regulatory quality assessment protocols for the pharmaceutical industry.
Acknowledgments
This research was supported by grants from the Regulatory Science and Review Enhancement (RSR) Program under the CDER Intramural Funding Programs. The authors would like to thank Mr. Akhilesh Mishra (Industry Marketing Manager, MathWorks) for technical support on the model development.
Nomenclature
- ADALINE: Adaptive linear neuron
- AI: Artificial intelligence
- APC: Advanced process control
- API: Active pharmaceutical ingredient
- CM: Continuous manufacturing
- CSTR: Continuous stirred tank reactor
- FDA: Food and Drug Administration
- IPST: Iterations per sample time
- MSE: Mean squared error
- NNPC: Neural network predictive control
- PAT: Process analytical technology
- PID: Proportional-integral-derivative
- RTD: Residence time distribution
- TDL: Tapped delay line
System inlet composition
System outlet composition
System initial setpoint/target output composition
System new setpoint/target output composition
Residence time distribution of the unit operation
Error between system target output and network predicted output
Neural network performance function
Cost function of predictive controller
Number of CSTRs
Number of training samples
Minimal prediction horizon of the output
Prediction horizon of the output
Control horizon
Time/time step
Control interval/sampling time
Control signal/system input
Tentative control signal/system input
Actual system output
Neural network predicted system output
System reference output
Bayesian regularization network weight
Search parameter
Control weighting factor
Mean residence time
Rise time
Settling time of setpoint change condition
Overshoot of the control system
Greatest overshoot
Peak deviation of the control system
Settling time of disturbance rejection condition
Footnotes
Disclaimer
This article reflects the views of the authors and should not be construed to represent FDA’s views or policies.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Arden NS, Fisher AC, Tyner K, Yu LX, Lee SL, & Kopcha M (2021). Industry 4.0 for pharmaceutical manufacturing: Preparing for the smart factories of the future. International Journal of Pharmaceutics, 602, 120554. 10.1016/j.ijpharm.2021.120554 [DOI] [PubMed] [Google Scholar]
- Beale MH, Hagan MT, & Demuth HB (2020). Deep Learning Toolbox User’s Guide. The MathWorks, Inc. [Google Scholar]
- Bekaert B, Van Snick B, Pandelaere K, Dhondt J, Di Pretoro G, De Beer T, Vervaet C, & Vanhoorne V (2022). Continuous direct compression: Development of an empirical predictive model and challenges regarding PAT implementation. International Journal of Pharmaceutics: X, 4, 100110. 10.1016/j.ijpx.2021.100110 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beke ÁK, Gyürkés M, Nagy ZK, Marosi G, & Farkas A (2021). Digital twin of low dosage continuous powder blending – Artificial neural networks and residence time distribution models. European Journal of Pharmaceutics and Biopharmaceutics, 169, 64–77. 10.1016/j.ejpb.2021.09.006 [DOI] [PubMed] [Google Scholar]
- Bhalode P, Tian H, Gupta S, Razavi SM, Roman-Ospino A, Talebian S, Singh R, Scicolone JV, Muzzio FJ, & Ierapetritou M (2021). Using residence time distribution in pharmaceutical solid dose manufacturing – A critical review. International Journal of Pharmaceutics, 610, 121248. 10.1016/j.ijpharm.2021.121248 [DOI] [PubMed] [Google Scholar]
- Bitzer T, & Omlin CW (2000). Neural Networks for Chaotic Time Series Prediction. In: University of Applied Sciences Konstanz, Germany. [Google Scholar]
- Borase RP, Maghade DK, Sondkar SY, & Pawar SN (2021). A review of PID control, tuning methods and applications. International Journal of Dynamics and Control, 9(2), 818–827. 10.1007/s40435-020-00665-4 [DOI] [Google Scholar]
- Burden F, & Winkler D (2009). Bayesian Regularization of Neural Networks. In Livingstone DJ (Ed.), Artificial Neural Networks: Methods and Applications (pp. 23–42). Humana Press. 10.1007/978-1-60327-101-1_3 [DOI] [Google Scholar]
- Chen Y, Yang O, Sampat C, Bhalode P, Ramachandran R, & Ierapetritou M (2020). Digital Twins in Pharmaceutical and Biopharmaceutical Manufacturing: A Literature Review. Processes, 8(9). [Google Scholar]
- Chopda V, Gyorgypal A, Yang O, Singh R, Ramachandran R, Zhang H, Tsilomelekis G, Chundawat SPS, & Ierapetritou MG (2022). Recent advances in integrated process analytical techniques, modeling, and control strategies to enable continuous biomanufacturing of monoclonal antibodies. Journal of Chemical Technology & Biotechnology, 97(9), 2317–2335. 10.1002/jctb.6765 [DOI] [Google Scholar]
- Das J, O’Connor TF, Fisher AC, Osorio M, Lam J, & Myers RC (2025). Public feedback to FDA on regulatory considerations for AI in drug manufacturing. AAPS Open, 11(1), 10. 10.1186/s41120-025-00110-w [DOI] [Google Scholar]
- Dennis JE, and Schnabel RB. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall. [Google Scholar]
- Destro F, Inguva PK, Srisuma P, & Braatz RD (2024). Advanced methodologies for model-based optimization and control of pharmaceutical processes. Current Opinion in Chemical Engineering, 45, 101035. 10.1016/j.coche.2024.101035 [DOI] [Google Scholar]
- Domokos A, Nagy B, Szilágyi B, Marosi G, & Nagy ZK (2021). Integrated Continuous Pharmaceutical Technologies—A Review. Organic Process Research & Development, 25(4), 721–739. 10.1021/acs.oprd.0c00504 [DOI] [Google Scholar]
- Engisch W, & Muzzio F (2016). Using Residence Time Distributions (RTDs) to Address the Traceability of Raw Materials in Continuous Pharmaceutical Manufacturing. Journal of Pharmaceutical Innovation, 11(1), 64–81. 10.1007/s12247-015-9238-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- FDA. (2023). Artificial Intelligence in Drug Manufacturing. Retrieved from https://www.fda.gov/media/165743/download
- FDA. (2025a). Considerations for the Use of Artificial Intelligence To Support Regulatory Decision-Making for Drug and Biological Products. Retrieved from https://www.fda.gov/media/184830/download
- FDA. (2025b). Examples of Accepted Emerging Technologies. U.S. Food and Drug Administration. https://www.fda.gov/about-fda/center-drug-evaluation-and-research-cder/examples-accepted-emerging-technologies [Google Scholar]
- FDA. (2025c). Framework for Regulatory Advanced Manufacturing Evaluation (FRAME) Initiative. https://www.fda.gov/about-fda/center-drug-evaluation-and-research-cder/cders-framework-regulatory-advanced-manufacturing-evaluation-frame-initiative
- Fisher AC, Liu W, Schick A, Ramanadham M, Chatterjee S, Brykman R, Lee SL, Kozlowski S, Boam AB, Tsinontides SC, & Kopcha M (2022). An audit of pharmaceutical continuous manufacturing regulatory submissions and outcomes in the US. International Journal of Pharmaceutics, 622, 121778. 10.1016/j.ijpharm.2022.121778 [DOI] [PubMed] [Google Scholar]
- Foresee FD, & Hagan MT (1997). Gauss-Newton approximation to Bayesian learning. Proceedings of International Conference on Neural Networks (ICNN’97), 3, 1930–1935 vol.1933. [Google Scholar]
- Gerogiorgis DI, & Castro-Rodriguez D (2021). A Digital Twin for Process Optimisation in Pharmaceutical Manufacturing. In Türkay M & Gani R (Eds.), Computer Aided Chemical Engineering (Vol. 50, pp. 253–258). Elsevier. 10.1016/B978-0-323-88506-5.50041-3 [DOI] [Google Scholar]
- Hagan MT, Demuth HB and Beale M (1996). Neural Network Design. PWS Publishing Co. [Google Scholar]
- Huang Y-S (2023). Digital Twin Development and Advanced Process Control for Continuous Pharmaceutical Manufacturing Purdue University]. [Google Scholar]
- Hurley S, Tantuccio A, Escotet-Espinoza MS, Flamm M, & Metzger M (2022). Development and Use of a Residence Time Distribution (RTD) Model Control Strategy for a Continuous Manufacturing Drug Product Pharmaceutical Process. Pharmaceutics, 14(2). [Google Scholar]
- Jajcevic D, Remmelgas J, Toson P, Matić M, Hörmann-Kincses T, Beretta M, Rehrl J, Poms J, O’Connor T, Koolivand A, Tian G, Krull SM, & Khinast JG (2024). Development of a high-fidelity digital twin using the discrete element method for a continuous direct compression process. Part 1. Calibration workflow. International Journal of Pharmaceutics, 666, 124796. 10.1016/j.ijpharm.2024.124796 [DOI] [PubMed] [Google Scholar]
- Jelsch M, Roggo Y, Kleinebudde P, & Krumme M (2021). Model predictive control in pharmaceutical continuous manufacturing: A review from a user’s perspective. European Journal of Pharmaceutics and Biopharmaceutics, 159, 137–142. 10.1016/j.ejpb.2021.01.003 [DOI] [PubMed] [Google Scholar]
- Jorge Nocedal SJW (2006). Numerical Optimization. Springer; 10.1007/978-0-387-40065-5 [DOI] [Google Scholar]
- Kaiser M (1994). Time-delay neural networks for control. IFAC Proceedings Volumes, 27(14), 967–972. 10.1016/S1474-6670(17)47423-4 [DOI] [Google Scholar]
- Lee SL, O’Connor TF, Yang X, Cruz CN, Chatterjee S, Madurawe RD, Moore CMV, Yu LX, & Woodcock J (2015). Modernizing Pharmaceutical Manufacturing: from Batch to Continuous Production. Journal of Pharmaceutical Innovation, 10(3), 191–199. 10.1007/s12247-015-9215-8 [DOI] [Google Scholar]
- MacKay DJC (1992). Bayesian Interpolation. Neural Computation, 4(3), 415–447. 10.1162/neco.1992.4.3.415 [DOI] [Google Scholar]
- Miozza M, Brunetta F, & Appio FP (2024). Digital transformation of the Pharmaceutical Industry: A future research agenda for management studies. Technological Forecasting and Social Change, 207, 123580. 10.1016/j.techfore.2024.123580 [DOI] [Google Scholar]
- Nagy B, Galata DL, Farkas A, & Nagy ZK (2022). Application of Artificial Neural Networks in the Process Analytical Technology of Pharmaceutical Manufacturing—a Review. The AAPS Journal, 24(4), 74. 10.1208/s12248-022-00706-0 [DOI] [PubMed] [Google Scholar]
- O’Connor TF, Yu LX, & Lee SL (2016). Emerging technology: A key enabler for modernizing pharmaceutical manufacturing and advancing product quality. International Journal of Pharmaceutics, 509(1), 492–498. 10.1016/j.ijpharm.2016.05.058 [DOI] [PubMed] [Google Scholar]
- Roggo Y, Jelsch M, Heger P, Ensslin S, & Krumme M (2020). Deep learning for continuous manufacturing of pharmaceutical solid dosage form. European Journal of Pharmaceutics and Biopharmaceutics, 153, 95–105. 10.1016/j.ejpb.2020.06.002 [DOI] [PubMed] [Google Scholar]
- Su Q, Ganesh S, Moreno M, Bommireddy Y, Gonzalez M, Reklaitis GV, & Nagy ZK (2019). A perspective on Quality-by-Control (QbC) in pharmaceutical continuous manufacturing. Computers & Chemical Engineering, 125, 216–231. 10.1016/j.compchemeng.2019.03.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun L, Liang F, & Cui W (2021). Artificial Neural Network and Its Application Research Progress in Chemical Process. Asian Journal of Research in Computer Science, 177–185. 10.9734/ajrcos/2021/v12i430302 [DOI] [Google Scholar]
- Ventola CL (2011). The drug shortage crisis in the United States: causes, impact, and management strategies. P & T : a peer-reviewed journal for formulary management, 36(11), 740–757. [PMC free article] [PubMed] [Google Scholar]
- Yang W, Krull S, Pavurala N, Xu X, O’Connor T, & Tian G (2023). Assessing residence time distributions and hold-up mass in continuous powder blending using discrete element method. Chemical Engineering Research and Design, 190, 10–19. 10.1016/j.cherd.2022.12.005 [DOI] [Google Scholar]
