CPT: Pharmacometrics & Systems Pharmacology. 2025 Nov 15;15(1):e70146. doi: 10.1002/psp4.70146

Neural Controlled Differential Equation and Its Application in Pharmacokinetics and Pharmacodynamics

Zhisong Wu 1, Pingyao Luo 1, Rong Chen 1, Yaou Liu 1,2, Weizhe Jian 1, Tianyan Zhou 1,
PMCID: PMC12823316  PMID: 41241764

ABSTRACT

With the recent advances in machine learning (ML) and artificial intelligence (AI), data‐driven modeling approaches for pharmacokinetics (PK) and pharmacodynamics (PD) have gained popularity due to their versatility in diverse settings and reduced reliance on prior assumptions. However, most ML methods ignore the dynamics underlying the data and therefore lack interpretability. This study investigated the applicability of the neural controlled differential equation (NCDE), a novel ML method suited to data‐driven modeling of PK and PD profiles, especially in multiple‐dosing settings. We demonstrated that NCDE can combine differential‐equation‐based dynamics with data‐driven flexibility, incorporate various types of inputs, and embed discontinuous dynamics. Moreover, a direct correspondence was identified between the learned dynamics of NCDE and the dynamics behind the data, which highlights the intrinsic interpretability of NCDE. Additionally, the influence of important hyperparameters was systematically investigated, and L1 regularization and the AdaMax optimizer were found to stabilize training and yield a generalizable NCDE model. Together, these findings demonstrate the accuracy, generalizability, and interpretability of NCDE, indicating that NCDE is a reliable method for further application. In the future, NCDE may further facilitate PK and PD prediction in general.

Keywords: AI4Science, data‐driven modeling, machine learning, neural controlled differential equations


Neural controlled differential equations (NCDEs), driven by control variables, are capable of learning the discontinuous dynamics in PK and PD datasets.



Study Highlights.

  • What is the current knowledge on the topic?
    • Data‐driven modeling approaches based on machine learning are increasingly popular in pharmacometrics. However, the capability of neural controlled differential equation (NCDE) for data‐driven modeling of PK and PD profiles has not been investigated.
  • What question did this study address?
    • Can NCDE serve as an accurate, flexible, and generalizable method for data‐driven modeling of PK and PD profiles, while preserving interpretability?
  • What does this study add to our knowledge?
    • NCDE is a powerful machine learning architecture for data‐driven modeling of PK and PD profiles, exhibiting flexibility to incorporate different types of input (continuous, discontinuous, and constant) while maintaining accuracy and interpretability. The study also provides insights into establishing a robust methodology for NCDE training.
  • How might this change clinical pharmacology or translational science?
    • NCDE can be further utilized in drug development for data‐driven modeling, allowing efficient identification of the dynamic patterns behind the data, which is useful for the preliminary investigation of PK and PD profiles. Moreover, when information from other modalities is incorporated, NCDE may further facilitate PK and PD prediction in general.

1. Introduction

Pharmacokinetic (PK) and pharmacodynamic (PD) modeling extensively utilizes ordinary differential equations (ODEs) that reflect well‐understood empirical or mechanistic principles [1, 2]. PK and PD models based on ODEs are more readily accepted by regulatory agencies due to the clear interpretation of parameters. However, traditional modeling approaches are hypothesis‐driven, time‐consuming, and require expert knowledge [3]. With recent advances in artificial intelligence (AI) and machine learning (ML), AI/ML methods in the context of PK and PD have become a plausible alternative to traditional methods [4, 5]. This paradigm shift reflects a data‐driven perspective of model construction, aiming to reduce subjective assumptions while possessing the versatility to handle different scenarios [4]. Especially when mechanistic hypotheses are not available, it is a reasonable choice to harness the power of AI/ML methods to uncover the hidden patterns behind the data [5, 6, 7].

Among various AI/ML methods, neural differential equations (NDEs), especially neural ordinary differential equations (NODEs), have attracted special attention in the field of PK and PD. A NODE uses an ODE as its underlying structure, which aligns with the modeling practices of pharmacometricians [8]. For example, Bräm et al. established a low‐dimensional NODE framework to model compartmental dynamics in various PK settings, where the dynamics of the NODE directly correspond to the dynamics behind the data [9, 10]. By contrast, other researchers used the latent‐ODE structure, a generative model with a complex architecture, to model PK profiles and disease progression [11, 12, 13].

While these works greatly advanced the acceptance of NDEs and NODE in pharmacometrics, their drawbacks are worth considering. First, although the complex structure of latent‐ODE enables it to handle multiple features, its predictions are inherently probabilistic because of the sampling procedure [14], and the same complexity reduces its interpretability. Second, although the low‐dimensional NODE is more interpretable, its input–output relationship is obscured. This is because the low‐dimensional NODE focuses exclusively on the dynamics of state variables (e.g., PK and PD profiles), so information that cannot be categorized as a state variable (e.g., time since the last dose and baselines) cannot be included as an input. Third, neither the low‐dimensional NODE nor latent‐ODE is suited to learning discontinuous dynamics. All these characteristics may hinder their application in PK and PD, especially in scenarios involving multiple dosing.

However, one type of NDE, namely neural controlled differential equation (NCDE), can combine the interpretability of low‐dimensional NODE and the versatility of latent‐ODE to handle complex inputs. NCDE can directly incorporate the relationship between independent and dependent variables (or explanatory and response variables), which exceeds the capability of NODE. When the independent variables exhibit discontinuities, NCDE can also be employed to model discontinuous dynamics, which is suitable for PK or PD profiles in multiple dosing scenarios. Inspired by these advantages, the study was aimed at investigating NCDE and its application in PK and PD. To establish a robust methodology, the influence of important hyperparameters was also explored. Furthermore, the study provides insights into the interpretability of the intrinsic dynamics of NCDE, illustrating its “gray‐box” characteristic to uncover the hidden dynamics behind the data.

2. Methods

2.1. Theory of NCDE

In pharmacometric modeling, ODEs are commonly used to describe PK and PD dynamics over a continuous time course. The general form of an ODE can be written as:

$$\mathrm{d}y(t) = f\big(y(t)\big)\,\mathrm{d}t \tag{1}$$

Here, y is the vector of n state variables that depend on time t, written as $y(t) \in \mathbb{R}^n$. The original or low‐dimensional NODE approximates the true dynamics $f(y(t))$ with a feedforward neural network $f_\theta$, typically a multilayer perceptron (MLP). However, both the ODE and the NODE can only describe continuous dynamics of the dependent variables y. To incorporate independent variables x that control the dynamics of the data, a controlled differential equation (CDE) becomes necessary. For an intuitive comparison with the ODE, the general form of a CDE can be written as:

$$\mathrm{d}y(t) = g\big(y(t)\big)\,\mathrm{d}x(t) \tag{2}$$

In Equation (2), the term $g(y)\,\mathrm{d}x$ denotes a matrix–vector product. Similarly, an MLP $g_\theta$ can be used to parameterize an NCDE. If $y \in \mathbb{R}^n$ and $x \in \mathbb{R}^m$, then $g(y) \in \mathbb{R}^{n \times m}$ so that the dimensions align. In this way, the ODE can be regarded as a special case in which the control x is simply t. For simplicity, we assume that the dynamics of the data can be fully characterized by the dependent variables y themselves. Moreover, a CDE can be solved by converting it into an ODE:

$$\frac{\mathrm{d}y}{\mathrm{d}t}(t) = g\big(y(t)\big)\,\frac{\mathrm{d}x}{\mathrm{d}t}(t) \tag{3}$$

Equation (3) holds on intervals where $\mathrm{d}x(t)/\mathrm{d}t$ exists, and splines (e.g., Hermite cubic splines) can be used to compute $\mathrm{d}x(t)/\mathrm{d}t$. More generally, however, integrating a CDE yields a Riemann–Stieltjes integral, a generalization of the Riemann integral that can accommodate discontinuities [15, 16]. A comparison between the dynamics of (N)ODE and (N)CDE is shown in Figure 1.
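To make the spline step concrete, the sketch below evaluates dx/dt from a cubic Hermite interpolant through irregularly sampled control values. It is a plain-Python illustration, not Diffrax's implementation; the function names are hypothetical, and taking backward finite differences as the knot slopes (with a forward difference at the first knot) is an assumption of this sketch.

```python
from bisect import bisect_right

def knot_slopes(ts, xs):
    """Knot slopes from backward finite differences (forward difference at the first knot)."""
    slopes = []
    for k in range(len(ts)):
        j = k if k > 0 else 1
        slopes.append((xs[j] - xs[j - 1]) / (ts[j] - ts[j - 1]))
    return slopes

def hermite_derivative(ts, xs, t):
    """dx/dt at time t of the cubic Hermite interpolant through (ts, xs)."""
    m = knot_slopes(ts, xs)
    # Locate the interval [ts[k], ts[k+1]] containing t, clamped to the data range.
    k = min(max(bisect_right(ts, t) - 1, 0), len(ts) - 2)
    h = ts[k + 1] - ts[k]
    s = (t - ts[k]) / h
    # Derivatives of the four Hermite basis polynomials with respect to s.
    d00 = 6 * s**2 - 6 * s
    d10 = 3 * s**2 - 4 * s + 1
    d01 = -6 * s**2 + 6 * s
    d11 = 3 * s**2 - 2 * s
    # Chain rule: d/dt = (1/h) d/ds.
    return (d00 * xs[k] + d10 * h * m[k] + d01 * xs[k + 1] + d11 * h * m[k + 1]) / h

# Irregular samples of the linear control x(t) = 2t + 1.
ts = [0.0, 1.5, 4.0, 7.0]
xs = [2 * t + 1 for t in ts]
print(hermite_derivative(ts, xs, 2.7))
```

For linear data such as x = 2t + 1, every knot slope equals 2 and the Hermite interpolant reproduces the line exactly, so the derivative is 2 everywhere; for real control paths the interpolant only approximates dx/dt between samples.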

FIGURE 1.


Comparison of structures of original or low‐dimensional NODE (a), latent ODE (b), and NCDE (c), where NODE and NCDE are isomorphic with ODE and CDE, respectively. The arrows stand for the derivatives at each point on the (t, y) plane, representing the direction field of the corresponding differential equations. The orange lines stand for the integral curves, namely the solutions of the differential equations. Although latent ODE can handle complex data using an encoder–decoder structure, its dynamics are intrinsically continuous, and it relies on probabilistic hidden states to generate the initial value for its NODE component.

First, in practice, x often contains a time channel t so that time‐dependent dynamics can be learned [15]. Second, to embed discontinuities in PK and PD data with multiple dosing, we included the time since the last dose, denoted p, as the second channel of x. Finally, the dose D was included as the third channel. The dose is constant along the time course, resembling baseline information. Together, the three independent (control) variables can be written as a vector $x = (t, p, D) \in \mathbb{R}^3$, so the CDE formulation in this study is, in fact, rooted in clinical practice. Clearly, no ODE governs the relationship among t, p, and D, which is one reason why handling such information exceeds the capacity of a simple NODE.
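A minimal sketch of how the three control channels could be assembled for a multiple-dosing schedule follows. The function name `control_channels`, the dose times (six doses every 12 h, inferred from nine dosing intervals spanning t ≤ 108), the 0.1-h grid, and the convention that p keeps increasing after the last dose are all assumptions for illustration.

```python
def control_channels(t, dose_times, dose):
    """Return the control vector x = (t, p, D) at time t.

    p is the time since the most recent dose at or before t, and D is the
    (time-constant) dose. Assumes t >= dose_times[0].
    """
    last_dose = max(td for td in dose_times if td <= t)
    return (t, t - last_dose, dose)

# Hypothetical schedule: six doses every 12 h (t = 0, 12, ..., 60).
dose_times = [12.0 * k for k in range(6)]

# Dense control path on a 0.1-h grid up to t = 108 h for a 5-mg dose.
grid = [0.1 * i for i in range(1081)]
path = [control_channels(t, dose_times, 5.0) for t in grid]

# p resets at each dose: this is the discontinuity the CDE can absorb.
print(control_channels(11.9, dose_times, 5.0), control_channels(12.0, dose_times, 5.0))
```

The t channel is smooth, D is constant, and only p jumps (from just under 12 back to 0 at each dose), so the discontinuity is carried entirely by one control channel rather than by the state.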

2.2. Implementation of NCDE

Previous studies have shown that a linear projection into a higher dimension can enhance the capability of NODEs [17]. Therefore, to better handle the complex nonlinear dynamics in PK and PD datasets, for most datasets we introduced a linear transformation $h_1$ that maps the three‐dimensional independent variables x to six‐dimensional hidden variables $z \in \mathbb{R}^6$, giving the initial state $z_0$. Then, an NCDE parameterized by an MLP $g_\theta$ was employed to model a CDE between z and x. Finally, another linear transformation $h_2$ converts the hidden variables z into PK or PD predictions y. The model structure can be summarized in three equations:

$$\begin{aligned} z_0 &= h_1(x_0) \\ \mathrm{d}z &= g_\theta(z)\,\mathrm{d}x \\ y &= h_2(z) \end{aligned} \tag{4}$$

Here, using a linear layer to convert hidden states into predictions makes it convenient to extract the derivative dy/dt from dz/dt. Because $h_2$ is a linear operation $y = wz + b$, the derivative dy/dt can be evaluated directly via the chain rule and Equations (3) and (4):

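$$\frac{\mathrm{d}y}{\mathrm{d}t} = w\,\frac{\mathrm{d}z}{\mathrm{d}t} = w\,g_\theta(z)\,\frac{\mathrm{d}x}{\mathrm{d}t} \tag{5}$$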
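Equation (5) can be checked numerically. In the sketch below, `g_theta` is a hypothetical fixed 2 × 3 vector field standing in for a trained MLP, and `w`, `b` are made-up readout weights. Between doses the control moves as dx/dt = (1, 1, 0), because t and p both advance at unit rate while D is constant. The derivative from Equation (5) is compared with a finite difference along the hidden-state flow.

```python
import math

def g_theta(z):
    """Hypothetical 2 x 3 vector field (z_dim = 2, x_dim = 3) standing in for a trained MLP."""
    return [[math.tanh(0.5 * z[0]), 0.1, 0.0],
            [-0.2 * z[1], math.tanh(z[0] - z[1]), 0.05]]

def dz_dt(z, dxdt):
    """Equation (3) applied to the hidden state: dz/dt = g_theta(z) dx/dt."""
    return [sum(gij * xj for gij, xj in zip(row, dxdt)) for row in g_theta(z)]

def dy_dt(w, z, dxdt):
    """Equation (5): dy/dt = w g_theta(z) dx/dt."""
    return sum(wi * di for wi, di in zip(w, dz_dt(z, dxdt)))

w, b = [1.5, -0.7], 0.3          # hypothetical readout h2: y = w . z + b
z = [0.4, -1.2]                  # hypothetical hidden state
dxdt = [1.0, 1.0, 0.0]           # between doses: dt/dt = 1, dp/dt = 1, dD/dt = 0

def readout(zz):
    return sum(wi * zi for wi, zi in zip(w, zz)) + b

# Finite-difference check along a tiny step of the hidden-state flow.
delta = 1e-6
z_next = [zi + delta * di for zi, di in zip(z, dz_dt(z, dxdt))]
fd = (readout(z_next) - readout(z)) / delta
print(dy_dt(w, z, dxdt), fd)
```

Because the readout is linear, no differentiation through the network is needed: dy/dt is just the readout weights applied to the vector field times the control derivative.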

Typically, an MLP with a single hidden layer is sufficient to parameterize the vector field of an NDE [18], which also aligns with the principle of parsimonious, low‐dimensional modeling. In this study, the MLP used the tanh activation function, whose output lies within the interval (−1, 1). Unlike most previous studies, the AdaMax optimizer and L1 regularization were employed to train the model. This combination helped obtain a parsimonious model and stabilize training, as discussed in Section 3.1.
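The three-equation structure of Equation (4), with a one-hidden-layer tanh MLP as the vector field, can be sketched in plain Python as below. This is a hypothetical, untrained toy (random weights, class name `TinyNCDE`, hidden width 16, tanh only on the hidden layer), not the authors' Diffrax code. For simplicity, the control is treated as piecewise linear between grid points and integrated with Heun steps over the control increments, instead of the Hermite cubic-spline interpolation used in the study.

```python
import math
import random

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def init_linear(n_out, n_in, rng, scale=0.1):
    """Random weights and zero bias for an affine map R^n_in -> R^n_out."""
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def affine(params, v):
    W, b = params
    return [s + bi for s, bi in zip(matvec(W, v), b)]

class TinyNCDE:
    """Toy model: z0 = h1(x0), dz = g_theta(z) dx, y = h2(z)."""

    def __init__(self, x_dim=3, z_dim=6, width=16, seed=0):
        rng = random.Random(seed)
        self.x_dim, self.z_dim = x_dim, z_dim
        self.h1 = init_linear(z_dim, x_dim, rng)             # linear lift x0 -> z0
        self.g_in = init_linear(width, z_dim, rng)           # MLP hidden layer
        self.g_out = init_linear(z_dim * x_dim, width, rng)  # MLP output layer
        self.h2 = init_linear(1, z_dim, rng)                 # linear readout y = w z + b

    def g(self, z):
        """Single-hidden-layer tanh MLP, output reshaped to a z_dim x x_dim matrix."""
        hidden = [math.tanh(a) for a in affine(self.g_in, z)]
        flat = affine(self.g_out, hidden)
        return [flat[i * self.x_dim:(i + 1) * self.x_dim] for i in range(self.z_dim)]

    def predict(self, xs):
        """Heun steps over successive control increments dx = x[k+1] - x[k]."""
        z = affine(self.h1, xs[0])
        ys = [affine(self.h2, z)[0]]
        for x_prev, x_next in zip(xs, xs[1:]):
            dx = [b - a for a, b in zip(x_prev, x_next)]
            k1 = matvec(self.g(z), dx)                    # g(z) dx at the start
            z_tilde = [zi + ki for zi, ki in zip(z, k1)]  # Euler predictor
            k2 = matvec(self.g(z_tilde), dx)              # g(z~) dx at the predictor
            z = [zi + 0.5 * (a + b) for zi, a, b in zip(z, k1, k2)]
            ys.append(affine(self.h2, z)[0])
        return ys

# Controls (t, p, D) on a coarse grid spanning a dose at t = 12 h (p jumps to 0).
xs = [(0.0, 0.0, 5.0), (6.0, 6.0, 5.0), (11.9, 11.9, 5.0), (12.0, 0.0, 5.0), (18.0, 6.0, 5.0)]
print(TinyNCDE(seed=0).predict(xs))
```

The jump in p from 11.9 to 0 enters as one large control increment, so the hidden state (and hence y) can change abruptly at a dose, which a plain NODE driven only by t cannot do.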

The NCDE in this study was implemented with the Diffrax library, which is built on JAX, a high‐performance numerical computing framework [18, 19]. Compared with the torchdiffeq library [8], which is widely used in the pharmacometrics literature, Diffrax is still actively maintained and supports various kinds of NDEs (including NODE, NCDE, and the neural stochastic differential equation, NSDE) and numerical solvers. In Diffrax, the $g_\theta(z)$ term can be computed with a controlled vector field, and dx/dt can be computed after x is interpolated with Hermite cubic splines, which is efficient even for large datasets. The code was written in Jupyter Notebook, an interactive platform that facilitates reproduction and experimentation (see Supporting Information and https://github.com/wzs-zwdxsky/ncde-pkpd) [20].

2.3. Simulation of the Dataset

To investigate the versatility of NCDE in handling different PK and PD settings, five types of datasets were simulated for NCDE modeling: (1) PK profiles after intravenous injection, (2) PK profiles after extravascular administration with linear elimination, (3) PK profiles after extravascular administration with nonlinear elimination, (4) PD profiles described by a biophase model, and (5) PD profiles described by an indirect response (IDR) model [21]. In most cases, the models were trained on PK and PD datasets at two doses (1 and 5 mg). To evaluate performance on unseen doses, the generalization (extrapolation) capability of NCDE was tested on intermediate doses (2, 3, and 4 mg).

For the PK datasets, only plasma concentrations served as target variables; for the PD datasets, only pharmacodynamic effects did. In each of the five scenarios, the training dataset comprises 44 groups of data and the test dataset four groups. Each group contains an initial observation (t = 0) followed by 19 data points randomly distributed over nine dosing intervals (t ≤ 108) or 10 dosing intervals (t ≤ 120, IDR model only). For comparability, all datasets in the main text use the same dosing strategy; an alternative dosing strategy is used in Figures S7 and S8 to show the capability of NCDE in a more irregular setting. Six drug administrations were simulated in total, and the remaining time corresponded to the gradual elimination of the drug or washout of drug effects. All simulated data were irregularly sampled and contaminated with Gaussian noise.
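A minimal sketch of how one irregularly sampled training group could be simulated for the extravascular, linear-elimination setting. The rate constants and V = 1 L come from the Figure 3 legend; the 12-h dosing interval, bioavailability of 1, additive noise with SD 0.1, the random seed, and the 22/22 split of the 44 groups between the two training doses are assumptions, and the helper names are hypothetical.

```python
import math
import random

KA, KEL, V = 0.6, 0.06, 1.0                 # Figure 3b legend; V = 1 L
DOSE_TIMES = [12.0 * k for k in range(6)]   # assumption: six doses, 12-h interval

def conc_ev(t, dose):
    """Noise-free concentration for repeated extravascular doses (superposition, F = 1 assumed)."""
    c = 0.0
    for td in DOSE_TIMES:
        if t >= td:
            tau = t - td
            c += dose * KA / (V * (KA - KEL)) * (math.exp(-KEL * tau) - math.exp(-KA * tau))
    return c

def simulate_group(dose, rng, n_random=19, t_max=108.0, noise_sd=0.1):
    """One group: an observation at t = 0 plus n_random uniform sampling times, additive noise."""
    times = [0.0] + sorted(rng.uniform(0.0, t_max) for _ in range(n_random))
    return [(t, conc_ev(t, dose) + rng.gauss(0.0, noise_sd)) for t in times]

rng = random.Random(42)
# Hypothetical split: 22 groups per training dose, 44 groups in total.
train = [(d, simulate_group(d, rng)) for d in (1.0, 5.0) for _ in range(22)]
print(train[0][1][:3])
```

Because elimination is linear here, the noise-free profile scales exactly with dose, which is the behavior the trained NCDE reproduces when it extrapolates near-linearly to 2, 3, and 4 mg.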

3. Results

3.1. The Influence of Hyperparameters

To establish a robust methodology, we first investigated the influence of key hyperparameters before the actual training. The investigated hyperparameters include: (1) the coefficient of L1 regularization, (2) the loss function, (3) the numerical solver, (4) the optimizer, and (5) the dataset size and batch size. For these pre‐experiments, the dataset used in this section only included PK data for one‐compartment extravascular administration with linear elimination, and training was stopped after 8001 iterations, which was sufficient for a preliminary determination of hyperparameters. To present the training procedure more informatively, in addition to the error–iteration plots (left panels of Figure 2), we used t‐SNE to project part of the model parameters (specifically, the first weight matrix of $g_\theta$) onto a 2D plane (right panels of Figure 2) [22], which helps visualize the trajectory of training.

FIGURE 2.


Influence of hyperparameters. Left panel: Error–iteration curves smoothed with a moving average (window size: 100) for illustrative purposes. Right panel: t‐SNE projection of the first weight matrix of NCDE during training, with the weight matrix recorded every 100 iterations. (a, b) The influence of the L1 regularization coefficient on test errors (all using Heun solver and AdaMax optimizer). (c, d) The influence of the L1 coefficient and the choice of loss function (all using Heun solver and AdaMax optimizer). (e, f) The influence of numerical solver and L1 coefficient on model error and parameter diversity (all using AdaMax optimizer). (g, h) The influence of the optimizer on model convergence (all using Heun solver). The horizontal dashed lines in (a, c, e) represent the average MSE or WMSE over the last 3000 iterations.

The two loss functions investigated in the study are mean squared error (MSE) and weighted mean squared error (WMSE). They were calculated by the equations below:

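$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{obs},i} - y_{\mathrm{pred},i}\right)^2 \tag{6}$$
$$\mathrm{WMSE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left(y_{\mathrm{obs},i} - y_{\mathrm{pred},i}\right)^2}{y_{\mathrm{obs},i} + \varepsilon} \tag{7}$$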

where n is the batch size, and $y_{\mathrm{obs}}$ and $y_{\mathrm{pred}}$ denote observed and predicted values, respectively. To avoid division by zero when calculating WMSE, a small constant ε (e.g., 1e‐3) was added to the denominator. In addition, an L1 regularization term was added to obtain a sparse model and avoid overfitting [6]. Although L2 regularization can also reduce overfitting, L1 regularization is more effective at shrinking parameters to zero and is therefore better at producing a simpler model with fewer parameters (Figure S1). With the L1 regularization term, the full loss function can be written as:

$$\mathrm{Loss} = \mathrm{MSE}\ (\text{or WMSE}) + \lambda\sum_{j=1}^{m}\lvert\theta_j\rvert \tag{8}$$

where $\theta_j$ denotes the j‐th parameter of the model, and the absolute values of all parameters are summed to form the L1 regularization term. As shown in Figure 2, a larger L1 coefficient λ generally produced a larger prediction error, including on the test datasets (Figure 2a,c,e). Moreover, different values of the L1 coefficient could yield distinct models (Figure 2b). Unexpectedly, the choice between MSE and WMSE had a larger effect on the differences between models than the choice among L1 coefficient values (Figure 2b,d).
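Equations (6)–(8) translate directly into code. The sketch below is a plain-Python transcription with hypothetical function names; in the study these would operate on JAX arrays and the flattened model parameters, which is assumed rather than shown here.

```python
def mse(y_obs, y_pred):
    """Equation (6): mean squared error over a batch."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs)

def wmse(y_obs, y_pred, eps=1e-3):
    """Equation (7): squared residuals weighted by 1 / (y_obs + eps)."""
    return sum((o - p) ** 2 / (o + eps) for o, p in zip(y_obs, y_pred)) / len(y_obs)

def loss(y_obs, y_pred, params, lam, weighted=False):
    """Equation (8): data term plus lam * sum_j |theta_j| over all flattened parameters."""
    data_term = wmse(y_obs, y_pred) if weighted else mse(y_obs, y_pred)
    return data_term + lam * sum(abs(theta) for theta in params)

# A perfect fit still pays the L1 penalty: 0.01 * (|0.5| + |-1.5|).
print(loss([1.0, 2.0], [1.0, 2.0], [0.5, -1.5], lam=0.01))
```

WMSE divides by the observation, so residuals at low concentrations (for example, troughs) weigh more than the same residuals near the peak; this is one plausible reason the loss choice changed the learned model more than the L1 coefficient did.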

The selection of a numerical solver is also crucial for differential‐equation‐based modeling, because predictions can only be made after the differential equations are solved numerically. One key concern is whether the stiffness of NDEs may limit the use of explicit solvers, since parameters in NDEs may span multiple orders of magnitude. However, models trained with an explicit method (Heun's method) and an implicit method (the implicit Euler method) showed no significant difference in either errors or model parameters (Figure 2e,f). Considering the higher computational cost of the implicit method (Figure S3), implicit methods are not recommended for routine use. Furthermore, regardless of the solver type, the solver used for training should match the one used for prediction or inference; otherwise, significant deviations occur (Figure S4).
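The solver-matching issue can be seen on the simplest PK dynamics, dy/dt = −k·y. The sketch below is our own plain-Python illustration, not the study's Diffrax setup; k = 0.06 h⁻¹ comes from the Figure 3 legend, while the step sizes and the 12-h horizon are arbitrary choices. At a coarse step the explicit Heun and implicit Euler trajectories disagree noticeably, so a vector field tuned under one discretization need not reproduce the same trajectory under the other; at a fine step both approach the exact solution.

```python
import math

K = 0.06  # elimination rate constant from the Figure 3 legend: dy/dt = -K * y

def f(y):
    return -K * y

def heun_step(y, h):
    """Explicit Heun (predictor-corrector trapezoidal) step."""
    k1 = f(y)
    k2 = f(y + h * k1)
    return y + 0.5 * h * (k1 + k2)

def implicit_euler_step(y, h):
    """Implicit Euler, y_next = y + h * f(y_next), solved in closed form for this linear f."""
    return y / (1.0 + h * K)

def integrate(step, y0, h, t_end):
    y = y0
    for _ in range(int(round(t_end / h))):
        y = step(y, h)
    return y

exact = math.exp(-K * 12.0)
for h in (4.0, 0.01):
    heun = integrate(heun_step, 1.0, h, 12.0)
    ie = integrate(implicit_euler_step, 1.0, h, 12.0)
    print(f"h={h}: Heun={heun:.5f}, implicit Euler={ie:.5f}, exact={exact:.5f}")
```

For a nonlinear learned vector field, the implicit step would require an inner root solve at every step, which is the source of the extra cost noted in Figure S3.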

More importantly, we found that the suitable batch size and L1 coefficient for NCDE training should be re‐investigated for datasets of different sizes (Figure S5). In this study, with a batch size of 8 and a dataset size of 44, plausible L1 coefficients ranged from 1e‐3 to 7e‐3 and could be fine‐tuned manually while monitoring model performance. Nevertheless, when the pattern of the dataset is more complex (e.g., nonlinear dynamics), the model should be less sparse to capture the hidden dynamics of the data (Table 1), and the L1 coefficient should be reduced accordingly.

TABLE 1.

MSEs, R‐squared scores, MAEs, and L1 norms of the NCDE models presented in the study.

| PK or PD setting | Train MSE | Test MSE | R‐squared score | MAE | L1 norm |
|---|---|---|---|---|---|
| Intravenous injection PK | 0.00987 | 0.01012 | 0.99849 | 0.074293 | 5.35272 |
| Extravascular administration with linear elimination PK | 0.02810 | 0.03753 | 0.99443 | 0.10925 | 16.32808 |
| Extravascular administration with nonlinear elimination PK | 0.02630 | 0.03837 | 0.99810 | 0.10959 | 57.14127 |
| IDR PD | 0.01235 | 0.01175 | 0.99110 | 0.082392 | 37.81123 |
| Biophase PD (two doses for training and testing) | 0.03468 | 0.03590 | 0.98484 | 0.11673 | 88.59156 |
| Biophase PD (three doses for training and testing) | 0.01615 | 0.01087 | 0.99201 | 0.091072 | 164.64706 |

Note: R‐squared scores and MAEs were calculated from the training dataset.

Abbreviations: IDR: indirect response, MAE: mean absolute error, MSE: mean squared error.

Interestingly, although the Adam optimizer is commonly used in NDE studies [9, 11, 12, 13, 23], we found it to be less stable than the AdaMax optimizer for training NCDEs (Figure 2g). The training trajectories of the Adam optimizer were fragmented and discontinuous, particularly with a small regularization term. In contrast, the training trajectories of AdaMax remained continuous and stable (Figure 2h). One possible explanation for the stability of AdaMax lies in its update mechanism: AdaMax uses the L∞ norm to scale its updates, making it less sensitive to noise at each update step. This is crucial for training differential‐equation‐based models, because even a small overestimation of parameters can lead to significantly different predictions over long time spans. Although AdaGrad and AMSGrad were more stable than AdaMax, both converged more slowly (Figure S2a). Balancing stability and convergence speed, AdaMax is a suitable choice for NCDE training.
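The L∞ mechanism can be made explicit. Adam divides by the square root of an exponential moving average of squared gradients, whereas AdaMax divides by u_t = max(β2·u_{t−1}, |g_t|), an exponentially weighted infinity norm. Below is a plain-Python transcription of the published AdaMax update (Kingma and Ba, 2015) with its default lr = 0.002, β1 = 0.9, β2 = 0.999; the small eps guard is an addition of this sketch. In a JAX workflow one would more likely call a library optimizer (Optax, for example, provides `optax.adamax`); this loop is not the authors' code.

```python
def adamax_step(theta, grad, state, lr=0.002, b1=0.9, b2=0.999, eps=1e-8):
    """One AdaMax update; eps only guards against division by zero."""
    m, u, t = state
    t += 1
    m = [b1 * mi + (1.0 - b1) * gi for mi, gi in zip(m, grad)]   # first-moment EMA
    u = [max(b2 * ui, abs(gi)) for ui, gi in zip(u, grad)]       # weighted infinity norm
    step = lr / (1.0 - b1 ** t)                                  # bias correction for m only
    theta = [th - step * mi / (ui + eps) for th, mi, ui in zip(theta, m, u)]
    return theta, (m, u, t)

theta0 = [0.0, 0.0]
state0 = ([0.0, 0.0], [0.0, 0.0], 0)
# Gradients differing by seven orders of magnitude still move each coordinate by about lr.
theta1, state1 = adamax_step(theta0, [1e-4, 1e3], state0)
print(theta1)
```

Because the denominator tracks the largest recent gradient magnitude, a sudden gradient spike inflates u rather than the step, which is consistent with the smoother trajectories in Figure 2h.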

3.2. Data Fitting and Extrapolation

The fitting and extrapolation results of the NCDEs are shown in Figure 3 (PK datasets) and Figure 4 (PD datasets). Visually, the NCDEs successfully captured the hidden patterns in the irregularly sampled datasets, demonstrating their capability for PK and PD data fitting. For example, the shape, curvature, and smoothness of the prediction and extrapolation curves matched the patterns of the data, and discrepancies in other metrics (e.g., time to peak and Cmax or Emax) were small on inspection. The fitting curves in the left panels of Figures 3 and 4 were computed from dense inputs of (t, p, D). The goodness‐of‐fit diagrams and R‐squared scores of each model, shown in the middle panels of Figures 3 and 4, were computed one‐to‐one from the original inputs; five of the six NCDE models had R‐squared scores above 0.99. MSEs, MAEs, and R‐squared scores of the models in Figures 3 and 4 are summarized in Table 1, demonstrating the accuracy of NCDE. Table 1 also lists the L1 norms of the NCDE models, which can serve as a metric of model complexity. Additionally, the fitting and extrapolation performance of NCDE under a more irregular dosing scheme is shown in Figure S7.
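For reference, the Table 1 metrics can be computed as in the sketch below. We assume R‐squared is the standard coefficient of determination, 1 − SS_res/SS_tot (the quantity scikit-learn's `r2_score` reports), and that the L1 norm is the sum of absolute values over all parameter arrays; the study's exact implementation is not shown, and the function names are hypothetical.

```python
def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

def mae(y_obs, y_pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)

def l1_norm(param_arrays):
    """Sum of |theta| over all (flattened) parameter arrays, used as a complexity metric."""
    return sum(abs(v) for arr in param_arrays for v in arr)

print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]), mae([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Predicting the mean of the observations gives an R‐squared of 0, so scores above 0.99 mean the residual variance is under 1% of the variance in the data.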

FIGURE 3.


Fitting and extrapolating performances of NCDE for PK datasets. (a) Intravenous injection (kel: 0.06 h−1); (b) Extravascular administration with linear elimination (ka: 0.6 h−1; kel: 0.06 h−1); (c) Extravascular administration with nonlinear elimination (ka: 0.6 h−1; Vmax: 0.5 mg·L−1·h−1; Km: 5 mg·L−1). The apparent volume of distribution is set to 1 L for simplicity. kel: elimination rate constant; ka: absorption rate constant; Vmax: maximum rate of Michaelis–Menten elimination; Km: Michaelis–Menten constant.

FIGURE 4.


Fitting and extrapolating performances of NCDE for PD datasets. (a) PD of IDR model (kin: 10; kout: 1 h−1; Imax: 0.8; IC50: 10 mg·L−1); (b, c) PD of biophase model (ke0: 0.5 h−1; Emax: 6; EC50: 1.5 mg·L−1). The model in panel (b) was trained via the implicit Euler method, and the model in panel (c) was trained via the Heun method. For panel (c), data from three doses were included to test the nonlinearity of extrapolation, with the total number of groups kept at 44, identical to the other settings. kin: zero‐order rate constant for effect production; kout: first‐order rate constant for effect washout; Emax: maximum pharmacodynamic effect; EC50: half‐maximal effective concentration; ke0: transfer rate constant from plasma to the effect compartment.

For the models shown in Figures 3 and 4, the hidden‐state dimension was 6 for all models except the one trained on the extravascular‐administration PK dataset with nonlinear elimination, for which it was 8 (Figure 3c). Regarding numerical solvers, all models were trained with Heun's method except those for biophase PD (Figure 4b) and extravascular‐administration PK with nonlinear elimination (Figure 3c), which used the implicit Euler method. Parameters for data simulation are given in the legends of Figures 3 and 4.

As shown in Figures 3 and 4, introducing a discontinuous control variable (i.e., the time since the last dose, p) resulted in discontinuous predictions. Specifically, the final prediction of one dosing interval could deviate from the first prediction of the next. This behavior is consistent with PK profiles after intravenous injection (Figure 3a,b), where the concentration itself jumps at each dose. However, for the PD profiles and the extravascular‐administration PK profiles, the actual discontinuities lie in the first‐order derivative (dy/dt) rather than in the data itself (y). Therefore, trough values (or peak values for the IDR model) were estimated accurately at the end of each interval rather than at the start of each interval.
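The distinction above (a jump in y for intravenous bolus dosing versus a jump only in dy/dt for extravascular dosing) can be verified on the analytical one-compartment models. The sketch uses the rate constants from the Figure 3 legend; the 12-h dosing interval, bioavailability of 1, the probe time of 24 h, and the helper names are assumptions for illustration.

```python
import math

KA, KEL, V = 0.6, 0.06, 1.0                 # Figure 3 legends; V = 1 L
DOSE_TIMES = [12.0 * k for k in range(6)]   # assumption: six doses, 12-h interval

def c_iv(t, dose=1.0):
    """Repeated IV bolus: each dose adds dose / V instantaneously."""
    return sum(dose / V * math.exp(-KEL * (t - td)) for td in DOSE_TIMES if t >= td)

def c_ev(t, dose=1.0):
    """Repeated extravascular dosing: each new dose starts absorbing from zero."""
    a = dose * KA / (V * (KA - KEL))
    return sum(a * (math.exp(-KEL * (t - td)) - math.exp(-KA * (t - td)))
               for td in DOSE_TIMES if t >= td)

def jump(f, t, eps=1e-7):
    """Change in f across t (value just right minus value just left)."""
    return f(t + eps) - f(t - eps)

def slopes(f, t, h=1e-5):
    """One-sided finite-difference slopes just left and just right of t."""
    left = (f(t - h) - f(t - 2 * h)) / h
    right = (f(t + 2 * h) - f(t + h)) / h
    return left, right

t_dose = 24.0
iv_jump = jump(c_iv, t_dose)              # about dose / V = 1: y itself jumps
ev_jump = jump(c_ev, t_dose)              # about 0: y is continuous
ev_left, ev_right = slopes(c_ev, t_dose)  # differ by about dose * KA / V = 0.6
print(iv_jump, ev_jump, ev_right - ev_left)
```

An NCDE whose control contains p can represent the first pattern directly, while for the second the learned jump in y at a dose is an artifact, which explains why end-of-interval values are the reliable ones.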

Moreover, the trained models provided near‐linear extrapolation across doses, similar to the previously reported low‐dimensional NODE [9]. This extrapolation capability is likely due to L1 regularization: the L1 term may lead to a sparse and simple NCDE model, which in turn generates dynamics that are linear in the dose. However, such linear extrapolation is not suitable for nonlinear PK and PD datasets. From a proof‐of‐principle standpoint, one possible approach is to include data from intermediate doses to help NCDE models capture nonlinearities. As shown in Figure 4c, when information from an intermediate dose was available, the extrapolated profiles for different doses were not equally spaced, unlike those in Figure 4b. Although an intermediate dose is not always feasible in clinical settings, it is also challenging for traditional modeling approaches to estimate the parameters of the Michaelis–Menten equation when only two doses are available; including an intermediate dose therefore seems appropriate for a proof‐of‐principle experiment. Furthermore, it is reasonable to infer that NCDE can also incorporate other baseline information that remains constant along the time axis (such as body weight, sex, and other characteristics of subpopulations), thereby uncovering the relationship between baseline information and the data.
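Why dose-linear extrapolation fails under saturable kinetics can be seen from a single intravenous bolus with Michaelis–Menten elimination, dC/dt = −Vmax·C/(Km + C), using Vmax and Km from the Figure 3c legend (V = 1 L, so the dose in mg equals the initial concentration in mg/L). This is a simplified illustration (IV bolus rather than the extravascular route of Figure 3c, fixed-step RK4, 12-h horizon chosen arbitrarily); the result can be checked against the closed-form implicit solution Km·ln(C0/C) + (C0 − C) = Vmax·t.

```python
import math

VMAX, KM = 0.5, 5.0  # Figure 3c legend; V = 1 L, so dose (mg) equals C0 (mg/L)

def mm_rhs(c):
    """Michaelis-Menten elimination rate."""
    return -VMAX * c / (KM + c)

def conc_after(c0, t_end=12.0, h=0.01):
    """Fixed-step RK4 integration of a single IV bolus with saturable elimination."""
    c = c0
    for _ in range(int(round(t_end / h))):
        k1 = mm_rhs(c)
        k2 = mm_rhs(c + 0.5 * h * k1)
        k3 = mm_rhs(c + 0.5 * h * k2)
        k4 = mm_rhs(c + h * k3)
        c += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return c

doses = (1, 2, 3, 4, 5)
levels = [conc_after(float(d)) for d in doses]
gaps = [b - a for a, b in zip(levels, levels[1:])]
print(levels, gaps)
```

Under linear elimination the gaps between consecutive doses would all be equal; here they widen with dose because elimination saturates, a pattern that a model extrapolating linearly in D cannot represent without data from intermediate doses.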

To better understand the fitting and extrapolation capabilities of NCDE, we also investigated the influence of a totally irrelevant control variable, namely a noise input following a uniform distribution U(0, 1). NCDE was able to ignore the irrelevant noise signal, and the final prediction was consistent with the results shown in Figure 3b (Figure S6). Additionally, when the trained NCDE model was extrapolated to a different dosing scheme (e.g., a combination of arbitrarily chosen dosing intervals), deviations occurred (Figure S8), although such deviations could possibly be reduced by incorporating datasets with diverse dosing schemes. In summary, the fitting and extrapolation capability of NCDE for various PK and PD datasets was demonstrated, and the effects of discontinuous control variables, baseline information, and irrelevant inputs were also discussed in this section.

3.3. Obtaining Intrinsic Dynamics

One major advantage of differential‐equation‐based models is their ability to capture the hidden dynamics of the data. When NCDE (or NODE) learns the dataset directly, without an intermediate abstract and complex encoder–decoder structure (as in latent‐ODE), the dynamics of the NDE correspond directly to the dynamics behind the data. As described in Section 2.2, the derivative of the data at each point can be obtained from the vector field of NCDE and the dense control inputs, which are known a priori (Equation 5). The derivatives obtained from the NCDE models trained in the previous section are shown in Figure 5 (last three intervals) and Figure S9 (all intervals), demonstrating the ability of NCDE to uncover the hidden dynamics of different PK and PD datasets. The illustration focuses on the differentiable intervals; discontinuous points were omitted because their derivatives are ill‐defined. As seen in Figure 5, NCDE produced reasonable derivatives even for PK and PD profiles under unseen doses, in line with the extrapolation capability shown in the previous section. These results clearly demonstrate the discontinuous intrinsic dynamics of NCDE, which exceed the capability of NODE.

FIGURE 5.


Learned derivatives of NCDE in different PK and PD settings. (a) Derivative of one‐compartmental intravenous injection PK; (b) Derivative of one‐compartmental extravascular administration with linear elimination PK; (c) Derivative of biophase PD; (d) Derivative of IDR PD. To better illustrate the details of derivatives, only local diagrams in the differentiable interval of [36, 95] ([36, 120] for biophase PD) were presented here.

For the PK profile of intravenous injection, the exponential‐decay dynamics corresponded to an inverted exponential derivative, since $\mathrm{d}e^{-kt}/\mathrm{d}t = -k\,e^{-kt}$: the PK profile and its derivative differ only by a multiplicative factor, and the shape is preserved (Figure 5a). In contrast, for the predicted derivatives of the PD and other PK datasets, there was one zero point in each dosing interval, corresponding to an extremum of the predicted PK or PD profile (Figure 5b–d). Interestingly, the model trained on the biophase PD data displayed a gradual delay in the washout stage (Figure 5c), consistent with the actual delay of the pharmacodynamic mechanism. Correspondingly, the predicted derivatives of the biophase PD dataset under different doses were not parallel to each other. However, when the model was trained using the explicit Heun solver, no such delay was found in the derivative of biophase PD (Figure S10).

Another useful tool for investigating the learned dynamics of NCDE is the dy/dt vs. y diagram. For example, because one‐compartment intravenous‐injection PK is linear, the ratio between dy/dt and y should be constant (specifically, the negative of the elimination rate constant, −kel), so all points on the (y, dy/dt) plane should lie on a single straight line. This characteristic was recovered in the dynamics of NCDE (Figure 6a). In contrast, for the other PK and PD settings, clockwise loops appeared in the dy/dt vs. y diagram, reflecting their nonmonotonic profiles, and different PK or PD settings produced loops with distinct shapes and curvatures. Interestingly, in the PD scenarios, as the effect gradually accumulated toward its maximum (Emax) after multiple doses, the corresponding dy/dt vs. y curves were confined to a much smaller region as they approached saturation. As shown in Figure 6c,d, these characteristics could be identified in the learned dynamics of NCDE, indicating correspondence between the hidden dynamics of the NCDE model and the actual dynamics of the data and therefore showcasing the intrinsic interpretability of NCDE.
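Both patterns can be reproduced from the analytical single-dose one-compartment models, using the rate constants from the Figure 3 legend; the single 1-mg dose, unit volume, sampling grid, and helper names are simplifying assumptions. For the intravenous case every point satisfies (dy/dt)/y = −kel, so the points fall on one line through the origin. For the extravascular case the ratio changes sign at the peak, so the trajectory passes through the same concentrations twice with different slopes and traces a loop.

```python
import math

KA, KEL, V, DOSE = 0.6, 0.06, 1.0, 1.0   # Figure 3 legends; single 1-mg dose assumed

def iv(t):
    """Single IV bolus."""
    return DOSE / V * math.exp(-KEL * t)

def ev(t):
    """Single extravascular dose (F = 1 assumed)."""
    return DOSE * KA / (V * (KA - KEL)) * (math.exp(-KEL * t) - math.exp(-KA * t))

def phase_points(f, times, h=1e-5):
    """(y, dy/dt) pairs using central-difference derivatives."""
    return [(f(t), (f(t + h) - f(t - h)) / (2 * h)) for t in times]

times = [0.5 * k for k in range(1, 49)]   # 0.5 to 24 h
iv_ratios = [d / y for y, d in phase_points(iv, times)]
ev_ratios = [d / y for y, d in phase_points(ev, times)]
print(min(iv_ratios), max(iv_ratios))   # both close to -KEL
print(min(ev_ratios), max(ev_ratios))   # changes sign at the peak
```

Applied to the derivatives from Equation (5) instead of the analytical ones, the same plot is a direct check of whether a trained NCDE has recovered first-order elimination.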

FIGURE 6.


Derivative‐concentration or derivative‐effect diagram (dy/dt vs. y). (a) PK of intravenous injection; (b) PK of extravascular administration with linear elimination; (c) PD of biophase model; (d) PD of IDR model. Black hexagon: Stable point, corresponding to zero concentration or baseline effect measurement. Empty circle: Initial value. Black arrow: The evolving direction of the curve in each dosing interval. The transparency of each curve decreased when the number of dose intervals escalated; therefore, the curve of the first interval was the most transparent one.

4. Discussion

Compared with other AI/ML methods, especially transformer‐based large models, NCDEs and other neural differential equations (NDEs) offer a degree of intrinsic interpretability, because the dynamics learned by the neural network are consistent with the actual dynamics in the data. This property supports the reliability and trustworthiness of machine learning methods and brings them closer to the concept of scientific machine learning (SciML). As shown in the Results section, useful information can be extracted from the learned derivative of NCDE. To our knowledge, this is the first implementation of NCDE in the context of pharmacometrics. However, the exact algebraic form of a trained NCDE is still not human‐readable, so the NCDE is a ‘gray‐box’ rather than a purely ‘white‐box’ model. Combining symbolic regression with NCDE to obtain algebraic expressions is a promising direction for future work.

Beyond serving as a proof‐of‐principle methodological investigation, the study also highlights the potential of controlled differential equations (CDEs) in pharmacometrics. Pharmacometric modeling is primarily based on ODEs, which are easier to interpret. However, some research has explored other classes of differential equations, such as partial differential equations (PDEs) [24], stochastic differential equations (SDEs) [25, 26], integro‐differential equations (IDEs) [27], and delay differential equations (DDEs) [28, 29, 30]. SDEs, IDEs, and DDEs are also connected to CDEs, because stochastic, integral, and delay terms can be viewed as control terms. Neural versions of these equations could therefore be developed in a similar way. Furthermore, research into alternative differential equations not only expands the pharmacometric toolbox but also offers flexibility for accurately characterizing mechanisms and causal relationships in the data.
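
The "control term" view can be made concrete with a short sketch. It integrates a generic CDE, dz = Σ_j f_j(z) dX_j, with an explicit Euler scheme. Driven by the control X(t) = t, the scheme reduces to Euler's method for an ODE. Driven by X(t) = (t, W(t)), where W is a simulated Brownian path, it becomes the Euler–Maruyama scheme for an SDE. The function name cde_euler and all parameter values are hypothetical. This illustrates the correspondence and is not the study's implementation.

```python
import math
import random

def cde_euler(vector_field, z0, control_path):
    """Explicit Euler for dz = sum_j f_j(z) dX_j along a sampled control path.

    vector_field(z) returns one coefficient per control channel;
    control_path is a list of tuples (X_1, ..., X_d) at successive grid points.
    """
    z, traj = z0, [z0]
    for x_prev, x_next in zip(control_path[:-1], control_path[1:]):
        coeffs = vector_field(z)
        z = z + sum(c * (b - a) for c, a, b in zip(coeffs, x_prev, x_next))
        traj.append(z)
    return traj

KEL, SIGMA, DT, N = 0.2, 0.5, 0.1, 100  # hypothetical values
times = [k * DT for k in range(N + 1)]

# ODE dz/dt = -kel*z: the only control channel is time, X(t) = t.
ode_traj = cde_euler(lambda z: (-KEL * z,), 10.0, [(t,) for t in times])

# SDE dz = -kel*z dt + sigma dW: the control is X(t) = (t, W(t)) with W a Brownian path.
rng = random.Random(0)
w = [0.0]
for _ in range(N):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(DT)))
sde_traj = cde_euler(lambda z: (-KEL * z, SIGMA), 10.0, list(zip(times, w)))
```

The same integrator handles both cases. Only the control path changes, which is the sense in which a stochastic term can be read as an extra control channel.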

As described in Section 3.2, predictions can be obtained from NCDE in two ways that may differ slightly. One is one‐to‐one prediction (as in the goodness‐of‐fit diagrams); the other is dense prediction (as in the fitting and extrapolation plots). In the first method, the inputs are identical to the input features in the dataset. In the second, dense inputs are generated (e.g., 500 input points that are known a priori). The two can differ because, in the one‐to‐one method, dx/dt is calculated from Hermite cubic spline interpolation of the observed inputs. When the dataset is sparse and irregularly sampled, this interpolation may yield inaccurate estimates. Nevertheless, comparing the fitting plots with the goodness‐of‐fit diagrams in Figures 3 and 4 indicates that this deviation is not substantial for the converged models. For further comparison, one‐to‐one prediction plots overlaying the observed data points with the predictions are shown in Figure S11, which also demonstrate the consistency of the two methods.
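
The sensitivity of dx/dt to sampling density can be illustrated with a hand‐written cubic Hermite interpolant. In this sketch, the knot slopes use backward differences, an assumed rule; library implementations may use other conventions. The input channel is a hypothetical smooth exponential rather than one of the study's inputs. The derivative estimate at a single query time is compared with the analytic value for sparse, irregular knots and for 500 dense knots.

```python
import math
from bisect import bisect_right

def hermite_derivative(ts, xs, t):
    """dx/dt at time t from a cubic Hermite interpolant through the knots (ts, xs).

    Knot slopes: forward difference at the first knot, backward differences elsewhere.
    """
    n = len(ts)
    slopes = [(xs[1] - xs[0]) / (ts[1] - ts[0])]
    slopes += [(xs[i] - xs[i - 1]) / (ts[i] - ts[i - 1]) for i in range(1, n)]
    i = min(max(bisect_right(ts, t) - 1, 0), n - 2)  # t lies in [ts[i], ts[i+1]]
    h = ts[i + 1] - ts[i]
    s = (t - ts[i]) / h
    # Derivatives of the Hermite basis functions h00, h10, h01, h11 with respect to s.
    d00, d10 = 6 * s**2 - 6 * s, 3 * s**2 - 4 * s + 1
    d01, d11 = -6 * s**2 + 6 * s, 3 * s**2 - 2 * s
    return (d00 * xs[i] + d10 * h * slopes[i] + d01 * xs[i + 1] + d11 * h * slopes[i + 1]) / h

def input_signal(t):
    return math.exp(-0.3 * t)  # hypothetical smooth input channel

def true_derivative(t):
    return -0.3 * math.exp(-0.3 * t)

sparse_t = [0.0, 0.5, 2.0, 6.0, 12.0, 24.0]     # sparse, irregular sampling
dense_t = [24.0 * k / 499 for k in range(500)]  # 500 regularly spaced points
t_query = 4.0
errors = {}
for name, ts in (("sparse", sparse_t), ("dense", dense_t)):
    est = hermite_derivative(ts, [input_signal(t) for t in ts], t_query)
    errors[name] = abs(est - true_derivative(t_query))
print(errors)
```

With the wide gap between knots at 2 h and 6 h, the sparse estimate misses the true slope by a visibly larger margin than the dense one. This is the kind of discrepancy the one‐to‐one prediction can inherit.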

Although this is primarily a methodological study, applying NCDE to information‐rich real‐world data is also promising. The study covered time‐varying continuous variables (the time axis), discontinuous variables (time since last dose), and baseline information (doses). The capability of NCDE to capture complex and nonlinear patterns was investigated across diverse PK and PD settings. Future work may explore other scenarios, such as PK and PD datasets based on target‐mediated drug disposition (TMDD) and transit compartment models. However, applying data‐driven approaches, including NCDE, remains challenging in real‐world settings because of higher sparsity, higher noise levels, and smaller datasets. In addition, higher computational cost and difficulty in simulation are key drawbacks hindering large‐scale real‐world applications, and in these respects traditional pharmacometric modeling may surpass AI/ML methods. Further research is needed to enable broader use of NCDE.
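
The discontinuous input can be illustrated with a hypothetical two‐channel control path, (time, time since last dose), for an assumed dosing schedule. The helper names are illustrative, and baseline information such as the dose amount is left out of this sketch. The increments show that each re‐dose produces a negative jump in the time‐since‐last‐dose channel, which is how the discontinuity reaches a CDE through dX.

```python
from bisect import bisect_right

def time_since_last_dose(times, dose_times):
    """Sawtooth channel: elapsed time since the most recent dose at or before each time."""
    doses = sorted(dose_times)
    out = []
    for t in times:
        i = bisect_right(doses, t) - 1
        if i < 0:
            raise ValueError(f"time {t} precedes the first dose")
        out.append(t - doses[i])
    return out

def build_control_path(times, dose_times):
    """Hypothetical two-channel control path X(t) = (time, time since last dose)."""
    return list(zip(times, time_since_last_dose(times, dose_times)))

times = [0.0, 6.0, 12.0, 18.0, 24.0, 30.0, 36.0]
path = build_control_path(times, dose_times=[0.0, 12.0, 24.0])
increments = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(path, path[1:])]
print(path)        # time-since-last-dose channel: 0, 6, 0, 6, 0, 6, 12
print(increments)  # that channel's increment is -6 over each step that ends at a re-dose
```

The time channel increases smoothly, while the sawtooth channel encodes the dosing events. Between doses both channels advance at the same rate.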

As a preliminary investigation, the study considered only a single dependent variable, namely concentration or effect. However, clinical datasets often contain multiple biomarkers or measurements, so additional dependent variables could be incorporated into NCDE modeling. Moreover, when information from other modalities (e.g., QSAR, images, and genomics) is incorporated, NCDE could serve as a downstream module for PK and PD prediction. However, as the dimension of the input variables increases, the model size may need to increase accordingly. Similarly, the optimal choice of hyperparameters should be re‐evaluated whenever the dataset changes. As is common practice in pharmacometrics, an exploratory data analysis (EDA) is recommended to identify key variables before introducing them into NCDE training.
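
The link between input dimension and model size can be seen in a toy NCDE forward pass. The sketch below defines a hypothetical, dependency‐free TinyNCDE class, not the study's architecture. A tanh MLP vector field maps the hidden state to an h × d matrix that multiplies the control increment dX, and a linear readout returns several dependent variables at once. Because the last vector‐field layer has h × d outputs, its parameter count grows linearly with the number of input channels d.

```python
import math
import random

def init_linear(rng, n_in, n_out):
    """Random weights (n_out x n_in) and zero biases for a dense layer."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(params, x):
    weights, bias = params
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

class TinyNCDE:
    """Hypothetical toy NCDE: z0 = init(X(t0)), dz = f(z) dX, y = readout(z)."""

    def __init__(self, hidden, channels, width, n_outputs, seed=0):
        rng = random.Random(seed)
        self.h, self.d = hidden, channels
        self.init = init_linear(rng, channels, hidden)
        self.f1 = init_linear(rng, hidden, width)
        self.f2 = init_linear(rng, width, hidden * channels)  # reshaped to h x d below
        self.readout = init_linear(rng, hidden, n_outputs)

    def n_params(self):
        layers = (self.init, self.f1, self.f2, self.readout)
        return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

    def vector_field(self, z):
        a = [math.tanh(v) for v in linear(self.f1, z)]
        flat = [math.tanh(v) for v in linear(self.f2, a)]
        return [flat[i * self.d:(i + 1) * self.d] for i in range(self.h)]  # h x d matrix

    def __call__(self, path):
        """Euler-integrate along the control path; return predictions at every time point."""
        z = linear(self.init, path[0])
        preds = [linear(self.readout, z)]
        for x_prev, x_next in zip(path[:-1], path[1:]):
            dx = [b - a for a, b in zip(x_prev, x_next)]
            f = self.vector_field(z)
            z = [zi + sum(fij * dxj for fij, dxj in zip(row, dx)) for zi, row in zip(z, f)]
            preds.append(linear(self.readout, z))
        return preds

path = [(0.1 * k, math.sin(0.1 * k)) for k in range(50)]  # hypothetical 2-channel input
small = TinyNCDE(hidden=4, channels=2, width=8, n_outputs=2)
large = TinyNCDE(hidden=4, channels=5, width=8, n_outputs=2)
preds = small(path)  # one row per time point, one value per dependent variable
print(small.n_params(), large.n_params())  # 134 254
```

Going from two to five input channels nearly doubles the parameter count here, even though the hidden size, width, and number of outputs are unchanged.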

In summary, the study demonstrated the accuracy, flexibility, and interpretability of a novel AI/ML method, NCDE, for data‐driven modeling of PK and PD profiles. The comprehensive investigation toward a robust NCDE methodology may also facilitate its wider application and the broader acceptance of NDEs in general. Further use of NCDE to incorporate data from other modalities is also a promising future direction.

Author Contributions

Z.W., P.L., R.C., Y.L., W.J., and T.Z. wrote the manuscript. Z.W. and T.Z. designed the research. Z.W. performed the research. Z.W., P.L., R.C., and Y.L. analyzed the data.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1: Supporting Information.

PSP4-15-e70146-s001.pdf (15.3MB, pdf)

Acknowledgments

The authors thank all those who inspired the research and provided suggestions, especially Dominic Stefan Bräm and Patrick Kidger.

Wu Z., Luo P., Chen R., Liu Y., Jian W., and Zhou T., “Neural Controlled Differential Equation and Its Application in Pharmacokinetics and Pharmacodynamics,” CPT: Pharmacometrics & Systems Pharmacology 15, no. 1 (2026): e70146, 10.1002/psp4.70146.

Funding: This study was supported by the National Key Research and Development Program of China (Grant no. 2022YFF1203003).

References

  • 1. Sy S. K. B., Wang X., and Derendorf H., “Introduction to Pharmacometrics and Quantitative Pharmacology With an Emphasis on Physiologically Based Pharmacokinetics,” in Applied Pharmacometrics, ed. Schmidt S. and Derendorf H. (Springer, 2014).
  • 2. Yue C. and Ducharme M. P., “Empirical Models, Mechanistic Models, Statistical Moments, and Noncompartmental PK/PD Analyses,” in Shargel and Yu's Applied Biopharmaceutics and Pharmacokinetics, 8e (McGraw‐Hill Education, 2022).
  • 3. Bonate P. L., Barrett J. S., Ait‐Oudhia S., et al., “Training the Next Generation of Pharmacometric Modelers: A Multisector Perspective,” Journal of Pharmacokinetics and Pharmacodynamics 51, no. 1 (2024): 5–31.
  • 4. McComb M., Bies R., and Ramanathan M., “Machine Learning in Pharmacometrics: Opportunities and Challenges,” British Journal of Clinical Pharmacology 88, no. 4 (2022): 1482–1499.
  • 5. Tang A., “Machine Learning for Pharmacokinetic/Pharmacodynamic Modeling,” Journal of Pharmaceutical Sciences 112, no. 5 (2023): 1460–1475.
  • 6. Aggarwal C., “An Introduction to Neural Networks,” in Neural Networks and Deep Learning: A Textbook (Springer International Publishing, 2023).
  • 7. Valderrama D., Ponce‐Bobadilla A. V., Mensing S., Fröhlich H., and Stodtmann S., “Integrating Machine Learning With Pharmacokinetic Models: Benefits of Scientific Machine Learning in Adding Neural Networks Components to Existing PK Models,” CPT: Pharmacometrics & Systems Pharmacology 13, no. 1 (2024): 41–53.
  • 8. Chen R. T. Q., Rubanova Y., Bettencourt J., and Duvenaud D., “Neural Ordinary Differential Equations,” Advances in Neural Information Processing Systems 31 (2018): 6572–6583.
  • 9. Bräm D. S., Nahum U., Schropp J., Pfister M., and Koch G., “Low‐Dimensional Neural ODEs and Their Application in Pharmacokinetics,” Journal of Pharmacokinetics and Pharmacodynamics 51, no. 2 (2024): 123–140.
  • 10. Bräm D. S., Steiert B., Pfister M., Steffens B., and Koch G., “Low‐Dimensional Neural Ordinary Differential Equations Accounting for Inter‐Individual Variability Implemented in Monolix and NONMEM,” CPT: Pharmacometrics & Systems Pharmacology 14 (2024): 5–16.
  • 11. Lu J., Deng K., Zhang X., Liu G., and Guan Y., “Neural‐ODE for Pharmacokinetics Modeling and Its Advantage to Alternative Machine Learning Models in Predicting New Dosing Regimens,” iScience 24, no. 7 (2021): 102804.
  • 12. Laurie M. and Lu J., “Explainable Deep Learning for Tumor Dynamic Modeling and Overall Survival Prediction Using Neural‐ODE,” npj Systems Biology and Applications 9, no. 1 (2023): 58.
  • 13. Xiang J., Qi B., Cerou M., Zhao W., and Tang Q., “DN‐ODE: Data‐Driven Neural‐ODE Modeling for Breast Cancer Tumor Dynamics and Progression‐Free Survivals,” Computers in Biology and Medicine 180 (2024): 108876.
  • 14. Rubanova Y., Chen R. T. Q., and Duvenaud D. K., “Latent Ordinary Differential Equations for Irregularly‐Sampled Time Series,” Advances in Neural Information Processing Systems 32 (2019).
  • 15. Kidger P., Morrill J., Foster J., and Lyons T., “Neural Controlled Differential Equations for Irregular Time Series,” Advances in Neural Information Processing Systems 33 (2020): 6696–6707.
  • 16. Tao T., “The Riemann Integral,” in Analysis I (Springer Nature Singapore, 2022).
  • 17. Dupont E., Doucet A., and Teh Y. W., “Augmented Neural ODEs,” Advances in Neural Information Processing Systems 32 (2019).
  • 18. Kidger P., “On Neural Differential Equations,” arXiv Preprint arXiv:2202.02435 (2022).
  • 19. Bradbury J., Frostig R., Hawkins P., et al., “JAX: Composable Transformations of Python+NumPy Programs,” (2018).
  • 20. Kluyver T., “Jupyter Notebooks – A Publishing Format for Reproducible Computational Workflows,” in Positioning and Power in Academic Publishing: Players, Agents and Agendas (IOS Press, 2016).
  • 21. Levy G., Gibaldi M., and Jusko W. J., “Multicompartment Pharmacokinetic Models and Pharmacologic Effects,” Journal of Pharmaceutical Sciences 58, no. 4 (1969): 422–424.
  • 22. Van der Maaten L. and Hinton G., “Visualizing Data Using t‐SNE,” Journal of Machine Learning Research 9, no. 11 (2008): 2579–2605.
  • 23. Kingma D. P. and Ba J., “Adam: A Method for Stochastic Optimization,” arXiv Preprint arXiv:1412.6980 (2014).
  • 24. Boger E. and Wigström O., “A Partial Differential Equation Approach to Inhalation Physiologically Based Pharmacokinetic Modeling,” CPT: Pharmacometrics & Systems Pharmacology 7, no. 10 (2018): 638–646.
  • 25. Tornøe C. W., Overgaard R. V., Agersø H., Nielsen H. A., Madsen H., and Jonsson E. N., “Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison With Ordinary Differential Equations,” Pharmaceutical Research 22, no. 8 (2005): 1247–1258.
  • 26. Donnet S. and Samson A., “A Review on Estimation of Stochastic Differential Equations for Pharmacokinetic/Pharmacodynamic Models,” Advanced Drug Delivery Reviews 65, no. 7 (2013): 929–939.
  • 27. Kulesza A., Couty C., Lemarre P., Thalhauser C. J., and Cao Y., “Advancing Cancer Drug Development With Mechanistic Mathematical Modeling: Bridging the Gap Between Theory and Practice,” Journal of Pharmacokinetics and Pharmacodynamics 51, no. 6 (2024): 581–604.
  • 28. Koch G., Krzyzanski W., Pérez‐Ruixo J. J., and Schropp J., “Modeling of Delays in PKPD: Classical Approaches and a Tutorial for Delay Differential Equations,” Journal of Pharmacokinetics and Pharmacodynamics 41, no. 4 (2014): 291–318.
  • 29. Hu S., Dunlavey M., Guzy S., and Teuscher N., “A Distributed Delay Approach for Modeling Delayed Outcomes in Pharmacokinetics and Pharmacodynamics Studies,” Journal of Pharmacokinetics and Pharmacodynamics 45, no. 2 (2018): 285–308.
  • 30. Yan X., Bauer R., Koch G., Schropp J., Perez Ruixo J. J., and Krzyzanski W., “Delay Differential Equations Based Models in NONMEM,” Journal of Pharmacokinetics and Pharmacodynamics 48, no. 6 (2021): 763–802.

Articles from CPT: Pharmacometrics & Systems Pharmacology are provided here courtesy of Wiley
