Author manuscript; available in PMC: 2020 Sep 18.
Published in final edited form as: Commun Stat Theory Methods. 2018 Dec 29;48(24):5985–6004. doi: 10.1080/03610926.2018.1523433

Parameter Estimation for Semiparametric Ordinary Differential Equation Models

Hongqi Xue 1, Arun Kumar 2,*, Hulin Wu 3
PMCID: PMC7500512  NIHMSID: NIHMS1514797  PMID: 32952273

Abstract

We propose a new class of two-stage parameter estimation methods for semiparametric ordinary differential equation (ODE) models. In the first stage, the state variables are estimated by a penalized spline approach; in the second stage, the form of the numerical discretization algorithms used by ODE solvers is exploited to formulate estimating equations. The estimated state variables from the first stage are used to obtain additional data points for the second stage. Asymptotic properties of the proposed estimators are established. Simulation studies show that the method performs well, especially for small samples. Practical use of the method is illustrated with an influenza-specific cell-trafficking study.

Keywords: Ordinary differential equation, Penalized spline, Semiparametric coefficient models, Data augmentation estimation, Two-stage estimation

1. Introduction

We consider a semiparametric coefficient ODE model, which can be written as

dX(t)/dt = F{t, X(t), Z(t), β, η(t)}, t ∈ [t0, T], X(t0) = X0, (1)

where t ∈ [t0, T] (0 ≤ t0 < T < ∞) is time, X(t) = {X1(t), · · ·, Xϑ(t)}T is a ϑ-dimensional state vector, F(.) = {F1(.), · · ·, Fϑ(.)}T is a ϑ-dimensional vector of smooth functions with known forms, Z(t) = {Z1(t), · · ·, ZH(t)}T is an H-dimensional covariate vector, β = (β1, · · ·, βd)T is a d-dimensional vector of unknown constant parameters with true value β0, η(t) = {η1(t), · · ·, ηb(t)}T is a b-dimensional vector of unknown time-varying parameters with true curve η0(t), and X(t0) = X0 is the initial value, which may be known or unknown. The function F is assumed to satisfy a Lipschitz condition in X, which ensures existence and uniqueness of the solution to Eq. (1) (see Hairer, Nørsett, and Wanner 1993; Mattheij and Molenaar 2002).

We assume that each Xk(t) (k = 1, · · ·, ϑ) is measured with noise and the measurement model can be written as

Yk,i = Xk(ti) + 𝜖k,i, (2)

at random time points t1, · · ·, tn, where the measurement errors 𝜖k,1, · · ·, 𝜖k,n are assumed to be i.i.d. with mean zero and variance σk².

For ODE models with only constant coefficients, the existing statistical methods include the nonlinear least squares (NLS) method (Bard 1974; Li, Osborne, and Prvan 2005; Xue, Miao, and Wu 2010), the two-stage smoothing estimation method (Varah 1982; Liang and Wu 2008; Brunel 2008; Wu, Xue, and Kumar 2012; Gugushvili and Klaassen 2012), the principal differential analysis (PDA) (Ramsay 1996) and its extension, the generalized profiling approach (Ramsay et al. 2007; Qi and Zhao 2010), and the Bayesian approaches (Putter et al. 2002; Huang, Liu, and Wu 2006; Donnet and Samson 2007). For ODE models with only time-varying coefficients, Chen and Wu (2008a,b) considered the two-stage smoothing estimation method, but their models are linear and additive for time-varying coefficients as follows:

dX(t)/dt = ZT(t)η(t) + g{X(t)},

and

dX(t)/dt = η(t) + βTX(t),

where g(.) is a known function and β is a known constant vector. However, ODE models of form (1), with both constant and time-varying coefficients, have been widely used in practical applications such as physiology (Thomaseth et al. 1996), pharmacokinetics (Li et al. 2002), and HIV studies (Xue, Miao, and Wu 2010; Liang, Miao, and Wu 2010; Cao, Huang, and Wu 2012).

In this article, we propose a novel two-stage estimation method for ODE model (1). In the first stage, we estimate the curve X(t) and its derivative X′(t) by the penalized spline smoothing approach (Ruppert, Wand, and Carroll 2003; Li and Ruppert 2008; Claeskens, Krivobokova, and Opsomer 2009; Wu, Xue, and Kumar 2012). In the second stage, we propose a class of estimation methods built on the one-step discretization structure of the numerical algorithms for solving ODEs. In this second stage, we use more sample points from the smoothed curves of the first stage than the original data points; this data augmentation was first suggested by Varah (1982) for parameter estimation of ODEs and has been applied to gene regulatory network modeling (D’Haeseleer et al. 1999; Wessels, van Someren, and Reinders 2001; Bansal, Della Gatta, and di Bernardo 2006). Like the numerical discretization-based estimation method of Wu, Xue, and Kumar (2012), the proposed method builds on the original two-stage smoothing estimation method for ODEs (Varah 1982; Chen and Wu 2008a,b; Liang and Wu 2008; Brunel 2008). It differs from Wu, Xue, and Kumar (2012), however, because the coefficients of the ODE model are semiparametric. The derivation of the asymptotic properties for this model is challenging because the proposed estimation method plugs the nonparametric estimator obtained in the first stage into the second stage; therefore, we cannot directly use the common Huber-Pollard Z-theorem (see Theorem 3.3.1 in van der Vaart and Wellner 1996) to derive the asymptotic normality. This article extends the Huber-Pollard Z-theorem to the two-stage estimation case.

The rest of the article is organized as follows. We introduce our new estimation methods in Section 2. The asymptotic properties of the proposed estimators are studied in Section 3. Estimation of variance using the bootstrap method is also discussed in Section 3. A real data application is given in Section 4 to illustrate the usefulness of the proposed methods, and simulation results are presented to demonstrate the finite-sample behavior of the proposed methods in Section 5. We conclude with some remarks in Section 6. The proofs of all theoretical results are given in the Appendix.

2. Estimation Methods

In the first stage, we use penalized splines to estimate the state variable X(t) as a smooth function of t. Each state variable Xk(t) (k = 1, · · ·, ϑ) is approximated at time t as Xk(t) ≈ Σ_{j=−ν}^{K} δk,j Nj(t) = NT(t)δk, where δk = (δk,−ν, · · ·, δk,K)T is the unknown coefficient vector to be estimated from the data, and N(t) = {N−ν(t), · · ·, NK(t)}T is the vector of B-spline basis functions of degree ν (order ν + 1) at the knots t0 = τ−ν = τ−ν+1 = · · · = τ−1 = τ0 < τ1 < · · · < τK < τK+1 = τK+2 = · · · = τK+ν+1 = T on the interval [t0, T]. Define the n × (K + ν + 1) spline design matrix N = {N(t1), · · ·, N(tn)}T and Yk = (Yk,1, · · ·, Yk,n)T, and let VN = ∫_{t0}^{T} [N″(t)][N″(t)]T dt. The estimation objective function combines a sum of squared differences with a penalty on the integrated squared second-order derivative of the spline function:

Lk(δk;λk)=(YkNδk)T(YkNδk)+λkδkTVNδk. (3)

The minimizer of (3) takes the form δ̂k = (NTN + λkVN)−1NTYk. Then X̂k(t) = NT(t)δ̂k and X̂′k(t) = [N′(t)]Tδ̂k. Since derivatives of spline functions can be expressed in terms of lower-order spline functions, the expressions for VN and N′(t) can be obtained explicitly. To determine the penalty parameter λk, we use the standard generalized cross-validation (GCV) method (Craven and Wahba 1979).
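As a concrete illustration, the first-stage fit can be sketched as follows. This is a minimal sketch, not the authors' implementation: the B-spline basis is built by a plain Cox-de Boor recursion, the roughness penalty VN is approximated by quadrature over a fine grid rather than computed in closed form, and the smoothing parameter lam is fixed instead of chosen by GCV.

```python
import numpy as np

def bspline_basis(x, knots, k):
    """Evaluate all B-spline basis functions of degree k at points x
    via the Cox-de Boor recursion (simple, unoptimized)."""
    x = np.atleast_1d(x).astype(float)
    # degree-0 indicator functions of the knot intervals
    B = ((knots[:-1] <= x[:, None]) & (x[:, None] < knots[1:])).astype(float)
    B[x == knots[-1], np.searchsorted(knots, knots[-1]) - 1] = 1.0  # close right end
    for d in range(1, k + 1):
        Bn = np.zeros((len(x), B.shape[1] - 1))
        for j in range(B.shape[1] - 1):
            den1 = knots[j + d] - knots[j]
            den2 = knots[j + d + 1] - knots[j + 1]
            if den1 > 0:
                Bn[:, j] += (x - knots[j]) / den1 * B[:, j]
            if den2 > 0:
                Bn[:, j] += (knots[j + d + 1] - x) / den2 * B[:, j + 1]
        B = Bn
    return B

def penalized_spline_fit(t_obs, y, knots, k=3, lam=1e-4, grid_size=400):
    """Stage 1: delta_hat = (N^T N + lam * V_N)^{-1} N^T y, as in Eq. (3),
    with V_N = int N''(t) N''(t)^T dt approximated on a fine grid."""
    N = bspline_basis(t_obs, knots, k)
    g = np.linspace(knots[0], knots[-1], grid_size)
    h = g[1] - g[0]
    D2 = np.diff(bspline_basis(g, knots, k), n=2, axis=0) / h**2  # approx N''
    V = h * D2.T @ D2                      # quadrature for the penalty matrix
    delta = np.linalg.solve(N.T @ N + lam * V, N.T @ y)
    return delta, lambda s: bspline_basis(s, knots, k) @ delta
```

With clamped knots the basis forms a partition of unity, and for a smooth signal the ridge-type solve recovers the curve closely.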

In the second stage, by one-step discretization methods for ODEs (Hairer, Norsett, and Wanner 1993; Mattheij and Molenaar 2002), we have

[X(sj+1) − X(sj)]/(sj+1 − sj) = Φ{sj, X(sj), Z(sj), X(sj+1), Z(sj+1), β, η(sj)} + O(hp), (4)

for j = 1, · · ·, m − 1, where t0 = s1 < s2 < · · · < sm = T are sample points on the interval [t0, T], which may be different from the observation time points t1, · · ·, tn, with m ≥ n and h = maxj(sj+1 − sj). Here m is the sample size of the augmented data set and h = O(m−1). The form of Φ{sj, X(sj), Z(sj), X(sj+1), Z(sj+1), β, η(sj)} and the order p are determined by the discretization method. Similar to Wu, Xue, and Kumar (2012), we consider three different discretization methods: Euler’s method, the trapezoidal rule, and the Runge-Kutta method. For each of these methods, the form of the function Φ and the order p are given as follows. For Euler’s method, we have

Φ{sj,X(sj),Z(sj),X(sj+1),Z(sj+1),β,η(sj)}=F{sj,X(sj),Z(sj),β,η(sj)}

with p = 1; the trapezoidal rule gives the form

Φ{sj,X(sj),Z(sj),X(sj+1),Z(sj+1),β,η(sj)}=12[F{sj,X(sj),Z(sj),β,η(sj)}+F{sj+1,X(sj+1),Z(sj+1),β,η(sj+1)}]

with p = 2; for the fourth-order Runge-Kutta method, we have

Φ{sj,X(sj),Z(sj),X(sj+1),Z(sj+1),β,η(sj)}=k16+k23+k33+k46

with p = 4, where

  • k1=F{sj,X(sj),Z(sj),β,η(sj)},

  • k2=F{sj+hj/2,X(sj)+hjk1/2,Z(sj+hj/2),β,η(sj+hj/2)},

  • k3=F{sj+hj/2, X(sj)+hjk2/2, Z(sj+hj/2), β, η(sj+hj/2)}, and

  • k4=F{sj+hj,X(sj)+hjk3,Z(sj+hj),β,η(sj+hj)}.

Further, we approximate each component of η(t) by regression splines with the same type of B-spline basis functions. Let t0 = u0 < u1 < · · · < uq = T be a partition of the interval [t0, T], where q = O(nϖ) (0 < ϖ < 0.5) is a positive integer such that max_{1≤j≤q} |uj − uj−1| = O(n−ϖ). Then we have r = q + l normalized B-spline basis functions of order l + 1 that form a basis for the linear spline space. We denote these basis functions as a vector π(t) = {B1(t), · · ·, Br(t)}T. The parameter η(t) can be approximated by πT(t)α, where α = (α1, · · ·, αr)T ∈ Rr is the spline coefficient vector, with α0 corresponding to η0(t).

Replacing the state variable, X(t), with its estimate, X^(t), the expression (4) can be re-written as

[X̂(sj+1) − X̂(sj)]/(sj+1 − sj) = Φ{sj, X̂(sj), Z(sj), X̂(sj+1), Z(sj+1), β, πT(sj)α} + Γj, (5)

where Γj is the sum of the discretization error, the spline approximation error, and the estimation error of X̂(.) from the first stage. Obviously, the Γj are not i.i.d., but dependent. For a prescribed non-negative weight function wk(t) on [t0, T] (k = 1, · · ·, ϑ), we propose the estimator (β̂n, α̂n) of (β, α) obtained by minimizing the following weighted least squares criterion:

Σ_{j=1}^{m−1} Σ_{k=1}^{ϑ} wk(sj)[ [X̂k(sj+1) − X̂k(sj)]/(sj+1 − sj) − Φk{sj, X̂(sj), Z(sj), X̂(sj+1), Z(sj+1), β, πT(sj)α} ]². (6)

Then η̂n(t) = πT(t)α̂n. The assumption required on wk(.) for the asymptotic results is discussed in Section 3. In numerical studies, we found that the selection of wk(.) has little effect on the performance of the proposed estimators.
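To make the second stage concrete, the following sketch minimizes criterion (6) for a one-dimensional toy model x′(t) = η(t) − βx(t) with the trapezoidal Φ. It is illustrative only: the closed-form true trajectory stands in for the stage-1 smoother X̂(t), a small trigonometric basis stands in for the B-spline basis π(t), and because this toy model is linear in (β, α) the criterion is minimized by weighted least squares rather than a general optimizer.

```python
import numpy as np

# Toy model: x'(t) = eta(t) - beta * x(t) with beta = 1, eta(t) = 1 + 0.5 sin t.
# Closed-form solution for x(0) = 5 (stands in for the stage-1 estimate X_hat):
xhat = lambda t: 1 + 0.25 * np.sin(t) - 0.25 * np.cos(t) + 4.25 * np.exp(-t)

s = np.linspace(0.0, 6.0, 121)                 # augmented sample points s_1..s_m
w = np.sin(np.pi * s / 6.0)                    # weight vanishing at both ends (D1)
basis = lambda t: np.column_stack((np.ones_like(t), np.sin(t), np.cos(t)))

x = xhat(s)
d = np.diff(x) / np.diff(s)                    # difference quotients in (6)
# Trapezoidal Phi: 0.5*[eta(s_j) - beta*x_j + eta(s_{j+1}) - beta*x_{j+1}],
# so the residual is linear in (alpha, beta); build a weighted design matrix.
P = 0.5 * (basis(s[:-1]) + basis(s[1:]))       # columns multiplying alpha
q = -0.5 * (x[:-1] + x[1:])                    # column multiplying beta
wgt = np.sqrt(0.5 * (w[:-1] + w[1:]))          # per-interval weights
A = wgt[:, None] * np.column_stack((P, q))
coef, *_ = np.linalg.lstsq(A, wgt * d, rcond=None)
alpha_hat, beta_hat = coef[:3], coef[3]
```

Because the trajectory is exact, the only error left is the O(h²) discretization error, so the estimates land close to the truth (α, β) = ((1, 0.5, 0), 1).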

In general, a higher-order discretization method is expected to give better estimation accuracy since its discretization error is smaller, but its computational cost is higher. The trade-off between estimation accuracy and computational cost needs to be considered in practical applications. For convenience, we use the abbreviations EDB, TDB, and RDB for the Euler discretization-based estimator, the trapezoidal discretization-based estimator, and the Runge-Kutta discretization-based estimator, respectively. We study the asymptotic properties of the proposed estimators in Section 3 and evaluate their finite-sample performance in Sections 4 and 5.

3. Asymptotic Properties

In this section, we establish the asymptotic properties of the proposed two-stage estimator for semiparametric ODE models. This is challenging since we need to combine the asymptotic results for the two-stage smoothing-based estimation method for ODE models (Varah 1982; Brunel 2008; Chen and Wu 2008a,b; Liang and Wu 2008; Wu, Xue, and Kumar 2012) with the spline approximation error for the time-varying coefficients, especially because the resulting error terms Γj in (5) are not i.i.d., but dependent.

Let µ be a nonnegative integer and γ ∈ (0, 1] such that ϱ = µ + γ > 0.5. Let A be the collection of functions η on [t0, T] whose µ-th derivative η(µ) exists and satisfies the Lipschitz condition of order γ: |η(µ)(s) − η(µ)(t)| ≤ L|s − t|^γ for s, t ∈ [t0, T], with a generic positive constant L. Denote the parameter vector θ = {βT, ηT(t)}T and define ‖η(t)‖2 = [∫_{t0}^{T} η²(t)ρ(t)dt]^{1/2} for any function η(t) whenever the integral exists, where ρ(t) is defined in Assumption A5. For any θ1 and θ2, we define the distance ρ(θ1, θ2) = ‖β1 − β2‖ + Σ_{j=1}^{b} ‖ηj,1 − ηj,2‖2. For 1 ≤ k ≤ ϑ, Assumptions A-D needed for the theorems are listed below:

Assumption A:

  • (A1)

    For Δk = max_{0≤j≤K}(τk,j+1 − τk,j), there exists a constant M > 0 such that Δk/min_{0≤j≤K}(τk,j+1 − τk,j) ≤ M and max_{0≤j≤K} |(τk,j+1 − τk,j)/(τk,j − τk,j−1) − 1| = o(1).

  • (A2)

    K = cn^ψ with 1/(2ν + 3) ≤ ψ < 1, and λk = O(n^π) with π ≤ ν/(2ν + 3).

  • (A3)

    Xk(t) ∈ Cν+1[t0, T ] with ν ≥ 2.

  • (A4)

    K* ≡ (K + ν + 1)(λk c̃1)^{1/4} n^{−1/4} < 1 for some constant c̃1.

  • (A5)

    t1, · · ·, tn are i.i.d. with cumulative distribution function Q(t) whose density ρ(t) is positive and continuous. Moreover, ρ(t) is bounded away from 0 and +∞ and has a bounded and continuous first-order derivative.

Assumption B:

  • (B1)

    Fk{t, X(t), Z(t), β, η(t)} is a continuous function for β ∈ B and ηj(t) ∈ A (1 ≤ j ≤ b), where B is a compact subset of Rd with finite diameter Rβ.

  • (B2)

    All partial Fréchet derivatives of Fk{t, X, Z(t), β, η} up to order p with respect to t, X, Z and η exist and are continuous.

  • (B3)

    The numerical method for solving ODEs is of order p.

  • (B4)

    The true parameter β0 is an interior point of B.

Assumption C:

  • (C1)

    The first and second partial Fréchet derivatives of Fk{t, X(t), Z(t), β, η}, namely ∂Fk/∂β, ∂Fk/∂η, ∂²Fk/∂β∂βT, ∂²Fk/∂β∂ηT and ∂²Fk/∂η∂ηT, exist and are continuous and uniformly bounded for all t, β, X(t) and η.

  • (C2)

    The first partial Fréchet derivatives ∂Fk{t, X(t), Z(t), β, η(t)}/∂t and ∂Fk{t, X, Z(t), β, η(t)}/∂X are continuous and uniformly bounded for all t, β, X(t), Z(t) and η(t).

  • (C3)

    The measurement errors 𝜖k,1, · · ·, 𝜖k,n are i.i.d. with mean 0 and variance σk² (0 < σk² < ∞). Moreover, their density function is bounded away from 0 and +∞ and has a bounded and continuous first-order derivative.

  • (C4)

    l + 1 ≥ ϱ.

  • (C5)

    For any β ∈ B and η ∈ A, Σ_{k=1}^{ϑ} E(wk(t)[Fk{t, X(t), Z(t), θ} − Fk{t, X(t), Z(t), θ0}]²) = 0 if and only if β = β0 and P{t : η(t) = η0(t)} = 1.

Assumption D:

  • (D1)

    The weight function wk(.) is bounded and nonnegative on [t0, T ] with wk(t0) = wk(T) = 0. Moreover, wk(t) has a bounded and continuous first-order derivative.

  • (D2)

    ϖ satisfies the restrictions 0.25/ϱ < ϖ < 0.5 and ϖ(2 + ϱ) > 0.5.

Note that Assumption A is a standard condition for deriving local asymptotic properties of the penalized spline estimator of a nonparametric regression function (Claeskens, Krivobokova, and Opsomer 2009; Wu, Xue, and Kumar 2012). Assumption B is required for the precision of the discretization algorithm (Wu, Xue, and Kumar 2012), where Assumption B3, from Mattheij and Molenaar (2002, p. 55–56), defines the precision of the numerical algorithm. For example, the Euler backward method, the trapezoidal rule, and the 4-stage Runge-Kutta method are of order 1, 2 and 4, respectively. Assumptions C1-C3 are needed for the proof of consistency. Assumption C4 is a standard constraint in spline theory. Assumption C5 is required for identifiability. Assumption D1, the boundary constraint on the weight functions, is required for technical reasons; it was also used by Wu, Xue, and Kumar (2012) to ensure that the estimators of the parametric components achieve the common root-n convergence rate. Assumption D2 is needed for the proof of asymptotic normality.

Theorem 1: Under Assumptions A, B and C, we have ρ(θ̂n, θ0) → 0, almost surely, under Pθ0.

Theorem 2: Under Assumptions A, B, C and D, we have ρ(θ̂n, θ0) = Op(n^{−ϖϱ} + n^{−(1−ϖ)/2}). Moreover, if ϖ = 1/(1 + 2ϱ), then ‖η̂n(t) − η0(t)‖2 = Op(n^{−ϱ/(1+2ϱ)}), which is the same as the optimal rate for standard nonparametric function estimation.

Theorem 3: Under Assumptions A, B, C and D, if ϑ ≥ 2, then we have n^{1/2}(β̂n − β0) →d N(0, A^{−1}Σ2(A^{−1})T), with the matrices A and Σ2 given in (11) and (13) in the Appendix, respectively.

Remark 1: For ϑ = 1, Theorem 3 does not hold because of an identifiability problem, similar to Theorem 4.2 in Xue, Miao, and Wu (2010).

The asymptotic variance-covariance matrix needs to be estimated in order to perform statistical inference for unknown parameters β. There are some standard methods that can be used. We recommend the nonparametric bootstrap method (Efron 1982), which has been used for statistical inference for ODE models with constant coefficients (Joshi, Seidel-Morgenstern, and Kremling 2006).

Let {(ti*, Yi*), i = 1, · · ·, n} be drawn independently with replacement from the original sample {(ti, Yi), i = 1, · · ·, n}. In the first stage, based on the bootstrap sample {(ti*, Yi*), i = 1, · · ·, n}, we obtain the estimators X̂*(t) and X̂*′(t). Plugging these into the second stage, we get the bootstrap M-estimator (β̂n*, α̂n*). From Theorem 1 in Cheng and Huang (2010) and Theorem 3 in this article, given {ti, Y(ti)}, √n(β̂n* − β̂n) and √n(β̂n − β0) have the same limiting distribution, which can be used for inference on β̂n and η̂n(t).
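A generic sketch of the recommended pairs bootstrap follows. The `estimator` callback stands in for the full two-stage procedure; here it is a simple least-squares slope, purely for illustration.

```python
import numpy as np

def bootstrap_ci(t, y, estimator, n_boot=500, level=0.95, seed=0):
    """Nonparametric (pairs) bootstrap percentile interval for a scalar
    functional of the data, as recommended in Section 3."""
    rng = np.random.default_rng(seed)
    n = len(t)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample pairs with replacement
        stats[b] = estimator(t[idx], y[idx])   # rerun both stages on the resample
    lo, hi = np.percentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return lo, hi

# Illustration with a stand-in estimator (least-squares slope, true slope 2):
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * t + rng.normal(0.0, 0.1, size=t.shape)
lo, hi = bootstrap_ci(t, y, lambda a, b: np.polyfit(a, b, 1)[0])
```

In the paper's setting the callback would rerun the penalized spline smoothing and the discretization-based fit on each resample, which is why the bootstrap is costly but simple to implement.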

4. Real data analysis

We used the proposed two-stage method to estimate the parameters of an ODE model that describes the kinetics of influenza-specific CD8+ T cells during the primary immune response. The goal of the study is to estimate biological parameters that determine the distribution of CD8+ T cells during influenza infection in mice. Since these parameters are not directly estimable from experiments, a mathematical model was developed to mimic the kinetics of CD8+ T cells during influenza infection as follows (Wu et al. 2011):

dTEm/dt = (ρm(t) − γms)TEm, dTEs/dt = (ρs(t) − γsl)TEs + γmsTEm, dTEl/dt = γslTEs − δlTEl. (7)

The state variable TEm denotes the effector CD8+ T cell count in the mediastinal lymph node (MLN), TEs the count in the spleen, and TEl the count in the lung. The model has two time-varying parameters, ρm(t) and ρs(t), and three constant parameters, γms, γsl, and δl. Parameter ρm(t) denotes the proliferation rate of effector CD8+ T cells in the MLN and ρs(t) the proliferation rate in the spleen. The constant parameter γms denotes the migration rate of effector CD8+ T cells from the MLN to the spleen, γsl the migration rate from the spleen to the lung, and δl the death rate of effector CD8+ T cells in the lung. Data on TEm, TEs and TEl were collected through experiments on female C57BL/6 mice (see Wu et al. 2011 for more details). Mice were anesthetized and intranasally inoculated with H3N2 A/Hong Kong/X31 influenza-A virus. CD8+ T cell data were collected from mouse tissues on days 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, and 28. At each time point, data were collected from 6 mice, except day 12, for which data were collected from 12 mice. For estimating the parameters, we only used the data from day 5 to day 14, when the adaptive immune response is active. One observation on day 8 was identified as an outlier and hence removed from the data before model fitting.
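Model (7) and a fixed-step solver can be sketched as follows. The solver and the check below are ours, not the authors' code; in the usage check the proliferation rates are set to hypothetical constants equal to the Table 1 migration rates, so that the MLN compartment stays exactly flat, which gives an easy correctness check.

```python
import numpy as np

def cd8_rhs(t, x, rho_m, rho_s, g_ms, g_sl, d_l):
    """Right-hand side of model (7): effector CD8+ T cell counts in the
    MLN (T_m), spleen (T_s), and lung (T_l)."""
    T_m, T_s, T_l = x
    return np.array([
        (rho_m(t) - g_ms) * T_m,                # proliferation minus migration out
        (rho_s(t) - g_sl) * T_s + g_ms * T_m,   # plus migration in from the MLN
        g_sl * T_s - d_l * T_l,                 # migration in from spleen, death
    ])

def rk4_solve(rhs, x0, t_grid, **kw):
    """Classical fourth-order Runge-Kutta on a fixed time grid."""
    x = np.empty((len(t_grid), len(x0)))
    x[0] = x0
    for i in range(len(t_grid) - 1):
        t, h = t_grid[i], t_grid[i + 1] - t_grid[i]
        k1 = rhs(t, x[i], **kw)
        k2 = rhs(t + h / 2, x[i] + h * k1 / 2, **kw)
        k3 = rhs(t + h / 2, x[i] + h * k2 / 2, **kw)
        k4 = rhs(t + h, x[i] + h * k3, **kw)
        x[i + 1] = x[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x
```

The initial counts used in the check are the day-5 values quoted in the simulation section.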

Wu, Xue, and Kumar (2012) showed that the TDB method is the best among the two-stage methods when both accuracy and computational cost are taken into account. Hence we used the TDB algorithm to estimate the parameters of model (7). The time-varying parameter ρm(t) was approximated by a linear combination of four B-splines of order 4: ρm(t) = Σ_{j=0}^{3} αjBj(t), 5 ≤ t ≤ 14. These B-splines were constructed from the knot sequence (5, 5, 5, 8, 11, 14, 14, 14). The two inner knots, at 8 and 11, allowed us to capture the bi-modal shape of the time-varying parameters. The time-varying parameter ρs(t) was approximated by a linear combination of four B-splines of order 4 in the same way. We used the weight function w(ti) = sin((ti − 5)π/9) in the objective function. We adopted the data augmentation size m = 11 ≈ 1.25n, with n = 9, in the second stage. To find confidence intervals for the parameters, we used the bootstrap method with 500 replications, giving 500 bootstrap estimates of each parameter. A 95% confidence interval for a parameter was obtained from the 2.5th and 97.5th percentiles of the bootstrap parameter estimates. Point estimates along with 95% confidence intervals for the parameters of model (7) are given in Table 1. Estimates of the time-varying parameters ρm and ρs, along with their 95% confidence bands, are shown in the left and right panels of Figure 1, respectively. From these estimation results, we see that our estimates of the constant parameters (γms, γsl, δl) are on a similar scale to the original estimates in Wu et al. (2011). Our estimates of the time-varying proliferation rates show an interesting bi-modal shape, which may indicate an early peak during the adaptive immune response and a later peak during the recovery period after the influenza virus is cleared around day 10. These biological implications need to be confirmed by further experiments.

Table 1:

Parameter point estimates and 95% bootstrap confidence intervals in the real data example.

Parameter Estimate Confidence interval
γms  1.57 (0.94,3.03)
γsl  0.92 (0.55,1.21)
δl  4.60 (2.26,6.62)

Figure 1:


Estimates of the proliferation rates of CD8+ T cells in MLN (left) and spleen (right) along with 95% confidence bands.

5. Simulation Studies

We conducted simulation studies to evaluate the performance of the proposed two-stage estimation method. As in Section 4, we only considered the TDB method. We used model (7) to generate the simulation data. The constant parameters were γms = 1.25, γsl = 1.25 and δl = 9. The time-varying parameters ρm(t) and ρs(t) used to generate the simulation data are shown in Figure 2. These simulation parameters were selected so that the solution to (7) is well-behaved (shown in Figure 3).

Figure 2:


ρm(t) (left) and ρs(t) (right) used to generate simulation data.

Figure 3:


Solutions of TEm (left), TEs(middle) and TEl (right) for the simulation parameters.

We generated two simulation data sets, one with 20 observations and the other with 40 observations for each state variable, at equally spaced time points between 5 and 14. A fourth-order Runge-Kutta method was used to solve (7) numerically with initial conditions TEm(5) = 5000, TEs(5) = 45000, and TEl(5) = 1300. Noise generated from independent normal distributions with zero mean and standard deviations σEm, σEs, and σEl was added to the simulation data for TEm, TEs, and TEl, respectively. The standard deviation of the noise was set to 1% of the standard deviation of the noiseless simulation data. We tried two different data augmentation sizes, m = n and m = 2n. As in the real data analysis, we used the weight function w(ti) = sin((ti − 5)π/9) in the objective function.
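The noise step can be sketched as follows; a closed-form exponential curve stands in for the numerical solution of (7), since only the 1%-of-standard-deviation rule is being illustrated.

```python
import numpy as np

t = np.linspace(5.0, 14.0, 20)              # 20 equally spaced observation times
x_true = 5000.0 * np.exp(0.3 * (t - 5.0))   # stand-in noiseless trajectory
sigma = 0.01 * np.std(x_true)               # noise sd = 1% of the signal's sd
rng = np.random.default_rng(0)
y = x_true + rng.normal(0.0, sigma, size=t.shape)
```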

We estimated ρm(t), γms and γsl, and assumed that the remaining parameters were known. This reduced the computational burden on the optimization routine, avoiding local convergence problems and hence giving a fair comparison of the performance of different data augmentation sizes. The time-varying parameter ρm(t) was approximated by a linear combination of four B-spline basis functions of order 4. The R function “genoud”, a global optimization routine, was used to minimize (6) to obtain the parameter estimates; it searched for each parameter in the range [0, 10], starting the search at 5. For each simulation case, 500 replicate runs were performed. The following average relative error (ARE) of the constant parameter estimates was used to compare the different data augmentation sizes,

ARE(θ) = (1/500) Σ_{i=1}^{500} |(θ̂i − θ0)/θ0|, (8)

where θ0 is the true value of a parameter θ and θ̂i is its estimate from the ith simulation run. For the time-varying parameter ρm, we calculated its average mean squared error (AMSE) as follows,

AMSE(ρm) = (1/500) Σ_{i=1}^{500} [ (1/1000) Σ_{j=1}^{1000} (ρ̂m,i(tj) − ρm,0(tj))² ], (9)

where ρ̂m,i(tj) is the estimate of ρm(tj) in the ith simulation run and ρm,0(tj) is the true value of ρm at the time point tj. The time points tj in equation (9) are equally spaced between 5 and 14. In Table 2, we show the ARE for the constant parameters γms and γsl and the AMSE for the time-varying parameter ρm at the two data augmentation sizes, m = n and m = 2n. Both the ARE and the AMSE decrease as the data augmentation size increases. This simulation study clearly demonstrates the merit of data augmentation in the second stage for improving the overall fit of the model to the data.
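The two performance measures can be computed directly from the replicate estimates; this sketch assumes the fitted curves for ρm have been evaluated on a common grid.

```python
import numpy as np

def are(estimates, theta0):
    """Average relative error of a constant parameter, Eq. (8)."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(np.abs((estimates - theta0) / theta0))

def amse(curves, true_curve):
    """Average mean squared error of a time-varying parameter, Eq. (9);
    curves is an (n_runs, n_grid) array of fitted curves on a common grid,
    so the mean over both axes equals the average of per-run MSEs."""
    curves = np.asarray(curves, dtype=float)
    return np.mean((curves - np.asarray(true_curve)) ** 2)
```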

Table 2:

ARE for γms and γsl and AMSE for ρm

n   Parameter  m/n  ARE or AMSE
20  γms        1    6.08
20  γms        2    1.40
20  γsl        1    0.17
20  γsl        2    0.15
20  ρm         1    0.0044
20  ρm         2    0.0005
40  γms        1    0.91
40  γms        2    0.78
40  γsl        1    0.13
40  γsl        2    0.11
40  ρm         1    0.0002
40  ρm         2    0.0001

6. Conclusion and Discussion

In this article, we have proposed a new class of estimation methods for both constant and time-varying coefficients of semiparametric ODE models; the methods exploit the form of the numerical discretization algorithms used by ODE solvers and use more sample points than the original data points. This new class of methods is shown to have benefits in computational efficiency and estimation accuracy over the original two-stage smoothing estimation methods for ODE models. Other estimation approaches for semiparametric ODE models, such as the NLS method and the generalized profiling approach (Ramsay et al. 2007), may also be considered, but the proposed class provides an alternative with low computational cost. In this article, we have not explored how to select the data augmentation size in the second stage. According to our simulation studies, the larger the data augmentation size, the better the estimation; however, a larger data augmentation size incurs a higher computational cost. The trade-off between the data augmentation size and the computational cost remains an open problem.

Acknowledgments

The authors thank Dr. Hua Liang and Dr. Yun Fang for helpful discussions.

This research was partially supported by the NIH grants HHSN272201000055C, AI087135, and two University of Rochester CTSI pilot awards (UL1RR024160) from the National Center For Research Resources. This work was done when authors were affiliated with University of Rochester.

Appendix: Technical Proofs

For simplicity of presentation and notation, we mainly focus on the case without the input variables Z and t in (1), and we only consider the two-stage smoothing estimation method, i.e., we replace (6) with the following objective function:

Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − Fk{X̂(sj), β, πT(sj)α}]².

All of the proofs can be extended to the numerical discretization-based methods built on the objective function (6) by arguments similar to those in the proofs of Lemma 5, Proposition 1 and Proposition 2 in Wu, Xue, and Kumar (2012). We also only discuss the univariate case for η(t), which can be extended to the multivariate case by an approach similar to He, Xue, and Shi (2010). Denote

S̃n(θ) = (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − Fk{X̂(sj), θ}]²,
Sn(θ) = (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ}]²,
S(θ) = Σ_{k=1}^{ϑ} E(wk(t)[X′k(t) − Fk{X(t), θ}]²).

Proof of Theorem 1. Denote the set An = {η(t) = Σ_{i=1}^{r} Bi(t)αi : max_{1≤i≤r} |αi| ≤ ln}, where ln ≍ n^{(2l′−1)/[2l′(2l′+1)]} with a constant l′ arbitrarily close to l (see Shen 1997, p. 2560), and Θn = {θ : β ∈ B, η ∈ An} = B × An. For any θ ∈ Θ ≐ {θ : β ∈ B, η ∈ A} = B × A, under Assumption C4, by Corollary 6.21 in Schumaker (1981), there exists θn ∈ Θn such that ρ(θ, θn) = O(n^{−ϖϱ}).

First, we claim that supθ |S̃n(θ) − Sn(θ)| → 0, a.s., under Pθ0. In fact, by the mean value theorem, we have

S̃n(θ) = (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − Fk{X̂(sj), θ}]²
= (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ} + X̂′k(sj) − X′k(sj) + Fk{X(sj), θ} − Fk{X̂(sj), θ}]²
= (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ} + X̂′k(sj) − X′k(sj) − Σ_{ι=1}^{ϑ} (∂Fk{X, θ}/∂Xι)|_{X1=X1(sj), ···, Xι=X̃ι,j, ···, Xϑ=Xϑ(sj)} [X̂ι(sj) − Xι(sj)]]²
= (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ}]² + (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² + (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj)(∂Fk{X, θ}/∂Xι)²[X̂ι(sj) − Xι(sj)]² + Rn(θ), (10)

where X̃ι,j is a point between Xι(sj) and X̂ι(sj), and Rn(θ) collects the cross-product terms. Under Assumptions A and C1, by the strong law of large numbers and Lemma 2 in Wu, Xue, and Kumar (2012), we have

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² − Σ_{k=1}^{ϑ} E(wk(t)[X̂′k(t) − X′k(t)]²) → 0, a.s.,

and

supθ (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj)(∂Fk{X, θ}/∂Xι)²|_{X1=X1(sj), ···, Xι=X̃ι,j, ···, Xϑ=Xϑ(sj)} [X̂ι(sj) − Xι(sj)]² − Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} E[wk(t) supθ (∂Fk{X, θ}/∂Xι)²|_{X=X(t)} [X̂ι(t) − Xι(t)]²] → 0, a.s.

By the Cauchy-Schwarz inequality, supθ Rn(θ) is bounded by a term of lower order than the other three terms in (10) and can be ignored. Thus the claim holds.

Next, let Anδ be the set {η ∈ An : ‖η − ηn0‖2 ≤ δ} and N2(𝜖, L∞, Anδ) be its bracketing number with respect to the L∞ norm (see Definition 2.1.6, van der Vaart and Wellner 1996), where ηn0 is the projection of η0 onto the sieve An. By the calculation of Shen and Wong (1994, p. 597), for any 𝜖 ≤ δ, we have N2(𝜖, L∞, Anδ) ≤ C(δ/𝜖)^r, where r = q + l is the number of B-spline basis functions. Let Fn be the set {Sn(θ) − S(θ) : ‖β − β0‖ ≤ δ, η ∈ An, ‖η − ηn0‖2 ≤ δ}. For any θ1, θ2 ∈ Θn, we can easily obtain |Sn(θ1) − Sn(θ2)| ≤ C(‖β1 − β2‖ + ‖η1 − η2‖∞) using a Taylor expansion. Hence

N2(𝜖, L∞, Fn) ≤ N1(𝜖/2, L2, B) × N2(𝜖/2, L∞, Anδ) ≤ C(3Rβ/𝜖)^d (δ/𝜖)^r ≤ C(1/𝜖)^{r+d}.

Then by arguments similar to (A.2) in Xue, Lam, and Li (2004), we obtain sup_{Fn} |Sn(θ) − S(θ)| → 0, a.s., under Pθ0.

Third, under Assumption C5, it is easy to see that S(θ) attains its unique minimum at θ = θ0. Under Assumption B4, the first-order derivative ∂S(θ)/∂θ of S(θ) at θ0 equals zero and the second-order derivative ∂²S(θ)/∂θ∂θT of S(θ) at θ0 is positive definite. By Assumptions A5, C1 and C3, the second-order derivative of S(θ) in a small neighborhood of θ0 is bounded away from 0 and ∞. Then a second-order Taylor expansion of S(θ) gives a constant 0 < C < ∞ such that S(θ̂n) − S(θ0) ≥ Cρ²(θ̂n, θ0). Moreover,

0 ≤ S(θ̂n) − S(θ0) = S(θ̂n) − S̃n(θ̂n) + S̃n(θ̂n) − S(θ0) ≤ S(θ̂n) − S̃n(θ̂n) + S̃n(θ0) − S(θ0) ≤ 2 supθ |S̃n(θ) − S(θ)| ≤ 2 supθ |S̃n(θ) − Sn(θ)| + 2 supθ |Sn(θ) − S(θ)| → 0, a.s.

Thus, ρ(θ̂n, θ0) → 0 a.s., under Pθ0.

Proof of Theorem 2. We apply Theorem 3.4.1 in van der Vaart and Wellner (1996) to obtain the rate of convergence.

First, let θn0 be the projection of θ0 onto Θn as in the proof of Theorem 1, and define ρ1(θ, θn0), a map from Θn to [0, ∞), by ρ1²(θ, θn0) = S(θ) − S(θn0). Choose δn = ρ(θ0, θn0). For δn < δ < ∞, denote Ω = {θ : θ ∈ Θn, δ/2 < ρ(θ, θn0) ≤ δ}. From the definition of ρ1, we have supΩ [S(θn0) − S(θ)] ≤ −δ²/4. Let Ξn be the set {Sn(θ) − S(θn0) : θ ∈ Θn} and J̃(δ, L2(P), Ξn) be the L2(P)-norm bracketing integral of Ξn. Similar to the proof of Theorem 4.2 of Xue, Miao, and Wu (2010), we have J̃(δ, L2(P), Ξn) ≤ C r^{1/2} δ. Let

ϕn(δ) = J̃(δ, L2(P), Ξn){1 + J̃(δ, L2(P), Ξn)/(δ²√n)} = r^{1/2}δ + r/√n.

Obviously, ϕn(δ)/δ^{1+τ} is a decreasing function of δ for 0 < τ < 1. Then by Lemma 3.4.2 in van der Vaart and Wellner (1996), we have E0[supΩ (Sn − S)(θ − θn0)] ≲ ϕn(δ)/√n.

Next, we claim that supΩ [S̃n(θ) − Sn(θ)] = op(n^{−1/2}). From (10), it suffices to prove (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² = op(n^{−1/2}) and supΩ (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj)(∂Fk{X, θ}/∂Xι)²[X̂ι(sj) − Xι(sj)]² = op(n^{−1/2}). In fact, for fixed X̂(t) and θ, from the numerical error results for trapezoidal-rule integration, we have

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι][X̂ι(sj) − Xι(sj)] = Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} ∫_{t0}^{T} wk(t)ρ(t) supΩ [∂Fk{X, θ}/∂Xι][X̂ι(t) − Xι(t)]dt + Op(1/m).

Since m ≥ n, Op(1/m) = op(n^{−1/2}). Denote g(t) = wk(t)ρ(t) supΩ [∂Fk{X, θ}/∂Xι]. Under Assumption A, by Lemma 6 in Wu, Xue and Kumar (2012), ∫_{t0}^{T} g(t)[X̂ι(t) − Xι(t)]dt is asymptotically normal with mean

μ1 = ∫_{t0}^{T} g(t)E[X̂ι(t) − Xι(t)]dt = O(K^{−(ν+1)}) = o(n^{−1/2}),

and variance as

Σ1 = ∫_{t0}^{T} ∫_{t0}^{T} g(s)Cov[X̂ι(s) − Xι(s), X̂ι(t) − Xι(t)]g(t) ds dt = O(1/n).

Thus (√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι][X̂ι(sj) − Xι(sj)] is asymptotically normal with mean 0 and variance of order O(1). Using the Delta method (van der Vaart and Wellner 1996, p. 377), we have that

(√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι]²[X̂ι(sj) − Xι(sj)]²

is also asymptotically normal with mean 0 and variance of order O(1). So

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι]²[X̂ι(sj) − Xι(sj)]² = op(n^{−1/2}).

Similarly, for fixed X̂′(t), we have

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)] = Σ_{k=1}^{ϑ} ∫_{t0}^{T} wk(t)ρ(t)[X̂′k(t) − X′k(t)]dt + op(m^{−1/2}).

Further, applying integration by parts and Assumption D1 (wk(t0) = wk(T) = 0 for k = 1, · · ·, ϑ), we have

Σ_{k=1}^{ϑ} ∫_{t0}^{T} wk(t)ρ(t)[X̂′k(t) − X′k(t)]dt = Σ_{k=1}^{ϑ} (wk(t)ρ(t)[X̂k(t) − Xk(t)])|_{t0}^{T} − Σ_{k=1}^{ϑ} ∫_{t0}^{T} (d/dt)[wk(t)ρ(t)][X̂k(t) − Xk(t)]dt = −Σ_{k=1}^{ϑ} ∫_{t0}^{T} (d/dt)[wk(t)ρ(t)][X̂k(t) − Xk(t)]dt.

Then by Lemma 6 in Wu, Xue and Kumar (2012), (√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)] is asymptotically normal with mean 0 and variance of order O(1). Using the Delta method again, it follows that (√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² is also asymptotically normal with mean 0 and variance of order O(1). So (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² = op(n^{−1/2}). Thus the claim holds, and we have E0[supΩ (S̃n − S)(θ − θn0)] ≲ ϕn(δ)/√n + op(1/√n) ≲ ϕn(δ)/√n. It follows that E0[supΩ √n(S̃n − S)(θ − θn0)] ≲ ϕn(δ).

Thus the conditions of Theorem 3.4.1 in van der Vaart and Wellner (1996) are satisfied for the δn, ρ1 and ϕn(δ) above. Therefore we have rn ρ1(θ̂n, θn0) = Op(1), where rn satisfies rn² ϕn(1/rn) ≤ √n. It follows that rn = n^{(1−ϖ)/2}. Thus ρ1(θ̂n, θn0) = Op(n^{−(1−ϖ)/2}). Further, by arguments similar to the proof of Theorem 4.2 in Xue, Miao, and Wu (2010), we have that ρ(θ̂n, θn0) = Op(n^{−ϖϱ} + n^{−(1−ϖ)/2}).
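The quadrature step used repeatedly above, (1/m)Σ_j g(sj) = ∫ g(t)ρ(t) dt + Op(1/m), can be illustrated on a uniform grid (so ρ ≡ 1). This is only a numerical sketch with an arbitrary smooth test function g(t) = sin(t), not the weight functions of the model:

```python
import math

def riemann_error(m):
    # left Riemann sum of g(t) = sin(t) on [0, 1] versus the exact integral
    h = 1.0 / m
    approx = sum(math.sin(j * h) for j in range(m)) * h
    exact = 1.0 - math.cos(1.0)  # closed form of the integral of sin on [0, 1]
    return abs(approx - exact)

errors = {m: riemann_error(m) for m in (10, 100, 1000)}
# the error shrinks at rate O(1/m): m * error stays bounded as m grows
assert errors[100] < errors[10] and errors[1000] < errors[100]
assert all(m * err < 1.0 for m, err in errors.items())
```

For this g the leading error term is sin(1)/(2m), so m times the error stabilizes near 0.42, consistent with the Op(1/m) bound invoked in the proof.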

Proof of Theorem 3: To prove Theorem 3, we cannot directly use the Huber-Pollard Z-theorem (see Theorem 3.3.1 in van der Vaart and Wellner 1996), since in the second stage of our proposed estimation method, X̂(t) and X̂′(t) are nonparametric plug-in estimators from the first stage. We extend the original Huber-Pollard Z-theorem to such a two-stage case.

Some definitions are needed to prove Theorem 3. For any fixed η ∈ A, let A0 = {ηh(·) : h in a neighborhood of 0 ∈ R} be a smooth curve in A running through η0 at h = 0, that is, ηh=0(t) = η0(t). Denote ∂ηh(t)/∂h|h=0 = a(t), and denote the space generated by such a(t) as ϒ. For any a ∈ ϒ we define

Ṡ1(θ) = ∂S(θ)/∂β = −2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂Fk{X(t),θ}/∂β), Ṡ2(θ)[a] = ∂S(β,ηh)/∂h|h=0 = −2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂Fk{X(t),θ}/∂η a(t)).

We set

Ṡ11(θ) = ∂²S(θ)/∂β∂β^T = 2Σ_{k=1}^ϑ E(wk(t) ∂Fk{X(t),θ}/∂β ∂Fk{X(t),θ}/∂β^T) − 2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂²Fk{X(t),θ}/∂β∂β^T),
Ṡ12(θ)[a] = Ṡ21^T(θ)[a] = ∂²S(β,ηh)/∂β∂h|h=0 = 2Σ_{k=1}^ϑ E(wk(t) ∂Fk{X(t),θ}/∂β ∂Fk{X(t),θ}/∂η a(t)) − 2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂²Fk{X(t),θ}/∂β∂η a(t)),
Ṡ22(θ)[a1,a2] = ∂²S(β,η_{h1,h2})/∂h1∂h2|_{h1=0,h2=0} = ∂Ṡ2(β,ηh2)[a1]/∂h2|_{h2=0} = 2Σ_{k=1}^ϑ E(wk(t)(∂Fk{X(t),θ}/∂η)² a1(t)a2(t)) − 2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂²Fk{X(t),θ}/∂η² a1(t)a2(t)),

where a1(t), a2(t) ∈ ϒ. We also define S̃1n(θ) = ∂S̃n(θ)/∂β, S̃2n(θ)[a] = ∂S̃n(β,ηh)/∂h|h=0, S1n(θ) = ∂Sn(θ)/∂β, and S2n(θ)[a] = ∂Sn(β,ηh)/∂h|h=0. Further, for a = (a1,···,ad)^T ∈ ϒ^d, where aj ∈ ϒ for j = 1,···,d, we denote Ṡ2(θ)[a] = (Ṡ2(θ)[a1],···,Ṡ2(θ)[ad])^T, Ṡ12(θ)[a] = (Ṡ12(θ)[a1],···,Ṡ12(θ)[ad])^T, Ṡ21(θ)[a] = (Ṡ21(θ)[a1],···,Ṡ21(θ)[ad])^T, Ṡ22(θ)[a,a] = (Ṡ22(θ)[a1,a],···,Ṡ22(θ)[ad,a])^T, S̃2n(θ)[a] = (S̃2n(θ)[a1],···,S̃2n(θ)[ad])^T and S2n(θ)[a] = (S2n(θ)[a1],···,S2n(θ)[ad])^T. Then we can obtain the following results E1–E7.

  • E1.

    From Theorem 1 and Theorem 2, we have that ‖β̂n − β0‖ = op(1) and ‖η̂n − η0‖2 = Op(n^{−ζ}), where Op(n^{−ζ}) is just the rate of convergence in Theorem 2. Moreover, ζ > 1/4.

  • E2.

    Ṡ1(θ0) = 0 and Ṡ2(θ0)[a] = 0 for all a ∈ ϒ.

  • E3.
    Following the arguments in Ma and Kosorok (2005) and Wellner and Zhang (2007), we need to find a least favorable direction a*(t) = {a1*(t),···,ad*(t)}^T ∈ ϒ^d, where aj* ∈ ϒ for j = 1,···,d, such that Ṡ12(θ0)[a] − Ṡ22(θ0)[a*,a] = 0 for all a ∈ ϒ. Moreover, the matrix A = −Ṡ11(θ0) + Ṡ12(θ0)[a*] is nonsingular. Some calculations yield
    Ṡ12(θ0)[a] − Ṡ22(θ0)[a*,a] = 2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η a(t)] − 2Σ_{k=1}^ϑ E[wk(t)(∂Fk{X(t),θ0}/∂η)² a*(t)a(t)] − 2Σ_{k=1}^ϑ E[wk(t)[Xk′(t) − Fk{X(t),θ0}] ∂²Fk{X(t),θ0}/∂β∂η a(t)] + 2Σ_{k=1}^ϑ E[wk(t)[Xk′(t) − Fk{X(t),θ0}] ∂²Fk{X(t),θ0}/∂η² a*(t)a(t)] = 2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η a(t)] − 2Σ_{k=1}^ϑ E[wk(t)(∂Fk{X(t),θ0}/∂η)² a*(t)a(t)], since Xk′(t) = Fk{X(t),θ0} under the true model.
    Therefore, an obvious choice of a*(t) is
    a*(t) = [Σ_{k=1}^ϑ wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η] / [Σ_{k=1}^ϑ wk(t)(∂Fk{X(t),θ0}/∂η)²].
    Moreover, for ϑ ≥ 2,
    A = −2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂β^T] + 2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η a*(t)] (11)
    is nonsingular. Otherwise, A = 0 for ϑ = 1.
  • E4.

    By arguments similar to Condition (i) of the proof of Theorem 4 in Xue, Lam and Li (2004), it follows that the estimator θ̂n satisfies S̃1n(θ̂n) = 0 and S̃2n(θ̂n)[a*] = op(n^{−1/2}).

  • E5.
    For any δn ↓ 0 and C > 0, setting F = {θ : ‖β − β0‖ ≤ δn, ‖η − η0‖2 ≤ Cn^{−ζ}}, we claim
    sup_F √n|(S̃1n − Ṡ1)(θ) − (S̃1n − Ṡ1)(θ0)| = op(1), sup_F √n|(S̃2n − Ṡ2)(θ)[a*] − (S̃2n − Ṡ2)(θ0)[a*]| = op(1).
    In fact, similar to C3 of Theorem 4 in Xue, Lam, and Li (2004), it follows that
    sup_F √n|(S1n − Ṡ1)(θ) − (S1n − Ṡ1)(θ0)| = op(1), sup_F √n|(S2n − Ṡ2)(θ)[a*] − (S2n − Ṡ2)(θ0)[a*]| = op(1).
    Moreover, similar to the proof of Theorem 2, we have
    sup_F √n|(S̃1n − S1n)(θ) − (S̃1n − S1n)(θ0)| = op(1), sup_F √n|(S̃2n − S2n)(θ)[a*] − (S̃2n − S2n)(θ0)[a*]| = op(1).
    Combining these results, it follows that the claim holds.
  • E6.
    By the Taylor expansion, for θ ∈ F, it follows that
    |Ṡ1(θ) − Ṡ1(θ0) − Ṡ11(θ0)(β − β0) − Ṡ12(θ0)[η − η0]| = o(‖β − β0‖) + O(‖η − η0‖2²),
    and
    |Ṡ2(θ)[a*] − Ṡ2(θ0)[a*] − Ṡ21(θ0)[a*](β − β0) − Ṡ22(θ0)[a*, η − η0]| = o(‖β − β0‖) + O(‖η − η0‖2²).
  • E7.
    √n(S̃1n(θ0) − S̃2n(θ0)[a*]) is asymptotically normal. In fact, by the Taylor expansion, we have
    S̃1n(θ0) − S̃2n(θ0)[a*] = −(2/m)Σ_{j=1}^m Σ_{k=1}^ϑ wk(sj)(∂Fk{X̂(sj),θ0}/∂β − ∂Fk{X̂(sj),θ0}/∂η a*(sj)) × [X̂k′(sj) − Fk{X̂(sj),θ0}] = −(2/m)Σ_{j=1}^m Σ_{k=1}^ϑ wk(sj)(∂Fk{X̂(sj),θ0}/∂β − ∂Fk{X̂(sj),θ0}/∂η a*(sj)) × [X̂k′(sj) − Xk′(sj) + Fk{X(sj),θ0} − Fk{X̂(sj),θ0}] = −(2/m)Σ_{j=1}^m Σ_{k=1}^ϑ wk(sj)(∂Fk{X̂(sj),θ0}/∂β − ∂Fk{X̂(sj),θ0}/∂η a*(sj)) × (X̂k′(sj) − Xk′(sj) − Σ_{ι=1}^ϑ ∂Fk{X,θ0}/∂Xι|_{X1=X1(sj),···,Xι=X̃ι,j,···,Xϑ=Xϑ(sj)} [X̂ι(sj) − Xι(sj)])
    with X̃ι,j being some point between Xι(sj) and X̂ι(sj). Further, denoting Ak(t) = ∂Fk{X(t),θ0}/∂β − ∂Fk{X(t),θ0}/∂η a*(t) and applying the numerical error results on the Trapezoidal Rule integration and integration by parts, we have
    S̃1n(θ0) − S̃2n(θ0)[a*] = −2Σ_{k=1}^ϑ ∫_{t0}^T wk(t)ρ(t)Ak(t)[X̂k′(t) − Xk′(t)] dt + 2Σ_{k=1}^ϑ Σ_{ι=1}^ϑ ∫_{t0}^T wk(t)ρ(t)Ak(t) ∂Fk{X,θ0}/∂Xι|_{X=X(t)} [X̂ι(t) − Xι(t)] dt + Op(1/m) = −2Σ_{k=1}^ϑ {wk(t)ρ(t)Ak(t)[X̂k(t) − Xk(t)]}|_{t0}^T + 2Σ_{k=1}^ϑ ∫_{t0}^T (d/dt)[wk(t)ρ(t)Ak(t)][X̂k(t) − Xk(t)] dt + 2Σ_{k=1}^ϑ Σ_{ι=1}^ϑ ∫_{t0}^T wk(t)ρ(t)Ak(t) ∂Fk{X,θ0}/∂Xι|_{X=X(t)} [X̂ι(t) − Xι(t)] dt + op(n^{−1/2}).

Under the assumption of wk(t0) = wk(T) = 0 for k = 1,···, ϑ, we have

S̃1n(θ0) − S̃2n(θ0)[a*] = 2Σ_{k=1}^ϑ ∫_{t0}^T Bk(t)[X̂k(t) − Xk(t)] dt + op(n^{−1/2}), (12)

with Bk(t) = (d/dt)[wk(t)ρ(t)Ak(t)] + Σ_{ι=1}^ϑ wι(t)ρ(t)Aι(t) ∂Fι{X,θ0}/∂Xk|_{X=X(t)}. Similar to the proof of Theorem 2, by Lemma 6 in Wu, Xue and Kumar (2012), it follows that 2√n Σ_{k=1}^ϑ ∫_{t0}^T Bk(t)[X̂k(t) − Xk(t)] dt →d N(0, Σ2) with

Σ2 = 4Σ_{k=1}^ϑ σk² [∫_{t0}^T ∫_{t0}^T Bk(s) N^T(s)(G + λkn Dq)^{−1} G (G + λkn Dq)^{−1} N(t) Bk^T(t) ds dt], (13)

and Σ2 = Op(1). Then from expression (12), we have √n(S̃1n(θ0) − S̃2n(θ0)[a*]) →d N(0, Σ2). Thus the claim E7 holds.
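The algebra behind the least favorable direction a*(t) in E3 can be spot-checked numerically: with a*(t) defined as the ratio in E3, the factor multiplying a(t) in Ṡ12(θ0)[a] − Ṡ22(θ0)[a*, a] vanishes pointwise, so the expectation vanishes for every direction a. A minimal sketch with ϑ = 2, d = 1, and arbitrary hypothetical stand-ins for the weights wk(t) and the partial derivatives ∂Fk/∂β and ∂Fk/∂η:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
# hypothetical smooth stand-ins (two state equations, scalar beta)
w = [1.0 + 0.0 * t, 2.0 + np.sin(t)]   # weight functions w_k(t)
b = [np.cos(t), t ** 2 + 1.0]          # stand-ins for dF_k/dbeta
c = [1.0 + t, np.exp(-t) + 0.5]        # stand-ins for dF_k/deta, bounded away from 0

num = sum(wk * bk * ck for wk, bk, ck in zip(w, b, c))
den = sum(wk * ck ** 2 for wk, ck in zip(w, c))
a_star = num / den  # the least favorable direction of E3

# pointwise residual of the factor multiplying a(t) in S12[a] - S22[a*, a]
residual = num - den * a_star
assert np.max(np.abs(residual)) < 1e-12
```

Because the residual is identically zero in t, any choice of test direction a(t) integrates against it to zero, which is exactly the orthogonality that E3 requires.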

Then, by E1 and E5, we have √n(S̃1n − Ṡ1)(θ̂n) − √n(S̃1n − Ṡ1)(θ0) = op(1). Since S̃1n(θ̂n) = 0 by E4 and Ṡ1(θ0) = 0 by E2, it follows that √nṠ1(θ̂n) + √nS̃1n(θ0) = op(1). Similarly, √nṠ2(θ̂n)[a*] + √nS̃2n(θ0)[a*] = op(1). Combining these equalities and E6 yields

Ṡ11(θ0)(β̂n − β0) + Ṡ12(θ0)[η̂n − η0] + S̃1n(θ0) = op(n^{−1/2}) + o(‖β̂n − β0‖) + O(‖η̂n − η0‖2²), (14)

and

Ṡ21(θ0)[a*](β̂n − β0) + Ṡ22(θ0)[a*, η̂n − η0] + S̃2n(θ0)[a*] = op(n^{−1/2}) + o(‖β̂n − β0‖) + O(‖η̂n − η0‖2²). (15)

E1 implies √n O(‖η̂n − η0‖2²) = op(1), since ζ > 1/4. Thus by E3, subtracting (15) from (14) yields

(Ṡ11(θ0) − Ṡ21(θ0)[a*])(β̂n − β0) + o(‖β̂n − β0‖) = −(S̃1n(θ0) − S̃2n(θ0)[a*]) + op(n^{−1/2}).

That is, [A + o(1)](β̂n − β0) = (S̃1n(θ0) − S̃2n(θ0)[a*]) + op(n^{−1/2}). Combining this with E7, it follows that

√n(β̂n − β0) = [A + o(1)]^{−1}√n(S̃1n(θ0) − S̃2n(θ0)[a*]) + op(1) →d N(0, A^{−1}Σ2(A^{−1})^T)

with A^{−1}Σ2(A^{−1})^T = Op(1).
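The limiting covariance A^{−1}Σ2(A^{−1})^T is the usual sandwich form: if A z = s with s ~ N(0, Σ2), then z has covariance A^{−1}Σ2 A^{−T}. A quick Monte Carlo sketch with small arbitrary matrices A and Σ2 (purely illustrative placeholders, not the quantities of the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])        # stand-in for the nonsingular matrix A
Sigma2 = np.array([[1.0, 0.3], [0.3, 2.0]])   # stand-in for the score covariance

# draw s ~ N(0, Sigma2) and solve A z = s, mimicking [A + o(1)](beta_hat - beta0) = score
s = rng.multivariate_normal(np.zeros(2), Sigma2, size=200_000)
z = np.linalg.solve(A, s.T).T

target = np.linalg.inv(A) @ Sigma2 @ np.linalg.inv(A).T  # sandwich covariance
sample_cov = np.cov(z, rowvar=False)
assert np.allclose(sample_cov, target, atol=0.05)
```

The empirical covariance of the solved system matches the sandwich formula to within Monte Carlo error, which is the finite-sample analogue of the normality statement above.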

References

  1. BANSAL M, DELLA G AND DI BERNARDO D (2006). Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 22:815–822.
  2. BARD Y (1974). Nonlinear Parameter Estimation. New York: Academic Press.
  3. BRUNEL N (2008). Parameter estimation of ODE's via nonparametric estimators. Electronic Journal of Statistics 2:1242–1267.
  4. CAO J, HUANG JZ AND WU H (2012). Penalized nonlinear least squares estimation of time-varying parameters in ordinary differential equations. Journal of Computational and Graphical Statistics 21:42–56.
  5. CHEN J AND WU H (2008a). Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. J. Am. Statist. Assoc. 103:369–384.
  6. CHEN J AND WU H (2008b). Estimation of time-varying parameters in deterministic dynamic models with application to HIV infections. Statistica Sinica 18:987–1006.
  7. CHEN G AND HUANG JZ (2010). Bootstrap consistency for general semiparametric M-estimation. Ann. Statist. 38:2884–2915.
  8. CLAESKENS G, KRIVOBOKOVA T AND OPSOMER J (2009). Asymptotic properties of penalized spline estimators. Biometrika 96:529–544.
  9. CRAVEN P AND WAHBA G (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31:377–403.
  10. D'HAESELEER P, WEN X, FUHRMAN S AND SOMOGYI R (1999). Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing 4:41–52.
  11. DONNET S AND SAMSON A (2007). Estimation of parameters in incomplete data models defined by dynamical systems. Journal of Statistical Planning and Inference 137:2815–2831.
  12. EFRON B (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: SIAM.
  13. GUGUSHVILI S AND KLAASSEN CAJ (2012). √n-consistent parameter estimation for systems of ordinary differential equations: bypassing numerical integration via smoothing. Bernoulli 18:1061–1098.
  14. HAIRER E, NØRSETT S AND WANNER G (1993). Solving Ordinary Differential Equations I: Nonstiff Problems. Berlin: Springer-Verlag.
  15. HE X, XUE H AND SHI NZ (2010). Sieve maximum likelihood estimation for doubly semiparametric zero-inflated Poisson models. Journal of Multivariate Analysis 101:2026–2038.
  16. HUANG Y, LIU D AND WU H (2006). Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics 62:413–423.
  17. JOSHI M, SEIDEL-MORGENSTERN A AND KREMLING A (2006). Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamic systems. Metabolic Engineering 8:447–455.
  18. LI L, BROWN MB, LEE KH AND GUPTA S (2002). Estimation and inference for a spline-enhanced population pharmacokinetic model. Biometrics 58:601–611.
  19. LI Y AND RUPPERT D (2008). On the asymptotics of penalized splines. Biometrika 95:415–436.
  20. LI Z, OSBORNE MR AND PRVAN T (2005). Parameter estimation of ordinary differential equations. IMA Journal of Numerical Analysis 25:264–285.
  21. LIANG H, MIAO H AND WU H (2010). Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model. Ann. Appl. Statist. 4:460–483.
  22. LIANG H AND WU H (2008). Parameter estimation for differential equation models using a framework of measurement error in regression. J. Am. Statist. Assoc. 103:1570–1583.
  23. MA S AND KOSOROK MR (2005). Robust semiparametric M-estimation and the weighted bootstrap. Journal of Multivariate Analysis 96:190–217.
  24. MATTHEIJ R AND MOLENAAR J (2002). Ordinary Differential Equations in Theory and Practice. Philadelphia: SIAM.
  25. PUTTER H, HEISTERKAMP S, LANGE J AND WOLF F (2002). A Bayesian approach to parameter estimation in HIV dynamic models. Statistics in Medicine 21:2199–2214.
  26. QI X AND ZHAO H (2010). Asymptotic efficiency and finite-sample properties of the generalized profiling estimation of parameters in ordinary differential equations. Ann. Statist. 38:435–481.
  27. RAMSAY JO (1996). Principal differential analysis: data reduction by differential operators. J. R. Statist. Soc. B 58:495–508.
  28. RAMSAY JO, HOOKER G, CAMPBELL D AND CAO J (2007). Parameter estimation for differential equations: a generalized smoothing approach (with discussion). J. R. Statist. Soc. B 69:741–796.
  29. RUPPERT D, WAND MP AND CARROLL RJ (2003). Semiparametric Regression. Cambridge: Cambridge University Press.
  30. SCHUMAKER LL (1981). Spline Functions: Basic Theory. New York: Wiley.
  31. SHEN X (1997). On methods of sieves and penalization. Ann. Statist. 25:2555–2591.
  32. SHEN X AND WONG WH (1994). Convergence rate of sieve estimates. Ann. Statist. 22:580–615.
  33. THOMASETH K, ALEXANDRA KW, BERNHARD L ET AL. (1996). Integrated mathematical model to assess β-cell activity during the oral glucose test. Am. J. Physiol. 270:522–531.
  34. VAN DER VAART AW AND WELLNER JA (1996). Weak Convergence and Empirical Processes. New York: Springer-Verlag.
  35. VARAH J (1982). A spline least squares method for numerical parameter estimation in differential equations. SIAM J. Sci. Stat. Comput. 3:28–46.
  36. WELLNER JA AND ZHANG Y (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann. Statist. 35:2106–2142.
  37. WESSELS LF, VAN SOMEREN EP AND REINDERS MJ (2001). A comparison of genetic network models. Pacific Symposium on Biocomputing 6:508–519.
  38. WU H, XUE H AND KUMAR A (2012). Numerical discretization-based estimation methods for ordinary differential equation models via penalized spline smoothing. Biometrics 68(2):344–352.
  39. WU H, KUMAR A, MIAO H, WILTSE JH, MOSMANN TR, LIVINGSTON AM, BELZ GT, PERELSON AS, ZAND MS AND TOPHAM DJ (2011). Modeling of influenza-specific CD8+ T cells during the primary response indicates that the spleen is a major source of effectors. The Journal of Immunology 187:4474–4482.
  40. XUE H, LAM KF AND LI G (2004). Sieve maximum likelihood estimator for semiparametric regression models with current status data. J. Am. Statist. Assoc. 99:346–356.
  41. XUE H, MIAO H AND WU H (2010). Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. Ann. Statist. 38:2351–2387.
