Author manuscript; available in PMC: 2020 Sep 18.
Published in final edited form as: Commun Stat Theory Methods. 2018 Dec 29;48(24):5985–6004. doi: 10.1080/03610926.2018.1523433

Parameter Estimation for Semiparametric Ordinary Differential Equation Models

Hongqi Xue 1, Arun Kumar 2,*, Hulin Wu 3
PMCID: PMC7500512  NIHMSID: NIHMS1514797  PMID: 32952273

Abstract

We propose a new class of two-stage parameter estimation methods for semiparametric ordinary differential equation (ODE) models. In the first stage, the state variables are estimated by a penalized spline approach; in the second stage, the form of the numerical discretization algorithms used by ODE solvers is exploited to formulate estimating equations. The estimated state variables from the first stage are used to obtain additional data points for the second stage. Asymptotic properties of the proposed estimators are established. Simulation studies show that the method performs well, especially for small samples. Practical use of the method is illustrated with an influenza-specific cell-trafficking study.

Keywords: Ordinary differential equation, Penalized spline, Semiparametric coefficient models, Data augmentation estimation, Two-stage estimation

1. Introduction

We consider a semiparametric coefficient ODE model, which can be written as

dX(t)/dt = F{t, X(t), Z(t), β, η(t)}, t ∈ [t0, T], X(t0) = X0, (1)

where t ∈ [t0, T] (0 ≤ t0 < T < ∞) is time, X(t) = {X1(t), · · ·, Xϑ(t)}T is a ϑ-dimensional state vector, F(.) = {F1(.), · · ·, Fϑ(.)}T is a ϑ-dimensional vector of smooth functions with known forms, Z(t) = {Z1(t), · · ·, ZH(t)}T is an H-dimensional covariate vector, β = (β1, · · ·, βd)T is a d-dimensional vector of unknown constant parameters with true value β0, η(t) = {η1(t), · · ·, ηb(t)}T is a b-dimensional vector of unknown time-varying parameters with true curve η0(t), and X(t0) = X0 is the initial value, which may be known or unknown. The function F is assumed to satisfy a Lipschitz condition in X, which ensures existence and uniqueness of the solution to Eq. (1) (see Hairer, Nørsett, and Wanner 1993; Mattheij and Molenaar 2002).

We assume that each Xk(t) (k = 1, · · ·, ϑ) is measured with noise and the measurement model can be written as

Yk,i = Xk(ti) + 𝜖k,i, (2)

at random time points t1, · · ·, tn, where the measurement errors 𝜖k,1, · · ·, 𝜖k,n are assumed to be i.i.d. with mean zero and variance σk².

For ODE models with only constant coefficients, the existing statistical methods include the nonlinear least squares (NLS) method (Bard 1974; Li, Osborne, and Prvan 2005; Xue, Miao, and Wu 2010), the two-stage smoothing estimation method (Varah 1982; Liang and Wu 2008; Brunel 2008; Wu, Xue, and Kumar 2012; Gugushvili and Klaassen 2012), the principal differential analysis (PDA) (Ramsay 1996) and its extension, the generalized profiling approach (Ramsay et al. 2007; Qi and Zhao 2010), and the Bayesian approaches (Putter et al. 2002; Huang, Liu, and Wu 2006; Donnet and Samson 2007). For ODE models with only time-varying coefficients, Chen and Wu (2008a,b) considered the two-stage smoothing estimation method, but their models are linear and additive for time-varying coefficients as follows:

dX(t)/dt = ZT(t)η(t) + g{X(t)},

and

dX(t)/dt = η(t) + βTX(t),

where g(.) is a known function and β is a known constant vector. However, ODE models of form (1), with both constant and time-varying coefficients, have been widely used in practical applications such as physiology (Thomaseth et al. 1996), pharmacokinetics (Li et al. 2002), and HIV studies (Xue, Miao, and Wu 2010; Liang, Miao, and Wu 2010; Cao, Huang, and Wu 2012).

In this article, we propose a novel two-stage estimation method for ODE model (1). In the first stage, we estimate the curve X(t) and its derivative X′(t) by the penalized spline smoothing approach (Ruppert, Wand, and Carroll 2003; Li and Ruppert 2008; Claeskens, Krivobokova, and Opsomer 2009; Wu, Xue, and Kumar 2012). In the second stage, we propose a class of estimation methods built on the one-step discretization structure of the numerical algorithms for solving ODEs. In this second stage, we use more sample points from the smoothed curves of the first stage than the original data points; this data augmentation was first suggested by Varah (1982) for parameter estimation of ODEs and has been applied to gene regulatory network modeling (D’Haeseleer et al. 1999; Wessels, van Someren, and Reinders 2001; Bansal, Della Gatta, and di Bernardo 2006). Like the numerical discretization-based estimation method of Wu, Xue, and Kumar (2012), the proposed method builds on the original two-stage smoothing estimation method for ODEs (Varah 1982; Chen and Wu 2008a,b; Liang and Wu 2008; Brunel 2008). It differs from Wu, Xue, and Kumar (2012), however, because the coefficients of the ODE model are semiparametric. The derivation of the asymptotic properties for this model is challenging because the proposed estimation method plugs the nonparametric estimator obtained in the first stage into the second stage; therefore, we cannot directly use the common Huber-Pollard Z-theorem (see Theorem 3.3.1 in van der Vaart and Wellner 1996) to derive the asymptotic normality. This article extends the Huber-Pollard Z-theorem to the two-stage estimation case.

The rest of the article is organized as follows. We introduce our new estimation methods in Section 2. The asymptotic properties of the proposed estimators are studied in Section 3. Estimation of variance using the bootstrap method is also discussed in Section 3. A real data application is given in Section 4 to illustrate the usefulness of the proposed methods, and simulation results are presented to demonstrate the finite-sample behavior of the proposed methods in Section 5. We conclude with some remarks in Section 6. The proofs of all theoretical results are given in the Appendix.

2. Estimation Methods

In the first stage, we use penalized splines to estimate the state variable X(t) as a smooth function of t. Each state variable Xk(t) (k = 1, · · ·, ϑ) is approximated at time t as Xk(t) ≈ Σ_{j=−ν}^{K} δk,j Nj(t) = NT(t)δk, where δk = (δk,−ν, · · ·, δk,K)T is the unknown coefficient vector to be estimated from the data, and N(t) = {N−ν(t), · · ·, NK(t)}T is the vector of B-spline basis functions of degree ν (order ν + 1) at the knots t0 = τ−ν = τ−ν+1 = · · · = τ−1 = τ0 < τ1 < · · · < τK < τK+1 = τK+2 = · · · = τK+ν+1 = T on the interval [t0, T]. Define the n × (K + ν + 1) spline design matrix N = {N(t1), · · ·, N(tn)}T and Yk = (Yk,1, · · ·, Yk,n)T, and let VN = ∫_{t0}^{T} [N″(t)][N″(t)]T dt. The estimation objective function combines a sum of squared differences with a penalty on the integrated squared second-order derivative of the spline function:

Lk(δk;λk)=(YkNδk)T(YkNδk)+λkδkTVNδk. (3)

The minimizer of (3) takes the form δ̂k = (NTN + λkVN)−1NTYk. Then X̂k(t) = NT(t)δ̂k and X̂′k(t) = [N′(t)]Tδ̂k. Since derivatives of spline functions can be expressed in terms of lower-order spline functions, the expressions for VN and N′(t) can be obtained explicitly. To determine the penalty parameter λk, we use the standard generalized cross-validation (GCV) method (Craven and Wahba 1979).
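As a concrete illustration, the first-stage fit can be sketched as follows. This is a minimal sketch, not the authors' implementation: the B-spline basis is built by a plain Cox-de Boor recursion, the roughness penalty VN is approximated by quadrature over a fine grid rather than computed in closed form, and the smoothing parameter lam is fixed instead of chosen by GCV.

```python
import numpy as np

def bspline_basis(x, knots, k):
    """Evaluate all B-spline basis functions of degree k at points x
    via the Cox-de Boor recursion (simple, unoptimized)."""
    x = np.atleast_1d(x).astype(float)
    # degree-0 indicator functions of the knot intervals
    B = ((knots[:-1] <= x[:, None]) & (x[:, None] < knots[1:])).astype(float)
    B[x == knots[-1], np.searchsorted(knots, knots[-1]) - 1] = 1.0  # close right end
    for d in range(1, k + 1):
        Bn = np.zeros((len(x), B.shape[1] - 1))
        for j in range(B.shape[1] - 1):
            den1 = knots[j + d] - knots[j]
            den2 = knots[j + d + 1] - knots[j + 1]
            if den1 > 0:
                Bn[:, j] += (x - knots[j]) / den1 * B[:, j]
            if den2 > 0:
                Bn[:, j] += (knots[j + d + 1] - x) / den2 * B[:, j + 1]
        B = Bn
    return B

def penalized_spline_fit(t_obs, y, knots, k=3, lam=1e-4, grid_size=400):
    """Stage 1: delta_hat = (N^T N + lam * V_N)^{-1} N^T y, as in Eq. (3),
    with V_N = int N''(t) N''(t)^T dt approximated on a fine grid."""
    N = bspline_basis(t_obs, knots, k)
    g = np.linspace(knots[0], knots[-1], grid_size)
    h = g[1] - g[0]
    D2 = np.diff(bspline_basis(g, knots, k), n=2, axis=0) / h**2  # approx N''
    V = h * D2.T @ D2                      # quadrature for the penalty matrix
    delta = np.linalg.solve(N.T @ N + lam * V, N.T @ y)
    return delta, lambda s: bspline_basis(s, knots, k) @ delta
```

With clamped knots the basis forms a partition of unity, and for a smooth signal the ridge-type solve recovers the curve closely.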

In the second stage, by one-step discretization methods for ODEs (Hairer, Norsett, and Wanner 1993; Mattheij and Molenaar 2002), we have

[X(sj+1) − X(sj)]/(sj+1 − sj) = Φ{sj, X(sj), Z(sj), X(sj+1), Z(sj+1), β, η(sj)} + O(hp), (4)

for j = 1, · · ·, m − 1, where t0 = s1 < s2 < · · · < sm = T are sample points on the interval [t0, T], which may be different from the observation time points t1, · · ·, tn, with m ≥ n and h = maxj(sj+1 − sj). Here m is the sample size of the augmented data set and h = O(m−1). The form of Φ{sj, X(sj), Z(sj), X(sj+1), Z(sj+1), β, η(sj)} and the order p are determined by the discretization method. Similar to Wu, Xue, and Kumar (2012), we consider three different discretization methods: Euler’s method, the trapezoidal rule, and the Runge-Kutta method. For each of these methods, the form of the function Φ and the order p are given as follows. For Euler’s method, we have

Φ{sj,X(sj),Z(sj),X(sj+1),Z(sj+1),β,η(sj)}=F{sj,X(sj),Z(sj),β,η(sj)}

with p = 1; the trapezoidal rule gives the form

Φ{sj,X(sj),Z(sj),X(sj+1),Z(sj+1),β,η(sj)}=12[F{sj,X(sj),Z(sj),β,η(sj)}+F{sj+1,X(sj+1),Z(sj+1),β,η(sj+1)}]

with p = 2; for the fourth-order Runge-Kutta method, we have

Φ{sj,X(sj),Z(sj),X(sj+1),Z(sj+1),β,η(sj)}=k16+k23+k33+k46

with p = 4, where

  • k1=F{sj,X(sj),Z(sj),β,η(sj)},

  • k2=F{sj+hj/2,X(sj)+hjk1/2,Z(sj+hj/2),β,η(sj+hj/2)},

  • k3=F{sj+hj/2, X(sj)+hjk2/2, Z(sj+hj/2), β, η(sj+hj/2)}, and

  • k4=F{sj+hj,X(sj)+hjk3,Z(sj+hj),β,η(sj+hj)}.

Further, we approximate each component of η(t) by regression splines with the same type of B-spline basis functions. Let t0 = u0 < u1 < · · · < uq = T be a partition of the interval [t0, T], where q = O(nϖ) (0 < ϖ < 0.5) is a positive integer such that max_{1≤j≤q} |uj − uj−1| = O(n−ϖ). Then we have r = q + l normalized B-spline basis functions of order l + 1 that form a basis for the linear spline space. We denote these basis functions as a vector π(t) = {B1(t), · · ·, Br(t)}T. The parameter η(t) can be approximated by πT(t)α, where α = (α1, · · ·, αr)T ∈ Rr is the spline coefficient vector, with α0 corresponding to η0(t).

Replacing the state variable, X(t), with its estimate, X^(t), the expression (4) can be re-written as

[X̂(sj+1) − X̂(sj)]/(sj+1 − sj) = Φ{sj, X̂(sj), Z(sj), X̂(sj+1), Z(sj+1), β, πT(sj)α} + Γj, (5)

where Γj is the sum of the discretization error, the spline approximation error, and the estimation error of X̂(.) from the first stage. Obviously, the Γj are not i.i.d., but dependent. For a prescribed non-negative weight function wk(t) on [t0, T] (k = 1, · · ·, ϑ), we propose the estimator (β̂n, α̂n) of (β, α) obtained by minimizing the following weighted least squares criterion:

Σ_{j=1}^{m−1} Σ_{k=1}^{ϑ} wk(sj)[ [X̂k(sj+1) − X̂k(sj)]/(sj+1 − sj) − Φk{sj, X̂(sj), Z(sj), X̂(sj+1), Z(sj+1), β, πT(sj)α} ]². (6)

Then η̂n(t) = πT(t)α̂n. The assumption required on wk(.) for the asymptotic results is discussed in Section 3. In numerical studies, we found that the selection of wk(.) has little effect on the performance of the proposed estimators.
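To make the second stage concrete, the following sketch minimizes criterion (6) for a one-dimensional toy model x′(t) = η(t) − βx(t) with the trapezoidal Φ. It is illustrative only: the closed-form true trajectory stands in for the stage-1 smoother X̂(t), a small trigonometric basis stands in for the B-spline basis π(t), and because this toy model is linear in (β, α) the criterion is minimized by weighted least squares rather than a general optimizer.

```python
import numpy as np

# Toy model: x'(t) = eta(t) - beta * x(t) with beta = 1, eta(t) = 1 + 0.5 sin t.
# Closed-form solution for x(0) = 5 (stands in for the stage-1 estimate X_hat):
xhat = lambda t: 1 + 0.25 * np.sin(t) - 0.25 * np.cos(t) + 4.25 * np.exp(-t)

s = np.linspace(0.0, 6.0, 121)                 # augmented sample points s_1..s_m
w = np.sin(np.pi * s / 6.0)                    # weight vanishing at both ends (D1)
basis = lambda t: np.column_stack((np.ones_like(t), np.sin(t), np.cos(t)))

x = xhat(s)
d = np.diff(x) / np.diff(s)                    # difference quotients in (6)
# Trapezoidal Phi: 0.5*[eta(s_j) - beta*x_j + eta(s_{j+1}) - beta*x_{j+1}],
# so the residual is linear in (alpha, beta); build a weighted design matrix.
P = 0.5 * (basis(s[:-1]) + basis(s[1:]))       # columns multiplying alpha
q = -0.5 * (x[:-1] + x[1:])                    # column multiplying beta
wgt = np.sqrt(0.5 * (w[:-1] + w[1:]))          # per-interval weights
A = wgt[:, None] * np.column_stack((P, q))
coef, *_ = np.linalg.lstsq(A, wgt * d, rcond=None)
alpha_hat, beta_hat = coef[:3], coef[3]
```

Because the trajectory is exact, the only error left is the O(h²) discretization error, so the estimates land close to the truth (α, β) = ((1, 0.5, 0), 1).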

In general, a higher-order discretization method is expected to give better estimation accuracy since its discretization error is smaller, but its computational cost is higher. The trade-off between estimation accuracy and computational cost needs to be considered in practical applications. For convenience, we use the abbreviations EDB, TDB, and RDB for the Euler discretization-based estimator, the trapezoidal discretization-based estimator, and the Runge-Kutta discretization-based estimator, respectively. We study the asymptotic properties of the proposed estimators in Section 3 and evaluate their finite-sample performance in Sections 4 and 5.

3. Asymptotic Properties

In this section, we establish the asymptotic properties of the proposed two-stage estimator for semiparametric ODE models. This is challenging since we need to combine the asymptotic results for the two-stage smoothing-based estimation method for ODE models (Varah 1982; Brunel 2008; Chen and Wu 2008a,b; Liang and Wu 2008; Wu, Xue, and Kumar 2012) with the spline approximation error for the time-varying coefficients, especially because the resulting error terms Γj in (5) are not i.i.d., but dependent.

Let µ be a nonnegative integer and γ ∈ (0, 1] such that ϱ = µ + γ > 0.5. Let A be the collection of functions η on [t0, T] whose µ-th derivative η(µ) exists and satisfies the Lipschitz condition of order γ: |η(µ)(s) − η(µ)(t)| ≤ L|s − t|^γ for s, t ∈ [t0, T], with a generic positive constant L. Denote the parameter vector θ = {βT, ηT(t)}T and define ‖η(t)‖2 = [∫_{t0}^{T} η²(t)ρ(t)dt]^{1/2} for any function η(t) whenever the integral exists, where ρ(t) is defined in Assumption A5. For any θ1 and θ2, we define the distance ρ(θ1, θ2) = ‖β1 − β2‖ + Σ_{j=1}^{b} ‖ηj,1 − ηj,2‖2. For 1 ≤ k ≤ ϑ, Assumptions A-D needed for the theorems are listed below:

Assumption A:

  • (A1)

    For Δk = max_{0≤j≤K}(τk,j+1 − τk,j), there exists a constant M > 0 such that Δk/min_{0≤j≤K}(τk,j+1 − τk,j) ≤ M and max_{0≤j≤K} |(τk,j+1 − τk,j)/(τk,j − τk,j−1) − 1| = o(1).

  • (A2)

    K = cn^ψ with 1/(2ν + 3) ≤ ψ < 1, and λk = O(n^π) with π ≤ ν/(2ν + 3).

  • (A3)

    Xk(t) ∈ Cν+1[t0, T ] with ν ≥ 2.

  • (A4)

    K* ≡ (K + ν + 1)(λk c̃1)^{1/4} n^{−1/4} < 1 for some constant c̃1.

  • (A5)

    t1, · · ·, tn are i.i.d. with cumulative distribution function Q(t) whose density ρ(t) is positive and continuous. Moreover, ρ(t) is bounded away from 0 and +∞ and has a bounded and continuous first-order derivative.

Assumption B:

  • (B1)

    Fk{t, X(t), Z(t), β, η(t)} is a continuous function for β ∈ B and ηj(t) ∈ A (1 ≤ j ≤ b), where B is a compact subset of Rd with finite diameter Rβ.

  • (B2)

    All partial Fréchet derivatives of Fk{t, X, Z(t), β, η} up to order p with respect to t, X, Z and η exist and are continuous.

  • (B3)

    The numerical method for solving ODEs is of order p.

  • (B4)

    The true parameter β0 is an interior point of B.

Assumption C:

  • (C1)

    The first and second partial Fréchet derivatives of Fk{t, X(t), Z(t), β, η}, namely ∂Fk/∂β, ∂Fk/∂η, ∂²Fk/∂β∂βT, ∂²Fk/∂β∂ηT and ∂²Fk/∂η∂ηT, exist and are continuous and uniformly bounded for all t, β, X(t) and η.

  • (C2)

    The first partial Fréchet derivatives ∂Fk{t, X(t), Z(t), β, η(t)}/∂t and ∂Fk{t, X, Z(t), β, η(t)}/∂X are continuous and uniformly bounded for all t, β, X(t), Z(t) and η(t).

  • (C3)

    The measurement errors 𝜖k,1, · · ·, 𝜖k,n are i.i.d. with mean 0 and variance σk² (0 < σk² < ∞). Moreover, their density function is bounded away from 0 and +∞ and has a bounded and continuous first-order derivative.

  • (C4)

    l + 1 ≥ ϱ.

  • (C5)

    For any β ∈ B and η ∈ A, Σ_{k=1}^{ϑ} E(wk(t)[Fk{t, X(t), Z(t), θ} − Fk{t, X(t), Z(t), θ0}]²) = 0 if and only if β = β0 and P{t : η(t) = η0(t)} = 1.

Assumption D:

  • (D1)

    The weight function wk(.) is bounded and nonnegative on [t0, T ] with wk(t0) = wk(T) = 0. Moreover, wk(t) has a bounded and continuous first-order derivative.

  • (D2)

    ϖ satisfies the restrictions 0.25/ϱ < ϖ < 0.5 and ϖ(2 + ϱ) > 0.5.

Note that Assumption A is a standard condition for deriving local asymptotic properties of the penalized spline estimator of a nonparametric regression function (Claeskens, Krivobokova, and Opsomer 2009; Wu, Xue, and Kumar 2012). Assumption B is required for the precision of the discretization algorithm (Wu, Xue, and Kumar 2012), where Assumption B3, from Mattheij and Molenaar (2002, p. 55–56), defines the precision of the numerical algorithm. For example, the Euler backward method, the trapezoidal rule, and the 4-stage Runge-Kutta method are of order 1, 2 and 4, respectively. Assumptions C1-C3 are needed for the proof of consistency. Assumption C4 is a standard constraint in spline theory. Assumption C5 is required for identifiability. Assumption D1, the boundary constraint on the weight functions, is required for technical reasons; it was also used by Wu, Xue, and Kumar (2012) to ensure that the estimators of the parametric components achieve the common root-n convergence rate. Assumption D2 is needed for the proof of asymptotic normality.

Theorem 1: Under Assumptions A, B and C, we have ρ(θ̂n, θ0) → 0, almost surely, under Pθ0.

Theorem 2: Under Assumptions A, B, C and D, we have ρ(θ̂n, θ0) = Op(n^{−ϖϱ} + n^{−(1−ϖ)/2}). Moreover, if ϖ = 1/(1 + 2ϱ), then ‖η̂n(t) − η0(t)‖2 = Op(n^{−ϱ/(1+2ϱ)}), which is the same as the optimal rate for standard nonparametric function estimation.

Theorem 3: Under Assumptions A, B, C and D, if ϑ ≥ 2, then we have n^{1/2}(β̂n − β0) →d N(0, A^{−1}Σ2(A^{−1})T), with the matrices A and Σ2 given in (11) and (13) in the Appendix, respectively.

Remark 1: For ϑ = 1, Theorem 3 does not hold because of an identifiability problem, similar to Theorem 4.2 in Xue, Miao, and Wu (2010).

The asymptotic variance-covariance matrix needs to be estimated in order to perform statistical inference for unknown parameters β. There are some standard methods that can be used. We recommend the nonparametric bootstrap method (Efron 1982), which has been used for statistical inference for ODE models with constant coefficients (Joshi, Seidel-Morgenstern, and Kremling 2006).

Let {(ti*, Yi*), i = 1, · · ·, n} be drawn independently with replacement from the original sample {(ti, Yi), i = 1, · · ·, n}. In the first stage, based on the bootstrap sample {(ti*, Yi*), i = 1, · · ·, n}, we obtain the estimators X̂*(t) and X̂*′(t). Plugging these into the second stage, we get the bootstrap M-estimator (β̂n*, α̂n*). From Theorem 1 in Cheng and Huang (2010) and Theorem 3 in this article, given {ti, Y(ti)}, √n(β̂n* − β̂n) and √n(β̂n − β0) have the same limiting distribution, which can be used for inference on β̂n and η̂n(t).
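A generic sketch of the recommended pairs bootstrap follows. The `estimator` callback stands in for the full two-stage procedure; here it is a simple least-squares slope, purely for illustration.

```python
import numpy as np

def bootstrap_ci(t, y, estimator, n_boot=500, level=0.95, seed=0):
    """Nonparametric (pairs) bootstrap percentile interval for a scalar
    functional of the data, as recommended in Section 3."""
    rng = np.random.default_rng(seed)
    n = len(t)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample pairs with replacement
        stats[b] = estimator(t[idx], y[idx])   # rerun both stages on the resample
    lo, hi = np.percentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return lo, hi

# Illustration with a stand-in estimator (least-squares slope, true slope 2):
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * t + rng.normal(0.0, 0.1, size=t.shape)
lo, hi = bootstrap_ci(t, y, lambda a, b: np.polyfit(a, b, 1)[0])
```

In the paper's setting the callback would rerun the penalized spline smoothing and the discretization-based fit on each resample, which is why the bootstrap is costly but simple to implement.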

4. Real data analysis

We used the proposed two-stage method to estimate the parameters of an ODE model that describes the kinetics of influenza-specific CD8+ T cells during the primary immune response. The goal of the study is to estimate biological parameters that determine the distribution of CD8+ T cells during influenza infection in mice. Since these parameters are not directly estimable from experiments, a mathematical model was developed to mimic the kinetics of CD8+ T cells during influenza infection as follows (Wu et al. 2011):

dTEm/dt = (ρm(t) − γms)TEm, dTEs/dt = (ρs(t) − γsl)TEs + γmsTEm, dTEl/dt = γslTEs − δlTEl. (7)

The state variable TEm denotes the effector CD8+ T cell count in the mediastinal lymph node (MLN), TEs the count in the spleen, and TEl the count in the lung. The model has two time-varying parameters, ρm(t) and ρs(t), and three constant parameters, γms, γsl, and δl. Parameter ρm(t) denotes the proliferation rate of effector CD8+ T cells in the MLN and ρs(t) the proliferation rate in the spleen. The constant parameter γms denotes the migration rate of effector CD8+ T cells from the MLN to the spleen, γsl the migration rate from the spleen to the lung, and δl the death rate of effector CD8+ T cells in the lung. Data on TEm, TEs and TEl were collected through experiments on female C57BL/6 mice (see Wu et al. 2011 for more details). Mice were anesthetized and intranasally inoculated with H3N2 A/Hong Kong/X31 influenza-A virus. CD8+ T cell data were collected from mouse tissues on days 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, and 28. At each time point, data were collected from 6 mice, except day 12, for which data were collected from 12 mice. For estimating the parameters, we only used the data from day 5 to day 14, when the adaptive immune response is active. One observation on day 8 was identified as an outlier and hence removed from the data before model fitting.
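Model (7) and a fixed-step solver can be sketched as follows. The solver and the check below are ours, not the authors' code; in the usage check the proliferation rates are set to hypothetical constants equal to the Table 1 migration rates, so that the MLN compartment stays exactly flat, which gives an easy correctness check.

```python
import numpy as np

def cd8_rhs(t, x, rho_m, rho_s, g_ms, g_sl, d_l):
    """Right-hand side of model (7): effector CD8+ T cell counts in the
    MLN (T_m), spleen (T_s), and lung (T_l)."""
    T_m, T_s, T_l = x
    return np.array([
        (rho_m(t) - g_ms) * T_m,                # proliferation minus migration out
        (rho_s(t) - g_sl) * T_s + g_ms * T_m,   # plus migration in from the MLN
        g_sl * T_s - d_l * T_l,                 # migration in from spleen, death
    ])

def rk4_solve(rhs, x0, t_grid, **kw):
    """Classical fourth-order Runge-Kutta on a fixed time grid."""
    x = np.empty((len(t_grid), len(x0)))
    x[0] = x0
    for i in range(len(t_grid) - 1):
        t, h = t_grid[i], t_grid[i + 1] - t_grid[i]
        k1 = rhs(t, x[i], **kw)
        k2 = rhs(t + h / 2, x[i] + h * k1 / 2, **kw)
        k3 = rhs(t + h / 2, x[i] + h * k2 / 2, **kw)
        k4 = rhs(t + h, x[i] + h * k3, **kw)
        x[i + 1] = x[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x
```

The initial counts used in the check are the day-5 values quoted in the simulation section.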

Wu, Xue, and Kumar (2012) showed that the TDB method is the best among the two-stage methods when both accuracy and computational cost are taken into account. Hence we used the TDB algorithm to estimate the parameters of model (7). The time-varying parameter ρm(t) was approximated by a linear combination of four B-splines of order 4: ρm(t) = Σ_{j=0}^{3} αjBj(t), 5 ≤ t ≤ 14. These B-splines were constructed from the knot sequence (5, 5, 5, 8, 11, 14, 14, 14). The two inner knots, at 8 and 11, allowed us to capture the bi-modal shape of the time-varying parameters. The time-varying parameter ρs(t) was approximated by a linear combination of four B-splines of order 4 in the same way. We used the weight function w(ti) = sin((ti − 5)π/9) in the objective function. We adopted the data augmentation size m = 11 ≈ 1.25n, with n = 9, in the second stage. To find confidence intervals for the parameters, we used the bootstrap method with 500 replications, giving 500 bootstrap estimates of each parameter. A 95% confidence interval for a parameter was obtained from the 2.5th and 97.5th percentiles of the bootstrap parameter estimates. Point estimates along with 95% confidence intervals for the parameters of model (7) are given in Table 1. Estimates of the time-varying parameters ρm and ρs, along with their 95% confidence bands, are shown in the left and right panels of Figure 1, respectively. From these estimation results, we see that our estimates of the constant parameters (γms, γsl, δl) are on a similar scale to the original estimates in Wu et al. (2011). Our estimates of the time-varying proliferation rates show an interesting bi-modal shape, which may indicate an early peak during the adaptive immune response and a later peak during the recovery period after the influenza virus is cleared around day 10. These biological implications need to be confirmed by further experiments.

Table 1:

Parameter point estimates and 95% bootstrap confidence intervals in the real data example.

Parameter Estimate Confidence interval
γms  1.57 (0.94,3.03)
γsl  0.92 (0.55,1.21)
δl  4.60 (2.26,6.62)

Figure 1:


Estimates of the proliferation rates of CD8+ T cells in MLN (left) and spleen (right) along with 95% confidence bands.

5. Simulation Studies

We conducted simulation studies to evaluate the performance of the proposed two-stage estimation method. As in Section 4, we only considered the TDB method. We used model (7) to generate the simulation data. The constant parameters were γms = 1.25, γsl = 1.25 and δl = 9. The time-varying parameters ρm(t) and ρs(t) used to generate the simulation data are shown in Figure 2. These simulation parameters were selected so that the solution to (7) is well-behaved (shown in Figure 3).

Figure 2:


ρm(t) (left) and ρs(t) (right) used to generate simulation data.

Figure 3:


Solutions of TEm (left), TEs(middle) and TEl (right) for the simulation parameters.

We generated two simulation data sets, one with 20 observations and the other with 40 observations for each state variable, at equally spaced time points between 5 and 14. A fourth-order Runge-Kutta method was used to solve (7) numerically with initial conditions TEm(5) = 5000, TEs(5) = 45000, and TEl(5) = 1300. Noise generated from independent normal distributions with zero mean and standard deviations σEm, σEs, and σEl was added to the simulation data for TEm, TEs, and TEl, respectively. The standard deviation of the noise was set to 1% of the standard deviation of the noiseless simulation data. We tried two different data augmentation sizes, m = n and m = 2n. As in the real data analysis, we used the weight function w(ti) = sin((ti − 5)π/9) in the objective function.
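The noise step can be sketched as follows; a closed-form exponential curve stands in for the numerical solution of (7), since only the 1%-of-standard-deviation rule is being illustrated.

```python
import numpy as np

t = np.linspace(5.0, 14.0, 20)              # 20 equally spaced observation times
x_true = 5000.0 * np.exp(0.3 * (t - 5.0))   # stand-in noiseless trajectory
sigma = 0.01 * np.std(x_true)               # noise sd = 1% of the signal's sd
rng = np.random.default_rng(0)
y = x_true + rng.normal(0.0, sigma, size=t.shape)
```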

We estimated ρm(t), γms and γsl, and assumed that the remaining parameters were known. This reduced the computational burden on the optimization routine, avoiding local convergence problems and hence giving a fair comparison of the performance of different data augmentation sizes. The time-varying parameter ρm(t) was approximated by a linear combination of four B-spline basis functions of order 4. The R function “genoud”, a global optimization routine, was used to minimize (6) to obtain the parameter estimates; it searched for each parameter in the range [0, 10], starting the search at 5. For each simulation case, 500 replicate runs were performed. The following average relative error (ARE) of the constant parameter estimates was used to compare the different data augmentation sizes,

ARE(θ) = (1/500) Σ_{i=1}^{500} |(θ̂i − θ0)/θ0|, (8)

where θ0 is the true value of a parameter θ and θ̂i is its estimate from the ith simulation run. For the time-varying parameter ρm, we calculated its average mean squared error (AMSE) as follows,

AMSE(ρm) = (1/500) Σ_{i=1}^{500} [ (1/1000) Σ_{j=1}^{1000} (ρ̂m,i(tj) − ρm,0(tj))² ], (9)

where ρ̂m,i(tj) is the estimate of ρm(tj) in the ith simulation run and ρm,0(tj) is the true value of ρm at the time point tj. The time points tj in equation (9) are equally spaced between 5 and 14. In Table 2, we show the ARE for the constant parameters γms and γsl and the AMSE for the time-varying parameter ρm at the two data augmentation sizes, m = n and m = 2n. Both the ARE and the AMSE decrease as the data augmentation size increases. This simulation study clearly demonstrates the merit of data augmentation in the second stage for improving the overall fit of the model to the data.
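The two performance measures can be computed directly from the replicate estimates; this sketch assumes the fitted curves for ρm have been evaluated on a common grid.

```python
import numpy as np

def are(estimates, theta0):
    """Average relative error of a constant parameter, Eq. (8)."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(np.abs((estimates - theta0) / theta0))

def amse(curves, true_curve):
    """Average mean squared error of a time-varying parameter, Eq. (9);
    curves is an (n_runs, n_grid) array of fitted curves on a common grid,
    so the mean over both axes equals the average of per-run MSEs."""
    curves = np.asarray(curves, dtype=float)
    return np.mean((curves - np.asarray(true_curve)) ** 2)
```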

Table 2:

ARE for γms and γsl and AMSE for ρm

n   Parameter  m/n  ARE or AMSE
20  γms        1    6.08
20  γms        2    1.40
20  γsl        1    0.17
20  γsl        2    0.15
20  ρm         1    0.0044
20  ρm         2    0.0005
40  γms        1    0.91
40  γms        2    0.78
40  γsl        1    0.13
40  γsl        2    0.11
40  ρm         1    0.0002
40  ρm         2    0.0001

6. Conclusion and Discussion

In this article, we have proposed a new class of estimation methods for both constant and time-varying coefficients of semiparametric ODE models; the methods exploit the form of the numerical discretization algorithms used by ODE solvers and use more sample points than the original data points. This new class of methods is shown to have benefits in computational efficiency and estimation accuracy over the original two-stage smoothing estimation methods for ODE models. Other estimation approaches for semiparametric ODE models, such as the NLS method and the generalized profiling approach (Ramsay et al. 2007), may also be considered, but the proposed class provides an alternative with low computational cost. In this article, we have not explored how to select the data augmentation size in the second stage. According to our simulation studies, the larger the data augmentation size, the better the estimation; however, a larger data augmentation size incurs a higher computational cost. The trade-off between the data augmentation size and the computational cost remains an open problem.

Acknowledgments

The authors thank Dr. Hua Liang and Dr. Yun Fang for helpful discussions.

This research was partially supported by the NIH grants HHSN272201000055C, AI087135, and two University of Rochester CTSI pilot awards (UL1RR024160) from the National Center For Research Resources. This work was done when authors were affiliated with University of Rochester.

Appendix: Technical Proofs

For simplicity of presentation and notation, we mainly focus on the case without the input variables Z and t in (1), and we only consider the two-stage smoothing estimation method, i.e., we replace (6) with the following objective function:

Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − Fk{X̂(sj), β, πT(sj)α}]².

All of the proofs can be extended to the numerical discretization-based methods built on the objective function (6) by arguments similar to those in the proofs of Lemma 5, Proposition 1 and Proposition 2 in Wu, Xue, and Kumar (2012). We also only discuss the univariate case for η(t), which can be extended to the multivariate case by an approach similar to He, Xue, and Shi (2010). Denote

S̃n(θ) = (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − Fk{X̂(sj), θ}]²,
Sn(θ) = (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ}]²,
S(θ) = Σ_{k=1}^{ϑ} E(wk(t)[X′k(t) − Fk{X(t), θ}]²).

Proof of Theorem 1. Denote the set An = {η(t) = Σ_{i=1}^{r} Bi(t)αi : max_{1≤i≤r} |αi| ≤ ln}, where ln ≍ n^{(2l′−1)/[2l′(2l′+1)]} with a constant l′ arbitrarily close to l (see Shen 1997, p. 2560), and Θn = {θ : β ∈ B, η ∈ An} = B × An. For any θ ∈ Θ ≐ {θ : β ∈ B, η ∈ A} = B × A, under Assumption C4, by Corollary 6.21 in Schumaker (1981), there exists θn ∈ Θn such that ρ(θ, θn) = O(n^{−ϖϱ}).

First, we claim that supθ |S̃n(θ) − Sn(θ)| → 0, a.s., under Pθ0. In fact, by the mean value theorem, we have

S̃n(θ) = (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − Fk{X̂(sj), θ}]²
= (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ} + X̂′k(sj) − X′k(sj) + Fk{X(sj), θ} − Fk{X̂(sj), θ}]²
= (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ} + X̂′k(sj) − X′k(sj) − Σ_{ι=1}^{ϑ} (∂Fk{X, θ}/∂Xι)|_{X1=X1(sj), ···, Xι=X̃ι,j, ···, Xϑ=Xϑ(sj)} [X̂ι(sj) − Xι(sj)]]²
= (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X′k(sj) − Fk{X(sj), θ}]² + (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² + (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj)(∂Fk{X, θ}/∂Xι)²[X̂ι(sj) − Xι(sj)]² + Rn(θ), (10)

where X̃ι,j is a point between Xι(sj) and X̂ι(sj), and Rn(θ) collects the cross-product terms. Under Assumptions A and C1, by the strong law of large numbers and Lemma 2 in Wu, Xue, and Kumar (2012), we have

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² − Σ_{k=1}^{ϑ} E(wk(t)[X̂′k(t) − X′k(t)]²) → 0, a.s.,

and

supθ (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj)(∂Fk{X, θ}/∂Xι)²|_{X1=X1(sj), ···, Xι=X̃ι,j, ···, Xϑ=Xϑ(sj)} [X̂ι(sj) − Xι(sj)]² − Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} E[wk(t) supθ (∂Fk{X, θ}/∂Xι)²|_{X=X(t)} [X̂ι(t) − Xι(t)]²] → 0, a.s.

By the Cauchy-Schwarz inequality, supθ Rn(θ) is bounded by a term of lower order than the other three terms in (10) and can be ignored. Thus the claim holds.

Next, let Anδ be the set {η ∈ An : ‖η − ηn0‖2 ≤ δ} and N2(𝜖, L∞, Anδ) be its bracketing number with respect to the L∞ norm (see Definition 2.1.6, van der Vaart and Wellner 1996), where ηn0 is the projection of η0 onto the sieve An. By the calculation of Shen and Wong (1994, p. 597), for any 𝜖 ≤ δ, we have N2(𝜖, L∞, Anδ) ≤ C(δ/𝜖)^r, where r = q + l is the number of B-spline basis functions. Let Fn be the set {Sn(θ) − S(θ) : ‖β − β0‖ ≤ δ, η ∈ An, ‖η − ηn0‖2 ≤ δ}. For any θ1, θ2 ∈ Θn, we can easily obtain |Sn(θ1) − Sn(θ2)| ≤ C(‖β1 − β2‖ + ‖η1 − η2‖∞) using a Taylor expansion. Hence

N2(𝜖, L∞, Fn) ≤ N1(𝜖/2, L2, B) × N2(𝜖/2, L∞, Anδ) ≤ C(3Rβ/𝜖)^d (δ/𝜖)^r ≤ C(1/𝜖)^{r+d}.

Then by arguments similar to (A.2) in Xue, Lam, and Li (2004), we obtain sup_{Fn} |Sn(θ) − S(θ)| → 0, a.s., under Pθ0.

Third, under Assumption C5, it is easy to see that S(θ) attains its unique minimum at θ = θ0. Under Assumption B4, the first-order derivative ∂S(θ)/∂θ of S(θ) at θ0 equals zero and the second-order derivative ∂²S(θ)/∂θ∂θT of S(θ) at θ0 is positive definite. By Assumptions A5, C1 and C3, the second-order derivative of S(θ) in a small neighborhood of θ0 is bounded away from 0 and ∞. Then a second-order Taylor expansion of S(θ) gives a constant 0 < C < ∞ such that S(θ̂n) − S(θ0) ≥ Cρ²(θ̂n, θ0). Moreover,

0 ≤ S(θ̂n) − S(θ0) = S(θ̂n) − S̃n(θ̂n) + S̃n(θ̂n) − S(θ0) ≤ S(θ̂n) − S̃n(θ̂n) + S̃n(θ0) − S(θ0) ≤ 2 supθ |S̃n(θ) − S(θ)| ≤ 2 supθ |S̃n(θ) − Sn(θ)| + 2 supθ |Sn(θ) − S(θ)| → 0, a.s.

Thus, ρ(θ̂n, θ0) → 0 a.s., under Pθ0.

Proof of Theorem 2. We apply Theorem 3.4.1 in van der Vaart and Wellner (1996) to obtain the rate of convergence.

First, let θn0 be the projection of θ0 onto Θn as in the proof of Theorem 1, and define ρ1(θ, θn0), a map from Θn to [0, ∞), by ρ1²(θ, θn0) = S(θ) − S(θn0). Choose δn = ρ(θ0, θn0). For δn < δ < ∞, denote Ω = {θ : θ ∈ Θn, δ/2 < ρ(θ, θn0) ≤ δ}. From the definition of ρ1, we have supΩ [S(θn0) − S(θ)] ≤ −δ²/4. Let Ξn be the set {Sn(θ) − S(θn0) : θ ∈ Θn} and J̃(δ, L2(P), Ξn) be the L2(P)-norm bracketing integral of Ξn. Similar to the proof of Theorem 4.2 of Xue, Miao, and Wu (2010), we have J̃(δ, L2(P), Ξn) ≤ C r^{1/2} δ. Let

ϕn(δ) = J̃(δ, L2(P), Ξn){1 + J̃(δ, L2(P), Ξn)/(δ²√n)} = r^{1/2}δ + r/√n.

Obviously, ϕn(δ)/δ^{1+τ} is a decreasing function of δ for 0 < τ < 1. Then by Lemma 3.4.2 in van der Vaart and Wellner (1996), we have E0[supΩ (Sn − S)(θ − θn0)] ≲ ϕn(δ)/√n.

Next, we claim that supΩ [S̃n(θ) − Sn(θ)] = op(n^{−1/2}). From (10), it suffices to prove (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² = op(n^{−1/2}) and supΩ (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj)(∂Fk{X, θ}/∂Xι)²[X̂ι(sj) − Xι(sj)]² = op(n^{−1/2}). In fact, for fixed X̂(t) and θ, from the numerical error results for trapezoidal-rule integration, we have

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι][X̂ι(sj) − Xι(sj)] = Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} ∫_{t0}^{T} wk(t)ρ(t) supΩ [∂Fk{X, θ}/∂Xι][X̂ι(t) − Xι(t)]dt + Op(1/m).

Since m ≥ n, Op(1/m) = op(n^{−1/2}). Denote g(t) = wk(t)ρ(t) supΩ [∂Fk{X, θ}/∂Xι]. Under Assumption A, by Lemma 6 in Wu, Xue and Kumar (2012), ∫_{t0}^{T} g(t)[X̂ι(t) − Xι(t)]dt is asymptotically normal with mean

μ1 = ∫_{t0}^{T} g(t)E[X̂ι(t) − Xι(t)]dt = O(K^{−(ν+1)}) = o(n^{−1/2}),

and variance as

Σ1 = ∫_{t0}^{T} ∫_{t0}^{T} g(s)Cov[X̂ι(s) − Xι(s), X̂ι(t) − Xι(t)]g(t) ds dt = O(1/n).

Thus (√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι][X̂ι(sj) − Xι(sj)] is asymptotically normal with mean 0 and variance of order O(1). Using the Delta method (van der Vaart and Wellner 1996, p. 377), we have that

(√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι]²[X̂ι(sj) − Xι(sj)]²

is also asymptotically normal with mean 0 and variance of order O(1). So

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} Σ_{ι=1}^{ϑ} wk(sj) supΩ [∂Fk{X, θ}/∂Xι]²[X̂ι(sj) − Xι(sj)]² = op(n^{−1/2}).

Similarly, for fixed X̂′(t), we have

(1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)] = Σ_{k=1}^{ϑ} ∫_{t0}^{T} wk(t)ρ(t)[X̂′k(t) − X′k(t)]dt + op(m^{−1/2}).

Further, applying integration by parts and Assumption D1 (wk(t0) = wk(T) = 0 for k = 1, · · ·, ϑ), we have

Σ_{k=1}^{ϑ} ∫_{t0}^{T} wk(t)ρ(t)[X̂′k(t) − X′k(t)]dt = Σ_{k=1}^{ϑ} (wk(t)ρ(t)[X̂k(t) − Xk(t)])|_{t0}^{T} − Σ_{k=1}^{ϑ} ∫_{t0}^{T} (d/dt)[wk(t)ρ(t)][X̂k(t) − Xk(t)]dt = −Σ_{k=1}^{ϑ} ∫_{t0}^{T} (d/dt)[wk(t)ρ(t)][X̂k(t) − Xk(t)]dt.

Then by Lemma 6 in Wu, Xue and Kumar (2012), (√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)] is asymptotically normal with mean 0 and variance of order O(1). Using the Delta method again, it follows that (√n/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² is also asymptotically normal with mean 0 and variance of order O(1). So (1/m) Σ_{j=1}^{m} Σ_{k=1}^{ϑ} wk(sj)[X̂′k(sj) − X′k(sj)]² = op(n^{−1/2}). Thus the claim holds, and we have E0[supΩ (S̃n − S)(θ − θn0)] ≲ ϕn(δ)/√n + op(1/√n) ≲ ϕn(δ)/√n. It follows that E0[supΩ √n(S̃n − S)(θ − θn0)] ≲ ϕn(δ).

Thus the conditions of Theorem 3.4.1 in van der Vaart and Wellner (1996) are satisfied for the δn, ρ1 and ϕn(δ) above. Therefore we have rn ρ1(θ̂n, θn0) = Op(1), where rn satisfies rn² ϕn(1/rn) ≤ √n. It follows that rn = n^{(1−ϖ)/2}. Thus ρ1(θ̂n, θn0) = Op(n^{−(1−ϖ)/2}). Further, by arguments similar to the proof of Theorem 4.2 in Xue, Miao, and Wu (2010), we have that ρ(θ̂n, θn0) = Op(n^{−ϖϱ} + n^{−(1−ϖ)/2}).
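The quadrature step used repeatedly above, (1/m)Σ_j g(sj) = ∫ g(t)ρ(t) dt + Op(1/m), can be illustrated on a uniform grid (so ρ ≡ 1). This is only a numerical sketch with an arbitrary smooth test function g(t) = sin(t), not the weight functions of the model:

```python
import math

def riemann_error(m):
    # left Riemann sum of g(t) = sin(t) on [0, 1] versus the exact integral
    h = 1.0 / m
    approx = sum(math.sin(j * h) for j in range(m)) * h
    exact = 1.0 - math.cos(1.0)  # closed form of the integral of sin on [0, 1]
    return abs(approx - exact)

errors = {m: riemann_error(m) for m in (10, 100, 1000)}
# the error shrinks at rate O(1/m): m * error stays bounded as m grows
assert errors[100] < errors[10] and errors[1000] < errors[100]
assert all(m * err < 1.0 for m, err in errors.items())
```

For this g the leading error term is sin(1)/(2m), so m times the error stabilizes near 0.42, consistent with the Op(1/m) bound invoked in the proof.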

Proof of Theorem 3: To prove Theorem 3, we cannot directly use the Huber-Pollard Z-theorem (see Theorem 3.3.1 in van der Vaart and Wellner 1996), since in the second stage of our proposed estimation method, X̂(t) and X̂′(t) are nonparametric plug-in estimators from the first stage. We extend the original Huber-Pollard Z-theorem to such a two-stage case.

Some definitions are needed to prove Theorem 3. For any fixed η ∈ A, let A0 = {ηh(·) : h in a neighborhood of 0 ∈ R} be a smooth curve in A running through η0 at h = 0, that is, ηh=0(t) = η0(t). Denote ∂ηh(t)/∂h|h=0 = a(t), and denote the space generated by such a(t) as ϒ. For any a ∈ ϒ we define

Ṡ1(θ) = ∂S(θ)/∂β = −2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂Fk{X(t),θ}/∂β), Ṡ2(θ)[a] = ∂S(β,ηh)/∂h|h=0 = −2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂Fk{X(t),θ}/∂η a(t)).

We set

Ṡ11(θ) = ∂²S(θ)/∂β∂β^T = 2Σ_{k=1}^ϑ E(wk(t) ∂Fk{X(t),θ}/∂β ∂Fk{X(t),θ}/∂β^T) − 2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂²Fk{X(t),θ}/∂β∂β^T),
Ṡ12(θ)[a] = Ṡ21^T(θ)[a] = ∂²S(β,ηh)/∂β∂h|h=0 = 2Σ_{k=1}^ϑ E(wk(t) ∂Fk{X(t),θ}/∂β ∂Fk{X(t),θ}/∂η a(t)) − 2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂²Fk{X(t),θ}/∂β∂η a(t)),
Ṡ22(θ)[a1,a2] = ∂²S(β,η_{h1,h2})/∂h1∂h2|_{h1=0,h2=0} = ∂Ṡ2(β,ηh2)[a1]/∂h2|_{h2=0} = 2Σ_{k=1}^ϑ E(wk(t)(∂Fk{X(t),θ}/∂η)² a1(t)a2(t)) − 2Σ_{k=1}^ϑ E(wk(t)[Xk′(t) − Fk{X(t),θ}] ∂²Fk{X(t),θ}/∂η² a1(t)a2(t)),

where a1(t), a2(t) ∈ ϒ. We also define S̃1n(θ) = ∂S̃n(θ)/∂β, S̃2n(θ)[a] = ∂S̃n(β,ηh)/∂h|h=0, S1n(θ) = ∂Sn(θ)/∂β, and S2n(θ)[a] = ∂Sn(β,ηh)/∂h|h=0. Further, for a = (a1,···,ad)^T ∈ ϒ^d, where aj ∈ ϒ for j = 1,···,d, we denote Ṡ2(θ)[a] = (Ṡ2(θ)[a1],···,Ṡ2(θ)[ad])^T, Ṡ12(θ)[a] = (Ṡ12(θ)[a1],···,Ṡ12(θ)[ad])^T, Ṡ21(θ)[a] = (Ṡ21(θ)[a1],···,Ṡ21(θ)[ad])^T, Ṡ22(θ)[a,a] = (Ṡ22(θ)[a1,a],···,Ṡ22(θ)[ad,a])^T, S̃2n(θ)[a] = (S̃2n(θ)[a1],···,S̃2n(θ)[ad])^T and S2n(θ)[a] = (S2n(θ)[a1],···,S2n(θ)[ad])^T. Then we can obtain the following results E1–E7.

  • E1.

    From Theorem 1 and Theorem 2, we have that ‖β̂n − β0‖ = op(1) and ‖η̂n − η0‖2 = Op(n^{−ζ}), where Op(n^{−ζ}) is just the rate of convergence in Theorem 2. Moreover, ζ > 1/4.

  • E2.

    Ṡ1(θ0) = 0 and Ṡ2(θ0)[a] = 0 for all a ∈ ϒ.

  • E3.
    Following the arguments in Ma and Kosorok (2005) and Wellner and Zhang (2007), we need to find a least favorable direction a*(t) = {a1*(t),···,ad*(t)}^T ∈ ϒ^d, where aj* ∈ ϒ for j = 1,···,d, such that Ṡ12(θ0)[a] − Ṡ22(θ0)[a*,a] = 0 for all a ∈ ϒ. Moreover, the matrix A = −Ṡ11(θ0) + Ṡ12(θ0)[a*] is nonsingular. Some calculations yield
    Ṡ12(θ0)[a] − Ṡ22(θ0)[a*,a] = 2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η a(t)] − 2Σ_{k=1}^ϑ E[wk(t)(∂Fk{X(t),θ0}/∂η)² a*(t)a(t)] − 2Σ_{k=1}^ϑ E[wk(t)[Xk′(t) − Fk{X(t),θ0}] ∂²Fk{X(t),θ0}/∂β∂η a(t)] + 2Σ_{k=1}^ϑ E[wk(t)[Xk′(t) − Fk{X(t),θ0}] ∂²Fk{X(t),θ0}/∂η² a*(t)a(t)] = 2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η a(t)] − 2Σ_{k=1}^ϑ E[wk(t)(∂Fk{X(t),θ0}/∂η)² a*(t)a(t)], since Xk′(t) = Fk{X(t),θ0} under the true model.
    Therefore, an obvious choice of a*(t) is
    a*(t) = [Σ_{k=1}^ϑ wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η] / [Σ_{k=1}^ϑ wk(t)(∂Fk{X(t),θ0}/∂η)²].
    Moreover, for ϑ ≥ 2,
    A = −2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂β^T] + 2Σ_{k=1}^ϑ E[wk(t) ∂Fk{X(t),θ0}/∂β ∂Fk{X(t),θ0}/∂η a*(t)] (11)
    is nonsingular. Otherwise, A = 0 for ϑ = 1.
  • E4.

    By arguments similar to Condition (i) of the proof of Theorem 4 in Xue, Lam and Li (2004), it follows that the estimator θ̂n satisfies S̃1n(θ̂n) = 0 and S̃2n(θ̂n)[a*] = op(n^{−1/2}).

  • E5.
    For any δn ↓ 0 and C > 0, setting F = {θ : ‖β − β0‖ ≤ δn, ‖η − η0‖2 ≤ Cn^{−ζ}}, we claim
    sup_F √n|(S̃1n − Ṡ1)(θ) − (S̃1n − Ṡ1)(θ0)| = op(1), sup_F √n|(S̃2n − Ṡ2)(θ)[a*] − (S̃2n − Ṡ2)(θ0)[a*]| = op(1).
    In fact, similar to C3 of Theorem 4 in Xue, Lam, and Li (2004), it follows that
    sup_F √n|(S1n − Ṡ1)(θ) − (S1n − Ṡ1)(θ0)| = op(1), sup_F √n|(S2n − Ṡ2)(θ)[a*] − (S2n − Ṡ2)(θ0)[a*]| = op(1).
    Moreover, similar to the proof of Theorem 2, we have
    sup_F √n|(S̃1n − S1n)(θ) − (S̃1n − S1n)(θ0)| = op(1), sup_F √n|(S̃2n − S2n)(θ)[a*] − (S̃2n − S2n)(θ0)[a*]| = op(1).
    Combining these results, it follows that the claim holds.
  • E6.
    By the Taylor expansion, for θ ∈ F, it follows that
    |Ṡ1(θ) − Ṡ1(θ0) − Ṡ11(θ0)(β − β0) − Ṡ12(θ0)[η − η0]| = o(‖β − β0‖) + O(‖η − η0‖2²),
    and
    |Ṡ2(θ)[a*] − Ṡ2(θ0)[a*] − Ṡ21(θ0)[a*](β − β0) − Ṡ22(θ0)[a*, η − η0]| = o(‖β − β0‖) + O(‖η − η0‖2²).
  • E7.
    √n(S̃1n(θ0) − S̃2n(θ0)[a*]) is asymptotically normal. In fact, by the Taylor expansion, we have
    S̃1n(θ0) − S̃2n(θ0)[a*] = −(2/m)Σ_{j=1}^m Σ_{k=1}^ϑ wk(sj)(∂Fk{X̂(sj),θ0}/∂β − ∂Fk{X̂(sj),θ0}/∂η a*(sj)) × [X̂k′(sj) − Fk{X̂(sj),θ0}] = −(2/m)Σ_{j=1}^m Σ_{k=1}^ϑ wk(sj)(∂Fk{X̂(sj),θ0}/∂β − ∂Fk{X̂(sj),θ0}/∂η a*(sj)) × [X̂k′(sj) − Xk′(sj) + Fk{X(sj),θ0} − Fk{X̂(sj),θ0}] = −(2/m)Σ_{j=1}^m Σ_{k=1}^ϑ wk(sj)(∂Fk{X̂(sj),θ0}/∂β − ∂Fk{X̂(sj),θ0}/∂η a*(sj)) × (X̂k′(sj) − Xk′(sj) − Σ_{ι=1}^ϑ ∂Fk{X,θ0}/∂Xι|_{X1=X1(sj),···,Xι=X̃ι,j,···,Xϑ=Xϑ(sj)} [X̂ι(sj) − Xι(sj)])
    with X̃ι,j being some point between Xι(sj) and X̂ι(sj). Further, denoting Ak(t) = ∂Fk{X(t),θ0}/∂β − ∂Fk{X(t),θ0}/∂η a*(t) and applying the numerical error results on the Trapezoidal Rule integration and integration by parts, we have
    S̃1n(θ0) − S̃2n(θ0)[a*] = −2Σ_{k=1}^ϑ ∫_{t0}^T wk(t)ρ(t)Ak(t)[X̂k′(t) − Xk′(t)] dt + 2Σ_{k=1}^ϑ Σ_{ι=1}^ϑ ∫_{t0}^T wk(t)ρ(t)Ak(t) ∂Fk{X,θ0}/∂Xι|_{X=X(t)} [X̂ι(t) − Xι(t)] dt + Op(1/m) = −2Σ_{k=1}^ϑ {wk(t)ρ(t)Ak(t)[X̂k(t) − Xk(t)]}|_{t0}^T + 2Σ_{k=1}^ϑ ∫_{t0}^T (d/dt)[wk(t)ρ(t)Ak(t)][X̂k(t) − Xk(t)] dt + 2Σ_{k=1}^ϑ Σ_{ι=1}^ϑ ∫_{t0}^T wk(t)ρ(t)Ak(t) ∂Fk{X,θ0}/∂Xι|_{X=X(t)} [X̂ι(t) − Xι(t)] dt + op(n^{−1/2}).

Under the assumption of wk(t0) = wk(T) = 0 for k = 1,···, ϑ, we have

S̃1n(θ0) − S̃2n(θ0)[a*] = 2Σ_{k=1}^ϑ ∫_{t0}^T Bk(t)[X̂k(t) − Xk(t)] dt + op(n^{−1/2}), (12)

with Bk(t) = (d/dt)[wk(t)ρ(t)Ak(t)] + Σ_{ι=1}^ϑ wι(t)ρ(t)Aι(t) ∂Fι{X,θ0}/∂Xk|_{X=X(t)}. Similar to the proof of Theorem 2, by Lemma 6 in Wu, Xue and Kumar (2012), it follows that 2√n Σ_{k=1}^ϑ ∫_{t0}^T Bk(t)[X̂k(t) − Xk(t)] dt →d N(0, Σ2) with

Σ2 = 4Σ_{k=1}^ϑ σk² [∫_{t0}^T ∫_{t0}^T Bk(s) N^T(s)(G + λkn Dq)^{−1} G (G + λkn Dq)^{−1} N(t) Bk^T(t) ds dt], (13)

and Σ2 = Op(1). Then from expression (12), we have √n(S̃1n(θ0) − S̃2n(θ0)[a*]) →d N(0, Σ2). Thus the claim E7 holds.
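The algebra behind the least favorable direction a*(t) in E3 can be spot-checked numerically: with a*(t) defined as the ratio in E3, the factor multiplying a(t) in Ṡ12(θ0)[a] − Ṡ22(θ0)[a*, a] vanishes pointwise, so the expectation vanishes for every direction a. A minimal sketch with ϑ = 2, d = 1, and arbitrary hypothetical stand-ins for the weights wk(t) and the partial derivatives ∂Fk/∂β and ∂Fk/∂η:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
# hypothetical smooth stand-ins (two state equations, scalar beta)
w = [1.0 + 0.0 * t, 2.0 + np.sin(t)]   # weight functions w_k(t)
b = [np.cos(t), t ** 2 + 1.0]          # stand-ins for dF_k/dbeta
c = [1.0 + t, np.exp(-t) + 0.5]        # stand-ins for dF_k/deta, bounded away from 0

num = sum(wk * bk * ck for wk, bk, ck in zip(w, b, c))
den = sum(wk * ck ** 2 for wk, ck in zip(w, c))
a_star = num / den  # the least favorable direction of E3

# pointwise residual of the factor multiplying a(t) in S12[a] - S22[a*, a]
residual = num - den * a_star
assert np.max(np.abs(residual)) < 1e-12
```

Because the residual is identically zero in t, any choice of test direction a(t) integrates against it to zero, which is exactly the orthogonality that E3 requires.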

Then, by E1 and E5, we have √n(S̃1n − Ṡ1)(θ̂n) − √n(S̃1n − Ṡ1)(θ0) = op(1). Since S̃1n(θ̂n) = 0 by E4 and Ṡ1(θ0) = 0 by E2, it follows that √nṠ1(θ̂n) + √nS̃1n(θ0) = op(1). Similarly, √nṠ2(θ̂n)[a*] + √nS̃2n(θ0)[a*] = op(1). Combining these equalities and E6 yields

Ṡ11(θ0)(β̂n − β0) + Ṡ12(θ0)[η̂n − η0] + S̃1n(θ0) = op(n^{−1/2}) + o(‖β̂n − β0‖) + O(‖η̂n − η0‖2²), (14)

and

Ṡ21(θ0)[a*](β̂n − β0) + Ṡ22(θ0)[a*, η̂n − η0] + S̃2n(θ0)[a*] = op(n^{−1/2}) + o(‖β̂n − β0‖) + O(‖η̂n − η0‖2²). (15)

E1 implies √n O(‖η̂n − η0‖2²) = op(1), since ζ > 1/4. Thus by E3, subtracting (15) from (14) yields

(Ṡ11(θ0) − Ṡ21(θ0)[a*])(β̂n − β0) + o(‖β̂n − β0‖) = −(S̃1n(θ0) − S̃2n(θ0)[a*]) + op(n^{−1/2}).

That is, [A + o(1)](β̂n − β0) = (S̃1n(θ0) − S̃2n(θ0)[a*]) + op(n^{−1/2}). Combining this with E7, it follows that

√n(β̂n − β0) = [A + o(1)]^{−1}√n(S̃1n(θ0) − S̃2n(θ0)[a*]) + op(1) →d N(0, A^{−1}Σ2(A^{−1})^T)

with A^{−1}Σ2(A^{−1})^T = Op(1).
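The limiting covariance A^{−1}Σ2(A^{−1})^T is the usual sandwich form: if A z = s with s ~ N(0, Σ2), then z has covariance A^{−1}Σ2 A^{−T}. A quick Monte Carlo sketch with small arbitrary matrices A and Σ2 (purely illustrative placeholders, not the quantities of the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])        # stand-in for the nonsingular matrix A
Sigma2 = np.array([[1.0, 0.3], [0.3, 2.0]])   # stand-in for the score covariance

# draw s ~ N(0, Sigma2) and solve A z = s, mimicking [A + o(1)](beta_hat - beta0) = score
s = rng.multivariate_normal(np.zeros(2), Sigma2, size=200_000)
z = np.linalg.solve(A, s.T).T

target = np.linalg.inv(A) @ Sigma2 @ np.linalg.inv(A).T  # sandwich covariance
sample_cov = np.cov(z, rowvar=False)
assert np.allclose(sample_cov, target, atol=0.05)
```

The empirical covariance of the solved system matches the sandwich formula to within Monte Carlo error, which is the finite-sample analogue of the normality statement above.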

References

  1. BANSAL M, DELLA G AND DI BERNARDO D (2006). Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 22:815–822.
  2. BARD Y (1974). Nonlinear Parameter Estimation. New York: Academic Press.
  3. BRUNEL N (2008). Parameter estimation of ODE's via nonparametric estimators. Electronic Journal of Statistics 2:1242–1267.
  4. CAO J, HUANG JZ AND WU H (2012). Penalized nonlinear least squares estimation of time-varying parameters in ordinary differential equations. Journal of Computational and Graphical Statistics 21:42–56.
  5. CHEN J AND WU H (2008a). Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. J. Am. Statist. Assoc. 103:369–384.
  6. CHEN J AND WU H (2008b). Estimation of time-varying parameters in deterministic dynamic models with application to HIV infections. Statistica Sinica 18:987–1006.
  7. CHEN G AND HUANG JZ (2010). Bootstrap consistency for general semiparametric M-estimation. Ann. Statist. 38:2884–2915.
  8. CLAESKENS G, KRIVOBOKOVA T AND OPSOMER J (2009). Asymptotic properties of penalized spline estimators. Biometrika 96:529–544.
  9. CRAVEN P AND WAHBA G (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31:377–403.
  10. D'HAESELEER P, WEN X, FUHRMAN S AND SOMOGYI R (1999). Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing 4:41–52.
  11. DONNET S AND SAMSON A (2007). Estimation of parameters in incomplete data models defined by dynamical systems. Journal of Statistical Planning and Inference 137:2815–2831.
  12. EFRON B (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: SIAM.
  13. GUGUSHVILI S AND KLAASSEN CAJ (2012). √n-consistent parameter estimation for systems of ordinary differential equations: bypassing numerical integration via smoothing. Bernoulli 18:1061–1098.
  14. HAIRER E, NØRSETT S AND WANNER G (1993). Solving Ordinary Differential Equations I: Nonstiff Problems. Berlin: Springer-Verlag.
  15. HE X, XUE H AND SHI NZ (2010). Sieve maximum likelihood estimation for doubly semiparametric zero-inflated Poisson models. Journal of Multivariate Analysis 101:2026–2038.
  16. HUANG Y, LIU D AND WU H (2006). Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics 62:413–423.
  17. JOSHI M, SEIDEL-MORGENSTERN A AND KREMLING A (2006). Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamic systems. Metabolic Engineering 8:447–455.
  18. LI L, BROWN MB, LEE KH AND GUPTA S (2002). Estimation and inference for a spline-enhanced population pharmacokinetic model. Biometrics 58:601–611.
  19. LI Y AND RUPPERT D (2008). On the asymptotics of penalized splines. Biometrika 95:415–436.
  20. LI Z, OSBORNE MR AND PRVAN T (2005). Parameter estimation of ordinary differential equations. IMA Journal of Numerical Analysis 25:264–285.
  21. LIANG H, MIAO H AND WU H (2010). Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model. Ann. Appl. Statist. 4:460–483.
  22. LIANG H AND WU H (2008). Parameter estimation for differential equation models using a framework of measurement error in regression. J. Am. Statist. Assoc. 103:1570–1583.
  23. MA S AND KOSOROK MR (2005). Robust semiparametric M-estimation and the weighted bootstrap. Journal of Multivariate Analysis 96:190–217.
  24. MATTHEIJ R AND MOLENAAR J (2002). Ordinary Differential Equations in Theory and Practice. Philadelphia: SIAM.
  25. PUTTER H, HEISTERKAMP S, LANGE J AND WOLF F (2002). A Bayesian approach to parameter estimation in HIV dynamic models. Statistics in Medicine 21:2199–2214.
  26. QI X AND ZHAO H (2010). Asymptotic efficiency and finite-sample properties of the generalized profiling estimation of parameters in ordinary differential equations. Ann. Statist. 38:435–481.
  27. RAMSAY JO (1996). Principal differential analysis: data reduction by differential operators. J. R. Statist. Soc. B 58:495–508.
  28. RAMSAY JO, HOOKER G, CAMPBELL D AND CAO J (2007). Parameter estimation for differential equations: a generalized smoothing approach (with discussion). J. R. Statist. Soc. B 69:741–796.
  29. RUPPERT D, WAND MP AND CARROLL RJ (2003). Semiparametric Regression. Cambridge: Cambridge University Press.
  30. SCHUMAKER LL (1981). Spline Functions: Basic Theory. New York: Wiley.
  31. SHEN X (1997). On methods of sieves and penalization. Ann. Statist. 25:2555–2591.
  32. SHEN X AND WONG WH (1994). Convergence rate of sieve estimates. Ann. Statist. 22:580–615.
  33. THOMASETH K, ALEXANDRA KW, BERNHARD L ET AL. (1996). Integrated mathematical model to assess β-cell activity during the oral glucose test. Am. J. Physiol. 270:522–531.
  34. VAN DER VAART AW AND WELLNER JA (1996). Weak Convergence and Empirical Processes. New York: Springer-Verlag.
  35. VARAH J (1982). A spline least squares method for numerical parameter estimation in differential equations. SIAM J. Sci. Stat. Comput. 3:28–46.
  36. WELLNER JA AND ZHANG Y (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann. Statist. 35:2106–2142.
  37. WESSELS LF, VAN SOMEREN EP AND REINDERS MJ (2001). A comparison of genetic network models. Pacific Symposium on Biocomputing 6:508–519.
  38. WU H, XUE H AND KUMAR A (2012). Numerical discretization-based estimation methods for ordinary differential equation models via penalized spline smoothing. Biometrics 68(2):344–352.
  39. WU H, KUMAR A, MIAO H, WILTSE JH, MOSMANN TR, LIVINGSTON AM, BELZ GT, PERELSON AS, ZAND MS AND TOPHAM DJ (2011). Modeling of influenza-specific CD8+ T cells during the primary response indicates that the spleen is a major source of effectors. The Journal of Immunology 187:4474–4482.
  40. XUE H, LAM KF AND LI G (2004). Sieve maximum likelihood estimator for semiparametric regression models with current status data. J. Am. Statist. Assoc. 99:346–356.
  41. XUE H, MIAO H AND WU H (2010). Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. Ann. Statist. 38:2351–2387.
