J Am Stat Assoc. 2014 Jun 13;109(506):700–716. doi: 10.1080/01621459.2013.859617

Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling

Hulin Wu, Tao Lu, Hongqi Xue, Hua Liang

Summary

The gene regulatory network (GRN) is a high-dimensional complex system that can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but they are limited by the assumption of linear regulation effects. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs in a way that flexibly accommodates nonlinear regulation effects. The asymptotic properties of the proposed method are established, and simulation studies are performed to validate the proposed approach. An application to identifying the nonlinear dynamic GRN of T-cell activation illustrates the usefulness of the proposed method.

Keywords: Adaptive group LASSO, Differential equations, High-dimensional data, Sparse additive models, Time course microarray data, Variable selection

1. Introduction

The gene regulatory network (GRN) is a complex system associated with biological activities at the cellular level, such as cell growth, division, development, and response to environmental stimuli (Carthew and Sontheimer, 2009), and it should be modeled in a dynamic way (Hecker et al., 2009). New high-throughput technologies such as DNA microarray and next-generation RNA-Seq enable us to observe the dynamic features of gene expression profiles on a genome scale. In particular, the time course gene expression data collected with these technologies allow investigators to study gene regulatory networks from a dynamic point of view in more detail. Currently, several classes of models have been proposed for GRN construction, such as information theory models (Steuer et al., 2002; Stuart et al., 2003); Boolean networks (Kauffman, 1969; Thomas, 1973; Bornholdt, 2008); Bayesian networks (Heckerman, 1996; Imoto et al., 2003; Needham et al., 2007; Werhli and Husmeier, 2007); latent variable models (Shojaie and Michailidis, 2009); and other regression models (Kim et al., 2009).

In particular, many popular models have been proposed for inferring gene regulatory networks from time course gene expression data. Examples include dynamic Boolean networks and probabilistic Boolean networks (Liang et al., 1998; Akutsu et al., 2000; Shmulevich et al., 2002; Martin et al., 2007); dynamic Bayesian networks (Murphy and Mian, 1999; Friedman et al., 2000; Hartemink et al., 2001; Zou and Conzen, 2005; Song et al., 2009); vector autoregressive and state space models (Hirose et al., 2008; Kojima et al., 2009; Shimamura et al., 2009); and differential equation models (Voit, 2000; Holter et al., 2001; DeJong, 2002; Yeung et al., 2002). In particular, Xing and his associates have recently introduced temporal exponential random graph models and time-varying networks to capture the dynamics of networks (Hanneke and Xing, 2006; Guo et al., 2007; Kolar et al., 2010). Most of these dynamic network models, such as dynamic Bayesian networks and random graph models, require extensive computation for posterior inference, which only allows us to deal with small networks. Song et al. (2009) also introduced a time-varying dynamic Bayesian network to model structurally varying directed graphs. Gupta et al. (2007) proposed a hierarchical hidden Markov regression model for determining gene regulatory networks from gene expression microarray data, which also allows covariate effects to vary between states and gene clusters to vary over time. Furthermore, Gupta and Ibrahim (2007) introduced a hierarchical regression mixture model to combine gene clustering and motif discovery in a unified framework, in which a Monte Carlo method was used for simultaneous variable selection (for motifs) and clustering (for time course gene expression data). Shojaie et al. (2012) recently proposed an adaptive thresholding estimate under the framework of graphical Granger causality for reconstructing regulatory networks from time course gene expression data.

In this paper we focus on ordinary differential equation (ODE) models for dynamic GRN construction. The ODE approach models the dynamic change of a gene's expression (the derivative of the expression) as a function of the expression levels of all related genes, so the dynamic feature of the GRN is automatically and naturally quantified. Positive and negative regulation effects, as well as feedback effects, can be appropriately captured by the ODE model in a systematic way. A general ODE model for GRNs can be written as

X'(t) = F(t, X(t), \theta), \qquad (1.1)

where t ∈ [t_0, T] (0 ≤ t_0 < T < ∞) is time, X(t) = (X_1(t), ⋯, X_p(t))^T is a vector representing the expression levels of genes 1, ⋯, p at time t, and X′(t) is the first-order derivative of X(t). F serves as the link function that quantifies the regulatory effects of regulator genes on the expression change of a target gene, and it depends on a vector of parameters θ. In general F can take any linear or nonlinear functional form. Many GRN models are based on linear ODEs because of their simplicity. Lu et al. (2011) proposed the following linear ODEs for dynamic GRN identification and applied the SCAD approach for variable selection,

X_k'(t) = \sum_{j=1}^{p} \theta_{kj} X_j(t), \quad k = 1, 2, \ldots, p, \qquad (1.2)

where parameters θ = {θkj}k,j=1, ⋯, p quantify the regulations and interactions among the genes in the network. In practice, however, there is little a priori justification for assuming that the effects of regulatory genes take a linear form. Thus, the linear ODE models may be very restrictive for practical applications. In fact, nonlinear parametric ODE models have been proposed for gene regulatory networks (Weaver et al., 1999; Sakamoto and Iba, 2001; Spieth et al., 2006), but the variable selection (network edge identification) problem for high-dimensional nonlinear ODEs has not been addressed. In this paper, we intend to extend the high-dimensional linear ODE models in Lu et al. (2011) to a more general additive nonparametric ODE model for modeling high-dimensional nonlinear GRNs:

X_k'(t) = \mu_k + \sum_{j=1}^{p} f_{kj}(X_j(t)), \quad k = 1, 2, \ldots, p, \qquad (1.3)

where μ_k is an intercept term and f_kj(·) is a smooth function that quantifies the nonlinear relationship among related genes in the GRN. Based on the sparseness principle of gene regulatory networks and other biological systems, we usually assume that the number of significant nonlinear effects f_kj(·) is small for each of the p variables (genes) X_k, although the total number of variables (genes) p in the network may be large. Thus, we refer to model (1.3) as the sparse additive ODE (SA-ODE) model. We also assume that the measurements of gene expression for the kth gene are obtained at multiple time points t_i, i = 1, ⋯, n, that is,

Y_k(t_i) = X_k(t_i) + \varepsilon_k(t_i), \quad k = 1, 2, \ldots, p, \qquad (1.4)

where the measurement errors ε_k(t_i) (i = 1, ⋯, n) are assumed to be i.i.d. with mean zero and variance σ_k² (0 < σ_k² < ∞). The challenging question is how to perform model selection for the nonparametric SA-ODE model (1.3) under a sparsity constraint on the index set {j : f_kj(·) ≢ 0} of functions f_kj(·) that are not identically zero.
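To fix ideas, here is a minimal sketch of models (1.3)–(1.4) in R for a hypothetical three-gene network; the component functions, noise level, initial values, and the use of the deSolve package are illustrative assumptions, not the networks analyzed in this paper.

```r
# A minimal sketch of the SA-ODE model (1.3) and measurement model (1.4)
# for a hypothetical 3-gene network; all f_kj below are illustrative.
library(deSolve)

f12 <- function(x) sin(2 * x)      # effect of gene 2 on gene 1
f23 <- function(x) -0.5 * x        # effect of gene 3 on gene 2
f31 <- function(x) 0.1 * x^2 - 1   # effect of gene 1 on gene 3

rhs <- function(t, X, parms) {
  # sparse additive right-hand side: each X_k' depends on a single regulator
  list(c(f12(X[2]), f23(X[3]), f31(X[1])))
}

times <- seq(0, 10, by = 0.5)                 # observation times t_i
X <- ode(y = c(1, 0.5, -0.2), times = times, func = rhs, parms = NULL)
Y <- X[, -1] + matrix(rnorm(3 * length(times), sd = 0.1),
                      nrow = length(times))   # Y_k(t_i) = X_k(t_i) + eps_k(t_i)
```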

There exist several classes of parameter estimation methods for ODE models, including the nonlinear least squares (NLS) method (Hemker, 1972; Bard, 1974; Li et al., 2005; Xue et al., 2010); the two-stage smoothing-based estimation method (Varah, 1982; Brunel, 2008; Chen and Wu, 2008a,b; Liang and Wu, 2008; Wu et al., 2012); principal differential analysis (PDA) and its extensions (Ramsay, 1996; Heckman and Ramsay, 2000; Ramsay and Silverman, 2005; Poyton et al., 2006; Ramsay et al., 2007; Varziri et al., 2008; Qi and Zhao, 2010); and Bayesian approaches (Putter et al., 2002; Huang et al., 2006; Donnet and Samson, 2007). Among these methods, we are most interested in the two-stage smoothing-based estimation method: in the first stage, a nonparametric smoothing approach is used to obtain estimates of both the state variables and their derivatives from the observed data, and in the second stage, these estimated functions are plugged into the ODEs to estimate the unknown parameters via a formulated pseudo-regression model. In particular, the two-stage smoothing-based estimation method avoids numerically solving the differential equations and does not need the initial or boundary conditions of the state variables. This method also decouples the high-dimensional ODE system, allowing us to perform variable selection and parameter estimation for one equation at a time (Voit and Almeida, 2004; Jia et al., 2011; Lu et al., 2011). These good features of the two-stage smoothing-based estimation method, together with its computational efficiency, greatly outweigh its disadvantage of a small loss of estimation accuracy when dealing with high-dimensional nonlinear ODE models (Lu et al., 2011).

In the past two decades, there has been much work on penalization methods for variable selection and parameter estimation for high-dimensional data, including the bridge estimator (Frank and Friedman, 1993); the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996) and its extensions, such as the adaptive LASSO (Zou, 2006), the group LASSO (Yuan and Lin, 2006) and the adaptive group LASSO (Wang and Leng, 2008); the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001); and the elastic net (Zou and Hastie, 2006), among others. Recently, the LASSO approach has been applied to high-dimensional nonparametric sparse additive (regression) models to perform variable selection and parameter estimation simultaneously (Meier et al., 2009; Ravikumar et al., 2009; Huang et al., 2010; Cantoni et al., 2011). In this paper, we propose to couple the ideas of the two-stage smoothing-based estimation method and high-dimensional variable selection techniques to perform variable selection and nonparametric function estimation for the proposed SA-ODE model (1.3). This is not a trivial combination of the two ideas; instead, we must tackle several critical challenges: 1) the two-stage smoothing-based estimation method converts our nonparametric SA-ODE model into a pseudo-sparse additive regression model in which the covariates and the response variables are derived from nonparametric smoothing estimates of the ODE state variables and their derivatives (instead of directly observed data); 2) the resulting errors in the pseudo-sparse additive regression model are not i.i.d., but dependent; and 3) the number of covariates, p, in the SA-ODE models is usually large (possibly much larger than the sample size). Thus, it is not trivial to establish the theoretical properties of the proposed method. To the best of our knowledge, this is the first attempt to propose a variable selection method for a high-dimensional nonparametric ODE model, and also the first time that theoretical results have been established for penalized estimators of ODE models under the “large p, small n” setting.

The remainder of this paper is organized as follows. In Section 2, we propose a five-step variable selection procedure for the SA-ODE model (1.3). In Section 3, the theoretical properties of the proposed method are established in the “large p, small n” setting. In Section 4, we present simulation results to validate the proposed method. In Section 5, we illustrate its usefulness by applying it to identify a dynamic gene regulatory network based on time course gene expression data from a T-cell activation study. We conclude the paper with some remarks in Section 6. The detailed technical proofs are given in the Appendix.

2. Pseudo-Sparse Additive Model and Variable Selection

In this section, we propose a five-step variable selection procedure for the SA-ODE model (1.3). In Step I, we use a nonparametric smoothing approach to estimate both the ODE state variables and their derivatives based on the measurement model (1.4). In Step II, we use spline functions to approximate each of the nonparametric additive components in the SA-ODE model (1.3), and then substitute the estimated state variables and their derivatives from Step I into the SA-ODE model (1.3) to form a ‘pseudo’ sparse additive model. In Step III, we apply the group LASSO approach to obtain an initial estimator and reduce the dimension of the problem. In Step IV, we apply the adaptive group LASSO approach (Wang and Leng, 2008; Huang et al., 2010) for component selection, which combines ideas from the adaptive LASSO (Zou, 2006) and the group LASSO (Yuan and Lin, 2006). Because Step II deliberately uses a large number of basis functions to approximate the nonparametric functions, some of these basis functions may be unnecessary; in Step V, we therefore apply the regular LASSO to the model selected in Step IV to further shrink some of the B-spline basis coefficients to zero, so that we obtain a more parsimonious model at the end.

Step I. Nonparametric Smoothing

First we apply a nonparametric smoothing approach, such as smoothing splines, regression splines, penalized splines or local polynomial smoothing, to estimate the state variables and their derivatives, X_k(t) and X_k′(t), based on model (1.4). In this paper, we adopt penalized splines (Ruppert et al., 2003; Li and Ruppert, 2008; Claeskens et al., 2009; Wu et al., 2012) to obtain the estimates X̂_k(t) and X̂_k′(t), k = 1, ⋯, p. That is, we approximate X_k(t), one by one for 1 ≤ k ≤ p, by

X_k(t) \approx \sum_{j=-\nu}^{K_k} \delta_{k,j} N_{k,j,\nu+1}(t) = N_{k,\nu+1}^T(t)\,\delta_k,

where δ_k = (δ_{k,−ν}, ⋯, δ_{k,K_k})^T is the unknown coefficient vector to be estimated from the data, and N_{k,ν+1}(t) = {N_{k,−ν,ν+1}(t), ⋯, N_{k,K_k,ν+1}(t)}^T is the B-spline basis function vector of degree ν and dimension K_k + ν + 1 at a sequence of knots t_0 = τ_{k,−ν} = τ_{k,−ν+1} = ⋯ = τ_{k,−1} = τ_{k,0} < τ_{k,1} < ⋯ < τ_{k,K_k} < τ_{k,K_k+1} = τ_{k,K_k+2} = ⋯ = τ_{k,K_k+ν+1} = T on [t_0, T] (Schumaker, 1981). Define the n × (K_k + ν + 1) matrix N_k = {N_{k,ν+1}(t_1), ⋯, N_{k,ν+1}(t_n)}^T, Y_k = (Y_k(t_1), ⋯, Y_k(t_n))^T, and let

V_k = \int_{t_0}^{T} [N_{k,\nu+1}''(t)][N_{k,\nu+1}''(t)]^T\,dt.

The penalized spline (P-spline) objective function combines a sum of squared differences with a penalty term given by the integrated squared second-order derivative of the spline function:

L_k(\delta_k; \lambda_k) = (Y_k - N_k\delta_k)^T (Y_k - N_k\delta_k) + \lambda_k\,\delta_k^T V_k \delta_k. \qquad (2.1)

The minimizer of (2.1) takes the form \hat\delta_k = (N_k^T N_k + \lambda_k V_k)^{-1} N_k^T Y_k. Then we have

\hat X_k(t) = N_{k,\nu+1}^T(t)\,\hat\delta_k, \qquad \hat X_k'(t) = [N_{k,\nu+1}'(t)]^T\,\hat\delta_k. \qquad (2.2)

From de Boor (2001), the derivatives of spline functions can be expressed simply in terms of lower-order spline functions, so we can obtain explicit expressions for N_{k,ν+1}(t) and N′_{k,ν+1}(t). To determine λ_k, we use the standard generalized cross validation (GCV) method (Craven and Wahba, 1979). Note that if longitudinally replicated data are available (see our application example in Section 5), nonparametric mixed-effects smoothing methods can be used in this step to obtain better smoothing results (Wu and Zhang, 2006; Lu et al., 2011).
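As a concrete illustration of Step I, the following sketch (continuing the R example above) uses smooth.spline, whose smoothing parameter is selected by GCV, as a convenient stand-in for the P-spline estimator (2.1)–(2.2); the grid of 200 augmented time points is an illustrative choice.

```r
# Step I sketch: estimate X_k(t) and X_k'(t) by spline smoothing with GCV.
smooth_one <- function(y, times, grid) {
  fit <- smooth.spline(times, y)                 # smoothing parameter by GCV
  list(xhat  = predict(fit, grid)$y,             # \hat X_k(t) on a fine grid
       dxhat = predict(fit, grid, deriv = 1)$y)  # \hat X_k'(t)
}
grid  <- seq(min(times), max(times), length.out = 200)  # augmented time points
fits  <- lapply(seq_len(ncol(Y)), function(k) smooth_one(Y[, k], times, grid))
Xhat  <- sapply(fits, `[[`, "xhat")   # 200 x p matrix of \hat X_k(t)
dXhat <- sapply(fits, `[[`, "dxhat")  # 200 x p matrix of \hat X_k'(t)
```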

Step II. Pseudo-Sparse Additive Models

In this step, we propose a method to identify significant functions in model (1.3) using a high-dimensional variable selection technique, the group LASSO. First, following ideas similar to those of Varah (1982), Brunel (2008), Chen and Wu (2008a,b), Liang and Wu (2008) and Wu et al. (2012), we substitute the estimated state variables X̂_k(t) and their derivatives X̂_k′(t) into the ODE model (1.3) to form the following pseudo-sparse additive (PSA) model:

H_{ki} = \mu_k + \sum_{j=1}^{p} f_{kj}(\hat X_{ji}) + \Upsilon_{ki}, \quad k = 1, 2, \ldots, p, \quad i = 1, 2, \ldots, n, \qquad (2.3)

where H_{ki} = X̂_k′(t_i) and X̂_{ji} = X̂_j(t_i), and ϒ_{ki} is the sum of the measurement errors and the estimation errors of X̂_k′(t) and X̂_k(t) from Step I. In model (2.3), the response variables and the covariates are derived from the nonparametric smoothing estimates of the state variables and their derivatives, respectively. Moreover, the resulting error terms ϒ_{ki} are not i.i.d., but dependent. Thus this is not a standard sparse additive regression model as studied in the literature (Meier et al., 2009; Ravikumar et al., 2009; Huang et al., 2010; Cantoni et al., 2011); that is why we call it a ‘pseudo’ sparse additive (PSA) model. Since X̂_k(t) and X̂_k′(t) in the above model are estimated continuously at any time point t in Step I, we may augment the data with more time points than the original observation times for the next-step analysis. In fact, other investigators (D’Haeseleer et al., 1999; Wessels et al., 2001; Bansal et al., 2006) have used this data augmentation strategy for ODE parameter estimation, and we adopt a similar idea here.

Remark 1. Decoupled property. Note that the substitution approach in model (2.3) allows us to decouple the p-dimensional ODE model into p one-dimensional ODEs (Voit and Almeida, 2004; Jia et al., 2011; Lu et al., 2011), so that we can deal with the variable selection problem for these ODEs one at a time. This is a unique feature of the two-stage smoothing-based estimation method for ODEs (Varah, 1982; Brunel, 2008; Chen and Wu, 2008a,b; Liang and Wu, 2008; Wu et al., 2012).

We adopt an idea similar to that of Huang et al. (2010) and apply truncated series expansions with B-spline bases to approximate the additive components in model (2.3). Let t_0 = ξ_0 < ξ_1 < ⋯ < ξ_{K_n} < ξ_{K_n+1} = T be a partition of the interval [t_0, T], where K_n = O(n^ϖ) (0 < ϖ < 0.5) is a positive integer such that max_{0≤m≤K_n} |ξ_{m+1} − ξ_m| = O(n^{−ϖ}). Let 𝒮_n be the space of polynomial splines on [t_0, T] of degree l ≥ 1, consisting of functions s satisfying: (i) s is a polynomial of degree l on the subintervals I_m = [ξ_m, ξ_{m+1}), m = 0, ⋯, K_n − 1, and I_{K_n} = [ξ_{K_n}, ξ_{K_n+1}]; (ii) for l ≥ 2 and 0 ≤ l′ ≤ l − 2, s is l′ times continuously differentiable on [t_0, T]. Then there exists a normalized B-spline basis {ϕ_m, 1 ≤ m ≤ m_n} of 𝒮_n on [t_0, T], where m_n ≡ K_n + l, such that any f*_{kj} ∈ 𝒮_n can be expressed as

f_{kj}^*(x) = \sum_{m=1}^{m_n} \beta_{kjm}\,\phi_m(x), \quad k, j = 1, 2, \ldots, p, \qquad (2.4)

where the β_{kjm} are spline coefficients. Here we propose to conservatively choose the number of basis functions, m_n, as large as possible (more than enough, i.e., under-smoothing). Note that it is computationally prohibitive to select a different m_n for each function f_kj(·) when p is large. To deal with this problem, we re-apply the LASSO approach in Step V to shrink unnecessary basis coefficients to zero. Replacing f_kj by its B-spline approximation in (2.4), model (2.3) can be expressed as

H_{ki} = \mu_k + \sum_{j=1}^{p}\sum_{m=1}^{m_n} \beta_{kjm}\,\phi_m(\hat X_{ji}) + \Upsilon_{ki}^*, \quad k = 1, \ldots, p, \quad i = 1, \ldots, n, \qquad (2.5)

where ϒ*_{ki} is the sum of ϒ_{ki} and the error of approximating the additive regression functions by splines. Let β_{kj} = (β_{kj1}, ⋯, β_{kjm_n})^T (k, j = 1, ⋯, p) and β_k = (β_{k1}^T, ⋯, β_{kp}^T)^T. Then we have p groups of parameters, and our purpose is to select the non-zero groups, i.e., the nonzero β_{kj}, k, j = 1, ⋯, p.
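Continuing the sketch, the grouped design matrix for model (2.5) can be assembled from the smoothed trajectories; m_n = 12 matches the choice used in the simulation studies of Section 4, while the use of splines::bs in place of the normalized basis above is an assumption of this illustration.

```r
# Step II sketch: one group of m_n centered B-spline columns per candidate
# regulator gene, evaluated at the smoothed states \hat X_j(t_i).
library(splines)
mn <- 12                                     # deliberately large (under-smoothing)
make_group <- function(x) {
  Phi <- bs(x, df = mn)                      # B-spline basis at \hat X_j(t_i)
  scale(Phi, center = TRUE, scale = FALSE)   # centering: psi_m = phi_m - bar(phi)_jm
}
Z   <- do.call(cbind, lapply(seq_len(ncol(Xhat)),
                             function(j) make_group(Xhat[, j])))
grp <- rep(seq_len(ncol(Xhat)), each = mn)   # group label for each column of Z
k   <- 1                                     # target equation (gene k)
Hk  <- dXhat[, k] - mean(dXhat[, k])         # centered pseudo-response H_k
```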

Step III. Group LASSO

For model (1.3), we need a constraint to deal with the unidentifiability problem, i.e., E f_kj(X_j(t)) = 0 (j = 1, ⋯, p). Thus, for model (2.5), we impose the constraints

\sum_{i=1}^{n}\sum_{m=1}^{m_n} \beta_{kjm}\,\phi_m(\hat X_{ji}) = 0, \quad 1 \le j \le p.

We may also center the response and the basis functions to remove these restrictions. Let \bar\phi_{jm} = n^{-1}\sum_{i=1}^{n}\phi_m(\hat X_{ji}) and ψ_m(x) ≡ ψ_{jm}(x) = ϕ_m(x) − \bar\phi_{jm}. Write Z_{ij} = {ψ_1(X̂_{ji}), ⋯, ψ_{m_n}(X̂_{ji})}^T, Z_j = (Z_{1j}, ⋯, Z_{nj})^T, and Z = (Z_1, ⋯, Z_p). Let \bar H_k = n^{-1}\sum_{i=1}^{n} H_{ki} and H_k = (H_{k1} − \bar H_k, ⋯, H_{kn} − \bar H_k)^T. Similar to Brunel (2008) and Wu et al. (2012), a weight function with boundary restrictions should be imposed in order to achieve a better convergence rate for parameter estimation. Let D_{k1} = diag{d_{k1}(t_1), ⋯, d_{k1}(t_n)}, D_{k2} = diag{d_{k2}(t_1), ⋯, d_{k2}(t_n)} and D_{k3} = diag{d_{k3}(t_1), ⋯, d_{k3}(t_n)}, where d_{k1}(t), d_{k2}(t) and d_{k3}(t) are prescribed non-negative weight functions on [t_0, T] with boundary conditions d_{k1}(t_0) = d_{k1}(T) = 0, d_{k2}(t_0) = d_{k2}(T) = 0 and d_{k3}(t_0) = d_{k3}(T) = 0. More discussion of how to select the weight function can be found in Brunel (2008) and Wu et al. (2012). With these notations, we can obtain the group LASSO estimator β̃_k by minimizing the following penalized weighted least squares criterion,

L_{k1}(\beta_k; \lambda_{k1}) = (H_k - Z\beta_k)^T D_{k1} (H_k - Z\beta_k) + \lambda_{k1}\sum_{j=1}^{p}\|\beta_{kj}\|_2, \qquad (2.6)

where λ_{k1} is a penalty parameter, which can be determined by BIC or EBIC (Chen and Chen, 2008). Here we have dropped μ_k from the arguments of L_{k1} because the centering gives μ̂_k = \bar H_k. Based on the group LASSO estimator β̃_k, we can also obtain estimates of the nonparametric functions, f̃_{kj}(x) = \sum_{m=1}^{m_n} \tilde\beta_{kjm}\,\psi_m(x), 1 ≤ j ≤ p.
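A self-contained way to minimize (2.6) is proximal gradient descent with groupwise soft-thresholding, sketched below for the objects Z, Hk and grp constructed above; the sine-shaped weight is one convenient d_k1 vanishing at both boundaries, the fixed penalty stands in for a BIC/EBIC-tuned λ_k1, and the solver is a generic group-LASSO routine rather than the authors' implementation.

```r
# Step III sketch: group LASSO (2.6) by proximal gradient descent.
dk1 <- sin(pi * (grid - min(grid)) / diff(range(grid)))  # d_k1(t0) = d_k1(T) = 0

group_lasso <- function(Z, H, w, grp, pen, iters = 500) {
  # pen[j]: penalty for group j (pen[j] = Inf forces the group to zero)
  beta <- numeric(ncol(Z))
  L <- 2 * norm(crossprod(Z, w * Z), "2")          # Lipschitz constant of gradient
  for (it in seq_len(iters)) {
    g <- -2 * crossprod(Z, w * (H - Z %*% beta))   # gradient of weighted LS part
    b <- as.vector(beta - g / L)
    for (j in unique(grp)) {                       # groupwise soft-thresholding
      idx <- grp == j
      nrm <- sqrt(sum(b[idx]^2))
      b[idx] <- if (nrm > 0) max(0, 1 - pen[j] / (L * nrm)) * b[idx] else 0
    }
    beta <- b
  }
  beta
}
beta_tilde <- group_lasso(Z, Hk, dk1, grp, pen = rep(1, max(grp)))
```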

Step IV. Adaptive Group LASSO

The group LASSO penalty above treats the coefficients of each group equally, which is not optimal. In order to allow different amounts of shrinkage for different coefficient groups, an adaptive group LASSO is necessary. In this step, we perform the adaptive group LASSO based on the results from Step III by setting w_kj = ‖β̃_kj‖_2^{−1} if ‖β̃_kj‖_2 > 0, and w_kj = ∞ otherwise. Then we obtain the adaptive group LASSO estimator β̂_k by minimizing the penalized weighted least squares criterion

L_{k2}(\beta_k; \lambda_{k2}) = (H_k - Z\beta_k)^T D_{k2} (H_k - Z\beta_k) + \lambda_{k2}\sum_{j=1}^{p} w_{kj}\|\beta_{kj}\|_2 \qquad (2.7)

with a penalty parameter λ_{k2}, which can also be determined by BIC or EBIC (Chen and Chen, 2008). Then we obtain the adaptive group LASSO estimates of μ_k and f_kj: μ̂_k = \bar H_k = n^{-1}\sum_{i=1}^{n} H_{ki} and f̂_{kj}(x) = \sum_{m=1}^{m_n}\hat\beta_{kjm}\,\psi_m(x), 1 ≤ j ≤ p.
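The same solver then handles (2.7) once the per-group adaptive weights are formed from the Step III fit; the value of λ_k2 below is a placeholder for a BIC/EBIC choice, and d_k1 is reused as a stand-in for d_k2.

```r
# Step IV sketch: adaptive group LASSO (2.7) with weights from Step III.
gnorm <- tapply(beta_tilde, grp, function(b) sqrt(sum(b^2)))  # ||tilde beta_kj||_2
w_kj  <- ifelse(gnorm > 0, 1 / gnorm, Inf)  # infinite weight: group stays out
lambda_k2 <- 0.5                            # placeholder; choose by BIC/EBIC
beta_hat  <- group_lasso(Z, Hk, dk1, grp, pen = lambda_k2 * w_kj)
```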

Step V. Regular LASSO for Shrinking Basis Coefficients

In Step II, we intentionally approximate each of the nonparametric functions in the pseudo-sparse additive model using a large number of basis functions (under-smoothing). Thus, some of these basis functions may be unnecessary. In this step, we re-apply the regular LASSO or the adaptive LASSO to the final model selected by the adaptive group LASSO in Step IV, to shrink the coefficients of unnecessary basis functions to zero so that we obtain a final parsimonious model. The minimization criterion for the adaptive LASSO is

L_{k3}(\beta_k; \lambda_{k3}) = (H_k^{(s)} - Z^{(s)}\beta_k)^T D_{k3} (H_k^{(s)} - Z^{(s)}\beta_k) + \lambda_{k3}\sum_{j=1}^{s}\sum_{m=1}^{m_n} w_{kjm}\,|\beta_{kjm}^{(s)}|, \qquad (2.8)

where the superscript “(s)” denotes the corresponding quantities for the groups selected in Step IV and s is the total number of selected groups. The weight w_kjm is set to |β̂^{(s)}_{kjm}|^{−1} if |β̂^{(s)}_{kjm}| > 0, and w_kjm = ∞ otherwise.
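For this final step an off-the-shelf coordinate-descent LASSO suffices; the sketch below uses glmnet's penalty.factor argument to carry the adaptive weights w_kjm over the columns selected in Step IV, and reuses the boundary weight as a stand-in for d_k3 (with a small floor, since glmnet expects positive observation weights).

```r
# Step V sketch: adaptive LASSO (2.8) on the basis coefficients of the
# groups selected in Step IV, via glmnet with per-coefficient weights.
library(glmnet)
sel <- which(beta_hat != 0)                    # columns surviving Step IV
pf  <- 1 / abs(beta_hat[sel])                  # w_kjm = |hat beta_kjm|^{-1}
fit5 <- glmnet(Z[, sel], Hk,
               weights = pmax(dk1, 1e-6),      # d_k3(t_i), floored to stay positive
               penalty.factor = pf, intercept = FALSE)
beta_final <- coef(fit5, s = min(fit5$lambda))[-1]  # lambda_k3 by BIC in practice
```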

3. Theoretical Results

In this section, we establish the asymptotic properties of the proposed group LASSO estimator in Step III and the adaptive group LASSO estimator in Step IV for the pseudo-sparse additive model derived from a set of ODEs in the last section. This is challenging since we need to integrate the asymptotic results for the two-stage smoothing-based estimation method for ODE models (Varah, 1982; Brunel, 2008; Chen and Wu, 2008a,b; Liang and Wu, 2008; Wu et al., 2012) with those for sparse additive models (Meier et al., 2009; Ravikumar et al., 2009; Huang et al., 2010; Cantoni et al., 2011).

Let r be a nonnegative integer and ζ ∈ (0, 1] such that ϱ = r + ζ > 0.5. Let ℱ be the collection of functions f on [t_0, T] whose r-th derivative f^{(r)} exists and satisfies the Lipschitz condition of order ζ: |f^{(r)}(s) − f^{(r)}(t)| ≤ C|s − t|^ζ for s, t ∈ [t_0, T], with a generic positive constant C. In model (1.3), without loss of generality, suppose that the first q components are nonzero, that is, f_kj(x) ≠ 0 for 1 ≤ j ≤ q, but f_kj(x) ≡ 0 for q+1 ≤ j ≤ p. Let A_1 = {1, ⋯, q}, A_0 = {q+1, ⋯, p}, Ã_1 = {j : ‖β̃_kj‖_2 ≠ 0, 1 ≤ j ≤ p}, and Ã_2 = A_1 ∪ Ã_1. Let |A| be the cardinality of any index set A. Define ‖f‖_2 = [\int_a^b f^2(x)\,dx]^{1/2} for any function f on [a, b], whenever the integral exists. For 1 ≤ k ≤ p, we make the following assumptions:

Assumption A:

  • (A1)

    For κ = max_{0≤j≤K_k}(τ_{k,j+1} − τ_{k,j}), there exists a constant M > 0 such that κ / min_{0≤j≤K_k}(τ_{k,j+1} − τ_{k,j}) ≤ M and max_{0≤j≤K_k} |(τ_{k,j+1} − τ_{k,j})/(τ_{k,j} − τ_{k,j−1}) − 1| = o(1).

  • (A2)

    K_k ~ c·n^ϑ with 1/(2ν + 3) ≤ ϑ < 1, and λ_k = O(n^π) with π ≤ ν/(2ν + 3).

  • (A3)

    X_k(t) ∈ C^{ν+1}[t_0, T] with ν ≥ 2.

  • (A4)

    K* = (K_k + ν − 1)(λ_k c_1)^{1/4} n^{−1/4} < 1 for some constant c_1.

  • (A5)

    The random design points t_1, ⋯, t_n are i.i.d. with cumulative distribution function Q(t) whose density ρ(t) = Q′(t) is positive and continuous. Moreover, ρ(t) is bounded away from 0 and +∞ and has a bounded and continuous first-order derivative.

Assumption B:

  • (B1)

    l + 1 ≥ ϱ.

  • (B2)

    The number of nonzero components q is fixed, and there is a constant c_f > 0 such that min_{1≤j≤q} ‖f_kj‖_2 ≥ c_f for the nonzero f_kj’s.

  • (B3)

    The random variables ε_k(t_i) (i = 1, ⋯, n) are i.i.d. with E[ε_k(t_i)] = 0 and Var[ε_k(t_i)] = σ_k² (0 < σ_k² < ∞). Furthermore, their tail probabilities satisfy P{|ε_k(t_i)| > x} ≤ K exp(−Cx²), i = 1, ⋯, n, for all x ≥ 0 and for some constants C and K.

  • (B4)

    E[fkj (Xj (t))] = 0 and fkj ∈ ℱ for k, j = 1, ⋯, p.

  • (B5)

    ν ≥ 3ϱ.

  • (B6)

    Both weight functions d_{k1}(·) and d_{k2}(·) are bounded and nonnegative on [t_0, T] with d_{k1}(t_0) = d_{k1}(T) = 0 and d_{k2}(t_0) = d_{k2}(T) = 0. For simplicity, assume that ‖d_{k1}‖_∞ ≤ 1 and ‖d_{k2}‖_∞ ≤ 1. Moreover, both d_{k1}(t) and d_{k2}(t) have bounded and continuous first-order derivatives.

Note that Assumption A is required to derive the local properties of penalized spline estimates; these conditions were also used by Claeskens et al. (2009) and Wu et al. (2012). Assumption B1 is required for the nonparametric smoothing approaches (Huang, 2003; Xue et al., 2010). Assumptions B2, B3 and B4 are standard conditions for nonparametric additive models (Huang et al., 2010). Assumption B5 means that the degree of smoothness of the state variables X_k(·) is higher than that of the additive functions f_kj(·), which is required to control the error of the first-step nonparametric smoothing so as to achieve the same order of error rate in the pseudo-sparse additive model of Step II. Assumption B6 deals with the boundary effect in derivative estimation so that the proposed adaptive group LASSO estimator can achieve the optimal nonparametric convergence rate.

Theorem 1 Suppose that Assumptions A and B hold and λ_{k1} ≥ C\sqrt{n\log(pm_n)} for a sufficiently large constant C. Then we have:

  1. With probability approaching one, |Ã_1| ≤ M_1|A_1| = M_1 q for a finite constant M_1 > 1.

  2. If m_n^2\log(pm_n)/n → 0 and λ_{k1}^2 m_n^2/n^2 → 0 as n → ∞, then all the nonzero β_{kj}, 1 ≤ j ≤ q, are selected with probability approaching one.

  3. \sum_{j=1}^{p}\|\tilde\beta_{kj} - \beta_{kj}\|_2^2 = O_p\!\left(\frac{m_n^2\log(pm_n)}{n}\right) + O_p\!\left(\frac{m_n}{n}\right) + O\!\left(\frac{1}{m_n^{2\varrho-1}}\right) + O\!\left(\frac{m_n^2\lambda_{k1}^2}{n^2}\right).

Theorem 2 Suppose that Assumptions A and B hold and that λ_{k1} ≥ C\sqrt{n\log(pm_n)} for a sufficiently large constant C. Then we have the following results:

  1. Let Ã_f = {j : ‖f̃_{kj}‖_2 > 0, 1 ≤ j ≤ p}. There is a constant M_1 > 1 such that, with probability approaching 1, |Ã_f| ≤ M_1 q.

  2. If m_n\log(pm_n)/n → 0 and λ_{k1}^2 m_n/n^2 → 0 as n → ∞, then all the nonzero additive components f_kj, 1 ≤ j ≤ q, are selected with probability approaching one.

  3. \|\tilde f_{kj} - f_{kj}\|_2^2 = O_p\!\left(\frac{m_n\log(pm_n)}{n}\right) + O_p\!\left(\frac{1}{n}\right) + O\!\left(\frac{1}{m_n^{2\varrho}}\right) + O\!\left(\frac{m_n\lambda_{k1}^2}{n^2}\right), j ∈ Ã_2.

Corollary 1 Suppose that Assumptions A and B hold. If λ_{k1} = C\sqrt{n\log(pm_n)} and m_n ≍ n^{1/(2ϱ+1)}, we have:

  1. If n^{−2ϱ/(2ϱ+1)}\log(p) → 0 as n → ∞, then with probability approaching one, all the nonzero components f_kj, 1 ≤ j ≤ q, are selected and the number of selected components is no more than M_1 q.

  2. \|\tilde f_{kj} - f_{kj}\|_2^2 = O_p(n^{−2ϱ/(2ϱ+1)}\log(pm_n)), j ∈ Ã_2.

Theorem 3 Suppose Assumptions A and B hold. If λ_{k1} = C\sqrt{n\log(pm_n)}, m_n ≍ n^{1/(2ϱ+1)}, and λ_{k2} = O(n^{1/2}) with λ_{k2}\,n^{−(8ϱ+3)/(8ϱ+4)} = o(1) and n^{1/(4ϱ+2)}\log^{1/2}(pm_n)/λ_{k2} = o(1), then P(‖f̂_{kj} − f_{kj}‖_2 > 0, j ∈ A_1, and ‖f̂_{kj}‖_2 = 0, j ∈ A_0) → 1; that is, the adaptive group LASSO consistently selects the nonzero components. In addition, \sum_{j=1}^{q}\|\hat f_{kj} - f_{kj}\|_2^2 = O_p(n^{−2ϱ/(2ϱ+1)}).

Detailed proofs of Theorems 1–3 and Corollary 1 are given in the Appendix.

Remark 2. We note that, for the λ_{k1}, λ_{k2} and m_n given in Theorem 3, the number of zero components can be as large as exp(o(n^{2ϱ/(2ϱ+1)})), which is much larger than n. Thus, under the conditions of the theorems, the proposed adaptive group LASSO estimator is selection consistent and achieves the optimal convergence rate even when p is much larger than n.
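As a worked illustration of Corollary 1 and Remark 2 (with hypothetical numbers, not results from the paper), take ϱ = 2, i.e., r = 1 and ζ = 1, so that each f_kj has a Lipschitz first derivative:

```latex
\varrho = 2:\qquad m_n \asymp n^{1/(2\varrho+1)} = n^{1/5},\qquad
\sum_{j=1}^{q}\bigl\|\hat f_{kj}-f_{kj}\bigr\|_2^2
   = O_p\bigl(n^{-2\varrho/(2\varrho+1)}\bigr) = O_p\bigl(n^{-4/5}\bigr),
\qquad \log p = o\bigl(n^{4/5}\bigr).
```

For instance, with n = 150 time points we have n^{4/5} ≈ 55, so the number of candidate regulator genes p may grow exponentially in this quantity while selection consistency is retained.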

Remark 3. All of these theoretical results (Theorems 1–3 and Corollary 1) can be extended to the case of fixed design points t_1, ⋯, t_n, in which case we can assume that there exists a distribution function Q(t) with a positive and continuous density ρ(t) such that the empirical distribution Q_n(t) satisfies sup_{t∈[t_0,T]} |Q_n(t) − Q(t)| = o(K_k^{−1}). This assumption was also adopted by Zhou et al. (1998) and Zhou and Wolfe (2000).

For the pseudo-sparse additive models derived from a set of nonlinear/nonparametric ODEs, we have obtained theoretical results (Theorems 1–3 and Corollary 1) similar to those obtained by Huang et al. (2010) for nonparametric additive regression models. To prove these results, we develop an important lemma (Lemma A.1 in the Appendix) on the convergence rate of the projection involving the weighted residuals of the derivative estimates of the state variables in the ODEs. In addition, we use the weight functions D_{k1} and D_{k2} in the group LASSO objective function (2.6) and the adaptive group LASSO objective function (2.7), respectively. Under a parametric ODE model setting, Wu et al. (2012) used a weight function with similar boundary conditions for the estimation of constant coefficients in ODE models. They found that the parametric estimator has different convergence rates under different boundary conditions on the weight function: if the weight function is zero at both boundaries, the standard root-n rate for the constant ODE parameter estimates can be achieved; otherwise, only the optimal nonparametric convergence rate can be reached. In the setting of pseudo-sparse additive models, we achieve the optimal nonparametric convergence rate for the adaptive group LASSO estimator of the nonparametric functions in the SA-ODE model (Theorem 3) under the zero-boundary assumption on the weight function. If the zero-boundary assumption does not hold, a rate lower than the optimal nonparametric rate is expected, although we do not provide a detailed proof of this claim. The proofs of Theorems 1–3 and Corollary 1 adopt ideas similar to those in Huang et al. (2010), but we have to tackle additional challenges due to the differences between the pseudo-sparse additive model and the sparse additive regression model of Huang et al. (2010).

4. Simulation Studies

In this section, we design simulation experiments to validate the proposed variable selection method for the SA-ODE model. We consider a true SA-ODE model with 8 coupled ODEs as a dynamic network (simulation studies with a higher-dimensional SA-ODE model are prohibitive given current computational power):

X_1' = f_{1,1}(X_1) + f_{1,2}(X_2), \quad X_2' = f_{2,1}(X_1) + f_{2,2}(X_3), \quad X_3' = f_{3,1}(X_2) + f_{3,2}(X_5), \qquad (4.1)
X_4' = f_{4,1}(X_8), \quad X_5' = f_{5,1}(X_4) + f_{5,2}(X_6) + f_{5,3}(X_8), \quad X_6' = f_{6,1}(X_5), \qquad (4.2)
X_7' = f_{7,1}(X_6) + f_{7,2}(X_8), \quad X_8' = f_{8,1}(X_4) + f_{8,2}(X_6), \qquad (4.3)

and we set

f_{1,1}(x) = \sin(2x), \quad f_{1,2}(x) = 0.1x^2 - 1, \quad f_{2,1}(x) = x, \quad f_{2,2}(x) = e^x, \qquad (4.4)
f_{3,1}(x) = x^{1.5} - 2, \quad f_{3,2}(x) = 10\sin(2x)/(10 - \cos(2x)), \quad f_{4,1}(x) = 0.1x^2 - 0.5, \qquad (4.5)
f_{5,1}(x) = 10\cos(2x), \quad f_{5,2}(x) = 10\cos(2x) + 10x\sin(2x), \quad f_{5,3}(x) = x^{1/3} + x^{1/5}, \qquad (4.6)
f_{6,1}(x) = x\cos(2x), \quad f_{7,1}(x) = 0.1x + x^2, \quad f_{7,2}(x) = 0.01x^3, \qquad (4.7)
f_{8,1}(x) = |x|\cos(2x), \quad f_{8,2}(x) = 0.01x^3 + 0.01x^2. \qquad (4.8)

We assume the following measurement model:

Y_k(t_{ij}) = X_k(t_{ij}) + \varepsilon_k(t_{ij}), \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, n_k, \quad k = 1, 2, \ldots, 8. \qquad (4.9)

Note that, to mimic the real data example in the next section, we also allow replicate measurements, j = 1, 2, ⋯, n_k, at each time point for the kth equation, and the number of replicates may differ across equations. In our simulation studies, we set the numbers of replicates for the 8 equations to (n_1, n_2, n_3, n_4, n_5, n_6, n_7, n_8) = (50, 60, 70, 40, 50, 30, 45, 35), respectively, and the number of measurement time points to n = 15 and 150 (equally spaced), respectively. If the replicates are repeated measures or longitudinal data, as in our real data example in the next section, we can use the nonparametric mixed-effects smoothing approach (Wu and Zhang, 2006) for the first-step nonparametric smoothing, which is more efficient (Lu et al., 2011); our simulation studies also cover this case of longitudinal replicates. We first generated the mean trajectory of the observational data, X̄_k(t), by numerically solving the ODE models (4.1)–(4.3) with the initial values of the state variables (on the log scale), X_k(0) = X_k0, sampled from a normal distribution with mean 8 and standard deviation 5. We then assumed that the real observational data have a random departure b_ki from the mean trajectory at time t_i, where b_ki follows a standard normal distribution. In addition, the measurement error ε_k(t_ij) is assumed to follow a normal distribution with mean zero and variance σ², and we took σ² = 0.01 and 0.1, respectively, in our simulation studies. In summary, the observational data were generated as y_k(t_ij) = X̄_k(t_ij) + b_ki + ε_k(t_ij). The number of simulation runs is 100.
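The replicate-generating mechanism y_k(t_ij) = X̄_k(t_ij) + b_ki + ε_k(t_ij) is easy to sketch in R; the mean curves below are placeholders standing in for the numerical solution of (4.1)–(4.3), and the time scale is illustrative.

```r
# Sketch of the replicate-data mechanics in (4.9): placeholder mean curves,
# a time-level random effect b_ki shared across replicates, and i.i.d. noise.
n  <- 15; nk <- c(50, 60, 70, 40, 50, 30, 45, 35); sigma2 <- 0.1
tt <- seq(0, 1, length.out = n)
Xbar <- sapply(1:8, function(k) 8 + sin(2 * pi * k * tt))  # stand-in for ODE solution
Yrep <- lapply(1:8, function(k) {
  b <- rnorm(n)                                   # b_ki ~ N(0, 1), one per time point
  sapply(seq_len(nk[k]), function(j)
    Xbar[, k] + b + rnorm(n, sd = sqrt(sigma2)))  # y_k(t_ij) = Xbar_k + b_ki + eps
})
dim(Yrep[[1]])   # 15 x 50 matrix of y_1(t_ij)
```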

We evaluate the performance of the proposed adaptive group LASSO for the pseudo-sparse additive models derived from a set of ODEs based on the simulated data described above. In Step II, the number of basis functions for approximating the nonparametric components was taken as 12, and the data augmentation strategy (D’Haeseleer et al., 1999; Wessels et al., 2001; Bansal et al., 2006; Lu et al., 2011) was used (2000 data points were taken from the smoothed estimates in Step I). The simulation results are reported in Table 1. From Table 1 we can see that, when the sample size is small (n = 15) and the measurement error is large (σ² = 0.1), the true positive rate (TP) of the variable selection by the proposed adaptive group LASSO method ranges from 51% to 84%, and the false positive rate (FP) ranges from 33% to 45% across the eight ODEs. When the sample size is increased (n = 150), the minimum TP for the eight ODEs increases to 74% and the FP decreases to 17–30%. When the measurement error is reduced from σ² = 0.1 to σ² = 0.01 at sample size n = 15, the minimum TP for the eight ODEs is 68% and the FP decreases to 25–36%. When the sample size is increased (n = 150) and the measurement error is reduced (σ² = 0.01) simultaneously, the TP ranges from 88% to 100% and the FP further decreases to 7–16%. These simulation results show that the proposed adaptive group LASSO method performs reasonably well and approaches perfect selection when the measurement error is small and the sample size is large.

Table 1.

Simulation results for the adaptive group LASSO in Step IV. The numbers are the averages of the true positive rate (TP%) and false positive rate (FP%) over 100 simulation replicates for different scenarios of sample size and measurement error. Standard deviations are given in parentheses.

σ²     n     Eqn#   TP%          FP%           n     Eqn#   TP%          FP%
0.1    15    (1)    51 (0.57)    45 (0.64)     150   (1)    74 (0.39)    30 (0.27)
             (2)    55 (0.48)    44 (0.37)           (2)    78 (0.27)    27 (0.34)
             (3)    61 (0.56)    39 (0.42)           (3)    80 (0.37)    22 (0.4)
             (4)    84 (0.39)    45 (0.53)           (4)    100 (0.0)    17 (0.33)
             (5)    61 (0.72)    38 (0.61)           (5)    78 (0.31)    26 (0.33)
             (6)    70 (0.42)    33 (0.39)           (6)    94 (0.12)    25 (0.27)
             (7)    72 (0.37)    39 (0.48)           (7)    93 (0.22)    21 (0.33)
             (8)    75 (0.62)    37 (0.69)           (8)    96 (0.11)    19 (0.17)
0.01   15    (1)    71 (0.83)    29 (0.21)     150   (1)    88 (0.22)     7 (0.09)
             (2)    74 (0.51)    33 (0.31)           (2)    95 (0.13)    12 (0.16)
             (3)    68 (0.44)    25 (0.25)           (3)    94 (0.17)     8 (0.20)
             (4)    89 (0.26)    28 (0.63)           (4)    100 (0.0)    10 (0.34)
             (5)    68 (0.29)    29 (0.79)           (5)    91 (0.12)    16 (0.14)
             (6)    79 (0.47)    36 (0.58)           (6)    100 (0.0)     9 (0.18)
             (7)    80 (0.48)    31 (0.44)           (7)    97 (0.11)    13 (0.12)
             (8)    88 (0.33)    28 (0.83)           (8)    100 (0.0)    11 (0.23)

To illustrate the effect of Step V (the regular LASSO for shrinking basis coefficients) in the proposed variable selection procedure of Section 2, we plot the estimated nonparametric functions from Step IV (adaptive group LASSO) and Step V, together with the true functions, from one simulation run in Figure 1. From this figure we can see that the two sets of estimated nonparametric functions have similar trends and can capture the complex nonlinear functions, but the adaptive group LASSO estimates from Step IV are too wiggly (dotted lines), presumably because too many basis functions were included (under-smoothing). That is why Step V is necessary: shrinking some of the unnecessary basis coefficients to zero with the regular LASSO produces better (smoother) estimates (dashed lines in Figure 1). We can also see that the Step V estimates perform better at the boundaries and inflection points of the curves than the Step IV estimates, although this observation is based on only one simulation case. In general, the Step V estimates should be smoother than those of Step IV, since Step V is designed to correct the under-smoothing in Step IV. To quantify the gains from Step V, a comparison of the residual sum of squares (RSS) between Steps IV and V for each of the 15 nonlinear functions is listed in Table 2. The RSS for each function f_ij is defined as RSS = \sum_{l=1}^{n_{ss}} [f_{ij}(x_l) - \hat f_{ij}(x_l)]^2, where \hat f_{ij}(x_l) is the estimate from Step IV or V, f_{ij}(x_l) is the true function value, and n_ss is the number of evaluation points (here 2,000). Table 2 shows that Step V reduces the RSS relative to Step IV in most cases. However, for two of the 15 functions, f_22 and f_51, the RSS from Step V is larger than that from Step IV.
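The RSS criterion is straightforward to evaluate on a grid; in this sketch, fhat is a hypothetical fitted curve standing in for the Step IV or Step V estimate built from the fitted basis coefficients.

```r
# Sketch of the RSS comparison between Steps IV and V on an evaluation grid.
rss <- function(f_true, f_hat, x) sum((f_true(x) - f_hat(x))^2)
fhat <- function(x) sin(2 * x) + 0.05 * x        # hypothetical estimate of f_11
rss(function(x) sin(2 * x), fhat, seq(-2, 2, length.out = 2000))
```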

Figure 1. Simulation example: a comparison of the true functions (solid lines) and the estimated functions from the adaptive group LASSO in Step IV (dotted lines) and from Step V after further basis coefficient shrinkage (dashed lines), from one simulation run.

Table 2.

Comparison of residual sums of squares (RSS) from Steps IV and V.

          f11      f12     f21      f22      f31      f32    f41    f51
          (×10³)   (×10)   (×10³)   (×10³)   (×10³)
Step IV   7.0      53.1    8.2      1.5      34.3     7.0    9.0    1.1
Step V    6.8      9.9     8.0      2.7      8.6      5.8    1.2    2.8

          f52      f53       f61      f71      f72     f81     f82
          (×10⁶)   (×10⁻²)   (×10⁵)   (×10⁶)   (×10)   (×10²)  (×10³)
Step IV   8.6      35.3      1.5      1.2      19.2    8.0     4.0
Step V    8.1      9.0       1.4      1.1      8.4     5.1     3.9

5. Application: Identification of Nonlinear Dynamic Gene Regulatory Networks

In this section, we apply the proposed method to identify a nonlinear dynamic gene regulatory network based on time course microarray data for T-cell activation. The central event in the generation of an immune response is the activation of T lymphocytes (T cells). Activated T cells proliferate and produce cytokines involved in the regulation of effector cells such as B cells and macrophages, which are the primary mediators of the immune response. T-cell activation is initiated by the interaction between the T-cell receptor (TCR) complex and an antigenic peptide presented on the surface of an antigen-presenting cell. This event triggers a network of signaling molecules, including kinases, phosphatases and adaptor proteins, that couple the stimulatory signal received from the TCR to gene transcription events in the nucleus (Iwashima et al., 1994; Ley et al., 1991).

In order to better understand the gene regulatory network during T-cell activation, Rangel et al. (2004) performed two experiments to characterize the response of a human T-cell line (Jurkat) to PMA and ionomycin treatment. In the first experiment, they monitored the expression of 88 genes using cDNA array technology across 10 unequally spaced time points (0, 2, 4, 6, 8, 18, 24, 32, 48, 72 h), and each gene was replicated 34 times. In the second experiment, an identical experimental protocol was used, but additional genes were added to the arrays and each gene was replicated 10 times. Genes that displayed very poor reproducibility between the two experiments were removed, and the expression data for 58 genes were considered for final analysis by Rangel et al. (2004). Details of data collection and preprocessing can be found in Rangel et al. (2004). Rangel et al. (2004) applied a linear state space model (SSM) to identify the transcriptional network based on these highly replicated expression profiling data. Beal et al. (2005) proposed a variational Bayesian treatment for identifying a suitable dimension of the hidden state space in the SSM. In this paper, we apply the proposed SA-ODE model and the variable selection method of the previous sections to establish a nonlinear dynamic regulatory network among the 58 genes for the T-cell activation process. Through this application example, we will demonstrate that some of the gene regulation effects are essentially nonlinear and that a linear network model may not be sufficient to capture the nonlinear features of the dynamic network.

We consider the 58 T-cell activation genes that were identified as reproducible between the two experiments by Rangel et al. (2004). For consistency, we only use the expression data from the first experiment, in which each gene was replicated 34 times at 10 time points. The 34 replicates for each gene showed a similar expression pattern during the T-cell activation experiments, so it is legitimate to smooth the data for each gene using the nonparametric mixed-effects smoothing splines technique (Wu and Zhang, 2006; Lu et al., 2011) to obtain estimates of the mean expression curve and its derivative curve for each gene. In the second step, these estimates were plugged into ODE model (1.3) to form the pseudo-sparse additive (PSA) model (2.3). The penalized pseudo-least squares methods, namely the group LASSO and adaptive group LASSO approaches, were used to identify significant regulations (connections) among the 58 genes with potentially nonlinear regulation effects. We obtained the fitted curves for all genes by numerically integrating the 58 equations concurrently, with parameters estimated from the adaptive group LASSO approach in Step IV and after shrinking the basis coefficients using the regular LASSO in Step V, respectively.

We plot the fitted expression curves (dashed lines) for 9 genes from Step V of the proposed procedure, overlaid with the raw data (dots) and the smoothed mean curves (solid lines) from Step I, in Figure 2. For comparison purposes, the estimation results from the linear ODE model (Lu et al., 2011) were also obtained and plotted (dotted lines). From this figure, we can see that the proposed nonparametric additive ODE models fit the data better (and are closer to the smoothed mean curves) than the linear ODE model fit. We also notice that the proposed SA-ODE models not only fit the simple monotonic curves well, but also flexibly fit complex nonlinear curves reasonably well. For each of the 58 genes, we list the regulatory genes identified by the proposed method in Table 3. One important feature of the identified GRN is that each of these 58 genes is regulated by only a few other genes (ranging from 1 to 8), which reflects the sparseness of the network connections.

Figure 2. Real data analysis for the T-cell activation experimental data (dots): estimation results for gene FYB (gene 45) and 9 genes regulated by gene FYB, including the nonparametric smoothing estimates of the mean expression curves from Step I (solid lines), the estimated curves (dashed lines) based on the proposed SA-ODE model from Step V of the proposed procedure, and the estimated curves (dotted lines) using the linear ODE model fit of Lu et al. (2011).

Table 3.

T-cell activation gene regulatory network: variable selection results from the adaptive group LASSO. For a particular gene listed in Columns 1 and 4, Columns 2 and 5 list the genes that have a significant regulation effect (inward influence) on this gene, and Columns 3 and 6 list the genes on which this particular gene has a significant regulation effect (outward influence).

Gene Inward Influence Genes Outward Influence Genes Gene Inward Influence Genes Outward Influence Genes
1 29, 32, 43 30 6, 46, 52
2 32, 41, 45 31 25 6, 10, 23, 27, 35, 46, 48
3 29, 35, 41 32 48 1, 2, 5, 21, 24, 26, 56, 57
4 25 33 27, 28, 41 11, 23
5 32, 35 34 6, 27
6 6, 25, 31 6, 9, 16, 26, 30, 34, 42, 52 35 20, 31 3, 5, 9, 14, 18, 20, 21, 38
7 45, 52, 57 11, 23 36 37 23
8 25 37 25 12, 13, 16, 22, 27, 36, 40, 56
9 6, 17, 27, 28, 35, 45 38 11, 28, 29, 35
10 31, 51, 52 39 25 53
11 7, 13, 33, 46, 57 14, 18, 21, 38 40 37, 46
12 29, 37 41 20, 55 2, 3, 15, 17, 25, 33, 42, 50, 58
13 37 11, 14, 15, 23, 50 42 6, 41, 46
14 11, 13, 20, 28, 29, 35 43 27 1
15 13, 41, 45 44 28, 29, 45, 57
16 6, 27, 37 45 52 2, 7, 9, 15, 18, 44, 46, 49, 55
17 41, 48 9, 54 46 31, 45 11, 23, 24, 30, 40, 42, 58
18 11, 35, 45, 54 47 27
19 25 58 48 31, 48 17, 28, 32, 48
20 29, 35, 55 14, 25, 35, 41, 57 49 35, 45, 52
21 11, 28, 32, 35 50 13, 41, 57
22 27, 37, 52 51 25 10
23 7, 13, 31, 33, 36, 46, 57 52 6 7, 10, 22, 29, 30, 45, 49, 53, 55
24 32, 46, 57 53 39, 52
25 20, 28, 41 4, 6, 8, 19, 26, 31, 37, 39, 51 54 17
26 6, 25, 32 55 27, 45, 52 20, 41
27 31, 37 9, 16, 22, 33, 34, 43, 47, 55, 58 56 32, 37
28 48, 57 9, 14, 21, 25, 33, 38, 44 57 20, 32 7, 11, 23, 24, 28, 44, 50
29 52 1, 3, 12, 14, 20, 38, 44 58 19, 27, 41, 46

In agreement with the findings of Rangel et al. (2004), our model identified the important adaptor molecule in the T-cell receptor signaling pathway, FYB (gene 45 in Table 3), as one of the genes having the highest number of outward connections. In addition to the 6 FYB-regulated genes found by Rangel et al. (2004), we identified three more FYB-regulated genes, which carry functions in proliferation (gene 2), interference (gene 44) and apoptosis (gene 49). This is presumably because the proposed nonparametric additive ODE model and the variable selection method allow us to identify significant nonlinear regulation effects, in contrast to the linear state space model (SSM) used by Rangel et al. (2004). As another interesting finding based on the proposed model, we identified the direct regulation of integrin (gene 15) by FYB, which has already been corroborated experimentally (Peterson et al., 2001). In contrast, the linear SSM used by Rangel et al. (2004) only found an indirect regulation mediated by IL3Rθ (gene 55) and TRAF5 (gene 3). The estimated regulation functions of FYB (gene 45 in Table 3) on other genes are shown in Figure 3. Our results show that gene FYB always has a positive regulation effect on genes 9, 18, 46 and 55, but a negative effect on gene 7, which agrees with the findings of Rangel et al. (2004). For gene 15, which was found to be indirectly regulated by FYB in Rangel et al. (2004), we discover that the direct regulation effect is positive when the FYB expression level is low and becomes negative when the FYB expression level is high. Similarly, for the three newly discovered FYB-regulated genes (genes 2, 44 and 49), we found that the regulation effect changes from positive to negative when the FYB expression is high. This demonstrates that the proposed nonparametric additive ODE model may help us discover not only nonlinear regulation effects, but also effects that vary with the expression level of the regulator genes.

Figure 3. Estimated regulation functions of genes influenced by gene 45 (FYB).

We use the R package ‘igraph’ to plot the obtained GRN in Figure 4 in order to visualize the complete network. From this figure, we can clearly see that there are 14 ‘big’ regulator genes, each of which regulates more than 5 other genes (see also Table 3). Because of space limitations, we will report more detailed annotations and biological implications of this established dynamic GRN for T-cell activation elsewhere. Notice that any statistical method or modeling approach can only identify potential regulation effects in actual GRNs; some of these findings can usually be confirmed by the existing literature, and others may need new experiments for validation. We expect the proposed SA-ODE model and the variable selection methods to provide a flexible tool for exploring and identifying additive nonlinear regulation effects in GRNs.
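Producing Figure 4 amounts to converting the regulator–target pairs of Table 3 into a directed edge list for igraph; the sketch below includes only the regulators of genes 1 and 2 for brevity, and the hub rule mirrors the "more than 5 targets" criterion described above.

```r
# Sketch of the network visualization: directed edges run from regulator
# to target; 'edges' lists only the regulators of genes 1 and 2 (Table 3).
library(igraph)
edges <- matrix(c(29, 1,  32, 1,  43, 1,    # genes 29, 32, 43 -> gene 1
                  32, 2,  41, 2,  45, 2),   # genes 32, 41, 45 -> gene 2
                ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(apply(edges, 2, as.character), directed = TRUE)
hubs <- names(which(degree(g, mode = "out") > 5))   # the 'big' regulators
V(g)$color <- ifelse(V(g)$name %in% hubs, "skyblue", "grey90")
plot(g, edge.arrow.size = 0.4)
```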

Figure 4. Graph of the T-cell activation GRN formed by 58 genes. Each gene is represented as a node; arrows indicate the direction of influence. The 14 “big” regulators are shown in blue.

6. Concluding Remarks

In this paper, we have proposed a sparse additive ordinary differential equation (SA-ODE) model for dynamic gene regulatory networks (GRNs) in order to capture nonlinear regulation effects. A five-step variable selection and estimation procedure coupling the two-stage smoothing-based ODE estimation method and LASSO techniques is developed. The asymptotic properties of the proposed methods are established. Simulation studies are conducted to demonstrate the good performance of the proposed variable selection method. We have successfully applied the proposed method to construct a nonlinear dynamic GRN for T-cell activation, and discovered additional new regulation effects in the complicated nonlinear network consisting of 58 genes, compared with the existing linear network modeling approach. The real data analysis results also show that the proposed SA-ODE model can not only capture complex nonlinear regulation effects, but also identify effects that vary with the expression levels of regulator genes.

In the theoretical development in Section 3, we have shown that the adaptive group LASSO selects the correct model with probability approaching 1 and achieves the optimal nonparametric convergence rate for the proposed SA-ODE model; these results are similar to those obtained by Huang et al. (2010) for a nonparametric additive regression model. Notice that we convert the proposed SA-ODE model into a ‘pseudo’ sparse additive (PSA) model, which differs from the additive regression model in Huang et al. (2010) in that both the response variables and the covariates in the PSA model are derived from the first-step nonparametric smoothing of the state variables and their derivatives, instead of from observed data, and the error terms in the PSA model are not i.i.d. but dependent. To tackle the challenges of this complex model, we used a weight function with boundary restrictions in the penalized least squares (LASSO) criteria and established a critical lemma in order to achieve the optimal convergence rate.

Alternative models for dynamic GRNs include Boolean networks, Bayesian networks, and hidden Markov models, among others (Liang et al., 1998; Akutsu et al., 2000; Shmulevich et al., 2002; Martin et al., 2007; Murphy and Mian, 1999; Friedman et al., 2000; Hartemink et al., 2001; Zou and Conzen, 2005; Song et al., 2009; Hirose et al., 2008; Kojima et al., 2009; Shimamura et al., 2009; Gupta et al., 2007). It would be interesting to compare these modeling approaches with the proposed ODE modeling method in terms of computational cost and the likelihood of recovering the true network, but the computational cost of such a comparison is beyond our current computational power. We employed the two-stage smoothing-based ODE estimation method for the SA-ODE model in order to reduce the computational cost and simplify the implementation of LASSO-based variable selection methods. One limitation of the two-stage smoothing-based ODE estimation method is its requirement that all state variables in the system be directly observed; it remains a challenging problem to perform variable selection for high-dimensional ODE models with partially observed state variables. Other, more efficient estimation approaches for ODE models, such as the numerical discretization-based estimation method (Wu et al., 2012) and the generalized profiling approach (Ramsay et al., 2007), may also be considered for high-dimensional ODE variable selection. We believe this is worth future exploration along similar lines, although the computational cost and implementation details need to be considered carefully. In the proposed SA-ODE model (1.3), we consider only a nonparametric additive structure of the related variables. It could be extended to a nonparametric structure involving interactions (Radchenko and James, 2010), since co-regulation occurs frequently in gene regulatory networks. We are currently working on this problem and will report the results elsewhere.

Acknowledgments

The authors would like to thank the editor, an associate editor, and two referees for their constructive comments and suggestions. This research was partially supported by the NIAID/NIH grants HHSN272201000055C and AI087135, as well as two University of Rochester CTSI pilot awards (UL1RR024160) from the National Center For Research Resources. Liang’s research was partially supported by NSF grants DMS-1007167 and DMS-1207444, and by Award Number 11228103 from the National Natural Science Foundation of China.

APPENDIX

Proofs

We can prove the theoretical results in Theorems 1–3 and Corollary 1 along the lines of Huang et al. (2010), but there are additional technical challenges to deal with. The major differences between the ‘pseudo’ sparse additive (PSA) model (2.3) and the additive regression model in Huang et al. (2010) are: i) both the response variables and the covariates in the PSA model are derived from the first-step nonparametric smoothing of the state variables and their derivatives, instead of from observed data; and ii) the error terms ϒ_{ki} in the PSA model are not i.i.d. but dependent. To tackle these problems, we establish the following lemma. For 1 ≤ k ≤ p, let

T_{jm} = n^{-1/2} m_n^{1/2} \sum_{i=1}^{n} \psi_m(X_j(t_i))\, d_{k1}^{1/2}(t_i)\,\bigl(\hat X_k'(t_i) - E[\hat X_k'(t_i)]\bigr)

for 1 ≤ j ≤ p, 1 ≤ m ≤ m_n (for notational simplicity, we suppress the dependence on k), and T_n = max_{1≤j≤p, 1≤m≤m_n} |T_{jm}|.

Lemma A.1 Under Assumptions A, B2–B4 and B6 in Section 3, we have E(T_n) = O(1)\sqrt{\log(pm_n)}.

Proof: From (2.2), we know that Tjm can be expressed as follows.

T_{jm} = n^{-1/2} m_n^{1/2} \sum_{i=1}^{n} \psi_m(X_j(t_i))\, d_{k1}^{1/2}(t_i)\,[N_{k,\nu+1}'(t_i)]^T (N_k^T N_k + \lambda_k V_k)^{-1} N_k^T \varepsilon_k
       = n^{-1/2} m_n^{1/2} \sum_{\omega=1}^{n} \Bigl[\sum_{i=1}^{n} \psi_m(X_j(t_i))\, d_{k1}^{1/2}(t_i)\, g_\omega(t_i)\Bigr]\, \varepsilon_k(t_\omega),

with ε_k = (ε_k(t_1), ⋯, ε_k(t_n))^T and g_ω(t_i) being the ω-th component of the n-dimensional vector [N′_{k,ν+1}(t_i)]^T (N_k^T N_k + λ_k V_k)^{-1} N_k^T. Under Assumption B3, the ε_k(t_i)’s are i.i.d. and sub-Gaussian. Moreover, for given t_i’s, the coefficient \sum_{i=1}^{n} ψ_m(X_j(t_i))\, d_{k1}^{1/2}(t_i)\, g_ω(t_i) of ε_k(t_ω) is fixed. Therefore, conditional on the t_i’s, the T_{jm}’s are sub-Gaussian.

Let s_{jm}^2 = Var(T_{jm} | t_1, ⋯, t_n) and s_n^2 = max_{1≤j≤p, 1≤m≤m_n} s_{jm}^2. Then by the maximal inequality for sub-Gaussian random variables (Lemmas 2.2.1 and 2.2.2, van der Vaart and Wellner, 1996), we have E(max_{1≤j≤p, 1≤m≤m_n} |T_{jm}| \mid t_1, ⋯, t_n) ≤ C_1 s_n \sqrt{\log(pm_n)}, where C_1 > 0 is a constant. Therefore,

E\Bigl(\max_{1\le j\le p,\, 1\le m\le m_n} |T_{jm}|\Bigr) \le C_1 \sqrt{\log(pm_n)}\, E(s_n). \qquad (A.1)

Now we discuss the order of E(s_n). Similar to Wu et al. (2012), we consider the integral approximation of T_{jm} because of the boundary effects. By the strong law of large numbers, we have

\frac{1}{n}\sum_{i=1}^{n} \psi_m(X_j(t_i))\, d_{k1}^{1/2}(t_i)\,\bigl(\hat X_k'(t_i) - E[\hat X_k'(t_i)]\bigr)
   = \int_{t_0}^{T} \psi_m(X_j(t))\, d_{k1}^{1/2}(t)\,\rho(t)\,\bigl(\hat X_k'(t) - E[\hat X_k'(t)]\bigr)\,dt + o(1). \qquad (A.2)

From integration by parts, it follows that

\int_{t_0}^{T} \psi_m(X_j(t))\, d_{k1}^{1/2}(t)\,\rho(t)\,\bigl(\hat X_k'(t) - E[\hat X_k'(t)]\bigr)\,dt
   = \psi_m(X_j(t))\, d_{k1}^{1/2}(t)\,\rho(t)\,\bigl(\hat X_k(t) - E[\hat X_k(t)]\bigr)\Bigr|_{t_0}^{T}
     - \int_{t_0}^{T} \bigl(\hat X_k(t) - E[\hat X_k(t)]\bigr)\,\frac{d}{dt}\bigl[\psi_m(X_j(t))\, d_{k1}^{1/2}(t)\,\rho(t)\bigr]\,dt
   = -\int_{t_0}^{T} \bigl(\hat X_k(t) - E[\hat X_k(t)]\bigr)\,\frac{d}{dt}\bigl[\psi_m(X_j(t))\, d_{k1}^{1/2}(t)\,\rho(t)\bigr]\,dt,

where E[\hat X_k'(t)] = (E[\hat X_k(t)])' and the second equality holds because of the boundary condition d_{k1}(t_0) = d_{k1}(T) = 0. Denote A(t) = \psi_m(X_j(t))\, d_{k1}^{1/2}(t)\,\rho(t) and

\Sigma_X = \int_{t_0}^{T}\int_{t_0}^{T} A'(s)\, \mathrm{Cov}\bigl\{\hat X_k(s) - E[\hat X_k(s)],\ \hat X_k(t) - E[\hat X_k(t)] \mid t_1, \ldots, t_n\bigr\}\, A'(t)\, ds\, dt.

By Hölder’s inequality, we have

\Sigma_X \le \|A'(s)A'(t)\|_\infty \int_{t_0}^{T}\int_{t_0}^{T} \bigl|\mathrm{Cov}\{\hat X_k(s) - E[\hat X_k(s)],\ \hat X_k(t) - E[\hat X_k(t)] \mid t_1, \ldots, t_n\}\bigr|\, ds\, dt. \qquad (A.3)

From the proof of Lemma 6 in Wu et al. (2012), we know that

\int_{t_0}^{T}\int_{t_0}^{T} \bigl|\mathrm{Cov}\{\hat X_k(s) - E[\hat X_k(s)],\ \hat X_k(t) - E[\hat X_k(t)] \mid t_1, \ldots, t_n\}\bigr|\, ds\, dt \le C_2\, n^{-1} \qquad (A.4)

for some positive constant C_2. By the extremal equality, it follows that \|A'(s)A'(t)\|_\infty = \sup\{|\int_{t_0}^{T}\int_{t_0}^{T} A'(s)A'(t)\,g(s,t)\,ds\,dt| : g ∈ L_1(\mu, \mu), \|g\|_1 \le 1\}, where L_1(μ, μ) is the bivariate Lebesgue function space with the L_1-norm. Continuing to apply integration by parts and the boundary condition d_{k1}(t_0) = d_{k1}(T) = 0, we have

\int_{t_0}^{T}\int_{t_0}^{T} A'(s)A'(t)\,g(s,t)\,ds\,dt
   = \int_{t_0}^{T} A'(s)\Bigl\{[g(t,s)A(t)]\Bigr|_{t=t_0}^{t=T} - \int_{t_0}^{T} A(t)\,\frac{\partial g(t,s)}{\partial t}\,dt\Bigr\}\,ds
   = -\int_{t_0}^{T}\int_{t_0}^{T} A'(s)A(t)\,\frac{\partial g(t,s)}{\partial t}\,ds\,dt
   = -\int_{t_0}^{T} A(t)\,dt \int_{t_0}^{T} \frac{\partial g(t,s)}{\partial t}\,dA(s)
   = \int_{t_0}^{T}\int_{t_0}^{T} A(s)A(t)\,\frac{\partial^2 g(t,s)}{\partial t\,\partial s}\,ds\,dt.

Then \bigl|\int_{t_0}^{T}\int_{t_0}^{T} A(s)A(t)\,g(s,t)\,ds\,dt\bigr| \le C_3\, \mathrm{Cov}[\psi_m(X_j(s)), \psi_m(X_j(t))] for some positive constant C_3. By the properties of B-splines, we have

\mathrm{Cov}[\psi_m(X_j(s)), \psi_m(X_j(t))] \le C_4\, m_n^{-1} \qquad (A.5)

for some positive constant C_4. Combining (A.3)–(A.5), it follows that \Sigma_X \le C n^{-1} m_n^{-1} for some positive constant C. Then from (A.2), we have that s_{jm}^2 = Var(T_{jm} | t_1, ⋯, t_n) \le C. Therefore E s_n \le (E s_n^2)^{1/2} \le C for some positive constant C. Thus from (A.1), we have

E\Bigl(\max_{1\le j\le p,\, 1\le m\le m_n} |T_{jm}|\Bigr) \le C \sqrt{\log(pm_n)}

for some positive constant C.

Proof of Theorem 1. The proof of part (i) is similar in spirit to the proof of Theorem 1 in Huang et al. (2010). But some technical challenges have to be tackled since here we need to consider the estimation errors of k(t) and k(t) (involving the measurement error) and the approximation error of the additive regression functions by splines. Specifically, from (2.3) and Taylor expansion, we have

ϒki*=k(ti)Xk(ti)+j=1qfkj(Xj(ti))j=1qfkj*(j(ti))=k(ti)Xk(ti)+j=1qfkj(Xj(ti))j=1qfkj(j(ti))+j=1qfkj(j(ti))j=1qfkj*(j(ti))=k(ti)Xk(ti)+j=1qfkj(Xji)(Xj(ti)j(ti))+j=1q[fkj(j(ti))fkj*(j(ti))]=k(ti)E[k(ti)]j=1qfkj(Xji){j(ti)E[j(ti)]}+E[k(ti)]Xk(ti)j=1qfkj(Xji)E[j(ti)]Xj(ti)}+j=1q[fkj(j(ti))fkj*(j(ti))],

where $X_{ji}$ lies between $\hat X_j(t_i)$ and $X_j(t_i)$.

For $D_{k1} = \mathrm{diag}\{d_{k1}(t_1),\cdots,d_{k1}(t_n)\}$, we have $D_{k1}^{1/2} = \mathrm{diag}\{d_{k1}^{1/2}(t_1),\cdots,d_{k1}^{1/2}(t_n)\}$. Let $\xi_k = D_{k1}^{1/2}\Upsilon_k^* = \Xi_k + \Pi_k + \delta_k$, where $\Upsilon_k^* = (\Upsilon_{k1}^*,\cdots,\Upsilon_{kn}^*)^T$, $\Xi_k = (\Xi_k(t_1),\cdots,\Xi_k(t_n))^T$, $\Pi_k = (\Pi_{k1},\cdots,\Pi_{kn})^T$ and $\delta_k = (\delta_{k1},\cdots,\delta_{kn})^T$, with
$$\Xi_k(t_i) = d_{k1}^{1/2}(t_i)\Bigl(\hat X_k'(t_i) - E[\hat X_k'(t_i)] - \sum_{j=1}^q f_{kj}'(X_{ji})\bigl\{\hat X_j(t_i) - E[\hat X_j(t_i)]\bigr\}\Bigr),$$
$$\Pi_{ki} = d_{k1}^{1/2}(t_i)\Bigl(E[\hat X_k'(t_i)] - X_k'(t_i) - \sum_{j=1}^q f_{kj}'(X_{ji})\bigl\{E[\hat X_j(t_i)] - X_j(t_i)\bigr\}\Bigr),$$
$$\delta_{ki} = d_{k1}^{1/2}(t_i)\sum_{j=1}^q\bigl[f_{kj}(\hat X_j(t_i)) - f_{kj}^*(\hat X_j(t_i))\bigr].$$
For any integer $m$, let

$$\chi_m = \max_{|A|=m}\max_{\substack{\|U_{Ak}\|_2=1\\ 1\le k\le m}}\frac{|\xi_k^TV_A(S_A)|}{\|V_A(S_A)\|_2},\qquad\text{and}\qquad \chi_m^* = \max_{|A|=m}\max_{\substack{\|U_{Ak}\|_2=1\\ 1\le k\le m}}\frac{|\Xi_k^TV_A(S_A)|}{\|V_A(S_A)\|_2},$$

where $V_A(S_A) = D_{k1}^{1/2}Z_A(Z_A^TD_{k1}Z_A)^{-1}S_A - (I-P_A)Z\beta$ for $|A| = q_1 = m \ge 0$ (here the inverse matrix can also be replaced by the generalized inverse), $S_A = (S_{A1}^T,\cdots,S_{Am}^T)^T$, $S_{Ak} = \lambda_{k1}U_{Ak}$ and $\|U_{Ak}\|_2 = 1$. Then, from the expressions for $\hat X_k(t)$ and $\hat X_k'(t)$, we have
$$\Xi_k = D_{k1}^{1/2}\Bigl[N_k'(N_k^TN_k+\lambda_kV_k)^{-1}N_k^T\varepsilon_k - \sum_{j=1}^q\mathrm{diag}\bigl\{f_{kj}'(X_{j1}),\cdots,f_{kj}'(X_{jn})\bigr\}\,N_j(N_j^TN_j+\lambda_jV_j)^{-1}N_j^T\varepsilon_j\Bigr]$$
with $N_k = \{N_{k,\nu+1}(t_1),\cdots,N_{k,\nu+1}(t_n)\}^T$ and $N_k' = \{N_{k,\nu+1}'(t_1),\cdots,N_{k,\nu+1}'(t_n)\}^T$.
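To make the smoothing operators in the display above concrete, the following sketch (our own construction; the knot placement and the first-derivative roughness penalty $V$ are simplifying assumptions, not the choices used in the paper) builds the penalized-spline matrices $N(N^TN+\lambda V)^{-1}N^T$ and $N'(N^TN+\lambda V)^{-1}N^T$ that map the observations to $\hat X_k(t_i)$ and $\hat X_k'(t_i)$; the rows of the derivative smoother contain the weights $g_\omega(t_i)$ used in the proof of Lemma A.1:

```python
# A minimal sketch of the first-step penalized-spline smoother (assumptions ours).
import numpy as np
from scipy.interpolate import BSpline

n, nbasis, lam, deg = 50, 12, 1.0, 3                 # cubic B-splines
t = np.linspace(0.0, 1.0, n)
interior = np.linspace(0.0, 1.0, nbasis - deg + 1)[1:-1]
knots = np.r_[np.zeros(deg + 1), interior, np.ones(deg + 1)]

def basis_matrix(x, deriv=0):
    """Columns: the nbasis B-spline basis functions (or derivatives) at x."""
    cols = []
    for j in range(nbasis):
        b = BSpline(knots, np.eye(nbasis)[j], deg)
        cols.append(b.derivative(deriv)(x) if deriv else b(x))
    return np.column_stack(cols)

N, Nd = basis_matrix(t), basis_matrix(t, deriv=1)
V = Nd.T @ Nd / n                 # illustrative roughness penalty (first derivative)
S = np.linalg.solve(N.T @ N + lam * V, N.T)          # (N^T N + lam*V)^{-1} N^T
hat_fun, hat_der = N @ S, Nd @ S  # rows: smoothing weights for the fit and its derivative
```

Applying `hat_fun` and `hat_der` to a vector of noisy observations returns the fitted state values and fitted derivative values at the design points, respectively.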

For a sufficiently large constant $C_1 > 0$, define

$$\Omega_{m_0} = \Bigl\{(Z,\Xi_k): \chi_m \le \sigma_kC_1\sqrt{(m\vee 1)\,m_n\log(pm_n)}\ \text{ for all } m \ge m_0\Bigr\}$$

and

$$\Omega_{m_0}^* = \Bigl\{(Z,\Xi_k): \chi_m^* \le \sigma_kC_1\sqrt{(m\vee 1)\,m_n\log(pm_n)}\ \text{ for all } m \ge m_0\Bigr\},$$

where $m_0 \ge 0$. Similar to the proof of Theorem 2.1 of Wei and Huang (2010), it can be shown that $(Z,\Xi_k)\in\Omega_q$ implies $|\tilde A_1| \le M_1q$ for a constant $M_1 > 1$, and

$$P(\Omega_q^*) \to 1. \tag{A.6}$$

By the triangle and Cauchy-Schwarz inequalities,

$$\frac{|\xi_k^TV_A(S_A)|}{\|V_A(S_A)\|_2} = \frac{|(\Xi_k+\Pi_k+\delta_k)^TV_A(S_A)|}{\|V_A(S_A)\|_2} \le \frac{|\Xi_k^TV_A(S_A)|}{\|V_A(S_A)\|_2} + \|\Pi_k\|_2 + \|\delta_k\|_2. \tag{A.7}$$

Since $\|f_{kj}-f_{kj}^*\|_2 = O(m_n^{-\varrho}) = O(n^{-\varrho/(2\varrho+1)})$ for $m_n = n^{1/(2\varrho+1)}$, we have that, for all $m \ge q$ and $n$ sufficiently large,

$$\|\delta_k\|_2 \le C_2\sqrt{n\,q\,m_n^{-2\varrho}} = C_2\,q^{1/2}\,n^{1/(4\varrho+2)} \le \sigma_kC_1\sqrt{(m\vee 1)\,m_n\log(pm_n)}, \tag{A.8}$$

for some constant $C_2 > 0$. From Lemma 1(i) of Wu et al. (2012) and Theorem 2(a) of Claeskens et al. (2009), we have $E[\hat X_k'(t)] - X_k'(t) = O(\kappa^\nu) + O(\lambda_kn^{-1}\kappa^{-3}) = O(n^{-\nu/(2\nu+3)})$ and $E[\hat X_k(t)] - X_k(t) = O(\kappa^{\nu+1}) + O(\lambda_kn^{-1}\kappa^{-2}) = O(n^{-(\nu+1)/(2\nu+3)})$ for $\kappa = n^{-1/(2\nu+3)}$ and $\lambda_k = O(n^{\nu/(2\nu+3)})$. Then $\Pi_{ki} = O(qn^{-\nu/(2\nu+3)})$. Under Assumption B5, it follows that $3/(4\nu+6) \le 1/(4\varrho+2)$. Then, for all $m \ge q$ and $n$ sufficiently large, we have

$$\|\Pi_k\|_2 \le C_3\sqrt{n\,q\,n^{-2\nu/(2\nu+3)}} = C_3\,q^{1/2}\,n^{3/(4\nu+6)} \le \sigma_kC_1\sqrt{(m\vee 1)\,m_n\log(pm_n)}, \tag{A.9}$$

for some constant $C_3 > 0$. Finally, it follows from (A.6)–(A.9) that $P(\Omega_q) \to 1$. This completes the proof of part (i) of Theorem 1.
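For concreteness, here is an illustrative instantiation of these rates (our own numerical example, not part of the original argument). Take $\varrho = 2$ and the smallest spline order permitted by Assumption B5, $\nu = 3\varrho = 6$, so that $3/(4\nu+6) = 1/(4\varrho+2) = 1/10$. Then
$$m_n = n^{1/5},\qquad \kappa = n^{-1/15},\qquad \|\delta_k\|_2 = O\bigl(q^{1/2}n^{1/10}\bigr),\qquad \|\Pi_k\|_2 = O\bigl(q^{1/2}n^{1/10}\bigr),$$
while the threshold on the right-hand sides of (A.8) and (A.9) is of order $\sqrt{(m\vee 1)\,m_n\log(pm_n)} = \sqrt{(m\vee 1)\log(pm_n)}\;n^{1/10}$. Since $m \ge q$, both (A.8) and (A.9) indeed hold for all $n$ sufficiently large.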

Before proving part (ii), we first prove part (iii) of Theorem 1. By the definition of $\tilde\beta_k = (\tilde\beta_{k1}^T,\cdots,\tilde\beta_{kp}^T)^T$,

$$(H_k - Z\tilde\beta_k)^TD_{k1}(H_k - Z\tilde\beta_k) + \lambda_{k1}\sum_{j=1}^p\|\tilde\beta_{kj}\|_2 \le (H_k - Z\beta_k)^TD_{k1}(H_k - Z\beta_k) + \lambda_{k1}\sum_{j=1}^p\|\beta_{kj}\|_2. \tag{A.10}$$

Let $A_2 = \{j : \|\beta_{kj}\|_2 \ne 0 \text{ or } \|\tilde\beta_{kj}\|_2 \ne 0\}$ and $d_{k2} = |A_2|$. By part (i), $d_{k2} = O_p(q)$. By (A.10) and the definition of $A_2$,

$$(H_k - Z_{A_2}\tilde\beta_{kA_2})^TD_{k1}(H_k - Z_{A_2}\tilde\beta_{kA_2}) + \lambda_{k1}\sum_{j\in A_2}\|\tilde\beta_{kj}\|_2 \le (H_k - Z_{A_2}\beta_{kA_2})^TD_{k1}(H_k - Z_{A_2}\beta_{kA_2}) + \lambda_{k1}\sum_{j\in A_2}\|\beta_{kj}\|_2. \tag{A.11}$$

Let $\eta_k = D_{k1}^{1/2}(H_k - Z\beta_k)$ and $\nu_k = D_{k1}^{1/2}Z_{A_2}(\tilde\beta_{kA_2}-\beta_{kA_2})$, and write $D_{k1}^{1/2}(H_k - Z_{A_2}\tilde\beta_{kA_2}) = D_{k1}^{1/2}(H_k - Z\beta_k) - D_{k1}^{1/2}Z_{A_2}(\tilde\beta_{kA_2}-\beta_{kA_2}) = \eta_k - \nu_k$. We have $(H_k - Z_{A_2}\tilde\beta_{kA_2})^TD_{k1}(H_k - Z_{A_2}\tilde\beta_{kA_2}) = \eta_k^T\eta_k - 2\eta_k^T\nu_k + \nu_k^T\nu_k$, so (A.11) can be rewritten as $\nu_k^T\nu_k - 2\eta_k^T\nu_k \le \lambda_{k1}\sum_{j\in A_1}(\|\beta_{kj}\|_2 - \|\tilde\beta_{kj}\|_2)$. Note that $\bigl|\sum_{j\in A_1}(\|\beta_{kj}\|_2 - \|\tilde\beta_{kj}\|_2)\bigr| \le \sqrt{|A_1|}\,\|\tilde\beta_{kA_1}-\beta_{kA_1}\|_2 \le \sqrt{|A_1|}\,\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2$. These two expressions indicate that

$$\nu_k^T\nu_k - 2\eta_k^T\nu_k \le \lambda_{k1}\sqrt{|A_1|}\,\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2. \tag{A.12}$$

Let $\eta_k^*$ be the projection of $\eta_k$ onto the span of $D_{k1}^{1/2}Z_{A_2}$; i.e., $\eta_k^* = D_{k1}^{1/2}Z_{A_2}(Z_{A_2}^TD_{k1}Z_{A_2})^{-1}Z_{A_2}^TD_{k1}^{1/2}\eta_k$. By the Cauchy-Schwarz inequality and the inequality $ab \le a^2 + b^2/4$, we have that

$$2|\eta_k^T\nu_k| \le 2\|\eta_k^*\|_2\cdot\|\nu_k\|_2 \le 2\|\eta_k^*\|_2^2 + \frac{1}{2}\|\nu_k\|_2^2. \tag{A.13}$$

From (A.12) and (A.13), we have

$$\nu_k^T\nu_k \le 4\|\eta_k^*\|_2^2 + 2\lambda_{k1}\sqrt{|A_1|}\,\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2. \tag{A.14}$$

By the definition of $D_{k1}$, we know that $Z_{A_2}^TD_{k1}Z_{A_2}/n$ is a nonnegative definite matrix and that its smallest positive eigenvalue $c_n^*$ exists. Under Assumptions A5 and B3, by Lemma 3 in Huang et al. (2010) and part (i), we have $c_n^* \asymp m_n^{-1}$ with probability converging to one. Since $\nu_k^T\nu_k \ge n\,c_n^*\,\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2$ and $2ab \le a^2 + b^2$, from (A.14) we have

$$n\,c_n^*\,\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2 \le 4\|\eta_k^*\|_2^2 + \frac{\bigl(2\lambda_{k1}\sqrt{|A_1|}\bigr)^2}{2n\,c_n^*} + \frac{1}{2}n\,c_n^*\,\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2.$$

It follows that

$$\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2 \le \frac{8\|\eta_k^*\|_2^2}{n\,c_n^*} + \frac{4\lambda_{k1}^2|A_1|}{n^2c_n^{*2}}. \tag{A.15}$$
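The eigenvalue scaling $c_n^* \asymp m_n^{-1}$ invoked above can also be checked empirically. The following sketch (our own, taking $D_{k1} = I$ and a uniform design as simplifying assumptions) computes the smallest eigenvalue of $Z^TZ/n$ for a cubic B-spline design and shows that it decays like $m_n^{-1}$:

```python
# Empirical check (illustrative; identity weights and uniform design assumed):
# the smallest eigenvalue of Z^T Z / n for a B-spline design scales like 1/m_n,
# so m_n * eigmin should stay roughly constant as m_n grows.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
n, deg = 5000, 3                           # sample size, cubic splines
x = rng.uniform(0.0, 1.0, n)               # covariate values
for m_n in [8, 16, 32, 64]:
    interior = np.linspace(0.0, 1.0, m_n - deg + 1)[1:-1]
    knots = np.r_[np.zeros(deg + 1), interior, np.ones(deg + 1)]
    Z = np.column_stack([BSpline(knots, np.eye(m_n)[j], deg)(x)
                         for j in range(m_n)])
    eigmin = np.linalg.eigvalsh(Z.T @ Z / n).min()
    print(f"m_n = {m_n:3d}   m_n * eigmin = {m_n * eigmin:.3f}")
```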

Let $f_k(\hat X(t_i)) = \sum_{j=1}^p f_{kj}(\hat X_j(t_i))$, $f_{kA}(\hat X(t_i)) = \sum_{j\in A}f_{kj}(\hat X_j(t_i))$, $f_k^*(\hat X(t_i)) = \sum_{j=1}^p f_{kj}^*(\hat X_j(t_i))$, and $f_{kA}^*(\hat X(t_i)) = \sum_{j\in A}f_{kj}^*(\hat X_j(t_i))$, with $f^*(\cdot)$ defined by (2.4). For each component $\eta_i$ of $\eta_k$, we have

$$\begin{aligned}
\eta_i &= d_{k1}^{1/2}(t_i)\Bigl[H_k(t_i) - \hat\mu_k - \sum_{j\in A_2}Z_{ij}^T\beta_{kj}\Bigr]\\
&= d_{k1}^{1/2}(t_i)\Bigl\{\bigl[\hat X_k'(t_i) - X_k'(t_i)\bigr] + \bigl[X_k'(t_i) - \mu_k - f_k(X(t_i))\bigr] + (\mu_k - \hat\mu_k)\\
&\qquad\qquad + \bigl[f_{kA_2}(X(t_i)) - f_{kA_2}(\hat X(t_i))\bigr] + \bigl[f_{kA_2}(\hat X(t_i)) - f_{kA_2}^*(\hat X(t_i))\bigr]\Bigr\}\\
&= d_{k1}^{1/2}(t_i)\Bigl\{\Gamma_k(t_i) + (\mu_k - \hat\mu_k) + \bigl[f_{kA_2}(X(t_i)) - f_{kA_2}(\hat X(t_i))\bigr] + \bigl[f_{kA_2}(\hat X(t_i)) - f_{kA_2}^*(\hat X(t_i))\bigr]\Bigr\},
\end{aligned}$$

where $\Gamma_k(t_i) = \hat X_k'(t_i) - X_k'(t_i)$ (the term $X_k'(t_i) - \mu_k - f_k(X(t_i))$ vanishes under the SA-ODE model). Since $|\mu_k - \hat\mu_k|^2 = O_p(n^{-1})$ and $\|f_{kj}-f_{kj}^*\|_\infty = O(m_n^{-\varrho})$, we have

$$\|\eta_k^*\|_2^2 \le 2\|\Gamma_k^*\|_2^2 + 2\|\Lambda_k^*\|_2^2 + O_p(1) + O\bigl(n\,d_{n2}\,m_n^{-2\varrho}\bigr), \tag{A.16}$$

where $\Gamma_k^*$ and $\Lambda_k^*$ are the projections of $\Gamma_k = \{\Gamma_k(t_1),\cdots,\Gamma_k(t_n)\}^T$ and $\Lambda_k = \{f_{kA_2}(X(t_1)) - f_{kA_2}(\hat X(t_1)),\cdots,f_{kA_2}(X(t_n)) - f_{kA_2}(\hat X(t_n))\}^T$ onto the span of $D_{k1}^{1/2}Z_{A_2}$, respectively. We have $\|\Gamma_k^*\|_2^2 = \|(Z_{A_2}^TD_{k1}Z_{A_2})^{-1/2}Z_{A_2}^TD_{k1}^{1/2}\Gamma_k\|_2^2 \le \frac{1}{n\,c_n^*}\|Z_{A_2}^TD_{k1}^{1/2}\Gamma_k\|_2^2$. Now

$$\max_{A:|A|\le d_{n2}}\|Z_A^TD_{k1}^{1/2}\Gamma_k\|_2^2 = \max_{A:|A|\le d_{n2}}\sum_{j\in A}\|Z_j^TD_{k1}^{1/2}\Gamma_k\|_2^2 \le d_{n2}\,m_n\max_{1\le j\le p,\,1\le m\le m_n}\bigl|\mathcal{Z}_{jm}^TD_{k1}^{1/2}\Gamma_k\bigr|^2,$$

where $\mathcal{Z}_{jm} = \{\psi_m(\hat X_j(t_1)),\cdots,\psi_m(\hat X_j(t_n))\}^T$. By Lemma A.1, we have

$$\max_{1\le j\le p,\,1\le m\le m_n}\bigl|\mathcal{Z}_{jm}^TD_{k1}^{1/2}\Gamma_k\bigr|^2 = n\,m_n^{-1}\max_{1\le j\le p,\,1\le m\le m_n}\bigl|(m_n/n)^{1/2}\mathcal{Z}_{jm}^TD_{k1}^{1/2}\Gamma_k\bigr|^2 = O_p(1)\,n\,m_n^{-1}\log(pm_n).$$

It follows that

$$\|\Gamma_k^*\|_2^2 = O_p(1)\,\frac{d_{n2}\log(pm_n)}{c_n^*},\qquad\text{and similarly}\qquad \|\Lambda_k^*\|_2^2 = O_p(1)\,\frac{d_{n2}\log(pm_n)}{c_n^*}. \tag{A.17}$$

Combining (A.15)–(A.17), we get

$$\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2 = O_p\Bigl(\frac{d_{n2}\log(pm_n)}{n\,c_n^{*2}}\Bigr) + O_p\Bigl(\frac{1}{n\,c_n^*}\Bigr) + O\Bigl(\frac{d_{n2}\,m_n^{-2\varrho}}{c_n^*}\Bigr) + \frac{4\lambda_{k1}^2|A_1|}{n^2c_n^{*2}}.$$

Since $d_{n2} = O_p(q)$ and $c_n^* \asymp m_n^{-1}$ with probability converging to one, we have

$$\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2 = O_p\Bigl(\frac{m_n^2\log(pm_n)}{n}\Bigr) + O_p\Bigl(\frac{m_n}{n}\Bigr) + O\Bigl(\frac{1}{m_n^{2\varrho-1}}\Bigr) + O\Bigl(\frac{m_n^2\lambda_{k1}^2}{n^2}\Bigr). \tag{A.18}$$

This completes the proof of part (iii).
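To illustrate the rate in (A.18) (again an instantiation of our own, under the assumed tuning $\lambda_{k1}^2 = O(n\log(pm_n))$), take $\varrho = 2$ and $m_n = n^{1/5}$:
$$\|\tilde\beta_{kA_2}-\beta_{kA_2}\|_2^2 = O_p\bigl(n^{-3/5}\log(pm_n)\bigr) + O_p\bigl(n^{-4/5}\bigr) + O\bigl(n^{-3/5}\bigr) + O\bigl(n^{-3/5}\log(pm_n)\bigr) = O_p\bigl(n^{-3/5}\log(pm_n)\bigr),$$
and, after the $m_n^{-1}$ norm conversion in (A.19) below, $\|\hat f_{kj}-f_{kj}\|_2^2 = O_p\bigl(n^{-4/5}\log(pm_n)\bigr)$: the classical rate for twice-differentiable functions, inflated by the $\log(pm_n)$ factor that accounts for high-dimensional selection.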

We now prove part (ii). Under Assumption B2, $\|f_{kj}\|_2 \ge c_f$ for $1 \le j \le q$; since $\|f_{kj}-f_{kj}^*\|_2 = O(m_n^{-\varrho})$ and $\|f_{kj}^*\|_2 \ge \|f_{kj}\|_2 - \|f_{kj}-f_{kj}^*\|_2$, we have $\|f_{kj}^*\|_2 \ge 0.5c_f$ for $n$ sufficiently large. By a result of de Boor (2001), see also (12) of Stone (1986), there are positive constants $c_1$ and $c_2$ such that $c_1m_n^{-1}\|\beta_{kj}\|_2^2 \le \|f_{kj}^*\|_2^2 \le c_2m_n^{-1}\|\beta_{kj}\|_2^2$. It follows that $\|\beta_{kj}\|_2^2 \ge c_2^{-1}m_n\|f_{kj}^*\|_2^2 \ge 0.25c_2^{-1}c_f^2m_n$. Therefore, if $\|\beta_{kj}\|_2 \ne 0$ but $\|\tilde\beta_{kj}\|_2 = 0$, then $\|\tilde\beta_{kj}-\beta_{kj}\|_2^2 \ge 0.25c_2^{-1}c_f^2m_n$, which contradicts part (iii), since $m_n^2\log(pm_n)/n \to 0$ and $\lambda_{k1}^2m_n^2/n^2 \to 0$ as $n\to\infty$.

Proof of Theorem 2. By the definition of $\hat f_{kj}$, $1 \le j \le p$, parts (i) and (ii) follow directly from parts (i) and (ii) of Theorem 1. Now consider part (iii). By the properties of splines (de Boor, 2001), there exist positive constants $c_1$ and $c_2$ such that $c_1m_n^{-1}\|\tilde\beta_{kj}-\beta_{kj}\|_2^2 \le \|\hat f_{kj}-f_{kj}^*\|_2^2 \le c_2m_n^{-1}\|\tilde\beta_{kj}-\beta_{kj}\|_2^2$. Thus

$$\|\hat f_{kj}-f_{kj}^*\|_2^2 = O_p\Bigl(\frac{m_n\log(pm_n)}{n}\Bigr) + O_p\Bigl(\frac{1}{n}\Bigr) + O\Bigl(\frac{1}{m_n^{2\varrho}}\Bigr) + O\Bigl(\frac{m_n\lambda_{k1}^2}{n^2}\Bigr). \tag{A.19}$$

By Assumption B3, $\|f_{kj}-f_{kj}^*\|_2^2 = O(m_n^{-2\varrho})$. Part (iii) follows.

Corollary 1 follows directly from Theorem 2. The proof of Theorem 3 essentially follows that of Corollary 2 in Huang et al. (2010), with changes similar to those made in the proof of part (iii) of Theorem 1, and we omit it here.

Contributor Information

Hulin Wu, Email: Hulin_Wu@urmc.rochester.edu, Department of Biostatistics and Computational Biology, University of Rochester, School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, NY 14642.

Tao Lu, Email: stat.lu11@gmail.com, Department of Epidemiology and Biostatistics, State University of New York, Albany, NY 12144.

Hongqi Xue, Email: Hongqi_Xue@urmc.rochester.edu, Department of Biostatistics and Computational Biology, University of Rochester, School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, NY 14642.

Hua Liang, Email: hliang@gwu.edu, Department of Statistics, George Washington University, 801 22nd St. NW, Washington, D.C. 20052.

References

1. Akutsu T, Miyano S, Kuhara S. Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics. 2000;16:727–734. doi: 10.1093/bioinformatics/16.8.727.
2. Bansal M, Della Gatta G, di Bernardo D. Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics. 2006;22:815–822. doi: 10.1093/bioinformatics/btl003.
3. Bard Y. Nonlinear Parameter Estimation. New York: Academic Press; 1974.
4. Beal MJ, Falciani F, Ghahramani Z, Rangel C, Wild DL. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics. 2005;21:349–356. doi: 10.1093/bioinformatics/bti014.
5. Bornholdt S. Boolean network models of cellular regulation: prospects and limitations. Journal of the Royal Society Interface. 2008;5(Suppl 1):S85–S94. doi: 10.1098/rsif.2008.0132.focus.
6. Brunel N. Parameter estimation of ODE's via nonparametric estimators. Electronic Journal of Statistics. 2008;2:1242–1267.
7. Cantoni E, Flemming JM, Ronchetti E. Variable selection in additive models by nonnegative garrote. Statistical Modelling. 2011;11:237–252.
8. Carthew RW, Sontheimer EJ. Origins and mechanisms of miRNAs and siRNAs. Cell. 2009;136:642–655. doi: 10.1016/j.cell.2009.01.035.
9. Chen J, Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika. 2008;95:759–771.
10. Chen J, Wu H. Estimation of time-varying parameters in deterministic dynamic models. Statistica Sinica. 2008a;18:987–1006.
11. Chen J, Wu H. Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. Journal of the American Statistical Association. 2008b;103:369–384.
12. Claeskens G, Krivobokova T, Opsomer JD. Asymptotic properties of penalized spline estimators. Biometrika. 2009;96:529–544.
13. Craven P, Wahba G. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik. 1979;31:377–403.
14. de Boor C. A Practical Guide to Splines. Revised ed. New York: Springer-Verlag; 2001.
15. DeJong H. Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology. 2002;9:67–103. doi: 10.1089/10665270252833208.
16. D'Haeseleer P, Wen X, Fuhrman S, Somogyi R. Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing. 1999;4:41–52. doi: 10.1142/9789814447300_0005.
17. Donnet S, Samson A. Estimation of parameters in incomplete data models defined by dynamical systems. Journal of Statistical Planning and Inference. 2007;137:2815–2831.
18. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
19. Frank I, Friedman J. A statistical view of some chemometrics regression tools (with discussion). Technometrics. 1993;35:109–148.
20. Friedman N, Linial M, Nachman I, Peer D. Using Bayesian networks to analyze expression data. Journal of Computational Biology. 2000;7:601–620. doi: 10.1089/106652700750050961.
21. Guo F, Hanneke S, Fu W, Xing EP. Recovering temporally rewiring networks: a model-based approach. Proceedings of the 24th International Conference on Machine Learning (ICML); 2007.
22. Gupta M, Ibrahim JG. Variable selection in regression mixture modeling for the discovery of gene regulatory networks. Journal of the American Statistical Association. 2007;102:867–880.
23. Gupta M, Qu P, Ibrahim JG. A temporal hidden Markov regression model for the analysis of gene regulatory networks. Biostatistics. 2007;8:805–820. doi: 10.1093/biostatistics/kxm007.
24. Hanneke S, Xing EP. Discrete temporal models of social networks. Proceedings of the Workshop on Statistical Network Analysis, the 23rd International Conference on Machine Learning (ICML-SNA); 2006.
25. Hartemink A, Gifford D, Jaakkola T, Young R. Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Pacific Symposium on Biocomputing. 2001;6:422–433. doi: 10.1142/9789814447362_0042.
26. Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R. Gene regulatory network inference: data integration in dynamic models - a review. Biosystems. 2009;96:86–103. doi: 10.1016/j.biosystems.2008.12.004.
27. Heckerman D. A tutorial on learning with Bayesian networks. Microsoft Research Technical Report MSR-TR-95-06; 1996.
28. Heckman NE, Ramsay JO. Penalized regression with model-based penalties. Canadian Journal of Statistics. 2000;28:241–258.
29. Hemker P. Numerical methods for differential equations in system simulation and in parameter estimation. In: Analysis and Simulation of Biochemical Systems; 1972. pp. 59–80.
30. Hirose O, Yoshida R, Imoto S, Yamaguchi R, Higuchi T, Charnock-Jones DS, Print C, Miyano S. Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models. Bioinformatics. 2008;24:932–942. doi: 10.1093/bioinformatics/btm639.
31. Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR. Dynamic modeling of gene expression data. Proceedings of the National Academy of Sciences of the United States of America. 2001;98:1693–1698. doi: 10.1073/pnas.98.4.1693.
32. Huang J, Horowitz JL, Wei F. Variable selection in nonparametric additive models. The Annals of Statistics. 2010;38:2282–2313. doi: 10.1214/09-AOS781.
33. Huang JZ. Local asymptotics for polynomial spline regression. The Annals of Statistics. 2003;31:1600–1635.
34. Huang Y, Liu D, Wu H. Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics. 2006;62:413–423. doi: 10.1111/j.1541-0420.2005.00447.x.
35. Imoto S, Sunyong K, Goto T, Aburatani S, Tashiro K, Kuhara S, Miyano S. Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network. Journal of Bioinformatics and Computational Biology. 2003;6:231–252. doi: 10.1142/s0219720003000071.
36. Iwashima M, Irving BA, van Oers NS, Chan AC, Weiss A. Sequential interactions of the TCR with two distinct cytoplasmic tyrosine kinases. Science. 1994;263:1136–1139. doi: 10.1126/science.7509083.
37. Jia G, Stephanopoulos GN, Gunawan R. Parameter estimation of kinetic models from metabolic profiles: two-phase dynamic decoupling method. Bioinformatics. 2011;27:1964–1970. doi: 10.1093/bioinformatics/btr293.
38. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology. 1969;22:437–467. doi: 10.1016/0022-5193(69)90015-0.
39. Kim H, Lee JK, Park T. Inference of large-scale gene regulatory networks using regression-based network approach. Journal of Bioinformatics and Computational Biology. 2009;7:717–735. doi: 10.1142/s0219720009004278.
40. Kojima K, Yamaguchi R, Imoto S, Yamauchi M, Nagasaki M, Yoshida R, Shimamura T, Ueno K, Higuchi T, Gotoh N, Miyano S. A state space representation of VAR models with sparse learning for dynamic gene networks. Genome Informatics. 2009;22:56–68.
41. Kolar M, Song L, Ahmed A, Xing EP. Estimating time-varying networks. Annals of Applied Statistics. 2010;4:94–123.
42. Ley SC, Davies AA, Druker B, Crumpton MJ. The T-cell receptor/CD3 complex and CD2 stimulate the tyrosine phosphorylation of indistinguishable patterns of polypeptides in the human T-leukemic cell line Jurkat. European Journal of Immunology. 1991;21:2203–2209. doi: 10.1002/eji.1830210931.
43. Li Y, Ruppert D. On the asymptotics of penalized splines. Biometrika. 2008;95:415–436.
44. Li Z, Osborne MR, Pravan T. Parameter estimation in ordinary differential equations. IMA Journal of Numerical Analysis. 2005;25:264–285.
45. Liang H, Wu H. Parameter estimation for differential equation models using a framework of measurement error in regression models. Journal of the American Statistical Association. 2008;103:1570–1583. doi: 10.1198/016214508000000797.
46. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing. 1998;3:18–29.
47. Lu T, Liang H, Li H, Wu H. High-dimensional ODEs coupled with mixed-effects modeling techniques for dynamic gene regulatory network identification. Journal of the American Statistical Association. 2011;106:1242–1258. doi: 10.1198/jasa.2011.ap10194.
48. Martin S, Zhang Z, Martino A, Faulon JL. Boolean dynamics of genetic regulatory networks inferred from microarray time series data. Bioinformatics. 2007;23:866–874. doi: 10.1093/bioinformatics/btm021.
49. Meier L, van de Geer S, Buhlmann P. High-dimensional additive modeling. The Annals of Statistics. 2009;37:3779–3821.
50. Murphy K, Mian S. Modelling gene expression data using dynamic Bayesian networks. Technical Report, University of California, Berkeley; 1999.
51. Needham CJ, Bradford JR, Bulpitt AJ, Westhead DR. A primer on learning in Bayesian networks for computational biology. PLoS Computational Biology. 2007;3:e129. doi: 10.1371/journal.pcbi.0030129.
52. Peterson EJ, Woods ML, Dmowski SA, Derimanov G, Jordan MS, Wu JN, Myung PS, Liu Q, Pribila JT, Freedman BD, Shimizu Y, Koretzky GA. Coupling of the TCR to integrin activation by SLAP-130/Fyb. Science. 2001;293:2263–2265. doi: 10.1126/science.1063486.
53. Poyton AA, Varziri MS, McAuley KB, McLellan PJ, Ramsay JO. Parameter estimation in continuous-time dynamic models using principal differential analysis. Computers & Chemical Engineering. 2006;30:698–708.
54. Putter H, Heisterkamp SH, Lange JMA, de Wolf F. A Bayesian approach to parameter estimation in HIV dynamical models. Statistics in Medicine. 2002;21:2199–2214. doi: 10.1002/sim.1211.
55. Qi X, Zhao H. Asymptotic efficiency and finite-sample properties of the generalized profiling estimation of the parameters in ordinary differential equations. The Annals of Statistics. 2010;38:435–481.
56. Radchenko P, James GM. Variable selection using adaptive nonlinear interaction structures in high dimensions. Journal of the American Statistical Association. 2010;105:1541–1553.
57. Ramsay JO. Principal differential analysis: data reduction by differential operators. Journal of the Royal Statistical Society, Series B. 1996;58:495–508.
58. Ramsay JO, Hooker G, Campbell D, Cao J. Parameter estimation for differential equations: a generalized smoothing approach (with discussion). Journal of the Royal Statistical Society, Series B. 2007;69:741–796.
59. Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. New York: Springer; 2005.
60. Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F. Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics. 2004;20:1361–1372. doi: 10.1093/bioinformatics/bth093.
61. Ravikumar P, Lafferty J, Liu H, Wasserman L. Sparse additive models. Journal of the Royal Statistical Society, Series B. 2009;71:1009–1030.
62. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge: Cambridge University Press; 2003.
63. Sakamoto E, Iba H. Inferring a system of differential equations for a gene regulatory network by using genetic programming. Proceedings of the IEEE Congress on Evolutionary Computation; 2001. pp. 720–726.
64. Schumaker LL. Spline Functions: Basic Theory. New York: Wiley; 1981.
65. Shimamura T, Imoto S, Yamaguchi R, Fujita A, Nagasaki M, Miyano S. Recursive regularization for inferring gene networks from time-course gene expression profiles. BMC Systems Biology. 2009;3:41–54. doi: 10.1186/1752-0509-3-41.
66. Shmulevich I, Dougherty ER, Kim S, Zhang W. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics. 2002;18:261–274. doi: 10.1093/bioinformatics/18.2.261.
67. Shojaie A, Basu S, Michailidis G. Adaptive thresholding for reconstructing regulatory networks from time-course gene expression data. Statistics in Biosciences. 2012;4:66–83.
68. Shojaie A, Michailidis G. Analysis of gene sets based on the underlying regulatory network. Journal of Computational Biology. 2009;16:407–426. doi: 10.1089/cmb.2008.0081.
69. Song L, Kolar M, Xing EP. Time-varying dynamic Bayesian networks. Proceedings of the 23rd Neural Information Processing Systems (NIPS); 2009.
70. Spieth C, Hassis N, Streichert F. Comparing mathematical models on the problem of network inference. Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation; 2006. pp. 279–285.
71. Steuer R, Kurths J, Daub CO, Weise J, Selbig J. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics. 2002;18:S231–S240. doi: 10.1093/bioinformatics/18.suppl_2.s231.
72. Stone CJ. The dimensionality reduction principle for generalized additive models. The Annals of Statistics. 1986;14:590–606.
73. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003;302:249–255. doi: 10.1126/science.1087447.
74. Thomas R. Boolean formalization of genetic control circuits. Journal of Theoretical Biology. 1973;42:563–585. doi: 10.1016/0022-5193(73)90247-6.
75. Tibshirani RJ. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
76. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
77. Varah JM. A spline least squares method for numerical parameter estimation in differential equations. SIAM Journal on Scientific and Statistical Computing. 1982;3:28–46.
78. Varziri MS, Poyton AA, McAuley KB, McLellan PJ, Ramsay JO. Selecting optimal weighting factors in iPDA for parameter estimation in continuous-time dynamic models. Computers & Chemical Engineering. 2008;32:3011–3022.
79. Voit EO. Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists. Cambridge: Cambridge University Press; 2000.
80. Voit EO, Almeida J. Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics. 2004;20:1670–1681. doi: 10.1093/bioinformatics/bth140.
81. Wang H, Leng C. A note on the adaptive group Lasso. Computational Statistics & Data Analysis. 2008;52:5277–5286.
82. Weaver DC, Workman CT, Stormo GD. Modeling regulatory networks with weight matrices. Pacific Symposium on Biocomputing. 1999;4:112–123. doi: 10.1142/9789814447300_0011.
83. Wei F, Huang J. Consistent group selection in high-dimensional linear regression. Bernoulli. 2010;16:1369–1384. doi: 10.3150/10-BEJ252.
84. Werhli AV, Husmeier D. Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Statistical Applications in Genetics and Molecular Biology. 2007;6(1):Article 15. doi: 10.2202/1544-6115.1282.
85. Wessels LF, van Someren EP, Reinders MJ. A comparison of genetic network models. Pacific Symposium on Biocomputing. 2001;6:508–519.
86. Wu H, Xue H, Kumar A. Numerical algorithm-based estimation methods for ODE models via penalized spline smoothing. Biometrics. 2012;68:344–352. doi: 10.1111/j.1541-0420.2012.01752.x.
87. Wu H, Zhang JT. Nonparametric Regression Methods for Longitudinal Data Analysis. Hoboken: John Wiley & Sons; 2006.
88. Xue H, Miao H, Wu H. Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. The Annals of Statistics. 2010;38:2351–2387. doi: 10.1214/09-aos784.
89. Yeung MKS, Tegner J, Collins JJ. Reverse engineering gene networks using singular value decomposition and robust regression. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:6163–6168. doi: 10.1073/pnas.092576199.
90. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B. 2006;68:49–67.
91. Zhou S, Shen X, Wolfe D. Local asymptotics for regression splines and confidence regions. The Annals of Statistics. 1998;26:1760–1782.
92. Zhou S, Wolfe D. On derivative regression in spline estimation. Statistica Sinica. 2000;10:93–108.
93. Zou H. The adaptive LASSO and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429.
94. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.
95. Zou M, Conzen SD. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics. 2005;21:71–79. doi: 10.1093/bioinformatics/bth463.
