J Stat Plan Inference. 2013 Apr;143(4):745–763. doi: 10.1016/j.jspi.2012.09.009

Statistical properties on semiparametric regression for evaluating pathway effects

Inyoung Kim a,*, Herbert Pang b, Hongyu Zhao c,d

Abstract

Most statistical methods for microarray data analysis consider one gene at a time, and they may miss subtle changes at the single gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge. We define a pathway as a predefined set of genes that serve a particular cellular or physiological function. Limited work has been done in the regression settings to study the effects of clinical covariates and expression levels of genes in a pathway on a continuous clinical outcome. A semiparametric regression approach for identifying pathways related to a continuous outcome was proposed by Liu et al. (2007), who demonstrated the connection between a least squares kernel machine for nonparametric pathway effects and restricted maximum likelihood (REML) for variance components. However, the asymptotic properties of semiparametric regression for identifying pathways have never been studied. In this paper, we study the asymptotic properties of the parameter estimates in semiparametric regression and compare Liu et al.'s REML with our REML obtained from a profile likelihood. We prove that both approaches provide consistent estimators, have √n convergence rate under regularity conditions, and have either an asymptotically normal distribution or a mixture of normal distributions. However, the estimators based on our REML obtained from a profile likelihood have a theoretically smaller mean squared error than those of Liu et al.'s REML. A simulation study supports this theoretical result. A profile restricted likelihood ratio test is also provided for the non-standard testing problem. We apply our approach to a type II diabetes data set (Mootha et al., 2003).

Keywords: Gaussian random process, Kernel machine, Mixed model, Pathway analysis, Profile likelihood, Restricted maximum likelihood

1. Introduction

Numerous statistical methods have been developed for analyzing microarray data based on single genes (Efron et al., 2004; Fan and Li, 2001; Tibshirani, 1996; Zou and Hastie, 2005). However, they may not detect coordinated yet subtle changes among a set of genes where each gene shows only modest changes. One way to address this limitation of single gene based analysis is to analyze gene sets derived from prior biological knowledge to uncover patterns among the genes within a set. A number of methods and programs have been developed to consider gene groupings based on gene ontology (GO) (Harris et al., 2004). These methods have been successful in detecting subtle changes in expression levels that could be missed by single gene based analysis (Mootha et al., 2003; Hosack et al., 2003; Rajagopalan and Agarwal, 2005). In the following discussion, we consider a pathway as a predefined set of genes that serve a particular cellular or physiological function.

Limited work has been done in the regression settings to study the effects of clinical covariates and expression levels of genes in a pathway on a continuous clinical outcome. Goeman et al. (2004) proposed a global test derived from a random effects model. Liu et al. (2007) proposed a semiparametric regression model for high dimensional covariate data, developed by connecting a least squares kernel machine for a nonparametric model with REML in a linear mixed model. However, the statistical properties of their approach have not been studied. In the rest of this paper, we refer to Liu et al.'s approach as LLG's approach.

The goal of our study is to establish the asymptotic properties of these estimators regarding consistency, convergence rate, and limiting distributions under regularity conditions. We compare LLG's estimators with ours, which are estimated using a REML obtained from a profile likelihood. We note that although our likelihood is a penalized likelihood, we simply use it as a profile likelihood. Note that there is a penalized likelihood, called the penalized Henderson's likelihood (Wang, 1998; Gu and Ma, 2005), which is obtained using smoothing splines.

We show that our REML estimators give more accurate score equations and information matrix, and have smaller mean squared errors (MSEs) than those of LLG’s REML. For non-standard testing problems, a profile restricted likelihood ratio test is also provided.

This paper is organized as follows. In Section 2, we define the least squares kernel machine estimator in the semiparametric model. We then explain LLG's likelihood-based approach to semiparametric regression and compare their method with our REML obtained from a profile likelihood. Section 3 describes the asymptotic properties of both estimators. In Section 4, we provide a testing procedure based on a profile restricted likelihood ratio test for the nonstandard testing problem and also derive its theoretical distribution. In Section 5 we report simulation results comparing the MSEs, type I errors, and power of LLG's estimators with those of ours. In Section 6, we apply our approach to the type II diabetes data set analyzed by Mootha et al. (2003). Section 7 contains concluding remarks.

2. Semiparametric regression

A semiparametric regression model can be written as

Y=Xβ+r(Z)+ε, (1)

where Y is an n × 1 vector denoting the continuous outcome measured on n subjects, X is an n × q matrix representing q clinical covariates of these subjects, β is a q × 1 vector of regression coefficients for the covariate effects, Z is a p × n matrix denoting the gene expression matrix for p genes (p possibly much larger than n), Z = [z1, …, zn], and zi is a p × 1 vector for the gene expression levels of the ith subject, r(·) is an unknown nonlinear smooth function, and ε ~ N(0, σ²I), where σ² > 0. Because of the high dimensional space of Z, we make statistical inference for model (1) by connecting a least squares kernel machine with a Gaussian random process, where r(Z) = {r(z1), …, r(zn)}ᵀ ~ N(0, τK) with r(·) following a Gaussian process (GP) with mean 0 and covariance cov{r(z), r(z′)} = τK(z, z′), τ > 0, z and z′ represent two different arbitrary p × 1 vectors, K(·, ·) is a kernel function which implicitly specifies a unique function space spanned by a particular set of orthogonal basis functions, and K is an n × n matrix with ijth component K(zi, zj), i, j = 1, …, n.

The linear effects of the clinical variables are adjusted in this model. This model reduces to the standard linear regression model when τ = 0. If τ>0, two samples i and j with similar gene expression patterns have correlated random effects r(zi) and r(zj), and therefore they have a greater probability of having similar outcomes yi and yj than samples with less similar expression patterns. This “similarity” is measured using a kernel function.

We consider the following three kernels K(·, ·) to model the covariance matrix of pathway effects (a code sketch follows the list):

  • The dth order polynomial kernel, K(z, z′) = (zᵀz′)^d, d = 1, 2, which quantifies similarity through the inner product.

  • Gaussian kernel, K(z, z′) = exp(−‖z − z′‖²/ρ), where ‖z − z′‖² = Σ_{k=1}^p (z_k − z′_k)² and ρ is an unknown scale parameter (ρ > 0). The “similarity” in this kernel is measured using the Euclidean distance between z and z′.

  • Neural network kernel, K(z, z′) = tanh(z · z′), which uses the hyperbolic tangent (tanh) function to quantify similarity.
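As a minimal Python sketch (our own illustration, not the authors' Matlab implementation; the function names such as kernel_matrix are hypothetical), the three kernels and the construction of the n × n matrix K can be written as:

    import numpy as np

    def polynomial_kernel(z1, z2, d=2):
        # d-th order polynomial kernel K(z, z') = (z'z)^d, d = 1, 2
        return (z1 @ z2) ** d

    def gaussian_kernel(z1, z2, rho=1.0):
        # Gaussian kernel K(z, z') = exp(-||z - z'||^2 / rho)
        return np.exp(-np.sum((z1 - z2) ** 2) / rho)

    def neural_network_kernel(z1, z2):
        # neural network kernel K(z, z') = tanh(z . z')
        return np.tanh(z1 @ z2)

    def kernel_matrix(Z, kernel, **kwargs):
        # Z is p x n (columns are subjects); returns K with K_ij = K(z_i, z_j)
        n = Z.shape[1]
        K = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                K[i, j] = kernel(Z[:, i], Z[:, j], **kwargs)
        return K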

By connecting a least squares kernel machine (Cristianini and Shawe-Taylor, 2006) with restricted maximum likelihood (REML) (Zhang et al., 1998; Wang, 1998), Liu et al. (2007) estimated the nonparametric pathway effects of multiple gene expressions, r(Z), using the least squares kernel machine, estimated the covariate effects, β, using the weighted least squares estimator, and estimated the variance component parameters using REML. They derived the score equations and information matrix of the variance component parameters by treating the estimated β̂ as if it were the true parameter, without uncertainty. However, because β̂ depends on the variance component parameters, we consider incorporating the relationship between β̂ and the variance component parameters into the REML to obtain more accurate score equations and information matrix for these parameters.

We adopt the same general approach to estimate the nonparametric function r(·). β and r(·) are estimated by maximizing the scaled penalized likelihood function with smoothing parameter λ

L\{\beta, r(\cdot)\} = -\frac{1}{2}\sum_{i=1}^{n}\{y_i - x_i^T\beta - r(z_i)\}^2 - \frac{1}{2}\lambda\,\|r\|_{\mathcal{H}_K}^2,

where ‖·‖_{H_K} represents the norm of H_K, the function space generated by the kernel function K. This estimator is called the least squares kernel machine estimator (Liu et al., 2007).

By using the dual formulation (Cristianini and Shawe-Taylor, 2006; Liu et al., 2007) to reduce the high dimensional problem into a low dimensional one, the parameters β and r(·) can be estimated as follows:

\hat{\beta} = \{X^T(I + \lambda^{-1}K)^{-1}X\}^{-1}X^T(I + \lambda^{-1}K)^{-1}y, \qquad \hat{r} = \lambda^{-1}K(I + \lambda^{-1}K)^{-1}(y - X\hat{\beta}).
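These closed-form expressions translate directly into code. The following Python sketch (our own illustration, with a hypothetical function name) computes β̂ and r̂ for a given kernel matrix K and smoothing parameter λ:

    import numpy as np

    def fit_semiparametric(y, X, K, lam):
        # beta_hat = {X'(I + K/lam)^{-1}X}^{-1} X'(I + K/lam)^{-1} y
        # r_hat = (K/lam)(I + K/lam)^{-1}(y - X beta_hat)
        n = len(y)
        Vinv = np.linalg.inv(np.eye(n) + K / lam)
        XtVinv = X.T @ Vinv
        beta_hat = np.linalg.solve(XtVinv @ X, XtVinv @ y)
        r_hat = (K / lam) @ Vinv @ (y - X @ beta_hat)
        return beta_hat, r_hat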

For estimating the variance component parameters, we start from a mixed model formulation with the assumption that the nonparametric function r(·) follows a Gaussian process (GP) with mean 0 and covariance cov{r(z), r(z′)} = τK(z, z′) = τ exp(−‖z − z′‖²/ρ), where ρ > 0 is an unknown scale parameter. Using the marginal distribution of y ~ N(Xβ, Σ), the regression coefficient estimator β̂ can be obtained from the weighted least squares estimator β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹Y, where Σ = σ²I + τK = Σ(Θ), Θ = (τ, ρ, σ²), and Σ(Θ) is an n × n positive definite variance matrix. The covariances of β̂ and the estimated nonparametric function r̂(·) are cov(β̂) = (XᵀΣ⁻¹X)⁻¹, cov(r̂) = τK − (τK)P(τK), and cov{r̂(z)} = τK(z, z) − (τK_z)ᵀP(τK_z), where P = Σ⁻¹ − Σ⁻¹X(XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹ and K_z = {K(z, z1), …, K(z, zn)}ᵀ. Based on the above estimates, the variance component parameters Θ = (τ, ρ, σ²) can be estimated through REML, a commonly used approach for estimating variance components in the mixed effects model, and the smoothing parameter can be obtained from λ = τ⁻¹σ². The REML is

l(\Theta) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|X^T\Sigma^{-1}X| - \frac{1}{2}(y - X\beta)^T\Sigma^{-1}(y - X\beta).

LLG’s REML and our REML are

l_1(\Theta) = -\frac{1}{2}(y - X\hat{\beta})^T\Sigma^{-1}(y - X\hat{\beta}) - \frac{1}{2}\log|X^T\Sigma^{-1}X| - \frac{1}{2}\log|\Sigma|, \quad (2)
l_2(\Theta) = -\frac{1}{2}\{y - X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y\}^T\Sigma^{-1}\{y - X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y\} - \frac{1}{2}\log|X^T\Sigma^{-1}X| - \frac{1}{2}\log|\Sigma|. \quad (3)

LLG calculate the score equations and information matrix of Θ = (τ, ρ, σ²) at a given β̂. Unlike LLG's approach, we consider a profile likelihood and replace β̂ by (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹y, because β̂ depends on the parameters Θ. We then obtain the score equations and information matrix of Θ (see Appendix A.1). All parameters are estimated using the Newton–Raphson method.
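As a sketch of how l2(Θ) in (3) can be evaluated, the Python fragment below (our own illustration, assuming the Gaussian kernel and the hypothetical kernel_matrix helper sketched in Section 2) profiles out β and returns the profile REML; in practice one would maximize it with Newton–Raphson, as described above, or a generic optimizer.

    import numpy as np

    def profile_reml(theta, y, X, Z):
        # theta = (tau, rho, sigma2); Sigma = sigma2*I + tau*K(rho)
        tau, rho, sigma2 = theta
        n = len(y)
        K = kernel_matrix(Z, gaussian_kernel, rho=rho)
        Sigma = sigma2 * np.eye(n) + tau * K
        Sinv = np.linalg.inv(Sigma)
        XtSinvX = X.T @ Sinv @ X
        # profile out beta: beta_hat = (X'S^{-1}X)^{-1} X'S^{-1} y
        beta_hat = np.linalg.solve(XtSinvX, X.T @ Sinv @ y)
        resid = y - X @ beta_hat
        return (-0.5 * resid @ Sinv @ resid
                - 0.5 * np.linalg.slogdet(XtSinvX)[1]
                - 0.5 * np.linalg.slogdet(Sigma)[1])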

3. Asymptotic properties

In this section, we provide the asymptotic properties of consistency, convergence rate, and limiting distribution for the estimators. We also show that the estimators from our REML have asymptotically smaller MSEs than those of LLG's REML.

We first give a consistency result for our estimators when the true parameter Θ0 is either in the interior region or on the boundary of the parameter space Ω. For this result we need the parameter space Ω near Θ0 to behave like a closed set (Self and Liang, 1987; Vu and Zhou, 1997), which imposes the condition that the intersection of Ω and the closure of neighborhoods centered about Θ0 constitutes closed subsets. This requirement is satisfied because our Ω is a rectangle, a cross product of intervals.

Theorem 1

If Θ̂1 and Θ̂2 are estimators of Θ using REML functions (2) and (3), respectively, then Θ̂1 and Θ̂2 are both consistent estimators under regularity conditions.

Theorem 1 proves that the estimators from both approaches are consistent under regularity conditions. These regularity conditions are described in Appendix A.2. The proof of Theorem 1 is summarized in Appendix A.3.

Next, we show that both approaches have a √n convergence rate. Before proving this convergence rate, we first state the following two lemmas.

Lemma 1

If Σ and Σk are simultaneously diagonalizable, where Σk is the first derivative of Σ with respect to the kth component of Θ, then (1/n) tr(PΣk) uniformly converges. That is, (1/n) tr(PΣτ) and (1/n) tr(PΣσ²) uniformly converge.

Its proof is summarized in Appendix A.4. Lemma 1 implies that (1/n) tr(PΣiPΣj) uniformly converges because 0 ≤ tr(PΣiPΣj) ≤ tr{(PΣi)²}^{1/2} tr{(PΣj)²}^{1/2}.

Lemma 2

If ΣΣk = ΣkΣ, then Σ and Σk are simultaneously diagonalizable.

It is well known that if AB = BA for two diagonalizable matrices A and B, then A and B are simultaneously diagonalizable. We use this fact in our proof.

The √n convergence rate can be proved using (1) these two lemmas, (2) the four conditions C1–C4 (see Appendix A.5) described in Cressie and Lahiri (1993, 1996), and (3) the condition that the intersection of Ω and the closure of neighborhoods centered about Θ0 constitutes closed subsets (Vu and Zhou, 1997).

Theorem 2

Under conditions C1–C4 in Appendix A.5, \sqrt{n}(\hat{\Theta}_1 - \Theta) = O_p(1) and \sqrt{n}(\hat{\Theta}_2 - \Theta) = O_p(1).

In contrast to Cressie and Lahiri (1993, 1996), who showed the asymptotic property of the REML estimator in a linear model with Gaussian error, we consider a semiparametric model. The proof of Theorem 2 is described in Appendix A.6. Because the intersection of Ω and the closure of neighborhoods centered about Θ0 constitutes closed subsets, this result also holds when the true parameter is on the boundary of the parameter space Ω (Vu and Zhou, 1997). In our study we use the Gaussian, polynomial, and neural network kernels. We show that Σ and Στ are simultaneously diagonalizable, as are Σ and Σσ². For the Gaussian kernel, using Lemma 2, we show that Σ and Σρ are simultaneously diagonalizable as n goes to ∞.

Theorem 3

(3.1) Suppose that the true parameters are the interior points of the parameter space. Then under regularity conditions, Θ̂1 and Θ̂2 are asymptotically normally distributed.

(3.2) Suppose that one component of the true parameters is a left endpoint of the parameter space. Then under regularity conditions, Θ̂1 and Θ̂2 are asymptotically distributed with a mixture of normal distributions.

The proof of (3.1) is explained in Appendix A.7. The proof of (3.2) can be shown by Chant’s (1974) case (i).

As for the MSEs, we compare our estimator based on our REML (3) with LLG's estimator based on LLG's REML (2). We prove in the following theorem that the estimators from our REML have asymptotically smaller MSEs than those of LLG's REML.

Theorem 4

E‖Θ̂2 − Θ‖² < E‖Θ̂1 − Θ‖² asymptotically.

Its proof is given in Appendix A.8. This result implies that our score equations and information matrix are more accurate than those of Liu et al. (2007), because we take into account the relationship between β̂ and the variance components in the REML.

Although the estimators of parameters in the semiparametric regression approach developed by connecting a least squares kernel machine with REML have useful large sample properties, there are some situations in which ρ and τ are not identifiable. This result is summarized in Theorem 5.

Theorem 5

ρ and τ are not identifiable under one of the following situations: (i) τ→0; (ii) ρ→0 and τ ~ O(1/ρm) for any positive m; or (iii) 1/ρ→0.

This is because the marginal distribution of Y, f(Y|τ) = f(Y|ρ) ~ N(0, σ2I) under conditions (i) and (ii) and f(Y|τ) = f(Y|ρ) ~ N(0, σ2I+τJ) under condition (iii). Its proof is given in Appendix A.9.

4. Test for the nonparametric function

Our main interest is to test H0: r(Z) = constant, where r(·) is an unknown nonlinear smooth function under model (1). Under the Gaussian random process with the Gaussian kernel, the null hypothesis H0: {r(Z) is a point mass at zero} ∪ {r(Z) has a constant covariance matrix as a function of Z} is equivalent to H0: τ/ρ = 0 or ρ = 0. For the polynomial and neural network kernels, our null hypothesis is H0: τ = 0. The proof of this equivalence is given in Appendix A.10. We perform this testing based on two approaches: a profile restricted likelihood ratio test, described in Section 4.1, and a permutation test, explained in Section 4.2.

4.1. Profile restricted likelihood ratio test

We derive a profile restricted likelihood ratio test (PRLRT) by taking into account that the parameter of interest under the null hypothesis is on the boundary of the parameter space and that the kernel matrix K is not a block diagonal matrix. The theoretical properties of our profile REML estimators, shown in Section 3, are needed for the PRLRT. The PRLRT can be performed using both the empirical distribution obtained by permutation and the theoretical distribution derived in the following subsections for each null hypothesis.

4.1.1. Test for H0: τ/ρ = 0 vs Ha: τ/ρ>0

Testing H0 amounts to testing either H01: τ = 0 with ρ positive and bounded, or H02: ρ = ∞ with τ positive and bounded. The PRLRT can be derived as follows.

Test for H01: τ = 0 vs τ > 0, where ρ is positive and bounded. Let Ω be the parameter space for Θ = (τ, ρ, σ²)ᵀ. Denote Ω0 and Ω1 = Ω∖Ω0 as the parameter spaces under H0 and Ha. The true parameters Θ0 are either in the interior or on the boundary of the parameter space Ω. Assume that the parameter spaces Ω0 and Ω1 can be approximated at Θ0 by cones CΩ0 and CΩ1, respectively, with vertex Θ0.

The PRLRT statistic D is twice the difference in maximized log profile restricted likelihoods, D = 2l2(Θ̂) − 2l2(Θ̂0), where Θ̂ and Θ̂0 maximize l2(·) over Ω and Ω0, respectively. By Claeskens (2004), who extended the non-standard LRT to the PRLRT, D converges to

D \to \inf_{\Theta \in \tilde{C}_0}\|U - \Theta\|^2 - \inf_{\Theta \in \tilde{C}}\|U - \Theta\|^2,

where C̃ = {Θ̃: Θ̃ = I(Θ0)^{T/2}(Θ − Θ0), Θ ∈ CΩ} is the orthonormal transformation of the cone approximation CΩ of the parameter space Ω with Θ0 as the vertex, and C̃0 = {Θ̃: Θ̃ = I(Θ0)^{T/2}(Θ − Θ0), Θ ∈ CΩ0} is the orthonormally transformed cone approximation of the parameter space Ω0 under the null hypothesis. U is a random vector from N(0, I), and I(Θ0)^{T/2} is the right Cholesky square root of the profile REML information matrix, i.e., I(Θ0) = [I(Θ0)]^{1/2}[I(Θ0)]^{T/2}.

Under the null hypothesis, Θ0 = (0, ρ, σ²)ᵀ and ρ is inestimable. Let Θ = (τ, σ²)ᵀ. The cone parameter spaces then reduce to CΩ = [0, ∞) × (0, ∞), with CΩ0 = {0} × (0, ∞) and CΩ1 = (0, ∞)².

Decompose the normal vector U and I(Θ0) as U = (U1, U2)ᵀ and I(Θ0) = {Ijk} corresponding to τ and σ². After some algebra, we can show that

\inf_{\Theta \in \tilde{C}_0}\|U - \Theta\|^2 = (I_{11} - I_{12}I_{22}^{-1}I_{21})U_1^2

and

\inf_{\Theta \in \tilde{C}}\|U - \Theta\|^2 = (I_{11} - I_{12}I_{22}^{-1}I_{21})U_1^2\,1(\tilde{U}_1 \le 0),

where Ũ1 = (I11 − I12I22⁻¹I21)^{1/2}U1. Therefore, the difference between these two terms is (I11 − I12I22⁻¹I21)U1² 1(Ũ1 > 0). Since (I11 − I12I22⁻¹I21)^{1/2}U1 ~ N(0, 1), the asymptotic distribution of D under H01 is a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom, i.e., 0.5χ²₀ + 0.5χ²₁.

Test for H02: ρ = ∞, where τ is positive and bounded. Testing H02: ρ = ∞ is equivalent to testing 1/ρ = 0. Let ρ* = 1/ρ. In this case, Θ = (θ1, θ2ᵀ)ᵀ, where θ1 = ρ* and θ2 = (τ, σ²)ᵀ. Here CΩ0 = {0} × (0, ∞)² and CΩ1 = (0, ∞)³. Similar to the previous case, decompose the normal vector U and I(Θ0) as U = (U1, U2ᵀ)ᵀ and I(Θ0) = {Ijk} corresponding to θ1 and θ2. We can then show that

\inf_{\Theta \in \tilde{C}_0}\|U - \Theta\|^2 = (I_{11} - I_{12}I_{22}^{-1}I_{21})U_1^2, \quad (4)
\inf_{\Theta \in \tilde{C}}\|U - \Theta\|^2 = (I_{11} - I_{12}I_{22}^{-1}I_{21})U_1^2\,1(\tilde{U}_1 \le 0). \quad (5)

In a similar way, the asymptotic distribution of D under H02: ρ = ∞ is a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom, i.e., 0.5χ²₀ + 0.5χ²₁.

4.1.2. Test for H0: ρ = 0

Testing H0 amounts to testing either H03: ρ = 0 with τ positive and bounded, or H04: ρ = 0 and τ = 0.

Test for H03: ρ = 0, where τ is positive and bounded. In this case, Θ = (θ1, θ2ᵀ)ᵀ, where θ1 = ρ and θ2 = (τ, σ²)ᵀ. Here CΩ0 = {0} × (0, ∞)² and CΩ1 = (0, ∞)³. Similar to the previous case, decompose U and I(Θ0) as U = (U1, U2ᵀ)ᵀ and I(Θ0) = {Ijk} corresponding to θ1 and θ2. In a similar way, the asymptotic distribution of D under H03 is a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom, i.e., 0.5χ²₀ + 0.5χ²₁.

Test for H04: ρ = 0 and τ = 0, where ρ = o(τ). In this case, Θ = (θ1, θ2, θ3), where θ1 = ρ, θ2 = τ, and θ3 = σ². Here CΩ0 = {0}² × (0, ∞) and CΩ1 = (0, ∞)³. Decompose the normal vector U and I(Θ0) as U = (U1, U2, U3)ᵀ and I(Θ0) = {Ijk} corresponding to θ1, θ2, and θ3. Under the orthonormal transformation, the cone spaces become C̃ = {Θ̃: ηθ̃2 − θ̃1 ≥ 0, θ̃2 ≥ 0} and C̃0 = {Θ̃: θ̃2 = θ̃1 = 0}, where η = Ĩ12|Ĩ(Θ0)|^{−1/2} is the slope of the θ2 axis after transformation and

\tilde{I}(\Theta_0) = \begin{pmatrix} \tilde{I}_{11} & \tilde{I}_{12} \\ \tilde{I}_{21} & \tilde{I}_{22} \end{pmatrix} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix} - \begin{pmatrix} I_{13} \\ I_{23} \end{pmatrix} I_{33}^{-1} \begin{pmatrix} I_{31} & I_{32} \end{pmatrix}.

We can then show that

\inf_{\Theta \in \tilde{C}_0}\|U - \Theta\|^2 = U_1^2 + U_2^2.

The representation of \inf_{\Theta \in \tilde{C}}\|U - \Theta\|^2 differs across four regions of the plane with coordinates (θ1, θ2)ᵀ:

\inf_{\Theta \in \tilde{C}}\|U - \Theta\|^2 =
\begin{cases}
0, & \theta_2 \ge 0,\ \eta\theta_2 - \theta_1 \ge 0, \\
U_1^2 + U_2^2 - (\eta U_1 + U_2)^2/(1 + \eta^2), & \theta_2 + \eta\theta_1 \ge 0,\ \eta\theta_2 - \theta_1 < 0, \\
U_2^2, & \theta_2 < 0,\ \theta_1 \ge 0, \\
U_1^2 + U_2^2, & \theta_2 + \eta\theta_1 < 0,\ \theta_1 < 0.
\end{cases}

The area proportions of these four regions, (π̄, 1/4, 1/4, 1/2 − π̄) in the order above, determine the probabilities that the vector U lies in each region, where π̄ = (2π)⁻¹ cos⁻¹{η · (1 + η²)^{−1/2}} and η · (1 + η²)^{−1/2} = Ĩ12 · (Ĩ11Ĩ22)^{−1/2}. Then the asymptotic distribution of D is the difference of the above two representations:

D \to
\begin{cases}
U_1^2 + U_2^2 & \text{with probability } \bar{\pi}, \\
(\eta U_1 + U_2)^2/(1 + \eta^2) & \text{with probability } 1/4, \\
U_1^2 & \text{with probability } 1/4, \\
0 & \text{with probability } 1/2 - \bar{\pi}.
\end{cases}

Because U1 and U2 are independent and (ηU1 + U2)/√(1 + η²) ~ N(0, 1), the final approximate asymptotic distribution of D is π̄χ²₂ + 0.5χ²₁ + (0.5 − π̄)χ²₀.
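The null distributions derived above are easy to use in practice. A small Python sketch (our own illustration; the function names are hypothetical) converts an observed PRLRT statistic D into a p-value:

    from scipy.stats import chi2

    def pvalue_mixture_50_50(D):
        # null 0.5*chi2_0 + 0.5*chi2_1 (Sections 4.1.1 and H03)
        return 0.5 * chi2.sf(D, df=1) if D > 0 else 1.0

    def pvalue_h04(D, pi_bar):
        # null pi_bar*chi2_2 + 0.5*chi2_1 + (0.5 - pi_bar)*chi2_0 (H04)
        if D <= 0:
            return 1.0
        return pi_bar * chi2.sf(D, df=2) + 0.5 * chi2.sf(D, df=1)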

4.2. Permutation test

For identifying significant pathways, we also consider the following permutation procedure (a code sketch follows the steps).

  • Step 1: estimate τ/ρ, ρ, τ using the observed data by fitting the semiparametric model and calculate the residuals ε̂0i from yi = xiβ + ε0i.

  • Step 2: permute the residuals ε̂0i and simulate outcomes as y*ᵢ = xᵢβ̂ + ε̂*₀ᵢ.

  • Step 3: based on y*, x, and z, fit the semiparametric model using the likelihood-based approach and then estimate τ̂*/ρ̂*, ρ̂*, and τ̂*.

  • Step 4: repeat Steps 2–3 a large number of times, e.g., 10,000 times.

  • Step 5: estimate the statistical significance by the percentage of times either τ̂*/ρ̂*>τ̂/ρ̂ or ρ̂*>ρ̂, where τ̂/ρ̂, ρ̂, τ̂ are the estimated values from the observed data.

Significant pathways can be selected based on this percentage, which serves as an empirical p-value.
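The following Python sketch (our own illustration; `fit` stands for any routine returning (τ̂, ρ̂) from the semiparametric fit) implements Steps 1–5 under the Gaussian kernel:

    import numpy as np

    def permutation_test(y, X, Z, fit, n_perm=10000, seed=None):
        # Step 1: observed estimates and residuals from the null model y = X beta + eps
        rng = np.random.default_rng(seed)
        tau, rho = fit(y, X, Z)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        exceed = 0
        for _ in range(n_perm):
            y_star = X @ beta + rng.permutation(resid)              # Step 2
            tau_s, rho_s = fit(y_star, X, Z)                        # Step 3
            exceed += (tau_s / rho_s > tau / rho) or (rho_s > rho)  # Step 5
        return exceed / n_perm                                      # empirical p-value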

To rank the importance of genes within a significant pathway related to clinical outcomes, we perform the following steps (a code sketch follows this subsection):

  • Step 6: bootstrap the samples B times. For each bootstrapped sample b = 1, …, B, estimate τ̂ᵇ₍₋g₎, ρ̂ᵇ₍₋g₎, and (τ̂/ρ̂)ᵇ₍₋g₎ by fitting the following model, where Z₋₍g₎ denotes the expression matrix with gene g removed:
    Y = Xβ + r{Z₋₍g₎} + ε.
  • Step 7: calculate the absolute differences |τ̂ᵇ₍₋g₎ − τ̂|, |ρ̂ᵇ₍₋g₎ − ρ̂|, and |(τ̂/ρ̂)ᵇ₍₋g₎ − (τ̂/ρ̂)| under the Gaussian kernel. For other kernels, only calculate the absolute difference |τ̂ᵇ₍₋g₎ − τ̂|.

  • Step 8: rank the importance of the genes by the mean absolute difference (1/B)Σ_{b=1}^B |τ̂ᵇ₍₋g₎ − τ̂|. If a gene plays an important role in a pathway, this difference will be large.

We can also rank the importance of gene pairs by performing the same procedure, except fitting the following model:

Y = X\beta + r\{Z_{-(g, g')}\} + \varepsilon.
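A Python sketch of the gene-ranking procedure (Steps 6–8) is given below; it is our own illustration, with `fit_tau` standing for any routine that returns τ̂ for a given data set:

    import numpy as np

    def rank_genes(y, X, Z, fit_tau, B=100, seed=None):
        # Steps 6-8: bootstrap, refit without gene g, rank by mean |tau difference|
        rng = np.random.default_rng(seed)
        p, n = Z.shape
        tau_full = fit_tau(y, X, Z)            # tau_hat from the observed data
        diffs = np.zeros(p)
        for b in range(B):
            idx = rng.integers(0, n, size=n)             # bootstrap sample
            yb, Xb, Zb = y[idx], X[idx], Z[:, idx]
            for g in range(p):
                Z_minus_g = np.delete(Zb, g, axis=0)     # drop gene g
                diffs[g] += abs(fit_tau(yb, Xb, Z_minus_g) - tau_full)
        return np.argsort(-diffs / B)          # most important genes first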

5. Simulation studies

5.1. Mean squared error and coverage probability

We conducted simulations to compare the MSEs and coverage probabilities of LLG’s estimators with those of our estimators.

We considered the following cases and simulated 1000 data sets for each of two (n, p) settings: (n, p) = (60, 5), a relatively large sample size compared to the number of genes, and (n, p) = (60, 200), a relatively small sample size compared to the number of genes. A code sketch of the data-generating mechanism follows the list.

  • Case 1: yi = βxi + r(zi1, zi2, …, zip) + εi with β = 1, xi = 3 cos(zi1) + 2ui with ui independent of zi1 and following N(0, 1), zij ~ Uniform(0, 1), and the true r(z) ~ GP{0, τKg(ρ)}, where Kg(ρ)(z, z′) = exp(−‖z − z′‖²/ρ) and ‖·‖ is the Euclidean norm.

  • Case 2: the same setting as case 1 except that r(z) ~ GP(0, τKp2), where Kp2(z, z′) = (zᵀz′)².

  • Case 3: the same setting as case 1 except that r(z) ~ GP(0, τKp1), where Kp1(z, z′) = zᵀz′.
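As referenced above, a Python sketch of the Case 1 data-generating mechanism (our own illustration) is:

    import numpy as np

    def simulate_case1(n=60, p=5, beta=1.0, tau=1.0, rho=1.0, sigma=1.0, seed=None):
        # y_i = beta*x_i + r(z_i) + eps_i, with r ~ GP{0, tau*K_g(rho)}
        rng = np.random.default_rng(seed)
        Z = rng.uniform(0.0, 1.0, size=(p, n))              # z_ij ~ Uniform(0, 1)
        x = 3 * np.cos(Z[0]) + 2 * rng.standard_normal(n)   # x_i = 3 cos(z_i1) + 2 u_i
        sq = ((Z[:, :, None] - Z[:, None, :]) ** 2).sum(axis=0)
        K = np.exp(-sq / rho)                               # Gaussian kernel matrix
        r = rng.multivariate_normal(np.zeros(n), tau * K)   # pathway effect
        y = beta * x + r + sigma * rng.standard_normal(n)
        return y, x, Z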

For each case, we estimated the parameters of the semiparametric regression model using LLG's REML and our REML, and calculated the average MSEs and coverage probabilities of the parameter estimates. The results are summarized in Table 1. For both the n > p and n < p cases, our REML estimators have smaller MSEs than LLG's REML estimators. Both approaches have comparable coverage probabilities.

Table 1.

The coverage probability (cvpr) of 95% confidence intervals and average mean squared error values (mse) of parameter estimates. LLIKE=LLG’s likelihood-based approach with kernel K; PLIKE=our profile likelihood-based approach with kernel K; GK=Gaussian kernel; P2K=Quadratic kernel; P1K=Linear kernel.

Setting       K    Method  Case  Measure      β       σ       τ       ρ
n=60, p=5     GK   LLIKE   1     cvpr × 100   97      96      91      90
                                 mse          0.0053  0.0247  0.0483  0.0623
              GK   PLIKE   1     cvpr × 100   97      96      92      91
                                 mse          0.0048  0.0151  0.0322  0.0458
              P2K  LLIKE   2     cvpr × 100   98      93      95      N/A
                                 mse          0.0053  0.0485  0.0252  N/A
              P2K  PLIKE   2     cvpr × 100   98      93      95      N/A
                                 mse          0.0050  0.0474  0.0243  N/A
              P1K  LLIKE   3     cvpr × 100   97      93      95      N/A
                                 mse          0.0051  0.0104  0.0165  N/A
              P1K  PLIKE   3     cvpr × 100   97      93      95      N/A
                                 mse          0.0042  0.0095  0.0142  N/A
n=60, p=200   GK   LLIKE   1     cvpr × 100   89      87      87      80
                                 mse          0.0321  0.0436  0.0424  0.1261
              GK   PLIKE   1     cvpr × 100   89      89      87      80
                                 mse          0.0304  0.0295  0.0421  0.112
              P2K  LLIKE   2     cvpr × 100   86      63      87      N/A
                                 mse          0.0478  0.4683  0.0388  N/A
              P2K  PLIKE   2     cvpr × 100   86      63      87      N/A
                                 mse          0.0475  0.4620  0.0379  N/A
              P1K  LLIKE   3     cvpr × 100   86      65      82      N/A
                                 mse          0.0486  0.2984  0.0583  N/A
              P1K  PLIKE   3     cvpr × 100   86      66      85      N/A
                                 mse          0.0480  0.2839  0.0564  N/A

5.2. Consistency

We also performed an additional simulation to study the increase in accuracy of our estimates as n increases when p is fixed. We considered n = 15, 30, 60, and 600: one small, two medium, and one relatively large sample size. For each case, we simulated 1000 data sets; the average MSEs are summarized in Table 2.

Table 2.

The average mean squared error values (mse) of parameter estimates for different sample sizes. LLIKE=LLG’s likelihood-based approach with kernel K; PLIKE=our profile likelihood-based approach with kernel K; GK=Gaussian kernel.

Setting       K    Method  Case  Measure  β       σ       τ       ρ
n=15, p=5     GK   LLIKE   1     mse      0.0865  0.0774  0.0799  0.0974
              GK   PLIKE   1     mse      0.0741  0.0643  0.0673  0.0791
n=30, p=5     GK   LLIKE   1     mse      0.0067  0.0319  0.0529  0.0723
              GK   PLIKE   1     mse      0.0062  0.0234  0.0436  0.0558
n=60, p=5     GK   LLIKE   1     mse      0.0053  0.0247  0.0483  0.0623
              GK   PLIKE   1     mse      0.0048  0.0151  0.0322  0.0458
n=600, p=5    GK   LLIKE   1     mse      0.0025  0.0114  0.0235  0.0312
              GK   PLIKE   1     mse      0.0020  0.0107  0.0197  0.0259

5.3. Type I error and power

To assess type I error and power for both approaches, we considered Case 4 for type I error and Case 5 for power. We consider Case 5 because the nonparametric function r(zi1, zi2, …, zip) is allowed to have a complex form with nonlinear functions of the z's and interactions among the z's, and it allows xi and (zi1, …, zip) to be correlated. We obtained type I error and power using the profile restricted likelihood ratio test (PRLRT) described in Section 4.1, performed using both the empirical and the theoretical distribution; we denote these by “PRLRT(e)” and “PRLRT(t)”, respectively. We also compared them with the permutation test described in Section 4.2, denoted “PERM”.

  • Case 4: yi = βxi +εi with β = 1, xi = 3 cos(π/6)+ 2ui with ui ~ N(0, 1) and εi ~ N(0, σ2).

  • Case 5: yi = βxi + r(zi1, zi2, …, zip) + εi, xi = 3 cos(zi1) + 2ui with ui independent of zi1 and following N(0, 1), zij ~ Uniform(0, 1), and r(z) = 10 cos(z1) − 15z2² + 10 exp(−z3)z4 − 8 sin(z5) cos(z3) + 20z1z2.

The estimated type I error rates from both approaches are summarized in Tables 3–5; they are all close to the nominal level when n > p. However, when n < p, both methods depart from the nominal level. The power of the two approaches is comparable. The performance of “PRLRT(e)” and “PRLRT(t)” was similar, as was the performance of “PRLRT” and “PERM”.

Table 3.

Estimated type I error rate and power based on profile restricted likelihood ratio test (PRLRT) and permutation test (PERM). LLIKE=LLG’s likelihood-based approach with kernel K; PLIKE=our profile likelihood-based approach with kernel K; GK=Gaussian kernel; PRLRT(e) and PRLRT(t) are profile restricted likelihood ratio tests which are performed using empirical distribution and theoretical distribution, respectively.

Setting       Measure  Case  LLIKE (GK)                     PLIKE (GK)
                             PERM   PRLRT(e)  PRLRT(t)      PERM   PRLRT(e)  PRLRT(t)
n=60, p=5     Type I   4     0.04   0.05      0.05          0.04   0.05      0.05
              Power    5     0.99   0.99      1             0.99   0.99      1
n=60, p=200   Type I   4     0.03   0.04      0.04          0.03   0.04      0.04
              Power    5     0.84   0.84      0.84          0.84   0.85      0.85

Table 5.

Estimated type I error rate and power based on profile restricted likelihood ratio test (PRLRT) and permutation test (PERM). LLIKE=LLG’s likelihood-based approach with kernel K; PLIKE=our profile likelihood-based approach with kernel K; P1K=Linear kernel; PRLRT(e) and PRLRT(t) are profile restricted likelihood ratio tests which are performed using empirical distribution and theoretical distribution, respectively.

Setting       Measure  Case  LLIKE (P1K)                    PLIKE (P1K)
                             PERM   PRLRT(e)  PRLRT(t)      PERM   PRLRT(e)  PRLRT(t)
n=60, p=5     Type I   4     0.04   0.05      0.05          0.04   0.05      0.05
              Power    5     0.97   1         1             0.97   1         1
n=60, p=200   Type I   4     0.03   0.04      0.04          0.03   0.04      0.04
              Power    5     0.82   0.83      0.83          0.82   0.83      0.83

6. Real data analysis

6.1. Pathway based analysis for type II diabetes

We applied our profile likelihood approach to a microarray expression data set on type II diabetes (Mootha et al., 2003), in which expression levels of 22,283 genes were measured in 17 male patients with normal glucose tolerance and 18 male patients with type II diabetes mellitus. Because Mootha et al.'s approach is based on a normalized Kolmogorov–Smirnov statistic, it cannot model a continuous outcome, such as glucose level, or clinical covariates such as age. Incorporating such information into the analysis may help detect subtle differences in gene expression profiles more efficiently. We studied a total of 277 pathways: 128 KEGG pathways (http://www.genome.jp/kegg/pathway.html) and 149 curated pathways, the latter constructed from known biological experiments by Mootha and colleagues.

In our analysis, let Y be the log-transformed glucose level, X be age, and Z be the p × n matrix of gene expression levels within each pathway, where n = 35 is the number of subjects and p, the number of genes in a specific pathway, varies from 4 to 200 across these pathways. Our goal is to identify pathways that affect the glucose level related to diabetes after adjusting for the age effect, and to rank genes within each significant pathway. To identify significant pathways, we fitted the semiparametric model.

6.2. Identifying significant pathways

We chose a 0.05 cutoff for statistical significance, using both the theoretical and empirical distributions of the PRLRT and the percentage described in Section 4.2. We also applied existing multiple comparison methods (Storey, 2002, 2003) to our pathway data, although our pathways are not independent of each other because of shared genes and interactions among pathways. The FDR q-values were between 0.081 and 0.303. Our analysis using both distributions took about 60 min on a Mac Pro with two 3.0 GHz Quad-Core Intel Xeon processors and 10 GB of memory. Our code is written in Matlab and is available upon request.

To find pathways shared across the four kernels, we selected the top 50 pathways for each kernel and then examined the common pathways. A total of seven pathways were common to all four kernels, including the Alanine and aspartate metabolism, Oxidative phosphorylation, and RNA polymerase pathways. Since one of them is a subset of another pathway, six pathways are summarized in Table 6.

Table 6.

Seven pathways which are significant over four kernels (Linear, Quadratic, Gaussian, Neural network kernels) using pathway data of type II diabetes; P1K=linear kernel; P2K=quadratic kernel; GK=Gaussian kernel; NNK=neural network kernel; P-values are obtained using both profile restricted likelihood ratio test (PRLRT) and permutation test (PERM); PRLRT(t) is a profile restricted likelihood ratio test which is performed using theoretical distribution.

ID   Pathway name (# of genes)                  P1K              P2K              GK               NNK
                                                PERM   PRLRT(t)  PERM   PRLRT(t)  PERM   PRLRT(t)  PERM   PRLRT(t)
4    Alanine and aspartate metabolism (18)      0.009  0.008     0.003  0.002     0.034  0.030     0.016  0.014
36   c17_U133_probes (116)                      0.029  0.031     0.002  0.001     0.003  0.002     0.015  0.013
133  MAP00190_Oxidative_phosphorylation (58)    0.025  0.027     0.007  0.007     0.032  0.029     0.022  0.025
229  Oxidative_phosphorylation (113)            0.023  0.021     0.008  0.009     0.035  0.039     0.024  0.022
209  MAP03020_RNA_polymerase (21)               0.034  0.031     0.027  0.026     0.040  0.041     0.025  0.026
254  RNA polymerase (25)                        0.037  0.038     0.028  0.023     0.041  0.043     0.022  0.021

Pathway 4, Alanine and aspartate metabolism, has been studied for its association with abnormal hepatocellular function and abnormal fasting glucose levels in type II diabetes (Jiamjarasrangsi et al., 2009). Two other pathways, 133 and 229, where all except one gene in pathway 133 belong to pathway 229, are related to Oxidative phosphorylation, which is known to be associated with diabetes (Misu et al., 2007; Mootha et al., 2003, 2004). Oxidative phosphorylation is a process of cellular respiration in humans (and in eukaryotes generally). These pathways contain coregulated genes across different tissues, are related to insulin/glucose disposal, and include ATP synthesis, a pathway involved in energy transfer. The remaining two pathways, 209 and 254, are related to RNA polymerase; all genes except two in pathway 209 are part of pathway 254. Among all the pathways, pathway 36, c17_U133_probes, is the most significant under the Gaussian kernel. This pathway plays a role in cellular behavioral changes (Saxena, 2001) and is also one of the seven pathways common to the four kernels. It contains several genes related to human insulin signaling, e.g., CAP1, MAP2K6, ARF6, and SGK (Dahlquist et al., 2002). Only one gene in pathway 36 is included in pathway 254. These genes were not significant using single gene based analysis; likewise, no gene in the Oxidative phosphorylation pathway was significant using single gene based analysis. Only one gene in pathway 4, GAD2, was significant using single gene based analysis.

We compared the top 50 pathways identified by the global test (Goeman et al., 2004) and GSEA (Subramanian et al., 2005). The global test is based on a random effects model that does not include covariates and uses a linear kernel. GSEA calculates an enrichment score, a weighted function of the correlations among genes in a pathway; it cannot incorporate covariate information into this score. The global test and GSEA results are summarized in the supplementary materials.

We calculated the proportions of overlap among the global test, GSEA, and our approach. The overlap between the global test and GSEA was 0.36. The largest overlap between GSEA and our approach was 0.41, with the quadratic kernel. These small overlaps suggest that the methods detect different pathways. Across the four kernels of our approach, the largest overlap was 0.92, between the linear and quadratic kernels. The global test and GSEA found pathways 4, 140, and 229 significant, but they could not detect pathways 36, 133, and 254, whereas our approach detected them all.

For kernel selection, we used Akaike information criterion (AIC) (Akaike, 1974) and Bayesian information criterion (BIC) (Schwarz, 1978), where AIC = n log {(YŶ)T(YŶ)}+2r and BIC = n log{(YŶ)T(YŶ)}+ r log(n), Ŷ = LY, L = (I+λ−1K)−1[λ−1K+X{XT(I+λ−1K)−1X}−1XT(I+λ−1K)−1], and r = rank(L).
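A Python sketch of this criterion for one fitted kernel (our own illustration, not the authors' Matlab code) is:

    import numpy as np

    def kernel_aic_bic(y, X, K, lam):
        # Y_hat = L y, L = (I + K/lam)^{-1}[K/lam + X{X'(I+K/lam)^{-1}X}^{-1}X'(I+K/lam)^{-1}]
        n = len(y)
        Vinv = np.linalg.inv(np.eye(n) + K / lam)
        A = np.linalg.inv(X.T @ Vinv @ X)
        L = Vinv @ (K / lam + X @ A @ X.T @ Vinv)
        rss = (y - L @ y) @ (y - L @ y)
        r = np.linalg.matrix_rank(L)
        aic = n * np.log(rss) + 2 * r
        bic = n * np.log(rss) + r * np.log(n)
        return aic, bic

The kernel (with its fitted scale parameter) achieving the smallest AIC or BIC is selected.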

We found that all seven pathways are among the top 15 pathways with the smallest AIC and BIC values for all kernels. Pathway 36, c17_U133_probes, has the smallest AIC and BIC values under the Gaussian kernel; pathway 229, Oxidative phosphorylation, under the quadratic kernel; and pathway 254, RNA polymerase, under the neural network kernel. Pathways 4, 36, and 229 have small values among all pathways with the quadratic kernel.

7. Discussion

In this paper, we have derived the asymptotic properties of semiparametric regression for identifying pathway effects on clinical outcomes after controlling for covariates. We compared LLG's REML with our REML obtained from a profile likelihood. We showed that the estimators obtained from our REML and LLG's REML are both consistent, have √n convergence, and asymptotically follow a normal distribution when the true parameters are interior points of the parameter space, or a mixture of normal distributions when one component of the true parameters is a left endpoint of the parameter space. However, our REML gives more accurate score equations and information matrix and has smaller MSEs than LLG's REML. A profile restricted likelihood ratio test is also provided for the non-standard testing problem in our application.

The choice of an appropriate kernel is a significant practical issue. We used kernel selection approaches based on AIC and BIC (Liu et al., 2007). Kim et al. (2012) proposed a Bayes factor-based approach to kernel selection, but it is computationally expensive. Kernel selection can be viewed as a model selection problem within the kernel machine framework. More general and flexible model selection methods for covariance matrix estimation may be explored here and are worth future research.

We note that we analyzed each pathway separately. Pathways are not independent of each other because of shared genes and interactions among pathways, which makes it difficult to adjust p-values. Because existing multiple comparison methods based on false discovery rates (Benjamini and Hochberg, 1995; Storey, 2002, 2003) were developed for single gene based analysis under an independence assumption or a known positive dependence structure among genes, they are not applicable to pathway based analysis, where pathways are not independent of each other. Developing such multiple comparison methods for pathway based analysis will be challenging because of the complex dependence structure among pathways.

It is also important to generalize the semiparametric model (Hastie and Tibshirani, 1990; Tibshirani, 1996) to incorporate multiple pathways using generalized additive models and multivariate adaptive regression splines (Friedman, 1991).

Supplementary Material

Appendix B

Table 4.

Estimated type I error rate and power based on profile restricted likelihood ratio test (PRLRT) and permutation test (PERM). LLIKE=LLG’s likelihood-based approach with kernel K; PLIKE=our profile likelihood-based approach with kernel K; P2K=Quadratic kernel; PRLRT(e) and PRLRT(t) are profile restricted likelihood ratio tests which are performed using empirical distribution and theoretical distribution, respectively.

Setting       Measure  Case  LLIKE (P2K)                    PLIKE (P2K)
                             PERM   PRLRT(e)  PRLRT(t)      PERM   PRLRT(e)  PRLRT(t)
n=60, p=5     Type I   4     0.04   0.05      0.05          0.04   0.05      0.05
              Power    5     0.97   1         1             0.98   1         1
n=60, p=200   Type I   4     0.03   0.04      0.04          0.03   0.04      0.04
              Power    5     0.83   0.85      0.85          0.84   0.85      0.85

Acknowledgments

This study was supported in part by NIH Grants GM-59507, N01-HV-28186 and P30-DA-18343, National Science Foundation Grant DMS 1106738, and a pilot grant from the Yale Pepper Center P30AG021342.

Appendix A. Technical complements

A.1. Score equations and information matrix

We introduce the following notations to simplify our discussion:

K(z, z') = \exp\Big\{-\frac{\sum_{k=1}^{p}(z_k - z'_k)^2}{\rho}\Big\},
K_\rho(z, z') = \frac{\partial K(z, z')}{\partial\rho} = \exp\Big\{-\frac{\sum_{k=1}^{p}(z_k - z'_k)^2}{\rho}\Big\}\Big\{\frac{\sum_{k=1}^{p}(z_k - z'_k)^2}{\rho^2}\Big\},
K_{\rho\rho}(z, z') = \frac{\partial^2 K(z, z')}{\partial\rho^2} = \exp\Big\{-\frac{\sum_{k=1}^{p}(z_k - z'_k)^2}{\rho}\Big\}\Big[\Big\{\frac{\sum_{k=1}^{p}(z_k - z'_k)^2}{\rho^2}\Big\}^2 - \frac{2\sum_{k=1}^{p}(z_k - z'_k)^2}{\rho^3}\Big],

where Kρ(·, ·) and Kρρ(·, ·) are the first and second derivatives of K(·, ·) with respect to ρ. Let K, Kρ, and Kρρ be the n × n matrices whose entries are K(·, ·), Kρ(·, ·), and Kρρ(·, ·), respectively. We recall the following notation:

\Sigma = \sigma^2 I + \tau K = \Sigma(\Theta), \qquad P = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}, \qquad H = I - X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}.

Based on these notations and the calculations of

\frac{\partial\Sigma^{-1}}{\partial\Theta} = -\Sigma^{-1}\frac{\partial\Sigma}{\partial\Theta}\Sigma^{-1} \qquad\text{and}\qquad \frac{\partial\log|\Sigma|}{\partial\Theta} = \operatorname{tr}\Big(\Sigma^{-1}\frac{\partial\Sigma}{\partial\Theta}\Big),

we obtain the first and second derivatives of Σ and Σ⁻¹ with respect to τ as follows: \Sigma_\tau = K, \Sigma_{\tau\tau} = 0, \Sigma^{-1}_\tau = -\Sigma^{-1}\Sigma_\tau\Sigma^{-1}, and \Sigma^{-1}_{\tau\tau} = -(\Sigma^{-1}_\tau\Sigma_\tau\Sigma^{-1} + \Sigma^{-1}\Sigma_{\tau\tau}\Sigma^{-1} + \Sigma^{-1}\Sigma_\tau\Sigma^{-1}_\tau).
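These identities are easy to check numerically. The Python fragment below (our own illustration) compares the analytic derivative Σ⁻¹_τ = −Σ⁻¹KΣ⁻¹ with a central finite difference:

    import numpy as np

    def check_dSinv_dtau(K, tau, sigma2, eps=1e-6):
        # analytic: -Sigma^{-1} K Sigma^{-1}; numeric: central difference in tau
        n = K.shape[0]
        Sigma = lambda t: sigma2 * np.eye(n) + t * K
        Sinv = np.linalg.inv(Sigma(tau))
        analytic = -Sinv @ K @ Sinv
        numeric = (np.linalg.inv(Sigma(tau + eps))
                   - np.linalg.inv(Sigma(tau - eps))) / (2 * eps)
        return np.max(np.abs(analytic - numeric))   # should be near zero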

We also calculate the first and second derivatives of Σ and Σ⁻¹ with respect to ρ: \Sigma_\rho = \tau K_\rho, \Sigma_{\rho\rho} = \tau K_{\rho\rho}, \Sigma^{-1}_\rho = -\Sigma^{-1}\Sigma_\rho\Sigma^{-1}, and \Sigma^{-1}_{\rho\rho} = -(\Sigma^{-1}_\rho\Sigma_\rho\Sigma^{-1} + \Sigma^{-1}\Sigma_{\rho\rho}\Sigma^{-1} + \Sigma^{-1}\Sigma_\rho\Sigma^{-1}_\rho).

The first and second derivatives of Σ and Σ⁻¹ with respect to σ² are \Sigma_{\sigma^2} = I, \Sigma_{\sigma^2\sigma^2} = 0, \Sigma^{-1}_{\sigma^2} = -\Sigma^{-1}\Sigma_{\sigma^2}\Sigma^{-1}, and \Sigma^{-1}_{\sigma^2\sigma^2} = -(\Sigma^{-1}_{\sigma^2}\Sigma_{\sigma^2}\Sigma^{-1} + \Sigma^{-1}\Sigma_{\sigma^2\sigma^2}\Sigma^{-1} + \Sigma^{-1}\Sigma_{\sigma^2}\Sigma^{-1}_{\sigma^2}).

We obtain the second derivatives of Σ and Σ−1 with respect to τ and ρ, τ and σ2, and ρ and σ2 as follows:

\Sigma_{\tau\rho} = K_\rho, \quad \Sigma_{\tau\sigma^2} = 0, \quad \Sigma_{\rho\sigma^2} = 0,
\Sigma^{-1}_{\tau\rho} = -(\Sigma^{-1}_\rho\Sigma_\tau\Sigma^{-1} + \Sigma^{-1}\Sigma_{\tau\rho}\Sigma^{-1} + \Sigma^{-1}\Sigma_\tau\Sigma^{-1}_\rho),
\Sigma^{-1}_{\tau\sigma^2} = -(\Sigma^{-1}_{\sigma^2}\Sigma_\tau\Sigma^{-1} + \Sigma^{-1}\Sigma_{\tau\sigma^2}\Sigma^{-1} + \Sigma^{-1}\Sigma_\tau\Sigma^{-1}_{\sigma^2}),
\Sigma^{-1}_{\rho\sigma^2} = -(\Sigma^{-1}_{\sigma^2}\Sigma_\rho\Sigma^{-1} + \Sigma^{-1}\Sigma_{\rho\sigma^2}\Sigma^{-1} + \Sigma^{-1}\Sigma_\rho\Sigma^{-1}_{\sigma^2}).

The first and second derivatives of (XᵀΣ⁻¹X)⁻¹ with respect to θk, θl ∈ {τ, ρ, σ²} are

(X^T\Sigma^{-1}X)^{-1}_{\theta_k} = -(X^T\Sigma^{-1}X)^{-1}(X^T\Sigma^{-1}_{\theta_k}X)(X^T\Sigma^{-1}X)^{-1},

(X^T\Sigma^{-1}X)^{-1}_{\theta_k\theta_l} = -[(X^T\Sigma^{-1}X)^{-1}_{\theta_l}(X^T\Sigma^{-1}_{\theta_k}X)(X^T\Sigma^{-1}X)^{-1} + (X^T\Sigma^{-1}X)^{-1}(X^T\Sigma^{-1}_{\theta_k\theta_l}X)(X^T\Sigma^{-1}X)^{-1} + (X^T\Sigma^{-1}X)^{-1}(X^T\Sigma^{-1}_{\theta_k}X)(X^T\Sigma^{-1}X)^{-1}_{\theta_l}].

The first and second derivatives of H with respect to θk, θl ∈ {τ, ρ, σ²} follow the same product-rule pattern:

H_{\theta_k} = -\{X(X^T\Sigma^{-1}X)^{-1}_{\theta_k}X^T\Sigma^{-1} + X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}_{\theta_k}\},

H_{\theta_k\theta_l} = -[X(X^T\Sigma^{-1}X)^{-1}_{\theta_k\theta_l}X^T\Sigma^{-1} + X(X^T\Sigma^{-1}X)^{-1}_{\theta_k}X^T\Sigma^{-1}_{\theta_l} + X(X^T\Sigma^{-1}X)^{-1}_{\theta_l}X^T\Sigma^{-1}_{\theta_k} + X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}_{\theta_k\theta_l}].

The first derivatives of P with respect to τ, ρ, and σ2 are

P_{\theta_k} = \Sigma^{-1}_{\theta_k} - [\Sigma^{-1}_{\theta_k}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1} + \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}_{\theta_k} + \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}_{\theta_k}X^T\Sigma^{-1}], \quad \theta_k \in \{\tau, \rho, \sigma^2\}.

The score equations based on our REML l2(Θ) are

\frac{\partial l_2(\Theta)}{\partial\tau} = -\frac{1}{2}\operatorname{tr}(KP) - \frac{1}{2}Y^T\{H_\tau^T\Sigma^{-1}H + H^T\Sigma^{-1}H_\tau + H^T\Sigma^{-1}_\tau H\}Y,
\frac{\partial l_2(\Theta)}{\partial\rho} = -\frac{1}{2}\operatorname{tr}\{(\tau K_\rho)P\} - \frac{1}{2}Y^T\{H_\rho^T\Sigma^{-1}H + H^T\Sigma^{-1}H_\rho + H^T\Sigma^{-1}_\rho H\}Y,
\frac{\partial l_2(\Theta)}{\partial\sigma^2} = -\frac{1}{2}\operatorname{tr}(P) - \frac{1}{2}Y^T\{H_{\sigma^2}^T\Sigma^{-1}H + H^T\Sigma^{-1}H_{\sigma^2} + H^T\Sigma^{-1}_{\sigma^2}H\}Y.

The second derivatives of l2(Θ) with respect to τ, ρ, and σ2 are

\frac{\partial^2 l_2(\Theta)}{\partial\tau^2} = -\frac{1}{2}\operatorname{tr}(KP_\tau) - \frac{1}{2}Y^T\{H_{\tau\tau}^T\Sigma^{-1}H + H^T\Sigma^{-1}_{\tau\tau}H + H^T\Sigma^{-1}H_{\tau\tau} + 2(H_\tau^T\Sigma^{-1}H_\tau + H_\tau^T\Sigma^{-1}_\tau H + H^T\Sigma^{-1}_\tau H_\tau)\}Y,
\frac{\partial^2 l_2(\Theta)}{\partial\rho^2} = -\frac{1}{2}\operatorname{tr}\{\tau(K_{\rho\rho}P + K_\rho P_\rho)\} - \frac{1}{2}Y^T\{H_{\rho\rho}^T\Sigma^{-1}H + H^T\Sigma^{-1}_{\rho\rho}H + H^T\Sigma^{-1}H_{\rho\rho} + 2(H_\rho^T\Sigma^{-1}H_\rho + H_\rho^T\Sigma^{-1}_\rho H + H^T\Sigma^{-1}_\rho H_\rho)\}Y,
\frac{\partial^2 l_2(\Theta)}{\partial(\sigma^2)^2} = -\frac{1}{2}\operatorname{tr}(P_{\sigma^2}) - \frac{1}{2}Y^T\{H_{\sigma^2\sigma^2}^T\Sigma^{-1}H + H^T\Sigma^{-1}_{\sigma^2\sigma^2}H + H^T\Sigma^{-1}H_{\sigma^2\sigma^2} + 2(H_{\sigma^2}^T\Sigma^{-1}H_{\sigma^2} + H_{\sigma^2}^T\Sigma^{-1}_{\sigma^2}H + H^T\Sigma^{-1}_{\sigma^2}H_{\sigma^2})\}Y,
\frac{\partial^2 l_2(\Theta)}{\partial\tau\,\partial\rho} = -\frac{1}{2}\operatorname{tr}(K_\rho P + KP_\rho) - \frac{1}{2}Y^T\{H_{\tau\rho}^T\Sigma^{-1}H + H^T\Sigma^{-1}_{\tau\rho}H + H^T\Sigma^{-1}H_{\tau\rho} + H_\tau^T\Sigma^{-1}_\rho H + H_\tau^T\Sigma^{-1}H_\rho + H_\rho^T\Sigma^{-1}_\tau H + H^T\Sigma^{-1}_\tau H_\rho + H_\rho^T\Sigma^{-1}H_\tau + H^T\Sigma^{-1}_\rho H_\tau\}Y,
\frac{\partial^2 l_2(\Theta)}{\partial\tau\,\partial\sigma^2} = -\frac{1}{2}\operatorname{tr}(KP_{\sigma^2}) - \frac{1}{2}Y^T\{H_{\tau\sigma^2}^T\Sigma^{-1}H + H^T\Sigma^{-1}_{\tau\sigma^2}H + H^T\Sigma^{-1}H_{\tau\sigma^2} + H_\tau^T\Sigma^{-1}_{\sigma^2}H + H_\tau^T\Sigma^{-1}H_{\sigma^2} + H_{\sigma^2}^T\Sigma^{-1}_\tau H + H^T\Sigma^{-1}_\tau H_{\sigma^2} + H_{\sigma^2}^T\Sigma^{-1}H_\tau + H^T\Sigma^{-1}_{\sigma^2}H_\tau\}Y,
\frac{\partial^2 l_2(\Theta)}{\partial\rho\,\partial\sigma^2} = -\frac{1}{2}\operatorname{tr}(\tau K_\rho P_{\sigma^2}) - \frac{1}{2}Y^T\{H_{\rho\sigma^2}^T\Sigma^{-1}H + H^T\Sigma^{-1}_{\rho\sigma^2}H + H^T\Sigma^{-1}H_{\rho\sigma^2} + H_\rho^T\Sigma^{-1}_{\sigma^2}H + H_\rho^T\Sigma^{-1}H_{\sigma^2} + H_{\sigma^2}^T\Sigma^{-1}_\rho H + H^T\Sigma^{-1}_\rho H_{\sigma^2} + H_{\sigma^2}^T\Sigma^{-1}H_\rho + H^T\Sigma^{-1}_{\sigma^2}H_\rho\}Y.

The information matrix based on our REML l2(Θ) is

I_\Theta = \begin{pmatrix} \partial^2 l_2/\partial\tau^2 & \partial^2 l_2/\partial\tau\,\partial\rho & \partial^2 l_2/\partial\tau\,\partial\sigma^2 \\ \partial^2 l_2/\partial\tau\,\partial\rho & \partial^2 l_2/\partial\rho^2 & \partial^2 l_2/\partial\rho\,\partial\sigma^2 \\ \partial^2 l_2/\partial\tau\,\partial\sigma^2 & \partial^2 l_2/\partial\rho\,\partial\sigma^2 & \partial^2 l_2/\partial(\sigma^2)^2 \end{pmatrix}.

Also the score equations and information matrix of Liu et al. (2007) are

\frac{\partial l_1(\Theta)}{\partial\tau} = -\frac{1}{2}\operatorname{tr}(KP) + \frac{1}{2}(Y - X\hat{\beta})^T\Sigma^{-1}K\Sigma^{-1}(Y - X\hat{\beta}),
\frac{\partial l_1(\Theta)}{\partial\rho} = -\frac{1}{2}\operatorname{tr}(\tau K_\rho P) + \frac{1}{2}(Y - X\hat{\beta})^T\Sigma^{-1}(\tau K_\rho)\Sigma^{-1}(Y - X\hat{\beta}),
\frac{\partial l_1(\Theta)}{\partial\sigma^2} = -\frac{1}{2}\operatorname{tr}(P) + \frac{1}{2}(Y - X\hat{\beta})^T\Sigma^{-1}\Sigma^{-1}(Y - X\hat{\beta}),

and

I_\Theta = \Big\{\frac{1}{2}\operatorname{tr}(P\Sigma_{\theta_l}P\Sigma_{\theta_{l'}})\Big\}.

A.2. Regularity conditions

RC1

The observations D = (X, Z, Y) have probability density f(D, Θ) with respect to some measure μ and parameter Θ = (θ1, …, θk). f(D, Θ) has a common support and the model is identifiable. Furthermore, the first and second derivatives of log(f) satisfy

E_\Theta\Big\{\frac{\partial\log f(D, \Theta)}{\partial\theta_i}\Big\} = 0 \quad\text{for } i = 1, \ldots, k

and

I_{ij}(\Theta) = E_\Theta\Big\{\frac{\partial\log f(D,\Theta)}{\partial\theta_i}\,\frac{\partial\log f(D,\Theta)}{\partial\theta_j}\Big\} = E_\Theta\Big\{-\frac{\partial^2\log f(D,\Theta)}{\partial\theta_i\,\partial\theta_j}\Big\}.

RC2

The Fisher information matrix

I(\Theta) = E\Big[\Big\{\frac{\partial\log f(D,\Theta)}{\partial\Theta}\Big\}\Big\{\frac{\partial\log f(D,\Theta)}{\partial\Theta}\Big\}^T\Big]

is finite and positive definite at Θ = Θ0, where Θ0 is the true parameter value.

RC3

There exists an open subset ω of Ω containing the true parameter vector Θ0 such that for almost all D the density f(D, Θ) admits all third derivatives

\frac{\partial^3\log f(D,\Theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_l} \quad\text{for all } \Theta \in \omega.

Furthermore, there exist functions Mijl such that

\Big|\frac{\partial^3\log f(D,\Theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_l}\Big| \le M_{ijl}(D) \quad\text{for all } \Theta \in \omega,

where m_{ijl} = E_{\Theta_0}\{M_{ijl}(D)\} < \infty for all i, j, l.

RC4

Near Θ0 the parameter space Ω behaves like a closed set; this condition was used by Self and Liang (1987) and Vu and Zhou (1997).

A.3. Proof of Theorem 1

Recall that LLG’s REML and our REML are denoted as l1(·) and l2(·), and Θ̂1 and Θ̂2 denote LLG’s estimator and our estimator, respectively. In the following derivation, we use l(·) and Θ̂ to denote the loglikelihood and estimator of either approach to simplify discussion. The true parameter value is denoted by Θ0.

Let αn = n−1/2 + an, where

a_n = \frac{1}{n}\max\Big\{\frac{\partial\log|X^T\Sigma^{-1}X|}{\partial\Theta}\Big\} = \frac{1}{n}\max\{(X^T\Sigma^{-1}X)^{-1}(X^T\Sigma^{-1}\Sigma_\Theta\Sigma^{-1}X)\} = o(1).

We want to show that for any ε > 0 there exists a constant C such that

P\Big\{\omega: \sup_{\|u\| = C} l(\Theta_0 + \alpha_n u) < l(\Theta_0)\Big\} \ge 1 - \varepsilon. \quad (A.1)

This implies that, with probability at least 1 − ε, there exists a local maximum in the ball {Θ0 + αnu: ‖u‖ ≤ C}. Hence, there exists a local maximizer Θ̂ such that ‖Θ̂ − Θ0‖ = O(αn). Since the intersection of the parameter space Ω and the closure of a neighborhood about Θ0 constitutes closed intervals, a local maximizer also exists on this set when Θ0 is on the boundary of the parameter space Ω. Let L′(Θ0) be the gradient vector of the loglikelihood function L. By the standard argument on the Taylor expansion of the likelihood function, and since an = o(1), we have

l(\Theta_0 + \alpha_n u) - l(\Theta_0) = L(\Theta_0 + \alpha_n u) - L(\Theta_0) + \frac{1}{2}\log|X^T\Sigma^{-1}(\Theta_0 + \alpha_n u)X| - \frac{1}{2}\log|X^T\Sigma^{-1}(\Theta_0)X|
\le \alpha_n L'(\Theta_0)^T u - \frac{1}{2}u^T I(\Theta_0)u\,n\alpha_n^2\{1 + o(1)\} + \frac{1}{2}\log\frac{|X^T\Sigma^{-1}(\Theta_0 + \alpha_n u)X|}{|X^T\Sigma^{-1}(\Theta_0)X|}. \quad (A.2)

Note that n^{−1/2}L′(Θ0) = O_p(1). Thus, the first term on the right-hand side of (A.2) is of order O_p(n^{1/2}αn) = O_p(nαn²). By choosing a sufficiently large C, the second term dominates the first term uniformly in ‖u‖ = C; the third term is also dominated by the second term. Hence, by choosing a sufficiently large C, (A.1) holds when Θ0 is in the interior of Ω as well as when Θ0 is on the boundary of Ω. This completes the proof of Theorem 1.

A.4. Proof of Lemma 1

Since Σ(Θ) is a symmetric matrix, we can express it in its eigenbasis as

\Sigma(\Theta) = \operatorname{diag}\{\lambda_{1n}(\Theta), \lambda_{2n}(\Theta), \ldots, \lambda_{nn}(\Theta)\},

where λin(Θ) denotes the ordered eigenvalues of Σ(Θ) and Θ = (θ1, …, θk). For ease of notation, we write λin(Θ) ≡ λin and Σ(Θ) ≡ Σ. We define

\Sigma_{\theta_k} = \Sigma_k = \operatorname{diag}\{\partial_k(\lambda_{1n}), \partial_k(\lambda_{2n}), \ldots, \partial_k(\lambda_{nn})\}.

Since

P = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1} = \operatorname{diag}(\lambda_{1n}^{-1}, \ldots, \lambda_{nn}^{-1}) - \Big(\frac{x_1^2}{\lambda_{1n}} + \cdots + \frac{x_n^2}{\lambda_{nn}}\Big)^{-1}\operatorname{diag}(\lambda_{1n}^{-1}, \ldots, \lambda_{nn}^{-1})\,(x_i x_j)\,\operatorname{diag}(\lambda_{1n}^{-1}, \ldots, \lambda_{nn}^{-1}),

where (x_i x_j) denotes the n × n matrix with ijth entry x_i x_j (displayed for the case q = 1),

we have

P\Sigma_k = \operatorname{diag}\Big\{\frac{\partial_k(\lambda_{1n})}{\lambda_{1n}}, \ldots, \frac{\partial_k(\lambda_{nn})}{\lambda_{nn}}\Big\} - \Big(\sum_{i=1}^n \frac{x_i^2}{\lambda_{in}}\Big)^{-1}\Big(\frac{x_i x_j\,\partial_k(\lambda_{jn})}{\lambda_{in}\lambda_{jn}}\Big).

Hence, the mth diagonal element of PΣk is

(P\Sigma_k)_{mm} = \frac{\partial_k(\lambda_{mn})}{\lambda_{mn}} - \Big(\sum_{i=1}^n \frac{x_i^2}{\lambda_{in}}\Big)^{-1}\frac{x_m^2}{\lambda_{mn}^2}\,\partial_k(\lambda_{mn}) = \Big(\frac{\sum_{i \ne m} x_i^2/\lambda_{in}}{\sum_{i=1}^n x_i^2/\lambda_{in}}\Big)\frac{\partial_k(\lambda_{mn})}{\lambda_{mn}}.

Therefore, we have

\frac{1}{n}\operatorname{tr}(P\Sigma_k) = \frac{1}{n}\sum_{m=1}^n \Big(\frac{n^{-1}\sum_{i \ne m} x_i^2/\lambda_{in}}{n^{-1}\sum_{i=1}^n x_i^2/\lambda_{in}}\Big)\frac{\partial_k(\lambda_{mn})}{\lambda_{mn}}.

Since (1/n)\sum_{i \ne m} x_i^2/\lambda_{in} and (1/n)\sum_{i=1}^n x_i^2/\lambda_{in} coincide as n goes to ∞,

\frac{1}{n}\operatorname{tr}(P\Sigma_k) \to E\Big\{\frac{\partial_k(\lambda_{mn})}{\lambda_{mn}}\Big\}

by the law of large numbers.

A.5. Conditions C1–C4

Cressie and Lahiri (1993, 1996) showed that a general result for the asymptotic property of the REML estimator Θ̂reml in a parametric linear model with Gaussian error, Y ~ Nn(Xβ, Σ(Θ)), holds under conditions C1–C4, described at the end of this section, where Y is an n × 1 data vector (Y1, …, Yn)ᵀ, X is an n × q matrix of explanatory variables, β is a q × 1 vector of unknown large scale effects, and Σ(Θ) is an n × n positive definite variance matrix which is known up to a k × 1 vector of small scale effects Θ = (θ1, …, θk). In our studies, Σ(Θ) = σ²I + τK is a positive definite matrix. When we use the Gaussian kernel, Θ = (τ, ρ, σ²) and k = 3. For the polynomial and neural network kernels, Θ = (τ, σ²) and k = 2.

Let U = ΓY represent a vector of n − s linearly independent error contrasts; that is, the n − s rows of Γ are linearly independent and U ~ N(0, ΓΣ(Θ)Γᵀ). If a set of n − s linearly independent contrasts is used to define U, the new negative loglikelihood function is

L_U(\Theta) = \frac{n-s}{2}\log(2\pi) - \frac{1}{2}\log|X^TX| + \frac{1}{2}\log|\Sigma(\Theta)| + \frac{1}{2}\log|X^T\Sigma(\Theta)^{-1}X| + \frac{1}{2}Y^T P Y,

where P = Σ⁻¹ − Σ⁻¹X(XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹. A REML estimator of Θ can be obtained by minimizing this function. Let φn(Θ) = (∂²LU(Θ)/∂θi∂θj), the k × k matrix of second-order partial derivatives of the negative loglikelihood function LU(·). Then E{φn(Θ)}ij = tr{PΣi(Θ)PΣj(Θ)}/2. Under conditions C1–C4, Cressie and Lahiri (1993, 1996) showed that

[E_\Theta\{\varphi_n(\Theta)\}]^{1/2}(\hat{\Theta}_{\mathrm{reml}} - \Theta) = O_p(1).

In our semiparametric regression setting, we show that their result holds under the situations described in Lemmas 1 and 2. Using Lemmas 1 and 2 of our paper and the result of Cressie and Lahiri (1993, 1996), (1/n) tr{PΣi(Θ)PΣj(Θ)}/2 converges uniformly to a limit, where →u denotes uniform convergence. Therefore, we can show that

\sqrt{n}(\hat{\Theta}_{\mathrm{reml}} - \Theta) = O_p(1).

The conditions needed for applying the result of Cressie and Lahiri (1993, 1996) are

  • C1. Σ(Θ) is twice continuously differentiable on Θ.

  • C2. For all c > 0 and η > 0, with ‖B‖ = {tr(BBᵀ)}^{1/2}:

    1. \sup\big[\big\|\{E_\Theta\varphi_n(\Theta)\}^{-1/2}\{E_\Theta\varphi_n(\Theta_0)\}^{1/2} - I_k\big\| : \big\|\{E_\Theta\varphi_n(\Theta)\}^{-1/2}(\Theta - \Theta_0)\big\| \le c;\ \Theta, \Theta_0 \in \Omega\big] \to_u 0;
    2. with M = (\theta_1^0, \ldots, \theta_k^0),
       P_\Theta\big(\sup\big[\big\|\{E_\Theta\varphi_n(\Theta)\}^{-1/2}\{\varphi_n(M) - \varphi_n(\Theta)\}(\{E_\Theta\varphi_n(\Theta)\}^{-1/2})^T\big\| : \big\|(\{E_\Theta\varphi_n(\Theta)\}^{-1/2})^T(\Theta - \Theta_i^0)\big\| \le c,\ 1 \le i \le k\big] > \eta\big) \to_u 0.

  • C3. There exists a positive definite matrix W(Θ), continuous in Θ, such that Qn(Θ) converges uniformly to W(Θ), where
    Q_n(\Theta)_{ij} = \operatorname{tr}\{P\Sigma_i(\Theta)P\Sigma_j(\Theta)\}/(\|P\Sigma_i(\Theta)\|\,\|P\Sigma_j(\Theta)\|).

  • C4. Let |λ1n(Θ)| ≤ ⋯ ≤ |λnn(Θ)| denote the ordered absolute eigenvalues of Σ(Θ), and let \lambda_{1n}^{ij}(\Theta) \le \cdots \le \lambda_{nn}^{ij}(\Theta) denote those of Σij = ∂²Σ(Θ)/∂θi∂θj. Suppose there is a sequence (r_n)_{n \ge 1} with \limsup_{n\to\infty} r_n/n \le 1 - \delta^* for some δ* ∈ (0, 1), such that for any compact subset Ωs ⊆ Ω there exist constants 0 < C1(Ωs) < ∞ and η1(Ωs) > 0 with

    \limsup_{n\to\infty}\max\{\lambda_{nn}(\Theta), \lambda_{nn}^{i}(\Theta), \lambda_{nn}^{ij}(\Theta) : 1 \le i, j \le k\} < C_1(\Omega_s) < \infty,
    \limsup_{n\to\infty}\min\{\lambda_{1n}(\Theta), \lambda_{r_n n}^{i}(\Theta) : 1 \le i \le k\} > \eta_1(\Omega_s) > 0

    uniformly in Θ ∈ Ωs.

A.6. Proof of Theorem 2

We check the validity of assumptions (C1), (C2), (C3), and (C4) as follows:

  • Conditions (C1) and (C2.ii) require smoothness of the variance–covariance matrix Σ(Θ) as a function of Θ. Since our Σ(Θ) = τK(ρ) + σ²I is a smooth function of Θ, conditions (C1) and (C2.ii) hold.

  • Since {φn(Θ)}^{1/2}, n ≥ 1, is sufficiently smooth, condition (C2.i) holds by the arguments of Cressie and Lahiri (1996).

  • C3. We show that condition (C3) holds under the situations described in Lemmas 1 and 2. When we use the Gaussian, polynomial, and neural network kernels, Σ and Στ are simultaneously diagonalizable, as are Σ and Σσ². For the Gaussian kernel, we also need to show that Σ and Σρ are simultaneously diagonalizable as n goes to ∞. We can show this using Lemma 2 as follows:

    • The ijth elements of the matrices KKρ and KρK are
      (KK_\rho)_{ij} = \sum_{k=1}^n K_{ik}(K_\rho)_{kj} = \sum_{k=1}^n \frac{\|z_k - z_j\|^2}{\rho^2}\exp\Big\{-\frac{\|z_i - z_k\|^2 + \|z_k - z_j\|^2}{\rho}\Big\},
      (K_\rho K)_{ij} = \sum_{k=1}^n \frac{\|z_i - z_k\|^2}{\rho^2}\exp\Big\{-\frac{\|z_i - z_k\|^2 + \|z_k - z_j\|^2}{\rho}\Big\}.

      Since Ez(KKρ − KρK) → 0 as n → ∞, we have KKρ = KρK as n → ∞. Therefore, K and Kρ, and hence Σ and Σρ, are almost simultaneously diagonalizable for large n. From Lemma 1, (1/n) tr(PΣρ) converges. Therefore, condition C3 holds if Σ and Σk are simultaneously diagonalizable.

  • C4. Since Σ(Θ) and Σi(Θ) are nonsingular and uniformly bounded over a compact subset of Ω, this condition holds.

This completes the proof of Theorem 2 when Θ0 is in the interior of the parameter space Ω.

Since the intersection of the parameter space Ω and the closure of a neighborhood about Θ0 constitutes closed intervals, Theorem 2 also holds when Θ0 is on the boundary of Ω, by the arguments of Geyer (1994) and Vu and Zhou (1997), who extended the result of Chernoff (1954) to non-identically distributed sampling. Geyer's (1994) result is based on a sampling model that is essentially a stationary process, while Vu and Zhou (1997) have no such restriction and allow general non-identically distributed sampling, so that models with covariances can be included.

We can demonstrate that our results hold using Vu and Zhou (1997) by relating their conditions to ours. Vu and Zhou (1997) established the existence, consistency, and asymptotic properties of local maximum estimators covering a large class of estimation problems that allow sampling from non-identically distributed random variables, under conditions A1–A2 and B1–B4 of their paper. Regularity condition RC3 of our paper implies their A1; RC4 implies A2; RC1 and RC2 imply B1; RC2 implies B2; and condition C2 in Appendix A.5 implies B3 and B4. Therefore, under regularity conditions, Theorem 2 also holds when Θ0 is on the boundary of the parameter space Ω.

We can also show our results using Geyer (1994), but we need to add his Assumption D. Geyer (1994) assumed that the sampling distribution is a stationary process and showed the existence, consistency, and asymptotic properties of local maximum estimators under Assumptions A–D. Since our density function f(D, Θ) is C³ in Θ, Assumption A in Geyer (1994) is satisfied; RC3 implies Assumption B; Assumption C in Geyer (1994) is

\sqrt{n}\Big(\sum_i \frac{\partial f(D_i, \Theta)}{\partial\Theta} - E\Big\{\frac{\partial f(D_i, \Theta)}{\partial\Theta}\Big\}\Big) \to N(0, A)

for some covariance matrix A. Assumption C is satisfied by the weak law of large numbers and the central limit theorem. Assumption D requires that the estimating sequence satisfies Θ̂n = Θ0 + op(1). Therefore, Theorem 2 also holds under regularity conditions.

A.7. Proof of Theorem 3

For the consistent estimator Θ̂, a first order Taylor expansion of l(·) near Θ0 yields

0 = \frac{\partial l(\Theta)}{\partial\Theta}\Big|_{\hat{\Theta}} = \frac{\partial l(\Theta)}{\partial\Theta}\Big|_{\Theta_0} + \frac{\partial^2 l(\Theta)}{\partial\Theta\,\partial\Theta^T}\Big|_{\bar{\Theta}}(\hat{\Theta} - \Theta_0),

where Θ̄ is a vector between Θ̂ and Θ0. Consequently, we have

\sqrt{n}(\hat{\Theta} - \Theta_0) = \Big\{-\frac{1}{n}\frac{\partial^2 l(\Theta)}{\partial\Theta\,\partial\Theta^T}\Big|_{\bar{\Theta}}\Big\}^{-1}\frac{1}{\sqrt{n}}\frac{\partial l(\Theta)}{\partial\Theta}\Big|_{\Theta_0}.

Next, we show that (i)

\frac{1}{\sqrt{n}}\frac{\partial l(\Theta)}{\partial\Theta}\Big|_{\Theta_0} \to N\{0, I(\Theta_0)\}

and (ii)

-\frac{1}{n}\frac{\partial^2 l(\Theta)}{\partial\Theta\,\partial\Theta^T}\Big|_{\bar{\Theta}} \to I(\Theta_0)

To prove (i), note that

\frac{1}{\sqrt{n}}\frac{\partial l(\Theta)}{\partial\Theta}\Big|_{\Theta_0} = \frac{1}{\sqrt{n}}\frac{\partial L(\Theta)}{\partial\Theta}\Big|_{\Theta_0} + \frac{1}{2\sqrt{n}}(X^T\Sigma^{-1}X)^{-1}(X^T\Sigma^{-1}\Sigma_\Theta\Sigma^{-1}X).

The first term converges in distribution to N{0, I(Θ0)}; the second term goes to zero at rate o(n^{−1/2}) because (XᵀΣ⁻¹X)⁻¹(XᵀΣ⁻¹Σ_ΘΣ⁻¹X) = o(1). Thus, (i) holds.

For (ii), since

-\frac{1}{n}\frac{\partial^2 l(\Theta)}{\partial\Theta\,\partial\Theta^T}\Big|_{\bar{\Theta}} = -\frac{1}{n}\frac{\partial^2 L(\Theta)}{\partial\Theta\,\partial\Theta^T}\Big|_{\bar{\Theta}} - \frac{1}{2n}\Big\{\frac{\partial(X^T\Sigma^{-1}X)^{-1}}{\partial\Theta}X^T(\Sigma^{-1}\Sigma_\Theta\Sigma^{-1})X + (X^T\Sigma^{-1}X)^{-1}X^T\frac{\partial(\Sigma^{-1}\Sigma_\Theta\Sigma^{-1})}{\partial\Theta}X\Big\}

and the second term on the right-hand side again goes to zero, the first term converges in probability to I(Θ0) (Lehmann, 1983).

Thus, applying Slutsky’s lemma and converting back to the original parameter space via the delta method, the asymptotic normality holds. The proof of (3.1) is done.

The proof of (3.2) can be shown by Chant’s (1974) case (i). Suppose that Ω = Ω1 × Ω2 × Ω3, Θ1 is a left endpoint of Ω1, and other components of Θ are interior points of Ωi, i = 2, 3. Then Θ̂ can be expressed as

\begin{pmatrix} N_1 \\ N_2 \\ N_3 \end{pmatrix} 1\{N_1 > 0\} + \begin{pmatrix} 0 \\ N_2 - (I_{21}/I_{11})N_1 \\ N_3 - (I_{31}/I_{11})N_1 \end{pmatrix} 1\{N_1 \le 0\},

where N is a random vector with a multivariate Gaussian distribution with mean Θ and covariance I⁻¹(Θ0), and Θ is restricted to lie in CΩ − Θ0, with CΩ a cone with vertex at Θ0. This expression corresponds to the result of Chant's (1974) case (i).

A.8. Proof of Theorem 4

If Θ̂ is consistent, then by a Taylor series approximation,

\sqrt{n}(\hat{\Theta} - \Theta_0) = \Big\{-\frac{1}{n}\frac{\partial^2 l(\Theta)}{\partial\Theta\,\partial\Theta^T}\Big|_{\bar{\Theta}}\Big\}^{-1}\frac{1}{\sqrt{n}}\frac{\partial l(\Theta)}{\partial\Theta}\Big|_{\Theta_0}.

Therefore,

\hat{\Theta}_1 - \Theta_0 \approx (-l_{1,\Theta\Theta}|_{\bar{\Theta}})^{-1}\,l_{1,\Theta}|_{\Theta_0}, \qquad \hat{\Theta}_2 - \Theta_0 \approx (-l_{2,\Theta\Theta}|_{\bar{\Theta}})^{-1}\,l_{2,\Theta}|_{\Theta_0},

where Θ̄ is a vector between Θ̂ and Θ0, li,Θ = ∂li(Θ)/∂Θᵀ, and li,ΘΘ = ∂li,Θ(Θ)/∂Θᵀ, i = 1, 2.

By applying the chain rule, we obtain the first and second derivatives of the REML log-likelihood with respect to Θ as follows:

$$l_{2,\Theta}(\Theta)=\frac{\partial l\{\Theta,\hat{\beta}(\Theta)\}}{\partial\Theta}=l_{\Theta}(\Theta)+l_{\hat{\beta}}(\Theta)\,\hat{\beta}_{\Theta}(\Theta)=l_{1,\Theta}(\Theta)+l_{\hat{\beta}}(\Theta)\,\hat{\beta}_{\Theta}(\Theta)=l_{1,\Theta}(\Theta)\quad(\text{since } l_{\hat{\beta}}=0)$$

and

$$\begin{aligned}
l_{2,\Theta\Theta}(\Theta)&=\frac{\partial^2 l\{\Theta,\hat{\beta}(\Theta)\}}{\partial\Theta\,\partial\Theta^T}=l_{\Theta\Theta}(\Theta)+l_{\hat{\beta}\Theta}(\Theta)\,\hat{\beta}_{\Theta}(\Theta)+l_{\hat{\beta}}(\Theta)\,\hat{\beta}_{\Theta\Theta}(\Theta)\\
&=l_{\Theta\Theta}(\Theta)+\left\{\frac{\partial l_{\hat{\beta}}(\Theta)}{\partial\hat{\beta}(\Theta)}\,\frac{\partial\hat{\beta}(\Theta)}{\partial\Theta}\right\}\hat{\beta}_{\Theta}(\Theta)+l_{\hat{\beta}}(\Theta)\,\hat{\beta}_{\Theta\Theta}(\Theta)\\
&=l_{\Theta\Theta}(\Theta)+l_{\hat{\beta}\hat{\beta}}(\Theta)\{\hat{\beta}_{\Theta}(\Theta)\}^2+l_{\hat{\beta}}(\Theta)\,\hat{\beta}_{\Theta\Theta}(\Theta)\\
&=l_{1,\Theta\Theta}(\Theta)+l_{\hat{\beta}\hat{\beta}}\{\hat{\beta}_{\Theta}(\Theta)\}^2\quad(\text{since } l_{\hat{\beta}}=0 \text{ and } l_{\hat{\beta}\hat{\beta}}(\hat{\beta}_{\Theta})^2<0)\\
&<l_{1,\Theta\Theta}<0.
\end{aligned}$$

Since $\|l_{2,\Theta\Theta}\|>\|l_{1,\Theta\Theta}\|$, we obtain the following inequality:

$$\|\hat{\Theta}_1-\Theta\|\approx\|l_{1,\Theta\Theta}^{-1}l_{1,\Theta}\|>\|l_{2,\Theta\Theta}^{-1}l_{1,\Theta}\|=\|l_{2,\Theta\Theta}^{-1}l_{2,\Theta}\|\approx\|\hat{\Theta}_2-\Theta\|.$$

Therefore, $E\|\hat{\Theta}_1-\Theta\|^2>E\|\hat{\Theta}_2-\Theta\|^2$. This means that the estimators based on our REML have a theoretically smaller mean squared error than those based on LLG's REML. This completes the proof of Theorem 4.

A.9. Non-identifiability between τ and ρ when ρ→0

If τ ~ O(1/ρm) for any positive value m and ρ ~ O{E(‖z − z′‖2)}, then Θ̂2 is asymptotically normally distributed with mean Θ and covariance matrix I(Θ). But if ρ→0, so that Hτ = Hττ = Pτ = Pττ = 0 and Hρ = Hρρ = Pρ = Pρρ = 0, then the asymptotic distributions of τ̂2 and ρ̂2 are the same and degenerate.
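A small numerical sketch (our own; the inputs z are arbitrary) makes this degeneracy concrete: as ρ→0, both the τ- and ρ-derivatives of the Gaussian kernel vanish off the diagonal, so the data carry vanishing information with which to separate τ̂2 from ρ̂2.

```python
# Hedged sketch: off-diagonal derivatives of K = tau * exp(-d^2 / rho)
# vanish as rho -> 0, illustrating the non-identifiability of tau and rho.
import numpy as np

z = np.linspace(0.0, 1.0, 5)[:, None]                # toy gene-expression inputs
D2 = (z - z.T) ** 2                                  # squared distances ||z_i - z_j||^2
tau = 1.0
off = ~np.eye(len(z), dtype=bool)                    # off-diagonal mask
for rho in (1.0, 0.1, 0.01, 0.001):
    K_tau = np.exp(-D2 / rho)                        # dK/dtau
    K_rho = tau * (D2 / rho**2) * np.exp(-D2 / rho)  # dK/drho
    print(rho, np.abs(K_tau[off]).max(), np.abs(K_rho[off]).max())
```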

For LLG’s estimator Θ̂1, as ρ→0, the asymptotic distributions of τ̂1 and ρ̂1 are also the same because the information matrix of Liu et al. (2007) is

$$I_{\theta_l,\theta_l}=\tfrac{1}{2}\,\mathrm{tr}\{P\Sigma_{\theta_l}P\Sigma_{\theta_l}\}$$

and $\Sigma_{\theta_l}\rightarrow 0$.

A.10. Proof of the equivalence of the two tests

Testing H0: {r(Z) is a point mass at zero} ∪ {r(Z) has a constant covariance matrix as a function of z} is equivalent to testing ∂K(Z)/∂Z = 0. Note that

$$K(z_i,z_j)=\tau\exp\left(-\frac{\|z_i-z_j\|^2}{\rho}\right),\qquad \frac{\partial K(z_i,z_j)}{\partial z_i}=-2\,\|z_i-z_j\|\left(\frac{\tau}{\rho}\right)\exp\left(-\frac{\|z_i-z_j\|^2}{\rho}\right).$$

If ρ→0 and τ→0 at the faster rate O(ρm), then ∂K(Z)/∂Z = 0; that is, if τ/ρ→0, then ∂K(Z)/∂Z = 0.

If ρ→∞, then 0 ≤ exp(−‖zi − zj‖2/ρ) ≤ 1. Therefore,

$$0\le\frac{\tau}{\rho}\exp\left(-\frac{\|z_i-z_j\|^2}{\rho}\right)\le\frac{\tau}{\rho}.$$

Hence, if $\tau/\rho\rightarrow 0$, then $\partial K(Z)/\partial Z=0$.

If ρ→0 and τ ~ O(1/ρm), then exp(−‖zi − zj‖2/ρ) decays to zero faster than τ/ρ grows, so ∂K(Z)/∂Z = 0. In summary, if τ/ρ→0 or ρ→0, then ∂K(Z)/∂Z = 0.
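Both limiting regimes can be verified directly. In this hedged sketch (arbitrary example points z_i, z_j of our own choosing), the kernel gradient shrinks when τ/ρ→0, and also when ρ→0 with τ ~ O(1/ρm), where the exponential factor eventually dominates the diverging τ/ρ.

```python
# Hedged sketch: the kernel gradient vanishes in both limiting regimes
# of the equivalence argument. Example points are illustrative only.
import numpy as np

def grad_K(zi, zj, tau, rho):
    # gradient of tau * exp(-||zi - zj||^2 / rho) with respect to zi
    diff = zi - zj
    return -2.0 * diff * (tau / rho) * np.exp(-np.sum(diff**2) / rho)

zi, zj = np.array([0.3, 0.7]), np.array([0.1, 0.2])
# Regime 1: tau/rho -> 0 (rho fixed, tau shrinking)
for tau in (1e-1, 1e-3, 1e-5):
    print("tau/rho -> 0:", tau, np.abs(grad_K(zi, zj, tau, 1.0)).max())
# Regime 2: rho -> 0 with tau ~ O(1/rho^m); the exponential dominates
for rho in (1e-1, 1e-2, 1e-3):
    tau = 1.0 / rho**2                           # m = 2, so tau/rho diverges
    print("rho -> 0:", rho, np.abs(grad_K(zi, zj, tau, rho)).max())
```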

Appendix B. Supplementary materials

Some additional results and information on pathways and genes are available in a separate supplementary materials file.

Supplementary data associated with this paper can be found in the online version at http://dx.doi.org/10.1016/j.jspi.2012.09.009.

References

  1. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19:716–723.
  2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
  3. Chant D. On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika. 1974;61:291–298.
  4. Chernoff H. On the distribution of the likelihood ratio. Annals of Mathematical Statistics. 1954;25:573–578.
  5. Claeskens G. Restricted likelihood ratio lack-of-fit tests using mixed spline models. Journal of the Royal Statistical Society, Series B. 2004;66:909–926.
  6. Cressie N, Lahiri SN. The asymptotic distribution of REML estimators. Journal of Multivariate Analysis. 1993;45:217–233.
  7. Cressie N, Lahiri SN. Asymptotics for REML estimation of spatial covariance parameters. Journal of Statistical Planning and Inference. 1996;50:327–341.
  8. Cristianini N, Shawe-Taylor J. Kernel Methods for Pattern Analysis. Cambridge University Press; Cambridge: 2006.
  9. Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nature Genetics. 2002;31:19–20. doi: 10.1038/ng0502-19.
  10. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Annals of Statistics. 2004;32:407–499.
  11. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
  12. Friedman JH. Multivariate adaptive regression splines (with discussion). The Annals of Statistics. 1991;19:1–141.
  13. Geyer CJ. On the asymptotics of constrained M-estimation. The Annals of Statistics. 1994;22:1993–2010.
  14. Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC. A global test for groups of genes: testing association with a clinical outcome. Bioinformatics. 2004;20:93–99. doi: 10.1093/bioinformatics/btg382.
  15. Gu C, Ma P. Optimal smoothing in nonparametric mixed-effect models. Annals of Statistics. 2005;33:1357–1379.
  16. Harris MA, et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
  17. Hastie TJ, Tibshirani RJ. Generalized Additive Models. Chapman & Hall/CRC; 1990.
  18. Hosack DA, Dennis G Jr, Sherman BT, Clifford H, Lempicki RA. Identifying biological themes within lists of genes with EASE. Genome Biology. 2003;4(10):R70. doi: 10.1186/gb-2003-4-10-r70.
  19. Jiamjarasrangsi W, Lertmaharit S, Sangwatanaroj S, Lohsoonthorn V. Type 2 diabetes, impaired fasting glucose, and their association with increased hepatic enzyme levels among the employees in a university hospital in Thailand. Journal of the Medical Association of Thailand. 2009;92:961–968.
  20. Kim I, Pang H, Zhao H. Bayesian semiparametric regression models for evaluating pathway effects on clinical continuous and binary outcomes. Statistics in Medicine. 2012;31:1633–1651. doi: 10.1002/sim.4493.
  21. Lehmann EL. Theory of Point Estimation. John Wiley; New York: 1983.
  22. Liu D, Lin X, Ghosh D. Semiparametric regression of multi-dimensional genetic pathway data: least squares kernel machines and linear mixed models. Biometrics. 2007;63:1079–1088. doi: 10.1111/j.1541-0420.2007.00799.x.
  23. Misu H, Takamura T, Matsuzawa N, Shimizu A, Ota T, Sakurai M, Ando H, Arai K, Yamashita T, Honda M, Yamashita T, Kaneko S. Genes involved in oxidative phosphorylation are coordinately upregulated with fasting hyperglycaemia in livers of patients with type 2 diabetes. Diabetologia. 2007;50:268–277. doi: 10.1007/s00125-006-0489-8.
  24. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop L. PGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics. 2003;34:267–273. doi: 10.1038/ng1180.
  25. Mootha VK, Handschin C, Arlow D, Xie X, Pierre JS, Sihag S, Yang W, Altshuler D, Puigserver P, Patterson N, Willy PJ, Schulman IG, Heyman RA, Lander ES, Spiegelman BM. Errα and Gabpa/b specify PGC-1α-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. Proceedings of the National Academy of Sciences. 2004;101:6570–6575. doi: 10.1073/pnas.0401401101.
  26. Rajagopalan DA, Agarwal P. Inferring pathways from gene lists using a literature-derived network of biological relationships. Bioinformatics. 2005;21:788–793. doi: 10.1093/bioinformatics/bti069.
  27. Saxena V. Genomic Response, Bioinformatics, and Mechanics of the Effects of Forces on Tissues and Wound Healing. PhD dissertation, Department of Mechanical Engineering, Massachusetts Institute of Technology; 2001.
  28. Schwarz GE. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
  29. Self SG, Liang KY. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association. 1987;82:605–610.
  30. Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics. 2003;31:2013–2035.
  31. Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B. 2002;64:479–498.
  32. Subramanian A, Tamayo P, Mootha V, Mukherjee S, Ebert B, Gillette M, Paulovich A, Pomeroy S, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
  33. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
  34. Vu HTV, Zhou S. Generalization of likelihood ratio tests under nonstandard conditions. The Annals of Statistics. 1997;25:897–916.
  35. Wang Y. Mixed-effects smoothing spline ANOVA. Journal of the Royal Statistical Society, Series B. 1998;60:159–174.
  36. Zhang D, Lin X, Raz J, Sowers M. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association. 1998;93:710–719.
  37. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.
