Published in final edited form as: Ann Stat. 2015 Oct;43(5):2102–2131. doi: 10.1214/15-AOS1344

Estimation and Inference in Generalized Additive Coefficient Models for Nonlinear Interactions with High-Dimensional Covariates

Shujie Ma, Raymond J. Carroll, Hua Liang, Shizhong Xu
PMCID: PMC4578655  NIHMSID: NIHMS719947  PMID: 26412908

Abstract

In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423–1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the “large p small n” setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.

Key words and phrases: Adaptive group lasso, bootstrap smoothing, curse of dimensionality, gene-environment interaction, generalized additive partially linear models, inference for high-dimensional data, oracle property, penalized likelihood, polynomial splines, two-step estimation, undersmoothing

1. Introduction

Regression analysis is a commonly used statistical tool for modeling the relationship between a scalar dependent variable Y and one or more explanatory variables denoted as T = (T1, T2, …, Tp)T. To study the marginal effects of the predictors on the response, one may fit a generalized linear model (GLM),

$$E(Y\mid T)=\mu(T)=g^{-1}\{\eta(T)\},\qquad \eta(T)=\sum_{\ell=1}^{p}\alpha_{\ell 0}T_{\ell}, \quad (1)$$

where g is a known monotone link function and the $\alpha_{\ell 0}$, $1\le \ell\le p$, are unknown parameters. Sometimes the effect of one variable may change with other variables; that is, there is an interaction effect. By letting $T_1\equiv 1$, to incorporate the interaction effects of the $T_\ell$ with the other variables, denoted as $X=(X_1,\dots,X_d)^T$, model (1) can be modified to $E(Y\mid X,T)=\mu(X,T)=g^{-1}\{\eta(X,T)\}$ with

$$\eta(X,T)=\alpha_{10}+\sum_{\ell=2}^{p}\alpha_{\ell 0}T_{\ell}+\sum_{k=1}^{d}\alpha_{1k}X_{k}+\sum_{\ell=2}^{p}\sum_{k=1}^{d}\alpha_{\ell k}X_{k}T_{\ell}, \quad (2)$$

where the $\alpha_{\ell k}$ for $0\le k\le d$ and $1\le \ell\le p$ are parameters. After a direct reformulation, model (2) can be written as

$$\eta(X,T)=\sum_{\ell=1}^{p}\Big(\alpha_{\ell 0}+\sum_{k=1}^{d}\alpha_{\ell k}X_{k}\Big)T_{\ell}. \quad (3)$$

Here the effect of each $T_\ell$ changes linearly with the $X_k$. In practice, however, this simple linear relationship may not reflect the true pattern by which the coefficient changes with the other covariates. We use an example of gene–environment (G × E) interactions for illustration. It has been noted in the literature that obesity is linked to genetic factors. Their effects, however, can be altered by environmental factors such as sleeping hours [Knutson (2012)] and physical activity [Wareham, van Sluijs and Ekelund (2005)]. To get a rough idea of how the effects of the genetic factors change with the environment, we explore data from the Framingham Heart Study [Dawber, Meadors and Moore (1951)]. In Figure 1 we plot the estimated mean body mass index (BMI) against sleeping hours per day and activity hours per day, respectively, for people with the three possible genotype categories AA, Aa and aa of one single nucleotide polymorphism (SNP). A detailed description and the analysis of this data set are given in Section 5. We define allele A as the minor (less frequent) allele. This figure clearly shows different nonlinear curves for the three groups in each of the two plots. By letting $T_\ell$ be the indicator for the $\ell$th group, the linear function in model (3) is clearly misspecified.

Fig. 1. Plots of the estimated BMI against sleeping hours per day (left panel) and activity hours per day (right panel) for the three genotypes AA (solid line), Aa (dashed line) and aa (dotted line) of SNP rs242263 in the Framingham study, where A is the minor allele.


To relax the linearity assumption, we allow each $\alpha_{\ell k}X_k$ term to be an unknown nonlinear function of $X_k$, and thus extend model (3) to the generalized additive coefficient model (GACM)

$$\eta(X,T)=\sum_{\ell=1}^{p}\Big\{\alpha_{\ell 0}+\sum_{k=1}^{d}\alpha_{\ell k}(X_{k})\Big\}T_{\ell}=\sum_{\ell=1}^{p}\alpha_{\ell}(X)T_{\ell}. \quad (4)$$

For identifiability, the functional components satisfy $E\{\alpha_{\ell k}(X_k)\}=0$ for $1\le k\le d$ and $1\le \ell\le p$. The conditional variance of Y is modeled as a function of the mean, that is, $\mathrm{var}(Y\mid X,T)=V\{\mu(X,T)\}=\sigma^{2}(X,T)$. In each coefficient function of the GACM, the covariates $X_k$ are continuous variables. If some of them are discrete, they enter linearly. For example, if $X_k$ is binary, we let $\alpha_{\ell k}(X_k)=\alpha_{\ell k}X_k$. In such a case, model (4) becomes a partially linear additive coefficient model. The linearity of (4) in T is particularly appropriate when those factors are discrete, for example, SNPs in a genome-wide association study (GWAS), as in the data example of Section 5.

For the low-dimensional case in which the dimensions of X and T are fixed, estimation of model (4) has been studied; see Liu and Yang (2010), Xue and Liang (2010) and Xue and Yang (2006) for spline estimation procedures and Lee, Mammen and Park (2012) for a backfitting algorithm. In modern data applications, however, model (4) is particularly useful when p is large. For example, in GWAS the number of SNPs, which is p, can be very large, while the dimension of X, the environmental factors, which is d, is inevitably relatively small. Moreover, the number of variables in T with nonzero effects is small. Applying model (4) in the high-dimensional case therefore poses new challenges, including: (i) how to identify the important variables in T; (ii) how to estimate the coefficient functions for the important covariates; and (iii) how to conduct inference for the nonzero coefficient functions. For example, it is of interest to know whether they have a specific parametric form such as constant, linear or quadratic.

In the high-dimensional data setting, the study of nonlinear interaction effects has attracted much attention in recent years, and a few strategies have been proposed. For example, Jiang and Liu (2014) proposed to detect variables under the general index model, which enables the study of high-order interactions among components of continuous predictors that are assumed to have a multivariate normal distribution. Moreover, Lian (2012) considered variable selection in varying coefficient models, which allow the coefficient functions to depend on one index variable, such as a time-dependent variable.

When we would like to see how the effect of each genetic factor changes under the influence of multiple environmental variables, the proposed high-dimensional GACM (4) becomes a natural approach to consider, since neither the index model [Jiang and Liu (2014)] nor the varying coefficient model [Lian (2012)] can address this question; the former is used to study interactions of components in a set of continuous predictors, and the latter allows only one index variable. For model selection and estimation, we apply a groupwise penalization method. Moreover, most existing high-dimensional nonparametric modeling papers [Lian (2012), Meier, van de Geer and Bühlmann (2009), Ravikumar et al. (2009), Wang et al. (2014), Huang, Horowitz and Wei (2010)] focus on variable selection and estimation. In this paper, after variable selection, we also propose a simultaneous inferential tool to test the shape of the coefficient function for each selected variable, which has not been studied in previous work.

To this end, we aim to address questions (i)–(iii). Specifically, for estimation and model selection, we apply a groupwise regularization method based on a penalized quasi-likelihood criterion. The penalty is imposed on the $L_2$ norm of the spline coefficients of the spline estimators for $\alpha_\ell(\cdot)$. We establish the asymptotic consistency of model selection and estimation for the proposed group penalized estimators with the quasi-likelihood criterion in the high-dimensional GACM (4). We allow p to grow with n at an almost exponential rate. Importantly, establishing these results is technically more difficult than in other work based on least squares, since the estimators from the penalized quasi-likelihood method have no closed form.

After selecting the important variables, the next question of interest is what shapes the nonzero coefficient functions may have. We then need an inferential tool to check whether a coefficient function has some specific parametric form. For example, when it is a constant or a linear function, the corresponding covariate has no interaction effect, or a linear interaction effect, with the other covariate, respectively. For global inference, we construct simultaneous confidence bands (SCBs) for the nonparametric additive functions based on a two-step estimation procedure. Using the selected variables, we first propose a refined two-step spline estimator for the function of interest, which is proved to have a pointwise asymptotic normal distribution and oracle efficiency. We then establish the bounds of the SCBs based on the distribution of the absolute maximum of a Gaussian process and on the strong approximation lemma [Csörgő and Révész (1981)]. Other related works on SCBs for nonparametric functions include Claeskens and Van Keilegom (2003), Hall and Titterington (1988) and Härdle and Marron (1991), among others. We provide an asymptotic formula for the standard deviation of the spline estimator of the coefficient function, which involves unknown population parameters to be estimated. The formula has a somewhat complex expression and contains many parameters, so direct estimation may not be accurate, particularly with small or moderate sample sizes. As an alternative, the bootstrap provides a reliable way to calculate the standard deviation while avoiding estimation of those population parameters. We apply the smoothed bootstrap method suggested by Efron (2014), who advocated that smoothing can improve coverage probability, to calculate the pointwise estimated standard deviations of the estimators of the coefficient functions. This method was originally proposed for calculating the estimated standard deviation of the estimate of a parameter of interest, such as a conditional mean. We extend it to the case of functional estimation. We demonstrate by simulation studies in Section 4 that, compared to the traditional resampling bootstrap, the smoothed bootstrap successfully improves the empirical coverage rate.

The paper is organized as follows. Section 2 introduces the B-spline estimation procedure for the nonparametric functions, describes the adaptive group Lasso estimators and the initial Lasso estimators and presents asymptotic results. Section 3 describes the two-step spline estimators and introduces the simultaneous confidence bands and the bootstrap methods for calculating the estimated standard deviation. Section 4 describes simulation studies, and Section 5 illustrates the method through the analysis of an obesity data set from a genome-wide association study. Proofs are in the Appendix and additional supplementary material [Ma et al. (2015)].

2. Penalization-based variable selection

Let $(Y_i, X_i^T, T_i^T)$, $i=1,\dots,n$, be random vectors that are independently and identically distributed as $(Y, X^T, T^T)$, where $X_i=(X_{i1},\dots,X_{id})^T$ and $T_i=(T_{i1},\dots,T_{ip})^T$. Write the negative quasi-likelihood function as $Q(\mu,y)=\int_{\mu}^{y}\{(y-\zeta)/V(\zeta)\}\,d\zeta$. Estimation of the mean function can be achieved by minimizing the negative quasi-likelihood of the observed data

$$\sum_{i=1}^{n}Q\big[g^{-1}\{\eta(X_i,T_i)\},Y_i\big]. \quad (5)$$

2.1. Spline approximation

We approximate the smooth functions $\alpha_{\ell k}(\cdot)$, $1\le k\le d$ and $1\le \ell\le p$, in (4) by B-splines. As in most work on nonparametric smoothing, estimation of the functions $\alpha_{\ell k}(\cdot)$ is conducted on compact sets. Without loss of generality, let the compact set be $\chi=[0,1]$. Let $G_n^0$ be the space of polynomial splines of order $q\ge 2$. We introduce a sequence of spline knots

$$t_{-(q-1)}=\cdots=t_{-1}=t_{0}=0<t_{1}<\cdots<t_{N}<1=t_{N+1}=\cdots=t_{N+q},$$

where $N\equiv N_n$ is the number of interior knots. In the following, let $J_n=N_n+q$. For $0\le j\le N$, let $H_j=t_{j+1}-t_j$ be the distance between neighboring knots and let $H=\max_{0\le j\le N}H_j$. Following Zhou, Shen and Wolfe (1998), to study asymptotic properties of the spline estimators for $\alpha_{\ell k}(\cdot)$, we assume that $\max_{0\le j\le N-1}|H_{j+1}-H_j|=o(N^{-1})$ and $H/\min_{0\le j\le N}H_j\le M$, where $M>0$ is a predetermined constant. Such an assumption is necessary for numerical implementation. In practice, we can use the sample quantiles as the locations of the knots. Let $\{b_{j,k}(x_k):1\le j\le J_n\}^T$ be the qth order B-spline basis functions given on page 87 of de Boor (2001). For positive numbers $a_n$ and $b_n$, $a_n\asymp b_n$ means that $\lim_{n\to\infty}a_n/b_n=c$, where c is some nonzero finite constant. For $1\le j\le J_n$, we adopt the centered B-spline functions given in Xue and Yang (2006), $B_{j,k}(x_k)=\sqrt{N}\,[\,b_{j,k}(x_k)-\{E(b_{j,k})/E(b_{1,k})\}b_{1,k}(x_k)]$, so that $E\{B_{j,k}(X_k)\}=0$ and $\mathrm{var}\{B_{j,k}(X_k)\}\asymp 1$. Define the space $G_n$ of additive spline functions as the linear space spanned by $B(x)=\{1, B_{j,k}(x_k), 1\le j\le J_n, 1\le k\le d\}^T$, where $x=(x_1,\dots,x_d)^T$. According to the result on page 149 of de Boor (2001), for $\alpha_{\ell k}(\cdot)$ satisfying condition (C3) in the Appendix, namely $\alpha_{\ell k}^{(r-1)}(x_k)\in C^{0,1}[0,1]$ for a given integer $r\ge 1$, where $C^{0,1}[0,1]$ is the space of Lipschitz continuous functions on [0, 1] defined in the Appendix, there is a function

$$\alpha_{\ell k}^{0}(x_k)=\sum_{j=1}^{J_n}\gamma_{j,\ell k}B_{j,k}(x_k)\in G_n^{0}, \quad (6)$$

such that $\sup_{x_k\in[0,1]}|\alpha_{\ell k}^{0}(x_k)-\alpha_{\ell k}(x_k)|=O(J_n^{-r})$. Then for every $1\le \ell\le p$, $\alpha_\ell(x)$ can be approximated well by a linear combination of spline functions in $G_n^{0}$, so that

$$\alpha_{\ell}(x)\approx\alpha_{\ell}^{0}(x)=\gamma_{\ell 0}+\sum_{k=1}^{d}\sum_{j=1}^{J_n}\gamma_{j,\ell k}B_{j,k}(x_k)=B(x)^{T}\gamma_{\ell}, \quad (7)$$

where $\gamma_{\ell}=(\gamma_{\ell 0},\gamma_{\ell 1}^{T},\dots,\gamma_{\ell d}^{T})^{T}$, in which $\gamma_{\ell k}=(\gamma_{j,\ell k}:1\le j\le J_n)^{T}$. Thus the minimization problem in (5) is equivalent to finding $\gamma^{0}=\{(\gamma_{\ell}^{0})^{T},1\le \ell\le p\}^{T}$, with $\gamma_{\ell}^{0}=(\gamma_{\ell 0}^{0},\gamma_{\ell 1}^{0T},\dots,\gamma_{\ell d}^{0T})^{T}$ and $\gamma_{\ell k}^{0}=(\gamma_{j,\ell k}^{0}:1\le j\le J_n)^{T}$, to minimize $\sum_{i=1}^{n}Q[g^{-1}\{\sum_{\ell=1}^{p}B(X_i)^{T}\gamma_{\ell}T_{i\ell}\},Y_i]$. The components of the additive coefficients are estimated by $\alpha_{\ell k}^{0}(x_k)=\sum_{j=1}^{J_n}\gamma_{j,\ell k}^{0}B_{j,k}(x_k)$ and $\alpha_{\ell 0}^{0}=\gamma_{\ell 0}^{0}$.
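As a concrete illustration of this construction, the following minimal Python sketch builds the centered B-spline design for one covariate, with quantile-based interior knots as suggested above; the function name and the exact indexing convention are ours, and the expectations in the centering are replaced by sample means.

```python
# Sketch: centered B-spline basis of Section 2.1 for a single covariate
# x in [0, 1], with N interior knots at sample quantiles and spline order q
# (cubic splines: q = 4). The J = N + q raw B-splines b_1, ..., b_J are
# centered as B_j = sqrt(N) * [ b_j - {E(b_j)/E(b_1)} b_1 ], here for
# j = 2, ..., J, with expectations estimated by sample means.
import numpy as np
from scipy.interpolate import BSpline

def centered_bspline_design(x, n_interior, q=4):
    x = np.asarray(x, dtype=float)
    N, degree = n_interior, q - 1
    interior = np.quantile(x, np.linspace(0, 1, N + 2)[1:-1])
    knots = np.concatenate([np.zeros(q), interior, np.ones(q)])
    J = len(knots) - q                      # = N + q raw basis functions
    raw = np.column_stack([BSpline(knots, np.eye(J)[j], degree)(x)
                           for j in range(J)])
    means = raw.mean(axis=0)                # sample estimate of E(b_j)
    B = np.sqrt(N) * (raw[:, 1:] - np.outer(raw[:, 0], means[1:] / means[0]))
    return B                                # n x (N + q - 1) centered design

# Example: design = centered_bspline_design(np.random.rand(300), n_interior=6)
```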

2.2. Adaptive group Lasso estimator

We now describe the procedure for estimating and selecting the additive coefficient functions by the adaptive group Lasso. The estimators are obtained by minimizing a penalized negative quasi-likelihood criterion. We establish asymptotic selection consistency as well as the convergence rate of the estimators to the true nonzero functions. For any vector $a=(a_1,\dots,a_s)^{T}$, let its $L_2$ norm be $\|a\|_2=(a_1^{2}+\cdots+a_s^{2})^{1/2}$. For any measurable $L_2$-integrable function $\phi$ on $[0,1]^{d}$, define the $L_2$ norm by $\|\phi\|_2^{2}=E\{\phi^{2}(X)\}$.

We are interested in identifying the significant components of the vector $T=(T_1,\dots,T_p)^{T}$. Let s, a fixed number, be the total number of nonzero $\alpha_\ell$'s, and let $I_1=\{\ell:\|\alpha_\ell\|\ne 0, 1\le \ell\le p\}$. Let $I_2$ be the complement of $I_1$; that is, $I_2=\{\ell:\alpha_\ell(\cdot)\equiv 0, 1\le \ell\le p\}$. Recalling the approximation given in (7), $\gamma_\ell$ is zero if and only if each element of $\gamma_\ell$ is zero; that is, $\|\gamma_\ell\|_2=0$. We apply the adaptive group Lasso approach of Huang, Horowitz and Wei (2010) for variable selection in model (4). In order to identify zero additive coefficients, we penalize the $L_2$ norm of the coefficients $\gamma_\ell$ for $1\le \ell\le p$. Let $w_n=(w_{n1},\dots,w_{np})^{T}$ be a given vector of weights, which needs to be chosen appropriately to achieve selection consistency; their choice is discussed in Section 2.3. We consider the penalized negative quasi-likelihood

$$L_n(\gamma)=\sum_{i=1}^{n}Q\Big[g^{-1}\Big\{\sum_{\ell=1}^{p}B^{T}(X_i)\gamma_{\ell}T_{i\ell}\Big\},Y_i\Big]+n\lambda_n\sum_{\ell=1}^{p}w_{n\ell}\|\gamma_{\ell}\|_2, \quad (8)$$

where $\lambda_n$ is a regularization parameter controlling the amount of shrinkage. The estimator $\hat\gamma=(\hat\gamma_1^{T},\dots,\hat\gamma_p^{T})^{T}$ is obtained by minimizing (8). The minimization of (8) is solved by local quadratic approximation as adopted by Fan and Li (2001).
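For readers who want a concrete template, the sketch below fits the same group-penalized criterion in the canonical logistic case, but by proximal gradient descent with groupwise soft-thresholding rather than the paper's local quadratic approximation; the function names, the fixed step size, and the restriction to the logistic likelihood are our simplifying assumptions.

```python
# Sketch: group-penalized logistic quasi-likelihood (8), minimized by
# proximal gradient with groupwise soft-thresholding (a substitute for the
# paper's local quadratic approximation). Z is the n x (p*m) spline design
# whose l-th block of m = 1 + d*J_n columns holds B(X_i)^T T_{il}; `groups`
# lists the column indices of each block; `w` holds the weights w_{nl}.
import numpy as np

def prox_group(gamma, groups, thresholds):
    out = gamma.copy()
    for idx, t in zip(groups, thresholds):
        norm = np.linalg.norm(gamma[idx])
        out[idx] = 0.0 if norm <= t else (1.0 - t / norm) * gamma[idx]
    return out

def fit_group_lasso_logistic(Z, y, groups, lam, w, n_iter=500):
    n, m = Z.shape
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2   # 1/L for the logistic loss
    gamma = np.zeros(m)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ gamma)))  # fitted probabilities
        grad = Z.T @ (mu - y) / n                # gradient of -loglik / n
        gamma = prox_group(gamma - step * grad, groups,
                           [step * lam * wl for wl in w])
    return gamma
```

With all weights equal to one this is the initial group Lasso of Section 2.3; plugging in data-driven weights gives the adaptive version, and a group with infinite weight is zeroed out, matching the $0\cdot\infty=0$ convention of Remark 1 below.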

For $\ell=1,\dots,p$, the $\ell$th additive coefficient function is estimated by

$$\hat\alpha_{\ell}(x)=\hat\gamma_{\ell 0}+\sum_{k=1}^{d}\sum_{j=1}^{J_n}\hat\gamma_{j,\ell k}B_{j,k}(x_k)=B^{T}(x)\hat\gamma_{\ell}.$$

We make the following two assumptions on the order requirements of the tuning parameters. Write $w_{n,I_1}=(w_{n\ell}:\ell\in I_1)$.

Assumption 1. $J_n^{2}\{n\log(n)\}^{-1}\to 0$ and $\lambda_n\|w_{n,I_1}\|_2\to 0$, as $n\to\infty$.

Assumption 2. $n\lambda_n\|w_{n,I_1}\|_2+n^{1/2}J_n^{1/2}\log(pJ_n)+nJ_n^{-r}=o(n\lambda_n w_{n\ell})$, for all $\ell\in I_2$.

The following theorem presents the selection consistency and estimation properties of the adaptive group Lasso estimators.

Theorem 1. Under conditions (C1)–(C5) in the Appendix and Assumptions 1 and 2: (i) as $n\to\infty$, $P(\|\hat\alpha_\ell\|>0, \ell\in I_1$ and $\|\hat\alpha_\ell\|=0, \ell\in I_2)\to 1$; and (ii) $\|\hat\alpha_\ell-\alpha_\ell\|=O_p(\lambda_n\|w_{n,I_1}\|_2+n^{-1/2}J_n^{1/2}+J_n^{-r})$, $\ell\in I_1$.

2.3. Choice of the weights

We now discuss how to choose the weights used in (8) based on initial estimates. For low-dimensional data settings with $p<n$, an unpenalized estimator such as the least squares estimator [Zou (2006)] can be used as an initial estimate. For high-dimensional settings with $p\gg n$, it has been argued [Meier and Bühlmann (2007)] that the Lasso estimator is a more appropriate choice. Following Huang, Horowitz and Wei (2010), we obtain an initial estimate with the group Lasso by minimizing

$$L_{n1}(\gamma)=\sum_{i=1}^{n}Q\Big[g^{-1}\Big\{\sum_{\ell=1}^{p}B(X_i)^{T}\gamma_{\ell}T_{i\ell}\Big\},Y_i\Big]+n\lambda_{n1}\sum_{\ell=1}^{p}\|\gamma_{\ell}\|_2,$$

with respect to $\gamma=(\gamma_1^{T},\dots,\gamma_p^{T})^{T}$. Denote the resulting estimator by $\tilde\gamma=(\tilde\gamma_1^{T},\dots,\tilde\gamma_p^{T})^{T}$. Let $\tilde I_1=\{\ell:\|\tilde\gamma_\ell\|_2\ne 0, 1\le \ell\le p\}$, and let $\tilde s$ be the number of elements in $\tilde I_1$.

Under conditions (C1)–(C5) in the Appendix, and when $\lambda_{n1}\ge Cn^{-1/2}J_n^{1/2}\log(pJ_n)$ for a sufficiently large constant C, we have: (i) the number of estimated nonzero functions is bounded, that is, as $n\to\infty$, there exists a constant $1<C_1<\infty$ such that $P(\tilde s\le C_1 s)\to 1$; (ii) if $\lambda_{n1}\to 0$, then $P(\|\tilde\gamma_\ell\|_2>0$ for all $\ell\in I_1)\to 1$; (iii) $\|\tilde\gamma-\gamma\|_2=O_p(\lambda_{n1}+n^{-1/2}J_n^{1/2}+J_n^{-r})$. We refer to Theorems 1(i) and (ii) of Huang, Horowitz and Wei (2010) for the proofs of (i) and (ii), and to the proof of Theorem 1 in our paper for (iii).

The weights we use are $w_{n\ell}=\|\tilde\gamma_\ell\|_2^{-1}$ if $\|\tilde\gamma_\ell\|_2>0$, and $w_{n\ell}=\infty$ if $\|\tilde\gamma_\ell\|_2=0$.
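In code, the weight rule is one line per group; a minimal sketch reusing the group-lasso fit above (function names are ours):

```python
# Sketch: adaptive weights w_nl = ||gamma_tilde_l||_2^{-1}, with w_nl = inf
# for groups the initial group Lasso estimates as zero; an infinite weight
# makes the proximal threshold infinite, so the group stays at zero (the
# 0 * inf = 0 convention of Remark 1).
import numpy as np

def adaptive_weights(gamma_tilde, groups):
    norms = [np.linalg.norm(gamma_tilde[idx]) for idx in groups]
    return [1.0 / v if v > 0 else np.inf for v in norms]
```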

Remark 1. Assumptions 1 and 2 give the order requirements for $J_n$ and $\lambda_n$. Based on the condition $J_n^{2}\{n\log(n)\}^{-1}\to 0$ given in Assumption 1, we need $J_n\ll\{n\log(n)\}^{1/2}$, where $a_n\ll b_n$ denotes $a_n/b_n=o(1)$ for positive numbers $a_n$ and $b_n$, and $\lambda_n$ needs to satisfy $n^{-1/2}J_n^{1/2}\log(pJ_n)\{\min_{\ell\in I_2}(w_{n\ell})\}^{-1}\ll\lambda_n\ll 1$. From the above theoretical properties of the group Lasso estimators, we know that, with probability approaching 1, $\|\tilde\gamma_\ell\|_2>0$ for the nonzero components, and then the corresponding weights $w_{n\ell}$ are bounded away from 0 and infinity for $\ell\in I_1$. By defining $0\cdot\infty=0$, the components not selected by the group Lasso are not included in the adaptive group Lasso procedure. Let $J_n\asymp n^{1/(2r+1)}$, so that $J_n$ has the optimal order for spline regression. If $p=\exp[o\{n^{2r/(2r+1)}\}]$, then $n^{-1/2}J_n^{1/2}\log(pJ_n)\to 0$. This means the dimension p can diverge with the sample size at an almost exponential rate.

2.4. Selection of tuning parameters

Tuning parameter selection always plays an important role in model and variable selection. An underfitted model can lead to severely biased estimation, and an overfitted model can seriously degrade the estimation efficiency. Among different data-driven methods, the Bayesian information criterion (BIC) tuning parameter selector has been shown to identify the true model consistently in the fixed-dimensional setting [Wang, Li and Tsai (2007)]. In the high-dimensional setting, an extended BIC (EBIC) and a generalized information criterion have been proposed by Chen and Chen (2008) and Fan and Tang (2013), respectively. In this paper, we adopt the EBIC [Chen and Chen (2008)] to select the tuning parameter $\lambda_n$ in (8). Specifically, EBIC($\lambda_n$) is defined as

$$2\sum_{i=1}^{n}Q\Big[g^{-1}\Big\{\sum_{\ell=1}^{p}B(X_i)^{T}\hat\gamma_{\ell}T_{i\ell}\Big\},Y_i\Big]+s^{*}(1+dJ_n)\log(n)+2\nu\log\binom{p}{s^{*}},$$

where $(\hat\gamma_\ell)_{\ell=1}^{p}$ is the minimizer of (8) for a given $\lambda_n$, $s^{*}$ is the number of nonzero estimated functions among $(\hat\alpha_\ell)_{\ell=1}^{p}$, and $0\le\nu\le 1$ is a constant. Here we use $\nu=0.5$. When $\nu=0$, the EBIC reduces to the ordinary BIC.

We use cubic B-splines for the nonparametric function estimation, so that q = 4. In the penalized estimation procedure, we let the number of interior knots be $N=\lfloor cn^{1/(2q+1)}\rfloor$, which satisfies the optimal order, where $\lfloor a\rfloor$ denotes the largest integer no greater than a and c is a constant. In the simulations, we take c = 2.
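Putting Sections 2.2–2.4 together, a minimal sketch of EBIC-based tuning for the logistic case follows; it reuses fit_group_lasso_logistic from above, reads the combinatorial term of the EBIC as $\log\binom{p}{s^{*}}$ following Chen and Chen (2008), and all function names are ours.

```python
# Sketch: select lambda_n by minimizing EBIC(lambda) with nu = 0.5 over a
# grid, for the logistic model where Q is the negative log-likelihood.
import numpy as np
from scipy.special import gammaln

def log_binom(p, s):
    return gammaln(p + 1) - gammaln(s + 1) - gammaln(p - s + 1)

def ebic_path(Z, y, groups, w, lam_grid, d, Jn, nu=0.5):
    n, p = len(y), len(groups)
    best_ebic, best_fit = np.inf, None
    for lam in lam_grid:
        gamma = fit_group_lasso_logistic(Z, y, groups, lam, w)
        eta = Z @ gamma
        neg_loglik = np.sum(np.logaddexp(0.0, eta) - y * eta)
        s_star = sum(np.linalg.norm(gamma[idx]) > 0 for idx in groups)
        ebic = (2 * neg_loglik + s_star * (1 + d * Jn) * np.log(n)
                + 2 * nu * log_binom(p, s_star))
        if ebic < best_ebic:
            best_ebic, best_fit = ebic, (lam, gamma)
    return best_fit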

3. Inference and the bootstrap smoothing procedure

3.1. Background

After model selection, the next step is to conduct statistical inference for the coefficient functions of the important variables. We will establish a simultaneous confidence band (SCB) based on a two-step estimator for global inference. An asymptotic formula for the SCB will be provided based on the distribution of the maximum of the normalized deviation of the spline functional estimate. To improve accuracy, we calculate the estimated standard deviation in the SCB by the nonparametric bootstrap smoothing method discussed in Efron (2014). For specificity, we focus on the construction for $\alpha_{\ell 1}(x_1)$, with $\alpha_{\ell k}(x_k)$ for $k\ge 2$ handled similarly, for $\ell\in\hat I_1$, where $\hat I_1=\{\ell:\|\hat\alpha_\ell\|\ne 0, 1\le \ell\le p\}$.

Although the one-step penalized estimation in Section 2 can quickly identify the nonzero coefficient functions, no asymptotic distribution is available for the resulting estimators. Thus we construct the SCB based on a refined two-step spline estimator of $\alpha_{\ell 1}(x_1)$, which will be shown to have the oracle property that the estimator of $\alpha_{\ell 1}(x_1)$ has the same asymptotic distribution as the univariate oracle estimator obtained by pretending that $\alpha_{\ell 0}$ and $\alpha_{\ell k}(X_k)$ for $\ell\in\hat I_1$, $k\ge 2$, and $\alpha_\ell(X)$ for $\ell\notin\hat I_1$ are known. See Horowitz, Klemelä and Mammen (2006), Horowitz and Mammen (2004) and Liu, Yang and Härdle (2013) for kernel-based two-step estimators in generalized additive models, which also have the oracle property but are not as computationally efficient as the two-step spline method. We next introduce the oracle estimator and the proposed two-step estimator before presenting the SCB.

3.2. Oracle estimator

In the following, we describe the oracle estimator of $\alpha_{\ell 1}(x_1)$. We rewrite model (4) as

$$\mu(X,T)=g^{-1}\{\eta(X,T)\},\qquad \eta(X,T)=\sum_{\ell\in\hat I_1}\alpha_{\ell 1}(X_1)T_{\ell}+\sum_{\ell\in\hat I_1}\Big\{\alpha_{\ell 0}+\sum_{k\ge 2}\alpha_{\ell k}(X_k)\Big\}T_{\ell}+\sum_{\ell\notin\hat I_1}\alpha_{\ell}(X)T_{\ell}. \quad (9)$$

By assuming that $\alpha_{\ell 0}$ and $\alpha_{\ell k}(X_k)$ for $\ell\in\hat I_1$, $k\ge 2$, and $\alpha_\ell(X)$ for $\ell\notin\hat I_1$ are known, estimation in (9) involves only the nonparametric functions $\alpha_{\ell 1}(X_1)$ of the scalar covariate $X_1$. It will be shown in Theorem 2 that the estimator achieves the univariate optimal convergence rate when the optimal order for the number of knots is applied. We estimate $\alpha_1(x_1)=\{\alpha_{\ell 1}(x_1), \ell\in\hat I_1\}^{T}$ by minimizing the negative quasi-likelihood function as follows. Denote the oracle estimator by $\hat\alpha_{\ell 1}^{OR}(x_1)=B_1^{S}(x_1)^{T}\hat\gamma_{\ell 1}^{OR}$, where $\hat\gamma_{\ell 1}^{OR}$ is defined directly below and $B_1^{S}(x_1)=\{B_{j,1}^{S}(x_1), 1\le j\le J_n^{S}\}^{T}$, in which $B_{j,1}^{S}(x_1)$ is the centered B-spline function defined in the same way as $B_{j,1}(x_1)$ in Section 2, but with $N^{S}=N_n^{S}$ interior knots and $J_n^{S}=N_n^{S}+q$. Rates of increase for $J_n^{S}$ are described in Assumptions 3 and 4 below. Let $\alpha_{\ell,-1}(X_i)=\alpha_{\ell 0}+\sum_{k\ge 2}\alpha_{\ell k}(X_{ik})$. Then $\hat\gamma_{\cdot,1}^{OR}=\{(\hat\gamma_{\ell 1}^{OR})^{T}, \ell\in\hat I_1\}^{T}$ is obtained by minimizing the negative quasi-likelihood

$$L_n^{OR}(\gamma_{\cdot,1})=\sum_{i=1}^{n}Q\Big[g^{-1}\Big\{\sum_{\ell\in\hat I_1}B_1^{S}(X_{i1})^{T}\gamma_{\ell 1}T_{i\ell}+\sum_{\ell\in\hat I_1}\alpha_{\ell,-1}(X_i)T_{i\ell}+\sum_{\ell\notin\hat I_1}\alpha_{\ell}(X_i)T_{i\ell}\Big\},Y_i\Big], \quad (10)$$

where $\gamma_{\cdot,1}=\{(\gamma_{\ell 1})^{T}, \ell\in\hat I_1\}^{T}$. Similarly, the oracle estimator of $\alpha_0=\{\alpha_{\ell 0}, \ell\in\hat I_1\}^{T}$, denoted $\hat\alpha_0^{OR}=\{\hat\alpha_{\ell 0}^{OR}, \ell\in\hat I_1\}^{T}=\{\hat\gamma_{\ell 0}^{OR}, \ell\in\hat I_1\}^{T}$, is obtained by minimizing $L_n^{OR}(\gamma_{\cdot,0})=\sum_{i=1}^{n}Q[g^{-1}\{\sum_{\ell\in\hat I_1}\gamma_{\ell 0}T_{i\ell}+\sum_{\ell\in\hat I_1}\alpha_{\ell,-0}(X_i)T_{i\ell}+\sum_{\ell\notin\hat I_1}\alpha_{\ell}(X_i)T_{i\ell}\},Y_i]$, where $\gamma_{\cdot,0}=(\gamma_{\ell 0}, \ell\in\hat I_1)$ and $\alpha_{\ell,-0}(X_i)=\sum_{k=1}^{d}\alpha_{\ell k}(X_{ik})$.

3.3. Initial estimator

The oracle estimator is infeasible because it assumes knowledge of the other functions. In order to obtain the two-step estimators of $\alpha_{\ell 1}(x_1)$ for $\ell\in\hat I_1$, we first need initial estimators of $\alpha_{\ell 0}$ and $\alpha_{\ell k}(x_k)$ for $k\ge 2$ and $\ell\in\hat I_1$, denoted $\hat\alpha_{\ell 0}^{ini}=\hat\gamma_{\ell 0}^{ini}$ and $\hat\alpha_{\ell k}^{ini}(x_k)=B_k^{ini}(x_k)^{T}\hat\gamma_{\ell k}^{ini}$, where $B_k^{ini}(x_k)=\{B_{j,k}^{ini}(x_k):1\le j\le J_n^{ini}\}^{T}$ and the $B_{j,k}^{ini}(x_k)$ are B-spline functions with $N_n^{ini}$ interior knots and $J_n^{ini}=N_n^{ini}+q$. Rates of increase for $J_n^{ini}$ are described in Assumptions 3 and 4 below. We need an undersmoothed procedure in the first step, so that the approximation bias is reduced and the difference between the two-step and oracle estimators is asymptotically negligible. We obtain $\hat\gamma_{\hat I_1}^{ini}=\{(\hat\gamma_{\ell}^{ini})^{T}:\ell\in\hat I_1\}^{T}$, where $\hat\gamma_{\ell}^{ini}=\{\hat\gamma_{\ell 0}^{ini},(\hat\gamma_{\ell k}^{ini})^{T}\}^{T}$, by minimizing the negative quasi-likelihood $\sum_{i=1}^{n}Q[g^{-1}\{\sum_{\ell\in\hat I_1}B(X_i)^{T}\gamma_{\ell}T_{i\ell}\},Y_i]$. The adaptive group Lasso penalized estimator $\hat\gamma_{\hat I_1}=\{(\hat\gamma_\ell)^{T}:\ell\in\hat I_1\}^{T}$ obtained in Section 2 could also be used as the initial estimator. We, however, refit the model with the selected variables and use the resulting $\hat\gamma_{\hat I_1}^{ini}$ in order to improve estimation accuracy in high-dimensional data settings.

3.4. Final estimator

In the second step, we construct the two-step estimator of $\alpha_{\ell 1}$ for $\ell\in\hat I_1$. We replace $\alpha_{\ell 0}$ and $\alpha_{\ell k}(X_k)$ by the initial estimators $\hat\alpha_{\ell 0}^{ini}$ and $\hat\alpha_{\ell k}^{ini}(X_k)$ for $\ell\in\hat I_1$ and $k\ge 2$, and replace $\alpha_\ell(X)$ for $\ell\notin\hat I_1$ by $\hat\alpha_\ell(X)=0$. Let $\hat\alpha_{\ell,-1}^{ini}(X_i)=\hat\alpha_{\ell 0}^{ini}+\sum_{k\ge 2}\hat\alpha_{\ell k}^{ini}(X_{ik})$. Denote the two-step spline estimator of $\alpha_{\ell 1}(x_1)$ by $\hat\alpha_{\ell 1}^{S}(x_1)=B_1^{S}(x_1)^{T}\hat\gamma_{\ell 1}^{S}$, with $\hat\gamma_{\cdot,1}^{S}=\{(\hat\gamma_{\ell 1}^{S})^{T}, \ell\in\hat I_1\}^{T}$ minimizing

$$L_n^{S}(\gamma_{\cdot,1})=\sum_{i=1}^{n}Q\Big[g^{-1}\Big\{\sum_{\ell\in\hat I_1}B_1^{S}(X_{i1})^{T}\gamma_{\ell 1}T_{i\ell}+\sum_{\ell\in\hat I_1}\hat\alpha_{\ell,-1}^{ini}(X_i)T_{i\ell}+\sum_{\ell\notin\hat I_1}\hat\alpha_{\ell}(X_i)T_{i\ell}\Big\},Y_i\Big]. \quad (11)$$

The two-step estimator of $\alpha_{\ell 0}$, denoted $\hat\alpha_{\ell 0}^{S}=\hat\gamma_{\ell 0}^{S}$, is obtained in the same way as $\hat\alpha_{\ell 0}^{OR}$, replacing $\alpha_{\ell,-0}(X_i)$ with $\hat\alpha_{\ell,-0}^{ini}(X_i)=\sum_{k=1}^{d}\hat\alpha_{\ell k}^{ini}(X_{ik})$ for $\ell\in\hat I_1$ and replacing $\alpha_\ell(X_i)$ with $\hat\alpha_\ell(X_i)=0$ for $\ell\notin\hat I_1$. Let $\hat\alpha_0^{S}=\{\hat\alpha_{\ell 0}^{S}, \ell\in\hat I_1\}^{T}$.
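For the logistic case, the second-step fit (11) is an ordinary spline logistic regression with the first-step pieces held fixed as an offset; a minimal Newton–Raphson sketch under that assumption (names ours):

```python
# Sketch: second-step fit (11) for a logistic model. `offset` is the frozen
# part sum_{l in I_hat} alpha_hat_ini_{l,-1}(X_i) T_{il} (the non-selected
# terms contribute 0), and Z1 is the n x (s* J_n^S) design whose blocks are
# B_1^S(X_i1) T_{il} for l in I_hat.
import numpy as np

def two_step_fit(Z1, offset, y, n_iter=25):
    gamma = np.zeros(Z1.shape[1])
    for _ in range(n_iter):                       # Newton-Raphson updates
        eta = offset + Z1 @ gamma
        mu = 1.0 / (1.0 + np.exp(-eta))
        W = mu * (1.0 - mu)                       # logistic variance weights
        grad = Z1.T @ (y - mu)
        hess = (Z1 * W[:, None]).T @ Z1
        gamma += np.linalg.solve(hess, grad)
    return gamma                                  # stacked gamma_hat^S_{l1}
```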

3.5. Asymptotic normality and uniform oracle efficiency

We now establish the asymptotic normality and uniform oracle efficiency of the oracle and final estimators. Let $Z_{i\ell j,1}=B_{j,1}^{S}(X_{i1})T_{i\ell}$ and $Z_{i,1}=(Z_{i\ell j,1}, 1\le j\le J_n^{S}, \ell\in\hat I_1)^{T}$. Let $s^{*}$ be the number of elements in $\hat I_1$. By Theorem 1, $P(s^{*}=s)\to 1$. For simplicity of notation, write $\sigma_i^{2}=\sigma^{2}(X_i,T_i)$ and $\eta_i=\eta(X_i,T_i)$. Define the $s^{*}\times s^{*}J_n^{S}$ block-diagonal matrix
$$\mathbf{B}^{S}(x_1)=\begin{bmatrix} B_{1,1}^{S}(x_1)\cdots B_{J_n^{S},1}^{S}(x_1) & & \mathbf{0}\\ & \ddots & \\ \mathbf{0} & & B_{1,1}^{S}(x_1)\cdots B_{J_n^{S},1}^{S}(x_1)\end{bmatrix}.$$

To establish the asymptotic distribution of the two-step estimator, in addition to Assumptions 1 and 2 given in Section 2, we make the following two assumptions on the numbers of basis functions $J_n^{S}$ and $J_n^{ini}$:

Assumption 3. (i) $s(J_n^{S})^{2}\{n\log(n)\}^{-1}=o(1)$ and $s(J_n^{S})^{-r}=o(1)$; and (ii) $n(\log n)^{-1}(J_n^{S}J_n^{ini})^{-1}\to\infty$, as $n\to\infty$.

Assumption 4. $(n/J_n^{S})^{1/2}(J_n^{ini})^{-r}\to 0$, as $n\to\infty$.

First we describe the asymptotic normality of the oracle estimator $\hat\alpha_{\ell 1}^{OR}(x_1)$ of $\alpha_{\ell 1}(x_1)$. Let $\hat\alpha_1^{OR}(x_1)=\{\hat\alpha_{\ell 1}^{OR}(x_1), \ell\in\hat I_1\}^{T}$. Let $b_1(x_1)=E\{\hat\alpha_1^{OR}(x_1)\mid\mathbf{X},\mathbf{T}\}$ and $b_{\ell 1}(x_1)=E\{\hat\alpha_{\ell 1}^{OR}(x_1)\mid\mathbf{X},\mathbf{T}\}$ for $\ell\in\hat I_1$, where $(\mathbf{X},\mathbf{T})=(X_i,T_i)_{i=1}^{n}$.

Theorem 2. Under conditions (C1)–(C5) and Assumption 3(i), for any vector $a\in R^{s^{*}}$ with $\|a\|_2=1$ and any $x_1\in[0,1]$, $a^{T}\sigma_n^{-1}(x_1)\{\hat\alpha_1^{OR}(x_1)-b_1(x_1)\}\to N(0,1)$, where

$$\sigma_n^{2}(x_1)=\mathbf{B}^{S}(x_1)\Big[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\eta_i)\}^{2}/\sigma_i^{2}\Big]^{-1}\mathbf{B}^{S}(x_1)^{T}, \quad (12)$$

in which $\dot g^{-1}(\eta_i)$ is the first-order derivative of $g^{-1}(\eta)$ with respect to $\eta$, evaluated at $\eta_i$, and

$$\sum_{\ell\in\hat I_1}\|\hat\alpha_{\ell 1}^{OR}-b_{\ell 1}\|^{2}=O_p(sJ_n^{S}n^{-1}),\qquad \sum_{\ell\in\hat I_1}\|b_{\ell 1}-\alpha_{\ell 1}\|^{2}=O_p\{s^{2}(J_n^{S})^{-2r}\}.$$

Thus for $\ell\in\hat I_1$, $\sigma_{n,\ell 1}^{-1}(x_1)\{\hat\alpha_{\ell 1}^{OR}(x_1)-b_{\ell 1}(x_1)\}\to N(0,1)$, where
$$\sigma_{n,\ell 1}^{2}(x_1)=e_{\ell}^{T}\sigma_n^{2}(x_1)e_{\ell}, \quad (13)$$
and $e_\ell$ is the $s^{*}$-dimensional vector with $\ell$th element 1 and all other elements 0; moreover $\|\hat\alpha_0^{OR}-\alpha_0\|_2=O_p\{(s/n)^{1/2}\}$.
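A plug-in evaluation of (12)–(13) is straightforward once the design and the fitted values are in hand; the sketch below assumes the canonical logistic link, for which $\{\dot g^{-1}(\eta_i)\}^{2}/\sigma_i^{2}$ reduces to $\mu_i(1-\mu_i)$, and the names are ours.

```python
# Sketch: plug-in estimate of sigma_n^2(x1) in (12) and its diagonal (13)
# for the logistic link, where {g^{-1}-dot(eta)}^2 / sigma^2 = mu(1 - mu).
# BS_x is the s* x (s* J_n^S) block matrix B^S(x1); Z1 is the n x (s* J_n^S)
# design of Section 3.5; eta_hat is the fitted linear predictor.
import numpy as np

def sigma_n(BS_x, Z1, eta_hat):
    mu = 1.0 / (1.0 + np.exp(-eta_hat))
    W = mu * (1.0 - mu)
    M = (Z1 * W[:, None]).T @ Z1            # sum_i Z Z^T mu_i(1 - mu_i)
    S = BS_x @ np.linalg.solve(M, BS_x.T)   # the s* x s* matrix sigma_n^2(x1)
    return np.sqrt(np.diag(S))              # sigma_{n,l1}(x1), l in I_hat
```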

The next result shows the uniform oracle efficiency of the two-step estimator: the difference between the two-step estimator $\hat\alpha_1^{S}(x_1)$ and the oracle estimator $\hat\alpha_1^{OR}(x_1)$ is uniformly asymptotically negligible, and thus the two-step estimator is oracle in the sense that it has the same asymptotic distribution as the oracle estimator. Let $\hat\alpha_1^{S}(x_1)=\{\hat\alpha_{\ell 1}^{S}(x_1), \ell\in\hat I_1\}^{T}$.

Theorem 3. Under conditions (C1)–(C5) in the Appendix and Assumptions 1–3,

$$\sup_{x_1\in[0,1]}\|\hat\alpha_1^{S}(x_1)-\hat\alpha_1^{OR}(x_1)\|_{\infty}=O_p\{(n^{-1}\log n)^{1/2}+(J_n^{ini})^{-r}\},$$

$\|\hat\alpha_0^{S}-\hat\alpha_0^{OR}\|_2=o_p(n^{-1/2})$, and furthermore, under Assumption 4,

$$\sup_{x_1\in[0,1]}\big|a^{T}\sigma_n^{-1}(x_1)\{\hat\alpha_1^{S}(x_1)-\hat\alpha_1^{OR}(x_1)\}\big|=o_p(1),$$

for any vector $a\in R^{s^{*}}$ with $\|a\|_2=1$ and $\sigma_n^{2}(x_1)$ given in (12). Hence, for any $x_1\in[0,1]$, $a^{T}\sigma_n^{-1}(x_1)\{\hat\alpha_1^{S}(x_1)-b_1(x_1)\}\to N(0,1)$.

Remark 2. Under Assumptions 1 and 2, by Theorem 1, with probability approaching 1, $s^{*}=s$, which is a fixed number. In the second step, by letting $J_n^{S}\asymp n^{1/(2r+1)}$, the nonparametric functions $\alpha_{\ell 1}$ for $\ell\in\hat I_1$ are approximated by spline functions with the optimal number of knots. By the conditions $(n/J_n^{S})^{1/2}(J_n^{ini})^{-r}\to 0$ and $n(\log n)^{-1}(J_n^{S}J_n^{ini})^{-1}\to\infty$ given in Assumptions 3 and 4, $J_n^{ini}$ needs to satisfy $n^{1/(2r+1)}\ll J_n^{ini}\ll n^{2r/(2r+1)}(\log n)^{-1}$, where $r\ge 1$. When the adaptive group lasso estimator is used as the initial estimator, Assumption 1 requires $J_n^{ini}\ll\{n\log(n)\}^{1/2}$. Hence $n^{1/(2r+1)}\ll J_n^{ini}\ll\{n\log(n)\}^{1/2}$. We therefore can let $J_n^{ini}\asymp n^{(1+\vartheta)/(2r+1)}$, where $\vartheta$ is any small positive number close to 0. This increase in the number of basis functions ensures undersmoothing in the first step, so that the uniform difference between the two-step and the oracle estimators becomes asymptotically negligible. Based on Assumptions 1 and 2, the tuning parameter $\lambda_n$ needs to satisfy $n^{-1/2}(J_n^{ini})^{1/2}\log(pJ_n^{ini})\{\min_{\ell\in I_2}(w_{n\ell})\}^{-1}\ll\lambda_n\ll 1$.

Remark 3. The number of interior knots has the same order requirement as the number of basis functions. In the first step, with the undersmoothing requirement discussed in Remark 2, we let the number of interior knots be $N^{ini}=\lfloor cn^{(1+0.01)/(2q+1)}\rfloor$, where c is a constant, by assuming that r = q. In the simulations, we let c = 2. In the second-step estimation, we select the number of knots $N^{S}$ over the range $[\lfloor n^{1/(2q+1)}\rfloor, \lfloor 2n^{1/(2q+1)}\rfloor]$ by minimizing $\mathrm{BIC}(N^{S})=2L_n^{S}(\hat\gamma_{\cdot,1}^{S})+d(N^{S}+q)\log(n)$.

3.6. Simultaneous confidence bands

In this section, we propose an SCB for $\alpha_{\ell 1}(x_1)$ by studying the asymptotic behavior of the maximum of the normalized deviation of the spline functional estimate. To construct asymptotic SCBs for $\alpha_{\ell 1}(x_1)$ over the interval $x_1\in[0,1]$ with confidence level $100(1-\alpha)\%$, $\alpha\in(0,1)$, we need to find two functions $l_{\ell n}(x_1)$ and $u_{\ell n}(x_1)$ such that

$$\lim_{n\to\infty}P\big(l_{\ell n}(x_1)\le\alpha_{\ell 1}(x_1)\le u_{\ell n}(x_1)\ \text{for all}\ x_1\in[0,1]\big)=1-\alpha. \quad (14)$$

In practice, we consider a variant of (14) and construct SCBs over a subset $S_{n,1}$ of [0, 1], with $S_{n,1}$ becoming denser as $n\to\infty$. We therefore partition [0, 1] into equally spaced intervals with $0<\xi_0<\xi_1<\cdots<\xi_{L_n}<\xi_{L_n+1}=1$, where $L_n\to\infty$ as $n\to\infty$, and let $S_{n,1}=(\xi_0,\dots,\xi_{L_n})$. Define $d_{L_n}(\alpha)=1-\{2\log(L_n+1)\}^{-1}[\log\{-(1/2)\log(1-\alpha)\}+(1/2)\{\log\log(L_n+1)+\log(4\pi)\}]$ and $Q_{L_n}(\alpha)=\{2\log(L_n+1)\}^{1/2}d_{L_n}(\alpha)$.
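The critical value depends only on $L_n$ and $\alpha$ and can be computed directly from the displayed formula; a one-function sketch:

```python
# Sketch: the SCB critical value Q_Ln(alpha) of Section 3.6, computed from
# the displayed formula for d_Ln(alpha).
import numpy as np

def q_ln(alpha, Ln):
    log_l = np.log(Ln + 1.0)
    d = 1.0 - (2.0 * log_l) ** -1 * (np.log(-0.5 * np.log(1.0 - alpha))
                                     + 0.5 * (np.log(log_l) + np.log(4.0 * np.pi)))
    return np.sqrt(2.0 * log_l) * d

# Example: q_ln(0.05, 20) gives the 95% critical value for Ln = 20 grid
# points, the grid size used in the simulation study of Section 4.
```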

Theorem 4. Under conditions (C1)–(C5) in the Appendix, and with $L_n\asymp J_n^{S}\asymp n^{1/(2r+1)}$ and $n^{1/(2r+1)}\ll J_n^{ini}\ll n^{2r/(2r+1)}\{\log(n)\}^{-1}$, we have
$$\lim_{n\to\infty}P\Big\{\sup_{x_1\in S_{n,1}}\big|\sigma_{n,\ell 1}^{-1}(x_1)\{\hat\alpha_{\ell 1}^{S}(x_1)-\alpha_{\ell 1}(x_1)\}\big|\le Q_{L_n}(\alpha)\Big\}=1-\alpha,$$

and thus an asymptotic 100(1 − α)% confidence band for αℓ1(x1) over x1Sn,1 is

$$\hat\alpha_{\ell 1}^{S}(x_1)\pm\sigma_{n,\ell 1}(x_1)Q_{L_n}(\alpha). \quad (15)$$

Remark 4. Compared to pointwise confidence intervals of width $2Z_{1-\alpha/2}\,\sigma_{n,\ell 1}(x_1)$, the width of the confidence bands (15) is inflated by the factor $\{2\log(L_n+1)\}^{1/2}d_{L_n}(\alpha)/Z_{1-\alpha/2}$, where $Z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.

3.7. Bootstrap smoothing for calculating the standard error

Theorem 4 establishes the threshold value $Q_{L_n}(\alpha)$ for the SCB. One critical question is how to estimate the standard deviation $\sigma_{n,\ell 1}(x_1)$ in order to construct the SCB. We can use a sample estimate of $\sigma_{n,\ell 1}(x_1)$ according to the asymptotic formula given in (12), but this may have approximation error and thus lead to inaccurate inference. The bootstrap estimate of the standard deviation provides an alternative. We here propose a bootstrap smoothed confidence band by adopting the nonparametric bootstrap smoothing idea of Efron (2014), which can eliminate discontinuities in jumpy estimates. The procedure is described as follows.

Let $D=\{D_1,\dots,D_n\}$ be the observed data, where $D_i=\{Y_i,X_i,(T_{i\ell},\ell\in\hat I_1)\}$. Denote by $D^{*}=\{D_1^{*},\dots,D_n^{*}\}$ a nonparametric bootstrap sample from $\{D_1,\dots,D_n\}$, and by $D_{(j)}^{*}=\{D_{(j)1}^{*},\dots,D_{(j)n}^{*}\}$ the jth bootstrap sample in B draws. Let $\hat\alpha_{\ell 1,(j)}^{S*}(x_1)$ be the two-step estimator of $\alpha_{\ell 1}(x_1)$ computed from $D_{(j)}^{*}$. We first present the empirical standard deviation from the traditional resampling method, given as

$$\hat\sigma_{\ell 1,B}(x_1)=\Big[\sum_{j=1}^{B}\big\{\hat\alpha_{\ell 1,(j)}^{S*}(x_1)-\hat\alpha_{\ell 1,\cdot}^{S*}(x_1)\big\}^{2}\big/(B-1)\Big]^{1/2}, \quad (16)$$

where $\hat\alpha_{\ell 1,\cdot}^{S*}(x_1)=\sum_{j=1}^{B}\hat\alpha_{\ell 1,(j)}^{S*}(x_1)/B$. Then a $100(1-\alpha)\%$ unsmoothed bootstrap SCB for $\alpha_{\ell 1}(x_1)$ over $x_1\in S_{n,1}$ is given as

$$\hat\alpha_{\ell 1}^{S}(x_1)\pm\hat\sigma_{\ell 1,B}(x_1)Q_{L_n}(\alpha). \quad (17)$$

Another choice is the smoothed bootstrap SCB, which eliminates discontinuities in the estimates [Efron (2014)]. Let

$$\tilde\alpha_{\ell 1}^{S}(x_1)=\sum_{j=1}^{B}\hat\alpha_{\ell 1,(j)}^{S*}(x_1)/B$$

be the smoothed estimate of $\alpha_{\ell 1}(x_1)$, obtained by averaging over the bootstrap replications. Let $C_{(j)i}^{*}=\#\{D_{(j)i'}^{*}=D_i\}$ be the number of elements of $D_{(j)}^{*}$ equal to $D_i$.

Proposition 1. At each point $x_1\in S_{n,1}$, the nonparametric delta-method estimate of the standard deviation of the smoothed bootstrap statistic $\tilde\alpha_{\ell 1}^{S}(x_1)$ is $\tilde\sigma_{\ell 1}(x_1)=\{\sum_{i=1}^{n}\mathrm{cov}_i^{2}(x_1)\}^{1/2}$, where $\mathrm{cov}_i(x_1)=\mathrm{cov}\{C_{(j)i}^{*},\hat\alpha_{\ell 1,(j)}^{S*}(x_1)\}$ is the bootstrap covariance between $C_{(j)i}^{*}$ and $\hat\alpha_{\ell 1,(j)}^{S*}(x_1)$.

The proof of Proposition 1 essentially follows the same arguments as the proof for Theorem 1 in Efron (2014). Based on Proposition 1, to construct the smoothed bootstrap SCB, we use the nonparametric estimate of the standard deviation given as

$$\tilde\sigma_{\ell 1,B}(x_1)=\Big\{\sum_{i=1}^{n}\widehat{\mathrm{cov}}_{i,B}^{2}(x_1)\Big\}^{1/2}, \quad (18)$$
where

$$\widehat{\mathrm{cov}}_{i,B}(x_1)=\sum_{j=1}^{B}\big(C_{(j)i}^{*}-\bar C_{i}^{*}\big)\big\{\hat\alpha_{\ell 1,(j)}^{S*}(x_1)-\hat\alpha_{\ell 1,\cdot}^{S*}(x_1)\big\}\big/B,$$

with $\bar C_{i}^{*}=\sum_{j=1}^{B}C_{(j)i}^{*}/B$. The $100(1-\alpha)\%$ smoothed bootstrap SCB for $\alpha_{\ell 1}(x_1)$ over $x_1\in S_{n,1}$ is given as

$$\tilde\alpha_{\ell 1}^{S}(x_1)\pm\tilde\sigma_{\ell 1,B}(x_1)Q_{L_n}(\alpha). \quad (19)$$
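The whole of Section 3.7 reduces to a few array operations once the B bootstrap fits are stored; a minimal sketch of (16), (18) and (19), assuming the bootstrap estimates and resampling counts are held in two arrays (names ours):

```python
# Sketch: unsmoothed sd (16) and Efron-style smoothed sd (18) at the grid
# points of S_{n,1}. alpha_boot[j, m] = two-step estimate from bootstrap
# sample j at grid point m; counts[j, i] = C*_{(j)i}, the number of times
# observation i appears in bootstrap sample j.
import numpy as np

def bootstrap_sds(alpha_boot, counts):
    B = alpha_boot.shape[0]
    alpha_bar = alpha_boot.mean(axis=0)                # smoothed estimate
    sd_unsmoothed = alpha_boot.std(axis=0, ddof=1)     # formula (16)
    dev_c = counts - counts.mean(axis=0)               # C*_{(j)i} - Cbar*_i
    cov_hat = dev_c.T @ (alpha_boot - alpha_bar) / B   # n x grid covariances
    sd_smoothed = np.sqrt((cov_hat ** 2).sum(axis=0))  # formula (18)
    return alpha_bar, sd_unsmoothed, sd_smoothed

# Band (19): alpha_bar -/+ sd_smoothed * q_ln(0.05, Ln) at each grid point.
```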

4. A simulation study

In this section, we present a simulation study to evaluate the finite-sample performance of the proposed penalized estimation procedure and the simultaneous confidence bands. Further numerical studies can be found in the supplementary materials [Ma et al. (2015)].

Example 1. In this example, we use 1286 SNPs located on the sixth chromosome from the Framingham Heart Study to simulate the binary response from the logistic model

$$\mathrm{logit}\{P(Y_i=1\mid X_i,T_i)\}=\sum_{\ell=1}^{p}\alpha_{\ell}(X_i)T_{i\ell}=\sum_{\ell=1}^{p}\Big\{\alpha_{\ell 0}+\sum_{k=1}^{2}\alpha_{\ell k}(X_{ik})\Big\}T_{i\ell}, \quad (20)$$

with the four SNPs ss66063578, ss66236230, ss66194604 and ss66533844, selected from the real data analysis in Section 5, as important covariates and the other SNPs as unimportant covariates, so that s = 4 (the number of important covariates), p = 1286 and the sample size n = 300. The three possible allele combinations are coded as 1, 0 and −1 for each SNP. The covariates $X_{ik}$, k = 1, 2, are simulated environmental effects, generated from independent uniform distributions on [0, 1]. We set the coefficient functions to $\alpha_{10}=0.5$, $\alpha_{11}(x_1)=4\cos(2\pi x_1)$, $\alpha_{12}(x_2)=5\{(2x_2-1)^{2}-1/3\}$, $\alpha_{20}=0.5$, $\alpha_{21}(x_1)=6x_1-3$, $\alpha_{22}(x_2)=4\{\sin(2\pi x_2)+\cos(2\pi x_2)\}$, $\alpha_{30}=0.5$, $\alpha_{31}(x_1)=4\sin(2\pi x_1)$, $\alpha_{32}(x_2)=6x_2-3$, $\alpha_{40}=0.5$, $\alpha_{41}(x_1)=4\cos(2\pi x_1)$, $\alpha_{42}(x_2)=5\{(2x_2-1)^{2}-1/3\}$, and $\alpha_{\ell}(X_i)=0$ for $\ell=5,\dots,1286$. We conducted 500 replications for each simulation. We fit the data with the GACM (20) by the adaptive group lasso (AGL) and the group lasso (GL). In the literature, the generalized varying coefficient model [GVCM; Lian (2012)], which allows one index variable in the coefficient function of each predictor $T_{i\ell}$, has been widely used to study nonlinear interactions. To apply the GVCM method [Lian (2012)] in this setting, we first perform principal component analysis (PCA) on $X_i$ and then use the first principal component as the index variable in the GVCM. We then apply the AGL and GL methods to the GVCM $\mathrm{logit}\{P(Y_i=1\mid X_i,T_i)\}=\sum_{\ell=1}^{p}\alpha_{\ell}(U_i)T_{i\ell}$, where $U_i$ is the first principal component obtained by PCA on $X_i$. Moreover, we also fit the data with parametric logistic regression by assuming the linear coefficient functions (3), using the AGL method. We further compare our proposed method with the conventional screening method based on parametric logistic regression for genome-wide association studies [GWAS; Murcray, Lewinger and Gauderman (2009)]. In the screening method, we fit a logistic model for each SNP, $\mathrm{logit}\{P(Y_i=1\mid X_i,T_i)\}=\alpha_0+\alpha^{T}X_i+\beta_{\ell}T_{i\ell}+\sum_{k=1}^{2}\beta_{\ell k}X_{ik}T_{i\ell}$, for $\ell=1,\dots,1286$, and then conduct a likelihood ratio test of the genetic and interaction effects, $H_0:\beta_{\ell}=\beta_{\ell 1}=\beta_{\ell 2}=0$. Let $\alpha_0=0.05$ be the overall type I error for the study and M = 1286 the number of SNPs in this study. We apply the multiple testing correction procedure for GWAS, rejecting $H_0$ when the p-value is less than $\alpha_0/M_{\mathrm{eff}}$, where $M_{\mathrm{eff}}$ is the Cheverud–Nyholt estimate of the effective number of tests [Cheverud (2001), Nyholt (2004)], calculated as $M_{\mathrm{eff}}=1+M^{-1}\sum_{j=1}^{M}\sum_{k=1}^{M}(1-r_{jk}^{2})$ with $r_{jk}$ the correlation coefficients of the SNPs; we obtain $M_{\mathrm{eff}}=1275.65$.
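For reproducibility, here is a minimal sketch that generates one replicate of this design; the real chromosome-6 SNP matrix is not reproduced here, so a random −1/0/1 placeholder stands in for it (an assumption flagged in the comments).

```python
# Sketch: one replicate of Example 1's logistic GACM with the stated
# coefficient functions; `snp` is a placeholder for the n x p matrix of
# SNPs coded -1/0/1 (the paper uses real chromosome-6 genotypes, with the
# four causal SNPs in the first four columns here).
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 1286
snp = rng.integers(-1, 2, size=(n, p)).astype(float)   # placeholder genotypes
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)      # environmental effects

alpha = np.zeros((n, p))                                # coefficient surfaces
alpha[:, 0] = 0.5 + 4*np.cos(2*np.pi*x1) + 5*((2*x2 - 1)**2 - 1/3)
alpha[:, 1] = 0.5 + (6*x1 - 3) + 4*(np.sin(2*np.pi*x2) + np.cos(2*np.pi*x2))
alpha[:, 2] = 0.5 + 4*np.sin(2*np.pi*x1) + (6*x2 - 3)
alpha[:, 3] = 0.5 + 4*np.cos(2*np.pi*x1) + 5*((2*x2 - 1)**2 - 1/3)
# columns 4..p-1 stay zero: the remaining SNPs carry no signal

eta = (alpha * snp).sum(axis=1)                         # linear predictor
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))         # binary response
```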

Table 1 presents the percentages of correct fitting (C) (exactly the important covariates are selected), overfitting (O) (both the important covariates and some unimportant covariates are selected) and incorrect fitting (I) (some of the important covariates are not selected); the average true positives (TP), that is, the average number of selected covariates among the important covariates; the average false positives (FP), that is, the average number of selected covariates among the unimportant covariates; and the average model error (MR), defined as $\sum_{i=1}^{n}\{\hat\mu_i(X_i,T_i)-\mu_i(X_i,T_i)\}^{2}/n$, where $\hat\mu_i(X_i,T_i)$ and $\mu_i(X_i,T_i)$ are the estimated and true conditional means of $Y_i$, respectively. We see that, when fitting the proposed GACM, the GL method has a larger percentage of overfitting as well as larger average false positives than the AGL method. The AGL improves the correct-fitting percentage by 26%. As a result, the AGL reduces the model fitting error by (0.083 − 0.059)/0.059 = 40.7% compared to the GL method. Moreover, both the logistic model and the GVCM fail to identify the important covariates, with incorrect-fitting percentages close to or equal to 1. Furthermore, with the screening method based on logistic regression, the average true positive count is 1.056, far below 4 (the number of important SNPs). This further illustrates that the traditional screening method is not an effective tool for identifying important genetic factors in this context. In addition, we observe that the results for the AGL method in Table 1 are comparable to those in Table S.1 of Example 2 (in the supplementary materials) at p = 1000 with simulated SNPs, with similar correct-fitting percentages and MR values.

Table 1.

Variable selection and estimation results by the adaptive group lasso and the group lasso with the GACM and GVCM, respectively, and parametric logistic regression with adaptive group lasso and screening methods based on 500 replications. The columns of C, O and I show the percentage of correct-fitting, over-fitting and incorrect-fitting. The columns TP, FP and MR show true positives, false positives and model errors, respectively

C O I TP FP MR
GACM AGL 0.410 0.460 0.130 3.860 0.870 0.059
GL 0.140 0.764 0.096 3.904 2.540 0.083
GVCM AGL 0.030 0.000 0.970 1.636 5.685 0.142
GL 0.060 0.000 0.940 2.076 20.670 0.120
Logistic regression AGL 0.000 0.000 1.000 1.872 1.174 0.159
Screening 0.000 0.000 1.000 1.056 0.786 0.141

Next, we investigate the empirical coverage rates of the unsmoothed and smoothed SCBs given in (17) and (19). To calculate the unsmoothed and smoothed bootstrap standard deviations (16) and (18), we use B = 500 bootstrap replications. The confidence bands are constructed at $L_n=20$ equally spaced points. At the 95% confidence level, Table 2 reports the empirical coverage rates (cov) and the sample averages of the median and mean standard deviations (sd.median and sd.mean) for the unsmoothed SCB (17) and smoothed SCB (19) for the coefficient functions $\alpha_{\ell 1}(x_1)$, $\ell=1,2,3,4$. We see that the smoothed bootstrap method leads to better performance, with empirical coverage rates closer to the nominal confidence level 0.95.

Table 2. The empirical coverage rates (cov) and the sample averages of the median and mean of the standard deviations (sd.median and sd.mean) for the unsmoothed SCB (17) and smoothed SCB (19) for the coefficient functions $\alpha_{\ell 1}(x_1)$, $\ell=1,2,3,4$.

        Unsmoothed bootstrap            Smoothed bootstrap
        cov    sd.median   sd.mean      cov    sd.median   sd.mean
α11 0.610 0.689 0.809 0.818 0.735 0.982
α21 0.628 0.563 0.725 0.846 0.666 0.932
α31 0.636 0.736 0.832 0.869 0.837 1.053
α41 0.646 0.768 0.843 0.882 0.891 1.064

5. Data application

We illustrate our method via analysis of the Framingham Heart Study [Dawber, Meadors and Moore (1951)] to investigate the effects of G × E interactions on obesity. People are defined as obese when their body mass index (BMI) is 30 or greater; this is the definition of obesity used by the U.S. Centers for Disease Control and Prevention (see http://www.cdc.gov/obesity/adult/defining.html). We defined the response variable to be Y = 1 for BMI ≥ 30 and Y = 0 for BMI < 30. We use X1 = sleeping hours per day, X2 = activity hours per day and X3 = diastolic blood pressure as the environmental factors, and single nucleotide polymorphisms (SNPs) located on the sixth chromosome as the genetic factors. The three possible allele combinations are coded as 1, 0 and −1. As in the simulation, we are thus fitting a multiplicative risk model in the SNPs. For details on genotyping, see http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?studyid=phs000007.v3.p2. A total of 1286 SNPs remain in our analysis after eliminating SNPs with minor allele frequency < 0.05, SNPs departing from Hardy–Weinberg equilibrium and SNPs whose correlation coefficient with the response is between −0.1 and 0.1. We have n = 300 individuals left in our study after deleting observations with missing values.

To see possible nonlinear main effects of the environmental factors, we first fit a generalized additive model by using X1, X2 and X3 as predictors such that

$$E(Y_i\mid X_i,T_i)=g^{-1}\{\eta(X_i)\}\quad\text{with}\quad \eta(X_i)=m_0+\sum_{k=1}^{3}m_k(X_{ik}). \quad (21)$$

Figure S.1, given in the supplementary material [Ma et al. (2015)], depicts the plots of $\hat m_k(\cdot)$ for k = 1, 2, 3 obtained by one-step cubic spline estimation. Clearly, the estimate of each nonparametric function has a nonlinear pattern. We refer to Section S.2 for a detailed description of this figure. Based on the plots shown in Figure S.1, we fit the GACM

$$\eta(X_i,T_i)=\sum_{\ell=1}^{1287}\Big\{\alpha_{\ell 0}+\sum_{k=1}^{3}\alpha_{\ell k}(X_{ik})\Big\}T_{i\ell}, \quad (22)$$

where $T_i=(T_{i1},T_{i2},\dots,T_{i1287})^{T}$ with $T_{i1}=1$, and the $T_{i\ell}$ are the SNP covariates for $\ell=2,\dots,1287$. The nonparametric functions $\alpha_{\ell k}(\cdot)$ are estimated by cubic splines, and the number of interior knots in each step is selected by the criterion described in Section 2.4. We select variables in model (22) by the proposed adaptive group lasso (AGL) and the group lasso (GL). To compare the proposed model with linear models, we perform the group lasso assuming linear interaction effects (Linear), such that $\alpha_{\ell}(X_i)=\alpha_{\ell 0}+\sum_{k=1}^{3}\beta_{\ell k}X_{ik}$, and we also perform the lasso assuming no interaction effects (No interaction), such that $\alpha_{\ell}(X_i)=\alpha_{\ell 0}$. We also apply the screening method with parametric logistic regression (Screening) as described in Example 1. Table 3 reports the variable selection results in these five scenarios. After model selection, we calculate the estimated leave-one-out cross-validation prediction error (CVPE) for the model with the selected variables, shown in the last row of Table 3. Among the SNPs selected by the AGL method, two, rs4714924 and rs6543930, have been scientifically confirmed by Randall et al. (2013) to have strong associations with obesity. Moreover, compared to the linear, no-interaction and screening methods, our proposed AGL with the GACM identifies more genetic factors, which may be important to the response but missed by the other methods. As a result, it has the smallest CVPE (0.078), so it significantly improves model prediction compared to the other methods. We also see that the logistic model that completely ignores interactions has the largest CVPE (0.152). The screening method has the second largest CVPE (0.149), which is larger than that of the penalization method (0.124) obtained by fitting the same logistic regression model but with the interactions included. This result demonstrates that the screening method is not as effective as the penalization method for the analysis of this data set, which also agrees with our simulations.

Table 3.

Variable selection results for the group lasso (GL) and the adaptive group lasso (AGL) in model (22), the group lasso by assuming linear interaction effects (linear), the lasso by assuming no interaction effects (no interaction) and the screening method (screening). The symbol ✓ indicates that the SNP was selected into the model. The last row shows the cross validation prediction errors (CVPE)

SNPs GL AGL Linear No interaction Screening
rs9296244
rs6910353
rs3130813
rs9353447
rs4714924
rs242263
rs282123
rs282128
rs6929006
rs9353711
rs12199154
rs2277114
rs749517
rs729888
rs203139
rs6914589
rs6543930
CVPE 0.099 0.078 0.124 0.152 0.149

Next we fit the final GACM with the variables selected by the AGL procedure:

$$\eta(X_i,T_i)=\sum_{\ell=1}^{10}\Big\{\alpha_{\ell 0}+\sum_{k=1}^{3}\alpha_{\ell k}(X_{ik})\Big\}T_{i\ell}. \quad (23)$$

To illustrate the main effects of the environmental factors, Figure 2 plots the smoothed two-step estimates $\tilde\alpha_{1k}^{S}(\cdot)$ of the functions $\alpha_{1k}(\cdot)$ for k = 1, 2, 3 and the associated 95% smoothed SCBs (upper and lower solid lines). The plots of the functional estimates show the same nonlinear patterns as the corresponding plots in Figure S.1, although, because of the addition of the SCBs, the scale of the plots has changed.

Fig. 2. Plots of the smoothed two-step estimated functions $\tilde\alpha_{1k}^{S}(\cdot)$ for k = 1, 2, 3 and the associated 95% SCBs based on model (23).


To illustrate how the effects of the genetic factors change with the environmental factors, in Figure 3 we plot the smoothed two-step estimates $\tilde\alpha_{6k}^{S}(\cdot)$ and the associated 95% smoothed SCBs of the coefficient functions for the SNP rs242263. To further demonstrate how the probability of developing obesity changes with the environmental factors for each category of SNP rs242263, Figure 4 plots the estimated conditional probability of obesity against each environmental factor, setting $T_{i\ell}=0$ for $\ell\ne 6$. Letting A be the minor allele, the curves are for aa (solid line), Aa (dashed line) and AA (dotted line). Figure 3 indicates different changing patterns of the interaction effects under different environments. For example, sleeping hours seem to have an overall more significant interaction effect with this particular SNP than the other two variables. The effect of this SNP changes from positive to negative and then to positive again as sleeping hours increase. The coefficient functions of the SNP have an increasing pattern in activity hours and in diastolic blood pressure, respectively. From Figure 4, we observe stronger differences among the genotypes AA, Aa and aa of SNP rs242263 at both large and small values of the environmental factors. There are other interesting results worth further study. For example, in the 2–6 hours per day sleeping range, the AA group (dotted lines) has a much higher rate of obesity than the aa group (solid line), but the opposite occurs in the 6–9 hour range. For those with low amounts of activity per day, again the AA group is more obese than the aa group, while when activity increases, the AA group is less obese than the aa group. A similar noticeable difference occurs between the < 60 diastolic blood pressure group, those who are hypotensive, and the > 90 group, those who are hypertensive, although there are few subjects in the former group.

Fig. 3. Plots of the smoothed two-step estimated functions $\tilde\alpha_{5k}^{S}(\cdot)$ for k = 1, 2, 3 and the associated 95% SCBs based on model (23).


Fig. 4. Plots of the estimated conditional probability of obesity against each environmental factor, setting $T_{i\ell}=0$ for $\ell\ne 5$. With A being the minor allele, the curves are aa (solid line), Aa (dashed line) and AA (dotted line), based on model (23).

6. Discussion

The generalized additive coefficient model (GACM) proposed by Xue and Yang (2006) and Xue and Liang (2010) has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. To promote the use of the GACM in modern data applications, such as gene–environment (G × E) interaction effects in GWAS, we have proposed estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we have devised a groupwise penalization method in the GACM for simultaneous model selection and estimation. We showed by numerical studies that the proposed nonparametric model can effectively identify important genetic factors, while traditional generalized parametric models, such as the logistic regression model, fail to do so when nonlinear interactions exist. Moreover, by comparison with the conventional screening method based on logistic regression, as commonly used in the GWAS community, our proposed groupwise penalization method with the GACM has been demonstrated to be more effective for variable selection and model estimation. After identifying the important covariates, we have further constructed simultaneous confidence bands for the nonzero coefficient functions based on a refined two-step estimator. We estimate the standard deviation of the functional estimator by the smoothed bootstrap method proposed in Efron (2014). The method was shown to have good numerical performance, reducing variability and improving the empirical coverage rate of the proposed simultaneous confidence bands. Our methods can be extended to longitudinal data settings through marginal models or mixed-effects models; more work, however, is needed to understand the properties of the estimators in such new settings. Moreover, extending this work to settings where the dimensions of both the genetic and the environmental factors grow with the sample size is a possible future project. The associated theoretical properties with respect to model selection, estimation and inference would need to be carefully investigated.

Supplementary Material

Supplement

Acknowledgments

The authors thank the Co-Editors, an Associate Editor and three referees for their valuable suggestions and comments that have substantially improved an earlier version of this paper.

Appendix

Denote the space of qth order smooth functions by $C^{(q)}([0,1])=\{\phi:\phi^{(q)}\in C[0,1]\}$. For any $s\times s$ symmetric matrix A, denote its $L_2$ norm by $\|A\|_2=\max_{\varsigma\in R^{s},\|\varsigma\|_2=1}\|A\varsigma\|_2$, and let $\|A\|_\infty=\max_{1\le i\le s}\sum_{j=1}^{s}|a_{ij}|$. For a vector a, let $\|a\|_\infty=\max_{1\le i\le s}|a_i|$.

Let $C^{0,1}(\chi_w)$ be the space of Lipschitz continuous functions on $\chi_w$, that is,

$$C^{0,1}(\chi_w)=\Big\{\varphi:\|\varphi\|_{0,1}=\sup_{w\ne w',\,w,w'\in\chi_w}\frac{|\varphi(w)-\varphi(w')|}{|w-w'|}<+\infty\Big\},$$

in which $\|\varphi\|_{0,1}$ is the $C^{0,1}$-norm of $\varphi$. Denote $q_j(\eta,y)=\partial^{j}Q\{g^{-1}(\eta),y\}/\partial\eta^{j}$, so that
$$q_1(\eta,y)=\frac{\partial}{\partial\eta}Q\{g^{-1}(\eta),y\}=-\{y-g^{-1}(\eta)\}\rho_1(\eta),$$
$$q_2(\eta,y)=\frac{\partial^{2}}{\partial\eta^{2}}Q\{g^{-1}(\eta),y\}=\rho_2(\eta)-\{y-g^{-1}(\eta)\}\rho_1'(\eta),$$
where $\rho_j(\eta)=\{\dot g^{-1}(\eta)\}^{j}/V\{g^{-1}(\eta)\}$.

A.1. Assumptions

Throughout the paper, we assume the following regularity conditions:

(C1) The joint density of X, denoted f(x), is absolutely continuous, and there exist constants $0<c_f\le C_f<\infty$ such that $c_f\le\min_{x\in[0,1]^d}f(x)\le\max_{x\in[0,1]^d}f(x)\le C_f$.

(C2) The function V is twice continuously differentiable, and the link function g is three times continuously differentiable. The function $q_2(\eta,y)>0$ for $\eta\in R$ and y in the range of the response variable.

(C3) For $1\le \ell\le p$ and $1\le k\le d$, $\alpha_{\ell k}^{(r-1)}(x_k)\in C^{0,1}[0,1]$ for a given integer $r\ge 1$. The spline order satisfies $q\ge r$.

(C4) Let $\varepsilon_i=Y_i-\mu(X_i,T_i)$, $1\le i\le n$. The random variables $\varepsilon_1,\dots,\varepsilon_n$ are i.i.d. with $E(\varepsilon_i)=0$ and $\mathrm{var}(\varepsilon_i\mid X_i,T_i)=\sigma^{2}(X_i,T_i)$. Furthermore, their tail probabilities satisfy $P(|\varepsilon_i|>x)\le K\exp(-Cx^{2})$, $i=1,\dots,n$, for all $x\ge 0$ and some positive constants C and K.

(C5) The eigenvalues of $E(T_{I_1}T_{I_1}^{T}\mid X=x)$, where $T_{I_1}=(T_\ell, \ell\in I_1)^{T}$, are uniformly bounded away from 0 and ∞ for all $x\in[0,1]^{d}$. There exist constants $0<c_1<C_1<\infty$ such that $c_1\le E(T_\ell^{2}\mid X=x)\le C_1$ for all $x\in[0,1]^{d}$ and $\ell\in I_2$.

Conditions (C1)–(C5) are standard conditions for nonparametric estimation. Condition (C1) is the same as condition (C1) in Xue and Yang (2006) and condition (C5) in Xue and Liang (2010). The first condition in (C2) gives the assumptions on V and the link function g, which can be found in condition (E) of Lam and Fan (2008). The second condition in (C2) guarantees that the negative quasi-likelihood function Q{g−1(η), y} is convex in ηR, which is also given in condition (D) of Lam and Fan (2008) and (a) of condition 1 in Carroll et al. (1997). Condition (C3) is typical for polynomial spline smoothing; see the same condition given in Section 5.2 of Huang (2003). Condition (C4) is the same as assumption (A2) given in Huang, Horowitz and Wei (2010). Condition (C5) is given in condition (C5) of Xue and Liang (2010) and condition (A5) in Ma and Yang (2011b).

A.2. Preliminary lemmas

Define $\alpha_{\ell}^{0}(x)=\gamma_{\ell 0}+\sum_{k=1}^{d}\alpha_{\ell k}^{0}(x_k)=B(x)^{T}\gamma_{\ell}$, where $\alpha_{\ell k}^{0}(x_k)$ is defined in (6). Let $\gamma_{I_1}=(\gamma_\ell:\ell\in I_1)^{T}$. To prove Theorem 1, we define the oracle estimator of $\gamma_{I_1}$ by minimizing the penalized negative quasi-likelihood with all irrelevant predictors eliminated, namely

$$L_n(\gamma_{I_1})=\sum_{i=1}^{n}Q\Big[g^{-1}\Big\{\sum_{\ell\in I_1}B(X_i)^{T}\gamma_{\ell}T_{i\ell}\Big\},Y_i\Big]+n\lambda_n\sum_{\ell\in I_1}w_{n\ell}\|\gamma_{\ell}\|_2, \quad (24)$$

so that $\hat\gamma_{I_1}^{0}=(\hat\gamma_{\ell}^{0}:\ell\in I_1)^{T}=\arg\min_{\gamma_{I_1}}L_n(\gamma_{I_1})$. Define $\hat\gamma_{I_2}^{0}=(\hat\gamma_{\ell}^{0}:\ell\in I_2)^{T}$ with $\hat\gamma_{\ell}^{0}\equiv\mathbf{0}_{dJ_n+1}$ for $\ell\in I_2$, where $\mathbf{0}_{dJ_n+1}$ is the $(dJ_n+1)$-dimensional zero vector. We next present several lemmas, whose detailed proofs are given in the online supplementary materials [Ma et al. (2015)]. Lemma A.1 is used in the proof of Theorem 1, while Lemma A.2 is needed in the proof of Theorem 3.

Lemma A.1. Under the conditions of Theorem 1, one has

$$\|\hat\gamma_{I_1}^{0}-\gamma_{I_1}\|_2=O_p(\lambda_n\|w_{n,I_1}\|_2+n^{-1/2}J_n^{1/2}+J_n^{-r}), \quad (25)$$

and as n → ∞,

$$P\big\{\hat\gamma=(\hat\gamma_{I_1}^{0T},\hat\gamma_{I_2}^{0T})^{T}\big\}\to 1. \quad (26)$$

Lemma A.2. Under conditions (C1)–(C5) and Assumptions 1–3,

$$\|\hat\gamma_{\cdot,1}^{S}-\hat\gamma_{\cdot,1}^{OR}\|_{\infty}=O_p\big\{\sqrt{\log n/(J_n^{S}n)}+(J_n^{S})^{-1/2}(J_n^{ini})^{-r}\big\}. \quad (27)$$

A.3. Proof of Theorem 1

By (25) and (26),
$$\sum_{\ell\in I_1}\|\hat\alpha_{\ell}-\alpha_{\ell}\|\asymp\|\hat\gamma_{I_1}^{0}-\gamma_{I_1}\|_2=O_p(\lambda_n\|w_{n,I_1}\|_2+n^{-1/2}J_n^{1/2}+J_n^{-r}),$$
and $P(\|\hat\alpha_\ell\|>0, \ell\in I_1$ and $\|\hat\alpha_\ell\|=0, \ell\in I_2)\to 1$.

A.4. Proof of Theorem 2

Let $\gamma_{\cdot,1}=(\gamma_{\ell 1}, \ell\in\hat I_1)^{T}$, where $\gamma_{\ell 1}$ is defined in (7). By a Taylor expansion, from (10), one has
$$\hat\gamma_{\cdot,1}^{OR}-\gamma_{\cdot,1}=\Big[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\Big]^{-1}\Big[\sum_{i=1}^{n}Z_{i,1}\{Y_i-g^{-1}(\eta_i^{0})\}\{\dot g^{-1}(\eta_i^{0})/\sigma_i^{2}\}\Big],$$
where
$$\eta_i^{0}=\sum_{\ell=1}^{p}\Big\{\alpha_{\ell 0}+\sum_{k=2}^{d}\alpha_{\ell k}(X_{ik})\Big\}T_{i\ell}+\sum_{\ell=1}^{p}B_1^{S}(X_{i1})^{T}\gamma_{\ell 1}T_{i\ell}\quad\text{and}$$
$$\bar\eta_i=\sum_{\ell=1}^{p}\Big\{\alpha_{\ell 0}+\sum_{k=2}^{d}\alpha_{\ell k}(X_{ik})\Big\}T_{i\ell}+\sum_{\ell=1}^{p}B_1^{S}(X_{i1})^{T}\bar\gamma_{\ell 1}T_{i\ell},$$
in which $\bar\gamma_{\cdot,1}=(\bar\gamma_{\ell 1}, \ell\in\hat I_1)^{T}$ lies between $\gamma_{\cdot,1}$ and $\hat\gamma_{\cdot,1}^{OR}$. Following similar reasoning as in the proof of (25), we have $\|\hat\gamma_{\cdot,1}^{OR}-\gamma_{\cdot,1}\|_2=o_p(1)$. Then $\hat\gamma_{\cdot,1}^{OR}-\gamma_{\cdot,1}=(\hat\gamma_{\cdot,1e}^{OR}+\hat\gamma_{\cdot,1\mu}^{OR})+o_p(1)$, where
$$\hat\gamma_{\cdot,1e}^{OR}=\Big[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\Big]^{-1}\Big[\sum_{i=1}^{n}Z_{i,1}\varepsilon_i\{\dot g^{-1}(\bar\eta_i)/\sigma_i^{2}\}\Big],$$
$$\hat\gamma_{\cdot,1\mu}^{OR}=\Big[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\Big]^{-1}\Big[\sum_{i=1}^{n}Z_{i,1}\{g^{-1}(\eta_i)-g^{-1}(\eta_i^{0})\}\{\dot g^{-1}(\bar\eta_i)/\sigma_i^{2}\}\Big]. \quad (28)$$

Therefore, $\mathrm{var}(\hat\gamma_{\cdot,1e}^{OR}\mid\mathbf{X},\mathbf{T})=[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}]^{-1}$. By Theorem 5.4.2 of DeVore and Lorentz (1993), for sufficiently large n there exist constants $0<c_B\le C_B<\infty$ such that $c_BI_{J_n^{S}\times J_n^{S}}\le E\{B_1^{S}(X_{i1})B_1^{S}(X_{i1})^{T}\}\le C_BI_{J_n^{S}\times J_n^{S}}$. By condition (C5), for n large enough there are constants $0<C_T,C'<\infty$ such that
$$E\big[Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\big]\le C'E\big[\{B_1^{S}(X_{i1})B_1^{S}(X_{i1})^{T}\}\otimes\{E(T_{\ell}T_{\ell'}\mid X)\}_{\ell,\ell'\in\hat I_1}\big]\le C'C_Ts\,E\{B_1^{S}(X_{i1})B_1^{S}(X_{i1})^{T}\}\otimes I_{s\times s}\le C'C_TC_Bs\,I_{J_n^{S}\times J_n^{S}}\otimes I_{s\times s}=Cs\,I_{J_n^{S}s\times J_n^{S}s},$$
where $C=C'C_TC_B$. Similarly, $E[Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}]\ge c\,I_{J_n^{S}s\times J_n^{S}s}$ for some constant $0<c<\infty$. Thus, following the same reasoning as in the proof of (S.5) in the supplementary materials [Ma et al. (2015)], we have, with probability 1, as $n\to\infty$,
$$C^{-1}s^{-1}n^{-1}I_{J_n^{S}s\times J_n^{S}s}\le\Big[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\Big]^{-1}\le c^{-1}n^{-1}I_{J_n^{S}s\times J_n^{S}s}. \quad (29)$$

By the Lindeberg central limit theorem, it can be proved that
$$a^{T}\sigma_n^{-1}(x_1)\{\mathbf{B}^{S}(x_1)\hat\gamma_{\cdot,1e}^{OR}\}\to N(0,1), \quad (30)$$
for any $a\in R^{s^{*}}$ with $\|a\|_2=1$. Since $a^{T}\sigma_n^{-1}(x_1)\{\hat\alpha_1^{OR}(x_1)-b_1(x_1)\}=a^{T}\sigma_n^{-1}(x_1)\{\mathbf{B}^{S}(x_1)\hat\gamma_{\cdot,1e}^{OR}\}+o_p(1)$, by (30) and Slutsky's theorem we have
$$a^{T}\sigma_n^{-1}(x_1)\{\hat\alpha_1^{OR}(x_1)-b_1(x_1)\}\to N(0,1). \quad (31)$$

By (28) and (29), with probability approaching 1,

$$\sum_{\ell\in\hat I_1}\|\hat\alpha_{\ell 1}^{OR}-b_{\ell 1}\|^{2}\asymp\|\hat\gamma_{\cdot,1e}^{OR}\|_2^{2}\le c^{-2}n^{-2}\Big[\sum_{i=1}^{n}\varepsilon_iZ_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)/\sigma_i^{2}\}\Big]\Big[\sum_{i=1}^{n}Z_{i,1}\varepsilon_i\{\dot g^{-1}(\bar\eta_i)/\sigma_i^{2}\}\Big]\asymp c^{-2}n^{-1}E\big[Z_{i,1}^{T}Z_{i,1}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\big]\asymp sJ_n^{S}n^{-1};$$
$$\{a^{T}(\hat\alpha_1^{OR}-b_1)\}^{2}\le C_a\|\hat\gamma_{\cdot,1e}^{OR}\|_2^{2}\le C_ac^{-1}n^{-2}\Big(\sum_{i=1}^{n}\varepsilon_iZ_{i,1}^{T}\Big)\Big(\sum_{i=1}^{n}Z_{i,1}\varepsilon_i\Big)\asymp C_ac^{-1}n^{-1}E(Z_{i,1}^{T}Z_{i,1})\asymp sJ_n^{S}n^{-1}.$$
Since $\sup_{x_1\in[0,1]}|\alpha_{\ell 1}(x_1)-B_1^{S}(x_1)^{T}\gamma_{\ell 1}|=O\{(J_n^{S})^{-r}\}$, it can be proved that $|a^{T}\hat\gamma_{\cdot,1\mu}^{OR}|\le\|\hat\gamma_{\cdot,1\mu}^{OR}\|_2=O_p\{s^{1/2}(J_n^{S})^{-r}\}$ and $a^{T}(b_1-\alpha_1^{0})\asymp a^{T}\mathbf{B}^{S}(x_1)\hat\gamma_{\cdot,1\mu}^{OR}=O_p\{s^{1/2}(J_n^{S})^{-r}\}$. Hence
$$a^{T}(b_1-\alpha_1)\le a^{T}(b_1-\alpha_1^{0})+a^{T}(\alpha_1^{0}-\alpha_1)=O_p\{s^{1/2}(J_n^{S})^{-r}\}.$$
By (31), $\{e_{\ell}^{T}\sigma_n^{2}(x_1)e_{\ell}\}^{-1/2}\{\hat\alpha_{\ell 1}^{OR}(x_1)-b_{\ell 1}(x_1)\}\to N(0,1)$, and $\sup_{\ell\in\hat I_1}|\hat\alpha_{\ell 0}^{OR}-\alpha_{\ell 0}|=O_p(n^{-1/2})$ follows from the central limit theorem.

A.5. Proof of Theorem 3

By (27) in Lemma A.2,

$$\sup_{x_1\in[0,1]}\|\hat\alpha_1^{S}(x_1)-\hat\alpha_1^{OR}(x_1)\|_{\infty}\le\sup_{x_1\in[0,1]}\sum_{j=1}^{J_n^{S}}|B_{j,1}^{S}(x_1)|\,\|\hat\gamma_{\cdot,1}^{S}-\hat\gamma_{\cdot,1}^{OR}\|_{\infty}.$$

The right-hand side is bounded by $O_p\{(n^{-1}\log n)^{1/2}+(J_n^{ini})^{-r}\}$. The bound $\|\hat\alpha_0^{S}-\hat\alpha_0^{OR}\|_2=o_p(n^{-1/2})$ can be proved following the same procedure and is thus omitted. By (29), with probability approaching 1, for large enough n, for any $x_1\in[0,1]$ and $a\in R^{s^{*}}$ with $\|a\|_2=1$, one has

$$a^{T}\sigma_n^{2}(x_1)a\ge C^{-1}(s^{*})^{-1}n^{-1}a^{T}\mathbf{B}^{S}(x_1)\mathbf{B}^{S}(x_1)^{T}a\ge c_1J_n^{S}(s^{*})^{-1}n^{-1}a^{T}a,$$
$$a^{T}\sigma_n^{2}(x_1)a\le c^{-1}n^{-1}a^{T}\mathbf{B}^{S}(x_1)\mathbf{B}^{S}(x_1)^{T}a\le C_1J_n^{S}n^{-1}a^{T}a,$$

where σn2(x1) is defined in (12). Thus

$$\sup_{x_1\in[0,1]}\big|a^{T}\sigma_n^{-1}(x_1)\{\hat\alpha_1^{S}(x_1)-\hat\alpha_1^{OR}(x_1)\}\big|\le\sup_{x_1\in[0,1]}\|\sigma_n^{-1}(x_1)\|_2\,\|\hat\alpha_1^{S}(x_1)-\hat\alpha_1^{OR}(x_1)\|_2=O_p\big[s\big\{(\log n/J_n^{S})^{1/2}+(n/J_n^{S})^{1/2}(J_n^{ini})^{-r}\big\}\big]=o_p(1).$$

A.6. Proof of Theorem 4

Using the strong approximation lemma given in Theorem 2.6.7 of Csörgő and Révész (1981), we can prove by the same procedure as Lemma A.7 in Ma, Yang and Carroll (2012) that

$$\sup_{x_1\in[0,1]}\big|\hat\alpha_{\ell 1}^{OR}(x_1)-b_{\ell 1}(x_1)-\hat\alpha_{\ell 1,\varepsilon}^{0}(x_1)\big|=o_{a.s.}(n^{t}) \quad (32)$$

for some $t<-r/(2r+1)<0$, where
$$\hat\alpha_{\ell 1,\varepsilon}^{0}(x_1)=e_{\ell}^{T}\mathbf{B}^{S}(x_1)\Big[\sum_{i=1}^{n}Z_{i,1}Z_{i,1}^{T}\{\dot g^{-1}(\bar\eta_i)\}^{2}/\sigma_i^{2}\Big]^{-1}\Big[\sum_{i=1}^{n}Z_{i,1}e_i\{\dot g^{-1}(\bar\eta_i)/\sigma_i^{2}\}\Big],$$
and the $e_i$, $1\le i\le n$, are i.i.d. N(0, 1) random variables independent of the $Z_{i,1}$. For $\sigma_n^{2}(x_1)$ defined in (12), $\sigma_{n,\ell 1}(x_1)\asymp(J_n^{S}/n)^{1/2}\{1+o_p(1)\}$ uniformly in $x_1\in[0,1]$. By (32), $J_n^{S}\asymp n^{1/(2r+1)}$ and $t<-r/(2r+1)<0$, we have

$$\sup_{x_1\in[0,1]}\big|\{\log(L_n+1)\}^{1/2}\sigma_{n,\ell 1}^{-1}(x_1)\{\hat\alpha_{\ell 1}^{OR}(x_1)-b_{\ell 1}(x_1)-\hat\alpha_{\ell 1,\varepsilon}^{0}(x_1)\}\big|=o_{a.s.}\big(\{\log(L_n+1)\}^{1/2}(n/J_n^{S})^{1/2}n^{t}\big)=o_{a.s.}\big(\{\log(L_n+1)\}^{1/2}n^{r/(2r+1)+t}\big)=o_{a.s.}(1). \quad (33)$$

Define $\eta(x_1)=\sigma_{n,\ell 1}^{-1}(x_1)\hat\alpha_{\ell 1,\varepsilon}^{0}(x_1)$. It is apparent that $\mathcal{L}\{\eta(\xi_J)\mid Z_{i,1},1\le i\le n\}=N(0,1)$, so $\mathcal{L}\{\eta(\xi_J)\}=N(0,1)$ for $0\le J\le L_n$. Moreover, the eigenvalues of $(EZ_{i,1}Z_{i,1}^{T})^{-1}$ are bounded. Then, with probability approaching 1, for $J\ne J'$,
$$\big|E\{\eta(\xi_J)\eta(\xi_{J'})\}\big|\asymp(n/J_n^{S})n^{-1}\big|e_{\ell}^{T}\mathbf{B}^{S}(\xi_J)(EZ_{i,1}Z_{i,1}^{T})^{-1}\mathbf{B}^{S}(\xi_{J'})^{T}e_{\ell}\big|\asymp(J_n^{S})^{-1}\big|e_{\ell}^{T}\mathbf{B}^{S}(\xi_J)\mathbf{B}^{S}(\xi_{J'})^{T}e_{\ell}\big|=(J_n^{S})^{-1}\sum_{j=1}^{J_n^{S}}B_{j,1}^{S}(\xi_J)B_{j,1}^{S}(\xi_{J'}),$$

and $(J_n^{S})^{-1}\sum_{j=1}^{J_n^{S}}B_{j,1}^{S}(\xi_J)B_{j,1}^{S}(\xi_{J'})\le C$ for a constant $0<C<\infty$ when $|j_J-j_{J'}|\le q-1$, while $\sum_{j=1}^{J_n^{S}}B_{j,1}^{S}(\xi_J)B_{j,1}^{S}(\xi_{J'})=0$ when $|j_J-j_{J'}|>q-1$, in which $j_J$ denotes the index of the knot closest to $\xi_J$ from the left. Therefore, by $L_n\asymp J_n^{S}$, there exist constants $0<C_1<\infty$ and $0<C_2<\infty$ such that, with probability approaching 1, for $J\ne J'$, $|E\{\eta(\xi_J)\eta(\xi_{J'})\}|\le C_1^{-|j_J-j_{J'}|}\le C_2^{-|J-J'|}$. By Lemma A1 of Ma and Yang (2011a), we have

$\lim_{n\to\infty}P\Big\{\sup_{0\le J\le L_n}\big|\{2\log(L_n+1)\}^{-1/2}\eta(\xi_J)\big|\le d_{N_n}(\alpha)\Big\}=1-\alpha,$

and hence

$\lim_{n\to\infty}P\Big\{\sup_{x_1\in S_{n,1}}\big|\{2\log(L_n+1)\}^{-1/2}\sigma_{n1}^{-1}(x_1)\hat\alpha_{1,\varepsilon}^0(x_1)\big|\le d_{N_n}(\alpha)\Big\}=1-\alpha.$ (34)
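The $\{2\log(L_n+1)\}^{-1/2}$ scaling in (34) reflects the classical extreme-value behavior of Gaussian maxima: the maximum of $m$ standard normal variables grows like $(2\log m)^{1/2}$, so the normalized supremum stabilizes and admits the Gumbel-type limit above. A quick simulation of this well-known fact (our illustration, not from the paper; it treats the $\eta(\xi_J)$ as independent, whereas Lemma A1 of Ma and Yang (2011a) shows that their weak, banded dependence does not change the limit):

import numpy as np

# Max of m i.i.d. N(0,1) draws, scaled by sqrt(2 log m): the ratio
# concentrates near 1 as m grows, which is what makes the
# {2 log(L_n + 1)}^{-1/2} normalization in (34) work.
rng = np.random.default_rng(2015)
for m in (50, 500, 5000):
    draws = rng.standard_normal((2000, m))   # 2000 Monte Carlo replicates
    ratio = np.abs(draws).max(axis=1) / np.sqrt(2 * np.log(m))
    print(f"m = {m:5d}:  mean ratio = {ratio.mean():.3f}")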

Furthermore, according to the result on page 149 of de Boor (2001), we have

$\sup_{x_1\in[0,1]}\big|\{\log(L_n+1)\}^{1/2}\sigma_{n1}^{-1}(x_1)\{b_1(x_1)-\alpha_1(x_1)\}\big|=O_p\big(\{\log(L_n+1)\}^{1/2}(n/J_n^S)^{1/2}(J_n^S)^{-r}\big)=o_p(1).$ (35)

Moreover, $\hat\alpha_1^{OR}(x_1)-\alpha_1(x_1)=\hat\alpha_{1,\varepsilon}^0(x_1)+\{\hat\alpha_1^{OR}(x_1)-b_1(x_1)-\hat\alpha_{1,\varepsilon}^0(x_1)\}+\{b_1(x_1)-\alpha_1(x_1)\}$. Hence, by (33) and (35), we have

$\lim_{n\to\infty}P\Big\{\sup_{x_1\in S_{n,1}}\{2\log(L_n+1)\}^{-1/2}\sigma_{n1}^{-1}(x_1)\big|\hat\alpha_1^{OR}(x_1)-\alpha_1(x_1)\big|\le d_{N_n}(\alpha)\Big\}=\lim_{n\to\infty}P\Big\{\sup_{x_1\in S_{n,1}}\{2\log(L_n+1)\}^{-1/2}\sigma_{n1}^{-1}(x_1)\big|\hat\alpha_{1,\varepsilon}^0(x_1)\big|\le d_{N_n}(\alpha)\Big\}=1-\alpha,$ (36)

where the last step follows from (34). By the oracle property given in Theorem 3, together with $J_n^Sn^{-1/(2r+1)}\to\infty$ and $J_n^{ini}\asymp n^{1/(2r+1)}$, we have

$\sup_{x_1\in[0,1]}\{\log(L_n+1)\}^{1/2}\sigma_{n1}^{-1}(x_1)\big|\hat\alpha_1^S(x_1)-\hat\alpha_1^{OR}(x_1)\big|=O_p\big[\{\log(L_n+1)\}^{1/2}(n/J_n^S)^{1/2}\{(n^{-1}\log n)^{1/2}+(J_n^{ini})^{-r}\}\big]=o_p(1).$ (37)
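For completeness, here is the arithmetic behind the $o_p(1)$ claim in (37) (our addition, under $J_n^{ini}\asymp n^{1/(2r+1)}$ and $J_n^Sn^{-1/(2r+1)}\to\infty$):

$(n/J_n^S)^{1/2}(n^{-1}\log n)^{1/2}=(\log n/J_n^S)^{1/2}\to 0,\qquad (n/J_n^S)^{1/2}(J_n^{ini})^{-r}=o\{n^{r/(2r+1)}\}\,O\{n^{-r/(2r+1)}\}=o(1),$

and since $L_n\asymp J_n^S$ is polynomial in $n$, the factor $\{\log(L_n+1)\}^{1/2}$ grows only logarithmically and is absorbed by both terms for polynomial-order choices of $J_n^S$.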

Therefore, by (36) and (37), we have

$\lim_{n\to\infty}P\Big\{\sup_{x_1\in S_{n,1}}\{2\log(L_n+1)\}^{-1/2}\sigma_{n1}^{-1}(x_1)\big|\hat\alpha_1^S(x_1)-\alpha_1(x_1)\big|\le d_{N_n}(\alpha)\Big\}=1-\alpha,$

and hence the result in Theorem 4 is proved.
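Finally, a small numerical illustration of the banded B-spline overlap $\sum_j B_{j,1}^S(\xi_J)B_{j,1}^S(\xi_{J'})$ invoked before (34) (our sketch, not part of the proof; it assumes SciPy ≥ 1.8 for BSpline.design_matrix, and the degree and knot sequence are arbitrary illustrative choices):

import numpy as np
from scipy.interpolate import BSpline

degree = 3                                    # cubic B-splines (order q = 4)
interior = np.linspace(0.0, 1.0, 11)          # equally spaced knots on [0, 1]
t = np.r_[[0.0] * degree, interior, [1.0] * degree]   # clamped knot vector
x = np.linspace(0.0, 0.99, 12)                # evaluation points

# Row i of B holds B_1(x_i), ..., B_J(x_i), with J = len(t) - degree - 1.
B = BSpline.design_matrix(x, t, degree).toarray()

# overlap[i, j] = sum_k B_k(x_i) B_k(x_j): exactly zero once x_i and x_j
# are more than (q - 1) knot spacings apart, so the matrix is banded.
overlap = B @ B.T
print(np.round(overlap, 2))

The printed matrix has nonzero entries only near its diagonal, which is the compact-support property that yields the geometric covariance decay used with Lemma A1 of Ma and Yang (2011a).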

Contributor Information

MA Shujie, Email: shujie.ma@ucr.edu.

Raymond J. Carroll, Email: carroll@stat.tamu.edu.

Hua Liang, Email: hliang@gwu.edu.

Shizhong Xu, Email: shizhong.xu@ucr.edu.

References

1. Carroll RJ, Fan J, Gijbels I, Wand MP. Generalized partially linear single-index models. J Amer Statist Assoc. 1997;92:477–489. MR1467842.
2. Chen J, Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika. 2008;95:759–771. MR2443189.
3. Cheverud JM. A simple correction for multiple comparisons in interval mapping genome scans. Heredity (Edinb). 2001;87:52–58. doi:10.1046/j.1365-2540.2001.00901.x.
4. Claeskens G, Van Keilegom I. Bootstrap confidence bands for regression curves and their derivatives. Ann Statist. 2003;31:1852–1884. MR2036392.
5. Csörgő M, Révész P. Strong Approximations in Probability and Statistics. Academic Press; New York: 1981. MR0666546.
6. Dawber TR, Meadors GF, Moore FE. Epidemiological approaches to heart disease: The Framingham Study. American Journal of Public Health. 1951;41:279–286. doi:10.2105/ajph.41.3.279.
7. de Boor C. A Practical Guide to Splines. Revised ed. Applied Mathematical Sciences, Vol. 27. Springer; New York: 2001. MR1900298.
8. DeVore RA, Lorentz GG. Constructive Approximation. Grundlehren der Mathematischen Wissenschaften, Vol. 303. Springer; Berlin: 1993. MR1261635.
9. Efron B. Estimation and accuracy after model selection. J Amer Statist Assoc. 2014;109:991–1007. doi:10.1080/01621459.2013.823775. MR3265671.
10. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Statist Assoc. 2001;96:1348–1360. MR1946581.
11. Fan Y, Tang CY. Tuning parameter selection in high dimensional penalized likelihood. J R Stat Soc Ser B Stat Methodol. 2013;75:531–552. MR3065478.
12. Hall P, Titterington DM. On confidence bands in nonparametric density estimation and regression. J Multivariate Anal. 1988;27:228–254. MR0971184.
13. Härdle W, Marron JS. Bootstrap simultaneous error bars for nonparametric regression. Ann Statist. 1991;19:778–796. MR1105844.
14. Horowitz J, Klemelä J, Mammen E. Optimal estimation in additive regression models. Bernoulli. 2006;12:271–298. MR2218556.
15. Horowitz JL, Mammen E. Nonparametric estimation of an additive model with a link function. Ann Statist. 2004;32:2412–2443. MR2153990.
16. Huang JZ. Local asymptotics for polynomial spline regression. Ann Statist. 2003;31:1600–1635. MR2012827.
17. Huang J, Horowitz JL, Wei F. Variable selection in nonparametric additive models. Ann Statist. 2010;38:2282–2313. doi:10.1214/09-AOS781. MR2676890.
18. Jiang B, Liu JS. Variable selection for general index models via sliced inverse regression. Ann Statist. 2014;42:1751–1786. MR3262467.
19. Knutson KL. Does inadequate sleep play a role in vulnerability to obesity? Am J Hum Biol. 2012;24:361–371. doi:10.1002/ajhb.22219.
20. Lam C, Fan J. Profile-kernel likelihood inference with diverging number of parameters. Ann Statist. 2008;36:2232–2260. doi:10.1214/07-AOS544. MR2458186.
21. Lee YK, Mammen E, Park BU. Flexible generalized varying coefficient regression models. Ann Statist. 2012;40:1906–1933. MR3015048.
22. Lian H. Variable selection for high-dimensional generalized varying-coefficient models. Statist Sinica. 2012;22:1563–1588. MR3027099.
23. Liu R, Yang L. Spline-backfitted kernel smoothing of additive coefficient model. Econometric Theory. 2010;26:29–59. MR2587102.
24. Liu R, Yang L, Härdle WK. Oracally efficient two-step estimation of generalized additive model. J Amer Statist Assoc. 2013;108:619–631. MR3174646.
25. Ma S, Yang L. A jump-detecting procedure based on spline estimation. J Nonparametr Stat. 2011a;23:67–81. MR2780816.
26. Ma S, Yang L. Spline-backfitted kernel smoothing of partially linear additive model. J Statist Plann Inference. 2011b;141:204–219. MR2719488.
27. Ma S, Yang L, Carroll RJ. A simultaneous confidence band for sparse longitudinal regression. Statist Sinica. 2012;22:95–122. doi:10.5705/ss.2010.034. MR2933169.
28. Ma S, Carroll RJ, Liang H, Xu S. Supplement to "Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates." 2015. doi:10.1214/15-AOS1344SUPP.
29. Meier L, Bühlmann P. Smoothing ℓ1-penalized estimators for high-dimensional time-course data. Electron J Stat. 2007;1:597–615. MR2369027.
30. Meier L, van de Geer S, Bühlmann P. High-dimensional additive modeling. Ann Statist. 2009;37:3779–3821. MR2572443.
31. Murcray CE, Lewinger JP, Gauderman WJ. Gene-environment interaction in genome-wide association studies. Am J Epidemiol. 2009;169:219–226. doi:10.1093/aje/kwn353.
32. Nyholt DR. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet. 2004;74:765–769. doi:10.1086/383251.
33. Randall JC, Winkler TM, Kutalik Z, Berndt SI, Jackson AU, et al. Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLOS Genetics. 2013;9:e1003500. doi:10.1371/journal.pgen.1003500.
34. Ravikumar P, Lafferty J, Liu H, Wasserman L. Sparse additive models. J R Stat Soc Ser B Stat Methodol. 2009;71:1009–1030. MR2750255.
35. Wang H, Li R, Tsai CL. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568. doi:10.1093/biomet/asm053. MR2410008.
36. Wang L, Xue L, Qu A, Liang H. Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann Statist. 2014;42:592–624. MR3210980.
37. Wareham NJ, van Sluijs EMF, Ekelund U. Physical activity and obesity prevention: A review of the current evidence. Proc Nutr Soc. 2005;64:229–247. doi:10.1079/pns2005423.
38. Xue L, Liang H. Polynomial spline estimation for a generalized additive coefficient model. Scand J Stat. 2010;37:26–46. doi:10.1111/j.1467-9469.2009.00655.x. MR2675938.
39. Xue L, Yang L. Additive coefficient modeling via polynomial spline. Statist Sinica. 2006;16:1423–1446. MR2327498.
40. Zhou S, Shen X, Wolfe DA. Local asymptotics for regression splines and confidence regions. Ann Statist. 1998;26:1760–1782. MR1673277.
41. Zou H. The adaptive lasso and its oracle properties. J Amer Statist Assoc. 2006;101:1418–1429. MR2279469.

Supplementary Materials

Supplement to "Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates" [Ma et al. (2015); doi:10.1214/15-AOS1344SUPP].