Abstract
Semiparametric generalized varying coefficient partially linear models with longitudinal data arise in contemporary biology, medicine, and life science. In this paper, we consider a variable selection procedure based on the combination of the basis function approximations and quadratic inference functions with SCAD penalty. The proposed procedure simultaneously selects significant variables in the parametric components and the nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency, sparsity, and asymptotic normality of the resulting estimators. The finite sample performance of the proposed methods is evaluated through extensive simulation studies and a real data analysis.
1. Introduction
Identifying the significant variables is an important task in any regression analysis. In practice, a number of variables are available for an initial analysis, but many of them may not be significant and should be excluded from the final model in order to increase the accuracy of prediction. Various procedures and criteria, such as stepwise selection and subset selection with the Akaike information criterion (AIC), Mallows Cp, and the Bayesian information criterion (BIC), have been developed. Nevertheless, these selection methods are computationally expensive. Many shrinkage methods have been developed for the sake of computational efficiency, e.g., the nonnegative garrote [1], the LASSO [2], the bridge regression [3], the SCAD [4], and the one-step sparse estimator [5]. Among these, the SCAD possesses the virtues of continuity, unbiasedness, and sparsity. There are a number of works on SCAD estimation methods in various regression models, e.g., [6–9]. Zhao and Xue [8] proposed a variable selection method that selects significant variables in the parametric components and the nonparametric components simultaneously for varying coefficient partially linear models (VCPLMs).
On the other hand, longitudinal data occurs frequently in biology, medicine, and life science, in which it is often necessary to make repeated measurements of subjects over time. The responses from different subjects are independent, but the responses from the same subject are very likely to be correlated. This feature is called “within-cluster correlation”. Qu et al. [10] proposed a method of quadratic inference functions (QIFs) to treat the longitudinal data. The QIF can efficiently take the within-cluster correlation into account and is more efficient than the generalized estimating equation (GEE) [11] approach when the working correlation is misspecified. The QIF approach has been applied to many models, including varying coefficient models (VCM) [12, 13], partially linear models (PLM) [14], varying coefficient partially linear models (VCPLMs) [15], and generalized partially linear models (GPLM) [16]. Wang et al. [13] proposed a group SCAD procedure for variable selection of VCM with longitudinal data. More recently, Tian et al. [15] proposed a QIF-based SCAD penalty for the variable selection for VCPLM with longitudinal data.
As introduced in Li and Liang [17], the generalized partially linear varying coefficient model (GPLVCM) combines the flexibility of a nonparametric regression model with the explanatory power of a generalized linear regression model, and it arises naturally in the presence of categorical covariates. Many models are special cases of the GPLVCM, e.g., the VCM, VCPLM, PLM, and GLM. Li and Liang [17] studied variable selection for the GPLVCM, where the parametric components are identified via the SCAD but the nonparametric components are selected via a generalized likelihood ratio test instead of shrinkage. In this paper, we extend the QIF-based group SCAD variable selection procedure to the GPLVCM with longitudinal data, and B-spline methods are adopted to approximate the nonparametric components of the model. With suitably chosen tuning parameters, the proposed variable selection procedure is consistent, and the estimators of the regression coefficients have the oracle property, i.e., the estimators of the nonparametric components achieve the optimal convergence rate, and the estimators of the parametric components have the same asymptotic distribution as those based on the correct submodel.
The rest of this paper is organized as follows. In Section 2, we propose a variable selection procedure for the GPLVCM with longitudinal data. Asymptotic properties of the resulting estimators and an iteration algorithm are presented in Section 3. In Section 4, we carry out simulation studies to assess the finite sample performance of the method. A real data analysis is given in Section 5 to illustrate the proposed methodology. The details of proofs are provided in the appendix.
2. Methodology
2.1. GPLVCM with Longitudinal Data
In this article, we consider a longitudinal study with n subjects and mi observations over time for the ith subject (i = 1, ⋯, n), for a total of N = ∑i=1n mi observations. Each observation consists of a response variable Yij and the predictor variables (Xij, Zij, Uij), where Xij ∈ Rp, Zij ∈ Rq, and Uij is a scalar. We assume that observations from different subjects are independent, but those within the same subject are dependent. The generalized partially linear varying coefficient model (GPLVCM) with longitudinal data takes the form
μij = E(Yij∣Xij, Zij, Uij) = h(XijTβ + ZijTα(Uij)),  (1)
where μij is the expectation of Yij when Xij, Zij, and Uij are given, β = (β1,⋯,βp)T is an unknown p × 1 regression coefficient vector, h(·) is a known smooth link function, and α(u) = (α1(u), α2(u),⋯,αq(u))T is a q × 1 unknown monotonic smooth function vector. Without loss of generality, we assume U ~ U[0, 1].
We approximate α(·) by B-spline basis functions B(u) = (B1(u),⋯,BL(u))T with the order of M, where L = K + M + 1 and K is the number of interior knots, i.e.,
αk(u) ≈ B(u)Tγk = ∑l=1L Bl(u)γkl,  k = 1, ⋯, q,  (2)
where γk = (γk1,⋯,γkL)T is a L × 1 vector of unknown regression coefficients. Accordingly, μij is approximated by
μij ≈ h(XijTβ + ZijT(Iq ⊗ B(Uij)T)γ),  (3)
where γ = (γ1T, ⋯,γqT)T and “⊗” is the Kronecker product. We use the B-spline basis functions because they are numerically stable and have bounded support [18]. The spline approach also treats a nonparametric function as a linear function with the basis functions as pseudodesign variables, and thus, any computational algorithm for the generalized linear models can be used for the GPLVCMs.
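To make the construction concrete, the following R sketch (not the authors' supplementary code) builds a B-spline basis with splines::bs and the pseudodesign matrix for the varying-coefficient part; the knot number, spline order, and dimensions are illustrative choices.

```r
# A sketch (not the authors' code): B-spline basis and pseudodesign for the
# varying-coefficient part; knot number, order, and dimensions are illustrative.
library(splines)

K <- 3; M <- 4                                 # 3 interior knots, cubic B-splines
u <- runif(50)                                 # index variable U on [0, 1]
knots <- quantile(u, probs = (1:K) / (K + 1))  # sample-quantile interior knots
B <- bs(u, knots = knots, degree = M - 1, intercept = TRUE,
        Boundary.knots = c(0, 1))              # 50 x L basis matrix

# Pseudodesign for Z^T alpha(U): row ij is (Z_ij1 B(U_ij)^T, ..., Z_ijq B(U_ij)^T),
# so that Z_ij^T alpha(U_ij) is approximated by the linear form W_ij^T gamma.
q <- 2
Z <- matrix(rnorm(50 * q), 50, q)
W <- do.call(cbind, lapply(1:q, function(k) Z[, k] * B))  # 50 x (q * L)
```

With this pseudodesign, the spline coefficients γ enter the model linearly, which is exactly what allows standard generalized linear model algorithms to be reused.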
To incorporate the within-cluster correlation, we apply the QIFs to estimate β and γ, respectively. Denote θ = (βT, γT)T, we define the extended score gN(θ) as follows:
gN(θ) = (1/n)∑i=1n gi(θ),  gi(θ) = (μ̇iTAi−1/2M1Ai−1/2(Yi − μi); ⋯; μ̇iTAi−1/2MsAi−1/2(Yi − μi)),  (4)
where μ̇i = ∂μi/∂θT, Ai = diag(Var(Yi1), ⋯, Var(Yimi)) is the marginal variance matrix of Yi, and M1, ⋯, Ms are the base matrices used to represent the inverse of the working correlation matrix R in the GEE approach. Following Qu et al. [10], we define the quadratic inference function to be
Qn(θ) = ngN(θ)TΩn(θ)−1gN(θ),  (5)
where Ωn(θ) = (1/n)∑i=1n gi(θ)gi(θ)T. Note that Ωn depends on θ. The QIF estimate is then given by
θ̃ = argminθ Qn(θ).  (6)
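As an illustration of (4)–(6), here is a minimal R sketch of the extended score and the QIF objective for a Bernoulli response with logit link and compound-symmetry basis matrices M1 = I and M2 = 11T − I; the per-subject lists Ylist, Xlist, and Wlist are hypothetical inputs, and the objective can be handed to a generic optimizer.

```r
# A minimal sketch of the extended score (4) and the QIF objective (5) for a logit
# link with CS basis matrices M1 = I and M2 = 1 1^T - I. Ylist, Xlist, Wlist are
# hypothetical per-subject responses and design blocks (parametric and spline parts).
qif_objective <- function(theta, Ylist, Xlist, Wlist) {
  n <- length(Ylist)
  g_list <- lapply(seq_len(n), function(i) {
    Pi  <- cbind(Xlist[[i]], Wlist[[i]])        # m_i x (p + qL) design
    eta <- drop(Pi %*% theta)
    mu  <- plogis(eta)                          # mean under the logit link
    Ai_inv_sqrt <- diag(1 / sqrt(mu * (1 - mu)), length(mu))
    Di  <- t(Pi * (mu * (1 - mu)))              # (d mu_i / d theta)^T
    res <- Ylist[[i]] - mu
    m   <- length(mu)
    M1  <- diag(m)
    M2  <- matrix(1, m, m) - diag(m)
    c(Di %*% Ai_inv_sqrt %*% M1 %*% Ai_inv_sqrt %*% res,
      Di %*% Ai_inv_sqrt %*% M2 %*% Ai_inv_sqrt %*% res)
  })
  gbar  <- Reduce("+", g_list) / n              # g_N(theta)
  Omega <- Reduce("+", lapply(g_list, tcrossprod)) / n
  n * drop(t(gbar) %*% solve(Omega, gbar))      # Q_n(theta), assuming Omega invertible
}
# theta_hat <- optim(theta0, qif_objective, Ylist = Ylist, Xlist = Xlist,
#                    Wlist = Wlist, method = "BFGS")$par   # minimizer as in (6)
```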
2.2. Penalized QIF
In real data analysis, the true regression model is always unknown. An overfitted model lowers the efficiency of estimation while an underfitted one leads to a biased estimator. A popular approach to identify the relevant predictors while estimating the nonzero parameters and functions in model (1) simultaneously is to exert some kind of “penalty” on the original objective function. Here, we choose the smoothly clipped absolute deviation (SCAD) penalty because it has several advantages such as unbiasedness, sparsity, and continuity. The SCAD-penalized quadratic inference function (PQIF) is defined as follows:
Qnp(θ) = Qn(θ) + n∑k=1q pλ1(‖γk‖H) + n∑l=1p pλ2(|βl|),  (7)
where ‖γk‖H = (γkTHγk)1/2, H = (hij)L×L with hij = ∫01 Bi(u)Bj(u)du, and pλ is the SCAD penalty function, whose derivative is defined as
pλ′(ω) = λ{I(ω ≤ λ) + (aλ − ω)+/((a − 1)λ)I(ω > λ)},  (8)
where a > 2, ω > 0, pλ(0) = 0; here, we choose a = 3.7 as in [4].
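For reference, a one-function R sketch of the SCAD derivative in (8) with a = 3.7:

```r
# A minimal sketch of the SCAD first derivative in (8), with a = 3.7 and omega >= 0.
scad_deriv <- function(w, lambda, a = 3.7) {
  lambda * (w <= lambda) + pmax(a * lambda - w, 0) / (a - 1) * (w > lambda)
}
scad_deriv(c(0.05, 0.5, 2), lambda = 0.1)   # coefficients beyond a*lambda get zero penalty slope
```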
Note that
‖γk‖H2 = γkTHγk = ∫01 {B(u)Tγk}2du ≈ ∫01 αk2(u)du.  (9)
This groupwise penalization ensures that the spline coefficients belonging to the same nonparametric component are treated as a single group in model selection.
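A minimal R sketch of the group norm ‖γk‖H: the matrix H collects the L2 inner products of the B-spline basis functions and is approximated here by numerical integration on a grid; the knot placement is illustrative.

```r
# A sketch of the group norm ||gamma_k||_H: H holds the L2 inner products of the
# B-spline basis functions, approximated here by a Riemann sum on a fine grid.
library(splines)
K <- 3; M <- 4
knots <- (1:K) / (K + 1)                       # illustrative equally spaced interior knots
ugrid <- seq(0, 1, length.out = 1001)
Bgrid <- bs(ugrid, knots = knots, degree = M - 1, intercept = TRUE,
            Boundary.knots = c(0, 1))
H <- crossprod(Bgrid) * (ugrid[2] - ugrid[1])  # h_ij ~ integral of B_i(u) B_j(u) du
gamma_k <- rnorm(ncol(Bgrid))                  # one spline coefficient block
norm_H  <- sqrt(drop(t(gamma_k) %*% H %*% gamma_k))   # ||gamma_k||_H as used in (7)
```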
Denote θ̂ = (β̂T, γ̂T)T to be the penalized estimator obtained by minimizing the penalized objective function (7). Then, β̂ is the estimator of the parameter β, and the estimator of the nonparametric function αk(u) is calculated by α̂k(u) = B(u)Tγ̂k, where γ̂ = (γ̂1T, ⋯, γ̂qT)T.
3. Asymptotic Properties
3.1. Oracle Property
We next establish the asymptotic properties of the resulting penalized QIF estimators. We first introduce some notation. Let β0 and α0(·) denote the true values of β and α(·). In addition, γ0 is the spline coefficient vector from the spline approximation to α0(·). Without loss of generality, we assume that β0l ≠ 0 for l = 1, ⋯, p1 and β0l = 0 for l = p1 + 1, ⋯, p, i.e., only the first p1 components of β0 are nonzero. Similarly, we assume that α0k(·) ≠ 0 for k = 1, ⋯, q1 and α0k(·) = 0 for k = q1 + 1, ⋯, q, i.e., only the first q1 components of α0(·) are nonzero. For convenience and simplicity, let C denote a positive constant that may take different values at each appearance throughout this paper, and let ‖A‖ denote the modulus of the largest singular value of a matrix or vector A. Before proving our main theorems, we list some regularity conditions used in this paper.
Assumption 1 (A1). —
The spline regression parameter γ is identifiable, that is, γ0 is the spline coefficient vector from the spline approximation to α0(·). In addition, there is a unique θ0 = (β0, γ0) ∈ S satisfying E{gN(θ0)} = 0, where S is the parameter space.
Assumption 2 (A2). —
The weight matrix Ωn = (1/n)∑i=1n gi(θ)giT(θ) converges almost surely to a constant matrix Ω0, where Ω0 is invertible.
Assumption 3 (A3). —
The covariate matrices Xi and Zi, i = 1, ⋯, n, satisfy supiE‖Xi‖4 < ∞ and supiE‖Zi‖4 < ∞.
Assumption 4 (A4). —
The error εi = Yi − μi satisfies E(εiεiT) = Vi, supi‖Vi‖ < ∞, and there exists a positive constant δ such that supiE‖εi‖2+δ < ∞.
Assumption 5 (A5). —
All marginal variances Ai ≥ 0 and supi‖Ai‖ < ∞.
Assumption 6 (A6). —
{mi} is a bounded sequence of positive integers.
Assumption 7 (A7). —
αi(u), i = 1, 2, ⋯, q, is r times continuously differentiable on (0, 1), where r ≥ 2.
Assumption 8 (A8). —
The interior knots {ci, i = 1, ⋯, K} satisfy
max1≤i≤K+1 |hi+1 − hi| = o(K−1),  max1≤i≤K+1 hi / min1≤i≤K+1 hi ≤ C,  (10)
where hi = ci − ci−1, c0 = 0, and cK+1 = 1.
Assumption 9 (A9). —
The link function h(·) is twice continuously differentiable, and E{h2+δ} < ∞ for some δ > 2.
Assumption 10 (A10). —
an = O(n−1/2) and bn⟶0 as n⟶∞, where
an = max{p′λ2(|β0l|), p′λ1(‖γ0k‖H): β0l ≠ 0, γ0k ≠ 0},  bn = max{p″λ2(|β0l|), p″λ1(‖γ0k‖H): β0l ≠ 0, γ0k ≠ 0}.  (11)
Theorem 1 indicates that the estimators of the nonparametric components achieve the optimal convergence rate.
Theorem 1 . —
Assume that Assumptions (A1)–(A10) hold and that the number of knots satisfies K = O(N1/(2r+1)). Then,
‖β̂ − β0‖ = Op(n−1/2)  and  ‖α̂k(·) − α0k(·)‖ = Op(n−r/(2r+1)),  k = 1, ⋯, q.  (12)
Furthermore, under suitable conditions, Theorem 2 shows that the penalized QIF estimator has the sparsity property.
Theorem 2 . —
Assume that the conditions in Theorem 1 hold and that λmax⟶0 and n1/2λmin⟶∞ as n⟶∞, where λmax = max{λ1, λ2} and λmin = min{λ1, λ2}. Then, with probability approaching 1,
β̂l = 0, l = p1 + 1, ⋯, p,  and  α̂k(·) ≡ 0, k = q1 + 1, ⋯, q.  (13)
Theorems 1 and 2 indicate that, with the tuning parameters suitably chosen, the proposed selection method possesses model selection consistency. Next, we establish the asymptotic property of the estimators of the nonzero parametric components. Let β∗ = (β1, ⋯, βp1)T and α∗(·) = (α1∗(·), ⋯, αq1∗(·))T, and let β0∗ and α0∗(·) denote their true values, respectively. In addition, let γ∗ = (γ1T, ⋯, γq1T)T and γ0∗ = (γ01T, ⋯, γ0q1T)T denote the spline coefficient vectors of α∗(·) and α0∗(·), respectively, and let Xi∗ and Zi∗, i = 1, ⋯, n, denote the corresponding covariates. Let Γ and Δ be defined by
(14) where Δ⊗2 = ΔΔT and τ = (τij)n×n is an n × n block matrix with its (i, j) block taking the form
(15) Theorem 3 states that β̂∗ is asymptotically normally distributed.
Theorem 3 . —
Suppose that Assumptions (A1)–(A9) hold and that the number of knots satisfies K = O(N1/(2r+1)). Then,
n1/2(β̂∗ − β0∗) ⟶d N(0, Σ),  (16)
where Σ = (ΓΔ−1Γ)−1 and ⟶d represents convergence in distribution.
3.2. Selection of Tuning Parameters
Theorems 1–3 imply that the proposed variable selection procedure possesses the oracle property. However, this attractive feature relies on the choice of the tuning parameters λ1 and λ2. Popular criteria for choosing them include cross-validation, generalized cross-validation, AIC, and BIC. Wang et al. [19] suggested using BIC for the SCAD estimator in linear models and partially linear models and proved its model selection consistency, i.e., the optimal parameter chosen by BIC can identify the true model with probability tending to one. Tian et al. [15] proved an analogous result for varying coefficient partially linear models with longitudinal data. Hence, we adopt BIC to choose the optimal {λ1, λ2}. Following [19–21], we simplify the tuning parameters as
λ1k = λ0/‖γ̃k‖H, k = 1, ⋯, q,  λ2l = λ0/|β̃l|, l = 1, ⋯, p,  (17)
where β̃l and γ̃k are the unpenalized QIF estimates. Consequently, the original two-dimensional problem becomes a univariate problem in λ0, which can be selected according to the following BIC-type criterion:
BICλ = Qn(θ̂λ) + log(n) · dfλ,  (18)
where θ̂λ = (β̂λT, γ̂λT)T is the regression coefficient estimator obtained by minimizing the penalized QIF (7) for a given λ, and dfλ is the number of nonzero components of β̂λ and nonzero groups of γ̂λ. Thus, the tuning parameter λ is obtained by
λ̂ = argminλ BICλ.  (19)
From Theorem 4 of Tian et al. [15], the BIC tuning parameter selector enables us to select the true model consistently.
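A schematic R sketch of this BIC-type tuning step is given below; fit_pqif is a hypothetical wrapper for the penalized QIF fit of Section 2.2, and the scaling of the per-coefficient tuning parameters follows the simplification in (17) as described above.

```r
# A schematic sketch of the BIC-type tuning selection: lambda0 is scanned on a grid,
# per-coefficient tuning parameters are scaled by the unpenalized QIF estimates as in
# (17), and the value minimizing the criterion in (18) is retained. fit_pqif() is a
# hypothetical wrapper returning the penalized fit, its Q_n value, and the group norms.
select_lambda0 <- function(lambda0_grid, beta_u, gamma_u_norms, data, n) {
  bic <- sapply(lambda0_grid, function(l0) {
    lambda2 <- l0 / abs(beta_u)                 # parametric components
    lambda1 <- l0 / gamma_u_norms               # nonparametric (group) components
    fit <- fit_pqif(data, lambda1 = lambda1, lambda2 = lambda2)   # hypothetical
    df  <- sum(fit$beta != 0) + sum(fit$group_norms != 0)
    fit$Qn + log(n) * df                        # BIC-type criterion
  })
  lambda0_grid[which.min(bic)]
}
```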
3.3. An Algorithm Using Local Quadratic Approximation
Based on Fan and Li's local quadratic approximation approach [4], we propose an iterative algorithm to minimize the PQIF (7). Similar to Tian et al. [15], we choose the unpenalized QIF estimator as the initial estimator. Let θk = (β1k, ⋯, βpk, γ1kT, ⋯, γqkT)T be the value of θ at the kth iteration. If βlk (or γlk) is close to 0, i.e., |βlk| ⩽ ϵ (or ‖γlk‖H ⩽ ϵ) for some small threshold value ϵ, then we set βlk = 0 (or γlk = 0). We use ϵ = 10−6 in our simulations.
Suppose that βlk+1 = 0 for l = pk + 1, ⋯, p, and γlk+1 = 0 for l = qk + 1, ⋯, q, and write βk+1 = (β1k+1, ⋯, βpkk+1, βpk+1k+1, ⋯, βpk+1)T = ((βNk+1)T, (βZk+1)T)T, where βNk+1 = (β1k+1, ⋯, βpkk+1)T collects the nonzero parametric components and βZk+1 = (βpk+1k+1, ⋯, βpk+1)T = 0. Similarly, let γk+1 = ((γ1k+1)T, ⋯, (γqkk+1)T, (γqk+1k+1)T, ⋯, (γqk+1)T)T = ((γNk+1)T, (γZk+1)T)T, where γNk+1 = ((γ1k+1)T, ⋯, (γqkk+1)T)T and γZk+1 = ((γqk+1k+1)T, ⋯, (γqk+1)T)T correspond to the qk nonzero functions and the q − qk zero functions, respectively. Let θ = (βNT, βZT, γNT, γZT)T denote a vector with the same length and the same partition as θk+1.
For the parametric term, if |βlk| > ϵ, the penalty function at βl ≈ βlk is approximated by
pλ2(|βl|) ≈ pλ2(|βlk|) + (1/2){p′λ2(|βlk|)/|βlk|}(βl2 − (βlk)2),  (20)
Similarly, for the nonparametric components, if ‖γlk‖H > ϵ, the penalty function at γl ≈ γlk is approximated by
pλ1(‖γl‖H) ≈ pλ1(‖γlk‖H) + (1/2){p′λ1(‖γlk‖H)/‖γlk‖H}(‖γl‖H2 − ‖γlk‖H2),  (21)
where p′λ is the first-order derivative of the penalty function pλ. This leads to the local approximation of the PQIF 𝒬np(θ) by a quadratic function:
| (22) |
where ω11 = (βNT, γNT)T collects the components not currently set to zero, and
| (23) |
Minimizing the quadratic function (22), we obtain ω11k+1. The Newton-Raphson method then iterates the following process to convergence:
| (24) |
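The following R sketch illustrates one LQA/Newton-Raphson update of this type for the parametric part only; grad_Qn and hess_Qn are hypothetical functions returning the gradient and Hessian of Qn, and the n-scaling of the penalty matches our reading of (7).

```r
# A sketch of one LQA / Newton-Raphson update for the parametric part only.
# grad_Qn() and hess_Qn() are hypothetical functions returning the gradient and
# Hessian of Q_n; the n * Sigma_lambda scaling follows our reading of (7).
lqa_step <- function(beta_k, lambda2, n, grad_Qn, hess_Qn, eps = 1e-6, a = 3.7) {
  beta_k[abs(beta_k) <= eps] <- 0                      # threshold tiny coefficients to zero
  active <- abs(beta_k) > 0
  scad_d <- function(w) lambda2 * (w <= lambda2) +
    pmax(a * lambda2 - w, 0) / (a - 1) * (w > lambda2)
  sig <- numeric(length(beta_k))
  sig[active] <- scad_d(abs(beta_k[active])) / abs(beta_k[active])
  H <- hess_Qn(beta_k)[active, active, drop = FALSE] +
    n * diag(sig[active], sum(active))                 # curvature of the LQA penalty
  g <- grad_Qn(beta_k)[active] + n * sig[active] * beta_k[active]
  beta_new <- beta_k
  beta_new[active] <- beta_k[active] - solve(H, g)     # one Newton-Raphson step
  beta_new
}
```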
4. Simulation Studies
4.1. Assessing Rule
In this section, we conduct simulation studies to assess the finite sample performance of the proposed procedures. Following [17], the performance of the estimator β̂ is assessed by the generalized mean square error (GMSE), defined as
GMSE = (β̂ − β0)TE(XXT)(β̂ − β0).  (25)
The performance of the estimator α̂(·) is assessed by the square root of the average squared errors (RASE),
RASE = {M−1∑v=1M ‖α̂(uv) − α(uv)‖2}1/2,  (26)
where uv, v = 1, ⋯, M are the grid points where the function is evaluated. In our simulation, M = 300 is used.
To assess the variable selection performance, we use "C" to denote the average number of zero regression coefficients correctly estimated as zero and "IC" to denote the average number of nonzero regression coefficients erroneously set to zero. The closer "C" is to the number of true zero coefficients in the model and the closer "IC" is to zero, the better the variable selection procedure performs.
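A small R sketch of these assessment measures (with E(XXT) in the GMSE replaced by its sample analogue) may clarify how they are computed:

```r
# A sketch of the assessment measures: GMSE for beta_hat (with E(XX^T) replaced by
# its sample analogue), RASE for alpha_hat on a grid of evaluation points, and the
# C / IC counts of correctly and incorrectly zeroed coefficients.
gmse <- function(beta_hat, beta0, X) {
  d <- beta_hat - beta0
  drop(t(d) %*% (crossprod(X) / nrow(X)) %*% d)
}
rase <- function(alpha_hat_grid, alpha0_grid) {        # M x q matrices of function values
  sqrt(mean(rowSums((alpha_hat_grid - alpha0_grid)^2)))
}
c_ic <- function(est_zero, true_zero) {                # logical vectors of estimated / true zeros
  c(C = sum(est_zero & true_zero), IC = sum(est_zero & !true_zero))
}
```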
In our simulations, we use the sample quantiles of Uij as knots and take the number of interior knots to be 3, that is, O(N1/5). This choice is consistent with the asymptotic theory in Section 3 and performs well in the simulations. For each simulated dataset, the penalized QIF estimators with the SCAD and LASSO penalty functions are computed. The tuning parameters λ1, λ2 for the penalty functions are chosen by BIC from 50 equispaced grid points in [−15, 5]. For each method, the average number of zero coefficients over the 500 simulated datasets is reported.
4.2. Study 1 (Partial Penalty)
Consider a Bernoulli response
| (27) |
where β = (2, 1.5, 0.7, 017T)T, m = 6, Xij ~ N(0, I20), α(Uij) = 0.4cos((π/2)Uij), and the Uij are drawn independently from U[0, 1]. The response Yij with a compound symmetry (CS) correlation structure is generated according to Oman [22]. In our simulation study, we consider ρ = 0.25 and 0.75, representing weak and strong correlations, respectively. In some situations, we prefer not to shrink certain components in the variable selection procedure because prior information is available; a partial penalty arises naturally in such cases. In this example, we only penalize the parametric component, i.e., the coefficient β. In this situation, the PQIF (7) becomes
Qnp(θ) = Qn(θ) + n∑l=1p pλ2(|βl|).  (28)
The variable selection results are reported in Tables 1 and 2.
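For readers who wish to reproduce a setting of this kind, the following R sketch generates data in the spirit of study 1; note that it replaces Oman's [22] construction of correlated Bernoulli responses with a simpler Gaussian-copula threshold, which only approximates the target within-cluster correlation, and all settings are illustrative.

```r
# A sketch of a Study-1-style data-generating step. The paper generates correlated
# Bernoulli responses by Oman's [22] construction; as a simpler stand-in, this sketch
# thresholds an exchangeable Gaussian copula, which only approximates the target
# within-cluster correlation. All settings are illustrative.
set.seed(1)
n <- 150; m <- 6; p <- 20; rho <- 0.25
beta0  <- c(2, 1.5, 0.7, rep(0, 17))
alpha0 <- function(u) 0.4 * cos(pi / 2 * u)

gen_subject <- function() {
  X  <- matrix(rnorm(m * p), m, p)                 # X_ij ~ N(0, I_20)
  U  <- runif(m)
  mu <- drop(plogis(X %*% beta0 + alpha0(U)))      # marginal success probabilities
  Rc <- chol((1 - rho) * diag(m) + rho)            # exchangeable (CS) correlation
  z  <- drop(crossprod(Rc, rnorm(m)))              # correlated latent normals
  Y  <- as.integer(pnorm(z) < mu)                  # Bernoulli(mu_ij) marginals
  list(Y = Y, X = X, U = U)
}
dat <- replicate(n, gen_subject(), simplify = FALSE)
```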
Table 1.
Variable selection for the parametric components under different methods.
| ρ | Method | n = 150: GMSE | C | IC | n = 200: GMSE | C | IC | n = 300: GMSE | C | IC |
|---|--------|---------------|---|----|---------------|---|----|---------------|---|----|
| ρ = 0.75 | SCAD | 0.0011 | 15.83 | 0 | 0.0006 | 16.246 | 0 | 0.0005 | 16.746 | 0 |
| ρ = 0.75 | LASSO | 0.0006 | 14.81 | 0 | 0.0005 | 15.346 | 0 | 0.0004 | 15.574 | 0 |
| ρ = 0.25 | SCAD | 0.0011 | 15.75 | 0 | 0.0006 | 16.70 | 0 | 0.0004 | 16.846 | 0 |
| ρ = 0.25 | LASSO | 0.0007 | 14.82 | 0 | 0.0006 | 14.96 | 0 | 0.0005 | 15.35 | 0 |
Table 2.
RASE of α̂(u) under different methods.
| ρ | Method | n = 150 | n = 200 | n = 300 |
|---|--------|---------|---------|---------|
| ρ = 0.75 | SCAD | 0.1920 | 0.2051 | 0.1054 |
| ρ = 0.75 | LASSO | 0.0999 | 0.0840 | 0.1064 |
| ρ = 0.25 | SCAD | 0.2449 | 0.2460 | 0.0694 |
| ρ = 0.25 | LASSO | 0.1399 | 0.1205 | 0.1033 |
Tables 1 and 2 show that the performance of the proposed variable selection approach improves as n increases: the number of correctly identified zero coefficients approaches the number of true zero coefficients in the model, and the GMSE of β̂ decreases as n increases. In addition, the RASE of α̂(u) also decreases as n increases, indicating that the estimated curve fits the true curve α(u) better as the sample size grows. Moreover, the SCAD penalty outperforms the LASSO penalty in terms of the correct variable selection rate, which reduces model uncertainty and complexity.
4.3. Study 2 (Fixed-Dimensional Setup)
In this example, we generate data from the following model:
| (29) |
where β = (2, 1.5, 0.7, 07T)T and α(u) = (α1(u), α2(u), 05T)T with α1(u) = 0.8cos((π/2)u) and α2(u) = 1.5 + u2. The covariates Xij and Zij (j = 1, ⋯, 6) come from a multivariate normal distribution with mean zero, marginal variance 1, and correlation coefficient 0.5, and Uij ~ U(0, 1). The response Yij with a compound symmetry (CS) correlation structure is generated by the same method as in study 1, and we again consider ρ = 0.25 and 0.75, representing weak and strong correlations, respectively. We generated 500 datasets for each pair of (N, ρ). The results are reported in Tables 3 and 4.
Table 3.
Variable selection for the parametric components under different methods.
| ρ | Method | n = 150: GMSE | C | IC | n = 200: GMSE | C | IC | n = 300: GMSE | C | IC |
|---|--------|---------------|---|----|---------------|---|----|---------------|---|----|
| ρ = 0.75 | SCAD | 0.0048 | 6.76 | 0 | 0.0036 | 6.846 | 0 | 0.0030 | 6.864 | 0 |
| ρ = 0.75 | LASSO | 0.0039 | 4.694 | 0 | 0.0033 | 4.766 | 0 | 0.0028 | 5.074 | 0 |
| ρ = 0.25 | SCAD | 0.0047 | 6.76 | 0 | 0.0035 | 6.718 | 0 | 0.0028 | 6.846 | 0 |
| ρ = 0.25 | LASSO | 0.0038 | 4.814 | 0 | 0.0035 | 4.98 | 0 | 0.0029 | 5.048 | 0 |
Table 4.
Variable selection for the nonparametric components under different methods.
| ρ | Method | n = 150: RASE | C | IC | n = 200: RASE | C | IC | n = 300: RASE | C | IC |
|---|--------|---------------|---|----|---------------|---|----|---------------|---|----|
| ρ = 0.75 | SCAD | 0.1696 | 4.35 | 0 | 0.1221 | 4.66 | 0 | 0.0812 | 4.83 | 0 |
| ρ = 0.75 | LASSO | 0.1932 | 4.38 | 0 | 0.1540 | 4.36 | 0 | 0.1235 | 4.57 | 0 |
| ρ = 0.25 | SCAD | 0.1636 | 4.42 | 0 | 0.1076 | 4.72 | 0 | 0.0344 | 4.85 | 0 |
| ρ = 0.25 | LASSO | 0.1982 | 4.40 | 0 | 0.1160 | 4.68 | 0 | 0.0398 | 4.76 | 0 |
Table 3 reports the variable selection results for the parametric components: the performance improves as n increases, e.g., the number of correctly identified zero coefficients (the column labeled "C") approaches the true number of zero regression coefficients in the model, while the GMSE decreases steadily as n increases. Table 4 shows that, for the nonparametric components, the performance of the proposed variable selection method is similar to that for the parametric components. As n increases, the RASE of the estimated nonparametric functions also becomes smaller, reflecting that the estimated curves fit the corresponding true curves better as the sample size increases. Moreover, the SCAD penalty outperforms the LASSO penalty in terms of the correct variable selection rate, which reduces model uncertainty and complexity.
To study the influence of a misspecified correlation structure on the proposed approach, we perform variable selection when the working correlation structure is specified to be CS and first-order autoregressive (AR-1), respectively. The results are listed in Table 5. It is known that the QIF estimator is insensitive to misspecification of the correlation structure. Table 5 shows that the proposed variable selection procedure gives similar results even when the correlation structure is misspecified, indicating that our method is robust.
Table 5.
Variable selection when the true correlation structure R is CS and n = 300.
| ρ | Working R | Method | GMSE (β) | C (β) | IC (β) | RASE (α(·)) | C (α(·)) | IC (α(·)) |
|---|-----------|--------|----------|-------|--------|-------------|----------|-----------|
| ρ = 0.75 | CS | SCAD | 0.0030 | 6.864 | 0 | 0.0812 | 4.83 | 0 |
| ρ = 0.75 | CS | LASSO | 0.0028 | 5.074 | 0 | 0.1235 | 4.57 | 0 |
| ρ = 0.75 | AR-1 | SCAD | 0.0033 | 6.856 | 0 | 0.0935 | 4.82 | 0 |
| ρ = 0.75 | AR-1 | LASSO | 0.0034 | 4.924 | 0 | 0.1230 | 4.57 | 0 |
| ρ = 0.25 | CS | SCAD | 0.0028 | 6.846 | 0 | 0.0344 | 4.85 | 0 |
| ρ = 0.25 | CS | LASSO | 0.0029 | 5.048 | 0 | 0.0398 | 4.76 | 0 |
| ρ = 0.25 | AR-1 | SCAD | 0.0030 | 6.846 | 0 | 0.0354 | 4.86 | 0 |
| ρ = 0.25 | AR-1 | LASSO | 0.0031 | 5.048 | 0 | 0.0411 | 4.75 | 0 |
4.4. Study 3 (High-Dimensional Setup)
In this example, we examine how the proposed variable selection procedure behaves in a "large n, diverging p/q" setup for longitudinal models by extending study 2 to higher dimensions. In this simulation, we take n = 300, m = 6, p = 20 = O(N1/4), and q = 10 = O(N1/4). The true coefficient vector is β = (2, 1.5, 0.7, 017T)T and α(u) = (α1(u), α2(u), 010T)T, where α1(u) and α2(u) are defined in study 2. The other settings are the same as in study 2. The results are reported in Table 6. The proposed variable selection procedure correctly identifies the true model and works well in this "large n, diverging p/q" setup.
Table 6.
Variable selection under high-dimensional setup.
| ρ | Method | GMSE (β) | C (β) | IC (β) | RASE (α(·)) | C (α(·)) | IC (α(·)) |
|---|--------|----------|-------|--------|-------------|----------|-----------|
| ρ = 0.75 | SCAD | 0.0036 | 16.664 | 0 | 0.1148 | 9.656 | 0 |
| ρ = 0.75 | LASSO | 0.0033 | 15.574 | 0 | 0.1239 | 9.546 | 0 |
| ρ = 0.25 | SCAD | 0.0034 | 16.846 | 0 | 0.1047 | 9.875 | 0 |
| ρ = 0.25 | LASSO | 0.0039 | 15.35 | 0 | 0.1138 | 9.802 | 0 |
5. Application to Infectious Disease Data
We apply the proposed method to analyze an infectious disease dataset (indon.dat), which has been analyzed by many authors, such as [16, 23–27]. In this study, a total of 275 preschool children were examined every three months for 18 months. The response is the presence of respiratory infection (1 = yes, 0 = no). The primary interest is in the relationship between the risk of respiratory infection and vitamin A deficiency (1 = yes, 0 = no).
In our study, we consider the following GPLVCM model
| (30) |
where t is age, X1 is vitamin A deficiency, X2 and X3 are the seasonal cosine and seasonal sine variables, respectively, which indicate the season when the examinations took place, X4 is gender (1 = female, 0 = male), X5 is height, X6 is stunting status (1 = yes, 0 = no), and Z1 = X52 is the square of height. The within-cluster correlation structure is assumed to be exchangeable, i.e., compound symmetric. This structure is also used in [16, 26, 27].
We apply the proposed QIF-based group SCAD variable selection procedure to the above model and identify five nonzero coefficients and one nonzero function α0(t), where β1 = 0.842, β2 = −0.685, β3 = −0.309, β4 = −0.554, and β6 = 0.966. The results are generally consistent with previous studies, but our results show that height has no significant impact on the infection rate and can be removed from the model. Figure 1 shows the baseline age function α0(t) estimated by the proposed QIF-based group SCAD, together with the estimates obtained by the unpenalized QIF and by the QIF-based SCAD with a partial penalty on β in [16], where the GPLM without the varying coefficient term is used. Figure 1 implies that the probability of having a respiratory infection increases at a very early age, then decreases steadily, and declines dramatically after 5.5 years of age. This also coincides with previous results [16, 26, 27].
Figure 1. The estimated function of age for the infectious disease data.
6. Conclusion and Discussion
We proposed a QIF-based group SCAD variable selection procedure for generalized partially linear varying coefficient models with longitudinal data. This procedure selects significant variables in the parametric components and the nonparametric components simultaneously. Under mild conditions, the estimators of the regression coefficients have the oracle property. Simulation studies indicate that the proposed procedure is effective in selecting significant variables and estimating the regression coefficients.
In this paper, we assume that the dimensions of the covariates X and Z are fixed. Study 3 in the simulations shows that the proposed approach still performs well when the dimensions p and q grow with n. In the ultrahigh-dimensional case, however, the proposed variable selection procedure may no longer work well. As a future research topic, it would be interesting to consider variable selection for generalized partially linear varying coefficient models with ultrahigh-dimensional covariates.
Acknowledgments
The research is funded by the National Natural Science Foundation of China (11571025) and the Beijing Natural Science Foundation (1182008). This support is greatly appreciated.
Appendix
A. Proofs of the Main Results
For convenience and simplicity, let C denote a positive constant that may have different values at each appearance throughout this paper and ‖A‖ denote the modulus of the largest singular value of matrix or vector A.
Let ηij = XijTβ + ZijT · Iq ⊗ B(Uij)Tγ, then μij = h(ηij). Let ηi = (ηi1, ⋯,ηim)T, μi = (μi1, ⋯,μim)T, and θ = (βT, γT)T, Yi = (Yi1, ⋯,Yim)T, Xi = (Xi1, ⋯,Xim)T.
Similarly, let Wij = B(Uij) ⊗ Iq · Zij, Pij = (XijT, WijT)T, and Wi = (Wi1, ⋯, Wim)T, Pi = (Pi1, ⋯, Pim)T = (Xi, W(Ui)); then, ηij = PijTθ, ηi = Piθ, and ∂ηij/∂θ = Pij, ∂ηi/∂θ = PiT.
Let h′(t) = dh(t)/dt, then ∂μij/∂θ = h′(ηij)Pij. Let
| (A.1) |
Then,
| (A.2) |
Proof of Theorem 1. —
Let δ = n−1/2, β = β0 + δD1, γ = γ0 + δD2, and D = (D1T, D2T)T. We first show that for any given ε > 0, there exists a large constant C such that
(A.3) Note that β0l = 0 for all l = p1 + 1, ⋯, p, and γ0k = 0 for all k = q1 + 1, ⋯, q; together with Assumption (A1) and pλ(0) = 0, we have
(A.4) By Taylor expansion and Assumption (A4), we have
(A.5) Invoking the proof of Theorem 2 in Zhang and Xue [16],
(A.6) By choosing a sufficiently large C, I1 dominates I2. Similarly, I1 dominates I3 for a sufficiently large C. Thus (A.3) holds, i.e., with probability at least 1 − ε, there exists a local minimizer θ̂ such that ‖θ̂ − θ0‖ = Op(n−1/2). Therefore, ‖β̂ − β0‖ = Op(n−1/2) and ‖γ̂ − γ0‖ = Op(n−1/2). Let Rk(u) = αk(u) − B(u)Tγ0k, where γ0k denotes the spline coefficient vector from the spline approximation to αk(·). From Assumptions (A7) and (A8) and Theorem 12.7 in [18], we get that ‖Rk(u)‖ = O(K−r). Therefore,
(A.7) Thus, we complete the proof of Theorem 1.
Proof of Theorem 2. —
By Theorem 1, in order to prove the first part of Theorem 2, we only need to show that, for any γ satisfying ‖γ − γ0‖ = Op(n−1/2) and for any βl satisfying |βl − β0l| = Op(n−1/2), l = 1, ⋯, p1, there exists a certain ϵ = Cn−1/2 such that, as n⟶∞, with probability tending to 1:
(A.8)
(A.9) These imply that the PQIF 𝒬np(β, γ) reaches its minimum at βl = 0, l = p1 + 1, ⋯, p.
Following Lemmas 3 and 4 of [16], we have
(A.10) According to (8), the expression of the derivative of the SCAD penalty, it is easy to see that limn→∞ liminfβl→0+ λ2−1p′λ2(|βl|) = 1. Together with Assumption (A10) and λ2n1/2 ≥ λminn1/2⟶∞, it is clear that the sign of (A.10) is determined by that of βl. This implies that (A.8) and (A.9) hold. Thus, we complete the proof of the first part.
Similarly, we can prove that, with probability tending to 1, γ̂k = 0 for k = q1 + 1, ⋯, q. Note that ‖B(u)‖ = O(1) and α̂k(u) = B(u)Tγ̂k; the second part of Theorem 2 follows. Thus, we complete the proof of Theorem 2.
Proof of Theorem 3. —
Let θ∗ = (β∗T, γ∗T)T and let Pi∗ = (Xi∗T, Wi∗T)T, i = 1, ⋯, n denote the covariates corresponding to θ∗. Denote and to be the first derivatives of the PQIF 𝒬np with respect to β and γ, respectively, i.e.,
(A.11) By Theorems 1 and 2, and satisfies that
(A.12) By the Taylor expansion, we have
(A.13) where the intermediate value lies between ((β0∗T, 0T)T, (γ0∗T, 0T)T) and the penalized estimator (β̂T, γ̂T)T. Applying the Taylor expansion once more, we obtain
(A.14) By Assumption (A10), p″λ2(|β0l|) = op(1). Note that p′λ2(|β0l|) = 0 as λmax⟶0; therefore, by Lemma 4 of [16] and through some calculation, we have
(A.15) where [Ω−1]lk denotes the (l, k) block of Ω−1, and
(A.16) Similarly, we have
(A.17) where . Hence,
(A.18) Following the proof of Theorem 2 in [16], we prove (16). Thus, we complete the proof of Theorem 3.
Data Availability
The data can be downloaded from https://content.sph.harvard.edu/xlin/dat/indon.dat.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Supplementary Materials
The R code presented in Word format for the real data analysis is included in the supplementary file.
References
- 1. Breiman L. Better subset regression using the nonnegative garrote. Technometrics. 1995;37(4):373–384. doi: 10.1080/00401706.1995.10484371.
- 2. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
- 3. Fu W. J. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics. 1998;7(3):397–416. doi: 10.1080/10618600.1998.10474784.
- 4. Fan J., Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96(456):1348–1360. doi: 10.1198/016214501753382273.
- 5. Zou H., Li R. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics. 2008;36:1509–1533. doi: 10.1214/009053607000000802.
- 6. Fan J., Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association. 2004;99(467):710–723. doi: 10.1198/016214504000001060.
- 7. Fan J., Zhang W. Statistical methods with varying coefficient models. Statistics and Its Interface. 2008;1(1):179–195. doi: 10.4310/SII.2008.v1.n1.a15.
- 8. Zhao P. X., Xue L. G. Variable selection for semiparametric varying coefficient partially linear models. Statistics & Probability Letters. 2009;79(20):2148–2157. doi: 10.1016/j.spl.2009.07.004.
- 9. Xue L., Qu A., Zhou J. Consistent model selection for marginal generalized additive model for correlated data. Journal of the American Statistical Association. 2010;105(492):1518–1530. doi: 10.1198/jasa.2010.tm10128.
- 10. Qu A., Lindsay B. G., Li B. Improving generalised estimating equations using quadratic inference functions. Biometrika. 2000;87(4):823–836. doi: 10.1093/biomet/87.4.823.
- 11. Liang K. Y., Zeger S. L. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22. doi: 10.1093/biomet/73.1.13.
- 12. Qu A., Li R. Quadratic inference functions for varying coefficient models with longitudinal data. Biometrics. 2006;62(2):379–391. doi: 10.1111/j.1541-0420.2005.00490.x.
- 13. Wang L., Li H., Huang J. Z. Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association. 2008;103:1556–1569. doi: 10.1198/016214508000000788.
- 14. Bai Y., Zhu Z. Y., Fung W. K. Partial linear models for longitudinal data based on quadratic inference functions. Scandinavian Journal of Statistics. 2008;35(1):104–118. doi: 10.1111/j.1467-9469.2007.00578.x.
- 15. Tian R. Q., Xue L. G., Liu C. L. Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data. Journal of Multivariate Analysis. 2014;132:94–110. doi: 10.1016/j.jmva.2014.07.015.
- 16. Zhang J. H., Xue L. G. Quadratic inference functions for generalized partially linear models with longitudinal data. Chinese Journal of Applied Probability and Statistics. 2017;33:417–432.
- 17. Li R., Liang H. Variable selection in semiparametric regression modeling. The Annals of Statistics. 2008;36(1):261–286. doi: 10.1214/009053607000000604.
- 18. Schumaker L. L. Spline Functions: Basic Theory. New York, NY, USA: Wiley; 1981.
- 19. Wang H., Li R., Tsai C. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 2007;94(3):553–568. doi: 10.1093/biomet/asm053.
- 20. Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101(476):1418–1429. doi: 10.1198/016214506000000735.
- 21. Wang H. S., Xia Y. C. Shrinkage estimation of the varying coefficient model. Journal of the American Statistical Association. 2009;104(486):747–757. doi: 10.1198/jasa.2009.0138.
- 22. Oman S. D. Easily simulated multivariate binary distributions with given positive and negative correlations. Computational Statistics & Data Analysis. 2009;53(4):999–1005. doi: 10.1016/j.csda.2008.11.017.
- 23. Zeger S. L., Karim M. R. Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association. 1991;86:79–86. doi: 10.1080/01621459.1991.10475006.
- 24. Diggle P. J., Liang K. Y., Zeger S. L. Analysis of Longitudinal Data. Oxford, England: Oxford University Press; 1994.
- 25. Lin X. H., Carroll R. J. Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association. 2000;95:520–534. doi: 10.1080/01621459.2000.10474229.
- 26. Lin X. H., Carroll R. J. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association. 2001;96(455):1045–1056. doi: 10.1198/016214501753208708.
- 27. He X., Fung W., Zhu Z. Robust estimation in generalized partial linear models for clustered data. Journal of the American Statistical Association. 2005;100(472):1176–1184. doi: 10.1198/016214505000000277.