Abstract
In this paper, we consider estimation and model selection for longitudinal partial linear varying coefficient errors-in-variables (EV) models in which the covariates are measured with additive errors. A bias-corrected penalized quadratic inference functions method is proposed, based on quadratic inference functions with two penalty terms. The proposed method can not only handle the measurement errors of the covariates and the within-subject correlations, but also estimate and select the significant non-zero parametric and nonparametric components simultaneously. Under some regularity conditions, the resulting estimators of the parameters are asymptotically normal, and the estimators of the nonparametric varying coefficients achieve the optimal convergence rate. Furthermore, we present simulation studies and a real example to evaluate the finite-sample performance of the proposed method.
Keywords: Longitudinal data, variable selection, partial linear varying coefficient EV models, quadratic inference function
1. Introduction
Varying coefficient models [7] have better interpretability and flexibility than linear models and can avoid the curse of dimensionality. They are often applied in the analysis of longitudinal and clustered data. In varying coefficient models, the regression coefficients are unknown nonparametric functions that are allowed to depend on time or on other covariates, which facilitates the study of dynamic features. A survey of varying coefficient models can be found in [15].
As is well known, not all coefficients vary in some applications. Thus, we consider the partial linear varying coefficient model [12] for longitudinal data. Suppose the longitudinal data $\{(Y_{ij},X_{ij},Z_{ij},t_{ij}),\ i=1,\ldots,n,\ j=1,\ldots,m_i\}$ satisfy the following partial linear varying coefficient model:
$$Y_{ij}=X_{ij}^{\top}\beta+Z_{ij}^{\top}\alpha(t_{ij})+\varepsilon_{ij},\qquad i=1,\ldots,n,\ j=1,\ldots,m_i,\tag{1}$$
where $Y_{ij}$ and $(X_{ij},Z_{ij})$ are the response and covariates observed at time $t_{ij}$, $X_{ij}\in\mathbb{R}^{p}$, $Z_{ij}\in\mathbb{R}^{q}$, $\varepsilon_{ij}=\varepsilon_i(t_{ij})$, $\varepsilon_i(t)$ is a zero-mean stochastic process, and $\varepsilon_1(t),\ldots,\varepsilon_n(t)$ are independent of each other. $\beta=(\beta_1,\ldots,\beta_p)^{\top}$ is a regression parameter vector. $\alpha(t)=(\alpha_1(t),\ldots,\alpha_q(t))^{\top}$ is a coefficient function vector, with $\alpha_l(t)$ being unknown smooth coefficient functions of $t$, $l=1,\ldots,q$. We further assume that the first two moments of model (1) satisfy $E(Y_{ij}\mid X_{ij},Z_{ij},t_{ij})=\mu_{ij}$ and $\operatorname{Var}(Y_{ij}\mid X_{ij},Z_{ij},t_{ij})=V(\mu_{ij})$, where $E$ and Var are the expectation and variance, respectively, and $V(\cdot)$ is a known function.
Model (1) combines the advantages of the linear model and the varying coefficient model: it can reduce modeling bias and avoid the curse of dimensionality. Recently, it has been studied by different statistical methods, such as the local polynomial fitting method [32], the profile least squares method [4], the empirical likelihood method [10,30], the quantile regression method [23], the penalized quadratic inference function (pQIF) method [20], and so on. An important assumption in these methods is that the covariates can be observed exactly.
However, it is often impossible to obtain accurate measurements in practice, especially for some important covariates: no matter how the data are collected, measurement errors are unavoidable, or some covariates are simply unobserved. Ignoring measurement errors may result in biased estimators or even incorrect conclusions. Therefore, it is meaningful to incorporate measurement errors into model (1). In view of this, we consider the case where X and Z in model (1) are measured with additive errors, which gives the so-called partial linear varying coefficient errors-in-variables (EV) model
$$Y_{ij}=X_{ij}^{\top}\beta+Z_{ij}^{\top}\alpha(t_{ij})+\varepsilon_{ij},\qquad W_{ij}=X_{ij}+u_{ij},\qquad V_{ij}=Z_{ij}+\eta_{ij},\tag{2}$$
where $W_{ij}$ and $V_{ij}$ can be observed directly, and $u_{ij}$ and $\eta_{ij}$ are zero-mean measurement errors with diagonal covariance matrices $\Sigma_u$ and $\Sigma_\eta$, respectively. In addition, we assume that for $i=1,\ldots,n$ and $j=1,\ldots,m_i$, $u_{ij}$ and $\eta_{ij}$ are independent of each other and are all independent of $(X_{ij},Z_{ij},\varepsilon_{ij})$, so that $\operatorname{Cov}(u_{ij},\eta_{ij})=0$, where Cov denotes the covariance operator. Although these assumptions are not the weakest possible conditions, extra information about $\Sigma_u$ and $\Sigma_\eta$ is needed in practice to deal with the measurement errors. For example, we usually assume that $\Sigma_u$ and $\Sigma_\eta$ are known or can be estimated.
Model (2) has been studied in the literature. For the case where only X is measured with additive error, You and Chen [30] proposed two different estimators for the parametric and nonparametric components with cross-sectional data. Empirical likelihood inference can be found in Hu et al. [9], Zhao and Xue [35], Xia and Da [28], Zhou et al. [38], Fan et al. [3] and Wang et al. [25]. The case where some linear covariates are unobserved but ancillary variables are available can be found in Zhou and Liang [37]. A variable selection procedure for the high-dimensional situation was studied by Wang and Xue [26]. The estimation and testing problems can be found in Zhang et al. [33]. Wang et al. [22] studied the model averaging problem for model (2). Wei [27] proposed a restricted modified profile least squares estimator for the parametric components.
For models in which only Z is measured with additive error, empirical likelihood inference and local bias-corrected restricted profile least squares estimators can be used for model estimation [2,6]. For the generalized partial linear varying coefficient model with some error-prone linear covariates but available ancillary variables, Zhang et al. [31] proposed a variable selection method. Zhao and Xue [36] proposed a variable selection method for the case where X and Z are measured with errors simultaneously, based on cross-sectional data.
On the other hand, model selection is an important topic in longitudinal data analysis; see, for example, Tian and Xue [19] and Zhao et al. [34]. As far as we know, no study has been reported on model selection for model (2) with longitudinal data when X and Z are measured with additive errors simultaneously. Taking this issue into account, and inspired by [34] and [19], we mainly study variable selection for model (2). In view of the advantages of quadratic inference functions (QIF) [16] over generalized estimating equations (GEE) [14], a bias-corrected penalized quadratic inference functions (pQIF) method for model (2) is proposed in this paper, which can estimate and select the non-zero regression parameters and coefficient functions simultaneously. Furthermore, the asymptotic properties of the proposed method and of the resulting estimators are established.
The rest of this paper is organized as follows. In Section 2, we propose the bias-corrected pQIF method. In Section 3, we study the asymptotic properties of the model estimation and selection results. Some issues in practical implementation are presented in Section 4. Simulation studies and a real example are presented in Section 5. In Section 6, we present a brief conclusion and discussion. The proofs of some asymptotic results are provided in the Appendix.
2. Model estimation and selection method
Denote the B-spline basis vector of order d as $B(t)=(B_1(t),\ldots,B_L(t))^{\top}$, where $L=K+d$ and K is the number of interior knots. Hence, following He et al. [8], $\alpha_l(t)$ can be represented approximately as

$$\alpha_l(t)\approx B(t)^{\top}\gamma_l,\qquad l=1,\ldots,q,\tag{3}$$

where $\gamma_l$ is a regression coefficient vector for the B-spline basis.
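To make the approximation in (3) concrete, the basis vector $B(t)$ can be generated numerically; the following is a minimal sketch using SciPy, in which the helper name `bspline_basis`, the support $[0,1]$ and the sine test function are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, K, degree=3, support=(0.0, 1.0)):
    """Evaluate the L = K + degree + 1 B-spline basis functions at points t,
    with K equally spaced interior knots.  With spline order d = degree + 1,
    this matches L = K + d in the text (cubic splines: d = 4)."""
    a, b = support
    interior = np.linspace(a, b, K + 2)[1:-1]
    knots = np.r_[[a] * (degree + 1), interior, [b] * (degree + 1)]
    L = K + degree + 1
    # Identity coefficients: column l of the output is B_l evaluated at t.
    return BSpline(knots, np.eye(L), degree)(np.asarray(t))

# Least-squares spline approximation alpha_l(t) ~ B(t)^T gamma_l:
t = np.linspace(0.0, 1.0, 200)
alpha_l = np.sin(2 * np.pi * t)                        # stand-in coefficient function
B = bspline_basis(t, K=5)                              # (200, 9) basis matrix
gamma_l, *_ = np.linalg.lstsq(B, alpha_l, rcond=None)  # spline coefficients
alpha_fit = B @ gamma_l                                # fitted values of alpha_l(t)
```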
Replacing $\alpha_l(t)$ by (3), model (2) can be represented approximately as

$$Y_{ij}\approx X_{ij}^{\top}\beta+(Z_{ij}\otimes B(t_{ij}))^{\top}\gamma+\varepsilon_{ij}=D_{ij}^{\top}\theta+\varepsilon_{ij},\tag{4}$$

where $D_{ij}=(X_{ij}^{\top},(Z_{ij}\otimes B(t_{ij}))^{\top})^{\top}$, $\theta=(\beta^{\top},\gamma^{\top})^{\top}$, $\gamma=(\gamma_1^{\top},\ldots,\gamma_q^{\top})^{\top}$, and $\otimes$ denotes the Kronecker product; the observed counterpart of $D_{ij}$ is $\hat D_{ij}=(W_{ij}^{\top},(V_{ij}\otimes B(t_{ij}))^{\top})^{\top}$. From the assumptions of model (2), we can see that $u_{ij}$ and $\eta_{ij}$ are independent of each other and are all independent of $X_{ij}$, $Z_{ij}$, $\varepsilon_{ij}$ and $t_{ij}$, for $i=1,\ldots,n$, $j=1,\ldots,m_i$. From (4), we can get the GEE about θ, based on the observed data, as
$$\sum_{i=1}^{n}\hat D_{i}^{\top}V_{i}^{-1}(Y_{i}-\hat D_{i}\theta)=0.\tag{5}$$
Then we can get
$$E\{\hat D_{i}^{\top}V_{i}^{-1}(Y_{i}-\hat D_{i}\theta)\}=-E(e_{i}^{\top}V_{i}^{-1}e_{i})\,\theta\neq0,$$
where $e_i=\hat D_i-D_i$. This shows that Equation (5) is biased; correcting for this bias, we can get the bias-corrected GEE about θ as
$$\sum_{i=1}^{n}\big\{\hat D_{i}^{\top}V_{i}^{-1}(Y_{i}-\hat D_{i}\theta)+\Lambda_{i}\theta\big\}=0,\tag{6}$$
where $\hat D_i=(\hat D_{i1},\ldots,\hat D_{im_i})^{\top}$, $Y_i=(Y_{i1},\ldots,Y_{im_i})^{\top}$, $e_{ij}=\hat D_{ij}-D_{ij}$, $\Lambda_i=E(e_i^{\top}V_i^{-1}e_i)$, and $\Sigma_{ij}=E(e_{ij}e_{ij}^{\top})$ is the covariance of $e_{ij}$. Obviously, Equation (6) is unbiased. From the GEE method, we take $V_i$ as $V_i=A_i^{1/2}R(\rho)A_i^{1/2}$, where $A_i$ is the diagonal marginal variance matrix of $Y_i$, $R(\rho)$ is a working correlation matrix, and ρ is a nuisance parameter. Liang and Zeger [14] pointed out that a consistent estimator of ρ may not exist in some simple cases, which may invalidate the GEE method.
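As an aside, the two working correlation structures used later in this paper, AR(1) and exchangeable, are easy to generate; a small sketch (the helper name `working_corr` is ours):

```python
import numpy as np

def working_corr(m, rho, structure="AR1"):
    """Working correlation matrix R(rho) for a subject with m observations."""
    if structure == "AR1":
        idx = np.arange(m)
        return rho ** np.abs(np.subtract.outer(idx, idx))       # R[j, l] = rho^|j-l|
    if structure == "EX":
        return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))  # exchangeable
    raise ValueError("unknown structure: " + structure)
```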
To overcome this drawback of the GEE, Qu et al. [17] proposed the QIF method for longitudinal data by assuming that $R^{-1}(\rho)\approx\sum_{k=1}^{s}a_kM_k$, where $M_1,\ldots,M_s$ are some simple known basis matrices and $a_1,\ldots,a_s$ are unknown constants; this approach treats $a_1,\ldots,a_s$ as nuisance parameters [17]. Substituting this expansion into (6), we get the new bias-corrected GEE as
$$\sum_{i=1}^{n}\sum_{k=1}^{s}a_{k}\big\{\hat D_{i}^{\top}A_{i}^{-1/2}M_{k}A_{i}^{-1/2}(Y_{i}-\hat D_{i}\theta)+\Lambda_{ik}\theta\big\}=0.\tag{7}$$
Unlike the GEE method, we do not need to estimate $a_1,\ldots,a_s$. Instead, define the bias-corrected extended score function as
$$\bar g_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}g_{i}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}\hat D_{i}^{\top}A_{i}^{-1/2}M_{1}A_{i}^{-1/2}(Y_{i}-\hat D_{i}\theta)+\Lambda_{i1}\theta\\ \vdots\\ \hat D_{i}^{\top}A_{i}^{-1/2}M_{s}A_{i}^{-1/2}(Y_{i}-\hat D_{i}\theta)+\Lambda_{is}\theta\end{pmatrix},\tag{8}$$
where $\Lambda_{ik}=E(e_i^{\top}A_i^{-1/2}M_kA_i^{-1/2}e_i)$ and $e_{ij}=(u_{ij}^{\top},(\eta_{ij}\otimes B(t_{ij}))^{\top})^{\top}$. Obviously, $u_{ij}$ and $\eta_{ij}$ are independent of each other, so the $e_{ij}$, $j=1,\ldots,m_i$, are independent of each other with $E(e_{ij})=0$ and $E(e_{ij}e_{il}^{\top})=0$ for $j\neq l$. We can get
$$\Lambda_{ik}=\sum_{j=1}^{m_i}c_{ij}^{(k)}\,\Sigma_{ij},\tag{9}$$
where $c_{ij}^{(k)}$ is the jth diagonal element of $A_i^{-1/2}M_kA_i^{-1/2}$ and $\Sigma_{ij}=E(e_{ij}e_{ij}^{\top})$. By some simple matrix calculations, following Zhao et al. [34], we have

$$\Sigma_{ij}=\operatorname{diag}\big\{\Sigma_u,\ \Sigma_\eta\otimes B(t_{ij})B(t_{ij})^{\top}\big\},\tag{10}$$
$$\Lambda_{ik}=\operatorname{diag}\Big\{\sum_{j=1}^{m_i}c_{ij}^{(k)}\Sigma_u,\ \sum_{j=1}^{m_i}c_{ij}^{(k)}\,\Sigma_\eta\otimes B(t_{ij})B(t_{ij})^{\top}\Big\},\tag{11}$$
where $\operatorname{diag}\{\cdot\}$ denotes a block diagonal matrix operator. However, the covariance matrices $\Sigma_u$ and $\Sigma_\eta$ are usually unknown in advance, so we need to estimate them in practice. Under some conditions, $\Sigma_u$ and $\Sigma_\eta$ can usually be estimated by partial replication, similar to [1].
If the longitudinal data are balanced, that is, $m_i\equiv m$, suppose that the surrogates of $X_{ij}$ and $Z_{ij}$ can be observed R times for the ith subject, $W_{ijr}=X_{ij}+u_{ijr}$ and $V_{ijr}=Z_{ij}+\eta_{ijr}$, $r=1,\ldots,R$. Then we can get two consistent, unbiased estimators $\hat\Sigma_u$ and $\hat\Sigma_\eta$ for $\Sigma_u$ and $\Sigma_\eta$, respectively, as

$$\hat\Sigma_u=\frac{1}{nm(R-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{r=1}^{R}(W_{ijr}-\bar W_{ij})(W_{ijr}-\bar W_{ij})^{\top},\tag{12}$$
$$\hat\Sigma_\eta=\frac{1}{nm(R-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{r=1}^{R}(V_{ijr}-\bar V_{ij})(V_{ijr}-\bar V_{ij})^{\top},\tag{13}$$

where $\bar W_{ij}=R^{-1}\sum_{r=1}^{R}W_{ijr}$ and $\bar V_{ij}=R^{-1}\sum_{r=1}^{R}V_{ijr}$. Furthermore, we can get two consistent, unbiased estimators of the two diagonal blocks of $\Lambda_{ik}$ in (11), respectively, as

$$\hat\Lambda_{ik}^{u}=\sum_{j=1}^{m}c_{ij}^{(k)}\,\hat\Sigma_u,\tag{14}$$
$$\hat\Lambda_{ik}^{\eta}=\sum_{j=1}^{m}c_{ij}^{(k)}\,\hat\Sigma_\eta\otimes B(t_{ij})B(t_{ij})^{\top}.\tag{15}$$
Substituting (14) and (15) into (9), we can get a consistent, unbiased estimator of $\Lambda_{ik}$ as
$$\hat\Lambda_{ik}=\operatorname{diag}\big\{\hat\Lambda_{ik}^{u},\ \hat\Lambda_{ik}^{\eta}\big\}.\tag{16}$$
If the longitudinal data are unbalanced, following Xue et al. [29], they can be reformulated as balanced data; the details are omitted here.
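As an illustration of the replication idea behind (12) and (13), the following sketch estimates a measurement-error covariance matrix from replicated surrogates; the array layout and the helper name `replicate_cov` are our own assumptions.

```python
import numpy as np

def replicate_cov(W):
    """Replication-based covariance estimate in the spirit of (12)-(13).

    W has shape (N, R, p): R replicate surrogate measurements of a
    p-dimensional covariate at each of N subject-time points."""
    N, R, p = W.shape
    dev = W - W.mean(axis=1, keepdims=True)  # deviations from replicate means
    S = np.einsum('nrp,nrq->pq', dev, dev)   # sum of outer products of deviations
    return S / (N * (R - 1))                 # divisor makes the estimator unbiased

# Hypothetical usage:
# Sigma_u_hat   = replicate_cov(W_reps)   # surrogates of X
# Sigma_eta_hat = replicate_cov(V_reps)   # surrogates of Z
```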
According to (16), we can get an estimator of the extended score (8) as
$$\hat{\bar g}_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\hat g_{i}(\theta),\tag{17}$$
where $\hat g_i(\theta)$ is $g_i(\theta)$ with $\Lambda_{ik}$ replaced by $\hat\Lambda_{ik}$, $k=1,\ldots,s$.
Obviously, $\hat{\bar g}_n(\theta)$ is an $s(p+qL)$-dimensional vector, whereas θ is only a $(p+qL)$-dimensional parameter vector. The equation $\hat{\bar g}_n(\theta)=0$ is over-identified and cannot be solved for θ directly. To solve this problem, following Qu and Li [16], we construct the bias-corrected QIF about θ as
$$Q_n(\theta)=n\,\hat{\bar g}_{n}(\theta)^{\top}\,\Omega_n^{-1}(\theta)\,\hat{\bar g}_{n}(\theta),\tag{18}$$

where $\Omega_n(\theta)=\frac{1}{n}\sum_{i=1}^{n}\hat g_i(\theta)\hat g_i(\theta)^{\top}$. Furthermore, we can get the bias-corrected QIF estimator $\hat\theta$ as

$$\hat\theta=\arg\min_{\theta}Q_n(\theta).\tag{19}$$
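To make (18) and (19) concrete, the following sketch evaluates the QIF objective for user-supplied extended score functions and minimizes it numerically. The basis matrices follow the decomposition of Qu et al. [17]; the construction of the bias-corrected scores $g_i(\theta)$ themselves is left abstract, and all helper names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def qif_basis_matrices(m, structure="AR1"):
    """Basis matrices M_1, M_2 with R^{-1}(rho) ~ a_1 M_1 + a_2 M_2 [17]."""
    M1 = np.eye(m)
    if structure == "AR1":
        M2 = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
    else:  # exchangeable
        M2 = np.ones((m, m)) - np.eye(m)
    return [M1, M2]

def qif_value(theta, score_funcs):
    """Q_n(theta) = n * gbar^T Omega_n^{-1} gbar, as in (18).

    score_funcs[i](theta) returns the bias-corrected extended score g_i(theta)."""
    G = np.stack([g(theta) for g in score_funcs])  # n x dim matrix of scores
    gbar = G.mean(axis=0)
    Omega = G.T @ G / len(G)                       # Omega_n(theta)
    return len(G) * gbar @ np.linalg.solve(Omega, gbar)

# Unpenalized bias-corrected QIF estimator, as in (19):
# theta_hat = minimize(qif_value, theta0, args=(score_funcs,), method="BFGS").x
```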
As mentioned above, the bias-corrected QIF can correct the bias of the estimating equations and handle within-subject correlations simultaneously. However, the bias-corrected QIF estimator of the nonparametric coefficient functions is a spline regression estimator and is usually over-fitted; moreover, the true model is unknown in practice. To solve these issues, we construct the bias-corrected pQIF, which estimates and selects the significant parameters and varying coefficients simultaneously, defined as
$$Q_n^{P}(\theta)=Q_n(\theta)+n\sum_{k=1}^{p}p_{\lambda_1}(|\beta_k|)+n\sum_{l=1}^{q}p_{\lambda_2}(\|\gamma_l\|_H),\tag{20}$$

where $\|\gamma_l\|_H=(\gamma_l^{\top}H\gamma_l)^{1/2}$ with $H=\int B(t)B(t)^{\top}\,\mathrm{d}t$, and $p_{\lambda}(\cdot)$ is the SCAD penalty function [5], defined through its derivative as

$$p'_{\lambda}(w)=\lambda\Big\{I(w\le\lambda)+\frac{(a\lambda-w)_{+}}{(a-1)\lambda}\,I(w>\lambda)\Big\},\qquad p_{\lambda}(0)=0,\tag{21}$$
where $a=3.7$, $w>0$, and λ is a tuning parameter that measures the amount of penalty; we write $\lambda_1$ and $\lambda_2$ for the tuning parameters of the parametric and nonparametric parts in (20), respectively. The bias-corrected pQIF estimator is given by
$$\hat\theta^{P}=(\hat\beta^{\top},\hat\gamma^{\top})^{\top}=\arg\min_{\theta}Q_n^{P}(\theta).\tag{22}$$
Furthermore, the estimators of the coefficient functions $\alpha_l(t)$ can be obtained by
$$\hat\alpha_l(t)=B(t)^{\top}\hat\gamma_l,\qquad l=1,\ldots,q.\tag{23}$$
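For completeness, the SCAD penalty in (21) and its derivative can be coded directly; a small sketch (the names `scad` and `scad_deriv` are ours):

```python
import numpy as np

def scad_deriv(w, lam, a=3.7):
    """p_lambda'(w) for the SCAD penalty of Fan and Li [5], w >= 0."""
    w = np.asarray(w, dtype=float)
    return np.where(w <= lam, lam, np.maximum(a * lam - w, 0.0) / (a - 1.0))

def scad(w, lam, a=3.7):
    """p_lambda(w) itself, obtained by integrating the derivative piecewise."""
    w = np.asarray(w, dtype=float)
    mid = (2 * a * lam * w - w ** 2 - lam ** 2) / (2 * (a - 1))  # lam < w <= a*lam
    return np.where(w <= lam, lam * w,
                    np.where(w <= a * lam, mid, (a + 1) * lam ** 2 / 2))
```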
3. Asymptotic properties
We now establish the asymptotic properties of $\hat\beta$ and $\hat\alpha(\cdot)$. Firstly, let $\beta_0$ and $\alpha_0(\cdot)$ be the true regression parameters and coefficient functions, and let $\gamma_{l0}$ be the B-spline regression coefficient vectors from the spline approximation to $\alpha_{l0}(\cdot)$. Furthermore, without loss of generality, we assume that $\beta_{k0}\neq0$ for $k\le p_1$ and $\beta_{k0}=0$ for $k>p_1$, and that $\alpha_{l0}(\cdot)\not\equiv0$ for $l\le q_1$ and $\alpha_{l0}(\cdot)\equiv0$ for $l>q_1$.
Some necessary regularity conditions for the asymptotic properties are as follows.
- C1: The number of observations $m_i$ for each subject is uniformly bounded, $i=1,\ldots,n$.
- C2: The coefficient functions $\alpha_l(t)$, $l=1,\ldots,q$, are $r$th continuously differentiable on $(0,1]$, where $r\ge 2$.
- C3: There exists a unique $\theta_0\in\Theta$ satisfying $E\{g_i(\theta_0)\}=0$, where $\Theta$ is the parameter space.
- C4: There exists an invertible matrix $\Omega_0$ such that $\Omega_n(\theta_0)\xrightarrow{p}\Omega_0$.
- C5: $E\|X_{ij}\|^{4}<\infty$ and $E\|Z_{ij}\|^{4}<\infty$, and there exists a constant $c_0>0$ such that $\|A_i\|\le c_0$, $\|M_k\|\le c_0$ and $\|R(\rho)\|\le c_0$, where $\|\cdot\|$ is the modulus of the largest singular value.
- C6: $E\|u_{ij}\|^{4}<\infty$ and $E\|\eta_{ij}\|^{4}<\infty$.
- C7: $E(\varepsilon_{ij})=0$, $E(\varepsilon_{ij}^{2})<\infty$ and $E\|\varepsilon_{i}\|^{4}<\infty$.
- C8: Denote the interior knots as $\{s_k\}_{k=1}^{K}$, and let $h_k=s_k-s_{k-1}$ and $h=\max_k h_k$. The knots satisfy $h/\min_k h_k\le c_1$ for some constant $c_1>0$, and $K=O(n^{1/(2r+1)})$.
- C9: $E\{\partial g_i(\theta)/\partial\theta^{\top}\}$ exists and is continuous, and according to the weak law of large numbers, when $n\to\infty$, there exists $J(\theta)$ such that
  $$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial\hat g_i(\theta)}{\partial\theta^{\top}}\xrightarrow{p}J(\theta).\tag{24}$$
- C10: Denote $a_n=\max\{p'_{\lambda_1}(|\beta_{k0}|),\,p'_{\lambda_2}(\|\gamma_{l0}\|_H):\ \beta_{k0}\neq0,\ \alpha_{l0}\neq0\}$; then $a_n\to0$ as $n\to\infty$.
- C11: The penalty functions satisfy
  $$\liminf_{n\to\infty}\ \liminf_{w\to0^{+}}\ \frac{p'_{\lambda_j}(w)}{\lambda_j}>0,\qquad j=1,2,\tag{25}$$
  $$\lambda_j\to0\quad\text{and}\quad\sqrt{n}\,\lambda_j\to\infty\quad\text{as }n\to\infty,\ j=1,2.\tag{26}$$
Remark 3.1
These conditions are often used in the literature on nonparametric and semiparametric statistical inference. C1 implies that the total sample size $N=\sum_{i=1}^{n}m_i=O(n)$. C2 is the smoothness condition on $\alpha(\cdot)$ and is necessary to study the convergence rate of the B-spline estimator. C4 and C9 can easily be obtained by the weak law of large numbers when $n\to\infty$. C3, C5-C7 and C9 can be seen in [20]. C8 is necessary for the knots of the B-spline basis approximation [18]. C10 and C11 can be seen in [5,20,36].
Based on these conditions, some asymptotic properties of the resulting estimators are presented as follows.
Theorem 3.1
If C1-C11 hold and $K=O(n^{1/(2r+1)})$, we have
$$\|\hat\alpha_l-\alpha_{l0}\|=O_p(n^{-r/(2r+1)}),\qquad l=1,\ldots,q.\tag{27}$$
Theorem 3.2
If C1-C11 hold and $K=O(n^{1/(2r+1)})$, and $\lambda_1,\lambda_2$ satisfy $\lambda_j\to0$ and $\sqrt{n}\,\lambda_j\to\infty$, $j=1,2$, then with probability tending to 1, we have $\hat\beta_k=0$ for $k=p_1+1,\ldots,p$, and $\hat\alpha_l(\cdot)\equiv0$ for $l=q_1+1,\ldots,q$.
Theorem 3.3
Denote $\hat\beta_{(1)}$ as the estimator of $\beta_{(1)0}$, the sub-vector of non-zero components of $\beta_0$. If C1-C11 hold and $K=O(n^{1/(2r+1)})$, we have
$$\sqrt{n}\,(\hat\beta_{(1)}-\beta_{(1)0})\xrightarrow{d}N(0,\Sigma),\tag{28}$$
where Σ is given in Equation (A11) in the Appendix, and "$\xrightarrow{d}$" represents convergence in distribution.
Remark 3.2
Theorem 3.1 shows that the estimators of the varying coefficients achieve the optimal convergence rate, and Theorem 3.2 shows that the estimators of the constant coefficients and the varying coefficients possess the sparsity property. From Theorems 3.1-3.3, we know that the proposed method possesses the oracle property.
4. Computational algorithm and selection of tuning parameters
4.1. Computational algorithm
It is obvious that $\hat\theta^{P}$ in (22) does not have a closed form and that $p_{\lambda}(\cdot)$ is irregular at the origin, so we can only obtain a numerical solution of $\hat\theta^{P}$. Following the local quadratic approximation of Fan and Li [5], $p_{\lambda}(|w|)$ can be approximated around a given point $w_0$ with $w_0\neq0$ as

$$p_{\lambda}(|w|)\approx p_{\lambda}(|w_0|)+\frac{1}{2}\,\frac{p'_{\lambda}(|w_0|)}{|w_0|}\,(w^{2}-w_0^{2}).$$

On the other hand, $Q_n(\theta)$ can be approximated around an initial value $\theta^{(0)}$ as

$$Q_n(\theta)\approx Q_n(\theta^{(0)})+\dot Q_n(\theta^{(0)})^{\top}(\theta-\theta^{(0)})+\frac{1}{2}(\theta-\theta^{(0)})^{\top}\ddot Q_n(\theta^{(0)})(\theta-\theta^{(0)}),$$

where $\dot Q_n$ and $\ddot Q_n$ denote the gradient and Hessian of $Q_n$. Therefore, apart from a constant, the bias-corrected pQIF can be represented as

$$Q_n^{P}(\theta)\approx Q_n(\theta^{(0)})+\dot Q_n(\theta^{(0)})^{\top}(\theta-\theta^{(0)})+\frac{1}{2}(\theta-\theta^{(0)})^{\top}\ddot Q_n(\theta^{(0)})(\theta-\theta^{(0)})+\frac{n}{2}\,\theta^{\top}\Sigma_{\lambda}(\theta^{(0)})\,\theta,\tag{29}$$

where $\Sigma_{\lambda}(\theta^{(0)})$ is the diagonal matrix whose entries are $p'_{\lambda_1}(|\beta^{(0)}_k|)/|\beta^{(0)}_k|$ for the parametric components and $p'_{\lambda_2}(\|\gamma^{(0)}_l\|_H)/\|\gamma^{(0)}_l\|_H$ for the spline blocks.
According to (29), $\hat\theta^{P}$ can be solved by the following iterative algorithm.
Step 1: Take the bias-corrected QIF estimator $\hat\theta$ defined by (19) as the initial value $\theta^{(0)}$.

Step 2: Update θ at the (k+1)th iteration by
$$\theta^{(k+1)}=\theta^{(k)}-\big\{\ddot Q_n(\theta^{(k)})+n\Sigma_{\lambda}(\theta^{(k)})\big\}^{-1}\big\{\dot Q_n(\theta^{(k)})+n\Sigma_{\lambda}(\theta^{(k)})\,\theta^{(k)}\big\}.$$

Step 3: Repeat Step 2 until a certain convergence criterion is satisfied.
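A compact sketch of this iteration follows, under the simplifying assumption that every component of θ is penalized individually (the group penalty on $\|\gamma_l\|_H$ would replace the corresponding diagonal entries of E by blockwise weights). It reuses `scad_deriv` from the earlier sketch; `grad_Q` and `hess_Q` stand for routines evaluating the gradient and Hessian of the bias-corrected QIF.

```python
import numpy as np

def pqif_solve(theta0, grad_Q, hess_Q, lam_vec, n, tol=1e-6,
               max_iter=200, eps=1e-8, zero_tol=1e-4):
    """Iterative ridge update from the local quadratic approximation:
    theta <- theta - (H + n E)^{-1} (g + n E theta),
    with E = diag{ p'_lam(|theta_j|) / |theta_j| }."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        d = np.abs(theta)
        E = np.diag(scad_deriv(d, lam_vec) / np.maximum(d, eps))  # Sigma_lambda
        step = np.linalg.solve(hess_Q(theta) + n * E,
                               grad_Q(theta) + n * E @ theta)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    theta[np.abs(theta) < zero_tol] = 0.0  # snap near-zero estimates to exact zero
    return theta
```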
4.2. Selection of tuning parameters
As is well known, $\lambda_1$ and $\lambda_2$ control the amount of penalty and determine the values of the penalty functions $p_{\lambda_1}(\cdot)$ and $p_{\lambda_2}(\cdot)$. However, they are unknown in practice, which means that the results of model estimation and selection depend on the choice of $\lambda_1$ and $\lambda_2$. Thus, the selection of $\lambda_1$ and $\lambda_2$ is important in the implementation. As Wang et al. [21] showed, the BIC criterion for the SCAD estimator can select the true model with probability tending to one. In our work, we apply the BIC criterion to select the optimal tuning parameters $\lambda_1$ and $\lambda_2$.
However, it is challenging to select p + q tuning parameters simultaneously in real applications. A wise strategy for the selection of $\lambda_1$ and $\lambda_2$ is to assign a larger penalty to a zero parameter or a zero coefficient function than to a non-zero one, which is good for selecting the significantly non-zero parameters and coefficient functions and can reduce the computational complexity. Such tuning parameters are usually called adaptive tuning parameters; with them, the proposed method can estimate large parameters and coefficient functions nearly unbiasedly while shrinking the small ones toward zero. Thus, denote $\lambda_{1k}$ and $\lambda_{2l}$ as
$$\lambda_{1k}=\frac{\lambda}{|\hat\beta_k|},\qquad \lambda_{2l}=\frac{\lambda}{\|\hat\gamma_l\|_H},$$
where $\hat\beta_k$ and $\hat\gamma_l$ are the components of the unpenalized estimator defined by (19). Consequently, the selection of $\lambda_1$ and $\lambda_2$ reduces to the selection of the single parameter λ, which is an easier univariate problem and greatly reduces the computational complexity. Define BIC as
$$\mathrm{BIC}_{\lambda}=Q_n(\hat\theta_{\lambda})+df_{\lambda}\,\log n,\tag{30}$$

where $\hat\theta_{\lambda}$ is defined by (22) for a given λ, and $df_{\lambda}$ is the number of non-zero parameters and coefficient functions in $\hat\beta_{\lambda}$ and $\hat\alpha_{\lambda}(\cdot)$. So we can get the optimal $\hat\lambda$ as
$$\hat\lambda=\arg\min_{\lambda}\mathrm{BIC}_{\lambda}.\tag{31}$$
In practice, $\hat\lambda$ can be obtained by a grid search.
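The grid search can be sketched as follows, where `fit_pqif` and `qif_of` are placeholders for the pQIF fitting routine and the evaluator of $Q_n$, and the grid itself is an arbitrary choice.

```python
import numpy as np

def select_lambda(lam_grid, fit_pqif, qif_of, n):
    """Choose lambda by minimizing the BIC in (30) over a grid, as in (31)."""
    best_lam, best_bic = None, np.inf
    for lam in lam_grid:
        theta_hat = fit_pqif(lam)                 # pQIF estimate for this lambda
        df = int(np.count_nonzero(theta_hat))     # number of non-zero components
        bic = qif_of(theta_hat) + df * np.log(n)  # BIC_lambda
        if bic < best_bic:
            best_lam, best_bic = lam, bic
    return best_lam

# lam_hat = select_lambda(np.logspace(-3, 0, 30), fit_pqif, qif_of, n)
```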
5. Numerical studies
5.1. Simulations studies
We conducted numerical simulations to assess the performance of the bias-corrected pQIF method in terms of estimation accuracy and selection performance in finite samples. Firstly, the generalized mean square error (GMSE) [20,36] is defined as
$$\mathrm{GMSE}=(\hat\beta-\beta_0)^{\top}E(XX^{\top})(\hat\beta-\beta_0).$$
Obviously, the smaller the GMSE, the better the estimation of β. The square root of average square errors (RASE) is defined as
$$\mathrm{RASE}=\Big\{\frac{1}{m}\sum_{k=1}^{m}\sum_{l=1}^{q}\big(\hat\alpha_l(t_k)-\alpha_{l0}(t_k)\big)^{2}\Big\}^{1/2},$$
where the grid points $\{t_k,\ k=1,\ldots,m\}$ are equally spaced on the support of t. A smaller RASE indicates better estimation accuracy, that is, $\hat\alpha(\cdot)$ is closer to the true function $\alpha_0(\cdot)$. In our work, we set m = 200.
'C' in the tables below denotes the average number of zero coefficients (of β or of α(·)) correctly estimated as zero, and 'IC' denotes the average number of non-zero coefficients incorrectly estimated as zero. Obviously, a larger 'C' and a smaller 'IC' imply better model selection results. The performance of the bias-corrected pQIF method is assessed by the GMSE, RASE, 'C' and 'IC' simultaneously.
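For clarity, the two accuracy criteria can be computed as in the following sketch, with $E(XX^{\top})$ replaced by its sample analogue; the function names are ours.

```python
import numpy as np

def gmse(beta_hat, beta0, X):
    """GMSE = (beta_hat - beta0)^T E(XX^T) (beta_hat - beta0); X is N x p."""
    d = beta_hat - beta0
    return float(d @ (X.T @ X / len(X)) @ d)

def rase(alpha_hat_grid, alpha0_grid):
    """RASE over an m-point grid; inputs are m x q matrices holding the fitted
    and true coefficient functions evaluated on the grid."""
    sq = np.sum((alpha_hat_grid - alpha0_grid) ** 2, axis=1)
    return float(np.sqrt(np.mean(sq)))
```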
In our simulation studies, data were generated from model (2), where the true parameter vector β contains two zero components and the true coefficient function vector α(t) contains four zero coefficient functions (so that 'C' is at most 2 for the parametric part and 4 for the nonparametric part). We took $\Sigma_u=\sigma^{2}I_p$ and $\Sigma_\eta=\sigma^{2}I_q$, where $I_p$ and $I_q$ are the $p\times p$ and $q\times q$ identity matrices, and we set σ to 0.2, 0.4 and 0.6 to represent increasing measurement error. The within-subject errors $\varepsilon_i$ were generated as zero-mean Gaussian vectors with covariance $A_i^{1/2}R(\rho)A_i^{1/2}$, where $R(\rho)$ is a known correlation matrix with parameter ρ; we considered the first-order autoregressive (AR(1)) and exchangeable (EX) correlation structures. We generated n = 150, 200, 300 subjects. The cubic B-spline basis was applied, and the knots were equally spaced with $K=\lfloor c\,N^{1/5}\rfloor$ interior knots, where $\lfloor\cdot\rfloor$ denotes the largest integer not exceeding its argument and N is the total number of observations [8]. Following Tian et al. [20], we chose c = 0.6.
For each simulated longitudinal dataset, we compared the bias-corrected pQIF method with the LASSO and SCAD penalty functions against the method that neglects the measurement errors and uses the SCAD penalty (denoted 'nSCAD'). For simplicity, the bias-corrected pQIF with the LASSO and SCAD penalty functions are denoted by 'LASSO' and 'SCAD', respectively. The tuning parameters $\lambda_1$ and $\lambda_2$ were chosen by (31). Furthermore, we ran 500 simulation replications under each setup and present the medians of the GMSE and RASE in the following tables.
In summary, from Tables 1 to 4, we can draw the following conclusions:
The performance of the LASSO and SCAD methods is much better than that of the nSCAD method in all cases, which implies that the bias correction we propose is valuable and that neglecting measurement errors results in biased estimation and poor variable selection for model (2).
Under the same conditions, the performance of the SCAD and LASSO methods improves as the sample size increases. Furthermore, the SCAD method is better than the LASSO method in terms of both estimation and selection of the parametric and nonparametric parts.
Under the same conditions, the SCAD and LASSO methods deteriorate when the measurement error increases. There is little difference between the performance of the SCAD and LASSO methods when the measurement error is small; however, the SCAD method is significantly better than the LASSO method when the measurement error is large, which implies that the LASSO method is less robust than the SCAD method.
Table 2.
Variable selections for α(·) with the EX correlation structure.
| n = 150 | n = 200 | n = 300 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | Method | C | IC | RASE | C | IC | RASE | C | IC | RASE | |
| 0.2 | LASSO | 3.442 | 0 | 0.11303 | 3.798 | 0 | 0.09892 | 3.972 | 0 | 0.08785 | |
| SCAD | 3.502 | 0 | 0.11307 | 3.808 | 0 | 0.09879 | 3.974 | 0 | 0.08779 | ||
| nSCAD | 3.410 | 0 | 0.13200 | 3.764 | 0 | 0.11801 | 3.948 | 0 | 0.10970 | ||
| 0.4 | LASSO | 3.378 | 0 | 0.17904 | 3.696 | 0 | 0.14573 | 3.916 | 0 | 0.12109 | |
| SCAD | 3.414 | 0 | 0.17833 | 3.766 | 0 | 0.14346 | 3.956 | 0 | 0.11848 | ||
| nSCAD | 3.134 | 0 | 0.30538 | 3.524 | 0 | 0.29042 | 3.850 | 0 | 0.27680 | ||
| 0.6 | LASSO | 3.140 | 0 | 0.26483 | 3.628 | 0 | 0.20670 | 3.888 | 0 | 0.16369 | |
| SCAD | 3.282 | 0 | 0.25851 | 3.674 | 0 | 0.20247 | 3.922 | 0 | 0.15911 | ||
| nSCAD | 2.834 | 0 | 0.60510 | 3.178 | 0 | 0.57348 | 3.588 | 0 | 0.56202 | ||
| 0.2 | LASSO | 3.520 | 0 | 0.10967 | 3.752 | 0 | 0.09818 | 3.970 | 0 | 0.08738 | |
| SCAD | 3.538 | 0 | 0.10946 | 3.776 | 0 | 0.09803 | 3.972 | 0 | 0.08711 | ||
| nSCAD | 3.434 | 0 | 0.12746 | 3.804 | 0 | 0.11733 | 3.958 | 0 | 0.10913 | ||
| 0.4 | LASSO | 3.334 | 0 | 0.17337 | 3.684 | 0 | 0.14602 | 3.940 | 0 | 0.12035 | |
| SCAD | 3.386 | 0 | 0.17244 | 3.710 | 0 | 0.14444 | 3.954 | 0 | 0.11865 | ||
| nSCAD | 3.136 | 0 | 0.30833 | 3.502 | 0 | 0.29786 | 3.856 | 0 | 0.27834 | ||
| 0.6 | LASSO | 3.180 | 0 | 0.26489 | 3.522 | 0 | 0.20862 | 3.900 | 0 | 0.16294 | |
| SCAD | 3.242 | 0 | 0.26246 | 3.662 | 0 | 0.20480 | 3.938 | 0 | 0.15718 | ||
| nSCAD | 2.688 | 0 | 0.61163 | 3.132 | 0 | 0.58300 | 3.600 | 0 | 0.56400 | ||
Table 3.
Variable selections for β with the AR(1) correlation structure.
| n = 150 | n = 200 | n = 300 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | Method | C | IC | GMSE | C | IC | GMSE | C | IC | GMSE | |
| 0.2 | LASSO | 1.834 | 0 | 0.00027 | 1.950 | 0 | 0.00018 | 1.992 | 0 | 0.00010 | |
| SCAD | 1.842 | 0 | 0.00017 | 1.960 | 0 | 0.00011 | 1.992 | 0 | 7.6E-05 | ||
| nSCAD | 1.796 | 0 | 0.00079 | 1.880 | 0 | 0.00061 | 1.964 | 0 | 0.00060 | ||
| 0.4 | LASSO | 1.380 | 0 | 0.00676 | 1.586 | 0 | 0.00184 | 1.754 | 0 | 0.00601 | |
| SCAD | 1.396 | 0 | 0.00154 | 1.594 | 0 | 0.00113 | 1.758 | 0 | 0.00075 | ||
| nSCAD | 0.988 | 0 | 0.03460 | 0.954 | 0 | 0.00621 | 0.892 | 0 | 0.00111 | ||
| 0.6 | LASSO | 1.028 | 0 | 0.03337 | 1.150 | 0 | 0.03183 | 1.408 | 0 | 0.03737 | |
| SCAD | 1.060 | 0 | 0.00929 | 1.178 | 0 | 0.00516 | 1.410 | 0 | 0.00418 | ||
| nSCAD | 0.448 | 0 | 0.24760 | 0.286 | 0 | 0.16020 | 0.162 | 0 | 0.09212 | ||
| 0.2 | LASSO | 1.884 | 0 | 0.00026 | 1.934 | 0 | 0.00015 | 1.992 | 0 | 0.00011 | |
| SCAD | 1.886 | 0 | 0.00014 | 1.936 | 0 | 0.00010 | 1.996 | 0 | 6.9E-05 | ||
| nSCAD | 1.814 | 0 | 0.00092 | 1.904 | 0 | 0.00076 | 1.948 | 0 | 0.00062 | ||
| 0.4 | LASSO | 1.420 | 0 | 0.00301 | 1.588 | 0 | 0.00662 | 1.794 | 0 | 0.00126 | |
| SCAD | 1.474 | 0 | 0.00168 | 1.614 | 0 | 0.00107 | 1.824 | 0 | 0.00084 | ||
| nSCAD | 1.046 | 0 | 0.00856 | 0.982 | 0 | 0.00197 | 0.886 | 0 | 0.00628 | ||
| 0.6 | LASSO | 0.984 | 0 | 0.03522 | 1.212 | 0 | 0.03437 | 1.428 | 0 | 0.04007 | |
| SCAD | 0.994 | 0 | 0.00885 | 1.236 | 0 | 0.00583 | 1.436 | 0 | 0.00406 | ||
| nSCAD | 0.460 | 0 | 0.23031 | 0.308 | 0 | 0.16011 | 0.136 | 0 | 0.08534 | ||
Table 1.
Variable selections for β with the EX correlation structure.
| n = 150 | n = 200 | n = 300 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | Method | C | IC | GMSE | C | IC | GMSE | C | IC | GMSE | |
| 0.2 | LASSO | 1.834 | 0 | 0.00025 | 1.950 | 0 | 0.00017 | 1.988 | 0 | 0.00010 | |
| SCAD | 1.850 | 0 | 0.00014 | 1.960 | 0 | 9.7E-05 | 1.988 | 0 | 7.6E-05 | ||
| nSCAD | 1.802 | 0 | 0.00089 | 1.878 | 0 | 0.00072 | 1.974 | 0 | 0.00068 | ||
| 0.4 | LASSO | 1.420 | 0 | 0.00339 | 1.546 | 0 | 0.00200 | 1.784 | 0 | 0.00120 | |
| SCAD | 1.448 | 0 | 0.00151 | 1.580 | 0 | 0.00116 | 1.792 | 0 | 0.00077 | ||
| nSCAD | 0.892 | 0 | 0.00762 | 1.000 | 0 | 0.00683 | 1.038 | 0 | 0.00534 | ||
| 0.6 | LASSO | 1.046 | 0 | 0.01934 | 1.180 | 0 | 0.01422 | 1.404 | 0 | 0.00954 | |
| SCAD | 1.082 | 0 | 0.00820 | 1.200 | 0 | 0.00610 | 1.438 | 0 | 0.00435 | ||
| nSCAD | 0.188 | 0 | 0.03413 | 0.314 | 0 | 0.03321 | 0.478 | 0 | 0.03836 | ||
| 0.2 | LASSO | 1.882 | 0 | 0.00028 | 1.962 | 0 | 0.00017 | 1.994 | 0 | 0.00012 | |
| SCAD | 1.892 | 0 | 0.00013 | 1.968 | 0 | 9.3E-05 | 1.996 | 0 | 6.1E-05 | ||
| nSCAD | 1.870 | 0 | 0.00099 | 1.926 | 0 | 0.00085 | 1.984 | 0 | 0.00070 | ||
| 0.4 | LASSO | 1.456 | 0 | 0.00340 | 1.626 | 0 | 0.00192 | 1.752 | 0 | 0.00121 | |
| SCAD | 1.466 | 0 | 0.00176 | 1.598 | 0 | 0.00117 | 1.784 | 0 | 0.00079 | ||
| nSCAD | 0.974 | 0 | 0.00883 | 0.994 | 0 | 0.00723 | 1.000 | 0 | 0.00630 | ||
| 0.6 | LASSO | 1.024 | 0 | 0.03529 | 1.262 | 0 | 0.01561 | 1.438 | 0 | 0.00424 | |
| SCAD | 1.068 | 0 | 0.00821 | 1.264 | 0 | 0.00594 | 1.448 | 0 | 0.00421 | ||
| nSCAD | 0.170 | 0 | 0.03655 | 0.330 | 0 | 0.03483 | 0.496 | 0 | 0.02724 | ||
Table 4.
Variable selections for α(·) with the AR(1) correlation structure.
| n = 150 | n = 200 | n = 300 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | Method | C | IC | RASE | C | IC | RASE | C | IC | RASE | |
| 0.2 | LASSO | 3.446 | 0 | 0.11341 | 3.790 | 0 | 0.10114 | 3.952 | 0 | 0.08951 | |
| SCAD | 3.454 | 0 | 0.11330 | 3.816 | 0 | 0.10087 | 3.958 | 0 | 0.08931 | ||
| nSCAD | 3.488 | 0 | 0.13156 | 3.716 | 0 | 0.11831 | 3.960 | 0 | 0.10979 | ||
| 0.4 | LASSO | 3.324 | 0 | 0.18076 | 3.672 | 0 | 0.14687 | 3.924 | 0 | 0.11895 | |
| SCAD | 3.396 | 0 | 0.17887 | 3.744 | 0 | 0.14504 | 3.954 | 0 | 0.11742 | ||
| nSCAD | 3.064 | 0 | 0.30761 | 3.546 | 0 | 0.29106 | 3.836 | 0 | 0.27361 | ||
| 0.6 | LASSO | 3.132 | 0 | 0.26516 | 3.628 | 0 | 0.20858 | 3.868 | 0 | 0.16263 | |
| SCAD | 3.276 | 0 | 0.25738 | 3.732 | 0 | 0.19892 | 3.934 | 0 | 0.15858 | ||
| nSCAD | 2.694 | 0 | 0.60629 | 2.986 | 0 | 0.57842 | 3.468 | 0 | 0.55978 | ||
| 0.2 | LASSO | 3.416 | 0 | 0.11313 | 3.780 | 0 | 0.09946 | 3.964 | 0 | 0.08908 | |
| SCAD | 3.474 | 0 | 0.11247 | 3.806 | 0 | 0.09909 | 3.966 | 0 | 0.08878 | ||
| nSCAD | 3.396 | 0 | 0.13017 | 3.710 | 0 | 0.11741 | 3.968 | 0 | 0.10843 | ||
| 0.4 | LASSO | 3.278 | 0 | 0.17854 | 3.706 | 0 | 0.14404 | 3.924 | 0 | 0.11779 | |
| SCAD | 3.372 | 0 | 0.17591 | 3.782 | 0 | 0.14356 | 3.950 | 0 | 0.11729 | ||
| nSCAD | 3.150 | 0 | 0.30647 | 3.496 | 0 | 0.28994 | 3.808 | 0 | 0.27429 | ||
| 0.6 | LASSO | 3.088 | 0 | 0.26305 | 3.574 | 0 | 0.20768 | 3.86 | 0 | 0.16407 | |
| SCAD | 3.226 | 0 | 0.26080 | 3.67 | 0 | 0.20441 | 3.92 | 0 | 0.15910 | ||
| nSCAD | 2.656 | 0 | 0.6041 | 3.018 | 0 | 0.58127 | 3.522 | 0 | 0.56007 | ||
5.2. Real example analysis
We now describe the performance of the proposed method through an analysis of the AIDS dataset. This dataset contains variables such as the mean CD4 percentage, smoking status, the pre-HIV-infection CD4 percentage (preCD4) and age. It is unbalanced and available in the R package timereg. More details of the study design and medical implications can be found in [11]. It has been analyzed to illustrate partial linear varying coefficient models [20] and partial linear varying coefficient EV models [35]. Zhou and Liang [37] and Tian et al. [20] indicated that only the baseline function varies over time and that preCD4 has a constant effect over time. We now consider measurement errors in the covariates and analyze this dataset using the proposed method.
For simplicity, following Zhao and Xue [36], we considered the following model. Let Y be the individual's CD4 percentage, $X_1$ be the centered preCD4 percentage, and $X_2$ be the centered age at HIV infection:

$$Y(t)=\alpha_0(t)+\beta_1X_1+\beta_2X_1^{2}+\beta_3X_2+\beta_4X_2^{2}+\beta_5X_1X_2+\varepsilon(t),\tag{32}$$

where $\alpha_0(t)$ is the baseline CD4 percentage; $\beta_1$ and $\beta_2$ describe the first-order and second-order effects of the preCD4 percentage; $\beta_3$ and $\beta_4$ describe the first-order and second-order effects of the age at HIV infection; $\beta_5$ describes the interaction effect between the preCD4 percentage and the age at HIV infection; and t is the visiting time of each patient.
For the AIDS dataset, we cannot obtain repeated measurements of the covariates, so the variance of the measurement errors cannot be estimated directly. Following Lin and Carroll (2000), a sensitivity analysis can be used to test the practicability of the proposed method. Similar to Zhao and Xue [36], we considered $X_1$ and $X_2$ to be observed with additive measurement errors as follows:
$$W_1=X_1+u_1,\qquad W_2=X_2+u_2,$$
where $(u_1,u_2)^{\top}\sim N(0,\sigma^{2}I_2)$. In our work, we took several values of σ to represent different levels of measurement error; obviously, σ = 0 implies no measurement errors.
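The contamination step of this sensitivity analysis amounts to the following sketch (the seed and the helper name `contaminate` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2023)  # arbitrary seed

def contaminate(X, sigma):
    """Add N(0, sigma^2) measurement error to the covariates; sigma = 0
    returns the covariates unchanged, i.e. the error-free analysis."""
    if sigma == 0:
        return X.copy()
    return X + rng.normal(scale=sigma, size=X.shape)
```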
We repeated the proposed model selection procedure under the different measurement error levels, and the proposed method identified the same two non-zero components, the baseline function $\alpha_0(t)$ and the preCD4 effect $\beta_1$, every time. This means that the first-order and second-order effects of the age at HIV infection have no significant impact on the mean CD4 percentage; the same holds for the second-order effect of the centered preCD4 percentage and the interaction effect between the preCD4 percentage and the age at HIV infection. Our result is the same as in Zhao and Xue [35].
Figure 1 shows the curve of $\hat\alpha_0(t)$ over time under the different measurement errors. It shows that the baseline CD4 percentage decreases quickly at the beginning of HIV infection, after which the rate of decrease slows down, which is similar to the findings in Zhao and Xue [36]. Furthermore, we found that the estimated functional curve preserves its shape under the different measurement errors, which means that our bias-corrected model selection scheme works well. This further demonstrates that the proposed model estimation and selection method has good practical value.
Figure 1.
The curve of $\hat\alpha_0(t)$ for the three values of σ considered (solid, dashed and dotted curves).
6. Conclusion and discussion
Longitudinal data are widely used in many scientific fields, and it is of great significance to consider measurement error in longitudinal research. Longitudinal data have unknown within-subject correlations; thus, handling within-subject correlations and measurement errors is an important subject in the analysis of longitudinal data with measurement errors. In our work, we consider the case where the covariates of model (2) have additive measurement errors. For model (2), some scholars have done valuable research, such as [9,33,35–37]. However, no studies had been reported on simultaneous model estimation and selection for model (2) with longitudinal data. In our work, we proposed a bias-corrected penalized quadratic inference functions method for model estimation and selection for model (2) with longitudinal data. This method can deal with both within-subject correlations and measurement errors. Under some conditions, the proposed method can select the significant non-zero parameters and varying coefficients; furthermore, the estimators of the non-zero coefficient functions achieve the optimal convergence rate, and the estimators of the parameters are asymptotically normal. The finite-sample performance of the proposed method is demonstrated by the numerical studies. Finally, it can be concluded that the proposed method has good theoretical and practical value for the estimation and selection of model (2).
The proposed method can also be applied to other models, such as generalized partial linear additive models, generalized partial linear single index models and many others. In addition, the proposed method can also be used for other types of correlated data, such as panel data, clustered data and so on. In the future, we will apply this method to more complex models.
Acknowledgements
All authors read and approved the final manuscript. The funding information is given in the Funding Statement below.
Appendix. Proof of theorems.
Lemma 1. If C1-C11 hold and $K=O(n^{1/(2r+1)})$, then we have
Proof.
According to (17), we have
Denote the κth block matrix of as ,
Now, we prove as .
Clearly, according to the law of large numbers, we have and as the . So we get . Under C9, we can get . Now, let's prove that and .
Denote , where Obviously, we can get and
where . From C4-C7, we see that and are bounded. By the law of large numbers, we can get . Thus, we have and where .
Applying a Taylor expansion, we have
(A1) Denote the κth block matrix of as ,
where , .
Denote , where . According to C5-C7 and Lemma 1, we have and
By the law of large numbers, we get . Similarly, we have and .
Denote , where . And since are independent of each other, we have . According to the Cauchy-Schwarz inequality and C5-C7 we have
Thus, . By the law of large numbers, from the definition of , we have . From C8 and Lemma 1, we have and . So, according to (A1), we have .
Following [20], according to the results above, we have
where , . So we have
From C5-C7, we get . Following the properties of covariance matrix, we have
So satisfies the Lyapunov condition for the central limit theorem. Thus
According to the Slutsky Theorem, we have . The proof of Lemma 1 is completed.
Lemma 2
If C1-C11 hold, then
(A2)
(A3)
Proof.
The proof of Lemma 2 is similar to that of Lemma 2 in Tian et al. [20] and is omitted here.
Proof of Theorem 1.
Proof.
Let , , and . To prove Theorem 1, it is sufficient to show that , ∃ a large constant satisfies
(A4) Obviously, when , , (A4) is always true. Therefore, we consider the case that . Assume , and . Let , , we have
Applying a Taylor expansion, we have
where lies between β and . According to Lemma 1 and Lemma 2, we can get
and
Therefore, we have
Obviously, . When C is large enough,
So when C is large enough, . Next, by Taylor expansion, we get that
Then, is dominated by uniformly in for a sufficiently large .
Assume , and . When n is large enough, following Xue et al. [29], we have . According to the definition of the penalty function, we get
So, , ∃ a large enough satisfies (A4), which further implies that there exists satisfies . Note that
With the same arguments above, we can get . Therefore, invoking , we have .
Suppose C2 and C8 hold and , with the Corollary 6.21 in [18], ∃ a constant that satisfies
(A5) So we get . Thus, the proof of Theorem 1 is complete.
Proof of Theorem 2.
Proof.
Part (i). Denote . According to Theorem 1, similarly to [20], it suffices to show that, for any satisfying , any satisfying , and ∃ small , when , with probability tending to one, we have
(A6) and
(A7) Obviously, (A6) and (A7) imply that the minimizer of with respect to β is attained at .
According to Lemma 2, we have
Denote . According to Lemma 2, we have , where . Thus, we get
In addition, C11 implies that , and , which means that the sign of is the same as that of . So (A6) and (A7) hold, and the proof of part (i) is completed.
We then prove part (ii). Denote
where is a vector with all of components being zero.
To prove part (ii), it is sufficient to show that, for any and , holds with probability tending to 1.
where lies between and θ, . Furthermore, we get
Note that , from Lemma 1 and [29], we have and . According to Lemma 2 and Lemma 3, we have
From C10 and C11, for t lying between 0 and
Thus, for any and , holds with probability tending to 1. This proves part (ii), and the proof of Theorem 2 is completed.
Proof of Theorem 3.
Proof.
Let be the true value of . Let and be the true value of , and are the spline coefficients of and respectively. Then, Theorems 1 and 2 imply that attains the minimal value at and .
Denote , , and , write , we have
Denote . According to (22) we have
(A8) Applying a Taylor expansion to (A8), we have
(A9) where lies between and . Therefore, we have
(A10) Note that , , and . Applying a Taylor expansion, we have
C9 implies that , and note that as . Following [29], we know that for n large enough. Thus, and , which imply that and . So we have
From Lemma 1, we can get , . According to C9 and Lemma 2, we have
and
where .
Denote , Hence we can get
(A11)
According to the Slutsky Theorem, we can see that is consistent and asymptotically normal,
(A12) This completes the proof of Theorem 3.
Funding Statement
This work is supported by grants from the Social Science Foundation of China [grant number 15CTJ008 to M. Z.], the Natural Science Foundation of Anhui Universities [grant number KJ2017A433 to K. Z.], the Social Science Foundation of the Ministry of Education of China [grant number 19YJCZH250 to K. Z.], the National Science Foundation of China [grant numbers 12071305, 11871390 and 11871411 to Y. Z.], the Excellent Young Talents Fund Program of Higher Education Institutions of Anhui Province [grant number gxyqZD2019031 to Y. Z.], the National Science Foundation of China [grant number 71803001 to Y. Z.]. This paper is partially supported by the National Natural Science Foundation of China [grant number 11901401].
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Carroll R.J., Ruppert D., Stefanski L.A. and Crainiceanu C.M., Measurement Error in Nonlinear Models: a Modern Perspective, Chapman and Hall/CRC, New York, 2006. [Google Scholar]
- 2.Fan G.L., Xu H.X. and Huang Z.S., Empirical likelihood for semivarying coefficient model with measurement error in the nonparametric part, AStA Adv. Stat. Anal. 100 (2015), pp. 21–41. [Google Scholar]
- 3.Fan G.L., Xu H.X. and Liang H.Y., Empirical likelihood inference for partially time-varying coefficient errors-in-variables models, Electron. J. Stat. 6 (2012), pp. 1040–1058. [Google Scholar]
- 4.Fan J. and Huang T., Profile likelihood inferences on semiparametric varying-coefficient partially linear models, Bernoulli. 11 (2005), pp. 1031–1057. [Google Scholar]
- 5.Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Statist. Assoc. 96 (2001), pp. 1348–1360. [Google Scholar]
- 6.Feng S. and Xue L., Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition, Ann. Inst. Statist. Math. 66 (2014), pp. 121–140. [Google Scholar]
- 7.Hastie T. and Tibshirani R., Varying coefficient models, J. R. Stat. Soc. Ser. B. (Stat. Methodol.). 55 (1993), pp. 757–779. [Google Scholar]
- 8.He X., Zhu Z.Y. and Fung W.K., Estimation in a semiparametric model for longitudinal data with unspecified dependence structure, Biometrika 89 (2002), pp. 579–590. [Google Scholar]
- 9.Hu X., Wang Z. and Zhao Z., Empirical likelihood for semiparametric varying coefficient partially linear errors-in-variables models, Statist. Probab. Lett. 79 (2009), pp. 1044–1052. [Google Scholar]
- 10.Huang Z. and Zhang R., Empirical likelihood for nonparametric parts in semiparametric varying coefficient partially linear models, Statist. Probab. Lett. 79 (2009), pp. 1798–1808. [Google Scholar]
- 11.Kaslow R.A., Ostrow D.G., Detels R., Phair J.P., Polk B.F. and Rinaldo C.J., The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants, Am. J. Epidemiol. 126 (1987), pp. 310–318. [DOI] [PubMed] [Google Scholar]
- 12.Li Q., Huang C.J., Li D. and Fu T-T, Semiparametric smooth coefficient models, J. Bus. Econom. Statist. 20 (2002), pp. 412–422. [Google Scholar]
- 13.Li R. and Liang H., Variable selection in semiparametric regression modeling, Ann. Stat. 36 (2008), pp. 261. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Liang K.Y. and Zeger S.L., Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), pp. 13–22. [Google Scholar]
- 15.Park B.U., Mammen E., Lee Y.K. and Lee E.R., Varying coefficient regression models: a review and new developments, Int. Stat. Rev. 83 (2015), pp. 36–64. [Google Scholar]
- 16.Qu A. and Li R., Quadratic inference functions for varying coefficient models with longitudinal data, Biometrics. 62 (2006), pp. 379–391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Qu A., Lindsay B.G. and Li B., Improving generalised estimating equations using quadratic inference functions, Biometrika 87 (2000), pp. 823–836. [Google Scholar]
- 18.Schumaker L., Spline Functions: Basic Theory, Cambridge University Press, New York, 2007. [Google Scholar]
- 19.Tian R. and Xue L., Variable selection for semiparametric errors-in-variables regression model with longitudinal data, J. Stat. Comput. Simul. 19 (2013), pp. 1–16. [Google Scholar]
- 20.Tian R., Xue L. and Liu C., Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data, J. Multivariate Anal. 132 (2014), pp. 94–110. [Google Scholar]
- 21.Wang H., Li R. and Tsai C-L., Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika 94 (2007), pp. 553–568. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Wang H., Zou G. and Wan A.T., Model averaging for varying-coefficient partially linear measurement error models, Electron. J. Stat. 6 (2012), pp. 1017–1039. [Google Scholar]
- 23.Wang H.J., Zhu Z. and Zhou J., Quantile regression in partially linear varying coefficient models, Ann. Statist. 37 (2009), pp. 3841–3866. [Google Scholar]
- 24.Wang L., Li H. and Huang J.Z., Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements, J. Am. Statist. Assoc. 103 (2008), pp. 1556–1569. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wang X., Li G. and Lin L., Empirical likelihood inference for semi-parametric varying-coefficient partially linear EV models, Metrika. 73 (2011), pp. 171–185. [Google Scholar]
- 26.Wang Z. and Xue L., Variable selection for high dimensional partially linear varying coefficient errors-in-variables models, Hacet. J. Math. Stat. 48 (2019), pp. 213–229. [Google Scholar]
- 27.Wei C., Statistical inference for restricted partially linear varying coefficient errors-in-variables models, J. Statist. Plann. Inference 142 (2012), pp. 2464–2472. [Google Scholar]
- 28.Xia Y. and Da H., Block empirical likelihood for semiparametric varying-coefficient partially linear errors-in-variables models with longitudinal data, J. Probab. Stat. 168 (2013), pp. 175–186. [Google Scholar]
- 29.Xue L., Qu A. and Zhou J., Consistent model selection for marginal generalized additive model for correlated data, J. Am. Stat. Assoc. 105 (2010), pp. 1518–1530. [Google Scholar]
- 30.You J. and Zhou Y., Empirical likelihood for semiparametric varying-coefficient partially linear regression models, Statist. Probab. Lett. 76 (2006), pp. 412–422. [Google Scholar]
- 31.Zhang J., Feng Z., Xu P. and Liang H., Generalized varying coefficient partially linear measurement errors models, Ann. Inst. Statist. Math. 69 (2017), pp. 97–120. [Google Scholar]
- 32.Zhang W., Lee S.Y. and Song X., Local polynomial fitting in semivarying coefficient model, J. Multivariate Anal. 82 (2002), pp. 166–188. [Google Scholar]
- 33.Zhang W., Li G. and Xue L., Profile inference on partially linear varying-coefficient errors-in-variables models under restricted condition, Comput. Statist. Data Anal. 55 (2011), pp. 3027–3040. [Google Scholar]
- 34.Zhao M., Gao Y. and Cui Y., Variable selection for longitudinal varying coefficient errors-in-variables models, Comm. Statist. Theory Methods. 19 (2020), pp. 1–26. [Google Scholar]
- 35.Zhao P. and Xue L., Empirical likelihood inferences for semiparametric varying coefficient partially linear errors-in-variables models with longitudinal data, J. Nonparametr. Stat. 21 (2009), pp. 907–923. [Google Scholar]
- 36.Zhao P. and Xue L., Variable selection for semiparametric varying coefficient partially linear errors-in-variables models, J. Multivariate Anal. 101 (2010), pp. 1872–1883. [Google Scholar]
- 37.Zhou X. and Liang H., Statistical inference for semiparametric varying coefficient partially linear models with error-prone linear covariates, Ann. Statist. 37 (2009), pp. 427–458. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Zhou X., Zhao P. and Lin L., Empirical likelihood for parameters in an additive partially linear errors-in-variables model with longitudinal data, J. Korean Stat. Soc. 43 (2014), pp. 91–103. [Google Scholar]