Author manuscript; available in PMC: 2009 Aug 24.
Published in final edited form as: J Am Stat Assoc. 2007 Jun 1;102(478):632–641. doi: 10.1198/016214507000000095

Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function

Jianqing Fan 1, Tao Huang 2, Runze Li 3
PMCID: PMC2730591  NIHMSID: NIHMS103889  PMID: 19707537

Abstract

Improving efficiency for regression coefficients and predicting trajectories of individuals are two important aspects of the analysis of longitudinal data. Both involve estimation of the covariance function. Yet, challenges arise in estimating the covariance function of longitudinal data collected at irregular time points. A class of semiparametric models for the covariance function is proposed by imposing a parametric correlation structure while allowing a nonparametric variance function. A kernel estimator is developed for the estimation of the nonparametric variance function. Two methods, a quasi-likelihood approach and a minimum generalized variance method, are proposed for estimating parameters in the correlation structure. We introduce a semiparametric varying coefficient partially linear model for longitudinal data and propose an estimation procedure for model coefficients using a profile weighted least squares approach. Sampling properties of the proposed estimation procedures are studied and asymptotic normality of the resulting estimators is established. Finite sample performance of the proposed procedures is assessed by Monte Carlo simulation studies. The proposed methodology is illustrated by the analysis of a real data example.

Keywords: Kernel regression, local linear regression, profile weighted least squares, semiparametric varying coefficient model

1 Introduction

Estimation of covariance functions is an important issue in the analysis of longitudinal data. It features prominently in forecasting the trajectory of an individual response over time and is closely related to improving the efficiency of estimated regression coefficients. Challenges arise in estimating the covariance function because longitudinal data are frequently collected at irregular and possibly subject-specific time points. Interest in such challenges has surged in the recent literature. Wu and Pourahmadi (2003) proposed nonparametric estimation of large covariance matrices using a two-step estimation procedure (Fan and Zhang, 2000), but their method can deal only with balanced or nearly balanced longitudinal data. Recently, Huang et al. (2006) introduced a penalized likelihood method for estimating the covariance matrix when the design is balanced, and Yao, Müller and Wang (2005a, b) approached the problem from the point of view of functional data analysis.

In this paper, we consider a semiparametric varying-coefficient partially linear model:

y(t) = x(t)^T α(t) + z(t)^T β + ε(t),    (1.1)

where α(t) consists of p unknown smooth functions, β is a q-dimensional unknown parameter vector, and E{ε(t)|x(t), z(t)} = 0. Nonparametric models for longitudinal data (Lin and Carroll, 2000; Wang, 2003) can be viewed as a special case of model (1.1). Moreover, model (1.1) is a useful extension of the partially linear model, systematically studied by Härdle, Liang and Gao (2000), and of the time-varying coefficient model (Hastie and Tibshirani, 1993). It has been considered by Zhang, Lee and Song (2002), Xia, Zhang and Tong (2004) and Fan and Huang (2005) in the case of iid observations, and by Martinussen and Scheike (1999) and Sun and Wu (2005) for longitudinal data. It is a natural extension of the models studied by Lin and Carroll (2001) (with identity link), He, Zhu and Fung (2002), He, Fung and Zhu (2005), Wang, Carroll and Lin (2005), and Huang and Zhang (2004).

We focus on parsimonious modeling of the covariance function of the random error process ε(t) for the analysis of longitudinal data, when observations are collected at irregular and possibly subject-specific time points. We approach this by assuming that var{ε(t)|x(t), z(t)} = σ²(t), a nonparametric smooth function, while the correlation between ε(s) and ε(t) has the parametric form corr{ε(s), ε(t)} = ρ(s, t, θ), where ρ(s, t, θ) is a positive definite function of s and t, and θ is an unknown parameter vector.

The covariance function is fitted by a semiparametric model, which allows the random error process ε(t) to be nonstationary, as its variance function σ²(t) may be time-dependent. Compared with a fully nonparametric fit to the correlation function, defined in (6.1), our semiparametric model guarantees positive definiteness of the resulting estimate; it retains the flexibility of nonparametric modeling together with the parsimony and interpretability of parametric modeling. To improve the efficiency of the regression coefficients, one typically takes the weight matrix in the weighted least squares method to be the inverse of the estimated covariance matrix, so the requirement of positive definiteness becomes necessary. Our semiparametric model also allows a data analyst to easily incorporate prior information about the correlation structure, which can be used to improve the estimation efficiency of β. For example, let ρ0(s, t) be a working correlation function (e.g., working independence) and let {ρ(s, t, θ)} be a family of correlation functions, such as an AR or ARMA correlation structure, that contains ρ0; our method then chooses an appropriate θ to improve the efficiency of the estimator of β. Note that, to improve the efficiency, the family {ρ(s, t, θ)} need not contain the true correlation structure.

We will also introduce an estimation procedure for the variance function, and propose two approaches to estimating the unknown vector θ, motivated by two different principles. We also propose an estimation procedure for the regression function α(t) and the coefficient β using profile least squares. Asymptotic properties of the proposed estimators are investigated, and finite sample performance is assessed via Monte Carlo simulation studies. A real data example is used to illustrate the proposed methodology.

This paper is organized as follows. We propose estimation procedures for the variance function and the unknown parameters in the correlation matrix in Section 2. An efficient estimation procedure for α(t) and β based on profile least squares techniques is proposed in Section 3. Sampling properties of the proposed procedures are presented in Section 4. Simulation studies and a real data analysis are given in Section 5. All technical proofs are relegated to the Appendix.

2 Estimation of covariance function

Suppose that a random sample from model (1.1) consists of n subjects. For the i-th subject, i = 1,···, n, the response variable yi(t) and the covariates {xi(t), zi(t)} are collected at time points t = tij, j = 1, ···, Ji, where Ji is the total number of observations for the i-th subject. Denote

r_ij ≡ r_ij(α, β) = y_i(t_ij) − x_i(t_ij)^T α(t_ij) − z_i(t_ij)^T β,

and ri(α, β) = (ri1, ···, riJi)T. Here we adopt the notation rij(α, β) to emphasize the parameters α and β, although for true values of α and β, rij(α, β) = εi(tij).

To motivate the proposed estimation procedures below, pretend for the moment that εi is normally distributed with zero mean and covariance matrix Σi. Then, the logarithm of the likelihood function for α, β, σ2 and θ is

ℓ(α, β, σ², θ) = −(1/2) Σ_{i=1}^n log|Σ_i| − (1/2) Σ_{i=1}^n r_i(α, β)^T Σ_i^{−1} r_i(α, β)    (2.1)

after dropping a constant. Maximizing the log-likelihood function yields a maximum likelihood estimate (MLE) for the unknown parameters. The parameters can be estimated by iterating between estimation of (α, β) and estimation of (σ², θ). We shall discuss the estimation procedure for (α, β) in model (1.1) in detail in the next section. Thus, we may substitute their estimates into r_ij(α, β); the resulting r_ij(α̂, β̂) is computable and is denoted by r̂_ij for simplicity.

2.1 Estimation of variance function

We first propose an estimation procedure for σ2(t). Note that

σ²(t_ij) = E{ε²(t) | t = t_ij}.

A natural estimator for σ2(t) is the kernel estimator:

σ̂²(t) = [Σ_{i=1}^n Σ_{j=1}^{J_i} r̂²_ij K_{h1}(t − t_ij)] / [Σ_{i=1}^n Σ_{j=1}^{J_i} K_{h1}(t − t_ij)],

where K_{h1}(x) = h1^{−1} K(x/h1), K(·) is a kernel density function, and h1 is a smoothing parameter. Note that locally around a time point, few subjects contribute more than one data point to the estimation of σ²(t); thus, the estimator should behave locally as if the data were independent. Ruppert et al. (1997) studied local polynomial estimation of the variance function when observations are taken independently from the canonical nonparametric regression model Y = m(X) + ε with E(ε|X) = 0 and var(ε|X) = σ²(X). Fan and Yao (1998) further showed that the local linear fit of the variance function performs as well as the ideal estimator, a local linear fit to the true squared residuals {(Y_i − m(X_i))²}, allowing data to be taken from a stationary mixing process. A similar result was obtained by Müller and Stadtmüller (1993). The consistency and asymptotic behavior of σ̂²(t) will be studied in Theorem 4.2(B), from which we may choose an optimal bandwidth for σ̂²(t) using various existing bandwidth selectors for independent data (for example, Ruppert, Sheather and Wand, 1995).
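The kernel estimator above can be sketched in a few lines. The Gaussian kernel and the pooled-array data layout are assumptions for illustration; the kernel's normalizing constant cancels between numerator and denominator:

```python
import numpy as np

def kernel_variance(t, obs_times, sq_resid, h1):
    """Nadaraya-Watson estimate of sigma^2(t) from pooled squared residuals.

    obs_times, sq_resid: 1-D arrays pooling (t_ij, rhat_ij^2) over all
    subjects and visits.  A Gaussian kernel is assumed for K.
    """
    u = (t - np.asarray(obs_times)) / h1
    w = np.exp(-0.5 * u ** 2)                      # unnormalized Gaussian kernel
    return float(np.sum(w * np.asarray(sq_resid)) / np.sum(w))
```

With constant squared residuals the estimator returns that constant at any t, a quick sanity check on the weighting.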

2.2 Estimation of θ

Decompose the covariance matrix Σi into variance-correlation form, that is,

Σ_i = V_i C_i(θ) V_i,

where Vi = diag{σ (ti1), ···, σ (tiJi)} and Ci(θ) is the correlation matrix of εi, whose (k, l)-element equals ρ (tik, til, θ). To construct an estimator for θ, we maximize ℓ (α̂, β̂, σ̂2, θ) with respect to θ. In other words,

θ̂ = argmax_θ −(1/2) Σ_{i=1}^n {log|C_i(θ)| + r̂_i^T V̂_i^{−1} C_i^{−1}(θ) V̂_i^{−1} r̂_i},    (2.2)

where V̂_i = diag{σ̂(t_i1), ···, σ̂(t_iJ_i)} and r̂_i = (r̂_i1, ···, r̂_iJ_i)^T. The estimator in (2.2) is referred to as a quasi-likelihood (QL) estimator.
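Each subject's contribution to the objective in (2.2) is log|C_i(θ)| plus a quadratic form in the standardized residuals. A sketch of −2 times the θ-dependent part follows; the helper `corr_fn`, which builds C_i(θ) from a subject's time points, is a hypothetical argument:

```python
import numpy as np

def neg2_quasi_loglik(theta, resids, sigmas, times, corr_fn):
    """Sum over subjects of log|C_i(theta)| + z_i' C_i(theta)^{-1} z_i,
    where z_i = Vhat_i^{-1} rhat_i; minimizing this maximizes (2.2).

    resids[i], sigmas[i], times[i]: residuals, sigma-hat values and time
    points of subject i; corr_fn(t, theta) returns C_i(theta)."""
    total = 0.0
    for r, s, t in zip(resids, sigmas, times):
        C = corr_fn(np.asarray(t), theta)
        z = np.asarray(r) / np.asarray(s)          # Vhat_i^{-1} rhat_i
        _, logdet = np.linalg.slogdet(C)           # stable log-determinant
        total += logdet + z @ np.linalg.solve(C, z)
    return total
```

Under working independence (C_i the identity), the criterion reduces to the sum of squared standardized residuals, which gives an easy correctness check.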

Optimizing the QL may provide a good estimate of θ when the correlation structure is correctly specified, but when it is misspecified, the QL may not be the best criterion to optimize. We may, for example, be interested in improving the efficiency for β, treating α, σ² and θ as nuisance parameters. In such a case, we choose θ to minimize the estimated variance of β̂. For example, for a given working correlation function ρ0(s, t) (e.g., working independence), we can embed this function into a family of parametric models ρ(s, t, θ) (e.g., the autocovariance function of the ARMA(1, 1) model). Even though ρ(s, t, θ) might not be the true correlation function, we can always find a θ that improves the efficiency of β̂. More generally, suppose that the current working correlation function is ρ0(s, t; θ0), and let ρ1(s, t), ···, ρm(s, t) be a given family of correlation functions. We can always embed ρ0(s, t; θ0) into the family of correlation functions

ρ(s, t; θ) = τ0 ρ0(s, t; θ0) + τ1 ρ1(s, t) + ··· + τm ρm(s, t),

where θ = (θ0, τ0, ···, τm), and τ0 + ··· + τm = 1 with all τi ≥ 0. Thus, by optimizing the parameters θ0, τ0, ···, τm, the efficiency of the resulting estimator β̂ can be improved.

To fix ideas, let Γ(σ̂², θ) be the estimated covariance matrix of β̂ derived in (3.7) for a given working correlation function ρ(s, t, θ). Define the generalized variance of β̂ as the determinant of Γ(σ̂², θ). Minimizing the volume of the confidence ellipsoid {(β̂ − β)^T Γ^{−1}(σ̂², θ)(β̂ − β) < c}, for any positive constant c, is equivalent to minimizing the generalized variance. Thus, we may choose θ to minimize the volume of the confidence ellipsoid:

θ̂ = argmin_θ |Γ(σ̂², θ)|.    (2.3)

We refer to this approach as the minimum generalized variance (MGV) method.
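In the simplest case, the MGV selection (2.3) is a discrete minimization of the determinant of the estimated covariance matrix over candidate values of θ. A minimal sketch, where `gamma_fn` is a hypothetical callable returning the matrix Γ(σ̂², θ) of (3.7) for a given θ:

```python
import numpy as np

def mgv_select(theta_grid, gamma_fn):
    """Minimum generalized variance: return the theta in theta_grid that
    minimizes the determinant |Gamma(sigma2-hat, theta)|."""
    dets = [np.linalg.det(gamma_fn(th)) for th in theta_grid]
    return theta_grid[int(np.argmin(dets))]
```

A continuous optimizer can replace the grid once a good region is located; Section 5 compares both strategies.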

3 Estimation of regression coefficients

As mentioned in Section 2, the estimation of σ² and θ depends on the estimation of α(t) and β; on the other hand, improving the efficiency of the estimate of (α, β) relies on the estimation of σ² and θ. In practice, therefore, estimation proceeds in steps: initial estimates of (α(t), β) are constructed by ignoring within-subject correlation; with these initial estimates, σ²(t) and θ are estimated; finally, α(t) and β are re-estimated more efficiently using the estimates of σ²(t) and θ. In this section, we propose efficient estimates of α(t) and β using profile least squares techniques.

For a given β, let y*(t) = y(t) − z(t)Tβ. Then model (1.1) can be written as

y*(t) = x(t)^T α(t) + ε(t).    (3.1)

This is a varying coefficient model, studied by Fan and Zhang (2000) in the context of longitudinal data and by Hastie and Tibshirani (1993) for the case of iid observations. Thus, α(t) can be easily estimated by using any linear smoother. Here we employ local linear regression (Fan and Gijbels, 1996). For any t in a neighborhood of t0, it follows from Taylor’s expansion that

α_l(t) ≈ α_l(t0) + α̇_l(t0)(t − t0) ≡ a_l + b_l(t − t0),   for l = 1, ···, p.

Let K(·) be a kernel function and h a bandwidth. Thus, we find the local parameters (a_1, ···, a_p, b_1, ···, b_p) that minimize

Σ_{i=1}^n Σ_{j=1}^{J_i} [y*_i(t_ij) − Σ_{l=1}^p {a_l + b_l(t_ij − t0)} x_il(t_ij)]² K_h(t_ij − t0),    (3.2)

where K_h(·) = h^{−1} K(·/h). The local linear estimate of α(t0) is then simply α̂(t0; β) = (â_1, ···, â_p)^T. Note that because the data are localized in time, the covariance structure does not greatly affect the local linear estimator.
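The minimization in (3.2) is a weighted least squares problem in the local parameters (a, b). A sketch, assuming a Gaussian kernel and pooled data arrays (both assumptions for illustration):

```python
import numpy as np

def local_linear_alpha(t0, obs_times, X, ystar, h):
    """Local linear estimate of alpha(t0) by minimizing (3.2).

    obs_times: (N,) pooled time points t_ij; X: (N, p) covariate rows
    x_i(t_ij); ystar: (N,) responses y*_i(t_ij).  Returns the intercept
    part (a_1, ..., a_p), i.e. alpha-hat(t0).
    """
    u = obs_times - t0
    w = np.exp(-0.5 * (u / h) ** 2)                  # Gaussian kernel (assumed)
    D = np.hstack([X, X * u[:, None]])               # columns for a_l, then b_l
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(D * sw[:, None], ystar * sw, rcond=None)
    return coef[: X.shape[1]]                        # a-part only
```

When y* is exactly linear in t and x(t) ≡ 1, the local fit recovers the line exactly, which gives a simple check.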

The profile least-squares estimator of (α, β) has a closed form in the following matrix notation. Let y_i = (y_i(t_i1), ···, y_i(t_iJ_i))^T, X_i = (x_i(t_i1), ···, x_i(t_iJ_i))^T, Z_i = (z_i(t_i1), ···, z_i(t_iJ_i))^T, and m_i = (x_i(t_i1)^T α(t_i1), ···, x_i(t_iJ_i)^T α(t_iJ_i))^T. Denote y = (y_1^T, ···, y_n^T)^T, X = (X_1^T, ···, X_n^T)^T, Z = (Z_1^T, ···, Z_n^T)^T, and m = (m_1^T, ···, m_n^T)^T. Then model (3.1) can be written as

y − Zβ = m + ε,    (3.3)

where ε = (ε_1(t_11), ···, ε_n(t_nJ_n))^T. It is known that local linear regression yields an estimate of α(·) that is linear in the y*(t_ij) (Fan and Gijbels, 1996). Thus, the estimate of α(·) is linear in y − Zβ, and the estimate of m is of the form m̂ = S(y − Zβ). The matrix S is usually called the smoothing matrix of the local linear smoother and depends only on the observations {t_ij, x_i(t_ij), j = 1, ···, J_i, i = 1, ···, n}. Substituting m̂ into (3.3) results in the synthetic linear model

(I − S)y = (I − S)Zβ + ε,    (3.4)

where I is the identity matrix of order Σ_{i=1}^n J_i.

To improve efficiency for estimating β, we minimize the weighted least squares

(y − Zβ)^T (I − S)^T W (I − S)(y − Zβ),    (3.5)

where W is a weight matrix, called a working covariance matrix. As usual, misspecification of the working covariance matrix does not affect the consistency of the resulting estimate, but it does affect the efficiency. The weighted least squares estimator for β is

β̂ = {Z^T(I − S)^T W (I − S)Z}^{−1} Z^T(I − S)^T W (I − S)y.    (3.6)

This estimator is called the profile weighted least squares estimator. The profile least squares estimator for the nonparametric component is simply α̂ (·; β̂). Using (3.4), it follows that when the weight matrix does not depend on y,

cov{β̂ | t_ij, x_i(t_ij), z_i(t_ij)} = D^{−1} V D^{−1} ≡ Γ(σ², θ),    (3.7)

where D = Z^T(I − S)^T W (I − S)Z and V = cov{Z^T(I − S)^T W ε}. In practice, Γ(σ̂², θ) is estimated by a sandwich formula, taking V̂ = Z^T(I − S)^T W R W^T (I − S)Z, where R = diag{r_1 r_1^T, ···, r_n r_n^T} with r_i = y_i − ŷ_i. Speckman (1988) derived a partial residual estimator of β for partially linear models with independent and identically distributed data; the form of this estimator is the same as that in (3.5) with W set to the identity matrix. However, the partial residual approach is difficult to implement for model (1.1).
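Given the smoothing matrix S and the working covariance W, the estimator (3.6) reduces to one linear solve; a minimal sketch:

```python
import numpy as np

def profile_wls_beta(y, Z, S, W):
    """Profile weighted least squares estimator (3.6):
    beta = {Z'(I-S)' W (I-S) Z}^{-1} Z'(I-S)' W (I-S) y."""
    I = np.eye(len(y))
    M = (I - S).T @ W @ (I - S)           # shared middle factor of (3.6)
    return np.linalg.solve(Z.T @ M @ Z, Z.T @ M @ y)
```

With S = 0 and W = I this collapses to ordinary least squares, so a noiseless linear response recovers β exactly.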

4 Sampling properties

In this section, we investigate sampling properties of the profile weighted least squares estimator. The proposed estimation procedures are applicable under various formulations of how the longitudinal data are collected. Here we regard the collected data as a random sample from the population process {y(t), x(t), z(t)}, t ∈ [0, T]. To facilitate the presentation, we assume that the J_i, i = 1, ···, n, are independent and identically distributed with 0 < E(J_i) < ∞, and that given J_i, the t_ij, j = 1, ···, J_i, are independent and identically distributed according to a density f(t). Furthermore, suppose that the weight matrix W in (3.5) is block diagonal, i.e., W = diag{W_1, ···, W_n}, where W_i is a J_i × J_i matrix whose (u, v)-element is w(t_iu, t_iv) for a bivariate positive function w(·, ·). When the weight function w(·, ·) is data-dependent, we assume that it tends to a positive definite function in probability; thus, for simplicity, we assume that w(·, ·) is deterministic.

Let G(t) = E{x(t)x^T(t)}, Ψ(t) = E{x(t)z^T(t)}, and denote by

X̃_i = (Ψ^T(t_i1) G^{−1}(t_i1) x_i(t_i1), ···, Ψ^T(t_iJ_i) G^{−1}(t_iJ_i) x_i(t_iJ_i))^T.

Set

Δ_n = (1/n) Σ_{i=1}^n {Z_i − X̃_i}^T W_i {Z_i − X̃_i},   and   ξ_n = (1/n) Σ_{i=1}^n {Z_i − X̃_i}^T W_i ε_i,

where εi = (εi(ti1), ···, εi(tiJi))T. Let

A = E{(Z_1 − X̃_1)^T W_1 (Z_1 − X̃_1)},   and   B = E{(Z_1 − X̃_1)^T W_1 ε_1 ε_1^T W_1 (Z_1 − X̃_1)}.

Denote by α0(t) and β0 the true values of α(t) and β, respectively.

Theorem 4.1

Under the regularity conditions (1)–(5) in the Appendix, if the matrices A and B exist, and if A is positive definite, then as n → ∞,

√n(β̂ − β_0) = √n Δ_n^{−1} ξ_n + o_P(1) →^L N(0, A^{−1} B A^{−1}),

where n is the number of subjects.

When Wi is taken to be the inverse of the conditional variance-covariance matrix of εi given xi(tij) and zi(tij) for j = 1, ···, Ji, then A = B. In this case,

√n(β̂ − β_0) →^L N(0, B_0^{−1}),

where B0 = E{Z11)T cov−1 (ε1|X1, Z1)(Z11)}. It will be shown in the Appendix that for any weight matrix Wi,

A^{−1} B A^{−1} − B_0^{−1} ≥ 0,    (4.1)

where the symbol D ≥ 0 means that the matrix D is nonnegative definite. Thus, the most efficient estimator for β among the profile weighted least-squares estimates given in (3.6) is the one that uses the inverse of the true variance-covariance matrix of εi as the weight matrix Wi.

One could also use a working independence correlation structure, i.e. let W be a diagonal matrix. Under conditions of Theorem 4.1, the resulting estimate of β is still root n consistent.

Let μ_i = ∫ u^i K(u) du and ν_i = ∫ u^i K²(u) du. For a vector of functions α(u) of u, denote by α̇(u) = dα(u)/du and α̈(u) = d²α(u)/du² the componentwise derivatives. The following theorem presents the asymptotic normality of α̂(t) and σ̂²(t); its proof was given in the earlier version of this paper (Fan, Huang and Li, 2005).

Theorem 4.2

Suppose that conditions of Theorem 4.1 hold.

  (A) If nh⁵ = O(1) as n → ∞, then

    √(nh) {α̂(t) − α(t) − (1/2) μ_2 h² α̈(t)} →^L N(0, [ν_0 / {f(t) E(J_1)}] σ²(t) Γ^{−1}(t)).

  (B) Under conditions (5) and (6) in the Appendix, if c < nh_1⁵ < C and c < h/h_1 < C for some positive constants c and C, then, as n → ∞,

    √(nh_1) {σ̂²(t) − σ²(t) − b(t)} →^L N(0, v(t)),

    where the bias

    b(t) = (h_1²/2) {σ̈²(t) + 2 σ̇²(t) ḟ(t)/f(t)} μ_2,

    and the variance

    v(t) = var{ε²(t)} ν_0 / {f(t) E(J_1)}.

Since the parametric convergence rate of β̂ is faster than the nonparametric convergence rate of α̂(t), the asymptotic bias and variance of α̂(t) have forms similar to those for the varying-coefficient model (Cai, Fan and Li, 2000). The choice of the weight matrix W determines the efficiency of β̂, but it does not affect the asymptotic bias and variance of α̂(t).

From Theorem 4.2(B), the asymptotic bias and variance of σ̂²(t) do not depend on the choice of the weight matrix W. Therefore, one may use the residuals obtained under the working independence correlation matrix to estimate σ²(t); this is consistent with our empirical findings from the simulation studies, and in the next section σ²(t) is estimated using residuals obtained under working independence. Theorem 4.2(B) also implies that we may choose a bandwidth by modifying one of the existing bandwidth selectors for independent data.

5 Numerical comparison and application

In this section, we investigate finite sample properties of the estimators proposed in Sections 2 and 3 via Monte Carlo simulation. All simulation studies are conducted in Matlab. The finite sample performance of, and numerical comparisons for, the proposed estimates σ̂²(t), β̂ and α̂(t) were examined in the earlier version of this paper; see the technical report (Fan, Li and Huang, 2005) for details. To save space, we focus on inference for β in this section.

5.1 Simulation study

We generate 1000 data sets, each consisting of n = 50 subjects, from the following model:

y(t)=x(t)Tα(t)+z(t)Tβ+ε(t). (5.1)

In practice, observation times are usually scheduled but may be randomly missed. Thus, we generate the observation times in the following way. Each individual has a set of ‘scheduled’ time points, {0,1,2,…,12}, and each scheduled time, except time 0, has a 20% probability of being skipped. The actual observation time is a random perturbation of a scheduled time: a uniform [0, 1] random variable is added to a non-skipped scheduled time. This results in different observed time points tij per subject.
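The visit-time mechanism described above can be sketched as follows, using numpy's random Generator (the 20% skip probability and the U[0, 1] jitter are exactly as stated; the function name is ours):

```python
import numpy as np

def simulate_visit_times(rng):
    """Scheduled visits at 0, 1, ..., 12; every visit after time 0 is
    skipped with probability 0.2, and each kept visit is jittered by a
    uniform [0, 1] perturbation."""
    times = []
    for s in range(13):
        if s > 0 and rng.uniform() < 0.2:   # missed scheduled visit
            continue
        times.append(s + rng.uniform())     # perturbed observation time
    return np.array(times)
```

Since kept scheduled times differ by at least 1 and the jitter is below 1, the generated times are always strictly increasing.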

In our simulation, the random error process ε(t) in (5.1) is taken to be a Gaussian process with zero mean, variance function

σ2(t)=0.5exp(t/12),

and ARMA(1,1) correlation structure

corr{ε(s), ε(t)} = γρ^{|t−s|}

for s ≠ t. We consider three pairs of (γ, ρ), namely, (0.85, 0.9), (0.85, 0.6) and (0.85, 0.3), which correspond to strongly, moderately and weakly correlated errors, respectively.
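For a subject observed at times t_i1, ···, t_iJ_i, this structure determines the whole correlation matrix C_i; a minimal sketch:

```python
import numpy as np

def arma11_corr(times, gamma, rho):
    """ARMA(1,1)-type correlation matrix: 1 on the diagonal and
    gamma * rho^|t - s| for distinct time points s, t."""
    lags = np.abs(np.subtract.outer(times, times))  # |t - s| for every pair
    C = gamma * rho ** lags
    np.fill_diagonal(C, 1.0)                        # corr(eps(t), eps(t)) = 1
    return C
```

Setting γ = 1 recovers the AR(1) structure, which is how the AR(1) working correlation in Table 2 relates to the true model.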

We let α(t) and β both be two-dimensional in our simulation, and set x1(t) ≡ 1 to include an intercept term. The covariates are generated as follows: for a given t, (x2(t), z1(t))^T follows a bivariate normal distribution with means zero, variances one and correlation 0.5, and z2(t) is a Bernoulli random variable with success probability 0.5, independent of x2(t) and z1(t). In this simulation, we set β = (1, 2)^T,

α1(t) = t/12,   and   α2(t) = sin(2πt/12).

Presumably we can gain some efficiency by incorporating the correlation structure, and it is of interest to quantify the gain. We consider the case in which the working correlation structure is taken to be the true one, the ARMA(1,1) structure. For comparison, we also estimate β using the working independence correlation structure and using the true correlation structure with the parameter (γ, ρ)^T set to its true value. The profile weighted least squares estimate using the true correlation is the most efficient among the profile weighted least squares estimates and serves as a benchmark, while the working independence correlation structure is the one commonly used in practice.

Table 1 presents a summary of the results over 1000 simulations. In Table 1, "Bias" is the sample average of the 1000 estimates minus the true value of β, and "SD" is the sample standard deviation of the 1000 estimates. "Median" is the median of the 1000 estimates minus the true value, and "MAD" is the median absolute deviation of the 1000 estimates divided by 0.6745. From Table 1, both the QL and MGV approaches yield estimates of β as good as the estimate using the true correlation function, and both are much better than the estimate using the working independence correlation structure. The relative efficiency (MAD(Indep.)/MAD(QL)) is about 3 for strongly correlated random errors, 2 for moderately correlated errors and 1.3 for weakly correlated errors.

Table 1.

Performance of β̂*

β̂1 β̂2
Method SD Bias MAD Median SD Bias MAD Median
(γ, ρ) = (0.85, 0.9)
Indep. 47.780 −1.9730 44.575 −1.2802 82.488 −1.7276 79.580 −2.7890
True 25.061 −1.2565 25.905 −0.7676 45.003 0.1211 45.543 −0.1568
QL 25.156 −1.2545 25.536 −0.7709 44.932 0.1749 44.654 −0.6489
MGV 25.205 −1.2040 25.575 −0.9126 45.585 0.2663 45.033 −0.5308

(γ, ρ) = (0.85, 0.6)
Indep. 47.499 −2.6415 49.465 −0.8980 82.094 −1.1161 82.553 −3.0444
True 34.308 −1.6807 34.569 −1.5081 62.596 −0.2047 61.871 −0.3016
QL 46.365 −0.2651 34.807 −1.2672 62.650 −0.0023 62.485 −0.3322
MGV 34.634 −1.3411 35.450 −0.5676 64.393 −0.2691 61.090 −1.8051

(γ, ρ) = (0.85, 0.3)
Indep. 46.991 −2.8990 47.457 −1.6817 81.798 −1.0896 83.991 −1.2721
True 40.123 −1.9687 40.184 −2.1143 73.031 −0.5122 73.278 0.1861
QL 95.506 −6.7632 41.841 −1.9187 288.389 −5.7357 77.514 0.1459
MGV 40.389 −1.6740 40.685 −1.4153 74.798 −0.5055 73.465 0.1435
*

Values in the columns of SD, bias, MAD and median are multiplied by a factor of 1000

The simulation results also indicate that the MGV method is more stable and robust than the QL method. This is evidenced in the case of weakly correlated random errors, in which, for a few realizations, the estimates were apparently quite bad (the SD is much larger than the MAD). Note that the objective function in (2.2) may not be a concave function of θ, so the numerical algorithm may not have converged when it stops; this can yield a bad estimate of β and contributes to the robustness issues of the algorithm. In addition, the QL criterion is similar to a least-squares criterion and hence is not very robust. On the other hand, the MGV method, aiming directly at minimizing the generalized variance of the estimate, does not allow estimates to have large standard errors.

We next study the impact of misspecification of the correlation structure by comparing the performance of β̂ under independent and AR(1) working correlation structures when the true correlation structure is ARMA(1,1). The top panel of Table 2 summarizes the simulation results. From Table 2, the AR(1) working correlation structure produces a much more efficient estimate than the working independence correlation structure; for example, the relative efficiency for strongly correlated random errors is about (30.066/19.975)² ≈ 2.3. Thus, even when the true correlation structure is unavailable, it is still quite desirable to choose a structure close to the truth.

Table 2.

Impacts of Misspecification of Correlation on β̂*

β̂1 β̂2
Method SD Bias MAD Median SD Bias MAD Median
Optimization Algorithm Search
(γ, ρ) = (0.85, 0.9)
Indep. 47.7800 − 1.9730 44.5759 − 1.2802 82.4880 − 1.7276 79.5815 − 2.7890
QL 31.8570 − 0.4859 29.6149 − 0.0837 60.8860 − 0.0684 54.6946 0.2275
MGV 33.1210 − 0.5275 31.8003 0.0535 63.4840 0.3800 58.0557 0.7224

(γ, ρ) = (0.85, 0.6)
Indep. 47.4990 − 2.6415 49.4655 − 0.8980 82.0940 − 1.1161 82.5541 − 3.0444
QL 37.0470 − 1.0667 36.2184 − 0.8100 68.9890 − 0.0925 61.8111 − 1.6883
MGV 37.9660 − 1.1648 36.9805 − 1.0777 71.3970 0.1137 65.4138 − 2.2338

(γ, ρ) = (0.85, 0.3)
Indep. 46.9910 − 2.8990 47.4580 − 1.6817 81.7980 − 1.0896 83.9923 − 1.2721
QL 41.0240 − 1.6139 40.7671 − 1.0700 74.8700 − 0.3320 73.7801 0.0931
MGV 42.4130 − 1.6264 42.2526 − 0.7556 79.1190 − 0.1797 73.2301 − 2.0012

Rough Grid Point Search
(γ, ρ) = (0.85, 0.9)
Indep. 47.7800 − 1.9730 44.5759 − 1.2802 82.4880 − 1.7276 79.5815 − 2.7890
QL 31.9390 − 0.4489 29.3436 − 0.0714 60.7410 0.0272 54.9896 0.7295
MGV 33.2930 − 0.5232 31.4578 − 0.2297 63.8040 0.4463 58.2365 0.6303

(γ, ρ) = (0.85, 0.6)
Indep. 47.4990 − 2.6415 49.4655 − 0.8980 82.0940 − 1.1161 82.5541 − 3.0444
QL 37.2570 − 1.1533 36.4912 − 1.1077 6.9254 0.2263 63.1202 − 1.2543
MGV 40.6740 − 1.1390 39.7678 − 1.5885 77.0800 0.5339 71.0284 0.7412

(γ, ρ) = (0.85, 0.3)
Indep. 46.9910 − 2.8990 47.4580 − 1.6817 81.7980 − 1.0896 83.9923 − 1.2721
QL 41.3200 − 1.6483 40.9850 − 1.8832 75.1910 0.0079 73.4095 0.3885
MGV 48.4380 − 1.5369 47.8895 − 1.9910 91.9430 0.2399 84.2413 1.8811
*

Values in the columns of SD, bias, MAD and median are multiplied by a factor of 1000.

In practice, one may try several values of ρ and choose the best one by the QL or MGV criterion rather than running an optimization algorithm. We refer to this as a rough grid point search. We next examine how such a search works in practice, using the grid {0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95} for ρ. The bottom panel of Table 2 presents the simulation results. Comparing the bottom panel with the top panel of Table 2, the performance of the resulting estimates using the rough grid point search is very close to that using an optimization algorithm.
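The rough grid point search is a one-line discrete minimization over the candidate ρ values; a sketch, where `criterion` is a hypothetical callable standing for the QL or MGV objective evaluated at a given ρ:

```python
def grid_search_rho(criterion, grid=(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)):
    """Evaluate the criterion at each grid value of rho and return the best."""
    return min(grid, key=criterion)
```

The grid here is the one used in the simulation; any monotone transformation of the criterion yields the same selection.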

Now we test the accuracy of the proposed standard error formula (3.7). Table 3 depicts the simulation results for the case (γ, ρ) = (0.85, 0.9). Results for other cases are similar. In Table 3, “SD” stands for the sample standard deviation of 1000 estimates of β and can be viewed as the true standard deviation of the resulting estimate. “SE” stands for the sample average of 1000 estimated standard errors using formula (3.7), and “Std” presents the standard deviation of these 1000 standard errors. From Table 3, the standard error formula works very well for both correctly specified and misspecified correlation structures.

Table 3.

Standard Errors

β̂1 β̂2
SD SE (Std) SD SE (Std)
ARMA(1,1) working correlation matrix
Independence 0.0478 0.0464(0.0065) 0.0825 0.0800 (0.0108)
QL 0.0252 0.0254 (0.0030) 0.0449 0.0440 (0.0047)
MGV 0.0252 0.0257 (0.0031) 0.0456 0.0446 (0.0049)

AR(1) working correlation matrix
QL 0.0319 0.0307 (0.0078) 0.0609 0.0541 (0.0131)
MGV 0.0331 0.0316 (0.0084) 0.0635 0.0557 (0.0141)

5.2 Some comparison with traditional approach

The purpose of this section is to demonstrate the flexibility and efficiency of model (1.1) by comparing its performance with linear models for longitudinal data:

y(t) = x(t)^T α + z(t)^T β + ε(t),    (5.2)

which can be viewed as a special case of model (1.1) with constant α(·). We employed the weighted least squares method to estimate α and β in model (5.2). To make a fair comparison, we generated 1000 data sets, each consisting of n = 50 subjects, from model (5.1) with

Case I: α1(t) = t/12 and α2(t) = sin(2πt/12). This is exactly the same as in Section 5.1.

Case II: α1(t) = 2 and α2(t) = 1. That is, both α1(t) and α2(t) are constant functions.

All other parameters and the generation scheme of the observation times are the same as in Section 5.1.

To illustrate the flexibility of model (1.1), we fit data generated under the setting of Case I using the linear model (5.2). The error correlation structure is no longer ARMA when model (5.2) is fitted under the setting of Case I; thus, we did not include the "True" correlation structure in this simulation. Simulation results are summarized in the top panel of Table 4, with the same caption conventions as Tables 1 and 2. To save space, we present only the results for (γ, ρ) = (0.85, 0.6); results for other (γ, ρ) pairs are similar. Compared with the results in Tables 1 and 2, misspecifying α(t) may yield an estimate with larger bias and lower efficiency.

Table 4.

Comparison to linear model*

β̂1 β̂2

Model Correlation Method MAD Median(bias) MAD Median(bias)
Case I: α1(t)=t/12, α2(t) = sin(2πt/12) and (γ, ρ) = (0.85, 0.6)
(5.2) Indep. 62.5252 − 3.7274 102.7234 2.3116
ARMA(1,1) QL 43.3809 − 5.2141 75.1942 1.0293
ARMA(1,1) MGV 60.7346 − 4.1006 98.3291 1.8330
AR(1) QL 52.8866 − 2.2510 93.2711 − 3.9622
AR(1) MGV 59.9004 − 3.2324 96.4753 1.1836

Case II: α1(t) = 2, α2(t) = 1 and (γ, ρ) = (0.85, 0.6)
(5.2) Indep. 47.7871 − 3.1878 82.1597 − 2.2803
ARMA(1,1) True 32.9404 − 1.9984 61.6268 0.5491
ARMA(1,1) QL 33.1803 − 2.8015 61.8600 0.1782
ARMA(1,1) MGV 47.0792 − 1.2353 76.6334 − 0.6911
AR(1) QL 35.1901 − 0.8354 64.2013 − 0.3883
AR(1) MGV 47.0576 − 1.4820 76.8226 − 0.8559

(1.1) Indep. 49.4474 − 1.0333 82.7413 − 3.0255
ARMA(1,1) True 34.3453 − 1.6239 63.0509 0.2820
ARMA(1,1) QL 35.3995 − 1.7503 62.9548 − 0.5040
ARMA(1,1) MGV 35.6286 − 0.3130 62.1033 − 2.5856
AR(1) QL 36.2746 − 0.8732 63.2304 − 1.2967
AR(1) MGV 39.8883 − 0.8650 72.1075 1.2003
*

Values in the columns of MAD and median are multiplied by a factor of 1000

Simulation results for models (5.2) and (1.1) in Case II are summarized in the middle and bottom panels of Table 4, respectively. The biases of the resulting estimates are of the same order of magnitude for all estimation procedures. Comparing model (1.1) with model (5.2) under the independent working correlation matrix and under the true/QL ARMA(1,1) correlation matrix, the proposed model does not lose much efficiency. In summary, the proposed estimation procedure with model (1.1) offers a good balance between model flexibility and estimation efficiency.

5.3 An application

We next demonstrate the newly proposed procedures by an analysis of a subset of data from the Multi-Center AIDS Cohort study. The data set contains the human immunodeficiency virus (HIV) status of 283 homosexual men who were infected with HIV during the follow-up period between 1984 and 1991. This data set has been analyzed by Fan and Zhang (2000) and Huang, Wu and Zhou (2002) using functional linear models. Details of the study design, methods, and medical implications are given by Kaslow et al. (1987).

All participants were scheduled to have their measurements taken during semiannual visits, but, because many participants missed some of their scheduled visits and the HIV infections occurred randomly during the study, there are unequal numbers of repeated measurements and different measurement times per individual. Our interest is to describe the trend of the mean CD4 percentage depletion over time and to evaluate the effects of cigarette smoking, pre-HIV infection CD4 percentage, and age at infection on the mean CD4 percentage after the infection. Huang, Wu and Zhou (2002) took the response y(t) to be CD4 cell percentage and considered the functional linear model,

y(t) = β_0(t) + β_1(t) Smoking + β_2(t) Age + β_3(t) PreCD4 + ε(t). (5.3)

The results of the hypothesis testing in Huang, Wu and Zhou (2002) indicate that the baseline function varies over time; that neither Smoking nor Age has a significant impact on the mean CD4 percentage; and that it is unclear whether PreCD4 has a constant effect over time. The P-value for testing whether β_3(t) varies over time is 0.059. Thus, we fit the data using the simpler semiparametric varying coefficient partially linear model

y(t) = α_1(t) + α_2(t) X_1 + β_1 Z_1 + β_2 Z_2 + ε(t),

where, for numerical stability, X1 is the standardized variable for PreCD4, Z1 is the smoking status (1 for a smoker and 0 for a nonsmoker), Z2 is the standardized variable for age, and the unit for observation time t is one month.

Bandwidth selection

We employ a multifold cross-validation method to select a bandwidth for α̂(t). We partition the data into Q groups, each containing approximately the same number of subjects. For each k, k = 1, ···, Q, model (5.3) is fitted to the data excluding the k-th group. The cross-validation score is defined as the sum of squared residuals:

CV(h) = Σ_{k=1}^Q Σ_{i∈d_k} Σ_{j=1}^{J_i} {y_i(t_{ij}) − ŷ_{d_k}(t_{ij})}^2,

where ŷ_{d_k}(t_{ij}) is the fitted value for the i-th subject at observed time t_{ij} with the data in d_k deleted, using a working independence correlation matrix. In the implementation, we choose Q = 15. Figure 1(a) depicts the cross-validation score function CV(h), which attains its minimum at the bandwidth h = 21.8052. Note that σ̂^2(t) is a one-dimensional kernel regression of the squared residuals over time, so various bandwidth selectors for one-dimensional smoothing can be used to choose its bandwidth. In this application, we directly use the plug-in bandwidth selector (Ruppert, Sheather and Wand, 1995), which yields h_1 = 12.7700.
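The multifold cross-validation score described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it substitutes a simple Nadaraya-Watson smoother for the profile local linear fit, and the data layout, kernel choice, and random group assignment are assumptions.

```python
import numpy as np

def epanechnikov(u):
    # a symmetric density kernel with compact support (Appendix, condition 1)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def kernel_fit(t_train, y_train, t_eval, h):
    # Nadaraya-Watson estimate at each point of t_eval, bandwidth h
    w = epanechnikov((t_eval[:, None] - t_train[None, :]) / h)
    sw = w.sum(axis=1)
    sw[sw == 0] = np.finfo(float).eps  # guard against empty kernel windows
    return (w * y_train[None, :]).sum(axis=1) / sw

def cv_score(subjects, h, Q=15, seed=0):
    """CV(h): subjects are split into Q groups; each group is predicted
    from a fit on the remaining groups, and squared residuals are summed.
    `subjects` is a list of (t_i, y_i) arrays, one pair per subject."""
    rng = np.random.default_rng(seed)
    groups = rng.integers(0, Q, size=len(subjects))
    score = 0.0
    for k in range(Q):
        t_tr = np.concatenate([t for (t, _), g in zip(subjects, groups) if g != k])
        y_tr = np.concatenate([y for (_, y), g in zip(subjects, groups) if g != k])
        for (t_i, y_i), g in zip(subjects, groups):
            if g == k:  # subject i belongs to the deleted group d_k
                score += np.sum((y_i - kernel_fit(t_tr, y_tr, t_i, h)) ** 2)
    return score
```

One would then evaluate cv_score over a grid of candidate bandwidths and take the minimizer, as done in the application with Q = 15.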

Figure 1.

(a) Plot of the cross-validation score against the bandwidth. (b), (c) Estimates of α_1(t) and α_2(t) with bandwidth 21.8052, chosen by the cross-validation method. (d) Estimate of σ(t) with bandwidth 12.7700, chosen by the plug-in method.

Estimation

The resulting estimate of α(t) is depicted in Figures 1(b) and (c). The intercept function decreases as time increases, implying that the overall trend of the CD4 cell percentage is downward over time. The trend of α_2(t) implies that the impact of PreCD4 on the CD4 cell percentage decreases gradually as time evolves. These results are consistent with our expectations: they quantify the extent to which the mean CD4 percentage depletes over time and how the association between CD4 percentage and PreCD4 varies as time evolves. The resulting estimate σ̂(t) is depicted in Figure 1(d), from which we can see that σ(t) appears constant during the first year and a half and then increases with time. This shows that the CD4 percentage becomes harder to predict as time evolves.

We next estimate β, considering an ARMA(1,1) correlation structure. The estimation procedures proposed in Section 2 were applied to estimate (γ, ρ). The resulting estimates are displayed in the top panel of Table 5, and the corresponding estimates of β are given in the bottom panel. The quasi-likelihood approach yields a correlation structure with moderate correlation, and the standard error of the resulting estimate of β is smaller than that under the independence working correlation. The minimum generalized variance method yields a correlation structure with low correlation, but the corresponding standard error is still smaller than under independence. From Table 5, the effects of smoking status and age are not significant under any of the three estimation schemes.

Table 5.

Estimates of (γ, ρ) and β

        Independence        QL                 MGV
γ̂       —                  0.8575             0.5334
ρ̂       —                  0.9852             0.0804

β̂1      0.8726 (1.1545)    0.6848 (0.9972)    0.6328 (1.0864)
β̂2      −0.5143 (0.6110)   0.0556 (0.4718)    −0.3658 (0.5488)

Standard errors for β̂ are given in parentheses.

Prediction of individual trajectory

We now illustrate how to incorporate correlation information into prediction. Assume that, given the covariates x(t) and z(t), the error process ε(t) is a Gaussian process with zero mean and covariance function c(t, s). Denote μ(t) = x(t)^T α(t) + z(t)^T β. Suppose that data for an individual are collected at t = t_1, ···, t_J and we want to predict his/her y(t) at t = t* with covariates x(t*) and z(t*). Let y_o = (y(t_1), ···, y(t_J))^T be the observed response and μ = (μ(t_1), ···, μ(t_J))^T its associated mean. Let Σ be the covariance matrix of (ε(t_1), ···, ε(t_J))^T and c_* = (c(t_1, t*), ···, c(t_J, t*))^T. Then, by the properties of the multivariate normal distribution, we have

E{y(t*) | y_o} = μ(t*) + c_*^T Σ^{-1} (y_o − μ),

and

var{y(t*) | y_o} = σ^2(t*) − c_*^T Σ^{-1} c_*.

Thus, the prediction of y(t*) is

ŷ(t*) = μ̂(t*) + ĉ_*^T Σ̂^{-1} (y_o − μ̂).

Since the errors in estimating the unknown regression coefficients and the parameters of the covariance matrix are negligible relative to the random error, the (1 − α)100% predictive interval is

ŷ(t*) ± z_{1−α/2} {σ̂^2(t*) − ĉ_*^T Σ̂^{-1} ĉ_*}^{1/2},

where z_{1−α/2} is the (1 − α/2) quantile of the standard normal distribution. In particular, it is easy to verify that when t* is one of the observed time points, the prediction error is zero, a desired property.
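The prediction formulas can be sketched as follows. This is a minimal sketch under stated assumptions: the error correlation is taken to be γρ^|t−s| for s ≠ t (an ARMA(1,1)-type form; the exact parameterization used in the paper is not restated here), and the function names and the variance function passed in are illustrative.

```python
import numpy as np

def predict_trajectory(t_obs, y_obs, mu_obs, t_star, mu_star, sigma, gamma, rho):
    """Best linear prediction of y(t*) from one individual's observed data.

    `sigma` is a callable giving the standard deviation function sigma(t);
    the error correlation is assumed to be gamma * rho**|t - s| for s != t.
    Returns the prediction and its 95% predictive interval."""
    t_obs, y_obs, mu_obs = map(np.asarray, (t_obs, y_obs, mu_obs))
    sd = sigma(t_obs)
    lag = np.abs(t_obs[:, None] - t_obs[None, :])
    R = gamma * rho ** lag          # working correlation, off-diagonal entries
    np.fill_diagonal(R, 1.0)        # correlation 1 at identical time points
    Sigma = sd[:, None] * R * sd[None, :]   # covariance of the observed errors
    corr_star = np.where(np.isclose(t_obs, t_star), 1.0,
                         gamma * rho ** np.abs(t_obs - t_star))
    c = sigma(t_star) * sd * corr_star      # vector c* = cov{eps(t_j), eps(t*)}
    y_hat = mu_star + c @ np.linalg.solve(Sigma, y_obs - mu_obs)
    pred_var = max(sigma(t_star) ** 2 - c @ np.linalg.solve(Sigma, c), 0.0)
    half = 1.959963984540054 * np.sqrt(pred_var)  # z_{0.975} for a 95% interval
    return y_hat, (y_hat - half, y_hat + half)
```

When t* coincides with an observed time point, c equals the corresponding column of Sigma, so the prediction reproduces the observation with zero prediction error, matching the remark above.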

We now apply the prediction procedure to this application. Assume that ε(t) has an ARMA(1,1) correlation structure. As an illustration, we consider only the prediction with (γ, ρ) estimated by the quasi-likelihood approach, that is, (γ̂, ρ̂) = (0.8575, 0.9852). Predictions and their 95% predictive intervals for four typical subjects are displayed in Figure 2.

Figure 2.

Figure 2

Plot of pointwise predictions and their 95% predictive intervals for four typical subjects. The solid line is the prediction, the dash-dot lines are the limits of the 95% pointwise predictive interval, and “o” marks the observed values of y(t).

6 Discussions

In this paper, we proposed a class of semiparametric models for the covariance function of longitudinal data. We further developed an estimation procedure for σ^2(t) using kernel regression, estimation procedures for the parameter θ in the correlation matrix using the quasi-likelihood and minimum generalized variance approaches, and an estimation procedure for the regression coefficients α(t) and β using profile weighted least squares. Robust estimation procedures have been proposed for semiparametric regression modeling with longitudinal data (He, Zhu and Fung, 2002; He, Fung and Zhu, 2005); in the presence of outliers, one should consider such robust methods to estimate α(t) and β.

Although misspecification of the correlation structure ρ(s, t, θ) does not affect the consistency of the resulting estimates of α(t) and β, it may lead to nonexistence or inconsistency of the estimates of θ. Thus, it is of interest to check whether the imposed correlation structure is approximately correct. To address this issue, we may consider a fully nonparametric estimate of the correlation function ρ(s, t):

ρ̂(s, t) = {Σ_{i=1}^n Σ_{j≠j′}^{J_i} ê_i(t_{ij}) ê_i(t_{ij′}) K_{h_2}(s − t_{ij}) K_{h_2}(t − t_{ij′})} / {Σ_{i=1}^n Σ_{j≠j′}^{J_i} K_{h_2}(s − t_{ij}) K_{h_2}(t − t_{ij′})} (6.1)

for s ≠ t, where ê_i(t_{ij}) = ε̂_i(t_{ij})/σ̂(t_{ij}) is the standardized residual.

The nonparametric covariance estimator is not guaranteed to be positive definite, but it may be useful for specifying an approximate correlation structure or for checking whether the imposed correlation structure ρ(s, t, θ) is approximately correct. This is a two-dimensional smoothing problem, and the number of effective data points in (6.1) can be small unless the time points are nearly balanced across subjects.
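A direct implementation of (6.1) can be sketched as follows; the kernel choice and the data layout are assumptions, and (as noted above) the resulting surface is a diagnostic rather than a valid correlation function.

```python
import numpy as np

def epanechnikov(u):
    # compactly supported symmetric density kernel
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def rho_hat(s, t, subjects, h2):
    """Kernel estimate (6.1) of the correlation function at (s, t), s != t.

    `subjects` is a list of (t_i, e_i) pairs, where e_i holds the
    standardized residuals e_i(t_ij) = eps_i(t_ij) / sigma(t_ij)."""
    num = 0.0
    den = 0.0
    for t_i, e_i in subjects:
        Ks = epanechnikov((s - t_i) / h2) / h2   # K_{h2}(s - t_ij)
        Kt = epanechnikov((t - t_i) / h2) / h2   # K_{h2}(t - t_ij')
        # sum over all pairs (j, j'), then subtract the j == j' diagonal
        num += (e_i * Ks).sum() * (e_i * Kt).sum() - (e_i ** 2 * Ks * Kt).sum()
        den += Ks.sum() * Kt.sum() - (Ks * Kt).sum()
    return num / den if den > 0 else np.nan
```

Evaluating rho_hat on a grid of (s, t) values gives a surface that can be compared with the fitted parametric structure ρ(s, t, θ̂).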

Alternative estimation procedures for α(t) and β may also be considered. For example, an alternative strategy for estimating β is first to decorrelate the data within subjects and then apply the profile least squares techniques to the decorrelated data. Further research and comparison would be of interest.

In this paper, we have not discussed the sampling properties of θ̂ derived from the QL and MGV approaches. If the correlation function is correctly specified, the asymptotic properties of θ̂ may be derived by following conventional techniques for linear mixed effects models. Investigating the asymptotic behavior of θ̂ when the correlation function is misspecified is an interesting topic; some new formulation may be needed, and this is beyond the scope of this paper.

Acknowledgments

Fan’s research was supported partially by NSF grant DMS-0354223 and NIH grant R01-GM072611. Li’s research was supported by NSF grant DMS-0348869 and National Institute on Drug Abuse grant P50 DA10075. The authors thank the AE and the referees for their constructive comments, which substantially improved an earlier draft, and the MACS study for the data used in Section 5.3.

Appendix

The following technical conditions are imposed. They are not the weakest possible conditions, but they are imposed to facilitate the proofs.

  1. The density function f(·) is Lipschitz continuous and bounded away from 0. The function K(·) is a symmetric density function with a compact support.

  2. nh^8 → 0 and nh^2/(log n)^3 → ∞.

  3. E{x(t)x(t)^T} and E{x(t)z(t)^T} are Lipschitz continuous.

  4. J_i has a finite moment generating function. In addition, E‖x(t)‖^4 + E‖z(t)‖^2 < ∞.

  5. α(t) has a continuous second derivative.

  6. σ2(·) has a continuous second derivative.

Proof of Theorem 4.1

First, by condition (4), we can show that max_{1≤i≤n} J_i = O(log n) almost surely. For each given β, the estimator α̂(t; β) is the local linear estimator obtained by minimizing (3.2) based on the data

{t_{ij}, x_i(t_{ij}), y*_i(t_{ij})}, j = 1, ···, J_i, i = 1, ···, n,

where y*_i(t_{ij}) = y_i(t_{ij}) − z_i(t_{ij})^T β. Observe that {y*_i(t_{ij}), j = 1, ···, J_i} is a realization from the process

y*(t) = x(t)^T α_0(t) + z(t)^T (β_0 − β) + ε(t).

Note that the consistency of α̂ (t; β) is not affected by ignoring the correlation within subjects. Following the proof of Fan and Huang (2005), α̂ (t; β) is a consistent estimator of the function

α(t; β) = α_0(t) − G^{-1}(t) Ψ(t) (β − β_0). (A.1)

Indeed, uniformly in t,

α̂(t; β) − α(t; β) = O_P(c_n), (A.2)

where c_n = h^2 + {−log h/(nh)}^{1/2}. Let m̂_{ij}(β) = x_i(t_{ij})^T α̂(t_{ij}; β) and m̂_i(β) = (m̂_{i1}, ···, m̂_{iJ_i})^T. Note that the profile weighted least squares estimate β̂ is the minimizer of the following weighted quadratic function:

ℓ_n(β) = (1/n) Σ_{i=1}^n {y_i − m̂_i(β) − Z_i β}^T W_i {y_i − m̂_i(β) − Z_i β}, (A.3)

which is a convex quadratic function of β. This allows us to apply the convexity lemma and the quadratic approximation lemma (see, for example, Fan and Gijbels, 1996, pp. 209–210) to establish the asymptotic normality of β̂.

We next decompose ℓ_n(β). Denote

m_i(β) = (x_i(t_{i1})^T α(t_{i1}; β), ···, x_i(t_{iJ_i})^T α(t_{iJ_i}; β))^T,

I_{n,1}(β) = (1/n) Σ_{i=1}^n {y_i − m_i(β) − Z_i β}^T W_i {y_i − m_i(β) − Z_i β},

I_{n,2}(β) = (2/n) Σ_{i=1}^n {y_i − m_i(β) − Z_i β}^T W_i {m_i(β) − m̂_i(β)}, and

I_{n,3}(β) = (1/n) Σ_{i=1}^n {m_i(β) − m̂_i(β)}^T W_i {m_i(β) − m̂_i(β)}.

Then

ℓ_n(β) = I_{n,1}(β) + I_{n,2}(β) + I_{n,3}(β). (A.4)

Note that I_{n,2}(β) and I_{n,3}(β) are quadratic in β. Using techniques related to those of Müller and Stadtmüller (1993) and Fan and Huang (2005), after some tedious calculations it follows that, for each given β,

I_{n,2}(β) = I_{n,3}(β) = O(c_n^2) = o_P(n^{-1/2}). (A.5)

We now deal with the main term In,1(β). Using the model

y(t) = x(t)^T α_0(t) + z(t)^T β_0 + ε(t)

and (A.1), we have

I_{n,1}(β) = (1/n) Σ_{i=1}^n ε_i^T W_i ε_i − 2(β − β_0)^T ξ_n + (β − β_0)^T Σ_n (β − β_0). (A.6)

The minimizer of I_{n,1}(β) is

β̂^0 = β_0 + Σ_n^{-1} ξ_n,

where Σn and ξn are defined before Theorem 4.1. By the WLLN and CLT,

√n (β̂^0 − β_0) →_L N(0, A^{-1} B A^{-1}), (A.7)

where A and B are defined in Section 3.2. Finally, we apply the convexity lemma to show that

√n (β̂ − β_0) = √n Σ_n^{-1} ξ_n + o_P(1). (A.8)

This together with (A.7) proves the result. To show (A.8), first note that, by the convexity lemma, β̂ is a consistent estimator of β_0. From (A.4), we have

0 = İ_{n,1}(β̂) + İ_{n,2}(β̂) + İ_{n,3}(β̂) = 2Σ_n(β̂ − β_0) − 2ξ_n + İ_{n,2}(β̂) + İ_{n,3}(β̂).

Since I_{n,2}(β) and I_{n,3}(β) are quadratic in β, it follows from (A.5) that

İ_{n,2}(β̂) = o_P(n^{-1/2})  and  İ_{n,3}(β̂) = o_P(n^{-1/2}).

This completes the proof of Theorem 4.1.

Proof of (4.1)

Denote U = (Z11), and W0 = cov(ε|X1, Z1). Define

D = {E(U^T W^{-1} U)}^{-1} U^T W^{-1} W_0^{1/2} − {E(U^T W_0^{-1} U)}^{-1} U^T W_0^{-1/2}.

Then

DD^T = {E(U^T W^{-1} U)}^{-1} U^T W^{-1} W_0 W^{-1} U {E(U^T W^{-1} U)}^{-1} − {E(U^T W^{-1} U)}^{-1} U^T W^{-1} U {E(U^T W_0^{-1} U)}^{-1} − {E(U^T W_0^{-1} U)}^{-1} U^T W^{-1} U {E(U^T W^{-1} U)}^{-1} + {E(U^T W_0^{-1} U)}^{-1} U^T W_0^{-1} U {E(U^T W_0^{-1} U)}^{-1}.

Since DDT is nonnegative definite, we have

E(DD^T) = {E(U^T W^{-1} U)}^{-1} E(U^T W^{-1} W_0 W^{-1} U) {E(U^T W^{-1} U)}^{-1} − {E(U^T W_0^{-1} U)}^{-1}

is nonnegative definite. Hence,

A^{-1} B A^{-1} − B_0^{-1} ≥ 0.

The equality holds if and only if D = 0, which occurs when W = W_0.

Contributor Information

Jianqing Fan, Frederick Moore Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544 (E-mail: jqfan@Princeton.EDU).

Tao Huang, Assistant Professor, Department of Statistics, University of Virginia, Charlottesville, VA 22904 (E-mail: th8e@Virginia.EDU).

Runze Li, Associate Professor, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111 (E-mail: rli@stat.psu.edu).

References

  1. Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association. 2000;95:888–902.
  2. Fan J, Huang T. Profile likelihood inferences on semiparametric varying coefficient partially linear models. Bernoulli. 2005;11:1031–1059.
  3. Fan J, Huang T, Li R. Analysis of longitudinal data with semiparametric estimation of covariance function. Technical Report 05-074, The Methodology Center, The Pennsylvania State University, University Park; 2005.
  4. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman and Hall; London: 1996.
  5. Fan J, Yao Q. Efficient estimation of conditional variance functions in stochastic regression. Biometrika. 1998;85:645–660.
  6. Fan J, Zhang J. Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society, Series B. 2000;62:303–322.
  7. Härdle W, Liang H, Gao J. Partially Linear Models. Springer-Verlag; New York: 2000.
  8. Hastie T, Tibshirani R. Varying-coefficient models (with discussion). Journal of the Royal Statistical Society, Series B. 1993;55:757–796.
  9. He X, Fung WK, Zhu ZY. Robust estimation in generalized partial linear models for clustered data. Journal of the American Statistical Association. 2005;100:1176–1184.
  10. He X, Zhu ZY, Fung WK. Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika. 2002;89:579–590.
  11. Huang JZ, Liu N, Pourahmadi M, Liu L. Covariance selection and estimation via penalized normal likelihood. Biometrika. 2006;93:85–98.
  12. Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89:111–128.
  13. Huang JZ, Zhang L. Efficient estimation in marginal partially linear models for longitudinal/clustered data using splines. Manuscript; 2004.
  14. Kaslow RA, Ostrow DG, Detels R, Phair JP, Polk BF, Rinaldo CR. The Multicenter AIDS Cohort Study: rationale, organization and selected characteristics of the participants. American Journal of Epidemiology. 1987;126:310–318.
  15. Lin X, Carroll RJ. Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association. 2000;95:520–534.
  16. Lin X, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association. 2001;96:1045–1056.
  17. Martinussen T, Scheike TH. A semiparametric additive regression model for longitudinal data. Biometrika. 1999;86:691–702.
  18. Müller HG, Stadtmüller U. On variance function estimation with quadratic forms. Journal of Statistical Planning and Inference. 1993;35:213–231.
  19. Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association. 1995;90:1257–1270.
  20. Ruppert D, Wand MP, Holst U, Hössjer O. Local polynomial variance function estimation. Technometrics. 1997;39:262–273.
  21. Speckman P. Kernel smoothing in partial linear models. Journal of the Royal Statistical Society, Series B. 1988;50:413–436.
  22. Sun Y, Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scandinavian Journal of Statistics. 2005;32:21–47.
  23. Wang N. Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika. 2003;90:29–42.
  24. Wang N, Carroll RJ, Lin X. Efficient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association. 2005;100:147–157.
  25. Wu WB, Pourahmadi M. Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika. 2003;90:831–844.
  26. Xia Y, Zhang W, Tong H. Efficient estimation for semivarying-coefficient models. Biometrika. 2004;91:661–681.
  27. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005a;100:577–590.
  28. Yao F, Müller HG, Wang JL. Functional regression analysis for longitudinal data. The Annals of Statistics. 2005b;33:2873–2903.
  29. Zhang W, Lee SY, Song X. Local polynomial fitting in semivarying coefficient models. Journal of Multivariate Analysis. 2002;82:166–188.
