Varying Coefficient Models for Sparse Noise-contaminated Longitudinal Data

Damla Şentürk; Danh V Nguyen

doi:10.5705/ss.2009.328

Author manuscript; available in PMC: 2015 Jan 12.

Published in final edited form as: Stat Sin. 2011 Oct;21(4):1831–1856. doi: 10.5705/ss.2009.328


PMCID: PMC4291232 NIHMSID: NIHMS601279 PMID: 25589822

Summary

In this paper we propose a varying coefficient model for highly sparse longitudinal data that allows for error-prone time-dependent variables and time-invariant covariates. We develop a new estimation procedure, based on covariance representation techniques, that enables effective borrowing of information across all subjects in sparse and irregular longitudinal data observed with measurement error, a challenge for which no adequate solution currently exists. More specifically, sparsity is addressed via a functional analysis approach that considers the observed longitudinal data as noise-contaminated realizations of a random process that produces smooth trajectories. This approach allows for estimation based on pooled data, borrowing strength from all subjects, in targeting the mean functions and auto- and cross-covariances to overcome sparse noisy designs. The resulting estimators are shown to be uniformly consistent. Consistent predictions of the response trajectories are also obtained via conditional expectation under Gaussian assumptions. The asymptotic distribution of the predicted response trajectories is derived, allowing for construction of asymptotic pointwise confidence bands. The efficacy of the proposed method is investigated in simulation studies and compared to the commonly used local polynomial smoothing method. The proposed method is illustrated with a sparse longitudinal data set, examining the age-varying relationship between calcium absorption and dietary calcium. Prediction of individual calcium absorption curves as a function of age is also examined.

Keywords: Functional data analysis, Local least squares, Measurement error, Repeated measurements, Smoothing, Sparse design

1 Introduction

Varying coefficient models (Cleveland, Grosse and Shyu, 1991; Hastie and Tibshirani, 1993) are extensions of parametric regression models that have attracted many applications in diverse scientific research areas in the last fifteen years. An example is the modeling of the time-varying relationship between virologic response and immunologic status (as measured by viral load and CD4+ status) and other covariates in AIDS clinical studies. As recently reviewed by Fan and Zhang (2008), estimation in varying coefficient models for longitudinal data is based on three main approaches: polynomial spline (Huang, Wu and Zhou, 2002; 2004), smoothing spline (Hoover et al., 1998; Chiang, Rice and Wu, 2001) and perhaps the most natural approach of all, local polynomial smoothing (Wu, Chiang and Hoover, 1998; Hoover et al., 1998; Fan and Zhang, 2000; Wu and Chiang, 2000). Qu and Li (2006) proposed penalized spline and quadratic inference functions for incorporating the correlation structure into the estimation.

Although these approaches may be effective for densely measured longitudinal data, highly sparse longitudinal data combined with measurement error poses unique unresolved challenges. Here sparsity refers to irregular measurement times across subjects and the availability of only a few repeated observations per subject in longitudinal designs. With measurement error, estimation of the varying coefficient functions by applying the above methods will be biased. Measurement error is typical in studies of dietary intake (Carroll et al., 2006), such as the study of individual calcium absorption and dietary calcium in adult women that we consider in Section 4. Of particular interest is estimation of the age-varying relationship between calcium absorption and dietary calcium and other baseline measures, such as body surface area. In addition to inherent measurement errors, the data are sparse: subject ages range from 39 to 58 and, due to dropouts and missed visits, about 40% of the subjects have only one or two measurements. For both estimation of the varying coefficient functions and prediction of individual calcium absorption curves, an effective strategy to pool information across subjects is needed.

In this paper we take a functional analysis approach to propose multiple varying coefficient modeling of noise-contaminated sparse longitudinal data, and to develop a new estimation method that allows for both cross-sectional (time-invariant) and longitudinal predictors. The main idea of the functional approach is to view the observed longitudinal data as a noise contaminated realization of a stochastic process which produces smooth trajectories. This approach is adopted to allow for pooling of information across subjects in order to strengthen the estimation from sparse data. We note that functional data analysis has been extended to sparse longitudinal data in the context of functional regression models by Yao, Müller and Wang (2005a). More recently Şentürk and Müller (2009) considered estimation in functional varying coefficient models with one covariate process that incorporates a history index. However, as the authors point out, their estimation approach is also useful for univariate varying coefficient models relating a longitudinal response process to a single longitudinal predictor process. The authors represent the varying coefficient functions using auto- and cross-covariances of the underlying stochastic processes which are then estimated based on the entire data using functional analysis approaches. We utilize similar representations for the varying coefficient functions and propose an estimation procedure for multiple predictor processes, which may include cross-sectional and longitudinal covariates. Several important distinctions of the current proposal from Şentürk and Müller's methodology are as follows. First, the current proposal is designed for multiple predictor processes. Second, cross-sectional predictors are included in the functional analysis approach proposed. We note that incorporation of cross-sectional predictor variables is not very common in functional linear models and functional data analysis. 
Our third new contribution is developing the estimation method to accommodate these two innovations and to study its theoretical and finite sample properties. The proposed estimation procedure enables a novel way of incorporating the within-subject correlation and handling sparse noise-contaminated longitudinal designs, which leads to improved finite sample performance relative to the commonly used local polynomial smoothing methods for varying coefficient models.

In the next section we describe the proposed estimation procedure for the multiple varying coefficient model with time-dependent and time-invariant covariates in its full generality and establish uniform consistency of the proposed estimators. Consistent prediction of the response trajectories via conditional expectation under Gaussian assumptions is given in Section 3. Also given in Section 3 is the asymptotic distribution of the predicted response trajectories along with asymptotic pointwise confidence bands. In Section 4 the method is illustrated with the aforementioned sparse longitudinal data set, where we examine the age-varying relationship between calcium absorption and dietary calcium. Simulation studies, including comparisons with local polynomial smoothing, and concluding remarks follow in Sections 5 and 6, respectively. Technical assumptions and proofs are given in the Appendix.

2 Estimation in Multiple Varying Coefficient Models

2.1 Sparse Data and Model Representation

Consider the observed data, consisting of p time-dependent and q time-independent predictors along with a time-dependent response. The q time-independent predictors $Z_{gi}$, i = 1, …, n, g = 1, …, q, are assumed to have finite variance. The time-dependent predictors $X_{ri}$ and response $Y_{i}$, i = 1, …, n, r = 1, …, p, are square integrable random realizations of the smooth random processes $X_{r}$ and Y, respectively, all defined on a finite and closed interval domain [0, T]. The predictor and response processes $X_{r}$ and Y have smooth mean functions $μ_{X_{r}} (t) = E X_{r} (t)$, $μ_{Y} (t) = E Y (t)$, and (auto-)covariance functions $G_{X_{r} X_{r}} (s, t) = \mathrm{cov} \{X_{r} (s), X_{r} (t)\}$, $G_{YY} (s, t) = \mathrm{cov} \{Y (s), Y (t)\}$, for s, t ∈ [0, T] and r = 1, …, p. Orthogonal expansions of the covariances, i.e. $G_{X_{r} X_{r}} (s, t) = \sum_{m = 1}^{\infty} ρ_{rm} ϕ_{rm} (s) ϕ_{rm} (t)$ and $G_{YY} (s, t) = \sum_{k = 1}^{\infty} λ_{k} ψ_{k} (s) ψ_{k} (t)$ for s, t ∈ [0, T] and r = 1, …, p, follow under mild conditions, where $ϕ_{rm}$ and $ψ_{k}$ denote the eigenfunctions with nonincreasing eigenvalues $ρ_{rm}$ and $λ_{k}$. The sparse design (SD), following Şentürk and Müller (2009), can formally be described in the following manner.

(SD) For the i-th subject one has a random number $N_{i}$ of repeated measurements on the r-th time-dependent predictor, $X_{rij} = X_{ri} (T_{ij}) + ε_{rij}$, and on the response, $Y_{ij} = Y_{i} (T_{ij}) + ε_{ij}$, j = 1, …, $N_{i}$, obtained at i.i.d. random time points $T_{i1}, \dots, T_{iN_{i}}$, where $ε_{rij}$ and $ε_{ij}$ are zero mean, finite variance, i.i.d. measurement errors. The $N_{i}$ are assumed to be i.i.d., and $N_{i}$, $T_{ij}$, $ε_{rij}$, $ε_{ij}$ are mutually independent and also independent of the underlying processes $X_{ri}$, $Y_{i}$ as well as $Z_{gi}$. Hence the predictor and response observations can be represented as $X_{rij} = μ_{X_{r}} (T_{ij}) + \sum_{m = 1}^{\infty} ξ_{rim} ϕ_{rm} (T_{ij}) + ε_{rij}$, $Y_{ij} = μ_{Y} (T_{ij}) + \sum_{k = 1}^{\infty} ζ_{ik} ψ_{k} (T_{ij}) + ε_{ij}$, where $ξ_{rim}$ and $ζ_{ik}$ are uncorrelated mean zero functional principal component scores with second moments equal to the eigenvalues $ρ_{rm}$ and $λ_{k}$, respectively.

The representations in (SD) above follow from the Karhunen-Loève expansion (see Ash and Gardner, 1975), where we also assume $\sum_{m} ρ_{rm} < \infty$ , $\sum_{k} λ_{k} < \infty$ for the eigenvalues.
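To make the sparse design (SD) concrete, the following minimal sketch simulates one noisy, sparsely observed predictor process from a two-term Karhunen-Loève expansion. The mean function, the two eigenfunctions, the eigenvalues (2 and 1), the noise level and the uniform visit times are all illustrative assumptions, not quantities taken from the paper.

```python
import math
import random


def simulate_subject(rng, T=1.0, max_obs=4, noise_sd=0.1):
    """Draw one subject's sparse, noisy observations of a smooth process X."""
    n_obs = rng.randint(1, max_obs)  # random number N_i of visits (1..max_obs)
    # i.i.d. visit times T_ij on [0, T]; sorted only for readability
    times = sorted(rng.uniform(0.0, T) for _ in range(n_obs))
    # Gaussian principal component scores with variances 2 and 1 (assumed)
    xi1 = rng.gauss(0.0, math.sqrt(2.0))
    xi2 = rng.gauss(0.0, 1.0)
    obs = []
    for t in times:
        mean = t + math.sin(t)  # hypothetical mean function mu_X(t)
        # orthonormal pair on [0, T] (hypothetical eigenfunctions)
        phi1 = -math.sqrt(2.0 / T) * math.cos(math.pi * t / T)
        phi2 = math.sqrt(2.0 / T) * math.sin(math.pi * t / T)
        smooth = mean + xi1 * phi1 + xi2 * phi2  # X_i(T_ij)
        obs.append(smooth + rng.gauss(0.0, noise_sd))  # add error eps_ij
    return times, obs


rng = random.Random(0)
data = [simulate_subject(rng) for _ in range(50)]
```

Each subject contributes only a handful of points, so no individual trajectory can be smoothed on its own; this is the situation in which pooling across subjects becomes necessary.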

Consider the multiple varying coefficient model

$$E \{Y (t) \mid X_{1} (t), \dots, X_{p} (t), Z_{1}, \dots, Z_{q}\} = β_{0} (t) + \sum_{r = 1}^{p} β_{r} (t) X_{r} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}, \tag{1}$$

where the varying coefficient functions, β_r(t) and α_g(t), are assumed to be smooth functions. Note that for each fixed t, model (1) reduces to a standard linear model. Centering the predictor and response trajectories, i.e. $X_{r}^{C} (t) = X_{r} (t) - μ_{X_{r}} (t)$ , $Z_{g}^{C} = Z_{g} - E (Z_{g})$ and Y^C(t) = Y(t) − μ_Y(t), we can express model (1) as

$$E \{Y^{C} (t) \mid X_{1} (t), \dots, X_{p} (t), Z_{1}, \dots, Z_{q}\} = \sum_{r = 1}^{p} β_{r} (t) X_{r}^{C} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}^{C} .$$

Note that alternatively β₀(t) can be given as $μ_{Y} (t) - \sum_{r} β_{r} (t) μ_{X_{r}} (t) - \sum_{g} α_{g} (t) E (Z_{g})$ .

2.2 Local Linear Smoothing

A standard method for fitting varying coefficient models (Fan and Zhang, 2008) is local polynomial smoothing. For instance, local linear fitting would minimize

$$\sum_{i = 1}^{n} \sum_{j = 1}^{N_{i}} K \left(\frac{T_{ij} - t}{h}\right) \left[Y_{ij}^{C} - \sum_{r = 1}^{p} \{θ_{r, 0} + θ_{r, 1} (t - T_{ij})\} X_{rij}^{C} - \sum_{g = 1}^{q} \{γ_{g, 0} + γ_{g, 1} (t - T_{ij})\} Z_{gi}^{C}\right]^{2}, \tag{2}$$

with respect to $θ_{r, 0}$, $θ_{r, 1}$, $γ_{g, 0}$, $γ_{g, 1}$, leading to ${\hat{β}}_{r} (t) = {\hat{θ}}_{r, 0}$ and ${\hat{α}}_{g} (t) = {\hat{γ}}_{g, 0}$ (Hoover et al., 1998). The minimization in (2) requires a kernel function K(·), typically a symmetric probability density function, and a bandwidth h. Şentürk and Müller (2009) point out that local polynomial smoothing does not take advantage of the functional nature of the underlying processes. In other words, while (2) involves each repeated observation taken on a subject, it does not involve the cross product terms between the repetitions, which would correspond to the underlying covariance structure. They also point out that local polynomial smoothing will be biased for sparse and noise-corrupted measurements. We will demonstrate through simulations that this bias is also present in the multiple varying coefficient model.
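As an illustration of the local linear criterion in (2), the sketch below handles the simplest case of one longitudinal predictor and no cross-sectional predictor (p = 1, q = 0), where the weighted least squares problem has two unknowns and can be solved in closed form. The Epanechnikov kernel, the function names and the toy data are assumptions made for this example only.

```python
def epanechnikov(u):
    """Epanechnikov kernel, a symmetric density supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_vc(t, times, x_c, y_c, h):
    """Local linear estimate of beta_1(t) from pooled centered data (p=1, q=0)."""
    s_aa = s_ab = s_bb = s_ay = s_by = 0.0
    for T, x, y in zip(times, x_c, y_c):
        w = epanechnikov((T - t) / h)
        a = x            # regressor multiplying theta_0
        b = (t - T) * x  # regressor multiplying theta_1
        s_aa += w * a * a
        s_ab += w * a * b
        s_bb += w * b * b
        s_ay += w * a * y
        s_by += w * b * y
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("too few observations near t for this bandwidth")
    # Cramer's rule on the 2 x 2 weighted normal equations
    theta0 = (s_ay * s_bb - s_ab * s_by) / det
    return theta0  # the estimate of beta_1(t) is theta_0


# Noise-free toy data with a linear coefficient beta_1(T) = 1 + 2T
times = [i / 20 for i in range(21)]
x_c = [(-1) ** i * (1.0 + 0.1 * i) for i in range(21)]
y_c = [(1.0 + 2.0 * T) * x for T, x in zip(times, x_c)]
beta_hat = local_linear_vc(0.5, times, x_c, y_c, h=0.3)
```

With noise-free data and a linear coefficient function the local linear fit reproduces the coefficient at t up to rounding; the bias discussed above arises once the predictor values are contaminated by measurement error.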

2.3 Proposed Estimation Procedure

The proposed approach will utilize the functional nature of the covariate processes. Thus, we define the auto- and cross-covariance functions:

$$\begin{array}{rcl} G_{YX_{r}} (s, t) & = & \mathrm{cov} \{Y (s), X_{r} (t)\} = \sum_{m = 1}^{\infty} \sum_{k = 1}^{\infty} E (ξ_{rm} ζ_{k}) ψ_{k} (s) ϕ_{rm} (t), \\ G_{X_{r} X_{r'}} (s, t) & = & \mathrm{cov} \{X_{r} (s), X_{r'} (t)\} = \sum_{m = 1}^{\infty} \sum_{m' = 1}^{\infty} E (ξ_{rm} ξ_{r' m'}) ϕ_{rm} (s) ϕ_{r' m'} (t), \\ G_{YZ_{g}} (t) & = & \mathrm{cov} \{Y (t), Z_{g}\} = \sum_{k = 1}^{\infty} E (ζ_{k} Z_{g}) ψ_{k} (t), \\ G_{X_{r} Z_{g}} (t) & = & \mathrm{cov} \{X_{r} (t), Z_{g}\} = \sum_{m = 1}^{\infty} E (ξ_{rm} Z_{g}) ϕ_{rm} (t), \end{array}$$

and $G_{Z_{g} Z_{g'}} = \mathrm{cov} (Z_{g}, Z_{g'})$. Consider the following equalities that follow directly from (1):

$$\begin{array}{rcl} G_{YX_{r'}} (t, t) & = & \sum_{r = 1}^{p} β_{r} (t) G_{X_{r} X_{r'}} (t, t) + \sum_{g = 1}^{q} α_{g} (t) G_{X_{r'} Z_{g}} (t), \\ G_{YZ_{g'}} (t) & = & \sum_{r = 1}^{p} β_{r} (t) G_{X_{r} Z_{g'}} (t) + \sum_{g = 1}^{q} α_{g} (t) G_{Z_{g} Z_{g'}}, \end{array}$$

for r′ = 1, …, p and g′ = 1, …, q. Then the varying coefficient functions of interest can be obtained as

$$[β_{1} (t), \dots, β_{p} (t), α_{1} (t), \dots, α_{q} (t)]^{T} = χ_{t}^{- 1} Ξ_{t}, \tag{3}$$

where

$$χ_{t} = \left[\begin{matrix} G_{X_{1} X_{1}} (t, t) & \dots & G_{X_{1} X_{p}} (t, t) & G_{X_{1} Z_{1}} (t) & \dots & G_{X_{1} Z_{q}} (t) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ G_{X_{p} X_{1}} (t, t) & \dots & G_{X_{p} X_{p}} (t, t) & G_{X_{p} Z_{1}} (t) & \dots & G_{X_{p} Z_{q}} (t) \\ G_{X_{1} Z_{1}} (t) & \dots & G_{X_{p} Z_{1}} (t) & G_{Z_{1} Z_{1}} & \dots & G_{Z_{1} Z_{q}} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ G_{X_{1} Z_{q}} (t) & \dots & G_{X_{p} Z_{q}} (t) & G_{Z_{1} Z_{q}} & \dots & G_{Z_{q} Z_{q}} \end{matrix}\right] \tag{4}$$

and $Ξ_{t} = [G_{YX_{1}} (t, t), \dots, G_{YX_{p}} (t, t), G_{YZ_{1}} (t), \dots, G_{YZ_{q}} (t)]^{T}$. Estimation of the varying coefficient functions in (3) involves first obtaining estimates of the auto- and cross-covariances in $χ_{t}$ and $Ξ_{t}$ and then using the plug-in estimator ${\hat{χ}}_{t}^{- 1} {\hat{Ξ}}_{t}$. The special case of p = 1 and q = 0 is considered by Şentürk and Müller (2009) based on the relation $β_{1} (t) = G_{YX} (t, t) / G_{XX} (t, t)$. The proposal here is adapted to multiple predictor processes in addition to cross-sectional predictors. The estimation algorithm is given through the following steps.

Step 1. Mean functions. Estimate the mean functions for the predictor and response processes by smoothing the aggregated data $(T_{ij}, X_{rij})$ and $(T_{ij}, Y_{ij})$ for j = 1, …, $N_{i}$, and i = 1, …, n, with local linear fitting. Denote the estimated mean functions by ${\hat{μ}}_{X_{r}}$ and ${\hat{μ}}_{Y}$.

Step 2. Raw covariances. Compute the raw covariances of $X_{r}$ and $Z_{g}$ and the raw cross-covariances between $(Y, X_{r})$, $(X_{r}, Z_{g})$ and $(Y, Z_{g})$ based on all observations from the same subject, defined by

$$\begin{array}{rcl} G_{X_{r} X_{r'}, i} (T_{ij}, T_{i \ell}) & = & \{X_{rij} - {\hat{μ}}_{X_{r}} (T_{ij})\} \{X_{r' i \ell} - {\hat{μ}}_{X_{r'}} (T_{i \ell})\}, \\ G_{Z_{g} Z_{g'}, i} & = & (Z_{gi} - {\bar{Z}}_{g}) (Z_{g' i} - {\bar{Z}}_{g'}), \\ G_{YX_{r}, i} (T_{ij}, T_{i \ell}) & = & \{Y_{ij} - {\hat{μ}}_{Y} (T_{ij})\} \{X_{ri \ell} - {\hat{μ}}_{X_{r}} (T_{i \ell})\}, \\ G_{X_{r} Z_{g}, i} (T_{ij}) & = & \{X_{rij} - {\hat{μ}}_{X_{r}} (T_{ij})\} (Z_{gi} - {\bar{Z}}_{g}), \quad \text{and} \\ G_{YZ_{g}, i} (T_{ij}) & = & \{Y_{ij} - {\hat{μ}}_{Y} (T_{ij})\} (Z_{gi} - {\bar{Z}}_{g}), \end{array}$$

for j, ℓ = 1, …, N_i and i = 1, …, n. These raw covariances are then smoothed, giving their corresponding final estimates described in the next step.

Step 3. Smoothed covariances. (3A) The final estimates of the two-dimensional auto- and cross-covariances, namely ${\hat{G}}_{X_{r} X_{r'}}$ and ${\hat{G}}_{YX_{r}}$, are obtained by feeding the corresponding two-dimensional raw covariances, $G_{X_{r} X_{r'}, i}$ and $G_{YX_{r}, i}$ from step 2, into a two-dimensional local least squares smoothing algorithm.

Remark 1. For estimation of the auto-covariances $G_{X_{r} X_{r}}$, the diagonal of the raw auto-covariance matrix is removed before the two-dimensional smoothing step, in order to eliminate the effects of measurement error on the longitudinal predictors. This covariance estimation step (inspired by the approach in Yao, Müller and Wang, 2005a, b) achieves two major objectives. First, it eliminates the effect of the noise contamination on the longitudinal observations. Second, through pooling of the data across subjects, it overcomes the problems associated with the sparseness of the design. In addition, to guarantee that the estimator of $G_{X_{r} X_{r}}$ is nonnegative definite, we propose an adjusted estimator where we exclude the negative estimates of the eigenvalues and the corresponding eigenfunctions in the functional principal component decomposition of the covariance function. More precisely, a nonparametric functional principal component analysis step employed on the smooth estimate of the auto-covariance surface yields estimators for $ϕ_{rm} (t)$ and $ρ_{rm}$; details are described in Appendix A.1. Then ${\hat{G}}_{X_{r} X_{r}} (s, t)$ is given as $\sum_{m : {\hat{ρ}}_{rm} > 0}^{M_{r}} {\hat{ρ}}_{rm} {\hat{ϕ}}_{rm} (s) {\hat{ϕ}}_{rm} (t)$. The number $M_{r}$ of included eigenfunctions can be chosen by one-curve-leave-out cross-validation, the Akaike information criterion (AIC), fraction of variance explained, or similar criteria.

(3B) Similarly, the final estimates of the one-dimensional cross-covariances, namely ${\hat{G}}_{X_{r} Z_{g}}$ and ${\hat{G}}_{YZ_{g}}$, are obtained by feeding the corresponding one-dimensional raw cross-covariances, $G_{X_{r} Z_{g}, i}$ and $G_{YZ_{g}, i}$, into a one-dimensional local polynomial smoothing algorithm. In addition, the covariance estimator ${\hat{G}}_{Z_{g} Z_{g'}}$ is given as $n^{- 1} \sum_{i = 1}^{n} G_{Z_{g} Z_{g'}, i}$.

Remark 2. Explicit forms of all one- and two-dimensional smoothing estimators are assembled in Appendix A.1.

Step 4. Plug-in estimator. Estimators for the varying coefficient functions are obtained by plugging the estimates ${\hat{χ}}_{t}$ and ${\hat{Ξ}}_{t}$ into (3): $[{\hat{β}}_{1} (t), \dots, {\hat{β}}_{p} (t), {\hat{α}}_{1} (t), \dots, {\hat{α}}_{q} (t)]^{T} = {\hat{χ}}_{t}^{- 1} {\hat{Ξ}}_{t}$. The estimator of the intercept function is given as ${\hat{β}}_{0} (t) = {\hat{μ}}_{Y} (t) - \sum_{r = 1}^{p} {\hat{β}}_{r} (t) {\hat{μ}}_{X_{r}} (t) - \sum_{g = 1}^{q} {\hat{α}}_{g} (t) {\bar{Z}}_{g}$, as noted earlier in Section 2.1.
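The plug-in step at a single time point amounts to solving one small linear system. The sketch below is a minimal illustration, assuming the smoothed covariance estimates at t have already been computed and arranged as in (4); the helper names `solve` and `plug_in` and all numerical values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("chi_t is numerically singular at this t")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def plug_in(chi_t, xi_t, mu_y, mu_x, z_bar, p):
    """Return (beta_0, [beta_1..beta_p], [alpha_1..alpha_q]) at one time t."""
    coef = solve(chi_t, xi_t)  # chi_t^{-1} Xi_t without forming the inverse
    beta, alpha = coef[:p], coef[p:]
    beta0 = (mu_y - sum(b * m for b, m in zip(beta, mu_x))
             - sum(a * z for a, z in zip(alpha, z_bar)))
    return beta0, beta, alpha


# Hypothetical covariance values at one t for p = 1, q = 1, built so that
# the underlying coefficients are beta_1 = -0.3 and alpha_1 = 0.8.
chi_t = [[2.0, 0.5],   # [G_XX(t,t), G_XZ(t)]
         [0.5, 1.0]]   # [G_XZ(t),   G_ZZ   ]
xi_t = [-0.2, 0.65]    # [G_YX(t,t), G_YZ(t)]
beta0, beta, alpha = plug_in(chi_t, xi_t, mu_y=0.3, mu_x=[1.0],
                             z_bar=[1.7], p=1)
```

Solving the system directly rather than inverting the matrix is a numerical convenience of this sketch; repeating the call over a grid of t values traces out the coefficient functions.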

We note that this estimation procedure differs from the standard methods for fitting varying coefficient models, which do not take advantage of the covariance structure of the underlying processes. Using this structure in the estimation process not only makes it possible to handle the sparsity of the longitudinal data, but also allows for incorporating additional information that is inherent in the underlying covariance structure.
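The pooling idea can be seen in how the raw auto-covariances of step 2 are assembled before smoothing. The following minimal sketch collects the off-diagonal products from every subject into one point cloud, dropping the diagonal as described in Remark 1. The function name, the zero mean function and the toy data are assumptions for illustration; the two-dimensional smoother that would be applied to the result is not shown.

```python
def raw_autocov_pairs(subjects, mean_fn):
    """Pool off-diagonal raw covariances (T_ij, T_il, G_i) across subjects."""
    pooled = []
    for times, values in subjects:
        resid = [x - mean_fn(t) for t, x in zip(times, values)]
        for j, (tj, rj) in enumerate(zip(times, resid)):
            for l, (tl, rl) in enumerate(zip(times, resid)):
                if j != l:  # diagonal terms carry the error variance
                    pooled.append((tj, tl, rj * rl))
    return pooled


# Three toy subjects with 2, 1 and 3 visits; the mean is taken as zero here.
subjects = [
    ([0.1, 0.5], [1.0, 2.0]),
    ([0.3], [0.5]),
    ([0.2, 0.4, 0.9], [0.0, 1.0, -1.0]),
]
pooled = raw_autocov_pairs(subjects, lambda t: 0.0)
```

A subject with a single visit contributes nothing to this cloud, while a subject with N visits contributes N(N − 1) points, so the surface is estimated from all subjects jointly rather than from any one trajectory.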

2.4 Uniform Consistency

The proposed estimators for the varying coefficient functions are uniformly consistent, as summarized in Theorem 1. The assumptions and proof can be found in the Appendix section. This result holds for sparse designs where the longitudinal predictor and response measurements are contaminated by additive measurement errors.

Theorem 1. Under Assumptions (A) in the Appendix, the varying coefficient function estimators satisfy

$$\sup_{t \in [0, T]} | {\hat{β}}_{r} (t) - β_{r} (t) | = O_{p} (τ_{n}), \quad \text{and} \quad \sup_{t \in [0, T]} | {\hat{α}}_{g} (t) - α_{g} (t) | = O_{p} (τ_{n}),$$

for r = 0, 1, …, p, and g = 1, …, q where $τ_{n} = n^{- 1 / 2} (\sum_{r = 1}^{p} \frac{1}{h_{r 1} h_{r 2}} + \sum_{r = 1}^{p} \sum_{r^{'} = 1}^{p} \frac{1}{h_{X_{r}} h_{X_{r^{'}}}})$ .

In the above expression for $τ_{n}$, $h_{r 1}$ and $h_{r 2}$ are the bandwidths used in the two-dimensional smoothing of the raw covariances to obtain the cross-covariance surface ${\hat{G}}_{YX_{r}}$. Similarly, $h_{X_{r}}$ and $h_{X_{r'}}$ denote the bandwidths used in the two-dimensional smoothing step to obtain the covariance surface ${\hat{G}}_{X_{r} X_{r'}}$. Details are given in Appendix A.1. The bandwidths are required to converge to zero and to satisfy some other restrictions outlined in Appendix A.2.

3 Prediction of Response Trajectories

Also of interest, in addition to estimation of the varying coefficient functions, is the prediction of calcium absorption trajectories as a function of age, based on dietary calcium (X*) and body surface area (Z*). More generally, prediction of an individual response trajectory Y* based on a new subject's sparse observations from the longitudinal predictor trajectories $X_{1}^{*}, \dots, X_{p}^{*}$ and the cross-sectional predictors $Z_{1}^{*}, \dots, Z_{q}^{*}$ is of interest. In this section, we provide consistent predictors of individual response trajectories and provide their asymptotic distribution for construction of asymptotic pointwise confidence intervals.

From the proposed model (1), the predicted response trajectory would be obtained through the following conditional expectation

$$E \{Y^{*} (t) \mid X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*}\} = μ_{Y} (t) + \sum_{r = 1}^{p} β_{r} (t) \sum_{m = 1}^{\infty} ξ_{rm}^{*} ϕ_{rm} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}^{C *}, \tag{5}$$

where $ξ_{rm}^{*} = \int_{0}^{T} \{X_{r}^{*} (t) - μ_{X_{r}} (t)\} ϕ_{rm} (t) dt$ is the m-th functional principal component score of $X_{r}^{*}$. To estimate the predicted trajectory in (5), we note that the estimates of $μ_{Y} (t)$, $β_{r} (t)$ and $α_{g} (t)$ were described in the previous section. Also, the nonparametric functional principal component analysis step employed in estimating the auto-covariance surface ${\hat{G}}_{X_{r} X_{r}}$ yields estimators for $ϕ_{rm} (t)$ and $ρ_{rm}$; details are described in Appendix A.1. Thus, the only term remaining in (5) that requires estimation is $ξ_{rm}^{*}$.

Estimation of $ξ_{rm}^{*} = \int_{0}^{T} \{X_{r}^{*} (t) - μ_{X_{r}} (t)\} ϕ_{rm} (t) dt$ is a challenging problem since the integral cannot be approximated feasibly from the sparse trajectory $X_{r}^{*} (t)$. However, estimation is feasible under a Gaussian framework, following the work of Yao, Müller and Wang (2005b). More precisely, let $X_{rj}^{*} = X_{r}^{*} (T_{j}^{*})$ be the j-th measurement for the predictor function $X_{r}^{*}$ and ${\tilde{X}}_{rj}^{*} = X_{rj}^{*} + ε_{rj}^{*}$ be the observed noise-contaminated version of it at time $T_{j}^{*}$ for a random number of total measurements N*, j = 1, …, N*. Further let ${\tilde{X}}_{r}^{*} = {({\tilde{X}}_{r 1}^{*}, \dots, {\tilde{X}}_{r N^{*}}^{*})}^{T}$. Assume that the functional principal component scores $ξ_{rm}^{*}$, the measurement errors $ε_{rj}^{*}$ and $Z_{g}^{*}$ are jointly Gaussian. Then the predicted $ξ_{rm}^{*}$ is given as the best linear prediction conditional on the (N*p + q) × 1 observation vector $U^{*} = {({\tilde{X}}_{1}^{* T}, \dots, {\tilde{X}}_{p}^{* T}, Z_{1}^{*}, \dots, Z_{q}^{*})}^{T}$, N*, and the locations of the observations $T^{*} = {(T_{1}^{*}, \dots, T_{N^{*}}^{*})}^{T}$, namely

$${\tilde{ξ}}_{rm}^{*} = H_{rm}^{* T} Σ_{U^{*}}^{- 1} (U^{*} - μ_{U}^{*}) . \tag{6}$$

In (6), $μ_{U}^{*} = {(μ_{X_{1}}^{* T}, \dots, μ_{X_{p}}^{* T}, μ_{Z_{1}}, \dots, μ_{Z_{q}})}^{T}$ is the (N*p + q) × 1 mean vector with $μ_{X_{r}}^{*} = {\{μ_{X_{r}} (T_{1}^{*}), \dots, μ_{X_{r}} (T_{N^{*}}^{*})\}}^{T}$,

$$H_{rm}^{* T} = \left\{\sum_{m' = 1}^{\infty} ρ_{rm, 1 m'} ϕ_{1 m'}^{* T}, \dots, \sum_{m' = 1}^{\infty} ρ_{rm, p m'} ϕ_{p m'}^{* T}, E (ξ_{rm} Z_{1}), \dots, E (ξ_{rm} Z_{q})\right\} \tag{7}$$

is the (N*p + q) × 1 covariance vector with $ρ_{rm, r' m'} = \mathrm{cov} (ξ_{rm}, ξ_{r' m'})$ and $ϕ_{rm}^{*} = {\{ϕ_{rm} (T_{1}^{*}), \dots, ϕ_{rm} (T_{N^{*}}^{*})\}}^{T}$. The (N*p + q) × (N*p + q) covariance matrix $Σ_{U^{*}}$ in (6) is equal to

$$Σ_{U^{*}} = \left[\begin{matrix} {\tilde{G}}_{X_{1} X_{1}} & \dots & {\tilde{G}}_{X_{1} X_{p}} & {\tilde{G}}_{X_{1} Z_{1}} & \dots & {\tilde{G}}_{X_{1} Z_{q}} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ {\tilde{G}}_{X_{p} X_{1}} & \dots & {\tilde{G}}_{X_{p} X_{p}} & {\tilde{G}}_{X_{p} Z_{1}} & \dots & {\tilde{G}}_{X_{p} Z_{q}} \\ {\tilde{G}}_{X_{1} Z_{1}}^{T} & \dots & {\tilde{G}}_{X_{p} Z_{1}}^{T} & G_{Z_{1} Z_{1}} & \dots & G_{Z_{1} Z_{q}} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ {\tilde{G}}_{X_{1} Z_{q}}^{T} & \dots & {\tilde{G}}_{X_{p} Z_{q}}^{T} & G_{Z_{1} Z_{q}} & \dots & G_{Z_{q} Z_{q}} \end{matrix}\right],$$

where the N* × 1 vector ${\tilde{G}}_{X_{r} Z_{g}} = \mathrm{cov} ({\tilde{X}}_{r}^{*}, Z_{g}^{*} \mid N^{*}, T^{*})$, the scalars $G_{Z_{g} Z_{g'}}$ are as defined in Section 2.3, and the N* × N* covariance matrix ${\tilde{G}}_{X_{r} X_{r'}} = \mathrm{cov} ({\tilde{X}}_{r}^{*}, {\tilde{X}}_{r'}^{*} \mid N^{*}, T^{*})$ for r ≠ r′ and ${\tilde{G}}_{X_{r} X_{r}} = \mathrm{cov} ({\tilde{X}}_{r}^{*} \mid N^{*}, T^{*})$ with the (j, ℓ)th entry ${({\tilde{G}}_{X_{r} X_{r}})}_{j, \ell} = G_{X_{r} X_{r}} (T_{j}^{*}, T_{\ell}^{*}) + \mathrm{var} (ε_{r}) δ_{j \ell}$, where $δ_{j \ell} = 1$ if j = ℓ and 0 if j ≠ ℓ.
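The best linear predictor in (6) can be sketched for the simplest setting of a single predictor process with no cross-sectional covariates (p = 1, q = 0), where the scores are uncorrelated and the m-th covariance vector reduces to the eigenvalue times the eigenfunction evaluated at the observation times. The sketch below uses known population quantities; the helper names, the two eigenfunctions, the eigenvalues and the observation times are assumptions chosen for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def blup_scores(obs, mu, phi, rho, noise_var):
    """Best linear predictor of the scores for one predictor (p=1, q=0).

    obs, mu: observations and mean values at T*_1..T*_N
    phi[m][j]: m-th eigenfunction at T*_j; rho[m]: m-th eigenvalue
    """
    n = len(obs)
    # Sigma[j][l] = sum_m rho_m phi_m(T_j) phi_m(T_l) + var(eps) * delta_jl
    sigma = [[sum(r * f[j] * f[l] for r, f in zip(rho, phi))
              + (noise_var if j == l else 0.0) for l in range(n)]
             for j in range(n)]
    w = solve(sigma, [o - m for o, m in zip(obs, mu)])  # Sigma^{-1}(U - mu)
    # With uncorrelated scores, H_m = rho_m * phi_m at the observed times
    return [r * sum(fj * wj for fj, wj in zip(f, w)) for r, f in zip(rho, phi)]


t_star = [0.2, 0.7]  # two observation times on [0, 1]
phi = [[-math.sqrt(2) * math.cos(math.pi * t) for t in t_star],
       [math.sqrt(2) * math.sin(math.pi * t) for t in t_star]]
rho = [2.0, 1.0]
true_scores = [1.5, -0.5]
mu = [0.0, 0.0]
clean = [sum(s * f[j] for s, f in zip(true_scores, phi)) for j in range(2)]
exact = blup_scores(clean, mu, phi, rho, noise_var=0.0)
shrunk = blup_scores(clean, mu, phi, rho, noise_var=1.0)
```

In this toy setup the noise-free call recovers the two scores, while a positive error variance pulls the predicted scores toward zero, which is the shrinkage behaviour expected of a conditional expectation.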

Next, estimators of $μ_{U}^{*}$, $ϕ_{rm}^{*}$, ${\tilde{G}}_{X_{r} X_{r'}}$, ${\tilde{G}}_{X_{r} Z_{g}}$, $G_{Z_{g} Z_{g'}}$ and ${\tilde{G}}_{X_{r} X_{r}}$ that are based on the entire data, where ${({\hat{\tilde{G}}}_{X_{r} X_{r}})}_{j, \ell} = {\hat{G}}_{X_{r} X_{r}} (T_{j}^{*}, T_{\ell}^{*}) + \widehat{\mathrm{var}} (ε_{r}) δ_{j \ell}$, are substituted in (6) to obtain a plug-in estimator for $ξ_{rm}^{*}$. The explicit form of the estimator of the measurement error variance, $\widehat{\mathrm{var}} (ε_{r})$, is given in Appendix A.1. The covariance $ρ_{rm, r' m'}$ can be estimated via $\int \int {\hat{G}}_{X_{r} X_{r'}} (s, t) {\hat{ϕ}}_{rm} (s) {\hat{ϕ}}_{r' m'} (t) ds dt$ and $E (ξ_{rm} Z_{g})$ can be estimated by $\int {\hat{G}}_{X_{r} Z_{g}} (t) {\hat{ϕ}}_{rm} (t) dt$, using estimates of $G_{X_{r} X_{r'}}$, $G_{X_{r} Z_{g}}$ and $ϕ_{rm} (t)$. Finally, ${\hat{H}}_{rm}^{* T} = \{\sum_{m' = 1}^{M_{1}} {\hat{ρ}}_{rm, 1 m'} {\hat{ϕ}}_{1 m'}^{* T}, \dots, \sum_{m' = 1}^{M_{p}} {\hat{ρ}}_{rm, p m'} {\hat{ϕ}}_{p m'}^{* T}, \hat{E} (ξ_{rm} Z_{1}), \dots, \hat{E} (ξ_{rm} Z_{q})\}$, matching the structure of (7), where the numbers $M_{1}, \dots, M_{p}$ of included eigenfunctions can be chosen by one-curve-leave-out cross-validation, the Akaike information criterion (AIC) or the fraction of variance explained. This leads to ${\hat{ξ}}_{rm}^{*} = {\hat{H}}_{rm}^{* T} {\hat{Σ}}_{U^{*}}^{- 1} (U^{*} - {\hat{μ}}_{U}^{*})$. Hence, the predicted trajectories are given as

$${\hat{Y}}_{\mathcal{M}}^{*} (t) = {\hat{μ}}_{Y} (t) + \sum_{r = 1}^{p} {\hat{β}}_{r} (t) \sum_{m = 1}^{M_{r}} {\hat{ξ}}_{rm}^{*} {\hat{ϕ}}_{rm} (t) + \sum_{g = 1}^{q} {\hat{α}}_{g} (t) Z_{g}^{C *}, \tag{8}$$

where $\mathcal{M} = \sum_{r = 1}^{p} M_{r}$. The following theorem establishes the consistency of the prediction ${\hat{Y}}_{\mathcal{M}}^{*} (t)$ for the target trajectory ${\tilde{Y}}^{*} (t) = μ_{Y} (t) + \sum_{r = 1}^{p} β_{r} (t) \sum_{m = 1}^{\infty} {\tilde{ξ}}_{rm}^{*} ϕ_{rm} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}^{C *}$, where ${\tilde{ξ}}_{rm}^{*}$ is the best linear predictor defined in (6).

Theorem 2. Under Assumptions (A) and (B) in the Appendix, given N* and T*, for all t ∈ [0, T], the prediction for the response trajectory satisfies

$$\lim_{n \to \infty} {\hat{Y}}_{\mathcal{M}}^{*} (t) = {\tilde{Y}}^{*} (t), \quad \text{in probability} .$$

Here the numbers $M_{r} = M_{r} (n)$ of eigen-components included in the eigen-decomposition of $X_{r}^{*}$ for r = 1, …, p, and hence ℳ, all tend to infinity as n → ∞.

Next we consider construction of asymptotic confidence bands for the response trajectory Y*, given the observed sparse and noisy data. For $M_{1}, \dots, M_{p} \geq 1$, let $ξ_{r}^{*, M_{r}} = {(ξ_{r 1}^{*}, \dots, ξ_{r M_{r}}^{*})}^{T}$ for r = 1, …, p and $ξ^{*, \mathcal{M}} = {\{{(ξ_{1}^{*, M_{1}})}^{T}, \dots, {(ξ_{p}^{*, M_{p}})}^{T}\}}^{T}$. The quantities ${\tilde{ξ}}_{r}^{*, M_{r}}$ and ${\tilde{ξ}}^{*, \mathcal{M}}$ are defined similarly. Under the Gaussian assumption, given N* and T*, ${\tilde{ξ}}^{*, \mathcal{M}} - ξ^{*, \mathcal{M}} \sim N (0, Ω_{\mathcal{M}})$, where the normality, the covariance matrix $Ω_{\mathcal{M}}$ and its plug-in estimator ${\hat{Ω}}_{\mathcal{M}}$ are derived in the proof of Theorem 3 given in Appendix A.2. Define $ϕ_{t \mathcal{M}} = {\{β_{1} (t) ϕ_{11} (t), \dots, β_{1} (t) ϕ_{1 M_{1}} (t), \dots, β_{p} (t) ϕ_{p 1} (t), \dots, β_{p} (t) ϕ_{p M_{p}} (t)\}}^{T}$ for t ∈ [0, T] and let ${\hat{ϕ}}_{t \mathcal{M}}$ be its estimate obtained from the data. The following Theorem 3 establishes the asymptotic distribution of the predicted trajectories ${\hat{Y}}_{\mathcal{M}}^{*} (t) = {\hat{μ}}_{Y} (t) + {\hat{ϕ}}_{t \mathcal{M}}^{T} {\hat{ξ}}^{*, \mathcal{M}} + \sum_{g = 1}^{q} {\hat{α}}_{g} (t) Z_{g}^{C *}$, conditional on N* and T*.

Theorem 3. Under Assumptions (A), (B) and (C) in the Appendix, given N* and T*, for all t ∈ [0, T], x ∈ ℝ, the prediction for the response trajectory satisfies

$$\lim_{n \to \infty} P \left[\frac{{\hat{Y}}_{\mathcal{M}}^{*} (t) - E \{Y^{*} (t) \mid X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*}\}}{\sqrt{{\hat{ω}}_{t \mathcal{M}}}} \leq x\right] = Φ (x),$$

where $ω_{t \mathcal{M}} = ϕ_{t \mathcal{M}}^{T} Ω_{\mathcal{M}} ϕ_{t \mathcal{M}}$, ${\hat{ω}}_{t \mathcal{M}} = {\hat{ϕ}}_{t \mathcal{M}}^{T} {\hat{Ω}}_{\mathcal{M}} {\hat{ϕ}}_{t \mathcal{M}}$, Φ(·) denotes the standard Gaussian cdf, and $M_{r}$, r = 1, …, p, and hence ℳ, all tend to infinity as n → ∞.

Hence, ignoring bias resulting from truncation at $M_{1}, \dots, M_{p}$ in ${\hat{Y}}_{\mathcal{M}}^{*}$, the (1 − α) 100% asymptotic pointwise confidence interval for $E \{Y^{*} (t) \mid X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*}\}$ is given by ${\hat{Y}}_{\mathcal{M}}^{*} (t) \pm Φ^{- 1} (1 - α / 2) \sqrt{{\hat{ω}}_{t \mathcal{M}}}$.
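Once the predicted value and its estimated variance at a time point are available, the pointwise interval is a one-line computation. The sketch below assumes both inputs have already been obtained; the function name and the numerical inputs are hypothetical.

```python
from statistics import NormalDist


def pointwise_ci(y_hat, omega_hat, level=0.90):
    """Asymptotic pointwise interval y_hat +/- z * sqrt(omega_hat)."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * omega_hat ** 0.5
    return y_hat - half_width, y_hat + half_width


# Hypothetical predicted absorption and estimated variance at one age
lower, upper = pointwise_ci(0.25, 0.0004, level=0.90)
```

Evaluating the function over a grid of time points and joining the endpoints gives the pointwise band; it is not a simultaneous band for the whole trajectory.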

4 Application to Sparse Dietary Calcium Absorption Data

In a study of calcium deficiency, Heaney et al. (1989) showed a complex inverse relation between calcium intake and calcium absorption, where age and body surface area are among the variables that affect calcium absorption efficiency. We examine the age-varying coefficient regression of calcium absorption on intake and body surface area via the analysis of data from a longitudinal study on factors affecting calcium absorption (Davis, 2002, pg. 336). Longitudinal measurements were taken on absorption and intake, among others, where repeated measurements per subject were taken at roughly five-year intervals. We analyze the data where patient ages are between 39 and 58, yielding 182 subjects with 1 to 4 repeated measurements per subject. The data are sparse and irregular because measurement times differ vastly between subjects and the total number of repetitions per subject is small. Figure 1 displays the observed individual trajectories of calcium intake $X_{1}$(age) and absorption Y(age), along with the corresponding estimated mean functions ${\hat{μ}}_{X}$ and ${\hat{μ}}_{Y}$. An increasing trend with age is observed for intake and a decreasing trend is observed for absorption.

Figure 1. (a) Observed individual trajectories (dashed) and the smoothed estimate of the mean function ${\hat{μ}}_{X}$ (thick solid) for calcium intake. (b) Observed individual trajectories (dashed) and the smoothed estimate ${\hat{μ}}_{Y}$ of the mean function for calcium absorption (thick solid). (c) Boxplot of baseline body surface area values of the 182 female patients.

We fit the age varying coefficient model

$$E \{Y (age) \mid X_{1} (age), Z_{1}\} = β_{0} (age) + β_{1} (age) X_{1} (age) + α_{1} (age) Z_{1}$$

of Y (calcium absorption) on $X_{1}$ (calcium intake) and $Z_{1}$ (baseline body surface area; see Figure 1) using the proposed estimation procedure and kernel linear smoothing, as described in Section 2.2. The resulting estimated varying coefficient functions from both methods are displayed in Figure 2, along with 90% bootstrap percentile confidence intervals. Bootstrap confidence intervals are constructed from 500 bootstrap samples, generated by resampling subjects. Bandwidths for the smoothing of the cross-sectional mean functions, the covariance functions ($G_{X_{1} Z_{1}}$ and $G_{YZ_{1}}$), the covariance surface ($G_{X_{1} X_{1}}$), and the cross-covariance surface ($G_{YX_{1}}$) were selected by generalized cross-validation.
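Resampling subjects, rather than individual observations, keeps each subject's repeated measurements together. The following minimal sketch shows a percentile interval built this way; the function names, the simple order-statistic rule for the percentiles, and the placeholder estimator (a pooled mean) are assumptions for illustration. In the application the estimator would be a varying coefficient estimate at a fixed age.

```python
import random


def bootstrap_percentile_ci(subjects, estimator, n_boot=500, level=0.90, seed=0):
    """Percentile interval from bootstrap samples drawn at the subject level."""
    rng = random.Random(seed)
    n = len(subjects)
    stats = []
    for _ in range(n_boot):
        # draw n whole subjects with replacement
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    alpha = 1.0 - level
    lo_idx = int((alpha / 2.0) * (n_boot - 1))        # crude lower percentile
    hi_idx = int((1.0 - alpha / 2.0) * (n_boot - 1))  # crude upper percentile
    return stats[lo_idx], stats[hi_idx]


def pooled_mean(sample):
    """Placeholder estimator: mean of all observations in the sample."""
    values = [y for _, ys in sample for y in ys]
    return sum(values) / len(values)


# Each subject is (times, values); the constant data make the result obvious.
constant = [([0.1, 0.6], [2.0, 2.0]), ([0.4], [2.0]),
            ([0.2, 0.5, 0.9], [2.0, 2.0, 2.0])]
lo, hi = bootstrap_percentile_ci(constant, pooled_mean)

varied = [([0.1], [1.0]), ([0.2], [3.0]), ([0.3], [2.0])]
lo2, hi2 = bootstrap_percentile_ci(varied, pooled_mean)
```

Because every resample reuses complete subjects, the within-subject dependence carries over into each bootstrap estimate.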

Figure 2. (a) Estimated varying coefficient function $β_{0}$(age) from the proposed varying coefficient model fit (solid) along with 90% bootstrap confidence intervals (dotted) for the calcium absorption data. Estimated functions from the varying coefficient model fits using kernel linear smoothing (dashed) are also displayed. (b) Estimated varying coefficient function $β_{1}$(age), the slope function of calcium intake, from both fits along with 90% bootstrap confidence intervals. (c) Estimated varying coefficient function $α_{1}$(age), the slope function of the cross-sectional variable body surface area, from both fits along with 90% bootstrap confidence intervals.

The estimated varying coefficient functions from both approaches, displayed in Figure 2, suggest a significant negative relationship between calcium intake and absorption. While the inverse relationship between intake and absorption appears to decline with age in the kernel linear fit (Figure 2b), especially after age 45, such a decline is not observed in the proposed fit. Both methods indicate a significant positive effect of baseline body surface area on absorption for ages between 43 and 55. While the effect slowly becomes positive with age in both methods, the effect estimated by the kernel linear fit is much larger in magnitude. These differences are attributed to the estimation bias of the kernel linear method arising from potential measurement error and to its lack of efficiency, since it does not incorporate the underlying correlation structure into the estimation. These issues are further investigated in the simulation studies of Section 5. The estimated negative and positive relations of calcium absorption with intake and body surface area, respectively, are consistent with earlier findings (Heaney et al., 1989).

Next, we illustrate the prediction of calcium absorption trajectories as described in Section 3, based on sparse longitudinal intake trajectories along with baseline body surface area. The numbers of eigenfunctions used in the expansion of the predictor trajectories given in (8) were chosen by AIC; further details on these choices can be found in Yao, Müller and Wang (2005a). The predicted trajectories for 4 randomly selected subjects are given in Figure 3. The trajectories show a decline in calcium absorption with age, which is the same pattern as observed in the estimated smooth mean function of Figure 1. Overlaying the predictions are 90% approximate confidence intervals, as proposed in Section 3, along with observed calcium absorption values and predictions obtained from kernel linear smoothing. Predictions and confidence intervals are obtained with the predicted subject's predictor trajectory left out. The predicted values obtained from kernel linear smoothing are similar to those obtained from the proposed method: the average absolute prediction errors for kernel linear smoothing and the proposed method are 0.0612 and 0.0643, respectively.

Observed values (circles) for calcium absorption (not used for prediction), predicted curves (solid) and 90% pointwise confidence bands (dotted), for four randomly selected patients, where bands and predicted curves are based on one-curve-leave-out analysis. Also displayed (+) are predicted values from the kernel linear smoothing estimation.

Note that even though the kernel linear smoothing estimation procedure cannot target the true underlying varying coefficient functions in the analysis of sparse noise-contaminated longitudinal data as will be demonstrated in the simulations of Section 5, this does not translate into predictions of the response values. This is a well known phenomenon in nonparametric measurement error models, where in the nonparametric regression model Y = g(X) + ε, the predictor X is measured with additive measurement error U yielding the observations W = X + U. Even though the nonparametric estimation of g(·) needs to adjust for the additive measurement error, the prediction of future response values can be obtained without adjusting for the additive measurement error via estimation of E(Y|X + U) = E(Y|W). See Carroll et al. (2006), Carroll and Hall (1988), Stefanski and Carroll (1990) and Carroll, Delaigle and Hall (2009) for further details and discussions on similar issues. Finally, we note an important distinction: although the kernel linear smoothing and the proposed estimation methodology yield similar predictions, the proposed method has the distinctive advantage of providing predicted response trajectories for the entire length of the study, while kernel linear smoothing and other methods can only provide pointwise predictions.
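The distinction between estimating g(·) and predicting the response can be illustrated with a small simulation. The sketch below assumes a linear g(x) = b·x with Gaussian X and U, a simplification made here purely for illustration and not the model of this paper; the function name and all parameter values are hypothetical. Regressing Y on the error-prone W yields an attenuated slope, b·var(X)/{var(X) + var(U)}, so g is not recovered, yet the same fit estimates E(Y|W) and its prediction error is close to the best achievable given W.

```python
import random

def attenuation_demo(n=20000, b=2.0, var_x=1.0, var_u=1.0, var_e=0.25, seed=1):
    """Simulate Y = b*X + e and W = X + U, then fit Y on W by least squares."""
    rng = random.Random(seed)
    w, y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, var_x ** 0.5)
        w.append(x + rng.gauss(0.0, var_u ** 0.5))      # error-prone predictor W
        y.append(b * x + rng.gauss(0.0, var_e ** 0.5))  # response Y
    mw, my = sum(w) / n, sum(y) / n
    sww = sum((wi - mw) ** 2 for wi in w)
    swy = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
    slope = swy / sww
    intercept = my - slope * mw
    # Mean squared prediction error of the naive fit, which estimates E(Y|W).
    mse = sum((yi - intercept - slope * wi) ** 2 for wi, yi in zip(w, y)) / n
    return slope, mse

slope, mse = attenuation_demo()
# Under these assumptions the slope targets b * var_x / (var_x + var_u) = 1.0,
# not b = 2.0, while the MSE targets the minimum attainable given W:
# b^2 * var_x * var_u / (var_x + var_u) + var_e = 2.25.
print(round(slope, 2), round(mse, 2))
```

The naive fit is therefore biased for the regression function but adequate for prediction from future error-prone observations, which mirrors the behaviour described above for kernel linear smoothing.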

5 Simulation Studies

We assess the finite sample performance of the proposed estimation algorithm and compare its performance to that of kernel linear smoothing via three simulation studies. While the first simulation set-up corresponds to highly sparse designs, the second set-up reflects denser longitudinal designs. The first two simulation set-ups involve a varying coefficient model with one longitudinal and one cross-sectional covariate similar to the data analysis. We study the performance of the proposed estimation algorithm for a varying coefficient model with two longitudinal and two cross-sectional predictors in the third simulation for sparse longitudinal data. We report results for all set-ups based on 500 Monte Carlo runs.

In the first study the number of measurements per subject is randomly chosen with equal probability from {1, 2, 3, 4} for each of n = 182 subjects, similar to the calcium absorption data, to reflect highly sparse designs. The locations T_ij of the measurements for the i-th subject are generated uniformly from [0, 10]. The predictor process X is generated according to (SD) of Section 2.1 with mean function μ_X(t) = t + sin(t), two eigenfunctions, $ϕ_{1} (t) = \cos (π t / 10) / \sqrt{5}$ and $ϕ_{2} (t) = \sin (π t / 10) / \sqrt{5}$, for 0 ≤ t ≤ 10, and two eigenvalues, ρ₁ = 2 and ρ₂ = 1, respectively. The functional principal components ξ_im (m = 1, 2) are generated from N(0, ρ_m), and the mean zero additive measurement error ε_ij is assumed to be Gaussian with variance 0.2. The cross-sectional variable Z₁ is generated from N(0, 1), as a marginal component of a bivariate normal distribution for (Z_1i, ξ_i2) with cov(Z₁, ξ₂) = 0.3, to allow for correlation between X₁ and Z₁. The response trajectories are generated from

Y_{i} (t) = β_{0} (t) + β_{1} (t) X_{1 i} (t) + α_{1} (t) Z_{1 i} + V_{i} (t),

(9)

according to (1), where β₀(t) = 10 sin(π + tπ/5), β₁(t) = sin(πt/10), α₁(t) = t/10. The functional error V_i in (9) is constructed from the same two eigenfunctions as used for X(t), with Gaussian functional principal components generated with eigenvalues ρ₁ = 0.2 and ρ₂ = 0.1. The observed measurements on the response are further contaminated with additive measurement errors according to Y_ij = Y_i(T_ij) + ε_ij, where ε_ij are i.i.d. zero mean Gaussian errors with variance 0.2. Under the second simulation set-up, the variables are generated in the same way as the first set-up, except at still irregular but denser (non-sparse) measurement times, with the total number of repeated measurements generated uniformly from {5, …, 15}. The average number of repeated measurements per subject is 10 for the dense case, compared to less than 3 observations per subject for the highly sparse case.
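The data-generating mechanism of the first set-up can be sketched as follows. This is a minimal illustration assembled from the description above, not the authors' simulation code; the function names, the dictionary layout and the seed are assumptions. The correlation between Z₁ and ξ₂ is induced by writing Z₁ as 0.3 ξ₂ plus independent Gaussian noise with variance 1 − 0.3², which gives var(Z₁) = 1 and cov(Z₁, ξ₂) = 0.3.

```python
import math
import random

def mu_x(t):   return t + math.sin(t)
def phi1(t):   return math.cos(math.pi * t / 10) / math.sqrt(5)
def phi2(t):   return math.sin(math.pi * t / 10) / math.sqrt(5)
def beta0(t):  return 10 * math.sin(math.pi + t * math.pi / 5)
def beta1(t):  return math.sin(math.pi * t / 10)
def alpha1(t): return t / 10

def simulate_sparse(n=182, seed=0):
    """Generate one sparse data set following the first simulation set-up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        n_i = rng.choice([1, 2, 3, 4])                 # measurements per subject
        times = sorted(rng.uniform(0, 10) for _ in range(n_i))
        xi1 = rng.gauss(0, math.sqrt(2.0))             # eigenvalue rho_1 = 2
        xi2 = rng.gauss(0, 1.0)                        # eigenvalue rho_2 = 1
        # Z1 ~ N(0, 1) with cov(Z1, xi2) = 0.3
        z1 = 0.3 * xi2 + math.sqrt(1 - 0.3 ** 2) * rng.gauss(0, 1.0)
        v1 = rng.gauss(0, math.sqrt(0.2))              # scores of the functional error V_i
        v2 = rng.gauss(0, math.sqrt(0.1))
        obs = []
        for t in times:
            x = mu_x(t) + xi1 * phi1(t) + xi2 * phi2(t)
            v = v1 * phi1(t) + v2 * phi2(t)
            y = beta0(t) + beta1(t) * x + alpha1(t) * z1 + v   # model (9)
            x_obs = x + rng.gauss(0, math.sqrt(0.2))   # noise-contaminated predictor
            y_obs = y + rng.gauss(0, math.sqrt(0.2))   # noise-contaminated response
            obs.append((t, x_obs, y_obs))
        data.append({"z1": z1, "obs": obs})
    return data

data = simulate_sparse()
mean_count = sum(len(d["obs"]) for d in data) / len(data)
```

Replacing `rng.choice([1, 2, 3, 4])` with a draw from {5, …, 15} would give the denser design of the second set-up.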

We compare the performance of the proposed estimation algorithm with the performance of kernel linear smoothing under the sparse and denser set-ups using mean absolute deviation error (MADE) and weighted average squared error (WASE) , defined respectively as

\begin{array}{rcl} \mathrm{MADE} & = & \frac{1}{3T} \left[ \sum_{r = 0}^{1} \frac{\int | β_{r} (t) - \hat{β}_{r} (t) | \, dt}{\mathrm{range} (β_{r})} + \frac{\int | α_{1} (t) - \hat{α}_{1} (t) | \, dt}{\mathrm{range} (α_{1})} \right], \text{ and} \\ \mathrm{WASE} & = & \frac{1}{3T} \left[ \sum_{r = 0}^{1} \frac{\int \{ β_{r} (t) - \hat{β}_{r} (t) \}^{2} \, dt}{\mathrm{range}^{2} (β_{r})} + \frac{\int \{ α_{1} (t) - \hat{α}_{1} (t) \}^{2} \, dt}{\mathrm{range}^{2} (α_{1})} \right], \end{array}

where T = 10, range(β_r) is the range of the function β_r(t) and range(α₁) is defined similarly. We also consider the unweighted average squared error (UASE) to compare the estimators, where UASE is defined the same way as WASE, but without weights in the denominator. Bandwidths involved in smoothing of the mean functions and the auto- and cross-covariance surfaces are chosen by generalized cross-validation.
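The three error criteria can be computed by numerical integration on a grid. The sketch below is one possible implementation under stated assumptions: the trapezoidal rule, the grid size, and the names `trapezoid` and `error_measures` are illustrative choices, and the range of each true function is approximated by its maximum minus minimum over the grid.

```python
import math

def trapezoid(values, step):
    """Trapezoidal rule for equally spaced function values."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def error_measures(true_funcs, est_funcs, T=10.0, n_grid=1001):
    """Return (MADE, WASE, UASE) for paired true and estimated functions on [0, T]."""
    step = T / (n_grid - 1)
    grid = [k * step for k in range(n_grid)]
    made = wase = uase = 0.0
    for f, f_hat in zip(true_funcs, est_funcs):
        truth = [f(t) for t in grid]
        diff = [f(t) - f_hat(t) for t in grid]
        rng = max(truth) - min(truth)               # range of the true function
        abs_int = trapezoid([abs(d) for d in diff], step)
        sq_int = trapezoid([d * d for d in diff], step)
        made += abs_int / rng
        wase += sq_int / rng ** 2
        uase += sq_int                              # same as WASE without the weights
    k = len(true_funcs) * T                         # equals 3T for three functions
    return made / k, wase / k, uase / k

# True functions of the first simulation set-up.
beta0 = lambda t: 10 * math.sin(math.pi + t * math.pi / 5)
beta1 = lambda t: math.sin(math.pi * t / 10)
alpha1 = lambda t: t / 10
truth = [beta0, beta1, alpha1]

# Hypothetical estimates: each true function shifted upward by 0.1.
shifted = [lambda t, f=f: f(t) + 0.1 for f in truth]
made, wase, uase = error_measures(truth, shifted)
```

With a constant shift of 0.1, the weighting by range makes the contribution of β₀ (range 20) much smaller than those of β₁ and α₁ (range 1 each) in MADE and WASE, whereas UASE treats the three functions equally.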

Results from the sparse and denser simulation set-ups are given in Figures 4 and 5, respectively. Plot (d) in each figure shows boxplots of the logarithms of the ratios of MADE, WASE and UASE values of the proposed method over the kernel linear smoothing approach. The proposed estimators lead to improved finite sample performance for both sparse and denser cases with respect to all three error criteria. More specifically, the proposed estimators have improved performance in (85, 77, 71)% of the Monte Carlo runs for the sparse design according to the (MADE, WASE, UASE) criteria, respectively, while they lead to improved performance in all Monte Carlo runs according to all three criteria in the case of the denser design. This can be attributed to the fact that the proposed method adjusts for noise contaminated measurements and incorporates information inherent in the underlying correlation structure of the longitudinal processes. The estimated varying coefficient functions based on the proposed method and the kernel linear approach are provided in Figure 4(a)-(c) and Figure 5(a)-(c) for both simulation scenarios. Displayed are the cross-sectional medians of the estimated varying coefficient functions for the proposed method and the kernel linear method, together with estimated functions corresponding to the 5% and 95% cross-sectional percentiles for the proposed method. The kernel linear smoothing fits deviate from the underlying true functions for both sparse and denser simulation set-ups. The bias is especially apparent in the estimation of β₁(t) (e.g. see Figure 5(b)) and is also apparent in the estimation of β₀(t). The (median) estimated functions of the proposed method target the corresponding true functions closely for both simulation scenarios (highly sparse and dense data), and for the dense data case they essentially coincide with the true functions. This is not surprising since, from the highly sparse case to the dense case, there is on average a 4-fold increase in the number of repeated observations per subject. However, the bias of the functions estimated via the kernel linear method remains, due to the measurement error.

First simulation set-up given in (9) with highly sparse design: (a) The cross-sectional median curves of the proposed estimates (grey) along with 5% and 95% cross-sectional percentiles (dotted) overlaying the true varying coefficient function β₀(t) (solid). Also displayed are the cross-sectional median curves from fits using kernel linear smoothing (dash-dotted). Similarly, for (b) β₁(t) and (c) α₁(t). (d) Boxplots for the logarithm of the ratios of error measures (MADE, WASE and UASE) for proposed estimates over kernel linear smoothing. Values smaller than zero show that the proposed method is superior.

Second simulation set-up given in (9) with denser design: (a) The cross-sectional median curves of the proposed estimates (grey) along with 5% and 95% cross-sectional percentiles (dotted) overlaying the true varying coefficient function β₀(t) (solid). Also displayed are the cross-sectional median curves from fits using kernel linear smoothing (dash-dotted). Similarly, for (b) β₁(t) and (c) α₁(t). (d) Boxplots for the logarithm of the ratios of error measures (MADE, WASE and UASE) for proposed estimates over kernel linear smoothing. Values smaller than zero show that the proposed method is superior.

In addition, we studied the coverage level of the proposed asymptotic confidence intervals for the predicted response trajectories given in Section 3 under the sparse set-up of the first simulation. Pointwise confidence intervals are constructed at a grid of time points at the 95% level, where coverage levels are averaged over number of subjects in 100 Monte Carlo runs. The estimated coverage levels are given in Figure 6 for sample sizes, n = 182, 400 and 1000. A boundary effect is observed in the estimated coverage levels given over time, where the coverage level approaches the targeted 95% for the middle time range and the region for the boundary effect gets smaller with increasing sample size. For example, excluding boundary regions (time 0-1 and 9-10), the coverage is between 83% and 96% for the time region 1-9 for n = 1000.

Estimated coverage levels for the 95% asymptotic confidence intervals of the predicted response trajectory proposed in Section 3 under the sparse design of the first simulation for n = 182 (dotted), n = 400 (dash-dotted) and n = 1000 (solid).

For the third simulation, the number of measurements per subject is randomly chosen with equal probability from {4, 5, 6, 7, 8} for each of n = 400 subjects, where the locations T_ij of the measurements for the i-th subject are generated uniformly from [0, 10]. The first longitudinal predictor process X₁ is generated with the same mean function and eigenbasis as in the first two simulations with ρ₁ = 1 and ρ₂ = 1, while the second longitudinal predictor X₂ is generated with mean function μ_X₂(t) = −(t − 5)²/2, two basis functions, ϕ₁(t) = (t − 5)/5 and ϕ₂(t) = {3((t − 5)/5)² − 1}/2, for 0 ≤ t ≤ 10, and with mean zero, variance 1 Gaussian coefficients. The mean zero additive measurement error ε_rij is assumed to be Gaussian with variance 0.2 for both predictor processes. In order to allow for correlations between the two longitudinal and two cross-sectional predictors, the two cross-sectional variables Z₁ and Z₂ are generated from Gaussian distributions with means 1 and 2 and variances 2 and 2, respectively, as marginal components of a six-dimensional multivariate normal vector containing the two random coefficients of each of the two longitudinal predictors and the two cross-sectional predictors, i.e. (ξ_1i1, ξ_1i2, ξ_2i1, ξ_2i2, Z_1i, Z_2i). The 6 × 6 covariance matrix is equal to

\begin{bmatrix} 1 & 0 & 0.2 & 0 & 0.2 & 0.3 \\ 0 & 1 & 0 & 0.2 & 0.4 & 0 \\ 0.2 & 0 & 1 & 0 & 0 & 0.1 \\ 0 & 0.2 & 0 & 1 & 0.5 & 0 \\ 0.2 & 0.4 & 0 & 0.5 & 2 & 0.2 \\ 0.3 & 0 & 0.1 & 0 & 0.2 & 2 \end{bmatrix}.
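A draw from this six-dimensional normal vector can be obtained through a Cholesky factorization of the covariance matrix, which also confirms that the matrix is positive definite. The sketch below is illustrative only; the helper names `cholesky` and `draw` and the seed are assumptions, and the mean vector places zeros on the four random coefficients and the stated means 1 and 2 on Z₁ and Z₂.

```python
import math
import random

SIGMA = [
    [1.0, 0.0, 0.2, 0.0, 0.2, 0.3],
    [0.0, 1.0, 0.0, 0.2, 0.4, 0.0],
    [0.2, 0.0, 1.0, 0.0, 0.0, 0.1],
    [0.0, 0.2, 0.0, 1.0, 0.5, 0.0],
    [0.2, 0.4, 0.0, 0.5, 2.0, 0.2],
    [0.3, 0.0, 0.1, 0.0, 0.2, 2.0],
]
MEAN = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]   # (xi_11, xi_12, xi_21, xi_22, Z_1, Z_2)

def cholesky(a):
    """Lower-triangular factor L with L L^T = a; fails if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def draw(low, mean, rng):
    """One multivariate normal draw: mean + L e, with e a standard normal vector."""
    e = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * e[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

LOW = cholesky(SIGMA)
sample = draw(LOW, MEAN, random.Random(0))   # coefficients and covariates for one subject
```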

The response trajectories are generated from

Y_{i} (t) = β_{0} (t) + β_{1} (t) X_{1 i} (t) + β_{2} (t) X_{2 i} (t) + α_{1} (t) Z_{1 i} + α_{2} (t) Z_{2 i} + V_{i} (t),

(10)

according to (1), where β₀(t) = 50 sin(π + tπ/5), β₁(t) = 5 sin(πt/10), β₂(t) = 5 cos(πt/4), α₁(t) = t/2 and α₂(t) = (t − 5)²/20. The functional error V_i in (10) is constructed from the same two eigenfunctions as used for X₁(t), with Gaussian functional principal components generated with eigenvalues ρ₁ = 0.2 and ρ₂ = 0.1. The additive measurement error on the response is generated from a zero mean Gaussian distribution with variance 0.2. Boxplots of the logarithms of the ratios of MADE, WASE and UASE values of the proposed method over the kernel linear smoothing approach, along with cross-sectional medians and 5% and 95% percentiles of the estimated varying coefficient functions based on the proposed method and the medians of the kernel linear approach, are provided in Figure 7. The proposed method performs better than kernel linear smoothing according to all three criteria in 95% of the total Monte Carlo runs. The superior performance of the proposed method can be seen especially in the estimated β₀ and the two slope varying coefficient functions β₁ and β₂ corresponding to longitudinal predictors measured with additive measurement error. The estimated median curves for the kernel linear smoother deviate from the true curves, since the kernel linear smoother cannot handle measurement error in covariates.

Third simulation set-up given in (10): (a) The cross-sectional median curves of the proposed estimates (grey) along with 5% and 95% cross-sectional percentiles (dotted) overlaying the true varying coefficient function β₀(t) (solid). Also displayed are the cross-sectional median curves from fits using kernel linear smoothing (dash-dotted). Similarly for (b) β₁(t) (c) β₂ (d) α₁(t) (e) α₂(t). (f) Boxplots for the logarithm of the ratios of error measures (MADE, WASE and UASE) for proposed estimates over kernel linear smoothing. Values smaller than zero show that the proposed method is superior.

6 Discussion

In this work we proposed a multiple varying coefficient model in the context of highly sparse longitudinal data, where the longitudinal response and predictor processes are measured with error. The method is motivated by a study of calcium absorption and dietary calcium intake. We note that the proposed estimation algorithm is applicable to cases with possibly different time grids for each longitudinal covariate of a subject. Most longitudinal regression methods, such as local polynomial smoothing, require that the longitudinal predictors be observed at the same times as the longitudinal response variable. In practice this requirement may not be met, either by design or due to common challenges with follow-up measurements over time and/or make-up measurements because of missed follow-up occasions. One example is multi-center studies where not all covariates are observed at all centers, resulting in missing response or predictor values. While most longitudinal regression methods discard observations with a missing pair, in the proposed estimation procedure the observed measurements with missing pairs also contribute to the estimation algorithm, and no observation is discarded, owing to the unique representation of the varying coefficient functions and the component-wise nature of the algorithm. This could lead to favorable properties of the proposed estimation algorithm in the case of missing covariates, especially for sparse designs, where imputation via smoothing techniques would not be feasible.

Acknowledgments

We are extremely grateful to two anonymous referees, the associate editor and the editor for helpful remarks that improved the paper. Support for this work includes the National Institutes of Health grants UL1RR024922, RL1AG032119 and RL1AG032115 and grant UL1 RR024146 from the National Center for Research Resources.

Appendix

A.1 Details on Estimation Procedures

Explicit forms of the proposed mean and covariance estimators, functional principal components decompositions and measurement error variance estimators are given as follows. Define the local linear scatterplot smoother for μ_Xr(t) through minimizing

\sum_{i = 1}^{n} \sum_{j = 1}^{N_{i}} K_{1} \left( \frac{T_{ij} - t}{b_{X_{r}}} \right) \{ X_{rij} - η_{0} - η_{1} (t - T_{ij}) \}^{2},

with respect to η₀ and η₁, leading to μ̂_Xr(t) = η̂₀. Other one-dimensional smoothing estimators of the proposed estimation algorithm, namely μ̂_Y(t), Ĝ_XrZg and Ĝ_YZg, can be defined similarly.
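The local linear scatterplot smoother has a closed form, since the criterion is a weighted least squares problem in (η₀, η₁). The sketch below pools all (T_ij, X_rij) pairs and solves the 2 × 2 normal equations at a target point t. The Epanechnikov kernel is an assumption made here (it is nonnegative, symmetric and compactly supported, but the kernel actually used is not specified in this section), and the function names are hypothetical.

```python
def epanechnikov(u):
    """Nonnegative, symmetric kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_linear(t, times, values, bandwidth, kernel=epanechnikov):
    """Minimize sum_k w_k {x_k - eta0 - eta1 (t - T_k)}^2 and return eta0_hat."""
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t_k, x_k in zip(times, values):
        w = kernel((t_k - t) / bandwidth)
        d = t - t_k
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * x_k
        r1 += w * d * x_k
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few distinct observations within the bandwidth")
    # First component of the solution of the 2 x 2 normal equations.
    return (s2 * r0 - s1 * r1) / det

# Pooled observation times and noiseless linear values, for a quick check:
# a local linear fit reproduces a straight line exactly.
times = [0.5 * k for k in range(21)]
values = [2.0 + 3.0 * s for s in times]
fitted = local_linear(4.3, times, values, bandwidth=1.5)
```

Because the observations from all subjects enter the same sums, sparsity within a subject is not an obstacle as long as the pooled time points are dense enough within each bandwidth window.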

For the two-dimensional smoothers, recall $G_{X_{r} X_{r'}, i} (T_{ij}, T_{iℓ}) = \{ X_{rij} - \hat{μ}_{X_{r}} (T_{ij}) \} \{ X_{r' iℓ} - \hat{μ}_{X_{r'}} (T_{iℓ}) \}$, and define the local linear surface smoother for G_XrXr′(s, t) through minimizing

\sum_{i = 1}^{n} \sum_{1 \leq j, ℓ \leq N_{i}} K_{2} \left( \frac{T_{ij} - s}{h_{X_{r}}}, \frac{T_{iℓ} - t}{h_{X_{r'}}} \right) [ G_{X_{r} X_{r'}, i} (T_{ij}, T_{iℓ}) - f \{ η, (s, t), (T_{ij}, T_{iℓ}) \} ]^{2},

(11)

where f{η, (s, t), (T_ij, T_iℓ)} = η₀ + η₁(s − T_ij) + η₂(t − T_iℓ), with respect to η = (η₀, η₁, η₂), yielding Ĝ_XrXr′(s, t) = η̂₀. The two-dimensional smoother in the estimation of G_YXr is defined similarly.

For the estimation of the auto-covariance G_XrXr, i.e. r = r′, the second sum in (11) is taken over 1 ≤ j ≠ ℓ ≤ N_i, to leave the noise contaminated diagonal raw covariance elements out of the smoothing procedure. For the eigen-decomposition of the auto-covariance surface G_XrXr, the eigenequations $\int_{0}^{T} \hat{G}_{X_{r} X_{r}}^{*} (s, t) \hat{ϕ}_{rm} (s) ds = \hat{ρ}_{rm} \hat{ϕ}_{rm} (t)$ are solved under orthonormality constraints on the eigenfunctions, where $\hat{G}_{X_{r} X_{r}}^{*}$ is not the final but only the smooth estimator of the covariance function. To arrive at the final auto-covariance estimator, we exclude the negative estimates of the eigenvalues and the corresponding eigenfunctions in the functional principal component decomposition of the covariance function, i.e. $\hat{G}_{X_{r} X_{r}} (s, t) = \sum_{m : \hat{ρ}_{rm} > 0}^{M_{r}} \hat{ρ}_{rm} \hat{ϕ}_{rm} (s) \hat{ϕ}_{rm} (t)$.

In the original smoothing estimator of the auto-covariance surface leading to $\hat{G}_{X_{r} X_{r}}^{*}$, for estimation of var(ε_r), a local quadratic component is fit orthogonal to the diagonal of G_XrXr and a local linear component is fit in the direction of the diagonal, resulting in a surface estimate whose diagonal will be denoted by G_r(t). In addition, a separate local linear smoother is fit only to the diagonal values {G_XrXr(t, t) + var(ε_r)}, denoted by V̂_Xr(t). The estimator of var(ε_r) is given as the difference between the above two smoothing estimators for the diagonal terms, by $\widehat{var} (ε_{r}) = (2 / T) \int_{T / 4}^{3 T / 4} \{ \hat{V}_{X_{r}} (t) - G_{r} (t) \} dt$ if this quantity is positive, and $\widehat{var} (ε_{r}) = 0$ otherwise.
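The measurement error variance estimator averages the gap between the two diagonal smoothers over the middle half of the time range and truncates the result at zero. A minimal sketch is given below; the two smoothers are passed in as plain functions, and the trapezoidal rule, the grid size and the function name are assumptions made for illustration.

```python
def error_variance(v_hat, g_diag, T, n_grid=501):
    """(2/T) * integral over [T/4, 3T/4] of {v_hat(t) - g_diag(t)}, truncated at 0.

    v_hat:  smoother of the raw diagonal values (variance plus error variance).
    g_diag: diagonal of the smoothed covariance surface (variance only).
    """
    a, b = T / 4.0, 3.0 * T / 4.0
    step = (b - a) / (n_grid - 1)
    vals = [v_hat(a + k * step) - g_diag(a + k * step) for k in range(n_grid)]
    integral = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
    return max((2.0 / T) * integral, 0.0)

# Hypothetical smoothers whose gap is exactly the error variance 0.2.
g = lambda t: 1.0 + 0.1 * t
v = lambda t: g(t) + 0.2
sigma2_hat = error_variance(v, g, T=10.0)
```

Restricting the integral to [T/4, 3T/4] keeps the estimate away from the boundaries of the time range, and the truncation guards against a negative estimate.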

A.2. Assumptions and Proofs

Assumptions (A1–A6) are needed for all three theorems, (B1–B2) are needed for the consistency of the predicted response trajectories of Theorem 2 and their asymptotic distribution in Theorem 3, while (C) is only needed for the distributional result of Theorem 3.

(A1)
The cross-sectional predictors Z_gi are iid for i = 1, …, n with var(Z_gi) > 0 for g = 1, …, q.
(A2)
The covariance matrices χ_t defined in (4) are nonsingular for t ∈ [0, T].

The longitudinal predictor and response trajectories (T_ij, X_rij) and (T_ij, Y_ij), i = 1, …, n, j = 1, …, N_i, r = 1, …, p are assumed to have the same distribution as (T, X_r) and (T, Y) with joint densities g_r(t, x) and h(t, y). The observation times T_ij are i.i.d. with density f(t). Let T₁ and T₂ be i.i.d. as T and X_r1 and X_r2 be the repeated measurements of X_r made on the same subject at times T₁ and T₂, and assume (T_ij, T_iℓ, X_rij, X_riℓ), 1 ≤ j ≠ ℓ ≤ N_i, is identically distributed as (T₁, T₂, X_r1, X_r2) with joint density function g_XrXr(t₁, t₂, x₁, x₂). It is analogously assumed that the response measurements (T_ij, T_iℓ, Y_ij, Y_iℓ), 1 ≤ j ≠ ℓ ≤ N_i, are identically distributed with joint density function g_YY(t₁, t₂, y₁, y₂). The following regularity conditions are assumed on f(t), g_r(t, x), h(t, y), g_XrXr(t₁, t₂, x₁, x₂) and g_YY(t₁, t₂, y₁, y₂).
(A3)
Let p₁, p₂ be integers with 0 ≤ p₁, p₂ ≤ p = p₁ + p₂ = 2. The derivative (d^p/dt^p)f(t) exists and is continuous on [0, T] with f(t) > 0 on [0, T], (d^p/dt^p)g_r(t, x) and (d^p/dt^p)h(t, y) exist and are continuous on [0, T] × ℝ, and $\{ d^{p} / (dt_{1}^{p_{1}} dt_{2}^{p_{2}}) \} g_{X_{r} X_{r}}$(t₁, t₂, x₁, x₂) and $\{ d^{p} / (dt_{1}^{p_{1}} dt_{2}^{p_{2}}) \} g_{YY}$(t₁, t₂, y₁, y₂) exist and are continuous on [0, T]² × ℝ².
(A4)
The number of measurements N_i made on the ith subject is a random variable such that $N_{i} \overset{iid}{\sim} N$, where N is a positive discrete random variable with P(N > 1) > 0. The observation times and measurements are assumed to be independent of the number of observations for any subset J_i ⊆ {1, …, N_i} and for all i = 1, …, n, i.e. {T_ij, Y_ij, X_rij, Z_gi : j ∈ J_i} is independent of N_i.

Let K₁(·) be the nonnegative, mean zero, finite variance, compactly supported kernel function used in the estimating μ_Xr, μ_Y, G_YZg, G_XrZg and K₂(·, ·) be the bivariate kernel function with similar properties used in estimating the covariance surfaces G_XrXr_′, G_YXr. Explicit forms for the estimators of these quantities are given in Appendix A.3.
(A5)
The Fourier transforms $κ_{1} (t) = \int e^{-iut} K_{1} (u) du$ of K₁(u) and $κ_{2} (t, s) = \int \int e^{-(iut + ivs)} K_{2} (u, v) du dv$ of K₂(u, v) are absolutely integrable, i.e. ∫|κ₁(t)|dt < ∞ and ∫ ∫ |κ₂(t, s)|dtds < ∞.

Let b_Xr, b_Y be the bandwidths used for estimating μ̂_Xr, μ̂_Y, (h_Xr, h_Xr_′) be the bandwidths for estimating Ĝ_XrXr_′, (h_r₁, h_r₂) be the bandwidths for obtaining Ĝ_YXr, h_g for obtaining Ĝ_YZg and h_rg for Ĝ_XrZg_′, where all bandwidths depend on n.
(A6)
As n → ∞, b_Xr → 0, b_Y → 0, h_g → 0 and h_rg → 0, $n b_{X_{r}}^{4} \to \infty$, $n b_{Y}^{4} \to \infty$, $n h_{g}^{4} \to \infty$, $n h_{rg}^{4} \to \infty$ and $n b_{X_{r}}^{6} < \infty$, $n b_{Y}^{6} < \infty$, $n h_{g}^{6} < \infty$ and $n h_{rg}^{6} < \infty$. Without loss of generality h_Xr/h_Xr′ → 1, h_r1/h_r2 → 1 and $n h_{X_{r}}^{6} \to \infty$, $n h_{r1}^{6} \to \infty$ and $n h_{X_{r}}^{8} < \infty$ and $n h_{r1}^{8} < \infty$.
(A7)
Assume that the fourth moments of Y and X, centered at μ_Y(t) and μ_X(t), are finite, i.e. E[{Y − μ_Y(T)}⁴] < ∞, E[{X − μ_X(T)}⁴] < ∞.
(A8)
The numbers of included eigenfunctions in (8), M₁, …, M_p, are integer valued sequences that depend on the sample size n such that inf_{t∈[0,T]} M_r(n) → ∞ and both inf_{t∈[0,T]} M_r(n) and sup_{t∈[0,T]} M_r(n) satisfy the rate conditions given in assumption (B5) of Yao, Müller and Wang (2005a).
(A9)
The autocovariance operator A_Gr generated by the continuously differentiable covariance function G_XrXr(s, t) is positive definite.
(B1)
The number and locations of the measurements for a subject or cluster remain unaltered as the sample size n → ∞.
(B2)
For all 1 ≤ i ≤ n, m ≥ 1, 1 ≤ r ≤ p, 1 ≤ g ≤ q and 1 ≤ j ≤ N_i, the functional principal component scores ξ_rim, the cross-sectional predictors Z_gi and the measurement errors ε_rij in (SD) are jointly Gaussian.
(C)
There exists a continuous positive definite function ω_t such that ω_tℳ as defined in Theorem 3 satisfies ω_tℳ → ω_t as M₁, …, M_p → ∞.

Proof of Theorem 1. Uniform consistency of μ̂_Xr(t) and μ̂_Y(t) follows from Theorem 1 of Yao, Müller and Wang (2005b), and that of Ĝ_XrZg(t) and Ĝ_YZg(t) can be shown similarly to the consistency of μ̂_Xr(t) and μ̂_Y(t) for r = 1, …, p, g = 1, …, q. The properties of A_Gr in (A9) imply that the ρ_rm are all positive. Hence $\sum_{m : \hat{ρ}_{rm} > 0}^{M_{r}} \hat{ρ}_{rm} \hat{ϕ}_{rm} (s) \hat{ϕ}_{rm} (t)$ and $\sum_{m}^{M_{r}} \hat{ρ}_{rm} \hat{ϕ}_{rm} (s) \hat{ϕ}_{rm} (t)$ are asymptotically equivalent since $| \hat{ρ}_{rm} - ρ_{rm} | = O_{p} \{ 1 / (\sqrt{n} h_{X_{r}}^{2}) \}$ by Theorem 2 of Yao, Müller and Wang (2005b). The uniform consistency of Ĝ_XrXr(s, t) follows from the uniform consistency of the eigenvalue and eigenfunction estimators shown in Theorem 2 of Yao, Müller and Wang (2005b). For the rate conditions on M_r, we refer the reader to assumption (B5) of Yao, Müller and Wang (2005a) and note that further details on theoretical properties of functional principal component analysis can be found in Silverman (1996), Hall and Hosseini-Nasab (2009) and Hall et al. (2006). Uniform consistency of the cross-covariance estimators Ĝ_XrXr′(s, t) and Ĝ_YXr(s, t) follows from Lemma A1 of Yao, Müller and Wang (2005a). Combining these results implies uniform consistency of χ̂_t and Ξ̂_t, and Theorem 1 follows.

Proof of Theorem 2. For fixed M₁, …, M_p and ℳ, let $\tilde{Y}_{M}^{*} (t) = μ_{Y} (t) + \sum_{r = 1}^{p} β_{r} (t) \sum_{m = 1}^{M_{r}} \tilde{ξ}_{rm}^{*} ϕ_{rm} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}^{C*}$ and recall that $\tilde{Y}^{*} (t) = μ_{Y} (t) + \sum_{r = 1}^{p} β_{r} (t) \sum_{m = 1}^{\infty} \tilde{ξ}_{rm}^{*} ϕ_{rm} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}^{C*}$.

Observe the following decomposition

| \hat{Y}_{M}^{*} (t) - \tilde{Y}^{*} (t) | \leq | \hat{Y}_{M}^{*} (t) - \tilde{Y}_{M}^{*} (t) | + | \tilde{Y}_{M}^{*} (t) - \tilde{Y}^{*} (t) |,

where it follows similarly to Lemma 3 of Yao, Müller and Wang (2005b) that $\tilde{Y}_{M}^{*} (t) \overset{p}{\to} \tilde{Y}^{*} (t)$ as M₁, …, M_p → ∞ and n → ∞. The uniform consistency of μ̂_Y(t) follows from Theorem 1 of Yao, Müller and Wang (2005b). Hence, using Theorem 1 of Section 2.4, Theorem 3 and (17) of Yao, Müller and Wang (2005b) and Slutsky's Theorem, it follows that $| \hat{Y}_{M}^{*} (t) - \tilde{Y}_{M}^{*} (t) | \to 0$ as n → ∞ and Theorem 2 follows.

Proof of Theorem 3. Define $E_{M} \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} = μ_{Y} (t) + \sum_{r = 1}^{p} β_{r} (t) \sum_{m = 1}^{M_{r}} ξ_{rm}^{*} ϕ_{rm} (t) + \sum_{g = 1}^{q} α_{g} (t) Z_{g}^{C*}$ and note the following decomposition

\begin{array}{rcl} \hat{Y}_{M}^{*} (t) - E_{M} \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} & = & \hat{Y}_{M}^{*} (t) - \tilde{Y}_{M}^{*} (t) + \tilde{Y}_{M}^{*} (t) \\ & & - E_{M} \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} . \end{array}

It follows from the proof of Theorem 2 that $\sup_{t \in [0, T]} | \hat{Y}_{M}^{*} (t) - \tilde{Y}_{M}^{*} (t) | \to 0$ as n → ∞.

Define the ℳ × (N*p + q) matrix $H = cov (ξ^{*\mathcal{M}}, U^{*} | N^{*}, T^{*}) = [H_{11}^{*}, \dots, H_{1 M_{1}}^{*}, \dots, H_{p1}^{*}, \dots, H_{p M_{p}}^{*}]^{T}$, where $H_{rm}^{*T}$ is as defined in (7). Since $\tilde{ξ}^{*\mathcal{M}} = H Σ_{U^{*}}^{-1} (U^{*} - μ_{U}^{*})$, $cov (\tilde{ξ}^{*\mathcal{M}} | N^{*}, T^{*}) = cov (\tilde{ξ}^{*\mathcal{M}}, ξ^{*\mathcal{M}} | N^{*}, T^{*}) = H Σ_{U^{*}}^{-1} H^{T}$. Hence, $cov (\tilde{ξ}^{*\mathcal{M}} - ξ^{*\mathcal{M}} | N^{*}, T^{*}) = cov (ξ^{*\mathcal{M}} | N^{*}, T^{*}) - cov (\tilde{ξ}^{*\mathcal{M}} | N^{*}, T^{*}) = D - H Σ_{U^{*}}^{-1} H^{T} \equiv Ω_{\mathcal{M}}$, where D = cov(ξ*^ℳ | N*, T*) is the ℳ × ℳ matrix with (r, r′)th partition an M_r × M_r′ matrix $(D)_{r r'} = D_{r r'} = cov (ξ_{r}^{* M_{r}}, ξ_{r'}^{* M_{r'}} | N^{*}, T^{*})$. Let $\hat{Ω}_{\mathcal{M}} = \hat{D} - \hat{H} \hat{Σ}_{U^{*}}^{-1} \hat{H}^{T}$, where D̂, $\hat{H} = (\hat{H}_{11}^{*}, \dots, \hat{H}_{1 M_{1}}^{*}, \dots, \hat{H}_{p1}^{*}, \dots, \hat{H}_{p M_{p}}^{*})^{T}$ and Σ̂_U* are estimated based on the entire data with $\hat{H}_{rm}^{*T}$, Ê(ξ_rm Z_g) and ξ̂_rm,r′m′ as defined in Section 3. It follows that under the Gaussian assumption for fixed M₁, …, M_p ≥ 1, ξ̃*^ℳ − ξ*^ℳ ∼ N(0, Ω_ℳ). Hence,

\hat{Y}_{M}^{*} (t) - E_{M} \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} \overset{D}{\to} Z_{M} \sim N (0, ω_{t \mathcal{M}}) .

Under Assumption (C), letting M₁, …, M_p → ∞ leads to $Z_{M} \overset{D}{\to} Z \sim N (0, ω_{t})$. From the Karhunen-Loève Theorem, we have

| E_{M} \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} - E \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} | \overset{P}{\to} 0,

as M₁, …, M_p → ∞. Hence, $\lim_{M_{1}, \dots, M_{p} \to \infty} \lim_{n \to \infty} [ \hat{Y}_{M}^{*} (t) - E \{ Y^{*} (t) | X_{1}^{*} (t), \dots, X_{p}^{*} (t), Z_{1}^{*}, \dots, Z_{q}^{*} \} ] \overset{D}{=} Z$. From Theorem 1 of Section 2.4 and Lemma 1 of Yao, Müller and Wang (2005a), it follows that lim_{M₁, …, M_p → ∞} lim_{n → ∞} ω̂_tℳ = ω_t. Theorem 3 follows by Slutsky's Theorem.

Contributor Information

Damla Şentürk, Email: dsenturk@stat.psu.edu.

Danh V. Nguyen, Email: ucdnguyen@ucdavis.edu.

References

Ash RB, Gardner MF. Topics in Stochastic Processes. New York: Academic Press; 1975. [Google Scholar]
Carroll RJ, Delaigle A, Hall P. Nonparametric prediction in measurement error models. Journal of the American Statistical Association. 2009;104:993–1003. doi: 10.1198/jasa.2009.tm07543. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carroll RJ, Hall P. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association. 1988;83:1184–1186. [Google Scholar]
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: A modern perspective. 2nd. Baco Raton: Chapman and Hall CRC Press; 2006. [Google Scholar]
Chiang CT, Rice JA, Wu CO. Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. Journal of the American Statistical Association. 2001;96:605–619. [Google Scholar]
Cleveland WS, Grosse E, Shyu WM. Statistical Models in S, J. M. Chambers and T. J. Hastie. Pacific Grove: Wadsworth & Brooks; 1991. Local regression models; pp. 309–376. [Google Scholar]
Davis CS. Statistical methods for the analysis of repeated measurements. New York: Springer; 2002. [Google Scholar]
Fan J, Zhang W. Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics. 2000;27:715–731. [Google Scholar]
Fan J, Zhang W. Statistical methods with varying coefficient models. Statistics and its Interface. 2008;1:179–195. doi: 10.4310/sii.2008.v1.n1.a15. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hall P, Hosseini-Nasab M. Theory for high-order bounds in functional principal components analysis. Journal of the Royal Statistical Society B. 2009;68:109–126. [Google Scholar]
Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. The Annals of Statistics. 2006;34:1493–1517. [Google Scholar]
Hastie T, Tibshirani R. Varying coefficient models. Journal of the Royal Statistical Soceity B. 1993;55:757–796. [Google Scholar]
Heaney RP, Recker RR, Stegman MR, Moy AJ. Calcium absorption in women: relationships to calcium intake, estrogen status, age. Journal of Bone and Mineral Research. 1989;4:469–475. doi: 10.1002/jbmr.5650040404.
Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89:111–128.
Huang JZ, Wu CO, Zhou L. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica. 2004;14:763–788.
Qu A, Li R. Quadratic inference functions for varying coefficient models with longitudinal data. Biometrics. 2006;62:379–391. doi: 10.1111/j.1541-0420.2005.00490.x.
Şentürk D, Müller HG. Functional varying coefficient models for longitudinal data. Technical report. Department of Statistics, Penn State University, University; 2009.
Silverman BW. Smoothed functional principal components analysis by choice of norm. The Annals of Statistics. 1996;24:1–24.
Stefanski LA, Carroll RJ. Deconvoluting kernel density estimators. Statistics. 1990;21:165–184.
Wu CO, Chiang CT. Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica. 2000;10:433–456.
Wu CO, Chiang CT, Hoover DR. Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. Journal of the American Statistical Association. 1998;93:1388–1402.
Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. The Annals of Statistics. 2005a;33:2873–2903.
Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005b;100:577–590.

