Author manuscript; available in PMC: 2010 Dec 6.
Published in final edited form as: Biometrics. 2007 Nov 19;64(3):751–761. doi: 10.1111/j.1541-0420.2007.00940.x

A Penalized spline approach to functional mixed effects model analysis

Huaihou Chen 1, Yuanjia Wang 2,*
PMCID: PMC2996855  NIHMSID: NIHMS252084  PMID: 18047528

Abstract

Summary

In this work, we propose penalized spline based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects and residual measurement error processes. Using penalized splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the random subject-specific curves and their associated covariance function, which represents between-subject variation, and the variance function of the residual measurement errors, which represents within-subject variation. The proposed methods offer flexible estimation of both the population-level and subject-level curves. In addition, decomposing the variability of the outcomes into a between-subject and a within-subject source is useful for identifying the dominant variance component and therefore for modeling the covariance function optimally. We use a likelihood based method to select multiple smoothing parameters. Furthermore, we study the asymptotics of the baseline P-spline estimator with longitudinal data. We conduct simulation studies to investigate the performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of the Berkeley growth data, where we identified clearly distinct patterns of the between- and within-subject covariance functions of children's heights. We also apply the proposed methods to estimate the effect of anti-hypertensive treatment from the Framingham Heart Study data.

Keywords: Multi-level functional data, Functional random effects, Semiparametric longitudinal data analysis

1. Introduction

Longitudinal designs are routinely implemented in biomedical research studies. Comprehensive presentations on parametric methods for analyzing longitudinal data can be found in, for example, Diggle et al. (2002). In some applications, in addition to modeling a mean function, modeling a covariance function of the subject-specific processes is of scientific interest. For example, in genetic studies the covariance of related subjects within a family represents genetic information. This function is used to compute heritability (ratio of genetic variance and total trait variance) which quantifies the genetic effect on a trait (Khoury et al. 1993). In other applications, although not of direct scientific interest, accurate estimation of a covariance function leads to efficiency gain in estimating the population mean function and fixed effects parameters (Fan et al. 2007).

In practice, concerns about model misspecification for parametric methods may call for more flexible nonparametric or semiparametric approaches. In the context of longitudinal data analysis, Diggle and Verbyla (1998) provided nonparametric estimation of covariance structure by using local polynomials to smooth various moment estimators of the variance and covariance functions. Wu and Pourahmadi (2003) and Huang et al. (2006) proposed nonparametric estimators for large covariance matrices via Cholesky decomposition, which are guaranteed to be positive definite. In the context of functional data analysis, Guo (2002) considered functional mixed effects models and introduced a Kalman filtering algorithm to handle the large matrices in the mixed model representation of smoothing splines, which may be computationally challenging. Crainiceanu et al. (2007) proposed Bayesian penalized splines to model the variance function of heteroscedastic errors nonparametrically and provided a spatially adaptive smoothing parameter for the population mean function. Krafty et al. (2008) dealt with a varying coefficient model and pursued a smoothing spline based approach with an iteratively reweighted least squares procedure to fit the model. Rice and Wu (2001) used regression spline based methods and treated subject-specific curves as nonparametric random curves. Fan et al. (2007) proposed a semiparametric method to estimate the error covariance function where the variance function is modeled nonparametrically with local polynomials and the correlation function is modeled parametrically. In the context of spatial smoothing, Wood et al. (2002) considered a spatial process as a mixture of smoothing splines to achieve spatial adaptivity. Alternatively, functional principal components are used to reduce dimensionality and model a covariance function. Methods along this line include Ramsay and Silverman (2005, Chapters 8-10) for univariate data, Yao et al. (2005), Yao and Lee (2006) and Kauermann and Wegener (2009) for longitudinal data (or sparse functional data), and Di et al. (2009) and Staicu et al. (2010) for multi-level functional data. To alleviate the computational burden, Durbán et al. (2005) pursued a simple penalized spline (P-spline, O'Sullivan 1986; Eilers and Marx 1996) approach to fit subject-specific curves, which expresses these curves as a linear combination of truncated polynomial spline basis functions with random coefficients and specifies a simplified parametric covariance matrix for the basis coefficients.

In this work, we present methods for functional mixed effects models that decompose longitudinal or functional outcomes as a sum of several terms: an unspecified population mean function, covariates with time-varying coefficients, functional subject-specific random effects and a residual measurement error process. Using penalized splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the subject-specific curves and their associated covariance function, which represents between-subject variation, and the variance function of the residual measurement errors, which represents within-subject variation. The proposed model and methods maintain flexibility in modeling both the population-level and the subject-level curves. In addition, decomposing the variability of the outcomes into a between-subject and a within-subject source is useful for identifying the dominant variance component and therefore for modeling the covariance function optimally. The benefit of such a decomposition is illustrated through an analysis of the Berkeley growth data (Tuddenham and Snyder 1954), where we identified clearly distinct patterns of the between- and within-subject covariance. Both estimated covariance functions satisfy the positive semidefinite constraint.

All nonparametric components of our model are estimated through P-spline which is considered as a reduced rank smoother. Penalized spline was originally proposed by O'Sullivan (1986) and has gained popularity since Eilers and Marx (1996) and Ruppert et al. (2003). A comprehensive review of penalized spline can be found in Ruppert et al. (2003, 2009). With a penalized spline basis expansion of the random subject-specific curves, the dimensionality of the covariance matrix of the random basis coefficients is reduced due to the moderate number of knots, leading to computational advantage. Theoretical work has shown that penalized spline as a low rank approximation can be asymptotically as efficient as full rank estimators such as smoothing splines (Li and Ruppert 2008; Claeskens et al. 2009).

This work differs from the functional principal components based and longitudinal data analysis based approaches in the literature (e.g., Diggle and Verbyla 1998; Yao et al. 2005; Yao and Lee 2006; Di et al. 2009; Kauermann and Wegener 2009) in that we do not require a moment estimator or a surface smoother of the covariance function/matrix before smoothing, and the dimensionality is reduced by using a moderate number of knots instead of a reduced number of principal components. The proposed methods improve upon the regression spline based approaches (Rice and Wu 2001), which are sensitive to the number and location of the knots, by imposing a penalty on the spline coefficients to control overfitting and achieve a smooth fit. This work also differs from Durbán et al. (2005) by allowing a general unstructured covariance matrix for the random spline basis coefficients of the subject-specific curves and by allowing the residual error variance to be modeled nonparametrically. Compared to the local polynomial or kernel based approaches (e.g., Fan et al. 2007), the proposed methodologies allow for easy incorporation of fixed and random effects.

The current literature studies the asymptotic properties of the penalized spline estimator obtained from univariate data (a single measurement for each subject). Li and Ruppert (2008) examined the asymptotics of a P-spline estimator with a B-spline basis and first or second order penalty, assuming the number of knots is relatively large. Kauermann et al. (2009) studied the asymptotics of the P-spline estimator allowing for generalized non-normal outcomes. Claeskens et al. (2009) obtained two asymptotic scenarios of the P-spline estimator and showed the asymptotic bias and variance for each scenario with univariate data. In this work, we examine the asymptotic properties of the P-spline nonparametric population mean function estimated with longitudinal data. We show that under appropriate assumptions, the order of the bias and the variance term is the same as for the univariate data as shown in Claeskens et al. (2009).

The remaining part of the paper is structured as follows. Section 2 proposes methods to estimate the population mean function and the within-subject heteroscedastic error variance function nonparametrically. Section 3 develops methods for a wider class of functional mixed effects models with varying coefficients, random unspecified subject-specific curves and heteroscedastic measurement errors. In section 4, we show the asymptotic bias, variance and normality of the penalized spline estimator with longitudinal data. In section 5, we conduct two simulation studies to investigate performance of the proposed methods and apply them to analyze the Berkeley growth data and the Framingham Heart Study data. In section 6, we discuss possible extensions.

2. Semiparametric estimation of the within-subject variation

In this section we account for heteroscedastic within-subject errors while estimating the population mean function and the residual variance function nonparametrically. Let i index subjects and let j index visits. A useful model for longitudinal data analysis is a partially linear mixed effects model,

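$$y_{ij} = \mu(t_{ij}) + x_{ij}^T \beta_0 + z_{ij}^T b_i + \varepsilon_{ij}(t_{ij}), \quad \varepsilon_i \sim N\left(0, V_i^{1/2} R_i(\theta) V_i^{1/2}\right), \quad V_i = \mathrm{diag}\{\sigma^2(t_{i1}), \ldots, \sigma^2(t_{i,m_i})\}, \tag{1}$$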

where μ(t) is a nonparametric population mean function, xij is a px × 1 vector of covariates and β0 is the associated parameter vector, bi are i.i.d. random effect vectors following N(0, D), zij are the associated design vectors, and the vectors of heteroscedastic measurement errors εi = (εi1, · · · , εimi)T are assumed to be independent of the random effects, and their variance function, σ2(t), will be modeled nonparametrically, and Ri(θ) is a parametric correlation matrix such as AR-1 or compound symmetry with θ as the vector of unknown parameters. When μ(t) takes a linear form in model (1), it reduces to a linear mixed effects model with heteroscedastic errors. When μ(t) has a known non-linear form such as exponential, the model (1) reduces to a non-linear mixed effects model with heteroscedastic errors.
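As an illustration of the error covariance in model (1), the following minimal sketch builds the matrix V_i^{1/2} R_i(θ) V_i^{1/2} for one subject. It assumes an AR-1 correlation indexed by visit order and a user-supplied variance function; the function name `within_subject_cov` and the example inputs are hypothetical and not part of the paper.

```python
import numpy as np

def within_subject_cov(times, sigma2, rho):
    """Return V^{1/2} R(rho) V^{1/2} for one subject (AR-1 in visit order, an assumption)."""
    times = np.asarray(times, dtype=float)
    sd = np.sqrt(sigma2(times))                      # diagonal of V^{1/2}
    idx = np.arange(times.size)
    lags = np.abs(np.subtract.outer(idx, idx))       # |j - k| between visits
    R = rho ** lags                                  # AR-1 correlation matrix
    return sd[:, None] * R * sd[None, :]

# Hypothetical example: three visits, variance function exp(3t), rho = 0.6
cov = within_subject_cov([0.1, 0.4, 0.9], lambda t: np.exp(3 * t), rho=0.6)
```

The diagonal of `cov` reproduces the variance function at the observed times, while the off-diagonal entries combine the heteroscedastic standard deviations with the parametric correlation.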

In practice it may not be easy to model the population mean and the error variance function parametrically. For example, the Berkeley growth data which we analyzed in section 5.2 clearly illustrates a non-linear trend of the mean function and the variance function of children's heights which are not straightforward to be specified parametrically. In the online appendix, we provide methods to estimate these functions nonparametrically by penalized splines and present a likelihood based smoothing parameter selection approach to choose multiple smoothing parameters.

3. Functional mixed effects model and nonparametric estimation of the between-subject variation

3.1 Model and proposed methods

In this section, we propose methods for a wider class of functional mixed effects models where we accommodate covariates with varying coefficients and in addition to heteroscedastic errors, we accommodate functional subject-specific random effects. To be specific, consider

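$$y_{ij} = x_{ij}^T \beta_0 + \mu(t_{ij}) + w_{ij}\beta(t_{ij}) + \nu_i(t_{ij}) + \varepsilon_{ij}(t_{ij}), \quad \nu_i(t) \sim W(0, \gamma), \quad \varepsilon_i \sim N\left(0, V_i^{1/2} R_i(\theta) V_i^{1/2}\right), \quad V_i = \mathrm{diag}\{\sigma^2(t_{i1}), \ldots, \sigma^2(t_{i,m_i})\}, \tag{2}$$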

where νi(t) are functional subject-specific random effects assumed to be independent, W(0, γ) is a Gaussian process with covariance function γ(s, t), and the residuals εij are again assumed to have nonparametric variance σ2(t). When β(t) = 0 and νi(t) has a parametric form, model (2) reduces to model (1). The model (2) can handle nonparametric population mean function, varying-coefficients and unspecified subject-specific curves with an unspecified covariance function, therefore one obtains flexible estimation of both the population-level and subject-level curves.

Assume that the population mean, time-varying coefficient, functional random effects and heteroscedastic error variance functions can be approximated as

$$\mu(t) = B_\mu(t)\beta_\mu, \quad \beta(t) = B_c(t)\beta_c, \quad \nu_i(t) = B_\nu(t)\xi_i, \quad \text{and} \quad \log \sigma^2(t) = B_\sigma(t)\eta,$$

where Bμ(t), Bc(t), Bν(t) and Bσ(t) are row vectors of basis functions for the mean, varying coefficient, subject-specific curves and error variance function, possibly with different orders and different numbers of knots, βμ, βc and η are the associated basis coefficients, and ξi are vectors of random subject-specific basis coefficients. Since the functional random effects νi(t) are approximated by a linear combination of spline basis functions with random coefficients, the between-subject covariance function can be approximated by

$$\gamma(s, t) = B_\nu(s)\,\Omega\,B_\nu^T(t), \quad \text{where } \Omega = \mathrm{cov}(\xi_i).$$
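The approximation γ(s, t) = Bν(s) Ω Bνᵀ(t) can be evaluated on a grid as in the sketch below. It assumes a linear truncated polynomial basis and an arbitrary positive semidefinite Ω generated at random; the knots, the helper names and the matrix `omega` are illustrative assumptions, not estimates from the paper. The example also shows that any positive semidefinite Ω yields a positive semidefinite covariance surface.

```python
import numpy as np

def linear_truncated_basis(t, knots):
    """Rows (1, t, (t - k_1)_+, ..., (t - k_K)_+) evaluated at each time point."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    trunc = np.clip(t[:, None] - np.asarray(knots)[None, :], 0.0, None)
    return np.column_stack([np.ones_like(t), t, trunc])

def between_subject_cov(s, t, knots, omega):
    """gamma(s, t) = B_nu(s) Omega B_nu(t)^T on the grid s x t."""
    return linear_truncated_basis(s, knots) @ omega @ linear_truncated_basis(t, knots).T

knots = np.array([0.25, 0.5, 0.75])          # hypothetical knots
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
omega = A @ A.T                              # an arbitrary positive semidefinite Omega
grid = np.linspace(0, 1, 11)
gamma = between_subject_cov(grid, grid, knots, omega)
```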

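Let $B_{ci} = (w_{i1}B_c^T(t_{i1}), \ldots, w_{im_i}B_c^T(t_{im_i}))^T$, $X_i = (x_i, B_{\mu i}, B_{ci})$, $Z_i = (B_\nu^T(t_{i1}), \ldots, B_\nu^T(t_{im_i}))^T$ and $\beta = (\beta_0^T, \beta_\mu^T, \beta_c^T)^T$. Then the model (2) can be re-written as

$$Y_i = X_i\beta + Z_i\xi_i + \varepsilon_i, \quad \xi_i \sim N(0, \Omega), \quad \text{and} \quad \varepsilon_i \sim N\left(0, V_i^{1/2} R_i V_i^{1/2}\right).$$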

Direct maximization of the penalized marginal likelihood of the above model is a difficult non-convex problem. However, we can treat ξi as missing data and employ the EM algorithm. Define the penalized joint log-likelihood of Yi and ξi as

$$\sum_{i=1}^{n}\left\{(Y_i - X_i\beta - Z_i\xi_i)^T \left(V_i^{1/2} R_i V_i^{1/2}\right)^{-1} (Y_i - X_i\beta - Z_i\xi_i) + \xi_i^T\Omega^{-1}\xi_i\right\} + \lambda_\mu \beta_\mu^T P_\mu \beta_\mu + \lambda_c \beta_c^T P_c \beta_c + \lambda_\sigma \eta^T P_\sigma \eta + \lambda_\nu \sum_{i=1}^{n} \xi_i^T P_\nu \xi_i, \tag{3}$$

where $\lambda_\mu$, $\lambda_c$, $\lambda_\nu$ and $\lambda_\sigma$ are smoothing parameters and $P_\mu$, $P_c$, $P_\nu$ and $P_\sigma$ are penalty matrices depending on the chosen basis. For example, for the $p$th order truncated polynomial basis with $K$ knots, the penalty matrix is $\mathrm{diag}(0_{p+1}, 1_K)$. The first three penalty terms in (3) control the smoothness of the fitted population mean, varying coefficient and error variance functions. The last penalty term controls the smoothness of the fitted subject-specific curves. It is motivated by the assumption that the random effects are realizations of a Gaussian process with a smooth covariance function. A similar penalty was used in Krafty et al. (2008) for smoothing splines and in Wu and Zhang (2006).
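To make the penalty matrix diag(0_{p+1}, 1_K) concrete, here is a minimal sketch of a pth order truncated polynomial basis and its penalty. The function names, the knot placement and the coefficient vector are hypothetical; the sketch assumes p ≥ 1.

```python
import numpy as np

def truncated_power_basis(t, knots, p):
    """Columns 1, t, ..., t^p followed by (t - k)_+^p for each knot k (assumes p >= 1)."""
    t = np.asarray(t, dtype=float)
    poly = np.vander(t, p + 1, increasing=True)
    trunc = np.clip(t[:, None] - np.asarray(knots)[None, :], 0.0, None) ** p
    return np.hstack([poly, trunc])

def truncated_power_penalty(p, n_knots):
    """Penalty diag(0_{p+1}, 1_K): polynomial part unpenalized, knot coefficients penalized."""
    return np.diag(np.concatenate([np.zeros(p + 1), np.ones(n_knots)]))

knots = np.linspace(0.1, 0.9, 9)                       # hypothetical equally spaced knots
B = truncated_power_basis(np.linspace(0, 1, 50), knots, p=2)
P = truncated_power_penalty(2, knots.size)
coef = np.arange(1.0, B.shape[1] + 1)                  # illustrative coefficients 1, ..., 12
penalty_value = coef @ P @ coef                        # sum of squared knot coefficients
```

With this penalty, only the coefficients of the truncated terms are shrunk, so the fit is pulled toward a global polynomial of degree p as the smoothing parameter grows.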

Given the variance components Ω, Vi and Ri, we minimize the joint penalized likelihood (3) with respect to β and ξi to obtain

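$$\hat\beta = \left(\sum_{i=1}^{n} X_i^T \hat\Sigma_i^{-1} X_i + P_{\lambda_\mu,\lambda_c}\right)^{-1} \left(\sum_{i=1}^{n} X_i^T \hat\Sigma_i^{-1} Y_i\right), \quad \hat\xi_i = \hat\Omega_{\lambda_\nu} Z_i^T \hat\Sigma_i^{-1} (Y_i - X_i\hat\beta), \tag{4}$$

where $\hat\Sigma_i = Z_i \hat\Omega_{\lambda_\nu} Z_i^T + V_i^{1/2} R_i V_i^{1/2}$, $\hat\Omega_{\lambda_\nu} = (\hat\Omega^{-1} + \lambda_\nu P_\nu)^{-1}$, and $P_{\lambda_\mu,\lambda_c} = \mathrm{diag}(0_{p_x}, \lambda_\mu P_\mu, \lambda_c P_c)$, where $p_x$ is the column dimension of $x_i$. The between-subject variance component Ω is estimated by restricted maximum likelihood (REML), which yields

$$\hat\Omega = \frac{1}{n}\sum_{i=1}^{n}\left\{\hat\xi_i\hat\xi_i^T + \hat\Omega_{\lambda_\nu} - \hat\Omega_{\lambda_\nu} Z_i^T M_i Z_i \hat\Omega_{\lambda_\nu}\right\}, \tag{5}$$

with $M_i = \hat\Sigma_i^{-1} - \hat\Sigma_i^{-1} X_i \left(\sum_{i=1}^{n} X_i^T \hat\Sigma_i^{-1} X_i + P_{\lambda_\mu,\lambda_c}\right)^{-1} X_i^T \hat\Sigma_i^{-1}$.

To summarize, we use the following algorithm to estimate the parameters in (2). Assuming working independent residuals with constant variance, we obtain an initial value $\hat\beta^{(0)}$. Let $\Omega^{(0)} = \mathrm{diag}\{1, \ldots, 1\}$, $\lambda_\nu = 1$, $\Omega^{(0)}_{\lambda_\nu} = \{(\Omega^{(0)})^{-1} + \lambda_\nu P_\nu\}^{-1}$, and $\hat\xi_i^{(0)} = \Omega^{(0)}_{\lambda_\nu} Z_i^T (\hat\Sigma_i^{(0)})^{-1} (Y_i - X_i\hat\beta^{(0)})$. We repeat the following step 1 and step 2 until convergence is reached.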

Step 1. Use methods introduced in Section 2 and the online appendix to estimate η and θ which are associated with the within-subject covariance function.

Step 2. Calculate the EM algorithm based estimators (4) and (5).
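Step 2 can be sketched numerically as follows. This is a minimal illustration of the updates (4) and (5) only: the within-subject covariance matrices are held fixed (Step 1 is not implemented), the smoothing parameters are supplied by the user, and the toy data use a random intercept and slope rather than a spline basis. The function name `em_step` and all inputs are hypothetical.

```python
import numpy as np

def em_step(Y, X, Z, W, omega, P_beta, P_nu, lam_nu):
    """One pass of updates (4) and (5); W[i] stands for V_i^{1/2} R_i V_i^{1/2}, held fixed."""
    n = len(Y)
    omega_lam = np.linalg.inv(np.linalg.inv(omega) + lam_nu * P_nu)
    Sinv = [np.linalg.inv(Z[i] @ omega_lam @ Z[i].T + W[i]) for i in range(n)]
    A_inv = np.linalg.inv(sum(X[i].T @ Sinv[i] @ X[i] for i in range(n)) + P_beta)
    beta = A_inv @ sum(X[i].T @ Sinv[i] @ Y[i] for i in range(n))
    xi, omega_new = [], np.zeros_like(omega)
    for i in range(n):
        xi_i = omega_lam @ Z[i].T @ Sinv[i] @ (Y[i] - X[i] @ beta)
        M_i = Sinv[i] - Sinv[i] @ X[i] @ A_inv @ X[i].T @ Sinv[i]
        omega_new += (np.outer(xi_i, xi_i) + omega_lam
                      - omega_lam @ Z[i].T @ M_i @ Z[i] @ omega_lam)
        xi.append(xi_i)
    return beta, xi, omega_new / n

# Toy data (assumed): random intercept and slope, homoscedastic independent errors
rng = np.random.default_rng(1)
n, m = 30, 6
true_beta = np.array([1.0, 2.0])
X = [np.column_stack([np.ones(m), rng.uniform(size=m)]) for _ in range(n)]
Z = [x.copy() for x in X]
Y = [X[i] @ true_beta + Z[i] @ rng.normal(scale=0.5, size=2)
     + rng.normal(scale=0.3, size=m) for i in range(n)]
W = [0.09 * np.eye(m) for _ in range(n)]
omega = np.eye(2)                         # starting value diag{1, ..., 1}
no_penalty = np.zeros((2, 2))             # penalties switched off in this toy example
for _ in range(20):
    beta, xi, omega = em_step(Y, X, Z, W, omega, no_penalty, no_penalty, 0.0)
```

In a full implementation, each call would be interleaved with Step 1, which updates the matrices in `W` from the current residuals.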

There are four smoothing parameters, λμ, λc, λν and λσ, involved in the estimation. A cross-validation based approach would be computationally intensive. It is also complicated to carry out information criteria based model selection due to difficulties in defining degrees of freedom. We choose the smoothing parameters by a likelihood based approach as described in the online appendix.

After convergence is reached, the estimated nonparametric population-level curve is $\hat\mu(t) = B_\mu(t)\hat\beta_\mu$, and the predicted nonparametric subject-level curve for individual $i$ is

$$\hat s_i(t) = x_i^T(t)\hat\beta_0 + B_\mu(t)\hat\beta_\mu + w_i(t)B_c(t)\hat\beta_c + B_\nu(t)\hat\xi_i.$$

Furthermore, the estimated between-subject covariance function is

$$\hat\gamma(s, t) = B_\nu(s)\,\hat\Omega\,B_\nu^T(t). \tag{6}$$

3.2 Testing the varying coefficients

In some applications, one may be interested in testing whether the varying-coefficient changes with time, that is, the hypothesis

$$H_0: \beta(t) = \beta \text{ for any } t \quad \text{vs.} \quad H_1: \beta(t) \neq \beta \text{ for some } t.$$

Due to the non-standard distribution of the likelihood ratio test under the null hypothesis reported in Crainiceanu and Ruppert (2004a) and Crainiceanu and Ruppert (2004b), we compute the p-value of the likelihood ratio test based on bootstrap resampling. Specifically, let

$$\hat\varepsilon_i = Y_i - x_i\hat\beta_0 - B_\mu(t_i)\hat\beta_\mu - w_i B_c(t_i)\hat\beta_c - Z_i\hat\xi_i$$

be the residuals obtained under H1, and let

$$Y_i^{(b)} = x_i\hat\beta_0^{H_0} + B_\mu(t_i)\hat\beta_\mu^{H_0} + w_i\hat\beta_c + Z_i\hat\xi_i^{H_0} + \hat\varepsilon_i, \quad i = 1, \ldots, n,$$

denote the $b$th pseudo-outcome under H0, where $\hat\beta_0^{H_0}$, $\hat\beta_\mu^{H_0}$, $\hat\beta_c$ and $\hat\xi_i^{H_0}$ are the corresponding estimators obtained under the null hypothesis. We resample the data $Y_i^{(b)}$ from the above model B times, and compute the likelihood ratio statistic for each of the B samples. We then compute the p-value of the test based on the empirical distribution of the bootstrapped likelihood ratio statistics. A similar procedure was used in Huang et al. (2002).
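The final step, turning the bootstrapped statistics into a p-value, can be sketched as below. The text does not give the exact formula, so the sketch assumes the p-value is the share of bootstrapped likelihood ratio statistics at least as large as the observed one; the function name and the numbers are purely illustrative.

```python
def bootstrap_p_value(observed, bootstrap_stats):
    """Share of bootstrapped statistics that are at least as large as the observed one."""
    if not bootstrap_stats:
        raise ValueError("need at least one bootstrap replicate")
    exceed = sum(1 for stat in bootstrap_stats if stat >= observed)
    return exceed / len(bootstrap_stats)

# Hypothetical null statistics from B = 10 bootstrap samples
null_stats = [0.4, 1.3, 0.0, 2.7, 0.9, 5.1, 0.2, 1.8, 3.3, 0.6]
p = bootstrap_p_value(3.0, null_stats)   # two of the ten replicates reach 3.0
```

With B replicates the smallest nonzero p-value this estimator can return is 1/B, so an observed statistic exceeding all B = 100 replicates would be reported as p < 0.01.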

4. Asymptotic properties

In this section, we examine the asymptotic convergence rate of the bias and variance of the estimated population mean function and examine the asymptotic normality. These results are closely related to those obtained in Claeskens et al. (2009) and Zhu et al. (2008). Assume that the range of the variable tij is [a, b], with –∞ < a < b < ∞. We will first consider the estimator with B-spline basis, and then extend the results to the truncated polynomial basis by a transformation of the two sets of basis functions (the latter results are presented in the online appendix).

4.1 Preliminary

Let $a = \tau_0 < \tau_1 < \cdots < \tau_K < \tau_{K+1} = b$. In addition, define $p$ knots $\tau_{-p} = \tau_{-p+1} = \cdots = \tau_{-1} = \tau_0$ and another set of $p$ knots $\tau_{K+1} = \tau_{K+2} = \cdots = \tau_{K+p+1}$. Denote the B-spline basis functions as $N(t) = \{N_{-p,p+1}(t), \ldots, N_{K,p+1}(t)\}$, let $N = \{N^T(t_{11}), \ldots, N^T(t_{nm})\}^T$ and let $\Sigma = \mathrm{diag}\{V, \ldots, V\}$. We allow the covariance of $Y_i$ to be unstructured, and assume that it is known and does not change across subjects. As described in section 2, the population mean function is obtained by minimizing

$$(Y - N\beta_\mu)^T \Sigma^{-1} (Y - N\beta_\mu) + \lambda \int_a^b \left[\{N(t)\beta_\mu\}^{(q)}\right]^2 dt, \tag{7}$$

where the penalty is the integrated squared qth order derivative of the B-spline function and is assumed to be finite.

Let $R$ denote a matrix with elements $R_{ij} = \int_a^b N_{j,p+1-q}(t) N_{i,p+1-q}(t)\,dt$, for $i, j = -p+q, \ldots, K$, and let $\Delta_q$ denote a difference operator. The penalty term can be re-written as $\lambda\beta_\mu^T \Delta_q^T R \Delta_q \beta_\mu$. Letting $D_q = \Delta_q^T R \Delta_q$, the fitted population mean function can be expressed as a ridge regression estimator with weighted least squares,

$$\hat\mu = N(N^T\Sigma^{-1}N + \lambda D_q)^{-1} N^T\Sigma^{-1} Y,$$

with $\hat\mu = \{\hat\mu(t_{11}), \ldots, \hat\mu(t_{nm})\}^T$. A regression spline estimator is the solution to (7) ignoring the penalty term, that is,

$$\hat\mu_{reg} = N(N^T\Sigma^{-1}N)^{-1} N^T\Sigma^{-1} Y.$$
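The two estimators can be compared numerically with the sketch below. It is only an illustration under stated assumptions: the B-spline basis is computed with the Cox-de Boor recursion, Σ is taken to be the identity, and the matrix R in D_q is replaced by the identity, which gives the simpler difference penalty of Eilers and Marx (1996) rather than the integrated derivative penalty in (7). All function names, the knots and the simulated data are hypothetical.

```python
import numpy as np

def bspline_basis(t, interior, degree, a=0.0, b=1.0):
    """B-spline design matrix via the Cox-de Boor recursion (K + degree + 1 columns)."""
    t = np.asarray(t, dtype=float)
    knots = np.concatenate([[a] * (degree + 1), interior, [b] * (degree + 1)])
    B = np.zeros((t.size, knots.size - 1))
    for j in range(knots.size - 1):              # degree-0 indicator functions
        if knots[j] < knots[j + 1]:
            right = (t <= b) if knots[j + 1] == b else (t < knots[j + 1])
            B[:, j] = (t >= knots[j]) & right
    for d in range(1, degree + 1):               # raise the degree one step at a time
        nxt = np.zeros((t.size, knots.size - 1 - d))
        for j in range(knots.size - 1 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                nxt[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                nxt[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = nxt
    return B

def spline_fit(N, y, Sigma_inv, lam=0.0, q=2):
    """N (N' S N + lam D_q)^{-1} N' S y with D_q = Delta_q' Delta_q (R set to identity)."""
    Delta = np.diff(np.eye(N.shape[1]), n=q, axis=0)
    A = N.T @ Sigma_inv @ N + lam * Delta.T @ Delta
    return N @ np.linalg.solve(A, N.T @ Sigma_inv @ y)

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(size=200))
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)
N = bspline_basis(t, np.linspace(0, 1, 12)[1:-1], degree=3)   # 10 interior knots
S = np.eye(t.size)                                            # Sigma^{-1}, assumed identity
mu_reg = spline_fit(N, y, S)                                  # regression spline, lam = 0
mu_pen = spline_fit(N, y, S, lam=1.0)                         # penalized spline
```

Setting `lam=0` recovers the regression spline, which by construction has the smaller residual sum of squares; the penalized fit trades some of that fit for smoother coefficients.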

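Denote $C^{p+1}[a, b] = \{\mu : \mu \text{ has } p+1 \text{ continuous derivatives}\}$. Under the assumptions A1, (A-1) in A2, and A3 stated in the online appendix, and $\mu \in C^{p+1}[a, b]$, Zhu et al. (2008) obtained the approximation bias and variance for $\hat\mu_{reg}$ as

$$E\hat\mu_{reg}(t) - \mu(t) = b_a(t, p+1) + o(\delta^{p+1}), \quad \mathrm{Var}\{\hat\mu_{reg}(t)\} = \frac{1}{n}N(t)G^{-1}N^T(t) + o((n\delta)^{-1}),$$

where $\hat\mu_{reg}(t) = N(t)(N^T\Sigma^{-1}N)^{-1}N^T\Sigma^{-1}Y$, $G = (g_{ij})$, and $\Sigma^{-1} = (\sigma^{st})$ with

$$g_{ij} = \sum_{s \neq t}^{m} \int_a^b\!\!\int_a^b N_i(x)\,\sigma^{st}\,N_j(y)\,\rho_{st}(x, y)\,dx\,dy + \sum_{s=1}^{m} \int_a^b N_i(x)\,\sigma^{ss}\,N_j(x)\,\rho_s(x)\,dx,$$

where $\rho_s$ and $\rho_{st}$ are defined in the online appendix. The approximation bias is

$$b_a(t, p+1) = -\frac{\mu^{(p+1)}(t)}{(p+1)!} \sum_{i=0}^{K} I(\tau_i \le t < \tau_{i+1})\,\delta_i^{p+1}\,B_{p+1}\!\left(\frac{t - \tau_i}{\delta_i}\right),$$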

with Bp+1(t) as the (p + 1)th Bernoulli polynomial (Barrow and Smith 1978). These results will be used to derive the asymptotic properties of the penalized spline estimator. The asymptotic results are in the sense of keeping number of measurements per subject fixed and letting the number of subjects go to infinity.

4.2 Asymptotic properties for P-spline estimator with B-spline basis

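Denote $K_q = \lambda K^{2q}/n$ and $\hat\mu(t) = N(t)(N^T\Sigma^{-1}N + \lambda D_q)^{-1}N^T\Sigma^{-1}Y$.

Theorem 1: 1. Under assumptions A1, (A-1) in A2, A3, $K_q = o(1)$, and $\mu(\cdot) \in C^{p+1}[a, b]$, the following statements hold

$$E(\hat\mu(t)) - \mu(t) = b_a(t, p+1) + b_\lambda(t, \Sigma) + o(\delta^{p+1}) + o(\lambda n^{-1}\delta^{-q}),$$

$$\mathrm{Var}(\hat\mu(t)) = \frac{1}{n}N(t)\left(G + \frac{\lambda}{n}D_q\right)^{-1} G \left(G + \frac{\lambda}{n}D_q\right)^{-1} N^T(t) + o(n^{-1}\delta^{-1}),$$

and for $K \sim n^{1/(2p+3)}$ and $\lambda = O(n^{(p+2-q)/(2p+3)})$, the optimal rate for mean squared error (MSE), $n^{-(2p+2)/(2p+3)}$, is attained by the penalized spline estimator.

2. Under assumptions A1, (A-2) in A2, A3, $K_q = O(1)$ and $\mu(\cdot) \in W^q[a, b] = \{\mu : \mu \text{ has } q-1 \text{ absolutely continuous derivatives}, \int_a^b \{\mu^{(q)}(x)\}^2 dx < \infty\}$, the Sobolev space of order $q$, the following statements hold

$$E(\hat\mu(t)) - \mu(t) = b_a(t, q) + b_\lambda(t, \Sigma) + o(\delta^{q}) + o\left((\lambda/n)^{1/2}\right),$$

$$\mathrm{Var}(\hat\mu(t)) = \frac{1}{n}N(t)\left(G + \frac{\lambda}{n}D_q\right)^{-1} G \left(G + \frac{\lambda}{n}D_q\right)^{-1} N^T(t) + o\left(n^{-1}(\lambda/n)^{-1/(2q)}\right),$$

and for $\lambda \sim n^{1/(2q+1)}$ and $K \sim n^{1/(2q+1)}$, the optimal rate for MSE, $n^{-2q/(2q+1)}$, is attained by the penalized spline estimator.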

The proof of the Theorem is given in the online appendix.

Remark 1: For both scenarios in Theorem 1, the shrinkage bias $b_\lambda(t, \Sigma) = -\frac{\lambda}{n}N(t)\left(G + \frac{\lambda}{n}D_q\right)^{-1}D_q\beta$ depends on Σ through G.

Remark 2: Theorem 1 holds for both fixed designs and random designs. The asymptotic approximation bias does not depend on the design distribution. The asymptotic shrinkage bias depends on the design distribution through G.

Remark 3: Under different conditions, Theorem 2 in Claeskens et al. (2009) obtained the same rate for the bias and the variance with m = 1 and Σ = σ2In, i.e., the univariate case.

Remark 4: The above theorem suggests that the asymptotic properties of the penalized spline estimator are closer to the regression spline estimator when the number of knots is small (Kq = o(1)) while its asymptotic properties are closer to the smoothing spline estimators when the number of knots is large (Kq = O(1)). This observation is also noted in Claeskens et al. (2009) for independent data.

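Theorem 2: Assume $K^{2p+3} \sim n$, $\lambda = O(K^{p-q+2})$, and that there exist $h > 0$ and $C > 0$ such that $\sup_{i,j} E|\varepsilon_{ij}|^{2+h} \le C$. Then

$$\frac{\hat\mu(t) - \mu(t) - b_a(t, p+1) - b_\lambda(t, \Sigma)}{\sqrt{\mathrm{Var}(\hat\mu(t))}} \to N(0, 1)$$

in distribution, as n → ∞.

Remark 5: Under the assumptions of this theorem, $K_q = \lambda K^{2q}/n = O(K^{p-q+2}K^{2q}/n) = O(n^{(p+q+2)/(2p+3)}/n) = O(n^{-(p-q+1)/(2p+3)}) = o(1)$. Hence, the asymptotic normality addresses the first scenario in Theorem 1.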
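The exponent arithmetic in Remark 5 can be checked with exact fractions. The sketch below assumes only the two rates stated in Theorem 2, K ~ n^{1/(2p+3)} and λ = O(K^{p-q+2}); the function name `kq_exponent` is hypothetical.

```python
from fractions import Fraction

def kq_exponent(p, q):
    """Exponent a such that K_q = O(n^a) when K^(2p+3) ~ n and lambda = O(K^(p-q+2))."""
    k_rate = Fraction(1, 2 * p + 3)          # K grows like n^(1/(2p+3))
    lam_rate = (p - q + 2) * k_rate          # lambda = O(K^(p-q+2))
    return lam_rate + 2 * q * k_rate - 1     # K_q = lambda * K^(2q) / n

a = kq_exponent(p=3, q=2)   # cubic B-splines with a second order penalty
```

For p = 3 and q = 2 the exponent is -2/9, which matches -(p - q + 1)/(2p + 3) and is negative, so K_q = o(1) as the remark states. The exponent is negative whenever q ≤ p.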

In the online appendix, we present similar asymptotic properties for P-spline estimator with truncated polynomial basis.

5. Numerical results

5.1 Simulation Studies

Simulation Study I. Our first simulation study examines performance of the semiparametric estimator of the within-subject covariance presented in section 2. We compared the proposed P-spline estimator with three other alternatives: (1) Regression spline estimator (R-spline) for both mean and variance function; (2) Penalized spline estimator for the mean function when assuming working independent residuals with constant variance (WI); and (3) Penalized spline estimator for the mean function when assuming a correctly specified parametric model for the covariance function of the residuals (Parametric). Two simulation scenarios were considered. In the first model, we generated data from

$$y_{ij} = \sin(2\pi t_{ij}) + b_i + \varepsilon_{ij}(t_{ij}),$$

where the variance function of the residuals was $\mathrm{Var}\{\varepsilon_{ij}(t)\} = \exp(3t)$, and the correlation structure was AR-1 with autoregressive parameter ρ = 0.6. The number of subjects was n = 200 and the number of repeated measurements per subject was m = 10, with each measurement missing with probability 0.1. Hence the number of repeated measurements can differ across subjects. The covariates tij were generated from a uniform distribution, U(0, 1). The random effects bi were generated independently from a standard normal distribution.

In the second simulation model, we used $\mu(t) = 7 - 16t + 30t^2 - 15t^3$ and $\sigma^2(t) = 10t$, and all the other settings were the same as in the first case.
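The data-generating mechanism of the first simulation model can be sketched as follows. Several details are assumptions not stated in the text: the observation times are sorted within each subject, the AR-1 correlation is indexed by visit order, and missing visits are dropped independently. The function name `simulate_study1` and the seed are hypothetical.

```python
import numpy as np

def simulate_study1(n=200, m=10, p_miss=0.1, rho=0.6, seed=0):
    """Return rows (subject, time, outcome) for y = sin(2 pi t) + b_i + eps(t)."""
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(n):
        keep = rng.uniform(size=m) >= p_miss           # each visit missing with prob. p_miss
        t = np.sort(rng.uniform(size=int(keep.sum()))) # observation times on (0, 1)
        if t.size == 0:
            continue
        idx = np.arange(t.size)
        R = rho ** np.abs(np.subtract.outer(idx, idx)) # AR-1 in visit order (assumption)
        sd = np.sqrt(np.exp(3 * t))                    # Var{eps(t)} = exp(3t)
        cov = sd[:, None] * R * sd[None, :]
        eps = rng.multivariate_normal(np.zeros(t.size), cov)
        b_i = rng.normal()                             # standard normal random intercept
        y = np.sin(2 * np.pi * t) + b_i + eps
        rows.extend((i, tj, yj) for tj, yj in zip(t, y))
    return np.array(rows)

data = simulate_study1()
```

Each row of `data` holds the subject index, the observation time and the outcome, so the number of rows per subject varies with the missingness pattern.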

We conducted 200 simulation runs. To evaluate the performance of the estimated nonparametric functions, the mean squared errors (MSEs) were calculated over the grid points {0.05, 0.06, · · ·, 0.95} for each simulated dataset. The MSEs were then averaged across the 200 simulated datasets to obtain the average MSE (AMSE). Table 1 summarizes the simulation results. The AMSEμ and AMSEσ are the corresponding AMSEs of μ(t) and σ2(t). The RMSEμ and RMSEσ are the ratios of the AMSE of the proposed P-spline estimators of μ(t) and σ2(t) over that of the other estimators. The RMSEμ of the proposed estimator over the estimator assuming working independent residuals was around 0.85 for both simulation models, which suggests an efficiency gain in estimating the mean function from properly accounting for the within-subject covariance with the proposed semiparametric estimator. The RMSEμ of the proposed P-spline estimator over the regression spline was about 0.95. The corresponding RMSEσ for the variance function of the proposed estimator over the regression spline was about 0.90 for both simulation models, which shows the proposed method to be also more efficient (about a 10% reduction in AMSE) in estimating the variance function. Compared with the parametric approach, which assumes the functional form of the variance function to be known, the RMSEμ of the proposed approach was slightly over one, indicating a low efficiency loss from adopting the proposed semiparametric approach to estimate variance functions.

Table 1.

Simulation results based on model 1, 200 replications

Method       RMSEμ   AMSEμ    RMSEσ   AMSEσ
Case I
P-spline     1       0.0317   1       0.637
R-spline     0.948   0.0335   0.901   0.708
Parametric   1.001   0.0315   -       -
WI           0.829   0.0381   -       -

Case II
P-spline     1       0.0281   1       0.611
R-spline     0.946   0.0297   0.889   0.687
Parametric   1.005   0.028    -       -
WI           0.849   0.033    -       -

RMSE: The ratio of AMSE between the proposed method and other methods. A dash marks a quantity that is not reported for that method.
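The RMSE columns are ratios of the AMSE of the proposed P-spline estimator to that of each competitor. As a small worked example, the sketch below recomputes the RMSEμ values for Case I from the tabulated AMSEμ values; the function name is hypothetical. The recomputed ratios are close to, but not identical to, the tabulated ones, presumably because the published AMSE values are rounded.

```python
# AMSE of the mean function, Case I, copied from Table 1
amse_mu = {"P-spline": 0.0317, "R-spline": 0.0335, "WI": 0.0381, "Parametric": 0.0315}

def rmse_vs_proposed(amse, proposed="P-spline"):
    """Ratio AMSE(proposed) / AMSE(other) for every method in the table."""
    return {name: amse[proposed] / value for name, value in amse.items()}

ratios = rmse_vs_proposed(amse_mu)
```

A ratio below one means the proposed estimator has the smaller AMSE, so the value for WI corresponds to a reduction of roughly 17% relative to assuming working independence.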

Simulation Study II. Our second simulation study examines methods proposed for the functional mixed effects model with varying coefficients and nonparametric random subject-specific curves in section 3. We generated data from the model

$$Y_{ij} = \mu(t_{ij}) + \beta(t_{ij})\,trt_i + b_{i0} + b_{i1}\nu(t_{ij}) + \varepsilon_{ij}(t_{ij}),$$

where we considered two simulation scenarios. In the first scenario, we specified

$$\mu(t) = 2\sin(2\pi t), \quad \beta(t) = \tfrac{1}{3}\log t, \quad \nu(t) = 1.5\exp\{-10(t - 0.8)^2\}, \quad \sigma^2(t) = \exp(t).$$

The random coefficients bi0 and bi1 were sampled from N(0, 4) and N(0, 1), respectively. The measurement errors εij(tij) were generated independently from N(0, σ2(tij)). The group indicators, trti, were generated from a Bernoulli distribution with probability 0.6. The total number of subjects was n = 200 and the number of repeated measurements within each subject was m = 10, with probability 0.15 of being missing. The measurement time points were generated from U(0, 1).

In the second scenario, we specified

$$\mu(t) = 2\exp\{\sin(4t)\}, \quad \beta(t) = t, \quad \nu(t) = 0.7\exp(t), \quad \sigma^2(t) = \exp\{-5(t - 0.1)^2\},$$

n = 100, and m = 20 with a missing probability of 0.15. All the other settings were the same as the first case.

The simulation results are summarized in Table 2. Again the AMSEμ, AMSEβ, AMSEσ and AMSEγ are the corresponding AMSEs of μ(t), β(t), σ2(t) and γ(t, t), respectively. The RMSEs are the ratios of the AMSE of the proposed method over that of the other methods. Similar to the first simulation study, we compared the proposed estimators to the regression spline (R-spline), the P-spline assuming working independent residuals (WI) and the P-spline assuming a correctly specified parametric model for the subject-specific random effects covariance and residual effects variance (Parametric). The efficiency gains of the proposed method for estimating the mean and varying coefficient functions were about 15% compared to assuming working independent residuals in both simulation scenarios, which is not negligible. For estimating the mean function, in the first scenario, the proposed method performed better than the regression spline in terms of AMSE, while in the second scenario their performance was similar. The AMSEσ for estimating the error variance function was 30% lower for the P-spline compared to the regression spline in the first scenario and 17% lower in the second scenario. Analogous to simulation study I, the differences in AMSE between the parametric approach, which assumes a correctly specified subject-specific random effects covariance and residual effects variance, and the proposed method were small in both simulation cases.

Table 2.

Simulation results based on model 2, 200 replications

Method       RMSEμ   AMSEμ    RMSEβ   AMSEβ
Case I
P-spline     1       0.0749   1       0.120
R-spline     0.952   0.0787   0.932   0.129
WI           0.800   0.0935   0.841   0.143
Parametric   1.003   0.0746   0.988   0.122

Case II
P-spline     1       0.144    1       0.270
R-spline     0.998   0.144    0.991   0.273
WI           0.834   0.173    0.888   0.304
Parametric   1.007   0.143    1.003   0.269

Method       RMSEσ   AMSEσ    RMSEγ   AMSEγ
Case I
P-spline     1       0.111    1       0.606
R-spline     0.701   0.158    0.991   0.611

Case II
P-spline     1       0.0048   1       0.624
R-spline     0.831   0.0058   0.995   0.627

RMSE: The ratio of AMSE between the proposed method and other methods.

The computing time to fit the model by the proposed algorithm depends on the number of subjects and number of observations per subject. For example, for the first scenario in Simulation II, the average running time for 100 repetitions with 100 subjects and 10 observations per subject on a Dell desktop with 2.67 GHz CPU and 4GB RAM was 1.42 minutes. We present the computing time for other sample sizes in the online appendix.

5.2 Data examples

Example I. We applied proposed methods to analyze the Berkeley Growth Study (Tuddenham and Snyder 1954) data, a long-term investigation of children's developmental characteristics conducted by the California Institute of Child Welfare. There were 93 subjects examined, including 39 boys and 54 girls. The heights of the children were measured at each of the scheduled times. There were four measurements by a child's first birthday followed by annual measurements from two to eight years, and then biannual measurements until the end of age 18.

Let yij be the height of subject i measured at occasion j, and let tij be the corresponding age. We fitted the model

$$y_{ij} = \mu(t_{ij}) + sex_i \times \beta(t_{ij}) + \nu_i(t_{ij}) + \varepsilon_{ij}(t_{ij}), \quad i = 1, \ldots, 93, \quad j = 1, \ldots, 31,$$

where μ(t) was the mean height function for the boys, β(t) was the height difference between girls and boys over time and νi(t) were the random subject-specific deviations from their respective population mean function for boys and girls.

We used quadratic truncated polynomial splines for the mean, varying coefficient and variance functions and linear splines for the random subject-specific curves. The estimated varying coefficient and mean functions and the associated 95% confidence bands are plotted in Figure 1. The mean function for boys increased rapidly and then slowed down after age 16. The varying coefficient function β(t) decreased quickly after age 12, while for the remaining time it was close to a constant. On average, the girls were shorter than the boys by about 2cm under the age of 12. After age 12, the difference between boys and girls increased quickly. At the age of 18, the maximum difference of about 14cm was reached, with boys being taller. Also note that between ages 10 and 12 there was a visible bump in the difference between the boys and girls, corresponding to the first period of puberty, which comes about two years earlier for girls than for boys.

Figure 1.


Estimated population mean function for boys μ(t) (left panel), varying coefficient β(t) (right panel) and their 95% confidence bands.

Using the bootstrap test introduced in section 3.2, we tested whether the varying coefficient function was a constant. We simulated B = 100 bootstrap samples. The observed likelihood ratio test statistic was T = log L_{H1} - log L_{H0} = 3840. Based on the simulated empirical distribution of T under the null hypothesis, the p-value was less than 0.01. Therefore, we observed significant evidence that the height difference between boys and girls varies across time. This can also be seen from the pointwise 95% confidence interval for β(t).
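As a rough illustration of how an empirical p-value is read off a bootstrap null distribution, the sketch below computes the proportion of null statistics that are at least as large as the observed one. The helper `bootstrap_p_value` is hypothetical, and the null statistics are placeholder random draws, not output of the procedure in section 3.2, which would require refitting the null and alternative models to each bootstrap sample.

```python
import numpy as np

def bootstrap_p_value(t_obs, t_null):
    """Proportion of bootstrap null statistics >= the observed statistic."""
    t_null = np.asarray(t_null, dtype=float)
    return float(np.mean(t_null >= t_obs))

rng = np.random.default_rng(0)
# Placeholder null statistics (assumption): B = 100 arbitrary positive draws
t_null = rng.chisquare(df=3, size=100)

# No placeholder draw comes close to the observed value, so the proportion is 0
p = bootstrap_p_value(3840.0, t_null)
```

With B = 100, the smallest nonzero proportion is 0.01, so a reported p-value below 0.01 corresponds to no bootstrap statistic exceeding the observed one under this convention.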

The estimated covariance function γ(s, t) of the subject-specific curves is plotted in the left panel of Figure 2. There was considerable variation of the subject-specific curves around their mean function, indicating substantial between-subject variation in height growth patterns compared to the within-subject variation. The between-subject variation increased with age. The estimated standard deviation function σ(t) of the residual measurement errors is shown in the right panel of Figure 2. It is evident that the variance function is not constant. The decreasing trend of the variance function suggests that the precision of height measurements improves as a child grows; it is conceivable that height measurements for newborns are more variable than those for teenagers. The magnitude of σ²(t) is much smaller than that of γ(t, t), suggesting that the dominant component of the variation in children's heights is the between-subject source.
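The comparison of γ(t, t) with σ²(t) can be sketched numerically under a simplifying assumption: if the subject-specific curves are written as ν_i(t) = Z(t)ᵀ b_i with random coefficients b_i having covariance D, then γ(s, t) = Z(s)ᵀ D Z(t). The basis, knots, D and σ²(t) below are all hypothetical values, not estimates from the growth data.

```python
import numpy as np

def linear_basis(t, knots):
    """Linear truncated polynomial basis: 1, t, (t - k)_+ for each knot."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)

grid = np.linspace(1.0, 18.0, 35)   # ages at which gamma is evaluated
knots = [6.0, 12.0]                 # hypothetical knots
Z = linear_basis(grid, knots)       # 35 x 4 basis matrix

# Hypothetical coefficient covariance; D = A A^T is positive semi-definite
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
D = A @ A.T

gamma = Z @ D @ Z.T                 # gamma[a, b] = gamma(grid[a], grid[b])
between_var = np.diag(gamma)        # gamma(t, t)
eigvals = np.linalg.eigvalsh(gamma)

sigma2 = np.full_like(grid, 0.5)    # hypothetical residual variance sigma^2(t)
between_share = between_var / (between_var + sigma2)
```

Because γ is a quadratic form in a positive semi-definite D, its eigenvalues on any grid are non-negative up to rounding error. The ratio `between_share` gives, at each age, the fraction of total variance attributed to the between-subject source.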

Figure 2.


Estimated between-subject variation γ(s, t) (left panel) and within-subject variation σ(t) (right panel).

Example II. In this example, we applied the proposed method to analyze the Framingham Heart Study (FHS) longitudinal systolic blood pressure (SBP) data. The FHS is a large ongoing prospective study of risk factors for cardiovascular disease (CVD) with the third generation data collected between 2002 and 2005 (Splansky et al., 2007). We analyzed subjects with age ranging between 30 and 75. There were 190 independent subjects with 2406 observations. For each subject, their SBP, body mass index (BMI) and antihypertensive treatment status (trt) were measured over time. Sex was a baseline covariate coded as one for females and negative one for males. We centered the covariate BMI.

Let yij be the SBP of subject i measured at occasion j, and tij be the corresponding age. We fitted the model

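$$y_{ij} = \alpha_1 \mathrm{sex}_i + \alpha_2 \mathrm{BMI}_{ij} + \mu(t_{ij}) + \beta(t_{ij}) \times \mathrm{trt}_{ij} + \nu_i(t_{ij}) + \varepsilon_{ij}(t_{ij}),$$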

where μ(t) was the population mean SBP function, β(t) was the time-varying effect of the antihypertensive treatment and νi(t) were the random subject-specific deviations from the population mean function. We used a linear truncated polynomial spline basis for the mean function, varying coefficient and random subject-specific curves, and a quadratic spline for the variance function. We show the estimated mean SBP function for subjects with and without treatment in Figure 3. The population mean SBP increased with age, and taking antihypertensive treatment reduced the SBP over time. For example, the mean SBP was 128.8 (95% CI: [126.5, 131.1]) at age 40 and then increased to 139.3 (95% CI: [136.5, 142.1]) at age 60.
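To make the structure of the fixed-effects part of this model concrete, the sketch below assembles a design matrix with columns for sex, centered BMI, a spline basis for μ(t) and the same basis multiplied by the treatment indicator for β(t). The records, knots and variable names are hypothetical toy values, not FHS data; the random subject-specific curves and the spline penalties are omitted.

```python
import numpy as np

def linear_truncated_basis(t, knots):
    """Linear truncated polynomial basis: 1, t, (t - k)_+ for each knot."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)

# Toy records (assumed values for illustration only)
age = np.array([35.0, 42.0, 58.0, 66.0, 71.0])
sex = np.array([1.0, -1.0, 1.0, -1.0, 1.0])    # +1 female, -1 male
bmi = np.array([24.0, 31.0, 27.5, 29.0, 26.0])
trt = np.array([0.0, 0.0, 1.0, 1.0, 0.0])      # treatment status
knots = [45.0, 60.0]                           # hypothetical knots

bmi_c = bmi - bmi.mean()                       # centered BMI
B = linear_truncated_basis(age, knots)         # basis for mu(t)

# Columns: sex | centered BMI | basis for mu(t) | trt * basis for beta(t)
X = np.column_stack([sex, bmi_c, B, trt[:, None] * B])
```

Rows with trt = 0 have zeros in the β(t) block, so for those observations the fitted mean reduces to the covariate terms plus μ(t), matching the untreated curve in the model above.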

Figure 3.


Observed SBP values (dots) and estimated mean SBP function for treated (μ(t) + β(t), trt=1) and untreated subjects (μ(t), trt=0).

We first tested whether the effect of antihypertensive treatment was zero, and the test was significant. Using the bootstrap procedure in section 3.2, we then tested whether there was any time-varying treatment effect (i.e., H0 : β(t) = β*). The observed log-likelihood ratio test statistic was T = 972, with p-value = 0.02 based on B = 100 bootstrap samples. Therefore, we observed significant evidence that the effect of antihypertensive treatment was non-zero and varied with time. The square root of the estimated variance function, σ̂²(t), and the estimated between-subject covariance function, γ̂(s, t), are plotted in Figure 4. The variance function appeared to be non-linear, with changes in the rate of increase at around ages 50 and 68.

Figure 4.


Estimated between-subject variation γ(s, t) (left panel) and within-subject variation σ(t) (right panel).

6. Discussion

In this work, we propose flexible estimation of population-level and subject-level curves in a class of functional mixed effects models with varying coefficients. We also propose nonparametric estimation of the between-subject covariance and semiparametric estimation of the within-subject covariance, which are useful descriptive tools for examining the outcome variability over time. When parsimony is desirable, these functions can be used to design reasonable parametric structures for the covariance of the outcomes. It is easy to see that the estimated covariance functions satisfy the positive semi-definite constraint. Furthermore, taking the covariance function into account improves efficiency in estimating the population mean function and the varying coefficients. The relative efficiency of estimating the covariance function with more subjects or more observations per subject depends on the complexity of the functions σ²(t) and γ(s, t).

In model (2), we assumed that the covariance function γ(s, t) of the subject-specific curves is the same for all subjects. It is possible that γ(s, t) differs across groups of subjects. For example, the covariance functions for boys and girls may be different in the Berkeley Growth data. Such an extension is easy to accommodate with the proposed penalized spline methods by including an interaction between the basis functions and a covariate. In addition, adding parametric random effects to model (2) is also straightforward.

Asymptotic theory for penalized spline estimators was underdeveloped until very recently. In this paper, we have extended the asymptotic bias and variance results in Claeskens et al. (2009) for univariate data to the longitudinal data case, and we show asymptotic normality for one of the asymptotic scenarios. The convergence rates obtained are consistent with those in Claeskens et al. (2009). Although the sample size required for the asymptotics to be an accurate approximation may be large, these results suggest that the P-spline estimator can be asymptotically as efficient as other smoothing techniques such as smoothing splines when the number of knots increases with sample size at a proper rate.

Supplementary Material

Supplementary Data

Acknowledgements

The Framingham data was obtained from the Framingham Heart Study of the National Heart Lung and Blood Institute of the National Institutes of Health and Boston University School of Medicine (Contract No. N01-HC-25195). Yuanjia Wang's research is supported by NIH grant AG031113-01A2. The authors wish to thank the Associate Editor and two anonymous reviewers for their constructive and helpful comments on this manuscript.

Footnotes

Supplementary Materials: Web Appendix A, referenced in sections 2, 4 and 5.1, is available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

Contributor Information

Huaihou Chen, Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 W168th Street, New York, New York 10032, U.S.A.

Yuanjia Wang, Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 W168th Street, New York, New York 10032, U.S.A.

References

  1. Barrow DL, Smith PW. Asymptotic properties of best L2[0,1] approximation by splines with variable knots. Quarterly of Applied Mathematics. 1978;36:293–304.
  2. Claeskens G, Krivobokova T, Opsomer JD. Asymptotic properties of penalized spline estimators. Biometrika. 2009;96:529–544.
  3. Crainiceanu C, Ruppert D. Restricted likelihood ratio tests in nonparametric longitudinal models. Statistica Sinica. 2004a;14:713–729.
  4. Crainiceanu C, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B. 2004b;66:165–185.
  5. Crainiceanu C, Ruppert D, Carroll RJ, Joshi A, Goodner B. Spatially adaptive Bayesian P-splines with heteroscedastic errors. Journal of Computational and Graphical Statistics. 2007;16:265–288.
  6. Di C, Crainiceanu C, Caffo BS, Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics. 2009;3:458–488. doi: 10.1214/08-AOAS206SUPP.
  7. Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; Oxford: 2002.
  8. Diggle P, Verbyla A. Nonparametric estimation of covariance structure in longitudinal data. Biometrics. 1998;54:401–415.
  9. Durbán M, Harezlak J, Wand MP, Carroll RJ. Simple fitting of subject-specific curves for longitudinal data. Statistics in Medicine. 2005;24:1153–1167. doi: 10.1002/sim.1991.
  10. Eilers P, Marx B. Flexible smoothing with B-splines. Statistical Science. 1996;11:89–121.
  11. Fan J, Huang T, Li R. Analysis of longitudinal data with semi-parametric estimation of covariance function. Journal of the American Statistical Association. 2007;102:632–641. doi: 10.1198/016214507000000095.
  12. Guo W. Functional mixed effects models. Biometrics. 2002;58:121–128. doi: 10.1111/j.0006-341x.2002.00121.x.
  13. Huang JZ, Liu N, Pourahmadi M, Liu L. Covariance matrix selection and estimation via penalized normal likelihood. Biometrika. 2006;93:85–98.
  14. Huang J, Wu C, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89:111–128.
  15. Kauermann G, Krivobokova T, Fahrmeir L. Some asymptotic results on generalized penalized spline smoothing. Journal of the Royal Statistical Society, Series B. 2009;71:487–503.
  16. Kauermann G, Wegener M. Functional variance estimation using penalized splines with principal components analysis. Statistics and Computing. 2009. In press.
  17. Khoury M, Beaty H, Cohen B. Fundamentals of Genetic Epidemiology. Oxford University Press; New York: 1993.
  18. Krafty R, Gimotty P, Holtz D, Coukos G, Guo W. Varying coefficient model with unknown within-subject covariance for analysis of tumor growth curves. Biometrics. 2008;64:1023–1031. doi: 10.1111/j.1541-0420.2007.00980.x.
  19. Li Y, Ruppert D. On the asymptotics of penalized splines. Biometrika. 2008;95:415–436.
  20. O'Sullivan F. A statistical perspective on ill-posed inverse problems (c/r: P519-527). Statistical Science. 1986;1:502–518.
  21. Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. Springer; New York: 2005.
  22. Rice J, Wu C. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics. 2001;57:253–259. doi: 10.1111/j.0006-341x.2001.00253.x.
  23. Ruppert D. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics. 2002;11:735–757.
  24. Ruppert D, Wand M, Carroll R. Semiparametric Regression. Cambridge University Press; Cambridge: 2003.
  25. Ruppert D, Wand M, Carroll R. Semiparametric regression during 2003–2007. Electronic Journal of Statistics. 2009;3:1193–1256. doi: 10.1214/09-EJS525.
  26. Staicu A-M, Crainiceanu CM, Carroll RJ. Fast methods for spatially correlated multilevel functional data. Biostatistics. 2010;8:1–29. doi: 10.1093/biostatistics/kxp058.
  27. Splansky GL, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin EJ, D'Agostino RB, Fox CS, Larson MG, Murabito JM, O'Donnell CJ, Vasan RS, Wolf PA, Levy D. The Third Generation Cohort of the National Heart, Lung, and Blood Institute's Framingham Heart Study: design, recruitment, and initial examination. American Journal of Epidemiology. 2007;165:1328–1335. doi: 10.1093/aje/kwm021.
  28. Tuddenham RD, Snyder MM. Physical growth of California boys and girls from birth to eighteen years. University of California Publication in Child Development. 1954;1:183–364.
  29. Wood S, Jiang W, Tanner M. Bayesian mixture of splines for spatially adaptive nonparametric regression. Biometrika. 2002;89:513–528.
  30. Wu WB, Pourahmadi M. Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika. 2003;90:831–844.
  31. Wu H, Zhang J. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches. Wiley; 2006.
  32. Yao F, Müller HG, Wang J-L. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100:577–590.
  33. Yao F, Lee TC. Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society, Series B. 2006;68:3–25.
  34. Zhu Z, Fung WK, He X. On the asymptotics of marginal regression splines with longitudinal data. Biometrika. 2008;95:907–917.
