Abstract
This article addresses estimation in regression models for longitudinally-collected functional covariates (time-varying predictor curves) with a longitudinal scalar outcome. The framework consists of estimating a time-varying coefficient function that is modeled as a linear combination of time-invariant functions with time-varying coefficients. The model uses extrinsic information to inform the structure of the penalty, while the estimation procedure exploits the equivalence between penalized least squares estimation and a linear mixed model representation. The method is empirically evaluated in several simulations and is applied to analyze the neurocognitive impairment of HIV patients and its association with longitudinally-collected magnetic resonance spectroscopy (MRS) curves.
Keywords: Functional data analysis, longitudinal data, LongPEER estimate, structured penalty, generalized singular value decomposition
1 Introduction
Technological advancements and the increased availability of storage for large datasets have allowed for the collection of functional data as part of time-course or longitudinal studies. In the cross-sectional setting, many methods have been proposed for estimating a regression function in a so-called functional linear model (fLM). This function is a continuous analogue of a vector of (discrete) regression coefficients; it connects the scalar response, y, to a functional covariate, w ≡ w(s). Although estimation for fLMs has been well studied, the extension to longitudinally-collected functions has not received much attention. Only recently have longitudinal penalized functional regression (LPFR) and longitudinal functional principal component regression (LFPCR) approaches been proposed to extend the cross-sectional fLM to a longitudinal setting by incorporating subject-specific random intercepts (Goldsmith et al., 2012; Gertheiss et al., 2013). However, a basic assumption of LPFR and LFPCR is that the regression function remains constant over time. Consequently, these methods are not suited for situations in which the association between a functional predictor and scalar response may evolve over time. We propose a technique that extends the analysis of functional linear models by relating a scalar outcome to a functional predictor, both observed longitudinally, and estimates a time-dependent regression function.
The method fits into a generalized ridge regression framework by incorporating a scientifically informed, context-dependent quadratic penalty term into the estimation process. Our extension of the fLM framework to the longitudinal setting has two major advantages: 1) the regression function is allowed to vary over time; and 2) extrinsic, scientifically relevant information about the structure of the regression function can be incorporated directly into the estimation process. We formulate the estimation procedure within a mixed-model framework, making the method computationally efficient and easy to implement.
Ramsay and Dalzell (1991) introduced the term functional data analysis (FDA) in the statistical literature. The cross-sectional fLM with scalar response can be stated as follows (see, e.g., Yao and Müller, 2010):

y = μy + ∫Ω W(s) γ(s) ds + ε,

where μy is the mean of y, Ω denotes the domain of the predictor functions W ≡ W(·), s ∈ Ω, and γ ≡ γ(·) is a square integrable function that models the linear relationship between the functional predictor and scalar response. We will assume that W denotes a mean-centered function (E[W(s)] = 0 for almost all s ∈ Ω).
As there is no unique γ that solves this equation, some form of regularization, or constraint, is required. A common constraint is to impose smoothness on γ. One approach is to expand both the regression function γ and the predictor functions W in terms of B-splines and then obtain the regularized estimate of γ (Ramsay and Silverman, 1997). Another approach is to express the regression function γ in terms of the empirical orthonormal basis obtained from the eigenfunctions of the covariance of W, i.e., a Karhunen–Loève (K-L) expansion (see, e.g., Müller, 2005). A third approach, known as penalized functional regression (PFR) (Goldsmith et al., 2011), combines the above two methods: a spline basis is used to represent γ and a subset of empirical eigenfunctions is used to represent each W. Yet another approach uses a wavelet basis, instead of splines or eigenfunctions, to represent the predictor functions (Morris and Carroll, 2006).
Here we adopt an approach by Randolph et al. (2012) which does not begin by explicitly projecting onto a pre-specified subspace of functions. Instead, prior information about this subspace and its inherent functional structure is incorporated into the estimation process by way of a penalty operator, L. This approach of “partially empirical eigenvectors for regression” (PEER) exploits the fact that a penalized least-squares regression estimate mathematically arises as a series expansion in terms of a set of basis functions determined jointly by the covariance (empirical functional structure) and the penalty (imposed structure). This naturally extends ridge regression (a non-structured penalty) and smoothing penalties such as a second-derivative penalty (which presumes a smooth regression function). Here we extend the scope of the PEER approach to the longitudinal setting in a manner that allows the estimated regression function γ to vary with time. Our interest lies primarily in the estimation of the regression function γ rather than the prediction of an outcome y; see, e.g., Cai and Hall (2006) for a discussion of the substantially different characteristics of these two goals.
The problem we address involves repeated observations from each of N subjects. For each subject, i, at each observation time, t, we collect data on a scalar response variable, y, and an (idealized) predictor function, W. We are interested in longitudinal regression models of the form:
yit = xit⊤β + ∫Ω Wit(s) γ(t, s) ds + bi + εit,    (1.1)

where γ ≡ γ(t, ·) denotes the regression function at time t, xit is a vector of scalar-valued (non-functional) predictors, and bi and εit denote the subject-specific random effect and random error term, respectively. In a spirit similar to that of a linear mixed model with a time-related slope for longitudinal data, we assume that γ can be decomposed into several time-invariant component functions; e.g., γ(t, ·) = γ0(·) + t γ1(·).
Our work is motivated by a study in which magnetic resonance (MR) spectra have been collected longitudinally from late stage HIV patients (Harezlak et al., 2011). We consider global deficit score (GDS) as a scalar response variable, y, and MR spectra as predictor functions, W. Of interest is the association of GDS with MR spectra and how this association evolves with time. One MR spectrum is shown in the left panel of Figure 1: the amplitude, W(s), is plotted against the frequency s, normalized to the [0, 1] interval (x-axis). The pattern and amplitudes of the peaks contain information about the concentration of metabolites present in tissue. Each metabolite has a unique spectrum and so one MR spectrum is a mixture of spectra from each individual metabolite (plus background and random noise); see the right panel in Figure 1 which displays spectra from 9 metabolites. Consequently, one expects an observed spectrum from tissue to lie near a functional subspace, 𝒬, spanned by the spectra of pure metabolites. The regression function, γ, models the association between y and W and hence, in principle, should also lie near 𝒬. Hence, the subspace 𝒬 should be more informative than B-splines or cosine functions that are in some sense “external” to the problem. For this reason, we adopt a methodology that encourages the estimate of γ to be in or near the preferred subspace, 𝒬. The approach is implemented using a decomposition based penalty, L, which penalizes the estimate of γ lightly if it belongs to 𝒬 and strongly if it does not (Randolph et al., 2012).
Figure 1.
Left panel: an observed MR spectrum from tissue. Right panel: the 9 pure metabolite spectra. In each plot, the y-axis represents amplitude and the x-axis the frequency of the nucleus, s, normalized to the [0, 1] interval.
The cross-sectional fLM with scalar response has been a focus of various investigations (Ramsay and Silverman, 1997; Faraway, 1997; Fan and Zhang, 2000; Cardot et al., 1999, 2003; Cai and Hall, 2006; Cardot et al., 2007; Reiss and Ogden, 2009), many of which estimate a regression function in two steps. For example, Cardot et al. (2003) first perform principal component regression (PCR), which projects the observed predictor curves onto an empirical basis to obtain an estimate, then use B-splines to smooth the result. Reiss and Ogden (2009) study several of these methods along with modifications that include versions of PCR using B-splines and second-derivative penalties (cf. Ramsay and Silverman, 1997; Silverman, 1996). Extensions of the fLM have been made toward generalized linear models with functional predictors (James, 2002; Müller and Stadtmüller, 2005) and quadratic functional regression (Yao and Müller, 2010). We are interested in extending the fLM to a longitudinal setting. To our knowledge, the only published methods addressing the longitudinal functional predictor framework are LPFR (Goldsmith et al., 2012) and LFPCR (Gertheiss et al., 2013). However, both LPFR and LFPCR assume that the regression function γ in (1.1) remains constant over time. In contrast, the proposed estimate is based on modeling the regression function γ as a time-dependent linear combination of several time-invariant component functions, γ0, …, γD, each of which is estimated via a penalty operator, L, that is informed by the context of the scientific question.
An important concern for any regularized estimation method, either cross-sectional or longitudinal, is identifiability of the estimate; i.e., the lack of uniqueness or, possibly, its instability. In the fLM this arises from the lack of invertibility of the empirical covariance operator: a finite number of predictor curves means the dimension of the range of this operator is finite and so, as an operator on an infinite-dimensional domain, it has a non-trivial null space. The philosophy behind a penalty-operator approach is that estimation is constrained to the subspace spanned by functions that are jointly determined by W and L. A sufficient condition for uniqueness of this estimate is to assume Null(W) ∩ Null(L) = {0}; see (Engl et al., 2000) or (Bjorck, 1996). We assume this throughout.
The longitudinal functional regression model with a time-varying regression function is described in Section 2. In Section 3.1, the estimate of the regression function is obtained as a generalized ridge estimate (Hoerl and Kennard, 1970; Tikhonov, 1963). The concept of a decomposition-based penalty is reviewed in Section 3.2. The mixed-model approach to estimation via best linear unbiased prediction (BLUP) is described in Section 4.1, and expressions for the precision of the estimates are derived in Section 4.2. In the supplemental material (available online on the journal’s website), we show how our longitudinal penalized estimate, along with its bias and precision, can be obtained, under some weak assumptions, in terms of generalized singular vectors.
Simulation results are presented in Section 5. An application to real MRS data, along with a summary of our findings, is presented in Section 6. The methods discussed in this paper are implemented in the R package refund (Crainiceanu et al., 2012) via the peer() and lpeer() functions.
2 Statistical Model
Let W denote a continuous predictor function defined on the closed interval Ω; without loss of generality Ω = [0, 1]. Let Wit denote a predictor function from the ith subject (i = 1, …, N) at the tth timepoint (t = t1, …, tni). Technically, an observed predictor arises as a discretized sampling from an idealized function, and we will assume that each observed predictor is sampled at the same p locations, s1, …, sp ∈ [0, 1], with sampling that is appropriately regular and dense enough to capture informative functional structure, as seen, for instance, in the MRS data in Section 6. Let wit := [wit(s1), ···, wit(sp)]⊤ be the p × 1 vector of values sampled from the realized function Wit. Then, the observed data are of the form {yit; xit; wit}, where yit is a scalar outcome, xit is a K × 1 column vector of measurements on K scalar predictors, and wit is the sampled predictor from the ith subject at time t. Denoting the true regression function at time t by γ ≡ γ(t, ·), the longitudinal functional outcome model of interest is
yit = xit⊤β + ∫Ω Wit(s) γ(t, s) ds + zit⊤bi + εit,    (2.1)

where εit ~ N(0, σε2) and bi is the vector of r random effects pertaining to subject i, distributed as N(0, Σbi). As usual, we assume that zit is a subset of xit, εit and bi are independent, εit and εi′t′ are independent whenever i ≠ i′ or t ≠ t′ or both, and bi and bi′ are independent if i ≠ i′. Here xit⊤β is the standard fixed effect from the K univariate predictors, zit⊤bi is the standard random effect, and ∫Ω Wit(s) γ(t, s) ds is the subject/time-specific functional effect. We assume that, for all t, γ is a twice-continuously differentiable function of s ∈ Ω.
The functional structure, indexed by s, and time structure, indexed by t, have somewhat unequal roles in our model, as we assume the longitudinal observations are more limited in the amount of information relative to the densely-sampled s index. For example, γ may vary linearly with time, γ(t, ·) = γ0(·) + tγ1(·), or quadratically, γ(t, ·) = γ0(·)+tγ1(·)+t2γ2(·). This is similar in spirit to a linear mixed effects model with linear or quadratic time slope (see e.g., Fitzmaurice et al., 2004). In general, we assume that γ can be decomposed into several time-invariant component functions γ0, · · ·, γD as
γ(t, ·) = γ0(·) + Σd=1D fd(t) γd(·),

where f1, …, fD are D prescribed linearly independent functions of t with fd(0) = 0 for all d; the time component t enters γ only through these terms. At t = 0, γ reduces to γ0, which has the obvious interpretation of a baseline regression function. When D = 0, γ ≡ γ0 is independent of t, the situation considered by Goldsmith et al. (2012). In general, each fd may be any function of t with fd(0) = 0, e.g., fd(t) = t or t exp(t). We rewrite equation (2.1) as

yit = xit⊤β + ∫Ω Wit(s)[γ0(s) + Σd=1D fd(t) γd(s)] ds + zit⊤bi + εit.
The association of yit with Wit is modeled as a linear dependence on observations at p sampling points, wit. In our approach, the (functional) structure is imposed directly into the estimation of each γd = [γd(s1), …, γd(sp)]⊤, for d = 0, …, D (as described in Section 3). Combining all observations from the N subjects obtained across all time points, we express this model in discretized matrix form as
y = Xβ + Wγ + Zb + ε.    (2.2)

Here, y = [y1t1, …, y1tn1, …, yNt1, …, yNtnN]⊤ is the n• × 1 vector of all responses (n• = Σi ni), X is an n• × K design matrix pertaining to the K univariate predictors, β is the associated coefficient vector, and γ = [γ0⊤, …, γD⊤]⊤ is a (D+1)p × 1 coefficient vector representing a discretized, structured association between the sampled curves W and y. Further, b is the rN × 1 vector of random effects and Z is the corresponding n• × rN design matrix. The matrix W has the form

W = [W0  F1W0  ···  FDW0],

where the n• × p matrix W0 stacks the sampled curves row-wise and Fd is the n• × n• diagonal matrix holding fd(t) for the corresponding observation times, as described next.
The first p columns of W (the block W0) are obtained as follows: (a) place the p × 1 vector of observed values of W from the first subject at the first time point in the first row; (b) place the observed values of W from all time points and all subjects in the successive rows, so that each row holds the observed curve for a specific subject and time point, with rows ordered by subject and time to match y. Once the first p columns are constructed, the second block of p columns is obtained by multiplying the first block, row-wise, by f1(t); the third block by f2(t); and so on. An R sketch of this construction follows.
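A minimal R sketch (the helper name build_W and its arguments are ours, not part of the paper’s implementation), assuming the sampled curves are already stacked row-wise in the same order as y:

```r
# Assemble the n. x (D+1)p design matrix W of equation (2.2).
#   W0:    n. x p matrix; row k holds the sampled curve for observation k
#   tvec:  length-n. vector of observation times, aligned with the rows of W0
#   flist: list of the functions f_1, ..., f_D (each with f_d(0) = 0)
build_W <- function(W0, tvec, flist) {
  blocks <- lapply(flist, function(f) W0 * f(tvec))  # scales row k by f(t_k)
  do.call(cbind, c(list(W0), blocks))                # [W0 | f1-block | ... ]
}

# Example with a linear and a quadratic time term:
# W <- build_W(W0, tvec, list(function(t) t, function(t) t^2))
```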
3 Estimation of Parameters with a Penalty
The proposed estimation approach builds on intuition from single-level functional regression, which encourages an estimate of γ ≡ γ(t, ·) to be in or near a “preferred” space via the choice of penalty operator (Randolph et al., 2012). To describe the effect of a general penalty operator, L, it is useful to consider the familiar example of a Laplacian penalty, ℒ. The typical heuristic for this arises by viewing γ as a function whose local “smoothness” is informative; in this case, the term ||ℒγ||2 penalizes sharp changes in γ. For our perspective, it is helpful to recall that the dominant eigenvectors of ℒ (those corresponding to the largest eigenvalues) are sharply oscillatory while the least-dominant eigenvectors are very smooth. Hence, a linear-algebraic view is that, rather than penalizing sharp changes, smoothness in the estimate is inherited from the eigenproperties of ℒ. More specifically, structure in the estimate arises from the joint eigenproperties of W and L; this is given by the generalized singular value decomposition (GSVD), as described in the Supplemental section. In general, the least-dominant eigenfunctions of a penalty L will have the largest effect on the estimate. This property can be used to construct a “preferred subspace” by defining a penalty L whose least-dominant (or perhaps zero-associated) eigenfunctions are preferred. The estimates of γ0, ···, γD are obtained as follows: (1) identify the “preferred” functional subspace 𝒬 where γ is expected to belong; (2) define a decomposition-based penalty (see Section 3.2) to encourage the estimate to be close to 𝒬; (3) obtain a penalized estimate of γ0, ···, γD. We say “encourage” rather than “constrain” because the estimate is not forced to be in the subspace 𝒬, but rather is jointly determined by 𝒬 and W (as illustrated in Section 5.4). The proposed approach allows a different preferred subspace for each of the γd’s. In the longitudinal (or t) dimension, γ is more explicitly and severely constrained by the choice of f1, …, fD.
3.1 Generalized Ridge Estimate
The model described in Section 2 can be written as
y = Xβ + Wγ + ε*,    (3.1)

where ε* = Zb + ε ~ N(0, V) and V = ZΣbZ⊤ + σε2In•. For each d = 0, …, D, let Ld be the penalty operator for γd and let λd be the associated tuning parameter. The corresponding penalized estimates of β and γ are the minimizers of:
||y − Xβ − Wγ||V−12 + Σd=0D λd2 ||Ldγd||2,    (3.2)

where ||x||B2 := x⊤Bx for a symmetric, positive definite matrix B.
[β̂⊤, γ̂⊤]⊤ = (C⊤V−1C + D)−1C⊤V−1y,    (3.3)

where C = [X W], D = blockdiag{0, L⊤L} and L = blockdiag{λ0L0, ···, λDLD}. In the supplemental material, we derive an expression for the generalized ridge estimate γ̂ explicitly in terms of the GSVD.
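For illustration, a base-R sketch of the closed-form estimate (3.3); V is treated as known here, whereas in practice its components are estimated through the mixed-model machinery of Section 4 (the function name is ours):

```r
# Generalized ridge estimate (3.3): (C' V^{-1} C + D)^{-1} C' V^{-1} y
fit_gen_ridge <- function(y, X, W, L, V) {
  C  <- cbind(X, W)                       # C = [X W]
  kx <- ncol(X); kw <- ncol(W)
  Dmat <- matrix(0, kx + kw, kx + kw)     # D = blockdiag{0, L'L}
  Dmat[(kx + 1):(kx + kw), (kx + 1):(kx + kw)] <- crossprod(L)
  Vinv <- solve(V)
  est  <- solve(crossprod(C, Vinv %*% C) + Dmat, crossprod(C, Vinv %*% y))
  list(beta = est[1:kx], gamma = est[-(1:kx)])
}
```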
3.2 Decomposition based penalty
Let γ̃d ≡ γ̃Ld,λd denote the estimate obtained from the penalty operator Ld and tuning parameter λd, for each d = 0, …, D. For example, Ld may be Ip (a ridge penalty) or a second-order derivative penalty (yielding an estimate with continuous second derivative). Alternatively, with prior knowledge about potentially relevant structure in a regression function, a targeted decomposition-based penalty can be defined in terms of a subspace carrying such structure (Randolph et al., 2012). To be precise, if it is appropriate to impose scientifically-informed constraints on the “signal” being estimated by γ, this prior may be implemented by encouraging the estimate to be in or near a subspace, 𝒬 ⊂ L2(Ω). We further assume that the subspace 𝒬 is spanned by functions q1, …, qJ.

Returning to our notation that reflects functional predictors observed at p sampling points, we represent 𝒬 by the range of a p × J matrix Q whose columns are q1, …, qJ, where each qj represents the discretized qj. Consider the orthogonal projection PQ = QQ+ onto Range(Q), where Q+ is the Moore–Penrose inverse of Q. Then a decomposition penalty is defined as
LQ = ϕa(Ip − PQ) + ϕbPQ,    (3.4)

for scalars ϕa and ϕb. To see how LQ works, let γ̃d be any estimate of γd. When γ̃d ∈ span(Q), we have LQγ̃d = ϕbγ̃d, whereas when γ̃d is orthogonal to span(Q), we have LQγ̃d = ϕaγ̃d. The condition ϕa > ϕb imposes a larger penalty on components outside span(Q) than on components inside it. The weights ϕa and ϕb determine the relative strength with which 𝒬 is emphasized in the estimation process. Note that LQ is invertible provided ϕa and ϕb are nonzero, and that ϕa = ϕb yields an ordinary ridge estimate.
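A short R sketch of the penalty in (3.4); the helper name make_LQ is ours, and MASS::ginv supplies the Moore–Penrose inverse:

```r
# Decomposition-based penalty L_Q = phi_a (I - P_Q) + phi_b P_Q, where
# P_Q = Q Q^+ is the orthogonal projection onto Range(Q).
make_LQ <- function(Q, phi_a, phi_b) {
  P_Q <- Q %*% MASS::ginv(Q)
  phi_a * (diag(nrow(Q)) - P_Q) + phi_b * P_Q
}

# Example: a preferred subspace spanned by two Gaussian bumps on p = 100 points
s   <- seq(0, 1, length.out = 100)
Q   <- cbind(exp(-500 * (s - 0.15)^2), exp(-250 * (s - 0.50)^2))
L_Q <- make_LQ(Q, phi_a = 10, phi_b = 1)
```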
4 Mixed model representation
We aim to construct an appropriate mixed model that minimizes the expression in equation (3.2). In general, the penalty L is not required to be invertible, but for simplicity this will be assumed here. The mixed-model approach provides an automatic selection of the tuning parameters λ0, ···, λD. REML-based estimation of the tuning parameters has been shown to perform as well as other criteria, and under certain conditions it is less variable than GCV-based estimation (Reiss and Ogden, 2009).
4.1 Estimation of parameters
Using Henderson’s justification (Henderson, 1950), one can show that the BLUP from the model y = Xβ + Wγ + ε*, where γ ~ N(0, (L⊤L)−1) and ε* ~ N(0, V), minimizes expression (3.2). Thus the generalized ridge estimates of β and γ correspond to the BLUP from the following model:
y = Xβ + W*γ* + ε,

where W* = [W Z] and γ* = [γ⊤ b⊤]⊤ ~ N(0, Σγ*) with Σγ* = blockdiag{(L⊤L)−1, Σb} and Σb = blockdiag{Σb1, ···, ΣbN}. This representation allows us to estimate the scalar and functional effects simply by fitting a linear mixed model (e.g., using the nlme package in R or PROC MIXED in SAS).
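One standard route to such a fit with off-the-shelf software is the reparametrization used for penalized regression in Ruppert et al. (2003); the sketch below assumes L is invertible, as stated at the start of this section:

```r
# With delta = L %*% gamma, the prior gamma ~ N(0, (L'L)^{-1}) is equivalent
# to delta ~ N(0, I), and W %*% gamma = (W %*% solve(L)) %*% delta, so delta
# enters a standard LMM fitter (e.g., nlme) as an ordinary iid random effect.
reparam_design <- function(W, L) W %*% solve(L)
```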
4.2 Precision of Estimates
Our ridge estimate is the BLUP from an equivalent mixed model; hence the variance of the estimate depends on whether the parameters are treated as random or fixed. Randomness of γ is a device used to obtain the ridge estimate, while ε and b in our case are truly random. With this in mind, we compute the variance of the estimates conditional on γ, but not on b. The BLUPs of β, γ and b can be expressed as (see, e.g., Robinson, 1991; Ruppert et al., 2003):

β̃ = (X⊤V1−1X)−1X⊤V1−1y,  γ̃ = (L⊤L)−1W⊤V1−1(y − Xβ̃),  b̃ = ΣbZ⊤V1−1(y − Xβ̃),

where V1 = V + W(L⊤L)−1W⊤. β̃ is an unbiased estimator of β, but γ̃ is not unbiased. It is trivial to see that Cov(y|γ) = V. Thus, the conditional variances of β̃ and γ̃ are:
Cov(β̃|γ) = AVA⊤  and  Cov(γ̃|γ) = G(I − XA)V(I − XA)⊤G⊤,    (4.1)

where A = (X⊤V1−1X)−1X⊤V1−1 and G = (L⊤L)−1W⊤V1−1.
To obtain the unconditional variance, one would replace V by V1 in the above expressions, but this overestimates the variance of the estimates. The 95% confidence bands can be constructed as Estimate ± 3.201 × (Standard Error), using the expressions for the standard error derived here. The constant 3.201 was obtained by Salganik et al. (2004) through a large simulation study to ensure 95% simultaneous coverage.
The predicted value of y and its variance are

ỹ = Xβ̃ + Wγ̃ + Zb̃,  Cov(ỹ|γ) = C̃ Cov([β̃⊤ γ̃⊤ b̃⊤]⊤|γ) C̃⊤,

where C̃ = [X W Z]. Let T = [1 f1(t) ··· fD(t)] ⊗ Ip. Then the discretized version of the regression function at time t is γt = [γ(t, s1), ···, γ(t, sp)]⊤ = Tγ. Therefore, the estimate of γt is γ̃t = Tγ̃ and the estimate of its variance is T Cov(γ̃|γ) T⊤.
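A small R sketch of the mapping γt = Tγ (the helper name Tmat is ours):

```r
# T = [1 f_1(t) ... f_D(t)] (Kronecker product) I_p, mapping the stacked
# components gamma = (gamma_0', ..., gamma_D')' to the discretized gamma(t, .).
Tmat <- function(t, flist, p) {
  ft <- c(1, vapply(flist, function(f) f(t), numeric(1)))
  kronecker(matrix(ft, nrow = 1), diag(p))   # p x (D+1)p
}
# gamma_t <- Tmat(t, flist, p) %*% gamma
# Cov(gamma_t): Tmat(t, flist, p) %*% Cov_gamma %*% t(Tmat(t, flist, p))
```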
4.3 Selection of time-structure in γ ≡ γ(t, ·)
The proposed approach allows a flexible choice of the time structure included in the regression function γ ≡ γ(t, ·). In practice, the data and information available to estimate structure along the longitudinal (t) index are more limited than those along the functional (s) index. For example, whether γ0 + tγ1 is sufficient or the more flexible γ0 + tγ1 + t2γ2 is required is not known in advance. The problem of choosing an appropriate time structure in γ is similar, in principle, to that of choosing the time structure in a linear mixed-effects model (e.g., E(yit|bi) = β0 + β1t or E(yit|bi) = β0 + β1t + β2t2). We propose two approaches to determine the form of the unknown regression function: (a) use AIC to compare different structures, and (b) use a point-wise confidence band for the component functions γ0, …, γD; if the confidence band for some γd contains zero over its entire domain, that term is dropped from γ.
4.4 Selection of ϕa and ϕb for a decomposition penalty
We view ϕa and ϕb as weights in a tradeoff between the preferred and non-preferred subspaces and assume ϕa · ϕb = constant. In the current implementation, we use REML to estimate the λd’s for a fixed value of ϕa, and perform a grid search over ϕa values to jointly select the tuning parameters that maximize an information criterion, such as AIC, based on the restricted maximum likelihood.
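A sketch of this selection loop; fit_fun is a user-supplied routine (an assumption of this sketch) that rebuilds the penalty for a given ϕa, refits the mixed model by REML, and returns its AIC:

```r
# Grid search over phi_a (phi_b fixed at 1), following the paper's
# convention of maximizing the REML-based AIC.
select_phi_a <- function(phi_grid, fit_fun) {
  aics <- vapply(phi_grid, fit_fun, numeric(1))
  phi_grid[which.max(aics)]
}
# e.g., select_phi_a(10^seq(0, 3, by = 0.25), fit_fun)
```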
5 Simulation
The proposed method of estimating the regression function using a decomposition-based penalty will be referred to as LongPEER (longitudinal partially empirical eigenvectors for regression). We pursue several simulations to evaluate its properties. The first simulation study (Section 5.1) compares the performance of LongPEER with the LPFR approach. In the remaining simulation studies, only LongPEER is considered due to a lack of comparable alternatives. The purpose of the second simulation study is to evaluate the influence of sample size and the contribution of prior information about the functional structure (as determined by the tuning parameters ϕa and ϕb in (3.4)) on the LongPEER estimate. In the third simulation study, we evaluate the coverage probabilities of the confidence bands constructed using the formulas presented in Section 4.2. Finally, in Section 5.4 we evaluate the performance of the LongPEER estimate when information on some features is missing. These studies are performed in the context of MRS data, and hence the simulated predictor functions are constructed based on the structure of MRS data. All results summarized in this section are based on 100 simulated datasets.
For each subject and visit, predictor functions were simulated using equation (5.3). The predictor functions were flat with bumps of varied widths at a number of pre-specified locations, and white noise was added to account for instrumental measurement noise. The “bumpy” regression functions were generated with bumps at some (but not all) of the bump locations of the predictor functions. For the simulation in Section 5.1, the regression function is assumed to be independent of time, whereas it varies with time in the simulation of Section 5.2. For both the predictor and regression functions, 100 equi-spaced sampling points in [0, 1] are used.
For the decomposition-based penalty (3.4), the matrix Ld is defined as follows: (1) select the discretized functions q1, …, qJ spanning the “preferred” subspace, where each qj is defined to have a single bump corresponding to a region in the simulated predictor functions (see Figure 2); and (2) compute PQ = QQ+, where Q = [q1, …, qJ], and set Ld as in (3.4). The columns of Q need not be orthogonal (cf. Figure 9). For the ridge and second-order penalties, the matrices Ld are defined as I and 𝒟2, respectively, where 𝒟2 = [di,j] is a square matrix with entries di,i = di,i+2 = 1, di,i+1 = −2 and di,j = 0 otherwise.
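For reference, a sketch of the second-order difference matrix just described (rows whose entries would fall outside the matrix are left at zero, one common convention):

```r
# D2 with d_{i,i} = d_{i,i+2} = 1 and d_{i,i+1} = -2, zero otherwise.
make_D2 <- function(p) {
  D2 <- matrix(0, p, p)
  for (i in seq_len(p - 2)) {
    D2[i, i] <- 1; D2[i, i + 1] <- -2; D2[i, i + 2] <- 1
  }
  D2
}
```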
Figure 2.
Average estimates of γ for the simulation in Section 5.1 with ϕa = 10 and ϕb = 1. Top panel: prior information used in LongPEER estimation. Bottom panel: true γ and the average estimates from 100 simulations. The LongPEER, ridge and second-order difference estimates were obtained using the decomposition-based, ridge and second-order difference penalties, respectively, as described in the third paragraph of Section 5. LPFR estimates were obtained using lpfr() with default options in the refund package.
Figure 9.
LongPEER estimates of γ0 and γ1 with ϕa = 100, ϕb = 1, as described in Section 6. Shaded regions in both plots represent 95% pointwise confidence bands. Selected (scaled) pure metabolite spectra are also shown in both plots.
Our primary interest lies in the estimation of the regression function; therefore, estimation error is summarized by the mean squared error (MSE), defined as ||γ − γ̃||2, where γ̃ denotes the estimate of γ. To evaluate prediction performance, we also calculated the sum of squared prediction errors (SSPE), ||y − ỹ||2/N, where ỹ denotes the estimate of the true (noiseless) y. The estimates based on the proposed methods, including the LongPEER estimate, were obtained as BLUPs from the mixed model formulation described in Section 4.1.
5.1 Comparison with LPFR
As mentioned, LPFR estimates a regression function that does not vary with time. Therefore, in order to compare LongPEER with LPFR, in the first set of simulations we generated outcomes using a time-invariant regression function (i.e., γ(t, ·) = γ0(·), for all t). The following model was used to generate the outcome data for 100 individuals, each at 4 timepoints (t = 0, 1, 2, 3):
yit = β0 + ∫01 Wit(s) γ0(s) ds + bi + εit.    (5.1)
We uniformly sampled the W’s at s = 0.01, 0.02, ···, 0.99, 1.00. γ0 was defined as follows, with bumps centered at Hγ0 = {0.15, 0.50, 0.80}:

γ0(s) = Σh∈Hγ0 a0,h exp{−c0,h(s − h)2},    (5.2)

where a0,h and c0,h correspond to the amplitude and degree of curvature, respectively, at the bumps, as specified in Table 1. We constructed W to consist of wide, moderate and narrow bumps centered at Hwide = {0.50}, Hmoderate = {0.15, 0.30, 0.70, 0.80, 0.90} and Hnarrow = {0.05, 0.95}. Specifically, we generated the subject-specific functional predictors wit(s) [i.e., the observed values of W] of the form shown in (5.3) below.
Table 1.
The amplitudes and curvatures used to generate the predictor function W and the regression functions γ0 and γ1 in the simulation studies of Sections 5.1–5.4. Parameters defining features in equation (5.3) are specified in the columns Wide, Moderate and Narrow. Hγ0 and Hγ1 represent the sets of sampling points corresponding to bumps in γ0 and γ1, respectively. Note that the degrees of curvature in γ0 and γ1 differ from those assumed in W.
| h | Wide amp (aw,h) | Wide curv (cw,h) | Moderate amp (am,h) | Moderate curv (cm,h) | Narrow amp (an,h) | Narrow curv (cn,h) | γ0 amp (a0,h) | γ0 curv (c0,h) | γ1 amp (a1,h) | γ1 curv (c1,h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 |  |  |  |  | 0.1 | 2500 |  |  |  |  |
| 15 |  |  | 0.6 | 1000 |  |  | 0.2 | 500 |  |  |
| 30 |  |  | 0.6 | 1000 |  |  |  |  | 0.2 | 500 |
| 50 | 0.5 | 250 |  |  |  |  | −0.15 | 250 |  |  |
| 70 |  |  | 0.5 | 1000 |  |  |  |  | −0.2 | 1000 |
| 80 |  |  | 0.5 | 1000 |  |  | 0.15 | 500 |  |  |
| 90 |  |  | 0.4 | 1000 |  |  |  |  |  |  |
| 95 |  |  |  |  | 0.1 | 2500 |  |  |  |  |

amp: Amplitude; curv: Curvature
wit(s) = Σh∈Hwide aw,h exp{−cw,h(s − h)2} + Σh∈Hmoderate am,h exp{−cm,h(s − h)2} + Σh∈Hnarrow an,h exp{−cn,h(s − h)2} + δit(s),    (5.3)

where {aw,h, cw,h}, {am,h, cm,h} and {an,h, cn,h} correspond to the amplitude and degree of curvature at the wide, moderate and narrow bumps, respectively, as defined in Table 1. Note that the amount of curvature in W differs considerably from that imposed in γ0 and γ1. The white-noise errors δit(s) were drawn independently from Uniform(0, 0.1). Also, β0 = 0.06, εit ~ N[0, (0.02)2] and bi ~ N[0, (0.05)2].
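An illustrative R generator for one such curve, using the Gaussian-bump form of (5.3) and the Table 1 parameters; drawing the additive noise as one Uniform(0, 0.1) value per sampling point is an assumption of this sketch:

```r
bump <- function(s, h, amp, curv) amp * exp(-curv * (s - h)^2)

simulate_curve <- function(s) {
  pars <- rbind(c(0.50, 0.5,  250),                      # wide
                c(0.15, 0.6, 1000), c(0.30, 0.6, 1000),  # moderate
                c(0.70, 0.5, 1000), c(0.80, 0.5, 1000),
                c(0.90, 0.4, 1000),
                c(0.05, 0.1, 2500), c(0.95, 0.1, 2500))  # narrow
  truth <- rowSums(apply(pars, 1, function(pr) bump(s, pr[1], pr[2], pr[3])))
  truth + runif(length(s), 0, 0.1)                       # measurement noise
}

s <- seq(0.01, 1, by = 0.01)
w <- simulate_curve(s)
```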
Both the LPFR estimates (using lpfr(), available in the refund package in R (Crainiceanu et al., 2012)) and the LongPEER estimates were obtained. For the LPFR estimate, the dimensions of both the principal component basis and the truncated power series spline basis were set to 30. The prior information used here is represented by bumps of various scalings and locations, as shown in the top panel of Figure 2; this structure is loosely consistent with that present in the predictors. We used ϕa/ϕb = 10, a choice motivated by our findings in Sections 5.2 and 5.4.
Table 2 displays the MSE and prediction error obtained for the LongPEER and LPFR estimates. The SSPE was similar for both methods (1.1572 and 1.1527); however, the LongPEER estimate has a smaller MSE. Both the bias and variance are higher for the LPFR estimate, and consequently it has the greater MSE. Figure 2 displays the estimates of the regression function. It should be emphasized that any comparison of these methods is not entirely fair, since LongPEER is designed to exploit presumed structural information while LPFR is not. We note also that the ability to exploit such information may be limited, and so in this simulation we used imprecise information about the shapes of features; see the top panel of Figure 2.
Table 2.
Estimation and prediction errors for the LPFR and LongPEER estimates based on 100 simulated datasets. The sample size is N = 100 and the number of longitudinal observations per subject is ni = 4.
|  | LongPEER | LPFR |
| --- | --- | --- |
| MSE(γ0) | 0.0257 | 0.2229 |
| Trace of Variance(γ0) | 0.0057 | 0.0653 |
| ‖Bias(γ0)‖2 | 0.0200 | 0.1576 |
| SSPE of Y | 1.1572 | 1.1527 |
5.2 Simulation with a time varying regression function
Here and in Sections 5.3 and 5.4, the regression function varies parametrically with time. Due to the lack of existing alternatives for time-varying functional regression, we instead evaluate the performance of our approach in several ways. First, our primary goal was to assess the effects of sample size, the fraction of variance explained by the model, and the relative contribution of extrinsic information (as determined by ϕa and ϕb in equation (3.4)) on the estimate. Without loss of generality, we set ϕb = 1 and vary ϕa on an exponential scale; larger values of ϕa indicate greater emphasis of prior information in the estimation process. The model considered here is similar to that described in Section 5.1, except that γ(t, ·) = γ0(·) + t γ1(·). The function γ0 is defined in equation (5.2), and γ1 was defined as follows, with bumps centered at Hγ1 = {0.30, 0.70}:

γ1(s) = Σh∈Hγ1 a1,h exp{−c1,h(s − h)2},

where a1,h and c1,h correspond to the amplitude and degree of curvature, respectively, at the bumps, as specified in Table 1. As before, β0 = 0.06. Realizations of the functional predictors were generated as described in Section 5.1. For each simulation, an appropriate error variance σε2 was chosen so that the squared multiple correlation coefficient R2 = σ̃y2/(σ̃y2 + σε2) equals 0.6 or 0.9, where σ̃y2 denotes the average sample variance of the set {yit − εit : i = 1, ⋯, N; t = 0, ⋯, 3}.
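Under the R2 definition above, this calibration has a closed form; a one-line sketch:

```r
# Choose sigma_eps^2 so that R^2 = sig2_y / (sig2_y + sig2_eps) hits a target,
# where sig2_y is the average sample variance of the noiseless responses.
sigma2_for_R2 <- function(sig2_y, R2) sig2_y * (1 - R2) / R2
```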
We repeated the simulation for four scenarios: (i) N = 100, R2 = 0.6; (ii) N = 100, R2 = 0.9; (iii) N = 200, R2 = 0.6; and (iv) N = 200, R2 = 0.9. The prior information used to obtain LongPEER estimates of γ0 and γ1 is plotted in the top panel of Figure 5. Results for AIC, MSE and SSPE are displayed graphically in Figure 3, and the standard deviations of the MSE are plotted in Figure 4. As the sample size and R2 increased, both MSE(γ0) and MSE(γ1) decreased, providing empirical evidence that the LongPEER estimates are consistent. In general, MSE(γ0) (except for N = 100, R2 = 0.60) and MSE(γ1) were minimized at ϕa = 10 and plateaued thereafter. That is, increasing ϕa up to 10 improved the estimation of both γ0 and γ1, after which the estimation performance remained almost unchanged. Finally, note that the value of ϕa that maximized AIC also minimized MSE(γ0) and MSE(γ1). These results suggest that (a) AIC can be used to guide the choice of ϕa, and (b) once the ratio ϕa/ϕb is sufficiently large, further increases have minimal impact on estimation performance.
Figure 5.
Average estimates of the components of regression functions for simulations described in Section 5.2 with ϕa = 10 and ϕb = 1. Top panel: prior information used in LongPEER estimation. Middle and bottom panels: the average estimates of γ0 and γ1; these improve as N and/or R2 increase.
Figure 3.
Average AIC, SSPE and MSE over 100 simulations for the scenarios in Section 5.2. At ϕa = 10, the average AIC was maximized and MSE(γ0) and MSE(γ1) were minimized. In general, average AIC increased with sample size and R2, whereas SSPE, MSE(γ0) and MSE(γ1) decreased.
Figure 4.
Standard deviations of MSE(γ0) and MSE(γ1) over the 100 simulations in Section 5.2. Both standard deviations decreased with increasing sample size and R2, reached a minimum at ϕa = 10, and plateaued thereafter.
The average LongPEER estimates of γ0 and γ1 using a decomposition penalty are displayed in Figure 5 with ϕa = 10 and ϕb = 1. For smaller sample sizes and R2, the LongPEER estimate may: (a) oversmooth (i.e., negatively bias) the estimated regression function at locations of a true feature, and (b) be positively biased at locations corresponding to features in 𝒬 where the true γ is zero. However, upon increasing the sample size to 200 and/or R2 to 0.9, the average LongPEER estimates of γ0 and γ1 approach the true functions.
5.3 Coverage probability
In this section, data were generated according to the simulation scenario described in Section 5.2 with R2 = 0.9. The prior information used in the LongPEER estimation, along with the confidence bands and the coverage probabilities of the LongPEER estimates, is displayed in Figure 6. The 95% confidence bands were constructed as discussed in Section 4.2.
Figure 6.
Coverage probabilities of LongPEER estimates over 100 simulations with ϕa = 10 and ϕb = 1, as discussed in Section 5.3. Top panel: prior information used in LongPEER estimation. Middle and bottom panels: 95% confidence band (shaded region) and coverage proportions (dotted line) based on N = 100 and N = 400 subjects, respectively. The left column displays the cross-sectional function γ0 and the right column the longitudinal function γ1. The horizontal line in each plot marks the nominal coverage of 95%.
A notable improvement in the coverage probabilities for both γ0 and γ1 was observed as N increased. For N = 400, the coverage for both γ0 and γ1 was close to the nominal level of 95%. There is a dip in coverage probability at the first feature of γ0, at s = 0.15, due to negative estimation bias (i.e., over-smoothing); the coverage probability improves as N increases. We also explored the influence of ϕa on the confidence bands and coverage probabilities (not shown here): higher values of ϕa led to narrower confidence bands, which in turn resulted in under-coverage of both γ0 and γ1.
5.4 Estimation in the presence of incomplete information
Since the LongPEER estimate uses extrinsic scientific information in the estimation process, it is of interest to evaluate its estimation performance when only partial information is available. For this, we considered a simulation scenario similar to that of Section 5.3, but the LongPEER estimate was obtained while disregarding the information about the peak at s = 0.5. LongPEER estimates with ϕb = ϕa = 1 (i.e., a ridge penalty) and with ϕb = 1, ϕa = 10^0.75 are displayed in Figure 7. The LongPEER estimate of γ0 has the appropriate structure at s = 0.5, on average; indeed, as with an ordinary ridge penalty, this structure is inherited from the empirical eigenfunctions of W. This highlights the advantage of an estimate obtained from the jointly-determined eigenfunctions of W and L (see the supplemental material available online on the journal’s website): the estimate depends on the relative contributions of the predictor functions W and the penalty L (i.e., the relative contribution of extrinsic information), controlled by ϕa/ϕb. The relative increase in the contribution of extrinsic information resulted in shrinkage of the LongPEER estimate towards zero at s = 0.5.
Figure 7.
Top panel: true regression functions (solid lines) γ0 (left) and γ1 (right) and prior information about features (dashed lines); note that there is no information about the peak at s = 0.5 in γ0. Middle panel: average ridge-penalty estimate from 100 simulations. Bottom panel: average LongPEER estimate from 100 simulations using the prior information displayed in the top panel and ϕa = 10^0.75.
6 MRS study application
We applied LongPEER to investigate potential associations between metabolite spectra obtained from the basal ganglia and the global deficit score (GDS) in a longitudinal study of late-stage HIV patients. Of particular interest is how such an association evolves over time. The study cohort comprised chronically HIV-infected patients enrolled in the HIV Neuroimaging Consortium (HIVNC), a longitudinal study of HIV-associated brain injury, at multiple sites in the United States. At the time of enrollment, patients were on stable anti-retroviral treatment; a more detailed study description is available in Harezlak et al. (2011). We treat the GDS as our scalar continuous response variable and the MR spectrum (sampled at p = 399 distinct frequencies) as the functional predictor. The GDS is often used as a continuous measure of neurocognitive impairment (e.g., Carey et al., 2004), with a large GDS indicating a high degree of impairment. The MR spectra are comprised of pure metabolite spectra, instrument noise and a background profile. We collected a total of n• = 306 observations from N = 114 subjects. The longitudinal observations for each subject were within 3 years from baseline, and the number of observations per subject ranged from 1 to 5 with a median of 3. Spectral information on the following nine pure metabolites was used as prior information for the LongPEER estimation: Creatine (Cr), Glutamate (Glu), Glucose (Glc), Glycerophosphocholine (GPC), myo-Inositol (Ins), N-Acetylaspartate (NAA), N-Acetylaspartylglutamate (NAAG), scyllo-Inositol (Scyllo) and Taurine (Tau). These spectra are displayed in Figure 1.
Information available on demographic factors includes age at baseline, gender and race. We relied on AIC to choose (a) the scalar covariates in the model, (b) ϕa (while setting ϕb = 1), which determines the relative contribution of prior information, and (c) the time structure of γ ≡ γ(t, ·). Based on the AIC (see Table 3), Models 1, 3, 5 and 6 are almost identical and appear to be better than the remaining models. In this illustration, we choose the relatively simple Model 1 and fit it (with ϕa = 100, ϕb = 1) as follows:
yit = β0 + β1t + ∫01 Wit(s) γ(t, s) ds + bi + εit,    (6.1)

where γ(t, ·) = γ0(·) + t γ1(·), and yit and Wit(·) are the GDS and basal ganglia spectrum for subject i at time t, respectively. The LongPEER estimates were obtained as BLUPs assuming εit ~ N(0, σε2) and subject-specific random intercepts bi ~ N(0, σb2).
Table 3.
Comparison of AIC for selection of scalar covariates, ϕa (ϕb = 1) and time structure in γ ≡ γ(t, ·) in Section 6
|  | Scalar covariates | Time structure in γ(t, s) | ϕa | AIC |
| --- | --- | --- | --- | --- |
| Model 1 | t | γ0(s) + tγ1(s) | 10 | −395.2335 |
| Model 2 | Age, t | γ0(s) + tγ1(s) | 10 | −405.2796 |
| Model 3 | Gender, t | γ0(s) + tγ1(s) | 10 | −395.9040 |
| Model 4 | Race, t | γ0(s) + tγ1(s) | 10 | −398.5607 |
| Model 5 | t, t2 | γ0(s) + tγ1(s) + t2γ2(s) | 10 | −394.5752 |
| Model 6 | t | γ0(s) + tγ1(s) | 100 | −395.3670 |
The estimates of the tuning parameters λ associated with γ0 and γ1 were 1.152 and 2.242, respectively, and the estimates of σε2 and σb2 were 0.0786 and 0.3332, respectively. The GDS scores, fitted values and residual plot are displayed in Figure 8 for the purpose of model checking; the residuals do not show any obvious pattern.
Figure 8.
Prediction performance of the model in equation (6.1). Left panel: observed GDS score (y) versus predicted value (ỹ). Right panel: predicted values (ỹ) versus residuals (y − ỹ).
Figure 9 displays the estimates of γ0 and γ1 with pointwise 95% confidence bands. To aid interpretation, selected pure metabolite spectra are displayed. These figures reveal that estimated γ0 (the “baseline” part of the regression function) is different from zero at the locations where at least one of the pure metabolites Cr, Glu, NAA, NAAG and Scyllo has a bump. Similarly, each non-zero part of estimated γ1 (the “longitudinal” part of the regression function) coincides with bump locations of one or more pure metabolite profiles of Cr, Glu, NAA, GPC and Ins.
The pointwise confidence intervals for γ0 and γ1 contain the zero line over large intervals. The estimated γ0 is significant in the region s ∈ (0.4, 0.5) ∪ (0.6, 0.8) and the estimated γ1 is significant in the region s ∈ (0.5, 0.6). To be precise, peaks in both γ̂0(·) and γ̂1(·) are significant at locations where at least one of the pure metabolite profiles NAA or Glu has bumps. The negative ‘longitudinal’ effect of NAA is worth a comment: it suggests that GDS increases as the NAA concentration in the basal ganglia decreases, a finding consistent with several studies in which a reduced concentration of NAA was associated with a decrease in neuronal mass (Christiansen et al., 1993; Lim and Spielman, 1997; Soares and Law, 2009).
Finally, we considered other forms of f(t), such as exp(t) − 1 or log(t + 1). However, the relative improvement in AIC from these more complex structures was minimal. For example, the AIC associated with γ(t, ·) = γ0(·) + [exp(t) − 1]γ1(·) was −394.56, compared with −395.23 for γ(t, ·) = γ0(·) + tγ1(·).
7 Discussion
We have proposed a novel estimation method for longitudinal functional regression and derived some properties of the estimated regression function. A valuable contribution of this framework is that it allows the estimate to vary with time, extending the scope of penalized regression to the realm of longitudinal data. The approach may be viewed as an extension of longitudinal mixed effects models, replacing scalar predictors by functional predictors. Advantages of this framework include: estimation of a time-dependent regression function; the ability to incorporate structural information into the estimation process; and easy implementation through the linear mixed model equivalence.
The first simulation study (Section 5.1) illustrates the potential advantage of the LongPEER estimate in exploiting an informed, structured penalty, as compared to more generic smoothness or spline-based constraints. The simulation in Section 5.3 suggests that the coverage probabilities of the confidence bands for the true regression function are close to the nominal level. However, for small sample sizes the naive confidence bands do not seem to be sufficient, and an alternative solution that takes the estimation bias into account is needed. When only partial information is available, the proposed method can still be useful if we limit the relative contribution of the “informed” space and/or increase the sample size (see Section 5.4). In the absence of prior information, one may impose more vaguely-defined constraints, such as identity penalties, smoothing penalties or re-weighted projections onto empirical subspaces, to estimate the regression function.
Estimation in generalized ridge regression can be expressed in many forms. One natural view is via a Bayesian equivalence formulation (see, e.g., Robinson, 1991), with informative priors quantifying the available scientific knowledge. In our formulation, the linear mixed model equivalence provides an easy computational implementation as well as an automatic choice of the tuning parameters using the REML criterion. As detailed in the supplemental material, the generalized singular value decomposition (GSVD) provides intuition and insight about the role of the penalty operator via an explicit algebraic formulation of the estimate; indeed, the estimate of γ is obtained from a simultaneous diagonalization of W and L. The GSVD also provides explicit derivations of the bias and variance of the estimate. A possible extension of this work is to incorporate multiple functional predictors.
Suppose we have two predictor functions W(1) and W(2) with associated regression functions γ(1) ≡ γ(1)(t, ·) and γ(2) ≡ γ(2)(t, ·), each expressed, as in Section 2, as a combination of time-invariant component functions. Then the component functions can be estimated as BLUPs from the mixed model y = Xβ + W(1)γ(1) + W(2)γ(2) + Zb + ε, where W(1) and W(2) represent the design matrices for the two functional predictors. Another possible extension is to a binary or count response variable; indeed, an important problem that arises is to understand the neurocognitive impairment status of HIV patients based on MRS collected over time. Estimation in these general settings appears to be possible within the proposed framework.
Supplementary Material
Acknowledgments
The authors wish to thank two anonymous referees and the associate editor for several corrections and suggestions that greatly improved the paper. The authors also extend their thanks to Dr. B. Navia, who provided the MRS data used as an example in the manuscript. Partial research support was provided by the National Institutes of Health grants U01-MH083545 (JH), R01-CA126205 (TR) and U01-CA086368 (TR).
References
- Bjorck A. Numerical Methods for Least Squares Problems. 1st ed. Philadelphia: SIAM; 1996.
- Brumback B, Rice J. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association. 1998;93(443):961–976.
- Cai T, Hall P. Prediction in functional linear regression. The Annals of Statistics. 2006;34(5):2159–2179.
- Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics and Probability Letters. 1999;45(1):11–22.
- Cardot H, Ferraty F, Sarda P. Spline estimators for the functional linear model. Statistica Sinica. 2003;13:571–591.
- Cardot H, Crambes C, Kneip A, Sarda P. Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics & Data Analysis. 2007;51(10):4832–4848.
- Carey C, Woods S, Gonzalez R, Conover E, Marcotte T, Grant I, Heaton R. Predictive validity of global deficit scores in detecting neuropsychological impairment in HIV infection. Journal of Clinical and Experimental Neuropsychology. 2004;26(3):307–319. doi:10.1080/13803390490510031.
- Christiansen P, Toft P, Larsson H, Stubgaard M, Henriksen O. The concentration of N-acetyl aspartate, creatine+phosphocreatine, and choline in different parts of the brain in adulthood and senium. Magnetic Resonance Imaging. 1993;11(6):799–806. doi:10.1016/0730-725x(93)90197-l.
- Crainiceanu C, Reiss P, Goldsmith A, Huang L, Huo L, Scheipl F. refund: Regression with Functional Data. 2012. R package version 0.1-6. http://CRAN.R-project.org/package=refund
- Di C, Crainiceanu C, Caffo B, Punjabi N. Multilevel functional principal component analysis. Annals of Applied Statistics. 2009;3(1):458–488. doi:10.1214/08-AOAS206.
- Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Kluwer Academic Publishers; 2000.
- Fan J, Zhang J. Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2000;62(2):303–322.
- Faraway J. Regression analysis for a functional response. Technometrics. 1997;39(3):254–261.
- Fitzmaurice G, Laird N, Ware J. Applied Longitudinal Analysis. 1st ed. Wiley Series in Probability and Statistics; 2004.
- Gertheiss J, Goldsmith J, Crainiceanu C, Greven S. Longitudinal scalar-on-functions regression with application to tractography data. Biostatistics. 2013;14(3):447–461. doi:10.1093/biostatistics/kxs051.
- Goldsmith J, Bobb J, Crainiceanu C, Caffo B, Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics. 2011;20(4):830–851. doi:10.1198/jcgs.2010.10007.
- Goldsmith J, Bobb J, Crainiceanu C, Caffo B, Reich D. Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2012;61(3):453–469. doi:10.1111/j.1467-9876.2011.01031.x.
- Golub G, Van Loan C. Matrix Computations. Baltimore: Johns Hopkins University Press; 1996.
- Greven S, Crainiceanu C, Caffo B, Reich D. Longitudinal functional principal component analysis. In: Recent Advances in Functional Data Analysis and Related Topics. 1st ed. Physica-Verlag HD; 2011. pp. 149–154.
- Guo W. Functional mixed effects models. Biometrics. 2002;58(1):121–128. doi:10.1111/j.0006-341x.2002.00121.x.
- Hall P, Poskitt D, Presnell B. A functional data-analytic approach to signal discrimination. Technometrics. 2001;43(1):1–9.
- Harezlak J, Buchthal S, Taylor M, Schifitto G, Zhong J, Daar E, Alger J, Singer E, Campbell T, Yiannoutsos C. Persistence of HIV-associated cognitive impairment, inflammation, and neuronal injury in era of highly active antiretroviral treatment. AIDS. 2011;25:625–633. doi:10.1097/QAD.0b013e3283427da7.
- Henderson C. Estimation of genetic parameters (abstract). Annals of Mathematical Statistics. 1950;21(1).
- Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
- James G. Generalized linear models with functional predictors. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(3):411–432.
- Lim K, Spielman D. Estimating NAA in cortical gray matter with applications for measuring changes due to aging. Magnetic Resonance in Medicine. 1997;37(3):372–377. doi:10.1002/mrm.1910370313.
- Morris J, Carroll R. Wavelet-based functional mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006;68(2):179–199. doi:10.1111/j.1467-9868.2006.00539.x.
- Müller H. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics. 2005;32(2):223–240.
- Müller H, Stadtmüller U. Generalized functional linear models. The Annals of Statistics. 2005;33(2):774–805.
- Paige C, Saunders M. Towards a generalized singular value decomposition. SIAM Journal on Numerical Analysis. 1981;18(3):398–405.
- Phillips DL. A technique for the numerical solution of certain integral equations of the first kind. Journal of the Association for Computing Machinery. 1962;9:84–97.
- Ramsay J, Dalzell C. Some tools for functional data analysis. Journal of the Royal Statistical Society: Series B (Methodological). 1991;53(3):539–572.
- Ramsay J, Silverman B. Functional Data Analysis. 1st ed. Berlin: Springer-Verlag; 1997.
- Randolph T, Harezlak J, Feng Z. Structured penalties for functional linear models – partially empirical eigenvectors for regression. Electronic Journal of Statistics. 2012;6:323–353. doi:10.1214/12-EJS676.
- Reiss P, Ogden R. Smoothing parameter selection for a class of semiparametric linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(2):505–523.
- Robinson G. That BLUP is a good thing: the estimation of random effects. Statistical Science. 1991;6(1):15–32.
- Ruppert D, Wand M, Carroll R. Semiparametric Regression. 1st ed. Cambridge Series in Statistical and Probabilistic Mathematics; 2003.
- Salganik M, Wand M, Lange N. Comparison of feature significance quantile approximations. Australian & New Zealand Journal of Statistics. 2004;46(4):569–581.
- Silverman BW. Smoothed functional principal components analysis by choice of norm. Annals of Statistics. 1996;24:1–24.
- Soares D, Law M. Magnetic resonance spectroscopy of the brain: review of metabolites and clinical applications. Clinical Radiology. 2009;64(1):12–21. doi:10.1016/j.crad.2008.07.002.
- Tikhonov AN. Solution of incorrectly formulated problems and the regularization method. Doklady Akademii Nauk SSSR. 1963;151(3):501–504. (In Russian; English translation: Soviet Mathematics Doklady. 1963;4(4):1035–1038.)
- Van Loan C. Generalizing the singular value decomposition. SIAM Journal on Numerical Analysis. 1976;13(1):76–83.
- Yao F, Müller H. Functional quadratic regression. Biometrika. 2010;97(1):49–64.