Abstract
Functional data are increasingly collected in public health and medical studies to better understand many complex diseases. Besides the functional data, other clinical measures are often collected repeatedly. Investigating the association between these longitudinal data and time to a survival event is of great interest to these studies. In this article, we develop a functional joint model (FJM) to account for functional predictors in both longitudinal and survival submodels in the joint modeling framework. The parameters of the FJM are estimated in a maximum likelihood framework via the EM algorithm. The proposed FJM provides a flexible framework to incorporate many features both in joint modeling of longitudinal and survival data and in functional data analysis. The FJM is evaluated by a simulation study and is applied to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, a motivating clinical study testing whether serial brain imaging, clinical and neuropsychological assessments can be combined to measure the progression of Alzheimer’s disease.
Keywords: ADNI study, functional data analysis, longitudinal and time-to-event data, functional principal component analysis
1 Introduction
Modern technologies are currently producing increasingly large, complex, and high-dimensional data in medical research. One such type of data is functional data, whose units of observation are functions defined on certain continuous domains (e.g., time, space, or both) but sampled on discrete grids. These functions may be defined on a one-dimensional Euclidean domain, such as growth curve data, heart rate monitor (HRM) data, and electroencephalogram (EEG) data. A growing volume of functional data is also collected on higher dimensional domains, such as magnetic resonance imaging (MRI), positron emission tomography (PET), and functional MRI (fMRI). Moreover, prospective cohort studies and clinical trials investigating neurodegenerative diseases such as Alzheimer’s disease (AD) and Huntington’s disease (HD) often collect repeated measurements of clinical variables, event history, and functional data, which induce complex correlation structures among the observations. The emergence of new data types and structures brings a rich source of information but also poses new challenges for methodology development in functional data analysis (FDA).
Functional regression, especially functional predictor regression (also called scalar-on-function regression, which models the relationship between a scalar outcome and functional predictors), has been an active area of FDA in the past 10 years. It was first introduced by Ramsay & Dalzell [1] and built up by Ramsay & Silverman [2]. There is a rich literature on functional predictor regression [e.g., 3–10], but most existing work deals only with cross-sectional data. Goldsmith et al. [11] first extended the penalized functional regression approach [10] to handle longitudinal measurements in both the response variable and functional predictors by incorporating scalar random effects. Gertheiss et al. [12] extended the functional principal component regression (PCR) to longitudinal functional data and allowed for different effects of subject-specific curves. More recently, Gellar et al. [13] extended the Cox proportional hazards model to incorporate functional predictors and estimated the parameters via a penalized partial likelihood approach. Lee et al. [14] developed a Bayesian functional Cox regression model with both functional and scalar covariates, but used different regularization methods. To the best of our knowledge, no functional regression model has attempted to simultaneously analyze longitudinal measurements and time-to-event data under the joint modeling framework.
Joint models of longitudinal and time-to-event data were proposed by Faucett & Thomas [15] and Wulfsohn & Tsiatis [16]. The principle is to define two submodels (a mixed effects submodel for the longitudinal outcome and a Cox submodel for the survival outcome) and link them using a common latent structure. This modeling approach analyzes the two types of outcomes simultaneously and is able to reduce the bias of parameter estimates and improve the efficiency of statistical inference. Tsiatis and Davidian [17] and Proust-Lima et al. [18] gave excellent reviews of joint modeling research. However, current state-of-the-art joint models do not incorporate functional predictors.
The major objective of this article is to incorporate the growing volume of functional data in the longitudinal-survival setting. Specifically, we develop a functional joint model (FJM), where outcomes consist of a longitudinal measure and a time-to-event variable, and the exposure variables include both scalar covariates and functional predictors. The rest of the article is organized as follows. In Section 2, we describe a motivating clinical study and the data structure. In Section 3, we discuss the joint longitudinal-survival model with functional predictors, and the estimation procedure. In Section 4, we apply the proposed FJM to the motivating Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. In Section 5, we conduct a simulation study to examine the performance of the proposed FJM. Concluding remarks and discussion are presented in Section 6.
2 A Motivating Clinical Study
The methodology development is motivated by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. The ADNI study was launched in 2003 with the primary goal of testing whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), cerebrospinal fluid (CSF) markers, and neuropsychological assessments can be combined to measure the progression of AD. Phase one of the ADNI study (ADNI-1) recruited more than 800 adults, including about 200 cognitively normal individuals, 400 mild cognitive impairment (MCI) patients, and 200 early AD patients. Participants were reassessed at 6, 12, 18, 24 and 36 months, and additional follow-ups were conducted annually as part of ADNI-2. At each visit, various neuropsychological assessments, brain images, and clinical measures were collected. Detailed information about the ADNI study procedures, including participant inclusion and exclusion criteria and the complete study protocol, can be found at http://www.adni-info.org.
Because MCI is commonly considered a transitional stage between normal cognition and Alzheimer’s disease, numerous recent studies have assessed various neuroimaging techniques and clinical markers for predicting AD diagnosis among MCI patients [e.g., 19]. To this end, we select 355 MCI patients in the ADNI-1 study without missing data in the covariates of interest, and consider AD diagnosis to be the survival event of interest. Participants were followed up for a mean of 3.2 years (SD 2.6; range 0.4–9.3) before AD diagnosis or censoring. Among the 355 MCI patients, 180 patients were diagnosed with AD and 175 had stable MCI over a mean follow-up period of 2.3 years and 4.2 years, respectively.
The Alzheimer’s Disease Assessment Scale-Cognitive (ADAS-Cog) score with 11 items (referred to as ADAS-Cog 11) measures cognitive function. It is usually reported as a composite score of the 11 items and ranges from 0 to 70 (or 85), with a higher score indicating poorer cognitive function. Figure 1 displays the lowess smoothing curve [20] of ADAS-Cog 11 scores over time for the MCI patients with follow-up time less than 3 years (203 patients, solid line), 3–6 years (82 patients, dotted line), and more than 6 years (70 patients, dashed line), in addition to 95% pointwise confidence intervals (shaded regions). Figure 1 suggests that the ADAS-Cog 11 scores of patients in all three groups increase with time (deteriorating cognitive functions). Moreover, patients with shorter follow-up time tend to have higher ADAS-Cog 11 scores, indicating that patients with more severe cognitive impairment were more likely to progress to AD. This phenomenon indicates a strong correlation between the longitudinal ADAS-Cog 11 values and the time to AD diagnosis. Such a dependent terminal event time is often referred to as “dependent censoring” or “informative censoring” in the literature of joint modeling [21]. However, many studies [22, 23] designed to explore longitudinal measures for predicting future cognitive functions of MCI patients fail to account for such informative censoring, leading to biased inference.
Moreover, the degree of atrophy within the medial temporal lobe structures, especially within the hippocampus, has been reported to be associated with AD progression. Most current analyses are based on volumetric brain MRI data. For example, AD patients and MCI patients have been shown to have 27% and 11% smaller hippocampal volumes, respectively, as compared with normal age-matched elderly [24]. In the preliminary analysis, we fit regular joint models with baseline MRI measures, including bilateral hippocampal volume, as scalar covariates. The results (detailed in the Web Supplement) suggest that hippocampal volume has a strong association with both cognitive function decline and time to AD diagnosis. Furthermore, several studies [25–28] have demonstrated that surface-based morphology analysis offers additional advantages because these methods study patterns of hippocampal subfield atrophy and produce detailed pointwise correlations between atrophy and cognitive functions.
In this article, we propose a functional joint model (FJM) to examine the association between the longitudinal ADAS-Cog 11 score and the time to AD diagnosis, accounting for the clinical covariates and MRI measures. We include as a functional predictor the hippocampal radial distance (HRD) of each bilateral hippocampal surface point (referred to as a vertex), which measures the distance from the medial core to each vertex on the surface and represents the hippocampal thickness. For image processing, we adopt a surface fluid registration package [29]. The left and right hippocampal surfaces are first conformally mapped to a two-dimensional (2D) rectangle plane, in the form of a matrix, to form a feature image of the surface. After registering each feature image to a common template and calculating HRD for all vertices, the 2D image matrices are vectorized into one-dimensional (1D) image vectors. The spatial information and image smoothing have been accounted for in part in the image preprocessing steps as described in Shi et al. [29]. Our focus here is to propose methodology that is applicable to a wide variety of imaging and non-imaging functional data. To this end, the corresponding hippocampal radial distances of the vertices are represented as 1D functional data defined on domain S. Each point in the image vector retains a one-to-one relationship to the original vertex on the surface, which allows us to back-transform any function defined on domain S to the hippocampal surfaces. The hippocampus image processing procedure is detailed in the Web Supplement.
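The vectorization and back-transformation described above can be illustrated with a minimal sketch. The image size, the random radial distances, and the row-by-row ordering are assumptions chosen for illustration only; the actual preprocessing is performed by the surface registration package [29].

```python
import numpy as np

# Hypothetical 2D feature image of radial distances (sizes are illustrative only).
rng = np.random.default_rng(0)
n_rows, n_cols = 4, 6
feature_image = rng.uniform(1.0, 5.0, size=(n_rows, n_cols))

# Vectorize row by row: position k in the vector corresponds to
# pixel (k // n_cols, k % n_cols), so the mapping is one-to-one.
image_vector = feature_image.ravel(order="C")

# Any function defined on the 1D domain can be mapped back to the 2D plane.
coef_1d = np.sin(np.linspace(0.0, np.pi, image_vector.size))
coef_2d = coef_1d.reshape(n_rows, n_cols)

# Index bookkeeping for a single point of the 1D domain.
k = 15
row, col = np.unravel_index(k, (n_rows, n_cols))
```

Because the ordering is fixed, a coefficient function estimated on the 1D domain can be displayed on the original grid without any loss of positional information.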
3 Functional Joint Model
We introduce a general functional joint model (FJM) framework for a longitudinal process and a time-to-event process with time-invariant functional predictors. For ease of illustration, we only incorporate a single time-invariant (baseline) functional predictor in both the longitudinal and survival submodels, although the FJM can readily accommodate multiple functional predictors.
3.1 Functional joint model framework
For each subject i (i = 1, …, I) at visit j (j = 1, …, Ji), we observe data $\{y_{ij}, x_{ij}, g_i^{(x)}(s)\}$, where $y_{ij} = y_i(t_{ij})$ is a scalar outcome recorded at time $t_{ij}$ from the study onset. Vector $x_{ij}$ is a p-dimensional covariate vector. Function $g_i^{(x)}(s)$ is a time-invariant functional predictor defined over a 1D domain $s \in [0, S_{\max}] = S$. The longitudinal submodel is
$$y_i(t_{ij}) = m_i(t_{ij}) + \varepsilon_{ij}, \qquad m_i(t_{ij}) = \beta_0 + x_{ij}^\top \beta + \int_S g_i^{(x)}(s) B^{(x)}(s)\,ds + z_{ij}^\top u_i, \tag{1}$$
where $m_i(t_{ij})$ is the unobserved true value of the longitudinal outcome at time $t_{ij}$, $\beta_0$ is the intercept, and $\beta$ is the regression coefficient vector. Coefficient function $B^{(x)}(s)$ (defined on the same domain as $g_i^{(x)}(s)$) determines a pointwise association between $g_i^{(x)}(s)$ and $y_i(t_{ij})$. Vector $z_{ij}$ is a q-dimensional covariate vector corresponding to the random effects $u_i$, which are assumed to follow $u_i \sim N(0, \Sigma_u)$ to account for the within-subject correlation. The measurement error $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$ is independent of $u_i$.
The event history is recorded for each subject i with observed event time $T_i = \min(T_i^*, C_i)$ and the event indicator $\delta_i = I(T_i^* \le C_i)$, where $T_i^*$ and $C_i$ are the true event time and censoring time, respectively. The survival submodel is
$$h_i(t) = h_0(t)\exp\left\{ w_i^\top \gamma + \int_S g_i^{(w)}(s) B^{(w)}(s)\,ds + \alpha m_i(t) \right\}, \tag{2}$$
where $h_0(t)$ is the baseline hazard function, and $w_i$ is a vector of time-independent covariates with regression coefficient vector $\gamma$. Functional predictor $g_i^{(w)}(s)$ may be the same as or different from its counterpart $g_i^{(x)}(s)$ in model (1). Functional log hazard ratio $B^{(w)}(s)$ measures the overall contribution of $g_i^{(w)}(s)$ towards the event hazard. The association parameter $\alpha$ quantifies the strength of correlation between the unobserved true longitudinal function $m_i(t)$ and the event hazard at the same time point. Models (1) and (2) constitute the functional joint model (FJM) framework.
3.2 Functional regression
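To build the functional regression model, we follow the general strategy of functional principal component regression (FPCR; e.g., [4, 8]). We first express the time-invariant functional predictor $g_i^{(x)}(s)$ in model (1) using the Karhunen-Loève decomposition. Let $\mu^{(x)}(s)$ be the mean of $g_i^{(x)}(s)$ and $\Sigma^{(x)}(s, s') = \mathrm{cov}\{g_i^{(x)}(s), g_i^{(x)}(s')\}$ be the covariance function between two locations (s and s′) of the functional predictor. The spectral decomposition of the covariance function is given by $\Sigma^{(x)}(s, s') = \sum_{l=1}^{\infty} \lambda_l^{(x)} \phi_l^{(x)}(s) \phi_l^{(x)}(s')$, where $\lambda_1^{(x)} \ge \lambda_2^{(x)} \ge \cdots \ge 0$ are non-increasing eigenvalues and $\phi_l^{(x)}(s)$ are the corresponding orthonormal eigenfunctions. The Karhunen-Loève expansion of $g_i^{(x)}(s)$ is
$$g_i^{(x)}(s) = \mu^{(x)}(s) + \sum_{l=1}^{\infty} \xi_{il}^{(x)} \phi_l^{(x)}(s),$$
where the functional principal component (FPC) scores $\xi_{il}^{(x)} = \int_S \{g_i^{(x)}(s) - \mu^{(x)}(s)\}\phi_l^{(x)}(s)\,ds$ are uncorrelated random variables with mean zero and variance $\lambda_l^{(x)}$. In practice, we adopt a truncated approximation for $g_i^{(x)}(s)$ given by $g_i^{(x)}(s) \approx \mu^{(x)}(s) + \sum_{l=1}^{K_x} \xi_{il}^{(x)} \phi_l^{(x)}(s)$. The number of components $K_x$ can be determined using the proportion of explained variance (PEV). Specifically, $K_x$ may be chosen as the minimum number of functional principal components such that $\sum_{l=1}^{K_x} \lambda_l^{(x)} / \sum_{l=1}^{\infty} \lambda_l^{(x)} \ge L$, where L is a pre-specified PEV, e.g., L = 80%, 90%, or 95%.
In the second step, we use the first $K_x$ eigenfunctions as the basis functions to expand the coefficient function $B^{(x)}(s)$ in model (1) as $B^{(x)}(s) = \sum_{l=1}^{K_x} B_l^{(x)} \phi_l^{(x)}(s)$, where the coefficient $B_l^{(x)} = \int_S B^{(x)}(s)\phi_l^{(x)}(s)\,ds$. We let vector of FPC scores $\xi_i^{(x)} = [\xi_{i1}^{(x)}, \ldots, \xi_{iK_x}^{(x)}]^\top$, vector of eigenfunctions $\phi^{(x)}(s) = [\phi_1^{(x)}(s), \ldots, \phi_{K_x}^{(x)}(s)]^\top$, and vector of coefficients $B^{(x)} = [B_1^{(x)}, \ldots, B_{K_x}^{(x)}]^\top$. Then we have
$$\int_S g_i^{(x)}(s) B^{(x)}(s)\,ds \approx \int_S \mu^{(x)}(s) B^{(x)}(s)\,ds + \xi_i^{(x)\top} \left\{\int_S \phi^{(x)}(s)\phi^{(x)}(s)^\top ds\right\} B^{(x)} = \int_S \mu^{(x)}(s) B^{(x)}(s)\,ds + \xi_i^{(x)\top} B^{(x)}.$$
Note that $\int_S \phi^{(x)}(s)\phi^{(x)}(s)^\top ds = I$, where I is an identity matrix, because the eigenfunctions are orthonormal.
Similarly, the functional predictor $g_i^{(w)}(s)$ in model (2) can be expressed as $g_i^{(w)}(s) \approx \mu^{(w)}(s) + \sum_{l=1}^{K_w} \xi_{il}^{(w)} \phi_l^{(w)}(s)$ and thus $\int_S g_i^{(w)}(s) B^{(w)}(s)\,ds \approx \int_S \mu^{(w)}(s) B^{(w)}(s)\,ds + \xi_i^{(w)\top} B^{(w)}$, where $\mu^{(w)}(s)$, $\xi_i^{(w)}$, $\phi^{(w)}(s)$, and $B^{(w)}$ have the same meanings as $\mu^{(x)}(s)$, $\xi_i^{(x)}$, $\phi^{(x)}(s)$, and $B^{(x)}$, respectively. Thus, the FJM based on the FPC scores is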
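$$y_i(t_{ij}) = m_i(t_{ij}) + \varepsilon_{ij}, \qquad m_i(t_{ij}) = \beta_0^* + x_{ij}^\top \beta + \xi_i^{(x)\top} B^{(x)} + z_{ij}^\top u_i, \tag{3}$$
and
$$h_i(t) = h_0^*(t)\exp\left\{ w_i^\top \gamma + \xi_i^{(w)\top} B^{(w)} + \alpha m_i(t) \right\}, \tag{4}$$
where $\beta_0^* = \beta_0 + \int_S \mu^{(x)}(s) B^{(x)}(s)\,ds$ and $h_0^*(t) = h_0(t)\exp\{\int_S \mu^{(w)}(s) B^{(w)}(s)\,ds\}$. For notational ease and without ambiguity, we replace the approximation sign (≈) by the equal sign. Note that models (3) and (4) are similar to a linear mixed model for the longitudinal scalar response variable and a Cox model for the survival outcome, respectively, in a standard joint model framework [16]. The FPC score vectors $\xi_i^{(x)}$ and $\xi_i^{(w)}$ can be treated as scalar covariates. Similar to mixed models, our FJM can readily handle unbalanced data in the longitudinal measurement of $y_i(t)$.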
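The reduction of the functional term to a linear combination of FPC scores rests only on the orthonormality of the eigenfunctions. The following sketch checks the identity $\int_S g(s)B(s)\,ds = \int_S \mu(s)B(s)\,ds + \xi^\top B$ numerically. The cosine basis, the mean function, and all numerical values are assumptions made for illustration, not quantities from the ADNI analysis.

```python
import numpy as np

# Grid on S = [0, 10]; grid size and basis are illustrative assumptions.
S_max, n_grid = 10.0, 2001
s = np.linspace(0.0, S_max, n_grid)
ds = s[1] - s[0]

def integrate(f):
    """Trapezoidal rule over the last axis, on the equally spaced grid s."""
    return np.sum((f[..., 1:] + f[..., :-1]) * 0.5, axis=-1) * ds

# Assumed orthonormal basis on S: sqrt(2/S_max) * cos(l * pi * s / S_max), l = 1..K.
K = 5
phi = np.array([np.sqrt(2.0 / S_max) * np.cos(l * np.pi * s / S_max)
                for l in range(1, K + 1)])                 # shape (K, n_grid)
gram = integrate(phi[:, None, :] * phi[None, :, :])        # should be close to I_K

# Functional predictors built from a mean function and scores (three hypothetical subjects).
mu = 1.0 + 0.1 * s
rng = np.random.default_rng(1)
xi = rng.normal(size=(3, K))
g = mu + xi @ phi                                          # shape (3, n_grid)

# Coefficient function expanded in the same basis.
B_vec = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
B_fun = B_vec @ phi

lhs = integrate(g * B_fun)                                 # integral of g_i(s) B(s)
rhs = integrate(mu * B_fun) + xi @ B_vec                   # intercept shift + score term
```

The term `integrate(mu * B_fun)` is the same for every subject, which is why it can be absorbed into the intercept of the longitudinal submodel and into the baseline hazard of the survival submodel.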
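Let $\theta = [\beta_0^*, \beta^\top, B^{(x)\top}, \gamma^\top, B^{(w)\top}, \alpha, \sigma_\varepsilon^2, \mathrm{vech}(\Sigma_u)^\top, \theta_{h_0}^\top]^\top$ be the parameter vector, where $\beta_0^*$ and $h_0^*(t)$ denote the intercept in model (3) and the baseline hazard function in model (4), $\mathrm{vech}(\Sigma_u)$ is the vector formed by vectorizing the lower triangular part of covariance matrix $\Sigma_u$, and vector $\theta_{h_0}$ denotes the parameters in the baseline hazard function $h_0^*(t)$. The conditional likelihood from the longitudinal data is
$$p(y_i \mid u_i; \theta) = \prod_{j=1}^{J_i} (2\pi\sigma_\varepsilon^2)^{-1/2} \exp\left[-\frac{\{y_{ij} - m_i(t_{ij})\}^2}{2\sigma_\varepsilon^2}\right],$$
and the density function of the random effects $u_i$ is $p(u_i; \theta) = (2\pi)^{-q/2} |\Sigma_u|^{-1/2} \exp(-u_i^\top \Sigma_u^{-1} u_i / 2)$, where q is the dimension of the covariance matrix $\Sigma_u$. The conditional likelihood from the survival data is
$$p(T_i, \delta_i \mid u_i; \theta) = h_i(T_i)^{\delta_i} S_i(T_i),$$
where $h_i(t)$ is the hazard function in model (4) and $S_i(T_i) = \exp\{-\int_0^{T_i} h_i(t)\,dt\}$ is the corresponding survival function, and function $h_0^*(t)$ can be approximated by a piecewise-constant function or a B-spline function.
Under the local independence assumption (i.e., conditional on the random effects vector $u_i$, all components in $y_i$ and $T_i$ are independent), the joint likelihood function is
$$L(\theta) = \prod_{i=1}^{I} \int p(y_i \mid u_i; \theta)\, p(T_i, \delta_i \mid u_i; \theta)\, p(u_i; \theta)\, du_i. \tag{5}$$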
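To make the structure of the joint likelihood concrete, the sketch below evaluates one subject's contribution in a deliberately simplified setting: a scalar random intercept, a constant baseline hazard, and a linear trajectory $m_i(t) = a + bt + u_i$, so that the cumulative hazard has a closed form. The integral over the random effect is approximated by Gauss-Hermite quadrature. All parameter values and names (`par`, `eta`, and so on) are hypothetical, and this is not the EM implementation described in the Web Supplement.

```python
import numpy as np

def log_lik_given_u(u, y, t, T, delta, par):
    """log p(y | u) + log p(T, delta | u) for one subject; u is an array of intercept values."""
    u = np.atleast_1d(u).astype(float)
    m = par["a"] + par["b"] * t[None, :] + u[:, None]          # m_i(t_ij), shape (n_u, J)
    ll_long = np.sum(-0.5 * np.log(2.0 * np.pi * par["sigma_e"] ** 2)
                     - 0.5 * (y[None, :] - m) ** 2 / par["sigma_e"] ** 2, axis=1)
    # Log hazard at the observed time T; eta collects the covariate and FPC-score terms.
    log_haz = np.log(par["h0"]) + par["eta"] + par["alpha"] * (par["a"] + par["b"] * T + u)
    # Closed-form integral of exp(alpha * b * v) over [0, T].
    rate = par["alpha"] * par["b"]
    time_int = T if rate == 0 else (np.exp(rate * T) - 1.0) / rate
    cum_haz = par["h0"] * np.exp(par["eta"] + par["alpha"] * (par["a"] + u)) * time_int
    return ll_long + delta * log_haz - cum_haz

def marginal_lik(y, t, T, delta, par, n_nodes=30):
    """Integrate over u ~ N(0, sigma_u^2) with Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)            # nodes/weights for exp(-x^2)
    u = np.sqrt(2.0) * par["sigma_u"] * x                      # change of variables
    return np.sum(w * np.exp(log_lik_given_u(u, y, t, T, delta, par))) / np.sqrt(np.pi)

# Hypothetical subject and parameter values.
y = np.array([20.5, 21.0, 22.4])
t = np.array([0.0, 0.5, 1.0])
T, delta = 1.6, 1
par = dict(a=20.0, b=1.0, sigma_e=1.5, sigma_u=0.7, h0=0.1, eta=-0.3, alpha=0.1)
lik = marginal_lik(y, t, T, delta, par)
```

With vector-valued random effects or a piecewise-constant baseline hazard the same structure applies, but the quadrature becomes multivariate and the cumulative hazard is accumulated interval by interval.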
3.3 Estimation and inference
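In practice, the functional predictors such as $g_i^{(x)}(s)$ are measured over finite grids in domain S and often with error, i.e., the observed functional predictor is $\tilde g_i^{(x)}(s) = g_i^{(x)}(s) + \varepsilon_i(s)$, where $\varepsilon_i(s)$ is a zero-mean Gaussian measurement error. The mean function $\mu^{(x)}(s)$ is estimated by $\hat\mu^{(x)}(s) = I^{-1}\sum_{i=1}^{I} \tilde g_i^{(x)}(s)$, and the empirical covariance function is estimated by $\hat\Sigma^{(x)}(s, s') = I^{-1}\sum_{i=1}^{I} \{\tilde g_i^{(x)}(s) - \hat\mu^{(x)}(s)\}\{\tilde g_i^{(x)}(s') - \hat\mu^{(x)}(s')\}$. We apply kernel smoothing to the off-diagonal elements of $\hat\Sigma^{(x)}(s, s')$ to remove the effects from measurement errors [30, 31]. Then the estimated eigenvalues $\hat\lambda_l^{(x)}$ and the corresponding estimated eigenfunctions $\hat\phi_l^{(x)}(s)$, where l = 1, …, Smax, are calculated based on the decomposition of the smoothed covariance function. Finally, the estimated FPC scores for each subject are calculated as $\hat\xi_{il}^{(x)} = \int_S \{\tilde g_i^{(x)}(s) - \hat\mu^{(x)}(s)\}\hat\phi_l^{(x)}(s)\,ds$, and the integral can be approximated by the Riemann sum. We choose the first $K_x$ estimated eigenfunctions and FPC scores, and denote them as $\hat\phi^{(x)}(s)$ and $\hat\xi_i^{(x)}$, respectively.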
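The estimation steps above (sample mean, sample covariance, eigendecomposition, PEV-based truncation, Riemann-sum scores) can be sketched as follows. This is a simplified illustration that assumes an equally spaced grid and omits the kernel smoothing of the covariance function [30, 31]; the function name `fpca` and the simulated curves are hypothetical.

```python
import numpy as np

def fpca(G_obs, s, pev=0.95):
    """FPCA of curves G_obs (subjects x grid points) on an equally spaced grid s."""
    ds = s[1] - s[0]
    mu_hat = G_obs.mean(axis=0)
    centered = G_obs - mu_hat
    cov_hat = centered.T @ centered / G_obs.shape[0]
    # Discretized eigenproblem: (cov * ds) v = lambda v approximates the integral operator.
    evals, evecs = np.linalg.eigh(cov_hat * ds)
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 0.0, None)
    phi = evecs[:, order].T / np.sqrt(ds)          # rows: eigenfunctions with sum(phi^2) * ds = 1
    # Smallest K whose cumulative proportion of explained variance reaches pev.
    cum_pev = np.cumsum(evals) / np.sum(evals)
    K = int(np.searchsorted(cum_pev, pev) + 1)
    scores = centered @ phi[:K].T * ds             # Riemann-sum approximation of the integrals
    return mu_hat, evals[:K], phi[:K], scores

# Illustrative data: two true components with variances 4 and 1, plus small noise.
s = np.linspace(0.0, 10.0, 101)
ds = s[1] - s[0]
rng = np.random.default_rng(0)
true_phi = np.array([np.sqrt(0.2) * np.cos(l * np.pi * s / 10.0) for l in (1, 2)])
true_scores = rng.normal(size=(500, 2)) * np.sqrt([4.0, 1.0])
G_obs = 1.0 + 0.1 * s + true_scores @ true_phi + rng.normal(scale=0.1, size=(500, 101))

mu_hat, evals, phi_hat, scores = fpca(G_obs, s, pev=0.95)
```

The resulting `scores` array is what would enter the joint model as scalar covariates, one column per retained component.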
Maximization of the likelihood function in model (5) with respect to parameter vector $\theta$ can be performed using the Expectation Maximization (EM) algorithm. More details regarding the EM algorithm implementation are given in the Web Supplement. Based on the estimated coefficient vector $\hat B^{(x)}$, the estimated coefficient function is calculated by $\hat B^{(x)}(s) = \hat\phi^{(x)}(s)^\top \hat B^{(x)}$. A pointwise 95% confidence interval for $B^{(x)}(s)$ can be constructed based on D (e.g., D = 1,000) bootstrap samples [32]. For example, at location s, a 95% bootstrap confidence interval for $B^{(x)}(s)$ can be $[q_{0.025}\{\hat B_d^{(x)}(s)\}, q_{0.975}\{\hat B_d^{(x)}(s)\}]$, where $q_p\{\hat B_d^{(x)}(s)\}$ is the p-quantile of the bootstrap samples $\hat B_d^{(x)}(s)$, d = 1, ⋯, D. Alternatively, a Wald-type confidence interval based on the standard deviation of the bootstrap estimates, $\mathrm{sd}\{\hat B_d^{(x)}(s)\}$, is given by $\hat B^{(x)}(s) \pm 1.96\,\mathrm{sd}\{\hat B_d^{(x)}(s)\}$. These two types of bootstrap confidence intervals give very similar results in our simulation study. Similarly, the estimated coefficient function $\hat B^{(w)}(s)$ and its confidence interval can be obtained. For visualization in the ADNI study, we can map the coefficient function back to the hippocampal surfaces, because there is a one-to-one relationship between the location s in domain S and the vertex on the hippocampal surfaces.
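The two pointwise intervals can be computed from the bootstrap coefficient vectors as sketched below. The helper name `coefficient_function_ci` is hypothetical, and the bootstrap draws in the example are simulated placeholders; in an actual analysis each row of `B_boot` would come from refitting the FJM to a resampled dataset.

```python
import numpy as np
from statistics import NormalDist

def coefficient_function_ci(B_hat, B_boot, phi_hat, level=0.95):
    """Percentile and Wald-type pointwise intervals for B(s) = phi(s)^T B.

    B_hat: (K,) estimate; B_boot: (D, K) bootstrap estimates; phi_hat: (K, n_grid).
    """
    a = (1.0 - level) / 2.0
    est = B_hat @ phi_hat                         # estimated coefficient function on the grid
    boot = B_boot @ phi_hat                       # (D, n_grid) bootstrap curves
    pct = np.quantile(boot, [a, 1.0 - a], axis=0)
    z = NormalDist().inv_cdf(1.0 - a)             # about 1.96 for a 95% interval
    sd = boot.std(axis=0, ddof=1)
    wald = np.stack([est - z * sd, est + z * sd])
    return est, pct, wald

# Placeholder inputs (assumed basis and coefficients, simulated "bootstrap" draws).
s = np.linspace(0.0, 10.0, 101)
phi_hat = np.array([np.sqrt(0.2) * np.cos(l * np.pi * s / 10.0) for l in (1, 2, 3)])
B_hat = np.array([1.0, -0.5, 0.25])
rng = np.random.default_rng(0)
B_boot = B_hat + rng.normal(scale=0.1, size=(1000, 3))
est, pct_ci, wald_ci = coefficient_function_ci(B_hat, B_boot, phi_hat)
```

When the bootstrap distribution is roughly symmetric, as in this placeholder example, the percentile and Wald-type bands nearly coincide.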
3.4 Implementation using software
An advantage of the proposed FJM is that it can be implemented using available standard software. The first step is to conduct FPCA for the functional predictors and to estimate the FPC eigenfunctions and scores, using the fpca.sc function in the refund package [33] or the fpca.mle and fpca.score functions in the FPCA package [34] in R. In the second step, the FPC scores are used as scalar covariates in a standard joint model for longitudinal and survival data, and their coefficients can be estimated using the JM package [35] in R. The estimated coefficient function can be calculated as a weighted sum of the FPC eigenfunctions. In addition, we have fitted the proposed FJM via our own code and have obtained estimation results very close to those from the aforementioned packages. To facilitate easy reading and implementation of the proposed FJM, we provide in the Web Supplement the R code to conduct FPCA, and to fit the FJM using either the JM package or our estimation methods based on the EM algorithm.
4 Application to the ADNI Study
We apply the proposed FJM to the motivating ADNI study. We include the following variables as scalar covariates: baseline age (bAge, mean: 74.4, SD: 7.3, range 55.1–89.3), gender (gender, 36.1% female), years of education (Edu, mean: 15.6, SD: 3.0, range 4–20), and presence of at least one apolipoprotein E ε4 allele (APOE-ε4, 56%), given their potential effects on AD progression [36–38]. To utilize the brain imaging information, we include baseline hippocampal volume (bHV) as a scalar covariate and the baseline hippocampal surfaces based on hippocampal radial distance (HRD) as a functional predictor. We follow the procedure in the Web Supplement to convert the 3D HRD to a 1D domain denoted by S.
The first model we consider is the regular joint model (referred to as model JM, identified in the preliminary analysis in Section 2), which incorporates variable bHV in both longitudinal and survival submodels. Additionally, we consider three FJMs: model FJM1 includes HRD only in the longitudinal submodel, model FJM2 includes HRD only in the survival submodel, and model FJM3 includes HRD in both submodels as
$$\mathrm{ADAS}_i(t_{ij}) = m_i(t_{ij}) + \varepsilon_{ij}, \qquad m_i(t_{ij}) = \beta_0 + x_{ij}^\top \beta + \int_S \mathrm{HRD}_i(s) B^{(x)}(s)\,ds + z_{ij}^\top u_i,$$
$$h_i(t) = h_0(t)\exp\left\{ w_i^\top \gamma + \int_S \mathrm{HRD}_i(s) B^{(w)}(s)\,ds + \alpha m_i(t) \right\},$$
where $x_{ij}$ contains the visit time, gender, bAge, Edu, APOE-ε4, and bHV, and $w_i$ contains gender, bAge, Edu, APOE-ε4, and bHV.
We perform functional principal component analysis (FPCA) on HRD and select the first 20 FPCs, which explain 82.6% of the total variance in the hippocampal radial distance data. Baseline hazard function h0(t) is approximated by a piecewise constant function. Specifically, the observed survival time is divided into M = 7 intervals with knots at every 1/M-th quantile. We have also explored other selections of M and obtained very similar results.
Table 1 displays the values of the Akaike information criterion (AIC) from the four candidate models. Models FJM1 and FJM3 have smaller AIC values than model JM, suggesting that including HRD as a functional predictor in the longitudinal submodel may improve the model fit. Model FJM1 is selected as the final model because it has the smallest AIC value. This may indicate that after adjustment for hippocampal volume and other covariates, HRD remains an important functional predictor for the cognitive functions manifested by variable ADAS-Cog 11 among the MCI patients. In contrast, FJM2 has the largest AIC, which suggests that HRD may not be significantly associated with the time to AD diagnosis.
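A piecewise-constant baseline hazard with quantile-based knots can be sketched as follows. The function names, the hazard levels, and the toy event times are assumptions for illustration; in the fitted model the hazard level within each interval is a parameter estimated jointly with the rest of the model.

```python
import numpy as np

def quantile_knots(obs_times, M):
    """Knots 0 = k_0 < k_1 < ... < k_M at every 1/M-th quantile of the observed times."""
    knots = np.quantile(obs_times, np.linspace(0.0, 1.0, M + 1))
    knots[0] = 0.0
    return knots

def hazard(t, knots, levels):
    """Piecewise-constant baseline hazard h0(t); levels[m] applies on [k_m, k_{m+1})."""
    idx = np.searchsorted(knots, t, side="right") - 1
    return np.asarray(levels)[np.clip(idx, 0, len(levels) - 1)]

def cumulative_hazard(t, knots, levels):
    """H0(t): hazard level times the time spent in each interval, summed over intervals."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    widths = np.diff(knots)
    exposure = np.clip(t[:, None] - knots[None, :-1], 0.0, widths[None, :])
    return exposure @ np.asarray(levels)

# Small hand-checkable example: hazard 0.5 on [0, 1) and 0.2 on [1, 3].
knots_demo = np.array([0.0, 1.0, 3.0])
levels_demo = np.array([0.5, 0.2])
h_demo = hazard(np.array([0.5, 2.0]), knots_demo, levels_demo)
H_demo = cumulative_hazard([0.5, 2.0, 3.0], knots_demo, levels_demo)

# M = 7 intervals from hypothetical observed times; each interval holds equally many times.
obs_times = np.arange(1, 71) / 10.0
knots7 = quantile_knots(obs_times, 7)
counts = np.histogram(obs_times, bins=knots7)[0]
```

Placing knots at quantiles gives every interval a similar number of observed times, which keeps each hazard level reasonably well identified.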
Table 1. Akaike information criterion (AIC) values from the four candidate models.
Criterion | JM | FJM1 | FJM2 | FJM3 |
---|---|---|---|---|
AIC | 10211 | 10202 | 10217 | 10208 |
Parameter estimates from model FJM1 are presented in Table 2, while the estimated vector of coefficients $\hat B^{(x)}$ (for a vector of 20 FPC scores as in model (3)) is presented in Web Table 3. The ADAS-Cog 11 score increases (deteriorates) as time progresses, with an average increase of 1.006 units (95% CI: [0.881–1.131]) per year for MCI patients. Higher education, lack of APOE-ε4 allele(s), and larger hippocampal volume at baseline are associated with lower (better) ADAS-Cog 11 scores. Moreover, Web Table 3 suggests that the coefficients for eight FPC scores are significant (p < 0.05), indicating that the baseline hippocampal radial distance (HRD) is associated with the ADAS-Cog 11 score at all visits. In the survival submodel, the presence of APOE-ε4 allele(s) increases the hazard of AD diagnosis by 44% (exp(0.364) − 1, 95% CI: [4%–100%]), which is consistent with the literature [39]. Older age and larger hippocampal volume at baseline are associated with lower risk of AD diagnosis. Furthermore, a larger ADAS-Cog 11 score increases the risk of AD diagnosis, i.e., a one-unit increase in ADAS-Cog 11 score increases the hazard of AD diagnosis by 11% (exp(0.107) − 1, 95% CI: [8%–15%]).
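The percentage changes in hazard and their confidence intervals quoted above follow directly from the estimates and standard errors in Table 2. A short sketch of the arithmetic, assuming Wald-type intervals on the log-hazard scale with multiplier 1.96 (the function name is hypothetical):

```python
import math

def percent_change_in_hazard(log_hr, se, z=1.96):
    """Point estimate and Wald-type 95% limits of 100 * (exp(log_hr) - 1), rounded to integers."""
    point = math.exp(log_hr) - 1.0
    lower = math.exp(log_hr - z * se) - 1.0
    upper = math.exp(log_hr + z * se) - 1.0
    return tuple(round(100.0 * v) for v in (point, lower, upper))

# APOE-e4 in the survival submodel: MLE 0.364, SE 0.168 (Table 2).
apoe = percent_change_in_hazard(0.364, 0.168)    # (44, 4, 100)

# Association parameter alpha: MLE 0.107, SE 0.015 (Table 2).
adas = percent_change_in_hazard(0.107, 0.015)    # (11, 8, 15)

# Time slope in the longitudinal submodel: MLE 1.006, SE 0.064 (Table 2).
slope_ci = (round(1.006 - 1.96 * 0.064, 3), round(1.006 + 1.96 * 0.064, 3))  # (0.881, 1.131)
```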
Table 2. Parameter estimates from model FJM1.
Outcome | Parameters | MLE | SE | p |
---|---|---|---|---|
For longitudinal outcome | | | | |
ADAS-Cog 11 | Time (Years) | 1.006 | 0.064 | <0.001 |
 | Female | −0.227 | 0.375 | 0.546 |
 | bAge | −0.259 | 0.177 | 0.143 |
 | Edu (years) | −0.248 | 0.051 | <0.001 |
 | APOE-ε4 | 1.090 | 0.278 | <0.001 |
 | bHV (mm³) | −0.954 | 0.201 | <0.001 |
For survival process | | | | |
MCI to AD | Female | −0.203 | 0.168 | 0.227 |
 | bAge | −0.165 | 0.087 | 0.059 |
 | Edu (years) | −0.002 | 0.026 | 0.951 |
 | APOE-ε4 | 0.364 | 0.168 | 0.030 |
 | bHV (mm³) | −0.300 | 0.091 | <0.001 |
 | α | 0.107 | 0.015 | <0.001 |
Table 3. Simulation results for Setting I.
Parameter | n=200, CR=0.3 | | | | | n=500, CR=0.3 | | | | |
---|---|---|---|---|---|---|---|---|---|---|
 | Bias | AMSE | SE | SD | CP | Bias | AMSE | SE | SD | CP |
For longitudinal data | | | | | | | | | | |
β1 = 0.78 | <0.001 | <0.001 | 0.002 | 0.002 | 0.970 | <0.001 | <0.001 | 0.001 | 0.002 | 0.940 |
B(x)(s) = 2 sin(πs/5) | | 0.008 | | | | | 0.003 | | | |
 | <0.001 | <0.001 | 0.030 | 0.028 | 0.965 | 0.007 | <0.001 | 0.019 | 0.022 | 0.920 |
 | 0.052 | 0.017 | 0.130 | 0.121 | 0.970 | 0.005 | 0.009 | 0.081 | 0.092 | 0.950 |
For survival data | | | | | | | | | | |
γ1 = −1.75 | 0.077 | 0.060 | 0.219 | 0.233 | 0.930 | 0.012 | 0.027 | 0.152 | 0.163 | 0.940 |
α = 0.29 | 0.012 | <0.001 | 0.021 | 0.022 | 0.940 | 0.003 | <0.001 | 0.013 | 0.016 | 0.925 |
B(w)(s) = 1.2 sin(πs/4) | | 0.023 | | | | | 0.012 | | | |

Parameter | n=200, CR=0.5 | | | | | n=500, CR=0.5 | | | | |
---|---|---|---|---|---|---|---|---|---|---|
 | Bias | AMSE | SE | SD | CP | Bias | AMSE | SE | SD | CP |
For longitudinal data | | | | | | | | | | |
β1 = 0.78 | <0.001 | <0.001 | 0.002 | 0.002 | 0.930 | <0.001 | <0.001 | 0.001 | 0.001 | 0.985 |
B(x)(s) = 2 sin(πs/5) | | 0.007 | | | | | 0.003 | | | |
 | 0.002 | 0.001 | 0.030 | 0.030 | 0.955 | <0.001 | <0.001 | 0.019 | 0.017 | 0.960 |
 | 0.044 | 0.019 | 0.130 | 0.130 | 0.955 | 0.016 | 0.007 | 0.081 | 0.083 | 0.940 |
For survival data | | | | | | | | | | |
γ1 = −1.75 | 0.115 | 0.082 | 0.255 | 0.262 | 0.950 | 0.043 | 0.021 | 0.154 | 0.160 | 0.950 |
α = 0.29 | 0.019 | 0.001 | 0.024 | 0.025 | 0.920 | 0.008 | <0.001 | 0.015 | 0.015 | 0.915 |
B(w)(s) = 1.2 sin(πs/4) | | 0.031 | | | | | 0.010 | | | |
The coefficient function $B^{(x)}(s)$ is estimated via $\hat B^{(x)}(s) = \hat\phi^{(x)}(s)^\top \hat B^{(x)}$. For visualization purposes, each point in the 1D domain S, along with the value of the coefficient function at that point, is mapped back to the corresponding vertex on the hippocampal surfaces (Figure 2). Due to the difficulty of displaying a 3D object on paper, Figure 2 only displays two views (from top and bottom) of the left and right hippocampal surfaces. Panel (a) displays a schematic representation of the hippocampal subfields defined by Apostolova et al. [40] on the hippocampal surface template. Panel (b) displays the coefficient function $\hat B^{(x)}(s)$ of the functional predictor HRD in the longitudinal submodel. Blue colors denote negative values of $\hat B^{(x)}(s)$ in the regions. This suggests that a decrease of HRD (i.e., hippocampal atrophy) in the blue regions is associated with increasing ADAS-Cog 11 score and deteriorating cognitive functions. Most blue regions in Panel (b) are located in the CA1 subfield and subiculum (Sub) subfield displayed in Panel (a), suggesting that regional radial atrophy in these subfields may be a good predictor of AD progression among MCI patients. A similar point was made in the previous literature [40, 41].
5 Simulation Study
In this section, we conduct a simulation study with two settings to evaluate the proposed FJM. In Setting I, we include one functional predictor in both longitudinal and survival submodels, while in Setting II, we simulate a functional predictor and its coefficient function that are similar to those in Section 4.
In Setting I, we select I = 200 or 500 subjects and each subject has Ji=4 measurements at time 0, 40, 80, and 120. The longitudinal submodel is
where j = 1, …, Ji, , , ui1 ~ U(0, 5), ui2 ~ N(1, 0.2), and νis1, νis2 ~ N(0, 1/k²). The time-invariant functional predictor $g_i^{(x)}(s)$ is defined on a 1D domain S = [0, 10], and it is observed on a discrete grid at location s = m/10, where m = 0, …, 100. The observed functional predictor is $\tilde g_i^{(x)}(s) = g_i^{(x)}(s) + \varepsilon_i(s)$, where the measurement errors εi(s) ~ N(0, 0.1) across s. The coefficient function is B(x)(s) = 2 sin(πs/5). The survival submodel is
$$h_i(t) = h_0(t)\exp\left\{ \gamma_1 w_{i1} + \int_S g_i^{(w)}(s) B^{(w)}(s)\,ds + \alpha m_i(t) \right\},$$
where the baseline hazard function h0(t) = 0.02, w1 is simulated from a Bernoulli distribution with probability 0.5, and γ1 = −1.75. Functional predictor $g_i^{(w)}(s)$ and the observed functional predictor $\tilde g_i^{(w)}(s)$ are generated in a similar fashion as $g_i^{(x)}(s)$ and $\tilde g_i^{(x)}(s)$. The coefficient function is B(w)(s) = 1.2 sin(πs/4). Censoring times are independently simulated from a uniform distribution U(0, c), where c is chosen to achieve a desired censoring rate (CR) of 30% or 50%. Due to censoring, each subject has 2 to 3 repeated measurements. We perform FPCA on the simulated functional predictors $\tilde g_i^{(x)}(s)$ and $\tilde g_i^{(w)}(s)$, and choose the first 5 functional principal components, which explain 95% of the total variance in the original data.
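Choosing the upper bound c of the uniform censoring distribution can be illustrated in a simplified case. The sketch below assumes that event times follow an exponential distribution with a constant hazard `lam`, ignoring covariates and the longitudinal trajectory, so that the censoring probability has the closed form $P(C < T) = \{1 - \exp(-\lambda c)\}/(\lambda c)$. In the actual simulation the hazard varies across subjects and over time, so c would more likely be tuned by trial simulation; the function names are hypothetical.

```python
import math

def censoring_rate(c, lam):
    """P(C < T) for T ~ Exponential(rate lam) and C ~ Uniform(0, c)."""
    x = lam * c
    return -math.expm1(-x) / x

def solve_c(target, lam, lo=1e-8, hi=1e8, iters=200):
    """Bisection on the log scale; the censoring rate decreases monotonically in c."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if censoring_rate(mid, lam) > target:
            lo = mid          # too much censoring: widen the censoring window
        else:
            hi = mid
    return math.sqrt(lo * hi)

lam = 0.02                    # constant hazard, matching h0(t) = 0.02 with no covariate effects
c30 = solve_c(0.30, lam)
c50 = solve_c(0.50, lam)
```

A larger c spreads censoring times over a longer window, so fewer subjects are censored before their event; accordingly `c30` exceeds `c50`.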
We generate 200 simulated datasets (denoted by subscript r) for each combination of sample sizes (I = 200 or 500) and censoring rates (30% or 50%). Table 3 presents the average mean squared error (AMSE) for the coefficient functions and other parameters as $\mathrm{AMSE}\{\hat B(s)\} = \frac{1}{200}\sum_{r=1}^{200} \frac{1}{n_s}\sum_{m=1}^{n_s} \{\hat B_r(s_m) - B(s_m)\}^2$ over the $n_s$ grid points $s_m$ and $\mathrm{AMSE}(\hat\theta) = \frac{1}{200}\sum_{r=1}^{200} (\hat\theta_r - \theta)^2$, respectively, in addition to standard error (SE, the square root of the average of the variance), standard deviation (SD, the standard deviation of the MLEs), and coverage probabilities (CP) of 95% confidence intervals. Table 3 suggests that in Setting I, the proposed FJM performs reasonably well, with relatively small AMSE values for both coefficient functions and other parameters, SE being close to SD, and the confidence interval coverage probabilities being reasonably close to 95%.
Figure 3 displays the true coefficient functions B(x)(s) and B(w)(s) (red solid lines) and their estimated curves (black solid lines), along with the 95% pointwise confidence bands (shaded regions, constructed using Wald-type confidence intervals based on 1,000 bootstrap samples) and the 95% coverage probabilities. All panels suggest that the estimated coefficient functions from the FJM are reasonably close to the true coefficient functions, with the 95% pointwise confidence bands covering the true functions. The empirical coverage probabilities over the domain S are close to the nominal level of 95%, except at the rightmost tail of the coefficient function B(w)(s) in the survival submodel. This may be because B(w)(s) is not well expanded on the rightmost tail by the first few principal components calculated from $\tilde g_i^{(w)}(s)$. This limitation is further discussed in Section 6.
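The summary measures reported in Table 3 can be computed from the replicate estimates as sketched below. The helper name `summarize`, the use of Wald-type intervals for the coverage probability, and the four toy replicates are assumptions made for illustration.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, AMSE, SE, SD, and CP of a scalar parameter across simulated datasets."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    covered = np.abs(estimates - truth) <= z * std_errors    # Wald-type interval covers truth
    return {
        "bias": estimates.mean() - truth,
        "amse": np.mean((estimates - truth) ** 2),
        "se": np.sqrt(np.mean(std_errors ** 2)),             # root of the average variance
        "sd": estimates.std(ddof=1),                         # SD of the replicate estimates
        "cp": covered.mean(),
    }

# Toy example with four replicates of an estimate of alpha = 0.29.
out = summarize([0.28, 0.30, 0.29, 0.31], [0.02, 0.02, 0.02, 0.005], truth=0.29)

# For a coefficient function, squared errors are averaged over grid points and replicates.
s = np.linspace(0.0, 10.0, 101)
B_true = 2.0 * np.sin(np.pi * s / 5.0)
B_hat_curves = B_true + 0.1                                  # placeholder "estimates", 1 replicate
amse_fun = np.mean((B_hat_curves - B_true) ** 2)
```

Agreement between `se` and `sd` indicates that the model-based standard errors reflect the actual sampling variability of the estimator.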
In Setting II, we keep the same parameters as in Setting I, but simulate a functional predictor which is mapped to a 3D bean surface to resemble the hippocampal surfaces. We evaluate the performance of our FJM when the 3D functional predictor is converted to a 1D vector, as described in the Web Supplement. To do this conversion, we first construct the triangular surface meshes of the bean surface, and then conformally map the triangular surface meshes to a rectangle plane. We align the points on the rectangle plane to form a 1D vector, but still retain the one-to-one correspondence with points on the original bean surface.
To generate the 1D functional predictor, we use the first 25 orthonormal eigenfunctions ϕ(s) = [ϕ1(s), …, ϕ25(s)]⊤ and corresponding eigenvalues λ = [λ1, …, λ25]⊤ that are derived from the hippocampal radial distance in Section 4. The functional predictor for the ith subject gi(s) is simulated by $g_i(s) = \sum_{l=1}^{25} \xi_{il}\phi_l(s)$, for i = 1, …, 400, where ξil is the subject-specific principal component score generated from a Normal distribution as ξil ~ N(0, λl), where l = 1, …, 25. Thus, the simulated functional predictor gi(s) retains many features of the hippocampal radial distance. Next, we construct a two-dimensional (2D) coefficient function (coefficient image) on the rectangle plane to match the size of the bean surface using the densities of bivariate Normal distributions. Let
be density functions denoted by fP1 and fP2, respectively. The coefficient image is given by 0.8fP1 − 1.0fP2, and its 1D representation B(s) can be obtained by aligning points in the same order described above. After both the functional predictor and its coefficient function are transformed to the 1D domain, the longitudinal and survival data are generated via the same method as in Setting I. We select an approximate censoring rate of 30% and generate 200 simulated datasets. Our objective is to estimate the coefficient image via estimating its 1D representation B(s) and mapping it back to the 3D surface. Figure 4 displays the true and estimated coefficient functions mapped back to the bean surface. The close similarity of the two figures indicates that the estimated coefficient function captures the main features of the true coefficient function, suggesting the feasibility of our approach to transforming the high dimensional surface image to the 1D domain.
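Generating functional predictors from a set of eigenfunctions and eigenvalues can be sketched as follows. The eigenfunctions and eigenvalues used in the paper come from the ADNI hippocampal data and are not available here, so the sketch substitutes a randomly generated orthonormal basis and a decaying eigenvalue sequence; both are placeholders, as is the grid size.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_grid, K, n_subj = 300, 25, 400

# Placeholder orthonormal basis (rows of phi); in the paper these are the first 25
# eigenfunctions estimated from the hippocampal radial distance.
Q, _ = np.linalg.qr(rng.normal(size=(n_grid, K)))
phi = Q.T                                        # shape (K, n_grid), phi @ phi.T = I_K

# Placeholder decreasing eigenvalues lambda_1 >= ... >= lambda_25.
lam = 1.0 / np.arange(1, K + 1) ** 2

# Scores xi_il ~ N(0, lambda_l), independent across subjects and components.
xi = rng.normal(size=(n_subj, K)) * np.sqrt(lam)

# Simulated functional predictors g_i(s) = sum_l xi_il * phi_l(s).
g = xi @ phi                                     # shape (n_subj, n_grid)

# Projecting the curves back onto the basis recovers the scores.
recovered = g @ phi.T
```

Because the curves are built from the same eigenstructure, an FPCA of `g` would recover components with variances close to `lam`, which is what makes the simulated predictor resemble the original data.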
6 Discussion
Functional data are increasingly collected in public health and medical studies to better understand important public health issues and complex diseases. Both theoretical and computational complexity in functional data analysis (FDA) often makes health care practitioners to reduce the rich functional data into several scalar measures, e.g., volumes of a few brain regions. This enormous data reduction may distort the true underlying relationship between population’s health condition and the functional data. Moreover, some nontraditional functional data, such as genetic variant profiles defined along chromosomes or genomic regions, may not be able to reduce to scalar measures. To this end, FDA methods are increasingly used.
In this article, we develop a functional joint model (FJM) to account for functional predictors in both longitudinal and survival submodels within the framework of joint modeling. We use the functional principal component analysis (FPCA) to approximate the functional predictor, and expand its corresponding coefficient function using the empirical orthonormal eigenfunctions obtained from FPCA. FPC scores can be readily included as scalar covariates in a standard joint model consisting of a linear mixed model for the longitudinal scalar response variable and a Cox model for the survival outcome. The parameters of the FJM are estimated in a maximum likelihood framework via EM algorithm. The proposed FJM provides a flexible framework to incorporate many features both in joint modeling of longitudinal and survival data and in functional data analysis. We demonstrate through a simulation study that the FJM performs well in estimating the coefficient functions and other parameters.
Several studies have documented that diminished hippocampal thickness at baseline is associated with an increased likelihood of progressing to clinical dementia [42]. However, these studies assess only changes in hippocampal volume rather than its surface morphology. Substantial information loss may result from aggregating the high dimensional image data into a scalar volumetric value [43]. In comparison, our FJM framework accounts for the functional predictor (hippocampal radial distance, HRD) and other scalar covariates, and efficiently estimates the association between the trajectory of cognitive function measured by the ADAS-Cog 11 score and time to AD diagnosis. The inclusion of the functional predictor HRD in the longitudinal submodel improves model fit. We have identified regional radial atrophy in the CA1 and subiculum subfields as a good predictor of AD progression among patients with mild cognitive impairment. The identification of these subfields may facilitate case selection in clinical trials for evaluating therapeutic efficacy in slowing or modifying AD-related pathophysiology. Moreover, the proposed FJM can readily include multiple brain regions, and even genotype profiles, as functional predictors to assess whether they are associated with Alzheimer’s disease progression.
Beyond the convenience of implementation, the FPC expansion has several advantages. It allows borrowing strength across subjects in estimating the basis functions, and it can capture complex correlations within the functions, as in the ADNI study [44]. Moreover, FPCA can easily handle observations with missing values in the functions, or functions measured with error [11]. However, we retain only the first K eigenfunctions to approximate the functional predictor. Although the K selected eigenfunctions can explain the majority of the variability in the functional predictor, they may not adequately represent the coefficient function. As a result, the features in the tail region of the coefficient function are not well captured, as demonstrated in our simulation study. Furthermore, it is possible that some retained FPCs are not significantly associated with the outcome. On the other hand, in functional predictor regression there are various choices of basis functions, e.g., splines, Fourier, wavelets, and their combinations, as well as regularization approaches. Splines are well suited to modeling simple and smooth functions, and usually work better when the dimension of the grid is not too high [45]. However, no single basis function is superior in all settings. In future work, we will investigate the performance of other basis functions and new methods for selecting among various candidate basis functions and regularization approaches.
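The truncation issue can be made concrete with a small sketch under stated assumptions. The cosine functions below are hypothetical stand-ins for eigenfunctions ordered by explained variance, and `B_true` is an invented coefficient function, not the one used in our simulation study. The sketch shows that any component of the coefficient function lying outside the span of the first K eigenfunctions is lost entirely, no matter how much predictor variability those K functions explain.

```python
import numpy as np

m = 200
s = (np.arange(m) + 0.5) / m          # midpoint grid on [0, 1]
ds = 1.0 / m                          # grid spacing for Riemann sums

# Hypothetical stand-in eigenfunctions: sqrt(2) * cos(k * pi * s), k = 1, ..., 8.
# They are orthonormal on this grid with respect to sum(f * g) * ds.
basis = np.stack([np.sqrt(2) * np.cos(k * np.pi * s) for k in range(1, 9)], axis=1)

# Invented coefficient function with weight on the 1st and 6th basis functions.
B_true = 1.0 * basis[:, 0] + 0.5 * basis[:, 5]

def truncation_error(K):
    """L2 distance between B_true and its projection on the first K functions."""
    coef = basis[:, :K].T @ B_true * ds          # projection coefficients
    B_K = basis[:, :K] @ coef                    # truncated reconstruction
    return np.sqrt(np.sum((B_true - B_K) ** 2) * ds)

err_K3 = truncation_error(3)   # the 6th component is missed
err_K6 = truncation_error(6)   # both components are recovered
```

Here `err_K3` equals the weight of the omitted component (0.5) up to floating-point error, while `err_K6` is numerically zero. Choosing K by the proportion of predictor variance explained therefore gives no guarantee about how well the coefficient function itself is approximated.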
There are some limitations that we will address in the future. First, we exclude from the analysis subjects with missing data in the baseline covariates of interest. The majority of these subjects either did not have baseline image data measured or do not have image data in the archive because of technical difficulties in image data collection and storage. We have assumed that the baseline covariates are missing completely at random, and we would like to investigate the validity of this assumption in the future. Moreover, longitudinal MRI measurements are available for some ADNI participants. About 30% of these participants have at least one missed MRI measurement, creating a missing data issue in the longitudinal functional data. The complex within-subject correlation among functional data should be carefully modeled when handling missing data. In addition, we use a time-invariant functional predictor in this article. It would be of scientific interest to extend the proposed FJM to accommodate longitudinal functional data. We can treat the longitudinal functional variable as a functional predictor that is decomposed by longitudinal FPCA (LFPCA) [46] to account for its longitudinal data structure. Alternatively, the longitudinal functional variable can be treated as a functional response variable in the longitudinal submodel (a function-on-scalar regression problem) and incorporated in the survival submodel as a time-dependent functional predictor. Such a model could be used to investigate how the longitudinal functional variable directly affects the time to the event of interest. Furthermore, in model (2), different formulations can be used to postulate how the hazard of a survival event depends on the longitudinal trajectory. For example, both the unobserved true value mi(tij) as in model (1) and its time-dependent slope can be included in model (2). A good summary of these various formulations in the joint modeling framework can be found in Rizopoulos et al. [47] and Yang et al. [48]. Finally, we would like to develop a user-friendly R package that addresses the aforementioned issues and incorporates some useful features, e.g., nonparametric smoothing and missing data handling.
Acknowledgments
Sheng Luo’s research was supported in part by the National Institute of Neurological Disorders and Stroke under Award Numbers R01NS091307 and 5U01NS043127. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this article. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Contributor Information
Kan Li, Department of Biostatistics, The University of Texas Health Science Center at Houston.
Sheng Luo, Department of Biostatistics, The University of Texas Health Science Center at Houston, 1200 Pressler St, Houston, TX 77030, USA.
References
- 1. Ramsay JO, Dalzell CJ. Some tools for functional data analysis. Journal of the Royal Statistical Society Series B (Methodological) 1991;53(3):539–572.
- 2. Ramsay J, Silverman B. Functional Data Analysis. Springer; New York: 1997.
- 3. Marx BD, Eilers PH. Generalized linear regression on sampled signals and curves: a P-spline approach. Technometrics. 1999;41(1):1–13.
- 4. Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics & Probability Letters. 1999;45(1):11–22.
- 5. Cardot H, Ferraty F, Sarda P. Spline estimators for the functional linear model. Statistica Sinica. 2003;13(3):571–591.
- 6. James GM. Generalized linear models with functional predictors. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002;64(3):411–432.
- 7. Müller HG, Stadtmüller U. Generalized functional linear models. The Annals of Statistics. 2005;33(2):774–805.
- 8. Reiss PT, Ogden RT. Functional principal component regression and functional partial least squares. Journal of the American Statistical Association. 2007;102(479):984–996.
- 9. James GM, Wang J, Zhu J. Functional linear regression that’s interpretable. The Annals of Statistics. 2009;37(5A):2083–2108.
- 10. Goldsmith J, Bobb J, Crainiceanu CM, Caffo B, Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics. 2012;20(4):830–851. doi: 10.1198/jcgs.2010.10007.
- 11. Goldsmith J, Crainiceanu CM, Caffo B, Reich D. Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2012;61(3):453–469. doi: 10.1111/j.1467-9876.2011.01031.x.
- 12. Gertheiss J, Goldsmith J, Crainiceanu C, Greven S. Longitudinal scalar-on-functions regression with application to tractography data. Biostatistics. 2013;14(3):447–461. doi: 10.1093/biostatistics/kxs051.
- 13. Gellar JE, Colantuoni E, Needham DM, Crainiceanu CM. Cox regression models with functional covariates for survival data. Statistical Modelling. 2015;15(3):256–278. doi: 10.1177/1471082X14565526.
- 14. Lee E, Zhu H, Kong D, Wang Y, Giovanello KS, Ibrahim JG, et al. BFLCRM: A Bayesian functional linear Cox regression model for predicting time to conversion to Alzheimer’s disease. The Annals of Applied Statistics. 2015;9(4):2153–2178. doi: 10.1214/15-AOAS879.
- 15. Faucett CL, Thomas DC. Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine. 1996;15(15):1663–1685. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1.
- 16. Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53(1):330–339.
- 17. Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14(3):809–834.
- 18. Proust-Lima C, Séne M, Taylor JM, Jacqmin-Gadda H. Joint latent class models for longitudinal and time-to-event data: A review. Statistical Methods in Medical Research. 2014;23(1):74–90. doi: 10.1177/0962280212445839.
- 19. Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Cedarbaum J, et al. 2014 Update of the Alzheimer’s Disease Neuroimaging Initiative: A review of papers published since its inception. Alzheimer’s & Dementia. 2015;11(6):e1–e120. doi: 10.1016/j.jalz.2014.11.001.
- 20. Cleveland WS. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association. 1979;74(368):829–836.
- 21. Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis. Journal of the Royal Statistical Society Series C (Applied Statistics) 1994;43(1):49–93.
- 22. Lo RY, Hubbard AE, Shaw LM, Trojanowski JQ, Petersen RC, Aisen PS, et al. Longitudinal change of biomarkers in cognitive decline. Archives of Neurology. 2011;68(10):1257–1266. doi: 10.1001/archneurol.2011.123.
- 23. Zhang D, Shen D, Initiative ADN, et al. Predicting future clinical changes of MCI patients using longitudinal and multimodal biomarkers. PLOS ONE. 2012;7(3):e33182. doi: 10.1371/journal.pone.0033182.
- 24. Du AT. Magnetic resonance imaging of the entorhinal cortex and hippocampus in mild cognitive impairment and Alzheimer’s disease. Journal of Neurology, Neurosurgery & Psychiatry. 2001;71(4):441–447. doi: 10.1136/jnnp.71.4.441.
- 25. Apostolova LG, Mosconi L, Thompson PM, Green AE, Hwang KS, Ramirez A, et al. Subregional hippocampal atrophy predicts Alzheimer’s dementia in the cognitively normal. Neurobiology of Aging. 2010;31(7):1077–1088. doi: 10.1016/j.neurobiolaging.2008.08.008.
- 26. Qiu A, Fennema-Notestine C, Dale AM, Miller MI, Initiative ADN, et al. Regional shape abnormalities in mild cognitive impairment and Alzheimer’s disease. NeuroImage. 2009;45(3):656–661. doi: 10.1016/j.neuroimage.2009.01.013.
- 27. Thompson PM, Hayashi KM, de Zubicaray GI, Janke AL, Rose SE, Semple J, et al. Mapping hippocampal and ventricular change in Alzheimer disease. NeuroImage. 2004;22(4):1754–1766. doi: 10.1016/j.neuroimage.2004.03.040.
- 28. Yushkevich PA. Continuous medial representation of brain structures using the biharmonic PDE. NeuroImage. 2009;45(1):S99–S110. doi: 10.1016/j.neuroimage.2008.10.051.
- 29. Shi J, Thompson PM, Gutman B, Wang Y, Initiative ADN, et al. Surface fluid registration of conformal representation: Application to detect disease burden and genetic influence on hippocampus. NeuroImage. 2013;78:111–134. doi: 10.1016/j.neuroimage.2013.04.018.
- 30. Staniswalis JG, Lee JJ. Nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association. 1998;93(444):1403–1418.
- 31. Yao F, Müller HG, Clifford AJ, Dueker SR, Follett J, Lin Y, et al. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics. 2003;59(3):676–685. doi: 10.1111/1541-0420.00078.
- 32. Crainiceanu CM, Staicu AM, Ray S, Punjabi N. Bootstrap-based inference on the difference in the means of two correlated functional processes. Statistics in Medicine. 2012;31(26):3223–3240. doi: 10.1002/sim.5439.
- 33. Crainiceanu CM, Reiss PT, Goldsmith J, Huang L, Huo L, Scheipl F. refund: Regression with functional data. 2013.
- 34. Peng J, Paul D. A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data. Journal of Computational and Graphical Statistics. 2009;18(4):995–1015.
- 35. Rizopoulos D, et al. JM: An R package for the joint modelling of longitudinal and time-to-event data. Journal of Statistical Software. 2010;35(9):1–33.
- 36. Risacher SL, Saykin AJ, Wes JD, Shen L, Firpi HA, McDonald BC. Baseline MRI predictors of conversion from MCI to probable AD in the ADNI cohort. Current Alzheimer Research. 2009;6(4):347–361. doi: 10.2174/156720509788929273.
- 37. Cui Y, Liu B, Luo S, Zhen X, Fan M, Liu T, et al. Identification of conversion from mild cognitive impairment to Alzheimer’s disease using multivariate predictors. PLOS ONE. 2011;6(7):e21896. doi: 10.1371/journal.pone.0021896.
- 38. Li S, Okonkwo O, Albert M, Wang MC. Variation in variables that predict progression from MCI to AD dementia over duration of follow-up. American Journal of Alzheimer’s Disease. 2013;2(1):12–28. doi: 10.7726/ajad.2013.1002.
- 39. Corder E, Saunders A, Strittmatter W, Schmechel D, Gaskell P, Small G, et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science. 1993;261(5123):921–923. doi: 10.1126/science.8346443.
- 40. Apostolova LG, Dutton RA, Dinov ID, Hayashi KM, Toga AW, Cummings JL, et al. Conversion of mild cognitive impairment to Alzheimer disease predicted by hippocampal atrophy maps. Archives of Neurology. 2006;63(5):693–699. doi: 10.1001/archneur.63.5.693.
- 41. Frankó E, Joly O, Alzheimer’s Disease Neuroimaging Initiative et al. Evaluating Alzheimer’s disease progression using rate of regional hippocampal atrophy. PLOS ONE. 2013;8(8):e71354. doi: 10.1371/journal.pone.0071354.
- 42. Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Green RC, et al. The Alzheimer’s Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimer’s & Dementia. 2013;9(5):e111–e194. doi: 10.1016/j.jalz.2013.05.1769.
- 43. Wang Y, Song Y, Rajagopalan P, An T, Liu K, Chou YY, et al. Surface-based TBM boosts power to detect disease effects on the brain: an N= 804 ADNI study. NeuroImage. 2011;56(4):1993–2010. doi: 10.1016/j.neuroimage.2011.03.040.
- 44. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100(470):577–590.
- 45. Morris JS. Functional regression. Annual Review of Statistics and Its Application. 2015;2(1):321–359.
- 46. Greven S, Crainiceanu C, Caffo B, Reich D. Longitudinal functional principal component analysis. Electronic Journal of Statistics. 2010;4:1022–1054. doi: 10.1214/10-EJS575.
- 47. Rizopoulos D, Hatfield LA, Carlin BP, Takkenberg JJ. Combining dynamic predictions from joint models for longitudinal and time-to-event data using Bayesian model averaging. Journal of the American Statistical Association. 2014;109(508):1385–1397.
- 48. Yang L, Yu M, Gao S. Prediction of coronary artery disease risk based on multiple longitudinal biomarkers. Statistics in Medicine. 2016;35(8):1299–1314. doi: 10.1002/sim.6754.