Author manuscript; available in PMC: 2020 Jan 1.
Published in final edited form as: Comput Stat Data Anal. 2018 Aug 16;129:14–29. doi: 10.1016/j.csda.2018.07.015

Bayesian Functional Joint Models for Multivariate Longitudinal and Time-to-Event Data

Kan Li a, Sheng Luo b,*
PMCID: PMC6294314  NIHMSID: NIHMS1504053  PMID: 30559575

Abstract

A multivariate functional joint model framework is proposed which enables the repeatedly measured functional outcomes, scalar outcomes, and survival process to be modeled simultaneously while accounting for association among the multiple (functional and scalar) longitudinal and survival processes. This data structure is increasingly common across medical studies of neurodegenerative diseases and is exemplified by the motivating Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, in which serial brain imaging, clinical and neuropsychological assessments are collected to measure the progression of Alzheimer’s disease (AD). The proposed functional joint model consists of a longitudinal function-on-scalar submodel, a regular longitudinal submodel, and a survival submodel which allows time-dependent functional and scalar covariates. A Bayesian approach is adopted for parameter estimation and a dynamic prediction framework is introduced for predicting the subjects’ future health outcomes and risk of AD conversion. The proposed model is evaluated by a simulation study and is applied to the motivating ADNI study.

Keywords: Longitudinal functional data, Joint modeling, Dynamic prediction, Alzheimer’s disease

1. Introduction

The growing public health threat posed by Alzheimer’s disease (AD) has raised the urgency to discover and assess markers for the early detection of the disease. In this regard, a great deal of effort has been dedicated to building models for predicting AD based on a single marker, or a combination of multiple markers, which captures the heterogeneity among subjects and detects the disease progression of subjects at risk [1]. Since mild cognitive impairment (MCI) is often considered a transitional stage to AD, MCI patients are usually enrolled as the target population for early prognosis and for evaluating interventions [2]. Existing research has identified a number of biomarkers for predicting an individual’s likelihood of converting to AD, as well as differences in biomarker values among MCI and AD individuals [3, 4]. It is widely acknowledged that magnetic resonance imaging (MRI) based measures of atrophy in key brain regions, such as the hippocampus, are predictive of progression from MCI to AD [5, 6]. Although most current studies measure regional atrophy using a single volume-based value, some researchers [7, 8] demonstrated that surface-based morphology analysis offers more advantages because it studies patterns of subregion atrophy and produces detailed pointwise correlation between atrophy and cognitive decline. Li and Luo [9] proposed a functional joint model (FJM) that incorporates a surface-based hippocampus measure as a functional predictor in the joint modeling framework for longitudinal and survival data. They developed a dynamic prediction method and demonstrated that using such a functional predictor, in addition to other scalar markers, improves predictive performance for the progression of MCI to AD [10]. However, the proposed FJM only accommodates a baseline imaging marker as a time-invariant functional predictor.
Since the imaging markers (e.g., hippocampus) from MRI, along with other neurocognitive markers, are often collected repeatedly in the studies of AD, it is of scientific interest to investigate the combined predictive performance of these repeatedly measured functional and scalar outcomes.

Several methods for the analysis of repeatedly measured functional outcomes exist in the literature. One category of methods is based on functional principal component analysis (FPCA) and its extensions: multilevel FPCA by Di et al. [11] and longitudinal FPCA by Greven et al. [12] and by Park et al. [13]. These methods model subject-specific deviations from a population mean by using low-dimensional basis functions estimated from the empirical covariance matrix. However, they lack the flexibility to estimate the effect of covariates (e.g., age) on the functional outcome. Brumback and Rice [14] and Guo [15] proposed a function-on-scalar mixed effect model in which population level effects and individual level deviations were modeled by using penalized splines. Wavelet-based Bayesian functional mixed models were presented in Morris and Carroll [16], which used a discrete wavelet transform of the observed functional data and modeled coefficients in the wavelet domain. Goldsmith and Kitago [17] developed a Bayesian framework for penalized spline function-on-scalar regression, allowing the joint modeling of population level fixed effects, individual level random effects, and residual functions. However, these works focused on statistical inference for longitudinal functional data without considering the survival process, and they were not designed for prediction.

Joint modeling is an appropriate framework for analyzing longitudinal data and time-to-event data since it has the potential to reduce bias in parameter estimates, account for dropout in longitudinal studies, and enable the inclusion of longitudinal covariates (both scalar and functional) measured with error in time-to-event models [18, 19]. Multivariate joint models have been well studied for multivariate continuous, binary, and ordinal outcomes, as well as mixtures of different outcome types. Hickey et al. [20] gave an excellent review of multivariate joint modeling research. However, no previous study has investigated how to incorporate a longitudinal functional (high-dimensional) outcome in a multivariate joint model framework. To this end, we propose a novel joint model that incorporates the growing volume of repeatedly measured functional outcomes in the longitudinal-survival setting. Specifically, we develop a multivariate functional joint model (MFJM) that can simultaneously analyze a longitudinal functional outcome, a longitudinal scalar outcome, and a survival outcome. The principle of the MFJM is to define three types of submodels: (1) a functional mixed effect submodel for the longitudinal functional outcome, (2) a regular mixed effects submodel, or multiple regular mixed effects submodels, to describe the evolution of the longitudinal scalar outcome(s), and (3) a Cox submodel for the survival outcome, which is linked with (1) and (2) through a common latent structure. The MFJM is flexible enough to account for the correlation between repeated measures and the correlation among multiple outcomes. We estimate the coefficient functions in the functional regression using a penalized spline approach, and all parameters are jointly estimated in a Bayesian framework.

Compared with the existing literature, we make two major contributions to both multivariate joint modeling and functional data analysis: 1) We propose a multivariate joint model considering both longitudinal functional and scalar outcomes. To the best of our knowledge, this paper is the first to model repeatedly measured functional outcomes, scalar outcomes, and a survival process simultaneously while accounting for the associations among the processes. 2) We propose a dynamic prediction framework that provides accurate personalized predictions of disease risk and progression. We investigate the potential of the longitudinal functional outcome to improve the prediction of AD progression. Previous studies involving functional data mainly focused on model inference rather than prediction of risk and longitudinal outcome trajectories. These predictive tools can provide valuable information for monitoring each patient’s disease progression and for making early decisions about targeted prevention and treatment selection.

The rest of the article is organized as follows. In Section 2, we describe the motivating Alzheimer’s Disease Neuroimaging Initiative (ADNI) study and the data structure. In Section 3, we discuss the multivariate functional joint model, Bayesian inference procedure, and dynamic prediction framework. In Section 4, we apply the proposed method to the motivating ADNI study. In Section 5, we conduct a simulation study to assess the performance of the method. Concluding remarks and discussion are presented in Section 6.

2. A Motivating Clinical Study

The methodology development is motivated by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. The primary goal of the study is to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), cerebrospinal fluid (CSF) markers, and neuropsychological assessments can be combined to measure the progression of AD. Phase one of the ADNI study (ADNI-1) recruited more than 800 adults, comprising about 200 cognitively normal individuals, 400 mild cognitive impairment (MCI) patients, and 200 early AD patients. Participants were reassessed at 6, 12, 18, 24 and 36 months, and additional follow-ups were conducted annually as part of ADNI-2. At each visit, various neuropsychological assessments, brain images, and clinical measures were collected. Detailed information about the ADNI study procedures, including participant inclusion and exclusion criteria and the complete study protocol, can be found at http://www.adni-info.org.

MCI is commonly considered a transitional stage between normal cognition and Alzheimer’s disease and is used as the target population for evaluating prognosis and early treatment. To this end, our analysis focuses on 355 MCI patients in the ADNI-1 study without missing data in the covariates of interest, and we consider time from baseline to AD diagnosis among MCI patients to be the survival event of interest. In the ADNI-1 study, the 355 MCI patients were followed up for a mean of 3.2 years (SD 2.6; range 0.4-9.3) before AD diagnosis or censoring. Among them, 180 patients were diagnosed with AD (survival event) and 175 had stable MCI over a mean follow-up period of 2.3 years and 4.2 years, respectively.

Longitudinal AD Assessment Scale-Cognitive (ADAS-Cog) score and hippocampal volume (HV) were reported to be the strongest predictors of AD progression in the cognitive and imaging domains, respectively [21]. However, when the high-dimensional MRI data are aggregated to a single volume measure such as HV, a great deal of information is lost [8], whereas the more recent surface-based morphology analysis (based on the longitudinal changes of cortical thickness in thousands of vertices) provides crucial disease progression information for early detection of AD [22]. In this study, we adopt the surface-based analysis of imaging data, which retains more information about hippocampus morphology. In the surface-based analysis, the hippocampus is represented as a surface model, which is a mesh of triangles. Each triangle is known as a face, and the place where the corners of the triangles meet is called a vertex. The coordinate of each vertex is determined during image processing and allows one to compute many morphometric measures, e.g., hippocampal radial distance (HRD). Figure 1 illustrates the longitudinal profile of the surface-based hippocampus images (mapped on a three-dimensional hippocampus template) of one MCI patient at different visits. The colors represent the HRD, which measures the distance from the medial core to each point on the surface (referred to as a vertex) and reflects the hippocampal cortical thickness. As AD progresses and the hippocampus atrophies, the radial distance of some subfields shrinks. It has been shown that the baseline vertex-based HRD, used as a functional predictor, is predictive of the time of MCI-to-AD conversion [10]. In this paper, we propose a Bayesian personalized prediction model based on a multivariate functional joint model (MFJM) of the longitudinal ADAS-Cog 11 score as a scalar predictor, the longitudinal vertex-based HRD as a functional predictor, and the time to AD diagnosis.

Figure 1. The longitudinal profile of surface-based hippocampal images of one MCI patient: hippocampal radial distances are denoted by colors.

The image processing procedure is detailed in the Web Supplement. We first extract the hippocampal surfaces (left and right) from the original MRI scans (Step 1 in Web Figure 1) using FIRST [23], an integrated surface analysis tool developed as part of the FSL library [24]. The surfaces are then conformally mapped to a two-dimensional (2D) rectangular plane, in the form of a matrix, to form two feature images (Step 2). We then register each feature image (patient and visit) to a common template and calculate the hippocampal radial distance (HRD) of each vertex to the predefined medial core, which represents the hippocampal cortical thickness (Step 3). These steps account for the spatial information and image smoothing. Then the HRD values on the 2D image matrices are aggregated over the y-axis of the image into a one-dimensional (1D) image vector such that the corresponding HRDs of the vertices are represented as 1D functional data (denoted by yi(s, tij) for subject i at visit j) defined on domain S (Step 4). Each point in the image vector domain (i.e., S) corresponds to a coordinate on the x-axis of the 2D hippocampal image matrix. It was revealed that left hippocampus atrophy was associated with delayed verbal memory [25], and delayed verbal memory was one of the important predictors for determining whether a subject was an MCI converter or not [26]. Thus, our analysis focuses on the surface morphology and HRD of the left hippocampus.

3. Methods

3.1. Multivariate functional joint model (MFJM) framework

In the context of clinical trials and observational studies, for each subject i (i = 1, ⋯ , I) at visit j ( j = 1, ⋯ ,Ji) and on a 1D domain s ∈ [0, Smax] = S, we observe data {yij, yij(s), xij}, where yij=yi(tij) is a scalar response observed at time tij from the study onset, yij(s) = yi(s, tij) is a functional response curve observed at time tij on domain S, and xij = [xij1, ⋯ , xijP] is a P-dimensional scalar covariates vector. The domain of the functional response S is not the same as the time domain t, over which the survival event is followed. We propose a longitudinal submodel to describe the evolution of the scalar outcome over time. The model is represented as

$$y_i(t_{ij}) = m_i(t_{ij}) + \varepsilon_{ij}, \qquad m_i(t_{ij}) = \beta_0 + t_{ij}\beta_t + \sum_{p=1}^{P} x_{ijp}\beta_p + V_R(t_{ij})\zeta + b_i, \tag{1}$$

where mi(tij) is the unobserved true value of the scalar longitudinal outcome at time tij, β0 is the intercept, βt is the change of the scalar outcome over time, and the βp’s are the regression coefficients. To allow additional flexibility and smoothness in modeling the effects of some covariates, we adopt a smooth time function $V_R(t)\zeta = \sum_{r=1}^{R} \zeta_r (t - \kappa_r)_+$ using the truncated power series spline basis expansion $V_R(t) = \{(t - \kappa_1)_+, \cdots, (t - \kappa_R)_+\}$, where ζ = [ζ1, ⋯ , ζR]T are the spline coefficients, κ = {κ1, ⋯ , κR} are the knots, and $(t - \kappa_r)_+ = t - \kappa_r$ if t > κr and 0 otherwise. We consider a sufficiently large number of knots to ensure the desired flexibility, and we select the knot locations so that there are sufficient subjects between adjacent knots. The choice of knots is important for obtaining a well-fitted model, and penalizing the spline coefficients to constrain their influence helps to avoid overfitting [27]. The random intercepts bi are independent and identically distributed (iid), and the measurement errors $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$ are independent of bi. The inclusion of covariate-specific random effects, such as a random slope, is a direct extension of model (1).
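To make the truncated power basis concrete, the following minimal Python sketch evaluates V_R(t) and the smooth time effect V_R(t)ζ for a single time point. The function names, the knot locations, and the coefficient values are hypothetical and chosen only for illustration; they are not taken from the ADNI analysis.

```python
def truncated_power_basis(t, knots):
    """Return V_R(t) = [(t - kappa_1)_+, ..., (t - kappa_R)_+]."""
    return [max(t - kappa, 0.0) for kappa in knots]


def smooth_time_effect(t, knots, zeta):
    """Return V_R(t) zeta = sum_r zeta_r * (t - kappa_r)_+."""
    basis = truncated_power_basis(t, knots)
    return sum(z * v for z, v in zip(zeta, basis))


# Hypothetical knots (in years) and spline coefficients, for illustration only.
knots = [0.5, 1.0, 2.0]
zeta = [0.2, -0.1, 0.3]

basis_at_1_5 = truncated_power_basis(1.5, knots)
effect_at_1_5 = smooth_time_effect(1.5, knots, zeta)
print(basis_at_1_5, effect_at_1_5)
```

Each basis function is zero until its knot is passed and linear afterwards, so the fitted time trend is piecewise linear with slope changes at the knots.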

We assume the functional response yi(s, tij) is linear in time, which is a direct extension of the linear assumption in the scalar model. The longitudinal functional submodel is defined as

$$y_i(s,t_{ij}) = m_i(s,t_{ij}) + \epsilon_{ij}(s), \qquad m_i(s,t_{ij}) = B_0(s) + t_{ij}B_t(s) + \sum_{p=1}^{P} x_{ijp}B_p(s) + b_i(s), \tag{2}$$

in which mi(s, tij) is the unobserved true value of the longitudinal functional outcome at time tij over domain S, B0(s) is the overall mean function, and Bt(s) and the Bp(s)’s are fixed effect coefficient functions corresponding to time t and the scalar covariates xij. The random intercept function bi(s) for subject i represents the subject-specific effect, and ϵij(s) is a white noise error process with covariance $\mathrm{Cov}\{\epsilon_{ij}(s), \epsilon_{ij}(s')\} = \sigma_\epsilon^2$ if s = s′ and 0 otherwise. We assume that the random functions bi(s) are iid and that the error processes ϵij(s) are iid and independent of bi(s). For identifiability we require that bi(s) comprises solely the random deviation that is specific to the subject; any repeated visit-specific deviation is viewed as part of ϵij(s). As in traditional mixed models, the inclusion of “random slope functions” would allow subject-specific impacts of changing covariate levels and should be considered in future applications.

The event history is recorded for each subject i with observed event time $T_i = \min(T_i^*, C_i)$ and the event indicator $\delta_i = I(T_i^* \le C_i)$, where $T_i^*$ and $C_i$ are the true event time and censoring time, respectively. The survival submodel is

$$h_i(t) = h_0(t)\exp\Big\{w_i^\top\gamma + \alpha^* m_i(t) + \int_S \alpha(s)\, m_i(s,t)\,ds\Big\}, \tag{3}$$

where h0(t) is the baseline hazard function, and wi is a vector of time-independent covariates with regression coefficient vector γ. The association parameter α* quantifies the strength of association between the unobserved true scalar longitudinal value mi(t) and the event hazard at the same time point t, and the association function α(s) quantifies the association between the unobserved true longitudinal function mi(s, t) and the event hazard at that time point. In this paper, we assume a constant functional parameter α(s) ≡ α for identifiability, and we discuss the case in which α(s) varies over the domain S. In Model (3) we implicitly assume that the risk of an event at time t depends on the unobserved true values of the longitudinal outcomes at the same time point. However, other functional forms for the association structure, such as time-dependent slopes or cumulative effects of mi(t) and mi(s, t), can also be included in the survival submodel. Models (1), (2), and (3) constitute the multivariate functional joint model (MFJM) framework.

To model the longitudinal functional data, we adopt the functional mixed effect model and expand the random intercept function bi(s) using a functional principal component analysis (FPCA) approach. FPCA is a dimensionality reduction tool for functional data that yields a low-dimensional projection basis (eigenfunctions) and simplifies the analysis. Specifically, we first express the random intercept function bi(s) in model (2) using the Karhunen-Loève decomposition. The spectral decomposition of the covariance function of the bi(s)’s is given by $\Sigma^{(b)}(s, s') = \sum_{k=1}^{\infty} \lambda_k \phi_k(s)\phi_k(s')$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are non-increasing eigenvalues and the ϕk(s)’s are the corresponding eigenfunctions. The Karhunen-Loève expansion of bi(s) is $b_i(s) = \sum_{k=1}^{\infty} \xi_{ik}\phi_k(s)$, where the functional principal component (FPC) scores ξik are uncorrelated random variables with mean zero and variance λk. In practice, we adopt a truncated approximation for bi(s) given by $b_i(s) \approx \sum_{k=1}^{K_\phi} \xi_{ik}\phi_k(s)$. Thus yi(s, tij) is written as

$$y_i(s,t_{ij}) \approx B_0(s) + t_{ij}B_t(s) + \sum_{p=1}^{P} x_{ijp}B_p(s) + \sum_{k=1}^{K_\phi} \xi_{ik}\phi_k(s) + \epsilon_{ij}(s).$$

The number of eigenfunctions Kϕ for the random intercept function is a pre-specified constant. A sufficiently large value should be chosen for Kϕ to capture the variation in the random functions, and sensitivity to this choice should be assessed. However, selecting more eigenfunctions than necessary increases the computing burden [17]. For computing efficiency, we assume the correlation between yi(tij) and yi(s, tij) is manifested by the correlation between bi and the first element of $\xi_i = [\xi_{i1}, \cdots, \xi_{iK_\phi}]$, and $\mathbf{b}_i = [b_i, \xi_i]^\top \sim MVN(0, \Sigma)$, where

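$$\Sigma = \begin{bmatrix} \sigma_b^2 & \rho\sigma_b\sqrt{\lambda_1} & \cdots & 0 \\ \rho\sigma_b\sqrt{\lambda_1} & \lambda_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{K_\phi} \end{bmatrix}.$$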

We may also estimate the correlation between the scalar random effects and all FPC components via a covariance matrix whose off-diagonal elements are nonzero. Such a covariance matrix may provide a full representation of the correlation between the mixed outcomes. However, because the computational burden increases dramatically as the covariance matrix becomes more complex, we have to consider the trade-off among modeling flexibility, estimation accuracy, and computation.
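As an illustration of the sparse covariance structure, the sketch below assembles Σ for the stacked random effects [b_i, ξ_i1, ..., ξ_iKϕ], with a single nonzero off-diagonal entry linking b_i to the first FPC score. It assumes that this covariance is ρ σ_b √λ1, so that ρ is a correlation; the helper name and all numerical values are hypothetical.

```python
import math


def build_sigma(sigma_b, lambdas, rho):
    """Covariance of [b_i, xi_i1, ..., xi_iK]: only b_i and xi_i1 are correlated."""
    size = len(lambdas) + 1
    sigma = [[0.0] * size for _ in range(size)]
    sigma[0][0] = sigma_b ** 2
    for k, lam in enumerate(lambdas, start=1):
        sigma[k][k] = lam  # variance of the k-th FPC score is its eigenvalue
    cov = rho * sigma_b * math.sqrt(lambdas[0])
    sigma[0][1] = cov
    sigma[1][0] = cov
    return sigma


# Hypothetical values: sd of the random intercept, eigenvalues, and correlation.
sigma = build_sigma(sigma_b=2.0, lambdas=[4.0, 1.0, 0.25], rho=0.5)

# The leading 2 x 2 block has determinant sigma_b^2 * lambda_1 * (1 - rho^2),
# which is positive whenever |rho| < 1, so the matrix stays positive definite.
leading_det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
print(sigma[0], leading_det)
```

Only one correlation parameter has to be sampled under this structure, which is the computational saving described above; a full covariance matrix would add a parameter for every pair of random effects.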

In practice, functional outcomes are not truly functions but are observed on a finite grid of length M that covers the domain S, i.e., {s1, ⋯ , sM}. Let B(s) be the M × (P + 2) matrix with columns containing B0(s), Bt(s), and the Bp(s)’s evaluated on the finite grid, and let ϕ(s) be an M × Kϕ matrix with columns containing the eigenfunctions ϕk(s). We express the coefficient function and the eigenfunction in each column of B(s) and ϕ(s) in terms of a known cubic B-spline basis with equally spaced knots, which leads to an M × Kψ matrix $\psi(s) = [\psi_1(s), \cdots, \psi_{K_\psi}(s)]$ with basis functions as columns. For example, the coefficient function $B_0(s) = \sum_{l=1}^{K_\psi} B_{0l}\psi_l(s) = [B_0\psi(s)^\top]^\top$ and the eigenfunction $\phi_k(s) = \sum_{l=1}^{K_\psi} B_{\phi_k l}\psi_l(s) = [B_{\phi_k}\psi(s)^\top]^\top$, where $B_0$ and $B_{\phi_k}$ are row vectors of the spline coefficients $B_{0l}$ and $B_{\phi_k l}$, respectively. Other spline bases could be used, but the parameters of the B-spline have good mixing properties in the context of Bayesian posterior simulation [28], and the B-spline is widely used in the functional data analysis literature for its flexibility [29]. Let $B_t$ and the $B_p$’s have the same meaning as $B_0$; then $B = [B_0^\top, B_t^\top, B_1^\top, \cdots, B_P^\top]^\top$ and $B_\phi = [B_{\phi_1}^\top, \cdots, B_{\phi_{K_\phi}}^\top]^\top$ denote (P + 2) × Kψ and Kϕ × Kψ matrices, respectively, whose rows are the spline coefficients for B(s) and ϕ(s). Therefore, the coefficient functions are written as $B(s) = [B\psi(s)^\top]^\top$, the eigenfunctions are $\phi(s) = [B_\phi\psi(s)^\top]^\top$, and the random intercept function is $b_i(s) = \xi_i\,\phi(s)^\top = \xi_i B_\phi \psi(s)^\top$, with ξi being the row vector of FPC scores for the random intercept function. For the number of B-spline basis functions Kψ, we refer to Ruppert [30] and choose it large enough (e.g., 10) to capture the complexity in the coefficient functions. We adopt a penalization technique to prevent overfitting and induce smoothness in the resulting coefficient functions. Thus the functional longitudinal submodel is rewritten as

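$$y_i(s,t_{ij}) \approx m_i(s,t_{ij}) + \epsilon_{ij}(s), \quad \text{where} \quad m_i(s,t_{ij}) \approx B_0\psi(s)^\top + t_{ij}B_t\psi(s)^\top + \sum_{p=1}^{P} x_{ijp}B_p\psi(s)^\top + \xi_i B_\phi\psi(s)^\top. \tag{4}$$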

We assume a constant association function α(s) ≡ α, and thus the survival submodel is

$$h_i(t) = h_0(t)\exp\Big\{w_i^\top\gamma + \alpha^* m_i(t) + \alpha\int_S m_i(s,t)\,ds\Big\}. \tag{5}$$

The integral in the survival submodel is computed by numeric integration.
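The source does not state which quadrature rule is used. As one possible choice, the sketch below approximates the integral of m_i(s, t) over S with the trapezoidal rule on the observation grid and multiplies it by α. The grid, the curve values, and the value of α are hypothetical.

```python
def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of a curve sampled on a grid."""
    total = 0.0
    for k in range(1, len(grid)):
        total += 0.5 * (values[k] + values[k - 1]) * (grid[k] - grid[k - 1])
    return total


def functional_term(alpha, m_curve, grid):
    """Contribution alpha * integral_S m_i(s, t) ds to the log hazard."""
    return alpha * trapezoid(m_curve, grid)


# Hypothetical grid on S = [0, 2] and a linear curve m(s) = 2s + 1 at a fixed time t.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
m_curve = [2.0 * s + 1.0 for s in grid]

integral = trapezoid(m_curve, grid)          # exact for a linear curve
term = functional_term(-0.5, m_curve, grid)  # alpha = -0.5 is an arbitrary example
print(integral, term)
```

Because the curve is only available on the grid {s1, ..., sM}, a grid-based rule of this kind needs no extra function evaluations; a finer rule would require interpolating the spline representation.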

Let $\theta = [\beta, \zeta, B, (\xi_i : i = 1, \cdots, I), (\lambda_k : k = 1, \cdots, K_\phi), \rho, B_\phi, \gamma, \alpha^*, \alpha, \sigma_b^2, \sigma_\varepsilon^2, \sigma_\epsilon^2, \theta_{h_0}]$ be the unknown parameter vector to be estimated, where β = [β0, βt, β1, ⋯ , βP]T, and the vector $\theta_{h_0}$ denotes the parameters in the baseline hazard function h0(·). The observed data yi(tij), yi(s, tij), xij, tij, wi for i = 1, ⋯ , I and j = 1, ⋯ , Ji are known, as are the cubic B-spline basis functions ψ(s), which can be generated using the bs function in the R package splines.
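For readers working outside R, the following sketch builds a cubic B-spline evaluation matrix ψ(s) with the Cox-de Boor recursion in plain Python. It is not the bs implementation; the clamped knot layout, the grid size M = 21, the choice Kψ = 10, and all function names are assumptions made for illustration.

```python
def clamped_knots(n_basis, degree, lower, upper):
    """Equally spaced interior knots, boundary knots repeated degree + 1 times."""
    n_interior = n_basis - degree - 1
    step = (upper - lower) / (n_interior + 1)
    interior = [lower + step * (i + 1) for i in range(n_interior)]
    return [lower] * (degree + 1) + interior + [upper] * (degree + 1)


def bspline_row(x, knots, degree):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    n_intervals = len(knots) - 1
    if x == knots[-1]:
        # Close the last interval on the right so the basis is defined at the boundary.
        last = max(i for i in range(n_intervals) if knots[i] < knots[i + 1])
        values = [1.0 if i == last else 0.0 for i in range(n_intervals)]
    else:
        values = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
                  for i in range(n_intervals)]
    for d in range(1, degree + 1):
        updated = []
        for i in range(n_intervals - d):
            left_width = knots[i + d] - knots[i]
            right_width = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_width * values[i] if left_width > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_width * values[i + 1]
                     if right_width > 0 else 0.0)
            updated.append(left + right)
        values = updated
    return values


# Hypothetical setup: M = 21 grid points on S = [0, 1] and K_psi = 10 cubic basis functions.
M, K_PSI, DEGREE = 21, 10, 3
grid = [k / (M - 1) for k in range(M)]
knots = clamped_knots(K_PSI, DEGREE, 0.0, 1.0)
psi = [bspline_row(s, knots, DEGREE) for s in grid]  # M x K_psi evaluation matrix

# A coefficient function on the grid is psi times a row of spline coefficients.
b0_coefs = [0.1 * l for l in range(K_PSI)]  # arbitrary illustrative coefficients
b0_on_grid = [sum(c * p for c, p in zip(b0_coefs, row)) for row in psi]
```

Each row of psi sums to one, which is a convenient sanity check on any B-spline evaluation matrix before it is passed to the sampler.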

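The conditional likelihood from the longitudinal scalar data $y_i = [y_{i1}, \cdots, y_{iJ_i}]^\top$ is

$$p(y_i \mid \theta, \mathbf{b}_i) = (2\pi\sigma_\varepsilon^2)^{-J_i/2}\exp\Big\{-\frac{1}{2\sigma_\varepsilon^2}\sum_{j=1}^{J_i}\Big[y_{ij} - \Big(\beta_0 + t_{ij}\beta_t + \sum_{p=1}^{P} x_{ijp}\beta_p + \sum_{r=1}^{R}\zeta_r (t_{ij} - \kappa_r)_+ + b_i\Big)\Big]^2\Big\}.$$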

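The conditional likelihood for the functional longitudinal data $y_i(s) = [y_{i1}(s), \cdots, y_{iJ_i}(s)]^\top$ is

$$p(y_i(s) \mid \theta, \mathbf{b}_i) = \left|2\pi\sigma_\epsilon^2 I_{s\times s}\right|^{-J_i/2}\exp\Big\{-\frac{1}{2}\mathrm{tr}\Big([y_i(s) - m_i(s)]^\top[y_i(s) - m_i(s)]\,(\sigma_\epsilon^2 I_{s\times s})^{-1}\Big)\Big\},$$

where $m_i(s) = [m_i(s, t_{i1}), \cdots, m_i(s, t_{iJ_i})]^\top$, $I_{s\times s}$ is the identity matrix whose dimension equals the number of grid points on S, |·| is the determinant of a matrix, and tr is the trace of a matrix. The density function of the random effects $\mathbf{b}_i$ is $p(\mathbf{b}_i \mid \theta) = (2\pi)^{-(K_\phi+1)/2}|\Sigma|^{-1/2}\exp\big(-\tfrac{1}{2}\mathbf{b}_i^\top\Sigma^{-1}\mathbf{b}_i\big)$, where (Kϕ + 1) is the dimension of the covariance matrix Σ. The conditional likelihood from the survival data is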

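$$p(T_i, \delta_i \mid \theta, \mathbf{b}_i) = h_i(T_i \mid \theta, \mathbf{b}_i)^{\delta_i}\,S_i(T_i \mid \theta, \mathbf{b}_i) = h_i(T_i \mid \theta, \mathbf{b}_i)^{\delta_i}\exp\Big[-\int_0^{T_i} h_i(t \mid \theta, \mathbf{b}_i)\,dt\Big],$$

where $h_i(T_i \mid \theta, \mathbf{b}_i) = h_0(T_i)\exp\{w_i^\top\gamma + \alpha^* m_i(T_i) + \alpha\int_S m_i(s, T_i)\,ds\}$, and the function h0(·) can be approximated by a piecewise-constant function or a B-spline function.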
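The survival contribution can be sketched numerically as follows, assuming a piecewise-constant baseline hazard and a midpoint rule for the cumulative hazard. The cut points, rates, and the time-varying linear predictor are hypothetical, and the midpoint rule is one possible quadrature choice rather than the one used in the paper.

```python
import math


def baseline_hazard(t, cuts, rates):
    """Piecewise-constant h0(t): rates[k] applies on [cuts[k], cuts[k + 1])."""
    for k in range(len(rates)):
        if cuts[k] <= t < cuts[k + 1]:
            return rates[k]
    return rates[-1]


def cumulative_hazard(hazard, upper, n_steps=2000):
    """Midpoint-rule approximation of the integral of hazard(t) over [0, upper]."""
    width = upper / n_steps
    return sum(hazard((k + 0.5) * width) for k in range(n_steps)) * width


def survival_loglik(time, delta, hazard):
    """log p(T, delta | theta, b) = delta * log h(T) - integral_0^T h(t) dt."""
    return delta * math.log(hazard(time)) - cumulative_hazard(hazard, time)


# Hypothetical baseline hazard with change points at 1 and 3 years.
cuts = [0.0, 1.0, 3.0, float("inf")]
rates = [0.1, 0.2, 0.4]


def subject_hazard(t):
    # Placeholder linear predictor standing in for
    # w'gamma + alpha* m_i(t) + alpha * integral_S m_i(s, t) ds.
    linear_predictor = 0.3 + 0.05 * t
    return baseline_hazard(t, cuts, rates) * math.exp(linear_predictor)


loglik_event = survival_loglik(2.0, 1, subject_hazard)     # event observed at t = 2
loglik_censored = survival_loglik(2.0, 0, subject_hazard)  # censored at t = 2
print(loglik_event, loglik_censored)
```

A censored subject contributes only the survival term, so its log-likelihood differs from that of an event at the same time by exactly log h(T).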

Under the local independence assumption (i.e., conditional on the random effects vector bi, all components in yi, yi(s), and Ti are independent), the joint likelihood function is

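$$L(\theta) = \prod_{i=1}^{I} p(y_i, y_i(s), T_i, \delta_i \mid \theta) = \prod_{i=1}^{I}\int p(y_i \mid \theta, \mathbf{b}_i)\,p(y_i(s) \mid \theta, \mathbf{b}_i)\,p(T_i, \delta_i \mid \theta, \mathbf{b}_i)\,p(\mathbf{b}_i \mid \theta)\,d\mathbf{b}_i. \tag{6}$$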

3.2. Bayesian inference

For model estimation, we propose a Bayesian modeling approach based on Markov Chain Monte Carlo (MCMC) posterior simulations, which provides a flexible way for statistical inference. The Bayesian approach has a number of advantages and has been previously exploited in the univariate joint modeling framework [31] and functional regression [32]. Liu and Li [33] compared the performance of Bayesian approaches to classical frequentist (maximum likelihood) approaches under multivariate joint model framework, demonstrating superiority of the Bayesian methods with respect to bias, root-mean square error, and coverage.

We use vague priors on all elements of the parameter vector θ. Specifically, the prior distributions of the parameters β, γ, α*, and α are N(0, 100), and the variance parameters $\sigma_\varepsilon^2$ and $\sigma_\epsilon^2$ are assigned Inverse-Gamma(0.01, 0.01) priors. We impose smoothness on the coefficient function estimates through the prior specification on the spline coefficients B and Bϕ, and assume the following priors for the rows of the matrices:

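$$B_k \sim N(0, \sigma_k^2 Q^{-1}), \ \text{for } 1 \le k \le (P+2), \qquad B_{\phi_k} \sim N(0, \sigma_{\phi_k}^2 Q^{-1}), \ \text{for } 1 \le k \le K_\phi,$$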

where Q is a pre-specified Kψ × Kψ penalty matrix that enforces smoothness through the connection between Bayesian priors and quadratic penalization [27]. As suggested in Goldsmith et al. [17], we use Q = μQ0 + (1 − μ)Q2, where Q0 and Q2 are zeroth- and second-order derivative penalty matrices, with the upper left parts as

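$$Q_0 = \psi(s)^\top\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\psi(s) \quad \text{and} \quad Q_2 = \psi(s)^\top\begin{bmatrix} 1 & -2 & 1 & 0 & 0 & 0 \\ -2 & 5 & -4 & 1 & 0 & 0 \\ 1 & -4 & 6 & -4 & 1 & 0 \\ 0 & 1 & -4 & 6 & -4 & 1 \end{bmatrix}\psi(s),$$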

where ψ(s) is the cubic B-spline evaluation matrix defined previously. Selecting 0 < μ ≤ 1 balances the universal shrinkage encoded in Q0 and the smoothness constraint of Q2, while ensuring that Q is positive definite and the priors are proper. In the simulation and real data analysis we set μ = 0.1, and sensitivity analyses have indicated robustness to the choice of μ. We use the random walk prior of Lang & Brezger [34] on the spline coefficients ζr, for r = 1, ⋯ , R, for smoothing and penalization. Specifically, we use a first-order random walk prior $\zeta_{r+1} \sim N(\zeta_r, \sigma_\zeta^2)$, for r = 1, ⋯ , R−1, where ζ1 is treated as a fixed unknown parameter. The variance components $\sigma_k^2$, $\sigma_{\phi_k}^2$, and $\sigma_\zeta^2$ are assigned Inverse-Gamma(0.01, 0.01) prior distributions. The parameters $\sigma_b^2$ and λk in the covariance matrix Σ are assigned Inverse-Gamma(0.01, 0.01) prior distributions, and the correlation coefficient ρ is assigned a Uniform(−1, 1) prior.
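The banded matrix displayed inside Q2 coincides with D'D, where D is the second-order difference matrix, which the short sketch below verifies. It also shows the mixing step μ(·) + (1 − μ)(·) on the inner matrices only; the outer multiplication by the B-spline evaluation matrix is omitted, and the size n = 8 and the function names are assumptions for illustration.

```python
def second_difference_penalty(n):
    """Return D'D, where D is the (n - 2) x n second-order difference matrix."""
    diff = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        diff[i][i], diff[i][i + 1], diff[i][i + 2] = 1.0, -2.0, 1.0
    return [[sum(diff[r][a] * diff[r][b] for r in range(n - 2)) for b in range(n)]
            for a in range(n)]


def identity(n):
    return [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]


def mix_penalties(p0, p2, mu):
    """Return mu * p0 + (1 - mu) * p2 entrywise."""
    n = len(p0)
    return [[mu * p0[a][b] + (1.0 - mu) * p2[a][b] for b in range(n)]
            for a in range(n)]


n = 8                                # hypothetical size, large enough to show the band
p2 = second_difference_penalty(n)
mixed = mix_penalties(identity(n), p2, mu=0.1)

upper_left = [row[:6] for row in p2[:4]]
for row in upper_left:
    print(row)
```

The second-difference matrix alone is singular because constant and linear sequences incur no penalty, which is why a positive weight μ on the identity part is needed for a proper prior.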

The model fitting is performed in Stan by specifying the full likelihood function and the prior distributions of all unknown parameters. Stan adopts the No-U-Turn sampler (NUTS), an extension of Hamiltonian Monte Carlo (HMC) that avoids random walk behavior by using the gradient of the log-posterior and eliminates the need to set the number of steps that is required in HMC [35]. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps [36]. Empirically, NUTS offers faster convergence and parameter space exploration compared with other MCMC algorithms such as the Gibbs sampler. We inspect the history plots and view the absence of an apparent trend as evidence of convergence. In addition, we use the Gelman-Rubin diagnostic to ensure that the potential scale reduction factor $\hat{R}$ of every parameter is smaller than 1.1 [37]. After fitting the model to the training dataset (the dataset used to build the model) using Bayesian approaches, we obtain D (e.g., D = 5,000 after burn-in) samples for the parameter vector, denoted by {θ(d), d = 1, ⋯ , D}. All estimates can then be obtained by calculating simple summaries (e.g., mean, variance, quantiles) of the posterior distributions of the D samples {θ(d), d = 1, ⋯ , D}. Based on the estimated coefficient vector $\hat{B}$ (posterior mean), the estimated coefficient function is calculated by $\hat{B}(s) = [\hat{B}\psi(s)^\top]^\top$. The Bayesian approach allows for the easy construction of posterior credible intervals for the coefficient function B(s) as $[\hat{q}_{B,0.025}(s), \hat{q}_{B,0.975}(s)]$, where $\hat{q}_{B,u}(s)$ is the u-quantile of the MCMC samples B(s)(d) = [B(d)ψT(s)]T, d = 1, ⋯ , D. To facilitate easy reading and implementation of the proposed multivariate functional joint model, we provide sample Stan code in the Web Supplement.
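The pointwise credible band can be computed directly from saved draws, as in the minimal sketch below. It assumes the draws of one row of spline coefficients are stored as plain lists and uses a linear-interpolation quantile; the tiny basis matrix and the five draws are made-up numbers, not output from the fitted model.

```python
def quantile(sorted_values, u):
    """Linear-interpolation quantile of an already sorted list, 0 <= u <= 1."""
    position = u * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return (1.0 - weight) * sorted_values[lower] + weight * sorted_values[upper]


def coefficient_function(coefs, psi):
    """Evaluate a coefficient function on the grid: psi is M x K, coefs has length K."""
    return [sum(c * p for c, p in zip(coefs, row)) for row in psi]


def pointwise_band(coef_draws, psi, lower_u=0.025, upper_u=0.975):
    """Return (lower quantile, posterior mean, upper quantile) at each grid point."""
    curves = [coefficient_function(draw, psi) for draw in coef_draws]
    band = []
    for m in range(len(psi)):
        values = sorted(curve[m] for curve in curves)
        mean = sum(values) / len(values)
        band.append((quantile(values, lower_u), mean, quantile(values, upper_u)))
    return band


# Hypothetical 3-point grid with K = 2 basis functions and D = 5 posterior draws.
psi = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
coef_draws = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]

band = pointwise_band(coef_draws, psi)
for lower, mean, upper in band:
    print(round(lower, 3), round(mean, 3), round(upper, 3))
```

Because the coefficient function is linear in the spline coefficients, the mean of the curve draws equals the curve built from the posterior mean of the coefficients.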

3.3. Dynamic prediction framework

We next illustrate the dynamic prediction framework based on the proposed model. Given a new subject N’s outcome histories $y_N^{\{t\}} = \{y_N(t_{Nj}), y_N(s, t_{Nj});\ 0 \le t_{Nj} \le t\}$ and covariates $X_N^{\{t\}} = \{x_N(t_{Nj}), w_N;\ 0 \le t_{Nj} \le t\}$ up to time t, and δN = 0 (no event), we want to predict the personalized scalar outcome yN(t′) and functional outcome yN(s, t′) at a future time point t′ > t (e.g., t′ = t + Δt), as well as the conditional probability of being event-free at time t′, denoted by $\pi_N(t' \mid t) = P(T_N^* \ge t' \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}})$, where $T_N^*$ is the true event time of subject N. The key step for prediction is to obtain subject N’s subject-specific random intercept bN and random function bN(s). This can be achieved by sampling bN and the FPC score vector ξN jointly from their posterior distribution $p(\mathbf{b}_N \mid T_N^* > t, y_N^{\{t\}}, \theta)$, where $\mathbf{b}_N = [b_N, \xi_N]^\top$, and reconstructing the random function bN(s) = ξNBϕψ(s)T. Conditional on the dth posterior sample θ(d), d = 1, ⋯ , D, we draw the dth sample of $\mathbf{b}_N$ from the posterior distribution

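$$p(\mathbf{b}_N \mid T_N^* > t, y_N^{\{t\}}, \theta^{(d)}) = \frac{p(y_N^{\{t\}}, T_N^* > t, \mathbf{b}_N \mid \theta^{(d)})}{p(y_N^{\{t\}}, T_N^* > t \mid \theta^{(d)})} \propto p(y_N^{\{t\}}, T_N^* > t, \mathbf{b}_N \mid \theta^{(d)}) = p(y_N^{\{t\}} \mid \theta^{(d)}, \mathbf{b}_N)\,p(T_N^* > t \mid \theta^{(d)}, \mathbf{b}_N)\,p(\mathbf{b}_N \mid \theta^{(d)}),$$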

where $p(y_N^{\{t\}} \mid \theta^{(d)}, \mathbf{b}_N)$ is the joint conditional density of the longitudinal scalar and functional outcomes, $p(T_N^* > t \mid \theta^{(d)}, \mathbf{b}_N)$ is the survival probability, and $p(\mathbf{b}_N \mid \theta^{(d)})$ is the density of the random effects. For each θ(d), d = 1, ⋯ , D, we use adaptive rejection Metropolis sampling (ARMS) [38] to draw one sample of the random effect vector $\mathbf{b}_N$. This process is repeated for the D saved values of θ so that D samples of the random effect vector are obtained. The predictions can be calculated by plugging the samples of the parameter vector and random effect vector $\{\theta^{(d)}, \mathbf{b}_N^{(d)},\ d = 1, \cdots, D\}$ into the proposed models. For example, based on model (1), the expected value of the longitudinal scalar outcome for subject N at time t′ is calculated with respect to the posterior distribution of the parameters $\{\theta \mid D_I\}$ as

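$$E\{y_N(t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, D_I\} = \int E\{y_N(t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta\}\,p(\theta \mid D_I)\,d\theta, \tag{7}$$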

where DI denotes the sample on which the model is fitted. The first part of the integrand is given as

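$$\begin{aligned} E\{y_N(t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta\} &= \int E\{y_N(t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, b_N, \theta\}\,p(b_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,db_N \\ &= \int \Big\{\beta_0 + t'\beta_t + \sum_{p=1}^{P} x_{Np}\beta_p + V_R(t')\zeta + b_N\Big\}\,p(b_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,db_N \\ &= \beta_0 + t'\beta_t + \sum_{p=1}^{P} x_{Np}\beta_p + V_R(t')\zeta + \int b_N\,p(b_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,db_N. \end{aligned} \tag{8}$$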

The integration with respect to θ in Equation (7) and the integration with respect to bN in Equation (8) can be approximated using a Monte Carlo simulation scheme [39], where the dth Monte Carlo sample is

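$$E^{(d)}\{y_N(t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, D_I\} = \beta_0^{(d)} + t'\beta_t^{(d)} + \sum_{p=1}^{P} x_{Np}\beta_p^{(d)} + V_R(t')\zeta^{(d)} + b_N^{(d)}, \quad d = 1, \cdots, D.$$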

Similarly, based on model (4), the expected values of the longitudinal functional outcome for subject N at time t′ is

$$E\{y_N(s,t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, D_I\} = \int E\{y_N(s,t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta\}\,p(\theta \mid D_I)\,d\theta,$$

where the first part of the integrand is given as

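$$\begin{aligned} E\{y_N(s,t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta\} &= \int E\{y_N(s,t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \xi_N, \theta\}\,p(\xi_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,d\xi_N \\ &= B_0\psi(s)^\top + t'B_t\psi(s)^\top + \sum_{p=1}^{P} x_{Np}B_p\psi(s)^\top + \int \xi_N B_\phi\psi(s)^\top\,p(\xi_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,d\xi_N. \end{aligned}$$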

In addition, based on model (5), the conditional probability of event-free at time t′ is

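$$\pi_N(t' \mid t) = \int P(T_N^* \ge t' \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,p(\theta \mid D_I)\,d\theta,$$

and

$$\begin{aligned} P(T_N^* \ge t' \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta) &= \int P(T_N^* \ge t' \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \mathbf{b}_N, \theta)\,p(\mathbf{b}_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,d\mathbf{b}_N \\ &= \int \frac{P(T_N^* \ge t' \mid y_N^{\{t\}}, X_N^{\{t\}}, \mathbf{b}_N, \theta)}{P(T_N^* > t \mid y_N^{\{t\}}, X_N^{\{t\}}, \mathbf{b}_N, \theta)}\,p(\mathbf{b}_N \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, \theta)\,d\mathbf{b}_N. \end{aligned}$$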

The Monte Carlo samples of $E\{y_N(s,t') \mid T_N^* > t, y_N^{\{t\}}, X_N^{\{t\}}, D_I\}$ and πN(t′|t) can be obtained by simply replacing {θ, bN} in the model with $\{\theta^{(d)}, \mathbf{b}_N^{(d)} : d = 1, \cdots, D\}$. All prediction results can then be obtained by calculating simple summaries (e.g., mean, variance, quantiles) of the D samples.
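The plug-in step can be sketched as follows: each saved draw yields one predicted scalar outcome and one conditional survival probability, and the D values are then summarized. The dictionary layout of a draw, the constant-hazard placeholder for the cumulative hazard, and every number below are hypothetical; in the full model the cumulative hazard would come from integrating the hazard of the survival submodel.

```python
import math


def predict_scalar(draw, t_new, x, knots):
    """One Monte Carlo draw of the expected scalar outcome at a future time t_new."""
    spline = sum(z * max(t_new - k, 0.0) for z, k in zip(draw["zeta"], knots))
    covariates = sum(xp * bp for xp, bp in zip(x, draw["beta_x"]))
    return draw["beta0"] + t_new * draw["beta_t"] + covariates + spline + draw["b"]


def conditional_survival(cum_hazard, draw, t, t_new):
    """One draw of pi(t_new | t) = S(t_new) / S(t) = exp(-(H(t_new) - H(t)))."""
    return math.exp(-(cum_hazard(draw, t_new) - cum_hazard(draw, t)))


def mean(values):
    return sum(values) / len(values)


# Two hypothetical posterior draws (theta^(d), b_N^(d)); real use would have D of them.
draws = [
    {"beta0": 10.0, "beta_t": 1.0, "beta_x": [0.5], "zeta": [0.2], "b": -1.0, "rate": 0.1},
    {"beta0": 10.0, "beta_t": 1.0, "beta_x": [0.5], "zeta": [0.2], "b": 1.0, "rate": 0.2},
]
knots = [1.0]
x_new = [2.0]


def constant_cum_hazard(draw, t):
    # Placeholder: H(t) = rate * t, standing in for the model-based cumulative hazard.
    return draw["rate"] * t


scalar_draws = [predict_scalar(d, 3.0, x_new, knots) for d in draws]
survival_draws = [conditional_survival(constant_cum_hazard, d, 2.0, 3.0) for d in draws]
print(mean(scalar_draws), mean(survival_draws))
```

Credible intervals follow in the same way by taking quantiles of the per-draw values instead of their mean.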

Suppose that subject N has not experienced the event of interest by time t′; then the outcome histories are updated to $y_N^{\{t'\}}$. We can dynamically update the posterior distribution to $p(\mathbf{b}_N \mid T_N^* > t', y_N^{\{t'\}}, \theta)$, draw new samples, and obtain the updated predictions. We assess the performance of the proposed predictive measures in discriminating between patients who had the event and patients who did not. Such discrimination performance is measured by the integrated area under the time-dependent receiver operating characteristic curve that accommodates censoring [40].
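The random-effect draws used in this updating step come from a density known only up to a constant. The sketch below samples a scalar random intercept with a plain random-walk Metropolis step, which is a simpler stand-in for ARMS and not the sampler used in the paper. The residuals, variances, and the survival term are hypothetical, and the functional outcome is left out to keep the example one-dimensional.

```python
import math
import random


def metropolis(log_target, init, n_draws, step, rng):
    """Random-walk Metropolis sampler for a one-dimensional unnormalized log density."""
    current = init
    current_lp = log_target(current)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


def make_log_posterior(residuals, sigma_eps, sigma_b, log_survival):
    """log p(y | b) + log p(T > t | b) + log p(b), up to an additive constant."""
    def log_posterior(b):
        loglik = -sum((r - b) ** 2 for r in residuals) / (2.0 * sigma_eps ** 2)
        logprior = -b ** 2 / (2.0 * sigma_b ** 2)
        return loglik + log_survival(b) + logprior
    return log_posterior


# Hypothetical residuals y_Nj minus the fixed-effect part of model (1).
residuals = [1.0, 1.4, 0.6, 1.2]


def log_survival(b):
    # Placeholder survival term: cumulative hazard 0.3 * exp(0.5 * b) up to time t.
    return -0.3 * math.exp(0.5 * b)


log_post = make_log_posterior(residuals, sigma_eps=1.0, sigma_b=2.0,
                              log_survival=log_survival)
rng = random.Random(2018)
b_draws = metropolis(log_post, init=0.0, n_draws=5000, step=1.0, rng=rng)[500:]
print(sum(b_draws) / len(b_draws))
```

When new visits arrive, the same sampler is rerun with the extended residual list and the survival term evaluated at the later time, which mirrors the updating of the posterior described above.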

4. Application to the ADNI Study

We apply the proposed Bayesian MFJM to the motivating ADNI-1 study. Besides the longitudinal ADAS-Cog 11 and imaging marker, we include the following variables as scalar covariates: baseline age (bAge, mean: 74.4, SD: 7.3, range 55.1-89.3), gender (gender, 36.1% female), years of education (Edu, mean: 15.6, SD: 3.0, range 4-20), and presence of at least one apolipoprotein E-ε 4 allele (APOE-ε4, 56%), given their potential effects on AD progression [41, 42, 43]. To investigate the different forms of imaging information, we include the baseline hippocampal volume (bHV), baseline hippocampal surface based on hippocampal radial distance (bHRD), and longitudinal hippocampal radial distance (lHRD).

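We consider three joint models with the same longitudinal submodel for the scalar outcome, ADAS-Cog 11, which is defined by

$$
\begin{aligned}
\text{ADAS-Cog}_i(t_{ij}) &= m_i(t_{ij}) + \varepsilon_{ij}, \\
m_i(t_{ij}) &= \beta_0 + \beta_1\,\text{APOE-}\varepsilon4_i + \beta_2\,\text{bAge}_i + \beta_t t_{ij} + \sum_{r=1}^{3} \zeta_r (t_{ij} - k_r)_+ + b_{i1}.
\end{aligned}
$$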

The three joint models differ in the survival part, which incorporates different levels of imaging information for predicting AD progression. In the first joint model (referred to as model JM), we incorporate the baseline hippocampal volume (bHV) as a scalar predictor, along with other covariates and the underlying process mi(t) of the ADAS-Cog 11, in the survival part. This gives the survival submodel in JM as

$$
h_i(t) = h_0(t)\exp\{\gamma_1\,\text{gender}_i + \gamma_2\,\text{bAge}_i + \gamma_3\,\text{Edu}_i + \gamma_4\,\text{APOE-}\varepsilon4_i + \gamma_5\,\text{bHV}_i + \alpha^* m_i(t)\}.
$$

The second model is a functional joint model (referred to as model FJM), which includes the baseline hippocampal radial distance bHRD(s), instead of the hippocampal volume, as a time-independent functional predictor in the survival submodel. The model was proposed and applied to the ADNI study in previous work [10], in which the survival submodel is defined as

$$
h_i(t) = h_0(t)\exp\Big\{\gamma_1\,\text{gender}_i + \gamma_2\,\text{bAge}_i + \gamma_3\,\text{Edu}_i + \gamma_4\,\text{APOE-}\varepsilon4_i + \int_S \text{bHRD}_i(s)\, B_{\text{bHRD}}(s)\, ds + \alpha^* m_i(t)\Big\}.
$$

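The third model is a multivariate functional joint model (referred to as model MFJM) that accounts for the longitudinal hippocampal radial distance lHRD(s, t) in the survival submodel, where lHRD(s, t) is modeled as

$$
\begin{aligned}
\text{lHRD}_i(s, t_{ij}) &= m_i(s, t_{ij}) + \varepsilon_{ij}(s), \\
m_i(s, t_{ij}) &= B_0(s) + B_1(s)\,\text{APOE-}\varepsilon4_i + B_2(s)\,\text{bAge}_i + B_t(s)\, t_{ij} + b_{i1}(s),
\end{aligned}
$$

and the survival submodel is

$$
h_i(t) = h_0(t)\exp\Big\{\gamma_1\,\text{gender}_i + \gamma_2\,\text{Edu}_i + \gamma_3\,\text{bAge}_i + \gamma_4\,\text{APOE-}\varepsilon4_i + \alpha^* m_i(t) + \alpha \int_S m_i(s,t)\, ds\Big\}.
$$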

In the MFJM, we expand the random functions as $b_{i1}(s) = \sum_{k=1}^{K_\phi} \xi_{ik}\phi_k(s)$ with $K_\phi = 4$, and consider the correlation between $b_{i1}$ and $\xi_{i1}$. We express the coefficient functions $B_p(s)$ and eigenfunctions $\phi_k(s)$ in terms of known cubic B-spline basis functions ψ(s) with 10 knots. We allow a flexible and smooth disease progression over time by using truncated power series splines with 3 knots at k = (1, 2, 3) years, which ensures sufficient patients within each interval. The baseline hazard function h0(t) is approximated by a piecewise constant function. Specifically, the observed survival time is divided into H = 7 intervals at every 1/H-th quantile. We have also explored other selections of Kϕ and H and obtained very similar results.
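
As a rough illustration of the piecewise constant baseline hazard, the following Python sketch places cut points at the 1/H-th quantiles of some observed times and evaluates the hazard and its cumulative integral. The observed times and interval-specific rates are made up for the example, and the function names are hypothetical; in the model the rates would be parameters estimated jointly with everything else.

```python
import statistics

def quantile_cutpoints(times, n_intervals):
    """Interior cut points that split the observed times into n_intervals
    groups of roughly equal size (the 1/H-th quantiles)."""
    return statistics.quantiles(times, n=n_intervals, method="inclusive")

def baseline_hazard(t, cutpoints, rates):
    """Piecewise-constant h0(t): rates[h] applies on the h-th interval."""
    h = sum(1 for c in cutpoints if t > c)
    return rates[h]

def cumulative_baseline_hazard(t, cutpoints, rates):
    """Integral of the piecewise-constant h0 from 0 to t."""
    edges = [0.0] + list(cutpoints) + [float("inf")]
    total = 0.0
    for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
        if t <= lo:
            break
        total += rate * (min(t, hi) - lo)
    return total

# Hypothetical observed times (years) and interval-specific hazard rates.
times = [0.5, 0.6, 1.0, 1.1, 1.5, 1.6, 2.0, 2.1, 2.4, 2.5, 3.0, 3.1, 3.5, 4.0]
cuts = quantile_cutpoints(times, n_intervals=7)      # H = 7 gives 6 cut points
rates = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20]
print(len(cuts), round(cumulative_baseline_hazard(2.0, cuts, rates), 3))
```

Quantile-based cut points keep a comparable number of observed times in each interval, which is one common reason for preferring them over equally spaced cut points.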

The three candidate models are compared by assessing their predictive performance, measured by the time-dependent AUCs, at different time points over the follow-up period. To avoid overestimating the predictive performance, we conduct a 10-fold cross validation. Parameters of the joint model are estimated from the training dataset and applied to the validation dataset. The conditional event-free probability corresponding to the time frame (t, t + Δt] is predicted for each patient in the validation datasets as described in Section 3.3. Because the ADNI patients were reassessed approximately every half year, we select t at 1, 1.5, and 2 years, and Δt as 0.5 and 1 years for analysis. Then the time-dependent AUCs are calculated based on the predicted probabilities of all patients.
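
To make the window-based evaluation concrete, here is a deliberately simplified AUC for a single (t, Δt) pair. It is not the censoring-adjusted estimator of [40]: subjects censored inside the window are simply dropped, which is an assumption made to keep the sketch short. The function name and the toy data are hypothetical.

```python
def window_auc(risk, time, event, t, dt):
    """Simplified AUC for the window (t, t + dt].

    risk[i]  : predicted probability of an event in the window (1 - pi_i)
    time[i]  : observed follow-up time
    event[i] : 1 if the event was observed, 0 if censored
    Cases had the event in (t, t + dt]; controls were still under
    observation after t + dt. Subjects censored inside the window are
    dropped (a simplification, not the estimator of [40]).
    """
    cases = [r for r, T, e in zip(risk, time, event)
             if t < T <= t + dt and e == 1]
    controls = [r for r, T in zip(risk, time) if T > t + dt]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = sum(
        1.0 if rc > rn else 0.5 if rc == rn else 0.0
        for rc in cases for rn in controls
    )
    return concordant / (len(cases) * len(controls))

# Toy validation fold: five subjects, window (1, 1.5].
risk = [0.9, 0.8, 0.3, 0.2, 0.5]
time = [1.2, 1.4, 3.0, 2.5, 1.3]
event = [1, 1, 0, 1, 0]
print(window_auc(risk, time, event, t=1.0, dt=0.5))
```

In a cross-validated analysis, `risk` would hold the out-of-fold predictions pooled over all validation folds before the AUC is computed.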

Table 1 displays the time-dependent AUCs from the three candidate models. Models FJM and MFJM have notably larger AUCs than model JM for most combinations of t and Δt. This suggests that including the functional predictor HRD in the survival submodel improves the capability of the joint model in predicting the risk of AD diagnosis. However, the longitudinal HRD does not show much advantage over the baseline HRD information in terms of predictive capability, except in the early phase of the follow-up. This may be explained by the fact that different markers may be more or less discriminative at different stages of disease, and that MRI abnormalities usually occur before any symptom of cognitive impairment appears [44, 45]. We have also assessed the MFJM accounting for the correlations between $b_{i1}$ and the first two FPC scores $[\xi_{i1}, \xi_{i2}]$. That model shows no notable improvement in terms of prediction.

Table 1:

Areas under the ROC curve (AUC) by three candidate models in the ADNI study.

Δt    t     JM      FJM     MFJM
0.5   1     0.715   0.754   0.821
0.5   1.5   0.691   0.738   0.734
0.5   2     0.781   0.809   0.812
1     1     0.696   0.747   0.792
1     1.5   0.735   0.776   0.777
1     2     0.749   0.769   0.766

We select MFJM as the final model because it has competitively good discrimination capability in both the early and late phases. Parameter estimates from model MFJM using the whole dataset are presented in Table 2. In the longitudinal submodel of the scalar outcome, people with APOE-ε4 allele(s) have, on average, a higher (worse) ADAS-Cog 11 score (by 2.207 units) than people without this genetic variation. Also, the ADAS-Cog 11 score increases (deteriorates) as time progresses, with an average increase of 1.121 units (95% CI: [0.648, 1.584]) per year for MCI patients. In the survival submodel, the presence of APOE-ε4 allele(s) increases the hazard of AD diagnosis by 51% (exp(0.409) − 1, 95% CI: [8%, 109%]), which is consistent with the literature [46]. Furthermore, a larger ADAS-Cog 11 score increases the risk of AD diagnosis: a one-unit increase in the ADAS-Cog 11 score increases the hazard of AD diagnosis by 19% (exp(0.173) − 1, 95% CI: [14%, 24%]). The association parameter α is negative, indicating that a decrease in HRD (i.e., hippocampal atrophy) is associated with an increasing risk of AD diagnosis. The estimated coefficient functions in the longitudinal model for the functional outcome HRD are presented in Figure 2. APOE-ε4 allele(s) does not appear to be associated with the thickness of the hippocampus because the estimated coefficient function $\hat{B}_1(s)$ (upper right panel) fluctuates around zero across the domain S. The estimated coefficient function $\hat{B}_t(s)$ (lower left panel) quantifies the change of HRD over time, and notable atrophy can be seen at both ends and in the middle part of the hippocampus. The baseline age of the patients has a similar effect on the hippocampus: as shown by the estimated coefficient function $\hat{B}_2(s)$ (lower right panel), older patients have, on average, a thinner hippocampus at both ends and in the middle part.
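
The percentage changes in hazard quoted above follow from exponentiating the posterior summaries of the log-hazard coefficients. A minimal Python check, using only the posterior means and 95% limits reported in Table 2 (the helper name `percent_hazard_change` is ours):

```python
import math

def percent_hazard_change(log_hr):
    """Percent change in hazard per unit increase: 100 * (exp(coef) - 1)."""
    return 100.0 * (math.exp(log_hr) - 1.0)

# Posterior mean, 2.5% and 97.5% limits from Table 2.
apoe = [percent_hazard_change(v) for v in (0.409, 0.081, 0.736)]
adas = [percent_hazard_change(v) for v in (0.173, 0.135, 0.215)]
print([round(v) for v in apoe])   # [51, 8, 109]
print([round(v) for v in adas])   # [19, 14, 24]
```

Because the exponential is monotone, transforming the interval limits directly gives the interval for the percent change.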

Table 2:

ADNI data analysis results from model MFJM.

Parameters                                       Mean     SE      2.5%     97.5%
For scalar longitudinal outcome (ADAS-Cog 11)
  APOE-ε4                                        2.207    0.397   1.464    3.006
  bAge                                           0.172    0.258   −0.346   0.668
  Time (Years)                                   1.121    0.240   0.648    1.584
For survival process (MCI to AD)
  Female                                         0.094    0.172   −0.241   0.442
  bAge                                           −0.113   0.086   −0.278   0.059
  Edu (years)                                    0.029    0.026   −0.022   0.080
  APOE-ε4                                        0.409    0.164   0.081    0.736
  α*                                             0.173    0.021   0.135    0.215
  α                                              −1.105   0.437   −1.969   −0.259

Figure 2.

Estimated coefficient functions (solid lines) in the functional longitudinal submodel with 95% pointwise uncertainty band (dashed lines) and reference lines (dotted lines) at zero.

To illustrate the personalized dynamic predictions, we select two target patients as validation data, and predict their future health outcomes and event-free probabilities based on the MFJM estimated using the remaining data as the training set. Patient A had a baseline age of 73 and no APOE-ε4 allele, whereas the more severe Patient B was 78 years old at baseline and carried APOE-ε4. Figure 3 demonstrates how the predicted ADAS-Cog 11 scores are updated over time for the two patients. From left to right in Figure 3, as more follow-up data are used, the predictions move closer to the true observed values and the 95% uncertainty band becomes narrower. Figure 4 shows the predicted HRD at the third and fourth visits based on the previous observations. The predicted expected HRD (red line) is close to the true observation (black line); however, we do not observe a notable difference between the HRD at the third and fourth visits, and we suggest using the predicted HRD with caution. Figure 5 displays the predicted probability of being free of AD diagnosis. For Patient A, the event-free probability curve does not change much because Patient A’s predicted ADAS-Cog 11 scores are relatively low. In comparison, Patient B has higher predicted ADAS-Cog 11 scores and worse cognitive function, and thus has a considerable drop in the event-free probability. This suggests that Patient B has a higher risk of AD diagnosis and should be monitored frequently.

Figure 3.

Predicted ADAS-Cog 11 for Patient A (upper panels) and Patient B (lower panels). Solid lines are the predicted longitudinal trajectories. Dashed lines form a 95% pointwise uncertainty band. The dotted vertical lines represent the time of prediction t.

Figure 4.

Predicted HRD (red) with 95% pointwise uncertainty band (dashed lines) for Patient A (upper panels) and Patient B (lower panels) at the third and fourth visits. The solid black curve represents the true observation at the time point.

Figure 5.

Predicted event-free probability with 95% pointwise uncertainty band (dashed lines) for Patient A (upper panels) and Patient B (lower panels). The dotted vertical lines represent the time of prediction t.

5. Simulation Study

In this section, we conduct a simulation study to evaluate the proposed models. We generate 100 datasets with sample size I = 150 subjects, and each subject has $J_i = 4$ measurements at times 0, 5, 10, and 15. The simulated data structure is similar to the motivating ADNI study. We generate the longitudinal functional response yi(s, tij) on an equally spaced grid of length 25 and the longitudinal scalar response yi(tij) according to the longitudinal submodels

$$
\begin{aligned}
y_i(t_{ij}) &= m_i(t_{ij}) + \epsilon_{ij}, \qquad y_i(s, t_{ij}) = m_i(s, t_{ij}) + \epsilon_{ij}(s), \quad \text{where} \\
m_i(t_{ij}) &= \beta_0 + t_{ij}\beta_t + x_{i1}\beta_1 + b_i, \quad \text{and} \\
m_i(s, t_{ij}) &= B_0(s) + t_{ij} B_t(s) + x_{i1} B_1(s) + \sum_{k=1}^{K_\phi = 2} \xi_{ik}\phi_k(s), \qquad s \in [0, 1] = S.
\end{aligned}
$$

We set β0 = 4, βt = 1, and β1 = 0.5. The intercept function is B0(s) = −1.5 − sin(2πs) − cos(2πs). The time effect is $B_t(s) = \frac{1}{10}\Phi\left(\frac{s - 0.5}{0.2^2}\right)$, where Φ(·) is the standard normal density function. The fixed effect is B1(s) = sin(2πs) − cos(4πs), and we generate scalar predictors using xi1 ~ N(0, 2). Similar to a previous simulation study [17], the orthogonal eigenfunctions for the random intercept functions are chosen to be ϕ1(s) ∝ 1.5 − sin(2πs) − cos(2πs) and ϕ2(s) ∝ sin(4πs), scaled such that $\int_S [\phi_1(s)]^2\, ds = 1$ and $\int_S [\phi_2(s)]^2\, ds = 1$. The subject-specific random effect and FPC scores [bi, ξi1, ξi2] are generated from a multivariate normal distribution with zero mean and covariance matrix Σ with $\sigma_b^2 = 0.6^2$, λ1 = 1, λ2 = 0.5, and ρ = 0.5. The measurement error for the scalar response ϵij is simulated from N(0, 0.2). The white noise component ϵij(s) is simulated from N(0, 0.1) across s.
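
The data-generating step for the longitudinal outcomes could be sketched as follows. Several readings here are assumptions rather than statements from the text: the second argument of N(·, ·) is treated as a variance, the random-intercept variance is read as 0.6² and the scale in the time effect as 0.2², and ρ is taken to be the correlation between b_i and ξ_i1 only, with ξ_i2 independent. The function names are hypothetical.

```python
import math
import random

GRID = [j / 24 for j in range(25)]      # 25 equally spaced points on S = [0, 1]

def trapz(values):
    """Trapezoidal approximation of the integral over S."""
    return sum(0.5 * (values[j] + values[j + 1]) / 24 for j in range(24))

def unit_norm(values):
    """Rescale so that the integral of the squared function is 1."""
    norm = math.sqrt(trapz([v * v for v in values]))
    return [v / norm for v in values]

PHI1 = unit_norm([1.5 - math.sin(2 * math.pi * s) - math.cos(2 * math.pi * s)
                  for s in GRID])
PHI2 = unit_norm([math.sin(4 * math.pi * s) for s in GRID])

def B0(s):
    return -1.5 - math.sin(2 * math.pi * s) - math.cos(2 * math.pi * s)

def B1(s):
    return math.sin(2 * math.pi * s) - math.cos(4 * math.pi * s)

def Bt(s):
    z = (s - 0.5) / 0.2 ** 2            # ASSUMED scale of the time effect
    return 0.1 * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def simulate_subject(rng, times=(0, 5, 10, 15), sigma_b=0.6,
                     lam=(1.0, 0.5), rho=0.5):
    """One subject's scalar and functional responses (before censoring)."""
    x = rng.gauss(0.0, math.sqrt(2.0))              # variance 2 (assumed)
    z1, z2, z3 = (rng.gauss(0.0, 1.0) for _ in range(3))
    b = sigma_b * z1
    # corr(b, xi1) = rho; xi2 independent of both (assumed structure of Sigma)
    xi1 = math.sqrt(lam[0]) * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    xi2 = math.sqrt(lam[1]) * z3
    scalar, functional = [], []
    for t in times:
        scalar.append(4.0 + 1.0 * t + 0.5 * x + b
                      + rng.gauss(0.0, math.sqrt(0.2)))
        functional.append([
            B0(s) + t * Bt(s) + x * B1(s) + xi1 * p1 + xi2 * p2
            + rng.gauss(0.0, math.sqrt(0.1))
            for s, p1, p2 in zip(GRID, PHI1, PHI2)
        ])
    return {"x": x, "y_scalar": scalar, "y_functional": functional}

rng = random.Random(2018)
data = [simulate_subject(rng) for _ in range(150)]
print(len(data), len(data[0]["y_scalar"]), len(data[0]["y_functional"][0]))
```

On this grid the trapezoidal rule integrates the low-frequency trigonometric terms exactly, so the two eigenfunctions come out orthonormal up to rounding error.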

We choose a constant baseline hazard function h0(t) = 0.01 and the survival submodel is

$$
h_i(t) = h_0(t)\exp\Big\{w_i\gamma_1 + \alpha^* m_i(t) + \alpha \int_S m_i(s,t)\, ds\Big\},
$$

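where wi is simulated from N(0, 1), γ1 = 0.76, α* = 0.5, and α = 0.3. We generate random survival times based on the closed form of $T_i^*$ derived from the survival function:

$$
S_i(t) = \exp\left\{-\frac{h_0(t)\exp\Big(w_i\gamma_1 + \alpha^*(\beta_0 + \beta_1 x_{i1} + b_i) + \alpha\int_S \big[B_0(s) + x_{i1}B_1(s) + \sum_{k=1}^{2}\xi_{ik}\phi_k(s)\big]\, ds\Big)}{\alpha^*\beta_t + \alpha\int_S B_t(s)\, ds} \times \Big[\exp\Big(t \times \big(\alpha^*\beta_t + \alpha\int_S B_t(s)\, ds\big)\Big) - 1\Big]\right\},
$$

and thus,

$$
T_i^* = \frac{\log\left\{\dfrac{-\log(S_i(t)) \times \big(\alpha^*\beta_t + \alpha\int_S B_t(s)\, ds\big)}{h_0(t)\exp\Big(w_i\gamma_1 + \alpha^*(\beta_0 + \beta_1 x_{i1} + b_i) + \alpha\int_S \big[B_0(s) + x_{i1}B_1(s) + \sum_{k=1}^{2}\xi_{ik}\phi_k(s)\big]\, ds\Big)} + 1\right\}}{\alpha^*\beta_t + \alpha\int_S B_t(s)\, ds},
$$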

where Si(t) is simulated from a uniform distribution between 0 and 1. Censoring time is independently simulated from another uniform distribution to achieve a censoring rate of about 30%. Due to censoring, each subject has an average of 3 repeated measurements.
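
The inverse-transform step can be written compactly by collecting the time-constant part of the log-hazard into a single number a and the slope in t into c = α*βt + α∫_S Bt(s) ds, so that the hazard is h0·exp(a + ct). The sketch below uses hypothetical values for a, c, and the censoring window, and assumes c ≠ 0.

```python
import math
import random

def survival_function(t, h0, a, c):
    """S(t) = exp{-h0 * exp(a) * (exp(c t) - 1) / c} for hazard h0*exp(a + c t)."""
    return math.exp(-h0 * math.exp(a) * (math.exp(c * t) - 1.0) / c)

def survival_time(u, h0, a, c):
    """Solve S(T) = u for T (inverse-transform sampling), assuming c != 0."""
    inner = -math.log(u) * c / (h0 * math.exp(a)) + 1.0
    if inner <= 0.0:
        # Only possible when c < 0: the cumulative hazard is bounded and
        # this subject never experiences the event.
        return math.inf
    return math.log(inner) / c

rng = random.Random(42)
h0, a, c = 0.01, 0.2, 0.5             # a and c are hypothetical illustration values
observed = []
for _ in range(5):
    u = 1.0 - rng.random()            # uniform on (0, 1], avoids log(0)
    t_event = survival_time(u, h0, a, c)
    t_cens = rng.uniform(0.0, 20.0)   # hypothetical censoring window
    observed.append((min(t_event, t_cens), int(t_event <= t_cens)))
print(observed)
```

In practice the upper limit of the censoring distribution would be tuned until the empirical censoring rate is close to the 30% target.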

For estimation, the coefficient functions B0(s), Bt(s), B1(s), and the FPC eigenfunctions ϕk(s), k = 1, ⋯ , Kϕ, are expanded by a cubic B-spline basis with Kψ = 10. We set the number of estimated principal components $\hat{K}_\phi \in \{2, 3\}$. Note that when $\hat{K}_\phi = 3$ the number of estimated FPC components is larger than the number of true FPC components, which is held at Kϕ = 2. Model parameters are estimated for the C = 100 datasets using the methodology described in Section 3.2. Estimation and inference are based on posterior means and quantiles of 5000 iterations from the sampler after discarding the first 1000 as burn-in. Diagnostics performed on one simulated dataset indicate that these levels are sufficient for convergence and exploration of the full posterior distribution. On average, the computing process takes 1.1 hours for each simulated dataset on a personal computer (RAM 8G, CPU 3.30GHz). The ability to estimate the true coefficient is assessed by the average mean squared error (AMSE), e.g., $\mathrm{AMSE}(\hat{\beta}_1) = \frac{1}{100}\sum_{c=1}^{100}\big(\hat{\beta}_1^{(c)} - \beta_1\big)^2$, where $\hat{\beta}_1^{(c)}$ is the posterior mean from the cth dataset. Table 3 presents the AMSE, in addition to bias (the average of the posterior means minus the true values), standard error (SE, the square root of the average of the variance), standard deviation (SD, the standard deviation of the posterior mean), and coverage probabilities (CP) of 95% credible intervals when $\hat{K}_\phi = 2$. Table 3 suggests that the proposed model performs reasonably well, with relatively small bias and AMSE values.
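
The summary measures defined above can be computed from per-dataset posterior summaries as in the following sketch. The inputs in the usage example are made-up numbers for three datasets, not results from the study, and the function name `simulation_metrics` is hypothetical.

```python
import math
import statistics

def simulation_metrics(post_means, post_sds, lowers, uppers, truth):
    """Bias, AMSE, SE, SD and coverage across C simulated datasets.

    post_means, post_sds : posterior mean and posterior SD per dataset
    lowers, uppers       : limits of the 95% credible interval per dataset
    """
    bias = statistics.fmean(post_means) - truth
    amse = statistics.fmean([(m - truth) ** 2 for m in post_means])
    se = math.sqrt(statistics.fmean([s ** 2 for s in post_sds]))
    sd = statistics.stdev(post_means)
    cp = statistics.fmean([1.0 if lo <= truth <= hi else 0.0
                           for lo, hi in zip(lowers, uppers)])
    return {"bias": bias, "amse": amse, "se": se, "sd": sd, "cp": cp}

# Made-up posterior summaries for three datasets, true value 1.0.
metrics = simulation_metrics(
    post_means=[1.0, 1.2, 0.8],
    post_sds=[0.1, 0.1, 0.1],
    lowers=[0.8, 1.05, 0.6],
    uppers=[1.2, 1.4, 1.1],
    truth=1.0,
)
print({k: round(v, 3) for k, v in metrics.items()})
```

Comparing SE with SD is a quick check of whether the posterior spread matches the sampling variability of the point estimates.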

Table 3:

Parameter estimation in the simulation study based on 100 datasets when $\hat{K}_\phi = 2$.

Parameter    Bias      AMSE      SE      SD      CP
β0 = 4       <0.001    0.003     0.053   0.055   0.930
βt = 1       −0.003    <0.001    0.019   0.019   0.960
β1 = 0.5     <0.001    <0.001    0.004   0.003   0.980
γ1 = 0.76    −0.017    0.017     0.116   0.122   0.920
α* = 0.5     0.031     0.005     0.059   0.062   0.910
α = 0.3      −0.023    0.053     0.232   0.231   0.930

Figure 6 displays the coefficient functions estimated under $\hat{K}_\phi = 2$: the true coefficient functions (black solid lines), their mean estimated curves (red solid lines), and the 100 estimated curves based on each individual dataset (grey solid lines). The figures suggest that the estimated coefficient functions from the model are reasonably close to the true coefficient functions. The simulation results for the scenario $\hat{K}_\phi = 3$ are presented in Web Table 1 and Web Figure 2. They suggest that increasing $\hat{K}_\phi$ has limited effects on parameter estimation in either the longitudinal or the survival submodels. The bias and AMSE values increase slightly but are still relatively small. The estimated coefficient functions are reasonably close to the true coefficient functions.

Figure 6.

The estimates of the coefficient functions in the simulation study based on 100 runs and $\hat{K}_\phi = 2$. The red solid lines are mean estimated curves and the grey solid lines are 100 estimated curves based on each individual dataset.

For each testing dataset, we predict the subject-specific conditional survival probability at different time points t and windows Δt using the MCMC samples from the fitted model and the available measurements up to time t. Table 4 presents the time-dependent AUCs averaged over the separate analyses of the 100 datasets. The true AUCs are computed using the prespecified parameter values and random effects used to generate the data. The predicted AUCs are relatively close to the true AUCs, suggesting good prediction performance of the MFJM in terms of validation.

Table 4:

Areas under the ROC curve (AUC) for the simulation study.

Δt    t     True     AUC_t^{Δt}
5     0     0.831    0.830
5     5     0.811    0.810
5     10    0.829    0.827
10    0     0.847    0.846
10    5     0.875    0.875
10    10    0.824    0.823

6. Discussion

The proposed multivariate functional joint model (MFJM) is an important complement to both functional data analysis and joint modeling for longitudinal and survival data. Our model allows the longitudinal functional outcome, longitudinal scalar outcome, and survival outcome to be modeled simultaneously and can be applied to many areas of research when MRI data and clinical variables are collected longitudinally. We use the functional mixed effect model and functional principal component (FPC) analysis techniques to approximate the longitudinal functional data, and expand the coefficient functions and eigenfunctions in the model using a penalized spline approach. We then develop the process of making personalized dynamic prediction of future outcomes and risks of event of interest using both repeated functional and scalar outcomes.

The model inference is conducted using a Bayesian approach. One advantage of the Bayesian approach is the availability of Markov chain Monte Carlo (MCMC) sampling algorithms, which allow estimation from posterior probability density functions that are not analytically tractable and that require complex multi-dimensional integration over the random effects. The surge in MCMC sampling can be partly explained by the wide use of Bayesian computing languages, such as Stan, which eliminate the need for complex analytical derivation of the posterior distributions. Moreover, the Bayesian method is well suited for dynamic prediction using joint models. With the MCMC samples from the posterior distributions of the parameters for the original data, we can devise a simple simulation scheme to obtain Monte Carlo estimates of risk predictions and longitudinal trajectories [39, 47]. In addition, uncertainty about posterior parameter estimates is readily calculated from the MCMC output without the need for further complex derivations and calculations. This facilitates the calculation of the uncertainty intervals of functional coefficients and functional predictions.

Simulation indicates that the proposed Bayesian MFJM yields accurate inference and prediction. The application of our methodology to the motivating data yields novel insight into the effect of regional atrophy on AD progression. More importantly, the proposed dynamic prediction approach can utilize the functional and scalar predictors to make accurate predictions for new subjects. The inclusion of the longitudinal functional predictor hippocampal radial distance (HRD) in the survival submodel improves the predictive performance in the early phase of disease among MCI patients. When new measurements are available, the predictions can be updated with improved accuracy and efficiency. The practical impact of such dynamic prediction tools can be dramatic for neurodegenerative diseases (e.g., Alzheimer’s disease) because longitudinal functional data are increasingly collected in studies of these diseases. They provide unique insight and valuable guidance for clinical decision making on patient prognosis, targeted treatment, and targeted recruitment for clinical trials.

There are some limitations that we will address in the future. First, our method requires selecting the number of FPC components prior to analysis. Although we suggest a large number of FPC components, determining whether a selection is sufficient to describe the major variation in the functional data requires additional analyses with even larger values, which can incur considerable computational expense. Second, because we jointly model all parameters of interest in a Bayesian context, the computation time of the proposed Bayesian procedure could be a serious concern, particularly as the sample size, the dimension of the functional data, and the number of estimated principal components grow. Future work focusing on variational Bayes or other approximations may address the computational concern. Another option is to use a two-step method, which models the longitudinal functional outcome using functional principal component approaches and then plugs the estimated FPC scores into the survival model. These approaches are more computationally attractive and scalable than the joint modeling approach but may be accompanied by poorer inferential performance [32, 48]. It is worthwhile to investigate when these methods are a reasonable alternative to the joint analysis. For example, the two-step method may provide tools for choosing the dimension of the FPC components via rapid comparisons of different selections. Third, we assume a constant association function α(s) ≡ α to quantify the association between the functional outcome and the hazard. If the primary interest is prediction, we can allow α(s) to vary over the domain S, expand it using cubic B-spline basis functions, and estimate the spline coefficients. Another direct extension of the model is the inclusion of covariate-specific random effects, such as a random time-slope function, in the functional mixed effect model (2).
In this paper, we introduce the correlation between the scalar and functional outcomes through the correlation between the scalar random effects and the first FPC component derived from the random function. Accounting for the correlations between the random effects and the first few FPC components may represent the correlation between the scalar and functional outcomes more accurately, but may also lead to an increased computing burden. It is worthwhile to further explore other correlation structures, e.g., adopting the idea of a latent trait model [49, 50], especially when additional covariate-specific random effects are in the model. Moreover, the inverse-gamma prior distribution, which we used for variance parameters in the model inference, can be sensitive to the choice of the hyperparameters (shape and scale) in cases where variance estimates are close to zero [51]. We tested the model inference on one training dataset from our application study using uniform and half-Cauchy prior distributions instead, and achieved reasonably similar results. We suggest that further tests are needed in other applications and datasets on a case-by-case basis. We would like to investigate the effect of these extensions and address the limitations to improve predictive performance in our future research.

Acknowledgements

Sheng Luo’s research was supported by the National Institute of Neurological Disorders and Stroke under Award Number R01NS091307. The authors acknowledge the Texas Advanced Computing Center (TACC) for providing high-performing computing resources. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this article. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • [1].Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Green RC, Harvey D, Jack CR, Jagust W, Liu E, et al. , The alzheimer’s disease neuroimaging initiative: a review of papers published since its inception, Alzheimer’s & Dementia 9 (5) (2013) e111–e194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E, Mild cognitive impairment: clinical characterization and outcome, Archives of Neurology 56 (3) (1999) 303–308. [DOI] [PubMed] [Google Scholar]
  • [3].Perrin RJ, Fagan AM, Holtzman DM, Multi-modal techniques for diagnosis and prognosis of Alzheimers disease, Nature 461 (7266) (2009) 916–922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Schmand B, Huizenga HM, Gool W. A. v., Meta-analysis of CSF and MRI biomarkers for detecting preclinical Alzheimer’s disease, Psychological Medicine 40 (1) (2010) 135–145. [DOI] [PubMed] [Google Scholar]
  • [5].Du AT, Magnetic resonance imaging of the entorhinal cortex and hippocampus in mild cognitive impairment and Alzheimer’s disease, Journal of Neurology, Neurosurgery & Psychiatry 71 (4) (2001) 441–447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Frisoni GB, Fox NC, Jack CR, Scheltens P, Thompson PM, The clinical use of structural mri in alzheimer disease, Nature Reviews Neurology 6 (2) (2010) 67–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Apostolova LG, Mosconi L, Thompson PM, Green AE, Hwang KS, Ramirez A, Mistur R, Tsui WH, de Leon MJ, Subregional hippocampal atrophy predicts alzheimer’s dementia in the cognitively normal, Neurobiology of Aging 31 (7) (2010) 1077–1088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Qiu A, Fennema-Notestine C, Dale AM, Miller MI, Initiative ADN, et al. , Regional shape abnormalities in mild cognitive impairment and alzheimer’s disease, NeuroImage 45 (3) (2009) 656–661. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Li K, Luo S, Functional joint model for longitudinal and time-to-event data: an application to alzheimer’s disease, Statistics in Medicine 36 (22) (2017) 3560–3572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Li K, Luo S, Dynamic predictions in Bayesian functional joint models for longitudinal and time-to-event data: An application to Alzheimers disease, Statistical Methods in Medical Research OnlineFirst. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Di C-Z, Crainiceanu CM, Caffo BS, Punjabi NM, Multilevel functional principal component analysis, The Annals of Applied Statistics 3 (1) (2009) 458–488. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Greven S, Crainiceanu C, Caffo B, Reich D, Longitudinal functional principal component analysis, Electronic Journal of Statistics 4 (2010) 1022–1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Park SY, Staicu A-M, Longitudinal functional data analysis, Stat 4 (1) (2015) 212–226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Brumback BA, Rice JA, Smoothing spline models for the analysis of nested and crossed samples of curves, Journal of the American Statistical Association 93 (443) (1998) 961–976. [Google Scholar]
  • [15].Guo W, Functional mixed effects models, Biometrics 58 (1) (2002) 121–128. doi:10.1111/j.0006-341X.2002.00121.x. [DOI] [PubMed] [Google Scholar]
  • [16].Morris JS, Carroll RJ, Wavelet-based functional mixed models, Journal of the Royal Statistical Society. Series B, Statistical Methodology 68 (2) (2006) 179–199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Goldsmith J, Kitago T, Assessing systematic effects of stroke on motor control by using hierarchical function-on-scalar regression, Journal of the Royal Statistical Society: Series C (Applied Statistics) 65 (2) (2016) 215–236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Tsiatis AA, Davidian M, Joint modeling of longitudinal and time-to-event data: an overview, Statistica Sinica 14 (3) (2004) 809–834. [Google Scholar]
  • [19].Henderson R, Diggle P, Dobson A, Joint modelling of longitudinal measurements and event time data, Biostatistics 1 (4) (2000) 465–480. [DOI] [PubMed] [Google Scholar]
  • [20].Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R, Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues, BMC Medical Research Methodology 16 (2016) 117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Li K, Chan W, Doody RS, Quinn J, Luo S, Prediction of conversion to Alzheimers disease with longitudinal measures and time-to-event data, Journal of Alzheimer’s Disease 58 (2) (2017) 361–371. doi:10.3233/JAD-161201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Huang M, Yang W, Feng Q, Chen W, Initiative ADN, et al. , Longitudinal measurement and hierarchical classification framework for the prediction of Alzheimers disease, Scientific Reports 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Patenaude B, Smith SM, Kennedy DN, Jenkinson M, A bayesian model of shape and appearance for subcortical brain segmentation, NeuroImage 56 (3) (2011) 907–922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM, Fsl, NeuroImage 62 (2) (2012) 782–790. [DOI] [PubMed] [Google Scholar]
  • [25].Chen KHM, Chuah LYM, Sim SKY, Chee MWL, Hippocampal region-specific contributions to memory performance in normal elderly, Brain and Cognition 72 (3) (2010) 400–407. [DOI] [PubMed] [Google Scholar]
  • [26].Gomar JJ, Bobes-Bascaran MT, Conejero-Goldberg C, Davies P, Goldberg TE, Alzheimer’s Disease Neuroimaging Initiative, Utility of combinations of biomarkers, cognitive markers, and risk factors to predict conversion from mild cognitive impairment to alzheimer disease in patients in the alzheimer’s disease neuroimaging initiative, Archives of General Psychiatry 68 (9) (2011) 961-969. [DOI] [PubMed] [Google Scholar]
  • [27].Ruppert D, Wand MP, Carroll RJ, Semiparametric regression, no. No.12, Cambridge University Press, 2003. [Google Scholar]
  • [28].Crainiceanu CM, Goldsmith AJ, Bayesian Functional Data Analysis Using WinBUGS, Journal of statistical software 32 (11). [PMC free article] [PubMed] [Google Scholar]
  • [29].Morris JS, Functional regression, Annual Review of Statistics and Its Application 2 (1) (2015) 321–359. [Google Scholar]
  • [30].Ruppert D, Selecting the number of knots for penalized splines, Journal of Computational and Graphical Statistics 11 (4) (2002) 735–757. [Google Scholar]
  • [31].Faucett CL, Thomas DC, Simultaneously Modelling Censored Survival Data and Repeatedly Measured Covariates: A Gibbs Sampling Approach, Statistics in Medicine 15 (15) (1996) 1663–1685. [DOI] [PubMed] [Google Scholar]
  • [32].Crainiceanu CM, Staicu A-M, Di C-Z, Generalized multilevel functional regression, Journal of the American Statistical Association 104 (488) (2009) 1550–1561. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Liu F, Li Q, A Bayesian model for joint analysis of multivariate repeated measures and time to event data in crossover trials, Statistical Methods in Medical Research 25 (5) (2016) 2180–2192. [DOI] [PubMed] [Google Scholar]
  • [34].Lang S, Brezger A, Bayesian P-Splines, Journal of Computational and Graphical Statistics 13 (1) (2004) 183–212. [Google Scholar]
  • [35].Neal RM, et al. , Mcmc using hamiltonian dynamics, Handbook of Markov Chain Monte Carlo 2 (2011) 113–162. [Google Scholar]
  • [36].Hoffman MD, Gelman A, The no-u-turn sampler: adaptively setting path lengths in hamiltonian monte carlo., Journal of Machine Learning Research 15 (1) (2014) 1593–1623. [Google Scholar]
  • [37].Gelman A, Carlin JB, Stern HS, Rubin DB, Bayesian Data Analysis, CRC Press, 2013. [Google Scholar]
  • [38].Gilks WR, Best NG, Tan KKC, Adaptive rejection Metropolis sampling within Gibbs sampling, Journal of the Royal Statistical Society. Series C (Applied Statistics) 44 (4) (1995) 455–472. [Google Scholar]
  • [39].Rizopoulos D, Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data, Biometrics 67 (3) (2011) 819–829. [DOI] [PubMed] [Google Scholar]
  • [40].Li L, Greene T, Hu B, A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data, Statistical Methods in Medical Research OnlineFirst. doi:10.1177/0962280216680239. [DOI] [PubMed] [Google Scholar]
  • [41].Risacher SL, Saykin AJ, West JD, Shen L, Firpi HA, McDonald BC, Baseline MRI predictors of conversion from MCI to probable AD in the ADNI cohort, Current Alzheimer Research 6 (4) (2009) 347–361.
  • [42].Cui Y, Liu B, Luo S, Zhen X, Fan M, Liu T, Zhu W, Park M, Jiang T, Jin JS, et al., Identification of conversion from mild cognitive impairment to Alzheimer’s disease using multivariate predictors, PLOS ONE 6 (7) (2011) e21896.
  • [43].Li S, Okonkwo O, Albert M, Wang M-C, Variation in variables that predict progression from MCI to AD dementia over duration of follow-up, American Journal of Alzheimer’s Disease 2 (1) (2013) 12–28.
  • [44].Jack CR Jr, Knopman DS, Jagust WJ, Shaw LM, Aisen PS, Weiner MW, Petersen RC, Trojanowski JQ, Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade, The Lancet Neurology 9 (1) (2010) 119–128.
  • [45].Li S, Okonkwo O, Albert M, Wang M-C, Variation in variables that predict progression from MCI to AD dementia over duration of follow-up, American Journal of Alzheimer’s Disease (Columbia, Mo.) 2 (1) (2013) 12–28.
  • [46].Corder E, Saunders A, Strittmatter W, Schmechel D, Gaskell P, Small G, Roses AD, Haines J, Pericak-Vance MA, Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families, Science 261 (5123) (1993) 921–923.
  • [47].Rizopoulos D, Hatfield LA, Carlin BP, Takkenberg JJ, Combining dynamic predictions from joint models for longitudinal and time-to-event data using Bayesian model averaging, Journal of the American Statistical Association 109 (508) (2014) 1385–1397.
  • [48].Goldsmith J, Zipunnikov V, Schrack J, Generalized multilevel function-on-scalar regression and principal component analysis, Biometrics 71 (2) (2015) 344–353.
  • [49].Dunson DB, Dynamic latent trait models for multidimensional longitudinal data, Journal of the American Statistical Association 98 (463) (2003) 555–563.
  • [50].Wang J, Luo S, Li L, Dynamic prediction for multiple repeated measures and event time data: An application to Parkinson’s disease, The Annals of Applied Statistics 11 (3) (2017) 1787.
  • [51].Gelman A, Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper), Bayesian Analysis 1 (3) (2006) 515–534.
