ABSTRACT
We develop an integrative joint model for multivariate sparse functional and survival data to analyze Alzheimer's disease (AD) across multiple studies. To address missing‐by‐design outcomes in multi‐cohort studies, our approach extends the multivariate functional mixed model (MFMM), which integrates longitudinal outcomes to extract shared disease progression trajectories and links these outcomes to time‐to‐event data through a parsimonious survival model. This framework balances flexibility and interpretability by modeling shared progression trajectories while accommodating cohort‐specific mean functions and survival parameters. For efficient estimation, we incorporate penalized splines into an EM algorithm. Application to three AD cohorts demonstrates the model's ability to capture disease trajectories and account for inter‐cohort variability. Simulation studies confirm its robustness and accuracy, highlighting its value in advancing the understanding of AD progression and supporting clinical decision‐making in multi‐cohort settings.
Keywords: EM algorithm, functional data, multivariate longitudinal data, penalized splines
1. Introduction
Alzheimer's disease (AD) is a progressive brain disorder that significantly impairs cognitive and behavioral functions. In the United States, AD is the fifth leading cause of death among people 65 years and older, with an estimated 6.9 million affected in 2024 [1]. The increasing prevalence of AD has led to the establishment of large‐scale longitudinal studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) [2], the National Alzheimer's Coordinating Center (NACC) [3], and the Religious Orders Study and Rush Memory and Aging Project (ROSMAP) [4]. These studies collect various longitudinal data, including neuropsychological and behavioral measurements, to monitor disease trajectories and assess risk factors over time (Table 1). Such measurements are critical for understanding disease progression and its associations with survival outcomes (e.g., dementia onset), ultimately informing clinical interventions and policy.
TABLE 1.
List of longitudinal outcomes and number of Mild Cognitive Impairment (MCI) subjects in three AD cohorts.
| Order | Longitudinal outcome | Scaling coefficient | ADNI | NACC | ROSMAP |
|---|---|---|---|---|---|
| 1 | MMSE | 1.00 | | | |
| 2 | WMSLM | 1.54 | | | |
| 3 | RAVLT | 3.68 | | | |
| 4 | SDMT | 3.65 | | | |
| 5 | CDR‐SB | −0.60 | | | |
| 6 | ADAS | −2.70 | | | |
| 7 | FAQ | −1.84 | | | |
| 8 | TRAILA | −1.08 | | | |
| | Number of MCI subjects | | 715 | 3707 | 522 |
Note: , data available; , data unavailable.
Abbreviations: ADAS, Alzheimer's Disease Assessment Scale‐Cognitive; CDR‐SB, Clinical Dementia Rating Sum of Boxes; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Exam; RAVLT, Rey Auditory Verbal Learning Test immediate recall; SDMT, Symbol Digit Modalities Test; TRAILA, Trail Making Test Part A; WMSLM, Wechsler Memory Scale Logical Memory.
The analysis of data from multiple cohorts of AD presents significant opportunities and challenges. Pooling data from multiple cohorts increases the sample size and statistical power, enabling the investigation of complex risk and protective factors, including nonlinear and time‐varying effects. Additionally, multi‐cohort analysis improves the robustness and generalizability of predictive models across diverse subgroups, settings, and countries, making findings more clinically applicable. However, heterogeneity in baseline characteristics, study designs, and data collection protocols complicates integration, particularly when some longitudinal outcomes are systematically missing by design (Table 1). Modeling longitudinal outcomes in this context requires methods capable of capturing nonlinear subject‐specific trajectories while accounting for cohort‐specific differences. Traditional parametric models often fail to address these complexities, especially in the presence of missing‐by‐design outcomes.
Joint models (JMs) are widely used for analyzing longitudinal and survival data [5] and have been extended to handle multiple longitudinal outcomes [6, 7, 8]. However, traditional JMs often rely on parametric assumptions for the longitudinal sub‐model, which can be overly restrictive and prone to misspecification in the presence of intricate nonlinear trajectories. To address this, longitudinal outcomes can alternatively be modeled as sparse functional data [9, 10], leveraging flexible nonparametric methods [11] to capture complex subject‐specific patterns over time. Joint models of survival and sparse functional data are termed functional joint models (FJMs) [12, 13]. Recent advances have extended FJMs to incorporate multivariate sparse functional data [8, 14, 15, 16], multi‐dimensional sparse functional data [17], recurrent event data [18], and imaging data [19]. Despite the successes of FJMs, their estimation remains a major challenge, and many existing works adopt a computationally more feasible two‐stage estimation, often at the expense of substantial bias. This challenge becomes more severe when developing an FJM to address systematic missingness and inter‐cohort heterogeneity in multi‐cohort studies.
Motivated by these challenges, this paper develops an integrative FJM for multivariate sparse functional data and survival data from multiple cohorts, accompanied by an efficient and feasible estimation procedure. First, to accommodate missing‐by‐design outcomes while leveraging shared information across cohorts in the longitudinal sub‐model, we decompose longitudinal outcomes into shared and outcome‐specific disease trajectories as in the multivariate functional mixed model (MFMM) [14], which balances model flexibility and interpretability and thereby enables comprehensive insights into disease processes. Crucially, we assume consistent variation patterns across cohorts in modeling the shared and outcome‐specific disease trajectories, allowing the integration of longitudinal data from diverse sources while maintaining a parsimonious survival model. Our approach thus integrates longitudinal and survival data through shared latent trajectories, offering a unified framework that leverages all available information to improve predictive accuracy. Second, to address cohort‐specific variability, we incorporate cohort‐specific mean functions in the longitudinal sub‐model and cohort‐specific regression coefficients in the survival sub‐model. As shown in the application to AD studies, the integrative model is capable of revealing insights that cannot be identified by a single‐cohort analysis. These modeling choices balance leveraging shared information across cohorts with accommodating their inherent heterogeneity.
To estimate the proposed multi‐cohort FJM, we develop an efficient and computationally feasible expectation‐maximization (EM) algorithm, another major contribution of the paper. FJMs often have multiple nonparametric smooth functions (more than 10 in our model) to estimate, and their estimation makes EM algorithms computationally very challenging. Indeed, regression splines were previously used [14], for which overfitting may not only affect estimates but also result in slow convergence of EM. Notice that overfitting is almost inevitable in joint models due to the right truncation of longitudinal outcomes by the survival outcome. Therefore, we propose to use penalized splines [20] to estimate nonparametric smooth functions in the longitudinal sub‐model, allowing for the modeling of complex nonlinear relationships while reducing overfitting risks associated with regression splines. Penalized splines also allow for varying smoothness across functions, accommodating diverse data structures and trajectories within and across cohorts. For example, Figure 3 in the data application shows that penalized splines provide not only highly nonlinear estimates but also essentially linear estimates. However, selecting smoothing parameters for multiple nonparametric functions presents great computational challenges, particularly in iterative EM algorithms. A fast algorithm for the global selection of smoothing parameters in nonparametric regression such as in generalized additive models [21] is inapplicable for FJMs because it can only deal with regression functions while for FJMs, nonparametric smooth functions (eigenfunctions) are also used to model variances in functional data models. To address this, we propose using a local selection of smoothing parameters, exploiting the iterative nature of EM algorithms. Specifically, at each iterative step of EM, we reformulate the estimation of each smooth function as a weighted least squares problem for nonparametric regression. 
This reformulation enables efficient simultaneous estimation of smooth functions and smoothing parameters using standard software, such as the well‐developed mgcv R package. This approach stabilizes parameter estimation and substantially accelerates EM convergence, making it feasible for multi‐cohort studies such as AD studies.
FIGURE 3.

Estimated cohort‐specific mean trajectories of longitudinal outcomes for ADNI (blue lines), NACC (red lines) and ROSMAP (green lines). Columns 1 and 3 compare multi‐cohort FJM (solid lines) with separate FJM (dashed lines), while columns 2 and 4 show results from the parametric MJM.
The remainder of this paper is organized as follows. Section 2 introduces the proposed multi‐cohort functional joint model. Section 3 details model estimation using the Monte Carlo Expectation‐Maximization (EM) algorithm. Section 4 outlines the selection of key model parameters, such as the principal components and the smoothing parameters. In Section 5, we apply the model to three AD cohorts, demonstrating its ability to capture shared and cohort‐specific disease progression patterns. Section 6 evaluates the performance of the model through simulation studies, and Section 7 concludes with a discussion. The R code for implementing the proposed method is available at https://github.com/wenyiwang2000/Multi‐Cohort‐FJM.
2. The Multi‐Cohort Functional Joint Model
2.1. Data Structure and Notation
As shown in Table 1, the longitudinal outcomes measured in each cohort differ, with some outcomes observed across all cohorts and others specific to one or two cohorts. Let $N$ be the total number of subjects across all cohorts, indexed $i = 1, \dots, N$, and let $K$ be the number of cohorts. The mapping $c(i) \in \{1, \dots, K\}$ indicates the cohort to which subject $i$ belongs. Let $J$ denote the total number of longitudinal outcomes and $\mathcal{J}_c \subseteq \{1, \dots, J\}$ the set of outcomes observed in cohort $c$. For the multi‐cohort AD studies and their outcomes listed in Table 1, $K = 3$ and $J = 8$, with the sets $\mathcal{J}_c$ for ADNI, NACC, and ROSMAP given by the outcomes marked as available in Table 1. For subject $i$ in cohort $c(i)$ and outcome $j \in \mathcal{J}_{c(i)}$, let $Y_{ijr}$ denote the observation of the $j$th longitudinal outcome at time $t_{ijr}$, with $r = 1, \dots, m_{ij}$ and $m_{ij}$ the number of observations. Observational times are assumed to be within a compact interval $\mathcal{T}$, representing the study follow‐up period.
The survival outcome, representing the time from baseline to the onset of AD dementia, is denoted by $T_i^*$ for subject $i$. Because $T_i^*$ may be right‐censored, we observe $T_i = \min(T_i^*, C_i)$, where $C_i$ is the censoring time, assumed independent of both the event time and the longitudinal outcomes. The binary event indicator $\Delta_i = I(T_i^* \le C_i)$ specifies whether $T_i^*$ is observed. Longitudinal observation times are restricted to be no later than $T_i$, implying no observations of longitudinal outcomes after $T_i$. Finally, let $\mathbf{Z}_i$ denote a common set of baseline covariates collected across all cohorts, including demographic and clinical characteristics such as age, sex, and APOE genotype, which are relevant to AD progression.
2.2. Multivariate Functional Mixed Model for Multi‐Cohort Longitudinal Data
We model multiple longitudinal outcomes as multivariate functional data and extend the multivariate functional mixed model (MFMM) [14] to multiple cohorts:
$$
Y_{ij}(t) = X_{ij}(t) + \epsilon_{ij}(t), \qquad X_{ij}(t) = \mu_{c(i)j}(t) + \beta_j \{ U_i(t) + W_{ij}(t) \}, \tag{1}
$$

where $X_{ij}(t)$ is a smooth latent stochastic process, $\mu_{cj}(t)$ is the fixed mean function for the $j$th outcome in cohort $c$, the random process $U_i(t)$ captures shared variation patterns and induces correlations among multiple outcomes for the $i$th subject, $W_{ij}(t)$ is the deviation of the $j$th outcome from $U_i(t)$ for the $i$th subject, $\beta_j$ is the outcome‐specific scaling parameter for the $j$th outcome, and $\epsilon_{ij}(t)$ is measurement error with variance $\sigma_j^2$. These components jointly capture both shared disease progression and individual variations in AD longitudinal outcomes. It is assumed that $U_i(t)$ is independent across subjects. The outcome‐specific deviation $W_{ij}(t)$ and the measurement error $\epsilon_{ij}(t)$ are assumed to be independent of $U_i(t)$, and they are mutually independent across subjects and outcomes.
The shared latent trajectory $U_i(t)$ captures overall AD progression, reflecting variation patterns shared among outcomes. Modeled as a Gaussian process with zero mean and covariance $C_U(s, t)$, $U_i(t)$ quantifies the overall trajectory of disease progression. Outcome‐specific variations are captured by $W_{ij}(t)$, another Gaussian process with covariance $C_W(s, t)$. The scaling coefficients $\beta_j$ adjust for differences in outcome magnitudes, ensuring comparability. Identifiability is ensured by fixing $\beta_j = 1$ for a sentinel outcome $j$, such as MMSE or WMS. Note that different scaling parameters might be used for $U_i(t)$ and $W_{ij}(t)$, but we found that such flexibility is unnecessary for our data application.
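The covariance function of the shared latent trajectory is decomposed using eigendecomposition as $C_U(s, t) = \sum_{l \ge 1} \lambda_l \phi_l(s) \phi_l(t)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues, and $\phi_l$ are the associated orthonormal eigenfunctions, satisfying $\int_{\mathcal{T}} \phi_l(t) \phi_{l'}(t)\,dt = I(l = l')$. Similarly, the covariance of the outcome‐specific process is decomposed as $C_W(s, t) = \sum_{l \ge 1} \nu_l \psi_l(s) \psi_l(t)$, where $\nu_1 \ge \nu_2 \ge \cdots \ge 0$ are the eigenvalues and $\psi_l$ are orthonormal eigenfunctions that satisfy $\int_{\mathcal{T}} \psi_l(t) \psi_{l'}(t)\,dt = I(l = l')$.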
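To facilitate estimation and interpretation, we assume that both $U_i(t)$ and $W_{ij}(t)$ can be represented by a finite number of functional principal components (FPC). Specifically, $U_i(t) = \sum_{l=1}^{L_U} \xi_{il} \phi_l(t)$ with FPC scores $\xi_{il}$, and $W_{ij}(t) = \sum_{l=1}^{L_W} \zeta_{ijl} \psi_l(t)$ with FPC scores $\zeta_{ijl}$. Here, the parameters $L_U$ and $L_W$ represent the number of principal components retained for shared and outcome‐specific processes, respectively, and are selected using data‐driven criteria such as cross‐validation or AIC/BIC. Finally, measurement errors are assumed to be independently distributed as $\epsilon_{ij}(t) \sim N(0, \sigma_j^2)$. Given the eigenfunctions $\phi_l$ and $\psi_l$, Model (1) can be expressed as a mixed model

$$
Y_{ij}(t) = \mu_{c(i)j}(t) + \beta_j \Big\{ \sum_{l=1}^{L_U} \xi_{il} \phi_l(t) + \sum_{l=1}^{L_W} \zeta_{ijl} \psi_l(t) \Big\} + \epsilon_{ij}(t).
$$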
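As an illustration of the mixed‐model form, the following minimal sketch simulates one subject's data as mean plus scaling coefficient times (shared plus outcome‐specific trajectory) plus noise. The eigenfunctions `phi` and `psi`, the mean functions, and all numerical values are hypothetical choices made for the example, not quantities estimated in this paper.

```python
import math
import random

# Hypothetical orthonormal eigenfunctions on [0, 1] (illustrative only).
def phi(l, t):
    """Shared eigenfunctions: 1, sqrt(2) cos(pi t), sqrt(2) cos(2 pi t), ..."""
    return 1.0 if l == 0 else math.sqrt(2.0) * math.cos(l * math.pi * t)

def psi(l, t):
    """Outcome-specific eigenfunctions: sqrt(2) sin(pi t), sqrt(2) sin(2 pi t), ..."""
    return math.sqrt(2.0) * math.sin((l + 1) * math.pi * t)

def simulate_subject(times, outcomes, mean_fn, beta, lam, nu, sigma, rng):
    """Draw one subject's outcomes from the mixed-model form of Model (1).

    outcomes: indices of the outcomes collected in the subject's cohort
    mean_fn[j]: cohort-specific mean function of outcome j
    beta[j]: scaling coefficient; lam, nu: eigenvalues (score variances)
    sigma[j]: measurement-error standard deviation
    """
    xi = [rng.gauss(0.0, math.sqrt(v)) for v in lam]        # shared scores
    data = {}
    for j in outcomes:                                      # only outcomes observed by design
        zeta = [rng.gauss(0.0, math.sqrt(v)) for v in nu]   # outcome-specific scores
        ys = []
        for t in times:
            u = sum(x * phi(l, t) for l, x in enumerate(xi))
            w = sum(z * psi(l, t) for l, z in enumerate(zeta))
            ys.append(mean_fn[j](t) + beta[j] * (u + w) + rng.gauss(0.0, sigma[j]))
        data[j] = ys
    return xi, data

rng = random.Random(1)
mean_fn = {0: lambda t: 27.0 - 2.0 * t, 4: lambda t: 1.5 + 1.0 * t}
beta = {0: 1.0, 4: -0.6}      # the sentinel outcome has coefficient 1
sigma = {0: 0.5, 4: 0.3}
xi, data = simulate_subject([0.0, 0.5, 1.0], [0, 4], mean_fn, beta,
                            [1.0, 0.25], [0.2], sigma, rng)
```

Because the shared scores are drawn once per subject while the outcome‐specific scores are drawn per outcome, a cohort that collects only a subset of outcomes still informs the same shared scores.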
We make a few remarks about Model (1). First, heterogeneity across cohorts is modeled via the outcome‐ and cohort‐specific fixed mean functions, ensuring flexibility in modeling cohort‐specific baseline differences. Second, the proposed model integrates the cohorts by imposing shared variation patterns across the outcomes, allowing the shared scores $\xi_{il}$ and the outcome‐specific scores $\zeta_{ijl}$ to be directly comparable among subjects across cohorts. This comparability facilitates a parsimonious survival model that integrates information across cohorts, even when each cohort collects different sets of longitudinal outcomes. Using shared latent trajectories and cohort‐specific mean functions, MFMM naturally accommodates missing‐by‐design data, allowing outcomes observed in only a subset of cohorts to contribute to the model. Indeed, Proposition 1 below shows that MFMM is identifiable under mild conditions. Figure 1 illustrates this decomposition by plotting the estimated components for two outcomes, MMSE and CDR‐SB, for one subject in the ADNI study and another subject in the NACC study, highlighting the model's ability to separate shared and outcome‐specific variations.
FIGURE 1.

Estimated components of the multi‐cohort MFMM in Model (1) for two outcomes: MMSE and CDR‐SB for one subject from the ADNI study (top two rows) and one subject from the NACC study (bottom two rows). Columns represent the following: (1) observed MMSE/CDR‐SB values (black dots) and estimated latent processes $X_{ij}(t)$; (2) cohort‐specific mean functions $\mu_{c(i)j}(t)$; (3) shared latent disease trajectory $U_i(t)$ scaled by coefficient $\beta_j$; (4) outcome‐specific deviations $W_{ij}(t)$ scaled by $\beta_j$.
Proposition 1
Under model (1), suppose that . Assume that and . Then, Model (1) is identifiable. Specifically, the model parameters and are identifiable.
The proof of Proposition 1 is similar to that of identifiability for the original MFMM for one cohort [14] and hence is omitted. The condition ensures the presence of shared variation among outcomes, while fixing anchors the scaling of the shared latent trajectories. These conditions are easily satisfied in AD studies because all cohorts collect the MMSE and WMS outcomes, one of which can serve as the sentinel outcome with . Moreover, the conditions in Proposition 1 can be relaxed. For example, it is not a necessary condition to have a single or multiple outcomes common to all cohorts, demonstrating the model's flexibility in accommodating different outcome collection across cohorts.
2.3. Joint Model for Longitudinal and Survival Data
Let $h_i(t)$ be the hazard function for the $i$th subject. In addition, let $\boldsymbol{\xi}_i = (\xi_{i1}, \dots, \xi_{iL_U})^\top$ be the vector of scores corresponding to the shared latent trajectory among the outcomes. We consider the following cohort‐specific proportional hazards model,

$$
h_i(t) = h_0(t) \exp\{ \boldsymbol{\gamma}_{c(i)}^\top \mathbf{Z}_i + \boldsymbol{\alpha}_{c(i)}^\top \boldsymbol{\xi}_i \}, \tag{2}
$$

where $h_0(t)$ is the baseline hazard function, $\boldsymbol{\gamma}_c$ is the coefficient vector for the baseline covariates $\mathbf{Z}_i$ in cohort $c$, and $\boldsymbol{\alpha}_c$ is the coefficient vector associated with the shared latent score for cohort $c$. The baseline hazard may be specified parametrically or estimated nonparametrically. The cohort‐specific coefficients $\boldsymbol{\gamma}_c$ account for baseline differences in covariate effects among cohorts, while $\boldsymbol{\alpha}_c$ captures the influence of the shared latent trajectory on survival outcomes. This hazard model is parsimonious and applicable across cohorts because it incorporates longitudinal outcomes via shared random scores $\boldsymbol{\xi}_i$, enabling integration across cohorts without dependence on the specific set of outcomes collected. Using shared latent scores, the model avoids the complexity of cohort‐specific dependence on collected outcomes, reducing dimensionality while maintaining interpretability and statistical power.
Although the proportional hazards assumption is used here, the framework can be extended to accommodate time‐varying effects or more complex survival models, providing additional flexibility for diverse applications. Moreover, although we employ the random effects model as the linking function between longitudinal outcomes and survival data, other functional forms can also be utilized. For example, derivative forms incorporating the rate of change in the shared latent trajectory, cumulative forms accounting for historical influence, or lag models emphasizing the weighted impact of recent history, could offer alternative approaches [16]. These extensions provide further flexibility to capture nuanced relationships between longitudinal and survival data, enhancing the applicability of the model across various contexts.
2.4. Likelihood of Joint Model
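Let $\mathbf{Y}_{ij}$ be the column vector of the $m_{ij}$ observations for the $j$th outcome of the $i$th subject. Let $\boldsymbol{\xi}_i$ be the vector of FPC scores corresponding to the shared latent trajectory, with a diagonal covariance matrix $\boldsymbol{\Lambda}_U = \mathrm{diag}(\lambda_1, \dots, \lambda_{L_U})$. Similarly, let $\boldsymbol{\zeta}_{ij}$ be the vector of FPC scores corresponding to the outcome‐specific trajectory, with a diagonal covariance matrix $\boldsymbol{\Lambda}_W = \mathrm{diag}(\nu_1, \dots, \nu_{L_W})$. Define $\boldsymbol{\phi}(t) = (\phi_1(t), \dots, \phi_{L_U}(t))^\top$ and let $\boldsymbol{\Phi}_{ij}$ be the $m_{ij} \times L_U$ matrix whose rows are $\boldsymbol{\phi}(t)^\top$ evaluated at the observation times, representing the eigenfunctions for the shared latent trajectory. Similarly, let $\boldsymbol{\psi}(t) = (\psi_1(t), \dots, \psi_{L_W}(t))^\top$ and $\boldsymbol{\Psi}_{ij}$ represent the eigenfunctions for the outcome‐specific trajectory.
The predicted values for the $j$th outcome of the $i$th subject are given by $\mathbf{X}_{ij} = \boldsymbol{\mu}_{ij} + \beta_j (\boldsymbol{\Phi}_{ij} \boldsymbol{\xi}_i + \boldsymbol{\Psi}_{ij} \boldsymbol{\zeta}_{ij})$, where $\boldsymbol{\mu}_{ij}$ is the mean function $\mu_{c(i)j}$ evaluated at the observation times. The predicted values provide the functional link between latent FPC scores and observed longitudinal data, capturing the modeled relationship between shared and outcome‐specific trajectories. Concatenate the vectors $\mathbf{Y}_{ij}$, $j \in \mathcal{J}_{c(i)}$, into a long column vector $\mathbf{Y}_i$ and define $\mathbf{X}_i$ similarly. Let $\boldsymbol{\theta}_i$ be the collection of all FPC scores $\boldsymbol{\xi}_i$ and $\boldsymbol{\zeta}_{ij}$, $j \in \mathcal{J}_{c(i)}$ (treated as random effects in the likelihood framework). Finally, denote the covariance matrix of the measurement errors by $\boldsymbol{\Sigma}_i$. The conditional likelihood of the multivariate longitudinal data given $\boldsymbol{\theta}_i$ is

$$
f(\mathbf{Y}_i \mid \boldsymbol{\theta}_i) = (2\pi)^{-m_i/2} |\boldsymbol{\Sigma}_i|^{-1/2} \exp\Big\{ -\tfrac{1}{2} (\mathbf{Y}_i - \mathbf{X}_i)^\top \boldsymbol{\Sigma}_i^{-1} (\mathbf{Y}_i - \mathbf{X}_i) \Big\},
$$

where $m_i = \sum_{j \in \mathcal{J}_{c(i)}} m_{ij}$.
FPC scores (random effects) $\boldsymbol{\xi}_i$ and $\boldsymbol{\zeta}_{ij}$ are assumed to follow multivariate normal distributions: $\boldsymbol{\xi}_i \sim N(\mathbf{0}, \boldsymbol{\Lambda}_U)$ and $\boldsymbol{\zeta}_{ij} \sim N(\mathbf{0}, \boldsymbol{\Lambda}_W)$.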
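The conditional partial likelihood of the time‐to‐event data is given by

$$
f(T_i, \Delta_i \mid \boldsymbol{\xi}_i) = \big[ h_0(T_i) \exp\{\eta_i\} \big]^{\Delta_i} \exp\big[ -H_0(T_i) \exp\{\eta_i\} \big], \tag{3}
$$

where $\eta_i = \boldsymbol{\gamma}_{c(i)}^\top \mathbf{Z}_i + \boldsymbol{\alpha}_{c(i)}^\top \boldsymbol{\xi}_i$ is the linear predictor of Model (2) and $H_0(t) = \int_0^t h_0(s)\,ds$ is the baseline cumulative hazard function.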
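To make the survival contribution concrete, the sketch below evaluates the log of a proportional hazards likelihood for a single subject, with a linear predictor built from cohort‐specific coefficients for the baseline covariates and the shared scores. The constant baseline hazard and all numerical values are assumptions chosen purely for illustration; the function and argument names are hypothetical.

```python
import math

def log_surv_lik(time, event, z, xi, gamma, alpha, base_haz, base_cumhaz):
    """Log survival likelihood for one subject under proportional hazards.

    z: baseline covariates; xi: shared FPC scores
    gamma, alpha: coefficient vectors for the subject's cohort
    base_haz, base_cumhaz: baseline hazard and cumulative hazard functions
    """
    eta = sum(g * x for g, x in zip(gamma, z)) + sum(a * s for a, s in zip(alpha, xi))
    # event * log hazard  -  cumulative hazard
    return event * (math.log(base_haz(time)) + eta) - base_cumhaz(time) * math.exp(eta)

# Illustrative constant baseline hazard of 0.1 per year (an assumption).
h0 = lambda t: 0.1
H0 = lambda t: 0.1 * t

# Hypothetical cohort-specific coefficients keyed by cohort label.
gamma_by_cohort = {"A": [0.02, 0.3], "B": [0.01, 0.4]}
alpha_by_cohort = {"A": [0.6, 0.7], "B": [0.6, 0.5]}

cohort = "A"
value = log_surv_lik(time=2.0, event=1, z=[1.5, 1.0], xi=[0.4, -0.2],
                     gamma=gamma_by_cohort[cohort], alpha=alpha_by_cohort[cohort],
                     base_haz=h0, base_cumhaz=H0)
```

Only the shared scores enter the linear predictor, so the same function applies to every cohort regardless of which longitudinal outcomes it collects.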
The multivariate longitudinal data and the time‐to‐event data are assumed to be conditionally independent given the random effects . This assumption reflects the idea that the shared latent trajectory and outcome‐specific deviations adequately capture the dependence structure between the longitudinal and survival components. To account for differences in cohort sizes, a weighted log‐likelihood is adopted. The smallest cohort (cohort 1) is designated as the reference cohort, ensuring that variability in sample sizes does not disproportionately influence parameter estimation. Let denote the total number of longitudinal observations in cohort . In particular, we choose cohort 1 such that is the smallest. Then the weights for subjects in different cohorts are defined as .
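A minimal sketch of the weighting idea follows. It assumes, as one natural reading of the description above, that each cohort's weight is the reference cohort's observation count divided by that cohort's own count, so the smallest cohort receives weight one. The exact weight formula, the function name, and the counts are assumptions for illustration.

```python
def cohort_weights(n_obs):
    """Map cohort -> weight, assuming weight = (smallest count) / (own count).

    n_obs: dict from cohort label to its total number of longitudinal observations.
    """
    n_ref = min(n_obs.values())   # reference cohort has the fewest observations
    return {cohort: n_ref / n for cohort, n in n_obs.items()}

# Hypothetical observation counts, not taken from the AD cohorts.
weights = cohort_weights({"cohort1": 100, "cohort2": 400, "cohort3": 200})
```

Under this assumed form, every cohort contributes roughly the same total weight to the log‐likelihood, which is the stated purpose of the weighting.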
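The overall marginal likelihood is obtained by integrating the product of the longitudinal and survival likelihoods over the distributions of the random effects $\boldsymbol{\theta}_i$. The weighted marginal log‐likelihood is

$$
\ell = \sum_{i=1}^{N} w_{c(i)} \log \int f(\mathbf{Y}_i \mid \boldsymbol{\theta}_i)\, f(T_i, \Delta_i \mid \boldsymbol{\xi}_i)\, f(\boldsymbol{\theta}_i)\, d\boldsymbol{\theta}_i, \tag{4}
$$

where $w_{c(i)}$ is the weight of the cohort to which subject $i$ belongs and $f(\boldsymbol{\theta}_i)$ is the density of the random effects.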
3. Model Estimation via Monte Carlo EM
3.1. Spline Approximation of Smooth Functions
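Estimating the smooth mean and covariance functions in the MFMM requires a flexible yet computationally efficient approach. We employ the spline approximation, which provides these properties and is particularly suited to complex functional data. Let $\mathbf{B}(t) = (B_1(t), \dots, B_q(t))^\top$ be the vector of B‐spline basis functions, where $q$ is the number of equally‐spaced interior knots plus the order (degree + 1) of the B‐splines. The mean function for the $j$th outcome in cohort $c$ is modeled as $\mu_{cj}(t) = \mathbf{B}(t)^\top \mathbf{b}_{cj}$, where $\mathbf{b}_{cj}$ is the corresponding coefficient vector.
To ensure numerical stability, we orthonormalize the B‐spline bases using the Gram matrix: $\mathbf{G} = \int_{\mathcal{T}} \mathbf{B}(t) \mathbf{B}(t)^\top dt$. Let $\tilde{\mathbf{B}}(t) = \mathbf{G}^{-1/2} \mathbf{B}(t)$ denote the resulting orthonormalized B‐spline basis functions. The covariance functions $C_U$ (shared latent trajectory) and $C_W$ (outcome‐specific trajectory) are decomposed into their eigenfunctions, which are approximated: $\phi_l(t) = \tilde{\mathbf{B}}(t)^\top \mathbf{a}_l$ and $\psi_l(t) = \tilde{\mathbf{B}}(t)^\top \mathbf{d}_l$, where $\mathbf{a}_l$ and $\mathbf{d}_l$ are coefficient vectors for the eigenfunctions of $C_U$ and $C_W$, respectively. The orthonormality of the eigenfunctions imposes the constraints $\mathbf{a}_l^\top \mathbf{a}_{l'} = I(l = l')$ and $\mathbf{d}_l^\top \mathbf{d}_{l'} = I(l = l')$.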
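The effect of orthonormalizing a basis through its Gram matrix can be checked with a small standalone sketch. Two substitutions are made to keep the example dependency‐free, and both are assumptions rather than the paper's procedure: monomials on [0, 1] stand in for B‐splines, and a Cholesky factor of the Gram matrix is used in place of the symmetric inverse square root. Either factorization yields a basis whose Gram matrix is the identity.

```python
import math

def basis(t):
    """Illustrative basis (monomials 1, t, t^2) standing in for B-splines."""
    return [1.0, t, t * t]

def gram_matrix(q=3):
    """Exact Gram matrix of the monomials on [0, 1]: integral of t^(i+j) = 1/(i+j+1)."""
    return [[1.0 / (i + j + 1) for j in range(q)] for i in range(q)]

def cholesky(g):
    """Lower-triangular factor low with low * low^T = g."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def orthonormal_basis(t, low):
    """Evaluate low^{-1} * basis(t) by forward substitution."""
    b = basis(t)
    out = []
    for i in range(len(b)):
        out.append((b[i] - sum(low[i][k] * out[k] for k in range(i))) / low[i][i])
    return out

low = cholesky(gram_matrix())
values_at_half = orthonormal_basis(0.5, low)
```

With an orthonormal basis, orthonormality of the eigenfunctions reduces to orthonormality of their coefficient vectors, which is what makes the constraint easy to enforce.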
3.2. E‐Step
The spline approximation of nonparametric smooth functions enables the use of parametric estimation methods for the MFMM. The random score vector $\boldsymbol{\theta}_i$, which captures both shared and outcome‐specific variations, introduces significant dimensionality to the model. In practice, the dimension of $\boldsymbol{\theta}_i$ often exceeds four, making direct maximization of the log marginal likelihood in (4) computationally expensive or infeasible. To address this, we employ the EM algorithm [22], which treats $\boldsymbol{\theta}_i$ as latent (missing) data. The EM algorithm alternates between two steps until convergence: (1) computing the conditional expectation of the complete‐data log‐likelihood given the observed data and the current parameter estimates (E‐step), and (2) maximizing this expected log‐likelihood to update the parameter estimates (M‐step).
Let the observed data for the subject be , where represents the collection of longitudinal data. Let , , , and . Let . Denote the complete collection of parameters by . The E‐step calculates the expected value of the weighted conditional log‐likelihood given the observed data and current parameter estimates, . This is expressed as follows.
| (5) |
where is the expectation with respect to the conditional distribution of given the observed data and current parameter estimates .
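Let $g(\boldsymbol{\theta}_i)$ be any smooth function of $\boldsymbol{\theta}_i$. The conditional expectation is given by:

$$
E[g(\boldsymbol{\theta}_i) \mid \mathbf{Y}_i, T_i, \Delta_i] = \frac{\int g(\boldsymbol{\theta}_i)\, f(T_i, \Delta_i \mid \boldsymbol{\xi}_i)\, f(\boldsymbol{\theta}_i \mid \mathbf{Y}_i)\, d\boldsymbol{\theta}_i}{\int f(T_i, \Delta_i \mid \boldsymbol{\xi}_i)\, f(\boldsymbol{\theta}_i \mid \mathbf{Y}_i)\, d\boldsymbol{\theta}_i},
$$

where $f(T_i, \Delta_i \mid \boldsymbol{\xi}_i)$ corresponds to the survival likelihood defined in (3), and $f(\boldsymbol{\theta}_i \mid \mathbf{Y}_i) \propto f(\mathbf{Y}_i \mid \boldsymbol{\theta}_i) f(\boldsymbol{\theta}_i)$.
The conditional distribution $f(\boldsymbol{\theta}_i \mid \mathbf{Y}_i)$ is multivariate normal (see Appendix A).
To efficiently compute this expectation, we use Monte Carlo integration. Specifically, we draw $Q$ samples $\boldsymbol{\theta}_i^{(1)}, \dots, \boldsymbol{\theta}_i^{(Q)}$ from the multivariate normal distribution $f(\boldsymbol{\theta}_i \mid \mathbf{Y}_i)$ and compute the approximation:

$$
E[g(\boldsymbol{\theta}_i) \mid \mathbf{Y}_i, T_i, \Delta_i] \approx \frac{\sum_{q=1}^{Q} g(\boldsymbol{\theta}_i^{(q)})\, f(T_i, \Delta_i \mid \boldsymbol{\xi}_i^{(q)})}{\sum_{q=1}^{Q} f(T_i, \Delta_i \mid \boldsymbol{\xi}_i^{(q)})}.
$$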
The Monte Carlo approximation enables efficient implementation of the E‐step, ensuring computational feasibility even in high‐dimensional parameter spaces.
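The Monte Carlo E‐step can be sketched as a self‐normalized weighted average: scores are sampled from their normal distribution given the longitudinal data, and each draw is weighted by the survival likelihood. The one‐dimensional score, the constant baseline hazard, and every numerical value below are assumptions chosen to keep the example small; the function names are hypothetical.

```python
import math
import random

def mc_conditional_expectation(g, draw_theta, surv_lik, n_draws, rng):
    """Approximate E[g(theta) | observed data] by a self-normalized weighted average."""
    num = 0.0
    den = 0.0
    for _ in range(n_draws):
        theta = draw_theta(rng)   # draw from the normal distribution of theta given Y
        w = surv_lik(theta)       # weight each draw by the survival likelihood
        num += w * g(theta)
        den += w
    return num / den

# Toy setting (all values hypothetical): scalar score with theta | Y ~ N(0.5, 0.2^2),
# constant baseline hazard 0.1, event observed at time 3, score coefficient 0.8.
def toy_surv_lik(theta, time=3.0, event=1, alpha=0.8, h0=0.1):
    risk = math.exp(alpha * theta)
    return (h0 * risk) ** event * math.exp(-h0 * time * risk)

rng = random.Random(2024)
posterior_mean = mc_conditional_expectation(
    g=lambda th: th,
    draw_theta=lambda r: r.gauss(0.5, 0.2),
    surv_lik=toy_surv_lik,
    n_draws=2000,
    rng=rng,
)
```

The same routine returns conditional second moments by passing `g=lambda th: th * th`, which is the kind of quantity the M‐step updates require.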
3.3. M‐Step
In the M‐step, updated estimates of are obtained by maximizing the function in (5). Specifically, estimates of the longitudinal parameters , the survival parameters , and the eigenvalue parameters are updated iteratively. In the following, we provide details on the estimation process for each set of parameters.
First, the longitudinal parameters are estimated iteratively by minimizing , expressed as:
| (6) |
where is the matrix of B‐spline bases evaluated at time points , and is the orthonormalized basis obtained by multiplying by the inverse square root of the Gram matrix . The outcome‐specific scaling parameter is obtained through weighted least squares of linear regression, as detailed in Appendix C. To enforce smoothness in the estimates of the mean functions and eigenfunctions, quadratic penalties are added to (6). Compared to regression spline estimation [14], penalized spline estimation stabilizes iterative updates, reduces the number of iterations required for EM convergence, and produces smoother and more interpretable estimates. Empirical and theoretical studies of penalized splines show that the number of knots does not matter as long as a relatively large number of knots are used [23, 24]. In this study, seven spline bases constructed at equally spaced knots are used for both simulations and real data. The orthonormality of eigenfunctions is also ensured by post‐processing the updates.
Second, the baseline hazard function in the Cox regression is estimated by minimizing , expressed as:
The baseline hazard is then estimated using a weighted ratio of observed events to expected contributions at each time :
The parameter vector , representing the effects of baseline covariates and latent trajectories, is updated iteratively using the Newton‐Raphson algorithm. The iteration is given by: where and are the score vector and the observed information matrix, respectively. By differentiating the survival likelihood (3) with respect to , the score for subject is:
where are the distinct observed event times across all cohorts. The cohort‐level score and the information matrix are calculated as: and , where is the number of subjects in cohort .
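The "weighted ratio of observed events to expected contributions" can be illustrated with a Breslow‐type estimator. The sketch assumes that cohort weights multiply both the event counts and the risk‐set contributions, and that `exp_eta[i]` holds the E‐step expectation of the exponentiated linear predictor for subject `i`. The placement of the weights and all names are assumptions for illustration.

```python
def breslow_baseline_hazard(times, events, weights, exp_eta):
    """Weighted Breslow-type jumps of the baseline hazard at distinct event times.

    times: observed times; events: 1 if the event was observed, 0 if censored
    weights: cohort weight of each subject
    exp_eta: expected exp(linear predictor) of each subject (from the E-step)
    """
    jumps = {}
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    for t in event_times:
        # weighted number of events occurring exactly at t
        num = sum(w for ti, d, w in zip(times, events, weights) if d == 1 and ti == t)
        # weighted expected risk of the subjects still at risk at t
        den = sum(w * e for ti, w, e in zip(times, weights, exp_eta) if ti >= t)
        jumps[t] = num / den
    return jumps

# Small hypothetical data set: three subjects, the second one censored.
jumps = breslow_baseline_hazard(
    times=[1.0, 2.0, 3.0],
    events=[1, 0, 1],
    weights=[1.0, 0.5, 0.5],
    exp_eta=[1.0, 1.0, 1.0],
)
```

Summing the jumps up to a given time yields the corresponding cumulative baseline hazard.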
Third, the diagonal matrices of eigenvalues and are estimated by minimizing the weighted negative log‐likelihoods and , respectively. For , this minimization simplifies to: . The estimator for the eigenvalues is: , where represents the expected squared random scores for the shared latent trajectory. Similarly, for , minimizing the negative log‐likelihood leads to the estimator: , where represents the expected squared random scores for the outcome‐specific deviations. The inclusion of cohort‐specific weights ensures that estimates adequately account for differences in cohort sizes and data distributions. This weighting improves model flexibility and accuracy in diverse cohort distributions.
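For zero‐mean normal scores, minimizing a weighted negative log‐likelihood in the variance gives a weighted average of the expected squared scores. The sketch below implements that update for the shared eigenvalues, assuming the weights enter as subject‐level multipliers of each log‐likelihood term; the function name and inputs are hypothetical.

```python
def update_shared_eigenvalues(expected_sq_scores, weights):
    """Weighted update of the shared eigenvalues.

    expected_sq_scores[i][l]: E-step expectation of the squared l-th shared score of subject i
    weights[i]: cohort weight of subject i
    """
    total = sum(weights)
    n_comp = len(expected_sq_scores[0])
    return [
        sum(w * s[l] for w, s in zip(weights, expected_sq_scores)) / total
        for l in range(n_comp)
    ]

# Two hypothetical subjects with two shared components each.
eigenvalues = update_shared_eigenvalues([[4.0, 1.0], [2.0, 3.0]], [1.0, 1.0])
```

The analogous update for the outcome‐specific eigenvalues would additionally average over the outcomes observed for each subject.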
4. Model Selection
Penalized splines [20] are used to estimate the mean functions and eigenfunctions in the longitudinal data model, which require the selection of appropriate smoothing parameters. These parameters are determined using generalized cross‐validation (GCV) at each iteration of the EM algorithm. By reformulating function estimation as a weighted least squares problem in nonparametric regression, we enable efficient implementation using the gam function from the R package mgcv [21, 25] (see Appendix C for details). Although the smoothing parameters may vary between iterations, they stabilize rapidly as the algorithm converges, ensuring consistent estimation of smooth functions and improving computational efficiency.
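The number of principal components for the shared latent process and for the outcome‐specific latent process are critical tuning parameters in the model. To select these parameters, we use the Bayesian information criterion (BIC), computed from the weighted marginal log‐likelihood of the data, $\ell$, and the degrees of freedom described below. The log‐likelihood is approximated as:

$$
\ell \approx \sum_{i=1}^{N} w_{c(i)} \log \Big\{ f(\mathbf{Y}_i)\, \frac{1}{Q} \sum_{q=1}^{Q} f(T_i, \Delta_i \mid \boldsymbol{\xi}_i^{(q)}) \Big\}.
$$

Here, $f(\mathbf{Y}_i)$ is the marginal density, which follows a normal distribution (see Appendix A). The samples $\boldsymbol{\theta}_i^{(q)}$ are drawn from $f(\boldsymbol{\theta}_i \mid \mathbf{Y}_i)$; see Appendix B for the derivation. To ensure an accurate approximation of the marginal log‐likelihood, the number of samples $Q$ must be much larger than in the EM algorithm.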
The degrees of freedom (DOF) for cohort , denoted as , are calculated as:
Each term in this formula corresponds to a specific model component:
: DOF for estimating the mean functions in cohort .
: DOF for the shared covariance function's eigenpairs.
: DOF for the outcome‐specific covariance function's eigenpairs.
: DOF for estimating error variances and scaling coefficient vector .
: Number of baseline covariates in the Cox regression.
: Number of cohort‐specific coefficients for shared latent scores in the Cox regression.
Cohort 's DOF has a division factor to appropriately reflect cohort‐level contributions when parameters are shared across cohorts. A two‐dimensional grid is used to identify the optimal values of and , balancing model complexity with goodness‐of‐fit.
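The grid search over the two numbers of principal components can be sketched as follows. The generic form BIC = −2 × log‐likelihood + log(n) × DOF is used here as an assumption, since it may differ from the weighted, cohort‐adjusted criterion of this section; the log‐likelihoods, DOF values, and sample size are made up for illustration.

```python
import math

def bic(loglik, dof, n):
    """Generic BIC; the paper's weighted, cohort-adjusted version may differ."""
    return -2.0 * loglik + math.log(n) * dof

def select_components(fits, n):
    """Return the (shared, outcome-specific) component pair with the smallest BIC.

    fits: dict mapping (n_shared, n_specific) -> (marginal log-likelihood, DOF)
    """
    return min(fits, key=lambda pair: bic(fits[pair][0], fits[pair][1], n))

# Hypothetical fitted values on a small grid.
fits = {
    (1, 1): (-1000.0, 20),
    (2, 1): (-980.0, 26),
    (2, 2): (-979.0, 32),
}
best = select_components(fits, n=500)
```

In practice each grid point requires a full EM fit followed by the Monte Carlo approximation of the marginal log‐likelihood, so the grid is usually kept small.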
5. Data Application
We apply the proposed multi‐cohort functional joint model (FJM) to data from three AD cohorts, integrating longitudinal outcomes via the extended MFMM and linking them to survival outcomes through a Cox regression model. This application jointly characterizes the progression of multivariate longitudinal outcomes and their association with time to the diagnosis of AD dementia. By leveraging cohort‐specific variations while incorporating shared latent processes, the multi‐cohort FJM allows for a unified yet flexible representation of disease trajectories across diverse datasets. We include the following baseline covariates [14]: age, sex, years of education, and the number of apolipoprotein E ϵ 4 alleles (APOE4), in the hazard model, reflecting known demographic and genetic risk factors for AD progression (see Table S1 in Appendix D for details). To assess performance, we compare the multi‐cohort FJM with the single‐cohort FJM‐MFMM [14] and the parametric multivariate joint model (MJM) [26], each applied separately to the cohorts.
Using the BIC described in Section 4, we selected four principal components for the shared covariance structure and three for the outcome‐specific covariance structure. The estimated scaling coefficients, shown in Table 1, have signs that align with expectations: lower values of MMSE, WMSLM, RAVLT, and SDMT and higher values of CDR‐SB, ADAS, FAQ, and TRAILA indicate AD progression. Figure 2 illustrates the first two eigenfunctions estimated using the multi‐cohort FJM and separate FJM models. Consistent trends in eigenfunctions across models support the utility of the multi‐cohort FJM in capturing subject‐specific random trajectories. For example, the first eigenfunction of the shared latent process is negative throughout, and the scaling coefficients (Table 1) imply that positive scores on this eigenfunction correspond to worsening outcomes (lower MMSE, WMSLM, RAVLT, and SDMT values, and higher CDR‐SB, ADAS, FAQ, and TRAILA values). This pattern indicates accelerated AD progression as time progresses. A similar interpretation holds for the first outcome‐specific eigenfunction, highlighting its role in capturing deviations specific to individual outcomes.
FIGURE 2.

Top two estimated eigenfunctions with associated eigenvalues for the shared and outcome‐specific latent processes, comparing multi‐cohort FJM (solid lines) and separate FJM (dashed lines). These eigenfunctions characterize the primary patterns of variability in the longitudinal outcomes across cohorts.
Figure 3 compares the estimated mean trajectories of longitudinal outcomes across three models: multi‐cohort FJM, single‐cohort FJM and MJM. Columns one and three display results for the FJM and multi‐cohort FJM, while columns two and four depict results for MJM. Unlike MJM, which assumes linear trends, both FJM and multi‐cohort FJM effectively capture nonlinear trajectories. Three important observations emerge. First, while FJM and multi‐cohort FJM produce similar mean functions, multi‐cohort FJM captures steeper declines in WMSLM and RAVLT and sharper increases in TRAILA, reflecting more rapid deterioration in some AD‐related measures. Second, the flexibility of penalized splines in the FJM models enables them to capture nonlinear trends in some outcomes (e.g., WMSLM, RAVLT, and TRAILA), which indicate an acceleration in deterioration in AD‐related measures. In contrast, MJM is linear and does not capture such dynamics, which exposes its limitations in modeling complex trends in AD studies. Third, clear cohort‐specific differences in progression are evident. For example, patients in the ROSMAP cohort demonstrate faster progression, as reflected in lower MMSE and WMSLM scores, compared to ADNI and NACC, where ADNI patients show the slowest progression. These differences highlight the importance of multi‐cohort modeling in uncovering population‐specific patterns.
Table 2 presents the estimated Cox regression coefficients of the multi‐cohort FJM. APOE4 consistently emerges as a significant risk factor in all cohorts, while sex does not show a significant association. Age is significant in ADNI and NACC but not in ROSMAP, and years of education have a significant protective effect in NACC and ROSMAP but not in ADNI, potentially reflecting differences in population characteristics. Shared progression scores are critical predictors of AD risk. Most scores show significant associations with time to AD diagnosis, emphasizing the utility of the shared latent trajectory for capturing meaningful variability in disease progression. For example, positive values of the first shared score are associated with higher AD risk, consistent with the interpretation of the shared trajectory as reflecting overall deterioration. These findings underscore the importance of incorporating longitudinal outcomes into survival models to better understand the progression and prediction of AD.
TABLE 2.
Estimated Cox regression coefficients (with standard errors) from the multi‐cohort functional joint model, evaluating the association between baseline covariates, shared latent scores, and time to AD diagnosis.
| Covariates | ADNI: Estimate (s.e.) | ADNI: p | NACC: Estimate (s.e.) | NACC: p | ROSMAP: Estimate (s.e.) | ROSMAP: p |
|---|---|---|---|---|---|---|
| Age | 0.02 (0.01)* | 0.01 | 0.01 (0.00)* | 0.00 | 0.00 (0.01) | 0.96 |
| Sex (female) | 0.10 (0.17) | 0.55 | 0.09 (0.08) | 0.21 | −0.31 (0.24) | 0.20 |
| Education | 0.05 (0.03) | 0.08 | 0.09 (0.01)* | 0.00 | 0.11 (0.03)* | 0.00 |
| APOE4 | 0.28 (0.13)* | 0.03 | 0.37 (0.06)* | 0.00 | 0.40 (0.19)* | 0.03 |
| Shared score 1 | 0.63 (0.04)* | 0.00 | 0.59 (0.02)* | 0.00 | 0.81 (0.06)* | 0.00 |
| Shared score 2 | 0.68 (0.22)* | 0.00 | 0.48 (0.07)* | 0.00 | 0.10 (0.30) | 0.74 |
| Shared score 3 | −1.23 (0.26)* | 0.00 | −1.71 (0.12)* | 0.00 | −0.66 (0.39) | 0.09 |
| Shared score 4 | 2.86 (0.68)* | 0.00 | 3.90 (0.31)* | 0.00 | −3.88 (0.71)* | 0.00 |
Note: Covariates include baseline age, sex, years of education, and APOE4 status. Shared latent scores reflect subject‐specific progression in longitudinal outcomes. An asterisk indicates significance at the 0.05 level.
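To make the entries of Table 2 easier to read, the following minimal sketch converts a reported coefficient and standard error into a hazard ratio and a two‐sided p‐value. It assumes a Wald test with a standard normal reference, which is a common default but is not stated in the text; the function name `wald_summary` is hypothetical, and the inputs are the rounded values from the table, so the result only approximates the reported p‐value.

```python
import math


def wald_summary(estimate, se):
    """Return (hazard ratio, Wald z statistic, two-sided normal p-value).

    Assumption: significance is assessed with a Wald test; the source
    does not say which test was used.
    """
    z = estimate / se
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(estimate), z, p


# APOE4 in ADNI as reported in Table 2: estimate 0.28, standard error 0.13.
hr, z, p = wald_summary(0.28, 0.13)
print(f"hazard ratio = {hr:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions, carrying an APOE4 allele multiplies the hazard of AD diagnosis in ADNI by roughly exp(0.28), and the Wald p‐value rounds to the 0.03 shown in the table.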
In multi‐cohort FJM, distinct mean functions for longitudinal outcomes and cohort‐specific Cox regression coefficients are employed to account for inter‐cohort variability. To investigate the impact of simplifying assumptions, we evaluate alternative models with either shared mean functions or shared Cox coefficients across cohorts. Using AIC/BIC for model selection, we assess the trade‐offs between model flexibility and parsimony. When both mean functions and Cox coefficients are shared across cohorts, the degrees of freedom for cohort are defined as:
where represents the number of cohorts in which the outcome is collected.
For each model, we apply multi‐cohort FJM with different values of the tuning parameter and (, ) to the three AD cohorts; see Appendix D for details. The results show that cohort‐specific Cox coefficients are consistently required, as determined by both AIC and BIC. However, the two metrics diverge regarding mean functions: BIC favors shared mean functions across cohorts, reflecting its preference for simpler models, while AIC supports cohort‐specific mean functions, prioritizing flexibility. This divergence likely arises from the limited overlap of outcomes across cohorts, with only three of the eight outcomes shared by at least two cohorts. Allowing cohort‐specific mean functions in such a scenario does not significantly increase model flexibility. Results for the simpler model selected by BIC are provided in Appendix D and align with the interpretations of the full model. These findings underscore the critical role of cohort‐specific Cox coefficients in capturing inter‐cohort heterogeneity, while the choice to use cohort‐specific mean functions depends on the extent of outcome overlap across cohorts.
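The divergence between AIC and BIC can be illustrated with a small numerical sketch. It uses the standard definitions AIC = −2 log L + 2 df and BIC = −2 log L + log(n) df; the log‐likelihoods, degrees of freedom, and sample size below are invented for illustration and are not taken from the fitted models, whose degrees of freedom follow the paper's own definition.

```python
import math


def aic(loglik, df):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * df


def bic(loglik, df, n):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * loglik + math.log(n) * df


# Hypothetical values, NOT from the paper: a simpler model with shared mean
# functions and a richer model with cohort-specific mean functions.
n = 5000
shared = {"loglik": -10000.0, "df": 60}
specific = {"loglik": -9960.0, "df": 75}

for name, m in [("shared means", shared), ("cohort-specific means", specific)]:
    print(name, round(aic(m["loglik"], m["df"]), 1), round(bic(m["loglik"], m["df"], n), 1))
```

With these made‐up numbers the gain of 40 log‐likelihood units outweighs AIC's penalty of 2 per parameter but not BIC's penalty of log(5000), about 8.5 per parameter, so AIC picks the richer model and BIC the simpler one, mirroring the pattern described above.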
6. Simulation Study
6.1. Simulation Settings
We evaluate the performance of multi‐cohort FJM by generating data similar to the AD cohorts. The longitudinal data of 3 cohorts are generated according to Model (1) with outcomes, following the structure of AD cohorts. Each cohort includes the same set of outcomes as in the real data, with cohort‐specific mean functions, scaling parameters, and error variances derived from the estimates obtained in the application study. To simplify the analysis, the multi‐cohort FJM is fitted with two principal components for both the shared covariance function and the outcome‐specific covariance functions. The eigenfunctions used in the simulations are the estimated eigenfunctions from the three AD cohorts, corresponding to two shared principal components and two outcome‐specific principal components.
The shared random scores and are generated from a normal distribution , where and . Similarly, outcome‐specific random scores are generated , with and . Measurement errors, denoted as , are sampled from a normal distribution . The observed time points correspond to 11 fixed time points mapped to the interval and are subject to truncation based on censoring and event times, as described below.
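The data‐generating steps for the longitudinal outcomes can be sketched as follows. The sketch assumes the MFMM structure in which each outcome equals its mean function plus a scaled shared latent process, an outcome‐specific process, and white noise; Model (1) is not reproduced in this section, so that form is an assumption here. All numerical values, the Fourier‐type eigenfunctions, and the time interval [0, 1] are hypothetical stand‐ins for the estimates used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects, n_outcomes = 200, 3      # hypothetical sizes
t = np.linspace(0.0, 1.0, 11)        # 11 fixed time points; [0, 1] is assumed

# Hypothetical orthonormal eigenfunctions evaluated on the grid
# (the paper uses eigenfunctions estimated from the AD cohorts).
phi = np.vstack([np.sqrt(2) * np.sin(np.pi * t),
                 np.sqrt(2) * np.cos(np.pi * t)])        # shared, shape (2, 11)
psi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                 np.sqrt(2) * np.cos(2 * np.pi * t)])    # outcome-specific

lam = np.array([1.0, 0.25])          # hypothetical shared eigenvalues
nu = np.array([0.5, 0.1])            # hypothetical outcome-specific eigenvalues
beta = np.array([1.0, -0.8, 0.6])    # hypothetical scaling coefficients
sigma = 0.3                          # hypothetical error standard deviation
mu = np.vstack([0.5 * t, -t ** 2, np.sin(t)])   # hypothetical mean functions

# Scores are independent normals with variances given by the eigenvalues.
xi = rng.normal(0.0, np.sqrt(lam), size=(n_subjects, 2))
zeta = rng.normal(0.0, np.sqrt(nu), size=(n_subjects, n_outcomes, 2))
eps = rng.normal(0.0, sigma, size=(n_subjects, n_outcomes, t.size))

U = xi @ phi        # shared latent process, shape (n_subjects, 11)
W = zeta @ psi      # outcome-specific processes, shape (n_subjects, n_outcomes, 11)
Y = mu[None] + beta[None, :, None] * U[:, None, :] + W + eps
print(Y.shape)
```

In a multi‐cohort version, outcomes not collected in a given cohort would simply be dropped for that cohort's subjects after this step.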
The events data are generated using the Cox regression model in (2), incorporating shared random scores and four baseline covariates from the data application. The linear hazard rate function is specified as , where the Cox coefficients are based on estimates from the data application. Specifically, , , and , with shared coefficients , , , , , and . A Weibull baseline hazard with is used, and event times are generated via the inverse probability method [27]. Censoring times are sampled independently from a Beta distribution, with and chosen to approximate censoring rates of 66%, 66% and 59%, matching the three AD cohorts. For each subject, only measurements at are retained. The cohort sample sizes are 715, 3707, and 522 for cohorts 1, 2, and 3, respectively, with an average number of observations per subject of 7.8, 6.7, and 7.3. These settings closely mimic the AD study, ensuring realistic simulation conditions. We simulate data 100 times.
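A minimal sketch of the event‐time generation follows. It applies the inversion approach of Bender et al. [27] to a Cox model with a Weibull baseline hazard, draws censoring times from a Beta distribution, and keeps only visits up to the observed time. The parameterization of the Weibull hazard, every numerical value, and the reading that visits are retained up to the earlier of the event and censoring times are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000                                        # hypothetical sample size

# Hypothetical baseline covariates and shared scores.
X = np.column_stack([rng.normal(size=n), rng.integers(0, 2, size=n)])
xi = rng.normal(0.0, [1.0, 0.5], size=(n, 2))
gamma_x = np.array([0.02, 0.3])                 # hypothetical Cox coefficients
gamma_xi = np.array([0.6, 0.5])
eta = X @ gamma_x + xi @ gamma_xi               # linear predictor

# Assumed Weibull baseline: h0(t) = scale * shape * t**(shape - 1),
# so the cumulative baseline hazard is H0(t) = scale * t**shape.
shape, scale = 2.0, 1.0

# Inversion: solve H0(T) * exp(eta) = -log(U) with U uniform on (0, 1].
u = 1.0 - rng.uniform(size=n)
event_time = (-np.log(u) / (scale * np.exp(eta))) ** (1.0 / shape)

# Independent censoring on (0, 1) from a Beta distribution (hypothetical shapes).
censor_time = rng.beta(2.0, 1.0, size=n)
obs_time = np.minimum(event_time, censor_time)
event = event_time <= censor_time

# Keep only scheduled visits that occur no later than the observed time.
t_grid = np.linspace(0.0, 1.0, 11)
keep = t_grid[None, :] <= obs_time[:, None]
print("censoring rate:", 1.0 - event.mean())
print("mean visits per subject:", keep.sum(axis=1).mean())
```

In practice the Beta parameters would be tuned, for example by trial and error, until the simulated censoring rate matches the target for each cohort.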
6.2. Simulation Results
To assess model performance, we first fit the multi‐cohort FJM using the true number of shared and outcome‐specific principal components. Figure 4 shows the estimated mean functions across 100 replications. Gray lines represent estimates from individual replicates, dashed blue lines show the average of these estimates, and solid red lines represent the true mean functions. The close alignment of the dashed blue and solid red lines indicates accurate estimation. The estimated eigenfunctions and a comprehensive assessment of the estimated model parameters are presented in Appendix E. These parameters include the eigenvalues, the outcome‐specific scaling parameter, the Cox regression coefficients, and the white noise variance. The estimated eigenfunctions demonstrate strong agreement with the ground truth, and the estimated model parameters align closely with their true values, confirming the good performance of the proposed multi‐cohort FJM.
FIGURE 4.

Estimated mean functions from 100 simulation replications. Gray lines: individual estimates; dashed blue lines: average of these estimates; solid red lines: true mean functions.
Finally, we evaluate model selection using AIC and BIC to determine the number of eigenfunctions in the covariance structures. The proposed selection method demonstrates high accuracy, achieving correct selection rates of 0.98 for AIC and 1.00 for BIC, respectively. These results highlight the robustness of BIC in correctly identifying model complexity, making it the preferred criterion for model selection.
In summary, these simulation results substantiate the reliability and effectiveness of multi‐cohort FJM and demonstrate its ability to accurately estimate model parameters and select the appropriate number of eigenfunctions.
6.3. Comparison With Regression Spline Estimation
To evaluate the computational efficiency of our EM implementation with penalized splines, we compare it with an EM implementation using regression splines [14]. Under the same simulation design as above, we generate 50 independent replications.
Estimation accuracy for the mean functions and covariance functions in the longitudinal sub‐model was quantified using the relative integrated squared error (RISE). Specifically, letting denote the true mean function for outcome in cohort and its estimate, and letting and denote the true shared and outcome‐specific covariance functions with estimates and , we define
Figure 5 provides boxplots of the RISEs over 50 replications for the two estimation methods. The penalized‐spline EM algorithm yields uniformly lower RISE values for the mean functions and the shared covariance than the regression‐spline EM algorithm. For the outcome‐specific covariance, the two approaches show comparable accuracy across replications.
FIGURE 5.

Boxplots of RISE of the estimated mean functions, shared covariance, and outcome‐specific covariance over 50 replications, comparing the penalized‐spline EM algorithm with the regression‐spline EM algorithm.
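The RISE criterion used in this comparison can be sketched numerically as below. The sketch assumes the common convention of dividing the integrated squared error by the integrated squared true function, for curves and for covariance surfaces alike; the paper's displayed definition is not reproduced here, so the normalization is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoidal rule over the last axis for samples on a 1-D grid."""
    widths = np.diff(grid)
    return np.sum(widths * (values[..., 1:] + values[..., :-1]) / 2.0, axis=-1)


def rise_curve(est, truth, grid):
    """Relative integrated squared error for a function of one argument."""
    return trapezoid((est - truth) ** 2, grid) / trapezoid(truth ** 2, grid)


def rise_surface(est, truth, grid):
    """Relative integrated squared error for a surface on grid x grid."""
    num = trapezoid(trapezoid((est - truth) ** 2, grid), grid)
    den = trapezoid(trapezoid(truth ** 2, grid), grid)
    return num / den


# Toy check: an estimate that is uniformly 10% too large has RISE 0.1**2.
grid = np.linspace(0.0, 1.0, 101)
mu_true = np.sin(2 * np.pi * grid) + 2.0
phi = np.sqrt(2) * np.sin(np.pi * grid)
cov_true = np.outer(phi, phi)
print(rise_curve(1.1 * mu_true, mu_true, grid))
print(rise_surface(0.9 * cov_true, cov_true, grid))
```

Because the error is measured relative to the size of the true function, RISE values are comparable across outcomes recorded on different scales.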
We also compare the computation time of the two estimation methods. The median (median absolute deviation) runtime for the penalized‐spline EM is 2.14 (1.04) h, whereas the regression‐spline EM requires 3.72 (1.74) h. Boxplots of computation times are provided in Appendix E. Across identical simulation settings, the penalized‐spline EM achieves better or comparable estimation accuracy and shorter computation time than the regression‐spline EM. These results provide direct empirical support for the efficiency claim of our proposed algorithm.
7. Discussion
We proposed a novel extension of the multivariate functional mixed model (MFMM) [14] to jointly analyze longitudinal outcomes from multiple cohorts. Our approach offers a principled solution to the challenges posed by disparate outcome collections across cohorts. By leveraging shared variation patterns extracted from the MFMM, we achieved a parsimonious, yet flexible framework for linking longitudinal and survival data. This approach enables robust characterization of disease trajectories and their associations with survival outcomes in heterogeneous cohorts. The application to the three AD cohorts underscores the utility of the model to uncover shared and cohort‐specific disease progression patterns. By identifying differences in disease trajectories across cohorts, such as faster progression in ROSMAP compared to ADNI and NACC, the model revealed insights that would be unavailable from a single‐cohort analysis. Additionally, inclusion of cohort‐specific survival model coefficients allowed us to capture inter‐cohort heterogeneity in baseline covariate effects, offering a more comprehensive understanding of AD risk factors, including the differential impact of age, education, and APOE4 status across cohorts.
For the proposed functional joint model, we developed a computationally feasible algorithm that combines EM with penalized splines. The iterative nature of EM matches well with the local selection of smoothing parameters for penalized splines, which helps overcome the computational complexity of functional joint models. We expect this to encourage further use of penalized splines in iterative algorithms. The local selection strategy for smoothing parameters is generally applicable to any iterative algorithm in which nonparametric smooth functions are estimated.
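As a generic illustration of a penalized spline smoother with a data‐driven smoothing parameter, consider the sketch below. It is not the authors' EM algorithm: it fits a single curve with a truncated power basis and a ridge penalty on the knot coefficients, and chooses the smoothing parameter by generalized cross‐validation (GCV) over a grid. The basis, the penalty, the GCV criterion, and all names are assumptions chosen to keep the example short.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=3):
    """Columns 1, x, ..., x**degree, then (x - knot)_+**degree per knot."""
    poly = np.vander(x, degree + 1, increasing=True)
    trunc = np.clip(x[:, None] - knots[None, :], 0.0, None) ** degree
    return np.hstack([poly, trunc])


def fit_penalized_spline(x, y, knots, lambdas, degree=3):
    """Return (gcv, lambda, coefficients) for the GCV-optimal lambda."""
    B = truncated_power_basis(x, knots, degree)
    # Ridge penalty on the knot coefficients only; polynomial part unpenalized.
    D = np.diag(np.r_[np.zeros(degree + 1), np.ones(knots.size)])
    best = None
    for lam in lambdas:
        A = B.T @ B + lam * D
        coef = np.linalg.solve(A, B.T @ y)
        # Effective degrees of freedom: trace of the smoother (hat) matrix.
        edf = np.trace(np.linalg.solve(A, B.T @ B))
        rss = np.sum((y - B @ coef) ** 2)
        gcv = x.size * rss / (x.size - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, coef)
    return best


# Toy data: a noisy sine curve (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(0.0, 0.2, size=x.size)

knots = np.linspace(0.0, 1.0, 17)[1:-1]          # 15 interior knots
gcv, lam, coef = fit_penalized_spline(x, y, knots, 10.0 ** np.linspace(-4, 2, 13))
fit = truncated_power_basis(x, knots) @ coef
print("selected lambda:", lam)
print("RMSE against the truth:", np.sqrt(np.mean((fit - truth) ** 2)))
```

Within an iterative algorithm, a step of this kind could be repeated at each iteration with the current working responses, which is one way to read the local selection of smoothing parameters mentioned above.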
Despite its strengths, the proposed model has limitations that warrant further investigation. Currently, the longitudinal and survival sub‐models are linked parametrically via the principal scores. Although this approach ensures model simplicity, it may not fully capture more complex associations between longitudinal outcomes and survival data. Future work could incorporate nonparametric linking mechanisms [16] to enhance flexibility and accommodate complex relationships between the two sub‐models.
Another limitation is the reliance on a single shared variation pattern to represent commonalities across longitudinal outcomes. Although sufficient for a moderate number of outcomes, this may become restrictive when analyzing a large set of outcomes with diverse dependencies. The recent latent functional factor model [28] allows multiple shared variation patterns and offers a promising direction to extend MFMM to handle such complexity. Incorporating multiple shared variation patterns into the functional joint model could further improve its ability to capture the rich structure of longitudinal data and enhance its applicability to more complex multi‐cohort studies.
Funding
The research of Sheng Luo was supported by the National Institute on Aging (grant numbers: R01AG064803, P30AG072958, and P30AG028716). The research of Wenyi Wang and Luo Xiao was partially supported by the National Institute of Neurological Disorders and Stroke (R01NS112303).
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1. Supporting information.
Acknowledgments
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp‐content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Wang W., Xiao L., Li R., Luo S., Alzheimer's Disease Neuroimaging Initiative, “A Functional Joint Model for Survival and Multivariate Sparse Functional Data in Multi‐Cohort Alzheimer's Disease Study,” Statistics in Medicine 45, no. 3‐5 (2026): e70442, 10.1002/sim.70442.
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
References
- 1. Alzheimer's Association, “2024 Alzheimer's Disease Facts and Figures,” Alzheimer's & Dementia 20, no. 5 (2024): 3708–3821.
- 2. Weiner M. W., Veitch D. P., Aisen P. S., et al., “2014 Update of the Alzheimer's Disease Neuroimaging Initiative: A Review of Papers Published Since Its Inception,” Alzheimer's & Dementia 11, no. 6 (2015): e1–e120.
- 3. Besser L., Kukull W., Knopman D. S., et al., Neuropsychology Work Group, Directors, and Clinical Core Leaders of the National Institute on Aging‐Funded US Alzheimer's Disease Centers, “Version 3 of the National Alzheimer's Coordinating Center's Uniform Data Set,” Alzheimer Disease and Associated Disorders 32, no. 4 (2018): 351–358.
- 4. Bennett D. A., Buchman A. S., Boyle P. A., Barnes L. L., Wilson R. S., and Schneider J. A., “Religious Orders Study and Rush Memory and Aging Project,” Journal of Alzheimer's Disease 64, no. s1 (2018): S161–S189.
- 5. Wulfsohn M. S. and Tsiatis A. A., “A Joint Model for Survival and Longitudinal Data Measured With Error,” Biometrics 53, no. 1 (1997): 330–339.
- 6. Lin H., McCulloch C. E., and Mayne S. T., “Maximum Likelihood Estimation in the Joint Analysis of Time‐To‐Event and Multiple Longitudinal Variables,” Statistics in Medicine 21, no. 16 (2002): 2369–2382.
- 7. Murray J. and Philipson P., “Fast Estimation for Generalised Multivariate Joint Models Using an Approximate EM Algorithm,” Computational Statistics and Data Analysis 187 (2023): 107819.
- 8. He Y., Song X., and Kang K., “Joint Mixed Membership Modeling of Multivariate Longitudinal and Survival Data for Learning the Individualized Disease Progression,” Annals of Applied Statistics 18, no. 3 (2024): 1924–1946.
- 9. James G., Hastie T., and Sugar C., “Principal Component Models for Sparse Functional Data,” Biometrika 87, no. 3 (2000): 587–602.
- 10. Yao F., Müller H. G., and Wang J. L., “Functional Data Analysis for Sparse Longitudinal Data,” Journal of the American Statistical Association 100, no. 470 (2005): 577–590.
- 11. Ramsay J. and Silverman B., Functional Data Analysis (Springer, 2005).
- 12. Yao F., “Functional Principal Component Analysis for Longitudinal and Survival Data,” Statistica Sinica 17 (2007): 965–983.
- 13. Yan F., Lin X., and Huang X., “Dynamic Prediction of Disease Progression for Leukemia Patients by Functional Principal Component Analysis of Longitudinal Expression Levels of an Oncogene,” Annals of Applied Statistics 11, no. 3 (2017): 1649–1670.
- 14. Li C., Xiao L., and Luo S., “Joint Model for Survival and Multivariate Sparse Functional Data With Application to a Study of Alzheimer's Disease,” Biometrics 78, no. 2 (2022): 435–447.
- 15. Li K. and Luo S., “Dynamic Prediction of Alzheimer's Disease Progression Using Features of Multiple Longitudinal Outcomes and Time‐To‐Event Data,” Statistics in Medicine 38, no. 24 (2019): 4804–4818.
- 16. Zou H., Zeng D., Xiao L., and Luo S., “Bayesian Inference and Dynamic Prediction for Multivariate Longitudinal and Survival Data,” Annals of Applied Statistics 17, no. 3 (2023): 2574–2595.
- 17. Shi H., Jiang S., Ma D., Beg M. F., and Cao J., “Dynamic Survival Prediction Using Sparse Longitudinal Images via Multi‐Dimensional Functional Principal Component Analysis,” Journal of Computational and Graphical Statistics 33 (2024): 1240–1251.
- 18. Hong Y., Su L., Song S., and Yan F., “Dynamic Prediction of Disease Processes Based on Recurrent History and Functional Principal Component Analysis of Longitudinal Biomarkers: Application for Ovarian Epithelial Cancer,” Statistics in Medicine 40, no. 8 (2021): 2006–2023.
- 19. Zou H., Xiao L., Zeng D., and Luo S., “Dynamic Prediction With Multivariate Longitudinal Outcomes and Longitudinal Magnetic Resonance Imaging Data,” Annals of Applied Statistics 19, no. 1 (2025): 505–528.
- 20. Eilers P. H. C. and Marx B. D., “Flexible Smoothing With B‐Splines and Penalties,” Statistical Science 11, no. 2 (1996): 89–102.
- 21. Wood S. N., Pya N., and Säfken B., “Smoothing Parameter and Model Selection for General Smooth Models,” Journal of the American Statistical Association 111, no. 516 (2016): 1548–1563.
- 22. Dempster A. P., Laird N. M., and Rubin D. B., “Maximum Likelihood From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society: Series B: Methodological 39, no. 1 (1977): 1–22.
- 23. Ruppert D., “Selecting the Number of Knots for Penalized Splines,” Journal of Computational and Graphical Statistics 11, no. 4 (2002): 735–757.
- 24. Xiao L., “Asymptotic Theory of Penalized Splines,” Electronic Journal of Statistics 13, no. 1 (2019): 747–794.
- 25. Wood S. N., “Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models,” Journal of the Royal Statistical Society (B) 73, no. 1 (2011): 3–36.
- 26. Henderson R., Diggle P., and Dobson A., “Joint Modelling of Longitudinal Measurements and Event Time Data,” Biostatistics 1, no. 4 (2000): 465–480.
- 27. Bender R., Augustin T., and Blettner M., “Generating Survival Times to Simulate Cox Proportional Hazards Models,” Statistics in Medicine 24, no. 11 (2005): 1713–1723.
- 28. Li R. and Xiao L., “Latent Factor Model for Multivariate Functional Data,” Biometrics 79, no. 4 (2023): 3307–3318.
