Abstract
Longitudinal biomarkers are often used in clinical studies to monitor the progression of a disease. For example, in a kidney transplant study, the glomerular filtration rate (GFR) is used as a longitudinal biomarker to monitor the progression of kidney function, while the patient's survival status is characterized by multiple time-to-event outcomes, such as kidney transplant failure and death. Joint modelling of longitudinal and survival data is known to yield more accurate and comprehensive estimates of covariate effects. While most joint models use the longitudinal outcome as a covariate for predicting survival, very few consider the further decomposition of the variation within the longitudinal trajectories and its effect on survival. We develop a joint model that uses functional principal component analysis (FPCA) to extract useful features from the longitudinal trajectories and adopts a competing risk model to handle multiple time-to-event outcomes. The longitudinal trajectories and the multiple time-to-event outcomes are linked via the shared functional features. Application of our model to a real kidney transplant data set reveals the significance of these functional features, and a simulation study is carried out to validate the accuracy of the estimation method.
Keywords: Competing risks, functional principal component analysis, joint model, latent variables, kidney transplant
1. Background and introduction
Various studies, such as Levey et al. [11] and Wolfe et al. [17], have shown that kidney transplantation prolongs the survival of patients with end-stage renal disease. As patients may experience acute rejection or graft failure post-transplantation, how to extend the long-term survival of the kidney graft remains the main scientific question for transplant studies. If the rate of kidney graft failure can be reduced, the overall patient population would enjoy a longer survival time. Surrogate markers have been proposed to predict kidney graft failure. For example, Marcen et al. [12] and Moranne et al. [13] proposed to use the slope of the GFR trajectories to predict graft failure using a Cox model.
Three key questions pertain to understanding the interconnection between the GFR trajectories and the long-term transplant outcomes. The first question is how to fit a continuous trajectory from the repeated longitudinal GFR measurements. The second question is how to model the survival hazard of multiple outcomes/events simultaneously, as kidney recipients post-transplantation are subject to competing risks of transplant failure as well as death from causes other than transplant failure. The third question is how to identify the effects of the shape of the longitudinal trajectories on the prediction of multiple time-to-event outcomes.
To address the first question, several methods have been developed. For example, parametric models are commonly used to fit the GFR trajectories, e.g. Marcen et al. [12], Moranne et al. [13], and Dong et al. [5]. An alternative is a nonparametric approach. For example, functional principal component analysis is adopted by Dong et al. [4] to explore the major sources of variation among the GFR trajectories. The dimension of the GFR trajectories is effectively reduced, and each curve can be represented by four functional principal components (FPCs), which together account for the majority of the total variation.
To address the second question, various survival models have been proposed to handle competing events. In the competing risk framework, two popular competing risk models are used. One is the cause-specific hazard model proposed by Prentice et al. [14] and Putter et al. [15], and the other is the subdistribution hazards regression introduced by Fine and Gray [7]. We model the multiple time-to-event outcomes via the latter approach, which is based on a reweighted risk set for consistent estimation of the regression coefficients.
To address the third question, the longitudinal outcomes and the time-to-event outcomes are linked using the FPC scores as the shared latent features. As shown by Dong et al. [4], the four FPCs relate to four primary patterns of variation within the GFR trajectories. Figure 1 shows the GFR trajectories of four clusters of patients whose trajectories are dominated by their first, second, third, and fourth FPC scores, respectively. The risk of kidney transplant failure or death might be different when a patient's GFR trajectory is flat versus when the trajectory highly fluctuates. It is thus of interest to explore whether and how the progression of kidney function differs among these four clusters, and a natural approach is to use the FPC scores as the shared covariates between the longitudinal model and the survival model.
Figure 1.
The 4 clusters of observable GFR curves by FPC scores from our preliminary analysis. The thick blue curve is the average of GFR curves in each panel.
Several joint models for longitudinal and time-to-event outcomes have been built on FPCA. Yao [18] developed a joint model where FPCA is used to fit the longitudinal trajectories and the longitudinal outcome is treated as a covariate in the Cox regression model. Ding and Wang [3] proposed a joint model that treats longitudinal outcomes as nonparametric multiplicative random effects within the Cox proportional hazards framework. However, these two joint models can only accommodate a single time-to-event outcome. A number of joint models have been developed recently for longitudinal and competing risk data. For example, Hickey et al. [8] summarized the published joint models with competing-risk events. However, none of these joint models for longitudinal and competing risk data was set up for FPCA. It is of interest to determine the relationship between the patient's progression of kidney function and the dominant variation patterns of the GFR trajectories in our clinical kidney transplantation data. Therefore, we propose a new joint model based on the FPC scores as the shared latent features between the longitudinal and survival components.
Our model uses functional principal component analysis for the modeling of the longitudinal measurements and a competing risk subdistribution hazard model for handling multiple time-to-event outcomes. The main highlight of this paper is that after reflecting on the three key clinical questions in the kidney transplant studies, we tailor a new joint model to adequately address them. To the best of our knowledge, the proposed model is the first to explore the relationship between the pattern of variation of the GFR trajectories and the patient's state of survival that is subject to multiple time-to-event outcomes, using FPC scores as the latent shared features between the longitudinal model and the survival model.
The rest of this article is organized as follows. The proposed joint model is introduced in Section 2. We present the estimation method for the proposed joint model in Section 3. Section 4 demonstrates the application of our joint model in the kidney transplant data. Section 5 presents a simulation study to investigate the finite sample performances of our joint model. Conclusions and discussion are given in Section 6.
2. Joint model
Let $T_i$ and $C_i$ respectively denote the event and censoring times for the ith subject, where $i = 1, \ldots, n$. Let $Y_i(t)$ denote the longitudinal outcome of the ith subject at time t, $t \in \mathcal{T}$, and $Z_i$ the covariate vector of the ith subject. We observe $\min(T_i, C_i)$ and the censoring indicator $\Delta_i = I(T_i \le C_i)$. Each $T_i$ may correspond to one of M different event types, and we denote $m \in \{1, \ldots, M\}$ as the index for the observed event types. We assume that the subjects are independent of each other.
2.1. Longitudinal model
The model for the longitudinal outcome is based on functional principal component analysis, which decomposes the underlying stochastic process into a linear combination of functional principal components. As mentioned in Dong et al. [4], for the analysis of GFR curves, the principal component analysis through conditional expectation (PACE) approach is well suited for conducting FPCA and handling longitudinal outcomes with possibly missing values and measurement errors. The first four leading FPCs account for the majority of the variation. We adopt this approach for fitting the longitudinal measurements.
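Let $X_i(t)$ be the longitudinal process adjusted for the effects of the covariates. We decompose $X_i(t)$ as

$$X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{ik}\,\phi_k(t) + \epsilon_i(t), \tag{1}$$

where $\epsilon_i(t)$ are independently and identically distributed normal measurement error terms with mean 0 and variance $\sigma^2$. The function $\phi_k(t)$ is the kth functional principal component, which satisfies $\int_{\mathcal{T}} \phi_k(t)\,\phi_j(t)\,dt = \delta_{kj}$, where $\delta_{kj} = 1$ if k = j and 0 otherwise. The quantity $\xi_{ik}$ is the associated functional principal component score for the ith subject and the kth component, which is defined as

$$\xi_{ik} = \int_{\mathcal{T}} \{X_i(t) - \mu(t)\}\,\phi_k(t)\,dt.$$

The magnitude of $\xi_{ik}$ represents the degree of similarity between $X_i(t) - \mu(t)$ and $\phi_k(t)$. The mean and variance of the distribution of $\xi_{ik}$ are 0 and $\lambda_k$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$.

By Mercer's theorem, the covariance function between any two distinct time points s and t in the time period $\mathcal{T}$, defined as $G(s, t) = \mathrm{Cov}\{X_i(s), X_i(t)\}$, can be expressed as

$$G(s, t) = \sum_{k=1}^{\infty} \lambda_k\,\phi_k(s)\,\phi_k(t).$$

As it would be unrealistic to estimate an infinite number of $\phi_k(t)$, in practice $X_i(t)$ is usually well approximated by retaining only the first K leading FPCs.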
To estimate the FPCs, the first step is to establish smoothed estimates of the mean and covariance functions. The mean function is obtained by smoothing the pooled data from all subjects with a one-dimensional local linear smoother [6], and the covariance function is estimated by a two-dimensional smoother [10,19]. Let $\hat{\mu}(t)$ and $\hat{G}(s, t)$ denote the estimated mean trajectory and the estimated smoothed covariance surface.

To obtain estimates for $\lambda_k$ and $\phi_k(t)$, we solve the eigenequation

$$\int_{\mathcal{T}} \hat{G}(s, t)\,\hat{\phi}_k(s)\,ds = \hat{\lambda}_k\,\hat{\phi}_k(t), \tag{2}$$

with the constraints $\int_{\mathcal{T}} \hat{\phi}_k(t)^2\,dt = 1$ and $\int_{\mathcal{T}} \hat{\phi}_k(t)\,\hat{\phi}_j(t)\,dt = 0$ for $k \ne j$. The solution to this eigenequation can be found by applying a spectral decomposition to the discretized covariance surface $\hat{G}(s, t)$.

Let $n_i$ denote the number of time points on the trajectory of the ith subject. Let $\hat{\mu}_i$ and $\hat{\phi}_{ik}$ denote the vectors of values of $\hat{\mu}(t)$ and $\hat{\phi}_k(t)$ evaluated at the time points of the ith subject, and let $\hat{G}_i$ denote the $n_i \times n_i$ matrix of values of $\hat{G}(s, t)$ evaluated at the two-dimensional grid consisting of the time points of the ith subject. The FPC score of the ith subject and the kth FPC is computed from the conditional expectation

$$\hat{\xi}_{ik} = \hat{E}(\xi_{ik} \mid X_i) = \hat{\lambda}_k\,\hat{\phi}_{ik}^\top \left(\hat{G}_i + \hat{\sigma}^2 I_{n_i}\right)^{-1} (X_i - \hat{\mu}_i),$$

where $X_i$ is the vector of covariate-adjusted longitudinal data points of the ith subject, and $I_{n_i}$ is an identity matrix of size $n_i$.
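The estimation steps of this subsection can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the smoothed covariance surface is already available on an equally spaced grid, and the function names `fpca_from_covariance` and `pace_scores`, the toy mean function, the two toy eigenfunctions, and the value of the error variance are all hypothetical choices made for illustration.

```python
import numpy as np

def fpca_from_covariance(G, grid, n_components):
    """Leading eigenvalues/eigenfunctions of a covariance surface on an equally spaced grid."""
    dt = grid[1] - grid[0]
    evals, evecs = np.linalg.eigh((G + G.T) / 2.0)   # symmetric eigen-solver
    order = np.argsort(evals)[::-1][:n_components]   # largest eigenvalues first
    lam = evals[order] * dt               # matrix eigenvalue -> integral-operator eigenvalue
    phi = evecs[:, order] / np.sqrt(dt)   # rescale so that sum(phi_k**2) * dt == 1
    return lam, phi

def pace_scores(x_obs, idx, mu, G, lam, phi, sigma2):
    """Conditional-expectation scores for one subject observed at grid indices idx."""
    Sigma = G[np.ix_(idx, idx)] + sigma2 * np.eye(len(idx))  # covariance of the observed vector
    weights = np.linalg.solve(Sigma, x_obs - mu[idx])
    return lam * (phi[idx, :].T @ weights)

# Toy check with two known components (hypothetical values, not the GFR data).
grid = (np.arange(100) + 0.5) / 100.0
true_phi = np.column_stack([np.ones_like(grid), np.sqrt(2.0) * np.cos(np.pi * grid)])
true_lam = np.array([4.0, 1.0])
G = (true_phi * true_lam) @ true_phi.T            # G(s, t) = sum_k lam_k phi_k(s) phi_k(t)
mu = 50.0 - 5.0 * grid

lam_hat, phi_hat = fpca_from_covariance(G, grid, n_components=2)

idx = np.arange(0, 100, 10)                       # sparse visits: every 10th grid point
x_full = mu + true_phi @ np.array([2.0, -1.0])    # noise-free trajectory with scores (2, -1)
scores = pace_scores(x_full[idx], idx, mu, G, lam_hat, phi_hat, sigma2=0.01)
x_rec = mu + phi_hat @ scores                     # reconstructed curve on the full grid
```

Because eigenvectors are only determined up to sign, the estimated scores may differ in sign from the generating ones; the reconstructed curve `x_rec` is unaffected by this ambiguity, which is why it is the more convenient quantity to compare with the true trajectory.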
2.2. Competing risk model
The competing risk model is well-suited for our analysis of the time to multiple competing events of interest. Without loss of generality, we consider two types of events, one event of primary interest, and another event constituting a competing risk. If a competing event occurs before the event of primary interest, the primary event of interest would no longer be observable. For example, in the case of our application, kidney failure is the event of primary interest, whereas death from other causes constitutes the competing event. Censoring of the primary event (kidney failure) is not only due to loss of follow-up but also due to the competing event (death from other causes).
Due to the existence of a competing event, the usual Kaplan-Meier estimate of the survival function of the primary event would render biased results. This is because the usual assumption that any subject with a censored observation will eventually experience the primary event of interest no longer holds in the presence of the competing event. If the competing event occurs prior to the primary event, it would be impossible for the subject to experience the event of primary interest, thus violating the assumption of the Kaplan-Meier method. Under the Kaplan-Meier method, the cumulative probability function will approach one given infinite follow-up time, whereas in the presence of the competing risk, there remains a proportion of subjects who are affected by the competing event and will never experience the event of primary interest.
An alternative to the conventional cumulative probability function under the Kaplan-Meier method is the cumulative incidence function. Denote $m = 1, 2$ as the index for the primary event and the competing event, respectively, and let $h^{cs}_m(t)$ denote the cause-specific hazard function for the mth event type. The cumulative incidence function of the mth event is

$$F_m(t) = P(T \le t, D = m) = \int_0^t h^{cs}_m(u)\,S(u)\,du,$$

where $D$ denotes the event type and

$$S(t) = \exp\left\{-\int_0^t \left[h^{cs}_1(u) + h^{cs}_2(u)\right] du\right\}.$$

The survival function $S(t)$ is the probability that the subject survives to time t without experiencing events 1 or 2. The cumulative incidence function can be regarded as the marginal subdistribution function of the time-to-event of a given type.

From the cumulative incidence function, the definition of the subdistribution hazard function is naturally derived as

$$h_m(t) = -\frac{d}{dt} \log\{1 - F_m(t)\}.$$

It is worth noting that $h_m(t)$ is a better characterization of the cumulative incidence function than the cause-specific hazard function $h^{cs}_m(t)$. A transformation involving the integration of $h_m(t)$ leads to the cumulative incidence function, i.e. $F_m(t) = 1 - \exp\{-\int_0^t h_m(u)\,du\}$, whereas conducting the same transformation on $h^{cs}_m(t)$ would yield a function that is not the cumulative incidence and lacks a meaningful interpretation.
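The relationship between the cause-specific hazards, the cumulative incidence function, and the subdistribution hazard can be checked numerically. The following sketch assumes two constant cause-specific hazards (0.10 and 0.05 per unit time, hypothetical values chosen only for illustration) and uses the trapezoidal rule; the function name `cumulative_incidence` is likewise an illustrative choice.

```python
import numpy as np

def cumulative_incidence(t, cs_hazards):
    """Cumulative incidence of each cause from cause-specific hazards on a time grid.

    t: increasing 1-D grid starting at 0; cs_hazards: array of shape (n_causes, len(t)).
    """
    dt = np.diff(t)
    total = cs_hazards.sum(axis=0)
    # Cumulative all-cause hazard by the trapezoidal rule.
    cum_total = np.concatenate([[0.0], np.cumsum(0.5 * (total[1:] + total[:-1]) * dt)])
    surv = np.exp(-cum_total)                        # event-free survival S(t)
    integrand = cs_hazards * surv                    # cause-specific hazard times S(u)
    steps = 0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dt
    cif = np.concatenate(
        [np.zeros((cs_hazards.shape[0], 1)), np.cumsum(steps, axis=1)], axis=1)
    return cif, surv

t = np.linspace(0.0, 50.0, 5001)
cs = np.vstack([np.full_like(t, 0.10), np.full_like(t, 0.05)])  # hypothetical hazards
cif, surv = cumulative_incidence(t, cs)

# Naive "one minus Kaplan-Meier" analogue: competing events treated as censoring.
naive = 1.0 - np.exp(-0.10 * t)

# Subdistribution hazard of cause 1 and the cumulative incidence recovered from it.
sub_haz = cs[0] * surv / (1.0 - cif[0])
cum_sub = np.concatenate([[0.0], np.cumsum(0.5 * (sub_haz[1:] + sub_haz[:-1]) * np.diff(t))])
cif_from_sub = 1.0 - np.exp(-cum_sub)
```

In this toy setting the naive curve approaches one, while the cumulative incidence of the primary event levels off near two thirds, the share of subjects who fail from the primary cause rather than the competing one. Integrating the subdistribution hazard recovers the same cumulative incidence curve.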
To incorporate the effects of covariates into the subdistribution hazard function, and thus their direct effects on the cumulative incidence function, we may impose a proportional hazards structure as

$$h_m(t \mid Z) = h_{m0}(t)\,\exp(Z^\top \alpha_m),$$

where $h_{m0}(t)$ is the baseline subdistribution hazard function and $\alpha_m$ is the vector of regression coefficients.
Estimating the covariate effects in the subdistribution hazard function is not as straightforward as fitting separate Cox models for the different causes. Fine and Gray [7] proposed a reweighted approach on a modified risk set for consistent estimation of the regression coefficients. To be specific, to adjust for diminishing observability due to censoring, subjects who have experienced a competing event are retained in the risk set with a weight that tapers off as time progresses according to a function that depends on the censoring distribution.
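The reweighting idea can be made concrete with a small sketch. It assumes the common inverse-probability-of-censoring form of the weight, namely the ratio of the Kaplan-Meier censoring survival at the current time to its value at the subject's competing-event time; the function names, the status coding (0 = censored, 1 = primary event, 2 = competing event), the toy data, and the absence of tied times are all assumptions made for illustration.

```python
import numpy as np

def km_censoring(times, status):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    status: 0 = censored, 1 = primary event, 2 = competing event (no ties assumed).
    """
    cens_times = np.unique(times[status == 0])
    surv = []
    g = 1.0
    for u in cens_times:
        at_risk = np.sum(times >= u)
        n_cens = np.sum((times == u) & (status == 0))
        g *= 1.0 - n_cens / at_risk
        surv.append(g)
    surv = np.array(surv)

    def G(t):
        k = np.searchsorted(cens_times, t, side="right")  # censoring times <= t
        return 1.0 if k == 0 else float(surv[k - 1])

    return G

def fine_gray_weight(t, time_i, status_i, G):
    """Weight of subject i in the modified risk set at time t."""
    if time_i >= t:
        return 1.0                   # still under observation: full weight
    if status_i == 2:
        return G(t) / G(time_i)      # competing event earlier: weight tapers off
    return 0.0                       # primary event or censoring earlier: removed

# Hypothetical toy data.
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
status = np.array([2, 0, 1, 0, 2, 0])
G = km_censoring(times, status)
w_early = fine_gray_weight(3.0, times[0], status[0], G)   # competing event at time 1
w_late = fine_gray_weight(5.0, times[0], status[0], G)
```

In the toy data the subject with a competing event at time 1 stays in the risk set afterwards, but its weight shrinks from `w_early` to `w_late` as more censoring accumulates.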
2.3. Model setup
For the longitudinal process, in addition to extracting the major sources of variation via FPCA, we are interested in the association between the longitudinal outcome $Y_i(t)$ and the covariate vector $Z_i$. For the survival process, it is of interest to model the cumulative incidence function $F_m(t \mid \xi_i, Z_i)$, which is the probability that the event of type m occurs at or before time t. This is a conditional probability given the covariate vector $Z_i$ as well as the functional principal component scores from the longitudinal process, denoted as $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$, where K is the number of FPCs extracted.

We propose the following joint model for the longitudinal outcome and the subdistribution hazard of multiple competing time-to-event outcomes,

$$\begin{cases} Y_i(t) = \mu(t) + \phi(t)^\top \xi_i + Z_i^\top \beta + \epsilon_i(t), \\ h_m(t \mid \xi_i, Z_i) = h_{m0}(t)\,\exp\left(\xi_i^\top \gamma_m + Z_i^\top \alpha_m\right), \quad m = 1, \ldots, M. \end{cases} \tag{3}$$

The first equation represents the model for the longitudinal outcome $Y_i(t)$, where $\mu(t)$ is the overall mean function of the longitudinal outcome, $\phi(t) = (\phi_1(t), \ldots, \phi_K(t))^\top$ is the vector of the functional principal components, $\xi_i$ the vector of FPC scores for the ith subject, i.e. $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$, and $\beta$ is a vector of coefficients for the fixed effects of $Z_i$. It should be noted that the usual Mercer expansion would have $K = \infty$, and we retain only the top K FPCs that explain the majority of variation among the longitudinal trajectories. The FPC scores $\xi_i$ serve as the shared features in the survival model, and they link the longitudinal outcome and the competing time-to-event outcome in the joint model.

The second equation is a competing risk survival model. The subdistribution hazard function $h_m(t \mid \xi_i, Z_i)$ is the hazard rate of the conditional cumulative incidence function for the mth competing risk, $\gamma_m$ is the vector of coefficients for the random effects of FPC scores $\xi_i$, and $\alpha_m$ is the vector of coefficients for the fixed effects of $Z_i$. We take m = 1 as the primary event of interest corresponding to the event of kidney transplant failure.
There are several advantages of incorporating the FPC scores as the latent shared features in the new joint model. In contrast to the conventional models [18], which use the longitudinal outcome as a covariate in the proportional hazards structure, the FPC scores, which represent the pattern of variation in the longitudinal curves, may offer a better characterization of the relationship between the longitudinal process and the survival outcomes. In particular, in the case of the kidney transplant data, prior studies [5] have identified notable variation in the GFR trajectories, e.g. one group of patients may have very flat GFR trajectories, whereas another group may exhibit large variation in the shape or slope of the trajectories. It is thus of interest to study the effects of the curve patterns on the cumulative incidence function.
3. The estimation method
3.1. The joint likelihood functions
This section gives the inference of the proposed joint model. Let denote the observations of each subject in the data, where is the observed survival time, is the observed event type , is the censoring indicator of event, is the observed covariate, and is the longitudinal outcome. Let be a potential censoring time, and be the event time. We assume that , and . The parameters need to be estimated from data, where , , and and is the variance for the FPC score .
In order to estimate the parameters, we need to construct the joint likelihood function. As mentioned in Section 2.1, let be the longitudinal process after adjustment for the effects of the covariates. We assume that the longitudinal outcome and the time-to-event are conditionally independent given the latent FPC scores . The longitudinal trajectories of can be determined by the FPC score as shown in Section 2.1, so the joint probability density function of and the time-to-event can be factorized into the density of the FPC score and the conditional survival density of given the latent FPC score . We also assume that the subjects who are censored at time t are representative of all the subjects in that subgroup who remained at risk at time t with respect to their survival experience. In other words, censoring is independent provided that it is random within any subgroup of interest. The full likelihood of the full set of parameters under independent censoring is given by:
(4) |
where the density survival function is given by
the longitudinal function
and , and the shared latent variable density function of the FPC score
For example, we choose the sub-hazard density function for the sub-survival model in the application example as the following:
where
and the weighting function
where is the standard Kaplan-Meier estimator for the censoring distribution
According to Equation (4), the score function is found to be proportional to
(5) |
where
The observed data score vector in formula (5) is expressed as the expected value of the complete-data score vector with respect to the posterior distribution of the random effects of . If the score equations in the formula (5) can be solved with respect to Θ, with fixed at the Θ value of the previous iteration, then it is an EM algorithm. The proposed algorithm to estimate the parameters will be given in Section 3.2.
3.2. Parameter estimation
This section focuses on estimating the parameters in the joint likelihood function. There are two main challenges in estimating these parameters. One is the numerical integration over the latent variables, which becomes increasingly demanding as the dimension of the random effects increases. The other is the estimation of the density function, because we do not have a closed form for the FPC function . Therefore, we propose a modified two-stage algorithm to estimate the parameters, specified as follows:
- Stage I
  - (a) Estimate all the parameters , the FPC , and the FPC score , and as shown in Section 2.1. The vector of random effects is shared between the longitudinal and survival sub-models. Thereby, we try to reduce the biases from the informative dropout problem when estimating the random effects parameters. Here, informative dropout means that there are missing measurements in the longitudinal trajectory . These missing measurements can be recovered for all subjects as

    (6)

    where t can be any past or future time point before patient death. In this way, we can simulate the complete longitudinal measurements in the next step.
  - (b) Simulate complete measurements of the longitudinal data from Equation (6), which is based on the estimated mean function , the estimated score function , the latent variables and . The latent variables and are simulated from normal distributions with means and variances estimated from the observed longitudinal data in the previous step. Using the complete longitudinal measurements, we can re-estimate parameters , , , and using the procedure in (a). As , the estimated parameters in the submodel will converge in probability to the estimated parameters obtained from the joint model, as shown in Rizopoulos [16] and Huong et al. [9].
- Stage II

  After the insertion of the fitted values from Stage I, the proposed joint models become
  - (a) Approximate the expected function of the complete data likelihood. After inserting the fitted values from Stage I, the full joint likelihood function is in the following form:

    As proven in [9,16], the expected function of the complete data log-likelihood function can be approximated as follows when ,

    (7)
  - (b) Estimate the parameters and for the survival model by maximizing the approximation of the expected function of the complete data log-likelihood in formula (7).
4. Application: kidney transplant study
A total of 5654 kidney transplant recipients from the United Network for Organ Sharing (UNOS) are included in the study. Patients may experience kidney transplant failure, die, or remain healthy until the end of their follow-up periods. Among the 5654 patients, 1590 experience kidney transplant failure and 1735 eventually die. A total of 707 patients die after kidney failure, and 1028 patients die with a kidney still functioning. A competing risk model is adopted where the primary event of interest is transplant failure and the competing event is death from other causes before transplant failure. We use the first four FPC scores computed from the functional principal component analysis of the patients' longitudinal GFR trajectories as covariates in the model. In addition, we include prognostic covariates such as age, sex, race, cause of end-stage renal disease (ESRD) and kidney donor type. The detailed patient demographics are shown in Table 1.
Table 1.
Summary of kidney transplant recipient characteristics in the kidney transplant data.
Patient characteristic | Percentage (%) |
---|---|
Age | |
18–39 | 48 |
40–59 | 39 |
≥60 | 13 |
Sex | |
Male | 59 |
Female | 41 |
Race | |
White | 61 |
Black | 31 |
Other | 8 |
Cause of ESRD | |
Diabetes | 32 |
Hypertension | 21 |
Glomerular disease | 29 |
Polycystic disease | 9 |
Other | 9 |
Kidney donor type | |
Deceased | 78 |
Living | 22 |
4.1. Results: longitudinal submodel
We choose K = 4 based on the Akaike Information Criterion (AIC) that takes into account the joint likelihood of longitudinal and survival models to select the number of principal components to be extracted. From functional principal components analysis, the 4 leading functional principal components (K = 4) account for of the total variability of the GFR curves; the first, second, third, and fourth FPCs respectively account for , , , and . Although the first 2 FPCs account for of the total variation, the third and fourth ones also contain important information that has potential predictive value.
The first four functional principal components are shown in Figure 2. The patterns of the four leading functional principal component curves are similar to those in Figure 3 of Dong et al. [4], which is included in the Appendix. However, the new curves are smoother and more robust than those in Figure 3 of Dong et al. [4]. The reason is that the joint model can recover missing longitudinal GFR measurements from the survival data when the longitudinal and survival data are modelled together. These functional principal component curves have a straightforward explanation. For example, the first FPC stays flat during the entire follow-up period, indicating that the largest GFR variation between subjects is the distance of a subject's GFR curve from the mean GFR curve. In other words, how far the curve lies from the mean GFR curve captures the largest variation in the data, and the majority of patients have a relatively stable trajectory. The second FPC represents the ascent or decline in the GFR curve. The third and fourth FPCs represent a higher degree of fluctuation in the GFR curves.
Figure 2.
The first four leading functional principal components estimated from the observed longitudinal GFR data and the recovered GFR data by incorporating survival data information when jointly modelling the longitudinal and survival outcomes.
4.2. Results: competing risk submodel
Table 2 shows the hazard ratios of the covariates in the proposed joint model with M = 2 event types. All of the 4 FPC scores are statistically significant for both death and kidney failure outcomes. The effects of these FPC scores on the hazard are different. For example, the first FPC score has a negative effect on the hazard while the second FPC score has a positive one. The clinical interpretations of the effects of FPC scores are as follows. The first FPC score depicts the main level of the GFR curve, i.e. how far it deviates from the mean. The level of GFR is related to the state of kidney function, as higher GFR values indicate better kidney health. A patient with a larger first FPC score has a higher GFR level and is thus less likely to suffer from kidney failure, whereas one with a smaller first FPC score (and thus a lower GFR level) may be subject to a higher risk of kidney failure. Similarly, the second FPC score relates to the degree of decline in the GFR curve. A larger second FPC score indicates a steeper decline and is thus associated with a higher hazard of kidney failure. The third and fourth FPC scores are also significantly related to the event of interest; they represent the degree of abnormal fluctuation within the curve. The third and fourth FPC scores can be used as a guide to identifying patients with highly fluctuating GFR curves. The significance of the FPC scores indicates that the conventional proposal in the literature [12,13] to assume that the change of the GFR curves is primarily linear might result in an incomplete depiction of the variation within the GFR curves as well as its effects. Such a simplified assumption may lead to biased findings.
Table 2.
Estimated hazard ratios of kidney failure post kidney transplant (m = 1) in the joint model with different survival submodels, with 95% confidence intervals given in parentheses.
Covariates | Competing risk submodel: hazard ratio | p-value | Cox submodel: hazard ratio | p-value |
---|---|---|---|---|
Age | | | | |
18–39 | 1.00 | | 1.00 | |
40–59 | 0.68 (0.61, 0.76) | | 1.61 (1.35, 1.92) | 0.001 |
≥60 | 0.47 (0.38, 0.57) | | 1.49 (1.15, 1.93) | 0.001 |
Sex | | | | |
Male | 1.00 | | 1.00 | |
Female | 0.89 (0.80, 0.99) | 0.048 | 0.78 (0.66, 0.91) | 0.038 |
Race | | | | |
White | 1.00 | | 1.00 | |
Black | 1.39 (1.23, 1.55) | | 1.37 (1.16, 1.62) | |
Other | 0.82 (0.66, 1.02) | 0.079 | 0.63 (0.44, 0.89) | |
Cause of ESRD | | | | |
Diabetes | 1.00 | | 1.00 | |
Hypertension | 0.89 (0.74, 0.97) | 0.026 | 0.74 (0.61, 0.91) | |
Glomerular disease | 0.99 (0.85, 1.14) | 0.834 | 0.55 (0.44, 0.68) | |
Polycystic disease | 0.76 (0.60, 0.97) | 0.026 | 0.44 (0.31, 0.62) | |
Other | 0.90 (0.76, 1.07) | 0.221 | 0.56 (0.44, 0.71) | |
Kidney donor type | | | | |
Deceased | 1.00 | | 1.00 | |
Living | 0.90 (0.79, 0.99) | 0.048 | 0.84 (0.68, 1.02) | 0.084 |
FPC score | | | | |
First FPC score | 0.981 (0.979, 0.982) | | 0.968 (0.965, 0.970) | |
Second FPC score | 1.009 (1.005, 1.013) | | 1.003 (1.000, 1.005) | 0.001 |
Third FPC score | 0.976 (0.969, 0.983) | | 0.964 (0.953, 0.975) | |
Fourth FPC score | 0.993 (0.974, 1.000) | 0.050 | 0.972 (0.946, 0.999) | 0.048 |
As a comparison, Table 2 also displays estimation results under the Cox model. The Cox model attempts to answer a different question from the competing risks model. Under a Cox model, all patients who die before kidney failure are regarded as censored. The effect identified in the Cox model can be interpreted as the association between a covariate and the hazard rate of kidney failure in patients who have not experienced either kidney failure or death. On the other hand, under a competing risks model, the covariate effect is interpreted as the degree of association to the instantaneous rate of kidney failure given that the patient has either been healthy and never experienced kidney failure, or has died and could never possibly have kidney failure. The issue addressed by the Cox model refers to the risks of kidney failure that could be expected if a patient lives long enough, instead of, as with the competing risks model, the actual risk of kidney failure. The Cox model might be more suitable for understanding the etiological association between the covariates and the event of kidney failure; whereas the competing risks model is more relevant for prediction and allocation of resources as it focuses on the actual risk of the event occurring.
Regarding the effect of age, we observe an interesting pattern. Specifically, under the Cox model, the older age groups have hazard ratios greater than 1, indicating that from an etiological perspective, older patients are more likely to experience kidney failure. In contrast, under the competing risks model, the hazard ratios of the older age groups are smaller than 1, i.e. relative to the age group of 18 to 39, patients in the age groups of 40 to 59 and 60 or above have hazard ratios equal to 0.68 and 0.47, respectively. This is possibly due to the fact that young patients are more likely to have kidney failure before death because they tend to live long enough for kidney failure to occur, while old patients are less likely to experience kidney failure because, more often than not, death occurs before kidney failure.
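A small worked example shows how such a reversal can arise. The numbers below are hypothetical and are not estimated from the transplant data: each group is given constant cause-specific hazards of kidney failure and of death, for which the cumulative incidence of failure has a closed form.

```python
import math

def cif_constant_hazards(t, h_primary, h_competing):
    """Cumulative incidence of the primary event at time t under constant cause-specific hazards."""
    total = h_primary + h_competing
    return h_primary / total * (1.0 - math.exp(-total * t))

# Hypothetical yearly cause-specific hazards for two age groups.
young = {"failure": 0.04, "death": 0.01}
old = {"failure": 0.05, "death": 0.10}

# Cause-specific hazard ratio of failure, old versus young (the Cox-type contrast).
cs_hazard_ratio = old["failure"] / young["failure"]

# Ten-year cumulative incidence of failure in each group (the competing-risks contrast).
cif_young = cif_constant_hazards(10.0, young["failure"], young["death"])
cif_old = cif_constant_hazards(10.0, old["failure"], old["death"])
```

In this toy setting the older group has a cause-specific hazard ratio of 1.25 for failure, yet its ten-year cumulative incidence of failure (about 0.26) is lower than that of the younger group (about 0.31), because death removes many older patients before failure can occur.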
The relationships of the time-to-event outcomes with the rest of the covariates have reasonable clinical interpretations. For example, female patients are less likely to experience a kidney failure event compared with male patients. Compared with patients whose donor is deceased, patients who have a living donor transplant are less likely to experience kidney failure.
5. Simulation study
We evaluate the estimation accuracy of our joint model under simulated scenarios. As the primary focus is the functional component of the joint model, for simplicity, we assume that there are no baseline covariates . An FPCA model with no covariates is used to simulate the longitudinal trajectories,
where K = 4 and the mean function and the eigenfunctions are given as the ones estimated from the kidney transplant data. The measurement errors are assumed to follow , and the FPC scores follow the distributions N , where , and , , , . For each subject, the scheduled repeated measurement times are set at a grid sequence of , where is the event time. Each cohort has a maximum follow-up time of 10. The time to the primary event is assumed to follow a log-normal distribution, and the time to the competing risk event is assumed to follow a Weibull distribution. The hazard for the time-to-event is specified as
The censoring times are simulated from a uniform distribution on the interval (0, 10), the censoring rate is about 40%, and the number of repeated longitudinal measurements per person is greater than 3. We generate data cohorts in two scenarios where and take different values. As the estimation accuracies of the two event types are similar, only results for the primary competing risk, i.e. , are presented. Table 3 displays the estimates, standard errors, and coverage probabilities for the two scenarios. The estimation is quite accurate, as the estimates are close to the true values. The empirical standard deviations of the estimates are also close to the average standard errors, corroborating the correctness of our estimation. The empirical coverage probabilities are slightly lower than the nominal levels of and ; this could be due to the stage-wise nature of our estimation procedure. In general, the model is shown to render satisfactory performance in terms of estimation accuracy.
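The data-generating mechanism can be sketched in a few lines of NumPy. The sketch below is a simplified stand-in rather than the design used for Table 3: the mean function, the cosine eigenfunctions, the score variances, the error standard deviation, the coefficient vector, and the visit spacing are hypothetical, and the primary event time is drawn from an exponential distribution whose rate depends on the FPC scores instead of the log-normal specification described above.

```python
import numpy as np

def simulate_cohort(n, gamma, rng, max_follow_up=10.0, step=0.5):
    """Simulate FPCA trajectories with competing events (all numeric settings hypothetical)."""
    score_var = np.array([4.0, 2.0, 1.0, 0.5])      # hypothetical FPC score variances
    noise_sd = 0.5                                   # hypothetical measurement-error SD
    cohort = []
    for _ in range(n):
        xi = rng.normal(0.0, np.sqrt(score_var))     # K = 4 independent normal scores
        # Primary event: exponential time whose rate depends on the scores (a simplification).
        t_primary = rng.exponential(1.0 / (0.1 * np.exp(gamma @ xi)))
        t_competing = 15.0 * rng.weibull(1.5)        # competing event: Weibull time
        t_censor = rng.uniform(0.0, max_follow_up)   # uniform censoring on (0, 10)
        times = np.array([t_censor, t_primary, t_competing])
        status = int(np.argmin(times))               # 0 = censored, 1 = primary, 2 = competing
        obs_time = float(times[status])
        visits = np.arange(0.0, obs_time, step)      # scheduled visits before the observed time
        # Orthonormal cosine basis on [0, max_follow_up] standing in for the eigenfunctions.
        basis = np.column_stack(
            [np.sqrt(2.0 / max_follow_up) * np.cos(k * np.pi * visits / max_follow_up)
             for k in range(1, 5)])
        y = 50.0 - visits + basis @ xi + rng.normal(0.0, noise_sd, size=visits.size)
        cohort.append({"time": obs_time, "status": status, "visits": visits, "y": y, "xi": xi})
    return cohort

rng = np.random.default_rng(2024)
cohort = simulate_cohort(100, gamma=np.array([0.2, 0.2, 0.2, 0.2]), rng=rng)
```

Each simulated subject carries an observed time, an event status, the visit times, and the noisy longitudinal measurements, which is the input required by a two-stage fit. Subjects with very short follow-up have few visits, so a filter on the number of measurements would be needed to mimic the requirement of more than 3 measurements per person.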
Table 3.
Means, empirical standard deviations (SD), average standard errors (SE), and the 95% and 99% empirical coverages of the parameter estimates in two different scenarios. Each scenario has 100 simulation replicates with 100 subjects in each replicate.
Parameters | ||||
---|---|---|---|---|
Scenario 1 | ||||
True value | 1.000 | 1.000 | 1.000 | 1.000 |
Estimate | 1.014 | 0.986 | 1.021 | 0.974 |
Empirical SD | 0.127 | 0.212 | 0.131 | 0.132 |
Average SE | 0.121 | 0.208 | 0.127 | 0.128 |
Empirical coverage (95%) | 0.932 | 0.946 | 0.938 | 0.937 |
Empirical coverage (99%) | 0.974 | 0.986 | 0.978 | 0.973 |
Scenario 2 | ||||
True value | −1.000 | 0.850 | −0.750 | 0.500 |
Estimate | −0.985 | 0.844 | −0.739 | 0.484 |
Empirical SD | 0.131 | 0.112 | 0.196 | 0.114 |
Average SE | 0.128 | 0.107 | 0.189 | 0.105 |
Empirical coverage (95%) | 0.935 | 0.946 | 0.941 | 0.941 |
Empirical coverage (99%) | 0.982 | 0.992 | 0.981 | 0.987 |
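The summary rows in Table 3 can be computed from the per-replicate estimates and standard errors as in the sketch below. It assumes Wald-type confidence intervals (estimate ± z × SE) and nominal levels of 0.95 and 0.99; the function name `summarize_replicates` and the synthetic inputs are hypothetical, and the numbers it prints are not results from the paper.

```python
import numpy as np
from statistics import NormalDist


def summarize_replicates(estimates, std_errors, true_value, levels=(0.95, 0.99)):
    """Summarize one parameter across simulation replicates.

    estimates, std_errors: one entry per replicate.
    Coverage is the share of replicates whose Wald interval
    estimate +/- z * SE contains true_value (an assumed interval type).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    summary = {
        "mean": estimates.mean(),
        "empirical_sd": estimates.std(ddof=1),  # SD of estimates across replicates
        "average_se": std_errors.mean(),        # mean of model-based SEs
    }
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
        covered = np.abs(estimates - true_value) <= z * std_errors
        summary[f"coverage_{level:.2f}"] = covered.mean()
    return summary


# Synthetic illustration only: 100 replicates of an estimator centred at 1.0.
rng = np.random.default_rng(0)
fake_estimates = 1.0 + 0.13 * rng.standard_normal(100)
fake_std_errors = np.full(100, 0.13)
print(summarize_replicates(fake_estimates, fake_std_errors, true_value=1.0))
```

Comparing `empirical_sd` with `average_se` checks whether the standard errors are calibrated, which is the comparison the text draws from the table.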
6. Conclusion and discussion
This paper proposed a joint model that includes a longitudinal FPCA submodel and a competing risk submodel, with shared latent functional features. The competing risk survival submodel can incorporate hazard ratios of multiple time-to-event outcomes. We have demonstrated the usefulness and applicability of the proposed joint model on a real kidney transplant data set. The main results from the application reveal meaningful clinical findings. The finite sample performance of the proposed method is verified in the simulation study.
One possible direction of future research relates to the stage-wise approach we adopted to estimate the coefficients. Compared with the regular approach, which seeks the maximizer of the complete data likelihood via the expectation-maximization (EM) algorithm, the stage-wise approach greatly reduces the computational burden because it no longer requires computing integrals via quadrature. For this reason, the stage-wise procedure is one of the common approaches for joint models, and improvements of stage-wise approaches for joint models have been developed in the literature [1,2,9,20,21]. Ye and Wu [20] and Huong et al. [9] have proved the asymptotic equivalence between the stage-wise estimates and the estimates obtained from the complete data likelihood via the EM algorithm. Parameter estimation might be further improved in the future by adopting a modified stage-wise EM approach.
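To make the computational argument concrete, the two approaches can be contrasted for a generic joint model with shared latent variables. The notation below is an assumption made for illustration and is not taken from the equations of this paper: $\boldsymbol{\xi}_i$ denotes the vector of FPC scores of subject $i$, $\mathbf{Y}_i$ the longitudinal measurements, $(T_i, \delta_i)$ the observed time and event indicator, and $\theta = (\theta_L, \theta_S)$ the longitudinal and survival parameters.

```latex
% Likelihood-based approach: the latent scores are integrated out for every subject.
L(\theta) = \prod_{i=1}^{n} \int
    f(\mathbf{Y}_i \mid \boldsymbol{\xi}_i; \theta_L)\,
    f(T_i, \delta_i \mid \boldsymbol{\xi}_i; \theta_S)\,
    f(\boldsymbol{\xi}_i; \theta_L)\, d\boldsymbol{\xi}_i

% Stage-wise approach: estimate the scores from the longitudinal submodel first,
% then plug them into the survival likelihood, so no integral is needed.
\hat{\theta}_S = \arg\max_{\theta_S} \sum_{i=1}^{n}
    \log f\bigl(T_i, \delta_i \mid \hat{\boldsymbol{\xi}}_i; \theta_S\bigr)
```

With K = 4 functional principal components, the integral in the first expression is four-dimensional for each subject and has to be approximated at every EM iteration, whereas the second expression only requires fitting the survival submodel once the scores $\hat{\boldsymbol{\xi}}_i$ are available.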
Acknowledgements
We acknowledge the great support, professional advice, and constructive comments from the Editor-in-Chief Jie Chen, the anonymous Associate Editor, and three anonymous reviewers. We are also indebted to Rahul Unni and others for their great editorial assistance.
Appendix.
For comparison with the patterns of the four leading FPCs, Figure 3 in Dong et al. [4] is copied here for the convenience of the readers.
Figure A1.
The first four leading functional principal components (FPCs) estimated by functional principal component analysis in [4].
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Albert P.S. and Shih J.H., On estimating the relationship between longitudinal measurements and time-to-event data using a simple two-stage procedure, Biometrics 63 (2008), pp. 983–987.
- 2. Albert P.S. and Shih J.H., An approach for jointly modeling multivariate longitudinal measurements and discrete time-to-event data, Ann. Appl. Statist. 4 (2010), pp. 1517–1532.
- 3. Ding J. and Wang J., Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data, Biometrics 64 (2007), pp. 546–556.
- 4. Dong J., Wang L., Gill J., and Cao J., Functional principal component analysis of GFR curves after kidney transplant, Stat. Methods Med. Res. 27 (2018), pp. 3785–3796.
- 5. Dong J., Wang S., Wang L., Gill J., and Cao J., Joint modelling for organ transplantation outcomes for patients with diabetes and the end-stage renal disease, Stat. Methods Med. Res. 28 (2019), pp. 2724–2737.
- 6. Fan J. and Gijbels I., Local Polynomial Modelling and its Applications, London: CRC Press, 1996.
- 7. Fine J.P. and Gray R.J., A proportional hazards model for the subdistribution of a competing risk, J. Am. Stat. Assoc. 94 (1999), pp. 496–509.
- 8. Hickey G., Philipson P., Jorgensen A., and Kolamunnage-Dona R., A comparison of joint models for longitudinal and competing risks data, with application to an epilepsy drug randomized controlled trial, J. R. Soc. A 181 (2018), pp. 1105–1123.
- 9. Huong P., Nur D., Pham H., and Branford A., A modified two-stage approach for joint modelling of longitudinal and time-to-event data, J. Stat. Comput. Simul. 88 (2018), pp. 3379–3398.
- 10. James G.M., Hastie T.J., and Sugar C.A., Principal component models for sparse functional data, Biometrika 87 (2010), pp. 587–602.
- 11. Levey A.S., Stevens L.A., Schmid C.H., Zhang Y.L., Castro A.F., Feldman H.I., Kusek J.W., Eggers P., Van Lente F., Greene T., and Coresh J., A new equation to estimate glomerular filtration rate, Ann. Intern. Med. 150 (2009), pp. 604–612.
- 12. Marcen R., Morales J.M., Fernandez-Rodriguez A., Capdevila L., Pallardo L., Plaza J.J., Cubero J.J., Puig J.M., Sanchez-Fructuoso A., Arias M., Alperovich G., and Seron D., Long-term graft function changes in kidney transplant recipients, Nephrol. Dial. Transplant. 9 (2010), pp. ii2–ii8.
- 13. Moranne O., Maillard N., Fafin C., Thibaudin L., Alamartine E., and Mariat C., Rate of renal graft function decline after one year is a strong predictor of all-cause mortality, Am. J. Transplant. 13 (2013), pp. 695–706.
- 14. Prentice R., Kalbfleisch J., Peterson A., Flournoy N., Farewell V., and Breslow N., The analysis of failure times in the presence of competing risks, Biometrika 34 (1978), pp. 541–554.
- 15. Putter H., Fiocco M., and Geskus R.B., Tutorial in biostatistics: competing risks and multistate models, Stat. Med. 26 (2007), pp. 2389–2430.
- 16. Rizopoulos D., Fast fitting of joint models for longitudinal and event time data using a pseudoadaptive Gaussian quadrature rule, Comput. Stat. Data Anal. 56 (2011), pp. 2061–2077.
- 17. Wolfe R.A., Ashby V.B., Milford E.L., Ojo A.O., Ettenger R.E., Agodoa L.Y., Held P.J., and Port F.K., Comparison of mortality in all patients on dialysis, patients on dialysis awaiting transplantation, and recipients of first cadaveric transplant, New Engl. J. Med. 341 (1999), pp. 1725–1730.
- 18. Yao F., Functional principal component analysis for longitudinal and survival data, Stat. Sin. 17 (2007), pp. 965–983.
- 19. Yao F., Muller H.G., and Wang J.L., Functional data analysis for sparse longitudinal data, J. Am. Stat. Assoc. 100 (2005), pp. 577–590.
- 20. Ye Q. and Wu L., Two-step and likelihood methods for joint models of longitudinal and survival data, J. Stat. Comput. Simul. 46 (2017), pp. 0361–0918.
- 21. Ye W., Lin X., and Taylor J.M.G., Semiparametric modeling of longitudinal measurements and time-to-event data–A two-stage regression calibration approach, Biometrics 64 (2008), pp. 1238–1246.