Journal of Applied Statistics. 2021 Sep 22;50(1):43–59. doi: 10.1080/02664763.2021.1981256

Jointly modelling multiple transplant outcomes by a competing risk model via functional principal component analysis

Jianghu (James) Dong a,b, Haolun Shi c, Liangliang Wang c, Ying Zhang a, Jiguo Cao c
PMCID: PMC9754024  PMID: 36530777

Abstract

In many clinical studies, longitudinal biomarkers are used to monitor the progression of a disease. For example, in a kidney transplant study, the glomerular filtration rate (GFR) is used as a longitudinal biomarker to monitor the progression of kidney function and the patient's state of survival, which is characterized by multiple time-to-event outcomes such as kidney transplant failure and death. It is known that the joint modelling of longitudinal and survival data leads to a more accurate and comprehensive estimation of the covariates' effects. While most joint models use the longitudinal outcome as a covariate for predicting survival, very few models consider the further decomposition of the variation within the longitudinal trajectories and its effect on survival. We develop a joint model that uses functional principal component analysis (FPCA) to extract useful features from the longitudinal trajectories and adopts the competing risk model to handle multiple time-to-event outcomes. The longitudinal trajectories and the multiple time-to-event outcomes are linked via the shared functional features. The application of our model to a real kidney transplant data set reveals the significance of these functional features, and a simulation study is carried out to validate the accuracy of the estimation method.

Keywords: Competing risks, functional principal component analysis, joint model, latent variables, kidney transplant

1. Background and introduction

Various studies, such as Levey et al. [11] and Wolfe et al. [17], have shown that kidney transplantation prolongs the survival of patients with end-stage renal disease. As patients may experience acute rejection or graft failure post-transplantation, how to extend the long-term survival of the kidney graft remains the main scientific question for transplant studies. If the rate of kidney graft failure can be reduced, the overall patient population would enjoy a longer survival time. Surrogate markers have been proposed to predict kidney graft failure. For example, Marcen et al. [12] and Moranne et al. [13] proposed to use the slope of the GFR trajectories to predict graft failure using a Cox model.

Three key questions pertain to understanding the interconnection between the GFR trajectories and the long-term transplant outcomes. The first question is how to fit a continuous trajectory from the repeated longitudinal GFR measurements. The second question is how to model the survival hazard of multiple outcomes/events simultaneously, as kidney recipients post-transplantation are subject to competing risks of transplant failure as well as death from causes other than transplant failure. The third question is how to identify the effects of the shape of the longitudinal trajectories on the prediction of multiple time-to-event outcomes.

To address the first question, several methods have been developed. For example, parametric models are commonly used to fit the GFR trajectories, e.g. Marcen et al. [12], Moranne et al. [13], and Dong et al. [5]. Another alternative is to use a nonparametric approach. For example, the functional principal component analysis is adopted by Dong et al. [4] to explore the major sources of variation among the GFR trajectories. The dimension of the GFR trajectories is effectively reduced and each curve can be represented by four functional principal components (FPCs). The top four FPCs account for 99.8% of the total variation.

To address the second question, various survival models have been proposed to handle competing events. In the competing risk framework, two popular competing risk models are used. One is the cause-specific hazard model proposed by Prentice et al. [14] and Putter et al. [15], and the other is the subdistribution hazards regression introduced by Fine and Gray [7]. We model the multiple time-to-event outcomes via the latter approach, which is based on a reweighted risk set for consistent estimation of the regression coefficients.

To address the third question, the longitudinal outcomes and the time-to-event outcomes are linked using the FPC scores as the shared latent features. As shown by Dong et al. [4], the four FPCs relate to four primary patterns of variation within the GFR trajectories. Figure 1 shows the GFR trajectories of four clusters of patients whose trajectories are dominated by their first, second, third, and fourth FPC scores, respectively. The risk of kidney transplant failure or death might be different when a patient's GFR trajectory is flat versus when the trajectory highly fluctuates. It is thus of interest to explore whether and how the progression of kidney function differs among these four clusters, and a natural approach is to use the FPC scores as the shared covariates between the longitudinal model and the survival model.

Figure 1.

The 4 clusters of observable GFR curves by FPC scores from our preliminary analysis. The thick blue curve is the average of GFR curves in each panel.

Several joint models for longitudinal and time-to-event outcomes have been built on FPCA. Yao [18] developed a joint model where FPCA is used to fit the longitudinal trajectories and the longitudinal outcome is treated as a covariate in the Cox regression model. Ding and Wang [3] proposed a joint model that treats longitudinal outcomes as nonparametric multiplicative random effects within the Cox proportional hazards framework. However, these two joint models can only accommodate a single time-to-event outcome. A number of joint models have been developed recently for longitudinal and competing risk data; Hickey et al. [8] summarize the published joint models with competing-risks events. However, none of these joint models for longitudinal and competing risk data were set up for FPCA. It is of interest to determine the relationship between the patient's progression of kidney function and the dominant variation patterns of the GFR trajectories in our clinical kidney transplantation data. Therefore, we propose a new joint model based on the FPC scores as the shared latent features between the longitudinal and survival components.

Our model uses functional principal component analysis for the modeling of the longitudinal measurements and a competing risk subdistribution hazard model for handling multiple time-to-event outcomes. The main highlight of this paper is that after reflecting on the three key clinical questions in the kidney transplant studies, we tailor a new joint model to adequately address them. To the best of our knowledge, the proposed model is the first to explore the relationship between the pattern of variation of the GFR trajectories and the patient's state of survival that is subject to multiple time-to-event outcomes, using FPC scores as the latent shared features between the longitudinal model and the survival model.

The rest of this article is organized as follows. The proposed joint model is introduced in Section 2. We present the estimation method for the proposed joint model in Section 3. Section 4 demonstrates the application of our joint model in the kidney transplant data. Section 5 presents a simulation study to investigate the finite sample performances of our joint model. Conclusions and discussion are given in Section 6.

2. Joint model

Let $T_i$ and $C_i$ denote the event and censoring times, respectively, for the $i$th subject, where $i = 1, \ldots, N$. Let $Y_i(t)$ denote the longitudinal outcome of the $i$th subject at time $t$, $t \in \mathcal{T}$, and $Z_i$ the covariate vector of the $i$th subject. We observe $X_i = \min(T_i, C_i)$ and the censoring indicator $\Delta_i = I(T_i \le C_i)$. Each $T_i$ may correspond to one of $M$ different event types, and we denote by $m_i \in \{1, \ldots, M\}$ the index of the observed event type. We assume that the subjects are independent of each other.

2.1. Longitudinal model

The model for the longitudinal outcome $Y_i(t)$ is based on functional principal component analysis, which decomposes the underlying stochastic process into a linear combination of functional principal components. As mentioned in Dong et al. [4], for the analysis of GFR curves, the principal component analysis through conditional expectation (PACE) approach is well suited for conducting FPCA and handling longitudinal outcomes with possibly missing values and measurement errors. The first four leading FPCs account for the majority of the variation (99.8%). We adopt this approach for fitting the longitudinal measurements.

Let $\tilde{Y}_i(t) = Y_i(t) - \hat{\alpha}^\top Z_i$ be the longitudinal process adjusted for the effects of the covariates. We decompose $\tilde{Y}_i(t)$ as

$$\tilde{Y}_i(t) = \mu(t) + \sum_{k=1}^{K} \xi_{ik}\phi_k(t) + \epsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$

where $\epsilon_i$ are independent and identically distributed normal measurement error terms with mean 0 and variance $\sigma^2$. The function $\phi_k(t)$ is the $k$th functional principal component, which satisfies $\int_{\mathcal{T}} \phi_k(t)\phi_j(t)\,dt = \delta_{kj}$, where $\delta_{kj} = 1$ if $k = j$ and 0 otherwise. The quantity $\xi_{ik}$ is the associated functional principal component score for the $i$th subject and the $k$th component, which is defined as

$$\xi_{ik} = \int_{\mathcal{T}} \{\tilde{Y}_i(t) - \mu(t)\}\,\phi_k(t)\,dt.$$

The magnitude of $\xi_{ik}$ represents the degree of similarity between $\tilde{Y}_i(t) - \mu(t)$ and $\phi_k(t)$. The mean and variance of the distribution of $\xi_{ik}$ are $E(\xi_{ik}) = 0$ and $\mathrm{Var}(\xi_{ik}) = \lambda_k$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$.

By Mercer's theorem, the covariance function between any two time points $s$ and $t$ in the time period $\mathcal{T}$, defined as $G(s,t) = \mathrm{Cov}(\tilde{Y}_i(s) - \mu(s), \tilde{Y}_i(t) - \mu(t))$, can be expressed as

$$G(s,t) = \sum_{k=1}^{\infty} \lambda_k \phi_k(s)\phi_k(t).$$

As it would be unrealistic to estimate an infinite number of $\phi_k(t)$, in practice $\tilde{Y}_i(t)$ is usually well approximated by retaining only the first $K$ leading FPCs.

To estimate the FPCs, the first step is to establish smoothed estimates of the mean and covariance functions. The mean function is obtained by smoothing the data from all observations with a one-dimensional local linear smoother [6], and the covariance function $G(s,t)$ is estimated by a two-dimensional smoother [10,19]. Let $\hat{\mu}(t)$ and $\hat{G}(s,t)$ denote the estimated mean trajectory and the estimated smoothed covariance surface.
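The mean-smoothing step can be illustrated with a minimal local linear smoother. This is only a sketch under stated assumptions: the function name `local_linear_smooth`, the Gaussian kernel, and the fixed `bandwidth` argument are illustrative choices, not the implementation used in the paper.

```python
import numpy as np

def local_linear_smooth(t_obs, y_obs, t_grid, bandwidth):
    """Local linear estimate of the mean function on t_grid.

    t_obs, y_obs: pooled measurement times and values from all subjects.
    A Gaussian kernel is used here purely for illustration.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    mu_hat = np.empty(len(t_grid))
    for g, t0 in enumerate(t_grid):
        d = t_obs - t0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # kernel weights around t0
        X = np.column_stack([np.ones_like(d), d])    # local intercept and slope
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y_obs)  # weighted least squares
        mu_hat[g] = beta[0]                          # intercept = fitted value at t0
    return mu_hat
```

A useful sanity check is that a local linear fit reproduces exactly linear data, whatever the bandwidth. In practice the bandwidth would be chosen from the data, for example by cross-validation.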

To obtain estimates for $\phi_k(t)$ and $\lambda_k$, we solve the eigenequation

$$\int_{\mathcal{T}} \hat{G}(s,t)\,\phi_k(s)\,ds = \lambda_k \phi_k(t), \qquad (2)$$

with the constraints $\|\phi_k\|^2 = 1$ and $\langle \phi_k, \phi_j \rangle = 1$ if $k = j$, and 0 otherwise. The solution $\hat{\phi}_k(t)$ to this eigenequation can be found by applying a spectral decomposition to the discretized covariance surface $\hat{G}(s,t)$.
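As a rough illustration of the discretization step, the sketch below solves the eigenequation on an equally spaced grid by a Riemann-sum approximation of the integral. The function name `fpca_from_covariance` and the equal-spacing assumption are ours and are not taken from the paper.

```python
import numpy as np

def fpca_from_covariance(G_hat, grid, n_components):
    """Eigenvalues and eigenfunctions of a covariance surface on a grid.

    G_hat: (m, m) covariance surface evaluated on `grid` (assumed equally
    spaced). The integral operator is approximated by delta * G_hat.
    """
    grid = np.asarray(grid, dtype=float)
    delta = grid[1] - grid[0]                  # grid spacing
    G_sym = 0.5 * (G_hat + G_hat.T)            # enforce symmetry numerically
    evals, evecs = np.linalg.eigh(G_sym)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order] * delta                 # eigenvalues of the integral operator
    phi = evecs[:, order] / np.sqrt(delta)     # rescale so that sum(phi**2) * delta = 1
    return lam, phi
```

The sign of each eigenfunction is arbitrary, so downstream code should not rely on it. Negative eigenvalue estimates, which can arise from a smoothed covariance surface, would need to be truncated before use.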

Let $n_i$ denote the number of time points on the trajectory of the $i$th subject. Let $\hat{\phi}_{ik}$ and $\hat{\mu}_i$ denote the vectors of values of $\hat{\phi}_k(t)$ and $\hat{\mu}(t)$ evaluated at the time points of the $i$th subject, and let $\tilde{G}_i$ denote the matrix of values of $\hat{G}(s,t)$ evaluated on the two-dimensional grid formed by the time points of the $i$th subject. The FPC score of the $i$th subject for the $k$th FPC is computed from the conditional expectation

$$\hat{\xi}_{ik} = \hat{E}(\xi_{ik} \mid \tilde{Y}_i) = \lambda_k \hat{\phi}_{ik}^\top (\tilde{G}_i + \hat{\sigma}^2 I)^{-1} (\tilde{Y}_i - \hat{\mu}_i),$$

where $\tilde{Y}_i = (\tilde{Y}_{i1}, \ldots, \tilde{Y}_{in_i})^\top$ is the vector of covariate-adjusted longitudinal data points of the $i$th subject, and $I$ is an identity matrix of size $n_i$.
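The conditional-expectation formula for the scores translates almost directly into code. The sketch below is illustrative only: the function name `pace_scores` is hypothetical, and the subject-level covariance matrix is rebuilt from the truncated eigen-expansion rather than read off the smoothed surface, which is a simplifying assumption.

```python
import numpy as np

def pace_scores(y_i, mu_i, phi_i, lam, sigma2):
    """Conditional-expectation FPC scores for one subject.

    y_i:    (n_i,) covariate-adjusted measurements
    mu_i:   (n_i,) mean function at the subject's time points
    phi_i:  (n_i, K) eigenfunctions at the subject's time points
    lam:    (K,) eigenvalues
    sigma2: measurement error variance
    """
    # Assumption: covariance at the subject's time points taken from the
    # truncated expansion sum_k lam_k * phi_k(s) * phi_k(t).
    G_i = (phi_i * lam) @ phi_i.T
    Sigma_i = G_i + sigma2 * np.eye(len(y_i))
    resid = y_i - mu_i
    # lam_k * phi_ik^T Sigma_i^{-1} (y_i - mu_i), for all k at once
    return lam * (phi_i.T @ np.linalg.solve(Sigma_i, resid))
```

Because the scores are shrunk through the error variance, sparsely observed subjects get scores pulled toward zero, which is one reason this approach copes with irregular and missing measurements.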

2.2. Competing risk model

The competing risk model is well-suited for our analysis of the time to multiple competing events of interest. Without loss of generality, we consider two types of events, one event of primary interest, and another event constituting a competing risk. If a competing event occurs before the event of primary interest, the primary event of interest would no longer be observable. For example, in the case of our application, kidney failure is the event of primary interest, whereas death from other causes constitutes the competing event. Censoring of the primary event (kidney failure) is not only due to loss of follow-up but also due to the competing event (death from other causes).

Due to the existence of a competing event, the usual Kaplan-Meier estimate of the survival function of the primary event would render biased results. This is because the usual assumption that any subject with a censored observation will eventually experience the primary event of interest no longer holds due to the presence of the competing event. If the competing event occurs prior to the primary event, it would be impossible for the subject to experience the event of primary interest, thus violating the assumption of the Kaplan-Meier method. Under the Kaplan-Meier method, the cumulative probability function will approach one given infinite time of follow-up; whereas under the presence of the competing risk, there remain a proportion of subjects who are affected by the competing event and will never experience the event of primary interest.

An alternative to the conventional cumulative probability function under the Kaplan-Meier method is the cumulative incidence function. Let $m = 1, 2$ index the primary event and the competing event, respectively, and let $h_m(t)$ denote the cause-specific hazard function for the $m$th event type. The cumulative incidence function of the $m$th event is

$$F_m(t) = \int_0^t S(s)\,h_m(s)\,ds,$$

where

$$S(t) = \exp\left\{-\int_0^t \big(h_1(s) + h_2(s)\big)\,ds\right\}.$$

The survival function $S(t)$ is the probability that the subject survives to time $t$ without experiencing event 1 or event 2. The cumulative incidence function can be regarded as the marginal subdistribution function of the time-to-event of a given type.

From the cumulative incidence function, the subdistribution hazard function is naturally defined as

$$\bar{h}_m(t) = -\frac{d}{dt}\log\big(1 - F_m(t)\big).$$

It is worth noting that $\bar{h}_m(t)$ is a better characterization of the cumulative incidence function than the cause-specific hazard function $h_m(t)$. Integrating $\bar{h}_m(t)$ recovers the cumulative incidence function, i.e. $F_m(t) = 1 - \exp\{-\int_0^t \bar{h}_m(s)\,ds\}$, whereas applying the same transformation to $h_m(t)$ yields an improper function that lacks a meaningful interpretation.
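A small numerical check makes these relationships concrete. The sketch below assumes constant cause-specific hazards (the values 0.10 and 0.05 are purely illustrative), for which the cumulative incidence function has a closed form, and verifies that integrating the subdistribution hazard recovers it while the same transform of the cause-specific hazard does not.

```python
import numpy as np

h1, h2 = 0.10, 0.05                       # illustrative constant cause-specific hazards
t = np.linspace(0.0, 60.0, 6001)

S = np.exp(-(h1 + h2) * t)                # event-free survival
F1 = h1 / (h1 + h2) * (1.0 - S)           # closed-form cumulative incidence of event 1

# Subdistribution hazard: F1'(t) / (1 - F1(t)), with F1'(t) = h1 * S(t)
sub_hazard = h1 * S / (1.0 - F1)

# Cumulative integral of the subdistribution hazard (trapezoidal rule)
steps = 0.5 * (sub_hazard[1:] + sub_hazard[:-1]) * np.diff(t)
cum_sub = np.concatenate([[0.0], np.cumsum(steps)])
F1_from_sub = 1.0 - np.exp(-cum_sub)      # recovers the cumulative incidence function

# Same transform applied to the cause-specific hazard
naive = 1.0 - np.exp(-h1 * t)
```

Under these assumed hazards the cumulative incidence of event 1 levels off at $h_1/(h_1+h_2) = 2/3$, whereas the naive curve keeps rising toward one, which mirrors the bias of the Kaplan-Meier approach described earlier.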

To incorporate the effects of covariates into the subdistribution hazard function, and thus model their direct effects on the cumulative incidence function, we impose a proportional hazards structure on the subdistribution hazard:

$$\bar{h}_m(t \mid \xi_i, Z_i) = \bar{h}_{m,0}(t)\exp\{\gamma_m^\top \xi_i + \beta_m^\top Z_i\},$$

where $\bar{h}_{m,0}(t)$ is the baseline subdistribution hazard function.
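To see why this structure gives covariates a direct effect on the cumulative incidence function, one can integrate the proportional subdistribution hazard. In the short derivation below, the symbols $\bar{H}_{m,0}(t) = \int_0^t \bar{h}_{m,0}(s)\,ds$ (baseline cumulative subdistribution hazard), $F_{m,0}(t)$ (baseline cumulative incidence function) and $\eta_{mi}$ (linear predictor) are our own shorthand and do not appear in the original text.

```latex
\begin{aligned}
F_m(t \mid \xi_i, Z_i)
  &= 1 - \exp\Big\{-\int_0^t \bar{h}_m(s \mid \xi_i, Z_i)\,ds\Big\} \\
  &= 1 - \exp\big\{-\bar{H}_{m,0}(t)\, e^{\eta_{mi}}\big\} \\
  &= 1 - \{1 - F_{m,0}(t)\}^{\exp(\eta_{mi})},
\end{aligned}
\qquad \eta_{mi} = \gamma_m^\top \xi_i + \beta_m^\top Z_i .
```

A positive coefficient therefore raises the cumulative incidence at every time point, and a negative one lowers it.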

Estimating the covariate effects in the subdistribution hazard function is not as straightforward as fitting separate Cox models for the different causes. Fine and Gray [7] proposed a reweighted approach on a modified risk set for consistent estimation of the regression coefficients. Specifically, to adjust for diminishing observability due to censoring, subjects who have experienced a competing event are retained in the risk set with a weight that tapers off as time progresses, according to a function that depends on the censoring distribution.

2.3. Model setup

For the longitudinal process, in addition to extracting the major sources of variation via FPCA, we are interested in the association between the longitudinal outcome and the covariate vector $Z_i$. For the survival process, it is of interest to model the cumulative incidence function $F_m(t \mid \xi_i, Z_i) = \Pr(T_i \le t, m_i\Delta_i = m \mid \xi_i, Z_i)$, which is the probability that the event of type $m$ occurs at or before time $t$. This probability is conditional on the covariate vector as well as the functional principal component scores from the longitudinal process, denoted by $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$, where $K$ is the number of FPCs extracted.

We propose the following joint model for the longitudinal outcome $Y_i(t)$ and the subdistribution hazard of multiple competing time-to-event outcomes,

$$\begin{cases} Y_i(t) = \mu(t) + \Phi(t)^\top \xi_i + \alpha^\top Z_i + \epsilon_i, \\ \bar{h}_m(t \mid \xi_i, Z_i) = \bar{h}_{m,0}(t)\exp\{\gamma_m^\top \xi_i + \beta_m^\top Z_i\}. \end{cases} \qquad (3)$$

The first equation represents the model for the longitudinal outcome $Y_i(t)$, where $\mu(t)$ is the overall mean function of the longitudinal outcome, $\Phi(t) = (\phi_1(t), \ldots, \phi_K(t))^\top$ is the vector of functional principal components, $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$ is the vector of FPC scores for the $i$th subject, i.e. $\Phi(t)^\top \xi_i = \sum_{k=1}^{K} \xi_{ik}\phi_k(t)$, and $\alpha = (\alpha_1, \ldots, \alpha_P)^\top$ is a vector of coefficients for the fixed effects of $Z_i = (Z_{i1}, \ldots, Z_{iP})^\top$. It should be noted that the usual Mercer expansion would have $K = \infty$, whereas we retain only the top $K$ FPCs that explain the majority of the variation among the longitudinal trajectories. The FPC scores $\xi_i$ serve as the shared features in the survival model, and they link the longitudinal outcome and the competing time-to-event outcomes in the joint model.

The second equation is a competing risk survival model. The subdistribution hazard function $\bar{h}_m(t \mid \xi_i, Z_i)$ is the hazard rate of the conditional cumulative incidence function $F_m(t \mid \xi_i, Z_i)$ for the $m$th competing risk, $\gamma_m = (\gamma_{m1}, \ldots, \gamma_{mK})^\top$ is the vector of coefficients for the random effects of the FPC scores $\xi_i$, and $\beta_m = (\beta_{m1}, \ldots, \beta_{mP})^\top$ is the vector of coefficients for the fixed effects of $Z_i = (Z_{i1}, \ldots, Z_{iP})^\top$. We take $m = 1$ as the primary event of interest, corresponding to kidney transplant failure.

There are several advantages of incorporating the FPC scores as the latent shared features in the new joint model. Contrary to the conventional models [18], which use the longitudinal outcome Yi(t) as a covariate in the proportional hazard structure, the FPC scores, which represent the pattern of variation in the longitudinal curves, may offer a better characterization of the relationship between the longitudinal process and the survival outcomes. In particular, in the case of the kidney transplant data, prior studies [5] have identified notable variation in the GFR trajectories, e.g. one group of patients may have very flat GFR trajectories, whereas another group may exhibit huge variation in the trajectories in terms of shape or slope. It is thus of interest to study the effects of the curve patterns on the survival incidence function.

3. The estimation method

3.1. The joint likelihood functions

This section describes inference for the proposed joint model. Let $(t_i, \Delta_i, m_i\Delta_i, Z_i, Y_i(t))$ denote the observations for each subject in the data, where $t_i$ is the observed survival time, $m_i \in \{1, \ldots, M\}$ is the observed event type, $\Delta_i$ is the censoring indicator of the event, $Z_i$ is the observed covariate vector, and $Y_i(t)$ is the longitudinal outcome. Let $C_i$ be a potential censoring time and $T_i$ the event time. We assume that $X_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$. The parameters $\Theta = (\gamma, \beta, h_0, \Lambda, \sigma^2)$ need to be estimated from the data, where $\gamma = (\gamma_1, \ldots, \gamma_M)^\top$, $\beta = (\beta_1, \ldots, \beta_M)^\top$, and $\Lambda = (\lambda_1, \ldots, \lambda_K)^\top$, with $\lambda_k$ the variance of the FPC score $\xi_{ik}$.

In order to estimate the parameters, we need to construct the joint likelihood function. As mentioned in Section 2.1, let $\tilde{Y}_i(t) = Y_i(t) - \hat{\alpha}^\top Z_i$ be the longitudinal process after adjusting for the effects of the covariates. We assume that the longitudinal outcome $\tilde{Y}_i(t)$ and the time-to-event $T_i$ are conditionally independent given the latent FPC scores $\xi_i$. The longitudinal trajectories $\tilde{Y}_i(t)$ are determined by the FPC scores $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$ as shown in Section 2.1, so the joint probability density function of $\tilde{Y}_i(t)$ and the time-to-event $T_i$ can be factorized into the density of the FPC scores $\xi_i$ and the conditional survival density of $T_i$ given the latent FPC scores $\xi_i$. We also assume that the subjects who are censored at time $t$ are representative of all the subjects in that subgroup who remained at risk at time $t$ with respect to their survival experience. In other words, censoring is independent provided that it is random within any subgroup of interest. The full likelihood of the full set of parameters under independent censoring is given by

$$L(\Theta) = \prod_{i=1}^{n}\prod_{m_i=1}^{M}\left\{\int f(T_i, \Delta_i, m_i\Delta_i \mid \xi_i, Z_i, \gamma, \beta)\, f(\tilde{Y}_i(t) \mid \xi_i, \sigma)\, f(\xi_i \mid \Lambda)\, d\xi_i\right\}, \qquad (4)$$

where the survival density function $f(T_i, \Delta_i, m_i \mid \xi_i, Z_i, \gamma, \beta)$ is given by

$$f(T_i, \Delta_i, m_i \mid \xi_i, Z_i, \gamma, \beta) = h_m(t_i \mid \xi_i, Z_i, \gamma, \beta)^{I(m_i\Delta_i = m)}\, S(t_i \mid \xi_i, Z_i, \gamma, \beta)^{1 - I(m_i\Delta_i = m)},$$

the longitudinal density function is

$$f(\tilde{Y}_i(t) \mid \xi_i, \sigma) = (2\pi\sigma^2)^{-n_i/2}\exp\left\{-\frac{1}{2\sigma^2}\big(\tilde{Y}_i(t) - \mu_i(t)\big)^\top\big(\tilde{Y}_i(t) - \mu_i(t)\big)\right\}$$

with $\mu_i(t) = \mu(t) + \sum_{k=1}^{K} \xi_{ik}\phi_k(t)$, and the shared latent variable density function of the FPC scores is

$$f(\xi_i \mid \Lambda) = (2\pi|\Lambda|)^{-1/2}\exp\left(-\frac{1}{2}\xi_i^\top \Lambda^{-1}\xi_i\right).$$

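For example, in the application we choose the following subdistribution hazard density function for the survival submodel:

$$f(T_i, \Delta_i, m_i \mid \xi_i, Z_i, \gamma, \beta) = \left[\frac{\exp\{\gamma_m^\top \xi_i + \beta_m^\top Z_i\}}{\sum_{j=1}^{n} R_j(T_i)\, w_j(T_i)\exp\{\gamma_m^\top \xi_j + \beta_m^\top Z_j\}}\right]^{I(m_i\Delta_i = m)},$$

where

$$R_j(t) = \begin{cases} 1 & \text{if } m_j \ne m, \\ I(T_j \ge t) & \text{if } m_j = m, \end{cases}$$

and the weighting function is

$$w_j(t) = \begin{cases} 1 & \text{if } t \le T_j, \\ 0 & \text{if } t > T_j \text{ and } \Delta_j = 0, \\ \hat{G}(t)/\hat{G}(T_j) & \text{if } t > T_j \text{ and } \Delta_j = 1, \end{cases}$$

where $\hat{G}(t)$ is the standard Kaplan-Meier estimator of the censoring distribution,

$$\hat{G}(t) = \prod_{t_i < t}\left\{1 - \frac{\sum_{j=1}^{n} I(t_j = t_i, \Delta_j = 0)}{\sum_{j=1}^{n} I(t_j \ge t_i)}\right\}.$$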
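The censoring-distribution estimator and the weights can be sketched in a few lines. This is an illustrative reading of the formulas rather than the authors' code: the function names `censoring_km` and `fine_gray_weight` are hypothetical, and ties and edge cases are handled only in the simplest way.

```python
import numpy as np

def censoring_km(times, delta):
    """Kaplan-Meier estimate of the censoring distribution.

    times: observed times; delta: 1 if an event was observed, 0 if censored.
    Returns a function G_hat(t) that takes the product over censoring
    times strictly before t.
    """
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=int)
    cens_times = np.unique(times[delta == 0])
    factors = np.array([
        1.0 - np.sum((times == u) & (delta == 0)) / np.sum(times >= u)
        for u in cens_times
    ])

    def G_hat(t):
        return float(np.prod(factors[cens_times < t]))

    return G_hat

def fine_gray_weight(t, time_j, delta_j, G_hat):
    """Weight of subject j at time t, following the three cases in the text."""
    if t <= time_j:
        return 1.0                      # still under observation at t
    if delta_j == 0:
        return 0.0                      # censored before t
    # Event observed before t: down-weight by the censoring distribution.
    # Assumes G_hat(time_j) > 0.
    return G_hat(t) / G_hat(time_j)
```

As a usage example, with observed times `[1, 2, 3, 4, 5]` and indicators `[1, 0, 1, 0, 1]`, a subject with an event at time 3 receives weight 0.5 at time 5, because half of the comparable subjects would have been censored in between.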

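According to Equation (4), the score function is proportional to

$$S(\Theta) \propto \frac{\partial}{\partial\Theta}\left(\sum_{i=1}^{n}\log\left\{\prod_{m=1}^{M}\left\{\int f(T_i, \Delta_i, m_i \mid \xi_i, Z_i, \gamma, \beta)\, f(\tilde{Y}_i(t) \mid \xi_i, \sigma)\, f(\xi_i \mid \Lambda)\, d\xi_i\right\}\right\}\right) = \sum_{i=1}^{n}\int \frac{\partial h(\Theta, \xi_i)}{\partial\Theta}\, f(\xi_i \mid T_i, \Delta_i, \tilde{Y}_i(t), \Theta)\, d\xi_i, \qquad (5)$$

where

$$h(\Theta, \xi_i) = \log\left\{\prod_{m=1}^{M} f(T_i, \Delta_i, m_i \mid \xi_i, Z_i, \gamma, \beta)\, f(\tilde{Y}_i(t) \mid \xi_i, \sigma)\, f(\xi_i \mid \Lambda)\right\}.$$

The observed-data score vector in (5) is expressed as the expected value of the complete-data score vector with respect to the posterior distribution of the random effects $\xi$. If the score equations in (5) are solved with respect to $\Theta$, with $f(\xi_i \mid T_i, \Delta_i, \tilde{Y}_i(t), \Theta)$ fixed at the value of $\Theta$ from the previous iteration, the procedure is an EM algorithm. The proposed algorithm for estimating the parameters is given in Section 3.2.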

3.2. Parameter estimation

This section focuses on estimating the parameters in the joint likelihood function. There are two main challenges. One is the need for numerical integration over the latent variables $\xi_i$, which becomes harder as the dimension of the random effects increases. The other is estimating the density function $f(\xi_i \mid \Lambda)$, because we do not have a closed form for the FPC functions $\phi_k(t)$. Therefore, we propose a modified two-stage algorithm to estimate the parameters, specified as follows:

  1. Stage I
    1. Estimate the mean function $\hat{\mu}$, the FPCs $\hat{\phi}_k$, the FPC scores $\hat{\xi}_i$, and $\hat{\sigma}^2$ as shown in Section 2.1. The vector of random effects $\xi_i$ is shared between the longitudinal and survival submodels. In this way, we aim to reduce the bias from informative dropout when estimating the random-effects parameters. Here, informative dropout means that there are missing measurements in the longitudinal trajectory $Y_i(t)$. These missing measurements can be recovered for all subjects as
      $$\hat{Y}_i(t) = \hat{\mu}(t) + \sum_{k=1}^{K} \hat{\xi}_{ik}\hat{\phi}_k(t) + \epsilon_i, \qquad (6)$$
      where $t$ can be any past or future time point before patient death. In this way, we can simulate the complete longitudinal measurements $Y_i(t)$ in the next step.
    2. Simulate complete measurements of the longitudinal data $Y_i(t)$ from Equation (6), which is based on the estimated mean function $\hat{\mu}(t)$, the estimated FPCs $\hat{\phi}_k(t)$, the latent variables $\hat{\xi}_i$, and $\hat{\sigma}^2$. The latent variables $\hat{\xi}_i$ and $\hat{\sigma}^2$ are simulated from normal distributions with means and variances estimated from the observed longitudinal data in the previous step. Using the complete longitudinal measurements, we can re-estimate the parameters $\hat{\mu}(t)$, $\hat{\phi}_k(t)$, $\hat{\xi}_i$, and $\hat{\sigma}^2$ using the procedure in step 1. As $\min_i(n_i) \to \infty$, the estimated parameters in the submodel converge in probability to the estimated parameters obtained from the joint model, as shown in Rizopoulos [16] and Huong et al. [9].
  2. Stage II

    After inserting the fitted values from Stage I, the proposed joint model becomes
    $$\begin{cases} \hat{Y}_i(t) = \hat{\mu}(t) + \sum_{k=1}^{K} \hat{\xi}_{ik}\hat{\phi}_k(t) + \epsilon_i, \\ h_m(t \mid \hat{\xi}_i, Z_i) = h_{m,0}\exp[\gamma_m^\top \hat{\xi}_i + \beta_m^\top Z_i]. \end{cases}$$
    1. Approximate the expectation of the complete-data likelihood. After inserting the fitted values from Stage I, the full joint likelihood function takes the following form:
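      $$L(\Theta) = \prod_{i=1}^{n}\int\prod_{m=1}^{M}\left(\{h_m[t_i \mid \xi_i, Z_i]\}^{I(m_i\Delta_i = m)}\exp\left\{-\int_0^{\infty}\sum_{i=1}^{n} R_i(u)\, h_m(u \mid \xi_i, Z_i)\,du\right\}^{1 - I(m_i\Delta_i = m)}\right) f(\tilde{Y}_i(t) \mid \hat{\xi}_i, \hat{\sigma})\, f(\hat{\xi}_i \mid \hat{\Lambda})\, d\xi_i.$$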
      As shown in [9,16], the expectation of the complete-data log-likelihood function can be approximated as follows when $\min_i(n_i) \to \infty$:
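      $$E(l(\Theta)) \approx \sum_{i=1}^{n}\log\left\{\prod_{m=1}^{M}\left(\{h_m[t_i \mid \hat{\xi}_i, Z_i]\}^{I(m_i\Delta_i = m)}\exp\left\{-\int_0^{\infty}\sum_{i=1}^{n} R_i(u)\, h_m(u \mid \hat{\xi}_i, Z_i)\,du\right\}^{1 - I(m_i\Delta_i = m)}\right)\right\} + \log f(\tilde{Y}_i(t) \mid \hat{\xi}_i, \hat{\sigma}) + \log f(\hat{\xi}_i \mid \hat{\Lambda}). \qquad (7)$$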
    2. Estimate the parameters $\gamma$ and $\beta$ of the survival model by maximizing the approximation of the expected complete-data log-likelihood in (7).
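The recovery of complete trajectories in Stage I, Equation (6), can be sketched as follows. This is a simplified illustration: the function name `simulate_complete_trajectories` is hypothetical, the fitted scores are used as given rather than redrawn from their estimated distributions, and only the measurement noise is simulated.

```python
import numpy as np

def simulate_complete_trajectories(mu_hat, phi_hat, xi_hat, sigma2_hat, rng):
    """Fill in trajectories on a complete time grid, following Equation (6).

    mu_hat:     (T,) estimated mean function on the grid
    phi_hat:    (T, K) estimated FPCs on the grid
    xi_hat:     (n, K) estimated FPC scores, one row per subject
    sigma2_hat: estimated measurement error variance
    rng:        numpy random Generator
    Returns an (n, T) array of recovered measurements.
    """
    fitted = mu_hat + xi_hat @ phi_hat.T                 # mean plus score-weighted FPCs
    noise = rng.normal(0.0, np.sqrt(sigma2_hat), size=fitted.shape)
    return fitted + noise
```

The completed trajectories would then be passed back through the FPCA steps of Section 2.1 to re-estimate the mean function, FPCs, scores and error variance before the survival submodel is fitted in Stage II.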

4. Application: kidney transplant study

A total of 5654 kidney transplant recipients from the United Network for Organ Sharing (UNOS) are included in the study. Patients may experience kidney transplant failure, die, or remain healthy until the end of their follow-up periods. Among the 5654 patients, 1590 (28%) experience kidney transplant failure and 1735 (31%) eventually die. A total of 707 (44%) patients die after kidney failure, and 1028 (28%) patients die with a kidney still functioning. A competing risk model is adopted where the primary event of interest is transplant failure and the competing event is death from other causes before transplant failure. We use the first four FPC scores computed from the functional principal component analysis of the patients' longitudinal GFR trajectories as covariates in the model. In addition, we include prognostic covariates such as age, sex, race, cause of end-stage renal disease (ESRD), and kidney donor type. The detailed patient demographics are shown in Table 1.

Table 1.

Summary of kidney transplant recipient characteristics in the kidney transplant data.

Patient characteristics variables Percentage (%)
Age  
18–39 48
40–59 39
≥60 13
Sex  
Male 59
Female 41
Race  
White 61
Black 31
Other 8
Cause of ESRD  
Diabetes 32
Hypertension 21
Glomerular disease 29
Polycystic disease 9
Other 9
Kidney donor type  
Deceased 78
Living 22

4.1. Results: longitudinal submodel

We choose K = 4 based on the Akaike Information Criterion (AIC) that takes into account the joint likelihood of longitudinal and survival models to select the number of principal components to be extracted. From functional principal components analysis, the 4 leading functional principal components (K = 4) account for 99.8% of the total variability of the GFR curves; the first, second, third, and fourth FPCs respectively account for 84.6%, 10.6%, 3.4%, and 1.1%. Although the first 2 FPCs account for 95.24% of the total variation, the third and fourth ones also contain important information that has potential predictive value.

The first four functional principal components are shown in Figure 2. The patterns of the four leading functional principal component curves are similar to those in Figure 3 of Dong et al. [4], which is included in the Appendix. However, the new curves are more robust and smoother than those in Figure 3 of Dong et al. [4]. The reason is that the joint model can recover missing longitudinal GFR measurements from the survival data when the longitudinal and survival data are modelled together. These functional principal component curves have a straightforward interpretation. For example, the first FPC stays flat during the entire follow-up period, indicating that the largest source of between-subject GFR variation is the distance of a subject's GFR curve from the mean GFR curve. In other words, how far the curve is from the mean GFR curve captures the largest variation in the data, and the majority of patients have a relatively stable trajectory. The second FPC represents the ascent or decline in the GFR curve. The third and fourth FPCs represent a higher degree of fluctuation in the GFR curves.

Figure 2.

The first four leading functional principal components estimated from the observed longitudinal GFR data and the recovered GFR data, obtained by incorporating survival data information when jointly modelling the longitudinal and survival outcomes.

4.2. Results: competing risk submodel

Table 2 shows the hazard ratios of the covariates in the proposed joint model with m = 2. All four FPC scores are statistically significant for both the death and kidney failure outcomes. The effects of these FPC scores on the hazard differ. For example, the first FPC score has a negative effect on the hazard while the second FPC score has a positive one. The clinical interpretations of the effects of the FPC scores are as follows. The first FPC score depicts the main level of the GFR curve, i.e. how far it deviates from the mean. The level of GFR is related to the state of kidney function, as higher GFR values indicate better kidney health. A patient with a larger first FPC score has a higher GFR level and is thus less likely to suffer from kidney failure, whereas a patient with a smaller first FPC score (and thus a lower GFR level) may be subject to a higher risk of kidney failure. Similarly, the second FPC score relates to the degree of decline in the GFR curve. A larger second FPC score indicates a steeper decline and is thus associated with a higher hazard of kidney failure. The third and fourth FPC scores are also significantly related to the event of interest; they represent the degree of abnormal fluctuation within the curve and can be used as a guide to identifying abnormal patients with highly fluctuating GFR curves. The significance of the FPC scores indicates that the conventional proposal in the literature [12,13], which assumes that the change of the GFR curves is primarily linear, might give an incomplete depiction of the variation within the GFR curves as well as its effects. Such a simplified assumption may lead to biased findings.

Table 2.

Estimated hazard ratios of kidney failure post kidney transplant (m = 1) in the joint model with different survival submodels with 95% confidence interval given in brackets.

| Covariates | Joint model, competing risk submodel: hazard ratio (95% CI) | Joint model, competing risk submodel: p-value | Joint model, Cox submodel: hazard ratio (95% CI) | Joint model, Cox submodel: p-value |
| --- | --- | --- | --- | --- |
| Age | | | | |
| 18–39 | 1.00 | | 1.00 | |
| 40–59 | 0.68 (0.61, 0.76) | <0.001 | 1.61 (1.35, 1.92) | 0.001 |
| ≥60 | 0.47 (0.38, 0.57) | <0.001 | 1.49 (1.15, 1.93) | 0.001 |
| Sex | | | | |
| Male | 1.00 | | 1.00 | |
| Female | 0.89 (0.80, 0.99) | 0.048 | 0.78 (0.66, 0.91) | 0.038 |
| Race | | | | |
| White | 1.00 | | 1.00 | |
| Black | 1.39 (1.23, 1.55) | <0.001 | 1.37 (1.16, 1.62) | <0.001 |
| Other | 0.82 (0.66, 1.02) | 0.079 | 0.63 (0.44, 0.89) | <0.001 |
| Cause of ESRD | | | | |
| Diabetes | 1.00 | | 1.00 | |
| Hypertension | 0.89 (0.74, 0.97) | 0.026 | 0.74 (0.61, 0.91) | <0.001 |
| Glomerular disease | 0.99 (0.85, 1.14) | 0.834 | 0.55 (0.44, 0.68) | <0.001 |
| Polycystic disease | 0.76 (0.60, 0.97) | 0.026 | 0.44 (0.31, 0.62) | <0.001 |
| Other | 0.90 (0.76, 1.07) | 0.221 | 0.56 (0.44, 0.71) | <0.001 |
| Kidney donor type | | | | |
| Deceased | 1.00 | | 1.00 | |
| Living | 0.90 (0.79, 0.99) | 0.048 | 0.84 (0.68, 1.02) | 0.084 |
| FPC score | | | | |
| First FPC score | 0.981 (0.979, 0.982) | <0.001 | 0.968 (0.965, 0.970) | <0.001 |
| Second FPC score | 1.009 (1.005, 1.013) | <0.001 | 1.003 (1.000, 1.005) | 0.001 |
| Third FPC score | 0.976 (0.969, 0.983) | <0.001 | 0.964 (0.953, 0.975) | <0.001 |
| Fourth FPC score | 0.993 (0.974, 1.000) | 0.050 | 0.972 (0.946, 0.999) | 0.048 |
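The hazard ratios for the FPC scores in Table 2 are per one-unit change in the score, which is why they sit close to 1. Under the proportional hazards structure, a shift of several units multiplies the log hazard ratio by that shift. The sketch below rescales the competing risk submodel estimates; the helper name `hazard_ratio_for_shift` and the 10-unit shift are illustrative assumptions, not quantities reported in the study.

```python
import numpy as np

# Per-unit hazard ratios for the FPC scores (competing risk submodel, Table 2)
hr_per_unit = {"FPC1": 0.981, "FPC2": 1.009, "FPC3": 0.976, "FPC4": 0.993}

def hazard_ratio_for_shift(hr, shift):
    """Hazard ratio implied by a `shift`-unit change in a covariate."""
    return float(np.exp(np.log(hr) * shift))

# Illustrative: a 10-unit increase in the first FPC score
hr_fpc1_10 = hazard_ratio_for_shift(hr_per_unit["FPC1"], 10.0)
```

For the first FPC score, a 10-unit increase corresponds to a hazard ratio of about 0.83, a far more noticeable effect than the per-unit value of 0.981 suggests.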

As a comparison, Table 2 also displays estimation results under the Cox model. The Cox model attempts to answer a different question from the competing risks model. Under a Cox model, all patients who die before kidney failure are regarded as censored. The effect identified in the Cox model can be interpreted as the association between a covariate and the hazard rate of kidney failure in patients who have not experienced either kidney failure or death. On the other hand, under a competing risks model, the covariate effect is interpreted as the degree of association to the instantaneous rate of kidney failure given that the patient has either been healthy and never experienced kidney failure, or has died and could never possibly have kidney failure. The issue addressed by the Cox model refers to the risks of kidney failure that could be expected if a patient lives long enough, instead of, as with the competing risks model, the actual risk of kidney failure. The Cox model might be more suitable for understanding the etiological association between the covariates and the event of kidney failure; whereas the competing risks model is more relevant for prediction and allocation of resources as it focuses on the actual risk of the event occurring.

Regarding the effect of age, we observe an interesting pattern. Under the Cox model, the older age groups have hazard ratios greater than 1, indicating that, from an etiological perspective, older patients are more likely to experience kidney failure. In contrast, under the competing risks model, the hazard ratios of the older age groups are smaller than 1: relative to the age group of 18 to 35, patients in the age groups of 40 to 59 and of 60 or above have hazard ratios of 0.68 and 0.47, respectively. A possible explanation is that young patients tend to live long enough for kidney failure to occur, whereas for old patients death more often than not occurs before kidney failure.

The relationships of the time-to-event outcomes with the rest of the covariates have reasonable clinical interpretations. For example, female patients are less likely to experience a kidney failure event compared with male patients. Compared with patients whose donor is deceased, patients who have a living donor transplant are less likely to experience kidney failure.

5. Simulation study

We evaluate the estimation accuracy of our joint model under simulated scenarios. As the primary focus is the functional component of the joint model, we assume for simplicity that there are no baseline covariates $Z_i$. An FPCA model with no covariates is used to simulate the longitudinal trajectories,

$$Y_i(t) = \mu(t) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(t) + \epsilon_i,$$

where $K = 4$, and the mean function $\mu(t)$ and the eigenfunctions $\phi_k(t)$ are those estimated from the kidney transplant data. The measurement errors $\epsilon_i$ are assumed to follow $N(0, 0.85)$, and the FPC scores follow the distributions $\xi_{ik} \sim N(0, \sigma_k^2)$, where $k = 1, \ldots, 4$, and $\sigma_1 = 16$, $\sigma_2 = 8$, $\sigma_3 = 4$, $\sigma_4 = 1$. For each subject, the scheduled repeated measurement times are set on the grid $(1, \ldots, T_i)$, where $T_i$ is the event time. Each cohort has a maximum follow-up time of 10. The time to the primary event is assumed to follow a log-normal distribution, and the time to the competing risk event is assumed to follow a Weibull distribution. The hazard for the time-to-event outcome $T_i$ is specified as

$$h_m(t \mid \xi_i) = h_{m0}(t)\exp\left(\gamma_{m1}\xi_{i1} + \gamma_{m2}\xi_{i2} + \gamma_{m3}\xi_{i3} + \gamma_{m4}\xi_{i4}\right), \quad m = 1, 2.$$

The censoring times are simulated from a uniform distribution on the interval (0, 10), the censoring rate is about 40%, and the number of repeated longitudinal measurements per person is greater than 3. We generate data cohorts in two scenarios in which $(\gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{14})$ and $(\gamma_{21}, \gamma_{22}, \gamma_{23}, \gamma_{24})$ take different values. As the estimation accuracies for the two event types are similar, only the results for the primary competing risk, i.e. $(\gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{14})$, are presented. Table 3 displays the estimates, standard errors, and coverage probabilities for the two scenarios. The estimation is quite accurate, as the estimates are close to the true values. The empirical standard deviations of the estimates are also close to the average standard errors, corroborating the correctness of our estimation. The empirical coverage probabilities are slightly lower than the nominal levels of 95% and 99%; this could be due to the stage-wise nature of our estimation procedure. Overall, the model performs satisfactorily in terms of estimation accuracy.
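The data-generating mechanism described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' simulation code: the mean curve `mu`, the cosine basis `phi`, and the log-normal and Weibull baseline parameters are hypothetical placeholders (the paper uses the mean function and eigenfunctions estimated from the kidney transplant data and does not report the baseline parameters), $N(0, 0.85)$ is read as mean and variance, and the measurement error is drawn independently at each visit. Event times are obtained by inverting the cumulative hazard at a unit exponential draw.

```python
import numpy as np
from statistics import NormalDist

K = 4
SIGMA = np.array([16.0, 8.0, 4.0, 1.0])  # sigma_1..sigma_4 from the text (standard deviations)
ERR_VAR = 0.85                            # assumption: N(0, 0.85) read as (mean, variance)
MAX_FOLLOW_UP = 10.0


def mu(t):
    """Hypothetical mean curve; the paper uses the one estimated from real data."""
    return 50.0 - 1.5 * t


def phi(k, t):
    """Hypothetical orthonormal cosine basis on [0, 10], k = 1..4."""
    return np.sqrt(2.0 / MAX_FOLLOW_UP) * np.cos(k * np.pi * t / MAX_FOLLOW_UP)


def lognormal_ph_time(eta, meanlog, sdlog, rng):
    """Draw T with hazard h0(t) * exp(eta), where h0 is a log-normal hazard."""
    e = rng.exponential(size=eta.shape)
    # Solve H0(T) * exp(eta) = e with H0(t) = -log(1 - F0(t)); p is F0(T).
    p = -np.expm1(-e * np.exp(-eta))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)    # keep inv_cdf strictly inside (0, 1)
    z = np.array([NormalDist().inv_cdf(v) for v in p])
    return np.exp(meanlog + sdlog * z)


def weibull_ph_time(eta, shape, scale, rng):
    """Draw T with hazard h0(t) * exp(eta), where H0(t) = (t / scale) ** shape."""
    e = rng.exponential(size=eta.shape)
    return scale * (e * np.exp(-eta)) ** (1.0 / shape)


def simulate_cohort(n, gamma1, gamma2, rng):
    xi = rng.normal(0.0, SIGMA, size=(n, K))              # FPC scores
    t1 = lognormal_ph_time(xi @ gamma1, 1.5, 1.0, rng)    # primary event (placeholder baseline)
    t2 = weibull_ph_time(xi @ gamma2, 1.5, 8.0, rng)      # competing event (placeholder baseline)
    c = rng.uniform(0.0, MAX_FOLLOW_UP, size=n)           # censoring time
    first_event = np.minimum(t1, t2)
    time = np.minimum(first_event, c)
    # 0 = censored, 1 = primary event, 2 = competing event
    status = np.where(c <= first_event, 0, np.where(t1 <= t2, 1, 2))
    trajectories = []
    for i in range(n):
        grid = np.arange(1.0, np.floor(time[i]) + 1.0)    # visits at 1, 2, ... up to the observed time
        signal = mu(grid) + sum(xi[i, k] * phi(k + 1, grid) for k in range(K))
        noise = rng.normal(0.0, np.sqrt(ERR_VAR), size=grid.size)
        trajectories.append((grid, signal + noise))
    return xi, time, status, trajectories
```

A cohort of 100 subjects could then be drawn with `simulate_cohort(100, np.ones(4), gamma2, np.random.default_rng(1))`, where `gamma2` is a length-4 array chosen by the user. The placeholder baselines are not tuned to reproduce the 40% censoring rate, and subjects with three or fewer visits are not filtered out here.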

Table 3.

Means, empirical standard deviations (SD), average standard errors (SE), and the 95% and 99% empirical coverages in two different scenarios. Each scenario has 100 simulation replicates and 100 subjects in each replicate.

Parameters γ11 γ12 γ13 γ14
Scenario 1        
True value 1.000 1.000 1.000 1.000
Estimate 1.014 0.986 1.021 0.974
Empirical SD 0.127 0.212 0.131 0.132
Average SE 0.121 0.208 0.127 0.128
Empirical coverage (95%) 0.932 0.946 0.938 0.937
Empirical coverage (99%) 0.974 0.986 0.978 0.973
Scenario 2        
True value −1.000 0.850 −0.750 0.500
Estimate −0.985 0.844 −0.739 0.484
Empirical SD 0.131 0.112 0.196 0.114
Average SE 0.128 0.107 0.189 0.105
Empirical coverage (95%) 0.935 0.946 0.941 0.941
Empirical coverage (99%) 0.982 0.992 0.981 0.987
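
The coverage rows in Table 3 can be computed from per-replicate estimates and standard errors along the following lines. This is a minimal sketch that assumes normal-approximation (Wald) intervals, which the text does not state explicitly; the function names are illustrative. The second helper gives the binomial Monte Carlo standard error of a coverage estimate, a rough yardstick for how far an empirical coverage can drift from its nominal level by chance alone.

```python
import numpy as np
from statistics import NormalDist


def empirical_coverage(estimates, std_errors, true_value, level=0.95):
    """Share of replicates whose Wald interval contains the true value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # e.g. about 1.96 for level = 0.95
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return float(np.mean((lower <= true_value) & (true_value <= upper)))


def coverage_mc_se(level, n_replicates):
    """Binomial Monte Carlo standard error of an empirical coverage estimate."""
    return (level * (1.0 - level) / n_replicates) ** 0.5
```

With 100 replicates per scenario, as stated in the caption of Table 3, `coverage_mc_se(0.95, 100)` is about 0.022.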

6. Conclusion and discussion

This paper proposed a joint model that includes a longitudinal FPCA submodel and a competing risk submodel, with shared latent functional features. The competing risk survival submodel can incorporate hazard ratios of multiple time-to-event outcomes. We have demonstrated the usefulness and applicability of the proposed joint model on a real kidney transplant data set. The main results from the application reveal meaningful clinical findings. The finite sample performance of the proposed method is verified in the simulation study.

One possible direction of future research relates to the stage-wise approach we adopted to estimate the coefficients. Compared with the regular approach, which seeks the maximizer of the complete data likelihood via the expectation-maximization (EM) algorithm, the stage-wise approach greatly reduces the computational burden, as it no longer requires computing integrals via quadrature. For this reason, the stage-wise procedure is one of the common approaches for joint models, and improvements of stage-wise approaches have been developed in the literature [1,2,9,20,21]. Ye and Wu [20] and Huong et al. [9] have proved the asymptotic equivalence between the stage-wise estimates and the estimates obtained from the complete data likelihood via the EM algorithm. Further improvement in parameter estimation might be achieved in the future with a modified stage-wise EM approach.
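
To make the computational contrast concrete, the observed-data likelihood of a generic shared-latent-variable joint model can be sketched as below. The notation is assumed for illustration only and is not taken from the paper's estimation section: $\theta$ collects all parameters, $\delta_i$ is the event indicator, and $f(\cdot)$ denotes a generic density.

```latex
% Full-likelihood approach: the latent FPC scores \xi_i are integrated out,
% which for K = 4 means a four-dimensional integral per subject,
% typically approximated by quadrature inside each EM iteration.
L(\theta) = \prod_{i=1}^{n} \int
  f\left(Y_i \mid \xi_i; \theta\right)\,
  f\left(T_i, \delta_i \mid \xi_i; \theta\right)\,
  f\left(\xi_i; \theta\right)\, d\xi_i

% Stage-wise approach: estimate the scores from the longitudinal data alone,
% then treat the estimates \hat{\xi}_i as fixed covariates in the hazard model.
h_m\left(t \mid \hat{\xi}_i\right) = h_{m0}(t)
  \exp\left(\textstyle\sum_{k=1}^{K} \gamma_{mk}\,\hat{\xi}_{ik}\right),
  \quad m = 1, 2
```

The second stage involves no integration over $\xi_i$, which is where the computational saving comes from; the price is that the uncertainty in $\hat{\xi}_i$ is not propagated into the second-stage standard errors.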

Acknowledgements

We acknowledge the great support, professional advice, and constructive comments from the Editor-in-Chief Jie Chen, the anonymous Associate Editor, and three anonymous reviewers. We are also indebted to Rahul Unni and others for their great editorial assistance.

Appendix.

For comparison with the patterns of the four leading FPCs, Figure 3 in Dong et al. [4] is reproduced here for the convenience of the readers.

Figure A1.

The first four leading functional principal components (FPCs) estimated by functional principal component analysis in [4].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Albert P.S. and Shih J.H., On estimating the relationship between longitudinal measurements and time-to-event data using a simple two-stage procedure, Biometrics 63 (2008), pp. 983–987.
  • 2. Albert P.S. and Shih J.H., An approach for jointly modeling multivariate longitudinal measurements and discrete time-to-event data, Ann. Appl. Statist. 4 (2010), pp. 1517–1532.
  • 3. Ding J. and Wang J., Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data, Biometrics 64 (2007), pp. 546–556.
  • 4. Dong J., Wang L., Gill J., and Cao J., Functional principal component analysis of GFR curves after kidney transplant, Stat. Methods Med. Res. 27 (2018), pp. 3785–3796.
  • 5. Dong J., Wang S., Wang L., Gill J., and Cao J., Joint modelling for organ transplantation outcomes for patients with diabetes and the end-stage renal disease, Stat. Methods Med. Res. 28 (2019), pp. 2724–2737.
  • 6. Fan J. and Gijbels I., Local Polynomial Modelling and its Applications, London: CRC Press, 1996.
  • 7. Fine J.P. and Gray R.J., A proportional hazards model for the subdistribution of a competing risk, J. Am. Stat. Assoc. 94 (1999), pp. 496–509.
  • 8. Hickey G., Philipson P., Jorgensen A., and Kolamunnage-Dona R., A comparison of joint models for longitudinal and competing risks data, with application to an epilepsy drug randomized controlled trial, J. R. Soc. A 181 (2018), pp. 1105–1123.
  • 9. Huong P., Nur D., Pham H., and Branford A., A modified two-stage approach for joint modelling of longitudinal and time-to-event data, J. Stat. Comput. Simul. 88 (2018), pp. 3379–3398.
  • 10. James G.M., Hastie T.J., and Sugar C.A., Principal component models for sparse functional data, Biometrika 87 (2010), pp. 587–602.
  • 11. Levey A.S., Stevens L.A., Schmid C.H., Zhang Y.L., Castro A.F., Feldman H.I., Kusek J.W., Eggers P., Van Lente F., Greene T., and Coresh J., A new equation to estimate glomerular filtration rate, Ann. Intern. Med. 150 (2009), pp. 604–612.
  • 12. Marcen R., Morales J.M., Fernandez-Rodriguez A., Capdevila L., Pallardo L., Plaza J.J., Cubero J.J., Puig J.M., Sanchez-Fructuoso A., Arias M., Alperovich G., and Seron D., Long-term graft function changes in kidney transplant recipients, Nephrol. Dial. Transplant. 9 (2010), pp. ii2–ii8.
  • 13. Moranne O., Maillard N., Fafin C., Thibaudin L., Alamartine E., and Mariat C., Rate of renal graft function decline after one year is a strong predictor of all-cause mortality, Am. J. Transplant. 13 (2013), pp. 695–706.
  • 14. Prentice R., Kalbfleisch J., Peterson A., Flournoy N., Farewell V., and Breslow N., The analysis of failure times in the presence of competing risks, Biometrika 34 (1978), pp. 541–554.
  • 15. Putter H., Fiocco M., and Geskus R.B., Tutorial in biostatistics: competing risks and multistate models, Stat. Med. 26 (2007), pp. 2389–2430.
  • 16. Rizopoulos D., Fast fitting of joint models for longitudinal and event time data using a pseudoadaptive Gaussian quadrature rule, Comput. Stat. Data Anal. 56 (2011), pp. 2061–2077.
  • 17. Wolfe R.A., Ashby V.B., Milford E.L., Ojo A.O., Ettenger R.E., Agodoa L.Y., Held P.J., and Port F.K., Comparison of mortality in all patients on dialysis, patients on dialysis awaiting transplantation, and recipients of first cadaveric transplant, New Engl. J. Med. 341 (1999), pp. 1725–1730.
  • 18. Yao F., Functional principal component analysis for longitudinal and survival data, Stat. Sin. 17 (2007), pp. 965–983.
  • 19. Yao F., Muller H.G., and Wang J.L., Functional data analysis for sparse longitudinal data, J. Am. Stat. Assoc. 100 (2005), pp. 577–590.
  • 20. Ye Q. and Wu L., Two-step and likelihood methods for joint models of longitudinal and survival data, J. Stat. Comput. Simul. 46 (2017), pp. 0361–0918.
  • 21. Ye W., Lin X., and Taylor J.M.G., Semiparametric modeling of longitudinal measurements and time-to-event data–A two-stage regression calibration approach, Biometrics 64 (2008), pp. 1238–1246.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
