Author manuscript; available in PMC: 2017 Jun 15.
Published in final edited form as: Stat Med. 2016 Jan 7;35(13):2167–2182. doi: 10.1002/sim.6860

A Two-Stage Approach for Dynamic Prediction of Time-to-Event Distributions

Xuelin Huang a,*, Fangrong Yan a,d, Jing Ning a, Ziding Feng a, Sangbum Choi b, Jorge Cortes c
PMCID: PMC4853264  NIHMSID: NIHMS750374  PMID: 26748812

Abstract

Dynamic prediction uses longitudinal biomarkers for real-time prediction of an individual patient's prognosis. This is critical for patients with an incurable disease such as cancer. Biomarker trajectories are usually not linear, nor even monotone, and vary greatly across individuals. Therefore, it is difficult to fit them with parametric models. With this consideration, we propose an approach for dynamic prediction that does not need to model the biomarker trajectories. Instead, as a trade-off, we assume that the biomarker effects on the risk of disease recurrence are smooth functions over time. This approach turns out to be computationally easier. Simulation studies show that the proposed approach achieves stable estimation of biomarker effects over time, has good predictive performance, and is robust against model misspecification. It is a good compromise between two major approaches, namely, (1) joint modeling of longitudinal and survival data, and (2) landmark analysis. The proposed method is applied to patients with chronic myeloid leukemia. At any time following their treatment with tyrosine kinase inhibitors, longitudinally measured BCR-ABL gene expression levels are used to predict the risk of disease progression.

Keywords: Biomarker, Dynamic prediction, Landmark analysis, Longitudinal data, Survival analysis, Time-dependent covariate

1. Introduction

Medical advancements have allowed many types of cancer to be managed as a chronic disease. However, achieving a complete cancer cure remains much more challenging. Cancer recurrence is a common problem and patients are scheduled for many follow-up visits once their primary treatment concludes. It is important to use the longitudinal biomarker data collected during such visits to make real-time predictions for the patient's disease prognosis so that new treatments can be initiated early to prevent disease progression. Thus, the prediction is conducted at each patient visit, and the measure of interest is the time to the next failure event. Such a real-time prediction system is called dynamic prediction and is critical for physicians and patients to be able to monitor disease progression and plan for early prevention and treatment, thus improving the patient's likelihood of survival.

This article is motivated by the need to predict the distribution of time to relapse for patients with chronic myeloid leukemia (CML) [1]. The disease burden of these patients can be represented by the expression level of the gene BCR-ABL. Tyrosine kinase inhibitors (TKIs) achieve good responses in these patients, as defined by both symptoms and the BCR-ABL expression level. Such patients take the TKIs continuously and are scheduled for regular follow-up visits. However, CML may recur during the years following treatment. The common clinical practice is to wait until a patient shows symptoms of disease relapse before starting a new treatment. However, for many patients, the BCR-ABL expression levels increase before clinical symptoms of disease relapse appear. Thus, it would be helpful to use BCR-ABL levels to predict the time to relapse so that patients and physicians can initiate new treatments early to minimize disease relapse.

The literature on traditional survival analysis provides many models for estimating the time to an event of interest. Most such models can be used for the purpose of prediction. However, many of them incorporate only baseline covariates [2, 3], so they can be used to predict survival at the baseline, but are not useful for dynamic prediction at any time beyond that.

As a tool for dynamic prediction, it is natural to consider using the Cox [4] proportional hazards model, with longitudinal biomarker values as time-dependent covariates. We denote by Ti the time to an event of interest, such as disease progression, for subject i, with i = 1, · · · , n. Hereafter, we simply call Ti the survival time for convenience. We denote the baseline covariates as Yi and time-dependent covariates at time t as Zi(t). The Cox model specifies hi(t), the hazard function for Ti, as follows.

$$h_i(t \mid Y_i, Z_i(t)) = h_0(t)\exp\{\theta Y_i + \gamma Z_i(t)\}, \tag{1}$$

where h0(t) is an arbitrary non-negative function, and θ and γ are unknown parameters. At time t, conditional on Ti ≥ t, the future survival distribution for this patient can be computed as follows.

$$\Pr\bigl[T_i \ge t+v \mid T_i \ge t,\, Y_i,\, \{Z_i(t+u),\, 0 \le u \le v\}\bigr] = \exp\left[-\int_0^v h_0(t+u)\exp\{\theta Y_i + \gamma Z_i(t+u)\}\,du\right]. \tag{2}$$

Using the above model, the prediction conducted at time t needs to use the future biomarker values {Z_i(t + u), 0 ≤ u ≤ v}. However, these values of Z_i(t + u) are not yet available at time t. In order to use the above equation to make such predictions, a commonly adopted approach is to model the longitudinal biomarker trajectories, and use the model-based biomarker values in the above Cox model (1). This approach of jointly modeling longitudinal and time-to-event data has been adopted by many researchers [5, 6, 7, 8, 9, 10, 11, 12, 13]. Nice reviews of this topic are available in the literature [14, 15].

In real applications, longitudinal biomarker trajectories can be any shape, and can vary greatly from patient to patient. This can be seen from Figure 1, which shows the trajectories of the BCR-ABL expression levels over time for three patients with CML in the study we described. It is difficult to find satisfactory parametric models to fit longitudinal biomarker data well. In this article, we propose a method that does not specify a model for the changing patterns of the longitudinal biomarkers. It is an information-cumulating model for continuous predictive analysis over time. It performs dynamic predictions through a two-stage procedure. In the first stage, we provide a prediction based on only baseline covariate information and ignore post-baseline time-dependent covariates. In the second stage, we use information from post-baseline time-dependent covariates to improve the prediction results obtained from the first stage. Through this process, our approach is computationally easier than the joint modeling approach, and we avoid any need to impute unobserved biomarker values.

Figure 1.


BCR-ABL expression values over time for three selected patients in the CML data set. The BCR-ABL values are standardized to lie within the range (0,100).

Another approach to dynamic prediction is landmark analysis, which fits a marginal survival model at fixed prediction times known as landmarks [17, 18, 19, 20]. However, in practice, we need to make predictions any time a new biomarker measurement is obtained, not only at a few selected time points. As shown in Figure 2, the time points for patients’ follow-up visits and longitudinal biomarker measurements are scattered irregularly throughout the time interval following the initial treatment. Consequently, we need dynamic prediction models that are applicable at any post-treatment time point. Our proposed approach can perform such dynamic predictions continuously over time. It exploits the following inherent property of hazard functions, which seems to have been neglected in this research area. Suppose the hazard function for T is λ_T(u). Here and below, when needed, we use a subscript on each hazard function to indicate the time variable to which it corresponds. At time t, given T ≥ t, denote the hazard function for the residual time T − t by λ_{T−t}(u); then we have

$$\lambda_{T-t}(u) = \lambda_T(t+u). \tag{3}$$

With this equality, the hazard functions used by landmark analysis at different time points, say λ_{T−3}(u) for prediction at t = 3 given T ≥ 3, and λ_{T−6}(u) and λ_{T−9}(u) (defined similarly), can be expressed as different shifts of a single function, namely, λ_T(3 + u), λ_T(6 + u) and λ_T(9 + u). This helps to greatly reduce the number of unknowns. Moreover, this equality enables us to specify the conditional hazard function of T given T ≥ t at any time t. That is to say, making a prediction is no longer limited to a few selected time points. This equality makes it possible to conduct predictions continuously over time.
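The identity in equation (3) can be verified numerically. The sketch below assumes a Weibull hazard with arbitrarily chosen parameters (nothing here comes from the paper's data) and compares the residual-time hazard, computed from the conditional survival function S(t + u)/S(t), against the shifted hazard λ_T(t + u).

```python
import numpy as np

# Assumed Weibull model, parameters a and sigma chosen arbitrarily for illustration:
# S(u) = exp(-a * u**sigma), lambda_T(u) = a * sigma * u**(sigma - 1).
a, sigma = 0.2, 1.5

def surv(u):
    return np.exp(-a * u ** sigma)

def hazard_T(u):
    return a * sigma * u ** (sigma - 1)

def hazard_residual(t, u, eps=1e-6):
    # Hazard of the residual time T - t given T >= t, computed numerically as
    # -d/du log{ S(t + u) / S(t) } via a forward difference.
    log_s = lambda w: np.log(surv(t + w) / surv(t))
    return -(log_s(u + eps) - log_s(u)) / eps

# The two sides of equation (3) agree up to discretization error, e.g. at t = 3, u = 2.
lhs, rhs = hazard_residual(3.0, 2.0), hazard_T(5.0)
```

The same check passes at any (t, u), which is exactly what lets the proposed method shift one baseline hazard instead of fitting a separate hazard at every landmark.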

Figure 2.


BCR-ABL expression values over time for all patients with CML in the data set.

Another challenge when conducting landmark analysis is that it requires biomarker measurements for all the patients at each landmark time. This requirement is almost impossible to satisfy in practical applications. Figure 2 illustrates this difficulty. Using interpolation or other imputation techniques to estimate the biomarker values may result in bias. Our two-stage method, described in the next section, avoids this difficulty. It enables us to simply use every biomarker measurement at the time it is measured, and does not require any two or more patients to have their biomarkers measured at the same post-baseline time. This makes our approach easy to use and free from biomarker imputation errors.

Our proposed approach is described in Section 2, and is compared with joint modeling and landmark analysis through simulations in Section 3. It is then applied to a CML study in Section 4. Related topics about our approach are discussed in Section 5. Mathematical details are provided in the Appendix.

2. Method

The proposed model for dynamic prediction is implemented through a two-stage procedure. In the first stage, we provide a prediction based on the survival information and baseline covariate information of all the subjects, but ignore the information from their post-baseline time-dependent covariates. In the second stage, we use information from all of the post-baseline time-dependent covariates to improve the prediction results obtained from the first stage. Our two-stage information-cumulating approach proceeds as follows.

2.1. Stage I: Prediction using only survival and baseline covariate information

In the first stage, we suppose that the hazard function λi(t) for patient i with baseline covariate Yi satisfies the following proportional hazards model,

$$\lambda_i(t \mid Y_i) = \lambda_0(t)\exp\{\alpha Y_i\}, \tag{4}$$

where λ_0(t) is an unspecified baseline hazard function and α is an unknown parameter. Here, the hazard function λ_i(t | Y_i) is conditional on the baseline covariates Y_i only. It does not depend on the time-dependent covariates Z_i(t). Thus, it is different from the hazard function h_i(t | Y_i, Z_i(t)) introduced in the previous section, which is conditional on both Y_i and Z_i(t). Let $S_0(t) = \exp\{-\int_0^t \lambda_0(s)\,ds\}$ and S_i(t | Y_i) = Pr(T_i > t | Y_i). Given that T_i > t, the prediction of whether T_i > t + u can be obtained through the following equation,

$$\Pr(T_i > t+u \mid T_i > t, Y_i) = S_i(t+u \mid T_i \ge t, Y_i) = \left\{\frac{S_0(t+u)}{S_0(t)}\right\}^{\exp(\alpha Y_i)}. \tag{5}$$

This prediction can be conducted at any post-baseline time point t.
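Equation (5) is a one-line computation once S_0 and α are available. The sketch below uses a hypothetical exponential baseline and a hypothetical coefficient α = 0.5 as stand-ins (neither comes from the paper's data); in practice Ŝ_0 would come from the Breslow estimator of the next subsection.

```python
import numpy as np

def conditional_survival(S0, t, u, alpha, y):
    """Equation (5): Pr(T > t + u | T > t, Y = y)
    = {S0(t + u) / S0(t)} ** exp(alpha * y)."""
    return (S0(t + u) / S0(t)) ** np.exp(alpha * y)

# Illustration with an assumed exponential baseline S0(t) = exp(-0.1 t)
# and a hypothetical coefficient alpha = 0.5.
S0 = lambda t: np.exp(-0.1 * t)
p_high = conditional_survival(S0, t=10.0, u=5.0, alpha=0.5, y=1.0)
p_low = conditional_survival(S0, t=10.0, u=5.0, alpha=0.5, y=0.0)
# A larger risk score alpha * y yields a lower conditional survival probability.
```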

Denote the censoring time for patient i by C_i, and let X_i = min(T_i, C_i) and Δ_i = I(T_i ≤ C_i), where I(·) is the indicator function. We use lower case letters (x_i, y_i, δ_i, etc.) to denote the realized values of the corresponding random variables. For convenience, we assume that the time variables x_i, i = 1, · · · , n, have been sorted in ascending order without ties. We then use the following two-stage estimation procedure.

In the first stage, we estimate α and S_0(t), t > 0, in equations (4) and (5). The estimator $\hat{\alpha}$ is obtained by maximizing the following partial likelihood,

$$PL(\alpha) = \prod_{i=1}^{n}\left\{\frac{\exp(\alpha y_i)}{\sum_{j:\,x_j \ge x_i}\exp(\alpha y_j)}\right\}^{\delta_i}. \tag{6}$$

Then the Breslow estimator is used to estimate S0(t), t > 0:

$$\hat{\lambda}_0(x_i) = \frac{\delta_i}{\sum_{j:\,x_j \ge x_i}\exp(\hat{\alpha} y_j)},\quad i = 1,\ldots,n;\qquad \hat{S}_0(t) = \exp\left\{-\sum_{i:\,x_i \le t}\hat{\lambda}_0(x_i)\right\}. \tag{7}$$
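The stage-I computations in equations (6) and (7) take only a few lines. The sketch below uses invented toy data and a scalar covariate for simplicity; the risk set of subject i is {j : x_j ≥ x_i}, matching the sorted, tie-free times assumed in the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy data, sorted in ascending order of x without ties, as assumed in the text.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # observed times x_i = min(T_i, C_i)
delta = np.array([1, 0, 1, 1, 0, 1])            # event indicators delta_i
y = np.array([0.5, -0.2, 1.0, 0.1, -0.8, 0.7])  # scalar baseline covariate y_i

def neg_log_partial_lik(alpha):
    # Negative log of the partial likelihood (6).
    eta = alpha * y
    terms = [eta[i] - np.log(np.exp(eta[x >= x[i]]).sum())
             for i in range(len(x)) if delta[i]]
    return -sum(terms)

alpha_hat = minimize_scalar(neg_log_partial_lik).x

# Breslow-type estimator (7): hazard increments at the observed times, then S0_hat.
lam0 = np.array([delta[i] / np.exp(alpha_hat * y[x >= x[i]]).sum()
                 for i in range(len(x))])

def S0_hat(t):
    return np.exp(-lam0[x <= t].sum())
```

With these two pieces, the stage-I prediction (5) follows by plugging `alpha_hat` and `S0_hat` into the conditional survival formula.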

2.2. Stage II: Using longitudinal data to improve predictions from stage I

In the second stage, at time t and with time-dependent covariates Z(t), we should be able to provide a more accurate prediction than that obtained by equation (5) in stage I. For this purpose, given the property of hazard functions in equation (3), we have

$$\lambda_{T_i-t}(u \mid T_i \ge t, Y_i) = \lambda_{T_i}(t+u \mid Y_i). \tag{8}$$

We use the hazard function property in (3) to avoid specifying a hazard function λ_t(u) for T_i − t | T_i ≥ t at each time point t. Instead, the values of λ_t(u), t ≥ 0, are expressed as λ(t + u), a shift of the baseline hazard function. This makes our proposed prediction model parsimonious. According to equation (4),

$$\lambda_{T_i}(t+u \mid Y_i) = \lambda_0(t+u)\exp\{\alpha Y_i\}. \tag{9}$$

Then (8) and (9) together imply $\lambda_{T_i-t}(u \mid T_i \ge t, Y_i) = \lambda_0(t+u)\exp\{\alpha Y_i\}$. Based on this equation, with additional information in Z_i(t), we postulate that

$$\lambda_{T_i-t}(u \mid T_i \ge t, Y_i, Z_i(t)) = \lambda_0(t+u)\exp\{\alpha Y_i\}\exp\{\beta'(t) Z_i(t)\}. \tag{10}$$

In the above model, we regard the parameters λ0(t + u) and α as being the same as in equations (9) and (4), so they can be estimated by equations (6) and (7). The only unknown parameter in the above model (10) is β(t), which is a smooth vector function that describes the time-varying effects of covariates Z(t) on future survival, with β′(t) denoting its transpose. With the above model (10), the prediction at time t can be provided as

$$\Pr(T_i \ge t+u \mid T_i \ge t, Y_i, Z_i(t)) = \left\{\frac{S_0(t+u)}{S_0(t)}\right\}^{\exp\{\alpha Y_i + \beta'(t) Z_i(t)\}}, \tag{11}$$

for which the estimation of α and S_0(·) is provided by equations (6) and (7), and the estimation of β(t) is provided hereafter.

If we have biomarker measurements taken at or prior to baseline, we include such information in Yi, for i = 1, · · · , n. That is to say, for the same biomarker, such as the expression level of BCR-ABL, its baseline measurement is included in Yi (together with other demographic information that does not change over time). Only its post-baseline information is included in Zi(t), t > 0. We do not use Zi(0) in this article. This makes it convenient for Zi(t) to include some commonly used covariates such as biomarker changes from previous measurements or change rates. That is to say, Zi(t) is not limited to being biomarker values at time t. It can be summary statistics (features, patterns) of all the biomarker information up to time t. Using different variables Y and Z(t), t > 0 for baseline and post-baseline covariates respectively (i.e., including “Z(0)” in Y) avoids the difficulty that arises when, for example, the changing slopes of biomarker expression levels are not defined at baseline t = 0.

In the second stage, we estimate β(t). Various smoothness constraints, such as splines and fractional polynomials, can be placed on β(t). Binder et al. [21] used simulations to compare these two approaches, and concluded that fractional polynomials better recover simpler functions, whereas splines better recover more complex functions. As an illustration, for a scalar time-dependent covariate Z(t), denoting v = t + 1, we assume

$$\begin{aligned}\beta'(t)Z(t) &= \beta_0(t) + \beta_1(t)Z(t)\\ &= \beta_{00} + \beta_{01}\ln(v) + \beta_{02}v + \beta_{03}\frac{1}{v} + \beta_{04}\sqrt{v} + \beta_{05}\frac{1}{\sqrt{v}} + \beta_{06}v^2 + \beta_{07}\frac{1}{v^2}\\ &\quad + Z(t)\left(\beta_{10} + \beta_{11}\ln(v) + \beta_{12}v + \beta_{13}\frac{1}{v} + \beta_{14}\sqrt{v} + \beta_{15}\frac{1}{\sqrt{v}} + \beta_{16}v^2 + \beta_{17}\frac{1}{v^2}\right),\end{aligned} \tag{12}$$

where v = t + 1 is used instead of the original t to avoid ln(0) and 1/0. The higher-order polynomial terms in equation (12) may be removed by backward variable elimination.
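Such a fractional-polynomial parameterization of β(t) is easy to evaluate. The sketch below encodes one reading of the eight basis terms per group in equation (12) (the √v and 1/√v terms are our interpretation of the original layout), and the coefficient values used for illustration are hypothetical.

```python
import numpy as np

def fp_basis(t):
    """Fractional-polynomial basis evaluated at v = t + 1 (so ln(0) and 1/0
    never arise): [1, ln v, v, 1/v, sqrt(v), 1/sqrt(v), v^2, 1/v^2].
    One reading of the eight terms per group in equation (12)."""
    v = np.asarray(t, dtype=float) + 1.0
    return np.stack([np.ones_like(v), np.log(v), v, 1.0 / v,
                     np.sqrt(v), 1.0 / np.sqrt(v), v ** 2, 1.0 / v ** 2], axis=-1)

def beta_prime_z(t, z, b0, b1):
    """beta'(t)Z(t) = beta0(t) + beta1(t) * z, where b0 and b1 hold the
    coefficients beta_00..beta_07 and beta_10..beta_17."""
    B = fp_basis(t)
    return B @ b0 + z * (B @ b1)

# With all coefficients zero except the intercepts (hypothetical values),
# beta'(t)z reduces to b0[0] + z * b1[0].
b0 = np.zeros(8); b0[0] = -1.0
b1 = np.zeros(8); b1[0] = 0.02
val = beta_prime_z(5.0, 30.0, b0, b1)
```

Backward elimination of the higher-order terms amounts to dropping columns of this basis before fitting.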

Note that Z(t) can represent values other than the biomarker values measured at time t, and may be a summary statistic of all the biomarker information observed up to (and including) time t. Z(t) may include slope variables to describe the changes or rate of change in the biomarker values. It may be another variable that indicates a category of changing patterns of biomarker expression. The specification of Z(t) requires a careful exploration of the data.

We include an intercept variable β0(t) in the expression of β′(t)Z(t) in equation (12). If needed, we may also include interactions between baseline covariates Y and Z(t). Thus, we may allow β′(t)Z(t) to be

$$\beta_0(t) + \beta_1(t)Y + \beta_2(t)Z(t) + \beta_3(t)YZ(t).$$

This kind of flexible parameterization means that the proportional hazards assumption that seems to be implied by our model (11) does not need to hold for the model to provide a good approximation of reality. For example, if β_1(t) = η ln(t), then the hazard ratio of Y = 1 versus Y = 0 is t^η e^α, rather than a constant e^α over time.

Suppose patient i has biomarker measurements taken at time points tij, j = 1, · · · , ni, with 0 < ti1 < · · · < ti,ni < xi. At each time point tij, by our model (11), the likelihood function connecting the biomarker value zi(tij) and the final survival outcome (xi, δi) is

$$L_{ij}\{\beta(t)\} = L\{X_i = x_i, \Delta_i = \delta_i \mid T_i > t_{ij}, y_i, z_i(t_{ij}), \alpha, \lambda_0(\cdot)\} = \left[\lambda_0(x_i)\exp\{\alpha y_i + \beta'(t_{ij}) z_i(t_{ij})\}\right]^{\delta_i}\exp\left[-\sum_{k:\,t_{ij} < x_k \le x_i}\lambda_0(x_k)\exp\{\alpha y_i + \beta'(t_{ij}) z_i(t_{ij})\}\right]. \tag{13}$$

We denote lij(β(t)) = log{Lij(β(t))}. Then, using a working independence assumption among the different time points tij, j = 1, · · · , ni, we have the following “working” log-likelihood function:

$$l\{\beta(t)\} = \sum_{i=1}^{n} l_i\{\beta(t)\} = \sum_{i=1}^{n}\sum_{j=1}^{n_i} l_{ij}\{\beta(t)\} = \sum_{i=1}^{n}\sum_{j=1}^{n_i}\left(\delta_i\bigl[\log\{\lambda_0(x_i)\} + \alpha y_i + \beta'(t_{ij}) z_i(t_{ij})\bigr] - \exp\{\alpha y_i + \beta'(t_{ij}) z_i(t_{ij})\}\sum_{k:\,t_{ij} < x_k \le x_i}\lambda_0(x_k)\right). \tag{14}$$

Then we plug in the stage-I estimators $\alpha = \hat{\alpha}$ and $\lambda_0(\cdot) = \hat{\lambda}_0(\cdot)$, maximize the above log-likelihood function by setting its derivative with respect to β to zero, and solve the resulting estimating equations to obtain an estimator $\hat{\beta}(t)$.
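A compact sketch of this stage-II step follows. The stage-I quantities (α̂ and the Breslow increments at the sorted times) are plugged in as fixed; all numbers below are hypothetical, the fractional-polynomial basis is truncated to two terms for brevity, and the log λ_0(x_i) term of (14) is dropped because it does not involve β.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stage-I output: sorted observed times x, Breslow hazard
# increments lam0 at those times, and the baseline-covariate estimate alpha_hat.
x = np.array([1.0, 2.0, 3.0, 4.0])
lam0 = np.array([0.25, 0.30, 0.20, 0.50])
alpha_hat = 0.4

# One row per biomarker measurement: (x_i, delta_i, y_i, t_ij, z_ij).
records = [(1.0, 1, 0.2, 0.5, 1.2), (2.0, 1, -0.5, 0.5, 0.4),
           (2.0, 1, -0.5, 1.5, 0.8), (3.0, 0, 0.3, 1.0, 2.0),
           (4.0, 1, 0.1, 2.5, 1.5), (4.0, 1, 0.1, 0.5, 0.3)]

def basis(t):
    # Truncated fractional-polynomial basis [1, ln(t + 1)], for brevity.
    return np.array([1.0, np.log(t + 1.0)])

def neg_working_loglik(b):
    # Negative of the working log-likelihood (14), up to the log lam0(x_i)
    # terms, which do not involve beta. b stacks (b0, b1).
    b0, b1 = b[:2], b[2:]
    out = 0.0
    for (xi, di, yi, tij, zij) in records:
        lin = alpha_hat * yi + basis(tij) @ b0 + zij * (basis(tij) @ b1)
        cum = lam0[(x > tij) & (x <= xi)].sum()   # sum over t_ij < x_k <= x_i
        out -= di * lin - np.exp(lin) * cum
    return out

res = minimize(neg_working_loglik, np.zeros(4), method="BFGS")
beta_coef = res.x   # estimated coefficients of beta0(t) and beta1(t)
```

The working-independence structure is visible here: each measurement contributes its own likelihood term, so no two patients need measurements at a common time.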

The variance formulae for $\hat{\alpha}$ and $\hat{S}_0(\cdot)$ can be obtained in the same way as in the Cox model [22]. The variance for the coefficients in $\hat{\beta}(t)$ is estimated by a two-stage estimator [23], which is provided in the Appendix.

Given an existing data set, to apply the above two-stage approach for dynamic prediction, we first obtain the estimators $\hat{\alpha}$ and $\hat{S}_0(t)$ using only baseline covariates and survival information. We then estimate β(t) for t ≥ 0, as described above. For a new patient (not in the data set) who has baseline covariate Y_new and covariate value Z_new(t), we predict the new patient's survival time T_new as

$$\Pr(T_{new} \ge t+u \mid T_{new} > t, Y_{new}, Z_{new}(t)) = \left[\frac{\hat{S}_0(t+u)}{\hat{S}_0(t)}\right]^{\exp\{\hat{\alpha} Y_{new} + \hat{\beta}'(t) Z_{new}(t)\}}. \tag{15}$$

By assuming a parametric form for β(t) and using a two-stage approach, we achieve the following advantages. First, there is no need to specify a biomarker trajectory model, which is usually difficult. This also avoids the biased prediction that may result from a mis-specified trajectory model. Second, the proposed approach uses only the observed biomarker values; therefore, there is no need to require all patients to have biomarker measurements taken at any common post-baseline time point. This avoids the need to impute biomarker values, even when the measurement times are irregularly scattered throughout the follow-up period. Third, the proposed method can conduct predictive analyses continuously over time (i.e., predictions can be conducted at any time t ≥ 0). These advantages and the convenience of our method come with a price: we need to assume a parametric shape for the time-varying biomarker effect coefficient β(t). We believe this is a good trade-off. It is safer to assume a parametric shape for β(t) than for Z_i(t), i = 1, · · · , n. Biomarker values can fluctuate over time for each patient, and their changing patterns vary greatly among different patients. This can be seen in Figure 1. On the other hand, it is reasonable to believe that β(t) has a smooth shape.

3. Simulation Studies

Two major approaches to dynamic prediction are currently available in the literature: (1) joint modeling of longitudinal and survival data, and (2) landmark analysis. We briefly describe these approaches, compare them with our proposed method, and explain their advantages and disadvantages.

3.1. Joint modeling approach

Most approaches for jointly modeling time-to-event and longitudinal data are based on the Cox proportional hazards model with time-dependent covariates, as shown in equation (1). The full likelihood of such a joint model includes the following partial likelihood equation,

$$L_{td}(\theta, \gamma) = \prod_{i=1}^{n}\left[\frac{\exp\{\theta y_i + \gamma z_i(x_i)\}}{\sum_{j:\,x_j \ge x_i}\exp\{\theta y_j + \gamma z_j(x_i)\}}\right]^{\delta_i}. \tag{16}$$

From this likelihood function, we can see that at the time of each event xi, we need to know Zj(xi) for all the patients j with xjxi (i.e., patients who are still at risk at this time xi). In most studies, biomarker values are measured only at discrete time points. It is almost impossible to have biomarker values available for all patients in a study at all the random event times. Thus, the above condition is usually not satisfied in practical applications.

Some approaches try to solve this problem by filling in the biomarker values according to ad hoc methods such as last observation carried forward, next observation carried backward, using the closest observation, or using the mean of the previous and next observations. However, such methods can give biased results [8]. The joint modeling approach solves this problem by assuming parametric models for the longitudinal biomarker values. For example,

$$Z_i(t) = f_i(t) + \varepsilon_i(t),\quad i = 1,\ldots,n, \tag{17}$$

where fi(t), i = 1, · · · , n are usually related to a common parametric function f(t) through some individual level random effects. Then, in the partial likelihood equation (16), instead of using Zj(xi) for the patients j with xjxi, the joint modeling approach maximizes the following partial likelihood that involves fj(xi),

$$L_{jm}(\theta, \gamma) = \prod_{i=1}^{n}\left[\frac{\exp\{\theta y_i + \gamma f_i(x_i)\}}{\sum_{j:\,x_j \ge x_i}\exp\{\theta y_j + \gamma f_j(x_i)\}}\right]^{\delta_i}. \tag{18}$$

Using this joint modeling approach, even though Zj(xi) is not observed, fj(xi) can still be estimated from model (17), and the likelihood in (18) can still be computed. An advantage of the joint modeling approach is that the model-based value fj(t) is not as prone to measurement errors as Zj(t). However, this advantage is critically dependent on the correct specification of model (17) for the longitudinal biomarker trajectories, which is not an easy task.

3.2. Landmark analysis

In landmark analysis [24, 25], a fixed time point after the initiation of therapy is selected as a landmark for conducting the analysis of survival. Only patients who are alive and have not experienced the event of interest at the landmark time are included in the analysis. Many researchers have used a series of landmark analyses over time as an approach to dynamic prediction [17, 19], such as that shown here:

$$h_i(t \mid Z_i(s)) = h_s(t)\exp\{\theta_{LM}(s) Y_i + \beta_{LM}(s) Z_i(s)\},\quad t > s. \tag{19}$$

In equation (19), for any two distinct time points s ≠ s′, the hazard functions h_s(t) and h_{s′}(t) are different, without any constraints between them. Similarly, θ_LM(s) and θ_LM(s′) have no constraints between them, nor do β_LM(s) and β_LM(s′). Smoothness constraints between each pair can be imposed by using cubic splines or other techniques [17].

An advantage of landmark analysis is that it does not specify a model such as model (17) for the biomarker trajectories. However, separate and unconstrained landmark analyses on discrete time points ignore the correlation between neighboring time intervals. Thus, such analyses are not efficient and may result in large variations in the predictions. When biomarker measurements are irregularly scattered throughout the follow-up period, one must perform ad hoc grouping of biomarker measurements into different time intervals and conduct biomarker imputation so that the analyses can be performed on some common time points. This imputation process may introduce bias [8].

3.3. Simulation set-up

We performed a series of simulations to evaluate the proposed method for dynamic prediction and compared it with the approaches of landmark analysis and joint modeling of time-to-event and longitudinal data. We assumed that a study involved 200 patients who had been followed for a period of 15 years, and for whom longitudinal biomarker measurements had been taken at baseline and every year thereafter. We used a linear mixed effects model to generate the longitudinal biomarker measurements. To generate survival times, we applied a Cox proportional hazards model, using the longitudinal biomarkers as time-dependent covariates. We evaluated the robustness of the dynamic prediction approaches by considering different linear mixed effects models for the longitudinal data. Specifically, we generated data by the following Models I and II, which we call Scenarios I and II, respectively. Then, we analyzed each data set by each of the four models listed below to make predictions. Model III is a landmark analysis model. Model IV is our proposed approach. Models III and IV do not specify a data generating mechanism.

Model I: $h_i(t \mid m_i(t)) = h_0(t)\exp\{\gamma m_i(t)\}$, $m_i(t) = (\beta_0 + b_{i0}) + (\beta_1 + b_{i1})t$, $Z_i(t) = m_i(t) + \varepsilon_i(t)$.
Model II: $h_i(t \mid m_i(t)) = h_0(t)\exp\{\gamma m_i(t)\}$, $m_i(t) = (\beta_0 + b_{i0}) + (\beta_1 + b_{i1})t^3$, $Z_i(t) = m_i(t) + \varepsilon_i(t)$.
Model III: $h_i(t \mid Z_i(s)) = h_s(t)\exp\{\beta_{LM}(s) Z_i(s)\}$, $t > s$.
Model IV: $h_i(t \mid Z_i(0)) = h_0(t)\exp\{\alpha Z_i(0)\}$; $h_i(t \mid Z_i(0), Z_i(s)) = h_0(t)\exp\{\alpha Z_i(0) + \beta(s) Z_i(s)\}$, $t \ge s > 0$.

When we used Models I and II to generate data, we let $h_0(t) = a t^{\sigma-1}$, a Weibull hazard function. However, when we used these models for data analysis and then prediction, we treated their h_0(t) functions as nonparametric. We let β_0 = 3, β_1 = 2, and γ = 0.8 or 1.6. Here, γ controls how strongly the survival time is influenced by the longitudinal biomarker values; a larger value of γ results in higher correlation. The random effects (b_{i0}, b_{i1}) have a multivariate normal distribution with mean 0 and covariance matrix $\begin{pmatrix} 4 & 0.1 \\ 0.1 & 2 \end{pmatrix}$. We measured the biomarkers at t = 1, 2, 3, · · · , 10. At each time point t, the measurement error term ε_i(t) follows N(0, 0.6²). During data generation, the shape parameter of the Weibull function was σ = 0.5. We adjusted the scale parameter a to achieve a 50% censoring rate for each scenario, using a uniform(0, 28) distribution for the censoring time. In Model III, each s represents a landmark time point, and the baseline hazard function at time s is h_s(t), t > s, an arbitrary non-negative function. For different landmark time points s_1 ≠ s_2, there is no constraint between h_{s_1}(t) and h_{s_2}(t), nor between the regression coefficients β_LM(s_1) and β_LM(s_2). In Model IV, β(t) is a fractional polynomial, and β′(t)Z_i(t) has the formulation described in equation (12).
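The Scenario-I data generation can be sketched as below. Event times are drawn by numerically inverting the cumulative hazard (using H(T) ~ Exp(1)); for simplicity the Weibull scale a is set arbitrarily here rather than calibrated to a 50% censoring rate, so the censoring fraction will differ from the paper's set-up.

```python
import numpy as np

rng = np.random.default_rng(7)

# Parameters from the simulation set-up; the scale a is arbitrary in this sketch.
beta0, beta1, gamma, sigma, a = 3.0, 2.0, 0.8, 0.5, 1e-4
cov_b = np.array([[4.0, 0.1], [0.1, 2.0]])   # random-effects covariance

def simulate_patient():
    b0, b1 = rng.multivariate_normal([0.0, 0.0], cov_b)
    m = lambda t: (beta0 + b0) + (beta1 + b1) * t      # Model I trajectory
    # Invert H(t) = int_0^t a*sigma*s**(sigma-1) * exp(gamma * m(s)) ds on a grid.
    grid = np.linspace(1e-4, 30.0, 3000)
    h = a * sigma * grid ** (sigma - 1) * np.exp(gamma * m(grid))
    H = np.cumsum(h) * (grid[1] - grid[0])
    T = grid[min(np.searchsorted(H, rng.exponential()), len(grid) - 1)]
    C = rng.uniform(0.0, 28.0)                          # censoring time
    t_obs = np.arange(1, 11)                            # yearly measurement times
    Z = m(t_obs) + rng.normal(0.0, 0.6, size=10)        # observed biomarker values
    return min(T, C), int(T <= C), t_obs, Z

data = [simulate_patient() for _ in range(200)]
```

Scenario II differs only in replacing the linear trajectory with the nonlinear one, so the same inversion step applies.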

3.4. Measures to assess predictive performance

To evaluate the predictive performance of the above methods, we focused on calibration and discrimination. We assessed calibration by calculating the root mean squared prediction errors (RMSEs) between the true survival probabilities and the predictions obtained from the four models. We used $S_i(t) = \exp\{-\int_0^t h_i(s)\,ds\}$ with t = 10 to calculate the true survival probabilities, in which h_i(s) were known in the simulations.

To measure the discriminative capability of a longitudinal biomarker, we focused on a time interval of clinical relevance, (t, t + Δt), within which the occurrence of events is of interest and physicians can take action to improve the patient's chance of survival. In this setting, a useful property of the model is the ability to successfully discriminate between patients who will and those who will not experience the event within this time period. For a randomly chosen pair of patients {i, j} both of whom have provided measurements up to time t, the discriminative capability of the assumed model can be assessed by the area under the receiver operating characteristic curve (AUC) as

$$AUC(t, \Delta t) = \Pr\bigl[\pi_i(t+\Delta t \mid t) < \pi_j(t+\Delta t \mid t) \,\big|\, \{T_i \in (t, t+\Delta t]\} \cap \{T_j > t+\Delta t\}\bigr],$$

where π_i(t + Δt | t) represents the predicted probability that patient i survives beyond t + Δt, given survival up to t. That is, if patient i experiences the event within the relevant time frame and patient j does not, then we would expect the model to assign the higher probability of surviving beyond t + Δt to patient j. We follow the work of Rizopoulos et al. [16], Harrell, Lee, and Mark [27], Zheng and Heagerty [28], and Antolini, Boracchi, and Biganzoli [29] to calculate this discrimination measure.
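The pairwise definition of AUC(t, Δt) can be computed directly as the fraction of concordant comparable pairs. The sketch below uses invented predictions; counting tied predictions as 1/2 is a common convention, and handling of censored cases within (t, t + Δt] is simplified relative to the cited estimators.

```python
import numpy as np

def dynamic_auc(pred_surv, event_time, event_ind, t, dt):
    """AUC(t, dt): among comparable pairs -- case i with an observed event in
    (t, t + dt] and control j with observed time beyond t + dt -- the fraction
    in which the control receives the higher predicted survival probability
    pi(t + dt | t). Tied predictions count as 1/2."""
    cases = np.flatnonzero((event_time > t) & (event_time <= t + dt) & (event_ind == 1))
    controls = np.flatnonzero(event_time > t + dt)
    if len(cases) == 0 or len(controls) == 0:
        return np.nan
    num = sum(float(pred_surv[j] > pred_surv[i]) + 0.5 * float(pred_surv[j] == pred_surv[i])
              for i in cases for j in controls)
    return num / (len(cases) * len(controls))

# Hypothetical illustration: predictions that order every comparable pair correctly.
event_time = np.array([1.5, 1.8, 5.0, 6.0])
event_ind = np.array([1, 1, 0, 0])
pred = np.array([0.20, 0.30, 0.90, 0.80])   # predicted pi(2 | 1) for each patient
auc = dynamic_auc(pred, event_time, event_ind, t=1.0, dt=1.0)
```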

The mean values and standard deviations of the RMSEs (Table 1) and AUCs (Table 2) from 1000 simulated data sets are presented for Scenarios I and II, under two values of γ (which controls the effects of the biomarkers on survival). Our proposed Model IV performed almost the same as the true model in each scenario, and outperformed all the other models. In Scenario I, the RMSEs (and AUCs) obtained from our method are only slightly larger (smaller) than those obtained from Model I (the true model in this Scenario), and are much better than those obtained from Models II and III. In Scenario II, the RMSEs (and AUCs) obtained from our method are only slightly larger (smaller) than those obtained from Model II (the true model in this Scenario), and are much better than those obtained from Models I and III. When the joint modeling approach used the correct biomarker trajectory model for prediction (Model I in Scenario I and Model II in Scenario II), it performed the best, as expected, with the smallest RMSE and largest AUC. However, when a misspecified trajectory model was used (Model I in Scenario II, or Model II in Scenario I), the joint modeling approach for prediction was clearly outperformed by our proposed model. In all the scenarios, the landmark analysis approach performed worse than the other approaches on average, and its AUCs showed large variations.

Table 1.

Mean value (and standard deviation) of the root mean squared prediction error (RMSE) in the simulations. Models I and II are joint models; Model III is landmark analysis; Model IV is the proposed method.

                           Data generated by Model I        Data generated by Model II
                           γ = 0.8         γ = 1.6          γ = 0.8         γ = 1.6
Prediction by Model I      0.346 (0.021)   0.601 (0.021)    0.429 (0.030)   0.645 (0.009)
Prediction by Model II     0.408 (0.019)   0.622 (0.017)    0.387 (0.023)   0.616 (0.012)
Prediction by Model III    0.564 (0.020)   0.659 (0.009)    0.545 (0.024)   0.649 (0.008)
Prediction by Model IV     0.344 (0.035)   0.608 (0.021)    0.390 (0.031)   0.623 (0.017)

Table 2.

Mean value (and standard deviation) of the area under the receiver operating characteristic curve (AUC) in the simulations. Models I and II are joint models; Model III is landmark analysis; Model IV is the proposed method.

                           Data generated by Model I        Data generated by Model II
                           γ = 0.8         γ = 1.6          γ = 0.8         γ = 1.6
Prediction by Model I      0.622 (0.045)   0.667 (0.126)    0.564 (0.046)   0.538 (0.109)
Prediction by Model II     0.573 (0.049)   0.545 (0.110)    0.598 (0.062)   0.653 (0.120)
Prediction by Model III    0.518 (0.260)   0.434 (0.353)    0.504 (0.270)   0.466 (0.355)
Prediction by Model IV     0.605 (0.047)   0.655 (0.134)    0.596 (0.043)   0.608 (0.106)

4. Application to a CML study

We used our proposed method to predict the distribution of time to disease progression for patients with CML [1]. The data were obtained from a study of 670 patients with CML who had failed their front-line therapy with imatinib, and then enrolled in a trial to receive dasatinib. Only 567 patients had BCR-ABL measurements taken both before and after the dasatinib treatment, and thus were included in our data analysis by the proposed prediction model. By variable selection, we decided to use two baseline variables and one time-dependent covariate as follows: (1) Age60, a binary indicator of whether at baseline the patient is 60 years of age or older; (2) Bcrbase, the baseline BCR-ABL level; and (3) Bcrcurrt(t), the BCR-ABL level at time t > 0. We normalized all the BCR-ABL levels to be between 0 and 100, and measured time in months.

4.1. Stage I: Prediction using only baseline covariate information

First, we fit a Cox proportional hazards model as shown in equation (4), using the two baseline variables and the time to disease progression, as follows,

$$\lambda_i(t \mid Age60_i, Bcrbase_i) = \lambda_0(t)\exp(\alpha_1 Age60_i + \alpha_2 Bcrbase_i). \tag{20}$$

We obtained the estimators $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{S}_0(\cdot)$ as a function of $\hat{\lambda}_0(\cdot)$, as shown in (7). Then, without using the post-treatment biomarker observations, we used the following formula to predict the time to disease progression:

$$\widehat{\Pr}(T_i \ge t+u \mid T_i \ge t, Age60_i, Bcrbase_i) = \left\{\frac{\hat{S}_0(t+u)}{\hat{S}_0(t)}\right\}^{\exp\{\hat{\alpha}_1 Age60_i + \hat{\alpha}_2 Bcrbase_i\}}. \tag{21}$$

Suppose a patient with CML who was younger than 60 years had BCR-ABL = 80 at baseline, and at 10 months post-treatment the disease had not progressed. At that time, without using information about the post-treatment BCR-ABL measurements, the future risk of disease progression for this patient is predicted by the solid line in Figure 3. This prediction can be viewed as an average over all such patients at that time, t = 10 months. The dashed and dotted lines in the figure are not confidence intervals; they are generated by using different values of BCR-ABL at 10 months, as described hereafter.

Figure 3. Predictions of future progression-free survival at 10 months for patients with baseline age < 60 years and baseline BCR-ABL = 80: (1) Patient 1, whose BCR-ABL expression level at 10 months is 1 (dashed line); (2) prediction without using BCR-ABL information at 10 months (solid line); and (3) Patient 2, whose BCR-ABL expression level at 10 months is 30 (dotted line).

4.2. Stage 2: Using longitudinal data to improve predictions from stage 1

After treatment, patients were scheduled for follow-up visits at 3, 6, 9, 12, 18, and 24 months, and annually thereafter. Their BCR-ABL expression levels were measured at these visits. Some patients did not adhere to this schedule and missed visits, and most patients had irregular intervals between visits. Figure 2 shows all the available BCR-ABL measurements.

With the longitudinal measurements Bcrcurrti(t) at time points tik, k = 1, · · · , ni, we assume that

\Pr\{T_i \ge t_{ik}+u \mid T_i \ge t_{ik}, \mathrm{Age60}_i, \mathrm{Bcrbase}_i, \mathrm{Bcrcurrt}_i(t_{ik})\} = \left\{\frac{S_0(t_{ik}+u)}{S_0(t_{ik})}\right\}^{\exp\{\alpha_1\,\mathrm{Age60}_i + \alpha_2\,\mathrm{Bcrbase}_i + \beta_0(t_{ik}) + \beta_1(t_{ik})\,\mathrm{Bcrcurrt}_i(t_{ik})\}}, \quad (22)

with β(t) = (β0(t), β1(t))′ specified in (12). At any time t, the effect of the current BCR-ABL value, Bcrcurrt(t), on progression-free survival is a power term exp{β0(t) + β1(t)Bcrcurrt(t)} on the conditional survival function. We fit this model by maximizing the working likelihood in (14); the resulting estimators for β0(t) and β1(t) are plotted in Figure 4 (a) and (b), respectively. Those plots show that the value of β1(t) is small initially but increases over time and then reaches a plateau. Recall that β1(t) describes the effect of the BCR-ABL expression level at time t on the future risk of disease progression. The small value of β1(t) in the first few months after treatment indicates that a high expression level of BCR-ABL during that period does not have much impact on progression-free survival. From six months after treatment onward, the large value of β1(t) indicates that a high expression level of BCR-ABL at that time is a sign of a short progression-free survival time. This difference suggests that it takes a few months for patients to respond to the TKI treatment and for their BCR-ABL expression levels to be reduced. However, if a patient still has high BCR-ABL expression levels six months after treatment, that patient is not responding well to the treatment and has a high risk of disease progression in the near future.
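For intuition about the working-likelihood fit, here is a toy sketch under the simplest specification β(t) = β · t used in the Appendix; in the actual analysis β(t) = (β0(t), β1(t))′ is specified via the smooth functions in (12), and the data below are simulated placeholders. Each record corresponds to one biomarker measurement, and L = −log{Ŝ0(x_i)/Ŝ0(t_ij)} is the estimated cumulative-hazard increment carried over from stage 1.

```python
import numpy as np

def work_loglik(beta, t, z, y, delta, L, alpha_hat):
    """Poisson-type working log-likelihood: sum of delta*eta - exp(eta)*L,
    with eta = alpha_hat*y + beta*t*z and L = -log{S0(x)/S0(t_ij)} >= 0.
    Its derivative in beta has the same form as the score in the Appendix."""
    eta = alpha_hat * y + beta * t * z
    return np.sum(delta * eta - np.exp(eta) * L)

# Toy records (placeholders): measurement times, biomarker values, a baseline
# covariate, event indicators, and stage-1 cumulative-hazard increments.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(1.0, 24.0, n)            # months
z = rng.uniform(0.0, 1.0, n)             # biomarker, scaled to [0, 1]
y = rng.integers(0, 2, n).astype(float)
delta = rng.integers(0, 2, n).astype(float)
L = rng.exponential(0.1, n)
alpha_hat = 0.3

# The working log-likelihood is concave in beta (a linear term minus a
# positively weighted sum of exponentials), so a grid or Newton search is safe.
betas = np.linspace(-0.5, 0.5, 201)
vals = np.array([work_loglik(b, t, z, y, delta, L, alpha_hat) for b in betas])
beta_hat = betas[np.argmax(vals)]
```

The concavity noted in the comment is what makes the two-stage fit computationally easy compared with joint modeling.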

Figure 4. Estimation of the time-varying effects of BCR-ABL expression levels on the risk of disease progression: (a) β0(t) by the proposed method, (b) β1(t) by the proposed method, and (c) βLM(t) by landmark analysis, the counterpart of β1(t) in the proposed method. Landmark analysis does not use β0(t).

We also applied the landmark analysis approach to the CML data set. The resulting effects of BCR-ABL expression levels on the risk of disease progression are shown in Figure 4 (c). It can be seen that β1(t) from our proposed approach, shown in Figure 4 (b), is a smoothed version of that from the landmark analysis. It is hard to believe that the real effects of BCR-ABL on disease progression can be as bumpy over time as the landmark analysis suggests. One may try to obtain estimates for a series of discrete landmark time points first, and then smooth them by splines or other techniques; however, this kind of ad hoc method is arbitrary and inefficient. By putting our estimation in a systematic framework, we borrow information on biomarker effects between neighboring time points and thus gain efficiency. This is reflected in the smaller standard deviations of the estimated AUCs by our method (Table 2).

With all the unknown parameters in equation (22) estimated, we now illustrate how to conduct the dynamic prediction by our proposed two-stage approach. We already provided the stage-1 prediction in Section 4.1 for a patient who was younger than 60 years and had BCR-ABL = 80 at baseline; that prediction used only baseline information. Now suppose that at 10 months post baseline the patient's disease has not progressed and the current BCR-ABL measure is Bcrcurrt(10). The prediction of this patient's future progression-free survival is as follows.

\widehat{\Pr}\{T \ge 10+u \mid T \ge 10, \mathrm{Age60}, \mathrm{Bcrbase}, \mathrm{Bcrcurrt}(10)\} = \left\{\frac{\hat S_0(10+u)}{\hat S_0(10)}\right\}^{\exp\{\hat\alpha_1\,\mathrm{Age60} + \hat\alpha_2\,\mathrm{Bcrbase} + \hat\beta_0(10) + \hat\beta_1(10)\,\mathrm{Bcrcurrt}(10)\}}. \quad (23)

For the two cases Bcrcurrt(10) = 1 and Bcrcurrt(10) = 30, the predicted progression-free survival curves are plotted in Figure 3 (dashed and dotted lines, respectively). Figure 3 (solid line) also shows the prediction conducted at 10 months, computed by (21) using only baseline information, without the time-varying BCR-ABL measurements. The figure shows that the longitudinal biomarker information further distinguishes patients within each subgroup defined by the baseline covariates, and thus provides more accurate prediction.
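Equation (23) updates the stage-1 prediction with one extra factor in the exponent. A sketch of the computation (β̂0(10), β̂1(10), the survival curve, and the Cox coefficients below are hypothetical placeholders, not fitted values from the CML data):

```python
import numpy as np

def s0_hat(t, times, surv):
    """Step-function baseline survival: the last estimated value at or before t."""
    idx = np.searchsorted(times, t, side="right") - 1
    return 1.0 if idx < 0 else surv[idx]

def stage2_conditional_surv(t, u, age60, bcrbase, bcr_t,
                            a1, a2, b0_t, b1_t, times, surv):
    """Equation (23): the stage-1 prediction sharpened by the current biomarker."""
    ratio = s0_hat(t + u, times, surv) / s0_hat(t, times, surv)
    eta = a1 * age60 + a2 * bcrbase + b0_t + b1_t * bcr_t
    return ratio ** np.exp(eta)

# Hypothetical estimates (illustration only).
times = np.array([0.0, 5.0, 10.0, 20.0, 30.0, 40.0])
surv = np.array([1.0, 0.98, 0.95, 0.88, 0.80, 0.72])
a1, a2, b0_10, b1_10 = 0.3, 0.01, -1.0, 0.05

# Same patient as before at t = 10 months, two scenarios for BCR-ABL(10).
p_low = stage2_conditional_surv(10, 12, 0, 80, 1.0, a1, a2, b0_10, b1_10, times, surv)
p_high = stage2_conditional_surv(10, 12, 0, 80, 30.0, a1, a2, b0_10, b1_10, times, surv)
```

With a positive β̂1(10), the higher current biomarker value yields the lower predicted progression-free survival, mirroring the dashed versus dotted lines of Figure 3.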

4.3. Generating nomograms for medical decision making

Suppose, at any time t beyond the baseline measure, a physician would like to initiate new treatments for patients whose risk of disease progression in the next 12 months is 20% or greater. Based on the model obtained above, we can determine the cut-off values for the BCR-ABL expression levels over time that correspond to such a risk level. That is, after plugging the estimated values α̂1, α̂2, β̂0(t), β̂1(t), and Ŝ0(t) into equation (22) and letting u = 12 months, we determine Bcrcurrt(t) such that

1 - \widehat{\Pr}\{T \ge t+12 \mid T \ge t, \mathrm{Age60}, \mathrm{Bcrbase}, \mathrm{Bcrcurrt}(t)\} = 1 - \left\{\frac{\hat S_0(t+12)}{\hat S_0(t)}\right\}^{\exp\{\hat\alpha_1\,\mathrm{Age60} + \hat\alpha_2\,\mathrm{Bcrbase} + \hat\beta_0(t) + \hat\beta_1(t)\,\mathrm{Bcrcurrt}(t)\}} = 0.2. \quad (24)

For a patient who is younger than 60 years and has Bcrbase = 80, we solve the above equation to obtain a solution for Bcrcurrt(t) at each t > 0 (in practice, only at the observed event times). We repeated this procedure, replacing the 20% risk level with 25% and 30%. The solutions for Bcrcurrt(t) depend on α̂1, α̂2, β̂0(t), β̂1(t), and Ŝ0(t). Although β̂0(t) and β̂1(t) are smooth, Ŝ0(t) is not; consequently, the solutions for Bcrcurrt(t) are not smooth, so we applied smoothing techniques before plotting the results (Figure 5). This nomogram shows, at any time post-baseline, the BCR-ABL expression levels that correspond to a 20% (dotted line), 25% (dashed line), or 30% (solid line) risk of disease progression in the next 12 months. For example, at the 20th month after the beginning of treatment, if the above patient's BCR-ABL level is 20 (or 35, or 45), then the patient's risk of disease progression in the next 12 months is about 20% (or 25%, or 30%, respectively). The nomogram can be used in the same way at any other time, such as the 30th or 40th month. Patients can use it to evaluate whether their situation is improving or worsening over time, much as pediatricians use growth-curve nomograms to evaluate the growth of babies and children, and it is just as convenient to use.
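Because the risk in (24) is monotone in Bcrcurrt(t), the cut-off has a closed form: setting 1 − r = {Ŝ0(t+12)/Ŝ0(t)}^{exp(η)} gives exp(η) = log(1−r)/log{Ŝ0(t+12)/Ŝ0(t)}, and η can then be solved for the biomarker value. A sketch (all numeric estimates below are hypothetical placeholders):

```python
import numpy as np

def bcr_cutoff(risk, age60, bcrbase, a1, a2, b0_t, b1_t, s0_ratio):
    """Invert equation (24): the BCR-ABL level at time t giving the target
    risk of progression within the next 12 months.
    s0_ratio = S0_hat(t+12)/S0_hat(t); both logs below are negative, so the
    ratio inside the outer log is positive."""
    eta = np.log(np.log(1.0 - risk) / np.log(s0_ratio))
    return (eta - a1 * age60 - a2 * bcrbase - b0_t) / b1_t

# Hypothetical estimates at t = 20 months (illustration only).
a1, a2, b0, b1 = 0.3, 0.01, -1.0, 0.05
s0_ratio = 0.93          # assumed value of S0_hat(32)/S0_hat(20)

cut20 = bcr_cutoff(0.20, 0, 80, a1, a2, b0, b1, s0_ratio)
cut30 = bcr_cutoff(0.30, 0, 80, a1, a2, b0, b1, s0_ratio)
```

Evaluating the cut-off at each event time and smoothing the result is exactly how the curves of Figure 5 are produced.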

Figure 5. For patients with baseline age < 60 years and baseline BCR-ABL = 80: at any time t post-baseline, the BCR-ABL expression level at t corresponding to a risk of disease progression within (t, t + 12) months of 20% (dotted line), 25% (dashed line), or 30% (solid line).

4.4. Predictions by different approaches

We conducted predictions for patients with CML using the joint modeling approach, landmark analysis, and the proposed two-stage approach. To evaluate the performance of these three approaches and verify their validity, we computed Kaplan-Meier estimators (Figure 6(d)) for the four subgroups of patients whose BCR-ABL levels at 10 months fell within the ranges (0, 0.1], (0.1, 1], (1, 10], and (10, 100], respectively, and compared them with the corresponding results from the three methods. The three model-based methods use continuous BCR-ABL values; for comparison, their predictions at the median BCR-ABL value within each of the four subgroups are displayed in Figure 6(a)-(c). Considering the distances between the four survival curves obtained from each approach, the joint modeling approach shrinks the curves toward each other the most, while the landmark analysis shrinks them the least. Our proposed approach gives results comparable with the empirical Kaplan-Meier curves, showing that it achieves a desirable balance between joint modeling and landmark analysis.
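The empirical benchmark in Figure 6(d) is simply the product-limit estimator applied within each biomarker subgroup, restricted to patients still at risk at 10 months. A generic sketch with simulated placeholder data (the subgroup shown is BCR-ABL(10) in (10, 100]):

```python
import numpy as np

def kaplan_meier(times, events, grid):
    """Product-limit estimator evaluated at each grid point."""
    event_times = np.unique(times[events == 1])
    km_t, km_s = [0.0], [1.0]
    s = 1.0
    for tt in event_times:
        at_risk = np.sum(times >= tt)
        d = np.sum((times == tt) & (events == 1))
        s *= 1.0 - d / at_risk
        km_t.append(tt)
        km_s.append(s)
    km_t, km_s = np.array(km_t), np.array(km_s)
    return km_s[np.searchsorted(km_t, grid, side="right") - 1]

# Toy data (placeholders): follow-up times, event indicators, and each
# patient's BCR-ABL level at 10 months.
rng = np.random.default_rng(1)
x = rng.exponential(30.0, 300)              # observed times, months
delta = (rng.uniform(size=300) < 0.7).astype(int)
bcr10 = rng.uniform(0.0, 100.0, 300)

# Restrict to patients still at risk at 10 months, then estimate the curve
# within one biomarker subgroup.
keep = (x >= 10.0) & (bcr10 > 10.0)
grid = np.array([10.0, 15.0, 20.0, 30.0])
curve = kaplan_meier(x[keep], delta[keep], grid)
```

Repeating this over the four biomarker ranges reproduces the kind of subgroup curves used as the empirical reference for the model-based predictions.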

Figure 6. Prediction of future progression-free probability for patients who were at risk at 10 months, by four different methods: (a) joint modeling with a linear mixed model for biomarker measurements over time, (b) landmark analysis, (c) the proposed two-stage approach, and (d) Kaplan-Meier estimators. Each method is applied to each of the four subgroups of patients classified by their BCR-ABL expression levels at 10 months: solid line for range (0, 0.1], dashed line for (0.1, 1], dotted line for (1, 10], and dot-dashed line for (10, 100].

5. Discussion

It is important for individuals who have had cancer to be monitored for signs of disease recurrence, even after they have been successfully treated. Monitoring patients may involve measuring specific biomarkers, such as prostate-specific antigen for prostate cancer, CA-125 for ovarian cancer, and BCR-ABL expression for CML.

We have provided an information-cumulating model for continuously conducting predictive analyses over time. Our model comprises a two-stage approach that uses longitudinal biomarker data to conduct dynamic real-time prediction of the time to a specific event. The first stage provides predictions conditional only on the baseline covariates; these describe the distributions of the time to the event of interest in the absence of post-treatment biomarker information. In the second stage, the post-treatment biomarker information is used to improve the prediction from the first stage, and an inherent property of hazard functions is used to keep the prediction models parsimonious over time. Our proposed method has several advantages. First, it does not need to specify a model for the longitudinal biomarker, which avoids the bias caused by mis-specification of such a model. Second, it does not require the complete history of the biomarker and does not need to impute biomarker values. Third, without random-effect models for the longitudinal biomarkers, the computation required by the proposed method is much easier than that of other dynamic prediction methods, such as joint modeling of time-to-event and longitudinal data.

It would be worthwhile to optimize the decision-making tool we have proposed by using receiver operating characteristic (ROC) curves [26]; in particular, it would be desirable to find change points that achieve the best ROC properties. There is a large literature on the optimization of ROC curves, and our next step will be to apply these techniques to our dynamic prediction method to achieve its best performance.
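As a starting point for such an optimization, the discrimination of a dynamic prediction rule at a given landmark can be summarized by an empirical AUC, and a candidate change point chosen by Youden's index. The sketch below uses simulated placeholder data and ignores censoring; time-dependent ROC methods that properly handle censoring are developed in [26].

```python
import numpy as np

def auc(risk, outcome):
    """Empirical AUC: Pr(risk_case > risk_control), with ties counted 1/2."""
    cases, controls = risk[outcome == 1], risk[outcome == 0]
    diff = cases[:, None] - controls[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

def youden_cutoff(risk, outcome):
    """Threshold maximizing sensitivity + specificity - 1 (Youden's J)."""
    best_j, best_c = -1.0, None
    for c in np.unique(risk):
        sens = np.mean(risk[outcome == 1] >= c)
        spec = np.mean(risk[outcome == 0] < c)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_c, best_j

# Toy illustration (placeholders): predicted 12-month risks that, by
# construction, tend to be higher for patients who progress.
rng = np.random.default_rng(2)
outcome = rng.integers(0, 2, 400)
risk = np.clip(0.3 * outcome + rng.normal(0.3, 0.15, 400), 0.0, 1.0)
a = auc(risk, outcome)
cut, j = youden_cutoff(risk, outcome)
```

The selected threshold plays the same role as the 20% risk level used in the nomogram of Section 4.3, but is chosen to optimize discrimination rather than fixed in advance.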

Acknowledgements

This research was supported by the US National Institutes of Health, grants U54 CA096300, U01 CA152958 and 5P50 CA100632.

A. Appendix: Variance Estimation

Maximizing the proposed working likelihood in equation (14), with α and S0(·) replaced by their consistent estimators, gives a consistent estimator of β(t) [23]. We next derive the asymptotic distribution of β̂(t). For simplicity, we assume the simple linear form β(t) = β × t; the arguments can be easily generalized to a general parametric function.

To study the asymptotic variance of β̂, we examine the first derivative of the working log-likelihood,

U(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{n_i}\left[\delta_i\, t_{ij} z_i(t_{ij}) + t_{ij} z_i(t_{ij}) \exp\{\hat\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\left(\log \hat S_0(x_i) - \log \hat S_0(t_{ij})\right)\right]. \quad (25)

Let Ũ(β) denote the quantity obtained from U(β) by replacing α̂ and Ŝ0(·) with their true values. Then the score U(β) evaluated at the true value β0 of β can be rewritten as

U(\beta_0) = \tilde U(\beta_0) + \left\{U(\beta_0) - \tilde U(\beta_0)\right\} = \tilde U(\beta_0) + I + II, \quad (26)

where

I = \sum_{i=1}^{n}\sum_{j=1}^{n_i} t_{ij} z_i(t_{ij}) \log\left\{\frac{\hat S_0(x_i)}{\hat S_0(t_{ij})}\right\}\left[\exp\{\hat\alpha y_i + \beta t_{ij} z_i(t_{ij})\} - \exp\{\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\right] = \sum_{i=1}^{n} y_i \sum_{j=1}^{n_i}\left[t_{ij} z_i(t_{ij}) \exp\{\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\log\left\{\frac{\hat S_0(x_i)}{\hat S_0(t_{ij})}\right\}\right](\hat\alpha - \alpha) + o_p(\sqrt{n}), \quad (27)

II = \sum_{i=1}^{n}\sum_{j=1}^{n_i} t_{ij} z_i(t_{ij}) \exp\{\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\left[\log\left\{\frac{\hat S_0(x_i)}{\hat S_0(t_{ij})}\right\} - \log\left\{\frac{S_0(x_i)}{S_0(t_{ij})}\right\}\right]. \quad (28)

Given the asymptotic martingale representations of α̂ and log Ŝ0(·),

\sqrt{n}\,(\hat\alpha - \alpha) = \frac{1}{\sqrt{n}}\,\Gamma_\alpha^{-1}\sum_{i=1}^{n}\int_0^\infty I(u \le x_i)\left(y_i - \bar{y}(u)\right) dM_i(u;\alpha)

and

\sqrt{n}\left[\log\left\{\frac{\hat S_0(x_i)}{\hat S_0(t_{ij})}\right\} - \log\left\{\frac{S_0(x_i)}{S_0(t_{ij})}\right\}\right] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\int_0^\infty I(t_{ij} < u \le x_i)\left[\frac{1}{\sum_{k=1}^{n}\exp(\alpha y_k) I(x_k \ge u)} + \left(y_i - \bar{y}(u)\right)\right] dM_i(u;\alpha),

we have

\frac{1}{\sqrt{n}}\, I = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\int_0^\infty\left\{\Gamma_\alpha^{-1} D_1(u)\left(y_i - \bar{y}(u)\right)\right\} dM_i(u;\alpha) + o_p(1), \quad (29)

\frac{1}{\sqrt{n}}\, II = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\int_0^\infty\left\{D_2(u) + D_3(u)\left(y_i - \bar{y}(u)\right)\right\} dM_i(u;\alpha) + o_p(1), \quad (30)

where Γα is the expectation of the Jacobian matrix of the partial likelihood, and

\bar{y}(u) = \frac{\sum_{i=1}^{n} I(x_i \ge u)\, y_i \exp(\alpha y_i)}{\sum_{i=1}^{n} I(x_i \ge u) \exp(\alpha y_i)}, \qquad M_i(u;\alpha) = I(x_i \le u, \delta_i = 1) - \int_0^{u} I(x_i > s)\exp(\alpha y_i)\,\lambda_0(s)\, ds,

D_1(u) = E\left( y_i \sum_{j=1}^{n_i}\left[ t_{ij} z_i(t_{ij}) \exp\{\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\,\log S_0(t_{ij}) \right] \right),

D_2(u) = E\left( \frac{\sum_{j=1}^{n_i} t_{ij} z_i(t_{ij}) \exp\{\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\, I(t_{ij} < u \le x_i)}{\sum_{k=1}^{n} \exp(\alpha y_k) I(x_k \ge u)} \right),

D_3(u) = E\left( \sum_{j=1}^{n_i}\left[ t_{ij} z_i(t_{ij}) \exp\{\alpha y_i + \beta t_{ij} z_i(t_{ij})\}\, I(t_{ij} < u \le x_i) \right] \right).

By equations (26), (27), (28), (29), and (30) and the martingale central limit theorem, U(\beta_0)/\sqrt{n} converges to a normal distribution with mean zero and variance matrix Σβ, which can be estimated by the sum of the products of the corresponding terms from Ũ(β), I, and II. Then, by a Taylor series expansion,

\frac{1}{\sqrt{n}}\, U(\hat\beta) = \frac{1}{\sqrt{n}}\, U(\beta_0) + \frac{1}{n}\Gamma_n(\beta_0)\,\sqrt{n}\,(\hat\beta - \beta_0) + o_p(1),

so that √n(β̂ − β0) converges weakly to a normal distribution with mean zero and variance-covariance matrix Γβ⁻¹ Σβ Γβ⁻¹. Here, Γβ is the expectation of the scaled Jacobian matrix (1/n)Γn(β0) = (1/n) ∂U(β)/∂β evaluated at β = β0.

References

1. Quintas-Cardama A, Choi S, Kantarjian H, Jabbour E, Huang X, Cortes J. Predicting outcomes in patients with chronic myeloid leukemia at any time during tyrosine kinase inhibitor therapy. Clinical Lymphoma, Myeloma & Leukemia. 2014;14(4):327–334.
2. Zheng Y, Cai T, Feng Z. Application of the time-dependent ROC curves for prognostic accuracy with multiple biomarkers. Biometrics. 2006;62(1):279–287.
3. Uno H, Cai T, Tian L, Wei LJ. Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association. 2007;102(478):527–537.
4. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society, Series B. 1972;34(2):187–220.
5. Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53(1):330–339.
6. Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000;1(4):465–480.
7. Wang Y, Taylor JMG. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association. 2001;96(455):895–905.
8. Tsiatis AA, Davidian M. A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika. 2001;88(2):447–458.
9. Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics. 2002;58:742–753.
10. Huang X, Liu L. A joint frailty model for survival and gap times between recurrent events. Biometrics. 2007;63(2):389–397.
11. Liu L, Huang X, O'Quigley J. Analysis of longitudinal data in the presence of informative observational times and a dependent terminal event, with application to medical cost data. Biometrics. 2008;64(3):950–958.
12. Liu L, Huang X. Joint analysis of correlated repeated measures and recurrent events processes in the presence of death, with application to a study on acquired immune deficiency. Journal of the Royal Statistical Society, Series C. 2009;58(1):65–81.
13. Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics. 2011;67(3):819–829.
14. Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14:809–834.
15. Rizopoulos D. Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. Boca Raton: Chapman and Hall/CRC; 2012.
16. Rizopoulos D, Hatfield L, Carlin P, Takkenberg J. Combining dynamic predictions from joint models for longitudinal and time-to-event data using Bayesian model averaging. Journal of the American Statistical Association. 2014;109(508):1385–1397.
17. Zheng Y, Heagerty PJ. Partly conditional survival models for longitudinal data. Biometrics. 2005;61(2):379–391.
18. van Houwelingen HC. Dynamic prediction by landmarking in event history analysis. Scandinavian Journal of Statistics. 2007;34:70–85.
19. van Houwelingen HC, Putter H. Dynamic predicting by landmarking as an alternative for multi-state modeling: an application to acute lymphoid leukemia data. Lifetime Data Analysis. 2008;14:447–463.
20. van Houwelingen H, Putter H. Dynamic Prediction in Clinical Survival Analysis. Boca Raton: CRC Press; 2012.
21. Binder H, Sauerbrei W, Royston P. Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: a simulation study with continuous response. Statistics in Medicine. 2013;32(13):2262–2277.
22. Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. The Annals of Statistics. 1982;10(4):1100–1120.
23. Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–1399.
24. Anderson J, Cain K, Gelber R. Analysis of survival by tumor response. Journal of Clinical Oncology. 1983;1:710–719.
25. Dafni U. Landmark analysis at the 25-year landmark point. Circulation: Cardiovascular Quality and Outcomes. 2011;4(3):363–371.
26. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61(1):92–105.
27. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996;15:361–387.
28. Zheng Y, Heagerty PJ. Prospective accuracy for longitudinal markers. Biometrics. 2007;63(2):332–341.
29. Antolini L, Boracchi P, Biganzoli E. A time-dependent discrimination index for survival data. Statistics in Medicine. 2005;24(24):3927–3944.
