Abstract
With the emergence of rich information on biomarkers after treatments, new types of prognostic tools are being developed: dynamic prognostic tools that can be updated at each new biomarker measurement. Such predictions are of interest in oncology where, after an initial treatment, patients are monitored with repeated biomarker data. However, in such a setting, patients may receive second treatments to slow down the progression of the disease. This paper aims to develop and validate dynamic individual predictions that allow for the possibility of a new treatment, in order to help understand the benefit of initiating new treatments during the monitoring period. The prediction of the event in the next x years is done under two scenarios: (1) the patient immediately initiates a second treatment, (2) the patient does not initiate any treatment in the next x years. Predictions are derived from shared random-effect models. Applied to prostate cancer data, different specifications for the dependence between the PSA repeated measures, the initiation of a second treatment (hormonal therapy) and the risk of clinical recurrence are investigated and compared. The predictive accuracy of the dynamic predictions is evaluated with two measures (Brier score and prognostic cross-entropy) for which approximated cross-validated estimators are proposed.
Keywords: Brier score, Dynamic predictions, Hormonal treatment, Joint model, Prognostic models, Prostate cancer, Shared random-effect models
1 Introduction
The growing availability of rich longitudinal information in health studies1–6 has motivated the development of joint models of longitudinal and time-to-event data. In addition to providing an efficient framework to correctly model correlated longitudinal markers and clinical events and to better understand their interrelationship, these models have more recently offered a new approach for monitoring patients after the diagnosis of a chronic disease.
Indeed, dynamic individual predictions of an event of interest can be easily derived from joint models7–9. They consist of the predicted probability of having the event in a certain window of time conditional on the longitudinal marker history. As such, in contrast with standard predictive tools that only use information at diagnosis, these predictions can dynamically adapt the risk of the event of interest to the individual trajectory of the biomarker of progression.
Such predictions are of interest in oncology where, after a first treatment, patients are monitored to detect early recurrence of the cancer. In particular, after prostate cancer diagnosis and initial treatment by radiation therapy, some dynamic tools based on repeated measures of PSA (prostate-specific antigen) in addition to the standard prognostic factors were found to predict the risk of clinical recurrence more accurately than standard prognostic tools7,10.
In practice, after a first treatment of the cancer, patients may receive a second treatment (ST) to slow down the progression of the disease and to prevent or delay the recurrence of the cancer. In particular for prostate cancer, after a first treatment by radiation therapy, the patient may initiate hormonal therapy (HT) when he has a high risk of clinical recurrence. The optimal time to initiate HT is unknown; the timing is mainly determined by the clinician based on experience, knowledge of the disease and the observed PSA trajectory. Yet HT has consequences for the patient's personal life, so that, in accordance with the principle of personalized medicine11, it is of great importance to assess the benefit of initiating a ST in terms of reduction of the patient's risk of recurrence.
Dynamic predictive tools can be used for this purpose, as they provide an up-to-date quantification of the risk of recurrence. However, most dynamic predictions currently developed do not take into account a possible change of treatment7,8,12,13: they are computed assuming that the patient will not initiate any ST in the window of prediction. As STs usually change the dynamics of the biomarker and of the time to event, a joint model accounting for ST is more complex to define13,14, and differential dynamic predictions are required to distinguish the risk of an event according to the initiation of ST.
In this context, the objective of this paper is to compute and validate dynamic individual predictions of an event in the next x years under different scenarios, in particular: (1) the patient initiates a ST today, or (2) the patient does not initiate any ST in the window of prediction. The idea is to provide tools that help the clinician quantify the potential benefit of immediately starting a new treatment compared with postponing the decision to the next monitoring visit. To reach this goal, we used the joint model methodology and focused on the prostate cancer example with initiation of hormonal therapy. We explored and compared different specifications of the interrelationships between the PSA repeated measures, the initiation of HT and the risk of clinical recurrence. Dynamic individual predictions were computed in situations (1) and (2), and several measures of predictive ability were considered.
In order to validate the dynamic predictive tools, we considered the quadratic error of prediction (Brier Score, BS)10,15 and developed a new estimator that is valid both on external data and on the estimation data. The usual overoptimism due to using the same data for estimation and prediction was corrected by applying a recently developed general formula of approximate cross-validation16. In addition to the Brier Score, we also considered an information criterion for assessing the prognostic value of joint models (EPOCE)10,17. Specifically, in situation (1), which focuses on the years following the initiation of a ST, integrated versions of these two predictive accuracy measures over the times of ST initiation were defined.
The paper is organized as follows. Section 2 describes the classical joint model methodology. Section 3 details the development of the differential individual dynamic predictions, and Section 4 their validation with BS and EPOCE estimators. Section 5 describes the application to prostate cancer, with the evaluation of the risk of clinical recurrence based on the PSA trajectory after initial radiation therapy and on the immediate initiation of hormonal therapy. Finally, the model and the results are discussed in Section 6.
2 Joint models
The shared random-effect approach was chosen to jointly model the biomarker repeated measures and the time-to-event1–3,5,6. A general description is given below and more specific models will be described in the application section.
2.1 Notation
Let $T_i^*$ be the time to the event of interest and Ci the censoring time for subject i, i = 1, …, N. We observe the time $T_i = \min(T_i^*, C_i)$ and $E_i = \mathbb{1}\{T_i^* \le C_i\}$, the indicator of event. Let τi be the time to ST (unobserved if τi > Ti) and 𝟙{t≥τi} the indicator of ST status at any time t. For each subject we also collect ni repeated measures of the biomarker Yi = (Yi(ti1), …, Yi(tini)) at times (ti1, …, tini).
2.2 Longitudinal submodel
The biomarker trajectory in the absence of second treatment initiation is described using a linear mixed model18; the biomarker pattern after a possible ST initiation is not modelled. We assume that for j = 1, …, ni, the repeated measures Yi(tij) are noisy measures of the true unobserved biomarker value $Y_i^*(t_{ij})$. The mean change over time of $Y_i^*(t)$ can depend on covariates, and the within-subject correlation of the biomarker repeated measures is captured using subject-specific random effects:
$$Y_i(t_{ij}) = Y_i^*(t_{ij}) + \varepsilon_i(t_{ij}) = X_{Li}(t_{ij})^\top \beta + Z_i(t_{ij})^\top b_i + \varepsilon_i(t_{ij}) \tag{1}$$
where XLi(tij) and Zi(tij) are a p-vector and a q-vector of time-dependent covariates associated respectively with the p-vector of fixed effects β and the q-vector of random effects bi, with bi ~ 𝒩(0, B). The vector of independent measurement errors εi = (εi(ti1), …, εi(tini))ᵀ follows 𝒩(0, Σi) with Σi = σ²Ini, and εi and bi are independent.
2.3 Survival submodel
To model the risk of the event and to quantify the effects of the biomarker dynamics and of the initiation of ST on this risk, adjusted for other covariates, we define a proportional hazards model as follows:
$$\lambda_i(t \mid b_i, \tau_i, X_{Si}) = \lambda_0(t) \exp\left\{X_{Si}^\top \gamma + W_i(b_i, \tau_i, t)^\top \phi\right\} \tag{2}$$
where λ0(t) is the baseline hazard function, γ is the r-vector of coefficients associated with the r-vector of time-independent covariates XSi, and ϕ is the vector of parameters associated with Wi(bi, τi, t), a multivariate function of the random effects bi from model (1) and, when ST is considered, of the initiation of ST. In the standard framework without any ST, Wi(bi, τi, t) defines the nature of the dependence between the longitudinal and the survival processes, and ϕ measures the corresponding strength of association. The most common example is $W_i(b_i, \tau_i, t) = Y_i^*(t)$, which assumes an association with the risk of event through the true current level of the biomarker. In the presence of a ST, Wi(bi, τi, t) captures both the dependence between the two processes and the effect of ST on the hazard. For example, $W_i(b_i, \tau_i, t) = (Y_i^*(t), \mathbb{1}\{t \ge \tau_i\})^\top$ models independent effects of the true current level of the biomarker and a change of risk after ST initiation. Other examples of Wi(bi, τi, t) are described in the application (section 5.2).
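The hazard (2) with the second example of Wi(bi, τi, t) above can be sketched in a few lines. This is a minimal sketch under assumptions that are not from the paper: a hypothetical Weibull baseline hazard (the application uses splines), a linear true trajectory Y*i(t) with a random intercept and slope, and arbitrary parameter values. Because Y*i(t) is continuous in t, the hazard jumps exactly at t = τi by the factor exp of the coefficient of 𝟙{t≥τi} (`phi[1]` below).

```python
import math


def baseline_hazard(t, shape=1.2, scale=10.0):
    """Hypothetical Weibull baseline hazard lambda_0(t); the application uses splines instead."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def true_level(t, b, beta=(1.0, 0.1)):
    """Hypothetical true marker level Y*_i(t) = (beta0 + b0i) + (beta1 + b1i) * t."""
    return (beta[0] + b[0]) + (beta[1] + b[1]) * t


def hazard(t, b, tau, x_s, gamma, phi):
    """lambda_i(t) = lambda_0(t) * exp(X_Si' gamma + W_i' phi), with
    W_i(b_i, tau_i, t) = (Y*_i(t), 1{t >= tau_i}): current true level and post-ST change of risk."""
    w = (true_level(t, b), 1.0 if t >= tau else 0.0)
    lin_pred = sum(x * g for x, g in zip(x_s, gamma)) + sum(wk * p for wk, p in zip(w, phi))
    return baseline_hazard(t) * math.exp(lin_pred)


# Hypothetical subject: random effects b = (0.3, 0.05), ST started at tau = 5 years
b, tau = (0.3, 0.05), 5.0
x_s, gamma = (1.0, 0.0), (0.4, -0.2)  # hypothetical baseline covariates and their effects
phi = (0.8, -1.4)                     # (effect of current true level, change of risk after ST)

# Y*_i(t) is continuous, so the hazard ratio across t = tau is exp(phi[1])
jump = hazard(tau, b, tau, x_s, gamma, phi) / hazard(tau - 1e-9, b, tau, x_s, gamma, phi)
```

Keeping the two components of Wi separate in the code mirrors the interpretation in the text: `phi[0]` acts through the biomarker dynamics, while `phi[1]` only shifts the risk once ST has started.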
2.4 Maximum likelihood estimation
Parameters were estimated by maximum likelihood using the JM R package19, with modifications of the JM source code when necessary. The estimates, denoted θ̂, were obtained by a quasi-Newton algorithm with a convergence criterion on the log-likelihood19. The two integrals involved in the log-likelihood computation (over time in the survival function and over the random effects) were approximated using Gaussian quadrature19. The variance-covariance matrix of the estimated parameters was estimated by the inverse of the Hessian matrix.
3 Individual dynamic predictions
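Individual dynamic predictions derived from joint models7–10 consist of the individual predicted probability of event, pi(s, 𝒯; θ), between times s and s + 𝒯, computed for a new subject given his biomarker history $Y_i^{(s)} = \{Y_i(t_{ij}), t_{ij} \le s\}$ and his covariate history $X_{Li}^{(s)}$ until time s, as well as his time-independent covariates XSi. It is defined as:
$$p_i(s, \mathcal{T}; \theta) = P\left(T_i^* \le s + \mathcal{T} \mid T_i^* > s, Y_i^{(s)}, X_{Li}^{(s)}, X_{Si}; \theta\right) \tag{3}$$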
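With a change in risk at the initiation of a ST, different individual dynamic predictions can be distinguished from a time of prediction s in patients free of ST (conditioning on covariates is omitted below for readability):
- The patient initiates ST at time s:
$$p_i(s,\mathcal{T};\theta) = P\left(T_i^* \le s+\mathcal{T} \mid T_i^* > s, \tau_i = s, Y_i^{(s)}; \theta\right) = 1 - \frac{\int S_i(s+\mathcal{T} \mid b, \tau_i = s; \theta)\, f_{Y^{(s)}}(Y_i^{(s)} \mid b; \theta)\, f_b(b; \theta)\, db}{\int S_i(s \mid b, \tau_i > s; \theta)\, f_{Y^{(s)}}(Y_i^{(s)} \mid b; \theta)\, f_b(b; \theta)\, db} \tag{4}$$
- The patient does not initiate ST at time s. In that case, many alternative scenarios can be considered. For example, ST could be initiated after a certain amount of time t1, with t1 < 𝒯 or t1 > 𝒯, or ST could be initiated when the biomarker reaches a certain threshold x. For validating the dynamic prognostic tools, we focused on the scenario “no initiation of ST in the window of prediction 𝒯” (alternatives are illustrated in section 5.4.3):
$$p_i(s,\mathcal{T};\theta) = P\left(T_i^* \le s+\mathcal{T} \mid T_i^* > s, \tau_i > s+\mathcal{T}, Y_i^{(s)}; \theta\right) = 1 - \frac{\int S_i(s+\mathcal{T} \mid b, \tau_i > s+\mathcal{T}; \theta)\, f_{Y^{(s)}}(Y_i^{(s)} \mid b; \theta)\, f_b(b; \theta)\, db}{\int S_i(s \mid b, \tau_i > s; \theta)\, f_{Y^{(s)}}(Y_i^{(s)} \mid b; \theta)\, f_b(b; \theta)\, db} \tag{5}$$
In (4) and (5), $Y_i^{(s)}$ is the vector of measurements Yi(tij) with tij ≤ s; $f_{Y^{(s)}}$ and $f_b$ are multivariate Gaussian density functions with respective means $X_{Li}^{(s)}\beta + Z_i^{(s)} b$ and 0, and variance-covariance matrices $\Sigma_i^{(s)}$ and B; Si is the survival function, $S_i(t \mid b, \tau_i) = \exp\{-\int_0^t \lambda_i(u \mid b, \tau_i)\, du\}$ with λi from (2). $X_{Li}^{(s)}$ and $Z_i^{(s)}$ are design matrices with rows $X_{Li}(t_{ij})^\top$ and $Z_i(t_{ij})^\top$ for tij ≤ s, and $\Sigma_i^{(s)}$ is the submatrix of Σi for tij ≤ s.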
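To make (4) and (5) concrete, the sketch below computes both predictions for one subject in a deliberately simplified joint model. Everything model-specific is an assumption for illustration only: a single random intercept, a linear trajectory Y*i(t) = β0 + bi + β1t, a constant baseline hazard, Wi(bi, τi, t) = (Y*i(t), 𝟙{t≥τi})ᵀ, and hypothetical parameter values treated as known (θ = θ̂). With a random intercept and Gaussian errors, fY(s)(Yi(s) | b) fb(b) is proportional to a Gaussian posterior density of b, so both integrals can be approximated by Gauss-Hermite quadrature centred on that posterior (NumPy's `hermegauss`, whose weight function is exp(−x²/2)); the normalizing constants cancel in the ratio.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical parameter values (theta treated as known) for a toy joint model:
#   Y_ij = beta0 + b_i + beta1 * t_ij + e_ij,  b_i ~ N(0, sigma_b^2),  e_ij ~ N(0, sigma_e^2)
#   lambda_i(t) = lam0 * exp(phi * Y*_i(t) + phi_st * 1{t >= tau_i})
PARAMS = dict(beta0=0.5, beta1=0.2, sigma_b=0.8, sigma_e=0.3,
              lam0=0.02, phi=0.9, phi_st=-1.4)


def _int_exp(k, a, c):
    """Integral of exp(k * u) for u in [a, c]."""
    if abs(k) < 1e-12:
        return c - a
    return (np.exp(k * c) - np.exp(k * a)) / k


def cum_hazard(t, b, tau, p):
    """Cumulative hazard H_i(t | b, tau) in closed form (b may be an array of nodes)."""
    k = p["phi"] * p["beta1"]
    level0 = p["lam0"] * np.exp(p["phi"] * (p["beta0"] + b))
    before = _int_exp(k, 0.0, min(t, tau))
    after = np.exp(p["phi_st"]) * _int_exp(k, tau, t) if t > tau else 0.0
    return level0 * (before + after)


def dynamic_prediction(times, y, s, horizon, st_now, p=PARAMS, n_nodes=30):
    """Scenario (4) if st_now (tau_i = s), scenario (5) otherwise (no ST before s + horizon)."""
    obs = times <= s                                   # marker history Y_i(s)
    r = y[obs] - p["beta0"] - p["beta1"] * times[obs]
    # f(Y|b) f(b) is proportional to the Gaussian posterior of b given Y_i(s)
    prec = 1.0 / p["sigma_b"] ** 2 + r.size / p["sigma_e"] ** 2
    post_mean = r.sum() / p["sigma_e"] ** 2 / prec
    post_sd = prec ** -0.5
    x, w = hermegauss(n_nodes)                         # weight function exp(-x^2 / 2)
    b = post_mean + post_sd * x
    tau = s if st_now else np.inf
    surv_s = np.exp(-cum_hazard(s, b, np.inf, p))      # S_i(s | b), no ST before s
    surv_end = np.exp(-cum_hazard(s + horizon, b, tau, p))
    return 1.0 - np.sum(w * surv_end) / np.sum(w * surv_s)


times = np.array([0.5, 1.0, 2.0, 3.0])
y = np.array([0.7, 0.8, 1.3, 1.6])
p_st = dynamic_prediction(times, y, s=3.0, horizon=3.0, st_now=True)    # equation (4)
p_no = dynamic_prediction(times, y, s=3.0, horizon=3.0, st_now=False)   # equation (5)
```

With a protective ST (`phi_st` < 0 in this toy setting), the prediction under (4) is lower than under (5); the gap between `p_no` and `p_st` is the kind of comparison the two scenarios are meant to support.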
A point estimate of these individual dynamic predictions can be obtained with pi(s, 𝒯; θ̂). Alternatively, the posterior distribution of pi(s, 𝒯; θ) can be approximated by a Monte Carlo method7: a large set of θ(d) (d = 1, …, D) is generated from the asymptotic distribution of the estimates and used to compute pi(s, 𝒯; θ(d)) for d = 1, …, D. The median of the pi(s, 𝒯; θ(d)) provides the point estimate, and the 2.5% and 97.5% percentiles give a 95% confidence band. Instead of computing the probabilities by numerical integration as in (4) and (5), the random effects bi can also be sampled from their posterior distribution and the probabilities computed given the sampled bi8.
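The Monte Carlo approximation just described can be sketched generically: a function mapping a parameter vector to a predicted probability (in practice, (4) or (5) evaluated at θ(d)) is applied to draws from the asymptotic normal distribution of θ̂, and the median and the 2.5% and 97.5% percentiles are reported. The two-parameter `toy_predict` function and the values of `theta_hat` and `cov_theta` are hypothetical stand-ins, not quantities from the paper.

```python
import numpy as np


def mc_prediction_interval(predict, theta_hat, cov_theta, n_draws=2000, seed=1):
    """Monte Carlo approximation of the distribution of p_i(s, T; theta):
    draw theta^(d) ~ N(theta_hat, cov_theta), d = 1..D, and summarise predict(theta^(d))."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, cov_theta, size=n_draws)
    preds = np.array([predict(theta) for theta in draws])
    low, med, high = np.percentile(preds, [2.5, 50.0, 97.5])
    return med, low, high


def toy_predict(theta, marker=1.2, horizon=3.0):
    """Hypothetical stand-in for (4)/(5): constant hazard exp(theta0 + theta1 * marker)."""
    rate = np.exp(theta[0] + theta[1] * marker)
    return 1.0 - np.exp(-rate * horizon)


theta_hat = np.array([-3.0, 0.8])
cov_theta = np.array([[0.04, -0.01], [-0.01, 0.02]])
median, lower, upper = mc_prediction_interval(toy_predict, theta_hat, cov_theta)
```

Passing the predictor as a function keeps the resampling step independent of how (4) or (5) is evaluated.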
4 Evaluation of predictive accuracy
Validation of the prognostic tools is based on predictive accuracy. Two measures adapted to the dynamic setting were considered: the prognostic cross-entropy and the Brier score. We first describe these measures and their simple estimators. Then, the approximate cross-validation formula16 is applied to obtain estimators of the measures that can be computed on the training data while correcting for overoptimism. Finally, integrated versions over the times of initiation of ST are proposed and confidence intervals are given.
In the following, Ns is the number of subjects at risk at the time of prediction s. As predictive performance is evaluated differently in the absence of ST and after immediate initiation of ST, Ns counts the subjects at risk and still free of ST at time s in the first case, and the subjects at risk who initiate ST at time s in the second case.
4.1 Measures of predictive accuracy
4.1.1 Expected Prognostic Observed Cross-Entropy
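The expected prognostic observed cross-entropy (EPOCE) is a criterion that quantifies the prognostic information of a joint model from a time of prediction s10,17. It is formally defined as the expectation of minus the log of the conditional density $f_{T^* \mid Y^{(s)}, T^* \ge s}$ of the time to event given the history of the biomarker $Y^{(s)}$ until the time of prediction s, that is $\mathbb{E}\left[-\log f_{T^* \mid Y^{(s)}, T^* \ge s}(T^*) \mid T^* \ge s\right]$. A simple estimator of EPOCE is given by the prognostic observed log-likelihood (POL), the mean of minus the individual log-likelihoods of the event data observed in the time window [s, s + 𝒯], conditional on the observed marker data until time s:
$$\mathrm{POL}(\hat\theta, s, \mathcal{T}) = \frac{1}{N_s} \sum_{i=1}^{N_s} F_i(\hat\theta, s, \mathcal{T}) \tag{6}$$
where Fi(θ̂, s, 𝒯) is minus the observed individual contribution to the conditional log-likelihood, given that the subject is still at risk at time s and given his covariate history until s. For i = 1, …, Ns, Fi is defined as:
$$F_i(\hat\theta, s, \mathcal{T}) = -\log \frac{\int \lambda_i(\tilde T_i \mid b; \hat\theta)^{\tilde E_i}\, S_i(\tilde T_i \mid b; \hat\theta)\, f_{Y^{(s)}}(Y_i^{(s)} \mid b; \hat\theta)\, f_b(b; \hat\theta)\, db}{\int S_i(s \mid b; \hat\theta)\, f_{Y^{(s)}}(Y_i^{(s)} \mid b; \hat\theta)\, f_b(b; \hat\theta)\, db} \tag{7}$$
where T̃i = min(Ti, s + 𝒯) and Ẽi = Ei 𝟙{Ti ≤ s+𝒯}, which means that subjects are artificially censored at s + 𝒯.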
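The artificial censoring and the average in (6) can be sketched as follows. Any predicted conditional distribution of T*i given T*i > s can be written with a hazard λi and cumulative hazard Λi, so that Fi = −Ẽi log λi(T̃i) + Λi(T̃i) − Λi(s). As a hypothetical stand-in for the joint-model prediction given Yi(s), each subject here gets a Weibull-type cumulative hazard Λi(t) = ri tᵏ with made-up ri and k; in practice Fi would be obtained from (7).

```python
import numpy as np


def pol(T, E, s, horizon, rate, shape):
    """Simple POL estimator: mean of F_i over subjects still at risk at s.

    T, E        : observed times and event indicators (numpy arrays)
    rate, shape : hypothetical per-subject predicted cumulative hazard
                  Lambda_i(t) = rate_i * t**shape (stand-in for the joint-model prediction)
    """
    at_risk = T > s
    T, E, rate = T[at_risk], E[at_risk], rate[at_risk]
    # Artificial censoring at s + horizon
    T_tilde = np.minimum(T, s + horizon)
    E_tilde = (E == 1) & (T <= s + horizon)

    def cum_haz(t):
        return rate * t ** shape

    log_haz = np.log(rate * shape * T_tilde ** (shape - 1.0))
    # F_i = -E~_i log lambda_i(T~_i) + Lambda_i(T~_i) - Lambda_i(s)
    F = -np.where(E_tilde, log_haz, 0.0) + cum_haz(T_tilde) - cum_haz(s)
    return F.mean(), int(at_risk.sum())


T = np.array([1.0, 4.0, 6.0, 8.0])
E = np.array([1, 1, 0, 1])
value, n_s = pol(T, E, s=2.0, horizon=3.0, rate=np.full(4, 0.1), shape=1.0)
```

In this example the subject with an event at 8 years is artificially censored at s + 𝒯 = 5 and contributes only through the survival term.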
4.1.2 Brier Score and Integrated Brier Score
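The Brier Score (BS), originally developed for survival models20,21, was extended to joint models7,10,15,22. It is defined as E[(η(s + 𝒯) − Ŝ(s + 𝒯 | s; θ̂))²], the expectation of the squared difference between the observed survival status η(s + 𝒯) = 𝟙{T* > s + 𝒯} and the predicted survival probability at that time, Ŝ(s + 𝒯 | s; θ̂) = 1 − pi(s, 𝒯; θ̂), with pi(s, 𝒯; θ̂) defined in (5) in the absence of HT and in (4) after immediate initiation of HT. The simple estimator of BS at time s + 𝒯 is:
$$\mathrm{BS}(\hat\theta, s, \mathcal{T}) = \frac{1}{N_s} \sum_{i=1}^{N_s} w_i(s, \mathcal{T}) \left\{\eta_i(s+\mathcal{T}) - \hat S_i(s+\mathcal{T} \mid s; \hat\theta)\right\}^2 \tag{8}$$
where ηi(s + 𝒯) = 𝟙{Ti > s + 𝒯} and wi is a weight that compensates for the loss of information due to censoring. The weights are defined according to the inverse probability of censoring10,23 as
$$w_i(s, \mathcal{T}) = \frac{\mathbb{1}\{T_i \le s+\mathcal{T}\}\, E_i}{\hat G(T_i)/\hat G(s)} + \frac{\mathbb{1}\{T_i > s+\mathcal{T}\}}{\hat G(s+\mathcal{T})/\hat G(s)}$$
with Ĝ the Kaplan-Meier estimate of the survival function of the censoring time.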
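A sketch of the simple estimator (8) follows, with inverse-probability-of-censoring weights of the standard form: Ĝ(s)/Ĝ(Ti) for an event within the window, Ĝ(s)/Ĝ(s + 𝒯) for subjects still event-free at s + 𝒯, and 0 for subjects censored within the window. The Kaplan-Meier estimate Ĝ treats censorings as the "events"; the handling of ties and the use of Ĝ(Ti) rather than its left limit are simplifying assumptions. The predicted conditional survival probabilities are inputs (they would come from (4) or (5)).

```python
import numpy as np


def km_censoring(T, E):
    """Kaplan-Meier estimate G(t) of the censoring survival function (censorings treated as events)."""
    cens_times = np.unique(T[E == 0])
    if cens_times.size == 0:
        return lambda t: np.ones_like(np.asarray(t, dtype=float))
    g, values = 1.0, []
    for u in cens_times:
        n_risk = np.sum(T >= u)
        n_cens = np.sum((T == u) & (E == 0))
        g *= 1.0 - n_cens / n_risk
        values.append(g)
    values = np.array(values)

    def G(t):
        idx = np.searchsorted(cens_times, t, side="right") - 1
        return np.where(idx >= 0, values[np.maximum(idx, 0)], 1.0)

    return G


def ipcw_brier(T, E, s, horizon, surv_pred):
    """Simple BS estimator (8); surv_pred[i] = predicted P(T*_i > s + horizon | T*_i > s, Y_i(s))."""
    G = km_censoring(T, E)            # estimated on the whole sample
    at_risk = T > s
    T, E, S_hat = T[at_risk], E[at_risk], surv_pred[at_risk]
    end = s + horizon
    eta = (T > end).astype(float)     # observed survival status at s + horizon
    w = np.zeros(T.size)              # subjects censored within the window keep weight 0
    event_in_window = (T <= end) & (E == 1)
    w[event_in_window] = G(s) / G(T[event_in_window])
    w[T > end] = G(s) / G(end)
    return np.mean(w * (eta - S_hat) ** 2)


T = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
E = np.array([1, 0, 1, 1, 1])
bs = ipcw_brier(T, E, s=2.0, horizon=3.0, surv_pred=np.full(5, 0.5))
```

Here the subject censored at 3 years gets weight 0 and the remaining three at-risk subjects are up-weighted by 1/0.75, which redistributes its contribution.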
An average prediction accuracy over the window [s, s + 𝒯] is derived from BS by integrating it over the horizon times10,22. Again, to account for the loss of events due to censoring, a weighted mean can be used22 and estimated by:
$$\mathrm{IBS}(\hat\theta, s, \mathcal{T}) = \frac{\sum_{k=1}^{K} \frac{d_k}{\hat G(t_k)}\, \mathrm{BS}(\hat\theta, s, t_k - s)}{\sum_{k=1}^{K} \frac{d_k}{\hat G(t_k)}} \tag{9}$$
where t1 < … < tK are the distinct event times in the window [s, s + 𝒯] and dk is the number of events at time tk among subjects at risk at time s.
4.2 Approximated cross-validated estimators
To provide a valid assessment of predictive accuracy, these measures should be computed on independent data. On the data used to estimate the model, a cross-validation technique is required to correct for overoptimism. With complex models, cross-validation is numerically too expensive, so we applied the approximate leave-one-out cross-validation formula for regular problems proposed by Commenges et al.16. It is defined as:
$$\mathrm{CV}\mathcal{M}_a(\hat\theta, s, \mathcal{T}) = \mathcal{M}(\hat\theta, s, \mathcal{T}) + \frac{1}{N_s} \operatorname{Tr}\left(H^{-1} \hat K\right), \qquad \hat K = \sum_{i=1}^{N_s} \hat d_i\, \hat v_i(s, \mathcal{T})^\top \tag{10}$$
where ℳ(θ̂, s, 𝒯) is the simple estimator (POL, BS or IBS) and the second term is the correction added by the approximated leave-one-out cross-validation. This term penalizes the statistical complexity of the model and captures the overfit and the corresponding overoptimism of predictions. H is the Hessian matrix of the joint log-likelihood, and K̂ is built from the products of the gradients v̂i(s, 𝒯) and d̂i of the individual contributions to, respectively, the simple estimator ℳ(θ̂, s, 𝒯) and the maximized joint log-likelihood. The gradients are computed using finite differences. Applied to POL, BS and IBS, the approximate cross-validation estimators are called CVPOLa, CVBSa and CVIBSa, respectively; H and d̂i are the same in the three measures, while v̂i is the gradient of the individual contribution to POL, BS or IBS, respectively.
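A sketch of the correction term, under our reading of (10): a one-step expansion gives θ̂₋ᵢ − θ̂ ≈ H⁻¹d̂ᵢ for the leave-one-out estimate, hence ℳᵢ(θ̂₋ᵢ) ≈ ℳᵢ(θ̂) + v̂ᵢᵀH⁻¹d̂ᵢ, and averaging over the Ns subjects yields Ns⁻¹ Tr(H⁻¹ Σᵢ d̂ᵢv̂ᵢᵀ). The function takes the gradients as arrays (the paper obtains them by finite differences). To check the sketch, it is applied to a toy normal-mean model where exact leave-one-out is available in closed form; that toy model is an assumption for illustration only.

```python
import numpy as np


def approx_cv(apparent, v, d, H):
    """Approximate leave-one-out CV estimate: M(theta_hat) + Ns^-1 tr(H^-1 sum_i d_i v_i').

    apparent : simple estimator M(theta_hat), the mean of the individual contributions
    v        : (Ns, P) gradients of the individual contributions to M
    d        : (Ns, P) gradients of the individual log-likelihood contributions
    H        : (P, P) Hessian of the total log-likelihood at theta_hat
    """
    n_s = v.shape[0]
    K = d.T @ v                          # sum_i d_i v_i'
    return apparent + np.trace(np.linalg.solve(H, K)) / n_s


# Check on a model with exact leave-one-out: y_i ~ N(mu, 1), mu_hat = mean(y),
# individual contribution M_i = (y_i - mu)^2 / 2 (minus the log-likelihood up to a constant)
rng = np.random.default_rng(0)
y = rng.normal(size=200)
n, mu_hat = y.size, y.mean()
resid = y - mu_hat
apparent = np.mean(resid ** 2 / 2)
d = resid[:, None]                       # d loglik_i / d mu
v = -resid[:, None]                      # d M_i / d mu
H = np.array([[-float(n)]])              # d^2 loglik / d mu^2
cv_approx = approx_cv(apparent, v, d, H)

mu_loo = (n * mu_hat - y) / (n - 1)      # exact leave-one-out estimates
cv_exact = np.mean((y - mu_loo) ** 2 / 2)
```

For n = 200 the approximate and exact leave-one-out values agree to about four significant digits, and both exceed the apparent (simple) estimate, as expected from a penalty for overfitting.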
4.3 Averaged predictive accuracy
When predicted probabilities are computed for the case of immediate initiation of a ST, the predictive accuracy is evaluated from the time of ST initiation, which differs between subjects. So instead of an evaluation at fixed times of prediction s, we focused on the average predictive accuracy over the times of ST initiation, computed using the inverse probability of censoring weighting technique22:
$$\bar{\mathcal{M}}(\hat\theta, \mathcal{T}) = \frac{\sum_{k=1}^{n_{ST}} \frac{n_{\tau_k}}{\hat G_\tau(\tau_k)}\, \mathcal{M}(\hat\theta, \tau_k, \mathcal{T})}{\sum_{k=1}^{n_{ST}} \frac{n_{\tau_k}}{\hat G_\tau(\tau_k)}} \tag{11}$$
where ℳ can be BS or POL (simple or cross-validated) computed in (6), (8) and (10); nST is the number of distinct times τ1 < … < τnST of ST initiation, nτk is the number of ST initiations at time τk, and Ĝτ is the Kaplan-Meier estimate of the survival function of censoring related to the times of ST initiation.
4.4 Confidence interval
Predictive accuracy measures of two models can be compared by computing the difference of their approximated cross-validation estimators with a 95% confidence interval (CI)16. The CI is derived from the asymptotic distribution of the predictive accuracy difference and its empirical variance. Let θ̂A and θ̂B be the vectors of parameter estimates for the two models A and B. Let Δ(A(θ̂A), B(θ̂B)) be the difference of predictive accuracy between the two models and 𝒟(A(θ̂A), B(θ̂B)) its estimator. Let m be the number of subjects on which the predictive accuracy is computed. It was shown16 that the difference between 𝒟(A(θ̂A), B(θ̂B)) and Δ(A(θ̂A), B(θ̂B)) is asymptotically normal:
$$\sqrt{m}\left\{\mathcal{D}(A(\hat\theta_A), B(\hat\theta_B)) - \Delta(A(\hat\theta_A), B(\hat\theta_B))\right\} \xrightarrow{d} \mathcal{N}(0, w^2) \tag{12}$$
where w² can be estimated by the empirical variance ŵ² of the individual differences between the simple estimators of the two models. With zu the u-th quantile of a standard normal variable, the (1 − α) confidence interval is then $\left[\mathcal{D}(A(\hat\theta_A), B(\hat\theta_B)) - z_{1-\alpha/2}\, m^{-1/2}\, \hat w;\ \mathcal{D}(A(\hat\theta_A), B(\hat\theta_B)) + z_{1-\alpha/2}\, m^{-1/2}\, \hat w\right]$.
In the absence of ST, the predictive accuracy is evaluated at different times s with m = Ns, and 𝒟 is the approximated cross-validation estimate of the difference in POL, BS or IBS at time s. After initiation of ST, predictive accuracy is evaluated once, with m the number of ST initiations, and 𝒟 is the approximated cross-validation estimate of the difference in the average measures defined in (11).
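The confidence interval for the difference between two models can be sketched as follows, given the cross-validated difference 𝒟 and the individual contributions of each model to the simple estimator, whose differences give ŵ. The contributions below are simulated placeholders, not data from the study, and z1−α/2 is taken from Python's `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist


def diff_ci(d_cv, contrib_a, contrib_b, alpha=0.05):
    """(1 - alpha) CI for the difference in predictive accuracy between models A and B.

    d_cv                 : approximated cross-validated difference D(A, B)
    contrib_a, contrib_b : individual contributions to the simple estimators (length m)
    """
    diff = np.asarray(contrib_a, dtype=float) - np.asarray(contrib_b, dtype=float)
    m = diff.size
    w_hat = diff.std(ddof=1)                      # empirical SD of the individual differences
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile
    half_width = z * w_hat / np.sqrt(m)
    return d_cv - half_width, d_cv + half_width


rng = np.random.default_rng(2)
contrib_a = rng.normal(0.20, 0.05, size=500)              # simulated placeholders
contrib_b = contrib_a + rng.normal(0.01, 0.02, size=500)
lower, upper = diff_ci(-0.012, contrib_a, contrib_b)
```

Using the upper quantile z1−α/2 (about 1.96 for α = 0.05) keeps the lower bound below 𝒟 and the upper bound above it.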
5 Application to the prediction of prostate cancer recurrence
5.1 Datasets
Data used in this application consist of 2386 men treated for localized prostate cancer by external beam radiation therapy (EBRT) in three different studies: 503 patients come from the cohort of the University of Michigan (UM) with a period of recruitment from 1988 to 2004; 1268 patients come from the cohort of Beaumont Hospital, in Michigan (BM), recruited between 1987 and 2003; 615 patients come from the multicenter clinical trial RTOG9406 recruited from 1994 to 2001. Among them, 261 (10.9%) received a ST, namely hormonal therapy (HT), during their follow-up. Clinical recurrence was defined as any kind of recurrence (local, regional, distant) or death from prostate cancer, and only the first clinical recurrence was considered. A total of 312 (13.1%) patients had a clinical recurrence, among whom 53 received HT. The four baseline prognostic factors considered in this application are the pre-radiation therapy level of PSA, the T-stage, which indicates how large the tumor is and how far it has spread (in three categories: 1; 2; 3–4), the Gleason score, which quantifies the aggressiveness of the cancer (in three categories: 2–6; 7; 8–10), and the corrected total dose of radiation therapy24; a full description of these covariates has been given previously7,25.
Figure 1 shows the observed PSA trajectories after the end of EBRT of eight randomly selected patients who recurred or were censored, and who did or did not receive HT. After EBRT, PSA drops during the first year. A subsequent rise of PSA then indicates a higher risk of recurrence, and in some cases HT is initiated to reduce this risk and postpone the recurrence. Initiation of HT induces an immediate change in the PSA dynamics. Since the objective was to predict the risk of recurrence based on PSA dynamics prior to HT, we censored the post-HT PSA data. We call base PSA dynamics the PSA dynamics that would have been observed had the patient not been treated with HT. A median of 9 PSA repeated measures per subject (interquartile range 5 to 12) were analyzed.
Figure 1.
Individual observed trajectories of log(PSA + 0.1) after the end of EBRT until the observed survival time (vertical black dashed line): (a) two patients who received HT (vertical grey dashed line) and subsequently recurred, (b) two patients who received HT and were subsequently censored, (c) two patients who recurred without initiating any HT, and (d) two patients who were censored without initiating any HT. Black dots represent observed PSA values before HT and the black curve the subject-specific PSA predictions from the linear mixed model. Grey dots are the observed PSA values after HT and the grey curve the extrapolated subject-specific PSA predictions from the mixed model based only on pre-HT data, that is, the expected PSA trajectory had the patient not received any HT.
5.2 Specification of the joint models
PSA repeated measures were analyzed on the logarithmic scale, as log(PSA + 0.1). As previously proposed25, we used the two-phase trajectory of PSA defined as:
$$Y_i(t_{ij}) = \left(X_{0i}^\top \beta_0 + b_{0i}\right) + \left(X_{1i}^\top \beta_1 + b_{1i}\right) f(t_{ij}) + \left(X_{2i}^\top \beta_2 + b_{2i}\right) t_{ij} + \varepsilon_i(t_{ij}) \tag{13}$$
where $f(t) = (1 + t)^{-1.5} - 1$ and t capture the short-term decline and the long-term trend of PSA, respectively25; X0i included 1, the pre-EBRT PSA and the cohort indicators; X1i included X0i plus 2 binary indicators for T-stage (2 vs 1, and 3–4 vs 1); and X2i included X1i plus 2 binary indicators for the Gleason score (7 vs 2–6, and 8–10 vs 2–6).
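As a sketch of the two-phase shape, the functions below evaluate the true level on the log(PSA + 0.1) scale and its time derivative, the two quantities used by the level-and-slope dependence functions described below. The coefficients a0, a1, a2 are hypothetical subject-specific values standing for X0iᵀβ0 + b0i, X1iᵀβ1 + b1i and X2iᵀβ2 + b2i; with a1 > 0, the term a1 f(t) produces the early drop (f(0) = 0 and f(t) tends to −1), and a2 > 0 the later rise.

```python
import numpy as np


def f(t):
    """Short-term decline term f(t) = (1 + t)^(-1.5) - 1."""
    return (1.0 + t) ** -1.5 - 1.0


def f_prime(t):
    """Derivative of f with respect to t."""
    return -1.5 * (1.0 + t) ** -2.5


def psa_level_and_slope(t, a0, a1, a2):
    """True level Y*_i(t) = a0 + a1 f(t) + a2 t and its slope dY*_i(t)/dt
    (a0, a1, a2 are hypothetical subject-specific coefficients)."""
    level = a0 + a1 * f(t) + a2 * t
    slope = a1 * f_prime(t) + a2
    return level, slope


t = np.linspace(0.0, 8.0, 9)
level, slope = psa_level_and_slope(t, a0=2.0, a1=2.5, a2=0.15)
```

With these values the slope is negative just after EBRT and becomes positive later, reproducing the drop-then-rise pattern seen in Figure 1.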
In the survival model, the baseline hazard function was approximated by splines and the four baseline prognostic factors were included in XSi. In addition, different specifications of Wi(bi, τi, t) were explored. Wi(bi, τi, t) includes two components: the multivariate function h(bi, t) of the random effects derived from (1) that models the dependency between the PSA dynamics and the time-to-clinical-recurrence, and information about initiation of HT. In the following, we propose five specifications of Wi(bi, τi, t) that differ in the way the initiation of HT enters into the model, and different variants that correspond to different functions of the PSA dynamics h(bi, t). The five specifications are:
1. Wi(bi, τi, t)ᵀϕ = ϕ1 𝟙{t≥τi}. This is the standard survival model assuming no association between the PSA dynamics and the risk of event, but allowing a change of risk of recurrence after initiation of HT.
2. Wi(bi, τi, t)ᵀϕ = ϕ1 𝟙{t≥τi} + h(bi, t)ᵀϕ2. This is the standard joint model for describing PSA dynamics and risk of recurrence13,14,25. This model assumes that the characteristics of the PSA dynamics have the same role before and after HT. After HT, these characteristics are extrapolated as if the patient had not initiated HT (see Figure 1), so that the change in risk after HT captured by parameter ϕ1 summarizes the effect of HT adjusted for the base PSA dynamics.
3. Wi(bi, τi, t)ᵀϕ = ϕ1 𝟙{t≥τi} + h(bi, t)ᵀϕ2 𝟙{t<τi} + h(bi, t)ᵀϕ3 𝟙{t≥τi}. This model considers an interaction between the initiation of HT and the PSA dynamics by including different effects of the base PSA dynamics before and after HT. The assumption is that the effect of HT on the risk of recurrence depends on the shape of the PSA trajectory preceding the initiation of HT.
4. Wi(bi, τi, t)ᵀϕ = ϕ1 𝟙{t≥τi} + h(bi, t)ᵀϕ2 𝟙{t<τi} + h(bi, τi)ᵀϕ3 𝟙{t≥τi}. This model is a variant of model 3. As the extrapolated PSA dynamics after HT initiation no longer represent the actual PSA dynamics of the patient, the current extrapolated PSA values after HT are replaced by the PSA values at the time τi of HT initiation, when PSA measurements were censored.
5. Wi(bi, τi, t)ᵀϕ = 𝟙{t≥τi}(ϕ1 + α g(t − τi)) + h(bi, t)ᵀϕ2 𝟙{t<τi} + h(bi, τi)ᵀϕ3 𝟙{t≥τi}. This is a more flexible version of model 4, in which the baseline risk of recurrence after HT may change with time according to a function g(t − τi). We considered both g(t − τi) = log(t − τi) and g(t − τi) = t − τi.
In specifications 2 to 5, up to three variants of h(bi, t) were considered9:
- $h_a(b_i, t) = (Y_i^*(t), \partial Y_i^*(t)/\partial t)^\top$: the level and slope of PSA at time t are independent predictors of the time to clinical recurrence.
- $h_b(b_i, t)$: instead of the crude PSA level, a logistic transformation of the current level $Y_i^*(t)$ and the slope at time t are independent predictors of the time to clinical recurrence25.
- $h_c(b_i, t) = (b_{0i}, b_{1i}, b_{2i})^\top$: the individual deviations from the mean PSA dynamics, that is, the random effects, are independent predictors of the time to clinical recurrence. This variant was only considered with specification 2.
5.3 Estimation and goodness-of-fit of the joint models
Estimation of the joint models is summarized in Table 1 and parameter estimates that measure the effect of HT and the association between the PSA dynamics and the risk of clinical recurrence are shown in Table 2. Whatever the assumed nature of the dependence between the PSA dynamics and the time-to-clinical recurrence, the joint models provided a substantial gain in fit compared to model M1 which assumes independence between the two processes (minimum gain of 435.9 points of AIC for the joint model M2a).
Table 1.
Goodness-of-fit statistics of the different joint models.
| Model | Log-likelihood (L) | AIC | Number of parameters |
|---|---|---|---|
| 1 | −13549.4 | 27184.7 | 43 |
| 2.a | −13329.4 | 26748.8 | 45 |
| 2.b | −13222.9 | 26535.8 | 45 |
| 2.c | −13261.6 | 26615.1 | 46 |
| 3.a | −13266.8 | 26627.7 | 47 |
| 3.b | −13218.2 | 26530.4 | 47 |
| 4.a | −13265.5 | 26625.1 | 47 |
| 4.b | −13214.8 | 26523.5 | 47 |
| 5.1.a† | −13264.0 | 26624.0 | 48 |
| 5.1.b† | −13213.9 | 26523.7 | 48 |
| 5.2.a‡ | −13263.2 | 26622.4 | 48 |
| 5.2.b‡ | −13213.4 | 26522.7 | 48 |
† For this model, we assume a change in baseline risk after HT with the function g(t − τi) = t − τi, which corresponds to a Gompertz hazard function.
‡ For this model, we assume that g(t − τi) = log(t − τi), which corresponds to a Weibull hazard function.
Table 2.
Parameter estimates (standard errors, se) for the effect of HT and for the association between the PSA dynamics and the risk of clinical recurrence, adjusted for the prognostic factors.
| Model | HT: ϕ̂1 (se) | Level before HT: ϕ̂21 (se) | Slope before HT: ϕ̂22 (se) | Random effect b0i: ϕ̂21 (se) | Random effect b1i: ϕ̂22 (se) | Random effect b2i: ϕ̂23 (se) | Level after HT: ϕ̂31 (se) | Slope after HT: ϕ̂32 (se) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.16 (0.17) | | | | | | | |
| 2.a | −1.89 (0.25) | 0.13 (0.05) | 2.44 (0.18) | | | | | |
| 2.b | −1.39 (0.17) | 4.82 (0.39) | 1.10 (0.14) | | | | | |
| 2.c | −2.56 (0.22) | | | 0.92 (0.14) | −0.31 (0.06) | 3.70 (0.22) | | |
| 3.a | 1.33 (0.45) | 0.62 (0.06) | 1.56 (0.19) | | | | −0.05 (0.06) | 1.29 (0.26) |
| 3.b | 2.74 (1.28) | 4.77 (0.41) | 1.19 (0.16) | | | | 1.13 (1.39) | 0.95 (0.20) |
| 4.a | 1.20 (0.46) | 0.64 (0.06) | 1.50 (0.19) | | | | 0.15 (0.14) | 1.10 (0.25) |
| 4.b | 2.17 (1.01) | 4.77 (0.41) | 1.18 (0.16) | | | | 1.90 (1.13) | 0.94 (0.22) |
| 5.1.a | 1.33 (0.47) | 0.62 (0.06) | 1.55 (0.19) | | | | 0.14 (0.15) | 1.20 (0.26) |
| 5.1.b | 2.31 (1.01) | 4.77 (0.41) | 1.20 (0.16) | | | | 1.84 (1.14) | 0.96 (0.23) |
| 5.2.a | 1.35 (0.47) | 0.62 (0.06) | 1.56 (0.19) | | | | 0.12 (0.15) | 1.23 (0.25) |
| 5.2.b | 2.36 (1.01) | 4.71 (0.41) | 1.22 (0.16) | | | | 1.74 (1.13) | 1.00 (0.23) |
bold underlined: highly significant (p < 0.001); bold: significant (0.001 ≤ p ≤ 0.05); nonbold: not significant (p > 0.05).
Among the different joint models, considering a logistic transformation of the current level of PSA (models b) rather than the crude current level (models a) improved the fit. In previous work, a residual analysis9 had revealed a departure from the log-linearity assumption when the crude PSA level was included in the survival model, and the correction of this departure with the logistic transformation. This transformation, which makes the effect of the PSA level increase over the range 0 to 4 ng/ml and reach its maximum around 4 ng/ml, is particularly important in M2, where very high PSA levels can be extrapolated from the longitudinal model after initiation of HT (as illustrated in Figure 1), which may artificially increase the subsequent risk of recurrence.
Assuming that the effects of the crude current PSA level and of the current slope differed before and after HT in M3a greatly improved the fit (121.1 points of AIC) compared to M2a. In contrast, when the logistic transformation was used instead of the crude PSA value, assuming different effects of PSA dynamics before and after HT in M3b provided only a small gain in fit (5.4 points of AIC) compared to M2b. Indeed, after HT, most extrapolated PSA levels are very high, which pulls the estimate toward a smaller overall effect of the current PSA level. When separating the pre- and post-HT effects in M3a, the pre-HT crude effect (estimated from relatively standard PSA levels) was almost five times larger than the overall crude effect estimated in M2a (0.62 vs 0.13), and the post-HT effect was no longer significant. In contrast, when a transformation of the PSA level was used, the overall effect in M2b was similar to the pre-HT effect in M3b. The same pattern was observed for models M4 and M5 compared with models M2.
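The AIC differences quoted in this section can be re-derived from Table 1 with AIC = −2L + 2 × (number of parameters). The short script below is only an arithmetic check of the reported numbers; recomputing the AIC from the rounded log-likelihoods reproduces the reported values up to 0.1.

```python
# (log-likelihood, reported AIC, number of parameters), copied from Table 1
table1 = {
    "1": (-13549.4, 27184.7, 43),
    "2.a": (-13329.4, 26748.8, 45),
    "2.b": (-13222.9, 26535.8, 45),
    "3.a": (-13266.8, 26627.7, 47),
    "3.b": (-13218.2, 26530.4, 47),
}


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


# Recomputed AICs match the reported ones up to rounding of the log-likelihood
max_gap = max(abs(aic(ll, k) - reported) for ll, reported, k in table1.values())


def aic_gain(ref, other):
    """AIC decrease when moving from model `ref` to model `other`."""
    return round(table1[ref][1] - table1[other][1], 1)


print(aic_gain("1", "2.a"), aic_gain("2.a", "3.a"), aic_gain("2.b", "3.b"))  # 435.9 121.1 5.4
```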
Table 2 shows that only the slope of PSA was significantly predictive of the risk of recurrence after HT, with relatively stable estimates ranging from 0.94 to 1.29 in models M3 through M5. Neither the extrapolated current level in models M3 (p = 0.38 for M3a and p = 0.42 for M3b) nor the level reached at the time of initiation of HT in models M4 and M5 (p = 0.31 and p = 0.09 for M4a and M4b; p = 0.35 and p = 0.11 for M5.1a and M5.1b; p = 0.40 and p = 0.12 for M5.2a and M5.2b) was associated with the risk of recurrence post-HT after adjustment for the slope of PSA.
Assuming a dependence through the random effects (M2c) rather than through the PSA level and slope provided a fit between those of models M2a and M2b, even though the dependence was summarized by three parameters (all significant) instead of two. Finally, assuming a non-constant change in the baseline risk function after HT (models M5.1 and M5.2) did not substantially improve the fit. In summary, model M4b, which assumes an association with the transformed PSA level, separates the effects of PSA before and after HT, and uses the PSA characteristics at the time of HT after its initiation, provided the best fit of the data.
Regarding the specific effect of initiating HT, the interpretation differs between models. Model M2a aims at capturing the actual protective effect of HT after adjustment for the base PSA trajectory (ϕ̂1 = −1.89, p < 0.0001). However, as explained above, this model may suffer from the very high extrapolated PSA values after HT, so that M2b may be more appropriate to evaluate the effect of HT, with an estimate ϕ̂1 = −1.39 (p < 0.0001). This corresponds to a fourfold reduction in the risk of recurrence when initiating HT (hazard ratio exp(−1.39) ≈ 0.25), adjusted for the PSA characteristics.
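The hazard-ratio reading of ϕ̂1 in M2b can be checked directly. The 95% interval below is a Wald interval that we compute from the reported estimate and standard error (Table 2) under the usual asymptotic normality; it is not reported in the paper.

```python
import math

phi1_hat, se = -1.39, 0.17       # model M2b, Table 2
hr = math.exp(phi1_hat)
z = 1.959964                     # 97.5% quantile of the standard normal
lo, hi = math.exp(phi1_hat - z * se), math.exp(phi1_hat + z * se)
print(f"HR = {hr:.2f} (95% CI {lo:.2f}-{hi:.2f}); risk divided by {1 / hr:.1f}")
# prints: HR = 0.25 (95% CI 0.18-0.35); risk divided by 4.0
```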
In models M3 to M5, no single parameter represents the effect of HT; in particular, parameter ϕ1 no longer represents the effect of HT and should not be interpreted as such. Indeed, distinct effects of PSA dynamics before and after HT are modelled, so that (except for the standard prognostic factors) the model is stratified on the initiation of HT, and ϕ1 only represents a change in the baseline risk at HT initiation. This baseline risk appears substantially increased (e.g. ϕ̂1 = 1.33 in M3a and ϕ̂1 = 2.74 in M3b), but this has to be balanced against the different effects of PSA level and slope before and after HT, the PSA level being highly significant before initiation of HT and no longer significant after.
5.4 Predictive accuracy of the joint models
To compare predictive accuracy, we focused on 6 joint models: the model assuming independence between the PSA dynamics and the risk of clinical recurrence (M1), the standard joint models in PSA studies (M2a and M2b), the joint models in which the extrapolated current PSA level and slope after HT are replaced by the PSA level and slope at HT initiation (M4a and M4b), and the model with a dependence directly on the random effects (M2c). Predictive accuracy was evaluated on the estimation data using the approximated cross-validated estimates. We assessed the ability of the joint models to predict the risk of clinical recurrence within a window of 3 years (𝒯 = 3), a clinically reasonable window. For all measures, lower values indicate better predictive accuracy.
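The Brier score for a window of 𝒯 years compares each subject's observed event indicator in (s, s + 𝒯] with the predicted probability of the event in that window. The estimator used in the paper (section 4.3) also handles censoring and applies an approximated cross-validation correction; the Python sketch below omits both and assumes complete follow-up through s + 𝒯, so it only illustrates the core quantity and why lower values are better. The function name and data are hypothetical.

```python
def window_brier_score(event_times, predicted_risks, s, horizon=3.0):
    """Naive Brier score for predicting an event in (s, s + horizon].

    Assumes every subject is event-free at time s and followed without
    censoring through s + horizon; no censoring weights, no cross-validation.
    """
    if len(event_times) != len(predicted_risks) or not event_times:
        raise ValueError("need equally long, non-empty inputs")
    total = 0.0
    for t, p in zip(event_times, predicted_risks):
        observed = 1.0 if s < t <= s + horizon else 0.0
        total += (observed - p) ** 2  # squared error of the predicted probability
    return total / len(event_times)


# Hypothetical event times (years) and predicted 3-year risks at s = 1.
print(window_brier_score([2.0, 5.5, 10.0], [0.8, 0.3, 0.1], s=1.0))
```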
5.4.1 Average predictive accuracy after immediate initiation of ST
Among men who initiated HT during follow-up, the average POL and BS defined in section 4.3 are shown in Figures 2(a) and 2(c). The differences between pairs of models and their 95% CIs are shown in Figures 2(b) and 2(d).
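A difference in average loss between two models, with its 95% CI, can be illustrated with per-subject loss contributions. The Python sketch below uses a simple paired normal approximation and assumes independent per-subject contributions; the paper's CIs rely on its own estimators (the approximated cross-validation framework16 provides an asymptotic distribution) and may be computed differently. The function and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_difference_ci(loss_a, loss_b, z=1.96):
    """Mean of per-subject loss differences (model A minus model B) with a
    normal-approximation confidence interval. Negative values favour model A."""
    if len(loss_a) != len(loss_b) or len(loss_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 subjects")
    diffs = [a - b for a, b in zip(loss_a, loss_b)]
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return d, d - z * se, d + z * se


# Hypothetical per-subject squared errors for two models (illustration only).
d, lo, hi = paired_difference_ci([0.10, 0.20, 0.15, 0.05], [0.20, 0.25, 0.20, 0.10])
print(f"difference = {d:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

As in the figure captions, a negative difference whose interval excludes 0 indicates that the first model predicts significantly better.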
Figure 2.
Predictive accuracy measures after an immediate initiation of ST, averaged over the times of ST initiation, for 6 joint models: (a) POL estimate, (b) difference in POL and 95% CI, (c) BS estimate, (d) difference in BS and 95% CI. Negative (respectively positive) differences indicate that the first model has a better (respectively worse) predictive ability.
First, BS and POL mostly agreed, although a few differences were observed among the three or four most predictive models.
Whatever the nature of the dependence between the PSA dynamics and the risk of recurrence, the predictive accuracy of the joint models was significantly better than that of model M1, which assumes independence between PSA dynamics and risk of recurrence.
In accordance with the goodness-of-fit measures, considering different effects before and after HT improved predictive accuracy much more when the crude PSA level was used (model M4a compared to M2a) than when a transformed PSA level was used (model M4b compared to M2b). The latter comparison is the only one in which POL and BS disagreed: BS concluded that the predictive ability of M4b was significantly better than that of M2b, while POL found no difference.
Among models M2, using a transformation of the current PSA level rather than the crude PSA level significantly improved predictive accuracy (M2b compared to M2a), while among models M4 it did not induce any significant difference between M4b and M4a for either measure. Finally, assuming a dependence on the random effects (M2c) rather than on the transformed PSA level and slope (M2b) did not substantially alter the ability to predict the risk of recurrence.
In summary, BS tended to favor model M4b while POL tended to slightly favor model M2b. As the difference in POL between M4b and M2b was not significant, we chose M4b as the final best model to predict clinical recurrence after immediate initiation of HT.
5.4.2 Predictive accuracy in the absence of ST
To evaluate the predictive accuracy of the joint models in the absence of HT initiation in the next 3 years, predictive accuracy measures were computed at different times of prediction s (from 1 to 6 years after the end of EBRT) among men who did not initiate any HT in the window [s, s + 3] years. The approximated cross-validation estimates of POL, BS and IBS are displayed as curves in Figures 3(a), 3(c) and 3(e). The corresponding differences between pairs of models and their 95% confidence bands, computed at the same times of prediction, are shown in Figures 3(b), 3(d) and 3(f).
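The integrated Brier score (IBS) summarizes the Brier score over all horizons up to 𝒯 rather than at the single horizon 𝒯. Its exact definition is given in section 4.3; the Python sketch below assumes the usual form (1/𝒯)∫₀^𝒯 BS(s, u) du, in the spirit of Graf et al.23, and approximates the integral with the trapezoidal rule from Brier scores already computed on a grid of horizons. The function name and values are hypothetical.

```python
def integrated_brier_score(horizons, brier_values):
    """Trapezoidal approximation of (1 / T) * integral_0^T BS(u) du,
    where horizons run from 0 to T in increasing order."""
    if len(horizons) != len(brier_values) or len(horizons) < 2:
        raise ValueError("need at least two matching (horizon, Brier score) pairs")
    area = 0.0
    for i in range(1, len(horizons)):
        width = horizons[i] - horizons[i - 1]
        area += width * (brier_values[i] + brier_values[i - 1]) / 2.0
    return area / (horizons[-1] - horizons[0])  # average over the window


# Hypothetical Brier scores at horizons 0, 1, 2 and 3 years after time s.
print(integrated_brier_score([0.0, 1.0, 2.0, 3.0], [0.0, 0.05, 0.08, 0.10]))
```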
Figure 3.
Predictive accuracy measures in the absence of ST for 6 joint models at times of prediction from 1 to 6 years after EBRT: (a) EPOCE estimate, (b) difference in EPOCE and 95% CI, (c) BS estimate, (d) difference in BS and 95% CI, (e) IBS estimate and (f) difference in IBS and 95% CI. Negative (positive) differences indicate that the first model has a better (worse) predictive ability.
Whatever the predictive accuracy measure and the nature of the dependence between the PSA dynamics and the risk of recurrence, the joint models generally provided significantly better predictive accuracy than model M1, which assumes independence between the two processes (with the surprising exception of M4b in the first years according to BS and IBS).
Whatever the predictive accuracy measure, models M4 and M2 had similar predictive performances (differences not shown) in the absence of HT. Indeed, the overall estimates in M2 are mostly driven by the high proportion of subjects who did not initiate HT.
Whatever the measure, using a logistic transformation of the PSA level (M2b) instead of the crude PSA level (M2a) in models M2 did not noticeably improve short-term predictive performance for s ∈ [1, 4]. This was expected, as the transformation of the PSA level is intended mainly to correct very high extrapolated PSA values, which are not observed among subjects who will not initiate any HT. In the long term (s ≥ 4 years), the joint model with the crude PSA level (M2a) even provided significantly better predictive accuracy. This contrasts with the conclusions in terms of goodness of fit or after HT initiation, where specification b was systematically better.
When considering a dependence on the random effects (M2c) rather than on the crude PSA level and slope (M2a), conclusions based on BS, IBS and POL differed: with POL, M2c was substantially better than M2a at times of prediction greater than 1.5 years; with BS and IBS, M2c was also better, but only at shorter times of prediction. At longer times of prediction, model M2a was even slightly better according to BS and IBS.
Although the results depended substantially on the type of measure, the joint model with a dependence directly through the random effects (M2c) provided a good alternative to the more standard joint model (M2a) among men who did not undergo any HT. These two models, the most predictive in the absence of HT, are also those with the largest effects of the long-term PSA slope. This was previously observed by Sène et al.9: among patients who did not initiate any HT, the joint models with the largest effects of the slope of log PSA were also those with the best predictive ability, suggesting that after a few years the slope of log PSA may be the major predictor of the risk of recurrence in the absence of HT.
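The exact form of the logistic transformation of PSA is defined in an earlier section of the paper. The Python sketch below uses a generic logistic function with hypothetical center and scale parameters, only to show why bounding the marker limits the influence of extreme extrapolations: once the linear predictor uses g(level) instead of the level itself, an extreme extrapolated value (50) contributes almost the same as a moderate one (5), whereas on the crude scale it contributes ten times more.

```python
import math


def logistic_transform(level: float, center: float = 0.0, scale: float = 1.0) -> float:
    """Hypothetical logistic transform of a (possibly extrapolated) PSA level.

    Maps any real value into [0, 1]; center and scale are illustrative
    parameters, not the ones used in the paper.
    """
    z = (level - center) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


# A moderate and two increasingly extreme levels on the transformed scale.
for level in (2.0, 5.0, 50.0):
    print(level, round(logistic_transform(level), 4))
```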
In summary, while the best model to predict the risk of clinical recurrence assuming an immediate initiation of HT was M4b, the best model we chose to predict the risk of recurrence assuming the patient will not initiate any HT within 3 years was M2c.
5.4.3 Example of differential dynamic prediction of prostate cancer recurrence
We provide here an illustrative example of how these differential dynamic predictions can be used in practice. We consider a subject who had a T-stage of 2, a Gleason score of 6, an initial PSA of 12.7 ng/ml and a corrected radiation dose of 65.7 Gy, and who recurred 2.7 years after the end of EBRT. After each PSA measurement, we computed his individual predicted probability of clinical recurrence in the next 3 years under the two extreme, validated scenarios: immediate HT initiation (probabilities computed from model M4b) and no HT initiation in the next 3 years (probabilities computed from model M2c). We also considered two intermediate scenarios in which the patient initiates HT after 1 and 2 years, respectively (probabilities computed from model M4b). Although we focused on the first two scenarios for validation purposes, in practice any clinically relevant scenario, such as a delayed initiation of treatment, could be investigated.
Figure 4 shows, at 4 times of prediction, the observed PSA history (left side of the figure) and the corresponding individual predictions of clinical recurrence in the next 3 years under each of the four scenarios (right side of the figure).
Figure 4.
Observed PSA history (denoted by × on the left) and individual predicted probabilities of clinical recurrence within 3 years according to four scenarios of treatment (on the right). The four scenarios are: immediate initiation of HT, initiation in 1 year, in 2 years or no initiation of HT in the next 3 years. After each new PSA measurement, the distribution of the prediction is approximated by a 2000-draw Monte Carlo method (solid black circle and solid grey triangle indicate the median and the intervals indicate the 95% bands)
This example illustrates that initiating HT early would have substantially reduced the probability of recurrence for this patient. For example, at the 1.6-year visit, the man has a 25% probability of clinical recurrence in the next 3 years, which would fall to 5% if he initiated hormonal therapy immediately. Moreover, the intermediate scenarios show that his probability of recurrence increases as HT initiation is delayed, reaching its largest value when HT is not initiated within the window.
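The 95% bands in Figure 4 come from a 2000-draw Monte Carlo approximation. The Python sketch below shows only the mechanics under strong simplifying assumptions: a hypothetical Weibull survival function with one uncertain parameter stands in for the joint model (whose predictions integrate over the random effects given the PSA history), and all parameter values are illustrative, not estimates from the data. For each draw it computes the standard conditional risk 1 − S(s + 𝒯)/S(s), then summarizes the draws by their median and 2.5% and 97.5% percentiles.

```python
import math
import random
import statistics


def conditional_risk(s: float, window: float, shape: float, scale: float) -> float:
    """P(s < T <= s + window | T > s) = 1 - S(s + window) / S(s)
    for a Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    cum_haz_s = (s / scale) ** shape
    cum_haz_end = ((s + window) / scale) ** shape
    return 1.0 - math.exp(-(cum_haz_end - cum_haz_s))


def monte_carlo_band(s, window, shape, log_scale_hat, log_scale_sd, n_draws=2000, seed=1):
    """Median and 95% band of the predicted risk when the (hypothetical)
    log-scale parameter is drawn from a normal approximation."""
    rng = random.Random(seed)
    draws = [
        conditional_risk(s, window, shape, math.exp(rng.gauss(log_scale_hat, log_scale_sd)))
        for _ in range(n_draws)
    ]
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(draws), cuts[0], cuts[-1]


# Hypothetical parameters for two treatment scenarios at a prediction time of 1.6 years.
for label, log_scale in (("no HT", math.log(8.0)), ("immediate HT", math.log(30.0))):
    med, lo, hi = monte_carlo_band(1.6, 3.0, 1.2, log_scale, 0.2)
    print(f"{label}: median {med:.2f}, 95% band ({lo:.2f}, {hi:.2f})")
```

A larger Weibull scale stands in for the lower hazard expected under immediate HT, so its band sits lower, mirroring the qualitative pattern in Figure 4.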
6 Discussion
Using the joint model methodology, we provided individualized dynamic prognostic tools that depend on hypothetical clinical decisions, namely the initiation of new treatments. We focused mainly on two scenarios: the prediction of the event assuming no change in treatment, and the prediction of the event assuming the immediate initiation of a new (second) treatment. Indeed, deciding whether to initiate a second treatment has become central in the individual monitoring of chronic diseases such as cancers. While developing the joint model requires some subjects with at least three repeated measures, the derived individual predictions can be computed as soon as one measure is available, although in practice more information may be necessary to provide precise predictions13.
Although promising for clinical practice, such differential dynamic predictive tools had not previously been developed or validated in the literature. A web calculator associated with a publication13 does include dynamic predictions under two scenarios, but these were neither described nor validated in that publication. Until now, dynamic predictive tools in the literature only predicted the risk of event from a biomarker value8 or a biomarker trajectory7,13, assuming no change in treatment or patient characteristics that might affect the subsequent risk of event. Indeed, developing and validating dynamic prognostic tools that can be conditioned on scenarios of treatment initiation is challenging.
First, it requires a very precise specification of the dependence between the biomarker dynamics, the treatment initiation and the risk of the event. This was accomplished here using a series of sophisticated shared random-effect joint models. However, other approaches such as joint latent class models or landmark analyses7 could also be considered.
Second, the predictive performance has to be validated specifically for each scenario, as it may be unrealistic to expect the same model to provide the best predictions in different situations. This required the development of integrated measures in the “immediate initiation of second treatment” scenario to focus on predictive performance following initiation. In the application, even though all the joint models had relatively good predictive accuracy in both situations, the best predictive tool differed between the two scenarios. This illustrates that prognostic tools should be validated strictly for what they are meant to quantify in practice.
Third, even in the absence of HT, validating the prognostic tool is not straightforward. We chose to focus here on patients who did not initiate any HT in the window of prediction. However, these patients may not be a representative sample of the patients free of HT at the time of prediction. Taylor et al.13 proposed instead to validate the prognostic tools on the sample of subjects free of HT at the time of prediction, treating all HT initiations during the window of prediction either as recurrences or as censoring. As shown in the Web supplementary materials, the relative performance of the models did not change when using this technique.
Fourth, due to the complexity of the models and the differential validation procedure, we preferred using the whole sample to a data-splitting approach, so that estimation and validation of the predictive tools were done on the same data. This motivated the development of a new estimator of the Brier score by approximated leave-one-out cross-validation16, which is valid and easy to compute on the estimation data.
Predictive performance was assessed using two different measures that do not capture predictive accuracy in the same way. The Brier score directly measures the mean squared error between the event process and the model prediction, while the EPOCE assesses the prognostic value of the joint models by measuring the distance between the conditional density of the time to event assumed by the model and the true one. This may explain the differences between the conclusions given by the two measures in the application. Nonetheless, we selected the best models by balancing the results of these two measures. Moreover, the models providing the best goodness of fit did not necessarily have the best predictive ability, which illustrates the difference between these two types of assessment. While goodness-of-fit measures use all the information, predictive accuracy measures focus on only part of the sample and use only the biomarker history up to the time of prediction. This is why, when dynamic predictions are of interest, the predictive ability of the models should be assessed9.
AUC-derived measures26 were not considered here for assessing the predictive ability of the joint models. First, they focus on discrimination, while our focus was on predictiveness, since we wanted to quantify individual probabilities of recurrence. Second, their use in dynamic settings has been rather limited8,27. Third, deriving an approximated cross-validation estimate was not straightforward.
Finally, in prostate cancer, the initiation of a second treatment, namely hormonal therapy, has raised many questions about how to account for it in the model for the risk of clinical recurrence, and how to evaluate its causal effect in the presence of indication bias. In the present paper, our focus was only on dynamic individual predictions. We therefore chose to compare descriptive joint models that treated HT intuitively as a time-dependent covariate, possibly in interaction with other characteristics. However, more causal or mechanistic models could also be investigated using the same strategy. Alternatively, the time to HT initiation could be treated as a censored time to event along with clinical recurrence, by defining a multistate model or a multivariate survival model jointly with the longitudinal biomarker model.
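Why the two measures can rank models differently can be seen on a simplified, binary-outcome analogue. EPOCE is defined on the predictive density of the time to event, not on an event indicator, so the Python sketch below only uses the Bernoulli log loss as a stand-in for a log-likelihood-based criterion next to the squared-error (Brier) loss; the outcomes and predictions are hypothetical.

```python
import math
from statistics import mean


def brier_loss(y: int, p: float) -> float:
    """Squared error between the observed indicator and the predicted probability."""
    return (y - p) ** 2


def log_loss(y: int, p: float, eps: float = 1e-12) -> float:
    """Negative log-likelihood of a Bernoulli outcome; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Hypothetical outcomes: 9 subjects without the event in the window, 1 with it.
y = [0] * 9 + [1]
model_a = [0.02] * 9 + [0.001]  # sharp, but nearly rules out the observed event
model_b = [0.30] * 9 + [0.40]   # less sharp, never overconfident

for name, preds in (("A", model_a), ("B", model_b)):
    bs = mean(brier_loss(o, p) for o, p in zip(y, preds))
    ll = mean(log_loss(o, p) for o, p in zip(y, preds))
    print(f"model {name}: Brier = {bs:.3f}, log loss = {ll:.3f}")
```

Here the Brier score prefers model A while the log loss prefers model B: the Brier penalty for a single miss is bounded by 1, whereas the log penalty grows without bound as the probability given to the observed outcome approaches 0.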
Acknowledgments
Funding: This work was supported by the French National Institute of Cancer INCa [grant PREDYC number 2010-059] and by the US National Cancer Institute [grants CA110518, U10-CA21661, U10-CA37422, and U10-CA180822].
Footnotes
Declaration of Conflicting Interests: none.
References
1. Faucett CL, Thomas DC. Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine. 1996 Aug;15(15):1663–85. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1.
2. Wulfsohn MS, Tsiatis AA. A joint model of survival and longitudinal data measured with error. Biometrics. 1997 Mar;53:330–339.
3. Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000 Dec;1(4):465–80. doi: 10.1093/biostatistics/1.4.465.
4. Lin H, Turnbull BW, McCulloch CE, Slate EH. Latent class models for joint analysis of longitudinal biomarker and event process data: application to longitudinal prostate-specific antigen readings and prostate cancer. Journal of the American Statistical Association. 2002 Mar;97(457):53–65.
5. Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14:809–834.
6. Rizopoulos D. Joint models for longitudinal and time-to-event data: with applications in R. 2012.
7. Proust-Lima C, Taylor JMG. Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics. 2009 Jul;10(3):535–49. doi: 10.1093/biostatistics/kxp009.
8. Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics. 2011 Sep;67(3):819–29. doi: 10.1111/j.1541-0420.2010.01546.x.
9. Sène M, Bellera CA, Proust-Lima C. Shared random-effect models for the joint analysis of longitudinal and time-to-event data: application to the prediction of prostate cancer recurrence. Journal de la Société Française de Statistique. In press.
10. Proust-Lima C, Sène M, Taylor JMG, Jacqmin-Gadda H. Joint latent class models for longitudinal and time-to-event data: a review. Statistical Methods in Medical Research. 2012 Apr. doi: 10.1177/0962280212445839.
11. Welsh SJ, Powis G. Personalized cancer medicine. Berlin Heidelberg: Springer-Verlag; 2009.
12. Yu M, Taylor JMG, Sandler HM. Individual prediction in prostate cancer studies using a joint longitudinal survival-cure model. Journal of the American Statistical Association. 2008 Mar;103(481):178–187.
13. Taylor JMG, Park Y, Ankerst DP, Proust-Lima C, Williams S, Kestin L, Bae K, Pickles T, Sandler H. Real-time individual predictions of prostate cancer recurrence using joint models. Biometrics. 2013;69(1):206–213. doi: 10.1111/j.1541-0420.2012.01823.x.
14. Kennedy EH, Taylor JMG, Schaubel DE, Williams S. The effect of salvage therapy on survival in a longitudinal study with treatment by indication. Statistics in Medicine. 2010 Nov;29(25):2569–80. doi: 10.1002/sim.4017.
15. Schoop R, Schumacher M, Graf E. Measures of prediction error for survival data with longitudinal covariates. Biometrical Journal. 2011 Mar;53(2):275–93. doi: 10.1002/bimj.201000145.
16. Commenges D, Proust-Lima C, Samieri C, Liquet B. A universal approximate cross-validation criterion and its asymptotic distribution. arXiv:1206.1753 [math.ST]. Submitted.
17. Commenges D, Liquet B, Proust-Lima C. Choice of prognostic estimators in joint models by estimating differences of expected conditional Kullback-Leibler risks. Biometrics. 2012 Jun;68(2):380–7. doi: 10.1111/j.1541-0420.2012.01753.x.
18. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
19. Rizopoulos D. JM: an R package for the joint modelling of longitudinal and time-to-event data. Journal of Statistical Software. 2010;35(9):1–33.
20. Gerds TA, Schumacher M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal. 2006 Dec;48(6):1029–1040. doi: 10.1002/bimj.200610301.
21. Gerds TA, Schumacher M. Efron-type measures of prediction error for survival analysis. Biometrics. 2007 Dec;63(4):1283–7. doi: 10.1111/j.1541-0420.2007.00832.x.
22. Henderson R, Diggle P, Dobson A. Identification and efficacy of longitudinal markers for survival. Biostatistics. 2002 Mar;3(1):33–50. doi: 10.1093/biostatistics/3.1.33.
23. Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine. 1999;18(17–18):2529–45. doi: 10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.
24. Proust-Lima C, Taylor JMG, Sécher S, Sandler H, Kestin L, Pickles T, Bae K, Allison R, Williams S. Confirmation of a low α/β ratio for prostate cancer treated by external beam radiation therapy alone using a post-treatment repeated-measures model for PSA dynamics. International Journal of Radiation Oncology, Biology, Physics. 2011 Jan;79(1):195–201. doi: 10.1016/j.ijrobp.2009.10.008.
25. Proust-Lima C, Taylor JMG, Scott W, Ankerst D, Liu N, Kestin L, KB, Howard S. Determinants of change in prostate-specific antigen over time and its association with recurrence after external beam radiation therapy for prostate cancer in five large cohorts. International Journal of Radiation Oncology, Biology, Physics. 2008 Aug;72(3):782–791. doi: 10.1016/j.ijrobp.2008.01.056.
26. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x.
27. Zheng Y, Heagerty PJ. Prospective accuracy for longitudinal markers. Biometrics. 2007;63(2):332–341. doi: 10.1111/j.1541-0420.2006.00726.x.