Stat Med. Author manuscript; available in PMC: 2018 Jun 11.
Published in final edited form as: Stat Med. 2015 Jan 29;34(10):1733–1746. doi: 10.1002/sim.6445

Evaluation of treatment efficacy using a Bayesian mixture piecewise linear model of longitudinal biomarkers

Lili Zhao a,*, Dai Feng b, Brian Neelon c, Marc Buyse d,e
PMCID: PMC5995342  NIHMSID: NIHMS972143  PMID: 25630845

Abstract

Prostate-specific antigen (PSA) is a widely used marker in clinical trials for patients with prostate cancer. We develop a mixture model to estimate the longitudinal PSA trajectory in response to treatment. The model accommodates subjects responding and not responding to therapy through a mixture of two functions. A responder is described by a piecewise linear function, represented by an intercept, a PSA decline rate, a period of PSA decline and a PSA rising rate; a non-responder is described by an increasing linear function with an intercept and a PSA rising rate. Each trajectory is classified as a linear or a piecewise linear function with a certain probability, and the weighted average of these two functions can characterize a wide variety of PSA trajectory patterns. Furthermore, this mixture structure enables us to derive clinically useful endpoints such as a response rate and time-to-progression, as well as biologically meaningful endpoints such as a cancer cell killing fraction and tumor growth delay. We compare our model with the most commonly used dynamic model in the literature and show its advantages. Finally, we illustrate our approach using data from two multi-center prostate cancer trials. The R code used to produce the analyses reported in this paper is available on request.

Keywords: Mixture model, Changepoint, Bayesian hierarchical model, Longitudinal data, PSA, Tumor growth profile

1. Introduction

Prostate-specific antigen (PSA), a serine protease normally produced in the prostate, is a widely used marker in clinical trials for patients with prostate cancer. PSA levels typically become elevated when the prostatic epithelium undergoes malignant transformation, and in clinical trials the longitudinal trajectory of PSA is often non-linear. PSA responses are conventionally categorized as complete response, partial response, no change, or progressive disease [1]. For instance, a partial response is declared if the PSA level decreases by at least 50% from baseline and remains below 50% of the baseline level for at least 28 days. However, using these categorized endpoints often leads to biased estimates [2] and underpowered comparisons [3–5].

In this paper, we propose using continuous longitudinal PSA data to evaluate treatment effects. In a Phase III multi-center prostate cancer trial [1] that motivated this work, we observed that, in many cases, PSA declines immediately after therapy and then rises as the cancer recurs; the initial decline indicates that the subject responded to therapy. There is an extensive literature on fitting such non-linear PSA trajectories using a piecewise linear (PWL) model with a random changepoint [6–9]. Similar PWL models have been applied in studies of HIV [10], cognitive function [11], AIDS [12] and xenograft experiments [13]. However, not all patients have a decline in PSA following therapy. For these patients, PSA increases until disease progression, a sign that they did not respond to therapy. It is therefore important for a model to accommodate the trajectories of both responders and non-responders. The most common mixture model in the literature combines a 1-parameter intercept-only function with a 3-parameter PWL function [10, 14–16]. These two functions are probably adequate for modelling longitudinal trajectories in their original contexts; for example, a flat trajectory is expected for controls in the ovarian cancer screening study [14]. However, they are not appropriate for modelling longitudinal efficacy biomarkers following treatment.

To accommodate both responders and non-responders, we consider a model with an additional slope parameter in both the PWL and linear functions. This additional parameter allows an initial decline in PSA among responders or a linear increase among non-responders. For a responder, the PSA trajectory is described by a piecewise linear function, represented by an intercept, a PSA decline rate, a period of PSA decline and a PSA rising rate. For a non-responder, the trajectory is described by an increasing linear function with an intercept and a PSA rising rate. Using Bayesian inference, each subject's trajectory is classified as a linear or a PWL function with a certain probability, which estimates how likely it is that the subject responds to therapy. Furthermore, the additional slope parameter offers greater flexibility for modelling a variety of non-linear PSA trajectories, as illustrated in Figure 1. This figure depicts hypothetical PSA trajectories simulated from our mixture model with the mixture probability ranging from 0 to 1 in steps of 0.2. When the mixture probability is 0, the trajectory is an increasing linear function of time; when it is 1, the trajectory is a PWL function with a decline phase. Mixture probabilities between 0 and 1 produce intermediate patterns of PSA trajectories.

Figure 1.


Hypothetical non-linear PSA trajectories following treatment. Each curve is an average of 1000 curves simulated from the mixture model with mixture probability p ranging from 0 to 1 in steps of 0.2, a fixed intercept of 7, and fixed normal distributions for the random changepoint and the slopes before and after the changepoint.

In addition to providing a sufficient fit to non-linear trajectories, this mixture structure also enables us to derive clinically useful endpoints such as a response rate and time-to-progression, as well as biologically meaningful endpoints, such as a cancer cell killing fraction and tumor growth delay.

The remainder of this article is organized as follows. In Section 2, we develop a Bayesian hierarchical mixture model for longitudinal data, derive clinically and biologically meaningful endpoints, and address non-constant error variance. In Section 3, we present simulation studies and compare our mixture model with a commonly used dynamic model. In Section 4, we analyze data from the prostate cancer trials, and in Section 5 we conclude with a brief discussion.

2. Mixture Model

Let $y_{ij}$ be the log2-transformed PSA measurement of subject $i$ ($i = 1, \ldots, n$) at time $t_{ij}$ ($j = 1, \ldots, n_i$). The log transformation is commonly used to obtain linear profiles before and after the nadir (the lowest PSA measurement), and base-2 logarithms relate directly to PSA doubling times [8]. The observed PSA data are measured with error and can be described by $y_{ij} = \mu_{z_i,ij} + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma^2)$. An unobserved indicator $z_i$ is included in the mean function to distinguish responders from non-responders. If $z_i = 1$, subject $i$ is a responder, and the mean of the log2 PSA profile is described by a piecewise linear (PWL) function:

$$\mu_{1,ij} = a_i + b_{1i}\, t_{ij}\, I(t_{ij} \le \tau_i) + b_{1i}\, \tau_i\, I(t_{ij} > \tau_i) + b_{2i}\, (t_{ij} - \tau_i)_+ \qquad (1)$$

where $x_+ = \max(x, 0)$. Here, $a_i$ is the log2 PSA value at baseline, $\tau_i$ is the unobserved changepoint (the period of PSA decline), $b_{1i}$ is the rate of PSA decline, and $b_{2i}$ is the rate of PSA rise. If $z_i = 0$, subject $i$ is a non-responder, and the mean function reduces to a linear function:

$$\mu_{0,ij} = a_i + b_{2i}\, t_{ij} \qquad (2)$$

Subject $i$ thus has four parameters $(a_i, b_{1i}, \tau_i, b_{2i})$ when $z_i = 1$ and only two parameters $(a_i, b_{2i})$ when $z_i = 0$.
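The two mean functions, and the averaging behind Figure 1, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the random-effect distributions inside `average_trajectory` ($b_{1i} \sim N(-0.8, 0.16^2)$ restricted to negative values, $\tau_i \sim N(3, 1.5^2)$ restricted to positive values, $b_{2i} \sim N(0.4, 0.1^2)$) are hypothetical choices borrowed from Arm I of the simulation study in Section 3.1, because the Figure 1 caption does not state the values used. All function names are illustrative.

```python
import numpy as np


def mean_pwl(t, a, b1, tau, b2):
    """Responder mean, equation (1): slope b1 up to tau, then slope b2."""
    t = np.asarray(t, dtype=float)
    # b1*t*I(t <= tau) + b1*tau*I(t > tau) is the same as b1*min(t, tau)
    return a + b1 * np.minimum(t, tau) + b2 * np.maximum(t - tau, 0.0)


def mean_linear(t, a, b2):
    """Non-responder mean, equation (2): a straight line with slope b2."""
    return a + b2 * np.asarray(t, dtype=float)


def positive_normal(rng, mean, sd, size):
    """N(mean, sd^2) restricted to (0, inf), drawn by simple rejection."""
    x = rng.normal(mean, sd, size)
    bad = x <= 0
    while bad.any():
        x[bad] = rng.normal(mean, sd, bad.sum())
        bad = x <= 0
    return x


def average_trajectory(t, p, n_curves=1000, a=7.0, seed=0):
    """Average of n_curves simulated mixture trajectories (cf. Figure 1).

    Hypothetical random-effect distributions (Arm I values of Section 3.1).
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)[None, :]
    z = rng.random(n_curves) < p                     # z_i ~ Bern(p)
    b1 = -positive_normal(rng, 0.8, 0.16, n_curves)  # decline rate, b1 < 0
    tau = positive_normal(rng, 3.0, 1.5, n_curves)   # changepoint, tau > 0
    b2 = rng.normal(0.4, 0.1, n_curves)              # rising rate
    pwl = mean_pwl(t, a, b1[:, None], tau[:, None], b2[:, None])
    lin = mean_linear(t, a, b2[:, None])
    return np.where(z[:, None], pwl, lin).mean(axis=0)
```

Evaluating `average_trajectory(np.linspace(0, 12, 25), p)` for p = 0, 0.2, ..., 1 gives six averaged curves of the kind shown in Figure 1. Writing the first two terms of (1) as `b1 * np.minimum(t, tau)` avoids coding the indicator functions explicitly.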

2.1. Hierarchical Structure

Both mean functions (1) and (2) include the subject-specific intercept $a_i$, which represents the (log2-transformed) baseline value and has a distribution common to all treatment groups. Hence we borrow strength across all subjects in all treatment groups to estimate this parameter. Specifically, for $i = 1, \ldots, n$, we assume

$$a_i \mid \alpha, \sigma_a^2 \sim N(\alpha, \sigma_a^2)$$

where $\alpha$ and $\sigma_a^2$ are population-level parameters. Pooling information across all subjects and treatment groups improves the precision of these estimates.

The log2 PSA decline rate $b_{1i}$ and the period of decline $\tau_i$ are specific to $z_i = 1$, and they are likely to be affected by treatment. Therefore, we pool information only across subjects with $z_i = 1$ in the same treatment group when estimating these parameters. Let $g_i \in \{1, \ldots, K\}$ denote the treatment received by subject $i$, out of $K$ treatments. We assume

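$$b_{1i} \mid \beta_{1g_i}, \sigma^2_{b_1 g_i}, z_i = 1 \sim N(\beta_{1g_i}, \sigma^2_{b_1 g_i})\, I(b_{1i} < 0) \qquad (3)$$
$$\tau_i \mid \mu_{\tau g_i}, \sigma^2_{\tau g_i}, z_i = 1 \sim N(\mu_{\tau g_i}, \sigma^2_{\tau g_i})\, I(\tau_i > 0) \qquad (4)$$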

where $\beta_{1k}$, $\mu_{\tau k}$, $\sigma^2_{b_1 k}$ and $\sigma^2_{\tau k}$ ($k = 1, \ldots, K$) are treatment-level parameters. The restrictions $b_{1i} < 0$ and $\tau_i > 0$ ensure that the log2 PSA profile of a responder consists of an initial PSA decline followed by a rebound.

The parameter $b_{2i}$ appears in both mean functions: it represents the log2 PSA growth rate when $z_i = 0$ and the regrowth rate when $z_i = 1$. To allow comparisons between treatment arms, we assume

$$b_{2i} \mid \beta_{2g_i}, \sigma^2_{b_2 g_i} \sim N(\beta_{2g_i}, \sigma^2_{b_2 g_i}) \qquad (5)$$

Finally, the indicators $z_i$ ($i = 1, \ldots, n$) are assumed to follow a Bernoulli distribution,

$$z_i \sim \mathrm{Bern}(P_{g_i}) \qquad (6)$$

where $P_{g_i}$ is the response rate in treatment $g_i$, so that the number of responders among the $n_{g_i}$ subjects in treatment $g_i$ follows $\mathrm{Bin}(n_{g_i}, P_{g_i})$.

The hierarchical structure in (3)–(6) allows direct comparisons between treatment groups through the treatment-level parameters $P_k$, $\beta_{1k}$, $\mu_{\tau k}$ and $\beta_{2k}$.

2.2. Posterior Computation

The Markov chain Monte Carlo (MCMC) procedure for estimating the posterior distributions repeatedly draws samples from the full conditional distributions of the parameters. With normal priors on the mean parameters and gamma priors on the precision parameters, the full conditional distributions of all parameters except $\tau_i$ and $z_i$ are available in closed form by conjugacy. The changepoint $\tau_i$ is sampled with the slice sampling algorithm of Neal [17]. Updating the class indicator $z_i$ requires an algorithm that can move between parameter spaces of different dimensions, that is, between a space with two parameters $(a_i, b_{2i})$ and a space with four parameters $(a_i, b_{1i}, \tau_i, b_{2i})$. We apply both the pseudo-prior method of Carlin et al. [18] and the reversible jump procedure of Green [19], which allow such moves. Details of the calculation are given in Appendix A. With the prior restrictions in (3) and (4), these algorithms move between $z_i = 0$ and $z_i = 1$ efficiently. When a trajectory has no changepoint, $\tau_i$ is not identifiable [20], and the algorithm simply draws some positive $\tau_i$. Because of the constraints $b_{1i} < 0$ and $\tau_i > 0$, the algorithm correctly identifies the pattern $z_i = 0$ from the absence of a negative pre-nadir slope.
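To illustrate the pseudo-prior idea, the sketch below computes the full conditional probability that $z_i = 1$ for one subject, given all other parameters. It is a simplified illustration under our own assumptions, not the algorithm of Appendix A: it assumes a constant error variance, and the caller supplies the log prior density of $(b_{1i}, \tau_i)$ under (3) and (4) and the log pseudo-prior density used when $z_i = 0$. The pseudo-prior is a tuning choice that affects mixing, not the target posterior. The function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(y, mu, sigma):
    """Sum of N(mu, sigma^2) log-densities."""
    resid = (np.asarray(y, dtype=float) - mu) / sigma
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * resid**2))


def prob_responder_given_rest(y, t, a, b1, tau, b2, sigma, P,
                              log_prior_z1, log_pseudo_z0):
    """P(z_i = 1 | everything else) in a pseudo-prior scheme.

    log_prior_z1  : log density of (b1, tau) under the priors (3)-(4), z_i = 1
    log_pseudo_z0 : log pseudo-prior density of (b1, tau) when z_i = 0
    """
    t = np.asarray(t, dtype=float)
    mu1 = a + b1 * np.minimum(t, tau) + b2 * np.maximum(t - tau, 0.0)  # eq. (1)
    mu0 = a + b2 * t                                                    # eq. (2)
    log_w1 = np.log(P) + gaussian_loglik(y, mu1, sigma) + log_prior_z1
    log_w0 = np.log1p(-P) + gaussian_loglik(y, mu0, sigma) + log_pseudo_z0
    # Logistic transform of the log-odds, clipped so exp() cannot overflow
    return 1.0 / (1.0 + np.exp(np.clip(log_w0 - log_w1, -700.0, 700.0)))
```

A sampler would then draw $z_i$ as a Bernoulli variable with this probability and, when $z_i = 0$, refresh $(b_{1i}, \tau_i)$ from the pseudo-prior.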

2.3. Endpoints for Drug Efficacy

An important feature of our mixture model is its capability to derive clinically useful endpoints such as the response rate and time to PSA progression (TTP).

As defined in most clinical protocols with a PSA endpoint, a subject is classified as a responder if PSA declines following treatment, although the required reduction varies across protocols. In our mixture model, the log2 PSA profile of subject $i$ is classified as $z_i = 0$ or $z_i = 1$ at each MCMC iteration, and the proportion of iterations in which it is classified as $z_i = 1$ (i.e., as a PWL function) estimates the probability that the subject is a responder. Clinically, this probability may help physicians individualize treatment. For treatment $k$, the response rate $P_k$ is estimated from the indicators $z_i$ of all subjects with $g_i = k$, using the beta-binomial conjugacy implied by (6). This $P_k$ is the preferred response-rate endpoint in Phase II trials.
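Both quantities in this paragraph are simple functions of the MCMC output. The sketch assumes the Beta(0.5, 0.5) prior for $P_k$ stated in Section 3.1; the function names are illustrative.

```python
import numpy as np


def responder_probability(z_draws):
    """Per-subject posterior P(responder): share of MCMC iterations with z_i = 1.

    z_draws: 0/1 array of shape (n_iterations, n_subjects).
    """
    return np.asarray(z_draws, dtype=float).mean(axis=0)


def draw_response_rate(z_arm, rng, prior_a=0.5, prior_b=0.5):
    """One Gibbs draw of P_k from its Beta full conditional.

    z_arm: current indicators z_i of the subjects with g_i = k.
    With a Beta(prior_a, prior_b) prior and z_i ~ Bern(P_k), the full
    conditional is Beta(prior_a + #responders, prior_b + #non-responders).
    """
    z_arm = np.asarray(z_arm)
    n_resp = int(z_arm.sum())
    return rng.beta(prior_a + n_resp, prior_b + z_arm.size - n_resp)
```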

Additionally, we derive a TTP endpoint, where progression is defined as an X% increase in PSA relative to the nadir (the smallest value). For treatment $k$ ($k = 1, \ldots, K$), we define

$$\mathrm{TTP}_k = P_k\, \mu_{\tau k} + \frac{\log_2(1 + X\%)}{\beta_{2k}} \qquad (7)$$

At the subject level, TTP is the sum of two periods: the time from the start of treatment to the nadir and the time from the nadir to progression. The first period is the period of PSA decline, $\tau_i$, if $z_i = 1$ and zero otherwise. For the second period, log2 PSA rises at rate $b_{2i}$ after the nadir, and an X% increase corresponds to a rise of $\log_2(1 + X\%)$ on the log2 scale, so this period equals $\log_2(1 + X\%)/b_{2i}$. At the treatment level, $\mathrm{TTP}_k$ is defined similarly, with subject-level parameters replaced by treatment-level parameters as in (7); the average time to nadir is estimated as $\mu_{\tau k}$ weighted by $P_k$, the proportion of responders in treatment $k$.
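Equation (7) and its subject-level counterpart translate directly into code that can be applied draw by draw. X = 50% below is an assumption chosen to match the progression rule used in Section 3.1; the function names are illustrative.

```python
import numpy as np


def ttp_treatment(P, mu_tau, beta2, rise=0.5):
    """Equation (7): TTP_k = P_k * mu_tau_k + log2(1 + X) / beta2_k.

    `rise` is X as a fraction (0.5 means a 50% rise over the nadir). Inputs
    may be arrays of MCMC draws, giving one TTP value per draw.
    """
    return np.asarray(P) * np.asarray(mu_tau) + np.log2(1.0 + rise) / np.asarray(beta2)


def ttp_subject(z, tau, b2, rise=0.5):
    """Subject-level TTP: tau_i if a responder (else 0) plus nadir-to-progression time."""
    return np.where(np.asarray(z) == 1, tau, 0.0) + np.log2(1.0 + rise) / np.asarray(b2)
```

For example, plugging in the antiandrogen posterior means from Table 4 ($P = 0.40$, $\mu_\tau = 2.02$, $\beta_2 = 0.37$) gives $0.808 + 0.585/0.37 \approx 2.39$ months. In practice, `ttp_treatment` should be applied to each MCMC draw and then summarized, because the posterior mean of a nonlinear function generally differs from the function evaluated at the posterior means.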

Suppose that PSA is a surrogate for tumor size, so that a decline (increase) in PSA indicates regression (growth) of the tumor, and that treated tumors regrow at the same rate as untreated tumors (i.e., $\beta_{21} = \cdots = \beta_{2K} = \beta_2$) [21, 22]. We can then define a cancer cell killing fraction (KF), a log2 cell surviving fraction (LSF) and a tumor growth delay (TGD) for treatment $k$ as

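$$\mathrm{LSF}_k = -P_k\, \mu_{\tau k}\, (\beta_2 - \beta_{1k}), \qquad \mathrm{KF}_k = 1 - 2^{\mathrm{LSF}_k}, \qquad \mathrm{TGD}_k = -\mathrm{LSF}_k / \beta_2$$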

These clinical and biological endpoints are all defined as a function of our model parameters. Using MCMC draws of model parameters, these endpoints can be easily calculated and compared between treatment groups.
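The three biological endpoints can likewise be computed from each MCMC draw. This sketch assumes a common regrowth rate $\beta_2$ across arms, as the definitions require, and treats LSF as a log2 surviving fraction, consistent with $\mathrm{KF}_k = 1 - 2^{\mathrm{LSF}_k}$; the function name is illustrative.

```python
import numpy as np


def biological_endpoints(P, mu_tau, beta1, beta2):
    """LSF, KF and TGD for one treatment arm, with a common regrowth rate beta2.

    LSF = -P * mu_tau * (beta2 - beta1)   (log2 cell surviving fraction)
    KF  = 1 - 2**LSF                      (cell killing fraction)
    TGD = -LSF / beta2                    (tumor growth delay)
    Arrays of MCMC draws give one value per draw.
    """
    P, mu_tau, beta1, beta2 = (np.asarray(v, dtype=float) for v in (P, mu_tau, beta1, beta2))
    lsf = -P * mu_tau * (beta2 - beta1)
    kf = 1.0 - 2.0 ** lsf
    tgd = -lsf / beta2
    return lsf, kf, tgd
```

With the antiandrogen posterior means from Table 4 ($P = 0.40$, $\mu_\tau = 2.02$, $\beta_1 = -0.76$) and $\beta_2 = 0.37$, the plug-in values are LSF ≈ −0.91, KF ≈ 0.47 and TGD ≈ 2.47 months; these are close to, but not identical with, the posterior means in Table 4, which average over draws.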

2.4. Non-constant Error Variance

Because laboratory assays are known to have reduced precision at low PSA levels, the constant-variance assumption $\mathrm{var}(y_{ij}) = \sigma^2$ may not hold. To check it, we modelled the log2 variance as a linear function of the observed $y_{ij}$ [8, 23]. To improve model performance, we also normalized the longitudinal data of subject $i$ by its baseline value $y_{i1}$; specifically, $\varepsilon_{ij} \sim N(0, \sigma_{ij}^2)$, where

$$\log_2(\sigma_{ij}^2) = \log_2(\sigma^2) - 2r\,(y_{ij} - y_{i1})$$

If $r = 0$, the data have a constant variance $\sigma^2$, independent of the marker value; if $r > 0$, the measurement variance decreases as the marker level increases, and vice versa. Further details on MCMC sampling are given in Appendix B.
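The variance model can be checked numerically. The sketch assumes the form $\log_2(\sigma_{ij}^2) = \log_2(\sigma^2) - 2r(y_{ij} - y_{i1})$, equivalently $\sigma_{ij} = \sigma\, 2^{-r(y_{ij} - y_{i1})}$, which reduces to the constant variance $\sigma^2$ when $r = 0$; the function name is illustrative.

```python
import numpy as np


def error_sd(y, sigma2, r):
    """Observation-level error SD under
    log2(sigma_ij^2) = log2(sigma^2) - 2 r (y_ij - y_i1),
    i.e. sigma_ij = sigma * 2**(-r * (y_ij - y_i1)).

    y: one subject's log2 PSA series, with y[0] the baseline y_i1.
    """
    y = np.asarray(y, dtype=float)
    log2_var = np.log2(sigma2) - 2.0 * r * (y - y[0])  # normalized by baseline
    return np.sqrt(2.0 ** log2_var)
```

With $r > 0$, a value one log2 unit below baseline gets an SD $\sqrt{2}$ times larger than at baseline, which matches the motivation of lower assay precision at low PSA levels.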

3. Simulation Studies

3.1. Performance of our mixture model

We carried out simulations to investigate the performance of the proposed mixture model defined in Section 2 (denoted the Mixture model). We simulated two groups of 40 subjects each, with measurements at baseline, at 2 weeks, monthly for 6 months, and at 3-month intervals thereafter until the end of the second year. The true parameter values were chosen to mimic the real data example:

  • $a_i \sim N(7, 2^2)$ and $\sigma^2 = 0.4^2$.

  • Arm I: $P_1 = 0.6$, $b_{1i} \sim N(-0.8, 0.16^2)$, $\tau_i \sim N(3, 1.5^2)$, $b_{2i} \sim N(0.4, 0.1^2)$. Arm II: $P_2 = 0.8$, $b_{1i} \sim N(-1.2, 0.16^2)$, $\tau_i \sim N(5, 1.5^2)$, $b_{2i} \sim N(0.2, 0.1^2)$. In both arms, $b_{1i}$ is truncated to negative values, and $\tau_i$ and $b_{2i}$ are truncated to positive values.

In the simulations, each subject's data were truncated at a progression point, determined from a moving average of three consecutive PSA values. Progression was defined as an increase in PSA of at least 50% above the lowest prior moving average; this increase had to be either the last determination in the patient's follow-up or maintained for at least 28 days [1]. Under this rule and the parameter values above, the median number of measurements per subject across the 80 subjects, averaged over simulated trials, was 7 (range 2 to 13).
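The truncation rule above can be sketched as follows. Several details are our own assumptions rather than the protocol's: the moving average is trailing and uses fewer than three values for the first two visits, it is computed on the natural PSA scale (e.g., `2**y` for simulated log2 values), 28 days is converted to months as 28/30.4375, and "maintained" is read as the moving average staying at or above the threshold. The function names are hypothetical.

```python
import numpy as np


def progression_index(t, psa, rise=0.5, confirm_days=28.0, days_per_month=30.4375):
    """Index of the first visit meeting the progression rule, or None.

    t: visit times in months; psa: PSA on its natural scale.
    """
    t = np.asarray(t, dtype=float)
    psa = np.asarray(psa, dtype=float)
    n = psa.size
    # Trailing moving average of (up to) three consecutive values
    ma = np.array([psa[max(0, j - 2): j + 1].mean() for j in range(n)])
    confirm = confirm_days / days_per_month  # 28 days expressed in months
    for j in range(1, n):
        threshold = (1.0 + rise) * ma[:j].min()  # 50% above lowest prior average
        if ma[j] < threshold:
            continue
        if j == n - 1:
            return j  # the rise is the last determination
        for k in range(j + 1, n):
            if ma[k] < threshold:
                break  # rise not maintained; try later visits
            if t[k] - t[j] >= confirm:
                return j  # rise maintained for at least 28 days
        else:
            return j  # rise held through the end of follow-up
    return None


def truncate_at_progression(t, psa, **kwargs):
    """Keep visits up to and including the progression visit."""
    t, psa = np.asarray(t), np.asarray(psa)
    j = progression_index(t, psa, **kwargs)
    return (t, psa) if j is None else (t[: j + 1], psa[: j + 1])
```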

The models were implemented in R. We generated 5000 MCMC samples with a burn-in of 2000 iterations. N(0, 100) priors were used for the mean parameters, except that $\mu_\tau \sim N(4, 100)$; Gamma(0.01, 0.01) priors were used for the precision parameters; and a noninformative Beta(0.5, 0.5) prior was used for $P_k$ ($k = 1, \ldots, K$). The posterior estimates from the reversible jump procedure and the pseudo-prior approach were very similar, so we report only those from the reversible jump procedure. To speed up computation, a table lookup method was used for indexing operations, and computations were vectorized wherever possible. Table 1 shows the true parameter values (True), estimated means (Mean), standard deviations (SD), square root of the mean squared error (SqrtMSE) and 95% coverage probabilities (CP), based on 1000 simulated trials. These results indicate that the Mixture model estimated all parameters well. In Appendix D, we also present results for data simulated from a different set of parameter values that also resemble the real data example; the model again performed well, as shown in Table 6.

Table 1.

Parameter estimation and performance statistics using the Mixture model based on 1000 simulated datasets.

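| Parameter | Arm | True | Mean | SD | SqrtMSE | CP (%) |
|---|---|---|---|---|---|---|
| α | Both | 7 | 6.99 | 0.28 | 0.28 | 89 |
| P | I | 0.6 | 0.56 | 0.07 | 0.08 | 96 |
| P | II | 0.8 | 0.79 | 0.07 | 0.07 | 95 |
| β1 | I | −0.8 | −0.85 | 0.10 | 0.11 | 94 |
| β1 | II | −1.2 | −1.21 | 0.08 | 0.08 | 94 |
| μτ | I | 3 | 3.18 | 0.33 | 0.38 | 93 |
| μτ | II | 5 | 5.03 | 0.28 | 0.28 | 94 |
| β2 | I | 0.4 | 0.39 | 0.03 | 0.03 | 92 |
| β2 | II | 0.2 | 0.20 | 0.03 | 0.03 | 94 |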

Table 6.

Parameter estimation and performance statistics using the Mixture model based on 1000 simulated datasets generated under the alternative parameter values of Appendix D.

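| Parameter | Arm | True | Mean | SD | SqrtMSE | CP (%) |
|---|---|---|---|---|---|---|
| α | Both | 7 | 6.97 | 0.31 | 0.31 | 85 |
| P | I | 0.4 | 0.42 | 0.06 | 0.07 | 97 |
| P | II | 0.6 | 0.55 | 0.07 | 0.08 | 97 |
| β1 | I | −0.5 | −0.49 | 0.05 | 0.05 | 98 |
| β1 | II | −0.75 | −0.75 | 0.09 | 0.09 | 97 |
| μτ | I | 4 | 3.96 | 0.45 | 0.46 | 95 |
| μτ | II | 2 | 2.21 | 0.24 | 0.32 | 92 |
| β2 | I | 0.35 | 0.35 | 0.03 | 0.03 | 90 |
| β2 | II | 0.35 | 0.34 | 0.03 | 0.03 | 90 |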

3.2. Performance of the tumor growth inhibition model

In the literature, models for longitudinal efficacy biomarkers following treatment are very limited. To our knowledge, dynamic models are the only models used for longitudinal data (specifically, tumor size data) following treatment. These models assume that a non-linear tumor growth profile results from two latent processes: a cell growth process and a cell killing process induced by drug exposure. Since PSA can be considered a surrogate of tumor size, dynamic models can serve as alternative models for longitudinal PSA data. Among the dynamic models in the literature [24–28], the most commonly used is the tumor growth inhibition (TGI) model proposed by Claret et al. [26–28], which is described by the differential equation

$$\frac{dy_i(t)}{dt} = K_{Li}\, y_i(t) - K_{D0i}\, \exp(-\lambda_i t)\, y_i(t), \qquad y_i(0) = y_{0i}$$

where $y_i(t)$ is the PSA measurement of subject $i$ at time $t$, described as the result of two latent processes: a growth process with rate $K_{Li}$ and a killing process with rate $K_{D0i}\exp(-\lambda_i t)$, which decreases exponentially over time (at rate $\lambda_i$) from an initial rate of $K_{D0i}$ [26].

This differential equation can be shown (details in Appendix C) to be equivalent to

$$\log(y_i(t)) = \log(y_i(0)) + K_{Li}\, t + \frac{K_{D0i}}{\lambda_i}\left(\exp(-\lambda_i t) - 1\right)$$

where lognormal distributions are assumed for $K_{Li}$, $K_{D0i}$ and $\lambda_i$, as specified in [26].
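The closed-form solution can be verified numerically by integrating the differential equation with a classical fourth-order Runge-Kutta scheme and comparing the results on the natural-log scale. The parameter values used in the check (the Arm I population medians listed later in this section) and the function names are illustrative.

```python
import numpy as np


def tgi_log_closed_form(t, log_y0, KL, KD0, lam):
    """log y(t) = log y(0) + KL*t + (KD0/lam) * (exp(-lam*t) - 1)  (natural log)."""
    t = np.asarray(t, dtype=float)
    return log_y0 + KL * t + (KD0 / lam) * (np.exp(-lam * t) - 1.0)


def tgi_rk4(t_end, y0, KL, KD0, lam, n_steps=2000):
    """Integrate dy/dt = KL*y - KD0*exp(-lam*t)*y from 0 to t_end with classical RK4."""
    def f(t, y):
        return KL * y - KD0 * np.exp(-lam * t) * y

    h = t_end / n_steps
    t, y = 0.0, float(y0)
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y
```

Because $d \log y / dt = K_{Li} - K_{D0i} e^{-\lambda_i t}$, integrating from 0 to $t$ gives the closed form directly; the numerical check simply confirms the algebra.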

It is reasonable to assume that the baseline data are also measured with error. We therefore modified the above model by replacing $\log(y_i(0))$ with a random parameter $a_i$ ($i = 1, \ldots, n$), which can be thought of as the unobserved true value at baseline. This model is denoted TGI(a).

Following Claret et al. [26], the period of PSA decline (τi) can be calculated as

$$\tau_i = \frac{\log(K_{D0i}) - \log(K_{Li})}{\lambda_i}$$

Note that $\tau_i$ is negative when $K_{D0i} < K_{Li}$. The TGI model therefore provides an estimate of $\tau_i$ for every trajectory, including those without a decline phase. In contrast, the Mixture model provides an estimate of $\tau_i$ only when $z_i = 1$.
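The changepoint formula is the time at which the log-scale slope $K_{Li} - K_{D0i}\exp(-\lambda_i t)$ crosses zero, which the short sketch below confirms. The parameter values (the Arm I population medians listed below) are used only as an example, and the function name is illustrative.

```python
import numpy as np


def tgi_tau(KL, KD0, lam):
    """Time to nadir in the TGI model: root of KL - KD0 * exp(-lam * t) = 0."""
    return (np.log(KD0) - np.log(KL)) / lam


# Example values: Arm I population medians from the simulation settings below
KL, KD0, lam = np.exp(-1.2), np.exp(-0.4), np.exp(-0.3)
tau = tgi_tau(KL, KD0, lam)                    # 0.8 * exp(0.3), about 1.08
slope_at_tau = KL - KD0 * np.exp(-lam * tau)   # zero up to rounding error
# Killing never outpaces growth (KD0 < KL): the formula returns a negative tau
tau_no_decline = tgi_tau(0.3, 0.1, 1.0)        # log(1/3), about -1.10
```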

We used the same simulation strategy to investigate the performance of the TGI and TGI(a) models. Again, the true parameter values were chosen to mimic the real data example:

  • $a_i \sim N(7, 2^2)$ and $\sigma^2 = 0.4^2$

  • Arm I: $\lambda_i \sim LN(-0.3, 0.5^2)$, $K_{D0i} \sim LN(-0.4, 0.5^2)$, $K_{Li} \sim LN(-1.2, 0.5^2)$. Arm II: $\lambda_i \sim LN(-0.5, 0.5^2)$, $K_{D0i} \sim LN(0.4, 0.5^2)$, $K_{Li} \sim LN(-1.5, 0.5^2)$. LN denotes the lognormal distribution.

The distribution of $a_i$ is the same as in the Mixture model. Following the calculation in [26], we estimated the median of each parameter's lognormal distribution (i.e., $\exp(a)$ for $LN(a, b)$). Under these parameter values, the median number of measurements per subject across the 80 subjects, averaged over simulated trials, was 6 (range 2 to 13). The models were implemented in WinBUGS 1.4, with N(0, 100) priors for the mean parameters and Gamma(0.01, 0.01) priors for the precision parameters. Both models provided accurate estimates of all parameters except $\lambda$, whose bias was large, especially in the TGI model (see Table 5 in Appendix D). Since the TGI(a) model performed better in the simulation study and matches the data-generating process (i.e., it allows baseline data to be measured with error), we compare the Mixture model with the TGI(a) model in the next section.

3.3. Comparison of the Mixture model and the TGI(a) model

We fitted the Mixture and TGI(a) models to the data generated in Section 3.1 (under the Mixture model) and in Section 3.2 (under the TGI(a) model), resulting in 4 analyses (2 models × 2 data-generating settings). We compared the models using three Bayesian criteria: a modified Deviance Information Criterion (DIC3) [29], the Watanabe-Akaike information criterion (WAIC) [30] and the log-pseudo marginal likelihood (LPML) [31]. (1) DIC3 is preferred in our setting over the standard DIC of Spiegelhalter et al. [32] because it correctly reflects the effective number of parameters in mixture models. (2) LPML is a cross-validated, leave-one-out measure of a model's ability to predict the data; it is valid for small and large samples and does not rely on a heuristic justification based on large-sample normality. (3) WAIC was proposed recently; it can be viewed as an improvement on the standard DIC and also approximates Bayesian cross-validation [33]. The best model has the smallest DIC3 and WAIC and the largest LPML.
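All three criteria can be computed from a matrix of pointwise log-likelihood values evaluated at each posterior draw. The sketch follows the standard definitions: WAIC with the variance-based penalty, LPML as the sum of log conditional predictive ordinates (harmonic-mean estimator), and DIC3 in our reading of [29], $-4\,E[\log p(y \mid \theta)] + 2 \log \hat f(y)$ with $\hat f$ the posterior mean density. Whether the log-likelihood is taken per observation or per subject, and whether $z_i$ is marginalized out, are modelling choices not specified here; the function names are illustrative.

```python
import numpy as np


def _log_mean_exp(x, axis=0):
    """log(mean(exp(x))) along an axis, computed stably."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(x - m), axis=axis))


def model_criteria(loglik):
    """DIC3, WAIC and LPML from pointwise log-likelihoods.

    loglik[s, n] = log p(y_n | theta_s) for posterior draw s and observation n.
    """
    ll = np.asarray(loglik, dtype=float)
    lppd_n = _log_mean_exp(ll, axis=0)          # log posterior-mean density of y_n
    p_waic = np.var(ll, axis=0, ddof=1).sum()   # WAIC effective number of parameters
    waic = -2.0 * (lppd_n.sum() - p_waic)
    log_cpo = -_log_mean_exp(-ll, axis=0)       # CPO_n = harmonic mean of p(y_n | theta)
    lpml = log_cpo.sum()
    # DIC3 = -4 E[log p(y | theta)] + 2 log f_hat(y)
    dic3 = -4.0 * ll.sum(axis=1).mean() + 2.0 * lppd_n.sum()
    return {"DIC3": dic3, "WAIC": waic, "LPML": lpml}
```

Working on the log scale with a max-shift keeps the averages of densities numerically stable even when individual log-likelihoods are large in magnitude.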

Not surprisingly, on average the TGI(a) model was best for data simulated from the TGI(a) model, and the Mixture model was best for data simulated from the Mixture model, as shown in Table 2. However, the Mixture model appears more robust than the TGI(a) model: on data from the TGI(a) model, the Mixture model was worse by only 27 to 64 units across the three criteria, whereas on data from the Mixture model, the TGI(a) model was worse by 190 to 385 units. Moreover, for data simulated from the TGI(a) model, the Mixture model was still selected as the best model in 29–34% of the simulated trials.

Table 2.

Model comparisons

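| Model | DIC3 (Mixture data) | WAIC (Mixture data) | LPML (Mixture data) | DIC3 (TGI(a) data) | WAIC (TGI(a) data) | LPML (TGI(a) data) |
|---|---|---|---|---|---|---|
| Mixture | 710 | 778 | −440 | 754 | 744 | −449 |
| TGI(a) | 1095 | 1162 | −630 | 690 | 711 | −422 |
| % of trials in which Mixture is better | 100 | 100 | 100 | 30 | 29 | 34 |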

4. Application

We applied the Mixture, TGI and TGI(a) models to two multi-center trials in patients with advanced (metastatic) prostate cancer [1]. An experimental retinoic acid metabolism-blocking agent, liarozole, was compared with the antiandrogenic drugs cyproterone acetate and flutamide. As in previous analyses of these data [1], we combined data from the two trials and made no distinction between the antiandrogenic drugs. Patients in both trials had relapsed after first-line endocrine therapy. We included 485 patients in our analyses: 249 treated with liarozole and 236 treated with antiandrogenic drugs. PSA was assessed before the start of treatment, at 2 weeks, monthly for 6 months, and at 3-month intervals thereafter until treatment discontinuation or death. The number of PSA measurements per subject ranged from 2 to 19 (median 6).

All priors were the same as in the simulation study. Based on prior knowledge that PSA progression occurred within 6 months, and assuming that the changepoint occurs about 2 months before progression, we set the mean of the normal prior for $\mu_\tau$ to 4 (i.e., $\mu_\tau \sim N(4, 100)$). We performed a sensitivity analysis to assess the effect of the prior distributions on the parameter estimates by increasing the variances of the prior distributions. For example, increasing the prior variance (e.g., for $\sigma_a^2$, $\sigma_{b_1 k}^2$, $\sigma_{b_2 k}^2$ and $\sigma_\tau^2$) from 100 to 1000 had a negligible effect on the posterior distributions of these parameters. This indicates that the priors were sufficiently vague relative to the likelihood and had minimal effect on the posterior estimates.

We generated 30,000 MCMC samples with a burn-in of 10,000 iterations and then kept every 10th iteration, giving 2,000 draws for computing all posterior estimates, including posterior means and highest posterior density (HPD) intervals. Fitting the PSA data took about six hours on an Intel Xeon 3.10 GHz, 4 GB RAM, x64 Linux computer. The MCMC chains mixed well, and convergence was further checked using several diagnostic procedures recommended by Cowles et al. [34].
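The thinning scheme and the HPD-based comparison of arms used in Table 4 can be sketched as follows. The HPD function uses the common sorted-sample approach (similar in spirit to `HPDinterval` in the R package coda) and assumes a unimodal posterior; the function names are illustrative.

```python
import numpy as np


def thin(chain, burn_in=10_000, every=10):
    """Drop the burn-in iterations, then keep every `every`-th draw."""
    return np.asarray(chain)[burn_in::every]


def hpd_interval(draws, prob=0.95):
    """Shortest interval spanning a fraction `prob` of the sorted draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    gap = max(1, min(n - 1, int(round(n * prob))))
    widths = x[gap:] - x[: n - gap]   # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + gap]


def arms_differ(draws_a, draws_b, prob=0.95):
    """True if the HPD interval of the draw-wise difference (b - a) excludes zero."""
    lo, hi = hpd_interval(np.asarray(draws_b) - np.asarray(draws_a), prob)
    return not (lo <= 0.0 <= hi)
```

With 30,000 iterations, `thin` keeps exactly the 2,000 draws described above. Taking differences draw by draw, rather than comparing two marginal intervals, preserves any posterior correlation between the arms.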

First, we checked the constant-variance assumption by calculating a 95% HPD interval for the parameter $r$ defined in Section 2.4. The interval covered zero, suggesting that $r$ was not significantly different from zero, so a constant variance was assumed in the analyses below.

Table 3 suggests that both the Mixture and TGI(a) models fit the PSA data reasonably well. Specifically, the Mixture model is preferred for the liarozole arm and the TGI(a) model for the antiandrogens arm, while the TGI model is the worst choice for both arms.

Table 3.

Model comparisons

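| Model | Antiandrogens DIC3 | Antiandrogens LPML | Antiandrogens WAIC | Liarozole DIC3 | Liarozole LPML | Liarozole WAIC |
|---|---|---|---|---|---|---|
| Mixture | 1758 | −1125 | 1998 | 1771 | −1086 | 1950 |
| TGI(a) | 1697 | −1070 | 1884 | 1842 | −1124 | 2021 |
| TGI | 2020 | −1192 | 2179 | 1965 | −1140 | 2104 |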

Posterior estimates from the Mixture and TGI(a) models are displayed in Table 4. Both models provided very similar estimates of $\alpha$ and $\sigma$. A 95% HPD interval was calculated for the difference in each parameter between liarozole and antiandrogens (column Difference); if the interval does not cover zero, the two arms are considered significantly different for that parameter (marked *). Both models indicate that liarozole was more effective. In the Mixture model, although the response rate with liarozole was similar to that with antiandrogens (about 40%), patients treated with liarozole responded to therapy for significantly longer than those treated with antiandrogenic drugs, which led to a significantly longer TTP, a greatly increased cancer cell killing fraction and a longer tumor growth delay. In the TGI(a) model, the initial drug killing rate was higher with liarozole, and the tumor growth rate was also marginally higher with liarozole. Estimates from the TGI model were similar to those from the TGI(a) model, except that the difference in $K_L$ was not significant in the TGI model.

Table 4.

Posterior mean and 95% HPD intervals in prostate cancer trials using the Mixture model and TGI(a) model.

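Common parameters

| Model | Parameter | Posterior mean (95% HPD) |
|---|---|---|
| Mixture | σ | 0.46 (0.45, 0.48) |
| TGI(a) | σ | 0.46 (0.45, 0.48) |
| Mixture | α | 7.10 (6.91, 7.28) |
| TGI(a) | α | 7.04 (6.84, 7.21) |

Treatment-specific parameters

| Model | Parameter | Antiandrogens | Liarozole | Difference (Liarozole − Antiandrogens) |
|---|---|---|---|---|
| Mixture | P | 0.40 (0.33, 0.48) | 0.43 (0.35, 0.50) | (−0.08, 0.13) |
| Mixture | μτ | 2.02 (1.57, 2.46) | 4.09 (3.28, 4.96) | (1.12, 3.07)* |
| Mixture | β1 | −0.76 (−0.97, −0.57) | −0.74 (−0.90, −0.59) | (−0.25, 0.27) |
| Mixture | β2 | 0.37 (0.33, 0.42) | 0.35 (0.32, 0.39) | (−0.07, 0.04) |
| Mixture | TTP | 2.39 (2.15, 2.67) | 3.41 (2.94, 3.87) | (0.46, 1.53)* |
| Mixture | KF | 0.46 (0.38, 0.55) | 0.73 (0.63, 0.82) | (0.13, 0.39)* |
| Mixture | TGD | 2.50 (1.90, 3.18) | 5.30 (3.95, 6.81) | (0.28, 4.48)* |
| TGI(a) | λ | 0.97 (0.42, 1.73) | 0.82 (0.39, 1.41) | (−1.24, 0.79) |
| TGI(a) | KD0 | 0.26 (0.09, 0.42) | 0.62 (0.40, 0.97) | (0.08, 0.79)* |
| TGI(a) | KL | 0.27 (0.22, 0.32) | 0.34 (0.29, 0.40) | (0.00, 0.15)* |

* The 95% HPD interval of the difference excludes zero.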

Figure 2 displays the estimated PSA trajectories and the observed data for 9 selected subjects. All three models fit the data reasonably well; the Mixture model was noticeably better for subject G and slightly worse for subject I. Under the Mixture model, each fitted curve is a weighted average of a linear function and a PWL function, and the weight (mixture probability), shown at the top of each panel, indicates how likely it is that the subject responds to therapy. For instance, patient A is a responder, patient D is a non-responder, and patient F is a responder with probability 48%. Note that the fitted curve is not limited to a linear or PWL shape: patients A, B and C have quadratic-shaped fitted curves, and patients F and I have fitted curves with a slowly increasing linear segment followed by a rapidly increasing one. These 9 subjects were selected for illustration because they represent different patterns of PSA trajectories; the model fit all 485 subjects fairly well (plots are available upon request). Profiles with only two data points (such as patient H) should be interpreted with caution: if the two points are far apart in time, it is unknown whether PSA declined between them. In this study, there were only a few such subjects, and their two time points were reasonably close; otherwise, such subjects should be excluded before fitting the Mixture model.

Figure 2.


log2 PSA profiles of 9 selected patients in the prostate cancer trials. The horizontal axis is the time (months) at which PSA was measured. Each panel represents one patient, denoted A, B, C, ..., I; the circles are the observed data. The percentage at the top of each panel is the mixture probability, i.e., the proportion of MCMC iterations in which the trajectory is classified as a PWL function.

Lastly, we assessed the goodness of fit of each model quantitatively using posterior predictive checks. Predicted values were drawn from the posterior predictive distribution for each posterior sample from the MCMC algorithm. Figure 3 suggests that the Mixture model fit the PSA data better than the TGI and TGI(a) models. For the TGI(a) model, the observed first quartile and median lay above the 95% intervals of the posterior predictive distribution; for the TGI model, the observed third quartile lay below its 95% interval. In contrast, for the Mixture model, the observed median and quartiles (first and third) all lay well within the 95% intervals.
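A posterior predictive check of this kind compares observed summary statistics with their distribution over replicated data sets. In this sketch, the statistic is the set of quartiles, across subjects, of the log2 PSA change from baseline to the last measurement; the replicated data are assumed to have been simulated already (one data set per retained posterior draw), and the function names are hypothetical. Passing the second measurement in place of `baseline` gives the analogue of panel (b) of Figure 3.

```python
import numpy as np


def change_quartiles(last, baseline):
    """First quartile, median and third quartile, across subjects, of the
    log2 PSA change from baseline to the last measurement (negative = reduction)."""
    return np.percentile(np.asarray(last, float) - np.asarray(baseline, float), [25, 50, 75])


def ppc_quartiles(last_rep, base_rep, last_obs, base_obs, level=0.95):
    """Posterior predictive check of the three quartiles.

    last_rep, base_rep: arrays of shape (n_draws, n_subjects), one replicated
    data set per posterior draw. Returns the observed quartiles, the central
    `level` interval of each replicated quartile, and whether each observed
    quartile falls inside its interval.
    """
    rep = np.array([change_quartiles(l, b) for l, b in zip(last_rep, base_rep)])
    obs = change_quartiles(last_obs, base_obs)
    lo = np.quantile(rep, (1 - level) / 2, axis=0)
    hi = np.quantile(rep, (1 + level) / 2, axis=0)
    return obs, lo, hi, (obs >= lo) & (obs <= hi)
```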

Figure 3.


95% intervals of the posterior predictive distribution of the log2 PSA reduction at the last measurement relative to baseline (a) or relative to the second measurement (b), compared with the observed values (vertical lines).

5. Discussion

In this article, we extended existing mixture models to estimate non-linear PSA trajectories following treatment by introducing a mixture structure consisting of a linear function and a PWL function. These two functions have an important clinical interpretation: a responder is described by a PWL function and a non-responder by a linear function. The model classifies each subject as a responder or non-responder with a certain probability. Using this probability as a weight, the weighted average of the two trajectories can characterize a wide variety of PSA trajectory patterns. Furthermore, this mixture structure enables us to derive clinically useful endpoints such as a response rate and time-to-progression, as well as biologically meaningful endpoints such as a cancer cell killing fraction and tumor growth delay.

A major limitation of dynamic models is their inability to classify PSA trajectories; as a result, they provide an estimate of the PSA decline period even when no such period exists. We illustrated the advantages of our model through simulation studies and data from actual clinical trials, and conclude that it is an attractive alternative to dynamic models for longitudinal PSA data following treatment. The proposed mixture model generalizes to many other diseases for which longitudinal efficacy biomarker data are available (for example, CA-125 in ovarian cancer and circulating tumor cells in various solid tumors). In addition, the model allows covariates to be included in the estimation of the changepoint, the log2 PSA value at the changepoint, and the log2 PSA rates of change before and after the changepoint. For instance, the mean of $\tau_i$, $\mu_\tau$, can be modelled as a linear function of treatment group and patient-level covariates.

The model can also incorporate data that fall below the limit of detection. Such data were not observed in the prostate cancer trials analyzed here, but they are likely to occur in many clinical trials; for example, subjects with a complete response may have PSA values below the limit of detection. Such values can be treated as left-censored at the detection threshold. In the posterior sampling, each censored value can then be imputed by drawing from a normal distribution truncated above at that threshold, as illustrated in [13].
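A minimal sketch of this imputation step, assuming that under the current model the censored observation is normal with mean `mu` and standard deviation `sigma`: at each MCMC iteration the value is redrawn from that normal distribution truncated above at the detection limit, here by the inverse-CDF method using Python's `statistics.NormalDist`. The function name is hypothetical, and [13] may use a different truncated-normal sampler.

```python
import random
from statistics import NormalDist


def draw_left_censored(mu, sigma, limit, rng=random):
    """Draw y ~ N(mu, sigma^2) restricted to y < limit, via the inverse CDF."""
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(limit)       # P(y < limit) under the current model
    u = rng.random() * upper      # uniform on [0, upper)
    u = max(u, 1e-300)            # inv_cdf requires 0 < p < 1
    return dist.inv_cdf(u)
```

The imputed value is then treated as observed in the other full conditional updates of the same iteration.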

In this article, we used the partial PSA follow-up data of patients who dropped out of the study before PSA progression, assuming non-informative drop-out. If drop-out is informative about the PSA trajectory, an appropriate adjustment is needed, and including time-dependent covariates may be useful for this purpose.

Acknowledgments

The authors gratefully acknowledge the constructive comments of three referees, and thank Dr. Joseph Heyse for helpful discussions. The authors also thank the Janssen Research Foundation (Beerse, Belgium) for permission to use data from two clinical trials testing liarozole in patients with advanced prostate cancer.

APPENDIX

A: Posterior Computation

The Markov chain Monte Carlo (MCMC) procedure for estimating the posterior distributions was implemented by repeatedly drawing samples from the full conditional distributions of the parameters. Owing to conjugacy, the full conditional distributions of all parameters except τi and zi have closed forms: normal for the mean parameters and gamma for the precision parameters.
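As an illustration of these conjugate updates, the sketch below performs Gibbs steps for a generic two-level normal hierarchy: subject-level coefficients b_i ~ N(β, 1/prec_b), with a normal prior N(m0, s0²) on β and a Gamma(a0, c0) prior (shape, rate) on the precision prec_b. The hierarchy, hyperparameter values, and function names are assumptions chosen for illustration; the paper's model has additional structure (for example, arm-specific means such as β1k).

```python
import math
import random


def update_mean(b, prec_b, m0=0.0, s0_sq=100.0, rng=random):
    """Draw beta | b, prec_b from its normal full conditional."""
    post_prec = 1.0 / s0_sq + len(b) * prec_b
    post_mean = (m0 / s0_sq + prec_b * sum(b)) / post_prec
    return rng.gauss(post_mean, 1.0 / math.sqrt(post_prec))


def update_precision(b, beta, a0=0.01, c0=0.01, rng=random):
    """Draw prec_b | b, beta from its gamma full conditional (shape, rate)."""
    shape = a0 + len(b) / 2.0
    rate = c0 + 0.5 * sum((x - beta) ** 2 for x in b)
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale, not a rate
```

Alternating the two draws gives a Gibbs sampler for (β, prec_b); in the full model, each such step sits alongside the non-conjugate updates of τi and zi.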

Sampling τi

The likelihood is continuous but not differentiable in τi. Skates et al. [14] and Pauler and Finkelstein [7] proposed using an approximation of the likelihood to obtain a proposal density for the Metropolis-Hastings algorithm. We found that the slice sampling algorithm of Neal [17] samples the changepoint τi more efficiently. In the slice sampler, we set the lower limit of τi to zero and the slice width to 0.2.
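A minimal Python sketch of one univariate slice-sampling update with the stepping-out and shrinkage procedures of Neal [17], using the width w = 0.2 and lower limit 0 stated in the text. In the model, `log_f` would be the log full conditional of τi (the subject's log-likelihood plus the log prior of τi); here it is any user-supplied function, and the function name and the `max_steps` cap are illustrative assumptions.

```python
import math
import random


def slice_sample(x0, log_f, w=0.2, lower=0.0, max_steps=100, rng=random):
    """One slice-sampling update (stepping out, then shrinkage).

    x0 must lie in the support, i.e. x0 > lower and log_f(x0) is finite.
    """
    def log_target(x):
        return -math.inf if x <= lower else log_f(x)

    # Vertical level: log(y) = log f(x0) - Exp(1), i.e. y ~ Uniform(0, f(x0)).
    log_y = log_target(x0) - rng.expovariate(1.0)

    # Step out: place an interval of width w at random around x0, then expand it.
    left = x0 - w * rng.random()
    right = left + w
    j = int(max_steps * rng.random())
    k = max_steps - 1 - j
    while j > 0 and log_target(left) > log_y:
        left -= w
        j -= 1
    while k > 0 and log_target(right) > log_y:
        right += w
        k -= 1

    # Shrink: sample uniformly on [left, right], shrinking toward x0 on rejection.
    while True:
        x1 = left + (right - left) * rng.random()
        if log_target(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```

Unlike a Metropolis-Hastings step, this update needs no tuned proposal density and never rejects a move outright; the width w only affects efficiency.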

Sampling zi

Updating the class indicator zi, which indicates whether a subject has a treatment-responding period, is not straightforward because the two values of zi imply parameter spaces of different dimensions: if zi = 0, two parameters (ai, b2i) need to be updated; if zi = 1, four parameters (ai, b1i, τi, b2i) need to be updated. Hence sampling from the full conditional distribution of zi requires an algorithm that can move between zi = 0 and zi = 1. We applied two procedures that allow such moves between models: the reversible jump procedure of Green [19] and the pseudo-prior method of Carlin et al. [18]. In both procedures, we took p(zi = 1) = p(zi = 0) = 1/2 a priori. In the reversible jump procedure, the prior densities of τi and b1i were used as their proposal densities when proposing a move to zi = 1. Specifically, we generated a candidate $b_{1i} \sim N(\beta_{1k}, \sigma^2_{b1k})\, I(b_{1i} < 0)$ and a candidate $\tau_i \sim N(\mu_{\tau k}, \sigma^2_{\tau k})\, I(\tau_i > 0)$, and the move was accepted with probability

$$\min\{\text{likelihood ratio} \times \text{prior ratio} \times \text{proposal ratio} \times \text{Jacobian},\ 1\},$$

since the proposal densities equal the prior densities, prior ratio × proposal ratio = 1, and the Jacobian also equals 1 because b1i and τi are drawn directly from their proposal distributions. The acceptance probability therefore reduces to the minimum of the likelihood ratio and 1,

$$\min\left\{\frac{l(\mathbf{y}_i \mid a_i, b_{1i}, \tau_i, b_{2i}, \sigma^2)}{l(\mathbf{y}_i \mid a_i, b_{2i}, \sigma^2)},\ 1\right\}$$

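where l(·) is the likelihood of the longitudinal data of subject i, $\mathbf{y}_i = (y_{i1}, \ldots, y_{in_i})$. For a proposed move from zi = 1 to zi = 0, the acceptance probability is obtained by replacing the likelihood ratio with its reciprocal, i.e., it is the minimum of $l(\mathbf{y}_i \mid a_i, b_{2i}, \sigma^2) / l(\mathbf{y}_i \mid a_i, b_{1i}, \tau_i, b_{2i}, \sigma^2)$ and 1.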
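A minimal sketch of this birth/death move for one subject, assuming independent normal measurement errors with common variance σ² and, for illustration, a PWL mean a + b1 min(t, τ) + b2 max(t − τ, 0) that is continuous at τ. Because (b1i, τi) are proposed from their truncated normal priors and p(zi = 1) = p(zi = 0), only the likelihood ratio enters the acceptance probability. The dictionary layout, the prior keys, and the rejection sampler for the truncated normals are hypothetical choices; a full sampler would also update ai, b2i, b1i, and τi within each model.

```python
import math
import random


def loglik(y, t, sigma2, a, b2, b1=None, tau=None):
    """Gaussian log-likelihood of one subject's series: linear mean a + b2*t,
    or the PWL mean when b1 and tau are supplied."""
    ll = 0.0
    for yj, tj in zip(y, t):
        if b1 is None:
            mu = a + b2 * tj
        else:
            mu = a + b1 * min(tj, tau) + b2 * max(tj - tau, 0.0)
        ll -= 0.5 * math.log(2.0 * math.pi * sigma2) + (yj - mu) ** 2 / (2.0 * sigma2)
    return ll


def draw_truncated_normal(mean, sd, positive, rng):
    """Rejection draw from N(mean, sd^2) restricted to x > 0 (positive) or x < 0."""
    while True:
        x = rng.gauss(mean, sd)
        if (positive and x > 0.0) or (not positive and x < 0.0):
            return x


def rj_update_z(z, state, y, t, sigma2, prior, rng=random):
    """One reversible-jump update of z_i; `state` holds a and b2, plus b1 and tau if z == 1."""
    a, b2 = state["a"], state["b2"]
    if z == 0:
        # Birth move: add a decline phase with (b1, tau) drawn from their priors.
        b1 = draw_truncated_normal(prior["beta1"], prior["sd_b1"], False, rng)
        tau = draw_truncated_normal(prior["mu_tau"], prior["sd_tau"], True, rng)
        log_ratio = loglik(y, t, sigma2, a, b2, b1, tau) - loglik(y, t, sigma2, a, b2)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            return 1, {"a": a, "b2": b2, "b1": b1, "tau": tau}
        return 0, state
    # Death move: drop (b1, tau); the acceptance uses the reciprocal ratio.
    log_ratio = (loglik(y, t, sigma2, a, b2)
                 - loglik(y, t, sigma2, a, b2, state["b1"], state["tau"]))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return 0, {"a": a, "b2": b2}
    return 1, state
```

Working on the log scale and comparing with `exp(min(0, log_ratio))` avoids overflow when the two likelihoods differ by many orders of magnitude.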

The prior densities of τi and b1i were also used as their pseudo-prior densities. As the name suggests, a pseudo-prior is not really a prior but a conveniently chosen linking density, required to completely specify the joint model. In other words, we augmented the model under zi = 0 by defining a probability distribution for hypothetical values of b1i and τi. These variables have no meaningful interpretation under zi = 0 and are introduced only to match the parameter dimensions. The full conditional distributions of b1i and τi are

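$$f(b_{1i}, \tau_i \mid z_i, \mathbf{y}_i) \propto \begin{cases} l(\mathbf{y}_i \mid b_{1i}, \tau_i, z_i = 1)\, g(b_{1i}, \tau_i \mid z_i = 1), & \text{if } z_i = 1,\\ g(b_{1i}, \tau_i \mid z_i = 0), & \text{if } z_i = 0, \end{cases}$$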

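where $g(b_{1i}, \tau_i \mid z_i = 0) = N(\beta_{1k}, \sigma^2_{b1k})\, I(b_{1i} < 0) \times N(\mu_{\tau k}, \sigma^2_{\tau k})\, I(\tau_i > 0)$. When zi = 1, b1i and τi are generated from their usual full conditional distributions given the data; when zi = 0, they are generated from their pseudo-prior densities $g(b_{1i}, \tau_i \mid z_i = 0)$. Because the pseudo-priors equal the prior densities here, the full conditional probability of zi = 1 is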

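$$p(z_i = 1 \mid \cdot) = \frac{l(\mathbf{y}_i \mid a_i, b_{1i}, \tau_i, b_{2i}, \sigma^2)}{l(\mathbf{y}_i \mid a_i, b_{1i}, \tau_i, b_{2i}, \sigma^2) + l(\mathbf{y}_i \mid a_i, b_{2i}, \sigma^2)}.$$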

Hence the pseudo-prior method draws zi directly from its full conditional, a Bernoulli distribution with this probability, whereas the reversible jump procedure uses a Metropolis-Hastings accept/reject step.
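A minimal sketch of this Gibbs update of zi, assuming independent normal errors with common variance σ² and, for illustration, a PWL mean a + b1 min(t, τ) + b2 max(t − τ, 0) that is continuous at τ. The probability is computed from the log-likelihood difference, which avoids overflow when the two likelihoods differ by many orders of magnitude. Function names are hypothetical; `b1` and `tau` are the current values, drawn from the pseudo-prior when zi = 0.

```python
import math
import random


def loglik(y, t, sigma2, a, b2, b1=None, tau=None):
    """Gaussian log-likelihood: linear mean, or PWL mean when b1 and tau are given."""
    ll = 0.0
    for yj, tj in zip(y, t):
        if b1 is None:
            mu = a + b2 * tj
        else:
            mu = a + b1 * min(tj, tau) + b2 * max(tj - tau, 0.0)
        ll -= 0.5 * math.log(2.0 * math.pi * sigma2) + (yj - mu) ** 2 / (2.0 * sigma2)
    return ll


def prob_responder(y, t, sigma2, a, b1, tau, b2):
    """p(z_i = 1 | rest) = l1 / (l1 + l0) when pseudo-priors equal priors and p(z_i = 1) = 1/2."""
    d = loglik(y, t, sigma2, a, b2) - loglik(y, t, sigma2, a, b2, b1, tau)  # log(l0 / l1)
    if d > 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def gibbs_update_z(y, t, sigma2, a, b1, tau, b2, rng=random):
    """Draw z_i ~ Bernoulli(p(z_i = 1 | rest))."""
    return 1 if rng.random() < prob_responder(y, t, sigma2, a, b1, tau, b2) else 0
```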

B: Sampling the Non-constant Variance

In Section 2.4, we modelled the variance as a function of the observed yij, i.e.,

$$\varepsilon_{ij} \sim N(0, \sigma_{ij}^2),$$

The full conditional distribution of 1/σ² is

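$$[1/\sigma^2 \mid \cdot] = \text{Gamma}\left(0.01 + \frac{1}{2}\sum_{i=1}^{N} n_i,\ 0.01 + \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{n_i} \left(\frac{v_{ij}}{v_{i1}}\right)^{2r} (y_{ij} - \mu_{z_i,ij})^2\right).$$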
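A minimal sketch of this conjugate draw, assuming the per-subject values are stored as nested lists (responses `y`, the v_ij values of Section 2.4, and the current fitted means `mu`) and that Gamma(shape, rate) is the parameterization used in the equation. Python's `random.gammavariate` takes a scale parameter, so the rate is inverted. Names and data layout are hypothetical.

```python
import random


def draw_precision(y, v, mu, r, a0=0.01, c0=0.01, rng=random):
    """Draw 1/sigma^2 from its gamma full conditional with weights (v_ij / v_i1)^(2r)."""
    n_total = sum(len(yi) for yi in y)
    ss = 0.0
    for yi, vi, mui in zip(y, v, mu):
        for yij, vij, muij in zip(yi, vi, mui):
            ss += (vij / vi[0]) ** (2 * r) * (yij - muij) ** 2
    shape = a0 + n_total / 2.0
    rate = c0 + 0.5 * ss
    return rng.gammavariate(shape, 1.0 / rate)  # scale = 1 / rate
```

With r = 0 every weight equals 1 and the update reduces to the usual homoscedastic gamma draw.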

We sampled r using the adaptive Metropolis-Hastings algorithm [35]. The normal proposal density was centered at the previous value, and the proposal variance was refined using the empirical covariance of the draws from an extended burn-in period.
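A minimal sketch of such an adaptive random-walk update for r, assuming a flat prior on r so that its log full conditional is r ΣΣ log(v_ij/v_i1) − (1/(2σ²)) ΣΣ (v_ij/v_i1)^{2r}(y_ij − μ_{z_i,ij})². During burn-in the proposal variance is set to 2.4² times the running variance of the chain (the one-dimensional scaling of Haario et al. [35]) and is then frozen, so the retained draws come from a standard Metropolis chain. The burn-in length, initial variance, and function names are illustrative assumptions, not the settings used in the paper.

```python
import math
import random


def log_f_r(r, y, v, mu, sigma2):
    """Log full conditional of r up to a constant (flat prior on r assumed)."""
    s_log, s_sq = 0.0, 0.0
    for yi, vi, mui in zip(y, v, mu):
        for yij, vij, muij in zip(yi, vi, mui):
            ratio = vij / vi[0]
            s_log += math.log(ratio)
            s_sq += ratio ** (2.0 * r) * (yij - muij) ** 2
    return r * s_log - s_sq / (2.0 * sigma2)


def adaptive_mh_r(y, v, mu, sigma2, n_iter=4000, n_adapt=2000, r0=0.0,
                  init_var=0.1, rng=random):
    """Random-walk Metropolis for r; adapts the proposal variance during burn-in only
    and returns the post-burn-in draws."""
    r = r0
    cur = log_f_r(r, y, v, mu, sigma2)
    prop_var = init_var
    mean, m2, count = 0.0, 0.0, 0
    kept = []
    for it in range(n_iter):
        cand = rng.gauss(r, math.sqrt(prop_var))
        new = log_f_r(cand, y, v, mu, sigma2)
        if rng.random() < math.exp(min(0.0, new - cur)):
            r, cur = cand, new
        if it < n_adapt:
            # Welford update of the running mean and variance of the chain.
            count += 1
            delta = r - mean
            mean += delta / count
            m2 += delta * (r - mean)
            if count > 100:
                prop_var = 2.4 ** 2 * m2 / (count - 1) + 1e-6
        else:
            kept.append(r)
    return kept
```

Freezing the proposal after burn-in keeps the retained chain a valid Metropolis sampler, since continued adaptation would make the kernel depend on the chain's history.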

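The full conditional density of r, up to a normalizing constant, is

$$f(r \mid \cdot) \propto \exp\left\{ r \sum_{i=1}^{N}\sum_{j=1}^{n_i} \log\frac{v_{ij}}{v_{i1}} - \frac{1}{2\sigma^2} \sum_{i=1}^{N}\sum_{j=1}^{n_i} \left(\frac{v_{ij}}{v_{i1}}\right)^{2r} (y_{ij} - \mu_{z_i,ij})^2 \right\}.$$

C: Implementation of the TGI model

In the TGI model, yi(t) for subject i satisfies the differential equation in the first line below, which is solved as follows: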

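$$
\begin{aligned}
\frac{dy_i(t)}{dt} &= K_{Li}\, y_i(t) - K_{D0i} \exp(-\lambda_i t)\, y_i(t), \qquad y_i(0) = y_{0i},\\
\frac{1}{y_i(t)}\frac{dy_i(t)}{dt} &= K_{Li} - K_{D0i} \exp(-\lambda_i t),\\
\frac{d \log y_i(t)}{dt} &= K_{Li} - K_{D0i} \exp(-\lambda_i t),\\
\log y_i(t) &= K_{Li}\, t + \frac{K_{D0i}}{\lambda_i} \exp(-\lambda_i t) + c,\\
\log y_i(0) &= \frac{K_{D0i}}{\lambda_i} + c \quad\Longrightarrow\quad c = \log y_i(0) - \frac{K_{D0i}}{\lambda_i}.
\end{aligned}
$$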

Therefore,

$$\log y_i(t) = \log y_i(0) + K_{Li}\, t + \frac{K_{D0i}}{\lambda_i}\left(\exp(-\lambda_i t) - 1\right)$$

In the TGI model, $K_{Li}$, $K_{D0i}$, and $\lambda_i$ were assumed to be lognormally distributed.
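As a check on the derivation, the sketch below compares the closed-form solution with a numerical integration of the differential equation by the classical fourth-order Runge-Kutta method. Here log is the natural logarithm, as the derivation requires; dividing by ln 2 converts to the log2 scale. The parameter values in the test are borrowed from the Arm I settings of Table 5 with a hypothetical baseline y(0) = 100, purely for illustration, and the function names are hypothetical.

```python
import math


def tgi_log_psa(t, y0, k_l, k_d0, lam):
    """Closed form: log y(t) = log y(0) + K_L t + (K_D0 / lambda)(exp(-lambda t) - 1)."""
    return math.log(y0) + k_l * t + (k_d0 / lam) * (math.exp(-lam * t) - 1.0)


def tgi_rk4(t_end, y0, k_l, k_d0, lam, n_steps=2000):
    """Integrate dy/dt = K_L y - K_D0 exp(-lambda t) y with classical RK4; returns y(t_end)."""
    def f(t, y):
        return (k_l - k_d0 * math.exp(-lam * t)) * y

    h, t, y = t_end / n_steps, 0.0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y
```

The closed form is what makes the TGI model cheap to fit: each likelihood evaluation needs only one expression per observation rather than a numerical ODE solve.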

D: Additional Simulation Results

Simulation results using the TGI and TGI(a) models are presented in Table 5. Data were simulated from the true parameter values listed below the table, and simulation results from the mixture model are shown in Table 6.

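Table 5.

Parameter estimation and performance statistics using TGI and TGI(a) models based on 1000 simulated datasets.

| Parameter | Arm | True value | Model | Mean | SD | SqrMSE | CP (%) |
|---|---|---|---|---|---|---|---|
| α | | 7 | TGI(a) | 6.99 | 0.22 | 0.22 | 95 |
| λ | I | 0.74 | TGI(a) | 1.96 | 10.99 | 11.06 | 93 |
| λ | I | 0.74 | TGI | 6.43 | 86.86 | 87.04 | 85 |
| λ | II | 0.61 | TGI(a) | 0.67 | 0.15 | 0.16 | 93 |
| λ | II | 0.61 | TGI | 0.67 | 0.15 | 0.17 | 91 |
| KD0 | I | 0.67 | TGI(a) | 0.99 | 3.33 | 3.34 | 96 |
| KD0 | I | 0.67 | TGI | 2.79 | 32.18 | 32.25 | 92 |
| KD0 | II | 1.49 | TGI(a) | 1.54 | 0.22 | 0.23 | 94 |
| KD0 | II | 1.49 | TGI | 1.54 | 0.22 | 0.22 | 93 |
| KL | I | 0.30 | TGI(a) | 0.29 | 0.07 | 0.07 | 90 |
| KL | I | 0.30 | TGI | 0.29 | 0.06 | 0.00 | 90 |
| KL | II | 0.22 | TGI(a) | 0.21 | 0.04 | 0.03 | 92 |
| KL | II | 0.22 | TGI | 0.22 | 0.03 | 0.00 | 92 |

SqrMSE: square root of the mean squared error; CP: coverage probability (%).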
  • Arm I: P1 = 0.4, b1i ~ N(−0.5, 0.1²), τi ~ N(4, 1.5²), b2i ~ N(0.35, 0.1²). Arm II: P2 = 0.6, b1i ~ N(−0.75, 0.16²), τi ~ N(2, 1²), b2i ~ N(0.35, 0.1²).
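A minimal sketch of drawing one simulated subject under these settings. It reads Pk as the probability that a subject in arm k is a responder and the second argument of each N(·, ·) as a variance (so 0.1² means a standard deviation of 0.1). The intercept, measurement times, redrawing of non-positive τi, and the continuous PWL form are assumptions not stated here, and the measurement-error step is omitted; all names are hypothetical.

```python
import random

# Simulation settings as listed; "p" is read as the responder probability (assumption).
ARMS = {
    "I": {"p": 0.4, "b1": (-0.5, 0.1), "tau": (4.0, 1.5), "b2": (0.35, 0.1)},
    "II": {"p": 0.6, "b1": (-0.75, 0.16), "tau": (2.0, 1.0), "b2": (0.35, 0.1)},
}


def draw_subject(arm, a, times, rng=random):
    """Return (z, noise-free mean trajectory) for one hypothetical subject in `arm`."""
    s = ARMS[arm]
    b2 = rng.gauss(*s["b2"])
    if rng.random() < s["p"]:
        # Responder: PWL trajectory, declining until tau and rising afterwards.
        b1 = rng.gauss(*s["b1"])
        tau = rng.gauss(*s["tau"])
        while tau <= 0:  # hypothetical choice: keep the changepoint positive
            tau = rng.gauss(*s["tau"])
        return 1, [a + b1 * min(t, tau) + b2 * max(t - tau, 0.0) for t in times]
    # Non-responder: linear rise from the start.
    return 0, [a + b2 * t for t in times]
```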

References

  • 1. Buyse M, Vangeneugden T, Bijnens L, Geys H, Renard D, Burzykowski T, Molenberghs G. Validation of biomarkers and surrogates for clinical endpoints. In: Bloom JC, Dean RA, editors. Biomarkers in Clinical Drug Development. Marcel Dekker; New York: 2003. pp. 149–168.
  • 2. Panageas KS, Ben-Porat L, Dickler MN, Chapman PB, Schrag D. When you look matters: The effect of assessment schedule on progression-free survival. J Natl Cancer Inst. 2007;99:428–432. doi: 10.1093/jnci/djk138.
  • 3. Karrison T, Maitland M, Stadler W, Ratain M. Design of phase II cancer trials using a continuous endpoint of change in tumor size: application to a study of sorafenib and erlotinib in non small-cell lung cancer. J Natl Cancer Inst. 2007;99:1455–61. doi: 10.1093/jnci/djm158.
  • 4. Adjei AA, Christian M, Ivy P. Novel designs and end points for phase II clinical trials. Clin Cancer Res. 2009;15:1866–1872. doi: 10.1158/1078-0432.CCR-08-2035.
  • 5. Wason J, Mander A, Eisen T. Reducing sample sizes in two-stage phase II cancer trials by using continuous tumour shrinkage end-points. Eur J Cancer. 2011;47:983–999. doi: 10.1016/j.ejca.2010.12.007.
  • 6. Slate EH, Clark LC. Using PSA to detect prostate cancer onset: An application of Bayesian retrospective and prospective changepoint identification. Journal of Educational and Behavioral Statistics. 1999;26:443–468.
  • 7. Pauler DK, Finkelstein DM. Predicting time to prostate cancer recurrence based on joint models for non-linear longitudinal biomarkers and event time outcomes. Statistics in Medicine. 2002;21:3897–3911. doi: 10.1002/sim.1392.
  • 8. Bellera CA, Hanley JA, Joseph L, Albertsen PC. Hierarchical changepoint models for biochemical markers illustrated by tracking post-radiotherapy Prostate-Specific Antigen series in men with prostate cancer. Ann Epidemiol. 2008;18:270–282. doi: 10.1016/j.annepidem.2007.10.006.
  • 9. Bellera CA, Hanley JA, Joseph L, Albertsen PC. A statistical evaluation of rules for biochemical failure after radiotherapy in men treated for prostate cancer. Int J Radiat Oncol Biol Phys. 2009;75:1357–1363. doi: 10.1016/j.ijrobp.2009.01.013.
  • 10. Pauler DK, Laird NM. A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics. 2000;56:464–472. doi: 10.1111/j.0006-341X.2000.00464.x.
  • 11. Hall CB, Ying J, Kuo L, Lipton RB. Bayesian and profile likelihood change point methods for modeling cognitive function over time. Computational Statistics and Data Analysis. 2003;42:91–109. doi: 10.1016/S0167-9473(02)00148-2.
  • 12. Kiuchi A, Hartigan J, Holford T, Rubinstein P, Stevens C. Change points in the series of T4 counts prior to AIDS. Biometrics. 1995;51:236–248. doi: 10.2307/2533329.
  • 13. Zhao L, Morgan MA, Parsels LA, Maybaum J, Lawrence TS, Normolle D. Bayesian hierarchical changepoint methods in modeling the tumor growth profiles in xenograft experiments. Clinical Cancer Research. 2010;17:1–7. doi: 10.1158/1078-0432.
  • 14. Skates SJ, Pauler DK, Jacobs IJ. Screening based on the risk of cancer calculation from Bayesian hierarchical changepoint and mixture models of longitudinal markers. Journal of the American Statistical Association. 2001;96:429–439. doi: 10.1198/016214501753168145.
  • 15. Garre FG, Zwinderman AH, Geskus RB, Sijpkens YWJ. A joint latent class changepoint model to improve the prediction of time to graft failure. J R Statist Soc A. 2008;171:299–308. doi: 10.1111/j.1467-985X.2007.00514.x.
  • 16. Zhao L, Banerjee M. Bayesian piecewise mixture model for racial disparity in prostate cancer. Computational Statistics and Data Analysis. 2012;56:362–369. doi: 10.1016/j.csda.2011.07.011.
  • 17. Neal RM. Slice sampling. The Annals of Statistics. 2003;31:705–767.
  • 18. Carlin BP, Gelfand AE, Smith AFM. Hierarchical Bayesian analysis of changepoint problems. Journal of the Royal Statistical Society Series C-Applied Statistics. 1992;41:389–405.
  • 19. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
  • 20. Andrews DWK. Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica. 2001;69:683–773. doi: 10.1111/1468-0262.00210.
  • 21. Demidenko E. Three endpoints of in vivo tumour radiobiology and their statistical estimation. Int J Radiat Biol. 2010;86:164–173. doi: 10.3109/09553000903419304.
  • 22. Wu J. Assessment of antitumor activity for tumor xenograft studies using exponential growth models. J Biopharm Stat. 2011;21:472–483. doi: 10.1080/10543406.2010.481802.
  • 23. Zangeneh SZ. PhD Thesis. University of Michigan; Ann Arbor: 2012. Model-based methods for robust finite population inference in the presence of external information.
  • 24. Wang Y, Sung C, Dartois C, Ramchandani R, Booth BP, Rock E, Gobburu J. Elucidation of relationship between tumor size and survival in non-small-cell lung cancer patients can aid early decision making in clinical drug development. Clin Pharmacol Ther. 2009;86:167–174. doi: 10.1038/clpt.2009.64.
  • 25. Stein W, Gulley J, Schlom J, Madan R, Dahut W, Figg W, Ning YM, Arlen P, Price D, Bates S, et al. Tumor regression and growth rates determined in five intramural NCI prostate cancer trials: the growth rate constant as an indicator of therapeutic efficacy. Clinical Cancer Research. 2011;17:907–17. doi: 10.1158/1078-0432.CCR-10-1762.
  • 26. Claret L, Girard P, Hoff PM, Van Cutsem E, Zuideveld KP, Jorga K, Fagerberg J, Bruno R. Model-based prediction of phase III overall survival in colorectal cancer on the basis of phase II tumor dynamics. J Clin Oncol. 2009;27:4103–4108. doi: 10.1200/JCO.2008.21.0807.
  • 27. Claret L, Lu JF, Sun YN, Bruno R. Development of a modeling framework to simulate efficacy endpoints for motesanib in patients with thyroid cancer. Cancer Chemother Pharmacol. 2010;66:1141–1149. doi: 10.1007/s00280-010-1449-z.
  • 28. Claret L, Gupta M, Han K, Joshi A, Sarapa N, He J, Powell B, Bruno R. Evaluation of tumor-size response metrics to predict overall survival in Western and Chinese patients with first-line metastatic colorectal cancer. J Clin Oncol. 2013;31:2110–2114. doi: 10.1200/JCO.2012.45.0973.
  • 29. Celeux G, Forbes F, Robert C, Titterington D. Deviance information criteria for missing data models. Bayesian Analysis. 2006;1:651–674. doi: 10.1214/06-BA122.
  • 30. Watanabe S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research. 2010;11:3571–3594.
  • 31. Ibrahim JG, Chen MH, Sinha D. Bayesian Survival Analysis. New York: Springer; 2001.
  • 32. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society: Series B. 2002;64:583–639. doi: 10.1111/1467-9868.00353.
  • 33. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC; 2013.
  • 34. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association. 1996;91:883–904. doi: 10.1080/01621459.1996.
  • 35. Haario H, Saksman E, Tamminen J. An adaptive Metropolis algorithm. Bernoulli. 2001;7:223–242.
