Abstract
It has been widely recognized that interim analyses of accumulating data in a clinical trial can inflate the type I error. Different methods, from group sequential boundaries to flexible alpha spending functions, have been developed to control the overall type I error at a pre-specified level. These methods mainly apply to testing the same endpoint across multiple interim analyses. In this paper, we consider a group sequential design with pre-planned endpoint switching after unblinded interim analyses. We extend the alpha spending function method to derive group sequential stopping boundaries when the parameters tested can differ between interim analyses, or between interim and final analyses.
Keywords: Alpha spending function, Switching endpoints, Stopping boundaries, Interim analyses, Group sequential trials
1 Introduction
It is fundamental to have clinical trials that are properly designed to answer specific scientific questions, such as whether a drug improves overall survival. Every trial design strives to answer such questions with as much robustness and accuracy as possible while involving the fewest patients, reasonable costs and the shortest duration. Methodology for group sequential clinical trials has been developed largely during the past few decades so that a trial can be stopped early if there is strong evidence of efficacy at any planned interim analysis. Pocock (1977) first proposed that the crossing boundary be constant for all equally spaced analyses. O’Brien and Fleming (1979) suggested that the crossing boundaries for the kth analysis, zc(k), be changed over the total number of analyses K such that zc(k)√(k/K) remains constant in k. In both procedures, the number of interim analyses and the timing of the interim analyses need to be pre-determined. The O’Brien-Fleming boundaries have been used more frequently because they preserve a nominal significance level at the final analysis that is close to that of a single test procedure.
The alpha spending function developed by Lan and DeMets (1983) is a more flexible group sequential procedure that requires neither the total number nor the exact timing of the interim analyses to be specified in advance. Earlier development of the alpha spending function was based on the assumption that the information increments between interim analyses are statistically independent. However, this assumption does not apply to longitudinal studies in a sequential test of slopes, for which the total information is unknown. Sequential analysis using the linear random-effects model suggested by Laird and Ware (1982) has been considered by Lee and DeMets (1991) and Wu and Lan (1992). There have been debates on whether the alpha spending function can still be used when the independent increments structure does not hold and the information fraction is unknown (Wei et al 1990, Su & Lachin 1992). It was argued by DeMets and Lan (1994) that the alpha spending function can still be used with a more complex correlation between successive test statistics. The key to using the alpha spending function is being able to define the information fraction. Tsiatis et al (1995) derived the joint distribution of sequentially computed score tests and maximum likelihood estimates for general parametric survival models subject to censoring and staggered entry, and showed that the joint distribution of the sequentially computed score tests for any single parameter has an asymptotically independent increments structure. Scharfstein et al (1997) further demonstrated that sequential Wald statistics behave like a standardized partial sum of independent normal variables and thus have independent increments.
A high degree of flexibility has been developed with respect to the timing of the analyses and how much type I error (α) to spend at each analysis. However, these methods mainly apply to testing the same endpoint or the same hypothesis in multiple analyses. Changing study endpoints is not uncommon in clinical trials. Chan et al (2004) compared published articles from 102 randomized clinical trials and reported that 62% of the trials had at least one primary endpoint that was changed. For example, the PROactive study concluded that pioglitazone, a peroxisome proliferator-activated receptor agonist, can significantly reduce macrovascular events in diabetes based on a composite endpoint of multiple macrovascular events and death (Dormandy et al 2006). However, by design, the primary endpoint was the time to occurrence of a new macrovascular event or death (Charbonnel et al 2004).
Changing the primary endpoint after an unblinded interim analysis has been a challenging statistical issue. The EMEA reflection paper finalized in 2007 (EMEA 2007) does not recommend any endpoint change after an unblinded interim analysis, although the language became more flexible in the final version compared to the draft versions. The recently published draft FDA guideline on adaptive designs (FDA 2010) recognizes the need for endpoint change in some cases. However, statistical procedures to control the type I error rate are not well developed in this setting. As the design of clinical trials becomes increasingly complex, being able to test different hypotheses while preserving the pre-specified significance level is an emerging need. In this paper, we consider combining a group sequential design with pre-planned endpoint switching after unblinded interim analyses. The major motivations are 1) to maximize the information at the interim analysis so that the interim analysis has a higher chance of success, and 2) to maximize the probability of success at the final analysis by switching endpoints.
Chen et al (2003) considered a switching endpoint design based on the log-rank statistic where mortality was used as the primary endpoint at the interim analysis while a composite endpoint was used as the primary endpoint at the final analysis. The rationale for the design is that a composite endpoint that includes mortality would have a higher event rate and thus higher power should the mortality outcome fail to show significance at the interim analysis. However, using a composite endpoint at the interim analysis may not be feasible, as regulatory agencies may be reluctant to terminate a trial early to allow an efficacy claim unless there are convincing results based on a more conventional outcome, such as mortality. Since how much type I error (α) to spend at an interim analysis and the timing of the interim analysis are flexible, one can allocate most of the α to the interim analysis so that there is a sufficiently high probability of terminating the trial early should there be a significant difference in mortality. If the mortality endpoint fails, switching to a composite endpoint can still be successful. The PROactive study described above (Dormandy et al 2006) was successful on a composite endpoint. As another example, the MERIT-HF study (MERIT-HF Study Group 1999) was designed to switch to a composite endpoint after an interim analysis based solely on mortality. The study was terminated early due to a convincing mortality outcome at the interim analysis.
The original motivation of this paper came from the design of a phase III trial in patients with glioblastoma multiforme (GBM). The investigator wished to design the study using progression-free survival (PFS) as the primary endpoint in the interim analysis, while using overall survival as the primary endpoint to be tested at the final analysis. A major consideration for the design is the low event rate for overall survival at the time of the interim analysis, whereas the PFS data are much more mature and have a much higher probability of crossing the stopping boundary. Goldman et al (2008), from the Southwest Oncology Group Statistical Center, recommended the use of an intermediate endpoint, such as PFS, for interim futility testing of phase III trials. Other authors have also explored the possibility of using an intermediate endpoint to shorten the timelines for drug approval (Scher et al 2009, Olmos et al 2009, Hallstrom 2009). Although regulatory agencies have not been very open to the idea of terminating a trial early based on intermediate endpoints, the development of targeted therapy and a better understanding of the relationship between biomarkers and disease progression may change the landscape of drug approval. In the future, more studies may be allowed to use an intermediate endpoint in an early interim analysis.
No paper has formally considered endpoint switching and the implied stopping boundaries in general. In this paper, we extend the alpha spending function methodology to derive stopping boundaries when our interest focuses on switching endpoints or parameters at different analysis times. Statistically, this is equivalent to testing different hypotheses at different interim analyses. The derivation is based on the joint distribution of the test statistics and the alpha spending function, so that the overall type I error α will be strictly preserved. After a brief review of the alpha spending function in Section 2, the newly derived stopping boundaries are compared in Section 3 to the boundaries without changing the parameters, using the Pocock-like and O’Brien-Fleming-like spending functions proposed by Lan and DeMets (1983). Applications to a bivariate survival model and a joint model of longitudinal and time-to-event data are discussed in Sections 4 and 5. We close the article with a discussion in Section 6.
2 Preliminaries: The Alpha Spending Function and Stopping Boundaries
Let T denote the scheduled end of the trial, and t* denote the fraction of information that has been observed at calendar time t (t ∈ [0, T]). Also let ik, k = 1, 2,…, K, denote the information available at the kth interim analysis at calendar time tk, so that t*k = ik/I, where I is the total information. Lan and DeMets (1983) specified an alpha spending function α(t*) such that α(0) = 0 and α(1) = α. Boundary values zc(k) can be determined successively so that
$$P_0\{Z(1) < z_c(1), \ldots, Z(k-1) < z_c(k-1),\ Z(k) \ge z_c(k)\} = \alpha(t_k^*) - \alpha(t_{k-1}^*), \tag{2.1}$$
where {Z(1),…, Z(k)} are the test statistics from interim analyses 1,…, k and P0 denotes probability under the null hypothesis; for two-sided tests, Z(·) in (2.1) is replaced by |Z(·)|.
Alpha spending functions that approximate O’Brien-Fleming or Pocock boundaries are as follows:

α1(t*) = 2 − 2Φ(zα/2/√t*)  (O’Brien-Fleming-like),
α2(t*) = α ln[1 + (e − 1)t*]  (Pocock-like),

where Φ denotes the standard normal cumulative distribution function. To solve for the boundary values zc(k), we need to obtain the joint distribution of {Z(1), Z(2),…, Z(k)}. In most cases, this distribution is asymptotically multivariate normal, and the covariance matrix Σ is simple when the test statistics involve the same parameter at each interim analysis. In particular,
Cov(Z(l), Z(k)) = σlk, l ≤ k,

where σlk is the lkth element of Σ.
If the information increments have an independent distributional structure, which is usually the case, σlk = √(nl/nk) for l ≤ k, where nl and nk are the numbers of subjects included in the lth and kth interim analyses. The derivation of zc(k) based on α(t*) is relatively straightforward with this covariance structure using the methods of Armitage et al (1969).
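As a small illustration, the sketch below evaluates the two spending functions α1(t*) and α2(t*) introduced above at five equally spaced information fractions in Python; the function names and the use of SciPy are our own choices for illustration and are not part of the original methodology.

```python
# Minimal sketch: cumulative alpha spent by the O'Brien-Fleming-like and Pocock-like
# spending functions at equally spaced looks (one-sided alpha = 0.025, as in Section 3).
import numpy as np
from scipy.stats import norm

def alpha1(t_star, alpha=0.025):
    """O'Brien-Fleming-like spending function: 2 - 2*Phi(z_{alpha/2} / sqrt(t*))."""
    return 2.0 - 2.0 * norm.cdf(norm.ppf(1.0 - alpha / 2.0) / np.sqrt(t_star))

def alpha2(t_star, alpha=0.025):
    """Pocock-like spending function: alpha * ln(1 + (e - 1) * t*)."""
    return alpha * np.log(1.0 + (np.e - 1.0) * t_star)

t_star = np.arange(1, 6) / 5.0           # five equally spaced analyses
print(np.round(alpha1(t_star), 6))       # cumulative alpha spent at each look
print(np.round(alpha2(t_star), 6))
```

Both functions spend the full α = 0.025 at t* = 1; the O’Brien-Fleming-like function spends very little α early, while the Pocock-like function spends it more evenly.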
3 Stopping Boundaries for Testing Different Parameters at the Interim and Final Analysis
3.1 Independent Information Increments
Classical group sequential methods assume that each patient contributes only one response, such as a 1-year response rate, so that the test statistics at successive interim analyses have independent increments. In this Section, we expand upon the classical group sequential methods to derive stopping boundaries for testing different parameters.
As when testing the same hypothesis, boundary values zc(k) can still be determined successively from equation (2.1) when testing different hypotheses. Because the alpha spending functions satisfy α1(1) = α2(1) = α, the overall type I error will be ≤ α with the set of crossing boundaries zc(k) determined from equation (2.1). The joint distribution of the test statistics {Z(1),…, Z(k)} still asymptotically follows a multivariate normal distribution. In general, if different interim analyses involve different parameters, the covariance structure is unknown, and therefore we cannot obtain the asymptotic joint distribution of {Z(1), Z(2),…, Z(k)}; deriving zc(k) is then problematic. When the parameters tested at the interim analysis and at the final analysis come from the same likelihood function, however, the covariance is known and can be computed from the Fisher information matrix.
To make our ideas clear, let β1 denote the parameter to be tested at the lth interim analysis, and let β2 be the parameter to be tested at the kth interim analysis. The null hypotheses are H0: β1 = β01 for testing β1 and H0: β2 = β02 for testing β2. Let lk denote the log-likelihood at the kth analysis from nk independent samples, lk = log(L(β1, β2|ynk)). Similarly, let ll = log(L(β1, β2|ynl)) and lk−l = log(L(β1, β2|ynk−nl)) denote the log-likelihoods based on the first nl samples and on the nk − nl incremental samples, respectively. Further assume that Z(l) and Z(k) are the score statistics at the lth and kth interim analyses, and that the information accumulated between interim analyses is independent. Define Sl and Sk to be the score functions for β1 and β2 based on ll and lk, evaluated at β01 and β02, respectively, so that Z(l) = Sl/√Var(Sl) and Z(k) = Sk/√Var(Sk). It can be shown that

$$\mathrm{Cov}\big(Z(l), Z(k)\big) = \sqrt{\frac{n_l}{n_k}}\;\frac{I_{12}(\beta_{01}, \beta_{02})}{\sqrt{I_{11}(\beta_{01})\,I_{22}(\beta_{02})}}, \tag{3.1}$$
where I12(β01, β02) is the off-diagonal element, and I11(β01) and I22(β02) are the diagonal elements, of the Fisher information matrix for (β1, β2) based on a single observation. Note that, under the independent information increment assumption, nl/nk is the information fraction between the lth and kth analyses. Therefore, when we test different hypotheses at different interim analyses, the stopping boundaries depend not only on the information fraction but also on the information matrix of the two parameters under H0. Thus, there is not one set of stopping boundaries that can be used for all likelihood functions or all parameters; investigators must derive their own stopping boundaries for their particular study design.
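To make the origin of (3.1) explicit, the following short derivation sketches the argument under the independent information increment assumption, using the score statistics Sl and Sk and the per-subject information elements defined above:

$$
\begin{aligned}
\mathrm{Var}(S_l) &= n_l\,I_{11}(\beta_{01}), \qquad \mathrm{Var}(S_k) = n_k\,I_{22}(\beta_{02}),\\
\mathrm{Cov}(S_l, S_k) &= n_l\,I_{12}(\beta_{01}, \beta_{02}) \quad\text{(only the first } n_l \text{ subjects contribute to both scores)},\\
\mathrm{Cov}\big(Z(l), Z(k)\big) &= \frac{n_l\,I_{12}(\beta_{01}, \beta_{02})}{\sqrt{n_l\,I_{11}(\beta_{01})\,n_k\,I_{22}(\beta_{02})}}
= \sqrt{\frac{n_l}{n_k}}\,\frac{I_{12}(\beta_{01}, \beta_{02})}{\sqrt{I_{11}(\beta_{01})\,I_{22}(\beta_{02})}}.
\end{aligned}
$$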
Letting w = I12(β01, β02)/√(I11(β01) I22(β02)), it can be shown that

w = Corr(∂l/∂β1, ∂l/∂β2) evaluated at (β01, β02),

where l denotes the log-likelihood based on a sample of size 1. Thus, w is the correlation coefficient (Corr) of the score function, and |w| ≤ 1. Since the covariance matrix Σ of the test statistics (Z(1), Z(2),…, Z(k)) is positive definite, the value of w is also bounded below by a number that is ≥ −1. When we test the same endpoint at the lth and the kth interim analyses, w = 1. It should be noted that the covariance in (3.1) is derived under the global hypothesis β1 = β01 and β2 = β02. However, the score statistic Z(l) at the lth analysis only tests the hypothesis β1 = β01, and Z(k) at the kth analysis only tests the hypothesis β2 = β02; the other parameter is replaced by its maximum likelihood estimate when calculating the score statistics. If we reject the null hypothesis at the lth interim analysis, the covariance of Z(l) and Z(k) is irrelevant, as Z(k) will not be calculated. If we cannot reject the null hypothesis at the lth analysis, it is reasonable to derive the covariance under the global hypothesis on β1 and β2. Therefore, the false positive rate is controlled under the global hypothesis.
We next calculate stopping boundaries under different assumed values of w. In Table 1, we compare the boundaries computed from α1(t*) and α2(t*), the O’Brien-Fleming-like and Pocock-like alpha spending functions proposed by Lan & DeMets (1983). The comparison is made for a one-sided test with significance level α = 0.025 and K = 5 equally spaced analyses (t*j = j/5, j = 1,…, 5), where the test parameter is β1 for j = 1, 2 and β2 for j = 3, 4, 5.
Table 1. Stopping boundaries zc(k) for a one-sided α = 0.025 test with K = 5 equally spaced analyses, testing β1 at analyses 1–2 and β2 at analyses 3–5, for different values of w. The first five boundary columns use the O’Brien-Fleming-like spending function α1(t*); the last five use the Pocock-like spending function α2(t*).

| w | α1(t*): zc(1) | zc(2) | zc(3) | zc(4) | zc(5) | α2(t*): zc(1) | zc(2) | zc(3) | zc(4) | zc(5) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.88 | 3.36 | 2.68 | 2.29 | 2.03 | 2.44 | 2.42 | 2.41 | 2.40 | 2.39 |
| 0.8 | 4.88 | 3.36 | 2.69 | 2.29 | 2.03 | 2.44 | 2.42 | 2.50 | 2.43 | 2.42 |
| 0.5 | 4.88 | 3.36 | 2.70 | 2.30 | 2.03 | 2.44 | 2.42 | 2.57 | 2.46 | 2.44 |
| 0 | 4.88 | 3.36 | 2.70 | 2.30 | 2.03 | 2.44 | 2.42 | 2.60 | 2.50 | 2.45 |
| −0.5 | 4.88 | 3.36 | 2.70 | 2.30 | 2.03 | 2.44 | 2.42 | 2.60 | 2.50 | 2.45 |
| −0.7 | 4.88 | 3.36 | 2.70 | 2.30 | 2.03 | 2.44 | 2.42 | 2.60 | 2.50 | 2.45 |
Note that between the analyses within j = 1, 2 and within j = 3, 4, 5 (which test the same parameter), the value of w is still 1. The covariance matrix for (Z(1),…, Z(5)) is

$$\Sigma = \begin{pmatrix} 1 & \sqrt{1/2} & w\sqrt{1/3} & w\sqrt{1/4} & w\sqrt{1/5} \\ \sqrt{1/2} & 1 & w\sqrt{2/3} & w\sqrt{2/4} & w\sqrt{2/5} \\ w\sqrt{1/3} & w\sqrt{2/3} & 1 & \sqrt{3/4} & \sqrt{3/5} \\ w\sqrt{1/4} & w\sqrt{2/4} & \sqrt{3/4} & 1 & \sqrt{4/5} \\ w\sqrt{1/5} & w\sqrt{2/5} & \sqrt{3/5} & \sqrt{4/5} & 1 \end{pmatrix}, \tag{3.2}$$
where w ≠ 1 appears in the off-diagonal blocks. We can see that the covariance matrix can be partitioned into four sub-matrices: two diagonal blocks corresponding to analyses that test the same parameter, and two off-diagonal blocks whose elements contain the factor w. Solving for (zc(1),…, zc(K)) in equation (2.1) requires numerical integration. The quadrature method of Armitage et al (1969) cannot be applied here with this covariance structure, since the statistics in the sequential procedure are not all based on the same parameter. Here, we used the adaptive integration method of Genz (1992) to evaluate zc(k). Compared to a Monte Carlo algorithm and the subregion adaptive algorithm, the adaptive integration method of Genz (1992) reliably computes multivariate normal probabilities with as many as ten variables in a few seconds. When we compare our boundaries to a group sequential procedure that does not change parameters at different interim analyses, the boundaries are very close when the alpha spending function is α1(t*). However, the boundaries are substantially different when the alpha spending function is α2(t*).
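As a concrete illustration of this computation, the sketch below solves the successive exit-probability conditions in (2.1) for one-sided boundaries under the covariance structure (3.2), using SciPy's multivariate normal CDF (which is based on a Genz-type algorithm) together with a root finder. The function names, the bracketing interval, and the substitution of SciPy for the authors' original implementation are our own assumptions, and the default integration tolerances may need tightening for very small spending increments.

```python
# Sketch: one-sided stopping boundaries satisfying (2.1) for the covariance (3.2),
# with an endpoint switch after a chosen analysis and equally spaced looks.
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.optimize import brentq

def cov_matrix(K, switch_after, w):
    """Covariance (3.2): sqrt(l/k) within a block, w*sqrt(l/k) across blocks."""
    sigma = np.eye(K)
    for l in range(1, K + 1):
        for k in range(l + 1, K + 1):
            rho = np.sqrt(l / k)
            if l <= switch_after < k:       # the two analyses test different parameters
                rho *= w
            sigma[l - 1, k - 1] = sigma[k - 1, l - 1] = rho
    return sigma

def obf_spend(t, alpha=0.025):
    """O'Brien-Fleming-like spending function alpha1(t*)."""
    return 2.0 - 2.0 * norm.cdf(norm.ppf(1.0 - alpha / 2.0) / np.sqrt(t))

def boundaries(K=5, switch_after=2, w=0.5, alpha=0.025, spend=obf_spend):
    """Solve (2.1) successively for z_c(1), ..., z_c(K)."""
    sigma = cov_matrix(K, switch_after, w)
    t = np.arange(1, K + 1) / K
    bounds, spent = [], 0.0
    for k in range(1, K + 1):
        target = spend(t[k - 1], alpha) - spent          # alpha to spend at this look

        def exit_prob(c):
            # P{Z(1) < z_c(1), ..., Z(k-1) < z_c(k-1), Z(k) >= c} - target
            upper = np.array(bounds + [c])
            p_all = multivariate_normal(np.zeros(k), sigma[:k, :k]).cdf(upper)
            p_prev = (multivariate_normal(np.zeros(k - 1), sigma[:k - 1, :k - 1]).cdf(np.array(bounds))
                      if k > 1 else 1.0)
            return (p_prev - p_all) - target

        bounds.append(brentq(exit_prob, 1.0, 8.0))
        spent = spend(t[k - 1], alpha)
    return bounds

print(np.round(boundaries(w=0.5), 2))    # should roughly reproduce the w = 0.5 row of Table 1
```

Supplying a Pocock-like spending function in place of obf_spend should similarly reproduce the right-hand columns of Table 1, up to numerical integration error.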
We next consider a scenario where we change the parameter only at the final (5th) analysis. The stopping boundaries for the first four analyses are then the same as the ones in Lan and DeMets (1983), and the 5th boundary was calculated for different values of w (Table 2). The final boundary is substantially different from the Lan-DeMets boundary. This shows that when little α has been spent prior to the endpoint change, as in early interim analyses using α1(t*), the impact on the stopping boundaries is small (a smaller penalty); the more α that has been spent prior to the endpoint change, the larger the impact on the boundaries. In Table 3, we compare the boundaries computed from α1(t*) and α2(t*) for a one-sided α = 0.025 test with K = 2 and t*1 = 0.5.
Table 2. The final (5th) stopping boundary zc(5) when the test parameter is changed only at the final analysis, for different values of w; the boundaries for the first four analyses are as in Lan and DeMets (1983).

| Alpha Spending Function | w = 1 | w = 0.8 | w = 0.5 | w = 0 | w = −0.5 | w = −0.7 |
|---|---|---|---|---|---|---|
| O’Brien-Fleming-like, α1(t*) | 2.03 | 2.13 | 2.19 | 2.23 | 2.23 | 2.23 |
| Pocock-like, α2(t*) | 2.39 | 2.54 | 2.64 | 2.70 | 2.70 | 2.70 |
Table 3. Stopping boundaries zc(k) for a one-sided α = 0.025 test with K = 2 analyses and an endpoint switch at the final analysis, for different values of w. The first two boundary columns use α1(t*); the last two use α2(t*).

| w | α1(t*): zc(1) | zc(2) | α2(t*): zc(1) | zc(2) |
|---|---|---|---|---|
| 1 | 2.96 | 1.97 | 2.16 | 2.20 |
| 0.8 | 2.96 | 1.98 | 2.16 | 2.25 |
| 0.5 | 2.96 | 1.98 | 2.16 | 2.30 |
| 0 | 2.96 | 1.99 | 2.16 | 2.34 |
| −0.5 | 2.96 | 1.99 | 2.16 | 2.34 |
| −0.8 | 2.96 | 1.99 | 2.16 | 2.34 |
| −1 | 2.96 | 1.99 | 2.16 | 2.34 |
When the test parameter is changed more than once, the covariance matrix for (Z(1),…, Z(k)) will have more elements where w ≠ 1. For example, when K = 5, if we test β1 for j = 1, 2, test β2 for j = 3, 4, and test a third parameter β3 for j = 5, the covariance matrix in (3.2) becomes

$$\Sigma = \begin{pmatrix} 1 & \sqrt{1/2} & w_{12}\sqrt{1/3} & w_{12}\sqrt{1/4} & w_{13}\sqrt{1/5} \\ \sqrt{1/2} & 1 & w_{12}\sqrt{2/3} & w_{12}\sqrt{2/4} & w_{13}\sqrt{2/5} \\ w_{12}\sqrt{1/3} & w_{12}\sqrt{2/3} & 1 & \sqrt{3/4} & w_{23}\sqrt{3/5} \\ w_{12}\sqrt{1/4} & w_{12}\sqrt{2/4} & \sqrt{3/4} & 1 & w_{23}\sqrt{4/5} \\ w_{13}\sqrt{1/5} & w_{13}\sqrt{2/5} & w_{23}\sqrt{3/5} & w_{23}\sqrt{4/5} & 1 \end{pmatrix},$$

where w12, w13 and w23 are the correlation coefficients of the score functions for (β1, β2), (β1, β3) and (β2, β3), respectively, defined analogously to w in (3.1). As a result, the critical values will be larger.
The value of w is determined by the Fisher information matrix. However, when the likelihood function is complex, the Fisher information matrix can be difficult to obtain. In this case, researchers can use more conservative boundaries by assuming a smaller w so that the overall type I error rate will be ≤ α. The observed information matrix can also be used for the calculation. Since w = 1 when the test parameter is the same, the critical values for the interim analyses prior to the endpoint switch can be calculated without the information matrix. The observed information matrix based on data collected prior to the endpoint switch can then be used to calculate the critical values at or after the switch. The number of interim analyses and the timing of the switch should depend on the duration and the size of the study, as well as the probability of success at the interim or final analyses. To avoid extra penalties, we recommend that the switch not be done at a late-stage interim analysis.
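As a small illustration of this recommendation, the snippet below computes w from a hypothetical 2×2 block of the observed information for (β1, β2); the numbers are placeholders and do not come from any real study.

```python
# Sketch: estimating w from the observed information block for (beta1, beta2),
# using a hypothetical matrix [[I11, I12], [I12, I22]] computed from data
# collected before the endpoint switch.
import numpy as np

I_obs = np.array([[25.0, -4.0],
                  [-4.0, 16.0]])

w_hat = I_obs[0, 1] / np.sqrt(I_obs[0, 0] * I_obs[1, 1])
# Boundaries for w < 0 coincide with those for w = 0 (Tables 1-3), so a negative
# estimate can simply be replaced by 0 when computing the stopping boundaries.
w_used = max(w_hat, 0.0)
print(round(w_hat, 3), round(w_used, 3))
```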
Although w can be less than 0, there appears to be no further impact on the stopping boundaries in the scenarios presented above. For a given alpha spending function, solving equation (2.1) sequentially involves finding the smallest critical values, in sequential order, such that the tail probability is no larger than the value defined on the right-hand side of the equation. For a fixed set of critical values in the region of interest, the tail probability increases as w decreases, with negligible increases beyond w = 0. Thus, solving equation (2.1) results in smaller critical values (a smaller penalty) when w is larger. Recall that w is the correlation coefficient of the two efficient scores ∂l/∂β1 and ∂l/∂β2, and the sign of w is determined by the off-diagonal element of the Fisher information matrix, which in turn is determined by the covariance of β1 and β2. When w < 0, the covariance of β1 and β2 is positive, and the penalty for switching endpoints is higher.
In this Section, we have provided stopping boundaries for different values of w. For convenience and easy comparison to classical stopping boundaries, we have assumed that the analyses are equally spaced in information. When they are not equally spaced, we simply replace √(1/2), √(1/3), etc. in the covariance matrix (3.2) with √(n1/n2), √(n1/n3), etc., where n1, n2 and n3 are the sample sizes at the 1st, 2nd and 3rd analyses. The numerical method discussed above can still be used to derive the stopping boundaries.
3.2 Non-independent Information Increments
There are scenarios where completely independent increments of information cannot be assumed. Classical group sequential designs have similar issues, and there is a large literature devoted to this topic. Many authors have argued that the sequentially computed statistics have an asymptotically independent increments structure under the usual conditions (Tsiatis et al 1995, Scharfstein et al 1997). In Section 3.1, we showed that
$$\mathrm{Cov}\big(Z(l), Z(k)\big) = \frac{\mathrm{Cov}(S_l, S_k)}{\sqrt{\mathrm{Var}(S_l)\,\mathrm{Var}(S_k)}}. \tag{3.3}$$

If the sequentially computed score statistics have an asymptotically independent increments structure, the information fraction between the lth analysis and the kth analysis will be asymptotically nl/nk, and the covariance structure we discussed in Section 3.1 can be applied; this is the case, for example, in time-to-event analyses. Assuming independent increments is more controversial in other settings, such as sequential testing of the slope in longitudinal studies (Wei et al 1990, Su & Lachin 1992, DeMets & Lan 1994), where the total information is unknown. When the sequentially computed score statistics are not independent or asymptotically independent, (3.3) can still be expressed as Corr(Sl, Sk), and the absolute value of the correlation coefficient (Corr) of Sl and Sk is still bounded by 1. However, Corr(Sl, Sk) reflects both the additional contribution of the nl subjects to the kth analysis and the correlation coefficient of the score function. We re-write Corr(Sl, Sk) as

Corr(Sl, Sk) = √Flk · w,

where w is the correlation coefficient of the score function, as defined in Section 3.1, and Flk is the information fraction between the lth and kth analyses. This information fraction includes information increments both between subjects and within subjects. It follows that the total information estimation methods suggested for classical group sequential designs (Lan & Zucker 1993, Lan et al 1994) can be applied in this setting. As discussed in Section 3.1, a larger w results in smaller stopping boundaries, and there is no further impact on the boundaries when w < 0 compared to w = 0. Therefore, when w ≤ 0, an accurate estimate of the information fraction is not needed.
4 Application to a Bivariate Survival Model
In this section, we present an application to a bivariate survival model where we are interested in testing whether time-to-event A, or the first recurrence time, is associated with treatment (H0: β1 = 0) at an earlier interim analysis, but change to testing whether time-to-event B, or the second recurrence time, is associated with treatment (H0: β2 = 0) at a later interim or final analysis.
We assume that the event time for the ith subject and the jth event type (i = 1,…, N, j = 1, 2) follows a Weibull distribution with shape parameter ν and frailty ψi. Thus, the hazard function of the event time Tij of the ith subject and the jth event type is

$$h_{ij}(t) = \psi_i\,\nu\,t^{\nu-1}\exp(\beta_0 + \beta_j x_{ij}),$$

where xij denotes the explanatory variable for subject i and the jth event type, and β0 and βj are the intercept and the coefficient of the explanatory variable, respectively. We consider a model with gamma frailty ψi, so that ψi ~ Gamma(1/θ, 1/θ), with mean 1 and variance θ. Conditional on ψi, the survival times are assumed to be independent. Thus, the observed-data likelihood function is given by
$$L(\theta, \nu, \beta_0, \beta_1, \beta_2) = \prod_{i=1}^{N}\int_0^{\infty}\prod_{j=1}^{2}\Big[\psi_i\,\nu\,t_{ij}^{\nu-1}e^{\beta_0+\beta_j x_{ij}}\Big]^{C_{ij}}\exp\Big\{-\psi_i\,t_{ij}^{\nu}e^{\beta_0+\beta_j x_{ij}}\Big\}\,g(\psi_i)\,d\psi_i, \tag{4.1}$$

where Cij is the right-censoring indicator (equal to 0 for right censoring, 1 otherwise), tij denotes the observed event or censoring time for subject i and the jth event type, and g(·) is the Gamma(1/θ, 1/θ) density of the frailty. After ψi is integrated out in (4.1), the observed-data likelihood is given by

$$L(\theta, \nu, \beta_0, \beta_1, \beta_2) = \prod_{i=1}^{N}\Bigg[\prod_{j=1}^{2}\Big(\nu\,t_{ij}^{\nu-1}e^{\beta_0+\beta_j x_{ij}}\Big)^{C_{ij}}\Bigg]\frac{\Gamma(1/\theta + d_i)\,\theta^{d_i}}{\Gamma(1/\theta)}\big(1 + \theta A_i\big)^{-(1/\theta + d_i)}, \tag{4.2}$$

where $d_i = C_{i1} + C_{i2}$ is the number of events for subject i and $A_i = \sum_{j=1}^{2} t_{ij}^{\nu}\,e^{\beta_0+\beta_j x_{ij}}$.
For ease of exposition, let treatment be the only explanatory variable, so that xi = xi1 = xi2 in this particular setting; in most confirmatory clinical trials there are only two treatment arms. Based on the likelihood function (4.2) and using reference cell coding for convenience (xi ∈ {0, 1}), w can be obtained from the Fisher information matrix for (β1, β2) derived from l(β) = ln L(θ, β). The quantities E[b(t1)b(t2)] and E[b(t1) − b²(t1)] that arise in this calculation can be solved numerically, since the joint density of the survival times (t1, t2) follows from (4.2) with Ci1 = Ci2 = 1. However, regardless of the choice of θ, ν, or β0, the value of w is negative. As discussed in Section 3, the boundaries will then be the same as the boundaries for the case w = 0, and further calculation of w in this case is not necessary.
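For completeness, the following sketch evaluates the marginal log-likelihood (4.2) under the parameterization above; the observed information, and hence w, could be obtained from it by numerical differentiation if desired. The function name and data layout are illustrative.

```python
# Sketch: marginal log-likelihood (4.2) of the Weibull model with gamma frailty
# (psi_i ~ Gamma(1/theta, 1/theta); hazard psi_i * nu * t^(nu-1) * exp(b0 + b_j * x_ij)).
import numpy as np
from scipy.special import gammaln

def frailty_loglik(params, t, c, x):
    """params = (theta, nu, b0, b1, b2); t, c, x are (N, 2) arrays holding the
    event/censoring times, event indicators, and covariates for the two event types."""
    theta, nu, b0, b1, b2 = params
    eta = b0 + np.column_stack([b1 * x[:, 0], b2 * x[:, 1]])   # linear predictors
    log_haz = np.log(nu) + (nu - 1.0) * np.log(t) + eta         # log hazard at t_ij, excluding the frailty
    cum_haz = t ** nu * np.exp(eta)                             # cumulative hazard per event type, excluding the frailty
    d = c.sum(axis=1)                                           # number of events per subject
    A = cum_haz.sum(axis=1)
    ll = ((c * log_haz).sum(axis=1)
          + gammaln(1.0 / theta + d) - gammaln(1.0 / theta) + d * np.log(theta)
          - (1.0 / theta + d) * np.log(1.0 + theta * A))
    return ll.sum()
```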
5 Application to Joint Modeling of Longitudinal and Time-to-Event Data
Most time-to-event studies also collect repeated measurements of potential biomarkers. A powerful method to take into account the dependency between time-to-event data and repeated measurements of biomarkers is joint modeling of these two data types (Wulfsohn & Tsiatis 1997, Henderson et al 2000, Tsiatis & Davidian 2004). Applications of joint models to studying surrogate endpoints were discussed in particular by Taylor and Wang (2002). It has been demonstrated through simulation studies that the use of joint modeling corrects biases and improves efficiency (Hsieh et al 2006, Chen et al 2011, Ibrahim et al 2010). Since joint models contain multiple parameters that may be related to the treatment effect in a joint likelihood, this modeling situation presents a unique opportunity to test different parameters at different interim analyses. Although joint models have not been widely accepted as primary analysis models, we believe that there is great potential in this type of modeling and hence provide another demonstration of the advantages of the proposed methodology in this Section.
5.1 Notation
For subject i (i = 1,…, N), let Ti and Ci denote the event and censoring times, respectively; Si = min(Ti, Ci) and Δi = I(Ti ≤ Ci). Let Zi be a treatment indicator, and let Xi(u) be the longitudinal process (also referred to as the trajectory) at time u ≥ 0. In a more general sense, Zi can be a q-dimensional vector of baseline covariates including treatment; to simplify the notation, Zi denotes the binary treatment indicator in this section. Values of Xi(u) are measured intermittently at times tij ≤ Si, j = 1,…, mi, for subject i. Let Yi(tij) denote the observed value of Xi(tij) at time tij, which may be subject to measurement error.
The joint modeling approach links two sub-models, one for the longitudinal process Xi(u) and one for the event time Ti, by including the trajectory in the hazard function of Ti (Taylor & Wang 2002, Chen et al 2011). Thus,
$$\lambda_i(u) = \lambda_0(u)\exp\{\beta X_i(u) + \xi Z_i\}, \tag{5.1}$$
where β measures the degree of association between the longitudinal marker and the time-to-event, and ξ is the direct treatment effect on survival. A general polynomial linear model is frequently used to model the trajectory function of the longitudinal data (Wulfsohn & Tsiatis 1997, Ibrahim et al 2004), given by

$$X_i(u) = \theta_{0i} + \theta_{1i}u + \cdots + \theta_{pi}u^{p} + \gamma Z_i, \tag{5.2}$$
where θi = {θ0i, θ1i,…, θpi}T is distributed as multivariate normal with mean μθ and variance-covariance matrix Σθ. The parameter γ is a fixed treatment effect on the longitudinal marker, which may also have an indirect effect on survival. The observed longitudinal measures are modeled as Yi(tij) = Xi(tij) + eij, where eij ~ N(0, σe²), the θi’s and the eij’s are independent, and Cov(eij, eij′) = 0 for j ≠ j′. The observed-data likelihood function for subject i is given by
$$L_i = \int f(S_i, \Delta_i \mid \theta_i, \beta, \gamma, \xi)\Bigg[\prod_{j=1}^{m_i} f\big(Y_i(t_{ij}) \mid \theta_i, \gamma, \sigma_e^2\big)\Bigg] f(\theta_i \mid \mu_\theta, \Sigma_\theta)\,d\theta_i. \tag{5.3}$$
In expression (5.3), the density function for the time-to-event, f(Si, Δi|θi, β, γ, ξ), can be based on any model, such as the Weibull model, exponential model, or piecewise exponential model.
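To make the structure of (5.3) concrete, here is a minimal Monte Carlo sketch of the observed-data likelihood contribution of a single subject, assuming an exponential baseline hazard, the linear trajectory Xi(t) = θ0i + θ1it + γZi, and normal measurement error; all names and parameter values in the example call are illustrative.

```python
# Sketch: Monte Carlo evaluation of the observed-data likelihood (5.3) for one subject,
# integrating over the random effects (theta0, theta1).
import numpy as np

rng = np.random.default_rng(1)

def subject_likelihood(S, delta, Z, t_obs, Y, beta, gamma, xi, lam0,
                       mu_theta, Sigma_theta, sigma_e, M=20000):
    theta = rng.multivariate_normal(mu_theta, Sigma_theta, size=M)    # draws of (theta0, theta1)
    traj = theta[:, :1] + theta[:, 1:] * t_obs[None, :] + gamma * Z   # X_i at the measurement times
    f_long = np.exp(-0.5 * ((Y[None, :] - traj) / sigma_e) ** 2) / (sigma_e * np.sqrt(2 * np.pi))
    f_long = f_long.prod(axis=1)                                      # product over the m_i measurements
    # survival part: hazard lam0*exp(beta*X_i(t) + xi*Z) has a closed-form cumulative hazard
    a = lam0 * np.exp(beta * (theta[:, 0] + gamma * Z) + xi * Z)
    b = beta * theta[:, 1]
    with np.errstate(divide="ignore", invalid="ignore"):
        cum_haz = np.where(b != 0.0, a * np.expm1(b * S) / b, a * S)
    f_surv = (a * np.exp(b * S)) ** delta * np.exp(-cum_haz)
    return np.mean(f_long * f_surv)                                   # Monte Carlo integral over theta_i

# illustrative call with made-up data for one subject
print(subject_likelihood(S=1.2, delta=1, Z=1, t_obs=np.array([0.0, 0.5, 1.0]),
                         Y=np.array([0.3, 0.5, 0.8]), beta=0.2, gamma=0.25, xi=0.05,
                         lam0=0.85, mu_theta=np.array([0.0, 0.3]),
                         Sigma_theta=np.diag([0.09, 0.04]), sigma_e=0.3))
```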
5.2 Motivation for Testing Different Parameters at Different Interim Analyses in Joint Models
Based on the hazard function (5.1) and the trajectory model (5.2), we can see that the overall treatment effect is βγ + ξ, where γ is the treatment effect on the longitudinal marker, ξ is the direct treatment effect on the time-to-event, and β is the association between the longitudinal marker and the time-to-event. The parameter β can also be viewed as measuring the degree of “surrogacy” between the longitudinal marker and the time-to-event. It was suggested by Taylor and Wang (2002) that the quantity βγ/(βγ + ξ) represents the proportion of the overall treatment effect on survival that acts through the longitudinal marker, which is a measure of surrogacy suggested by Freedman et al (1992). If Yij is a good surrogate, the values of β and γ will be relatively large compared to the value of ξ.
In the case of a real surrogate, directly testing the treatment effect may require substantially more subjects and may take longer to observe enough events. A natural question is whether we can test β and γ jointly. If β ≠ 0 and γ ≠ 0, then βγ ≠ 0, and if βγ and ξ have the same sign, which is typically the case, the overall treatment effect βγ + ξ ≠ 0. Simulations were carried out to examine the power of testing β, γ and βγ + ξ from the joint model (5.3). We simulated 1000 datasets and each dataset had 200 subjects (100 subjects per treatment group). Power was determined as the % of datasets with a p-value ≤ 0.05 from the score test for testing
$$H_0: \beta = 0 \ \text{or} \ \gamma = 0 \tag{5.4}$$

versus

$$H_0: \beta\gamma + \xi = 0. \tag{5.5}$$
Rejecting H0 in (5.4) implies rejecting H0 in (5.5) unless the direct treatment effect on the time-to-event ξ has a completely opposite effect compared to βγ. Table 4 shows substantial power advantages for testing β and γ jointly instead of testing βγ + ξ alone, especially when the size of ξ is relatively small.
Table 4. Simulated power for jointly testing H0: β = 0 or γ = 0 versus testing H0: βγ + ξ = 0 (1000 simulated datasets, 200 subjects each).

| β | γ | ξ | β Estimate (SE) | γ Estimate (SE) | Power (H0: β = 0 or γ = 0) | βγ + ξ Estimate (SE) | Power (H0: βγ + ξ = 0) |
|---|---|---|---|---|---|---|---|
| 0.2 | 0.25 | 0.05 | 0.210 (0.056) | 0.250 (0.130) | 46.6% | 0.095 (0.170) | 9.7% |
| 0.2 | 0.15 | 0.15 | 0.210 (0.056) | 0.149 (0.130) | 25.6% | 0.176 (0.168) | 18.8% |
The event time was simulated from an exponential model with hazard λi(t) = λ0 exp{βXi(t) + ξZi}, where Xi(t) = θ0i + θ1it + γZi and λ0 = 0.85. To ensure a minimum follow-up time of 0.75 years (9 months) and a maximum follow-up time of 2 years, right censoring was generated from a uniform [0.75, 2] distribution. The (θ0i, θ1i) were assumed to follow a bivariate normal distribution with a specified mean vector and variance-covariance matrix.
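For readers who wish to reproduce a simulation of this type, the sketch below generates one dataset under the scheme described above. Because the random-effects mean and covariance used by the authors are not reproduced here, the values in the code are placeholders.

```python
# Sketch: generating one dataset under the simulation design described above
# (hazard lam0*exp(beta*X_i(t) + xi*Z_i), linear trajectory, censoring ~ Uniform[0.75, 2]).
# The random-effects parameters are placeholders, not the authors' values.
import numpy as np

rng = np.random.default_rng(7)

def simulate_trial(n=200, beta=0.2, gamma=0.25, xi=0.05, lam0=0.85,
                   mu_theta=(0.0, 0.3), sd_theta=(0.3, 0.2), corr=0.2):
    Z = np.repeat([0, 1], n // 2)                                  # 100 subjects per arm
    cov01 = corr * sd_theta[0] * sd_theta[1]
    Sigma = np.array([[sd_theta[0] ** 2, cov01], [cov01, sd_theta[1] ** 2]])
    theta = rng.multivariate_normal(mu_theta, Sigma, size=n)       # (theta0i, theta1i)
    a = lam0 * np.exp(beta * (theta[:, 0] + gamma * Z) + xi * Z)   # hazard scale at t = 0
    b = beta * theta[:, 1]                                         # exponential time trend in the hazard
    E = -np.log(rng.uniform(size=n))                               # unit exponential variates
    # invert the cumulative hazard a*(exp(b*t) - 1)/b = E; no event occurs if the argument is <= 0
    arg = 1.0 + b * E / a
    with np.errstate(invalid="ignore", divide="ignore"):
        T = np.where(arg > 0.0, np.log(np.abs(arg)) / b, np.inf)
    C = rng.uniform(0.75, 2.0, size=n)                             # censoring, as described in the text
    S, delta = np.minimum(T, C), (T <= C).astype(int)
    return S, delta, Z, theta

S, delta, Z, theta = simulate_trial()
print(delta.mean())     # observed event proportion in the simulated dataset
```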
Rejecting the null hypothesis in (5.4) shows that there is a treatment effect on survival through the biomarker. This is different from the idea of switching endpoints, because the endpoint is still survival. The number of events required to reject (5.4) is smaller than the number required to reject (5.5), which provides a higher probability of success at the interim analysis. This is more evident when the marker is a good surrogate for survival.
5.3 Stopping Boundaries in a Hypothetical Design
Let φ = βγ + ξ. The likelihood function (5.3) can be reparameterized in terms of β, γ and φ by replacing ξ with φ − βγ. Let Zβ(l), Zγ(l) and Zφ(k) denote the score test statistics for β and γ at the lth analysis and for φ at the kth analysis. Then

$$\mathrm{Cov}\big(Z_\beta(l), Z_\varphi(k)\big) = \sqrt{F_{lk}}\,\frac{I(\beta_0, \varphi_0)}{\sqrt{I(\beta_0)\,I(\varphi_0)}} \quad\text{and}\quad \mathrm{Cov}\big(Z_\gamma(l), Z_\varphi(k)\big) = \sqrt{F_{lk}}\,\frac{I(\gamma_0, \varphi_0)}{\sqrt{I(\gamma_0)\,I(\varphi_0)}},$$

where Flk is the information fraction between the lth and the kth analysis, and I(β0), I(γ0), I(φ0), I(β0, φ0) and I(γ0, φ0) are elements of the Fisher information matrix under the null hypothesis.
Suppose that in a study with two planned analyses, we are interested in testing hypothesis (5.4) in the interim analysis, and testing hypothesis (5.5) in the final analysis. To ensure the type I error will not exceed the planned level of 0.05 in two-sided tests, the boundary values zc(1) and zc(2) can be determined successively so that
$$P_{\beta=0}\big\{|Z_\beta(1)| \ge z_c(1),\ |Z_\gamma(1)| \ge z_c(1)\big\} + P_{\beta=0}\big\{\text{no rejection at the interim analysis},\ |Z_\varphi(2)| \ge z_c(2)\big\} \le 0.05, \tag{5.6}$$

$$P_{\gamma=0}\big\{|Z_\beta(1)| \ge z_c(1),\ |Z_\gamma(1)| \ge z_c(1)\big\} + P_{\gamma=0}\big\{\text{no rejection at the interim analysis},\ |Z_\varphi(2)| \ge z_c(2)\big\} \le 0.05. \tag{5.7}$$
Note that both (5.6) and (5.7) need to be satisfied as the parameter space under H0 for the first interim analysis is the set {β: β = 0}∪{γ: γ = 0}. The two sets in the null hypothesis parameter space can be completely disjoint.
The likelihood function (5.3) does not have a closed form, and thus a direct estimate of the Fisher information matrix is difficult. One possible solution is to approximate the likelihood function by a Laplace approximation and obtain the approximate Fisher information matrix. Based on our simulated data, we obtained negative values of the observed information elements I⁽ⁿ⁾(β0, φ0) and I⁽ⁿ⁾(γ0, φ0), where I⁽ⁿ⁾ stands for the observed information. This is expected since the correlation between φ and β (or γ) is usually positive, resulting in w < 0. Therefore, it is fairly safe to derive a set of boundaries by assuming w = 0 between β (or γ) and φ. Even if w > 0, assuming w = 0 will only result in more conservative crossing boundaries, and the overall type I error will be less than α.
It is possible that different alpha spending functions can be used in (5.6) and (5.7), resulting in different crossing boundaries for β and γ. However, the second stopping boundary should be based on the maximum of zc1(2) and zc2(2), the final boundaries obtained from (5.6) and (5.7), respectively, so that the overall type I error will not exceed α.
6 Discussion
In this paper, we have extended the concept of the alpha spending function to testing different endpoints between interim analyses or between interim and final analyses. Correlations between successive test statistics depend not only on the information fraction between interim analyses but also on the Fisher information matrix. The correlation between the two parameters is inversely related to the off-diagonal element of the Fisher information matrix. Thus, when the correlation coefficient of the score function (w) is positive, the two parameters have a negative association, and the additional penalty to pay for switching endpoints is smaller. When w is negative, the two parameters are positively associated, and the additional penalty for switching endpoints is larger.
The general covariance structure of the sequential score test statistics in this paper was developed under the assumption of independent increments of information. The proposed methodology can be applied to any test statistic for which this assumption is satisfied asymptotically, which is the case for most test statistics (Tsiatis et al 1995, Scharfstein et al 1997). When the assumption is violated, the covariance structure of the sequential test statistics will be more complex, although the alpha spending function can still be applied as long as the covariance is known or can be numerically calculated from a known likelihood function. Therefore, the assumption is not strictly necessary for changing hypotheses in general. Furthermore, assuming zero correlation between the scores of the two parameters (w = 0) returns the most conservative stopping boundaries, and when w = 0 the independent increments of information assumption is no longer important.
In the case where w = 0 and there are only two analyses, the critical values based on the method discussed in this paper will be equivalent to the ones based on a simple Bonferroni adjustment. When there are more than 2 analyses, the method will return smaller critical values even when w = 0, as it takes into account the correlation between the test statistics when the parameter is the same. The simple Bonferroni method assumes complete independence of (Z(1),…, Z(k)), that is, w = 0 for all off diagonal elements in the covariance matrix. This will return the largest critical values.
Sophisticated parametric models have not been widely used in late-stage clinical trials. It is recognized that, despite the many promising innovations in biomarkers to measure drug effect and disease progression, little has changed in the style of drug development over the past 30 years. The Food and Drug Administration (FDA) Pharmacometrics Group released two advisory meeting proceedings, in 2006 and 2008, concerning disease-drug-trial models, with the objective of improving the success rate of late-stage drug development (FDA 2006, FDA 2008). We believe that sophisticated parametric models that give better power to detect treatment differences will become more popular in late-phase clinical trials and will be more acceptable to FDA reviewers.
The application to joint modeling of longitudinal and time-to-event data is promising when the longitudinal marker may be a good surrogate for the time-to-event. In reality, time-to-event data, such as overall survival, may take a long time to obtain. With recent advances in genetic and other biomarker research, many potential surrogates are being identified. For example, circulating tumor cells (CTCs) have been found to be associated with progression-free survival and overall survival in patients with metastatic breast cancer (Dawood et al 2008, Liu et al 2009). However, there is also great uncertainty about how good these “surrogates” are. Showing both β ≠ 0 and γ ≠ 0 not only demonstrates that the longitudinal marker is a strong potential surrogate, it also demonstrates that there is a treatment effect on survival through the longitudinal marker. This may be considered sufficient efficacy evidence to terminate the trial early. If the longitudinal marker is a weak surrogate, the investigator can proceed to the next analysis to test the hypothesis that the overall treatment effect, βγ + ξ, is not 0. Note that if we knew in advance whether the longitudinal marker is or is not a good surrogate, testing the same parameter throughout would be more powerful, as it would not incur any extra penalty.
As discussed by Fleming and DeMets (1993), early termination of a clinical trial is a complex process and cannot be simply reached by pre-specified stopping rules. For example, even when the efficacy stopping boundary is crossed, a trial may need to be continued to collect sufficient safety information. In this paper, we simply provide a tool to facilitate the decision process and ensure that the type I error will be strictly under control to its pre-specified level when testing different endpoints at different interim analyses. Furthermore, we are not advocating that when a particular endpoint is not performing well in terms of significance at a particular interim analysis, we can just switch to another endpoint at a subsequent interim analysis. The hypothesis tests should be pre-specified before the trial begins. If the analyses are based on different likelihood functions, or the value of w cannot be reliably estimated, w should be set to 0, assuming no correlation between the test statistics.
References
- Armitage P, McPherson CK, Rowe BC. Repeated significance tests on accumulating data. Journal of the Royal Statistical Society. 1969;132:232–244.
- Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials. Journal of the American Medical Association. 2004;291(20):2457–2465. doi:10.1001/jama.291.20.2457.
- Charbonnel B, Dormandy J, Erdmann E, Massi-Benedetti M, Skene A. The prospective pioglitazone clinical trial in macrovascular events (PROactive): Can pioglitazone reduce cardiovascular events in diabetes? Study design and baseline characteristics of 5238 patients. Diabetes Care. 2004;27:1647–1653. doi:10.2337/diacare.27.7.1647.
- Chen LM, Ibrahim JG, Chu H. Sample size and power determination in joint modeling of longitudinal and survival data. Statistics in Medicine. 2011;30:2295–2309. doi:10.1002/sim.4263.
- Chen YHJ, DeMets DL, Lan KKG. Monitoring mortality at interim analyses while testing a composite endpoint at the final analysis. Controlled Clinical Trials. 2003;24:16–27. doi:10.1016/s0197-2456(02)00306-9.
- Dawood S, Broglio K, Valero V, et al. Circulating tumor cells in metastatic breast cancer: from prognostic stratification to modification of the staging system? Cancer. 2008;113(9):2422–2430. doi:10.1002/cncr.23852.
- DeMets DL, Lan KKG. Interim analysis: the alpha spending function approach. Statistics in Medicine. 1994;13:1341–1352. doi:10.1002/sim.4780131308.
- Dormandy J, Charbonnel B, Eckland D, Erdmann E, Massi-Benedetti M, et al. Secondary prevention of macrovascular events in patients with type 2 diabetes in the PROactive study (PROspective pioglitAzone Clinical Trial In macroVascular Events): a randomized controlled trial. Lancet. 2006;366:1279–1289. doi:10.1016/S0140-6736(05)67528-9.
- European Medicines Agency. Reflection paper on methodological issues in confirmatory clinical trials planned with an adaptive design. 2007. http://www.ema.europa.eu/ema/pages/includes/document/open_document.jsp?webContentId=WC500003616
- Fleming TR, DeMets DL. Monitoring of clinical trials: issues and recommendations. Controlled Clinical Trials. 1993;14:183–197. doi:10.1016/0197-2456(93)90002-u.
- Food and Drug Administration. Proceedings of the Clinical Pharmacology Sub-Committee Advisory Committee Meeting. 2006. http://www.fda.gov/AboutFDA/CentersOffices/CDER/ucm180485.htm
- Food and Drug Administration. Proceedings of the Clinical Pharmacology Sub-Committee Advisory Committee Meeting. 2008. http://www.fda.gov/AboutFDA/CentersOffices/CDER/ucm180485.htm
- Food and Drug Administration. Guidance for Industry: Adaptive design clinical trials for drugs and biologics. 2010. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm201790.pdf
- Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic disease. Statistics in Medicine. 1992;11:167–178. doi:10.1002/sim.4780110204.
- Genz A. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics. 1992;1:141–149.
- Goldman B, LeBlanc M, Crowley J. Interim futility analysis with intermediate endpoints. Clinical Trials. 2008;5:14–22. doi:10.1177/1740774507086648.
- Hallstrom A. Is survival the only or even the right outcome for evaluating treatments for out-of-hospital cardiac arrest? A proposed test based on both an intermediate and ultimate outcome. UW Biostatistics Working Paper Series, Working Paper 352. 2009. http://www.bepress.com/uwbiostat/paper352
- Henderson R, Diggle P, Dobson A. Joint modeling of longitudinal measurements and event time data. Biostatistics. 2000;1:465–480. doi:10.1093/biostatistics/1.4.465.
- Hsieh F, Tseng YK, Wang JL. Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics. 2006;62:1037–1043. doi:10.1111/j.1541-0420.2006.00570.x.
- Ibrahim JG, Chen MH, Sinha D. Bayesian methods for joint modeling of longitudinal and survival data with applications to cancer vaccine trials. Statistica Sinica. 2004;14:863–883.
- Ibrahim JG, Chu H, Chen LM. Basic concepts and methods for joint models of longitudinal and survival data. Journal of Clinical Oncology. 2010;28(16):2796–2808. doi:10.1200/JCO.2009.25.0654.
- Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
- Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663.
- Lan KKG, Zucker D. Sequential monitoring of clinical trials: the role of information in Brownian motion. Statistics in Medicine. 1993;12:753–765. doi:10.1002/sim.4780120804.
- Lan KKG, Reboussin DM, DeMets DL. Information and information fractions for design and sequential monitoring of clinical trials. Communications in Statistics - Theory and Methods. 1994;23(2):403–420.
- Lee JW, DeMets DL. Sequential comparison of change with repeated measurement data. Journal of the American Statistical Association. 1991;86:757–762.
- Liu MC, Shields PG, Warren RD, et al. Circulating tumor cells: a useful predictor of treatment efficacy in metastatic breast cancer. Journal of Clinical Oncology. 2009;27(31):5153–5159. doi:10.1200/JCO.2008.20.6664.
- MERIT-HF Study Group. Effect of metoprolol CR/XL in chronic heart failure: Metoprolol CR/XL randomized intervention trial in congestive heart failure (MERIT-HF). Lancet. 1999;353:2001–2007.
- O’Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549–556.
- Olmos D, Arkenau HT, Ang JE, Ledaki I, Attard G, Carden CP, Reid AHM, A’Hern R, Fong PC, et al. Circulating tumour cell (CTC) counts as intermediate end points in castration-resistant prostate cancer (CRPC): a single-centre experience. Annals of Oncology. 2009;20:27–33. doi:10.1093/annonc/mdn544.
- Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191–199.
- Scharfstein DO, Tsiatis AA, Robins JM. Semiparametric efficiency and its implication on the design and analysis of group-sequential studies. Journal of the American Statistical Association. 1997;92:1342–1350.
- Scher HI, Jia X, Bono JS, Fleisher M, Pienta KJ, Raghavan D, Heller G. Circulating tumour cells as prognostic markers in progressive, castration-resistant prostate cancer: a reanalysis of IMMC38 trial data. The Lancet Oncology. 2009;10:233–239. doi:10.1016/S1470-2045(08)70340-1.
- Su JQ, Lachin JM. Group sequential distribution-free methods for the analysis of multivariate observations. Biometrics. 1992;48:1033–1042.
- Taylor JMG, Wang Y. Surrogate markers and joint models for longitudinal and survival data. Controlled Clinical Trials. 2002;23:626–634. doi:10.1016/s0197-2456(02)00234-9.
- Tsiatis AA, Boucher H, Kim K. Sequential methods for parametric survival models. Biometrika. 1995;82:165–173.
- Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14:809–834.
- Wei LJ, Su JQ, Lachin JM. Interim analyses with repeated measurements in a sequential clinical trial. Biometrika. 1990;77:359–364.
- Wu MC, Lan KKG. Sequential monitoring for comparison of changes in a response variable in clinical trials. Biometrics. 1992;48:765–779.
- Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.