Author manuscript; available in PMC: 2015 Jun 15.
Published in final edited form as: Stat Biosci. 2013 May 17;7(1):1–18. doi: 10.1007/s12561-013-9089-6

Cumulative Hazard Ratio Estimation for Treatment Regimes in Sequentially Randomized Clinical Trials

Xinyu Tang 1, Abdus S Wahed 2
PMCID: PMC4467029  NIHMSID: NIHMS594885  PMID: 26085847

Abstract

The proportional hazards model is widely used in survival analysis to allow adjustment for baseline covariates. The proportional hazard assumption may not be valid for treatment regimes that depend on intermediate responses to prior treatments received, and it is not clear how such a model can be adapted to clinical trials employing more than one randomization. Besides, since treatment is modified post-baseline, the hazards are unlikely to be proportional across treatment regimes. Although Lokhnygina and Helterbrand (Biometrics 63: 422–428, 2007) introduced the Cox regression method for two-stage randomization designs, their method can only be applied to test the equality of two treatment regimes that share the same maintenance therapy. Moreover, their method does not allow auxiliary variables to be included in the model nor does it account for treatment effects that are not constant over time. In this article, we propose a model that assumes proportionality across covariates within each treatment regime but not across treatment regimes. Comparisons among treatment regimes are performed by testing the log ratio of the estimated cumulative hazards. The ratio of the cumulative hazard across treatment regimes is estimated using a weighted Breslow-type statistic. A simulation study was conducted to evaluate the performance of the estimators and proposed tests.

Keywords: Cumulative treatment effect, Non-proportional hazards, Sequentially randomized clinical trial, Stratified proportional hazards model

1 Introduction

It is common practice in treatments of chronic diseases for patients to start with an initial therapy based on specific diagnoses. Patients stay on this initial therapy until a predetermined milestone, commonly referred to as response, is achieved, or physicians determine that the treatment is not providing the intended benefit, commonly referred to as non-response. In either case, further treatment could be suggested, which could be a rescue therapy if patients show primary resistance to the initial therapy, or a maintenance therapy if a response is observed. Thus, treatments applied at later stages depend on the responses to the initial therapies, and a comparison across treatments at various stages would be misleading. Instead, dynamic treatment sequences are usually formed based on stage-specific treatments and responses to these treatments during the course of therapy. In order to study the effect of a treatment sequence on the survival outcome, sequentially randomized designs are used in clinical trials. In a sequentially randomized clinical trial, eligible patients are first randomized to receive one of the initial therapies. Patients reaching the second stage will continue to participate in a second randomization to either rescue or maintenance therapies. The procedure continues. This design results in different treatment regimes consisting of an initial therapy, the intermediate response and a second-stage therapy. The analysis and comparisons of such treatment regimes can be done based on classic survival methods, weighted using inverse probabilities of treatment allocations.

The idea of inverse-probability weighting (IPW) to estimate treatment regime effect in two-stage randomization designs was first introduced by Lunceford, Davidian and Tsiatis [2]. Since then the strategy of IPW has been incorporated into many other classical statistical methods to make them applicable to two-stage randomization designs. For example, Guo and Tsiatis [3] extended the Aalen–Nelson estimator to a Weighted Risk Set Estimator (WRSE) using IPW. Hernan, Lanoy, Costagliola and Robins [4] applied IPW in comparing treatment strategies from observational studies using a Cox proportional hazards model under artificial censoring. Lokhnygina and Helterbrand [1] incorporated IPW into the Cox regression method for two-stage randomization designs. Feng and Wahed [5] presented a modified weighted log-rank test for comparing different treatment strategies using IPW. Miyahara and Wahed [6] introduced IPW to Kaplan–Meier estimators with both fixed and time-dependent weights. More recently, Goldberg and Kosorok [7] used IPW methods in Q-learning to estimate treatment effect for censored data from multistage designs. However, except for the work of Hernan et al. [4], all the aforementioned models failed to adjust for baseline covariates.

Proportional hazards models are widely used in analyzing data from clinical trials with time-to-event endpoints. The advantage of the proportional hazards model over nonparametric methods (e.g. a log-rank test) is that auxiliary covariates can be included in the model to explain the variability of the dependent variable further. Even in randomized clinical trials, baseline characteristics are often imbalanced between treatment groups and hence adjustment becomes a consideration. This imbalance is even more problematic for two stages of randomization, since patients proceeding to the second stage are randomized differently based on their intermediate responses. Lokhnygina and Helterbrand [1] proposed to use a Cox regression method to test the equality of two treatment regimes that share the same maintenance therapy in a two-stage randomization setting. However, their method does not allow auxiliary variables to be included in the model nor does it account for treatment effects that are not constant over time, an issue in many medical studies. Therefore, comparing treatment regimes in the presence of non-proportional hazards is of importance to health research communities. Wei and Schaubel [8] proposed cumulative treatment effect estimation based on treatment-specific cumulative baseline hazards using a stratified proportional hazards model. Their model assumed proportional hazards with respect to baseline covariates within each treatment group and non-proportional hazards across treatment groups. In this article, we take a similar approach to estimate the cumulative treatment effect for treatment regimes from sequentially randomized clinical trials. Comparisons among treatment regimes are performed by testing the ratio of the estimated cumulative hazards. A simulation study is conducted to evaluate the performance of the estimators and proposed tests. The estimators and proposed tests are also applied to data from a neuroblastoma study to compare treatment regimes with respect to overall survival.

2 Design Setting and Statistical Model

2.1 Notation

We consider a two-stage randomization design, in which n eligible patients are first randomized to one of the J initial therapies (A1, … , AJ) and patients achieving a clinical response are then randomized to one of the K second-stage therapies (B1, … , BK). By design, non-responders in the first stage are not eligible to receive any treatment in the second stage, similarly to the neuroblastoma study (Sect. 5) that motivated this work. Thus, there are a total of J × K treatment regimes based on this design, namely, A1B1, … , AJBK. The treatment regime AjBk , j = 1, … , J ; k = 1, … , K, is defined as “treat with Aj as initial therapy, then Bk as second-stage therapy if responds to Aj” [2]. This treatment regime consists of an initial treatment (Aj), the intermediate response status, and a second-stage treatment (Bk) in the event the patient responds to the initial treatment. Thus, estimation of the effect of treatment regime AjBk will not only include patients who were randomized to receive initial therapy Aj, responded, and then randomized to Bk as second-stage therapy, but also patients who were randomized to Aj and did not respond.

Let $X_{ji}$ be the indicator for initial therapy $A_j$ such that $X_{ji} = 1$ if the $i$th patient receives initial treatment $A_j$ $\left(\sum_{j=1}^{J} X_{ji} = 1\right)$. The response status is denoted by $R_i$, with $R_i = 1$ for responders and $R_i = 0$ for non-responders. For responders ($R_i = 1$), let $Z_{ki}$ be the indicator for second-stage therapy $B_k$, i.e., $Z_{ki} = 1$ if the $i$th patient responds and receives second-stage treatment $B_k$ $\left(R_i \sum_{k=1}^{K} Z_{ki} = R_i\right)$. Let $T_i$ and $C_i$ be the survival and censoring times from the time of first randomization for the $i$th patient, respectively. We assume that the survival time is independently right censored. Then the observed time and event indicator are defined as $U_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$. If we define $V_i$ as the vector of baseline covariates, the observed data for the $i$th individual can be described as the set of random vectors $\{X_{ji}, R_i, R_iZ_{ki}, U_i, \Delta_i, V_i,\ j = 1, \dots, J;\ k = 1, \dots, K\}$, $i = 1, 2, \dots, n$. Note that in this consideration we assume that $R_i$ is always observed, which may not be true for patients who are censored or died prior to response evaluation. Customarily such patients are treated as non-responders and $R_i$ is set to zero for them [2]. Whenever there is no ambiguity, we will drop the subscript $i$ to represent a generic observation from the population.

2.2 Model

Our proposed model assumes proportional hazards within each treatment regime with respect to the baseline covariates $V_i$. However, the hazard functions for the $J \times K$ treatment regimes are left unspecified, and could be proportional or non-proportional across regimes. In other words, a stratified proportional hazards model is used to account for the non-proportionality across treatment regimes. More specifically, let us denote the hazard and the cumulative hazard functions for treatment regime $A_jB_k$ as $\lambda_{jk}(t)$ and $\Lambda_{jk}(t) = \int_0^t \lambda_{jk}(s)\,ds$, respectively. Based on stratified proportional hazards with treatment regimes as strata, the hazard function for treatment regime $A_jB_k$, $j = 1, \dots, J$ and $k = 1, \dots, K$, could be written as

$$\lambda_{jk}(t) = \lambda_{jk0}(t)\exp\{\beta^T V\}, \quad j = 1, \dots, J \text{ and } k = 1, \dots, K, \tag{1}$$

where λjk0(t) is the baseline hazard function for treatment regime AjBk, and β is a vector of coefficients corresponding to baseline covariates V. Note that in model (1), the parameter vector β is assumed to be constant across regimes, which implies that no interaction between treatment regimes and baseline covariates is assumed. For a general treatment regime, please see Sect. 2.1. Moreover, no functional form is specified for the regime-specific baseline hazard functions. For example, under a two-stage randomization design with J = K = 2, the hazard functions for treatment regimes A1B1, A1B2, A2B1 and A2B2 can be written as

$$\lambda_{11}(t) = \lambda_{110}(t)\exp\{\beta^T V\},$$

and similarly, λ12(t) = λ120(t) exp{βT V}, λ21(t) = λ210(t) exp{βT V}, and λ22(t) = λ220(t) exp{βT V}, respectively. Thus, irrespective of the treatment regimes, the effect of baseline covariate V on the hazard can be quantified by the log hazard ratio parameter β. The forms of the baseline hazard functions λ110(t), λ120(t), λ210(t), and λ220(t) are left unspecified, and could be non-proportional. However, we assume proportionality within each regime with respect to the baseline covariates, and the effects of the baseline covariates do not vary across regimes.

3 Inference

Our objective is to draw inference about treatment regimes AjBk, j = 1, … , J ; k = 1, … , K. Note that the focus of the inference in model (1) is not the parameter vector β; rather, it is to compare the hazards across treatment regimes. Based on the analytical framework of IPW (see [9] for details), we define the weight function for treatment regime AjBk, j = 1, … , J and k = 1, … , K, as

$$W_{jki} = \frac{X_{ji}\left\{(1 - R_i) + \dfrac{R_i Z_{ki}}{\pi_{jk}}\right\}}{\pi_j},$$

where $\pi_j = P(X_{ji} = 1)$ and $\pi_{jk} = P(Z_{ki} = 1 \mid X_{ji} = 1, R_i = 1)$. Thus, both responders ($R_i = 1$) to $A_j$ and non-responders ($R_i = 0$) to $A_j$ are weighted with the inverse of the probability of randomization when evaluating the effect of the treatment regime $A_jB_k$. The probabilities of being assigned to $B_k$, $k = 1, \dots, K$, could be different for initial treatments $A_j$, $j = 1, \dots, J$. For example, in some two-stage randomization studies, if the patients receive one initial treatment, particular choices in the second stage might be more toxic than others, and hence these patients are randomized with less probability into such choices. Based on the counting process notation described in Fleming and Harrington [10], the event and risk indicators for the $i$th patient are defined as $N_i(t) = \Delta_i I(U_i \le t)$ and $Y_i(t) = I(U_i \ge t)$, respectively. We define the weighted event and risk indicators for treatment regime $A_jB_k$ as $N_{jki}(t) = W_{jki}\Delta_i I(U_i \le t)$ and $Y_{jki}(t) = W_{jki} I(U_i \ge t)$.
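As a small illustration of the weight function, the following Python sketch evaluates the weights for regime A1B1 on four hypothetical patients. The function name `regime_weights`, the toy data, and the randomization probabilities of 0.5 at both stages are assumptions made for illustration only.

```python
import numpy as np

def regime_weights(x_j, r, z_k, pi_j, pi_jk):
    """Inverse-probability weight for regime AjBk:
    W = X_j * {(1 - R) + R * Z_k / pi_jk} / pi_j."""
    x_j = np.asarray(x_j, dtype=float)
    r = np.asarray(r, dtype=float)
    z_k = np.asarray(z_k, dtype=float)
    return x_j * ((1.0 - r) + r * z_k / pi_jk) / pi_j

# Hypothetical patients (illustrative values, not trial data):
#   1: received A1, did not respond
#   2: received A1, responded, randomized to B1
#   3: received A1, responded, randomized to B2
#   4: received A2, responded, randomized to B1
x1 = [1, 1, 1, 0]
r = [0, 1, 1, 1]
z1 = [0, 1, 0, 1]

w11 = regime_weights(x1, r, z1, pi_j=0.5, pi_jk=0.5)
print(w11)
```

Under these assumed probabilities, a non-responder to A1 receives weight 1/0.5 = 2, a responder randomized to B1 receives 1/(0.5 × 0.5) = 4, and patients inconsistent with A1B1 receive weight 0.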

The partial likelihood estimate of β can be obtained by solving the pseudo-score equation [1]:

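$$U(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^L \left\{V_i - \bar{V}_{jk}(t,\beta)\right\} dN_{jki}(t) = 0,$$

where

$$\bar{V}_{jk}(t,\beta) = \frac{\sum_{p=1}^{n} V_p Y_{jkp}(t)\exp\{\beta^T V_p\}}{\sum_{p=1}^{n} Y_{jkp}(t)\exp\{\beta^T V_p\}}.$$

Note that in defining $U(\cdot)$, we have utilized the weighted event and risk processes. The estimated vector of coefficients is denoted by $\hat\beta$. Then the Breslow estimator [12] of the cumulative baseline hazard for treatment regime $A_jB_k$ can be obtained as

$$\hat\Lambda_{jk0}(t,\hat\beta) = \sum_{i=1}^{n}\int_0^t \frac{dN_{jki}(s)}{\sum_{p=1}^{n} Y_{jkp}(s)\exp\{\hat\beta^T V_p\}}.$$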
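The weighted Breslow-type estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the coefficient estimate is already available (for instance from solving the pseudo-score equation), and the function name `weighted_breslow` and the toy data are hypothetical.

```python
import numpy as np

def weighted_breslow(t, u, delta, w, v, beta_hat):
    """Weighted Breslow estimate of the cumulative baseline hazard at time t
    for one regime, given regime-specific weights w (assumed precomputed)."""
    u = np.asarray(u, dtype=float)          # observed times U_i
    delta = np.asarray(delta, dtype=int)    # event indicators Delta_i
    w = np.asarray(w, dtype=float)          # regime weights W_jki
    v = np.asarray(v, dtype=float)          # covariates, shape (n, p)
    score = np.exp(v @ np.asarray(beta_hat, dtype=float))
    total = 0.0
    # Each weighted event at or before t adds W_i / (weighted risk-set sum).
    events = np.flatnonzero((delta == 1) & (u <= t) & (w > 0))
    for i in events:
        at_risk = u >= u[i]
        total += w[i] / np.sum(w[at_risk] * score[at_risk])
    return float(total)

# Toy data (hypothetical): three patients, one covariate, beta_hat = 0.
u = [1.0, 2.0, 3.0]
delta = [1, 1, 0]
w = [2.0, 4.0, 2.0]
v = np.zeros((3, 1))
beta_hat = np.zeros(1)

lam_early = weighted_breslow(1.5, u, delta, w, v, beta_hat)
lam_late = weighted_breslow(2.5, u, delta, w, v, beta_hat)
```

With these toy values the first event contributes 2/(2 + 4 + 2) and the second contributes 4/(4 + 2). Dividing the estimates for two regimes at the same time point gives the cumulative baseline hazard ratio.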

A comparison of different treatment regimes can then be carried out in terms of the ratio of the cumulative baseline hazards. The ratio of the cumulative baseline hazards for comparing treatment regimes $A_jB_k$ and $A_{j'}B_{k'}$ is defined as

$$\theta_{jkj'k'}(t) = \frac{\Lambda_{jk0}(t)}{\Lambda_{j'k'0}(t)}. \tag{2}$$

This ratio of the cumulative baseline hazards equals the ratio of the cumulative hazards given the same values for covariates, because

$$\theta_{jkj'k'}(t) = \frac{\Lambda_{jk0}(t)}{\Lambda_{j'k'0}(t)} = \frac{\Lambda_{jk0}(t)\exp\{\beta^T V_i\}}{\Lambda_{j'k'0}(t)\exp\{\beta^T V_i\}} = \frac{\Lambda_{jk}(t)}{\Lambda_{j'k'}(t)}.$$

The ratio of the cumulative baseline hazards can be estimated by substituting the estimated cumulative baseline hazards into (2). Let us denote the corresponding estimator by $\hat\theta_{jkj'k'}(t)$. For unweighted group comparisons, Wei and Schaubel [8] showed that such a ratio converges asymptotically to a Gaussian process. A similar argument outlined in Appendix A can be used to show that $\hat\theta_{jkj'k'}(t)$ follows a Gaussian process with mean $\theta_{jkj'k'}(t)$ and variance function $\sigma^2_{jkj'k'}(t)$, where

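$$
\begin{aligned}
\sigma^2_{jkj'k'}(t) &= E\left\{\xi^2_{jkj'k'i}(t,\beta)\right\},\\
\xi_{jkj'k'i}(t,\beta) &= \frac{\Phi_{jki}(t,\beta)}{\Lambda_{j'k'0}(t)} - \frac{\Lambda_{jk0}(t)\,\Phi_{j'k'i}(t,\beta)}{\Lambda^2_{j'k'0}(t)},\\
\Phi_{jki}(t,\beta) &= h_{jk}^T(t,\beta)\,\Omega^{-1}(\beta)\,\Psi_i(\beta) + \int_0^t y^{(0)}_{jk}(s,\beta)^{-1}\,dM_{jki}(s,\beta),\\
h_{jk}(t,\beta) &= -\int_0^t \bar{v}_{jk}(s,\beta)\,d\Lambda_{jk0}(s),\\
\Omega(\beta) &= \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^L \tau_{jk}(t,\beta)\,y^{(0)}_{jk}(t,\beta)\,d\Lambda_{jk0}(t),\\
\Psi_i(\beta) &= \sum_{j=1}^{J}\sum_{k=1}^{K}\int_0^L \left\{V_i - \bar{v}_{jk}(t,\beta)\right\} dM_{jki}(t,\beta),\\
dM_{jki}(t,\beta) &= dN_{jki}(t) - Y_{jki}(t)\exp\{\beta^T V_i\}\,d\Lambda_{jk0}(t).
\end{aligned} \tag{3}
$$

Besides, $y^{(0)}_{jk}(t,\beta)$, $\bar{v}_{jk}(t,\beta)$ and $\tau_{jk}(t,\beta)$ are the limiting values of $Y^{(0)}_{jk}(t,\beta) = \frac{1}{n}\sum_{p=1}^{n} Y_{jkp}(t)\exp\{\beta^T V_p\}$, $\bar{V}_{jk}(t,\beta)$, and $\frac{\sum_{p=1}^{n} V_pV_p^T Y_{jkp}(t)\exp\{\beta^T V_p\}}{\sum_{p=1}^{n} Y_{jkp}(t)\exp\{\beta^T V_p\}} - \bar{V}_{jk}(t,\beta)\bar{V}_{jk}^T(t,\beta)$, respectively. The variance function can be estimated by $\hat\sigma^2_{jkj'k'}(t) = \frac{1}{n}\sum_{i=1}^{n}\hat\xi^2_{jkj'k'i}(t,\hat\beta)$. The detailed steps for computing $\hat\xi_{jkj'k'i}(t,\hat\beta)$ are described in Appendix B. One might expect that the asymptotic distribution of $\ln\{\hat\theta_{jkj'k'}(t)\}$ would be closer to a Gaussian process than that of $\hat\theta_{jkj'k'}(t)$. Based on the delta method, the variance of $\ln\{\hat\theta_{jkj'k'}(t)\}$ can be estimated by $\hat\sigma^{*2}_{jkj'k'}(t) = \hat\sigma^2_{jkj'k'}(t)/\hat\theta^2_{jkj'k'}(t)$, where the asterisk marks the log scale. We will use simulation to investigate the properties of $\hat\theta_{jkj'k'}(t)$ and $\ln\{\hat\theta_{jkj'k'}(t)\}$ in a sequentially randomized setting in Sect. 4.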
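The delta-method step for the log ratio can be written out explicitly. The following is a sketch of the standard first-order argument, with subscripts suppressed for readability; the numerical check uses the Table 2 entries for scenario I with n = 800, which are averages over replications, so agreement is expected to be approximate only.

```latex
% Delta method with g(\theta) = \ln\theta, so that g'(\theta) = 1/\theta:
\operatorname{Var}\{\ln\hat\theta(t)\}
  \approx \{g'(\theta(t))\}^2 \operatorname{Var}\{\hat\theta(t)\}
  = \frac{\operatorname{Var}\{\hat\theta(t)\}}{\theta^2(t)}
\quad\Longrightarrow\quad
\operatorname{SE}\{\ln\hat\theta(t)\} \approx \frac{\operatorname{SE}\{\hat\theta(t)\}}{\hat\theta(t)}.

% Check against Table 2 (scenario I, n = 800, A1B2 vs. A1B1):
% SE on the ratio scale 0.119, estimate 1.01, so
\frac{0.119}{1.01} \approx 0.118,
% which matches the reported asymptotic standard error of the log ratio.
```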

Based on the log ratio estimator of the cumulative baseline hazards, a Wald-type test can be used for comparing different treatment regimes. For example, for comparing treatment regimes $A_jB_k$ and $A_{j'}B_{k'}$ at a specific time point $t_0$, the null hypothesis can be described as $H_{jkj'k'}: \ln\{\theta_{jkj'k'}(t_0)\} = 0$, and the test statistic, with $\hat\sigma^{*}_{jkj'k'}(t_0)$ denoting the estimated standard error of the log ratio, can be written as

$$D_{jkj'k'} = \frac{\ln\{\hat\theta_{jkj'k'}(t_0)\}}{\hat\sigma^{*}_{jkj'k'}(t_0)}.$$

This test statistic is then compared to a standard normal distribution. Because the cumulative hazard can be viewed as an “accumulation” of the hazard over time, the choice of $t_0$ will depend on the disease-specific survival patterns as well as the treatments. If there are more than two treatment regimes involved, multiple treatment regimes can be compared by performing an overall test of difference among all regimes using the Wald Chi-square test. More specifically, in our case, let $\theta(t_0) = [\theta_{jk11}(t_0),\ j = 1, \dots, J;\ k = 1, \dots, K;\ jk \ne 11]^T$ be the $(JK - 1)$-dimensional vector of cumulative hazard ratios at $t_0$, and

$$\Sigma(t_0) = \begin{pmatrix} \sigma^2_{1211}(t_0) & \cdots & \sigma_{1211,JK11}(t_0)\\ \vdots & \ddots & \vdots\\ \sigma_{1211,JK11}(t_0) & \cdots & \sigma^2_{JK11}(t_0) \end{pmatrix}$$

be the corresponding variance-covariance matrix. The diagonal elements $\sigma^2_{jk11}(t_0)$ of $\Sigma(t_0)$ are the variances of $\hat\theta_{jk11}(t_0)$, $j = 1, \dots, J$; $k = 1, \dots, K$; $jk \ne 11$. The off-diagonal elements are the covariances between $\hat\theta_{jk11}(t_0)$ and $\hat\theta_{j'k'11}(t_0)$, $jk \ne j'k'$. Then for testing $H_0: \Lambda_{11}(t_0) = \cdots = \Lambda_{JK}(t_0)$, or equivalently $H_0: \ln\{\theta(t_0)\} = 0$, the test statistic can be written as

$$\chi^2_{JK-1} = \left[\ln\{\hat\theta(t_0)\}\right]^T \left\{\hat\Sigma^{*}(t_0)\right\}^{-1} \left[\ln\{\hat\theta(t_0)\}\right],$$

where

$$\hat\Sigma^{*}(t_0) = \left(\frac{\partial \ln\{\hat\theta(t_0)\}}{\partial \hat\theta(t_0)}\right)^T \hat\Sigma(t_0) \left(\frac{\partial \ln\{\hat\theta(t_0)\}}{\partial \hat\theta(t_0)}\right)$$

is the delta-method variance-covariance matrix of the log ratios.

This test statistic is then compared to a Chi-square distribution with (J K – 1) degrees of freedom.
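The pairwise and overall Wald-type tests of this section can be sketched as follows. This is a minimal illustration under stated assumptions: the inputs are a ratio estimate with its standard error (or a vector of ratio estimates with their covariance matrix) already on the scale of the estimator, and the function names `pairwise_wald` and `overall_wald` and all numerical inputs are hypothetical.

```python
import math
import numpy as np

def pairwise_wald(theta_hat, se_theta):
    """Wald z statistic and two-sided p-value for H: ln(theta) = 0.
    se_theta is the standard error of theta_hat (ratio scale)."""
    se_log = se_theta / theta_hat                 # delta method for ln(theta)
    z = math.log(theta_hat) / se_log
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * P(Z > |z|)
    return z, p_value

def overall_wald(theta_hat, cov_theta):
    """Wald chi-square statistic for H0: ln(theta) = 0 (vector case).
    Compare the result to a chi-square with len(theta_hat) degrees of freedom."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    cov_theta = np.asarray(cov_theta, dtype=float)
    jac = np.diag(1.0 / theta_hat)                # d ln(theta) / d theta
    cov_log = jac.T @ cov_theta @ jac
    log_theta = np.log(theta_hat)
    return float(log_theta @ np.linalg.solve(cov_log, log_theta))

# Hypothetical inputs for illustration only.
z_null, p_null = pairwise_wald(1.0, 0.1)
theta_vec = [0.5, 2.0]
cov_vec = np.diag([0.01, 0.04])
chi2_stat = overall_wald(theta_vec, cov_vec)
```

When the covariance matrix is diagonal, as in the toy example, the overall statistic reduces to the sum of the squared pairwise z statistics, which gives a quick sanity check on the matrix form.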

4 Simulation Study

For simplicity, a simulation study was carried out under a two-stage randomization design with only two treatment options for each stage (J = K = 2). Under this design, the performance of the ratio estimator, log ratio estimator and proposed tests was assessed under the following scenarios:

  • Scenario I: The survival distributions for treatment regimes A1B1 and A1B2 are the same (note that the treatment regimes A1B1 and A1B2 share the same induction therapy A1);

  • Scenario II: The survival distributions for treatment regimes A1B1 and A2B1 are the same (note that the treatment regimes A1B1 and A2B1 share the same second-stage therapy B1);

  • Scenario III: The survival distributions for treatment regimes A1B1 and A2B2 are the same (note that the treatment regimes A1B1 and A2B2 have different induction and second-stage therapies);

  • Scenario IV: No pattern is specified in the population.

An indicator (X1i) for initial treatment A1 was generated following the Bernoulli(0.5) distribution. An indicator for response (Ri) was generated from the Bernoulli distribution with 60 % response rate. An indicator (Z1i) for second-stage treatment B1 was drawn from the Bernoulli(0.5) distribution where Ri = 1. Based on the stratified proportional hazards model (1), the death time Tjki for the ith patient in the jkth regime is calculated based on the Weibull distribution using

$$T_{jki} = \left\{\frac{-\log(\phi_i)}{\alpha_{jk}\exp\{\beta^T V_i\}}\right\}^{1/\gamma_{jk}},$$

for j = 1, 2, and k = 0, 1, 2, where ϕi was generated from the Uniform(0,1) distribution. Vi = [V1i, V2i]T and both V1i and V2i were generated from the Bernoulli(0.5) distribution. The parameter vector β was set to [0.5, 0.5]T. A variety of values were used for αjk and γjk to assess the properties of the ratio estimator, log ratio estimator and proposed tests under different scenarios (Table 1). Null hypotheses H1211 : ln{θ1211(t)} = 0, H2111 : ln{θ2111(t)} = 0, and H2211 : ln{θ2211(t)} = 0 were true in the population under scenarios I, II and III, respectively. For scenario IV, parameter values were chosen with no pattern. The true cumulative hazard ratios for comparing A1B2, A2B1, and A2B2 to A1B1 under different scenarios are plotted in Fig. 1. Censoring time was generated according to the Uniform(τ/2, τ) distribution, where the value of τ was chosen to be 5, resulting in censoring percentages ranging from 10 to 30 %. The sample size was varied from 400 to 1200, and 1000 replications were performed under each scenario. Time t0 was chosen to be the 75th percentile of the observed time under each scenario. For each sample, the ratios and the log ratios of the cumulative baseline hazards for comparing A1B2, A2B1, and A2B2 to A1B1 were calculated, and the Wald-type tests were performed based on the log ratio estimates.
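The data-generating steps described above can be sketched in Python as follows. This is an illustration, not the authors' simulation code. It assumes that the index k = 0 refers to non-responders (the text does not say so explicitly), it uses the scenario I parameter values from Table 1, and the function name `simulate_trial` and the returned field names are hypothetical.

```python
import numpy as np

# Scenario I values from Table 1, keyed by (j, k); k = 0 is ASSUMED to mean
# "non-responder to Aj".
ALPHA_I = {(1, 0): 0.7, (1, 1): 0.4, (1, 2): 0.4,
           (2, 0): 0.3, (2, 1): 0.2, (2, 2): 0.25}
GAMMA_I = {(1, 0): 1.4, (1, 1): 1.4, (1, 2): 1.4,
           (2, 0): 1.2, (2, 1): 1.2, (2, 2): 1.2}

def simulate_trial(n, alpha, gamma, beta=(0.5, 0.5), tau=5.0, seed=0):
    """Generate one two-stage trial with Weibull death times."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(1, 0.5, n)                        # 1 = A1, 0 = A2
    r = rng.binomial(1, 0.6, n)                         # 60% response rate
    z1 = np.where(r == 1, rng.binomial(1, 0.5, n), 0)   # 1 = B1 (responders only)
    v = rng.binomial(1, 0.5, (n, 2))                    # two binary covariates
    j = np.where(x1 == 1, 1, 2)
    k = np.where(r == 0, 0, np.where(z1 == 1, 1, 2))
    a = np.array([alpha[jk] for jk in zip(j.tolist(), k.tolist())])
    g = np.array([gamma[jk] for jk in zip(j.tolist(), k.tolist())])
    phi = 1.0 - rng.uniform(0.0, 1.0, n)                # in (0, 1], avoids log(0)
    t = (-np.log(phi) / (a * np.exp(v @ np.asarray(beta)))) ** (1.0 / g)
    c = rng.uniform(tau / 2.0, tau, n)                  # censoring times
    return {"x1": x1, "r": r, "z1": z1, "v": v,
            "u": np.minimum(t, c), "delta": (t <= c).astype(int)}

data = simulate_trial(400, ALPHA_I, GAMMA_I, seed=1)
censoring_rate = 1.0 - data["delta"].mean()
```

The death time is obtained by inverting the Weibull survival function at a uniform draw, which is the inverse-transform form of the displayed equation.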

Table 1.

Different values for αjk and γjk, j = 1, 2, and k = 0, 1, 2, under different scenarios (for scenario details see Sect. 4)

Scenario  γ10  γ11  γ12  γ20  γ21  γ22  α10  α11  α12   α20  α21   α22   Censoring (%)
I         1.4  1.4  1.4  1.2  1.2  1.2  0.7  0.4  0.4   0.3  0.2   0.25  10
II        1.0  1.5  1.2  1.0  1.5  1.3  0.5  0.2  0.1   0.5  0.2   0.05  23
III       1.2  1.0  1.2  1.2  1.5  1.0  0.1  0.4  0.7   0.1  0.04  0.4   30
IV        1.0  1.5  1.2  1.5  1.0  1.4  0.7  0.4  0.35  0.3  0.2   0.25  10

Fig. 1.


The true cumulative hazard ratios for comparing A1B2, A2B1, and A2B2 to A1B1 under different scenarios. Solid line A1B2 vs. A1B1, dashed line A2B1 vs. A1B1, dotted line A2B2 vs. A1B1

Table 2 shows the ratio and log ratio estimates (EST), absolute bias (BIAS), asymptotic standard error (ASE), empirical standard deviation (ESD) and 95 % coverage probability (CP) under each scenario. For example, under scenario I and at a sample size of 800, the estimate of θ1211(t0) was 1.01 with an absolute bias of 0.01. The asymptotic standard error was 0.119, close to the empirical standard deviation of 0.121. The corresponding coverage probability was 93 % based on the estimate and asymptotic standard error. Similarly, the estimate of ln{θ1211(t0)} was 0.00 with an absolute bias of 0.01. The asymptotic standard error (0.118) and empirical standard deviation (0.120) were close. The coverage probability was 94 %, better than that for θ^1211(t0). In most cases, the estimates were approximately unbiased, with absolute biases ranging from 0.00 to 0.02. The asymptotic standard errors were close to the empirical standard deviations, demonstrating that the estimated standard errors were consistent. Most coverage probabilities were close to 95 %, attaining the nominal level. As sample size increased, the estimated standard error decreased and the coverage probability increased. Some of the log ratio estimates were less biased than the corresponding ratio estimates, and the overall coverage probabilities were closer to 95 % based on the log ratio estimates than those based on the ratio estimates.

Table 2.

Simulation results under each scenario (for scenario details see Sect. 4). EST: ratio or log ratio estimate; BIAS: absolute bias; ASE: asymptotic standard error; ESD: empirical standard deviation; and CP: 95 % coverage probability

                θ1211(t0)                          ln{θ1211(t0)}
Scenario  n     EST   BIAS  ASE    ESD    CP       EST    BIAS  ASE    ESD    CP
I         400   1.01  0.00  0.166  0.175  0.93     −0.01  0.02  0.165  0.171  0.94
          800   1.01  0.01  0.119  0.121  0.93     0.00   0.01  0.118  0.120  0.94
          1200  1.00  0.01  0.097  0.102  0.93     0.00   0.01  0.097  0.101  0.94
II        400   0.51  0.01  0.084  0.083  0.95     −0.69  0.00  0.167  0.166  0.95
          800   0.51  0.01  0.060  0.061  0.96     −0.69  0.01  0.119  0.121  0.95
          1200  0.51  0.00  0.049  0.051  0.95     −0.69  0.00  0.098  0.100  0.95
III       400   1.14  0.01  0.113  0.105  0.97     0.13   0.00  0.099  0.092  0.97
          800   1.14  0.01  0.082  0.073  0.97     0.13   0.00  0.072  0.064  0.97
          1200  1.14  0.00  0.067  0.058  0.98     0.13   0.00  0.059  0.05   0.98
IV        400   0.79  0.01  0.121  0.126  0.93     −0.24  0.02  0.153  0.157  0.95
          800   0.79  0.01  0.087  0.089  0.92     −0.24  0.02  0.109  0.113  0.93
          1200  0.79  0.01  0.071  0.072  0.94     −0.24  0.02  0.089  0.090  0.94

                θ2111(t0)                          ln{θ2111(t0)}
Scenario  n     EST   BIAS  ASE    ESD    CP       EST    BIAS  ASE    ESD    CP
I         400   0.43  0.01  0.075  0.077  0.94     −0.86  0.01  0.174  0.179  0.94
          800   0.43  0.00  0.053  0.052  0.96     −0.86  0.00  0.123  0.122  0.94
          1200  0.43  0.00  0.043  0.044  0.95     −0.86  0.01  0.101  0.103  0.95
II        400   1.02  0.02  0.181  0.197  0.93     0.00   0.00  0.178  0.190  0.94
          800   1.00  0.01  0.128  0.131  0.95     0.00   0.00  0.127  0.129  0.95
          1200  1.01  0.01  0.105  0.108  0.94     0.00   0.00  0.105  0.107  0.95
III       400   0.51  0.02  0.098  0.103  0.94     −0.70  0.01  0.195  0.202  0.94
          800   0.50  0.01  0.070  0.072  0.95     −0.70  0.02  0.140  0.143  0.95
          1200  0.50  0.01  0.057  0.057  0.96     −0.70  0.01  0.115  0.113  0.95
IV        400   0.44  0.01  0.076  0.075  0.95     −0.82  0.02  0.170  0.168  0.95
          800   0.44  0.01  0.053  0.054  0.96     −0.83  0.01  0.121  0.123  0.94
          1200  0.44  0.01  0.044  0.045  0.95     −0.83  0.01  0.099  0.101  0.95

                θ2211(t0)                          ln{θ2211(t0)}
Scenario  n     EST   BIAS  ASE    ESD    CP       EST    BIAS  ASE    ESD    CP
I         400   0.49  0.00  0.084  0.088  0.93     −0.72  0.01  0.171  0.182  0.93
          800   0.49  0.00  0.059  0.058  0.95     −0.72  0.01  0.122  0.119  0.96
          1200  0.49  0.00  0.049  0.049  0.93     −0.72  0.01  0.100  0.100  0.95
II        400   0.38  0.00  0.069  0.069  0.94     −0.99  0.01  0.182  0.181  0.94
          800   0.38  0.00  0.049  0.048  0.95     −0.99  0.01  0.130  0.129  0.95
          1200  0.37  0.00  0.040  0.039  0.94     −0.99  0.01  0.106  0.104  0.95
III       400   1.01  0.01  0.175  0.180  0.94     0.00   0.01  0.172  0.177  0.95
          800   1.01  0.00  0.125  0.129  0.94     0.00   0.01  0.124  0.127  0.95
          1200  1.00  0.00  0.102  0.100  0.95     0.00   0.00  0.102  0.099  0.95
IV        400   0.61  0.01  0.101  0.104  0.93     −0.51  0.00  0.165  0.172  0.95
          800   0.60  0.00  0.071  0.071  0.95     −0.51  0.01  0.118  0.117  0.95
          1200  0.60  0.00  0.058  0.058  0.94     −0.51  0.01  0.096  0.097  0.96

Table 3 presents the rejection rates for testing null hypotheses H1211 : ln{θ1211(t)} = 0, H2111 : ln{θ2111(t)} = 0, and H2211 : ln{θ2211(t)} = 0, separately under each scenario. Because the null hypotheses H1211 : ln{θ1211(t)} = 0, H2111 : ln{θ2111(t)} = 0, and H2211 : ln{θ2211(t)} = 0, were true under scenarios I, II and III, respectively, the rejection rates for H1211, H2111, and H2211 were close to the nominal level of 0.05 under scenarios I, II and III, respectively, suggesting that the tests were approximately unbiased. The rejection rates were relatively larger under the smaller sample size of 400. As sample size increased, the rejection rate approached 0.05.

Table 3.

Rejection rates for testing null hypotheses H1211 : ln{θ1211(t)} = 0, H2111 : ln{θ2111(t)} = 0, and H2211 : ln{θ2211(t)} = 0 under different scenarios. Scenarios I, II and III respectively represent null hypotheses H1211, H2111 and H2211, while for scenario IV, all three null hypotheses are false

Scenario  n     H1211  H2111  H2211
I         400   0.059  0.999  0.991
          800   0.053  1.000  1.000
          1200  0.055  1.000  1.000
II        400   0.992  0.064  1.000
          800   1.000  0.048  1.000
          1200  1.000  0.050  1.000
III       400   0.233  0.937  0.050
          800   0.444  1.000  0.050
          1200  0.603  1.000  0.046
IV        400   0.351  0.992  0.864
          800   0.587  1.000  0.994
          1200  0.753  1.000  0.999

5 Analysis of Neuroblastoma Data

From 1991 to 1996, the Children’s Cancer Group conducted a randomized clinical trial to compare a combination of myeloablative chemotherapy, total-body irradiation and transplantation of autologous bone marrow purged of cancer cells (ABMT) with standard chemotherapy in treating children with high-risk neuroblastoma [13]. Since relapse is common after completion of induction therapies among children with high-risk neuroblastoma, patients without progressive disease (PD) or histologically confirmed disease (HCD) after induction therapies were then randomly assigned to receive either 13-cis-retinoic acid (cis-RA) or no further therapy. Therefore, the high-risk neuroblastoma clinical trial followed a two-stage randomization design. By the end of the study, a total of 379 children participated in the first randomization, with 190 children assigned to ABMT, and 189 children assigned to chemotherapy. After the completion of the first-stage therapy, 203 children had no PD or HCD, and thus participated in the second randomization. During the second randomization, 102 children were assigned to cis-RA, and 101 children were assigned to no further therapy. Matthay et al. [13] reported 55 more children who participated in the second randomization, resulting in 130 children assigned to cis-RA and 128 children assigned to no further therapy. These children were not included in our analysis because they did not participate in the first randomization. This two-stage randomization design resulted in four treatment regimes: (i) treat with ABMT, followed by cis-RA if no PD or HCD (AC); (ii) treat with ABMT, followed by no further therapy if no PD or HCD (AN); (iii) treat with chemotherapy, followed by cis-RA if no PD or HCD (CC); (iv) treat with chemotherapy, followed by no further therapy if no PD or HCD (CN).

A stratified proportional hazards model based on (1) was applied to the neuroblastoma data, using treatment regimes as strata. The only baseline characteristic found to be imbalanced among treatment groups in Matthay et al. [13] was “Evan’s stage”, which was therefore included in the model as one of the covariates. Additionally, “age” was included in the model as a continuous covariate. The time to second randomization did not differ significantly across the first-stage treatments (p = 0.26). The resulting log ratio estimates of the cumulative baseline hazards and their corresponding 95 % confidence intervals within the time interval of [0, 2000] days are shown in Fig. 2. From the plot of the log ratio estimate of the cumulative baseline hazards comparing treatment regime AN to AC, we observed a notable difference around 450 days. However, the horizontal “zero” line was within the confidence interval most of the time, suggesting that there was no significant difference between treatment regimes AN and AC. In the plot for comparing treatment regime CC to AC, the confidence band moved further away from “zero” as time increased. Thus, we would suspect that children following the treatment regime AC had better overall survival compared to those following the regime CC. Besides, the confidence band also became narrower as time increased, since there was higher variability with fewer events at the beginning of the study. Similar results were observed from the plot for comparing treatment regime CN to AC. The cumulative hazard ratios (95 % confidence interval) for comparing AN, CC, and CN to AC at year 3 were estimated to be 1.10 (0.78, 1.43), 1.13 (0.74, 1.52), and 1.02 (0.67, 1.38), respectively. The cumulative hazard ratios (95 % confidence interval) for comparing AN, CC and CN to AC at year 5 were estimated to be 1.22 (0.86, 1.57), 1.38 (0.92, 1.85), and 1.43 (0.95, 1.91), respectively.
We also used the Wald Chi-square test to evaluate the overall difference among the four treatment regimes at year 5. It resulted in a p-value of 0.17, showing that there was no overall significant difference in the cumulative hazard among the four treatment regimes at 5 years. Therefore, we would suspect that children with high-risk neuroblastoma would have a similar overall 5-year survival irrespective of the induction therapy they received. Subsequent treatment with cis-RA after the completion of induction therapy did not significantly improve the overall 5-year survival either.

Fig. 2.


Estimated log ratios of the cumulative baseline hazards and the corresponding 95 % pointwise confidence intervals from the neuroblastoma study. AC: “treat with ABMT, and then followed by cis-RA if no PD or HCD,” AN: “treat with ABMT, and then followed by no further therapy if no PD or HCD,” CC: “treat with chemotherapy, and then followed by cis-RA if no PD or HCD,” CN: “treat with chemotherapy, and then followed by no further therapy if no PD or HCD”

6 Discussion

In this article, we proposed a stratified proportional hazards model to estimate the cumulative treatment effect for treatment regimes from sequentially randomized clinical trials. This approach is similar to that advocated by Wei and Schaubel [8] for a single-stage randomization. Comparisons among treatment regimes were performed by testing the log ratio of the estimated cumulative hazards. Simulation results showed that the ratio estimator was approximately unbiased, and the coverage probabilities were close to the nominal level. However, the log ratio estimator performed better, with smaller absolute biases and better coverage probabilities. The comparative hypothesis testing can also be performed maintaining adequate type I errors from the proposed model under moderate sample size based on the Wald-type tests. In this paper we assumed that the randomization probabilities are known (by design). Estimation of these probabilities from the observed data might result in more efficient estimators.

Wald-type tests were used to assess the survival difference between treatment regimes at a specific time point. Although such point-wise comparisons are of interest in many disease-specific areas, comparisons of treatment regimes based on overall hazard curves may be of importance alongside the construction of simultaneous confidence bands. This issue is beyond the scope of this manuscript and is being considered for a separate publication.

Acknowledgements

We would like to thank COG Neuroblastoma Disease Committee for kind permission to use the neuroblastoma data set, especially Dr. Wendy London for her help during the application process. We thank Dr. Susan Ellenberg from University of Pennsylvania for her insightful comments. We also thank the referees of this article for their helpful comments. Dr. Wahed’s research was in part supported by a National Institute of Mental Health Grant P30 MH090333.

Appendix A. Outline of the asymptotic normality of $\hat{\theta}_{jkj'k'}(t)$

The following are the equivalent assumptions in sequentially randomized clinical trials to those outlined in Sect. 3 of Wei and Schaubel [8].

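  1. $(X_{ji}, R_i, R_i Z_{ki}, U_i, \Delta_i, V_i)$ are i.i.d. random vectors for $i = 1, \ldots, n$.

  2. Elements of $V_i$ have bounded total variation for $i = 1, \ldots, n$.

  3. The cumulative hazard is finite over a pre-specified interval $[0, L]$ such that $P(U_i > L) > 0$.

  4. $$y_{jk}^{(1)}(t,\beta) = \frac{\partial}{\partial \beta}\, y_{jk}^{(0)}(t,\beta) \quad \text{and} \quad y_{jk}^{(2)}(t,\beta) = \frac{\partial^2}{\partial \beta\, \partial \beta^T}\, y_{jk}^{(0)}(t,\beta),$$
    where
    $$y_{jk}^{(d)}(t,\beta) = \lim_{n \to \infty} Y_{jk}^{(d)}(t,\beta) \quad \text{for } d = 0, 1, 2,$$
    where
    $$Y_{jk}^{(d)}(t,\beta) = n^{-1} \sum_{i=1}^{n} Y_{jki}(t)\, V_i^{\otimes d} \exp\{\beta^T V_i\} \quad \text{for } d = 0, 1, 2, \quad \text{where } V_i^{\otimes 0} = 1,\ V_i^{\otimes 1} = V_i,\ V_i^{\otimes 2} = V_i V_i^T,$$
    with $y_{jk}^{(1)}(t,\beta)$ and $y_{jk}^{(2)}(t,\beta)$ bounded away from 0 for $t \in [0, L]$ and $\beta$ in an open set.
  5. Positive-definiteness of the matrix
    $$\Omega(\beta) = \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \tau_{jk}(t,\beta)\, y_{jk}^{(0)}(t,\beta)\, d\Lambda_{jk0}(t,\beta),$$
    where
    $$\tau_{jk}(t,\beta) = \frac{y_{jk}^{(2)}(t,\beta)}{y_{jk}^{(0)}(t,\beta)} - v_{jk}(t,\beta)^{\otimes 2}.$$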

Consistency and asymptotic normality of $\hat{\theta}_{jkj'k'}(t)$

Consistency and asymptotic normality of $\hat{\theta}_{jkj'k'}(t)$ can be established in a similar manner as in the proofs of Theorems 1 and 2 given in the Web Appendix of Wei and Schaubel [8], as long as the following results hold:

  1. $\hat{\beta} \xrightarrow{a.s.} \beta_0$.

  2. $\sqrt{n}(\hat{\beta} - \beta_0)$ is asymptotically normal.

  3. $\hat{\Lambda}_{jk}(t,\hat{\beta})$ is a uniformly consistent estimator of $\Lambda_{jk}(t,\beta)$.

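1. Consistency of $\hat{\beta}$

Recall that $\hat{\beta}$ is a solution to the equation

$$U_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \bar{V}_{jk}(t,\beta)\}\, dN_{jki}(t) = 0.$$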

First note that the processes $Y_{jk}(t)$ and $N_{jk}(t)$ are both cadlag processes and hence they are Donsker. Since the classes $\{\beta \in B\}$ and $\{V\}$ are trivially Donsker, the functions $Y_{jk}(t)\exp\{\beta^T V\}$, $V Y_{jk}(t)\exp\{\beta^T V\}$, and $V V^T Y_{jk}(t)\exp\{\beta^T V\}$ are all Donsker for $t \in [0, L]$, $\beta \in B$. The derivative of $U_n(\beta)$ with respect to $\beta$ is $\Omega_n(\beta)$, where

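$$\Omega_n(\beta) = \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \left[ \frac{\sum_{p=1}^{n} V_p V_p^T Y_{jkp}(t) \exp(\beta^T V_p)}{\sum_{p=1}^{n} Y_{jkp}(t) \exp(\beta^T V_p)} - \left\{ \frac{\sum_{p=1}^{n} V_p Y_{jkp}(t) \exp(\beta^T V_p)}{\sum_{p=1}^{n} Y_{jkp}(t) \exp(\beta^T V_p)} \right\}^{\otimes 2} \right] \frac{1}{n} \sum_{i=1}^{n} dN_{jki}(t),$$

where $^{\otimes 2}$ denotes the outer product. All functions in the above expressions are Glivenko–Cantelli, and the limiting value $y_{jk}^{(0)}(t,\beta)$ of $\frac{1}{n} \sum_{p=1}^{n} Y_{jkp}(t) \exp(\beta^T V_p)$ is bounded away from zero. Therefore,

$$\sup_{\beta \in B} \left\| \Omega_n(\beta) - \Omega(\beta) \right\| \xrightarrow{a.s.} 0,$$

where $\Omega(\beta)$ is defined in (3). Since $\Omega(\beta)$ is positive semidefinite, $U_n(\beta)$ is almost surely convex for large $n$. Therefore, $\hat{\beta} \xrightarrow{a.s.} \beta_0$.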
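As a rough illustration of how the estimating equation $U_n(\beta) = 0$ might be solved numerically, the following Python sketch applies Newton–Raphson for a single scalar covariate, summing the score over regime strata. It assumes, which is not stated in this appendix, that the weighted processes take the form $N_{jki}(t) = w_{jki} I(U_i \le t, \Delta_i = 1)$ and $Y_{jki}(t) = w_{jki} I(U_i \ge t)$ for a regime-specific weight $w_{jki}$. The function names, the tuple layout, and the toy data are hypothetical.

```python
import math

def score_and_info(beta, strata, n):
    """Return (U_n(beta), Omega_n(beta)) for a scalar covariate.

    strata: one list per regime (j, k); each entry is a tuple
            (time, event, v, weight). The weight is an assumed
            regime-specific weight, 0 for inconsistent subjects.
    n:      number of subjects in the trial.
    """
    U, Om = 0.0, 0.0
    for rows in strata:
        for (u, d, v, w) in rows:
            if d != 1 or w == 0:
                continue  # only weighted observed events contribute jumps
            s0 = s1 = s2 = 0.0
            for (u2, _, v2, w2) in rows:
                if u2 >= u:  # subject still at risk at time u
                    r = w2 * math.exp(beta * v2)
                    s0 += r
                    s1 += r * v2
                    s2 += r * v2 * v2
            vbar = s1 / s0               # weighted covariate mean at time u
            U += w * (v - vbar)
            Om += w * (s2 / s0 - vbar ** 2)
    return U / n, Om / n

def solve_beta(strata, n, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations for the root of U_n."""
    for _ in range(max_iter):
        U, Om = score_and_info(beta, strata, n)
        step = U / Om
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data: (time, event, covariate, weight); two identical strata.
rows = [(1.0, 1, 1.0, 1.0), (2.0, 1, 0.0, 1.0),
        (3.0, 1, 1.0, 1.0), (4.0, 1, 0.0, 1.0)]
beta_hat = solve_beta([rows, rows], n=4)
```

Because a single $\beta$ is shared across strata while each stratum keeps its own risk sets, the score is a sum of stratum-specific contributions, which is what the double sum over $j$ and $k$ expresses.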

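2. Asymptotic normality of $\sqrt{n}(\hat{\beta} - \beta_0)$

We write $\Psi_n(\beta)$ as

$$\Psi_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \bar{v}_{jk}(t,\beta)\}\, dM_{jki}(t) = 0,$$

where

$$M_{jki}(t,\beta) = N_{jki}(t) - \int_0^t Y_{jki}(s) \exp(\beta^T V_i)\, d\Lambda_{jk0}(s).$$

Let

$$\Psi(\beta) = E\left[ \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V - \bar{v}_{jk}(t,\beta)\}\, dM_{jk}(t) \right],$$

where

$$\bar{v}_{jk}(t,\beta) = \frac{E[V Y_{jk}(t) \exp(\beta^T V)]}{E[Y_{jk}(t) \exp(\beta^T V)]}.$$

Then the arguments on pages 56 and 57 in Kosorok (2008, Chap. 4, [11]) can be applied to show that $\sqrt{n}(\hat{\beta} - \beta_0)$ converges weakly to a mean-zero normal random vector with covariance matrix $\Omega^{-1}(\beta_0)$.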

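3. Almost sure convergence of $\hat{\Lambda}_{jk}(t,\hat{\beta})$

Note that

$$\hat{\Lambda}_{jk}(t,\hat{\beta}) = \int_0^t \frac{P_n\, dN_{jk}(s)}{P_n\, Y_{jk}(s) \exp(\hat{\beta}^T V)},$$

where $P_n$ is the empirical measure, namely, $P_n f = \frac{1}{n} \sum_{i=1}^{n} f(x_i)$ for real-valued functions $f: \mathcal{X} \to \mathbb{R}$, $\mathcal{X}$ being the sample space. This is basically the same estimator defined in the first display on page 57 of Kosorok (2008, Chap. 4, [11]). Hence, a straightforward application of the argument therein leads to

$$\sup_{t \in [0, L]} \left| \hat{\Lambda}_{jk}(t,\hat{\beta}) - \Lambda_{jk0}(t) \right| \xrightarrow{a.s.} 0.$$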
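The Breslow-type estimator above can be sketched in a few lines of Python. This is only a sketch under an assumption not spelled out in this appendix: that $N_{jk}$ and $Y_{jk}$ are weighted versions of the usual counting and at-risk processes, with a regime-specific weight per subject (setting all weights to 1 gives the ordinary Breslow estimator). The function name and the toy data are hypothetical.

```python
import math

def weighted_breslow(t, times, events, covs, weights, beta):
    """Cumulative baseline hazard at time t for one regime stratum.

    times:   observed times U_i
    events:  1 if the event was observed, 0 if censored
    covs:    list of covariate vectors V_i
    weights: assumed regime-specific weights (0 if inconsistent)
    beta:    regression coefficient vector
    """
    # Weighted relative risk w_i * exp(beta^T V_i) for every subject.
    risk = [w * math.exp(sum(b * v for b, v in zip(beta, V)))
            for w, V in zip(weights, covs)]
    total = 0.0
    for u, d, w in zip(times, events, weights):
        if d == 1 and u <= t and w > 0:
            # Sum of weighted risks over subjects still at risk at time u.
            denom = sum(r for r, uu in zip(risk, times) if uu >= u)
            total += w / denom
    return total

# Toy data with no covariate effect.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
covs = [[0.0], [0.0], [0.0], [0.0]]
lam_unweighted = weighted_breslow(3.0, times, events, covs, [1.0] * 4, [0.0])
lam_weighted = weighted_breslow(4.0, times, events, covs, [2.0, 1.0, 0.0, 1.0], [0.0])
```

With unit weights and $\beta = 0$ the function reduces to the Nelson–Aalen estimator, which gives a quick sanity check of the implementation.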

With the above results (1–3), it is now straightforward to apply the arguments in Web Appendix A of Wei and Schaubel [8] to establish that $\hat{\theta}_{jkj'k'}(t)$ is uniformly consistent and asymptotically normal.

Appendix B. Estimation of $\hat{\xi}_{jkj'k'i}(t,\hat{\beta})$

The detailed steps for computing $\hat{\xi}_{jkj'k'i}(t,\hat{\beta})$ are as follows:

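Step 1: Calculate

$$\hat{\bar{V}}_{jk}(s,\hat{\beta}) = \frac{\sum_{p=1}^{n} V_p Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}}.$$

Step 2: Calculate

$$\begin{aligned}
\hat{h}_{jk}(t,\hat{\beta}) &= -\int_0^t \hat{\bar{V}}_{jk}(s,\hat{\beta})\, d\hat{\Lambda}_{jk0}(s,\hat{\beta}) \\
&= -\int_0^t \hat{\bar{V}}_{jk}(s,\hat{\beta})\, \frac{\sum_{i=1}^{n} dN_{jki}(s)}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}} \\
&= -\sum_{i=1}^{n} \int_0^t \frac{\hat{\bar{V}}_{jk}(s,\hat{\beta})\, dN_{jki}(s)}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}}.
\end{aligned}$$

The minus sign follows from differentiating $\hat{\Lambda}_{jk0}(t,\beta)$ with respect to $\beta$ and is needed for consistency with the plus sign in Step 7.

Step 3: Define

$$\hat{\tau}_{jk}(t,\hat{\beta}) = \frac{\sum_{p=1}^{n} V_p V_p^T Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\}}{\sum_{p=1}^{n} Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\}} - \hat{\bar{V}}_{jk}(t,\hat{\beta})\, \hat{\bar{V}}_{jk}^T(t,\hat{\beta}).$$

Step 4: Calculate

$$\begin{aligned}
\hat{\Omega}(\hat{\beta}) &= \frac{1}{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \hat{\tau}_{jk}(t,\hat{\beta}) \sum_{p=1}^{n} Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\}\, d\hat{\Lambda}_{jk0}(t,\hat{\beta}) \\
&= \frac{1}{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \hat{\tau}_{jk}(t,\hat{\beta}) \sum_{p=1}^{n} Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\} \times \frac{\sum_{i=1}^{n} dN_{jki}(t)}{\sum_{p=1}^{n} Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\}} \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \hat{\tau}_{jk}(t,\hat{\beta})\, dN_{jki}(t).
\end{aligned}$$

Step 5: Calculate

$$\begin{aligned}
\hat{\Psi}_i(\hat{\beta}) &= \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \hat{\bar{V}}_{jk}(t,\hat{\beta})\}\, d\hat{M}_{jki}(t,\hat{\beta}) \\
&= \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \hat{\bar{V}}_{jk}(t,\hat{\beta})\} \left[ dN_{jki}(t) - Y_{jki}(t) \exp\{\hat{\beta}^T V_i\}\, d\hat{\Lambda}_{jk0}(t,\hat{\beta}) \right] \\
&= \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \hat{\bar{V}}_{jk}(t,\hat{\beta})\}\, dN_{jki}(t) - \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \hat{\bar{V}}_{jk}(t,\hat{\beta})\}\, Y_{jki}(t) \exp\{\hat{\beta}^T V_i\} \times \frac{\sum_{l=1}^{n} dN_{jkl}(t)}{\sum_{p=1}^{n} Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\}} \\
&= \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \{V_i - \hat{\bar{V}}_{jk}(t,\hat{\beta})\}\, dN_{jki}(t) - \sum_{l=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \int_0^L \frac{\{V_i - \hat{\bar{V}}_{jk}(t,\hat{\beta})\}\, Y_{jki}(t) \exp\{\hat{\beta}^T V_i\}}{\sum_{p=1}^{n} Y_{jkp}(t) \exp\{\hat{\beta}^T V_p\}}\, dN_{jkl}(t),
\end{aligned}$$

where the inner sum over subjects is indexed by $l$ to distinguish it from the fixed subject index $i$.

Step 6: Define

$$\begin{aligned}
\hat{\Phi}_{jki}^{L}(t,\hat{\beta}) &= \int_0^t \frac{n}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}}\, d\hat{M}_{jki}(s,\hat{\beta}) \\
&= \int_0^t \frac{n}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}} \times \left[ dN_{jki}(s) - Y_{jki}(s) \exp\{\hat{\beta}^T V_i\}\, d\hat{\Lambda}_{jk0}(s,\hat{\beta}) \right] \\
&= \int_0^t \frac{n\, dN_{jki}(s)}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}} - \int_0^t \frac{n\, Y_{jki}(s) \exp\{\hat{\beta}^T V_i\}}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}} \cdot \frac{\sum_{l=1}^{n} dN_{jkl}(s)}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}} \\
&= \int_0^t \frac{n\, dN_{jki}(s)}{\sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\}} - \sum_{l=1}^{n} \int_0^t \frac{n\, Y_{jki}(s) \exp\{\hat{\beta}^T V_i\}\, dN_{jkl}(s)}{\left[ \sum_{p=1}^{n} Y_{jkp}(s) \exp\{\hat{\beta}^T V_p\} \right]^2}.
\end{aligned}$$

Step 7: Calculate

$$\hat{\Phi}_{jki}(t,\hat{\beta}) = \hat{h}_{jk}^T(t,\hat{\beta})\, \hat{\Omega}^{-1}(\hat{\beta})\, \hat{\Psi}_i(\hat{\beta}) + \hat{\Phi}_{jki}^{L}(t,\hat{\beta}).$$

Step 8: Calculate

$$\hat{\xi}_{jkj'k'i}(t,\hat{\beta}) = \frac{\hat{\Phi}_{jki}(t,\hat{\beta})}{\hat{\Lambda}_{j'k'0}(t,\hat{\beta})} - \frac{\hat{\Lambda}_{jk0}(t,\hat{\beta})\, \hat{\Phi}_{j'k'i}(t,\hat{\beta})}{\hat{\Lambda}_{j'k'0}^{2}(t,\hat{\beta})}.$$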
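The final step is an application of the delta method to a ratio of two cumulative baseline hazards, and it can be sketched in Python as follows. The sketch takes the per-subject quantities from Step 7 and the two cumulative hazard estimates as given inputs. It additionally assumes, as is usual for influence-function estimators but not shown in this appendix, that the asymptotic variance of $\sqrt{n}(\hat{\theta} - \theta)$ is estimated by the average of the squared $\hat{\xi}$ values. All function names and numbers are hypothetical.

```python
import math

def ratio_influence(phi_num, phi_den, lam_num, lam_den):
    """Step 8: per-subject influence terms for the ratio lam_num / lam_den.

    phi_num, phi_den: per-subject Phi-hat values (Step 7) for the
                      numerator and denominator regimes
    lam_num, lam_den: cumulative baseline hazard estimates at time t
    """
    return [a / lam_den - lam_num * b / lam_den ** 2
            for a, b in zip(phi_num, phi_den)]

def ratio_standard_errors(phi_num, phi_den, lam_num, lam_den):
    """Return (theta_hat, se of theta_hat, se of log theta_hat)."""
    n = len(phi_num)
    xi = ratio_influence(phi_num, phi_den, lam_num, lam_den)
    theta = lam_num / lam_den
    # Assumed estimator of the asymptotic variance of sqrt(n)(theta_hat - theta).
    avar = sum(x * x for x in xi) / n
    se_theta = math.sqrt(avar / n)
    # Delta method for the log transform: se(log theta) = se(theta) / theta.
    se_log_theta = se_theta / theta
    return theta, se_theta, se_log_theta

# Illustrative inputs only.
theta, se_theta, se_log = ratio_standard_errors(
    [1.0, -1.0], [0.5, -0.5], 0.2, 0.4)
```

The standard error on the log scale is the quantity needed for the Wald-type tests and pointwise confidence intervals of the log ratio discussed in the main text.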

Contributor Information

Xinyu Tang, Biostatistics Program, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72202, USA.

Abdus S. Wahed, Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA.

References

  1. Lokhnygina Y, Helterbrand JD. Cox regression methods for two-stage randomization designs. Biometrics. 2007;63:422–428. doi: 10.1111/j.1541-0420.2007.00707.x.
  2. Lunceford JK, Davidian M, Tsiatis AA. Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2002;58:48–57. doi: 10.1111/j.0006-341x.2002.00048.x.
  3. Guo X, Tsiatis AA. A weighted risk estimator for survival distributions in two-stage randomization designs with censored survival data. Int J Biostat. 2005;1:1–15.
  4. Hernan MA, Lanoy E, Costagliola D, Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic Clin Pharmacol Toxicol. 2006;98:237–242. doi: 10.1111/j.1742-7843.2006.pto_329.x.
  5. Feng W, Wahed AS. Supremum weighted log-rank test and sample size for comparing two-stage adaptive treatment strategies. Biometrika. 2008;95:695–707.
  6. Miyahara S, Wahed AS. Weighted Kaplan–Meier estimators for two-stage treatment regimes. Stat Med. 2010;29:2581–2591. doi: 10.1002/sim.4020.
  7. Goldberg Y, Kosorok MR. Q-learning and censored data. Ann Stat. 2012. doi: 10.1214/12-AOS968.
  8. Wei G, Schaubel DE. Estimating cumulative treatment effects in the presence of nonproportional hazards. Biometrics. 2008;64:724–732. doi: 10.1111/j.1541-0420.2007.00947.x.
  9. Wahed AS, Tsiatis AA. Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2004;60:124–133. doi: 10.1111/j.0006-341X.2004.00160.x.
  10. Fleming TR, Harrington DP. Counting processes and survival analysis. Wiley; New York: 1991.
  11. Kosorok MR. Introduction to Empirical Processes and Semiparametric Inference. Springer; New York: 2008.
  12. Breslow NE. Contribution to the discussion of the paper by D.R. Cox. J R Stat Soc, Ser B. 1972;34:187–220.
  13. Matthay KK, Reynolds P, Seeger RC, Shimada H, Adkins ES, Haas-Kogan A, et al. Long-term results for children with high-risk neuroblastoma treated on a randomized trial of myeloablative therapy followed by 13-cis-retinoic acid: A Children’s Oncology Group study. J Clin Oncol. 2009;27:1007–1013. doi: 10.1200/JCO.2007.13.8925.
