Summary
We consider methods for estimating the treatment effect and/or the covariate by treatment interaction effect in a randomized clinical trial with non-compliance and a time-to-event outcome. As in Cuzick et al. (2007), we assume that the patient population consists of three (possibly latent) subgroups based on treatment preference, the ambivalent group, the insisters and the refusers, and we estimate the effects among the ambivalent group. The parameters have causal interpretations under standard assumptions. The paper contains two main contributions. First, we propose a weighted per-protocol (Wtd PP) estimator that incorporates time-varying weights in a proportional hazards model. Second, under the model considered in Cuzick et al. (2007), we propose an EM algorithm to maximize a full likelihood (FL) as well as the pseudo likelihood (PL) considered in Cuzick et al. (2007). The E step of the algorithm involves computing the conditional expectation of a linear function of the latent membership, and the main advantage of the EM algorithm is that the risk parameters can be updated by fitting a weighted Cox model using standard software and the baseline hazard can be updated using closed form solutions. Simulations show that the EM algorithm is computationally much more efficient than directly maximizing the observed likelihood. The main advantage of the Wtd PP approach is that it is more robust to model misspecification among the insisters and refusers, since the outcome model does not impose distributional assumptions on these two groups.
Keywords: All-or-Nothing Compliance, EM Algorithm, Proportional Hazards Model, Randomized Clinical Trial, Weighted Partial Likelihood
1. Introduction
In a randomized clinical trial with time-to-event outcome, the treatment effect is often estimated using the proportional hazards (PH) model following the intention-to-treat (ITT) principle. When patient non-compliance occurs, the ITT estimator measures the effect of the treatment assignment, rather than the true treatment efficacy. Simple alternatives to the ITT analysis such as the as-treated analysis (AT) and the per-protocol analysis (PP) could also be biased due to potential confounding by measured or unmeasured factors.
Often, more sophisticated models that incorporate covariates and/or treatment by covariate interactions are of interest. Our motivating study for this paper is the PACCT-1 trial, a phase III breast cancer trial led by the ECOG-ACRIN Cancer Research Group. This trial evaluates the use of the OncoType DX Recurrence Score (RS, developed by Genomic Health, Inc. based on expression profiles from 21 genes, Paik et al. (2004)) as a marker to guide treatment decisions. In the study design, low risk patients (RS < 11) were treated with standard hormonal therapy (H) and high risk patients (RS > 25) were treated with hormonal therapy plus chemotherapy (H+C), while the intermediate risk patients (RS 11–25) were randomized between H and H+C. To address the primary question of whether adding chemotherapy is beneficial for the intermediate risk group, the study was designed to have high power to detect a difference in 5-year invasive disease-free survival of 90% (H+C) vs. 87% (H). Accrual to the study was closed in October 2010, with 6,907 intermediate risk patients randomized. When the intermediate risk patients are further classified into three RS categories with increasing risk, 11–15, 16–20 and 21–25, treatment non-compliance is observed in both arms within each risk category. As the risk category increases, the non-compliance rate increases in arm H (5% vs. 6% vs. 12%) but decreases in arm H+C (25% vs. 15% vs. 11%), implying increased preference for C as risk rises. The term non-compliance is used here to refer to the situation where a patient was randomized to H but was actually treated with H+C, or vice versa.
While the study was designed to use an ITT comparison as the primary analysis, given that relatively high non-compliance may reduce the power of the analysis and may bias the treatment effect towards no difference, further analyses to adjust for non-compliance may be necessary. As retrospective experience suggested that high RS patients may receive greater benefit from chemotherapy (Paik et al. (2006)), it is also of interest to explore the treatment by RS interaction effect.
Approaches to adjust for non-compliance in randomized trials have been proposed in various outcome settings, including time-to-event endpoints. Most methods are formulated to estimate the causal effect of the treatment actually received in structural models for counterfactual outcomes. Robins and Tsiatis (1991) estimated the causal treatment effect in a structural accelerated failure time model. However, in cancer clinical trials, the treatment effect on the PH scale is more commonly of interest. Loeys and Goetghebeur (2003) proposed estimating the causal PH effect among the treatable subpopulation defined by the potential treatment compliance variable. Cuzick et al. (2007) proposed several estimators of the treatment effect on a PH scale among the ambivalent group, a latent subgroup of subjects who do not have a preference between the treatments, but only their mixture likelihood method is general enough to allow covariate effects and correlation between the covariates and the treatment preference groups.
The marginal structural Cox model introduced by Robins (2000) and illustrated by Hernán et al. (2001), can be applied in the non-compliance setting to estimate the population average causal effect of treatment actually received using the inverse probability of treatment weighted (IPTW) method. This method does not exploit randomization but rather relies on the no unmeasured confounder assumption (NUCA), which requires that all covariates predicting both treatment and outcome are collected and incorporated in the weights. It was originally developed for complex longitudinal observational studies with time-varying treatment and therefore could incorporate time dependent non-compliance. Both the IPTW method and the mixture likelihood method could incorporate covariates and possible treatment by covariate interactions.
In this article, we propose a weighted per-protocol (Wtd PP) approach to adjust for treatment non-compliance in a PH model. All-or-nothing compliance is assumed. In the PACCT-1 trial, compliance is with respect to the chemotherapy, which is of short duration relative to time to events. Also, almost all patients who started chemotherapy received several cycles (the planned number is 4–6). As in Cuzick et al. (2007), we assume that the patient population consists of three subgroups based on patient treatment preference: the ambivalent group, the insisters and the refusers; our objective is to estimate the treatment effect among the ambivalent group, which is also referred to as the compliers in the principal stratification context (e.g. Frangakis and Rubin (2002)). Various papers have considered estimating the treatment effect among the ambivalent group (e.g. Rubin et al. (1996), Abadie (2003)). Since the outcome model specifies effects within the ambivalent group without imposing distributional assumptions on the outcomes for the other two groups, this method is potentially more robust to model misspecification among the insisters and refusers, although the model used for estimating the weights does restrict the distributions. In contrast, the likelihood method considered in Cuzick et al. (2007) restricts the outcome distributions for the insisters and refusers through the main outcome models.
In the second part of the paper, under the proportional hazards model considered in Cuzick et al. (2007), we propose an EM algorithm to estimate the parameters in a full likelihood (FL), and we also modify the EM algorithm to maximize the pseudo likelihood (PL) considered in Cuzick et al. (2007). The EM algorithm is computationally much more efficient than directly optimizing the likelihood using standard general purpose algorithms, since the risk parameters can be updated using standard software and the baseline hazard parameters can be updated using closed form formulas.
The paper is organized as follows. In Section 2, we introduce the notation and the model. In Section 3, the Wtd PP estimator is developed, and we propose the FL estimator and the PL estimator obtained from the EM algorithms. The performance of the three estimators is evaluated through simulations in Section 4. In Section 5, the methods are illustrated by analyzing an ECOG-ACRIN breast cancer clinical trial. Section 6 gives relevant discussion.
2. Notation and the model
Consider a clinical trial with non-compliance where the interest is in comparing an experimental treatment E vs. a standard treatment S. Denote treatment E as 1 and treatment S as 0. Let Zi be a vector of baseline covariates and Ri the randomized arm, and assume P(Ri = 1|zi) = ρ and P(Ri = 0|zi) = 1 − ρ, where ρ is known by design and does not depend on Zi. Let Ai be the treatment actually received. Under all-or-nothing compliance, both Ri and Ai are binary variables. Let Ti be the time to failure, Ci the censoring time, Ui = min(Ti, Ci) and δi = I(Ti ≤ Ci). We observe n independent and identically distributed (iid) copies Oi = (Ui, δi, Zi, Ri, Ai).
As in Cuzick et al. (2007), assume the patient population consists of three latent groups determined by patient preference for E: ambivalent (amb) – subjects who receive whichever treatment they are offered; insisters (ins) – subjects who always receive E regardless of the assigned treatment; and refusers (ref) – subjects who always receive S regardless of the assigned treatment. Denote this membership as Mi, and throughout the paper assume Ti ⊥ Ci|Zi, Ai, Mi, with ⊥ denoting independence. That is, Ci is non-informative conditional on the covariates, the preference group and the treatment actually received. This treatment preference model assumes no defiers and is consistent with the principal stratification framework (Frangakis and Rubin (2002)). Note that while the treatment effect among the insisters and refusers is not identified, as each of these groups only ever receives one treatment, patients in the amb group always receive their randomized treatment and therefore provide unbiased information regarding the treatment effect. Further discussion of the identifiability of the treatment effect under non-compliance can be found in Robins and Rotnitzky (2004). We consider the following proportional hazards model:
λ(t | z, a, M = amb) = λ0(t)exp{β′η(z, a)},   (1)
where λ0(t) is the baseline hazard function, β ∈ ℜp, and η(z, a) is a function of z and a. In the PACCT-1 trial, for example, let Z = RS and β′η(z, a) = βzz + βaa + βza(z × a), where z × a is the treatment by RS interaction.
Note that although (1) is defined among M = amb, M is not always observable. Instead, by combining Ri and Ai, the following four compliance groups are observed: SE – patients randomized to S who receive E (ins); ES – patients randomized to E who receive S (ref); SS – patients randomized to S who receive S (a mixture of amb and ref); and EE – patients randomized to E who receive E (a mixture of amb and ins). Denote this observed membership as Mobs,i. It is clear that Mi can only be directly identified for the SE and ES groups, but not for the SS and EE groups. By directly applying the arguments given in Chapter 23 of Fitzmaurice (2009), we show in Web Appendix A that the parameters in (1) have a causal hazard ratio interpretation.
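As a small illustration of this mapping, the observed group can be derived from the 0/1 vectors R and A as in the following R snippet (the function name is hypothetical):

```r
## Map randomization R and received treatment A (0 = S, 1 = E) to the
## observed compliance group M_obs described above.
m_obs <- function(R, A) {
  ifelse(R == 0 & A == 1, "SE",           # randomized to S, received E (ins)
  ifelse(R == 1 & A == 0, "ES",           # randomized to E, received S (ref)
  ifelse(R == 0 & A == 0, "SS", "EE")))   # mixtures: amb + ref / amb + ins
}
```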
3. Estimation
3.1 A weighted Per-Protocol (Wtd PP) approach
The per-protocol group includes patients with R = A. In Web Appendix B, we show that
λ(t | z, A = a, M = amb) = λ(t | z, R = A = a) Wa(t, z),   (2)

where Wa(t, z) = P(M = amb | T = t, Z = z, R = A = a) / P(M = amb | T ≥ t, Z = z, R = A = a).
That is, given A = a and Z = z, the hazard in the amb group can be identified as the hazard in the per-protocol group, weighted by a time-varying factor. In counting process notation, define the failure process Ni(t) = 1(Ui ≤ t, δi = 1) and the at-risk process Yi(t) = 1(Ui ≥ t). It follows from (1) and (2) that in the subset {Ri = Ai = a}, the process

Ni(t) − ∫0^t Yi(s){Wia(s, zi)}^(−1) λ0(s)exp{β′η(zi, a)}ds

is a martingale, where Wia(t, z) denotes Wa(t, z) evaluated for subject i. Linking this together with the standard theory for partial likelihood, we therefore propose to estimate β in (1) by fitting a weighted Cox model to the per-protocol group, with the contribution of a subject in the risk set weighted by {Ŵia(t, zi)}^(−1). That is, β̂ is obtained by maximizing the following weighted pseudo-partial likelihood:

Lw(β) = ∏{i: Ri = Ai, δi = 1} [ exp{β′η(zi, ai)} / Σ{j: Rj = Aj} Yj(ui){Ŵjaj(ui, zj)}^(−1) exp{β′η(zj, aj)} ],

with corresponding weighted pseudo-score equation

Uw(β) = Σ{i: Ri = Ai} δi [ η(zi, ai) − Σ{j: Rj = Aj} Yj(ui){Ŵjaj(ui, zj)}^(−1) exp{β′η(zj, aj)}η(zj, aj) / Σ{j: Rj = Aj} Yj(ui){Ŵjaj(ui, zj)}^(−1) exp{β′η(zj, aj)} ] = 0,

where Ŵia(t, z) is an estimate of Wia(t, z) defined in (2).
The weights Wia(t, z) are time-varying (they depend on t) and subject-specific (they depend on Zi and Ai). While consistent estimators of Wia(t, z) are needed in order to obtain consistent estimators of β, obtaining them is not trivial, since the true functional forms of both the numerator and the denominator could be complex, especially when Z is continuous and/or high dimensional. Smoothing techniques could be considered. Alternatively, in Web Appendix C we propose to estimate the numerator and the denominator separately by fitting parametric models for each piece. Note that this approach does not guarantee that the estimators of the weights are consistent (in fact, most likely they will not be). However, through simulations, we show that for reasonable settings, weights obtained through this approach adjust for non-compliance fairly well and reasonable estimators of β can be obtained.
Once the Ŵia(t, z) are obtained, we use the "BFGS" quasi-Newton optimization algorithm to maximize Lw(β), and it appears to converge relatively rapidly in our setting.
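To fix ideas, the sketch below codes the weighted pseudo-partial likelihood and its BFGS maximization. It is a minimal sketch, written under the assumption that the estimated risk-set weights (the quantities {Ŵ}^(−1) above) have been pre-tabulated at the ordered event times; all object names are hypothetical.

```r
## Minimal sketch of -log Lw(beta) for the per-protocol subjects.
## u, delta: follow-up time and event indicator; X: design matrix whose
## i-th row holds eta(z_i, a_i); Wrisk: matrix with Wrisk[j, k] the
## risk-set weight for subject j at the k-th ordered event time.
neg_log_Lw <- function(beta, u, delta, X, Wrisk) {
  lp <- drop(X %*% beta)                 # linear predictor beta' eta
  ev <- which(delta == 1)
  ev <- ev[order(u[ev])]                 # events in time order
  ll <- 0
  for (k in seq_along(ev)) {
    i  <- ev[k]
    rs <- u >= u[i]                      # risk set at this event time
    ll <- ll + lp[i] - log(sum(Wrisk[rs, k] * exp(lp[rs])))
  }
  -ll
}

## "BFGS" maximization as described above; beta0 could come from an
## unweighted per-protocol Cox fit (u, delta, X, Wrisk hypothetical).
# fit <- optim(beta0, neg_log_Lw, u = u, delta = delta, X = X,
#              Wrisk = Wrisk, method = "BFGS")
```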
3.2 The full likelihood (FL) approach
Cuzick et al. (2007) considered the following proportional hazards model:
λ(t | z, m, z0) = λ0(t)exp{γm + β′η(z, z0)},   (3)
where m = ins, ref, amb, z0 = 1(M = amb and A = 1), and γamb = 0. Given the definition of z0, this model also only specifies the treatment effect among the amb group. Note that (3) assumes that given Z, the hazards of the ins and ref groups are proportional to that of the amb group. Again, in the PACCT-1 setting, letting β′η(z, z0) = βzz + βaz0 + βza(z × z0), where z is the RS score, (3) implies that the hazards in the three groups are:

λ(t | z, M = ins) = λ0(t)exp(γins + βzz),
λ(t | z, M = ref) = λ0(t)exp(γref + βzz),
λ(t | z, M = amb, A = a) = λ0(t)exp(βzz + βaa + βza(z × a)).
Note that both Z0 and M are not observable for the SS and EE groups. Cuzick et al. (2007) estimated the parameters using a pseudo likelihood (PL) approach. A new PL algorithm is given in the following section. Here, we also consider an FL estimator. As shown in Web Appendix D, the full likelihood still has a tractable form. Other reasons for considering an FL estimator include: 1) in the FL setting, a likelihood ratio test can be implemented, and 2) the method proposed in Chen and Little (1999) can be applied to estimate the variance of the risk parameters (see Section 3.4 for details).
Given (3), by first writing the conditional joint distribution f(ui, δi|zi, m, ai) as

f(ui, δi | zi, m, ai) = {λ0(ui)τm,i}^δi exp{−Λ0(ui)τm,i},
we show in Web Appendix D that the full likelihood contribution from subject i is:
Li = [πins(zi){λ0(ui)τins,i}^δi Sins,i]^1(Mobs,i = SE) × [πref(zi){λ0(ui)τref,i}^δi Sref,i]^1(Mobs,i = ES) × [πamb(zi){λ0(ui)τamb0,i}^δi Samb0,i + πref(zi){λ0(ui)τref,i}^δi Sref,i]^1(Mobs,i = SS) × [πamb(zi){λ0(ui)τamb1,i}^δi Samb1,i + πins(zi){λ0(ui)τins,i}^δi Sins,i]^1(Mobs,i = EE),   (4)
where λ0(·) is the baseline hazard, Λ0(·) is the baseline cumulative hazard, τins,i = exp(γins + β′η(z, z0 = 0)), τref,i = exp(γref + β′η(z, z0 = 0)), τamb0,i = exp(β′η(z, z0 = 0)), τamb1,i = exp(β′η(z, z0 = 1)), Sm,i = exp{−Λ0(ui)τm,i} is the survival function corresponding to τm,i, and πins(z) = P(M = ins|z), πref(z) = P(M = ref|z), so πamb(z) = P(M = amb|z) = 1 − πins(z) − πref(z). Following the approach of Breslow (1974), Λ0(·) is assumed to be a discrete jump function which jumps only at the observed event times, and we denote the set of jump sizes as λ0{uj}, j = 1, …, J. Writing θ = (γins, γref, β), the problem is then to maximize ∏i Li over (θ, λ0{uj}, πm(z)), m = ins, ref.
The πm(z) can be re-expressed in terms of probabilities of the observed membership Mobs,i: by randomization, πins(z) = P(Mobs,i = SE | z, Ri = 0) and πref(z) = P(Mobs,i = ES | z, Ri = 1). When Z is low dimensional and categorical, the πm(z) are just discrete probabilities. When Z is continuous and/or high dimensional, for simplicity, parametric models on the observed data Mobs,i (say, indexed by a finite dimensional parameter α) can be assumed to model πm(z), and the problem is then to maximize Li over the parameter set (θ, λ0{uj}, α).
3.2.1 The EM algorithm
Let COi = (Ui, δi, Zi, Mi, Ri) denote the complete data for subject i. Note that since Ai is completely determined by combining Mi and Ri, it is omitted from the complete data vector. In Web Appendix E, we show that the log of the complete data likelihood is the sum of the following two components:

ℓ1,i = δi log{λ0(ui)τMi,i} − Λ0(ui)τMi,i  and  ℓ2,i = log πMi(zi),

where τMi,i denotes the appropriate τm,i from (4) (τamb0,i or τamb1,i when Mi = amb, according to Ri). Note that ℓ1,i only involves (θ, λ0{uj}) and ℓ2,i only involves πm(z). Detailed procedures of the EM algorithm are given in Web Appendices F (the E step) and G (the M step). Our simulations show that the EM algorithm is computationally very efficient, given the existence of closed form solutions for λ0{uj} (and for πm(z) when appropriate) and the ability to update θ (and possibly πm(z)) using standard packages.
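To make the E step concrete, the sketch below computes the posterior probability of M = amb for an SS subject under the current parameter values, following the mixture form of (4); the function and argument names are hypothetical, and the EE case is analogous with πins, τins,i and τamb1,i in place of πref, τref,i and τamb0,i. These posterior probabilities serve as the case weights in the weighted Cox update of the M step.

```r
## Posterior P(M = amb | O_i) for an SS subject (a mixture of amb and ref)
## given current parameter values; a minimal sketch with hypothetical
## argument names. lam0_u is the baseline hazard jump at u (0 if censored,
## so the delta = 0 case reduces to the survival terms alone) and Lam0_u
## is the baseline cumulative hazard at u.
e_step_ss <- function(delta, lam0_u, Lam0_u, tau_amb0, tau_ref,
                      pi_amb, pi_ref) {
  f_amb <- (lam0_u * tau_amb0)^delta * exp(-Lam0_u * tau_amb0)  # amb term
  f_ref <- (lam0_u * tau_ref)^delta  * exp(-Lam0_u * tau_ref)   # ref term
  pi_amb * f_amb / (pi_amb * f_amb + pi_ref * f_ref)
}
```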
3.3 The pseudo likelihood (PL) approach
The FL approach requires the likelihood to be jointly maximized over (θ, λ0{uj}, πm(z)), where πm(z) is updated during each iteration. Alternatively, as in Cuzick et al. (2007), a pseudo likelihood approach can be considered, in which πm(z) is fixed at its initial estimates, and the problem simplifies to maximizing the pseudo likelihood over the parameter set (θ, λ0{uj}). The EM algorithm can again be applied. The complete data log likelihood contribution now simplifies to the first component of the complete data log likelihood in the FL setting, ℓ1,i. In calculating the E step conditional expectation (and the weights), the πm(z) are replaced by π̂m(z), which are fixed over all iterations. The same M step procedure from the FL setting can then be used to update (θ, λ0{uj}).
3.3.1 Estimating πm(z), m = ins, ref
As discussed in Section 3.2, by expressing these probabilities as probabilities of the observed data Mobs,i, multinomial/logistic models can be used to obtain their estimates; alternatively, when Z is categorical, the formulas given for choosing the initial values for these probabilities in the FL setting can be used to obtain the estimates.
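As a concrete sketch for categorical Z, the snippet below recovers π̂ins(z) and π̂ref(z) from the observed group frequencies, using the randomization identities P(Mobs = SE | z) = (1 − ρ)πins(z) and P(Mobs = ES | z) = ρπref(z); the object names and the default ρ = 0.5 are illustrative assumptions.

```r
## Estimate pi_ins(z) and pi_ref(z) from observed-group proportions when
## Z is categorical; obs_grp takes values "SE", "ES", "SS", "EE".
## Assumes every z category contains some SE and ES subjects.
est_pi <- function(z, obs_grp, rho = 0.5) {
  prop <- prop.table(table(z, obs_grp), margin = 1)  # P(M_obs | z)
  list(pi_ins = prop[, "SE"] / (1 - rho),  # P(SE | z) = (1 - rho) pi_ins(z)
       pi_ref = prop[, "ES"] / rho)        # P(ES | z) = rho pi_ref(z)
}
```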
3.4 Variance estimator
The full likelihood (FL) approach fits into the framework of the non-parametric maximum likelihood (NPML) method for censored data considered in Zeng and Lin (2007) and Murphy and Van der Vaart (2000). Analogous to the parametric likelihood method, the variance of the NPML estimators can be estimated by the inverse of the observed Fisher information matrix, which is usually of very high dimension. Alternatively, when interest only lies in a low dimensional parameter set (θ̂ = (γ̂, β̂) in our case), the profile likelihood method (Murphy and Van der Vaart (2000)) can be considered, and the variance of the low dimensional parameter can be estimated by inverting the information matrix of the profile likelihood. One potential problem in applying this idea is that analytical forms of the second derivative of the observed profile likelihood are often not available, and complex numerical differentiation methods may be needed. To handle this problem, we adopt the method proposed by Chen and Little (1999), which uses EM-aided differentiation to approximate the second derivative of the profile likelihood, based on the observation that the first derivative of the profile likelihood can be obtained relatively easily within the EM framework. Specifically, we apply the following steps to estimate the variance of θ̂ = (γ̂, β̂) (a code sketch of the resulting differentiation loop is given after the steps):
1. Calculate Sc(θ), the complete data score function, i.e. the derivative of the complete data log likelihood with respect to θ.
2. Obtain the NPML estimates of the parameter of interest, θ̂, by applying the proposed EM algorithm.
3. Perturb the jth component of θ̂, θ̂j, by a small amount d (e.g. d = 1/√n), where n is the sample size. Denote θ̂j+ = θ̂j + d, and let θ̂+ denote the resulting vector where the jth component is replaced by θ̂j+ whereas all the other components are fixed at the corresponding NPML estimates.
4. Apply the proposed EM algorithm, with θ held fixed at θ̂+, to obtain the corresponding updates of the remaining parameters, λ0{uj} (and πm(z) in the FL setting).
5. Calculate E(S+) = Σi E{Sc,i(θ̂+) | Oi}, the conditional expectation of the complete data score function, evaluated at θ̂+ and the updates from Step 4.
6. Now perturb θ̂j on the other side, that is, define θ̂j− = θ̂j − d, and repeat Steps 3–5 (the two-sided perturbation is suggested by Chen and Little (1999)). Denote the resulting score function as E(S−).
The second derivative of the observed log profile likelihood with respect to the jth component of θ can then be approximated by {E(S+) − E(S−)}(2d)−1. Denote this vector (the jth column of the second derivative matrix) as I(θj).
Steps 3–6 are repeated for j = 1, …, q, where q is the dimension of θ.
The covariance matrix of θ̂ can then be estimated by inverting the negative of the matrix formed by these columns, that is, −{I(θ1), …, I(θq)}−1.
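For concreteness, a minimal sketch of the EM-aided differentiation loop follows. It assumes a hypothetical function em_profile(theta) that re-runs the proposed EM algorithm with θ held fixed at theta (Steps 4–5 above) and returns Σi E{Sc,i(theta) | Oi} as a vector; theta_hat is the NPML estimate and n the sample size.

```r
## EM-aided differentiation (Chen and Little, 1999): approximate the
## second-derivative matrix of the log profile likelihood column by
## column, then invert its negative to estimate cov(theta_hat).
## em_profile() is a hypothetical placeholder for Steps 4-5 above.
profile_covariance <- function(theta_hat, em_profile, n) {
  d <- 1 / sqrt(n)                        # one choice of perturbation size
  q <- length(theta_hat)
  H <- matrix(NA_real_, q, q)
  for (j in seq_len(q)) {
    tp <- theta_hat; tp[j] <- tp[j] + d   # theta_hat perturbed up in j
    tm <- theta_hat; tm[j] <- tm[j] - d   # and perturbed down in j
    H[, j] <- (em_profile(tp) - em_profile(tm)) / (2 * d)  # I(theta_j)
  }
  solve(-(H + t(H)) / 2)                  # symmetrize, invert -H
}
```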
Given that the Wtd PP estimator involves parameters whose number increases with the sample size (the γt embedded in Ŵia(t, z)), and that both the PL estimator and the Wtd PP estimator involve pseudo terms (π̂m(z) and Ŵia(t, z), respectively), deriving the asymptotic properties of these estimators could be quite involved and has not been attempted. The general inference results given in Zeng and Lin (2007) for semiparametric likelihood models involving infinite dimensional parameters and the profile approach considered in Murphy and Van der Vaart (2000) are potential future directions for this. Here, we propose to use bootstrap techniques to estimate the variance of the estimators from these two methods. Let X = (X1, …, Xn) represent the originally observed data, where Xi = (Ui, δi, Zi, Ri, Ai). Let β̂ = β̂(X) be the estimator of β obtained from data X using a particular approach. Draw B bootstrap resamples of size n, X*(1), X*(2), …, X*(B), from the original data. Then the variance of β̂ can be estimated using the bootstrap variance estimator

vâr(β̂) = (B − 1)−1 Σb=1,…,B {β̂*(b) − β̄*}{β̂*(b) − β̄*}′,
where β̂*(b) = β̂(X*(b)), b = 1, …, B, is the estimate computed from the bth bootstrap sample, and β̄* = B−1 Σb=1,…,B β̂*(b).
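A generic bootstrap sketch is below; beta_fn stands in for whichever estimator (Wtd PP or PL) is applied to a data frame X with columns (U, δ, Z, R, A). All names are hypothetical.

```r
## Nonparametric bootstrap variance for a generic estimator beta_fn(X);
## returns a variance (scalar beta) or a covariance matrix (vector beta).
boot_var <- function(X, beta_fn, B = 500) {
  ests <- replicate(B, beta_fn(X[sample(nrow(X), replace = TRUE), ,
                                 drop = FALSE]))
  if (is.null(dim(ests))) var(ests) else cov(t(ests))
}
```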
4. Simulation results
Simulations are run under the PACCT-1 setting and three sets of conditions are considered. In Simulation set 1, failure times are generated assuming the three preference groups have proportional hazards and three configurations within the amb group are considered: 1) there is a treatment by RS interaction; 2) the interaction effect is absent but a treatment effect is present; and 3) both the interaction effect and the treatment effect are null. In Simulation set 2, hazards in the three groups are assumed to be non-proportional: ins and ref groups follow Weibull distributions while the hazard for the amb group follows exponential distribution. Only the RS score by treatment interaction model is considered in this setting. In Simulation set 3, a more complex proportional hazards model is considered allowing interaction effects between the preference groups and the RS score. In all simulations, the risk parameters are chosen such that the hazard ratios are approximately consistent with the assumptions used in the PACCT-1 study design (except for the null model), but a higher baseline hazard is used. Under each condition, 300 datasets are simulated and for each simulated dataset, a sample size of 1000 is used. When appropriate, bootstrap variance estimators are calculated using B = 500.
4.1 Simulation set 1
The RS scores, Zi = 11, 12, …, 25, are generated based on the empirical distributions observed in PACCT-1; Mi are generated from multinomial distributions using the following three sets of parameters: (ins = 5%, ref = 25%, amb = 70%), (6%, 15%, 79%) and (12%, 11%, 77%), which correspond to the observed proportions in the three risk groups, RS 11–15, 16–20 and 21–25, respectively; Ri follow Bernoulli(0.5), and Ai are determined from Mi and Ri: Ai = 0 if Mi = ref; Ai = 1 if Mi = ins; Ai = 0 if Mi = amb and Ri = 0; and Ai = 1 if Mi = amb and Ri = 1; Ti follows model (3), where λT(t|z, m, z0) = λ0(t)exp(γins1(M = ins) + γref1(M = ref) + βzz + βaz0 + βza(z × z0)). Three sets of true β parameters are considered: (βz, βa, βza) = (0.1, −0.5, 0.03), (0.1, −0.28, 0), and (0.1, 0, 0), with the γ parameters fixed at (γins, γref) = (0.5, −0.5). Both Ti and Ci follow exponential distributions, with baseline hazards λ0(t) = 0.021 and λC(c) = 0.0053, respectively, which gives an approximately 14% censoring rate under the non-null models.
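For concreteness, one simulated dataset under the interaction configuration could be generated as below; the uniform RS draw is a simple stand-in for the empirical PACCT-1 distribution, and all object names are hypothetical.

```r
## One simulated dataset for Simulation set 1 (interaction model).
set.seed(1)
n <- 1000
z <- sample(11:25, n, replace = TRUE)                 # RS scores
p_ins <- ifelse(z <= 15, 0.05, ifelse(z <= 20, 0.06, 0.12))
p_ref <- ifelse(z <= 15, 0.25, ifelse(z <= 20, 0.15, 0.11))
m <- mapply(function(pin, pref) sample(c("ins", "ref", "amb"), 1,
                                       prob = c(pin, pref, 1 - pin - pref)),
            p_ins, p_ref)
r  <- rbinom(n, 1, 0.5)                               # randomized arm
a  <- ifelse(m == "ins", 1, ifelse(m == "ref", 0, r)) # amb subjects comply
z0 <- as.numeric(m == "amb" & a == 1)
haz <- 0.021 * exp(0.5 * (m == "ins") - 0.5 * (m == "ref") +
                   0.1 * z - 0.5 * z0 + 0.03 * z * z0)
t_fail <- rexp(n, haz)                                # failure times, model (3)
c_time <- rexp(n, 0.0053)                             # censoring times
u <- pmin(t_fail, c_time)
d <- as.numeric(t_fail <= c_time)
```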
The parameters πm(z), m = ins, ref, are updated using the closed form solutions for the FL estimator, and are estimated using sample proportions within each of the categories of the RS for the PL estimator: Zi ≤ 15, 16 ≤ Zi ≤ 20, and Zi ≥ 21. The time-varying weights Wia (t, z) in (2) are estimated by fitting multinomial models of Mobs,i: RS is treated as continuous and an additive model of t and z is fit for estimating the denominator; in estimating the numerator, RS is again treated as a three category variable, and at any given observed event time t, sample proportions within each category of Z are used for estimation.
In addition to the Wtd PP estimator and the two likelihood estimators, we considered the naïve unweighted per-protocol estimator and the ITT estimator. For the first data configuration, where both a treatment effect and an interaction effect are present, we also obtained the Wtd PP estimator using the true weights calculated from the data generating mechanism, to assess the bias and efficiency loss associated with estimating the weights.
The R function optim() was used to maximize the objective function for the Wtd PP approach, and the R function coxph() was used to fit the Cox models for the other methods, including the weighted Cox model used within the likelihood methods. For the Wtd PP approach, initial values for the parameters were obtained by fitting the Cox model λ(t|z, a) = λ0(t)exp(βzz + βaa + βza(z × a)) among the observed compliers. For the likelihood methods, initial values for θ, λ0(t) and πm(z) were obtained as described in Web Appendix G.
In all results tables that follow, the Emean columns give the empirical means of β̂ from the 300 simulations, the Evar columns give the empirical variances, and the V̂mean columns give the average of the variance estimators over the 300 simulations (computed using the EM-aided differentiation technique for the FL estimator, and bootstrap techniques for the Wtd PP estimator and the PL estimator). Under all settings, the bootstrap variance estimators appear to estimate the variance well. The variance estimators from EM-aided differentiation seem to slightly under-estimate the variance.
Under all settings, the performance of the two likelihood estimators is very similar; both are nearly unbiased and have similar variances. In most settings, the likelihood estimators appear to be more efficient than the Wtd PP estimator using estimated weights. However, when the true weights are used (results under the treatment by RS interaction model, Table 1), the Wtd PP estimator is also unbiased and appears to be the most efficient estimator. When the treatment effect or the interaction effect is non-null (Tables 1 and 2), the naïve per-protocol and ITT estimators under-estimate the effect size. However, when both the treatment effect and the interaction are null (Table 3), the ITT estimator is nearly unbiased and is the most efficient method, while the naïve per-protocol estimator gives the most biased estimate of the treatment effect.
Table 1.
Parameter and variance estimates under a model assuming the 3 groups have proportional hazards (number of simulations = 300, sample size in each dataset = 1000); true β = (0.1, −0.5, 0.03).
| Method | β̂z Emean | β̂z Evar | β̂z V̂mean | β̂a Emean | β̂a Evar | β̂a V̂mean | β̂za Emean | β̂za Evar | β̂za V̂mean |
|---|---|---|---|---|---|---|---|---|---|
| wtd PP | 0.10 | 0.00037 | 0.00036 | −0.47 | 0.040 | 0.038 | 0.029 | 0.00061 | 0.00059 |
| wtd PP true w | 0.10 | 0.00018 | - | −0.51 | 0.022 | - | 0.030 | 0.00037 | - |
| PL | 0.10 | 0.00016 | 0.00017 | −0.51 | 0.033 | 0.033 | 0.031 | 0.00043 | 0.00043 |
| FL | 0.10 | 0.00016 | 0.00015 | −0.51 | 0.034 | 0.028 | 0.029 | 0.00043 | 0.00040 |
| unwtd PP | 0.10 | 0.00019 | - | −0.31 | 0.024 | - | 0.025 | 0.00038 | - |
| ITT | 0.11 | 0.00017 | - | −0.34 | 0.020 | - | 0.019 | 0.00032 | - |
Table 2.
Parameter and variance estimates under a model assuming the 3 groups have proportional hazards (number of simulations = 300, sample size in each dataset = 1000); true β = (0.1, −0.28, 0)
| Method | β̂z Emean | β̂z Evar | β̂z V̂mean | β̂a Emean | β̂a Evar | β̂a V̂mean | β̂za Emean | β̂za Evar | β̂za V̂mean |
|---|---|---|---|---|---|---|---|---|---|
| wtd PP | 0.10 | 0.00030 | 0.00036 | −0.25 | 0.034 | 0.037 | −0.0020 | 0.00053 | 0.00058 |
| PL | 0.10 | 0.00014 | 0.00016 | −0.27 | 0.031 | 0.031 | −0.0020 | 0.00043 | 0.00042 |
| FL | 0.10 | 0.00014 | 0.00015 | −0.28 | 0.031 | 0.027 | −0.0038 | 0.00043 | 0.00039 |
| unwtd PP | 0.11 | 0.00016 | - | −0.097 | 0.023 | - | −0.0043 | 0.00038 | - |
| ITT | 0.11 | 0.00016 | - | −0.18 | 0.018 | - | −0.0042 | 0.00031 | - |
Table 3.
Parameter and variance estimates under a model assuming the 3 groups have proportional hazards (number of simulations = 300, sample size in each dataset = 1000); true β = (0.1, 0, 0)
| Method | β̂z Emean | β̂z Evar | β̂z V̂mean | β̂a Emean | β̂a Evar | β̂a V̂mean |
|---|---|---|---|---|---|---|
| wtd PP | 0.098 | 0.00013 | 0.00015 | 0.011 | 0.0087 | 0.0082 |
| PL | 0.10 | 0.000076 | 0.000095 | −0.0050 | 0.0096 | 0.0095 |
| FL | 0.10 | 0.000075 | 0.000091 | −0.0087 | 0.0095 | 0.0084 |
| unwtd PP | 0.10 | 0.000080 | - | 0.13 | 0.0057 | - |
| ITT | 0.11 | 0.000070 | - | −0.0045 | 0.0048 | - |
Note: only the additive model of z and a was fit.
4.2 Simulation set 2
Failure times for the amb group follow the same exponential distribution as in Simulation set 1, λ(t|z, a, M = amb) = 0.021exp(0.1z − 0.5a + 0.03(z × a)). Hazards for the insisters and refusers now follow Weibull proportional hazards models given the RS score. Let HR1 and HR2 denote the ratios of the hazards in the ins and ref groups, respectively, relative to the hazard in the untreated amb group. Given this data configuration, at t = (20, 40, 60, 80, 100), (HR1, HR2) = (5, 0.00002), (10, 0.0003), (15, 0.001), (20, 0.004), (25, 0.01), respectively. Under this relatively extreme non-proportional hazards configuration, the Wtd PP approach still gives reasonable parameter estimates, whereas both likelihood methods now give much more biased estimates (Table 4). This is expected, since the Wtd PP approach does not directly impose distributional assumptions on the outcomes of the insisters and refusers through the main effect model, whereas the likelihood methods require the failure times of all three groups to have proportional hazards.
Table 4.
Parameter and variance estimates under a model assuming the 3 groups have non-proportional hazards (number of simulations = 300, sample size in each dataset = 1000); true β = (0.1, −0.5, 0.03)
| Method | β̂z Emean | β̂z Evar | β̂z V̂mean | β̂a Emean | β̂a Evar | β̂a V̂mean | β̂za Emean | β̂za Evar | β̂za V̂mean |
|---|---|---|---|---|---|---|---|---|---|
| wtd PP | 0.099 | 0.00033 | 0.00034 | −0.56 | 0.034 | 0.035 | 0.036 | 0.00057 | 0.00059 |
| PL | 0.084 | 0.00012 | 0.00014 | −0.83 | 0.035 | 0.040 | 0.072 | 0.00046 | 0.00052 |
| FL | 0.083 | 0.00011 | 0.00015 | −0.87 | 0.042 | 0.030 | 0.074 | 0.00051 | 0.00043 |
4.3 Simulation set 3
Failure times are now generated from the following more complex proportional hazards model, which allows for interaction effects between the preference groups and RS:

λ(t|z, m, z0) = λ0(t)exp(γins1(M = ins) + γref1(M = ref) + γins,zz1(M = ins) + γref,zz1(M = ref) + βzz + βaz0 + βza(z × z0)),
where (γins, γref, γins,z, γref,z) = (0.28, −0.71, 0.03, 0.03). The γ parameters are chosen such that the median failure times for the three preference groups are approximately the same as in the other two sets of simulations. All the other parameters remain the same as in the interaction model in Simulation set 1.
The estimation procedure for the Wtd PP approach remains the same. However, the parameters τm,i in the likelihood contribution now need to be replaced by τins,i = exp(γins + γins,zz + βzz) and τref,i = exp(γref + γref,zz + βzz). Under this less parsimonious model, in terms of bias, the performance of the Wtd PP estimator and the likelihood estimators is similar to that of their counterparts in Simulation set 1. However, the likelihood estimators now lose their efficiency advantage over the Wtd PP estimator, because the number of risk parameters that need to be estimated is larger. That is, the variances of the likelihood estimators and the Wtd PP estimator are now comparable (Table 5).
Table 5.
Parameter and variance estimates under a more complex model allowing for interaction effects between the latent groups and RS score (number of simulations = 300, sample size in each dataset = 1000); true β = (0.1, −0.5, 0.03)
| Method | β̂z Emean | β̂z Evar | β̂z V̂mean | β̂a Emean | β̂a Evar | β̂a V̂mean | β̂za Emean | β̂za Evar | β̂za V̂mean |
|---|---|---|---|---|---|---|---|---|---|
| wtd PP | 0.10 | 0.00038 | 0.00036 | −0.46 | 0.042 | 0.040 | 0.028 | 0.00063 | 0.00060 |
| PL | 0.10 | 0.00036 | 0.00041 | −0.51 | 0.046 | 0.047 | 0.031 | 0.00064 | 0.00069 |
| FL | 0.10 | 0.00036 | 0.00031 | −0.51 | 0.047 | 0.036 | 0.029 | 0.00065 | 0.00056 |
5. An example
E1180 (Mansour et al. (1998)) is an ECOG-ACRIN breast cancer study in which patients were randomized between the combination chemotherapy CMFP (cyclophosphamide/methotrexate/5-fluorouracil/prednisone) and observation. Time from randomization to recurrence is the primary endpoint. Among the 424 analyzable patients, the observed non-compliance rate is 11% (22/208) on the CMFP arm and 16% (34/216) on the observation arm. Covariates of interest include estrogen receptor (ER) status (positive vs. negative) and primary tumor size (< 3cm vs. ≥ 3cm). ER positive patients with tumors < 3cm were not eligible.
Define two binary variables Z1 = 1(ER = neg, tumsz ≥ 3cm) and Z2 = 1(ER = pos, tumsz ≥ 3cm). For the Wtd PP method we assume the model λ(t|z1, z2, a, M = amb) = λ0(t)exp(βz1z1 + βz2z2 + βaa), and for the mixture likelihood methods we assume λ(t|z1, z2, m, a) = λ0(t)exp(βz1z1 + βz2z2 + γm + βaz0), where m = ins, ref, amb, γamb = 0, and z0 = 1(M = amb and A = 1). The results given in Table 6 show that the treatment effect estimates and their bootstrap standard errors from the Wtd PP approach and the PL method are very close. The performance of the two likelihood estimators is also similar.
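For reference, the ITT and unweighted PP comparisons reported in Table 6 correspond to standard Cox fits of the following form; e1180 is a hypothetical data frame with the variables named above (the E1180 data are not reproduced here), and the Wtd PP and likelihood fits instead use the machinery of Section 3.

```r
## Naive comparisons for E1180 (hypothetical data frame e1180 with
## columns u, d, z1, z2, r, a).
library(survival)
itt <- coxph(Surv(u, d) ~ z1 + z2 + r, data = e1180)                  # ITT
pp  <- coxph(Surv(u, d) ~ z1 + z2 + a, data = subset(e1180, r == a))  # unwtd PP
```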
Table 6.
Effect estimates from E1180
| Method | β̂z1 (se) | β̂z2 (se) | β̂a (se) |
|---|---|---|---|
| wtd PP | 0.48 (0.34) | 0.78 (0.33) | −1.05 (0.33) |
| PL | 0.47 (0.22) | 0.66 (0.23) | −1.02 (0.31) |
| FL | 0.49 (0.21) | 0.66 (0.22) | −1.03 (0.27) |
| unwtd PP | 0.40 (0.23) | 0.68 (0.23) | −0.74 (0.19) |
| ITT | 0.41 (0.20) | 0.61 (0.22) | −0.71 (0.18) |
6. Discussion
We consider methods for estimating the treatment effects (and potentially the treatment by covariate interaction effects) in randomized clinical trials under all-or-nothing compliance for time-to-event outcomes. We propose a weighted per-protocol (Wtd PP) method which is potentially more robust to certain model misspecifications on the outcomes than a likelihood approach. The EM algorithms we propose for the likelihood estimators are computationally more efficient than directly maximizing the observed likelihood over the high dimensional parameter set.
When the true weights are known, the Wtd PP estimator gives unbiased estimates that are more efficient than the likelihood estimators. In reality, however, the weights need to be estimated. In that case, the Wtd PP estimator that uses estimated weights from fitted parametric models appears to be less efficient, and more biased, than the likelihood estimators when the likelihood is correctly specified. However, the efficiency differences also appear to depend on the correct model implying a parsimonious relationship among the distributions of the ambivalent, insister and refuser groups. Obtaining better estimates of the time-varying weights in the Wtd PP estimator is challenging but may be worth exploring in future work.
To adjust for non-compliance in a randomized trial setting, the IPTW estimator in marginal structural models, which was developed for observational studies, could also be considered. However, these methods do not exploit randomization, and the no unmeasured confounder assumption (NUCA) is rather strong. Although estimators that do not require NUCA and are robust to model misspecifications are certainly desirable, attempts made in the literature in this direction have not yet been completely successful for time-to-event outcomes. As shown in the simulation results, the Wtd PP method proposed here is potentially more robust in the sense that it does not require any assumptions on the outcomes of the insisters and refusers through the main model, and the optimization of the (pseudo) objective function only involves a low dimensional risk parameter set. Based on our experience, the following may serve as a guideline for choosing among the three methods considered in this paper. The likelihood methods may be considered when the following assumptions can reasonably be made: 1) the outcome depends on the covariates/membership through a fairly simple form (a parsimonious model), and 2) the proportional hazards assumption holds reasonably well across the three compliance groups. Otherwise, the Wtd PP method should be considered. The performance of the two likelihood estimators is similar in our settings. While the EM procedure for the PL estimator is computationally faster, when a likelihood ratio test is desirable, or when the Chen and Little (1999) variance estimator is preferred over a bootstrap variance estimator, the FL estimator could be considered. In practice, when non-compliance is substantial, more than one method may need to be considered together to give a better understanding of the data and a more appropriate interpretation of the results obtained from each particular method.
In this paper, we focus on estimating the hazard ratio (HR), which has a causal interpretation in our setting. Hernán (2010) discussed the potential limitations of using a single HR value as the effect measure when the period-specific HRs vary over time and/or there is selection bias over time; as suggested there, adjusted survival curves could be considered in these situations.
Acknowledgments
Support for Shuli Li's research was provided by National Institutes of Health doctoral training grant T32CA009337. The authors thank ECOG-ACRIN for permission to use the data.
Supplementary Materials
Web Appendices A–G, referenced in Sections 2 and 3, and R code are available with this paper at the Biometrics website on the Wiley Online Library.
Contributor Information
Shuli Li, Email: shuli@jimmy.harvard.edu, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, U.S.A.
Robert J. Gray, Email: gray@jimmy.harvard.edu, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, U.S.A.
References
- Abadie A. Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics. 2003;113(2):231–263.
- Breslow N. Covariance analysis of censored survival data. Biometrics. 1974;30(1):89–99.
- Chen HY, Little RJA. Proportional hazards regression with missing covariates. Journal of the American Statistical Association. 1999;94(447):896–908.
- Cuzick J, Sasieni P, Myles J, Tyrer J. Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2007;69(4):565–588.
- Fitzmaurice GM. Longitudinal Data Analysis. Chapman & Hall/CRC; 2009.
- Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–29.
- Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association. 2001;96(454):440–448.
- Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13–15.
- Loeys T, Goetghebeur E. A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics. 2003;59(1):100–105.
- Mansour EG, Gray R, Shatila AH, Tormey DC, Cooper MR, Osborne CK, Falkson G. Survival advantage of adjuvant chemotherapy in high-risk node-negative breast cancer: ten-year analysis of an intergroup study. Journal of Clinical Oncology. 1998;16(11):3486–3492.
- Murphy SA, Van der Vaart AW. On profile likelihood. Journal of the American Statistical Association. 2000;95(450):449–465.
- Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. New England Journal of Medicine. 2004;351(27):2817–2826.
- Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, Cronin M, Baehner FL, Watson D, Bryant J, et al. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. Journal of Clinical Oncology. 2006;24(23):3726–3734.
- Robins J. Marginal structural models versus structural nested models as tools for causal inference. Statistical Models in Epidemiology, the Environment and Clinical Trials. 2000;116:95.
- Robins J, Rotnitzky A. Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika. 2004;91(4):763–783.
- Robins JM, Tsiatis AA. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics - Theory and Methods. 1991;20(8):2609–2631.
- Rubin DB, Imbens GW, Angrist JD. Identification of causal effects using instrumental variables. Journal of the American Statistical Association. 1996;91(434):444–455.
- Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2007;69(4):507–564.