Author manuscript; available in PMC: 2016 Jun 30.
Published in final edited form as: Stat Med. 2015 Mar 20;34(14):2235–2265. doi: 10.1002/sim.6470

Bias in estimating the causal hazard ratio when using two-stage instrumental variable methods

Fei Wan a, Dylan Small b, Justin E Bekelman c, Nandita Mitra a,*
PMCID: PMC4455906  NIHMSID: NIHMS674630  PMID: 25800789

Abstract

Two-stage instrumental variable methods are commonly used to estimate the causal effects of treatments on survival in the presence of measured and unmeasured confounding. Two-stage residual inclusion (2SRI) has been the method of choice over two-stage predictor substitution (2SPS) in clinical studies. We directly compare the bias in the causal hazard ratio estimated by these two methods. Under a principal stratification framework, we derive a closed form solution for the asymptotic bias of the causal hazard ratio among compliers for both the 2SPS and 2SRI methods when survival time follows the Weibull distribution with random censoring. When there is no unmeasured confounding and no always takers, our analytic results show that 2SRI is generally asymptotically unbiased but 2SPS is not. However, when there is substantial unmeasured confounding, 2SPS performs better than 2SRI with respect to bias under certain scenarios. We use extensive simulation studies to confirm the analytic results from our closed-form solutions. We apply these two methods to prostate cancer treatment data from SEER-Medicare and compare these 2SRI and 2SPS estimates to results from two published randomized trials.

Keywords: instrumental variable, two-stage residual inclusion, two-stage predictor substitution, unmeasured confounding, survival, bias

1. Introduction

Evaluating the effectiveness of treatment and identifying the causal relationship between exposure and disease are critical objectives for clinical and health services researchers. Confounding is often a concern when analyzing nonrandomized observational studies and even randomized studies with non-compliance [1]. Instrumental variable (IV) methods are increasingly being used in clinical comparative effectiveness studies to potentially control for both measured and unmeasured confounding. Angrist et al.[2] defined the IV for causal effects of treatment on outcome to be a variable satisfying the following five assumptions: i) the potential outcomes of one subject are unrelated to the treatment assignment of the other subjects; ii) the IV is randomly (or ignorably) assigned; iii) any effect of the IV on the outcome must be mediated by the treatment received (the exclusion restriction); iv) the IV has a nonzero effect on the treatment received; v) there are no defiers (for details see Section 2).

In a recent clinical study, we were interested in comparing the effectiveness of two treatments for prostate cancer in elderly men using SEER-Medicare, a large national observational database. Specifically, we planned to use IV methods to estimate the effect of the addition of external beam radiation therapy (EBRT) to androgen suppression therapy (ADT) in improving overall survival in men with locally advanced prostate cancer. We considered a commonly used IV in health services research: local area treatment patterns defined by the percentage of active treatment in hospital referral regions (HRR). This IV has been shown to capture regionally distinct structural variation in care [3]. Such variation is not fully explained by patient characteristics. Further, this IV varies across HRRs and is strongly associated with treatment assignment. Finally, it is balanced across important observed prognostic factors. Although there is an extensive literature on the importance of choosing an appropriate instrument, less attention has been paid to using the appropriate modeling approach once an IV is selected.

Recently, there has been rapid uptake and widespread use of two IV based analytic approaches called two-stage residual inclusion (2SRI) and two-stage predictor substitution (2SPS)[4, 5]. These methods have been used to correct for bias due to endogeneity in non-linear models for both binary and time-to-event outcomes. Of these two IV approaches, 2SRI was shown to consistently estimate a conditional causal parameter under certain assumptions [4] and has been adopted as the method of choice in clinical research studies involving survival outcomes[6, 7, 8]. The conditional causal parameter that Terza et al.[4] consider is only identified by making homogeneity assumptions that go beyond the five assumptions for a valid IV defined in the first paragraph. Angrist et al. [2] showed that under these five assumptions for a valid IV, the only treatment effect that is identified is the average treatment effect for the compliers, where the compliers are the subjects who would take the treatment if encouraged to do so by the IV but would not take the treatment if not encouraged by the IV; this is called the local average treatment effect (LATE). For binary outcomes, Cai et al.[5] demonstrated that both the 2SRI and 2SPS methods generate biased estimates of the LATE among compliers. In this paper, we focus on the properties of 2SPS and 2SRI as estimators of the LATE for time-to-event data.

Although there is growing interest in applying two-stage IV methods to time-to-event data, little is known about the potential bias of using such methods to estimate the LATE among compliers. We derive closed form expressions of the bias and conduct extensive simulations to quantify this bias. We then apply both two-stage IV methods to our prostate cancer treatment data and compare them to the results from two published randomized clinical trials [9, 10].

2. Notation, Assumptions, Compliance Categories, and Model

2.1. Notation

Following the notation of Cai et al.[5] and Nie et al.[11], an N-dimensional vector of binary IV values is represented by $\underline{R}$. An IV value of 1 represents encouragement to receive the active treatment and 0 represents no encouragement to receive the active treatment. In an RCT setting, where the IV is the randomized assignment, an IV value of 1 represents random assignment to treatment and 0 represents random assignment to control; in the prostate cancer observational study described in the introduction, an IV value of 1 represents a high local area rate (above the median) of adding EBRT to ADT and 0 represents a low local area rate (below the median) of adding EBRT to ADT. The ith element $R_i = 1$ implies that subject i is encouraged to receive the active treatment, whereas $R_i = 0$ indicates that subject i is not encouraged to receive the active treatment. Let $\underline{Z}^{\underline{R}}$ be an N-dimensional vector of potential treatment received given $\underline{R}$; its ith element $Z_i^{\underline{R}} = 1$ indicates that subject i receives the active treatment and $Z_i^{\underline{R}} = 0$ means that subject i receives the control under $\underline{R}$.

Similarly, we define $\underline{T}^{\underline{R},\underline{Z}}$ to be an N-dimensional vector of potential survival times under $\underline{R}$ and $\underline{Z}$; its ith element $T_i^{\underline{R},\underline{Z}}$ is the potential survival time for subject i under $\underline{R}$ and $\underline{Z}$. Let $\underline{L}^{\underline{R},\underline{Z}}$ be an N-dimensional vector of potential censoring times under $\underline{R}$ and $\underline{Z}$; its ith element $L_i^{\underline{R},\underline{Z}}$ is the potential censoring time for subject i under $\underline{R}$ and $\underline{Z}$.

We define $\underline{Y}^{\underline{R},\underline{Z}} = \min\{\underline{T}^{\underline{R},\underline{Z}}, \underline{L}^{\underline{R},\underline{Z}}\}$, the elementwise minimum of potential censoring and survival times, to be an N-dimensional vector of potential observed follow up times under $\underline{R}$ and $\underline{Z}$; its ith element $Y_i^{\underline{R},\underline{Z}}$ represents the potential follow up time for subject i under $\underline{R}$ and $\underline{Z}$. Let $\delta_i^{\underline{R},\underline{Z}} = I\{T_i^{\underline{R},\underline{Z}} \le L_i^{\underline{R},\underline{Z}}\}$ indicate whether subject i is observed to terminate by failure ($\delta_i^{\underline{R},\underline{Z}} = 1$) or by censoring ($\delta_i^{\underline{R},\underline{Z}} = 0$) given $\underline{R}$ and $\underline{Z}$. The vector $\underline{X}_i$ represents measured confounding variables for subject i.

2.2. Assumptions

The main assumptions we will make for causal modeling are the five assumptions made by Angrist et al. [2], and a random censoring assumption for the survival setting.

  1. Stable Unit Treatment Value Assumption (SUTVA)[12, 13]

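    1. if $R_i = R_i'$, then $Z_i^{\underline{R}} = Z_i^{\underline{R}'}$

    2. if $R_i = R_i'$ and $Z_i = Z_i'$, then $Y_i^{\underline{R},\underline{Z}} = Y_i^{\underline{R}',\underline{Z}'}$

      The SUTVA assumption says that the potential outcomes for subject i are unrelated to the treatment status of other subjects, so that we can write $Z_i^{\underline{R}}$, $Y_i^{\underline{R},\underline{Z}}$, $T_i^{\underline{R},\underline{Z}}$, $L_i^{\underline{R},\underline{Z}}$, $\delta_i^{\underline{R},\underline{Z}}$ as $Z_i^{R_i}$, $Y_i^{R_i,Z_i}$, $T_i^{R_i,Z_i}$, $L_i^{R_i,Z_i}$, $\delta_i^{R_i,Z_i}$ respectively. The SUTVA assumption also implies the assumption of consistency, such that the value of the potential outcome given a treatment remains unchanged no matter what the treatment assignment mechanism is [12].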

  2. Independence of the instrument [14]:

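    Conditional on a vector of confounders $\underline{X}$, the random vector ($\underline{T}^{\underline{R},\underline{Z}}$, $\underline{L}^{\underline{R},\underline{Z}}$, $\underline{Y}^{\underline{R},\underline{Z}}$, $\underline{Z}^{\underline{R}}$) is independent of $\underline{R}$. In a randomized trial where R is the IV, the independence assumption holds without conditioning on $\underline{X}$.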

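  3. Exclusion Restriction

    For all $\underline{Z}$, $\underline{R}$, and $\underline{R}'$, we have:

    $\underline{T}^{\underline{R},\underline{Z}} = \underline{T}^{\underline{R}',\underline{Z}}$, $\underline{L}^{\underline{R},\underline{Z}} = \underline{L}^{\underline{R}',\underline{Z}}$, $\underline{Y}^{\underline{R},\underline{Z}} = \underline{Y}^{\underline{R}',\underline{Z}}$. This assumption implies that any effect of the IV on potential outcomes must be through its effect on the treatment actually received. Thus, we can write $T_i^{\underline{R},\underline{Z}}$, $L_i^{\underline{R},\underline{Z}}$, $Y_i^{\underline{R},\underline{Z}}$ as $T_i^{Z_i}$, $L_i^{Z_i}$, $Y_i^{Z_i}$ by combining the exclusion restriction and SUTVA assumptions.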

  4. Non-zero Average Causal Effect of R on Z
    $E[Z_i^1 - Z_i^0] \neq 0$

    This assumption means the IV is correlated with treatment received.

  5. Monotonicity [15]
    $Z_i^1 \ge Z_i^0$ for all $i = 1, \ldots, N$

    This assumption rules out the existence of defiers: no subject always does the opposite of the treatment assigned.

  6. Independent censoring

    The distribution of potential survival time $\underline{T}^{\underline{R},\underline{Z}}$ is independent of the distribution of potential censoring time $\underline{L}^{\underline{R},\underline{Z}}$.

2.3. Compliance Categories

Under the framework of principal stratification and potential outcomes [2, 16], subjects in a two-arm randomized trial can be categorized into 4 principal strata: Always takers (AT) are subjects who always take the treatment regardless of assignments (Z1 = 1, Z0 = 1); Compliers (C) are subjects who comply with their assignments (Z1 = 1, Z0 = 0); Never takers (NT) are the subjects who never take the treatment no matter which group they are assigned to (Z1 = 0, Z0 = 0); Defiers (D) are the subjects who take the treatment opposite of their assignments (Z1 = 0, Z0 = 1).
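The four principal strata can be expressed as a lookup on the pair of potential treatments. The Python sketch below is only an illustration; the names `Stratum`, `principal_stratum`, and `satisfies_monotonicity` are hypothetical and do not come from the paper.

```python
from enum import Enum


class Stratum(Enum):
    ALWAYS_TAKER = "AT"
    COMPLIER = "C"
    NEVER_TAKER = "NT"
    DEFIER = "D"


def principal_stratum(z1: int, z0: int) -> Stratum:
    """Map potential treatments (Z^1, Z^0) to a principal stratum."""
    table = {
        (1, 1): Stratum.ALWAYS_TAKER,  # treated under either assignment
        (1, 0): Stratum.COMPLIER,      # treated only when encouraged
        (0, 0): Stratum.NEVER_TAKER,   # untreated under either assignment
        (0, 1): Stratum.DEFIER,        # does the opposite of the assignment
    }
    return table[(z1, z0)]


def satisfies_monotonicity(z1: int, z0: int) -> bool:
    """Monotonicity requires Z^1 >= Z^0, which excludes defiers."""
    return z1 >= z0
```

For example, `principal_stratum(1, 0)` returns `Stratum.COMPLIER`, and `satisfies_monotonicity(0, 1)` is `False` because that pair describes a defier.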

2.4. Model

We first define the probability of being assigned to (encouraged to receive) the treatment Pr(R = 1) = ρr, the probability of being an always taker Pr(AT) = ρa, and the probability of being a complier Pr(C) = ρc. We also define the probability of being a defier Pr(D) = ρd, but under the monotonicity assumption there are no defiers, so that ρd = 0. Hence, the probability of being a never taker Pr(NT) is equal to 1 − ρa − ρc.

We assume both potential censoring time and potential survival time follow the Weibull distribution with the same shape parameter α. The potential censoring time for the subjects in each principal stratum follows Weibull(α, λ), and we define the parameters of the probability distribution of potential survival time for each principal stratum as follows:

$$\begin{aligned}
T^1_{AT} &\sim \mathrm{Weibull}(\alpha, \theta_{at1}), & T^0_{AT} &\sim \mathrm{Weibull}(\alpha, \theta_{at0}) \\
T^1_{C} &\sim \mathrm{Weibull}(\alpha, \theta_{c1}), & T^0_{C} &\sim \mathrm{Weibull}(\alpha, \theta_{c0}) \\
T^1_{NT} &\sim \mathrm{Weibull}(\alpha, \theta_{nt1}), & T^0_{NT} &\sim \mathrm{Weibull}(\alpha, \theta_{nt0})
\end{aligned}$$

We also examined scenarios in which different shape parameters are assumed for the potential censoring time and the potential survival time. These details are given in Appendix E. The density of the Weibull distribution is $f(t) = (\alpha/K)(t/K)^{\alpha-1}\exp(-(t/K)^{\alpha})$ and the hazard rate is $h(t) = \alpha K^{-\alpha} t^{\alpha-1}$. In the case of Weibull regression with covariates X, $K^{-\alpha}$ can be reparameterized as exp(βX). The hazard rate for the compliers if treated is $h(T^1 = t \mid C) = \alpha t^{\alpha-1}(\theta_{c1})^{-\alpha}$. The hazard rate for the compliers if not treated is $h(T^0 = t \mid C) = \alpha t^{\alpha-1}(\theta_{c0})^{-\alpha}$. Hence, the log causal hazard ratio ϕ for the compliers is the difference between the two log hazard rates:

$$\phi = \log[h(T^1 = t \mid C)] - \log[h(T^0 = t \mid C)] = -\alpha\,(\log(\theta_{c1}) - \log(\theta_{c0}))$$
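As a quick numerical illustration of this definition, the following Python sketch evaluates the Weibull hazard and the log causal hazard ratio for compliers. The function names are hypothetical, and the values 3.33 and 1.67 are the complier scale parameters used later in the figure captions.

```python
import math


def weibull_hazard(t: float, alpha: float, theta: float) -> float:
    """Hazard h(t) = alpha * theta^(-alpha) * t^(alpha - 1) of Weibull(alpha, theta)."""
    return alpha * theta ** (-alpha) * t ** (alpha - 1)


def log_causal_hazard_ratio(alpha: float, theta_c1: float, theta_c0: float) -> float:
    """phi = -alpha * (log(theta_c1) - log(theta_c0)) for the compliers."""
    return -alpha * (math.log(theta_c1) - math.log(theta_c0))


# With a common shape parameter the log hazard ratio does not depend on t.
phi = log_causal_hazard_ratio(1.0, 3.33, 1.67)
```

A larger scale parameter under treatment (longer survival) gives a negative ϕ, that is, a protective hazard ratio.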

3. Two Stage Predictor Substitution (2SPS) Method

The 2SPS method is frequently used and simple to implement [4]. In the first stage, the treatment received Z is regressed on the IV (treatment assignment) R, and we let $P = E(Z \mid R)$. In the second stage, a log linear model including P, defined as:

$$\log[h(Y \mid P)] = \eta + \xi P + \log(h_0(y)), \qquad h_0(y) = \alpha^* y^{\alpha^* - 1}$$

is fitted to estimate the coefficient ξ. This is the 2SPS estimator of the log causal hazard ratio. We first derive a closed form expression for the probability limit of the maximum likelihood estimator (M.L.E) of ξ, then take the difference between this probability limit and the true log causal parameter ϕ to obtain the asymptotic bias of the 2SPS estimator as an estimator of the log causal hazard ratio for compliers.

3.1. Probability limit of M.L.E of causal parameter

Let $\hat{P}$ denote the predicted value from the estimated binary regression model, i.e., $\hat{P} = \hat{E}(Z \mid R)$. When $\hat{P}$ is substituted for P, the second stage Weibull model becomes:

$$\log[h(Y \mid \hat{P})] = \eta^* + \xi^* \hat{P} + \log(h_0(y))$$

Let $\hat{\xi}^*$ and $\hat{\xi}$ denote the estimators (M.L.E) of $\xi^*$ and $\xi$ respectively. As sample size n → ∞, $\hat{P} \xrightarrow{p} P$, $\hat{\xi}^* \xrightarrow{p} \hat{\xi}$, and $\hat{\xi} \xrightarrow{p} \xi$. Therefore, $\hat{\xi}^* \xrightarrow{p} \xi$. To derive a closed form expression for the asymptotic bias, we need to re-express ξ in terms of the parameters specified in Section 2 under the principal stratification framework.

Only always takers receive the treatment when assigned to control (R = 0). Both always takers and compliers take the treatment when assigned to treatment (R = 1). Thus, it can be shown that [5]:

$$p_0 = E(Z \mid R = 0) = \rho_a, \qquad p_1 = E(Z \mid R = 1) = \rho_a + \rho_c$$

Since P = {p0, p1} is a one-to-one transformation of R = {0, 1}, we have the following for the second stage Weibull regression:

$$\log(h(Y \mid R = 0)) = \log(h(Y \mid P = p_0)) = \eta + \xi p_0 + \log(h_0(y)) \tag{1}$$

and,

$$\log(h(Y \mid R = 1)) = \log(h(Y \mid P = p_1)) = \eta + \xi p_1 + \log(h_0(y)) \tag{2}$$

Instead of working with a second stage model involving P, we can work with a model involving R. Solving (1) and (2), we have:

$$\xi = \frac{\log(h(Y \mid R = 1)) - \log(h(Y \mid R = 0))}{p_1 - p_0} \tag{3}$$

The log linear model including R assumes two underlying Weibull distributions with the same shape parameter α*, Weibull(α*, K0) and Weibull(α*, K1), for subjects assigned to control (R = 0) and treatment (R = 1) respectively. Thus, (3) can be expressed as:

$$\xi = \frac{\log(K_1^{-\alpha^*}) - \log(K_0^{-\alpha^*})}{\rho_c}, \qquad K_1^{-\alpha^*} = e^{\eta + \xi p_1}, \qquad K_0^{-\alpha^*} = e^{\eta + \xi p_0} \tag{4}$$
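The step from (3) to (4) can be spelled out using the Weibull hazard from Section 2.4. Under Weibull(α*, K_r) the hazard in arm R = r is $\alpha^* K_r^{-\alpha^*} y^{\alpha^*-1}$, so the terms that depend on y cancel in the difference of log hazards:

```latex
\begin{aligned}
\log h(y \mid R=1) - \log h(y \mid R=0)
  &= \log\!\left(\alpha^* K_1^{-\alpha^*} y^{\alpha^*-1}\right)
   - \log\!\left(\alpha^* K_0^{-\alpha^*} y^{\alpha^*-1}\right) \\
  &= \log\!\left(K_1^{-\alpha^*}\right) - \log\!\left(K_0^{-\alpha^*}\right), \\
p_1 - p_0 &= (\rho_a + \rho_c) - \rho_a = \rho_c .
\end{aligned}
```

Dividing the first line by the second gives (4).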

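It is worth noting that both the follow up times of subjects assigned to control, denoted $Y_{R=0}$, and the follow up times of subjects assigned to treatment, denoted $Y_{R=1}$, actually follow mixture distributions consisting of three different Weibull distributions. Details are given in Appendix A. However, the second stage Weibull model of the 2SPS method imposes two Weibull distributions, with the same shape parameter α* but different scale parameters K0 and K1, upon subjects assigned to control (R = 0) and assigned to treatment (R = 1) respectively. Thus, the M.L.Es of α*, K0, K1 are derived by maximizing the likelihood function $L_n(\alpha^*, K_0, K_1)$ that consists of products of two Weibull densities: Weibull(α*, K0) and Weibull(α*, K1).

Let $\hat{\alpha}^*$ denote the M.L.E of α*. We set $E\left(\partial \log L_n(\alpha^*, \hat{K}_0(\alpha^*), \hat{K}_1(\alpha^*)) / \partial \alpha^*\right)$, the expectation of the score equation derived from the profile likelihood of α*, equal to 0 and let $\tilde{\alpha}$ be the solution. Under the assumptions stated in Section 2 and consistency of the M.L.E, the probability limit of the estimator $\hat{\alpha}^*$ is $\tilde{\alpha}$. Details are given in Appendix C. Once the parameters of the principal strata are defined, $\tilde{\alpha}$ can be solved numerically using a root-finding algorithm such as the “bisection” method. Let $\hat{K}_0$, $\hat{K}_1$ be the M.L.Es of the two scale parameters K0, K1 respectively. After the value of $\tilde{\alpha}$ is determined, the probability limits of the estimators $\hat{K}_0$, $\hat{K}_1$ can be derived as follows:

$$K_0 = \left[\frac{1}{P(\delta = 1 \mid R = 0)} \left\{ \rho_a\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{at1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_n\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_c\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} \right\}\right]^{1/\tilde{\alpha}} \tag{5}$$

and,

$$K_1 = \left[\frac{1}{P(\delta = 1 \mid R = 1)} \left\{ \rho_a\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{at1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_n\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_c\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} \right\}\right]^{1/\tilde{\alpha}} \tag{6}$$

The detailed steps of the derivation of (5) and (6) are given in Appendix C. By substituting (5) and (6) into (4), we derive the expression of the log causal hazard ratio ξ as follows:

$$\begin{aligned}
\xi = \frac{1}{\rho_c}\Bigg\{ &\log\!\left(\left[\frac{1}{P(\delta = 1 \mid R = 1)} \left\{ \rho_a\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{at1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_n\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_c\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} \right\}\right]^{-1}\right) \\
- &\log\!\left(\left[\frac{1}{P(\delta = 1 \mid R = 0)} \left\{ \rho_a\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{at1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_n\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} + \rho_c\,\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha} \right\}\right]^{-1}\right) \Bigg\}
\end{aligned} \tag{7}$$

Thus, (7) is the closed-form expression of the probability limit of the log causal hazard ratio estimator $\hat{\xi}^*$ from the 2SPS Weibull model.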
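The calculation behind (5) to (7) can be sketched numerically. The Python code below is an illustration under stated assumptions, not the authors' implementation: the expected profile-score expression is re-derived here from the standard censored Weibull likelihood (Appendix C is not reproduced in this section), the score is assumed to change sign once on the bracket [0.01, 20], and all function names and example values are hypothetical. It uses only the standard library, so the digamma function is approximated by a recurrence plus an asymptotic series.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma(x: float) -> float:
    """Digamma for x > 0: shift x above 6, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def arm_terms(a, alpha, lam, weights, thetas):
    """Mixture moments in one arm: E[Y^a], P(delta=1), E[delta log Y], E[Y^a log Y]."""
    g = math.gamma(a / alpha + 1.0)
    psi = digamma(a / alpha + 1.0)
    m = p = dl = ml = 0.0
    for w, th in zip(weights, thetas):
        kappa = (th ** -alpha + lam ** -alpha) ** (-1.0 / alpha)  # scale of min(T, L)
        p_event = 1.0 / (1.0 + (th / lam) ** alpha)               # P(T <= L)
        m += w * g * kappa ** a
        p += w * p_event
        dl += w * p_event * (math.log(kappa) - EULER_GAMMA / alpha)
        ml += w * g * kappa ** a * (math.log(kappa) + psi / alpha)
    return m, p, dl, ml


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bias_2sps(alpha, lam, rho_a, rho_c, p_r, th_at1, th_nt0, th_c0, th_c1):
    """Return (shape limit, probability limit of xi, asymptotic bias)."""
    weights = (rho_a, 1.0 - rho_a - rho_c, rho_c)
    arms = [(1.0 - p_r, (th_at1, th_nt0, th_c0)),  # R = 0
            (p_r, (th_at1, th_nt0, th_c1))]        # R = 1

    def expected_score(a):
        total = 0.0
        for pr, thetas in arms:
            m, p, dl, ml = arm_terms(a, alpha, lam, weights, thetas)
            total += pr * (p / a + dl - p * ml / m)
        return total

    a = bisect(expected_score, 0.01, 20.0)
    log_k_neg = []
    for _, thetas in arms:
        m, p, _, _ = arm_terms(a, alpha, lam, weights, thetas)
        log_k_neg.append(-math.log(m / p))  # log K_r^{-a}, as in (5) and (6)
    xi = (log_k_neg[1] - log_k_neg[0]) / rho_c  # equation (7)
    phi = -alpha * (math.log(th_c1) - math.log(th_c0))
    return a, xi, xi - phi
```

Calling `bias_2sps(1.5, 2.0, 0.0, 1.0, 0.5, 1.0, 1.0, 1.67, 3.33)` (perfect compliance) should return a shape limit of about 1.5 and a bias of about 0, and setting `th_c1 == th_c0` should give zero bias because the two arms then have identical mixtures.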

3.2. Bias analysis

The asymptotic bias of the causal parameter ξ of the 2SPS Weibull regression model is simply the difference between the derived closed form expression of ξ and the true log causal hazard ratio ϕ, such that

$$B_{2sps} = \xi + \alpha\,(\log(\theta_{c1}) - \log(\theta_{c0})) \tag{8}$$

We can re-parameterize $\theta_{nt0}$ in (8) with one additional parameter $\Delta = \alpha\,(\log(\theta_{nt0}) - \log(\theta_{c0}))$ as follows:

$$\log(\theta_{nt0}) = \log(\theta_{c0}) + \frac{\Delta}{\alpha} \tag{9}$$

Δ in (9) is the log hazard ratio between never takers and compliers given no treatment. It can be interpreted as the magnitude of the unmeasured confounding because the differences between principal strata are attributable to the unmeasured confounding [5]. When Δ = 0, or equivalently $\theta_{nt0} = \theta_{c0}$, there is no unmeasured confounding.
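The re-parameterization in (9) is easy to apply in code. The short Python sketch below converts between Δ and the never-taker scale parameter; the function names are hypothetical.

```python
import math


def theta_nt0_from_delta(theta_c0: float, delta: float, alpha: float) -> float:
    """Invert (9): theta_nt0 = theta_c0 * exp(delta / alpha)."""
    return theta_c0 * math.exp(delta / alpha)


def delta_from_thetas(theta_nt0: float, theta_c0: float, alpha: float) -> float:
    """Delta = alpha * (log(theta_nt0) - log(theta_c0))."""
    return alpha * (math.log(theta_nt0) - math.log(theta_c0))
```

With Δ = 0 the two scale parameters coincide, which is the no-confounding case described above.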

We make the following observations about the bias of the 2SPS method from (8): 1) When α = 1 and we treat α* as a known parameter fixed at 1, the survival outcomes of all principal strata follow exponential distributions and we fit an exponential model in the second stage instead of estimating the shape parameter of a more general Weibull distribution; 2) When ρc = 1, every subject is a complier and the expected score equation for α* simplifies to $\frac{1}{\tilde{\alpha}} - \frac{\gamma}{\alpha} - \psi\!\left(\frac{\tilde{\alpha}}{\alpha}+1\right)\frac{1}{\alpha} = 0$, where γ is Euler's constant and ψ is the digamma function, so that $\tilde{\alpha} = \alpha$. Setting ρc = 1, ρa = 0, and ρn = 0, (8) becomes 0, so the bias $B_{2sps} = 0$ when a randomized controlled trial has perfect compliance; 3) When there is no causal effect ($\theta_{c1} = \theta_{c0}$), all terms in (8) cancel out and we have $B_{2sps} = 0$; 4) When ρa = 0 and $\theta_{c0} = \theta_{nt0}$, there is no confounding: there are no always takers and never takers cannot receive the treatment, so confounding can only be attributable to the difference between never takers and compliers given no treatment[5]. However, (8) does not reduce to 0 under this setting, so the bias of the 2SPS method $B_{2sps}$ is generally not 0 even when there is no confounding; 5) λ, the scale parameter of the censoring distribution, is involved in the bias equation (8), which coincides with the results in Struthers and Kalbfleisch[17].
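The perfect-compliance case in observation 2 can be checked directly. The following worked step is a sketch that relies on the standard identities ψ(2) = 1 − γ and Γ(2) = 1 and on the event probability $P(\delta = 1) = [1 + (\theta/\lambda)^{\alpha}]^{-1}$ for a single Weibull stratum; the symbol $\tilde{\alpha}$ denotes the probability limit of the shape estimator.

```latex
\begin{aligned}
\tilde{\alpha} = \alpha:\quad
  & \frac{1}{\alpha} - \frac{\gamma}{\alpha} - \frac{\psi(2)}{\alpha}
    = \frac{1 - \gamma - (1 - \gamma)}{\alpha} = 0, \\
K_r^{\alpha}
  &= \Gamma(2)\left[\frac{1}{\theta^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-1}
     \left[1 + \left(\frac{\theta}{\lambda}\right)^{\alpha}\right]
   = \theta^{\alpha}, \qquad
   \theta = \theta_{c1}\ (r = 1),\ \theta_{c0}\ (r = 0), \\
\xi &= \log\!\left(\theta_{c1}^{-\alpha}\right) - \log\!\left(\theta_{c0}^{-\alpha}\right)
     = -\alpha\left(\log\theta_{c1} - \log\theta_{c0}\right) = \phi
   \;\Longrightarrow\; B_{2sps} = 0 .
\end{aligned}
```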

We can analyze how the parameters influence the relationship between the magnitude of confounding and bias using the derived closed form expression (8). For the purpose of demonstration only, we create four scenarios in which there are no always takers. The results are shown in Figure 1 (a)-(d).

Figure 1. Plot of bias against magnitude of unmeasured confounding Δ using the 2SPS method: (a) P(R = 1) = 0.8, ρa = 0, ρc = 0.5, $\theta_{c1} = 3.33$, $\theta_{c0} = 1.67$. (b) P(R = 1) = 0.8, ρa = 0, ρc = 0.8, $\theta_{c1} = 3.33$, $\theta_{c0} = 1.67$. (c) P(R = 1) = 0.8, ρa = 0, ρc = 0.5, $\theta_{c1} = 33.3$, $\theta_{c0} = 16.7$. (d) P(R = 1) = 0.5, ρa = 0, ρc = 0.8, $\theta_{c1} = 3.33$, $\theta_{c0} = 1.67$. The colour of each solid line corresponds to a different shape parameter: black (α = 0.5), red (α = 1), and green (α = 2).

In Figure 1, we can clearly see that the bias of the 2SPS method is not 0 when there is no confounding. The bias increases with the shape parameter α of the survival function (within each principal stratum). The bias is smallest when we have a decreasing hazard rate (α < 1) and largest when we have an increasing hazard rate (α > 1). By comparing Figure 1 (a) and (b), we also observe that the bias decreases as the compliance rate increases from 0.5 to 0.8. When the scale parameter (θc) is smaller, the bias is also smaller (Figure 1 (a) vs. (c)). Although the probability of being randomly assigned to the treatment group is involved in computing the shape parameter of the second stage Weibull regression model, its effects on the bias are very small (compare Figure 1 (b) to (d)).

4. Two Stage Residual Inclusion (2SRI) Method

Similar to the 2SPS method, the 2SRI method involves two stage modeling [4]. In the first stage, we regress the treatment received Z on the IV (treatment assignment) R and calculate the residual term $E = Z - E(Z \mid R)$. In the second stage, we fit a log linear model on both the treatment received variable Z and the residual E,

$$\log(h(Y \mid Z, E)) = \lambda_0 + \lambda_1 Z + \lambda_2 E + \log(h_0(y)), \qquad h_0(y) = \alpha^* y^{\alpha^* - 1} \tag{10}$$

to estimate the regression coefficient λ1. This is the 2SRI estimator of the log causal hazard ratio. We first derive the probability limit of the M.L.E of λ1 and then calculate the asymptotic bias by taking the difference between this probability limit and the true log causal hazard ratio among compliers.

4.1. Probability limit of M.L.E of causal parameter

As discussed in a previous study[5], (10) is not the true model for the hazard function $h(Y \mid Z, E)$. In fact the true model includes the interaction term between Z and E. However, deriving the closed-form expression for the probability limit of the estimator from (10) is very difficult when (10) is not the true model. With one additional assumption that there are no always takers, (10) becomes the true model. We derive a closed-form expression of the probability limit of the estimator of the causal parameter λ1 assuming that there are no always takers and thus (10) is the true model. Let $\hat{E}$ denote the residuals from the estimated binary regression model in the first stage, i.e., $\hat{E} = Z - \hat{E}(Z \mid R)$. When $\hat{E}$ is substituted for E, (10) becomes:

$$\log[h(Y \mid Z, \hat{E})] = \lambda_0^* + \lambda_1^* Z + \lambda_2^* \hat{E} + \log(h_0(y))$$

Let $\hat{\lambda}_1^*$ and $\hat{\lambda}_1$ be the estimators (M.L.E) of $\lambda_1^*$ and $\lambda_1$. As sample size n → ∞, $\hat{E} \xrightarrow{p} E$, $\hat{\lambda}_1^* \xrightarrow{p} \hat{\lambda}_1$, and $\hat{\lambda}_1 \xrightarrow{p} \lambda_1$. Thus, $\hat{\lambda}_1^* \xrightarrow{p} \lambda_1$. To derive a closed form expression for the asymptotic bias, we need to first re-express λ1 in terms of the parameters specified in Section 2 under the principal stratification framework.

As shown in a previous study[5], under the no always taker assumption, the first stage binary regression is $E(Z \mid R) = \rho_a + \rho_c R$ and the residual term is $E = Z - E(Z \mid R)$; thus the residual term can be re-expressed as $E = Z - \rho_a - \rho_c R$. Since {Z, E} has a one-to-one relationship with {Z, R}, we can establish the following equivalence between the model involving {Z, E} and the model involving {Z, R} for the second stage Weibull model:

$$\log(h(Y \mid Z, E)) = \lambda_0 + \lambda_1 Z + \lambda_2 E + \log(h_0(y)) = \lambda_0 + \lambda_1 Z + \lambda_2 (Z - \rho_a - \rho_c R) + \log(h_0(y)) = \log(h(Y \mid Z, R)) \tag{11}$$

Under the no always taker assumption, the second stage Weibull regression model defined by (10) assumes three underlying Weibull distributions with the same shape parameter but different scale parameters for subjects in three different subgroups: 1) $Y \mid (Z = 1, R = 1) \sim$ Weibull(α*, K0) for those who are assigned to treatment and actually receive the treatment; only compliers are in this group; 2) $Y \mid (Z = 0, R = 1) \sim$ Weibull(α*, K1) for those who are assigned to treatment but do not actually receive the treatment; this group has only never takers; 3) $Y \mid (Z = 0, R = 0) \sim$ Weibull(α*, K2) for those who are assigned to control and do not receive the treatment; both never takers and compliers are in this group. There are no subjects who are assigned to control but still take the active treatment (Z = 1, R = 0) under the assumption of no always takers. Thus, the M.L.Es of α*, K0, K1, K2 are derived by maximizing the likelihood function $L_n(\alpha^*, K_0, K_1, K_2)$ that consists of products of three Weibull densities: Weibull(α*, K0), Weibull(α*, K1), and Weibull(α*, K2).

Let $\hat{\alpha}^*$ denote the M.L.E of α* and set $E\left(\partial \log L_n(\alpha^*, \hat{K}_0(\alpha^*), \hat{K}_1(\alpha^*), \hat{K}_2(\alpha^*)) / \partial \alpha^*\right)$, the expectation of the score equation derived from the profile likelihood of α*, to 0 and let $\tilde{\alpha}$ be the solution. Under the assumptions stated in Section 2 and consistency of the M.L.E, the probability limit of the estimator $\hat{\alpha}^*$ is $\tilde{\alpha}$. Details are given in Appendix D. With the parameters of the principal strata defined, $\tilde{\alpha}$ can be solved numerically using a root-finding algorithm. Let $\hat{K}_0$, $\hat{K}_1$, $\hat{K}_2$ be the M.L.Es of the three scale parameters K0, K1, K2. Once the value of $\tilde{\alpha}$ is determined, we compute the probability limits of the estimators $\hat{K}_0$, $\hat{K}_1$, $\hat{K}_2$ as follows:

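$$K_0 = \left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}}{\dfrac{1}{1 + (\theta_{c1}/\lambda)^{\alpha}}}\right]^{1/\tilde{\alpha}} \tag{12}$$

and

$$K_1 = \left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}}\right]^{1/\tilde{\alpha}} \tag{13}$$

and

$$K_2 = \left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_{nt} + \Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_c}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}\,\rho_{nt} + \dfrac{1}{1 + (\theta_{c0}/\lambda)^{\alpha}}\,\rho_c}\right]^{1/\tilde{\alpha}} \tag{14}$$

The derivation of (12), (13) and (14) is detailed in Appendix D. Based on (11), we can establish the following three equations for all possible combinations of values of Z and R, excluding the always takers scenario (Z = 1, R = 0).

  1. When Z = 1 and R = 1, there are only compliers in this subgroup.
    $$\log(h(Y \mid Z = 1, R = 1)) = \log(h(Y^{(1)} \mid Z = 1, R = 1)) \;\Rightarrow\; \lambda_0 + \lambda_1 + \lambda_2 (1 - \rho_c) = \log(K_0^{-\tilde{\alpha}}) = \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}}{\dfrac{1}{1 + (\theta_{c1}/\lambda)^{\alpha}}}\right]^{-1}\right) \tag{15}$$
  2. When Z = 0 and R = 1, there are only never takers in this subgroup.
    $$\log(h(Y \mid Z = 0, R = 1)) = \log(h(Y^{(0)} \mid Z = 0, R = 1)) \;\Rightarrow\; \lambda_0 + \lambda_2 (-\rho_c) = \log(K_1^{-\tilde{\alpha}}) = \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}}\right]^{-1}\right) \tag{16}$$
  3. When Z = 0 and R = 0, there is a mixture of never takers and compliers in this subgroup.
    $$\log(h(Y \mid Z = 0, R = 0)) = \log(h(Y^{(0)} \mid Z = 0, R = 0)) \;\Rightarrow\; \lambda_0 = \log(K_2^{-\tilde{\alpha}}) = \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_{nt} + \Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_c}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}\,\rho_{nt} + \dfrac{1}{1 + (\theta_{c0}/\lambda)^{\alpha}}\,\rho_c}\right]^{-1}\right) \tag{17}$$

We then derive the closed form expression for the causal parameter λ1 by solving (15), (16), and (17) for λ1 as follows:

$$\begin{aligned}
\lambda_1 = {} & \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c1}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}}{\dfrac{1}{1 + (\theta_{c1}/\lambda)^{\alpha}}}\right]^{-1}\right)
- \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_{nt} + \Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_c}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}\,\rho_{nt} + \dfrac{1}{1 + (\theta_{c0}/\lambda)^{\alpha}}\,\rho_c}\right]^{-1}\right) \\
& - \frac{1 - \rho_c}{\rho_c}\left\{ \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_{nt} + \Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{c0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}\rho_c}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}\,\rho_{nt} + \dfrac{1}{1 + (\theta_{c0}/\lambda)^{\alpha}}\,\rho_c}\right]^{-1}\right)
- \log\!\left(\left[\frac{\Gamma\!\left(\tfrac{\tilde{\alpha}}{\alpha}+1\right)\left[\tfrac{1}{\theta_{nt0}^{\alpha}} + \tfrac{1}{\lambda^{\alpha}}\right]^{-\tilde{\alpha}/\alpha}}{\dfrac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}}\right]^{-1}\right) \right\}
\end{aligned}$$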
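The elimination that produces this expression is short once the right-hand sides are abbreviated. In the worked step below, $c_j = \log(K_j^{-\tilde{\alpha}})$ for j = 0, 1, 2 is shorthand introduced only for this illustration, and $\tilde{\alpha}$ denotes the probability limit of the shape estimator.

```latex
\begin{aligned}
\text{(17)}:\quad & \lambda_0 = c_2, \\
\text{(16)}:\quad & \lambda_0 - \rho_c \lambda_2 = c_1
   \;\Longrightarrow\; \lambda_2 = \frac{c_2 - c_1}{\rho_c}, \\
\text{(15)}:\quad & \lambda_1 = c_0 - \lambda_0 - (1 - \rho_c)\,\lambda_2
   = c_0 - c_2 - \frac{1 - \rho_c}{\rho_c}\,(c_2 - c_1).
\end{aligned}
```

When there is no confounding ($\theta_{nt0} = \theta_{c0}$), the never-taker and mixture groups share one distribution, so $c_1 = c_2$ and the last term vanishes.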

4.2. Bias analysis

To compute the asymptotic bias of the 2SRI method, we subtract the true log hazard ratio ϕ from the closed-form expression of λ1:

$$B_{2SRI} = \lambda_1 + \alpha\,(\log(\theta_{c1}) - \log(\theta_{c0})) \tag{18}$$

We can re-parameterize $\theta_{nt0}$ in (18) in the same way as in Section 3 and let $\theta_{nt0} = \theta_{c0}\,e^{\Delta/\alpha}$. From the derived expression of the asymptotic bias of the 2SRI estimator, we can make the following observations: 1) When α = 1, the survival outcome within a principal stratum follows an exponential distribution. If we treat α* as known and set α* = 1, we fit an exponential regression model in the second stage; 2) When there is perfect compliance (ρc = 1), we have $B_{2SRI} = 0$. In this scenario, $\tilde{\alpha} = \alpha$. By plugging ρc = 1 into (18), we can easily verify the result; 3) When there is no confounding ($\theta_{c0} = \theta_{nt0}$), $B_{2SRI} = 0$; 4) When there is no causal effect ($\theta_{c1} = \theta_{c0}$), $B_{2SRI}$ is not 0; 5) λ, the scale parameter of the censoring distribution, is involved in the bias equation (18), similar to the findings for the 2SPS method.
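The 2SRI bias in (18) can be evaluated numerically under the no-always-takers assumption. The Python sketch below is illustrative only: the expected profile-score expression is re-derived here from the standard censored Weibull likelihood (Appendix D is not reproduced in this section), the root is assumed to lie in the bracket [0.01, 20], and the function names, the argument `lam_cens` (the censoring scale, renamed to avoid a clash with the coefficient λ1), and the example values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma(x: float) -> float:
    """Digamma for x > 0: shift x above 6, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def cell_moments(a, alpha, lam_cens, parts):
    """parts: (weight, theta) pairs inside one (Z, R) cell.
    Returns E[Y^a], P(delta=1), E[delta log Y], E[Y^a log Y]."""
    g = math.gamma(a / alpha + 1.0)
    psi = digamma(a / alpha + 1.0)
    total_w = sum(w for w, _ in parts)
    m = p = dl = ml = 0.0
    for w, th in parts:
        w /= total_w
        kappa = (th ** -alpha + lam_cens ** -alpha) ** (-1.0 / alpha)
        p_event = 1.0 / (1.0 + (th / lam_cens) ** alpha)
        m += w * g * kappa ** a
        p += w * p_event
        dl += w * p_event * (math.log(kappa) - EULER_GAMMA / alpha)
        ml += w * g * kappa ** a * (math.log(kappa) + psi / alpha)
    return m, p, dl, ml


def bias_2sri(alpha, lam_cens, rho_c, p_r, th_c1, th_c0, delta):
    """Return (shape limit, probability limit of lambda_1, asymptotic bias)."""
    rho_nt = 1.0 - rho_c
    th_nt0 = th_c0 * math.exp(delta / alpha)  # re-parameterization by Delta
    cells = [
        (p_r * rho_c, [(1.0, th_c1)]),                           # Z=1, R=1
        (p_r * rho_nt, [(1.0, th_nt0)]),                         # Z=0, R=1
        (1.0 - p_r, [(rho_nt, th_nt0), (rho_c, th_c0)]),         # Z=0, R=0
    ]

    def expected_score(a):
        total = 0.0
        for prob, parts in cells:
            m, p, dl, ml = cell_moments(a, alpha, lam_cens, parts)
            total += prob * (p / a + dl - p * ml / m)
        return total

    lo, hi = 0.01, 20.0  # bisection; score is positive at lo, negative at hi
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if expected_score(mid) > 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)

    c = []
    for _, parts in cells:
        m, p, _, _ = cell_moments(a, alpha, lam_cens, parts)
        c.append(-math.log(m / p))  # log K_j^{-a}, as in (12) to (14)
    lam1 = c[0] - c[2] - (1.0 - rho_c) / rho_c * (c[2] - c[1])
    phi = -alpha * (math.log(th_c1) - math.log(th_c0))
    return a, lam1, lam1 - phi
```

With `delta=0.0` (no confounding) the returned bias should be numerically zero, in line with observation 3, while a large `delta` should give a clearly nonzero bias.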

We can analyze how the parameters influence the relationship between the magnitude of confounding and the bias of the 2SRI method using (18). As in the previous section, four scenarios were created assuming there are no always takers. The results are shown in Figure 2 (a)-(d). In Figure 2, it is apparent that the bias of the 2SRI method is 0 when there is no confounding. Intuitively, under the condition of no confounding, including the estimated residual term in the second stage survival model has no effect on the estimate of the causal parameter. By comparing Figure 2 (a) and (b), we also observe that the bias decreases as the compliance rate increases from 0.5 to 0.8. When the scale parameter (θc) is smaller, the bias tends to be smaller (Figure 2 (a) vs. (c)). The probability of being randomly assigned to the treatment group has a very small impact on the bias (compare Figure 2 (b) to (d)).

Figure 2. Plot of bias against magnitude of unmeasured confounding Δ using the 2SRI method: (a) P(R = 1) = 0.8, ρa = 0, ρc = 0.5, $\theta_{c1} = 3.33$, $\theta_{c0} = 1.67$. (b) P(R = 1) = 0.8, ρa = 0, ρc = 0.8, $\theta_{c1} = 3.33$, $\theta_{c0} = 1.67$. (c) P(R = 1) = 0.8, ρa = 0, ρc = 0.5, $\theta_{c1} = 33.3$, $\theta_{c0} = 16.7$. (d) P(R = 1) = 0.5, ρa = 0, ρc = 0.8, $\theta_{c1} = 3.33$, $\theta_{c0} = 1.67$. The colour of each solid line corresponds to a different shape parameter: black (α = 0.5), red (α = 1), and green (α = 2).

5. Simulation

5.1. Simulation algorithm

We follow the five-step algorithm used by Cai et al.[5] to generate data for a simulation study. In the first step, a data set of N subjects is generated. Always takers, compliers, and never takers among these subjects are generated from a multinomial distribution with probabilities {ρa, ρc, ρn}. In the second step, treatment assignment status R is generated for each subject with probability P(R = 1) = ρr. Because the outcome in the present study is time to event, we modified step 3 to generate potential survival times {T0, T1} and censoring times {L0, L1} for each principal stratum based on the parameters $\theta_{at0}$, $\theta_{c0}$, $\theta_{nt0}$, $\theta_{at1}$, $\theta_{c1}$, $\theta_{nt1}$, λ. For instance, if a subject is a complier, the potential time to death under control $T_c^0$ is generated from Weibull(α, $\theta_{c0}$) and the potential time to death under treatment $T_c^1$ is generated from Weibull(α, $\theta_{c1}$). The potential censoring times {$L_c^0$, $L_c^1$} are generated from Weibull(α, λ). In step 4, we use compliance status (always taker, complier, or never taker) and treatment assignment status R to determine the treatment received status Z. For instance, if a subject is a complier and assigned to the treatment group (R = 1), then Z = 1. If a subject is an always taker but assigned to the control group, then Z = 1. In step 5, the observed survival time and censoring time are generated as follows:

$$T = T^1 Z + T^0 (1 - Z), \quad \text{and} \quad L = L^1 Z + L^0 (1 - Z)$$

and finally the observed follow up time and censoring indicator are given as:

$$Y = \min(T, L), \quad \text{and} \quad \delta = I(L \ge T)$$
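The five steps above can be sketched in Python with the standard library. This is a minimal illustration, not the authors' code: the function name `simulate`, the dictionary layout of `theta`, and the seed handling are assumptions. Note that `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import random


def simulate(n, alpha, lam, rho_a, rho_c, p_r, theta, seed=0):
    """Generate (stratum, R, Z, Y, delta) rows following steps 1 to 5.

    theta maps (stratum, z) to a Weibull scale, e.g. theta[("C", 1)].
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Step 1: draw the principal stratum.
        stratum = rng.choices(["AT", "C", "NT"],
                              weights=[rho_a, rho_c, 1.0 - rho_a - rho_c])[0]
        # Step 2: draw the treatment assignment (the IV).
        r = 1 if rng.random() < p_r else 0
        # Step 3: potential survival and censoring times under z = 0 and z = 1.
        t = {z: rng.weibullvariate(theta[(stratum, z)], alpha) for z in (0, 1)}
        cens = {z: rng.weibullvariate(lam, alpha) for z in (0, 1)}
        # Step 4: treatment received follows from stratum and assignment.
        z = {"AT": 1, "NT": 0, "C": r}[stratum]
        # Step 5: observed follow up time and event indicator.
        y = min(t[z], cens[z])
        delta = int(t[z] <= cens[z])
        rows.append((stratum, r, z, y, delta))
    return rows
```

A call such as `simulate(2000, 1.0, 2.0, 0.2, 0.5, 0.5, theta)` needs a `theta` dictionary with an entry for each of the six (stratum, z) pairs.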

5.2. Simulation results

To demonstrate the consistency between the derived closed form expressions and the asymptotic biases from the 2SPS and 2SRI approaches under the assumption of no always takers (ρa = 0), we ran the simulation 2000 times, with the sample size n=10000, according to the same parameter settings presented in Figure 1 d) and Figure 2 d). Table 1 shows simulation results from 4 scenarios (α = 0.5, 1, 1.5, 2). As shown in this table, the biases from simulated results are consistent with the values computed with the derived analytic formula for both the 2SPS and 2SRI Weibull models. We also considered 2SPS and 2SRI Cox models (the second stage regression is a Cox model instead of a Weibull model). The pattern of the biases from 2SPS and 2SRI Cox models remains the same as for the 2SPS and 2SRI Weibull models respectively. With decreasing hazard (α = 0.5), the bias from using the 2SPS approach is smaller than the bias from the 2SRI approach. When the hazard is constant or increasing (α ≥ 1), the results are mixed. With stronger negative confounding, the 2SPS method produces smaller bias than the 2SRI method. However, with no confounding or stronger positive confounding, the 2SPS method produces larger bias than the 2SRI method.
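A small Monte Carlo check of the 2SPS Weibull estimator can be sketched as follows. The code is an illustration under simplifying assumptions, not the simulation code used for Table 1: it covers only the no-always-takers case, draws only the realized potential outcome for each subject, fits the second stage through the two-group Weibull profile likelihood (equivalent to regressing on the two fitted values of P), and uses hypothetical function names, seed, and bracket [0.05, 20] for the shape parameter.

```python
import math
import random


def simulate_no_always_takers(n, alpha, lam_cens, rho_c, p_r,
                              th_c1, th_c0, th_nt0, seed=1):
    """Return rows (R, Z, Y, delta) with compliers and never takers only."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        complier = rng.random() < rho_c
        r = 1 if rng.random() < p_r else 0
        z = r if complier else 0
        if complier:
            scale = th_c1 if z == 1 else th_c0
        else:
            scale = th_nt0
        t = rng.weibullvariate(scale, alpha)      # (scale, shape)
        cens = rng.weibullvariate(lam_cens, alpha)
        data.append((r, z, min(t, cens), int(t <= cens)))
    return data


def fit_2sps_weibull(data):
    """Return (shape estimate, 2SPS estimate of the log causal hazard ratio)."""
    # Stage 1: fitted P is the sample mean of Z within each level of R.
    p_hat = {}
    for r in (0, 1):
        zs = [z for rr, z, _, _ in data if rr == r]
        p_hat[r] = sum(zs) / len(zs)
    # Stage 2: Weibull model with common shape and one scale per arm.
    logs = [(r, math.log(y), d) for r, _, y, d in data]
    d_arm = {r: sum(d for rr, _, d in logs if rr == r) for r in (0, 1)}
    sum_dlog = sum(d * ly for _, ly, d in logs)

    def power_sums(a, r):
        s = sl = 0.0
        for rr, ly, _ in logs:
            if rr == r:
                w = math.exp(a * ly)  # y ** a
                s += w
                sl += w * ly
        return s, sl

    def profile_score(a):
        val = (d_arm[0] + d_arm[1]) / a + sum_dlog
        for r in (0, 1):
            s, sl = power_sums(a, r)
            val -= d_arm[r] * sl / s
        return val

    lo, hi = 0.05, 20.0
    while hi - lo > 1e-8:  # bisection on the profile score
        mid = 0.5 * (lo + hi)
        if profile_score(mid) > 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    log_k_neg = {r: -math.log(power_sums(a, r)[0] / d_arm[r]) for r in (0, 1)}
    xi_hat = (log_k_neg[1] - log_k_neg[0]) / (p_hat[1] - p_hat[0])
    return a, xi_hat
```

Comparing `xi_hat` with ϕ = −α(log θc1 − log θc0) over repeated data sets gives a simulated bias that can be set against the analytic value.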

Table 1.

Bias in estimating log causal hazard ratio parameter (ρa = 0, ρc = 0.5, ρr = 0.8, θc1=3.33, θc0=1.67)

α δ
Bias2spsanalytic
Bias2spsAFT
Bias2spsCox
Bias2srianalytic
Bias2sriAFT
Bias2sriCox
0.5 2 -0.094 -0.093 -0.091 -0.477 -0.476 -0.476
1.5 -0.067 -0.068 -0.064 -0.238 -0.239 -0.235
1 -0.039 -0.040 -0.039 -0.086 -0.087 -0.086
0.5 -0.013 -0.016 -0.012 -0.015 -0.018 -0.014
0 0.007 0.009 0.007 0.000 0.002 0.000
-0.5 0.023 0.020 0.026 0.000 -0.003 -0.001
-1 0.038 0.037 0.051 0.029 0.028 0.029
-1.5 0.055 0.053 0.075 0.114 0.112 0.108
-2 0.073 0.074 0.101 0.261 0.263 0.236

1 2 -0.250 -0.253 -0.247 -0.545 -0.550 -0.544
1.5 -0.177 -0.175 -0.177 -0.285 -0.284 -0.284
1 -0.096 -0.093 -0.097 -0.110 -0.107 -0.112
0.5 -0.017 -0.020 -0.018 -0.022 -0.025 -0.023
0 0.051 0.053 0.055 0.000 0.002 0.000
-0.5 0.107 0.106 0.116 -0.007 -0.008 -0.009
-1 0.152 0.153 0.177 0.000 0.000 0.002
-1.5 0.193 0.191 0.232 0.057 0.053 0.055
-2 0.230 0.232 0.280 0.175 0.176 0.157

1.5 2 -0.422 -0.423 -0.418 -0.605 -0.607 -0.602
1.5 -0.285 -0.285 -0.284 -0.326 -0.325 -0.326
1 -0.132 -0.133 -0.134 -0.133 -0.134 -0.134
0.5 0.019 0.023 0.021 -0.028 -0.027 -0.029
0 0.153 0.152 0.159 0.000 -0.004 0.000
-0.5 0.261 0.266 0.274 -0.015 -0.012 -0.015
-1 0.345 0.342 0.376 -0.030 -0.033 -0.027
-1.5 0.412 0.412 0.461 -0.005 -0.008 -0.002
-2 0.467 0.468 0.531 0.078 0.075 0.068

2 2 -0.574 -0.578 -0.571 -0.656 -0.656 -0.653
1.5 -0.359 -0.360 -0.357 -0.362 -0.361 -0.359
1 -0.122 -0.124 -0.122 -0.152 -0.153 -0.152
0.5 0.111 0.115 0.112 -0.034 -0.032 -0.036
0 0.317 0.320 0.324 0.000 0.003 0.002
-0.5 0.481 0.479 0.494 -0.022 -0.026 -0.026
-1 0.605 0.605 0.636 0.059 -0.059 -0.056
-1.5 0.698 0.701 0.747 -0.069 -0.069 -0.063
-2 0.769 0.770 0.833 -0.023 -0.024 -0.026

Bias2spsanalytic: bias computed using the analytic formula derived for the 2SPS method; Bias2spsAFT: bias computed via simulation for the 2SPS Weibull accelerated failure time model; Bias2spsCox: bias computed via simulation for the 2SPS Cox model; Bias2srianalytic: bias computed using the analytic formula derived for the 2SRI method; Bias2sriAFT: bias computed via simulation for the 2SRI Weibull accelerated failure time model; Bias2sriCox: bias computed via simulation for the 2SRI Cox model.

To evaluate the performance of both the 2SPS and 2SRI methods in the setting where there are always takers, we simulated data under various combinations of parameters based on the following settings: i) the shape parameter α varied among {0.5, 1, 2}, representing decreasing, constant, and increasing hazard scenarios; ii) the probabilities of being an always taker ρa and a complier ρc were set to 3 combinations: {0.2, 0.7}, {0.7, 0.2}, and {0, 0.5}, so that low, medium, and high levels of compliance were represented; iii) the probability of being assigned to treatment ρr was set to {0.1, 0.5} to reflect both new and relatively established treatments; iv) the scale parameter of the censoring distribution was set to {0.5, 1, 2}; v) each of the parameters θat0, θc0, θc1 was set to {0.5, 1, 3} separately. Thus, 1458 possible combinations were created. For each setting, we generated 10,000 observations and fit the 2SPS and 2SRI models to the data. This process was repeated 2000 times.
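The size of this simulation grid can be checked with a short enumeration. The following is a minimal Python sketch, not the authors' simulation code; the variable names are illustrative assumptions, and only the parameter values are taken from settings i) through v) above.

```python
from itertools import product

# Parameter values as listed in settings i)-v); names are hypothetical.
shape_alpha = [0.5, 1, 2]                          # i) Weibull shape
strata_probs = [(0.2, 0.7), (0.7, 0.2), (0, 0.5)]  # ii) (rho_a, rho_c) pairs
rho_r = [0.1, 0.5]                                 # iii) P(assigned to treatment)
censor_scale = [0.5, 1, 2]                         # iv) censoring scale
theta_values = [0.5, 1, 3]                         # v) used for each theta separately

grid = [
    {
        "alpha": alpha, "rho_a": rho_a, "rho_c": rho_c, "rho_r": r,
        "censor_scale": lam,
        "theta_at0": th_at0, "theta_c0": th_c0, "theta_c1": th_c1,
    }
    for alpha, (rho_a, rho_c), r, lam, th_at0, th_c0, th_c1 in product(
        shape_alpha, strata_probs, rho_r, censor_scale,
        theta_values, theta_values, theta_values,
    )
]

# 3 * 3 * 2 * 3 * 3^3 = 1458 scenarios
print(len(grid))
```

Each dictionary in `grid` describes one scenario; a full study would loop over it and simulate 2000 data sets of 10,000 observations per scenario.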

The results are presented in Figure 3. The magnitude of bias increases with increasing magnitude of unmeasured confounding. As the value of the shape parameter α increases, the magnitude of bias increases. In the scenarios with decreasing hazard, the 2SPS method outperforms the 2SRI method. The 2SRI method tends to have larger asymptotic bias when the magnitude of unmeasured confounding is large. In the scenarios with constant hazard, the 2SPS method slightly outperforms the 2SRI method when the magnitude of unmeasured confounding is large. In the scenarios with increasing hazard, both approaches produce larger biases. The 2SRI method performs better when the magnitude of unmeasured confounding is small. When there are always takers, the 2SRI method could be biased even when there is no unmeasured confounding. We also compared the two methods using mean square error and the conclusions remain the same (Figure 4).

Figure 3.


Absolute bias in estimating log causal hazard ratio using two stage IV methods (X-axis is the magnitude of confounding Δ, Y-axis is the absolute bias). For the 2SRI method or the 2SPS method, the biases computed for each of the 1458 possible scenarios were grouped by the magnitude of the shape parameter α (decreasing hazard for α = 0.5, constant hazard for α = 1, and increasing hazard for α = 2) and the magnitude of confounding Δ (larger values represent larger confounding effects and 0 represents no confounding).

6. SEER-Medicare Prostate Cancer Study

Prostate cancer is the most prevalent non-skin malignancy among American men. In 2011, an estimated 2,707,821 men were living with prostate cancer in the United States, and the death rate was 23.0 per 100,000 men per year. Unlike prostate cancers that are diagnosed at an early stage, locally advanced prostate cancer is associated with substantial morbidity and mortality. Radiation therapy is a common treatment for locally advanced prostate cancer. Two randomized trials recently demonstrated that radiation therapy reduces mortality for men with locally advanced tumors who also receive systemic androgen deprivation[9, 10]. However, both trials excluded elderly patients and those with early stage, PSA-screen detected cancer and therefore have limited generalizability, a common criticism of randomized evidence. Therefore, we applied two-stage IV methods to evaluate survival outcomes in locally advanced prostate cancer, assessing survival outcomes of androgen deprivation therapy with or without radiation therapy in comparison to the randomized trials.

We analyzed data from the Surveillance, Epidemiology and End Results (SEER)-Medicare database. The SEER-Medicare database links patient demographic and tumor-specific data collected by SEER cancer registries to Medicare claims for inpatient and outpatient care. We considered patients with prostate cancer diagnosed between January 1, 1995 and December 31, 2007 in SEER with follow-up through December 31, 2010 in Medicare. Patients were excluded if they: 1) were older than age 85; 2) had an unknown urban category; 3) were in hospital referral regions (HRR) with fewer than 50 patients; 4) had an unknown distance to the closest radiation facility; or 5) died within the first 9 months of the study. A total of 31,541 patients were selected and categorized as receiving androgen deprivation with or without radiation therapy.

The cohort was divided into the following three groups: 1) patients with American Joint Committee on Cancer (AJCC) Tumor stage (T-stage) of T2 or T3 and aged 65-75 (called the “RCT Cohort”). The patients in the “RCT Cohort” are most comparable to the patients from the two randomized studies of androgen deprivation with or without radiation therapy[9, 10]; 2) elderly patients under-represented or excluded from the published randomized trials with T-stage T2 or T3, aged 76-85 (called the “Elderly Cohort”); and 3) patients with early stage, PSA-screen detected cancer with T-stage T1 disease who were excluded from the published randomized trials (called the “Screen-Detected Cohort”).

The study by Widmark et al.[9] included men from 47 centers in Europe diagnosed between February, 1996 and December, 2002. 875 patients with locally advanced prostate cancer (T3; 78%; prostate-specific antigen (PSA) ≤ 70 ng/mL; N0; M0) were enrolled. 439 patients were randomly assigned to androgen deprivation alone and the other 436 patients received androgen deprivation with radiation therapy. The study by Warde et al.[10] enrolled 1,205 patients between 1995 and 2005 with locally advanced (T3 or T4) prostate cancer, or organ-confined disease (T2) with either PSA >40 ng/mL or PSA >20 ng/mL and a Gleason score of 8 or higher. These patients were randomly assigned to receive androgen deprivation alone (n=602) or androgen deprivation with radiation therapy (n=603). The hazard ratios for overall mortality reported in [9] and [10] were 0.68 (95% CI 0.52–0.89) and 0.77 (95% CI 0.61–0.98), respectively. For ease of comparison, we combined the results of the randomized trials using weighted-average meta-analysis. The meta-analytic HR was 0.73 (0.61–0.87).
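The pooled estimate can be approximately reproduced from the two published hazard ratios and their confidence intervals. The sketch below assumes a fixed-effect, inverse-variance weighted average on the log hazard ratio scale, with standard errors recovered from the 95% CI widths. The text states only that a weighted-average meta-analysis was used, so the weighting scheme and the function name `pooled_hr` are assumptions.

```python
import math

def pooled_hr(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of hazard ratios on the log scale.

    estimates: iterable of (hr, ci_lower, ci_upper) with 95% confidence limits.
    Returns (pooled_hr, pooled_lower, pooled_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lower, upper in estimates:
        # Standard error of log(HR) recovered from the CI width (assumption).
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(hr)
        total_weight += weight
    log_hr = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se_pooled),
        math.exp(log_hr + z * se_pooled),
    )

# Hazard ratios and 95% CIs quoted in the text for trials [9] and [10].
trials = [(0.68, 0.52, 0.89), (0.77, 0.61, 0.98)]
hr, lower, upper = pooled_hr(trials)
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the pooled values round to 0.73 (0.61, 0.87), which agrees with the meta-analytic HR quoted above.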

To assess the effectiveness of androgen deprivation with or without radiation therapy in reducing overall mortality (death from any cause), we performed two-stage IV Weibull regression analysis (2SPS and 2SRI) using a local area treatment rate instrument and controlling for the propensity score. The local area treatment rate instrument was defined as the proportion of patients who received definitive treatment (surgery or radiation therapy) among all patients with prostate cancer in the hospital referral region (HRR), and we dichotomized this instrument at its median. This IV measures the aggressiveness of local area treatment and captures regionally distinct structural care variation not fully explained by patient characteristics. The IV was strongly associated with treatment assignment and balanced important prognostic factors [3]. The propensity score model included the following potential confounding variables: age, race, ethnicity, clinical T stage, N stage, World Health Organization tumor grade, 17 categories of co-morbid disease, urban residence, and census tract median income.

As shown in Table 2, there is variability in the estimated HRs obtained from the 2SPS and 2SRI methods. We estimated the shape parameter α ≈ 1.6 from the data. Using Figure 3, we can see that the bias for both the 2SPS and 2SRI methods is the largest when we have an increasing hazard (α > 1), even when the magnitude of unmeasured confounding is relatively small. When the hazard function is decreasing (α < 1), the 2SPS method produces more stable and less biased estimates than the 2SRI method. In this case, 2SPS may be a more appropriate approach to use. In the RCT Cohort, the estimated HRs from both IV methods (0.96 for 2SRI and 0.97 for 2SPS) are much larger than the meta-analytic HR from the two randomized studies. Note that the confidence intervals are also much wider in both IV analyses than in the original RCTs. In the published RCTs, the authors concluded that there was a statistically significant treatment effect (combined therapy is better), whereas from our IV analysis we cannot draw this conclusion. In the total study sample and separately in the RCT Cohort and the Screen-Detected Cohort, the two IV estimates are quite similar. However, for the Elderly Cohort, the estimate from the 2SPS method is different from the estimate from the 2SRI method.

Table 2.

Bias in estimating causal hazard ratio parameter for prostate cancer study

Outcome Group IV2sri IV2sps
All cause mortality Total (n=31541) 0.57(0.17-1.06) 0.59(0.19-1.09)
RCT Cohort (n=12924) 0.96(0.18-5.81) 0.97(0.18-5.94)
Elderly Cohort (n=14340) 0.74(0.20-1.83) 0.96(0.26-2.35)
Screen-Detected Cohort (n=4277) 0.34(0.02-2.99) 0.35(0.03-3.22)

7. Discussion

Many clinical and health services studies are using health care databases to compare the treatment effectiveness for drug and surgical therapies, but are prone to unmeasured confounding. Two stage IV methods have been gaining popularity among clinical researchers because these methods provide a relatively simple approach to analyzing survival outcome studies in the presence of unmeasured confounding. However, current knowledge about potential bias in estimating the log causal hazard ratio is limited. As demonstrated in our prostate cancer study, the large treatment effects estimated from two stage IV methods could be attributable to potential bias. We have derived closed-form expressions for the asymptotic bias of the 2SRI and 2SPS approaches assuming the survival times follow a Weibull distribution with shape parameter α and scale parameter K. We have demonstrated that these analytic results are consistent with our simulation results.

For binary outcomes, two previous studies[5, 18] demonstrated that the bias in the treatment effect estimated using the 2SRI approach increases as the magnitude of confounding increases. In this current work, we have shown analytically and by simulation that the 2SRI and 2SPS approaches are both biased in estimating the causal hazard ratio among compliers. In some situations when the hazard is decreasing (e.g., among patients who have recently received a kidney transplant), the 2SPS method is less biased than the 2SRI method and could be a more appropriate method to use. When the hazard is an increasing function, both IV methods may produce very large bias even under a moderate amount of unmeasured confounding. In this case, we recommend exercising caution when interpreting results from two-stage IV survival models.

We have shown that even when all IV assumptions are met, both the 2SRI and the 2SPS methods could fail to consistently estimate the causal hazard ratio among compliers. Our analytic results for bias may help to guide researchers in deciding when the bias is likely to be small enough that two stage IV methods may be reasonably applied. Furthermore, in a sensitivity analysis approach, one may estimate from the data the shape parameter and the censoring proportion among patients assigned to treatment or control. With the shape parameter and censoring proportions fixed at these estimates, the level of unmeasured confounding could be varied to examine how the estimates would change, as shown in Figures 1 and 2. Alternative methods include partial likelihood estimation [19].

Figure 4.


Mean square error in estimating log causal hazard ratio using two stage IV methods (X-axis is the magnitude of confounding Δ, Y-axis is the mean square error). For the 2SRI method or the 2SPS method, the mean square errors computed for each of the 1458 possible scenarios were grouped by the magnitude of the shape parameter α (decreasing hazard for α = 0.5, constant hazard for α = 1, and increasing hazard for α = 2) and the magnitude of confounding Δ (larger values represent larger confounding effects and 0 represents no confounding).

Appendix

Appendix A: Mixture of Weibull Distributions

Prove the distribution function of observed survival time T conditional on random assignment R can be expressed as the following equations:

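$$F(T \mid R=0) = 1 - \left( e^{-(t/\theta_{AT1})^{\alpha}}\rho_A + e^{-(t/\theta_{NT0})^{\alpha}}\rho_N + e^{-(t/\theta_{C0})^{\alpha}}\rho_C \right) \quad \text{(A.1)}$$

and,

$$F(T \mid R=1) = 1 - \left( e^{-(t/\theta_{C1})^{\alpha}}\rho_C + e^{-(t/\theta_{NT0})^{\alpha}}\rho_N + e^{-(t/\theta_{AT1})^{\alpha}}\rho_A \right) \quad \text{(A.2)}$$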

In the above equations, AT represents always takers, C represents compliers, and NT represents never takers. Other definitions of parameters and distributions that are used in the proof are given below:

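$$R = \begin{cases} 1 & \text{if assigned to treatment} \\ 0 & \text{if assigned to control} \end{cases} \qquad Z = \begin{cases} 1 & \text{if receives treatment} \\ 0 & \text{if receives control} \end{cases}$$

$$\rho_r = P(R=1), \quad \rho_A = P(AT), \quad \rho_C = P(C), \quad \rho_N = 1 - \rho_A - \rho_C$$

$$T^{(1)} = \text{potential outcome for a patient under treatment}, \qquad T^{(0)} = \text{potential outcome for a patient under control}$$

$$T^{(1)}_{AT} \sim \text{Weibull}(\alpha, \theta_{AT1}), \quad T^{(1)}_{C} \sim \text{Weibull}(\alpha, \theta_{C1}), \quad T^{(1)}_{NT} \sim \text{Weibull}(\alpha, \theta_{NT1})$$

$$T^{(0)}_{AT} \sim \text{Weibull}(\alpha, \theta_{AT0}), \quad T^{(0)}_{C} \sim \text{Weibull}(\alpha, \theta_{C0}), \quad T^{(0)}_{NT} \sim \text{Weibull}(\alpha, \theta_{NT0})$$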

no defiers under monotonicity assumption

Proof

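$$\begin{aligned}
F(T^{(1)} \mid Z=1, R=1) &= P(T^{(1)} \le t \mid Z=1, R=1) = \frac{P(T^{(1)} \le t, Z=1, R=1)}{P(Z=1, R=1)} \\
&= \frac{P(T^{(1)} \le t, AT, R=1) + P(T^{(1)} \le t, C, R=1)}{P(AT, R=1) + P(C, R=1)} \\
&= \frac{P(T^{(1)} \le t, AT)P(R=1) + P(T^{(1)} \le t, C)P(R=1)}{(P(AT) + P(C))P(R=1)} \qquad \left(R \perp (T^{(1)}, T^{(0)}),\ R \perp \text{principal strata}\right) \\
&= \frac{P(T^{(1)} \le t \mid AT)P(AT) + P(T^{(1)} \le t \mid C)P(C)}{P(AT) + P(C)} \\
&= \left(1 - e^{-(t/\theta_{AT1})^{\alpha}}\right)\frac{P(AT)}{P(AT)+P(C)} + \left(1 - e^{-(t/\theta_{C1})^{\alpha}}\right)\frac{P(C)}{P(AT)+P(C)}
\end{aligned}$$

$$\begin{aligned}
F(T^{(0)} \mid Z=0, R=1) &= P(T^{(0)} \le t \mid Z=0, R=1) = \frac{P(T^{(0)} \le t, Z=0, R=1)}{P(Z=0, R=1)} \\
&= \frac{P(T^{(0)} \le t, NT, R=1)}{P(NT)P(R=1)} = P(T^{(0)} \le t \mid NT) \qquad \left(R \perp (T^{(1)}, T^{(0)}),\ R \perp \text{principal strata}\right) \\
&= 1 - e^{-(t/\theta_{NT0})^{\alpha}}
\end{aligned}$$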

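F(T | R = 1) can be expressed as:

$$\begin{aligned}
F(T \mid R=1) &= P(T \le t, Z=1 \mid R=1) + P(T \le t, Z=0 \mid R=1) \\
&= P(T \le t \mid Z=1, R=1)P(Z=1 \mid R=1) + P(T \le t \mid Z=0, R=1)P(Z=0 \mid R=1) \\
&= P(T^{(1)} \le t \mid Z=1, R=1)P(Z=1 \mid R=1) + P(T^{(0)} \le t \mid Z=0, R=1)P(Z=0 \mid R=1) \\
&= \left( \left(1 - e^{-(t/\theta_{AT1})^{\alpha}}\right)\frac{P(AT)}{P(AT)+P(C)} + \left(1 - e^{-(t/\theta_{C1})^{\alpha}}\right)\frac{P(C)}{P(AT)+P(C)} \right)\left(P(AT)+P(C)\right) + \left(1 - e^{-(t/\theta_{NT0})^{\alpha}}\right)P(NT) \\
&= 1 - \left( e^{-(t/\theta_{C1})^{\alpha}}\rho_C + e^{-(t/\theta_{NT0})^{\alpha}}\rho_N + e^{-(t/\theta_{AT1})^{\alpha}}\rho_A \right)
\end{aligned}$$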
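$$\begin{aligned}
F(T^{(1)} \mid Z=1, R=0) &= P(T^{(1)} \le t \mid Z=1, R=0) = \frac{P(T^{(1)} \le t, Z=1, R=0)}{P(Z=1, R=0)} = \frac{P(T^{(1)} \le t, AT, R=0)}{P(AT, R=0)} \\
&= \frac{P(T^{(1)} \le t \mid AT)P(AT)P(R=0)}{P(AT)P(R=0)} = P(T^{(1)} \le t \mid AT) = 1 - e^{-(t/\theta_{AT1})^{\alpha}}
\end{aligned}$$

$$\begin{aligned}
F(T^{(0)} \mid Z=0, R=0) &= P(T^{(0)} \le t \mid Z=0, R=0) = \frac{P(T^{(0)} \le t, Z=0, R=0)}{P(Z=0, R=0)} \\
&= \frac{P(T^{(0)} \le t, NT, R=0) + P(T^{(0)} \le t, C, R=0)}{P(NT, R=0) + P(C, R=0)} = \frac{P(T^{(0)} \le t \mid NT)P(NT) + P(T^{(0)} \le t \mid C)P(C)}{P(NT) + P(C)} \\
&= \left(1 - e^{-(t/\theta_{NT0})^{\alpha}}\right)\frac{P(NT)}{P(NT)+P(C)} + \left(1 - e^{-(t/\theta_{C0})^{\alpha}}\right)\frac{P(C)}{P(NT)+P(C)}
\end{aligned}$$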

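F(T | R = 0) can be expressed as:

$$\begin{aligned}
F(T \mid R=0) &= P(T \le t, Z=1 \mid R=0) + P(T \le t, Z=0 \mid R=0) \\
&= P(T \le t \mid Z=1, R=0)P(Z=1 \mid R=0) + P(T \le t \mid Z=0, R=0)P(Z=0 \mid R=0) \\
&= P(T^{(1)} \le t \mid Z=1, R=0)P(Z=1 \mid R=0) + P(T^{(0)} \le t \mid Z=0, R=0)P(Z=0 \mid R=0) \\
&= \left(1 - e^{-(t/\theta_{AT1})^{\alpha}}\right)P(AT) + \left[ \left(1 - e^{-(t/\theta_{NT0})^{\alpha}}\right)\frac{P(NT)}{P(NT)+P(C)} + \left(1 - e^{-(t/\theta_{C0})^{\alpha}}\right)\frac{P(C)}{P(NT)+P(C)} \right]\left(P(C)+P(NT)\right) \\
&= 1 - \left( e^{-(t/\theta_{AT1})^{\alpha}}\rho_A + e^{-(t/\theta_{NT0})^{\alpha}}\rho_N + e^{-(t/\theta_{C0})^{\alpha}}\rho_C \right)
\end{aligned}$$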
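The two mixture distribution functions (A.1) and (A.2) can be evaluated numerically. The following Python sketch is illustrative only: the function names and the parameter values for ρA, θAT1, and θNT0 are hypothetical, while θC1 = 3.33 and θC0 = 1.67 are borrowed from the Table 1 settings.

```python
import math

def weibull_survival(t, alpha, theta):
    """Survival function exp(-(t/theta)^alpha) of a Weibull(alpha, theta) time."""
    return math.exp(-((t / theta) ** alpha))

def observed_cdf(t, r, alpha, rho_a, rho_c,
                 theta_at1, theta_nt0, theta_c0, theta_c1):
    """F(T | R = r) as the three-component mixture in (A.1) (r=0) or (A.2) (r=1)."""
    rho_n = 1.0 - rho_a - rho_c
    # Compliers follow the treatment scale when assigned to treatment,
    # and the control scale otherwise; the other two strata do not change.
    theta_c = theta_c1 if r == 1 else theta_c0
    survival = (
        rho_a * weibull_survival(t, alpha, theta_at1)
        + rho_n * weibull_survival(t, alpha, theta_nt0)
        + rho_c * weibull_survival(t, alpha, theta_c)
    )
    return 1.0 - survival

# Hypothetical parameter values for illustration.
params = dict(alpha=1.5, rho_a=0.2, rho_c=0.5,
              theta_at1=2.0, theta_nt0=1.0, theta_c0=1.67, theta_c1=3.33)
f0 = observed_cdf(1.0, 0, **params)
f1 = observed_cdf(1.0, 1, **params)
print(f0, f1)
```

Because always takers and never takers contribute identically to both arms, the difference between the two arms is driven entirely by the complier component, which is what the IV analysis exploits.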

Appendix B: Proofs related with Derivation of Closed Form Solution

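  1. Assume survival time T ~ Weibull(α, K) and censoring time L ~ Weibull(α, λ). Let Y = min(T, L) and δ = I(T ≤ L). Show that
    $$Y \sim \text{Weibull}\left(\alpha, \left(\frac{1}{\lambda^{\alpha}} + \frac{1}{K^{\alpha}}\right)^{-1/\alpha}\right)$$
    and,
    $$P(\delta=1) = \frac{1}{1 + K^{\alpha}/\lambda^{\alpha}} \quad \text{(B.1)}$$
    Proof:
    $$\begin{aligned} P(Y \ge y) &= P(\min(T, L) \ge y) = P(T \ge y, L \ge y) \\ &= \int_{y}^{+\infty} \frac{\alpha t^{\alpha-1}}{K^{\alpha}} \exp\left(-\left(\frac{t}{K}\right)^{\alpha}\right) dt \int_{y}^{+\infty} \frac{\alpha l^{\alpha-1}}{\lambda^{\alpha}} \exp\left(-\left(\frac{l}{\lambda}\right)^{\alpha}\right) dl \\ &= \exp\left(-\left(\frac{y}{K}\right)^{\alpha}\right)\exp\left(-\left(\frac{y}{\lambda}\right)^{\alpha}\right) = \exp\left(-\left(\frac{y}{\left(\frac{1}{\lambda^{\alpha}} + \frac{1}{K^{\alpha}}\right)^{-1/\alpha}}\right)^{\alpha}\right) \end{aligned}$$
    Thus, $Y \sim \text{Weibull}\left(\alpha, \left(\frac{1}{\lambda^{\alpha}} + \frac{1}{K^{\alpha}}\right)^{-1/\alpha}\right)$.
    $$\begin{aligned} P(\delta=1) &= P(0 \le T \le L,\ 0 \le L) = \int_{0}^{+\infty} \frac{\alpha l^{\alpha-1}}{\lambda^{\alpha}} \exp\left(-\left(\frac{l}{\lambda}\right)^{\alpha}\right) \int_{0}^{l} \frac{\alpha t^{\alpha-1}}{K^{\alpha}} \exp\left(-\left(\frac{t}{K}\right)^{\alpha}\right) dt\, dl \\ &= \int_{0}^{+\infty} \frac{\alpha l^{\alpha-1}}{\lambda^{\alpha}} \exp\left(-\left(\frac{l}{\lambda}\right)^{\alpha}\right)\left[1 - \exp\left(-\left(\frac{l}{K}\right)^{\alpha}\right)\right] dl \\ &= 1 - \int_{0}^{+\infty} \frac{\alpha l^{\alpha-1}}{\lambda^{\alpha}} \exp\left(-\left(\frac{l}{\lambda}\right)^{\alpha}\right)\exp\left(-\left(\frac{l}{K}\right)^{\alpha}\right) dl \\ &= 1 - \frac{1}{1 + (\lambda/K)^{\alpha}} = \frac{1}{1 + (K/\lambda)^{\alpha}} \end{aligned}$$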
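  2. Assume survival time T is a mixture of three Weibull distributions with density $f(t) = \sum_{i=1}^{3} p_i f(t_i)$, where T1 ~ Weibull(α, K1), T2 ~ Weibull(α, K2), and T3 ~ Weibull(α, K3). The weights are p1, p2, p3 with $\sum_{i=1}^{3} p_i = 1$. The censoring time L ~ Weibull(α, λ). Let Y = min(T, L) and δ = I(T ≤ L). Show that
    $$P(\delta=1) = p_1\frac{1}{1 + K_1^{\alpha}/\lambda^{\alpha}} + p_2\frac{1}{1 + K_2^{\alpha}/\lambda^{\alpha}} + p_3\frac{1}{1 + K_3^{\alpha}/\lambda^{\alpha}} \quad \text{(B.2)}$$
    Proof (G denotes the mixture component to which T belongs):
    $$\begin{aligned} P(\delta=1) &= P(\delta=1, G=1) + P(\delta=1, G=2) + P(\delta=1, G=3) \\ &= P(\delta=1 \mid G=1)P(G=1) + P(\delta=1 \mid G=2)P(G=2) + P(\delta=1 \mid G=3)P(G=3) \\ &= p_1\frac{1}{1 + K_1^{\alpha}/\lambda^{\alpha}} + p_2\frac{1}{1 + K_2^{\alpha}/\lambda^{\alpha}} + p_3\frac{1}{1 + K_3^{\alpha}/\lambda^{\alpha}} \end{aligned}$$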
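  3. Given X follows a Weibull distribution (α*, K). Show that
    $$E(X^{\alpha}) = \Gamma\left(\frac{\alpha}{\alpha^{*}} + 1\right)K^{\alpha} \quad \text{(B.3)}$$
    Proof:
    $$\begin{aligned} E(X^{\alpha}) &= \int_{0}^{\infty} x^{\alpha}\frac{\alpha^{*}}{K^{\alpha^{*}}}x^{\alpha^{*}-1}e^{-(x/K)^{\alpha^{*}}}dx = \int_{0}^{\infty} y^{\alpha/\alpha^{*}}\frac{1}{K^{\alpha^{*}}}e^{-y/K^{\alpha^{*}}}dy \qquad (\text{let } y = x^{\alpha^{*}}) \\ &= \frac{1}{K^{\alpha^{*}}}\int_{0}^{\infty} y^{(\alpha/\alpha^{*}+1)-1}e^{-y/K^{\alpha^{*}}}dy \\ &= \frac{1}{K^{\alpha^{*}}}\left(K^{\alpha^{*}}\right)^{\alpha/\alpha^{*}+1}\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)\int_{0}^{\infty}\frac{1}{\left(K^{\alpha^{*}}\right)^{\alpha/\alpha^{*}+1}\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)}y^{(\alpha/\alpha^{*}+1)-1}e^{-y/K^{\alpha^{*}}}dy \\ &= \Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)K^{\alpha} \end{aligned}$$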
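  4. Given X follows a Weibull distribution (α*, K). Show that
    $$E(\log(X)) = -\frac{\gamma}{\alpha^{*}} + \log(K) \quad \text{(B.4)}$$
    Proof (γ is the Euler–Mascheroni constant):
    $$\begin{aligned} E(\log(X)) &= \int_{0}^{\infty}\log(x)\frac{\alpha^{*}}{K^{\alpha^{*}}}x^{\alpha^{*}-1}e^{-(x/K)^{\alpha^{*}}}dx = \frac{1}{\alpha^{*}}\int_{0}^{\infty}\log(y)\frac{1}{K^{\alpha^{*}}}e^{-y/K^{\alpha^{*}}}dy \qquad (\text{let } y = x^{\alpha^{*}}) \\ &= \frac{1}{\alpha^{*}}\int_{0}^{\infty}\left(\log(u) + \alpha^{*}\log(K)\right)e^{-u}du \qquad (\text{let } y = uK^{\alpha^{*}}) \\ &= \frac{1}{\alpha^{*}}\underbrace{\int_{0}^{\infty}\log(u)e^{-u}du}_{-\gamma} + \log(K)\int_{0}^{\infty}e^{-u}du \\ &= -\frac{\gamma}{\alpha^{*}} + \log(K) \end{aligned}$$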
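  5. Given X follows a Weibull distribution (α*, K). Show that
    $$E(X^{\alpha}\log(X)) = \frac{1}{\alpha^{*}}\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)K^{\alpha}\left(\psi\left(\frac{\alpha}{\alpha^{*}}+1\right) + \alpha^{*}\log(K)\right) \quad \text{(B.5)}$$
    Proof (ψ(·) is the digamma function):
    $$\begin{aligned} E(X^{\alpha}\log(X)) &= \int_{0}^{\infty}x^{\alpha}\log(x)\frac{\alpha^{*}}{K^{\alpha^{*}}}x^{\alpha^{*}-1}e^{-(x/K)^{\alpha^{*}}}dx = \int_{0}^{\infty}y^{\alpha/\alpha^{*}}\frac{1}{\alpha^{*}}\log(y)\frac{1}{K^{\alpha^{*}}}e^{-y/K^{\alpha^{*}}}dy \qquad (\text{let } y = x^{\alpha^{*}}) \\ &= \frac{1}{\alpha^{*}}\frac{1}{K^{\alpha^{*}}}\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)\left(K^{\alpha^{*}}\right)^{\alpha/\alpha^{*}+1}\underbrace{\int_{0}^{\infty}\log(y)\frac{1}{\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)\left(K^{\alpha^{*}}\right)^{\alpha/\alpha^{*}+1}}y^{(\alpha/\alpha^{*}+1)-1}e^{-y/K^{\alpha^{*}}}dy}_{E(\log(y)),\ y \sim \text{gamma}\left(\frac{\alpha}{\alpha^{*}}+1,\ K^{\alpha^{*}}\right)} \\ &= \frac{1}{\alpha^{*}}\frac{1}{K^{\alpha^{*}}}\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)\left(K^{\alpha^{*}}\right)^{\alpha/\alpha^{*}+1}\left(\psi\left(\frac{\alpha}{\alpha^{*}}+1\right) + \alpha^{*}\log(K)\right) \\ &= \frac{1}{\alpha^{*}}\Gamma\left(\frac{\alpha}{\alpha^{*}}+1\right)K^{\alpha}\left(\psi\left(\frac{\alpha}{\alpha^{*}}+1\right) + \alpha^{*}\log(K)\right) \end{aligned}$$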
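  6. Let Ti denote the survival time and Ci denote the censoring time for subject i. Ti and Ci are independent, with Ti ~ Weibull(α, K) and Ci ~ Weibull(α, λ). Let Yi = min(Ti, Ci) denote the observed follow-up time and δi be the indicator variable δi = I(Ti ≤ Ci). Show that:
    $$E(Y_i\delta_i) = E(Y_i)E(\delta_i) \quad \text{(B.6)}$$
    Proof:
    $$\begin{aligned} E(Y_i\delta_i) &= E\left(Y_i\delta_i\left(I(\delta_i=1) + I(\delta_i=0)\right)\right) = E\left(Y_i\delta_i I(\delta_i=1)\right) + E\left(Y_i\delta_i I(\delta_i=0)\right) = E\left(T_i I(\delta_i=1)\right) \\ &= \int_{0}^{\infty}\int_{0}^{\infty} t\, I(\delta_i=1) f(t,c)\, dt\, dc = \int_{0}^{\infty}\int_{0}^{\infty} t\, I(t \le c) f_t(t) f_c(c)\, dt\, dc \\ &= \int_{0}^{\infty}\left\{\int_{0}^{\infty} I(t \le c) f_c(c)\, dc\right\} t f_t(t)\, dt = \int_{0}^{\infty} S_c(t)\, t f_t(t)\, dt \\ &= \int_{0}^{\infty} t \exp\left(-\frac{t^{\alpha}}{\lambda^{\alpha}}\right)\frac{\alpha}{K^{\alpha}}t^{\alpha-1}\exp\left(-\frac{t^{\alpha}}{K^{\alpha}}\right) dt \end{aligned}$$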

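Let $K^{*} = \left(\frac{1}{\lambda^{\alpha}} + \frac{1}{K^{\alpha}}\right)^{-1/\alpha}$ and use (B.1):

$$E(Y_i)E(\delta_i) = \left(\frac{1}{1 + K^{\alpha}/\lambda^{\alpha}}\right)\int_{0}^{\infty} y\frac{\alpha}{(K^{*})^{\alpha}}y^{\alpha-1}\exp\left(-\frac{y^{\alpha}}{(K^{*})^{\alpha}}\right)dy = \int_{0}^{\infty}\frac{\alpha}{K^{\alpha}}y^{\alpha-1}\exp\left(-\frac{y^{\alpha}}{K^{\alpha}}\right) y \exp\left(-\frac{y^{\alpha}}{\lambda^{\alpha}}\right)dy$$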

Both E(Yi δi) and E(Yi) E (δi) have the same integral functions. Thus,

E(Yiδi)=E(Yi)E(δi)

Similarly, we can establish the following:

E(g(Yi)δi)=E(g(Yi))E(δi)
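Results (B.1), (B.3), and (B.6) lend themselves to a quick Monte Carlo sanity check. The sketch below uses only the Python standard library; the parameter values, seed, sample size, and function name `simulate` are arbitrary illustrative choices rather than settings from the paper. Note that `random.weibullvariate` takes the scale first and the shape second.

```python
import math
import random

def simulate(alpha, K, lam, n, seed=0):
    """Draw n independent (Y, delta) pairs with T ~ Weibull(alpha, K), L ~ Weibull(alpha, lam)."""
    rng = random.Random(seed)
    ys, ds = [], []
    for _ in range(n):
        t = rng.weibullvariate(K, alpha)    # survival time (scale K, shape alpha)
        c = rng.weibullvariate(lam, alpha)  # censoring time (scale lam, shape alpha)
        ys.append(min(t, c))
        ds.append(1 if t <= c else 0)
    return ys, ds

alpha, K, lam = 1.5, 2.0, 3.0  # hypothetical values
ys, ds = simulate(alpha, K, lam, n=200_000)
n = len(ys)

# (B.1): event probability and scale of the observed time Y.
p_event_mc = sum(ds) / n
p_event_b1 = 1.0 / (1.0 + (K / lam) ** alpha)
k_star = (lam ** -alpha + K ** -alpha) ** (-1.0 / alpha)

# (B.3) applied to Y ~ Weibull(alpha, k_star) with exponent 1: E(Y) = Gamma(1/alpha + 1) * k_star.
mean_y_mc = sum(ys) / n
mean_y_closed = math.gamma(1.0 / alpha + 1.0) * k_star

# (B.6): E(Y * delta) should match E(Y) * E(delta).
lhs_b6 = sum(y * d for y, d in zip(ys, ds)) / n
rhs_b6 = mean_y_mc * p_event_mc

print(p_event_mc, p_event_b1)
print(mean_y_mc, mean_y_closed)
print(lhs_b6, rhs_b6)
```

The factorization in (B.6) relies on the survival and censoring times sharing the same shape parameter; with different shapes the two printed values in the last line would generally not agree.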

Appendix C: Derivation of probability limits of M.L.E of α, K0, K1 for 2SPS

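Let Y = min(T, C) be the observed follow-up time and δ = I(T ≤ C) be the event indicator. The subjects are assigned to either the treatment group (R = 1) or the control group (R = 0). The distribution of each subgroup has a different scale parameter K but the same shape parameter α*. Thus, the likelihood function of the observed follow-up time Y can be written as:

$$L(y) = \prod_{i \in \{R=1\}}^{n_{R1}}\left[\frac{\alpha^{*}}{K_1}\left(\frac{y_i}{K_1}\right)^{\alpha^{*}-1}\right]^{\delta_i}\exp\left(-\left(\frac{y_i}{K_1}\right)^{\alpha^{*}}\right) \times \prod_{i \in \{R=0\}}^{n_{R0}}\left[\frac{\alpha^{*}}{K_0}\left(\frac{y_i}{K_0}\right)^{\alpha^{*}-1}\right]^{\delta_i}\exp\left(-\left(\frac{y_i}{K_0}\right)^{\alpha^{*}}\right)$$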

For treatment assignment group and control assignment group, subjects are from compliers (c), never takers (nt), and always takers (at). Let nR1, nR0 denote number of subjects assigned to treatment (R = 1) and control (R = 0). Let nR1, at, nR1, nt, nR1, c denote number of always takers, never takers, and compliers that are assigned to treatment group. nR1, at + nR1, nt + nR1, c = nR1. Let nR0, at, nR0, nt, nR0, c denote number of always takers, never takers, and compliers, who are assigned to control group.nR0,at + nR0, nt + nR0, c = nR0. Therefore, the likelihood can be rewritten as:

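$$L(y) = \prod_{s \in \{at,\, c,\, nt\}}\ \prod_{i \in \{R=1, s\}}^{n_{R1,s}}\left[\frac{\alpha^{*}}{K_1}\left(\frac{y_i}{K_1}\right)^{\alpha^{*}-1}\right]^{\delta_i}\exp\left(-\left(\frac{y_i}{K_1}\right)^{\alpha^{*}}\right) \times \prod_{s \in \{at,\, c,\, nt\}}\ \prod_{i \in \{R=0, s\}}^{n_{R0,s}}\left[\frac{\alpha^{*}}{K_0}\left(\frac{y_i}{K_0}\right)^{\alpha^{*}-1}\right]^{\delta_i}\exp\left(-\left(\frac{y_i}{K_0}\right)^{\alpha^{*}}\right)$$

where the outer products run over always takers (at), compliers (c), and never takers (nt), giving six factors in total.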

Next, the log likelihood function is:

$$\begin{aligned} l(y) ={}& \sum_{s \in \{at,\, c,\, nt\}}\ \sum_{i \in \{R=1, s\}}^{n_{R1,s}}\delta_i\left\{\log(\alpha^{*}) - \log(K_1) + (\alpha^{*}-1)\left(\log(y_i) - \log(K_1)\right)\right\} - \sum_{i \in \{R=1\}}^{n_{R1}}\left(\frac{y_i}{K_1}\right)^{\alpha^{*}} \\ &+ \sum_{s \in \{at,\, c,\, nt\}}\ \sum_{i \in \{R=0, s\}}^{n_{R0,s}}\delta_i\left\{\log(\alpha^{*}) - \log(K_0) + (\alpha^{*}-1)\left(\log(y_i) - \log(K_0)\right)\right\} - \sum_{i \in \{R=0\}}^{n_{R0}}\left(\frac{y_i}{K_0}\right)^{\alpha^{*}} \end{aligned}$$

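To derive the M.L.E. of K0 and K1, take the first derivative of l(y) with respect to K0 and K1 and set the score equations to 0, which gives

$$\hat{K}_0 = \left[\frac{\sum_{i \in \{R=0\}}^{n_{R0}} y_i^{\alpha^{*}}}{\sum_{i \in \{R=0\}}^{n_{R0}}\delta_i}\right]^{1/\alpha^{*}} \quad \text{(C.1)}$$

and,

$$\hat{K}_1 = \left[\frac{\sum_{i \in \{R=1\}}^{n_{R1}} y_i^{\alpha^{*}}}{\sum_{i \in \{R=1\}}^{n_{R1}}\delta_i}\right]^{1/\alpha^{*}} \quad \text{(C.2)}$$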
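The closed-form scale estimators (C.1) and (C.2) are straightforward to compute once a shape value is given. The sketch below is a simplified illustration, not the authors' implementation: it simulates a single (non-mixture) Weibull arm and plugs in the true shape instead of solving the score equation for α*, so the estimate should be close to the true scale. The function name `k_hat` and all parameter values are hypothetical.

```python
import random

def k_hat(ys, ds, a):
    """Scale estimate [sum(y_i^a) / sum(delta_i)]^(1/a) for a fixed shape value a."""
    return (sum(y ** a for y in ys) / sum(ds)) ** (1.0 / a)

rng = random.Random(1)
alpha, K, lam = 1.5, 2.0, 3.0  # hypothetical shape, survival scale, censoring scale

ys, ds = [], []
for _ in range(100_000):
    t = rng.weibullvariate(K, alpha)    # survival time (scale first, shape second)
    c = rng.weibullvariate(lam, alpha)  # censoring time
    ys.append(min(t, c))
    ds.append(1 if t <= c else 0)

estimate = k_hat(ys, ds, alpha)
print(estimate)  # should be close to K = 2.0 in this single-component setting
```

In the paper's setting each assignment arm is a mixture of principal strata, so the same formula converges to a weighted combination of stratum-specific quantities and not to a single true scale; that limit is what the remainder of this appendix derives.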

To derive the M.L.E of α*, take the first derivative of l(y) with respect to α* and set score equation to 0 and replace K1, K0 with the expressions (C.1) and (C.2), we have

$$\begin{aligned} 0 &= \sum_{r \in \{0,1\}}\left[\frac{\sum_{i \in \{R=r\}}\delta_i}{\alpha^{*}} + \sum_{s \in \{at,\, c,\, nt\}}\ \sum_{i \in \{R=r, s\}}\delta_i\log(y_i) - \left\{\sum_{i \in \{R=r\}}\delta_i\right\}\frac{\sum_{i \in \{R=r\}} y_i^{\alpha^{*}}\log(y_i)}{\sum_{i \in \{R=r\}} y_i^{\alpha^{*}}}\right] \\ &= \sum_{r \in \{0,1\}}\left[\frac{\sum_{i \in \{R=r\}}\delta_i}{\alpha^{*}} + \sum_{s \in \{at,\, c,\, nt\}}\ \sum_{i \in \{R=r, s\}}\delta_i\log(y_i) - \left\{\sum_{i \in \{R=r\}}\delta_i\right\}\frac{\sum_{s}\sum_{i \in \{R=r, s\}} y_i^{\alpha^{*}}\log(y_i)}{\sum_{s}\sum_{i \in \{R=r, s\}} y_i^{\alpha^{*}}}\right] \end{aligned}$$

Here the sums over i ∈ {R = r} run over the $n_{Rr}$ subjects in arm r, and the sums over i ∈ {R = r, s} run over the $n_{Rr,s}$ subjects of stratum s in that arm.

The M.L.E. α̂* is the solution to the above equation. Next, dividing both sides by the total number of subjects n, we have

$$0 = \sum_{r \in \{0,1\}}\left[\frac{\sum_{i \in \{R=r\}}\delta_i / n_{Rr}}{\alpha^{*}}\cdot\frac{n_{Rr}}{n} + \frac{\sum_{s}\sum_{i \in \{R=r, s\}}\delta_i\log(y_i)}{n_{Rr}}\cdot\frac{n_{Rr}}{n} - \frac{n_{Rr}}{n}\left\{\frac{\sum_{i \in \{R=r\}}\delta_i}{n_{Rr}}\right\}\frac{\left\{\sum_{s}\sum_{i \in \{R=r, s\}} y_i^{\alpha^{*}}\log(y_i)\right\}/n_{Rr}}{\left\{\sum_{s}\sum_{i \in \{R=r, s\}} y_i^{\alpha^{*}}\right\}/n_{Rr}}\right]$$

where s ranges over the strata at, c, and nt, and $n_{Rr}$ denotes $n_{R0}$ or $n_{R1}$.

As nR1, nR0, nR1, at, nR1, nt, nR1, c, nR0, at, nR0, nt, nR0, c → ∞, the score equation converges to the following:

$$\begin{aligned} 0 = \sum_{r \in \{0,1\}}\Bigg[ & \frac{P(\delta=1 \mid R=r)P(R=r)}{\alpha^{*}} + P(R=r)\left\{P(AT)E(\delta\log(Y) \mid at, R=r) + P(C)E(\delta\log(Y) \mid c, R=r) + P(NT)E(\delta\log(Y) \mid nt, R=r)\right\} \\ & - P(\delta=1 \mid R=r)P(R=r) \times \frac{P(AT)E(Y^{\alpha^{*}}\log(Y) \mid at, R=r) + P(C)E(Y^{\alpha^{*}}\log(Y) \mid c, R=r) + P(NT)E(Y^{\alpha^{*}}\log(Y) \mid nt, R=r)}{P(AT)E(Y^{\alpha^{*}} \mid at, R=r) + P(C)E(Y^{\alpha^{*}} \mid c, R=r) + P(NT)E(Y^{\alpha^{*}} \mid nt, R=r)}\Bigg] \quad \text{(C.3)} \end{aligned}$$

Using the results from Appendix B, we can derive the following. For stratum s ∈ {at, nt, c} and arm r ∈ {0, 1}, let $\theta_{s,r}$ denote the Weibull scale of the observed survival time: $\theta_{at,0} = \theta_{at,1} = \theta_{at1}$, $\theta_{nt,0} = \theta_{nt,1} = \theta_{nt0}$, $\theta_{c,0} = \theta_{c0}$, and $\theta_{c,1} = \theta_{c1}$. Then

$$\begin{aligned} E(\delta\log(Y) \mid s, R=r) &= P(\delta=1 \mid s, R=r)\left(-\frac{\gamma}{\alpha} + \log\left(\left[\frac{1}{\theta_{s,r}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-1/\alpha}\right)\right) = \frac{1}{1 + \theta_{s,r}^{\alpha}/\lambda^{\alpha}}\left(-\frac{\gamma}{\alpha} + \log\left(\left[\frac{1}{\theta_{s,r}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-1/\alpha}\right)\right) \\ E(Y^{\alpha^{*}}\log(Y) \mid s, R=r) &= \left(\psi\left(\frac{\alpha^{*}}{\alpha}+1\right) + \alpha\log\left(\left[\frac{1}{\theta_{s,r}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-1/\alpha}\right)\right)\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{s,r}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}\frac{1}{\alpha} \\ E(Y^{\alpha^{*}} \mid s, R=r) &= \Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{s,r}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha} \end{aligned}$$

and,

$$\begin{aligned} P(\delta=1 \mid R=0) &= P_{at}\frac{1}{1 + \theta_{at1}^{\alpha}/\lambda^{\alpha}} + P_{nt}\frac{1}{1 + \theta_{nt0}^{\alpha}/\lambda^{\alpha}} + P_{c}\frac{1}{1 + \theta_{c0}^{\alpha}/\lambda^{\alpha}} \\ P(\delta=1 \mid R=1) &= P_{at}\frac{1}{1 + \theta_{at1}^{\alpha}/\lambda^{\alpha}} + P_{nt}\frac{1}{1 + \theta_{nt0}^{\alpha}/\lambda^{\alpha}} + P_{c}\frac{1}{1 + \theta_{c1}^{\alpha}/\lambda^{\alpha}} \end{aligned}$$

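Let α* be the solution to equation (C.3). By the consistency of the M.L.E., we have $\hat{\alpha}^{*} \xrightarrow{P} \alpha^{*}$. Next, substitute α̂* into equation (C.1):

$$\hat{K}_0 = \left[\frac{\sum_{i \in \{R=0\}}^{n_{R0}} y_i^{\hat{\alpha}^{*}}}{\sum_{i \in \{R=0\}}^{n_{R0}}\delta_i}\right]^{1/\hat{\alpha}^{*}} = \left[\frac{n_{R0}}{\sum_{i \in \{R=0\}}^{n_{R0}}\delta_i}\left\{\sum_{i \in \{R=0, at\}}^{n_{R0,at}} y_i^{\hat{\alpha}^{*}} + \sum_{i \in \{R=0, nt\}}^{n_{R0,nt}} y_i^{\hat{\alpha}^{*}} + \sum_{i \in \{R=0, c\}}^{n_{R0,c}} y_i^{\hat{\alpha}^{*}}\right\}\Big/ n_{R0}\right]^{1/\hat{\alpha}^{*}}$$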

Asymptotically, it converges to

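$$\begin{aligned} \hat{K}_0 &\rightarrow \left[\frac{1}{P(\delta=1 \mid R=0)}\left\{P_{at}E(Y_{at,0}^{\alpha^{*}}) + P_{nt}E(Y_{nt,0}^{\alpha^{*}}) + P_{c}E(Y_{c,0}^{\alpha^{*}})\right\}\right]^{1/\alpha^{*}} \\ &= \left[\frac{1}{P(\delta=1 \mid R=0)} \times \left\{P_{at}\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{at1}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha} + P_{nt}\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{nt0}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha} + P_{c}\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{c0}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}\right\}\right]^{1/\alpha^{*}} \end{aligned}$$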

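Similarly, K̂1 converges to

$$\hat{K}_1 \rightarrow \left[\frac{1}{P(\delta=1 \mid R=1)} \times \left\{P_{at}\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{at1}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha} + P_{nt}\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{nt0}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha} + P_{c}\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{c1}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}\right\}\right]^{1/\alpha^{*}}$$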
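The probability limits of K̂0 and K̂1 share one functional form, which can be written as a small helper. The sketch below is an illustration under stated assumptions: the function name `k_limit` is hypothetical, and the limiting shape value `a_star` is passed in as an input, whereas in the derivation it is the solution of (C.3). The stratum probabilities and scales in the example call are illustrative, with θc0 = 1.67 borrowed from Table 1.

```python
import math

def k_limit(a_star, alpha, lam, strata):
    """Probability limit of the 2SPS scale estimator for one assignment arm.

    a_star : limiting value of the fitted shape parameter (treated as known here)
    alpha  : true shape parameter of survival and censoring times
    lam    : scale parameter of the censoring distribution
    strata : list of (probability, theta) pairs, one per principal stratum
    """
    numerator = 0.0    # sum over strata of p * E(Y^a_star | stratum)
    denominator = 0.0  # P(delta = 1 | arm), as in (B.2)
    for p, theta in strata:
        rate = theta ** -alpha + lam ** -alpha
        numerator += p * math.gamma(a_star / alpha + 1.0) * rate ** (-a_star / alpha)
        denominator += p / (1.0 + (theta / lam) ** alpha)
    return (numerator / denominator) ** (1.0 / a_star)

# Sanity check: one stratum and a_star equal to the true shape recovers theta.
single = k_limit(1.5, 1.5, 3.0, [(1.0, 2.0)])

# Control arm with no always takers: never takers and compliers (illustrative values).
control = k_limit(1.2, 1.5, 3.0, [(0.5, 1.0), (0.5, 1.67)])
print(single, control)
```

With a single stratum and a correctly specified shape, the limit equals the true scale; with a mixture, the limit lies between the stratum scales, which is the source of the asymptotic bias in the estimated hazard ratio.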

Appendix D: Derivation of probability limits of M.L.E of α, K0, K1, K2 for 2SRI

Under the no AT assumption, we can find an expression for λ1 as follows. The first stage regression can be re-expressed as following:

$$E(Z \mid R) = \rho_A + \rho_C R, \qquad E = Z - E(Z \mid R) = Z - \rho_A - \rho_C R$$

Note that (Z, E) and (Z, R) are in one-to-one correspondence: knowing Z and E determines Z and R, and vice versa. Under the no always taker assumption, we observe three subgroups: 1) Z = 1, R = 1, which contains only compliers; 2) Z = 0, R = 1, which contains only never takers; 3) Z = 0, R = 0, which contains both never takers and compliers. There are no patients who are assigned to control but still take the active treatment (Z = 1, R = 0). For the 3 subgroups, we are essentially fitting 3 Weibull distributions with the same shape parameter α* and 3 different scale parameters K0, K1, K2 with the Weibull regression model: log h(t) = λ0 + λ1 Z + λ2 E
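The one-to-one correspondence between (Z, E) and (Z, R) can be seen by computing the first-stage residual for each observable cell. The following minimal Python sketch assumes no always takers (ρA = 0) and uses ρC = 0.5 as in Table 1; the function name `first_stage_residual` is illustrative.

```python
def first_stage_residual(z, r, rho_a, rho_c):
    """Residual E = Z - E(Z | R) with E(Z | R) = rho_a + rho_c * R."""
    return z - (rho_a + rho_c * r)

rho_a, rho_c = 0.0, 0.5  # no always takers; complier probability as in Table 1

# Cells that can be observed when there are no always takers.
observed_cells = [(1, 1), (0, 1), (0, 0)]
residuals = {
    (z, r): first_stage_residual(z, r, rho_a, rho_c) for z, r in observed_cells
}

# Each (Z, R) cell maps to a distinct (Z, E) pair, so the second-stage model
# log h(t) = lambda0 + lambda1 * Z + lambda2 * E has one free level per cell.
ze_pairs = {(z, e) for (z, r), e in residuals.items()}
for (z, r), e in residuals.items():
    print(f"Z={z}, R={r}: E={e}")
```

Because the three cells yield three distinct (Z, E) pairs, the second-stage regression is saturated in these cells, which is why it is equivalent to fitting three Weibull distributions with a common shape parameter.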

The likelihood function is:

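$$L(y) = \prod_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}} f_i(K_0) \times \prod_{i \in \{Z=0,R=1,nt\}}^{n_{Z0,R1,nt}} f_i(K_1) \times \prod_{i \in \{Z=0,R=0,nt\}}^{n_{Z0,R0,nt}} f_i(K_2) \times \prod_{i \in \{Z=0,R=0,c\}}^{n_{Z0,R0,c}} f_i(K_2)$$

where each subject's contribution is

$$f_i(K) = \left[\frac{\alpha^{*}}{K}\left(\frac{y_i}{K}\right)^{\alpha^{*}-1}\right]^{\delta_i}\exp\left(-\left(\frac{y_i}{K}\right)^{\alpha^{*}}\right)$$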

The log likelihood is:

$$l(y) = \sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}} g_i(K_0) + \sum_{i \in \{Z=0,R=1,nt\}}^{n_{Z0,R1,nt}} g_i(K_1) + \sum_{i \in \{Z=0,R=0,nt\}}^{n_{Z0,R0,nt}} g_i(K_2) + \sum_{i \in \{Z=0,R=0,c\}}^{n_{Z0,R0,c}} g_i(K_2)$$

where each subject's log-likelihood contribution is

$$g_i(K) = \delta_i\left\{\log(\alpha^{*}) - \log(K) + (\alpha^{*}-1)\left(\log(y_i) - \log(K)\right)\right\} - \left(\frac{y_i}{K}\right)^{\alpha^{*}}$$

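Take the first derivative of l(y) with respect to K0, K1, and K2 respectively and set the score equations to 0. Then we have

$$\hat{K}_0 = \left[\frac{\sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}} y_i^{\alpha^{*}}}{\sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}}\delta_i}\right]^{1/\alpha^{*}} \quad \text{(D.1)}$$

$$\hat{K}_1 = \left[\frac{\sum_{i \in \{Z=0,R=1,nt\}}^{n_{Z0,R1,nt}} y_i^{\alpha^{*}}}{\sum_{i \in \{Z=0,R=1,nt\}}^{n_{Z0,R1,nt}}\delta_i}\right]^{1/\alpha^{*}} \quad \text{(D.2)}$$

$$\hat{K}_2 = \left[\frac{\sum_{i \in \{Z=0,R=0,nt\}}^{n_{Z0,R0,nt}} y_i^{\alpha^{*}} + \sum_{i \in \{Z=0,R=0,c\}}^{n_{Z0,R0,c}} y_i^{\alpha^{*}}}{\sum_{i \in \{Z=0,R=0,nt\}}^{n_{Z0,R0,nt}}\delta_i + \sum_{i \in \{Z=0,R=0,c\}}^{n_{Z0,R0,c}}\delta_i}\right]^{1/\alpha^{*}} \quad \text{(D.3)}$$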

Take the first derivative of l(y) with respect to α* and replace K0, K1, K2 with expressions (D.1), (D.2), and (D.3). Writing $G_0 = \{Z=1,R=1,c\}$, $G_1 = \{Z=0,R=1,nt\}$, $G_2^{nt} = \{Z=0,R=0,nt\}$, and $G_2^{c} = \{Z=0,R=0,c\}$ for the four index sets, we have:

$$\begin{aligned} \frac{d\log(L(y))}{d\alpha^{*}} ={}& \sum_{i \in G_0}\delta_i\left\{\frac{1}{\alpha^{*}} + \log(y_i)\right\} - \sum_{i \in G_0}\delta_i\,\frac{\sum_{i \in G_0} y_i^{\alpha^{*}}\log(y_i)}{\sum_{i \in G_0} y_i^{\alpha^{*}}} \\ &+ \sum_{i \in G_1}\delta_i\left\{\frac{1}{\alpha^{*}} + \log(y_i)\right\} - \sum_{i \in G_1}\delta_i\,\frac{\sum_{i \in G_1} y_i^{\alpha^{*}}\log(y_i)}{\sum_{i \in G_1} y_i^{\alpha^{*}}} \\ &+ \sum_{i \in G_2^{nt}}\delta_i\left\{\frac{1}{\alpha^{*}} + \log(y_i)\right\} + \sum_{i \in G_2^{c}}\delta_i\left\{\frac{1}{\alpha^{*}} + \log(y_i)\right\} \\ &- \left(\sum_{i \in G_2^{nt}}\delta_i + \sum_{i \in G_2^{c}}\delta_i\right)\frac{\sum_{i \in G_2^{nt}} y_i^{\alpha^{*}}\log(y_i) + \sum_{i \in G_2^{c}} y_i^{\alpha^{*}}\log(y_i)}{\sum_{i \in G_2^{nt}} y_i^{\alpha^{*}} + \sum_{i \in G_2^{c}} y_i^{\alpha^{*}}} = 0 \end{aligned}$$

Each term $\sum\delta_i\{1/\alpha^{*} + \log(y_i)\}$ can equivalently be written as $\sum\delta_i/\alpha^{*} + \sum\delta_i\log(y_i)$.

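The M.L.E. α̂* is the solution to the above score equation. Next, divide the equation by the total sample size n. Writing $G_0 = \{Z=1,R=1,c\}$, $G_1 = \{Z=0,R=1,nt\}$, $G_2^{nt} = \{Z=0,R=0,nt\}$, and $G_2^{c} = \{Z=0,R=0,c\}$, with sizes $n_{Z1,R1,c}$, $n_{Z0,R1,nt}$, $n_{Z0,R0,nt}$, $n_{Z0,R0,c}$ and $n_{Z0,R0} = n_{Z0,R0,nt} + n_{Z0,R0,c}$:

$$\begin{aligned} 0 ={}& \frac{\sum_{G_0}\delta_i/\alpha^{*}}{n_{Z1,R1,c}}\cdot\frac{n_{Z1,R1,c}}{n} + \frac{\sum_{G_0}\delta_i\log(y_i)}{n_{Z1,R1,c}}\cdot\frac{n_{Z1,R1,c}}{n} - \left(\frac{\sum_{G_0}\delta_i}{n_{Z1,R1,c}}\cdot\frac{n_{Z1,R1,c}}{n}\right)\frac{\sum_{G_0} y_i^{\alpha^{*}}\log(y_i)/n_{Z1,R1,c}}{\sum_{G_0} y_i^{\alpha^{*}}/n_{Z1,R1,c}} \\ &+ \frac{\sum_{G_1}\delta_i/\alpha^{*}}{n_{Z0,R1,nt}}\cdot\frac{n_{Z0,R1,nt}}{n} + \frac{\sum_{G_1}\delta_i\log(y_i)}{n_{Z0,R1,nt}}\cdot\frac{n_{Z0,R1,nt}}{n} - \left(\frac{\sum_{G_1}\delta_i}{n_{Z0,R1,nt}}\cdot\frac{n_{Z0,R1,nt}}{n}\right)\frac{\sum_{G_1} y_i^{\alpha^{*}}\log(y_i)/n_{Z0,R1,nt}}{\sum_{G_1} y_i^{\alpha^{*}}/n_{Z0,R1,nt}} \\ &+ \frac{\sum_{G_2^{nt}}\delta_i/\alpha^{*} + \sum_{G_2^{nt}}\delta_i\log(y_i)}{n_{Z0,R0,nt}}\cdot\frac{n_{Z0,R0,nt}}{n} + \frac{\sum_{G_2^{c}}\delta_i/\alpha^{*} + \sum_{G_2^{c}}\delta_i\log(y_i)}{n_{Z0,R0,c}}\cdot\frac{n_{Z0,R0,c}}{n} \\ &- \frac{\sum_{G_2^{nt}}\delta_i + \sum_{G_2^{c}}\delta_i}{n_{Z0,R0}}\cdot\frac{n_{Z0,R0}}{n} \times \frac{\left\{\sum_{G_2^{nt}} y_i^{\alpha^{*}}\log(y_i) + \sum_{G_2^{c}} y_i^{\alpha^{*}}\log(y_i)\right\}/n_{Z0,R0}}{\left\{\sum_{G_2^{nt}} y_i^{\alpha^{*}} + \sum_{G_2^{c}} y_i^{\alpha^{*}}\right\}/n_{Z0,R0}} \end{aligned}$$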

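As the sample size in each principal stratum → ∞, the score equation converges to:

$$\begin{aligned} 0 ={}& \frac{1}{\alpha^{*}}P(\delta=1 \mid Z=1,R=1)P(Z=1,R=1) + E(\delta\log(y) \mid Z=1,R=1)P(Z=1,R=1) \\ &- P(\delta=1 \mid Z=1,R=1)P(Z=1,R=1)\frac{E(Y^{\alpha^{*}}\log(Y) \mid Z=1,R=1)}{E(Y^{\alpha^{*}} \mid Z=1,R=1)} \\ &+ \frac{1}{\alpha^{*}}P(\delta=1 \mid Z=0,R=1)P(Z=0,R=1) + E(\delta\log(y) \mid Z=0,R=1)P(Z=0,R=1) \\ &- P(\delta=1 \mid Z=0,R=1)P(Z=0,R=1)\frac{E(Y^{\alpha^{*}}\log(Y) \mid Z=0,R=1)}{E(Y^{\alpha^{*}} \mid Z=0,R=1)} \\ &+ \frac{1}{\alpha^{*}}P(\delta=1 \mid Z=0,R=0,nt)P(Z=0,R=0,nt) + E(\delta\log(y) \mid Z=0,R=0,nt)P(Z=0,R=0,nt) \\ &+ \frac{1}{\alpha^{*}}P(\delta=1 \mid Z=0,R=0,c)P(Z=0,R=0,c) + E(\delta\log(y) \mid Z=0,R=0,c)P(Z=0,R=0,c) \\ &- \left(\frac{P_{nt}}{P_{nt}+P_{c}}P(\delta=1 \mid Z=0,R=0,nt) + \frac{P_{c}}{P_{nt}+P_{c}}P(\delta=1 \mid Z=0,R=0,c)\right) \times \left(P(Z=0,R=0,nt) + P(Z=0,R=0,c)\right) \\ &\quad \times \frac{P(nt)E(Y^{\alpha^{*}}\log(Y) \mid Z=0,R=0,nt) + P(c)E(Y^{\alpha^{*}}\log(Y) \mid Z=0,R=0,c)}{P(nt)E(Y^{\alpha^{*}} \mid Z=0,R=0,nt) + P(c)E(Y^{\alpha^{*}} \mid Z=0,R=0,c)} \quad \text{(D.4)} \end{aligned}$$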

where the cell probabilities are

$$P(Z=1,R=1) = P(C,R=1) = P_{c}P(R=1), \quad P(Z=0,R=1) = P(nt,R=1) = P_{nt}P(R=1),$$
$$P(Z=0,R=0,nt) = P(nt,R=0) = P_{nt}P(R=0), \quad P(Z=0,R=0,c) = P(c,R=0) = P_{c}P(R=0),$$

and, for each cell with Weibull scale θ, namely θ = θc1 for (Z=1, R=1), θ = θnt0 for (Z=0, R=1) and for (Z=0, R=0, nt), and θ = θc0 for (Z=0, R=0, c),

$$\begin{aligned} P(\delta=1 \mid \text{cell}) &= \frac{1}{1 + (\theta/\lambda)^{\alpha}} \\ E(\delta\log(y) \mid \text{cell}) &= \frac{1}{1 + \theta^{\alpha}/\lambda^{\alpha}}\left(-\frac{\gamma}{\alpha} + \log\left(\left[\frac{1}{\theta^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-1/\alpha}\right)\right) \\ E(Y^{\alpha^{*}}\log(Y) \mid \text{cell}) &= \left(\psi\left(\frac{\alpha^{*}}{\alpha}+1\right) + \alpha\log\left(\left[\frac{1}{\theta^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-1/\alpha}\right)\right)\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}\frac{1}{\alpha} \\ E(Y^{\alpha^{*}} \mid \text{cell}) &= \Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha} \end{aligned}$$

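Let α* be the solution to equation (D.4). Thus, $\hat{\alpha}^{*} \rightarrow \alpha^{*}$. The probability limit of the M.L.E. of K0 can be derived as follows:

$$\begin{aligned} \hat{K}_0 &= \left[\frac{\sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}} y_i^{\hat{\alpha}^{*}}}{\sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}}\delta_i}\right]^{1/\hat{\alpha}^{*}} = \left[\frac{\sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}} y_i^{\hat{\alpha}^{*}}/n_{Z1,R1,c}}{\sum_{i \in \{Z=1,R=1,c\}}^{n_{Z1,R1,c}}\delta_i/n_{Z1,R1,c}}\right]^{1/\hat{\alpha}^{*}} \\ &\rightarrow \left[\frac{E(y_i^{\alpha^{*}} \mid Z=1,R=1,c)}{P(\delta=1 \mid Z=1,R=1,c)}\right]^{1/\alpha^{*}} = \left[\frac{\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{c1}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}}{\frac{1}{1 + (\theta_{c1}/\lambda)^{\alpha}}}\right]^{1/\alpha^{*}} \end{aligned}$$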

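Similarly, for K1 and K2,

$$\hat{K}_1 = \left[\frac{\sum_{i \in \{Z=0,R=1,nt\}}^{n_{Z0,R1,nt}} y_i^{\hat{\alpha}^{*}}}{\sum_{i \in \{Z=0,R=1,nt\}}^{n_{Z0,R1,nt}}\delta_i}\right]^{1/\hat{\alpha}^{*}} \rightarrow \left[\frac{\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{nt0}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}}{\frac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}}\right]^{1/\alpha^{*}}$$

$$\hat{K}_2 = \left[\frac{\sum_{i \in \{Z=0,R=0,nt\}}^{n_{Z0,R0,nt}} y_i^{\hat{\alpha}^{*}} + \sum_{i \in \{Z=0,R=0,c\}}^{n_{Z0,R0,c}} y_i^{\hat{\alpha}^{*}}}{\sum_{i \in \{Z=0,R=0,nt\}}^{n_{Z0,R0,nt}}\delta_i + \sum_{i \in \{Z=0,R=0,c\}}^{n_{Z0,R0,c}}\delta_i}\right]^{1/\hat{\alpha}^{*}} \rightarrow \left[\frac{\Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{nt0}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}\frac{P_{nt}}{P_{nt}+P_{c}} + \Gamma\left(\frac{\alpha^{*}}{\alpha}+1\right)\left[\frac{1}{\theta_{c0}^{\alpha}} + \frac{1}{\lambda^{\alpha}}\right]^{-\alpha^{*}/\alpha}\frac{P_{c}}{P_{nt}+P_{c}}}{\frac{1}{1 + (\theta_{nt0}/\lambda)^{\alpha}}\frac{P_{nt}}{P_{nt}+P_{c}} + \frac{1}{1 + (\theta_{c0}/\lambda)^{\alpha}}\frac{P_{c}}{P_{nt}+P_{c}}}\right]^{1/\alpha^{*}}$$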

Appendix E: Assumption of the same shape parameter for survival and censoring distributions

In Section 2 of the manuscript, we assumed that the time to event and the censoring time have the same shape parameter so that a closed-form solution could be derived. To evaluate the potential impact on the bias when this assumption is violated and the two distributions have different shape parameters, we re-evaluated the scenario in Table 1 with the shape parameter α = 0.5. We set the shape parameter of the censoring distribution to 1.2 and compared the results. We found that the differences in bias of 2SPS between the two scenarios range from 0.01 to 0.018 (δ varies from -2 to 2). For the 2SRI approach, the differences range from 0.001 to 0.13. These differences are attributable to the different censoring proportions between the two scenarios. The shape of the relationship between bias and δ remains approximately unchanged (data not shown). It should be noted that when survival time and censoring time do share the same shape parameter, the maximum likelihood estimator based on a survival likelihood that does not incorporate this equality is not fully efficient.
