Author manuscript; available in PMC: 2023 Mar 16.
Published in final edited form as: Biometrics. 2021 Oct 12;79(1):253–263. doi: 10.1111/biom.13565

Instrumental Variable Estimation of Complier Causal Treatment Effect with Interval-Censored Data

Shuwei Li 2, Limin Peng 1,*
PMCID: PMC8924024  NIHMSID: NIHMS1752156  PMID: 34528243

Summary:

Assessing causal treatment effect on a time-to-event outcome is of key interest in many scientific investigations. An instrumental variable (IV) is a useful tool for mitigating the impact of endogenous treatment selection to attain unbiased estimation of causal treatment effect. Existing development of IV methodology, however, has not attended to outcomes subject to interval censoring, which arise ubiquitously in studies with intermittent follow-up but are challenging to handle in terms of both theory and computation. In this work, we fill this important gap by studying a general class of causal semiparametric transformation models with interval-censored data. We propose a nonparametric maximum likelihood estimator of the complier causal treatment effect. Moreover, we design a reliable and computationally stable EM algorithm, which has a tractable objective function in the maximization step via the use of Poisson latent variables. The asymptotic properties of the proposed estimators, including the consistency, asymptotic normality, and semiparametric efficiency, are established with empirical process techniques. We conduct extensive simulation studies and an application to a colorectal cancer screening dataset, showing satisfactory finite-sample performance of the proposed method as well as its prominent advantages over naive methods.

Keywords: Complier causal treatment effect, Instrumental variable, Interval censoring, Nonparametric maximum likelihood, Semiparametric transformation models

1. Introduction

The causal effect of a treatment on a time-to-event outcome often represents a substantive interest in scientific investigations. Endogenous selection into treatment, where treatment choice is related to unmeasured confounders of potential outcomes, can greatly complicate the estimation of the causal treatment effect. An instrumental variable (IV), characterized as being independent of unmeasured confounders but related to the treatment and influencing the outcome only through its effect on the treatment, provides a powerful tool to overcome this difficulty (Baiocchi et al., 2014).

A motivating example arises from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. One question of interest arising from this study is to evaluate the effectiveness of flexible sigmoidoscopy screening in reducing the risk of developing colorectal cancer. In the colorectal cancer dataset, among the 70,529 individuals randomly assigned to receive flexible sigmoidoscopy, about 14.1% of participants did not comply with the assigned intervention, resulting in treatment noncompliance. However, such treatment noncompliance may be outcome-related and influenced by latent unmeasured factors, such as health consciousness. This is partly reflected by the imbalance of measured covariates between individuals who received flexible sigmoidoscopy and those who did not (see Web Table 9). An IV analysis can help mitigate the bias induced by the presence of unmeasured confounders.

IV methods have been extensively studied for standard uncensored outcomes. Among the vast literature, a popular research line adopts a latent class formulation, which divides the whole population into four latent subgroups (i.e. always-takers, compliers, never-takers, and defiers) and targets complier (or local) causal treatment effects defined for compliers, whose treatment choices always coincide with the IV (Angrist et al., 1996). For example, Imbens and Rubin (1997a) and Imbens and Rubin (1997b) developed two mixture modeling approaches to estimate the local average treatment effect. Abadie et al. (2002) proposed a general weighting strategy for estimating complier treatment effects under linear or nonlinear treatment response models. Cheng, Qin, and Zhang (2009) and Cheng, Small, Tan, and Ten Have (2009) further used empirical likelihood techniques to construct semiparametric and nonparametric estimators of the complier outcome distribution. All these approaches require that outcomes be completely observed.

When a time-to-event outcome is of interest, estimation of the causal treatment effect can be complicated by the presence of censoring, in which some or all event times are not observed. Many efforts have been devoted to developing latent class IV methods tailored to censored time-to-event data. For example, Baker (1998) and Nie et al. (2011) studied the estimation of the causal hazard difference in compliers with discrete right censored survival data without considering covariates. Under the popular proportional hazards model, many authors investigated the estimation of the complier causal proportional hazard ratio based on the partial or full likelihood (Loeys and Goetghebeur, 2003; Cuzick et al., 2007; Li and Gray, 2016, for example) or weighted estimating equations (Li and Gray, 2016; Kianian et al., 2021, for example). Lin et al. (2014) and Yu et al. (2015) considered more general causal linear transformation models, and developed pseudo likelihood and maximum likelihood based estimation, respectively. All these efforts, however, can accommodate only random right censoring.

Time-to-event outcomes are often subject to interval censoring rather than the simpler right censoring, due to the intermittent follow-up schemes adopted in various scientific studies (Sun, 2006). Unlike right censored data, which include exact event times for some subjects, interval-censored data only include time intervals to which the event times belong, but no exact event times. For example, in the PLCO study, the times of cancer onset were never exactly observed but were only known to lie between two study visits, hence suffering from interval censoring. The development of statistical techniques for interval-censored data involves extra challenges, in both computation and theory, compared to that for right censored data (Jewell and van der Laan, 2004; Ma and Kosorok, 2005; Zeng et al., 2016, for example). The challenges from interval censoring become even more prominent in the latent class IV setting, because the difficulty arising from the unavailability of exact event times is further entangled with the unobservable compliance subgroups. Despite the ubiquitous presence of interval censoring, to the best of our knowledge, there is no IV method available to address the causal treatment effect with interval-censored data.

To fill this gap, in this work, we study the IV estimation of the complier causal treatment effect with interval-censored data. Our proposal is built upon a general class of semiparametric transformation models, which includes many commonly used regression models as special cases, such as the proportional hazards and proportional odds models. For the estimation, we employ the nonparametric maximum likelihood approach and develop a customized EM algorithm to relieve the computational burden in maximizing the observed data likelihood function, which takes a complex form. According to our simulation studies, our algorithm is stable and reliable in finite samples, and the proposed method demonstrates apparent advantages over the naive approaches that are often used in practice but are prone to bias. Using empirical process techniques, we establish the asymptotic properties of the proposed estimators, including the consistency, asymptotic normality, and semiparametric efficiency, under mild regularity conditions. We also propose second-stage estimation to evaluate the complier causal differences in survival probabilities by treatment, both conditional on covariates and without conditioning on covariates. These quantities provide a straightforward and sensible way to help understand the causal treatment effect on the time-to-event outcome. We illustrate the proposed methods via an application to a dataset from the PLCO trial. We provide several important remarks, which are relegated to Web Appendix D due to the space limit.

2. Data Notation, Assumptions, and Models

Let $T_i$ denote the time-to-event outcome for subject i (i = 1, …, n). We consider a common interval censoring scenario, where $T_i$ is monitored at a sequence of observation times $V_{i1} < \cdots < V_{iJ_i}$, where $J_i$ is a random positive integer denoting the number of observation times for subject i. Let $V_i = (V_{i1}, \ldots, V_{iJ_i})$. For completeness, we define $V_{i0} = 0$ and $V_{i,J_i+1} = \infty$. Define $\Delta_{ij} = I(V_{ij} < T_i \le V_{i,j+1})$, for j = 0, …, $J_i$. Let $D_i$, $A_i$ and $X_i$ denote a binary treatment variable, a binary instrumental variable, and a p × 1 covariate vector for subject i, respectively. For example, $D_i = 1$ if subject i takes treatment 1 (treatment group), and $D_i = 0$ if subject i takes treatment 0 (control group). The observed interval-censored data consist of $\{(D_i, A_i, X_i, V_i, \Delta_{ij}, j = 0, \ldots, J_i),\ i = 1, \ldots, n\}$, with the realized values $\{(d_i, a_i, x_i, v_i, \delta_{ij}, j = 0, \ldots, J_i),\ i = 1, \ldots, n\}$. Define $L_i = \max\{V_{ij} : V_{ij} < T_i\}$ and $R_i = \min\{V_{ij} : V_{ij} \ge T_i\}$. That is, $(L_i, R_i]$ denotes the smallest observed time interval containing $T_i$, in which $T_i$ is left censored when $L_i = 0$ and right censored when $R_i = \infty$. For each i = 1, …, n, the realized values of $L_i$ and $R_i$ are denoted by $l_i$ and $r_i$, respectively. It is clear that the interval-censored data do not include any exact event times.

Potential outcomes for subject i are defined as follows. Let $D_i^a$ denote the potential treatment selection given $A_i = a$, and let $T_i^d$ denote the potential time-to-event given the treatment $D_i = d$ under Assumption 6 stated below. Similarly, let $D_i^{\mathbf{a}}$ denote the potential treatment given $(A_1, \ldots, A_n) = \mathbf{a} \doteq (a_1, \ldots, a_n)$ and let $T_i^{\mathbf{d}}$ denote the potential time-to-event given $(D_1, \ldots, D_n) = \mathbf{d} \doteq (d_1, \ldots, d_n)$. In the sequel, notation without the subscript i represents the corresponding population analogue.

Following the terminology of Angrist et al. (1996), the whole population is divided into four latent compliance subgroups, which we shall index by $U_i$. Specifically, we let $U_i$ equal 1 if subject i is an always-taker (i.e. $D_i^0 = D_i^1 = 1$); 2 if subject i is a complier (i.e. $D_i^1 > D_i^0$); 3 if subject i is a never-taker (i.e. $D_i^0 = D_i^1 = 0$); and 4 if subject i is a defier (i.e. $D_i^1 < D_i^0$). Note that $U_i$ cannot be fully determined by the observed data because $D_i^0$ and $D_i^1$ cannot be observed at the same time.

Throughout this paper, we adopt the following standard IV assumptions:

Assumption 1 (stable unit treatment value assumption). For i = 1, …, n, $D_i^{\mathbf{a}} = D_i^{\mathbf{a}'}$ if $a_i = a_i'$, where $\mathbf{a} = (a_1, \ldots, a_n) \in \{0,1\}^n$ and $\mathbf{a}' = (a_1', \ldots, a_n') \in \{0,1\}^n$; $T_i^{\mathbf{d}} = T_i^{\mathbf{d}'}$ if $d_i = d_i'$, where $\mathbf{d} = (d_1, \ldots, d_n) \in \{0,1\}^n$ and $\mathbf{d}' = (d_1', \ldots, d_n') \in \{0,1\}^n$.

Assumption 2 (random sampling). $(T_i^1, T_i^0, D_i^1, D_i^0, X_i, A_i)$, i = 1, …, n, are independent and identically distributed.

Assumption 3 (independence of the instrument). $(D^0, D^1, T^0, T^1, V) \perp A \mid X$, where the symbol '⊥' denotes statistical independence.

Assumption 4 (conditional non-null compliance class). $P(D^1 > D^0 \mid X) > 0$.

Assumption 5 (conditional monotonicity). $P(D^1 \ge D^0 \mid X) = 1$.

Assumption 6 (exclusion restriction). $P(T^{1d} = T^{0d}) = 1$ for d = 0, 1, where $T^{ad}$ denotes the potential time-to-event given A = a and D = d, with a = 0 or 1.

In addition, we assume a random interval censoring scheme, where observation times are conditionally independent of outcomes given covariates as stated below:

Assumption 7 (conditionally independent observation times). $(D^0, D^1, T^0, T^1) \perp V \mid X$.

Assumption 1 implies no interference, meaning that the treatment applied to one subject does not affect the outcome of another subject, and that only a single version of each treatment level is allowed for a subject, so that the potential time-to-event outcome is well defined. By Assumption 3, the IV, A, is as good as a random treatment assignment conditional on X. Assumption 4 states that the complier class has non-zero probability at every covariate level. Assumption 5 excludes the existence of defiers. By Assumption 6, the IV affects outcomes only through its effect on D, and thus one can equivalently express the potential outcome $T^{ad}$ as $T^d$, which represents the potential time-to-event given treatment D = d. Assumption 7 helps simplify the derivation of the observed data likelihood. It is reasonable in studies where observation times are administrative (e.g. pre-specified by the study protocol) but can be violated when observation times are related to side effects or efficacy of the received treatment. Additional modeling of observation times and their associations with the potential event time and treatment may be needed in the presence of informative observation times.

We investigate the complier causal treatment effect, and adopt general semiparametric transformation models to link the potential time-to-event outcome $T^d$ with treatment D = d and covariates X within each latent compliance subgroup. Specifically, we assume that

$$
\begin{aligned}
\Lambda_{T^1}(t \mid U = 1, X) &= G_1\{\Lambda(t)\exp(\alpha_1 + \gamma_1^\top X)\},\\
\Lambda_{T^d}(t \mid U = 2, X) &= G_2\{\Lambda(t)\exp(\beta_2 d + \gamma_2^\top X)\},\\
\Lambda_{T^0}(t \mid U = 3, X) &= G_3\{\Lambda(t)\exp(\alpha_3 + \gamma_3^\top X)\},
\end{aligned}
\tag{1}
$$

where $\Lambda_{T^d}(t \mid U = k, X)$ stands for the cumulative hazard function of $T^d$ given covariates X in the latent compliance class U = k, $G_k(\cdot)$ is a prespecified increasing transformation function with $G_k(0) = 0$, and $\gamma_k$ is a class-specific parameter vector containing covariate effects, k = 1, 2, 3. In model (1), Λ(·) is an unknown non-decreasing baseline function for the complier subgroup, and its counterparts for the always-taker and never-taker subgroups differ from Λ(t) by the scale shifts exp(α1) and exp(α3), respectively. Thus, the unknown parameters α1 and α3, respectively, represent the log scale shifts in the baseline function for always-takers and never-takers as compared to compliers. Because always-takers (i.e. U = 1) always receive treatment D = 1 and never-takers (i.e. U = 3) always receive treatment D = 0, no treatment effect is expressed for these two subgroups.

The parameter β2 in model (1) captures the complier (or local) causal treatment effect conditional on X, the key causal estimand of interest. This is because β2 depicts how the conditional cumulative hazard function of the potential event time $T^1$ given covariates X differs from that of $T^0$ in the complier subgroup. By taking the form of a semiparametric linear transformation model, model (1) flexibly includes many commonly used regression models as special cases. For example, we can obtain proportional hazards and proportional odds models by setting $G_k(x) = x$ and $G_k(x) = \log(1 + x)$ (k = 1, 2, 3), respectively. Given the non-collapsibility of the hazard ratio and the odds ratio (Martinussen and Vansteelandt, 2013; Wang et al., 2018), we generally expect that β2 is not the same as the marginal local causal treatment effect that is only conditional on the complier subgroup. In addition, in the one-sided compliance case, where subjects with A = 0 have no access to treatment (i.e. $P(D^0 = 0 \mid X) = 1$), we can show that for d = 1 or 0, $P(T^d > t \mid D = 1, X) = \exp[-G_2\{\Lambda(t)\exp(\beta_2 d + \gamma_2^\top X)\}]$. This means that, in the one-sided compliance case, β2 can also be interpreted as the causal treatment effect conditional on X in the treated population.
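
To make the two special cases above concrete, the display below works out the survival functions implied by the complier model in (1); this is a routine derivation added here for illustration rather than material taken from the paper:

$$
\begin{aligned}
S(t \mid U = 2, X, d) &= \exp\!\big[-G_2\{\Lambda(t)e^{\beta_2 d + \gamma_2^\top X}\}\big],\\
G_2(x) = x:\qquad S(t \mid U = 2, X, d) &= \exp\{-\Lambda(t)e^{\beta_2 d + \gamma_2^\top X}\},\\
G_2(x) = \log(1 + x):\qquad \frac{1 - S(t \mid U = 2, X, d)}{S(t \mid U = 2, X, d)} &= \Lambda(t)e^{\beta_2 d + \gamma_2^\top X},
\end{aligned}
$$

so that $e^{\beta_2}$ is, respectively, the conditional hazard ratio or the conditional odds ratio (of failure by time t) comparing d = 1 with d = 0 among compliers with the same X.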

By formulating additive effects of D and X, model (1) implicitly assumes that the local causal treatment effect conditional on X is homogeneous across strata defined by different levels of X. A possible approach to relaxing such a constraint is to extend model (1) by including an interaction term between the treatment D and covariates X, as in Li and Gray (2016) and others. The proposed estimation and inference procedures can readily be generalized to this extended model. In particular, one may test whether the coefficient for the interaction term is zero or not. Failing to reject the null hypothesis would suggest the adequacy of the treatment effect homogeneity assumption.

It is important to note that, under the independence of IV (i.e. Assumption 3), it holds that

$$
P(T^d > t \mid U = 2, X) = P(T^d > t \mid U = 2, X, A = d) = P(T > t \mid U = 2, X, D = d).
$$

This, combined with the facts that $T = T^1$ given U = 1 and $T = T^0$ given U = 3, indicates that model (1) can be equivalently expressed as

$$
\Lambda(t \mid D, X, U = k) = G_k\{\Lambda(t)\exp(\alpha_k + \beta_k D + \gamma_k^\top X)\}, \qquad k = 1, 2, 3,
\tag{2}
$$

where Λ(· | D, X, U = k) denotes the cumulative hazard function of T given treatment D and covariates X in the latent compliance class U = k, and α2 = 0 and β1 = β3 = 0. By revealing a more direct link between the causal estimand β2 and the observed event time T, this alternative representation of model (1) can greatly facilitate the estimation of model (1).

To tackle the difficulty with the unobservable complier status (i.e. Ui = 2), we further assume a multinomial logistic regression model for Ui:

$$
\log\frac{P(U_i = k \mid X_i)}{P(U_i = 2 \mid X_i)} = \theta_k^\top \tilde{X}_i, \qquad k = 1, 3,
$$

where $\tilde{X}_i = (1, X_i^\top)^\top$ and $\theta_k$ is an unknown parameter vector, i = 1, …, n. Let $\theta = (\theta_1^\top, \theta_3^\top)^\top$. The multinomial logistic regression model can be equivalently expressed as

$$
P(U_i = k \mid X_i) = p_{ik}(\theta) \doteq \frac{\exp(\theta_k^\top \tilde{X}_i)}{1 + \exp(\theta_1^\top \tilde{X}_i) + \exp(\theta_3^\top \tilde{X}_i)}, \qquad k = 1, 2, 3,
\tag{3}
$$

with $\theta_2 = (0, \ldots, 0)^\top$. With the additional modeling of the latent compliance class $U_i$ given $X_i$, in Lemma 1 of Web Appendix B of the Supporting Information, we formally establish the identifiability of β2, along with the other parameters in models (1) and (3), from the observed interval-censored data. Note that the parametric assumption for $P(U_i = k \mid X_i)$ in (3) may be strong and yet is hard to verify with the observed data. A more flexible specification for $P(U_i = k \mid X_i)$, such as that adopted in Schwartz et al. (2011), merits further investigation.
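
As a small illustration of how model (3) determines the latent class probabilities, the sketch below evaluates $p_{ik}(\theta)$ for a covariate matrix; it is illustrative code written for this description, not the authors' implementation (their R code accompanies the paper as Supporting Information).

```python
import numpy as np

def compliance_probs(theta1, theta3, X):
    """Latent class probabilities p_{ik}(theta) under the multinomial logistic model (3).

    theta1, theta3 : coefficient vectors for always-takers (k = 1) and never-takers (k = 3),
                     each of length 1 + p with the intercept first; compliers (k = 2) form
                     the reference class with theta2 = 0.
    X              : (n, p) covariate matrix.
    Returns an (n, 3) array with columns (always-taker, complier, never-taker).
    """
    X_tilde = np.column_stack([np.ones(len(X)), X])     # X_tilde_i = (1, X_i')
    e1 = np.exp(X_tilde @ theta1)
    e3 = np.exp(X_tilde @ theta3)
    denom = 1.0 + e1 + e3
    return np.column_stack([e1 / denom, 1.0 / denom, e3 / denom])
```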

In practice, a second-stage estimand of interest may be given as

$$
\mathrm{CES}_P(t \mid X) = P(T^1 > t \mid U = 2, X) - P(T^0 > t \mid U = 2, X),
$$

which represents the causal difference in the survival probability of a complier with covariates X between the two potential scenarios of receiving treatment versus not receiving treatment. This causal quantity, as compared to β2, may provide a more direct way to depict complier causal treatment effect. Based on the assumed models and IV assumptions above, we have that

$$
P(T^d \le t \mid U = 2, X) = 1 - \exp\!\big[-G_2\{\Lambda(t)\exp(\alpha_2 + \beta_2 d + \gamma_2^\top X)\}\big], \qquad d = 0, 1.
$$

Thus, an estimator of $\mathrm{CES}_P(t \mid X)$, denoted by $\widehat{\mathrm{CES}}_P(t \mid X)$, can be computed by plugging in the maximum likelihood estimates of Λ(·), α2, β2, and γ2 derived in Section 3.

We also consider an unconditional counterpart of $\mathrm{CES}_P(t \mid X)$, which is defined as

$$
\mathrm{CES}_P(t) = P(T^1 > t \mid U = 2) - P(T^0 > t \mid U = 2).
$$

This causal quantity describes the causal survival difference by treatment in compliers without conditioning on covariates. To estimate $\mathrm{CES}_P(t)$, applying Bayes' theorem, we derive that

$$
\mathrm{CES}_P(t) = \int \mathrm{CES}_P(t \mid x)\, f_{X \mid U = 2}(x)\,dx = \frac{E\{\mathrm{CES}_P(t \mid X)\, P(U = 2 \mid X)\}}{E\{P(U = 2 \mid X)\}}.
$$

Consequently, a reasonable plug-in estimator for $\mathrm{CES}_P(t)$ is given by

$$
\widehat{\mathrm{CES}}_P(t) = \frac{n^{-1}\sum_{i=1}^n p_{i2}(\hat\theta_n)\,\widehat{\mathrm{CES}}_P(t \mid X_i)}{\bar{p}_2(\hat\theta_n)},
$$

where $\bar{p}_2(\hat\theta_n) = n^{-1}\sum_{i=1}^n p_{i2}(\hat\theta_n)$, $p_{i2}(\hat\theta_n) = \{1 + \exp(\hat\theta_{1n}^\top \tilde{X}_i) + \exp(\hat\theta_{3n}^\top \tilde{X}_i)\}^{-1}$, and $\hat\theta_n$ denotes the estimator of θ proposed in Section 3.
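
A minimal sketch of these two plug-in estimators is given below, assuming that the fitted quantities (the jump times and jump sizes of $\hat\Lambda_n$, $\hat\beta_2$, $\hat\gamma_2$, and $\hat\theta_n$) are collected in a hypothetical dictionary `fit` after the estimation of Section 3 has been carried out; `compliance_probs` is the helper sketched above.

```python
import numpy as np

def surv_complier(t, d, x, jump_times, jump_sizes, beta2, gamma2, G2):
    """P(T^d > t | U = 2, X = x) implied by model (1), with Lambda represented by its
    step-function estimate (jump sizes at jump times) and alpha2 = 0."""
    Lam = jump_sizes[jump_times <= t].sum()
    return np.exp(-G2(Lam * np.exp(beta2 * d + x @ gamma2)))

def ces_conditional(t, x, fit, G2):
    """Plug-in estimate of CES_P(t | X = x), the complier causal survival difference."""
    args = (fit["jump_times"], fit["jump_sizes"], fit["beta2"], fit["gamma2"], G2)
    return surv_complier(t, 1, x, *args) - surv_complier(t, 0, x, *args)

def ces_marginal(t, X, fit, G2):
    """Plug-in estimate of CES_P(t): average of CES_P(t | X_i) weighted by p_{i2}(theta_hat)."""
    p2 = compliance_probs(fit["theta1"], fit["theta3"], X)[:, 1]   # complier probabilities
    ces_x = np.array([ces_conditional(t, x, fit, G2) for x in X])
    return np.sum(p2 * ces_x) / np.sum(p2)
```

For instance, `G2 = lambda x: x` corresponds to the proportional hazards case and `G2 = np.log1p` to the proportional odds case; the keys of `fit` are illustrative names, not part of the paper.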

3. Parameter Estimation

To estimate β2, the causal estimand of interest, we apply the nonparametric maximum likelihood technique while properly accounting for interval censoring and the latent nature of U.

We first consider the likelihood function based on the observed data. Let $\nu = (\nu_1^\top, \nu_2^\top, \nu_3^\top)^\top$, where $\nu_1 = (\alpha_1, \gamma_1^\top)^\top$, $\nu_2 = (\beta_2, \gamma_2^\top)^\top$, and $\nu_3 = (\alpha_3, \gamma_3^\top)^\top$. For each i, let $z_{1i} = (1, x_i^\top)^\top$, $z_{2i} = (d_i, x_i^\top)^\top$, and $z_{3i} = (1, x_i^\top)^\top$. As shown in Web Appendix A of the Supporting Information, we get

$$
\begin{aligned}
&P(\Delta_{ij} = 1, D_i = 1, A_i = 1 \mid X_i = x_i, V_i = v_i)\\
&\quad = P(v_{ij} < T_i \le v_{i,j+1} \mid U_i = 1, X_i = x_i)\,P(U_i = 1 \mid X_i = x_i)\,P(A_i = 1 \mid X_i = x_i)\\
&\qquad + P(v_{ij} < T_i \le v_{i,j+1} \mid U_i = 2, D_i = 1, X_i = x_i)\,P(U_i = 2 \mid X_i = x_i)\,P(A_i = 1 \mid X_i = x_i),
\end{aligned}
\tag{4}
$$

where $v_{ij}$ denotes the jth component of $v_i$. Let $f_{ik}$ and $p_{ik}$ be the shorthand notation for $\exp[-G_k\{\Lambda(l_i)\exp(\nu_k^\top z_{ki})\}] - \exp[-G_k\{\Lambda(r_i)\exp(\nu_k^\top z_{ki})\}]$ and $p_{ik}(\theta) = \exp(\theta_k^\top \tilde{x}_i)/\{1 + \exp(\theta_1^\top \tilde{x}_i) + \exp(\theta_3^\top \tilde{x}_i)\}$, respectively, where $\tilde{x}_i$ is the realization of $\tilde{X}_i$. The likelihood contribution from subject i with $D_i = 1$ and $A_i = 1$ is then proportional to $f_{i1}p_{i1} + f_{i2}p_{i2}$, provided that the density of the covariates and the conditional distribution of A given X contain no information about the unknown parameters in models (2) and (3). Following similar arguments, we can show that the likelihood contribution from subject i with $D_i = 0, A_i = 1$, with $D_i = 1, A_i = 0$, or with $D_i = A_i = 0$ is proportional to $f_{i3}p_{i3}$, $f_{i1}p_{i1}$, or $f_{i2}p_{i2} + f_{i3}p_{i3}$, respectively. Therefore, based on the observed data $\{(d_i, a_i, x_i, v_i, \delta_{ij}, j = 0, \ldots, J_i),\ i = 1, \ldots, n\}$, the kernel of the likelihood function takes the form

$$
L_0(\xi) = \prod_{d_i = a_i = 1}\big(f_{i1}p_{i1} + f_{i2}p_{i2}\big)\prod_{d_i = 0,\, a_i = 1} f_{i3}p_{i3}\prod_{d_i = 1,\, a_i = 0} f_{i1}p_{i1}\prod_{d_i = a_i = 0}\big(f_{i2}p_{i2} + f_{i3}p_{i3}\big),
\tag{5}
$$

where ξ denotes a vector containing all unknown parameters, that is, ν1, ν2, ν3, and θ, as well as Λ(·).

For the nuisance unknown function Λ(·), we assume that it has nonnegative jump sizes at the distinct finite observation times, denoted by $c_1 < \cdots < c_{q_n}$. Here, $q_n$ is determined by the data and has the same order as n. Denote the nonnegative jump sizes by $\lambda_1, \ldots, \lambda_{q_n}$; we can then replace Λ(t) in (5) with $\sum_{c_l \le t}\lambda_l$. In the sequel, ξ contains $\lambda_n \doteq (\lambda_1, \ldots, \lambda_{q_n})^\top$ instead of Λ(·). However, direct maximization of the likelihood function (5) is very challenging because it consists of mixtures of distributions over the latent compliance subgroups, which is further complicated by the interval-censored data structure. Another complication is that it involves the nuisance unknown function Λ(·), which makes the dimension of ξ increase as the sample size increases.
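
For concreteness, the following sketch evaluates the log of the likelihood kernel (5) at a candidate parameter value, with Λ(·) represented by its jump sizes; the data containers and argument layout are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np

def Lambda_step(t, c, lam):
    """Step-function baseline: Lambda(t) = sum of jump sizes lambda_l over jump times c_l <= t."""
    return lam[c <= t].sum()

def log_lik_kernel(nu, theta, c, lam, G, data):
    """Log of the observed-data likelihood kernel (5).
    nu    : list [nu1, nu2, nu3] of regression parameter vectors
    theta : list [theta1, theta3] of multinomial logistic parameter vectors
    c, lam: jump times and nonnegative jump sizes of Lambda(.)
    G     : list [G1, G2, G3] of transformation functions
    data  : iterable of per-subject tuples (d_i, a_i, x_i, l_i, r_i), with r_i = np.inf
            when the subject is right censored (an assumed layout, for illustration only)."""
    total = 0.0
    for d, a, x, l, r in data:
        x_t = np.concatenate(([1.0], x))
        e1, e3 = np.exp(theta[0] @ x_t), np.exp(theta[1] @ x_t)
        p = np.array([e1, 1.0, e3]) / (1.0 + e1 + e3)          # p_i1, p_i2, p_i3
        z = [x_t, np.concatenate(([d], x)), x_t]               # z_1i, z_2i, z_3i
        f = np.empty(3)
        for k in range(3):
            risk = np.exp(nu[k] @ z[k])
            s_l = np.exp(-G[k](Lambda_step(l, c, lam) * risk))
            s_r = np.exp(-G[k](Lambda_step(r, c, lam) * risk)) if np.isfinite(r) else 0.0
            f[k] = s_l - s_r                                    # f_ik = S_k(l_i) - S_k(r_i)
        if d == 1 and a == 1:
            total += np.log(f[0] * p[0] + f[1] * p[1])
        elif d == 0 and a == 1:
            total += np.log(f[2] * p[2])
        elif d == 1 and a == 0:
            total += np.log(f[0] * p[0])
        else:
            total += np.log(f[1] * p[1] + f[2] * p[2])
    return total
```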

To alleviate these difficulties, we design an EM algorithm tailored to interval-censored data and the proposed models. Since the likelihood implied by interval-censored data is more complex than that under random right censoring, we cannot directly adapt Yu et al. (2015)’s algorithm designed for randomly censored data; instead we employ data augmentation with the use of Poisson latent variables, which yields a tractable objective function. Similar data augmentation techniques have been devised for the analysis of interval-censored data (Wang et al., 2016; Zeng et al., 2016) or multivariate survival data (Zeng et al., 2017; Gao et al., 2019) in non-causal settings. Compared to these existing methods, the proposed algorithm involves more sophisticated augmentation of the observed data by three sets of Poisson latent variables, owing to the inclusion of three transformation models and one multinomial logistic model, and the mixture structure of the likelihood from considering latent compliance classes. The derivation of the conditional expectations is also notably more complicated in the proposed EM algorithm.
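
To convey the flavor of the Poisson data augmentation in the simplest special case $G_k(x) = x$ (the proportional hazards model), the following identity, which is a standard device in this literature (e.g. Wang et al., 2016) rather than the paper's full construction, shows how each interval-censored likelihood factor becomes the probability of an event defined by independent latent Poisson variables:

$$
W_{il} \sim \mathrm{Poisson}\{\lambda_l \exp(\nu_k^\top z_{ki})\}\ \text{independently over the jump points } c_l
\;\Longrightarrow\;
P\Big(\sum_{c_l \le l_i} W_{il} = 0,\ \sum_{l_i < c_l \le r_i} W_{il} \ge 1\Big)
= e^{-\Lambda(l_i)\exp(\nu_k^\top z_{ki})} - e^{-\Lambda(r_i)\exp(\nu_k^\top z_{ki})} = f_{ik}.
$$

Because the Poisson means are linear in the jump sizes $\lambda_l$, conditioning on such latent variables in the E-step leads to tractable updates in the M-step; the proposed algorithm extends this idea to general $G_k$ and couples it with the latent compliance class membership.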

The proposed algorithm possesses several desirable analytical features, such as a concave complete-data likelihood, closed-form estimates, and separate score equations for models (2) and (3). These features can greatly improve the computational stability and reliability of the proposed algorithm. Due to the space restriction, we relegate the algorithmic details to Web Appendix A of the Supporting Information.

4. Asymptotic Properties

Let η = (ν, θ), and let $\xi_0 = (\eta_0, \Lambda_0)$ denote the true value of ξ = (η, Λ) and $\hat\xi_n = (\hat\eta_n, \hat\Lambda_n)$ denote its maximum likelihood estimator. Under the regularity conditions in the Supporting Information, we establish the asymptotic properties of $\hat\xi_n$, including consistency, asymptotic normality, and semiparametric efficiency. Detailed proofs of these theorems and their distinctions from previous work are presented in the Supporting Information (Web Appendix B).

Theorem 1: Suppose that conditions (A1)–(A5) hold. Then $\|\hat\eta_n - \eta_0\| \to 0$ and $\sup_{t \in [\tau_l, \tau_u]}|\hat\Lambda_n(t) - \Lambda_0(t)| \to 0$ almost surely as n → ∞, where ‖·‖ denotes the Euclidean norm.

Theorem 2: Suppose that conditions (A1)–(A5) hold. Then $d(\hat\xi_n, \xi_0) \doteq \{\|\hat\eta_n - \eta_0\|^2 + \int_{\tau_l}^{\tau_u}|\hat\Lambda_n(s) - \Lambda_0(s)|^2 f_V(s)\,ds\}^{1/2} = O_p(n^{-1/3})$, where $f_V(s)$ denotes the density function of the observation time V.

Theorem 3: Suppose that conditions (A1)–(A6) hold. Then $\sqrt{n}(\hat\eta_n - \eta_0)$ converges in distribution to a zero-mean multivariate normal distribution, the covariance matrix of which attains the semiparametric efficiency bound.

To make inference on η0, one often needs to estimate the variance of $\hat\eta_n$. However, it is difficult to derive an explicit analytic estimator of the asymptotic covariance matrix of $\hat\eta_n$, given its complex form suggested by the proof of Theorem 3. Therefore, following the lines of Zeng et al. (2017) and others, we propose to adopt a numerical approach and estimate the covariance matrix of $\hat\eta_n$ by $(n\hat{I}_n)^{-1}$, where

$$
\hat{I}_n = n^{-1}\sum_{i=1}^n \left\{\frac{\partial}{\partial\eta}\, l_i(\eta, \hat\Lambda_\eta)\Big|_{\eta = \hat\eta_n}\right\}^{\otimes 2}.
$$

Here, $a^{\otimes 2} = aa^\top$ for a column vector a, $l_i(\eta, \Lambda)$ is the log-likelihood function of subject i, and $\hat\Lambda_\eta = \arg\max_{\Lambda} \log L(\eta, \Lambda)$, which can be obtained by a simplified version of the proposed EM algorithm with η fixed. Furthermore, we propose to approximate $\frac{\partial}{\partial\eta}\, l_i(\eta, \hat\Lambda_\eta)\big|_{\eta = \hat\eta_n}$ by first-order numerical differences.
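
A minimal sketch of this numerical variance estimator is given below, assuming two hypothetical helpers supplied by the user's fitting code: `subject_loglik(eta, Lambda, row)` returning $l_i(\eta, \Lambda)$ and `profile_Lambda(eta, data)` returning $\hat\Lambda_\eta$ (e.g. obtained by running the EM updates of the jump sizes with η fixed).

```python
import numpy as np

def profile_information(eta_hat, subject_loglik, profile_Lambda, data, h=1e-4):
    """Numerical profile-score estimate of I_n (a sketch under assumed helper callables)."""
    q, n = len(eta_hat), len(data)
    Lam0 = profile_Lambda(eta_hat, data)
    base = np.array([subject_loglik(eta_hat, Lam0, row) for row in data])
    scores = np.empty((n, q))
    for j in range(q):
        eta_j = eta_hat.copy()
        eta_j[j] += h                                   # first-order forward difference in the j-th component
        Lam_j = profile_Lambda(eta_j, data)
        pert = np.array([subject_loglik(eta_j, Lam_j, row) for row in data])
        scores[:, j] = (pert - base) / h                # approximate profile score component
    I_n = scores.T @ scores / n                         # I_n = n^{-1} sum_i s_i s_i^T
    return I_n                                          # cov(eta_hat) is then estimated by inv(n * I_n)
```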

5. Simulation Studies

We conduct extensive simulations to evaluate the finite-sample performance of the proposed method. We generate data that follow model (2) with $G_k(x) = \log(1 + \tilde{r}_k x)/\tilde{r}_k$, where $\tilde{r}_k = 0$ or 1 (with the convention $G_k(x) = x$ when $\tilde{r}_k = 0$), corresponding to the proportional hazards or proportional odds model, respectively. We set Λ(t) = 0.5 log(1 + 0.5t), and consider different covariate effects among the three compliance classes by letting $\nu_1 = (\alpha_1, \gamma_1^\top)^\top = (1, 1, 1)^\top$, $\nu_2 = (\beta_2, \gamma_2^\top)^\top = (0.5, 0.5, -0.5)^\top$, and $\nu_3 = (\alpha_3, \gamma_3^\top)^\top = (0.1, 0.3, 0.3)^\top$. The detailed data generation scheme is provided in the Supporting Information (Web Appendix C). Under each configuration, we generate 500 simulated datasets with the sample size n = 500. On average, we have 12% to 15% left-censored observations and 7% to 30% right-censored observations in the always-taker group; 5% to 6% left-censored observations and 38% to 52% right-censored observations in the complier group; and 1% to 4% left-censored observations and 53% to 68% right-censored observations in the never-taker group.
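
The full data generation scheme is given in Web Appendix C; the sketch below generates data broadly consistent with the configuration described above, with the compliance-model coefficients and the visit process chosen as illustrative assumptions rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2021)

def G_inv(y, r):
    """Inverse of G(x) = log(1 + r x) / r, with the convention G(x) = x when r = 0."""
    return y if r == 0 else (np.exp(r * y) - 1.0) / r

def simulate(n=500, r=1):
    X = np.column_stack([rng.binomial(1, 0.5, n), rng.uniform(0, 1, n)])  # assumed covariates
    A = rng.binomial(1, 0.5, n)                                           # randomized IV
    # latent compliance class from model (3); these theta values are illustrative assumptions
    theta1, theta3 = np.array([-0.5, 0.5, -0.5]), np.array([-0.5, -0.5, 0.5])
    Xt = np.column_stack([np.ones(n), X])
    e1, e3 = np.exp(Xt @ theta1), np.exp(Xt @ theta3)
    probs = np.column_stack([e1, np.ones(n), e3]) / (1.0 + e1 + e3)[:, None]
    U = np.array([rng.choice([1, 2, 3], p=p) for p in probs])
    D = np.where(U == 1, 1, np.where(U == 3, 0, A))          # compliers follow the IV
    # event times from model (2) with Lambda(t) = 0.5 * log(1 + 0.5 t)
    nu = {1: np.array([1.0, 1.0, 1.0]),                      # (alpha1, gamma1') as in Section 5
          2: np.array([0.5, 0.5, -0.5]),                     # (beta2, gamma2')
          3: np.array([0.1, 0.3, 0.3])}                      # (alpha3, gamma3')
    z = np.column_stack([np.where(U == 2, D, 1.0), X])       # (D, X) for compliers, (1, X) otherwise
    risk = np.exp(np.sum(z * np.array([nu[u] for u in U]), axis=1))
    Lam_T = G_inv(-np.log(rng.uniform(size=n)), r) / risk    # Lambda(T) from S(T) ~ Uniform(0, 1)
    T = 2.0 * (np.exp(2.0 * Lam_T) - 1.0)                    # invert Lambda(t) = 0.5 log(1 + 0.5 t)
    # interval censoring from an assumed visit process (not the paper's exact scheme)
    L, R = np.zeros(n), np.full(n, np.inf)
    for i in range(n):
        visits = np.cumsum(rng.uniform(0.1, 0.3, size=20))   # roughly 0.2-wide intervals
        before, after = visits[visits < T[i]], visits[visits >= T[i]]
        if before.size:
            L[i] = before.max()
        if after.size:
            R[i] = after.min()
    return X, A, D, U, L, R
```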

When implementing the proposed algorithm, we set the initial value of each regression parameter to 0 and the initial value of each $\lambda_l$ to 1/n. The algorithm is declared convergent when the sum of the absolute differences of the estimates at two successive iterations is less than a small positive number, say $10^{-3}$. Table 1 reports the numerical results on the regression parameter estimation under the proportional hazards model ($\tilde{r}_k = 0$) and the proportional odds model ($\tilde{r}_k = 1$), including the empirical biases (EmpBias), empirical standard errors (EmpSE), average standard error estimates (EstSE), and the empirical coverage probabilities of 95% confidence intervals (CP95) obtained by normal approximations. The results in Table 1 suggest that the proposed method performs reasonably well in finite samples. The empirical biases of all regression parameter estimates are small, the corresponding standard error estimates are close to the empirical standard errors, and the empirical coverage probabilities match the nominal value well.

Table 1.

Simulation results on the regression parameter estimates with the proposed method, including the empirical biases (EmpBias), empirical standard errors (EmpSE), average standard error estimates (EstSE), and the 95% coverage probabilities (CP95).

Results for the survival models

| Parameter | EmpBias (PH) | EmpSE (PH) | EstSE (PH) | CP95 (PH) | EmpBias (PO) | EmpSE (PO) | EstSE (PO) | CP95 (PO) |
|---|---|---|---|---|---|---|---|---|
| $\hat\alpha_1$ | 0.022 | 0.516 | 0.502 | 0.946 | 0.003 | 0.772 | 0.772 | 0.944 |
| $\hat\alpha_3$ | 0.003 | 0.561 | 0.553 | 0.950 | −0.034 | 0.801 | 0.780 | 0.936 |
| $\hat\beta_2$ | −0.008 | 0.234 | 0.223 | 0.938 | −0.009 | 0.353 | 0.344 | 0.944 |
| $\hat\gamma_{11}$ | 0.004 | 0.314 | 0.302 | 0.946 | 0.056 | 0.497 | 0.469 | 0.942 |
| $\hat\gamma_{12}$ | 0.006 | 0.469 | 0.498 | 0.964 | −0.019 | 0.751 | 0.763 | 0.956 |
| $\hat\gamma_{21}$ | 0.020 | 0.222 | 0.230 | 0.969 | −0.017 | 0.387 | 0.362 | 0.934 |
| $\hat\gamma_{22}$ | −0.002 | 0.391 | 0.377 | 0.954 | −0.014 | 0.618 | 0.594 | 0.958 |
| $\hat\gamma_{31}$ | −0.016 | 0.375 | 0.377 | 0.954 | 0.022 | 0.571 | 0.563 | 0.942 |
| $\hat\gamma_{32}$ | −0.045 | 0.638 | 0.627 | 0.940 | −0.004 | 0.878 | 0.861 | 0.940 |

Results for the multinomial logistic model

| Parameter | EmpBias (PH) | EmpSE (PH) | EstSE (PH) | CP95 (PH) | EmpBias (PO) | EmpSE (PO) | EstSE (PO) | CP95 (PO) |
|---|---|---|---|---|---|---|---|---|
| $\hat\theta_{10}$ | −0.013 | 0.440 | 0.421 | 0.936 | 0.029 | 0.416 | 0.420 | 0.961 |
| $\hat\theta_{11}$ | 0.011 | 0.371 | 0.366 | 0.946 | −0.010 | 0.376 | 0.366 | 0.942 |
| $\hat\theta_{12}$ | −0.024 | 0.595 | 0.610 | 0.958 | −0.052 | 0.632 | 0.614 | 0.948 |
| $\hat\theta_{30}$ | −0.004 | 0.376 | 0.393 | 0.964 | 0.030 | 0.380 | 0.396 | 0.962 |
| $\hat\theta_{31}$ | 0.004 | 0.381 | 0.369 | 0.940 | −0.014 | 0.376 | 0.371 | 0.954 |
| $\hat\theta_{32}$ | −0.015 | 0.591 | 0.614 | 0.960 | −0.042 | 0.616 | 0.616 | 0.956 |

PH, proportional hazards; PO, proportional odds.

Next, we suppose that the simulated data arise from a randomized clinical trial with treatment noncompliance, where the IV, A, corresponds to the random treatment assignment and D corresponds to the actual treatment received. Under this scenario, we compare the proposed method to the naive methods, including the as-treated analysis, per-protocol analysis, and intent-to-treat (ITT) analysis, based on the same simulated datasets. To implement these naive approaches, we use the algorithm provided in Zeng et al. (2016). We report in Table 2 the empirical biases (EmpBias) along with the empirical standard errors (EmpSE) obtained by the three naive methods. We use the true values of the model parameters corresponding to the complier group to calculate the empirical biases, i.e. $(\beta_2, \gamma_2^\top) = (0.5, 0.5, -0.5)$. The results in Table 2 show that the treatment and covariate effect estimates from the three naive methods are severely biased for the complier treatment effect of interest. This indicates that ITT, as-treated, and per-protocol analyses may all fail to measure the true causal treatment effect in a randomized study with treatment noncompliance.

Table 2.

Simulation results on the regression parameter estimates based on the three naive methods and the Mid-EM approach, including the empirical biases (EmpBias) and empirical standard errors (EmpSE). $\tilde{r} = 0$ and 1 correspond to the proportional hazards and proportional odds models, respectively.

| $\tilde{r}$ | Parameter | As-treated EmpBias | As-treated EmpSE | Per-protocol EmpBias | Per-protocol EmpSE | ITT EmpBias | ITT EmpSE | Mid-EM parameter | Mid-EM EmpBias | Mid-EM EmpSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | $\hat\beta$ | 0.306 | 0.123 | 0.203 | 0.141 | −0.256 | 0.117 | $\hat\beta_2$ | 0.010 | 0.236 |
| 0 | $\hat\gamma_1$ | 0.120 | 0.109 | 0.089 | 0.127 | 0.193 | 0.108 | $\hat\gamma_{21}$ | 0.024 | 0.235 |
| 0 | $\hat\gamma_2$ | −0.147 | 0.195 | −0.103 | 0.232 | −0.114 | 0.195 | $\hat\gamma_{22}$ | −0.044 | 0.403 |
| 1 | $\hat\beta$ | 0.316 | 0.173 | 0.217 | 0.197 | −0.263 | 0.163 | $\hat\beta_2$ | 0.023 | 0.346 |
| 1 | $\hat\gamma_1$ | 0.126 | 0.170 | 0.088 | 0.205 | 0.246 | 0.167 | $\hat\gamma_{21}$ | −0.014 | 0.371 |
| 1 | $\hat\gamma_2$ | −0.158 | 0.296 | −0.119 | 0.336 | −0.184 | 0.295 | $\hat\gamma_{22}$ | −0.061 | 0.589 |

As suggested by the Associate Editor, we also consider another naive approach, where the interval-censored event times are imputed by the midpoints of the observed intervals and parameter estimates are obtained by maximizing Yu et al. (2015)'s likelihood, which was developed for randomly right censored data. This naive approach is referred to as "Mid-EM" hereafter. From Table 2, we find that the biases yielded by the Mid-EM method are considerably smaller than those from the ITT, as-treated, and per-protocol analyses, but in general slightly larger than those from the proposed method. The bias reduction achieved by the Mid-EM method reflects the utility of the IV in mitigating the bias in the estimation of the causal treatment effect. The small bias of the naive Mid-EM method may be explained by the narrow intervals in the simulated datasets, which have an average width of around 0.2, and by the relatively large right censoring proportion in the complier group, ranging from 38% to 52%. We further examine another simulation setting, where Λ(t) = log(1 + t) but the other configurations remain the same. In this setup, the average width of the generated intervals equals 3 and the length of follow-up equals 15. On average, we have about 6.1% and 24.5% right-censored observations in the complier group under the proportional hazards and proportional odds models, respectively. Simulation results for this setup are presented in Web Table 1 of the Supporting Information. The results in Web Table 1 suggest that the naive Mid-EM method can yield substantially larger bias than the proposed method when the event times are censored by wide intervals. This observation underscores the importance of appropriately accounting for interval censoring in the IV analysis.
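
For reference, the preprocessing step behind the Mid-EM comparator can be sketched as follows; this is a generic illustration of midpoint imputation, not the authors' code.

```python
import numpy as np

def midpoint_impute(L, R):
    """Mid-EM preprocessing: convert interval-censored (L, R] observations into
    right-censored (time, event) pairs by imputing each finite interval's midpoint;
    observations with R = inf remain right censored at their last visit time L."""
    event = np.isfinite(R)
    R_safe = np.where(event, R, 0.0)           # placeholder to avoid arithmetic with inf
    time = np.where(event, (L + R_safe) / 2.0, L)
    return time, event
```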

We also evaluate the estimation of $\mathrm{CES}_P(t \mid X)$ and its unconditional counterpart, $\mathrm{CES}_P(t)$, with X = (1, 0.5) and t = 1, 2, 3, 4, under the proportional hazards modeling and the proportional odds modeling; please see more details in Web Appendix C of the Supporting Information. From the simulation results presented in Web Table 2 of the Supporting Information, we observe that the proposed method yields virtually unbiased estimates of $\mathrm{CES}_P(t \mid X)$ and $\mathrm{CES}_P(t)$ at all considered t's, the standard error estimates agree with the empirical standard deviations, and the empirical coverage probabilities are close to the nominal value. These results suggest good finite-sample performance of the proposed second-stage estimation.

6. A Real Data Application

The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial is a large, multi-center, two-armed, randomized trial, sponsored by the National Cancer Institute and initiated in 1993. Participants who had not participated in any other cancer screening trial were recruited and randomly assigned to either the usual care (i.e. control) group or the flexible sigmoidoscopy screening (i.e. intervention) group. Study participants in the intervention group were offered the screening at baseline and 3 or 5 years later. Prorok et al. (2000) and Andriole et al. (2012) provided more details about this trial.

In our analysis, we focus on the colorectal cancer screening data. A primary interest is to investigate the efficacy of flexible sigmoidoscopy in reducing the risk of colorectal cancer occurrence as compared to the usual care. However, some participants who were randomly assigned to the intervention group did not comply with the assigned intervention, raising an issue of treatment noncompliance. As another notable data complication, the onset time of colorectal cancer was not observed exactly but was only known to lie between two study visits, hence suffering from interval censoring. This study thus renders an IV setting of our interest: the event time of interest T is defined as the time from randomization to the onset of colorectal cancer recorded in years, subject to interval censoring due to intermittent patient follow-up; a reasonable IV is the random treatment assignment, denoted by "Intervention"; and the treatment variable D corresponds to the actual status of receiving sigmoidoscopy at baseline, denoted by "Screening". We consider 12 covariates, which are explained in Web Appendix C and summarized in Web Tables 3, 9, and 10. The reasonably good balance of measured covariates between the randomly assigned intervention and usual care groups supports the validity of the selected IV (see Web Table 10).

We perform the proposed IV analysis to evaluate the complier causal treatment effect of baseline sigmoidoscopy screening. After deleting the subjects with missing covariates, the final sample size is n = 139,325. In the final dataset, the right censoring rate is about 98%. Among the 70,529 individuals who were randomly assigned to the intervention group, about 14.1% did not take the flexible sigmoidoscopy after randomization, and no individuals in the usual-care group received sigmoidoscopy. This may indicate that there are no always-takers in this study, following similar discussions in Yu et al. (2015) and others. We fit the transformation model (2) to the data with $G_k(x) = \log(1 + \tilde{r}_k x)/\tilde{r}_k$ ($\tilde{r}_k \ge 0$; k = 2 or 3) and perform the proposed estimation and inference procedures. We consider various combinations of $(\tilde{r}_2, \tilde{r}_3)$, and select the models associated with $\tilde{r}_2 = \tilde{r}_3 = 1$ (proportional odds models) based on the Akaike information criterion (AIC).
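
The model selection step can be sketched as a simple grid search; `fit_npmle` below is a hypothetical callable wrapping the proposed EM algorithm, assumed to return the maximized log-likelihood and the number of parameters counted in the AIC.

```python
from itertools import product

def select_transformation(data, fit_npmle, r_grid=(0.0, 0.5, 1.0)):
    """Grid search over the transformation parameters (r2, r3) for models (2)-(3),
    keeping the pair that minimizes AIC = 2 * n_par - 2 * loglik."""
    best = None
    for r2, r3 in product(r_grid, repeat=2):
        loglik, n_par = fit_npmle(data, r2=r2, r3=r3)   # hypothetical fitting routine
        aic = 2.0 * n_par - 2.0 * loglik
        if best is None or aic < best[0]:
            best = (aic, r2, r3)
    return best   # (smallest AIC, selected r2, selected r3)
```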

Under the proportional odds models, Table 3 presents the parameter estimates (EST), the standard error estimates (SE), and the associated p-values for the complier subgroup. The estimated coefficient for Screening is −0.302, meaning that for compliers, the odds of colorectal cancer occurrence with flexible sigmoidoscopy screening is estimated to be 73.9% (= exp(−0.302)) of that without the screening. This, combined with the p-value < 0.001, suggests that for compliers, flexible sigmoidoscopy results in a significantly lower risk of developing colorectal cancer compared to usual care, supporting the beneficial effect of flexible sigmoidoscopy screening. In addition, the results in Table 3 reveal that being older, African American, male, having education below a college degree, having immediate family with any PLCO cancer, having immediate family with colorectal cancer, and having diabetes can increase the risk of developing colorectal cancer.

Table 3.

Results from the colorectal cancer screening data analysis under the proportional odds model: the proposed estimates of the complier treatment effect and covariate effects, and the treatment and covariate effect estimates based on the naive methods, with the standard errors and the corresponding p-values.

| Covariate | EST (Proposed, compliers) | SE | p-value | EST (As-treated) | SE | p-value | EST (Per-protocol) | SE | p-value | EST (ITT) | SE | p-value | EST (Mid-EM) | SE | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Screening | −0.302 | 0.050 | < 0.001 | −0.247 | 0.045 | < 0.001 | −0.270 | 0.045 | < 0.001 | −0.259 | 0.044 | < 0.001 | −0.298 | 0.050 | < 0.001 |
| Age | 0.345 | 0.029 | < 0.001 | 0.317 | 0.025 | < 0.001 | 0.329 | 0.025 | < 0.001 | 0.318 | 0.025 | < 0.001 | 0.353 | 0.027 | < 0.001 |
| Race | 0.244 | 0.108 | 0.024 | 0.182 | 0.093 | 0.050 | 0.186 | 0.093 | 0.046 | 0.191 | 0.093 | 0.040 | 0.266 | 0.104 | 0.010 |
| Sex | 0.405 | 0.053 | < 0.001 | 0.395 | 0.046 | < 0.001 | 0.404 | 0.046 | < 0.001 | 0.387 | 0.046 | < 0.001 | 0.417 | 0.058 | < 0.001 |
| Education | −0.163 | 0.050 | 0.001 | −0.170 | 0.044 | < 0.001 | −0.165 | 0.044 | < 0.001 | −0.172 | 0.044 | < 0.001 | −0.167 | 0.047 | < 0.001 |
| Employment | 0.011 | 0.060 | 0.857 | 0.026 | 0.053 | 0.624 | 0.022 | 0.053 | 0.679 | 0.026 | 0.053 | 0.619 | 0.010 | 0.051 | 0.849 |
| Cancer | 0.124 | 0.054 | 0.021 | 0.087 | 0.047 | 0.065 | 0.106 | 0.047 | 0.024 | 0.084 | 0.047 | 0.072 | 0.125 | 0.050 | 0.013 |
| ColoCancer | 0.231 | 0.077 | 0.003 | 0.206 | 0.070 | 0.003 | 0.221 | 0.070 | 0.002 | 0.205 | 0.070 | 0.003 | 0.231 | 0.075 | 0.002 |
| Diabetes | 0.193 | 0.087 | 0.027 | 0.226 | 0.072 | 0.002 | 0.223 | 0.072 | 0.002 | 0.230 | 0.072 | 0.002 | 0.209 | 0.086 | 0.015 |
| Strokes | 0.064 | 0.155 | 0.678 | −0.005 | 0.133 | 0.972 | 0.021 | 0.133 | 0.873 | 0.003 | 0.133 | 0.981 | 0.094 | 0.163 | 0.563 |
| Heart | −0.117 | 0.085 | 0.170 | −0.129 | 0.074 | 0.084 | −0.132 | 0.074 | 0.076 | −0.125 | 0.075 | 0.095 | −0.091 | 0.086 | 0.286 |
| Gallblad | 0.140 | 0.077 | 0.069 | 0.117 | 0.068 | 0.086 | 0.129 | 0.068 | 0.058 | 0.117 | 0.068 | 0.085 | 0.138 | 0.072 | 0.056 |
| ColorectalPolyps | 0.076 | 0.091 | 0.406 | 0.086 | 0.080 | 0.280 | 0.081 | 0.080 | 0.310 | 0.089 | 0.080 | 0.264 | 0.074 | 0.080 | 0.353 |

In Table 3, we also include the coefficient estimates along with the standard error estimates obtained from the naive methods (as-treated, per-protocol, and ITT) under the proportional odds models, implemented with the algorithm of Zeng et al. (2016). We note that the magnitude of the estimated screening effect based on each naive method is smaller than that produced by the proposed method. In addition, unlike the proposed analysis, which reveals a sensible positive association between colorectal cancer risk and immediate family history of PLCO cancer, the as-treated and ITT analyses suggest that having immediate family with PLCO cancer has little impact on the development of colorectal cancer. These discrepancies may be caused by the bias induced by endogenous treatment selection. That is, study participants with an immediate family history of PLCO cancer, as compared to those without, may be more likely to comply with the assigned screening (as confirmed by the results in Web Table 4 of the Supporting Information). Consequently, the subset of study participants who were assigned to the intervention and complied with it may over-represent individuals with a family history of PLCO cancer. As having immediate family with PLCO cancer may be linked to a higher risk of colorectal cancer, the ITT or as-treated analysis, which does not properly accommodate such non-random treatment compliance, may thus yield attenuated estimates of the treatment effect and of the effect of PLCO cancer family history.

We also performed the Mid-EM analysis, which transforms the interval-censored data into right censored data by imputing the interval-censored observations with the midpoints of their intervals. The results are presented in the rightmost panel of Table 3. We note that the findings based on the Mid-EM analysis are similar to those obtained by the proposed method. This similarity can be explained by the low interval censoring rate of 2% and the high right censoring rate of 98% in this cancer screening dataset.

In Web Appendix C of the Supporting Information, we present and discuss the estimation results for model (1) given never-takers and for model (3) (see Web Table 4 of the Supporting Information), as well as the results obtained under proportional hazards models (see Web Tables 5–6 of the Supporting Information). The results suggest that being male, having higher education, having immediate family with any PLCO cancer, and having immediate family with colorectal cancer are associated with a significantly higher probability of complying with the randomized treatment assignment. Adopting proportional hazards models or proportional odds models leads to very similar findings about the effect of flexible sigmoidoscopy screening.

We also assess the complier causal difference in the survival probability (i.e. the probability of remaining free of colorectal cancer) between the sigmoidoscopy screening and the usual care, characterized by $\mathrm{CES}_P(t \mid X)$ and $\mathrm{CES}_P(t)$ for t = 1, 2, 3, or 4 and X = (50, 0, …, 0). The results in Web Table 7 further support the beneficial effect of flexible sigmoidoscopy screening for compliers. For example, the estimate of $\mathrm{CES}_P(2)$ is about 0.46%. Note that colorectal cancer is a rare disease, with a lifetime risk of 4.3% in men and 4.0% in women. The incidence rate for individuals over 55 years old is even lower, estimated to be below 2.5% (https://www.cancer.org). Hence, an improvement of 0.46% in the colorectal cancer free probability roughly translates to at least an 18.4% (= 0.46/2.5) reduction in cancer incidence, which may be clinically relevant. The results in Web Table 7 provide an alternative, sensible view for interpreting and understanding the benefit of receiving flexible sigmoidoscopy screening for compliers.

Data Availability Statement

The data that support the findings in this paper are available upon request to Cancer Data Access System (CDAS) at https://prevention.cancer.gov/major-programs/prostate-lung-colorectal-and-ovarian-cancer-screening-trial/cancer-data-access-system.


Acknowledgements

The authors thank the editor, associate editor and two reviewers for their very helpful comments and suggestions that improved the presentation of the paper. This research was supported by the National Institutes of Health grant R01 HL113548 (to Peng) and was partly supported by the National Natural Science Foundation of China grant 11901128 (to Li).

Footnotes

Supporting Information

Web Appendices, Tables and Figures referenced in Sections 3–6 and R code for implementing the proposed method are available with this paper at the Biometrics website on Wiley Online Library.

References

  1. Abadie A, Angrist J, and Imbens G (2002). Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70, 91–117.
  2. Andriole G, Crawford E, Grubb R, Buys S, Chia D, Church T, et al. (2012). Prostate cancer screening in the randomized Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial: mortality results after 13 years of follow-up. Journal of the National Cancer Institute 104, 125–132.
  3. Angrist JD, Imbens GW, and Rubin DB (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
  4. Baiocchi M, Cheng J, and Small DS (2014). Instrumental variable methods for causal inference. Statistics in Medicine 33, 2297–2340.
  5. Baker SG (1998). Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 93, 929–934.
  6. Cheng J, Qin J, and Zhang B (2009). Semiparametric estimation and inference for distributional and general treatment effects. Journal of the Royal Statistical Society: Series B 71, 881–904.
  7. Cheng J, Small DS, Tan Z, and Ten Have TR (2009). Efficient nonparametric estimation of causal effects in randomized trials with noncompliance. Biometrika 96, 19–36.
  8. Cuzick J, Sasieni P, Myles J, and Tyrer J (2007). Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. Journal of the Royal Statistical Society: Series B 69, 565–588.
  9. Gao F, Zeng D, Couper D, and Lin DY (2019). Semiparametric regression analysis of multiple right- and interval-censored events. Journal of the American Statistical Association 114, 1232–1240.
  10. Imbens GW and Rubin DB (1997a). Bayesian inference for causal effects in randomized experiments with noncompliance. The Annals of Statistics 25, 305–327.
  11. Imbens GW and Rubin DB (1997b). Estimating outcome distributions for compliers in instrumental variables models. Review of Economic Studies 64, 555–574.
  12. Jewell NP and van der Laan M (2004). Case-control current status data. Biometrika 91, 529–541.
  13. Kianian B, Kim JI, Fine JP, and Peng L (2021). Causal proportional hazards estimation with a binary instrumental variable. Statistica Sinica 31, 1–27.
  14. Li S and Gray RJ (2016). Estimating treatment effect in a proportional hazards model in randomized clinical trials with all-or-nothing compliance. Biometrics 72, 742–750.
  15. Lin H, Li Y, Jiang H, and Li G (2014). A semiparametric linear transformation model to estimate causal effects for survival data. The Canadian Journal of Statistics 42, 18–35.
  16. Loeys T and Goetghebeur E (2003). A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics 59, 100–105.
  17. Ma S and Kosorok MR (2005). Penalized log-likelihood estimation for partly linear transformation models with current status data. The Annals of Statistics 33, 2256–2290.
  18. Martinussen T and Vansteelandt S (2013). On collapsibility and confounding bias in Cox and Aalen regression models. Lifetime Data Analysis 19, 279–296.
  19. Nie H, Cheng J, and Small DS (2011). Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics 67, 1397–1405.
  20. Prorok PC, Andriole GL, Bresalier RS, Buys SS, Chia D, Crawford ED, et al. (2000). Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Controlled Clinical Trials 21, 273S–309S.
  21. Schwartz SL, Li F, and Mealli F (2011). A Bayesian semiparametric approach to intermediate variables in causal inference. Journal of the American Statistical Association 106, 1331–1344.
  22. Sun J (2006). The Statistical Analysis of Interval-Censored Failure Time Data. New York: Springer.
  23. Wang L, McMahan CS, Hudgens MG, and Qureshi ZP (2016). A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72, 222–231.
  24. Wang L, Tchetgen Tchetgen E, Martinussen T, and Vansteelandt S (2018). Learning causal hazard ratio with endogeneity. http://arxiv.org/abs/1807.05313.
  25. Yu W, Chen K, Sobel ME, and Ying Z (2015). Semiparametric transformation models for causal inference in time-to-event studies with all-or-nothing compliance. Journal of the Royal Statistical Society: Series B 77, 397–415.
  26. Zeng D, Gao F, and Lin DY (2017). Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104, 505–525.
  27. Zeng D, Mao L, and Lin DY (2016). Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103, 253–271.
