Published in final edited form as: Can J Stat. 2013 Nov 14;42(1):18–35. doi: 10.1002/cjs.11198

A semiparametric linear transformation model to estimate causal effects for survival data

Huazhen LIN 1,*, Yi LI 1, Liang JIANG 2, Gang LI 3

Abstract

Semiparametric linear transformation models serve as useful alternatives to the Cox proportional hazards model. In this study, we use the semiparametric linear transformation model to analyze survival data with selective compliance. We estimate the regression parameters and the transformation function based on a pseudo-likelihood and a series of estimating equations. We show that the estimators of the regression parameters and the transformation function are consistent and asymptotically normal, and that both converge to their true values at the rate of n^{-1/2}, the convergence rate expected for a parametric model. The practical utility of the methods is confirmed via simulations and an application to a clinical trial evaluating the effectiveness of sentinel node biopsy in guiding the treatment of invasive melanoma.

Keywords: Cox proportional hazard model, selective compliance, semiparametric linear transformation model, survival data

1. INTRODUCTION

In controlled randomized trials, subjects are generally assigned a treatment regime. However, these subjects rarely exhibit perfect compliance with the treatment. Noncompliance typically renders intention-to-treat (ITT) estimators, which mix the effects of treatment in compliers with a lack of effect in noncompliers, incapable of measuring actual biological efficacy. On the other hand, the dangers of excluding noncompliant patients are well recognized (Altman, 1991). Such exclusion may result in biases because the subsets of compliers and noncompliers have different prognoses at baseline. Thus, the question about the causal effect of the treatment that is actually received becomes difficult to answer.

To handle survival outcomes with selective compliance, Frangakis & Rubin (1999), Loeys & Goetghebeur (2002), and Elashoff, Li, & Zhou (2012) proposed general nonparametric procedures to estimate the causal effect of treatment for data without covariates. For data with covariates, Robins & Tsiatis (1991), Altstein, Li, & Elashoff (2011), and Altstein & Li (2013) proposed accelerated failure time frameworks to quantify the causal effect of treatment. The most widely used regression method for analyzing survival data is the proportional hazards model (Cox, 1972), and work in this direction has been conducted by Robins & Finkelstein (2000), Hernan, Brumback, & Robins (2001), Loeys & Goetghebeur (2003), Loeys, Goetghebeur, & Vandebosch (2005), and Cuzick et al. (2007).

In this paper, we consider a semiparametric linear transformation model in which the transformation of the survival time is unspecified but the distribution of the error term is known. This class of regression models, which includes the proportional hazards (Cox, 1972) and proportional odds (Bennett, 1983) models, serves as a useful alternative to the Cox model in survival analysis. Pettitt (1982), Bennett (1983), Clayton & Cuzick (1985), Dabrowska & Doksum (1988), Cheng, Wei, & Ying (1995), Murphy, Rossini, & van der Vaart (1997), Scharfstein, Tsiatis, & Gilbert (1998), Cai, Wei, & Wilcox (2000), Cai, Cheng, & Wei (2002), Chen, Jin, & Ying (2002), and Zeng & Lin (2006), among others, proposed estimators for this class of linear transformation models for the analysis of survival data without selective compliance.

Extending the partial likelihood method used in the Cox proportional hazards model to a general transformation model is difficult, if not impossible. In this paper, we propose a two-stage estimation procedure to estimate the parameters and the transformation function in the semiparametric linear transformation model for survival data with selective compliance. The proposed estimator is shown to be consistent and asymptotically normal. The estimator is also easy to compute because the likelihood function need not be maximized over an infinite-dimensional parameter space, and our approach to estimating the transformation function does not involve nonparametric smoothing. Thus, our estimator does not suffer from smoothing-related problems, such as the selection of a smoothing parameter.

The article is organized as follows. A detailed description of the aforementioned models is given in Section 2. A two-stage estimation procedure is described in Section 3. The asymptotic properties are derived in Section 4. An estimator of the variance is provided in Section 5. Section 6 contains the simulation results, and the application to a study, the Multicenter Selective Lymphadenectomy Trial-I (MSLT-I), is detailed in Section 7. Section 8 provides the concluding remarks.

2. NOTATIONS AND MODEL

Consider a trial where n independent subjects are randomly assigned to an experimental treatment (Ri = 1) or a control group (Ri = 0). Let Ti and Ci denote the survival and censoring times, respectively; let Wi = min(Ti, Ci) denote the observation time and δi the censoring indicator. We assume uninformative censoring. However, not all randomized subjects receive their assigned treatment. The trial population is generally considered as being composed of three classes of individuals: (a) compliers, subjects who accept whatever allocation they are offered; (b) always takers, subjects who demand the treatment regardless of the group to which they are randomized; and (c) never takers, subjects who refuse the treatment if and when it is offered. Let Ui be the indicator of the potential class for subject i. By convention, the values Ui = 1, 2, 3 refer to compliers, always takers, and never takers, respectively. Statistical difficulties arise because determining the class to which an individual belongs is not always possible. The ITT principle contrasts Pr(Ti > t | Ri = 1) with Pr(Ti > t | Ri = 0) but does not estimate the causal effect of a treatment because a number of patients in the control and treatment groups do not comply for reasons related to the outcome.

Let Ei be the indicator of the treatment that is actually received by subject i. Ei = 1 denotes that subject i actually received treatment; otherwise, Ei = 0. Hence, Ei = RiI(Ui ≤ 2) + (1 − Ri)I(Ui = 2). Ui is not completely observed, but Ei is observable. Our goal is to identify the population averaged causal effect of treatment with adjustment for covariates. Specifically, given that Ui = 2 implies Ei = 1 and Ui = 3 implies Ei = 0, we contrast the survival probability in the treated group Pr(Ti > t | Ei = 1, Ui = 1, Xi = x) with its subpopulation-specific counterpart in the control group Pr(Ti > t | Ei = 0, Ui = 1, Xi = x). We assume that a monotone function H exists, such that the survival time T can be modeled as follows:

H(T_i) = c_u + \alpha_u E_i + \beta' X_i + \varepsilon_i, \quad \text{for } u \in \{1, 2, 3\}, \qquad (1)

where H is the unknown monotonic transformation function; α1 is the unknown parameter of interest used to measure the causal effect; Xi is the covariate vector measured prior to randomization; and εi is a random variable with mean zero, variance σ2, and a known distribution F, which is independent of (Ei, Xi). For model identification, the conditions α2 = α3 = 0 are required because Ui = 2 implies Ei = 1 and Ui = 3 implies Ei = 0. The condition σ2 = 1 can be assumed without loss of generality because model (1) holds if H is replaced by cH for any positive constant c. Likewise, the condition H(a0) = b0 is required because model (1) holds if H is replaced by H − c for any constant c, where a0 and b0 are any given finite numbers. In practice, we can take a0 as the median of the observed failure times among the n observations to ensure that a0 is within the support of H, so that H is completely identified. The choices of a0 and b0 have a negligible effect on the resulting estimators; a sensitivity simulation study has been conducted by Lin, Zhou, & Zhou (2013) for a more general transformation model for longitudinal data.

For survival data without selective compliance, model (1) has been extensively studied. The proportional hazards and proportional odds models are special cases of (1), with εi following the extreme-value distribution and the standard logistic distribution, respectively. Thus, the Cox proportional hazards models used by Hernan, Brumback, & Robins (2001) and Cuzick et al. (2007) to analyze the causal effect of treatment for data with covariates, and the model considered by Loeys & Goetghebeur (2003) to estimate the biological efficacy without covariates, are special cases of model (1).

As shown in the literature, an implicit assumption in model (1) is the "exclusion restriction" used by Angrist, Imbens, & Rubin (1996), also termed the "absence of indirect effect" by Pearl (2002), which states that potential outcomes are unrelated to treatment assignment, that is, Pr(Ri = 1 | Ti, Ui) = Pr(Ri = 1 | Ui).

Considering that different characteristics, such as disease status and prognostic factors, may affect the compliance status of subjects, a regression model for compliance status is required. We model the subgroup probability by using the logistic regression model

\Pr(U_i = j \mid Z_i) = \frac{\exp(\theta_j + \kappa_j' Z_i)}{1 + \sum_{u=1}^{2} \exp(\theta_u + \kappa_u' Z_i)}, \quad \text{for } j = 1, 2, \qquad (2)

where Zi is the covariate vector, possibly different from Xi. Thus, the subgroup probability depends on the subject's covariates but not on the randomization indicator Ri. Our simulation studies show that the choice of the subgroup probability model (2) has little effect on the properties of the resulting estimators.

Although Ui is not completely observed, the survival distribution of compliers in the control and treatment groups can be identified by using the exclusion-restriction principle. First, we note that only four groups can be observed from the data: always takers and compliers in the treatment group (Ri = 1, Ui ≤ 2); never takers in the treatment group (Ri = 1, Ui = 3); always takers in the control group (Ri = 0, Ui = 2); and compliers and never takers in the control group (Ri = 0, Ui ≠ 2). Because Ui and Ri are independent given Zi, p2(z) = P(Ui = 2 | Zi = z) and p3(z) = P(Ui = 3 | Zi = z) can be consistently estimated by maximizing the following likelihood function,

\prod_{i=1}^{n} p_2(Z_i)^{(1-R_i) I(U_i = 2)}\, p_3(Z_i)^{R_i I(U_i = 3)}\, \{1 - p_3(Z_i)\}^{R_i I(U_i \le 2)}\, \{1 - p_2(Z_i)\}^{(1-R_i) I(U_i \ne 2)},

and so can p1(z) = P(Ui = 1 | Zi = z). Furthermore, define Sr(t | x, z) = Pr(Ti > t | Ri = r, Xi = x, Zi = z) and let Sru(t | x, z) = Pr(Ti > t | Ri = r, Ui = u, Xi = x, Zi = z). Under the exclusion restriction, S12(t | x, z) = S02(t | x, z) and S13(t | x, z) = S03(t | x, z), and then

S_0(t \mid x, z) = p_1(z)\, S_{01}(t \mid x, z) + p_2(z)\, S_{02}(t \mid x, z) + \{1 - p_1(z) - p_2(z)\}\, S_{13}(t \mid x, z),
S_1(t \mid x, z) = p_1(z)\, S_{11}(t \mid x, z) + p_2(z)\, S_{02}(t \mid x, z) + \{1 - p_1(z) - p_2(z)\}\, S_{13}(t \mid x, z).

Because p1(z), p2(z), S0(t | x, z), S1(t | x, z), S02(t | x, z), and S13(t | x, z) can be estimated consistently, so can S01(t | x, z) and S11(t | x, z).
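
For instance, rearranging the first display gives

S_{01}(t \mid x, z) = \frac{S_0(t \mid x, z) - p_2(z)\, S_{02}(t \mid x, z) - \{1 - p_1(z) - p_2(z)\}\, S_{13}(t \mid x, z)}{p_1(z)},

and S_{11}(t | x, z) follows analogously from the second display; replacing each quantity on the right-hand side by its consistent estimator yields consistent estimators of S_{01}(t | x, z) and S_{11}(t | x, z).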

3. ESTIMATION

3.1. Estimation of the Parameters

Let Θ = (α1, c′, β′, θ1, θ2, κ1′, κ2′)′ ≡ (Θ1, …, Θd)′, where c = (c1, c2, c3)′. Here Θ and H are the finite-dimensional parameter and the nonparametric function defined by models (1) and (2). Under model (1), noting that H is a monotonic function and that the subgroup probability is independent of the treatment assignment Ri, the likelihood for the observed data can be expressed as

L(\Theta; H) \propto \prod_{i=1}^{n} \{dH(W_i)\}^{\delta_i}
\times \Big( \big[ f\{H(W_i) - c_1 - \alpha_1 - \beta' X_i\}\, p_{1i}(\Theta) + f\{H(W_i) - c_2 - \beta' X_i\}\, p_{2i}(\Theta) \big]^{\delta_i}
\quad \times \big[ S\{H(W_i) - c_1 - \alpha_1 - \beta' X_i\}\, p_{1i}(\Theta) + S\{H(W_i) - c_2 - \beta' X_i\}\, p_{2i}(\Theta) \big]^{1 - \delta_i} \Big)^{R_i I(U_i \le 2)}
\times \Big( \big[ f\{H(W_i) - c_3 - \beta' X_i\} \big]^{\delta_i} \big[ S\{H(W_i) - c_3 - \beta' X_i\} \big]^{1 - \delta_i}\, p_{3i}(\Theta) \Big)^{R_i I(U_i = 3)}
\times \Big( \big[ f\{H(W_i) - c_2 - \beta' X_i\} \big]^{\delta_i} \big[ S\{H(W_i) - c_2 - \beta' X_i\} \big]^{1 - \delta_i}\, p_{2i}(\Theta) \Big)^{(1 - R_i) I(U_i = 2)}
\times \Big( \big[ f\{H(W_i) - c_1 - \beta' X_i\}\, p_{1i}(\Theta) + f\{H(W_i) - c_3 - \beta' X_i\}\, p_{3i}(\Theta) \big]^{\delta_i}
\quad \times \big[ S\{H(W_i) - c_1 - \beta' X_i\}\, p_{1i}(\Theta) + S\{H(W_i) - c_3 - \beta' X_i\}\, p_{3i}(\Theta) \big]^{1 - \delta_i} \Big)^{(1 - R_i) I(U_i \ne 2)}, \qquad (3)

where f is the density function of F, S(x) = 1 − F(x), p_{1i}(\Theta) = \exp(\theta_1 + \kappa_1' Z_i)/\{1 + \sum_{u=1}^{2} \exp(\theta_u + \kappa_u' Z_i)\}, p_{2i}(\Theta) = \exp(\theta_2 + \kappa_2' Z_i)/\{1 + \sum_{u=1}^{2} \exp(\theta_u + \kappa_u' Z_i)\}, and p_{3i}(\Theta) = 1 − p_{1i}(\Theta) − p_{2i}(\Theta). The likelihood function involves both the finite-dimensional parameter Θ and the infinite-dimensional parameter H. Maximization of the likelihood function over an infinite-dimensional parameter space can be complicated, especially when the objective function is as complicated as the one described in this paper. In fact, even for the simple case of survival data without selective compliance, the computation of the nonparametric maximum likelihood estimator for the transformation is difficult (Zeng & Lin, 2006). In addition, the nonparametric maximum likelihood estimator of H is not guaranteed to be monotonic. In this article, we use a two-stage approach; our estimator for the transformation function is monotone and the computation is easy because maximization of the likelihood function over an infinite-dimensional parameter space is avoided. First, we use a series of estimating equations described in Section 3.2 to estimate the transformation function. Then, the parameter Θ is estimated by maximizing a pseudo-likelihood, which is the likelihood function L(Θ; H) with H replaced by its estimated values. We repeat the procedure until convergence.

3.2. Estimation of the Transformation Functions

In this section, we estimate the transformation function H given Θ. Throughout this article, let 0 < t1 < ··· < tK denote the observed K failure times among the n observations. Denote

\Lambda_i(x; \Theta) = -R_i I(U_i \le 2) \log\left[ \frac{S(x - c_1 - \alpha_1 - \beta' X_i)\, p_{1i}(\Theta) + S(x - c_2 - \beta' X_i)\, p_{2i}(\Theta)}{p_{1i}(\Theta) + p_{2i}(\Theta)} \right]
\quad - R_i I(U_i = 3) \log\{S(x - c_3 - \beta' X_i)\} - (1 - R_i) I(U_i = 2) \log\{S(x - c_2 - \beta' X_i)\}
\quad - (1 - R_i) I(U_i \ne 2) \log\left[ \frac{S(x - c_1 - \beta' X_i)\, p_{1i}(\Theta) + S(x - c_3 - \beta' X_i)\, p_{3i}(\Theta)}{1 - p_{2i}(\Theta)} \right].

Using the usual counting process notation, let Y_i(t) = I(W_i \ge t), N_i(t) = \delta_i I(W_i \le t), and M_i(t) = N_i(t) - \int_0^t Y_i(s)\, d\Lambda_i(H_0(s); \Theta_0). Motivated by the fact that M_i(t) is a martingale process, we consider the following estimating equation to estimate H(t) for any t ≥ 0,

\sum_{i=1}^{n} \{ dN_i(t) - Y_i(t)\, d\Lambda_i(H(t); \Theta) \} = 0, \qquad (4)

where H(0) = −∞. We denote by Ĥ(t; Θ) the solution of (4) with respect to H(t) given Θ. It is clear that Ĥ(·; Θ) is a nondecreasing step function on [0, ∞) with Ĥ(0) = −∞ and with jumps only at the observed failure times t1, …, tK. Thus, solving the infinite system of estimating equations defined by (4) reduces to solving a finite system of equations at t1, …, tK. Denote λi(x; Θ) = ∂Λi(x; Θ)/∂x. Following the discussions in Chen, Jin, & Ying (2002) and using H(0) = −∞ and (4), we propose to estimate H(t1) by solving

\sum_{i=1}^{n} \{ dN_i(t_1) - Y_i(t_1)\, \Lambda_i(H(t_1); \Theta) \} = 0,

and obtain Ĥ(tk) for k = 2, …, K, one by one, from the recursion

\hat{H}(t_k) = \hat{H}(t_{k-1}) + \frac{\sum_{i=1}^{n} dN_i(t_k)}{\sum_{i=1}^{n} Y_i(t_k)\, \lambda_i(\hat{H}(t_{k-1}); \Theta)}.

The computation of Ĥ(t) is straightforward. In addition, unlike traditional nonparametric approaches to estimating the transformation function (Horowitz, 1996; Zhou, Lin, & Johnson, 2009), our approach does not involve nonparametric smoothing.
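
To make the recursion concrete, the following minimal sketch computes Ĥ at the observed failure times for a given Θ, assuming the extreme-value error (r = 0, the proportional hazards case) and obtaining λ_i by numerically differentiating Λ_i; the function and variable names are illustrative and this is not the authors' code.

```python
import numpy as np
from scipy.optimize import brentq

def S(x):
    # Survival function of the error for the extreme-value case (r = 0): S(x) = exp(-exp(x)).
    # This distributional choice is an assumption made only to keep the sketch concrete.
    return np.exp(-np.exp(x))

def Lambda_i(x, r, e, xb, alpha1, c, p, tiny=1e-12):
    # Lambda_i(x; Theta) of Section 3.2 for one subject, with the observed group coded
    # by (r, e): r is the randomization arm and e the treatment actually received;
    # xb = beta'X_i, c = (c1, c2, c3), p = (p1i, p2i, p3i).
    if r == 1 and e == 1:   # compliers and always takers in the treatment arm
        num = S(x - c[0] - alpha1 - xb) * p[0] + S(x - c[1] - xb) * p[1]
        return -np.log(num / (p[0] + p[1]) + tiny)
    if r == 1 and e == 0:   # never takers in the treatment arm
        return -np.log(S(x - c[2] - xb) + tiny)
    if r == 0 and e == 1:   # always takers in the control arm
        return -np.log(S(x - c[1] - xb) + tiny)
    num = S(x - c[0] - xb) * p[0] + S(x - c[2] - xb) * p[2]
    return -np.log(num / (1.0 - p[1]) + tiny)   # compliers and never takers, control arm

def lambda_i(x, *args):
    # lambda_i(x; Theta) = dLambda_i(x; Theta)/dx, here by numerical differentiation.
    h = 1e-5
    return (Lambda_i(x + h, *args) - Lambda_i(x - h, *args)) / (2.0 * h)

def estimate_H(W, delta, R, E, Xb, alpha1, c, p):
    # Recursive estimator of H at the K ordered observed failure times, for a given Theta.
    t = np.sort(np.unique(W[delta == 1]))
    H = np.empty(len(t))
    risk = np.where(W >= t[0])[0]
    dN = np.sum((W == t[0]) & (delta == 1))
    # H(t_1) solves sum_i {dN_i(t_1) - Y_i(t_1) Lambda_i(H(t_1); Theta)} = 0
    H[0] = brentq(lambda x: dN - sum(Lambda_i(x, R[i], E[i], Xb[i], alpha1, c, p[i])
                                     for i in risk), -20.0, 20.0)
    for k in range(1, len(t)):
        risk = np.where(W >= t[k])[0]
        dN = np.sum((W == t[k]) & (delta == 1))
        denom = sum(lambda_i(H[k - 1], R[i], E[i], Xb[i], alpha1, c, p[i]) for i in risk)
        H[k] = H[k - 1] + dN / denom   # one-by-one update of Section 3.2
    return t, H
```

In the full two-stage procedure, this update of Ĥ alternates with maximization of the pseudo-likelihood L(Θ; Ĥ) over Θ until convergence.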

4. INFERENCE IN LARGE SAMPLES

In this section, we present the large sample properties of the proposed estimators. Let Θ̂ and Ĥ(t) denote the estimators of Θ and H(t), respectively; and let Θ0 and H0 be the true values of Θ and H, respectively. Some notation and regularity conditions are needed. Let τ = inf{t : Pr(Wi > t) = 0} and ψ(t) = (∂/∂t) log{f(t)/S(t)}. We assume that f(t)/S(t) > 0, that ψ(t) is continuous, and that limt→−∞ f(t)/S(t) = limt→−∞ ψ(t) = 0. Furthermore, Xi and Zi are bounded, and H0 has continuous and positive derivatives. Define λi(x) = λi(x; Θ0),

\dot{\lambda}_i(x) = \frac{d\lambda_i(x)}{dx}, \qquad \lambda^*(H_0(t)) = \exp\left[ \int_0^t \frac{E\{Y_i(x)\, \dot{\lambda}_i(H_0(x))\}\, dH_0(x)}{E\{Y_i(x)\, \lambda_i(H_0(x))\}} \right],
\Sigma = E\left\{ \frac{\partial^2 \log L_i(\Theta_0; H_0)}{\partial \Theta\, \partial \Theta'} - \frac{\partial^2 \log L_i(\Theta_0; H_0)}{\partial \Theta\, \partial H(W_i)} \frac{1}{\lambda^*\{H_0(W_i)\}} \int_0^\tau \frac{Y_i(s)\, \lambda^*\{H_0(s)\}}{E[Y_j(s)\, \lambda_j(H_0(s))]}\, E\left[ Y_j(s)\, \frac{\partial \lambda_j(H_0(s); \Theta_0)}{\partial \Theta'} \right] dH_0(s) \right\},
\Delta = E\left\{ \frac{\partial \log L_i(\Theta_0; H_0)}{\partial \Theta} + \int_0^\tau E\left[ \frac{Y_j(s)}{\lambda^*(H_0(W_j))} \frac{\partial^2 \log L_j(\Theta_0; H_0)}{\partial \Theta\, \partial H(W_j)} \right] \frac{\lambda^*(H_0(s))}{E[Y_j(s)\, \lambda_j(H_0(s))]}\, dM_i(s) \right\}^{\otimes 2},

where Li(Θ; H) is the contribution of subject i to the likelihood L(Θ; H) and a⊗2 = aaˊ for any vector a. Assume that Σ and Δ are finite and nondegenerate. Some additional technical conditions are presented in the Appendix.

Theorem 1.

As n1 = ∑_{i=1}^{n} Ri → ∞ and n2 = ∑_{i=1}^{n} (1 − Ri) → ∞, we have

|\hat{\Theta} - \Theta_0| \to 0, \qquad \sup_{t \in [a, \tau]} |\hat{H}(t) - H_0(t)| \to 0

in probability for any fixed a ∈ (0, τ].

Theorem 2.

As n1 → ∞ and n2 → ∞, we have

\sqrt{n}\,(\hat{\Theta} - \Theta_0) \to N\big(0,\, \Sigma^{-1} \Delta (\Sigma^{-1})'\big). \qquad (5)

Theorem 3.

As n1 → ∞ and n2 → ∞, for any t ∈ (0, τ), we have

\sqrt{n}\,(\hat{H}(t) - H_0(t)) \to N\left(0,\, \frac{\Gamma(t)}{\lambda^{*2}(H_0(t))}\right),

where

\Gamma(t) = E\left\{ \mu(t)' \Sigma^{-1} \frac{\partial \log L_i(\Theta_0; H_0)}{\partial \Theta} + \mu(t)' \Sigma^{-1} \int_0^\tau E\left[ \frac{Y_j(s)}{\lambda^*(H_0(W_j))} \frac{\partial^2 \log L_j(\Theta_0; H_0)}{\partial \Theta\, \partial H(W_j)} \right] \frac{\lambda^*(H_0(s))}{E[Y_j(s)\, \lambda_j(H_0(s))]}\, dM_i(s) + \int_0^t \frac{\lambda^*(H_0(s))}{E[Y_j(s)\, \lambda_j(H_0(s))]}\, dM_i(s) \right\}^2,

and \mu(t) = \int_0^t \frac{\lambda^*\{H_0(s)\}}{E[Y_j(s)\, \lambda_j(H_0(s))]}\, E\left[ Y_j(s)\, \frac{\partial \lambda_j(H_0(s); \Theta_0)}{\partial \Theta} \right] dH_0(s).

From Theorem 3, we see that Ĥ(t) converges to H(t) at the rate of n^{-1/2}; this result shows that the nonparametric function H(·) has a parametric convergence rate. The conclusion that the transformation function can be estimated with an n^{-1/2} rate of convergence has also been reached by Horowitz (1996), Chen, Jin, & Ying (2002), Ye & Duan (1997), Zhou, Lin, & Johnson (2009), and Lin & Zhou (2009), among others.

5. ESTIMATION OF ASYMPTOTIC VARIANCE OF Θ̂

As shown in Theorem 2, the asymptotic variance of Θ̂ has the standard sandwich form Σ^{-1}Δ(Σ^{-1})′. However, the matrices Σ and Δ have complicated analytic forms whose direct evaluation is cumbersome. Therefore, a feasible computational approach to approximate the asymptotic variance of Θ̂ is needed. In this section, we use the resampling scheme proposed by Jin, Ying, & Wei (2001) to approximate the asymptotic distribution of Θ̂. Compared with the simple bootstrap, Jin, Ying, & Wei's resampling scheme, which uses continuous weights, is more stable. The resampling algorithm proceeds as follows. First, we generate n standard exponential random variables ξi, i = 1, …, n. Fixing the data at their observed values, we solve the following ξi-weighted estimating equations and denote the solutions by Θ* and H*(t) for any t > 0:

\sum_{i=1}^{n} \xi_i \{ dN_i(t) - Y_i(t)\, d\Lambda_i(H(t); \Theta) \} = 0, \qquad (6)
\sum_{i=1}^{n} \xi_i\, \frac{\partial \log L_i(\Theta; H)}{\partial \Theta} = 0, \qquad (7)

where H(0) = −∞. The exponential distribution is not essential; any positive random variable with mean 1 and variance 1 can be used. The estimates Θ* and H*(·) can be obtained by using the iterative algorithm proposed in Section 3. Following the method of Jin, Ying, & Wei (2001) and using Equations (A.1) and (A.4) in the Appendix, we establish the validity of the proposed resampling method.

Proposition:

Under the conditions given in the Appendix, the conditional distribution of n^{1/2}(Θ* − Θ̂), given the observed data, converges almost surely to the asymptotic distribution of n^{1/2}(Θ̂ − Θ0).

Based on this proposition, by generating ξ1, …, ξn numerous times, we may obtain a large number of realizations of Θ*. The variance of Θ̂ can then be approximated by the empirical variance of the realizations of Θ*.
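
A minimal sketch of this resampling scheme is given below. Here `solve_weighted(xi, data)` is a hypothetical stand-in for a routine that returns the solution Θ* of the ξ-weighted equations (6)–(7) for the data at hand; the weighted-mean call at the end is only a toy illustration of the mechanics, not the transformation-model estimator.

```python
import numpy as np

def perturbation_variance(solve_weighted, data, n_rep=500, seed=0):
    # Jin-Ying-Wei perturbation scheme: keep the data fixed, redraw standard
    # exponential weights (mean 1, variance 1), re-solve the weighted equations,
    # and report the empirical covariance of the perturbed solutions.
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_rep):
        xi = rng.exponential(scale=1.0, size=len(data))
        draws.append(solve_weighted(xi, data))
    return np.cov(np.asarray(draws), rowvar=False)

# Toy check: the perturbation variance of a weighted mean is close to var(x)/n.
x = np.random.default_rng(1).normal(size=200)
print(perturbation_variance(lambda xi, d: np.average(d, weights=xi), x),
      x.var(ddof=1) / len(x))
```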

6. SIMULATION

In this section, we describe the simulation studies conducted to assess the finite-sample performance of the proposed method and to compare it with existing methods. The existing approaches for analyzing survival data with noncompliance and adjustment for covariates include (1) accelerated failure time frameworks (AFT; Robins & Tsiatis, 1991; Altstein, Li, & Elashoff, 2011; Altstein & Li, 2013) and (2) the Cox proportional hazards model (Robins & Finkelstein, 2000; Hernan, Brumback, & Robins, 2001; Loeys & Goetghebeur, 2003; Loeys, Goetghebeur, & Vandebosch, 2005; Cuzick et al., 2007). Because the existing AFT literature assumes two subgroups in the population whereas we assume three, a direct comparison between our method and the existing AFT models is infeasible. Given that the AFT model can be regarded as model (1) with a specified transformation function, we evaluate the AFT approach by investigating the performance of model (1) with a misspecified transformation function (indicated by "MT"). We choose the full likelihood method developed by Cuzick et al. (2007) as a representative of the Cox proportional hazards approach (indicated by "Cox").

We present four simulations in this paper. In all of the simulations, we choose n = 400 and generate Ti from model (1) with α1 = 2, β = (1, 1)′ and Xi = (Xi1, Xi2)′, where Xi1 and Xi2 are independently generated from the standard normal distribution. We take the transformation functions H(t) = log(t) and H(t) = log{log(t)} and generate εi in model (1) using the hazard function λ(t) = exp(t)/(1 + r exp(t)) with r = 0 and 1. Notably, the proportional hazards and proportional odds models correspond to r = 0 and r = 1, respectively. We divide the n observations equally into the treatment and control groups. We then generate Ri and set Ei = RiI(Ui ≤ 2) + (1 − Ri)I(Ui = 2), where Ui is generated from model (2) with θ1 = θ2 = 0, κ1 = (0.5, −0.5)′ and κ2 = (0.5, −0.5)′. Because the model assumption on Ui used by Cuzick et al. (2007) differs from ours when class membership is correlated with covariates, to compare the proposed method with that of Cuzick et al. (2007) we also generate Ui from model (2) with θ1 = θ2 = 0 and κ1 = κ2 = (0, 0)′, that is, with class membership independent of covariates. The independent censoring time Ci is generated from the uniform distribution on (0.5, a), with a chosen to yield censoring proportions of approximately 30–40% in each case. For simplicity of presentation, we list the simulation settings in Table 1 (a data-generating sketch for Case 1 is given after the table). We simulated 500 datasets for each setting.

Table 1:

The four simulation settings generated from varying H(·), r, κ1, and κ2.

H(t) r κ1 κ2
Case 1 log(t) 0 (0.5,−0.5)′ (0.5,−0.5)′
Case 2 log{log(t)} 1 (0.5,−0.5)′ (0.5,−0.5)′
Case 3 log(t) 0 (0, 0)′ (0, 0)′
Case 4 log{log(t)} 1 (0, 0)′ (0, 0)′

The proportional hazards and proportional odds models correspond to r = 0 and r = 1, respectively.

The class membership is independent of covariates when κ1 = κ2 = (0, 0)′.
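
As a concrete illustration of the data-generating steps for Case 1 (referenced before Table 1), the sketch below follows the settings described above; the intercepts c1 = c2 = c3 = 0 and the censoring upper limit 6.0 are placeholder values, since the paper tunes the latter to give roughly 30–40% censoring and does not report the former.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 400
alpha = np.array([2.0, 0.0, 0.0])              # alpha_1 = 2, alpha_2 = alpha_3 = 0
beta = np.array([1.0, 1.0])
c = np.array([0.0, 0.0, 0.0])                  # intercepts c_1, c_2, c_3 (placeholder values)
theta = np.array([0.0, 0.0])                   # theta_1 = theta_2 = 0
kappa = np.array([[0.5, -0.5], [0.5, -0.5]])   # Case 1: kappa_1 = kappa_2 = (0.5, -0.5)'

X = rng.standard_normal((n, 2))
Z = X                                          # Z = X in this sketch

# Model (2): multinomial-logit probabilities of (complier, always taker, never taker)
eta = theta + Z @ kappa.T
denom = 1.0 + np.exp(eta).sum(axis=1)
p = np.column_stack([np.exp(eta) / denom[:, None], 1.0 / denom])
U = np.array([rng.choice([1, 2, 3], p=pi) for pi in p])

R = rng.permutation(np.repeat([1, 0], n // 2)) # equal-sized treatment and control arms
E = R * (U <= 2) + (1 - R) * (U == 2)          # treatment actually received

# Model (1) with H(t) = log(t) and extreme-value error (r = 0, proportional hazards)
eps = np.log(-np.log(rng.uniform(size=n)))
T = np.exp(c[U - 1] + alpha[U - 1] * E + X @ beta + eps)

C = rng.uniform(0.5, 6.0, size=n)              # censoring; upper limit is a placeholder
W = np.minimum(T, C)
delta = (T <= C).astype(int)
print("censoring proportion:", 1 - delta.mean())
```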

For Cases 1 and 2, we analyze each of the 500 datasets using the proposed method and the MT estimator, in which the transformation function is misspecified as the identity function. However, the MT estimator has great difficulty in estimating the regression coefficients properly: all runs failed to converge, whether the initial values were taken as the true values or as the estimator based on the linear regression model with noncompliance. Convergence failure implies that the MT estimator is far from the true values and is severely biased. Tables 2 and 3 present the biases and empirical standard deviations (SDs) of the parameter estimators for Cases 1 and 2, respectively, obtained from the 500 simulations using the proposed method. Tables 4 and 5 present the biases and SDs for Cases 3 and 4, respectively, obtained using the proposed method and the Cox estimator. Figure 1a–d display the averages of the estimated transformation functions and their empirical pointwise 95% confidence intervals for Cases 1 to 4, respectively, derived using the proposed method.

Table 2:

The bias, SD, and estimated SE of the estimators based on the 500 simulations for Case 1.

Parameter | Proposed: Bias, SD, SEave, SEsd, CP | MSP: Bias, SD
α1 0.0259 0.3885 0.3925 0.0543 0.940 −0.0132 0.4006
c1 0.0066 0.2573 0.2560 0.0386 0.948 0.0168 0.2447
c2 −0.0009 0.1396 0.1237 0.0152 0.912 −0.0055 0.1411
c3 0.0072 0.1545 0.1657 0.0278 0.956 −0.0016 0.1637
β1 0.0014 0.0836 0.0915 0.0234 0.946 −0.0021 0.0894
β2 0.0059 0.0834 0.1006 0.0106 0.978 0.0111 0.0969
θ1 −0.0020 0.2384 0.2151 0.02540 0.912 0.0018 0.2333
θ2 0.0181 0.1461 0.1630 0.0146 0.962 0.0090 0.1558
κ11 0.0058 0.2229 0.2236 0.0289 0.942
κ12 −0.0372 0.2210 0.2348 0.0418 0.948
κ21 0.0090 0.1673 0.1636 0.0179 0.932
κ22 −0.0118 0.1562 0.1682 0.0217 0.958

Table 3:

The bias, SD, and estimated SE of the estimators for Case 2.

Parameter Bias SD SEave SEsd CP
α1 −0.0073 0.6820 0.6842 0.0924 0.938
c1 0.0461 0.3785 0.3971 0.0615 0.952
c2 0.0194 0.2145 0.2095 0.0295 0.926
c3 0.0128 0.2259 0.2229 0.0393 0.930
β1 −0.0067 0.1185 0.1176 0.0128 0.944
β2 0.0026 0.1244 0.1352 0.0198 0.956
θ1 −0.0074 0.2674 0.2546 0.0448 0.926
θ2 0.0079 0.1514 0.1392 0.0103 0.940
κ11 0.0221 0.2341 0.2155 0.0249 0.914
κ12 −0.0144 0.2413 0.2412 0.0255 0.944
κ21 0.0091 0.1682 0.1648 0.0190 0.938
κ22 −0.0040 0.1655 0.1705 0.0140 0.962

Table 4:

The bias, SD, and estimated SE of the estimators for Case 3.

Parameter | Proposed: Bias, SD, SEave, SEsd, CP | Cox: Bias, SD
α1 0.0194 0.4175 0.4325 0.0555 0.944 0.0602 1.3246
c1 0.0029 0.2719 0.2570 0.0405 0.930
c2 0.0064 0.1453 0.1441 0.0153 0.924 0.0179 1.2739
c3 −0.0026 0.1557 0.1416 0.0136 0.932 0.0515 1.2899
β1 0.0045 0.0841 0.0839 0.0075 0.950 0.0753 0.0775
β2 0.0106 0.0885 0.0770 0.0074 0.910 0.0713 0.0759
θ1 0.0020 0.2174 0.2210 0.0218 0.944
θ2 0.0014 0.1359 0.1585 0.0394 0.954

Table 5:

The bias and SD of the estimators for Case 4.

Parameter | Proposed: Bias, SD, SEave, SEsd, CP | Cox: Bias, SD
α1 −0.0427 0.6866 0.6903 0.0863 0.932 0.2774 1.4585
c1 0.0774 0.3683 0.3642 0.0416 0.942
c2 0.0368 0.2264 0.2317 0.0214 0.948 0.1972 1.4426
c3 0.0050 0.2152 0.2042 0.0181 0.946 0.1643 1.4694
β1 −0.0180 0.1071 0.1204 0.0116 0.978 0.3603 0.0801
β2 −0.0173 0.1215 0.1342 0.0255 0.952 0.3498 0.0849
θ1 0.0010 0.2164 0.2295 0.0305 0.946
θ2 −0.0121 0.1374 0.1526 0.0189 0.964

Figure 1: Estimated transformation functions for (a) Case 1; (b) Case 2; (c) Case 3; and (d) Case 4.

From Tables 2–5 and Figure 1, we observed the following:

  • Figure 1 shows that the proposed method produces reasonable estimates of the transformation functions.

  • A useful rule of thumb for evaluating the bias is that it should not have a substantial negative impact on inferences unless the standardized bias (bias as a percent of the SD) exceeds 40% (Olsen & Schafer, 2001). Tables 2 and 3 show that the proposed estimators are essentially unbiased in all cases. By contrast, the MT estimators may have substantial bias, suggesting that misspecification of the transformation function yields a biased estimator.

  • Interestingly, Table 4 shows that the proposed estimator has less bias and variance than the full likelihood method developed by Cuzick et al. (2007) even though the Cox model is correctly specified. This result is surprising. On further inspection of Cuzick et al.'s estimator, we find that their full likelihood function involves π_{Ii} = min(N_i^{CT} ρ / N_i^{TT}, 1) and π_{Ri} = min(N_i^{TC} ρ^{-1} / N_i^{CC}, 1). Here π_{Ii} is an estimate of the proportion of individuals, among those complying with randomization to treatment and at risk at time t_i, who would have insisted on receiving the new treatment even if they had been randomized to control; π_{Ri} has a similar interpretation as the estimated proportion of refusers among the CC group at t_i, where the definitions of CC, CT, TC, and TT can be found in Cuzick et al. (2007). Neither π_{Ii} nor π_{Ri} is a convincing estimator when N_i^{TT} or N_i^{CC} becomes small, which actually occurs as t_i becomes large.

  • Table 5 shows that the Cox estimator is biased and unstable when a Cox model is inappropriate, whereas the proposed estimators appear consistently better than the Cox method.

We also assess the accuracy of the standard error estimator given in Section 5. The SDs of the parameter estimators over the 500 simulations, presented in Tables 2–5, can be regarded as the true standard errors. The average and the standard deviation of the 500 estimated resampling standard errors, denoted by SEave and SEsd, summarize the overall performance of the standard error estimator. Tables 2–5 present the results, which show that the estimated standard errors are generally close to the empirical standard errors, indicating that the resampling method provides adequate standard error estimates. We also display the empirical coverage probabilities (CP) of 95% confidence intervals in Tables 2–5.
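
For concreteness, the summary columns of Tables 2–5 can be computed from the simulation output as in the following sketch; `est`, `se`, and `truth` are hypothetical names for the arrays of replicate estimates, their resampling standard errors, and the true parameter values.

```python
import numpy as np

def summarize(est, se, truth, z=1.96):
    # est, se: (n_sim x d) arrays of estimates and resampling standard errors;
    # truth: length-d vector of true parameter values.
    bias = est.mean(axis=0) - truth                          # Bias
    sd = est.std(axis=0, ddof=1)                             # empirical SD ("true" SE)
    se_ave, se_sd = se.mean(axis=0), se.std(axis=0, ddof=1)  # SEave, SEsd
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    cp = covered.mean(axis=0)                                # coverage of 95% CIs
    return bias, sd, se_ave, se_sd, cp
```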

We investigate whether misspecification of the subgroup probability model yields a biased estimator by examining the performance of the proposed estimators under a misspecified subgroup probability model (indicated by "MSP") for Case 1. Table 2 presents the biases and SDs of the MSP estimators, where κ1 and κ2 are misspecified as zero, that is, the class membership is taken to be independent of the covariates. The results in Table 2 suggest that misspecification of the subgroup probability model has little effect on the parameters of interest.

7. EXAMPLE

The proposed approach is applied to MSLT-I, an ongoing clinical trial, to evaluate the effectiveness of sentinel node biopsy (SNB) in guiding the treatment of invasive melanoma (Morton et al., 2005). At the time the trial began in 1994, the standard of care consisted of wide excision of the primary melanoma followed by observation, with lymphadenectomy (surgical excision of a lymph node or nodes) undertaken only on clinical evidence of recurrence. However, patients with nodal metastases have a poor prognosis and may stand to benefit from immediate lymphadenectomy. The MSLT-I study randomized patients to either an experimental course of therapy (treatment arm) or the standard of care (control arm). All patients in the treatment arm received SNB (Morton et al., 1999), and those with biopsies positive for metastasis underwent an immediate (elective) lymphadenectomy. Node-negative patients received the same observational course of treatment as the controls.

The ITT analysis of MSLT-I determines whether the strategy of SNB-plus-immediate-lymphadenectomy improves survival in the diseased aggregate population. The analysis does not estimate the biological efficacy of the surgery on the subgroup of patients with nodal metastases. The third interim analysis of the trial (Morton et al., 2006) attempted to address this estimate by comparing node-positive patients undergoing treatment with control patients who subsequently developed recurrence. However, this procedure does not produce an unbiased estimate of biological efficacy because of the lack of one-to-one correspondence between positive SNB results at the time of randomization and the eventual development of a clinically evident recurrence.

We analyzed the data using the proposed semiparametric transformation model to estimate the biological efficacy of the surgery on the subgroup of patients with nodal metastases. Following the suggestion of Altstein, Li, & Elashoff (2011), we considered distant disease free survival (DDFS) as the end-point. For these data, several explanatory variables were recorded, including the Breslow score (X1), an indicator of the presence of ulceration (X2), and an indicator that the melanoma is located on the trunk of the body (X3). The Breslow score measures the thickness of the primary melanoma. We modeled the data using the proposed model with X = Z = (X1, X2, X3)′, and the hazard function of εi was assumed to take the form λ(t) = exp(t)/(1 + r exp(t)). We chose r = 0 from among r = 0, 0.5, 1, 1.5, 2, 2.5, 3 by comparing the distribution of ε̂_i = Ĥ(T_i) − ĉ_{Ũi} − α̂_{Ũi}E_i − β̂′X_i with the given distribution of εi based on the Q–Q plot, where Ũi equals Ui if observed and otherwise is taken as the most probable class based on the conditional probability given the observed data. The resulting estimates of the regression coefficients and the transformation function with a0 = 32 and b0 = 3.5 are displayed in Table 6 and Figure 2a. The standard errors were calculated according to the method discussed in Section 5. Table 6 shows that the treatment has no significant effect on DDFS. However, a long DDFS is strongly associated with a low Breslow score, the absence of ulceration, and the absence of the melanoma on the trunk of the body. In addition, Table 6 also suggests that the node status depends on the Breslow score. For comparison, we also display the estimation results from the accelerated failure time mixture model used by Altstein & Li (2013); a similar conclusion is obtained.

Table 6:

The estimation results for the MSLT-I data using the proposed method.

Parameter | Proposed: Estimate, SD, P-value | AFT (Node+): Estimate, P-value | AFT (Node−): Estimate, P-value

α1 0.0177 0.1553 0.9093 0.358 0.28
β1 −0.5476 0.1372 0.0000 −0.619 0.017 −0.654 0.0003
β2 −0.4171 0.1398 0.0028 0.411 0.29 −0.530 0.012
β3 −0.6058 0.0968 0.0000 −0.355 0.32 −0.615 0.0018
Parameter | Proposed: Estimate, SD, P-value | Parameter | Proposed: Estimate, SD, P-value

c1 7.4373 0.3149 0.0000 κ11 −1.2211 0.3631 0.0008
c2 7.0197 0.4885 0.0000 κ12 −0.5256 0.3713 0.1569
c3 7.4862 0.4629 0.0000 κ13 0.1129 0.3232 0.7268
θ1 3.5549 0.6813 0.0000 κ21 −1.4890 0.6050 0.0138
θ2 0.6753 0.9101 0.4581 κ22 −0.2430 0.5814 0.6760
κ23 0.0582 0.4551 0.8982

Figure 2: (a) The estimated transformation function (dotted) and its 95% confidence limits (dotted lines) for the MSLT-I data. (b) The quantile–quantile plot of the distribution of the estimated error ε̂_i versus the given distribution.

Finally, we verified the validity of the assumed semiparametric transformation model (1) by examining how well the distribution of the estimated error ε̂_i fits the given distribution. Given that the treatment effect is insignificant and the estimators of c1, c2, and c3 are close, we computed the estimated error as ε̂_i = Ĥ(T_i) − c̄ − β̂′X_i, where c̄ = (ĉ1 + ĉ2 + ĉ3)/3. Then, using the Kaplan–Meier estimator, the distribution of the estimated error was obtained based on the data {ε̂_i, δ_i} for i = 1, …, n. Figure 2b displays the quantile–quantile plot of the distribution of the estimated error ε̂_i versus the given distribution. A perfect fit corresponds to the solid diagonal line; points above or below this line indicate over- or underprediction by the model. Figure 2b suggests that the goodness-of-fit of the proposed model (1) is reasonable.
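
A rough sketch of this residual-based check is given below: the Kaplan–Meier estimator is applied to the (censored) estimated errors and its quantiles are compared with those of the assumed error distribution, here the extreme-value distribution corresponding to r = 0. The residuals used are simulated stand-ins rather than the MSLT-I values.

```python
import numpy as np

def km_survival(e, d):
    # Kaplan-Meier estimate of the survival function of the residuals,
    # evaluated at the ordered residual values (d = 1 for an observed failure).
    order = np.argsort(e)
    e, d = e[order], d[order]
    at_risk = len(e) - np.arange(len(e))
    return e, np.cumprod(1.0 - d / at_risk)

rng = np.random.default_rng(0)
eps_hat = np.log(rng.exponential(size=300))   # stand-in for H_hat(T_i) - c_bar - beta_hat'X_i
delta = rng.integers(0, 2, size=300)          # stand-in censoring indicators

e_sorted, surv = km_survival(eps_hat, delta)
F_hat = np.clip(1.0 - surv, 1e-6, 1 - 1e-6)   # empirical CDF of the residuals
# Quantiles of the assumed extreme-value error: F(x) = 1 - exp(-exp(x))
q_theory = np.log(-np.log(1.0 - F_hat))
# Plotting (q_theory, e_sorted): points near the 45-degree line indicate adequate fit.
```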

8. DISCUSSION

We use a semiparametric linear transformation model to analyze survival data with selective compliance. Because the transformation function is left unspecified, the model is more flexible, and more robust to misspecification of the transformation, than models that fix it, such as the proportional hazards and proportional odds models. Although we have focused on independent data, our method can be extended to multivariate failure times by incorporating random effects or frailties. In our model, the regression coefficient β and the variance of the error are assumed to be the same for the groups Ui = 1, 2, and 3. Our method can be extended to allow a different β and error variance for each group.

ACKNOWLEDGEMENTS

Lin’s research was supported by the National Natural Science Funds for Distinguished Young Scholar (No. 11125104), the National Natural Science Foundation of China (No. 11071197) and Program for New Century Excellent Talents in University.

APPENDIX

Regularity conditions ensuring the central limit theorem for counting process martingales, such as those assumed in Fleming & Harrington (1991), are also assumed here without specific statement. In particular, we assume that τ is finite, Pr(Ti > τ) > 0, and Pr(Ci = τ) > 0. This is to avoid a lengthy technical discussion of tail behaviour. The consistency and asymptotic normality stated in Theorems 1 and 2 are proved using arguments similar to those of Chen, Jin, & Ying (2002), so we only highlight the steps that are different.

Proof of Theorems 1 and 2.

Step 1. Using arguments similar to Step A1 of Chen, Jin, & Ying (2002), it can be shown that d{Ĥ(·, Θ0), H0(·)} → 0 almost surely, where Ĥ(·, Θ) is the function implicitly defined as the unique solution of (4) for fixed Θ and d(·, ·) is a distance defined as follows. For any two nondecreasing functions H1 and H2 on [0, τ] such that H1(0) = H2(0) = −∞, define d(H1, H2) = sup{| exp{H1(t)} − exp{H2(t)}| : t ∈ [0, τ]}.

Step 2. Construct an asymptotic expression for Ĥ(t; Θ0). Let a > 0 and b be fixed finite numbers and define B_1(t) = \int_a^t E[Y_i(s)\, \dot{\lambda}_i(H_0(s))]\, dH_0(s), B_2(t) = E[Y_i(t)\, \lambda_i(H_0(t))], \lambda^*(H_0(t)) = \exp\left( \int_0^t dB_1(x)/B_2(x) \right), and \Lambda^*(x) = \int_b^x \lambda^*(s)\, ds, for t > 0 and x ∈ (−∞, ∞). We choose finite a > 0 and b as the lower limits of the integrals to ensure that the integrals are finite. It is easy to see that d\lambda^*\{H_0(t)\} = [\lambda^*\{H_0(t)\}/B_2(t)]\, dB_1(t), and we write

\frac{1}{n}\sum_{i=1}^{n} M_i(t) = \frac{1}{n}\sum_{i=1}^{n} \int_0^t Y_i(s)\, d\left\{ \frac{\lambda_i(H_0(s))}{\lambda^*(H_0(s))} \big( \Lambda^*(\hat{H}(s;\Theta_0)) - \Lambda^*(H_0(s)) \big) \right\} + o_p(n^{-1/2})
= \int_0^t \big( \Lambda^*(\hat{H}(s;\Theta_0)) - \Lambda^*(H_0(s)) \big) \times \frac{1}{n}\sum_{i=1}^{n} Y_i(s) \left\{ \frac{\dot{\lambda}_i(H_0(s))\, dH_0(s)\, \lambda^*(H_0(s)) - \lambda_i(H_0(s))\, [\lambda^*\{H_0(s)\}/B_2(s)]\, dB_1(s)}{(\lambda^*(H_0(s)))^2} \right\}
\quad + \frac{1}{n}\sum_{i=1}^{n} \int_0^t Y_i(s)\, \frac{\lambda_i(H_0(s))}{\lambda^*(H_0(s))}\, d\big( \Lambda^*(\hat{H}(s;\Theta_0)) - \Lambda^*(H_0(s)) \big) + o_p(n^{-1/2})
= \int_0^t \frac{B_2(s)}{\lambda^*(H_0(s))}\, d\big( \Lambda^*(\hat{H}(s;\Theta_0)) - \Lambda^*(H_0(s)) \big) + o_p(n^{-1/2}).

Therefore, for t ∈ [0, τ],

\Lambda^*(\hat{H}(t;\Theta_0)) - \Lambda^*(H_0(t)) = \frac{1}{n}\sum_{i=1}^{n} \int_0^t \frac{\lambda^*(H_0(s))}{B_2(s)}\, dM_i(s) + o_p(n^{-1/2}). \qquad (A.1)

Step 3. Denote U(Θ; H) = ∂ log L(Θ; H)/∂Θ. In this step, we compute V(Θ) = n^{-1} ∂U(Θ; Ĥ(·; Θ))/∂Θ′ at Θ = Θ0. By differentiating both sides of (4) with respect to Θ, we obtain the identity

\sum_{i=1}^{n} \int_0^t Y_i(s)\, d\left\{ \lambda_i(\hat{H}(s;\Theta_0))\, \frac{\partial \hat{H}(s;\Theta_0)}{\partial \Theta} + \tilde{\lambda}_i(\hat{H}(s;\Theta_0); \Theta_0) \right\} = 0, \qquad (A.2)

where \tilde{\lambda}_i(H(s); \Theta) = \partial \Lambda_i(H(s); \Theta)/\partial \Theta. Similarly to Step 2, we have

\frac{\partial \hat{H}(t;\Theta_0)}{\partial \Theta} = -\frac{1}{\lambda^*\{H_0(t)\}} \int_0^t \frac{\lambda^*\{H_0(s)\}}{B_2(s)}\, E\left[ Y_i(s)\, \frac{\partial \lambda_i(H_0(s); \Theta)}{\partial \Theta} \right] dH_0(s) + o_p(1). \qquad (A.3)

It follows from the law of large numbers that

V(\Theta_0) = \frac{1}{n}\sum_{i=1}^{n} \left\{ \frac{\partial^2 \log L_i(\Theta_0;H_0)}{\partial \Theta\, \partial \Theta'} - \frac{\partial^2 \log L_i(\Theta_0;H_0)}{\partial \Theta\, \partial H(W_i)} \frac{1}{\lambda^*\{H_0(W_i)\}} \int_0^\tau \frac{Y_i(s)\, \lambda^*\{H_0(s)\}}{B_2(s)}\, E\left[ Y_j(s)\, \frac{\partial \lambda_j(H_0(s);\Theta_0)}{\partial \Theta'} \right] dH_0(s) \right\} + o_p(1) = \Sigma + o_p(1),

where Li(Θ; H) is the contribution of subject i to the likelihood function L(Θ; H).

Step 4. In this step, we show the asymptotic normality of U(Θ0; Ĥ(·; Θ0)). Using the results of Steps 1 and 2 and some empirical process approximation techniques, we can write

\frac{1}{n} U(\Theta_0; \hat{H}(\cdot;\Theta_0)) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial \log L_i(\Theta_0;H_0)}{\partial \Theta} + \frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2 \log L_i(\Theta_0;H_0)}{\partial \Theta\, \partial H(W_i)} \big( \hat{H}(W_i;\Theta_0) - H_0(W_i) \big)
= \frac{1}{n}\sum_{i=1}^{n} \left\{ \frac{\partial \log L_i(\Theta_0;H_0)}{\partial \Theta} + \int_0^\tau E\left[ \frac{Y_j(s)}{\lambda^*(H_0(W_j))} \frac{\partial^2 \log L_j(\Theta_0;H_0)}{\partial \Theta\, \partial H(W_j)} \right] \frac{\lambda^*(H_0(s))}{B_2(s)}\, dM_i(s) \right\} + o_p(n^{-1/2}).

It then follows that n^{-1/2} U(\Theta_0; \hat{H}(\cdot; \Theta_0)) \to N(0, \Delta).

The rest of the proof essentially proceeds along the lines of Chen, Jin, & Ying (2002) and is omitted here. ■

Proof of Theorem 3.

By Taylor series expansions together with (A.1) and (A.3), we get

\Lambda^*(\hat{H}(t)) - \Lambda^*(H_0(t)) = \Lambda^*(\hat{H}(t;\hat{\Theta})) - \Lambda^*(\hat{H}(t;\Theta_0)) + \Lambda^*(\hat{H}(t;\Theta_0)) - \Lambda^*(H_0(t))
= \lambda^*(H_0(t))\, \frac{\partial \hat{H}(t;\Theta_0)}{\partial \Theta'}\, (\hat{\Theta} - \Theta_0) + \frac{1}{n}\sum_{i=1}^{n} \int_0^t \frac{\lambda^*(H_0(s))}{B_2(s)}\, dM_i(s) + o_p(n^{-1/2})
= -\int_0^t \frac{\lambda^*\{H_0(s)\}}{B_2(s)}\, E\left[ Y_j(s)\, \frac{\partial \lambda_j(H_0(s);\Theta_0)}{\partial \Theta'} \right] dH_0(s)\, (\hat{\Theta} - \Theta_0) + \frac{1}{n}\sum_{i=1}^{n} \int_0^t \frac{\lambda^*(H_0(s))}{B_2(s)}\, dM_i(s) + o_p(n^{-1/2}). \qquad (A.4)

Substituting \hat{\Theta} - \Theta_0 = -\Sigma^{-1} n^{-1} U(\Theta_0; \hat{H}(\cdot; \Theta_0)) + o_p(n^{-1/2}) into the equation above completes the proof of Theorem 3. ■

BIBLIOGRAPHY

  1. Angrist JD, Imbens GW, & Rubin DB (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.
  2. Altman DG (1991). Practical Statistics for Medical Research. CRC Press, London.
  3. Altstein LL & Li G (2013). Latent subgroup analysis of a randomized clinical trial through a semiparametric accelerated failure time mixture model. Biometrics, 69, 52–61.
  4. Altstein LL, Li G, & Elashoff RM (2011). A method to estimate treatment efficacy among latent subgroups of a randomized clinical trial. Statistics in Medicine, 30, 709–717.
  5. Bennett S (1983). Analysis of survival data by the proportional odds model. Statistics in Medicine, 2, 273–277.
  6. Cai T, Cheng S, & Wei LJ (2002). Semi-parametric random effects models for clustered failure time data. Journal of the American Statistical Association, 97, 514–522.
  7. Cai T, Wei LJ, & Wilcox M (2000). Semi-parametric regression analysis for clustered failure time data. Biometrika, 87, 867–878.
  8. Chen K, Jin Z, & Ying Z (2002). Semiparametric analysis of transformation models with censored data. Biometrika, 89, 659–668.
  9. Cheng SC, Wei LJ, & Ying Z (1995). Analysis of transformation models with censored data. Biometrika, 82, 835–845.
  10. Clayton DG & Cuzick J (1985). Multivariate generalizations of the proportional hazards model. Journal of the Royal Statistical Society Series A, 148, 82–117.
  11. Cox DR (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society Series B, 34, 187–220.
  12. Cuzick J, Sasieni P, Myles J, & Tyrer J (2007). Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. Journal of the Royal Statistical Society Series B, 69, 565–588.
  13. Dabrowska DM & Doksum KA (1988). Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics, 15, 1–23.
  14. Elashoff RM, Li G, & Zhou Y (2012). Nonparametric inference for assessing treatment efficacy in randomized clinical trials with a time-to-event outcome and all-or-none compliance. Biometrika, 99, 393–404.
  15. Fleming TR & Harrington DP (1991). Counting Processes and Survival Analysis. John Wiley & Sons, New York.
  16. Frangakis CE & Rubin DB (1999). Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86, 365–379.
  17. Hernan MA, Brumback B, & Robins JM (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96, 440–448.
  18. Horowitz JL (1996). Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica, 64, 103–137.
  19. Jin Z, Ying Z, & Wei LJ (2001). A simple resampling method by perturbing the minimand. Biometrika, 88, 381–390.
  20. Lin HZ, Zhou XH, & Zhou L (2013). Semiparametric regression analysis of longitudinal skewed data. Scandinavian Journal of Statistics, in revision.
  21. Lin HZ & Zhou XH (2009). A semi-parametric two-part mixed-effects heteroscedastic transformation model for correlated right-skewed semi-continuous data. Biostatistics, 10, 640–658.
  22. Loeys T & Goetghebeur E (2002). Baseline information in structural failure time estimators for the effect of observed treatment compliance. Statistics in Medicine, 21, 1173–1180.
  23. Loeys T & Goetghebeur E (2003). A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics, 59, 100–105.
  24. Loeys T, Goetghebeur E, & Vandebosch A (2005). Causal proportional hazards models and time-constant exposure in randomized clinical trials. Lifetime Data Analysis, 11, 435–449.
  25. Morton DL, Thompson JF, Essner R, Elashoff R, Stern SL, Nieweg OE, Roses DF, Karakousis CP, Mozzillo N, Reintgen D, Wang HJ, Glass EC, & Cochran AJ (1999). Validation of the accuracy of intraoperative lymphatic mapping and sentinel lymphadenectomy for early-stage melanoma. Annals of Surgery, 230(4), 453–463.
  26. Morton DL, Cochran AJ, Thompson JF, Elashoff R, Essner R, Glass EC, Mozzillo N, Nieweg OE, Roses DF, Hoekstra HJ, Karakousis CP, Reintgen DS, Coventry BJ, Wang HJ, & the Multicenter Selective Lymphadenectomy Trial Group (2005). Sentinel node biopsy for early-stage melanoma: Accuracy and morbidity in MSLT-I, an international multicenter trial. Annals of Surgery, 242, 302–313.
  27. Morton D, Thompson J, Cochran A, Mozzillo N, Elashoff R, Essner R, Nieweg O, Roses D, Hoekstra H, Karakousis C, Reintgen D, Coventry B, Glass E, Wang H, & the Multicenter Selective Lymphadenectomy Trial Group (2006). Sentinel-node biopsy or nodal observation in melanoma. New England Journal of Medicine, 355, 1307–1317.
  28. Murphy SA, Rossini AJ, & van der Vaart AW (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association, 92, 968–976.
  29. Olsen MK & Schafer JL (2001). A two-part random-effects model for semicontinuous longitudinal data. Journal of the American Statistical Association, 96, 730–745.
  30. Pearl J (2002). Reasoning with cause and effect. AI Magazine, 23, 95–111.
  31. Pettitt AN (1982). Inference for the linear model using a likelihood based on ranks. Journal of the Royal Statistical Society Series B, 44, 234–243.
  32. Robins JM & Finkelstein DM (2000). Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics, 56, 779–788.
  33. Robins JM & Tsiatis AA (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics—Theory and Methods, 20, 2609–2631.
  34. Scharfstein DO, Tsiatis AA, & Gilbert PB (1998). Semiparametric efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Analysis, 4, 355–391.
  35. Ye JM & Duan NH (1997). Nonparametric n^{-1/2}-consistent estimation for the general transformation models. The Annals of Statistics, 25, 2682–2717.
  36. Zeng D & Lin DY (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika, 93, 627–640.
  37. Zhou XH, Lin HZ, & Johnson E (2009). Nonparametric heteroscedastic transformation regression models for skewed data with an application to health care costs. Journal of the Royal Statistical Society Series B, 70, 1029–1047.
