INCREASING POWER IN RANDOMIZED TRIALS WITH RIGHT CENSORED OUTCOMES THROUGH COVARIATE ADJUSTMENT

K L Moore; M J van der Laan

doi:10.1080/10543400903243017

. Author manuscript; available in PMC: 2010 Jul 1.

Published in final edited form as: J Biopharm Stat. 2009 Nov;19(6):1099–1131. doi: 10.1080/10543400903243017

INCREASING POWER IN RANDOMIZED TRIALS WITH RIGHT CENSORED OUTCOMES THROUGH COVARIATE ADJUSTMENT

K L Moore ¹, M J van der Laan ¹

PMCID: PMC2895464 NIHMSID: NIHMS213247 PMID: 20183467

Abstract

Targeted maximum likelihood methodology is applied to provide a test that makes use of the covariate data that are commonly collected in randomized trials, and does not require assumptions beyond those of the logrank test when censoring is uninformative. Under informative censoring, the logrank test is biased, whereas the test provided in this article is consistent under consistent estimation of the censoring mechanism or the conditional hazard for survival. Two approaches based on this methodology are provided: (1) a substitution-based approach that targets treatment and time-specific survival from which the logrank parameter is estimated, and (2) directly targeting the logrank parameter.

Keywords: Clinical trials, Covariate adjustment, Logrank test, Targeted maximum likelihood estimation

1. INTRODUCTION

Covariate adjustment in randomized controlled trials (RCTs) has been demonstrated to improve estimation efficiency over standard unadjusted methods in studies with continuous or binary outcomes at fixed endpoints (Koch et al., 1998; Moore and van der Laan, 2009b; Pocock et al., 2002; Tsiatis et al., 2008; Zhang et al., 2008). It is often the case in RCTs that the outcome is time-to-event in nature and subject to right censoring. The standard approach for testing for a treatment effect on survival is the logrank test, or asymptotically equivalently the test, H₀ : ψ = 0, where ψ is the coefficient for treatment in the Cox proportional hazards model that includes only a main term for treatment. From estimation theory (van der Laan and Robins, 2003), it is known under the proportional hazards assumption that this maximum likelihood estimator (MLE) is the efficient estimator of the effect of interest, given that the data include only treatment and survival times. In most RCTs, data are additionally collected on baseline (pretreatment) covariates. This unadjusted estimator ignores the covariates and is thus not equivalent to the full MLE. It follows that application of the unadjusted estimator can lead to a loss in estimation efficiency (precision) in practice.

The key principle in developing covariate adjusted estimators is to not require any additional assumptions beyond those required for the unadjusted method. For example, a Cox proportional hazards model that includes covariates in addition to treatment requires heavy parametric modeling assumptions and thus is not a suitable method of covariate adjustment. Lu and Tsiatis (2008) demonstrated how the efficiency of the logrank test can be improved with covariate adjustment based on estimating equation methodology. Their method, which does not make assumptions beyond those of the logrank test, is more efficient and was shown to increase power over the logrank test. A nonparametric method for a covariate adjusted method that uses logrank or Wilcoxon scores was proposed in Tangen and Koch (1999) and explored via simulation studies in Jiang et al. (2008). This method attempts to adjust for the random imbalances that occur in covariate distributions between treatment groups. The authors use a linear regression model to estimate the difference in vectors between treatment groups, where these vectors include the average rank score and average value for each covariate within the given treatment group. This latter method limits the flexibility in adjusting for covariates by only allowing the comparison of their mean differences between treatment groups. Furthermore, it does not allow for adjustment for informative censoring. The importance of adjusting for covariates to gain power over the logrank test is also discussed in Akazawa et al. (1997). However, their stratified approach does not ensure a gain in power, and can actually lose power over the logrank test for certain stratification strategies.

With the principle of not making any assumptions beyond those required for the unadjusted test, in this article we develop covariate adjusted analogues to the logrank test in RCTs. We present our methodology for discrete survival outcomes where the logrank parameter represents an effect of treatment by comparing the cumulative hazard of treated subjects at time t_k, relative to the cumulative hazard of controls at time t_k, averaged over many time points t_k. However, we note that if the time scale is sufficiently fine, this methodology is compatible with continuous survival outcomes.

We present our methods for covariate adjustment under the framework of targeted maximum likelihood estimation, which is an approach to statistical learning originally introduced in van der Laan and Rubin (2006). Targeted maximum likelihood estimation is an estimation procedure that carries out a bias reduction specifically targeted for the parameter of interest. This is in contrast to traditional maximum likelihood estimation, which aims for a bias–variance trade-off for the whole density of the observed data, rather than a specific parameter of it. The targeted maximum likelihood methodology aims to find a density p̂* that solves the efficient influence curve estimating equation for the parameter of interest that results in a bias reduction and also achieves a small increase in the log-likelihood as compared to the maximum likelihood estimate. The resulting substitution estimator ψ(p̂*) is a familiar type of likelihood-based estimator and due to the fact that it solves the efficient influence curve estimating equation it thereby inherits its properties, including asymptotic linearity and local efficiency (van der Laan and Robins, 2003).

In particular, for the parameter estimated in this article, the marginal log hazard of survival, the targeted maximum likelihood algorithm involves estimation of an initial conditional hazard. This hazard is then updated iteratively by adding to it a particular targeting covariate that is derived based on the specific parameter of interest. That is, two different parameters will have two different updating covariates. The algorithm is iterated until convergence, i.e., the efficient influence estimating equation for the parameter of interest is solved.

There are several advantages to this methodology over estimating equation methodology as discussed in Moore and van der Laan (2009a,b). One important advantage is that the methodology does not rely on the assumption that the efficient influence curve can be represented as an estimating equation in the parameter of interest. This is of particular consequence for the logrank parameter since the efficient influence curve cannot be represented as an estimating equation in this parameter. As a result, the proof of the double robustness consistency properties does not follow in the usual obvious manner. Therefore, in this article, we provide two methods for covariate adjustment using the targeted maximum likelihood methodology. The first is a substitution-based approach that targets the time- and treatment-specific survival parameters for the treated and untreated arms. The corresponding estimates are used as plug-ins to evaluate the logrank parameter. Here we can prove the double robustness properties for the time and treatment specific estimators using the usual estimating equation approach. We then show how we can extend these properties to the logrank parameter. In the second approach, we target the logrank parameter directly. However, since we cannot use the estimating equation approach for proof of the double robustness properties due to the fact that the efficient influence curve cannot be represented as an estimating equation in this parameter, we rely on empirical validation of these properties. Therefore, we include both of these covariate adjusted targeted maximum likelihood estimators (TMLEs) of the logrank parameter.

The likelihood of the observed data (provided in Section 4.1) can be expressed in terms of the hazard of survival, conditional on treatment and covariates. It is important to note that the TMLEs rely on estimation of the conditional hazard. This initial hazard estimate can be maximum likelihood based, but can involve sieve-based estimation and selection of fine-tuning parameters/algorithms/models using likelihood-based cross-validation, since nonparametric maximum likelihood estimation is not possible. Machine learning algorithms can be applied to obtain an initial hazard estimate, after which the targeting step is applied as a means of bias reduction for the parameter of interest. Part of the targeting step involves averaging over the covariates that have terms in the hazard model to obtain a marginal or unconditional estimate. In summary, the methods presented in this article involve two steps. First, an initial hazard of survival, conditional on treatment and covariates, must be estimated. Second, the targeting step is applied as a bias reduction step for the parameter of interest.

Specifically, the article is outlined as follows. We begin with a brief overview of the data, model and parameter of interest in Section 2. We then provide an unadjusted method for testing for a treatment effect in Section 3, to which we compare the targeted maximum likelihood estimation approaches. In the continuous case, this unadjusted test is the logrank test, and in the discrete case, we provide a method based on the proportional odds model. We then review the methodology for estimating the treatment specific estimates of survival at a fixed endpoint t_k as presented in Moore and van der Laan (2009a). This t_k approach is then extended to provide our first of two analogues to the logrank test, the substitution-based targeted maximum likelihood method (Section 4.1). In the second analogue, we provide the direct targeted maximum likelihood approach, which does not require estimation of the t_k-specific survival estimates, but rather directly targets the average (over time) effect of treatment on survival. Since the TMLE requires estimation of an initial conditional hazard, methods for fitting it as well as the censoring mechanism are provided in Section 5. In Section 6 we present simulation studies to demonstrate the efficiency gains of the proposed methods over the logrank test in an RCT under no censoring and uninformative censoring. Furthermore, under informative censoring we demonstrate the bias that arises with the standard approach in contrast to the consistency of our proposed estimator. A second simulation study demonstrates the importance of data-adaptive model selection algorithms in the estimation of the initial hazard used by the targeted maximum likelihood algorithm in order to obtain maximal power. A third simulation study demonstrates the methodology as it would be applied to single dataset as in a typical data analysis. The data were simulated based on a real RCT study. Finally, we conclude with a discussion.

2. DATA, MODEL, AND PARAMETER OF INTEREST

We assume that in the study protocol, each patient is monitored at K equally spaced clinical visits. At each visit, an outcome is evaluated as having occurred or not occurred. Let T represent the first visit at which the event was reported, and thus T can take values {1, . . . , J}. The censoring time C is the first visit when the subject is no longer enrolled in the study. Let A ∈ {0, 1} represent the treatment assignment at baseline and W represent a vector of baseline covariates. The observed data are given by O = (T̃, Δ, A, W) ~ p₀ where T̃ = min(T, C), Δ = I(T ≤ C) is the indicator that that subject was not censored and p₀ the denotes density of O. The conditional hazard is given by λ₀ (· | A, W) and the corresponding conditional survival is given by S₀(· | A, W). The censoring mechanism is given by $\overset{‒}{G} (t_{_} ∣ A, W) = P (C \geq t ∣ A, W)$ .

Consider the data structure (W, A, T = T_A). Within the counterfactual framework for causal inference, since this data structure only contains a single counterfactual outcome T = T_A corresponding with the treatment the patient actually received, this data structure is a missing data structure on the full data X = (T₀, T₁, W) with missingness variable A, where T₁ represents a patient's time to the occurrence of an event had she, possibly contrary to fact, been assigned to the treatment group, and T₀ likewise represents the time to the occurrence of the event had the patient been assigned to the control group. Our target parameter, which we define later; is a parameter of this full data structure. The randomization assumption states that A is conditionally independent of the full data X, given W. In an RCT, this assumption clearly holds. For the sake of presentation, we assume that the treatment A is completely randomized; however, we present our methods under the counterfactual framework so that it is clear how our estimators generalize to observational studies or randomized trials in which the treatment assignment mechanism is unknown. The actual observed data O are a right-censored version of (W, A, T = T_A) and can thus be viewed as a censored data structure of the full data X with censoring variables (A, C). We assume coarsening at random (CAR), which means that the coarsening process only depends on the observed data. Under these counterfactual assumptions, our observed data representations of the parameters discussed below are also full data parameters.

In Moore and van der Laan (2009a), the authors presented the targeted maximum likelihood estimation method for the estimation of the t_k and treatment-specific parameters,

p_{0} \to Γ_{1} (p_{0}) (t_{k}) = P r (T_{1} > t_{k}) = E_{0} (S_{0} (t_{k} ∣ A = 1, W)) = S_{0, 1} (t_{k})

(1)

and

p_{0} \to Γ_{0} (p_{0}) (t_{k}) = P r (T_{0} > t_{k}) = E_{0} (S_{0} (t_{k} ∣ A = 0, W)) = S_{0, 0} (t_{k})

(2)

where the subscripts for Γ and T denote the treatment group, either 0 or 1, elsewhere the subscript 0 denotes the truth, and S_{0, 1} is the true survival function for treatment group 1 and S_{0, 0} is the true survival function for treatment group 0. Thereby, any linear combination of these parameters can be estimated to evaluate the effect of treatment A on survival T, e.g., the marginal log hazard of survival,

p_{0} \to Γ (p_{0}) (t_{k}) = log (\frac{log (P r (T_{1} > t_{k}))}{log (P r (T_{0} > t_{k}))}) = log (\frac{log (S_{0, 1} (t_{k}))}{log (S_{0, 0} (t_{k}))})

(3)

In this article, we are interested not in a test for the effect of treatment at a fixed end point t_k, but rather in the average effect over time. Note that in the continuous survival case, if one averaged Γ(p₀)(t_k) over all t, this parameter would correspond with the Cox proportional hazards parameter (i.e., coefficient for treatment in Cox proportional hazards model) and thus the parameter tested by the ubiquitous logrank test, given by

λ (t ∣ A) = λ (t) exp (ψ_{C} A)

(4)

More formally, let $M$ be the class of all densities of O with respect to an appropriate dominating measure and be nonparametric up to possible smoothness conditions. Let our parameter of interest be represented by Ψ(p₀), where

p_{0} \to Ψ (p_{0}) = \sum_{t_{k}} w (t_{k}) f (S_{1} (t_{k}), S_{0} (t_{k}))

(5)

for some weight function w(t_k) that we discuss in Section 4.3 and some function f. For example, g may be defined as in Eq. (3), and if T is continuous (or close to continuous), then this parameter is commonly tested with testing that ψ_C in Eq. (4) is equal to 0, or equivalently the logrank test. If T is discrete, one could instead define f as the marginal log odds of failure at t_k. In the latter case, this parameter is commonly tested using logistic regression (see Section 3). However, even in the discrete case, one can still choose f as in Eq. (3) as we do in this article, but can no longer use the Cox proportional hazards model (or equivalently the logrank test). Note that although we focus on the marginal log hazard of survival as our parameter of interest, the derivation of the TMLE for the average of the log odds parameter is very similar.

Thus, the targeted maximum likelihood test for the effect of treatment on survival is a test for H₀ : ψ₀ = 0 against H_A : ψ₀ ≠ 0, where ψ₀ = Ψ(p₀).

3. UNADJUSTED ESTIMATION OF Ψ(p₀)

In the case that T is continuous, the unadjusted estimation of Ψ(p₀) could be carried out by fitting a Cox proportional hazards model with a main term for treatment only, or equivalently carry out the logrank test. The proportional odds model, introduced in Cox (1972) for discrete survival times, is a model for the odds of dying at t, given survival up to time t, given by

\frac{λ (t ∣ A)}{1 - λ (t ∣ A)} = \frac{λ (t)}{1 - λ (t)} exp (β_{A} A)

where

log (\frac{λ (t ∣ A)}{1 - λ (t ∣ A)}) = β_{1} I (t = 1) + β_{2} I (t = 2) + \dots + β_{J} (t = J) + β_{A} A

(6)

Thus, β₁, . . . . , β_J capture the logit of the baseline hazard function, and β_A is the effect of treatment on the logit of the hazard. Such a parameterization leaves the baseline hazard unspecified and since A is a binary variable, β_A is the nonparameteric formulation of the effect of treatment on the logit of the hazard.

The likelihood function for the discrete hazard process, where O_r = (T̃, Δ, A), can be expressed as

L (O_{r}) = \prod_{i = 1}^{n} \Pr {(T_{i} = {\tilde{t}}_{i})}^{Δ_{i}} P r {(T_{i} > {\tilde{t}}_{i})}^{(1 - Δ_{i})}

(7)

= \prod_{i = 1}^{n} {(λ_{i} ({\tilde{t}}_{i}) \prod_{t = 1}^{{\tilde{t}}_{i} - 1} (1 - λ_{i} (t)))}^{Δ_{i}} {(\prod_{t = 1}^{{\tilde{t}}_{i}} (1 - λ_{i} (t)))}^{1 - Δ_{i}}

(8)

where t̃_i = min(t_i, c_i) is the last time point at which individual i was observed (i.e., either censored or the event occurred). Let ȳ_i = (y_i1, . . . , y_{it̃_i}) denote the event history for individual i where (y_i1, . . . , y_{it̃_i−1}) = (0, . . . , 0), and y_{it̃_i} = 1 if Δ_i = 1 and y_{it̃_i} = 0 if Δ_i = 0. It can be shown that,

L (O_{r}) = \prod_{i = 1}^{n} \prod_{t = 1}^{{\tilde{t}}_{i}} λ {(t ∣ A_{i})}^{y_{i t}} {(1 - λ (t ∣ A_{i}))}^{(1 - y_{i t})}

Note that this likelihood is equivalent to that of a sequence of independent Bernouilli trials and thus we can use standard logistic regression software to obtain the maximum likelihood estimates for the coefficients β in Eq. (6).

In practice, the logistic regression model is fitted with the dataset that includes repeated measures for each subject until the time when the subject either dies or is censored—e.g., if a given subject dies or is censored at time point 5, this subject would contribute 5 rows of data to the new dataset. The outcome variable is zero up until the event occurs, where it is set to 1. If the subject is censored, then the outcome remains 0, even at the last time point.

An estimate of the effect of treatment on the logit of the hazard can be obtained by extracting the coefficient for A; however, our parameter of interest is the average of the log of the ratio of log of survival under the two treatment regimens, as given by Ψ(p₀) as defined in Eq. (5). Thus, we use the logistic regression fit for the hazard, denoted by $\hat{λ} (t ∣ A)$ , to obtain estimates for ${\hat{λ}}_{1} (t) = \hat{λ} (t ∣ A = 1)$ and ${\hat{λ}}_{0} (t) = \hat{λ} (t ∣ A = 0)$ . Based on these estimates, we use the relation

S (t_{k}) = \prod_{j \leq t_{k}} (1 - λ (j))

to obtain estimates ${\hat{S}}_{1} (t_{k})$ and ${\hat{S}}_{0} (t_{k})$ . The unadjusted estimate of Ψ(p₀) is then computed as the crude average over time of the log of the ratio of the logs of these t_k-specific estimates. We note that, alternatively, one could estimate S₁(t_k) and S₀(t_k) for t_k ∈ 1, . . . , J using Kaplan–Meier and use these estimates as plug-ins into Eq. (5). The use of Kaplan–Meier would be more nonparametric since the proportional odds model assumes proportionality of the odds of failure. However, both provide valid tests of H₀ : ψ₀ = 0 in the nonparametric model since under H₀, both methods provide consistent estimators of ψ₀. In this article, we use the proportional odds approach only.

4. TARGETED MAXIMUM LIKELIHOOD ESTIMATION OF Ψ(p₀)

The log likelihood for the observed data O, which includes covariates W, for a single observation is given by

P (W) g (A ∣ W) {[λ (\tilde{t} ∣ A, W) \prod_{t = 1}^{\tilde{t} - 1} (1 - λ (t ∣ A, W))]}^{δ} {[\prod_{t = 1}^{\tilde{t}} (1 - λ (t ∣ A, W))]}^{1 - δ} \times [\overset{‒}{G} {({\tilde{t}}_{_} ∣ A, W)}^{δ} P {(C = \tilde{t} ∣ A, W)}^{1 - δ}]

(9)

where δ = 1 if T = t̃ and C ≥ t̃.

Consider an initial fit p̂⁰ of the density of the observed data O, identified by a hazard fit ${\hat{λ}}^{0} (t ∣ A, W)$ , the distribution of A identified by ĝ⁰(1 | W) and ĝ⁰(0 | W) = 1 – ĝ⁰(1 | W), the censoring mechanism ${\hat{G}}^{0} (t ∣ A, W)$ , and the marginal distribution of W being the empirical probability distribution of W₁, . . . , W_n. In an RCT, treatment is randomized and ${\hat{g}}^{0} (1 ∣ W) = \frac{1}{n} \sum_{i = 1}^{n} A_{i}$ .

Let the initial hazard fit be denoted by ${\hat{λ}}^{0} (t ∣ A, W)$ . This initial hazard can be represented as

logit ({\hat{λ}}^{0} (t ∣ A, W)) = m (t, A, W)

where m is any function of t, A, and W. We show that representing the initial hazard in this manner allows us to obtain its update (fluctuation) using standard software (e.g., glm in R). For example, we could consider the initial hazard

logit ({\hat{λ}}^{0} (t ∣ A, W)) = \hat{α} (t) + k (A, W ∣ \hat{β})

where k is some function of A and W. The targeted maximum likelihood algorithm updates this initial fit by adding to it the term εh(t, A, W), i.e.,

logit ({\hat{λ}}^{0} (∊) (t ∣ A, W)) = m (t, A, W) + ∊ h (t, A, W)

(10)

Now, ε is estimated by fitting Eq. (10) with standard logistic regression software and fixing the coefficient for m(t, A, W) at one and setting the intercept to zero. The initial hazard fit is then updated by adding to it $\hat{∊} h (t, A, W)$ . The covariate h(t, A, W) is then reevaluated based on this updated hazard (and thus survival) fit. The newly updated hazard now plays the role of the initial hazard and ε is then again estimated based on the newly updated covariate h(t, A, W). The procedure is iterated until convergence, i.e., $\hat{∊}$ is essentially zero. The targeted maximum likelihood estimate is based on the hazard obtained in the final step of the algorithm. The covariate h(t, A, W), which is defined in the following sections, is a function of the conditional survival function (S(t | A, W)) and the censoring mechanism ( $\overset{‒}{G} (t_{_} ∣ A, W)$ ). The covariate is reevaluated at each step of the algorithm based on the updated hazard estimate for λ(t | A, W) (and thus S(t | A, W)). The censoring mechanism is not updated in the algorithm. For the rationale for updating only the hazard, see Moore and van der Laan (2009a).

The covariate h(t, A, W) is selected such that the score for this hazard model at ε = 0 is equal to the projection of the efficient influence curve on scores generated by the parameter λ(t | A, W) in the nonparametric model for the observed data, assuming only CAR. Thus, the TMLE, that is, the estimator based on this iteratively updated hazard fit, solves the efficient influence curve estimating equation.

We provide two different approaches to the estimation of Ψ(p₀) based on the targeted maximum likelihood methodology. This specific covariate h(t, A, W) that updates the hazard is dependent on the approach. In the first approach, the targeted maximum likelihood estimates for each t_k specific parameter, S₁(t_k) and S₀(t_k), at each t_k ∈ 1, . . . , J are obtained. The substitution estimate for Ψ(p₀) is then based on these targeted maximum likelihood estimates. In the second method, the estimate for Ψ(p₀) is obtained by directly targeting that parameter.

4.1. Method 1: Substitution TMLE

In this first method, we describe the procedure for targeted maximum likelihood estimation of the t_k and treatment-specific parameters S₁(t_k) and S₀(t_k). Thereby, we can estimate any parameter that is a combination of them, such as the average parameter Ψ(p₀) defined in Eq. (5). We note that with this procedure one could choose a number of other parameters such as the average difference in survival or the average ratio of survival. However, here we focus on the substitution targeted maximum likelihood estimator (S-TMLE) of Ψ(p₀) only.

The covariates for targeting the t_k and treatment specific parameters S₁(t_k) and S₀(t_k) as defined in Eqs. (1) and (2), respectively, were provided in Moore and van der Laan (2009a). In short, the algorithm selects the covariates, h_{1t_k}(t, A, W) and h_{0t_k}(t, A, W), corresponding with targeting the parameters S₁(t_k) and S₀(t_k), respectively. For the parameter S₁(t_k), h_{1t_k}(t, A, W) is defined such that the score for the hazard model at ε₁ = 0 is equal to the projection of the efficient influence curve of S₁(t_k) on scores generated by the parameter λ(t | A, W) in the nonparametric model for the observed data, assuming only CAR. Similarly, for S₀(t_k), h_{0t_k}(t, A, W) is defined such that the score for the hazard model at ε₂ = 0 is equal to the projection of the efficient influence curve of S₀(t_k) on scores generated by the parameter λ(t | A, W) in the nonparametric model for the observed data, assuming only CAR. These covariates corresponding with parameters S₁(t_k) and S₀(t_k) are respectively given by

h_{1 t_{k}} (t, A, W) = - \frac{I (A = 1)}{g (1) \overset{‒}{G} (t_{_} ∣ A, W)} \frac{S (t_{k} ∣ A, W)}{S (t ∣ A, W)} I (t \leq t_{k})

(11)

and

h_{0 t_{k}} (t, A, W) = - \frac{I (A = 0)}{g (0) \overset{‒}{G} (t_{_} ∣ A, W)} \frac{S (t_{k} ∣ A, W)}{S (t ∣ A, W)} I (t \leq t_{k})

(12)

The parameters S₁(t_k) and S₀(t_k) can be targeted simultaneously by addition of both covariates h₁(t, A, W) and h₀(t, A, W) as in Eq. (10), and finding the two-dimensional updating coefficients, $\hat{∊} = {{\hat{∊}}_{1}, {\hat{∊}}_{2}}$ , i.e.,

logit ({\hat{λ}}^{0} (∊) (t ∣ A, W)) = m (t, A, W) + ∊_{1} h_{1 t_{k}} (t, A, W) + ∊_{2} h_{0 t_{k}} (t, A, W)

(13)

Finding $\hat{∊} = {{\hat{∊}}_{1}, {\hat{∊}}_{2}}$ in the updated hazard provided in Eq. (13) that maximizes the likelihood of the observed data can be done in practice by fitting a logistic regression in the covariates m(t, A, W) and h_{1t_k}(t, A, W) and h_{0t_k}(t, A, W). The coefficient for m(t, A, W) is fixed at one and the intercept is set to zero and thus the whole regression is not refit, but rather only ε is estimated. These steps for evaluating $\hat{∊}$ , and thus obtaining the updated hazard fit ${\hat{λ}}^{1} (t ∣ A, W)$ correspond with a single iteration of the targeted maximum likelihood algorithm. In the second iteration, the updated ${\hat{λ}}^{1} (t ∣ A, W)$ now plays the role of the initial fit, and the covariates h_{1t_k}(t, A, W) and h_{0t_k}(t, A, W) are then reevaluated with the updated ${\hat{S}}^{1} (t ∣ A, W)$ based on ${\hat{λ}}^{1} (t ∣ A, W)$ and $\hat{∊}$ is estimated again. Based on this update, ${\hat{λ}}^{2} (t ∣ A, W)$ is obtained. In the third iteration, ${\hat{λ}}^{3} (t ∣ A, W)$ is fitted and the procedure is iterated until $\hat{∊}$ is essentially zero. The final hazard fit at the last iteration of the algorithm is denoted by ${\hat{λ}}^{*} (t ∣ A, W)$ with the corresponding survival fit given by ${\hat{S}}^{*} (t ∣ A, W)$ .

The t_k-specific parameter Γ(p₀)(t_k), defined in Eq. (3), is estimated by

\hat{γ} {(t_{k})}^{*} = log (\frac{log (\frac{1}{n} \sum_{i = 1}^{n} {\hat{S}}^{*} (t_{k} ∣ 1, W_{i}))}{log (\frac{1}{n} \sum_{i = 1}^{n} {\hat{S}}^{*} (t_{k} ∣ 0, W_{i}))})

(14)

Finally, the parameter of interest Ψ(p₀) can be estimated by plugging in each of the t_k-specific estimates $\hat{γ} {(t_{k})}^{*}$ for t_k ∈ 1, . . . , J. That is,

{\hat{ψ}}^{*} = \sum_{t_{k}} w (t_{k}) \hat{γ} {(t_{k})}^{*}

4.1.1. Efficient Influence Curves

The efficient influence curves for the parameters S₁(t_k) and S₀(t_k), denoted by IC_{1t_k} and IC_{0t_k}, were provided in Moore and van der Laan (2009a), and are respectively given by

I C_{1 t_{k}} (g_{0}, G_{0}, S_{0}) = \sum_{t \leq t_{k}} h_{1} t_{k} (g_{0}, G_{0}, S_{0}) (t ∣ A, W) [I (\tilde{T} = t, Δ = 1) - I (\tilde{T} \geq t) \times λ_{0} (t ∣ A = 1, W)] + S_{0} (t_{k} ∣ A = 1, W) - Γ_{1} (p_{0}) (t_{k})

(15)

and

I C_{0 t_{k}} (g_{0}, G_{0}, S_{0}) = \sum_{t \leq t_{k}} h_{0} t_{k} (g_{0}, G_{0}, S_{0}) (t ∣ A, W) [I (\tilde{T} = t, Δ = 1) - I (\tilde{T} \geq t) \times λ_{0} (t ∣ A = 0, W)] + S_{0} (t_{k} ∣ A = 0, W) - Γ_{0} (p_{0}) (t_{k})

(16)

The efficient influence curve, denoted by IC_{t_k}, for the parameter Γ₀(p₀)(t_k), can be obtained by application of the δ-method to the influence curves IC_{1t_k} and IC_{0t_k}. We have

I C_{t_{k}} = a (t_{k}) I C_{1 t_{k}} + b (t_{k}) I C_{0 t_{k}}

(17)

which is a linear combination of IC_{1t_k} and IC_{0t_k}, with coefficients only a function of t_k. With some algebra, one can easily show that the coefficients are given by $a (t_{k}) = \frac{1}{S_{1} (t_{k}) log (S_{1} (t_{k}))}$ and $b (t_{k}) = \frac{- 1}{S_{0} (t_{k}) log (S_{0} (t_{k}))}$ .

Our parameter of interest Ψ(p₀) is the average (possibly weighted) of the t_k-specific log ratios of the logs of survival, i.e., average over t_k of Γ(p₀)(t_k). Therefore, its efficient influence curve is given by

\overset{‒}{I C} = \sum_{t_{k}} w (t_{k}) I C_{t_{k}}

(18)

4.1.2. Double Robustness Consistency Properties of the S-TMLE

In Moore and van der Laan (2009a), the statistical properties of the treatment and t_k-specific estimators were provided. Consider the parameter S₁(t_k). The targeted maximum likelihood estimate ${\hat{p}}^{*} \in M$ of p₀ solves the efficient influence curve estimating equation, given by $\sum_{i = 1}^{n} I C_{1 t_{k}} (g_{0}, \hat{G}, {\hat{S}}^{*}) (O_{i}) = 0$ , which is the optimal estimating equation for the parameter of interest. It can be shown that E₀IC_{1t_k}(S, g, G) = 0 if either (1) S = S₀(· | A, W) (and thus λ = λ₀(· | A, W)) or (2) g = g₀(A | W) and G = G₀(· | A, W) (see Appendix A). In an RCT, the treatment mechanism is known and g₀(A | W) = g₀(A). Therefore, the consistency of the estimator ${\hat{γ}}_{1}^{*} (t_{k})$ of S₁(t_k) in an RCT relies only on consistent estimation of G₀(· | A, W) or S₀(· | A, W). When there is no censoring or censoring is missing completely at random (MCAR), ${\hat{γ}}_{1}^{*} (t_{k})$ is consistent even when the estimator $\hat{S} (\cdot ∣ A, W)$ is inconsistent (e.g., if it relies on a misspecified model). Hence, in an RCT, one is not concerned with estimation bias due to misspecification of the hazard model. Under informative or missing at random (MAR) censoring, if G₀(· | A, W) is consistently estimated then ${\hat{γ}}_{1}^{*} (t_{k})$ is consistent even if S₀(· | A, W) is not consistently estimated. If both are correctly specified then ${\hat{γ}}_{1}^{*} (t_{k})$ is efficient. These same properties hold for the estimator ${\hat{γ}}_{0}^{*} (t_{k})$ for S₀(t_k).

Since the substitution based estimator $\hat{γ} {(t_{k})}^{*}$ for the t_k-specific parameter given by Eq. (3) is simply a function of these two treatment-specific estimators, it inherits these same double robustness properties. Similarly, since our parameter of interest, Ψ(p₀) is the average (possibly weighted) of the t_k-specific log ratios of the logs of survival, clearly the properties of the t_k-specific log ratio estimator directly extend to the estimator of the average (over time) parameter (i.e., the logrank analogue parameter). That is, if there is no censoring or censoring is MCAR, this method provides a covariate adjusted estimator that is consistent even when the hazard is misspecified. Thus, if one captures only part of the relevant covariate information, one can still gain in efficiency over the unadjusted method, without the risk of introducing bias.

We note that although the double robustness properties of the substitution estimators for the parameters Γ(p₀)(t_k) and Ψ(p₀) are inherited from the properties of ${\hat{γ}}_{1}^{*} (t_{k})$ and ${\hat{γ}}_{0}^{*} (t_{k})$ , the $\hat{γ} {(t_{k})}^{*}$ and ${\hat{ψ}}^{*}$ also solve their corresponding efficient influence curve estimating equations based on Eqs. (17) and (18) respectively.

4.1.3. Inference for S-TMLE

We first consider the parameter S₁(t_k). Since the TMLE is a solution to the efficient influence curve estimating equation, then from estimating equation theory (see van der Laan and Robins, 2003) if g₀ and G₀ are known, then the estimator is asymptotically linear with influence curve IC_{1t_k}(g₀, G₀, S). However, even though g₀ is typically known in an RCT, G₀ is not. In this case, the influence curve is given by

I C_{1 t_{k}} (g_{0}, G, S) - Π (I C_{1 t_{k}} ∣ T_{G})

That is, one must subtract from IC_{1t_k} its projection on the tangent space of the model for the censoring mechanism. Therefore, one can construct an asymptotically conservative Wald-type 0.95-confidence interval for ${\hat{γ}}_{1}^{*} (t_{k})$ based on the estimate of the efficient influence curve for S₁(t_k) ignoring this projection and using Eq. (15), i.e., using $I C_{1 t_{k}} (g_{0}, \hat{G}, \hat{S})$ , where this confidence interval is given by ${\hat{γ}}_{1}^{*} (t_{k}) \pm 1.96 \frac{{\hat{σ}}_{1}}{\sqrt{n}}$ , and

{\hat{σ}}^{2} = \frac{1}{n} \sum_{i = 1}^{n} I C^{2} (g_{0}, \hat{G}, \hat{S})

The null hypothesis H₀ : γ₁ = 0 can be tested with the test statistic

\hat{T} = \frac{{\hat{γ}}_{1}^{*}}{\frac{\hat{σ}}{\sqrt{n}}},

whose asymptotic distribution is N(0, 1) under the null hypothesis. Inference is derived in the same manner for the TMLE ${\hat{γ}}_{0}^{*} (t_{k})$ of S₀(t_k).

As with the estimators of treatment specific survival, if g₀ and G₀ are known, then the S-TMLE of Γ(p₀)(t_k) is asymptotically linear with influence curve IC_{t_k}(g₀, G₀, S). Since G₀ is not known, the influence curve is given by

\begin{matrix} I C_{t_{k}} & = a (t_{k}) (I C_{1 t_{k}} - Π (I C_{1 t_{k}} ∣ T_{G})) + b (t_{k}) (I C_{0 t_{k}} - Π (I C_{0 t_{k}} ∣ T_{g})) \\ = a (t_{k}) (I C_{1 t_{k}}) + b (t_{k}) I C_{0 t_{k}} - Π ((a (t_{k}) I C_{1 t_{k}} - b (t_{k}) I C_{0 t_{k}}) ∣ T_{G}) \end{matrix}

Therefore, one can construct an asymptotically conservative Wald-type 0.95-confidence interval for $\hat{γ}$ based on the estimate of the efficient influence curve for Γ(p₀)(t_k) using Eq. (17), i.e., using $I C_{t_{k}} (g_{0}, \hat{G}, \hat{S})$ , as given earlier. Similarly, the test statistic can be constructed as already shown.

Finally, for our parameter of interest, the average (possibly weighted) of the t_k-specific log of the ratio of logs of survival, the influence curve when G₀ is estimated is given by

\begin{matrix} \overset{‒}{I C} & = \sum_{t_{k}} w (t_{k}) [a (t_{k}) (I C_{1 t_{k}} - Π (I C_{1 t_{k}} ∣ T_{G})) + b (t_{k}) (I C_{0 t_{k}} - Π (I C_{0 t_{k}} ∣ T_{g}))] \\ = \sum_{t_{k}} w (t_{k}) [a (t_{k}) (I C_{1 t_{k}}) + b (t_{k}) I C_{0 t_{k}}] - Π (w (t_{k}) (a (t_{k}) I C_{1 t_{k}} - b (t_{k}) I C_{0 t_{k}}) ∣ T_{G}) \end{matrix}

Therefore, one can construct an asymptotically conservative Wald-type 0.95-confidence interval for ${\hat{ψ}}^{*}$ based on the estimate of the efficient influence curve for Ψ(p₀) using Eq. (18), i.e., using $\overset{‒}{I C} (g_{0}, \hat{G}, \hat{S})$ , as done earlier. Similarly, the test statistic can be constructed as already shown.

4.2. Method 2: Direct TMLE

In the previous method, we provided a substitution-based procedure based on the t_k-specific estimators of survival. However, we did not directly target the parameter of interest, which is the average of the log of the ratio of logs of survival in the treatment and control groups, given by Ψ(p₀) as defined in Eq. (5). In this section we present the targeted maximum likelihood algorithm for targeting this single parameter. Contrary to the S-TMLE, which requires two covariates to update the hazard at each time t_k ∈ {1, . . . , J}, the algorithm targeting the single parameter only requires a single covariate. This suggests that application of this direct targeted maximum likelihood estimator (D-TMLE) results in finite sample improvements in efficiency over the S-TMLE.

The efficient influence curve of our parameter of interest Ψ(p₀) is given by Eq. (18) in Section 4.1.1. We provided the time-dependent covariates h_{1t_k}(t, A, W) and h_{0t_k}(t, A, W) for targeting the t_k-specific survival S₁(t_k) and S₀(t_k), required to generate the corresponding components of IC_{1t_k} and IC_{0t_k} of the form

h_{j} (t, A, W) (d N (t) - E (d N (t) ∣ past (t)), j = 0, 1

where N(t) is a counting process. Thus, to generate the component ${\overset{‒}{I C}}_{t}$ of $\overset{‒}{I C}$ for the t-factor of the likelihood, we now select the single time-dependent covariate,

\overset{‒}{h} (t, A, W) = \sum_{t_{k}} w (t_{k}) a (t_{k}) h_{1 t_{k}} + \sum_{t_{k}} w (t_{k}) b (t_{k}) h_{0 t_{k}}

Now, in the targeted maximum likelihood algorithm, the first updating step of the initial hazard, denoted by ${\hat{λ}}^{0} (t ∣ A, W)$ , is given by

logit ({\hat{λ}}^{0} (∊) (t ∣ A, W)) = m (t, A, W) + ∊ \overset{‒}{h} (t, A, W)

(19)

Again, as in the substitution-based method, we now find $\hat{∊}$ by fitting Eq. (19) using standard logistic regression software by setting the coefficient for m(t, A, W) to one and setting the intercept to zero. The corresponding updated hazard ${\hat{λ}}^{1} (t ∣ A, W)$ is obtained. This represents the first step of the algorithm. The hazard fit ${\hat{λ}}^{1} (t ∣ A, W)$ now plays the role of the initial fit and the covariate h̄(t, A, W) is then reevaluated based on ${\hat{λ}}^{1} (t ∣ A, W)$ (and thus ${\hat{S}}^{1} (t ∣ A, W)$ ). Based on $\hat{∊}$ , estimated as described earlier, ${\hat{λ}}^{2} (t ∣ A, W)$ is obtained. On the third iteration, ${\hat{λ}}^{3} (t ∣ A, W)$ is obtained and the process is iterated until $\hat{∊}$ is essentially zero. The final hazard fit at the last iteration of the algorithm is denoted by ${\hat{λ}}^{*} (t ∣ A, W)$ with the corresponding survival fit given by ${\hat{S}}^{*} (t ∣ A, W)$ .

The procedure for estimation of the parameter of interest, Ψ(p₀), now follows exactly that of the S-TMLE method described in Section 4.1. That is, estimates of the t_k-specific parameter are obtained and then averaged over time to obtain the targeted maximum likelihood estimate ${\hat{ψ}}^{*}$ of $Ψ (p_{0})$ . Note that ${\hat{ψ}}^{*}$ solves the efficient influence curve estimating equation $\sum_{i = 1}^{n} \overset{‒}{I C} ({\hat{λ}}^{*}, g_{0}, \hat{G}) (O_{i})$ ; however, it does not solve $\sum_{i = 1}^{n} I C_{1 t_{k}} ({\hat{λ}}^{*}, g_{0}, \hat{G}) (O_{i})$ and $\sum_{i = 1}^{n} I C_{0 t_{k}} ({\hat{λ}}^{*}, g_{0}, \hat{G}) (O_{i})$ as does the S-TMLE.

4.2.1. Double Robustness Consistency Properties of the D-TMLE

It can be shown that the efficient influence curve $\overset{‒}{I C}$ cannot be written as an estimating equation in the parameter of interest Ψ(p₀). Therefore, the formal proof of double robustness consistency properties does not follow in an obvious manner as with the usual estimating function approach.

Let ψ(λ) be a solution to $P_{0} (\overset{‒}{I C} (λ, g_{0}, G_{0})) = 0$ . Empirically, we have found that solving $P_{0} (\overset{‒}{I C} (λ, g_{0}, G_{0})) = 0$ does not imply ψ(λ) = ψ₀. In this case, the estimating function method breaks down since there are multiple solutions to P₀(D(λ, g₀, G₀)) = 0 and they do not all guarantee ψ(λ) = ψ₀. Furthermore, through simulation studies, we have found that if the initial hazard (λ′) is misspecified such that it solves $P_{0} (\overset{‒}{I C} (λ^{'}, g_{0}, G_{0})) = 0$ but ψ(λ′) is inconsistent, then the targeted maximum likelihood algorithm does not update. That is, adding the ε-covariate results in a convex function in ε that should only have a single maximum in ε. In this very special case, the TMLE is inconsistent. However, through the simulation study presented in Section 6, and an extensive study of over 100 data-generating distributions (results provided in Appendix B), we have found that if the initial hazard does not solve $P_{0} (\overset{‒}{I C} (λ^{'}, g_{0}, G_{0})) = 0$ , i.e., the algorithm has a chance to iterate, then the estimator ψ(λ′*) (where = λ′* is the updated hazard based on the targeting algorithm) is indeed consistent. That is, if the initial hazard solves the given equation, the algorithm ends in zero steps and no iterations can take place. If the initial hazard does not solve the given equation, the algorithm will iterate and, based on our results, the estimator based on the updated hazard from the last step of the algorithm is consistent. This was tested for a number of misspecified initial hazards including an intercept-only model. Thus, based on the empirical results, we conjecture that the double robustness properties of the S-TMLE also hold for the D-TMLE as well.

Another interesting note is that the estimates for S₁(t_k) and S₀(t_k) based on λ′* are not necessarily (and are typically not) consistent. Thus, the following can, and often does, occur: $\sum_{t_{k}} log (\frac{log (S_{1} (t_{k}) (λ^{' *}))}{log (S_{0} (t_{k}) (λ^{' *}))}) = \sum_{t_{k}} log (\frac{log (S_{1} (t_{k}) (λ_{0}))}{log (S_{0} (t_{k}) (λ_{0}))})$ , but S₁(t_k)(λ′*) ≠ S₁(t_k)(λ₀) and S₀(t_k)(λ′*) ≠ S₀(t_k)(λ₀). Thus, the overall logrank parameter is consistently estimated, but its components (i.e., S₁(t_k) and S₀(t_k)) are not.

4.2.2. Inference for D-TMLE

The theorem provided in Appendix E can be used to derive the influence curve when ${\hat{λ}}^{*}$ converges to some misspecified λ*, solving $P_{0} \overset{‒}{I C} (λ^{*}, g_{0}, G_{0}) = 0$ and satisfying ψ(λ*) = ψ₀. When this is the case, from the preamble to the theorem, we can see that a contribution to the influence curve comes from $P_{0} D^{F} ({\hat{λ}}^{*})$ , where D^F is the is the efficient influence curve for the full data, i.e., if there is no censoring. Thus, there is a contribution to the influence curve in this situation that would not be accounted for in Eq. (18). For correct inference in this situation, one can apply this theorem to derive the formal influence curve and use this for inference.

Since deriving the formal influence curve is not trivial in this case in which λ* does not correctly specify all unknowns in the full data efficient influence curve, we can consider two straightforward alternatives. We first note that our empirical results suggest that the D-TMLE is more efficient than the S-TMLE. If indeed this property holds in general, then as the first alternative, we can use the influence curve of the S-TMLE to obtain a conservative estimate of the variance of the D-TMLE. The results then apply as in Section 4.1.3. A second alternative is the bootstrap procedure, although we note that this procedure can be computationally intensive, particularly when K and n are large. The confidence intervals and t-statistics can be constructed as in Section 4.1.3.

4.3. Weight Function

The parameter provided in Eq. (5) is really a whole class of logrank parameters, indexed by a choice of weight function. Each can be used to provide a valid test for testing that the treatment effect is 0, i.e., ψ = 0. Therefore, it is of interest to choose a weight function that is most likely to have more power at the alternative. A perfectly reasonable choice of weight function is one that equally weights each of t_k-specific log ratios. A weight function that down weights the time points t_k ∈ 1, . . . , J, in the tail of which there is heavy censoring, may improve the power over the unit weight function. One such weight function is one that takes into account the variance of $log (\frac{log (S_{1} (t_{k}))}{log (S_{0} (t_{k}))})$ , that is, $w (t_{k}) = \frac{1}{v a r (I C_{t_{k}})}$ , where IC_{t_k} is the efficient influence curve of $log (\frac{log (S_{1} (t_{k}))}{log (S_{0} (t_{k}))})$ . This weight function puts more emphasis on those points t_k ∈ 1, . . . , J for which there is less censoring, and thus more information, providing a more stable estimate of the parameter and the variance. In this article, we apply only unit weights.

5. INITIAL HAZARD ESTIMATION

To avoid the potential for selection of covariates to obtain favorable inference, it is imperative to use an a priori specified algorithm for the selection of the initial hazard that is specified in an analysis protocol. Since the hazard can be fitted with a logistic regression model, the initial hazard can be estimated with any model selection algorithm used to estimate logistic regression models with repeated measures. One such approach is the deletion/subtitution/addition (D/S/A) algorithm (Sinisi and van der Laan, 2004). In this algorithm, the parameter of interest (in this case the conditional hazard for survival) is defined in terms of a loss function. Candidate estimators are then generated using deletion/substitution/addition moves that minimize, over subsets of variables (e.g., polynomial basis functions), the empirical risk of subset-specific estimators of the parameter of interest. Among these candidates, the estimator is selected using cross-validation. For purposes of this algorithm, the parameter of interest is the conditional hazard of survival, and we define it as

λ_{0} (t ∣, A, W) = arg min_{λ} E_{0} L (λ)

where the actual repeated measures loss function is given by

L (λ) = \sum_{t} w (t, A, W) {[I (\tilde{T} = t, Δ = 1) - I (\tilde{T} \geq t) λ (t ∣ A, W)]}^{2}

where w(t, A, W) is an arbitrary weight function. Alternatively, we can use the log-likelihood loss function for λ(t | A, W), and apply the D/S/A algorithm as a standard logistic regression model selection with repeated measures for each subject, where candidate estimators are generated also based on the log-likelihood loss function.

An even more optimal algorithm for estimation of the initial conditional hazard for survival is the super-learner algorithm, which begins by selecting a set of candidate prediction algorithms (“learners”) which ideally cover different basis functions (van der Laan et al., 2007). For example, one such learner could be the D/S/A algorithm. The super-learner algorithm then selects α that minimizes

E_{B_{n}} P_{n, B_{n}}^{1} L (\sum_{j} α (j) {\hat{λ}}_{j} (P_{n, B_{n}}^{0}))

where B_n ∈ {0, 1} denotes a random binary vector whose realizations define a split of the learning sample into a training sample {i : B_n(i) = 0} and validation sample {i : B_n(i) = 1}, and $P_{n, B_{n}}^{1}$ and $P_{n, B_{n}}^{0}$ denote the empirical probability distributions of the validation and training sample, respectively. This minimization problem can be solved by formulating it as a least-squares regression problem. Thus, the algorithm finds optimal weighted combinations of the candidate estimators with respect to the squared error (L₂) loss function, with weights defined by α. We note that candidate estimators can be based on the log-likelihood loss function.

In addition to the hazard for survival, the hazard for censoring must also be estimated. One of the algorithms discussed earlier can also be applied to estimate the censoring mechanism. In particular, the super-learner algorithm can be applied to obtain an estimate for the hazard for censoring in the same manner as for the hazard for survival. If censoring is uninformative, then one can use Kaplan–Meier to estimate the censoring mechanism.

6. SIMULATION STUDIES

Data were simulated to mimic an RCT in which the goal is to determine the effectiveness of a new drug in comparison to the current standard of care on “survival” as measured by a occurrence of an event (e.g., particular marker falling below a given level) by 9 months into treatment. The probability of receiving the new treatment is 0.5. Two covariates were negatively correlated with survival time; for example, these covariates might represent age in years (multiplied by 0.1) and weight gain in the year prior to baseline. Specifically, the 2500 replicates of sample size 500 were generated based on the following data-generating distribution where time is discrete and takes values t_k ∈ {1, . . . , 9}:

Pr(A = 1) = Pr(A = 0) = 0.5
W₁ ~ U(2, 6)
W₂ ~ N(10, 10)
$λ (t ∣ A, W) = \frac{I (t_{k} < 9) I (Y (t_{k} - 1) = 0)}{1 + exp (- (- 8 - 0.75 A + 0.3 W_{1}^{2} + 0.25 W_{2}))} + I (t_{k} = 9)$

where λ(t | A, W) is the hazard for survival and Y(t_k) is the indicator that the event has occurred at or before time t_k. The linear correlations between {W₁, W₂} and failure time were approximately {−0.62, −0.52}. Three different types of censoring were simulated: no censoring, MCAR, and MAR. The MCAR and MAR censoring mechanisms were set such that approximately 27% and 20% of the observations were censored, respectively. The censoring was generated to ensure that $\overset{‒}{G} (t ∣ A, W) > 0$ (see the Discussion section for details of this assumption). If censoring and failure time were tied, the subject was considered uncensored. Under MCAR, the hazard for censoring was λ_C(t) = 0.15. Under MAR censoring, the hazard for censoring depends on A and W₁ where the treated subjects (A = 1) have a much higher hazard for censoring for high levels of W₁ than the untreated subjects, whereas the untreated subjects have a much higher hazard for censoring than the treated subjects for low levels of W₁. The specific censoring mechanism is provided in Appendix C.

The unadjusted estimator was applied as defined in Section 3. The two targeted maximum likelihood methods provided in sections 4.1 and 4.2 were applied using three different initial hazard fits. The first initial hazard was correctly specified. The second initial hazard was misspecified by including only a main term for A and W₁. The third initial hazard was misspecified by including only a main term for A and W₂. In the MCAR censoring setting, the censoring mechanism was correctly estimated using Kaplan–Meier. In the MAR censoring setting, the censoring mechanism was correctly specified. Inference for the D-TMLE was based on the variance of the S-TMLE.

The estimators were compared using a relative efficiency (RE) measure based on the mean squared error (MSE), computed as the MSE of the unadjusted estimates divided by the MSE of the targeted maximum likelihood estimates. Thus, a value greater than 1 indicates a gain in efficiency of the covariate adjusted TMLE over the unadjusted estimator.

In addition to these three simulation scenarios, to explore the relationship between RE and the correlation between the covariate and failure time, we generated data with a hazard that is based on A and a single covariate W. The data were simulated such that the correlation between W and failure time ranged from −0.1 through −0.8 while the effect of A on survival remained constant. The data were simulated with these increasing correlations between W and T with both weak effect and strong effects of treatment. Note that in each of these simulation scenarios, there was in fact a treatment effect. We also simulated data according to the same distributions with the exception that the coefficient for treatment was zero to check the Type I error. A sample of the results is provided in Appendix D, where we found that indeed the 0.05 level was correctly maintained.

To demonstrate that the updating of the initial hazard in the TMLE algorithm does indeed result in a bias reduction for the parameter of interest, we also estimated this parameter directly based on the initial hazard. That is, in Eq. (14), instead of S*(t | A, W) based on λ*(t | A, W), we use S⁰(t | A, W) based on the initial hazard λ⁰(t | A, W). Such a substitution estimator relies on correct specification of the regression model, just as the Cox proportional hazards regression model would result in a model-dependent substitution estimator. Thus, we only include these results to demonstrate the bias reduction of the TMLE when the initial hazard is misspecified.

Lastly, in Section 6.3, we provide a simulation study to demonstrate the importance of the use of data-adaptive algorithms in the estimation of the initial hazard with respect to maximal gains in power.

6.1. Simulation Results and Discussion for Various Censoring Scenarios

In the no censoring and MCAR censoring scenarios, the bias should be approximately zero. In this strong covariate setting, exploiting this covariate by applying the TMLE should provide a gain precision due to a reduction in the residuals. In the informative censoring setting (MAR), in addition to the expected gain in efficiency we expect a reduction in bias of the TMLE with the correctly specified treatment mechanism over the unadjusted estimator. The informative censoring is accounted for through the covariate h that is inverse weighted by the subjects’ conditional probability of being observed at time t given their observed history.

Tables 1–3 provide the bias, relative MSE, and power based for the unadjusted and two targeted maximum likelihood approaches for the no censoring, MCAR censoring, and MAR censoring settings respectively. The results show that indeed the expected gain in efficiency is achieved in the no censoring and MCAR censoring scenarios. When the initial hazard was correctly specified, the gain in power for the TMLE was as high as 0.57 in the no censoring scenario over the unadjusted estimator (Table 1). Although the gains are more modest when the initial hazard is misspecified, the gain in power was as high as 0.22 for the TMLE over the unadjusted estimator. Under the MCAR censoring scenario, the gains under misspecification were somewhat smaller, with an increases in power between 0.06 and 0.51 from misspecified to correctly specified initial hazards (Table 2). These results demonstrate that when the initial hazard fit can be closely approximated, the potential reduction in standard error and thus increase in power is substantial.

Table 1.

No censoring: Power and efficiency comparison

Method	% Bias	Power	95% Coverage	RE
Unadjusted	–2	0.39	0.96	1.00
S-TMLE COR	0	0.96	0.94	3.99
D-TMLE COR	0	0.96	0.94	4.00
S-TMLE MIS1	2	0.61	0.95	1.50
D-TMLE MIS1	–1	0.60	0.95	1.59
S-TMLE MIS2	1	0.53	0.94	1.21
D-TMLE MIS2	–2	0.51	0.95	1.29

Open in a new tab

Note. Comparison of the two targeted maximum likelihood approaches the unadjusted logrank under the no censoring setting. Correctly specified initial λ(t | A, W) (COR), misspecified initial λ(t | A, W) includes only a main term for treatment and W₁ (MIS1), and misspecified initial λ(t | A, W) includes only a main term for treatment and W₂ (MIS2).

Table 3.

MAR censoring: Power and efficiency comparison

Method	% Bias	Power	95% Coverage	RE
Unadjusted	21	0.55	0.88	1.00
S-TMLE COR	2	0.94	0.95	3.89
D-TMLE COR	2	0.94	0.95	4.58
S-TMLE MIS1	1	0.57	0.95	1.50
D-TMLE MIS1	0	0.55	0.96	1.83
S-TMLE MIS2	1	0.51	0.94	1.25
D-TMLE MIS2	–2	0.48	0.95	1.53

Open in a new tab

Note. Comparison of the two targeted maximum likelihood approaches to the unadjusted logrank, under MAR censoring. Correctly specified initial λ(t | A, W) (COR), misspecified initial λ(t | A, W) includes only a main term for treatment and W₁ (MIS1), and misspecified initial λ(t | A, W) includes only a main term for treatment and W₂ (MIS2).

Table 2.

MCAR censoring: Power and efficiency comparison

Method	% Bias	Power	95% Coverage	RE
Unadjusted	1	0.43	0.94	1.00
S-TMLE COR	1	0.94	0.94	3.84
D-TMLE COR	2	0.95	0.95	4.12
S-TMLE MIS1	2	0.58	0.95	1.46
D-TMLE MIS1	0	0.56	0.95	1.60
S-TMLE MIS2	1	0.52	0.95	1.31
D-TMLE MIS2	–2	0.49	0.95	1.40

Open in a new tab

Note. Comparison of the two targeted maximum likelihood approaches to the unadjusted logrank, under MCAR censoring. Correctly specified initial λ(t | A, W) (COR), misspecified initial λ(t | A, W) includes only a main term for treatment and W₁ (MIS1), and misspecified initial λ(t | A, W) includes only a main term for treatment and W₂ (MIS2).

Under the MAR setting, the unadjusted estimate is severely biased (~21%), whereas both the TMLEs remain consistent. In such a setting, one must account for the informative censoring as the results from the unadjusted method are completely unreliable. This is a strong advantage of this methodology as it accounts for this bias-inducing censoring, which is often ignored or not correctly handled in RCTs (Wood et al., 2004).

We also note that for all censoring scenarios, the REs are all greater for the D-TMLE. However, the actual power is slightly lower than the S-TMLE. This is due to the fact that at this small sample size, there happens to be a tiny amount of negative finite sample bias for the D-TMLE (average of the 2,500 point estimates is slightly smaller in absolute value than the truth), whereas the S-TMLE is slightly positively biased. Thus, even though the D-TMLE is more efficient than the S-TMLE, the absolute values of the point estimates are slightly smaller, causing the t-statistics to be smaller as well. Thus, the power is lower as well. For larger sample sizes, as the finite sample is eliminated, the power for the D-TMLE will be at least as large as the power for the S-TMLE. Also, as expected, the inference was slightly conservative as compared to the S-TMLE, although one cannot observe this from the presented results that are rounded to 10⁻². The bootstrap procedure would provide less conservative inference.

The results for the substitution estimator based on the initial hazard in the no censoring scenario are provided in Table 4. As expected, when the initial hazard is correctly specified, the performance of the substitution estimator is similar to both TMLEs. However, when the initial hazard is misspecified, it becomes clear that the targeting algorithm is indeed updating the hazard in such a way that the bias is essentially removed. If the updating were not done, from Table 4, we can see that the bias for the first misspecified hazard was 6% and for the second misspecified hazard the bias was 21%.

Table 4.

No censoring: Substitution estimator based on initial hazard (i.e., hazard not updated by TMLE algorithm)

Method	% Bias	Power	95% Coverage
COR	0	0.97	0.95
MIS1	6	0.60	0.95
MIS2	21	0.59	0.93

Open in a new tab

Note. Correctly specified initial λ(t | A, W) (COR), misspecified initial λ(t | A, W) includes only a main term for treatment and W₁ (MIS1), and misspecified initial λ(t | A, W) includes only a main term for treatment and W₂ (MIS2).

6.2. Relationship Between Correlation of Covariate(s) and Failure Time with Efficiency Gain

As the correlation between covariates and failure time increases we expect to observe increasing gains in efficiency. In this simulation study, we include only a single covariate W with no censoring. For simplicity, we include the results for the S-TMLE only. Figure 1 clearly demonstrates that as the correlation between W and failure time increases, so does the gain in power of the S-TMLE over the unadjusted. The gain in power for the strong treatment effect setting has a nearly linear relationship with increasing correlation between W and failure. The weak treatment effect setting has moderate gains until the covariate effect is very strong, when the corresponding gain in power of the S-TMLE is very high. These results reflect similar findings in RCTs with fixed-end point studies where relations between R² and efficiency gain have been demonstrated (Moore and van der Laan, 2009b; Pocock et al., 2002; Tsiatis et al., 2008).

Power by increasing correlation between covariate and survival time. Relationship between the correlation between covariate and survival time (*ρWT*) and the gains in power between the unadjusted and S-TMLE for both strong and weak treatment (tx) effects on survival.

6.3. Importance of the Use of Data-Adaptive Algorithms for Initial Hazard Estimation

In this section, we provide a simulation example to demonstrate the power that one can obtain by using an aggressive algorithm for estimation of the initial hazard, in comparison to a simpler main term only model selection algorithm. These methods are compared to the standard unadjusted approach. This section is not meant to examine the super-learner algorithm in detail, but rather to demonstrate that its use with targeted maximum likelihood estimation can result in significant gains in power over targeted maximum likelihood estimation with less aggressive algorithms. For details on the algorithm as well as some of the candidate learners, we refer the reader to the original paper (van der Laan et al., 2007).

We simulated 500 replicates of sample size 500 from the following data-generating distribution where time is discrete and takes values t_k ∈ {1, . . . , 8}:

Pr(A = 1) Pr(A = 0) = 0.5
W₁ ~ U(2, 5)
W₂ ~ N(−2, 2)
Pr(W₃ = 1) = 0.3 = 1 − Pr(W₃ = 0)
W₄ ~ N(1, 1)
W₅ ~ U(−2, 4)
$λ (t ∣ A, W) = \frac{I (t_{k} < 8) I (Y (t_{k} - 1) = 0)}{1 + exp (- (- 3 - 0.5 A + 0.2 W_{1} W_{2} + 0.2 W_{2} W_{4} - 0.4 W_{4} W_{5} + 0.5 W_{5} W_{2}))} + I (t_{k} = 8)$

The data were generated with no censoring.

In the first method, the initial hazard was estimated using the super-learner algorithm (see Section 5), which included as candidate learners:

Simple linear regression with all five covariates as main terms.
Lasso logistic regression (Tibshirani, 1996).
Random forest (Breiman, 2001).
Generalized additive models (Hastie and Tibshirani, 1990).
K-nearest neighbor classification (Ripley, 1996).

In the second method, the D/S/A algorithm was applied, allowing only main terms (i.e., no interactions or terms with powers greater than 1 were considered). The targeted maximum likelihood method was then applied using these two different initial hazard estimates. For brevity, we include only the results for the S-TMLE. Lastly, for comparison, the unadjusted method was applied.

The percent bias, power, and RE results are provided in Table 5. The results demonstrate that both methods of adjustment result in a gain in power over the unadjusted method. The power for the S-TMLE with super-learner approach is nearly double that of the unadjusted method, and is 30% higher than the S-TMLE with the D/S/A using main terms only. Thus, a significant loss in power would result if an aggressive algorithm for the initial hazard estimation is not utilized. It is of note that a 50% gain in power over the unadjusted was achieved using S-TMLE with the simple main term only approach. It is clear from these results that the targeted maximum likelihood method of covariate adjustment provides gains in efficiency, even with suboptimal methods for initial hazard estimation. However, for even larger gains in efficiency and thus power, more aggressive algorithms such as the super-learner should be used in combination with the targeted maximum likelihood. We note that the method for selection of the initial hazard should be specified in the analysis protocol.

Table 5.

Power and initial hazard estimation

	% Bias	Power	RE
Unadjusted	–2	0.25	1.00
S-TMLE_MT	1	0.37	1.39
S-TMLE_SL	2	0.48	1.54

Open in a new tab

Note. Comparison of the S-TMLE using the main term only D/S/A algorithm for the initial hazard estimation (S-TMLEMT_MT), and S-TMLE using the super-learner algorithm for the initial hazard estimation (S-TMLE_SL).

6.4. Analysis of Single Dataset

In this section, we analyze a single dataset that was simulated based on an actual RCT. The RCT aimed to evaluate the effect of a new treatment versus a standard on the time until viral load fell below 400 copies/ml. The n = 750 patients were followed monthly for a period of 12 months. Four covariates were associated with the viral load that was measured over time: baseline viral load, age, regional indicator Asia, and regional indicator Europe. The covariates were sampled with replacement from the original data. The median of the baseline viral load was approximately 615 copies/ml, the median age was 33, and the probabilities that a patient was from Asia or Europe were 0.3 and 0.15, respectively. The probability of receiving the new treatment was 0.5. Viral loads at each subsequent month were simulated from these four variables in addition to treatment. The first month at which the viral load was below 400 copies/ml was the survival time T for a given patient. In the dataset analyzed, the correlations between baseline viral load, age, regional indicator Asia, and regional indicator Europe with survival time were 0.3, −0.08, 0.12, and −0.07. Approximately 20% of patients were randomly censored. The S-TMLE was applied using the super-learner to estimate the initial hazard. Kaplan–Meier was used to estimate the censoring mechanism. The unadjusted estimator as outlined in Section 3 was also applied.

The point estimate, standard error and p values are provided in Table 6. The standard error was reduced by 11% by adjusting for the covariates using the S-TMLE estimator, and the corresponding p value was reduced from 0.07 to 0.04, which would change the conclusion of the test of the null hypothesis of no treatment effect. Thus, this dataset, which was simulated to mimic a real RCT, demonstrates that even moderate gains in efficiency can change the conclusions of the study.

Table 6.

Single dataset analysis

	Point estimate	SE	p Value
Unadjusted	0.157	0.086	0.071
S-TMLE	0.162	0.078	0.038

Open in a new tab

Note. Comparison of the S-TMLE using the super-learner algorithm for the initial hazard estimation (S-TMLE) and the unadjusted estimator.

7. DISCUSSION

The simulation studies provided in this article clearly demonstrate that significant gains in efficiency and thus power can be achieved over the unadjusted ubiquitous logrank method through covariate adjustment using the targeted maximum likelihood approach. Both the targeted maximum likelihood methods for covariate adjustment presented in this article do not require additional assumptions beyond those required for the logrank test. With the S-TMLE, we were able to show the double robustness consistency properties based on estimating function methodology. With the D-TMLE, the estimating function methodology could not be applied and therefore we provided extensive empirical evidence that these properties also held for this estimator.

We note that the methods presented in this article differ from adjusting through the Cox proportional hazards models as was done in Hernández et al. (2006), which requires additional assumptions about the proportionality of hazards. Furthermore, the method presented in this article provides a method for estimation of the marginal or population level effect of treatment, rather than a conditional effect from a Cox or covariate-adjusted logistic hazard model. We note that the method of targeted maximum likelihood estimation can also be applied to the estimation of conditional or subgroup effects of treatment; however, we focused on marginal effects only in this article.

The simulation study results also demonstrate the importance of the estimation of the initial hazard in optimizing gains in power. The ideal approach includes two steps, where in the first the initial hazard is estimated based on an aggressive data-adaptive approach such as the super-learner algorithm, and in the second the targeting maximum likelihood step is applied as a bias-reduction step for the parameter of interest. These two steps combined provide consistent estimates of the treatment effect with large gains in power over the procedure that ignores covariates.

It is also important to note that the TMLE, like other inverse weighted estimators, relies on the assumption that each subject has a positive probability of being observed (i.e., not censored) just before time t. More formally, this assumption is Ḡ(t_ | A, W) > 0, t_k ∈ 1, . . . , J. This identifiability assumption has been addressed as an important assumption for right-censored data (Robins and Rotnitzky, 1992). One is alerted to such violations by observing very small probabilities of remaining uncensored based on the estimated censoring mechanism, i.e., there are patients with a probability of censoring of almost one given their observed past. We recommend that one should check that Ḡ(t_ | A, W) > 0.1 in practice. When violations of this assumption are present, an new advance to the targeted maximum likelihood approach presented in this article, namely, collaborative targeted maximum likelihood, can be applied (van der Laan and Gruber, 2009). In this approach, a sequence of TMLEs is generated with increasing likelihood indexed by increasingly non-parametric estimates of the censoring mechanism. The censoring mechanism estimator, for which the targeted maximum likelihood step results in the most effective bias reduction with respect to the parameter of interest, is selected using likelihood based cross-validation. Essentially, in this approach, covariates are only included in the censoring mechanism fit if they improve the targeting of the parameter of interest while not grossly affecting the MSE.

The methodology presented in this article can easily be extended to estimation of causal effects in observational studies, such as post-market safety studies. This includes estimation of the causal parameters presented in this article, as well as more complex parameters as defined by marginal structural models. A further extension, in both RCTs and observational studies, is the inclusion of time-dependent covariates, which are often predictive of censoring and/or survival (van der Laan, 2008; Moore and van der Laan, 2009a). Future work includes the application of these methods in observational studies.

APPENDIX A: TMLE FOR t_k-SPECIFIC PARAMETER IS (COLLABORATIVE) DOUBLY ROBUST

We now show that

E_{0} D^{*} (S, g_{0}, G_{0}, S_{0, 1} (t_{k})) = 0

for any S = S(· | A, W), where D*(S, g, G, Ψ(S)) is the efficient influence curve of Ψ(S) = S₁(t_k) at the data-generating distribution identified by S, g, G and the marginal distribution of W.

We have

D^{*} (S, g_{0}, G_{0}, S_{0, 1} (t_{k})) = - \sum_{t \leq t_{k}} \frac{I (A = 1)}{{\overset{‒}{G}}_{0} (t_{_} ∣ A = 1, W) g_{0} (1 ∣ W)} \frac{S (t_{k} ∣ A = 1, W)}{S (t ∣ A = 1, W)} \times (d N (t) - E (d N (t) ∣ \overset{‒}{N} (t_{_}), A = 1, W)) + S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k})

Rewrite this as

D^{*} (S, g_{0}, G_{0}, S_{0, 1} (t_{k})) = - \sum_{t \leq t_{k}} \frac{I (C \geq t)}{{\overset{‒}{G}}_{0} (t_{_} ∣ A = 1, W)} \frac{I (A = 1)}{g_{0} (1 ∣ W)} \frac{S (t_{k} ∣ A = 1, W)}{S (t ∣ A = 1, W)} \times (I (T = t) - I (T \geq t) λ (t ∣ A = 1, W)) + S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k})

We now take the conditional expectation given I(T̃ ≥ t), A, W, to obtain

- P_{0} D^{*} (S, g_{0}, G_{0}, S_{0, 1} (t_{k})) = P_{0} \sum_{t \leq t_{k}} \frac{I (C \geq t)}{{\overset{‒}{G}}_{0} (t_{_} ∣ A = 1, W)} \frac{I (A = 1)}{g_{0} (1 ∣ W)} \frac{S (t_{k} ∣ A = 1, W)}{S (t ∣ A = 1, W)} \times I (T \geq t) (λ_{0} (t ∣ A = 1, W) - λ (t ∣ A = 1, W)) + P_{0} S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k})

We now take the conditional expectation, given I(C ≥ t), A, W, to obtain

- P_{0} D^{*} (S, g_{0}, G_{0}, S_{0, 1} (t_{k})) = P_{0} \sum_{t \leq t_{k}} \frac{I (C \geq t)}{{\overset{‒}{G}}_{0} (t_{_} ∣ A = 1, W)} \frac{I (A = 1)}{g_{0} (1 ∣ W)} \frac{S (t_{k} ∣ A = 1, W)}{S (t ∣ A = 1, W)} \times S_{0} (t - ∣ A, W) (λ_{0} (t ∣ A = 1, W) - λ (t ∣ A = 1, W)) + P_{0} S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k})

Define now the term

W^{*} (t) \equiv \frac{S (t_{k} ∣ A = 1, W)}{S (t ∣ A = 1, W)} S_{0} (t - ∣ A, W) (λ_{0} (t ∣ A = 1, W) - λ (t ∣ A = 1, W))

E (I (C \geq t, A = 1) ∣ W^{*} (t), {\overset{‒}{G}}_{0} (t - ∣ A = 1, W) g_{0} (1 ∣ W)) = {\overset{‒}{G}}_{0} (t - ∣ A = 1, W) g_{0} (1 ∣ W)

then we obtain

- P_{0} D^{*} (S, g_{0}, G_{0}, S_{0, 1} (t_{k})) = P_{0} \sum_{t \leq t_{k}} \frac{S (t_{k} ∣ A = 1, W)}{S (t ∣ A = 1, W)} S_{0} (t - ∣ A, W) (λ_{0} (t ∣ A = 1, W) - λ (t ∣ A = 1, W)) + P_{0} S (t_{k} ∣, A = 1, W) - S_{0, 1} (t_{k})

In particular, this identity applies if Ḡ₀(t − | A = 1, W) and g₀(1 | W) are the true conditional distributions, given the whole W, but the requirement just shown is weaker by only requiring that the censoring mechanism G₀ and treatment mechanism g₀ are conditioning on a function of W depending directly on λ – λ₀. In particular, if λ = λ₀, then treatment and censoring mechanism do not need to condition on any covariates at all.

We now take the right-hand side of the latter identity as starting point and prove that it equals zero for all S. Note that this term has nothing to do with the censoring and treatment mechanism anymore. First, note that it can be rewritten as:

- P_{0} D^{*} (S, g_{0}, G_{0}, ψ_{0}) = - P_{0} \sum_{t \leq t_{k}} S (t_{k} ∣ A = 1, W) \times [\frac{S (t_{_} ∣ A = 1, W) f_{0} (t ∣ A = 1, W) - S_{0} (t_{_} ∣ A = 1, W) f (t ∣ A = 1, W)}{S (t ∣ A = 1, W) S (t_{_} ∣ A = 1, W)}] + P_{0} S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k})

Here we used ψ₀ to indicate the true target parameter S_{0, 1}(t_k). Now we can use the algebraic trick, ab − cd = (a − c)d + a(b − d) to obtain the expression

E_{0} D^{*} (S, g_{0}, G_{0}, ψ_{0}) = - E_{0} \sum_{t \leq t_{k}} S (t_{k} ∣ A = 1, W) [\frac{(S (t_{_} ∣ A = 1, W) - S_{0} (t ∣ A = 1, W)) f (t ∣ A = 1, W)}{S (t ∣ A = 1, W)) S (t_{_} ∣ A = 1, W)} + \frac{S (t_{_} ∣ A = 1, W) (f_{0} (t ∣ A = 1, W) - f (t ∣ A = 1, W))}{S (t ∣ A = 1, W)) S (t_{_} ∣ A = 1, W)}] + P_{0} S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k})

For the continuous survival case, we note that

\begin{matrix} \int_{[0, t_{k}]} (\frac{S (t_{_} ∣ A = 1, W) - S_{0} (t_{_} ∣ A = 1, W)}{S (t_{_} ∣ A = 1, W)}) & = \int_{[0, t_{k}]} - \frac{(S (t_{_} ∣ A = 1, W) - S_{0} (t_{_} ∣ A = 1, W)) f (t_{_} ∣ A = 1, W)}{S^{2} (t_{_} ∣ A = 1, W)} + \frac{f (t_{_} ∣ A = 1, W)}{S (t_{_} ∣ A = 1, W)} - \frac{f_{0} (t_{_} ∣ A = 1, W)}{S (t_{_} ∣ A = 1, W)} \\ = \frac{(S (t_{k} ∣ A = 1, W) - S_{0} (t_{k} ∣ A = 1, W)) f (t_{_} ∣ A = 1, W)}{S^{2} (t_{k} ∣ A = 1, W)} + \frac{(f_{0} (t_{k} ∣ A = 1, W) - f (t_{k} ∣ A = 1, W)) S (t_{k} ∣ A = 1, W)}{S^{2} (t_{k} ∣ A = 1, W)} . \end{matrix}

Therefore, we have

\begin{matrix} E_{0} (D^{*} (S, g_{0}, G_{0}, ψ_{0} ∣ X)) & = - E_{0} \frac{S (t_{k} ∣ A = 1, W) (S (t_{_} ∣ A = 1, W) - S_{0} (t_{_} ∣ A = 1, W))}{S (t_{k} ∣ A = 1, W)} + E_{0} S (t_{k} ∣ A = 1, W) - E_{0} S_{0, 1} (t_{k}) \\ = - E_{0} S (t_{k} ∣ A = 1, W) + S_{0, 1} (t_{k}) + E_{0} S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k}) \\ = 0 \end{matrix}

Now for the discrete survival case, it can be shown with some algebra that

\frac{S (t_{k} ∣ A = 1, W) - S_{0} (t_{0} ∣ A = 1, W)}{S (t_{k} ∣ A = 1, W)} - \frac{S (t_{k - 1} ∣ A = 1, W) - S_{0} (t_{k - 1} ∣ A = 1, W)}{S (t_{k - 1} ∣ A = 1, W)} = \frac{S (t_{k} ∣ A = 1, W) - S_{0} (t_{k} ∣ A = 1, W) f (t_{k})}{S (t_{k} ∣ A = 1, W) S (t_{k - 1} ∣ A = 1, W)} - \frac{f (t_{k} ∣ A = 1, W) - f_{0} (t_{k} ∣ A = 1, W) S (t_{k} ∣ A = 1, W)}{S (t_{k} ∣ A = 1, W) S (t_{k - 1} ∣ A = 1, W)}

Furthermore,

\sum_{t_{k} \leq t_{k}} [\frac{S (t_{k} ∣ A = 1, W) - S_{0} (t_{k} ∣ A = 1, W)}{S (t_{k} ∣ A = 1, W)} - \frac{S (t_{k - 1} ∣ A = 1, W) - S_{0} (t_{k - 1} ∣ A = 1, W)}{S (t_{k - 1} ∣ A = 1, W)}] = \frac{S (t_{k} ∣ A = 1, W) - S_{0} (t_{k} ∣ A = 1, W)}{S (t_{k} ∣ A = 1, W)}

Therefore, for discrete survival we can write,

\begin{matrix} E_{0} D^{*} (S, g_{0}, G_{0}, ψ_{0} ∣ X) & = - E_{0} \sum_{t \leq t_{k}} S (t_{k} ∣ A = 1, W) [\frac{S (t_{0} ∣ A = 1, W) - S_{0} (t_{0} ∣ A = 1, W)}{S (t_{0} ∣ A = 1, W)}] + E_{0} S (t_{k} ∣ A = 1, W) - S_{0, 1} (t_{k}) \\ = 0 \end{matrix}

APPENDIX B: EMPIRICAL VALIDATION FOR D-TMLE CONSISTENCY

In this appendix, results based on extensive simulations are included with the purpose of providing empirical validation of the consistency of the D-TMLE when targeting the single parameter directly. We consider the scenario where ${\hat{λ}}^{*}$ converges to some misspecified λ* but the efficient influence curve estimating equation is solved and parameter is consistently estimated. Remarkably, even though the efficient influence curve estimating equation is not an estimating equation with a variation-independent parameterization in the parameter of interest (ψ) and the nuisance parameters, we still obtain consistent estimates of ψ even with this misspecified λ.

Four different types data-generating distributions were used. For all four data-generating mechanisms, the two covariate distributions were given by W₁ ~ U(2, 6) and W₂ ~ N(1, 1). The data generating distributions differed by definition of the hazard and treatment mechanism. In the first three settings, treatment was randomized, P(A = 1) = P(A = 0) = 0.5. In the fourth setting, treatment mechanism given by $P (A = 1 ∣ W) = \frac{1}{1 + exp (- (- 0.75 + .3 * W_{2}))}$ and P(A = 0 | W) = 1 − P(A = 1 | W). The settings are summarized as follows:

Constant hazard, treatment randomized:
$λ (t ∣ A, W) = \frac{I (t_{k} < 10) I (Y (t_{k} - 1) = 0)}{1 + exp (- (β_{0} - β_{1} A + β_{2} W_{1}^{2} + β_{3} W_{2}))} + I (t_{k} = 10)$
Hazard changed over time, treatment randomized:
$λ (t ∣ A, W) = \frac{I (t_{k} < 10) I (Y (t_{k} - 1) = 0)}{1 + exp (- (β_{0} - β_{1} A + β_{2} W_{1}^{2} + β_{3} W_{2} + β_{4} * t_{k}))} + I (t_{k} = 10)$
Hazard changed over time (interaction between time and covariate), treatment randomized:
$λ (t ∣ A, W) = \frac{I (t_{k} < 10) I (Y (t_{k} - 1) = 0)}{1 + exp (- (β_{0} - β_{1} A + β_{2} W_{1} * t_{k} + β_{3} W_{2}))} + I (t_{k} = 10)$
Constant hazard, treatment not randomized:
$λ (t ∣ A, W) = \frac{I (t_{k} < 10) I (Y (t_{k} - 1) = 0)}{1 + exp (- (β_{0} - β_{1} A + β_{2} W_{1}^{2} + β_{3} W_{2}))} + I (t_{k} = 10)$

For each setting, 25 sets of parameter values that define the hazard were selected from the following distributions, respectively:

β₀ ~ U(−8, −1), β₁ ~ U(−1, 1), β₂ ~ U(−0.5, 0.5), and β₃ ~ U(−2.5, 2.5).
β₀ ~ U(−3, 1), β₁ ~ U(−1, 1), β₂ ~ U(−0.4, 0.4), β₃ ~ U(−2, 2), and β₄ ~ U(−0.2, 0.4).
β₀ ~ U(−5, −1), β₁ ~ U(−1, 1), β₂ ~ U(−0.1, 0.2), and β₃ ~ U(0, 2).
β₀ ~ U(−8, −1), β₁ ~ U(−1, 1), β₂ ~ U(−0.5, 0.5), and β₃ ~ U(−2.5, 2.5).

For simplicity, there was no censoring. Generating the data in such a way provided different levels of correlation between the covariates and the outcome, as well as differing effects of treatment on survival, from negative to positive. A single large dataset of with n = 25,000 observations was generated for each of the 25 different randomly selected parameter values for each of the four settings (i.e., 100 simulation settings). The D-TMLE of the average minimal parameter was applied. The initial hazard was estimated based on (1) an intercept only model, (2) treatment and covariate W₁ only, and (3) treatment and covariate W₂ only. In the nonrandomized setting, the treatment mechanism was correctly specified.

The results for the 100 simulations, segmented by simulation setting and within each setting ordered by ψ₀ are provided in Figures B1–B3, which correspond with different misspecified initial hazard estimates. Clearly, the D-TMLE, even for the intercept only initial hazard (Figure B1) is consistently estimating ψ. There remains some noise due to the fact that the initial hazard is so grossly misspecified; however, the D-TMLE is still performing well. Once covariates are included, even though the initial hazard is still misspecified, the D-TMLE is performing even better (Figures B2 and B3). These results provide empirical evidence that indeed although the consistency properties of the D-TMLE cannot be derived based on the usual estimating function methodology, that the estimator is consistent as long as the initial estimate is not already a solution to the efficient estimating equation (i.e., the algorithm has a chance to iterate).

Figure B1 — D-TMLE results for 100 data-generating distributions with misspecified initial hazard that includes intercept only. Open points are the true values; filled-in points are the directly targeted maximum likelihood estimates.

Figure B3 — D-TMLE results for 100 data-generating distributions with misspecified initial hazard that incorrectly excludes W₁ term. Open points are the true values; filled-in points are the directly targeted maximum likelihood estimates.

Figure B2 — D-TMLE results for 100 data-generating distributions with misspecified initial hazard that incorrectly excludes W₂ term. Open points are the true values; filled-in points are the directly targeted maximum likelihood estimates.

APPENDIX C: MAR CENSORING MECHANISM USED IN SIMULATION STUDY

For t_k ∈ 2, . . . , 9,

λ_{C} (t ∣ A, W_{1}) = {\begin{matrix} 0.25 & if W_{1} > 4.5 and A = 1 \\ 0.2 & if 4.5 \leq W_{1} > 3.5 and A = 1 \\ 0.05 & if 3.5 \leq W_{1} > 2.5 and A = 1 \\ 0 & if W_{1} > 3.5 and A = 0 \\ 0.25 & if 3.5 \leq W_{1} > 2.5 and A = 0 \\ 0.05 & if W_{1} \leq 2.5 \end{matrix}

For t_k = 1, λ_C(t | A, W₁) = 0.

APPENDIX D: CHECKING TYPE I ERROR CONTROL

The data were simulated using the same data generating distribution as in Section 6 with the exception that the coefficient for treatment was set to zero. The results for the no censoring scenario for the D-TMLE are provided in Table D1 and show that indeed the type I error control was maintained.

Table D1.

Type I error control: No censoring, no treatment effect

Method	Rejections (%)
Unadjusted	0.048
D-TMLE COR	0.051
D-TMLE MIS1	0.048
D-TMLE MIS2	0.052

Open in a new tab

APPENDIX E: ASYMPTOTIC LINEARITY OF D-TMLE AND TEMPLATE FOR DERIVATION OF INFLUENCE CURVE

In this appendix, we establish the asymptotic linearity of the D-TMLE for the parameter of interest without having to use that the gradient or canonical gradient of the pathwise derivative can be represented as an estimating function for the parameter of interest. In addition, a template for the formal derivation of the influence curve is outlined.

Consider CAR-censored data models so that D(P) = D(λ(P), G(P)), Ψ(P) depends on P through λ(P) only, and the density factorizes as p = λ(p)G(p). Let $p_{n}^{*} = {\hat{λ}}^{*} {\hat{G}}^{*}$ . We consider the case that ${\hat{G}}^{*}$ is assumed to be consistent for G₀. We proceed as follows:

\begin{matrix} P_{0} D ({\hat{λ}}^{*}, {\hat{G}}^{*}) & = P_{0} D ({\hat{λ}}^{*}, G_{0}) - {P_{0} D ({\hat{λ}}^{*}, {\hat{G}}^{*}) - D ({\hat{λ}}^{*}, G_{0})} \\ = P_{0} D ({\hat{λ}}^{*}, G_{0}) + P_{0} {D (λ_{0}^{*}, {\hat{G}}^{*}) - D (λ_{0}^{*}, G_{0})} - R_{1 n} \end{matrix}

where

R_{1 n} = P_{0} {D ({\hat{λ}}^{*}, {\hat{G}}^{*}) - D ({\hat{λ}}^{*}, G_{0})} - P_{0} {D (Q_{0}^{*}, G_{0}) - D (Q_{0}^{*}, G_{0})}

Here R_1n is a second-order term and therefore it is natural to make it an assumption that R_1n = o_P(1 / √n). Second, we define

Φ ({\hat{G}}^{*}) \equiv P_{0} D (λ_{0}^{*}, {\hat{G}}^{*}),

so that the term $P_{0} {D (λ_{0}^{*}, {\hat{G}}^{*}) - D (λ_{0}^{*}, G_{0})}$ equals $Φ ({\hat{G}}^{*}) - Φ (G_{0})$ . We now assume that $Φ ({\hat{G}}^{*})$ is an efficient estimator of the parameter Φ(G₀) in model $M (G) = {p_{λ, G} = λ G : G \in G}$ , where we denote the tangent space generated by model $G$ for G₀ at P₀ = λ₀G₀ with T_g(P₀). It remains to consider $P_{0} D ({\hat{λ}}^{*}, G_{0})$ . By the general representation Theorem 1.3 in van der Laan and Robins (2003) it follows that

P_{0} D ({\hat{λ}}^{*}, G_{0}) = P_{λ_{0}} D^{F} ({\hat{λ}}^{*})

where D^F (λ) is a gradient in the full data model λ for the parameter λ → Ψ(λ), and P_λ0 denotes the full data distribution. Again, by pathwise differentiability of Ψ in the full data model, if $D^{F} ({\hat{λ}}^{*})$ consistently estimates D^F (λ₀), then one expects $P_{λ_{0}} D^{F} ({\hat{λ}}^{*}) = ψ_{0} - Ψ ({\hat{λ}}^{*}) + o_{P} (1 ∕ \sqrt{n})$ . In general, we note that, if ${\hat{λ}}^{*}$ converges to some possibly misspecified λ* for which Ψ(λ*) = Ψ(λ₀) and P₀D^F (λ*) = 0, we have

P_{λ_{0}} D^{F} ({\hat{λ}}^{*}) = P_{λ^{*}} D^{F} ({\hat{λ}}^{*}) + P_{λ_{0} - λ^{*}} {D^{F} ({\hat{λ}}^{*}) - D^{F} (λ^{*})}

By pathwise differentiability, and the convergence of ${\hat{λ}}^{*}$ to λ*, the first-order Taylor expansion suggests

P_{λ^{*}} D^{F} ({\hat{λ}}^{*}) = ψ_{0} - Ψ ({\hat{λ}}^{*}) + o_{P} (1 ∕ \sqrt{n})

A separate study of the other term (which can be represented as $Φ ({\hat{λ}}^{*}) - Φ (λ^{*})$ for some Φ) will result in an asymptotic linearity result:

P_{λ_{0} - λ^{*}} {D^{F} ({\hat{λ}}^{*}) - D^{F} (λ^{*})} = (P_{n} - P_{0}) D_{1} (P_{0}) + o_{P} (1 ∕ \sqrt{n})

To stay general, we assume the expansion:

P_{0} D ({\hat{λ}}^{*}, G_{0}) = ψ_{0} - Ψ ({\hat{λ}}^{*}) + \frac{1}{n} \sum_{i = 1}^{n} D_{1} (P_{0}) + o_{P} (1 ∕ \sqrt{n})

for some D₁(P₀). By Theorem 2.3 in van der Laan and Robins (2003) the influence curve of $Φ ({\hat{G}}^{*})$ equals $- Π (D (λ_{0}^{*}, G_{0}) + D_{1} (P_{0}) ∣ T_{g} {(P_{0})}^{⊥})$ . This proves the following which provides a template for establishing asymptotic linearity of the D-TMLE in CAR censored data models.

Theorem 1. Let O₁, . . . , O_n ~ P₀be n i.i.d. copies of O = Φ(C, X) for some many to one mapping Φ of censoring variable C and full data structure X. Assume that the conditional distribution G₀of C, given X, satisfies CAR so that p₀ = λ₀G₀w.r.t to appropriate dominating measure, G₀is a density of G₀and λ₀a function of distribution of full data X. Let $M = {p_{λ G} = λ G : λ \in λ, G \in G}$ , where $G$ is a subset of all CAR distributions. Let $Ψ : λ \to {IR}^{d}$ be the Euclidean target parameter of interest. Let D(P) = D(λ(P), G(P)) be a gradient of Ψ at $P \in M$ . Consider an estimator $P_{n}^{*}$ with density $p_{n}^{*} = {\hat{λ}}^{*} {\hat{G}}^{*}$ satisfying $P_{n} D ({\hat{λ}}^{*}, {\hat{G}}^{*}) = 0$ .

Define
$R_{1 n} \equiv P_{0} {D ({\hat{λ}}^{*}, {\hat{G}}^{*}) - D ({\hat{λ}}^{*}, G_{0})} - P_{0} {D (λ_{0}^{*}, G_{0}) - D (λ_{0}^{*}, G_{0})}$
Assume R_1n = o_P(1/ √n).
Define
$Φ (g_{n}^{*}) \equiv P_{0} D (λ_{0}^{*}, {\hat{G}}^{*})$
where P₀and $λ_{0}^{*}$ are treated as given. Assume that $Φ ({\hat{G}}^{*})$ is an efficient estimator of the parameter Φ(G₀) in model $M (G) = {p_{λ, G} = λ G : λ \in λ, G \in G}$ , and let T_G(P₀) denote the tangent space generated by model $G$ for G₀at P₀ = λ₀G₀.
Assume the expansion:
$P_{0} D ({\hat{λ}}^{*}, G_{0}) = ψ_{0} - Ψ ({\hat{λ}}^{*}) + \frac{1}{n} \sum_{i = 1}^{n} D_{1} (P_{0}) + o_{P} (1 ∕ \sqrt{n})$
for some D₁(P₀).
Assume $D ({\hat{λ}}^{*}, {\hat{G}}^{*})$ falls in a P₀-Donsker class. Then, $Ψ (P_{n}^{*}) - ψ_{0} = O_{P} (1 ∕ \sqrt{n})$ .
In addition, assume $P_{0} {D ({\hat{λ}}^{*}, {\hat{G}}^{*}) - D (λ_{0}^{*}, G_{0})}^{2} \to 0$ in probability as n → ∞ for some $λ_{0}^{*}$ and $D (λ_{0}^{*}, G_{0})$ in the P₀-Donsker class.

Then,

Ψ (P_{n}^{*}) - ψ_{0} = (P_{n} - P_{0}) I C (P_{0}) + o_{P} (1 ∕ \sqrt{n})

where

I C (P_{0}) \equiv Π (D (λ_{0}^{*}, G_{0}) + D_{1} (P_{0}) ∣ T_{g} {(P_{0})}^{⊥})

Π is the projection operator in $L_{0}^{2} (P_{0})$ endowed with inner product ${〈 f, g 〉}_{P_{0}} = E_{P_{0}} f g$ onto the orthogonal complement of T_g(P₀). If D₁(P₀) = 0 and $D (λ_{0}^{*}, G_{0}) = D^{*} (λ_{0}, G_{0})$ where D* is the canonical gradient, then $Ψ (P_{n}^{*})$ is asymptotically efficient.

Footnotes

Publisher's Disclaimer: Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

REFERENCES

Akazawa K, Nakamura T, Palesch Y. Power of logrank test and cox regression model in clinical trials with heterogenous samples. Stat. Med. 1997;16(5):583–597. doi: 10.1002/(sici)1097-0258(19970315)16:5<583::aid-sim433>3.0.co;2-z. [DOI] [PubMed] [Google Scholar]
Breiman L. Random forests. Machine Learning. 2001;V45(1):5–32. [Google Scholar]
Cox DR. Regression models and life-tables. J. Roy. Stat. Society Ser. B (Methodological) 1972;34(2):187–220. [Google Scholar]
Hastie T, Tibshirani R. Generalized Additive Models. Chapman & Hall/CRC; Boca Raton, FL: 1990. [Google Scholar]
Hernández AV, Eijkemans MJ, Steyerberg EW. Randomized controlled trials with time-to-event outcomes: How much does prespecified covariate adjustment increase power? Ann. Epidemiol. 2006;16(1):41–48. doi: 10.1016/j.annepidem.2005.09.007. [DOI] [PubMed] [Google Scholar]
Jiang H, Symanowski J, Paul S, Qu Y, Zagar A, Hong S. The type I error and power of non-parametric logrank and Wilcoxon tests with adjustment for covariates—A simulation study. Stat. Med. 2008;27(28):5850–5860. doi: 10.1002/sim.3406. [DOI] [PubMed] [Google Scholar]
Koch G, Tangen CM, Jung J-W, Amara IA. Issues for covariance analysis of dichotomous and ordered categorical data from randomized clinical trials and non-parametric strategies for addressing them. Statistics in Medicine. 1998;17(15–16):1863–1892. doi: 10.1002/(sici)1097-0258(19980815/30)17:15/16<1863::aid-sim989>3.0.co;2-m. [DOI] [PubMed] [Google Scholar]
Lu X, Tsiatis A. Improving the efficiency of the log-rank test using auxiliary covariates. Biometrika. 2008;95(3):679–694. [Google Scholar]
Moore K, van der Laan M. Application of time-to-event methods in the assessment of safety in clinical trials. In: Peace K, editor. Design and Analysis of Clinical Trials with Time-to-Event Endpoints. Taylor & Francis; Philadelphia: 2009a. pp. 455–482. [Google Scholar]
Moore KL, van der Laan MJ. Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation. Stat. Med. 2009b;28(1):39–64. doi: 10.1002/sim.3445. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pocock S, Assmann S, Enos L, Kasten L. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: Current practice and problems. Stat. Med. 2002;21:2917–2930. doi: 10.1002/sim.1296. [DOI] [PubMed] [Google Scholar]
Ripley BD. Pattern Recognition and Neural Networks. Cambridge University Press; Cambridge: 1996. [Google Scholar]
Robins J, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V, editors. AIDS Epidemiology, Methodological Issues. Bikhäuser; Boston, MA: 1992. pp. 297–331. [Google Scholar]
Sinisi S, van der Laan MJ. The deletion/substitution/addition algorithm in loss function based estimation: Applications in genomics. Stat. Appl. Genet. Mol. Biol. 2004;3(1) doi: 10.2202/1544-6115.1069. Article 18. [DOI] [PubMed] [Google Scholar]
Tangen CM, Koch GG. Non-parametric analysis of covariance for hypothesis testing with logrank and wilcoxon scores and survival-rate estimation in a randomized clinical trial. J. Biopharm. Stat. 1999;9(2):307–338. doi: 10.1081/BIP-100101179. [DOI] [PubMed] [Google Scholar]
Tibshirani R. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B. 1996;58(1):267–288. [Google Scholar]
Tsiatis AA, Davidian M, Zhang M, Lu X. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Stat. Med. 2008;27(23):4658–4677. doi: 10.1002/sim.3113. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Laan M, Gruber S. Technical Report 246, Division of Bfiostatistics. University of California; Berkeley: 2009. Collaborative Double Robust Targeted Penalized Maximum Likelihood Estimation. [Google Scholar]
van der Laan M, Polley E, Hubbard A. Technical Report 222, Division of Biostatistics. University of California; Berkeley: 2007. Super Learner. [Google Scholar]
van der Laan MJ. Technical Report 232, Division of Biostatistics. University of California; Berkeley: 2008. The Construction and Analysis of Adaptive Group Sequential Designs. [Google Scholar]
van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer; New York: 2003. [Google Scholar]
van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int. J. Biostatist. 2006;2(1) article 11. [Google Scholar]
Wood A, White I, Thompson S. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clin. Trials. 2004;1:368–376. doi: 10.1191/1740774504cn032oa. [DOI] [PubMed] [Google Scholar]
Zhang M, Tsiatis AAA, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics. 2008;64(3):707–715. doi: 10.1111/j.1541-0420.2007.00976.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R1] Akazawa K, Nakamura T, Palesch Y. Power of logrank test and cox regression model in clinical trials with heterogenous samples. Stat. Med. 1997;16(5):583–597. doi: 10.1002/(sici)1097-0258(19970315)16:5<583::aid-sim433>3.0.co;2-z. [DOI] [PubMed] [Google Scholar]

[R2] Breiman L. Random forests. Machine Learning. 2001;V45(1):5–32. [Google Scholar]

[R3] Cox DR. Regression models and life-tables. J. Roy. Stat. Society Ser. B (Methodological) 1972;34(2):187–220. [Google Scholar]

[R4] Hastie T, Tibshirani R. Generalized Additive Models. Chapman & Hall/CRC; Boca Raton, FL: 1990. [Google Scholar]

[R5] Hernández AV, Eijkemans MJ, Steyerberg EW. Randomized controlled trials with time-to-event outcomes: How much does prespecified covariate adjustment increase power? Ann. Epidemiol. 2006;16(1):41–48. doi: 10.1016/j.annepidem.2005.09.007. [DOI] [PubMed] [Google Scholar]

[R6] Jiang H, Symanowski J, Paul S, Qu Y, Zagar A, Hong S. The type I error and power of non-parametric logrank and Wilcoxon tests with adjustment for covariates—A simulation study. Stat. Med. 2008;27(28):5850–5860. doi: 10.1002/sim.3406. [DOI] [PubMed] [Google Scholar]

[R7] Koch G, Tangen CM, Jung J-W, Amara IA. Issues for covariance analysis of dichotomous and ordered categorical data from randomized clinical trials and non-parametric strategies for addressing them. Statistics in Medicine. 1998;17(15–16):1863–1892. doi: 10.1002/(sici)1097-0258(19980815/30)17:15/16<1863::aid-sim989>3.0.co;2-m. [DOI] [PubMed] [Google Scholar]

[R8] Lu X, Tsiatis A. Improving the efficiency of the log-rank test using auxiliary covariates. Biometrika. 2008;95(3):679–694. [Google Scholar]

[R9] Moore K, van der Laan M. Application of time-to-event methods in the assessment of safety in clinical trials. In: Peace K, editor. Design and Analysis of Clinical Trials with Time-to-Event Endpoints. Taylor & Francis; Philadelphia: 2009a. pp. 455–482. [Google Scholar]

[R10] Moore KL, van der Laan MJ. Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation. Stat. Med. 2009b;28(1):39–64. doi: 10.1002/sim.3445. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Pocock S, Assmann S, Enos L, Kasten L. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: Current practice and problems. Stat. Med. 2002;21:2917–2930. doi: 10.1002/sim.1296. [DOI] [PubMed] [Google Scholar]

[R12] Ripley BD. Pattern Recognition and Neural Networks. Cambridge University Press; Cambridge: 1996. [Google Scholar]

[R13] Robins J, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V, editors. AIDS Epidemiology, Methodological Issues. Bikhäuser; Boston, MA: 1992. pp. 297–331. [Google Scholar]

[R14] Sinisi S, van der Laan MJ. The deletion/substitution/addition algorithm in loss function based estimation: Applications in genomics. Stat. Appl. Genet. Mol. Biol. 2004;3(1) doi: 10.2202/1544-6115.1069. Article 18. [DOI] [PubMed] [Google Scholar]

[R15] Tangen CM, Koch GG. Non-parametric analysis of covariance for hypothesis testing with logrank and wilcoxon scores and survival-rate estimation in a randomized clinical trial. J. Biopharm. Stat. 1999;9(2):307–338. doi: 10.1081/BIP-100101179. [DOI] [PubMed] [Google Scholar]

[R16] Tibshirani R. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B. 1996;58(1):267–288. [Google Scholar]

[R17] Tsiatis AA, Davidian M, Zhang M, Lu X. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Stat. Med. 2008;27(23):4658–4677. doi: 10.1002/sim.3113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] van der Laan M, Gruber S. Technical Report 246, Division of Bfiostatistics. University of California; Berkeley: 2009. Collaborative Double Robust Targeted Penalized Maximum Likelihood Estimation. [Google Scholar]

[R19] van der Laan M, Polley E, Hubbard A. Technical Report 222, Division of Biostatistics. University of California; Berkeley: 2007. Super Learner. [Google Scholar]

[R20] van der Laan MJ. Technical Report 232, Division of Biostatistics. University of California; Berkeley: 2008. The Construction and Analysis of Adaptive Group Sequential Designs. [Google Scholar]

[R21] van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer; New York: 2003. [Google Scholar]

[R22] van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int. J. Biostatist. 2006;2(1) article 11. [Google Scholar]

[R23] Wood A, White I, Thompson S. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clin. Trials. 2004;1:368–376. doi: 10.1191/1740774504cn032oa. [DOI] [PubMed] [Google Scholar]

[R24] Zhang M, Tsiatis AAA, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics. 2008;64(3):707–715. doi: 10.1111/j.1541-0420.2007.00976.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

INCREASING POWER IN RANDOMIZED TRIALS WITH RIGHT CENSORED OUTCOMES THROUGH COVARIATE ADJUSTMENT

K L Moore

M J van der Laan

Abstract

1. INTRODUCTION

2. DATA, MODEL, AND PARAMETER OF INTEREST

3. UNADJUSTED ESTIMATION OF Ψ(p0)

4. TARGETED MAXIMUM LIKELIHOOD ESTIMATION OF Ψ(p0)

4.1. Method 1: Substitution TMLE

4.1.1. Efficient Influence Curves

4.1.2. Double Robustness Consistency Properties of the S-TMLE

4.1.3. Inference for S-TMLE

4.2. Method 2: Direct TMLE

4.2.1. Double Robustness Consistency Properties of the D-TMLE

4.2.2. Inference for D-TMLE

4.3. Weight Function

5. INITIAL HAZARD ESTIMATION

6. SIMULATION STUDIES

6.1. Simulation Results and Discussion for Various Censoring Scenarios

Table 1.

Table 3.

Table 2.

Table 4.

6.2. Relationship Between Correlation of Covariate(s) and Failure Time with Efficiency Gain

Figure 1.

6.3. Importance of the Use of Data-Adaptive Algorithms for Initial Hazard Estimation

Table 5.

6.4. Analysis of Single Dataset

Table 6.

7. DISCUSSION

APPENDIX A: TMLE FOR tk-SPECIFIC PARAMETER IS (COLLABORATIVE) DOUBLY ROBUST

APPENDIX B: EMPIRICAL VALIDATION FOR D-TMLE CONSISTENCY

Figure B1.

Figure B3.

Figure B2.

APPENDIX C: MAR CENSORING MECHANISM USED IN SIMULATION STUDY

APPENDIX D: CHECKING TYPE I ERROR CONTROL

Table D1.

APPENDIX E: ASYMPTOTIC LINEARITY OF D-TMLE AND TEMPLATE FOR DERIVATION OF INFLUENCE CURVE

Footnotes

REFERENCES

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

3. UNADJUSTED ESTIMATION OF Ψ(p₀)

4. TARGETED MAXIMUM LIKELIHOOD ESTIMATION OF Ψ(p₀)

APPENDIX A: TMLE FOR t_k-SPECIFIC PARAMETER IS (COLLABORATIVE) DOUBLY ROBUST