Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2016 Mar 1.
Published in final edited form as: Biometrika. 2014 Dec 24;102(1):151–168. doi: 10.1093/biomet/asu050

Doubly Robust Learning for Estimating Individualized Treatment with Censored Data

Y Q Zhao 1, D Zeng 2, E B Laber 3, R Song 4, M Yuan 5, M R Kosorok 6
PMCID: PMC4414056  NIHMSID: NIHMS620780  PMID: 25937641

Summary

Individualized treatment rules recommend treatments based on individual patient characteristics in order to maximize clinical benefit. When the clinical outcome of interest is survival time, estimation is often complicated by censoring. We develop nonparametric methods for estimating an optimal individualized treatment rule in the presence of censored data. To adjust for censoring, we propose a doubly robust estimator which requires correct specification of either the censoring model or survival model, but not both; the method is shown to be Fisher consistent when either model is correct. Furthermore, we establish the convergence rate of the expected survival under the estimated optimal individualized treatment rule to the expected survival under the optimal individualized treatment rule. We illustrate the proposed methods using simulation study and data from a Phase III clinical trial on non-small cell lung cancer.

Keywords: Censored data, Doubly robust estimator, Individualized treatment rule, Risk bound, Support vector machine

1. Introduction

Clinicians routinely tailor treatment to the individual characteristics of each patient. Individualized treatment rules formalize this practice by mapping patient characteristics to a recommended treatment. There is a large body work on estimation of optimal individualized treatment rules, using data from clinical trials or observational studies (Murphy, 2003; Robins, 2004; Zhao et al., 2009, 2011; Qian & Murphy, 2011; Zhang et al., 2012b). Regression-based approaches model the regression of outcome on patient covariates and treatment and infer the optimal individualized treatment rule from the modeled regression. The performance of regression-based methods depends critically on the predictive performance of the estimated regression model. In addition, because regression-based approaches require the modeling of treatment-covariate interactions, the number of terms can be large with high-dimensional covariates. An alternative class of procedures, known as classification-based methods, maximize an estimator of the marginal mean outcome over a pre-specified class of individualized treatment rules. These methods typically rely on fewer modeling assumptions about the conditional distribution of the outcome given covariates and treatment and so are potentially more robust to model misspecification; furthermore, they avoid inversion of a predictive model, which can be computationally expensive in some settings. Zhao et al. (2012) and Zhang et al. (2012a) showed that maximization of the estimated marginal mean outcome is equivalent to minimizing a weighted misclassification error with weights that are proportional to the observed clinical outcomes. Classification-based approaches have been shown to work well in settings without censoring (Zhao et al., 2012; Zhang et al., 2012a; Kang et al., 2014; Zhao et al., 2014). However, heretofore both regression-based and classification-based methods were restricted to use with non-censored data.

When the primary outcome of interest is survival time, the observations are commonly subject to right censoring because of subject dropout or administrative censoring. One approach is to fit a parametric or semiparametric survival model, including patient covariates and treatments to infer the optimal decision rule from the fitted survival model. Goldberg & Kosorok (2012) model the completely observed survival time and adjust for censoring by inverse probability of censoring weighting. These methods are intended to form high-quality predictions but may not be consistent for the optimal treatment rule (Qian & Murphy, 2011). Furthermore, parametric or semiparametric models can be sensitive to model misspecification and inverse-weighting may suffer from numerical instability when the censoring rate is high.

We extend the outcome weighted learning approach of Zhao et al. (2012) to accommodate censored data. The extension involves maximizing an estimator of the mean survival time under right censoring. The method avoids inversion of a predictive model by recasting the estimated mean survival time as a weighted misclassification rate where the weights involve both the observed outcome and inverse probability of censoring weights. We also introduce a doubly robust version of outcome weighted learning to account for potential bias introduced by a misspecified censoring model. The method is doubly robust in the sense that the obtained individualized treatment rule is consistent for the optimal rule when the model for either survival or censoring times is correct, but not necessarily both. We use a convex relaxation idea from support vector machines (Cortes & Vapnik, 1995) to construct a computationally efficient algorithm.

2. Methodology

2·1. Value function and optimal treatment rule

Let denote survival time, X = (X1,…, Xd)TInline graphic denote subject covariates, and A ∈ {−1, 1} denote the binary treatment assigned. Define τ to be the end of the study; because there is no information about survival beyond τ we use T = min(τ, T̃) as the outcome of interest. When we are interested in survival time on the log scale, we can use log(T) as the outcome. We assume that data are collected in a randomized trial so that treatment A is randomly assigned with a randomization probability that is completely determined by X. Thus, there are no unmeasured confounders (Rubin, 1974, 1978; Splawa-Neyman et al., 1990). Furthermore, we assume that π(a; X) = pr(A = a |X) is strictly bounded away from zero with probability 1 for each a. A treatment rule, say Inline graphic, is a function from Inline graphic into the space of treatments {−1, 1}; under Inline graphic, a patient with covariates X = x is assigned treatment Inline graphic(x). The value of a regime Inline graphic, denoted V( Inline graphic), is the expected outcome under Inline graphic. Let E denote expectation with respect to the distribution of (T, A, X) in the observed data, and Inline graphic denote expectation under the restriction that A = Inline graphic(X), then it can be shown (Qian and Murphy, 2011) that

V(D)=ED(T)=E{TX,A=D(X)}=E[TI{A=D(X)}π(A;X)], (1)

where I(·) is an indicator function. A treatment rule, say Inline graphic, is said to be optimal if V ( Inline graphic) ≥ V( Inline graphic) for all rules Inline graphic. To characterize Inline graphic, write the last term in (1) as E {[I{ Inline graphic(X) = 1}E(T | A = 1, X) + I{ Inline graphic(X) = −1}E(T | A = −1, X)]} which implies

D(x)=sign{E(TA=1,X=x)-E(TA=-1,X=x)}. (2)

Thus, Inline graphic(x) is the maximizer of E(T|X = x, A = a) with respect to a.

2·2. Outcome weighted learning with censored data

Censoring due to patient dropout is commonly seen in studies of survival time. Let C denote the potential censoring time, which could exceed τ, and assume that C and T are independent given (A, X). We observe data comprising n independent identically distributed subjects, {Yi = TiCi, Δi = I(TiCi), Xi, Ai}, i = 1, , n, where Δ = I(TC) denotes the censoring indicator. Our goal is to estimate the optimal treatment rule Inline graphic using the censored data.

Maximizing V ( Inline graphic) is equivalent to minimizing E [TI{AInline graphic(X)}/π(A; X)] according to (1). This is a weighted classification problem, where misclassification corresponds to AInline graphic(X), and the weights are T/π(A; X). This point of view motivated the development of outcome weighted learning for noncensored outcomes (Zhao et al., 2012). We generalize this approach to censored outcomes. Hereafter, we assume that event times and censoring times are continuous. Let SC(tA,X)=pr(C>tA=a,X=x) be the conditional treatment specific survival function for the censoring time given covariates x. Recall that T = min(T̃, τ). Then,

E{ΔYSC(YA,X)A,X}=E{TI(T<τ)I(C>T)SC(TA,X)+τI(Tτ)I(C>τ)SC(τA,X)A,X}=E{TI(T<τ)+τI(Tτ)A,X}=E(TA,X),

where we have used the conditional independence of T and C given X, A. Therefore,

V(D)=E[E(TA,X)π(A;X)I{A=D(X)}]=E[ΔYSC(YA,X)π(A;X)I{A=D(X)}]. (3)

To obtain an estimator of Inline graphic one could attempt to maximize an empirical estimate of the right-hand-side of (3). This is equivalent to minimizing

n-1i=1nΔiYiS^C(YiAi,Xi)I{AiD(Xi)}π(Ai;Xi), (4)

where ŜC is a consistent estimator for SC. However, direct optimization is intractable because of the discontinuous indicator functions; instead, we minimize a convex relaxation of (4). Because the objective function can be viewed as a weighted misclassification rate, we base our relaxation on support vector machines (Cortes & Vapnik, 1995). We replace I{AInline graphic(x)} with a convex surrogate ϕ{Af(x)}, where Inline graphic(x) = sign{f(x)} and ϕ(t) = max(1 − t, 0) denotes the hinge loss. Details for estimating Inline graphic are at the end of this section.

In the above formulation, a misspecified model for C given (A, X) can lead to biased estimation. Thus, we also propose a doubly robust estimator which protects against such misspecification. Let ETm(TT>t,A,X) denote a working model for the conditional mean residual life-time given (A, X) derived from a working survival model for S (t | A, X), and let SCm(tA,X) denote a working model for SC(t | A, X). Then, using the construction given in Section 2.3.2 of van der Laan & Robins (2003), we define the augmented value function,

Vm(D)=E([ΔYSCm(YA,X)-ETm(TT>t,A,X){dNC(t)SCm(tA,X)+I(Yt)dSCm(tA,X)SCm(tA,X)2}]I{A=D(X)}π(A;X)),

where NC(t) = (1 − Δ)I(Yt). In addition to the inverse probability of censoring weighting, there is an augmentation term in the weights for I{A = Inline graphic(X)}. The following lemma shows that Vm( Inline graphic) is equivalent to V( Inline graphic) when either working model is correct; the proof is deferred to the Supplementary Material.

Lemma 1

If either ETm(TT>t,A,X)=E(TT>t,A,X) or SCm(tA,X)=SC(tA,X), then Vm( Inline graphic) = V( Inline graphic).

Define

R(Y,Δ,SC,ET)=ΔYSC(YA,X)-ET(TT>t,A,X){dNC(t)SC(tA,X)+I(Yt)dSC(tA,X)SC(tA,X)2}, (5)

Lemma 1 shows that if either working model is correct then V(D)=E[R(Y,Δ,SCm,ETm)I{A=D(X)}/π(A;X)]. Thus, we can apply the weighted classification approach to estimate the optimal treatment rule using weights R(Y,Δ,SCm,ETm)/π(A;X).

To distinguish the two learning approaches, we call the first approach inverse censoring weighted outcome weighted learning and the second approach doubly robust outcome weighted learning. Estimation is implemented as follows:

  • Step 1. Fit a model for given (A, X) to construct estimate Ŝ (T | A, X) of S (T |, A, X). Estimate E (T | T > t, A, X) for t ∈ [0, τ) by
    E^T(TT>t,A,X)=τS^T(τA,X)S^T(TA,X)-tτ-udS^T(uA,X).
  • Step 2. Fit a model for C given (A, X) to form estimate ŜC(t | A, X) of SC(t | A, X).

  • Step 3. Calculate Wi = ΔiYi/ŜC(Yi | Ai, Xi) for the first approach and Wi = R(Yi, Δi, ŜC, Ê) for the second approach. If negative weights occur with the doubly robust methods we can subtract mini Wi from all the weights.

  • Step 4. Use the algorithm outlined below to obtain (x) by minimizing
    i=1nWiϕ{Aif(Xi)}π(Ai;Xi)+λnf2. (6)
  • Step 5. The decision rule is Inline graphic(x) = sign{(x)}.

We have added a regularization term λn||f||2 to avoid overfitting in Step 4. Here, ||f|| is a norm defined on the space that f belongs to, and λn is a tuning parameter that controls the severity of penalization. We use a linear decision function f(x) = θ0 + θT x, to illustrate the algorithm utilized in this step. In this case, ||f|| is the Euclidean norm of θ. Let W denote a generic weight constructed in Step 3 using one of the proposed methods. The optimization problem in Step 4 can be written as minγi=1nWiξi+θ2 subject to ξ ≥ 0 and Ai(θT Xi + θ0) ≥ (1− ξi). By introducing Lagrange multipliers, we obtain the Lagrangian

12θ2+γi=1nWiξi-i=1nαi{Ai(θTXi+θ0)-(1-ξi)}-i=1nμiξi,αi,μi0.

By taking derivatives with respect to θ, θ0 and ξi, we have 0=i=1nαiAi,θ=i=1nαiAiXi and αi = γWiμi. It follows that the dual problem is

maxαi=1nαi-12i=1nj=1nαiαjAiAjXiTXj (7)

subject to 0 ≤ αiγWi and i=1nαiAi=0. The dual problem can be solved using quadratic programming. Estimates θ̂ = Σα̂i0α̂iAiXi and θ̂0 follow from the Karush–Kuhn–Tucker conditions. When a linear decision rule is not sufficient, the procedure can be generalized using nonlinear kernel functions. For every positive definite kernel k : Inline graphic × Inline graphic → ℝ, there is a unique reproducing kernel Hilbert space Inline graphic, which is the completion of the linear span of all functions {k, x), xInline graphic}. The norm in Inline graphic, denoted by ||·||k, is induced by the inner product, f,gk=i=1nj=1mαiβjk(xi,xj), for f(·)=i=1nαik(·,xi) and g(·)=j=1mβjk(·,xj). A general nonlinear function f(x) can be used instead of a linear function. By the representer theorem (Kimeldorf & Wahba, 1971), the minimizer must admit a representation of the form f(·)=i=1nαik(·,Xi). In addition, to solve the optimization problem, we only need to compute the kernel matrix, where the inner product XiTXj in the dual objective function (7) is replaced by k(Xi, Xj). Quadratic programming can still be applied to obtain α̂i (i = 1,…, n). The resulting decision rule is Inline graphic(X) = sign{Σα̂i0α̂ik(Xi, X) + θ̂0}.

2·3. Working models for estimating SC(t | A, X) and E (T | T > t, A, X)

In our simulated experiments we used the Cox proportional hazards model for the requisite survival functions (Cox, 1972). Let ZT and ZC denote regressors constructed from X and A used in the Cox proportional hazards models for T and C respectively. Let λCi(t) and λTi(t) denote the hazard functions of censoring and failure times for subject i respectively. Under the Cox model, λCi(t)=λC0(t)exp(βCTZCi) and λTi(t)=λT0(t)exp(βTTZTi), where λC0(t) and λT0(t) are baseline hazard functions for censoring and failure times, respectively. The estimator for βC, say β̂C, maximizes the partial likelihood

i=1n{exp(βCTZCi)YjYiexp(βCTZCj)}1-Δi.

We use the Breslow estimator

Λ^C0(t)=0tk=1ndNCk(u)j=1nI(Yju)eβ^CTZCj

for the cumulative baseline hazard function ΛC0(t). An estimator of ΛC(t|Ai, Xi), the cumulative hazard function of censoring time for subject i, is exp(β^CTZCi)Λ^C0(t). An estimator for SC(t|Ai, Xi) is S^C(tAi,Xi)=exp{-Λ^C0(t)}exp(β^CTZCi). Estimates β̂T and Λ̂0(t) are obtained similarly. Details for estimating E(T|T > t, A, X) are in the Supplementary Material.

3. Theoretical Results

Let f(x) be the decision function with the decision rule given by Inline graphic(x) = sign{f(x)}, and write V (f) to denote the value function V ( Inline graphic). We define the pseudo value as

VR(f,SC,ET)=E(R(Y,Δ,SC,ET)I[A=sign{f(X)}]π(A;X)).

Therefore, Lemma 1 can be restated as V(f)=VR(f,SC,ET)=VR(f,SC,ET), where SC(tA,X) is the true conditional survival function of C given (A, X), and ET(TT>t,A,X) is the true conditional mean residual lifetime given (A, X). Define convex surrogate loss function

Lϕ(f,SC,ET)=R(Y,Δ,SC,ET)ϕ{Af(X)}π(A;X), (8)

where ϕ(t) = max(1 − t, 0). Define Inline graphic to be the set of all measurable functions from Inline graphic into ℝ. Our first result states that the decision function obtained by minimizing the expectation of this surrogate loss over Inline graphic maximizes VR for any SC and E. Furthermore, we quantify the differences using the hinge loss versus zero-one loss. The proofs are essentially the same as Theorems 3.1 and 3.2 in Zhao et al. (2012) and are thus omitted.

Lemma 2

If minimizes E{Lϕ(f, SC, E)) over Inline graphic with any models for SC and E, then

  1. VR(, SC, E) = Inline graphic VR(f, SC, E), that is, yields the maximum value of VR;

  2. for any fInline graphic,
    VR(f,SC,ET)-VR(f,SC,ET)E{Lϕ(f,SC,ET)}-E{Lϕ(f,SC,ET)},

    that is, the value lost due to using a suboptimal decision function f is bounded by the expected surrogate loss.

Our main result establishes the convergence rates for value of the estimated decision rule . As described in Section 2, we use weights that depend on the estimated survival functions, and hinge loss as a surrogate loss function. To bound the difference between the true and the empirical expectation of the surrogate, which involves random quantities, that is, estimates for survival and censoring times, we use the following assumptions:

Assumption 1

Both Ê(T|A = a, X = x) and ŜC(t|A = a, X = x) converge in probability to ETm(tA,X) and SCm(tA,X) uniformly in t ∈ (0, τ] for every (x, a). Moreover, for some constant γ > 0,

E{R(Y,Δ,S^C,E^T)-R(Y,Δ,SCm,ETm)}=Op(n-γ). (9)
Assumption 2

For some η > 0, SCm(τA,X)>η with probability 1.

Assumption 1 implies that Ê(T|A, X) and ŜC(t|A, X) converge to fixed functions, even if the imposed working models are wrong. Moreover, it imposes an assumption on the variance of the survival function estimators. The constant γ depends on the working models used for estimating SC and E. If we assume parametric or semiparametric models, including the Cox proportional hazards model and transformation models, then γ = 1/2 in (9). Assumption 2 ensures that some subjects do not fail at the end of the study and thus have observation time τ.

In addition, we restrict the choice of reproducing kernel Hilbert space to the space associated with Gaussian radial basis function kernels, k(x,x)=exp(-σn2x-x2), x, x′Inline graphic, where σn > 0 is the kernel bandwidth parameter varying with n controlling the spread of the kernel. We can determine the complexity of Inline graphic in terms of capacity bounds with respect to the empirical L2-norm, defined as f-gL2(Pn)=(n-1i=1nf(Xi)-g(Xi)2)1/2, f, gInline graphic for functional class Inline graphic. For any ε > 0, the covering number of Inline graphic with respect to L2(Pn), N{ Inline graphic, ε, L2(Pn)}, is the smallest number of L2(Pn) ε-balls needed to cover Inline graphic, where an L2(Pn) ε-ball around a function gInline graphic is the set {fInline graphic:||fg||L2(Pn) < ε}. It has been shown in Theorem 2.1 of Steinwart & Scovel (2007) that for any ε > 0,

supPnlogN{BHk,ε,L2(Pn)}cnε-p, (10)

where cn=cp,δ,dσn(1-p/2)(1+δ)d, Inline graphic is the closed unit ball of Inline graphic, p and δ are any numbers satisfying 0 < p ≤ 2 and δ > 0, cp,δ,d is a constant depending only on p, δ and d, and the supremum is taken over finitely discrete probability measures Pn.

Let fm=argminfFE{Lϕ(f,SCm,ETm)}. According to Lemma 2, fm also maximizes VR(f,SCm,ETm), and VR(fm,SCm,ETm-VR(f,SCm,ETm))E{Lϕ(f,SCm,ETm)}-E{Lϕ(fm,SCm,ETm)} for any function f. Hence, the convergence rate of the value using the estimated rule under the hinge loss will dominate the rate under the 0–1 loss. Define the approximation error function

a(λ)=inffHk[E{Lϕ(f,SCm,ETm)}+λfk2-E{Lϕ(fm,SCm,ETm)}].

The following theorem bounds the excess value optimal treatment rule relative to the doubly robust estimator of the optimal treatment rule. Its proof can be found in the Appendix.

Theorem 1

Assume Assumptions 1 and 2 hold and that λn → 0 and λnnmin(2γ,1) → ∞ as n → ∞. If we estimate within a reproducing kernel Hilbert space Inline graphic associated with Gaussian radial basis function kernels, then with probability greater than 1 − 2e−b,

V(f)-V(f^)2supfFVR(f,SC,ET)-VR(f,SCm,ETm)+a(λn)+Mpcn2/(p+2)(nλn)-2/(p+2)+Mpλn-1/2cn2/(p+2)n-2/(p+2)+Kb(nλn)-1+2Kbn-1λn-1/2+Op(n-γλn-1/2), (11)

where Mp is a constant depending on p and K is a sufficiently large positive constant.

In this theorem, f*(x) = E(T|A = 1, X = x) − E(T|A = −1, X = x) gives the optimal treatment rule. On the right-hand-side of (11), the first term reflects the estimation bias from the working models for T and C. The second term is the approximation error due to using the Inline graphic space. The last term is the stochastic variability of estimating SCm and ETm. The rest of the terms contain the empirical loss function for estimation of the optimal treatment rule. In particular, the convergence rate γ depends on the estimating procedure applied to the two working models.

A corollary is that when either the model for survival time or the model for censoring time is correctly specified, with probability greater than 1 − 2e−b, we have

V(f)-V(f^)a(λn)+Mpcn2/(p+2)(nλn)-2/(p+2)+Mpλn-1/2cn2/(p+2)n-2/(p+2)+Kb(nλn)-1+2Kbn-1λn-1/2+Op(n-γλn-1/2).

Remark 1

With the hinge loss, it has been shown that if the reproducing kernel Hilbert space is rich enough, the optimizer within the reproducing kernel Hilbert space approaches the optimal treatment decision rule as the sample size goes to for appropriately chosen tuning parameters. The Gaussian kernel is one such kernel, which can induce a reproducing kernel Hilbert space that is flexible enough to approximate the optimal decision rule. While the approximation error term usually goes to zero as λn 0, the other term controlling the stochastic error will increase. The optimal bandwidth λn can be obtained by setting the orders of a(λn) and max{cn2/(p+2)(nλn)-2/(p+2),n-γλn-1/2} equal to each other. The approximation error function a(λn) is usually related to the data-generating distribution, especially the behavior close to the decision boundary, which is the true optimal decision rule if either working model is correct. Intuitively, we should be able to learn the treatment rule more rapidly for well-separated optimal treatment classes, that is, distributions that have low density near the boundary.

This behavior can be characterized in terms of the size of the set of points that are close to boundary f*(x) = 0 (Tsybakov, 2004; Steinwart & Scovel, 2007). There exist a constant c1 such that a(λn)c1λnq/q+1, when using a Gaussian kernel with its kernel band-width σn varying with λn as σn=λn-1/{(q+1)d} and cn=λn-(1-p/2)(1+δ)/(q+1). Here, q > 0 is the noise exponent that characterizes the distribution close to the boundary (Steinwart & Scovel, 2007), and a larger q indicates a better separation between two treatment classes. More details on q are provided in the Appendix. An optimal choice of λn that balances bias and variance is λn = max{n−2(q+1)/{(4+p)q+2+(2−p)(2+δ)}, n−2(q+1)γ/(2q+1)}. The achieved convergence rate of the value due to the estimated rule versus the optimal value is thus max[n−2q/{(4+p)q+2+(2−p)(2+δ)}, n−2/(2q+1)].

The rate consists of two parts: the first part reflects the rate of convergence in estimating the optimal decision rule, which is consistent with the results without censoring (Zhao et al., 2012); the second part is related to survival function estimation. When q is sufficiently large and δ and p are close to zero, the convergence rate is close to n−γ, where γ is determined by the survival function estimator. A Cox model for the survival function estimates leads to γ = 1/2. Other working models can also be applied, such as transformation models (Zeng & Lin, 2007), nonparametric methods based on kernel type estimators (Dabrowska, 1989), or machine learning techniques (Zhu & Kosorok, 2012). However, the rate n−γ can be slower than Op(n−1/2) for certain estimators.

Remark 2

Although the theoretical results are derived only for doubly robust estimators, inverse censoring weighted estimators enjoy the property stated in Theorem 1, as it is a special case obtained by setting the augmentation term to zero. However, the first term on the right-hand-side of (11) will change unless the censoring model is correctly specified.

4. Simulation Studies

4·1. Preliminaries

We aim to maximize the value function in terms of the survival time on the log scale. We compare the inverse censoring weighted and the doubly robust methods for selecting optimal individualized treatment with Cox regression and Q-learning adjusted with censoring weights (Goldberg & Kosorok, 2012). For Cox regression, we fit a proportional hazards model with treatment-by-covariate interactions, and identify the optimal individualized treatment based on the predicted outcomes. To apply Q-learning, we fit Q(X, A) = Φ(X, A)θ, where Φ(X, A) = (1, X, A, XA), to the log of the failure time. We also apply a regularized version of Q-learning, called L2 Q-learning, where an L2 penalty is used for regularization. The estimator is obtained as

argminθi=1n{log(Yi)-Φ(Xi,Ai)θ}2ΔiS^C(YiAi,Xi)+λnθ2,

where λn is a tuning parameter to be selected using cross-validation, and ŜC(Y|A, X) is the estimated conditional survival function of C given (A, X) that can be obtained using Cox regression. The estimated optimal decision rule is Inline graphic(x) = argmaxa∈{−1,1}Φ(x, a)θ̂.

4·2. Simulation study

Ten independent covariates, X1, …, X10, were generated from the uniform distribution on [0, 1]. Treatments were generated from {−1, 1} with equal probabilities 0·5. Four different scenarios are presented, corresponding to different combinations of correct or incorrect survival time and censoring time models. Specifically, we generated or C from the accelerated failure time or Cox models in different scenarios, while we always used a proportional hazards model as a working model for both and C given (A, X). Regarding the specification of the model basis, we include treatment covariate interaction terms in the survival function modeling since we are interested in whether certain characteristics moderate treatment effects. Conversely, we do not model the censoring time with interaction terms unless we have full knowledge of the data, because it is not typical to posit a complex model for the censoring mechanism in practice. Details for calculating the doubly robust weights using Cox working models are given in the Supplementary Material.

For each scenario, a test data set of size 10,000 is generated to evaluate the estimated rules. The decision rules are estimated from training data using the proposed methods as described in Section 2 and the competitors. The sample sizes for the training data sets were taken to equal to 100, 200 and 400, and the simulations were repeated 1000 runs for each sample size. A linear basis is applied for model fitting in Q-learning. Linear kernels were used for both the implementation of inverse censoring weighted and doubly robust outcome weighted learning. We also explored the use of Gaussian kernels, and found that the performances were comparable to the linear kernel. The learning procedure was implemented using a Library for Support Vector Machines developed in Chang & Lin (2011). The tuning parameter λn in (6) was chosen using 5-fold cross validation over a pre-specified grid, with the criterion being the empirical pseudo value function. Specifically, for each tuning parameter, we partitioned the training data into 5 parts, each of which serves as the validation set once while the other 4 parts of the data are utilized for estimation. We sum up the empirical pseudo values calculated across the validation sets from the corresponding trained decision rules, and choose the optimal tuning parameter as the one maximizing the summed value.

In the following, we consider four generative models.

Case 1: The true models are Cox models for both and C. The survival time T is the minimum of τ = 1.5 and , where is generated with hazard rate function

λT(tA,X)=λT0(t)exp{0·6X1-0·8X2+(0·6-0·4X1-0·2X2-0·4X3)A},

and λ0(t) = 2t. The censoring time C is generated with hazard rate function

λC(tA,X)=λC0(t)exp{0·5X1+0·5X2+(-1+0·4X1-0·6X2)A},

where λC0(t) = 2t. The censoring percentage is around 56%. The optimal decision boundary is linear with Inline graphic(X) = −sign(0·6 − 0·4X1 − 0·2X2 − 0·4X3). We use the Cox regression model with covariates (X1, X2, X3, A, X1A, X2A, X3A) to model survival and censoring times respectively. Therefore, both models are correctly specified.

Case 2: The true model for is a Cox model, and the true model for C is an accelerated failure time model. The survival time T is the minimum of τ = 2 and , where

λT(tA,X)=λT0(t)exp{-1·5X1+0·5X2+(1-0·7X1-1·2X2)A}.

We let censoring time C be generated from an accelerated failure time model with

log(C)=-0·3+0·6X1-0·1X2+0·3X3+(0·2-X1-2X2+0·5X3)A+ε,

where ε is generated from N(0, 1). The censoring percentage is around 34%. The optimal decision boundary is Inline graphic(X) = −sign(0·5 − 0·1X1 − 0·6X2). We use the Cox model as the working model for both and C given (A, X). Specifically, we use (X1, X2, A, X1A, X2A) as covariates to model survival time, and (X1, X2, X3) to model censoring time. Therefore, the working model is correctly specified for T but incorrect for C.

Case 3: The true model for is an accelerated failure time model, and the true model for C is a Cox model. The survival time T is the minimum of τ = 2 and , which is generated with

log(T)=-0·5-0·8X1+0·7X2+0·2X3+(0·6-0·4X1-0·1X2-0·4X3)A+ε.

The censoring time C is generated from the Cox proportional hazards model, where

λC(tA,X)=λC0(t)exp{-0·5X1-0·5X2+0·2X3-(1-0·5X1+0·3X2-0·5X3)A},

and λC0(t) = 2t. The censoring percentage is around 45%. The optimal decision boundary is linear with Inline graphic(X) = sign(0·6 − 0·4X1 − 0·1X2 − 0·4X3). We use the Cox model for both and C given (A, X). Specifically, we use (X, A, XA) to model survival time, and (X1, X2, X3, A, X1A, X2A, X3A) to model censoring time. Therefore, the working model is correctly specified for C but incorrect for T.

Case 4: The true models are accelerated failure time models for both T and C. The survival time T is the minimum of τ = 2.5 and , where

log(T)=-0·2-0·5X1+0·5X2+0·3X3+(0·5-0·1X1-0·6X2+0·1X3)A+ε.

The censoring time C is generated from an accelerated failure time model with

log(C)=0·5-0·8X1+0·4X2+0·4X3+(0·5-0·1X1-0·6X2+0·3X3)A+ε,

and ε is generated from N(0, 1). The censoring percentage is around 31%. The optimal decision boundary is linear with Inline graphic(X) = sign(0·5 − 0·1X1 − 0·6X2 + 0·1X3). We use (X, A, XA) to model survival time, and X to model censoring time. Therefore, both working models are incorrectly specified.

Since we know the true data generating mechanism under every scenario, for each of 1000 replicates, we calculate the values based on the logarithm of the survival time using the constructed rule from different methods. Figure 1 shows these values when n = 100, where larger values indicate longer survival. Additional results using other sample sizes are provided in the Supplementary Material.

Fig. 1.

Fig. 1

Boxplots of values of estimated rules using different methods, representing the logarithm of the survival time with higher values being more preferable. Cox, Cox model; Q, inverse censoring weighted Q-learning; L2Q, inverse censoring weighted L2 Q-learning; ICO, inverse censoring weighted outcome weighted learning with linear kernel; DRO, doubly robust outcome weighted learning with linear kernel.

In general, inverse censoring weighted and doubly robust outcome weighted learning have satisfactory performances. Inverse censoring weighted outcome weighted learning performs better when the censoring model is correctly specified, see Fig. 1(a) and 1(c). Indeed, doubly robust outcome weighted learning requires estimating both censoring and survival probabilities, which yield a higher variability compared with that of the inverse censoring weighted outcome weighted learning. The strength of doubly robust approach can be seen when the censoring model is mis-specified but the survival model is correct, since it can correct the bias from using only inverse censoring weighting, see Fig. 1(b). When the survival data are truly generated from the Cox model, Cox regression with correct basis results in the best performances, see Fig. 1(a) and 1(b). However, the strength is lost when the survival time is generated from an accelerated failure time model. Although Q-learning is improved by L2-regularization, possibly by reducing overfitting, Q-learning based methods can have suboptimal performances even when the censoring model is correctly specified but survival time is generated from a Cox model, see Fig. 1(a). This is due to model misspecification when inverse censoring weighted Q-learning models the logarithm of the survival time.

We also consider a nonlinear example of possible model misspecification in the Supplementary Material. The number of covariates is increased to 30, and and C are generated from Cox models with complex effects. When the censoring or survival model is correctly specified, we use the true sets of covariates for model fitting. If the model is incorrect for survival time or censoring time, we use a linear basis. In addition to a linear kernel, we apply both methods with a Gaussian kernel. We can see that the gain from using a Gaussian kernel is pronounced, since it may better approximate the nonlinear treatment decision rules.

5. Application

We illustrate the proposed methods using advanced non-small-cell lung cancer data (Socinski et al., 2002), which is collected in a two-arm randomized trial with survival time as the primary endpoint. Non-small-cell lung cancer is the leading cause of cancer-related mortality, and approximately 30% to 40% of all new cases present with stage IV or stage IIIB disease. To investigate the optimal duration of therapy that maximizes survival, a prospective randomized phase III trial was initiated in 1998. Patients with advanced non-small-cell lung cancer were recruited and randomized to either four cycles of carboplatin/paclitaxel or continuous therapy with carboplatin/pactaxel until disease progression. The study enrolled 230 subjects; however, we restrict our analysis to the 224 subjects with complete information. In the analysis sample, 112 subjects were assigned to each treatment. The censoring rate was 32%. The baseline covariates include age ranging from 32 to 82 with median 63, sex with 138 male and 86 female, race with 162 white, 54 black and 8 other, performance status with 117 Karnofsky performance status 90% to 100% and 97 70% to 80%, and stage with 30 Stage IIIB and 194 Stage IV.

In addition to the proposed methods, we apply Cox regression, inverse censoring weighted Q-learning and L2 Q-learning with a linear basis. We consider two sets of working models: we first use Cox regression with basis (X, A, XA) for both survival time and censoring time, and then use the Kaplan–Meier estimator for censoring time as an alternative. A treatment decision rule Inline graphic is evaluated based on its empirical value adjusted for censoring. The empirical value is calculated by i=1nRiI{Ai=D(Xi)}/i=1nI{Ai=D(Xi)}, with equal to

Δlog(Y)S^C(YA,X)-E^T{log(T)T>t,A,X}{dNC(t)S^C(tA,X)+I(Yit)dS^C(tA,X)S^C(tA,X)2},

where ŜC(t | A, X) and Ê (T | T > t, A, X) are the estimated censoring probability and residual life conditional on patients characteristics. To avoid overfitting, we employ a cross-validated analysis. At each run, we partition the whole data set into 5 pieces, where 4 parts of the data are used as training data to estimate the individualized treatment rules, and the remaining part is the validation set for implementing the estimated rules, with empirical values stored for each method respectively. The cross-validated values are obtained by averaging the empirical values on all 5 validation subsets. The procedure is repeated 100 times. The averages and standard errors of these values are reported in Table 1, where larger values correspond to longer survival time.

Table 1.

Mean (s.e.) cross-validated days on log sale using different working models for C, with working model for T being a Cox regression model with basis (X, A, XA)

Working model for C Mean (s.e.) Cross-validated Values
Cox Q L2Q ICO DRO
Cox model, basis (X, A, XA) 5·061 (0·381) 5·234 (0·578) 5·258 (0·575) 5·339 (0·404) 5·257 (0·526)
Kaplan–Meier 5·061 (0·381) 5·002 (0·621) 5·000 (0·632) 5·473 (0·166) 5·446 (0·273)

s.e., standard errors; Cox, Cox model; Q: inverse censoring weighted Q-learning; L2Q, inverse censoring weighted L2 Q-learning; ICO, inverse censoring weighted outcome weighted learning with linear kernel; DRO, doubly robust outcome weighted learning with linear kernel.

Both inverse censoring weighted and doubly robust outcome weighted learning methods lead to higher values more frequently. We see a comparable performance between the two proposed approaches, which is reasonable if the censoring distribution is correctly specified, although doubly robust outcome weighted learning may have a larger variance. Since the number of covariates is not large, the performances of inverse censoring weighted Q-learning and L2 Q-learning are similar. They could have difficulties in identifying the optimal treatment rules if the model for survival time does not fit well, even if the censoring weight is correctly specified. Also, when we apply Cox regression to model the censoring time, none of the covariates has a significant effect. Thus, working models with either Kaplan–Meier estimators or Cox regression estimators yield similar results. We then apply the proposed methods to the whole data set using Cox regression working models for both T and C. The treatment recommendations from inverse censoring weighted outcome weighted learning recommends that 119 patients be assigned to continuous therapy with carboplatin/pactaxel, while by using doubly robust outcome weighted learning, 122 out of 224 patients should be given the continuous therapy. By checking the empirical value, we find that both strategies yield close values: 5·301 for inverse censoring weighted outcome weighted learning and 5·567 for doubly robust outcome weighted learning. In fact, sometimes we may have equivalent treatment strategies if there are no differential treatment effects on a subset of the patients. The treatment decision rule produced by inverse censoring weighted Q-learning and L2 Q-learning however, only leads to an empirical value of 4·756, and an empirical value of 4·744 by using Cox regression.

6. Discussion

As one reviewer pointed out, the proposed method is only one possible reduction of optimizing treatment rules to a weighted classification problem. Alternative choices have been proposed for continuous outcomes (Zhang et al., 2012b; Rubin & van der Lann, 2012), which can be generalized to the censoring data setup.

There may exist multiple treatments for comparison. One approach for extending the proposed framework to handle this case is to incorporate techniques developed in multicategory classification (Lee et al., 2004; Liu & Yuan, 2011). Another important extension is to settings in which there are a large number of variables. In this setting, penalized methods in classification by using sparse penalties could be adapted (Zhu et al., 2004).

Effective management of a chronic illness requires individualized treatment recommendations that are responsive to a patient’s changing health status. Dynamic treatment regimens formalize a dynamic treatment plan as a sequence of treatment rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. Longer life expectancy and an aging population have created a surge in the rate of chronic illness related death. Thus, there is increasing interest in dynamic treatment regimens (Thall et al., 2002; Murphy, 2003; Robins, 2004; Moodie et al., 2007; Zhao et al., 2011; Goldberg & Kosorok, 2012; Huang et al., 2014). Extension of the proposed approach for survival data to the dynamic setting is of great interest.

Supplementary Material

Supplementary material

Acknowledgments

This research was supported by the U.S. National Institutes of Health. We are grateful to the editors and the reviewers for their insightful comments which have led to important improvements in this paper.

Appendix

Proof of Theorem 1

First, it can be established that

V(f)-V(f^)supfFVR(f,SCm,ETm)-VR(f^,SCm,ETm)+2supfFVR(f,SC,ET)-VR(f,SCm,ETm).

According to Lemma 2(a), VR(fm,SCm,ETm)=supfFVR(f,SCm,ETm), where fm=argminfFE{Lϕ(f,SCm,ETm)}. Hence, it suffices to derive the convergence rate of VR(fm,SCm,ETm)-VR(f^,SCm,ETm).

Let fλnm=argminfHk[E{R(Y,Δ,SCm,ETm)ϕ{Af(X)}/π(A;X)}+λnfk2]. Then,

VR(fm,SCm,ETm)-VR(f^,SCm,ETm)a(λn)+(n-1i=1n[λnf^k2+R(Yi,Δi,S^C,E^T)ϕ{Aif^(Xi)}π(Ai;Xi)-λnfλnmk2-R(Yi,Δi,S^C,E^T)ϕ{Aifλnm(Xi)}π(Ai;Xi)]-E[λnf^k2+R(Y,Δ,S^C,E^T)ϕ{Af^(X)}π(A;X)-λnfλnmk2-R(Y,Δ,S^C,E^T)ϕ{Afλnm(X)}π(A;X)])+E[R(Y,Δ,SCm,ETm)ϕ{Af^(X)}π(A;X)-R(Y,Δ,S^C,E^T)ϕ{Af^(X)}π(A;X)]+E[R(Y,Δ,S^C,E^T)ϕ{Afλnm(X)}π(A;X)-R(Y,Δ,SCm,ETm)ϕ{Afλnm(X)}π(A;X)]=a(λn)+(I)+(II)+(III).

To bound (II) and (III), we consider the class of functions

B={R{Y,Δ,SC(βC,ΛC0),ET(βT,ΛT0)}:βCd,βC-βCm<δ0,βTd,βT-βTm<δ0,ΛC0,ΛT0areboundedmonotonefunctionsin[0,τ],suptΛC0(t)-ΛC0m(t)<δ0,suptΛT0(t)-ΛT0m(t)<δ0},

where δ0 is a small constant, and βTm,βCm,ΛC0m(t),ΛT0m(t) are the limits of β̂T, β̂C, Λ̂C0 and Λ̂T0 based on the Cox models. Then |R{Y, Δ, SC(βC, ΛC0), E (βC, ΛC0)}|/π(A, X) can be bounded from above by a constant, say M.

Trivial bounds for ||||k and fλnmk are obtained as f^kMλn-1/2 and fλnmkMλn-1/2. For every fMλn-1/2BHk,(1-Af)+1+Mλn-1/2=B. Thus,

|E[R(Y,Δ,SCm,ETm)ϕ{Af(X)}π(A;X)-R(Y,Δ,S^C,E^T)ϕ{Af(X)}π(A;X)]|=Op(n-γλn-1/2).

We use empirical process theory to bound (I). Define the functional class

L={λnfk2+R(Y,Δ,SC,ET)ϕ{Af(X)}π(A;X)-R(Y,Δ,SC,ET)ϕ{Afλnm(X)}π(A;X)-λnfλnmk2,fMλn-1/2BHk,R(Y,Δ,SC,ET)B},

and Inline graphic = {E(l) − l : E(l) = ε, lInline graphic}. Let Z=supgGn-1i=1ng(Xi). Since E(g) = 0, gInline graphic, it follows from Lemma S.1 in the Supplementary Material, by setting ρ = 1, that pr{Z ≥ 2E(Z) + σ(Kb)1/2n−1/2 + 2KBbn−1} ≤ eb, where B=O(λn-1/2). Furthermore, σ2cnε following the arguments for proving Theorem 3.4 in Zhao et al. (2012), given that E(l2)cnE(l), where cn=O(λn-1). In addition, for fMλn-1/2BHk,

E(Z)=E{supgGn-1i=1ng(Xi)}=E[supE(l2)cnε|E{l(X)}-n-1i=1nl(Xi)|].

Since | βC-βCm| and | βT-βTm| are bounded by δ0, they lie in a hypercube of ℝ2d. Moreover, { ΛC0:suptΛC0(t)-ΛC0m(t)<δ0} is a class of monotone functions, so is { ΛT0:suptΛT0(t)-ΛT0m(t)<δ0}. The function in Inline graphic is Lipschitz continuous with respect to all these parameters and the Lipschitz constant is less than a constant W. There exists a constant K, depending on d, such that the bracketing number for Inline graphic satisfies N[·]{ Inline graphic, εW, L2(P)} ≤ K(δ0/ε)2d+2. According to (10), supPn log N { Inline graphic, ε, L2(Pn)} ≤ cnεp, and therefore

E(Z)cpMλn-12max{(M2λncnε)(2-p)/4cn1/2n-1/2,cn2/(2+p)n-2/(2+p)},

where cp is a constant depending on p. See Proposition 5.5 in Steinwart & Scovel (2007) and references therein. Consequently,

pr(|n1/2[n-1i=1nl(Xi)-E{l(X)}]|>(cnεKb)1/2n-1/2+2KBbn-1+2cpMλn-1/2max{(M2λncnε)(2-p)/4cn1/2n-1/2,cn2/(2+p)n-2/(2+p)})e-b.

Given that Inline graphic is convex, if lInline graphic satisfies n-1i=1nl(Xi)αε and E{l(X)} ≥ ε, there exists l′ ∈ Inline graphic such that n-1i=1nl(Xi)αε and E{l′(X)} = ε. Thus, with probability at least 1 − eb, every lInline graphic with n-1i=1nl(Xi)αε satisfies Elε (Bartlett et al., 2006; Steinwart & Scovel, 2007). Since

n-1i=1n[λnf^k2+R(Yi,Δi,S^C,E^T)ϕ{Aif^(Xi)}π(Ai;Xi)-λnfλnmk2-R(Yi,Δi,S^C,E^T)ϕ{Aifλnm(Xi)}π(Ai;Xi)]0<αε,

with probability at least 1 − eb,

E[λnf^k2+R(Y,Δ,S^C,E^T)ϕ{Af^(X)}π(A;X)-λnfλnmk2-R(Y,Δ,S^C,E^T)ϕ{Afλnm(X)}π(A;X)]ε.

It follows that,

pr[(I)>{4cpMλn-1/2(M2λncn)(2-p)/4cn1/2n-1/2}4/(p+2)+cpMλn-1/2cn2/(2+p)n-2/2+p+cnKbn-1+2KBbn-1]2e-b,

with cn=O(λn-1) and B=O(λn-1/2). Using Mp as a new constant depending on p, we subsequently obtain the desired results.

Geometric noise exponent

The approximation error depends on the noise component q, called the geometric noise exponent (Steinwart & Scovel, 2007). Let

η(x)=E{R(Y,Δ,SCm,ETm)X=x,A=1}-E{R(Y,Δ,SCm,ETm)X=x,A=-1}E{R(Y,Δ,SCm,ETm)X=x,A=1}+E{R(Y,Δ,SCm,ETm)X=x,A=-1}+1/2.

Hence, 2η(x) − 1 is the decision boundary for the optimal treatment decision rules when we use the pseudo-outcomes. We further define Inline graphic = {xInline graphic : 2η(x) − 1 > 0}, and Inline graphic = {xInline graphic : 2η(x) − 1 < 0}. A distance function to the boundary between Inline graphic and Inline graphic is Δ(x) = (x, Inline graphic) if xInline graphic, Δ(x) = (x, Inline graphic) if xInline graphic and Δ(x) = 0 otherwise, where (x, Inline graphic) denotes the distance of x to a set Inline graphic with respect to the Euclidean norm. Then the distribution P is said to have geometric noise exponent 0 < q < ∞, if there exists a constant C > 0 such that

E[exp{-Δ(X)2t}2η(X)-1]Ctqd/2,t>0.

Δ(X) actually measures the size of the set of points that are close to the opposite class. Indeed, if the data are distinctly separable, that is, when |2η(x) − 1| > δ > 0, for some constant δ, and η is continuous, q can be very large. If either model for the survival time or the censoring time is correctly specified, 2η(x) − 1 is the optimal treatment decision rule, where sign{2η(x) − 1} = sign{f*(X)}.

Footnotes

SUPPLEMENTARY MATERIALS

Supplementary material available at Biometrika online includes extended proofs of technical results, calculation of pseudo-outcome using Cox proportional hazards models, and additional simulation results.

Contributor Information

Y. Q. Zhao, Email: yqzhao@biostat.wisc.edu, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, 53792, U.S.A

D. Zeng, Email: dzeng@bios.unc.edu, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, U.S.A

E. B. Laber, Email: eblaber@ncsu.edu, Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, U.S.A

R. Song, Email: rsong@ncsu.edu, Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, U.S.A

M. Yuan, Email: myuan@stat.wisc.edu, Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, 53792, U.S.A

M. R. Kosorok, Email: kosorok@unc.edu, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, U.S.A

References

  1. Bartlett PL, Jordan MI, McAuliffe JD. Convexity, classification, and risk bounds. J Am Statist Assoc. 2006;101:138–156. [Google Scholar]
  2. Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm. [Google Scholar]
  3. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–297. [Google Scholar]
  4. Cox DR. Regression models and life-tables (with discussion) J R Statist Soc B. 1972;34:187–220. [Google Scholar]
  5. Dabrowska DM. Uniform consistency of the kernel conditional Kaplan–Meier estimate. Ann Statist. 1989;17:1157–1167. [Google Scholar]
  6. Goldberg Y, Kosorok MR. Q-learning with censored data. Ann Statist. 2012;40:529–560. doi: 10.1214/12-AOS968. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Huang X, Ning J, Wahed AS. Optimization of individualized dynamic treatment regimes for recurrent diseases. Statist Med. 2014 doi: 10.1002/sim.6104. In press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Kang C, Janes H, Huang Y. Combining biomarkers to optimize patient treatment recommendations. Biometrics. 2014 doi: 10.1111/biom.12191. In Press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Kimeldorf G, Wahba G. Some results on Tchebycheffian spline functions. J Math Anal Appl. 1971;33:82–95. [Google Scholar]
  10. Lee Y, Lin Y, Wahba G. Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. J Am Statist Assoc. 2004;99:67–81. [Google Scholar]
  11. Liu Y, Yuan M. Reinforced multicategory support vector machines. J Comput Graph Statist. 2011;20:909–919. doi: 10.1080/10618600.2015.1043010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Moodie EEM, Richardson TS, Stephens DA. Demystifying optimal dynamic treatment regimes. Biometrics. 2007;63:447–455. doi: 10.1111/j.1541-0420.2006.00686.x. [DOI] [PubMed] [Google Scholar]
  13. Murphy SA. Optimal dynamic treatment regimes (with discussion) J R Statist Soc B. 2003;65:331–366. [Google Scholar]
  14. Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Ann Statist. 2011;39:1180–1210. doi: 10.1214/10-AOS864. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty PJ, editors. Proc 2nd Seattle Symp Biostatist. New York: Springer; 2004. [Google Scholar]
  16. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701. [Google Scholar]
  17. Rubin DB. Bayesian inference for causal effects: The role of randomization. Ann Statist. 1978;6:34–58. [Google Scholar]
  18. Rubin DB, van der Lann MJ. Statistical issues and limitations in personalized medicine research with clinical trials. J Educ Psychol. 2012;8 doi: 10.1515/1557-4679.1423. Article 18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Socinski MA, Schell MJ, Peterman A, Bakri K, Yates S, Gitten R, Unger P, Lee J, Lee JH, Tynan M, Moore M, Kies MS. Phase III trial comparing a defined duration of therapy versus continuous therapy followed by second-line therapy in advanced-stage IIIB/IV nonsmall-cell lung cancer. J Clin Oncol. 2002;20:1335–1343. doi: 10.1200/JCO.2002.20.5.1335. [DOI] [PubMed] [Google Scholar]
  20. Splawa-Neyman J, Dabrowska DM, Speed TP. On the application of probability theory to agricultural experiments (engl. transl by D M Dabrowska and T P Speed) Statist Sci. 1990;5:465–472. [Google Scholar]
  21. Steinwart I, Scovel C. Fast rates for support vector machines using Gaussian kernels. Ann Statist. 2007;35:575–607. [Google Scholar]
  22. Thall PF, Sung HG, Estey EH. Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. J Am Statist Assoc. 2002;97:29–39. [Google Scholar]
  23. Tsybakov AB. Optimal aggregation of classifiers in statistical learning. Ann Statist. 2004;32:135–166. [Google Scholar]
  24. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. New York: Springer-Verlag; 2003. [Google Scholar]
  25. Zeng D, Lin D. Maximum likelihood estimation in semiparametric regression models with censored data (with discussion) J R Statist Soc B. 2007;69:507–564. [Google Scholar]
  26. Zhang B, Tsiatis AA, Davidian M, Zhang M, Laber E. Estimating optimal treatment regimes from a classification perspective. Stat. 2012a;1:103–114. doi: 10.1002/sta.411. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Zhang B, Tsiatis AA, Laber EB, Davidian M. A robust method for estimating optimal treatment regimes. Biometrics. 2012b;68:1010–1018. doi: 10.1111/j.1541-0420.2012.01763.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Zhao Y, Kosorok MR, Zeng D. Reinforcement learning design for cancer clinical trials. Statist Med. 2009;28:3294–3315. doi: 10.1002/sim.3720. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Zhao Y, Zeng D, Socinski MA, Kosorok MR. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics. 2011;67:1422–1433. doi: 10.1111/j.1541-0420.2011.01572.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Zhao YQ, Zeng D, Laber EB, Kosorok MR. New statistical learning methods for estimating optimal dynamic treatment regimes. J Am Statist Assoc. 2014 doi: 10.1080/01621459.2014.937488. In Press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Zhao YQ, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. J Am Statist Assoc. 2012;107:1106–1118. doi: 10.1080/01621459.2012.695674. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Zhu J, Rosset S, Hastie TJ, Tibshirani RJ. 1-norm support vector machines. Adv Neural Inf Process Syst. 2004;16:49–56. [Google Scholar]
  33. Zhu R, Kosorok MR. Recursively imputed survival trees. J Am Statist Assoc. 2012;107:331–340. doi: 10.1080/01621459.2011.637468. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary material

RESOURCES