Abstract
In this article, we propose a new concordance-assisted learning method for estimating optimal individualized treatment regimes. We first introduce a type of concordance function for prescribing treatment and propose a robust rank regression method for estimating the concordance function. We then find treatment regimes, up to a threshold, that maximize the concordance function; the maximizing index is named the prescriptive index. Finally, within the class of treatment regimes that maximize the concordance function, we find the optimal threshold to maximize the value function. We establish the convergence rate and asymptotic normality of the proposed estimator for the parameters in the prescriptive index. An induced smoothing method is developed to estimate the asymptotic variance of the proposed estimator. We also establish the n^{1/3}-consistency of the estimated optimal threshold and its limiting distribution. In addition, a doubly robust estimator of the parameters in the prescriptive index is developed under a class of monotonic index models. The practical use and effectiveness of the proposed methodology are demonstrated by simulation studies and an application to an AIDS dataset.
Keywords: Concordance, Optimal treatment regime, Propensity score, Rank estimation, Value function
1 Introduction
An individualized treatment regime is a deterministic function of predictors, such as clinical and genetic factors of patients, which aims to account for patients’ heterogeneity in response to treatment and maximize an expected clinical outcome of interest. Deriving optimal individualized treatment regimes has recently attracted a lot of attention for treating many complex diseases, such as cancer and AIDS.
There is a large and growing body of literature on estimating optimal individualized treatment regimes with a single decision time point and with multiple decision time points. The latter are referred to as optimal dynamic treatment regimes. Two main dynamic learning approaches based on backward induction have been proposed for estimating optimal dynamic treatment regimes. One is Q-learning (e.g. Watkins, 1989; Watkins and Dayan, 1992; Zhao et al., 2009, 2011; Qian and Murphy, 2011), which posits regression models for the outcome of interest and the so-called Q-functions. The other is A-learning (e.g. Murphy, 2003; Blatt et al., 2004), which directly builds models for the contrast functions and uses doubly robust estimating equations to estimate the contrast functions by incorporating the estimated propensity score functions. Compared with Q-learning, A-learning is more robust to model misspecification. More recently, for deriving the optimal treatment regime with a single decision time point, Zhang et al. (2012b) formulated the problem in a missing data framework and proposed inverse propensity score weighted (IPSW) and augmented-IPSW (AIPSW) estimators for the mean potential outcome under a given treatment regime, i.e. the value function. The estimated optimal treatment regime is then obtained by maximizing the estimated value function within a class of prespecified regimes, such as linear decision rules. The value function-based optimization method was extended to estimate the optimal dynamic treatment regime in Zhang et al. (2013a). In addition, Zhao et al. (2012) and Zhang et al. (2012a) recast the estimation of the optimal treatment regime from a classification perspective and used machine learning tools, such as the outcome-weighted support vector machine, to optimize the estimated value function. Other developments include the subgroup identification method of Foster et al. (2011), the target population selection method of Zhao et al.
(2013), and the marker-guided treatment selection method of Matsouaka et al. (2014).
The value function-based estimation method proposed by Zhang et al. (2012b) is robust and appealing. It does not require correct specification of the underlying model for the response, and it finds the best treatment regime maximizing the estimated value function within a class of regimes of interest even when the true optimal regime is not contained in this class. However, it also has some limitations. First, the convergence rates of the estimators for the parameters in the decision rules are slower than the standard n^{1/2}-rate, and their asymptotic distributions are not normal. Inference for these estimators was not studied in Zhang et al. (2012b). Second, the optimization of the estimated value function can be challenging, especially when the dimension of the predictors is relatively large, since the objective function is very bumpy. Zhang et al. (2012b) proposed using a genetic algorithm to search for the maximizer.
In this work, we propose a new criterion for optimal treatment decisions. First, we introduce a type of concordance function for prescribing treatment. Here, concordance for treatment prescription means that if one subject would benefit more from receiving a treatment than another subject, he or she should be more likely to be assigned to this treatment by the regime, which is a natural requirement for a good treatment regime. Then, we propose a robust rank regression method for estimating the concordance function. The proposed rank estimator is a U-statistic of order 2, similar to the maximum rank correlation estimator that has been widely studied in the econometrics literature (Han, 1987; Sherman, 1993; Cavanagh and Sherman, 1998; Chen, 2002; Abrevaya, 2003). Second, we find treatment regimes, up to a threshold, that maximize the concordance function; the maximizing index is named the prescriptive index. The optimization of the estimated concordance function can be done using a simplex algorithm, for example, the optim function in R, even with a relatively large number of predictors. Finally, within the class of treatment regimes that maximize the concordance function, we find the optimal threshold to maximize the IPSW estimator of the value function. A similar method was used by Matsouaka et al. (2014) to estimate the optimal threshold in marker-guided treatment regimes. We establish the n^{1/2}-consistency and asymptotic normality of the proposed estimator for the parameters in the prescriptive index. An induced smoothing method is developed to estimate the asymptotic variance of the proposed estimator. We also establish the n^{1/3}-consistency of the estimated optimal threshold and its limiting distribution. The asymptotic distribution of the estimated value function for the estimated optimal treatment regime is also derived. In addition, a doubly robust estimator of the parameters in the prescriptive index is developed under a class of monotonic index models.
The rest of the article is organized as follows. In Section 2, we introduce the notation, the concordance function for treatment prescription, and the concordance-assisted estimation methods of the optimal treatment regime. The asymptotic properties of the proposed estimators for the parameters in the optimal regime are also studied. Section 3 presents extensive simulation studies to demonstrate the performance of the proposed methods. An application to an AIDS clinical trial data is given in Section 4, followed by a discussion section. All of the technical proofs are provided in the Appendix.
The data that are analysed in the paper and the programs that were used to analyse them can be obtained from http://www4.stat.ncsu.edu/~lu/programcodes.html.
2 Our Estimation Method
2.1 Notation and IPSW Estimation
Let Y be the continuous response variable, X be the p-dimensional vector of covariates, and A, taking values in 𝒜 = {1, 0}, be the treatment indicator. It is assumed that a larger value of Y implies a better response. The observed data are {(Yi, Xi, Ai), i = 1, ⋯, n}, which are independently and identically distributed across i. Let Y*(a) denote the potential outcome that would result if the subject were given treatment a ∈ 𝒜. A treatment regime d(x) is a deterministic function that maps x ∈ 𝒳 to a ∈ 𝒜. An optimal treatment regime in class 𝒟 is defined as dopt = arg maxd∈𝒟 E{Y*(d(X))}, where 𝒟 is a class of treatment regimes of interest. For example, we may consider a class of linear decision rules d(x) = I(β′x ≥ c), where β is a p-dimensional vector of parameters and c is a scalar. Here E{Y*(d(X))} is called the value function of a given treatment regime d. To estimate the value function based on observed data, two assumptions are typically made: (i) (stable unit treatment value assumption) Y = Y*(1)A + Y*(0)(1 − A); (ii) (no unmeasured confounders assumption) A ⊥ {Y*(0), Y*(1)}|X. Based on these two assumptions, Zhang et al. (2012b) proposed an IPSW estimator for the value function, that is
V̂n(d) = {Σ_{i=1}^n CiYi/πc(Xi)} / {Σ_{i=1}^n Ci/πc(Xi)}, with Ci = Aid(Xi) + (1 − Ai){1 − d(Xi)} and πc(Xi) = π(Xi)d(Xi) + {1 − π(Xi)}{1 − d(Xi)},   (1)
where π(Xi) = P(Ai = 1|Xi) is the propensity score. To search for the optimal treatment regime in a class of linear decision rules, they suggested maximizing V̂n(d) ≡ V̂n(β, c) with respect to β and c under the constraint ‖(c, β′)′‖ = 1, where ‖a‖ is the Euclidean norm of a vector a.
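To make the weighting in (1) concrete, here is a minimal numerical sketch. This is our illustration, not the authors' code: the function names and the normalized ratio form of the IPSW estimator are our assumptions.

```python
import numpy as np

def ipsw_value(y, a, x, d, pi):
    """Normalized IPSW estimate of the value function E{Y*(d(X))}.

    Subjects whose observed treatment A_i agrees with the rule d(X_i)
    are kept and weighted by 1 / P(A_i = d(X_i) | X_i).
    """
    dx = np.array([d(xi) for xi in x])            # d(X_i) in {0, 1}
    px = np.array([pi(xi) for xi in x])           # propensity pi(X_i)
    c = (a == dx).astype(float)                   # consistency indicator C_i
    pc = px * dx + (1.0 - px) * (1.0 - dx)        # P(A = d(X) | X)
    w = c / pc                                    # inverse-probability weights
    return np.sum(w * y) / np.sum(w)

# Randomized trial with pi = 0.5: evaluating the rule "always treat"
# uses only the treated subjects, each with weight 2.
y = np.array([1.0, 2.0, 3.0, 4.0])
a = np.array([1, 0, 1, 0])
x = [0.0, 0.0, 0.0, 0.0]
v = ipsw_value(y, a, x, d=lambda xi: 1, pi=lambda xi: 0.5)  # mean of y[a == 1]
```

With π = 0.5 the normalized weights reduce to a simple average over the subjects whose observed treatment matches the rule.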
2.2 Concordance-Assisted Learning
From now on, we consider linear decision rules d(x) = I(β′x ≥ c) for simplicity. Note that the value function E{Y*(d(X))} = E[{Y*(1) − Y*(0)}d(X)] + E{Y*(0)}. Maximizing the value function is therefore equivalent to maximizing E[{Y*(1) − Y*(0)}d(X)]. Here Y*(1) − Y*(0) is the gain for a subject of receiving treatment 1 rather than treatment 0. The optimal treatment regime that maximizes the estimated value function tends to assign a subject to treatment 1 if his or her Y*(1) − Y*(0) > 0 and to treatment 0 otherwise. For an optimal treatment regime, it is also natural to require that for any two subjects i and j, if Yi*(1) − Yi*(0) > Yj*(1) − Yj*(0), the regime should be more likely to assign subject i to treatment 1 compared with subject j, i.e. β′Xi > β′Xj in terms of linear decision rules. This motivates us to propose a concordance-assisted learning (CAL) method for estimating the optimal treatment regime in two steps. In the first step, we find β to maximize the concordance function, defined as
C(β) = E[{Y1*(1) − Y1*(0)}I(β′X1 > β′X2)],

with the constraint ‖β‖ = (β′β)^{1/2} = 1, where {Y1*(·), X1} and {Y2*(·), X2} are independent copies of {Y*(·), X}. Let β* denote the maximizer of C(β). In the second step, we find c to maximize the value function V(β*, c) = E[Y*{I(β*′X ≥ c)}]. Let c* denote the maximizer. The optimal linear decision rule under the concordance-assisted learning is d*,opt(x) = I(β*′x ≥ c*). Here, the index β*′X is named the prescriptive index; that is, the larger the prescriptive index of a subject, the more benefit he or she tends to gain if assigned to treatment 1.
Note that the optimal treatment regime defined under the concordance-assisted learning may not maximize the value function in the class of linear decision rules, but it maximizes the concordance function. Moreover, among all linear decision rules that maximize the concordance function, it gives the maximal value function. Under the stable unit treatment value assumption and no unmeasured confounders assumption, it can be shown that
C(β) = E{D(X1)I(β′X1 > β′X2)},

where D(Xi) = E(Yi|Ai = 1, Xi) − E(Yi|Ai = 0, Xi). Consider a class of monotonic index models with D(X) = Q(β0′X), where Q(·) is an unspecified, strictly monotone increasing function and ‖β0‖ = 1, and define c0 = Q−1(0). Since Q(·) is a strictly monotone increasing function, it is easy to show that the maximizer of the value function V(β, c) is given by β = β0 and c = c0. In addition, following arguments similar to those of Cavanagh and Sherman (1998) for the maximum rank correlation estimator, the maximizer of the concordance function can also be shown to be β* = β0. Similarly, we have c* = c0. Therefore, the optimal treatment regime under the concordance-assisted learning coincides with the optimal treatment regime that maximizes the value function.
2.3 Estimation With Known Propensity Score
In this section we assume that the propensity score model π(X) is known as in randomized clinical trials. Similar to the argument of A-learning (Murphy, 2003), we have
E[A{Y − ν(X)}/π(X) − (1 − A){Y − ν(X)}/{1 − π(X)} | X] = D(X),

where ν(X) is an arbitrary function of X. This motivates us to consider the following estimator of the concordance function
Ĉn(β, θ̂) = {n(n − 1)}−1 Σ_{i≠j} [Ai{Yi − ν(Xi, θ̂)}/π(Xi) − (1 − Ai){Yi − ν(Xi, θ̂)}/{1 − π(Xi)}]I(β′Xi > β′Xj),   (2)
where ν(X, θ) is a posited parametric model for μ(X) ≡ E(Y | X, A = 0), such as a constant model or a linear model, and θ̂ is an estimator of θ. Define β̂ = argmax‖β‖=1 Ĉn(β, θ̂). In addition, an estimator of c* is given by ĉ = arg maxc V̂n(β̂, c), where V̂n(β̂, c) is the IPSW estimator of the value function for the decision rule d(x) = I(β̂′x ≥ c) as defined in (1). In our implementation, we consider a linear model for ν(X, θ), and θ̂ is the associated least squares estimator based on data from subjects with A = 0. In addition, since Ĉn(β, θ̂) is a U-statistic of order 2, it is much less bumpy than the IPSW estimator of the value function, and its optimization can be carried out directly using the optim function in R even with relatively large p. Finally, since c is a scalar, the optimization of V̂n(β̂, c) can be done simply using a grid search.
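As an illustration of the first estimation step, the sketch below builds the A-learning weights and maximizes a pairwise rank criterion over the unit sphere. This is our reading of the estimator in (2); the kernel form, the crude random search (standing in for the Nelder-Mead search that the paper performs with R's optim), and all names are our assumptions.

```python
import numpy as np

def contrast_weights(y, a, x, nu, pi):
    """A-learning weights whose conditional mean given X_i is the
    treatment contrast D(X_i) = E(Y|A=1,X_i) - E(Y|A=0,X_i)."""
    res = y - np.array([nu(xi) for xi in x])   # Y_i - nu(X_i, theta_hat)
    px = np.array([pi(xi) for xi in x])        # pi(X_i)
    return a * res / px - (1 - a) * res / (1 - px)

def concordance(beta, w, x):
    """Order-2 U-statistic: average of w_i * I(beta'X_i > beta'X_j), i != j."""
    s = x @ beta                               # prescriptive index beta'X
    gt = (s[:, None] > s[None, :]).astype(float)   # diagonal is already 0
    n = len(s)
    return float((w[:, None] * gt).sum()) / (n * (n - 1))

def fit_beta(w, x, n_draws=2000, seed=1):
    """Step 1: maximize the estimated concordance over the unit sphere.
    A crude random search stands in for the simplex search in the paper."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_draws):
        b = rng.standard_normal(x.shape[1])
        b = b / np.linalg.norm(b)
        val = concordance(b, w, x)
        if val > best_val:
            best, best_val = b, val
    return best
```

Step 2 would then grid-search the scalar threshold c, for example over the observed values of β̂′Xi, maximizing the IPSW estimate of the value function.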
Next we study the asymptotic properties of the estimators β̂ and ĉ. For simplicity of presentation, we first introduce some notation. Define τ(β, X1, X2) = D(X1)I(β′X1 > β′X2) and the projection ϱ(β, x) = E{τ(β, x, X2)} + E{τ(β, X1, x)}. Let ∇mϱ(β, x) denote the mth partial derivative operator with respect to β.
To establish the asymptotic results, we need the following conditions.
(C1) The propensity score π(x) is known and 0 < π (x) < 1 for all x ∈ 𝒳.
(C2) The estimator θ̂ converges almost surely to a deterministic vector of parameters θ*, and n^{1/2}(θ̂ − θ*) = Op(1).
(C3) The concordance function C(β) = E{τ(β, X1, X2)} has a unique maximizer at β = β* with ‖β*‖ = 1.
(C4) (i) The support of X is not contained in a proper linear subspace of ℝp; (ii) the density function of β′X is everywhere positive for β ∈ ℬ, where ℬ is a neighborhood of β*; (iii) E[{D(X)}2] < ∞ and E[{Y − ν(X, θ*)}2] < ∞.
(C5) (i) The function ϱ(β, x) is twice differentiable with respect to β; (ii) there is an integrable function ϒ(x) such that for any x ∈ 𝒳 and β1 and β2 with ‖β1‖ = ‖β2‖ = 1, ‖∇2ϱ(β1, x) − ∇2ϱ(β2, x)‖ < ϒ(x)‖β1 − β2‖; (iii) E{‖∇1ϱ(β*, X)‖2} < ∞ and E{‖∇2ϱ(β*, X)‖} < ∞; (iv) E{∇2ϱ(β*, X)} is negative definite.
(C6) The value function V (β*, c) = E{Y* (I(β*′ X ≥ c))} has a unique maximizer at c = c*. In addition, there exist a neighborhood of c* and a constant M > 0 such that V (β*, c) − V (β*, c*) ≤ −M(c − c*)2 for every c in this neighborhood.
Condition (C1) is assumed to simplify the theoretical arguments. It can be extended to the situation where the propensity score model is correctly specified, for example, using a logistic regression. The parameters in the propensity score model can then be consistently estimated from the data, and the variation of these estimators needs to be taken into account when deriving the asymptotic variance of β̂. Condition (C2) usually holds for the least squares estimator under mild conditions. Conditions (C3) and (C4) are assumed to establish the consistency of β̂. In particular, condition (C3) assumes the existence and uniqueness of the population parameters that maximize the concordance function. For the class of monotonic index models, we have D(X) = Q(β0′X). Following arguments similar to those of Cavanagh and Sherman (1998), condition (C3) can be shown to hold with β* = β0. Moreover, condition (ii) of (C4) is assumed to show the continuity of C(β). It holds when there is a component of X, say Xj, that has an everywhere positive density conditional on the remaining covariates and whose corresponding coefficient in β* is nonzero. Condition (iii) of (C4) is assumed to show the uniform convergence of Ĉn(β, θ̂) to C(β). Condition (C5) is assumed to ensure the asymptotic normality of β̂. Conditions (C4)–(C5) are often used to establish the large sample properties of maximum rank correlation estimators (e.g. Sherman, 1993; Cavanagh and Sherman, 1998). Condition (C6) is assumed to establish the consistency and convergence rate of ĉ. For the class of monotonic index models, we have V(β*, c) = E{Y*(0)} + ∫_c^∞ Q(u)f(u)du, where f(·) is the density function of β0′X. Then, ∂V(β*, c)/∂c = −Q(c)f(c), which is less than 0 if c > c0 = Q−1(0) and greater than 0 otherwise. Therefore, V(β*, c) has a unique maximizer with c* = c0. In addition, ∂2V(β*, c)/∂c2 = −Q̇(c)f(c) − Q(c)ḟ(c), where ȧ(c) denotes the first derivative of a(c). Then, ∂2V(β*, c)/∂c2|c=c* = −Q̇(c*)f(c*) < 0.
This implies that there exist a neighborhood of c* and a constant M > 0 such that V (β*, c) − V (β*, c*) ≤ −M(c − c*)2 for every c in this neighborhood. Condition (C6) holds.
Theorem 1
Under conditions (C1)–(C5), we have, as n → ∞,
‖β̂ − β*‖ → 0 almost surely;
n^{1/2}(β̂ − β*) converges in distribution to N(0, Σ), where Σ = V−1Δ(V−1)′, 2V = E{∇2ϱ(β*, X)}, and Δ = E{∇1ϱ(β*, X)∇1ϱ(β*, X)′}.
Theorem 2
Under conditions (C1)–(C6), we have, as n → ∞,
|ĉ − c*| = Op(n^{−1/3});
n^{1/3}(ĉ − c*) converges in distribution to argmax_h G(h), where G(h) is a two-sided Gaussian process defined in the Appendix.
Proofs of the above theorems are relegated to the Appendix. Since Ĉn(β, θ̂) is not a smooth function of β, to estimate the asymptotic variance matrix of β̂, we derive an induced smoothing method similar to those studied in Brown and Wang (2005, 2007) and Pang et al. (2012). Define the smoothed concordance function C̃n(β, θ̂, H) = E{Ĉn(β + H1/2U, θ̂)}, where U is a standard p-variate normal random vector, H is a p × p positive definite matrix of order O(n^{−1}), and the expectation is taken with respect to the distribution of U. We have
where Xij = Xj − Xi, σij = (X′ijHXij)^{1/2}, and Φ(·) is the standard normal cumulative distribution function. Denote the first and second derivatives of C̃n(β, θ̂, H) with respect to β by
where ϕ (·) is the density function of the standard normal distribution, ϕ̇ (·) is the first derivative of ϕ(·), and for a vector υ, υ⊗2 = υυ′. Then, an estimator of Σ is given by
where
We have the following Theorem.
Theorem 3
Under conditions (C1)–(C5), we have that, for any positive definite matrix H, Σ̂ (β̂, θ̂, H) → Σ almost surely, as n → ∞.
The proof of Theorem 3 follows arguments similar to those of Zhang et al. (2013b) for a self-induced smoothing approach for transformation models; the details are omitted here. In our numerical implementation, we set H = n−1Ip for simplicity. In our numerical experience, the results are not sensitive to the choice of H as long as it is of order O(n^{−1}). In practice, one more iteration may help to slightly improve the accuracy of the estimated variance of β̂. Specifically, we initially set H as H(0) = n−1Ip and compute Σ̂(β̂, θ̂, H(0)). Then, we update H as H(1) = n−1Σ̂(β̂, θ̂, H(0)). Moreover, a new estimator β̃ can be defined as the maximizer of the smoothed concordance function C̃n(β, θ̂, H). It can be shown that β̂ and β̃ have the same asymptotic distribution, while β̃ may have a slightly smaller standard deviation in finite samples compared with β̂.
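A minimal sketch of the smoothed criterion C̃n follows. This is our illustration only: the per-subject weights w, the sign convention inside Φ, and the function names are assumptions.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def smoothed_concordance(beta, w, x, H):
    """Induced-smoothing version of the rank criterion: the indicator
    I(beta'X_i > beta'X_j) is replaced by Phi(beta'(X_i - X_j)/sigma_ij),
    with sigma_ij^2 = (X_i - X_j)' H (X_i - X_j) and H of order O(1/n)."""
    n = x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = x[i] - x[j]
            sig = sqrt(d @ H @ d)
            total += w[i] * norm_cdf((beta @ d) / sig)
    return total / (n * (n - 1))
```

With H = n^{-1} Ip the criterion is smooth in β (so derivatives can be taken for the sandwich variance estimator), and as H shrinks it approaches the unsmoothed concordance.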
Next, we establish the asymptotic distribution of the estimated value function V̂n(β̂, ĉ).
Define V (β, c) = E[Y* {I(β′X ≥ c)}].
Theorem 4
Under conditions (C1)–(C6), we have, as n → ∞,
n^{1/2}{V̂n(β̂, ĉ) − V(β*, c*)} converges in distribution to a normal distribution with mean zero and variance ΣV, where ΣV is defined in the Appendix.
2.4 Doubly Robust Estimation
If the propensity score π(X) is unknown, as in observational studies, a parametric model, such as a logistic regression model, is usually assumed for the propensity score. Let π(X, α) denote the posited propensity score model. The parameters α can be estimated from the observed data; let α̂ denote the estimator. If the propensity score model is misspecified, π(X, α̂) may not consistently estimate the true propensity score π(X), although it can be shown that α̂ converges almost surely to a deterministic vector of parameters α* under mild conditions. In such a situation, Ĉn(β, θ̂) is not a consistent estimator of the concordance function. To improve the robustness of the estimator β̂ proposed in the previous section, we develop a doubly robust estimation method for the class of monotonic index models: D(X) = Q(β0′X) with Q(·) strictly increasing and ‖β0‖ = 1.
Recall that μ(X) = E(Y | X, A = 0). We have E[{A − π(X)}{Y − μ(X)} | X] = π(X){1 − π(X)}D(X).
Define
where ν(X, θ) is a posited parametric model for μ(X). Then, if the propensity score model is correctly specified,
on the other hand, if the baseline mean model ν(X, θ) is correctly specified,
Under either case, the maximizer of the resulting concordance-type criterion is β = β0. This motivates us to consider the following loss function
(3)
Let β̂DR denote the minimizer of (3) subject to ‖β‖ = 1. We have the following theorem.
Theorem 5
Assume that either the propensity score model π(X, α) or the baseline mean model ν(X, θ) is correctly specified, and that θ̂ and α̂ are n^{1/2}-consistent estimators of θ* and α*, respectively. Under conditions (C1')–(C4') given in the Appendix, we have, as n → ∞,
‖β̂DR − β0‖ → 0 almost surely;
in distribution, where the asymptotic variance matrix ΣDR is defined in the Appendix.
Since the asymptotic variance matrix ΣDR has a very complicated form, direct estimation of ΣDR may be difficult. Following techniques similar to those in Jin et al. (2001), we derive a resampling method to estimate ΣDR. Specifically, we consider the perturbed loss function
where ξ1, ⋯, ξn are independent and identically distributed exponential variables with mean one, θ̂* is the correspondingly perturbed estimator of θ, and α̂* is the correspondingly perturbed estimator of α obtained from the weighted estimating equations. Let β̂DR,* denote the minimizer of the perturbed loss function. We can use the empirical variance matrix of β̂DR,* to estimate ΣDR.
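The resampling scheme can be sketched as follows. This is our illustration; weighting each pair by ξiξj in the rank criterion and all names are assumptions.

```python
import numpy as np

def perturbed_concordance(beta, w, x, xi):
    """Rank criterion with each pair (i, j) reweighted by xi_i * xi_j,
    in the spirit of the resampling scheme of Jin et al. (2001)."""
    s = x @ beta
    n = len(s)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and s[i] > s[j]:
                total += xi[i] * xi[j] * w[i]
    return total / (n * (n - 1))

def resample_sd(fit, n, B=200, seed=0):
    """Empirical standard deviation of the perturbed estimators over B
    draws of i.i.d. Exp(1) weights; `fit` maps a weight vector to the
    perturbed estimate (e.g. the maximizer of the perturbed criterion)."""
    rng = np.random.default_rng(seed)
    draws = np.array([fit(rng.exponential(1.0, n)) for _ in range(B)])
    return draws.std(axis=0, ddof=1)
```

Setting all ξi = 1 recovers the unperturbed criterion, which is a convenient sanity check when implementing the scheme.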
3 Simulations
3.1 Simulations for Monotonic Index Models
In the first set of simulations, we consider a class of monotonic index models
(4)
where X = (X1, X2, X3, X4)′, A is generated from Bernoulli(π(X)), and ε is generated from N(0, 0.5^2). Four cases, I–IV, corresponding to four different specifications of Q(·), are studied. Here X1, X2, X3 and X4 are independent and identically distributed standard normal random variables. In each case, we set β0 = (1, 1, −1, 1)′, γ1 = (1, −1, 1, 1)′, and γ2 = (1, 0, −1, 0)′. For all the cases, the optimal treatment regime that maximizes the value function and the optimal treatment regime defined by our proposed concordance-assisted learning are the same and are given by dopt(x) = I(β0′x ≥ c0) with c0 = 0 and β0 = (0.5, 0.5, −0.5, 0.5)′ after imposing the constraint ‖β0‖ = 1. For each case, we carry out 500 simulation runs with sample size n = 200. The optimization in the proposed method is done by the optim function in R with the default method “Nelder-Mead” for searching for the maximizer.
For the propensity score model, we first consider randomized trials with π(X) = 0.5. We compare the IPSW and AIPSW estimators of Zhang et al. (2012b), the estimator obtained from a linear regression model for both the baseline covariate effects and the treatment–covariate interactions, denoted by LR, the proposed concordance-assisted learning estimator, denoted by CAL, and its doubly robust variant, denoted by CAL-DR. In all the estimators, we assume that the propensity score is known and set it as 0.5. For the AIPSW estimator, a linear model was fitted for the augmented term as done in Zhang et al. (2012b). Under case I, the fitted linear model is correctly specified, while under cases II–IV, it is not. For the CAL and CAL-DR estimators, we consider a linear model for ν(X, θ), where θ is estimated based on data from subjects with A = 0. For the IPSW, AIPSW and LR estimators, we normalize β̂ to have norm one for comparison with the CAL and CAL-DR estimators and adjust ĉ accordingly by ĉ/‖β̂‖. To assess the performance of the estimators, for β̂, we report the mean and standard deviation of the estimators, the mean of the estimated standard errors, and the empirical coverage probability of 95% Wald-type confidence intervals; for ĉ, we report the mean and standard deviation of the estimators. Here, the standard error of the CAL estimator for β is estimated by the proposed induced smoothing method and that of the CAL-DR estimator is estimated by the proposed resampling method. To evaluate the accuracy of the estimated optimal treatment regime d̂opt(x) = I(β̂′x ≥ ĉ), we report the mean and standard deviation of the percentage of making correct decisions (PCD), defined as the proportion of subjects for whom the estimated rule and the true optimal rule make the same treatment assignment. Furthermore, we report the mean and standard deviation of the value functions and concordance functions for the estimated optimal treatment regime, which are obtained via simulations.
Specifically, we generate data for N = 10,000 subjects from model (4) and compute the value function and the concordance function for d̂opt(x) empirically from this large sample. Similarly, we can compute the value function and the concordance function for the true optimal treatment regime, dopt(x), denoted by V0 and C0, respectively. In all the tables, we report the mean of the estimators (denoted as Est.), the standard deviation of the estimators (SD), the mean of the estimated standard errors (SE) and the empirical coverage probability (CP%) of Wald-type 95% confidence intervals.
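The PCD and the Monte Carlo evaluation of the value function described above can be sketched as follows (a minimal illustration; the function names are ours):

```python
import numpy as np

def pcd(x, beta_hat, c_hat, beta0, c0):
    """Percentage of correct decisions: the fraction of subjects for whom
    the estimated rule I(beta_hat'x >= c_hat) agrees with the true
    optimal rule I(beta0'x >= c0)."""
    d_hat = (x @ beta_hat >= c_hat)
    d_opt = (x @ beta0 >= c0)
    return float(np.mean(d_hat == d_opt))

def empirical_value(y_star1, y_star0, d):
    """Monte Carlo value of a rule d over a large simulated sample: the
    average potential outcome each subject would receive under d."""
    return float(np.mean(np.where(d == 1, y_star1, y_star0)))
```

Applying both functions to a large independent sample (N = 10,000 above) gives the reported PCD, V̂ and, analogously, Ĉ for each estimated regime.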
The simulation results for cases I and II are summarized in Table 1; those for cases III and IV are given in the Supplementary Appendix to save space. From the simulation results, we make the following observations. First, our proposed CAL and CAL-DR estimators for β are nearly unbiased under all cases and have smaller biases than the IPSW and AIPSW estimators. In addition, the standard deviations of the CAL and CAL-DR estimators for β are much smaller than those of the IPSW and AIPSW estimators, indicating a substantial efficiency gain of the concordance-assisted learning for estimating the optimal treatment regime. One possible reason is that the CAL and CAL-DR estimators have a faster convergence rate than the IPSW and AIPSW estimators. Second, the mean of the estimated standard errors of the CAL and CAL-DR estimators for β is close to the standard deviation of the estimators, and the empirical coverage probability of the 95% confidence intervals is close to the nominal level under all cases. Third, the CAL-DR estimator for β has smaller standard deviations than the CAL estimator, indicating that the CAL-DR estimator may be more efficient than the CAL estimator. Fourth, the estimated value function and concordance function of the optimal treatment regimes obtained by the CAL and CAL-DR estimators are all close to their true values and are larger than those obtained by the IPSW estimator. Fifth, the PCDs of the optimal treatment regimes obtained by the CAL and CAL-DR estimators range from 0.875 to 0.925, which are much higher than those obtained by the IPSW estimator. Sixth, for case I, where the fitted linear model is correctly specified, the AIPSW estimators of β have much smaller standard deviations than the IPSW estimators, showing the efficiency gain of the AIPSW estimators as studied in the literature.
However, for cases II–IV, where the fitted linear model is misspecified, the AIPSW and IPSW estimators have comparable performance in terms of the value function and PCD. Finally, the LR estimator has the best performance under case I, as expected, since the linear regression model is correctly specified, but it may perform worse than the CAL and CAL-DR estimators when the linear model is misspecified. For example, in case II, the LR estimators of β have larger biases and standard deviations, and the optimal treatment regimes obtained by the LR estimators yield a smaller value function, PCD and concordance function.
Table 1.
Simulation results for Cases I and II. The true optimal regime is dopt(x) = I(β0′x ≥ c0) with β0 = (0.5, 0.5, −0.5, 0.5)′ and c0 = 0.
 | | β̂1 | β̂2 | β̂3 | β̂4 | ĉ | V̂ | PCD | Ĉ |
---|---|---|---|---|---|---|---|---|---|
Case I | V0 = 2.595, C0 = 2.258 | ||||||||
IPSW | Est. | 0.456 | 0.591 | −0.561 | 0.495 | 0.063 | 2.473 | 0.883 | 2.131 |
SD | 0.202 | 0.193 | 0.153 | 0.152 | 0.237 | 0.104 | 0.053 | 0.125 | |
AIPSW | Est. | 0.575 | 0.568 | −0.574 | 0.572 | 0.002 | 2.585 | 0.962 | 2.244 |
SD | 0.058 | 0.060 | 0.062 | 0.087 | 0.080 | 0.034 | 0.020 | 0.021 | |
LR | Est. | 0.500 | 0.500 | −0.500 | 0.499 | 0.000 | 2.597 | 0.989 | 2.257 |
SD | 0.016 | 0.016 | 0.016 | 0.015 | 0.018 | 0.032 | 0.008 | 0.017 | |
CAL | Est. | 0.498 | 0.502 | −0.495 | 0.497 | 0.040 | 2.543 | 0.920 | 2.249 |
SD | 0.048 | 0.043 | 0.042 | 0.046 | 0.234 | 0.066 | 0.055 | 0.019 | |
SE | 0.051 | 0.051 | 0.050 | 0.050 | - | - | - | - | |
CP | 95.0 | 96.6 | 96.2 | 94.8 | - | - | - | - | |
CAL-DR | Est. | 0.501 | 0.501 | −0.497 | 0.499 | 0.031 | 2.545 | 0.925 | 2.256 |
SD | 0.022 | 0.021 | 0.021 | 0.019 | 0.222 | 0.009 | 0.056 | 0.017 | |
SE | 0.023 | 0.023 | 0.021 | 0.024 | - | - | - | - | |
CP | 95.4 | 97.2 | 93.4 | 98.0 | - | - | - | - | |
Case II | V0 = 7.723, C0 = 6.248 | ||||||||
IPSW | Est. | 0.376 | 0.525 | −0.530 | 0.453 | −0.137 | 7.558 | 0.818 | 5.699 |
SD | 0.311 | 0.284 | 0.245 | 0.215 | 0.334 | 0.259 | 0.088 | 0.877 | |
AIPSW | Est. | 0.422 | 0.403 | −0.408 | 0.432 | 0.160 | 7.471 | 0.773 | 5.209 |
SD | 0.291 | 0.325 | 0.324 | 0.392 | 0.416 | 0.629 | 0.127 | 1.405 | |
LR | Est. | 0.481 | 0.479 | −0.486 | 0.480 | 0.472 | 7.663 | 0.811 | 6.125 |
SD | 0.135 | 0.134 | 0.136 | 0.136 | 0.114 | 0.538 | 0.042 | 0.555 | |
CAL | Est. | 0.498 | 0.491 | −0.503 | 0.496 | −0.097 | 7.667 | 0.875 | 6.226 |
SD | 0.054 | 0.057 | 0.057 | 0.057 | 0.456 | 0.179 | 0.090 | 0.526 | |
SE | 0.064 | 0.062 | 0.063 | 0.062 | - | - | - | - | |
CP | 97.2 | 95.0 | 95.6 | 95.6 | - | - | - | - | |
CAL-DR | Est. | 0.498 | 0.497 | −0.504 | 0.498 | −0.078 | 7.668 | 0.875 | 6.244 |
SD | 0.025 | 0.025 | 0.025 | 0.023 | 0.455 | 0.185 | 0.094 | 0.525 | |
SE | 0.026 | 0.026 | 0.023 | 0.026 | - | - | - | - | |
CP | 95.8 | 96.2 | 92.4 | 96.2 | - | - | - | - |
Est., mean of estimators; SD, standard deviation of estimators; SE, mean of estimated standard errors; CP, empirical coverage probability of 95% confidence interval (%).
For comparison, we also implemented the method of Zhao et al. (2013). To save space, the results are provided in the Supplementary Appendix. We make the following observations. First, the method of Zhao et al. (2013) depends on the choice of ξ; in practice, a data-adaptive way of choosing the optimal ξ is needed. Second, for case I, where the linear models are correctly specified, the method of Zhao et al. (2013) with the best choice of ξ and the proposed methods have comparable performance in terms of the value (V̂) and PCD. However, for cases II–IV, where the fitted linear models are misspecified, the estimated optimal treatment rules obtained by the proposed methods give larger values and PCDs than those of Zhao et al. (2013). Third, the estimated optimal treatment rules obtained by the proposed methods give larger concordance values (Ĉ) with much smaller standard deviations than those of Zhao et al. (2013) for all cases. In summary, the proposed methods show very competitive performance.
In addition, we compare the proposed methods and the method of Matsouaka et al. (2014) to examine the inference of the estimated value function. The results are provided in the Supplementary Appendix. All the methods have proper coverage probabilities close to the nominal level, and the estimators of the value functions have comparable standard deviations.
We also conducted simulations with p = 10 covariates, generated from the standard normal distribution. The results are provided in the Supplementary Appendix. It can be seen that the proposed methods give larger concordance, value and PCD compared with the IPSW method and require much shorter computational time on average. This demonstrates the ability of the proposed methods to handle a relatively large number of covariates.
Next we consider simulations where the propensity score π(X) is unknown, as in observational studies, and the posited model is misspecified. The simulation results are given in the Supplementary Appendix. We observe that the IPSW and CAL estimators have relatively large biases under all cases, as expected; the CAL-DR estimator is nearly unbiased under cases I and II and has much smaller biases than the IPSW and CAL estimators under cases III and IV; the mean of the estimated standard errors of the CAL-DR estimators is close to the standard deviation of the estimators, and the empirical coverage probability of the 95% confidence intervals is close to the nominal level. These findings show that the CAL-DR estimator has the double robustness property under cases I and II when the propensity score model is misspecified and has superior performance compared with the IPSW and CAL estimators under cases III and IV when both the baseline mean model and the propensity score model are misspecified.
3.2 Simulations for General Models
In this section, we consider several models where the monotonic index model for D(X) is violated and the true optimal treatment regime may or may not be defined by a linear decision rule. We first consider model (4) under a setting (case V) in which γ1 and γ2 are defined the same as before. Here, we only consider randomized studies with π(X) = 0.5. In this case, the true optimal treatment regime is not a linear decision rule. We compare the mean and standard deviation of the IPSW, AIPSW, LR, CAL and CAL-DR estimators and the value function and concordance function of the estimated optimal treatment regime obtained by the different methods. For all the methods, we search for the optimal treatment regime in the class of linear decision rules. The simulation results are summarized in Table 2. From the simulation results, it can be seen that the CAL and CAL-DR methods give similar estimates of β and c, which may differ substantially from the estimates obtained by the IPSW and AIPSW methods; the concordance function of the estimated optimal treatment regimes obtained by the CAL and CAL-DR methods is larger than that obtained by the IPSW and AIPSW methods; and the value function of the estimated optimal treatment regimes obtained by the CAL and CAL-DR methods is comparable to or slightly larger than that obtained by the IPSW, AIPSW and LR methods and is closer to the value function of the true optimal treatment regime.
Table 2.
Simulation results for Cases V and VI.
| | | β̂1 | β̂2 | β̂3 | β̂4 | ĉ | V̂ | PCD | Ĉ |
|---|---|---|---|---|---|---|---|---|---|
| **Case V** (V0 = 3.297) | | | | | | | | | |
| IPSW | Est. | 0.088 | 0.447 | 0.209 | 0.023 | 0.596 | 2.877 | 0.712 | 0.273 |
| | SD | 0.500 | 0.468 | 0.350 | 0.446 | 0.457 | 0.131 | 0.100 | 0.184 |
| AIPSW | Est. | 0.057 | 0.516 | 0.260 | 0.011 | 0.640 | 2.912 | 0.739 | 0.310 |
| | SD | 0.532 | 0.461 | 0.338 | 0.409 | 0.413 | 0.123 | 0.100 | 0.179 |
| LR | Est. | 0.080 | 0.570 | 0.254 | 0.022 | −0.641 | 2.902 | 0.734 | 0.331 |
| | SD | 0.457 | 0.315 | 0.436 | 0.326 | 0.536 | 0.113 | 0.086 | 0.156 |
| CAL | Est. | 0.091 | 0.575 | 0.267 | 0.018 | 0.836 | 2.915 | 0.735 | 0.342 |
| | SD | 0.448 | 0.282 | 0.398 | 0.390 | 0.759 | 0.122 | 0.106 | 0.135 |
| CAL-DR | Est. | 0.085 | 0.585 | 0.258 | 0.023 | 0.848 | 2.920 | 0.738 | 0.342 |
| | SD | 0.448 | 0.287 | 0.410 | 0.365 | 0.753 | 0.119 | 0.101 | 0.140 |
| **Case VI** (V0 = 4.366) | | | | | | | | | |
| IPSW | Est. | 0.444 | 0.486 | −0.445 | 0.468 | 0.527 | 4.259 | 0.890 | 3.135 |
| | SD | 0.198 | 0.196 | 0.194 | 0.194 | 0.189 | 0.447 | 0.051 | 0.453 |
| AIPSW | Est. | 0.448 | 0.426 | −0.423 | 0.434 | 0.355 | 4.204 | 0.845 | 2.972 |
| | SD | 0.412 | 0.241 | 0.243 | 0.287 | 0.229 | 0.498 | 0.124 | 0.732 |
| LR | Est. | 0.483 | 0.485 | −0.477 | 0.461 | 0.306 | 4.175 | 0.685 | 3.195 |
| | SD | 0.151 | 0.142 | 0.161 | 0.154 | 0.095 | 0.421 | 0.040 | 0.427 |
| CAL | Est. | 0.498 | 0.495 | −0.508 | 0.480 | 0.548 | 4.338 | 0.935 | 3.255 |
| | SD | 0.075 | 0.071 | 0.064 | 0.066 | 0.205 | 0.420 | 0.033 | 0.416 |
| CAL-DR | Est. | 0.506 | 0.501 | −0.508 | 0.475 | 0.531 | 4.341 | 0.937 | 3.262 |
| | SD | 0.047 | 0.045 | 0.053 | 0.047 | 0.217 | 0.422 | 0.045 | 0.416 |
In addition, we consider the model
where ε ~ N(0, 0.52), A ~ Bernoulli(0.5) and X is generated the same as before. The following case is studied: , where β0 is defined the same as before. In this case, the true optimal treatment regime is given by , which is a linear decision rule. Here, the contrast function D(X) = exp{μ(X) + Q(X)} − exp{μ(X)}, which does not follow a monotonic index model. For all the methods, we search for the optimal treatment regime within the class of linear decision rules. Thus, the IPSW and AIPSW estimators are consistent but the CAL and CAL-DR estimators are not. The simulation results are also summarized in Table 2. We observe that the IPSW and AIPSW estimators of β have relatively small biases, as expected; compared with the IPSW and AIPSW estimators, the CAL and CAL-DR estimators have comparable biases under Case VI but with smaller standard deviations; the concordance functions of the estimated optimal treatment regimes obtained by the CAL and CAL-DR methods are larger than those obtained by the IPSW and AIPSW methods; and the value functions of the estimated optimal treatment regimes obtained by the CAL and CAL-DR methods are slightly larger than those obtained by the IPSW and AIPSW methods and are closer to the value function of the true optimal treatment regime . The LR method shows the worst performance in this case, yielding a much smaller value function and PCD than the other methods.
In summary, the above simulations show that the CAL and CAL-DR methods remain competitive even when the monotonic index model for D(X) is violated.
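To make the value comparisons concrete, the following is a minimal sketch of a normalized (Hajek-type) inverse propensity score weighted value estimator of the kind compared in these simulations. The function name `ipsw_value` and the toy data-generating model are our illustration, not the paper's exact estimator.

```python
import numpy as np

def ipsw_value(y, a, d, prop):
    """Normalized IPSW estimate of the value function E[Y*(d)].

    y: observed outcomes; a: observed treatments (0/1);
    d: regime's recommended treatments (0/1); prop: P(A = 1 | X)."""
    # weight by 1 / Pr(A = observed a | X); only regime-consistent
    # subjects (a == d) contribute
    w = np.where(a == 1, 1.0 / prop, 1.0 / (1.0 - prop))
    match = (a == d).astype(float)
    return np.sum(w * match * y) / np.sum(w * match)

# toy randomized study with pi(X) = 0.5: treatment helps when x > 0
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
y = 1.0 + a * x + rng.normal(scale=0.5, size=n)
d = (x > 0).astype(int)                  # candidate linear regime
v = ipsw_value(y, a, d, np.full(n, 0.5))  # close to 1 + E[x I(x > 0)]
```

With a known constant propensity score, the normalization makes the estimator a weighted mean over regime-consistent subjects, which is why the value comparisons above are on the outcome scale.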
4 Real Data Analysis
We demonstrate the proposed method with an application to the data from AIDS Clinical Trials Group Protocol 175 (ACTG175), which consists of 2139 HIV-infected subjects. The enrolled subjects were randomized to four treatment groups: zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, and ddI monotherapy. Here, we focus on the subset of patients receiving the treatment ZDV+ddI or ZDV+zalcitabine. The treatment indicator A = 0 denotes ZDV+zalcitabine (524 subjects), while A = 1 denotes ZDV+ddI (522 subjects). The CD4 count (cells/mm3) at 20 ± 5 weeks post-baseline is chosen as the continuous response variable Y. Twelve covariates are considered, including five continuous variables: age (years), weight (kg), Karnofsky score (scale of 0–100), CD4 count (cells/mm3) at baseline, and CD8 count (cells/mm3) at baseline; and seven binary variables: hemophilia (0=no, 1=yes), homosexual activity (0=no, 1=yes), history of intravenous drug use (0=no, 1=yes), race (0=white, 1=non-white), gender (0=female, 1=male), antiretroviral history (0=naive, 1=experienced), and symptomatic status (0=asymptomatic, 1=symptomatic). Here, the propensity score is known and set as π(X) ≡ 0.5. We consider both the CAL and CAL-DR estimators. In our analysis, the five continuous covariates are normalized to have mean zero and norm one. The CAL and CAL-DR estimates of β and their standard errors are given in Table 3. It can be seen that the CAL and CAL-DR methods yield comparable estimates of β, while the CAL-DR estimator generally has smaller standard errors than the CAL estimator. In addition, from the P-values reported in Table 3, age is the only covariate significant at the 0.05 level under both methods. We therefore refit the CAL and CAL-DR estimators with age as the only covariate. The two methods yield the same estimated optimal treatment regime, given by I(age > 37.5).
The results suggest that ZDV+zalcitabine (A = 0) is more favorable for younger patients with AIDS, while ZDV+ddI (A = 1) is more favorable for older patients. A similar finding was observed in Lu et al. (2013).
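The refitted rule is simple enough to state directly in code. A sketch of the estimated regime I(age > 37.5); the function name is ours:

```python
def actg175_regime(age):
    """Estimated optimal regime from the refitted CAL/CAL-DR analysis:
    assign ZDV+ddI (A = 1) if age > 37.5 years, and
    ZDV+zalcitabine (A = 0) otherwise."""
    return int(age > 37.5)
```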
Table 3.
Estimated Optimal Treatment Regimes for ACTG175 Data.
| | | β̂1 | β̂2 | β̂3 | β̂4 | β̂5 | β̂6 | β̂7 | β̂8 | β̂9 | β̂10 | β̂11 | β̂12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CAL | Est. | 0.929 | −0.182 | −0.160 | −0.252 | −0.076 | −0.057 | −0.059 | 0.001 | −0.033 | 0.037 | −0.019 | −0.021 |
| | SE | 0.470 | 0.353 | 0.957 | 0.471 | 0.303 | 0.083 | 0.044 | 0.048 | 0.029 | 0.052 | 0.025 | 0.042 |
| | PV | 0.048 | 0.608 | 0.868 | 0.594 | 0.802 | 0.494 | 0.182 | 0.982 | 0.253 | 0.473 | 0.454 | 0.614 |
| CAL-DR | Est. | 0.946 | −0.093 | −0.143 | −0.202 | −0.166 | −0.051 | −0.052 | 0.001 | −0.030 | 0.021 | −0.017 | −0.017 |
| | SE | 0.111 | 0.211 | 0.207 | 0.258 | 0.203 | 0.053 | 0.028 | 0.028 | 0.020 | 0.032 | 0.016 | 0.021 |
| | PV | 0.000 | 0.658 | 0.490 | 0.433 | 0.413 | 0.330 | 0.067 | 0.960 | 0.130 | 0.516 | 0.279 | 0.410 |
Furthermore, to compare the CAL and CAL-DR estimators with the IPSW estimator, we randomly divide the dataset into training and testing halves 200 times. For each random split, we compute the CAL, CAL-DR and IPSW estimators on the training set with age as the only covariate. Then, we compute the estimated value function V̂n defined in (1) for the estimated optimal treatment regimes obtained by the CAL, CAL-DR and IPSW estimators. The averages of V̂n are 395.1 (12.2) for the CAL and CAL-DR estimators and 386.5 (18.9) for the IPSW estimator, where the quantities in parentheses are the corresponding standard deviations. These results demonstrate that the optimal treatment regimes obtained by the CAL and CAL-DR methods give larger value functions on average, with smaller standard deviations, than the regime obtained by the IPSW method.
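The repeated-split evaluation can be sketched as follows. Since π(X) = 0.5 here, the normalized IPSW value reduces to the mean outcome among regime-consistent patients; the threshold-rule grid search, the function name `split_evaluate`, and the toy data below are our illustration, not the paper's exact fitting procedure.

```python
import numpy as np

def split_evaluate(y, a, age, n_splits=200, seed=1):
    """Repeated half/half splits: fit a threshold rule d(age) = I(age > c)
    on the training half (grid search over quantiles, maximizing the
    value estimate), then evaluate the value on the test half.
    Assumes a randomized study with pi(X) = 0.5."""
    rng = np.random.default_rng(seed)
    n = len(y)
    vals = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        tr, te = idx[: n // 2], idx[n // 2 :]

        def v(sub, c):
            # with pi = 0.5 the normalized IPSW value is the mean
            # outcome among regime-consistent subjects
            d = (age[sub] > c).astype(int)
            m = (a[sub] == d)
            return y[sub][m].mean() if m.any() else -np.inf

        grid = np.quantile(age[tr], np.linspace(0.05, 0.95, 19))
        c_hat = max(grid, key=lambda c: v(tr, c))
        vals.append(v(te, c_hat))
    return np.mean(vals), np.std(vals)
```

Usage on simulated data mimicking an age-threshold benefit:

```python
rng = np.random.default_rng(2)
n = 2000
age = rng.uniform(20, 60, size=n)
a = rng.binomial(1, 0.5, size=n)
y = a * (age - 37.5) / 10 + rng.normal(scale=0.3, size=n)
mean_v, sd_v = split_evaluate(y, a, age, n_splits=20, seed=3)
```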
5 Discussion
We propose new methods for estimating the optimal treatment regime based on concordance-assisted learning. The estimators of the parameters in the prescriptive index for treatment decisions converge at the standard n1/2 rate, and their asymptotic distributions and inference procedures are studied. It is worth noting that although the proposed concordance-assisted learning has an appealing interpretation and works well under monotonic index models, the proposed procedure will not always recommend that patients with positive D(X) = E(Y|A = 1, X) − E(Y|A = 0, X) receive treatment 1.
In this paper, we only consider a class of linear decision rules for simplicity. However, our method can be extended to incorporate a more general class of decision rules as d(X) = I{g(X) ≥ c}, where g(·) is a nonparametric function of X. The corresponding concordance function is defined as
and the associated optimal treatment regime is given by d*,opt(X) = I{g* (X) ≥ c*}, where g* = argmaxg C(g) and c* = argmaxc E(Y*[I{g*(X) ≥ c}]).
In addition, it is of interest to generalize the proposed methods to the multiple-treatment setup. For example, consider a study with three treatments, denoted as 0, 1 and 2. Assume monotonic index models and , where Q1(·) and Q2(·) are two strictly increasing functions. We can use the proposed methods to estimate β1 and β2 based on comparisons of two treatments at a time: 1 vs. 0 and 2 vs. 0, respectively. The resulting estimated optimal treatment rules are denoted as and . Here, if , a subject with covariates X = x is given treatment 1, and 0 otherwise. The rule is similarly defined. Therefore, when , treatment 0 is given; when and , treatment 1 is given; when and , treatment 2 is given; when , we apply concordance-assisted learning to the comparison between treatments 1 and 2 and obtain the resulting estimated optimal treatment rule , which assigns treatment 2 if and treatment 1 otherwise. However, beyond this ad-hoc pairwise comparison strategy, generalization to multiple-category treatments is not straightforward, and the properties of the strategy need further investigation. Moreover, the proposed methods can also be extended to incorporate multiple decision time points. The related discussions are given in the Supplementary Appendix.
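The pairwise combination logic described above can be written directly as code. A sketch with our own function and argument names, where each argument is the 0/1 output of the corresponding estimated pairwise rule:

```python
def three_arm_rule(d1, d2, d12):
    """Combine estimated pairwise rules into a three-treatment decision.

    d1:  output of the 1-vs-0 rule (1 means treatment 1 beats 0),
    d2:  output of the 2-vs-0 rule (1 means treatment 2 beats 0),
    d12: output of the 2-vs-1 rule, consulted only when both d1 and
         d2 recommend treating.  Returns the assigned arm in {0, 1, 2}."""
    if d1 == 0 and d2 == 0:
        return 0            # neither active arm beats control
    if d1 == 1 and d2 == 0:
        return 1
    if d1 == 0 and d2 == 1:
        return 2
    return 2 if d12 == 1 else 1  # both beat control: break the tie
```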
Supplementary Material
Acknowledgments
The authors are grateful to two referees, an Associate Editor and the Editor for their thoughtful and constructive comments, which have helped to greatly improve an earlier version of the manuscript. The work was partially supported by National Institutes of Health grant P01 CA142538.
Appendix
Here, we only give the proofs for Theorems 1 and 2. Those for Theorems 4 and 5 are given in the Supplementary Appendix.
Appendix A: Proof of Theorem 1
To establish the consistency of β̂, similar to Cavanagh and Sherman (1998), we need to show: (i) C(β) has a unique maximizer at β*; (ii) sup‖β‖=1 |Ĉn(β, θ̂) − C(β)| = op(1); (iii) C(β) is continuous. Here, (i) is assumed by condition (C3), which holds for the class of monotonic index models. Under condition (C1), we have E{Λ12(θ)|X1, X2} = D(X1) − D(X2) for any fixed θ. Define fij(β, θ) = Λij(θ)I(β′Xi > β′Xj) − C(β). Then, Ĉn(β, θ) − C(β) = Unfij(β, θ), where Un denotes the random measure putting mass 1/{n(n−1)} on each pair of data. Therefore, Unfij(β, θ) is a zero-mean U-process of order 2. In addition, by condition (iii) of (C4), it can be shown that the class {f12(β, θ) : ‖β‖ = 1, θ − θ* = Op(n−1/2)} is Euclidean with a square-integrable envelope. Thus, (ii) holds. Finally, condition (ii) of (C4) implies that P(β′X1 = β′X2) = 0 for β ∈ ℬ. Then, τ(βm, X1, X2) → τ(β, X1, X2) in probability, where {βm} is a sequence of elements of ℬ converging to β as m goes to infinity. Applying the dominated convergence theorem, we have C(βm) → C(β) as m → ∞. Thus, (iii) is proved. The consistency of β̂ then follows from Amemiya (1985, pp. 106–107).
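For concreteness, the objective Ĉn(β, θ) maximized here is a second-order U-statistic and can be computed directly. A minimal sketch, assuming a randomized study with known constant propensity score and using the IPSW outcome contrast λi = AiYi/π − (1 − Ai)Yi/(1 − π), with E(λi | Xi) = D(Xi), as a stand-in kernel Λij = λi − λj; the paper's exact Λij(θ) may differ:

```python
import numpy as np
from itertools import permutations

def concordance(beta, x, y, a, prop=0.5):
    """Second-order U-statistic estimate of the concordance objective
    C_n(beta) = (1/{n(n-1)}) sum_{i != j} Lam_ij I(beta'X_i > beta'X_j).

    Stand-in kernel: Lam_ij = lam_i - lam_j with the IPSW contrast
    lam_i = A_i Y_i / prop - (1 - A_i) Y_i / (1 - prop), so that
    E(lam_i | X_i) = D(X_i) under a known constant propensity score."""
    lam = a * y / prop - (1 - a) * y / (1 - prop)
    index = x @ beta            # prescriptive index beta'X_i
    n = len(y)
    total = sum((lam[i] - lam[j]) * float(index[i] > index[j])
                for i, j in permutations(range(n), 2))
    return total / (n * (n - 1))
```

Under a model with D(X) = X1, the objective should be larger at the true direction than at its negation.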
Next, we establish the limiting distribution of β̂. We have
where Ĉn(β, θ*) is a U-statistic of order two. Following arguments similar to those in Sherman (1993), by condition (C5), we have, uniformly over any op(1) neighborhood of β*,
(A.1) |
where in distribution and 2V = E{∇2ϱ(β*, X)}.
In addition, we have E{∂Λij(θ)/∂θ|Xi, Xj} = E{∂2Λij(θ)/∂θ∂θ′|Xi, Xj} = 0 for any θ. First applying a second-order Taylor expansion with respect to θ and then following arguments similar to those used to derive (A.1), we have, uniformly over any op(1) neighborhood of β*,
(A.2) |
Combining (A.1) and (A.2), we have
which implies that since V is negative definite. Therefore,
By Theorem 2 of Sherman (1993), we have
(A.3) |
in distribution, where Σ = V−1ΔV−1 with Δ = E{∇1ϱ(β*, X)∇1ϱ(β*, X)′}.
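Given plug-in estimates of V and Δ, the limiting covariance in (A.3) takes the usual sandwich form. A minimal sketch (function name ours):

```python
import numpy as np

def sandwich(V, Delta):
    """Plug-in sandwich covariance Sigma = V^{-1} Delta V^{-1} from
    Theorem 1, given an estimate V of the (negative definite) Hessian
    factor and Delta of the gradient outer-product term."""
    Vinv = np.linalg.inv(V)
    return Vinv @ Delta @ Vinv
```

For example, with V = −I the covariance reduces to Δ itself, which is a convenient sanity check for an implementation.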
Appendix B: Proof of Theorem 2
Define . Note that V(β, c) = E[Ỹ I{A = I(β′ X ≥ c)}]. By condition (C6), we have V(β*, c) − V(β*, c*) ≤ −M(c − c*)2. In addition, define
where η is a positive constant. Under condition (C1), it can be easily shown that ℱ1 is a Donsker class. Moreover, by Theorem 11.2 of Kosorok (2007), we have
for all n large enough and sufficiently small η, where F = |Ỹ| is a square-integrable envelope function, N[](εF, ℱ1, L2(P)) is the bracketing number of ℱ1 under the L2(P) norm, and K is a positive constant. Therefore,
where ϕ(η) = η1/2 + η. Note that ϕ(η)/ηα is decreasing in η for any α ∈ [1, 2). Set rn = n1/3; then . By Theorem 14.4 of Kosorok (2007), we obtain n1/3(ĉ − c*) = Op(1).
Next, we establish the limiting distribution of ĥn = n1/3(ĉ − c*). Note that ĥn is the argmax of the process n2/3{V̂n(β̂, n−1/3h + c*) − V̂n(β̂, c*)}, which is indexed by h. In addition,
We have
where υ = ∂2V (β*, c*)/∂c2. In addition, define
which can be shown to be a Donsker class. In addition, we have
as n → ∞, where q* is a positive constant. Therefore, n2/3{V̂n(β̂, n−1/3h + c*) − V(β̂, n−1/3h + c*)} − n2/3{V̂n(β̂, c*) − V(β̂, c*)} converges weakly to a two-sided Gaussian process , where 𝒵(h) is a standard two-sided Brownian motion process. By the argmax theorem in Kosorok (2007), ĥn converges in distribution to argmaxh G(h), where .
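The limiting distribution of ĥn can be explored by Monte Carlo. A sketch that simulates one draw of argmaxh G(h) on a grid, assuming the form G(h) = q*𝒵(h) + υh²/2 with υ < 0; this form is our reconstruction from the surrounding display, and the grid simulation of the two-sided Brownian motion is only an approximation:

```python
import numpy as np

def argmax_G(q, v, h_max=5.0, n_grid=2001, seed=0):
    """One Monte Carlo draw of argmax_h G(h), assuming
    G(h) = q * Z(h) + (v/2) h^2 with v < 0, where Z is a standard
    two-sided Brownian motion simulated on a grid over [-h_max, h_max]."""
    rng = np.random.default_rng(seed)
    h = np.linspace(-h_max, h_max, n_grid)
    dh = h[1] - h[0]
    # build Z on each side of the origin, with Z(0) = 0
    pos = np.cumsum(rng.normal(scale=np.sqrt(dh), size=n_grid // 2))
    neg = np.cumsum(rng.normal(scale=np.sqrt(dh), size=n_grid // 2))
    z = np.concatenate([neg[::-1], [0.0], pos])
    g = q * z + 0.5 * v * h ** 2
    return h[np.argmax(g)]
```

Repeated draws give a Chernoff-type distribution centered at zero, mirroring the symmetric n1/3-rate limit of ĉ − c*.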
References
- Abrevaya J. Pairwise-difference rank estimation of the transformation model. Journal of Business & Economic Statistics. 2003;21:437–447.
- Amemiya T. Advanced Econometrics. Harvard University Press; 1985.
- Blatt D, Murphy S, Zhu J. A-learning for approximate planning. Unpublished manuscript; 2004.
- Brown B, Wang Y-G. Standard errors and covariance matrices for smoothed rank estimators. Biometrika. 2005;92:149–158.
- Brown B, Wang Y-G. Induced smoothing for rank regression with censored survival times. Statistics in Medicine. 2007;26:828–836.
- Cavanagh C, Sherman RP. Rank estimators for monotonic index models. Journal of Econometrics. 1998;84:351–381.
- Chen S. Rank estimation of transformation models. Econometrica. 2002;70:1683–1697.
- Foster JC, Taylor JM, Ruberg SJ. Subgroup identification from randomized clinical trial data. Statistics in Medicine. 2011;30:2867–2880.
- Han AK. Non-parametric analysis of a generalized regression model: the maximum rank correlation estimator. Journal of Econometrics. 1987;35:303–316.
- Jin Z, Ying Z, Wei LJ. A simple resampling method by perturbing the minimand. Biometrika. 2001;88:381–390.
- Kosorok MR. Introduction to Empirical Processes and Semiparametric Inference. Springer; 2007.
- Lu W, Zhang H, Zeng D. Variable selection for optimal treatment decision. Statistical Methods in Medical Research. 2013;22:493–504.
- Matsouaka RA, Li J, Cai T. Evaluating marker-guided treatment selection strategies. Biometrics. 2014;70:489–499.
- Murphy SA. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65:331–355.
- Pang L, Lu W, Wang HJ. Variance estimation in censored quantile regression via induced smoothing. Computational Statistics & Data Analysis. 2012;56:785–796.
- Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Annals of Statistics. 2011;39(2):1180.
- Sherman RP. The limiting distribution of the maximum rank correlation estimator. Econometrica. 1993;61:123–137.
- Watkins CJ, Dayan P. Q-learning. Machine Learning. 1992;8:279–292.
- Watkins CJCH. Learning from Delayed Rewards. PhD thesis, University of Cambridge, England; 1989.
- Zhang B, Tsiatis AA, Davidian M, Zhang M, Laber E. Estimating optimal treatment regimes from a classification perspective. Stat. 2012a;1:103–114.
- Zhang B, Tsiatis AA, Laber EB, Davidian M. A robust method for estimating optimal treatment regimes. Biometrics. 2012b;68:1010–1018.
- Zhang B, Tsiatis AA, Laber EB, Davidian M. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika. 2013a;100:681–694.
- Zhang J, Jin Z, Shao Y, Ying Z. Statistical inference on transformation models: a self-induced smoothing approach. arXiv preprint arXiv:1302.6651; 2013b.
- Zhao L, Tian L, Cai T, Claggett B, Wei LJ. Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association. 2013;108:527–539.
- Zhao Y, Kosorok MR, Zeng D. Reinforcement learning design for cancer clinical trials. Statistics in Medicine. 2009;28:3294–3315.
- Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association. 2012;107:1106–1118.
- Zhao Y, Zeng D, Socinski MA, Kosorok MR. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics. 2011;67:1422–1433.