Scientific Reports. 2023 Feb 8;13:2250. doi: 10.1038/s41598-023-29106-w

Accountable survival contrast-learning for optimal dynamic treatment regimes

Taehwa Choi, Hyunjun Lee, Sangbum Choi
PMCID: PMC9908913  PMID: 36755137

Abstract

The dynamic treatment regime (DTR) is an emerging paradigm in recent medical studies that searches for a sequence of decision rules to assign optimal treatments to each patient by taking into account individual features such as genetic, environmental, and social factors. Although there is a large and growing literature on statistical methods to estimate optimal treatment regimes, most methodologies focus on complete data. In this article, we propose an accountable contrast-learning algorithm for optimal dynamic treatment regimes with survival endpoints. Our estimating procedure originates from a doubly robust weighted classification scheme, a model-based contrast-learning method that directly characterizes the interaction terms between predictors and treatments without modeling main effects. To account for censoring, we adopt the pseudo-value approach, which replaces survival quantities with pseudo-observations for the time-to-event outcome. Unlike many existing approaches, which mostly rely on complicated outcome regression modeling or inverse-probability weighting schemes, the pseudo-value approach greatly simplifies the estimation of optimal treatment regimes by allowing investigators to apply standard machine learning techniques to censored survival data without losing much efficiency. We further explore SCAD penalization to identify informative clinical variables, as well as modified algorithms that handle multiple treatment options by searching for upper and lower bounds of the objective function. We demonstrate the utility of our proposal via extensive simulations and an application to AIDS data.

Subject terms: Applied mathematics, Statistics

Introduction

The dynamic treatment regime (DTR) is an emerging paradigm for maximizing treatment efficacy by providing tailored medicine to each patient1,2. Many chronic diseases, such as cancer, human immunodeficiency virus (HIV) infection, and depression, are difficult to cure with a single treatment and require continuous disease management. Because a patient's clinical information can change over time, treatments should in practice be adjusted sequentially based not only on the patient's clinical history but also on prior treatments and intermediate responses. Because the treatment effect is heterogeneous across patients' baseline characteristics, a treatment regime can be defined as a decision rule that assigns a treatment to a patient by taking into account individual features such as genetic, environmental, and social factors. For a single treatment decision, the optimal treatment regime is usually defined as the one that maximizes the average clinical benefit in the target population. A DTR then consists of a sequence of optimal treatment regimes, one per stage of intervention, that dictate how to individualize treatments based on the evolving treatment and covariate history.

There is a large and growing literature on statistical methods for estimating optimal treatment regimes in multi-stage randomized clinical trials. Since the seminal work by Murphy1, numerous methods have been developed that exploit personal characteristics, such as genetic or clinical information, to find effective data-driven treatment rules. One statistical approach to finding optimal treatment regimes is model-based: treatment regimes are evaluated by positing statistical models for the outcome given predictors, treatment, and predictor-by-treatment interactions, where the interaction term mainly determines the optimal decision rules. Many early works use Q-learning or inverse-probability weighting schemes in single-stage3–5 and multi-stage6–8 treatment settings. However, as individual-level information such as molecular, environmental, and genomic data becomes more accessible, these approaches may suffer from the curse of dimensionality and from low accuracy due to potential model misspecification.

Alternatives to these model-based methods include the outcome-weighted learning (OWL) algorithm and its doubly robust (DR) versions9–14, which work directly on the predictor-by-treatment interaction term by recasting the search for the optimal treatment rule as the minimization of a weighted misclassification error. There, the original 0–1 loss may be replaced by a convex surrogate loss, such as the hinge loss, so that a weighted support vector machine (SVM) algorithm can be applied to the weighted classification problem. Instead, Zhang and Zhang13 directly minimized the non-smooth weighted misclassification error via a generic search algorithm. Tao and Wang12 studied the problem of searching for optimal treatment rules when there are multiple treatment options. More recent developments have explored various modern machine learning techniques, such as Markov decision processes or graphical modeling15–17, and instrumental variable approaches to deal with possible confounders in observational studies18,19. See also Tsiatis et al.20 for a comprehensive review of the problem setting for DTRs and related statistical methodologies.

In the survival analysis literature, many methods have also been developed to establish optimal treatment rules for survival outcomes, mostly based on outcome regression modeling8,21,22 or inverse-probability-of-censoring weighted (IPCW) schemes23–25. However, many existing DTR methods for censored data are notoriously complicated, as they often aim to directly maximize nonparametric Kaplan–Meier curves. For example, Jiang et al.24 and Zhou et al.25 aimed to optimize the IPCW-adjusted nonparametric t-year survival and the cumulative incidence function for a competing risk, respectively, under the counterfactual framework26. Their methods are in general computationally unstable, because these nonparametric survival curves are often non-smooth, so their estimation may require extra smoothing procedures. Moreover, their algorithms are computationally expensive, because they involve iterative numerical evaluations of the target survival function in each optimization step, and they may not accommodate high-dimensional survival data. Other IPCW-based methods23,27 used double inverse-weighting schemes to account for both censoring and treatment allocation from the classification perspective. Several authors assumed a semiparametric linear regression model and directly calculated the counterfactual survival time through IPCW adjustment for censored data8,21. Although IPCW-based estimation is a convenient and standard way of handling censored data, it is usually sensitive to the amount and distribution of censoring and is statistically and computationally inefficient, even with doubly robust adjustments.

In this article, we propose an accountable contrast-learning algorithm for optimal dynamic treatment regimes with survival endpoints. Our estimating procedure originates from a doubly robust weighted classification scheme, a model-based contrast-learning method that directly characterizes the interaction terms between predictors and treatments without modeling the main effects. To account for censoring, we adopt the pseudo-value approach28,29, which replaces survival quantities with their pseudo-observations for the time-to-event outcome. Unlike many existing approaches, which are mostly based on outcome regression modeling or IPCW schemes, pseudo-value methods enable investigators to conveniently apply standard machine learning techniques to censored data with minimal loss of statistical efficiency. We show that pseudo-values, designed to handle censoring, can be a natural unbiased substitute for survival quantities when they are derived from a consistent estimator. Pseudo-values are easy to compute and can also be applied to more complex settings, such as competing risks, restricted mean lifetime, and interval censoring. Once the pseudo survival responses are obtained, our estimating procedure uses a penalized survival contrast-learning (PSCL) algorithm to estimate patient-level tailored treatment rules.

The proposed pseudo-value approach to adaptive treatment allocation exhibits two levels of robustness. The first is achieved because the proposed method imposes model assumptions only on the predictor-by-treatment interaction term, not on the main-effect term. The second is attained because the contrasting treatment effects can be estimated in a doubly robust form by adopting a standard method for complete data. As a result, the proposed learning algorithm is more robust to model misspecification, and nonparametric learning methods such as SVMs, random forests, and boosting can be naturally applied to identify optimal treatment rules. Empirical results on synthetic and real-world datasets show that the proposed methods achieve superior results under various censoring settings compared with competing methods.

Pseudo observations for survival outcomes

We begin by briefly reviewing the pseudo-value approach for survival data28,29. Suppose there are $n$ random samples. Let $\theta = E[s(T)]$ be a parameter of interest, where $s(\cdot)$ is a measurable function of the survival time $T$. For example, one might take $s(T) = I(T \ge t)$ or $s(T) = \min(T, \tau)$, corresponding to the $t$-year survival and the restricted mean lifetime up to time $\tau > 0$, respectively. Pseudo-observations are jackknife-type resampling substitutes for unknown survival quantities. Specifically, the pseudo-observation for the $i$th subject is defined as $\hat\theta_i = n\hat\theta - (n-1)\hat\theta_{-i}$, where $\hat\theta$ is an unbiased estimator of $\theta$ and $\hat\theta_{-i}$ is the leave-one-out (jackknife) estimator based on the $n-1$ samples excluding the $i$th subject. The pseudo-observation $\hat\theta_i$ is itself an unbiased estimator, since $E(\hat\theta_i) = nE(\hat\theta) - (n-1)E(\hat\theta_{-i}) = n\theta - (n-1)\theta = \theta$. The same construction applies to survival quantities. For example, the $t$-year survival $S(t) = P(T \ge t)$ can be approximated by

$$\hat S_i(t) = n\hat S(t) - (n-1)\hat S_{-i}(t), \tag{1}$$

where $\hat S(t)$ and $\hat S_{-i}(t)$ are the nonparametric Kaplan–Meier estimators based on all $n$ samples and on the $n-1$ samples without the $i$th observation, respectively. Similar techniques can be used to approximate the restricted mean lifetime or the cumulative incidence rate for a competing risk. In this article, we focus on the competing risks setting, as it includes the standard survival problem as a special case. For the $i$th subject, let $T_i$ and $C_i$ be the failure and censoring times, respectively, and let $x_i$ be the baseline covariates. Also, let $D_i \in \{1, \ldots, M\}$ denote the cause of failure, where $M$ is the known number of distinct failure causes. In the presence of censoring, we observe $\{(\tilde T_i, \Delta_i, x_i), i = 1, \ldots, n\}$, where $\tilde T_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)D_i$. When the event of interest is the first cause of failure, the primary interest is often the $t$-year cumulative incidence function (CIF), defined as $F_1(t) = P(T_i \le t, D_i = 1)$, for which $F_1(t) = E[s_t(T)]$ with $s_t(T) = I(T \le t, D = 1)$. This can also be approximated by the pseudo-value approach through the equation

$$\hat F_{1i}(t) = \int_0^t \hat S_i(s)\, d\hat\Lambda_{1i}(s), \tag{2}$$

where $\hat\Lambda_{1i}(t)$ is the estimated cause-1-specific cumulative hazard function. Our objective is then to construct an efficient and interpretable DTR that minimizes the $t$-year CIF on average. The pseudo-observations can also be computed using functions in the R package pseudo.
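To make the jackknife construction in (1) concrete, the following Python sketch computes Kaplan–Meier pseudo-observations for the $t$-year survival using NumPy only. It is an illustrative sketch, not the implementation used in the paper (which points to the R package pseudo); the function names `kaplan_meier` and `km_pseudo_values` are hypothetical, and the leave-one-out loop costs $O(n^2)$, which is fine for illustration but slow for large $n$. The Kaplan–Meier product counts events at times up to and including $t$, so it estimates $P(T > t)$, which coincides with $P(T \ge t)$ for a continuous $T$.

```python
import numpy as np

def kaplan_meier(time, event, t):
    """Kaplan-Meier estimate of survival at t from right-censored data.

    time, event: NumPy arrays; event[i] == 1 means a failure was observed.
    """
    s = 1.0
    for u in np.unique(time[event == 1]):       # distinct event times, ascending
        if u > t:
            break
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        s *= 1.0 - deaths / at_risk
    return s

def km_pseudo_values(time, event, t):
    """Jackknife pseudo-observations n*S(t) - (n-1)*S_{-i}(t), as in Eq. (1)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    s_full = kaplan_meier(time, event, t)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # leave subject i out
        pseudo[i] = n * s_full - (n - 1) * kaplan_meier(time[keep], event[keep], t)
    return pseudo
```

Without censoring, these pseudo-observations reduce exactly to the indicators $I(T_i > t)$, which is a quick sanity check of the construction; with censoring they can fall outside $[0, 1]$.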

A drawback of this basic pseudo-value approach is that it requires the stringent assumption that $T_i$ and $C_i$ are independent. To relax it to conditional independence, i.e., $T_i \perp C_i \mid x_i$, several IPCW-adjusted nonparametric estimators of the survival function have been developed30,31, some of which are available in the R package eventglm32. For example, one may use either of the following estimators to compute the survival curve under covariate-dependent censoring:

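$$\hat S(t) = \frac{\sum_{i=1}^n I(T_i > t)\,\hat v_i}{\sum_{i=1}^n \hat v_i} \quad \text{or} \quad \hat S(t) = n^{-1}\sum_{i=1}^n \frac{I(T_i > t,\ C_i \ge T_i \wedge t)}{\hat G(\tilde T_i \wedge t \mid x_i)}, \tag{3}$$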

where $\hat v_i = I(C_i \ge T_i \wedge t)/\hat G(\tilde T_i \wedge t \mid x_i)$. Here, $\hat G(\tilde T_i \wedge t \mid x_i)$ is a consistent estimator of $G(\tilde T_i \wedge t \mid x_i) = P(C_i > \tilde T_i \wedge t \mid x_i)$, which may be obtained from Cox's proportional hazards model. In our experience, the two estimators perform similarly, and under the strict independence assumption they do not considerably outperform the basic pseudo-value estimator.
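A minimal NumPy sketch of the two estimators in (3) follows. It assumes the values $\hat G(\tilde T_i \wedge t \mid x_i)$ have already been computed (for example, from a Cox model for the censoring time, as suggested above) and are passed in as an array; the function name `ipcw_survival` is hypothetical. It relies on the indicator $I(C_i \ge T_i \wedge t)$ being observable: ignoring ties, it equals 1 exactly when the failure was observed or follow-up reached $t$.

```python
import numpy as np

def ipcw_survival(time, event, g_hat, t):
    """Return the two IPCW estimates of S(t) in Eq. (3) as (ratio_form, mean_form).

    time  : observed times min(T_i, C_i)
    event : 1 if the failure was observed, 0 if censored
    g_hat : precomputed estimates of P(C_i > min(time_i, t) | x_i)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    g_hat = np.asarray(g_hat, dtype=float)
    # I(C_i >= T_i ^ t): status at t is known if the failure was seen or follow-up reached t
    known = (event == 1) | (time >= t)
    # Among subjects with known status, I(time_i > t) equals I(T_i > t)
    survived = time > t
    v = known / g_hat                                  # weights v_i
    ratio_form = np.sum(survived * v) / np.sum(v)      # self-normalized form
    mean_form = np.mean(survived * known / g_hat)      # unnormalized form
    return ratio_form, mean_form
```

With no censoring and $\hat G \equiv 1$, both forms reduce to the empirical proportion of subjects surviving past $t$; the self-normalized form additionally guarantees an estimate in $[0, 1]$.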

Methods

Notation and assumptions

Suppose now that patients are treated sequentially with multi-stage treatments. With a slight abuse of notation, we redefine the random variables below to describe the longitudinal trajectories of $K$-stage clinical interventions. Let individuals be indexed by $i = 1, \ldots, n$ and stages by $k = 1, \ldots, K$. Let $A_k = a_k \in \mathcal A_k = \{0, 1\}$ and $x_k$ be the treatment option and covariates, respectively, both observed at the beginning of stage $k$, and let $R_k$ be the reward, such as survival time, when the $k$th treatment $A_k$ is given. Usually, larger reward values are preferable, but smaller values are preferred when the CIF is the target objective. Let $\eta_k$ be a random indicator that takes the value 1 if a patient is alive at the beginning of the $k$th stage and 0 otherwise. By convention, we let $\eta_1 = 1$, since all recruited patients are alive at the first treatment stage. We then let $H_1 = \{\eta_1, x_1\}$ and $H_k = \{\eta_1, x_1, A_1, R_1, \ldots, \eta_{k-1}, x_{k-1}, A_{k-1}, R_{k-1}, \eta_k, x_k\}$ ($k \ge 2$) denote the clinical history of an individual up to stage $k$. Note that $\{x_k, A_k, R_k\}$ may be missing when $\eta_k = 0$. Given the full set of rewards, we can define the overall outcome of interest as $T = m(\eta_1 R_1, \ldots, \eta_K R_K)$, where $m(\cdot)$ is a prespecified function, for example, $T = \sum_{k=1}^K \eta_k R_k$. In the presence of censoring, however, the rewards, and consequently the total reward $T$, may not be fully observed. When the components of $T$ are censored, we can substitute the target measure $\theta = E[s(T)]$ with the corresponding pseudo-observation $\hat\theta_i$ for patient $i$. Since the pseudo-value $\hat\theta_i$ is also a random variable, we use $Y$ in the following to denote the pseudo-observation of $\theta$.

We now define the potential outcomes as $T(\bar a_K) = \sum_{k=1}^K \eta_k R_k(\bar a_k)$ and, correspondingly, $Y(\bar a_K)$, where $R_k(\bar a_k)$ denotes the potential reward at stage $k$ if, possibly contrary to fact, a patient were given the treatments $\bar a_k = (a_1, \ldots, a_k) \in \{0,1\}^k$. The optimal DTR then maximizes the expected potential reward, as if each patient were given the best treatment option at every stage. Let $g_k \equiv g_k(H_k) \in \{0,1\}$ $(k = 1, \ldots, K)$ be the treatment regime at the $k$th stage, a mapping from the clinical history $H_k$ to the treatment variable $A_k$. A DTR, observed at the end of all stages, is defined as $g = (g_1, \ldots, g_K) \in \mathcal G$, where $\mathcal G$ denotes the set of all possible treatment regimes. The optimal DTR, denoted by $g^{\mathrm{opt}} = (g_1^{\mathrm{opt}}, \ldots, g_K^{\mathrm{opt}})$, satisfies $E[Y(g^{\mathrm{opt}})] \ge E[Y(g)]$ for any $g \in \mathcal G$. We make the following standard causal inference assumptions to link potential outcomes to observed data10,33: (i) consistency, (ii) sequential randomization, and (iii) coarsening at random. Assumption (i) states that the potential outcome coincides with the observed one when a subject actually receives the corresponding treatment. Assumption (ii) states that, given the accrued history, the treatment at each stage does not depend on future potential outcomes, i.e., $\{\sum_{j=l}^K \eta_j R_j(\bar a_j) : l = k, \ldots, K\} \perp A_k \mid H_k$. Lastly, assumption (iii) states that at the beginning of each stage, the probability of being censored thereafter is independent of future outcomes given the accrued information; that is, the censoring indicator is conditionally independent of future rewards, $\{\sum_{j=l+1}^K \eta_j R_j(\bar a_j) : l = k, \ldots, K\} \perp \Delta \mid H_k$.

Individualized treatment regimes

To motivate our method, we first consider the simplest single-stage problem (i.e., $K = 1$). As before, the optimal treatment regime $g^{\mathrm{opt}} \in \mathcal G$ should satisfy $E\{Y(g^{\mathrm{opt}})\} \ge E\{Y(g)\}$ for all $g \in \mathcal G$. By the consistency assumption, the potential pseudo-outcome of an arbitrary regime $g$ can be linked to the observed data as $Y(g) = Y(1)I\{g(H) = 1\} + Y(0)I\{g(H) = 0\}$. Letting $\mu_a(H) = E(Y \mid A = a, H)$, we have $E\{Y(g)\} = E_H[\mu_1(H)I\{g(H) = 1\} + \mu_0(H)I\{g(H) = 0\}]$, where $E_H$ denotes expectation with respect to the clinical information $H$. From the classification perspective on decision-making problems34, the optimal treatment regime $g^{\mathrm{opt}}$ can be obtained as

$$g^{\mathrm{opt}}(H) = \arg\min_{g \in \mathcal G} E_H\big[\,|C(H)|\, I\{I[C(H) > 0] \ne g(H)\}\big], \tag{4}$$

where $C(H) = \mu_1(H) - \mu_0(H)$ is the treatment contrast. A convenient way to estimate $\mu_a(H)$ is the inverse-probability weighting (IPW) method, which leads to

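$$\hat C^{\mathrm{IPW}}(H) = \hat\mu_1^{\mathrm{IPW}}(H) - \hat\mu_0^{\mathrm{IPW}}(H) = \left\{\frac{A}{\hat\pi_1(H)} - \frac{1 - A}{1 - \hat\pi_1(H)}\right\} Y. \tag{5}$$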

Here, $\hat\pi_a(H)$, $a \in \{0,1\}$, denotes the propensity score, which can be estimated with a parametric or nonparametric model given the covariates $H$. The IPW contrast estimator in Eq. (5) is easily shown to be unbiased for $C(H)$, because $E[I(A = a)/\pi_a(H) \mid H] = 1$ and pseudo-observations are consistent. However, this approach is valid only when the propensity model $\pi_1(H)$ is correctly specified, which often fails in practice, and it is usually statistically inefficient35,36. A more robust and efficient alternative is the augmented inverse-probability weighting (AIPW) estimator, which combines outcome and propensity models to achieve double robustness. Specifically, the AIPW estimator of $\mu_a$ takes the form

$$\hat\mu_a^{\mathrm{DR}}(H) = \frac{I(A = a)}{\hat\pi_a(H)}\,Y + \left\{1 - \frac{I(A = a)}{\hat\pi_a(H)}\right\}\hat\mu_a(H), \tag{6}$$

which is a weighted combination of the pseudo-observation $Y$ and its substitute $\hat\mu_a$ from an outcome regression model. Even if the target survival measure is non-negative, its pseudo-observation can take negative values29. Thus, it is natural to approximate $\mu_a(H)$ with simple linear regression or modern machine learning techniques. In the statistical literature, (6) is well known as a doubly robust (DR) estimator10,20,37, because it remains consistent when either the outcome model $\mu_a(H)$ (Q-model) or the propensity score model $\pi_a(H)$ (A-model) is correctly specified37.
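The contrasts in (5) and (6) translate directly into array code. The sketch below (all function names hypothetical) assumes the propensity score `pi1` and the outcome-model predictions `mu1`, `mu0` have already been fitted with whatever parametric or machine learning method one prefers; it only assembles the IPW and AIPW contrasts, from which the SVM weights $w_i = |\hat C_i^{\mathrm{DR}}|$ and labels $Z_i = \mathrm{sign}(\hat C_i^{\mathrm{DR}})$ follow.

```python
import numpy as np

def ipw_contrast(y, a, pi1):
    """Eq. (5): {A / pi1 - (1 - A) / (1 - pi1)} * Y, with Y the pseudo-observations."""
    return (a / pi1 - (1 - a) / (1 - pi1)) * y

def aipw_mean(y, a, arm, pi_arm, mu_arm):
    """Eq. (6): I(A=a)/pi_a * Y + {1 - I(A=a)/pi_a} * mu_a."""
    ind = (a == arm).astype(float)
    return ind / pi_arm * y + (1.0 - ind / pi_arm) * mu_arm

def dr_contrast(y, a, pi1, mu1, mu0):
    """C^DR = mu_1^DR - mu_0^DR; pi1 may be a scalar or a per-subject array."""
    return aipw_mean(y, a, 1, pi1, mu1) - aipw_mean(y, a, 0, 1.0 - pi1, mu0)

def svm_weights_and_labels(c_hat):
    """w_i = |C_i| and Z_i = sign(C_i) for the weighted classification step."""
    return np.abs(c_hat), np.sign(c_hat)
```

A quick check of the augmentation: if the outcome model reproduces $Y$ exactly, the AIPW contrast equals $\hat\mu_1 - \hat\mu_0$ whatever propensity value is supplied, whereas the IPW contrast depends entirely on the propensity score.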

In this work, we use (6) to obtain the DR contrast estimator, i.e., $\hat C^{\mathrm{DR}}(H) = \hat\mu_1^{\mathrm{DR}}(H) - \hat\mu_0^{\mathrm{DR}}(H)$. Once this contrast is computed, the optimal treatment regime $g^{\mathrm{opt}}$ can be obtained from (4). However, the weighted classification error in (4) is not straightforward to optimize and may require complex, slow general-purpose algorithms9,13,34. Zhang and Zhang13 used a generic optimization algorithm via the genoud function from the R package rgenoud. However, this function is computationally expensive and slow when the covariate dimension is moderate to high. Instead, we propose to solve the classification problem (4) with the weighted linear SVM algorithm38, which can estimate the true treatment regime with high probability owing to its Fisher consistency39. Motivated by Song et al.7, we adopt a penalized SVM that uses the contrast function $\hat C^{\mathrm{DR}}(H)$ as a weighting factor to carry out the optimization in (4). Letting $w_i = |\hat C_i^{\mathrm{DR}}(H_i)|$ and $Z_i = \mathrm{sign}\{\hat C_i^{\mathrm{DR}}(H_i)\}$, the optimization problem in (4) may be approximated by minimizing the penalized hinge loss

$$n^{-1}\sum_{i=1}^n w_i\,[1 - Z_i f(H_i)]_+ + \sum_{j=1}^p P_\lambda(|\beta_j|), \tag{7}$$

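where $u_+ = \max(0, u)$ and $f(\cdot)$ is a prespecified decision function for treatment selection, so that $g^{\mathrm{opt}}(H) = I\{f(H) > 0\}$. For interpretability, we may take a simple linear decision function, i.e., $f(H_i) = H_i^{\mathrm T}\beta$, $\beta \in \mathbb R^p$. We use the SCAD penalty function, defined through its derivative

$$P'_\lambda(|\beta_j|) = \lambda I(|\beta_j| \le \lambda) + \frac{(\gamma\lambda - |\beta_j|)_+}{\gamma - 1}\, I(|\beta_j| > \lambda),$$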

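where $\lambda > 0$ is a tuning parameter and $\gamma = 3.7$, as recommended by Fan and Li40. Following the local linear approximation method, we further linearize the SCAD penalty term around an initial value $\beta^{(0)}$ as

$$P_\lambda(|\beta|) \approx P_\lambda(|\beta^{(0)}|) + P'_\lambda(|\beta^{(0)}|)\,(|\beta| - |\beta^{(0)}|), \quad \beta \approx \beta^{(0)},$$

and introduce slack variables $\xi_i = [1 - Z_i f(H_i)]_+$. Dropping terms that do not depend on $\beta$, the weighted classification problem in (7) can then be recast as the linear program

$$\begin{aligned} \min_{\xi_i,\,\beta_0^+,\,\beta_0^-,\,\beta_j^+,\,\beta_j^-}\ & n^{-1}\sum_{i=1}^n w_i\xi_i + \sum_{j=1}^p P'_\lambda(|\beta_j^{(0)}|)\,(\beta_j^+ + \beta_j^-) \\ \text{subject to}\ & Z_i\Big\{(\beta_0^+ - \beta_0^-) + \sum_{j=1}^p h_{ij}(\beta_j^+ - \beta_j^-)\Big\} \ge 1 - \xi_i, \\ & \xi_i,\ \beta_0^+,\ \beta_0^-,\ \beta_j^+,\ \beta_j^- \ge 0, \quad i = 1, \ldots, n;\ j = 1, \ldots, p, \end{aligned} \tag{8}$$

where $u^+ \ge 0$ and $u^- \ge 0$ are the positive and negative parts of $u$, respectively, such that $u = u^+ - u^-$ and $|u| = u^+ + u^-$, $\beta_0 = \beta_0^+ - \beta_0^-$ is an unpenalized intercept, and $h_{ij}$ is the $(i,j)$th component of $H$. An initial value $\beta_j^{(0)}$ may be obtained from the standard $\ell_2$-type SVM. Many optimization packages can solve problem (8); for example, one may use the lp() function in the R package lpSolve. After $\hat\beta$ is obtained, the estimated optimal treatment rule is $\hat g^{\mathrm{opt}}(H) = I(H^{\mathrm T}\hat\beta > 0)$. Note that lower $t$-year cumulative incidence rates are preferred for competing risks data. In this case, we can simply replace $\hat C_i^{\mathrm{DR}}(H_i)$ in (7) with $-\hat C_i^{\mathrm{DR}}(H_i)$ to minimize $F_1(t)$. This argument is justified by the following proposition.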
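As a sketch of how (8) can be solved outside R, the code below sets up the same linear program for `scipy.optimize.linprog` (HiGHS solver) in place of lpSolve. The function names are hypothetical; the initial value `beta_init` is supplied by the caller (the text suggests an $\ell_2$-type SVM fit), and, as in (8), the intercept is left unpenalized. This is an illustrative sketch under these assumptions, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def scad_derivative(beta_abs, lam, gamma=3.7):
    """P'_lambda(|beta|) of the SCAD penalty (Fan and Li)."""
    beta_abs = np.asarray(beta_abs, dtype=float)
    return (lam * (beta_abs <= lam)
            + np.maximum(gamma * lam - beta_abs, 0.0) / (gamma - 1.0) * (beta_abs > lam))

def weighted_l1_svm(H, Z, w, beta_init, lam, gamma=3.7):
    """Solve LP (8): weighted hinge loss plus LLA-linearized SCAD penalty.

    Decision variables, all >= 0, are ordered [xi (n), b0+, b0-, beta+ (p), beta- (p)].
    Returns (b0, beta) so that the rule is I(b0 + H @ beta > 0).
    """
    n, p = H.shape
    pen = scad_derivative(np.abs(beta_init), lam, gamma)      # LLA weights
    c = np.concatenate([w / n, [0.0, 0.0], pen, pen])
    # Z_i * (b0 + h_i' beta) >= 1 - xi_i  rewritten as  A_ub @ x <= b_ub
    A_ub = np.hstack([-np.eye(n), -Z[:, None], Z[:, None],
                      -Z[:, None] * H, Z[:, None] * H])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    x = res.x
    b0 = x[n] - x[n + 1]
    beta = x[n + 2:n + 2 + p] - x[n + 2 + p:]
    return b0, beta
```

Because the LLA weights $P'_\lambda(|\beta_j^{(0)}|)$ vanish for coefficients whose initial magnitude exceeds $\gamma\lambda$, large initial coefficients are left unpenalized while small ones are shrunk toward zero, which is what produces variable selection.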

Proposition 1

For a competing risks outcome, the optimal treatment rule is the minimizer of the following weighted misclassification error:

$$g^{\mathrm{opt}}(H) = \arg\min_{g \in \mathcal G} E_H\big[\,|C(H)|\, I\{I[C(H) \le 0] \ne g(H)\}\big].$$
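For completeness, here is a short derivation of Proposition 1 under the single-stage setup above; it is an illustrative sketch rather than the authors' proof. Write $E\{Y(g)\} = E_H[\mu_0(H) + C(H)g(H)]$ and use the identity $I\{I[C \le 0] \ne g\} = I[C \le 0](1 - g) + I[C > 0]g$ for $g \in \{0, 1\}$:

$$\begin{aligned} E_H\big[|C(H)|\,I\{I[C(H) \le 0] \ne g(H)\}\big] &= E_H\big[|C(H)|\,I[C(H) \le 0]\big] - E_H\big[|C(H)|\,I[C(H) \le 0]\,g(H)\big] + E_H\big[|C(H)|\,I[C(H) > 0]\,g(H)\big] \\ &= E_H\big[|C(H)|\,I[C(H) \le 0]\big] + E_H\big[C(H)\,g(H)\big], \\ E\{Y(g)\} &= E_H[\mu_0(H)] - E_H\big[|C(H)|\,I[C(H) \le 0]\big] + E_H\big[|C(H)|\,I\{I[C(H) \le 0] \ne g(H)\}\big]. \end{aligned}$$

The first two terms of the last line do not depend on $g$, so minimizing $E\{Y(g)\}$, such as the $t$-year CIF, is equivalent to minimizing the weighted misclassification error in Proposition 1.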

Dynamic treatment regimes

This section extends the previous argument to multi-stage treatment strategies to establish an optimal DTR. See Schulte et al.10 for a more detailed description of this problem and the related notation. To carry the treatment effect between adjacent stages, we recursively define the value function at stage $k$, following Zhang and Zhang13, as

$$V_k(H_k) = E_H\Big[V_{k+1}(H_{k+1}) + \eta_k\{\mu_1^k(H_k) - \mu_0^k(H_k)\}\{g_k^{\mathrm{opt}}(H_k) - A_k\} \,\Big|\, H_k\Big], \tag{9}$$

where $g_k^{\mathrm{opt}}$ is the optimal treatment rule at the $k$th stage and $\mu_{a_k}^k(H_k) = E_H[V_{k+1}(H_{k+1}) \mid A = a_k, H_k]$ for $a_k \in \{0,1\}$. We set $V_{K+1} \equiv Y$, as there are no further stages after the last one. Note that $\mu_{a_k}^k(H_k)$ can be interpreted as a Q-function in reinforcement learning, since it represents the “quality” of action $a_k$. Except for the last stage, $V_k(H_k)$ must be estimated backward through the stages; we denote the estimated value function by $\tilde V_k \equiv \tilde V_k(H_k)$. Starting from the last stage, the value function at the $k$th stage is recursively estimated by $\tilde V_k = \tilde V_{k+1} + \eta_k\{\hat\mu_1^k(H_k) - \hat\mu_0^k(H_k)\}\{\hat g_k^{\mathrm{opt}}(H_k) - A_k\}$, where $\tilde V_{K+1} = Y$. Note that $\tilde V_k$ equals $\tilde V_{k+1}$ if the optimal treatment is given at the $k$th stage, i.e., $\hat g_k^{\mathrm{opt}}(H_k) = A_k$; otherwise, $|\hat\mu_1^k(H_k) - \hat\mu_0^k(H_k)|$ is added to $\tilde V_{k+1}$. In the statistical literature, the appended term, which is equivalent to $|\hat\mu_1^k(H_k) - \hat\mu_0^k(H_k)|\,I\{\hat g_k^{\mathrm{opt}}(H_k) \ne A_k\}$, is called a “regret” function, because this quantity becomes positive when the optimal treatment is not given to the patient. The DTR algorithm aims to minimize this value at all stages to make the treatment regime optimal. For a competing risks response, if the patient does not receive the optimal treatment, we should subtract the regret score from the $(k+1)$th value function to obtain the $k$th value function, i.e., $\tilde V_k = \tilde V_{k+1} - \eta_k\{\hat\mu_1^k(H_k) - \hat\mu_0^k(H_k)\}\{\hat g_k^{\mathrm{opt}}(H_k) - A_k\}$, so that the cause-specific risk is minimized in the end.

At each stage, we use parametric or nonparametric methods to obtain $\hat\mu_{a_k}^k(H_k)$, $a_k \in \{0,1\}$. The optimal treatment rule $g_k^{\mathrm{opt}} \equiv g_k^{\mathrm{opt}}(H_k) = I(f(H_k) > 0)$ at the $k$th stage can then be determined by minimizing the expected weighted misclassification error, $E_H[\eta_k|C_k(H_k)|\,I\{I[C_k(H_k) > 0] \ne g_k(H_k)\}]$. This can again be done by solving an $\ell_1$-type weighted linear SVM problem as in (8). Based on the value function $\tilde V_{k+1}$ from the $(k+1)$th stage, we construct a DR estimator of the stage-$k$ contrast $C_k(H_k) = \mu_1^k(H_k) - \mu_0^k(H_k)$ as

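$$\hat C_k^{\mathrm{DR}}(H_k) = \left\{\frac{A_k\tilde V_{k+1}}{\hat\pi_1^k(H_k)} - \frac{A_k - \hat\pi_1^k(H_k)}{\hat\pi_1^k(H_k)}\,\hat\mu_1^k(H_k)\right\} - \left\{\frac{(1 - A_k)\tilde V_{k+1}}{\hat\pi_0^k(H_k)} + \frac{A_k - \hat\pi_1^k(H_k)}{\hat\pi_0^k(H_k)}\,\hat\mu_0^k(H_k)\right\}, \tag{10}$$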

where $\hat\pi_{a_k}^k(H_k)$ is the estimated propensity score $\pi_{a_k}^k(H_k)$. Notice that the estimated regret score in this case equals $|\hat C_k^{\mathrm{DR}}(H_k)|\,I\{\hat g_k^{\mathrm{opt}}(H_k) \ne A_k\}$. Hence, the $k$th stage value function is $\tilde V_k = \tilde V_{k+1} + \eta_k|\hat C_k^{\mathrm{DR}}(H_k)|\,I\{\hat g_k^{\mathrm{opt}}(H_k) \ne A_k\}$. This computation proceeds backward iteratively from the last stage to the first, in the spirit of dynamic programming41, and produces the desired optimal DTR, $g^{\mathrm{opt}} = (g_1^{\mathrm{opt}}, \ldots, g_K^{\mathrm{opt}})$. We emphasize that $g^{\mathrm{opt}}$ may not be optimal unless the sequential randomization, consistency, and positivity assumptions hold. Also, $g^{\mathrm{opt}}$ may not be unique: if more than one option maximizes the potential reward outcome at some decision $k$, then any rule $g_k^{\mathrm{opt}}$ yielding one of these $a_k$ defines an optimal regime.

The proposed penalized DR-adjusted DTR estimation for survival outcomes can be summarized as follows:

Step 0.

Set $\tilde V_{K+1} \equiv Y$ and $k = K$.

Step 1.

At stage $k$, estimate $g_k^{\mathrm{opt}}$ from $(H_k, A_k, \tilde V_{k+1})$ by minimizing (7) with the treatment contrast (10).

Step 2.

At stage $k$, transfer the value function at stage $k+1$ to the value function at stage $k$ using (9).

Step 3.

Set $k \leftarrow k - 1$ and repeat Steps 1 and 2 down to $k = 1$.
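The backward recursion in Steps 0–3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the Q-models $\hat\mu_{a_k}^k$ are ordinary least-squares fits, the propensity score is the sample proportion of treated subjects (as in the simulations below), and, for brevity, the penalized $\ell_1$-SVM of (8) is replaced by a hypothetical stand-in, `fit_linear_rule`, a weighted least-squares fit of $Z_i = \mathrm{sign}(\hat C_i)$ on $H_i$ with weights $|\hat C_i|$. All function names are hypothetical, and the value update uses the regret form $\tilde V_k = \tilde V_{k+1} + \eta_k|\hat C_k^{\mathrm{DR}}|\,I\{\hat g_k^{\mathrm{opt}} \ne A_k\}$ for outcomes where larger is better.

```python
import numpy as np

def ols_predict(X_fit, y_fit, X_new):
    """Least-squares linear outcome model with intercept (a simple Q-model)."""
    D = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(D, y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def stage_dr_contrast(H, A, V_next):
    """DR contrast of Eq. (10) with a sample-proportion propensity score."""
    pi1 = np.full(len(A), A.mean())
    pi0 = 1.0 - pi1
    mu1 = ols_predict(H[A == 1], V_next[A == 1], H)
    mu0 = ols_predict(H[A == 0], V_next[A == 0], H)
    term1 = A * V_next / pi1 - (A - pi1) / pi1 * mu1
    term0 = (1 - A) * V_next / pi0 + (A - pi1) / pi0 * mu0
    return term1 - term0

def fit_linear_rule(H, C):
    """Hypothetical stand-in for LP (8): weighted least squares of sign(C) on H
    with weights |C|; returns the rule g(H) = I(f(H) > 0)."""
    sw = np.sqrt(np.abs(C))
    D = np.column_stack([np.ones(len(H)), H])
    coef, *_ = np.linalg.lstsq(D * sw[:, None], np.sign(C) * sw, rcond=None)
    return lambda H_new: (np.column_stack([np.ones(len(H_new)), H_new]) @ coef > 0).astype(int)

def backward_dtr(stages, Y):
    """Steps 0-3; stages is a list [(H_k, A_k, eta_k), ...] for k = 1, ..., K."""
    V = np.asarray(Y, dtype=float).copy()            # Step 0: V_{K+1} = Y
    rules = [None] * len(stages)
    for k in reversed(range(len(stages))):            # Step 3: k = K, ..., 1
        H, A, eta = stages[k]
        alive = eta == 1
        C = np.zeros(len(V))
        C[alive] = stage_dr_contrast(H[alive], A[alive], V[alive])
        rules[k] = fit_linear_rule(H[alive], C[alive])         # Step 1
        g = np.zeros(len(V), dtype=int)
        g[alive] = rules[k](H[alive])
        V = V + eta * np.abs(C) * (g != A)            # Step 2: add regret when g != A
    return rules, V
```

Here `stages` holds one `(H_k, A_k, eta_k)` triple per stage, with each $H_k$ stored as a two-dimensional array; replacing `fit_linear_rule` with a solver for (8) would recover the penalized procedure described above.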

Extension to DTR with multiple treatments

Thus far, we have assumed that the treatment option $A_k$ is binary, i.e., $A_k = a_k \in \{0,1\}$. However, many clinical studies test more than two treatments, in which case the aforementioned approach cannot be applied. With multiple treatment options, we use a mixed approach of Huang et al.8 and Tao and Wang12. If there are $L_k \ge 3$ treatment options at the $k$th stage, we can consider the order statistics of $\mu_{a_k}^k(H_k)$, $a_k = 1, \ldots, L_k$, i.e., $\mu_{(1)}(H_k) \le \cdots \le \mu_{(L_k)}(H_k)$. Now let $\nu_{a_k}$ be the order index of the mean outcome, such that $\mu_{(a_k)}(H_k) = \mu_{\nu_{a_k}}(H_k)$. Then the optimal treatment regime $g_k^{\mathrm{opt}}$ among the $L_k$ treatments may be estimated by directly maximizing

$$E_H\left[\eta_k \sum_{a_k=1}^{L_k} \mu_{(a_k)}(H_k)\, I\{\nu_{a_k}(H_k) = g_k(H_k)\}\right]. \tag{11}$$

This optimization, however, is feasible only when $L_k$ is small and fixed in advance; otherwise it becomes very difficult to implement8. Alternatively, Tao and Wang12 suggested finding a sub-optimal treatment regime by focusing on the following inequalities among the successive contrasts, for $a_k = 1, \ldots, L_k - 1$:

$$0 \le \mu_{(L_k)}(H_k) - \mu_{(L_k-1)}(H_k) \le \mu_{(L_k)}(H_k) - \mu_{(a_k)}(H_k) \le \mu_{(L_k)}(H_k) - \mu_{(1)}(H_k).$$

By focusing on the two specific contrasts $|\hat\mu_{(L_k)}(H_k) - \hat\mu_{(L_k-1)}(H_k)|$ and $|\hat\mu_{(L_k)}(H_k) - \hat\mu_{(1)}(H_k)|$, respectively, they identified sub-optimal treatment regimes as

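$$\hat g_k^{\mathrm{opt}} = \arg\min_{g_k \in \mathcal G} E_H\big[\eta_k\,|\hat\mu_{(L_k)}(H_k) - \hat\mu_{(L_k-1)}(H_k)|\, I\{\nu_{L_k}(H_k) \ne g_k(H_k)\}\big] \tag{12}$$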

and

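$$\hat g_k^{\mathrm{opt}} = \arg\min_{g_k \in \mathcal G} E_H\big[\eta_k\,|\hat\mu_{(L_k)}(H_k) - \hat\mu_{(1)}(H_k)|\, I\{\nu_{L_k}(H_k) \ne g_k(H_k)\}\big]. \tag{13}$$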

This argument suggests that a sub-optimal treatment rule may be obtained by controlling some of the treatment contrasts. Note that the decision rules in (12) and (13) minimize, respectively, the lower and upper bounds of the expected loss in outcome due to sub-optimal treatment in the entire population of interest. We explore both treatment selection methods in our numerical experiments with pseudo-observations for censored data. Our results show that the two methods perform similarly. This may be because the lower and upper bounds of the objective function converge to the same value unless the assumed models are severely misspecified.
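The two bounding contrasts in (12) and (13) are easy to compute once the stage-$k$ outcome models have been fitted. The sketch below (hypothetical function names) takes an $n \times L_k$ matrix of estimated mean outcomes with 0-based arm indices, assumes larger outcomes are better (for a CIF, one would negate the matrix first), and returns the best arm $\nu_{L_k}$ together with the lower- and upper-bound weights; `weighted_loss` evaluates the empirical version of the objectives in (12) and (13) for a candidate rule.

```python
import numpy as np

def bound_weights(mu):
    """For an (n x L) matrix mu[i, a] of estimated mean outcomes, return the
    best arm nu_L and the contrast weights used in Eqs. (12) and (13)."""
    mu_sorted = np.sort(mu, axis=1)                    # mu_(1) <= ... <= mu_(L) per row
    best = np.argmax(mu, axis=1)                       # nu_L(H): arm with the largest mean
    w_lower = mu_sorted[:, -1] - mu_sorted[:, -2]      # |mu_(L) - mu_(L-1)|, Eq. (12)
    w_upper = mu_sorted[:, -1] - mu_sorted[:, 0]       # |mu_(L) - mu_(1)|, Eq. (13)
    return best, w_lower, w_upper

def weighted_loss(rule, best, weights, eta):
    """Empirical E_H[eta * w * I{nu_L(H) != g(H)}] for a candidate rule g."""
    return np.mean(eta * weights * (best != rule))
```

Since `w_lower <= w_upper` row by row, the loss under (12) never exceeds the loss under (13) for the same rule, matching the lower- and upper-bound interpretation above.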

Experimental studies

This section provides empirical simulation results demonstrating the finite-sample performance of the proposed method in a two-stage DTR setting. We also performed additional simulations, shown in the web-based supplementary material, including results for single-stage estimation and for covariate-dependent censoring.

Scenario 1: Randomized experiments

We first evaluate the performance of the proposed method for the two-stage DTR problem when responses are subject to censoring and competing risks. Simulation results for the single-stage setting are deferred to Tables S1 and S2 in the Web appendix. We let $n = 500$ or $1000$ in all studies. Let $x_{k,ji}$ be the $j$th covariate value of the $i$th subject at the $k$th stage ($i = 1, \ldots, n$; $k = 1, 2$; $j = 1, \ldots, p_k$). At the first stage, we generate 10 covariates $x_{1,i} = (x_{1,1i}, \ldots, x_{1,10i})^{\mathrm T}$, each independently following a Uniform$[-2, 2]$ distribution. The second stage involves a single variable $x_{2,i}$ generated from Uniform$[\min(x_{1,1i}), \max(x_{1,1i})]$. The treatment indicators $A_{k,i}$, $k \in \{1, 2\}$, are generated from Bernoulli(0.5). For the survival outcome, we generate the first-stage survival time as $T_{1,i} = \exp\{1.5 + 0.5x_{1,1i} + A_{1,i}(x_{1,2i} - 0.5) + \epsilon_{1,i}\}$ and the accumulated survival time at the second stage as $T_{2,i} = \exp\{1.5 + 0.5x_{1,1i} + A_{1,i}(x_{1,2i} - 0.5) + A_{2,i}(x_{2,i} - 0.5) + \epsilon_{2,i}\}$, where $\epsilon_{1,i}$ and $\epsilon_{2,i}$ are independent random errors with $\exp(\epsilon_{k,i}) \sim \mathrm{Exp}(1)$. Censoring times are generated from $C_i \sim \mathrm{Exp}(c_0)$, where $c_0$ is a fixed constant yielding 15% or 30% censoring. A subject enters the second stage when $\eta_{2,i} = I(T_{1,i} < C_i) = 1$. For an individual who is not alive at the beginning of the second stage (i.e., $\eta_{2,i} = 0$), the survival time is $T_i = T_{1,i}\exp\{(g_{2,i}^{\mathrm{opt}} - A_{2,i})(x_{2,i} - 0.5)\}$; otherwise, it is $T_i = T_{2,i}$. That is, $T_i = \eta_{2,i}T_{2,i} + (1 - \eta_{2,i})T_{1,i}\exp\{(g_{2,i}^{\mathrm{opt}} - A_{2,i})(x_{2,i} - 0.5)\}$. In this setting, it can be shown that the optimal rules $g^{\mathrm{opt}} = (g_1^{\mathrm{opt}}, g_2^{\mathrm{opt}})$ are $g_1^{\mathrm{opt}} = I(x_{1,2i} \ge 0.484)$ and $g_2^{\mathrm{opt}} = I(x_{2,i} \ge 0.5)$. Under this setting, approximately 80% of individuals move from stage 1 to stage 2. The propensity score for each individual is estimated by the sample proportion of treated subjects, i.e., $\#(A_k = 1)/n$. Our objective is to find optimal DTRs that maximize the 3-year survival rate, for which the true maximal survival is known to be $S(3, g_0^{\mathrm{opt}}) = 0.65$.
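For readers who want to reproduce the flavor of Scenario 1, the following NumPy sketch generates the two-stage survival data described above. Two details are not pinned down by the text and are assumptions here: $C_i \sim \mathrm{Exp}(c_0)$ is read as an exponential with rate $c_0$, and the range of $x_{2,i}$ is read as the sample minimum and maximum of $x_{1,1}$. The function name `simulate_scenario1` is hypothetical, and the value of `c0` would have to be tuned by the user to reach the 15% or 30% censoring rates.

```python
import numpy as np

def simulate_scenario1(n, c0, rng):
    """Two-stage survival data following the Scenario 1 description.

    Assumptions (not stated in the text): C ~ Exp with rate c0, and
    x2 ~ Uniform[min_i x_{1,1i}, max_i x_{1,1i}] over the sample.
    """
    x1 = rng.uniform(-2, 2, size=(n, 10))                 # x_{1,1i}, ..., x_{1,10i}
    x2 = rng.uniform(x1[:, 0].min(), x1[:, 0].max(), size=n)
    a1 = rng.binomial(1, 0.5, size=n)
    a2 = rng.binomial(1, 0.5, size=n)
    eps1 = np.log(rng.exponential(1.0, size=n))           # exp(eps) ~ Exp(1)
    eps2 = np.log(rng.exponential(1.0, size=n))
    base = 1.5 + 0.5 * x1[:, 0] + a1 * (x1[:, 1] - 0.5)
    t1 = np.exp(base + eps1)                               # stage-1 survival time
    t2 = np.exp(base + a2 * (x2 - 0.5) + eps2)             # accumulated stage-2 time
    c = rng.exponential(1.0 / c0, size=n)                  # NumPy's scale = 1 / rate
    eta2 = (t1 < c).astype(int)                            # enters stage 2
    g2_opt = (x2 >= 0.5).astype(int)
    t = eta2 * t2 + (1 - eta2) * t1 * np.exp((g2_opt - a2) * (x2 - 0.5))
    time = np.minimum(t, c)
    event = (t <= c).astype(int)
    return dict(x1=x1, x2=x2, a1=a1, a2=a2, eta2=eta2, time=time, event=event)
```

The returned `time` and `event` arrays are the observed follow-up and failure indicator, from which pseudo-observations for the 3-year survival can then be computed.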

We further consider the competing risks setting, in which we model the stage-1 and stage-2 Q-functions for the cause-1 event as $\psi_{1i} = \exp\{1 - 3x_{1,1i} - A_{1,i}(3.6x_{1,2i} - 0.8)\}$ and $\psi_{1i} = \exp\{1 - 3x_{1,1i} - A_{1,i}(3.6x_{1,2i} - 0.8) - A_{2,i}(0.5 - 1.7x_{2,i})\}$, respectively. The Q-model for the cause-2 event is specified as $\psi_{2i} = \exp\{1 + 3x_{1,1i} + A_{1,i}(x_{1,2i} + 0.8) - A_{2,i}(x_{2,i} - 0.5)\}$. Following Fine and Gray42, we let $P_i(D_i = 1) = 1 - (1 - q)^{1/\psi_{1i}}$ and generate the cause-2 event times from $F_{2i}(t) = 1 - \exp\{-t\psi_{2i}\}$. For the cause-1 event, we let $\eta_{2,i} = 1$ if the cause-1 event time is less than 3. The cause-1 event times are generated from $F_{1i}(t) = 1 - \{1 - q(1 - e^{-t})\}^{\psi_{1i}}$ if $\eta_{2,i} = 1$, and otherwise from $F_{1i}(t) = 1 - \{1 - q(1 - e^{-t})\}^{\exp\{1 - 3x_{1,1i} - A_{1,i}(3.6x_{1,2i} - 0.8)\}}$. With $q = 0.5$, about 43% and 38% of individuals experience the cause-1 failure under 15% and 30% censoring, respectively. Also, approximately 45% move to stage 2 and experience the cause-1 event. The optimal treatment rules are $g_1^{\mathrm{opt}} = I(x_{1,2i} \le 0.250)$ and $g_2^{\mathrm{opt}} = I(x_{2,i} \ge 0.294)$, for which the true minimal 3-year cause-1 CIF is $F_1(3, g_0^{\mathrm{opt}}) = 0.23$.

Table 1 summarizes the performance of several DTR methods, including outcome weighted learning (OWL)39 and its DR version (DWL), penalized OWL (POWL), and the proposed penalized DR weighted learning (PDWL), for survival and competing risks endpoints. In all cases, survival responses are replaced with their pseudo-observations. Here, OWL and POWL denote the pseudo-outcome weighted learning method and its SCAD-penalized version, respectively. For OWL and POWL, we evaluate the contrast $C(H)$ by (5) and the value function by (9). For each estimated regime, we report the true 3-year survival and cause-1 CIF, $\{S(3, \hat g^{\mathrm{opt}}), F_1(3, \hat g^{\mathrm{opt}})\}$, and their empirical counterparts, $\{\hat S(3, \hat g^{\mathrm{opt}}), \hat F_1(3, \hat g^{\mathrm{opt}})\}$. The results show that the proposed PDWL outperforms the other algorithms, nearly achieving the maximal survival and minimal cumulative incidence rates in all cases. Our method also performs best in terms of the correct decision rate at the first stage (CDR1) and the average correct decision rate over both stages (ACDR), which are approximated with 50,000 test samples. Note that for survival endpoints, the naive regime $g = 0$, i.e., simply prescribing the control treatment at both stages, attains a true 3-year survival comparable to DWL and higher than OWL. This suggests that penalizing the predictor-by-treatment interaction term can greatly improve the performance of optimal treatment allocation rules.

Table 1.

Performance of several DTR algorithms.

n  Censoring  Method  |  Survival events: S(3, ĝ^opt)  Ŝ(3, ĝ^opt)  CDR1  ACDR  |  Cause-1 specific events: F1(3, ĝ^opt)  F̂1(3, ĝ^opt)  CDR1  ACDR
500 15% g=0 0.50 (0.00) 0.39 (0.02) 0.62 (0.00) 0.39 (0.00) 0.43 (0.00) 0.44 (0.02) 0.44 (0.00) 0.37 (0.00)
g=1 0.31 (0.00) 0.39 (0.02) 0.38 (0.00) 0.14 (0.00) 0.42 (0.00) 0.43 (0.02) 0.56 (0.00) 0.14 (0.00)
OWL 0.46 (0.05) 0.44 (0.05) 0.50 (0.10) 0.37 (0.10) 0.40 (0.04) 0.44 (0.04) 0.50 (0.09) 0.32 (0.10)
DWL 0.50 (0.07) 0.50 (0.05) 0.85 (0.09) 0.47 (0.15) 0.28 (0.03) 0.33 (0.03) 0.85 (0.08) 0.54 (0.11)
POWL 0.58 (0.02) 0.54 (0.04) 0.79 (0.08) 0.66 (0.08) 0.27 (0.02) 0.34 (0.03) 0.83 (0.06) 0.59 (0.09)
PDWL 0.61 (0.01) 0.56 (0.03) 0.90 (0.04) 0.74 (0.06) 0.26 (0.01) 0.32 (0.03) 0.89 (0.03) 0.64 (0.09)
30% g=0 0.51 (0.00) 0.40 (0.02) 0.62 (0.00) 0.40 (0.00) 0.43 (0.00) 0.43 (0.02) 0.44 (0.00) 0.37 (0.00)
g=1 0.32 (0.00) 0.40 (0.03) 0.38 (0.00) 0.14 (0.00) 0.42 (0.00) 0.43 (0.03) 0.56 (0.00) 0.14 (0.00)
OWL 0.47 (0.05) 0.45 (0.05) 0.50 (0.10) 0.38 (0.10) 0.41 (0.04) 0.44 (0.04) 0.50 (0.09) 0.32 (0.10)
DWL 0.50 (0.06) 0.50 (0.05) 0.84 (0.09) 0.44 (0.14) 0.28 (0.03) 0.33 (0.04) 0.85 (0.08) 0.53 (0.12)
POWL 0.58 (0.03) 0.53 (0.04) 0.77 (0.09) 0.64 (0.08) 0.27 (0.02) 0.34 (0.03) 0.83 (0.07) 0.59 (0.09)
PDWL 0.61 (0.01) 0.56 (0.03) 0.89 (0.04) 0.74 (0.06) 0.26 (0.01) 0.32 (0.03) 0.89 (0.03) 0.63 (0.09)
1000 15% g=0 0.50 (0.00) 0.39 (0.01) 0.62 (0.00) 0.39 (0.00) 0.43 (0.00) 0.44 (0.01) 0.44 (0.00) 0.37 (0.00)
g=1 0.31 (0.00) 0.39 (0.02) 0.38 (0.00) 0.14 (0.00) 0.42 (0.00) 0.43 (0.02) 0.56 (0.00) 0.14 (0.00)
OWL 0.48 (0.05) 0.46 (0.05) 0.51 (0.11) 0.41 (0.11) 0.39 (0.05) 0.44 (0.04) 0.51 (0.10) 0.36 (0.11)
DWL 0.51 (0.07) 0.52 (0.05) 0.90 (0.06) 0.51 (0.17) 0.27 (0.02) 0.32 (0.02) 0.88 (0.05) 0.59 (0.09)
POWL 0.61 (0.01) 0.56 (0.03) 0.87 (0.06) 0.78 (0.07) 0.25 (0.01) 0.33 (0.02) 0.88 (0.05) 0.69 (0.08)
PDWL 0.62 (0.01) 0.57 (0.02) 0.93 (0.02) 0.83 (0.04) 0.25 (0.01) 0.32 (0.02) 0.91 (0.02) 0.73 (0.08)
30% g=0 0.51 (0.00) 0.40 (0.02) 0.62 (0.00) 0.40 (0.00) 0.43 (0.00) 0.44 (0.02) 0.44 (0.00) 0.37 (0.00)
g=1 0.32 (0.00) 0.40 (0.02) 0.38 (0.00) 0.14 (0.00) 0.42 (0.00) 0.43 (0.02) 0.56 (0.00) 0.14 (0.00)
OWL 0.49 (0.05) 0.46 (0.05) 0.51 (0.11) 0.41 (0.12) 0.40 (0.05) 0.44 (0.04) 0.51 (0.10) 0.35 (0.12)
DWL 0.51 (0.06) 0.52 (0.04) 0.89 (0.07) 0.46 (0.15) 0.27 (0.02) 0.32 (0.02) 0.88 (0.05) 0.59 (0.11)
POWL 0.62 (0.02) 0.56 (0.03) 0.85 (0.07) 0.76 (0.07) 0.25 (0.01) 0.33 (0.02) 0.87 (0.05) 0.69 (0.08)
PDWL 0.63 (0.01) 0.58 (0.02) 0.93 (0.02) 0.82 (0.04) 0.25 (0.01) 0.32 (0.02) 0.91 (0.02) 0.72 (0.08)

The table reports the optimized 3-year survival and cumulative incidence rates, the correct decision rate at the first stage (CDR1), and the average correct decision rate over the two stages (ACDR). For each scenario, the best model is highlighted in bold.

Scenario 2: Observational studies

We next consider observational studies, in which treatment selection is not randomized and may depend on patients' histories. In a configuration similar to the first simulation, we consider two scenarios for the propensity score function:

(i) True logistic: $P(A_{1,i}=1\mid x_{1,i})=\mathrm{expit}(x_{1,2i}-0.6x_{1,3i})$ and $P(A_{2,i}=1\mid x_{2,i})=\mathrm{expit}(-0.5x_{2,i})$.

(ii) False logistic: $P(A_{1,i}=1\mid x_{1,i})=\mathrm{expit}(x_{1,2i}-0.6x_{1,3i}-0.4x_{1,3i}^2)$ and $P(A_{2,i}=1\mid x_{2,i})=\mathrm{expit}(-0.5x_{2,i}-0.2x_{2,i}^2)$.

The true logistic models contain no second-order terms, whereas the false logistic models include a quadratic term. We fit a standard logistic model with main-effect terms only. The true logistic model is therefore correctly specified, but the false logistic model is misspecified.
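The two propensity scenarios can be sketched in Python as follows, together with a main-effects logistic fit by Newton–Raphson. Several settings are illustrative assumptions rather than the authors' choices: three independent Uniform[−2,2] stage-1 covariates, the sample size, and all function names. Under the true logistic scenario, the fitted coefficients (intercept, $x_{1,1}$, $x_{1,2}$, $x_{1,3}$) should be close to (0, 0, 1, −0.6). Under the false logistic scenario, the omitted quadratic term makes the same main-effects model misspecified.

```python
import numpy as np


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def stage1_propensity(x1, quadratic=False):
    """P(A_1 = 1 | x_1); columns 1 and 2 of x1 play the roles of x_{1,2} and x_{1,3}."""
    lin = x1[:, 1] - 0.6 * x1[:, 2]
    if quadratic:  # 'false logistic' scenario with a quadratic term
        lin = lin - 0.4 * x1[:, 2] ** 2
    return expit(lin)


def stage2_propensity(x2, quadratic=False):
    """P(A_2 = 1 | x_2) for a scalar stage-2 covariate per subject."""
    lin = -0.5 * x2
    if quadratic:
        lin = lin - 0.2 * x2 ** 2
    return expit(lin)


def fit_logistic(X, a, n_iter=25):
    """Main-effects logistic regression (with intercept) fitted by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(a)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = expit(Z @ beta)
        grad = Z.T @ (a - p)                         # score vector
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x1 = rng.uniform(-2.0, 2.0, size=(20000, 3))  # assumed covariate design
    for quad in (False, True):
        a1 = rng.binomial(1, stage1_propensity(x1, quadratic=quad))
        print("false logistic" if quad else "true logistic",
              fit_logistic(x1, a1).round(2))
```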

Figures 1 and 2 summarize the simulation results for the survival and competing risks endpoints, respectively, when the censoring rate is about 30%. Again, the four methods OWL, DWL, POWL and PDWL are compared in terms of the targeted survival measure and ACDR. The proposed PDWL approach outperforms the other algorithms whether or not the fitted propensity model is correctly specified, and it also attains the targeted optimal rates. Overall, DWL shows very high variability in estimating optimal regimes. POWL, on the other hand, occasionally performs very poorly, even though its variability is well controlled. This suggests that DR estimators should be accompanied by a proper penalization method to achieve optimal performance, and that penalization alone can result in inconsistent and misleading treatment rules. In almost all scenarios, OWL finds sub-optimal rules and is thus not the method of choice. As the sample size increases, the performance of all algorithms improves.

Figure 1.

Survival probability and average correct decision rate (ACDR) of two stages under optimal dynamic treatment regimes with OWL, DWL, POWL and PDWL for different sample sizes. Optimal regimes should maximize the survival rate and ACDR.

Figure 2.

Cumulative incidence rate and average correct decision rate (ACDR) of two stages under optimal dynamic treatment regimes with OWL, DWL, POWL and PDWL for different sample sizes. Optimal regimes should minimize the cumulative incidence rate but maximize ACDR.

Scenario 3: Multiple treatments

Finally, we extend our method to the problem of recommending among multiple treatments. For simplicity, we assume three treatment options (i.e., $A_i\in\{1,2,3\}$) in a single-stage ($K=1$) setting. We let $x_{1i}$, $x_{2i}$ and $x_{3i}$ independently follow Uniform$[-2,2]$ and define $\varphi_{1i}=\exp(x_{2i}-0.6x_{3i})$, $\varphi_{2i}=\exp(x_{2i}+0.2x_{3i})$ and $\varphi_{3i}=1+\varphi_{1i}+\varphi_{2i}$. The treatment indicator $A_i$ is then generated from a multinomial distribution with probabilities $(\varphi_{1i}/\varphi_{3i}, \varphi_{2i}/\varphi_{3i}, 1/\varphi_{3i})$ for treatments 1, 2 and 3, respectively. The survival time is generated as $T_i=\exp\{1.5+0.5x_{1i}+I(A_i=1)(x_{1i}-x_{2i})+I(A_i=2)(x_{1i}+0.5x_{2i})+\epsilon_i\}$, where $\exp(\epsilon_i)\sim\mathrm{Exp}(1)$.

Figure 3 summarizes the results, where we use the one-versus-one SVM to optimize (11) under 30% censoring. Each color represents one of the three treatments, and the black dashed lines are the true decision boundaries. The two DR methods (DR1 and DR2) perform well, clearly separating the three treatment regions. In contrast, the IPW-based methods (IPW1 and IPW2) classify poorly, with treatment 1 dominated by treatments 2 and 3. Here, DR1 and IPW1 are obtained from (12), whereas DR2 and IPW2 are based on (13). The doubly-robust modifications clearly outperform the basic IPW estimators, which implies that model specification is also essential for the performance of classification algorithms.
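Below is a minimal Python sketch of the Scenario 3 data-generating process. It draws covariates, treatments and uncensored survival times as described above. The censoring mechanism used to reach 30% censoring is not specified here, so it is omitted.

The error term $\epsilon_i$ does not depend on the arm. The arm with the largest treatment term among $x_{1i}-x_{2i}$, $x_{1i}+0.5x_{2i}$ and 0 is therefore stochastically best, and the helper `true_optimal_arm` uses this fact to recover the true decision regions. Both function names are hypothetical.

```python
import numpy as np


def true_optimal_arm(x):
    """Arm (1, 2 or 3) with the largest treatment term in the log survival time."""
    x = np.asarray(x, float)
    effects = np.column_stack([
        x[:, 0] - x[:, 1],        # arm 1: x1 - x2
        x[:, 0] + 0.5 * x[:, 1],  # arm 2: x1 + 0.5 x2
        np.zeros(len(x)),         # arm 3: reference arm
    ])
    return 1 + effects.argmax(axis=1)


def simulate_three_arm(n, rng):
    """Draw covariates, treatments and uncensored survival times for Scenario 3."""
    x = rng.uniform(-2.0, 2.0, size=(n, 3))
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    phi1 = np.exp(x2 - 0.6 * x3)
    phi2 = np.exp(x2 + 0.2 * x3)
    phi3 = 1.0 + phi1 + phi2
    probs = np.column_stack([phi1 / phi3, phi2 / phi3, 1.0 / phi3])
    # inverse-CDF multinomial draw of one arm per subject
    cum = np.cumsum(probs, axis=1)
    cum[:, -1] = 1.0  # guard against floating-point round-off
    u = rng.uniform(size=n)
    a = 1 + (u[:, None] >= cum).sum(axis=1)
    # T = exp{linear predictor + eps} with exp(eps) ~ Exp(1)
    lin = 1.5 + 0.5 * x1 + (a == 1) * (x1 - x2) + (a == 2) * (x1 + 0.5 * x2)
    t = np.exp(lin) * rng.exponential(1.0, size=n)
    return x, a, t, true_optimal_arm(x)


if __name__ == "__main__":
    x, a, t, opt = simulate_three_arm(1000, np.random.default_rng(0))
    print(np.bincount(a, minlength=4)[1:], np.bincount(opt, minlength=4)[1:])
```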

Figure 3.

Treatment allocations of DR and IPW estimators when there are three treatment options, where the lower bound of contrast function (12) is applied to DR1 and IPW1 while the upper bound (13) is used in DR2 and IPW2, respectively.

An application to ACTG175 data

Data description

This section applies the proposed treatment selection method to the AIDS Clinical Trials Group (ACTG175) study43. In this study, each subject was randomized with equal probability to one of four treatment arms, coded as 0, 1, 2 and 3, respectively:

(i) zidovudine monotherapy (ZDV);
(ii) ZDV plus didanosine (ddI);
(iii) ZDV plus zalcitabine (zal);
(iv) ddI monotherapy.

Figure 4a shows the nonparametric survival curves for the four arms; the three arms other than ZDV monotherapy have similar survival rates. For this reason, previous work24 treated the treatment as binary by combining (ii)–(iv) into a single arm. In this analysis, we consider optimal treatment selection both between two arms ((ii) versus (iii)) and among three arms ((ii), (iii) and (iv)). The event of primary interest is the first occurrence of any of the following: a decline of more than 50% in the CD4 cell count, progression to acquired immunodeficiency syndrome (AIDS), or death.

Hammer et al.43 considered twelve baseline covariates and identified three of them as important risk factors: age in years at baseline (Age), CD4 T-cell count at baseline (CD40) and Karnofsky score (Karnof). In addition to these three variables, we also include gender (Sex), weight in kilograms (Weight) and number of days of previously received antiretroviral therapy (Preanti). The overall censoring rate was 79.7% when the maximum follow-up time was set to 1000 days.

Analysis results

To examine whether the censoring distribution depends on covariates, we first fitted a Cox proportional hazards model to the censoring times. Sex and Preanti were statistically significant at the 0.05 level. We therefore considered two kinds of pseudo-observations: modified pseudo-observations from Eq. (3), which assume conditionally independent censoring, and pseudo-observations based on the standard Kaplan–Meier estimator. We computed individual pseudo-responses for the survival probability at 1000 days after treatment. Since the study was a randomized trial, we estimated the propensity score by the observed proportion of subjects in each arm and applied a linear regression model to predict the mean response.

We then investigated seven methods for estimating the optimal treatment regime: (i) naive Kaplan–Meier, (ii) OWL, (iii) POWL, (iv) DWL, (v) PDWL, (vi) PDWL2, and (vii) MDWL. Here, PDWL2 denotes the PDWL algorithm with covariate-adjusted pseudo-observations, and MDWL denotes the modified DWL algorithm for three treatment options. The naive Kaplan–Meier curve under the original treatment allocation is included as a reference.
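The unadjusted pseudo-observations follow the usual jackknife definition, $\hat\theta_i = n\hat S(t) - (n-1)\hat S^{(-i)}(t)$. Here $\hat S$ is the Kaplan–Meier estimator and $\hat S^{(-i)}$ is the same estimator computed without subject $i$. The Python sketch below implements that definition directly, with these limitations:

- It is not the code of the R packages pseudo or eventglm.
- It does not cover the covariate-adjusted version of Eq. (3).
- Its naive leave-one-out loop is meant for illustration and is slow for large $n$.

The function names are hypothetical. Without censoring, each pseudo-observation reduces to the indicator $I(T_i > t)$, which gives a quick check.

```python
import numpy as np


def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t); event = 1 for failure, 0 for censoring."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    surv = 1.0
    # product over distinct failure times up to t (np.unique returns them sorted)
    for u in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_survival(time, event, t):
    """Jackknife pseudo-observations n*S(t) - (n-1)*S^{(-i)}(t) for each subject."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    n = len(time)
    full = km_survival(time, event, t)
    keep = np.ones(n, bool)
    out = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        out[i] = n * full - (n - 1) * km_survival(time[keep], event[keep], t)
        keep[i] = True
    return out


if __name__ == "__main__":
    # no censoring: pseudo-observations equal I(T_i > 3.5)
    print(pseudo_survival([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 3.5).round(6))
```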

Figure 4 shows (a) nonparametric Kaplan–Meier curves for the four treatments and (b) the expected survival curves under the optimal treatment regimes from the six weighted classification algorithms. The proposed methods, PDWL and PDWL2, achieved higher overall survival probabilities than the other algorithms, even though the optimization targeted a single t-year survival outcome. The performance of PDWL and PDWL2 was almost indistinguishable. This suggests that covariate adjustment in the pseudo-value calculation makes little difference in identifying optimal treatment regimes. Note also that OWL and DWL do not noticeably improve overall survival compared with the naive KM estimator, which suggests that penalization is critical for identifying an effective treatment decision rule. If patients followed the treatment rules estimated by PDWL and PDWL2, their survival rates at 1000 days after treatment would exceed 83%. By contrast, the survival rates under OWL and KM are below 80% at the same time point.

Finally, the MDWL approach for multiple treatments improves overall survival substantially, dominating the other methods after about 300 days. When implementing MDWL, the two criteria (12) and (13) usually produce similar performance; we used (12) to produce the result in Fig. 4b. This suggests that, although the estimated rules for multiple treatments are sub-optimal, they can outperform rules restricted to two treatments. Further empirical and theoretical study in this direction would be of interest.

Figure 4.

Nonparametric Kaplan–Meier curves of ACTG175 data under (a) given treatment arms and (b) optimal treatment rules.

Discussion

In this paper, we propose an accountable survival contrast-learning method to identify tailored optimal treatment regimes with time-to-event outcomes. Existing methods for censored data mostly rely on computationally intensive algorithms that become impractical as the number of covariates grows. This is partly because their procedures may involve weighted nonparametric estimation of the survival curve under the potential population at each iteration24,25. Instead, we employ a computationally affordable pseudo-value approach that replaces unknown survival or competing risks measures with their jackknife-type resampling estimates. We then develop effective regularized survival contrast-learning algorithms that produce interpretable optimal treatment rules.

Note also that many weighted classification algorithms are based on IPW estimating procedures with an ℓ2-penalization. However, these approaches are vulnerable to model misspecification and to the amount of censoring, and they often underperformed in our simulation studies. Our proposal combines well-known ℓ1-type regularization with doubly-robust weighting schemes, and we provide empirical evidence that it can substantially increase accountability and predictive power in tailoring clinical decision-making.

In real applications, however, linear treatment rules are sometimes insufficient to achieve the maximum expected treatment reward, and non-linear treatment rules may be required. In that case, one may generalize the proposed SVM by using a reproducing kernel Hilbert space (RKHS) or by stacking multiple layers in a deep neural network (DNN). These architectures are widely used in classification problems and could be explored under the DTR framework.
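The doubly robust weighting idea can be illustrated with the augmented inverse-probability-weighted (AIPW) contrast of Zhang et al.34 for a single stage and two arms. The sign of the estimated contrast gives the class label, and its magnitude gives the misclassification weight. The sketch below makes three assumptions:

- pseudo-observations `y` serve as responses;
- the propensity score is constant, equal to the observed proportion treated (as in the ACTG175 analysis);
- outcome models are arm-specific linear regressions.

It may differ in detail from the contrast in Eq. (5), and it stops before the SCAD-penalized SVM step. The function name `aipw_contrast` is hypothetical.

```python
import numpy as np


def aipw_contrast(y, a, x, pi=None):
    """AIPW treatment contrast, class labels and weights for weighted classification.

    y: responses (e.g., pseudo-observations), a: 0/1 treatment, x: (n, p) covariates.
    pi: propensity scores; defaults to the observed proportion treated.
    """
    y = np.asarray(y, float)
    a = np.asarray(a, int)
    x = np.asarray(x, float)
    if pi is None:
        pi = np.full(len(a), a.mean())  # randomized trial: constant propensity
    design = np.column_stack([np.ones(len(y)), x])
    # arm-specific linear outcome models mu_1(x) and mu_0(x)
    b1, *_ = np.linalg.lstsq(design[a == 1], y[a == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(design[a == 0], y[a == 0], rcond=None)
    mu1, mu0 = design @ b1, design @ b0
    # augmented IPW estimates of each subject's potential mean responses
    aug1 = a * y / pi - (a - pi) / pi * mu1
    aug0 = (1 - a) * y / (1 - pi) + (a - pi) / (1 - pi) * mu0
    contrast = aug1 - aug0
    label = (contrast > 0).astype(int)  # 1 if treatment 1 appears preferable
    weight = np.abs(contrast)           # cost of misclassifying the subject
    return contrast, label, weight
```

When the outcome models are exactly correct, as in a noise-free toy example, the contrast reduces to $\mu_1(x)-\mu_0(x)$ for every subject. A weighted classifier would then be trained on (`x`, `label`) with sample weights `weight`.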

Of note, conventional pseudo-observations require the strict condition that censoring is independent of covariates, which may fail to hold in practice. Our empirical experience, however, suggests that our approach still works well under covariate-dependent censoring. One may adopt an inverse probability of censoring weighted approach to accommodate covariate-dependent censoring, as in Eq. (3)30,31. However, we found that it contributes little to revealing optimal treatment rules. Further simulation results in Table S3 of the Web appendix also show that the covariate-adjusted and unadjusted pseudo-value methods perform similarly. Hager et al.44 also proposed an IPW-based classification algorithm for optimal dynamic treatment regimes with censored survival data, and an empirical comparison of their algorithm with the proposed pseudo-value approach would be of interest.

Finally, with multiple treatment arms, we used sub-optimal contrast-learning classification algorithms that may not produce the globally optimal treatment rule. Alternatively, the classification algorithm may be applied to each pair of treatment options. However, this approach is computationally demanding and possibly subject to a multiple-testing problem. One might address this problem with multiclass SVM algorithms45. This merits further investigation and will be pursued in a separate study.
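Hsu and Lin45 compare several multiclass schemes, among them the one-versus-one approach, which applies a binary rule to each pair of arms and combines the results by majority vote. The sketch below shows only this voting step and assumes each pairwise rule has already been estimated. The rule format (a callable returning True when the first arm of the pair is preferred) and the function name are hypothetical. The sketch does not address the computational cost or multiple-testing concerns raised above.

```python
import itertools

import numpy as np


def one_vs_one_vote(x, pairwise_rules, arms):
    """Combine pairwise binary rules into a multi-arm recommendation by majority vote.

    pairwise_rules[(j, k)] maps an (n, p) covariate array to a boolean array,
    True meaning arm j is preferred over arm k. Ties go to the arm listed first in arms.
    """
    x = np.asarray(x, float)
    votes = np.zeros((x.shape[0], len(arms)), int)
    index = {arm: pos for pos, arm in enumerate(arms)}
    for j, k in itertools.combinations(arms, 2):
        prefer_j = np.asarray(pairwise_rules[(j, k)](x), bool)
        votes[prefer_j, index[j]] += 1   # a vote for arm j
        votes[~prefer_j, index[k]] += 1  # otherwise a vote for arm k
    return np.asarray(arms)[votes.argmax(axis=1)]
```

For example, the true pairwise contrasts of Scenario 3 give the rules $x_{2}<0$ for arms (1, 2), $x_{1}-x_{2}>0$ for arms (1, 3) and $x_{1}+0.5x_{2}>0$ for arms (2, 3). Voting over these three rules recovers the arm with the largest treatment term.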

Supplementary Information

Acknowledgements

The results of the ACTG data analysis are solely the responsibility of the authors and do not necessarily represent the official views of the AIDS Clinical Trials Group. The research of T.C. was supported by the junior fellow research grant of Korea University. The research of S.C. was supported by grants from Korea University (No. K2201231) and the National Research Foundation (NRF) of Korea (Nos. 2022R1A2C1008514, 2022M3J6A1063595).

Author contributions

T.C. developed the method, conducted simulation and data analysis, and wrote the manuscript. H.L. validated the computation and data analysis, and reviewed and edited the manuscript. S.C. conceptualized the method, and wrote, reviewed and edited the manuscript. All authors have reviewed the final version of the manuscript.

Data Availability

Pseudo-observations of survival quantities can be computed with the R packages pseudo46 and eventglm32. The penalized SVM is optimized with the R package lpSolve47. One-versus-one pairwise SVM can be implemented with the R package e107148. The ACTG175 dataset used in this study is available in the R package speff2trial49. Sample R code implementing our method is available on the first author's GitHub (https://github.com/taehwa015/SurvDTR).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-29106-w.

References

1. Murphy SA. Optimal dynamic treatment regimes. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2003;65:331–355. doi: 10.1111/1467-9868.00389.
2. Moodie EE, Richardson TS, Stephens DA. Demystifying optimal dynamic treatment regimes. Biometrics. 2007;63:447–455. doi: 10.1111/j.1541-0420.2006.00686.x.
3. Zhao Y, Kosorok MR, Zeng D. Reinforcement learning design for cancer clinical trials. Stat. Med. 2009;28:3294–3315. doi: 10.1002/sim.3720.
4. Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Ann. Stat. 2011;39:1180–1210. doi: 10.1214/10-AOS864.
5. Tian L, Alizadeh AA, Gentles AJ, Tibshirani R. A simple method for estimating interactions between a treatment and a large number of covariates. J. Am. Stat. Assoc. 2014;109:1517–1532. doi: 10.1080/01621459.2014.951443.
6. Chakraborty B, Murphy S, Strecher V. Inference for non-regular parameters in optimal dynamic treatment regimes. Stat. Methods Med. Res. 2010;19:317–343. doi: 10.1177/0962280209105013.
7. Song R, et al. On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat. 2015;4:59–68. doi: 10.1002/sta4.78.
8. Huang X, Choi S, Wang L, Thall PF. Optimization of multi-stage dynamic treatment regimes utilizing accumulated data. Stat. Med. 2015;34:3424–3443. doi: 10.1002/sim.6558.
9. Zhang B, Tsiatis AA, Laber EB, Davidian M. A robust method for estimating optimal treatment regimes. Biometrics. 2012;68:1010–1018. doi: 10.1111/j.1541-0420.2012.01763.x.
10. Schulte PJ, Tsiatis AA, Laber EB, Davidian M. Q- and A-learning methods for estimating optimal dynamic treatment regimes. Stat. Sci. 2014;29:640–661. doi: 10.1214/13-STS450.
11. Zhao Y-Q, Zeng D, Laber EB, Kosorok MR. New statistical learning methods for estimating optimal dynamic treatment regimes. J. Am. Stat. Assoc. 2015;110:583–598. doi: 10.1080/01621459.2014.937488.
12. Tao Y, Wang L. Adaptive contrast weighted learning for multi-stage multi-treatment decision-making. Biometrics. 2017;73:145–155. doi: 10.1111/biom.12539.
13. Zhang B, Zhang M. C-learning: a new classification framework to estimate optimal dynamic treatment regimes. Biometrics. 2018;74:891–899. doi: 10.1111/biom.12836.
14. Qi Z, Liu Y, et al. D-learning to estimate optimal individual treatment rules. Electron. J. Stat. 2018;12:3601–3638. doi: 10.1214/18-EJS1480.
15. Lakkaraju, H. & Rudin, C. Learning cost-effective and interpretable treatment regimes. In International Conference on Artificial Intelligence and Statistics 166–175 (PMLR, 2017).
16. Sherman, E., Arbour, D. & Shpitser, I. General identification of dynamic treatment regimes under interference. In International Conference on Artificial Intelligence and Statistics 3917–3927 (PMLR, 2020).
17. Cai, H., Lu, W. & Song, R. On validation and planning of an optimal decision rule with application in healthcare studies. In International Conference on Machine Learning 1262–1270 (PMLR, 2020).
18. Cui Y, Tchetgen Tchetgen E. A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity. J. Am. Stat. Assoc. 2021;116:162–173. doi: 10.1080/01621459.2020.1783272.
19. Qiu H, et al. Optimal individualized decision rules using instrumental variable methods. J. Am. Stat. Assoc. 2021;116:174–191. doi: 10.1080/01621459.2020.1745814.
20. Tsiatis, A. A., Davidian, M., Holloway, S. T. & Laber, E. B. Dynamic Treatment Regimes: Statistical Methods for Precision Medicine (Chapman and Hall/CRC, 2019).
21. Simoneau G, et al. Estimating optimal dynamic treatment regimes with survival outcomes. J. Am. Stat. Assoc. 2020;115:1531–1539. doi: 10.1080/01621459.2019.1629939.
22. Zhao Y-Q, Zhu R, Chen G, Zheng Y. Constructing dynamic treatment regimes with shared parameters for censored data. Stat. Med. 2020;39:1250–1263. doi: 10.1002/sim.8473.
23. Zhao Y-Q, et al. Doubly robust learning for estimating individualized treatment with censored data. Biometrika. 2015;102:151–168. doi: 10.1093/biomet/asu050.
24. Jiang R, Lu W, Song R, Davidian M. On estimation of optimal treatment regimes for maximizing t-year survival probability. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2017;79:1165–1185. doi: 10.1111/rssb.12201.
25. Zhou J, Zhang J, Lu W, Li X. On restricted optimal treatment regime estimation for competing risks data. Biostatistics. 2021;22:217–232. doi: 10.1093/biostatistics/kxz026.
26. Robins, J. M. Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium in Biostatistics 189–326 (Springer, 2004).
27. Bai X, Tsiatis AA, Lu W, Song R. Optimal treatment regimes for survival endpoints using a locally-efficient doubly-robust estimator from a classification perspective. Lifetime Data Anal. 2017;23:585–604. doi: 10.1007/s10985-016-9376-x.
28. Andersen PK, Klein JP, Rosthøj S. Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika. 2003;90:15–27. doi: 10.1093/biomet/90.1.15.
29. Andersen PK, Pohar Perme M. Pseudo-observations in survival analysis. Stat. Methods Med. Res. 2010;19:71–99. doi: 10.1177/0962280209105020.
30. Binder N, Gerds TA, Andersen PK. Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Anal. 2014;20:303–315. doi: 10.1007/s10985-013-9247-7.
31. Overgaard M, Parner ET, Pedersen J. Pseudo-observations under covariate-dependent censoring. J. Stat. Plan. Inference. 2019;202:112–122. doi: 10.1016/j.jspi.2019.02.003.
32. Sachs MC, Gabriel EE. Event history regression with pseudo-observations: computational approaches and an implementation in R. J. Stat. Softw. 2022;102:1–34. doi: 10.18637/jss.v102.i09.
33. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect. Math. Model. 1986;7:1393–1512. doi: 10.1016/0270-0255(86)90088-6.
34. Zhang B, Tsiatis AA, Davidian M, Zhang M, Laber E. Estimating optimal treatment regimes from a classification perspective. Stat. 2012;1:103–114. doi: 10.1002/sta.411.
35. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat. Med. 2010;29:337–346. doi: 10.1002/sim.3782.
36. McCaffrey DF, et al. A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Stat. Med. 2013;32:3388–3414. doi: 10.1002/sim.5753.
37. Tsiatis A. Semiparametric Theory and Missing Data. Berlin: Springer; 2007.
38. Cortes C, Vapnik V. Support-vector networks. Mach. Learn. 1995;20:273–297. doi: 10.1007/BF00994018.
39. Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. J. Am. Stat. Assoc. 2012;107:1106–1118. doi: 10.1080/01621459.2012.695674.
40. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001;96:1348–1360. doi: 10.1198/016214501753382273.
41. Bather J. Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions. Hoboken: Wiley; 2000.
42. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J. Am. Stat. Assoc. 1999;94:496–509. doi: 10.1080/01621459.1999.10474144.
43. Hammer SM, et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N. Engl. J. Med. 1996;335:1081–1090. doi: 10.1056/NEJM199610103351501.
44. Hager R, Tsiatis AA, Davidian M. Optimal two-stage dynamic treatment regimes from a classification perspective with censored survival data. Biometrics. 2018;74:1180–1192. doi: 10.1111/biom.12894.
45. Hsu C-W, Lin C-J. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 2002;13:415–425. doi: 10.1109/72.991427.
46. Perme, M. P. & Gerster, M. Pseudo: Computes pseudo-observations for modeling. R package version 1.4.3 (2017).
47. Berkelaar, M., Eikland, K. & Notebaert, P. lpSolve: Interface to ‘Lp_solve’ v. 5.5 to solve linear/integer programs. R package version 5.6.15 (2015).
48. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A. & Leisch, F. e1071: Misc functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-4 (2020).
49. Juraska, M. et al. speff2trial: Semiparametric efficient estimation for a two-sample treatment effect. R package version 1.0.4 (2012).
