Abstract
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transitions times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/.
Keywords: Dependent Dirichlet process, Gaussian process, G-Computation, In-verse probability of treatment weighting, Markov chain Monte Carlo
1 Introduction
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2×2 factorial for frontline therapies only. However, motivated by the idea that subsequent salvage therapies affect survival time, Wahed and Thall (2013) modeled and analyzed treatments in the trial as a dynamic treatment regime (DTR), that is, an alternating sequence of treatments or other actions and transition times between disease states. We propose a Bayesian nonparametric (BNP) approach for evaluating such DTRs in which the outcome at each stage is a random transition time between two disease states. The final overall survival (OS) time outcome of primary interest is the sum, T, of a sequence of transition times. The actually observed sequence is determined by the way that a patient’s treatment regime plays out, and the mean of T may be expressed as an appropriately weighted average over all possible sequences of event times. Our proposed BNP methodology for estimating the mean of T is based on the idea of Robins’ G-computation (Robins, 1986, 1987).
An algorithm commonly used by oncologists in chemotherapy of solid tumors is to choose the patient’s initial (frontline) treatment based on his/her baseline covariates, continue as long as the patient’s disease is stable, switch to a different chemotherapy (salvage) if progressive disease (P) occurs, stop chemotherapy if the tumor is brought into complete or partial remission (C), and begin salvage if P occurs at some time after C. There are many elaborations of this in oncology, including multiple attempts at salvage therapy, use of consolidation therapy for patients in remission, suspension of therapy if severe toxicity is observed, or inclusion of radiation therapy or surgery in the regime. Another important application of this general adaptive structure occurs in treatment regimes for psychological disorders or drug addiction. For example, in treatment of schizophrenia one may replace P by a psychotic episode or other worsening of the subject’s psychological status, C by a specified improvement in mental status, and death by a psychological breakdown severe enough to require hospitalization.
Denote the action at stage ℓ of the DTR by Zℓ, which may be a treatment or a decision to delay or terminate therapy. Here, stage refers to the decision point in the DTR – that is, the choice of frontline and possible salvage therapies. At each stage one observes a disease state sℓ, such as P, C or death (D). Let T(j,r) denote the transition time from disease state j to state r, with j = 0 the patient’s initial disease status. See Figure 1 for an example (details of which will be provided later) with up to nstage = 3 stages, nstate = 4 disease states, and a total of nT = 7 different transition times. Because the actions are adaptive, the actual number of stages and observed transition times vary between patients depending on how the specific treatment-outcome sequence plays out.
Formally, a DTR is the sequence Z = (Z1, Z2, ⋯), where each Zℓ is an adaptive action based on the patient’s history ℋℓ−1 of previous treatments and transition times, and ℋ0 is the patient’s baseline covariate vector. One possible treatment-outcome sequence is (ℋ0, Z1, T (0,C), Z2, T (C,D)), in which the initial chemotherapy Z1 was chosen based on ℋ0, complete remission (C) was achieved, Z2 was chosen based on ℋ1 = (ℋ0, Z1, T(0,C)). In this case, Z2 would be consolidation therapy given to keep the patients in remission, that is, prevent relapse, although consolidation treatments were not included in the dataset that we will analyze. OS time is T = T(0,C) + T(C,D). In this case, s1 = C and s2 = D. Similarly, a patient brought into remission who later suffers progressive disease has sequence (ℋ0, Z1, T(0,C), T(C,P), Z2, T(P,D)) and T = T(0,C) +T(C,P) +T(P,D). We will apply BNP methods to estimate the conditional distributions of the transition times given the most recent histories, with the goal to estimate the mean of T for each possible DTR. This also will include estimates given specific baseline covariates, for so called “individualized” therapy. Key elements of our proposed approach are quantification of all sources of uncertainty and prediction of T under a reasonable set of viable counterfactual DTRs (Wang et al., 2012). BNP methods have been used in estimating regime effects by Hill (2011) and Karabatsos and Walker (2012). Hill (2011) focused on modeling outcomes flexibly using Bayesian additive regression trees (BART), which required less assumptions in model fitting. However, the uncertainty of BART increases dramatically when there is complete treatment-subgroup confounding, and hence limited empirical counterfactuals, which often occurs in causal inference. Karabatsos and Walker (2012) proposed a nonparametric mixture model with a stick-breaking prior for the probability of treatment assignment to provide a more accurately estimated propensity score in the inverse probability of treatment weighting (IPTW) method.
Since all elements of a DTR may affect T, the clinically relevant problem is optimizing the entire regime, rather than the treatment at one particular stage. Most clinical trials or data analyses attempt to reduce variability by focusing on one stage of the actual DTR, usually frontline or first salvage treatment, or by combining stages in some manner. This often misrepresents actual clinical practice, and consequently conclusions may be very misleading. For example, an aggressive frontline cancer chemotherapy may maximize the probability of C, but it may cause so much immunologic damage that any salvage treatment given after rapid relapse, i.e. short T(C,P), may be unlikely to achieve a second remission. In contrast, a milder induction treatment may be suboptimal to eradicate the tumor, but it may debulk the tumor sufficiently to facilitate surgical resection. Such synergies may have profound implications for clinical practice, especially because effects of multi-stage treatment regimes often are not obvious and may seem counter-intuitive. Physicians who have not been provided with an evaluation of the composite effects of entire regimes on the final outcome may unknowingly set patients on pathways that include only inferior regimes.
A major practical advantage of BNP models is that they often provide better fits to complicated data structures than can be obtained using parametric model-based methods. In the case study that we analyze here, leukemia patients were randomized among initial chemotherapy treatments but not among later salvage therapies, and the BNP model provides a good fit for each transition time distribution conditional on previous history. Failure to randomize patients in treatment stages after the first is typical in clinical trials, most of which ignore all but the first stage of therapy. In contrast, sequential multi-arm randomized treatment (SMART) designs, wherein patients are re-randomized at stages after the first, have been used in oncology trials (Thall et al., 2000, 2007a,b; Wang et al., 2012), and are being used increasingly in trials to study multi-stage adaptive regimes for behavioral or psychological disorders (Dawson and Lavori, 2004; Murphy et al., 2007a,b; Connolly and Bernstein, 2007).
While re-randomization is desirable, it is not commonly done and inference has to adjust for this lack of randomization. A wide array of methods have been proposed for evaluating DTRs from observational data and longitudinal studies, beginning with the seminal papers by Robins (1986, 1987, 1989, 1997) on G-estimation of structural nested models. Additional references include applications to longitudinal data in AIDS (Hernán et al., 2000), inverse probability of treatment weighted (IPTW) estimation of marginal structural models (Murphy et al., 2001; van der Laan and Petersen, 2007; Robins et al., 2008), augmented IPTW (AIPTW) (Tsiatis, 2007; Zhao et al., 2015), G-estimation for optimal DTRs (Murphy, 2003; Robins, 2004), and a review by Moodie et al. (2007). A variety of methods have been developed to evaluate DTRs from clinical trials (Lavori and Dawson, 2000; Thall et al., 2002; Murphy, 2005; Goldberg and Kosorok, 2012; Zajonc, 2012). For survival analysis, Lunceford et al. (2002) introduced ad hoc estimators for the survival distribution and mean restricted survival time under different treatment policies. These estimators, although consistent, were inefficient and did not exploit information from auxiliary covariates. Wahed and Tsiatis (2006) derived more efficient, easy-to-compute estimators that included auxiliary covariates for the survival distribution and related quantities of DTRs. Their estimators compared DTRs using data from a two-stage randomized trial, in which two options were available for both stages and the second-stage treatment assignments were determined by randomization. However, these estimators must be adapted for more general or more complicated designs that permit various numbers of treatment options at each stage and involve the scenarios where second-stage treatment is not randomized, but rather is determined by the attending physicians.
For settings where the DTR’s final overall time, such as survival time, is the sum of a sequence of transition times, our proposed BNP approach employs a nonparametric survival regression model for each transition time conditional on the most recent history of actions and outcomes. We assume a dependent Dirichlet process prior with Gaussian process base measure (DDP-GP), and summarize a joint posterior by Markov chain Monte Carlo (MCMC) simulation. To address the important issue that Bayesian analyses depend on prior assumptions, we provide guidelines for using empirical Bayes methods to establish prior hyperparameters. Posterior analyses include estimation of posterior mean overall outcome times and credible intervals for each DTR.
The rest of the paper is organized as follows. In Section 2 we review the motivating study, and give a brief review of DTRs in settings with successive transition times in Section 3. We present the DDP-GP model in Section 4. A simulation study of the BNP approach in single-stage and multi-stage regimes, with comparison to frequentist IPTW and AIPTW, is summarized in Section 5. We re-analyze the leukemia trial data in Section 6, and close with brief discussion in Section 7.
2 A Study of Multi-Stage Chemotherapy Regimes for Acute Leukemia
Our case study was a clinical trial conducted at The University of Texas M.D. Anderson Cancer Center to evaluate chemotherapies for acute myelogenous leukemia (AML) or myelo-dysplastic syndrome (MDS). Patients were randomized fairly among four frontline combination chemotherapies for remission induction: fludarabine + cytosine arabinoside (ara-C) plus idarubicin (FAI), FAI + all-trans-retinoic acid (ATRA), FAI + granulocyte colony stimulating factor (GCSF), and FAI + ATRA + GCSF. The goal of induction therapy for AML/MDS was to achieve complete remission (C), a necessary but not sufficient condition for long-term survival. Patients who did not achieve C, or who achieved C but later relapsed, were given salvage treatments as another attempt to achieve C. Following conventional clinical practice, patients were not randomized among salvage therapies, which instead were chosen by the attending physicians based on clinical judgment. Since there were many types of salvage, these are broadly classified into two categories as either containing high dose ara-C (HDAC) or other. This dataset was analyzed initially using conventional methods (Estey et al., 1999), including logistic regression, Kaplan-Meier estimates, and Cox model regression, including comparisons of the induction therapies in terms of OS, that ignored possible effects of salvage therapies.
Figure 1 illustrates the actual possible therapeutic pathways and outcomes of the patients during the trial, which is typical of chemotherapy for AML/MDS. Death might occur (1) during induction therapy, (2) following salvage therapy if the disease was resistant to induction, (3) during C, or (4) following disease progression after C. Wahed and Thall (2013) re-analyzed the data from this trial by accounting for the structure in Figure 1, and identified 16 DTRs including both frontline and salvage therapies. To correct for bias due to the lack of randomization in estimating the mean OS times, they used both IPTW (Robins and Rotnitzky, 1992) and G-computation based on a frequentist likelihood. In the G-computation, for each transition time they first fit accelerated failure time (AFT) regression models using Weibull, exponential, log-logistic or lognormal distributions, and chose the distribution having smallest Bayes information criterion (BIC). They then performed likelihood-based G-computation by first fitting each conditional transition time distribution regressed on patient baseline covariates and previous transition times, and then averaging over the empirical covariate distribution.
Like Wahed and Thall, the primary goal of our analyses of the AML/MDS dataset is to estimate mean OS and determine the optimal regime. We build on their approach by replacing the parametric AFT models for transition times with the DDP-GP model. We also demonstrate the usefulness of the BNP regression model for G-computation in simulation studies of single-stage and multi-stage regimes in which treatment assignments depend on patient covariates.
3 Dynamic Regimes with Stochastic Transition Times
The case study involves more complicated structure than a stylized linear sequential study, as often is assumed in papers on DTRs that focus on basic methodology. We introduce the following notation to accommodate this more complex structure. Denote the set of possible disease states by {0, 1, ⋯, nstate}, with 0 denoting the patient’s initial state before receiving the first treatment. The pairs of states (sℓ−1, sℓ) for which a transition sℓ−1 → sℓ is possible at stage ℓ of the patient’s therapy depend on the particular regime. Here s0 = 0 refers to the patient’s initial state, before start of therapy. We will identify specific states using letters such as P, C, etc., as in the earlier examples, to replace the generic integers. For example, in cancer therapy, sℓ−1 → C means that a patient’s disease has responded to treatment, P → D means a patient with progressive disease has died, and of course D → sℓ is impossible. We denote the transition time from state sℓ−1 to state sℓ in stage ℓ of treatment by T(sℓ−1,sℓ), for ℓ = 1, ⋯, nstage, the maximum number of stages in the DTR. In general it might be necessary to add a third index to indicate the stage ℓ when the same transitions are possible in multiple stages. However, in our case study no ambiguity arises by simply writing T(r,s). To simplify notation for the transition time distributions, we denote the history of all covariates, treatments, and previous transition times through ℓ stages, before observation of T(sℓ−1,sℓ) but including the stage ℓ action Zℓ by xℓ = (ℋℓ−1, Zℓ) = (x0, Z1, T(s0,s1), ⋯, T(sℓ−1,sℓ), Zℓ), with x0 = ℋ0. Thus, a DTR is Z = (Z1, Z2, . . .), a sequence of actions for all possible stages. For example, in the leukemia trial (Figure 1), Z1 might be FAI+ATRA given as frontline therapy, followed by salvage therapies Z2=salvage with high dose ara-C if the disease is resistant to induction, and Z3= other salvage if the patient first achieves a complete remission (C) but he later suffers progressive disease (P).
In the leukemia trial, the three possible outcomes following induction chemotherapy, C, R, and D, are competing risks. Thus, only one of the transition times, T(0,C), T(0,R), or T(0,D), is observed for each patient. The distribution of s1 is determined by these three transition times. For example, the probability of C is
This could be made explicit by including the states in the notation for xl. We chose not to do this for notational parsimony.
When no meaning is lost, we will further simplify notation and use a single running index on the transition times, and write T(sℓ−1,sℓ) as Tk, where k = 1, . . . , nT is a running index of all possible state transitions. For example, in Figure 1 we have up to nstage = 3 stages and nT = 7 possible transitions. Similarly, we will write xk for the corresponding covariate vector. Our use of a single index to identify stage is a slight abuse of notation since, for example, the actual second stage of therapy might differ depending on the sequence of outcomes. For example, stage 2 treatment Z2 of a patient with sequence (x0, Z1, T(0,R), Z2) is first salvage for resistant disease during induction with Z1, while stage 3 treatment Z3 of a patient with sequence (x0, Z1, T(0,C), T(C,P), Z3) is first salvage for progressive disease after achieving response initially with Z1. This latter example could be elaborated if, under a different regime, consolidation therapy, Z2, were given for patients who enter C, in which case the sequence would be (x0, Z1, T(0,C), Z2, T(C,P), Z3).
Below, we will develop a general BNP model for all possible conditional distributions p(Tk | xk). For any transition index k, let ℛk denote the risk set, fk the probability density function and F̄k the survival function of the transition time, is a censoring indicator with if patient i is not censored and if censored, and the observed time to the next state or censoring for patient i in risk set ℛk. For example, in the leukemia trial consider the transition (0, R), corresponding to the single running index k = 1. The risk set is ℛ1 = ℛ(0,R) = {1, . . . , n}. Let Ui denote the time from the start of induction to last followup for patient i. Then if and the observed time for patient i is since C, R, and D are competing risks. The likelihood for all possible sequences of treatments and transition times through nT transitions is the product
(1) |
The overall time for any counterfactual sequence of transition times is the sum . Our goal is to estimate the mean of T for each possible Z. Specific details of the likelihood are given in the Appendix.
4 A Nonparametric Bayesian Model for DTR
4.1 DDP and Gaussian Process Prior
Our motivation for using the BNP model described in this section is that it is highly robust and has full support. To specify the BNP model, we denote Yk = log(Tk) and write the distribution of [Yk | xk] as Fk(· | xk). For convenience, we will refer to xk as ‘covariates’. We construct a BNP survival regression model for Fk(· | xk) by successive elaborations, starting with a model for a discrete random distribution Gk(·). We then use a Gaussian kernel to extend this to a prior for a continuous random distribution Fk(·), and finally endow the kernel means with a regression structure by expressing them as functions of xk. The latter construction extends Fk to a family {Fk(· | xk)}, indexed by xk. The construction of Gk(·) and Fk(·) is outlined briefly below, by way of a brief review of BNP models. In the end we will only use the last model {Fk(· | xk)}, which we use as a sampling model for Yk. See, for example, Müller and Mitra (2013) and Müller and Rodriguez (2013) for more extensive reviews of BNP inference. In the following discussion we temporarily drop the superindex k.
The Dirichlet process (DP) prior was first proposed by Ferguson et al. (1973) as a probability distribution on a measurable space of probability measures. The DP is indexed by two hyperparameters, a base measure, G0, and a precision parameter, α > 0. If a random distribution G follows a DP prior, we denote this by G ~ DP (α, G0). Denoting a beta distribution by Be(a, b), if G ~ DP (α, G0) then G(A) ~ Be{αG0(A), α[1 − G0(A)]} for any measurable set A, and in particular E{G(A)} = G0(A). Let δ(θ) denote a point mass at θ. Sethuraman (1991) provided a useful representation of the DP as , where , and the weights wh are generated sequentially from rescaled beta distributions as , the so-called “stick-breaking” construction. The discrete nature of G is awkward in many applications. A DP mixture model extends the DP model by replacing each point mass δ(θh) with a continuous kernel centered at θh. Without loss of generality, we will use a normal kernel. Let N(·; μ, σ) denote a normal kernel with mean μ and standard deviation σ. The DP mixture model assumes
(2) |
The use and interpretation of (2) is very similar to that of a finite mixture of normal models. In practical applications, the sum in (2) is often truncated at a reasonable finite value. This model is useful for density estimation under i.i.d. sampling from an unknown distribution, and it provides good fits to a wide variety of datasets because a mixture of normals can closely approximate virtually any distribution (Ishwaran and James, 2001).
To include the regression on covariates that we will need for the survival model of each conditional transition time distribution, Fk(· | xk), we extend the DP mixture to a dependent DP (DDP), which was first proposed by MacEachern (1999). The basic idea of a DDP is to endow each with additional structure that specifies how it varies as a function of covariates xk. Writing this regression function as for the argument in each summand in (2), and returning to the conditional transition time distributions, we assume that
(3) |
This form of the DDP, which includes both the convolution with a normal kernel and functional dependence on covariates, provides a very flexible regression model.
To complete our specification of the DDP, we will assume that the ’s are independent realizations from a Gaussian process (GP) prior. The GP was first popularized by O’Hagan and Kingman (1978) in Bayesian inference for a random function (unrelated to the use in a DDP prior). For more recent discussions see, for example, Rasmussen and Williams (2006); Neal (1995); Shi et al. (2007). Temporarily suppressing the transition superindex k and running index h in (3), a GP is a stochastic process θ(·) in which (θ(x1), . . . , θ(xn)) has a multivariate normal distribution with mean vector (μ(x1), . . . , μ(xn)) and (n × n) covariance matrix with (i, j) element C(xi, xj) for any set of n ≥ 1 covariate vectors xi. We denote this by θ(x) ~ GP(μ, C).
We use the GP prior to define the dependence of as a function of xk by assuming , as a function of xk, for fixed h. That is, there is a separate GP for each term indexed by h in (3). We will refer to the DDP with a convolution using a normal kernel and a GP prior on the normal kernel means as a DDP-GP model. While the mean and covariance processes of the GP can be quite general, in practice, is often parameterized as a function , where ξk is a vector of hyperparameters, and the mean function is indexed similarly by hyperparameters and written as . In the DTR setting, since each covariate vector xk is a history, its entries can include baseline covariates, transition times, and indicators of previous treatments or actions. To obtain numerically reasonable parameterizations of the GP functions Ck and , we standardize numerical-valued covariates such as age. We now have
To specify the form of and Ck, let i = 1, 2, ⋯ , index patients, so that is the history of patient i at transition k, and define the indicator δij = I(i = j) = 1 if i = j and 0 otherwise. We model the mean function as a linear regression, by assuming that
(4) |
For patients i and j, we assume that the covariance process takes the form
(5) |
where Mk is the number of covariates at transition k and J is the variance on the diagonal reflecting the amount of jitter (Bernardo et al., 1999), which usually takes a small value (e.g, J = 0.1). There are no further hyperparameters ξk to index the covariance function. For binary covariates, the quadratic form in (5) reduces to counting the number of binary covariates in which two patients differ. If desired, additional hyperparameters could be introduced in (5) to obtain more flexible covariance functions. However, in practice this form of the covariance matrix yields a strong correlation for observations on patients with very similar xk, and has been adopted widely (Williams, 1998).
Combining all of these structures, we denote the model for the conditional distribution of the kth transition time as , recalling that the weights of the DDP are generated sequentially as . For later reference we state the full model. For k = 1, . . . , nT
(6) |
4.2 Determining Prior Hyperparameters
As priors for in (6) we assume for each transition k, h = 1, 2, . . . . For σk we assume . Finally, .
To apply the DDP-GP model, one must first determine numerical values for the fixed hyperparameters { , k = 1, 2, ...} and λ = (λ1, λ2, λ3, λ4). This is a critical step. These numerical hyperparameter values must facilitate posterior computation, and they should not introduce inappropriate information into the prior that would invalidate posterior inferences. With this in mind, the hyperparameters ( ) for the kth transition time covariate effect distribution may be obtained via empirical Bayes by doing preliminary fits of a lognormal distribution for each transition k. Similarly, we assume a diagonal matrix for with the diagonal values also obtained from the preliminary fit of the lognormal distribution. Once an empirical estimate of σk is obtained, one can tune (λ1, λ2) so that the prior mean of σk matches the empirical estimate and the variance equals 1 or a suitably large value to ensure a vague prior. Finally, information about αk typically is not available in practice. We use λ3 = λ4 = 1.
This approach works in practice because the parameter specifies the prior mean for the mean function of the GP prior, which in turn formalizes the regression of Tk on the covariates xk, including treatment selection. The imputed treatment effects hinge on the predictive distribution under that regression. Excessive prior shrinkage could smooth away the treatment effect that is the main focus. The use of an empirical Bayes type prior in the present setting is similar to empirical Bayes priors in hierarchical models. This type of empirical Bayes approach for hyperparameter selection is commonly used when a full prior elicitation is either not possible or is impractical. Inference is not sensitive to values of the hyperparameters λ that determine the priors of σk and αk for two reasons. First, the standard deviation σk is the scale of the kernel that is used to smooth the discrete random probability measure generated by the DDP prior. It is important for reporting a smooth fit, that is for display, but it is not critical for the imputed fits in our regression setting. Assuming some regularity of the posterior mean function, smoothing adds only minor corrections. Second, the total mass parameter αk determines the number of unique clusters formed in the underlying Polya urn. However, because most clusters are small, changing the prior of αk does not significantly change the posterior predictive values that are the basis for the proposed inference.
The conjugacy of the implied multivariate normal on { , i = 0, . . . , n} and the normal kernel in (3) greatly simplify computations, since any Markov chain Monte Carlo (MCMC) scheme for DP mixture models can be used. MacEachern and Müller (1998) and Neal (2000) described specific algorithms to implement posterior MCMC simulation in DPM models. Ishwaran and James (2001) developed alternative computational algorithms based on finite DPs, which truncated (2) after a finite number of terms. We provide details of MCMC computations in the online supplement.
4.3 Computing Mean Survival Time
We apply the Bayesian nonparametric DDP-GP model to obtain posterior means and credible intervals of mean survival time for each DTR. In the motivated leukemia trial, recall that the disease states are D (death), R (resistant disease), C (complete remission), and P (progressive disease). In stage ℓ = 1 (induction chemotherapy), the three events D, R, and C are competing risks, so only one can be observed. For the ith patient, the stage 1 outcome is denoted by s1i ∈ {D, R, C}, with transition times or (Figure 1). In stage 2, the transition time is defined only if (s1i, s2i) = (R, D), and similarly for and . Finally, is defined if (s1i, s2i) = (C, P). We thus define seven counterfactual transition times , where k indexes the transitions (0, D), (0, R), (0, C), (R, D), (C, D), (C, P) and (P, D). Figure 1 shows a flowchart of the possible outcome pathways. A dynamic treatment regime for this data may be expressed as Z = (Z1, Z2,1, Z2,2) where Z1 is the induction chemo, Z2,1 is the salvage therapy given if s1i = R, and Z2,2 is the salvage therapy given if s1i = C and s2i = P.
Our primary goal is to estimate mean survival time for each DTR Z while accounting for baseline covariates and non-random treatment assignment. Under the DDP-GP model, we denote the mean survival time for a future patient under Z by
(7) |
In terms of the seven counterfactual transition times, the survival time for a future patient i = n + 1 is
(8) |
The expectation of (8) under the DDP-GP model is evaluated by applying the law of total probability, using the same steps as in Wahed and Thall (2013). We first condition on the four possible cases, (s1i = D), (s1i = R), (s1i = C, s2i = D) and (s1i = C, s2i = P), compute the conditional expectation in each case, and then average across the cases. This computation requires evaluating seven expressions for the conditional mean transition times ηk(Z, xk) = E(Tk | Z, xk) under Fk(· | xk), for each k. For example, η(P,D)(Z1, Z2,2, x0, T(0,C), T(C,P)) is the conditional mean remaining survival time, from P to D, given that C was achieved in stage 1 with frontline therapy Z1, followed by P and salvage therapy Z2,2 in stage 2. The DDP-GP models for Fk(· | xk), k = 1, . . . , nT = 7 define most of the marginalization for the expectation in η(Z), leaving only conditioning on the baseline covariates . As Wahed and Thall (2013), we use the empirical covariate distribution p̂(x0) over the observed patients to define an overall mean survival time (7). Note that the DDP-GP model does not accommodate time-varying covariates. The described evaluation of η(Z) is an application of Robins’ G-computation (Robins, 1986; Robins et al., 2000). The complete expression is given as equation (14) in the Appendix. In the upcoming discussion, we will use η(Z) to evaluate the proposed approach.
5 Simulation Studies
We conducted four simulation studies to evaluate the performance of the proposed DDP-GP model as a tool for estimating the mean of T in survival regression settings. The simulations focused on estimation of survival regression (simulation 1); regime effects in a study with two treatment arms and single-stage regimes (simulation 2); and regime effects in two studies with multi-stage regimes (simulations 3 and 4). For each of the latter three studies, the treatment assignment probabilities depended on patient covariates. That is, we introduced treatment selection bias. In all four simulations, we implemented inference under DDP-GP models. In simulation 1, we used a single survival regression model F(Yi | xi) for a patient-specific baseline covariate vector xi. For simulation 2 we still used a single DDP-GP model F(Yi | xi, Zi), now adding a treatment indicator Zi to the survival regression model to estimate the causal effect. In simulations 3 and 4, we used independent DDP-GP models for multiple transition times, k = 1, . . . , nT , similar to the application in our case study. For all four simulation studies, the hyperprior parameters were determined using the empirical Bayes approach described earlier. For all posterior computations, the MCMC algorithm was implemented with an initial burn-in of 2,000 iterations and a total of 5,000 iterations, thinning out in batches of 10. This worked well in all cases, with convergence diagnostics using the R package coda showing no evidence of practical convergence problems. Traceplots and empirical autocorrelation plots (not shown) for the imputed parameters indicated a well mixing Markov chain.
5.1 Fitting a Survival Regression Model
In simulation 1, we considered four scenarios, with n = 50, 100, or 200 observations without censoring or n = 200 with 23% censoring. The details of simulation 1 are presented in Supplement B. Comparing the DDP-GP model with maximum likelihood estimates under the AFT model with Weibull, lognormal and exponential distributions, the estimates under the DDP-GP model reliably recovered the shape of the true survival function and avoided the excessive bias seen with the AFT models.
5.2 Estimating a Treatment Effect in Single-stage Regimes
Simulation 2 was designed to investigate inference under the DDP-GP model for the regime effect in a single-stage treatment setting. The simulated data represent what might be obtained in an observational setting where treatment is chosen by the attending physician based on patient covariates, rather than from a fairly randomized clinical trial. We simulated a binary treatment indicator Zi ∈ {0=control, 1=experimental} that depended on two continuous covariates, xi = (Li, Wi), for n = 100 patients, i = 1, . . . , n. For example, Li could be a patient’s creatinine to quantify kidney function, and Wi could be body weight. We generated Li from a mixture of normals, , which could correspond to a subgroup of patients having worse kidney function (higher creatinine level) due to damage from prior chemotherapy. We assumed that , a uniform with zero mean and unit standard deviation, which could arise from standardizing a uniformly distributed raw variable. We generated the treatment indicators using the modified logistic regression model
that is, a logistic regression model with intercept 30 and slope 1/5 truncated at 0.05 and 0.95. This produces a very unbalanced treatment assignment, for example, p(Zi = 1 | Li = 40) = 0.88 versus p(Zi = 1 | Li = 20) = 0.12. This could arise in a setting where standard therapy (the ‘control’), Z = 0, is known to be nephrotoxic, while it is believed by most of the treating physicians that the experimental therapy, Zi = 1, is not, so patients with high creatinine are more likely to be given the experimental therapy. In this simulation study, the goal is to estimate the comparative effect on survival of the experimental therapy versus the control. In the two treatment arms, we generated patients’ responses from
and
with σ = 0.4. We simulated 1,000 trials. Note that under the simulation truth the treatment effect, E[Y(1) − Y(0) | x = (L, W)] = 2.5, is constant across L, W.
Figure 2(a) plots the simulation truth for the mean response curve under Z = 1 and Z = 0 versus L, with W ≡ 0, in one randomly selected trial. The upper red solid curve represents E[Y(1) | L, W = 0] and the lower black curve represents E[Y(0) | L, W = 0]. The red dots close to the upper curve are the observations for experimental arm patients and the black dots close to the lower curve are the observations for the control arm patients. We define an average treatment effect for the entire population under the simulation truth as .
We implemented inference for a survival regression F(Yi | xi, Zi) using the proposed DDP-GP model (6). Figure 2(b) summarizes inference for the data from panel (a). Let Ŷi(z) = E(Yn+1 | Ln+1 = Li, Wn+1 = Wi, Zn+1 = z, data) denote the posterior expected response for a future patient n + 1. We defined an estimated average treatment effect as . Figure 2(b) shows the estimated average treatment effect (horizontal red line), and credible intervals for individual effects Ŷi(1) − Ŷi(0) (vertical line segments, located at Li).
Inverse Probability of Treatment Weighting (IPTW)
For comparison, we also implemented inference using naive linear regression (LR), using an IPTW estimator, and an augmented IPTW (AIPTW) estimator for the average treatment effect. The LR estimator is based on a linear regression for log survival times, ignoring the lack of randomization. We use linear predictor functions Yi(1) = β10+β11Li+β12Wi+ε1i and Yi(0) = β00+β01Li+β02Wi+ε0i. Denoting the least squares estimates by β̂zj for z = 0, 1 and j = 0, 1, 2, the estimated means are Ê{Yi(z)} = β̂z0 + β̂z1Li + β̂z2Wi. We define an estimated average treatment effect based on the LR model as . Denote the propensity score πi = pr(Zi = 1 | xi). The IPTW method corrects for bias due to lack of randomization by assigning each patient i a weight bi equal to the inverse of an estimate of p(Zi | xi), the conditional probability of receiving his or her actual treatment (Robins et al., 2000). When Zi = 1, bi = 1/πi; when Zi = 0, bi = 1/(1 − πi). An estimate of πi is obtained by fitting a logistic regression model. We define the IPTW mean outcome estimator
and corresponding average treatment effect estimate ATEIPTW = IPTW(Z = 1)−IPTW(Z = 0).
Augmented IPTW (AIPTW)
The AIPTW estimate (Robins, 2000) is a doubly robust generalization of the IPTW. It is consistent whenever the outcome regression model is correct and/or the propensity score model is correct. We evaluate the AIPTW estimator for average treatment effect (ATE):
(9) |
where π̂i is the estimated propensity score using logistic regression and Ê(Yi | Zi, xi) is estimated by a linear regression model, i = 0, 1.
Figure 2(b) shows ATEDDP, ATELR, ATEIPTW and ATEAIPTW for one simulated dataset under this simuation setup. We found E(ATEDDP | data) = 2.31, with 90% posterior credible interval (1.89, 2.96), compared with the simulation truth ATE★ = 2.5. In contrast, ATELR = 4.13 overestimates, while the IPTW method underestimates, with ATEIPTW = 1.11. The AIPTW method reports ATEAIPTW = 2.73. In Figure 2(b), the vertical green and blue segments are marginal 90% posterior credible intervals for the treatment effect (under the DDP-GP model) at each observed L value. Lengths of posterior credible intervals larger than 2 are highlighted by blue segments. Note how the uncertainty bounds grow wider in the range where there is less overlap across treatment groups, that is, over a range of covariate values for which we do not observe reliable empirical counterfactuals for each data point (e.g. L > 50). Most of the credible intervals reasonably cover the true treatment effect.
Figure 2(b) reports inference for one hypothetical data set. For a comparison of average behavior, we carried out extensive simulations and report the distribution of estimated regime effects across these simulations. We compared the regime effect estimates obtained by DDP-GP, IPTW, AIPTW and LR based on data from 1,000 simulated trials. Figure 3 shows density plots of the distributions of estimated regime effects. Compared to the estimates obtained from DDP-GP or AIPTW, the IPTW estimates are much more variable, ranging from 1.14 to 7.13. The LR estimates are highly biased, and overestimate the true effects. The distribution of estimated regime effects under the DDP-GP model is highly concentrated around the simulation truth.
5.3 Regime Effect for Multi-stage Regimes
Simulation 3 was designed to examine inference on strategy effects for multi-stage regimes with a general DTR setup. This simulation is similar to the scenario in Moodie et al. (2007). We simulated samples of size n = 200. Patients were randomized to initial induction therapy or not, coded as and , with the randomization probabilities based on their baseline CD4 counts, which were simulated as Li ~ N(450, 102). For frontline therapy, we used the model . In order to focus on covariate-dependent induction and salvage therapies, we assumed for simplicity that all patients were resistant to the induction therapy. Let X ~ LN(m, s) denote a lognormal random variable with log(X) ~ N(m, s), we simulated the times . The salvage treatment for each patient was assigned with probability where expit(u) = eu/(1 + eu). For the first stage transition times, we generated transition times , where β(R,D) = (−0.5, 0.03, 0.2, 0.5, 0.3) and .
The goal is to estimate mean survival time for each DTR (Z1, Z2). We have four possible DTRs in this simulation. We applied the Bayesian nonparametric DDP-GP model, IPTW and AIPTW (Zhang et al., 2013) to each simulated dataset to estimate mean survival for each of the four possible DTRs. When implementing IPTW and AIPTW, we estimated the propensity score using logistic regression and the outcome model using AFT regression models with a lognormal distribution. For the nonparametric Bayesian inference we defined independent DDP-GP models as in (6) for each of the nT = 2 log transition times . Figure 4(a) compares the mean survival estimates using boxplots of (Estimated mean survival - Simulation truth), based on 1,000 simulated datasets, arranged by inference method (DDP-GP, IPTW and AIPTW) and by the four possible DTRs (the four sub-plots). Note that the DDP-GP and the AIPTW estimates are on average closer to the truth and have much smaller variability, compared to the IPTW estimates, across all four strategies. Because we use the same outcome regression models as the simulation truth when implementing the AIPTW method, it performs well in this simulation study. In summary, both, the DDP-GP and the AIPTW methods show satisfactory performance in this example, although the DDP-GP estimates show slightly smaller variability than the AIPTW estimates.
Simulation 4 is a stylized version of the leukemia data that we will analyze in Section 6. We simulated samples of size n = 200 and patients’ blood glucose values Li ~ N(100, 102). Patients initially were randomized equally between two induction therapies Z1 ∈ {a1, a2}. We then generated a response (see below). Patients who were resistant (R) to the assigned induction therapies were then assigned salvage treatment Z2,1 ∈ {b11, b12}. Salvage treatments were randomized using the rule p(Z2,1 = b11 | Li) = 0.8 I(Li < 100) + 0.2 I(Li ≥ 100). Patients who achieved C and subsequently suffered disease progression (P), were given salvage treatment Z2,2 ∈{b21, b22}, using p(Z2,2 = b21 | Li) = 0.2 I(Li < 100) + 0.85 I(Li ≥ 100). Finally, the survival time for each patient was evaluated as
We simulated the times of the two competing risks R and C as and , where β(0,R) = (2, 0.02, 0), β(0,C) = (1.5, 0.03, −0.8), with for k ∈{(0, R), (0, C)}. For the three possible second stage transitions k ∈{(R, D), (C, P), (P, D)}, we generated (competing) transition times , where β(R,D) = (−0.5, 0.03, 0.2, 0.5, 0.3), β(C,P) = (1, 0.05, 1, −0.6), β(P,D) = (0.8, 0.04, 1.5, −1, 0.5, 0.5), with covariate vectors and . We simulated N = 1,000 trials with 15% censoring.
The goal is to estimate mean survival time for each DTR (Z1, Z2,1, Z2,2). We performed inference under the Bayesian nonparametric DDP-GP model, IPTW, and AIPTW for each simulated dataset to estimate mean survival for each of the eight possible DTRs. When implementing IPTW and AIPTW, we estimated the propensity score using logistic regression and the outcome model using AFT regression models with a lognormal distribution. For the nonparametric Bayesian inference, we defined independent DDP-GP models for each of the nT = 5 log transition times . Figure 4(b) compares mean survival estimates using boxplots of (Estimated mean survival - Simulation truth), based on 1000 simulated datasets. The boxplots are arranged by inference method (DDP-GP, IPTW, AIPTW) and by all eight possible DTRs. In this simulation, both the propensity score model and the outcome model are incorrect when we implement the IPTW and AIPTW methods. In this case, the DDP-GP estimates on average are much closer to the truth and have much smaller variability, compared to the IPTW and AIPTW estimates, across all eight strategies as shown in Figure 4(b).
6 Evaluation of the Leukemia Trial Regimes
6.1 Leukemia Data – Inference for the Survival Regression
To analyze the AML-MDS trial data under the proposed DDP-GP model, we first implement posterior inference for six of the nT = 7 transition times. The exception is T(C,D). Due to the limited sample size – only 9 patients died after C without first suffering disease progression (P) – we do not implement the DDP-GP model, and instead use an intercept-only Weibull AFT model. Table 1 summarizes the data. The table reports the number of patients and median transition times for some selected transitions.
Table 1.
Induction | Resistance | Die after resistance | |||
---|---|---|---|---|---|
N | TR(days) | Salvage | N | T(R,D)(days) | |
| |||||
All | 39 | 59 (47,84) | All | 37 | 76 (27,187) |
FAI | 17 | 63 (41,97) | HDAC | 25 | 65 (21,154) |
FAI+ATRA | 13 | 59 (55,76) | |||
FAI+GCSF | 4 | 77 (43.5,106.75) | non HDAC | 12 | 146 (79, 376.75) |
FAI+ATRA+GCSF | 5 | 51 (48, 65) |
Induction | CR | Die after progression | |||
---|---|---|---|---|---|
N | TC(days) | Salvage | N | T(P,D)(days) | |
| |||||
All | 102 | 32 (27,41) | All | 83 | 120 (45,280) |
FAI | 20 | 31 (29, 44) | HDAC | 47 | 106 (45,175.5) |
FAI+ATRA | 26 | 31 (25.25, 35) | |||
FAI+GCSF | 28 | 35.5 (28,42.75) | non HDAC | 36 | 147.5 (42.75, 592.25) |
FAI+ATRA+GCSF | 28 | 32 (26,41) |
We first report results for T (R,D). Of 210 patients, 39 (18.57%) experienced resistance to their induction therapies. The rate of resistance varied across regimes, from 31% for patients receiving FAI, 24% for FAI plus ATRA, 7.8% for FAI plus GCSF, and 10% for FAI plus ATRA plus GCSF. The times to treatment resistance were longer, with greater variability in the FAI plus GCSF arm compared to the other three arms. Among the 39 patients who were resistant to induction therapies, 27 were given HDAC as salvage treatment, of whom 2 were censored before observing death. Figure 5 summarizes survival regression under the proposed DDP-GP model by plotting posterior predicted survival functions for a hypothetical future patient at age 61 with poor prognosis cytogenetic abnormality. The figure shows posterior predicted survival functions, arranged by different induction therapies Z1 (the four curves in each panel), T (0,R) and Z2,1 (as indicated in the subtitle). Figure 5 shows that patients with shorter T (0,R) had lower predicted survival once their cancer became resistant. Also, patients with s1 = R who received Z2,1 = HDAC as salvage had worse predicted survival than patients who received salvage treatment with non HDAC. Similar results can be obtained for other transition times.
Next, we summarize results of the survival regression for T (C,P). Among the n = 210 patients, 102 (48.6%) achieved C, with C rates of 37%, 48%, 53% and 56% in the FAI, FAI plus ATRA, FAI plus GCSF and FAI plus GCSF plus ATRA arms, respectively. Of the 102 patients who achieved CR, 93 experienced disease progression before death or being lost to follow-up. Among these 93 relapsed patients, 53 received salvage treatment with HDAC. For a hypothetical future patient at age 61 with poor prognosis cytogenetic abnormality, Figure 6 summarizes survival regression functions for each of the four induction therapies, with solid lines representing T (0,C) = 20 and dotted lines representing T (0,C) = 30. The four dotted lines are below the four corresponding solid lines, indicating that T (0,C) was associated with T (C,P). This observation coincides with the well-known phenomenon in chemotherapy for AML or MDS that, regardless of induction therapy, the longer it takes to achieve C, the shorter the period that the patient remains in C.
Similarly, we summarize results for the survival regression for T (P,D). For a patient with poor prognosis cytogenetic abnormality, Figure 7 shows the posterior predicted survival functions under different combinations of induction therapy and age. Panels (a) and (c) show the survival functions of a patient assigned salvage treatment HDAC with age 46 or 76, while panels (b) and (d) plot the corresponding survival functions for the patient assigned non HDAC as salvage. Four different colors represent the four induction therapies. Figure 7 shows that residual survival time after disease progression following C was associated with both age and salvage therapy. Older patients were more likely to have shorter residual life once their disease progressed, and patients given HDAC as salvage died more quickly than patients given non HDAC salvage.
6.2 Estimating the Regime Effects
In the AML-MDS trial, the four induction therapies and two salvage therapies define a total of 16 regimes. Mean survival time estimates under each of the 16 regimes were calculated using posterior inference under independent DDP-GP models for each of the nT = 7 transition times. For comparison, we also evaluated mean survival times using the IPTW method. See equation (16) in the Appendix for details. Table 2 summarizes the results using IPTW and the DDP-GP model, including 90% credible intervals. Figure (8) shows boxplots of the marginal posterior distributions of survival times under the DDP-GP model for the 16 regimes.
Table 2.
Regime (A, B1, B2) | Estimated mean OS times (days) | ||
---|---|---|---|
DDP-GP | |||
IPTW | Posterior mean | 90% CI | |
| |||
(FAI, HDAC, HDAC) | 191.67 | 390.35 | (286.47 545.6) |
(FAI, HDAC, other) | 198.18 | 416.34 | (295.84 581.73) |
(FAI, other, HDAC) | 216.59 | 394.2 | (287.15 538.63) |
(FAI, other, other) | 222.42 | 420.19 | (296.51 579.05) |
| |||
(FAI+ATRA, HDAC, HDAC) | 527.43 | 572.9 | (416.63 829.12) |
(FAI+ATRA, HDAC, other) | 458.85 | 617.15 | (434.4 905.82) |
(FAI+ATRA, other, HDAC) | 532.29 | 573.46 | (413.59 830.39) |
(FAI+ATRA, other, other) | 464.39 | 617.71 | (434.49 900.32) |
| |||
(FAI+GCSF, HDAC, HDAC) | 326.15 | 542.06 | (393.49 725.23) |
(FAI+GCSF, HDAC, other) | 281.78 | 578.24 | (419.69 781.05) |
(FAI+GCSF, other, HDAC) | 327.66 | 542.5 | (392.77 726.08) |
(FAI+GCSF, other, other) | 283.36 | 578.68 | (421.46 781.26) |
| |||
(FAI+ATRA+GCSF, HDAC, HDAC) | 337.44 | 458.34 | (327.91 651.21) |
(FAI+ATRA+GCSF, HDAC, other) | 285.64 | 502.48 | (360.29 727.44) |
(FAI+ATRA+GCSF, other, HDAC) | 362.56 | 459.42 | (328.09 651.61) |
(FAI+ATRA+GCSF, other, other) | 309.62 | 503.56 | (358.84 726.88) |
The two methods give very different estimates for mean survival time, with the DDP-GP likelihood-based estimator much larger than the corresponding IPTW estimator for most regimes. The differences are expected due to the distinct properties of these two methods. The IPTW estimator uses the covariates to estimate the regime probability weights. In contrast, the DDP-GP likelihood-based method computes mean survival time, using G-computation, accounting for patients’ covariates and previous transition times in addition to treatment followed by marginalizing over the empirical covariate distribution to obtain η(Z). Additionally, the IPTW estimate is calculated from the overall samples, whereas the likelihood-based DDP-GP method models each transition time distribution separately, which reduces the effective sample size for each model fit and thus increases the overall variability even though they share the same prior for the βk’s.
For both methods, the estimates were smallest for the four regimes with FAI as induction therapy regardless of salvage treatment, and the 90% credible intervals were relatively small for these inferior regimes. Under the IPTW method, the estimates were largest for the four regimes with FAI plus ATRA as induction therapy, and the best regime is (FAI+ATRA, other, HDAC). With the DDP-GP likelihood-based approach, FAI plus ATRA as induction also gave the largest estimates, except for the regimes (FAI+GCSF, HDAC, other) and (FAI+GCSF, other, other), while the best regime is (FAI+ATRA, other, other). Most importantly, the DDP-GP likelihood-based approach showed that (FAI + ATRA, Z2,1, other) was superior to (FAI + ATRA, Z2,1, HDAC) regardless of Z2,1. Therefore, our results suggest that (1) FAI plus ATRA was the best induction therapy, (2) if the patient’s disease was resistant to FAI plus ATRA, then it was irrelevant whether the salvage therapy contained HDAC, and (3) if patients experienced progression after achieving C with FAI plus ATRA, then salvage therapy with non HDAC was superior.
These conclusions, although not confirmatory, contradict those given by Estey et al. (1999), who concluded that none of the three adjuvant combinations FAI plus ATRA, FAI plus GCSF, or FAI plus ATRA plus GCSF were significantly different from FAI alone with respect to either survival or event-free survival time, based on consideration of only the frontline therapies by applying conventional Cox regression and hypothesis testing.
7 Conclusions
We have proposed a Bayesian nonparametric DDP-GP model for analyzing survival data and evaluating joint effects of induction-salvage therapies in clinical trials, using the posterior estimates, to predict survival for future patients. The Bayesian paradigm works very well, and the simulation studies suggest that our DDP-GP method yields more reliable estimates than IPTW and AIPTW. The DDP-GP model can be extended easily to multivariate outcomes. In equation (2), this could be done by replacing the normal distribution with a multivariate normal distribution as the base measure. A referee has noted that, in settings where interpretability is important, our proposed BNP approach could be applied in the context of a policy search algorithm (Orellana et al., 2010; Zhang et al., 2012a,b, 2013; Zhao et al., 2012, 2014, 2015).
We employed two different methods to evaluate the 16 possible two-stage regimes for choosing induction and salvage therapies in the leukemia trial data. The IPTW method estimates the regime effect by using covariates only to compute the assignment probabilities of salvage therapies to correct for bias. In contrast, likelihood-based G-computation under the DDP-GP model accounts for all possible outcome paths, the transition times between successive states, and effects of covariates and previous outcomes, on each transition time. Although the two methods gave different numerical estimates of mean survival time, they both reached the conclusion that FAI plus ATRA was the best induction therapy and FAI was the worst induction therapy. Although our current models are set up for two-stage treatment regimes, they easily can be extended to other applications with multi-stage regimes.
Supplementary Material
Acknowledgments
This research was supported by NCI/NIH grant R01 CA157458-01A1 (Yanxun Xu and Peter Müller) and R01 CA83932 (Peter F. Thall).
Appendix
Likelihood
The following structure is adapted from Wahed and Thall (2013), and is included here for completeness. The risk sets of the seven transition times in the leukemia trial are defined as follows. Let ℛ0 = {1, . . . , n} denote the initial risk set at the start of induction chemotherapy, and ℛ(0,r) = {i : s1i = r} for r = D, C, R, so ℛ0 = ℛ(0,D) ∪ ℛ(0,C) ∪ ℛ(0,R). Similarly, ℛ(C,P) = {i : s1i = C, s2i = P } is the later risk set for T(P,D).
To record right censoring, let Ui denote the time from the start of induction to last followup for patient i. We assume that Ui is conditionally independent of the transition time given prior transition times and other covariates. Censoring of event times occurs by competing risk and/or loss to follow up. For patient i in the risk set for transition time , let if patient i is not censored and 0 if patient i is right censored. For example, for i ∈ ℛ0 if . Similarly, for i ∈ ℛ(0,R) if and for i ∈ ℛ(C,P) if .
For i ∈ ℛ0, let denote the observed time for the stage 1 event or censoring. For i ∈ ℛ(0,C) let denote the observed event time for the competing risks D and P and loss to followup. Similarly, for i ∈ ℛ(0,R) , let and for i ∈ ℛ(C,P) let .
The joint likelihood function is the product ℒ= ℒ1ℒ2ℒ3ℒ4. The first factor ℒ1 corresponds to response to induction therapy,
(10) |
where F̄k = 1 − Fk. The second factor ℒ2 corresponds to patients i ∈ R(0,R) who experience resistance to induction and receive salvage Z2,1,
(11) |
The third factor ℒ3 is the likelihood contribution from patients achieving C,
(12) |
The fourth factor ℒ4 is the contribution from patients who experience tumor progression after C,
(13) |
The mean survival time of a patient treated with regime Z = (Z1, Z2,1, Z2,2) is
(14) |
IPTW
We compute the IPTW estimates for overall mean survival with regime Z as
(15) |
where
(16) |
In (16), K̂ is the Kaplan-Meier estimator of the censoring survival distribution K(u) = P (U ≥ t) at time t. Ii(Z) is is an indictor of treatment Z and 0 otherwise, and is the probability of receiving salvage treatment Z2,1 estimated using logistic regression, and similarly for . The above estimator has been shown to be consistent under suitable assumptions (Wahed and Thall, 2013; Scharfstein et al., 1999).
Contributor Information
Yanxun Xu, Division of Statistics and Scientific Computing, The University of Texas at Austin, Austin, TX.
Peter Müller, Department of Mathematics, The University of Texas at Austin, Austin, TX.
Abdus S. Wahed, Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA
Peter F. Thall, Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX
References
- Bernardo J, Berger J, Smith ADF. Regression and classification using gaussian process priors. Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting; June 6-10 1998; Oxford University Press; 1999. p. 475. [Google Scholar]
- Connolly S, Bernstein G. Practice parameter for the assessment and treatment of children and adolescents with anxiety disorders. Journal of the American Academy of Child and Adolescent Psychiatry. 2007;46(2):267–283. doi: 10.1097/01.chi.0000246070.23695.06. [DOI] [PubMed] [Google Scholar]
- Dawson R, Lavori PW. Placebo-free designs for evaluating new mental health treatments: the use of adaptive treatment strategies. Statistics in medicine. 2004;23(21):3249–3262. doi: 10.1002/sim.1920. [DOI] [PubMed] [Google Scholar]
- Estey EH, Thall PF, Pierce S, Cortes J, Beran M, Kantarjian H, Keating MJ, Andreeff M, Freireich E. Randomized phase ii study of fludarabine+ cytosine arabinoside+ idarubicin±all-trans retinoic acid±granulocyte colony-stimulating factor in poor prognosis newly diagnosed acute myeloid leukemia and myelodysplastic syndrome. Blood. 1999;93(8):2478–2484. [PubMed] [Google Scholar]
- Ferguson TS, et al. A bayesian analysis of some nonparametric problems. The Annals of Statistics. 1973;1(2):209–230. [Google Scholar]
- Goldberg Y, Kosorok MR. Q-learning with censored data. Annals of statistics. 2012;40(1):529. doi: 10.1214/12-AOS968. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hernán MÁ, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of hiv-positive men. Epidemiology. 2000;11(5):561–570. doi: 10.1097/00001648-200009000-00012. [DOI] [PubMed] [Google Scholar]
- Hill JL. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics. 2011;20(1):217–240. [Google Scholar]
- Ishwaran H, James LF. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association. 2001;96(453):161. [Google Scholar]
- Karabatsos G, Walker SG. A bayesian nonparametric causal model. Journal of Statistical Planning and Inference. 2012;142(4):925–934. [Google Scholar]
- Lavori PW, Dawson R. A design for testing clinical strategies: biased adaptive within-subject randomization. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2000;163(1):29–38. [Google Scholar]
- Lunceford JK, Davidian M, Tsiatis AA. Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2002;58(1):48–57. doi: 10.1111/j.0006-341x.2002.00048.x. [DOI] [PubMed] [Google Scholar]
- MacEachern SN. ASA proceedings of the section on bayesian statistical science. American Statistical Association; Alexandria, VA: 1999. Dependent nonparametric processes; pp. 50–55.pp. 50–55. [Google Scholar]
- MacEachern SN, Müller P. Estimating mixture of dirichlet process models. Journal of Computational and Graphical Statistics. 1998;7(2):223–238. [Google Scholar]
- Moodie EE, Richardson TS, Stephens DA. Demystifying optimal dynamic treatment regimes. Biometrics. 2007;63(2):447–455. doi: 10.1111/j.1541-0420.2006.00686.x. [DOI] [PubMed] [Google Scholar]
- Müller P, Mitra R. Bayesian nonparametric inference–why and how. Bayesian Analysis. 2013;8(2):269–302. doi: 10.1214/13-BA811. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Müller P, Rodriguez A. Nonparametric bayesian inference. IMS-CBMS Lecture Notes. IMS. 2013:270. [Google Scholar]
- Murphy S, Van Der Laan M, Robins J. Marginal mean models for dynamic regimes. Journal of the American Statistical Association. 2001;96(456):1410–1423. doi: 10.1198/016214501753382327. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murphy SA. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2003;65(2):331–355. [Google Scholar]
- Murphy SA. An experimental design for the development of adaptive treatment strategies. Statistics in medicine. 2005;24(10):1455–1481. doi: 10.1002/sim.2022. [DOI] [PubMed] [Google Scholar]
- Murphy SA, Collins LM, Rush AJ. Customizing treatment to the patient: adaptive treatment strategies. Drug and alcohol dependence. 2007a;88(Suppl 2):S1–3. doi: 10.1016/j.drugalcdep.2007.02.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murphy SA, Lynch KG, Oslin D, McKay JR, TenHave T. Developing adaptive treatment strategies in substance abuse research. Drug and alcohol dependence. 2007b;88:S24–S30. doi: 10.1016/j.drugalcdep.2006.09.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neal R. PhD thesis. Graduate Department of Computer Science, University of Toronto; 1995. Bayesian Learning for Neural Networks. [Google Scholar]
- Neal RM. Markov chain sampling methods for dirichlet process mixture models. Journal of computational and graphical statistics. 2000;9(2):249–265. [Google Scholar]
- O’Hagan A, Kingman J. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society. Series B (Methodological) 1978;40(1):1–42. [Google Scholar]
- Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part i: main content. The international journal of biostatistics. 2010;6(2) [PubMed] [Google Scholar]
- Rasmussen C, Williams C. Gaussian Processes for Machine Learning. MIT Press; 2006. [Google Scholar]
- Robins J. A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7(9):1393–1512. [Google Scholar]
- Robins J, Orellana L, Rotnitzky A. Estimation and extrapolation of optimal treatment and testing strategies. Statistics in medicine. 2008;27(23):4678–4721. doi: 10.1002/sim.3301. [DOI] [PubMed] [Google Scholar]
- Robins JM. Addendum to “a new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect”. Computers & Mathematics with Applications. 1987;14(9):923–945. [Google Scholar]
- Robins JM. The analysis of randomized and non-randomized aids treatment trials using a new approach to causal inference in longitudinal studies. Health service research methodology: a focus on AIDS. 1989;113:159. [Google Scholar]
- Robins JM. Latent variable modeling and applications to causality. Springer; 1997. Causal inference from complex longitudinal data; pp. 69–117. [Google Scholar]
- Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proceedings of the American Statistical Association. 2000;1999:6–10. [Google Scholar]
- Robins JM. Optimal structural nested models for optimal sequential decisions. Proceedings of the Second Seattle Symposium in Biostatistics; Springer; 2004. pp. 189–326. [Google Scholar]
- Robins JM, Hernán MÁ, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550–560. doi: 10.1097/00001648-200009000-00011. [DOI] [PubMed] [Google Scholar]
- Robins JM, Rotnitzky A. AIDS Epidemiology. Springer; 1992. Recovery of information and adjustment for dependent censoring using surrogate markers; pp. 297–331. [Google Scholar]
- Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association. 1999;94(448):1096–1120. [Google Scholar]
- Sethuraman J. Technical report, DTIC Document. 1991. A constructive definition of dirichlet priors. [Google Scholar]
- Shi JQ, Wang B, Murray-Smith R, Titterington DM. Gaussian process functional regression modeling for batch data. Biometrics. 2007;63(3):714–723. doi: 10.1111/j.1541-0420.2007.00758.x. [DOI] [PubMed] [Google Scholar]
- Thall PF, Logothetis C, Pagliaro LC, Wen S, Brown MA, Williams D, Millikan RE. Adaptive therapy for androgen-independent prostate cancer: a randomized selection trial of four regimens. Journal of the National Cancer Institute. 2007a;99(21):1613–1622. doi: 10.1093/jnci/djm189. [DOI] [PubMed] [Google Scholar]
- Thall PF, Millikan RE, Sung HG, et al. Evaluating multiple treatment courses in clinical trials. Statistics in Medicine. 2000;19(8):1011–1028. doi: 10.1002/(sici)1097-0258(20000430)19:8<1011::aid-sim414>3.0.co;2-m. [DOI] [PubMed] [Google Scholar]
- Thall PF, Sung HG, Estey EH. Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. Journal of the American Statistical Association. 2002;97(457):29–39. [Google Scholar]
- Thall PF, Wooten LH, Logothetis CJ, Millikan RE, Tannir NM. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Statistics in medicine. 2007b;26(26):4687–4702. doi: 10.1002/sim.2894. [DOI] [PubMed] [Google Scholar]
- Tsiatis A. Semiparametric theory and missing data. Springer; 2007. [Google Scholar]
- van der Laan MJ, Petersen ML. Causal effect models for realistic individualized treatment and intention to treat rules. International Journal of Biostatistics. 2007;3(1):3. doi: 10.2202/1557-4679.1022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wahed AS, Thall PF. Evaluating joint effects of induction–salvage treatment regimes on overall survival in acute leukaemia. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2013;62(1):67–83. doi: 10.1111/j.1467-9876.2012.01048.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wahed AS, Tsiatis AA. Semiparametric efficient estimation of survival distributions in two-stage randomisation designs in clinical trials with censored data. Biometrika. 2006;93(1):163–177. [Google Scholar]
- Wang L, Rotnitzky A, Lin X, Millikan RE, Thall PF. Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer. Journal of the American Statistical Association. 2012;107(498):493–508. doi: 10.1080/01621459.2011.641416. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Williams CK. Learning in graphical models. Springer; 1998. Prediction with gaussian processes: From linear regression to linear prediction and beyond; pp. 599–621. [Google Scholar]
- Zajonc T. Bayesian inference for dynamic treatment regimes: Mobility, equity, and efficiency in student tracking. Journal of the American Statistical Association. 2012;107(497):80–92. [Google Scholar]
- Zhang B, Tsiatis AA, Davidian M, Zhang M, Laber E. Estimating optimal treatment regimes from a classification perspective. Stat. 2012a;1(1):103–114. doi: 10.1002/sta.411. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang B, Tsiatis AA, Laber EB, Davidian M. A robust method for estimating optimal treatment regimes. Biometrics. 2012b;68(4):1010–1018. doi: 10.1111/j.1541-0420.2012.01763.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang B, Tsiatis AA, Laber EB, Davidian M. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika. 2013:ast014. doi: 10.1093/biomet/ast014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association. 2012;107(499):1106–1118. doi: 10.1080/01621459.2012.695674. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao Y-Q, Zeng D, Laber EB, Kosorok MR. New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association. 2014 doi: 10.1080/01621459.2014.937488. just-accepted. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao YQ, Zeng D, Laber EB, Song R, Yuan M, Kosorok MR. Doubly robust learning for estimating individualized treatment with censored data. Biometrika. 2015;102:151–168. doi: 10.1093/biomet/asu050. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.