Author manuscript; available in PMC 2016 Jun 1. Published in final edited form as: Biometrics. 2014 Oct 29;71(2):450–459. doi:10.1111/biom.12258

Sequential Multiple Assignment Randomized Trial (SMART) with Adaptive Randomization for Quality Improvement in Depression Treatment Program

Ying Kuen Cheung 1, Bibhas Chakraborty 1,2, Karina W Davidson 3
PMCID: PMC4429017  NIHMSID: NIHMS680616  PMID: 25354029

Summary

An implementation study is an important tool for deploying state-of-the-art treatments from clinical efficacy studies into a treatment program, with the dual goals of learning about the effectiveness of the treatments and improving the quality of care for patients enrolled in the program. In this article, we deal with the design of a treatment program of dynamic treatment regimens (DTRs) for patients with depression after acute coronary syndrome. We introduce a novel adaptive randomization scheme for a sequential multiple assignment randomized trial of DTRs. Our approach adapts the randomization probabilities to favor treatment sequences with comparatively superior Q-functions, as used in Q-learning. The proposed approach addresses three main concerns of an implementation study: it allows incorporation of historical data or opinions, it includes randomization for learning purposes, and it aims to improve care via adaptation throughout the program. We demonstrate how to apply our method to design a depression treatment program using data from a previous study. By simulation, we illustrate that the historical inputs are important for the program performance as measured by the expected outcomes of the enrollees, but we also show that the adaptive randomization scheme is able to compensate for poorly specified historical inputs by improving patient outcomes within a reasonable horizon. The simulation results also confirm that the proposed design allows efficient learning of the treatments by alleviating the curse of dimensionality.

Keywords: Behavioral intervention, Dynamic treatment regimen, Implementation research, Problem-solving therapy, Play-the-winner, Q-learning

1. Introduction

It is common to conduct implementation studies to facilitate the uptake of new treatments, particularly in the area of behavioral intervention for chronic conditions. An implementation study generally aims to adapt existing treatments in a care setting or treatment program so as to improve the quality of care for patients enrolled in the program while producing knowledge about the treatments (Cheung and Duan, 2014). We are motivated by the need to implement effective depression interventions in the clinical care of patients suffering from acute coronary syndrome. Even though numerous treatment modules, including medications and problem-solving therapy (PST), are available, depression management for these patients remains poor due to suboptimal treatment administration (Thombs et al., 2008). As the management of depression may involve multiple treatment components given over a period of time, a successful intervention is likely a direct result of administering each component or their combination in an optimal sequence, possibly based on intermediate outcomes, with the objective of maximizing the eventual health outcome. Thus, the optimal intervention is potentially a dynamic treatment regimen (DTR; Murphy, 2003); see Chakraborty, Laber, and Zhao (2013) for an example of a DTR in a depression study setting.

In an attempt to improve care of patients with depression after acute coronary syndrome, Davidson et al. (2013) compared a centralized depression care approach with standard care in the Comparison of Depression Interventions after Acute Coronary Syndrome (CODIACS) Vanguard trial. The trial adopted the stepped care approach, whereby initial treatments in the treatment arm were chosen based on patient preference or standard care, and were then "stepped" based on intermediate symptoms. As a result, patients in CODIACS received different treatment sequences. This would allow estimation of an optimal DTR using reinforcement learning techniques such as Q-learning (Watkins, 1989; Murphy, 2005a); and in principle, the results could be transitioned to implementation in a depression treatment program. The validity of such an analysis, however, depends on the untestable assumption of ignorable treatment (Rubin, 1974; Robins, 1997), which asserts that the assigned treatment is independent of the potential future outcomes conditional on the subject history. Therefore, deploying a stepped care approach in a depression treatment program does not necessarily lead to unbiased learning about the optimal treatments.

An alternative to the stepped care approach is the sequential multiple assignment randomized trial (SMART) strategy, whereby a patient is initially randomized to a treatment and is re-randomized at a subsequent stage based on intermediate outcomes (Thall, Millikan, and Sung, 2000; Lavori and Dawson, 2004). By virtue of randomization, the assumption of ignorable treatment holds. In this article, we propose using SMART-type designs for implementation studies within a depression treatment program. A regular SMART design randomizes subjects to the available treatment options according to pre-specified probabilities, often without regard to the likely benefits of the treatments. In particular, a typical strategy aims to achieve equal sample sizes across the possible treatment sequences (Murphy, 2005b). This approach in theory maximizes the comparative power for comparing two treatment sequences, but it is at odds with the objective of an implementation study. In addition, in order to cover all possible branches of treatment sequences, a SMART design may suffer from the "curse of dimensionality". In drug trials such as the CATIE trial for schizophrenia (Schneider et al., 2001) and the STAR*D trial for depression (Rush et al., 2004), the curse of dimensionality can be alleviated by restricting attention to intervention options that are ethically and scientifically feasible, hence reducing the number of possible sequences to be tested. For example, it is quite common to use the play-the-winner strategy, whereby a patient with a positive intermediate outcome stays on the same treatment; in cancer therapy, the frontline treatment is continued upon a response or stable disease (Thall et al., 2007). While reasonable for administering medications, playing the winner is not necessarily the ethical or optimal approach for behavioral interventions such as PST.

Motivated by these practical considerations, we propose applying SMART with adaptive randomization (AR) for implementation research within a depression treatment program. The use of AR is increasingly common in cancer trials of non-dynamic treatments (e.g., Cheung et al., 2006; Thall and Wathen, 2007; Barker et al., 2009). The idea is to use the outcome data from patients treated previously in a study to unbalance the randomization probabilities in favor of the empirically superior treatments for future study subjects. This is appealing for the purpose of quality improvement in a care program. In this article, the term "adaptive" refers to the fact that treatment decisions are adapted between patients, rather than within patients. A non-adaptive SMART allows evaluation of DTRs that adapt treatments within patients, but does not adapt between patients. While few SMARTs consider between-patient adaptation, Lee et al. (2014) recently applied ε-greedy randomization in a dose-finding trial to adaptively assign doses based on the posterior expectations of Q-function-type utilities. Like Lee et al. (2014), our AR strategy is also based on the Q-functions in Q-learning. Section 2 reviews Q-learning and introduces the proposed SMART with adaptive randomization (SMART-AR) design. In Section 3, we design a SMART-AR for a depression treatment program using the CODIACS data, and discuss design calibration. Section 4 presents simulation results comparing different designs and assessing the robustness of the methods. Some concluding remarks are given in Section 5.

2. Methods

2.1. Q-learning: Review and Notation

Let $A_{ti}$ denote the treatment given to patient $i$ at stage $t$, and let $\mathcal{S}_t$ denote the set of treatment options at stage $t$, so that $A_{ti} \in \mathcal{S}_t$. Let $J_t$ be the number of treatment options at stage $t$. The objective of Q-learning is to identify the optimal decision $d_t(h_t) \in \mathcal{S}_t$ for the stage-$t$ intervention, given the patient history $H_{ti} = h_t$ just prior to the intervention, so as to maximize the mean of the eventual health outcome $Y_i$. In this article, we focus on Q-learning for a two-stage DTR, for which we define the Q-functions $Q_2(h_2, a_2) = E(Y_i \mid H_{2i} = h_2, A_{2i} = a_2)$ and $Q_1(h_1, a_1) = E\{\max_{a_2 \in \mathcal{S}_2} Q_2(H_{2i}, a_2) \mid H_{1i} = h_1, A_{1i} = a_1\}$. If the Q-functions were known, we could use backward induction to evaluate the optimal DTR that maximizes the expected outcome; that is, for patient $i$, $d_t(h_{ti}) = \arg\max_{a_t \in \mathcal{S}_t} Q_t(h_{ti}, a_t)$ for $t = 2, 1$.

In practice, however, the true Q-functions are unknown and must be estimated from the data. Q-learning postulates a working model for the stage-$t$ Q-function using linear regression: $Q_t(h_t, a_t; \theta_t) = \theta_{t0} + \theta_{t1}^T h_t + \theta_{t2}^T a_t + \theta_{t3}^T h_t a_t$, where the design matrix comprising $h_{ti}$ and $a_{ti}$ can be properly coded with continuous or dummy variables. For stage 2, the regression parameter $\theta_2$ can be estimated by least squares: $\hat\theta_2 = \arg\min_\theta \sum_{i=1}^n \{Y_i - Q_2(H_{2i}, A_{2i}; \theta)\}^2$. For stage 1, we define for patient $i$ a pseudo-outcome $\hat Y_i = \max_{a_2 \in \mathcal{S}_2} Q_2(H_{2i}, a_2; \hat\theta_2)$ as a proxy for the quantity under expectation in the definition of $Q_1(h_1, a_1)$, and estimate $\theta_1$ by least squares based on the pseudo-outcomes: $\hat\theta_1 = \arg\min_\theta \sum_{i=1}^n \{\hat Y_i - Q_1(H_{1i}, A_{1i}; \theta)\}^2$. With the model-based estimates of the Q-functions, we can apply backward induction to obtain an estimate of the optimal DTR: $\hat d_t(h_{ti}) = \arg\max_{a_t \in \mathcal{S}_t} Q_t(h_{ti}, a_t; \hat\theta_t)$ for $t = 2, 1$.
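As a concrete illustration, the following minimal R sketch carries out the two-stage recursion on simulated data with binary actions at both stages; the generating model, sample size, and variable names are illustrative assumptions and not the CODIACS data or the authors' supplementary code.

set.seed(1)
n  <- 200
a1 <- rbinom(n, 1, 0.5)   # stage-1 treatment
r  <- rbinom(n, 1, 0.5)   # intermediate response
a2 <- rbinom(n, 1, 0.5)   # stage-2 treatment
y  <- 2 + 5*a1 + 8*a2 - 12*a1*a2 + 7*r + rnorm(n, sd = 7)   # final outcome

## Stage 2: regress Y on the history (a1, r) and the action a2
fit2 <- lm(y ~ a1*a2 + r*a2)
q2 <- function(a1, r, a2) predict(fit2, newdata = data.frame(a1 = a1, r = r, a2 = a2))

## Pseudo-outcome: predicted outcome under the best stage-2 action
yhat <- pmax(q2(a1, r, 0), q2(a1, r, 1))

## Stage 1: regress the pseudo-outcome on the stage-1 action
fit1 <- lm(yhat ~ a1)
q1 <- function(a1) predict(fit1, newdata = data.frame(a1 = a1))

## Backward induction: estimated optimal decision rules
d2hat <- function(a1, r) as.numeric(q2(a1, r, 1) > q2(a1, r, 0))
d1hat <- as.numeric(q1(1) > q1(0))

Here d2hat(a1, r) and d1hat give the estimated optimal stage-2 rule and stage-1 decision, respectively.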

2.2. SMART with Adaptive Randomization (SMART-AR)

If the optimal DTR $d^*$ were known for the population served by a treatment program, it would be natural to treat each enrolled patient with $d^*$. In most situations, we will need to learn about $d^*$ using the data from patients in the program, and hence some randomization of the treatment sequences is needed. In a typical SMART design, the $i$th patient enrolled in a treatment program is assigned a stage-$t$ intervention $a$ with probability $\pi_t(a \mid h_{ti})$, where the function $\pi_t(\cdot \mid h_t)$ is pre-specified such that $\sum_{a \in \mathcal{S}_t} \pi_t(a \mid h_t) = 1$ for each given $h_t$. For example, by setting $\pi_t(a \mid h_t) = 1/J_t$ for all $a \in \mathcal{S}_t$, we aim to achieve balance between treatments.

The idea of AR is to determine the function $\pi_t(\cdot \mid h_t)$ using the data of patients enrolled in the study. Our AR criterion is based on the fact that Q-learning aims to maximize $Q_t(h_t, a; \hat\theta_t)$ for given $h_t$. Precisely, let $n(i)$ denote the number of patients whose final outcome has been evaluated just prior to the enrollment of patient $i$. It is likely that $n(i) < i - 1$, as patients are enrolled in the program in a staggered fashion. For patient $i$, we randomly assign an intervention based on an initial set of probabilities $\{\pi_t^0(a \mid h_{ti})\}$ if $n(i) < N_{\min}$ for some pre-specified $N_{\min}$. The set $\pi_t^0$ may be chosen based on data from previous studies (see Section 3.1) and can thus be viewed as historical randomization probabilities. Once there are at least $N_{\min}$ patients with complete evaluation, that is, $n(i) \ge N_{\min}$, we update the randomization probabilities using the data of the first $n(i)$ patients, as follows: For patient $i$ at stage $t$, define for treatment $a$

$$\hat\rho_t(a \mid h_{ti}) = \exp\left\{ \frac{Q_t(h_{ti}, a; \hat\theta_t)}{\hat\sigma_t} \log b \right\}, \qquad (1)$$

for some pre-specified base $b \ge 1$. The least squares estimate $\hat\theta_t$ in (1) is evaluated using the data of the first $n(i)$ patients, and $\hat\sigma_t^2$ is the mean squared error due to $\hat\theta_t$:

$$\hat\sigma_1^2 = \frac{\sum_{k=1}^{n(i)} \{\hat Y_k - Q_1(H_{1k}, A_{1k}; \hat\theta_1)\}^2}{n(i) - \dim(\theta_1)} \quad \text{and} \quad \hat\sigma_2^2 = \frac{\sum_{k=1}^{n(i)} \{Y_k - Q_2(H_{2k}, A_{2k}; \hat\theta_2)\}^2}{n(i) - \dim(\theta_2)}$$

where $\dim(\theta)$ denotes the dimension of the vector $\theta$. The empirical randomization probability for treatment $a$ is then calculated by normalizing $\hat\rho_t$, that is,

$$\hat\pi_t(a \mid h_{ti}) = \frac{\hat\rho_t(a \mid h_{ti})}{\sum_{a' \in \mathcal{S}_t} \hat\rho_t(a' \mid h_{ti})} = \frac{\exp\{\hat\Delta_{ti}(a) \log b\}}{\sum_{a' \in \mathcal{S}_t} \exp\{\hat\Delta_{ti}(a') \log b\}} \qquad (2)$$

where $\hat\Delta_{ti}(a) = \{Q_t(h_{ti}, a; \hat\theta_t) - Q_t(h_{ti}, \hat a_{ti}^w; \hat\theta_t)\}/\hat\sigma_t$ and $\hat a_{ti}^w$ is the estimated worst action given $h_{ti}$. To allow input from historical data or perspectives, we propose using a weighted average of the historical and empirical randomization probabilities on the logarithmic scale: Define

$$\tilde\rho_t(a \mid h_{ti}) = \exp\left\{ \lambda_n^{b-1} \log \pi_t^0(a \mid h_{ti}) + \left(1 - \lambda_n^{b-1}\right) \log \hat\pi_t(a \mid h_{ti}) \right\} \qquad (3)$$

where $\lambda_n \in [0, 1]$ goes to zero as $n = n(i)$ grows. The randomization probability for treatment $a$ at stage $t$ given $h_{ti}$ is obtained by normalizing $\tilde\rho_t$: $\tilde\pi_t(a \mid h_{ti}) = \tilde\rho_t(a \mid h_{ti}) / \sum_{a' \in \mathcal{S}_t} \tilde\rho_t(a' \mid h_{ti})$. Thus, under (3), the historical $\pi_t^0$ influences the randomization probabilities even after AR is in effect, although its contribution goes to zero as $n$ increases when $b > 1$.
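The following R sketch assembles (1)–(3) into a single randomization-probability update. It assumes the fitted Q-values for every action at the current history, the stage-specific root mean squared error, the historical probabilities, and the current value of λ_n have already been computed; the function and argument names, and the example values, are illustrative.

ar_prob <- function(qvals, sigma.hat, pi0, b, lambda) {
  ## (1)-(2): empirical probabilities proportional to exp{(Q/sigma) log b};
  ## subtracting the worst action's Q-value cancels in the normalization
  rho.hat <- exp(qvals / sigma.hat * log(b))
  pi.hat  <- rho.hat / sum(rho.hat)
  ## (3): geometric weighting of historical and empirical probabilities,
  ## with weight lambda^(b-1) on the historical pi0
  w       <- lambda^(b - 1)
  rho.til <- exp(w * log(pi0) + (1 - w) * log(pi.hat))
  rho.til / sum(rho.til)
}

## Example: two options with estimated Q-values 12 and 8, root-MSE 6.7,
## historical probabilities (0.6, 0.4), base b = 10, and lambda = 0.5
ar_prob(qvals = c(12, 8), sigma.hat = 6.7, pi0 = c(0.6, 0.4), b = 10, lambda = 0.5)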

2.3. Design Parameters

In addition to $\{\pi_t^0\}$, a SMART-AR design requires specifying $b$, $N_{\min}$, and $\{\lambda_n\}$. The base $b$ indicates how "greedy" the AR scheme is. When $b = 1$, the trial reduces to a non-adaptive SMART using the historical $\pi_t^0$ for allocation. When $b > 1$, the allocation tends to favor the treatments with larger $Q_t(h_{ti}, a; \hat\theta_t)$. Heuristically, we may view $\hat\Delta_{ti}(a)$ as an empirical version of the effect size $\Delta_{ti}(a) = \{Q_t(h_{ti}, a) - Q_t(h_{ti}, a^w)\}/\sigma_t$ between $a$ and $a^w$, where $\sigma_t$ is the standard deviation of the (pseudo-)outcome and $a^w$ is the worst action based on the true Q-function; hence $\hat\pi_t$ is expected to be close to (2) with $\hat\Delta_{ti}(a)$ replaced by $\Delta_{ti}(a)$. For example, when there are two possible actions $\mathcal{S}_t = \{a, a^w\}$, under a moderate effect size $\Delta_{ti}(a) = 0.5$ per Cohen (1988), the empirical $\hat\pi_t(a \mid h_{ti})$ approximates $b^{1/2}/(1 + b^{1/2})$, which equals 0.59 when $b = 2$, and 0.91 when $b = 100$. In this example, setting $b$ from 2 to 100 spans a reasonably wide range of "greediness" for a moderate effect size.

The minimum sample size $N_{\min}$ indicates how early AR comes into effect. The choice of $N_{\min}$ naturally depends on $\dim(\theta_t)$, so that the Q-functions can be reliably estimated. Based on our experience with linear regression models, one should set $N_{\min} \ge 3 \times \max_t \dim(\theta_t)$.

While there are many possible choices of $\lambda_n$, we consider $\lambda_n = \tau^{1/(b-1)} N_{\min}/n$ for $b > 1$ and $\tau \in [0, 1]$. The value of $\tau$ attenuates the greediness of AR via the weight $\lambda_n^{b-1}$ given to $\pi_t^0$, which equals $\tau$ when AR first comes into effect (i.e., when $n = N_{\min}$). Generally, a larger value of $\tau$ leads to more attenuation, whereas $\tau = 0$ implies no attenuation, so that the SMART-AR uses the empirical randomization probabilities for allocation, that is, $\tilde\pi_t \equiv \hat\pi_t$. Together with $b$ and $N_{\min}$, the attenuation parameter $\tau$ allows a wide range of specifications of a SMART-AR design. Additional properties of this form of $\lambda_n$ and the design parameters are given in the web supplement. Calibration of these design parameters is further discussed in Section 3.2.
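As a quick numerical check of this choice, the small R sketch below computes the weight $\lambda_n^{b-1}$ placed on $\pi_t^0$ in (3); the inputs are illustrative.

hist_weight <- function(n, b, Nmin, tau) {
  lambda <- tau^(1 / (b - 1)) * Nmin / n
  lambda^(b - 1)   # weight given to the historical pi_t^0 in (3)
}
hist_weight(n = 30,  b = 10, Nmin = 30, tau = 0.75)   # equals tau when n = Nmin
hist_weight(n = 100, b = 10, Nmin = 30, tau = 0.75)   # decays toward 0 as n grows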

3. Application: Design of a Depression Treatment Program

3.1. Analysis of Historical Data: Specification of $\pi_t^0$

In this section, we describe a SMART-AR design for a depression treatment program with the aim of reducing depression symptoms 6 months after enrollment in the program. Specifically, the final outcome $Y$ is defined as the 6-month reduction in Beck Depression Inventory (BDI). We consider two treatment options, medication or PST, that are to be given at baseline and possibly modified during the step period (6 to 8 weeks after the initial treatment). Specifically, let $A_{ti}$ denote the indicator of patient $i$ receiving PST at stage $t$ for $t = 1, 2$; that is, $\mathcal{S}_t = \{0, 1\}$ for $t = 1, 2$, with 0 indicating medication and 1 indicating PST. After the step period, each patient is evaluated with an intermediate BDI; let $R_i$ denote the indicator that the intermediate BDI decreased by at least 3 units during the step period for patient $i$. Thus, the longitudinal trajectory of patient $i$ is given by $(A_{1i}, R_i, A_{2i}, Y_i)$, so that there are two stages of decisions with $H_{1i} = \emptyset$ and $H_{2i} = (A_{1i}, R_i)$.

We set up the initial randomization probabilities $\pi_t^0$ in the SMART-AR using the data of CODIACS, in which information about $(A_{ti}, R_i, Y_i)$ is available for 108 subjects, with 56 receiving no PST and 52 receiving PST at baseline. We apply Q-learning to the data with

$$Q_2(h_{2i}, a_{2i}; \theta_2) = \beta_0 + \beta_1 a_{1i} + \beta_2 a_{2i} + \beta_3 a_{1i} a_{2i} + \gamma_1 r_i + \gamma_2 r_i (1 - a_{1i}) a_{2i} + \gamma_3 r_i a_{1i} (1 - a_{2i}) \qquad (4)$$

for stage 2, where $\theta_2 = (\beta_0, \beta_1, \beta_2, \beta_3, \gamma_1, \gamma_2, \gamma_3)$; and $Q_1(h_{1i}, a_{1i}; \theta_1) = Q_1(a_{1i}; \theta_1) = \alpha_0 + \alpha_1 a_{1i}$ for stage 1, where $\theta_1 = (\alpha_0, \alpha_1)$. The Q-function (4) is parameterized so that negative values of $\gamma_2$ and $\gamma_3$ support play-the-winner for medication and PST, respectively, as the initial treatment. The least squares estimate $\hat\gamma_2 = -13$ ($P = 0.03$) is in line with the expectation that a patient showing response to initial medication should continue with the medication. On the other hand, the least squares estimate $\hat\gamma_3 = 6.5$ with $P = 0.27$ is ambivalent about whether playing the winner is optimal for PST. As PST aims to improve a patient's mental health by teaching them how to systematically solve self-identified psychological problems, it is conceivable that additional PST sessions may not be beneficial once the patient acquires the skills.

Table 1 summarizes the results of Q-learning, which estimates $\hat d_1 = 1$ and $\hat d_2(h_{2i}) = \hat d_2(1, r_i) = 0$ for $r_i = 0, 1$ as the optimal decisions. In other words, the optimal sequence is non-dynamic in that it starts with PST and switches to medication regardless of the intermediate response. It is instructive also to look at the results for patients starting with medication ($a_{1i} = 0$): the optimal follow-up decision in stage 2 is to switch to PST for patients who do not respond ($r_i = 0$) and to stay on medication for those who do ($r_i = 1$). This analysis supports playing the winner with medication as the initial treatment. To evaluate the robustness of the analysis, we also performed Q-learning under a saturated Q-function

Table 1.

Summary of Q-learning of the CODIACS data. The probabilities $\hat\pi_t^{\mathrm{CODIACS}}$ are calculated according to (2) with $b = 2$, and are used as the historical randomization probabilities $\pi_t^0$ in the SMART-AR for the depression treatment program.

Stage 1:

  a1    Q1(a1; θ̂1)    π̂^CODIACS(a1)
  0     10.2           0.33
  1     15.4           0.67

Stage 2:

  (a1, r)    a2    Q2(a1, r, a2; θ̂2)    π̂^CODIACS(a2 | a1, r)
  (0, 0)     0      2.2                  0.30
  (0, 0)     1     10.5                  0.70
  (0, 1)     0     10.0                  0.62
  (0, 1)     1      5.2                  0.38
  (1, 0)     0      7.8                  0.60
  (1, 0)     1      4.0                  0.40
  (1, 1)     0     22.0                  0.74
  (1, 1)     1     11.7                  0.26

Mean squared errors due to θ̂t are σ̂1² = 24.6 and σ̂2² = 45.2.

$$Q_2^{\mathrm{Sat}}(h_{2i}, a_{2i}) = \beta_0 + \beta_1 a_{1i} + \beta_2 a_{2i} + \beta_3 a_{1i} a_{2i} + \gamma_1 r_i + \gamma_2 r_i a_{2i} + \gamma_3 r_i a_{1i} + \gamma_4 r_i a_{1i} a_{2i} \qquad (5)$$

for stage 2. Note that (5) reduces to (4) when $\gamma_4 = -(\gamma_2 + \gamma_3)$. Applying Q-learning with $Q_2^{\mathrm{Sat}}$ resulted in the same optimal sequence ($\hat d_1 = 1$, $\hat d_2 = 0$) as that with (4); this was equivalent to choosing the strategy with the maximum marginal means. Also, the Q-functions are estimated with very similar values, with $\hat Q_1(0) = 10.7$ and $\hat Q_1(1) = 15.4$ (cf. Table 1). This indicates that model (4) is adequate, while being slightly more parsimonious than the saturated model.

Based on the estimated Q-functions in Table 1, we evaluated $\{\hat\pi_t^{\mathrm{CODIACS}}(a \mid h_{ti})\}$ according to (2) with $b = 2$. The entire set of $\hat\pi_t^{\mathrm{CODIACS}}$ is also given in Table 1; these values would in turn be used as the historical randomization probabilities $\pi_t^0$ in the SMART-AR for the depression treatment program. We used $b = 2$ in the calculation of $\hat\pi_t^{\mathrm{CODIACS}}$ so as to avoid extreme initial randomization probabilities in the SMART-AR.
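As a check on the computation in (2), the following R snippet reproduces the stage-1 probabilities in Table 1 from the reported estimates $Q_1(0) = 10.2$, $Q_1(1) = 15.4$, and $\hat\sigma_1^2 = 24.6$ with $b = 2$.

b      <- 2
q1     <- c(10.2, 15.4)   # estimated Q1(a1) for a1 = 0, 1
sigma1 <- sqrt(24.6)      # root mean squared error at stage 1
rho    <- exp(q1 / sigma1 * log(b))
round(rho / sum(rho), 2)  # 0.33 and 0.67, matching Table 1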

3.2. Design Calibration

We take the general calibration approach of Lee and Cheung (2009, 2011) to determine the SMART-AR design parameters $(b, N_{\min}, \tau)$. First, we iterate the design parameter values over a three-dimensional grid. Specifically, in light of the discussion in Section 2.3, we considered $b = 2, e, 5, 10, 20, 100$; $N_{\min} = 20, 30, 40, 50$; and $\tau = 0, 0.25, 0.50, 0.75, 1.00$.

Second, for each triplet $(b, N_{\min}, \tau)$, we simulate SMART-AR under a set of pre-specified calibration scenarios. Specifically, the outcomes $Y$ in the simulations were generated as normal with mean specified according to (5) and variance 45; the parameter values of Scenarios 1–4 in Table 2 were used as the calibration scenarios, and these values were chosen so that the analysis model (4) was correct. The optimal DTR $d^*$ and the worst DTR $d^w$ and their values are also given in Table 2; the value of a DTR $d$ is defined as $V(d) = E_d(Y)$, the expected outcome $Y$ under $d$. The intermediate responses ($R$) were generated as Bernoulli with $\Pr(R = 1 \mid A_1 = 0) = 0.52$ and $\Pr(R = 1 \mid A_1 = 1) = 0.54$ based on the results in CODIACS. In each simulated trial, inter-enrollment times were simulated according to a Poisson process with a rate of four patients per month based on our clinical expectation.
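For concreteness, the following R sketch generates one patient trajectory under this calibration mechanism; the θ values shown are those of Scenario 1 in Table 2, and the stage-2 assignment here is a balanced placeholder rather than the AR rule.

gen_patient <- function(a1) {
  th <- c(b0 = 2.2, b1 = 5.6, b2 = 8.3, b3 = -12,
          g1 = 7.7, g2 = -13, g3 = 6.5, g4 = 6.6)
  r  <- rbinom(1, 1, if (a1 == 0) 0.52 else 0.54)   # intermediate response
  a2 <- rbinom(1, 1, 0.5)                           # placeholder stage-2 assignment
  mu <- unname(th["b0"] + th["b1"]*a1 + th["b2"]*a2 + th["b3"]*a1*a2 +
               th["g1"]*r + th["g2"]*r*a2 + th["g3"]*r*a1 + th["g4"]*r*a1*a2)
  c(a1 = a1, r = r, a2 = a2, y = rnorm(1, mean = mu, sd = sqrt(45)))
}
gen_patient(a1 = 1)   # one simulated trajectory (A1, R, A2, Y)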

Table 2.

Parameter values for model (5) used in simulation, and the corresponding optimal DTR $d^*$ and worst DTR $d^w$. Scenarios 1–4, under which (4) is correct, are used as calibration scenarios for the SMART-AR design. The analysis model (4) is misspecified under Scenarios 5 and 6.

  Variable      Scenario 1   Scenario 2   Scenario 3   Scenario 4   Scenario 5   Scenario 6
  β0                2.2          2.2          2.2          2.2          1.3          2.2
  β1                5.6          5.6          5.6          5.6          6.5          5.6
  β2                8.3          8.3          8.3          8.3          9.2          8.3
  β3              -12           -6.1        -12           -6.1        -12          -12
  γ1                7.7          7.7          7.7          7.7          9.6          7.7
  γ2              -13           -6.5        -13           -6.5        -15          -13
  γ3                6.5          6.5         -6.5         -6.5          4.6         -6.5
  γ4                6.6          0.1         19.5         13            6.4        -19.5

  Optimal DTR
  d*1               1            1            0            1            1            0
  d*2(d*1, 0)       0            1            1            1            0            1
  d*2(d*1, 1)       0            0            0            1            0            0
  V(d*)            15.5         16.5         10.2         14.2         15.5         10.2

  Worst DTR
  dw1               0            0            0            0            0            1
  dw2(dw1, 0)       0            0            0            0            0            1
  dw2(dw1, 1)       1            0            1            0            1            1
  V(dw)             3.8          6.2          3.8          6.2          3.3        -13

Third, we determine the "optimal" design based on performance measures averaged over the calibration scenarios. Specifically, we considered the program performance in the first 100 enrollees; for each simulated trial, we evaluated the adjusted value of the estimated optimal DTR $\hat d$ based on the first 100 patients and the adjusted average patient outcome, defined as:

$$AV(\hat d) = \frac{V(\hat d) - V(d^w)}{V(d^*) - V(d^w)} \quad \text{and} \quad APO_{100} = \frac{\bar Y_{100} - V(d^w)}{V(d^*) - V(d^w)} \qquad (6)$$

where $\bar Y_{100}$ denotes the mean BDI reduction of the first 100 patients. For each design under each scenario, we estimated $E\{AV(\hat d)\}$, $\mathrm{var}\{AV(\hat d)\}$, $E(APO_{100})$, and $\mathrm{var}(APO_{100})$ based on 1,000 simulation replicates. These performance metrics were then averaged across the calibration scenarios. The value and patient outcome in (6) were standardized against $d^*$ and $d^w$ so that the quantities averaged across scenarios take values on the same range (0, 1).
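A minimal R sketch of these standardized metrics is given below; it assumes the scenario's true values V(d*) and V(dw) are known (e.g., 15.5 and 3.8 in Scenario 1), and that the value of the estimated DTR and the mean outcome of the first 100 enrollees have been obtained from a simulated trial. The inputs are illustrative.

adjusted_metrics <- function(v.dhat, ybar100, v.opt, v.worst) {
  c(AV     = (v.dhat  - v.worst) / (v.opt - v.worst),   # adjusted value of dhat
    APO100 = (ybar100 - v.worst) / (v.opt - v.worst))   # adjusted average patient outcome
}
adjusted_metrics(v.dhat = 15.0, ybar100 = 11.2, v.opt = 15.5, v.worst = 3.8)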

The right panel of Figure 1 plots $E(APO_{100})$, which is a measure of quality improvement in a program, under the design parameters considered. Overall, the average patient outcome increases with large $b$, small $N_{\min}$, and small $\tau$; that is, a greedy AR scheme tends to treat more patients with the better regimens on average. However, it is important to note that a greedy AR is also associated with greater $\mathrm{var}(APO_{100})$; see the web supplement. The increase in variability is the price of adapting to data, and is in line with AR for non-dynamic treatments. The left panel of Figure 1 plots $\mathrm{var}\{AV(\hat d)\}$, which gives an indication of the efficiency of $\hat d$ subsequent to a SMART-AR. It appears that $\mathrm{var}\{AV(\hat d)\}$ increases with extreme values of $b$ and when $N_{\min} = 50$. A large $\tau$ increases variability (reduces efficiency) with non-greedy AR (e.g., $b = 2$, $N_{\min} = 50$), but improves efficiency with greedy AR (e.g., $b = 20$, $N_{\min} = 20$). This suggests that an efficient $\hat d$ is a result of non-extreme greediness in a SMART-AR. To account for both efficiency and quality improvement, our calibration strategy aims to find the triplet $(b, N_{\min}, \tau)$ that maximizes $E(APO_{100})$ among those with $\mathrm{var}\{AV(\hat d)\}$ exceeding the minimum $\mathrm{var}\{AV(\hat d)\}$ by no more than 5%. This yielded $b = 10$, $N_{\min} = 30$, and $\tau = 0.75$. We emphasize that the goal of calibration is to obtain a reasonable set of design parameters for a wide range of scenarios, while an "optimal" design is specifically tied to the choice of the calibration scenarios and the optimality criteria. It is thus reassuring to note that $E\{AV(\hat d)\}$ spans a narrow range under the design parameters considered (see web supplement); that is, the SMART-AR allows learning of the optimal DTR over a wide range of design parameters.

Figure 1. Calibration of SMART-AR over a grid of $b$, $N_{\min}$, and $\tau$: average $\mathrm{var}\{AV(\hat d)\}$ in the left panel and average $E(APO_{100})$ in the right panel. The dark dashed and solid lines respectively indicate the minimum and maximum values attained by all design parameters considered.

4. Design Comparison

Simulations were performed to compare several SMART designs. Specifically, we considered the following: (A) SMART-ARopt: the optimal SMART-AR with $b = 10$, $N_{\min} = 30$, and $\tau = 0.75$; (B) SMART-AR1: SMART-AR with $b = 1$, which amounts to a non-adaptive SMART using $\hat\pi_t^{\mathrm{CODIACS}}$ throughout; (C) SMART-B: non-adaptive SMART using balanced randomization, that is, $\pi_t \equiv 1/J_t$; (D) SMART-PTW: non-adaptive SMART with play-the-winner, by which patients responding to the initial treatment stay on the same treatment, that is, $\pi_2(a_2 \mid a_1, r = 1) = 1$ for $a_2 = a_1$ and $0$ for $a_2 \ne a_1$, and $\pi_t = 0.5$ otherwise; (E) SMART-PTWm: non-adaptive SMART with play-the-winner for medication only, that is, $\pi_2(0 \mid a_1 = 0, r = 1) = 1$ and $\pi_2(1 \mid a_1 = 0, r = 1) = 0$, and $\pi_t = 0.5$ otherwise. While SMART-ARopt was calibrated with respect to Scenarios 1–4 in Table 2, we also evaluated the designs under scenarios in which the analysis model (4) is misspecified, namely Scenarios 5 and 6, to assess robustness. In addition, while we considered a patient accrual rate of four per month in the calibration, we also examined the methods under different accrual rates.

Figure 2 plots the expected BDI reduction of a patient at each given enrollment number up to the 100th enrollee; the smoothed curves were obtained using locally weighted regression. Generally, the expected patient outcome improves over time under SMART-ARopt in all scenarios, but remains constant throughout a non-adaptive SMART. Since faster patient accrual implies that more patients are enrolled before adaptation comes into effect, it delays the improvement under SMART-ARopt. In Scenario 1, the true parameter values are chosen based on the regression analysis of the CODIACS data; therefore, SMART-AR1 is expected to perform well when compared to the other non-adaptive SMART designs. Even in this case, the use of AR still provides improvements over time: by the 100th enrollee, the expected BDI reduction improves by about 2.5 units. Scenarios 2, 5, and 6 produce qualitatively similar results, and it is interesting to note that AR yields improvements in Scenarios 5 and 6 even though the analysis model is misspecified. In Scenarios 3 and 4, play-the-winner tends to produce better BDI reduction than the other non-adaptive programs. However, SMART-ARopt is able to compensate for the initial deficit and surpasses SMART-PTW by the 100th enrollee, even under an accrual rate that doubles our original expectation.

Figure 2. Program performance (average BDI reduction) of SMART-AR under various accrual rates. The SMART-AR is applied with $b = 10$, $N_{\min} = 30$, and $\tau = 0.75$. The non-adaptive SMART-AR1 is indicated by 'o', SMART-B by '+', SMART-PTW by 'p', and SMART-PTWm by 'm'. The dark solid line at the top of each panel corresponds to the performance of the true optimal non-randomized DTR $d^*$.

SMART-B is outperformed by some of the other non-adaptive schemes in all scenarios. This is not surprising: by aiming to balance sample sizes among all possible treatment sequences, a balanced design assigns patients to the worst sequence $d^w$, which will likely preclude it from being an optimal program. To further examine the use of balanced randomization as an initial randomization scheme in a SMART-AR, we ran simulations of SMART-ARopt using the initial randomization probabilities $\pi_t^0 \equiv 1/J_t$. Figure 3 shows that AR is able to correct for a poor choice of the initial probabilities. Under Scenario 1, for example, the program starting with balanced randomization improves by over 4 units in BDI change over the 25-month period under SMART-ARopt. (To put the magnitude of improvement in perspective, the centralized care arm in CODIACS improved upon standard care by 3 units on the BDI scale.) Having said that, this simulation study indicates the importance of a reliable choice of $\pi_t^0$ based on historical data, as far as treating the early enrollees is concerned.

Figure 3. Program performance (average BDI reduction) of SMART-AR with different initial randomization probabilities. The SMART-AR is applied with $b = 10$, $N_{\min} = 30$, and $\tau = 0.75$ under an accrual rate of 4 per month. The non-adaptive SMART-AR1 is indicated by 'o', SMART-B by '+', SMART-PTW by 'p', and SMART-PTWm by 'm'. The dark solid line at the top of each panel corresponds to the performance of the true optimal non-randomized DTR $d^*$.

To examine the learning ability of the designs, Table 3(a) gives some properties of $\hat d$ evaluated based on model (4) using the first 100 enrollees. Under SMART-PTWm, the parameter $\gamma_2$ is not estimable because $R(1 - A_1)A_2$ is completely confounded with the main effects $A_1$, $R$, and $A_2$; thus, there are no Q-learning results for this design (nor for SMART-PTW, for the same reason). SMART-ARopt, SMART-AR1, and SMART-B have similar accuracy in terms of the probability of correctly estimating $d^*$ and the adjusted value $E\{AV(\hat d)\}$. On the other hand, SMART-ARopt yields a more efficient estimator, with smaller variability in its value than SMART-B. At first glance, this may seem implausible, because balanced randomization maximizes the comparative power between two treatment sequences. However, since our goal is not to compare all possible sequences, but rather to identify the optimal one among the good ones, AR allocates resources to the promising sequences, thus maximizing the resolution of the relevant comparisons. This simulation shows that incorporating AR into a SMART not only allows learning, but also improves learning relative to the non-adaptive designs. As the accrual rate increases, SMART-ARopt converges to SMART-AR1, and thus $\mathrm{var}\{AV(\hat d)\}$ under SMART-ARopt rises and approaches that of SMART-AR1.

Table 3.

Properties of $\hat d$ under SMART-B, SMART-AR1, and SMART-ARopt using analysis models (4) and (5) under different patient accrual rates. The accrual rates (4, 6, and 8 per month) apply to SMART-ARopt; SMART-B (Bal.) and SMART-AR1 are non-adaptive.

(a) Q-learning using model (4)

                       Bal.    AR1    ARopt
  Accrual rate:                        4      6      8
  Scenario 1
    Pr(d̂ = d*)        0.91   0.94   0.95   0.95   0.95
    E{AV(d̂)}          0.98   0.99   0.99   0.99   0.99
    var{AV(d̂)} × 10³  3.05   2.31   1.32   1.67   1.98
  Scenario 2
    Pr(d̂ = d*)        0.75   0.78   0.80   0.77   0.77
    E{AV(d̂)}          0.97   0.97   0.98   0.97   0.97
    var{AV(d̂)} × 10³  4.61   5.26   2.56   2.90   4.11
  Scenario 3
    Pr(d̂ = d*)        0.53   0.51   0.51   0.53   0.53
    E{AV(d̂)}          0.95   0.95   0.96   0.96   0.95
    var{AV(d̂)} × 10³  8.47  11.00   7.38   7.24  10.74
  Scenario 4
    Pr(d̂ = d*)        0.75   0.76   0.79   0.75   0.74
    E{AV(d̂)}          0.95   0.95   0.96   0.95   0.95
    var{AV(d̂)} × 10³  9.79  13.83   7.97  11.14  12.09
  Scenario 5
    Pr(d̂ = d*)        0.91   0.91   0.91   0.91   0.91
    E{AV(d̂)}          0.99   0.99   0.99   0.99   0.99
    var{AV(d̂)} × 10³  1.85   1.73   1.23   1.55   1.66
  Scenario 6
    Pr(d̂ = d*)        0.01   0.00   0.16   0.10   0.05
    E{AV(d̂)}          0.84   0.77   0.92   0.90   0.86
    var{AV(d̂)} × 10³  9.14   5.23   2.52   4.20   4.61

(b) Q-learning using model (5)

                       Bal.    AR1    ARopt
  Accrual rate:                        4      6      8
  Scenario 1
    Pr(d̂ = d*)        0.90   0.93   0.94   0.94   0.94
    E{AV(d̂)}          0.98   0.99   0.99   0.99   0.99
    var{AV(d̂)} × 10³  3.49   2.54   1.54   2.25   2.19
  Scenario 2
    Pr(d̂ = d*)        0.73   0.76   0.78   0.77   0.76
    E{AV(d̂)}          0.96   0.97   0.97   0.97   0.97
    var{AV(d̂)} × 10³  5.40   6.15   3.74   3.50   5.41
  Scenario 3
    Pr(d̂ = d*)        0.51   0.50   0.52   0.52   0.52
    E{AV(d̂)}          0.95   0.94   0.96   0.96   0.95
    var{AV(d̂)} × 10³  8.26  12.75   8.15   7.64  12.02
  Scenario 4
    Pr(d̂ = d*)        0.74   0.74   0.76   0.74   0.73
    E{AV(d̂)}          0.95   0.94   0.95   0.94   0.94
    var{AV(d̂)} × 10³ 10.68  16.57  11.00  13.59  14.75
  Scenario 5
    Pr(d̂ = d*)        0.82   0.85   0.88   0.86   0.86
    E{AV(d̂)}          0.98   0.98   0.99   0.98   0.98
    var{AV(d̂)} × 10³  3.18   2.48   2.29   2.26   2.27
  Scenario 6
    Pr(d̂ = d*)        0.78   0.76   0.76   0.78   0.77
    E{AV(d̂)}          0.98   0.98   0.98   0.98   0.98
    var{AV(d̂)} × 10³  1.26   1.80   1.41   1.33   1.42

Table 3(b) shows the Q-learning results using the saturated model (5), which is in effect nonparametric. In most scenarios, using model (4) yields smaller $\mathrm{var}\{AV(\hat d)\}$ than using (5), even at times when (4) is incorrect; cf. Scenario 5. In Scenario 6, the nonparametric analysis is advantageous in terms of the probability of correctly estimating $d^*$. Its advantage is less pronounced in terms of $E\{AV(\hat d)\}$, as Q-learning using model (4) often leads to selecting the second-best DTR, which is not far worse than $d^*$. Thus, there seems to be an intrinsic robustness of Q-learning using a misspecified model in terms of the adjusted value. In addition, under an accrual rate of 4 per month, Q-learning using (4) in SMART-ARopt has better $E\{AV(\hat d)\}$ and $\mathrm{var}\{AV(\hat d)\}$ than in the non-adaptive SMARTs. This suggests that AR further enhances the robustness of Q-learning.

5. Discussion

In this article, we have introduced a novel synthesis of Q-learning and adaptive randomization for designing a SMART-AR, and described how to initiate a depression treatment program based on the CODIACS study. In our application, we defined the intermediate response by a BDI reduction of at least 3 units. One could also classify a response into more than two categories, for example, worsening (< 0), no improvement (0–2), and clinically significant improvement (≥ 3), and apply different randomization probabilities for each category. Such design enrichment needs to be coupled with a larger model for the Q-function. Generally, with $C$ response categories in a two-stage DTR, $\dim(\theta_2)$ is of the order $J_1 \times C \times J_2$; the actual number of parameters depends on how many interaction terms are postulated, and it is important to include interactions in the search for the optimal DTR. Therefore, when a program starts and data are few, it is necessary to consider "simple" designs as in Section 3. As enrollment grows, one can enrich the design to account for more information by incorporating interaction effects with patient covariates, refining and increasing treatment options (such as different classes of medications), redefining the intermediate response, and adopting more than two stages of treatment. The proposed AR scheme accommodates such enrichment to the extent that Q-learning is feasible given the sample size. In addition, we have demonstrated in our simulation study that the use of AR improves the performance and robustness of Q-learning by allocating patients away from treatment sequences that are not promising. Similar observations have been noted for adaptive procedures for selection among multiple non-dynamic treatments (e.g., Cheung, 2008). In the context of DTRs, due to the curse of dimensionality, this advantage offered by AR is potentially enormous.

Since the weighted average (3) combines the historical and the empirical inputs on the probability scale, it can easily be applied with other empirical randomization schemes. For example, the empirical component (2) may be replaced with an ε-optimal criterion as in Lee et al. (2014), who adopt a Bayesian approach and estimate the Q-function using the full likelihood. Our approach, while less formal than a fully Bayesian approach, offers flexibility: it can be applied with any other reinforcement learning technique that has an explicit objective function, and can easily be extended to deal with a broader set of problems, such as when the outcome is binary (e.g., Moodie, Dean, and Sun, 2013). To use either approach, however, it is critical to properly calibrate the design parameters: $(b, N_{\min}, \tau)$ in our method, and an ε-sequence and the prior distribution for the Bayesian model in Lee et al. (2014).

As with any sequential method, the advantages of AR diminish with fast patient accrual, as demonstrated in our simulation: a SMART-AR converges to a non-adaptive SMART as more patients are enrolled before adaptation begins. The non-adaptive SMART can thus be used to provide a lower bound on the performance of SMART-AR. This underlines the crucial role of the historical input $\pi_t^0$. This point should be read in light of the nature of implementation research, in which the SMART-AR is used as a dissemination tool to deploy existing treatments to a community. Thus, historical perspective is often available and useful, and may prove more beneficial than applying balanced randomization initially.

Supplementary Material

Supp Material

Footnotes

Supplementary Materials: The R code used to perform the simulations in Section 3 is available with this paper as a web supplement at the Biometrics website on Wiley Online Library. The web-based supplementary materials also contain additional properties of the AR design mentioned in Section 2.3 and the full calibration results of Section 3.2.

References

  1. Barker AD, Sigman CC, Kelloff GJ, Hylton NM, Berry DA, Esserman LJ. I-SPY 2: An adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. Clinical Pharmacology & Therapeutics. 2009;86:97–100. doi:10.1038/clpt.2009.68
  2. Chakraborty B, Laber EB, Zhao Y. Inference for optimal dynamic treatment regimes using an adaptive m-out-of-n bootstrap scheme. Biometrics. 2013;69:714–723. doi:10.1111/biom.12052
  3. Cheung YK. Simple sequential boundaries for treatment selection in multi-armed randomized clinical trials with a control. Biometrics. 2008;64:940–949. doi:10.1111/j.1541-0420.2007.00929.x
  4. Cheung K, Duan N. Design of implementation studies for quality improvement programs: an effectiveness-cost-effectiveness framework. American Journal of Public Health. 2014;104:e23–30. doi:10.2105/AJPH.2013.301579
  5. Cheung YK, Inoue LYT, Wathen JK, Thall PF. Continuous Bayesian adaptive randomization based on event times with covariates. Statistics in Medicine. 2006;25:55–70. doi:10.1002/sim.2247
  6. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
  7. Davidson KW, Bigger JT, Burg MM, et al. Centralized, stepped, patient preference-based treatment of patients with post-acute coronary syndrome depression: CODIACS vanguard randomized controlled trial. JAMA Internal Medicine. 2013;173:997–1004. doi:10.1001/jamainternmed.2013.915
  8. Lavori PW, Dawson R. Dynamic treatment regimes: practical design considerations. Clinical Trials. 2004;1:9–20. doi:10.1191/1740774s04cn002oa
  9. Lee J, Thall PF, Ji Y, Müller P. Bayesian dose-finding in two treatment cycles based on the joint utility of efficacy and toxicity. Journal of the American Statistical Association. 2014. doi:10.1080/01621459.2014.926815
  10. Lee SM, Cheung YK. Model calibration in the continual reassessment method. Clinical Trials. 2009;6:227–238. doi:10.1177/1740774509105076
  11. Lee SM, Cheung YK. Calibration of prior variance in the Bayesian continual reassessment method. Statistics in Medicine. 2011;30:2081–2089. doi:10.1002/sim.4139
  12. Moodie EEM, Dean N, Sun YR. Q-learning: Flexible learning about useful utilities. Statistics in Biosciences. 2013. doi:10.1007/s12561-013-9103-z
  13. Murphy SA. Optimal dynamic treatment regimes (with discussion). Journal of the Royal Statistical Society, Series B. 2003;65:331–366.
  14. Murphy SA. A generalization error for Q-learning. Journal of Machine Learning Research. 2005a;6:1073–1097.
  15. Murphy SA. An experimental design for the development of adaptive treatment strategies. Statistics in Medicine. 2005b;24:1455–1481. doi:10.1002/sim.2022
  16. Robins J. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent Variable Modeling and Applications to Causality. New York: Springer; 1997. pp. 69–117.
  17. Rubin D. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701.
  18. Rush AJ, Fava M, Wisniewski SR. Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design. Controlled Clinical Trials. 2004;25:119–142. doi:10.1016/s0197-2456(03)00112-0
  19. Schneider LS, Tariot PN, Lyketsos CG, et al. National Institute of Mental Health clinical antipsychotic trials of intervention effectiveness (CATIE): Alzheimer disease trial methodology. American Journal of Geriatric Psychiatry. 2001;9:346–360.
  20. Thall PF, Millikan RE, Sung HG. Evaluating multiple treatment courses in clinical trials. Statistics in Medicine. 2000;19:1011–1028.
  21. Thall PF, Wathen JK. Practical Bayesian adaptive randomisation in clinical trials. European Journal of Cancer. 2007;43:859–866. doi:10.1016/j.ejca.2007.01.006
  22. Thall PF, Wooten LH, Logothetis CJ, Millikan RE, Tannir NM. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Statistics in Medicine. 2007;26:4687–4702. doi:10.1002/sim.2894
  23. Thombs BD, de Jonge P, Coyne JC, Whooley MA, Frasure-Smith N, Mitchell AJ, Zuidersma M, Eze-Nliam C, Lima BB, Smith CG, Soderlund K, Ziegelstein RC. Depression screening and patient outcomes in cardiovascular care. Journal of the American Medical Association. 2008;300:2161–2171. doi:10.1001/jama.2008.667
  24. Watkins CJCH. Learning from Delayed Rewards. PhD dissertation. Cambridge University; 1989.
