Author manuscript; available in PMC: 2016 Apr 10.
Published in final edited form as: Biometrics. 2015 Sep 13;72(1):10–19. doi: 10.1111/biom.12391

Comparing Treatment Policies with Assistance from the Structural Nested Mean Model

Xi Lu 1,*, Kevin G Lynch 2,**, David W Oslin 2,***, Susan Murphy 1,****
PMCID: PMC4789134  NIHMSID: NIHMS728071  PMID: 26363892

Summary

Treatment policies, also known as dynamic treatment regimes, are sequences of decision rules that link the observed patient history with treatment recommendations. Multiple plausible treatment policies are frequently constructed by researchers using expert opinion, theories and reviews of the literature. Often these different policies represent competing approaches to managing an illness. Here we develop an “assisted estimator” that can be used to compare the mean outcome of competing treatment policies. The term “assisted” refers to the fact that estimators from the Structural Nested Mean Model, a parametric model for the causal effect of treatment at each time point, are used in the process of estimating the mean outcome. This work is motivated by the problem of comparing the mean outcome of two competing treatment policies using data from the ExTENd study in alcohol dependence.

Keywords: Adaptive intervention, dynamic treatment regime, semiparametric model, sequential multiple assignment randomized trial

1. Introduction

In many areas of health, treatment response is heterogeneous, in which case clinicians need to consider providing a sequence of treatments in order to obtain a sufficient treatment response. Furthermore, patients with chronic illnesses often require changes in treatment, that is, sequences of treatments, to maintain a good response. As a result, clinical scientists have become increasingly interested and active in the development of interventions composed of treatment sequences (Lavori and Dawson, 2000) in various fields including alcoholism (Oslin, 2005), substance abuse (Jones et al., 2011; McKay, 2009), leukemia (Thall et al., 2002) and autism spectrum disorder (Kasari, 2009). Ideally the treatment sequences are adapted to accommodate treatment response heterogeneity and thus result in more efficacious and less burdensome or costly treatment. Treatment policies (Lunceford et al., 2002; Wahed and Tsiatis, 2004, 2006), also called dynamic treatment regimes (Robins, 1986; Murphy et al., 2001), adaptive treatment strategies (Lavori et al., 2000; Murphy, 2005) or adaptive interventions (Nahum-Shani et al., 2012a,b), operationalize the dynamic adaptation via a sequence of decision rules, one for each stage in the treatment process; each decision rule takes as input measurements of the patient's time-varying covariates and outputs a recommended treatment.

Often scientists construct treatment policies that represent competing approaches to managing an illness. For example, in the treatment of ADHD, the American Psychological Association recommends starting with behavioral treatment and moving to medication only if the behavioral treatment is not effective (Brown et al., 2008), whereas the American Academy of Child and Adolescent Psychiatry recommends starting with medication (Pliszka and AACAP Work Group on Quality Issues, 2007). Alternatively, one treatment policy might represent the least intensive or least costly version of treatment, whereas another might represent the most intensive, most costly version. For example, the Extending Treatment Effectiveness of Naltrexone (ExTENd) trial of alcohol dependence treatments (PI: Oslin (Oslin, 2005)) involves multiple treatment policies, of which one is the most intensive and another is the least intensive.

A common approach to estimating and comparing the mean outcomes of competing treatment policies is to use a non-parametric estimation procedure that involves inverse-probability weights (IPW), such as those described in Murphy et al. (2001) and Zhang et al. (2013). These estimators are non-parametric in the sense that they do not require models relating baseline or time-varying covariates to the outcome. Robins and colleagues (Robins, 1997b; Orellana et al., 2010) generalized the methods of Murphy et al. (2001) by parameterizing mean outcomes, with each value of the parameter representing a different policy in a class of treatment policies. In this manuscript, we develop an alternative approach for comparing treatment policies. This approach combines the non-parametric IPW estimators of the mean outcome with a model-based approach built on Robins’ Structural Nested Mean Model (SNMM; Robins (1994)). In the SNMM, intermediate treatment effect functions, also called “treatment blips,” are modeled parametrically. The intermediate treatment effects isolate the causal effect of treatment at each time point, conditional on the baseline and time-varying covariate history up to that time point. The resulting estimator is an “assisted” estimator in that the model-based approach assists the non-parametric estimator in estimating the mean outcomes of competing treatment policies.

Throughout this paper we focus on the comparison of two-stage treatment policies. Restricting attention to two stages suffices to present the main ideas; in addition, most sequentially randomized trials, also known as Sequential Multiple Assignment Randomized Trials (SMARTs) (Lavori and Dawson, 2004; Murphy, 2005), involve two stages of treatment. ExTENd is a two-stage SMART. In Section 2, we formulate the estimand precisely and provide a class of assisted estimators for the mean outcome based on data from a SMART, together with their theoretical properties. In Section 3, we briefly describe how these estimators can be used to compare treatment policies and to conduct inference. Simulation studies in Section 4 investigate different aspects of the methodology, including the performance of the proposed estimator under various degrees of misspecification of the treatment effects. In Section 5, the methodology is illustrated by an analysis of the ExTENd data. Finally, Section 6 presents a discussion of the paper, including ideas for future work. Proofs of the theorems and lemmas are relegated to Web Appendix A.

2. Assisted Estimator for Policy Value

A two-stage treatment policy consists of two decision rules, $d = (d_1, d_2)$. Each decision rule takes as input the patient information available at the current stage and outputs a treatment recommendation. Denote the outcome by $Y$ ($Y$ may be observed after the study or may be a function of the data collected during the study). The value of a policy is the expectation of $Y$ that would result if treatments were selected using the treatment policy $d$. A useful way to define the value of a policy is via the potential outcome framework (Neyman et al., 1935; Rubin et al., 1978). For each variable and each treatment sequence, we conceptualize a “potential outcome” that would have been observed under that treatment sequence. Using $X_j$ to denote observations available prior to the $j$-th decision, the potential outcomes are $\{X_1, X_2(a_1), X_3(a_1, a_2)\}$ for all possible treatment sequences $(a_1, a_2)$. Here $X_3$ denotes observations after the second decision; the outcome $Y(a_1, a_2)$ is a known function of $\{X_1, X_2(a_1), X_3(a_1, a_2)\}$. The value of the policy $d$ is given by

$$V^d = E\Big[\, Y(a_1, a_2)\Big|_{a_2 = d_2(H_2(a_1)),\ a_1 = d_1(H_1)} \Big],$$

where $H_2(a_1) = (X_1, a_1, X_2(a_1))$ and $H_1 = X_1$. Thus $V^d$ is the marginal mean of $Y$ under the policy $d$, after integrating out $H_2(a_1)$ and $H_1$.
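To make the definition concrete, the following sketch computes $V^d$ exactly for a small hypothetical example with binary $X_1$, binary $X_2(a_1)$, binary treatments and a known conditional mean of $Y(a_1, a_2)$ given $(X_1, X_2(a_1))$. Every number in the model (`P_X1`, `p_x2`, `mean_y`) and the example policy are illustrative assumptions, not quantities from the paper; because everything is discrete, the expectation is a finite sum.

```python
from itertools import product

# Hypothetical toy model for the potential outcomes (illustrative numbers only).
P_X1 = {0: 0.6, 1: 0.4}  # distribution of the baseline covariate X1


def p_x2(x2, x1, a1):
    """P[X2(a1) = x2 | X1 = x1] for binary X2."""
    prob_one = 0.3 + 0.2 * x1 + 0.3 * a1
    return prob_one if x2 == 1 else 1.0 - prob_one


def mean_y(x1, a1, x2, a2):
    """E[Y(a1, a2) | X1 = x1, X2(a1) = x2]."""
    return 1.0 + x1 + 0.5 * a1 + x2 + a2 * (0.4 - 0.8 * x2) + 0.3 * a1 * x1


def policy_value(d1, d2):
    """V^d: mean of Y when a1 = d1(H1) and a2 = d2(H2(a1)), integrating out X1 and X2(a1)."""
    value = 0.0
    for x1, x2 in product((0, 1), repeat=2):
        a1 = d1(x1)              # H1 = X1
        h2 = (x1, a1, x2)        # H2(a1) = (X1, a1, X2(a1))
        value += P_X1[x1] * p_x2(x2, x1, a1) * mean_y(x1, a1, x2, d2(h2))
    return value


def d1(x1):
    """Example rule: treat at stage one only if X1 = 1."""
    return x1


def d2(h2):
    """Example rule: treat at stage two only if X2 = 0."""
    return 1 if h2[2] == 0 else 0


print(round(policy_value(d1, d2), 4))  # 2.42
```

Note that the stage-two rule is evaluated at the history $H_2(a_1)$ generated under the stage-one recommendation, which is exactly the nesting in the definition of $V^d$.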

The value of a treatment policy $d$ can also be written as a function of the intermediate treatment effects, or “treatment blip functions,” from Robins’ Structural Nested Mean Model (Robins, 1994). We digress briefly to define these intermediate treatment effects, which we use below; other types of treatment blip functions can be found in Murphy (2003) and Robins (2004). Corresponding to the two stages of treatment, there are two intermediate treatment effects,

$$\mu_2(h_2, a_2) = E[Y(a_1, a_2) \mid H_2(a_1) = h_2] - E[Y(a_1, 0) \mid H_2(a_1) = h_2]$$

and

$$\mu_1(h_1, a_1) = E[Y(a_1, 0) \mid H_1 = h_1] - E[Y(0, 0) \mid H_1 = h_1],$$

where $a_t = 0$ codes a reference treatment. The intermediate treatment effect $\mu_2$ quantifies the effect on the mean of $Y$ of treatment $a_2$, relative to the reference treatment at stage two, among individuals with history $h_2$. The intermediate treatment effect $\mu_1$ quantifies the effect on the mean of $Y$ of treatment $a_1$, relative to the stage-one reference treatment when always followed by the reference treatment at stage two, among individuals with history $h_1$ at stage one.

Consider treatments randomized in a trial, denoted by capital letters $A_1, A_2$, where the randomization distribution of $A_1$ given $H_1 = h_1$ is denoted by $p_1(\cdot \mid h_1)$ and the randomization distribution of $A_2$ given $H_2(A_1) = h_2$ is denoted by $p_2(\cdot \mid h_2)$. Throughout this paper we implicitly make all required measurability assumptions and assume the existence of regular conditional densities. We have the following lemma.

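Lemma 1: Assume that (i) $\max\{E|Y(a_1, a_2)|,\ E|\mu_1(H_1, a_1)|,\ E|\mu_2(H_2(a_1), a_2)|\} < \infty$ for any treatment sequence $(a_1, a_2)$ and (ii) for some $\delta > 0$, $\min_{a_1} p_1(a_1 \mid H_1) \ge \delta$ a.s. Then

$$\begin{aligned}
V^d &= E\Big[ Y(A_1, A_2) - \mu_2(H_2(A_1), A_2) - \mu_1(H_1, A_1) + \mu_1(H_1, d_1(H_1)) + \mu_2\big(H_2(a_1), d_2(H_2(a_1))\big)\Big|_{a_1 = d_1(H_1)} \Big] \\
&= E\Big[ Y(A_1, A_2) - \mu_2(H_2(A_1), A_2) - \mu_1(H_1, A_1) + \mu_1(H_1, d_1(H_1)) + \frac{I\{A_1 = d_1(H_1)\}}{p_1(A_1 \mid H_1)}\, \mu_2\big(H_2(A_1), d_2(H_2(A_1))\big) \Big]. \qquad (1)
\end{aligned}$$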

This representation of the value $V^d$ forms the basis of our method. The intuition is that the potential outcome of $Y$ under treatment policy $d$ can be recovered from the potential outcome associated with the randomized treatment sequence $(A_1, A_2)$ by subtracting the intermediate treatment effects due to $(A_1, A_2)$ and then adding the intermediate treatment effects due to the policy $d$. The ratio involving the randomization probability in the last term of (1) accounts for the fact that the stage-two intermediate treatment effect under policy $d$ depends on $H_2(a_1)\big|_{a_1 = d_1(H_1)}$ (the covariate history that would occur if the first-stage treatment were assigned according to policy $d$); that is, this ratio adjusts for the fact that $H_2(A_1)$ is not always equal to $H_2(d_1(H_1))$. Weighting the participants whose randomized $A_1$ agrees with $d_1(H_1)$ by $1/p_1(A_1 \mid H_1)$ lets this subgroup stand in for the whole population had everyone received $d_1(H_1)$.
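Because every expectation is a finite sum when all variables are discrete, identity (1) can be checked exactly on a small example. The sketch below uses a hypothetical toy model (binary covariates and treatments; all numbers, including the randomization probabilities `P1` and `P2`, are illustrative assumptions), computes $\mu_1$ and $\mu_2$ from their definitions, and evaluates both the potential-outcome definition of $V^d$ and the reweighted second line of (1).

```python
from itertools import product

# Hypothetical toy model (illustrative numbers only).
P_X1 = {0: 0.6, 1: 0.4}   # distribution of X1
P1 = {0: 0.5, 1: 0.5}     # known randomization probabilities p1(a1 | h1)
P2 = {0: 0.3, 1: 0.7}     # known randomization probabilities p2(a2 | h2)


def p_x2(x2, x1, a1):
    """P[X2(a1) = x2 | X1 = x1]."""
    prob_one = 0.3 + 0.2 * x1 + 0.3 * a1
    return prob_one if x2 == 1 else 1.0 - prob_one


def mean_y(x1, a1, x2, a2):
    """E[Y(a1, a2) | X1 = x1, X2(a1) = x2]."""
    return 1.0 + x1 + 0.5 * a1 + x2 + a2 * (0.4 - 0.8 * x2) + 0.3 * a1 * x1


def mu2(h2, a2):
    """Stage-two blip: a2 versus reference 0, given H2 = (x1, a1, x2)."""
    x1, a1, x2 = h2
    return mean_y(x1, a1, x2, a2) - mean_y(x1, a1, x2, 0)


def mu1(x1, a1):
    """Stage-one blip: a1 versus 0, each followed by the stage-two reference a2 = 0."""
    def mean_y_a_then_0(a):
        return sum(p_x2(x2, x1, a) * mean_y(x1, a, x2, 0) for x2 in (0, 1))
    return mean_y_a_then_0(a1) - mean_y_a_then_0(0)


def value_direct(d1, d2):
    """V^d from its potential-outcome definition."""
    return sum(P_X1[x1] * p_x2(x2, x1, d1(x1))
               * mean_y(x1, d1(x1), x2, d2((x1, d1(x1), x2)))
               for x1, x2 in product((0, 1), repeat=2))


def value_via_blips(d1, d2):
    """Second line of (1): an expectation over the randomized (A1, A2).

    Y(A1, A2) is replaced by its conditional mean, which leaves the expectation unchanged.
    """
    total = 0.0
    for x1, a1, x2, a2 in product((0, 1), repeat=4):
        prob = P_X1[x1] * P1[a1] * p_x2(x2, x1, a1) * P2[a2]
        h2 = (x1, a1, x2)
        weight = (a1 == d1(x1)) / P1[a1]   # I{A1 = d1(H1)} / p1(A1 | H1)
        term = (mean_y(x1, a1, x2, a2) - mu2(h2, a2) - mu1(x1, a1)
                + mu1(x1, d1(x1)) + weight * mu2(h2, d2(h2)))
        total += prob * term
    return total


def d1(x1):
    return x1


def d2(h2):
    return 1 if h2[2] == 0 else 0


print(round(value_direct(d1, d2), 6), round(value_via_blips(d1, d2), 6))  # 2.42 2.42
```

The two numbers agree for any policy, because subtracting the observed blips reduces each term to the all-reference outcome before the policy's blips are added back.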

2.1 The Data and the Estimation Method

The observed data on each participant in a two-stage SMART are $\{X_1, A_1, X_2, A_2, X_3\}$, where $X_t$ denotes covariates observed prior to the $t$-th stage (so $X_3$ is observed after the second-stage treatment) and $A_t$ denotes the $t$-th stage randomized treatment. The primary outcome $Y$ is a known function of $\{X_1, A_1, X_2, A_2, X_3\}$. Let $H_2 = (X_1, A_1, X_2)$ and $H_1 = X_1$. The randomization probability for an individual's treatment may depend on the individual's observed data, say $P[A_t = a \mid H_t] = p_t(a \mid H_t)$. For example, in many SMARTs, including ExTENd, participants who respond to the first-stage treatment are randomized among different treatments than participants who do not respond. Thus non-responding participants have probability 0 of being assigned any of the treatments available to responders, whereas responding participants have probability 0 of being assigned any of the treatments available to non-responders.
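The dependence of $p_2$ on the observed history can be sketched as a small function. For illustration, the code assumes an ExTENd-like second stage in which responders are randomized 1:1 between UC and TDM and non-responders 1:1 between CBI and NTX+CBI; the equal allocation and the string labels are assumptions, not details taken from the trial protocol.

```python
RESPONDER_OPTIONS = ("UC", "TDM")            # assumed stage-two options for responders
NONRESPONDER_OPTIONS = ("CBI", "NTX+CBI")    # assumed stage-two options for non-responders


def p2(a2, responded):
    """p2(a2 | H2) when only the response status recorded in H2 matters.

    Each group is randomized with equal probability among its own options;
    options reserved for the other group receive probability 0.
    """
    options = RESPONDER_OPTIONS if responded else NONRESPONDER_OPTIONS
    return 1.0 / len(options) if a2 in options else 0.0


print(p2("TDM", responded=True), p2("CBI", responded=True))  # 0.5 0.0
```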

To express the intermediate effects and the value (1) in terms of the observed data, we relate the observed data to the potential outcomes. We make the following assumptions (Rubin, 1986; Robins, 1997a; Robins et al., 2008): (A1) Consistency: $X_2 = X_2(A_1)$ and $X_3 = X_3(A_1, A_2)$; and (A2) Sequential Randomization: $A_1$ is independent of all potential outcomes given the observed $X_1$, and $A_2$ is independent of all potential outcomes given the observed $(X_1, A_1, X_2)$. The consistency assumption states that the observed covariates are identical to the potential outcomes of the covariates evaluated at the observed treatment sequence. In particular, this assumption implies that each subject's outcomes are not influenced by the treatments assigned to other subjects. This assumption may be violated if, for example, treatment is provided in a group setting (e.g., group counseling). The sequential randomization assumption holds by design in SMARTs because treatment is randomized.

The intermediate treatment effects and the value $V^d$ can be expressed in terms of the observed data as follows.

Lemma 2: Assume (A1) and (A2) and (i) $\max\{E|Y|,\ E|\mu_1(H_1, a_1)|,\ E|\mu_2(H_2, a_2)|\} < \infty$ for any treatment sequence $(a_1, a_2)$ and (ii) for some $\delta > 0$, $\min_{a_1} p_1(a_1 \mid H_1) \ge \delta$ a.s. Then

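  • (a) $\mu_2(h_2, a_2) = E[Y \mid H_2 = h_2, A_2 = a_2] - E[Y \mid H_2 = h_2, A_2 = 0]$,

  • (b) $\mu_1(h_1, a_1) = E\big[E[Y \mid H_2, A_2 = 0] \mid H_1 = h_1, A_1 = a_1\big] - E\big[E[Y \mid H_2, A_2 = 0] \mid H_1 = h_1, A_1 = 0\big]$, and

  • (c) $V^d = E\Big[Y - \mu_2(H_2, A_2) - \mu_1(H_1, A_1) + \mu_1(H_1, d_1(H_1)) + \frac{I\{A_1 = d_1(H_1)\}}{p_1(A_1 \mid H_1)}\, \mu_2(H_2, d_2(H_2))\Big]$.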
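Lemma 2(a) identifies the stage-two blip as a difference of observed conditional means. When $H_2$ takes only a few discrete values, a minimal nonparametric sketch is to difference cell means; the `(h2, a2, y)` record layout is an assumption for illustration, and histories observed under only one arm are skipped.

```python
from collections import defaultdict
from statistics import mean


def blip2_cell_means(records, a2=1):
    """Estimate mu2(h2, a2) = E[Y | H2=h2, A2=a2] - E[Y | H2=h2, A2=0] by cell means.

    records: iterable of (h2, a2, y) with hashable h2, e.g. a tuple (x1, a1, x2).
    """
    cells = defaultdict(lambda: defaultdict(list))
    for h2, treat, y in records:
        cells[h2][treat].append(y)
    # Keep only histories observed under both the treatment a2 and the reference 0.
    return {h2: mean(by_arm[a2]) - mean(by_arm[0])
            for h2, by_arm in cells.items()
            if by_arm[a2] and by_arm[0]}


data = [((0, 1, 0), 1, 3.0), ((0, 1, 0), 1, 5.0), ((0, 1, 0), 0, 2.0),
        ((1, 0, 1), 1, 1.0), ((1, 0, 1), 0, 2.5),
        ((1, 1, 1), 1, 9.0)]  # no reference-arm data for (1, 1, 1): skipped
print(blip2_cell_means(data))  # {(0, 1, 0): 2.0, (1, 0, 1): -1.5}
```

With continuous covariates such cells are too sparse, which is one reason the paper models the blips parametrically instead.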

Suppose the intermediate treatment effects are known up to a finite-dimensional parameter: $\mu_1(h_1, a_1) = \mu_1(h_1, a_1; \beta_1)$ and $\mu_2(h_2, a_2) = \mu_2(h_2, a_2; \beta_2)$. Robins (1994) provides a class of “g-estimators” for the parameters $\beta = (\beta_1, \beta_2)$. Each member of the class corresponds to a different choice of model for each of several nuisance functions; consistency of the g-estimators does not require correct models for the nuisance functions (see Robins (1994) for a detailed discussion). Furthermore, this class of estimators does not require knowledge of the treatment policy $d$. Thus $\beta$ can be estimated once and then used to form estimators of the values of a variety of treatment policies. In Web Appendix B, we review the class of g-estimators. Each estimator in this class is consistent for the true value $\beta_0 = (\beta_{10}, \beta_{20})$ of $\beta$ and is asymptotically normally distributed (assuming a correctly specified SNMM and some finite moment conditions). Throughout the paper we implicitly assume consistency and asymptotic normality of $\hat\beta$.
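Under a linear SNMM, the simulation section describes $\hat\beta$ as the solution to a series of least squares problems. The sketch below shows one such two-step least-squares fit; it is a hypothetical illustration, not necessarily the estimator reviewed in Web Appendix B. It assumes blips $\mu_1 = A_1(1, X_1)\beta_1$ and $\mu_2 = A_2(1, X_2)\beta_2$, correctly specified linear models for the treatment-free parts, and a simulated SMART with 1:1 randomization at both stages in which $X_2 = 0.5X_1 + 0.3A_1 + $ noise and $Y = 1 + X_1 + A_1(0.5 + 0.3X_1) + X_2 + A_2(0.4 - 0.8X_2) + \epsilon$. Because part of the stage-one effect acts through $X_2$, the true parameters are $\beta_1 = (0.8, 0.3)$ and $\beta_2 = (0.4, -0.8)$.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def simulate(n, seed=1):
    """Hypothetical SMART; true beta1 = (0.8, 0.3) and beta2 = (0.4, -0.8)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        a1 = rng.randint(0, 1)                      # 1:1 randomization at stage one
        x2 = 0.5 * x1 + 0.3 * a1 + rng.gauss(0, 1)
        a2 = rng.randint(0, 1)                      # 1:1 randomization at stage two
        y = (1 + x1 + a1 * (0.5 + 0.3 * x1) + x2
             + a2 * (0.4 - 0.8 * x2) + rng.gauss(0, 1))
        data.append((x1, a1, x2, a2, y))
    return data


def estimate_blips(data):
    """Two least-squares fits (stage two first); returns (beta1_hat, beta2_hat)."""
    # Stage two: treatment-free terms (1, x1, a1, a1*x1, x2) plus blip terms a2*(1, x2).
    rows2 = [(1.0, x1, a1, a1 * x1, x2, a2, a2 * x2) for x1, a1, x2, a2, _ in data]
    beta2 = ols(rows2, [row[4] for row in data])[5:]
    # Stage one: remove the estimated stage-two blip of the treatment actually received,
    # then fit treatment-free terms (1, x1) plus blip terms a1*(1, x1).
    y_tilde = [y - a2 * (beta2[0] + beta2[1] * x2) for _, _, x2, a2, y in data]
    rows1 = [(1.0, x1, a1, a1 * x1) for x1, a1, _, _, _ in data]
    beta1 = ols(rows1, y_tilde)[2:]
    return beta1, beta2


beta1_hat, beta2_hat = estimate_blips(simulate(20000))
print([round(b, 2) for b in beta1_hat], [round(b, 2) for b in beta2_hat])
```

The estimates should be close to $(0.8, 0.3)$ and $(0.4, -0.8)$. Nothing here depends on a policy $d$, which is why one $\hat\beta$ can serve many policy comparisons.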

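Then, given the results of Lemma 2 and the estimator $\hat\beta$, a natural assisted estimator of the value $V^d$ of the policy $d$ is

$$\hat V_0(d; \hat\beta) = \mathbb{P}_n\Big[ Y - \mu_2(H_2, A_2; \hat\beta_2) - \mu_1(H_1, A_1; \hat\beta_1) + \mu_1(H_1, d_1(H_1); \hat\beta_1) + \frac{I\{A_1 = d_1(H_1)\}}{p_1(A_1 \mid H_1)}\, \mu_2(H_2, d_2(H_2); \hat\beta_2) \Big], \qquad (2)$$

where $\mathbb{P}_n f(X_1, A_1, X_2, A_2, X_3)$ denotes the sample average of $f$ over participants. This estimator belongs to a class of assisted estimators, given by

$$\hat V_m(d; \hat\beta) = \mathbb{P}_n\Big[ Y - \mu_2(H_2, A_2; \hat\beta_2) - \mu_1(H_1, A_1; \hat\beta_1) + \mu_1(H_1, d_1(H_1); \hat\beta_1) + \frac{I\{A_1 = d_1(H_1)\}}{p_1(A_1 \mid H_1)} \big\{\mu_2(H_2, d_2(H_2); \hat\beta_2) - m(H_1, A_1)\big\} + m(H_1, d_1(H_1)) \Big], \qquad (3)$$

indexed by the function $m(h_1, a_1)$. Note that the former assisted estimator, $\hat V_0(d; \hat\beta)$, corresponds to setting $m(h_1, a_1) \equiv 0$. We have the following lemma: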

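Lemma 3: Assume that the assumptions of Lemma 2 hold. Then

  • (a) The estimating function in (3) is unbiased for any choice of $m$ that satisfies $E|m(H_1, a_1)| < \infty$ for every $a_1$.

  • (b) Assume (i) $EY^2 < \infty$; (ii) $\dot\mu_1(h_1, a_1; \beta_1) \equiv \partial \mu_1(h_1, a_1; \beta_1) / \partial \beta_1$ exists for all $\beta_1$, a.s., and $\dot\mu_2(h_2, a_2; \beta_2) \equiv \partial \mu_2(h_2, a_2; \beta_2) / \partial \beta_2$ exists for all $\beta_2$, a.s.; and (iii) there exists some $\delta > 0$ such that $\sum_{a_1} E \sup_{\|\beta_1 - \beta_{10}\| \le \delta} \big(|\mu_1(H_1, a_1; \beta_1)|^2 + \|\dot\mu_1(H_1, a_1; \beta_1)\|^2\big) < \infty$ and $\sum_{a_2} E \sup_{\|\beta_2 - \beta_{20}\| \le \delta} \big(|\mu_2(H_2, a_2; \beta_2)|^2 + \|\dot\mu_2(H_2, a_2; \beta_2)\|^2\big) < \infty$. Then, if $\hat\beta$ belongs to a subclass $\mathcal{B}$ of g-estimators, the choice of $m$ resulting in the lowest variance for $\hat V_m(d; \hat\beta)$ satisfies $m(h_1, d_1(h_1)) = E[\mu_2(H_2, d_2(H_2)) \mid H_1 = h_1, A_1 = d_1(h_1)]$.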

The subclass $\mathcal{B}$ corresponds to g-estimators for which a particular nuisance function is correctly modeled. This subclass is defined in Web Appendix B after a general review of g-estimators; in particular, in the simulation section we use an estimator $\hat\beta$ based on a correctly specified model for this nuisance function, so that $\hat\beta \in \mathcal{B}$. In Web Appendix C, we provide additional simulation results when using a $\hat\beta$ that does not belong to $\mathcal{B}$.
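Estimators (2) and (3) are sample averages, so they translate directly into code. The sketch below assumes hypothetical linear blips $\mu_1 = A_1(1, X_1)\beta_1$ and $\mu_2 = A_2(1, X_2)\beta_2$ and a simulated SMART with $p_1 = 1/2$, in which $X_2 = 0.5X_1 + 0.3A_1 + $ noise and $Y = 1 + X_1 + A_1(0.5 + 0.3X_1) + X_2 + A_2(0.4 - 0.8X_2) + \epsilon$; it plugs in the true $\beta$ for simplicity (in practice $\hat\beta$ comes from a g-estimator). Leaving `m` unset gives $\hat V_0(d; \hat\beta)$. The alternative `m` shown is $E[\mu_2(H_2, 1) \mid H_1, A_1]$ in this toy model, the kind of choice Lemma 3(b) points to for the always-treat policy, whose true value here is 1.96.

```python
import random

BETA1 = (0.8, 0.3)   # assumed stage-one blip parameters (beta1_hat in practice)
BETA2 = (0.4, -0.8)  # assumed stage-two blip parameters (beta2_hat in practice)


def mu1(x1, a1):
    """Assumed linear stage-one blip: a1 * (1, x1) beta1."""
    return a1 * (BETA1[0] + BETA1[1] * x1)


def mu2(x2, a2):
    """Assumed linear stage-two blip: a2 * (1, x2) beta2."""
    return a2 * (BETA2[0] + BETA2[1] * x2)


def simulate(n, seed=2):
    """Hypothetical SMART with 1:1 randomization at both stages."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        a1 = rng.randint(0, 1)
        x2 = 0.5 * x1 + 0.3 * a1 + rng.gauss(0, 1)
        a2 = rng.randint(0, 1)
        # The 0.3 * a1 shift in x2 is why mu1 has main effect 0.8 = 0.5 + 0.3.
        y = 1 + x1 + a1 * (0.5 + 0.3 * x1) + x2 + mu2(x2, a2) + rng.gauss(0, 1)
        data.append((x1, a1, x2, a2, y))
    return data


def assisted_value(data, d1, d2, p1=0.5, m=None):
    """Sample average in (3); m=None means m = 0, which is estimator (2).

    data: tuples (x1, a1, x2, a2, y); d1(x1) and d2(x1, a1, x2) return 0 or 1;
    p1: known stage-one randomization probability (constant in this sketch).
    """
    m = m if m is not None else (lambda x1, a1: 0.0)
    total = 0.0
    for x1, a1, x2, a2, y in data:
        rec1, rec2 = d1(x1), d2(x1, a1, x2)
        weight = (a1 == rec1) / p1  # I{A1 = d1(H1)} / p1(A1 | H1)
        total += (y - mu2(x2, a2) - mu1(x1, a1) + mu1(x1, rec1)
                  + weight * (mu2(x2, rec2) - m(x1, a1)) + m(x1, rec1))
    return total / len(data)


def treat(*_):
    """Always-treat rule, usable as both d1 and d2."""
    return 1


data = simulate(20000)
v_0 = assisted_value(data, treat, treat)
v_m = assisted_value(data, treat, treat, m=lambda x1, a1: 0.4 - 0.4 * x1 - 0.24 * a1)
print(round(v_0, 2), round(v_m, 2))  # both should be near the true value 1.96
```

Both versions target the same value, consistent with Lemma 3(a); the choice of `m` only affects variance.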

The lemma above provides a guide for the choice of $m$; in practice $m(h_1, a_1)$ in (3) can be replaced by a working estimator $\hat m(h_1, a_1) \equiv m(h_1, a_1; \hat\alpha_m)$ of $E[\mu_2(H_2, d_2(H_2)) \mid H_1 = h_1, A_1 = a_1]$, resulting in $\hat V_{\hat m}(d; \hat\beta)$. Next we provide consistency and asymptotic normality results for the estimators of the value. We assume (A1) and (A2); in addition, we assume that $\mu_1(h_1, a_1; \beta_1)$ and $\mu_2(h_2, a_2; \beta_2)$ correctly specify the SNMM, with true parameter value $\beta_0 = (\beta_{10}, \beta_{20})$. In particular, Theorem 1 below implies that the assisted estimator is consistent regardless of the choice of the function $m$ (indeed one can set $m \equiv 0$).
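The working model for $m$ can be fit by least squares, regressing $\mu_2(H_2, d_2(H_2); \hat\beta_2)$ on functions of $(H_1, A_1)$; the simulation section uses the predictors $(1, X_1, A_1)$. The sketch below assumes a hypothetical toy SMART with $X_2 = 0.5X_1 + 0.3A_1 + $ noise, the stage-two rule $d_2 \equiv 1$, and a linear blip $\mu_2 = A_2(0.4 - 0.8X_2)$ with $\beta_2$ plugged in as known. In this toy model $E[\mu_2(H_2, 1) \mid X_1, A_1] = 0.4 - 0.4X_1 - 0.24A_1$, so the linear working model happens to be correct.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


BETA2 = (0.4, -0.8)  # assumed stage-two blip parameters (beta2_hat in practice)


def d2(x1, a1, x2):
    """Stage-two rule of the policy being evaluated: always recommend a2 = 1."""
    return 1


def fit_m(histories):
    """Least squares of mu2(H2, d2(H2); beta2) on (1, X1, A1); returns alpha_hat."""
    rows = [(1.0, x1, a1) for x1, a1, _ in histories]
    target = [d2(x1, a1, x2) * (BETA2[0] + BETA2[1] * x2) for x1, a1, x2 in histories]
    return ols(rows, target)


rng = random.Random(3)
histories = []
for _ in range(20000):
    x1 = rng.gauss(0, 1)
    a1 = rng.randint(0, 1)
    histories.append((x1, a1, 0.5 * x1 + 0.3 * a1 + rng.gauss(0, 1)))  # (X1, A1, X2)

alpha_hat = fit_m(histories)
print([round(a, 2) for a in alpha_hat])  # roughly [0.4, -0.4, -0.24]


def m_hat(x1, a1):
    """Working estimator m(h1, a1; alpha_hat), ready to plug into (3)."""
    return alpha_hat[0] + alpha_hat[1] * x1 + alpha_hat[2] * a1
```

The regression uses all participants, not only those whose $A_1$ matches $d_1$, so $\hat m$ is defined at both values of $a_1$ as (3) requires.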

Theorem 1: Assume that the assumptions of Lemma 3 hold; moreover, assume: (1) $\hat\alpha_m$ converges in probability to some limit $\alpha_m^+$; (2) there exists some $\delta > 0$ such that $\sum_{a_1} E \sup_{\|\alpha_m - \alpha_m^+\| \le \delta} |m(H_1, a_1; \alpha_m)| < \infty$; and (3) $\dot m(h_1, a_1; \alpha_m) \equiv \partial m(h_1, a_1; \alpha_m) / \partial \alpha_m$ exists for all $\alpha_m$, a.s. Then $\hat V_{\hat m}(d; \hat\beta)$ is a consistent estimator of the policy value $V^d$.

Theorem 2: Assume that the assumptions of Theorem 1 hold; moreover, assume: (1) there exists some $\delta > 0$ such that $\sum_{a_1} E \sup_{\|\alpha_m - \alpha_m^+\| \le \delta} \big(|m(H_1, a_1; \alpha_m)|^2 + \|\dot m(H_1, a_1; \alpha_m)\|^2\big) < \infty$ and (2) $\sqrt{n}(\hat\alpha_m - \alpha_m^+) = O_p(1)$. Then $\sqrt{n}\big(\hat V_{\hat m}(d; \hat\beta) - V^d\big)$ is asymptotically normal.

The asymptotic variance of the limiting normal distribution in Theorem 2 is provided in Web Appendix A. Recall that if $m(h_1, a_1; \alpha_m)$ is a correct model for $E[\mu_2(H_2, d_2(H_2)) \mid H_1 = h_1, A_1 = a_1]$, then this asymptotic variance achieves the lowest value among all choices of $m$, provided that $\hat\beta$ belongs to the subclass $\mathcal{B}$ of g-estimators.

3. Comparison between Treatment Policies

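Suppose we are interested in comparing treatment policies $d = (d_1, d_2)$ and $\tilde d = (\tilde d_1, \tilde d_2)$. Then, given an estimator $\hat\beta$ of the intermediate treatment effect parameters, we obtain the following consistent estimator of the contrast between $\tilde d$ and $d$, i.e., $V^{\tilde d} - V^d$:

$$\begin{aligned}
\hat V_{m_{\tilde d}}(\tilde d; \hat\beta) - \hat V_{m_d}(d; \hat\beta) = \mathbb{P}_n\Big[ & \mu_1(H_1, \tilde d_1(H_1); \hat\beta_1) - \mu_1(H_1, d_1(H_1); \hat\beta_1) \\
& + \frac{I\{A_1 = \tilde d_1(H_1)\}}{p_1(A_1 \mid H_1)} \big\{\mu_2(H_2, \tilde d_2(H_2); \hat\beta_2) - m_{\tilde d}(H_1, A_1)\big\} \\
& - \frac{I\{A_1 = d_1(H_1)\}}{p_1(A_1 \mid H_1)} \big\{\mu_2(H_2, d_2(H_2); \hat\beta_2) - m_d(H_1, A_1)\big\} \\
& + m_{\tilde d}(H_1, \tilde d_1(H_1)) - m_d(H_1, d_1(H_1)) \Big], \qquad (4)
\end{aligned}$$

where the function $m(h_1, a_1)$ is now subscripted by the policy to reflect that a good choice of $m$ varies with the policy (see the following lemma). For ease of notation, define $\Delta_d(h_1, a_1) = m_d(h_1, a_1) - E[\mu_2(H_2, d_2(H_2)) \mid H_1 = h_1, A_1 = a_1]$.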
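In (4) the outcome $Y$ and the blips of the treatments actually received cancel, so the contrast needs only $(X_1, A_1, X_2)$ and the blips evaluated at each policy's recommendations. The sketch below uses $m_d = m_{\tilde d} \equiv 0$ by default, the choice recommended in Section 4, and assumes hypothetical linear blips with the true $\beta$ plugged in and $p_1 = 1/2$. In the toy model assumed here ($X_2 = 0.5X_1 + 0.3A_1 + $ noise and $Y = 1 + X_1 + A_1(0.5 + 0.3X_1) + X_2 + A_2(0.4 - 0.8X_2) + \epsilon$), the always-treat value is 1.96 and the never-treat value is 1, so the true contrast is 0.96.

```python
import random

BETA1 = (0.8, 0.3)   # assumed stage-one blip parameters (beta1_hat in practice)
BETA2 = (0.4, -0.8)  # assumed stage-two blip parameters (beta2_hat in practice)


def mu1(x1, a1):
    """Assumed linear stage-one blip."""
    return a1 * (BETA1[0] + BETA1[1] * x1)


def mu2(x2, a2):
    """Assumed linear stage-two blip."""
    return a2 * (BETA2[0] + BETA2[1] * x2)


def zero_m(x1, a1):
    """The m = 0 choice."""
    return 0.0


def policy_part(x1, a1, x2, policy, p1, m):
    """Terms of (4) that belong to one policy, for one participant."""
    d1, d2 = policy
    rec1 = d1(x1)
    weight = (a1 == rec1) / p1  # I{A1 = d1(H1)} / p1(A1 | H1)
    return mu1(x1, rec1) + weight * (mu2(x2, d2(x1, a1, x2)) - m(x1, a1)) + m(x1, rec1)


def contrast(data, d, d_tilde, p1=0.5, m_d=zero_m, m_d_tilde=zero_m):
    """Estimate V^{d_tilde} - V^d from (x1, a1, x2) records; Y is not needed."""
    diffs = [policy_part(x1, a1, x2, d_tilde, p1, m_d_tilde)
             - policy_part(x1, a1, x2, d, p1, m_d)
             for x1, a1, x2 in data]
    return sum(diffs) / len(diffs)


rng = random.Random(4)
data = []
for _ in range(20000):
    x1 = rng.gauss(0, 1)
    a1 = rng.randint(0, 1)
    data.append((x1, a1, 0.5 * x1 + 0.3 * a1 + rng.gauss(0, 1)))  # hypothetical (X1, A1, X2)

never = (lambda x1: 0, lambda x1, a1, x2: 0)
always = (lambda x1: 1, lambda x1, a1, x2: 1)
estimate = contrast(data, never, always)
print(round(estimate, 2))  # close to the toy model's true contrast 0.96
```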

Lemma 4: Assume that the conditions of Lemma 3 are satisfied; in particular, assume that $\hat\beta$ belongs to the subclass $\mathcal{B}$ of g-estimators. Then the choices of $m_d$ and $m_{\tilde d}$ resulting in the lowest asymptotic variance for $\sqrt{n}\big(\hat V_{m_{\tilde d}}(\tilde d; \hat\beta) - \hat V_{m_d}(d; \hat\beta)\big)$, among the class of estimators in (4) with $m_d$ and $m_{\tilde d}$ arbitrary functions of $(h_1, a_1)$, satisfy: (1) for $h_1$ such that $d_1(h_1) \neq \tilde d_1(h_1)$, $\Delta_{\tilde d}(h_1, \tilde d_1(h_1)) = \Delta_d(h_1, d_1(h_1)) = 0$; (2) for $h_1$ such that $d_1(h_1) = \tilde d_1(h_1)$, $\Delta_{\tilde d}(h_1, \tilde d_1(h_1)) = \Delta_d(h_1, d_1(h_1))$.

Lemma 4 implies that, for the purpose of estimating the policy contrast, it is reasonable to replace $m_d(h_1, a_1)$ with a working estimate $m_d(h_1, a_1; \hat\alpha_m)$ of $E[\mu_2(H_2, d_2(H_2)) \mid H_1 = h_1, A_1 = a_1]$. The following lemma concerns the estimator of the contrast in (4) with $m_d(h_1, a_1)$ replaced by $m_d(h_1, a_1; \hat\alpha_m)$; we also refer to this estimator as an “assisted estimator”. The lemma assumes that $m_d(h_1, a_1; \alpha_m)$ is modeled via a linear model $D_m^T \alpha_m$, where $D_m$ is a function of $(H_1, A_1)$ and $\alpha_m$ is estimated via least squares.

Lemma 5: Assume that the conditions of Theorems 1 and 2 are satisfied. Then $\sqrt{n}\Big(\big(\hat V_{\hat m_{\tilde d}}(\tilde d; \hat\beta) - \hat V_{\hat m_d}(d; \hat\beta)\big) - \big(V^{\tilde d} - V^d\big)\Big)$ converges in distribution to a normal distribution with mean zero and variance-covariance matrix $\Sigma_\Delta$. The plug-in estimator $\hat\Sigma_\Delta$ is a consistent estimator of $\Sigma_\Delta$. The formulae for $\Sigma_\Delta$ and $\hat\Sigma_\Delta$ are provided in Web Appendix A.

4. Simulation Studies

All simulation experiments are based on generative models mimicking the Extending Treatment Effectiveness of Naltrexone (ExTENd) trial, a SMART of alcohol dependence treatment (PI: Oslin; see Figure 1). In this trial, the first-stage randomization is between two different criteria for early non-response to Naltrexone (NTX): the stringent definition (two or more heavy drinking days) or the lenient definition (five or more heavy drinking days). Participants were assessed weekly for non-response; as soon as a participant met the non-response criterion, he or she was re-randomized either to switch to combined behavioral intervention (CBI) or to a combination of CBI and Naltrexone. If the participant did not meet his or her assigned non-response criterion by the end of two months, then the participant was re-randomized to one of two relapse prevention options: usual care (UC) or telephone disease management (TDM).

Figure 1.


ExTENd SMART design for the treatment of alcohol dependence. “R” stands for (re-)randomization. TDM = Telephone Disease Management, UC = Usual Care, NTX = Naltrexone, CBI = Combined Behavioral Intervention, MM = Medical Management

The structure of the simulated data is $(X_1, A_1, X_2, R, A_2, Y)$. $X_1$ is a 3-dimensional baseline covariate simulating the distribution of {baseline percent days heavy drinking, baseline craving score, baseline mental composite score}; $A_1$ is the binary indicator of the randomized non-response criterion; $X_2$ is a 2-dimensional covariate simulating the distribution of {stage 1 duration, stage 1 percent days drinking}; $R$ is the binary indicator of early response; and $A_2$ is the re-randomized binary treatment at the second stage. $Y$ is a primary outcome simulating the distribution of the end-of-study craving score (lower values are better). We study various simulation scenarios, all based on the following model for $Y$:

$$Y = \eta_0(X_1) + A_1 (1, X_1^T)\beta_1 + \eta_1(X_1, A_1, X_2) + A_2 (1, X_2^T, A_1, R, R X_2^T, R A_1)\beta_2 + \epsilon, \qquad (5)$$

in which the terms involving the $\beta$'s are the intermediate treatment effects, and $\eta_0(\cdot)$, $\eta_1(\cdot)$ and $\epsilon$ are the other components of the distribution of $Y$, corresponding to the main effect of $X_1$, the effect of $X_2$ conditional on $(X_1, A_1)$, and the error term, respectively. We use estimates of $\eta_0(\cdot)$ and $\eta_1(\cdot)$ that are by-products of estimating an SNMM with the ExTENd data; these by-products also include an estimate of the variance of the error term, which we use to generate $\epsilon$ in our simulations. More details are provided in Web Appendix C.
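The blip terms in (5) are linear in two design vectors: $(1, X_1^T)$, with 4 entries, at stage one and $(1, X_2^T, A_1, R, R X_2^T, R A_1)$, with 8 entries, at stage two. The sketch below builds these vectors and draws one record with the structure of (5). The forms of $\eta_0$ and $\eta_1$, the response model for $R$, the distribution of $X_2$, and all coefficient values are hypothetical placeholders; the paper instead uses estimates from an SNMM fit to the ExTENd data.

```python
import random


def stage1_design(x1):
    """(1, X1^T): multiplies A1 in the stage-one blip (X1 has 3 coordinates)."""
    return [1.0, *x1]


def stage2_design(x2, a1, r):
    """(1, X2^T, A1, R, R*X2^T, R*A1): multiplies A2 in the stage-two blip."""
    return [1.0, *x2, a1, r, *(r * v for v in x2), r * a1]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simulate_record(rng, beta1, beta2, sd_eps=1.0):
    """Draw one (X1, A1, X2, R, A2, Y) record with the structure of (5)."""
    x1 = [rng.gauss(0, 1) for _ in range(3)]   # standardized baseline covariates
    a1 = rng.randint(0, 1)
    x2 = [rng.gauss(0, 1) for _ in range(2)]   # placeholder stage-one covariates
    r = int(rng.random() < 0.5)                # placeholder response model
    a2 = rng.randint(0, 1)
    eta0 = 0.5 * x1[0] - 0.3 * x1[1]           # placeholder eta0(X1)
    eta1 = 0.4 * x2[1]                         # placeholder eta1(X1, A1, X2)
    y = (eta0 + a1 * dot(stage1_design(x1), beta1) + eta1
         + a2 * dot(stage2_design(x2, a1, r), beta2) + rng.gauss(0, sd_eps))
    return x1, a1, x2, r, a2, y


rng = random.Random(5)
x1, a1, x2, r, a2, y = simulate_record(rng, beta1=[0.2] * 4, beta2=[0.2] * 8)
print(len(stage1_design(x1)), len(stage2_design(x2, a1, r)))  # 4 8
```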

We create nine simulation scenarios by varying $\beta_1$ and $\beta_2$ in the generating model for $Y$. This alters the magnitude of the main effects of the treatments at both stages and the extent of treatment-by-covariate interactions. In particular, the first coordinates of $\beta_1$ and $\beta_2$ reflect the main effects of $A_1$ and $A_2$, and the remaining coordinates reflect the interactions of $A_1$ and $A_2$ with covariates. We define the standardized effect size of a coordinate of $\beta_j$ by slightly modifying Cohen's $d$:

$$\mathrm{SES}(\beta_{jk}) = \frac{|\beta_{jk}|}{\sqrt{\mathrm{Var}(\eta_0(X_1)) + \mathrm{Var}(\eta_1(X_1, A_1, X_2)) + \mathrm{Var}(\epsilon)}}.$$

We adopt this definition because $\eta_0(X_1)$, $\eta_1(X_1, A_1, X_2)$ and $\epsilon$ are uncorrelated components in the generative model of the primary outcome $Y$, and the sum of their variances accounts for the majority of the variance in $Y$. To ensure that this definition of standardized effect size is meaningful, we use standardized covariates (each covariate in $X_1$, $X_2$ is standardized to come from a population with mean 0 and standard deviation 1). The nine simulation scenarios correspond to combinations of no, low and medium treatment effects at both stages. We define no $A_j$ treatment effect ($j = 1, 2$) as $\beta_j = 0$; low $A_j$ treatment effect as setting all coordinates of $\beta_j$ to have SES equal to 0.2; and medium $A_j$ treatment effect as setting the first two coordinates of $\beta_j$ (i.e., the main effect and the interaction with $X_{j1}$) to have SES equal to 0.5 and the other coordinates to have SES equal to 0.2. The rationale for only one medium-level interaction in the medium case is that it is unlikely, in real data, for treatment to interact with many covariates at a medium level. The sign of each coordinate of $\beta_j$ is determined by a preliminary fit to the ExTENd data. In each simulation scenario, we generate 1000 simulated data sets.
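Inverting the SES definition gives the coefficient magnitude for a target effect size: multiply the SES by $\sqrt{\mathrm{Var}(\eta_0) + \mathrm{Var}(\eta_1) + \mathrm{Var}(\epsilon)}$. The sketch below builds the "none", "low" and "medium" versions of $\beta_j$ described above; the variance values and the sign vector in the example are hypothetical inputs (the paper takes the signs from a preliminary fit to the ExTENd data).

```python
import math


def coef_from_ses(ses, var_eta0, var_eta1, var_eps):
    """Magnitude of beta_jk whose standardized effect size equals ses."""
    return ses * math.sqrt(var_eta0 + var_eta1 + var_eps)


def beta_for_level(level, signs, var_eta0, var_eta1, var_eps):
    """beta_j for the 'none', 'low' or 'medium' treatment effect.

    low: every coordinate has SES 0.2; medium: the first two coordinates (main
    effect and interaction with X_j1) have SES 0.5 and the rest 0.2; none: zeros.
    """
    if level not in ("none", "low", "medium"):
        raise ValueError(f"unknown level: {level}")
    if level == "none":
        return [0.0] * len(signs)
    sizes = [0.2] * len(signs)
    if level == "medium":
        sizes[0] = sizes[1] = 0.5
    return [s * coef_from_ses(e, var_eta0, var_eta1, var_eps) for s, e in zip(signs, sizes)]


print(beta_for_level("medium", [-1, 1, 1, -1], 1.0, 2.0, 1.0))  # [-1.0, 1.0, 0.4, -0.4]
```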

Throughout, $\hat\beta$ in the assisted estimator is one of Robins’ g-estimators ($\hat\beta$ is the solution to a series of least squares problems; indeed, if, as discussed above, a particular nuisance function is correctly modeled, then this least squares solution belongs to $\mathcal{B}$). In Web Appendix C we provide results when $\hat\beta$ does not belong to $\mathcal{B}$; the simulation results are similar. Also, throughout, $\hat m_d$ is estimated via least squares with $(1, X_1, A_1)$ as predictors.

Let the triple $(c_1, c_2, c_3)$ denote a policy in which $c_1$ is the assigned non-response criterion, $c_2$ is the assigned binary treatment for early responders at the second stage, and $c_3$ is the assigned binary treatment for early non-responders at the second stage. To investigate different aspects of the proposed methodology, we perform two sets of simulation experiments. The first set studies the bias and MSE of the assisted estimators of the difference in values between the most intensive policy, (1,1,1), and the least intensive policy, (0,0,0). The second set illustrates the efficiency gain from using the assisted estimator, compared with a non-parametric policy value estimator based on the marginal mean model (Murphy et al., 2001; Zhang et al., 2013).
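In code, a triple $(c_1, c_2, c_3)$ is a pair of decision rules: $d_1$ ignores $H_1$ and returns $c_1$, while $d_2$ reads the response indicator $R$ from $H_2$. A minimal sketch; representing $H_2$ as a dict with a hypothetical key `'R'` is an assumption for illustration.

```python
def policy_from_triple(c1, c2, c3):
    """Decision rules (d1, d2) for the policy (c1, c2, c3)."""
    def d1(h1):
        return c1                            # same criterion for everyone

    def d2(h2):
        return c2 if h2["R"] == 1 else c3    # responders get c2, non-responders c3

    return d1, d2


most_intensive = policy_from_triple(1, 1, 1)
least_intensive = policy_from_triple(0, 0, 0)
d1, d2 = policy_from_triple(1, 0, 1)
print(d1({}), d2({"R": 1}), d2({"R": 0}))  # 1 0 1
```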

Simulation 1: Here we compare the bias and MSE of three types of assisted estimators of the difference in value between policies (1, 1, 1) and (0, 0, 0): $\hat V_{\hat m_d}(d; \hat\beta)$, with $\hat m_d$ an estimator of $E[\mu_2(H_2, d_2(H_2)) \mid H_1, A_1]$; $\hat V_0(d; \hat\beta)$; and $\hat V_{m_d}(d; \hat\beta)$, in which $m_d$ is the true $E[\mu_2(H_2, d_2(H_2)) \mid H_1, A_1]$. We call the last an “oracle” assisted estimator, because in practice the optimal $m_d$ is unknown. Table 1 also reports the coverage of confidence intervals based on the asymptotic standard errors of each of the two non-oracle estimators.

Table 1.

Simulation 1: Statistical properties of the assisted estimators of the contrast between values of policies (1,1,1) and (0,0,0), N = 100.

| Scenario | True value | Bias/SD: Oracle | Bias/SD: Assist | Bias/SD: Assist ($m_d = 0$) | MSE: Oracle | MSE: Assist | MSE: Assist ($m_d = 0$) | ASE coverage: Assist | ASE coverage: Assist ($m_d = 0$) |
|---|---|---|---|---|---|---|---|---|---|
| (none,none) | 0 | 0.04 | 0.04 | 0.04 | 3.51* | 3.46 | 3.51* | 95.7 | 95.4 |
| (none,low) | −2.4 | 0.01 | 0.01 | 0.01 | 4.26 | 4.26 | 4.31 | 95.1 | 95.6 |
| (none,med) | −5.2 | 0.03 | 0.03 | 0.01 | 3.94 | 3.93 | 4.3* | 95.2 | 95.4 |
| (low,none) | −1.4 | −0.01 | −0.01 | −0.01 | 3.31 | 3.3 | 3.31 | 95.5 | 96.3 |
| (low,low) | −3.8 | 0 | 0 | 0 | 4.08 | 4.14 | 4.12 | 95.5 | 95.9 |
| (low,med) | −6.6 | 0.04 | 0.04 | 0.04 | 4.09 | 4.1 | 4.25* | 95.6 | 96.3 |
| (med,none) | −3.6 | 0.03 | 0.03 | 0.03 | 3.96 | 3.93 | 3.96 | 95.9 | 95.4 |
| (med,low) | −6.0 | −0.01 | −0.01 | −0.01 | 4.33 | 4.36 | 4.38 | 95.2 | 95.5 |
| (med,med) | −8.8 | 0.01 | 0.01 | 0 | 4.02 | 4.04 | 4.24* | 95 | 95.7 |

Oracle = contrast estimator based on $\hat V_{m_d}(d; \hat\beta)$ with the true optimal $m_d$. Assist = contrast estimator based on $\hat V_{\hat m_d}(d; \hat\beta)$ with a working estimate of the optimal $m_d$. Assist ($m_d = 0$) = contrast estimator based on $\hat V_0(d; \hat\beta)$. Confidence interval coverage is displayed as the coverage proportion × 100. An asterisk indicates that the MSE of Oracle or Assist ($m_d = 0$) is significantly different from the MSE of Assist (at the 0.05 level).

The simulation results with N = 100 are shown in Table 1 (results for N = 250 are shown in Web Appendix C). Based on the ratio of bias to standard deviation, which is at most 0.04 in absolute value in every scenario, we conclude that, as expected, the assisted estimators provide unbiased estimates of the contrast between policies. The MSEs of all three estimators are similar; $\hat V_{\hat m_d}(d; \hat\beta)$ tends to be slightly more efficient than $\hat V_0(d; \hat\beta)$. The coverage of the confidence intervals based on the asymptotic standard errors is close to 95% in all cases.
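Summaries like those in Tables 1 and 2 can be computed from the replicate estimates. The sketch below assumes each replicate supplies a point estimate and an asymptotic standard error, and that the interval has the Wald form estimate ± 1.96 × ASE (an assumption about the interval construction); relative MSE divides by the MSE of a reference estimator, MM in Table 2.

```python
from statistics import mean, stdev


def summarize(estimates, ses, truth, z=1.96):
    """Bias, bias/SD, MSE and Wald-interval coverage (x 100) over simulation replicates."""
    bias = mean(estimates) - truth
    sd = stdev(estimates)
    mse = mean((e - truth) ** 2 for e in estimates)
    covered = [abs(e - truth) <= z * s for e, s in zip(estimates, ses)]
    return {"bias": bias, "bias_over_sd": bias / sd, "mse": mse,
            "coverage": 100 * sum(covered) / len(covered)}


def relative_mse(mse_method, mse_reference):
    """MSE of a method divided by the MSE of the reference (MM in Table 2)."""
    return mse_method / mse_reference


out = summarize([0.9, 1.1, 1.0, 1.4], [0.2, 0.2, 0.2, 0.2], truth=1.0)
print(out["coverage"], round(out["mse"], 3))  # 75.0 0.045
```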

In Web Appendix C we provide additional simulations; these illustrate that $\hat V_{\hat m_d}(d; \hat\beta)$ can provide a noticeable efficiency improvement over $\hat V_0(d; \hat\beta)$ in some extreme settings. However, we found that in most practical scenarios a more sophisticated choice of $m_d$ does not substantially improve efficiency over $m_d \equiv 0$; therefore, for simplicity, we recommend using the assisted estimator with $m_d \equiv 0$.

Simulation 2: Here we use bias, MSE and confidence interval coverage to assess the robustness of the assisted estimators to misspecification of the SNMM. For comparison, we consider estimators from the marginal mean model (Murphy et al., 2001), as these estimators do not require the SNMM. The marginal mean models are estimated via a non-parametric inverse-probability-weighted estimator. More details about the implementation of the marginal-mean-model-based estimator in this simulation study can be found in Web Appendix B, where we also discuss the equivalence between the estimators proposed in Zhang et al. (2013) and in Murphy et al. (2001). Note that when the goal is to evaluate the difference between two policies, the estimators in Orellana et al. (2010), under particular choices of nuisance functions, reduce to the marginal mean model estimators.

The assisted estimator $\hat V_{\hat m_d}(d; \hat\beta)$ is computed under two differently misspecified SNMMs in addition to the correctly specified SNMM. The true SNMM is implied by the generative model in (5), i.e., $\mu_1(H_1, A_1) = A_1 (1, X_1^T)\beta_1$ and $\mu_2(H_2, A_2) = A_2 (1, X_2^T, A_1, R, R X_2^T, R A_1)\beta_2$. The first misspecification excludes $X_{11}$ from the model for $\mu_1(H_1, A_1)$ and excludes $X_{21}$ and $R X_{21}$ from the model for $\mu_2(H_2, A_2)$ (denoted Assist2 in Table 2). The second misspecification models $\mu_1(H_1, A_1)$ as $A_1 (1, \tilde X_1^T)\beta_1$ and $\mu_2(H_2, A_2)$ as $A_2 (1, \tilde X_2^T)\beta_2$, where $\tilde X_1$ and $\tilde X_2$ are 3-dimensional and 7-dimensional covariates (denoted Assist3 in Table 2). The dimensions of $\tilde X_1$ and $\tilde X_2$ are chosen so that the model complexity is the same as in the correctly specified SNMM; $\tilde X_1$ and $\tilde X_2$ are generated independently of all the other covariates.

Table 2.

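Simulation 2: Comparison between the marginal-mean-model-based estimators and the assisted estimators with respect to performance in estimating the policy contrasts, N = 100. Bias and coverage of the 95% CI are multiplied by 100.

Estimation of the first contrast, (1,1,1) vs (0,0,0)

| Scenario | True value | Bias × 100: MM | Bias × 100: Assist1 | Bias × 100: Assist2 | Bias × 100: Assist3 | Coverage × 100: MM | Coverage × 100: Assist1 | Coverage × 100: Assist2 | Coverage × 100: Assist3 | Relative MSE: Assist1 | Relative MSE: Assist2 | Relative MSE: Assist3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (none,none) | 0 | 2.4 | 4.9 | 5.2 | 4.9 | 95.2 | 96.2 | 96 | 96.1 | 0.94 | 0.93 | 0.99 |
| (none,low) | −2.4 | 5.8 | 4.6 | 4.7 | 6 | 94.5 | 96 | 95.4 | 95.2 | 0.95 | 0.94 | 1.04 |
| (none,med) | −5.2 | 12 | −6.8 | −6.8 | −2.6 | 93.6* | 93.9 | 93.6* | 94.6 | 0.95 | 0.95 | 1.01 |
| (low,none) | −1.4 | −1.9 | 2.5 | 1.7 | 4.8 | 95.6 | 94.6 | 94 | 95 | 1.01 | 1.01 | 1.09 |
| (low,low) | −3.8 | −12.5 | −10.8 | −11 | −10.3 | 94.3 | 94.5 | 93.5* | 94.6 | 0.92 | 0.92 | 0.97 |
| (low,med) | −6.6 | 11 | −9.9 | −10.4 | −5.8 | 93.9 | 94.8 | 94.7 | 95.5 | 0.84 | 0.84 | 0.93 |
| (med,none) | −3.6 | 8.9 | 4.2 | 5.4 | 3.4 | 95.5 | 95.9 | 95.3 | 96.2 | 0.89 | 0.87 | 0.89 |
| (med,low) | −6.0 | 9.7 | −1.9 | −2.7 | −7.1 | 94.3 | 94.8 | 94.1 | 94.9 | 0.85 | 0.85 | 0.93 |
| (med,med) | −8.8 | 28.9* | 4.2 | 5.4 | 4.7 | 93.7 | 94.9 | 95.2 | 94.9 | 0.8 | 0.79 | 0.85 |

Estimation of the second contrast, the tailored policy vs (0,0,0)

| Scenario | True value | Bias × 100: MM | Bias × 100: Assist1 | Bias × 100: Assist2 | Bias × 100: Assist3 | Coverage × 100: MM | Coverage × 100: Assist1 | Coverage × 100: Assist2 | Coverage × 100: Assist3 | Relative MSE: Assist1 | Relative MSE: Assist2 | Relative MSE: Assist3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (none,none) | 0 | 6 | 1 | 2.4 | 2.3 | 96.2 | 97* | 96.6* | 96.1 | 0.78 | 0.76 | 0.57 |
| (none,low) | −2.2 | 6.4 | 4.8 | −2.8 | 16.7* | 95.6 | 96 | 95.7 | 94.7 | 0.79 | 0.77 | 0.59 |
| (none,med) | −3.9 | 11.5 | −2.8 | −22.1* | −43.9* | 94.9 | 95.8 | 95.1 | 94.4 | 0.78 | 0.77 | 0.67 |
| (low,none) | −1.1 | 5.3 | 11.2* | 9.7 | 42.9* | 95.5 | 95.3 | 94.8 | 93.8 | 0.81 | 0.8 | 0.69 |
| (low,low) | −3.3 | −7.3 | −6.3 | −15.1* | 46.3* | 93.9 | 95.3 | 93.9 | 95 | 0.77 | 0.74 | 0.59 |
| (low,med) | −5 | 6.7 | −1.8 | −23.6* | −2.8 | 94 | 96.3 | 94.9 | 95.7 | 0.7 | 0.69 | 0.5 |
| (med,none) | −2.3 | 9.3 | 8 | 9.1 | 50* | 95.9 | 96.5* | 95.8 | 95.4 | 0.76 | 0.74 | 0.57 |
| (med,low) | −4.4 | 13.7* | 9.4 | −0.3 | 53.3* | 93.2* | 95 | 95.2 | 94.1 | 0.7 | 0.67 | 0.57 |
| (med,med) | −6.2 | 24.7* | 5.2 | −15.1* | 9.9* | 93.1* | 95.5 | 95.3 | 95.6 | 0.66 | 0.64 | 0.49 |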

MM = marginal-mean-model-based estimator. Assist1 = assisted estimator with correctly specified SNMM. Assist2 = assisted estimator with misspecified SNMM that excludes $X_{11}$, $X_{21}$ and $R X_{21}$. Assist3 = assisted estimator with misspecified SNMM that excludes all the covariates interacting with treatments. Biases significantly different from 0 and coverage proportions significantly different from 95% are marked with an asterisk. Relative MSE is the ratio of an estimator's MSE to that of MM.

We focus on the estimation of two contrasts: the first is the contrast between the policies (1,1,1) and (0,0,0), and the second is the contrast between a “tailored” treatment policy and the policy (0,0,0). This tailored treatment policy assigns $a_1 = 1$ if $X_{13} > 0$ (and $a_1 = 0$ otherwise); at the second stage it assigns $a_2 = 1$ to all early responders and assigns $a_2 = 1$ to early non-responders only if $X_{21} < 0$. In each of the nine simulation scenarios we compare the marginal-mean-model-based estimator with the assisted estimators under the three differently specified SNMMs.
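The tailored policy reduces to two threshold rules. A minimal sketch, assuming $X_1$ is passed as the list $[X_{11}, X_{12}, X_{13}]$ and $H_2$ as a dict with hypothetical keys `'R'` and `'X2'`, where `'X2'` holds $[X_{21}, X_{22}]$:

```python
def tailored_d1(x1):
    """Stage one: a1 = 1 if X13 > 0, else 0; x1 = [X11, X12, X13]."""
    return 1 if x1[2] > 0 else 0


def tailored_d2(h2):
    """Stage two: a2 = 1 for every early responder; for non-responders a2 = 1 only if X21 < 0."""
    if h2["R"] == 1:
        return 1
    return 1 if h2["X2"][0] < 0 else 0


print(tailored_d1([0.1, -0.4, 0.7]), tailored_d2({"R": 0, "X2": [0.3, -1.2]}))  # 1 0
```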

The results for N = 100 are shown in Table 2 (results for N = 250 are shown in Web Appendix C). Instead of the MSE of each estimator, we present the relative MSE of the assisted estimators, with the MSE of the marginal-mean-model-based estimator (MM) as the reference. For the comparison between policies (1,1,1) and (0,0,0), the assisted estimator with a correctly specified SNMM outperforms MM in terms of MSE in most cases (relative MSE from 0.80 to 1.01). Misspecifying the SNMM does not appear to introduce bias for this contrast, but severe misspecification (Assist3 in the table) can reduce efficiency and can even give the assisted estimator a larger MSE than MM (relative MSE above 1 in three scenarios). For the comparison between the tailored policy and policy (0,0,0), the assisted estimator with a correctly specified SNMM outperforms MM in terms of MSE by a wider margin than for the first contrast (relative MSE from 0.66 to 0.81). Here misspecifying the SNMM does introduce bias; in particular, severe misspecification (Assist3) leads to considerable bias. However, this bias does not greatly affect the confidence intervals, whose coverage for Assist3 stays between 93.8% and 96.1%. Interestingly, for this contrast, misspecifying the SNMM may even result in a smaller MSE despite the bias, owing to a smaller standard deviation of the estimate.

5. Data Analysis Example: ExTENd

The ExTENd study (see Figure 1) includes 302 participants, 49 of whom dropped out before experiencing two heavy drinking days. These participants are removed from our analysis because they did not experience the first randomization; both they and their clinicians were blind to this randomization. Only three participants dropped out during the first treatment stage after experiencing two heavy drinking days; for simplicity, their data are also removed. Thus the analysis sample comprises 250 participants (302 − 49 − 3 = 250).

We use both the marginal-mean-model-based estimator and the assisted estimator to compare the most intensive policy with the least intensive policy. Treatment policy (1,1,1) represents the most intensive policy in the SMART: early non-response is deemed to occur if and when there are 5 or more heavy drinking days in the first 8 weeks, early responders are provided TDM, and early non-responders are provided NTX+CBI. Treatment policy (0,0,0) represents the least intensive policy: early non-response is deemed to occur if and when there are 2 or more heavy drinking days in the first 8 weeks, early responders are provided UC, and early non-responders are provided CBI only.
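Each embedded policy is a triple (non-response definition, treatment for responders, treatment for non-responders). The hypothetical helper below maps such a triple to a participant's stage-two treatment. It is a simplification under stated assumptions: it uses only the total number of heavy drinking days in the first 8 weeks and ignores when the threshold is reached.

```python
def embedded_policy_treatment(policy, heavy_days_first_8_weeks):
    """Return (response status, stage-two treatment) under an embedded ExTENd policy.

    policy = (a1, a2 for responders, a2 for non-responders), each coded 0/1.
    Simplification (assumption): timing of the transition to stage two is ignored.
    """
    a1, a2_responder, a2_nonresponder = policy
    threshold = 5 if a1 == 1 else 2  # lenient (5+) vs stringent (2+) definition
    if heavy_days_first_8_weeks >= threshold:
        return "non-responder", "NTX+CBI" if a2_nonresponder == 1 else "CBI only"
    return "responder", "TDM" if a2_responder == 1 else "UC"

print(embedded_policy_treatment((1, 1, 1), 3))  # ('responder', 'TDM')
print(embedded_policy_treatment((0, 0, 0), 3))  # ('non-responder', 'CBI only')
```

The same participant history can thus be a responder under one policy and a non-responder under another, which is why the first-stage definition is part of the policy.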

Besides the two treatment policies above, we also compare a more “deeply tailored” policy with the policy (0,0,0). At stage one, this tailored policy assigns the 5-or-more-heavy-drinking-days definition of non-response to participants whose standardized pre-treatment mental composite score is above zero, and the 2-or-more-heavy-drinking-days definition to participants whose score is below zero. Among early responders, the policy assigns TDM if they had at least one heavy drinking day during stage one and UC otherwise. Among early non-responders, it assigns NTX+CBI if their stage-one duration was shorter than 49 days and CBI only otherwise. This policy is motivated by the belief that participants in worse mental health at baseline (indicated by a lower mental composite score) should proceed to stage two earlier to receive more intensive treatment. Moreover, responders and non-responders who did worse in stage one (i.e., responders who experienced at least one heavy drinking day and non-responders who transitioned to stage two sooner) are believed to need more intensive intervention in stage two.
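The deeply tailored policy can be sketched as two decision rules. The function names are hypothetical; assigning the stringent definition to a score of exactly zero is our assumption, since the text covers only scores above and below zero.

```python
def tailored_nonresponse_threshold(mental_score_std):
    """Stage one: lenient 5+ definition if the standardized pre-treatment mental
    composite score is above zero, otherwise the stringent 2+ definition
    (a score of exactly zero gets the stringent definition here, an assumption)."""
    return 5 if mental_score_std > 0 else 2

def tailored_stage2_treatment(responder, stage1_heavy_days, stage1_duration_days):
    """Stage two: intensify for responders with any heavy drinking in stage one,
    and for non-responders who moved to stage two in fewer than 49 days."""
    if responder:
        return "TDM" if stage1_heavy_days >= 1 else "UC"
    return "NTX+CBI" if stage1_duration_days < 49 else "CBI only"

print(tailored_nonresponse_threshold(-0.4))         # 2
print(tailored_stage2_treatment(False, 2, 30))      # NTX+CBI
```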

We compare the treatment policies in terms of the Penn Alcohol Craving Scale (PACS), which we reverse code so that higher values indicate less craving and are thus more favorable. PACS is collected every two months during stage two; the outcome Y is the average of the measurements taken two months and four months after entry into stage two. Among the 250 participants in our data set, 46 are missing Y. We handle this missingness in the outcome with a slightly adjusted assisted estimator that uses inverse probability weights (see, for example, Robins et al. (1995)). The adjustment requires an estimator of the conditional probability of missing the outcome and is briefly reviewed in Web Appendix B. In particular, we assume that Y is missing at random (Rubin, 1976). The marginal-mean-model-based estimator is adjusted similarly to accommodate the missingness.
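To illustrate the idea behind the missingness adjustment, the sketch below estimates a simple outcome mean with inverse probability weights: a logistic model for the probability that Y is observed is fitted by Newton-Raphson, and each observed outcome is weighted by the inverse of its fitted probability. This is a generic missing-at-random IPW mean under our own assumptions (logistic model, hypothetical function names, simulated data), not the adjusted assisted estimator of Web Appendix B.

```python
import numpy as np

def fit_logistic(X, d, n_iter=25):
    """Newton-Raphson fit of P(d = 1 | X); an intercept column is prepended."""
    Z = np.column_stack([np.ones(len(d)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (d - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_mean(y, observed, X):
    """Weighted mean of the observed outcomes, weights = 1 / P(observed | X)."""
    beta = fit_logistic(X, observed.astype(float))
    Z = np.column_stack([np.ones(len(observed)), X])
    pi = 1.0 / (1.0 + np.exp(-Z @ beta))
    w = observed / pi                           # zero weight when Y is missing
    y_obs = np.where(observed == 1, y, 0.0)     # missing Y (NaN) never enters the sum
    return np.sum(w * y_obs) / np.sum(w)

# Simulated data: missingness depends only on the covariate x (missing at random)
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y_full = 1 + 2 * x + rng.normal(size=n)
observed = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + 1.5 * x)))).astype(int)
y = np.where(observed == 1, y_full, np.nan)
print(ipw_mean(y, observed, x), np.nanmean(y), y_full.mean())
```

In this simulation the complete-case mean is pulled upward because participants with larger `x` are more likely to be observed, while the weighted mean should land close to the full-data mean.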

In the analysis model, we include the following covariates: $X_1$ is a 10-dimensional baseline covariate comprising mean-centered versions of {gender, age, years of alcohol use, indicator of drug abuse, pre-treatment percent days heavy drinking, indicator of being married, years of alcohol intoxication, pre-treatment alcohol intoxication days within 30 days, pre-treatment percent days drinking, pre-treatment mental composite score}; $X_2$ is a 5-dimensional covariate measured prior to re-randomization, comprising {duration of the first stage, number of heavy drinking days during the first stage, percent days drinking during the first stage, percent days heavy drinking during the first stage, average number of pills taken per day during the first stage}. Moreover, $A_1$ indicates whether ($A_1 = 1$) or not ($A_1 = 0$) a participant is randomized to the lenient definition of non-response (i.e., five or more heavy drinking days) as opposed to the stringent definition (i.e., two or more heavy drinking days); $R$ is the indicator of being an early responder; and $A_2$ indicates whether ($A_2 = 1$) or not ($A_2 = 0$) a responder is re-randomized to TDM as opposed to UC, or whether or not a non-responder is re-randomized to NTX+CBI as opposed to Placebo+CBI.

We run two sets of analyses with the assisted estimators under two different SNMMs. In the first analysis we adopt a parsimonious SNMM, assuming $\mu_1(H_1, A_1) = A_1 (1, \tilde{X}_1^T)\beta_1$ and $\mu_2(H_2, A_2) = A_2 (1, \tilde{X}_2^T, A_1, R, R\tilde{X}_2^T, RA_1)\beta_2$, where $\tilde{X}_1$ consists of the first five components of $X_1$ and $\tilde{X}_2$ consists of the first three components of $X_2$. In the second analysis we adopt a more complex SNMM, assuming $\mu_1(H_1, A_1) = A_1 (1, X_1^T)\beta_1$ and $\mu_2(H_2, A_2) = A_2 (1, X_2^T, A_1, R, RX_2^T, RA_1)\beta_2$. Asymptotic standard errors of the policy contrast estimates are calculated and used to construct 95% confidence intervals for the policy contrasts. Table 3 presents the analysis results.
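The two SNMM specifications differ only in which covariate columns enter the blip functions $\mu_1$ and $\mu_2$. The sketch below builds the corresponding design rows, so that $\mu_k = D_k \beta_k$; it does not estimate $\beta_1, \beta_2$. The function names, the argument `k` (number of leading covariate columns kept), and the simulated inputs are hypothetical.

```python
import numpy as np

def stage1_blip_design(a1, x1, k=None):
    """Rows of A1 * (1, X1~^T); k keeps the first k columns of X1 (None keeps all)."""
    x = x1 if k is None else x1[:, :k]
    return a1[:, None] * np.column_stack([np.ones(len(a1)), x])

def stage2_blip_design(a2, x2, a1, r, k=None):
    """Rows of A2 * (1, X2~^T, A1, R, R * X2~^T, R * A1)."""
    x = x2 if k is None else x2[:, :k]
    base = np.column_stack([np.ones(len(a2)), x, a1, r, r[:, None] * x, r * a1])
    return a2[:, None] * base

rng = np.random.default_rng(1)
n = 8
x1 = rng.normal(size=(n, 10))
x1 -= x1.mean(axis=0)                          # mean-centred baseline covariates
x2 = rng.normal(size=(n, 5))
a1, r, a2 = (rng.integers(0, 2, size=n) for _ in range(3))

D1 = stage1_blip_design(a1, x1, k=5)           # parsimonious: 1 + 5 = 6 columns
D2 = stage2_blip_design(a2, x2, a1, r, k=3)    # parsimonious: 2 * 3 + 4 = 10 columns
print(D1.shape, D2.shape)                      # (8, 6) (8, 10)
```

With `k=None` the complex model gives 11 columns at stage one and 2 × 5 + 4 = 14 at stage two.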

Table 3.

Illustrative data analysis results with the ExTENd data: estimated contrasts of the policy (1,1,1) and of the proposed tailored policy, each relative to the policy (0,0,0), with respect to PACS.

                      (1,1,1) vs (0,0,0)                        Tailored vs (0,0,0)
Outcome  Estimator    Est (s.e.)    95% CI lower  95% CI upper  Est (s.e.)    95% CI lower  95% CI upper
PACS     MM           2.98 (1.30)   0.44          5.52          0.21 (1.05)   −1.85         2.27
         Assist1      2.83 (1.44)   0.00          5.66          0.91 (0.99)   −1.02         2.85
         Assist2      2.95 (1.48)   0.04          5.85          1.25 (1.05)   −0.80         3.31
MM = Marginal-mean-model-based estimator. Assist1 = Assisted estimator with a parsimonious SNMM. Assist2 = Assisted estimator with a complex SNMM.
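The interval bounds in Table 3 appear consistent with normal-theory Wald intervals, estimate ± 1.96 × s.e., up to rounding of the reported estimates and standard errors. The check below recomputes them, together with two-sided p-values, from the rounded table entries; treating the intervals as Wald intervals is our assumption.

```python
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

# (estimate, standard error) pairs copied from Table 3
TABLE3 = {
    ("MM", "(1,1,1) vs (0,0,0)"): (2.98, 1.30),
    ("Assist1", "(1,1,1) vs (0,0,0)"): (2.83, 1.44),
    ("Assist2", "(1,1,1) vs (0,0,0)"): (2.95, 1.48),
    ("MM", "Tailored vs (0,0,0)"): (0.21, 1.05),
    ("Assist1", "Tailored vs (0,0,0)"): (0.91, 0.99),
    ("Assist2", "Tailored vs (0,0,0)"): (1.25, 1.05),
}

def wald_ci(est, se, z=Z975):
    """Two-sided normal-theory confidence interval."""
    return est - z * se, est + z * se

def two_sided_p(est, se):
    """Two-sided p-value for H0: contrast = 0."""
    return 2 * (1 - NormalDist().cdf(abs(est / se)))

for (method, contrast), (est, se) in TABLE3.items():
    lo, hi = wald_ci(est, se)
    print(f"{method:8s} {contrast:20s} ({lo:5.2f}, {hi:5.2f})  p = {two_sided_p(est, se):.3f}")
```

For Assist1 the ratio 2.83 / 1.44 is only slightly above 1.96, which is why its lower bound is essentially zero.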

The three estimators (including the two assisted estimators with different SNMMs) produce similar estimates, given the relatively large standard errors. The analyses suggest that the most intensive policy, (1,1,1), lowers craving as measured by PACS by approximately 3 points on average compared with the least intensive policy, (0,0,0) (equivalently, it raises the reverse-coded outcome by about 3), and this difference is significant at the 0.05 level for all three estimators, although the lower confidence bound for Assist1 is essentially zero. The proposed tailored policy, on the other hand, does not differ significantly from the (0,0,0) policy. Note that, for the (1,1,1) versus (0,0,0) contrast, the marginal-mean-model-based estimator has a smaller standard error than both assisted estimators; this may reflect small treatment effects in the ExTENd data or the additional variance induced by the considerable amount of missingness in the outcome.

6. Discussion

Our simulations indicate that the MSE performance of the assisted estimators is relatively robust to misspecification of the models for the intermediate treatment effects. Nonetheless, to reduce bias, efforts should be made to ensure good model fit when estimating the intermediate treatment effects. In particular, investigators should collect all time-varying covariates that may moderate the effect of treatment at each stage on the primary outcome and include them in the treatment effect models. Subject-matter knowledge, and possibly results from past studies, may provide valuable guidance in choosing these models.

In this manuscript we did not derive the semiparametric efficient estimator of the policy value or policy contrast. To obtain the most efficient estimator of the policy contrast, one needs to subtract from the influence function of the assisted estimator its projection onto all tangent spaces that are orthogonal to the tangent space associated with the policy contrast. This appears difficult because the policy contrast is a functional of a collection of finite- or infinite-dimensional parameters of the data distribution, and the functional depends on the specific policies being studied. We plan to investigate this efficiency problem in future research.

In this paper we focused on the comparison of two-stage treatment policies. When there are more than two treatment stages at which (re-)randomization may happen, similar assisted estimators can be constructed. For example, for a three-stage treatment policy $d = (d_1, d_2, d_3)$, the assisted estimator requires additional terms characterizing the effect of $d_3$ when $(d_1, d_2)$ has been followed at the earlier stages. These terms would involve inverse probability weights from more than one stage.
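To show what weights "from more than one stage" look like, here is a generic inverse-probability-weighted mean for a K-stage policy with known randomization probabilities: a participant contributes only if the observed treatment matches the policy's recommendation at every stage, and is then weighted by the inverse of the product of the stage-wise randomization probabilities. This is a standard IPW sketch with hypothetical names and simulated data; it omits the SNMM-based terms that make the paper's estimator "assisted".

```python
import numpy as np

def ipw_policy_mean(y, actions, recommended, probs):
    """Hajek-type IPW estimate of the mean outcome under a K-stage policy.

    actions, recommended, probs: arrays of shape (n, K); probs[i, k] is the known
    randomization probability of the treatment participant i received at stage k.
    """
    consistent = np.all(actions == recommended, axis=1)  # followed the policy throughout
    w = consistent / np.prod(probs, axis=1)              # zero weight otherwise
    return np.sum(w * y) / np.sum(w)

# Simulated three-stage trial with 1:1 randomization at every stage
rng = np.random.default_rng(2)
n, K = 40000, 3
actions = rng.integers(0, 2, size=(n, K))
probs = np.full((n, K), 0.5)
y = 1 + actions.sum(axis=1) + rng.normal(size=n)   # true mean under (1,1,1) is 4
est = ipw_policy_mean(y, actions, np.ones((n, K), dtype=int), probs)
print(est)
```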

The methodology proposed in this paper is only applicable when a few candidate treatment policies have been pre-specified. When there are more than a few candidate treatment policies, usually one of the candidate treatment policies can be considered as a reference policy, and comparison can be made between any of the remaining policies and this reference policy. In future work, we will also consider a multiple comparison procedure for many treatment policies.

The assisted estimators are based upon the structural nested mean models for continuous primary outcomes. Multiplicative structural mean models (Robins, 1997a) and generalized structural mean models (Vansteelandt and Goetghebeur, 2003) have been proposed to deal with non-continuous primary outcomes and non-linear treatment effects. We expect that the assisted estimators can also be extended to deal with more complicated primary outcomes and more complicated underlying interaction between treatments and covariates, with the assistance of these more recent variations of SNMMs.


Acknowledgements

This work was supported by NIDA grant P50-DA-010075, NIAAA grant RC1 AA-019092, NIAAA grant P01 AA-016821 and NIAAA grant R01 AA-014851.

Footnotes

7. Supplementary Materials

Web Appendices referenced in Sections 2, 3, 4, and 5, and the R script to obtain the proposed estimators and generate the simulated data sets, are available with this paper at the Biometrics website on Wiley Online Library.

References

  1. Brown RT, Antonuccio DO, DuPaul GJ, Fristad MA, King CA, Leslie LK, McCormick GS, Pelham WE, Jr, Piacentini JC, Vitiello B. Childhood mental health disorders: Evidence base and contextual factors for psychosocial, psychopharmacological, and combined interventions. American Psychological Association; 2008.
  2. Jones HE, O'Grady KE, Tuten M. Reinforcement-based treatment improves the maternal treatment and neonatal outcomes of pregnant patients enrolled in comprehensive care treatment. The American Journal on Addictions. 2011;20:196–204. doi: 10.1111/j.1521-0391.2011.00119.x.
  3. Kasari C. Developmental and augmented intervention for facilitating expressive language (CCNIA). National Institutes of Health; Bethesda, MD: 2009.
  4. Lavori PW, Dawson R. A design for testing clinical strategies: biased adaptive within-subject randomization. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2000;163:29–38.
  5. Lavori PW, Dawson R. Dynamic treatment regimes: practical design considerations. Clinical Trials. 2004;1:9–20. doi: 10.1191/1740774s04cn002oa.
  6. Lavori PW, Dawson R, Rush AJ. Flexible treatment strategies in chronic disease: clinical and research implications. Biological Psychiatry. 2000;48:605–614. doi: 10.1016/s0006-3223(00)00946-x.
  7. Lunceford JK, Davidian M, Tsiatis AA. Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2002;58:48–57. doi: 10.1111/j.0006-341x.2002.00048.x.
  8. McKay JR. Treating Substance Use Disorders With Adaptive Continuing Care. American Psychological Association; 2009.
  9. Murphy SA. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65:331–355.
  10. Murphy SA. An experimental design for the development of adaptive treatment strategies. Statistics in Medicine. 2005;24:1455–1481. doi: 10.1002/sim.2022.
  11. Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. Journal of the American Statistical Association. 2001;96:1410–1423. doi: 10.1198/016214501753382327.
  12. Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, Waxmonsky JG, Yu J, Murphy SA. Experimental design and primary data analysis methods for comparing adaptive interventions. Psychological Methods. 2012a;17:457. doi: 10.1037/a0029372.
  13. Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, Waxmonsky JG, Yu J, Murphy SA. Q-learning: A data analysis method for constructing adaptive interventions. Psychological Methods. 2012b;17:478. doi: 10.1037/a0029373.
  14. Neyman J, Iwaszkiewicz K, Kolodziejczyk S. Statistical problems in agricultural experimentation. Supplement to the Journal of the Royal Statistical Society. 1935:107–180.
  15. Orellana L, Rotnitzky A, Robins JM. Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part I: main content. The International Journal of Biostatistics. 2010;6:1–47.
  16. Oslin D. Managing alcoholism in people who do not respond to naltrexone (ExTENd). National Institutes of Health; Bethesda, MD: 2005.
  17. Pliszka S, AACAP Work Group on Quality Issues. Practice parameter for the assessment and treatment of children and adolescents with attention-deficit/hyperactivity disorder. J. Am. Acad. Child Adolesc. Psychiatry. 2007;46. doi: 10.1097/chi.0b013e318054e724.
  18. Robins J. Optimal structural nested models for optimal sequential decisions. In: Lin D, Heagerty P, editors. Proceedings of the Second Seattle Symposium on Biostatistics. Springer; New York: 2004. pp. 189–326.
  19. Robins J, Orellana L, Rotnitzky A. Estimation and extrapolation of optimal treatment and testing strategies. Statistics in Medicine. 2008;27:4678–4721. doi: 10.1002/sim.3301.
  20. Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. Mathematical models in medicine: diseases and epidemics, Part 2.
  21. Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics: Theory and Methods. 1994;23:2379–2412.
  22. Robins JM. Causal inference from complex longitudinal data. Latent Variable Modeling and Applications to Causality. 1997a:69–117.
  23. Robins JM. Marginal structural models. Proceedings of the American Statistical Association, Section on Bayesian Statistics. 1997b:1–10.
  24. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
  25. Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
  26. Rubin DB. Comment. Journal of the American Statistical Association. 1986;81:961–962.
  27. Rubin DB, et al. Bayesian inference for causal effects: The role of randomization. The Annals of Statistics. 1978;6:34–58.
  28. Thall PF, Sung H-G, Estey EH. Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. Journal of the American Statistical Association. 2002;97:29–39.
  29. Vansteelandt S, Goetghebeur E. Causal inference with generalized structural mean models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65:817–835.
  30. Wahed AS, Tsiatis AA. Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2004;60:124–133. doi: 10.1111/j.0006-341X.2004.00160.x.
  31. Wahed AS, Tsiatis AA. Semiparametric efficient estimation of survival distributions in two-stage randomisation designs in clinical trials with censored data. Biometrika. 2006;93:163–177.
  32. Zhang B, Tsiatis AA, Laber EB, Davidian M. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika. 2013;100:681–694. doi: 10.1093/biomet/ast014.
