Author manuscript; available in PMC: 2022 Sep 1.
Published in final edited form as: Biometrika. 2020 Sep 4;108(3):507–527. doi: 10.1093/biomet/asaa070

Estimating time-varying causal excursion effect in mobile health with binary outcomes

TIANCHEN QIAN 1, HYESUN YOO 2, PREDRAG KLASNJA 3, DANIEL ALMIRALL 4, SUSAN A MURPHY 5
PMCID: PMC8494142  NIHMSID: NIHMS1619823  PMID: 34629476

Summary

Advances in wearables and digital technology now make it possible to deliver behavioral mobile health interventions to individuals in their everyday life. The micro-randomized trial is increasingly used to provide data to inform the construction of these interventions. In a micro-randomized trial, each individual is repeatedly randomized among multiple intervention options, often hundreds or even thousands of times, over the course of the trial. This work is motivated by multiple micro-randomized trials that have been conducted or are currently in the field, in which the primary outcome is a longitudinal binary outcome. The primary aim of such micro-randomized trials is to examine whether a particular time-varying intervention has an effect on the longitudinal binary outcome, often marginally over all but a small subset of the individual’s data. We propose the definition of causal excursion effect that can be used in such primary aim analysis for micro-randomized trials with binary outcomes. Under rather restrictive assumptions one can, based on existing literature, derive a semiparametric, locally efficient estimator of the causal effect. Starting from this estimator, we develop an estimator that can be used as the basis of a primary aim analysis under more plausible assumptions. Simulation studies are conducted to compare the estimators. We illustrate the developed methods using data from the micro-randomized trial, BariFit. In BariFit, the goal is to support weight maintenance for individuals who received bariatric surgery.

Keywords: Binary outcome, Causal excursion effect, Causal inference, Longitudinal data, Micro-randomized trials, Mobile health, Relative risk, Semiparametric efficiency theory

1. Introduction

In mobile health, mobile devices, including smart phones and wearable devices, are used to deliver interventions intended to promote healthy behaviors and health-related behavioral change (Free et al., 2013). Treatments include prompts to self-monitor, cognitive interventions to promote reflection and goal setting as well as suggestions of ways to enact healthy behavior changes. These treatments are delivered to the individual via the individual’s phone or a wearable. An increasingly common trial, called the micro-randomized trial (MRT), is being used to inform the development of mobile health interventions (Liao et al., 2016; Klasnja et al., 2015; Law et al., 2016; Klasnja et al., 2018; Kramer et al., 2019; Tate & Valle, 2019). In an MRT, each individual is repeatedly randomized among the multiple options for a treatment, often hundreds or even thousands of times, over the course of the trial. In all cases the randomization probabilities are determined as part of the design of the trial and are thus known. Between randomizations, covariate data is collected on the individual’s current/recent context via sensors and/or self-report, and after each randomization a proximal, near-time outcome is collected. The time-varying treatments and covariates as well as the time-varying proximal outcome comprise the longitudinal data for use in assessing if a treatment has an effect on the proximal outcome and/or in which contexts this effect may be greater or smaller.

This paper is motivated by our involvement in a number of MRTs in which the primary longitudinal outcome, i.e., the time-varying proximal outcome, is binary. Schematics of these trials can be found at the website of the Methodology Center at the Pennsylvania State University (Methodology Center, 2020). For example, in the Substance Abuse Research Assistance study (Rabbi et al., 2018), the primary outcome is whether or not the user completed a daily survey. In three other MRTs—namely, the Smart Weight Loss Management MRT (Spring & Nahum-Shani, 2016) which is currently in the field, a previous MRT conducted by JOOL Health (Bidargaddi et al., 2018), and in the BariFit (Ridpath, 2017) MRT which we describe in more detail below—the primary outcome is whether or not the user self-monitored on a daily basis.

An important second feature of the MRTs that further motivates this paper is that the primary and secondary aims of the trials (i.e., the comparisons that are of primary and secondary interest to the domain scientists) are often marginal comparisons. Here, marginal means, in part, that the treatment comparisons do not condition on the individual’s full history of data or the full history of prior treatments; in Section 3 we provide a precise definition. For example, in the BariFit MRT, the primary aim is to examine the effect of a daily text message reminder on whether or not the user self-monitored their food intake via a food log by the end of each day. Secondary aims in BariFit also focus on marginal comparisons, such as whether the effect of a daily text message reminder differs depending on whether the user self-monitored on the previous day. In each of these two examples, the domain scientists are interested in a comparison that is marginal over some aspect of the individual’s full history of data including prior treatments.

For the domain scientists, estimates of these marginal comparisons are crucial for informing decisions regarding whether to include the treatment in a mobile health intervention package, as are more complex, conditional treatment effect comparisons that garner understanding of the contexts in which the treatment might be more effective or less effective. Thus, it is important to develop estimators that enable domain scientists to answer any number of scientific questions, from those that are marginal to those that condition on the user’s full history.

In this paper, we consider estimation and inference for causal effects that can be used as the basis of these primary and secondary aim comparisons. One possible causal effect, which is akin to Robins’ treatment blip in structural nested mean models (Robins, 1994, 1997), is one that conditions on the individual’s full history. Our first contribution is that we propose a definition of marginal generalizations of this effect, which we call causal excursion effects. Such effects are marginal over all but a subset of the individual’s prior data, which is chosen by domain scientists, and they are well-suited for primary and secondary aim comparisons. Furthermore, such effects can be viewed as excursions as they represent a causal effect of a time-varying treatment occurring over an interval of time extending into the future. In this case the definition of the excursion effect involves rules for how further treatments, if any, would occur during this interval of time. This can be used to answer questions that naturally arise in MRTs such as “what is the effect of delivering a treatment now then not delivering any treatment for the next m time points”. Lastly, these causal effects, as they are marginal over prior treatment assignments, can be interpreted as contrasts between excursions from the treatment protocol as specified by the micro-randomization. This informs how the current treatment protocol might be improved via moderation analysis on how these causal effects differ by individual’s contexts.

Based on Robins (1994), we provide a semiparametric, locally efficient estimator for the causal effect that is conditional on the full history. Our second contribution is that, starting from this estimator, we develop an estimator that consistently estimates the causal excursion effect, which can be conditional on an arbitrary subset of the history. The estimator is robust in the sense that, for consistency, it does not require the model for the proximal outcome under no treatment to be correctly specified. We propose to use this estimator as the basis of primary and secondary aim comparisons for MRTs with binary outcomes.

2. Preliminaries

2.1. Micro-randomized trials and BariFit

As introduced in Section 1, micro-randomized trials (MRTs) provide longitudinal data for use in developing mobile health interventions (Liao et al., 2016; Dempsey et al., 2015; Klasnja et al., 2015). BariFit, for example, is an MRT that was conducted to aid the process of developing a mobile health intervention for promoting weight maintenance among individuals who received bariatric surgery (Ridpath, 2017). In this study a daily text reminder might be sent to encourage the participant to self-monitor their food intake via a food log; we will refer to this daily text reminder as the food tracking reminder.

In an MRT each participant is randomized, with known probabilities, between the treatment options at predetermined time points. In BariFit, the food tracking reminder is randomized with probability 0.5 between delivering versus not delivering each morning for 112 days. In general, the randomization probability can vary depending on the individual’s data observed up to that time.
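The sequential randomization described above can be sketched in a few lines of Python. This is only an illustration: the function names and the history-dependent rule `adaptive_rule` are hypothetical and do not come from BariFit; only the constant probability 0.5 over 112 days is taken from the text.

```python
import random

def micro_randomize(num_points, prob_fn, seed=0):
    """Sequentially randomize one participant; returns a list of (p_t, A_t)."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_points):
        p_t = prob_fn(history)               # known by design, may use the past
        a_t = 1 if rng.random() < p_t else 0
        history.append((p_t, a_t))
    return history

def constant_half(history):
    # Rule stated in the text for BariFit: probability 0.5 every morning.
    return 0.5

def adaptive_rule(history):
    # Hypothetical rule (an assumption, not from the paper):
    # lower the probability on the day after a reminder was sent.
    return 0.3 if history and history[-1][1] == 1 else 0.6

barifit_like = micro_randomize(112, constant_half)
adaptive = micro_randomize(112, adaptive_rule, seed=1)
print(sum(a for _, a in barifit_like), "reminders in 112 days")
```

Storing each probability alongside its assignment matters because the estimators developed later use the known randomization probabilities directly.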

In BariFit, the proximal outcome for the food tracking reminder is whether the participant completes their food log on that day. The analysis method developed here focuses on this proximal outcome. However, it is conjectured that these reminders will assist the individual in building up healthy habits, and longer term effects are desired. Thus, in defining the causal effects below, we do not assume that longer term effects are absent.

Because treatments are delivered to individuals during their everyday life, there may be times at which it is inappropriate or deemed excessively burdensome to deliver a treatment. For example, if the treatment is a smartphone notification that audibly pings and makes the phone light up, it is inappropriate to deliver the smartphone notification when the individual might be operating a motor vehicle (Klasnja et al., 2018). In these moments the individual is deemed unavailable for treatment. Randomization occurs only at available time points, and the causal effect is conditional on the available times (Boruvka et al., 2018). Due to the fact that many MRTs involve considerations of availability, the methods developed below accommodate this. However, in the case of the BariFit food tracking reminders, they were sent, if at all, early in the morning and, as text messages remain in the phone, the participant is able to read them at a time they deem convenient. Thus in the BariFit study, lack of availability is not a consideration.

2.2. Related literature and our contribution

Because data from mobile health studies are often longitudinal, generalized estimating equations (Liang & Zeger, 1986) and random effects models (Laird & Ware, 1982) are the most commonly used methods for modeling the time-varying association between two or more variables in mobile health studies (Schwartz & Stone, 2007; Bolger & Laurenceau, 2013). However, in the presence of time-varying treatments or time-varying covariates, it is well known that the use of these methods can result in biased causal effect estimates without strong and often unrealistic assumptions (Pepe & Anderson, 1994; Schildcrout & Heagerty, 2005).

Structural nested mean models (SNMMs) and marginal structural models (MSMs) are two classes of models that facilitate estimation of causal effects of a time-varying treatment on a time-varying outcome, where the treatment assignment mechanism may depend on history variables (Robins, 1994, 1997, 2000; Robins et al., 2000). In an SNMM, the effect of sequentially removing an amount of treatment on future outcomes, after having removed all future treatments, is modeled. This effect is conditional on all the history information up to that time. In an MSM, the expectation of the time-varying outcome under a fixed treatment trajectory, possibly conditional on a subset of baseline covariates, is modeled as a function of the treatment trajectory and possibly the subset of baseline covariates.

The causal excursion effect we consider can be conditional on an arbitrary subset of the history, which is chosen by domain scientists. Unlike an MSM, our approach allows estimation of causal effect modification by time-varying covariates. Unlike an SNMM, our causal excursion effect is marginalized over variables not in the subset of the history, i.e., possibly marginal over a large part of the history variables. This makes the estimand coherent with the goal of primary and secondary aim comparisons, and avoids modeling the relationship between the time-varying outcome and the past history in MRTs, where the number of time points can be large and the history can be high-dimensional. Robins (2004, Section 7) considered a related marginalization idea for SNMM, the so-called marginal SNMM. The difference between our causal excursion effect and the effect in Robins’ marginal SNMM is that a marginal SNMM considers a conditioning set that grows over time, and the goal of the marginal SNMM is to estimate the optimal treatment regime with respect to a conditioning set that may not be the full history. Our consideration of the marginalization is for primary and secondary analyses, and we consider general conditioning sets that might not be nested. A related marginalization idea was also considered by Neugebauer et al. (2007) in the history-restricted extension of MSM. Furthermore, the causal excursion effect can be defined as a contrast between two treatment excursions extended into the future, each being a time-varying treatment assignment over an interval of time; this excursion aspect was not considered in either SNMM or MSM.

In previous work on data analytic methods for MRTs, Boruvka et al. (2018) and Dempsey et al. (2020) considered estimation of causal effects of mobile health interventions where the outcome is continuous. In this paper we consider a binary outcome, and we address the unique challenges it poses by considering a log relative risk model for the causal excursion effect and by developing a novel estimator.

3. Definition and assumptions

3.1. Notation and observed data

Suppose that for each individual, there are $T$ time points at which the treatment can be delivered. $T$ is a predetermined nonrandom quantity. For simplicity we assume that there are two treatment options which we will call treatment and no treatment. Thus, the treatment assignment at time $t$, $A_t$, is binary, where 1 means treatment and 0 means no treatment. Denote by $X_t$ the vector of observations collected after time $t - 1$ and up to time $t$; $X_1$ includes baseline covariates. $X_t$ contains the availability indicator, $I_t$; $I_t = 1$ if the individual is available for treatment at time $t$ and $I_t = 0$ otherwise. If $I_t = 0$, randomization will not occur at time $t$ and $A_t = 0$. We use overbar to denote a sequence of variables up to a time point; for example $\bar A_t = (A_1, \ldots, A_t)$. Information accrued up to time $t$ is represented by the history $H_t = (X_1, A_1, X_2, A_2, \ldots, X_{t-1}, A_{t-1}, X_t) = (\bar X_t, \bar A_{t-1})$. The randomization probability for $A_t$ can depend on $H_t$, and is denoted by $p_t(H_t) = P(A_t = 1 \mid H_t)$. $p_t(\cdot)$ is known by the MRT design. The observed data on a generic individual, ordered in time, is $O = (X_1, A_1, \ldots, X_T, A_T, X_{T+1})$. We assume that the data from different individuals are independent and identically distributed draws from an unknown distribution $P_0$. Unless noted otherwise, all expectations are taken with respect to $P_0$.

The proximal outcome, following the treatment assignment at time $t$, $Y_{t,\Delta}$, is a known function of the individual’s data within a subsequent window of length $\Delta$, where $\Delta \ge 1$ is a fixed positive integer; i.e., $Y_{t,\Delta} = y(X_{t+1}, A_{t+1}, \ldots, X_{t+\Delta-1}, A_{t+\Delta-1}, X_{t+\Delta})$ for some known function $y(\cdot)$. In this paper $Y_{t,\Delta}$ is binary. For example, in a smoking cessation study where the treatment is a push notification that reminds the individual to practice stress-reduction exercises (Spring, 2017), the treatment is randomized every minute, albeit with very low probability of sending a push notification at any given minute, and the proximal outcome is whether the individual experiences a stress episode during the 120-minute window following a treatment. In this example, $t$ indexes minutes, and $\Delta = 120$. A simpler setting with $\Delta = 1$ is where the proximal outcome is measured prior to future treatments and is given by $Y_{t,1} = y(X_{t+1})$; an example is the BariFit MRT described in Section 2.1, where the randomization occurs once a day, and the proximal outcome is measured in the same day. The estimator we propose in Section 5 allows for general $\Delta$.
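As a small illustration of a windowed proximal outcome, the following Python sketch computes an outcome that equals 1 if an episode is recorded anywhere in the window following time t. The specific choice of y (an "any episode" indicator that ignores the intermediate treatments) and the data are assumptions made for illustration only.

```python
def proximal_outcome(x, t, delta):
    """Y_{t,Delta}: 1 if any of X_{t+1}, ..., X_{t+Delta} flags an episode.

    x maps a time index k to a 0/1 indicator derived from X_k
    (a hypothetical simplification of the general function y).
    """
    window = [x[k] for k in range(t + 1, t + delta + 1)]
    return int(any(window))

# Hypothetical record: a single episode observed at time 5.
x = {k: 0 for k in range(1, 11)}
x[5] = 1

y_short = proximal_outcome(x, t=2, delta=2)   # window covers times 3 and 4
y_long = proximal_outcome(x, t=2, delta=3)    # window covers times 3, 4 and 5
y_daily = proximal_outcome(x, t=4, delta=1)   # window is time 5 only
print(y_short, y_long, y_daily)
```

With a window of length 1 the function only looks at the next observation, which corresponds to the once-a-day setting described for BariFit.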

For an arbitrary function $f(\cdot)$ of the generic observed data $O$, denote by $\mathbb{P}_n f(O)$ the sample average $\frac{1}{n}\sum_{i=1}^{n} f(O_i)$, where $O_i$ denotes the $i$th individual’s data. We omit the subscript $i$ for the $i$th individual throughout the paper unless necessary. We use $\mathbb{1}$ to denote the indicator function.

3.2. Potential outcomes and causal excursion effect

To define treatment effects, we use the potential outcomes framework (Rubin, 1974; Robins, 1986). For an individual, let $X_t(\bar a_{t-1})$ and $A_t(\bar a_{t-1})$ be the observation that would have been observed and the $t$th treatment that would have been assigned, respectively, if they were assigned the treatment sequence $\bar a_{t-1}$. Then the potential outcomes are defined as

$$\left\{X_1, A_1, X_2(a_1), A_2(a_1), X_3(\bar a_2), \ldots, A_T(\bar a_{T-1}), X_{T+1}(\bar a_T) \ \text{for all} \ \bar a_T \in \{0,1\}^{\otimes T}\right\}, \tag{1}$$

where ⊗ denotes the Cartesian product. The potential outcome for the proximal outcome is $Y_{t,\Delta}(\bar a_{t+\Delta-1})$. The treatment at time $t$ in (1) is indexed by past treatments because in an MRT the randomization probabilities can depend on the participant’s past treatment. However, for notational simplicity, which will be further justified by Assumption 1 in Section 3.3, henceforth denote $A_2(A_1)$ by $A_2$ and so on with $A_t(\bar A_{t-1})$ by $A_t$. The potential history under the observed treatment sequence at time $t$ is $H_t(\bar A_{t-1}) = (X_1, A_1, X_2(A_1), A_2, X_3(\bar A_2), \ldots, X_t(\bar A_{t-1}))$.

We define the causal effect of $A_t$ on $Y_{t,\Delta}$ using the log relative risk scale:

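$$\beta_M\{t, S_t(\bar A_{t-1})\} = \log \frac{E\{Y_{t,\Delta}(\bar A_{t-1}, 1, \bar 0) \mid S_t(\bar A_{t-1}), I_t(\bar A_{t-1}) = 1\}}{E\{Y_{t,\Delta}(\bar A_{t-1}, 0, \bar 0) \mid S_t(\bar A_{t-1}), I_t(\bar A_{t-1}) = 1\}}, \tag{2}$$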

where $S_t(\bar A_{t-1})$ is a vector of summary variables formed from $H_t(\bar A_{t-1})$, and $\bar 0$ is a vector of zeros of length $\Delta - 1$. We omit the notational dependence of $\beta_M$ on $\Delta$ for simplicity. Expression (2) denotes the contrast of the expected outcome under two excursions from the current treatment protocol: treatment at time $t$ and no treatment for the next $\Delta - 1$ time points, versus no treatment at time $t$ and no treatment for the next $\Delta - 1$ time points. In both excursions the treatment assignment up to time $t$, $\bar A_{t-1}$, is stochastic and follows the current treatment protocol of the MRT; i.e., the way the treatments are sequentially randomized with randomization probability $p_t(H_t)$. We call $\beta_M\{t, S_t(\bar A_{t-1})\}$ a causal excursion effect. The expectation in (2) marginalizes over the randomization distribution of those components of $\bar A_{t-1}$ that are not included in $S_t(\bar A_{t-1})$. In other words, the meaning of an excursion is relative to how treatments were assigned in the past: at time $t$, we are considering excursions from the current protocol of assigning treatment. The methods developed below generalize to other types of excursions, such as excursions that specify a decision rule at each time between time $t$ and time $t + \Delta - 1$.

$\beta_M\{t, S_t(\bar A_{t-1})\}$ is a marginal generalization of the treatment blips in a structural nested mean model (Robins, 1994, 1997), where $S_t(\bar A_{t-1})$ is set to be $H_t(\bar A_{t-1})$; hence the subscript $M$ in $\beta_M\{t, S_t(\bar A_{t-1})\}$. We are particularly interested in the marginal effect, because the primary, pre-specified analysis in an MRT usually aims to assess whether a particular intervention component has a marginal effect on the proximal outcome. For such an analysis, one would set $S_t(\bar A_{t-1}) = 1$; i.e., the treatment effect is fully marginal. Subsequent analyses usually have a hierarchy of increasingly complex $S_t(\bar A_{t-1})$ which includes variables that may modify the treatment effect. In this paper we sometimes call $\beta_M\{t, S_t(\bar A_{t-1})\}$ a marginal excursion effect to emphasize its marginal aspect. We discuss in more detail the advantages and limitations of our choice of the causal effect definition and why it is well-suited for mobile health and MRTs in Section 8.

A special case of $\beta_M\{t, S_t(\bar A_{t-1})\}$ is when $\Delta = 1$ and $S_t(\bar A_{t-1})$ is set to $H_t(\bar A_{t-1})$:

$$\beta_C\{t, H_t(\bar A_{t-1})\} = \log \frac{E\{Y_{t,1}(\bar A_{t-1}, 1) \mid H_t(\bar A_{t-1}), I_t(\bar A_{t-1}) = 1\}}{E\{Y_{t,1}(\bar A_{t-1}, 0) \mid H_t(\bar A_{t-1}), I_t(\bar A_{t-1}) = 1\}}. \tag{3}$$

As we will see in Section 4, an estimator for this fully conditional case can be derived based on the semiparametric efficiency literature, which motivates our proposed estimator for the general $\beta_M\{t, S_t(\bar A_{t-1})\}$.

There has been much debate over the choice of association measure for binary outcomes in the literature, and reasons to prefer relative risk over odds ratio include its interpretability and conditions for collapsibility (Greenland, 1987; Lumley et al., 2006). A drawback of using the relative risk as opposed to odds ratio is that in general the estimated probability of success is not guaranteed to lie in the interval [0, 1], unless some alternative parameterization such as Richardson et al. (2017) is used. Nonetheless, we chose to define (2) on the relative risk scale, for both interpretability and modeling ease. See Section 8 for further discussion concerning this modeling choice.

3.3. Identification of parameters

To express the causal excursion effect in terms of the observed data, we make the following assumptions.

Assumption 1 (Consistency). The observed data equals the potential outcome under observed treatment assignment. In particular, $X_2 = X_2(A_1)$, $A_2 = A_2(A_1)$, and for each subsequent $t \le T$, $X_t = X_t(\bar A_{t-1})$, $A_t = A_t(\bar A_{t-1})$, and lastly, $X_{T+1} = X_{T+1}(\bar A_T)$. This implies $Y_{t,\Delta} = Y_{t,\Delta}(\bar A_{t+\Delta-1})$.

Assumption 2 (Positivity). If $\Pr(H_t = h_t, I_t = 1) > 0$, then $\Pr(A_t = a \mid H_t = h_t, I_t = 1) > 0$ for $a \in \{0, 1\}$.

Assumption 3 (Sequential ignorability). For $1 \le t \le T$, the potential outcomes $\{X_{t+1}(\bar a_t), A_{t+1}(\bar a_t), \ldots, X_{T+1}(\bar a_T) : \bar a_T \in \{0,1\}^{\otimes T}\}$ are independent of $A_t$ conditional on $H_t$.

In an MRT, because the treatment is sequentially randomized with known probabilities bounded away from 0 and 1, Assumptions 2 and 3 are satisfied by design. Assumption 1 may fail to hold if there is peer influence or social interaction between individuals; for example, in mobile health interventions with social media components, one individual’s proximal outcome may be dependent on another individual’s treatment assignment, which violates Assumption 1. In those cases, a causal inference framework that incorporates interference needs to be used (Hong & Raudenbush, 2006; Hudgens & Halloran, 2008). To maintain the focus of this paper we do not consider such settings here.

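We show in Appendix A that under Assumptions 1–3, the causal excursion effect (2) can be written in terms of the observed data distribution:

$$\beta_M\{t, S_t(\bar A_{t-1})\} = \log \frac{E\left[E\left\{\prod_{j=t+1}^{t+\Delta-1} \frac{\mathbb{1}(A_j = 0)}{1 - p_j(H_j)}\, Y_{t,\Delta} \,\middle|\, A_t = 1, H_t, I_t = 1\right\} \,\middle|\, S_t, I_t = 1\right]}{E\left[E\left\{\prod_{j=t+1}^{t+\Delta-1} \frac{\mathbb{1}(A_j = 0)}{1 - p_j(H_j)}\, Y_{t,\Delta} \,\middle|\, A_t = 0, H_t, I_t = 1\right\} \,\middle|\, S_t, I_t = 1\right]}, \tag{4}$$

where we define $\prod_{j=t+1}^{t+\Delta-1} \frac{\mathbb{1}(A_j = 0)}{1 - p_j(H_j)} = 1$ if $\Delta = 1$. With a slight abuse of notation, we denote the right hand side of (4) by $\beta_M(t, S_t)$. Similarly the treatment effect conditional on full history with $\Delta = 1$ given in (3) can be written as

$$\beta_C(t, H_t) = \log \frac{E(Y_{t,1} \mid A_t = 1, H_t, I_t = 1)}{E(Y_{t,1} \mid A_t = 0, H_t, I_t = 1)}. \tag{5}$$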
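To make the identification result concrete, the sketch below estimates the fully marginal effect at a single decision point with Δ = 1 by replacing each inner conditional expectation in (4) with an inverse-probability-weighted average. This plug-in is one possible illustration under stated assumptions, not the estimator proposed in this paper; the function name and the toy records are hypothetical.

```python
import math

def marginal_log_rr_ipw(records):
    """Plug-in for (4) with Delta = 1 and no conditioning variables.

    records: list of (p_t, A_t, Y_t) from individuals available at time t,
    where p_t is the known randomization probability p_t(H_t).
    Uses E{A Y / p(H)} = E{E(Y | A = 1, H)} and its analogue for A = 0.
    """
    n = len(records)
    treated = sum(a * y / p for p, a, y in records) / n
    control = sum((1 - a) * y / (1 - p) for p, a, y in records) / n
    return math.log(treated / control)

# Hypothetical toy data: six individuals randomized with probability 0.5.
records = [(0.5, 1, 1), (0.5, 1, 0), (0.5, 0, 1),
           (0.5, 0, 0), (0.5, 1, 1), (0.5, 0, 0)]
log_rr = marginal_log_rr_ipw(records)
print(log_rr)
```

Because the randomization probabilities are known in an MRT, no model for treatment assignment needs to be fitted in this calculation.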

4. A semiparametric, locally efficient estimator

To motivate the estimator for the marginal excursion effect βM(t, St), we first consider the special case where the treatment effect is conditional on the full history Ht and the proximal outcome is defined with Δ = 1; that is, consider (5). Using techniques in Robins (1994), the semiparametric efficient score (Newey, 1990) can be derived; a proof is provided in Appendix I.

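Theorem 1. Suppose $f(\cdot)$ is a known deterministic function such that for $1 \le t \le T$,

$$\beta_C(t, H_t) = \log \frac{E(Y_{t,1} \mid A_t = 1, H_t, I_t = 1)}{E(Y_{t,1} \mid A_t = 0, H_t, I_t = 1)} = f(H_t)^T \psi \tag{6}$$

holds for some unknown $p$-dimensional parameter $\psi$, where the expectation is taken over the true distribution of the data, $P_0$. In the semiparametric model characterized by (6) and Assumptions 1, 2 and 3, the efficient score for $\psi$ is

$$S_{\mathrm{eff}}(\psi) = \sum_{t=1}^{T} I_t\, e^{-A_t f(H_t)^T \psi} \left\{Y_{t,1} - e^{\mu(H_t) + A_t f(H_t)^T \psi}\right\} K_t \{A_t - p_t(H_t)\} f(H_t), \tag{7}$$

where

$$\mu(H_t) = \log E(Y_{t,1} \mid H_t, A_t = 0, I_t = 1), \qquad K_t = \frac{e^{f(H_t)^T \psi}}{e^{f(H_t)^T \psi}\{1 - e^{\mu(H_t)}\}\, p_t(H_t) + \{1 - e^{\mu(H_t) + f(H_t)^T \psi}\}\{1 - p_t(H_t)\}}.$$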

It follows from semiparametric efficiency theory that the solution $\hat\psi$ to $\mathbb{P}_n S_{\mathrm{eff}}(\psi) = 0$ achieves the semiparametric efficiency bound; i.e., it has the smallest asymptotic variance among all semiparametric regular and asymptotically linear estimators for $\psi$ (Newey, 1990; Tsiatis, 2007). Of course this estimator is not practical because $S_{\mathrm{eff}}$ depends on an unknown quantity $\mu(H_t)$. In practice, one can replace $\mu(H_t)$ with a parametric working model and solve the resulting estimating equation, or construct a two-step estimator in which $\mu(H_t)$ is estimated in the first step and its estimate is plugged into $S_{\mathrm{eff}}$ in the second step to form an estimating equation for $\psi$. $S_{\mathrm{eff}}(\psi)$ is robust to a misspecified model for $\mu(H_t)$ (i.e., it has expectation 0 if one replaces $\mu(H_t)$ by an arbitrary function of $H_t$), and the resulting estimator is semiparametric locally efficient, in the sense that it is consistent and, when the working model for $\mu(H_t)$, $1 \le t \le T$, is correctly specified, it attains the semiparametric efficiency bound.

Here we describe a particular implementation of this efficient score; this implementation mainly serves to motivate the proposed method in Section 5, where we consider estimation of a causal excursion effect in which $\Delta \ge 1$ and for which the causal excursion effect is marginal. Let the working model for $\mu(H_t)$ be $g(H_t)^T \alpha$, where $g(H_t)$ is a vector of features constructed from $H_t$ and $\alpha$ is a finite-dimensional parameter that is variationally independent of $\psi$. We combine the resulting estimating function from (7) with an estimating function for $\alpha$ in the working model to obtain

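$$m_C(\alpha, \psi) = \sum_{t=1}^{T} I_t\, e^{-A_t f(H_t)^T \psi} \left\{Y_{t,1} - e^{g(H_t)^T \alpha + A_t f(H_t)^T \psi}\right\} \tilde K_t \begin{bmatrix} g(H_t) \\ \{A_t - p_t(H_t)\} f(H_t) \end{bmatrix}, \tag{8}$$

where

$$\tilde K_t = \frac{e^{f(H_t)^T \psi}}{e^{f(H_t)^T \psi}\{1 - e^{g(H_t)^T \alpha}\}\, p_t(H_t) + \{1 - e^{g(H_t)^T \alpha + f(H_t)^T \psi}\}\{1 - p_t(H_t)\}}.$$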

Equation (8) is similar to a g-estimating equation (Robins, 1994), except that in (8) we are only estimating the immediate effect of a time-varying treatment rather than estimating the effect of all past treatments, and that we are estimating the nuisance parameter α simultaneously rather than estimating α in the first step of a two-step estimator. In Appendix B we prove the following result.
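The pieces of (8) can be written out for a single time point. The Python sketch below assumes scalar features f(H_t) and g(H_t) so that no matrix library is needed; the function names are hypothetical, and the working risks are assumed to be strictly below one so that the denominator of the weight is positive.

```python
import math

def k_tilde(f_psi, g_alpha, p):
    """Weight in (8): f_psi = f(H_t)^T psi, g_alpha = g(H_t)^T alpha, p = p_t(H_t)."""
    risk0 = math.exp(g_alpha)            # working P(Y = 1 | A = 0)
    risk1 = math.exp(g_alpha + f_psi)    # working P(Y = 1 | A = 1)
    rr = math.exp(f_psi)
    return rr / (rr * (1 - risk0) * p + (1 - risk1) * (1 - p))

def m_c_summand(avail, a, y, p, f, g, psi, alpha):
    """One time point's contribution to (8), returned as (alpha part, psi part)."""
    if not avail:                        # I_t = 0 contributes nothing
        return (0.0, 0.0)
    resid = math.exp(-a * f * psi) * (y - math.exp(g * alpha + a * f * psi))
    w = resid * k_tilde(f * psi, g * alpha, p)
    return (w * g, w * (a - p) * f)

# With no treatment effect (psi = 0) the weight reduces to 1 / (1 - risk0).
print(k_tilde(0.0, math.log(0.5), 0.3))
print(m_c_summand(1, 1, 1, 0.5, 1.0, 1.0, 0.0, math.log(0.5)))
```

Summing these contributions over time points and averaging over individuals gives the estimating function whose root is studied in Theorem 2.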

Theorem 2. Suppose (6) and Assumptions 1, 2, and 3 hold, and that the randomization probability $p_t(H_t)$ is known. Let $\dot m_C$ be the derivative of $m_C(\alpha, \psi)$ with respect to $(\alpha, \psi)$. Let $(\hat\alpha, \hat\psi)$ be a solution to $\mathbb{P}_n m_C(\alpha, \psi) = 0$. Suppose $\psi^*$ is the value of $\psi$ corresponding to the data generating distribution, $P_0$. Under regularity conditions, $\sqrt{n}(\hat\psi - \psi^*)$ is asymptotically normal with mean zero and variance-covariance matrix $\Sigma_C$. A consistent estimator for $\Sigma_C$ is the lower block diagonal ($p \times p$) entry of the matrix $\{\mathbb{P}_n \dot m_C(\hat\alpha, \hat\psi)\}^{-1} \{\mathbb{P}_n m_C(\hat\alpha, \hat\psi)\, m_C(\hat\alpha, \hat\psi)^T\} \left[\{\mathbb{P}_n \dot m_C(\hat\alpha, \hat\psi)\}^{-1}\right]^T$. Furthermore, when $g(H_t)^T \alpha$ is a correct model for $\mu(H_t)$ in the sense that there exists $\alpha^*$ such that $g(H_t)^T \alpha^* = \log E(Y_{t,1} \mid H_t, A_t = 0, I_t = 1)$, $\hat\psi$ achieves the semiparametric efficiency bound of the semiparametric model defined in Theorem 1.

Remark 1. The resulting estimator $\hat\psi$ is robust in the sense that it is consistent even if $\exp\{g(H_t)^T \alpha\}$ is a misspecified model for $E(Y_{t,1} \mid H_t, I_t = 1, A_t = 0)$. This robustness results from the orthogonality between the so-called blipped-down outcome (Robins, 1997), $\exp\{-A_t f(H_t)^T \psi^*\} Y_{t,1}$, and the centered treatment indicator, $A_t - p_t(H_t)$; i.e., $E[\exp\{-A_t f(H_t)^T \psi^*\} Y_{t,1} \{A_t - p_t(H_t)\} \mid H_t] = 0$. This orthogonality follows from an important property of the blipped-down outcome: $E[\exp\{-A_t f(H_t)^T \psi^*\} Y_{t,1} \mid H_t, A_t] = E\{Y_{t,1}(\bar A_{t-1}, 0) \mid H_t, A_t\}$. This property plays a key role in the robustness of both the estimator in Theorem 2 and the estimator we develop in Section 5.
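The orthogonality in Remark 1 can be checked by direct enumeration at a single available time point. The Python sketch below fixes a history, computes the exact conditional expectation of the blipped-down residual times the centered treatment indicator, and evaluates it under a correct and two deliberately wrong working values for the log baseline risk. All numerical values are hypothetical.

```python
import math

def expected_score(mu, f_psi, p, working_mu):
    """Exact E[ e^{-A f psi} {Y - e^{working_mu + A f psi}} {A - p} | H_t ].

    mu:         true log P(Y = 1 | H_t, A_t = 0)
    f_psi:      true log relative risk f(H_t)^T psi*
    p:          randomization probability p_t(H_t)
    working_mu: value used in place of mu (possibly misspecified)
    """
    total = 0.0
    for a, prob_a in ((1, p), (0, 1 - p)):
        risk = math.exp(mu + a * f_psi)      # true P(Y = 1 | H_t, A_t = a)
        for y, prob_y in ((1, risk), (0, 1 - risk)):
            resid = math.exp(-a * f_psi) * (y - math.exp(working_mu + a * f_psi))
            total += prob_a * prob_y * resid * (a - p)
    return total

true_mu = math.log(0.3)
values = [expected_score(true_mu, 0.4, 0.6, w) for w in (true_mu, -5.0, -0.1)]
print(values)
```

Each value is zero up to floating-point error, because the blipped-down outcome has the same conditional mean in both arms and the centered indicator has conditional mean zero.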

Remark 2. Consistency of the estimator $\hat\psi$ requires that (6) holds; in other words, it requires that the analysis model $f(H_t)^T \psi$ is a correctly specified model for the true conditional treatment effect $\beta_C(t, H_t)$. When $f(H_t)^T \psi$ is an incorrect model for the true $\beta_C(t, H_t)$, $\hat\psi$ converges in probability to some $\psi'$ whose form is given in Appendix D.

Remark 3. If $g(H_t)$ includes unbounded covariates, then the working model $g(H_t)^T \alpha$ may always be a misspecified model for $\log E(Y_{t,1} \mid H_t, A_t = 0, I_t = 1)$, whose range is $(-\infty, 0]$. One solution is to transform the unbounded covariates before including them in the working model.

5. Estimator for the marginal excursion effect

Now we focus on estimation of $\beta_M(t, S_t)$ where $S_t$ is an arbitrary subset of $H_t$. Suppose $\Delta \ge 1$ is a positive integer. Recall that

$$\beta_M(t, S_t) = \log \frac{E\left[E\left\{\prod_{j=t+1}^{t+\Delta-1} \frac{\mathbb{1}(A_j = 0)}{1 - p_j(H_j)}\, Y_{t,\Delta} \,\middle|\, A_t = 1, H_t, I_t = 1\right\} \,\middle|\, S_t, I_t = 1\right]}{E\left[E\left\{\prod_{j=t+1}^{t+\Delta-1} \frac{\mathbb{1}(A_j = 0)}{1 - p_j(H_j)}\, Y_{t,\Delta} \,\middle|\, A_t = 0, H_t, I_t = 1\right\} \,\middle|\, S_t, I_t = 1\right]}.$$

Note that unlike $\beta_C(t, H_t)$, whose definition is independent of the choice of $f(H_t)$, the definition of $\beta_M(t, S_t)$ is dependent on the choice of $S_t$. We make a parametric assumption on $\beta_M(t, S_t)$. Suppose that for $1 \le t \le T$,

$$\beta_M(t, S_t) = S_t^T \beta \tag{9}$$

holds for some $p$-dimensional parameter $\beta$. Note this model allows for time-dependent effects; $S_t$ could include a vector of basis functions of $t$. The estimation method described below readily generalizes to situations where the parametric model has a known functional form that may be nonlinear; the use of a linear model here enhances presentation clarity.

We propose to use a marginal generalization of the estimating function (8) to estimate β. The proposed estimating function is

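$$m_M(\alpha, \beta) = \sum_{t=1}^{T-\Delta+1} I_t\, e^{-A_t S_t^T \beta} \left\{Y_{t,\Delta} - e^{g(H_t)^T \alpha + A_t S_t^T \beta}\right\} J_t \begin{bmatrix} g(H_t) \\ \{A_t - \tilde p_t(S_t)\} S_t \end{bmatrix}, \tag{10}$$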

where $\exp\{g(H_t)^T \alpha\}$ is a working model for $E\{Y_{t,\Delta}(\bar A_{t-1}, 0, \bar 0) \mid H_t, I_t = 1, A_t = 0\}$. Because the model is now on the marginal effect, we apply a weighting and centering technique similar to Boruvka et al. (2018). The weight at time $t$ is

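$$J_t = \left\{\frac{\tilde p_t(S_t)}{p_t(H_t)}\right\}^{A_t} \left\{\frac{1 - \tilde p_t(S_t)}{1 - p_t(H_t)}\right\}^{1 - A_t} \times \prod_{j=t+1}^{t+\Delta-1} \frac{\mathbb{1}(A_j = 0)}{1 - p_j(H_j)}, \tag{11}$$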

where $\tilde p_t(S_t) \in (0, 1)$ is arbitrary as long as it does not depend on terms in $H_t$ other than $S_t$. The product, $\prod_{j=t+1}^{t+\Delta-1} \mathbb{1}(A_j = 0) / \{1 - p_j(H_j)\}$, is the standard inverse probability weight for settings with $\Delta > 1$. The ratio of probabilities, $\{\tilde p_t(S_t) / p_t(H_t)\}^{A_t} [\{1 - \tilde p_t(S_t)\} / \{1 - p_t(H_t)\}]^{1 - A_t}$, can be viewed as a change of probability: intuitively, this ratio transforms the data distribution in which $A_t$ is randomized with probability $p_t(H_t)$ to a distribution acting as if $A_t$ were randomized with probability $\tilde p_t(S_t)$. We thus center $A_t$ with $\tilde p_t(S_t)$; this centering results in orthogonality between the estimation of $\beta$ and the estimation of the nuisance parameter, $\alpha$. The weighting and centering, together with the factor $\exp(-A_t S_t^T \beta)$, make the resulting estimator for $\beta$ consistent even when the working model $\exp\{g(H_t)^T \alpha\}$ is misspecified.
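The weight in (11) is simple to compute once the treatment indicators and randomization probabilities are stored by time index. The following Python sketch is an illustration with hypothetical names and values; it assumes the probabilities p_j(H_j) have already been evaluated at the observed histories.

```python
def excursion_weight(t, delta, a, p, p_tilde_t):
    """J_t from (11).

    a, p:      dicts mapping time index j to A_j and to p_j(H_j)
    p_tilde_t: the chosen numerator probability, depending at most on S_t
    """
    # Change of probability at time t: from p_t(H_t) to p_tilde_t.
    if a[t] == 1:
        ratio = p_tilde_t / p[t]
    else:
        ratio = (1 - p_tilde_t) / (1 - p[t])
    # Inverse probability weight for "no treatment" at the next delta - 1 points.
    ipw = 1.0
    for j in range(t + 1, t + delta):
        if a[j] != 0:
            return 0.0               # trajectory departs from the excursion
        ipw /= (1 - p[j])
    return ratio * ipw

# Hypothetical values for three consecutive decision points.
a = {1: 1, 2: 0, 3: 0}
p = {1: 0.4, 2: 0.5, 3: 0.2}
j_delta3 = excursion_weight(1, 3, a, p, p_tilde_t=0.5)
j_delta1 = excursion_weight(1, 1, a, p, p_tilde_t=0.5)
print(j_delta3, j_delta1)
```

When Δ = 1 the loop is empty and only the change-of-probability ratio remains, matching the convention that the empty product equals 1.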

In Appendix C we prove the following result.

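Theorem 3. Suppose (9) and Assumptions 1, 2, and 3 hold, and that the randomization probability $p_t(H_t)$ is known. Suppose $\beta^*$ is the value of $\beta$ corresponding to the data generating distribution, $P_0$. Let $\dot m_M$ be the derivative of $m_M(\alpha, \beta)$ with respect to $(\alpha, \beta)$. Let $(\hat\alpha, \hat\beta)$ be a solution to $\mathbb{P}_n m_M(\alpha, \beta) = 0$. Under regularity conditions, $\sqrt{n}(\hat\beta - \beta^*)$ is asymptotically normal with mean zero and variance-covariance matrix $\Sigma_M$. A consistent estimator for $\Sigma_M$ is the lower block diagonal ($p \times p$) entry of the matrix $\{\mathbb{P}_n \dot m_M(\hat\alpha, \hat\beta)\}^{-1} \{\mathbb{P}_n m_M(\hat\alpha, \hat\beta)\, m_M(\hat\alpha, \hat\beta)^T\} \left[\{\mathbb{P}_n \dot m_M(\hat\alpha, \hat\beta)\}^{-1}\right]^T$.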
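A minimal sketch of how the estimating equation (10) and the sandwich variance in Theorem 3 might be computed is given below, for the simplest case only: Δ = 1, an intercept-only effect model (S_t = 1) and an intercept-only working model (g(H_t) = 1). The function names, the Newton iteration, the starting values and the simulated data are all assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random

def emee_intercept_only(data, p_tilde=0.5, tol=1e-10, max_iter=100):
    """Solve P_n m_M(alpha, beta) = 0 with Delta = 1, S_t = 1, g(H_t) = 1.

    data: list of participants, each a list of (I_t, A_t, p_t, Y_t),
          where p_t is the known randomization probability.
    Returns (alpha_hat, beta_hat, standard error of beta_hat).
    """
    n = len(data)

    def person_terms(person, alpha, beta):
        # m_i (2-vector) and its derivative (2 x 2) for one participant.
        m = [0.0, 0.0]
        dm = [[0.0, 0.0], [0.0, 0.0]]
        for avail, a, p, y in person:
            if not avail:
                continue
            w = p_tilde / p if a == 1 else (1 - p_tilde) / (1 - p)  # J_t
            blipped = math.exp(-a * beta) * y
            resid = blipped - math.exp(alpha)   # e^{-A b}{Y - e^{alpha + A b}}
            cen = a - p_tilde                   # centered treatment
            d_alpha, d_beta = -math.exp(alpha), -a * blipped
            m[0] += w * resid
            m[1] += w * resid * cen
            dm[0][0] += w * d_alpha
            dm[0][1] += w * d_beta
            dm[1][0] += w * d_alpha * cen
            dm[1][1] += w * d_beta * cen
        return m, dm

    def averages(alpha, beta):
        m_bar = [0.0, 0.0]
        dm_bar = [[0.0, 0.0], [0.0, 0.0]]
        mm_bar = [[0.0, 0.0], [0.0, 0.0]]
        for person in data:
            m, dm = person_terms(person, alpha, beta)
            for r in range(2):
                m_bar[r] += m[r] / n
                for c in range(2):
                    dm_bar[r][c] += dm[r][c] / n
                    mm_bar[r][c] += m[r] * m[c] / n
        return m_bar, dm_bar, mm_bar

    def inv2(mat):
        (a, b), (c, d) = mat
        det = a * d - b * c
        return [[d / det, -b / det], [-c / det, a / det]]

    alpha, beta = math.log(0.5), 0.0            # arbitrary starting values
    for _ in range(max_iter):
        m_bar, dm_bar, _ = averages(alpha, beta)
        inv = inv2(dm_bar)
        step_a = inv[0][0] * m_bar[0] + inv[0][1] * m_bar[1]
        step_b = inv[1][0] * m_bar[0] + inv[1][1] * m_bar[1]
        alpha, beta = alpha - step_a, beta - step_b   # Newton update
        if abs(step_a) + abs(step_b) < tol:
            break

    _, dm_bar, mm_bar = averages(alpha, beta)
    bread = inv2(dm_bar)
    # Sandwich bread * meat * bread^T; its [1][1] entry is the variance for beta.
    var_beta = sum(bread[1][r] * mm_bar[r][c] * bread[1][c]
                   for r in range(2) for c in range(2))
    return alpha, beta, math.sqrt(var_beta / n)

def simulate(n=200, T=30, p=0.5, beta_true=0.3, seed=1):
    # Hypothetical generative model: baseline risk 0.3, relative risk e^{beta_true}.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        person = []
        for _ in range(T):
            a = 1 if rng.random() < p else 0
            y = 1 if rng.random() < 0.3 * math.exp(beta_true * a) else 0
            person.append((1, a, p, y))
        data.append(person)
    return data

data = simulate()
alpha_hat, beta_hat, se_beta = emee_intercept_only(data, p_tilde=0.5)
print(beta_hat, se_beta)
```

In this special case, with the numerator probability equal to the constant randomization probability, the solution coincides with the log ratio of the two arms' sample means, which gives a convenient sanity check. A vector-valued S_t or g(H_t) would need a general linear solver in place of the hand-written 2 x 2 inverse.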

Remark 4. The consistency of $\hat\beta$ does not require the working model $\exp\{g(H_t)^T \alpha\}$ to be correctly specified. This robustness property is desirable because $H_t$ can be high dimensional in an MRT where the total number of time points, $T$, can be in the hundreds or even thousands, which makes it difficult to model $E\{Y_{t,\Delta}(\bar A_{t-1}, 0, \bar 0) \mid H_t, I_t = 1, A_t = 0\}$ correctly.

Remark 5. Under the assumptions in Theorem 3, the choice of p~t(St) doesn’t affect the consistency of β^ as long as it depends at most on St and it lies in (0, 1); it affects the asymptotic variance of β^. When the analysis model StTβ is an incorrect model for the true βM (t, St), p~t(St) contributes to determine the probability limit of β^. For example, suppose the data analyst chooses Δ = 1 and St = ∅; i.e., the analysis model is a constant over time, β0. Suppose, however, that the true treatment effect βM(t, ∅) is not a constant over time. In this case, when we set p~t(St) to be any constant in (0, 1), β^ converges in probability to

$$\beta' = \log\frac{\sum_{t=1}^{T} E\{E(Y_{t,1}\mid H_t, A_t=1)\mid I_t=1\}\,E(I_t)}{\sum_{t=1}^{T} E\{E(Y_{t,1}\mid H_t, A_t=0)\mid I_t=1\}\,E(I_t)},$$

which further simplifies to

$$\log\frac{\sum_{t=1}^{T} E(Y_{t,1}\mid I_t=1, A_t=1)\,E(I_t)}{\sum_{t=1}^{T} E(Y_{t,1}\mid I_t=1, A_t=0)\,E(I_t)}$$

if the randomization probability $p_t(H_t)$ is constant. For general $\Delta$ and $S_t$, the form of the probability limit of $\hat\beta$, denoted $\beta'$, is provided in Appendix E.
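To make the simplified limit concrete, the sketch below evaluates it for a two-time-point example. The per-time means and availability probabilities are hypothetical values chosen for illustration, and the function name is not from the paper. The point is that the limit is the log of a ratio of availability-weighted sums, which generally differs from the average of the per-time log relative risks.

```python
import math

def pooled_log_relative_risk(mean_treated, mean_control, availability):
    """Log of the ratio of availability-weighted sums of arm-specific means.

    mean_treated[t]: E(Y_{t,1} | I_t = 1, A_t = 1)
    mean_control[t]: E(Y_{t,1} | I_t = 1, A_t = 0)
    availability[t]: E(I_t)
    """
    numerator = sum(m1 * w for m1, w in zip(mean_treated, availability))
    denominator = sum(m0 * w for m0, w in zip(mean_control, availability))
    return math.log(numerator / denominator)

# Hypothetical inputs: the effect is present at t = 1 and absent at t = 2.
mean_treated = [0.4, 0.2]
mean_control = [0.2, 0.2]
availability = [1.0, 1.0]

beta_limit = pooled_log_relative_risk(mean_treated, mean_control, availability)

# For comparison: the plain average of the per-time log relative risks.
average_of_logs = sum(
    math.log(m1 / m0) for m1, m0 in zip(mean_treated, mean_control)
) / len(mean_treated)
```

In this example the pooled limit equals log(1.5), whereas the average of the per-time log relative risks is log(2)/2, so a constant analysis model targets a particular weighted summary rather than a simple time average.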

Remark 6. The estimating equation (10) for $\beta$ is motivated by the locally efficient estimating equation (8) for $\psi$ in its use of the blipping-down factor, $\exp(-A_t S_t^T\beta)$. Note that unlike $\hat\psi$ from (8), the estimator $\hat\beta$ from (10) may not be semiparametric locally efficient. We were not able to derive the semiparametric efficiency bound for $\beta$ in (9). We suspect that the semiparametric efficient estimating equation for $\beta$ would include an analogue of $\tilde K_t$, but it is not straightforward how to obtain the marginal analogue of $\tilde K_t$ for estimating $\beta$.

Remark 7. In the definition of $\beta_M(t,S_t)$, $\Delta$ characterizes the length, i.e., the number of time points, of the excursion into the future. Since the excursion considered here specifies no treatment at the $\Delta - 1$ time points following the current time point, consideration of a large $\Delta$ may result in instability in the estimator based on (10). This would occur if the randomization probability to treatment at any time point is much greater than zero. To see this, note that the summand in the estimating equation contributes to the estimation, i.e., is not a constant zero, only if $J_t$ is nonzero, which holds only if $A_j = 0$ for all $t+1 \le j \le t+\Delta-1$. Therefore, for large $\Delta$, when the randomization probability is close to zero, many of the observed treatment trajectories will contribute to the estimation; when the randomization probability is much greater than zero, only very few treatment trajectories will contribute to the estimation, making the estimator unstable with large variance.
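The instability described in Remark 7 can be quantified with a small calculation. The sketch below assumes, purely for illustration, that treatment is randomized independently at every time point with the same constant probability; the function name is hypothetical. Under that assumption the chance that a decision point contributes to the estimating equation is the chance of no treatment at the following Δ − 1 points.

```python
def contributing_fraction(p, delta):
    """Probability that A_j = 0 at all delta - 1 time points after time t,
    assuming independent randomization with constant probability p."""
    return (1.0 - p) ** (delta - 1)

# Expected share of decision points with a nonzero weight for Delta = 5.
low_probability = contributing_fraction(0.1, 5)   # sparse treatment
high_probability = contributing_fraction(0.5, 5)  # frequent treatment
```

With Δ = 5, roughly two thirds of decision points contribute when the randomization probability is 0.1, but only about 6% contribute when it is 0.5.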

6. Simulation

6.1. Overview

In the simulation we focus on the causal excursion effect with Δ = 1, and we conduct two simulation studies to evaluate the proposed estimator of the marginal excursion effect (EMEE) in Section 5 and the semiparametric, locally efficient estimator of the conditional effect (ECE) described in Section 4.

Because the sandwich estimator for the variance of EMEE in Theorem 3 can be anti-conservative when the sample size is small, we adopt the small sample correction technique in Mancl & DeRouen (2001) to modify the term $\mathbb{P}_n m_M(\hat\alpha,\hat\beta)^{\otimes 2}$ in the variance estimator. In particular, we pre-multiply the vector of each individual’s residuals, $(Y_{t,1}-\exp\{g(H_t)^T\hat\alpha + A_t S_t^T\hat\beta\} : 1 \le t \le T)$, by the inverse of the identity matrix minus the leverage for this individual. Also, as in Liao et al. (2016), we use critical values from a t distribution. In particular, for a known p-dimensional vector $c$, to test the null hypothesis $c^T\beta = 0$ or to form two-sided confidence intervals, we use the critical value $t^{-1}_{n-p-q}(1-\xi/2)$, where $p$, $q$ are the dimensions of $\beta$, $\alpha$, respectively, and $\xi$ is the significance level. Similar small sample corrections are applied to ECE as well.

The numerical algorithm that solves $\mathbb{P}_n m_C(\alpha,\psi)=0$ can be unstable when the denominator in $\tilde K_t$ gets close to 0. This is because $\exp\{g(H_t)^T\alpha\}$ and $\exp\{g(H_t)^T\alpha + f(H_t)^T\psi\}$ are not constrained within (0, 1). In our implementation of ECE, to improve the numerical stability we replace $\tilde K_t$ in (8) by

$$\frac{e^{f(H_t)^T\psi}}{e^{f(H_t)^T\psi}\,[1-\min\{e^{g(H_t)^T\alpha},\lambda\}]\,p_t(H_t) + [1-\min\{e^{g(H_t)^T\alpha+f(H_t)^T\psi},\lambda\}]\,\{1-p_t(H_t)\}}, \tag{12}$$

with the truncation parameter value λ = 0.95. In Appendix G we also provide an alternative, two-step implementation of ECE that does not rely on truncation. The two implementations result in similar performance in terms of standard error and bias, so we conjecture that this truncation-based implementation with λ = 0.95 incurs at most negligible efficiency loss compared to the original ECE estimator, i.e., the one with λ = ∞.
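A minimal sketch of the truncated weight follows. It reads (12) as exp{f(Ht)Tψ} divided by the sum of exp{f(Ht)Tψ}[1 − min{exp(g(Ht)Tα), λ}]pt(Ht) and [1 − min{exp(g(Ht)Tα + f(Ht)Tψ), λ}]{1 − pt(Ht)}; that reading, the function name, and the use of scalar linear predictors as inputs are assumptions of this illustration rather than the authors' code.

```python
import math

def truncated_weight(g_alpha, f_psi, p_t, lam=0.95):
    """Weight in (12) with the fitted risks capped at lam.

    g_alpha: linear predictor g(H_t)^T alpha (log risk under no treatment)
    f_psi:   linear predictor f(H_t)^T psi (log relative risk)
    p_t:     randomization probability p_t(H_t)
    lam:     truncation level; math.inf gives the untruncated weight
    """
    risk_control = min(math.exp(g_alpha), lam)
    risk_treated = min(math.exp(g_alpha + f_psi), lam)
    ratio = math.exp(f_psi)
    denominator = (ratio * (1.0 - risk_control) * p_t
                   + (1.0 - risk_treated) * (1.0 - p_t))
    return ratio / denominator

# A fitted "risk" above 1 (exp(0.5) is about 1.65) makes the untruncated
# denominator negative, while the truncated version stays positive.
untruncated = truncated_weight(0.5, 0.0, 0.5, lam=math.inf)
truncated = truncated_weight(0.5, 0.0, 0.5, lam=0.95)
```

Capping each fitted risk at λ keeps both bracketed terms at least 1 − λ, so the denominator is bounded away from zero during the iterations of the solver.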

Throughout the simulations, we assume that all individuals are available at all time points, and thus we omit It = 1 in writing conditional expectations.

R code (R Core Team, 2018) to reproduce the simulation results can be downloaded at https://github.com/tqian/binary-outcome-mrt.

6.2. Simulation on consistency

We consider a generative model where an important moderator exists. We illustrate that EMEE consistently estimates both the marginal excursion effect when the moderator is not included in St, in which case the effect is averaged over the distribution of the moderator, and the causal excursion effect moderation when the moderator is included in St. On the other hand, ECE only consistently estimates the treatment effect conditional on the full history, which we refer to as the conditional treatment effect, when the moderator is included in f(Ht); incorrect use of ECE to estimate the marginal excursion effect by excluding the moderator in f(Ht) results in inconsistent estimates.

We use the following generative model. The time-varying covariate, Zt, is independent of all variables observed before Zt, and it takes three values 0, 1, 2 with equal probability. The randomization probability is constant with pt(Ht) = 0.2. The outcome Yt,1 is generated from a Bernoulli distribution with

$$E(Y_{t,1}\mid H_t, A_t) = \{0.2\cdot 1_{Z_t=0} + 0.5\cdot 1_{Z_t=1} + 0.4\cdot 1_{Z_t=2}\}\,e^{A_t(0.1+0.3Z_t)}.$$

Here, Zt moderates the conditional treatment effect: The true conditional treatment effect βC(t,Ht) equals 0.1 + 0.3Zt.

We first consider estimating the fully marginal excursion effect, which equals

$$\beta_0 = \log\frac{E\{E(Y_{t,1}\mid H_t, A_t=1)\}}{E\{E(Y_{t,1}\mid H_t, A_t=0)\}} = 0.477.$$
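The value 0.477 can be checked directly from the generative model, since the covariate takes the values 0, 1, 2 with equal probability. The short sketch below averages the conditional means over that distribution; the variable and function names are illustrative only.

```python
import math

# Outcome probability under no treatment for each value of the covariate Z_t.
BASELINE_RISK = {0: 0.2, 1: 0.5, 2: 0.4}

def marginal_mean(a):
    """E{E(Y_{t,1} | H_t, A_t = a)}, averaging over Z_t uniform on {0, 1, 2}."""
    return sum(
        risk * math.exp(a * (0.1 + 0.3 * z)) for z, risk in BASELINE_RISK.items()
    ) / len(BASELINE_RISK)

beta0 = math.log(marginal_mean(1) / marginal_mean(0))
print(round(beta0, 3))  # 0.477
```

Note that this marginal value is not the average of the conditional effects 0.1, 0.4, and 0.7, because the marginalization is carried out on the risk scale before taking the log ratio.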

This is the setting of a typical primary analysis of an MRT. To estimate $\beta_0$, by Theorem 3 it is appropriate to use the EMEE estimator with $S_t = 1$. For illustration purposes, we also consider using the ECE estimator with $f(H_t) = 1$ in the simulation. Note that this choice of $f(H_t)$ corresponds to a misspecified model for $\beta_C(t,H_t)$, because the true conditional treatment effect is $\beta_C(t,H_t) = 0.1 + 0.3Z_t$. For this generative model, the ECE estimator is appropriate, i.e., its probability limit is easily interpretable, only if $Z_t$ is included in $f(H_t)$. For comparison, we also include the generalized estimating equations (GEE) estimator for binary outcome with log link in the simulation, because GEE is widely used in analyzing mobile health data (Schwartz & Stone, 2007; Bolger & Laurenceau, 2013). We consider two working correlation structures for GEE: independent, denoted by GEE.ind, and exchangeable, denoted by GEE.exch. We use the working model $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t$ for the log of the expected outcome under no treatment, which is misspecified for all estimators.

The simulation results for estimating $\beta_0$ are given in Table 1. The total number of time points is T = 30 for each individual. The bias, standard deviation (SD), root mean squared error (RMSE), and 95% confidence interval coverage probability before small sample correction (CP (unadj)) and after small sample correction (CP (adj)) are all computed based on 1000 replicates. As expected, EMEE consistently estimates $\beta_0$, and the incorrect use of ECE, which is due to misspecification of the conditional treatment effect model with $f(H_t) = 1$, results in an inconsistent estimator for $\beta_0$. The consistency of GEE generally requires the working model $g(H_t)^T\alpha$ to be correct; in other words, it lacks the robustness property of EMEE. The results show that both GEE.ind and GEE.exch are inconsistent. We also see that small sample correction helps to improve the confidence interval coverage for EMEE.
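The consistency claim for the fully marginal effect can be illustrated with a simplified Monte Carlo sketch in Python (the authors' own simulation code is in R at the repository cited in Section 6.1). Two simplifications are assumptions of this sketch and not the setup of Table 1: the working model contains only an intercept, and the centering probability is set equal to the constant randomization probability 0.2. If the estimating equation has the weighted-and-centered form described in Section 5, then under these choices with St = 1 and Δ = 1 its solution reduces to the log ratio of the two arm-specific sample means, which is what the function below computes.

```python
import math
import random

BASELINE_RISK = {0: 0.2, 1: 0.5, 2: 0.4}

def simulate_log_relative_risk(n_individuals, n_times=30, p=0.2, seed=0):
    """Simulate the Section 6.2 generative model and return the log ratio of
    the mean outcome among treated versus untreated decision points."""
    rng = random.Random(seed)
    outcome_sum = {0: 0, 1: 0}
    n_points = {0: 0, 1: 0}
    for _ in range(n_individuals * n_times):
        z = rng.randrange(3)                 # Z_t uniform on {0, 1, 2}
        a = 1 if rng.random() < p else 0     # A_t randomized with probability p
        risk = BASELINE_RISK[z] * math.exp(a * (0.1 + 0.3 * z))
        y = 1 if rng.random() < risk else 0  # Y_{t,1} is Bernoulli(risk)
        outcome_sum[a] += y
        n_points[a] += 1
    mean_treated = outcome_sum[1] / n_points[1]
    mean_control = outcome_sum[0] / n_points[0]
    return math.log(mean_treated / mean_control)

# A large hypothetical trial, so that the estimate sits close to its limit.
estimate = simulate_log_relative_risk(n_individuals=2000)
```

With a large number of simulated individuals the estimate should fall close to the true value 0.477, without the moderator entering the analysis model.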

Table 1.

Performance of EMEE, ECE, GEE.ind, and GEE.exch for the marginal excursion effect β0.

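| Estimator | Sample size | Bias | SD | RMSE | CP (unadj) | CP (adj) |
|---|---|---|---|---|---|---|
| EMEE | 30 | 0.000 | 0.077 | 0.077 | 0.93* | 0.94 |
| EMEE | 50 | 0.001 | 0.057 | 0.057 | 0.94 | 0.95 |
| EMEE | 100 | 0.000 | 0.041 | 0.041 | 0.95 | 0.95 |
| ECE | 30 | 0.048* | 0.075 | 0.089 | 0.85* | 0.88* |
| ECE | 50 | 0.049* | 0.055 | 0.074 | 0.84* | 0.85* |
| ECE | 100 | 0.048* | 0.040 | 0.063 | 0.75* | 0.76* |
| GEE.ind | 30 | 0.041* | 0.073 | 0.084 | 0.88* | 0.89* |
| GEE.ind | 50 | 0.042* | 0.054 | 0.069 | 0.86* | 0.87* |
| GEE.ind | 100 | 0.041* | 0.039 | 0.056 | 0.80* | 0.81* |
| GEE.exch | 30 | 0.041* | 0.073 | 0.084 | 0.87* | 0.89* |
| GEE.exch | 50 | 0.042* | 0.054 | 0.069 | 0.86* | 0.88* |
| GEE.exch | 100 | 0.041* | 0.039 | 0.056 | 0.80* | 0.81* |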

EMEE: the estimator of the marginal excursion effect proposed in Section 5. ECE: the semiparametric, locally efficient estimator of the conditional effect described in Section 4. GEE.ind: GEE with independent working correlation structure. GEE.exch: GEE with exchangeable working correlation structure. SD: standard deviation. RMSE: root mean squared error. CP: 95% confidence interval coverage probability, before (unadj) and after (adj) small sample correction. Numbers marked with * indicate that Bias or CP is significantly different, at the 5% level, from 0 or 0.95, respectively. Sample size refers to the number of individuals in each simulated trial.

6.3. Simulation on efficiency

Using the same generative model as in Section 6.2, we now consider estimating the excursion effect moderation by Zt, which can occur in a typical secondary analysis. We set St = Zt in EMEE and f(Ht) = Zt in ECE. Because the generative model implies that

$$\log\frac{E\{E(Y_{t,1}\mid H_t, A_t=1)\mid Z_t\}}{E\{E(Y_{t,1}\mid H_t, A_t=0)\mid Z_t\}} = \log\frac{E(Y_{t,1}\mid H_t, A_t=1)}{E(Y_{t,1}\mid H_t, A_t=0)} = 0.1 + 0.3Z_t,$$

in this case the parameter values in the causal excursion effect and in the conditional treatment effect coincide. In other words, the analysis models for EMEE and ECE are both correct, and hence both estimators should be consistent for β0 = 0.1 and β1 = 0.3.

To assess the relative efficiency between ECE and EMEE, which are both consistent for the same estimands, we consider two ways to specify the control variables: an incorrectly specified working model $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t$, and a correctly specified working model $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t + \alpha_2 1_{Z_t=2}$. We also include GEE.ind and GEE.exch for comparison. Because consistency of GEE relies on correct specification of not only the treatment effect model but also the control variables, we expect GEE.ind and GEE.exch to be consistent only when $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t + \alpha_2 1_{Z_t=2}$.

The results are given in Tables 2 and 3. In both tables, the total number of time points is 30 for each individual. The bias, standard deviation (SD), root mean squared error (RMSE), 95% confidence interval coverage probability before small sample correction (CP (unadj)) and after small sample correction (CP (adj)) are all computed based on 1000 replicates.

Table 2.

Performance of EMEE, ECE, GEE.ind, and GEE.exch for the treatment effect modification $S_t = f(H_t) = Z_t$, when the working model is misspecified, i.e., $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t$.

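| Estimator | Sample size | β0: Bias | β0: RMSE | β0: SD | β0: CP (unadj) | β0: CP (adj) | β1: Bias | β1: RMSE | β1: SD | β1: CP (unadj) | β1: CP (adj) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EMEE | 30 | −0.02 | 0.20 | 0.20 | 0.94 | 0.95 | 0.01 | 0.13 | 0.13 | 0.94 | 0.95 |
| EMEE | 50 | −0.01 | 0.16 | 0.16 | 0.95 | 0.96 | 0.01 | 0.11 | 0.11 | 0.94 | 0.95 |
| EMEE | 100 | −0.01 | 0.11 | 0.11 | 0.96 | 0.96 | 0.01 | 0.07 | 0.07 | 0.95 | 0.96 |
| ECE | 30 | −0.02 | 0.18 | 0.18 | 0.94 | 0.95 | 0.01 | 0.12 | 0.12 | 0.93* | 0.94 |
| ECE | 50 | −0.01 | 0.15 | 0.15 | 0.94 | 0.95 | 0.00 | 0.09 | 0.09 | 0.94 | 0.94 |
| ECE | 100 | −0.01 | 0.10 | 0.10 | 0.96 | 0.96 | 0.01 | 0.06 | 0.06 | 0.94 | 0.95 |
| GEE.ind | 30 | 0.14* | 0.21 | 0.15 | 0.82* | 0.85* | −0.12* | 0.15 | 0.08 | 0.75* | 0.78* |
| GEE.ind | 50 | 0.15* | 0.19 | 0.12 | 0.75* | 0.77* | −0.12* | 0.14 | 0.07 | 0.60* | 0.63* |
| GEE.ind | 100 | 0.15* | 0.17 | 0.08 | 0.57* | 0.58* | −0.12* | 0.13 | 0.05 | 0.33* | 0.34* |
| GEE.exch | 30 | 0.14* | 0.21 | 0.15 | 0.82* | 0.85* | −0.12* | 0.15 | 0.08 | 0.75* | 0.77* |
| GEE.exch | 50 | 0.15* | 0.19 | 0.12 | 0.75* | 0.77* | −0.12* | 0.14 | 0.07 | 0.60* | 0.62* |
| GEE.exch | 100 | 0.15* | 0.17 | 0.08 | 0.57* | 0.58* | −0.12* | 0.13 | 0.05 | 0.33* | 0.34* |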

EMEE: the estimator of the marginal excursion effect proposed in Section 5. ECE: the semiparametric, locally efficient estimator of the conditional effect described in Section 4. GEE.ind: GEE with independent working correlation structure. GEE.exch: GEE with exchangeable working correlation structure. SD: standard deviation. RMSE: root mean squared error. CP: 95% confidence interval coverage probability, before (unadj) and after (adj) small sample correction. Numbers marked with * indicate that Bias or CP is significantly different, at the 5% level, from 0 or 0.95, respectively. Sample size refers to the number of individuals in each simulated trial.

Table 3.

Performance of EMEE, ECE, GEE.ind, and GEE.exch for the treatment effect modification $S_t = f(H_t) = Z_t$, when the working model is correctly specified, i.e., $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t + \alpha_2 1_{Z_t=2}$.

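| Estimator | Sample size | β0: Bias | β0: RMSE | β0: SD | β0: CP (unadj) | β0: CP (adj) | β1: Bias | β1: RMSE | β1: SD | β1: CP (unadj) | β1: CP (adj) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EMEE | 30 | 0.00 | 0.21 | 0.21 | 0.93* | 0.94 | 0.00 | 0.13 | 0.13 | 0.94 | 0.95 |
| EMEE | 50 | 0.00 | 0.16 | 0.16 | 0.94 | 0.95 | 0.00 | 0.11 | 0.11 | 0.94 | 0.95 |
| EMEE | 100 | 0.00 | 0.11 | 0.11 | 0.95 | 0.95 | 0.00 | 0.07 | 0.07 | 0.95 | 0.95 |
| ECE | 30 | 0.01 | 0.17 | 0.17 | 0.94 | 0.95 | −0.01 | 0.12 | 0.12 | 0.93* | 0.94 |
| ECE | 50 | 0.00 | 0.13 | 0.13 | 0.94 | 0.95 | 0.00 | 0.09 | 0.09 | 0.94 | 0.95 |
| ECE | 100 | 0.00 | 0.10 | 0.10 | 0.94 | 0.95 | 0.00 | 0.06 | 0.06 | 0.95 | 0.95 |
| GEE.ind | 30 | 0.01 | 0.17 | 0.17 | 0.94 | 0.95 | −0.01 | 0.12 | 0.12 | 0.93* | 0.94 |
| GEE.ind | 50 | 0.00 | 0.13 | 0.13 | 0.94 | 0.95 | 0.00 | 0.09 | 0.09 | 0.94 | 0.95 |
| GEE.ind | 100 | 0.00 | 0.10 | 0.10 | 0.94 | 0.94 | 0.00 | 0.06 | 0.06 | 0.94 | 0.95 |
| GEE.exch | 30 | 0.01 | 0.17 | 0.17 | 0.94 | 0.95 | −0.01 | 0.12 | 0.12 | 0.93* | 0.94 |
| GEE.exch | 50 | 0.00 | 0.13 | 0.13 | 0.94 | 0.95 | 0.00 | 0.09 | 0.09 | 0.94 | 0.95 |
| GEE.exch | 100 | 0.00 | 0.10 | 0.10 | 0.94 | 0.94 | 0.00 | 0.06 | 0.06 | 0.95 | 0.95 |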

EMEE: the estimator of the marginal excursion effect proposed in Section 5. ECE: the semiparametric, locally efficient estimator of the conditional effect described in Section 4. GEE.ind: GEE with independent working correlation structure. GEE.exch: GEE with exchangeable working correlation structure. SD: standard deviation. RMSE: root mean squared error. CP: 95% confidence interval coverage probability, before (unadj) and after (adj) small sample correction. Numbers marked with * indicate that Bias or CP is significantly different, at the 5% level, from 0 or 0.95, respectively. Sample size refers to the number of individuals in each simulated trial.

Table 2 shows the simulation results for EMEE with $S_t = Z_t$ and ECE with $f(H_t) = Z_t$, both with incorrectly specified control variables $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t$. As expected, EMEE and ECE are consistent for β0 and β1, and GEE.ind and GEE.exch are inconsistent. We also see that ECE can be slightly more efficient than EMEE. For example, the relative efficiency between ECE and EMEE for estimating β0 with sample size 30 is $(0.20/0.18)^2 = 1.23$.

Table 3 shows the simulation results for EMEE with $S_t = Z_t$ and ECE with $f(H_t) = Z_t$, both with correctly specified control variables $g(H_t)^T\alpha = \alpha_0 + \alpha_1 Z_t + \alpha_2 1_{Z_t=2}$. As expected, all four estimators are consistent for β0 and β1. In this case, ECE is much more efficient than EMEE because ECE achieves the semiparametric efficiency bound under correctly specified control variables. For example, the relative efficiency between ECE and EMEE for estimating β0 with sample size 30 is $(0.21/0.17)^2 = 1.53$.
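The two relative efficiencies quoted above are squared ratios of the Monte Carlo standard deviations, i.e., ratios of variances. As a small worked check, using the SD entries for β0 at sample size 30 from Tables 2 and 3 (the function name is illustrative):

```python
def relative_efficiency(sd_emee, sd_ece):
    """Variance of EMEE divided by variance of ECE, from their standard deviations."""
    return (sd_emee / sd_ece) ** 2

# Misspecified working model (Table 2) and correctly specified one (Table 3).
efficiency_misspecified = relative_efficiency(0.20, 0.18)
efficiency_correct = relative_efficiency(0.21, 0.17)
print(round(efficiency_misspecified, 2), round(efficiency_correct, 2))  # 1.23 1.53
```

A value of 1.53 means that EMEE would need roughly 53% more individuals than ECE to reach the same variance in that setting.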

The above results indicate that when the working model $g(H_t)^T\alpha$ is misspecified, there could be a slight efficiency gain from using ECE over EMEE when both estimators are consistent for the causal excursion effect. When the working model is correctly specified, the efficiency gain from using ECE can be substantial. Additional simulations in Appendix F under other generative models also support this conclusion. Thus if one had adequate data to consistently estimate the potentially complex, high dimensional $E(Y_{t,1}\mid H_t, A_t=0)$, and one felt confident that no covariates in $H_t$ other than $S_t$ interact with treatment, so that the conditional treatment effect and the causal excursion effect are equal when $f(H_t) = S_t$, then it could be worthwhile to use ECE to estimate the causal excursion effect.

7. Application

BariFit is a 16-week MRT conducted in 2017 by Kaiser Permanente, which aimed to promote weight maintenance for those who went through bariatric surgery (Ridpath, 2017). In this section, we assess the effect of the food tracking reminder on individuals’ food log completion rate using the estimation methods proposed in this paper. The data set contains 45 participants. The food tracking reminder was randomly delivered to each participant with probability 0.5 every morning as a text message. Because of the form of the intervention, all participants were available for this intervention throughout the study; i.e., It = 1 for all t. The binary proximal outcome, food log completion, is coded as 1 for a day if a participant logged > 0 calories in the Fitbit app on that day, and 0 otherwise. The food log completion rate averaged over all participant-days where a reminder is delivered is 0.5120, and the food log completion rate averaged over all participant-days where a reminder is not delivered is 0.5118. An exploratory analysis indicates that the effect of the reminder seems to vary greatly among participants, and although there is an overall decreasing trend of the food log completion rate with day-in-study, there is no obvious pattern in terms of how the effect of the reminder varies with day-in-study. Details of the exploratory analysis are in Appendix H.

We used EMEE for assessing the marginal excursion effect as well as the effect moderation by certain baseline and time-varying covariates. For an individual, denote by Dayt the day-in-study of the t-th time point, which is coded as 0, 1,…, 111, Gen the gender of the individual, and Yt,1 the indicator of the individual completing their food log for Dayt. For all the analyses in this section, we always included Dayt, Gen, and the lag-1 outcome Yt−1, 1 in the control variables g(Ht), as they are prognostic of Yt,1 in a preliminary generalized estimating equation fit that is not presented here.

We analyze the marginal excursion effect β0 of the food tracking reminder on food log completion by setting St = 1 with the analysis model

$$\log\frac{E\{E(Y_{t,1}\mid H_t, A_t=1)\}}{E\{E(Y_{t,1}\mid H_t, A_t=0)\}} = \beta_0.$$

We analyze the effect moderation by St with St = Dayt, St = Gen, and St = Yt−1, 1, respectively, with the analysis model

$$\log\frac{E\{E(Y_{t,1}\mid H_t, A_t=1)\mid S_t\}}{E\{E(Y_{t,1}\mid H_t, A_t=0)\mid S_t\}} = \beta_0 + \beta_1 S_t.$$

Results of the analysis are presented in Table 4. Neither the marginal excursion effect nor the effect moderation by any of the three moderators is significantly different from zero.

Table 4.

Analysis result for marginal excursion effect and effect moderation for the effect of food tracking reminder on food log completion rate in BariFit MRT. Estimates reported are on the log relative risk scale.

| Analysis model | β0: Estimate | β0: SE | β0: 95% CI | β0: p-value | β1: Estimate | β1: SE | β1: 95% CI | β1: p-value |
|---|---|---|---|---|---|---|---|---|
| St = 1 | 0.014 | 0.021 | (−0.028, 0.056) | 0.50 | - | - | - | - |
| St = Dayt | 0.035 | 0.031 | (−0.028, 0.098) | 0.27 | −0.0005 | 0.0007 | (−0.0018, 0.0009) | 0.49 |
| St = Gen | −0.006 | 0.017 | (−0.041, 0.029) | 0.75 | 0.026 | 0.032 | (−0.039, 0.091) | 0.43 |
| St = Yt−1,1 | 0.017 | 0.095 | (−0.175, 0.209) | 0.86 | −0.003 | 0.094 | (−0.193, 0.186) | 0.97 |

SE: standard error. 95% CI: 95% confidence interval. SE, 95% CI and p-value are based on small sample correction described in Section 6.1.

The result indicates that no effect of the food tracking reminder is detectable from the data. There are two possible reasons for the result, which are interrelated. One is an insufficient sample size. This study was not sized to test this particular hypothesis; instead, it was sized to test for other intervention components not mentioned here. The other reason is that the true effect may be small or zero. These findings may inform the next iteration of the BariFit study in the following ways. If the researchers want to improve the effectiveness of the food tracking reminder, they may consider implementing it as a notification within a smartphone app. The current reminder is sent as a text message, which cannot be tailored to the individual’s current context such as location or weather. Such tailoring via a smartphone app may improve the effectiveness of the reminder. Alternatively, if the researchers no longer wish to investigate the proximal effect of the food tracking reminder, they may choose not to randomize it in the next iteration of BariFit. This might be done either by combining the food tracking reminder with other messages that will be sent in the morning or by removing the food tracking reminder completely from the intervention. This can help to reduce the burden of the mobile health intervention on the individual.

8. Discussion

The causal excursion effect βM(t,St) defined in this paper is different from the majority of the literature on causal inference in the longitudinal setting (Robins, 1994, 2000; Van der Laan & Robins, 2003). Rather than a contrast of the expected outcome under two fixed treatment trajectories, the causal excursion effect is a contrast of two excursions from the current treatment protocol into the future. In the two excursions, the past treatments are stochastic with randomization probability determined by the study design, and their distribution is usually integrated over in the marginalization. We argue the causal excursion effect is a suitable estimand for the primary and secondary analyses in MRT for the following reasons. First, the causal excursion effect approximates an effect in the real world implementation of a mobile health intervention. Because designing an MRT incorporates considerations regarding how the time-varying treatment would be implemented in real life via considerations of user burden, habituation, and so on, the treatment assignment protocol in an MRT should be a protocol that might be plausibly implemented. For example, domain scientists may aim to deliver around an average of one push notification every other day based on burden considerations; this then is reflected in the MRT treatment protocol via the choice of randomization probability. Second, the causal excursion effect provides an indication of effective deviations from the current treatment protocol, or how it might be improved. Because of its interpretation as excursions from the current treatment protocol, the causal excursion effect tells us whether a treatment is worth further consideration via the analysis of the fully marginal effect and gives an indication of whether the treatment protocol should be further modified depending on time-varying covariates via the analysis of effect modification. 
Third, it naturally extends the traditional analyses used in screening fractional factorial designs to include marginalization not only over other factors/covariates but also over time. Lastly, the marginal aspect of the causal excursion effect allows us to design trials with higher power to detect a meaningful effect with a practical sample size than if we focused on the alternative meaningful effect conditional on the full history, βC(t,Ht).

When is the other effect considered in the paper, the treatment effect conditional on the full history βC (t,Ht), relevant in the analysis of MRT? Because consistent estimation for and interpretation of βC(t,Ht) rely on model (6) being correct, it is usually not suitable for the often pre-specified primary and secondary analyses. However, it can be useful for exploratory analysis and hypothesis generation when flexible models such as splines or machine learning algorithms may be used to analyze MRT data, especially if the sample size is large; an example is an MRT that is deployed through publicly available smartphone applications with thousands of users or more.

Throughout we treated the model for the proximal outcome under no treatment, $E\{Y_{t,\Delta}(\bar A_{t-1},0,\bar 0)\mid H_t, I_t=1, A_t=0\}$, as a nuisance parameter, and we used a working model $\exp\{g(H_t)^T\alpha\}$ for this nuisance parameter to reduce noise. In a series of works considering modeling of the treatment effect on a binary outcome in both cross-sectional (Richardson et al., 2017) and longitudinal settings (Wang et al., 2017), those authors propose to instead use the log odds-product as the nuisance parameter. This way the nuisance parameter is no longer constrained by the treatment effect model on the relative risk scale. As discussed by these authors, the valid range of $E\{Y_{t,\Delta}(\bar A_{t-1},0,\bar 0)\mid H_t, I_t=1, A_t=0\}$ is constrained by the treatment effect model on the relative risk scale, because $E\{Y_{t,\Delta}(\bar A_{t-1},1,\bar 0)\mid H_t, I_t=1, A_t=1\}$ must be within [0, 1]. We agree that this congeniality issue is critical when prediction is the goal, as the nuisance part of the model would then be of interest, or when the consistency of the estimator for the parameters in the treatment effect depends on the correct specification of the nuisance part of the model. In the analysis of MRT data, however, the nuisance part of the model is of minimal interest, and more importantly consistency of the estimation methods developed in this paper does not depend on the correct specification of the nuisance part of the model. Therefore, since the purpose of modeling the nuisance parameter is to reduce noise, we choose to use $E\{Y_{t,\Delta}(\bar A_{t-1},0,\bar 0)\mid H_t, I_t=1, A_t=0\}$ as the nuisance parameter, because its interpretability makes it easier for domain scientists to model. Admittedly, an estimated probability falling outside [0, 1] can sometimes cause numerical instability in the semiparametric, locally efficient estimator described in Section 4. We addressed this by using a truncation-based implementation in (12) or a two-step implementation in Appendix G. Alternative solutions include using a congenial parametrization such as that in Richardson et al. (2017).
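The range constraint mentioned above can be made concrete with a one-line calculation: if the risk under treatment equals the risk under no treatment multiplied by the exponential of the log relative risk, then keeping the treated risk at or below 1 caps the untreated risk. The sketch below is only an illustration of that arithmetic, and the function name is hypothetical.

```python
import math

def max_baseline_risk(log_relative_risk):
    """Largest risk under no treatment for which the implied risk under
    treatment, baseline * exp(log_relative_risk), does not exceed 1."""
    return min(1.0, math.exp(-log_relative_risk))

# In the Section 6.2 generative model the largest conditional effect is
# 0.1 + 0.3 * 2 = 0.7 (at Z_t = 2), where the baseline risk is 0.4.
cap_at_z2 = max_baseline_risk(0.7)
```

For a log relative risk of 0.7 the cap is about 0.497, so the baseline risk of 0.4 used at Zt = 2 in the simulation is compatible with the relative risk model; a working model fitted to data carries no such guarantee, which is the source of the instability addressed by (12).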

As pointed out by a reviewer, and as we also found in our communication with domain scientists, the consideration of availability may raise concerns about the generalizability of the estimated effect. In an MRT, availability, just like the time-varying proximal outcome, is a time-varying outcome. To check for generalizability, baseline variables that are thought to be related to the time-varying proximal outcome and the time-varying availability will be collected in the MRT, and the distribution of those baseline variables will be compared to their distribution in the target population. This is similar to what one would do if there are concerns about the generalizability of results of a standard clinical trial to a particular population.

There are a few directions for future research. First, we have assumed binary treatment in the paper. Extension to treatment with multiple levels could involve modeling the treatment effect, defined as a contrast with a reference level, as a function of the treatment level. Second, we have focused on estimating the marginal excursion effect. An interesting extension is to introduce random effects to the excursion effect and allow person-specific predictions. For example, the exploratory analysis in Appendix H suggests possible treatment effect heterogeneity among individuals in the BariFit data set, and random effects models can be one way to account for such heterogeneity. With random effects it would be nontrivial to deal with both the nonlinear link function and the marginalization. Third, since there are numerous variables that can potentially be included in g(Ht) for noise reduction, one could, because of the high dimensionality of Ht, consider penalization methods for model selection in building the working model g(Ht)Tα. Fourth, it would be useful to formulate how to best use the analyses proposed here to inform decision making for future implementation of the mobile health intervention, such as forming a warm-start policy for a mobile health intervention that utilizes reinforcement learning. For example, if a moderation analysis finds that the treatment effect is higher on weekends than on weekdays, then one might use weekend/weekday as a tailoring variable of a decision rule. An active area of current research is how to best use this type of data to form a treatment policy (Luckett et al., 2019). Fifth, it would be useful to develop an easy-to-use sample size calculator that provides the sample size required to achieve a desired power to test for marginal excursion effects.

As suggested by a reviewer, we point out that the purpose of focusing on causal excursion effects is for the primary and secondary analyses. Absent additional assumptions, the analysis results cannot in general be used directly to form the optimal treatment strategy, in the sense of maximizing a certain summary measure at the end of the trial such as the sum of the proximal outcomes over the time points, because the marginal effects do not condition on the full past history. Other methods such as reinforcement learning may be used to estimate the optimal treatment strategy, and this is an active area of research in mobile health.

Finally, we note that we used a preliminary version of the estimator for the marginal excursion effect in analyzing the effect of push notification on user engagement in Bidargaddi et al. (2018).

Acknowledgement

Research reported in this paper was supported by National Institute on Alcohol Abuse and Alcoholism (NIAAA) of the National Institutes of Health under award number R01AA23187, National Institute on Drug Abuse (NIDA) of the National Institutes of Health under award numbers P50DA039838 and R01DA039901, National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under award number U54EB020404, National Cancer Institute (NCI) of the National Institutes of Health under award number U01CA229437, and National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health under award number R01HL125440. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

Supplementary material

Supplementary material is available at Biometrika online. Appendix A includes the proof of the identifiability result (4). Appendix B includes the proof of Theorem 2. Appendix C includes the proof of Theorem 3. Appendix D includes the form of the limit of ψ^ when the conditional treatment effect model for βC(t,Ht) is misspecified. Appendix E includes the form of the limit of β^ in Remark 5 for general Δ. Appendix F includes additional simulation study results. Appendix G includes an alternative, two-step implementation of the ECE estimator that does not rely on weight truncation. Appendix H includes exploratory analysis result for the BariFit data set. Appendix I includes the proof of Theorem 1.

Contributor Information

TIANCHEN QIAN, Department of Statistics, University of California, Irvine, Donald Bren Hall, Irvine, California 92697, U.S.A.

HYESUN YOO, Department of Statistics, University of Michigan, 323 West Hall, 1085 South University, Ann Arbor, Michigan 48109, U.S.A.

PREDRAG KLASNJA, School of Information, University of Michigan, 4364 North Quad, 105 South State Street, Ann Arbor, Michigan 48109, U.S.A.

DANIEL ALMIRALL, Department of Statistics, University of Michigan, 323 West Hall, 1085 South University, Ann Arbor, Michigan 48109, U.S.A.

SUSAN A. MURPHY, Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, Massachusetts 02138, U.S.A.

Supplementary Materials

Biometrika Qian supplemental material
