Author manuscript; available in PMC: 2026 Jan 16.
Published before final editing as: J Am Stat Assoc. 2025 Aug 5:10.1080/01621459.2025.2516197. doi: 10.1080/01621459.2025.2516197

Incorporating Auxiliary Variables to Improve the Efficiency of Time-Varying Treatment Effect Estimation

Jieru Shi 1, Zhenke Wu 1, Walter Dempsey 1
PMCID: PMC12807526  NIHMSID: NIHMS2107334  PMID: 41551576

Abstract

The use of smart devices (e.g., smartphones, smartwatches) and other wearables to deliver digital interventions that improve health outcomes has grown significantly in the past few years. Mobile health (mHealth) systems are excellent tools for the delivery of adaptive interventions that aim to provide the right type and amount of support, at the right time, by adapting to an individual’s changing context. Micro-randomized trials (MRTs) are an increasingly common experimental design and the main source of data-driven evidence of mHealth intervention effectiveness. To assess time-varying causal effect moderation in an MRT, individuals are intensively randomized to receive treatment over time. In addition, measurements of individual characteristics and context are collected throughout the study. The effective use of covariate information to improve inferences regarding causal effects is well established in the context of randomized controlled trials (RCTs), where covariate adjustment leverages baseline data to address chance imbalances and improve the asymptotic efficiency of causal effect estimation. However, the application of this approach to longitudinal data, such as data from MRTs, has not been thoroughly explored. Recognizing the connection to Neyman orthogonality, we propose a straightforward and intuitive method to improve the efficiency of moderated causal excursion effect estimation by incorporating auxiliary variables. We compare the robust standard errors of our method with those of the benchmark method. The efficiency gain of our approach is demonstrated through simulation studies and an analysis of data from the Intern Health Study (NeCamp et al., 2020).

Keywords: Causal Inference, Asymptotic Efficiency, Micro-randomized Trials, Mobile Health, Moderation Effect, Covariate Adjustment

1. Introduction

The use and development of mobile interventions are experiencing rapid growth. In “just-in-time” mobile interventions (Nahum-Shani et al., 2018), treatments are provided via a mobile device that is intended to help an individual make healthy decisions “at the moment” and thus have a proximal, near-term impact on health outcomes. Micro-randomized trials (MRTs; Klasnja et al. (2015); Dempsey et al. (2015)) provide data to guide the development of such mobile interventions (Free et al., 2013), with each participant in an MRT sequentially randomized to treatment numerous times, at possibly hundreds to thousands of occasions. The weighted and centered least squares (WCLS) method (Boruvka et al., 2018) is regarded as the benchmark method used for estimating moderated causal excursion effects, ensuring a consistent estimate with asymptotic normality. Estimates of these marginal comparisons are crucial to domain scientists in making decisions concerning whether to include the treatment in an mHealth intervention package.

In an MRT, measurements of individual characteristics, context, and response to treatments are collected passively through sensors or actively by self-report. We refer to additional variables besides the potential moderator as auxiliary variables. Such variables are analogous to baseline covariates in the context of randomized controlled trials (RCTs). Extensive research has demonstrated that incorporating baseline covariates into the analysis can improve estimation efficiency for the average treatment effect (ATE). Lin (2013) recommended using a fully interacted linear regression model, which was shown to be asymptotically more efficient than both unadjusted and additive models. In recent years, further generalizations of covariate adjustment have been made to accommodate more complex randomization schemes (Su and Ding, 2021; Zhao and Ding, 2021) and to study arbitrary functions of the response, such as linear contrasts, ratios, and odds ratios (Ye et al., 2022).

Despite these appealing properties, there is still a knowledge gap between existing covariate adjustment approaches and applications to causal excursion effect estimation using data arising from an MRT. More specifically, when the treatment, response, and moderators are time-varying, it is unclear how to adjust for auxiliary variables to improve the estimation efficiency of the marginal treatment effect.

This study makes two primary contributions to the evaluation of treatment effects in longitudinal data with time-varying treatments, responses, and moderators. Firstly, we highlight the crucial role of incorporating auxiliary variables in estimating time-specific causal effects and illustrate that doing so correctly can lead to a global efficiency gain. In addition, we introduce a comprehensive theoretical framework that incorporates auxiliary variables for analyzing moderated causal effects smoothed over time. This framework achieves local efficiency gains under mild conditions. These findings are further supported by both simulation studies and real-world data analysis.

1.1. Outline

The rest of the paper proceeds as follows. Section 2 reviews existing analytic techniques for MRTs, and covariate adjustment techniques on the ATE estimation. We specifically summarize our main contribution in Section 2.4 and explain why incorporating auxiliary variables in WCLS is challenging. In Section 3, we propose a general strategy of using auxiliary variables to improve the efficiency of the time-varying treatment effect estimation, and discuss the asymptotic properties of the moderated causal effect estimation. In Section 4, we present the theoretical results on efficiency improvement. Section 5 uses simulation studies to compare various estimators and standard errors under different data-generating processes. Section 6 extends the framework to accommodate time-lagged outcomes and post-treatment auxiliary variables. Section 7 illustrates the efficiency improvement using our proposed method with a recent MRT: the Intern Health Study (NeCamp et al., 2020). The paper concludes with a brief discussion in Section 8. All the technical proofs are collected in the Supplementary Materials.

2. Preliminaries

2.1. Notation

For a given individual $j$, let $A_{t,j}$ denote the treatment at the $t$-th treatment occasion and $Y_{t+1,j}$ be the subsequent proximal response ($t = 1, \dots, T$; $j = 1, \dots, N$). For simplicity, the main results in this paper assume a binary treatment $A_{t,j} \in \{0, 1\}$. Individual and contextual information at the $t$-th treatment occasion is represented by $X_{t,j}$, which may contain summaries of previous context, treatment, or response measurements. For example, prior to each treatment occasion, the individual might report their current mood. The vector $X_t$ could then contain this measurement or, together with previous measurements, the variation or change in mood. Over the course of $T$ treatment occasions, the resulting data from an individual, ordered in time, are $\{X_0, X_1, A_1, Y_2, \dots, X_T, A_T, Y_{T+1}\}$. The overbar denotes a sequence of random variables or realized values up to a specific treatment occasion; for example, $\bar{A}_t = (A_1, \dots, A_t)$ denotes the sequence of treatments up to and including decision time $t$. Information accrued up to treatment occasion $t$ is represented by the history $H_t = \{X_0, X_1, A_1, Y_2, \dots, X_t\}$. In the following, random variables or vectors are denoted with uppercase letters; lowercase letters denote their realized values.

We assume that the longitudinal data are independent and identically distributed across N individuals. Note that this assumption would be violated, if, for example, some of the treatments are used to enhance social support between individuals in the study (Liao et al., 2016). The following section defines the “causal excursion effect” estimand. Then we express the causal estimand in terms of the observed data and provide causal assumptions sufficient for these expressions.

2.2. Existing Inferential Methods Review

Many treatments are designed to influence an individual in the short term or proximally in time (Heron and Smyth, 2010). To answer questions related to the causal effect of time-varying treatments on the proximal response, we focus on the class of estimands referred to as “causal excursion effects”, which are a function of the decision point t and a set of moderators St, marginalizing over all other observed and unobserved variables (Boruvka et al., 2018; Qian et al., 2021). We provide formal definitions below using potential outcomes (Rubin, 1978; Robins, 1986).

Let $Y_{t+1}(\bar{a}_t)$ denote the potential outcome for the proximal response under treatment sequence $\bar{a}_t$. Let $S_t(\bar{a}_{t-1})$ denote the potential outcome for a time-varying effect moderator, which is a deterministic function of the potential history up to time $t$, $H_t(\bar{a}_{t-1})$. The causal excursion effect is

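$$\beta_p(t;s) = E_p\left[\, Y_{t+1}(\bar{A}_{t-1}, A_t = 1) - Y_{t+1}(\bar{A}_{t-1}, A_t = 0) \,\middle|\, S_t(\bar{A}_{t-1}) = s \,\right]. \tag{1}$$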

Equation (1) is defined with respect to a reference distribution $p$, i.e., the distribution of treatments $\bar{A}_{t-1} := (A_1, A_2, \dots, A_{t-1})$. We follow common practice in observational mobile health studies where analyses such as GEEs (Liang and Zeger, 1986) are conducted marginally over $p$. To express the proximal response in terms of the observed data, we assume positivity, consistency, and sequential ignorability (Robins, 1994, 1997):

Assumption 2.1.

  • Consistency: For each $t \le T$ and $j$, $\{Y_{t+1,j}(\bar{A}_t), X_{t,j}(\bar{A}_{t-1}), A_{t,j}(\bar{A}_{t-1})\} = \{Y_{t+1,j}, X_{t,j}, A_{t,j}\}$, i.e., observed values equal the corresponding potential outcomes;

  • Positivity: if the joint density of $\{A_t = a_t, H_t = h_t\}$ is greater than zero, then $P(A_t = a_t \mid H_t = h_t) > 0$;

  • Sequential ignorability: For each $t \le T$, the potential outcomes $\{Y_{t+1}(\bar{a}_t), X_{t+1}(\bar{a}_t), A_{t+1}(\bar{a}_t), \dots, Y_{T+1}(\bar{a}_T)\}$ are independent of $A_{t,j}$ conditional on the observed history $H_t$.

Under Assumption 2.1, (1) can be re-expressed in terms of observable data:

$$\beta_p(t;s) = E\left[\, E_p[Y_{t+1} \mid A_t = 1, H_t] - E_p[Y_{t+1} \mid A_t = 0, H_t] \,\middle|\, S_t = s \,\right]. \tag{2}$$

Assuming $\beta_p(t;s) = f_t(s)^\top \beta_0$, where $f_t(s) \in \mathbb{R}^q$ is a feature vector comprised of a $q$-dimensional summary of observed information depending only on state $s$ and decision point $t$, a consistent estimator for $\beta_0$ can be obtained by minimizing the WCLS criterion (Boruvka et al., 2018):

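$$\mathbb{P}_n\left[\sum_{t=1}^{T} W_t \times \left( Y_{t+1} - g_t(H_t)^\top \alpha - \big(A_t - \tilde{p}_t(1 \mid S_t)\big) f_t(S_t)^\top \beta \right)^2 \right], \tag{3}$$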

where $\mathbb{P}_n$ is an operator denoting the sample average, $W_t = \tilde{p}_t(A_t \mid S_t) / p_t(A_t \mid H_t)$ is a weight whose numerator is an arbitrary function with range $(0,1)$ that depends only on $S_t$, and $g_t(H_t) \in \mathbb{R}^d$ are $d$ control variables chosen to help reduce variance and to construct more powerful test statistics. See Boruvka et al. (2018) for more details on the estimand formulation and on the consistency, asymptotic normality, and robustness properties of the WCLS estimation method.
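As an illustration of how criterion (3) can be minimized in practice, the sketch below solves it as a weighted least-squares problem on person-time rows stacked into arrays. It is a minimal sketch, not the authors' implementation: the function name `wcls_fit`, the simulated data-generating process, and the choice of a constant numerator probability of 0.5 are all assumptions made for this example, and it returns point estimates only (no robust standard errors).

```python
import numpy as np

def wcls_fit(y, a, p_obs, p_tilde, g, f):
    """Minimise sum_i W_i * (y_i - g_i'alpha - (a_i - p_tilde_i) f_i'beta)^2.

    y, a, p_obs, p_tilde: arrays of shape (m,), one entry per person-time row.
    g: (m, d) control variables; f: (m, q) moderator features.
    p_obs is P(A_t = 1 | H_t); p_tilde is the numerator probability of A_t = 1.
    Returns (alpha_hat, beta_hat).
    """
    # W_t = p_tilde(A_t | S_t) / p_t(A_t | H_t), evaluated at the observed action.
    w = np.where(a == 1, p_tilde / p_obs, (1 - p_tilde) / (1 - p_obs))
    design = np.hstack([g, (a - p_tilde)[:, None] * f])
    sw = np.sqrt(w)[:, None]
    # Weighted least squares via ordinary least squares on sqrt(W)-scaled rows.
    coef, *_ = np.linalg.lstsq(sw * design, sw[:, 0] * y, rcond=None)
    d = g.shape[1]
    return coef[:d], coef[d:]

# Hypothetical simulated MRT: 2000 persons x 10 decision points, stacked.
rng = np.random.default_rng(0)
m = 2000 * 10
x = rng.binomial(1, 0.5, size=m).astype(float)   # binary context X_t
p_obs = np.where(x == 1, 0.7, 0.3)               # randomization depends on X_t
a = rng.binomial(1, p_obs).astype(float)
y = 1.0 + 0.5 * x + a * (0.2 + 0.6 * x) + rng.normal(size=m)

p_tilde = np.full(m, 0.5)                        # numerator probability, S_t empty
g = np.column_stack([np.ones(m), x])             # working model for the controls
f = np.ones((m, 1))                              # fully marginal effect

alpha_hat, beta_hat = wcls_fit(y, a, p_obs, p_tilde, g, f)
true_marginal = 0.2 + 0.6 * 0.5                  # E[0.2 + 0.6 X_t] in this simulation
print(beta_hat[0], true_marginal)
```

In this simulated setting the randomization probability depends on the context, so the weights are what make the coefficient on the centered treatment target the marginal effect rather than a context-specific one.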

Remark 2.2.

Correct causal effect specification, i.e., $\beta_p(t;s) = f_t(s)^\top \beta^\star$, is not required. Instead, we can follow prior literature (Dempsey et al., 2020; Shi et al., 2022) and interpret the proposed linear form as a working model. Specifically, $\hat{\beta}$ is a consistent and asymptotically normal estimator for

$$\beta^\star = \arg\min_{\beta} E\left[\sum_{t=1}^{T} \tilde{p}_t(1 \mid S_t)\big(1 - \tilde{p}_t(1 \mid S_t)\big)\big(\beta(t;S_t) - f_t(S_t)^\top \beta\big)^2\right].$$

Therefore, the working model can be interpreted as an $L_2$ projection of the true causal excursion effect onto the space spanned by a $q$-dimensional feature vector that only includes $t$ and $s$, denoted by $f_t(s)^\top \beta^\star$ (Dempsey et al., 2020). Interpretation as a projection or as a correctly specified causal effect can be viewed as a bias-variance trade-off. The projection interpretation guarantees well-defined parameter interpretation in practice.
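The projection interpretation can be made concrete with a small numerical sketch. The helper `project_effect` below is hypothetical and replaces the expectation in the remark with an equally weighted average over a grid of decision points, which is an assumption of this example; the two "true" effect curves are invented for illustration.

```python
import numpy as np

def project_effect(true_effect, f, p_tilde):
    """Weighted L2 projection of beta(t; S_t) onto the span of f_t(S_t).

    true_effect: (m,) values of beta(t; S_t) on a grid of (t, S_t) points.
    f: (m, q) features f_t(S_t); p_tilde: (m,) values of p_tilde_t(1 | S_t).
    The expectation is replaced by an equally weighted sum over the grid rows.
    """
    w = p_tilde * (1 - p_tilde)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * f, sw * true_effect, rcond=None)
    return coef

t = np.arange(1, 11, dtype=float)                 # decision points 1..10
f = np.column_stack([np.ones_like(t), t])         # working model: intercept + slope in t
p_tilde = np.full(t.shape, 0.5)

linear_truth = 0.3 - 0.02 * t                     # lies in the span of f
quadratic_truth = linear_truth + 0.005 * (t - 5.5) ** 2   # does not

beta_linear = project_effect(linear_truth, f, p_tilde)
beta_quadratic = project_effect(quadratic_truth, f, p_tilde)
print(beta_linear, beta_quadratic)
```

When the true effect is linear in $f_t$, the projection recovers it exactly; when it is not, the result is still a well-defined best linear summary, which is the sense in which the working model remains interpretable.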

In addition to studying continuous proximal outcomes and defining the causal excursion effect as a linear contrast of the outcomes under different treatment allocations, as in Equation (2), there are also many MRTs concerning longitudinal binary outcomes. Qian et al. (2021) proposed an estimator of the marginal excursion effect (EMEE) by defining a log relative risk model for the causal excursion effect, and we present a detailed review in Appendix K.

2.3. Covariate Adjustment Literature Review

A review of Section 2.1 makes clear that if T=1, then the fully marginal causal excursion effect in MRTs is equivalent to the ATE in RCTs. The literature related to RCTs has extensively examined the use of covariate adjustment as an approach to enhance precision while making inferences on the ATE.

Consider an RCT with a binary treatment, where the treatment is randomly assigned in the initial stage of the study, and the $N$ subjects are the population of interest. Neyman (1923) showed that the difference-in-means estimator is unbiased for the ATE. Fisher (1935) proposed to use the ordinary least squares (OLS) adjusted estimator of the ATE, which is the estimated coefficient on the treatment $A$ in the OLS regression of $Y$ on $(1, A, X_0)$, hoping to leverage the information in the baseline covariates $X_0$ to improve estimation efficiency. Freedman (2008) suggests that adjustment might hurt asymptotic precision.

Lin (2013) argues that in sufficiently large samples, the statistical problems Freedman (2008) raised are either minor or easily fixed. In addition, Lin (2013) shows that OLS adjustment with a full set of treatment × covariate interactions improves or does not hurt asymptotic precision, even when the regression model is incorrect. The robust Eicker-Huber–White (Eicker, 1967; Huber, 1967; White, 2014) variance estimator (Freedman, 2006) is consistent or asymptotically conservative (regardless of whether the interactions are included) in estimating the true asymptotic variance. More results on the asymptotic precision of treatment effect can be found in papers by Yang and Tsiatis (2001), Tsiatis et al. (2008), Zhang et al. (2008), Schochet (2010), and Negi and Wooldridge (2021).

More complex randomization schemes have led to recent developments in covariate adjustment methods. For example, Zhao and Ding (2021) examined a variety of schemes for rerandomization based on p-values (ReP), and the corresponding ATE estimators from the unadjusted, additive, and fully interacted linear regressions. They found that the estimator from the fully interacted regression is asymptotically the most efficient. Su and Ding (2021) extended the theory to cluster-randomized experiments, where randomization over units within clusters is either unethical or logistically infeasible. Ye et al. (2022) used a model-assisted approach for covariate adjustment and generalized the outcome types to include not only linear contrasts but also ratios and odds ratios. Their conclusions are based on an asymptotic theory that provides a clear picture of how covariate-adaptive randomization and regression adjustment alter statistical efficiency.

2.4. Challenges

In light of these existing methods, we aim to develop a method to incorporate covariate adjustment techniques into the analysis of MRTs. The main challenges are twofold: First, in an MRT, the treatment, response, and moderators are all time-varying, thus it remains questionable whether we can still center the auxiliary variables using the overall average without producing inconsistent estimates; Second, if it is only feasible to estimate the centering parameters individually at each time point, the total number of such parameters can become prohibitive as the number of decision points grows. Therefore, it would be ideal to have a simpler approach that can be uniformly applied to all time points.

Guo and Basse (2021) and Ye et al. (2022) both proposed checklists of ideal properties for covariate adjustment methods. As we consider time-varying treatment effect estimation, we build upon their lists of the ideal properties the auxiliary variable adjustment method should have: (1) valid statistical inference and efficiency gain; (2) robustness to misspecification; (3) strong finite-sample performance; (4) computational simplicity; and (5) wide applicability. In the following sections, we describe a general method called “A2-WCLS” for performing auxiliary variable adjusted WCLS on longitudinal data with time-varying treatments to estimate causal excursion effects that satisfy (1) through (5).

3. Estimand and Inferential Method

In this section, we explore the asymptotic relative efficiency of various candidate methods for adjusting auxiliary variables in the causal excursion effect estimation. Our goal is to specify “how to account for [auxiliary variables] in the analysis in order to improve precision” (ICH, 1998). We begin by making a parametric assumption concerning the causal parameter of interest:

Assumption 3.1.

Assume the causal excursion effect $\beta_p(t;s) = f_t(s)^\top \beta_0$, where $f_t(s) \in \mathbb{R}^q$ is a feature vector comprised of a $q$-dimensional summary of observed information depending only on state $s$ and decision point $t$.

3.1. Time-Specific Causal Excursion Effect Estimation

To establish a clear connection with previous research on covariate adjustment, we start with the time-specific causal excursion effect estimand, which shares similarities with the ATE estimand in RCTs. Recall that the ATE estimand is typically defined as the linear difference between the expected outcomes under different treatment allocations, i.e., $E\big[E[Y \mid X, A = 1] - E[Y \mid X, A = 0]\big]$. In the context of an MRT, the time-specific causal excursion effect estimand can be defined as follows:

$$\beta(t) = E\big[\, E[Y_{t+1} \mid H_t, A_t = 1] - E[Y_{t+1} \mid H_t, A_t = 0] \,\big].$$

The parametric assumption stated in Assumption 3.1 can be reformulated as $\beta(t;S_t) = \beta_{0,t}$, where $S_t = \emptyset$ and the model is non-parametric in time $t$. In this section, the phrase “more efficient” refers explicitly to an estimator of $\beta_{0,t}$ that has a smaller asymptotic variance.

3.1.1. The Unadjusted Estimator

Analysis using solely the time-varying outcome and treatment data results in the unadjusted estimator. To clarify, in our particular context, the unadjusted estimator $\hat{\beta}_{0,t}^{U}$ can be obtained by minimizing the following objective function:

$$\mathbb{P}_n\left[\sum_{t=1}^{T} \big( Y_{t+1} - \alpha_{0,t} - (A_t - p_t)\beta_{0,t} \big)^2 \right]. \tag{4}$$

As we are interested in the time-specific causal effect, we consider the treatment randomization probability to be constant at each time, denoted as $p_t(A_t \mid H_t) = p_t$. Thus, the corresponding weight is $W_t = 1$. The unadjusted estimator $\hat{\beta}_{0,t}^{U}$ is consistent and asymptotically normal. As the function presented in (4) is solely based on the data $\{Y_{t,j}, A_{t,j}\}_{t=1}^{T}$, $j = 1, \dots, N$, and is thus “unadjusted” for auxiliary variables, it will serve as the reference against which we will evaluate the relative efficiency of alternative estimators.
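For a single decision point, minimizing (4) is an ordinary least-squares regression of the outcome on an intercept and the centered treatment. The short sketch below, on simulated data invented for this example, checks numerically that the resulting coefficient coincides with the familiar difference in means, which is the link to the unadjusted ATE estimator in an RCT.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_t = 500, 0.6                                  # hypothetical sample size and P(A_t = 1)
a = rng.binomial(1, p_t, size=n).astype(float)
y = 2.0 + 0.4 * a + rng.normal(size=n)             # simulated proximal outcome Y_{t+1}

# Least squares on (1, A_t - p_t): the time-t piece of criterion (4).
design = np.column_stack([np.ones(n), a - p_t])
(alpha0_hat, beta0_hat), *_ = np.linalg.lstsq(design, y, rcond=None)

diff_in_means = y[a == 1].mean() - y[a == 0].mean()
print(beta0_hat, diff_in_means)
```

Centering the treatment at $p_t$ changes the intercept but not the slope, so the agreement holds for any value of $p_t$.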

3.1.2. The WCLS estimator

Recall that to obtain the WCLS estimator, we need to minimize the objective function (3). For convenience, we center the control variables $g_t(H_t)$ at their expectations $E[g_t(H_t)]$ and denote the centered control variables as $\tilde{g}_t(H_t)$. Assuming $\beta(t;S_t) = \beta_{0,t}$, we can reformulate the WCLS criterion as follows:

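$$\mathbb{P}_n\left[\sum_{t=1}^{T} \big( Y_{t+1} - \alpha_{0,t} - \tilde{g}_t(H_t)^\top \alpha_{1,t} - (A_t - p_t)\beta_{0,t} \big)^2 \right]. \tag{5}$$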

As per the conclusion drawn by Boruvka et al. (2018), the estimator $\hat{\beta}_{0,t}^{\mathrm{WCLS}}$ obtained by minimizing (5) is consistent and asymptotically normal. Here the control variables $g_t(H_t) \in \mathbb{R}^d$ are selected to reduce variance and increase the power of test statistics. However, the question remains whether this goal can be consistently achieved. If not, it is important to identify the circumstances under which the incorporation of control variables $g_t(H_t)$ may negatively impact the estimation efficiency of $\hat{\beta}_{0,t}^{\mathrm{WCLS}}$ when compared to the unadjusted estimator $\hat{\beta}_{0,t}^{U}$. To address this question, we first define $\beta_{1,t} = \Sigma_{g_t(H_t)}^{-1} E\big[Y_{t+1}(A_t - p_t)\tilde{g}_t(H_t)\big]$, which represents the least-squares solution obtained from the criterion (7). Here, $\Sigma_{g_t(H_t)} = E\big[\tilde{g}_t(H_t)\tilde{g}_t(H_t)^\top\big] \in \mathbb{R}^{d \times d}$ denotes the variance-covariance matrix of the control variables $g_t(H_t)$. With these definitions, we present the following lemma:

Lemma 3.2.

The difference in the asymptotic variance between the WCLS estimator $\hat{\beta}_{0,t}^{\mathrm{WCLS}}$ and the unadjusted estimator $\hat{\beta}_{0,t}^{U}$ can be expressed as follows:

$$-p_t(1 - p_t)\,\alpha_{1,t}^\top \Sigma_{g_t(H_t)} \big(\alpha_{1,t} + 2(1 - 2p_t)\beta_{1,t}\big). \tag{6}$$

If $p_t = 1/2$, the adjustment is either neutral or favorable. However, if $p_t \neq 1/2$, then adjustment may lead to an asymptotic variance inflation. Specifically, consider the case where $g_t(H_t)$ is positively correlated with the outcome and strongly moderates the treatment effect (i.e., $\alpha_{1,t} > 0$ and $\beta_{1,t}$ is large). If $\beta_{1,t} > 0$ and the randomization probability $p_t > 1/2$, the sum within the parentheses is negative. Hence expression (6) is positive, indicating an asymptotic variance increase when incorporating additional control variables as described in (5).
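The sign behaviour described above can be checked with a few lines of arithmetic. The sketch below evaluates the scalar case of expression (6), read as $-p_t(1-p_t)\,\alpha_{1,t}\,\sigma_g^2\,(\alpha_{1,t} + 2(1-2p_t)\beta_{1,t})$ for a single control variable with variance $\sigma_g^2$. The function name and the parameter values are assumptions chosen for illustration.

```python
def wcls_minus_unadjusted(p_t, alpha_1, beta_1, sigma_g=1.0):
    """Scalar version of expression (6): WCLS variance minus unadjusted variance.

    p_t: randomization probability; alpha_1: main-effect coefficient of the
    control variable; beta_1: its interaction coefficient; sigma_g: its variance.
    A positive return value means adjustment inflates the asymptotic variance.
    """
    return -p_t * (1 - p_t) * alpha_1 * sigma_g * (alpha_1 + 2 * (1 - 2 * p_t) * beta_1)

# Balanced randomization: the moderation term drops out, adjustment cannot hurt.
balanced = wcls_minus_unadjusted(0.5, alpha_1=1.0, beta_1=5.0)

# Unbalanced randomization with a strong moderator: adjustment inflates variance.
unbalanced = wcls_minus_unadjusted(0.8, alpha_1=1.0, beta_1=5.0)

print(balanced, unbalanced)
```

With $p_t = 0.8$, $\alpha_{1,t} = 1$ and $\beta_{1,t} = 5$, the term in parentheses is $1 + 2(-0.6)(5) = -5$, so the overall expression is positive, matching the variance inflation described in the text.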

3.1.3. A More Efficient Estimator

So far, we have demonstrated that when estimating the time-specific causal excursion effect, the use of the WCLS criterion presented in (5) cannot guarantee efficiency improvement as compared to the unadjusted estimator. This conclusion aligns with what was drawn in Freedman (2008), and naturally leads one to question whether adjusted estimators should be recommended.

Here, we consider alternative ways of adjusting for auxiliary variables that provide some guarantees of estimation efficiency improvement. Motivated by Lin (2013), we incorporate similar ideas as in standard covariate adjustment into the WCLS approach. The proposed estimator of the causal excursion effect with auxiliary variable adjustment can then be obtained by minimizing the following criterion:

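$$\mathbb{P}_n\left[\sum_{t=1}^{T} \Big( Y_{t+1} - \alpha_{0,t} - \tilde{g}_t(H_t)^\top \alpha_{1,t} - (A_t - p_t)\big(\beta_{0,t} + \tilde{g}_t(H_t)^\top \beta_{1,t}\big) \Big)^2 \right]. \tag{7}$$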

Equation (7) is a strict extension of the proposal in Lin (2013). Lemma 3.3 below establishes an asymptotic efficiency gain of (7) over both the unadjusted and adjusted estimators.

Lemma 3.3.

Suppose Assumptions 2.1 and 3.1 hold, and the randomization probability $p_t(A_t \mid H_t)$ is known. Given invertibility and regularity conditions, let $(\hat{\alpha}_t, \hat{\beta}_{0,t}^{L}, \hat{\beta}_{1,t})$ minimize objective function (7):

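  1. $\hat{\beta}_{0,t}^{L}$ is consistent and asymptotically normal such that $\sqrt{n}\big(\hat{\beta}_{0,t}^{L} - \beta_{0,t}\big) \to \mathcal{N}\big(0, Q_t^{-1}\Sigma_t^{L}Q_t^{-1}\big)$, where $Q_t$ and $\Sigma_t^{L}$ are defined in Appendix B;

  2. $\hat{\beta}_{0,t}^{L}$ is at least as efficient as the unadjusted estimator $\hat{\beta}_{0,t}^{U}$ described in (4), and the asymptotic efficiency gain is:
    $$\frac{\big(\alpha_{1,t} + (1 - 2p_t)\beta_{1,t}\big)^\top \Sigma_{g_t(H_t)} \big(\alpha_{1,t} + (1 - 2p_t)\beta_{1,t}\big)}{p_t(1 - p_t)} + \beta_{1,t}^\top \Sigma_{g_t(H_t)} \beta_{1,t}; \tag{8}$$

  3. $\hat{\beta}_{0,t}^{L}$ is at least as efficient as the WCLS estimator $\hat{\beta}_{0,t}^{\mathrm{WCLS}}$ described in (5), and the asymptotic efficiency gain is:
    $$\frac{1 - 3p_t + 3p_t^2}{p_t(1 - p_t)}\,\beta_{1,t}^\top \Sigma_{g_t(H_t)} \beta_{1,t}. \tag{9}$$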

The presented lemma demonstrates that incorporating mean-centered auxiliary variables in the estimation of the time-specific marginal treatment effect using (7) guarantees an improvement in efficiency. A detailed proof for Lemma 3.3 is available in Appendix B.
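The ordering implied by the lemma can be illustrated with a small Monte Carlo experiment at a single decision point. The sketch below is a simulation under assumed settings, not a reproduction of the paper's simulation study: the data-generating model, the parameter values ($p_t = 0.8$, $\alpha_{1,t} = 1$, $\beta_{1,t} = 3$, unit variances), and the helper `fit_beta0` are all invented for this example, and the control variable is centered at its true mean (zero by construction), as the lemma assumes.

```python
import numpy as np

def fit_beta0(y, a, g, p_t, kind):
    """Return the least-squares coefficient on (A_t - p_t) for one time point.

    kind: "unadjusted" for criterion (4), "wcls" for (5), "interacted" for (7).
    g is assumed to be centred at its true mean already.
    """
    ones = np.ones_like(y)
    cols = {"unadjusted": [ones, a - p_t],
            "wcls": [ones, g, a - p_t],
            "interacted": [ones, g, a - p_t, (a - p_t) * g]}[kind]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[1] if kind == "unadjusted" else coef[2]

rng = np.random.default_rng(2)
n, reps, p_t = 400, 300, 0.8
alpha_1, beta_1 = 1.0, 3.0
estimates = {k: [] for k in ("unadjusted", "wcls", "interacted")}
for _ in range(reps):
    g = rng.normal(size=n)                          # control variable, true mean 0
    a = rng.binomial(1, p_t, size=n).astype(float)
    y = 1.0 + alpha_1 * g + (a - p_t) * (0.5 + beta_1 * g) + rng.normal(size=n)
    for k in estimates:
        estimates[k].append(fit_beta0(y, a, g, p_t, k))

# n times the Monte Carlo variance approximates the asymptotic variance.
mc_var = {k: n * np.var(v, ddof=1) for k, v in estimates.items()}
print(mc_var)
```

For these parameter values, expressions (8) and (9) evaluate to $0.64/0.16 + 9 = 13$ and $(0.52/0.16) \times 9 = 29.25$, so the interacted estimator should show the smallest variance and, because $p_t > 1/2$ with a strong moderator, the WCLS estimator the largest.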

Remark 3.4.

The above discussion aims to establish a connection between the framework of time-varying causal excursion effect estimation in MRT analysis and the original RCT analysis. Therefore, we focus on comparing the efficiency of time-specific marginal effect estimators. However, it is important to note that this comparison is applicable to time-specific moderation analysis as well, where $\beta(t;S_t) = f_t(S_t)^\top \beta_{0,t}$.

3.2. Moderation effect estimation

There are several limitations with (7). First, most MRTs have a small sample size $n$ and a relatively large number of time points $T$. Therefore, estimating centering parameters for each time point separately leads to a large number of additional parameters. Second, centering the auxiliary variable at its mean is not a one-size-fits-all remedy. When we are interested in moderated treatment effects (i.e., $S_t \neq \emptyset$), centering by time-specific means can introduce bias. Finally, so far we have taken it for granted that we can center the auxiliary variable directly using its true mean during the model fitting process. However, in practice, this mean needs to be estimated from the observed data, introducing additional variation that could potentially impact the precision of our estimates. In other words, centering at each time point may not necessarily lead to improved precision, particularly when the number of time points $T$ is large.

Here we consider auxiliary variable adjusted estimators of the causal excursion effect moderation under an arbitrary choice of potential moderator $S_t$ and a smoothed causal excursion effect model $f_t(S_t)^\top \beta$. Let $\hat{\beta}_0^{\mathrm{WCLS}}$ be the solution that minimizes the WCLS criterion defined in (3). As proven in Boruvka et al. (2018), the estimator $\hat{\beta}_0^{\mathrm{WCLS}}$ is consistent and asymptotically normal. Let $Z_t \in \mathbb{R}^p$ be a $p$-dimensional subset of $g_t(H_t)$ that one expects to moderate the causal effect, with $Z_t \cap f_t(S_t) = \emptyset$. Our goal is to improve the estimation efficiency of the WCLS estimator by leveraging the auxiliary information from $\{Z_t\}_{t=1}^{T}$. In the context of moderation effect analysis, the term “more efficient” refers to an estimator that achieves a lower asymptotic variance than another for all linear combinations. Specifically, the asymptotic variance of $c^\top \hat{\beta}_0$ is smaller for any $c \in \mathbb{R}^q$. This is equivalent to the negative semidefiniteness of the difference between the asymptotic variance matrices.

3.2.1. Auxiliary Variable Adjusted Estimation

We propose an auxiliary variable adjusted estimation method using a more general centering function $\mu_t(S_t) \in \mathbb{R}^p$. Under Assumption 3.1, the Auxiliary-variable Adjusted Weighted and Centered Least Square (A2-WCLS) estimator is obtained by minimizing the following objective function:

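$$\mathbb{P}_n\left[\sum_{t=1}^{T} W_t \times \Big( Y_{t+1} - g_t(H_t)^\top \alpha - \big(A_t - \tilde{p}_t(1 \mid S_t)\big)\big(f_t(S_t)^\top \beta_0 + (Z_t - \mu_t(S_t))^\top \beta_1\big) \Big)^2 \right]. \tag{10}$$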

An appropriate choice of the centering function $\mu_t(S_t)$ is the key to obtaining a consistent estimate of the moderated causal effect. It has been well established that the WCLS estimation method consistently estimates the moderated treatment effect (Boruvka et al., 2018). Therefore, to ensure the consistency of the A2-WCLS estimator, it is sufficient to require that the minimizer $\hat{\beta}_0$ of criterion (10) coincides with the minimizer of (3). In other words, after substituting $\hat{\beta}_0$ into the derivatives of (10) and (3) with respect to $\beta_0$, both should be equal to 0. This enables us to formulate the constraint on $\mu_t(S_t)$ that ensures the consistent estimation of $\beta_0$:

Condition 3.5 (The orthogonality condition).

For each auxiliary variable $Z_t^i - \mu_t^i(S_t)$, where $i \in \{1, \dots, p\}$, to be included in the model, the centering function must satisfy the following orthogonality condition with respect to $f_t(S_t) \in \mathbb{R}^q$:

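$$E\left[\sum_{t=1}^{T} \tilde{p}_t(1 \mid S_t)\big(1 - \tilde{p}_t(1 \mid S_t)\big)\big(Z_t^i - \mu_t^i(S_t)\big) f_t(S_t)\right] = 0_{q \times 1}. \tag{11}$$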

There are two novel aspects to this orthogonality condition. First, while $E[Z_t^i \mid S_t]$ satisfies Condition 3.5, the orthogonality condition (11) indicates that it is not the only viable option for incorporating auxiliary variables. In fact, the condition implies that a consistent estimate of the moderated causal effect can be obtained without requiring $\mu_t^i(S_t)$ to converge to $E[Z_t^i \mid S_t]$. This allows for considerable flexibility in the choice of $\mu_t^i(S_t)$. As an example, consider the scenario where $f_t(S_t) = 1$, i.e., $S_t = \emptyset$. In this case, the goal is to estimate the fully marginal effect $\beta_0$. For centering the auxiliary variable $Z_t^i$, one option is to use the time-specific averages $\bar{Z}_t^i$, which involves $T$ estimates. An alternative approach is to leverage Equation (11), which permits the use of a single centering parameter. Specifically, we can use $\mu^i = \frac{\sum_{t=1}^{T} \tilde{p}_t(1 \mid S_t)(1 - \tilde{p}_t(1 \mid S_t)) Z_t^i}{\sum_{t=1}^{T} \tilde{p}_t(1 \mid S_t)(1 - \tilde{p}_t(1 \mid S_t))}$, a weighted average of the $\{Z_t^i\}_{t=1}^{T}$. Moreover, this example also emphasizes that centering $Z_t^i$ at its global average $\bar{Z}^i$ can result in inconsistent estimates of the fully marginal effect $\beta_0$. This example highlights the other novel aspect of Equation (11): it imposes a constraint with the same dimension as $f_t(S_t) \in \mathbb{R}^q$ on each of the $p$ centering functions $\mu_t^i(S_t)$. As a result, the dimension of $\mu_t(S_t)$ can be as low as $q$, so it does not become prohibitive as $T$ increases.
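The three centering options in this example can be compared numerically. The sketch below uses simulated data invented for illustration (an auxiliary variable with a time trend and numerator probabilities that change over time) and evaluates the sample analogue of condition (11) with $f_t(S_t) = 1$, where the expectation is replaced by an average over individuals; both the data and the helper `orthogonality_residual` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 5000, 6
p_tilde = np.array([0.5, 0.5, 0.5, 0.2, 0.2, 0.2])            # p_tilde_t(1), S_t empty
z = rng.normal(loc=np.arange(T, dtype=float), size=(n, T))    # Z_t with a time trend

w = p_tilde * (1 - p_tilde)                                   # weights appearing in (11)

def orthogonality_residual(z, mu, w):
    """Sample analogue of (11) with f_t = 1: average over persons of sum_t w_t (Z_t - mu_t)."""
    return np.mean(np.sum(w * (z - mu), axis=1))

mu_time_specific = z.mean(axis=0)                             # T centring parameters
mu_weighted = np.sum(w * z.mean(axis=0)) / np.sum(w)          # one weighted parameter
mu_global = z.mean()                                          # unweighted global average

res_time = orthogonality_residual(z, mu_time_specific, w)
res_weighted = orthogonality_residual(z, mu_weighted, w)
res_global = orthogonality_residual(z, mu_global, w)
print(res_time, res_weighted, res_global)
```

Both the time-specific means and the single weighted average drive the residual to zero, while the unweighted global average does not once the weights $\tilde{p}_t(1 - \tilde{p}_t)$ vary over time, which is the source of the inconsistency noted above.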

Based on the preceding discussion, a natural choice for a $q$-dimensional working model for each of the centering functions is $\mu_t^i(S_t; \theta^i) = f_t(S_t)^\top \theta^i$, where $i \in \{1, \dots, p\}$. This choice allows us to exploit the low-dimensional nature of the causal effect model. Denoting $\Theta = (\theta^1, \dots, \theta^p) \in \mathbb{R}^{q \times p}$, we can establish a straightforward and easy-to-implement A2-WCLS criterion as follows:

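$$\mathbb{P}_n\left[\sum_{t=1}^{T} W_t \times \Big( Y_{t+1} - g_t(H_t)^\top \alpha - \big(A_t - \tilde{p}_t(1 \mid S_t)\big)\big(f_t(S_t)^\top \beta_0 + (Z_t - \Theta^\top f_t(S_t))^\top \beta_1\big) \Big)^2 \right]. \tag{12}$$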

The following lemma describes the asymptotic properties of the proposed A2-WCLS estimator, enabling a comparison of its estimation precision with the existing WCLS estimator.

Lemma 3.6.

Suppose Assumptions 2.1 and 3.1 hold, the centering function $\mu_t(S_t)$ satisfies the orthogonality condition 3.5, and the randomization probability $p_t(A_t \mid H_t)$ is known. Let $(\hat{\alpha}, \hat{\beta}_0^{A2}, \hat{\beta}_1)$ minimize objective function (10). Under regularity conditions, $\hat{\beta}_0^{A2}$ is consistent and asymptotically normal such that $\sqrt{n}\big(\hat{\beta}_0^{A2} - \beta_0\big) \to \mathcal{N}\big(0, Q^{-1}\Sigma^{A2}Q^{-1}\big)$, where $Q$ and $\Sigma^{A2}$ are defined in Appendix D.

Note that the consistency property of $\hat{\beta}_0^{A2}$ does not depend on the specific choice of $\mu_t(S_t)$, as long as $\mu_t(S_t)$ satisfies Condition 3.5. In addition, the A2-WCLS criterion inherits the robustness of the WCLS method, which guarantees that $\hat{\beta}_0^{A2}$ remains consistent even if the nuisance function $g_t(H_t)^\top \alpha$ is not correctly specified. Hypothesis tests can be performed using Wald test statistics and normal-based confidence intervals.
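One way the A2-WCLS point estimates could be computed is in two least-squares steps, sketched below. This is an assumption-laden illustration rather than the authors' implementation: the function names are hypothetical, $\Theta$ is estimated here by a weighted regression of $Z_t$ on $f_t(S_t)$ with weights $\tilde{p}_t(1 - \tilde{p}_t)$ (chosen because its normal equations are the sample version of (11)), the simulated data are invented, and no standard errors are computed.

```python
import numpy as np

def weighted_lstsq(design, target, w):
    """Solve min_b sum_i w_i * ||target_i - design_i b||^2 (target may be 1-D or 2-D)."""
    sw = np.sqrt(w)
    scaled_target = sw[:, None] * target if target.ndim == 2 else sw * target
    coef, *_ = np.linalg.lstsq(sw[:, None] * design, scaled_target, rcond=None)
    return coef

def a2_wcls_fit(y, a, p_obs, p_tilde, g, f, z):
    """Two-step sketch of criterion (12); z has shape (m, p), f has shape (m, q)."""
    # Step 1 (assumed estimator of Theta): weighted regression of Z_t on f_t(S_t).
    theta = weighted_lstsq(f, z, p_tilde * (1 - p_tilde))      # shape (q, p)
    z_centred = z - f @ theta
    # Step 2: weighted least squares with the centred interaction term added.
    w = np.where(a == 1, p_tilde / p_obs, (1 - p_tilde) / (1 - p_obs))
    a_c = (a - p_tilde)[:, None]
    design = np.hstack([g, a_c * f, a_c * z_centred])
    coef = weighted_lstsq(design, y, w)
    d, q = g.shape[1], f.shape[1]
    return coef[:d], coef[d:d + q], coef[d + q:]

# Hypothetical stacked person-time data with one auxiliary moderator Z_t.
rng = np.random.default_rng(4)
m = 20000
z = rng.normal(loc=1.0, size=(m, 1))
p_obs = np.where(z[:, 0] > 1.0, 0.7, 0.3)          # randomization depends on Z_t
a = rng.binomial(1, p_obs).astype(float)
y = 1.0 + 0.5 * z[:, 0] + a * (0.2 + 0.6 * (z[:, 0] - 1.0)) + rng.normal(size=m)

p_tilde = np.full(m, 0.5)
g = np.column_stack([np.ones(m), z[:, 0]])
f = np.ones((m, 1))                                 # fully marginal effect, S_t empty
alpha_hat, beta0_hat, beta1_hat = a2_wcls_fit(y, a, p_obs, p_tilde, g, f, z)
print(beta0_hat, beta1_hat)
```

In this simulation the marginal effect is 0.2 and the moderation slope is 0.6, so the two returned coefficients can be read against those values; treating $\hat{\Theta}$ as fixed in the second step ignores its sampling variability.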

Remark 3.7 (Connection to the Neyman Orthogonality).

By treating $(\alpha, \beta_1)$ as nuisance parameters, we can establish a connection between the orthogonality condition outlined in Condition 3.5 and the Neyman orthogonality of the score equations (Neyman, 1979; Chernozhukov et al., 2018). Intuitively, Neyman orthogonality implies that the moment conditions used to identify $\beta_0$ are insensitive to the nuisance parameter $\beta_1$, allowing one to plug in noisy estimates of these parameters without strongly violating the moment condition. Likewise, our orthogonality condition aims to ensure that the consistent estimation of $\beta_0$ is not affected by the estimation of $\beta_1$. It can be shown that Neyman orthogonality leads to the same constraint as Condition 3.5. The proof is presented in Appendix C.1.

3.2.2. The Coefficients of the Adjusted Auxiliary Variables

Standard statistical arguments can be used to show that the $\hat{\beta}_1$ obtained from minimizing the A2-WCLS criterion (10) converges to $\beta_1^{\mu_t}$ given by:

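$$\left\{E\left[\sum_{t=1}^{T} \tilde{p}_t(1 - \tilde{p}_t)\,\Sigma_{\mu, Z_t}\right]\right\}^{-1} E\left[\sum_{t=1}^{T} \tilde{p}_t(1 - \tilde{p}_t)\big(\beta(t;S_t,Z_t) - \beta(t;S_t)\big)\big(Z_t - \mu_t(S_t)\big)\right], \tag{13}$$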

where $\Sigma_{\mu, Z_t} = E\big[(Z_t - \mu_t(S_t))(Z_t - \mu_t(S_t))^\top\big]$. Referring to the causal estimand as presented in (2), the quantity $\beta(t;S_t,Z_t) = E\big[E[Y_{t+1} \mid H_t, A_t = 1] - E[Y_{t+1} \mid H_t, A_t = 0] \mid S_t, Z_t\big]$ represents the moderated causal excursion effect at time $t$, given the effect moderators $(S_t, Z_t)$. By estimating $\beta_1$, we can gain insights into the effects of including a non-moderator auxiliary variable, which indicates a wrong model specification of the causal effect. Additionally, in this scenario, it is worth investigating whether the choice of $\mu_t$ affects the estimation.

Let $\beta_1^{E[Z_t \mid S_t]}$ be denoted as $\beta_1^\star$. The following two properties provide answers to the previously mentioned concerns, and a detailed proof can be found in Appendix E.

Lemma 3.8 (Properties).

The inclusion of misspecified auxiliary variables through the A2-WCLS criterion does not adversely affect estimation efficiency.

  1. If $\beta(t;S_t,Z_t) - \beta(t;S_t) = 0$, then $\beta_1^\star = 0$;

    This property states that the inclusion of $Z_t$ as an auxiliary variable has no asymptotic impact on inference if it is not a moderator.

  2. For any function $\mu_t$ satisfying Condition 3.5, $\lvert\beta_1^{\mu_t}\rvert \le \lvert\beta_1^\star\rvert$.

    This property confirms parameter attenuation under misspecification of the centering function, and can be used to evaluate its impact on estimation of the moderation effect.

4. Theoretical Results

In this section, we provide theoretical results on the impact of incorporating auxiliary variables on the estimation efficiency of the moderated treatment effect. Additionally, we delve into the detailed examination of the desirable properties of our proposed A2-WCLS method.

4.1. Efficiency Improvement

In contrast to the scenario discussed in Section 3.1.3, where the estimation of time-specific causal effects guarantees a global improvement in precision, modeling the causal excursion effects smoothed over time requires additional assumptions to ensure efficiency gains.

Condition 4.1.

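Let $H_t$ denote the fully observed history, and define the prediction error as $\epsilon_t^{A2} = Y_{t+1} - g_t(H_t)^\top \alpha - \big(A_t - \tilde{p}_t(1 \mid S_t)\big)\big(f_t(S_t)^\top \beta_0 + (Z_t - \mu_t(S_t))^\top \beta_1\big)$. We present the following sufficient condition: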

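  (a) Correct causal model specification: $\beta(t;H_t) = f_t(S_t)^\top \beta_0 + (Z_t - \mu_t(S_t))^\top \beta_1$; and

  (b) Conditionally uncorrelated error and future states: $E\big[\epsilon_t^{A2}\, f_{t'}(S_{t'})(Z_{t'} - \mu_{t'}(S_{t'}))^\top \mid H_t, A_t\big] = 0$ for $t < t'$.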

Condition 4.1(a) implies a correctly specified causal effect model that includes all relevant observed variables that could potentially moderate the proximal outcome. It should be noted that this assumption is less stringent than assuming the entire model to be correctly specified, i.e., $E[\epsilon_t^{A2} \mid H_t, A_t] = 0$. The latter assumption implies correct modeling of the data-generating process, which is often considerably more complex. Condition 4.1(b) implies that the residuals do not convey any additional information beyond that which is already contained in the observed history. This assumption is fundamental in many statistical and machine learning models, particularly in Markov Decision Processes (MDPs), where the current state determines future states, independent of any unobserved information. Based on the assumptions outlined above, we can state the following theorem:

Theorem 4.2.

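Suppose that the centering functions $\mu_t(S_t)$ for the auxiliary variables $Z_t$ are given prior to the model fitting. Based on Lemma 3.6 and the sufficient Condition 4.1, incorporating auxiliary variables using the A2-WCLS method guarantees either an improvement or no harm to the asymptotic precision of the moderated treatment effect estimation. Specifically, the asymptotic efficiency gain compared to the WCLS estimator is given by $Q^{-1}\Sigma Q^{-1}$, where

$$Q = E\left[\sum_{t=1}^{T} \tilde p_t(1-\tilde p_t)\, f_t(S_t) f_t(S_t)^\top\right], \quad \Sigma = E\left[\left(\sum_{t=1}^{T} W_t (A_t - \tilde p_t)^2 f_t(S_t)(Z_t - \mu_t(S_t))^\top \beta_1\right)\left(\sum_{t=1}^{T} W_t (A_t - \tilde p_t)^2 f_t(S_t)(Z_t - \mu_t(S_t))^\top \beta_1\right)^\top\right]. \tag{14}$$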

For ease of understanding, we can simplify (14) by setting $f_t(S_t) = 1$ (i.e., $S_t = \emptyset$). In this case, the efficiency improvement is proportional to the variance of a weighted sum of $Z_t$ over time, multiplied by the square of $\beta_1$. Therefore, if we include auxiliary variables that are strong effect moderators with large variances, we can substantially improve the precision of our estimates. Moreover, by fixing $t$ at a particular time point and setting $W_t = 1$, we can establish a link between expression (14) and the time-specific efficiency gain demonstrated in Lemma 3.3, which in turn connects the two causal estimands.
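As a sketch of this simplification, assume a scalar auxiliary variable ($p = 1$), $f_t(S_t) = 1$, and a centering function that does not depend on $S_t$ (written $\mu_t$). Under these assumptions, (14) would reduce to:

```latex
\begin{align*}
Q &= E\Big[\sum_{t=1}^{T} \tilde p_t (1 - \tilde p_t)\Big], \\
\Sigma &= \beta_1^2 \, E\Big[\Big(\sum_{t=1}^{T} W_t (A_t - \tilde p_t)^2 (Z_t - \mu_t)\Big)^2\Big], \\
Q^{-1} \Sigma Q^{-1} &= \beta_1^2 \,
  \frac{E\big[\big(\sum_{t=1}^{T} W_t (A_t - \tilde p_t)^2 (Z_t - \mu_t)\big)^2\big]}
       {\big(E\big[\sum_{t=1}^{T} \tilde p_t (1 - \tilde p_t)\big]\big)^2}.
\end{align*}
```

If the orthogonality condition is taken in the form $E[\sum_t \tilde p_t(1-\tilde p_t)(Z_t - \mu_t)] = 0$, the weighted sum inside the square has mean zero, so its second moment is its variance. This is the sense in which the gain scales with the variance of a weighted sum of $Z_t$ and with $\beta_1^2$.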

Furthermore, it is worth noting that Lemma 3.3 on estimating time-specific causal effects, as well as the theorem presented in Lin (2013), both demonstrate a global efficiency improvement by adjusting for auxiliary variables. In contrast, Theorem 4.2 suggests a local efficiency gain under the assumption of correct model specification. The gap can be attributed to treatments, moderators, and outcomes in MRTs all being time-varying and the estimation focuses on a moderated effect smoothed over time. In contrast, the estimation in both Lemma 3.3 and Lin (2013) is of a marginal causal effect. Our proposed approach to adjust for auxiliary variables provides insight into how to effectively use observed auxiliary information to enhance the estimation of the parameter of interest. Proof of Theorem 4.2 can be found in Appendix F.

Remark 4.3.

One should consider that Condition 4.1 is just one of several potential sufficient conditions that possess a clear interpretation and guarantee an improvement in the efficiency of general MRT analysis. In practical situations, it is rare that the treatment randomization policy depends on every element of the observed history Ht. Instead, it often depends only on a few contextual variables. As a result, Condition 4.1 can be relaxed to assume correct causal model specification and uncorrelated error states, conditioned on only a subset of Ht. This weaker assumption is still sufficient for efficiency improvement in moderated causal effect estimation. In addition, there are specific scenarios where it is feasible to completely remove this sufficient condition. For instance, when estimating time-specific causal effects, Lemma 3.3 does not require any assumptions regarding the model specification or error-states correlation, while still attaining an efficiency improvement.

4.2. Properties

In this section, we re-evaluate the checklist discussed in Section 2.4 and provide a detailed explanation to demonstrate that the proposed A2-WCLS estimation method satisfies all the desirable properties.

Valid statistical inference.

With reference to Lemma 3.6 and Theorem 4.2, our proposed method generates a consistent and asymptotically normal estimator. Moreover, under Condition 4.1, our approach provides a rigorous mathematical guarantee of achieving an efficiency gain relative to the currently available WCLS inference method.

Robustness to misspecification.

(1) Lemma 3.8 indicates that the inclusion of irrelevant auxiliary variables does not compromise the precision of the estimation of the parameter of interest. (2) The A2-WCLS method shares the same property as WCLS in that if the true randomization probability $p(A_t \mid H_t)$ is known, a consistent estimator can be obtained even when the nuisance function $g(H_t)^\top\alpha$ is misspecified (Boruvka et al., 2018). Robust standard errors are used to guard against model misspecification and heteroskedasticity.

Strong finite-sample performance.

Our conclusions are derived based on asymptotics, where the sample size N approaches infinity while keeping the total number of time points T fixed. However, our proposed method can still provide reliable performance in a finite sample with a small sample correction (Mancl and DeRouen, 2001) applied to the variance estimator, which is elaborated in Appendix H.

Wide applicability.

The proposed method is a comprehensive extension of the existing literature on covariate adjustment that can be applied to study designs with $T \ge 1$, as discussed in Lemma 3.3 and Theorem 4.2. When $T = 1$, the adjustment will be based on the baseline variables, and our method is equivalent to the covariate adjustment techniques described in Lin (2013) and Ye et al. (2022).

Computational simplicity.

The use of linear working models $\Theta^\top f_t(S_t)$ for the centering function simplifies the computation, which can be implemented using existing functions that are widely available in standard statistical software packages.

4.3. Estimation of the Centering Function

4.3.1. Finite Population vs. Super-Population Based Inference

The previous discussion assumes knowledge of the true centering functions before model fitting, which is not feasible in practice. Therefore, the centering functions $\mu_t(S_t)$ need to be estimated from observed data. In an MRT, treatments are randomized sequentially throughout the study, and it is key to recognize that, like $S_t$, the auxiliary variables $Z_t$ may also depend on prior actions (i.e., $Z_t = Z_t(\bar A_{t-1})$). Consequently, the sources of randomness are not limited to treatment assignments, and estimating the centering function $\mu_t(S_t)$ will introduce non-negligible variation to the estimation. Therefore, when performing inference on the causal effect, we must integrate over all sources of uncertainty, specifically, the variance arising from estimating the centering function $\mu_t(S_t)$.

The debate over whether to use finite population inference or hypothetical infinite population inference when estimating the ATE in RCTs has been extensively discussed (Reichardt and Gollob, 1999). Some covariate adjustment methods, such as those proposed by Lin (2013) and Su and Ding (2021), use a finite population framework and are shown to perform well asymptotically. In these methods, baseline variables collected before treatment randomization are used as covariates to improve estimation precision, and the only randomness in the data comes from the random treatment assignment. This approach enables the centering parameter $\bar Z$ to be directly plugged into the model, without introducing additional variation to the estimation process. In contrast, Negi and Wooldridge (2021) and Ye et al. (2022) investigated the asymptotic precision of the ATE under the assumption that the subjects were a random sample from an infinite population. The authors suggest “we cannot assume the data has been centered in advance and E[Z]=0 without loss of generality”; therefore, the extra variance introduced by estimating $\bar Z$ is taken into account.

MRT analysis differs slightly from the discussion above because the auxiliary variables $Z_t$ are collected repeatedly between treatment randomizations. If $Z_t$ depends on previous treatments, then even in a finite population, estimating $\mu_t(S_t)$ introduces non-negligible variance that affects the asymptotic variance of the parameters of interest. Therefore, identifying the study population no longer suffices to determine whether extra uncertainty exists. Instead, we should identify whether $Z_{t,j}(\bar A_{t-1,j}) = z_{t,j}$ remains unchanged regardless of how the previous treatment realizations are altered. If so, we can directly center the auxiliary variables $Z_t$ with $\mu_t(S_t)$ without adding any additional variation to the estimation process. Otherwise, it is crucial to take into account the extra uncertainty introduced by estimating $\mu_t(S_t)$ to avoid underestimating the asymptotic variance.

4.3.2. Estimation Methods

To address the uncertainty in estimating the centering function parameters, we employ a “stacking estimating equations” approach, which is a type of M-estimator (Carroll et al., 2006). The A2-WCLS criterion introduced in (12) involves two sets of unknown parameters, $(\alpha, \beta_0, \beta_1)$ and $\theta$, and the estimating equation for $(\alpha, \beta_0, \beta_1)$ depends on $\theta$. Thus, we can jointly estimate these parameters by “stacking” their estimating equations and solving for all parameters simultaneously. More details are presented in Appendix G.
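The stacking idea can be illustrated with a minimal Python sketch. It is not the implementation in Appendix G: it assumes a toy fully marginal setting with a constant randomization probability of 0.5 (so all weights equal 1), a scalar auxiliary variable, and an intercept-only control model. The function names `psi`, `solve_stacked`, and `stacked_sandwich` are hypothetical. The sandwich variance is computed over the stacked parameter vector $(\theta, \alpha_0, \beta_0, \beta_1)$, so the standard error of $\beta_0$ reflects the estimation of $\theta$.

```python
import numpy as np

def psi(params, unit, p_tilde=0.5):
    """Stacked estimating function for one individual.

    params = (theta, alpha0, beta0, beta1); unit = (z, a, y), arrays of length T.
    """
    theta, coef = params[0], params[1:]
    z, a, y = unit
    c = a - p_tilde                                   # centered treatment
    X = np.column_stack([np.ones_like(z), c, c * (z - theta)])
    resid = y - X @ coef
    psi_theta = np.sum(p_tilde * (1 - p_tilde) * (z - theta))  # centering equation
    psi_coef = X.T @ resid                                     # least-squares score
    return np.concatenate([[psi_theta], psi_coef])

def solve_stacked(data, p_tilde=0.5):
    """Solve the stacked equations: theta first, then the regression given theta."""
    z = np.concatenate([d[0] for d in data])
    a = np.concatenate([d[1] for d in data])
    y = np.concatenate([d[2] for d in data])
    theta = z.mean()   # solves the centering equation when p_tilde is constant
    c = a - p_tilde
    X = np.column_stack([np.ones_like(z), c, c * (z - theta)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.concatenate([[theta], coef])

def stacked_sandwich(est, data, p_tilde=0.5):
    """Sandwich covariance of all stacked parameters (individuals are the units)."""
    n, k = len(data), len(est)
    scores = np.array([psi(est, d, p_tilde) for d in data])
    meat = scores.T @ scores / n
    bread = np.zeros((k, k))
    h = 1e-5
    for col in range(k):                 # central finite-difference Jacobian
        up, dn = est.copy(), est.copy()
        up[col] += h
        dn[col] -= h
        diff = [(psi(up, d, p_tilde) - psi(dn, d, p_tilde)) / (2 * h) for d in data]
        bread[:, col] = -np.mean(diff, axis=0)
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv.T / n

# Toy data (hypothetical values): constant randomization probability 0.5.
rng = np.random.default_rng(0)
n, T = 200, 10
data = []
for _ in range(n):
    z = rng.choice([-1.0, 1.0], size=T)
    a = rng.binomial(1, 0.5, size=T).astype(float)
    y = 0.8 * z + (a - 0.5) * (-0.2 + 0.5 * z) + rng.normal(size=T)
    data.append((z, a, y))

est = solve_stacked(data)
cov = stacked_sandwich(est, data)
se_beta0 = np.sqrt(cov[2, 2])   # standard error of beta0, accounting for theta-hat
```

Because the estimating function is quadratic in the parameters, the finite-difference Jacobian is essentially exact here; an analytic Jacobian would be preferable in a real analysis.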

5. Simulation

In this section, we conduct simulation experiments under different treatment randomization regimes and centering functions. We find that: (1) the proposed A2-WCLS method still effectively improves the efficiency of marginal causal effect estimation, even when considering the additional variance introduced by estimating $\theta$; (2) incorporating more features in the centering model leads to minimal or no gain compared to using only $\mu_t(S_t) = f_t(S_t)^\top\theta$.

5.1. Simulation Setup

Here, we evaluate the proposed method via simulation experiments with a continuous outcome $Y_{t+1}$. Equation (15) below is based on the data generation model from Boruvka et al. (2018), which we modified slightly to illustrate the proposed method and compare it with the WCLS method. Consider an MRT with known randomization probability in which the observation vector is a single state variable $Z_t \in \{-1, 1\}$ at each decision time $t$, with state dynamics given by $P(Z_t = 1 \mid A_{t-1}, H_{t-1}) = \text{expit}(0.05t + \xi A_{t-1})$ with $\xi = 0.1$. Let

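$$Y_{t+1,j} = \left(\beta_{10} + \beta_{11}(Z_{t,j} - E[Z_{t,j}])\right) \times \left(A_{t,j} - p_t(1 \mid H_{t,j})\right) + 0.8 Z_{t,j} + \epsilon_{t,j} \tag{15}$$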

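The randomization probability is set to $p_t(1 \mid H_t) = \text{expit}(\eta_1 A_{t-1} + \eta_2 Z_t)$, where $(\eta_1, \eta_2) = (-0.8, 0.8)$ and $\text{expit}(x) = (1 + \exp(-x))^{-1}$. The error term satisfies $\epsilon_t \sim \mathcal{N}(0, 1)$ with $\text{Corr}(\epsilon_u, \epsilon_t) = 0.5^{|u-t|/2}$. We set $\beta_{10} = -0.2$, and $\beta_{11} \in \{0.2, 0.5, 0.8\}$ to represent a small, medium, and large moderation effect.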
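A minimal Python sketch of this data-generating process is given below. Several details are assumptions rather than statements from the text: the number of decision points `T` is left to the caller, the initial action is taken as $A_0 = 0$, the signs $\eta_1 = -0.8$ and $\beta_{10} = -0.2$ follow Boruvka et al. (2018) and the estimates in Table 1, the time slope 0.05 is used as printed, and $E[Z_{t,j}]$ is interpreted as the marginal mean and computed by a forward recursion over $P(A_{t-1} = 1)$. The function names are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_mean_z(T, xi=0.1, eta=(-0.8, 0.8), slope=0.05):
    """E[Z_t] for t = 1..T via forward recursion over P(A_{t-1} = 1), with A_0 = 0."""
    pa = 0.0                       # P(A_{t-1} = 1); assumed A_0 = 0
    ez = np.zeros(T)
    for t in range(1, T + 1):
        new_pa = 0.0
        for a_prev, w in ((0, 1.0 - pa), (1, pa)):
            pz1 = expit(slope * t + xi * a_prev)      # P(Z_t = 1 | A_{t-1})
            ez[t - 1] += w * (2.0 * pz1 - 1.0)
            for z, pz in ((1, pz1), (-1, 1.0 - pz1)):
                new_pa += w * pz * expit(eta[0] * a_prev + eta[1] * z)
        pa = new_pa
    return ez

def simulate_mrt(n, T, beta10=-0.2, beta11=0.5, xi=0.1,
                 eta=(-0.8, 0.8), slope=0.05, seed=0):
    """Simulate n individuals over T decision points following Equation (15)."""
    rng = np.random.default_rng(seed)
    ez = marginal_mean_z(T, xi, eta, slope)
    # Errors are correlated within individual: Corr(eps_u, eps_t) = 0.5 ** (|u - t| / 2)
    idx = np.arange(T)
    corr = 0.5 ** (np.abs(idx[:, None] - idx[None, :]) / 2)
    eps = rng.multivariate_normal(np.zeros(T), corr, size=n)
    Z, A, P, Y = (np.zeros((n, T)) for _ in range(4))
    a_prev = np.zeros(n)
    for t in range(T):
        pz1 = expit(slope * (t + 1) + xi * a_prev)
        Z[:, t] = np.where(rng.random(n) < pz1, 1.0, -1.0)
        P[:, t] = expit(eta[0] * a_prev + eta[1] * Z[:, t])
        A[:, t] = (rng.random(n) < P[:, t]).astype(float)
        effect = beta10 + beta11 * (Z[:, t] - ez[t])
        Y[:, t] = effect * (A[:, t] - P[:, t]) + 0.8 * Z[:, t] + eps[:, t]
        a_prev = A[:, t]
    return {"Z": Z, "A": A, "P": P, "Y": Y}
```

Calling `simulate_mrt(n=250, T=30)` would produce one replicate with 250 individuals; the value `T=30` is only an example.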

5.2. Simulation Results

In the first simulation experiment, we estimate the fully marginal causal effect, which is constant in time and is given by $\beta_0 = \beta_{10} = -0.2$. Here, we report results with 250 individuals using the following estimation methods:

Estimation Method I: WCLS.

We use the WCLS method (Boruvka et al., 2018) adjusted with Zt,j as a control variable. This method is guaranteed to produce a consistent estimate with a valid confidence interval. Thus, it will be used as a reference for comparison of the estimation results from the following method.

Estimation Method II: A2-WCLS.

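We include $Z_{t,j}$ as an auxiliary variable. Since we are primarily concerned with the fully marginal causal excursion effect (i.e., $S_t = \emptyset$, $f_t(S_t) = 1$), the working model of the centering function under the A2-WCLS criterion in (12) is simply $\mu_t(S_t) = \theta$, and $\hat\theta = \mathbb{P}_n \sum_{t=1}^{T} \tilde p_t(1-\tilde p_t) Z_{t,j} \big/ \sum_{t=1}^{T} \tilde p_t(1-\tilde p_t)$, a weighted average of $Z_{t,j}$.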

Estimation Method III: A2-WCLS with a higher-dimensional $\mu_t(S_t)$.

As the expectation of $Z_t$ changes over time, we consider a higher-dimensional working model for the centering function, $\mu_t(S_t) = \theta_1 + \theta_2 t$. We can solve for both $\hat\theta_1$ and $\hat\theta_2$ under Condition 3.5.
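The point estimates for Methods I and II can be sketched in Python as a weighted least-squares fit. This is an illustrative sketch, not the authors' code: it assumes a scalar auxiliary variable, a control model with an intercept and $Z_t$, numerator probabilities $\tilde p_t$ that depend only on $t$ (by default the per-time treatment frequency), and the usual WCLS weight given by the ratio of $\tilde p_t$ to the true randomization probability. The function name `fit_a2_wcls` is hypothetical, and standard errors are omitted.

```python
import numpy as np

def fit_a2_wcls(Y, A, P, Z, p_tilde=None):
    """Point estimates for a fully marginal A2-WCLS fit with one auxiliary variable.

    Y, A, P, Z are (n, T) arrays: outcome, binary treatment, randomization
    probability P(A = 1 | H), and auxiliary variable. p_tilde is a length-T
    vector (or scalar) of numerator probabilities.
    """
    n, T = Y.shape
    if p_tilde is None:
        p_tilde = A.mean(axis=0)                      # per-time treatment frequency
    p_tilde = np.broadcast_to(np.asarray(p_tilde, dtype=float), (T,))
    pt = np.tile(p_tilde, (n, 1))
    # WCLS weights: numerator probability over true randomization probability
    W = np.where(A == 1, pt / P, (1 - pt) / (1 - P))
    # Centering constant: average of Z weighted by p_tilde * (1 - p_tilde)
    v = pt * (1 - pt)
    theta = np.sum(v * Z) / np.sum(v)
    c = A - pt                                        # centered treatment indicator
    X = np.column_stack([
        np.ones(n * T),                 # alpha_0
        Z.ravel(),                      # alpha_1 (control variable)
        c.ravel(),                      # beta_0 (fully marginal effect)
        (c * (Z - theta)).ravel(),      # beta_1 (centered auxiliary moderator)
    ])
    XtW = X.T * W.ravel()
    coef = np.linalg.solve(XtW @ X, XtW @ Y.ravel())
    return {"theta": theta, "alpha": coef[:2], "beta0": coef[2], "beta1": coef[3]}
```

Dropping the last column of `X` gives the WCLS fit of Method I with $Z_{t,j}$ as a control variable. Centering at the weighted average rather than at an arbitrary constant is what keeps `beta0` interpretable as the fully marginal effect.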

The simulation experiment involves estimating $\theta$ in the centering function from the generated data, which introduces non-negligible variation that affects the asymptotic variance estimation for the fully marginal causal effect. Therefore, the asymptotic variance estimator referred to in Section 4.3.2 is used. Table 1 reports the simulation results. “%RE gain” indicates the percentage of the 1000 Monte Carlo replicates in which an efficiency gain is achieved. “mRE” stands for the average relative efficiency, and “RSD” represents the relative standard deviation between two estimates. Despite having to account for the extra variance caused by estimating $\theta$, the proposed A2-WCLS method still improves the efficiency of fully marginal causal effect estimation in every replicate.

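Table 1:

Fully marginal causal effect estimation efficiency comparison.

| Method | $\beta_{11}$ | Est | SE | CP | %RE gain | mRE | RSD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WCLS (I) | 0.2 | −0.197 | 0.029 | 0.952 | - | - | - |
| WCLS (I) | 0.5 | −0.197 | 0.029 | 0.935 | - | - | - |
| WCLS (I) | 0.8 | −0.194 | 0.031 | 0.936 | - | - | - |
| A2-WCLS (II) | 0.2 | −0.199 | 0.026 | 0.955 | 100% | 1.207 | 1.198 |
| A2-WCLS (II) | 0.5 | −0.200 | 0.027 | 0.936 | 100% | 1.202 | 1.196 |
| A2-WCLS (II) | 0.8 | −0.200 | 0.028 | 0.940 | 100% | 1.196 | 1.198 |
| A2-WCLS (III) | 0.2 | −0.199 | 0.026 | 0.955 | 100% | 1.207 | 1.198 |
| A2-WCLS (III) | 0.5 | −0.200 | 0.027 | 0.936 | 100% | 1.202 | 1.196 |
| A2-WCLS (III) | 0.8 | −0.200 | 0.028 | 0.940 | 100% | 1.196 | 1.198 |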

We perform one more simulation experiment using A2-WCLS in which the working model of the centering function $\mu_t(S_t)$ has more features that are not included in $f_t(S_t)$. Simulation results indicate that using this extra dimension barely leads to any improvement in RE compared to using just $f_t(S_t)$, which could be interpreted as a trade-off between a centering function that better approximates $E[Z_t \mid S_t]$ and the extra variance caused by estimating the parameters in the working centering model. To conclude, extending $\mu_t(S_t)$ to a more complex function while ensuring an improvement in estimation efficiency is not trivial. Thus, we consider this to be future work.

Remark 5.1.

As an empirical demonstration that the orthogonality Condition 3.5 must be met for estimation to be consistent, we implement a simulation centering the auxiliary variable at the global average $\bar Z = \frac{1}{n \times T}\sum_{j=1}^{n}\sum_{t=1}^{T} Z_{t,j}$. According to the simulation result shown in Appendix I, Table 2, there is a marked bias compared with the results obtained with Estimation Method II, where the auxiliary variable is centered by a weighted average of $Z_t$.

Remark 5.2.

We implemented one simulation under the strong assumption $Z_{t,j}(\bar A_{t-1,j}) = z_{t,j}$. In this case, centering the auxiliary variable no longer introduces extra uncertainty. See Table 3 for the simulation results. This simulation provides empirical evidence similar to that in Table 1.

Remark 5.3.

In addition, we vary the treatment randomization policy $p_t(A_t \mid H_t)$ to evaluate how our proposed method performs under different data generation processes. One is to set the randomization probability to $p_t(1 \mid H_t) = p_t$, where $p_t$ is only a function of time $t$. Results are presented in Table 4 in Appendix I. Evidence suggests that incorporating auxiliary variables improves estimation efficiency in this case. Furthermore, we set the randomization probability to a constant, $p_t(1 \mid H_t) = p$. See results in Table 5. While we see only a minimal improvement compared to the previous two scenarios, we highlight that the inclusion of auxiliary variables does not adversely affect efficiency.

6. Extension: Time-Lagged Outcomes and Post-Treatment Auxiliary Variable Adjustment

In addition to the focus on proximal outcomes, there is growing interest in lagged outcomes defined over future decision points with a fixed window length $\Delta > 1$, denoted as $Y_{t,\Delta}$, which can be expressed as a known function of the participant’s data: $Y_{t,\Delta} = y(H_{t+\Delta})$ (Dempsey et al., 2020). The incorporation of auxiliary variables in this context is conceptually similar to that of proximal outcomes. Nevertheless, when considering lagged outcomes, the selection of auxiliary variables has a broader scope, as post-treatment auxiliary variables may also be incorporated into the estimation.

Under Assumption 2.1, the causal estimand for lagged effect can be expressed in terms of observable data (Shi et al., 2022):

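$$\beta_{p,\pi}(t+\Delta; s) = E\left[E[W_{t,\Delta-1} Y_{t,\Delta} \mid A_t = 1, H_t] - E[W_{t,\Delta-1} Y_{t,\Delta} \mid A_t = 0, H_t] \mid S_t = s\right], \tag{16}$$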

where $W_{t,u} = \prod_{s=1}^{u} \pi_t(A_{t+s} \mid H_{t+s}) / p_t(A_{t+s} \mid H_{t+s})$. Here, we assume the reference distribution for treatment assignments from $t+1$ to $t+\Delta-1$ $(\Delta > 1)$ is given by a randomization probability generically represented by $\{\pi_u(a_u \mid H_u)\}_{u=t+1}^{t+\Delta-1}$. This generalization contains previous definitions such as lagged effects (Boruvka et al., 2018), where $\pi_u = p_u$, and deterministic choices such as $a_{t+1:(t+\Delta-1)} = 0$ (Dempsey et al., 2020; Qian et al., 2021), where $\pi_u = 1\{a_u = 0\}$ and $1\{\cdot\}$ is the indicator function. As described in Boruvka et al. (2018), assuming $\beta_{p,\pi}(t+\Delta; s) = f_t(s)^\top\beta_0$, the WCLS criterion for lagged outcomes is to minimize the following expression:

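$$\mathbb{P}_n\left[\sum_{t=1}^{T-\Delta+1} W_{t,\Delta-1} W_t \times \left(Y_{t,\Delta} - g_t(H_t)^\top\alpha_0 - (A_t - \tilde p_t(1 \mid S_t))\, f_t(S_t)^\top\beta_0\right)^2\right]. \tag{17}$$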
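The product weight $W_{t,u}$ entering this criterion can be computed directly for the two reference policies mentioned above. The following Python sketch handles one individual and uses 0-based time indices; the function name `lagged_weight` and the `reference` labels are hypothetical.

```python
def lagged_weight(A, P, t, u, reference="zero"):
    """Product weight W_{t,u} for one individual.

    A: sequence of binary treatments; P: sequence of P(A = 1 | H) at each time.
    reference="observed" uses pi = p (every ratio is 1);
    reference="zero" uses pi = 1{a = 0} (no treatment over the window).
    """
    weight = 1.0
    for s in range(1, u + 1):
        a, p1 = A[t + s], P[t + s]
        p_obs = p1 if a == 1 else 1.0 - p1   # probability of the observed action
        if reference == "observed":
            pi = p_obs
        elif reference == "zero":
            pi = 1.0 if a == 0 else 0.0
        else:
            raise ValueError("reference must be 'observed' or 'zero'")
        weight *= pi / p_obs
    return weight
```

Under the "zero" reference, the weight is zero as soon as any treatment is delivered inside the window, so only treatment-free windows contribute to the criterion, each inflated by the inverse probability of observing no treatment.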

It is important to note that all variables included in Equation (17), as indicated by the notation, must be observed prior to the treatment allocation $A_t$ at time $t$. Naively plugging post-treatment variables into Equation (17) without appropriate adjustment can lead to bias, and we provide an illustration of this in Appendix J. However, for lagged outcomes, it is also reasonable to expect that post-treatment variables may be highly prognostic, and including them could potentially reduce noise and yield a more precise estimate of the causal parameter. Building upon this motivation, we introduce the A2-WCLS criterion for lagged outcomes, which allows for the incorporation of post-treatment auxiliary variables. Let $Z_{t+u} \in \mathbb{R}^p$ represent a set of post-treatment auxiliary variables, where $u$ is a positive integer:

$$\mathbb{P}_n\left[\sum_{t=1}^{T-\Delta+1} W_{t,\Delta-1} W_t \times \left(Y_{t,\Delta} - g_t(H_t)^\top\alpha_0 - (Z_{t+u} - \mu_{t+u}(S_t, A_t))^\top\alpha_1 - (A_t - \tilde p_t(1 \mid S_t))\left(f_t(S_t)^\top\beta_0 + (Z_{t+u} - \mu^\star_{t+u}(S_t, A_t))^\top\beta_1\right)\right)^2\right]. \tag{18}$$

The A2-WCLS criterion for post-treatment auxiliary variable adjustment has two distinctions compared to pre-treatment-only auxiliary variable adjustment. First, as a sufficient condition for ensuring a consistent estimate of the parameter of interest $\beta_0$, we require two sets of centering functions, denoted as $\mu_{t+u}(\cdot)$ and $\mu^\star_{t+u}(\cdot)$, as shown above. Second, both sets of centering functions depend on the treatment allocation $A_t$, and are thus denoted as $\mu_{t+u}(S_t, A_t)$ and $\mu^\star_{t+u}(S_t, A_t)$. As a result, we arrive at the following complementary orthogonality condition for both centering functions:

Condition 6.1.

For each auxiliary variable $Z^i_{t+u}$, where $i \in \{1, \ldots, p\}$ and $0 < u < \Delta$, its corresponding centering functions must satisfy the following orthogonality condition with respect to $f_t(S_t) \in \mathbb{R}^q$,

$$\text{(a)}\ E\left[\sum_{t=1}^{T-\Delta+1} W_t W_{t,\Delta-1}\left(Z^i_{t+u} - \mu^i_{t+u}(S_t, A_t)\right)\left(A_t - \tilde p_t(1 \mid S_t)\right) f_t(S_t)\right] = 0_{q \times 1}, \quad \text{(b)}\ E\left[\sum_{t=1}^{T-\Delta+1} W_t W_{t,\Delta-1}\left(Z^i_{t+u} - \mu^{\star i}_{t+u}(S_t, A_t)\right)\left(A_t - \tilde p_t(1 \mid S_t)\right)^2 f_t(S_t)\right] = 0_{q \times 1}. \tag{19}$$

Condition 6.1 offers an approach for integrating post-treatment auxiliary variables into the estimation of lagged moderated causal excursion effects, thereby expanding our framework for incorporating auxiliary variables in time-varying treatment effect estimation. To establish a connection with the previous discussion on pre-treatment auxiliary variable adjustment in Condition 3.5, we can set $u = 0$, leading to $\mu_t(S_t, A_t) = \mu^\star_t(S_t, A_t) = \mu_t(S_t)$, and $W_{t,\Delta-1} = 1$. Thus, Equation (19)(a) will always be 0, and Equation (19)(b) can be simplified to Equation (11). To conclude, the orthogonality condition for proximal outcomes is subsumed as a special case. The proof can be found in Appendix J.
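The reduction at $u = 0$ rests on two conditional moments of the weighted, centered treatment indicator. As a sketch, assuming the usual WCLS weight $W_t = \tilde p_t(A_t \mid S_t) / p_t(A_t \mid H_t)$ and writing $\tilde p_t$ for $\tilde p_t(1 \mid S_t)$:

```latex
\begin{align*}
E\big[W_t (A_t - \tilde p_t) \mid H_t\big]
  &= \tilde p_t (1 - \tilde p_t) - (1 - \tilde p_t)\,\tilde p_t = 0, \\
E\big[W_t (A_t - \tilde p_t)^2 \mid H_t\big]
  &= \tilde p_t (1 - \tilde p_t)^2 + (1 - \tilde p_t)\,\tilde p_t^2
   = \tilde p_t (1 - \tilde p_t).
\end{align*}
```

Since a pre-treatment auxiliary variable and its centering function are functions of $H_t$, the first identity makes part (a) vanish term by term, while the second turns part (b) into a condition weighted by $\tilde p_t(1 - \tilde p_t)$.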

Remark 6.2.

The method we propose can also be used in situations where the causal excursion effect is not expressed as a linear contrast but instead as a logarithmic relative risk, as shown in Qian et al. (2021). We have included the extension of our method to binary outcomes in Appendix K. It is important to note that the orthogonality condition of the centering function for binary outcomes depends on the parameter $\beta_0$. To handle this, we can use an iterative algorithm to solve the stacked estimating equations simultaneously.

7. Case Study

The Intern Health Study (IHS) is a 6-month micro-randomized trial on medical interns (NeCamp et al., 2020), which aimed to investigate when to provide mHealth interventions to individuals in stressful work environments to improve their behavior and mental health. In this section, we evaluate the effectiveness of targeted notifications in improving an individual’s mood and step counts. The data set used in the analyses contains 1562 participants.

The exploratory and MRT analyses conducted in this paper focus on weekly randomization: every week, an individual was randomized to receive mood, activity, sleep, or no notifications with equal probability (1/4 each). We choose the outcome $Y_{t+1,j}$ as the self-reported mood score (a Likert scale taking values from 1 to 10) and step count (cubic root) for individual $j$ in study week $t$. The average weekly mood score when a notification is delivered is 7.14, and 7.16 when there is no notification; the average weekly step count (cubic root) when a notification is delivered is 19.1, and also 19.1 when there is no notification. In the following analysis, the prior week’s outcome is chosen as the auxiliary variable. We evaluate the targeted notification treatment effect for medical interns using our proposed method and WCLS.

7.1. Comparison of the marginal effect estimation

First, we are interested in assessing the fully marginal excursion effect as well as the effect moderation (i.e., $\beta(t) = \beta_0$). For an individual $j$, the study week is coded as a subscript $t$. $Y_{t+1,j}$ is the self-reported mood score or step count for individual $j$ in study week $t+1$, and $Z_{t,j}$ represents the auxiliary variable we choose. $A_t$ is defined as the specific type of notification that targets improving the outcome. For example, if the outcome is the self-reported mood score, sending mood notifications would be the action; thus, $P(A_t = 1) = 0.25$. We analyze the marginal causal effect $\beta_0$ of the targeted notifications on self-reported mood score and step count using the following model:

$$Y_{t+1,j} = g_t(H_t)^\top\alpha + (A_t - \tilde p_t)\left(\beta_0 + \beta_2 (Z_{t,j} - \mu_t(\theta))\right)$$

For the first analysis, we set $g_t(H_t)^\top\alpha = \alpha_0$ and $\beta_2 = 0$. Thus, the estimation reduces to a univariate WCLS model. In the second model, we keep $\beta_2 = 0$ but include a control variable, meaning $g_t(H_t)^\top\alpha = \alpha_0 + \alpha_1 Z_{t,j}$. The third moderation analysis lets $\beta_2$ be a free parameter, enabling novel analyses using A2-WCLS.

The corresponding outcome of the prior week is chosen to be the auxiliary variable, i.e., $Z_{t,j} = Y_{t,j}$, which is a time-varying variable. Specifically, for mood score as the outcome, we selected the prior week’s average mood score as the auxiliary variable, and for step count as the outcome, the prior week’s average step count was chosen. The working model chosen to center these auxiliary variables is $\mu_t = \mathbb{P}_n Y_t$. We report various estimators in Figure 1 and present more details in Appendix L, Table 8. In comparison with Model I, Models II and III show a tangible improvement in the standard error estimates, with relative estimation efficiencies of 2.32 and 2.33 for the mood outcome, and 1.57 and 1.58 for the step outcome. Even though Model III using A2-WCLS does not gain considerable efficiency over Model II (WCLS), it is not likely to lose efficiency.

Figure 1:

Causal effect point estimates with the 95% confidence interval, and standard errors

We conclude that sending activity notifications can increase (the cubic root of) step counts by 0.072, an effect that is statistically significant at the 5% level. The study also suggests that sending mood notifications may negatively affect users’ moods. There is, however, insufficient evidence to establish a causal relationship between the two.

7.2. Time-varying treatment effect estimation

In mobile health intervention studies, time-in-study is often an important moderator of treatment effects. Therefore, for further investigation, we include study week in the marginal treatment effect model: $\beta(t) = \beta_0 + \beta_1 t$. Auxiliary variables are still chosen to be the corresponding outcome in the prior week. Section 7.1 established that control variables are beneficial in reducing standard errors, so control variables are always included in the following analysis.

Estimated time-varying treatment moderation effects are shown in Figure 2 below. We compare our proposed approach against the WCLS method from Boruvka et al. (2018). More details of the moderated analysis are presented in Appendix L, Table 9. The shaded area represents the 95% confidence band of the moderation effects at varying values of the moderator. A much narrower confidence band is observed when A2-WCLS is used; specifically, the relative efficiency for $\hat\beta_0$ and $\hat\beta_1$ in the mood model is 2.204 and 2.157, respectively, and in the step model is 1.562 and 1.475, respectively.
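A band of this kind can be sketched from the fitted coefficients and their covariance matrix. The Python sketch below assumes a pointwise band based on the normal approximation with critical value 1.96, which may differ from the small-sample-corrected procedure used for the figure; the function name `effect_band` and the numbers in the usage lines are hypothetical and are not estimates from the Intern Health Study.

```python
import numpy as np

def effect_band(beta_hat, cov, weeks, crit=1.96):
    """Pointwise band for beta(t) = beta0 + beta1 * t.

    beta_hat: (beta0, beta1); cov: their 2 x 2 covariance matrix;
    weeks: study weeks at which to evaluate the effect.
    Returns (estimate, lower, upper) arrays.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    cov = np.asarray(cov, dtype=float)
    weeks = np.asarray(weeks, dtype=float)
    F = np.column_stack([np.ones(len(weeks)), weeks])   # rows are f_t = (1, t)
    est = F @ beta_hat
    # Var(beta0 + beta1 * t) = f_t' cov f_t, evaluated row by row
    se = np.sqrt(np.einsum("ij,jk,ik->i", F, cov, F))
    return est, est - crit * se, est + crit * se

# Hypothetical inputs for illustration only
est, lower, upper = effect_band(
    beta_hat=[0.1, -0.01],
    cov=[[4e-4, -1e-5], [-1e-5, 4e-6]],
    weeks=[0, 10],
)
```

A smaller covariance matrix, as reported for A2-WCLS relative to WCLS, narrows the band at every week through the quadratic form $f_t^\top \widehat{\text{Cov}}\, f_t$.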

Figure 2:

Causal effect point estimates with the 95% confidence interval, and standard errors

There is an overall decreasing trend of mood change with study week, which suggests that if the notifications don’t serve any therapeutic purposes, it might not be ideal to over-burden the participants with more targeted reminders, since it will bring down their mood. Furthermore, it is encouraging to see that the causal excursion effect of mobile prompts for step count change is positive in the first several weeks of the study, which means that sending targeted reminders is beneficial to increasing physical activity levels. In the later stages of the study, the effect fades away, possibly due to habituation to smartphone reminders.

8. Discussion

MRTs are commonly used in mobile health-related studies, yet statistical techniques for covariate adjustment have mainly focused on RCTs with different sampling designs. In light of the increasing research on statistical tools for MRT analysis, we investigate the theoretical properties of incorporating auxiliary variables in the estimation of moderated causal excursion effects. Our study aims to address the current gaps in the literature by considering repeated measurements and time-varying treatment effects.

Our proposed A2-WCLS method possesses several desirable properties, such as computational simplicity, robustness to model misspecification, and wide applicability. Moreover, it provides valid statistical inference and demonstrates strong performance in finite-sample settings. Most importantly, it offers improved efficiency for moderated causal excursion effect estimation in comparison to the benchmark WCLS methods. In Appendix M, we provide clear and practical guidelines for practitioners to implement auxiliary variable adjustment in longitudinal studies. Overall, our results have the potential to alleviate confusion surrounding this topic and enhance the precision of causal effect estimation.

Many open questions remain. One of the most important is the choice of the centering function $\mu_t(S_t)$, where feature selection methods, such as the smoothly clipped absolute deviation (SCAD) method (Fan and Li, 2001), could be used to pick an optimal model for $\mu_t(S_t)$. A non-linear centering function can also be considered, similar to what Guo and Basse (2021) proposed for RCTs. In addition to using auxiliary variables to improve estimation efficiency, we may also consider adjusting for control variables $g(H_t)$. There is a wide variety of machine learning methods that can be used for this purpose. Further, if the MRT has a certain type of cluster structure (interference within clusters and/or treatment heterogeneity at the cluster level), new centering techniques using cluster-level auxiliary variables will be required (Shi et al., 2022). Last but not least, more advanced and flexible MRT designs should be explored and should work in tandem with auxiliary variable adjustment to maximize efficiency. For example, Van Lancker et al. (2022) propose an information adaptive design, which adapts to the amount of precision gain and can lead to faster, more efficient trials without sacrificing validity or power. We consider these to be exciting directions for future work.

Supplementary Material

Supplement only

References

  1. Boruvka A, Almirall D, Witkiewitz K, and Murphy SA (2018). Assessing time-varying causal effect moderation in mobile health. Journal of the American Statistical Association 113(523), 1112–1121.
  2. Carroll RJ, Ruppert D, Stefanski LA, and Crainiceanu CM (2006). Measurement error in nonlinear models: a modern perspective. Chapman and Hall/CRC.
  3. Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, and Robins J. (2018). Double/debiased machine learning for treatment and structural parameters.
  4. Dempsey W, Liao P, Klasnja P, Nahum-Shani I, and Murphy SA (2015). Randomised trials for the fitbit generation. Significance 12(6), 20–23.
  5. Dempsey W, Liao P, Kumar S, and Murphy SA (2020). The stratified micro-randomized trial design: sample size considerations for testing nested causal effects of time-varying treatments. The annals of applied statistics 14(2), 661.
  6. Eicker F. (1967). Limit theorems for regressions with unequal and dependent errors.
  7. Fan J. and Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association 96(456), 1348–1360.
  8. Fisher R. (1935). The Design of Experiments. Oliver and Boyd.
  9. Free C, Phillips G, Galli L, Watson L, Felix L, Edwards P, Patel V, and Haines A. (2013). The effectiveness of mobile-health technology-based health behaviour change or disease management interventions for health care consumers: a systematic review. PLoS medicine 10(1), e1001362.
  10. Freedman DA (2006). On the so-called “huber sandwich estimator” and “robust standard errors”. The American Statistician 60(4), 299–302.
  11. Freedman DA (2008). On regression adjustments to experimental data. Advances in Applied Mathematics 40(2), 180–193.
  12. Guo K. and Basse G. (2021). The generalized oaxaca-blinder estimator. Journal of the American Statistical Association, 1–13.
  13. Heron KE and Smyth JM (2010). Ecological momentary interventions: incorporating mobile technology into psychosocial and health behaviour treatments. British journal of health psychology 15(1), 1–39.
  14. Huber PJ (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability: Weather Modification; University of California Press: Berkeley, CA, USA, pp. 221.
  15. ICH E. (1998). E9 statistical principles for clinical trials. London: European Medicines Agency.
  16. Klasnja P, Hekler EB, Shiffman S, Boruvka A, Almirall D, Tewari A, and Murphy SA (2015). Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychology 34(S), 1220.
  17. Liang K-Y and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73(1), 13–22.
  18. Liao P, Klasnja P, Tewari A, and Murphy SA (2016). Sample size calculations for micro-randomized trials in mhealth. Statistics in medicine 35(12), 1944–1971.
  19. Lin W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining freedman’s critique. The Annals of Applied Statistics 7(1), 295–318.
  20. Mancl LA and DeRouen TA (2001). A covariance estimator for gee with improved small-sample properties. Biometrics 57(1), 126–134.
  21. Nahum-Shani I, Smith SN, Spring BJ, Collins LM, Witkiewitz K, Tewari A, and Murphy SA (2018). Just-in-time adaptive interventions (jitais) in mobile health: key components and design principles for ongoing health behavior support. Annals of Behavioral Medicine 52(6), 446–462.
  22. NeCamp T, Sen S, Frank E, Walton MA, Ionides EL, Fang Y, Tewari A, and Wu Z. (2020). Assessing real-time moderation for developing adaptive mobile health interventions for medical interns: micro-randomized trial. Journal of medical Internet research 22(3), e15033.
  23. Negi A. and Wooldridge JM (2021). Revisiting regression adjustment in experiments with heterogeneous treatment effects. Econometric Reviews 40(5), 504–534.
  24. Neyman J. (1979). C (α) tests and their use. Sankhyā: The Indian Journal of Statistics, Series A, 1–21.
  25. Neyman JS (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. (Translated and edited by D. M. Dabrowska and T. P. Speed, Statistical Science (1990), 5, 465–480). Annals of Agricultural Sciences 10, 1–51.
  26. Qian T, Yoo H, Klasnja P, Almirall D, and Murphy SA (2021). Estimating time-varying causal excursion effects in mobile health with binary outcomes. Biometrika 108(3), 507–527.
  27. Rabbi M, Kotov MP, Cunningham R, Bonar EE, Nahum-Shani I, Klasnja P, Walton M, and Murphy S. (2018). Toward increasing engagement in substance use data collection: development of the substance abuse research assistant app and protocol for a microrandomized trial using adolescents and emerging adults. JMIR Research Protocols 7(7), e166.
  28. Reichardt CS and Gollob HF (1999). Justifying the use and increasing the power of a t test for a randomized experiment with a convenience sample. Psychological Methods 4(1), 117.
  29. Ridpath J. (2017). How can we use technology to support patients after bariatric surgery?
  30. Robins J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect. Mathematical Modelling 7(9), 1393–1512.
  31. Robins JM (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics-Theory and Methods 23(8), 2379–2412.
  32. Robins JM (1997). Causal inference from complex longitudinal data. In Latent variable modeling and applications to causality, pp. 69–117. Springer.
  33. Rubin D. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics 6(1), 34–58.
  34. Schochet PZ (2010). Is regression adjustment supported by the Neyman model for causal inference? Journal of Statistical Planning and Inference 140(1), 246–259.
  35. Shi J, Wu Z, and Dempsey W. (2022). Assessing time-varying causal effect moderation in the presence of cluster-level treatment effect heterogeneity and interference. Biometrika.
  36. Su F. and Ding P. (2021). Model-assisted analyses of cluster-randomized experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
  37. Tsiatis AA, Davidian M, Zhang M, and Lu X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statistics in Medicine 27(23), 4658–4677.
  38. Van Lancker K, Betz J, and Rosenblum M. (2022). Combining covariate adjustment with group sequential and information adaptive designs to improve randomized trial efficiency. arXiv preprint arXiv:2201.12921.
  39. White H. (2014). Asymptotic theory for econometricians. Academic Press.
  40. Yang L. and Tsiatis AA (2001). Efficiency study of estimators for a treatment effect in a pretest–posttest trial. The American Statistician 55(4), 314–321.
  41. Ye T, Shao J, Yi Y, and Zhao Q. (2022). Toward better practice of covariate adjustment in analyzing randomized clinical trials. Journal of the American Statistical Association, 1–13.
  42. Zhang M, Tsiatis AA, and Davidian M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64(3), 707–715.
  43. Zhao A. and Ding P. (2021). No star is good news: A unified look at rerandomization based on p-values from covariate balance tests. arXiv preprint arXiv:2112.10545.
