Incorporating Auxiliary Variables to Improve the Efficiency of Time-Varying Treatment Effect Estimation

Jieru Shi; Zhenke Wu; Walter Dempsey

doi:10.1080/01621459.2025.2516197

Author manuscript; available in PMC: 2026 Jan 16.

Published before final editing as: J Am Stat Assoc. 2025 Aug 5:10.1080/01621459.2025.2516197. doi: 10.1080/01621459.2025.2516197


PMCID: PMC12807526 NIHMSID: NIHMS2107334 PMID: 41551576

Abstract

The use of smart devices (e.g., smartphones, smartwatches) and other wearables to deliver digital interventions to improve health outcomes has grown significantly in the past few years. Mobile health (mHealth) systems are excellent tools for the delivery of adaptive interventions that aim to provide the right type/amount of support, at the right time, by adapting to an individual’s changing context. Micro-randomized trials (MRTs) are an increasingly common experimental design and the main source of data-driven evidence of mHealth intervention effectiveness. To assess time-varying causal effect moderation in an MRT, individuals are intensively randomized to receive treatment over time. In addition, measurements of individual characteristics and context are collected throughout the study. The effective utilization of covariate information to improve inferences regarding causal effects has been well established in the context of randomized controlled trials (RCTs), where covariate adjustment is applied to leverage baseline data to address chance imbalances and improve the asymptotic efficiency of causal effect estimation. However, the application of this approach to longitudinal data, such as MRTs, has not been thoroughly explored. Recognizing the connection to Neyman orthogonality, we propose a straightforward and intuitive method to improve the efficiency of moderated causal excursion effect estimation by incorporating auxiliary variables. We compare the robust standard errors of our method with those of the benchmark method. The efficiency gain of our approach is demonstrated through simulation studies and an analysis of data from the Intern Health Study (NeCamp et al., 2020).

Keywords: Causal Inference, Asymptotic Efficiency, Micro-randomized Trials, Mobile Health, Moderation Effect, Covariate Adjustment

1. Introduction

The use and development of mobile interventions are experiencing rapid growth. In “just-in-time” mobile interventions (Nahum-Shani et al., 2018), treatments delivered via a mobile device are intended to help an individual make healthy decisions “at the moment” and thus have a proximal, near-term impact on health outcomes. Micro-randomized trials (MRTs; Klasnja et al. (2015); Dempsey et al. (2015)) provide data to guide the development of such mobile interventions (Free et al., 2013), with each participant in an MRT sequentially randomized to treatment numerous times, at possibly hundreds to thousands of occasions. The weighted and centered least squares (WCLS) method (Boruvka et al., 2018) is regarded as the benchmark method for estimating moderated causal excursion effects, yielding a consistent and asymptotically normal estimator. Estimates of these marginal comparisons are crucial to domain scientists in deciding whether to include the treatment in an mHealth intervention package.

In an MRT, measurements of individual characteristics, context, and response to treatments are collected passively through sensors or actively by self-report. We refer to additional variables besides the potential moderator as auxiliary variables. Such variables are analogous to baseline covariates in the context of randomized controlled trials (RCTs). Extensive research has demonstrated that incorporating baseline covariates into the analysis can improve estimation efficiency for the average treatment effect (ATE). Lin (2013) recommended using a fully interacted linear regression model, and it was shown to be asymptotically more efficient over both unadjusted and additive models. In recent years, further generalizations of covariate adjustment have been made to accommodate more complex randomization schemes (Su and Ding, 2021; Zhao and Ding, 2021) and to study arbitrary functions of response, such as linear contrasts, ratios, and odds ratios (Ye et al., 2022).

Despite these appealing properties, there is still a knowledge gap between existing covariate adjustment approaches and applications to causal excursion effect estimation using data arising from an MRT. More specifically, when the treatment, response, and moderators are time-varying, it is unclear how to adjust for auxiliary variables to improve the estimation efficiency of the marginal treatment effect.

This study makes two primary contributions to the evaluation of treatment effects in longitudinal data with time-varying treatments, responses, and moderators. Firstly, we highlight the crucial role of incorporating auxiliary variables in estimating time-specific causal effects and illustrate that doing so correctly can lead to a global efficiency gain. In addition, we introduce a comprehensive theoretical framework that incorporates auxiliary variables for analyzing moderated causal effects smoothed over time. This framework achieves local efficiency gains under mild conditions. These findings are further supported by both simulation studies and real-world data analysis.

1.1. Outline

The rest of the paper proceeds as follows. Section 2 reviews existing analytic techniques for MRTs and covariate adjustment techniques for ATE estimation. In Section 2.4, we summarize our main contribution and explain why incorporating auxiliary variables in WCLS is challenging. In Section 3, we propose a general strategy of using auxiliary variables to improve the efficiency of time-varying treatment effect estimation, and discuss the asymptotic properties of the moderated causal effect estimator. In Section 4, we present the theoretical results on efficiency improvement. Section 5 uses simulation studies to compare various estimators and standard errors under different data-generating processes. Section 6 extends the framework to accommodate time-lagged outcomes and post-treatment auxiliary variables. Section 7 illustrates the efficiency improvement of our proposed method using a recent MRT: the Intern Health Study (NeCamp et al., 2020). The paper concludes with a brief discussion in Section 8. All technical proofs are collected in the Supplementary Materials.

2. Preliminaries

2.1. Notation

For a given individual $j$, let $A_{t, j}$ denote the treatment at the $t$-th treatment occasion and $Y_{t + 1, j}$ be the subsequent proximal response $(t = 1, \dots, T; j = 1, \dots, N)$. For simplicity, the main results in this paper assume a binary treatment $A_{t, j} \in \{0, 1\}$. Individual and contextual information at the $t$-th treatment occasion is represented by $X_{t, j}$, which may contain summaries of previous context, treatment, or response measurements. For example, prior to each treatment occasion, the individual might report their current mood. The vector $X_{t}$ could then contain this measurement or, with previous measurements, variation or change in mood. Over the course of $T$ treatment occasions, the resulting data from an individual ordered in time is $(X_{0}, X_{1}, A_{1}, Y_{2}, \dots, X_{T}, A_{T}, Y_{T + 1})$. The overbar is used to denote a sequence of random variables or realized values up to a specific treatment occasion; for example, ${\bar{A}}_{t} = (A_{1}, \dots, A_{t})$ denotes the sequence of treatments up to and including decision time $t$. Information accrued up to treatment occasion $t$ is represented by the history $H_{t} = (X_{0}, X_{1}, A_{1}, Y_{2}, \dots, X_{t})$. In the following, random variables or vectors are denoted with uppercase letters; lowercase letters denote their realized values.

We assume that the longitudinal data are independent and identically distributed across $N$ individuals. Note that this assumption would be violated, if, for example, some of the treatments are used to enhance social support between individuals in the study (Liao et al., 2016). The following section defines the “causal excursion effect” estimand. Then we express the causal estimand in terms of the observed data and provide causal assumptions sufficient for these expressions.

2.2. Existing Inferential Methods Review

Many treatments are designed to influence an individual in the short term or proximally in time (Heron and Smyth, 2010). To answer questions related to the causal effect of time-varying treatments on the proximal response, we focus on the class of estimands referred to as “causal excursion effects”, which are a function of the decision point $t$ and a set of moderators $S_{t}$ , marginalizing over all other observed and unobserved variables (Boruvka et al., 2018; Qian et al., 2021). We provide formal definitions below using potential outcomes (Rubin, 1978; Robins, 1986).

Let $Y_{t + 1} ({\bar{a}}_{t - 1})$ denote the potential outcome for the proximal response under treatment sequence ${\bar{a}}_{t - 1}$ . Let $S_{t} ({\bar{a}}_{t - 1})$ denote the potential outcome for a time-varying effect moderator which is a deterministic function of the potential history up to time $t$ , $H_{t} ({\bar{a}}_{t - 1})$ . The causal excursion effect is

β_{p} (t; s) = E_{p} [Y_{t + 1} ({\bar{A}}_{t - 1}, A_{t} = 1) - Y_{t + 1} ({\bar{A}}_{t - 1}, A_{t} = 0) ∣ S_{t} ({\bar{A}}_{t - 1}) = s] .

(1)

Equation (1) is defined with respect to a reference distribution $p$, i.e., the distribution of treatments ${\bar{A}}_{t - 1} := \{A_{1}, A_{2}, \dots, A_{t - 1}\}$. We follow common practice in observational mobile health studies where analyses such as GEEs (Liang and Zeger, 1986) are conducted marginally over $p$. To express the proximal response in terms of the observed data, we assume positivity, consistency, and sequential ignorability (Robins, 1994, 1997):

Assumption 2.1.

Consistency: For each $t \leq T$ and $j$ , $\{Y_{t + 1, j} ({\bar{A}}_{t}), X_{t, j} ({\bar{A}}_{t - 1}), A_{t, j} ({\bar{A}}_{t - 1})\} = \{Y_{t + 1, j}, X_{t, j}, A_{t, j}\}$ , i.e., observed values equal the corresponding potential outcomes;
Positivity: if the joint density $\{A_{t} = a_{t}, H_{t} = h_{t}\}$ is greater than zero, then $P (A_{t} = a_{t} ∣ H_{t} = h_{t}) > 0$ ;
Sequential ignorability: For each $t \leq T$, the potential outcomes $\{Y_{t + 1} ({\bar{a}}_{t}), X_{t + 1} ({\bar{a}}_{t}), A_{t + 1} ({\bar{a}}_{t}), \dots, Y_{T + 1} ({\bar{a}}_{T})\}$ are independent of $A_{t}$ conditional on the observed history $H_{t}$.

Under Assumption 2.1, (1) can be re-expressed in terms of observable data:

β_{p} (t; s) = E [E_{p} [Y_{t + 1} ∣ A_{t} = 1, H_{t}] - E_{p} [Y_{t + 1} ∣ A_{t} = 0, H_{t}] ∣ S_{t} = s] .

(2)

Assuming $β_{p} (t; s) = f_{t} {(s)}^{⊤} β_{0}^{⋆}$ where $f_{t} (s) \in ℝ^{q}$ is a feature vector comprised of a $q$ -dimensional summary of observed information depending only on state $s$ and decision point $t$ , a consistent estimator for $β_{0}^{⋆}$ can be obtained by minimizing the WCLS criterion (Boruvka et al., 2018):

ℙ_{n} [\sum_{t = 1}^{T} W_{t} \times {(Y_{t + 1} - g_{t} {(H_{t})}^{⊤} α - (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) f_{t} {(S_{t})}^{⊤} β)}^{2}],

(3)

where $ℙ_{n}$ is an operator denoting the sample average, $W_{t} = {\tilde{p}}_{t} (A_{t} ∣ S_{t}) / p_{t} (A_{t} ∣ H_{t})$ is a weight where the numerator is an arbitrary function with range (0,1) that only depends on $S_{t}$ , and $g_{t} (H_{t}) \in ℝ^{d}$ are $d$ control variables chosen to help reduce variance and to construct more powerful test statistics. See Boruvka et al. (2018) for more details on the estimand formulation and consistency, asymptotic normality, and robustness properties of the WCLS estimation method.
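Since criterion (3) is a weighted sum of squares that is linear in $(α, β)$, its minimizer can be computed by weighted least squares on stacked person-occasion rows. The following is a minimal sketch on simulated data, not the reference implementation of Boruvka et al. (2018): the function name `wcls_fit`, the data-generating values, and the choice ${\tilde{p}}_{t} = 0.5$ are illustrative assumptions. It returns point estimates only; valid standard errors would require a sandwich variance estimator that accounts for within-individual dependence, which is not shown.

```python
import numpy as np

def wcls_fit(Y, A, g, f, p_tilde, p_true):
    """Minimize the WCLS criterion (3) over (alpha, beta) by weighted least squares.

    Rows are stacked person-occasions. p_tilde is the numerator probability
    p~_t(1 | S_t); p_true is the randomization probability p_t(1 | H_t).
    """
    # W_t = p~_t(A_t | S_t) / p_t(A_t | H_t), evaluated at the observed treatment
    W = np.where(A == 1, p_tilde / p_true, (1 - p_tilde) / (1 - p_true))
    # Design: control variables g_t(H_t), then centered treatment times f_t(S_t)
    X = np.hstack([g, (A - p_tilde)[:, None] * f])
    sw = np.sqrt(W)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)
    d = g.shape[1]
    return coef[:d], coef[d:]

# Simulated data (illustrative only): randomization depends on the context X_t
rng = np.random.default_rng(0)
m = 20000                                    # number of stacked person-occasions
X_ctx = rng.normal(size=m)
p_true = np.where(X_ctx > 0, 0.7, 0.3)       # p_t(1 | H_t)
A = rng.binomial(1, p_true)
Y = 1.0 + 0.5 * X_ctx + A * (0.8 + 0.3 * X_ctx) + rng.normal(size=m)

g = np.column_stack([np.ones(m), X_ctx])     # control variables g_t(H_t)
f = np.ones((m, 1))                          # S_t empty: fully marginal effect
p_tilde = np.full(m, 0.5)                    # numerator of the weight
alpha_hat, beta_hat = wcls_fit(Y, A, g, f, p_tilde, p_true)
print(beta_hat)                              # should be near the simulated marginal effect 0.8
```

In this simulation the conditional effect is $0.8 + 0.3 X_{t}$ with $E [X_{t}] = 0$, so the fully marginal effect is 0.8 even though the randomization probability depends on $X_{t}$; the weights $W_{t}$ are what remove that dependence.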

Remark 2.2.

Correct causal effect specification, i.e., $β_{p} (t; s) = f_{t} {(s)}^{⊤} β^{⋆}$, is not required. Instead, we can follow prior literature (Dempsey et al., 2020; Shi et al., 2022) and interpret the proposed linear form as a working model. Specifically, $\hat{β}$ is a consistent and asymptotically normal estimator for

β^{⋆} = \arg \min_{β} E [\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 ∣ S_{t}) (1 - {\tilde{p}}_{t} (1 ∣ S_{t})) {(β (t; S_{t}) - f_{t} {(S_{t})}^{⊤} β)}^{2}] .

Therefore, the working model can be interpreted as an $L_{2}$ projection of the true causal excursion effect onto the space spanned by a q-dimensional feature vector that only includes $t$ and $s$ , denoted by $f_{t} {(s)}^{⊤} β^{⋆}$ (Dempsey et al., 2020). Interpretation as a projection or as a correctly specified causal effect can be viewed as a bias-variance trade-off. The projection interpretation guarantees well-defined parameter interpretation in practice.

In addition to studying continuous proximal outcomes and defining the causal excursion effect as a linear contrast of the outcomes under different treatment allocations, as in Equation (2), there are also many MRTs concerning longitudinal binary outcomes. Qian et al. (2021) proposed an estimator of the marginal excursion effect (EMEE) by defining a log relative risk model for the causal excursion effect, and we present a detailed review in Appendix K.

2.3. Covariate Adjustment Literature Review

A review of Section 2.1 makes clear that if $T = 1$ , then the fully marginal causal excursion effect in MRTs is equivalent to the ATE in RCTs. The literature related to RCTs has extensively examined the use of covariate adjustment as an approach to enhance precision while making inferences on the ATE.

Consider an RCT with a binary treatment, where the treatment is randomly assigned in the initial stage of the study, and the $N$ subjects are the population of interest. Neyman (1923) showed that the difference-in-means estimator is unbiased for the ATE. Fisher (1935) proposed using the ordinary least squares (OLS) adjusted estimator of the ATE, which is the estimated coefficient on the treatment $A$ in the OLS regression of $Y$ on $\{1, A, X_{0}\}$, hoping to leverage the information in the baseline covariates $X_{0}$ to improve estimation efficiency. Freedman (2008) showed that such adjustment might hurt asymptotic precision.

Lin (2013) argues that in sufficiently large samples, the statistical problems Freedman (2008) raised are either minor or easily fixed. In addition, Lin (2013) shows that OLS adjustment with a full set of treatment × covariate interactions improves or does not hurt asymptotic precision, even when the regression model is incorrect. The robust Eicker–Huber–White (Eicker, 1967; Huber, 1967; White, 2014) variance estimator (Freedman, 2006) is consistent or asymptotically conservative (regardless of whether the interactions are included) in estimating the true asymptotic variance. More results on the asymptotic precision of treatment effect estimators can be found in papers by Yang and Tsiatis (2001), Tsiatis et al. (2008), Zhang et al. (2008), Schochet (2010), and Negi and Wooldridge (2021).
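To make the fully interacted adjustment concrete, the following is a minimal sketch on simulated RCT data. The data-generating values and variable names are illustrative assumptions and are not taken from any of the cited studies. The covariate is centered at its sample mean before forming the interaction, so the coefficient on $A$ estimates the ATE.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
X0 = rng.normal(size=n)                      # baseline covariate
A = rng.binomial(1, 0.3, size=n)             # unequal allocation, p = 0.3
# Simulated outcome with a heterogeneous effect; the ATE is 2.0 since E[X0] = 0
Y = 1.0 + X0 + A * (2.0 + 1.5 * X0) + rng.normal(size=n)

# Fully interacted regression: Y on {1, A, Xc, A * Xc} with Xc mean-centered
Xc = X0 - X0.mean()
design = np.column_stack([np.ones(n), A, Xc, A * Xc])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
ate_interacted = coef[1]                     # coefficient on A

# Unadjusted comparison: difference in means
ate_diff_means = Y[A == 1].mean() - Y[A == 0].mean()
print(ate_interacted, ate_diff_means)
```

Both estimators target the same ATE; the interacted regression uses $X_{0}$ to remove outcome variation explained by the covariate in each arm.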

More complex randomization schemes have led to recent developments in covariate adjustment methods. For example, Zhao and Ding (2021) examined a variety of schemes for rerandomization based on p-values (ReP), and the corresponding ATE estimators from the unadjusted, additive, and fully interacted linear regressions. They found that the estimator from the fully interacted regression is asymptotically the most efficient. Su and Ding (2021) extended the theory to cluster-randomized experiments, where randomization over units within clusters is either unethical or logistically infeasible. Ye et al. (2022) used a model-assisted approach for covariate adjustment and generalized the estimands to include not only linear contrasts but also ratios and odds ratios. Their conclusions are based on an asymptotic theory that provides a clear picture of how covariate-adaptive randomization and regression adjustment alter statistical efficiency.

2.4. Challenges

In light of these existing methods, we aim to develop a method to incorporate covariate adjustment techniques into the analysis of MRTs. The main challenges are twofold. First, in an MRT, the treatment, response, and moderators are all time-varying, so it is unclear whether we can still center the auxiliary variables using the overall average without producing inconsistent estimates. Second, if it is only feasible to estimate the centering parameters individually at each time point, the total number of such parameters can become prohibitive as the number of decision points grows. Therefore, it would be ideal to have a simpler approach that can be uniformly applied to all time points.

Guo and Basse (2021) and Ye et al. (2022) both proposed checklists of ideal properties for covariate adjustment methods. As we consider time-varying treatment effect estimation, we build upon their lists of the ideal properties the auxiliary variable adjustment method should have: (1) valid statistical inference and efficiency gain; (2) robustness to misspecification; (3) strong finite-sample performance; (4) computational simplicity; and (5) wide applicability. In the following sections, we describe a general method called “A2-WCLS” for performing auxiliary variable adjusted WCLS on longitudinal data with time-varying treatments to estimate causal excursion effects that satisfy (1)–(5).

3. Estimand and Inferential Method

In this section, we explore the asymptotic relative efficiency of various candidate methods for adjusting auxiliary variables in the causal excursion effect estimation. Our goal is to specify “how to account for [auxiliary variables] in the analysis in order to improve precision” (ICH, 1998). We begin by making a parametric assumption concerning the causal parameter of interest:

Assumption 3.1.

Assume the causal excursion effect $β_{p} (t; s) = f_{t} {(s)}^{⊤} β_{0}^{⋆}$ , where $f_{t} (s) \in ℝ^{q}$ is a feature vector comprised of a q-dimensional summary of observed information depending only on state s and decision point $t$ .

3.1. Time-Specific Causal Excursion Effect Estimation

To establish a clear connection with previous research on covariate adjustment, we start with the time-specific causal excursion effect estimand, which shares similarities with the ATE estimand in RCTs. Recall that the ATE estimand is typically defined as the linear difference between the expected outcomes under different treatment allocations, i.e., $E [E [Y ∣ X, A = 1] - E [Y ∣ X, A = 0]]$. In the context of micro-randomized trials (MRTs), the time-specific causal excursion effect estimand can be defined as follows:

β (t) = E [E [Y_{t + 1} ∣ H_{t}, A_{t} = 1] - E [Y_{t + 1} ∣ H_{t}, A_{t} = 0]] .

The parametric assumption stated in Assumption 3.1 can be reformulated as $β (t; S_{t}) = β_{0, t}^{⋆}$ , where $S_{t} = \emptyset$ and the model is non-parametric in time $t$ . In this section, the phrase “more efficient” is explicitly referring to an estimator of $β_{0, t}^{⋆}$ that has a smaller asymptotic variance.

3.1.1. The Unadjusted Estimator

Analysis using solely the time-varying outcome and treatment data results in the unadjusted estimator. To clarify, in our particular context, the unadjusted estimator ${\hat{β}}_{0, t}^{U}$ can be obtained by minimizing the following objective function:

ℙ_{n} [\sum_{t = 1}^{T} {(Y_{t + 1} - α_{0, t} - (A_{t} - p_{t}) β_{0, t})}^{2}] .

(4)

As we are interested in the time-specific causal effect, we consider the treatment randomization probability to be constant at each time, denoted as $p_{t} (A_{t} ∣ H_{t}) = p_{t}$. Thus, the corresponding weight is $W_{t} = 1$. The unadjusted estimator ${\hat{β}}_{0, t}^{U}$ is consistent and asymptotically normal. As the function presented in (4) is solely based on the data ${\{Y_{t + 1, j}, A_{t, j}\}}_{t = 1}^{T}$, $j = 1, \dots, N$, and is thus “unadjusted” for auxiliary variables, it will serve as the reference against which we evaluate the relative efficiency of alternative estimators.
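Because criterion (4) has separate parameters $(α_{0, t}, β_{0, t})$ for each $t$, the minimization decouples across decision points, and at each $t$ it is an ordinary least squares regression of $Y_{t + 1}$ on an intercept and $A_{t} - p_{t}$. With a binary regressor and an intercept, the slope equals the difference in sample means between the two arms. The sketch below checks this numerically for a single decision point; the simulated values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_t = 500, 0.6                            # individuals and constant randomization probability
A = rng.binomial(1, p_t, size=n)             # treatment at one decision point t
Y = 2.0 + 1.5 * A + rng.normal(size=n)       # proximal response Y_{t+1}

# Minimize sum_j (Y - alpha_0 - (A - p_t) * beta_0)^2 at this decision point
X = np.column_stack([np.ones(n), A - p_t])
(a0_hat, b0_hat), *_ = np.linalg.lstsq(X, Y, rcond=None)

# Difference in means between treated and untreated individuals
diff_means = Y[A == 1].mean() - Y[A == 0].mean()
print(np.isclose(b0_hat, diff_means))
```

Centering $A_{t}$ at $p_{t}$ changes the intercept but not the slope, which is why the two quantities agree up to floating-point error.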

3.1.2. The WCLS estimator

Recall that to obtain the WCLS estimator, we need to minimize the objective function (3). For convenience, we center the control variables $g_{t} (H_{t})$ at their expectations $E [g_{t} (H_{t})]$ and denote the centered control variables as ${\tilde{g}}_{t} (H_{t})$ . Assuming $β (t; S_{t}) = β_{0, t}^{⋆}$ , we can reformulate the WCLS criterion as follows:

ℙ_{n} [\sum_{t = 1}^{T} {(Y_{t + 1} - α_{0, t} - {\tilde{g}}_{t} {(H_{t})}^{⊤} α_{1, t} - (A_{t} - p_{t}) β_{0, t})}^{2}] .

(5)

As shown by Boruvka et al. (2018), the estimator ${\hat{β}}_{0, t}^{WCLS}$ obtained by minimizing (5) is consistent and asymptotically normal. Here the control variables $g_{t} (H_{t}) \in ℝ^{d}$ are selected to reduce variance and increase the power of test statistics. However, the question remains whether this goal can be consistently achieved. If not, it is important to identify the circumstances under which the incorporation of control variables $g_{t} (H_{t})$ may negatively impact the estimation efficiency of ${\hat{β}}_{0, t}^{WCLS}$ when compared to the unadjusted estimator ${\hat{β}}_{0, t}^{U}$. To address this question, we first define $β_{1, t} = Σ {(g_{t} (H_{t}))}^{- 1} E [Y_{t + 1} (A_{t} - p_{t}) {\tilde{g}}_{t} (H_{t})]$, which represents the least-squares solution obtained from the criterion (7). Here, $Σ (g_{t} (H_{t})) = E [{\tilde{g}}_{t} (H_{t}) {\tilde{g}}_{t} {(H_{t})}^{⊤}] \in ℝ^{d \times d}$ denotes the variance-covariance matrix for the control variables $g_{t} (H_{t})$. With these definitions, we present the following lemma:

Lemma 3.2.

The difference in the asymptotic variance between the WCLS estimator ${\hat{β}}_{0, t}^{WCLS}$ and the unadjusted estimator ${\hat{β}}_{0, t}^{U}$ can be expressed as follows:

- p_{t} (1 - p_{t}) α_{1, t}^{⊤} Σ (g_{t} (H_{t})) (α_{1, t} + 2 (1 - 2 p_{t}) β_{1, t}) .

(6)

If $p_{t} = 1 / 2$, the adjustment is either neutral or favorable, since (6) reduces to $- α_{1, t}^{⊤} Σ (g_{t} (H_{t})) α_{1, t} / 4 \leq 0$. However, if $p_{t} \neq 1 / 2$, then adjustment may lead to an asymptotic variance inflation. Specifically, consider the case where $g_{t} (H_{t})$ is positively correlated with the outcome and strongly moderates the treatment effect (i.e., $α_{1, t} > 0$ and $|β_{1, t}|$ is large). If $β_{1, t} > 0$ and the randomization probability $p_{t} > 1 / 2$, then $(1 - 2 p_{t}) β_{1, t} < 0$, and the sum within the parentheses in (6) is negative once $β_{1, t}$ is large enough. Hence term (6) is positive, indicating an asymptotic variance increase when incorporating additional control variables as described in (5).
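The sign behavior described above can be checked by evaluating expression (6) directly. The sketch below is a plain transcription of (6) as stated in Lemma 3.2; the function name and the numeric inputs are hypothetical choices made for illustration.

```python
import numpy as np

def wcls_minus_unadjusted(p_t, alpha1, beta1, Sigma):
    """Evaluate expression (6): -p(1-p) * alpha1' Sigma (alpha1 + 2(1-2p) beta1)."""
    alpha1 = np.atleast_1d(alpha1).astype(float)
    beta1 = np.atleast_1d(beta1).astype(float)
    Sigma = np.atleast_2d(Sigma).astype(float)
    quad = alpha1 @ Sigma @ (alpha1 + 2 * (1 - 2 * p_t) * beta1)
    return float(-p_t * (1 - p_t) * quad)

# Scalar control variable with unit variance, alpha1 = 1, beta1 = 3 (hypothetical values)
balanced = wcls_minus_unadjusted(0.5, 1.0, 3.0, 1.0)    # -0.25: adjustment helps
unbalanced = wcls_minus_unadjusted(0.8, 1.0, 3.0, 1.0)  # about 0.416: adjustment hurts
print(balanced, unbalanced)
```

With $p_{t} = 0.8$ the term in parentheses is $1 + 2 (1 - 1.6) \cdot 3 = - 2.6$, so (6) is positive and the WCLS adjustment inflates the asymptotic variance in this configuration.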

3.1.3. A More Efficient Estimator

So far, we have demonstrated that when estimating the time-specific causal excursion effect, the use of the WCLS criterion presented in (5) cannot guarantee efficiency improvement as compared to the unadjusted estimator. This conclusion aligns with what was drawn in Freedman (2008), and naturally leads one to question whether adjusted estimators should be recommended.

Here, we consider alternative ways of adjusting for auxiliary variables that provide some guarantees of estimation efficiency improvement. Motivated by Lin (2013), we incorporate similar ideas as in standard covariate adjustment into the WCLS approach. The proposed estimator of the causal excursion effect with auxiliary variable adjustment can then be obtained by minimizing the following criterion:

ℙ_{n} [\sum_{t = 1}^{T} {(Y_{t + 1} - α_{0, t} - {\tilde{g}}_{t} {(H_{t})}^{⊤} α_{1, t} - (A_{t} - p_{t}) (β_{0, t} + {\tilde{g}}_{t} {(H_{t})}^{⊤} β_{1, t}))}^{2}]

(7)

Equation (7) is a strict extension of the proposal in Lin (2013). Lemma 3.3 below establishes an asymptotic efficiency gain of (7) over both the unadjusted and WCLS estimators.

Lemma 3.3.

Suppose Assumptions 2.1 and 3.1 hold, and the randomization probability $p_{t} (A_{t} ∣ H_{t})$ is known. Given invertibility and regularity conditions, let $({\hat{α}}_{t}, {\hat{β}}_{0, t}^{L}, {\hat{β}}_{1, t})$ minimize objective function (7):

${\hat{β}}_{0, t}^{L}$ is consistent and asymptotically normal such that $\sqrt{n} ({\hat{β}}_{0, t}^{L} - β_{0, t}^{⋆}) \to 𝒩 (0, Q_{t}^{- 1} Σ_{t}^{L} Q_{t}^{- 1})$ , where $Q_{t}$ , $Σ_{t}^{L}$ are defined in Appendix B;
${\hat{β}}_{0, t}^{L}$ is at least as efficient as the unadjusted estimator ${\hat{β}}_{0, t}^{U}$ described in (4), and the asymptotic efficiency gain is:
$\frac{{(α_{1, t} + (1 - 2 p_{t}) β_{1, t})}^{⊤} Σ (g_{t} (H_{t})) (α_{1, t} + (1 - 2 p_{t}) β_{1, t})}{p_{t} (1 - p_{t})} + β_{1, t}^{⊤} Σ (g_{t} (H_{t})) β_{1, t};$ (8)
${\hat{β}}_{0, t}^{L}$ is at least as efficient as the WCLS estimator ${\hat{β}}_{0, t}^{WCLS}$ described in (5), and the asymptotic efficiency gain is:
$\frac{1 - 3 p_{t} + 3 p_{t}^{2}}{p_{t} (1 - p_{t})} β_{1, t}^{⊤} Σ (g_{t} (H_{t})) β_{1, t} .$ (9)

The presented lemma demonstrates that incorporating mean-centered auxiliary variables in the estimation of the time-specific marginal treatment effect using (7) guarantees an improvement in efficiency. A detailed proof for Lemma 3.3 is available in Appendix B.
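As a consistency check on the algebra (not a substitute for the proof in Appendix B), the two gains in Lemma 3.3 can be subtracted term by term. Writing $Σ_{g}$ as shorthand for $Σ (g_{t} (H_{t}))$ and using its symmetry:

```latex
\begin{aligned}
(8) - (9)
&= \frac{(\alpha_{1,t} + (1 - 2p_t)\beta_{1,t})^{\top} \Sigma_g (\alpha_{1,t} + (1 - 2p_t)\beta_{1,t})
   + \{p_t(1 - p_t) - (1 - 3p_t + 3p_t^2)\}\, \beta_{1,t}^{\top} \Sigma_g \beta_{1,t}}{p_t(1 - p_t)} \\
&= \frac{\alpha_{1,t}^{\top} \Sigma_g \alpha_{1,t}
   + 2(1 - 2p_t)\, \alpha_{1,t}^{\top} \Sigma_g \beta_{1,t}
   + \{(1 - 2p_t)^2 + p_t(1 - p_t) - (1 - 3p_t + 3p_t^2)\}\, \beta_{1,t}^{\top} \Sigma_g \beta_{1,t}}{p_t(1 - p_t)} \\
&= \frac{\alpha_{1,t}^{\top} \Sigma_g \,(\alpha_{1,t} + 2(1 - 2p_t)\beta_{1,t})}{p_t(1 - p_t)},
\end{aligned}
\qquad \text{since } (1 - 2p_t)^2 + p_t(1 - p_t) = 1 - 3p_t + 3p_t^2 .
```

The remaining quadratic form is the same one that appears in expression (6) with the opposite sign, so the gain over the unadjusted estimator decomposes into the gain over WCLS plus the WCLS-versus-unadjusted comparison. The scalar factor in front may differ from the one displayed in (6), depending on how the asymptotic variances are normalized in Appendix B, which is not reproduced here. Note also that $1 - 3 p_{t} + 3 p_{t}^{2} = 3 {(p_{t} - 1 / 2)}^{2} + 1 / 4 > 0$, so the gain in (9) is nonnegative for every $p_{t} \in (0, 1)$.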

Remark 3.4.

The above discussion aims to establish a connection between the framework of time-varying causal excursion effect estimation in MRT analysis and the original RCT analysis. Therefore, we focus on comparing the efficiency of time-specific marginal effect estimators. However, it is important to note that this comparison is applicable to time-specific moderation analysis as well, where $β (t; S_{t}) = f_{t} {(S_{t})}^{⊤} β_{0, t}^{⋆}$ .

3.2. Moderation effect estimation

There are several limitations with (7). First, most MRTs have a small sample size $n$ and a relatively large number of time points $T$ . Therefore, estimating centering parameters for each time point separately leads to a large number of additional parameters. Second, centering the auxiliary variable at its mean is not a one-size-fits-all remedy. When we are interested in moderated treatment effects (i.e., $S_{t} \neq \emptyset$ ), centering by time-specific means can introduce bias. Finally, so far we have taken it for granted that we can center the auxiliary variable directly using its true mean during the model fitting process. However, in practice, this mean quantity needs to be estimated from the observed data, introducing additional variation that could potentially impact the precision of our estimates. In other words, centering at each time point may not necessarily lead to improved precision, particularly when the number of time points $T$ is large.

Here we consider auxiliary variable adjusted estimators of the causal excursion effect moderation under an arbitrary choice of potential moderator $S_{t}$ and a smoothed causal excursion effect model $f_{t} {(S_{t})}^{⊤} β$ . Let ${\hat{β}}_{0}^{WCLS}$ be the solution that minimizes the WCLS criterion defined in (3). As proven in Boruvka et al. (2018), the estimator ${\hat{β}}_{0}^{WCLS}$ is consistent and asymptotically normal. Let $Z_{t} \in ℝ^{p}$ be a $p$ -dimensional subset of $g_{t} (H_{t})$ that one expects to moderate the causal effect, and $Z_{t} \cap f_{t} (S_{t}) = \emptyset$ . Our goal is to improve the estimation efficiency of the WCLS estimator by leveraging the auxiliary information from ${\{Z_{t}\}}_{t = 1}^{T}$ . In the context of moderation effect analysis, the term “more efficient” refers to an estimator that achieves a lower asymptotic variance than the other for all linear combinations. Specifically, the asymptotic variance of $c^{⊤} {\hat{β}}_{0}$ is smaller for any $c \in ℝ^{q}$ . This is equivalent to the negative semidefiniteness of the difference between the asymptotic variance matrices.

3.2.1. Auxiliary Variable Adjusted Estimation

We propose an auxiliary variable adjusted estimation method using a more general centering function $μ_{t} (S_{t}) \in ℝ^{p}$. Under Assumption 3.1, the Auxiliary-variable Adjusted Weighted and Centered Least Squares (A2-WCLS) estimator is obtained by minimizing the following objective function:

ℙ_{n} [\sum_{t = 1}^{T} W_{t} \times {(Y_{t + 1} - g_{t} {(H_{t})}^{⊤} α - (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) (f_{t} {(S_{t})}^{⊤} β_{0} + {(Z_{t} - μ_{t} (S_{t}))}^{⊤} β_{1}))}^{2}]

(10)

An appropriate choice of the centering function $μ_{t} (S_{t})$ is the key to obtaining a consistent estimate of the moderated causal effect. It has been well established that the WCLS estimation method consistently estimates the moderated treatment effect (Boruvka et al., 2018). Therefore, to ensure the consistency of the A2-WCLS estimator, it is sufficient to require that the minimizer ${\hat{β}}_{0}$ of criterion (10) coincides with the minimizer of (3). In other words, after substituting ${\hat{β}}_{0}$ into the derivatives of (10) and (3) with respect to $β_{0}$ , both should be equal to 0. This enables us to formulate the constraint for $μ_{t} (S_{t})$ to ensure the consistent estimation of $β_{0}^{⋆}$ :

Condition 3.5 (The orthogonality condition).

For each auxiliary variable $Z_{t}^{i} - μ_{t}^{i} (S_{t}) \in ℝ$, where $i \in \{1, \dots, p\}$, to be included in the model, the centering function must satisfy the following orthogonality condition with respect to $f_{t} (S_{t}) \in ℝ^{q}$:

E [\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 ∣ S_{t}) (1 - {\tilde{p}}_{t} (1 ∣ S_{t})) (Z_{t}^{i} - μ_{t}^{i} (S_{t})) f_{t} (S_{t})] = 0_{q \times 1} .

(11)

There are two novel aspects to this orthogonality condition. First, while $E [Z_{t}^{i} ∣ S_{t}]$ satisfies Condition 3.5, the orthogonality condition (11) indicates that it is not the only viable option for incorporating auxiliary variables. In fact, the condition implies that a consistent estimate of the moderated causal effect can be obtained without requiring $μ_{t}^{i} (S_{t})$ to converge to $E [Z_{t}^{i} ∣ S_{t}]$. This allows for considerable flexibility in the choice of $μ_{t}^{i} (S_{t})$. As an example, consider the scenario where $f_{t} (S_{t}) = 1$, i.e., $S_{t} = \emptyset$. In this case, the goal is to estimate the fully marginal effect $β_{0}^{⋆}$. For centering the auxiliary variable $Z_{t}^{i}$, one option is to use ${\bar{Z}}_{t}^{i}$, which involves $T$ estimates. An alternative approach is to leverage Equation (11), which permits the use of a single centering parameter. Specifically, we can use $μ^{i} = \frac{\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 ∣ S_{t}) (1 - {\tilde{p}}_{t} (1 ∣ S_{t})) Z_{t}^{i}}{\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 ∣ S_{t}) (1 - {\tilde{p}}_{t} (1 ∣ S_{t}))}$, a weighted average of the ${\{Z_{t}^{i}\}}_{t = 1}^{T}$. This example also shows that centering $Z_{t}^{i}$ at its unweighted global average ${\bar{Z}}^{i}$ can result in inconsistent estimates of the fully marginal effect $β_{0}^{⋆}$. Second, Equation (11) imposes a constraint with the same dimension as $f_{t} (S_{t}) \in ℝ^{q}$ on each of the $p$ centering functions $μ_{t}^{i} (S_{t})$. As a result, each centering function can be parameterized with as few as $q$ parameters, so the number of centering parameters does not become prohibitive as $T$ increases.
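The fully marginal example above can be illustrated numerically. The sketch below computes a sample analogue of the weighted-average centering parameter and evaluates the empirical version of condition (11) with $f_{t} (S_{t}) = 1$. The per-occasion numerator probabilities and the drift in $Z_{t}$ are hypothetical values chosen so that the weighted and unweighted averages differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 400, 6
# Hypothetical numerator probabilities p~_t(1) that change over the study
p_tilde = np.array([0.5, 0.5, 0.5, 0.9, 0.9, 0.9])
# Auxiliary variable whose mean drifts upward over time (rows: individuals, columns: occasions)
Z = np.arange(T, dtype=float) + rng.normal(size=(n, T))

w = p_tilde * (1 - p_tilde)                  # weights p~_t (1 - p~_t)
mu_weighted = (Z * w).sum() / (n * w.sum())  # single weighted-average centering parameter
mu_global = Z.mean()                         # unweighted global average

def orth_residual(mu):
    """Empirical version of (11) with f_t = 1: P_n[sum_t w_t (Z_t - mu)]."""
    return ((Z - mu) * w).sum() / n

print(orth_residual(mu_weighted))            # zero up to floating-point error
print(orth_residual(mu_global))              # clearly nonzero in this setup
```

When ${\tilde{p}}_{t}$ is constant in $t$, the two centering choices coincide; they differ here only because the weights ${\tilde{p}}_{t} (1 - {\tilde{p}}_{t})$ change across decision points while the mean of $Z_{t}$ drifts.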

Based on the preceding discussion, a natural choice for a $q$-dimensional working model for each of the centering functions is $μ_{t}^{i} (S_{t}; θ^{i}) = f_{t} {(S_{t})}^{⊤} θ^{i}$, where $i \in \{1, \dots, p\}$. This choice allows us to exploit the low-dimensional nature of the causal effect model. Denoting $Θ = (θ^{1}, \dots, θ^{p}) \in ℝ^{q \times p}$, we can establish a straightforward and easy-to-implement A2-WCLS criterion as follows:

ℙ_{n} [\sum_{t = 1}^{T} W_{t} \times {(Y_{t + 1} - g_{t} {(H_{t})}^{⊤} α - (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) (f_{t} {(S_{t})}^{⊤} β_{0} + {(Z_{t} - Θ^{⊤} f_{t} (S_{t}))}^{⊤} β_{1}))}^{2}]

(12)

The following lemma describes the asymptotic properties of the proposed A2-WCLS estimator, enabling a comparison of its estimation precision with the existing WCLS estimator.

Lemma 3.6.

Suppose Assumptions 2.1 and 3.1 hold, centering function $μ_{t} (S_{t})$ satisfies the orthogonality condition 3.5, and that the randomization probability $p_{t} (A_{t} ∣ H_{t})$ is known. Let $(\hat{α}, {\hat{β}}_{0}^{A 2}, {\hat{β}}_{1})$ minimize objective function (10). Under regularity conditions, ${\hat{β}}_{0}^{A 2}$ is consistent and asymptotically normal such that $\sqrt{n} ({\hat{β}}_{0}^{A 2} - β_{0}^{⋆}) \to 𝒩 (0, Q^{- 1} Σ^{A 2} Q^{- 1})$ , where $Q$ , $Σ^{A 2}$ are defined in Appendix D.

Note that the consistency property of ${\hat{β}}_{0}^{A 2}$ does not depend on the specific choice of $μ_{t} (S_{t})$ , as long as $μ_{t} (S_{t})$ satisfies Condition 3.5. In addition, the A2-WCLS criterion inherits the robustness of the WCLS method, which guarantees that ${\hat{β}}_{0}^{A 2}$ remains consistent even if the nuisance function $g_{t} {(H_{t})}^{⊤} α$ is not correctly specified. Hypothesis tests can be performed using Wald test statistics and normal-based confidence intervals.

Remark 3.7 (Connection to the Neyman Orthogonality).

By treating $(α, β_{1})$ as nuisance parameters, we can establish a connection between the Orthogonality Condition outlined in 3.5 and the Neyman orthogonality of the score equations (Neyman, 1979; Chernozhukov et al., 2018). Intuitively, Neyman orthogonality implies that the moment conditions used to identify $β_{0}^{⋆}$ are insensitive to the nuisance parameter $β_{1}$ , allowing one to plug in noisy estimates of these parameters without strongly violating the moment condition. Likewise, our orthogonality condition here also aims to ensure that the consistent estimation of $β_{0}^{⋆}$ will not be affected by the estimation of $β_{1}$ . It can be derived that Neyman orthogonality leads to the same constraint as Condition 3.5. The proof is presented in Appendix C.1.

3.2.2. The Coefficients of the Adjusted Auxiliary Variables

Standard statistical arguments can be used to show that the ${\hat{β}}_{1}$ obtained from minimizing A2-WCLS in criterion (10) converges to $β_{1}^{⋆} (μ_{t})$ given by:

E {[\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 - {\tilde{p}}_{t}) Σ_{μ} (Z_{t})]}^{- 1} E [\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 - {\tilde{p}}_{t}) (β (t; S_{t}, Z_{t}) - β (t; S_{t})) (Z_{t} - μ_{t} (S_{t}))],

(13)

where $Σ_{μ} (Z_{t}) = E [(Z_{t} - μ_{t} (S_{t})) {(Z_{t} - μ_{t} (S_{t}))}^{⊤}]$ . Referring to the causal estimand as presented in (2), the quantity $β (t; S_{t}, Z_{t}) = E [E [Y_{t + 1} ∣ H_{t}, A_{t} = 1] - E [Y_{t + 1} ∣ H_{t}, A_{t} = 0] ∣ S_{t}, Z_{t}]$ represents the moderated causal excursion effect at time $t$ , given the effect moderators $\{S_{t}, Z_{t}\}$ . By estimating $β_{1}$ , we can gain insights into the effects of including a non-moderator auxiliary variable, which indicates a wrong model specification of the causal effect. Additionally, in this scenario, it is worth investigating whether the choice of $μ_{t}$ affects the estimation or not.

Let $β_{1}^{⋆} (E [Z_{t} ∣ S_{t}])$ be denoted as $β_{1}^{⋆}$ . The following two properties provide answers to the previously mentioned concerns and detailed proof can be found in Appendix E.

Lemma 3.8 (Properties).

The inclusion of misspecified auxiliary variables through the A2-WCLS criterion does not adversely affect estimation efficiency.

(a) If $β (t; S_{t}, Z_{t}) - β (t; S_{t}) = 0$ , then $β_{1}^{⋆} = 0$ .

This property states that the inclusion of $Z_{t}$ as an auxiliary variable has no asymptotic impact on inference if it is not a moderator.

(b) For any function $μ_{t}$ satisfying Condition 3.5, $|β_{1}^{⋆} (μ_{t})| \leq |β_{1}^{⋆}|$ .

This property confirms parameter attenuation under misspecification of the centering function, and can be used to evaluate its impact on estimation of the moderation effect.

4. Theoretical Results

In this section, we provide theoretical results on the impact of incorporating auxiliary variables on the estimation efficiency of the moderated treatment effect. Additionally, we delve into the detailed examination of the desirable properties of our proposed A2-WCLS method.

4.1. Efficiency Improvement

In contrast to the scenario discussed in Section 3.1.3, where the estimation of time-specific causal effects guarantees a global improvement in precision, modeling the causal excursion effects smoothed over time requires additional assumptions to ensure efficiency gains.

Condition 4.1.

Given the fully observed history $H_{t}$ , define the prediction error as $ϵ_{t}^{A 2} = Y_{t + 1} - g_{t} {(H_{t})}^{⊤} α - (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) (f_{t} {(S_{t})}^{⊤} β_{0} + {(Z_{t} - μ_{t} (S_{t}))}^{⊤} β_{1})$ . We present the following sufficient condition:

(a) Correct causal model specification: $β (t; H_{t}) = f_{t} {(S_{t})}^{⊤} β_{0} + {(Z_{t} - μ_{t} (S_{t}))}^{⊤} β_{1}$ ; and
(b) Conditionally uncorrelated error and future states: $E [ϵ_{t}^{A 2} f_{t^{'}} (S_{t^{'}}) (Z_{t^{'}} - μ_{t^{'}} (S_{t^{'}})) ∣ H_{t}, A_{t}] = 0$ , $t < t^{'}$ .

Condition 4.1(a) implies a correctly specified causal effect model and includes all relevant observed variables that could potentially moderate the proximal outcome. It should be noted that this assumption is less stringent than assuming the entire model to be correctly specified, i.e., $E [ϵ_{t}^{A 2} ∣ H_{t}, A_{t}] = 0$ . The latter assumption implies correct modeling of the data-generating process, which is often considerably more complex. Condition 4.1(b) implies that the residuals do not convey any additional information beyond that which is already contained in the observed history. This assumption is fundamental in many statistical and machine learning models, particularly in Markov Decision Processes (MDPs), where the current state determines future states, independent of any unobserved information. Based on the assumptions outlined above, we can state the following theorem:

Theorem 4.2.

Suppose that the centering functions $μ_{t} (S_{t})$ for the auxiliary variables $Z_{t}$ are given prior to the model fitting. Based on Lemma 3.6 and sufficient condition 4.1, incorporating auxiliary variables using the A2-WCLS method guarantees either an improvement or no harm to the asymptotic precision of the moderated treatment effect estimation. Specifically, the asymptotic efficiency gain compared to the WCLS estimator is given by $Q^{- 1} Σ Q^{- 1}$ , where

Q = E [\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 - {\tilde{p}}_{t}) f_{t} (S_{t}) f_{t} {(S_{t})}^{⊤}], Σ = E [(\sum_{t = 1}^{T} W_{t} {(A_{t} - {\tilde{p}}_{t})}^{2} f_{t} (S_{t}) {(Z_{t} - μ_{t} (S_{t}))}^{⊤}) β_{1} β_{1}^{⊤} {(\sum_{t = 1}^{T} W_{t} {(A_{t} - {\tilde{p}}_{t})}^{2} f_{t} (S_{t}) {(Z_{t} - μ_{t} (S_{t}))}^{⊤})}^{⊤}] .

(14)

For ease of understanding, we can simplify (14) by setting $f_{t} (S_{t}) = 1$ (i.e., $S_{t} = \emptyset$ ). In this case, the efficiency improvement is proportional to the variance of a weighted sum of $Z_{t}$ over time, multiplied by the square of $β_{1}$ . Therefore, if we include auxiliary variables that are strong effect moderators with large variances, we can substantially improve the precision of our estimates. Besides, by setting $t$ to a particular time point and $W_{t} = 1$ , we can establish a link between expression (14) and the time-specific efficiency gain demonstrated in Lemma 3.3, which in turn connects the two causal estimands.
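The simplified form of (14) can be evaluated with a plug-in estimate. The sketch below assumes $f_{t} (S_{t}) = 1$ , a scalar $Z_{t}$ , and the weight $W_{t} = {\tilde{p}}_{t} (A_{t} ∣ S_{t}) / p_{t} (A_{t} ∣ H_{t})$ . The function name `efficiency_gain` and the sample-average replacement of the expectations are choices made for this illustration.

```python
import numpy as np

def efficiency_gain(A, Z, p, p_tilde, mu, beta1):
    """Plug-in estimate of Q^{-1} Sigma Q^{-1} in (14) for f_t = 1 and scalar Z_t.

    A, Z, p, p_tilde have shape (n, T); mu is a scalar or an (n, T) array.
    """
    A, Z, p, p_tilde = (np.asarray(x, dtype=float) for x in (A, Z, p, p_tilde))
    W = np.where(A == 1, p_tilde / p, (1 - p_tilde) / (1 - p))
    # Q = E[ sum_t p~_t (1 - p~_t) ]
    Q = (p_tilde * (1 - p_tilde)).sum(axis=1).mean()
    # One weighted sum of the centered auxiliary variable per individual.
    s = (W * (A - p_tilde) ** 2 * (Z - mu)).sum(axis=1)
    Sigma = beta1 ** 2 * np.mean(s ** 2)
    return Sigma / Q ** 2
```

With $T = 1$ , ${\tilde{p}}_{t} = p_{t} = 0.5$ and $Z$ centered at zero, the expression reduces to $β_{1}^{2}$ times the second moment of $Z$ , which matches the description of the gain as a squared coefficient times a variance.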

Furthermore, it is worth noting that Lemma 3.3 on estimating time-specific causal effects, as well as the theorem presented in Lin (2013), both demonstrate a global efficiency improvement by adjusting for auxiliary variables. In contrast, Theorem 4.2 suggests a local efficiency gain under the assumption of correct model specification. The gap can be attributed to treatments, moderators, and outcomes in MRTs all being time-varying and the estimation focuses on a moderated effect smoothed over time. In contrast, the estimation in both Lemma 3.3 and Lin (2013) is of a marginal causal effect. Our proposed approach to adjust for auxiliary variables provides insight into how to effectively use observed auxiliary information to enhance the estimation of the parameter of interest. Proof of Theorem 4.2 can be found in Appendix F.

Remark 4.3.

One should consider that Condition 4.1 is just one of several potential sufficient conditions that possess a clear interpretation and guarantee an improvement in the efficiency of general MRT analysis. In practical situations, it is rare that the treatment randomization policy depends on every element of the observed history $H_{t}$ . Instead, it often depends only on a few contextual variables. As a result, Condition 4.1 can be relaxed to assume correct causal model specification and uncorrelated error states, conditioned on only a subset of $H_{t}$ . This weaker assumption is still sufficient for efficiency improvement in moderated causal effect estimation. In addition, there are specific scenarios where it is feasible to completely remove this sufficient condition. For instance, when estimating time-specific causal effects, Lemma 3.3 does not require any assumptions regarding the model specification or error-states correlation, while still attaining an efficiency improvement.

4.2. Properties

In this section, we re-evaluate the checklist discussed in Section 2.4 and provide a detailed explanation to demonstrate that the proposed A2-WCLS estimation method satisfies all the desirable properties.

Valid statistical inference.

With reference to Lemma 3.6 and Theorem 4.2, our proposed method generates a consistent and asymptotically normal estimator. Moreover, under Condition 4.1, our approach provides a rigorous mathematical guarantee of achieving an efficiency gain relative to the currently available WCLS inference method.

Robustness to misspecification.

(1) Lemma 3.8 indicates that the inclusion of irrelevant auxiliary variables does not compromise the precision of the estimation of the parameter of interest. (2) The A2-WCLS method shares the same property as WCLS in that, if the true randomization probability $p_{t} (A_{t} ∣ H_{t})$ is known, a consistent estimator can be obtained even when the nuisance function $g_{t} {(H_{t})}^{⊤} α$ is misspecified (Boruvka et al., 2018). Robust standard errors are used to guard against model misspecification and heteroskedasticity.

Strong finite-sample performance.

Our conclusions are derived based on asymptotics, where the sample size $n$ approaches infinity while the total number of time points $T$ is kept fixed. However, our proposed method can still provide reliable performance in a finite sample with a small sample correction (Mancl and DeRouen, 2001) applied to the variance estimator, which is elaborated in Appendix H.

Wide applicability.

The proposed method is a comprehensive extension of the existing literature on covariate adjustment that can be applied to study designs with $T \geq 1$ , as discussed in Lemma 3.3 and Theorem 4.2. When $T = 1$ , the adjustment will be based on the baseline variables, and our method is equivalent to the covariate adjustment techniques described in Lin (2013) and Ye et al. (2022).

Computational simplicity.

The utilization of linear working models $Θ^{⊤} f_{t} (S_{t})$ for the centering function simplifies the computation and can be easily implemented using existing functions that are widely available in standard statistical software packages.

4.3. Estimation of the Centering Function

4.3.1. Finite Population vs. Super-Population Based Inference

The previous discussion assumes knowledge of the true centering functions before model fitting, which is not feasible in practice. Therefore, the centering functions $μ_{t} (S_{t})$ need to be estimated from observed data. In an MRT, treatments are randomized sequentially throughout the study, and it is key to recognize that, like $S_{t}$ , the auxiliary variables $Z_{t}$ may also depend on prior actions (i.e., $Z_{t} = Z_{t} ({\bar{A}}_{t})$ ). Consequently, the sources of randomness are not limited to treatment assignments, and estimating the centering function $μ_{t} (S_{t})$ will introduce non-negligible variation to the estimation. Therefore, when performing inference on the causal effect, we must integrate over all sources of uncertainty, specifically, the variance arising from estimating the centering function $μ_{t} (S_{t})$ .

The debate over whether to use finite population inference or hypothetical infinite population inference when estimating ATE in RCTs has been extensively discussed (Reichardt and Gollob, 1999). Some covariate adjustment methods, such as those proposed by Lin (2013) and Su and Ding (2021), use a finite population framework and are shown to perform well asymptotically. In these methods, baseline variables collected before treatment randomization are used as covariates to improve estimation precision, and the only randomness in the data comes from the random treatment assignment. This approach enables the centering parameter $\bar{Z}$ to be directly plugged into the model, without introducing additional variation to the estimation process. In contrast, Negi and Wooldridge (2021) and Ye et al. (2022) investigated the asymptotic precision of ATE under the assumption that the subjects were a random sample from an infinite population. The authors suggest “we cannot assume the data has been centered in advance and $E [Z] = 0$ without loss of generality”, therefore the extra variance introduced by estimating $\bar{Z}$ is taken into account.

MRT analysis slightly differs from the discussion above because the auxiliary variables $Z_{t}$ are collected repeatedly between treatment randomizations. If $Z_{t}$ depends on previous treatments, even in a finite population, estimating $μ_{t} (S_{t})$ will bring in non-negligible variances which affect the asymptotic variance of the parameters of interest. Therefore, identifying the study population no longer suffices to determine whether extra uncertainty exists. Instead, we should identify whether $Z_{t, j} ({\bar{A}}_{t - 1, j}) = z_{t, j}$ remains unchanged regardless of how the previous treatment realizations are altered. If so, we can directly center the auxiliary variables $Z_{t}$ with $μ_{t} (S_{t})$ without adding any additional variation to the estimation process. Otherwise, it is crucial to take into account the extra uncertainty introduced by estimating $μ_{t} (S_{t})$ to avoid a conservative asymptotic variance estimation.

4.3.2. Estimation Methods

To address the uncertainty in estimating the centering function parameters, we employ a “stacking estimating equations” approach, which is a type of M-estimator (Carroll et al., 2006). The A2-WCLS criterion introduced in (12) involves two sets of unknown parameters: $α$ , $β_{0}$ , $β_{1}$ and $θ$ , and the estimating equation of $α$ , $β_{0}$ , $β_{1}$ depends on $θ$ . Thus, we can jointly estimate these parameters by “stacking” their estimating equations and solving for all parameters simultaneously. More details are presented in Appendix G.
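The stacking idea can be sketched for the fully marginal case ( $f_{t} (S_{t}) = 1$ , scalar $Z_{t}$ , $g_{t} (H_{t}) = (1, Z_{t})$ ). The code below is an illustration under those assumptions rather than the procedure of Appendix G: the function names, the parameter ordering, the numerical Jacobian, and the omission of any small-sample correction are all choices made here.

```python
import numpy as np

def stacked_scores(params, Y, A, Z, p, p_tilde):
    """Per-individual stacked estimating functions, shape (n, 5).

    params = (theta, alpha0, alpha1, beta0, beta1); data arrays have shape (n, T).
    """
    theta, gamma = params[0], np.asarray(params[1:])
    W = np.where(A == 1, p_tilde / p, (1 - p_tilde) / (1 - p))
    Ac = A - p_tilde
    X = np.stack([np.ones_like(Z), Z, Ac, Ac * (Z - theta)], axis=-1)  # (n, T, 4)
    resid = Y - X @ gamma
    # Orthogonality condition for the centering parameter.
    psi_theta = (p_tilde * (1 - p_tilde) * (Z - theta)).sum(axis=1)
    # Weighted least-squares score for (alpha, beta0, beta1), which depends on theta.
    psi_gamma = ((W * resid)[..., None] * X).sum(axis=1)
    return np.column_stack([psi_theta, psi_gamma])

def sandwich_variance(params, *data, eps=1e-5):
    """Sandwich variance B^{-1} M B^{-T} / n with a central-difference Jacobian B."""
    params = np.asarray(params, dtype=float)
    psi = stacked_scores(params, *data)
    n, k = psi.shape
    B = np.empty((k, k))
    for j in range(k):
        step = np.zeros(k)
        step[j] = eps
        upper = stacked_scores(params + step, *data).mean(axis=0)
        lower = stacked_scores(params - step, *data).mean(axis=0)
        B[:, j] = (upper - lower) / (2 * eps)
    M = psi.T @ psi / n
    B_inv = np.linalg.inv(B)
    return B_inv @ M @ B_inv.T / n
```

Because the Jacobian includes the derivative of the regression score with respect to $θ$ , the resulting variance for ${\hat{β}}_{0}$ reflects the uncertainty from estimating the centering parameter.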

5. Simulation

In this section, simulation experiments were conducted under different treatment randomization regimes and centering functions. Based on our empirical findings, we have found that: (1) the A2-WCLS method we propose still effectively improves the efficiency of marginal causal effect estimation, even when considering the additional variance introduced by estimating $θ$ ; (2) incorporating more features in the centering model leads to minimal or no gain compared to using only $μ_{t} (S_{t}) = f_{t} {(S_{t})}^{⊤} θ$ .

5.1. Simulation Setup

Here, we evaluate the proposed method via simulation experiments with a continuous outcome $Y_{t + 1}$ . Equation (15) below is based on the base data generation model from Boruvka et al. (2018), which we modified slightly to illustrate the proposed method and compare it with the WCLS method. Consider an MRT with known randomization probability and the observation vector being a single state variable $Z_{t} \in \{- 1, 1\}$ at each decision time $t$ , and the state dynamics are given by $ℙ (Z_{t} = 1 ∣ A_{t - 1}, H_{t - 1}) = expit (0.05 t + ξ A_{t - 1})$ with $ξ = 0.1$ . Let

Y_{t + 1, j} = (β_{10} + β_{11} (Z_{t, j} - E [Z_{t, j}])) \times (A_{t, j} - p_{t} (1 ∣ H_{t, j})) + 0.8 Z_{t, j} + ϵ_{t, j}

(15)

The randomization probability is set to $p_{t} (1 ∣ H_{t}) = expit (η_{1} A_{t - 1} + η_{2} Z_{t})$ where $(η_{1}, η_{2}) = (- 0.8, 0.8)$ and $expit (x) = {(1 + \exp (- x))}^{- 1}$ . The independent error term satisfies $ϵ_{t} \sim 𝒩 (0, 1)$ with $Corr (ϵ_{u}, ϵ_{t}) = {0.5}^{| u - t | / 2}$ . We set $β_{10} = - 0.2$ , and $β_{11} \in \{0.2, 0.5, 0.8\}$ to represent a small, medium, and large moderation effect.
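The data-generating process in (15) can be sketched as follows. Several details are assumptions of this illustration rather than facts stated in the section: the number of decision points `T` is left as an argument, the initial action is taken to be $A_{0} = 0$ , and $E [Z_{t, j}]$ is computed exactly by a recursion over the distribution of $A_{t - 1}$ . Function and argument names are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def expected_Z(T, xi=0.1, eta=(-0.8, 0.8)):
    """Exact E[Z_t] for t = 1..T via a two-state recursion on A_{t-1} (A_0 = 0 assumed)."""
    pa1 = 0.0  # P(A_{t-1} = 1), starting from the assumed A_0 = 0
    out = np.empty(T)
    for t in range(1, T + 1):
        ez, next_pa1 = 0.0, 0.0
        for a, pa in ((0, 1.0 - pa1), (1, pa1)):
            pz1 = expit(0.05 * t + xi * a)  # P(Z_t = 1 | A_{t-1} = a)
            ez += pa * (2.0 * pz1 - 1.0)
            next_pa1 += pa * (pz1 * expit(eta[0] * a + eta[1])
                              + (1.0 - pz1) * expit(eta[0] * a - eta[1]))
        out[t - 1], pa1 = ez, next_pa1
    return out

def simulate_mrt(n, T, beta10=-0.2, beta11=0.5, xi=0.1, eta=(-0.8, 0.8), seed=0):
    """Simulate (Y, A, Z, P), each of shape (n, T); column t-1 of Y holds Y_{t+1}."""
    rng = np.random.default_rng(seed)
    EZ = expected_Z(T, xi, eta)
    # Errors with unit variance and Corr(eps_u, eps_t) = 0.5 ** (|u - t| / 2).
    idx = np.arange(T)
    cov = 0.5 ** (np.abs(idx[:, None] - idx[None, :]) / 2.0)
    eps = rng.multivariate_normal(np.zeros(T), cov, size=n)
    Y, A, Z, P = (np.zeros((n, T)) for _ in range(4))
    a_prev = np.zeros(n)
    for t in range(T):
        pz1 = expit(0.05 * (t + 1) + xi * a_prev)
        Z[:, t] = np.where(rng.random(n) < pz1, 1.0, -1.0)
        P[:, t] = expit(eta[0] * a_prev + eta[1] * Z[:, t])
        A[:, t] = (rng.random(n) < P[:, t]).astype(float)
        Y[:, t] = ((beta10 + beta11 * (Z[:, t] - EZ[t])) * (A[:, t] - P[:, t])
                   + 0.8 * Z[:, t] + eps[:, t])
        a_prev = A[:, t]
    return Y, A, Z, P
```

A call such as `simulate_mrt(n=250, T=30)` would generate one replicate with the 250 individuals used in the reported experiments; the value `T=30` is only a placeholder.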

5.2. Simulation Results

In the first simulation experiment, we estimate the fully marginal causal effect, which is constant in time and is given by $β_{0}^{⋆} = β_{10} = - 0.2$ . Here, we report results with 250 individuals using the following estimation methods:

Estimation Method I: WCLS.

We use the WCLS method (Boruvka et al., 2018) adjusted with $Z_{t, j}$ as a control variable. This method is guaranteed to produce a consistent estimate with a valid confidence interval. Thus, it will be used as a reference for comparison of the estimation results from the following method.

Estimation Method II: A2-WCLS.

We include $Z_{t, j}$ as an auxiliary variable. Since we are primarily concerned with the fully marginal causal excursion effect (i.e., $S_{t} = \emptyset$ , $f_{t} (S_{t}) = 1$ ), the working model of the centering function under the A2-WCLS criterion in (12) is simply $μ_{t} (S_{t}) = θ$ , and $\hat{θ} = ℙ_{n} [\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 - {\tilde{p}}_{t}) Z_{t, j}] / ℙ_{n} [\sum_{t = 1}^{T} {\tilde{p}}_{t} (1 - {\tilde{p}}_{t})]$ , the weighted average implied by Condition 3.5.

Estimation Method III: A2-WCLS with a higher-dimensional $μ_{t} (S_{t})$ .

As the expectation of $Z_{t}$ changes over time, we consider a higher-dimensional working model for the centering function as $μ_{t}^{'} (S_{t}) = θ_{1} + θ_{2} t$ . We can solve for both ${\hat{θ}}_{1}$ and ${\hat{θ}}_{2}$ under Condition 3.5.

The simulation experiment involves estimating $θ$ in the centering function based on data generated, and this introduces non-negligible variation that affects the asymptotic variance estimation for the fully marginal causal effect. Therefore, the asymptotic variance estimator referred to in Section 4.3.2 will be used. Table 1 reports the simulation results. “%RE gain” indicates the percentage of times we achieve an efficiency gain out of 1000 Monte Carlo replicates. “mRE” stands for the average relative efficiency, and “RSD” represents the relative standard deviation between two estimates. Despite having to account for the extra variance caused by estimating $θ$ , the proposed A2-WCLS method still significantly improves the efficiency of fully marginal causal effect estimation.

Table 1: Fully marginal causal effect estimation efficiency comparison.

Method	$β_{11}$	Est	SE	CP	%RE gain	mRE	RSD

WCLS (I)	0.2	−0.197	0.029	0.952	-	-	-
	0.5	−0.197	0.029	0.935	-	-	-
	0.8	−0.194	0.031	0.936	-	-	-

A2-WCLS (II)	0.2	−0.199	0.026	0.955	100%	1.207	1.198
	0.5	−0.200	0.027	0.936	100%	1.202	1.196
	0.8	−0.200	0.028	0.940	100%	1.196	1.198

A2-WCLS (III)	0.2	−0.199	0.026	0.955	100%	1.207	1.198
	0.5	−0.200	0.027	0.936	100%	1.202	1.196
	0.8	−0.200	0.028	0.940	100%	1.196	1.198

We perform one more simulation experiment using A2-WCLS in which the working model of the centering function $μ_{t} (S_{t})$ has more features that are not included in $f_{t} (S_{t})$ . Simulation results indicate that using this extra dimension barely leads to any improvement in RE compared to using just $f_{t} (S_{t})$ , which could be interpreted as the trade-off between a centering function that better approximates $E [Z_{t} ∣ S_{t}]$ and extra variances caused by estimating the parameters in the working centering model. To conclude, the extension of $μ_{t} (S_{t})$ to a more complex function while ensuring an improvement in estimation efficiency is not trivial. Thus, we consider this to be future work.

Remark 5.1.

As an empirical demonstration that the orthogonality condition 3.5 must be met for estimation to be consistent, we implement a simulation centering the auxiliary variable at the global average $\bar{Z} = \frac{1}{n \times T} \sum_{j = 1}^{n} \sum_{t = 1}^{T} Z_{t, j}$ . According to this simulation result shown in Appendix I Table 2, there is a marked bias compared with results obtained in Estimation Method II, where the auxiliary variable is centered by a weighted average of $Z_{t}$ .

Remark 5.2.

We implemented one simulation under the strong assumption $Z_{t, j} ({\bar{A}}_{t - 1, j}) = z_{t, j}$ . In this case, centering the auxiliary variable no longer introduces extra uncertainty. See Table 3 for the simulation results. This simulation provides similar empirical evidence as in Table 1.

Remark 5.3.

In addition, we vary the treatment randomization policy $p_{t} (A_{t} ∣ H_{t})$ to evaluate how our proposed method performs under different data generation processes. One is to set the randomization probability to $p_{t} (1 ∣ H_{t}) = p_{t}$ , where $p_{t}$ is only a function of time $t$ . Results are presented in Table 4 in Appendix I. Evidence suggests that incorporating auxiliary variables improves estimation efficiency in this case. Furthermore, we set the randomization probability as a constant $p_{t} (1 ∣ H_{t}) = p$ . See results in Table 5. While we see a minimal improvement compared to the previous two scenarios, we highlight that the inclusion of auxiliary variables does not adversely affect efficiency.

6. Extension: Time-Lagged Outcomes and Post-Treatment Auxiliary Variable Adjustment

In addition to the focus on proximal outcomes, there is growing interest in lagged outcomes defined over future decision points with a fixed window length $Δ > 1$ , denoted as $Y_{t, Δ}$ , which can be expressed as a known function of the participant’s data: $Y_{t, Δ} = y (H_{t + Δ})$ (Dempsey et al., 2020). The incorporation of auxiliary variables in this context is conceptually similar to that of proximal outcomes. Nevertheless, when considering lagged outcomes, the selection of auxiliary variables actually has a broader scope as post-treatment auxiliary variables may also be incorporated into the estimation.

Under Assumption 2.1, the causal estimand for lagged effect can be expressed in terms of observable data (Shi et al., 2022):

β_{p, π} (t + Δ; s) = E [E [W_{t, Δ - 1} Y_{t, Δ} ∣ A_{t} = 1, H_{t}] - E [W_{t, Δ - 1} Y_{t, Δ} ∣ A_{t} = 0, H_{t}] ∣ S_{t} = s],

(16)

where $W_{t, u} = \prod_{s = 1}^{u} π_{t + s} (A_{t + s} ∣ H_{t + s}) / p_{t + s} (A_{t + s} ∣ H_{t + s})$ . Here, we assume the reference distribution for treatment assignments from $t + 1$ to $t + Δ - 1$ ( $Δ > 1$ ) is given by a randomization probability generically represented by ${\{π_{u} (a_{u} ∣ H_{u})\}}_{u = t + 1}^{t + Δ - 1}$ . This generalization contains previous definitions such as lagged effects (Boruvka et al., 2018), where $π_{u} = p_{u}$ , and deterministic choices such as $a_{t + 1 : (t + Δ - 1)} = 0$ (Dempsey et al., 2020; Qian et al., 2021), where $π_{u} = 1 \{a_{u} = 0\}$ and $1 \{\cdot\}$ is the indicator function. As described in Boruvka et al. (2018), assuming $β_{p, π} (t + Δ; s) = f_{t} {(s)}^{⊤} β_{0}^{⋆}$ , the WCLS criterion for lagged outcomes is to minimize the following expression:

ℙ_{n} [\sum_{t = 1}^{T - Δ + 1} W_{t, Δ - 1} W_{t} \times {(Y_{t, Δ} - g_{t} {(H_{t})}^{⊤} α_{0} - (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) f_{t} {(S_{t})}^{⊤} β_{0})}^{2}] .

(17)
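The weight $W_{t, Δ - 1}$ entering criterion (17) is a product of ratios over the intermediate decision points. The sketch below computes it for one individual under the two reference distributions mentioned above. The function name `lagged_weight`, the 0-based indexing, and the `reference` argument are assumptions of this illustration.

```python
def lagged_weight(A, p, t, delta, reference="zero"):
    """W_{t, delta-1} = prod_{s=1}^{delta-1} pi(A_{t+s} | H_{t+s}) / p(A_{t+s} | H_{t+s}).

    A, p: sequences for one individual, indexed by decision point (0-based);
    p[k] is the randomization probability of treatment, p_k(1 | H_k).
    reference="zero": pi_u = 1{a_u = 0}; reference="observed": pi_u = p_u.
    """
    w = 1.0
    for s in range(1, delta):
        a, p1 = A[t + s], p[t + s]
        p_obs = p1 if a == 1 else 1.0 - p1  # probability of the observed action
        pi_obs = p_obs if reference == "observed" else float(a == 0)
        w *= pi_obs / p_obs
    return w

# Hypothetical treatment sequence and randomization probabilities.
A = [1, 0, 0, 1]
p = [0.5, 0.4, 0.2, 0.5]
w_zero = lagged_weight(A, p, t=0, delta=3)  # no treatment at the next two points
```

Under the deterministic reference $π_{u} = 1 \{a_{u} = 0\}$ , any observed treatment inside the window sets the weight to zero, so only treatment-free windows contribute to the criterion.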

It is important to note that all variables included in Equation (17), as indicated by the notations, must be observed prior to the treatment allocation at time $t (A_{t})$ . Naively plugging in post-treatment variables in Equation (17) without appropriate adjustment can lead to bias, and we provide an illustration of this in Appendix J. However, considering the lagged outcomes, it is also reasonable to expect that post-treatment variables may be highly prognostic, and including them could potentially reduce noise and yield a more precise estimation of the causal parameter. Building upon this motivation, we introduce the A2-WCLS criterion for lagged outcomes, which allows for the incorporation of post-treatment auxiliary variables. Let $Z_{t + u} \in ℝ^{p^{'}}$ represent a set of post-treatment auxiliary variables, where $u$ is a positive integer:

ℙ_{n} [\sum_{t = 1}^{T - Δ + 1} W_{t, Δ - 1} W_{t} \times {(Y_{t, Δ} - g_{t} {(H_{t})}^{⊤} α_{0} - {(Z_{t + u} - μ_{t + u} (S_{t}, A_{t}))}^{⊤} α_{1} - (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) (f_{t} {(S_{t})}^{⊤} β_{0} + {(Z_{t + u} - μ_{t + u}^{'} (S_{t}, A_{t}))}^{⊤} β_{1}))}^{2}] .

(18)

The A2-WCLS criterion for post-treatment auxiliary variable adjustment has two distinctions compared to only pre-treatment auxiliary variable adjustment. First, as a sufficient condition for ensuring a consistent estimate of the parameter of interest $β_{0}$ , we require two sets of centering functions, denoted as $μ_{t + u} (\cdot)$ and $μ_{t + u}^{'} (\cdot)$ , as shown above. Second, both sets of centering functions are dependent on the treatment allocation $A_{t}$ , thus denoted as $μ_{t + u} (S_{t}, A_{t})$ and $μ_{t + u}^{'} (S_{t}, A_{t})$ . As a result, we arrive at the subsequent complementary orthogonality condition for both centering functions:

Condition 6.1.

For each auxiliary variable $Z_{t + u}^{i} \in ℝ$ , where $i \in \{1, \dots, p^{'}\}$ , and $0 < u < Δ$ , its corresponding centering functions must satisfy the following orthogonality condition with respect to $f_{t} (S_{t}) \in ℝ^{q}$ ,

(a) E [\sum_{t = 1}^{T - Δ + 1} W_{t} W_{t, Δ - 1} (Z_{t + u}^{i} - μ_{t + u}^{i} (S_{t}, A_{t})) (A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t})) f_{t} (S_{t})] = 0_{q \times 1}, (b) E [\sum_{t = 1}^{T - Δ + 1} W_{t} W_{t, Δ - 1} (Z_{t + u}^{i} - μ_{t + u}^{' i} (S_{t}, A_{t})) {(A_{t} - {\tilde{p}}_{t} (1 ∣ S_{t}))}^{2} f_{t} (S_{t})] = 0_{q \times 1} .

(19)

Condition 6.1 offers an approach for integrating post-treatment auxiliary variables into the estimation of lagged moderated causal excursion effects, thereby expanding our framework for incorporating auxiliary variables in time-varying treatment effect estimation. To establish a connection with the previous discussion on pre-treatment auxiliary variable adjustment mentioned in Condition 3.5, we can set $u = 0$ , leading to $μ_{t}^{'} (S_{t}, A_{t}) = μ_{t} (S_{t}, A_{t}) = μ_{t} (S_{t})$ , and $W_{t, Δ - 1} = 1$ . Thus, Equation (19)(a) will always be 0, and Equation (19)(b) can be simplified to Equation (11). To conclude, the orthogonality condition for proximal outcomes is subsumed as a special case. The proof can be found in Appendix J.
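The reduction to the proximal case can be checked with two conditional moments. The sketch below assumes the usual WCLS weight $W_{t} = {\tilde{p}}_{t} (A_{t} ∣ S_{t}) / p_{t} (A_{t} ∣ H_{t})$ (not restated in this section), so that $E [W_{t} h (A_{t}) ∣ H_{t}] = \sum_{a} {\tilde{p}}_{t} (a ∣ S_{t}) h (a)$ , and it assumes that Equation (11) has the weighted form suggested by the example in Section 3.2.1. It writes ${\tilde{p}}_{t}$ for ${\tilde{p}}_{t} (1 ∣ S_{t})$ .

```latex
\begin{aligned}
E\left[ W_t (A_t - \tilde{p}_t) \mid H_t \right]
  &= \tilde{p}_t (1 - \tilde{p}_t) + (1 - \tilde{p}_t)(0 - \tilde{p}_t) = 0, \\
E\left[ W_t (A_t - \tilde{p}_t)^2 \mid H_t \right]
  &= \tilde{p}_t (1 - \tilde{p}_t)^2 + (1 - \tilde{p}_t)\,\tilde{p}_t^{\,2}
   = \tilde{p}_t (1 - \tilde{p}_t), \\
\text{(19)(b) with } u = 0: \quad
E\Big[ \sum_{t=1}^{T} \tilde{p}_t (1 - \tilde{p}_t)
       \big( Z_t^i - \mu_t^i(S_t) \big) f_t(S_t) \Big] &= 0_{q \times 1}.
\end{aligned}
```

When $u = 0$ , the terms $Z_{t}^{i}$ , $μ_{t}^{i} (S_{t})$ and $f_{t} (S_{t})$ are functions of $H_{t}$ , so iterated expectations with the first moment make (19)(a) vanish identically, and the second moment turns (19)(b) into the last line.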

Remark 6.2.

The method we propose can also be used in situations where the causal excursion effect is not presented as a linear contrast but instead as logarithmic relative risks, as shown in Qian et al. (2021). We have included the extensions of our method to binary outcomes in Appendix K. It is important to note that the orthogonality condition of the centering function for binary outcomes depends on the parameter $β_{0}$ . To address this, we can use an iterative algorithm to solve the stacked estimating equations jointly.

7. Case Study

The Intern Health Study (IHS) is a 6-month micro-randomized trial on medical interns (NeCamp et al., 2020), which aimed to investigate when to provide mHealth interventions to individuals in stressful work environments to improve their behavior and mental health. In this section, we evaluate the effectiveness of targeted notifications in improving an individual’s mood and step counts. The data set used in the analyses contains 1562 participants.

The exploratory and MRT analyses conducted in this paper focus on weekly randomization: each week, an individual was randomized to receive mood, activity, sleep, or no notifications with equal probability (1/4 each). We choose the outcome $Y_{t + 1, j}$ as the self-reported mood score (a Likert scale taking values from 1 to 10) and step count (cubic root) for individual $j$ in study week $t$ . The average weekly mood score when a notification is delivered is 7.14, and 7.16 when there is no notification; the average weekly step count (cubic root) when a notification is delivered is 19.1, and also 19.1 when there is no notification. In the following analysis, the prior week’s outcome is chosen as the auxiliary variable. We evaluate the targeted notification treatment effect for medical interns using our proposed method and WCLS.

7.1. Comparison of the marginal effect estimation

First, we are interested in assessing the fully marginal excursion effect as well as the effect moderation (i.e., $β (t) = β_{0}^{⋆}$ ). For an individual $j$ , the study week is coded as a subscript $t$ . $Y_{t + 1, j}$ is the self-reported mood score or step count for individual $j$ in study week $t + 1$ , and $Z_{t, j}$ represents the auxiliary variable we choose. $A_{t}$ is defined as the specific type of notification that targets improving the outcome. For example, if the outcome is the self-reported mood score, sending mood notifications would be the action, thus, $ℙ (A_{t} = 1) = 0.25$ . We analyze the marginal causal effect $β_{0}$ of the targeted notifications on self-reported mood score and step count using the following model:

Y_{t + 1, j} = g_{t} {(H_{t})}^{⊤} α + (A_{t} - {\tilde{p}}_{t}) (β_{0} + β_{2} (Z_{t, j} - μ_{t} (θ)))

For the first analysis, we set $g_{t} {(H_{t})}^{⊤} α = α_{0}$ and $β_{2} = 0$ . Thus, the estimation boils down to a univariate WCLS model. In the second model, we keep $β_{2} = 0$ but include a control variable, meaning $g_{t} {(H_{t})}^{⊤} α = α_{0} + α_{1} Z_{t, j}$ . The third moderation analysis lets $β_{2}$ be a free parameter, enabling novel analyses using A2-WCLS.

The corresponding outcome of the prior week is chosen to be the auxiliary variable, i.e., $Z_{t, j} = Y_{t, j}$ , which is a time-varying variable. Specifically, for mood score as the outcome, we selected the prior week’s average mood score as the auxiliary variable, and for step count as the outcome, the prior week’s average step count was chosen. The working model chosen to center these auxiliary variables is $μ_{t} = ℙ_{n} [Y_{t}]$ . We report various estimators in Figure 1 and present more details in Appendix L Table 8. In comparison with Model I, Models II and III show a tangible improvement in the standard error estimates, with a relative estimation efficiency of 2.32 and 2.33 for the mood outcome, and 1.57 and 1.58 for the step outcome. Even though Model III using A2-WCLS does not gain considerable efficiency over the Model II (WCLS) results, it is not likely to lose efficiency.

Figure 1: Causal effect point estimates with the 95% confidence interval, and standard errors

We conclude that sending activity notifications can increase (the cubic root of) step counts by 0.072, an effect that is statistically significant at the 5% level. The study also suggests that sending mood notifications may negatively affect users’ moods. There is, however, insufficient evidence to establish a causal relationship between the two.

7.2. Time-varying treatment effect estimation

In most mobile health intervention studies, time-in-study has always been an important moderator of treatment effects. Therefore, for further investigation, we include study week in the marginal treatment effect model: $β (t) = β_{0}^{⋆} + β_{1}^{⋆} t$ . Auxiliary variables are still chosen to be the corresponding outcome in the prior week. Control variables have been established in Section 7.1 as being beneficial in reducing standard errors, so control variables will always be included in the following analysis.

Estimated time-varying treatment moderation effects are shown in Figure 2 below. We compare our proposed approach against the WCLS method from Boruvka et al. (2018). More details of the moderated analysis are presented in Appendix L Table 9. The shaded area represents the 95% confidence band of the moderation effects at varying values of the moderator. A much narrower confidence band is observed when A2-WCLS is used; specifically, the relative efficiency for ${\hat{β}}_{0}$ , ${\hat{β}}_{1}$ in the mood model is 2.204 and 2.157, respectively, and in the step model is 1.562 and 1.475, respectively.
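A pointwise band of this kind follows from the fitted coefficients and their robust covariance matrix. The sketch below assumes the linear moderation model $β (t) = β_{0} + β_{1} t$ and a normal critical value. The function name `effect_band` and the numbers in the usage line are hypothetical and are not estimates from the Intern Health Study.

```python
import numpy as np

def effect_band(beta_hat, cov, weeks, z_crit=1.959963984540054):
    """Pointwise normal-based band for beta(t) = beta0 + beta1 * t.

    beta_hat: (beta0, beta1); cov: their 2 x 2 (robust) covariance matrix;
    weeks: values of the moderator t at which to evaluate the effect.
    Returns (estimate, lower, upper) arrays.
    """
    weeks = np.asarray(weeks, dtype=float)
    F = np.column_stack([np.ones_like(weeks), weeks])  # rows f_t = (1, t)
    est = F @ np.asarray(beta_hat, dtype=float)
    # Var(f_t' beta_hat) = f_t' cov f_t, evaluated row by row.
    se = np.sqrt(np.einsum("ij,jk,ik->i", F, np.asarray(cov, dtype=float), F))
    return est, est - z_crit * se, est + z_crit * se

# Hypothetical inputs for illustration only.
est, lower, upper = effect_band([0.1, -0.01], [[0.04, 0.0], [0.0, 0.0001]], [0, 10])
```

The width of the band at each week depends on both variances and the covariance of $({\hat{β}}_{0}, {\hat{β}}_{1})$ , which is why efficiency gains in either coefficient narrow it.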

Mood change shows an overall decreasing trend with study week, which suggests that if the notifications do not serve a therapeutic purpose, it may not be ideal to over-burden participants with more targeted reminders, since doing so may lower their mood. Furthermore, it is encouraging that the causal excursion effect of mobile prompts on step count change is positive in the first several weeks of the study, which means that sending targeted reminders helps increase physical activity levels. In the later stages of the study, the effect fades away, possibly due to habituation to smartphone reminders.

8. Discussion

MRTs are commonly used in mobile health-related studies, yet statistical techniques for covariate adjustment have mainly focused on RCTs with different sampling designs. In light of the increasing research on statistical tools for MRT analysis, we investigate the theoretical properties of incorporating auxiliary variables in the estimation of moderated causal excursion effects. Our study aims to address the current gaps in the literature by considering repeated measurements and time-varying treatment effects.

Our proposed A2-WCLS method possesses several desirable properties, such as computational simplicity, robustness to model misspecification, and wide applicability. Moreover, it provides valid statistical inference and demonstrates strong performance in finite-sample settings. Most importantly, it offers improved efficiency for moderated causal excursion effect estimation in comparison to the benchmark WCLS methods. In Appendix M, we provide clear and practical guidelines for practitioners to implement auxiliary variable adjustment in longitudinal studies. Overall, our results have the potential to alleviate confusion surrounding this topic and enhance the precision of causal effect estimation.

Many open questions remain. One of the most important is the choice of the centering function $μ_{t} (S_{t})$ , for which feature selection methods, such as the smoothly clipped absolute deviation (SCAD) method (Fan and Li, 2001), could be used to pick an optimal model. A non-linear centering function can also be considered, similar to what Guo and Basse (2021) proposed for RCTs. In addition to using auxiliary variables to improve estimation efficiency, we may also consider adjusting for control variables $g (H_{t})$ , for which a wide variety of machine learning methods can be used. Further, if the MRT has a cluster structure (interference within clusters and/or treatment heterogeneity at the cluster level), new centering techniques using cluster-level auxiliary variables are required (Shi et al., 2022). Last but not least, more advanced and flexible MRT designs should be explored to work in tandem with auxiliary variable adjustment to maximize efficiency. For example, Van Lancker et al. (2022) propose an information adaptive design, which adapts to the amount of precision gain and can lead to faster, more efficient trials without sacrificing validity or power. We consider these to be exciting directions for future work.
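For readers unfamiliar with SCAD, the penalty from Fan and Li (2001) can be written as a short function. This is only a sketch of the penalty itself, not a variable-selection procedure for $μ_{t} (S_{t})$ and not part of the analysis in this paper; the function name is hypothetical, and the default $a = 3.7$ is the value suggested by Fan and Li (2001).

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient (requires lam >= 0 and a > 2)."""
    if lam < 0 or a <= 2:
        raise ValueError("need lam >= 0 and a > 2")
    x = abs(theta)
    if x <= lam:
        # Lasso-like linear region near zero.
        return lam * x
    if x <= a * lam:
        # Quadratic transition region.
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    # Constant region: large coefficients get no extra penalty.
    return lam * lam * (a + 1.0) / 2.0


# Penalty evaluated in each of the three regions for lam = 1.
penalties = [scad_penalty(v, lam=1.0) for v in (0.5, 2.0, 10.0)]
```

The penalty behaves like the lasso for small coefficients and flattens out for large ones, which is what allows it to select variables without heavily shrinking large effects.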

Supplementary Material

Supplement only

NIHMS2107334-supplement-Supplement_only.pdf (337.9 KB, pdf)

References

Boruvka A, Almirall D, Witkiewitz K, and Murphy SA (2018). Assessing time-varying causal effect moderation in mobile health. Journal of the American Statistical Association 113(523), 1112–1121.
Carroll RJ, Ruppert D, Stefanski LA, and Crainiceanu CM (2006). Measurement error in nonlinear models: a modern perspective. Chapman and Hall/CRC.
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, and Robins J. (2018). Double/debiased machine learning for treatment and structural parameters.
Dempsey W, Liao P, Klasnja P, Nahum-Shani I, and Murphy SA (2015). Randomised trials for the fitbit generation. Significance 12(6), 20–23.
Dempsey W, Liao P, Kumar S, and Murphy SA (2020). The stratified micro-randomized trial design: sample size considerations for testing nested causal effects of time-varying treatments. The annals of applied statistics 14(2), 661.
Eicker F. (1967). Limit theorems for regressions with unequal and dependent errors.
Fan J. and Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association 96(456), 1348–1360.
Fisher R. (1935). The Design of Experiments. Oliver and Boyd.
Free C, Phillips G, Galli L, Watson L, Felix L, Edwards P, Patel V, and Haines A. (2013). The effectiveness of mobile-health technology-based health behaviour change or disease management interventions for health care consumers: a systematic review. PLoS medicine 10(1), e1001362.
Freedman DA (2006). On the so-called “huber sandwich estimator” and “robust standard errors”. The American Statistician 60(4), 299–302.
Freedman DA (2008). On regression adjustments to experimental data. Advances in Applied Mathematics 40(2), 180–193.
Guo K. and Basse G. (2021). The generalized oaxaca-blinder estimator. Journal of the American Statistical Association, 1–13.
Heron KE and Smyth JM (2010). Ecological momentary interventions: incorporating mobile technology into psychosocial and health behaviour treatments. British journal of health psychology 15(1), 1–39.
Huber PJ (1967). Under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability: Weather Modification; University of California Press: Berkeley, CA, USA, pp. 221.
ICH E. (1998). E9 statistical principles for clinical trials. London: European Medicines Agency.
Klasnja P, Hekler EB, Shiffman S, Boruvka A, Almirall D, Tewari A, and Murphy SA (2015). Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychology 34(S), 1220.
Liang K-Y and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73(1), 13–22.
Liao P, Klasnja P, Tewari A, and Murphy SA (2016). Sample size calculations for micro-randomized trials in mhealth. Statistics in medicine 35(12), 1944–1971.
Lin W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining freedman’s critique. The Annals of Applied Statistics 7(1), 295–318.
Mancl LA and DeRouen TA (2001). A covariance estimator for gee with improved small-sample properties. Biometrics 57(1), 126–134.
Nahum-Shani I, Smith SN, Spring BJ, Collins LM, Witkiewitz K, Tewari A, and Murphy SA (2018). Just-in-time adaptive interventions (jitais) in mobile health: key components and design principles for ongoing health behavior support. Annals of Behavioral Medicine 52(6), 446–462.
NeCamp T, Sen S, Frank E, Walton MA, Ionides EL, Fang Y, Tewari A, and Wu Z. (2020). Assessing real-time moderation for developing adaptive mobile health interventions for medical interns: micro-randomized trial. Journal of medical Internet research 22(3), e15033.
Negi A. and Wooldridge JM (2021). Revisiting regression adjustment in experiments with heterogeneous treatment effects. Econometric Reviews 40(5), 504–534.
Neyman J. (1979). C (α) tests and their use. Sankhyā: The Indian Journal of Statistics, Series A, 1–21.
Neyman JS (1923). On the application of probability theory to agricultural experiments. essay on principles. section 9. (translated and edited by dm dabrowska and tp speed, statistical science (1990), 5, 465–480). Annals of Agricultural Sciences 10, 1–51.
Qian T, Yoo H, Klasnja P, Almirall D, and Murphy SA (2021). Estimating time-varying causal excursion effects in mobile health with binary outcomes. Biometrika 108(3), 507–527.
Rabbi M, Kotov MP, Cunningham R, Bonar EE, Nahum-Shani I, Klasnja P, Walton M, and Murphy S. (2018). Toward increasing engagement in substance use data collection: development of the substance abuse research assistant app and protocol for a microrandomized trial using adolescents and emerging adults. JMIR research protocols 7(7), e166.
Reichardt CS and Gollob HF (1999). Justifying the use and increasing the power of a t test for a randomized experiment with a convenience sample. Psychological methods 4(1), 117.
Ridpath J. (2017). How can we use technology to support patients after bariatric surgery?
Robins J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect. Mathematical Modelling 7(9), 1393–1512.
Robins JM (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics-Theory and methods 23(8), 2379–2412.
Robins JM (1997). Causal inference from complex longitudinal data. In Latent variable modeling and applications to causality, pp. 69–117. Springer.
Rubin D. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics 6(1), 34–58.
Schochet PZ (2010). Is regression adjustment supported by the neyman model for causal inference? Journal of Statistical Planning and Inference 140(1), 246–259.
Shi J, Wu Z, and Dempsey W. (2022). Assessing time-varying causal effect moderation in the presence of cluster-level treatment effect heterogeneity and interference. Biometrika.
Su F. and Ding P. (2021). Model-assisted analyses of cluster-randomized experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Tsiatis AA, Davidian M, Zhang M, and Lu X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statistics in medicine 27(23), 4658–4677.
Van Lancker K, Betz J, and Rosenblum M. (2022). Combining covariate adjustment with group sequential and information adaptive designs to improve randomized trial efficiency. arXiv preprint arXiv:2201.12921
White H. (2014). Asymptotic theory for econometricians. Academic press.
Yang L. and Tsiatis AA (2001). Efficiency study of estimators for a treatment effect in a pretest–posttest trial. The American Statistician 55(4), 314–321.
Ye T, Shao J, Yi Y, and Zhao Q. (2022). Toward better practice of covariate adjustment in analyzing randomized clinical trials. Journal of the American Statistical Association, 1–13.
Zhang M, Tsiatis AA, and Davidian M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64(3), 707–715.
Zhao A. and Ding P. (2021). No star is good news: A unified look at rerandomization based on p-values from covariate balance tests. arXiv preprint arXiv:2112.10545
