Abstract
In this paper, we introduce an analytic approach for assessing effects of multilevel interventions on disparity in health outcomes and health-related decision outcomes (i.e., a treatment decision made by a healthcare provider). We outline common challenges that are encountered in interventional health disparity research, including issues of effect scale and interpretation, choice of covariates for adjustment and its impact on effect magnitude, and the methodological challenges involved with studying decision-based outcomes. To address these challenges, we introduce total effects of interventions on disparity for the entire sample and the treated sample, and corresponding direct effects that are relevant for decision-based outcomes. We provide weighting and g-computation estimators in the presence of study attrition and sketch a simulation-based procedure for sample size determinations based on precision (e.g., confidence interval width). We validate our proposed methods through a brief simulation study and apply our approach to evaluate the RICH LIFE intervention, a multilevel healthcare intervention designed to reduce racial and ethnic disparities in hypertension control.
Supplementary Information
The online version contains supplementary material available at 10.1007/s11121-024-01677-8.
Keywords: Multilevel, Intervention, Evaluation, Disparity, Equity, Decision, Causal inference, Allowability, Trial
Introduction
Often the effect of an intervention on disparity is evaluated through the lens of effect (measure) modification, where the effect is compared across levels of the social group (e.g., race) on the additive or relative scale. There are a few problems with this approach. First, because the outcome among the referent social group differs between the treatment and control arms, the intervention effect on disparity can differ depending on whether disparity is measured as a difference or a ratio. Commonly used regression estimators typically fix the scale through the choice of model (e.g., a ratio for binary outcomes) rather than through how investigators conceive of change in disparity. Second, a meaningful measure of disparity (e.g., in the outcome of hypertension control) may adjust for certain allowable covariates (e.g., age and gender) but avoid adjusting for non-allowable covariates (implicated in generating disparity, e.g., pre-existing conditions and socioeconomic status) (Cook et al., 2009; Duan et al., 2008; Jackson, 2021). Regression estimators that are often used to gain precision or address potential bias typically do so by conditioning on all covariates, which may overadjust (and possibly underestimate) the disparity. Third, in the setting of healthcare, it will often be important to assess effects on process outcomes such as the disparity in treatment decisions by clinicians. The measure of disparity in treatment decision-making should account for relevant criteria (e.g., clinical needs at the time of the decision, such as systolic blood pressure for antihypertensive treatment decisions). But these criteria may themselves have been affected by the intervention, and adjusting for such factors in a regression model can introduce bias. This dilemma makes it difficult to study the effects of multilevel healthcare interventions on disparity in treatment decision-making.
We will present an analytic approach for evaluating intervention effects on disparity in health and healthcare decisions (i.e., treatment decisions made by a healthcare provider). Based on the potential outcome framework for causal inference (Robins, 1986; Rubin, 1974), this approach allows researchers to choose the effect scale and the covariate adjustment set that match their equity value judgments while resolving confounding by other measured covariates that are imbalanced across intervention arms. When certain process outcomes are of interest (e.g., treatment decisions), the approach provides a novel direct effect on disparity in the process outcome that appropriately accounts for allowable covariates that may be affected by the intervention, without over-adjusting the disparity measure for the process outcome.
The paper is organized as follows. We begin with our motivating example which concerns a cluster-randomized trial of a multilevel intervention to reduce disparities in hypertension control. We then review the issues outlined above, describe our analytic approach, and apply our methods to our motivating example. We close by noting limitations, implications for study design, and distinctions from other approaches in the literature. As our intended audience is primarily applied, formal results and proofs appear in the supplement. Procedures for sample size determination based on precision are also provided in the Supplemental Material.
Motivating Example: Hypertension Control
Despite the availability of effective therapy and prevention strategies, racially and ethnically minoritized groups in the USA remain disproportionately burdened by cardiovascular disease, a leading cause of death, in large part due to uncontrolled hypertension (Murphy et al., 2013). Barriers to hypertension control are multifactorial and operate at multiple levels of societal organization, including the individual patient, familial and support systems, clinical team, institutional, and municipal and policy levels (Mueller et al., 2015). Multilevel strategies to reduce disparities in hypertension control include patient activation, practice-based quality improvement efforts, such as audit and feedback interventions, and reorganization of clinical care teams (Hysong, 2009; Mills et al., 2018; Viswanathan et al., 2010; Walsh et al., 2005). Few multilevel interventions have been designed specifically to reduce disparities in hypertension control.
Proposed Design
Overview
The RICH LIFE Project (Cooper et al., 2020) was a multilevel, pragmatic, two-arm cluster-randomized trial to compare the effectiveness of two approaches for reducing disparities. Its goal was to test practical, scalable approaches to addressing disparities in hypertension control. It was designed using the Pragmatic-Explanatory Continuum Criteria Indicators (PRECIS) criteria for pragmatic trials, and through engagement with health system administrators, clinical staff, and community partners. The full protocol is described elsewhere (Cooper et al., 2020).
Population
The trial enrolled 1820 adults (non-Hispanic Black [57%], non-Hispanic White [33%], Hispanic [10%]) during 2016–2019 across 30 primary care practices within five healthcare systems in Maryland and Pennsylvania. Adults were eligible if, by the time of eligibility assessment, they were 21 years of age or older, had received care in the prior 6 months, were diagnosed with hypertension, had an uncontrolled systolic blood pressure (≥ 140 mm Hg) at their most recent office-based visit, and had at least one of the following cardiovascular risk factors: diabetes mellitus, hyperlipidemia, coronary heart disease, current tobacco smoking, or depression. Patients with pregnancy, certain serious medical conditions, or substance use disorders, those no longer receiving care at the practice site, and those who declined consent were excluded.
Intervention Arms
Within each participating healthcare system, clinical practice sites were randomized to receive the control condition “Standard of Care Plus” (SCP) or the treatment condition “Collaborative Care/Stepped Care” (CC/SC) for 1 year after randomization. Practice sites in the SCP arm received blood pressure measurement standardization with electronic monitors, blood pressure dashboards for audit-feedback, training modules on hypertension care best practices for providers and staff, and presentations on equity for system leaders. Practice sites in the CC/SC arm combined the SCP components with an intensive care management and stepped care model. The patient’s clinical care team was redesigned to include, at minimum, a primary care physician and a care manager (a registered nurse or licensed clinical social worker) whose role was to co-develop (with the patient) a medical management plan and to facilitate care coordination. At around 3 months of follow-up, patients and their care managers co-determined whether the care team would be broadened to include either: (1) a community health worker to help patients assess and overcome barriers to self-management and effective interaction with providers, and/or (2) a consultation with a relevant clinical specialist (e.g., a cardiologist) to review the patient’s case and provide recommendations. While these collaborative and stepped care components affected processes at the clinical care team level, they were only applied to enrolled patients seen at practice sites randomized to the CC/SC arm.
Follow-Up and Outcome Ascertainment
Eligibility was assessed up to 3 to 6 months after the blood pressure standardization, audit and feedback, and system-level interventions were implemented. Follow-up began after enrollment and lasted for up to 2 years. Data on demographics, clinical status, experiences of care, and social determinants of health were collected through surveys at baseline, 12 months, and 24 months, supplemented by clinical data pulls from the electronic medical record. The primary clinical outcome was blood pressure control (< 140/90 mm Hg) at 12 months (via the closest office-based measurement in the electronic medical record between 6- and 18-month follow-up).
Text Box 1.
Three common challenges arise when evaluating intervention effects on disparity: (1) effects on disparity can depend on whether disparity is defined in absolute or relative terms; (2) covariate adjustment via regression methods to address confounding, loss to follow-up, or to increase precision may inadvertently overadjust a disparity measure and impact the magnitude of intervention effects on disparity; (3) assessing intervention effects on disparate decision-making will often require accounting for relevant criteria that are possibly affected by the intervention, but regression methods cannot properly account for such criteria without introducing bias.
Common Challenges in Analysis and Interpretation
To further motivate our proposal, we discuss three common challenges that arise when analyzing and interpreting the impact of interventions on disparity. Let $\mu_{tr}$ represent the mean of the outcome $Y$ measured at follow-up, e.g., controlled hypertension (1 = yes, 0 = no), for a given intervention arm $T = t$ (1 = treatment, 0 = control) and a specific social group $R = r$ (1 = marginalized, e.g., Black, 0 = privileged, e.g., White). For example, $\mu_{11}$ would represent the average level of controlled hypertension at follow-up among the Black population in the treated arm. Because $Y$ is binary, this is equivalent to the proportion of controlled hypertension at follow-up in this group, $P(Y = 1 \mid T = 1, R = 1)$. We may consider a conditional average given covariates $X$, $\mu_{tr}(x)$, defined as $E[Y \mid T = t, R = r, X = x]$.
Outcome Scale and Coding
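Informally, we can express the intervention’s effect on disparity, denoted $\Delta$, by contrasting disparity in the treatment arm with disparity in the control arm. On the risk difference (i.e., additive) scale,
$$\Delta^{RD} = \delta_1 - \delta_0, \tag{1}$$
where $\delta_t = \mu_{t1} - \mu_{t0}$ is the disparity in arm $t$.
On the risk ratio or prevalence ratio (i.e., relative) scale,
$$\Delta^{RR} = \frac{\rho_1}{\rho_0}, \tag{2}$$
where $\rho_t = \mu_{t1}/\mu_{t0}$.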
From the perspective of Eqs. (1) and (2), the scale one uses to conceive of an effect (i.e., the difference in disparity from control versus treatment arms) is paramount in describing progress toward equity. The additive measure emphasizes how the absolute disparity in the proportion of controlled hypertension differs between the treatment and control arms. In contrast, the prevalence or risk ratio measure emphasizes how the absolute disparity compares to the level of controlled hypertension among White participants, and how this ratio differs across treatment and control arms. An odds ratio (not shown) behaves similarly, emphasizing how the absolute disparity in the odds compares to the odds of controlled hypertension among White participants, and how this ratio differs across treatment and control arms.
Concerningly, intervention effects on the additive and relative scales can conflict, with one indicating reduced disparity and the other indicating no change or increased disparity. Supplemental Table 1 replicates a hypothetical scenario (Asada, 2010) where either $\Delta^{RD}$ or $\Delta^{RR}$ is null and the other is not. There are striking empirical examples (Harper et al., 2010) where disparity (across time rather than across intervention arms) decreases on the difference scale but increases on the ratio scale. Exclusive use of $\Delta^{RD}$ or $\Delta^{RR}$ may prioritize one form of change (i.e., additive or ratio) in characterizing the intervention effect. When, for binary outcomes, one reports $\Delta^{RR}$, another issue arises: the degree of change can also depend on whether the outcome is coded as an attainment (e.g., controlled hypertension) or a shortfall (e.g., uncontrolled hypertension). For example, Supplemental Table 1 shows a hypothetical scenario (Kjellsson et al., 2015) where $\Delta^{RR}$ can be null when outcomes are coded as attainments but non-null when coded as shortfalls. Disparity may be reported on both scales to provide a fuller picture (Harper et al., 2010; Kjellsson et al., 2015).¹
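To make the scale and coding issues concrete, the following minimal sketch computes Eqs. (1) and (2) from hypothetical arm-by-group proportions. The proportions are our own illustrative choices (not the values of Supplemental Table 1), and the helper name `disparity_effects` is hypothetical; exact fractions are used so that a null effect is exactly zero rather than a floating-point residue.

```python
from fractions import Fraction as F


def disparity_effects(mu):
    """mu[(t, r)]: proportion with Y = 1 in arm t (1 = treatment, 0 = control)
    and social group r (1 = marginalized, 0 = privileged).
    Returns (Delta_RD, Delta_RR) as defined in Eqs. (1) and (2)."""
    delta = {t: mu[(t, 1)] - mu[(t, 0)] for t in (0, 1)}  # additive disparity per arm
    rho = {t: mu[(t, 1)] / mu[(t, 0)] for t in (0, 1)}    # relative disparity per arm
    return delta[1] - delta[0], rho[1] / rho[0]


# Hypothetical proportions with controlled hypertension (attainment coding)
attain = {(0, 1): F(2, 5), (0, 0): F(3, 5), (1, 1): F(3, 5), (1, 0): F(4, 5)}
# The same data recoded as uncontrolled hypertension (shortfall coding)
shortfall = {k: 1 - v for k, v in attain.items()}

rd_a, rr_a = disparity_effects(attain)
rd_s, rr_s = disparity_effects(shortfall)
print(rd_a, rr_a)  # 0 9/8
print(rd_s, rr_s)  # 0 4/3
```

Here the additive effect is null under either coding, while the ratio effect is not. Under attainment coding, $\Delta^{RR} = 9/8$ means the Black/White ratio of controlled hypertension moved toward 1 (reduced disparity). Under shortfall coding, $\Delta^{RR} = 4/3$ means the Black/White ratio of uncontrolled hypertension moved away from 1 (increased disparity).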
Unfortunately, when applied researchers analyze intervention effects, the choice of scale is driven less by their values about what sort of change represents an effect than by statistical considerations. For example, regression models are used to adjust for covariates $X$ (for precision or to reduce bias). For binary outcomes, logistic regression (which provides $\Delta^{OR}$) or modified Poisson or negative binomial regression (which provide $\Delta^{RR}$) are often used. However, a linear regression model for binary outcomes (which provides $\Delta^{RD}$) can be difficult to estimate when adjusting for $X$ because the model can produce implausible predicted values (e.g., probabilities outside the interval [0, 1]). Thus, for binary outcomes, $\Delta^{RD}$ is seldom reported, which prioritizes relative rather than absolute effects (an implicit value judgment) when assessing the impact of an intervention on disparity. A modeling approach that allows adjustment for $X$ while reporting effects on both scales is desirable.
Allowability
As before, we may informally express the intervention effect by contrasting the disparity in the control arm with the disparity in the treatment arm. To make our point explicit, in this subsection we consider effects on disparity within levels of covariates $X$. For example, consider the conditional additive effect:
$$\Delta^{RD}(x) = \delta_1(x) - \delta_0(x), \tag{3}$$
where $\delta_t(x) = \mu_{t1}(x) - \mu_{t0}(x)$.
From the perspective of Eq. (3), the choice of what covariates are included in $X$ (if any) is paramount because it helps define what we mean by disparity (Jackson, 2021; Jackson et al., 2022). Our goal for the intervention, after all, is to minimize the treatment arm’s racial difference in outcomes, i.e., $\delta_1(x)$. We would include a covariate in $X$ if we believe that racial differences among those similarly situated on it better reflect what we mean by equity in hypertension control. For example, Black participants are often younger than their White counterparts, and younger adults are more likely to achieve hypertension control. We may want to compare outcomes among Black and White participants with a similar age (or age distribution) so that differences in age do not mask the effect of barriers that Black participants are more likely to face in managing their hypertension. Such barriers may include their greater likelihood of residence in neighborhoods with fewer healthy food stores, options for physical activity, and pharmacies to support medication adherence (Mueller et al., 2015). For this very reason, we would not want to similarly situate Black and White participants on socioeconomic status (SES), since the Black participants’ lower SES is a primary driver of these barriers and their worse hypertension control, especially if we believe that the opportunity to achieve hypertension control should not depend on one’s SES. Adjusting for SES would, in a sense, over-adjust the measure of racial disparity used to define the intervention effect. In essence, we may not wish to adjust for all covariates as $X$, but rather for a subset $L$ that we designate as allowable, while another, non-allowable subset $Z$ is used to account for confounding or improve precision but not to adjust the disparity measure.
Deciding what is allowable (if anything at all) is complex but necessary, and it can be informed by ethical and justice-based frameworks.² From here on, in our examples, we will assume that we have chosen age and gender as allowable and SES as non-allowable covariates for the reasons discussed above.
When we express the intervention effect on disparity as a contrast of the intervention effects across social groups, conditional on the same covariates as in (3), it resembles an interaction term that quantifies effect heterogeneity. On the risk difference (i.e., additive) scale,
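$$\Delta^{RD}(x) = \big[\mu_{11}(x) - \mu_{01}(x)\big] - \big[\mu_{10}(x) - \mu_{00}(x)\big]. \tag{4}$$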
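The equivalence of the disparity contrast in (3) and the interaction contrast in (4) follows from regrouping terms, writing $\mu_{tr}(x) = E[Y \mid T = t, R = r, X = x]$ and $\delta_t(x) = \mu_{t1}(x) - \mu_{t0}(x)$:

$$\delta_1(x) - \delta_0(x) = \big[\mu_{11}(x) - \mu_{10}(x)\big] - \big[\mu_{01}(x) - \mu_{00}(x)\big] = \big[\mu_{11}(x) - \mu_{01}(x)\big] - \big[\mu_{10}(x) - \mu_{00}(x)\big].$$

The first bracket on the right is the intervention effect within the marginalized group and the second is the effect within the privileged group, so the same quantity can be read either as a change in disparity across arms or as heterogeneity of the intervention effect across social groups.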
The perspective of (4) shows us that the choice of what covariates are considered allowable (i.e., what is included in and thus used to define disparity) may impact the magnitude of the intervention effect on disparity. The intervention will reduce the racial disparity in hypertension control when its effect is, on average, greater among Black participants (Mackenbach & Gunning-Schepers, 1997). Ideally, the intervention achieves this by addressing barriers that are overrepresented among Black participants (Cooper et al., 2002). For example, persons who are adherent to antihypertensive medications are more likely to achieve hypertension control. If baseline adherence is lower among Black participants, we expect an intervention that improves adherence to be more effective in increasing hypertension control for this group (as more people in this group stand to benefit from the intervention). But if we similarly situate Black and White participants on baseline adherence, we expect this intervention to be equally effective across racial groups because the effects are compared between groups with similar baseline adherence. Supplemental Fig. 1 provides a more formal intuition based on analysis of a causal graph.
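The adherence argument can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: all inputs are hypothetical (not trial data), the intervention raises the probability of controlled hypertension only among those with low baseline adherence, and it does so equally in both groups. The function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical inputs, not from the trial
p_low = {1: F(3, 5), 0: F(1, 5)}        # P(low baseline adherence | R = r)
lift = {"low": F(3, 10), "high": F(0)}  # effect on P(controlled), same in both groups


def marginal_effect(r):
    """Intervention effect in group r, averaging over that group's adherence mix."""
    return p_low[r] * lift["low"] + (1 - p_low[r]) * lift["high"]


def stratum_effect(r, adherence):
    """Intervention effect in group r among those with the given baseline adherence."""
    return lift[adherence]  # by assumption, does not depend on r


marginal_gap = marginal_effect(1) - marginal_effect(0)             # adherence not in X
stratum_gap = stratum_effect(1, "low") - stratum_effect(0, "low")  # adherence in X
print(marginal_gap, stratum_gap)  # 3/25 0
```

Without conditioning on adherence, the effect is larger among Black participants (a positive Eq. (4) contrast, i.e., reduced disparity in controlled hypertension); once adherence is included in $X$, the same intervention shows no effect on disparity within strata.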
When applied researchers analyze intervention effects, especially those of multilevel interventions, they may need to adjust for many covariates that are implicated in disparity if those covariates are imbalanced across intervention arms or associated with study attrition. Such imbalance is likely in practice with multilevel interventions when randomization occurs at the cluster level and there are few clusters. If we fail to adjust for certain covariates, our estimates may be biased by confounding or study attrition. But if we adjust through regression, we may overadjust the disparity measure and obscure the treatment effect on disparity, as discussed above. We need a modeling approach that can use a chosen set of allowable covariates (e.g., age and gender) to define the effect on the disparity in the outcome while using an auxiliary set of non-allowable covariates (e.g., SES) to address potential bias without overadjustment.
Decision-Based Process Outcomes
At times we may wish to conduct exploratory analyses on process outcomes that involve medical decision-making. For example, we may wish to know how the intervention affects decisions to intensify antihypertensive medications $D$, as measured at follow-up (after hypertension control $M$). But this raises questions about allowability. In Supplemental Table 2, we provide a hypothetical example, which we explain here. The Black-White difference in uncontrolled hypertension at follow-up $M$ is absent in the treatment arm (0%) and present in the control arm (40%). If we assess the intervention’s impact on the disparity in antihypertensive treatment intensification $D$ while ignoring hypertension control $M$, we find no difference in the treatment arm (0%) and higher antihypertensive treatment intensification among Black participants than White participants in the control arm (8%). But among those with the same level of hypertension control $M$, there is no Black-White difference in $D$ in either the treatment or the control arm. What ostensibly was the intervention’s impact on the difference in antihypertensive treatment intensification $D$ is entirely attributable to its impact on eliminating the Black-White difference in hypertension control $M$.
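The reversal described above can be reproduced with a small hypothetical table. The numbers below are our own illustrative choices, picked only to match the percentages quoted in the text (they are not the values of Supplemental Table 2). By construction, the probability of intensification depends only on hypertension control $M$, so there is no Black-White difference in $D$ within levels of $M$ in either arm.

```python
from fractions import Fraction as F

# Hypothetical inputs chosen to reproduce the percentages in the text
p_unc = {(0, 1): F(3, 5), (0, 0): F(1, 5),    # P(M = uncontrolled | T = t, R = r)
         (1, 1): F(3, 10), (1, 0): F(3, 10)}
p_int = {1: F(3, 10), 0: F(1, 10)}            # P(D = 1 | M = m), same for all arms/groups


def p_d(t, r):
    """Marginal probability of intensification in arm t, group r (ignoring M)."""
    q = p_unc[(t, r)]
    return q * p_int[1] + (1 - q) * p_int[0]


for t in (1, 0):
    gap_m = p_unc[(t, 1)] - p_unc[(t, 0)]  # Black-White gap in uncontrolled hypertension
    gap_d = p_d(t, 1) - p_d(t, 0)          # Black-White gap in intensification
    print(t, gap_m, gap_d)
# 1 0 0
# 0 2/5 2/25
```

The control arm shows a 40% gap in $M$ and an 8% gap in $D$, yet every within-$M$ gap in $D$ is zero; the apparent disparity in decisions is produced entirely by the disparity in clinical need.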
This is a form of Simpson’s paradox (Simpson, 1951), where the correct choice (to account for $M$) is driven by substantive rather than statistical considerations. From the standpoint of equity (Institute of Medicine Committee on Understanding & Eliminating Racial Ethnic Disparities in Health Care, 2003; Jackson, 2021), it makes sense to consider hypertension control $M$, which reflects clinical need, as allowable for defining disparity (and thus effects on disparity) in antihypertensive treatment intensification $D$, a medical decision. However, we cannot simply adjust for $M$ when measuring intervention effects on disparity in $D$, because $M$ is a post-intervention variable and doing so may induce bias in settings more general than the hypothetical scenario of Supplemental Table 2. See, for example, the explanation based on a causal diagram in Supplemental Fig. 2. There, the allowable covariate itself may share a common cause with the outcome, and conditioning on the allowable without accounting for that common cause can lead to what is called collider-stratification bias (Cole & Hernán, 2002). We call this situation the “allowability dilemma” because the intervention effect’s interpretation may be difficult if one does not account for the post-intervention allowable criteria, and the effect estimate may be biased if one adjusts for the criteria inappropriately.³ With intervention effects on disparity in decisions like $D$, we need to overcome the allowability dilemma by properly accounting for post-intervention allowables like $M$.
Text Box 2.
Our analytic approach draws from the causal inference literature. It addresses the first challenge by allowing analysts to estimate mean outcomes or proportions for each intervention arm (that account for potential confounding by measured covariates and potential selection bias due to loss to follow-up), which can be combined to provide intervention effects on the additive or relative scales. It addresses the second challenge by balancing all covariates across intervention arms within each social group (e.g., to control for confounding) but only chosen covariates across social groups (to meaningfully define disparity). It addresses the third challenge by defining and estimating a novel direct effect that removes the impact of the intervention on disparity in the relevant decision criteria. The direct effect estimators account for confounding of the criteria and avoid introducing bias.
Analytic Approach
We outline an analytic approach, based on the potential outcome framework, that overcomes the aforementioned challenges. Readers who are unfamiliar with this framework can find a brief review in the Supplemental Material.
We begin by considering estimation of the total intervention effect by weighting and by a sequential regression procedure known as g-computation (Snowden et al., 2011) and also as iterated conditional expectations (Wen et al., 2021). Weighting involves modeling the intervention assignment mechanism correctly, and g-computation involves modeling the outcome process correctly. g-computation is more efficient (Ren et al., 2023), but weighting is design-based, and the weights, which can be constructed without knowledge of the outcomes, can be checked by evaluating covariate balance after weighting (Austin & Stuart, 2015; Jackson, 2016). Therefore, g-computation may be favored under limited sample size for estimation or power for hypothesis testing, whereas weighting may be favored to emphasize objectivity. For direct effects, the g-computation procedure does not require modeling the post-intervention allowable criteria, whereas the weighting procedure does. Both estimation approaches can provide effect estimates on the additive and relative scales. Unlike regression that adjusts for covariates (e.g., age), the approaches we propose allow the intervention’s effect on disparity to be heterogeneous (e.g., more or less effective for various age groups) without having to formulate this form of heterogeneity in the modeling procedure.
Informally, the weighting and g-computation estimation approaches we will propose balance the allowable covariates $L$ (e.g., age and gender) and non-allowable covariates $Z$ (e.g., SES) across intervention arms within each social group (to control for potential confounding by $L$ and $Z$), while only balancing the allowables across social groups (i.e., across Black and White participants, to define meaningful effects on disparity). This separate balancing of allowables and non-allowables to meaningfully represent intervention effects on disparity is a novel feature of our approach, which traditional causal estimands and their associated applications of weighting and g-computation estimators do not share. The total effects, which we consider first, are essentially intention-to-treat effects and are of most interest to practitioners who are interested in the total effect of the CC/SC versus SCP interventions as they were actually implemented in the treatment and control arms. Following this, we consider novel direct effects and their estimation by weighting and g-computation procedures. The direct effects are most useful for exploratory analyses of decision-based outcomes while avoiding the challenges described in the previous section.
Definition of Total Effects
The potential outcome $Y_i^t$ is the outcome we would observe for individual $i$ if that person were assigned to intervention arm $T = t$. We omit the subscript $i$ to simplify notation. We denote by $\psi_{tr}$ the standardized⁴ average potential outcome among those in social group $R = r$ under the intervention setting $T$ to value $t$, defined as⁵:
$$\psi_{tr} = \sum_{l} E\big[Y^t \mid R = r, L = l\big]\, P(L = l \mid S = 1). \tag{5}$$
On the risk difference (i.e., additive) scale, we define the intervention effect $\Delta^{RD}$ as
$$\Delta^{RD} = \delta_1 - \delta_0, \tag{6}$$
where $\delta_t = \psi_{t1} - \psi_{t0}$.
The total effects can also be similarly defined for the relative scale which resembles (2).
The standardization of the allowables $L$ to a common within-sample standard distribution, denoted by $P(L \mid S = 1)$, balances them across social groups so that the definition of the effect on disparity is meaningful. Whenever the intervention effect is modified by the allowables on the chosen scale, the choice of standard population, denoted by $S = 1$, will affect the magnitude of the effect. This choice can reflect inferential interests and value judgments. Reflecting inferential interests, when the standard population is the entire trial sample ($S = 1$ for everyone), $\Delta^{RD}$ and $\Delta^{RR}$ represent sample average treatment effects on disparity (SATE-D), which are of interest if the intervention is to be applied to the entire trial sample. See the Supplemental Material for sample average treatment effects on disparity among the treated (SATT-D). Reflecting value judgments, one may wish to center Black persons among the entire sample or among the treated by choosing them as the standard population in either case (Thurber et al., 2022). While this choice may seem to add analytic complexity, commonly used regression estimators that adjust for covariates can be represented as a form of standardization where the choice of standard population is usually data driven, opaque, and not connected to any actual population (Aronow & Samii, 2016). Being concrete about the standard population allows analysts to choose their own inferential goals and makes their value judgments explicit.
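A small numerical sketch shows how the standard population can change $\Delta^{RD}$ when the effect on disparity is modified by an allowable. On the additive scale, Eq. (5) is linear in the stratum-specific means, so $\Delta^{RD}$ equals the $P(L \mid S = 1)$-weighted average of the stratum-specific effects $\Delta^{RD}(l)$. The stratum effects and age shares below are hypothetical, not trial estimates.

```python
from fractions import Fraction as F

# Hypothetical stratum-specific additive effects on disparity, Delta^RD(l)
effect_by_age = {"younger": F(1, 10), "older": F(1, 50)}
# Two candidate standard distributions P(L = l | S = 1) (hypothetical shares)
standards = {
    "entire sample": {"younger": F(1, 2), "older": F(1, 2)},
    "Black participants": {"younger": F(7, 10), "older": F(3, 10)},
}


def standardized_effect(dist):
    # Additive scale only: Delta^RD is the standard-weighted average of stratum effects
    return sum(effect_by_age[l] * p for l, p in dist.items())


for name, dist in standards.items():
    print(name, standardized_effect(dist))
# entire sample 3/50
# Black participants 19/250
```

Because the hypothetical effect is larger among younger participants and Black participants are younger in this example, centering Black participants as the standard yields a larger effect on disparity. If the stratum effects were equal, both standards would give the same answer. On the ratio scale, the standardized effect is not a simple weighted average of stratum ratios, so this shortcut applies to $\Delta^{RD}$ only.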
Estimation of Total Effects
Here we present two approaches (weighting and g-computation) for estimating the SATE-D, given its relevance for our motivating example. We describe modifications for the SATT-D and for loss to follow-up in the supplementary material. The approaches rely on standard “identifying” assumptions (Hernán & Robins, 2020) to ensure that the average potential outcome can be estimated using the observed study data. A key assumption is that the effect of the intervention $T$ is unconfounded given the social group $R$, the allowables $L$ (e.g., age and gender), and the non-allowables $Z$ (e.g., SES), along with assumptions known as positivity and consistency, and overlap in the distribution of the allowables between each social group and the standard population $S = 1$. These are described further in the Supplemental Material. If these assumptions hold, we can estimate each average potential outcome $\psi_{tr}$ under each intervention arm $t$ and racial group $r$.
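To estimate $\psi_{tr}$ for the SATE-D (the average effect of treatment on disparity) by weighting, we take those belonging to a particular social group $R = r$ in the treatment (or control) arm $T = t$ and compute a weighted average of their observed outcomes, using the weight:
$$W_{tr} = \frac{1}{P(T = t \mid R = r, L, Z)} \times \frac{P(R = r)}{P(R = r \mid L)} \times \frac{P(S = 1 \mid L)}{P(S = 1)}. \tag{7}$$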
For the SATE-D, we choose the standard population $S = 1$ to be the entire trial sample. The first term of the weight controls for confounding of the intervention $T$ by the allowables $L$ (e.g., age and gender) and non-allowables $Z$ (e.g., SES) in each social group $R$ (e.g., Black and White participants) by making the treated and control arms of each social group comparable. The second and third terms serve to meaningfully define disparity. They do so for each arm by balancing $L$ across $R$ in such a way that $L$ follows the standard distribution $P(L \mid S = 1)$, the marginal distribution of $L$ among those in the study population who are members of the standard population, denoted $S = 1$. Because the same standard distribution is used to balance $L$ across $R$ for both intervention arms, their comparability is preserved.
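Bayes’ rule shows why the product of the second and third terms of the weight, $P(R = r)/P(R = r \mid L)$ and $P(S = 1 \mid L)/P(S = 1)$, shifts the distribution of $L$ in social group $R = r$ to the standard distribution:

$$\frac{P(L \mid S = 1)}{P(L \mid R = r)} = \frac{P(S = 1 \mid L)\,P(L)/P(S = 1)}{P(R = r \mid L)\,P(L)/P(R = r)} = \frac{P(R = r)}{P(R = r \mid L)} \times \frac{P(S = 1 \mid L)}{P(S = 1)}.$$

For the SATE-D, everyone belongs to the standard population, so $P(S = 1 \mid L) = P(S = 1) = 1$ and the third term equals one; only the model for $R$ given $L$ is then needed to balance the allowables.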
Because the weights are unknown, they must be estimated by modeling the assignment mechanism for the intervention $T$ given the non-allowables $Z$ and allowables $L$ within each social group $R$, modeling the patterning of $R$ given $L$, and modeling the patterning of $S$ given $L$. The predicted values from these models are then used to obtain the weights. For example, we could estimate the conditional probability $P(R = r \mid L)$ by fitting a logistic regression model for $R$ given $L$:
$$\operatorname{logit} P(R = 1 \mid L) = \alpha_0 + \alpha_1^{\top} L. \tag{8}$$
For those in the marginalized group (e.g., Black participants), we use the fitted coefficients $\hat{\alpha}_0$ and $\hat{\alpha}_1$ to predict $P(R = 1 \mid L)$. For those in the privileged group (e.g., White participants), we obtain $P(R = 0 \mid L)$ by predicting $P(R = 1 \mid L)$ and subtracting it from one. Similar strategies can be used to obtain the remaining components of (7).
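The weighting procedure can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors’ software: each term of (7) uses a main-effects logistic working model fit by Newton-Raphson (our choice, to avoid depending on a particular modeling library); the estimate of $\psi_{tr}$ is the normalized weighted mean of observed outcomes in arm $t$ and group $r$; the data are simulated, with arms deliberately imbalanced on $L$ and $Z$; and clustering is ignored. The names `fit_logistic` and `weighted_psi` are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def weighted_psi(y, t, r, L, Z, s, arm, group):
    """Normalized weighted mean of Y among T = arm, R = group, using the weight in Eq. (7)."""
    ones = np.ones((len(y), 1))
    XLZ, XL = np.hstack([ones, L, Z]), np.hstack([ones, L])
    in_r = r == group
    # Term 1: 1 / P(T = t | R = r, L, Z), from a propensity model fit within group r
    p_t1 = expit(XLZ @ fit_logistic(XLZ[in_r], t[in_r].astype(float)))
    p_t = p_t1 if arm == 1 else 1.0 - p_t1
    # Term 2: P(R = r) / P(R = r | L), using the logistic model of Eq. (8)
    p_r1 = expit(XL @ fit_logistic(XL, r.astype(float)))
    p_r = p_r1 if group == 1 else 1.0 - p_r1
    # Term 3: P(S = 1 | L) / P(S = 1); identically 1 when S = 1 for everyone (SATE-D)
    if np.all(s == 1):
        term3 = np.ones(len(y))
    else:
        term3 = expit(XL @ fit_logistic(XL, s.astype(float))) / s.mean()
    w = (1.0 / p_t) * (in_r.mean() / p_r) * term3
    use = in_r & (t == arm)
    return np.sum(w[use] * y[use]) / np.sum(w[use])


# Hypothetical simulated data (independent individuals; clustering ignored here)
rng = np.random.default_rng(2024)
n = 40_000
r = rng.binomial(1, 0.6, n)                                       # 1 = marginalized group
L = rng.binomial(1, np.where(r == 1, 0.3, 0.6)).reshape(-1, 1)     # allowable, e.g., older age
Z = (rng.normal(size=n) - 0.8 * r).reshape(-1, 1)                  # non-allowable, e.g., SES score
t = rng.binomial(1, expit(0.3 * r + 0.4 * L[:, 0] - 0.5 * Z[:, 0]))  # arms imbalanced on L, Z
lin0 = -0.2 - 0.6 * L[:, 0] + 0.5 * Z[:, 0] - 0.3 * r
y0 = rng.binomial(1, expit(lin0))                                 # potential outcome, control
y1 = rng.binomial(1, expit(lin0 + 0.4 + 0.3 * r))                 # potential outcome, treatment
y = np.where(t == 1, y1, y0)
s = np.ones(n, dtype=int)                                         # SATE-D: standard = whole sample

est = {(a, g): weighted_psi(y, t, r, L, Z, s, a, g) for a in (0, 1) for g in (0, 1)}
delta_rd = (est[1, 1] - est[1, 0]) - (est[0, 1] - est[0, 0])
delta_rr = (est[1, 1] / est[1, 0]) / (est[0, 1] / est[0, 0])
print(f"Delta_RD = {delta_rd:.3f}, Delta_RR = {delta_rr:.3f}")
```

Because the same four weighted means feed both contrasts, the additive and relative effects come from one fit, which is the property motivated in the discussion of outcome scale. Note that $Z$ enters only the propensity term, so it corrects confounding without redefining disparity.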
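To estimate $\psi_{tr}$ for the SATE-D by g-computation, we propose a sequential regression and prediction procedure that relies on re-expressing $\psi_{tr}$ as an iterated expectation:
$$\psi_{tr} = E\Big[\, E\big\{ E[Y \mid T = t, R = r, L, Z] \,\big|\, R = r, L \big\} \,\Big|\, S = 1 \Big]. \tag{9}$$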
Here we present one algorithm for estimating (9). In step 1, we regress the outcome $Y$ on the allowables $L$ (e.g., age and gender) and the non-allowables $Z$ (e.g., SES) among those in intervention arm $T = t$ and social group $R = r$, e.g., with the generalized linear model:
$$g\big(E[Y \mid T = t, R = r, L, Z]\big) = \beta_0 + \beta_1^{\top} L + \beta_2^{\top} Z, \tag{10}$$
where $g(\cdot)$ is some link function (e.g., identity [for linear regression] or logit [for logistic regression]). In step 2, among everyone in the social group $R = r$ (the same one chosen in step 1), regardless of arm, we obtain predicted values from the model fit in step 1 (e.g., from (10)), and we call those predictions $\hat{Q}_1$. In step 3, among the same social group used in steps 1 and 2, we regress the predictions $\hat{Q}_1$ on the allowables $L$ (e.g., age and gender), e.g., with the generalized linear model:
$$g\big(E[\hat{Q}_1 \mid R = r, L]\big) = \gamma_0 + \gamma_1^{\top} L. \tag{11}$$
In step 4, among the standard population denoted by $S = 1$, which for the SATE-D is the entire trial sample, we obtain predicted values from the model fit in step 3 (e.g., from (11)), and we call those predictions $\hat{Q}_2$. In step 5, we average these predicted values $\hat{Q}_2$ over the standard population, which estimates the standardized average potential outcome $\psi_{tr}$. For standard errors and confidence intervals that account for the hierarchical structure of the data, we suggest a non-parametric, balanced, stratified cluster bootstrap procedure (Davison & Hinkley, 1997; Field & Welsh, 2007; Gleason, 1988; Huang, 2018; Ren et al., 2010). Bootstrap samples are formed by resampling clusters with replacement (at exactly the same rate per cluster) separately for the treatment and control arms and retaining all observations within sampled clusters.
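Steps 1 to 5 and the cluster bootstrap can be sketched together in NumPy. This is a minimal illustration under stated assumptions: both regressions use the identity link (ordinary least squares), which is correctly specified here only because the simulated outcome follows a linear probability model and $L$ is binary; the balanced resampling follows the usual construction of concatenating $B$ copies of each arm’s cluster ids, permuting, and splitting into $B$ resamples, so each cluster appears exactly $B$ times per arm overall; and the percentile interval is our choice, not something the text specifies. All function names and simulation settings are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients (identity link); X includes an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def gcomp_psi(y, t, r, L, Z, s, arm, group):
    """Steps 1-5 of the sequential regression estimator of psi_{tr}."""
    ones = np.ones((len(y), 1))
    XLZ, XL = np.hstack([ones, L, Z]), np.hstack([ones, L])
    cell = (t == arm) & (r == group)
    b1 = ols_fit(XLZ[cell], y[cell])  # Step 1: outcome model within arm t and group r
    in_r = r == group
    q1 = XLZ[in_r] @ b1               # Step 2: predict for everyone in group r, both arms
    b2 = ols_fit(XL[in_r], q1)        # Step 3: regress the predictions on the allowables
    q2 = XL[s == 1] @ b2              # Step 4: predict in the standard population
    return q2.mean()                  # Step 5: average over the standard population


def delta_rd(y, t, r, L, Z, s):
    psi = {(a, g): gcomp_psi(y, t, r, L, Z, s, a, g) for a in (0, 1) for g in (0, 1)}
    return (psi[1, 1] - psi[1, 0]) - (psi[0, 1] - psi[0, 0])


def balanced_cluster_resamples(cluster, t, B, rng):
    """B cluster-level resamples drawn separately by arm; within each arm every
    cluster id appears exactly B times in total across the B resamples."""
    samples = [[] for _ in range(B)]
    for arm in (0, 1):
        ids = np.unique(cluster[t == arm])
        pool = rng.permutation(np.tile(ids, B)).reshape(B, len(ids))
        for b in range(B):
            samples[b].extend(pool[b].tolist())
    return samples


def simulate(n_clusters, per_cluster, rng):
    """Hypothetical cluster-randomized data with a linear-probability outcome."""
    cluster = np.repeat(np.arange(n_clusters), per_cluster)
    t = (cluster % 2 == 0).astype(int)                               # arm assigned by cluster
    r = rng.binomial(1, 0.6, cluster.size)
    L = rng.binomial(1, np.where(r == 1, 0.3, 0.6)).reshape(-1, 1)    # allowable
    Z = rng.binomial(1, np.where(r == 1, 0.5, 0.2) + 0.15 * t).reshape(-1, 1)  # imbalanced by arm
    p = 0.40 + 0.10 * L[:, 0] - 0.15 * Z[:, 0] - 0.05 * r + t * (0.10 + 0.05 * r)
    return cluster, t, r, L, Z, rng.binomial(1, p)


rng = np.random.default_rng(7)
cluster, t, r, L, Z, y = simulate(30, 60, rng)
s = np.ones_like(y)  # SATE-D: everyone belongs to the standard population
est = delta_rd(y, t, r, L, Z, s)
boot = []
for ids in balanced_cluster_resamples(cluster, t, 200, rng):
    idx = np.concatenate([np.flatnonzero(cluster == c) for c in ids])
    boot.append(delta_rd(y[idx], t[idx], r[idx], L[idx], Z[idx], s[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])  # percentile 95% interval
print(f"Delta_RD = {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

In this simulated design the true $\Delta^{RD}$ is 0.05 by construction, because the intervention raises the probability of the outcome by 0.10 in the privileged group and by 0.15 in the marginalized group. With a logit link in steps 1 and 3, the same five steps apply, with predictions taken on the probability scale.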
Definition of Direct Effects
We discussed that the assessment of intervention effects on disparity in decision-based outcomes (e.g., treatment intensification) needs to properly account for certain post-intervention allowable criteria measured just before the decision (e.g., clinical need). This is necessary so that assessment of intervention effects on disparity in, say, treatment decision-making is not obscured by intervention effects on disparity in the criteria that inform the decision. To accomplish this, we define direct effects via two actions: (1) assign the treatment or control condition; (2) assign the post-intervention decision-relevant criteria in such a way that, within each arm, there is no disparity in the criteria. Although the second action is hypothetical, we could actually intervene to affect the decision-maker’s perception of the criteria (Tolbert & Jackson, 2024). Such strategies are often used in randomized audit studies (Bertrand & Duflo, 2017) designed to detect discrimination, e.g., in hiring decisions by assigning equal qualifications to resumes before passing them along to the hiring manager(s). Thus, the direct effect identifies the impact of the intervention on disparate treatment.
To define sample interventional direct effects of treatment on disparity (SITE-D), we introduce a different standardized average potential outcome for a decision-based outcome at follow-up among those in the social group under a joint action to assign the intervention condition to value and assign the values of the criteria (that are perceived by the decision-maker just before the decision ). Formally,
(12)
where is an action to set to a value that was randomly drawn from a pre-specified distribution (Didelez et al., 2006; Geneletti, 2007; Muñoz & van der Laan, 2012) and denotes membership in a standard population within the entire trial sample.
Within each arm, the criteria (e.g., hypertension control at follow-up) are set by drawing their values from a counterfactual distribution obtained from the standard population after intervening to set the intervention condition (e.g., to treatment or control). The distribution used to draw the assigned values for the criteria (the assigned distribution) is a counterfactual distribution under the treatment condition when the intervention is set to treatment, and under the control condition when the intervention is set to control. This has two implications. First, because the assigned distribution differs between arms, the direct effect captures effects of the intervention on decision-making that operate through the criteria. Second, because all social groups within an arm are assigned the same distribution, it does not capture effects of the intervention on disparity in the criteria. In this way, the effects of the intervention on disparity in decision-making (e.g., treatment intensification) are not obscured by effects of the intervention on disparity in the criteria (e.g., hypertension control).
On the risk difference (i.e., additive) scale, we define the direct effect , as
(13)
where
Relative effects can also be defined to resemble (2). See the Supplemental Material for sample interventional direct effects of treatment on disparity among the treated (SITT-D).
The direct effect is “direct” in the sense that it captures a sample-level effect on disparity in healthcare decisions (e.g., those made by providers) that is not due to the intervention’s effect on disparity in the distribution of the allowable criteria used to inform those decisions. As a sample-level effect, it differs from sample average direct effects, which are defined by averaging individual-level direct effects (Didelez et al., 2006; Pearl, 2001; Robins & Greenland, 1992; Robins et al., 2022; Vanderweele et al., 2014).
Estimation of Direct Effects
We propose weighting and g-computation procedures to estimate the SITE-D. Modifications for estimating the SITT-D and under loss to follow-up appear in the Supplementary Material. The approaches rely on the “identifying” assumptions (Hernán & Robins, 2020) (beyond those of the SATE-D), namely unconfoundedness, positivity, consistency, and overlap for the effect of the post-intervention allowables on the outcome , and for the effect of the intervention on the post-intervention allowables , which we describe in the Supplemental Material. If these assumptions hold, we can estimate each average potential outcome under each intervention arm and racial group .
To estimate for the SITE-D, the interventional direct effect on disparity, by weighting, we can take those from a social group in the intervention (or control) arm and take a weighted average of their observed outcomes (e.g., treatment intensification) with the following weight, which incorporates the weight (7) used to estimate the SATE-D:
(14)
The first term of the weight (14) takes each social group within each arm and shifts the actual distribution of the post-intervention criteria (e.g., hypertension control) to the assigned distribution. This distribution depends on their intervention condition, but not their social group. The denominator is the actual distribution of the post-intervention allowables given the non-allowables and the baseline allowables among those in the intervention arm and social group . Estimating it requires modeling the actual distribution of given and among those with and . To aid the exposition, we assume that is discrete with a tractable number of levels (e.g., hypertension control defined in stages I-IV (Whelton et al., 2018)) so that a multinomial logistic regression model is appropriate. The predicted values of this model serve as the weight’s denominator. The numerator is similar, except that it is defined among those in the standard population (rather than among the social group ) and is marginalized over the distribution of the allowables and non-allowables among the standard population . Estimating it requires fitting another multinomial logistic regression model for (this time fit among the standard population in the arm ) given , and then obtaining predicted values from this second model among the standard population. The next step is to fit an intercept-only multinomial logistic regression model for among the standard population. The predicted values from this final model are the weight’s numerator.
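The first term of the weight can be sketched as follows for a discrete criterion. To keep the example short and dependency-free, it swaps the multinomial logistic regressions described above for empirical (saturated) frequencies within strata of the allowables and non-allowables; it therefore assumes those covariates are discrete and coded as a single integer stratum per person, and that positivity holds. It computes only the first term; in practice this would be multiplied by the SATE-D weight (7), which is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def _cond_freq(level, strata, mask, n_levels):
    """Empirical P(L = l | stratum) among rows in `mask`, keyed by stratum."""
    table = {}
    for s in np.unique(strata[mask]):
        rows = mask & (strata == s)
        table[s] = np.bincount(level[rows], minlength=n_levels) / rows.sum()
    return table


def criteria_shift_weight(level, strata, z, r, std, z_val, r_val, n_levels):
    """First term of the SITE-D weight for members of arm z_val and group r_val.

    level: post-intervention criteria coded 0, ..., n_levels - 1 (int array)
    strata: integer code for each combination of allowables and non-allowables
    std: boolean mask for the standard population
    Returns the row indices of arm z_val / group r_val and their weights.
    """
    # Numerator model: P(L | strata) fit in the standard population, arm z_val.
    num_tab = _cond_freq(level, strata, std & (z == z_val), n_levels)
    missing = set(np.unique(strata[std])) - set(num_tab)
    if missing:
        raise ValueError(f"no arm-{z_val} data in strata {sorted(missing)}")
    # Predict for the whole standard population, then average the predicted
    # probabilities (the role of the intercept-only model in the text).
    numerator = np.mean([num_tab[s] for s in strata[std]], axis=0)
    # Denominator model: P(L | strata) fit in arm z_val, group r_val.
    grp = (z == z_val) & (r == r_val)
    den_tab = _cond_freq(level, strata, grp, n_levels)
    idx = np.flatnonzero(grp)
    weights = np.array([numerator[level[i]] / den_tab[strata[i]][level[i]]
                        for i in idx])
    return idx, weights
```

A weighted mean such as `np.average(y[idx], weights=w * w7[idx])`, where `w7` is a hypothetical array holding the SATE-D weight (7), would then estimate the standardized mean for that arm and group. With a single stratum, the returned weights reshape the group's distribution of the criteria into the standard population's distribution.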
To estimate for the SITE-D by g-computation, we build upon the g-computation procedure for the SATE-D (9). The procedure for the SITE-D adds a few preliminary steps, the output of which is plugged in as the initial outcome for the SATE-D procedure. It relies on the re-expression of as an iterated conditional expectation:
(15)
In preliminary step (i), we regress the outcome on the baseline allowables (e.g., age and gender), the non-allowables (e.g., SES), and the post-intervention allowables (e.g., hypertension control) among an intervention arm and social group , e.g.,
(16)
where is some link function (e.g., identity or logit). In preliminary step (ii), among those in the standard population with , we obtain predicted values from the model fit in step (i) (e.g., from (16)), and call those predictions . In preliminary step (iii), among those in the standard population with , we fit a weighted regression where the predicted values from step (ii) are the regressand and the allowables and non-allowables are the regressors, e.g.,
(17)
using the following weights,
(18)
The weights (18) ensure that the average taken over the post-intervention allowables , which occurs in the regression (17), is over the appropriate counterfactual distribution of . In preliminary step (iv) we obtain predicted values from the model fit in preliminary step (iii), and call those predictions . This completes the preliminary steps. The final predictions then serve as the starting outcome for the g-computation procedure described for the SATE-D (9).
For the estimation of direct effects through weighting or g-computation, we suggest the same non-parametric, balanced, stratified cluster bootstrap described for the estimation of total effects. The statistical performance of the weighting and g-computation approaches for the SATE-D and SITE-D is compared in a brief simulation study in the Supplemental material.
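The balanced, stratified cluster bootstrap can be sketched as follows. It uses the usual permutation construction of a balanced bootstrap: within each stratum, the list of clusters is repeated once per replicate, shuffled, and cut into equal blocks, so every cluster is drawn exactly the same number of times over all replicates while each block may still contain repeated clusters. The stratum labels are an assumption supplied by the analyst (e.g., intervention arm, or, as in the application below, arm crossed with which racial groups are present at a practice site), and the function name is hypothetical.

```python
import numpy as np


def balanced_cluster_bootstrap(cluster, stratum, n_boot, rng):
    """Row indices for each replicate of a balanced, stratified cluster bootstrap.

    cluster: cluster id (e.g., practice site) for each observation
    stratum: stratum label for each observation (e.g., intervention arm);
             each cluster is assumed to lie in exactly one stratum
    rng: a numpy Generator, e.g., np.random.default_rng(seed)
    Returns n_boot integer arrays of row indices (clusters may repeat).
    """
    rows_of = {c: np.flatnonzero(cluster == c) for c in np.unique(cluster)}
    picks = [[] for _ in range(n_boot)]
    for s in np.unique(stratum):
        clusters = np.unique(cluster[stratum == s])
        k = len(clusters)
        # Repeat the cluster list n_boot times, shuffle, and cut into n_boot
        # blocks of k clusters; every cluster appears exactly n_boot times
        # over all blocks, and a single block may contain repeats.
        pool = rng.permutation(np.tile(clusters, n_boot))
        for b in range(n_boot):
            picks[b].extend(pool[b * k:(b + 1) * k])
    # Keep every observation in each sampled cluster (repeats included).
    return [np.concatenate([rows_of[c] for c in chosen]) for chosen in picks]
```

Each returned index array selects one bootstrap data set on which the estimator (e.g., the SATE-D or SITE-D) is recomputed; an interval, for example a percentile interval, can then be read off the replicate estimates.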
Application
We applied our methods to the RICH LIFE project to examine the total effect of the intervention, which was delivered for 1 year, on the Black-White disparity in hypertension control at 2 years’ follow-up. While the direct effect on treatment intensification is of interest, this outcome is not yet available. To assess the potential for confounding and selection bias, we compared the distribution of baseline predictors of hypertension control and study attrition across intervention arms separately for each group (see Supplemental Table 3). We estimated the average potential outcomes and total effect on disparity (SATE-D) of the RICH LIFE intervention (treatment [CC/SC arm] versus control [SCP arm]) on hypertension outcomes coded as controlled (gain) and again as uncontrolled (shortfall) at 2 years’ follow-up. For each coding, we estimated effects as a prevalence difference and as a prevalence ratio. For the reasons discussed earlier (see “allowability”), we chose age and gender as allowable and Black participants as the standard population. To adjust for potential confounding and selection bias, we treated baseline measures of marital status, educational attainment, employment, smoking, systolic blood pressure, and medication adherence as potential non-allowable confounders and adjusted for them using the weighting and g-computation estimators for total effects, adapted for right censoring (see Supplemental Material). To simplify the application, we excluded the four participants with missing covariate data and the 67 individuals who died after baseline but before hypertension control could be measured. We used the balanced non-parametric cluster bootstrap, stratified by intervention arm and by the presence (or absence) of each racial group within the practice site, to obtain 95% confidence intervals.
Table 1 reports the average potential outcomes and effect estimates by outcome coding type and analytic approach. The results suggest that the 1-year CC/SC intervention may reduce uncontrolled hypertension at 2 years among both racial groups. However, this potentially sustained effect appears similar across racial groups, with no evident impact on the disparity on either the additive or the relative scale (all confidence intervals for the effect on disparity include the null value). This may reflect the higher prevalence of personal and structural barriers to achieving hypertension control among Black participants (Supplemental Table 3), which may have been exacerbated during the COVID-19 pandemic. The direction of effects was similar on both scales and by coding type.
Table 1. Average potential outcomes and intervention effects on disparity, with 95% confidence intervals, by outcome coding and analytic approach

| | Weighting | G-computation |
|---|---|---|
| Uncontrolled hypertension | | |
| Black mean (R = 1) | | |
| Treatment (Z = 1) | 0.43 (0.37, 0.50) | 0.43 (0.37, 0.49) |
| Control (Z = 0) | 0.46 (0.40, 0.53) | 0.46 (0.40, 0.53) |
| White mean (R = 0) | | |
| Treatment (Z = 1) | 0.26 (0.15, 0.33) | 0.25 (0.16, 0.33) |
| Control (Z = 0) | 0.29 (0.23, 0.36) | 0.29 (0.24, 0.35) |
| Intervention effect on disparity | | |
| Additive | 0.01 (−0.10, 0.14) | 0.01 (−0.11, 0.13) |
| Relative | 1.09 (0.75, 1.81) | 1.09 (0.75, 1.75) |
| Controlled hypertension | | |
| Black mean (R = 1) | | |
| Treatment (Z = 1) | 0.57 (0.50, 0.63) | 0.57 (0.51, 0.63) |
| Control (Z = 0) | 0.54 (0.47, 0.60) | 0.54 (0.47, 0.60) |
| White mean (R = 0) | | |
| Treatment (Z = 1) | 0.74 (0.67, 0.85) | 0.75 (0.67, 0.83) |
| Control (Z = 0) | 0.71 (0.65, 0.76) | 0.71 (0.65, 0.76) |
| Intervention effect on disparity | | |
| Additive | −0.01 (−0.14, 0.10) | −0.01 (−0.13, 0.10) |
| Relative | 0.99 (0.82, 1.19) | 0.99 (0.83, 1.19) |
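Table 1 reports each outcome both as a gain (controlled) and as a shortfall (uncontrolled), on the additive and relative scales. The arithmetic behind those four summaries can be sketched as below, assuming the additive effect is the change across arms in the marginalized-minus-privileged difference and the relative effect is the ratio across arms of the marginalized-to-privileged ratios; these definitions should be checked against equations (1) through (5). The input means are hypothetical, not RICH LIFE estimates, and the function name is hypothetical.

```python
def disparity_effects(gain):
    """Additive and relative intervention effects on disparity.

    gain[(z, r)]: standardized mean of the favorable outcome (e.g., controlled
    hypertension) in arm z (1 = treatment, 0 = control) for group r
    (1 = marginalized, 0 = privileged). The shortfall is 1 - gain.
    """
    def effects(mu):
        # Change in the marginalized-minus-privileged difference across arms.
        additive = (mu[1, 1] - mu[1, 0]) - (mu[0, 1] - mu[0, 0])
        # Ratio of the marginalized-to-privileged ratios across arms.
        relative = (mu[1, 1] / mu[1, 0]) / (mu[0, 1] / mu[0, 0])
        return additive, relative

    shortfall = {key: 1.0 - value for key, value in gain.items()}
    return {"gain": effects(gain), "shortfall": effects(shortfall)}


# Hypothetical means, for illustration only.
example = disparity_effects({(1, 1): 0.60, (1, 0): 0.80, (0, 1): 0.50, (0, 0): 0.75})
```

With these hypothetical means, the additive effect simply changes sign between codings (0.05 for the gain, −0.05 for the shortfall), whereas the relative effect is 1.125 for the gain but 1.0 for the shortfall. This illustrates why the scale and the outcome coding need to be chosen deliberately.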
Discussion
Motivated by the design of the RICH LIFE Project, we proposed analytic strategies for evaluating effects of multilevel interventions on disparity in health and health-related decisions. These analytic strategies were provided for inference in the entire trial sample (SATE-D and SITE-D), as emphasized in the main text, or for inference on the treated sample (SATT-D and SITT-D), as provided in the Supplemental Material. Effects among the treated are relevant for non-randomized designs when the treatment condition may not be expanded to the control sites that did not receive it as part of the trial.
The proposed approach enables analysts to estimate average potential outcomes for each racial group under each intervention condition (treatment or control), thus providing results on both additive and relative scales. It also allows analysts to separate the balancing of covariates across social groups (to measure disparity) from the balancing of covariates across intervention arms and loss to follow-up (to account for confounding and selection bias). Because the analytic approach is flexible with respect to effect scale and to how covariates are balanced (across social groups and/or intervention arms), it allows analysts to build their values about what is equitable in the distribution of health into the measurement of intervention effects.
Standard approaches such as regression analysis do not offer this degree of flexibility in scale or in how covariates are balanced and may not align with analysts’ or stakeholders’ underlying value judgments regarding equity. Furthermore, the ability of our analytic approach to provide marginal counterfactual means is helpful in clearly describing the intervention effect, especially the potential impact of the intervention if applied to the entire trial sample or only the marginalized social group. Standard approaches also do not allow for the estimation of effects on decision-based process outcomes such as treatment decision-making.
The total effects (SATE-D and SATT-D) are related to other approaches (Howe et al., 2018; Jackson & VanderWeele, 2018; Lundberg, 2022; Naimi et al., 2016; VanderWeele & Robinson, 2014) that estimate effects of hypothetical interventions on outcomes contrasted across social groups. Our approach differs in several ways, including by examining the total effect of an actual intervention on disparity with two intervention conditions (treatment and control) rather than a single condition, by examining the total effect on disparity among those in the treated arm (SATT-D), and by explicitly considering allowable covariates and how they are standardized.
Our contribution also includes novel population-level direct effects for decision-based outcomes (SITE-D and SITT-D) that have not been considered before. They differ from the controlled direct effect (Pearl, 2001; Robins & Greenland, 1992), natural direct effect (Pearl, 2001; Robins & Greenland, 1992), principal stratum direct effect (Frangakis & Rubin, 2002), randomized interventional analog direct effect (Didelez et al., 2006; Geneletti, 2007; Vanderweele et al., 2014), organic direct effect (Lok, 2016), generalized direct effect (Nguyen et al., 2020), interventional direct effect (Robins et al., 2022), and causal influence direct effect (Díaz, 2023) in that (i) they are defined at the population level, whereas the existing direct effects are defined at the individual level and then averaged, and (ii) they remove only the impact of the intervention on disparity in an intermediate variable or its perception (the post-intervention allowable criteria), rather than removing all intervention effects through the intermediate variable. The novel direct effect estimands are designed to measure an intervention’s effect on disparity in decision-making rather than to provide mechanistic insight. Therefore, we do not define a complementary indirect effect as the difference between the total and direct effects. As a result, the novel direct effects (SITE-D and SITT-D) avoid the criticisms (Miles, 2022) of indirect effects based on stochastic actions (Didelez et al., 2006; Geneletti, 2007; Vanderweele et al., 2014) on the intermediate variable.
Our approach is not without limitations. First, we note that the effects on disparity among the treated (the SATT-D and SITT-D presented in the Supplemental Material) may also be identified by adaptations of a difference-in-difference approach (Caniglia & Murray, 2020) or, more accurately, by the triple-difference estimators used in economics to study effects on disparity. In those settings, the baseline outcomes can in certain cases be leveraged to enable weaker assumptions about confounding (Caniglia & Murray, 2020; Tchetgen et al., 2023), but the details for our total and direct effects on disparity are left for future work. Second, our focus was on intention-to-treat effects, but investigators and stakeholders may be interested in per-protocol effects (Rojas-Saunero et al., 2022) that account for varying degrees of adherence to the intervention condition at the patient or site level. Such effects are of great interest in implementation science and will be considered in future work. Third, our approach for decision-based outcomes was limited to a single decision at the end of follow-up. Future work will consider effects on the trajectory of decisions over follow-up to better summarize effects on disparate patterns of decision-making. Fourth, our results for the SITE-D and SITT-D assume that any confounding of the post-intervention allowable criteria is through baseline covariates; this assumption will also be relaxed in future work. Finally, our sample size determination procedure is based on desirable precision (e.g., confidence interval width), which reflects a focus on estimation. While this focus has many advantages (Rothman & Greenland, 2018), many users and funding agencies may prefer sample sizes based on power for hypothesis testing. We suspect that permutation-based tests (Good, 1994) may be useful in this regard and leave this for future work.
Our approach does carry design implications. Estimating the SATE-D and SATT-D requires investigators to consider which baseline covariates are relevant for measuring disparity, and which additional non-allowable covariates may be prognostic for the outcome, so that they can be accounted for in the analysis when there is evidence of potential confounding or selection bias. Estimating the SITE-D and SITT-D requires measuring any post-intervention criteria that may be relevant for decision-making. Ideally, these criteria would be measured just before the decision-based outcome, along with any variables that might confound these criteria.
Our contribution represents a potentially powerful approach for evaluating effects of multilevel interventions on disparity. This approach is also applicable to evaluating the effects of single-level interventions. In the Supplementary Material we provide sample analytic R code and a sketch of a simulation-based procedure for sample size determination based on precision.
Author Contribution
JWJ conceived of the work, developed the formal results, implemented the statistical analyses, and wrote the first draft of the manuscript; YJH and KAC created the analytic dataset and critiqued the manuscript for scientific content; LZ critiqued the manuscript for clarity and scientific content; KAC, JAM, and LAC conducted the RICH LIFE project, provided substantive guidance on the motivating example and statistical analysis, and critiqued the manuscript for clarity and scientific content.
Funding
JWJ was supported by a research grant from the National Heart Lung and Blood Institute (NHLBI) (K01 HL145320). The RICH LIFE Project was supported by the Patient-Centered Outcomes Research Institute (PCORI) and the NHLBI (UH3 HL130688). The RICH LIFE Project is registered at clinicaltrials.gov, registration number NCT02674464.
Data Availability
The data underlying this article will be shared on reasonable request to the corresponding author.
Declarations
Ethical Approval
This study was performed in line with the principles of the Declaration of Helsinki. The RICH LIFE Project was approved by the Johns Hopkins Institutional Review Board, protocol number 00085630.
Informed Consent
Oral informed consent was obtained from all individual participants included in the RICH LIFE study that served as a data application example for this manuscript.
Conflict of Interest
The authors declare no competing interests.
Footnotes
The R, SAS, and Stata statistical software packages do have built-in routines to marginalize over parameter estimates and report risk differences and, with some effort, risk ratios, although these routines have the limitation of treating all covariates as “allowable” by default (SAS Institute Inc., 2018; Sjölander, 2018; Williams, 2012).
To our knowledge, a general, systematic approach for choosing what is allowable versus non-allowable is not yet available. Discussion of helpful principles is given in Jackson (2021). We suspect that the choice will, at the least, depend on: (i) a theory of how marginalization gives rise to population-level outcomes that are worse among the marginalized group, and (ii) some sense of which variables are morally appropriate in their contribution to health outcomes or healthcare decisions. The choice will always be subjective and guided by moral arguments and theories of (in)justice. We view this as desirable because it centers arguments about inequity, which encourages reflection and discussion. See also Cook et al. (2009). When the choice of what is allowable does not align with the investigator’s underlying judgments about equity, the discrepancy in results can be viewed as a form of bias, which can be quite large. See Chang et al. (2024) for a simulation study demonstrating the magnitude of the bias in a different context.
The allowability dilemma cannot be resolved by carrying out a regression analysis with non-allowable covariates as additional regressors (to eliminate bias from collider-stratification) because this may obscure effects on disparity.
The effect is defined by standardizing potential outcomes. For such an effect, the standardized potential outcome can be interpreted as the average potential outcome that would be observed in a study design where the eligible trial participants were first sampled (at baseline before the intervention to set to ) in such a way that the allowables of each group are distributed as in the within-sample standard population denoted by (which balances the allowables across groups). After such sampling, the intervention is applied. For more conceptual and formal details underpinning this interpretation, see Jackson et al. (2022).
The parameter in equation (5) refers to potential outcomes where the parameter is indexed by the assigned intervention status , denoted as a superscript, whereas in equations (1) through (4) refers to observed outcomes indexed by the observed intervention status , denoted parenthetically.
See the earlier footnote under total effects regarding the interpretation of a standardized potential outcome in the definition of the effect. In the case of the direct effect, it refers to the average potential outcome that would be seen in a design where eligible trial participants are first sampled at baseline, before any intervention is applied; then, among those sampled and retained in the trial, the intervention is applied, and the action assigning the post-intervention criteria is applied just before measurement of the outcome. See Jackson et al. (2022) for further details on interpretation.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Aronow PM, Samii C. Does regression produce representative estimates of causal effects? Am. J. Pol. Sci. 2016;60:250–267. doi: 10.1111/ajps.12185. [DOI] [Google Scholar]
- Asada, Y. (2010). On the choice of absolute or relative inequality measures. Milbank Quarterly, 88, 616–622; discussion 623–617. [DOI] [PMC free article] [PubMed]
- Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine. 2015;34:3661–3679. doi: 10.1002/sim.6607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bertrand M, Duflo E. Field experiments on discrimination. In: Banerjee AV, Duflo E, editors. Handbook of field experiments. Elsevier; 2017. pp. 309–393. [Google Scholar]
- Caniglia EC, Murray EJ. Difference-in-difference in the time of cholera: A gentle introduction for epidemiologists. Current Epidemiology Reports. 2020;7:203–211. doi: 10.1007/s40471-020-00245-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chang T-H, Nguyen TQ, Jackson JW. The importance of equity value judgements and estimator-estimand alignment in measuring disparity and identifying targets to reduce disparity. American Journal of Epidemiology. 2024;193:536–547. doi: 10.1093/aje/kwad209. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cole SR, Hernán MA. Fallibility in estimating direct effects. International Journal of Epidemiology. 2002;31:163–165. doi: 10.1093/ije/31.1.163. [DOI] [PubMed] [Google Scholar]
- Cook BL, McGuire TG, Meara E, Zaslavsky AM. Adjusting for health status in non-linear models of health care disparities. Health Serv. Outcomes Res. Methodol. 2009;9:1–21. doi: 10.1007/s10742-008-0039-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cooper LA, Hill MN, Powe NR. Designing and evaluating interventions to eliminate racial and ethnic disparities in health care. Journal of General Internal Medicine. 2002;17:477–486. doi: 10.1046/j.1525-1497.2002.10633.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cooper LA, Marsteller JA, Carson KA, Dietz KB, Boonyasai RT, Alvarez C, et al. The RICH LIFE Project: A cluster randomized pragmatic trial comparing the effectiveness of health system only vs. health system Plus a collaborative/stepped care intervention to reduce hypertension disparities. American Heart Journal. 2020;226:94–113. doi: 10.1016/j.ahj.2020.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Davison A, Hinkley D. Bootstrap methods and their application. Cambridge University Press; 1997. [Google Scholar]
- Díaz, I. (2023). Non-agency interventions for causal mediation in the presence of intermediate confounding. Journal of the Royal Statistical Society Series B: Statistical Methodology.
- Didelez, V., Dawid, P., & Geneletti, S. (2006). Direct and indirect effects of sequential treatments. UAI'06: Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence pp. 138–146).
- Duan N, Meng X-L, Lin JY, Chen C-N, Alegria M. Disparities in defining disparities: Statistical conceptual frameworks. Statistics in Medicine. 2008;27:3941–3956. doi: 10.1002/sim.3283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Field CA, Welsh AH. Bootstrapping clustered data. J. R. Stat. Soc. Series B Stat. Methodol. 2007;69:369–390. doi: 10.1111/j.1467-9868.2007.00593.x. [DOI] [Google Scholar]
- Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29. doi: 10.1111/j.0006-341X.2002.00021.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Geneletti S. Identifying direct and indirect effects in a non-counterfactual framework. Journal of the Royal Statistical Society Series B Statistical Methodology. 2007;69:199–215. doi: 10.1111/j.1467-9868.2007.00584.x. [DOI] [Google Scholar]
- Gleason JR. Algorithms for balanced bootstrap simulations. American Statistician. 1988;42:263–266. doi: 10.1080/00031305.1988.10475581. [DOI] [Google Scholar]
- Good, P. I. (1994). Permutation tests: a practical guide to resampling methods for hypothesis testing. New York, NY: Springer Science & Business Media.
- Harper S, King NB, Meersman SC, Reichman ME, Breen N, Lynch J. Implicit value judgments in the measurement of health inequalities. Milbank Quarterly. 2010;88:4–29. doi: 10.1111/j.1468-0009.2010.00587.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hernán MA, Robins JM. Causal inference: What if. Chapman & Hall/CRC; 2020. [Google Scholar]
- Howe CJ, Dulin-Keita A, Cole SR, Hogan JW, Lau B, Moore RD, et al. Evaluating the population impact on racial/ethnic disparities in hiv in adulthood of intervening on specific targets: A conceptual and methodological framework. American Journal of Epidemiology. 2018;187:316–325. doi: 10.1093/aje/kwx247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang FL. Using cluster bootstrapping to analyze nested data with a few clusters. Educational and Psychological Measurement. 2018;78:297–318. doi: 10.1177/0013164416678980. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hysong SJ. Meta-analysis: Audit and feedback features impact effectiveness on care quality. Medical Care. 2009;47:356–363. doi: 10.1097/MLR.0b013e3181893f6b. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Institute of Medicine Committee on Understanding and Eliminating Racial Ethnic Disparities in Health Care. (2003). Unequal treatment: Confronting racial and ethnic disparities in health care. Washington (DC): National Academies Press (US). [PubMed]
- Jackson, J. W., Hsu, Y.-J., Greer, R. C., Boonyasai, R. T., & Howe, C. J. (2022). The observational target trial: A conceptual model for measuring disparity. arXiv [stat.ME].
- Jackson JW. Diagnostics for confounding of time-varying and other joint exposures. Epidemiology (Cambridge, Mass.) 2016;27:859–869. doi: 10.1097/EDE.0000000000000547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jackson JW. Meaningful causal decompositions in health equity research: Definition, identification, and estimation through a weighting framework. Epidemiology (Cambridge, Mass.) 2021;32:282–290. doi: 10.1097/EDE.0000000000001319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jackson JW, VanderWeele TJ. Decomposition analysis to identify intervention targets for reducing disparities. Epidemiology (Cambridge, Mass.) 2018;29:825–835. doi: 10.1097/EDE.0000000000000901. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kjellsson G, Gerdtham U-G, Petrie D. Lies, damned lies, and health inequality measurements: Understanding the value judgments. Epidemiology (Cambridge, Mass.) 2015;26:673–680. doi: 10.1097/EDE.0000000000000319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lok JJ. Defining and estimating causal direct and indirect effects when setting the mediator to specific values is not feasible. Statistics in Medicine. 2016;35:4008–4020. doi: 10.1002/sim.6990. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lundberg, I. (2022). The gap-closing estimand: A causal approach to study interventions that close disparities across social categories. Sociological Methods Research, 00491241211055769.
- Mackenbach JP, Gunning-Schepers LJ. How should interventions to reduce inequalities in health be evaluated? Journal of Epidemiology and Community Health. 1997;51:359–364. doi: 10.1136/jech.51.4.359. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Miles, C. H. (2022). On the causal interpretation of randomized interventional indirect effects. arXiv [stat.ME].
- Mills KT, Obst KM, Shen W, Molina S, Zhang H-J, He H, et al. Comparative effectiveness of implementation strategies for blood pressure control in hypertensive patients: A systematic review and meta-analysis. Annals of Internal Medicine. 2018;168:110–120. doi: 10.7326/M17-1805.
- Mueller M, Purnell TS, Mensah GA, Cooper LA. Reducing racial and ethnic disparities in hypertension prevention and control: What will it take to translate research into practice and policy? American Journal of Hypertension. 2015;28:699–716. doi: 10.1093/ajh/hpu233.
- Muñoz ID, van der Laan M. Population intervention causal effects based on stochastic interventions. Biometrics. 2012;68:541–549. doi: 10.1111/j.1541-0420.2011.01685.x.
- Murphy SL, Xu J, Kochanek KD. Deaths: Final data for 2010. National Vital Statistics Reports. 2013;61:1–117.
- Naimi AI, Schnitzer ME, Moodie EEM, Bodnar LM. Mediation analysis for health disparities research. American Journal of Epidemiology. 2016;184:315–324. doi: 10.1093/aje/kwv329.
- Nguyen TQ, Schmid I, Stuart EA. Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn. Psychological Methods. 2020.
- Pearl J. Direct and indirect effects. In: Breese J, Koller D, editors. Uncertainty in Artificial Intelligence: Proceedings of the Seventeenth Conference. Morgan Kaufmann; 2001. p. 411–420.
- Ren J, Cislo P, Cappelleri JC, Hlavacek P, DiBonaventura M. Comparing g-computation, propensity score-based weighting, and targeted maximum likelihood estimation for analyzing externally controlled trials with both measured and unmeasured confounders: A simulation study. BMC Medical Research Methodology. 2023;23:18. doi: 10.1186/s12874-023-01835-6.
- Ren S, Lai H, Tong W, Aminzadeh M, Hou X, Lai S. Nonparametric bootstrapping for hierarchical data. Journal of Applied Statistics. 2010;37:1487–1498. doi: 10.1080/02664760903046102.
- Robins JM, Richardson TS, Shpitser I. An interventionist approach to mediation analysis. In: Probabilistic and Causal Inference: The Works of Judea Pearl. New York, NY, USA: Association for Computing Machinery; 2022. p. 713–764.
- Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. doi: 10.1016/0270-0255(86)90088-6.
- Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology (Cambridge, Mass.) 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
- Rojas-Saunero LP, Labrecque JA, Swanson SA. Invited commentary: Conducting and emulating trials to study effects of social interventions. American Journal of Epidemiology. 2022;191:1453–1456. doi: 10.1093/aje/kwac066.
- Rothman K, Greenland S. Planning study size based on precision rather than power. Epidemiology (Cambridge, Mass.) 2018;29:599–603. doi: 10.1097/EDE.0000000000000876.
- Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701. doi: 10.1037/h0037350.
- SAS Institute Inc. The CAUSALTRT procedure. In: SAS/STAT 15.1 User’s Guide. Cary, NC: SAS Institute Inc.; 2018. p. 2365–2423.
- Simpson EH. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological). 1951;13:238–241. doi: 10.1111/j.2517-6161.1951.tb00088.x.
- Sjölander A. Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology. 2018;33.
- Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: Demonstration of a causal inference technique. American Journal of Epidemiology. 2011;173:731–738. doi: 10.1093/aje/kwq472.
- Tchetgen Tchetgen E, Park C, Richardson D. Universal difference-in-differences for causal inference in epidemiology. arXiv [stat.ME]. 2023.
- Thurber KA, Thandrayen J, Maddox R, Barrett EM, Walker J, Priest N, et al. Reflection on modern methods: Statistical, policy and ethical implications of using age-standardized health indicators to quantify inequities. International Journal of Epidemiology. 2022;51:324–333. doi: 10.1093/ije/dyab132.
- Tolbert AW, Jackson JW. Trading places: A causal measure of discrimination when the perception of race is not manipulable. Working paper. 2024.
- VanderWeele TJ, Robinson WR. On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology (Cambridge, Mass.) 2014;25:473–484. doi: 10.1097/EDE.0000000000000105.
- VanderWeele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology (Cambridge, Mass.) 2014;25:300–306. doi: 10.1097/EDE.0000000000000034.
- Viswanathan M, Kraschnewski JL, Nishikawa B, Morgan LC, Honeycutt AA, Thieda P, et al. Outcomes and costs of community health worker interventions: A systematic review. Medical Care. 2010;48:792–808. doi: 10.1097/MLR.0b013e3181e35b51.
- Walsh J, McDonald KM, Shojania KG, Sundaram V, Nayak S, Davies S, et al. Closing the quality gap: A critical analysis of quality improvement strategies (Vol. 3: Hypertension care). Rockville (MD): Agency for Healthcare Research and Quality (US); 2005.
- Wen L, Young JG, Robins JM, Hernán MA. Parametric g-formula implementations for causal survival analyses. Biometrics. 2021;77:740–753. doi: 10.1111/biom.13321.
- Whelton PK, Carey RM, Aronow WS, Casey DE, Jr, Collins KJ, Dennison Himmelfarb C, et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: A report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Hypertension. 2018;71:e13–e115. doi: 10.1161/HYP.0000000000000065.
- Williams R. Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal. 2012;12:308–331. doi: 10.1177/1536867X1201200209.
Data Availability Statement
The data underlying this article will be shared on reasonable request to the corresponding author.