Author manuscript; available in PMC: 2026 Apr 7.
Published in final edited form as: Annu Rev Public Health. 2026 Apr;47(1):1–18. doi: 10.1146/annurev-publhealth-071123-032637

Causal Inference in Health Disparities Research

John W Jackson
PMCID: PMC13052304  NIHMSID: NIHMS2123377  PMID: 41927499

Abstract

Causal inference is a central endeavor in health disparities research. For decades, it has been used to both measure and explain disparity and discrimination and to evaluate the impact of interventions on disparity and discrimination. This article reviews the use of causal inference methods for each of these endeavors, highlighting critical challenges that emerging work attempts to overcome. A key feature of a newer proposal is to use a descriptive measure of disparity that builds in normative and ethical assumptions and then to perform causal inference on that measure of disparity when seeking to inform and evaluate interventions. In this way, the measure of disparity is applicable to real-world data; is consistent across measurement, intervention development, and evaluation efforts; and builds in normative and ethical assumptions that promote transparency, dialogue, debate, and reproducibility. The article also briefly highlights causal inference methods for transformative interventions.

Keywords: disparity, causal inference, allowability, target study, target trial, decomposition

INTRODUCTION

Causal inference is a fundamental part of health disparities research (69, 71), but it has faced challenges. Chief among them are the issues of how disparity should be defined, in causal versus descriptive terms, and how to integrate moral and ethical assumptions into its definition (31, 65). Equally challenging is the task of using observational data about our current world to inform future interventions that could reduce disparities and to do so in a way that not only respects those moral and ethical assumptions but also leverages sufficient information to meet the causal assumptions needed to estimate a causal effect. Such issues also arise when evaluating the effects of existing interventions on disparities.

This article reviews the historical and emerging literature on causal inference for each of these tasks. The first section summarizes and critiques the standard approach to measuring disparities as a statistical or causal decomposition and proposes an alternate descriptive model that builds in normative and ethical assumptions. The second section explores emerging approaches to extend this model to inform and evaluate interventions that address disparities, including transformative interventions. Our review focuses on disparities in health and health care.

MEASURING HEALTH AND HEALTH CARE DISPARITIES

Disparity as Decomposition

Differential outcomes across social groups [e.g., by race, ethnicity, gender, education, rurality, sexual orientation, disability, mental illness (32); henceforth, groups] are observed for many health-related outcomes, including medical care, disease onset, clinical outcomes, and life expectancy (59, 100). Such differences are often termed disparities when they are associated with historical disadvantage (12). Some stakeholders view these differences as reflective of systemic injustice (74). Others grant this possibility but interpret them, at the very least, as deserving of moral concern and policy efforts (60). Yet others claim that such differences, while unfortunate, may have fair and unfair origins and may not hold societal institutions responsible for ameliorating them (121). These different viewpoints reflect a question often raised: How much of a differential outcome can be attributed to unjust, unfair, or morally concerning sources (5)? How much is not mere statistical artifact but could inform policy?

To answer this question, the social sciences developed statistical decomposition models (3, 36). The key aim of these models is to take an observed difference and break it into two pieces: one part that is unfair or unjust and thus a measure of disparity or discrimination, and another part that may not be unfair or unjust and thus not a measure of disparity or discrimination. Various conceptual models each distinguish allowable factors that do not contribute to disparity (or discrimination) from nonallowable factors that do. For example, concerning disparities in prioritized vaccine access under limited supply, allowable factors might include risk of infection, risk of severe illness, or immunocompromised status (105). Concerning geographic disparities in the availability of health care resources, allowable factors might include population density or the prevalence of health conditions (53). Wealth, for example, would be nonallowable (25). With these factors specified, the models compare differences as observed versus those under a counterfactual scenario where allowable factors are balanced while nonallowable factors remain unbalanced (to express the generation of unfair population outcomes from the differential distribution of nonallowables), thus isolating disparity. Such models also regard the differential effects of allowables and nonallowables as part of disparity.

Conceptual Models Based on Decomposition

The most recognized decomposition model is that of Oaxaca (101) and Blinder (10), equivalent to methods proposed by Peters (106) and Belson (7) as well as Kitagawa (75). It has been widely used to assess evidence of discrimination in legal cases (40) and to study labor market discrimination (3). Considering two social groups G = 1 (historically disadvantaged) and G = 0 (historically advantaged), the approach posits group-specific linear regression models for a decision-based outcome D (e.g., medical treatment; yes versus no) given explanatory covariates X (i.e., decision inputs) to represent provider decision-making among each group. The counterfactual ports the outcome model of one group to the other: What if those in group G = 0 received decisions D in the same way as did those in group G = 1? Because the counterfactual swaps how decisions are made, but not how characteristics X that drive decisions are distributed, any difference in the average decision between the factual and counterfactual for G = 0 is interpreted as evidence of discrimination. To identify this model, one must include as covariates all factors that providers use to make decisions (e.g., symptoms, lab results), even factors beyond aspects of clinical need (e.g., patient-provider communication, health insurance). Thus, all factors that potentially drive decisions, even inappropriate factors, are operationally considered as allowable.
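The counterfactual described above can be sketched numerically. The following minimal Python illustration assumes a single covariate and made-up data; the helper name `fit_line` and all values are hypothetical and are not drawn from the cited works. It fits a separate linear model in each group, applies the G = 1 model to the covariates of G = 0, and reports the factual-minus-counterfactual difference for G = 0 that the approach reads as evidence of discrimination.

```python
# Illustrative Oaxaca-Blinder-style decomposition with one covariate.
# All data are hypothetical; d is the observed decision rate at each value of x.

def mean(values):
    return sum(values) / len(values)

def fit_line(x, d):
    """Ordinary least squares for d = intercept + slope * x."""
    mx, md = mean(x), mean(d)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxd = sum((xi - mx) * (di - md) for xi, di in zip(x, d))
    slope = sxd / sxx
    return md - slope * mx, slope

# Group G = 0 (historically advantaged) and G = 1 (historically disadvantaged).
x0, d0 = [0, 1, 2, 3], [0.2, 0.4, 0.6, 0.8]
x1, d1 = [1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]

a1, b1 = fit_line(x1, d1)

# Counterfactual: G = 0 keeps its own covariates but is decided on via the G = 1 model.
counterfactual_g0 = a1 + b1 * mean(x0)

total_gap = mean(d0) - mean(d1)
unexplained = mean(d0) - counterfactual_g0  # factual minus counterfactual for G = 0
explained = counterfactual_g0 - mean(d1)    # due to different covariate distributions

print(round(total_gap, 3), round(unexplained, 3), round(explained, 3))
```

With these made-up numbers, the total gap of 0.25 splits into an unexplained part of 0.35 and an explained part of -0.10. The split depends on which group's model is ported to the other, a choice this sketch fixes only to match the description above.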

The Institute of Medicine (IOM), now known as the National Academy of Medicine, released the seminal report Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care (61), which documented pervasive disparities in health care that it defined as being not due to differences in clinical need or appropriateness, access to care, or patient preferences (i.e., what it considered allowable). To implement this definition, McGuire et al. (90) and Cook et al. (20) sought to generalize the Oaxaca-Blinder decomposition to go beyond linear models and to distinguish allowable factors (e.g., clinical need) from nonallowable factors [e.g., socioeconomic status (SES)]. In their model, the counterfactual G = 0 group must have a joint distribution of clinical need and SES whereby (a) the marginal distribution of clinical need matches that of the G = 1 group and (b) the marginal distribution of SES is retained as that of the G = 0 group. Criterion a balances the allowables across groups, and criterion b is claimed to pick up the mediating role of SES in producing differences in treatment at the population level. The criteria are often emulated by what is known as a rank-and-replace procedure (20, 90). Their counterfactual does not invoke causal assumptions beyond these criteria. This decomposition model is often claimed to represent an IOM-concordant disparity (19, 82).

Duan et al. (30) also implemented the IOM definition of disparity but observed that the criteria of Cook et al. (20) can produce populations where the distribution of allowables (e.g., clinical need) is independent of nonallowables (e.g., SES). They argued that such populations are unrealistic and unsuitable to derive policy-relevant measures of disparity. They proposed a causal decomposition where the G = 1 group is assigned to have the same distribution of clinical need as the G = 0 group, irrespective of SES (when clinical need causes SES) or conditionally on SES (when SES causes clinical need). Unlike the models of McGuire et al. (90) and Cook et al. (20), the marginal nonallowable distribution is not held fixed but is affected by the change in clinical need when, under the causal structure, clinical need impacts SES. Comparing the mean outcome D among the counterfactual G = 1 group and the factual G = 0 group identifies the measure of disparate health care.

The legal and political science literature also generalized the Oaxaca-Blinder decomposition but in a different way. Rather than swapping how outcomes are generated, or how outcome-driving factors are distributed, Greiner & Rubin (44) envision a hypothetical experiment that randomizes a person’s group membership (e.g., race) as perceived by a decision-maker (e.g., a clinical provider) at a moment (e.g., initial clinical visit) relevant for determining the decision-based outcome D (e.g., medical treatment). This scenario is a hypothetical audit study to identify discrimination (8), is similar to a design used to detect disparate treatment in a widely known study of cardiac care (114), and has been applied to study racial discrimination in police shootings (38, 76). It focuses on effects of perceived group membership. The randomization balances all characteristics across groups, and thus all factors, including clinical need and SES, are operationally considered to be allowable.

In computer science, Pearl (104) and others (98, 134, 135) define discrimination mechanistically. Given a causal graph relating a group to an outcome through a set of intermediate variables, the authors conceive of discrimination as the effect of group membership that operates through unfair paths. For example, if, on the graph, one’s group affects clinical need, investigators might consider the path from group to medical treatment through clinical need as a fair path. This model can focus on path-specific effects of perceived group membership. But some focus on the effects of manipulating membership in the group itself (42) perhaps on the grounds that group membership is socially constructed and thus manipulable (79).

Identification Assumptions for Decomposition Models

These conceptual models are elegant theoretically, but their application often rests on unverifiable assumptions. Many of these models envision counterfactuals where the persons’ groups or allowables (e.g., health) are changed by an intervention. To estimate these counterfactuals from data on the observed world (where such interventions have not occurred), assumptions such as conditional exchangeability (no unmeasured confounding), positivity, and consistency (49) are often used. Aside from McGuire et al. (90) and Cook et al. (20), the models rely on these assumptions with respect to the group, the allowables (e.g., health), or both (36, 65). These assumptions have been critiqued for interventions on nonmanipulable factors (48, 71, 115, 125, 129). Here, we examine interventions to change a social group (e.g., race) or an allowable (e.g., health, for a treatment or vaccine access outcome) to understand their plausibility in real-world data.

Under consistency, a person’s counterfactual and observed outcomes align when the hypothetical intervention sets the exposure to its observed status (18, 127). The assumption is usually critiqued by whether the intervention is well-defined enough to expect what the outcome would be from observational data (48, 52, 115, 125, 129). But consistency also reflects assumptions about the observed and counterfactual worlds where the intervention occurs. A corollary of consistency is that the way in which the outcome is conditionally distributed (given the same exposure and confounder values) in the observed world is unchanged by the intervention (30, 63). It implies that the world postintervention is, in a sense, recognizable. This insight offers an avenue for critiquing the plausibility of consistency for hypothetical interventions.

Consider when the social group is race. The intervention to change a person’s assigned race could be achieved by changing how individuals are assigned to racial categories or by changing the rights and privileges accorded to a racial group (79). Looking to historical examples in the United States, we observe that changes in racialization, such as the abolition of slavery and the Reconstruction period (29) or the gains from the civil rights movement, were followed by changes in US society that reflected both advancement and profound backlash (35), including the decades-long Jim Crow era. The world under such interventions was no longer recognizable in its former form. Even proposals to change a person’s perceived race are constrained by how society defines what constitutes race (58, 77, 118). Consider when the allowable is health or underlying medical conditions for measuring disparity in medical treatment or in vaccine access. Interventions to instantly change a person’s health may be feasible for certain medical conditions but are unrealistic for conditions that are chronic and incurable. While new medicines have potential for primary and secondary prevention (45), they are expensive and subject to restrictive health insurance reimbursement policies (33). A world where such medications are broadly available may not be recognizable as our own. The consistency assumption may be implausible for hypothetical interventions that instantly change group membership or allowable covariates.

Under conditional exchangeability, those with alternate exposure values are comparable in all respects relevant to their potential outcomes, conditional on a sufficient set of measured confounders. Under positivity we observe, in the real world, persons for each exposure value of interest within each stratum of confounder values. Consider when the social group is race. Under an intervention to change a person’s assigned racial category in early life, the confounders might involve intergenerational experiences or parental characteristics (e.g., wealth) that not only define or correlate with race but also impact that person’s potential outcomes later in life (17, 56, 129). But these characteristics are often unmeasured in data sources (22) (violating conditional exchangeability) or are tightly bound with racial categorizations (71, 81) (violating positivity). Consider when the allowable is health. Under an intervention to change health, the confounders might involve psychological and physiological factors that determine health. But these factors are likely unmeasured (52) (violating conditional exchangeability) and are highly correlated with health (violating positivity). The conditional exchangeability and positivity assumptions may be implausible for hypothetical interventions that instantly change group or allowable covariates.
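For reference, the three identification assumptions discussed above are often written as follows. The notation is introduced here only for illustration and is not taken from the cited works: A denotes the exposure intervened upon (e.g., the group or an allowable), Y the observed outcome, Y^a the potential outcome under A = a, and C the measured confounders.

```latex
\begin{align*}
\text{Consistency:} \quad & A = a \;\Rightarrow\; Y^{a} = Y \\
\text{Conditional exchangeability:} \quad & Y^{a} \perp\!\!\!\perp A \mid C \quad \text{for each } a \\
\text{Positivity:} \quad & \Pr(A = a \mid C = c) > 0 \quad \text{for each } a \text{ and all } c \text{ with } \Pr(C = c) > 0
\end{align*}
```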

Even the model of McGuire et al. (90) and Cook et al. (20) described above, which is not explicitly causal, requires causal information. It requires investigators to specify and measure all the nonallowable factors that affect the outcome. These nonallowable factors include more than SES and may involve provider bias, patient-provider communication, patient activation, and other social determinants of health (96) that are often unmeasured in data.

This review has described intervention-based models to measure disparity or discrimination, and these models rely on identifying assumptions that are arguably implausible with real-world data. Even if one views these assumptions as purely instrumental (42), their violations preclude the ability to emulate a conceptual model, leading to potentially biased results unsuitable for intervention development and policymaking.

Comparing Similarly Situated Groups

Jackson et al. (65) proposed the target study, an alternative conceptual model for measuring disparity. They abandoned the idea of defining disparity as a decomposition where one breaks a group difference in outcomes into two pieces, one fair and one unfair. In place of a decomposition, the authors propose a highly descriptive model that (a) does not involve interventions on group membership or allowable covariates and (b) defines disparity only in terms of allowable covariates. By avoiding interventions, the authors ground the measurement of disparity in the real world rather than in a hypothetical one and thereby avoid unrealistic causal assumptions. By defining disparity in terms of allowables, investigators do not need to know about or measure all the factors that lead to differential outcomes. These two aspects arguably make the model practicable. The model is formally described in Jackson et al. (65), in which issues of nonrandom sample selection (111) are also addressed, but here we describe the default model that accepts contributions from nonrandom sample selection as a part of disparity. A tutorial that applies the model to measure disparity in treatment decisions within a large health care system is provided by Meche et al. (92). The moral relevance of this measure is arguably forward-looking to the consequences of disparity (60).

The target study (65) conceives of a disparity as a difference between groups that are similarly situated (distributionally or by restriction) on allowable factors whose differential distribution does not lead to disparate outcomes. This is principally an ethical argument, not a causal one. If a study is measuring disparity in vaccine access, it would take groups already alike in qualifying conditions (e.g., risk of infection) and assess whether there are differences in access. This measure is not “but for” discrimination (34), but it is arguably a difference that warrants moral concern (107). The model permits the allowable category to be empty according to investigators’ judgments.

The target study model (65) shares elements with the target trial framework developed by Hernán & Robins (50). It asks investigators to envision a study one could do in real life to measure disparity. In the protocol, one envisions a specific moment or span of calendar time in which eligible individuals are enrolled. From that “time zero” moment, persons are followed for a specified amount of time for outcomes. The model also accommodates enrollment into sequential studies over time where the aggregated results are interpretable as the average disparity across well-defined populations indexed at different points in calendar time.

A key distinction of the target study (65) is that groups are balanced on baseline allowable factors through a stratified sampling plan. A person’s group and allowable covariates are not determined or assigned by investigators. Enrolled groups are balanced on allowable factors (that do not contribute to disparity) but are unbalanced on nonallowable factors (that do contribute to disparity). The target study resembles a study design that compares groups’ treatment receipt among those with clinical need, a design that identifies an IOM-concordant disparity (20, 65). The target study generalizes this design by similarly situating the groups on allowable covariates distributionally, allowing for varying levels and types, e.g., clinical need, within each group (65). It can also be viewed as a study design that yields a standardized measure (65).

In the target study (65), the only action of the hypothetical investigator is to enroll persons through a stratified sampling plan and follow them for outcomes. Thus, for identification and emulation in real-world data, one need only assume sufficient distributional overlap in the allowables and that the hypothetical sampling process does not affect outcomes. These assumptions are more likely to be met than the causal assumptions required to identify and emulate the decomposition models (65). Moreover, as discussed above, the target study does not require one to enumerate or measure any of the nonallowable factors. Thus, the target study model facilitates the measurement of disparity with real-world data.
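The idea of similarly situating groups on allowables can be sketched as a direct standardization. The example below is a minimal illustration under stated assumptions, not the estimator of the cited works: it uses one allowable covariate (a hypothetical two-level clinical need variable), made-up counts, and the allowable distribution of G = 1 as an assumed choice of standard. It also checks the overlap condition mentioned above.

```python
# Illustrative target-study-style standardization on one allowable covariate.
# Hypothetical counts: (group, need stratum) -> (persons, persons treated).
counts = {
    (0, "low"): (60, 30), (0, "high"): (40, 36),
    (1, "low"): (20, 8),  (1, "high"): (80, 56),
}
strata = ["low", "high"]

def crude_rate(group):
    """Unadjusted treatment rate in a group."""
    n = sum(counts[(group, s)][0] for s in strata)
    treated = sum(counts[(group, s)][1] for s in strata)
    return treated / n

def allowable_distribution(group):
    """Share of the group in each stratum of the allowable covariate."""
    n = sum(counts[(group, s)][0] for s in strata)
    return {s: counts[(group, s)][0] / n for s in strata}

def standardized_rate(group, standard):
    """Treatment rate for `group` if its allowable distribution matched `standard`."""
    total = 0.0
    for s in strata:
        n, treated = counts[(group, s)]
        if standard[s] > 0 and n == 0:
            raise ValueError(f"no overlap in stratum {s!r} for group {group}")
        if n > 0:
            total += standard[s] * treated / n
    return total

standard = allowable_distribution(1)  # assumed choice of standard population

crude_difference = crude_rate(0) - crude_rate(1)
disparity = standardized_rate(0, standard) - standardized_rate(1, standard)
print(round(crude_difference, 3), round(disparity, 3))
```

In this made-up example the crude difference is 0.02, while the difference between groups balanced on clinical need is 0.18. No nonallowable factor is enumerated or measured anywhere in the calculation.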

INTERVENING TO REDUCE HEALTH AND HEALTH CARE DISPARITIES

The ultimate goal of health disparities research is to show respect and dignity for all persons by giving everyone an equal opportunity to achieve their highest possible level of health (12). Thus, we want to not only document disparities, but also find out how to address them and see how changes to our world affect them (23, 80). To do so, we need to understand how actual and hypothetical interventions on factors that affect outcomes (risk factors) impact disparity apart from confounding factors. This approach calls for causal analysis, but its integration in disparities research has been challenging.

Practitioners often repurpose decomposition methods to explain group differences (109, 117). Oaxaca-Blinder decompositions model a host of so-called explanatory factors jointly (109, 117) without attention to temporality and control of confounders for each factor (67). They are thus prone to situations where each explanatory factor’s contribution could arise from unmeasured factors (confounding) or where variables affected by a factor are included in the model [selection bias (67)]. This potential bias limits the ability of Oaxaca-Blinder decompositions to inform interventions to reduce disparity. Traditional decomposition methods also explain a crude difference, which may not always be interpreted as a disparity. This repurposed usage of decomposition methods represents a conflation where the same decomposition methods used to take a difference and measure disparity as a component are also used to explain the difference’s sources. If disparity is the policy construct of interest, traditional decomposition methods may explain a less relevant construct: inequality. To inform interventions for disparity, an analysis of disparity is more relevant (see the sidebar titled Defining Inequality, Inequity, and Disparity).

DEFINING INEQUALITY, INEQUITY, AND DISPARITY.

There are no standard definitions for the terms inequality, inequity, and disparity (13). This article defines inequality as an unadjusted association between the outcome and the social group; disparity as an association between the outcome and social group interpreted to be unfair, unjust, or morally concerning, which may require adjustment of allowable covariates; and inequity as a synonym for disparity. Inequality can correlate with disparity and inequity and, at times (e.g., see 4), be synonymous with these terms.

Similar issues arise when evaluating the effects of interventions on group differences, which amounts to characterizing effect modification by group and is equivalent to contrasting the group differences across treatment and control arms [i.e., the effect on group inequality (87)]. When stakeholders do not conceive of difference as disparity, the assessment of effect modification provides information about the intervention’s effect on inequality, a construct that is less relevant. The problem is exacerbated when the effect modification by group is assessed using standard regression models with interaction terms between the group and the intervention; these models are often fit by including covariates to increase precision, or to reduce bias from imbalance across intervention arms or from study attrition. What is modeled is the effect on inequality conditional on the covariates, which may differ in magnitude and direction depending on the covariate set (128). The core problem is that the covariate set, chosen to reduce statistical bias, may include ones that stakeholders would not seek to balance when measuring disparity (66).

These challenges amount to a mismatch between the construct of interest (a disparity defined by normatively chosen covariates) and the construct actually studied (a crude or overadjusted difference), whether by traditional decomposition methods (to inform interventions) or by standard regression models (to evaluate interventions). Progress can be made by integrating the target study (65) and target trial (50) frameworks to evaluate how hypothetical or actual interventions impact disparity. This approach separates the methods used to measure disparity from those used to inform and evaluate interventions that impact disparity (62). This separation allows investigators to propose a meaningful measure of disparity and then assess how that proposed measure would change under a hypothetical or actual intervention.

Informing Interventions to Address Disparities

Recently developed causal decomposition analysis (CDA; see 108 for a review) integrates causal inference and normative judgments to inform interventions that address disparity (62). CDA treats group membership as a characteristic that defines populations rather than as an exposure. Therefore, CDA is not synonymous with causal mediation or path analysis, either conceptually or mathematically (16, 93). Rather, CDA extends from population impact methods (95) such as the attributable risk wherein one compares the distribution of the outcomes as observed versus under a hypothetical shift in the distribution of a factor believed to affect the outcome. CDA asks how a disparity would be reduced if we could remove an existing disparity in a given factor in a normatively and ethically justified way.

CDA can be intuited as a hypothetical study (62, 64, 94). We use a target study protocol (65) to enroll eligible persons and balance outcome-allowable covariates considered appropriate for measuring disparity in the outcome. Within this sample, we carry out a target trial (50) where, within each social group, we randomly assign persons to an observation arm (ARM-I) or intervention arm (ARM-II). In ARM-I, we record the outcomes as they are realized under real-world conditions, yielding an observed disparity. In ARM-II, we alter the distribution of a factor that potentially affects the outcome, conditional on any covariates that, on normative or ethical grounds, are appropriate (i.e., allowable) for determining the factor’s allocation. A simple choice is to shift the conditional distribution of the factor (given intervention-allowable covariates) among the historically disadvantaged group to that of the advantaged group to remove the disparity in the factor. We then record outcomes under this arm, yielding a counterfactual disparity. If the outcome disparity is smaller in ARM-II than it is in ARM-I, interventions to reduce disparity in the factor may be promising.
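The two-arm comparison above can be sketched with a small plug-in calculation. This is a minimal illustration under strong stated assumptions, not one of the published estimators: all probabilities are made up, the sample is taken to be already balanced on outcome-allowables, and a single hypothetical intervention-allowable covariate L is assumed to suffice for conditional exchangeability with respect to a binary factor M.

```python
# Illustrative CDA-style calculation with hypothetical inputs.
# L: intervention-allowable covariate; M: binary factor; Y: adverse outcome.
# Assumes the sample is already balanced on outcome-allowables and that
# L suffices to control confounding of the factor-outcome relationship.

p_L = {"low": 0.5, "high": 0.5}  # P(L = l), taken to be the same in both groups

p_M1 = {  # P(M = 1 | G = g, L = l)
    0: {"low": 0.4, "high": 0.8},
    1: {"low": 0.2, "high": 0.5},
}

risk = {  # E[Y | G = g, L = l, M = m]
    0: {("low", 0): 0.3, ("low", 1): 0.1, ("high", 0): 0.6, ("high", 1): 0.3},
    1: {("low", 0): 0.4, ("low", 1): 0.2, ("high", 0): 0.7, ("high", 1): 0.3},
}

def mean_outcome(outcome_group, factor_group):
    """Mean of Y in `outcome_group` when M follows `factor_group`'s P(M | L)."""
    total = 0.0
    for level, p_level in p_L.items():
        p1 = p_M1[factor_group][level]
        total += p_level * ((1 - p1) * risk[outcome_group][(level, 0)]
                            + p1 * risk[outcome_group][(level, 1)])
    return total

# ARM-I: each group keeps its own factor distribution.
observed_disparity = mean_outcome(1, 1) - mean_outcome(0, 0)
# ARM-II: G = 1 receives the factor as G = 0 does, within levels of L.
counterfactual_disparity = mean_outcome(1, 0) - mean_outcome(0, 0)
reduction = observed_disparity - counterfactual_disparity

print(round(observed_disparity, 3), round(counterfactual_disparity, 3), round(reduction, 3))
```

Here the observed disparity of 0.14 falls to 0.06 when the factor's conditional distribution in G = 1 is shifted to that of G = 0, a reduction of 0.08. With real data, the conditional probabilities and risks would have to be estimated, and any additional confounders handled without redefining the disparity or the intervention.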

This study design is hypothetical, but it can be emulated with observational data under assumptions of overlap and innocuous sampling (for the observed disparity) (65) as well as those of consistency, positivity, and conditional exchangeability with respect to the factor intervened upon [for the counterfactual disparity (62)]. Estimators for CDA respect the way disparity is defined in the design by permitting users to specify which covariates are outcome-allowable and/or intervention-allowable (62). They can incorporate nonallowable covariates that help address confounding or other biases but in a way that does not redefine disparity or the intervention. Traditional decomposition analyses and other approaches do not allow investigators to properly distinguish allowable from nonallowable covariates (62). Failing to respect chosen allowability designations in the estimation can yield significant bias (16). Several estimation procedures for CDA have been proposed, including regression-based (67), density ratio weighting (62), g-formula (103, 122), Bayesian (120), multistate (126), and doubly robust estimators (64, 132). We refer the reader to Qin & Jackson (108) for a review, as some estimators do not fully incorporate allowability for the outcome and/or intervention. Sensitivity analyses for unmeasured confounding are available (102, 119) but require significant simplifying assumptions for practical use. See Smith et al. (120), Guimarães et al. (46), and Dissanayake et al. (28) for examples of CDA applied to interventions on contextual factors with individual-level and contextual-level data.

How are estimates from CDA to be interpreted? One perspective is that CDA is informative of the mechanisms by which disparities are produced. With an additional intervention arm (ARM-III) that universally sets the factor to a constant value (57, 99), this perspective links to traditional decomposition methods that seek to explain disparities by the differential distribution versus differential effects of the factor (68). The difference between disparity under ARM-I and under ARM-II identifies the explanation by differential distribution. The difference between disparity under ARM-II and under ARM-III identifies the explanation by differential effect. Recent work by Yu & Elwert (132) modifies ARM-II (to also intervene on the historically advantaged group) and adds a fourth arm (ARM-IV), which randomizes the factor within each group. The difference between disparity under ARM-I and under ARM-IV identifies an explanation by what they call differential selection (i.e., differential patterns of confounding). The difference in disparity under ARM-IV and under the modified ARM-II identifies the explanation by differential prevalence apart from differential selection patterns. Selection occurs when factor levels (e.g., treatment) are associated with the factor’s effect (e.g., treatment benefit or harm). The goal of this mechanistic interpretation is to explain the factor’s role in producing the disparity. One limitation is that the mechanisms are defined by interventions that could be specified in many ways, each of which could yield components with varying magnitudes. However, their results may still be useful for testing theories of how disparities are generated (26).
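The contrasts described above can be collected in shorthand. The symbols are an assumption made here for compactness and do not come from the cited works: δ with a subscript denotes the disparity measured under the corresponding arm, and II* denotes the modified ARM-II.

```latex
\begin{align*}
\text{Differential distribution:} \quad & \delta_{\mathrm{I}} - \delta_{\mathrm{II}} \\
\text{Differential effect:} \quad & \delta_{\mathrm{II}} - \delta_{\mathrm{III}} \\
\text{Differential selection:} \quad & \delta_{\mathrm{I}} - \delta_{\mathrm{IV}} \\
\text{Differential prevalence:} \quad & \delta_{\mathrm{IV}} - \delta_{\mathrm{II}^{*}}
\end{align*}
```

Written this way, the first two contrasts sum to the difference between the disparity under ARM-I and under ARM-III, and the last two sum to the difference between the disparity under ARM-I and under the modified ARM-II.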

An alternative perspective is that CDA is prescriptive of how to reduce disparities. By its definition, the intervention to equalize a factor’s distribution (i.e., ARM-II) will affect the factor’s differential distribution, partially reduce the impact of the factor’s differential effect [by changing the factor’s distribution (68)], and change how groups select into factor levels (by altering the factor’s assignment mechanism). When the key goal is to learn how much the intervention in ARM-II will reduce the disparity seen in ARM-I, to inform whether the factor is a valuable focus of intervention, the comparison of ARM-I to ARM-II suffices and a breakdown of the factor’s role in producing disparity is not needed. Then, the interventions under ARM-III and ARM-IV are useful only to the extent that they tell us about the impacts of potential interventions that stakeholders may consider in practice.

CDA is agnostic about how its main intervention (ARM-II) is achieved. Although CDA typically (though not always; see, e.g., 28, 46, 120) involves a unit-level (e.g., individual-level) factor as an ultimate focus of intervention, it may be that addressing issues at contextual levels higher than the unit (e.g., family, health care, community) may be needed to bring about the removal of disparity in that factor. In this way, CDA can be conceived of and utilized in a framework that acknowledges a socioecological understanding of health (96).

Evaluating Interventions that Address Disparities

We often want to know how certain interventions or policies, be they hypothetical or real, impact disparities. Toward this goal, the first step is to choose a conceptual model for, and a corresponding measure of, disparity. Here, we focus on the target study model as it relies on minimal assumptions to measure disparity. The next step is to contrast the magnitude of that disparity measure under the treatment versus control arms in a prospective study or randomized trial (66). This method differs from the typical approach of assessing effect modification by group (87), which amounts to contrasting the magnitude of inequality (not necessarily disparity) under treatment versus control arms.

As seen with CDA, the approach advocated for here amounts to nesting the target study (65) within a target trial framework (50), as discussed in Sun et al. (123), except here the trial portion may in fact be carried out rather than emulated. Either way, we specify an eligible population that is enrolled through a stratified sampling plan to balance the allowables across social groups at baseline. Next, we take each social group and randomize persons [or clusters of persons (97)] to the treatment or control condition. This design ensures that (a) allowable covariates are balanced across groups, (b) the nonallowables may be imbalanced across groups, and (c) the allowables and nonallowables are balanced across intervention arms within groups (66). Features a and b help us measure disparity the same way within each intervention arm. Feature c makes the intervention arms exchangeable within groups, which helps us causally assess whether and to what extent the intervention impacts disparity. Then, for all persons, we assess outcomes from the point of randomization (time zero) for a specified duration to avoid immortal person-time bias and bias from study attrition.
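To make the contrast concrete, the following Python sketch computes a disparity within each arm after standardizing to a common distribution of a single allowable, and then contrasts the two arms. It is a toy illustration under stated assumptions: the group labels, the allowable levels, the standard distribution, and all cell means are hypothetical, and standardization stands in for what the stratified sampling plan would achieve by design.

```python
# Hypothetical mean outcomes by (group, allowable level, arm).
cell_means = {
    ("group_1", "low_need", "control"): 0.50,
    ("group_1", "high_need", "control"): 0.70,
    ("group_0", "low_need", "control"): 0.40,
    ("group_0", "high_need", "control"): 0.60,
    ("group_1", "low_need", "treatment"): 0.40,
    ("group_1", "high_need", "treatment"): 0.60,
    ("group_0", "low_need", "treatment"): 0.35,
    ("group_0", "high_need", "treatment"): 0.55,
}

# Hypothetical common distribution of the allowable used for both groups.
allowable_dist = {"low_need": 0.5, "high_need": 0.5}


def standardized_mean(cell_means, group, arm, allowable_dist):
    """Mean outcome for one group in one arm, averaged over the common allowable distribution."""
    return sum(
        share * cell_means[(group, level, arm)]
        for level, share in allowable_dist.items()
    )


def disparity(cell_means, arm, allowable_dist):
    """Disparity within an arm: group_1 minus group_0, with the allowable balanced."""
    return (
        standardized_mean(cell_means, "group_1", arm, allowable_dist)
        - standardized_mean(cell_means, "group_0", arm, allowable_dist)
    )


def effect_on_disparity(cell_means, allowable_dist):
    """Contrast of the disparity under treatment versus control."""
    return (
        disparity(cell_means, "treatment", allowable_dist)
        - disparity(cell_means, "control", allowable_dist)
    )


print(disparity(cell_means, "control", allowable_dist))
print(disparity(cell_means, "treatment", allowable_dist))
print(effect_on_disparity(cell_means, allowable_dist))
```

The same disparity function is applied in both arms, mirroring how features a and b keep the measurement of disparity identical across arms.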

Although this design is conceivable, it is new and may not often be carried out in real life because allowability designations, which would drive the sampling design, are specific to the choice of primary outcome (and investigators often study secondary outcomes). If the target study assumptions of allowability overlap and innocuous sampling (65) hold, this design can be emulated using data from randomized trials (66). When the intervention cannot be implemented in a prospective study or randomized trial due to cost, feasibility, or ethics, this design can be emulated using observational data under the additional assumptions of consistency, positivity, and conditional exchangeability (49, 66, 123). Estimators such as inverse probability weighting and g-computation can be used to simultaneously emulate the sampling and randomization stages of the design (66, 123).
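As a rough illustration of how weighting can emulate both stages at once, the Python sketch below multiplies a sampling-stage weight (which balances one allowable across groups) by a randomization-stage weight (the inverse of the empirical probability of the observed arm). It is a minimal sketch and not the estimator of the cited work: the record layout, the function names, the standard allowable distribution, and the toy data are all hypothetical, covariates are assumed discrete so that empirical frequencies can replace fitted models, and positivity is assumed to hold in every covariate stratum.

```python
from collections import Counter, defaultdict


def ipw_effect_on_disparity(records, allowable_dist):
    """Weighted disparity (group 1 minus group 0) in each arm, and their contrast.

    records: dicts with keys group (0/1), allowable, nonallowable, treated (0/1), y.
    allowable_dist: hypothetical standard distribution of the allowable (sums to 1).
    """
    n_g = Counter(r["group"] for r in records)
    n_ga = Counter((r["group"], r["allowable"]) for r in records)
    n_gan = Counter((r["group"], r["allowable"], r["nonallowable"]) for r in records)
    n_gant = Counter(
        (r["group"], r["allowable"], r["nonallowable"], r["treated"]) for r in records
    )

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for r in records:
        g, a, n, t = r["group"], r["allowable"], r["nonallowable"], r["treated"]
        # Sampling stage: standard share of the allowable over its share within the group.
        w_sampling = allowable_dist[a] / (n_ga[(g, a)] / n_g[g])
        # Randomization stage: inverse empirical probability of the observed arm
        # given group, allowable, and nonallowable.
        w_treatment = n_gan[(g, a, n)] / n_gant[(g, a, n, t)]
        w = w_sampling * w_treatment
        weighted_sum[(g, t)] += w * r["y"]
        weight_total[(g, t)] += w

    mean = {key: weighted_sum[key] / weight_total[key] for key in weight_total}
    disparity = {t: mean[(1, t)] - mean[(0, t)] for t in (0, 1)}
    return disparity, disparity[1] - disparity[0]


def toy_records():
    """Hypothetical data with uneven cell counts, so covariates are imbalanced."""
    records = []
    for g in (0, 1):
        for a in (0, 1):
            for n in (0, 1):
                for t in (0, 1):
                    count = 1 + (g + 2 * a + n + t) % 3
                    y = 10 * g + 2 * a + t * (1 + g)
                    row = dict(group=g, allowable=a, nonallowable=n, treated=t, y=y)
                    records += [row] * count
    return records


disparity, effect = ipw_effect_on_disparity(toy_records(), {0: 0.5, 1: 0.5})
print(disparity, effect)
```

In the toy data the outcome is built so that, once the allowable is balanced, the disparity is 10 under control and 11 under treatment; the weighted estimates should recover these values up to floating-point error. With continuous or high-dimensional covariates, the empirical frequencies would be replaced by fitted models.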

This approach is flexible. It can be applied to cluster-level interventions where interventions may operate at various societal levels and incorporate multiple components (see, e.g., 66), or to parsimonious interventions that may be defined at the individual level (see, e.g., 123). It can also accommodate intervention strategies that require certain degrees of intervention fidelity (112) or require certain levels of adherence to individual treatment rules (51). The target trial framework (50), when applied to complex health care data, often evokes sequential trials that are each indexed at a different moment in calendar time. In such a design, persons are allowed to enroll in and contribute data to each trial for which they are eligible. The integrated target study and target trial design permits this setup as well (123). It is possible to meta-analyze the results of sequential trials without making any assumption that the disparity or intervention effects on disparity are constant over calendar time (65, 123).
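One simple way to summarize sequential trials without assuming constant effects is an enrollment-weighted average of the trial-specific estimates, read as an average over the calendar period rather than as a common effect. The Python sketch below shows this pooling rule as an assumption for illustration; it is not necessarily the procedure used in the cited work (65, 123), and the trial labels, sizes, and estimates are hypothetical.

```python
def pool_sequential_trials(trials):
    """Enrollment-weighted average of trial-specific effects on disparity.

    trials: list of dicts with 'n' (persons enrolled in that trial) and
    'effect' (estimated effect on disparity in that trial).
    """
    total = sum(trial["n"] for trial in trials)
    return sum(trial["n"] * trial["effect"] for trial in trials) / total


# Hypothetical trials indexed at different calendar times.
trials = [
    {"index_date": "2019-01", "n": 100, "effect": -0.04},
    {"index_date": "2019-07", "n": 300, "effect": -0.02},
    {"index_date": "2020-01", "n": 100, "effect": 0.00},
]
print(pool_sequential_trials(trials))
```

Because the same person can contribute to several trials, any variance estimate for the pooled value would need to account for that dependence, for example by resampling persons rather than trial records.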

This integrated design can also be extended to study intervention effects on disparity in long-term decision-making by health care providers. In this extension, after the sampling and baseline randomization stages, a randomized audit study is conducted within each intervention arm, where providers’ perceptions of patients’ clinical or public health need (assessed just before the decision outcome) are randomized to be balanced across social groups (66). This within-arm balancing of perceived clinical or public health need helps to identify the intervention’s effect on disparity in provider decision-making for patient treatment plans. This within-arm randomized audit study represents an intervention on perceived clinical or public health need. Thus, its emulation requires additional assumptions of consistency, positivity, and conditional exchangeability with respect to the perception of clinical or public health need (66). Due to these complexities, this additional intervention and its assumptions are best suited for when the decision outcome must consider the perception of clinical or public health need as allowable, and clinical or public health need may be affected by the baseline intervention, so that feedback between the interventions and clinical or public health need during follow-up is properly addressed.

When using observational data to emulate an integrated target trial, it is important to remember its conceptual limits. An underappreciated characteristic of observational data is that it features only factors, actions, or interventions that have occurred in the real world [otherwise the positivity condition fails (63)]. While we have seen declines in disparities over time, in many cases health disparities have persisted (9, 37, 100), implying that the types of interventions we can emulate from observed data may be limited in their effectiveness. Certain transformative actions to change contexts or combine components in novel ways (e.g., redesign of communities and health care) may not have occurred before or otherwise may not be recorded well in data. These trends and constraints with observational study designs call for the continued development of actual interventions and their evaluation in prospective and experimental study designs (23, 97).

Modeling Transformative Interventions

Given the limits of studying hypothetical interventions with observational data, how can we leverage transformative changes or events in society (e.g., 78)? How do we study the causal effects of improved or degraded racial relations, or the effects of economic growth or recessions, on disparities? When we can locate places where such phenomena have occurred, i.e., a treatment condition, the assessment of the phenomena’s impact may be amenable to quasi-experimental study design through an alternative identification strategy. Combined with a control condition where the phenomena have not occurred at the same time, we may be able to leverage a parallel trends or equi-confounding assumption (124)—in place of the conditional exchangeability assumption—through a difference-in-differences (131) or comparative interrupted time series design (85). If none of the possible control groups satisfy this assumption, we may be able to reweight them to construct an artificial control group that satisfies it in a synthetic control design (1). Without any control group, such as when interventions occur universally, a prepost interrupted time series design (84) may be helpful. Quasi-experimental designs are amenable to a target trial specification (116). None of these designs are a panacea. Rather than relying on weaker assumptions for causal inference compared to those of typical observational studies (e.g., conditional exchangeability), these designs instead trade one set of unverifiable assumptions for another (88, 89). Thus, their assumptions must be carefully evaluated.
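For the difference-in-differences case, the arithmetic applied to a disparity measure can be sketched as follows. This Python snippet is a minimal illustration with hypothetical names and values; it assumes the disparity has already been measured the same way in each place and period, and the result has a causal reading only if the parallel trends assumption holds for the disparity itself.

```python
def did_on_disparity(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences applied to a disparity measure.

    Each argument is a disparity measured in the treated or control setting,
    before or after the event of interest.
    """
    change_treated = treated_post - treated_pre
    change_control = control_post - control_pre
    return change_treated - change_control


# Hypothetical disparities (e.g., percentage points), not from any study.
print(did_on_disparity(treated_pre=8.0, treated_post=5.0,
                       control_pre=7.5, control_post=6.5))
```

Here the disparity fell by 3.0 where the event occurred and by 1.0 where it did not, so the estimate attributes a 2.0-point reduction to the event under the stated assumption.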

Quasi-experimental approaches still limit our queries to what has been observed. A radically different approach to modeling potentially transformative interventions is to directly model posited mechanisms by which disparities are produced and then to layer on top of this model an explicit intervention model that alters those mechanisms (91). Although approaches based on the potential outcomes framework can attempt to model mechanisms (e.g., 43), public health scientists have adopted systems science models to study mechanisms due to their greater flexibility (15, 86). These models have been championed (27, 39) for their ability to model feedback relationships [systems dynamics models (54)] and interactions among persons as well as between people and their environments [agent-based models (14)]. However, their potential has not been fully realized: The ways in which disparities arise are complex and dynamic and involve interlocking mechanisms (110), but these mechanisms are not typically specified in systems science models (63). One avenue to promote richer models is to involve community members and other stakeholders to help specify a model’s key components and structure (55). Another avenue (63) is to incorporate explicit theories (6, 11, 24, 41, 110) (e.g., social, economic, political, cultural) of how disparities arise specifically at multiple levels (27), e.g., how fundamental causes enhance or limit individuals’ ability to take advantage of resources and opportunities (47, 83, 130). Making such a model elaborate enough to reflect theoretical elements but parsimonious enough to be transparent and verifiable with practical insights is an ambitious undertaking.

When quantitative approaches to informing and evaluating interventions are infeasible due to a lack of realistic assumptions, data, or resources, one might still be able to qualitatively examine the relation of events and societal changes to disparity. Recent approaches are integrating formal historiography with counterfactual analysis (70, 113), including structural causal models and explicit interventions as conceptual devices to analyze causal relationships. Such approaches could also help strengthen quantitative attempts to inform and evaluate interventions (2).

DISCUSSION

Causal inference is a critical part of health disparities research. It is a challenging endeavor to ask what has given rise to patterns of population health, to identify which factors might upon intervention improve population health and ensure that groups do not bear an undue burden of morbidity and excess mortality, and to evaluate how effectively the interventions and policies we construct achieve those goals (73). An added complexity is the need to define patterns of population health that are unfair or morally concerning in some sense (i.e., disparity).

This article has reviewed the challenges of using causal inference and other counterfactual methods to define a disparity and instead has advocated for a descriptive approach that builds in normative and ethical assumptions about what is fair (i.e., allowable) in the distribution of health and health-related factors. With this definition, the article has reviewed causal approaches for identifying points of intervention to reduce a disparity and for evaluating how specific interventions, hypothetical or real, impact a disparity. In this approach, the way in which a disparity is measured across descriptive studies, epidemiologic studies that identify points of intervention, and evaluative studies such as prospective trials can be aligned and made consistent. This alignment is critical for producing a coherent evidence base for reducing a disparity.

Perhaps the most challenging endeavor in health disparities research is to articulate what sources are allowable (if any) for measuring disparity. One concern is that allowability cannot always be deduced from a causal diagram depicting the genesis of disparities (e.g., 56). For example, when measuring disparity in medical provider treatment decisions or vaccine access, the clinical or public health needs of patients may stem from barriers to health and health care that reflect unequal opportunity for health and unequal exposure to environmental risk factors. Nonetheless, many argue that measures of disparity in medical treatment or prioritized vaccine access should treat clinical or public health need as allowable (21, 61, 62). Some articles have discussed particular examples and principles for choosing allowables for various types of outcomes (21, 62). Overall, more guidance is needed, and the development of frameworks for allowability is an active area of research.

Although making allowability designations is challenging, it is ultimately worthwhile because it clarifies interpretations regarding health disparities that may aid scientific inference, communication, and reproducibility (72). Explicitly defining allowable sources of difference [or none at all (133)] makes transparent an investigator’s assumptions about which outcomes they consider as disparate. Those who disagree can then challenge or debate these assumptions, but it is difficult to critique and improve upon the underlying assumptions when they are not clearly articulated. Moreover, explicitly stating allowable sources of difference allows stakeholders who use research findings to understand their relevance for what they consider to be disparity.

Furthermore, because descriptive and causal inference rests on identifying assumptions, adequately measured and longitudinal data on multidomain constructs of interest are critical. The continued development of data resources that link across multiple sectors, especially those outside of health (e.g., economic, educational, sociological), is a key priority for future research.

ACKNOWLEDGMENTS

Research reported in this publication was supported by the National Heart, Lung, and Blood Institute of the NIH and the NIH Office of the Director under award R01HL169956 and by the Robert E. Meyerhoff Foundation.

Footnotes

DISCLOSURE STATEMENT

The author is not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health (NIH) or the Robert E. Meyerhoff Foundation.

LITERATURE CITED

  • 1. Abadie A. 2021. Using synthetic controls: feasibility, data requirements, and methodological aspects. J. Econ. Lit 59:391–425
  • 2. Abadie A, Diamond A, Hainmueller J. 2015. Comparative politics and the synthetic control method. Am. J. Political Sci 59:495–510
  • 3. Altonji JG, Blank RM. 1999. Race and gender in the labor market. In Handbook of Labor Economics, ed. Ashenfelter O, Card D. Elsevier
  • 4. Arcaya MC, Arcaya AL, Subramanian SV. 2015. Inequalities in health: definitions, concepts, and theories. Glob. Health Action 8:27106
  • 5. Asada Y, Hurley J, Norheim OF, Johri M. 2015. Unexplained health inequality—is it unfair? Int. J. Equity Health 14:11
  • 6. Bailey ZD, Feldman JM, Bassett MT. 2021. How structural racism works—racist policies as a root cause of U.S. racial health inequities. N. Engl. J. Med 384:768–73
  • 7. Belson WA. 1956. A technique for studying the effects of a television broadcast. J. R. Stat. Soc. C 5:195–202
  • 8. Bertrand M, Duflo E. 2017. Field experiments on discrimination. In Handbook of Economic Field Experiments, ed. Banerjee AV, Duflo E. North-Holland
  • 9. Bleich SN, Jarlenski MP, Bell CN, LaVeist TA. 2012. Health inequalities: trends, progress, and policy. Annu. Rev. Public Health 33:7–40
  • 10. Blinder AS. 1973. Wage discrimination: reduced form and structural estimates. J. Hum. Resour 8:436–55
  • 11. Bowleg L. 2012. The problem with the phrase women and minorities: intersectionality—an important theoretical framework for public health. Am. J. Public Health 102:1267–73
  • 12. Braveman P. 2006. Health disparities and health equity: concepts and measurement. Annu. Rev. Public Health 27:167–94
  • 13. Braveman P. 2025. Health inequalities, disparities, equity: What’s in a name? Am. J. Public Health 115:996–1002
  • 14. Bruch E, Atwell J. 2015. Agent-based models in empirical social research. Sociol. Methods Res 44:186–221
  • 15. Carey G, Malbon E, Carey N, Joyce A, Crammond B, Carey A. 2015. Systems science and systems thinking for public health: a systematic review of the field. BMJ Open 5:e009002
  • 16. Chang T-H, Nguyen TQ, Jackson JW. 2024. The importance of equity value judgments and estimator-estimand alignment in measuring disparity and identifying targets to reduce disparity. Am. J. Epidemiol 193:536–47
  • 17. Chetty R, Hendren N, Jones MR, Porter SR. 2019. Race and economic opportunity in the United States: an intergenerational perspective. Q. J. Econ 135:711–83
  • 18. Cole SR, Frangakis CE. 2009. The consistency statement in causal inference: a definition or an assumption? Epidemiology 20:3–5
  • 19. Cook B, Forrester S, Creedon T, Wasserman J, Sufian M, Allison J. 2021. Racial/ethnic health and healthcare disparities measurement: the application of the principles and methods of causal inference. In The Science of Health Disparities Research, ed. Dankwa-Mullan I, Pérez-Stable EJ, Gardner KL, Zhang X, Rosario AM. Wiley Blackwell
  • 20. Cook BL, McGuire TG, Meara E, Zaslavsky AM. 2009. Adjusting for health status in non-linear models of health care disparities. Health Serv. Outcomes Res. Methodol 9:1–21
  • 21. Cook BL, McGuire TG, Zaslavsky AM. 2012. Measuring racial/ethnic disparities in health care: methods and practical issues. Health Serv. Res 47:1232–54
  • 22. Cook LA, Sachs J, Weiskopf NG. 2021. The quality of social determinants data in the electronic health record: a systematic review. J. Am. Med. Inform. Assoc 29:187–96
  • 23. Cooper LA, Hill MN, Powe NR. 2002. Designing and evaluating interventions to eliminate racial and ethnic disparities in health care. J. Gen. Intern. Med 17:477–86
  • 24. Daly B, Olopade OI. 2015. A perfect storm: how tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change. CA Cancer J. Clin 65:221–38
  • 25. Daniels N. 2001. Justice, health, and healthcare. Am. J. Bioethics 1:2–16
  • 26. Diderichsen F, Hallqvist J, Whitehead M. 2019. Differential vulnerability and susceptibility: how to make use of recent development in our understanding of mediation and interaction to tackle health inequalities. Int. J. Epidemiol 48:268–74
  • 27. Diez Roux AV. 2011. Complex systems thinking and current impasses in health disparities research. Am. J. Public Health 101:1627–34
  • 28. Dissanayake MV, Jackson JW, Martin CL, Urrutia RP, Funk MJ, Wood ME. 2025. Challenges in estimating effects of hypothetical interventions on resources patterned by structural racism: an example in a rural North Carolina Medicaid population. Am. J. Epidemiol 10.1093/aje/kwaf072. In press
  • 29. Du Bois WEB. 2017. Black Reconstruction in America: Toward a History of the Part Which Black Folk Played in the Attempt to Reconstruct Democracy in America, 1860–1880. Routledge
  • 30. Duan N, Meng X-L, Lin JY, Chen C-N, Alegria M. 2008. Disparities in defining disparities: statistical conceptual frameworks. Stat. Med 27:3941–56
  • 31. Duran D, Asada Y, Millum J, Gezmu M. 2019. Harmonizing health disparities measurement. Am. J. Public Health 109:S25–27
  • 32. Duran DG, Pérez-Stable EJ. 2019. Novel approaches to advance minority health and health disparities research. Am. J. Public Health 109:S8–10
  • 33. Essien UR, Tang Y, Figueroa JF, Litam TMA, Tang F, et al. 2022. Diabetes care among older adults enrolled in Medicare Advantage versus traditional Medicare fee-for-service plans: the Diabetes Collaborative Registry. Diabetes Care 45:1549–57
  • 34. Eyer K. 2021. The but-for theory of anti-discrimination law. Va. Law Rev 107:1621–710
  • 35. Foner E. 2019. The Second Founding: How the Civil War and Reconstruction Remade the Constitution. Norton
  • 36. Fortin N, Lemieux T, Firpo S. 2011. Decomposition methods in economics. In Handbook of Labor Economics, ed. Ashenfelter O, Card D. North-Holland/Elsevier
  • 37. Foti K, Wang D, Appel LJ, Selvin E. 2019. Hypertension awareness, treatment, and control in US adults: trends in the hypertension control cascade by population subgroup (National Health and Nutrition Examination Survey, 1999–2016). Am. J. Epidemiol 188:2165–74
  • 38. Gaebler J, Cai W, Basse G, Shroff R, Goel S, Hill J. 2022. A causal framework for observational studies of discrimination. Stat. Public Policy 9:26–48
  • 39. Galea S, Riddle M, Kaplan GA. 2010. Causal thinking and complex system approaches in epidemiology. Int. J. Epidemiol 39:97–106
  • 40. Gastwirth JL. 2020. The role of statistical evidence in civil cases. Annu. Rev. Stat. Appl 7:39–60
  • 41. Geronimus AT. 1992. The weathering hypothesis and the health of African-American women and infants: evidence and speculations. Ethn. Dis 2:207–21
  • 42. Glymour MM, Spiegelman D. 2017. Evaluating public health interventions: 5. Causal inference in public health research—do sex, race, and biological factors cause health outcomes? Am. J. Public Health 107:81–85
  • 43. Graetz N, Boen CE, Esposito MH. 2022. Structural racism and quantitative causal inference: a life course mediation framework for decomposing racial health disparities. J. Health Soc. Behav 63:232–49
  • 44. Greiner DJ, Rubin DB. 2011. Causal effects of perceived immutable characteristics. Rev. Econ. Stat 93:775–85
  • 45. Gudzune KA, Kushner RF. 2024. Medications for obesity: a review. JAMA 332:571–84
  • 46. Guimarães JMN, Jackson JW, Barber S, Griep RH, da Fonseca MJM, et al. 2024. Racial inequities in the control of hypertension and the explanatory role of residential segregation: a decomposition analysis in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). J. Rac. Ethn. Health Disparities 11:1024–32
  • 47. Hatzenbuehler ML, Phelan JC, Link BG. 2013. Stigma as a fundamental cause of population health inequalities. Am. J. Public Health 103:813–21
  • 48. Hernán MA. 2016. Does water kill? A call for less casual causal inferences. Ann. Epidemiol 26:674–80
  • 49. Hernán MA, Robins JM. 2006. Estimating causal effects from epidemiological data. J. Epidemiol. Commun. Health 60:578–86
  • 50. Hernán MA, Robins JM. 2016. Using big data to emulate a target trial when a randomized trial is not available. Am. J. Epidemiol 183:758–64
  • 51. Hernán MA, Robins JM. 2017. Per-protocol analyses of pragmatic trials. N. Engl. J. Med 377:1391–98
  • 52. Hernán MA, Taubman SL. 2008. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. Int. J. Obes 32(3):S8–14
  • 53. Hofmann B. 2022. Ethical issues with geographical variations in the provision of health care services. BMC Med. Ethics 23:127
  • 54. Homer JB, Hirsch GB. 2006. System dynamics modeling for public health: background and opportunities. Am. J. Public Health 96:452–58
  • 55. Hovmand PS. 2014. Group model building and community-based system dynamics process. In Community Based System Dynamics. Springer
  • 56. Howe CJ, Bailey ZD, Raifman JR, Jackson JW. 2022. Recommendations for using causal diagrams to study racial health disparities. Am. J. Epidemiol 191:1981–89
  • 57. Howe CJ, Dulin-Keita A, Cole SR, Hogan JW, Lau B, et al. 2018. Evaluating the population impact on racial/ethnic disparities in HIV in adulthood of intervening on specific targets: a conceptual and methodological framework. Am. J. Epidemiol 187:316–25
  • 58. Hu L, Kohler-Hausmann I. 2025. What is perceived when race is perceived and why it matters for causal inference and discrimination studies. Law Soc. Rev 59:239–64
  • 59. Huang DT, Bassig BA, Hubbard K, Klein RJ, Talih M. 2022. Examining progress toward elimination of racial and ethnic health disparities for Healthy People 2020 objectives using three measures of overall disparity. Vital Health Stat. Rep., National Center for Health Statistics. 10.15620/cdc:121266
  • 60. Hutler B. 2022. Causation and injustice: locating the injustice of racial and ethnic health disparities. Bioethics 36:260–66
  • 61. Institute of Medicine. 2003. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. National Academies Press
  • 62. Jackson JW. 2021. Meaningful causal decompositions in health equity research: definition, identification, and estimation through a weighting framework. Epidemiology 32:282–90
  • 63. Jackson JW, Arah OA. 2020. Invited commentary: making causal inference more social and (social) epidemiology more causal. Am. J. Epidemiol 189:179–82
  • 64. Jackson JW, Chang TH, Meche A, Nguyen TQ. 2026. Estimation strategies for causal decomposition analysis with allowability specifications. Preprint, arXiv:2602.07825 (stat). https://arxiv.org/abs/2602.07825
  • 65. Jackson JW, Hsu Y-J, Greer RC, Boonyasai RT, Howe CJ. 2025. The target study: a conceptual model and framework for measuring disparity. Sociol. Methods Res 10.1177/00491241251314037. In press
  • 66. Jackson JW, Hsu Y-J, Zalla LC, Carson KA, Marsteller JA, et al. 2024. Evaluating effects of multilevel interventions on disparity in health and healthcare decisions. Prev. Sci 25:407–20
  • 67. Jackson JW, VanderWeele TJ. 2018. Decomposition analysis to identify intervention targets for reducing disparities. Epidemiology 29:825–35
  • 68. Jackson JW, VanderWeele TJ. 2019. Intersectional decomposition analysis with differential exposure, effects, and construct. Soc. Sci. Med 226:254–59
  • 69. Jeffries N, Zaslavsky AM, Diez Roux AV, Creswell JW, Palmer RC, et al. 2019. Methodological approaches to understanding causes of health disparities. Am. J. Public Health 109:S28–33
  • 70. Jeong T. 2025. When to use counterfactuals in causal historiography: methods for semantics and inference. Sociol. Methods Res 10.1177/00491241251314039
  • 71. Kaufman JS. 2008. Epidemiologic analysis of racial/ethnic disparities: some fundamental issues and a cautionary example. Soc. Sci. Med 66:1659–69
  • 72. Kaufman JS. 2017. Statistics, adjusted statistics, and maladjusted statistics. Am. J. Law Med 43:193–208
  • 73. Kaufman JS. 2019. Commentary: causal inference for social exposures. Annu. Rev. Public Health 40:7–21
  • 74. Kendi IX. 2020. Stop blaming black people for dying of the coronavirus: new data from 29 states confirm the extent of the racial disparities. The Atlantic, April 14. https://www.theatlantic.com/ideas/archive/2020/04/race-and-blame/609946/
  • 75. Kitagawa EM. 1955. Components of a difference between two rates. J. Am. Stat. Assoc 50:1168–94
  • 76. Knox D, Mummolo J. 2020. Toward a general causal framework for the study of racial bias in policing. J. Political Inst. Political Econ 1:341–78
  • 77. Kohler-Hausmann I. 2019. Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination. Northwest. Univ. Law Rev 113:1163–228
  • 78. Krieger N, Chen JT, Coull B, Waterman PD, Beckfield J. 2013. The unique impact of abolition of Jim Crow laws on reducing inequities in infant death rates and implications for choice of comparison groups in analyzing societal determinants of health. Am. J. Public Health 103:2234–44
  • 79. Krieger N, Davey Smith G. 2016. The tale wagged by the DAG: broadening the scope of causal inference and explanation for epidemiology. Int. J. Epidemiol 45:1787–808
  • 80. LaVeist TA. 2000. On the study of race, racism, and health: a shift from description to explanation. Int. J. Health Serv 30:217–19
  • 81.LaVeist TA. 2005. Disentangling race and socioeconomic status: a key to understanding health inequalities. J. Urban Health 82:iii26–34 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Li F, Li F. 2023. Using propensity scores for racial disparities analysis. Obs. Stud 9:59–68 [Google Scholar]
  • 83.Link BG, Phelan J. 1995. Social conditions as fundamental causes of disease. J. Health Soc. Behav 1995:80–94 [PubMed] [Google Scholar]
  • 84.Lopez Bernal J, Cummins S, Gasparrini A. 2016. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int. J. Epidemiol 46:348–55 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Lopez Bernal J, Cummins S, Gasparrini A. 2018. The use of controls in interrupted time series studies of public health interventions. Int. J. Epidemiol 47:2082–93 [DOI] [PubMed] [Google Scholar]
  • 86.Luke DA, Stamatakis KA. 2012. Systems science methods in public health: dynamics, networks, and agents. Annu. Rev. Public Health 33:357–76 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Mackenbach JP, Gunning-Schepers LJ. 1997. How should interventions to reduce inequalities in health be evaluated? J. Epidemiol. Community Health 51:359–64 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Matthay EC, Hagan E, Gottlieb LM, Tan ML, Vlahov D, et al. 2020. Alternative causal inference methods in population health research: evaluating tradeoffs and triangulating evidence. SSM Popul. Health 10:100526. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Matthay EC, Hagan E, Joshi S, Tan ML, Vlahov D, et al. 2021. The revolution will be hard to evaluate: how co-occurring policy changes affect research on the health effects of social policies. Epidemiol. Rev 43:19–32 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.McGuire TG, Alegria M, Cook BL, Wells KB, Zaslavsky AM. 2006. Implementing the Institute of Medicine definition of disparities: an application to mental health care. Health Serv. Res 41:1979–2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.McMillon DB. 2024. What makes systemic discrimination, “systemic?” Exposing the amplifiers of inequity. Preprint, arXiv:2403.11028. https://arxiv.org/abs/2403.11028v1 [Google Scholar]
  • 92.Meche A, Boonyasai RT, Hsu Y-J, Greer RC, Mehta HB, Alexander GC, Segal JB, Cooper LA, Jackson JW. 2026. Applying the target study conceptual model to measure racial and ethnic disparities in hypertension treatment intensification. Epidemiology (in press). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Miles CH. 2023. On the causal interpretation of randomised interventional indirect effects. J. R. Stat. Soc. B 85:1154–72 [Google Scholar]
  • 94.Moreno-Betancur M. 2021. The target trial: a powerful device beyond well-defined interventions. Epidemiology 32:291–94 [DOI] [PubMed] [Google Scholar]
  • 95.Morgenstern H, Bursic ES. 1982. A method for using epidemiologic data to estimate the potential impact of an intervention on the health status of a target population. J. Commun. Health 7:292–309 [DOI] [PubMed] [Google Scholar]
  • 96.Mueller M, Purnell TS, Mensah GA, Cooper LA. 2015. Reducing racial and ethnic disparities in hypertension prevention and control: What will it take to translate research into practice and policy? Am. J. Hypertens 28:699–716 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Murray DM, Goodman MS. 2024. Design and analytic methods to evaluate multilevel interventions to reduce health disparities: rigorous methods are available. Prev. Sci 25:343–47 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Nabi R, Shpitser I. 2018. Fair inference on outcomes. Presented at 32nd AAAI Conference on Artificial Intelligence [PMC free article] [PubMed] [Google Scholar]
  • 99.Naimi AI, Schnitzer ME, Moodie EEM, Bodnar LM. 2016. Mediation analysis for health disparities research. Am. J. Epidemiol 184:315–24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.National Academies of Sciences, Engineering, and Medicine. 2024. Ending Unequal Treatment: Strategies to Achieve Equitable Health Care and Optimal Health for All, ed. Benjamin GC, DeVoe JE, Amankwah FK, Nass SJ. National Academies Press [PubMed] [Google Scholar]
  • 101.Oaxaca R. 1973. Male-female wage differentials in urban labor markets. Int. Econ. Rev 14:693–709 [Google Scholar]
  • 102.Park S, Kang S, Lee C, Ma S. 2023. Sensitivity analysis for causal decomposition analysis: assessing robustness toward omitted variable bias. J. Causal Inference 11:20220031 [Google Scholar]
  • 103.Park S, Qin X, Lee C. 2024. Estimation and sensitivity analysis for causal decomposition in health disparity research. Sociol. Methods Res 53:571–602 [Google Scholar]
  • 104.Pearl J. 2001. Direct and indirect effects. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, ed. Breese JS, Koller D. Morgan Kaufmann [Google Scholar]
  • 105.Persad G, Peek ME, Shah SK. 2022. Fair allocation of scarce therapies for coronavirus disease 2019 (COVID-19). Clin. Infect. Dis 75:e529–33 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Peters CC. 1941. A method of matching groups for experiment with no loss of population. J. Educ. Res 34:606–12 [Google Scholar]
  • 107.Powers M, Faden R. 2003. Racial and ethnic disparities in health care: an ethical analysis of when and how they matter. In Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care, ed. Smedley BD, Stith AY, Nelson AR. National Academies Press [PubMed] [Google Scholar]
  • 108.Qin MM, Jackson JW. 2025. A review of the causal decomposition framework for modeling interventions that reduce disparities. Curr. Epidemiol. Rep 12:10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Rahimi E, Hashemi Nazari SS. 2021. A detailed explanation and graphical representation of the Blinder-Oaxaca decomposition method with its application in health inequalities. Emerg. Themes Epidemiol 18:12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Reskin B. 2012. The race discrimination system. Annu. Rev. Sociol 38:17–35 [Google Scholar]
  • 111.Rojas-Saunero LP, Glymour MM, Mayeda ER. 2024. Selection bias in health research: quantifying, eliminating, or exacerbating health disparities? Curr. Epidemiol. Rep 11:63–72 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Rojas-Saunero LP, Labrecque JA, Swanson SA. 2022. Invited commentary: conducting and emulating trials to study effects of social interventions. Am. J. Epidemiol 191:1453–56 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Runhardt RW. 2024. Concrete counterfactual tests for process tracing: defending an interventionist potential outcomes framework. Sociol. Methods Res 53:1591–628 [Google Scholar]
  • 114.Schulman KA, Berlin JA, Harless W, Kerner JF, Sistrunk S, et al. 1999. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N. Engl. J. Med 340:618–26 [DOI] [PubMed] [Google Scholar]
  • 115.Schwartz S, Prins SJ, Campbell UB, Gatto NM. 2016. Is the “well-defined intervention assumption” politically conservative? Soc. Sci. Med 166:254–57 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Seewald NJ, McGinty EE, Stuart EA. 2024. Target trial emulation for evaluating health policy. Ann. Intern. Med 177:1530–38 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Sen B. 2014. Using the Oaxaca-Blinder decomposition as an empirical tool to analyze racial disparities in obesity. Obesity 22:1750–55 [DOI] [PubMed] [Google Scholar]
  • 118.Sen M, Wasow O. 2016. Race as a bundle of sticks: designs that estimate effects of seemingly immutable characteristics. Annu. Rev. Political Sci 19:499–522 [Google Scholar]
  • 119.Shen AA, Visoki E, Barzilay R, Pimentel SD. 2025. A calibrated sensitivity analysis for weighted causal decompositions. Stat. Med 44:e70010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Smith MJ, Charlton ME, Oleson JJ. 2023. Causal decomposition maps: an exploratory tool for designing area-level interventions aimed at reducing health disparities. Biom. J 65:e2200213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Sowell T. 2019. Discrimination and Disparities. Basic Books/Hachette [Google Scholar]
  • 122.Sudharsanan N, Bijlsma MJ. 2021. Educational note: causal decomposition of population health differences using Monte Carlo integration and the g-formula. Int. J. Epidemiol 50:2098–107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Sun X, Iwashyna TJ, Drabo EF, Crews DC, Ferryman K, Jackson JW. 2025. An integrated target study and target trial framework to evaluate intervention effects on disparities. Preprint, arXiv:2508.14690v1 (stat). https://arxiv.org/abs/2508.14690v1 [Google Scholar]
  • 124.Tchetgen Tchetgen EJ, Park C, Richardson DB. 2024. Universal difference-in-differences for causal inference in epidemiology. Epidemiology 35:16–22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Tolbert AW. 2024. Causal agnosticism about race: variable selection problems in causal inference. Philos. Sci 91:1098–108 [Google Scholar]
  • 126.Valeri L, Proust-Lima C, Fan W, Chen JT, Jacqmin-Gadda H. 2023. A multistate approach for the study of interventions on an intermediate time-to-event in health disparities research. Stat. Methods Med. Res 32:1445–60 [DOI] [PubMed] [Google Scholar]
  • 127.VanderWeele TJ. 2009. Concerning the consistency assumption in causal inference. Epidemiology 20:880–83 [DOI] [PubMed] [Google Scholar]
  • 128.VanderWeele TJ, Robins JM. 2007. Four types of effect modification: a classification based on directed acyclic graphs. Epidemiology 18:561–68 [DOI] [PubMed] [Google Scholar]
  • 129.VanderWeele TJ, Robinson WR. 2014. On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology 25:473–84 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Williams DR, Collins C. 2001. Racial residential segregation: a fundamental cause of racial disparities in health. Public Health Rep. 116:404–16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Wing C, Simon K, Bello-Gomez RA. 2018. Designing difference in difference studies: best practices for public health policy research. Annu. Rev. Public Health 39:453–69 [DOI] [PubMed] [Google Scholar]
  • 132.Yu A, Elwert F. 2025. Nonparametric causal decomposition of group disparities. Ann. Appl. Stat 19:821–45 [Google Scholar]
  • 133.Zalla LC, Martin CL, Edwards JK, Gartner DR, Noppert GA. 2021. A geography of risk: structural racism and coronavirus disease 2019 mortality in the United States. Am. J. Epidemiol 190:1439–46 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Zhang J, Bareinboim E. 2018. Fairness in decision-making—the causal explanation formula. Presented at the 32nd AAAI Conference on Artificial Intelligence [Google Scholar]
  • 135.Zhang L, Wu Y, Wu X. 2016. A causal framework for discovering and removing direct and indirect discrimination. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press. 10.24963/ijcai.2017/549 [DOI] [Google Scholar]
