SSM - Population Health. 2023 Feb 4;22:101352. doi: 10.1016/j.ssmph.2023.101352

A quantitative assessment of the frequency and magnitude of heterogeneous treatment effects in studies of the health effects of social policies

Dakota W Cintron, Laura M Gottlieb, Erin Hagan, May Lynn Tan, David Vlahov, M Maria Glymour, Ellicott C Matthay
PMCID: PMC9975308  PMID: 36873266

Abstract

Substantial heterogeneity in the effects of social policies on health across subgroups may be common, but it has not been systematically characterized. Using a sample of 55 contemporary studies on the health effects of social policies, we recorded how often heterogeneous treatment effects (HTEs) were assessed, for which subgroups (e.g., male, female), and the subgroup-specific effect estimates expressed as standardized mean differences (SMDs). For each study, outcome, and dimension (e.g., gender), we fit a random-effects meta-analysis. We characterized the magnitude of heterogeneity in policy effects using the standard deviation of the true subgroup-specific effects (τ). Among the 44% of studies reporting subgroup-specific estimates, policy effects were generally small (<0.1 SMDs), with mixed impacts on health (67% beneficial) and disparities (50% implied narrowing of disparities). Across study-outcome-dimensions, 54% indicated statistically significant heterogeneity in effects, and 20% had τ > 0.1 SMDs. For 26% of study-outcome-dimensions, the magnitude of τ indicated that effects of opposite signs were plausible across subgroups. Large heterogeneity was more common when HTE evaluations were not specified a priori. Our findings suggest that social policies commonly have heterogeneous effects on the health of different populations and that these HTEs may substantially affect disparities. Studies of social policies and health should routinely evaluate HTEs.

Keywords: Population heterogeneity, Effect modifiers, Social policy, Health equity

Highlights

  • The extent of heterogeneity in social policy effects across subgroups is not clear.

  • We examine heterogeneity in policy effects across subgroups using meta-analysis.

  • Large heterogeneities in policy effects across subgroups are common.

  • Social policies may benefit some subgroups while harming others.

1. Introduction

Social policies may have substantial impacts on a broad range of population health outcomes, and a growing body of health research seeks to quantify their causal effects (Matthay & Glymour, 2022). However, less research has evaluated differences in the effects of social policies across population subgroups. Many social policies could plausibly benefit some members of the community while harming others, or have larger or smaller benefits across population subgroups. For example, racist social policies widen disparities between racial groups (e.g., in health, housing, education, or policing), whereas anti-racist social policies narrow racial disparities by dismantling the racism embedded in social, economic, and political institutions (Boykin et al., 2020; Kendi, 2019).

Assessing heterogeneous treatment effects (HTEs) of social policies is critical to understand the implications of these policies for health inequities, and epidemiology researchers have increasingly called for HTE assessments for this reason (Matthay & Glymour, 2022). The health effects of a policy on the population overall need not be in the same direction as the effects on inequities: policies that improve average health may exacerbate inequities, or conversely, policies that harm health on average may nonetheless narrow inequities. Social policies that primarily benefit those with better health at baseline are likely to widen inequities whereas social policies that primarily benefit those with the poorest health may reduce inequities. For example, the Korean War GI Bill, which provided socioeconomic benefits to veterans, was associated with fewer subsequent depressive symptoms for veterans from backgrounds with low childhood socioeconomic status (SES) but not those from high childhood SES backgrounds (Vable et al., 2016); if veterans from low SES backgrounds had more depressive symptoms at baseline, then this policy may have reduced inequities in depressive symptoms. Well-controlled assessments of HTEs across subgroups contribute evidence on whether social policy effects differ across social categories associated with health disparities such as race, gender, or socioeconomic status. Understanding HTEs is also necessary to understand how to adapt policies to new populations because the characteristics of the new population may modify the effects of the social policy (Matthay, 2020).

Governments and funders want to determine which interventions work best to improve health, and for whom. Understanding HTEs of social policies is at the heart of these questions. HTE evaluations add cost and complexity to a study, so it is important to be able to identify the types of policies and population subgroups for which large heterogeneities are most likely. Despite their potential value, HTE evaluations are not routine in research on the health effects of social policies (Cintron et al., 2022; Fernandez y Garcia et al., 2010; Gabler et al., 2009; Glymour et al., 2013; Rojas-Saunero et al., 2022; Thomson et al., 2022). A prior review of social policy studies found that only 44% evaluated any HTEs, and of these, the population dimensions (e.g., race) examined varied widely (Cintron et al., 2022). Given limited resources and the potential for increased chance findings, additional guidance is needed on when and for which dimensions HTEs should be assessed (Breck & Wakar, 2021). Priority setting therefore requires answers to questions such as: How often does treatment effect heterogeneity occur? How often is heterogeneity trivial in magnitude? How often is it substantial? Do the magnitude of heterogeneity and the frequency of substantial heterogeneity vary by population dimension or policy type? If effects differ somewhat but have the same sign for everyone in the population, it may be less important to quantify heterogeneity precisely. But if an intervention may harm some people while helping others, it is essential to understand this. Although a handful of systematic reviews explore HTEs in randomized trials of biomedical interventions (Fan et al., 2019; Fernandez y Garcia et al., 2010; Gabler et al., 2009; Kasenda et al., 2014; Starks et al., 2019; Sun et al., 2012), little work has examined HTEs of social policies, and, to our knowledge, no work has examined the magnitude and distribution of HTEs in any research domain. This study takes a first step towards answering these questions.

Given the lack of systematic reviews and empirical evaluations of HTEs in practice, it is not clear if or when large HTEs of social policies are common. We address this gap by characterizing the extent of heterogeneity in estimated policy effects across population subgroups in a sample of 55 studies of the health effects of social policies. This study builds on prior work that found that fewer than half of the 55 studies evaluated heterogeneity in estimated policy effects across any population subgrouping dimension (Cintron et al., 2022). Here, we extend this work to characterize the findings of the studies that did evaluate HTEs. Specifically, we use meta-analyses to examine how frequently studies found heterogeneity in estimated policy effects across population subgroups, and to characterize the magnitude and distribution of heterogeneity overall and by population subgrouping dimension, policy domain, and whether the authors specified their HTE evaluations a priori. We also quantify how often researchers should expect subgroup effects on the opposite side of the null from the overall population effect, highlighting the potential consequences of failing to assess HTEs.

2. Materials and methods

2.1. Identification of social policy studies

We used a previously reported sample of 55 contemporary studies evaluating the health effects of social policies (Cintron et al., 2022; Matthay et al., 2022a, 2022b). The sample included all studies evaluating the health effects of social policies that were published in 2019 in a multidisciplinary set of high-impact journals: American Journal of Public Health, American Journal of Epidemiology, Journal of the American Medical Association, New England Journal of Medicine, The Lancet, American Journal of Preventive Medicine, Social Science and Medicine, Health Affairs, Demography, and American Economic Review. We confirmed the comprehensiveness and relevance of this set of journals using a convenience sample of 66 researchers from diverse disciplines who were asked to rank the most relevant high-impact journals publishing research on the health effects of social policies. Additional details on the sample and survey can be found elsewhere (Cintron et al., 2022; Matthay et al., 2022a, 2022b). This sample provides a snapshot of HTE evaluations in social policy research across diverse policy domains with an emphasis on high-profile publications with high methodological rigor that may influence public policy.

2.2. Data extraction and measures

We re-abstracted the studies in the original sample using a structured data extraction form (Web Table A1) to collect information on estimated policy effects across population subgroups. We classified studies as evaluating HTEs if they reported effects of the social policy on the health-related outcome(s) for subgroups of the study population defined by any subgrouping (e.g., age, gender, race/ethnicity, geography, health status). For clarity, we refer to the population characteristics along which subgroupings are evaluated (e.g., gender) as “dimensions” and the specific subgroups (e.g., men, women) as “subgroups.” Dimensions defined intersectionally (e.g., race by gender) were treated as unique dimensions.

For each outcome in each study, we extracted the overall policy effect estimate and the effect estimates for all available subgroups along all available subgrouping dimensions. Studies frequently performed HTE evaluations along multiple independent dimensions. For each subgroup effect estimate, we also extracted the sample size on which the estimate is based, whether the estimate corresponds to a beneficial or harmful effect, the measure of association (e.g., incidence rate ratio, risk difference), and any quantities required to standardize the measures of association for comparability across studies. Because HTEs or estimates of effect measure modification can differ meaningfully depending on whether effects are reported on the additive versus multiplicative scale (Rothman et al., 2008), we transformed all estimated effects to both the multiplicative scale (lnOR) and additive scale (standardized mean difference, SMD) and conducted the statistical analyses on both scales (see Web Appendix B for details). Lastly, for each study, outcome, and HTE dimension, we also recorded whether the estimated policy effects indicated a narrowing or widening of disparities across subgroups as a result of the social policy (see Web Appendix B for details). Descriptive results for other data abstracted from the 55 studies can be found elsewhere (Cintron et al., 2022; Matthay et al., 2022a, 2022b).
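The conversion details are in Web Appendix B, which is not reproduced here. As a minimal sketch, assuming the widely used logistic-distribution approximation (multiply a log odds ratio by √3/π ≈ 0.551 to obtain an SMD), the two scales can be linked as below. This factor is consistent with the later statement that τ ≥ 0.1 SMD corresponds to τ ≥ 0.18 on the lnOR scale, but it is an assumption rather than the authors' documented procedure; the function names are hypothetical.

```python
import math

# Standard deviation of the standard logistic distribution: pi / sqrt(3), about 1.814.
# Assumption: lnOR and SMD are linked through this factor (logistic approximation).
LOGISTIC_SD = math.pi / math.sqrt(3)


def lnor_to_smd(ln_or: float) -> float:
    """Convert a log odds ratio to an approximate standardized mean difference."""
    return ln_or / LOGISTIC_SD


def smd_to_lnor(smd: float) -> float:
    """Convert a standardized mean difference to an approximate log odds ratio."""
    return smd * LOGISTIC_SD


def lnor_var_to_smd_var(var_ln_or: float) -> float:
    """Rescale a sampling variance: variances scale with the square of the factor."""
    return var_ln_or / LOGISTIC_SD ** 2


# The SMD benchmark of 0.1 maps to about 0.18 on the lnOR scale.
print(round(smd_to_lnor(0.1), 2))  # 0.18
# A hypothetical odds ratio of 1.5 expressed as an SMD.
print(round(lnor_to_smd(math.log(1.5)), 3))
```

Because variances scale with the square of the factor, the same constant also converts the sampling variances needed for the meta-analyses.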

We categorized the population characteristics used to define the HTE evaluation dimensions into demographic characteristics (e.g., age, gender, race/ethnicity), geographic location (e.g., states, cities), health characteristics (e.g., depression scores, substance use, body mass index), and socioeconomic characteristics (e.g., education, income, socioeconomic status) (Web Appendix Table A2). We categorized the social policies into the following domains: firearm (e.g., right-to-carry), immigration (e.g., deferred action for childhood arrivals), macroeconomic (e.g., austerity), employment and income (e.g., minimum wage, cash transfers), family benefits (e.g., paid family leave), population parity (e.g., 1-child/2-child), alcohol and substance use (e.g., blood alcohol concentration limits for drivers), and education policies (e.g., education system stratification) (Web Appendix Table A3). Finally, because we expected that some HTE assessments might be conducted post hoc and therefore lack the study planning for sample sizes needed to ensure sufficient statistical precision, we also recorded whether authors specified their HTE evaluations a priori (i.e., they made their intent to evaluate HTEs along particular dimensions known prior to reporting them).

2.3. Statistical analysis

The outcome variable in all statistical analyses was the study-outcome-subgroup-specific estimate of the effect of the social policy. For each study-outcome-dimension, we fit a random-effects meta-analysis model to characterize the heterogeneity in policy effect estimates across population subgroups defined by the corresponding dimension (Viechtbauer, 2010). For example, if a study evaluated HTEs by both race and gender for two outcomes, we fit four random-effects meta-analysis models for that study: one model for each outcome with the race dimension and one model for each outcome with the gender dimension. Since we recorded effect estimates on both the additive (SMD) and multiplicative (lnOR) scales, we fit two random-effects meta-analysis models for each unique study-outcome-dimension–one on each scale.
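The bookkeeping in this paragraph (one model per study-outcome-dimension, fit separately on each scale) can be sketched as a grouping step. The record layout, field names, and example values below are hypothetical and only mirror the race-and-gender example in the text.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class SubgroupEstimate(NamedTuple):
    """One extracted subgroup effect estimate (hypothetical record layout)."""
    study: str
    outcome: str
    dimension: str  # e.g., "race", "gender"
    subgroup: str   # e.g., "women", "men"
    smd: float      # additive scale
    ln_or: float    # multiplicative scale
    n: int          # subgroup analytic sample size


Key = Tuple[str, str, str, str]  # (study, outcome, dimension, scale)


def group_for_meta_analysis(estimates: List[SubgroupEstimate]) -> Dict[Key, List[Tuple[float, int]]]:
    """Collect (effect, n) pairs so that each key corresponds to one random-effects model."""
    groups: Dict[Key, List[Tuple[float, int]]] = defaultdict(list)
    for e in estimates:
        groups[(e.study, e.outcome, e.dimension, "SMD")].append((e.smd, e.n))
        groups[(e.study, e.outcome, e.dimension, "lnOR")].append((e.ln_or, e.n))
    return dict(groups)


# Hypothetical example mirroring the text: one study, two outcomes, race and gender dimensions.
demo = [
    SubgroupEstimate("S1", outcome, dim, sg, 0.05, 0.09, 500)
    for outcome in ("outcome_1", "outcome_2")
    for dim, subgroups in (("race", ("group_a", "group_b")), ("gender", ("women", "men")))
    for sg in subgroups
]
models = group_for_meta_analysis(demo)
print(len(models))  # 8: four study-outcome-dimensions, each on two scales
```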

All models were fit by restricted maximum-likelihood (REML) estimation with the rma function of the metafor package in R version 4.1.3 (Viechtbauer, 2010). Random-effects meta-analysis models assume that the true effects vary across the units being pooled (in this case, the subgroups within each study-outcome-dimension (Borenstein et al., 2021)), whereas fixed-effects meta-analyses would assume that the true underlying policy effect is the same for all subgroups (Viechtbauer, 2010). The variation in observed effect estimates across subgroups may be due to real differences in policy effects across subgroups or to sampling variability (i.e., chance). Random-effects meta-analysis models decompose the variation in effect estimates into these two components using the following model for each study-outcome-dimension (Viechtbauer, 2010):

$$y_i = \theta_i + e_i \qquad (1)$$
$$\theta_i = \mu + u_i \qquad (2)$$

where $y_i$ denotes the observed effect in subgroup $i$, $\theta_i$ the true effect in subgroup $i$, and $\mu$ the overall average true effect across subgroups; $e_i \sim N(0, v_i)$ is the sampling error of the effect estimate; and, most importantly, $u_i \sim N(0, \tau^2)$ is the true subgroup-specific deviation from the true overall effect $\mu$. The $v_i$ are the sampling-error variances for each subgroup (i.e., the variances of the estimated subgroup effects). Because many studies did not report these variances, they were approximated as $v_i = 1/[\operatorname{var}(E)\,(n_i - 1)]$, where $E$ is the policy exposure variable (binary or continuous) and $n_i$ is the corresponding subgroup analytic sample size (Viechtbauer, 2010) (see Web Appendix B for details and justification).
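As a minimal sketch of the variance approximation in the preceding paragraph (the authors' justification is in Web Appendix B), the helper below computes v_i = 1/(var(E)(n_i − 1)) from an exposure variance and a subgroup sample size. Computing var(E) as p(1 − p) for a binary exposure with prevalence p is an illustrative assumption, and the function names are hypothetical.

```python
import math


def binary_exposure_variance(p_exposed: float) -> float:
    """Variance of a 0/1 exposure with prevalence p: p * (1 - p)."""
    if not 0.0 < p_exposed < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return p_exposed * (1.0 - p_exposed)


def approx_sampling_variance(var_exposure: float, n_subgroup: int) -> float:
    """Approximate v_i = 1 / (var(E) * (n_i - 1)) for a standardized effect estimate."""
    if n_subgroup < 2 or var_exposure <= 0.0:
        raise ValueError("need n_i >= 2 and var(E) > 0")
    return 1.0 / (var_exposure * (n_subgroup - 1))


# Hypothetical subgroup: half exposed to the policy, 401 people analysed.
v = approx_sampling_variance(binary_exposure_variance(0.5), 401)
print(v, math.sqrt(v))  # variance 0.01, standard error 0.1
```

The approximation shrinks with subgroup size, so small subgroups receive large sampling variances and correspondingly little weight in the random-effects model.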

The primary parameter of interest was τ, the standard deviation of the true subgroup-specific effects about the true overall effect, which quantifies the degree of heterogeneity in the effect of the social policy across subgroups after accounting for variability due to chance. τ was on the same scale as the policy effect estimates (SMD or lnOR), and τ = 0 indicated that there were no differences in the true effects of the policy across population subgroups.
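To make the estimation of τ concrete, here is a minimal sketch of REML for the model in equations (1) and (2), using the standard fixed-point update τ² ← Σw_i²[(y_i − μ̂)² − v_i]/Σw_i² + 1/Σw_i with w_i = 1/(v_i + τ²), truncated at zero. It is not metafor's own optimizer and omits the confidence intervals for τ used in the analysis; the function name and inputs are hypothetical. When all v_i are equal, the solution reduces to the sample variance of the y_i minus v, which provides an easy check.

```python
import math
from typing import Sequence, Tuple


def reml_tau2(
    y: Sequence[float],
    v: Sequence[float],
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[float, float, float]:
    """Return (tau2, mu, se_mu) for y_i = mu + u_i + e_i using a REML fixed-point iteration."""
    k = len(y)
    if k < 2 or len(v) != k:
        raise ValueError("need at least two subgroups with matching sampling variances")
    # Moment-based starting value: sample variance of y minus mean sampling variance, floored at 0.
    ybar = sum(y) / k
    tau2 = max(0.0, sum((yi - ybar) ** 2 for yi in y) / (k - 1) - sum(v) / k)
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        # REML update = ML update plus the 1/sum(w) correction, truncated at zero.
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sw)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    # Final inverse-variance weighted mean and its standard error at the estimated tau2.
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return tau2, mu, math.sqrt(1.0 / sw)


# Hypothetical subgroup SMDs with equal sampling variances of 0.01.
tau2, mu, se_mu = reml_tau2([0.0, 0.2, 0.4, -0.2], [0.01] * 4)
print(round(math.sqrt(tau2), 3), round(mu, 3))  # tau about 0.238, mu about 0.1
```

The truncation at zero is what produces τ = 0 when the observed spread of subgroup estimates is no larger than their sampling variances imply.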

We used frequency statistics and histograms to characterize the distribution of τ estimates overall and by dimension, social policy domain, and a priori specification of HTE analyses. We did not calculate inferential statistics to evaluate whether the distributions of τ differed by dimension or study characteristics because of the small number of effect estimates in each category. Given that the effects of social policies on health outcomes are likely to be small (i.e., <0.2 SMD) (Matthay et al., 2021), we considered τ ≥0.1 SMDs (or equivalently τ ≥0.18 on the lnOR scale) (Cohen, 2013) to be “large” heterogeneity.
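The frequency statistics described here can be sketched as a small summary function that, for a set of τ estimates and the lower bounds of their 95% confidence intervals, returns the kinds of quantities reported in Table 1. The benchmark follows the τ ≥ 0.1 SMD definition in this paragraph; the function name and example values are hypothetical.

```python
import statistics
from typing import Dict, Sequence

LARGE_TAU_SMD = 0.1  # "large" heterogeneity benchmark on the SMD scale


def summarize_tau(taus: Sequence[float], ci_lowers: Sequence[float]) -> Dict[str, float]:
    """Table 1-style summary for one group of study-outcome-dimensions."""
    n = len(taus)
    if n == 0 or len(ci_lowers) != n:
        raise ValueError("need one CI lower bound per tau estimate")
    return {
        "n": n,
        "median": statistics.median(taus),
        "min": min(taus),
        "max": max(taus),
        "pct_large": 100.0 * sum(t >= LARGE_TAU_SMD for t in taus) / n,
        # tau cannot be negative, so its CI excludes the null when the lower bound is above 0.
        "pct_ci_excludes_null": 100.0 * sum(lo > 0.0 for lo in ci_lowers) / n,
    }


# Hypothetical tau estimates and 95% CI lower bounds for five study-outcome-dimensions.
print(summarize_tau([0.0, 0.02, 0.05, 0.12, 0.3], [0.0, 0.0, 0.01, 0.05, 0.2]))
```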

Special ethical considerations arise in the context of qualitative interaction, i.e., when a policy benefits some subgroups but harms others. To assess whether such qualitative interactions were likely, we also used the estimated parameters from the meta-analysis models to quantify the proportion of the time we would expect a subgroup effect on the opposite side of the null from the overall population effect. Specifically, for each study-outcome-dimension, we computed the area under the $N(\mu, \tau^2)$ density that lies on the opposite side of the null from the overall population effect. A non-negligible area indicates that failure to evaluate HTEs could lead to policy decisions that inadvertently harm some subgroups.
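The area calculation can be sketched as follows, assuming the normal distribution of true subgroup effects from equation (2). The sign of the overall population effect is passed separately because it need not match the sign of μ; when the two signs agree, the area cannot exceed 50%. Function names and example values are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_opposite_side(mu: float, tau: float, overall_effect: float) -> float:
    """Area of N(mu, tau^2) on the opposite side of zero from overall_effect."""
    if overall_effect == 0.0:
        raise ValueError("overall effect is exactly null; 'opposite side' is undefined")
    if tau == 0.0:
        # No heterogeneity: every true subgroup effect equals mu.
        return 1.0 if mu * overall_effect < 0.0 else 0.0
    p_below_zero = normal_cdf(-mu / tau)  # P(theta_i < 0)
    return p_below_zero if overall_effect > 0.0 else 1.0 - p_below_zero


# Hypothetical study-outcome-dimension: mu = 0.05 SMD, tau = 0.05, beneficial overall effect.
print(round(prob_opposite_side(0.05, 0.05, 0.08), 3))  # 0.159, i.e., Phi(-1)
```

With τ = 0 the area is exactly zero whenever μ and the overall effect agree in sign, which is why only study-outcome-dimensions with estimated heterogeneity can show a non-zero area.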

Lastly, for each meta-analysis model, we examined $I^2$, the estimated percentage of variability in the effect estimates attributable to real between-subgroup heterogeneity rather than chance; $I^2 = 0\%$ indicates that the variation in effect estimates was consistent with chance alone.
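For reference, a minimal sketch of I² for a random-effects fit, assuming the Higgins and Thompson form used by common meta-analysis software: I² = 100% × τ²/(τ² + s²), where s² = (k − 1)Σw_i / [(Σw_i)² − Σw_i²] is a "typical" sampling variance with w_i = 1/v_i. Function names and example values are hypothetical.

```python
from typing import Sequence


def typical_sampling_variance(v: Sequence[float]) -> float:
    """'Typical' within-subgroup variance s^2 = (k-1) * sum(w) / (sum(w)^2 - sum(w^2)), w = 1/v."""
    k = len(v)
    if k < 2:
        raise ValueError("need at least two subgroups")
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    return (k - 1) * sw / (sw * sw - sw2)


def i_squared(tau2: float, v: Sequence[float]) -> float:
    """Percentage of total variability attributed to between-subgroup heterogeneity."""
    s2 = typical_sampling_variance(v)
    return 100.0 * tau2 / (tau2 + s2)


# Hypothetical: four subgroups with equal sampling variance 0.01 and tau^2 = 0.01.
print(round(i_squared(0.01, [0.01] * 4), 1))  # 50.0
print(i_squared(0.0, [0.01, 0.02, 0.05]))     # 0.0: no between-subgroup heterogeneity
```

Under this definition I² is zero exactly when the estimated τ² is zero, so the share of study-outcome-dimensions with I² = 0 mirrors the share with τ truncated at zero.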

A complete overview of the steps in this study is presented in Fig. 1.

Fig. 1.


Data extraction and analysis flow chart.

3. Results

3.1. Characteristics of the study sample

Of the 55 studies, 24 evaluated some form of HTEs. Studies assessed a range of health outcomes (e.g., infant mortality, self-rated health, firearm suicides) and social policies. After data extraction, the database included 557 subgroup effect estimates from 159 unique study-outcome-dimensions. Of these, 24 estimates (4%) were excluded because the quantities needed to transform the measure of association to the lnOR or SMD were not reported and could not be approximated. Another 16 estimates (2%) were excluded due to missing information needed to compute the variance of the subgroup-specific effect estimate. Finally, 7 estimates (1%) were excluded because they were not reported in the original study due to model non-convergence. We treated these estimates as missing completely at random. The final analytic database included 510 subgroup effect estimates from 136 unique study-outcome-dimensions (Table 1).

Table 1.

Sample characteristics and estimated heterogeneity of effects within study-outcome-dimensions for studies on the health effects of social policies.

| | Study-outcome-dimensions (n) | Estimates (n) | Median τ (range) | τ > 0.1 (%) | 95% CI for τ excludes null (%) |
|---|---|---|---|---|---|
| Overall | 136 | 510 | 0.03 (0.0–0.9) | 20 | 54 |
| Social policy domain | | | | | |
| Firearm | 11 | 76 | 0.00 (0.0–0.0) | 0 | 0 |
| Immigration | 2 | 4 | 0.09 (0.0–0.2) | 50 | 50 |
| Macroeconomic | 2 | 4 | 0.00 (0.0–0.0) | 0 | 0 |
| Employment and income | 41 | 114 | 0.05 (0.0–0.4) | 29 | 71 |
| Family benefits | 52 | 221 | 0.03 (0.0–0.9) | 13 | 62 |
| Population parity | 8 | 16 | 0.06 (0.0–0.2) | 25 | 75 |
| Alcohol and substance use | 18 | 62 | 0.03 (0.0–0.3) | 28 | 28 |
| Education | 2 | 13 | 0.00 (0.0–0.0) | 0 | 0 |
| Population characteristic | | | | | |
| Demographic characteristics | 71 | 210 | 0.02 (0.0–0.9) | 18 | 48 |
| Geographic location | 22 | 162 | 0.02 (0.0–0.4) | 18 | 41 |
| Health characteristics | 6 | 32 | 0.04 (0.0–0.4) | 33 | 50 |
| Socioeconomic characteristics | 37 | 106 | 0.04 (0.0–0.8) | 22 | 73 |
| A priori specification | | | | | |
| Yes | 99 | 422 | 0.02 (0.0–0.4) | 11 | 51 |
| No | 37 | 88 | 0.08 (0.0–0.9) | 43 | 54 |

Note. Tau (τ) is the standard deviation of the effect estimates across the given study-outcome-dimension after accounting for sampling variability. CI, 95% confidence interval.

3.2. Benefits and harms of social policies and implications for disparities

Standardized policy effect estimates for all studies, outcomes, and subgroups are presented in Fig. 2. Social policy effect estimates were generally small (<0.1 SMDs). Effects on health were mixed: 342 (67%) estimates implied health benefits and 168 (33%) implied health harms. Across the 136 study-outcome-dimensions, the estimated policy effects for 68 (50%) implied a widening of disparities in the outcome between subgroups. Cross-tabulating this information with the direction of effect (harmful versus beneficial), 14% of effect estimates corresponded to harmful effects on average that nonetheless reduced the magnitude of disparities in the outcome across subgroups, 36% corresponded to beneficial effects on average that reduced disparities, 18% corresponded to harmful effects on average that widened disparities, and 31% corresponded to beneficial effects on average that widened disparities.

Fig. 2.


Distribution of estimated subgroup social policy effects by population characteristic and social policy domain

Note. The overall effect is the overall policy effect reported for a given study population. In certain studies, the overall policy effect is not presented because it was not reported in the original study or the subgroup analyses were the primary focus of the study. Each x-axis tick mark represents one study-outcome-dimension. A grey line is placed at a null effect (zero). Positive values indicate beneficial effects, whereas negative values indicate harmful effects. Estimates for each study-outcome-dimension are jittered for clarity. See Web Appendix Tables A2 and A3 for more information on population dimensions and social policy domains.

3.3. Overall distribution of heterogeneity in social policy effect estimates

The effects of social policies frequently varied by population subgroup. Fig. 3 presents the distribution of estimated τ values. Across study-outcome-dimensions, the median τ (degree of heterogeneity) on the SMD scale was 0.03 (range: 0.0–0.9); 54% of τ estimates had 95% confidence intervals that excluded 0, indicating statistically significant heterogeneity; and 20% of τ estimates exceeded our benchmark for large heterogeneity of 0.1 (Table 1). On the multiplicative scale, the distribution of τ estimates showed a similar pattern (Appendix Figure A4, Appendix Table A5) but was larger in magnitude than on the additive scale. Forty-seven (35%) study-outcome-dimensions had estimated I² values of 0, indicating that for these dimensions any apparent variation in effects across subgroups was within that expected from chance in finite samples (Appendix Figures A8–A9).

Fig. 3.


Distribution of standard deviations (heterogeneity) in standardized mean difference estimates of social policy effects (τ) across study-outcome-dimensions

Note. Tau (τ) is the standard deviation of the effect estimates across study-outcome-dimensions after accounting for sampling variability. The vertical dashed line represents our benchmark for considerable heterogeneity (i.e., τ = 0.1). Two sets of histograms are overlaid and shaded by the statistical significance of the τ estimates, where statistical significance refers to a 95% confidence interval for the estimated τ that excluded the null.

3.4. Distribution of heterogeneity in social policy effect estimates by social policy domain

Most HTE evaluations involved employment and income (n = 41) or family benefits (n = 52) policies (Table 1). Heterogeneity in estimated policy effects was evident for most types of social policies (Table 1; Appendix Figures A1 and A5; Appendix Table A5). Specifically, τ exceeded the 0.1 benchmark for 50% of immigration, 29% of employment and income, 28% of alcohol and substance use, 25% of population parity, and 13% of family benefits study-outcome-dimensions, but for 0% of firearm, macroeconomic, or education study-outcome-dimensions.

3.5. Distribution of heterogeneity in social policy effect estimates by population characteristics

Demographic characteristics (n = 71), geographic location (n = 22), and socioeconomic characteristics (n = 37) were the most common population characteristics for which HTEs were evaluated (Table 1). Heterogeneity in estimated policy effects was evident for all population characteristics (Table 1; Appendix Figures A2 and A6; Appendix Table A5). Specifically, τ exceeded the 0.1 benchmark for 18% of demographic characteristics, 18% of geographic location, 33% of individual health characteristics, and 22% of socioeconomic characteristics analyses.

3.6. Distribution of heterogeneity in social policy effect estimates by a priori specification of HTEs

HTE evaluations were specified a priori for 73% of study-outcome-dimensions. Large heterogeneity in estimated effects was less common when HTE evaluations were specified a priori (11% with τ > 0.1) than when they were not (43% with τ > 0.1) (Table 1; Appendix Figures A3 and A7; Appendix Table A5).

3.7. Frequency of qualitative interaction

Given the estimated variance of effect sizes across subgroups, effects in opposite directions are likely to be common. Of the 136 study-outcome-dimensions, 104 (76%) had a non-zero area under the curve on the opposite side of the null from the overall population effect. Of these 104, 44% corresponded to an expected effect on the opposite side of the null from the overall population effect at least 25% of the time, 26% at least 50% of the time, and 16% at least 75% of the time.

4. Discussion

We characterized the frequency, magnitude, and distribution of HTEs in a contemporary sample of studies on the health effects of social policies. Fewer than half of studies (44%) evaluated heterogeneity in estimated policy effects across any population subgrouping dimension. Across reported HTE evaluations (study-outcome-dimensions), 54% indicated statistically significant differences in policy effects across population subgroups and 20% indicated large heterogeneity. With some variation in frequency and magnitude, HTEs were observed for most social policy domains, all types of population characteristics, and regardless of whether the HTE evaluation was specified a priori. These findings underscore the importance of evaluating HTEs of social policies. HTEs are important both for understanding the implications of a social policy for health disparities and for anticipating how population-level policy effects will differ in jurisdictions whose composition differs from that of the population initially studied.

Evidence of considerable heterogeneity in social policy effects across population subgroups is consistent with social theory. For example, resource substitution theory hypothesizes that those who have been historically denied health promoting resources (e.g., education, income, and power) will benefit more from access to these resources compared to those who more readily receive them (Ross & Mirowsky, 2006). Numerous prior policy evaluations have reported important HTEs for at least some subgrouping dimensions (Leventhal & Brooks-Gunn, 2003; Nguyen et al., 2016; Vable et al., 2016), but to our knowledge, this is the first study to systematically assess the frequency, magnitude, and distribution of HTEs across the social policy literature. It is also the first to apply meta-analysis methodology as a tool for characterizing heterogeneity and enabling discussions of health equity impacts.

Only 35% of the study-outcome-dimensions we evaluated had I² = 0; for the remainder, the differences in estimated policy effects across subgroups were larger than would be expected from chance alone. Yet the estimates reported here likely provide a lower bound on the frequency of heterogeneity in social policy effects, for two reasons. First, HTE evaluations are frequently underpowered, so an apparent lack of heterogeneity may simply reflect sample sizes too small to derive precise effect estimates for each subgroup. For example, we did not observe heterogeneity in the study-outcome-dimensions involving firearm, macroeconomic, or education policies, but this does not mean that these policies have homogeneous effects. Second, HTEs are not routinely assessed along all potentially relevant dimensions (Cintron et al., 2022). This study thus adds to accumulating evidence that substantial HTEs are common and should be routinely and systematically reported when evaluating the health effects of social policies (Cintron et al., 2022; Matthay & Glymour, 2022; Petticrew et al., 2012). It also highlights the importance of study planning to ensure that evaluations of social policies are sufficiently precise to evaluate HTEs.

The importance of evaluating HTEs is especially evident given our finding of subgroup effects in the opposite direction from the overall population effect (e.g., 76% of the study-outcome-dimensions had a non-zero area under the curve on the opposite side of the null). Studies lacking HTE assessments may therefore lead to policy recommendations that inadvertently harm some groups, or conversely, lead to missed opportunities for some subgroups to benefit. This study provides a methodological framework for using meta-analysis to identify these patterns. Our findings suggest that prior studies that did not examine HTEs might well be revisited to check for differential effects across important subgroups. HTE evaluations also show how the directions of average health effects (benefit versus harm) intersect with implications for disparities: 36% of HTE evaluations in this study corresponded to beneficial average effects that also reduced the magnitude of disparities in the outcome across subgroups, but 14% reduced disparities despite on-average health-harming effects; and 31% benefitted health on-average but widened disparities. To enable informed policy discussions of these tradeoffs, adequate quantitative HTE assessments across all relevant subgroups are required.

Funders, policymakers, and researchers have limited resources, and powering studies to evaluate HTEs entails added cost and complexity. Thus, resources should be dedicated to evaluating heterogeneity only if meaningful heterogeneity is likely and relevant for the policy impact. Studies such as ours can help inform this priority setting, but a larger evidence base is needed to draw firm conclusions. For instance, in our sample substantial heterogeneity in effect estimates was common across subgroups defined by health characteristics and socioeconomic characteristics and somewhat less common for geographic and demographic characteristics. Ideally a comprehensive analysis and theoretical guidance could direct prioritization of both the dimensions of heterogeneity–including intersectionally defined groups–and policy domains most likely to have heterogeneous effects.

The variation in policy types, study contexts, study designs, and outcome measures across studies included in our analysis implies that there are many reasons that the magnitude of HTEs may differ across studies. Delving further into the reasons for differences in estimated HTEs for subsets of studies that are more homogeneous is an area for future research. We conducted the random effects meta-analyses at the level of the study-outcome-dimension, so our analytic approach makes no assumptions about the level of similarity or difference in the estimated subgroup treatment effects across studies. We view this analytic flexibility and the diversity of policy domains and study contexts as a strength because, to our knowledge, no research has quantified the magnitude and distribution of HTEs using our meta-analysis analytic approach across any social policy domains. Because of the paucity of research in this area, our study is an important first step towards quantifying the full range of HTEs across diverse domains, so that subsequent research can further investigate individual domains and reasons for differences across domains.

No consensus on when or how to evaluate HTEs in social policy and health research has been established. For example, how should researchers balance the importance of identifying meaningful heterogeneity against the increased risk of spurious findings as the number of subgroups grows (Heckman et al., 2010)? Among the many diverse methods for evaluating HTEs (Breck & Wakar, 2021), which approaches are most appropriate for social policies? Which perform best and under what conditions (e.g., for few versus many subgroups)? Which available methods (e.g., MAIHDA, probability samples, qualitative tools) are best suited to small sample sizes (Evans et al., 2018; Harding & Seefeldt, 2013; Tipton et al., 2019)? When is pre-specification or pre-registration necessary? How should HTEs be reported? Several recent articles on performing, reporting, and assessing the credibility of subgroup analyses may help in this effort (Gil-Sierra et al., 2020; Lesko et al., 2018; Schandelmaier et al., 2019, 2020; Sun et al., 2010; Tipton et al., 2019; Varadhan et al., 2013). We found that substantial heterogeneity was more common (43% vs 11%) when HTE evaluations were not specified a priori. This finding is consistent with research on the replicability crisis that suggests increases in spurious findings due to multiple hypothesis testing (Austin et al., 2006; Berger et al., 2009; Gelman & Loken, 2014). Research indicates that methods for evaluating HTEs vary considerably across disciplines and that these differences may lead to differing conclusions (Breck & Wakar, 2021; Inglis et al., 2018; Loh et al., 2019). Work to develop guidelines for the conduct and reporting of HTEs specific to studies of the health effects of social policies is needed, especially given the unique implications of these studies for public policy and health equity.

Many potentially relevant subgrouping dimensions were not evaluated in the studies reviewed here. For example, no study in our sample considered health insurance status or disability status, yet important heterogeneity along these dimensions may exist. Differential impacts of social policies on racial/ethnic subgroups are of particular interest, but too few studies used consistent definitions of racial/ethnic categories for us to examine results for this dimension separately; this is a priority for future work. Furthermore, no subgroups were explicitly defined using intersectionality theory (Crenshaw, 1989); almost all studies treated subgrouping dimensions as independent and mutually exclusive (only 12 of the study-outcome-dimensions cross-classified multiple demographic characteristics, e.g., age by gender). Future HTE evaluations must consider subgroups based on intersectionality theory (e.g., race by gender) to illuminate how a person’s multiple identities and social positions might be embedded within systems of inequality. Guidance is needed on how to determine, based on theoretical and/or statistical principles, which dimensions are relevant and should be evaluated (Boyd et al., 2020).

4.1. Limitations

We excluded 8% of study-outcome-subgroup-specific estimates because the original studies contributing to our meta-analyses reported HTEs incompletely. The small number of studies and subgroups in our analysis also limited our ability to make claims about differences in the frequency or magnitude of heterogeneity in social policy effects by specific population characteristics or social policy domains. Furthermore, the subgroups examined varied considerably across studies, and we rarely observed the same subgrouping dimension (e.g., race/ethnicity) consistently across multiple studies.

Also, the estimated HTEs may not equal the true HTEs. In this paper, we treat estimates as the investigators’ best attempt to estimate the causal effect of the social policy on each population subgroup, but we acknowledge that all estimates likely depart from the true causal effect to some degree. Further assessment of the methodological quality of studies evaluating HTEs is necessary; this work is best done within subsets of studies that are more homogeneous with respect to policy type, study context, study design, and outcome measures. We view this paper as a first step towards these goals. Finally, some assumptions and approximations were necessary to convert reported measures of association to a common scale (SMDs) and to obtain consistent and complete variance estimates for all effect estimates.
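
As an example of the kind of approximation involved, one widely used conversion for outcomes reported as odds ratios is the logit method described by Borenstein et al. (2021): d = ln(OR) × √3/π, with Var(d) = Var(ln OR) × 3/π². The Python sketch below applies it, recovering the standard error of ln(OR) from a reported 95% confidence interval. It is an assumption that this mirrors the conversions used in this review, which may differ; other effect measures (e.g., risk ratios or regression coefficients) require different formulas, and the function names and example values are hypothetical.

```python
import math

SQRT3_OVER_PI = math.sqrt(3) / math.pi  # logit-to-SMD scaling factor


def se_log_or_from_ci(lower, upper, z=1.96):
    """Approximate SE of ln(OR) from a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def odds_ratio_to_smd(odds_ratio, se_log_or):
    """Convert an odds ratio and the SE of its log to (SMD, variance of SMD).

    Logit method: d = ln(OR) * sqrt(3)/pi and Var(d) = Var(ln OR) * 3/pi^2.
    The sign of d follows the sign of ln(OR); whether that is beneficial
    depends on the outcome's direction and must be harmonized separately.
    """
    d = math.log(odds_ratio) * SQRT3_OVER_PI
    var_d = se_log_or ** 2 * SQRT3_OVER_PI ** 2
    return d, var_d


# Hypothetical subgroup estimate: OR = 0.80 (95% CI 0.65 to 0.98)
se = se_log_or_from_ci(0.65, 0.98)
d, var_d = odds_ratio_to_smd(0.80, se)
print(f"SMD = {d:.3f}, variance = {var_d:.5f}")
```

The conversion assumes an underlying continuous outcome with a logistic distribution, which is one of the approximations that can make converted SMDs depart from the quantity a study would have estimated on a continuous scale.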

4.2. Conclusions

This is the first study to systematically characterize, at scale, the frequency, magnitude, and distribution of HTEs in research on the health effects of social policies. We found that social policies can have considerably different health effects on different subgroups and that the frequency and magnitude of heterogeneity varied by social policy domain, subgrouping dimension, and a priori HTE specification. While this study does not provide recommendations on specific policy domains or population subgroups for which evaluating HTEs is a priority, it provides a novel methodological framework for quantifying HTEs and lays the groundwork for future investigations. Researchers and policymakers should be aware that social policies may have differential impacts across population subgroups in ways that can either exacerbate or mitigate health disparities. Consistently conducting HTE evaluations across all relevant population subgrouping dimensions is essential for adequate evidence-based policymaking that promotes health equity. This includes assessing HTEs in future policy studies as well as revisiting prior social policy studies that did not examine HTEs. Yet in social policy research, HTE evaluations remain rare, methods are not standardized, and there is no available guidance on best practices. Given the small number of studies and diversity of subgrouping dimensions in this review, more research on a larger sample of the literature is needed to definitively characterize the policies and population subgrouping dimensions that are most important to evaluate.

Sources of financial support

This work was supported by the Evidence for Action program of the Robert Wood Johnson Foundation (RWJF).

Data access

Data and computing code may be made available upon reasonable request.

CRediT author statement

Dakota W. Cintron: Methodology, Software, Formal analysis, Investigation, Data Curation, Writing – Original Draft, Writing – Review & Editing. Laura M. Gottlieb: Writing – Review & Editing. Erin Hagan: Writing – Review & Editing. May Lynn Tan: Writing – Review & Editing. David Vlahov: Writing – Review & Editing. M. Maria Glymour: Methodology, Writing – Original Draft, Writing – Review & Editing, Supervision. Ellicott C. Matthay: Conceptualization, Methodology, Writing – Original Draft, Writing – Review & Editing, Supervision.

Declaration of competing interest

There are no conflicts of interest to report.

Acknowledgements

None.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ssmph.2023.101352.

Appendix A. Supplementary data

The following are the Supplementary data to this article.

Multimedia component 1
mmc1.pdf (98.7KB, pdf)
Multimedia component 2
mmc2.pdf (277.1KB, pdf)

Data availability

Data will be made available upon reasonable request.

References

  1. Austin P.C., Mamdani M.M., Juurlink D.N., Hux J.E. Testing multiple statistical hypotheses resulted in spurious associations: A study of astrological signs and health. Journal of Clinical Epidemiology. 2006;59(9):964–969. doi: 10.1016/j.jclinepi.2006.01.012.
  2. Berger M.L., Mamdani M., Atkins D., Johnson M.L. Good research practices for comparative effectiveness research: Defining, reporting and interpreting nonrandomized studies of treatment effects using secondary data sources: The ISPOR good research practices for retrospective database analysis task force report—Part I. Value in Health. 2009;12(8):1044–1052. doi: 10.1111/j.1524-4733.2009.00600.x.
  3. Borenstein M., Hedges L.V., Higgins J.P., Rothstein H.R. Introduction to meta-analysis. John Wiley & Sons; 2021.
  4. Boyd R.W., Lindo E.G., Weeks L.D., McLemore M.R. On racism: A new standard for publishing on racial health inequities. Health Aff Blog. 2020.
  5. Boykin C.M., Brown N.D., Carter J.T., et al. Anti-racist actions and accountability: Not more empty promises. Equal Divers Incl Int J. 2020;39(7):775–786. doi: 10.1108/EDI-06-2020-0158.
  6. Breck A., Wakar B. Methods, challenges, and best practices for conducting subgroup analysis. OPRE Rep. 2021:17.
  7. Cintron D.W., Adler N.E., Gottlieb L.M., et al. Heterogeneous treatment effects in social policy studies: An assessment of contemporary articles in the health and social sciences. Annals of Epidemiology. 2022;70:79–88. doi: 10.1016/j.annepidem.2022.04.009.
  8. Cohen J. Statistical power analysis for the behavioral sciences. Routledge; 2013.
  9. Crenshaw K. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory, and antiracist politics. University of Chicago Legal Forum. 1989;(1):139–167.
  10. Evans C.R., Williams D.R., Onnela J.P., Subramanian S.V. A multilevel approach to modeling health inequalities at the intersection of multiple social identities. Social Science & Medicine. 2018;203:64–73. doi: 10.1016/j.socscimed.2017.11.011.
  11. Fan J., Song F., Bachmann M.O. Justification and reporting of subgroup analyses were lacking or inadequate in randomized controlled trials. Journal of Clinical Epidemiology. 2019;108:17–25. doi: 10.1016/j.jclinepi.2018.12.009.
  12. Fernandez y Garcia E., Nguyen H., Duan N., Gabler N.B., Kravitz R.L. Assessing heterogeneity of treatment effects: Are authors misinterpreting their results? Health Services Research. 2010;45(1):283–301. doi: 10.1111/j.1475-6773.2009.01064.x.
  13. Gabler N.B., Duan N., Liao D., Elmore J.G., Ganiats T.G., Kravitz R.L. Dealing with heterogeneity of treatment effects: Is the literature up to the challenge? Trials. 2009;10(1):43. doi: 10.1186/1745-6215-10-43.
  14. Gelman A., Loken E. The statistical crisis in science: Data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don't hold up. American Scientist. 2014;102(6):460.
  15. Gil-Sierra M.D., Fénix-Caballero S., Abdel kader-Martin L., et al. Checklist for clinical applicability of subgroup analysis. Journal of Clinical Pharmacy and Therapeutics. 2020;45(3):530–538. doi: 10.1111/jcpt.13102.
  16. Glymour M.M., Osypuk T.L., Rehkopf D.H. Invited commentary: Off-roading with social epidemiology—exploration, causation, translation. American Journal of Epidemiology. 2013;178(6):858–863. doi: 10.1093/aje/kwt145.
  17. Harding D.J., Seefeldt K.S. Mixed methods and causal analysis. In: Handbook of causal analysis for social research. Springer; 2013. pp. 91–110.
  18. Heckman J., Moon S.H., Pinto R., Savelyev P., Yavitz A. Analyzing social experiments as implemented: A reexamination of the evidence from the HighScope Perry Preschool Program. Quant Econ. 2010;1(1):1–46. doi: 10.3982/QE8.
  19. Inglis G., Archibald D., Doi L., et al. Credibility of subgroup analyses by socioeconomic status in public health intervention evaluations: An underappreciated problem? SSM - Popul Health. 2018;6:245–251. doi: 10.1016/j.ssmph.2018.09.010.
  20. Kasenda B., Schandelmaier S., Sun X., et al. Subgroup analyses in randomised controlled trials: Cohort study on trial protocols and journal publications. BMJ. 2014;349. doi: 10.1136/bmj.g4539.
  21. Kendi I.X. How to be an antiracist. Random House Publishing Group; 2019.
  22. Lesko C.R., Henderson N.C., Varadhan R. Considerations when assessing heterogeneity of treatment effect in patient-centered outcomes research. Journal of Clinical Epidemiology. 2018;100:22–31. doi: 10.1016/j.jclinepi.2018.04.005.
  23. Leventhal T., Brooks-Gunn J. Moving to opportunity: An experimental study of neighborhood effects on mental health. American Journal of Public Health. 2003;93(9):1576–1582. doi: 10.2105/AJPH.93.9.1576.
  24. Loh W.Y., Cao L., Zhou P. Subgroup identification for precision medicine: A comparative review of 13 methods. WIREs Data Min Knowl Discov. 2019;9(5):e1326. doi: 10.1002/widm.1326.
  25. Matthay E.C. Do social interventions have different health effects for different people? Evidence for Action Methods Notes. 2020. https://www.evidenceforaction.org/sites/default/files/2021-04/E4A-Methods-Note-HTEp1.pdf
  26. Matthay E.C., Glymour M.M. Causal inference challenges and new directions for epidemiologic research on the health effects of social policies. Curr Epidemiol Rep. 2022;9(1):22–37. doi: 10.1007/s40471-022-00288-7.
  27. Matthay E.C., Gottlieb L.M., Rehkopf D., Tan M.L., Vlahov D., Glymour M.M. What to do when everything happens at once: Analytic approaches to estimate the health effects of co-occurring social policies. Epidemiologic Reviews. 2022;43(1):33–47. doi: 10.1093/epirev/mxab005.
  28. Matthay E.C., Hagan E., Gottlieb L.M., et al. Powering population health research: Considerations for plausible and actionable effect sizes. SSM - Popul Health. 2021;14. doi: 10.1016/j.ssmph.2021.100789.
  29. Matthay E.C., Hagan E., Joshi S., et al. The revolution will be hard to evaluate: How co-occurring policy changes affect research on the health effects of social policies. Epidemiologic Reviews. 2022;43(1):19–32. doi: 10.1093/epirev/mxab009.
  30. Nguyen Q.C., Rehkopf D.H., Schmidt N.M., Osypuk T.L. Heterogeneous effects of housing vouchers on the mental health of US adolescents. American Journal of Public Health. 2016;106(4):755–762. doi: 10.2105/AJPH.2015.303006.
  31. Petticrew M., Tugwell P., Kristjansson E., Oliver S., Ueffing E., Welch V. Damned if you do, damned if you don't: Subgroup analysis and equity. Journal of Epidemiology & Community Health. 2012;66(1):95–98. doi: 10.1136/jech.2010.121095.
  32. Rojas-Saunero L.P., Labrecque J.A., Swanson S.A. Invited commentary: Conducting and emulating trials to study effects of social interventions. American Journal of Epidemiology. 2022;191(8):1453–1456. doi: 10.1093/aje/kwac066.
  33. Ross C.E., Mirowsky J. Sex differences in the effect of education on depression: Resource multiplication or resource substitution? Social Science & Medicine. 2006;63(5):1400–1413. doi: 10.1016/j.socscimed.2006.03.013.
  34. Rothman K.J., Greenland S., Lash T.L. Modern epidemiology. 3rd ed. Wolters Kluwer Health/Lippincott Williams & Wilkins; Philadelphia: 2008.
  35. Schandelmaier S., Briel M., Varadhan R., et al. Development of the instrument to assess the credibility of effect modification analyses (ICEMAN) in randomized controlled trials and meta-analyses. Canadian Medical Association Journal. 2020;192(32):E901–E906. doi: 10.1503/cmaj.200077.
  36. Schandelmaier S., Chang Y., Devasenapathy N., et al. A systematic survey identified 36 criteria for assessing effect modification claims in randomized trials or meta-analyses. Journal of Clinical Epidemiology. 2019;113:159–167. doi: 10.1016/j.jclinepi.2019.05.014.
  37. Starks M.A., Sanders G.D., Coeytaux R.R., et al. Assessing heterogeneity of treatment effect analyses in health-related cluster randomized trials: A systematic review. PLoS One. 2019;14(8). doi: 10.1371/journal.pone.0219894.
  38. Sun X., Briel M., Busse J.W., et al. Credibility of claims of subgroup effects in randomised controlled trials: Systematic review. BMJ. 2012;344. doi: 10.1136/bmj.e1553.
  39. Sun X., Briel M., Walter S.D., Guyatt G.H. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses. BMJ. 2010;340. doi: 10.1136/bmj.c117.
  40. Thomson R.M., Igelström E., Purba A.K., et al. How do income changes impact on mental health and wellbeing for working-age adults? A systematic review and meta-analysis. The Lancet Public Health. 2022;7(6):e515–e528. doi: 10.1016/S2468-2667(22)00058-5.
  41. Tipton E., Yeager D.S., Iachan R., Schneider B. Designing probability samples to study treatment effect heterogeneity. In: Experimental methods in survey research. John Wiley & Sons, Ltd; 2019. pp. 435–456.
  42. Vable A.M., Canning D., Glymour M.M., Kawachi I., Jimenez M.P., Subramanian S.V. Can social policy influence socioeconomic disparities? Korean War GI Bill eligibility and markers of depression. Annals of Epidemiology. 2016;26(2):129–135.e3. doi: 10.1016/j.annepidem.2015.12.003.
  43. Varadhan R., Segal J.B., Boyd C.M., Wu A.W., Weiss C.O. A framework for the analysis of heterogeneity of treatment effect in patient-centered outcomes research. Journal of Clinical Epidemiology. 2013;66(8):818–825. doi: 10.1016/j.jclinepi.2013.02.009.
  44. Viechtbauer W. Conducting meta-analyses in R with the metafor package. Journal of Statistical Software. 2010;36(3):1–48.

Articles from SSM - Population Health are provided here courtesy of Elsevier
