Author manuscript; available in PMC: 2022 Mar 17.
Published in final edited form as: Nat Hum Behav. 2021 Jul 22;5(8):980–989. doi: 10.1038/s41562-021-01143-3

Table 2 | How different stages of research can take heterogeneity seriously, or not

Columns: Stage of research; Description; How to take heterogeneity seriously; How to not take heterogeneity seriously

(1) Initial experiments

Description:
• Initial test(s) of a new intervention idea
• Typically conducted with convenience samples
• Usually overestimates average intervention effects in the population

How to take heterogeneity seriously:
• Assume effect is moderated in the broader population, use an inclusive sample, and theorize carefully about potential moderators
• Measure and report data on potential moderators (even those with little or no variation in the sample) to inform future studies and meta-analyses
• Qualify research conclusions clearly and prominently (for example, 'This is a promising solution with unknown generalizability')

How to not take heterogeneity seriously:
• Fail to theorize carefully about likely limits on generalizability
• Fail to measure and report data on potential moderators
• Make claims about the real-world promise of an intervention without clear, prominent qualification based on sample and site characteristics
• Omit diverse populations

(2) Efficacy experiments

Description:
• Replication or extension experiment(s)
• Typically conducted with larger convenience samples
• Could overestimate or underestimate average intervention effects in the population

How to take heterogeneity seriously:
• Define the population of interest (that is, target population if intervention were applied at scale)
• Select test sites and sample inclusion criteria to represent some or all of the population of interest (purposive sampling)
• Measure and report data on moderators of interest and, when appropriate, test for subgroup differences
• Qualify claims (for example, 'Preliminary evidence that this intervention is effective, at least in urban schools in Northern California')

How to not take heterogeneity seriously:
• Select samples and test sites based primarily on convenience
• Make claims about the real-world promise of an intervention without clear, prominent qualification based on sample and site characteristics

(3) Effectiveness experiments

Description:
• Large-scale tests of an intervention's likely effect in the population of interest (or in some specified subgroup within that population)
• Typically conducted with larger, generalizable samples that include many sources of heterogeneity
• May include an experimental manipulation of the relevant moderators
• Yield unbiased estimates of average effects and subgroup effects

How to take heterogeneity seriously:
• Construct a probability sampling plan using theories of moderators, as well as knowledge gleaned from moderators in prior studies
• Measure and report data on moderators of interest
• Intentionally over-sample subgroups of interest to power moderation tests adequately
• Exercise caution in interpreting measured moderators as causal variables
• Make justifiably broad claims about the likely effectiveness of the intervention in the population and subpopulations studied

How to not take heterogeneity seriously:
• Focus primarily on sample size without careful regard to sample composition
• Attend only to theoretically superficial moderators
• Focus primarily or exclusively on powering tests of average treatment effects in the population as a whole
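
The table's contrast between convenience samples and probability samples with over-sampled subgroups can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and does not come from the article: the two subgroup names, their population shares, their effect sizes, and the helper `implied_average_effect` are hypothetical. The sketch works with expected values only and ignores sampling noise. It shows that when effects differ across subgroups, the average effect a study implies depends on sample composition, and that weighting an over-sampled design back to population shares recovers the population average.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgroup:
    name: str
    population_share: float  # fraction of the target population (hypothetical)
    effect: float            # true subgroup treatment effect (hypothetical)


def implied_average_effect(subgroups, sample_shares, reweight=False):
    """Expected average effect for a sample with the given composition.

    sample_shares maps subgroup name -> fraction of the sample.
    With reweight=True, each subgroup is weighted by
    population_share / sample_share (post-stratification weights).
    """
    total = 0.0
    weight_sum = 0.0
    for group in subgroups:
        share = sample_shares[group.name]
        weight = group.population_share / share if reweight else 1.0
        total += weight * share * group.effect
        weight_sum += weight * share
    return total / weight_sum


# Hypothetical population in which the intervention works better in one subgroup.
SUBGROUPS = [
    Subgroup("high_resource", population_share=0.3, effect=0.20),
    Subgroup("low_resource", population_share=0.7, effect=0.05),
]

# Population average effect: subgroup effects weighted by population shares.
population_ate = sum(g.population_share * g.effect for g in SUBGROUPS)

# Convenience sample drawn mostly from the subgroup where the effect is larger.
convenience = implied_average_effect(
    SUBGROUPS, {"high_resource": 0.9, "low_resource": 0.1}
)

# Design that over-samples the smaller subgroup to power moderation tests.
balanced = {"high_resource": 0.5, "low_resource": 0.5}
oversampled_raw = implied_average_effect(SUBGROUPS, balanced)
oversampled_weighted = implied_average_effect(SUBGROUPS, balanced, reweight=True)

print("population average effect:", round(population_ate, 3))
print("convenience sample:       ", round(convenience, 3))
print("over-sampled, unweighted: ", round(oversampled_raw, 3))
print("over-sampled, reweighted: ", round(oversampled_weighted, 3))
```

Under these assumed numbers the convenience sample implies an average effect of about 0.185 against a population value of about 0.095, an overestimate of the kind the table associates with initial experiments. The balanced design gives equal precision for both subgroup estimates, and its reweighted average matches the population value.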