Table 2.

Stage of research | Description | How to take heterogeneity seriously | How to not take heterogeneity seriously |
---|---|---|---|
(1) Initial experiments | • Initial test(s) of a new intervention idea • Typically conducted with convenience samples • Usually overestimate average intervention effects in the population | • Assume the effect is moderated in the broader population, use an inclusive sample, and theorize carefully about potential moderators • Measure and report data on potential moderators (even those with little or no variation in the sample) to inform future studies and meta-analyses • Qualify research conclusions clearly and prominently (for example, 'This is a promising solution with unknown generalizability') | • Fail to theorize carefully about likely limits on generalizability • Fail to measure and report data on potential moderators • Make claims about the real-world promise of an intervention without clear, prominent qualification based on sample and site characteristics • Omit diverse populations |
(2) Efficacy experiments | • Replication or extension experiment(s) • Typically conducted with larger convenience samples • Could overestimate or underestimate average intervention effects in the population | • Define the population of interest (that is, target population if intervention were applied at scale) • Select test sites and sample inclusion criteria to represent some or all of the population of interest (purposive sampling) • Measure and report data on moderators of interest and, when appropriate, test for subgroup differences • Qualify claims (for example, 'Preliminary evidence that this intervention is effective, at least in urban schools in Northern California') | • Select samples and test sites based primarily on convenience • Make claims about the real-world promise of an intervention without clear, prominent qualification based on sample and site characteristics |
(3) Effectiveness experiments | • Large-scale tests of an intervention's likely effect in the population of interest (or in some specified subgroup within that population) • Typically conducted with larger, generalizable samples that include many sources of heterogeneity • May include an experimental manipulation of the relevant moderators • Yield unbiased estimates of average effects and subgroup effects | • Construct a probability sampling plan using theories of moderators, as well as knowledge gleaned from moderators in prior studies • Measure and report data on moderators of interest • Intentionally over-sample subgroups of interest to power moderation tests adequately • Exercise caution in interpreting measured moderators as causal variables • Make justifiably broad claims about the likely effectiveness of the intervention in the population and subpopulations studied | • Focus primarily on sample size without careful regard to sample composition • Attend only to theoretically superficial moderators • Focus primarily or exclusively on powering tests of average treatment effects in the population as a whole |
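
The stage (3) advice to over-sample subgroups of interest, and the warning against attending only to total sample size, can be illustrated with a rough power calculation. The sketch below is not taken from the source. It assumes two subgroups, 50/50 randomization within each subgroup, a common outcome standard deviation, and a normal approximation. The function names (`moderation_se`, `moderation_power`) and all numbers (total sample of 2,000, a 0.2 SD gap between subgroup effects) are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def moderation_se(n_total, share_subgroup, sd=1.0):
    """Standard error of the difference between two subgroup treatment effects.

    Assumes 50/50 randomization within each subgroup and a common outcome SD.
    """
    n_a = n_total * share_subgroup
    n_b = n_total * (1.0 - share_subgroup)
    # Each subgroup effect is a difference in means with n_g / 2 units per arm,
    # so its sampling variance is 4 * sd**2 / n_g.
    return sqrt(4 * sd**2 / n_a + 4 * sd**2 / n_b)


def moderation_power(n_total, share_subgroup, effect_gap, sd=1.0, alpha=0.05):
    """Approximate two-sided power to detect a gap between subgroup effects."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_gap) / moderation_se(n_total, share_subgroup, sd)
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same total sample size, different sample composition (illustrative values).
proportional = moderation_power(n_total=2000, share_subgroup=0.10, effect_gap=0.2)
oversampled = moderation_power(n_total=2000, share_subgroup=0.50, effect_gap=0.2)

print(f"Subgroup at 10% of sample: power = {proportional:.2f}")
print(f"Subgroup at 50% of sample: power = {oversampled:.2f}")
```

Under these assumptions the total sample size is identical in both scenarios, yet the moderation test is considerably better powered when the smaller subgroup is over-sampled. Real sampling plans would also need to account for clustering, survey weights, and unequal variances, which this sketch ignores.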