This is the third editorial in a series of related opinion pieces. The first editorial described the Office of Adolescent Health’s (OAH) investment in teen pregnancy prevention (TPP) programs and the evaluation technical assistance (TA) contract that was intended to support funded grantees.1 That editorial also outlined the US Department of Health and Human Services (HHS) evidence standards that guided the evaluation TA contract. The second editorial detailed the activities conducted under the evaluation TA contract with the first cohort of funded grantees.2 Implementing a large-scale TA effort was not without challenges. These challenges were expected, and indeed necessary, as the federal government looks to support large-scale grantee-led evaluations to build the evidence base in TPP. The lessons learned from these challenges laid the groundwork for a second round of evaluation TA with a second cohort of TPP grantees that began in summer 2015. These challenges and their solutions can also inform other federally sponsored evaluation TA efforts. The following is a discussion of the major challenges in providing TA to the grantees in cohort 1. These challenges were identified through the TA activities laid out in Zief et al.,2 primarily our monitoring calls and document reviews.
CHALLENGE 1: NO STANDARDIZED APPROACH
Federal efforts to assess impact evaluations for the credibility of their causal inferences began with the Department of Education’s What Works Clearinghouse in 2004 and have since spread to other agencies and offices, such as HHS and the Department of Labor. These agencies have each developed similar sets of standards that are used to systematically and consistently assess the internal validity of completed studies in their fields. But no publicly available road map existed within HHS, or any other federal agency, for designing and implementing an evaluation with a good chance of producing a completed study that meets evidence review standards. Because many grantees and evaluators were initially unaware, or only partially aware, of the standards, they were not considering them when designing their evaluations. This contract therefore focused on building a process (the framework noted in Zief et al.2) for training the grantees on the evidence standards and on best practices for meeting those standards. This was accomplished by providing formal and informal training during the TA process and by developing written products that could inform the broader field and future grantees and evaluators. The contract produced research briefs on planning evaluations designed to meet evidence standards, coping with missing data and clustering in randomized controlled trials, baseline equivalence and matching, attrition in randomized controlled trials, and best practices for school and district recruitment, as well as a primer on the evidence standards. These dissemination materials, available on the OAH Web site,3 supplied a road map for the second cohort of grantees that did not exist for the first cohort.
CHALLENGE 2: NOT THE SOLE BENCHMARK
The HHS evidence review assesses the internal validity of the impact findings to document the extent to which the results are credible and could therefore guide policy and program decisions. However, for most policymakers and researchers, the key question of interest is whether the program had a statistically significant effect, which depends in part on the study’s power and on the effective contrast in services between the treatment and control groups.
When the evaluation TA contractor began design plan reviews, the team identified numerous studies with low statistical power or a small expected programmatic contrast between the two groups. In consultation with OAH, the evaluation TA team recommended changes to numerous studies to improve statistical power and contrast. For example, some studies proposed a small sample (e.g., fewer than 500 participants) but planned multiple (three or more) follow-up surveys; eliminating one follow-up survey sometimes freed the resources needed to enroll a larger sample. Others proposed larger samples but had a control group receiving services very similar to those of the treatment group. Such studies were unlikely to observe large differences in participant outcomes, given the small effective contrast between the treatment and control conditions. More often than not, recommendations to improve statistical power and contrast had budget implications that could not always be offset by other design modifications, or they were infeasible given the limitations of the evaluation settings. As a result, a small number of studies were unable to accommodate the recommended improvements and moved forward with weak expected contrasts between the two groups and low power to detect statistically significant impacts.
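To give a rough sense of the arithmetic behind these concerns (an illustrative sketch of a standard power formula, not material drawn from the OAH brief or power calculator), the minimum detectable effect size (MDES), in standard deviation units, for a two-arm individually randomized trial with equal allocation, a two-sided test at α = .05, 80% power, and no clustering or covariate adjustment is approximately

\[
\mathrm{MDES} \approx \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sqrt{\frac{1}{p(1-p)\,n}} = (1.96 + 0.84)\sqrt{\frac{4}{n}}.
\]

With n = 500, this yields an MDES of roughly 2.80 × √(4/500) ≈ 0.25 standard deviations; for a binary outcome with a 20% base rate, that corresponds to an impact of about 10 percentage points, a large effect for a prevention program. Larger samples and covariate adjustment shrink this detection threshold, and a stronger service contrast raises the impact one can plausibly expect to observe, which is why these design elements were the focus of the recommendations.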
As OAH prepared its funding announcement for the second cohort, the evaluation TA team prepared a research brief and power calculator targeted toward research in this field. The goal was to encourage applicants to be more thoughtful and critical when determining the optimal sample size and contrast for their designs. Study power and contrast were also focal points of the early reviews of applicant and funded design plans for the second cohort.
CHALLENGE 3: LATE START TO TA
One reason that the evaluation TA team was limited in its ability to improve some of the evaluations is that the TA contract was funded after the grants were awarded. The Office of Adolescent Health was a new federal office funded in 2010, and it had to move quickly to launch the grant program, the evaluation TA contract, and a related performance measures contract during fiscal year 2010.4 The evaluation TA activities started several months after the grant awards were announced. Because of this timeline, grantees and their evaluators were asked to rethink and refine their original evaluation and program plans nearly a year after they received their grant awards. Understandably, this led to considerable confusion about, and frustration with, a newly evolving set of expectations and requirements. For some grantees, the process and timeline delayed the start of programming and evaluation. For the TA team, the late start meant that budgets were already set and some design parameters were nonnegotiable. Needless to say, relationships among all involved parties were at times strained during this first year.
OAH made several adjustments to the process before releasing the funding opportunity announcement for the second cohort of grantees, funded in 2015. First, the evaluation TA contractor was funded before the awards and was therefore available to provide evaluation TA support before and immediately upon grant award. Second, the top-scoring evaluation applications were reviewed specifically to identify any designs with a low probability of success. Finally, the expectations and requirements for the evaluations and for work with the evaluation TA contractor were disclosed in the funding opportunity announcement and again at the time of award, preparing grantees and evaluators for evaluation TA involvement. These changes have resulted in faster approvals of the second cohort grantees’ designs.
CONCLUSIONS
Despite the challenges, evaluation TA improved the quality of the completed evaluations. Ironically, the TA bolstered the rigor of many studies that ultimately did not show statistically significant program impacts; without this TA, several of these studies would not have produced credible or publishable evidence (see Farb and Margolis5 and Cole6). Such null-finding evaluations might otherwise have gone unpublished or been deemed not credible, contributing to what Rosenthal calls the “file-drawer problem,” although registration of the evaluations may have ameliorated this problem somewhat.7 That said, it is important to add credible evidence to the field regardless of the direction and statistical significance of the impact estimate. Understanding that some programs have nonsignificant findings, at least in some contexts or for some populations, is an important contribution to the evidence base, particularly when some of these programs have been shown to improve participant outcomes in other settings or with different populations. Furthermore, observing substantively large but nonsignificant impacts in an underpowered study may highlight an opportunity to conduct an additional evaluation of a potentially promising program, with greater attention paid to study design and implementation.
ACKNOWLEDGMENTS
This work was conducted under a contract (HHSP233201300416G) with the Office of Adolescent Health, within the Department of Health and Human Services (HHS).
REFERENCES
- 1. Cole RP, Zief SG, Knab J. Establishing an evaluation technical assistance contract to support studies in meeting the HHS evidence standards. Am J Public Health. 2016;106(10):S22–S24. doi: 10.2105/AJPH.2016.303359.
- 2. Zief SG, Knab J, Cole RP. A framework for evaluation technical assistance. Am J Public Health. 2016;106(suppl 1):S24–S26. doi: 10.2105/AJPH.2016.303365.
- 3. US Department of Health and Human Services, Office of Adolescent Health. Evaluation training & technical assistance (TA). Available at: http://www.hhs.gov/ash/oah/oah-initiatives/evaluation/ta.html. Accessed August 24, 2016.
- 4. Kappeler EM, Farb AF. Historical context for the creation of the Office of Adolescent Health and the Teen Pregnancy Prevention Program. J Adolesc Health. 2014;54(3):S3–S9. doi: 10.1016/j.jadohealth.2013.11.020.
- 5. Farb AF, Margolis AL. The Teen Pregnancy Prevention Program (2010–2015): synthesis of impact findings. Am J Public Health. 2016;106(suppl 1):S9–S15. doi: 10.2105/AJPH.2016.303367.
- 6. Cole RP. Comprehensive reporting of adolescent pregnancy prevention programs. Am J Public Health. 2016;106(suppl 1):S15–S16. doi: 10.2105/AJPH.2016.303332.
- 7. Rosenthal R. The “file drawer problem” and tolerance for null results. Psychol Bull. 1979;86(3):638–641.