Abstract
The process used by the WHO to generate nutrition recommendations relies on high-quality research evidence, and this makes new demands on the research questions that nutrition scientists address. As a researcher involved in WHO nutrition guidelines development, my objective is to suggest ways in which our research can adapt to meet these demands. Randomized controlled trials and systematic reviews generate the highest quality of evidence to support strong recommendations, yet even these methods leave controversies in which judgments must be made. Using examples from recent research and guidelines, 4 issues are highlighted that illustrate ways in which nutrition research can adapt to become more useful and informative to global nutrition guidelines. These issues include embedding mechanistic research within trials, explicit choice of design along the efficacy or effectiveness spectrum, anticipation of heterogeneity of effects, and the need for research on consumer or community values and preferences.
Introduction
One of the core functions of the WHO is to provide health guidelines to assist member states to improve public health. In 2010, the Nutrition Division of the WHO convened a Nutrition Guidelines Advisory Group, which was mandated to develop a series of nutrition guidelines (including several on micronutrient interventions) according to a new process based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework and in collaboration with The Cochrane Collaboration.
The new guidelines development process adopts internationally accepted best practices to use research evidence to make recommendations (1). The past 3 y of experience developing WHO nutrition guidelines, including many related to micronutrient interventions, have revealed both strengths and weaknesses in the nutrition research evidence base. In addition, the researchers, program planners, and policy makers on the Guidelines Advisory Group have sometimes been frustrated as they attempt to apply processes originating from clinical practice interventions (i.e., GRADE and Cochrane reviews) to recommendations involving public health nutrition decision making.
I write from the perspective of a nutrition researcher who has participated in the Nutrition Guidelines Advisory Group since 2010; thus, a researcher who both generates and uses evidence to guide policy. This article comprises specific examples, personal reflection, and connections to the lively literature that surrounds evidence-based medicine and public health. The issues raised here are not unique to nutrition. In the first part of this article, I note strengths and limitations of relying primarily on systematic reviews and randomized trials to create recommendations. I then highlight 4 suggestions to the nutrition science community as we design, implement, and publish research intended to inform stronger and more relevant guidelines for the WHO member states.
Strengths and Limitations of Randomized Controlled Trials and Systematic Reviews—the Backbones of the Guidelines Development Process
The strengths of randomized controlled trials (RCTs) are well known and appreciated. Randomization provides “reasonable assurance of balance with respect to uncontrolled factors” (2) and is the most certain method for dealing with confounding. This virtue of randomization is the primary reason that intention-to-treat analyses are considered the most valid source of inference: that is, intention-to-treat analyzes the experimental groups as randomly allocated. Furthermore, in a randomized experiment, a careful choice of the control condition often enables the investigator to isolate the particular aspect of treatment that he or she wants to make an inference about. In addition, some statisticians argue that randomly allocated groups are essential to interpret common statistical tests and confidence intervals (3). For these reasons, the GRADE process used by the WHO considers evidence from randomized trials high quality (although defects in design and analysis can render them low quality) and evidence from other designs as low quality (4).
The medical science community has become so confident in randomized experiments compared with alternatives that it is often claimed that causal inference can be made only on the basis of randomized designs. However, some have criticized this stance, arguing that other scientific approaches can also make legitimate claims to causal inference. For example, Cartwright draws upon longstanding scientific principles to argue that “when it comes to clinchers—to methods from which the hypothesis can be rigorously derived from the evidence—RCTs are not the only game in town” (5, p. 14). Other examples of clinchers relevant to public health and nutrition include econometric methods and what she calls “tracing the causal process.” These and other methods are discussed by Shadish et al. (6).
A second line of criticism in the use of RCTs has to do with the limitations of their information. A perfectly designed, implemented, and analyzed RCT will yield a strong causal inference with regard to the specific population studied but may tell you nothing about the nature of the causal process, nor which participants benefited and which did not. [The average treatment effect, which is what RCTs estimate, may include no benefit or a “dys”-benefit among some participants, even if it is positive on average (5).] Furthermore, RCTs do not necessarily provide evidence about how the specific result observed might depend on the social or biologic context of the study participants. Thus, although RCTs deliver well on internal validity [or effectual validity in a given population and context (7)], they do not have any special claim to external validity (i.e., generalizability to other populations or contexts) and may even be less externally valid than other approaches (5, 8).
Constraints on the external validity of RCTs derive from various practical pressures, including the investigator’s desire to choose a study population with the following characteristics: high outcome rates (to control sample size and therefore costs), unusual homogeneity in potential to benefit (e.g., uniformly poor, or uniformly with or without access to some essential service), and excellent research infrastructure, thus favoring certain nations or study sites. Indeed, there are currently problems and places (e.g., maternal mortality in Tibet) that cannot meet the requirements of RCT methods, creating what Adams (9) has called the erasures of evidence-based medicine in global health.
To quote Cartwright: “The central question for external validity then is, ‘How do we come to be justified in the assumptions required for exporting a causal claim from the experimental to a target population?’ Here rigor [of the RCT] gives out.” (5, p. 18)
And yet, that is exactly what the WHO’s guidelines development committee seeks to do: apply rigorous RCT-based evidence to the needs of diverse target populations of 194 member states.
Although external validity is not inherent in RCTs, it can be increased by choosing study populations with generalizability in mind (10). This requires initial clarity on the targets of generalization: does one seek to generalize based on demography, genetics, or nutritional status, for example? Logistics and resources may rapidly become constraints. Various stakeholders may also have different interests in generalizability; for example, local stakeholders may value certain targets (e.g., a national minority population) more than international donors who may seek to generalize to diverse global populations.
The RCT evidence from many studies is reviewed systematically through a well-defined rigorous process guided by The Cochrane Collaboration. This comprehensive search process has consistently unearthed evidence that would otherwise not have been known and appreciated by the guidelines development group. It also can address the issue of generalizability of individual RCTs, especially if the group of trials includes relevant diversity of populations and contexts, and the trials are numerous enough to explore subgroups. The format of the Cochrane reviews organizes the information in ways that can be readily viewed and understood. And the structure of the process requires that critical decisions such as inclusion and exclusion criteria be documented and transparent. These are substantial advantages that add credibility to current WHO guidelines.
The systematic review process is also not without challenges for the guidelines development group, however. Two challenges are particularly worthy of discussion. The first is that the outcome measures used by various researchers are often not uniform across trials. When assessing intervention effects on birth outcomes, for example, we often summarize across a variety of measures used in the various trials, and some of those (such as small for gestational age) use different definitions. The situation is even more troublesome for outcomes related to child development, a process composed of a wide variety of cognitive domains measured at different times. If nutrition interventions have specific effects at specific periods in child development, overpooling outcome data could obscure true effects.
A second challenge is discerning to what degree studies across diverse contexts and risk levels should be pooled. The controversy over the trials of newborn vitamin A supplementation and infant mortality illustrates this well. Three initial trials of newborn vitamin A supplementation, all conducted in Asia, found significant protective effects on infant survival. When 3 subsequent trials in Africa showed no benefits, how were the new trial results to be integrated with the previous highly promising evidence? An updated systematic review pooling all studies found no significant effect (11). But others argued that inferences across the different contexts of Asia and Africa should not be pooled; newborn supplementation saves infant lives in Asia and large-scale programs should proceed in that region, whereas further data are needed from Africa (12). In 2011, the WHO issued a guideline, based on global pooled effects, that recommends against this intervention on the basis of current data (13) but also recommends an updated review in 2013. At the heart of this controversy is a lack of clarity about when global guidelines are appropriate, given the immense diversity of global populations and contexts, and when to issue more targeted guidelines based on subgroups of trials (e.g., newborn vitamin A trials in South Asia).
Thus, whereas meta-analyses have the noteworthy strengths of compiling all available evidence, considerable judgment is required in how and when to pool evidence.
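The pooling judgment can be made concrete with a toy inverse-variance meta-analysis. The numbers below are invented for illustration (they are not the actual newborn vitamin A trial data): three hypothetical "Asia" trials showing protection and three hypothetical "Africa" trials showing none, in which the globally pooled estimate is null even though one subgroup shows a clear benefit.

```python
import math

# Hypothetical (log relative risk, standard error) pairs for two regional
# subgroups of trials -- illustrative numbers only, NOT real trial data.
subgroups = {
    "Asia":   [(math.log(0.85), 0.08), (math.log(0.80), 0.10), (math.log(0.88), 0.09)],
    "Africa": [(math.log(1.06), 0.08), (math.log(1.10), 0.09), (math.log(1.02), 0.10)],
}

def pool(studies):
    """Fixed-effect inverse-variance pooling of (log effect, SE) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

def summarize(label, studies):
    es, se = pool(studies)
    lo, hi = es - 1.96 * se, es + 1.96 * se
    print(f"{label}: RR {math.exp(es):.2f} "
          f"(95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")

for region, studies in subgroups.items():
    summarize(region, studies)
summarize("Global", [s for studies in subgroups.values() for s in studies])
# Asia:   RR 0.85 (95% CI 0.77-0.94)  -> significant protection
# Africa: RR 1.06 (95% CI 0.96-1.17)  -> no effect
# Global: RR 0.95 (95% CI 0.88-1.02)  -> pooled estimate is null
```

Whether the "right" summary is the global line or the regional lines is precisely the judgment the guidelines development group must make; the arithmetic itself cannot decide it.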
How Can the Research Community Provide Strong and Informative Evidence to Support Guidelines?
The guidelines development process thus has challenges in the midst of its numerous strengths. There are several ways that nutrition research is adapting to meet the needs of evidence-based recommendations. In the remainder of this article, I make 4 broad suggestions, with references to the literature that offers a deeper treatment of the challenges.
Suggestion 1: unpack black boxes.
A black box refers to an “untested postulate linking an exposure and an outcome in a causal sequence, where the causal mechanism is unknown (‘black’), but its existence is implied (‘box’)” (14, p. 2). It is a useful metaphor that was first used in systems engineering and has previously engendered lively debate within epidemiology (15, 16).
An exclusive emphasis on RCT evidence in making recommendations focuses our attention on the cleanliness of trial design and interpretation, and away from the mechanisms that have caused the results that we find. It is the tendency that Royall (2) >30 y ago called “closurization,” as in “you randomize and then you close your eyes” (emphasis mine).
Does it matter whether we know why and how an intervention works? Evidence-based decision-making systems, such as GRADE, rely on hierarchies of evidence. In these rankings of strength of evidence, mechanistic evidence is either not mentioned at all or is considered to be the lowest level of evidence (4, 17). I think that mechanistic evidence does matter [as did Royall (2); see also Habicht in this issue (18)], and the most important reason it matters for WHO guidelines development has to do with external validity.
As explained above, the rigor of the RCT lies in its internal validity, whereas it has no special claim to external validity. And yet, what matters most for WHO guidelines is external validity: How (and how strongly) does this body of evidence pertain to the WHO member states now and in the future? Guidelines development groups must use mechanistic theories to use RCT evidence for good decision making. The essential question is: What was the causal driver of the effect observed, and will it operate similarly in other times and places?
The use of evidence from several trials (i.e., a meta-analysis) to recommend future actions for a diverse set of WHO member states is a form of extrapolation: “to infer or estimate by extending or projecting known information” (19). Many disciplines have well-developed methods for extrapolation, but, in my experience, the methods for extrapolating RCT evidence to global guidelines are not well theorized. WHO instructs the guidelines development group to assess the “directness” of the evidence but offers no guidance on how to do this (1). In practice, this is a qualitative judgment without clear structure or transparency.
The guidelines development group’s capacity for this extrapolation depends very strongly upon mechanistic and theoretical understandings of the causal forces underlying the results seen in the trials. And these can only be known if those causal forces—mechanisms—are revealed through scientific processes that go beyond the immediate demands of RCTs.
The call for stronger use of mechanistic evidence alongside RCT evidence exists within the public health nutrition literature, as well as in philosophical circles (5, 8, 20, 21). One of the strengths of WHO’s nutrition guidelines advisory group process has been the inclusion of nutrition science researchers with mechanistic expertise when the evidence for guidelines is being discussed. These scientists are typically invited for a specific meeting when their nutrient or outcome of expertise is on the agenda (e.g., calcium or preeclampsia) but are not permanent or voting members. Nutrition science expertise is essential for the guidelines development group members to probe the mechanistic foundations for the inferences obtained from RCTs.
Although the word mechanism invokes biochemistry for most nutrition scientists, social and behavioral mechanisms are equally important to the success or failure of many nutrition interventions. In our research protocols and the methods sections of our publications, we are well disciplined to describe nutrient dose, formulation, regimen, and duration. I suggest that we need to become equally disciplined in describing the social and informational aspects of interventions. For example, if a community health worker delivered a micronutrient powder, how long was the social interaction, what topics and messages were discussed, and what social support or social pressure was implicitly or explicitly conveyed along with the micronutrient powder? This gap in reporting the social aspects of interventions has also been noted in fields other than nutrition (22).
Social mechanisms are at work not only through our behaviors as interventionists but also through the responses they elicit. What intended and unintended behavioral responses ripple out from our carefully designed point of intervention? I have become convinced that ethnographic methods (such as observations and open-ended interviews) are the only way to know what really happened within a trial. How else can we discover the “complex ways in which [our interventions] are actually received and critiqued. The counterknowledge of the people who are actually at the center of things” (23, p. 23)? These methods cannot be used with large numbers of trial participants, but purposeful selection of smaller samples for deeper observation or questioning can yield important insights.
Inextricably linked to the causal forces creating the measured effect of an intervention is the context in which it is implemented and studied. Cook (10), invoking Cronbach and Campbell, described generalization in 4 domains: persons, treatments, outcomes, and settings. He argued that causal generalization must grapple with all 4 domains. I assert that in nutrition our weak spot is in the area of settings, what we may call context. We tend not to describe it adequately, either in our data collection or our publications. And we do not have strong and consistent ways of critically examining which aspects of context are most important to our inferences. Perhaps this is because, for nutritionists, context is overwhelmingly complex, including foods, nutrients, social systems, disease patterns, health systems, genetics, and more. To make progress may require simplified theories or rubrics to guide our thinking and research.
Thus, to unpack black boxes and strengthen generalized inferences from RCTs, there is need for the nutrition research community to continue to generate mechanistic evidence, before, alongside, or within RCT evidence, and for the WHO guidelines process to bring that evidence to the table for deliberation in conjunction with systematic reviews. As guidelines developers, we need to develop stronger and clearer methods for the use of mechanistic and contextual evidence in our judgment of the external validity of RCTs—i.e., our methods for extrapolation of RCT evidence across time and context.
Suggestion 2: think carefully about the efficacy-effectiveness spectrum.
Studying epidemiology in the 1980s, I learned the difference between efficacy and effectiveness. Efficacy refers to the “level of desired effects (good over harm, benefits over costs) of a program when delivered and received under optimum conditions (i.e., when availability and acceptance are maximized),” whereas effectiveness is the “level of good over harm (or benefits over costs) that a program achieves when received under typical real-world conditions of availability and acceptance” (24, p. 468). This was a neat distinction, with efficacy providing the upper bound of effectiveness, in which efficacy is diluted by the messiness of the real world.
More recently, this dichotomy has become blurred, and I have come to view efficacy and effectiveness as a spectrum rather than a dichotomy. Public health nutrition researchers, myself included, have become more ambitious and creative in the studies that we design and implement, and we are designing experiments that blend efficacy and effectiveness questions.
Several factors have contributed to this blending of designs. First is a desire on the part of both scientists and funding agencies for research to find ways to close the efficacy-effectiveness gap, also called the research-practice gap (21). Researchers who previously dwelt comfortably within the world of efficacy research aspire to do research that is closer to the real world and to produce results that are more immediately relevant to the needs of health systems and people. Second is the fact that many of the interventions of greatest interest are not well suited to delivery by researchers or clinicians. Take infant feeding interventions, for example. Peer educators or community health workers are arguably more effective and certainly more affordable than research staff when frequent personal contact is required to deliver behavior change communication. Donors are also increasingly interested in the use of delivery structures that can be directly scaled up.
A third factor is that cluster randomization is being used in increasingly diverse circumstances, including within programmatic (i.e., real-world effectiveness) settings, and our current methods of systematic review facilitate the aggregation of all randomized experiments, without clear distinction between inference of efficacy versus effectiveness. The example of meta-analysis of the DEVTA (Deworming and Enhanced Vitamin A supplementation) effectiveness trial with individually randomized efficacy trials of vitamin A supplementation to decrease child mortality is a case in point (25), which is discussed at length by Habicht in this supplement issue (18).
I suggest that we researchers need to be clearer and more transparent about this spectrum. It is useful to return to the 1986 definitions of Flay (24), cited above, which differentiate efficacy and effectiveness on the basis of availability and acceptance of the intervention. In the pure efficacy design, the intervention would be nearly 100% available to participants in the intervention and nearly 100% accepted by them. Flay himself suggested a spectrum, by defining 3 types of effectiveness research: implementation effectiveness, treatment effectiveness, and program evaluation, each allowing different degrees of variation in acceptance, availability, and implementation fidelity (see reference 24 for definitions). This taxonomy may not be wholly adequate today, but it illustrates the sort of language that we need to develop further as we seek to understand the various sorts of inferences that can be made (and aggregated) across the diverse study designs that we are creating. It is not randomization that justifies aggregation (i.e., meta-analysis); it is the research question itself. It is problematic and confusing, for example, to aggregate randomized trials of vitamin A efficacy and vitamin A programmatic effectiveness, as discussed above.
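One crude way to operationalize Flay's availability/acceptance distinction is a multiplicative dilution model, in which community effectiveness is efficacy discounted by coverage and adherence. All numbers below are hypothetical, and real-world dilution is rarely this clean, but the sketch shows why effectiveness estimates sit below efficacy estimates and why the two designs answer different questions.

```python
# Hypothetical dilution model (all numbers invented for illustration):
# effectiveness ~= efficacy x availability x acceptance.
efficacy = 0.60      # effect under optimal delivery conditions
availability = 0.70  # fraction of the target population actually reached
acceptance = 0.50    # fraction of those reached who accept and adhere

effectiveness = efficacy * availability * acceptance
print(f"efficacy {efficacy:.2f} -> expected effectiveness {effectiveness:.2f}")
# efficacy 0.60 -> expected effectiveness 0.21
```

A pooled estimate that averages a trial measuring the 0.60 with a trial measuring the 0.21 answers neither question well.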
Suggestion 3: anticipate heterogeneity of effects (effect modification).
Effect modification is ubiquitous. It is difficult to imagine any nutrition intervention that will have very similar effects across all individuals in a target population, given the potential for variability in preintervention nutritional status, metabolic characteristics (e.g., gut function), genome, and other environmental factors. This is not only true for nutrition interventions (26).
In the context of interventions, effect modification has been termed “potential to benefit” (27). For any intervention, some individuals and populations will have greater potential to benefit than others. This has tremendous relevance for the extrapolation task of the guidelines development group. When beneficial effects are rigorously demonstrated through RCTs and systematic reviews, how can we help potential users of that intervention to estimate whether their target populations have the same potential to benefit as the study populations in the RCTs? We are back to the issue of external validity.
Given that the effects of any intervention are heterogeneous (i.e., subject to modifying factors), how can research provide the most useful information about this? One approach is to study large samples of heterogeneous individuals, and to design a priori subgroup analyses planned to demonstrate effect modification. Or better yet, to construct the study sample to represent subgroups of interest (10). These solutions are both challenging and expensive. As a rule of thumb, a robust test of interaction requires at least 4 times the sample size as a simple test of an overall average effect (28). The likelihood of making a false inference becomes very high when the subgroup analyses are post hoc or when many tests of interaction are carried out (29).
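The "4 times the sample size" rule of thumb follows from the variance of the estimators: with two equal arms, the overall effect estimate has variance proportional to 4σ²/N, whereas the difference between the effects in two equal subgroups has variance proportional to 16σ²/N, so detecting an interaction as large as the main effect requires 4 times the total sample. A sketch using the standard normal-approximation sample-size formula (the effect size is hypothetical):

```python
from math import ceil
from statistics import NormalDist

def n_total(delta, sigma=1.0, alpha=0.05, power=0.80, var_factor=4):
    """Total N so that an estimator with variance var_factor * sigma^2 / N
    detects an effect delta at two-sided alpha with the given power."""
    z = NormalDist().inv_cdf
    return ceil(var_factor * (z(1 - alpha / 2) + z(power)) ** 2
                * sigma ** 2 / delta ** 2)

d = 0.3  # hypothetical standardized effect size
n_main = n_total(d, var_factor=4)    # overall average effect
n_inter = n_total(d, var_factor=16)  # equally large subgroup interaction
print(n_main, n_inter, n_inter / n_main)
# 349 1396 4.0
```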
The selection of the study population is paramount and subject to tensions between various objectives. Is the ideal study population one with uniformly high potential to benefit? This may produce a significant result with a smaller sample size but raises questions about which other populations might benefit. Or should one choose a more heterogeneous population likely to yield a smaller effect size (requiring a greater sample size) and able to yield informative subgroup analyses (requiring a still greater sample size)?
Given the importance of understanding effect modification, it is remarkable how little guidance exists for doing subgroup analyses well; rather, most advice is simply to avoid it. In 1974, the epidemiologist Olli Miettinen (26) wrote: “. . . data analysis without regard for effect-modification can be of suboptimal sensitivity in the detection of the very existence of the association and incomplete in the estimation of its magnitude. At the same time, routine approaches in epidemiologic data analysis are generally oblivious to possible effect-modification . . . .” (p. 1115)
Thirty-six years later, the 2010 CONSORT (Consolidated Standards of Reporting Trials) statement simply discouraged subgroup analyses because of the well-documented likelihood of false results (30). This statement is statistically well justified; however, it does not help us advance the knowledge that we need to make evidence-based guidelines across diverse populations.
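The inflation of false results that CONSORT warns about is easy to quantify for the simplified case of independent subgroup tests (real subgroup tests are usually correlated, so this is only an idealized sketch): with no true effects, the chance of at least one spurious "significant" subgroup grows rapidly with the number of tests.

```python
# Family-wise false-positive probability for k independent subgroup
# tests at alpha = 0.05, assuming no true effect in any subgroup.
for k in (1, 5, 10, 20):
    print(f"{k} tests: P(at least 1 false positive) = {1 - 0.95 ** k:.2f}")
# prints 0.05, 0.23, 0.40, 0.64 for k = 1, 5, 10, 20
```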
To anticipate effect modification and to select the most important subgroup analyses a priori requires understanding of causal forces, both of the intervention itself and of contextual or environmental modifiers. The need for biologic and social mechanistic theory is critical.
Suggestion 4: explore and document values and preferences.
An explicit part of the GRADE process, reiterated by the WHO, is the consideration of values and preferences. To quote the WHO: “Values, the relative worth or importance of a state or consequences of a decision (outcomes relating to benefits, harms, burden and costs) play a role in every recommendation. . . . The values used in making recommendations should reflect those of the people affected. Judgments should be explicit and should be informed by those affected (including citizens, patients, clinicians and policy-makers)” (1, p. 51).
This is a difficult and murky task, easily overshadowed by the highly structured and transparent processes in place for evaluating intervention outcomes. The evidence base for preferences and values is poorly conceived (what should this evidence look like?) and decidedly sparse (31). It is therefore not surprising that values and preferences statements in recent WHO nutrition guidelines are either weak [e.g., “It is expected that this intervention would be acceptable to women” (32, p. 15)] or framed from a purely clinical perspective [e.g., “Evidence is consistent for pre-eclampsia and preterm birth, both of which are responsible for a considerable proportion of the burden of maternal and infant morbidity and mortality” (33, p. 20)].
Clinical and public health researchers are experimenting with a variety of approaches to address this gap. One approach is systematic and ambitious qualitative research in diverse populations. For example, the WHO commissioned a study of the values and preferences of men who have sex with men and transgender people on HIV interventions (34); these findings can be used when drafting guidelines for this target population. Others have begun the evidence review process with surveys of practitioners (in this case, immunization program managers) to find out what interventions they would most value, and then searched for evidence on those interventions (35).
Various approaches may be justified based on time, resources, and the issue at hand, but the problem deserves greater attention and most solutions will require research. Certainly, nutrition interventions are value-laden, and citizens, providers, and policy makers have distinct preferences. An efficient approach might be to study citizen and provider values and preferences within or alongside the randomized trials that we conduct, so that some evidence from multiple perspectives is immediately available to the people interpreting the evidence from trials and systematic reviews.
Conclusions
The WHO guidelines development process is a substantial advance in stakeholder involvement, systematic use of evidence, transparency, and clarity of communication. The interaction between nutrition research and the formulation of recommendations is dynamic, and both the WHO and the research community are served by open dialog about research questions, research designs, and the best use of research evidence.
For micronutrient interventions, the process thus far has relied almost exclusively on RCT evidence, summarized in Cochrane reviews. The strength and rigor of these RCTs is shaping the recommendations that will guide programs in this decade. This evidence base also leaves blind spots. Of all the issues raised in this article, that of values and preferences is perhaps the most in need of innovation and conceptualization. The contribution of researchers to mechanistic evidence (both social and biologic) and to understanding who benefitted (and who did not, and whether anyone was harmed) will strengthen future guidelines. Methods exist to generate this evidence, but we also need to develop stronger frameworks and methods for integrating such evidence with the results from trials of various sorts.
Acknowledgments
The sole author had responsibility for all parts of the manuscript.
Literature Cited
- 1.World Health Organization WHO handbook for guideline development. Geneva: WHO; 2010 [Google Scholar]
- 2.Royall RM. Current advances in sampling theory: implications for human observational studies. Am J Epidemiol. 1976;104:463–4 [DOI] [PubMed] [Google Scholar]
- 3.Fisher RA. Statistical methods and scientific induction. J R Stat Soc Series B. 1955;17:69–78 [Google Scholar]
- 4.Balshem H, Helfand M, Schunemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Ytter Y, Meerpohl J, Norris S, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011;64:401–6 [DOI] [PubMed] [Google Scholar]
- 5.Cartwright N. Are RCTs the gold standard? Biosocieties. 2007;2:11–20 [Google Scholar]
- 6.Shadish WC, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin; 2002 [Google Scholar]
- 7.Chen HT. The roots and growth of theory-driven evaluation: an integrated perspective for assessing viability, effectuality, and transferability. In: Alkin MC, editor. Evaluation roots: a wider perspective of theorists’ roots and influences. 2nd ed. Los Angeles: Sage; 2012. p. 113–129.
- 8.Clarke B, Gillies D, Illari P, Russo F, Williamson J. The evidence that evidence-based medicine omits. Prev Med. 2013;57:745–7 [DOI] [PubMed] [Google Scholar]
- 9.Adams V. Evidence-based global public health. In: Biehl J, Petryna A, editors. When people come first: critical studies in global health. Princeton (NJ): Princeton University Press; 2013. p. 54–90
- 10.Cook TD. Causal generalization: how Campbell and Cronbach influenced my theoretical thinking on this topic. In: Alkin MC, editor. Evaluation roots: a wider perspective of theorists’ views and influences. 2nd ed. Los Angeles: Sage; 2012: 81–96.
- 11.Gogia S, Sachdev HS. Neonatal vitamin A supplementation for prevention of mortality and morbidity in infancy: systematic review of randomised controlled trials. BMJ. 2009:338:b919. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.West KP, Jr, Sommer M, Tielsch JM, Katz JK, Christian P, Klemm RDW. A central question not answered: can newborn vitamin a reduce infant mortality in South Asia? BMJ. 2009. [cited 2013 Aug 8]. Available from: http://www.bmj.com/rapid-response/2011/11/02/central-question-not-answered-can-newborn-vitamin-reduce-infant-mortality. [Google Scholar]
- 13.World Health Organization Guideline: neonatal vitamin A supplementation. Geneva: WHO; 2011 [PubMed] [Google Scholar]
- 14.Kim SS, Habicht JP, Menon P, Stoltzfus RJ. How do programs work to improve child nutrition? Washington: IFPRI; 2011. IFPRI Discussion Paper. [Google Scholar]
- 15. Savitz DA. In defense of black box epidemiology. Epidemiology. 1994;5:550–2.
- 16. Skrabanek P. The emptiness of the black box. Epidemiology. 1994;5:553–5.
- 17. University of Oxford Centre for Evidence Based Medicine. OCEBM levels of evidence system: 2011 levels of evidence. [cited 2013 Aug 8]. Available from: http://www.cebm.net/index.aspx?o=5653.
- 18. Habicht JP, Pelto GP. From biological to program efficacy: promoting dialogue among the research, policy and program communities.
- 19. The Free Dictionary. Definition of extrapolate. [cited 2013 Aug 8]. Available from: http://www.thefreedictionary.com/extrapolation.
- 20. Victora CG, Habicht JP, Bryce J. Evidence-based public health: moving beyond randomized trials. Am J Public Health. 2004;94:400–5.
- 21. Glasgow RE, Lichtenstein E, Marcus AC. Why don't we see more translation of health promotion research to practice? Rethinking the efficacy-to-effectiveness transition. Am J Public Health. 2003;93:1261–7.
- 22. Mayo-Wilson E, Grant S, Hopewell S, Macdonald G, Moher D, Montgomery P. Developing a reporting guideline for social and psychological intervention trials. Trials. 2013;14:242.
- 23. Biehl J, Petryna A. Evidence: overview. In: Biehl J, Petryna A, editors. When people come first: critical studies in global health. Princeton (NJ): Princeton University Press; 2013. p. 23–9.
- 24. Flay BR. Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Prev Med. 1986;15:451–74.
- 25. Awasthi S, Peto R, Read S, Clark S, Pande V, Bundy D. Vitamin A supplementation every 6 months with retinol in 1 million pre-school children in north India: DEVTA, a cluster-randomised trial. Lancet. 2013;381:1469–77.
- 26. Miettinen O. Confounding and effect-modification. Am J Epidemiol. 1974;100:350–3.
- 27. Ruel MT, Habicht JP, Rasmussen KM, Martorell R. Screening for nutrition interventions: the risk or the differential-benefit approach? Am J Clin Nutr. 1996;63:671–7.
- 28. Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters TJ. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol. 2004;57:229–36.
- 29. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann Intern Med. 1992;116:78–84.
- 30. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Trials. 2010;11:32.
- 31. Krahn M, Naglie G. The next step in guideline development: incorporating patient preferences. JAMA. 2008;300:436–8.
- 32. World Health Organization. Guideline: vitamin D supplementation in pregnant women. Geneva: WHO; 2012.
- 33. World Health Organization. Guideline: calcium supplementation in pregnant women. Geneva: WHO; 2013.
- 34. Arreola SAG, Banos O, Beck J, Keatley J, Sundararaj M. In our own words: preferences, values, and perspectives on HIV prevention and treatment. A civil society consultation with MSM and transgender people. Oakland (CA): Global Forum on MSM and HIV; 2010.
- 35. Wiysonge CS, Ngcobo NJ, Jeena PM, Madhi SA, Schoub BD, Hawkridge A, Shey MS, Hussey GD. Advances in childhood immunisation in South Africa: where to now? Programme managers’ views and evidence from systematic reviews. BMC Public Health. 2012;12:578.