Abstract
Background:
“Biological plausibility” is a concept frequently referred to in environmental and public health when researchers are evaluating how confident they are in the results and inferences of a study or evidence review. Biological plausibility is not, however, a domain of one of the most widely used approaches for assessing the certainty of evidence (CoE) that underpins the findings of a systematic review: the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) CoE Framework.
Objectives:
Whether the omission of biological plausibility is a potential limitation of the GRADE CoE Framework is a topic that is regularly discussed, especially in the context of environmental health systematic reviews.
Study Design and Setting:
We analyze how the concept of “biological plausibility,” as applied in the context of assessing certainty of the evidence that supports the findings of a systematic review, is accommodated under the processes of systematic review and the existing GRADE domains.
Results:
We argue that “biological plausibility” is a concept which primarily comes into play when direct evidence about the effects of an exposure on a population of concern (usually humans) is absent, at high risk of bias, inconsistent, or limited in other ways. In such circumstances, researchers look toward evidence from other study designs to draw conclusions. In this respect, we can consider experimental animal and in vitro evidence as “surrogates” for the target populations, exposures, comparators, and outcomes of actual interest. Through discussion of 10 examples of experimental surrogates, we propose that the concept of biological plausibility consists of two principal aspects: a “generalizability aspect” and a “mechanistic aspect.”
Conclusions:
The “generalizability aspect” concerns the validity of inferences from experimental models to human scenarios, and asks the same question as does the assessment of external validity or indirectness in systematic reviews. The “mechanistic aspect” concerns certainty in knowledge of biological mechanisms and would inform judgments of indirectness under GRADE, and thus the overall CoE. Although both aspects are accommodated under the indirectness domain of the GRADE CoE Framework, further research is needed to determine how to use knowledge of biological mechanisms in the assessment of indirectness of the evidence in systematic reviews.
Keywords: Systematic review, Biological plausibility, Surrogates, Environmental health, Toxicology, Epidemiology, Bradford Hill, GRADE
1. Introduction
In environmental and public health research, toxicology, and human health chemical risk assessment (henceforth referred to as “environmental health research”) it is rare to have direct evidence from studies in humans of the effects that environmental exposures might be having on people’s health. This elevates the importance in environmental health research of evidence from experimental animal (in vivo) and in vitro studies. However, although evidence from in vivo and in vitro studies has the advantage that exposure can be controlled, the laboratory set-up is only indirectly representative of the human situation which it models—using animals in place of people, artificial cell culture constructs to measure biological processes, and exposure regimens which are often much higher, shorter, and more regimented than would be seen in human cases [1].
There is often, therefore, a need to translate the evidence from laboratory experiments to the human scenarios they are informing. Our ability to do this correctly is critical in successfully identifying, quantifying, and limiting health harms from environmental exposures. As systematic reviews become mainstream in environmental health [2–6], the need for systematic approaches for translating evidence from the laboratory to the human context becomes increasingly important [7].
One concept which is often applied in assessing causality and translating the findings of laboratory experiments to human contexts (or, indeed, one epidemiological context to another) is that of “biological plausibility.” As a concept, biological plausibility was first formalized in 1965 by Sir Austin Bradford Hill [8], as one of his considerations for establishing causality. Bradford Hill argued that the presence of biological plausibility can increase the likelihood that a relationship between an exposure and a health outcome is a causal one. However, despite the evolution of thinking around the concept and the many definitions of “biological plausibility” that are available (Table 1), exactly what constitutes biological plausibility has never been fully or finally characterized. This is particularly true in the context of conducting environmental health systematic reviews. Methodologists, including those in the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Working Group, are frequently challenged by environmental health practitioners about whether and how the assessment of biological plausibility is accommodated in the systematic review process [9].
Table 1.
Examples of definitions of “biological plausibility”
| Source | Definition of “biological plausibility” |
|---|---|
| Hill [8] | “It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day” |
| Wikipedia Contributors [10] | “A relationship between a putative cause and an outcome—that is consistent with existing biological and medical knowledge” and “one component of a method of reasoning that can establish a cause-and-effect relationship between a biological factor and a particular disease or adverse event” |
| European Food Safety Authority [11] | “Consistency between data and biological theory or mechanism” |
| Last’s Dictionary of Epidemiology [12] | The “causal consideration that an observed, potentially causal association between an exposure and a health outcome may plausibly be attributed to causation on the basis of existing biomedical and epidemiological knowledge” |
| Organisation for Economic Co-operation and Development [13] | Being “consistent with biological knowledge” and “based on extensive previous documentation and broad acceptance” |
| US Environmental Protection Agency Cancer Guidelines [14] | “An inference of causality [which] tends to be strengthened by consistency with data from experimental studies or other sources demonstrating plausible biological mechanisms. A lack of mechanistic data, however, is not a reason to reject causality” |
2. GRADE and biological plausibility
The GRADE Framework, originally introduced in 2003, is commonly used in public health and healthcare systematic reviews, and increasingly in environmental health [4,15]. GRADE contends that assessment of the certainty of evidence for answers to research questions can be successfully operationalized (ie, conducted accurately, consistently, and transparently by different researchers working in different times and places) via systematic consideration of a predefined set of eight “domains” of strengths and limitations of the overall evidence base [16]. The domains that reduce certainty in a body of evidence summarized in a systematic review are risk of bias, inconsistency, indirectness, imprecision, and publication bias. The domains which increase certainty are large effect size, presence of a dose–response relationship, and residual opposing confounding (Fig. 1).
Fig. 1.

The upgrade and downgrade domains in GRADE and how they are used to determine the overall certainty in evidence for a systematic review. GRADE, Grading of Recommendations, Assessment, Development and Evaluation. Adapted from [17].
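As a rough illustration of how the eight domains combine, the sketch below encodes the bookkeeping summarized in Fig. 1: a body of evidence starts at a level set by study design, is rated down for the five limiting domains and up for the three strengthening ones, and the result is bounded to the four GRADE certainty levels. It is a minimal sketch under stated assumptions, not official GRADE guidance. The starting levels (high for randomized evidence, low for observational evidence, as in standard GRADE for intervention questions), the 0 to 2 step limits, the clamping at the end, and the names `EvidenceBody` and `rate_certainty` are ours for illustration. In GRADE these ratings are structured judgments rather than arithmetic, and environmental health adaptations may set starting levels differently.

```python
from dataclasses import dataclass, field

# The four GRADE certainty levels, lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Domains that can lower certainty (assumed: each rated down 0, 1, or 2 levels).
DOWNGRADE_DOMAINS = ("risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias")
# Domains that can raise certainty (assumed: each rated up 0, 1, or 2 levels).
UPGRADE_DOMAINS = ("large_effect", "dose_response", "opposing_confounding")


@dataclass
class EvidenceBody:
    """Hypothetical container for one body of evidence and its domain ratings."""
    randomized: bool  # simplification: study design alone sets the starting level
    downgrades: dict = field(default_factory=dict)
    upgrades: dict = field(default_factory=dict)


def rate_certainty(body: EvidenceBody) -> str:
    """Illustrative GRADE-style bookkeeping; real ratings are judgments."""
    level = 3 if body.randomized else 1  # start at "high" or "low"
    for domain, steps in body.downgrades.items():
        if domain not in DOWNGRADE_DOMAINS or steps not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {domain}={steps}")
        level -= steps
    for domain, steps in body.upgrades.items():
        if domain not in UPGRADE_DOMAINS or steps not in (0, 1, 2):
            raise ValueError(f"invalid upgrade: {domain}={steps}")
        level += steps
    # Certainty cannot fall below "very low" or rise above "high".
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


# Observational evidence with a very large effect and a dose-response gradient.
print(rate_certainty(EvidenceBody(randomized=False,
                                  upgrades={"large_effect": 2, "dose_response": 1})))  # high
```

Clamping only once, after all domains are applied, is a simplifying choice; it keeps the sketch short but means the order in which concerns are considered does not matter, which is consistent with GRADE treating the domains as jointly informing a single overall rating.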
These domains are intended to be exhaustive of the concepts necessary for assessing certainty in the evidence, operationalized via a structured reasoning process designed to produce more consistent and transparent results than is achievable by direct application of the considerations of Bradford Hill. Historically, the contention has been that the role played by assessment of biological plausibility in environmental health assessments is already accommodated either in the GRADE domains or as part of the systematic review process [18,19]. The GRADE Working Group has therefore intentionally not included biological plausibility as a domain in rating the certainty of the evidence.
Our objectives in this paper are as follows: to further elucidate how the systematic review process and the GRADE domains operationalize the assessment of biological plausibility; to describe how the concept of biological plausibility maps onto the process of systematically reviewing environmental health evidence; and to answer the question of how “biomedical,” “biological,” or “epidemiological” knowledge, as referred to in the various definitions of biological plausibility, contributes to rating certainty in a body of evidence summarized in a systematic review.
Our argument consists of five parts. First, we argue that consideration of biological plausibility is not necessary if the body of evidence that is directly reflective of the populations, exposures, comparators, and outcomes of concern in a systematic review question is sufficiently certain. Second, we note that this situation is rare in environmental health, and that systematic reviews in this field will often need to include indirect evidence from surrogate in vivo and in vitro experimental models. (We define a “surrogate” as any property of a study model that is used to estimate the characteristics of a different property. By “property” we mean any controllable or measurable element of study design, including the population, exposure or intervention, comparator, and outcome, and any of their individual characteristics. For example, rats might be studied in the laboratory as surrogates for human populations, and IQ might be measured as a surrogate for intellectual capacity.) Third, through 10 examples of the use of surrogates, we show what sort of “biological knowledge” is typically used when researchers are making judgments about biological plausibility.
Fourth, our 10 examples show that the concept of biological plausibility consists of two connected principal aspects, which we call the “generalizability aspect” and the “mechanistic aspect.” The “generalizability aspect” of biological plausibility concerns the extent to which findings from an experimental context apply to a target context of concern. The “mechanistic aspect” concerns certainty in the evidence of biological mechanisms (ie, the molecular, cellular, and organismal events leading to an outcome). Judgments of the generalizability of a surrogate are informed by evidence of biological mechanisms.
Fifth, we argue that because the generalizability aspect of biological plausibility and the assessment of indirectness in systematic reviews both concern the external validity of experimental models, it follows that the generalizability aspect of biological plausibility is accommodated under the GRADE domain of indirectness. Insofar as judgments of certainty in biological mechanisms support judgments of the generalizability of a surrogate, then the mechanistic aspect of biological plausibility should also be operationalized under the indirectness domain of GRADE.
We therefore conclude that, while processes and language may be different, the concepts involved in the assessment of biological plausibility are covered by the established domains of GRADE. This means GRADE does not need to introduce additional domains to accommodate biological plausibility. However, we also recognize that GRADE has not yet been applied to the assessment of certainty in a way which takes detailed account of biological mechanisms. We therefore recommend research be conducted to advance understanding of how knowledge about mechanisms should be applied in determining the indirectness of evidence.
We note that we are not providing a complete account of the concept of biological plausibility in all contexts and uses, restricting our focus to its application in the conduct of systematic reviews of exposure–outcome relationships. We also note that there is a potential relationship between biological plausibility and Bradford Hill’s concept of “coherence.” It has been acknowledged elsewhere that coherence is covered in GRADE under the domains of inconsistency and indirectness [19]. However, as a concept distinct from biological plausibility, coherence is not a focus of this article.
3. “Biological plausibility” and the inclusion of surrogates in systematic reviews
Systematic review can be defined as the application of methods designed to minimize risk of systematic and random error, and maximize transparency of decision-making, when using existing evidence to answer specific research questions. Asking a specific, focused question is a fundamental step in the systematic review process. Systematic review questions in environmental health are generally characterized in terms of the population, exposure, comparator, and outcomes of concern (the PECO mnemonic) [20].
One of the principal reasons for characterizing environmental health questions and the objectives of systematic reviews in terms of a PECO statement is to facilitate unambiguous characterization of the types of studies which will be considered by the authors of a systematic review to be relevant or eligible for answering their question. Studies which are more directly relevant will be of populations, exposures, comparators, and outcomes that closely match the PECO of the systematic review; those which are less relevant will match less closely. This concept of fit between a study and the objectives of a systematic review is “external validity”—the extent to which the findings of a study can be generalized to populations, exposures, and outcomes outside the context of that study [21]. External validity is part of the indirectness domain in GRADE [22].
When designing a systematic review, authors need to decide on what their cut-off or threshold for external validity is going to be, that is, where they draw the line on a study being sufficiently generalizable to their target PECO to be worth including in their review. Where the line is drawn will depend on the review objectives. The way to keep a systematic review relatively small and simple is to define as eligible only those studies whose designs most directly match the PECO characterization of the systematic review question (Fig. 2A). If the evidence from those studies is sufficiently certain then there is no need to seek out other evidence in support of the findings of the systematic review—a search for indirect evidence need not be undertaken.
Fig. 2.

Schematic representation of how it might be decided to include studies of surrogates in a systematic review.
A classic example of this scenario, of direct evidence being of high certainty, is smoking causing lung cancer. Several observational studies have investigated doctors (P) who smoke (E), compared them to doctors who do not smoke (C), and assessed the relative risk of lung cancer (O) between the two groups. The studies are at relatively low risk of bias, including confounding; multiple studies of similar design give reasonably consistent results; they are in a representative population; the overall effect size is reasonably precise; there is no evidence that publication bias exaggerates the observed effect size; there is a dose–response relationship; and the effect size is large, with smoking increasing lung cancer risk by a factor of 12–24 [23,24]. These features of the evidence establish with sufficiently high certainty that a causal relationship has been observed, without knowledge of the mechanism by which the exposure causes the outcome.
In such scenarios, the “biological plausibility” of the exposure–outcome relationship does not need to be evaluated; it can be assumed that there must be a discoverable biological mechanism because there is high certainty that the relationship is causal. This is true even when there is little information about the biological mechanism by which the exposure causes its outcome. Conversely, that it is not known why or how the exposure causes the outcome does not undermine certainty that the relationship is causal. This is what we believe Bradford Hill meant when he stated that establishing biological plausibility is helpful but not always necessary for a causal claim [8]: “It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand.”
The challenge in environmental health research is that high certainty in direct evidence is a theoretical possibility which is only rarely realized. Usually, environmental health systematic reviews that focus only on the human evidence for a hypothesized exposure–outcome relationship would be expected to yield insufficiently conclusive results, due to the human evidence being highly uncertain or even nonexistent. In such circumstances, to further investigate and elucidate potential causal relationships between exposures and outcomes, it may become necessary to consider indirect evidence in the form of studies of surrogates (Fig. 2B). This is done in the expectation that including in the systematic review indirect evidence from studies of surrogates will support an assessment of the presence of a causal relationship.
The use of surrogates is familiar in environmental health, a field that has long relied on evidence in which animal models stand in for target human populations, biomarkers of disease are used in place of observations of clinical health outcomes, and potential health effects of under-studied chemicals are inferred from their similarity to better-researched substances. As with any systematic process, decisions on which surrogates to include in a systematic review should be transparent and well-reasoned, based on evidence of the validity of the decision, and as far as possible defined in advance of conduct of the review [25]. Spurious inclusion of surrogate studies is not just a waste of time and resources: if surrogates are not informative of the question but are nonetheless included in the overall analysis, then the validity of the results of the systematic review may be compromised; likewise, spurious exclusion of surrogate studies which should have been included also risks false conclusions.
4. The “biological plausibility” of choice of surrogates
In conventional environmental health assessments, the use of evidence from surrogates is considered justifiable insofar as it provides “biologically plausible” support for the hypothesized exposure–outcome relationship in the population of concern [9]. In the context of systematic reviews, the GRADE Framework assesses the importance of the indirectness of the surrogate relative to the question being asked. Evidence from surrogates which is too indirect would be excluded from a systematic review; evidence from surrogates which is direct enough to be informative would be included but might be rated down for indirectness [26].
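The exclude-or-rate-down logic just described can be pictured as a small decision rule over the PECO elements of a surrogate study. In the hypothetical sketch below, each PECO element receives a reviewer judgment of indirectness (none, serious, very serious, or unacceptable). Any unacceptable judgment excludes the study; otherwise the most serious concern sets how many levels to rate down. The category names, the choice to take the maximum concern rather than combine concerns, and the function `surrogate_decision` are our assumptions; GRADE leaves the overall indirectness rating to structured judgment rather than a fixed rule.

```python
from enum import IntEnum
from typing import Dict, Optional


class Indirectness(IntEnum):
    """Hypothetical reviewer judgment for one PECO element of a surrogate study."""
    NONE = 0          # surrogate closely matches the review's PECO element
    SERIOUS = 1       # include, rate down one level
    VERY_SERIOUS = 2  # include, rate down two levels
    UNACCEPTABLE = 3  # too indirect to be informative: exclude


PECO_ELEMENTS = ("population", "exposure", "comparator", "outcome")


def surrogate_decision(judgments: Dict[str, Indirectness]) -> Optional[int]:
    """Return the number of levels to rate down for indirectness, or None to exclude.

    Assumed rule: the single most serious concern across PECO elements
    drives the decision; concerns are not summed.
    """
    unknown = set(judgments) - set(PECO_ELEMENTS)
    if unknown:
        raise ValueError(f"not PECO elements: {sorted(unknown)}")
    worst = max(judgments.values(), default=Indirectness.NONE)
    if worst == Indirectness.UNACCEPTABLE:
        return None  # e.g. a surrogate model whose mechanism is absent in humans
    return int(worst)


# A surrogate population judged seriously indirect, with a direct outcome.
print(surrogate_decision({"population": Indirectness.SERIOUS,
                          "outcome": Indirectness.NONE}))
```

Returning `None` for exclusion keeps the two outcomes the text distinguishes (not eligible vs. eligible but downgraded) visibly separate, rather than folding exclusion into a very large downgrade.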
Here we present 10 examples of the use of surrogates in environmental health assessments. We frame the examples in terms of biological plausibility and describe how the indirectness of the surrogates might be interpreted in the GRADE approach. We then use these examples to show how judgments of biological plausibility map onto the concepts of systematic review. The 10 examples and related analyses are summarized in Tables 2 and 3.
Table 2.
Summary of the 10 examples used in this manuscript to show how discussion of biological plausibility maps onto the concepts of systematic review
| Pillars of the question of interest | Surrogates of higher biological plausibility, for which indirectness is less important | Surrogates of lower biological plausibility, for which indirectness is more important |
|---|---|---|
| Population | Animal models for human carcinogenicity of 2-nitropropane | Rat models for human bladder carcinogenicity of saccharin |
| Exposure (dose) | Extrapolating from high doses to low doses of genotoxic substances | Extrapolating from high doses to low doses of endocrine disrupting chemicals |
| Exposure (route) | Oral administration of bisphenol-A via gavage, or availability of a pharmacokinetic model to translate intravenous dose to oral equivalent | Intravenous administration of bisphenol-A in the absence of pharmacokinetic model to translate intravenous dose to oral equivalent |
| Exposure (substance) | Inferring estrogenic potential of other bisphenols from studies of bisphenol-A | Inferring neurotoxicity of organophosphate flame retardants from studies of organophosphate pesticides |
| Outcome | Maternal serum thyroxine (T4) for child neurodevelopmental outcomes | Biomarkers of Alzheimer’s disease progression in place of clinical measures |
Table 3.
Summary of potential influencing factors in judging biological plausibility or external validity of study surrogates, as suggested by the examples in this manuscript
| Pillars of the question of interest | Potential influencing factors in judging the biological plausibility or external validity of study surrogates |
|---|---|
| Population | The extent to which the biological pathway connecting exposure to outcome is operating in both the surrogate population and the target population (Fig. 5A) |
| Exposure: dose | The similarity of the toxicodynamic and toxicokinetic processes by which the surrogate dose acts to those operating in the dose range of interest |
| Exposure: route | The similarity by which an organism absorbs and metabolizes the substance of concern via the surrogate route as opposed to the target route; or the reliability with which exposure from the surrogate route can be transformed to values which match exposure from the route of interest |
| Exposure: substance | The extent to which the surrogate molecule influences the biological processes by which the target molecule is thought to elicit its biological effects (Fig. 5C) |
| Outcome | The extent to which a surrogate outcome is predictive of the target outcome of concern (Fig. 5B) |
Note that this is a conceptual article describing how ratings of indirectness may be described using the GRADE approach. This information should not be used for decision-making. As for any GRADE concept article, the particular approach described here will require further validation before it may become official GRADE guidance. Thus, the importance of the indirectness of a surrogate when assessed as part of a systematic review, and therefore any judgments to exclude or downgrade evidence based on its indirectness, may turn out to be different to the judgments which have been made in each of the examples we present.
5. Surrogate populations
Toxicology has a long history of use of animal models for investigating potential harm to human health from exposure to chemical substances. This is due to the ethical prohibition on conducting experiments in humans that are designed to potentially cause harm, combined with the need for evidence to inform evaluation of chemical health risks, for example, for regulatory approval and compliance.
One example of surrogate animal and in vitro populations being accepted as evidence for health outcomes in human populations of concern is the assessment of the carcinogenicity of 2-nitropropane. Although there is no direct evidence of carcinogenicity in humans, the animal and in vitro evidence is considered sufficiently certain to justify classifying 2-nitropropane as a human carcinogen [27]. Although the authors of that assessment did not have a complete account of the mechanism by which 2-nitropropane acts as a genotoxic carcinogen, they judged it sufficiently biologically plausible that effects observed in surrogate experimental populations would also occur in humans to support a conclusion of carcinogenicity.
In a contrasting example, the US Food and Drug Administration (FDA) has deemed evidence from rat models not relevant to the assessment of saccharin as a bladder carcinogen, because the mechanism by which saccharin causes tumor growth in rats does not operate in humans [28]. The rat model was originally considered to be predictive but, once the mechanism by which saccharin causes cancer in rats was determined not to be present in humans, the US FDA excluded the rat model from the assessment. In other words, the US FDA judged it not biologically plausible that the mechanism by which saccharin causes cancer in rats also occurs in humans.
Expressed in the concepts of GRADE, the reasoning around 2-nitropropane amounts to saying that surrogate evidence from animal studies is sufficiently direct to be included in a systematic review of the human carcinogenicity of 2-nitropropane, and that a conclusion about carcinogenicity in humans can be drawn from this animal evidence in spite of its indirectness and the lack of a complete account of the mechanism of carcinogenicity. For saccharin, it amounts to saying that, based on the evidence about the underlying mechanism, the indirectness of the surrogate animal model is unacceptable: the rat model is too indirect, does not generalize to humans, and is therefore not eligible for inclusion in a systematic review of whether saccharin is a bladder carcinogen.
We again emphasize that the purpose of these examples is to survey how judgments of biological plausibility map onto the concepts and processes of GRADE and systematic reviews. We are not validating any of the judgments that have been made by others in the selected examples. The purpose of the examples is to understand what is involved when researchers are making judgments of biological plausibility, not to determine whether those judgments are valid.
6. Surrogate outcomes
Surrogate outcomes are used in environmental health research because it is often easier or more ethical in experimental and observational studies to measure biomarkers of disease than clinical outcomes of interest. This is the case when health outcomes may have long latency periods in the population of concern (such as for many cancer types), for particular study designs (eg, the use of in vivo models for allergic contact dermatitis that focus on the induction phase only), or when the observed population may not manifest the apical outcome of interest (eg, when nonanimal test methods address downstream key events relating to skin sensitization).
One example of the use of a surrogate outcome is in a systematic review of the developmental and reproductive toxicity of the biocide triclosan by Johnson et al. [29]. In this case, serum thyroxine concentrations in pregnant women were chosen as a surrogate for the neurodevelopmental health of children. The authors’ reasoning was that maternal thyroid hormone levels during pregnancy are predictive of the subsequent neurodevelopmental health of the child—an association described in another systematic review as being “biologically plausible” [30]. This can be taken as a judgment by the authors that there is a sufficiently “biologically plausible” relationship between maternal serum thyroxine and neurodevelopment that the former can be treated as a surrogate outcome for the latter.
In contrast, a systematic review of biomarkers for Alzheimer’s disease found insufficient evidence to be able to recommend any biomarker for use as a surrogate outcome for disease progression [31]. Although it might appear to be “biologically plausible” that Alzheimer’s disease results in specific changes to physical brain structure detectable in a magnetic resonance imaging scan [32], there seems to be a lack of empirical evidence that directly connects the biomarker to the outcome of concern.
Expressed in the concepts of the GRADE Framework, the triclosan example amounts to saying that, in spite of their indirectness, studies which investigate the surrogate outcome can be considered eligible for inclusion in a systematic review of neurodevelopmental toxicity and can contribute toward its findings (note that Johnson et al. [29] used a modified version of the GRADE Framework and did not downgrade for indirectness). For Alzheimer’s disease, it amounts to saying that, due to uncertainty about how the surrogate biomarker predicts the ultimate outcome of concern, studies of brain structure biomarkers should either be downgraded more than once for indirectness or be excluded from a systematic review if the indirectness is judged to be unacceptable.
7. Surrogate exposures
Selecting and attributing appropriate importance to surrogate exposures is a complex issue in environmental health systematic reviews. We briefly discuss three aspects of surrogate exposure: route of exposure; administered dose; and active substance. These should provide sufficient illustration of principle, although we note that other aspects of exposure such as measurement of metabolites vs. parent compound, timing of exposure, and other issues, will need consideration in environmental health systematic reviews [33].
7.1. Route
Extrapolating from experimental routes of exposure to the actual routes of exposure likely to be encountered by target populations is a major preoccupation of toxicological risk assessment. For example, toxicology studies which administer bisphenol-A (BPA) to animal test subjects via oral gavage are considered to be of direct relevance to assessing outcomes from dietary exposure. In contrast, intravenous (IV) administration of BPA is typically considered not to be relevant to such assessment, due to the avoidance of first-pass metabolism in the liver [34]. However, the relevance of studies using IV administration can increase if knowledge of how BPA is metabolized allows equivalent oral doses to be calculated from IV doses, as this provides what can be interpreted as a “biologically plausible” account of how the two doses are related [35].
Expressing this in the conceptual framework of GRADE, we would say the indirectness of the route of exposure becomes less important when the exposures of concern can be determined from surrogate exposure routes. Physiologically based pharmacokinetic models to aid in route-to-route extrapolation are encouraged in chemical assessments [36,37]. The availability of such models may lead to indirect evidence from studies using IV exposure routes being included in a systematic review and potentially rated down fewer levels for indirectness than for scenarios in which such models are unavailable.
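The simplest version of the IV-to-oral translation described above assumes linear kinetics with clearance that does not depend on route, and uses the area under the plasma concentration–time curve (AUC) of the parent compound as the internal dose metric. Then AUC_IV = D_IV / CL and AUC_oral = F × D_oral / CL, where F is oral bioavailability and CL is clearance, so equating the two gives D_oral = D_IV / F. The sketch below encodes only that relation. The function name and the example values are hypothetical placeholders, not BPA data; a physiologically based pharmacokinetic model would additionally represent metabolism, metabolites, species differences, and timing, which is why such models can support route-to-route extrapolation far better than this back-of-envelope conversion.

```python
def oral_equivalent_dose(iv_dose_mg_per_kg: float, oral_bioavailability: float) -> float:
    """Oral dose giving the same systemic AUC of parent compound as an IV dose.

    Assumes linear kinetics with route-independent clearance (CL):
        AUC_iv   = D_iv / CL
        AUC_oral = F * D_oral / CL
    Equating the AUCs gives D_oral = D_iv / F. Peak concentrations (Cmax)
    are NOT matched by this conversion.
    """
    if not 0 < oral_bioavailability <= 1:
        raise ValueError("oral bioavailability must be in (0, 1]")
    if iv_dose_mg_per_kg < 0:
        raise ValueError("dose must be non-negative")
    return iv_dose_mg_per_kg / oral_bioavailability


# Hypothetical placeholder values: IV dose 0.1 mg/kg, assumed bioavailability 5%.
print(oral_equivalent_dose(0.1, 0.05))
```

A low assumed bioavailability makes the oral-equivalent dose much larger than the IV dose, which is the arithmetic behind the concern that IV studies bypass first-pass metabolism and so cannot be read directly as dietary-exposure evidence.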
7.2. Dose
In toxicological research, experiments are often conducted using high doses that are not considered environmentally or occupationally relevant. Many bioassays also merely aim at identifying a maximum tolerated dose of a chemical substance to provide a benchmark of toxicity. High-dose regimens can raise critical concerns about the indirectness of a study, if the toxicokinetic and toxicodynamic factors by which the administered dose causes an outcome are different from those operating at the dose level of concern [38].
This is a key point of debate about the potential health effects of exposure to endocrine disrupting chemicals: if the administered high dose overwhelms the biological pathway that is involved in the endocrine activity of the active substance, triggering nonspecific pathways that are responsible for the observed outcomes, then there may be critical concerns about the indirectness of the surrogate dose for determining whether the chemical of concern is an endocrine disruptor [39]. An example is endocrine effects arising from direct damage to the liver [40]. This would raise concern that endocrine disruption is not a biologically plausible explanation for the disease induced at high doses, because the mechanism by which the surrogate dose causes the outcome of interest differs from the mechanism by which the dose of concern would cause it.
In contrast, chemicals which cause cancer by a genotoxic mechanism are considered to operate according to the same mechanism of action at high and low doses [41]. In this case, extrapolation from across the dose range is taken to be unproblematic.
Expressing the example of endocrine disruption and genotoxicity in the concepts of the GRADE Framework, the indirectness of a surrogate dose becomes more important when there is evidence of different toxicokinetic and toxicodynamic processes operating at different dose levels. The presence of such differences may result in a decision to exclude evidence from studies using surrogate doses in a systematic review because of unacceptable levels of indirectness. If, on the other hand, it is decided to include the studies that use surrogate doses, rating down for indirectness by two or more levels would be more likely in the example of endocrine disruptors than for genotoxic carcinogens.
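The contrast between genotoxic carcinogens and endocrine disruptors can be illustrated with the default linear low-dose extrapolation used in carcinogen risk assessment guidance such as that of the US EPA [14]. When the same mechanism is assumed to operate at all doses, risk below an experimental point of departure (POD) is taken to be proportional to dose, with a slope equal to the benchmark response divided by the POD. The minimal sketch below implements only that proportionality; the function names and numbers are hypothetical. It is exactly this assumption that becomes inappropriate when, as discussed for endocrine disruptors, the high-dose mechanism differs from the one operating at the dose of concern.

```python
def linear_slope_factor(benchmark_response: float, point_of_departure: float) -> float:
    """Slope of the straight line from the POD (e.g., a BMDL10 in mg/kg-day) to the origin."""
    if not 0.0 < benchmark_response < 1.0:
        raise ValueError("benchmark_response must be an extra-risk fraction in (0, 1)")
    if point_of_departure <= 0.0:
        raise ValueError("point_of_departure must be positive")
    return benchmark_response / point_of_departure


def extrapolated_risk(dose: float, benchmark_response: float, point_of_departure: float) -> float:
    """Extra risk at a dose below the POD, assuming risk is proportional to dose.

    This presumes that the same mode of action operates from the tested high doses
    down to the dose of concern (the genotoxic case in the text). It should not be
    used where high doses trigger different, nonspecific pathways.
    """
    if not 0.0 <= dose <= point_of_departure:
        raise ValueError("linear extrapolation applies only between zero and the POD")
    return linear_slope_factor(benchmark_response, point_of_departure) * dose


# Hypothetical inputs: 10% extra risk at a POD of 5 mg/kg-day; dose of concern 0.01 mg/kg-day.
risk = extrapolated_risk(0.01, 0.10, 5.0)
print(f"{risk:.1e}")  # 2.0e-04
```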
7.3. Substance
Many chemicals to which people are potentially exposed have very few associated toxicology studies. One means of anticipating the potential toxicity of under-studied substances is extrapolation from evidence of the toxicity of suitably similar chemicals. Often this is based on the demonstration of a common mode of action of toxicity, or on sufficient likeness of the surrogate chemical in terms of physical properties that a common mode of action can reasonably be inferred.
For example, the UK Committee on Toxicity (COT) recently evaluated evidence of the neurotoxicity of organophosphate flame retardants (OPFRs) [42]. Part of their assessment concerned whether the neurotoxicity of OPFRs could be extrapolated from studies of the neurotoxicity of organophosphate pesticides (OPPs). COT determined that OPPs are not a good surrogate exposure for OPFRs, because OPFRs do not inhibit acetylcholinesterase to the same degree as OPPs. COT concluded that there is no “biologically plausible” explanation for how OPPs and OPFRs can cause the same effect, and therefore determined that conclusions about the neurotoxicity of OPFRs should not be derived from evidence of the neurotoxicity of OPPs.
In contrast, since the phase-out of consumer uses of the plastic additive bisphenol-A due to concerns about its potential to act as an estrogen, considerable research has been conducted into whether replacements such as bisphenol-AF and bisphenol-C may have similar estrogenic potential. Enough similarities in biological effects have been observed for some researchers to suggest that, at least as a group, exposure to some bisphenols may be predictive of the effects of exposure to others [43]. Similar suggestions have been made for polyfluorinated compounds [44]. When it is more “biologically plausible” that different chemical substances share the same mechanisms by which they exert health effects, then it might be acceptable to use one as a surrogate (also referred to in the environmental health field as an “analog”) for the other.
Expressed in the concepts of the GRADE Framework, the absence of an explanation for a shared mechanism by which OPPs and OPFRs would exert a neurotoxic effect increases the indirectness of OPPs as a surrogate for OPFRs. If the level of indirectness is unacceptable, studies of OPPs would be excluded from a systematic review of the neurotoxicity of OPFRs; if it is very high but acceptable, evidence from the surrogate exposure might be included but would be rated down for indirectness, potentially two or three times. For bisphenols and polyfluorinated compounds, if the indirectness of surrogates is deemed less important, they may be included in a systematic review and rated down only once for indirectness, or possibly not at all.
8. Discussion
8.1. Biological plausibility as a dual-aspect concept
Our examples show that “biological plausibility” is a concept that can be deployed in multiple scenarios in environmental health assessments. In general, judgments of biological plausibility seem to support judgments of causality insofar as studies of causal relationships in surrogates can be generalized to the target populations, exposures, and outcomes of actual concern. These uses extend beyond the definitions of biological plausibility as provided by Bradford Hill and Last’s Dictionary of Epidemiology, which define biological plausibility exclusively in terms of biological explanations of a causal relationship between exposure and outcome (Table 1). When translated into the conceptual underpinnings of GRADE, the uses center on judging the indirectness of surrogates and describing the impact on certainty in the evidence for the effect an exposure has on a health outcome in a population of concern. These judgments not only govern decisions about the eligibility of surrogates for a systematic review but also the extent to which a body of evidence based on those surrogates should be downgraded for indirectness.
Our examples also demonstrate that the judgments being made when assessing indirectness are complex. Not only do judgments need to be made about the generalizability of a surrogate, there are also judgments that need to be made about certainty in descriptions of biological mechanism. These judgments are intrinsically connected, as absence of mechanistic explanation limits the ability to generalize from a study surrogate to a target context of concern. This is perhaps most clearly illustrated in the examples of considering whether studies of OPPs are relevant to characterizing the potential neurotoxicity of OPFRs, and whether studies of rats are relevant to characterizing saccharin as a bladder carcinogen in humans.
Based on these observations, we posit that the concept of “biological plausibility” in fact consists of two principal aspects. We call these the “generalizability aspect” and the “mechanistic aspect.”
We define the generalizability aspect of biological plausibility as concerning the validity of generalizations from a surrogate population, exposure, comparator, or outcome to a target population, exposure, comparator, or outcome of concern, respectively. The generalizability aspect is not about the plausibility of causal claims about the effect of exposures on outcomes, but instead about the extent to which an observation in a surrogate population plausibly generalizes to a target population, a surrogate exposure generalizes to a target exposure, and so forth. The generalizability aspect supports judgments of causality insofar as observations are made in studies of surrogates, and the surrogates then generalize to the target contexts of concern.
We define the mechanistic aspect as concerning certainty in biological mechanism. Our examples show that judgments of whether a surrogate plausibly generalizes to a target context are informed by knowledge of relevant biological mechanisms, that is, how an exposure causes an outcome in a given biological or experimental system. Although this knowledge is often unavailable, when it is available it has a significant impact on judgments about the generalizability of observations in a surrogate: the higher the certainty in the knowledge of relevant biological mechanisms (eg, that similar mechanisms are present in humans and surrogate animal species), the higher the certainty with which a generalization from a given surrogate to a target context can be judged valid or invalid. The mechanistic aspect informs the generalizability aspect, as knowledge of mechanism helps determine the validity of generalizing from surrogates to target contexts of concern.
These two aspects are different but fundamentally linked: judgments of the plausibility of generalizations are informed by judgments of the plausibility of mechanisms. This connection is illustrated in Fig. 3.
Fig. 3.

The relationship between the generalizability and mechanistic aspects of biological plausibility.
8.2. Biological plausibility and GRADE
8.2.1. How biological plausibility is accommodated within GRADE.
Both the generalizability and mechanistic aspects of biological plausibility can be accommodated in the indirectness domain of GRADE. This is because the concept of “generalizability” is the same as the concepts of external validity and indirectness of evidence already familiar in systematic reviews, that is, the extent to which the results of an experimental or observational study apply to a target context outside of that study [21,22]. The difference is in vocabulary, whereby systematic reviewers talk about the “validity” rather than “plausibility” of a generalization. Because the generalizability aspect of biological plausibility is asking the same question as the assessment of external validity in systematic reviews, and external validity is subsumed under the GRADE domain of indirectness, it follows that there is no need to extend GRADE to accommodate the generalizability aspect of biological plausibility.
The mechanistic aspect of biological plausibility, because it informs judgments of indirectness or generalizability, is also logically positioned under the indirectness domain of GRADE. This relationship is shown in Fig. 4. This means GRADE does not need an additional domain to accommodate assessment of certainty in biological mechanisms. However, this is an interesting category of question for which systematic methods have only recently begun to be explored [25,45] and we recommend further research on this issue.
Fig. 4.

How biological plausibility maps onto the processes of systematic review via the shared concept of external validity, included in GRADE’s indirectness domain. Although questions about biological mechanisms (eg, how an exposure causes an outcome) are independent of a given systematic review, answers to those questions can be highly informative in judging the external validity or indirectness of evidence.
Conceptualizing matters in this way suggests an operational definition of biological plausibility that maps the concept onto the GRADE framework, as follows: “Biological plausibility is a dual-aspect concept operationalized in systematic reviews as (1) the validity of generalizations from studies of surrogates to target contexts of concern and (2) certainty in biological mechanisms. Certainty in biological mechanisms informs judgment of the validity of generalizations. When knowledge of biological mechanisms is available, it can have significant impact on judgments of the validity of generalizations.”
8.2.2. How biological plausibility is accommodated within the processes of systematic review.
We have established that biological plausibility maps onto judgments of the generalizability or indirectness of evidence in a systematic review. These judgments are informed by certainty in biological mechanisms. Our task now is to clarify when these judgments are made in the systematic review process. This will show how biological plausibility, in the form of judgments of generalizability informed by knowledge of biological mechanisms, is accounted for when conducting a systematic review. The general principles are articulated below and the specific steps described in Box 1.
Box 1.
Explanation of how the concept of biological plausibility is operationalized in the conduct of a systematic review and assessment of certainty in the evidence using the GRADE Framework. Step 2 is the first place where judgments of indirectness may be made. Here, high certainty that an indirect study model does not generalize to the research question may lead to such models being excluded. The US FDA exclusion of rat models from assessments of the bladder carcinogenicity of saccharin is an example of this. Otherwise, systematic reviews of health effects of environmental exposures will likely include indirect study models. (The exception is for deliberately narrowly focused reviews, as illustrated in Fig. 2.) Indirectness judgments are next made in Steps 4 and 5, where the included evidence is assessed under the GRADE domain of indirectness. The UK COT analysis of neurotoxicity of OPFRs based on neurotoxicity of OPPs is an example of when a judgment of lack of certainty in shared biological mechanism results in evidence effectively being rated down for certainty due to indirectness. We note that regulatory frameworks tend to assume a high level of generalizability of a surrogate model unless there is a high level of evidence to the contrary.
Judgments of generalizability or indirectness occur at two stages in systematic reviews, as illustrated in Fig. 2. The first stage is in the formulation of eligibility criteria for the inclusion of evidence in a systematic review. Here, the authors decide what study designs are sufficiently generalizable to their question to be worth including in their systematic review. These criteria may be narrowly defined around the most direct evidence, if the authors are attempting to keep their review focused and/or they are confident that looking only at the most direct evidence will provide sufficiently conclusive results. Otherwise, the eligibility criteria may be quite broadly defined, if the authors consider indirect evidence to be of value for their review objectives. Even then, there will be limits to eligibility, as many studies will be so irrelevant to the objective that it would be a waste of time and resources to include them. Setting these limits, that is, judging what sort of study designs are informative enough of the research question to be worth including in the systematic review, is a judgment of acceptable indirectness. Mechanistic data, if available, may be of high value in making these judgments.
The second stage is in the judgment of indirectness of the evidence that has been included in the review, when the authors are determining certainty in the evidence on which their results are based. As we show with our example of smoking and lung cancer, in systematic reviews of very direct evidence, indirectness is trivial and assessment of biological plausibility is unnecessary: the generalizability of findings is given, and mechanistic information is not needed to support judgments of certainty when certainty is already high.
The situation is different for systematic reviews with broadly defined eligibility criteria. In such reviews, indirectness rapidly becomes both very important (owing to potentially significant differences between target and surrogate) and complex (owing to the numerous potential differences between the question as formulated in the PECO and the characteristics of the included studies). The generalizability of surrogates to the target context of concern is a nontrivial issue and needs to be carefully evaluated. Information about mechanisms is of high value in making these judgments.
8.2.3. How the assessment of biological plausibility is operationalized in systematic reviews of the health effects of environmental exposures which use the GRADE approach for assessing certainty in the evidence.
1. Define the systematic review question as a PECO statement: “In population P, what effect does exposure E have on outcome O in comparison to comparator C?”
2. Define as ineligible study models that do not sufficiently generalize to the scenario described in the research question (the generalizability aspect of biological plausibility). These judgments may be informed by knowledge of biological mechanisms (the mechanistic aspect of biological plausibility).
3. Determine the effect of the exposure on the outcome in the studies included in the systematic review. This may require studies to be grouped by design characteristics.
4. Evaluate how well the included evidence generalizes to the situation described in the research question for each element of the PECO statement (generalizability aspect). These judgments may be informed by knowledge of biological mechanisms (mechanistic aspect).
5. If it is not certain that the evidence generalizes to the research question, rate down the evidence one or more times for indirectness depending on the level of this uncertainty.
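The eligibility and rating-down steps listed above can be sketched as data plus two small functions. This is a minimal illustration under a hypothetical encoding, not a GRADE tool: each PECO element of a study model is given one of the labels "none", "serious", "very serious", or "unacceptable". Two further rules are our assumptions: any "unacceptable" element excludes the study model at the eligibility step, and the number of levels rated down is set by the most serious concern across elements. GRADE itself asks reviewers for a single overall judgment of indirectness, so the aggregation rule stands in for that judgment only for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Certainty(IntEnum):
    """GRADE certainty levels, ordered so that subtraction moves down the scale."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


PECO_ELEMENTS = ("population", "exposure", "comparator", "outcome")

# Hypothetical mapping from a reviewer's concern label to the levels rated down.
LEVELS_FOR_CONCERN = {"none": 0, "serious": 1, "very serious": 2}


@dataclass
class StudyModel:
    name: str
    # PECO element -> concern label; elements left out are treated as "none".
    concerns: dict = field(default_factory=dict)


def is_eligible(study: StudyModel) -> bool:
    """Eligibility step: exclude study models whose indirectness is unacceptable."""
    return "unacceptable" not in study.concerns.values()


def rate_for_indirectness(start: Certainty, study: StudyModel) -> Certainty:
    """Rating-down step: lower certainty by the most serious concern, never below VERY_LOW."""
    if not is_eligible(study):
        raise ValueError(f"{study.name} should have been excluded at the eligibility step")
    levels = max(LEVELS_FOR_CONCERN[study.concerns.get(e, "none")] for e in PECO_ELEMENTS)
    return Certainty(max(Certainty.VERY_LOW, start - levels))


# Hypothetical study models, loosely echoing the kinds of surrogate discussed in the text.
analog_study = StudyModel("analog chemical", {"exposure": "very serious"})
rodent_study = StudyModel("rodent bioassay", {"population": "serious"})
excluded = StudyModel("unrelated model", {"population": "unacceptable"})

print(rate_for_indirectness(Certainty.HIGH, rodent_study).name)  # MODERATE
print(rate_for_indirectness(Certainty.HIGH, analog_study).name)  # LOW
print(is_eligible(excluded))  # False
```

Keeping per-element labels, rather than a single number, makes it visible which PECO element drives the downgrade, which is the kind of transparency the study-level instruments discussed below would need to provide.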
8.2.4. Research requirements: judging indirectness at the level of individual studies
The 10 examples in this manuscript show that judgments of external validity are complex and potentially need to be made across multiple related domains of population, exposure, comparator, outcome, and subdomains thereof. Instruments which would facilitate transparent, consistent, and accurate judgments across these domains are not yet available for study-level judgments of indirectness in environmental health and should be developed.
Answering questions about biological mechanisms draws on a wide variety of information about the absorption, distribution, metabolism, and excretion (“ADME”) of chemical substances, knowledge of mechanisms by which chemicals cause outcomes in both target and observed populations, information about interaction between chemicals and target sites, and the extent to which biomarkers of disease are predictive of clinical outcomes, among many other issues. If mechanistic knowledge is informative of judgments of external validity, and therefore of the indirectness domain in GRADE, it follows that we need to develop methods for assessing certainty in mechanistic knowledge.
Assessing certainty in biological mechanisms would be an important and interesting extension of the GRADE indirectness domain. Unlike questions about associations, which take the form “is X associated with Y?”, questions about mechanisms take the form “how does X cause Y?” Answering this form of question involves describing sequences of biological events, each of which is associated with the next. In principle, event–event associations should be approachable in the same way as exposure–outcome associations and therefore be amenable to the GRADE approach. A particular challenge we foresee is handling the sheer volume of data involved in systematically assessing multiple associated biological events if very indirect evidence is permitted to enter the assessment [45].
We note that developments in the Adverse Outcome Pathway framework may be informative for operationalizing the assessment of certainty in biological mechanisms and interpreting indirectness of evidence in systematic reviews [46]. Alternatively, the Key Characteristics framework may also provide a structured approach to assessing indirectness via similarity of biological mechanisms [47,48].
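The idea that a mechanism is a sequence of events, whose event–event associations could each be rated like an exposure–outcome association, can be sketched as follows. Borrowing the Adverse Outcome Pathway vocabulary of key events and the relationships between them, the sketch stores one certainty rating per adjacent pair of events. The rule that certainty in the whole pathway can be no higher than its least certain link is our illustrative assumption, not an established GRADE or AOP procedure, and the event names are placeholders.

```python
from enum import IntEnum


class Certainty(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def pathway_certainty(events: list, link_certainty: dict) -> Certainty:
    """Certainty in a mechanism described as an ordered chain of biological events.

    events: ordered event names, from the initiating event to the health outcome.
    link_certainty: maps (upstream, downstream) pairs to the certainty of that association.
    Assumption (illustrative only): the chain is no more certain than its weakest link.
    """
    if len(events) < 2:
        raise ValueError("a pathway needs at least two events")
    links = list(zip(events, events[1:]))
    unrated = [link for link in links if link not in link_certainty]
    if unrated:
        raise KeyError(f"no certainty rating for links: {unrated}")
    return min(link_certainty[link] for link in links)


# Placeholder pathway in the style of Fig. 5: receptor activation leading to a health outcome.
events = ["receptor activation", "intermediate event A", "intermediate event B", "health outcome"]
ratings = {
    ("receptor activation", "intermediate event A"): Certainty.HIGH,
    ("intermediate event A", "intermediate event B"): Certainty.LOW,
    ("intermediate event B", "health outcome"): Certainty.MODERATE,
}
print(pathway_certainty(events, ratings).name)  # LOW
print(len(events) - 1, "event-event associations to assess")
```

Even this toy chain shows the volume problem noted above: every added event adds another association whose body of evidence would need its own systematic assessment.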
The 10 examples discussed above give us indications of some of the considerations that may reduce concerns about the indirectness of a study included in a systematic review. These are outlined in Table 3 and illustrated, where feasible, in Fig. 5. Although these are only suggestive selections from the examples we have used in this manuscript, they do illustrate how much of this discussion is already familiar in toxicology and environmental health. This experience should provide a robust platform for further research, which should draw on the expertise of both the GRADE and environmental health communities.
Fig. 5.

Illustrations of the potential influencing factors in judging biological plausibility or external validity of study surrogates, as suggested by the examples in this manuscript. Circles represent individual events in a biological pathway by which activation of a receptor (green) results in a health outcome (orange). Eyes indicate what surrogate is being observed in an experimental model in lieu of the target of concern (dashed outlines).
Finally, we note that sufficient biological knowledge to permit high-certainty judgments of mechanism and external validity, and thus to avoid rating down for indirectness (or, conversely, to be certain that evidence from a surrogate is not relevant), is rare. Absent explanations of mechanism, evidence would either (a) be excluded from a systematic review because there is no theoretical route (apart from a presumption of relevance) to considering it eligible, or (b) be included, but with unclear external validity, higher indirectness as a consequence, and lower certainty overall. In these cases of low certainty due to unclear external validity of the included studies, significant mechanistic research may be required before it is possible to determine whether one experimental model is more externally valid than another. Currently, in regulatory circumstances where making decisions in the face of uncertainty is important, and mechanistic evidence to support judgments of external validity in a health assessment is limited, the external validity of a choice of surrogate is often assumed unless there is compelling evidence to the contrary [14].
8.3. Limitations
The attentive reader of our source material will notice that the concept of biological plausibility is rarely clearly applied, even when it appears that it is being discussed. This phenomenon has been observed by other researchers [49]. We have therefore had to impute the concept of biological plausibility to some of our examples (particularly those on Alzheimer’s disease, bisphenols, and neurodevelopment) based on surrounding literature and our general understanding of how discussion of biological plausibility is conducted. We believe our imputation to be consistent with use of the concept and intent of the source material, and, in any case, it is not necessary for the specific term “biological plausibility” to have been used for the concept to have been applied. Although a greater number of direct examples could be gathered from a systematic survey of the use of the concept of biological plausibility in the literature, we do not expect that they would invalidate our argument.
9. Conclusion
We asked what sort of “biomedical,” “biological,” or “epidemiological” knowledge may influence certainty in the evidence of a systematic review of an exposure–outcome relationship. Our answer is knowledge of biological mechanisms, which informs judgments of the indirectness of a body of evidence constituted from studies of surrogates. We also set out to determine whether biological plausibility, when applied as a concept relating to certainty in the evidence for the findings of a systematic review, is accommodated by the GRADE domain of indirectness. We have argued that it is, although its full operationalization will require additional study.
In answering these questions, we have elucidated Bradford Hill’s proposition that establishing biological plausibility is helpful but not always necessary for a causal claim [8]: “It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand.” We have shown that biological plausibility is indeed not necessary for determining that an exposure causes an outcome, so long as the direct evidence for the exposure–outcome relationship is sufficiently certain. The presence of “biological plausibility” can nonetheless be “helpful” to establishing causation. This happens when sufficient information about mechanisms is available to characterize the generalizability of a surrogate, thereby supporting judgments about indirectness of the evidence and potentially permitting a more certain answer to the review question than would be yielded by inclusion of the most direct evidence alone.
Our analysis also broadens the scope of discussion in GRADE of study surrogates. Currently, GRADE guidance only explicitly addresses surrogate outcomes [26]: “Guideline developers should consider surrogate outcomes only when high-quality evidence regarding important outcomes is lacking. When such evidence is lacking […] they should specify the important outcomes and the associated surrogates they must use as substitutes. […] the necessity to substitute the surrogate may ultimately lead to rating down the quality of the evidence because of indirectness.” Here, we have extended discussion of eligibility and potential grading of surrogate outcomes to also cover surrogate populations and surrogate exposures.
We have argued that judgments of biological plausibility, at least in their application to determining the relevance of evidence to answering a focused research question, are accommodated under the operational procedures of systematic review and the GRADE domain of indirectness. Although vocabulary and processes may differ, we feel confident that there is nothing in biological plausibility that, for this context, is “missing” from GRADE. What is needed, however, are means to operationalize the assessment of the indirectness of included studies and certainty in evidence for biological mechanisms, the outputs of which can be used in determining the extent to which evidence should be rated down for indirectness. Such methods would help bring shape to the amorphous nature of mechanistic evidence and aid in its exploitation in environmental health systematic reviews.
As a final point, we observe a clear parallel between the clinical and public health contexts in which GRADE was developed and the environmental health context in which it is here being applied. The difference is that in clinical contexts, GRADE is nearly always used to evaluate human evidence from treatments trialed in people, far downstream from the preclinical in vitro and animal research used to justify conducting a human trial. Although treatments are advanced to human trials on the basis of preclinical studies, that evidence is often many years old by the time a systematic review is conducted, by which point human evidence is available and preclinical evidence is therefore not needed. In contrast, in many environmental health contexts, in vitro and in vivo research constitutes most of the evidence being dealt with. The fundamental principles for systematically reviewing this evidence are no different from those for systematic reviews of human evidence; it is just that human evidence is more limited and mechanisms are often not known. In the context of environmental health, GRADE is, therefore, being applied to a more indirect evidence base, which is often focused on events further upstream than those dealt with by most healthcare systematic reviews.
What is new?
Key findings
We analyzed the meaning and application of the concept of “biological plausibility” in the context of systematic reviews (SRs) of environmental health questions. We determined it to consist of two principal aspects: a “generalizability aspect” and a “mechanistic aspect.” The “generalizability aspect” concerns the validity of inferences from experimental models included in a SR to the target question with which the SR is concerned. This is equivalent to the assessment of the external validity or indirectness of evidence. The “mechanistic aspect” concerns certainty in knowledge of biological mechanisms. This knowledge is informative of judgments of external validity or indirectness of the evidence.
What this adds to what was known?
“Biological plausibility” is a concept frequently referred to in environmental and public health when researchers are evaluating how confident they are in the results and inferences of a study or evidence review. Biological plausibility is not, however, a domain of the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) certainty of evidence framework. Our analysis shows how biological plausibility is accommodated within GRADE.
What is the implication and what should change now?
Although both the “generalizability aspect” and the “mechanistic aspect” can be accommodated under the indirectness domain of the GRADE certainty of evidence framework, further research is needed to determine how to use knowledge of biological mechanisms in the assessment of indirectness of the evidence in systematic reviews.
Acknowledgments
The authors would like to thank the GRADE Environmental Health Project Group and GRADE Working Group for their contributions to this manuscript, and the Evidence-based Toxicology Collaboration (EBTC) at Johns Hopkins Bloomberg School of Public Health for providing funding to cover the time of P.W., K.T., and S.H. in working on this manuscript. We would like to thank Dr Andrew Kraft and Dr Michelle Angrish for their technical review of the manuscript, and Dr Daniele Wikoff (ToxStrategies) for detailed discussion of the ideas and concepts we present here. The authors would also like to thank the European Food Safety Authority and EBTC for organizing the Scientific Colloquium, and the participants who contributed to discussions therein, which gave genesis to the concept of this manuscript [9].
Abbreviations:
- GRADE
Grading of Recommendations Assessment, Development and Evaluation
Footnotes
Conflict of interest statement: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
This article is co-published in Environment International and Journal of Clinical Epidemiology.
References
- [1].Rhomberg L Hypothesis-based weight of evidence: an approach to assessing causation and its application to regulatory toxicology. Risk Anal 2015;35(6):1114–24. [DOI] [PubMed] [Google Scholar]
- [2].Bilotta GS, Milner AM, Boyd I. On the use of systematic reviews to inform environmental policies. Environ Sci Pol 2014;42:67–77. [Google Scholar]
- [3].Hoffmann S, de Vries RBM, Stephens ML, Beck NB, Dirven H, Fowle rjR, et al. A primer on systematic reviews in toxicology. Arch Toxicol 2017;91(7):2551–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Morgan RL, Beverly B, Ghersi D, Schüunemann HJ, Rooney AA, Whaley P, et al. GRADE: assessing the quality of evidence in environmental and occupational health. Environ Int 2016;92-93:1–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Sheehan MC, Lam J. Use of systematic review and meta-analysis in environmental health Epidemiology: a systematic review and comparison with guidelines. Curr Environ Health Rep 2015;2(3):272–83 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Whaley P, Halsall C, Agerstrand M, Aiassa E, Benford D, Bilotta G, et al. Implementing systematic review techniques in chemical risk assessment: challenges, opportunities and recommendations. Environ Int 2016;92-93:556–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Lewis SJ, Gardner M, Higgins J, Holly JMP, Gaunt TR, Perks CM, et al. Developing the WCRF International/University of Bristol methodology for identifying and carrying out systematic reviews of mechanisms of exposure–cancer associations. Cancer Epidemiol 2017;26(11):1667–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Hill AB. The Environment and Disease: Association or Causation? Proc R Soc Med 1965;58:295–300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].European Food Safety AuthorityIn: ‘EFSA scientific Colloquium 23 – joint European Food safety authority and evidence-based toxicology collaboration Colloquium evidence integration in risk assessment: the science of combining apples and oranges 25–26 October 2017 Lisbon, Portugal’, EFSA Supporting Publications, 15 2018. 10.2903/sp.efsa.2018.EN-1396. [DOI] [Google Scholar]
- [10].Wikipedia contributors.Biological plausibility, wikipedia, the free encyclopedia. 2014. Available at https://en.wikipedia.org/w/index.php?title=Biological_plausibility&oldid=614374435. Accessed October 16, 2019.
- [11].Hardy A, Benford D, Halldorsson T, Jeger MJ, Knutsen HK, More S, et al. Guidance on the use of the weight of evidence approach in scientific assessments. EFSA J 2017;15(8):e04971. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].International Epidemiological Association. A Dictionary of Epidemiology. Oxford University Press. 2001. Available at https://play.google.com/store/books/details?id=nQmhQgAACAAJ. Accessed January 10, 2022. [Google Scholar]
- [13].OECD. ‘Users’ Handbook supplement to the Guidance Document for developing and assessing Adverse Outcome Pathways’, Env/Jm/Mono(2016) 12, (OECD Series on Adverse Outcome Pathways1; ), p. 63 2016. 10.1787/5jlv1m9d1g32-en. [DOI] [Google Scholar]
- [14]. US Environmental Protection Agency. Guidelines for carcinogen risk assessment. 2005. Available at https://www.epa.gov/risk/guidelines-carcinogen-risk-assessment. Accessed January 10, 2022.
- [15]. Morgan RL, Thayer KA, Bero L, Bruce N, Falck-Ytter Y, Ghersi D, et al. GRADE guidelines for environmental and occupational health: a new series of articles in Environment International. Environ Int 2019;128:11–2.
- [16]. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.
- [17]. Schünemann HJ, Cuello C, Akl EA, Mustafa RA, Meerpohl JJ, Thayer K, et al. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. J Clin Epidemiol 2019;111:105–14.
- [18]. Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, et al. The GRADE Working Group clarifies the construct of certainty of evidence. J Clin Epidemiol 2017;87:4–13.
- [19]. Schünemann H, Hill S, Guyatt G, Akl EA, Ahmed F. The GRADE approach and Bradford Hill’s criteria for causation. J Epidemiol Community Health 2011;65(5):392–5.
- [20]. Morgan RL, Thayer KA, Whaley P, Schünemann HJ. Identifying the PECO: a framework for formulating good questions to explore the association of environmental and other exposures with health outcomes. Environ Int 2018:1–5.
- [21]. Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions version 6.0 (updated July 2019). Cochrane. Available at www.training.cochrane.org/handbook. Accessed January 10, 2022.
- [22]. Schünemann HJ, Tugwell P, Reeves BC, Akl EA, Santesso N, Spencer FA, et al. Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions. Res Synth Methods 2013;4(1):49–62.
- [23]. Doll R, Peto R, Boreham J, Sutherland I. Mortality from cancer in relation to smoking: 50 years observations on British doctors. Br J Cancer 2005;92:426–9.
- [24]. Pope CA 3rd, Burnett RT, Turner MC, Cohen A, Krewski D, Jerrett M, et al. Lung cancer and cardiovascular disease mortality associated with ambient air pollution and cigarette smoke: shape of the exposure-response relationships. Environ Health Perspect 2011;119(11):1616–21.
- [25]. Whaley P, Aiassa E, Beausoleil C, Beronius A, Bilotta G, Boobis A, et al. Recommendations for the conduct of systematic reviews in toxicology and environmental health research (COSTER). Environ Int 2020;143:105926.
- [26]. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence–indirectness. J Clin Epidemiol 2011;64:1303–10.
- [27]. Papameletiou D, Hoffmann S, Johanson G, Klein CL, Rietjens I. ‘SCOEL/REC/300 2-Nitropropane - Recommendation from the Scientific Committee on Occupational Exposure Limits’. Directorate-General for Employment, Social Affairs and Inclusion (European Commission), Scientific Committee on Occupational Exposure Limits; 2017.
- [28]. US National Research Council. Review of EPA’s Integrated Risk Information System (IRIS) Process. The National Academies Press. 2014. Available at http://www.nap.edu/openbook.php?record_id=18764. Accessed January 10, 2022.
- [29]. Johnson PI, Koustas E, Vesterinen HM, Sutton P, Atchley DS, Kim AN, et al. Application of the Navigation Guide systematic review methodology to the evidence for developmental and reproductive toxicity of triclosan. Environ Int 2016;92-93:716–28.
- [30]. Thompson W, Russell G, Baragwanath G, Matthews J, Vaidya B, Thompson-Coon J. Maternal thyroid hormone insufficiency during pregnancy and risk of neurodevelopmental disorders in offspring: a systematic review and meta-analysis. Clin Endocrinol 2018;88(4):575–84.
- [31]. McGhee DJM, Ritchie CW, Thompson PA, Wright DE, Zajicek JP, Counsell CE, et al. A systematic review of biomarkers for disease progression in Alzheimer’s disease. PLoS One 2014;9:e88854.
- [32]. Downey A, Stroud C, Landis S, Leshner AI. Communicating with the public about interventions to prevent cognitive decline and dementia. In: Preventing Cognitive Decline and Dementia: A Way Forward. Washington (DC), USA: National Academies Press (US); 2017.
- [33]. Cohen Hubal EA, Frank JJ, Nachman R, Angrish M, Deziel NC, Fry M, et al. Advancing systematic-review methodology in exposure science for environmental health decision making. J Expo Sci Environ Epidemiol 2020;30(6):906–16.
- [34]. European Food Safety Authority. Scientific opinion on the risks to public health related to the presence of bisphenol A (BPA) in foodstuffs: Part II – toxicological assessment and risk characterisation. EFSA J 2015;13:3978. doi:10.2903/j.efsa.2015.3978.
- [35]. Taylor JA, Welshons WV, Vom Saal FS. No effect of route of exposure (oral; subcutaneous injection) on plasma bisphenol A throughout 24 h after administration in neonatal female mice. Reprod Toxicol 2008;25:169–76.
- [36]. Meek MEB, Barton HA, Bessems JG, Lipscomb JC, Krishnan K. Case study illustrating the WHO IPCS guidance on characterization and application of physiologically based pharmacokinetic models in risk assessment. Regul Toxicol Pharmacol 2013;66(1):116–29.
- [37]. US Environmental Protection Agency. A review of the reference dose and reference concentration processes. Report Number EPA/630/P-02/002F. Washington, DC. 2002. Available at https://www.epa.gov/sites/production/files/2014-12/documents/rfd-final.pdf. Accessed January 10, 2022.
- [38]. Slikker W Jr, Andersen ME, Bogdanffy MS, Bus JS, Cohen SD, Conolly RB, et al. Dose-dependent transitions in mechanisms of toxicity: case studies. Toxicol Appl Pharmacol 2004;201:226–94.
- [39]. Lagarde F, Beausoleil C, Belcher SM, Belzunces LP, Emond C, Guerbet M, et al. Non-monotonic dose-response relationships and endocrine disruptors: a qualitative method of assessment. Environ Health 2015;14:13.
- [40]. Marty MS, Borgert C, Coady K, Green R, Levine SL, Mihaich E, et al. Distinguishing between endocrine disruption and non-specific effects on endocrine systems. Regul Toxicol Pharmacol 2018;99:142–58.
- [41]. Crump KS. The linearized multistage model and the future of quantitative risk assessment. Hum Exp Toxicol 1996;15(10):787–98.
- [42]. UK Committee on Toxicity (COT). Statement on phosphate-based flame retardants and the potential for neurodevelopmental toxicity. 2019. Available at https://cot.food.gov.uk/cotstatements/cotstatementsyrs/cot-statements-2019/cot-phosphate-based-flame-retardants-statement. Accessed January 10, 2022.
- [43]. Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. Characterization of estrogenic and androgenic activities for bisphenol A-like chemicals (BPs): in vitro estrogen and androgen receptors transcriptional activation, gene regulation, and binding profiles. Toxicol Sci 2019;172:23–37.
- [44]. Cousins IT, DeWitt JC, Glüge J, Goldenman G, Herzke D, Lohmann R, et al. Strategies for grouping per- and polyfluoroalkyl substances (PFAS) to protect human and environmental health. Environ Sci Process Impacts 2020;22(7):1444–60.
- [45]. Whaley P, Edwards SW, Kraft A, Nyhan K, Shapiro A, Watford S, et al. Knowledge organization systems for systematic chemical assessments. Environ Health Perspect 2020;128(12):125001.
- [46]. de Vries RBM, Angrish M, Browne P, Brozek J, Rooney AA, Wikoff DS, et al. Applying evidence-based methods to the development and use of adverse outcome pathways. ALTEX 2021;38(2):336–47.
- [47]. Guyton KZ, Rusyn I, Chiu WA, Corpet DE, van den Berg M, Ross MK, et al. Application of the key characteristics of carcinogens in cancer hazard identification. Carcinogenesis 2018;39(4):614–22.
- [48]. Smith MT, Guyton KZ, Gibbons CF, Fritz JM, Portier CJ, Rusyn I, et al. Key characteristics of carcinogens as a basis for organizing data on mechanisms of carcinogenesis. Environ Health Perspect 2016;124:713–21.
- [49]. Dailey J, Rosman L, Silbergeld EK. Evaluating biological plausibility in supporting evidence for action through systematic reviews in public health. Public Health 2018;165:48–57.
