Author manuscript; available in PMC: 2021 Nov 1.
Published in final edited form as: Epidemiology. 2020 Nov;31(6):755–757. doi: 10.1097/EDE.0000000000001254

In defense of the weight-of-evidence approach to literature review in the Integrated Science Assessment

Jennifer Richmond-Bryant 1
PMCID: PMC7541567  NIHMSID: NIHMS1625483  PMID: 32897910

The weight-of-evidence approach to science-based air quality policy

The Clean Air Act requires the United States Environmental Protection Agency (EPA) Administrator to issue “quality criteria for an air pollutant [that] shall accurately reflect the latest scientific knowledge useful in indicating the kind and extent of all identifiable effects on public health or welfare” as the basis for the review of the National Ambient Air Quality Standards (NAAQS)1. The NAAQS review includes a science assessment of the health and ecologic effects of exposure to a criteria pollutant (particulate matter, O3, NO2, SO2, CO, or Pb), followed by a risk assessment and a policy assessment containing recommendations for retaining or changing the NAAQS. Ultimately, the EPA Administrator makes a decision after reviewing the Integrated Science Assessment (ISA) and these recommendations.

EPA produces an ISA to evaluate the scientific literature and make determinations about the causal nature of relationships between criteria pollutant exposure and health and welfare effects2 (eFigure 1). The ISA takes a weight-of-evidence approach, in which it considers the body of scientific evidence spanning atmospheric chemistry, exposure assessment, dosimetry, health effects, and welfare effects related to a given criteria pollutant to determine if the literature taken together provides evidence of causality. Health effect determinations are made by considering evidence from controlled human exposure, animal toxicology, and epidemiology studies together. For example, the ISA can consider whether 1) exposure to a criteria pollutant is followed by a physiologic response in humans (controlled human exposure), 2) biologic mechanisms exist through which an exposure may cause a health effect (animal toxicology), and 3) populations exposed to ambient concentrations experience health effects (epidemiology). Key to EPA’s weight-of-evidence approach are certain Sir Bradford Hill aspects, including 1) consistency: agreement among studies about the existence of an effect within a given study type, 2) coherence: evidence of an effect among multiple lines of evidence, and 3) biologic plausibility: evidence of a mechanism by which the exposure may cause a health outcome3. When multiple studies from multiple disciplines mostly point towards the same conclusion, chance, confounding, and other biases have likely been reduced so that a conclusion of a causal relationship is supported. This approach has been lauded by past Clean Air Scientific Advisory Committees4 (CASAC; the scientific advisory committee that provides external review of the ISAs as mandated by the Clean Air Act) and by the Administrative Conference of the United States5 (a federal agency that convenes external experts to recommend efficiencies in implementing federal regulations and programs). 
However, the Trump Administration EPA is attempting to disqualify many epidemiologic studies from consideration in the NAAQS review process by changing the way in which the peer-reviewed literature is considered in the ISA.

An important part of the ISA’s science review process is evaluation of the quality of individual studies comprising the body of evidence informing the causality determinations. For the epidemiologic literature, the ISA considers for each study whether:

  • Concentrations observed in the study are at or near ambient concentrations;

  • Models control for confounding by copollutants and other factors;

  • Testing has been performed for potential effect modification;

  • Health endpoints have been included in the study design;

  • The study presents new information pertinent to populations, groups, or lifestages; and,

  • Methodologic issues such as lagged effects and thresholds have been included in the study’s design2.

Within this process, EPA recognizes the insufficiency of methods used in epidemiologic studies to address copollutant confounding2,6. Because each individual epidemiologic study controls for different potential confounders, the Hill approach to evaluating consistency across epidemiologic studies provides an indication of whether observed effects are robust to confounding2,3. The EPA qualitatively examined study quality criteria (study design, study population, exposure assessment, outcome assessment, confounding, statistical analysis) to identify strengths and threats to internal validity or risk of bias of studies that were included in the recently published 2020 ISA for Ozone7,8. Risk of bias did not necessarily disqualify a study from inclusion in this ISA8 if it was informative9. However, wording found in the Ozone ISA Process Appendix8 stating, “references that did not pass the study quality review, and deemed critically deficient, were excluded from the ISA,” raises concern that individuals within the Agency may be pushing for the use of study quality evaluation to reduce the evidence base rather than contextualize it, as in past ISAs2 (eFigure 1).

Narrowing the evidence base through “study quality criteria”

Dr. Tony Cox, the current Administrator-appointed CASAC chair and industry consultant with clients including the American Petroleum Institute and Philip Morris International, has focused his review on whether individual studies demonstrate causality. Unlike the ISA’s weight-of-evidence approach to determine if the body of literature supports a conclusion that criteria pollutant exposure causes a health effect, Cox proposed limiting the considered epidemiologic literature to “manipulative causation” studies10. These studies require that some intervention be conducted to change air pollutant concentrations, with all other factors kept constant, to demonstrate a change in the health effect11. Accordingly, he has called for use of systematic review study quality criteria in the ISAs to substantiate excluding studies based on the following considerations10:

  • Study does not control for potential confounders or selection bias;

  • Study estimates exposure by fixed-site monitor or modeled concentrations in lieu of using true personal exposure;

  • Study design does not allow for testing of threats to internal validity;

  • Study design does not allow for evaluation of external validity;

  • Study does not perform sensitivity testing; or

  • Study design only addresses association rather than permitting “valid inferences about (manipulative) causation”10.

Cox’s focus on study elimination suits the systematic review process of addressing a narrow question about observed health effects following an intervention12 through his manipulative causation test. This differs from the weight-of-evidence approach used in the ISA, as framed by the Clean Air Act, to evaluate all of the latest scientific evidence together to uncover all known health and welfare effects of the criteria pollutant.

Goodman et al.13 proposed a quality score for studies considered for the ISA to quantitatively rate peer-reviewed studies based on a set of criteria similar to those listed by Cox10, for short-term ozone exposure and asthma severity (eTable 1). In this system, an attribute deemed positive by Goodman et al.13 would receive a score of “+1”, and an attribute deemed negative by Goodman et al.13 would receive a score of “−1”. A positive score, summed across criteria, would result in designation as a Tier I study, while a negative score would result in designation as a Tier II study. In this analysis, four out of 19 panel studies, 10 out of 28 time-series studies, and one out of eight case-crossover studies (27% of all studies) would be downgraded, or excluded by Cox’s approach10. Similar approaches by Goodman’s group have been published with respect to the relationships between particulate matter exposure and lung cancer biomarkers14, long-term ozone exposure and cardiovascular effects15, and short-term ozone exposure and cardiovascular effects16, as well as in a broad evaluation of the framework used by the EPA to evaluate the science used for the NAAQS review17.
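The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as summarized in this commentary, not the implementation of Goodman et al.13: the attribute names and the example study are hypothetical, and the handling of a total of exactly zero is left unspecified because the text does not address it. The last lines check the reported 27% against the counts given above.

```python
from typing import Dict


def quality_tier(attribute_scores: Dict[str, int]) -> str:
    """Sum +1/-1 attribute scores and map the total to a tier."""
    for name, score in attribute_scores.items():
        if score not in (1, -1):
            raise ValueError(f"{name}: score must be +1 or -1, got {score}")
    total = sum(attribute_scores.values())
    if total > 0:
        return "Tier I"
    if total < 0:
        return "Tier II"
    # The commentary does not say how a total of zero is treated.
    return "unspecified (total of zero)"


# Hypothetical study; attribute names are invented for illustration only.
example_study = {
    "confounder_control": 1,
    "personal_exposure": -1,
    "sensitivity_analysis": -1,
}
tier = quality_tier(example_study)

# Counts of downgraded studies and study totals as reported in the text.
downgraded = {"panel": 4, "time-series": 10, "case-crossover": 1}
totals = {"panel": 19, "time-series": 28, "case-crossover": 8}
share = sum(downgraded.values()) / sum(totals.values())

print(tier, f"{share:.0%}")
```

A single attribute flipping sign can move a study across the tier boundary, which is why the choice of attributes matters so much in a scheme of this kind.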

Evaluation of quality scores in systematic reviews has demonstrated that different scoring systems can produce drastically different results for the same set of studies, making study rankings arbitrary and therefore unsystematic18–20. Armijo-Olivo et al.18 compared two study quality metrics and found that the same conclusions about study quality were reached for only two out of 20 studies. Whiting et al.20 developed five weighting scores and found that while the scores mostly agreed for highest and lowest quality scoring studies, results among the studies ranking in the middle were vastly different. Rooney et al.21 performed a comparison of five different qualitative study quality evaluation methods used in systematic review and found that, while all of the methods addressed issues of selection, exposure, attrition, confounding, outcome assessment, and publication bias, conversion of these elements to scores would be inadvisable due to uncertainties about each domain and how they are weighted. Savitz et al.9 also recognized this limitation of study quality scores and instead recommended a qualitative approach to investigate the risk of bias and its potential magnitude and direction of effect through critical analysis of relevant peer-reviewed literature for each potential source of bias. Because the ISA is a complex, cross-disciplinary synthesis of the literature with a mandate under the Clean Air Act to be comprehensive, study weighting is inappropriate and exclusion risks loss of evidence that should be considered11. Qualitative analysis in line with the approach of Savitz et al.9 is therefore more amenable to the critical assessment of literature within the ISA.
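The sensitivity of rankings to the weighting scheme can be shown with a toy example. The sketch below uses three hypothetical studies with invented domain ratings and two invented weighting schemes; none of these values come from the cited comparisons18–21. It shows only that the same ratings can yield opposite rank orders once domains are weighted differently.

```python
from typing import Dict, List

# Hypothetical domain ratings (higher = judged better); values are made up.
ratings: Dict[str, Dict[str, int]] = {
    "study_A": {"confounding": 3, "exposure": 1, "outcome": 2},
    "study_B": {"confounding": 1, "exposure": 3, "outcome": 2},
    "study_C": {"confounding": 2, "exposure": 2, "outcome": 3},
}

# Two hypothetical weighting schemes that emphasize different domains.
scheme_1 = {"confounding": 3, "exposure": 1, "outcome": 1}
scheme_2 = {"confounding": 1, "exposure": 3, "outcome": 1}


def rank_studies(
    study_ratings: Dict[str, Dict[str, int]], weights: Dict[str, int]
) -> List[str]:
    """Return study names ordered from highest to lowest weighted score."""
    scores = {
        study: sum(weights[domain] * value for domain, value in domains.items())
        for study, domains in study_ratings.items()
    }
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda study: (-scores[study], study))


ranking_1 = rank_studies(ratings, scheme_1)
ranking_2 = rank_studies(ratings, scheme_2)
print(ranking_1)
print(ranking_2)
```

In this toy case the top-ranked study under one scheme is the bottom-ranked study under the other, although the underlying ratings are identical.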

Example: Trading exposure measurement error for reporting bias

The question about whether to discard studies due to a perceived study quality issue or simply to acknowledge their limitations can be examined for the exposure assessment domain. Dr. Sabine Lange, a CASAC charter member, wrote in her comments on the Particulate Matter ISA10:

“the systematic review guidelines for TSCA22 lists [sic] study quality criteria for epidemiologic studies (amongst others). They state as a criterion for deeming a study unacceptable (and therefore for removal from the review) ‘There is evidence of substantial exposure misclassification that would significantly alter results.’ This needs to be seriously considered for studies that use ambient monitors as surrogates for personal exposure.”

It is true that exposure assessments conducted for air pollution epidemiologic studies have limitations that may add bias and uncertainty to effect estimates. However, a recent review of the influence of exposure measurement error on effect estimates has shown that exposure measurement error usually biases effect estimates in the negative direction23. In other words, the presence of bias would not negate the observed effect and, in fact, would often cause an observed effect to be underestimated. Conversely, discarding epidemiologic studies from consideration in the ISA due to exposure measurement error could lead to reporting bias. This could result in loss of evidence of an association between the exposure and health effect from the ISA.
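The direction of this bias can be illustrated with a small simulation. The sketch below assumes the textbook classical, non-differential error model in a simple linear regression, with all parameter values invented for illustration; it does not reproduce the analyses in the cited review23, and other error structures (for example, Berkson-type error) behave differently. Under these assumptions, regressing the outcome on the error-prone exposure gives a slope smaller than the slope obtained with the true exposure.

```python
import random
from typing import List


def ols_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least squares slope of y on x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000
true_slope = 0.5  # assumed effect per unit of true exposure

# True exposure, outcome driven by true exposure, and an error-prone
# surrogate (true exposure plus independent noise: classical error).
true_exposure = [rng.gauss(10.0, 2.0) for _ in range(n)]
outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_exposure]
measured_exposure = [x + rng.gauss(0.0, 2.0) for x in true_exposure]

slope_true = ols_slope(true_exposure, outcome)
slope_measured = ols_slope(measured_exposure, outcome)

print(f"slope using true exposure:     {slope_true:.3f}")
print(f"slope using measured exposure: {slope_measured:.3f}")
```

With equal variance assumed for the true exposure and the error term, the expected attenuation factor is one half, so the surrogate-based slope should come out near half of the true slope. The effect is underestimated rather than created.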

Conclusions

Using systematic review criteria such as individual study quality ratings for the ISA may not improve the validity of causality determinations. This practice could, in fact, introduce uncertainty and bias into the process by excluding informative scientific studies for the purported reason of minimizing bias. In contrast, expert evaluation of the strengths and limitations of studies through the weight-of-evidence approach reduces bias through the triangulation process24. Without thoughtful consideration of these concerns, adopting a systematic review study rating methodology is likely to create an ill-conceived set of criteria that will make the ISAs less, not more, defensible and could result in weakened NAAQS.

Supplementary Material

Supplemental Digital Content

Acknowledgments

The author would like to thank Dr. Joel Kaufmann for his advice on developing this commentary.

Sources of financial support: The author is supported in part by the Superfund Basic Research Program (P42 ES013648). The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institute of Environmental Health Sciences or the National Institutes of Health.

Biosketch

Dr. Jennifer Richmond-Bryant was a staff scientist with the U.S. EPA National Center for Environmental Assessment from 2008–2019, as the exposure assessment lead on the team writing the Integrated Science Assessment. She is currently an Associate Professor of the Practice in the Department of Forestry and Environmental Resources at North Carolina State University. Her research areas include assessing human exposure to ambient air pollution, transport and dispersion of air pollutants, and disparities in exposures among population groups.

Footnotes

Conflict of interest: The author is a former employee of the Environmental Protection Agency but has no current financial ties to the organization or to any organization participating in review of the Integrated Science Assessment.

Description of the process by which someone else could obtain data and computing code: n/a

References

  • 1. 42 U.S.C. 7408.
  • 2. U.S. EPA. Preamble to the Integrated Science Assessments. Research Triangle Park, NC: Office of Research and Development; 2015.
  • 3. Hill AB. The environment and disease: Association or causation? Proc R Soc Med. 1965;58:295–300.
  • 4. CASAC. CASAC Review of the EPA’s Integrated Science Assessment for Oxides of Nitrogen - Health Criteria (Second External Review Draft). Washington, DC: Office of Research and Development; 2015.
  • 5. Wagner W. Science in Regulation: A Study of Agency Decisionmaking Approaches. Washington, DC: Administrative Conference of the United States; 2013.
  • 6. Rothman KJ, Greenland S. Modern Epidemiology, 2nd Edition. Philadelphia: Lippincott Williams & Wilkins; 1998.
  • 7. U.S. EPA. Draft Ozone ISA: Study Quality (2019). 2019 [cited January 10, 2020]. Available from: https://hawcprd.epa.gov/assessment/100500031/.
  • 8. U.S. EPA. Integrated Science Assessment for Ozone and Related Photochemical Oxidants. Research Triangle Park, NC: Office of Research and Development; 2020. EPA/600/R-20/012.
  • 9. Savitz DA, Wellenius GA, Trikalinos TA. The problem with mechanistic risk of bias assessments in evidence synthesis of observational studies and a practical alternative: Assessing the impact of specific sources of potential bias. Am J Epidemiol. 2019;188:1581–1585.
  • 10. Cox LA. CASAC Review of the EPA’s Integrated Science Assessment for Particulate Matter (External Review Draft - October 2018). Washington, DC: Clean Air Scientific Advisory Committee; 2019.
  • 11. Campaner R. Mechanistic causality and counterfactual-manipulative causality: recent insights from philosophy of science. J Epidemiol Community Health. 2011;65:1070–1074.
  • 12. Boell SK, Cecez-Kecmanovic D. On being ‘systematic’ in literature reviews in IS. J Inf Technol. 2015;30:161–173.
  • 13. Goodman JE, Zu K, Loftus CT, Lynch HN, Prueitt RL, Mohar I, Pacheco Shubin S, Sax SN. Short-term ozone exposure and asthma severity: Weight-of-evidence analysis. Environ Res. 2018;160:391–397.
  • 14. Lynch HN, Loftus CT, Cohen JM, Kerper LE, Kennedy EM, Goodman JE. Weight-of-evidence evaluation of associations between particulate matter exposure and biomarkers of lung cancer. Regul Toxicol Pharm. 2016;82:53–93.
  • 15. Prueitt RL, Lynch HN, Zu K, Sax SN, Venditti FJ, Goodman JE. Weight-of-evidence evaluation of long-term ozone exposure and cardiovascular effects. Crit Rev Toxicol. 2014;44:791–822.
  • 16. Goodman JE, Prueitt RL, Sax SN, Lynch HN, Zu K, Lemay JC, King JM, Venditti FJ. Weight-of-evidence evaluation of short-term ozone exposure and cardiovascular effects. Crit Rev Toxicol. 2014;44:725–790.
  • 17. Goodman JE, Prueitt RL, Sax SN, Bailey LA, Rhomberg LR. Evaluation of the causal framework used for setting National Ambient Air Quality Standards. Crit Rev Toxicol. 2013;43:829–849.
  • 18. Armijo-Olivo S, Stiles CR, Hagen NA, Blondo PD, Cummings GG. Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. J Eval Clin Pract. 2012;18:12–18.
  • 19. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282:1054–1060.
  • 20. Whiting P, Harbord R, Kleijnen J. No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol. 2005;5:19.
  • 21. Rooney AA, Cooper GS, Jahnke GD, Lam J, Morgan RL, Boyles AL, Ratcliffe JM, Kraft AD, Schunemann HJ, Schwingl P, Walker TD, Thayer KA, Lunn RM. How credible are the study results? Evaluating and applying internal validity tools to literature-based assessments of environmental health hazards. Environ Int. 2016;92–93:617–629.
  • 22. U.S. EPA. Application of Systematic Review in TSCA Risk Evaluations. Research Triangle Park, NC: Office of Chemical Safety and Pollution Prevention; 2018.
  • 23. Richmond-Bryant J, Long TC. Influence of exposure measurement error on results from epidemiologic studies of different designs. J Exp Sci Environ Epidemiol. 2019. doi: 10.1038/s41370-019-0164-z.
  • 24. Lawlor DA, Tilling K, Smith GD. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45:1866–1886. doi: 10.1093/ije/dyw314.
