Systematic evidence maps are increasingly used to develop chemical risk assessments. These maps can provide an overview of available studies and relevant study information to be used for various research objectives and applications. Environmental epidemiological studies that examine the impact of chemical exposures on various ‘omic profiles in human populations provide relevant mechanistic information and can be used for benchmark dose modeling to derive potential human health reference values.
To create a systematic evidence map of environmental epidemiological studies examining environmental contaminant exposures with ‘omics in order to characterize the extent of available studies for future research needs.
Systematic review methods were used to search and screen the literature and included the use of machine learning methods to facilitate screening studies. The Populations, Exposures, Comparators and Outcomes (PECO) criteria were developed to identify and screen relevant studies. Studies that met the PECO criteria after full-text review were summarized with information such as study population, study design, sample size, exposure measurement, and ‘omics analysis.
Over 10,000 studies were identified from scientific databases. Screening processes were used to identify 84 studies considered PECO-relevant after full-text review. Various contaminants (e.g., phthalates, benzene, arsenic) were investigated in epidemiological studies that used one or more of the four ‘omics of interest: epigenomics, transcriptomics, proteomics, and metabolomics. The epidemiological study designs used to explore single or integrated ‘omic research questions with contaminant exposures were cohort, controlled trial, cross-sectional, and case-control studies. An interactive web-based systematic evidence map was created to display more study-related information.
This systematic evidence map is a novel tool to visually characterize the available environmental epidemiological studies investigating contaminants and biological effects using ‘omics technology. It serves as a resource for investigators and supports a range of applications in chemical research and risk assessment.
1. Introduction
There is a vast array of chemicals and contaminants that humans are potentially exposed to, of which the majority lack the substantial toxicity data needed to perform comprehensive human health assessments. Traditional environmental epidemiology has aimed to characterize how individual exposures or mixtures of exposures are associated with one or several apical health outcomes; however, it is seldom sufficient to characterize all environmentally-associated biological or health outcomes (Kyrtopoulos, 2013). Technological advancements in ‘omics (e.g., genomics, transcriptomics, proteomics, and metabolomics) have resulted in improved capabilities of generating high-dimensional molecular data (e.g., hundreds to thousands of genes, methylations, proteins, and metabolites) that are informative about internalization of exposures and perturbations to physiological activities (Kyrtopoulos, 2013). Information from ‘omics analyses can be used as a complement to traditional environmental epidemiology and expand understanding of the impacts of chemicals on health and on disease etiology.
The field of human health risk assessment is utilizing practices of evidence mapping in order to systematically identify studies relevant to a given topic of interest (Bragge et al., 2011; Wolffe et al., 2019). A systematic evidence map may provide an overview of available scientific studies that can be used to identify data gaps in the topic of interest and provide relevant information such as the number of studies on the topic, study design, and study characteristics (Bragge et al., 2011; Miake-Lye et al., 2016; Wolffe et al., 2019). Moreover, gathering relevant toxicity and mechanistic data for legacy and emerging chemicals forms an increasingly vital part of risk assessment, and evidence mapping can be helpful for this task; advances in analytical techniques and scientific understanding continue to broaden the scope of available data beyond those of traditional in vivo or in vitro toxicity testing (Wolffe et al., 2019).
A number of environmental epidemiological studies have examined the impact of various chemical or contaminant exposures on certain tissue-based (usually blood) ‘omic profiles in human populations. Information from these studies can help identify biological profile changes related to known or suspected adverse effects associated with the exposures of interest. These human population-based studies using ‘omics analyses can also be informative of exposure and early biological effect biomarkers as well as molecular and cellular events that are indicative of modes-of-action or key events in adverse outcome pathways (Espín-Pérez et al., 2014). We developed a systematic evidence map (SEM) of environmental epidemiological studies examining chemical or contaminant exposures with ‘omics analyses in order to characterize the extent of available studies and to inform future research needs as well as potential future applications in chemical risk assessments. Single or multi-omic integration in epidemiological studies can provide a significant opportunity to increase the understanding of health and disease with respect to biological mechanism, molecular targets, and biomarkers (Karczewski and Snyder, 2018). Thus, the ‘omics data from these epidemiological studies have the potential to inform various aspects of risk assessment such as mechanism of action, exposure assessment, toxicokinetics, and dose-response assessment (Yu et al., 2016).
2. Methods
We developed a PECO statement (Participants, Exposure, Comparator, and Outcomes) to define the scope of the SEM:
Participants: Any population and lifestage (occupational or general population, including children and other sensitive populations).
Exposure: Any chemical and/or environmental contaminant.
Comparator: A comparison or reference population exposed to lower or no levels of the chemical/contaminant relative to a more highly exposed population; or humans who serve as their own controls by comparing before-and-after outcomes following chemical/contaminant exposure.
Outcome: Molecular analyses from use of the following ‘omics:
Transcriptomics: gene expression changes (RNA transcripts)
Metabolomics: metabolites produced by cell, tissue, or organism
Proteomics: protein functions and interactions
Epigenomics: epigenetic modifications (e.g., DNA methylation, histone modification, microRNA) that influence gene expression.
To identify relevant literature, we developed a comprehensive search strategy. The primary databases searched were PubMed, Web of Science, Toxline, and Toxic Substances Control Act Test Submissions (TSCATS). Additional resources outside of the four bibliographic databases were also used such as the reference list of studies and reviews screened as meeting PECO criteria after full-text review, and publicly available study and data information from Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). We curated specific search terms from relevant vocabularies (Table 1) for each category of ‘omics (transcriptomics, epigenomics, metabolomics, and proteomics), excluding terminology that was too broad to be useful. For example, we did not include “cell” and “array” as terms as they were too general, and fold-change was reported in various ways to express quantification of expression (not just for ‘omics) so was not included in the search terms. However, we included specific terms such as “differential expression,” “differential expressed genes,” or “differential gene expression” that consistently produced on-topic results. Our strategy was further refined (Table S1) to recall ‘omics studies meeting two criteria points: studies had to first focus on human populations, and then they must also contain information relevant to chemical exposure.
Table 1.
Relevant Search Terms.
Subject | Terms |
Chemical Exposure | inhalation, ingestion, skin contact, chemical exposure, molecular epidemiology |
Metabolomics | metabolite, biofluid, tissue, metabolome, metabolism, molecular phenotype, sugar, lipid, amino acid, fatty acid, phenolic compound, alkaloids, small molecules, biomarker |
Epigenomics | epigenome, DNA, gene expression, DNA methylation, histone modification, microRNA, miRNA, Illumina 450 K, Infinium HumanMethylation, Illumina 850 K |
Transcriptomics | RNA, mRNA, mRNA expression, microarray, high-throughput sequencing, HTS, transcriptome, next-generation sequencing, NGS, RNA-Seq, differentially expressed genes, differential genes, differential gene expression |
Proteomics | proteomes, protein expression, protein activity, protein degradation, protein production, steady-state abundance, post-translational modifications, PTM, mass spectrometry |
ToxNet Narrowing Terms | biological effect, exposure, biomarker |
Human Health Terms | development*, skin, tissue, derm*, human health, health, epidemiology, child*, teenager, adolescen*, pregnan*, adult*, general population, population, blood, serum, birth |
Exclusion Terms | social, in vitro, in-vitro, animal, mouse, mice, rat*, beagle*, rodent*, rabbit*, dog*, cat*, guinea pig*, primate*, monkey*, pig*, fish*, bird*, frog*, in vivo, in-vivo, cell line, yeast, tick, ecotoxicity, nematode, mosquito*, drosophila, mite*, reptile, parasit*, poultry, mussel*, parasite*, murine, protozoa*, fungi, fungus, cetacean, canine, feline |
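As a rough illustration of how term groups like those in Table 1 might be combined into a Boolean search string, the sketch below joins each group with OR, requires one hit from each of the ‘omics, exposure, and human health groups with AND, and removes the exclusion terms with NOT. The helper names `or_group` and `build_query` are hypothetical, the term lists are abbreviated from Table 1, and the operator syntax is a generic assumption; the actual database-specific strategy is the one given in Table S1.

```python
# Illustrative only: helper names and the generic AND/OR/NOT syntax are
# assumptions, not the authors' actual search strings (see Table S1).

def or_group(terms):
    """Join terms into a parenthesised OR group, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(omics_terms, exposure_terms, human_terms, exclusion_terms):
    """Require one term from each group and drop records matching exclusions."""
    required = " AND ".join(
        or_group(group) for group in (omics_terms, exposure_terms, human_terms)
    )
    return f"{required} NOT {or_group(exclusion_terms)}"


# Abbreviated term lists taken from Table 1.
transcriptomics = ["transcriptome", "RNA-Seq", "differential gene expression"]
exposure = ["chemical exposure", "inhalation", "ingestion"]
human = ["epidemiology", "general population", "pregnan*"]
exclusion = ["in vitro", "mouse", "rat*"]

query = build_query(transcriptomics, exposure, human, exclusion)
print(query)
```

Keeping each subject row of the table as its own list makes it straightforward to swap in a different ‘omics category without touching the exposure, human health, or exclusion groups.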
2.1. Initial literature searches
The initial literature search was performed in October 2019 and considered studies from 1995-present. The starting year for this search was chosen with consideration to keystone publications, namely the first microarray expression analysis described in the developing era of genomics (Schena et al., 1995). An update search was performed in December 2020 to integrate references published since the initial search, and title/abstract and full-text screening were performed (details discussed in literature screening and data extraction section) on the resulting group of studies.
2.2. Validation searches
The initial primary searches were designed to be broad and capture a wide array of information, but there were concerns that certain types of studies may have been missed. Thus, we performed additional focused validation searches (October 2021) to identify specific categories of potentially missed relevant information. First, we aimed to identify non-English studies by running an SQL query against the reference database. The query checked the language field in the host database, and any value other than English was considered for this validation step. Searches were also performed to identify references specifically regarding arsenic and polychlorinated biphenyls (PCBs) (Table S2), as they were examples of chemicals investigated across all ‘omics platforms (epigenomics, transcriptomics, metabolomics, and proteomics) in studies evaluated through initial title/abstract screening. Additionally, we performed searches targeted to identify studies of “chemical” exposure and phthalate exposure (Table S3). Phthalates were chosen as a broad chemical class for which studies may have been missed by the initial search; finding missed studies would indicate that the initial search terms needed to be re-evaluated. Validation searching for these studies occurred in two parts. The first keyword set identified individual phthalates using common names and abbreviations. The second keyword set identified phthalates categorically (e.g., phthalate esters (PAEs), plasticizers). The resulting group of studies from the validation searches was then also put through a title/abstract screening effort.
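The language check described above can be sketched as a simple query on a language field. The example below is a minimal sketch using an in-memory SQLite database; the table name `refs`, its columns, and the sample rows are all hypothetical, since the schema of the actual reference database is not described.

```python
import sqlite3

# Hypothetical schema and placeholder rows; the real reference database
# and its field names are not described in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refs (ref_id INTEGER PRIMARY KEY, title TEXT, language TEXT)"
)
conn.executemany(
    "INSERT INTO refs (title, language) VALUES (?, ?)",
    [
        ("Study A", "English"),
        ("Study B", "Chinese"),
        ("Study C", "English"),
        ("Study D", None),       # missing language value
        ("Study E", "Spanish"),
    ],
)

# Select every reference whose language value is something other than English.
# Note: rows with a NULL language are not returned by the <> comparison.
non_english = conn.execute(
    "SELECT ref_id, title, language FROM refs "
    "WHERE language <> 'English' ORDER BY ref_id"
).fetchall()

for row in non_english:
    print(row)
```

Whether records with an empty language field should also be reviewed is a design choice the text does not address; adding `OR language IS NULL` to the WHERE clause would include them.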
2.3. Literature screening and data extraction
Because the search efforts retrieved a large number of references, the results were first imported into SWIFT-Review software (https://www.sciome.com/swift-review/) to remove duplicate and off-topic references. SWIFT-Review has pre-set literature search filters (health outcome and evidence stream) that can be applied to distinguish studies that are likely to contain human health content from those that likely do not. Using the evidence stream filter in SWIFT-Review, we selected for human studies only, and tagged animal or plant studies, in-vitro studies, ecotoxicity, physical chemistry, and environmental fate for exclusion. Using the health outcomes filter, we selected all relevant health outcomes (e.g., hematological and immune, developmental, cancer, respiratory, endocrine, reproductive, renal, hepatic, cardiovascular, musculoskeletal, neurological, nutritional and metabolic, mortality, skin and connective tissue, and ocular and sensory) and excluded studies with tags for physiologically based pharmacokinetic (PBPK) modeling and simulation, or that had no tags.
Title/abstract screening of relevant studies was then performed in SWIFT-Active Screener (https://www.sciome.com/swift-activescreener). To be considered for inclusion, studies needed to meet the PECO criteria. If the studies were included, information was noted on the type(s) of ‘omics (transcriptomics, epigenomics, metabolomics, or proteomics) examined in the study. In general, reviewers can save time by reviewing titles and abstracts in SWIFT-Active Screener only until an estimated 95% recall is reached, since it uses “active” machine learning in which real-time screening decisions help to prioritize unscreened studies for relevance (Howard et al., 2020). For this systematic evidence map, however, reviewers screened 100% of the studies in SWIFT-Active Screener in order to ensure no pertinent environmental epidemiological studies were missed.
Full text screening and data extraction of the included studies were performed concurrently in DistillerSR (https://www.evidencepartners.com/products/distillersr-systematic-review-software). Both title/abstract and full-text screening processes using SWIFT-Review, SWIFT-Active Screener, and DistillerSR were conducted by two independent reviewers. The data extraction consisted of information on study population (pregnant women, occupational, general population-adults, children/adolescents), study design (case-control, cohort, controlled-trial, cross-sectional, other), years of data collection, study sample size, country of study/population, exposure measurement (chemicals analyzed, exposure levels, and matrix), availability of data (e.g., whether data was publicly available or accessible), potential confounders, type of ‘omics and platform, and biological matrix used for ‘omics. We then created an evidence map summarizing the available and extracted data. The code book is available in the Supplemental Materials. Visualizations were generated using Microsoft Excel and Tableau.
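To make the extraction fields listed above concrete, the following is a minimal sketch of how one extracted study could be represented and checked against the category lists named in the text. The class name `ExtractionRecord`, its field names, and the example values are hypothetical; the authoritative field definitions are in the code book in the Supplemental Materials.

```python
from dataclasses import dataclass, field

# Category values taken from the text; everything else here is illustrative.
POPULATIONS = {
    "pregnant women", "occupational",
    "general population-adults", "children/adolescents",
}
STUDY_DESIGNS = {
    "case-control", "cohort", "controlled-trial", "cross-sectional", "other",
}
OMICS_TYPES = {"transcriptomics", "epigenomics", "metabolomics", "proteomics"}


@dataclass
class ExtractionRecord:
    """One row of extracted study information (hypothetical structure)."""
    citation: str
    population: str
    study_design: str
    sample_size: int
    country: str
    chemicals: list
    exposure_matrix: str
    omics: list
    omics_matrix: str
    data_publicly_available: bool
    confounders: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the controlled vocabularies.
        if self.population not in POPULATIONS:
            raise ValueError(f"unknown population: {self.population}")
        if self.study_design not in STUDY_DESIGNS:
            raise ValueError(f"unknown study design: {self.study_design}")
        unknown = set(self.omics) - OMICS_TYPES
        if unknown:
            raise ValueError(f"unknown omics type(s): {sorted(unknown)}")


# Placeholder record, not a real study from the evidence map.
record = ExtractionRecord(
    citation="Placeholder et al. (hypothetical)",
    population="pregnant women",
    study_design="cohort",
    sample_size=100,
    country="United States",
    chemicals=["arsenic"],
    exposure_matrix="urine",
    omics=["epigenomics"],
    omics_matrix="cord blood",
    data_publicly_available=True,
    confounders=["maternal age"],
)
print(record.study_design, record.omics)
```

Validating against fixed category sets at entry time is one way two independent reviewers could keep their extractions comparable, though the text does not say how consistency was enforced in DistillerSR.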
3. Results
Our searches spanning from January 1, 1995 until October 13, 2021 across the databases retrieved 10,067 total records (Fig. 1). After the pre-set literature filters were applied, 7,397 studies remained, and 7,348 unique studies were selected for title/abstract screening after removal of duplicates and off-topic references (Fig. 1). As part of validation searches, we performed chemical-specific searches for arsenic and PCBs, as many of the identified studies from the initial search examined these two chemicals with the four ‘omics of interest. We identified 333 studies for arsenic and 393 studies for PCBs (Fig. 1). We also performed validation searches on the 2,670 studies that were initially excluded through the filters in SWIFT-Review to check that relevant references were not missed, and used additional search terms to identify any relevant studies for non-English language (119 studies), chemical exposure (431 studies), and two keyword sets for phthalates (165 studies for set I and 312 studies for set II) (Figure S1). From filtering and screening in SWIFT-Review and SWIFT-Active Screener of studies identified from initial and validation searches as well as through reference lists and the GEO database, 84 unique relevant environmental epidemiological studies were identified (Fig. 1).
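The record counts reported in this paragraph can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the text (Fig. 1).
total_records = 10_067        # records retrieved across databases
after_filters = 7_397         # remaining after SWIFT-Review pre-set filters
for_screening = 7_348         # unique studies sent to title/abstract screening
included = 84                 # PECO-relevant studies after full-text review

# Records removed by the pre-set filters; this matches the 2,670 studies
# that were later re-examined in the validation searches.
excluded_by_filters = total_records - after_filters

# Records dropped as duplicates or off-topic before screening.
removed_before_screening = after_filters - for_screening

# Share of screened studies that were ultimately included.
inclusion_rate = included / for_screening

print(excluded_by_filters, removed_before_screening, f"{inclusion_rate:.2%}")
```

The difference between the retrieved and filtered counts equals the number of filter-excluded studies re-checked during validation, so the reported figures are internally consistent.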
Fig. 1. Study Selection Diagram (Literature Search Results).
*References identified from other sources (n = 14) were collected from reviews, abstracts, and suggested work in screened studies.
The chemicals investigated in the identified epidemiology ‘omics studies are shown in Fig. 2. We grouped the chemicals into categories of air pollutants, BPA and phthalates, metals, persistent organic pollutants, pesticides, PFCs and PFAS, volatile organic compounds, and other. The five chemicals or chemical groups most often examined in epidemiology ‘omics studies were arsenic, PCBs (grouped), particulate matter (PM) 2.5, benzene, and metals (cadmium, lead).
Fig. 2. Summary of chemical exposures of included studies*.
*Some studies may appear in multiple categories for examining one or more chemicals.
Cross-sectional (n = 35) and cohort studies (n = 28) were the most common study designs across all of the ‘omics epidemiology studies, followed by controlled exposure trials (n = 12) and case-control studies (n = 4) (Fig. 3). ‘Omics investigations are often cross-sectional in nature, with samples collected at a single time point due to limited availability of data or post-hoc additions to existing studies (Chu et al., 2019). Various exposure matrices were examined in the epidemiological ‘omics studies (Fig. 3), and air (n = 27), urine (n = 25), and blood (n = 21) were most often used to measure exposure levels (Fig. 3). For most ‘omics analyses, studies used serum, placenta, cord blood, or urine to extract the biological markers (e.g., RNA, DNA, metabolites, or proteins) (Table S4). Different populations such as children and adolescents < 18 years old, adults, occupational workers, and pregnant women were all investigated across studies using epigenomics, metabolomics, or transcriptomics for chemical exposure effects (Fig. 3). Studies that used proteomics investigated chemical exposures among mostly occupational (n = 3) and adult general (n = 1) populations (Fig. 3). Studies of pregnant women most often used epigenomics (n = 20) to investigate effects from prenatal chemical exposures, which is of interest because prenatal exposure can dysregulate the fetal epigenome with potential consequences for subsequent adverse health effects manifesting in childhood, over the lifetime, or transgenerationally (Perera and Herbstman, 2011).
Fig. 3. Summary of study design and population of included studies*.
*Some studies may appear in multiple categories for examining one or more ‘omics.
An interactive dashboard of the data in the systematic evidence map is available at: https://public.tableau.com/app/profile/literature.inventory/viz/Omics-epi-SEM/Omics_Inventory, and information on the 84 included environmental epidemiology studies is summarized below. The dashboard can be filtered by chemical, ‘omics type, population category, study design, and matrix (snapshot of dashboard shown in Fig. 4a). The sample size of the 84 studies in this systematic evidence map ranged from 5 to 2,411 participants (see interactive dashboard, Fig. 4a). Overall, studies were often conducted in the United States (n = 24) as well as in countries such as China (n = 11), Mexico (n = 8), and Bangladesh (n = 6) (see interactive dashboard, Fig. 4a). Studies most often used epigenomics (n = 45), transcriptomics (n = 30), and metabolomics (n = 11), and less often used proteomics (n = 4), when examining associations with chemical exposures (see interactive dashboard, Figs. 3 and 4a). Several epidemiological studies (n = 6) examined more than one ‘omics, such as transcriptomics and epigenomics (see interactive dashboard, Figs. 3 and 4a).
Fig. 4. Snapshots from interactive dashboard that display extracted information of the identified epidemiological ‘omics studies.
When an individual study is selected from the interactive systematic evidence map, the reader can access additional study details such as years of data collection, data availability, ‘omics technology platform, potential confounders the study accounted for, and exposure levels (includes information on the central tendency type, value, and additional information such as units of measurement) (see interactive dashboard). An example of an individual study and information is shown in Fig. 4b. Thirty-five studies (42%) had publicly available data and provided database (such as GEO or dbGAP) accession numbers (see interactive dashboard). Additionally, most of the studies (n = 73, 87%) indicated potential confounders and performed restriction, matched the study subjects with respect to potential confounders, or adjusted for them in the statistical analyses (Fig. 4) (see interactive dashboard).
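The proportions quoted in this paragraph, and the ‘omics counts reported earlier in the Results, can be reproduced from the stated numbers. The sketch below is only a consistency check; the variable and function names are illustrative.

```python
TOTAL_INCLUDED = 84  # studies in the systematic evidence map


def pct(n, total=TOTAL_INCLUDED):
    """Percentage of included studies, rounded to the nearest whole number."""
    return round(100 * n / total)


public_data_pct = pct(35)   # studies with publicly available data
confounder_pct = pct(73)    # studies that addressed potential confounders

# Per-'omics study counts as reported in the Results.
omics_counts = {
    "epigenomics": 45,
    "transcriptomics": 30,
    "metabolomics": 11,
    "proteomics": 4,
}
omics_total = sum(omics_counts.values())

# The per-'omics counts exceed the number of studies because some studies
# used more than one 'omics type and are counted once per type.
extra_counts = omics_total - TOTAL_INCLUDED

print(public_data_pct, confounder_pct, omics_total, extra_counts)
```

The surplus of six in the per-‘omics tally would be consistent with each of the six multi-‘omic studies being counted under exactly two ‘omics types, although the text does not state how many types each of those studies used.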
4. Discussion
The systematic evidence map presented here summarizes the available body of environmental epidemiological studies investigating molecular effects from chemical exposures by using ‘omics technology. Key information on study design, study population characteristics, type of ‘omics and technology, and chemical exposure are available, allowing the user to inspect and analyze the available studies. This evidence map serves as an inventory and starting place for further investigations. It can be used to inform future research or the design of environmental epidemiological studies examining chemical exposures and ‘omics by showing what information the presented studies already provide and by identifying potential data gaps.
Environmental epidemiological studies that examine ‘omics need to consider unique potential sources of bias related to sampling of tissues and analyzing of high-throughput data. This SEM extracted some information relevant to bias (e.g., matrix, exposure levels, adjustment for confounding, population description), but risk of bias evaluation was not performed. Researchers using these data should consider potential sources of bias as relevant to their research question. The challenges associated with ‘omics data require researchers to consider sources of bias in traditional epidemiological studies, such as confounding, selection bias, measurement error, and reverse causation, but also to consider unique biases such as cellular and tissue heterogeneity or technical variability (Everson and Marsit, 2018; Rockett et al., 2004). Statistical power is important to consider when trying to detect true associations from ‘omics data; thus, power calculations must be performed to estimate appropriate sample sizes (Everson and Marsit, 2018). Additionally, environmental epidemiological studies using ‘omics data should have information on quality control, filtering processes, normalization, and appropriate statistical methods. As various ‘omics platforms exist, there are potential challenges to applying and analyzing ‘omics in epidemiological studies (see examples listed in Table S5), and thus the generated data need to be carefully analyzed and interpreted (Franks and Pomares-Millan, 2020; Krassowski et al., 2020). Moreover, environmental epidemiologic studies often examine multiple ‘omics within the population of interest, and so study designs that involve repeated sampling on study subjects to generate multiple ‘omics are particularly useful for biological inference (Chu et al., 2019).
Thus, thoughtful incorporation of different study design principles, data filtering and analysis, validation of possible biomarkers in independent populations, and further analytical development of integrative ‘omics methods are necessary to continue the understanding of relevant biological processes inferred from ‘omics in these environmental epidemiological studies.
There are several strengths and limitations to our systematic evidence map. As mentioned above, we did not perform a risk of bias evaluation, and thus it is not known whether the included studies are of high or low quality. As part of our future research applications of this systematic evidence map, we are working to create data quality metrics and study evaluation criteria on domains such as population selection, exposure assessment, and risk of bias for these environmental epidemiological studies examining chemical exposures with ‘omics. A major strength of our systematic evidence map is that we searched diverse sources of evidence by using multiple databases and reviewing reference lists of included studies and reviews to identify relevant references, which we believe has resulted in a comprehensive inventory, although some relevant studies may still have been missed. Another strength is that we worked with an expert librarian to develop and optimize our search terms and targeted strategies. Moreover, others can adapt the extracted information from our systematic evidence map for quantifiable risk assessment or research needs, such as developing analytic strategies based on the study design, types of ‘omic data available, and exposure information.
In general, data derived from epidemiological research have certain advantages over animal or in-vitro experimental studies when assessing exposure-outcome associations for risk assessments. For example, the target species (human) is directly relevant, so interspecies or high-to-low dose extrapolations are not required (Burns et al., 2019). Technological advancements in ‘omics have resulted in improved capacity to measure molecular changes in biological samples that are informative about internalization of exposures and physiological perturbations (Everson and Marsit, 2018). These advances have expanded researchers’ capabilities to examine the underlying etiology of environmentally-associated diseases. Using analysis from ‘omics in environmental epidemiological studies as a complement to traditional epidemiology and experimental studies can further our understanding of the direct impacts of chemicals on human health and inform relevant human data for risk assessment. For instance, ‘omics data in epidemiological studies are being used to better characterize molecular initiating events and provide evidence of key events at different levels of molecular processes in adverse outcome pathways (Brockmeier et al., 2017). The ‘omics data from epidemiological studies can provide mechanistic evidence to support chemical read-across, weight of evidence for certain mechanisms, understanding of biological networks, and potential quantitative development of points of departure; we are focusing on the latter application for future work using this systematic evidence map. Hence, qualitative and quantitative approaches and considerations to using various ‘omics data sets in risk assessments or other regulatory landscapes continue to be explored (Boverhof and Zacharewski, 2006; Buesen et al., 2017; D. Ghosh et al., 2018; Pennie et al., 2004).
5. Conclusions
‘Omics data have generally been used in chemical risk assessments to provide information about the mode or mechanism of action, and the field is now transitioning to using these data to potentially derive human health-relevant toxicity values. For example, a recent study using an epidemiological cohort estimated inorganic arsenic doses that corresponded to changes in transcriptomic, proteomic, epigenomic, and integrated multi-‘omic signatures in human cord blood through benchmark dose modeling (Rager et al., 2017). Thus, data integration from ‘omics in population-based studies may provide direct human-relevant reference values to quantify biological effects from chemical exposures. The systematic evidence map presented here identifies the existing evidence that may be used to include ‘omics information from epidemiological studies in quantitative and qualitative risk assessments and also indicates where more data may be needed for specific chemical(s). This systematic evidence map will be updated on an annual basis as an ongoing resource for researchers investigating chemical exposures and ‘omics data in epidemiological studies.
Supplementary Material
