Author manuscript; available in PMC: 2021 Aug 1.
Published in final edited form as: Toxicology. 2020 May 22;441:152505. doi: 10.1016/j.tox.2020.152505

Genomic Resources for Dissecting the Role of Non-Protein Coding Variation in Gene-Environment Interactions

Daniel Levings 1, Kirsten E Shaw 1, Sarah E Lacher 1,*
PMCID: PMC7423718  NIHMSID: NIHMS1602194  PMID: 32450112

Abstract

The majority of single nucleotide variants (SNVs) identified in Genome Wide Association Studies (GWAS) fall within non-protein coding DNA and have the potential to alter gene expression. Non-protein coding DNA can control gene expression by acting as transcription factor (TF) binding sites or by regulating the organization of DNA into chromatin. SNVs in non-coding DNA sequences can disrupt TF binding and chromatin structure and this can result in pathology. Further, environmental health studies have shown that exposure to xenobiotics can disrupt the ability of TFs to regulate entire gene networks and result in pathology. However, there is a large amount of interindividual variability in exposure-linked health outcomes. One explanation for this heterogeneity is that genetic variation and exposure combine to disrupt gene regulation, and this eventually manifests in disease. Many resources exist that annotate common variants from GWAS and combine them with conservation, functional genomics, and TF binding data. These annotation tools provide clues regarding the biological implications of an SNV, as well as lead to the generation of hypotheses regarding potentially disrupted target genes, epigenetic markers, pathways, and cell types. Collectively this information can be used to predict how SNVs can alter an individual’s response to exposure and disease risk. A basic understanding of the regulatory information contained within non-protein coding DNA is needed to predict the biological consequences of SNVs, and to determine how these SNVs impact exposure-related disease. We hope that this review will aid in the characterization of disease-associated genetic variation in the non-protein coding genome.

Keywords: Gene-regulation, Gene-expression, Non-protein coding DNA, Gene-environment interactions

1.0. Introduction

Genome wide association studies (GWAS) identify markers of genetic variation in individuals associated with disease risk, and these studies have provided an excellent asset for understanding the complex nature of disease. However, the functional implications of individual SNVs or SNV combinations and their contribution to disease are not always straightforward. To date, most genetic loci that have been identified in GWAS only account for a small portion of phenotypic variation and most disease risk remains unexplained; this is referred to as “missing heritability” (Zuk et al. 2012). Complex and common diseases are often a result of many variants with small effects coupled with environmental exposures over an individual’s lifetime. Further, rare variants (with possibly large effects) not represented on GWAS arrays, gene-gene interactions, and gene-environment interactions (GxE) are all potential contributors to disease risk that require further exploration in order to better interpret GWAS data (Manolio et al. 2009). Even with these caveats, there is still a wealth of information hidden within current GWAS data sets, and strategic approaches to dissect GWAS data are needed in order to understand the contribution to disease risk.

In humans, non-protein coding DNA makes up 98% of the genome (Lander 2011); likewise, the majority of single nucleotide variants (SNVs) identified in GWAS fall in non-protein coding DNA. One well-described function of non-protein coding DNA is to work with DNA binding transcription factors (TFs) to regulate gene expression. Environmental health studies have demonstrated for decades that exposure can alter the function and activity of transcription factors (TFs), which can ultimately disrupt the transcriptional regulation of a single gene, or even entire gene networks, and result in pathology. Non-protein coding DNA controls gene expression at the most basic level by serving as TF binding sites. At the more complex level, non-protein coding DNA regulates DNA-nucleosome interactions and is an important factor in the higher-order organization of DNA into three-dimensional chromatin. Perturbations at any of these levels of organization can lead to altered gene expression (Fig 1). Disruption of these tightly regulated processes due to genetic variation and/or exposure can contribute to phenotypic changes in the cell, which can eventually manifest as disease. Thus, a basic understanding of the regulatory mechanisms mediating gene expression is needed in order to predict the functional implications of the vast majority of disease-associated SNVs, and to determine how these SNVs impact common, complex, and exposure-related disease (Zhang and Lupski 2015, Nishizaki and Boyle 2017).

Figure 1: SNVs in Non-Coding DNA Disrupt Gene Expression via Multiple Routes.

(A) SNVs located in a TF-DNA binding sequence can change the sequence from high affinity to low affinity (depicted here) or vice versa, and this can result in large changes in the target gene’s response to activation. (B) An SNV or a combination of SNVs can disrupt TF DNA binding sequences in both distal and proximal enhancer regions and can alter the integration of these multiple signals and disrupt gene expression. (C) As described in the text, histones can be modified and these histone marks dictate regulatory activity. Primary TFs (TF’) are able to bind regions of inaccessible chromatin and recruit factors that unpack the chromatin and allow other TFs to gain access; SNVs that disrupt these processes can sometimes affect the regulation of whole regions of DNA, and potentially the expression of a multitude of genes. (D) SNVs can disrupt the process of DNA looping and the ability of TFs to come into contact with and regulate one another. Image was created with BioRender (https://biorender.com/).

Herein we discuss in detail the fundamental role of non-protein coding DNA in orchestrating gene expression. We also discuss potential approaches for validation of genetic variation in the non-protein coding genome as it contributes to pathological or protective changes in transcription. Such integration of population genetics with functional genomics has already resulted in a more mechanistic view of many GWAS-identified loci, which were previously poorly understood (De Gobbi et al. 2006, Smith et al. 2014, Claussnitzer et al. 2015, Deplancke et al. 2016, Lacher and Slattery 2016, Wang et al. 2018). Finally, we review methods to characterize gene-environment interactions as they relate to disease risk.

2.0. Non-Protein Coding DNA Orchestrates Gene Expression

Although this review will focus on the regulation of gene expression in humans, it is worth mentioning that much of what is known regarding the mechanisms controlling gene regulation was discovered using model organisms such as Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (roundworm), Drosophila melanogaster (fruit flies), Danio rerio (zebrafish), and Mus musculus (mice). Comparing the genomes of these model organisms to humans has shed light on the significance of non-protein coding DNA in the evolution of complex multicellular organisms; however, the amount of non-protein coding DNA relative to protein-coding DNA in these model organisms is strikingly different from that seen in humans. The yeast genome is composed of about 30% non-protein coding DNA, whereas the human genome contains about 98% (Lander 2011). In general, the ratio of non-protein coding to protein-coding DNA increases as the level of organismal complexity increases, although there are exceptions to this trend (e.g. the onion; Palazzo and Gregory 2014).

At the most basic level, non-protein coding DNA’s control of gene expression is regulated by intricate networks of DNA binding proteins, including TFs. Much effort has been devoted to identifying the range of sequences bound by a TF and understanding the mechanisms by which TFs recognize and bind their target DNA sequences. Methods such as electrophoretic mobility shift assays (EMSA), systematic evolution of ligands by exponential enrichment coupled with next generation sequencing (SELEX-seq), and high-throughput SELEX (HT-SELEX) have been extensively utilized for this purpose (Zhao et al. 2009, Jolma et al. 2010). Examination of the genome revealed that TF binding sites occur thousands, if not tens of thousands of times in the genome, although in any given context many of these TF binding sites remain unbound (Consortium 2012). Amazingly, TFs are able to identify which DNA binding sites and corresponding genes are needed (and which are not needed) in various cellular, developmental, and environmental contexts (Dimas et al. 2009, Slattery et al. 2014). Furthermore, a single base-pair change to a TF binding sequence can change a strong TF binding site to a weak binding site, or vice versa, and this can impact the expression level of the corresponding target gene (Fig 1A).
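The effect of a single base-pair change on a TF binding site, as described above, can be illustrated with a toy position weight matrix. The following is only a minimal sketch: the base frequencies, the two 6-bp sequences, and the function name `pwm_score` are hypothetical and do not describe any real TF or any specific published scoring method.

```python
import math

# Hypothetical base frequencies for a 6-bp motif (illustrative values only).
# Each entry gives the probability of A, C, G, T at one motif position.
TOY_PFM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative position
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumption: uniform background base composition


def pwm_score(site, pfm=TOY_PFM, background=BACKGROUND):
    """Return the log2-odds score of a candidate site against the motif model."""
    if len(site) != len(pfm):
        raise ValueError("site length must match motif length")
    return sum(
        math.log2(column[base] / background)
        for column, base in zip(pfm, site.upper())
    )


ref_site = "TGACAT"  # reference allele: preferred base at every informative position
alt_site = "TGATAT"  # alternate allele: C>T substitution at motif position 4

ref_score = pwm_score(ref_site)
alt_score = pwm_score(alt_site)
print(f"ref={ref_score:.2f} alt={alt_score:.2f} delta={alt_score - ref_score:.2f}")
```

A lower score for the alternate allele suggests a weaker predicted site; as with any sequence-based prediction, such a difference would still need to be confirmed experimentally (e.g. by EMSA).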

In yeast, most TF binding sites are found in the region directly upstream of the transcription start site (TSS). While this is also common in multicellular organisms, TF binding sites can also be found hundreds of thousands of bases away from the TSS (Fig 1B). These distal regulatory TF binding sites, commonly referred to as enhancers or enhancer elements, can be found within gene introns, downstream of genes, upstream of the TSS, clustered together, and in some cases even bypass intervening genes (Visel et al. 2009, Bulger and Groudine 2011, Slattery et al. 2014, Deplancke et al. 2016). Enhancers can also work together to integrate information from groups of TFs that respond to the environment and then either positively or negatively regulate gene expression (Slattery et al. 2014). Characterization of enhancers is commonly carried out using reporter assays in which an enhancer is cloned into a plasmid containing a reporter gene (GFP, luciferase, etc.). These plasmids are then transfected into cells or organisms and the level of the reporter gene is correlated to TF binding at that enhancer region. In this manner, only one enhancer can be evaluated at a time. Massively parallel reporter assays (MPRAs) have enabled the analysis of thousands of potential enhancers in a single experiment (Patwardhan et al. 2012, Inoue and Ahituv 2015). Although experiments evaluating individual enhancer regions continue to be successfully used, MPRAs allow for this to be done on a much larger and faster scale. A thorough review of MPRAs and other current next generation techniques for high-throughput characterization of enhancers can be found in Gasperini et al. (Gasperini et al. 2020).

While it is easy to appreciate the impact that DNA-TF interactions may have on gene expression, there are other types of DNA-protein interactions that can also alter transcription. For instance, DNA is organized by wrapping around histone octamers referred to as nucleosomes. DNA-nucleosome interactions and overall nucleosome organization have been shown to play a major regulatory role in orchestrating gene expression (Fig 1C) (Bai and Morozov 2010). The post-translational modifications that are present on histones impact the formation and shape of the nucleosome as well as TF binding (Rando 2012, Rothbart and Strahl 2014). Some of these modifications are associated with more accessible chromatin, whereas others are associated with tightly wrapped DNA that is inaccessible to most TF binding. Histone modifications (epigenetic marks) are major drivers of contextual transcription factor binding and gene expression (Strahl and Allis 2000, Xin and Rohs 2018). Primary TFs, also known as Pioneer TFs (TF’ in Fig 1C), directly bind to tightly condensed DNA and recruit DNA binding proteins that can modify chromatin and allow other TFs to gain access. At a larger scale, many TFs are able to integrate information about the structural properties of large regions of chromosomes. DNA looping can bring together functional elements and allow for regulatory interactions between distal TFs (Fig 1D). The observation that enhancers can co-regulate genes separated by vast chromosomal distances demonstrates the impact that structural properties of DNA can have on gene transcription (de Wit and de Laat 2012, Deng et al. 2012, Phillips-Cremins et al. 2013, Pasquali et al. 2014, Rao et al. 2014, Miguel-Escalada et al. 2015).

As illustrated above, non-protein coding DNA orchestrates gene expression from base-level sequence resolution to chromatin structure. Regulation of gene expression at the level of TF binding (Fig 1A), the interaction of proximal and distal TFs into groups of enhancers (Fig 1B), chromatin accessibility (Fig 1C) and DNA looping (Fig 1D), are all heavily determined by the underlying DNA sequence. Gene regulation can be sensitive to slight disruptions at any of these levels, and consequently, SNVs in non-protein coding DNA have the potential to disrupt gene expression and contribute to exposure-mediated disease.

3.0. Genome-Wide Approaches to Elucidating Non-Protein Coding DNA’s Control of Gene Expression

In order to identify SNVs that disrupt regulatory DNA, researchers must first be able to identify regulatory DNA regions within the vast non-protein coding genome. Chromatin immunoprecipitation coupled with sequencing (ChIP-seq) has been used to locate all the regions bound by a given TF and identify potential enhancer elements across the entire genome (Barski et al. 2007, Johnson et al. 2007, Mikkelsen et al. 2007, Furey 2012). It is important to note, however, that ChIP-seq studies have revealed that TF binding is not necessarily equivalent to TF regulation; not all regions identified as bound in a ChIP-seq experiment are functional (Fisher et al. 2012). Despite this, ChIP-seq provides a map of genome-wide TF binding in different contexts, which can be used to construct TF regulatory networks and showcase those that are worth biological validation (Ward and Kellis 2012, Lacher et al. 2015). Integrating TF ChIP-seq data with histone ChIP-seq and other chromatin dynamics data has demonstrated that many TFs have specific histone and nucleosome preferences (Ernst and Kellis 2013).

TFs are only able to alter gene expression if they are able to access DNA, hence, techniques that measure chromatin accessibility directly are useful. Regions of accessible DNA can be identified using DNase-hypersensitivity assays, DNase-seq or the assay for transposase-accessible chromatin coupled with sequencing (ATAC-seq) (Buenrostro et al. 2013, Tsompana and Buck 2014). Histone modifications can alter chromatin accessibility and can serve as markers of transcriptional activity. For example, acetylation and tri-methylation of the lysine 4 residue of histone H3 (denoted H3K4ac and H3K4me3, respectively) are generally associated with regions that are actively transcribed (Heintzman et al. 2007), while histone H3 lysine 27 tri-methylation (H3K27me3) is associated with a polycomb-mediated repressive, closed chromatin state (Cao et al. 2002). The Roadmap Epigenomics Consortium combined a wide variety of data sets and generated genome-wide maps of chromatin states (Roadmap Epigenomics et al. 2015). These chromatin state maps illustrate the regulatory potential of regions across the genome for more than a hundred primary tissues.

Once a potential regulatory region has been identified, the next challenge is determining which gene(s) are regulated by this region. Although enhancers often regulate the expression of nearby genes, in many cases enhancers skip nearby genes and instead interact with more distal genes hundreds of kilobases away. Much progress has been made in developing high-throughput methods to identify contacts between distant DNA elements, such as Hi-C and Hi-C 2.0 (Belaghzal et al. 2017). These methods have identified large regions of the genome with higher rates of intraregional interaction, called topologically associating domains (TADs), as well as DNA looping at kilobase resolutions (Dekker et al. 2013). This work has highlighted the importance of the large-scale structural organization of chromatin and has identified several instances of looping interactions, but the resolution does not always allow for direct linking of regulatory regions to target genes (Nicoletti et al. 2018). Thus, integrating Hi-C with expression quantitative trait loci (eQTL) and allele-specific expression experiments (discussed in sections 4 and 7) is likely to be most useful in identifying disease-linked variants.

As ever-larger quantities and varied types of genomics data continue to be made publicly available, there has been a concomitant increase in the use of computational approaches designed to help make sense of all this data. For instance, neural networks are one type of computational algorithm for integrating varied types of information and learning interesting features of the data. A more complex version of neural networks, called deep learning systems (sometimes referred to as multi-layer neural networks), has been used to identify novel correlations and mechanistic insights from clinical and genomics data. One type of deep learning system, called a graph convolutional neural network, has been used to identify likely off target side-effects of drug combination therapies (Zitnik et al. 2018) and to model the physical and toxicological properties of novel compounds (Kearnes et al. 2016). In addition, several deep learning frameworks have already been developed that are reported to identify putative regulatory elements linked to gene expression, as well as SNVs likely to impact these elements (Kelley et al. 2018, Zhou et al. 2018). Utilizing deep learning models to better understand clinical and genomics data has several challenges including the tendency to overfit models, thus it is absolutely necessary to validate any predictions related to a specific SNV biologically. For more information on the promise and caveats of deep learning in genomics, we refer the reader to Eraslan et al. (Eraslan et al. 2019).

On the other hand, there are multiple user-friendly annotation tools (e.g. RegulomeDB and HaploReg) that allow researchers to identify potential regulatory DNA-disrupting SNVs and to prioritize them for follow up biological validation (Boyle et al. 2012, Ward and Kellis 2016). The encyclopedia of DNA elements (ENCODE) project has generated a publicly available genome-wide map of all putative elements in the human genome (Consortium 2012). Using these available maps and their accompanying search tools it is now relatively easy to view potential enhancers, histone modifications, common SNVs, regions of DNase hypersensitivity, and much more. Additionally, it is easy to see both the regions of coding and non-protein coding DNA, and to compare the conservation of particular regions across a wide variety of species. Combining these sources of information is entirely necessary to generate hypotheses and make accurate predictions about the effects of genetic variation in the non-protein coding genome.

4.0. Resolution of SNVs identified in GWAS

GWAS are rapidly providing data that can be used to determine how genetic variation contributes to an individual’s disease risk. As of January 2020, a total of 4004 publicly available datasets have been uploaded to GWAS Central (Fig 2). GWAS identify disease risk-associated SNVs in both protein coding and non-protein coding DNA; often thousands of SNVs are associated with a particular disease, but many are non-functional (Welter et al. 2014, Visscher et al. 2017). Although significant association of a locus does inherently imply relevance in disease risk, the identification of a causative SNV requires further investigation and biological validation.

Figure 2: Number of Studies Submitted to GWAS Central by Year as of January 2020.

The publication year for all GWAS data submitted to GWAS Central (www.gwascentral.org) was extracted from the full study metadata. The number of GWAS submitted (y axis) is plotted as a single point for each year (x axis), from 2000 through 2019.

Adjacent SNVs in a given region are often genetically linked, that is, they are co-inherited with one another; this co-inheritance is referred to as linkage disequilibrium (LD) (Daly et al. 2001, Lander 2011). GWAS can identify a statistically significant SNV (or SNVs) at a locus. However, because of LD it is not immediately clear which SNV is functional and drives the association and which SNVs are simply co-inherited with the functional SNV. Also, GWAS have been primarily performed in populations of European descent and, because of the design of GWAS, rare SNVs are not detected. To address these issues it has been suggested that GWAS be required to report the association of all variants and not just those that reach the specified significance threshold (Ward and Kellis 2012); however, this is not yet standard practice. Further resolution and functional validation are thus required to understand the full role of an SNV in altering gene expression and disease risk (Kasarskis et al. 2011).
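To make the idea of co-inheritance concrete, LD between two biallelic SNVs is commonly summarized by the statistics D and r². The sketch below computes both from phased haplotypes; the haplotype lists are made up for illustration, the function name `ld_r2` is hypothetical, and phased data are assumed to be available.

```python
def ld_r2(haplotypes):
    """Compute D and r^2 for two biallelic SNVs from phased haplotypes.

    `haplotypes` is a list of (allele_at_snv1, allele_at_snv2) tuples,
    with alleles coded 0 (reference) or 1 (alternate).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # alternate allele frequency at SNV 1
    p_b = sum(b for _, b in haplotypes) / n   # alternate allele frequency at SNV 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNVs must be polymorphic in the sample")
    return d, d * d / denom


# Made-up haplotype samples (not real data)
perfect_ld = [(1, 1)] * 5 + [(0, 0)] * 5             # alleles always travel together
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2          # alleles combine at random
partial_ld = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]

for label, sample in [("perfect", perfect_ld), ("none", no_ld), ("partial", partial_ld)]:
    d, r2 = ld_r2(sample)
    print(f"{label}: D={d:.3f} r2={r2:.3f}")
```

When r² between a GWAS lead SNV and its neighbors is high, the association signal alone cannot distinguish the functional SNV from those that are simply co-inherited with it, which is why the additional annotation and validation steps are needed.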

SNVs found to be correlated with significant changes in gene expression are known as expression quantitative trait loci (eQTL). Recent advances in RNA-seq and transcriptome profiling have enhanced our understanding of eQTLs and the genes under their regulatory control. Recently a large-scale eQTL study reported that 83% of the genes evaluated were regulated by at least one enhancer element (Kirsten et al. 2015), which is almost double the percentage reported in previous studies. Most of the eQTLs identified fell within 150 kilobases of the gene’s TSS or end site; however, eQTL-enriched areas were detected at a distance as large as 5 megabases away. Because of this, the authors emphasize that future eQTL studies may want to consider a more liberal constraint on the maximum distance from eQTL to gene in their analysis (Kirsten et al. 2015).
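At its simplest, an eQTL association can be thought of as a regression of a gene's expression level on the number of alternate alleles each individual carries at the SNV. The following is a minimal sketch under that simplifying assumption; the dosage and expression values are invented, the function name `eqtl_slope` is hypothetical, and real eQTL pipelines additionally adjust for covariates and correct for multiple testing.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression regressed on allele dosage.

    `dosages` holds the alternate allele count (0, 1 or 2) per individual and
    `expression` holds the matching normalized expression values.
    """
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need matching lists with at least two individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation in the sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example: six individuals, two per genotype class
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.0, 10.4, 8.1, 7.9, 6.0, 5.8]

slope, intercept = eqtl_slope(dosages, expression)
print(f"expression changes by {slope:.2f} units per alternate allele")
```

A negative slope in this toy example would indicate that each copy of the alternate allele is associated with lower expression of the gene.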

Large scale GWAS have brought to light that the most common forms of disease are likely not the result of only a single SNV, a single gene, or a single molecular outcome (Schadt 2009, Nica and Dermitzakis 2013, Boyle et al. 2017, Liu et al. 2019). This idea is not new; for over one hundred years scientists have thought that complex and common diseases are a result of the contribution of many genes with a small effect. These diseases have been termed polygenic diseases or traits and this pattern of genetic effect has been described as quantitative (Fisher 1918, Visscher et al. 2010, Visscher and Goddard 2019). Currently available data suggest that perturbations of networks of interacting genes are likely the cause of many common disease traits, and the homeostasis of these interacting genes hinges on properly functioning non-protein coding DNA and gene-environment interactions (Schadt 2009).

5.0. Leveraging Functional Genomics Data to Identify, Prioritize, and Validate Genetic Variation in Non-Protein Coding DNA

Closer evaluation of TF occupancy and GWAS data has led to the observation that there is a strong correlation between SNVs found in transcriptional regulatory elements and disease risk (Maurano et al. 2012, Schaub et al. 2012, Karczewski et al. 2013, Zeron-Medina et al. 2013, Fogarty et al. 2014, Dai et al. 2015). If an SNV has been associated with altered gene expression through eQTL analysis, then that SNV is more likely to be functionally important (Moffatt et al. 2007, Nica and Dermitzakis 2013). SNVs that are found in regions of potential enhancers are of particular interest, therefore experimental priorities are often focused on eQTL-SNVs located in accessible chromatin (Drake et al. 2005, Mehrabian et al. 2005, Schadt et al. 2005, Yaguchi et al. 2005, GuhaThakurta et al. 2006, Lum et al. 2006, Meng et al. 2007, Farber et al. 2009, Fransen et al. 2010, Kasarskis et al. 2011).

Histone tail modifications render the chromatin more or less accessible to TFs and these modification patterns are heritable as epigenetic marks. Historically histone tail modifications have been thought of as independent of the specific DNA sequence. However, specific variations in DNA sequence can orchestrate changes to the histone tail modifications that alter gene expression (Kilpinen et al. 2013, McVicker et al. 2013). TFs bind to specific DNA sequences and this leads to recruitment of the enzymes that modify histones. Consequently, SNVs can impact histone modifications and epigenetic inheritance. The histone H3 lysine 4 mono-methylation (H3K4me1) mark is indicative of an active enhancer element and identifying SNVs found near these regions can also help prioritize causal variants (Heintzman et al. 2007). Tools have been made available that overlay SNVs with such chromatin state information to aid in this process (Ward and Kellis 2016).

Because TF-DNA interactions are highly sequence specific, SNVs that are thought to alter a TF’s consensus DNA binding sequence should be given highest priority. However, SNVs falling outside of these regions should also be evaluated as candidates due to the reasons discussed above. Such SNVs should be cataloged and prioritized based on multiple criteria (overlap with eQTL data, evolutionary conservation, inferred regulatory potential, etc.), and the top candidates should be functionally tested for allele-specific differences in regulatory activity.
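One way to picture this kind of multi-criteria prioritization is as a simple weighted score over annotation flags. The sketch below is purely illustrative: the SNV identifiers, annotation values, weights, and names such as `SnvAnnotation` and `priority_score` are hypothetical, and the weights are not taken from any published scheme. The only design choice carried over from the text is that motif disruption receives the largest weight.

```python
from dataclasses import dataclass


@dataclass
class SnvAnnotation:
    snv_id: str
    disrupts_tf_motif: bool        # alters a TF consensus binding sequence
    is_eqtl: bool                  # overlaps eQTL data
    in_accessible_chromatin: bool  # proxy for inferred regulatory potential
    conservation: float            # hypothetical score scaled to the range 0..1


# Hypothetical weights; motif disruption is ranked highest, as in the text.
WEIGHTS = {"motif": 3.0, "eqtl": 2.0, "accessible": 1.0, "conservation": 1.0}


def priority_score(snv):
    """Combine annotation evidence into a single ranking score."""
    return (
        WEIGHTS["motif"] * snv.disrupts_tf_motif
        + WEIGHTS["eqtl"] * snv.is_eqtl
        + WEIGHTS["accessible"] * snv.in_accessible_chromatin
        + WEIGHTS["conservation"] * snv.conservation
    )


def rank_snvs(snvs):
    """Return SNVs ordered from highest to lowest priority."""
    return sorted(snvs, key=priority_score, reverse=True)


# Invented candidate list (not real variants)
candidates = [
    SnvAnnotation("snv_C", False, False, False, 0.1),
    SnvAnnotation("snv_B", False, True, True, 0.4),
    SnvAnnotation("snv_A", True, True, True, 0.9),
    SnvAnnotation("snv_D", True, False, False, 0.2),
]

for snv in rank_snvs(candidates):
    print(snv.snv_id, round(priority_score(snv), 2))
```

The top-ranked candidates from such a list would then be the ones carried forward for functional testing of allele-specific regulatory activity.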

In Fig 3A we have provided a hypothetical example of a Manhattan Plot highlighting SNVs that were significantly associated with a disease, mapped to the chromosome in which the SNV is located. Close examination of the highlighted region reveals several dozen if not hundreds of significant SNVs. One could examine this region by bioinformatically overlaying this GWAS data with ENCODE data. This could show if any of the SNVs fall in potential enhancer regions that have been previously demonstrated to be bound by a known TF (Fig 3B), and/or whether they could disrupt a known TF DNA binding sequence (Fig 3C). Publicly available data from the Genotype-Tissue Expression (GTEx) portal can also be utilized to examine the expression of genes located near the SNV. This strategy would be used to see if there is a functional consequence of this SNV on the allele-specific expression of any of the predicted target genes (Fig 3D). The SNVs at the top of the list can then be biologically validated in a number of ways using in vitro techniques such as EMSA and in vivo techniques such as enhancer-bashing reporter assays (Claussnitzer et al. 2015).
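The overlay step described above amounts to asking which SNV positions fall inside annotated regulatory intervals. The sketch below shows one way this could be done for a single chromosome; the coordinates, region names, and the function name `snvs_in_regions` are hypothetical, and the regions are assumed to be non-overlapping half-open intervals (start inclusive, end exclusive) rather than a download from any particular ENCODE file.

```python
from bisect import bisect_right


def snvs_in_regions(snv_positions, regions):
    """Map each SNV that falls inside a region to that region's name.

    `snv_positions` is {snv_id: position}; `regions` is a list of
    (start, end, name) tuples on the same chromosome, assumed to be
    non-overlapping and half-open: start <= position < end.
    """
    ordered = sorted(regions)
    starts = [region[0] for region in ordered]
    hits = {}
    for snv_id, pos in snv_positions.items():
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < ordered[i][1]:
            hits[snv_id] = ordered[i][2]
    return hits


# Invented coordinates for illustration only
enhancers = [(4000, 4600, "enhancer_2"), (1000, 1500, "enhancer_1")]
gwas_snvs = {"snv_A": 1200, "snv_B": 2500, "snv_C": 4599, "snv_D": 4600}

overlaps = snvs_in_regions(gwas_snvs, enhancers)
print(overlaps)
```

Sorting the region starts once and using a binary search keeps the lookup efficient when hundreds of significant SNVs are checked against many candidate regions.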

Figure 3: Identification and Validation of SNVs in Non-Protein Coding DNA Using Functional Genomics Data.

(A) Example Manhattan plot of ‘simulated’ GWAS data; SNVs falling above the blue and red lines, the so-called ‘suggestive’ (p < 1 x 10−5) and ‘genome-wide’ (multiple testing-corrected p < 0.05) thresholds, are considered moderately and highly significant SNVs associated with disease risk, respectively. (B) Higher resolution of one of the significant regions (in this example, a region located on chromosome 15) shows that there can be many possible genes affected by SNVs in this region. Looking closer at the LD map of the SNVs in this region indicates that some SNVs are more significant than others (darker red represents the most significant SNVs in this example). Further information, such as putative enhancer regions identified by the ENCODE project, can be utilized to identify any SNVs that are more likely to have a regulatory role. (C) An even higher resolution exploration of this region might reveal that one of the SNVs falls within a known enhancer region, and can alter a TF DNA binding sequence, changing it from a high affinity binding site to one of low affinity. (D) Using GTEx data to generate rank-normalized allele-specific expression of the different allelic combinations of the SNV can confirm that the SNV identified is associated with altered gene expression in disease-relevant tissues. Image was created with BioRender (https://biorender.com/).
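The two significance thresholds named in the Figure 3A caption can be sketched as a simple classification rule. The caption states only that the genome-wide threshold is multiple testing-corrected; the Bonferroni correction used below, the assumed count of one million tests, and the function name `classify_snv` are assumptions made for illustration.

```python
import math

SUGGESTIVE_P = 1e-5  # 'suggestive' threshold quoted in the caption
ALPHA = 0.05         # family-wise error rate before correction


def classify_snv(p_value, n_tests):
    """Label an association p-value relative to the two thresholds.

    Assumption: the genome-wide threshold is a Bonferroni correction,
    i.e. ALPHA divided by the number of SNVs tested.
    """
    genome_wide_p = ALPHA / n_tests
    if p_value < genome_wide_p:
        return "genome-wide"
    if p_value < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"


def manhattan_height(p_value):
    """Return -log10(p), the value plotted on the y axis of a Manhattan plot."""
    return -math.log10(p_value)


n_tests = 1_000_000  # hypothetical number of SNVs tested
for p in (1e-9, 1e-6, 0.01):
    print(p, classify_snv(p, n_tests), round(manhattan_height(p), 1))
```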

Additional tools to biologically validate an SNV recently emerged with the development of genome editing CRISPR/Cas technology. The CRISPR/Cas system has facilitated editing of targeted single base pairs in the genome, which can allow for efficient and accurate validation of an SNV in many model systems. The CRISPR/Cas9 system has also been modified to utilize Cas9 with deactivated nuclease activity. Instead of editing targeted sites, the activity of regions of the genome can be modified using CRISPR activation (CRISPRa) or CRISPR interference (CRISPRi). These advances have provided a technique for targeted sequence-specific control of transcriptional repression and activation (Gilbert et al. 2014). In order to prioritize causal variants at the TNFAIP3 locus, Ray et al. recently performed a study highlighting the perturbational utility of CRISPRi. Seven experimental methods were compared to evaluate SNVs at the TNFAIP3 locus. Location of an SNV in 1) a CRISPRi-sensitive region or 2) a region of accessible DNA with allele-specific reporter activity proved to be the most useful characteristics for identifying causal SNVs (Ray et al. 2020). CRISPR/Cas methods are also being extended to interrogate larger scale genetic variation such as copy number variants (CNVs) and inversions (Kraft et al. 2015).

6.0. Examples of Allele-Specific Expression

Interindividual variability in drug response can lead to both adverse effects and diminished therapeutic benefits. Much of this interindividual variation is caused by differential expression of enzymes and transporters that largely contribute to xenobiotic disposition (Kuehl et al. 2001, Urquhart et al. 2007, Sanford et al. 2013, Wang et al. 2014). The Pregnane X Receptor (PXR) is a TF that has been shown to control expression of many metabolic enzymes and transporters that contribute to xenobiotic disposition (Tolson and Wang 2010, Aleksunes and Klaassen 2012, Gorczyca and Aleksunes 2020). Smith et al. performed rifampin-induced ChIP-seq and RNA-seq on primary human liver cells in order to map PXR-DNA binding and gene expression (PXR is activated by rifampin). In their analysis they considered three markers of regulation including a marker of functional enhancer regions (p300) and two histone modifications (H3K4me1 and H3K27ac). Thousands of rifampin-induced PXR ChIP-seq peaks were reported and the authors include an elegant discussion section regarding the importance of designing future ChIP-seq experiments to include physiological (basal) conditions as well as conditions in which an environmental exposure is introduced. In fact, the authors state that 7 of the 19 functional elements identified would have been undetected if they had only examined peaks under basal conditions. Of the potential thousands of elements identified in ChIP-seq, screening these enhancer regions using reporter assays identified only a few interesting functional elements. Additionally, the authors identified several novel enhancer regions previously unreported as contributing to expression of these metabolizing enzymes and transporters. Further analysis suggested that SNVs in these enhancers contribute to changes in gene expression that lead to adverse drug responses (Smith et al. 2014).

Another recent example comes from the identification and validation of an SNV in the FTO gene that contributes to obesity (Claussnitzer et al. 2015). Previously, obesity risk had been associated with FTO variation, and GWAS had identified an obesity-linked region that was made up of a total of 89 SNVs (Dina et al. 2007, Frayling et al. 2007). Prior to this publication it was unknown which of these 89 SNVs could be functionally contributing to increased obesity risk. In order to prioritize SNVs the authors utilized epigenetic annotations, chromatin state maps, and regulatory motif conservation, and conducted eQTL analysis. The authors then performed targeted experiments in which they functionally evaluated the risk-associated and non-risk-associated SNV clusters. These studies led the authors to identify an SNV in an enhancer within the FTO locus that was responsible for altered expression of the genes IRX3 and IRX5. They then used CRISPR/Cas9 editing to establish causality of the predicted SNV in adipocytes from risk and non-risk individuals (Claussnitzer et al. 2015).

The methods and recently published examples described in this section serve as an experimental platform for the identification, prioritization, and validation of SNVs in the non-protein coding genome. Using approaches similar to those described above, we recently identified neurodegenerative disease-associated SNVs that alter binding sequences for the stress-responsive NRF1 (NFE2L1) and NRF2 (NFE2L2) TFs (Wang et al. 2016, Lacher et al. 2018). In addition, Wang et al. developed an integrated computational system to identify and prioritize SNVs that disrupt binding sequences for NRF2/NRF1 across the whole genome (Wang et al. 2007). Cellular stress and exposure to xenobiotics modulate the activity of NRF1 and NRF2, and thus environmental exposure will likely amplify the influence of these SNVs on disease.
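One common way to ask whether an SNV is likely to alter a TF binding sequence is to score the reference and alternate alleles against a position weight matrix and compare the two scores. The sketch below shows the idea with a toy 5-bp motif. The count matrix, the uniform background, and the example sequences are invented for illustration and do not represent a real NRF1 or NRF2 motif or the system described by Wang et al.

```python
import math

# Toy position count matrix for a 5-bp motif (one dict per position).
# The counts are invented and are NOT a real NRF1/NRF2 motif.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},   # position 1 prefers T
    {"A": 1, "C": 1, "G": 17, "T": 1},   # position 2 prefers G
    {"A": 17, "C": 1, "G": 1, "T": 1},   # position 3 prefers A
    {"A": 1, "C": 17, "G": 1, "T": 1},   # position 4 prefers C
    {"A": 5, "C": 5, "G": 5, "T": 5},    # position 5 has no preference
]
BACKGROUND = 0.25  # assumption: uniform base composition


def log_odds_score(seq):
    """Sum of per-position log2(frequency / background) for a sequence."""
    if len(seq) != len(COUNTS):
        raise ValueError("sequence length must match motif length")
    score = 0.0
    for base, column in zip(seq.upper(), COUNTS):
        freq = column[base] / sum(column.values())
        score += math.log2(freq / BACKGROUND)
    return score


def allele_effect(ref_seq, alt_seq):
    """Score difference (alt - ref); negative values suggest a weaker match."""
    return log_odds_score(alt_seq) - log_odds_score(ref_seq)


# Hypothetical SNV at the third motif position (A>G).
ref, alt = "TGACA", "TGGCA"
delta = allele_effect(ref, alt)
print(f"ref={log_odds_score(ref):.2f} alt={log_odds_score(alt):.2f} delta={delta:.2f}")
```

A change in motif score is only a prediction of altered binding; it does not by itself show that TF occupancy or target gene expression changes in cells.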

Although the above examples demonstrate disease-associated variants that alter gene expression in a way that is meaningful, it remains difficult to mechanistically demonstrate that a change in expression directly causes disease. However, in the field of pharmacogenomics we do have examples where an increase or decrease of protein levels due to genetic variation can alter drug metabolism. One example is copy number variation in the gene encoding the enzyme CYP2D6 (Dalton et al. 2020). In this case, the function of the CYP2D6 protein remains the same but CNVs alter the amount of total CYP2D6 protein available for metabolizing drugs (e.g. codeine, tamoxifen, selective serotonin reuptake inhibitors, tricyclic antidepressants, antiemetics, and atomoxetine) and this information directly influences dosing and therapeutic decision making. These pharmacogenomic studies provide directly measurable and immediate endpoints that can be conclusively linked to altered metabolism as a result of genetic variation-associated expression changes. However, in the field of toxicogenomics we are just at the beginning of understanding how exposure combined with genetic variation can disrupt gene expression and contribute to disease, especially considering that the endpoints being examined are the effects of long-term exposure over the course of an individual’s lifetime.

7.0. Gene-Environment Interactions

Gene-environment interactions (GxE) are likely to modify the impact of certain disease risk loci and must also be considered when investigating the impact of genetic variation on gene expression. The environment does not alter the DNA sequence itself in GxE; rather, following activation of a TF by a given exposure, SNVs can alter TF binding strength, histone modifications, and gene expression. One well-supported example of GxE is genetic variation in FTO, which increases obesity risk with an effect that is attenuated by exercise (Frayling et al. 2007, Scuteri et al. 2007, Rampersaud et al. 2008). Many studies like this associate a single exposure (exercise) with attenuating the allele-specific expression of a single gene (FTO). Mapping of eQTLs has led to the emergence of response eQTLs (reQTLs), in which the pattern observed for an eQTL differs following exposure to different stimuli (Kim-Hellmuth et al. 2017). A recent report using human monocytes assessed genetic variants associated with exposure to different immune-stimulating compounds (LPS, MDP, and 5’-ppp-dsRNA). The analysis of reQTLs performed in this study provided support for a model in which genetic risk factors for disease can be entirely driven by a failure to properly respond to an external stimulus (Kim-Hellmuth et al. 2017).
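The reQTL concept can be sketched as a comparison of eQTL effect sizes between conditions: expression is regressed on allele dosage separately in unstimulated and stimulated cells, and the difference between the two slopes is the response effect. The donors, dosages, and expression values below are invented, and the simple slope difference is an assumption made for clarity; it is not the statistical model used by Kim-Hellmuth et al. (2017).

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Invented data: alternate-allele dosage (0, 1, 2) for six donors and the
# expression of one gene measured in unstimulated and stimulated cells.
dosage = [0, 0, 1, 1, 2, 2]
expr_baseline = [5.0, 5.2, 5.1, 4.9, 5.0, 5.2]
expr_stimulated = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope_baseline = ols_slope(dosage, expr_baseline)      # eQTL effect at baseline
slope_stimulated = ols_slope(dosage, expr_stimulated)  # eQTL effect after stimulus
response_effect = slope_stimulated - slope_baseline    # genotype-by-stimulus effect

print(f"baseline={slope_baseline:.2f} stimulated={slope_stimulated:.2f} "
      f"response={response_effect:.2f}")
```

In this toy data set the variant has essentially no effect at baseline and a clear effect after stimulation, which is the pattern a standard eQTL analysis of unstimulated cells alone would miss.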

Allele-specific expression analysis is a method similar to eQTL analysis that has recently been used in a high-throughput in vitro system to characterize the cellular response to xenobiotics (Moyerbrailean et al. 2016). This study focused on 5 cell types and 50 xenobiotics (grouped into 6 categories: steroid hormones, peptide hormones, metal ions, dietary components, common drugs, and environmental contaminants). In total, 1455 genes demonstrated allele-specific expression, and 215 genes were described as having GxE. Of the genes that were differentially expressed following exposure, 22% had been associated with complex disease traits in previous GWAS. The authors provide examples of candidate GxE for complex diseases as a web resource (Moyerbrailean et al. 2016).
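At its simplest, allele-specific expression at a heterozygous site can be assessed by asking whether the RNA-seq reads carrying the reference and alternate alleles depart from the 50:50 ratio expected when both alleles are expressed equally. The sketch below applies an exact two-sided binomial test to invented read counts for a control and a treated sample. The counts are hypothetical, and a plain binomial test is a simplifying assumption rather than the model used by Moyerbrailean et al. (2016).

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under the null proportion p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads that carry the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


# Invented (reference, alternate) read counts at one heterozygous site.
control = (52, 48)
treated = (80, 20)

p_control = binomial_two_sided_p(control[0], sum(control))
p_treated = binomial_two_sided_p(treated[0], sum(treated))

print(f"control ratio={allelic_ratio(*control):.2f} p={p_control:.3g}")
print(f"treated ratio={allelic_ratio(*treated):.2f} p={p_treated:.3g}")
```

Allelic imbalance that appears only in the treated sample, as in this toy example, is the kind of pattern that would be followed up as a candidate GxE. Real read counts are typically overdispersed and subject to reference-mapping bias, so dedicated models are generally preferred over a plain binomial test.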

The Gene Environment Association Studies consortium (GENEVA) was initiated in 2006 with a goal of identifying variations in gene-trait associations related to environmental exposures. GENEVA has highlighted that the under-appreciation of GxE in GWAS is due in part to a lack of established methods for how to study these types of interactions in a GWAS context (Cornelis et al. 2010). Integrating GxE into GWAS will likely improve the ability to detect functional genetic variants (Cantor et al. 2010, Thomas 2010). The shift from candidate gene to genome-wide and candidate pathway studies is also expanding to GxE studies (Winham and Biernacka 2013, Dick et al. 2015, Sharafeldin et al. 2015).
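One simple way to express GxE in a case-control setting is as a multiplicative interaction: the odds ratio for carrying a risk allele is estimated separately in exposed and unexposed individuals, and the ratio of the two is taken. For a binary genotype G and a binary exposure E, this ratio corresponds to exp(β_GxE) in the logistic model logit P(disease) = β0 + β_G·G + β_E·E + β_GxE·G·E. The counts below are invented, and the sketch is only one of many possible approaches, not a method prescribed by GENEVA.

```python
def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """Odds ratio for a binary factor from a 2x2 table of counts."""
    return (cases_with * controls_without) / (controls_with * cases_without)


def interaction_odds_ratio(counts):
    """Ratio of the genotype odds ratio in exposed vs unexposed strata.

    `counts[(genotype, exposure)]` is a (cases, controls) tuple.
    A value of 1 indicates no multiplicative interaction.
    """
    or_exposed = odds_ratio(
        *counts[("carrier", "exposed")], *counts[("noncarrier", "exposed")]
    )
    or_unexposed = odds_ratio(
        *counts[("carrier", "unexposed")], *counts[("noncarrier", "unexposed")]
    )
    return or_exposed / or_unexposed


# Invented (cases, controls) counts for each genotype/exposure stratum.
counts = {
    ("carrier", "exposed"): (150, 50),
    ("noncarrier", "exposed"): (100, 100),
    ("carrier", "unexposed"): (100, 100),
    ("noncarrier", "unexposed"): (100, 100),
}

gxe_or = interaction_odds_ratio(counts)
print(f"interaction odds ratio = {gxe_or:.2f}")
```

Here the risk allele is associated with disease only among exposed individuals, so an analysis that ignores exposure would report a diluted main effect of genotype.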

Many gains have been made in the last decade by studying a single SNV or a single exposure; however, many common human diseases are more complex than these approaches can capture (Zuk et al. 2012). GxE studies can instead be focused on a biological pathway that is relevant to the disease and then on the environmental exposures that are relevant to that pathway (Sharafeldin et al. 2015). This strategy has been termed the “candidate-pathway approach”. Such approaches will advance toxicology towards understanding GxE in light of genome-wide variation and a lifetime of exposure.
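The practical appeal of restricting GxE tests to a pathway can be sketched as follows: only SNVs that map to genes in the pathway of interest are paired with exposures relevant to that pathway, which reduces the number of tests and therefore relaxes the multiple-testing threshold. All SNV, gene, exposure, and pathway names below are hypothetical, and the Bonferroni correction is used here only as a familiar example of a multiple-testing adjustment.

```python
from itertools import product


def candidate_pairs(snv_to_gene, pathway_genes, exposure_to_pathways, pathway):
    """Return (SNV, exposure) pairs restricted to one pathway."""
    snvs = sorted(s for s, g in snv_to_gene.items() if g in pathway_genes)
    exposures = sorted(e for e, p in exposure_to_pathways.items() if pathway in p)
    return list(product(snvs, exposures))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Invented mappings used only to illustrate the filtering step.
snv_to_gene = {"snv_1": "GENE_A", "snv_2": "GENE_B", "snv_3": "GENE_C"}
pathway_genes = {"GENE_A", "GENE_C"}
exposure_to_pathways = {
    "exposure_x": {"pathway_1"},
    "exposure_y": {"pathway_2"},
    "exposure_z": {"pathway_1", "pathway_2"},
}

pairs = candidate_pairs(snv_to_gene, pathway_genes, exposure_to_pathways, "pathway_1")
all_pairs = len(snv_to_gene) * len(exposure_to_pathways)

print(f"{len(pairs)} of {all_pairs} possible SNV-exposure pairs tested")
print(f"per-test threshold: {bonferroni_threshold(len(pairs)):.4f}")
```

The trade-off is that interactions outside the chosen pathway are never tested, so the quality of the pathway annotation limits what can be found.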

8.0. Conclusions

Many resources exist that annotate non-protein coding SNVs identified in GWAS and combine them with conservation, functional genomics, and TF-DNA binding data (McLaren et al. 2010, Boyle et al. 2012, Ward and Kellis 2012, Ward and Kellis 2012). Beyond SNVs, larger structural variants (e.g. CNVs, inversions) that influence many nucleotides and can often affect more than one gene are also correlated with SNV inheritance (Wang et al. 2008). As advancements in technology and experimental approaches are made, these resources will need to be continuously updated and integrated. When used in combination, these resources provide hints to the biological implications of genetic variation in the non-protein coding genome. These hints can then be used to generate hypotheses and design experiments to dissect the target genes, epigenetic markers, pathways, cell types, and exposures that can potentially be disrupted by an SNV, and how this collectively can result in disease. Knowledge of the impact of genetic variation on drug metabolizing enzymes and transporters has been used to direct precision medicine, with much of this impact attributed to CNVs and thus to protein expression. In a similar fashion, we hope that combining GWAS, functional genomics of gene expression, and exposure data using the strategies described herein will help to further decipher the impact of genetic variation and exposure on disease, improve models of exposure risk, identify at-risk individuals, and inform regulatory policy.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Matthew Slattery, Dr. Salil Saurav Pathak, and Jennifer Krznarich M.S. for helpful advice and feedback in the preparation of this review. This work was supported by the National Institute of General Medical Sciences (R35-GM-119553 to Matthew Slattery). The authors have no competing interests to declare.

Abbreviations

SNV

Single nucleotide variant

GWAS

Genome wide association studies

TSS

Transcription start site

TFs

Transcription factors

EMSA

Electrophoretic mobility shift assays

SELEX-seq

Systematic evolution of ligands by exponential enrichment coupled with sequencing

HT-SELEX

High-throughput systematic evolution of ligands by exponential enrichment

MPRA

Massively parallel reporter assays

ENCODE

Encyclopedia of DNA elements

ChIP-seq

Chromatin immunoprecipitation coupled with sequencing

H3K27ac

Histone H3 lysine 27 acetylation

H3K4me3

Histone H3 lysine 4 tri-methylation

H3K27me3

Histone H3 lysine 27 tri-methylation

H3K4me1

Histone H3 lysine 4 mono-methylation

TADs

Topologically associating domains

eQTL

Expression quantitative trait loci

CRISPRa

CRISPR activation

CRISPRi

CRISPR interference

CNV

Copy number variant

GTEx

Genotype-Tissue Expression

GxE

Gene-environment interactions

reQTLs

Response eQTLs

GENEVA

The Gene Environment Association Studies consortium

Footnotes

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Aleksunes LM and Klaassen CD (2012). "Coordinated regulation of hepatic phase I and II drug-metabolizing genes and transporters using AhR-, CAR-, PXR-, PPARalpha-, and Nrf2-null mice." Drug Metab Dispos 40(7): 1366–1379.
  2. Bai L and Morozov AV (2010). "Gene regulation by nucleosome positioning." Trends Genet 26(11): 476–483.
  3. Barski A, et al. (2007). "High-resolution profiling of histone methylations in the human genome." Cell 129(4): 823–837.
  4. Belaghzal H, et al. (2017). "Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation." Methods 123: 56–65.
  5. Boyle AP, et al. (2012). "Annotation of functional variation in personal genomes using RegulomeDB." Genome Res 22(9): 1790–1797.
  6. Boyle EA, et al. (2017). "An Expanded View of Complex Traits: From Polygenic to Omnigenic." Cell 169(7): 1177–1186.
  7. Buenrostro JD, et al. (2013). "Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position." Nat Methods 10(12): 1213–1218.
  8. Bulger M and Groudine M (2011). "Functional and mechanistic diversity of distal transcription enhancers." Cell 144(3): 327–339.
  9. Cantor RM, et al. (2010). "Prioritizing GWAS results: A review of statistical methods and recommendations for their application." Am J Hum Genet 86(1): 6–22.
  10. Cao R, et al. (2002). "Role of histone H3 lysine 27 methylation in Polycomb-group silencing." Science 298(5595): 1039–1043.
  11. Claussnitzer M, et al. (2015). "FTO Obesity Variant Circuitry and Adipocyte Browning in Humans." N Engl J Med 373(10): 895–907.
  12. ENCODE Project Consortium (2012). "An integrated encyclopedia of DNA elements in the human genome." Nature 489(7414): 57–74.
  13. Cornelis MC, et al. (2010). "The Gene, Environment Association Studies consortium (GENEVA): maximizing the knowledge obtained from GWAS by collaboration across studies of multiple conditions." Genet Epidemiol 34(4): 364–372.
  14. Dai J, et al. (2015). "Systematical analyses of variants in CTCF-binding sites identified a novel lung cancer susceptibility locus among Chinese population." Sci Rep 5: 7833.
  15. Dalton R, et al. (2020). "Interrogation of CYP2D6 Structural Variant Alleles Improves the Correlation Between CYP2D6 Genotype and CYP2D6-Mediated Metabolic Activity." Clin Transl Sci 13(1): 147–156.
  16. Daly MJ, et al. (2001). "High-resolution haplotype structure in the human genome." Nat Genet 29(2): 229–232.
  17. De Gobbi M, et al. (2006). "A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter." Science 312(5777): 1215–1217.
  18. de Wit E and de Laat W (2012). "A decade of 3C technologies: insights into nuclear organization." Genes Dev 26(1): 11–24.
  19. Dekker J, et al. (2013). "Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data." Nat Rev Genet 14(6): 390–403.
  20. Deng W, et al. (2012). "Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor." Cell 149(6): 1233–1244.
  21. Deplancke B, et al. (2016). "The Genetics of Transcription Factor DNA Binding Variation." Cell 166(3): 538–554.
  22. Dick DM, et al. (2015). "Candidate gene-environment interaction research: reflections and recommendations." Perspect Psychol Sci 10(1): 37–59.
  23. Dimas AS, et al. (2009). "Common regulatory variation impacts gene expression in a cell type-dependent manner." Science 325(5945): 1246–1250.
  24. Dina C, et al. (2007). "Variation in FTO contributes to childhood obesity and severe adult obesity." Nat Genet 39(6): 724–726.
  25. Drake TA, et al. (2005). "Integrating genetic and gene expression data to study the metabolic syndrome and diabetes in mice." Am J Ther 12(6): 503–511.
  26. Eraslan G, et al. (2019). "Deep learning: new computational modelling techniques for genomics." Nat Rev Genet.
  27. Ernst J and Kellis M (2013). "Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types." Genome Res 23(7): 1142–1154.
  28. Farber CR, et al. (2009). "Genetic dissection of a major mouse obesity QTL (Carfhg2): integration of gene expression and causality modeling." Physiol Genomics 37(3): 294–302.
  29. Fisher RA (1918). "The correlation between relatives on the supposition of Mendelian inheritance." Trans R Soc Edinb 52: 399–433.
  30. Fisher WW, et al. (2012). "DNA regions bound at low occupancy by transcription factors do not drive patterned reporter gene expression in Drosophila." Proc Natl Acad Sci U S A 109(52): 21330–21335.
  31. Fogarty MP, et al. (2014). "Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus." PLoS Genet 10(9): e1004633.
  32. Fransen K, et al. (2010). "Analysis of SNPs with an effect on gene expression identifies UBE2L3 and BCL3 as potential new risk genes for Crohn's disease." Hum Mol Genet 19(17): 3482–3488.
  33. Frayling TM, et al. (2007). "A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity." Science 316(5826): 889–894.
  34. Furey TS (2012). "ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions." Nat Rev Genet 13(12): 840–852.
  35. Gasperini M, et al. (2020). "Towards a comprehensive catalogue of validated and target-linked human enhancers." Nat Rev Genet.
  36. Gilbert LA, et al. (2014). "Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation." Cell 159(3): 647–661.
  37. Gorczyca L and Aleksunes LM (2020). "Transcription factor-mediated regulation of the BCRP/ABCG2 efflux transporter: a review across tissues and species." Expert Opin Drug Metab Toxicol 16(3): 239–253.
  38. GuhaThakurta D, et al. (2006). "Cis-regulatory variations: a study of SNPs around genes showing cis-linkage in segregating mouse populations." BMC Genomics 7: 235.
  39. Heintzman ND, et al. (2007). "Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome." Nat Genet 39(3): 311–318.
  40. Inoue F and Ahituv N (2015). "Decoding enhancers using massively parallel reporter assays." Genomics 106(3): 159–164.
  41. Johnson DS, et al. (2007). "Genome-wide mapping of in vivo protein-DNA interactions." Science 316(5830): 1497–1502.
  42. Jolma A, et al. (2010). "Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities." Genome Res 20(6): 861–873.
  43. Karczewski KJ, et al. (2013). "Systematic functional regulatory assessment of disease-associated variants." Proc Natl Acad Sci U S A 110(23): 9607–9612.
  44. Kasarskis A, et al. (2011). "Integrative genomics strategies to elucidate the complexity of drug response." Pharmacogenomics 12(12): 1695–1715.
  45. Kearnes S, et al. (2016). "Molecular graph convolutions: moving beyond fingerprints." J Comput Aided Mol Des 30(8): 595–608.
  46. Kelley DR, et al. (2018). "Sequential regulatory activity prediction across chromosomes with convolutional neural networks." Genome Res 28(5): 739–750.
  47. Kilpinen H, et al. (2013). "Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription." Science 342(6159): 744–747.
  48. Kim-Hellmuth S, et al. (2017). "Genetic regulatory effects modified by immune activation contribute to autoimmune disease associations." Nat Commun 8(1): 266.
  49. Kirsten H, et al. (2015). "Dissecting the genetics of the human transcriptome identifies novel trait-related trans-eQTLs and corroborates the regulatory relevance of non-protein coding loci." Hum Mol Genet 24(16): 4746–4763.
  50. Kraft K, et al. (2015). "Deletions, Inversions, Duplications: Engineering of Structural Variants using CRISPR/Cas in Mice." Cell Rep.
  51. Kuehl P, et al. (2001). "Sequence diversity in CYP3A promoters and characterization of the genetic basis of polymorphic CYP3A5 expression." Nat Genet 27(4): 383–391.
  52. Lacher SE, et al. (2018). "A hypermorphic antioxidant response element is associated with increased MS4A6A expression and Alzheimer's disease." Redox Biol 14: 686–693.
  53. Lacher SE, et al. (2015). "Beyond antioxidant genes in the ancient Nrf2 regulatory network." Free Radic Biol Med 88(Pt B): 452–465.
  54. Lacher SE and Slattery M (2016). "Gene regulatory effects of disease-associated variation in the NRF2 network." Curr Opin Toxicol 1: 71–79.
  55. Lander ES (2011). "Initial impact of the sequencing of the human genome." Nature 470(7333): 187–197.
  56. Liu X, et al. (2019). "Trans Effects on Gene Expression Can Drive Omnigenic Inheritance." Cell 177(4): 1022–1034 e1026.
  57. Lum PY, et al. (2006). "Elucidating the murine brain transcriptional network in a segregating mouse population to identify core functional modules for obesity and diabetes." J Neurochem 97 Suppl 1: 50–62.
  58. Manolio TA, et al. (2009). "Finding the missing heritability of complex diseases." Nature 461(7265): 747–753.
  59. Maurano MT, et al. (2012). "Systematic localization of common disease-associated variation in regulatory DNA." Science 337(6099): 1190–1195.
  60. McLaren W, et al. (2010). "Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor." Bioinformatics 26(16): 2069–2070.
  61. McVicker G, et al. (2013). "Identification of genetic variants that affect histone modifications in human cells." Science 342(6159): 747–749.
  62. Mehrabian M, et al. (2005). "Integrating genotypic and expression data in a segregating mouse population to identify 5-lipoxygenase as a susceptibility gene for obesity and bone traits." Nat Genet 37(11): 1224–1233.
  63. Meng H, et al. (2007). "Identification of Abcc6 as the major causal gene for dystrophic cardiac calcification in mice through integrative genomics." Proc Natl Acad Sci U S A 104(11): 4530–4535.
  64. Miguel-Escalada I, et al. (2015). "Transcriptional enhancers: functional insights and role in human disease." Curr Opin Genet Dev 33: 71–76.
  65. Mikkelsen TS, et al. (2007). "Genome-wide maps of chromatin state in pluripotent and lineage-committed cells." Nature 448(7153): 553–560.
  66. Moffatt MF, et al. (2007). "Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma." Nature 448(7152): 470–473.
  67. Moyerbrailean GA, et al. (2016). "High-throughput allele-specific expression across 250 environmental conditions." Genome Res 26(12): 1627–1638.
  68. Nica AC and Dermitzakis ET (2013). "Expression quantitative trait loci: present and future." Philos Trans R Soc Lond B Biol Sci 368(1620): 20120362.
  69. Nicoletti C, et al. (2018). "Computational methods for analyzing genome-wide chromosome conformation capture data." Curr Opin Biotechnol 54: 98–105.
  70. Nishizaki SS and Boyle AP (2017). "Mining the Unknown: Assigning Function to Noncoding Single Nucleotide Polymorphisms." Trends Genet 33(1): 34–45.
  71. Palazzo AF and Gregory TR (2014). "The case for junk DNA." PLoS Genet 10(5): e1004351.
  72. Pasquali L, et al. (2014). "Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants." Nat Genet 46(2): 136–143.
  73. Patwardhan RP, et al. (2012). "Massively parallel functional dissection of mammalian enhancers in vivo." Nat Biotechnol 30(3): 265–270.
  74. Phillips-Cremins JE, et al. (2013). "Architectural protein subclasses shape 3D organization of genomes during lineage commitment." Cell 153(6): 1281–1295.
  75. Rampersaud E, et al. (2008). "Physical activity and the association of common FTO gene variants with body mass index and obesity." Arch Intern Med 168(16): 1791–1797.
  76. Rando OJ (2012). "Combinatorial complexity in chromatin structure and function: revisiting the histone code." Curr Opin Genet Dev 22(2): 148–155.
  77. Rao SS, et al. (2014). "A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping." Cell 159(7): 1665–1680.
  78. Ray JP, et al. (2020). "Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features." Nat Commun 11(1): 1237.
  79. Roadmap Epigenomics Consortium, et al. (2015). "Integrative analysis of 111 reference human epigenomes." Nature 518(7539): 317–330.
  80. Rothbart SB and Strahl BD (2014). "Interpreting the language of histone and DNA modifications." Biochim Biophys Acta 1839(8): 627–643.
  81. Sanford JC, et al. (2013). "Regulatory polymorphisms in CYP2C19 affecting hepatic expression." Drug Metabol Drug Interact 28(1): 23–30.
  82. Schadt EE (2009). "Molecular networks as sensors and drivers of common human diseases." Nature 461(7261): 218–223.
  83. Schadt EE, et al. (2005). "An integrative genomics approach to infer causal associations between gene expression and disease." Nat Genet 37(7): 710–717.
  84. Schaub MA, et al. (2012). "Linking disease associations with regulatory information in the human genome." Genome Res 22(9): 1748–1759.
  85. Scuteri A, et al. (2007). "Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits." PLoS Genet 3(7): e115.
  86. Sharafeldin N, et al. (2015). "A Candidate-Pathway Approach to Identify Gene-Environment Interactions: Analyses of Colon Cancer Risk and Survival." J Natl Cancer Inst 107(9).
  87. Slattery M, et al. (2014). "Absence of a simple code: how transcription factors read the genome." Trends Biochem Sci 39(9): 381–399.
  88. Smith RP, et al. (2014). "Genome-wide discovery of drug-dependent human liver regulatory elements." PLoS Genet 10(10): e1004648.
  89. Strahl BD and Allis CD (2000). "The language of covalent histone modifications." Nature 403(6765): 41–45.
  90. Thomas D (2010). "Gene-environment-wide association studies: emerging approaches." Nat Rev Genet 11(4): 259–272.
  91. Tolson AH and Wang H (2010). "Regulation of drug-metabolizing enzymes by xenobiotic receptors: PXR and CAR." Adv Drug Deliv Rev 62(13): 1238–1249.
  92. Tsompana M and Buck MJ (2014). "Chromatin accessibility: a window into the genome." Epigenetics Chromatin 7(1): 33.
  93. Urquhart BL, et al. (2007). "Nuclear receptors and the regulation of drug-metabolizing enzymes and drug transporters: implications for interindividual variability in response to drugs." J Clin Pharmacol 47(5): 566–578.
  94. Visel A, et al. (2009). "Genomic views of distant-acting enhancers." Nature 461(7261): 199–205.
  95. Visscher PM and Goddard ME (2019). "From R.A. Fisher's 1918 Paper to GWAS a Century Later." Genetics 211(4): 1125–1130.
  96. Visscher PM, et al. (2010). "From Galton to GWAS: quantitative genetics of human height." Genet Res (Camb) 92(5-6): 371–379.
  97. Visscher PM, et al. (2017). "10 Years of GWAS Discovery: Biology, Function, and Translation." Am J Hum Genet 101(1): 5–22.
  98. Wang D, et al. (2014). "Common CYP2D6 polymorphisms affecting alternative splicing and transcription: long-range haplotypes with two regulatory variants modulate CYP2D6 activity." Hum Mol Genet 23(1): 268–278.
  99. Wang H, et al. (2018). "Crosstalk of Genetic Variants, Allele-Specific DNA Methylation, and Environmental Factors for Complex Disease Risk." Front Genet 9: 695.
  100. Wang K, et al. (2008). "Modeling genetic inheritance of copy number variations." Nucleic Acids Res 36(21): e138.
  101. Wang X, et al. (2016). "A Polymorphic Antioxidant Response Element Links NRF2/sMAF Binding to Enhanced MAPT Expression and Reduced Risk of Parkinsonian Disorders." Cell Rep 15(4): 830–842.
  102. Wang X, et al. (2007). "Identification of polymorphic antioxidant response elements in the human genome." Hum Mol Genet 16(10): 1188–1200.
  103. Ward LD and Kellis M (2012). "HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants." Nucleic Acids Res 40(Database issue): D930–934.
  104. Ward LD and Kellis M (2012). "Interpreting noncoding genetic variation in complex traits and human disease." Nat Biotechnol 30(11): 1095–1106.
  105. Ward LD and Kellis M (2016). "HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease." Nucleic Acids Res 44(D1): D877–881.
  106. Welter D, et al. (2014). "The NHGRI GWAS Catalog, a curated resource of SNP-trait associations." Nucleic Acids Res 42(Database issue): D1001–1006.
  107. Winham SJ and Biernacka JM (2013). "Gene-environment interactions in genome-wide association studies: current approaches and new directions." J Child Psychol Psychiatry 54(10): 1120–1134.
  108. Xin B and Rohs R (2018). "Relationship between histone modifications and transcription factor binding is protein family specific." Genome Res.
  109. Yaguchi H, et al. (2005). "Identification of candidate genes in the type 2 diabetes modifier locus using expression QTL." Genomics 85(5): 591–599.
  110. Zeron-Medina J, et al. (2013). "A polymorphic p53 response element in KIT ligand influences cancer risk and has undergone natural selection." Cell 155(2): 410–422.
  111. Zhang F and Lupski JR (2015). "Non-coding genetic variants in human disease." Hum Mol Genet 24(R1): R102–110.
  112. Zhao Y, et al. (2009). "Inferring binding energies from selected binding sites." PLoS Comput Biol 5(12): e1000590.
  113. Zhou J, et al. (2018). "Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk." Nat Genet 50(8): 1171–1179.
  114. Zitnik M, et al. (2018). "Modeling polypharmacy side effects with graph convolutional networks." Bioinformatics 34(13): i457–i466.
  115. Zuk O, et al. (2012). "The mystery of missing heritability: Genetic interactions create phantom heritability." Proc Natl Acad Sci U S A 109(4): 1193–1198.
