Summary
The NRF2/sMAF protein complex regulates the oxidative stress response by occupying cis-acting enhancers containing an antioxidant response element (ARE). Integrating genome-wide maps of NRF2/sMAF occupancy with disease-susceptibility loci, we discovered 8 polymorphic AREs linked to 14 highly-ranked disease-risk SNPs in Caucasians. Among these SNPs was rs242561, located within a regulatory region of the MAPT gene (encoding microtubule-associated protein Tau). This SNP was consistently occupied by NRF2/sMAF in multiple experiments, and its strong-binding allele was associated with higher MAPT mRNA levels in cell lines and human brain tissue. Induction of MAPT transcription by NRF2 was confirmed using a human neuroblastoma cell line and an Nrf2-deficient mouse model. Most importantly, rs242561 displayed complete linkage disequilibrium with a highly protective allele identified in multiple GWASs of progressive supranuclear palsy, Parkinson’s disease, and corticobasal degeneration. These observations suggest a potential role for NRF2/sMAF in tauopathies and a possible role for NRF2 pathway activators in disease prevention.
Introduction
Identifying genetic variants that play a mechanistic role in human disease is critical for better understanding the origins, progression and treatment of diseases, and for identifying biomarkers that could reveal individuals at increased risk of developing disease. Genome-wide association studies (GWASs) have identified approximately 2000 single nucleotide polymorphisms (SNPs) significantly (p < 5 × 10⁻⁸) associated with susceptibility to more than 70 human diseases (Welter et al., 2014). However, differentiating the causal SNPs responsible for the disease association from the many non-functional, associated SNPs has been difficult. Large-scale initiatives such as the Encyclopedia of DNA Elements (ENCODE) (Consortium, 2012) and the Roadmap Epigenomics Project (Consortium, 2015b) have mapped catalogues of regulatory regions, including genome-wide occupancy of hundreds of transcription factors (TFs) across dozens of cell lines using ChIP-seq (chromatin immunoprecipitation-sequencing). The 1000 Genomes Project has built a resource for determining genetic linkage based on the sequenced genomes of 1,092 individuals from 14 populations (Genomes Project et al., 2012). Importantly, about 80% of disease-associated SNPs identified in GWASs map to non-coding functional DNA elements identified by ENCODE, a significant enrichment (Schaub et al., 2012). We anticipate that variants affecting TF binding in gene regulatory elements may lead to downstream differences in gene expression and be the underlying mechanism causing a disease phenotype. Recently, integration of GWAS data with TF occupancy has strongly supported a role for polymorphic transcriptional regulatory elements in the risk of cancer (Zeron-Medina et al., 2013) and other diseases (Fogarty et al., 2014; Karczewski et al., 2013; Maurano et al., 2012; Schaub et al., 2012).
Furthermore, incorporating eQTL (expression quantitative trait loci) information from multiple tissues into the analysis of GWAS results can provide functional support linking SNPs that affect gene expression with disease risk (Dubois et al., 2010; Farh et al., 2015).
The transcription factor NRF2 (nuclear factor, erythroid 2 like 2; NFE2L2) is the “master regulator” of the cellular antioxidant response to oxidative or electrophilic stress. Under basal conditions, NRF2 resides mainly in the cytoplasm bound to its cysteine-rich, Kelch domain-containing partner KEAP1, which is anchored to the actin cytoskeleton and regulates ubiquitination and degradation of NRF2. Reactive oxygen species (ROS) or NRF2 pathway activators modify cysteine residues of KEAP1, leading to conformational change and disrupting the KEAP1-NRF2 complex. Thus, NRF2 accumulates, translocates to the nucleus and heterodimerizes with bZIP proteins such as small MAF proteins (e.g. MAFG, MAFF, and MAFK) to form a transactivation complex that binds to antioxidant response elements (AREs) (Rushmore et al., 1991; Tong et al., 2006; Wakabayashi et al., 2004). These cis-acting enhancer ARE sequences are found in the promoter or enhancer regions of many genes encoding antioxidant and Phase II detoxification enzymes/proteins (Hayes and McMahon, 2001; Itoh et al., 1997). In addition, NRF2 binds AREs in genes that participate in diverse processes such as immune and inflammatory responses, metabolism, tissue remodeling and fibrosis, metastasis, cognitive dysfunction, and addictive behavior (Hayes and Dinkova-Kostova, 2014). Thus, dysregulation of NRF2-regulated genes could provide a plausible explanation for the connections between oxidative stress and numerous human diseases (Hybertson and Gao, 2014), including chronic neurological diseases (Yamazaki et al., 2015). The functional ARE bound by NRF2 is represented as the consensus sequence 5’-TMAnnRTGAYnnnGCR-3’, where M = A or C; R = A or G; Y = C or T, and the ‘core’ consensus underlined (Nioi et al., 2003; Wasserman and Fahl, 1997). Recently, a large number of genomic regions occupied by NRF2 and small MAF proteins have been discovered using ChIP-seq (Chorley et al., 2012; Consortium, 2012). 
Despite the progress in identifying ARE target genes, little is known about how polymorphisms might affect their function and impact disease susceptibility.
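The consensus sequence quoted above can be turned into a simple sequence scan. The following is a minimal sketch, assuming plain IUPAC matching on both strands; the function names and the 20-bp example sequence are hypothetical, and exact consensus matching is not the PWM scoring used later in this study.

```python
import re

# IUPAC codes from the consensus in the text: M = A/C, R = A/G, Y = C/T, n = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "[AC]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
ARE_CONSENSUS = "TMAnnRTGAYnnnGCR"


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # The zero-width lookahead allows overlapping sites to be reported.
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_are_sites(seq):
    """Return sorted (start, strand) pairs for consensus matches on either strand.

    start is 0-based and always given in forward-strand coordinates.
    """
    pattern = consensus_to_regex(ARE_CONSENSUS)
    width = len(ARE_CONSENSUS)
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical sequence: two flanking bases, one consensus-matching site, two more bases.
example = "GG" + "TCAGCATGACTCAGCA" + "TT"
print(find_are_sites(example))
```

A strict consensus match is a much cruder criterion than a position weight matrix, which scores partial matches; the sketch only illustrates how the degenerate positions are read.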
In this report, using linkage disequilibrium (LD, the non-random association between alleles at two genome locations) data from the 1000 Genomes Project to enable integration of datasets, we combined genome-wide maps of NRF2/sMAF occupancy with disease risk SNPs identified in GWASs. This allowed identification of genetic variants located in functional AREs and associated with diseases. We then integrated diverse functional datasets to provide evidence that several ARE SNPs affect NRF2-regulated transcription, particularly of the MAPT gene. From our analysis, we propose a plausible oxidative stress-based mechanism for an ARE SNP in MAPT mediating Parkinsonian diseases.
Results
Functional Antioxidant Response Elements Link to the Strongest GWAS SNPs in Several Human Diseases
Numerous mouse model studies implicate Nrf2 as a susceptibility factor in a broad range of oxidative stress-related pathologies (Cho and Kleeberger, 2015; Hybertson and Gao, 2014; Johnson and Johnson, 2015), suggesting that SNPs affecting functional human NRF2 binding sites (i.e. AREs) could affect the risk of many diseases. Thus, we sought to detect GWAS disease SNPs or their proxies (i.e. highly correlated SNPs in LD with a GWAS SNP) that reside in genomic regions occupied by NRF2 and/or small MAF proteins, and that contain a strong ARE motif. To this end, we implemented a data analysis pipeline (Figure 1A) to integrate multiple genome-wide datasets and to select candidates with functional evidence.
Figure 1.
Study design and summary of candidates. (A) Overview of workflow for identifying functional ARE SNPs associated with disease; (B) The significance of GWAS SNPs linked to ARE SNPs.
To reduce false positives we used a stringent significance cut-off (p ≤ 5 × 10⁻⁸) for GWAS SNPs and considered only studies with large population sizes (≥1000 individuals of European ancestry, EUR). We found 1,016 GWAS SNPs reported to be associated with 80 different diseases in 134 individual studies, and we identified 13,027 additional SNPs (or proxies) that were in “complete LD” (r² ≥ 0.95 in EUR) with the GWAS SNPs. We refer to these GWAS SNPs and proxies as risk SNPs. To assess which of these risk SNPs reside in NRF2/sMAF occupied genomic regions, we integrated NRF2/sMAF ChIP-seq data derived from 15 unique experiments in 8 cell types (listed in Table S1). Included among these experiments were two we generated previously using lymphoblastoid cells treated with vehicle or with 10 μM sulforaphane (SFN) (Chorley et al., 2012), a potent activator of NRF2 and up-regulator of the antioxidant response pathway.
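To make the proxy criterion concrete, the sketch below computes r² between two biallelic sites from phased haplotype counts and applies the r² ≥ 0.95 cut-off. The function names and the haplotype counts are hypothetical, and the calculation assumes phased data; it is not the 1000 Genomes pipeline itself.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic sites from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at site 1 and allele B at site 2, etc.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B  # coefficient of linkage disequilibrium, D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def is_proxy(r2, threshold=0.95):
    """Apply the 'complete LD' cut-off used in the text (r^2 >= 0.95)."""
    return r2 >= threshold


# Hypothetical counts for 1,000 phased haplotypes.
r2 = ld_r_squared(240, 5, 3, 752)
print(round(r2, 3), is_proxy(r2))
```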
There were 9,852 NRF2 peaks reproducible in at least two experiments. Together with the reproducible ENCODE ChIP-seq peaks for MAFF (15,602) and MAFK (45,907), a total of 51,573 individual ChIP-seq peaks were observed for one or more TFs (Figure S1A). We confirmed that the top enriched motif in NRF2 ChIP-seq was also highly enriched in both MAFF and MAFK ChIP-seq, and the sequence logos were highly similar (p ≤ 3.0 × 10⁻⁴, by Tomtom (Machanick and Bailey, 2011)) (Figure S1B). By intersecting the reproducible ChIP-seq peaks with GWAS data, we determined that 94 SNPs reside in genomic regions occupied by NRF2/sMAF and are linked to disease risk (Table S2). To further evaluate which of these 94 SNPs also reside in a putative binding site (ARE), the sequences surrounding these SNPs were analyzed by position weight matrix (PWM) calculations (Wang et al., 2007). We determined that 14 SNPs reside in occupied AREs with PWM scores ≥ 6.4 (the lowest PWM score among known AREs). These 14 ARE SNPs link to 21 unique GWAS SNPs (Table 1, Figure 1B).
Table 1.
ARE SNPs and their linked disease-risk SNPs
| ARE SNP | Allele (PWM) | ChIP-seq peaks (NRF2, MAFF, MAFK; empty cells omitted) | Nearest gene (location) | GWAS SNP | Disease | p-value | Rank in GWAS | PubMed ID |
|---|---|---|---|---|---|---|---|---|
| rs242561* | T(19.7) C(17.6) | 6, 2, 6 | MAPT (intron) | rs8070723 | progressive supranuclear palsy | 1.50E-116 | 1 | 21685912 |
| | | | | rs8070723 | Parkinson’s disease | 7.00E-12 | 2 | 21044948 |
| | | | | rs8070723 | corticobasal degeneration | 1.30E-08 | 1 | 26077951 |
| | | | | rs1981997 | interstitial lung disease | 9.00E-14 | 4 | 23583980 |
| rs241032* | C(15.4) T(10.5) | 4 | CRHR1-IT1 (downstream) | rs2942168 | Parkinson’s disease | 1.00E-28 | 2 | 21292315 |
| | | | | rs393152 | Parkinson’s disease | 2.00E-16 | 1 | 19915575 |
| rs62033400* | A(7.1) G(7.0) | 2 | FTO (intron) | rs8043757 | obesity | 5.00E-110 | 1 | 23563607 |
| | | | | rs7185735 | obesity | 1.00E-79 | 3 | 23563607 |
| | | | | rs17817449 | obesity | 2.00E-12 | 1 | 21552555 |
| | | | | rs9939609 | obesity | 1.90E-105 | 1 | 25104851 |
| | | | | rs9939609 | obesity | 4.90E-74 | 1 | 19079261 |
| | | | | rs9939609 | type 2 diabetes | 1.00E-20 | 3 | 22693455 |
| | | | | rs8050136 | type 2 diabetes | 2.00E-17 | 2 | 19056611 |
| | | | | rs8050136 | type 2 diabetes | 7.00E-14 | 3 | 17463249 |
| | | | | rs8050136 | type 2 diabetes | 1.00E-12 | 3 | 17463248 |
| | | | | rs9936385 | type 2 diabetes | 1.00E-12 | 9 | 24509480 |
| | | | | rs17817449 | breast cancer | 6.00E-14 | 29 | 23535729 |
| rs6426833* | A(12.0) G(7.0) | 5 | RNF186 (upstream) | rs6426833 | ulcerative colitis | 2.00E-68 | 2 | 23128233 |
| | | | | rs6426833 | ulcerative colitis | 4.00E-35 | 2 | 21297633 |
| | | | | rs6426833 | ulcerative colitis | 2.00E-21 | 1 | 20228799 |
| | | | | rs6426833 | ulcerative colitis | 5.00E-13 | 2 | 19122664 |
| | | | | rs6426833 | ulcerative colitis | 2.00E-11 | 3 | 19915572 |
| rs9603754 | C(7.8) G(6.6) | 2 | LINC00598 (intron) | rs941823 | inflammatory bowel disease | 2.00E-14 | 46 | 23128233 |
| | | | | rs941823 | ulcerative colitis | 4.00E-12 | 23 | 21297633 |
| rs17035378 | A(14.9) G(14.4) | 2, 2 | PLEK (intron) | rs17035378 | celiac disease | 8.00E-09 | 24 | 20190752 |
| rs12638492 | A(7.7) G(7.6) | 2 | ILDR1 (intron) | rs4285028 | multiple sclerosis | 2.00E-08 | 45 | 21833088 |
| rs6426519* | G(9.5) A(7.4) | 4 | RHOU (downstream) | rs801114 | basal cell carcinoma | 2.00E-13 | 4 | 24403052 |
| | | | | rs801114 | basal cell carcinoma | 6.00E-12 | 2 | 18849993 |
| rs16857611 | A(7.8) G(2.9) | 3, 2, 5 | DIRC3 (intron) | rs16857609 | breast cancer | 1.00E-15 | 24 | 23535729 |
| rs13067040 | A(8.9) G(7.5) | 2 | MIR944 (downstream) | rs710521 | bladder cancer | 2.00E-11 | 6 | 24163127 |
| | | | | rs710521 | bladder cancer | 2.00E-10 | 7 | 20972438 |
| rs369184* | A(10.3) G(5.4) | 1, 6 | TEX14 (intron) | rs9905704 | testicular germ cell tumor | 4.00E-13 | 2 | 23666239 |
| | | | | rs9905704 | testicular germ cell tumor | 3.00E-09 | 9 | 23666240 |
| rs4818832 | G(11.6) A(11.4) | 2, 3 | YBEY (intron) | rs2839186 | testicular germ cell tumor | 1.00E-09 | 8 | 23666240 |
| rs9884209* | A(20.1) G(18.8) | 1, 2 | SMARCAD1 (downstream) | rs17021463 | testicular germ cell tumor | 1.00E-08 | 5 | 23666239 |
| rs62094906* | C(11.6) T(7.9) | 3 | LOC101927571 (downstream) | rs11661542 | intracranial aneurysm | 1.00E-12 | 2 | 20364137 |

* Linked to a top-5 GWAS risk SNP.
See also Table S2.
To assess the relative contribution of these ARE SNPs among the linked diseases, the GWAS SNPs linked with ARE SNPs (the GWAS SNP column of Table 1) were ranked by p-value in their corresponding original studies. We found that 14 of the 21 (67%) GWAS SNPs were ranked among the top 5 most significant SNPs (hereafter referred to as “highly-ranked GWAS SNPs”). Among them, four had very significant p-values and were replicated in multiple studies. Sorting by GWAS p-value, the most significant SNP, rs8070723, located in an intragenic region of the MAPT gene, was associated with risk of progressive supranuclear palsy (PSP, p = 1.5 × 10⁻¹¹⁶), Parkinson’s disease (p = 7.0 × 10⁻¹²) and corticobasal degeneration (p = 1.3 × 10⁻⁸). Other highly-ranked GWAS SNPs were rs8043757 and rs9939609, located in introns of the FTO gene and associated with risk of obesity (p = 5.0 × 10⁻¹¹⁰ and 1.9 × 10⁻¹⁰⁵, respectively). In addition, the ARE SNP rs6426833, located in an intergenic region between OTUD3 and RNF186, is a highly-ranked GWAS SNP for ulcerative colitis (p = 2.0 × 10⁻⁶⁸) that was replicated in 5 studies.
The Disease Signature of ARE SNPs is Specifically Defined by NRF2/sMAF Binding
We observed a surprising pattern, or signature, of highly-ranked GWAS SNPs displaying linkage with occupied ARE SNPs (Table 1), and this is consistent with the general hypothesis put forth by ENCODE that SNPs associated with disease tend to be enriched in TF binding regions of open chromatin (Consortium, 2012). We posit that this observed disease risk signature of ARE SNPs is due to occupancy of an ARE motif by NRF2/sMAF. To assess this, we selected highly-ranked SNPs and their proxies from 12 GWASs for the nine diseases listed in Table 1, and first asked whether NRF2/sMAF-bound ARE SNPs in the genome were more likely to be found (i.e. enriched) among highly-ranked GWAS SNPs than ARE SNPs not bound by NRF2/sMAF. Using the PWM calculation to assess the binding strength of the ARE, we determined that 2,641 SNPs reside in AREs with PWM scores ≥ 6.4 and were reproducibly occupied by NRF2/sMAF in ChIP-seq experiments. Among these 2,641 occupied ARE SNPs, there were 8 ARE SNPs associated with highly-ranked GWAS SNPs (starred in Table 1). However, among the unbound ARE SNPs that were not in LD with occupied ARE SNPs in the genome, only 9 of 230,760 ARE SNPs were observed to be associated with highly-ranked GWAS SNPs. Therefore, the relative enrichment of NRF2/sMAF-bound ARE SNPs among highly-ranked GWAS SNPs was 83-fold (8/2,641 vs. 9/230,760) relative to unbound ARE SNPs (p = 5.90 × 10⁻¹², Fisher’s Exact test). A similar enrichment analysis against all risk SNPs found 203 unbound ARE SNPs linked with risk SNPs and a 6-fold relative enrichment of NRF2/sMAF-bound ARE SNPs among all risk SNPs (14/2,641 vs. 203/230,760) relative to unbound ARE SNPs (p = 2.48 × 10⁻⁷). This result provides strong support for a direct impact of NRF2/sMAF occupancy among disease-associated SNPs.
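The enrichment comparison can be sketched with the standard library alone. The example below uses the all-risk-SNP counts quoted above (14/2,641 bound vs. 203/230,760 unbound) and computes a simple ratio of proportions together with a one-sided hypergeometric tail, which is one form of Fisher's exact test. The function names are hypothetical, and because the software and sidedness used in the study are not stated here, the p-value from this sketch is not expected to reproduce the reported value exactly.

```python
from math import comb


def fold_enrichment(hits_a, total_a, hits_b, total_b):
    """Ratio of the hit proportion in group A to the hit proportion in group B."""
    return (hits_a / total_a) / (hits_b / total_b)


def hypergeom_upper_tail(hits_a, total_a, hits_b, total_b):
    """One-sided Fisher's exact p-value: P(X >= hits_a) with all margins fixed."""
    population = total_a + total_b
    successes = hits_a + hits_b
    denom = comb(population, total_a)
    upper = min(successes, total_a)
    # Exact integer arithmetic; only the final ratio is converted to a float.
    numer = sum(comb(successes, k) * comb(population - successes, total_a - k)
                for k in range(hits_a, upper + 1))
    return numer / denom


# Counts from the text: bound ARE SNPs vs. unbound ARE SNPs linked to any risk SNP.
fold = fold_enrichment(14, 2641, 203, 230760)
p_one_sided = hypergeom_upper_tail(14, 2641, 203, 230760)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_one_sided:.2e}")
```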
We further asked whether SNPs located in any other putative transcription factor binding sites (TFBS) identified as bound by ENCODE (159 TFs) displayed enrichment among highly-ranked GWAS SNPs similar to that observed with NRF2/sMAF-bound ARE SNPs. Among 147,994 occupied ENCODE TFBS SNPs (RegulomeDB) that are not in LD with occupied ARE SNPs, we found only three TFBS SNPs linked to highly-ranked GWAS SNPs (rs1980781 linked to the risk SNP rs931520; rs6734363 linked to the risk SNP rs6711012; rs7545115 linked to the risk SNP rs7538876). Therefore, compared to NRF2/sMAF-bound ARE SNPs, the ENCODE occupied TFBS SNPs were greatly depleted (0.0062-fold, 3/147,994 vs. 8/2,641, p = 1.39 × 10⁻¹², Fisher’s Exact test) among highly-ranked GWAS SNPs. Similarly, there were 50 TFBS SNPs linked with all risk SNPs, and the relative depletion of the ENCODE occupied TFBS SNPs among all risk SNPs was 0.064-fold (50/147,994 vs. 14/2,641) relative to occupied ARE SNPs (p = 5.29 × 10⁻¹²). These comparisons reinforce that the risk signature of functional ARE SNPs was defined by NRF2/sMAF occupancy and was independent of other TFBS SNPs identified in the ENCODE project.
SNPs in Functional AREs are Infrequent in the Genome
Using very stringent selection criteria (reproducible ChIP-seq, LD r² ≥ 0.95, GWAS p ≤ 5.0 × 10⁻⁸), our search identified only 14 functional ARE SNPs linked to disease risk among 268,564 common (MAF ≥ 1%) potential ARE SNPs in the genome. We thus examined the possibility that functional ARE SNPs are infrequent in the genome, possibly due to purifying (negative) selection. We used the 15 NRF2/sMAF ChIP-seq datasets and compared the frequency of occupied polymorphic AREs to the frequency of polymorphic AREs never observed to be occupied. In the genome, we identified 2,411,064 possible AREs with a PWM score of 6.4 or greater (Figure S2A). Of these, only 1.5% (35,659) were reproducibly occupied by NRF2/sMAF in at least 2 and up to 15 different ChIP-seq experiments. Presuming that occupancy in a multiplicity of experiments indicates that a group of AREs is more likely to be functionally important, we calculated the frequency of SNPs in the occupancy groups (shown in Figure S2A). Interestingly, 11.2% (265,747/2,375,405) of the non-occupied AREs contain SNPs. The frequency declines monotonically with increasing multiplicity of occupancy: 8.0% (1,717/21,570) for AREs occupied in 2–4 experiments, 7.3% (591/8,049) in 5–7 experiments, 6.4% (320/5,023) in 8–10 experiments, and 6.0% (61/1,017) in 11–15 experiments (Figure S2B). This decreased frequency of SNPs in AREs that display NRF2/sMAF occupancy (p < 2.2 × 10⁻¹⁶, Cochran-Armitage Trend Test) lends support to the hypothesis that polymorphisms in functional AREs are suppressed in human populations through negative evolutionary selection.
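The occupancy-group frequencies and the trend test can be illustrated as follows. This is a sketch under stated assumptions: the counts are those quoted above, but the equally spaced group scores 0 to 4 and the normal approximation for the two-sided p-value are choices made here for illustration and may differ from the settings used in the study.

```python
from math import erfc, sqrt

# (occupancy group, AREs containing SNPs, total AREs), taken from the text.
groups = [
    ("not occupied", 265_747, 2_375_405),
    ("2-4 experiments", 1_717, 21_570),
    ("5-7 experiments", 591, 8_049),
    ("8-10 experiments", 320, 5_023),
    ("11-15 experiments", 61, 1_017),
]


def cochran_armitage(counts, totals, scores):
    """Cochran-Armitage trend statistic z and its two-sided normal p-value."""
    n = sum(totals)
    p = sum(counts) / n  # pooled proportion under the null hypothesis
    t = sum(s * (r - m * p) for s, r, m in zip(scores, counts, totals))
    spread = (sum(m * s * s for s, m in zip(scores, totals))
              - sum(m * s for s, m in zip(scores, totals)) ** 2 / n)
    z = t / sqrt(p * (1 - p) * spread)
    return z, erfc(abs(z) / sqrt(2))


counts = [g[1] for g in groups]
totals = [g[2] for g in groups]
freqs = [100 * r / m for r, m in zip(counts, totals)]
# Assumed scores: 0 for unoccupied AREs, then 1..4 for increasing occupancy.
z, p_value = cochran_armitage(counts, totals, scores=[0, 1, 2, 3, 4])

for (label, _, _), f in zip(groups, freqs):
    print(f"{label:>18}: {f:.1f}% of AREs contain a SNP")
print(f"trend z = {z:.1f}, two-sided p = {p_value:.1e}")
```

A negative z indicates that the SNP frequency falls as the occupancy score rises, which is the direction described in the text.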
Disease Associated ARE SNPs are Frequently Occupied by NRF2/sMAF in an Allele-specific Manner
To explore the prediction that NRF2/sMAF could occupy AREs in an allele-specific manner, we examined the sequence reads generated from the ChIP-seq experiments in greater detail. For each SNP position, we defined allele-specific binding as: (1) at least 20 sequencing reads overlapping the SNP position, with the less occupied allele having at least 3 reads, and (2) one allele having more reads than the other allele (p ≤ 0.05 by exact binomial test). This analysis requires that the cells used in the experiment be heterozygous for the SNP to be tested. Given this limitation, among our 14 ARE SNPs, we found that only 5 SNPs in 11 ChIP-seq experiments had enough reads for testing allele-specific binding (Table 2). Despite this, we identified 4 SNPs displaying significant allele-specific binding, including the 3 SNPs linked to the most significant GWAS risk SNPs. The difference in allele-specific binding was consistent with the predicted binding strength assessed by PWM value, supporting the hypothesis that higher-PWM alleles should display greater occupancy. This result corroborates that the disease-risk AREs we identified are frequently occupied by NRF2/sMAF in an allele-specific manner.
Table 2.
Allele-specific binding detected in ChIP-seq experiments
| SNP | Allele 1 | PWM1 | Allele 2 | PWM2 | A reads | C reads | G reads | T reads | TF (antibody) | Cell | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs6426833 | A | 12.0 | G | 7.0 | 31 | 0 | 3 | 0 | MAFK | H1-hESC | 7.66 × 10⁻⁷ |
| | | | | | 61 | 0 | 23 | 0 | MAFK | MCF7 | 2.18 × 10⁻⁵ |
| rs16857611 | A | 7.8 | G | 2.9 | 0 | 12 | 1 | 30 | NFE2L2 | A549 | 0.0054 |
| | | | | | 0 | 44 | 0 | 89 | MAFK | A549 | 1.18 × 10⁻⁴ |
| rs9884209 | A | 20.1 | G | 18.8 | 0 | 29 | 0 | 40 | MAFK (SC-477) | HepG2 | ns |
| | | | | | 1 | 85 | 0 | 72 | MAFK (ab50322) | HepG2 | ns |
| | | | | | 0 | 20 | 0 | 10 | MAFK (ab50322) | IMR90 | ns |
| | | | | | 0 | 41 | 0 | 43 | MAFK (ab50322) | A549 | ns |
| | | | | | 0 | 24 | 0 | 24 | MAFF | HepG2 | ns |
| rs62033400 | A | 7.1 | G | 7.0 | 19 | 0 | 6 | 0 | MAFK | A549 | 0.0146 |
| rs242561 | T | 19.7 | C | 17.6 | 0 | 22 | 0 | 38 | MAFK | H1-hESC | 0.052 |
“ns”: not significant
See also table S4
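The allele-specific binding criteria described above can be sketched as a small function. This is a minimal illustration, assuming a two-sided exact binomial test against an expected allele ratio of 0.5; the function names are hypothetical. The read counts in the usage lines are taken from Table 2.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k (with a small tolerance for floating-point ties).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def is_allele_specific(reads_a, reads_b, min_total=20, min_minor=3, alpha=0.05):
    """Apply the two criteria from the text to a pair of allele read counts."""
    total = reads_a + reads_b
    if total < min_total or min(reads_a, reads_b) < min_minor:
        return False  # too few reads to test
    return binomial_two_sided(reads_a, total) <= alpha


# rs6426833, MAFK ChIP-seq in H1-hESC: 31 A reads vs. 3 G reads.
p_rs6426833 = binomial_two_sided(31, 34)
# rs242561, MAFK ChIP-seq in H1-hESC: 38 T reads vs. 22 C reads.
p_rs242561 = binomial_two_sided(38, 60)
print(f"{p_rs6426833:.2e} {is_allele_specific(31, 3)}")
print(f"{p_rs242561:.3f} {is_allele_specific(38, 22)}")
```

Under these assumptions the two p-values agree with the corresponding Table 2 entries, and the strict p ≤ 0.05 cut-off is not met for the H1-hESC MAFK reads at rs242561 (p ≈ 0.052).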
Disease-linked ARE SNPs are Enriched among ARE SNPs Associated with Proximal Gene Expression
Disease-associated SNPs that affect NRF2/sMAF binding to AREs under basal or induced levels of oxidative stress should perturb the expression of adjacent genes, which would ultimately influence phenotype and disease risk. To assess this, we took advantage of many independent studies identifying expression quantitative trait loci (eQTL) in various cells or tissues, and examined the overlap between our candidate SNPs and published significant eQTL signals. We queried 2,641 NRF2/sMAF-occupied ARE SNPs and their proxies against 16 eQTL datasets (listed in Table S3) to determine if they were repeatable eQTL SNPs (i.e. replicable eSNPs) that are in the top 5% by false discovery rate (FDR) and within 200 kb of a nearby transcription start site (TSS). We found that 16.9% (446/2,641) were repeatable, being associated with expression of either the same gene or adjacent genes in at least two of the queried datasets. However, only 11.2% (25,736/230,760) of unbound ARE SNPs were replicable eSNPs, which indicated that occupied ARE SNPs were enriched for eSNPs (p < 2.2 × 10⁻¹⁶ by Chi-squared test). In addition, we divided all ARE SNPs based on “bound” and “eSNP” status, then analyzed the ratio of ARE SNPs linked with risk SNPs to ARE SNPs not linked with risk SNPs. We compared this ratio for “not-bound, not-eSNP” (0.057%, 116/204,908) to: “not-bound, eSNP” (0.343%, 88/25,648; 6.0-fold enrichment, p < 2.2 × 10⁻¹⁶); “bound, not-eSNP” (0.366%, 8/2,187; 6.4-fold enrichment, p = 5.7 × 10⁻⁵); “bound, eSNP” (1.363%, 6/440; 23.9-fold enrichment, p = 3.1 × 10⁻⁷). This significant enrichment supports our hypothesis that NRF2/sMAF-occupied ARE SNPs that are associated with transcription are more likely to be linked to disease risk.
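The comparison of eSNP frequencies between occupied and unbound ARE SNPs can be illustrated with a 2×2 chi-squared test. The sketch below uses the counts quoted above (446/2,641 vs. 25,736/230,760); the function name is hypothetical, and the Pearson statistic without continuity correction is an assumption made here, since the exact settings used in the study are not stated.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    The table is [[a, b], [c, d]]; returns (statistic, p_value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


bound_esnp, bound_total = 446, 2641
unbound_esnp, unbound_total = 25736, 230760

stat, p_value = chi_square_2x2(
    bound_esnp, bound_total - bound_esnp,
    unbound_esnp, unbound_total - unbound_esnp,
)
print(f"bound: {100 * bound_esnp / bound_total:.1f}% eSNPs, "
      f"unbound: {100 * unbound_esnp / unbound_total:.1f}% eSNPs")
print(f"chi-squared = {stat:.1f}, p = {p_value:.1e}")
```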
A Functional ARE SNP in MAPT Intron 1 is a Strong Candidate as a Causal SNP in Progressive Supranuclear Palsy, Parkinson’s Disease, and Corticobasal Degeneration
Considering more deeply the available functional evidence and a plausible underlying molecular mechanism, we propose that an ARE SNP in the first intron of the MAPT gene (encoding microtubule-associated protein tau) may explain a portion of the genetic susceptibility to Parkinsonian diseases. First, this SNP, rs242561 (previously known as rs113166067), located at chr17:44026548 (hg19), was consistently occupied by NRF2, MAFF, and MAFK, as measured by ChIP-seq, and was located at the peak of the strongest binding signals in multiple experiments (Figure 2A). Specifically, 14 of 15 available ChIP-seq peaks overlap this SNP, comprising 6 NRF2 ChIP-seq, 2 MAFF ChIP-seq, and 6 MAFK ChIP-seq experiments performed in 8 different cell lines. The homologous sequence in the mouse was also bound in both Nrf2 and MafK ChIP-seq experiments (Figure S3A). In addition, the Roadmap Epigenome project chromHMM model classifies this genomic location as an enhancer in brain tissue (substantia nigra) or as actively transcribed (brain germinal matrix, hippocampus), whereas it is repressed in most non-brain tissues. These data, along with the high interspecies evolutionary conservation at this binding site (Figure 2B–C), strongly support a functional role for NRF2/sMAF binding in brain tissues.
Figure 2.
Characteristics of rs242561. (A) location relative to risk SNP for PSP and PD and ChIP-seq signals, (B) position in ARE motif, (C) conservation, (D) allele-specific binding, (E) allele specific transactivation, (F) RT-PCR of MAPT levels in human IMR32 neuroblastoma cells treated with t-BHQ or vehicle, (G) Nrf2 +/+ and Nrf2 −/− mice were exposed to oxidative stress induced by hyperoxia and Mapt gene expression level was measured in mRNA from cerebellum by RT-PCR.
See also Figure S3 and Tables S4–S5.
Second, this SNP is predicted to affect NRF2/sMAF binding and transactivation based on the PWM calculation. The T-allele (PWM = 19.7) confers a stronger binding ARE relative to the C-allele (PWM = 17.6) (Figure 2D). Counting the alleles present in sequence reads in the MAFK ChIP-seq in H1-hESC supports the PWM prediction that the T-allele is a stronger binding ARE. Of the 60 unique sequencing reads precipitated with MAFK antibodies, 63% contained the T-allele (38) and 37% the C-allele (22) (Figure 2D, T/C ratio = 1.7, p = 0.052, by binomial test). The allele-specific binding was confirmed by NRF2 ChIP tagmentation sequencing, using a lymphoblastoid cell line (LCL, GM12763; heterozygous for rs242561) treated with sulforaphane. We found that the ratio of T/C alleles was 1.46 (47,274 reads vs. 32,320 reads, averaged from 3 replicates; see also Table S4 and Figure S3B), a significant difference in allele counts (p < 2.2 × 10⁻¹⁶). We further evaluated the interaction between NRF2 and this binding sequence using reporter constructs containing the two alleles. Both constructs can be strongly activated by NRF2 in IMR32 cells, and the stronger binding T allele construct displays significantly higher (~2-fold) transactivation ability relative to the C allele construct (Figure 2E). The induction of MAPT transcription by NRF2 was further confirmed using both a human neuroblastoma IMR32 cell line that is homozygous for the C allele and an Nrf2-deficient mouse model. In cells, treatment with 10 mM tert-butylhydroquinone (t-BHQ, an NRF2 activator) significantly increased the MAPT mRNA level ~3.8-fold relative to vehicle DMSO (Figure 2F), although IMR32 was not responsive to all NRF2 pathway activators.
When Nrf2+/+ and Nrf2 −/− mice were exposed to either air or oxidative stress caused by hyperoxia, Mapt mRNA in cerebellum of Nrf2+/+ mice treated with hyperoxia was significantly elevated 1.7-fold compared with air treatment, but Mapt mRNA in cerebellum of Nrf2−/− mice had no significant change under either treatment. Therefore, Mapt transcription in mouse cerebellum has an Nrf2-dependent pattern (Figure 2G), as with known Nrf2 target genes Nqo1, Gclc, Gstm1, and Hmox1 (Table S5), indicating the possible importance of Nrf2 in Mapt regulation under oxidative stress conditions.
Third, this SNP affected MAPT transcription in an allele-specific manner in cells as measured by RNA-seq. Using polyA+ RNA-seq datasets (GSE56785, GSE20301, and GSE16256) obtained from H1 embryonic stem cells (H1-hESC) and neural progenitor cells derived from H1-hESC by different protocols, we counted sequencing reads overlapping three MAPT exon SNPs (rs17650901 in the 5’ UTR, rs1052551 in exon 7, and rs17652121 in exon 9) that are in complete LD with rs242561. We found that the alleles linked with the stronger binding T allele had a pattern of higher sequencing read counts (Table 3). Further aggregating allele counts into counts per haplotype, we found that the haplotype linked with the stronger binding T allele had consistently higher read counts (except for one time-point in GSE56785). Conditions or time points with higher levels of expression generally displayed statistically significant differences between the two haplotypes (Table 3).
Table 3.
Allele-specific expression detected in mRNA-seq experiments
| GEO | time-point | H1 | H2 | Fold change (H2/H1) | p-value* (H2/H1) | rs17650901 Allele 1 | rs17650901 Allele 2† | rs1052551 Allele 1 | rs1052551 Allele 2 | rs17652121 Allele 1 | rs17652121 Allele 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GSE56785 | hESC | 9 | 16 | 1.8 | 0.23 | 6 | 12 | 3 | 4 | | |
| | NPC day1 | 511 | 616 | 1.2 | 1.94 × 10⁻³ | 219 | 243 | 158 | 188 | 134 | 185 |
| | NPC day2 | 36 | 53 | 1.5 | 0.09 | 14 | 22 | 13 | 23 | 9 | 8 |
| | NPC day4 | 130 | 163 | 1.3 | 0.06 | 62 | 61 | 36 | 58 | 32 | 44 |
| | NPC day5 | 200 | 212 | 1.1 | 0.59 | 76 | 84 | 75 | 70 | 49 | 58 |
| | NPC day11 | 1487 | 1736 | 1.2 | 1.24 × 10⁻⁵ | 246 | 282 | 658 | 784 | 583 | 670 |
| | NPC day18 | 64 | 59 | 0.9 | 0.72 | 29 | 25 | 15 | 15 | 20 | 19 |
| GSE20301 | N2 A | 39 | 64 | 1.6 | 0.0176 | 8 | 7 | 19 | 51 | 12 | 6 |
| | N3 A | 2 | 26 | 13.0 | 3.03 × 10⁻⁶ | 0 | 7 | 2 | 12 | 0 | 7 |
| GSE16256 | hESC | 19 | 23 | 1.2 | 0.64 | 6 | 10 | 13 | 13 | | |

H2 is the sum of total reads overlapping allele 2 of SNPs rs17650901, rs1052551, and rs17652121.
* Binomial test.
† In each case allele 2 is the allele linked with the stronger binding T allele of rs242561 on haplotype H2.
Fourth, the observation of rs242561-associated allele-specific expression of MAPT was further strengthened by examining human population datasets. In eQTL datasets, the genotypes of rs242561 or its proxies were significantly associated with mRNA levels of the MAPT gene in three different tissues (brain, subcutaneous fat (adipose), and esophagus mucosa), as measured by Affymetrix exon array, Illumina Ref-8 bead array, and RNA-seq, respectively (Table 4). Since microarray probes whose sequences contain SNPs, such as ILMN_1710903 or affy_3723733 (Ramasamy, NAR, 2013; Trabzuni, HMG 2012), would confound expression measurements, we excluded eQTLs measured with SNP-containing probe sets. As listed in Table 4, a significant association was reported between the rs242561-linked SNP rs17650579 and MAPT mRNA levels measured by ILMN_2310814 in adipose tissue among 850 European descendants (Grundberg et al., 2012). The T allele of rs17650579 is in complete LD (r² = 0.97, Figure 3A) with the stronger binding T allele of rs242561 and was associated with higher MAPT expression. Similarly, in the UK Brain Expression Consortium (UKBEC) study (Ramasamy et al., 2014), a significant association was reported between the A allele of rs1981997 and higher MAPT mRNA levels in brain tissue among 134 European descendants free of neurodegenerative disorders. The A allele of rs1981997 is in complete LD (r² = 0.97, Figure 3A) with the stronger binding T allele of rs242561. In the GTEx project data, the stronger binding T allele of rs242561 (previously known as rs113166067) was associated with higher RNA-seq measured MAPT expression in esophagus mucosa from 95 individuals (Consortium, 2015a).
Table 4.
The association of genotypes of ARE SNP rs242561 with MAPT mRNA levels
| Proxy/SNP | LD (r²) | Tissue | Brain region | Probe ID | Allele$ | Frequency | Beta | p-value | Data source | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| rs17650579 | 0.97 | adipose | | ILMN_2310814 | T | 0.24 | 0.135 | 1.90 × 10⁻¹⁴ | MuTHER | Grundberg et al., 2012 |
| rs1981997 | 0.97 | brain | TCTX (temporal cortex) | 3723707 | A | 0.27 | 0.151 | 8.93 × 10⁻⁶ | UKBEC | Ramasamy et al., 2014 |
| rs242561 | | esophagus mucosa | | (RNA-seq) | T | NA | 0.530 | 2.90 × 10⁻⁷ | GTEx | GTEx Consortium, 2015 |
| rs242561 | | brain | CRBL (cerebellar cortex) | 3723710 | T | 0.27 | 0.195 | 1.37 × 10⁻⁵ ** | UKBEC | Ramasamy et al., 2014 |
| rs242561 | | brain | TCTX (temporal cortex) | 3723707 | T | 0.27 | 0.153 | 9.78 × 10⁻⁶ ** | UKBEC | Ramasamy et al., 2014 |
| rs242561 | | brain | TCTX (temporal cortex) | 3723751 | T | 0.27 | 0.197 | 1.78 × 10⁻⁵ ** | UKBEC | Ramasamy et al., 2014 |
| rs242561 | | brain | WHMT (intralobular white matter) | 3723707 | T | 0.27 | 0.175 | 1.32 × 10⁻⁵ ** | UKBEC | Ramasamy et al., 2014 |

$ The allele linked with stronger binding.
** Passed Bonferroni correction for 2197 SNPs genotyped in the 400-kb region centered on rs242561.
Figure 3.
Association of rs242561 genotypes with MAPT mRNA levels in humans. (A) Map of rs242561 and eQTL SNPs with LD correlations (R2) within the MAPT gene; (B) MAPT mRNA levels and genotypes of rs242561 in human brain temporal cortex as measured by Affymetrix Exon array (UKBEC). TT is the stronger binding genotype.
See also Figure S3.
We further downloaded the UKBEC dataset from http://www.braineac.org/ to examine the effect of rs242561 genotype on MAPT expression levels in different regions of the brain among 134 healthy individuals. We confirmed that the stronger binding T allele of rs242561 (chr17: 44026548) was associated with higher MAPT mRNA levels in 3 different brain regions (Table 4), namely cerebellar cortex (CRBL), temporal cortex (TCTX), and intralobular white matter (WHMT), after Bonferroni correction (p < 2.28×10−5) for 2197 genotyped SNPs within the 400kb region centered on rs242561. This trend was consistent with our PWM prediction. Figure 3B displays rs242561 genotypes and mRNA levels in the brain TCTX region from UKBEC subjects, measured by 3 different Affymetrix Exon 1.0 ST probe sets: 3723707 targeting exon 1, 3723710 targeting exon 2, and 3723751 targeting the 3’UTR. The stronger binding T allele corresponds to higher MAPT mRNA for all 3 probe sets. Similarly, the protective G allele of rs8070723 corresponds to higher MAPT mRNA for all 3 probe sets (Figure S3C). In addition, probe set 3723712, marking expression of exon 3, showed a similar pattern (Figure S3D), although its expression level was lower than that of the above three probe sets. To exclude the possibility that other potential TFBS SNPs might affect this relationship, we searched the HaploReg database (Ward and Kellis, 2012) for all SNPs that were in complete LD (r2 = 1) with rs242561, and found no SNPs in other reproducibly occupied TF binding sites with matched TF motifs.
Last and most importantly, as previously mentioned, we found that rs242561 was in complete LD (r2 = 0.96, Figure 3A) with rs8070723, the strongest GWAS signal for progressive supranuclear palsy (p = 1.5×10−116; Hoglinger et al., 2011) and a very strong signal for Parkinson’s disease (p = 7.0×10−12; Spencer et al., 2011), corticobasal degeneration (p = 1.3×10−8; Kouri et al., 2015), and Alzheimer’s disease (p = 5.2×10−5; Allen et al., 2014). The stronger NRF2/sMAF binding T allele of rs242561, which corresponds to higher MAPT expression, is in complete LD with the G allele of rs8070723 (frequency = 0.23 in controls and 0.05 in cases). This G allele confers a protective effect in Europeans with an odds ratio of 0.18, corresponding to roughly 5.6-fold lower odds of PSP (1/0.18), which is the largest protective effect identified among all disease GWASs. Similarly, this G allele was associated with protection against Parkinson’s disease, corticobasal degeneration, and Alzheimer’s disease. Thus, we propose that the stronger NRF2/sMAF binding T allele of rs242561 leads to higher MAPT expression and is a functionally protective allele for Parkinsonian disorders in European ancestry populations.
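As a rough arithmetic check on the figures quoted above, the allelic odds ratio can be recomputed from the reported G-allele frequencies of rs8070723 (0.05 in cases, 0.23 in controls). This is only a sketch based on rounded frequencies; the published odds ratio comes from the GWAS itself, and the function name below is our own.

```python
def allelic_odds_ratio(freq_case: float, freq_control: float) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_case = freq_case / (1.0 - freq_case)
    odds_control = freq_control / (1.0 - freq_control)
    return odds_case / odds_control


# Reported G-allele frequencies of rs8070723 (rounded values from the text).
or_g = allelic_odds_ratio(freq_case=0.05, freq_control=0.23)

# An odds ratio below 1 is protective; its reciprocal is the fold reduction in odds.
fold_lower_odds = 1.0 / or_g

print(f"odds ratio = {or_g:.2f}, fold lower odds = {fold_lower_odds:.1f}")
```

With these rounded inputs the odds ratio comes out near 0.18, in line with the reported value.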
Discussion
Recent studies using integrated analysis of genome-wide data have identified SNPs affecting allele-specific TF occupancy that can be mechanistically linked with a disease (Fogarty et al., 2014; Karczewski et al., 2013; Maurano et al., 2012; Schaub et al., 2012; Zeron-Medina et al., 2013). Here we focused on the gene network regulated by NRF2 and its binding partners (MAFF, MAFK). Activation of NRF2, either genetically or with chemical activators, such as SFN and t-BHQ, protects a broad range of tissues, including neurological, from oxidative stress-related toxicity by upregulating antioxidant response genes (Hayes and Dinkova-Kostova, 2014; Yamazaki et al., 2015). Knockout of Nrf2 (Nrf2−/− mice) directly increases oxidative stress-related neurological damage (Yamazaki et al., 2015) with associated behavioral outcomes and neurotransmitter function (Muramatsu et al., 2013). Interestingly, recent work has demonstrated that Nrf2 activation in P301S mice protects against oxidative stress and tau-related neurotoxicity (Stack et al., 2014). However, until the present study, no one has directly connected MAPT (tau) expression with NRF2/sMAF binding.
We hypothesize that the stronger binding T allelic variant of rs242561, which occurs on the H2 1Mb inversion haplotype, brings MAPT transcription under stronger regulation by NRF2/sMAF. The H2 haplotype (T allele) is associated with higher levels of MAPT mRNA, as observed in numerous studies (Caffrey et al., 2008; Ramasamy et al., 2014; Trabzuni et al., 2012). The MAPT Tau protein has 6 major isoforms formed by alternative splicing of exons 2, 3, and 10, and it has been reported that the H2 haplotype-encoded mRNA and MAPT Tau protein display higher levels of inclusion of MAPT exon 3 relative to H1 haplotypes (Trabzuni et al., 2012). Inclusion of exon 3 in the Tau protein has been demonstrated to be protective, decreasing both the propensity for aggregation (Zhong et al., 2012) and amyloid-beta toxicity in a mouse model of AD (Ittner et al., 2011). For individuals who carry the rs242561 T allele that is present on the H2 haplotype, NRF2/sMAF regulation of the MAPT gene should result in production of a Tau protein that resists aggregation and is protective against tauopathies like PSP and PD. Abnormal hyperphosphorylation and aggregation of MAPT is a critical molecular feature that characterizes tauopathies, for which the mechanism is unknown (Ballatore et al., 2007).
It is unclear whether in vivo regulation of human MAPT by NRF2/sMAF is a basal effect or is induced by oxidative stress. We observed NRF2-dependent regulation of MAPT in neural cells in culture and in cerebellum in a Nrf2 mouse oxidative-stress model; however, neither of these models has the H2 haplotype or the T allele of rs242561. We suggest that individuals harboring the T allele of the ARE SNP rs242561 could benefit from an upregulation of aggregation-resistant, exon 3-containing Tau protein, especially under conditions of oxidative stress. This would presumably lead to the observed protective effect of this allele. It seems likely that, in parallel, upregulation of other NRF2 targets (NQO1, HMOX1) would also occur, reducing oxidative stress and promoting degradation of aggregated proteins (SQSTM1).
Studies of eQTLs in the MAPT gene have been controversial due to several publications identifying false-positive eQTLs resulting from SNPs in array probe sequences interfering with expression measurement (Ramasamy et al., 2013). Combined with an extremely complex gene locus structure that includes a 1-Mb inversion haplotype, termed H2, as well as copy number variation, the literature on this topic presents a confusing picture of the relationship between polymorphic variants and Tau expression, and subsequent PSP/Parkinson’s risk. Some published eQTL reports have suggested that the H1 haplotype (risk alleles) is associated with higher MAPT expression and that consequent hyperphosphorylation of MAPT drives risk (Gibbs et al., 2010; Myers et al., 2007; Zou et al., 2012). However, these reports have consistently measured mRNA expression using an array probe (e.g. ILMN_1710903) containing a 2-bp deletion polymorphism (Ramasamy et al., 2013) and this probe fails to detect mRNA transcribed from the H2 haplotype. Using an H1/H2 haplotype-specific expression assay based on a 5’UTR SNP rs17650901 (in complete LD with rs242561) to compare N-terminal exon expression in a heterozygous H1/H2 human neuronal cell culture model and in post-mortem human brain tissues from 14 control individuals, the protective MAPT H2 haplotype was found to have two-fold more MAPT transcripts (containing exons 2+3+) than the disease-associated H1 haplotype (Caffrey et al., 2008). The H2 haplotype was also associated with higher expression of exon 3 (as detected by Affymetrix probe set 3723712) in all brain regions except white matter in 439 control individuals (Trabzuni et al., 2012). 
In addition, higher MAPT mRNA levels were associated with SNP alleles on the same H2 haplotype as the rs242561 T allele, including the minor A allele of rs1981997 (r2 = 0.97 with rs242561) (Ramasamy et al., 2014) and the minor A allele of rs113986870 (r2 = 0.82 with rs242561), which is associated with reduced risk of Alzheimer’s disease (Jun et al., 2015). Altogether, it seems clear from previous studies and the present study that the H2 haplotype and the MAPT ARE SNP rs242561 T allele are associated with higher MAPT mRNA expression and are protective in tau-associated pathologies, including PSP, a rare Parkinsonian disorder, and Parkinson’s disease. Interestingly, in AD the H2-haplotype tagging rs8070723 G allele (coupled with the rs242561 T allele) was also associated with reduced risk, but in diseased brain MAPT levels were lower in the cerebellum and temporal cortex (Allen et al., 2014; Zou et al., 2012). However, in these studies of AD brains, the MAPT mRNA level was measured by probe ILMN_2298727, which was not robustly detected in normal human brain samples (Trabzuni et al., 2012).
The protective H2 haplotype (high expression allele), which occurs at a frequency of ~0.2 in European ancestry populations but is nearly absent in Africans and Asians, may have undergone positive selection in Europeans (Stefansson et al., 2005). We suggest that NRF2/sMAF-driven, higher levels of the MAPT isoform containing exon 3 may protect glia from pathological damage to the tau protein that may occur under conditions of oxidative stress and this may be a plausible mechanism for both the protective effect in disease and positive selection.
We have used an integrated approach to combine genome-wide maps of NRF2/sMAF occupancy with disease risk signals identified in GWASs to find potentially functional ARE SNPs associated with disease risk. We further integrated these data with diverse transcriptional datasets to support the functional connection between several ARE SNPs and NRF2-regulated transcription. The data provide strong support for a link between NRF2/sMAF occupancy at rs242561 in the MAPT gene, allelic expression in neuronal cells, and Parkinsonian disorders. Because NRF2 activation via dietary components such as SFN is protective in many animal models of oxidative stress-related pathology, the present results suggest that NRF2 activation might be useful in the prevention of neurological diseases such as PSP and PD (Johnson and Johnson, 2015).
Experimental Procedures
ChIP-seq data
We used 7 unique NRF2 ChIP-seq datasets in this study (Table S1), generated in LCL, BEAS-2B, A549, HepG2, and K562 cells. BEAS-2B and LCL cells were treated with SFN or vehicle (Chorley et al., 2012). The raw reads were aligned against the human reference sequence (hg19) using the Burrows-Wheeler Alignment (BWA) tool (Li et al., 2008). Peaks were called using the MACS2 program (Zhang et al., 2008) with its default settings. We then kept only peaks that had an FDR q-value ≤ 5% and 6 or more reads at the summit. The MAFF and MAFK ChIP-seq peaks were downloaded from the ENCODE Data Coordination Center in the file wgEncodeRegTfbsClusteredV3.bed, which was generated by the ENCODE Analysis Working Group using a uniform processing pipeline (Li et al., 2011) for all 161 transcription factors. To provide reliable TF binding signals, we selected only peaks observed in two or more cell types for further analysis. We further filtered out peaks overlapping the ENCODE blacklist (Consortium, 2012).
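The peak-filtering rules described above (q-value ≤ 5%, at least 6 reads at the summit, observed in two or more cell types) can be sketched as follows. This is a minimal illustration on made-up peaks, not the pipeline used in the study: the `Peak` record, its field names, and the overlap-based definition of "observed in two or more cell types" are all assumptions.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int          # 0-based, half-open interval [start, end)
    end: int
    q_value: float      # FDR q-value as a fraction (0.05 == 5%)
    summit_reads: int   # read count at the peak summit
    cell_type: str


def passes_quality(peak: Peak, max_q: float = 0.05, min_summit_reads: int = 6) -> bool:
    """Apply the two per-peak thresholds stated in the text."""
    return peak.q_value <= max_q and peak.summit_reads >= min_summit_reads


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def reproducible_peaks(peaks: List[Peak], min_cell_types: int = 2) -> List[Peak]:
    """Keep peaks overlapped by peaks from at least `min_cell_types` cell types
    (a peak counts its own cell type). Quadratic scan, fine for a toy example."""
    kept = []
    for p in peaks:
        cell_types = {q.cell_type for q in peaks if overlaps(p, q)}
        if len(cell_types) >= min_cell_types:
            kept.append(p)
    return kept


# Hypothetical peaks for illustration only.
toy_peaks = [
    Peak("chr17", 100, 200, 0.01, 10, "A549"),
    Peak("chr17", 150, 250, 0.03, 7, "HepG2"),
    Peak("chr17", 500, 600, 0.01, 12, "K562"),   # seen in one cell type only
    Peak("chr1", 100, 200, 0.20, 9, "A549"),     # fails the q-value cutoff
    Peak("chr1", 120, 220, 0.01, 3, "HepG2"),    # fails the summit-read cutoff
]

good_peaks = [p for p in toy_peaks if passes_quality(p)]
kept_peaks = reproducible_peaks(good_peaks)
```

Applying the quality filter before the reproducibility filter means a low-quality peak cannot count as supporting evidence for a peak in another cell type.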
GWAS data
We used the NHGRI GWAS Catalog (Welter et al., 2014) (http://www.genome.gov/gwastudies, downloaded on September 9, 2014) to obtain a list of risk SNPs. The selection criteria were: discovered in studies with the “Disease/Trait” attribute matching any human disease term defined by MeSH (Medical Subject Headings) from the National Library of Medicine, performed in Europeans, with p-values < 5.0×10−8 and a population size of 1000 or more in either the discovery or replication phase. We discarded all GWAS SNPs mapped to exon regions.
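The selection criteria above amount to a conjunction of simple per-record tests. The sketch below assumes each catalog entry has already been annotated with the relevant attributes; the `GwasHit` record, its field names, and the example identifiers are hypothetical and do not reflect the GWAS Catalog file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GwasHit:
    rsid: str
    trait_is_mesh_disease: bool   # "Disease/Trait" matched a MeSH disease term
    ancestry: str
    p_value: float
    discovery_n: int
    replication_n: int
    in_exon: bool


def keep_hit(hit: GwasHit, p_cutoff: float = 5.0e-8, min_n: int = 1000) -> bool:
    """Return True if the record meets every selection criterion in the text."""
    return (
        hit.trait_is_mesh_disease
        and hit.ancestry == "European"
        and hit.p_value < p_cutoff
        # Sample size may be met in either the discovery or the replication phase.
        and max(hit.discovery_n, hit.replication_n) >= min_n
        and not hit.in_exon
    )


# Made-up records for illustration only.
toy_hits = [
    GwasHit("rs_demo_1", True, "European", 1.0e-12, 2500, 0, False),
    GwasHit("rs_demo_2", True, "European", 1.0e-6, 5000, 5000, False),   # p too large
    GwasHit("rs_demo_3", True, "European", 1.0e-9, 400, 1200, False),
    GwasHit("rs_demo_4", True, "European", 1.0e-20, 9000, 0, True),      # exonic
    GwasHit("rs_demo_5", False, "European", 1.0e-20, 9000, 0, False),    # not a disease
]

selected = [h.rsid for h in toy_hits if keep_hit(h)]
```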
SNP sequence, allele frequency and linkage disequilibrium data
The SNP sequences were downloaded from the NCBI dbSNP database (build 141). We selected SNPs with a minor allele frequency > 1% in 379 Europeans in the 1000 Genomes Project’s Phase I Integrated Release Version 2 (Genomes Project et al., 2012). The LD data were downloaded from http://csg.sph.umich.edu/abecasis/MACH/download/1000G.2012-02-14.html.
eQTL (expression quantitative trait loci) data
We selected 15 publicly available eQTL datasets that originated from gene expression profiles under no treatment in healthy European descendants (Table S3) and that report cis SNP-gene associations in 8 tissues/cells. The population size ranged from 134 to 5311. We also selected the GTEx project (Consortium, 2015a) results on 12 tissues/cells with a sample size ≥200. To focus on the most significant signals that may underlie regulation of transcription, we selected eQTLs that ranked in the top 5% by FDR and lay < 200kb from the transcription start site. To avoid false positives, we filtered out expression probes that contained common SNPs (MAF ≥ 1% in the EUR panel of the 1000 Genomes Project).
Integrative annotation of SNPs based on linkage disequilibrium
We selected the GWAS SNPs related to disease risk from the NHGRI GWAS Catalog. Next, we extended the GWAS SNP set by identifying all SNPs in complete LD (r2 ≥ 0.95) with a GWAS SNP in the EUR population of the 1000 Genomes Project. Then, comparing this set of disease risk SNPs with the SNPs located in the ChIP-seq peaks, we identified SNPs bound by NRF2/sMAF and linked to disease risk. For each of our candidate ARE SNPs linked to disease risk, we identified all SNPs in complete LD (r2 ≥ 0.95) in the EUR population of the 1000 Genomes Project. Then, comparing this set of ARE SNPs and their proxies with the SNPs discovered by eQTL studies, we identified ARE SNPs potentially affecting gene transcription.
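The annotation steps above reduce to two rounds of LD expansion, each followed by a set intersection. The following sketch walks through them using the r2 values quoted in this paper for the MAPT locus; the function name, the pairwise LD list layout, and the placeholder identifier `rs_hypothetical_1` are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

LdPair = Tuple[str, str, float]  # (SNP a, SNP b, r2)


def expand_by_ld(seed_snps: Iterable[str], ld_pairs: Iterable[LdPair],
                 r2_min: float = 0.95) -> Set[str]:
    """Return the seeds plus every SNP with r2 >= r2_min to at least one seed."""
    seeds = set(seed_snps)
    expanded = set(seeds)
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 >= r2_min:
            if snp_a in seeds:
                expanded.add(snp_b)
            if snp_b in seeds:
                expanded.add(snp_a)
    return expanded


# r2 values with rs242561 as quoted in the text.
ld_pairs = [
    ("rs242561", "rs8070723", 0.96),
    ("rs242561", "rs17650579", 0.97),
    ("rs242561", "rs1981997", 0.97),
    ("rs242561", "rs113986870", 0.82),   # below the 0.95 cutoff
]

gwas_snps = {"rs8070723"}
snps_in_peaks = {"rs242561", "rs_hypothetical_1"}   # second entry is a placeholder
eqtl_snps = {"rs17650579", "rs1981997"}

# Step 1: extend GWAS SNPs by LD, then intersect with SNPs inside ChIP-seq peaks.
risk_linked = expand_by_ld(gwas_snps, ld_pairs)
candidate_are_snps = risk_linked & snps_in_peaks

# Step 2: extend candidate ARE SNPs by LD, then intersect with eQTL SNPs.
are_snps_and_proxies = expand_by_ld(candidate_are_snps, ld_pairs)
eqtl_supported = are_snps_and_proxies & eqtl_snps
```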
Detection of Allele-Specific Binding and Allele-Specific Expression
After mapping ChIP-seq reads to hg19, the mpileup function in SAMtools (Li et al., 2009) was used to pile up reads overlapping SNP positions. Then, we identified heterozygous sites and used a binomial test to evaluate whether there was a difference in read counts between the tested alleles. After mapping RNA-seq reads to hg19 using STAR (v2.4.2a; Dobin et al., 2013), GATK (v3.4; McKenna et al., 2010) was used to pile up reads overlapping SNP positions. Then, we identified heterozygous sites and used a binomial test to evaluate whether there was a difference in read counts between the tested alleles.
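To make the allele-specific test concrete, here is a minimal sketch of an exact two-sided binomial test at a heterozygous site, under the null hypothesis that both alleles are equally represented among the reads. It uses only the standard library; the read counts are invented, and the two-sided convention (summing all outcomes no more likely than the observed one) is one common choice that the study may or may not have used.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed outcome under the null success probability p.
    """
    if n < 0 or not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    pmf = [comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties are counted.
    total = sum(x for x in pmf if x <= observed * (1.0 + 1e-7))
    return min(1.0, total)


# Hypothetical read counts at one heterozygous SNP (not data from the study).
reads_allele_1 = 8
reads_allele_2 = 2
p_value = binomial_two_sided_p(reads_allele_1, reads_allele_1 + reads_allele_2)
```

With read depths this low, even a 4:1 allelic ratio is not significant, which is why sites with few reads carry little information about allelic imbalance.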
Association of SNP genotype and MAPT expression
The MAPT gene expression and SNP genotyping data were downloaded from the UK Brain Expression Consortium (UKBEC, http://www.braineac.org) (Ramasamy et al., 2014; Trabzuni et al., 2011). These data originate from brain tissue from 134 individuals free of neurodegenerative disorders. Up to ten regions per brain were analyzed using the Affymetrix Exon 1.0 ST Array. To avoid false-positive associations, the probe sets were examined for common SNPs and indels (MAF > 1% in EUR) using a web application called “Polymorphism-in-probe Finder (PiP Finder) tool” (https://caprica.genetics.kcl.ac.uk/~ilori/pipfinder.php) (Ramasamy et al., 2013). We used 14 probe sets that had average log2-expression levels > 7.0 (recommended by Affymetrix) across ten regions: 3723707, 3723710, 3723722, 3723725, 3723731, 3723735, 3723740, 3723743, 3723746, 3723747, 3723749, 3723751, 3723753, and 3723754. The probe set 3723712 was below this level of expression but was examined because of its specific role in identifying exon 3. The SNPs we tested were the 2197 SNPs genotyped in the 400kb region centered on rs242561. A SNP was considered associated with MAPT expression level only if p < 2.28 × 10−5 (i.e. 0.05 / 2197, Bonferroni correction).
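The Bonferroni threshold quoted above follows directly from dividing the family-wise error rate by the number of tested SNPs. The short sketch below reproduces that arithmetic and applies it to the UKBEC p-values listed for rs242561 in Table 4; the function and variable names are our own.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 2197 SNPs genotyped in the 400kb region around rs242561.
threshold = bonferroni_threshold(alpha=0.05, n_tests=2197)

# rs242561 p-values from Table 4 (UKBEC), keyed by (brain region, probe set).
table4_p_values = {
    ("CRBL", "3723710"): 1.37e-5,
    ("TCTX", "3723707"): 9.78e-6,
    ("TCTX", "3723751"): 1.78e-5,
    ("WHMT", "3723707"): 1.32e-5,
}

passing = {key: p < threshold for key, p in table4_p_values.items()}
print(f"threshold = {threshold:.3g}")
```

All four listed associations fall below the computed threshold of about 2.28 × 10−5.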
Other bioinformatics analyses
To identify SNPs within putative NRF2/sMAF binding sites (AREs), we used position weight matrix (PWM) calculations computed from 57 published AREs to analyze the sequences surrounding these SNPs (Wang et al., 2007). To determine if a SNP was within other transcription factor binding sites, we downloaded SNP annotation file RegulomeDB.dbSNP141.txt from the RegulomeDB (Boyle et al., 2012). Then we examined the selected SNPs for reproducible ChIP-seq signals and matched binding motifs of the TF used in the ChIP-seq.
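For readers unfamiliar with PWM scoring, the sketch below shows the general idea of comparing two alleles of a SNP against a motif model: build log-odds weights from aligned binding-site counts, then sum the weights along each allele's sequence. The four-position count matrix is a toy stand-in, not the 57-ARE matrix of Wang et al. (2007), and the pseudocount and uniform background are assumptions.

```python
from math import log2
from typing import Dict, List

BASES = "ACGT"


def pwm_from_counts(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def pwm_score(pwm: List[Dict[str, float]], sequence: str) -> float:
    """Sum the position-specific weights for a sequence of motif length."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(column[base] for column, base in zip(pwm, sequence.upper()))


# Toy counts from 20 hypothetical aligned sites (illustration only).
toy_counts = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 0, "C": 0, "G": 19, "T": 1},
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 16, "G": 2, "T": 1},
]
toy_pwm = pwm_from_counts(toy_counts)

# Two alleles of a hypothetical SNP at the first motif position.
score_strong = pwm_score(toy_pwm, "TGAC")
score_weak = pwm_score(toy_pwm, "CGAC")
allele_score_difference = score_strong - score_weak
```

A positive difference indicates that the first allele matches the motif model better, which is the sense in which one allele is called the "stronger binding" allele.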
Statistical Methods
The binomial test was used to test allele-specific binding or expression. For each SNP of interest, we tested the null hypothesis that the counts of the two alleles occurred equally with no allele-specific binding or expression. Fisher’s exact test and Chi-squared test were used for enrichment analysis to calculate the significance of the difference between expected and observed frequencies (or ratios). The Cochran-Armitage trend test was used to assess for the presence of an association between a variable with two categories and a variable with k categories. The null hypothesis is the hypothesis of no trend, which means that the binomial proportion is the same for all levels of the explanatory variable. Two-way analysis of variance (ANOVA) was used to test if Mapt transcription in mouse brain was influenced by the presence of Nrf2 under hyperoxia conditions. The t-test was used to test the significance of the difference in means between two groups of data (treatment vs control). Multiple comparisons were controlled by using Bonferroni correction (0.05/number of tests).
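As an illustration of the trend test described above, the sketch below computes the Cochran-Armitage statistic for a 2 × k table using the normal approximation. It is a generic textbook form with linear scores 0, 1, ..., k−1 by default, not necessarily the exact variant or software used in the study, and the example counts are invented.

```python
from math import erfc, sqrt
from typing import Optional, Sequence, Tuple


def cochran_armitage(cases: Sequence[int], totals: Sequence[int],
                     scores: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return (z, two-sided p) for a trend in proportions across ordered groups.

    cases[i]  : number of 'successes' in group i
    totals[i] : number of observations in group i
    scores[i] : ordinal score of group i (defaults to 0, 1, ..., k-1)
    """
    if len(cases) != len(totals):
        raise ValueError("cases and totals must have the same length")
    if scores is None:
        scores = list(range(len(cases)))
    n_all = sum(totals)
    p_bar = sum(cases) / n_all
    # Score-weighted deviation of observed from expected counts under no trend.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    s1 = sum(s * n for s, n in zip(scores, totals))
    s2 = sum(s * s * n for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_all)
    if variance <= 0:
        raise ValueError("variance is zero; the trend is undefined for this table")
    z = t_stat / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return z, p_two_sided


# Hypothetical counts across three ordered groups (e.g. genotype classes).
z_trend, p_trend = cochran_armitage(cases=[5, 10, 20], totals=[20, 20, 20])
z_flat, p_flat = cochran_armitage(cases=[10, 20, 30], totals=[20, 40, 60])
```

When the proportion is identical in every group, as in the second call, the statistic is zero and the p-value is 1, matching the stated null hypothesis of no trend.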
Luciferase reporter assays
See supplemental methods.
Reverse transcriptase PCR (RT-PCR) & quantitative PCR (qPCR)
See supplemental methods.
Nrf2 +/+, Nrf2 −/− mouse oxidative stress exposure model
See supplemental methods.
Supplementary Material
Highlights.
- Genome-wide NRF2/sMAF occupancy defines highly-ranked disease-associated SNPs
- SNPs in NRF2 binding sites are rare in the human genome due to negative selection
- A SNP in MAPT shows allele-specific binding to NRF2, enhanced MAPT transcription
- Strong NRF2 binding site is linked with reduced risk of Parkinsonian diseases
Acknowledgments
This work was funded by the Intramural Research Program of National Institute of Environmental Health Sciences, National Institutes of Health (projects: Z01ES100475, Z01ES46008), the University of Minnesota Foundation (MS) and the Ludwig Institute for Cancer Research (GB). The authors would like to acknowledge Dr. Shuangshuang Dai for Linux computing support, NIEHS Animal Facility Core and Dr. Jean Harry, NIEHS for technical advice and useful comments on the manuscript.
Footnotes
Author Contributions
D.A.B. and X.W. conceived and designed the experiments; M.R.C., S.E.L., H.C., B.N.C., M.W. and C.C. performed the experiments; X.W., D.A.B., S.R.K., G.B., and M.S. analyzed the data; X.W. and D.A.B. wrote the paper.
References
- Allen M, Kachadoorian M, Quicksall Z, Zou F, Chai HS, Younkin C, Crook JE, Pankratz VS, Carrasquillo MM, Krishnan S, et al. Association of MAPT haplotypes with Alzheimer’s disease risk and MAPT brain gene expression levels. Alzheimers Res Ther. 2014;6:39. doi: 10.1186/alzrt268.
- Ballatore C, Lee VM, Trojanowski JQ. Tau-mediated neurodegeneration in Alzheimer’s disease and related disorders. Nat Rev Neurosci. 2007;8:663–672. doi: 10.1038/nrn2194.
- Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790–1797. doi: 10.1101/gr.137323.112.
- Caffrey TM, Joachim C, Wade-Martins R. Haplotype-specific expression of the N-terminal exons 2 and 3 at the human MAPT locus. Neurobiology of Aging. 2008;29:1923–1929. doi: 10.1016/j.neurobiolaging.2007.05.002.
- Cho HY, Kleeberger SR. Association of Nrf2 with airway pathogenesis: lessons learned from genetic mouse models. Arch Toxicol. 2015;89:1931–1957. doi: 10.1007/s00204-015-1557-y.
- Chorley BN, Campbell MR, Wang X, Karaca M, Sambandan D, Bangura F, Xue P, Pi J, Kleeberger SR, Bell DA. Identification of novel NRF2-regulated genes by ChIP-Seq: influence on retinoid X receptor alpha. Nucleic Acids Res. 2012;40:7416–7429. doi: 10.1093/nar/gks409.
- Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- Consortium G. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science (New York, NY) 2015a;348:648–660. doi: 10.1126/science.1262110.
- Consortium RE. Integrative analysis of 111 reference human epigenomes. Nature. 2015b;518:317–330. doi: 10.1038/nature14248.
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Adany R, Aromaa A, et al. Multiple common variants for celiac disease influencing immune gene expression. Nature genetics. 2010;42:295–302. doi: 10.1038/ng.543.
- Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518:337–343. doi: 10.1038/nature13835.
- Fogarty MP, Cannon ME, Vadlamudi S, Gaulton KJ, Mohlke KL. Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 2014;10:e1004633. doi: 10.1371/journal.pgen.1004633.
- Genomes Project C, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
- Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai SL, Arepalli S, Dillman A, Rafferty IP, Troncoso J, et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 2010;6:e1000952. doi: 10.1371/journal.pgen.1000952.
- Grundberg E, Small KS, Hedman AK, Nica AC, Buil A, Keildson S, Bell JT, Yang TP, Meduri E, Barrett A, et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nature genetics. 2012;44:1084–1089. doi: 10.1038/ng.2394.
- Hayes JD, Dinkova-Kostova AT. The Nrf2 regulatory network provides an interface between redox and intermediary metabolism. Trends Biochem Sci. 2014;39:199–218. doi: 10.1016/j.tibs.2014.02.002.
- Hayes JD, McMahon M. Molecular basis for the contribution of the antioxidant responsive element to cancer chemoprevention. Cancer Lett. 2001;174:103–113. doi: 10.1016/s0304-3835(01)00695-4.
- Hoglinger GU, Melhem NM, Dickson DW, Sleiman PMA, Wang LS, Klei L, Rademakers R, de Silva R, Litvan I, Riley DE, et al. Identification of common variants influencing risk of the tauopathy progressive supranuclear palsy. Nature genetics. 2011;43:699–U125. doi: 10.1038/ng.859.
- Hybertson BM, Gao B. Role of the Nrf2 signaling system in health and disease. Clin Genet. 2014;86:447–452. doi: 10.1111/cge.12474.
- Itoh K, Chiba T, Takahashi S, Ishii T, Igarashi K, Katoh Y, Oyake T, Hayashi N, Satoh K, Hatayama I, et al. An Nrf2/small Maf heterodimer mediates the induction of phase II detoxifying enzyme genes through antioxidant response elements. Biochem Biophys Res Commun. 1997;236:313–322. doi: 10.1006/bbrc.1997.6943.
- Ittner A, Ke YD, van Eersel J, Gladbach A, Gotz J, Ittner LM. Brief update on different roles of tau in neurodegeneration. IUBMB Life. 2011;63:495–502. doi: 10.1002/iub.467.
- Johnson DA, Johnson JA. Nrf2-a therapeutic target for the treatment of neurodegenerative diseases. Free Radic Biol Med. 2015 doi: 10.1016/j.freeradbiomed.2015.07.147.
- Jun G, Ibrahim-Verbaas CA, Vronskaya M, Lambert JC, Chung J, Naj AC, Kunkle BW, Wang LS, Bis JC, Bellenguez C, et al. A novel Alzheimer disease locus located near the gene encoding tau protein. Mol Psychiatry. 2015 doi: 10.1038/mp.2015.23.
- Karczewski KJ, Dudley JT, Kukurba KR, Chen R, Butte AJ, Montgomery SB, Snyder M. Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America. 2013;110:9607–9612. doi: 10.1073/pnas.1219099110.
- Kouri N, Ross OA, Dombroski B, Younkin CS, Serie DJ, Soto-Ortolaza A, Baker M, Finch NC, Yoon H, Kim J, et al. Genome-wide association study of corticobasal degeneration identifies risk variants shared with progressive supranuclear palsy. Nature communications. 2015;6:7247. doi: 10.1038/ncomms8247.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008;18:1851–1858. doi: 10.1101/gr.078212.108.
- Li QH, Brown JB, Huang HY, Bickel PJ. Measuring Reproducibility of High-Throughput Experiments. Annals of Applied Statistics. 2011;5:1752–1779.
- Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27:1696–1697. doi: 10.1093/bioinformatics/btr189.
- Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science (New York, NY) 2012;337:1190–1195. doi: 10.1126/science.1222794.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
- Muramatsu H, Katsuoka F, Toide K, Shimizu Y, Furusako S, Yamamoto M. Nrf2 deficiency leads to behavioral, neurochemical and transcriptional changes in mice. Genes Cells. 2013;18:899–908. doi: 10.1111/gtc.12083.
- Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, Marlowe L, Kaleem M, Leung D, Bryden L, Nath P, et al. A survey of genetic human cortical gene expression. Nature genetics. 2007;39:1494–1499. doi: 10.1038/ng.2007.16.
- Nioi P, McMahon M, Itoh K, Yamamoto M, Hayes JD. Identification of a novel Nrf2-regulated antioxidant response element (ARE) in the mouse NAD(P)H:quinone oxidoreductase 1 gene: reassessment of the ARE consensus sequence. Biochem J. 2003;374:337–348. doi: 10.1042/BJ20030754.
- Ramasamy A, Trabzuni D, Gibbs JR, Dillman A, Hernandez DG, Arepalli S, Walker R, Smith C, Ilori GP, Shabalin AA, et al. Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies. Nucleic Acids Res. 2013;41:e88. doi: 10.1093/nar/gkt069.
- Ramasamy A, Trabzuni D, Guelfi S, Varghese V, Smith C, Walker R, De T, Coin L, de Silva R, Cookson MR, et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nature Neuroscience. 2014;17:1418–1428. doi: 10.1038/nn.3801.
- Rushmore TH, Morton MR, Pickett CB. The antioxidant responsive element. Activation by oxidative stress and identification of the DNA consensus sequence required for functional activity. J Biol Chem. 1991;266:11632–11639.
- Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22:1748–1759. doi: 10.1101/gr.136127.111.
- Stack C, Jainuddin S, Elipenahli C, Gerges M, Starkova N, Starkov AA, Jove M, Portero-Otin M, Launay N, Pujol A, et al. Methylene blue upregulates Nrf2/ARE genes and prevents tau-related neurotoxicity. Human molecular genetics. 2014;23:3716–3732. doi: 10.1093/hmg/ddu080.
- Stefansson H, Helgason A, Thorleifsson G, Steinthorsdottir V, Masson G, Barnard J, Baker A, Jonasdottir A, Ingason A, Gudnadottir VG, et al. A common inversion under selection in Europeans. Nature genetics. 2005;37:129–137. doi: 10.1038/ng1508.
- Tong KI, Kobayashi A, Katsuoka F, Yamamoto M. Two-site substrate recognition model for the Keap1-Nrf2 system: a hinge and latch mechanism. Biol Chem. 2006;387:1311–1320. doi: 10.1515/BC.2006.164. [DOI] [PubMed] [Google Scholar]
- Trabzuni D, Ryten M, Walker R, Smith C, Imran S, Ramasamy A, Weale ME, Hardy J. Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies. J Neurochem. 2011;119:275–282. doi: 10.1111/j.1471-4159.2011.07432.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trabzuni D, Wray S, Vandrovcova J, Ramasamy A, Walker R, Smith C, Luk C, Gibbs JR, Dillman A, Hernandez DG, et al. MAPT expression and splicing is differentially regulated by brain region: relation to genotype and implication for tauopathies. Human molecular genetics. 2012;21:4094–4103. doi: 10.1093/hmg/dds238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wakabayashi N, Dinkova-Kostova AT, Holtzclaw WD, Kang MI, Kobayashi A, Yamamoto M, Kensler TW, Talalay P. Protection against electrophile and oxidant stress by induction of the phase 2 response: fate of cysteines of the Keap1 sensor modified by inducers. Proceedings of the National Academy of Sciences of the United States of America. 2004;101:2040–2045. doi: 10.1073/pnas.0307301101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang X, Tomso DJ, Chorley BN, Cho HY, Cheung VG, Kleeberger SR, Bell DA. Identification of polymorphic antioxidant response elements in the human genome. Human molecular genetics. 2007;16:1188–1200. doi: 10.1093/hmg/ddm066. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Research. 2012;40:D930–D934. doi: 10.1093/nar/gkr917. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wasserman WW, Fahl WE. Functional antioxidant responsive elements. Proc Natl Acad Sci U S A. 1997;94:5361–5366. doi: 10.1073/pnas.94.10.5361. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yamazaki H, Tanji K, Wakabayashi K, Matsuura S, Itoh K. Role of the Keap1/Nrf2 pathway in neurodegenerative diseases. Pathol Int. 2015;65:210–219. doi: 10.1111/pin.12261. [DOI] [PubMed] [Google Scholar]
- Zeron-Medina J, Wang X, Repapi E, Campbell MR, Su D, Castro-Giner F, Davies B, Peterse EF, Sacilotto N, Walker GJ, et al. A polymorphic p53 response element in KIT ligand influences cancer risk and has undergone natural selection. Cell. 2013;155:410–422. doi: 10.1016/j.cell.2013.09.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nussbaum C, Myers RM, Brown M, Li W, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- Zhong Q, Congdon EE, Nagaraja HN, Kuret J. Tau isoform composition influences rate and extent of filament formation. J Biol Chem. 2012;287:20711–20719. doi: 10.1074/jbc.M112.364067.
- Zou FG, Chai HS, Younkin CS, Allen M, Crook J, Pankratz VS, Carrasquillo MM, Rowley CN, Nair AA, Middha S, et al. Brain Expression Genome-Wide Association Study (eGWAS) Identifies Human Disease-Associated Variants. PLoS Genet. 2012;8:e1002707. doi: 10.1371/journal.pgen.1002707.