SUMMARY
Genome-wide association studies (GWASs) have identified numerous variants associated with polygenic traits and diseases. However, with few exceptions, a mechanistic understanding of which variants affect which genes in which tissues to modulate trait variation is lacking. Here, we present genomic analyses to explain trait heritability of blood pressure (BP) through the genetics of transcriptional regulation using GWASs, multiomics data from different tissues, and machine learning approaches. Approximately 500,000 predicted regulatory variants across four tissues explain 33.4% of variant heritability: 2.5%, 5.3%, 7.7%, and 11.8% for kidney-, adrenal-, heart-, and artery-specific variants, respectively. Variation in the enhancers involved shows greater tissue specificity than in the genes they regulate, suggesting that gene regulatory networks perturbed by enhancer variants in a tissue relevant to a phenotype are the major source of interindividual variation in BP. Thus, our study provides an approach to scan human tissue and cell types for their physiological contribution to any trait.
In brief
Investigating tissue-specific effects of sequence variants on complex traits through a gene regulatory network is challenging. Lee et al. demonstrate that the heritability of blood pressure is largely explained by predicted tissue-specific enhancer variants and that artery-specific enhancer variants are the greatest source of interindividual variation in blood pressure regulation.
INTRODUCTION
Genome-wide association studies (GWASs) of polygenic traits and diseases have been highly successful in mapping hundreds of thousands of common sequence variants clustered at thousands of genomic loci.1–3 These genetic analyses have confirmed Fisher’s infinitesimal model of multifactorial inheritance,4 namely, that complex traits and diseases arise from the additive actions of small genetic effects at many genes.5 The success in answering this genetic puzzle of complex inheritance stands in stark contrast to our continuing inability to identify the underlying molecular and mechanistic basis for these traits. Most GWAS variants reside in the non-coding genome and are believed to perturb the expression of many trait-modifying genes by altering the effects of their transcriptional enhancers, often in a tissue- or cell-type-specific manner.6 Therefore, to understand the functional basis of a given trait or disease, we need to know the identities of the four relevant transcriptional components of each GWAS locus: the transcription factors (TFs) involved, the cis-regulatory elements (CREs, or enhancers) that bind these TFs, the causal sequence variants within these CREs that alter enhancer activity, and the gene whose expression is thereby altered.7 Genetic variation in any component of this quartet is the fundamental unit of polygenic trait variation. This problem has been difficult to solve, except for selected genes and loci,8,9 and here, we provide a general genomic and computational framework for doing so. When successful, these transcriptional components will allow experimental tests of the veracity of a trait’s uncovered functional architecture and teach us how a specific set of genes modulate a trait or disease.
Central to the task of elucidating a phenotype’s functional architecture is the challenge of identifying the tissues where such genetic variation is functionally relevant. We use the genetic principle that only sequence variants in CREs active in a specific tissue can mediate trait effects by perturbing their target gene’s expression in that tissue, irrespective of whether the target gene is expressed elsewhere. This principle has emerged from our extensive research on Hirschsprung disease, a neuro-developmental disorder affecting the enteric nervous system. We have shown that the majority of the disease risk arises from specific sequence variants within three CREs responsible for regulating RET gene expression during gut development.10 Interestingly, these variants are common in the general human population and do not manifest in other phenotypes, despite the widespread expression of RET in various tissues. This phenomenon can be attributed to the property of these enhancers to exert their regulatory effects in a gut-specific manner despite being open (accessible) in other tissues. Thus, CRE activity in a given tissue is the most direct way to separate putative causal variants from those in linkage disequilibrium (LD) with causal variants. It is, of course, possible that a CRE sequence variant is merely in LD with the causal variant residing in another tissue’s CRE. Distinguishing these effects is important and requires us to predict the functional consequences of all CRE-resident variants, analogous to methods that distinguish pathogenic from benign coding variants. Advancing such analyses requires comprehensive and high-quality tissue-resolved CRE and gene expression epigenomic maps to constrain the functional genomic space where sequence variation relevant to a trait can reside.
The availability of such maps can then allow scanning all human tissues for their relative contribution to a trait’s inter-individual variation and, in principle, can be extended to all cell types within contributing tissues, connecting genome sequence variation to the cell and tissue origins of a phenotype.
Identifying the specific functional variants in CREs is a second major challenge because only a small proportion of variants within a CRE are expected to affect its activity.11 However, the advent of computational methods for functional CRE variant predictions now provides a practical means to systematically identify such variants genome-wide.12–15 Over the past decade, several groups, including us, have pioneered these approaches and demonstrated that functional CRE variants can be predicted from DNA sequences,16–18 using a two-step approach. First, sequence-based models for CRE prediction are built using machine-learning methods trained on experimentally detected CREs. Second, these models are used to estimate the effect of any CRE variant by calculating the difference in model-predicted CRE activities between reference and variant alleles. This method can score any genetic variant as long as its sequence information is available, regardless of its sequence type, allele frequency, and haplotype structure. These sequence-resolved epigenomic maps can thus identify variants in CREs shared across tissues but affecting gene expression in a tissue-specific manner. Thus, high-quality tissue-resolved CRE maps, also used as training data for these sequence-based predictive models, are an essential component of our analyses.
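The second step of this two-step approach can be illustrated with a minimal Python sketch. It is not the deltaSVM implementation: it assumes a hypothetical table of contiguous k-mer weights (`weights`) standing in for a trained sequence model, whereas gkm-SVM uses gapped k-mers, and the function names `kmer_score` and `delta_score` are ours. The sketch scores only the k-mers that overlap the variant and returns the variant-minus-reference difference.

```python
from typing import Dict


def kmer_score(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum the weights of all k-mers in seq; k-mers absent from the table score 0."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def delta_score(flank5: str, ref: str, alt: str, flank3: str,
                weights: Dict[str, float], k: int) -> float:
    """Difference in model score between the variant and reference alleles.

    Only k-1 bases of flanking sequence are kept on each side, so every
    scored k-mer overlaps the variant position.
    """
    left = flank5[len(flank5) - (k - 1):] if k > 1 else ""
    right = flank3[:k - 1]
    alt_score = kmer_score(left + alt + right, weights, k)
    ref_score = kmer_score(left + ref + right, weights, k)
    return alt_score - ref_score


# Toy weights (hypothetical values, not from a trained model).
toy_weights = {"ACG": 1.0, "CGT": 0.5, "ATG": -2.0}
toy_delta = delta_score("TTA", "C", "T", "GTT", toy_weights, k=3)
```

In this toy case the reference allele preserves two positively weighted 3-mers and the alternative allele creates a negatively weighted one, so the difference is negative, which would be read as a predicted loss of CRE activity.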
In this study, we propose genomic and epigenomic analyses of any complex trait and apply these methods to blood pressure (BP) variation, a classical polygenic trait. Genetic dissection of BP has remained challenging because of the influence of numerous environmental and genetic factors; the mechanistic involvement of multiple tissues, such as the kidney, adrenal, heart, and vasculature, among others; and multiple levels of physiological control, be they instantaneous, diurnal, or long term.19 BP has been a major target of adaptation in human evolution20,21 and is a significant physiological trait that underlies numerous cardiovascular diseases and their complications.22 It is precisely because of this multiplicity of mechanisms that genetic analysis is critical: it can implicate genes, pathways, cell types, and tissues, and their relative roles, in an unbiased manner. Genetic analyses of BP syndromes have clearly revealed the critical role of salt-water homeostasis in the distal nephrons of the kidney23; however, BP epidemiology suggests this is an incomplete picture and that many more tissues are important. The methods and studies we describe here now allow us to answer these and other broader questions regarding BP genetics and essential hypertension, one of the most frequent human disorders.
RESULTS
Our primary objective was to elucidate the functional genetic architectures of systolic BP (SBP) and diastolic BP (DBP) contributing to inter-individual BP trait variation in a tissue-resolved manner (Figure 1). Thus, we focused here on four major tissues relevant to BP,24 the adrenal gland, artery, heart, and kidney, and first built high-quality epigenomic maps for gene expression, CREs, and regulatory variants (see STAR Methods). For comparison to control (BP-irrelevant) tissues, we also considered data from the pancreas, liver, thyroid gland, and colon. Although many such relevant datasets are publicly available, they require uniform reanalysis because our tissue-resolved analysis was comparative, required the use of identical criteria across datasets, and relied on sequence-resolved data.12,25,26
Figure 1. Study design.

We focused on four major tissues relevant to blood pressure regulation and analyzed their relative contributions to systolic and diastolic blood pressure (BP) variation using genetic variants affecting gene expression in these tissues.
To construct comprehensive CRE maps, we reanalyzed existing chromatin accessibility data. We first analyzed DNase sequencing (DNase-seq) data for three adult tissues that were publicly available (Table S1). Using our optimized peak-calling methods for open chromatin,27 we uniformly processed DNase-seq from the ENCODE project28 and compared our CRE peak calls with HOTSPOT2 peaks publicly available from ENCODE. We performed extensive sequence-based quality control (gkmQC) analysis, which we recently developed for assessing the quality of epigenomic data,29 to show that our optimized methods identified higher-quality peaks (higher peak predictability) than ENCODE processing methods (Figure S1). Consistent with this comparison, the heritability of BP, estimated using the stratified LD score regression method (S-LDSC),30 is higher in our identified peaks than in HOTSPOT2 peaks (Figure S2). For adult kidney tissues, we generated multiple libraries using the assay for transposase-accessible chromatin with sequencing (ATAC-seq), as kidney open chromatin data are not publicly available (STAR Methods). Along with our kidney ATAC-seq data, we uniformly processed additional ENCODE ATAC-seq data for other tissues and confirmed that several of them, including our kidney ATAC-seq, exhibited high quality and were comparable to DNase-seq data (Figure S3). Once again, heritability analysis demonstrated that high-quality ATAC-seq samples determined by gkmQC explain greater BP heritability than low-quality ones and were comparable to high-quality DNase-seq data (Figure S4). We note that, in contrast to BP-relevant tissues, BP heritability does not correlate with epigenomic data quality of BP-irrelevant tissues, such as the pancreas, liver, thyroid gland, and colon (Table S2; Figure S5). The negligible per-SNP heritability even for the high-quality samples of these latter tissues further validates the relevance of our chosen tissues.
Each sample for a tissue captures slightly different enhancer regions due to variation in technical (e.g., assay used) and biological (e.g., variation in CRE intensity) factors.27 Thus, it is necessary to construct comprehensive maps from all available chromatin accessibility data of specified quality for each tissue, analyzed in the same manner. Starting with the highest-quality sample identified by gkmQC, we defined peaks in this sample as “the best peak set.” We identified unique peaks not overlapping these for each of the remaining samples from the same tissue and chose the next best sample whose unique peaks explained the largest additional BP heritability, through a partitioning heritability analysis (S-LDSC),30 and augmented the existing best peak set with these new peaks. We repeated these steps until a sample failed to significantly increase explained trait heritability. This process showed that the two highest-quality samples from different chromatin accessibility assays are often sufficient to explain the greatest BP heritability (Figure S6). We speculate that this feature likely arises from DNase-seq and ATAC-seq identifying different types of CREs. Finally, we note that differences in the number of open chromatin regions across tissues could produce a potential bias in comparative heritability analyses. To reduce this effect, we selected the top 100,000 regions from each tissue for further analysis (Table S3).
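The iterative augmentation described above can be sketched as a greedy selection loop. The following Python sketch rests on several simplifying assumptions: peaks are represented by hypothetical identifiers so that overlap reduces to set membership, the partitioned heritability analysis is abstracted as a caller-supplied function `added_h2` (in practice this would involve an external S-LDSC run), and the significance criterion is replaced by a simple threshold `min_gain`. None of these names come from the study.

```python
from typing import Callable, Dict, List, Set, Tuple


def greedy_peak_set(samples: Dict[str, Set[str]],
                    best_sample: str,
                    added_h2: Callable[[Set[str], Set[str]], float],
                    min_gain: float) -> Tuple[Set[str], List[str]]:
    """Grow a peak set by repeatedly adding the sample whose unique peaks help most.

    samples     : sample name -> set of peak identifiers (hypothetical encoding)
    best_sample : the highest-quality sample, used as the starting peak set
    added_h2    : placeholder for "additional heritability explained by new peaks"
    min_gain    : placeholder stopping threshold (stands in for a significance test)
    """
    current = set(samples[best_sample])
    order = [best_sample]
    remaining = {name for name in samples if name != best_sample}
    while remaining:
        # Evaluate only the peaks that are not already in the current set.
        gains = {name: added_h2(current, samples[name] - current)
                 for name in sorted(remaining)}
        best = max(sorted(gains), key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining sample adds enough; stop augmenting
        current |= samples[best]
        order.append(best)
        remaining.remove(best)
    return current, order


# Toy usage: pretend each new peak adds 1% heritability (illustrative only).
toy_samples = {
    "A": {"p1", "p2", "p3"},
    "B": {"p3", "p4", "p5"},
    "C": {"p1", "p2"},
    "D": {"p5", "p6"},
}
toy_peaks, toy_order = greedy_peak_set(
    toy_samples, "A", lambda cur, new: 0.01 * len(new), min_gain=0.015)
```

In the toy example the loop stops after adding one further sample, mirroring the observation that a small number of high-quality samples can suffice.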
Prior studies have shown that heritability is enriched in regions surrounding genes with tissue-specific expression.31 However, it is unknown how much of the heritability is explained by genetic variations in regions around “all” expressed genes, regardless of their tissue specificity. Thus, we first evaluated the contribution of all common variants to BP heritability in regions of expressed genes for each tissue. To do so, we defined the “gene region” as the gene body, from the transcription start to the stop site, extended by ±50 kb (Figure 2A). Next, for each of the four BP-relevant tissues, we performed partitioning heritability analysis (S-LDSC)30 using all common sequence variants in gene regions of each tissue’s top 10,000 genes (ranked by median gene expression from GTEx32) and BP GWAS summary statistics from the UK Biobank.33 These and the following analyses attempt to explain only that fraction of the heritability measured by genetic variants. For these analyses, we used a standard regression where BP was adjusted for age and sex as covariates.34 First, across all four tissues, variants in these gene regions explain ~65% of the heritability, which is highly significant. Second, each of the four tissues individually showed a highly significant contribution of ~50% to heritability (Figure 2B), suggesting significant pleiotropy between these four tissues. Repeated analyses using different numbers of expressed genes yielded similar results, but the fraction of heritability is correlated with the number of genes (Figure S7).
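The gene-region definition and the ranking of genes by expression can be sketched as follows. This is a minimal illustration, assuming genes are given as chromosome intervals with `start` < `end` regardless of strand and that median expression values are supplied as a plain dictionary; the `Gene` type and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # leftmost coordinate of the gene body
    end: int    # rightmost coordinate of the gene body


FLANK_BP = 50_000  # the ±50 kb flank used in the text


def gene_region(gene: Gene, flank: int = FLANK_BP) -> Tuple[str, int, int]:
    """Gene body extended by `flank` on both sides, clipped at coordinate 0."""
    return (gene.chrom, max(0, gene.start - flank), gene.end + flank)


def top_expressed(median_expr: Dict[str, float], n: int) -> List[str]:
    """Names of the n genes with the highest median expression (ties by name)."""
    return sorted(median_expr, key=lambda g: (-median_expr[g], g))[:n]


# Toy usage with made-up coordinates and expression values.
toy_gene = Gene("G1", "chr1", 30_000, 80_000)
toy_region = gene_region(toy_gene)
toy_top = top_expressed({"a": 1.0, "b": 5.0, "c": 3.0}, n=2)
```

Variants falling inside the regions of the selected genes would then form the annotation passed to the heritability analysis.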
Figure 2. Variants in open chromatin regions are major contributors to BP heritability.

(A) Schematic of how the genome is partitioned into five distinct non-overlapping regions based on gene annotation and chromatin accessibility. To avoid overlap, the five genomic categories were defined sequentially from top to bottom, in the order shown in the legend. The gene region is defined by the entire gene body with its ±50,000 bp flanking regions.
(B) Bar plots show the proportion of SNPs (top), the proportion of SNP heritability (middle), and the enrichment scores (bottom) for the top 10,000 genes expressed in each of the four tissues and their union. The LDSC method was used for the analysis, and each tissue was analyzed separately with the baseline model (v.2.2).
(C) For each tissue, the three statistics shown in (B) are further divided into the five exclusive genomic categories as described in (A). These five categories were analyzed together in one LDSC model to account for variants with high LD in more than one genomic category.
Error bars in (B) and (C) are standard errors estimated using a block jackknife method implemented in the LDSC method with a default setting (n = 200 blocks).
Asterisks indicate the statistical significance based on Z scores from S-LDSC coefficients (*p < 0.01, **p < 0.001, ***p < 0.0001).
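For readers unfamiliar with the block jackknife mentioned in the caption, the following Python sketch shows a generic delete-one-block jackknife standard error for an arbitrary estimator. It is not LDSC's own implementation, which applies the procedure to regression estimates over SNP blocks; the function name and the use of a simple mean as the estimator are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def block_jackknife_se(data: Sequence[float], n_blocks: int,
                       estimator: Callable[[List[float]], float]) -> float:
    """Delete-one-block jackknife standard error of `estimator` over `data`."""
    n = len(data)
    # Contiguous, near-equal block boundaries.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    leave_one_out = []
    for b in range(n_blocks):
        kept = list(data[:bounds[b]]) + list(data[bounds[b + 1]:])
        leave_one_out.append(estimator(kept))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (t - mean_loo) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Toy usage: with one observation per block this reproduces the usual
# standard error of the mean.
toy_se = block_jackknife_se([1.0, 2.0, 3.0, 4.0], n_blocks=4, estimator=mean)
```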
We next asked whether variants in different functional categories within the gene regions contribute differently. To understand these functional differences, for each tissue, we partitioned gene regions into five mutually exclusive and exhaustive subregions based on gene and CRE content (STAR Methods). We evaluated heritability simultaneously for all subregions in one S-LDSC model so that we could directly compare their individual contributions in the context of other annotations. The results showed that CRE variants explain most of the heritability, particularly for the artery (27.5% DBP, 26.2% SBP) and heart (24.5% DBP, 22.3% SBP), while exonic variants showed a much smaller, yet detectable, fraction (4.1%–5.2% DBP, 4.4%–4.8% SBP across tissues). In contrast, exonic variants showed the most significant enrichment (6.3- to 7.7-fold with p = 1.2 × 10^−3 to 8.2 × 10^−5 for DBP; 6.6- to 7.2-fold with p = 2.7 × 10^−3 to 5.2 × 10^−4 for SBP across tissues) owing to their smaller genomic target size (0.7% of all SNPs) (Figure 2C). Finally, we note that a significant portion of the heritability is explained by CREs outside expressed genes as well (9.0%–15.7% DBP; 8.7%–11.7% SBP, across tissues). This effect is likely from distal enhancers or CREs residing within genes expressed in other tissues. Although greater heritability is enriched in artery and heart CREs than in other tissues, we observed a high degree of explained heritability and enrichment in all subregions regardless of the tissue considered. This implies that variants in diverse functional elements cause inter-individual BP variation and explains the near-uniform distribution of polygenic genetic effects.5
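Two ideas in this analysis lend themselves to a short sketch: assigning each variant to exactly one category by a fixed priority order, and expressing enrichment as the proportion of heritability divided by the proportion of SNPs (the standard S-LDSC definition). The Python below is illustrative only; the category labels, their order, and the numeric values are placeholders rather than the study's actual annotations or results.

```python
from typing import Dict, Sequence, Set


def assign_exclusive(snp_annots: Dict[str, Set[str]],
                     priority: Sequence[str],
                     other: str = "other") -> Dict[str, str]:
    """Give each SNP the first category in `priority` that annotates it.

    SNPs matching no listed category fall into `other`, so the resulting
    categories are mutually exclusive and exhaustive.
    """
    return {snp: next((c for c in priority if c in annots), other)
            for snp, annots in snp_annots.items()}


def enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = proportion of heritability / proportion of SNPs."""
    return prop_h2 / prop_snps


# Toy usage with placeholder categories and SNP identifiers.
toy_annots = {"snp1": {"cre", "exon"}, "snp2": {"cre"}, "snp3": set()}
toy_classes = assign_exclusive(toy_annots, priority=["exon", "cre"])
toy_enrichment = enrichment(prop_h2=0.12, prop_snps=0.02)
```

Because enrichment divides by the SNP proportion, a small category can show high enrichment while explaining a modest share of heritability, which is the pattern described for exonic variants.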
We hypothesize that numerous widely expressed genes and CREs active in multiple tissues, hereafter referred to as ubiquitous, lead to significant BP heritability (Figures S8A and S8B). To test this contention, we first quantified the pattern of sharing of gene expression vis-à-vis their CREs across the four BP-relevant tissues we studied. The overlap of expressed genes across tissues demonstrates that more than 60% of them are expressed in all four tissues, while <5% are uniquely expressed (Figure S8C). In contrast, 17% of CREs are ubiquitously open, while, for each tissue, 15%–20% are open in that tissue only (Figure S8D). We further investigated this surprising relationship between gene expression and CREs across tissues by classifying CRE overlap between tissues vis-à-vis gene expression overlaps. In general, the proportions of CREs with tissue overlaps do not vary much by gene expression overlaps (Figures 3 and S9). This is because two major CRE classes emerge: those that are ubiquitously open, and those that are open only in the tissue in which their target gene is expressed. Consequently, we sought to quantify the contributions of CREs and genes to BP heritability as a function of their ubiquitous vs. non-ubiquitous feature. For each tissue, we classified CREs into six different groups based on their gene expression and CRE activity patterns across tissues (Figure 4A) and performed partitioned heritability analysis using all six groups in one model. The most significant and greatest BP heritability arises from ubiquitously active CREs in ubiquitously expressed genes. This is the primary reason why we can explain significant heritability on an individual tissue basis (11.1%–12.0% for DBP, 11.1%–11.9% for SBP). On the other hand, BP heritability explained by tissue-restricted CREs in tissue-restricted genes is much more variable across tissues.
Here, artery- (6.0% DBP, 6.5% SBP) and heart-restricted (5.2% DBP, 3.9% SBP) CREs explain much greater BP heritability than those from the adrenal gland (1.5% DBP, 2.1% SBP) or kidney (2.4% DBP, 2.2% SBP). Finally, we observed the greatest BP heritability enrichment in artery-restricted CREs for artery-restricted genes (6.7-fold with p = 5.7 × 10^−5 for DBP, 7.3-fold with p = 2.8 × 10^−5 for SBP). Similar to the result in Figure 2C, artery- and heart-restricted CREs not associated with “expressed genes” explain an even greater proportion of heritability (12.3% and 8.7% for DBP and SBP for artery and 8.7% and 8.5% for DBP and SBP for heart). Thus, distal enhancers play an important role in BP regulation, at least for the artery and heart, suggesting that different tissues may contribute to BP physiology in distinct ways.
Figure 3. Open chromatin patterns are independent of gene expression patterns.

The classification and stratification of genes and CREs is based on their expression and accessibility patterns in four tissues (adrenal gland, artery, heart, and kidney). The x axis represents sixteen categories (columns) into which each gene is classified from the union of the top 10,000 genes across the four tissues. These categories are determined based on the expression pattern of the gene (expressed or not expressed) in the four tissues. The letters (A, adrenal; R, artery; H, heart; K, kidney) associated with each category indicate the tissues in which the genes are expressed. The y axis represents the stratification of CREs within each gene category. The CREs are further categorized based on their accessibility pattern (open or not open) across the same four tissues. The letters (A, R, H, and K) in the CRE category names indicate the tissues in which the corresponding regions are open. The numbers within each cell of the figure represent the percentage of CREs in a given CRE category relative to the total number of CREs across all categories in one gene category.
Figure 4. Non-ubiquitous CREs show the greatest differential enrichment scores for SBP and DBP.

(A) Using adrenal gland as an example, we show how CREs active in adrenal gland can be classified into six different classes based on their activities in other tissues, genomic location, and gene expression.
(B) For each tissue, bar plots compare the proportion of SNPs (top), the proportion of SNP heritability explained (middle), and the enrichment scores (bottom) across these six classes.
Error bars are standard errors estimated using the LDSC block jackknife method (n = 200 blocks). Asterisks indicate the statistical significance based on Z scores from S-LDSC coefficients (*p < 0.01, **p < 0.001, ***p < 0.0001).
To disentangle the contributions of specific genomic regions across these BP-relevant tissues, we repeated the partitioned heritability analysis with different groupings of genomic regions. Instead of analyzing each tissue independently, we considered all tissues together to define six mutually exclusive classes based on their overlap: four tissue-specific groups (adrenal gland, artery, heart, kidney), one common group (common), and one intermediate group (mixed). We reestimated the heritability explained by each of these groups in a single S-LDSC model. Consistent with our prior analysis, ubiquitously expressed genes explained the greatest BP heritability (41.4% DBP, 41.1% SBP) (Figure 5A); furthermore, CREs active in multiple tissues (common and mixed) also explained significant heritability (15.0% DBP, 14.5% SBP for common and 19.0% DBP, 19.7% SBP for mixed) followed by artery (13.2% DBP, 10.1% SBP), heart (7.8% DBP, 4.9% SBP), and adrenal gland (5.5% DBP, 6.1% SBP) tissues (Figure 5B).
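The six-class grouping can be expressed compactly: an element active in exactly one tissue is tissue specific, one active in all four is common, and anything in between is mixed. The Python sketch below assumes each element (gene, CRE, or variant) comes with the set of tissues in which it is active; the tissue labels and function name are ours.

```python
from typing import Iterable

TISSUES = ("adrenal", "artery", "heart", "kidney")


def overlap_class(active_in: Iterable[str]) -> str:
    """Classify an element by the set of tissues in which it is active."""
    active = set(active_in) & set(TISSUES)
    if not active:
        raise ValueError("element is not active in any of the four tissues")
    if len(active) == 1:
        return next(iter(active))      # tissue-specific group
    if len(active) == len(TISSUES):
        return "common"                # active in all four tissues
    return "mixed"                     # active in two or three tissues


# Toy usage.
toy_labels = [overlap_class(s) for s in (
    {"artery"},
    {"heart", "kidney"},
    {"adrenal", "artery", "heart", "kidney"},
)]
```

Because every element receives exactly one label, the six classes can be entered together as non-overlapping annotations in a single model.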
Figure 5. Predicted regulatory variants are largely tissue specific and differentially affect SBP and DBP.

Three different groups of variants are considered: (A) all variants in regions around expressed genes, (B) all variants in CREs, and (C) impactful regulatory variants predicted by deltaSVM. In each group, variants were further divided into six different classes based on overlaps across the four tissues (see Figure 3). Variants in the “common category” affect all four tissues, while variants in the “mixed category” affect more than one, but not all, tissues. For each variant group, bar plots show the proportion of SNPs (top), the proportion of SNP heritability explained (middle), and the enrichment score (bottom) across six variant classes. Error bars are standard errors estimated using the LDSC block jackknife method (n = 200 blocks). Asterisks indicate the statistical significance based on Z scores from S-LDSC coefficients (*p < 0.01, **p < 0.001, ***p < 0.0001).
Since our epigenomic maps could identify the tissue in which a CRE was active, regulatory variants within these CREs would indicate a tissue-specific genetic effect despite the CRE being ubiquitously open. Thus, we used the deltaSVM method, which calculates the change in the gapped k-mer support vector machine (gkm-SVM) scores, with models trained on tissue-specific chromatin accessibility (Figure S10)12 to computationally predict the regulatory effect of variants in CREs (STAR Methods). Strikingly, we discovered that most of these variants are predicted to have an impact on CREs in one tissue only. Specifically, among the variants in CREs in the four tissues, 15.9% and 30.2% are in the common and mixed groups, respectively, while only 1.8% and 22.3% of all deltaSVM-positive variants are in the common and mixed groups. Moreover, these predicted regulatory variants have different levels of impact on BP phenotypes. Consistent with tissue-specific CREs, artery- (11.8% DBP, 9.8% SBP) and heart-specific (7.7% DBP, 6.9% SBP) regulatory variants explained the greatest BP heritability. Artery-specific regulatory variants achieved the most significant and the greatest enrichment (17.2-fold with p = 5.3 × 10^−6 for DBP; 14.2-fold with p = 2.5 × 10^−5 for SBP) (Figure 5C). Thus, even if regulatory elements are active in multiple tissues, variants in these CREs are likely to exert their effect on phenotypes through a specific tissue. Collectively, we explain 33.4% and 29.5% of DBP and SBP heritability with these predicted regulatory variants in the four tissues. Thus, BP physiology is more tissue specific than gene expression analyses signify.
Finally, we repeated the above analyses using two independent BP GWAS datasets from Hoffmann et al.35 and Evangelou et al.2 Hoffmann’s study used a relatively small multiancestry cohort (n = 100,000) and a long-term average of BP measurements, resulting in more robust phenotypes. On the other hand, Evangelou’s study evaluated ~750,000 participants from many different cohorts using average BP measurements from a single visit as phenotypes (STAR Methods). The genotyping data are also different: custom-designed ancestry-specific arrays followed by imputation to the 1000 Genomes Project were used in Hoffmann’s study, whereas different arrays imputed to either 1000 Genomes or HRC were meta-analyzed in Evangelou’s study. The significant differences in study designs (e.g., cohort size, BP measurements) and technical factors (e.g., genotyping array platforms, imputation methods) provided us with an opportunity to assess the robustness of our heritability inference. Indeed, the estimated total SNP heritabilities using S-LDSC are slightly different across all three studies: 15.1% (DBP) and 15.5% (SBP) for the Evangelou dataset and 11.2% (DBP) and 13.6% (SBP) for the Hoffmann dataset, in contrast to 14.8% (DBP) and 13.9% (SBP) for the UK Biobank (UKB) dataset. These differences led to quantitative differences in partitioned heritability estimates across tissues and classes. Nonetheless, the overall patterns are very similar across the three studies, suggesting that inference of tissue and CRE effects is consistent across studies (Figure S11). As expected, results from the Evangelou dataset are more similar to our main findings due to our common use of the UKB cohort.
Tissue-specific regulatory variants in ubiquitously active CREs explain not only heritability (Figure 5C) but also tissue relevance of specific GWAS hits, which remains unclear when analyzing only CREs or gene expression (Figure S12). For example, rs3753584, a top GWAS hit associated with SBP in UKB cohorts (p = 3.04 × 10^−50), is located within the ubiquitous promoter regions of MTHFR and CLCN6 genes. These genes, along with NPPA and NPPB, are expressed in all four tissues and exhibit pleiotropy in relation to various comorbidities and endophenotypes associated with BP.36,37 deltaSVM analysis revealed that rs3753584 has the strongest regulatory impact on artery and heart tissues. The artery-/heart-enriched regulatory impact aligns with the significant association of rs3753584 with natriuretic peptide concentration, a clinical biomarker of cardiac stress affecting BP (p = 4.63 × 10^−38 for midregional pro-atrial natriuretic peptide).38 Additionally, rs10776752, another significant GWAS hit found within the intron of WNT2B (p = 8.25 × 10^−17), is also located in a shared CRE across tissues. rs10776752 most strongly impacts the adrenal gland and is involved in a significant risk haplotype for primary aldosteronism, a common cause of secondary hypertension (p = 5.2 × 10^−11).39 Inspired by these observations, we systematically analyzed BP-associated genes from the viewpoint of tissue-specific regulatory variants (STAR Methods). We found that 82 of 150 (54.7%) and 55 of 91 (60.4%) genes with ≥5 deltaSVM-positive GWAS hits in at least one tissue have 50% or more of their regulatory variants active in that tissue for SBP and DBP, respectively (Table S4). Taken together, a more detailed analysis of tissue specificity, particularly at the SNP level, can help us understand which tissue-specific dysfunction of pleiotropic genes and CREs is associated with a trait, enabling the dissection of tissue-specific mechanisms underlying BP.
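One way to read the gene-level criterion above is sketched below in Python. The interpretation is an assumption on our part: for a single gene, each deltaSVM-positive GWAS hit is mapped to the set of tissues in which it is predicted to be active, and a tissue is reported when it has at least `min_hits` such variants and they make up at least `min_frac` of the gene's regulatory variants. The exact procedure is described in STAR Methods and may differ; the function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def dominant_tissues(variant_tissues: Dict[str, Set[str]],
                     min_hits: int = 5,
                     min_frac: float = 0.5) -> List[str]:
    """Tissues with >= min_hits active variants covering >= min_frac of a gene's variants.

    variant_tissues maps each regulatory variant of one gene to the tissues
    in which it is predicted to be active (hypothetical encoding).
    """
    scored = {v: ts for v, ts in variant_tissues.items() if ts}
    total = len(scored)
    counts = Counter(t for ts in scored.values() for t in ts)
    return sorted(t for t, c in counts.items()
                  if c >= min_hits and c / total >= min_frac)


# Toy gene: six artery-active variants (two also heart-active), two kidney-only.
toy_gene_variants = {
    "v1": {"artery"}, "v2": {"artery"}, "v3": {"artery"}, "v4": {"artery"},
    "v5": {"artery", "heart"}, "v6": {"artery", "heart"},
    "v7": {"kidney"}, "v8": {"kidney"},
}
toy_dominant = dominant_tissues(toy_gene_variants)
```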
As indicated earlier, regulatory variants that alter chromatin accessibility do so by disrupting TF binding. Therefore, we evaluated which TF motifs are enriched for the high-scoring 11-mers from the trained gkm-SVM models27 (STAR Methods). The results revealed 137 TFs significantly enriched in at least one tissue (enrichment score ≥ 4) as well as expressed in that tissue (Figure 6). Interestingly, even though we only required TFs to be expressed in the tissue in which the corresponding motifs are enriched, most (124 out of 137 TFs) of the identified TFs showed expression in multiple tissues with differential enrichment across tissues. Moreover, while some TFs are uniquely enriched in one tissue, in general, most show enrichment in two or more tissues, suggesting that TFs work in combination with other TFs and that their activities are highly cell-type and tissue dependent.40,41 The most enriched TFs for artery are NFI (NFIA, NFIX, NFIB), AP1 (JUND, FOS), CEBP, and MEF2 (MEF2A/B/C/D) factors. In the heart, nuclear receptors (PPARA, NR2F2, NR4A1), NFI, and MEF2 are the most enriched, but MEIS, GATA, and SOX family factors also demonstrated significant enrichment. We also found that motifs enriched for regions overlapping predicted regulatory variants (Figure 6) are largely concordant with the motifs enriched for the predictive 11-mers from our SVM models, suggesting that dysregulation of and by these TFs in their respective tissue of action affects BP regulation.
Figure 6. Specific transcription factors are differentially enriched for predicted regulatory variants.

Heatmaps showing enrichment of transcription factor (TF) motifs across four tissues in 11-mers with large SVM weights from the trained SVM models (left) and in regions overlapping predicted regulatory variants (right). Motifs were clustered based on their enrichment pattern across tissues on the left. The motifs in both heatmaps are ordered identically. Representative TF motifs enriched in the artery and heart are highlighted.
DISCUSSION
In this study, we confirm that BP heritability is significantly controlled by sequence variation in genes expressed in the artery, heart, kidney, and adrenal and that inter-individual BP variation is largely controlled by CREs. Although many of these genes are pleiotropic, surprisingly, CRE-mediated genetic effects from regulatory variants for these genes show far greater tissue specificity. In other words, the pleiotropic nature of gene expression is largely caused by combinations of tissue-specific CREs. Additionally, most promoters, especially those with high CpG dinucleotide content, are more commonly active and open than distal enhancers across different tissues and cell types.42 Thus, the fact that ubiquitously active CREs in ubiquitously expressed genes explain the greatest BP heritability regardless of tissues suggests an important role of promoters in phenotypic variation (Figure 4). However, our finding that the effect of predicted regulatory variants is largely tissue specific even for ubiquitous CREs, as it would be for promoters, suggests that phenotypic consequences mainly arise from perturbation of gene expression in limited tissues. The cell-type specificity informed by our approach can potentially be used to identify targets in drug discovery and avoid potential side effects from action in non-relevant tissues and cell types. Our method can address this issue by identifying the cell types and tissues in which the target protein exerts its disease-relevant function. As an example, we evaluated BP-associated genes and predicted their disease-relevant tissues (Table S4). However, our genomic method is general and applicable to any disorder.
Despite the lower abundance of exonic than non-coding variants, we show that exonic variants have the greatest enrichment in BP heritability (Figure 2C). This suggests that common exonic variants do affect BP phenotypes and provides an important route for identifying specific BP genes. Since the vast majority of such variants are non-pathogenic, we suspect that they exert their effect on gene expression, either through differential codon usage at synonymous variants or through missense variants that affect gene expression. We note that many of these exonic variants may also be tissue restricted because a variant can exert its function only in tissues where its gene is expressed. Although they are not highly enriched, intronic variants outside open chromatin regions also explain non-trivial BP heritability. This suggests the emerging functional importance of deep intronic splicing variants on gene expression.43 Although the analyses here have focused on common (polymorphic) variants, they are extendable to variants of any frequency. Recent studies have shown significant contributions of rare variants to complex trait heritability.44 Analyses of such variants will be instructive for explaining greater BP heritability and achieving more specificity for target gene identification.
In principle, our framework is flexible enough to incorporate different types of variants as long as there are enough variants available within each partition of the functional categories. For example, methods for predicting non-coding RNA (ncRNA) functions45 could be used in a similar way as deltaSVM was used in our study. This versatility expands the applicability of our approach to different variant classification systems, increasing its potential for broader research applications. The key innovation of our approach lies in the partitioning of the functional genome into mutually exclusive and exhaustive categories, which is followed by joint heritability estimation using the LDSC method, which allows for a more comprehensive understanding of the genetic architecture underlying complex traits.
Of the four tissues studied, arterial and cardiac tissues have the greatest role in BP heritability (Figure 5). The strong genetic effect of artery-specific regulatory variants on BP regulation is consistent with our previous finding that BP GWAS variants are most significantly colocalized with expression-modulating variants in artery tissues.35 We have long known of two major ways of BP control: through cardiac output and vascular resistance.46 The arterial effect implies that genetic variation in vascular resistance may be the greater contributor to inter-individual BP differences, and thus the specific mechanisms involved will require further investigation. These advances can now be used to answer broader physiological questions about BP, namely, how do the genetic architectures of SBP and DBP differ, and what is the genetic basis of their correlation?
High-quality epigenomic maps are essential components of our approach. We show that the quality of CRE maps is directly correlated with their contribution to estimated BP heritability and that optimization of chromatin data processing further improves the yield of genetic analyses. Our approach is general and can be extended to other phenotypes, as well as to single-cell-based genomic annotations.
The enrichment analyses of TF motifs in the predictive sequence features and putative regulatory variants we provide give us an opportunity for mechanistic understanding of the genetic architecture of BP regulation. Our findings suggest that disruption of tissue-specific TF binding sites enriched in the identified CREs is the major source of inter-individual variation in BP. We anticipate that this is a common feature of many complex traits and diseases.
We hypothesize that the causal genes that affect a particular phenotype are universal, as are the CREs that affect the expression of the causal gene and the TFs that bind these CREs. However, it is very likely that the allele frequencies of the many common variants within the CREs differ between ancestral groups. As a result, it is possible that specific causal regulatory variants may also differ between ancestry groups. Thus, the importance of studying non-European ancestry groups is to maximize the discovery of all causal genes, CREs, and TFs.
A major unanswered question in the genetics of multifactorial traits is the mechanism by which environmental factors modulate these traits. One possibility is that environmental factors affect these same tissues and dysregulate the same TFs and their cognate CREs to thereby dysregulate the same genes.47 In other words, genetic variation in CREs to alter specific target genes is phenocopied by environmental alterations of the TFs that bind these CREs. This hypothesis is biologically plausible because it proposes that BP variation arises from the same genes whose biology is modulated in specific tissues by both genetic and environmental factors but through different biological molecules: CREs versus TFs (acting on CREs). The results from this study are critical to test this hypothesis. For example, the discovered functional architecture could be studied for its reaction to environmental exposures and to answer whether disease results through the same biochemical and physiological pathways irrespective of whether the perturbations are genetic or environmental.
Limitations of the study
(1) Although the gkm-SVM method used in this study is accurate in its ability to predict tissue-specific regulatory variants, newer deep learning approaches combining different types of epigenomic data (e.g., chromatin accessibility, histone marks, DNA methylation, etc.) can further enhance identification of transcriptional enhancers. Additionally, BP heritability can also arise from variation affecting RNA splicing, post-transcriptional regulation in the 3′/5′ UTR, and microRNA (miRNA) targeting. (2) This study has examined the contribution of common single-nucleotide variants on BP heritability and needs to be extended to more complex and less frequent variants, including those affecting coding sequences. (3) Our current approach for linking variants to their potential genes relies solely on the proximity of variants to genes. This inference needs to be improved using chromatin conformation data linking individual regulatory elements to specific genes.
STAR★METHODS
RESOURCE AVAILABILITY
Lead contact
Requests for further information should be directed to the lead contact, Aravinda Chakravarti (aravinda.chakravarti@nyulangone.org).
Materials availability
This study did not generate new unique reagents.
Data and code availability
Raw and processed data for kidney ATAC-seq generated in this study have been deposited at the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo), and are publicly available as of the date of publication. Accession numbers are listed in the key resources table. The Supplemental Data (Data S1–S6) have been deposited at Zenodo and are publicly available. The DOI is listed in the key resources table.
All original code with documentation has been deposited at GitHub: https://github.com/Dongwon-Lee/bph2 and is publicly available. DOI for the code is also listed in the key resources table.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
KEY RESOURCES TABLE.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Biological samples | | |
| Frozen human kidney tissues | The National Disease Research Interchange (NDRI) | http://www.ndriresource.org/; RRID:SCR_000550 |
| Frozen human kidney tissues | Gift of Life Michigan | N/A |
| Chemicals, peptides, and recombinant proteins | | |
| 6x Homogenization Buffer | In-house | 30 mM CaCl2, 18 mM Mg(Ac)2, 60 mM Tris pH 7.8, 0.1 mM PMSF, 1 mM β-mercaptoethanol, 320 mM sucrose, 0.1 mM EDTA, 0.1% NP-40 |
| ATAC-Resuspension Buffer (RSB) | In-house | 10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, 1% BSA |
| PMSF | Sigma | P7626-1G |
| β-mercaptoethanol | Sigma | M6250-100ML |
| Iodixanol solution | Sigma | D1556-250ML |
| Calcium chloride | Sigma | 21115-100ML |
| Magnesium acetate | Sigma | 63052-100ML |
| EDTA | Ambion | AM9261 |
| Sucrose | Sigma | S7903-250G |
| Digitonin | Sigma-Aldrich | Product# 300410 |
| BSA | Sigma-Aldrich | Product# A2153 |
| NP-40 | Thermo Fisher Scientific | CAT# 85124 |
| Tris-HCl | Thermo Fisher Scientific | CAT# 15567027 |
| Sodium Chloride Solution | Thermo Fisher Scientific | CAT# AM9759 |
| Magnesium Chloride Solution | Thermo Fisher Scientific | CAT# AM9530G |
| Tween 20 | Thermo Fisher Scientific | CAT# 28320 |
| TD buffer | Illumina | 20034198 |
| Transposase | Illumina | 15027865 |
| RNase inhibitor | Thermo Fisher Scientific | CAT# AM2696 |
| DAPI | Thermo Fisher Scientific | D1306 |
| Nuclei EZ lysis buffer | Sigma-Aldrich | Product# NUC-101 |
| Critical commercial assays | | |
| MinElute Reaction Cleanup Kit | Qiagen | 28206 |
| Deposited data | | |
| Raw and processed data for kidney tissues | This paper | GEO: GSE200047 |
| UK-Biobank GWAS summary statistics | UK Biobank33 | http://www.nealelab.is/uk-biobank/ |
| GWAS summary statistics for BP datasets | NHGRI-EBI GWAS Catalog1 | http://www.ebi.ac.uk/gwas/; RRID:SCR_012745 |
| Human Gene Annotation data from GENCODE | GENCODE48 | https://www.gencodegenes.org/; RRID:SCR_014966 |
| GTEx RNA-seq data for BP tissues | GTEx Consortium32 | http://commonfund.nih.gov/GTEx/; RRID:SCR_013042 |
| CIS-BP TFBS Motif data | Weirauch et al.49 | http://cisbp.ccbr.utoronto.ca; RRID:SCR_017236 |
| 1000 Genomes Project | The 1000 Genomes Project Consortium50 | http://www.1000genomes.org/; RRID:SCR_006828 |
| DNase-seq and ATAC-seq data for BP tissues | ENCODE Project Consortium28 | https://www.encodeproject.org/; RRID:SCR_015482 |
| Software and algorithms | | |
| BEDTools | Quinlan & Hall51 | https://github.com/arq5x/bedtools2; RRID:SCR_006646 |
| Bowtie2 | Langmead & Salzberg52 | RRID:SCR_016368 |
| cutadapt | Martin53 | RRID:SCR_011841 |
| FIMO | Grant et al.54 | https://meme-suite.org/meme/doc/fimo.html |
| gkmQC | Han et al.29 | https://github.com/Dongwon-Lee/gkmQC |
| LDSC | Finucane et al.30 | RRID:SCR_022801 |
| LS-GKM | Lee26 | https://github.com/Dongwon-Lee/lsgkm |
| MACS2 | Zhang et al.55 | RRID:SCR_013291 |
| Picard Toolkit | Broad Institute | https://broadinstitute.github.io/picard/; RRID:SCR_006525 |
| Python 3.9.15 | Python Software Foundation | http://www.python.org/; RRID:SCR_008394 |
| R 3.6.2 | The R Project for Statistical Computing | https://www.r-project.org/; RRID:SCR_001905 |
| SAMTools | Li et al.56 | RRID:SCR_002105 |
| Other | | |
| Pipeline scripts | This paper | https://github.com/Dongwon-Lee/bph2; https://doi.org/10.5281/zenodo.8400966 |
| Supplemental Data | This paper | https://doi.org/10.5281/zenodo.8057373 |
EXPERIMENTAL MODEL AND SUBJECT PARTICIPANT DETAILS
One human kidney cortex tissue sample was obtained from the National Disease Research Interchange (NDRI; protocol number: RCHA201001A; PI: A. Chakravarti). The study using this sample was determined to be non-human subject research and, therefore, did not require IRB review. Two additional kidney cortex tissues were obtained from the University of Michigan and Gift of Life Michigan. This research was conducted with the informed consent of all study participants and was approved by the IRB (approval number: HUM00107869; PI: S.K. Ganesh).
METHOD DETAILS
RNA-seq data processing (GTEx V8)
To identify expressed genes, we used the pre-processed GTEx RNA-seq datasets (http://gtexportal.org).32 Specifically, we obtained median gene-level TPMs by tissue (GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct.gz) and labeled the top 10,000 genes ranked by the median TPM as “expressed.” We also defined additional “expressed” gene sets by varying the number of top expressed genes from 5,000 to 10,000 in increments of 1,000 genes to assess the sensitivity of the results to the choice of this number. We defined gene regions as 50-kb extensions 5′ to the transcription start site and 3′ to the stop site. After extension, any overlapping regions were merged. The hg19 genome build and the gene annotation file from the GENCODE v19 database48 were used to match the genome build of pre-calculated S-LDSC data. The full lists of genes and the genomic coordinates for these annotations are available in Data S1.
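The gene-set and gene-region definitions above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline code: the function names, the in-memory dictionary of median TPMs, and the 0-based half-open interval convention are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end); 0-based, half-open coordinates are assumed here.
Interval = Tuple[str, int, int]


def top_expressed(median_tpm: Dict[str, float], n: int = 10_000) -> List[str]:
    """Return the n gene IDs with the highest median TPM (ties broken by ID)."""
    ranked = sorted(median_tpm, key=lambda g: (-median_tpm[g], g))
    return ranked[:n]


def extend_and_merge(genes: List[Interval], flank: int = 50_000) -> List[Interval]:
    """Extend each gene by `flank` bp on both sides, then merge overlapping regions."""
    extended = sorted((c, max(0, s - flank), e + flank) for c, s, e in genes)
    merged: List[Interval] = []
    for chrom, start, end in extended:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Calling `top_expressed` with `n` set to 5,000 through 10,000 in steps of 1,000 would reproduce the kind of sensitivity sets described above.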
ATAC-seq data for kidney tissues
Three human kidney tissues were obtained as snap-frozen specimens from The National Disease Research Interchange (NDRI) as well as from the University of Michigan and Gift of Life Michigan. We employed two different protocols in our experimental optimization. ATAC-seq libraries for the first two samples (R1 and R2) were generated using the protocol adapted from Corces et al.57 Briefly, tissue samples were finely chopped on ice and homogenized in homogenization buffer (30 mM CaCl2, 18 mM Mg(Ac)2, 60 mM Tris pH 7.8, 0.1 mM PMSF, 1 mM β-mercaptoethanol, 320 mM sucrose, 0.1 mM EDTA, 0.1% NP-40) using a Dounce homogenizer. Cellular components were separated by iodixanol gradient, and nuclei were isolated and counted by hemocytometer. Nuclei were washed twice in ATAC-RSB + 0.1% Tween 20 (10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, 1% BSA, and 0.1% Tween 20). Nuclei were resuspended in 50 μl of transposition mixture containing 2.5 μl transposase in 1× TD buffer (20034198, Illumina) with 0.01% digitonin and 0.1% Tween 20 and incubated for 30 min on a thermomixer at 1,000 rpm and 37°C. DNA was extracted using the MinElute Reaction Cleanup Kit (Qiagen).
The remaining sample was processed using an improved protocol that included sorting. Tissues were finely chopped, washed with ice-cold PBS (w/o Ca2+/Mg2+) and homogenized in Nuclei EZ lysis buffer (NUC-101, Sigma-Aldrich) as recommended, in the presence of 0.25U/μL RNase inhibitor. Lysis was promoted by gentle mixing of the homogenate using bore tips, followed by incubation for 5 min on ice and passing of the homogenate through a 70μm cell strainer. Nuclei were then pelleted by centrifugation of the filtered homogenate at 500g for 5 min at 4°C. Nuclei were washed twice with 500μL Nuclei Wash and Resuspension Buffer (1× PBS, 1% BSA, 0.25μL RNase) for 5 min on ice (first wash without resuspending nuclei, second wash with resuspended nuclei). After each wash, nuclei were pelleted by centrifugation at 500g for 5 min at 4°C. Pelleted nuclei were further resuspended in 100–500μL Nuclei Wash and Resuspension Buffer with DAPI (10 μg/mL), and nuclei were sorted by FACS to eliminate cellular debris as well as doublets. Sorted nuclei were then pelleted at 500g for 5 min at 4°C, and 500μL ATAC wash buffer (10mM Tris-HCl, 10mM NaCl, 3mM MgCl2, 1% BSA, and 0.1% Tween 20) was gently added and incubated for 5 min on ice without resuspending the nuclei. Washed nuclei were then pelleted by centrifugation at 500g for 5 min at 4°C, gently resuspended in 500μL ice-cold ATAC wash buffer, and incubated for 5 min. The nuclei pellet was obtained by centrifugation at 500g for 5 min at 4°C and resuspended in 100μL ice-cold 1× TD buffer (20034198, Illumina). About 10,000 nuclei were used for the transposition reaction at 37°C for 30 min in a thermomixer. DNA was extracted using the MinElute Reaction Cleanup Kit (28206, Qiagen). Sequencing was carried out at the Genome Technology Center, New York University, and 50 million 150-bp paired-end reads were obtained per sample.
DNase-seq & ATAC-seq data processing
For CRE map construction, we primarily used chromatin accessibility data from public databases. Specifically, we collected DNase-seq and ATAC-seq samples from ENCODE28 for the adrenal gland, heart (left ventricle), and tibial artery (Table S1). For negative controls, we collected analogous data for the pancreas, liver, thyroid gland, and colon (Table S2). We also generated ATAC-seq libraries for adult kidney tissues, for which no public data existed (see above).
We uniformly processed DNase-seq and ATAC-seq data using our previously established pipeline with a minor modification.27 For the raw ATAC-seq kidney data, we used bowtie252 to align the reads after trimming them using cutadapt.53 Then, SAMTools56 and Picard were used to perform quality control of the BAM files. For the ENCODE datasets, we skipped the alignment step and directly used the mapped reads processed by the ENCODE pipeline. Starting with properly paired and non-duplicated read pairs mapped to the GRCh38 reference genome build, we extracted the cut-sites from each read and used them as input to the MACS2 peak caller55 with the no model option (--nomodel). We found the optimal extension and shift to be 100 bp (--extsize 100) and −50 bp (--shift -50; lagging strand), respectively. Also, we used the --keep-dup option and default q-value cut-off (q < 0.01). For ATAC-seq samples, we additionally adjusted the cut-sites by +4 bp for forward-strand reads and −5 bp for reverse-strand reads, to take into account the 9-bp insertion by the Tn5 transposase.58 Manipulation of BED files was done using BEDTools.51 The full set of MACS2 peaks for all tissues analyzed in this study is provided in Data S2.
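As an illustration of the cut-site extraction described above, the following sketch computes a single cut-site position per aligned read. It assumes 0-based, half-open read coordinates, so that the 5′ end of a forward-strand read is its start and the 5′ end of a reverse-strand read is its last aligned base; the function name and signature are hypothetical and are not taken from the pipeline.

```python
def tn5_cut_site(start: int, end: int, strand: str, is_atac: bool = True) -> int:
    """Return the 0-based cut-site position for one aligned read.

    `start` and `end` are 0-based, half-open alignment coordinates (an assumption
    of this sketch). For ATAC-seq reads the 5' end is shifted by +4 bp on the
    forward strand and -5 bp on the reverse strand; DNase-seq reads are unshifted.
    """
    if strand == "+":
        five_prime, offset = start, 4
    elif strand == "-":
        five_prime, offset = end - 1, -5
    else:
        raise ValueError("strand must be '+' or '-'")
    return five_prime + (offset if is_atac else 0)
```

For example, a forward-strand ATAC-seq read aligned to positions 100 to 150 yields a cut-site at 104, whereas the same interval on the reverse strand yields 144.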
QUANTIFICATION AND STATISTICAL ANALYSIS
Partitioning heritability using S-LDSC
We obtained GWAS summary statistics for systolic (S) and diastolic (D) blood pressure (BP) phenotypes from the UK Biobank,33 as processed by the Ben Neale laboratory (http://www.nealelab.is/uk-biobank/). For two additional BP GWAS summary statistics,2,35 we directly obtained the datasets from the NHGRI-EBI GWAS Catalog.1 To estimate SNP-heritability (h2SNP) from GWAS summary statistics, we used a stratified version of the LD-score regression (S-LDSC) method to estimate the proportion of SNP-heritability of SNPs in a specific category as previously described.30,59 We used enrichment scores to estimate the relative contribution of SNPs in specific annotations (e.g., tissue-specific CREs) to heritability. The enrichment score was calculated as the proportion of SNP-heritability attributed to an SNP set (Pr(h2SNP)) divided by the proportion of SNPs in that SNP set (Pr(SNP)). We used z-scores from normalized S-LDSC coefficients to compare samples, corresponding to the statistical significance of the per-SNP contribution to the heritability as previously reported.30
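The enrichment score defined above is a simple ratio, which the following sketch makes explicit. The function names and the numbers in the usage note are hypothetical and chosen only to show the arithmetic; the z-score helper assumes the usual definition of a coefficient divided by its standard error and does not reproduce how S-LDSC estimates either quantity.

```python
def enrichment_score(h2_set: float, h2_total: float,
                     n_snps_set: int, n_snps_total: int) -> float:
    """Pr(h2SNP) / Pr(SNP) for one annotation (SNP set)."""
    if h2_total <= 0 or n_snps_set <= 0 or n_snps_total <= 0:
        raise ValueError("totals and SNP counts must be positive")
    pr_h2 = h2_set / h2_total           # share of SNP-heritability in the set
    pr_snp = n_snps_set / n_snps_total  # share of SNPs in the set
    return pr_h2 / pr_snp


def coefficient_z(coefficient: float, standard_error: float) -> float:
    """z-score of a regression coefficient (assumed: estimate / standard error)."""
    return coefficient / standard_error
```

With these hypothetical inputs, a set holding 5% of SNPs (50,000 of 1,000,000) that accounts for 10% of SNP-heritability (0.02 of 0.2) has an enrichment score of about 2.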
We defined genomic annotations of CREs as ±1,000-bp extended regions from peak summits to capture putatively functional variants in flanking regions associated with CREs.27 To control signals confounded by other functional regions, such as protein-coding and evolutionary-conserved regions, we used the 97 baseline LD model as recommended.60 We used data from European ancestry subjects and corresponding allele frequencies from the 1000 Genomes Phase 3 data as a reference panel for LD score calculation. We obtained the baseline model and reference panel datasets from https://data.broadinstitute.org/alkesgroup/LDSCORE/. Note that we lifted over our open chromatin peak coordinates to hg19 before the S-LDSC analyses as all pre-calculated baseline LD models were available only for hg19 at the time of analyses. For a fair comparison among multiple annotations of interest, we conducted S-LDSC analyses of all annotation sets in one model, along with the baseline model. All S-LDSC results are provided in Data S3.
Characteristics of two additional BP GWAS
In the study by Hoffmann et al.,35 participants for the meta-analysis were from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort, consisting of ~100,000 individuals. The cohort is ancestrally diverse, with 81% European, 8% Hispanic/Latin American, 7% East Asian, 3% African American, and 3% South Asian ancestry. In contrast to the UKB study, the BP phenotype in this analysis was based on the long-term average of multiple independent clinic measurements at different visits.
In the study by Evangelou et al.,2 the meta-analysis included ~750,000 individuals of European ancestry. This dataset combined information from the UKB study (~450,000 individuals) and the International Consortium of Blood Pressure-Genome Wide Association Studies (ICBP), which included ~300,000 individuals from 77 different cohorts. The majority of BP measurements were derived from averaging multiple (typically two) measurements taken on a single day.
gkmQC
We assessed the quality of open chromatin peaks using gkmQC.29 Specifically, we split the open chromatin peaks into subsets, each comprising 5,000 peaks sorted by decreasing signal intensity scores from the peak caller. We then trained the gkm-SVM model for each peak subset and used the “predictability” of peaks from models as a quality metric of peak subsets. Model performance was measured by the area under the ROC curve with 5-fold cross-validation (i.e., AUC scores of peak subsets). To visualize the overall quality of a sample, we plotted AUC scores as a function of ranks of peak subsets (i.e., gkmQC curve). We limited this analysis to the top 100,000 peaks (20 subsets of 5,000 peaks each) to generate a gkmQC curve. We used the 10th AUC score of peak subsets as an overall estimate of sample quality and defined a sample to be of high quality if the 10th AUC was greater than 0.8. All gkmQC results are provided in Data S4.
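The subset construction and the quality rule described above can be sketched as follows. This is not the gkmQC implementation: the function names and the `(peak_id, signal)` input format are assumptions, and the per-subset AUC scores are taken as given rather than computed by training gkm-SVM models.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[str, float]  # (peak identifier, signal intensity from the peak caller)


def split_into_subsets(peaks: Sequence[Peak], subset_size: int = 5_000,
                       max_peaks: int = 100_000) -> List[List[Peak]]:
    """Rank peaks by decreasing signal, keep the top `max_peaks`, and chunk them."""
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks]
    return [ranked[i:i + subset_size] for i in range(0, len(ranked), subset_size)]


def is_high_quality(subset_aucs: Sequence[float], rank: int = 10,
                    threshold: float = 0.8) -> bool:
    """Apply the rule 'AUC of the rank-th subset must exceed the threshold'."""
    if len(subset_aucs) < rank:
        return False  # too few subsets to evaluate the rule
    return subset_aucs[rank - 1] > threshold
```

With the default arguments, a sample with at least 100,000 peaks yields 20 subsets, and the AUC of the 10th subset decides whether the sample is retained.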
gkm-SVM
We built gkm-SVM models following our previously established pipeline with minor modifications.26,27 For each high-quality sample as determined by gkmQC, we defined the positive training set as follows: starting from the top 100,000 open chromatin regions (ranked by their MACS2 p values obtained from our optimized pipeline described above), we removed from the training set peaks with >1% of N-bases, >70% of repeats, and commonly open regions (defined as regions active in at least 30% of samples across all ENCODE datasets), as previously described.12,27 We further restricted open chromatin regions to overlapping H3K27ac peaks from the same tissue (Table S5). As a negative training set, we used an equal number of random genomic regions, matched for length, GC content and repeat fraction. To prevent potential bias caused by variable sequence length, we used 600bp fixed-length regions as a training set by extending ±300bp from peak summits. We used LS-GKM26 software for training with l = 11, k = 7, d = 3, and t = 4 (weighted-gkm kernels). For each sample, we averaged ten different models with different random samplings of negative training sets. After training, we combined the models from different samples (i.e., biological replicates) to generate one model per tissue. Training sets and final models are provided in Data S5.
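To make the positive training-set filters above concrete, the following sketch builds a fixed 600-bp window around a summit and applies the N-base, repeat, and commonly-open exclusions. It assumes a soft-masked genome sequence in which lowercase letters mark repeats, which is an assumption of this example rather than a detail stated above; the function names are hypothetical, and the H3K27ac overlap and negative-set sampling steps are not shown.

```python
from typing import Tuple


def fixed_window(summit: int, half_width: int = 300) -> Tuple[int, int]:
    """Return a fixed-length region of 2 * half_width bp centered on the summit."""
    return (max(0, summit - half_width), summit + half_width)


def keep_for_training(seq: str, commonly_open: bool,
                      max_n_frac: float = 0.01,
                      max_repeat_frac: float = 0.70) -> bool:
    """Decide whether one peak sequence stays in the positive training set.

    Repeats are inferred from lowercase (soft-masked) bases, an assumption of
    this sketch; `commonly_open` must be supplied by the caller.
    """
    if not seq:
        return False
    n_frac = sum(c in "nN" for c in seq) / len(seq)
    repeat_frac = sum(c.islower() and c != "n" for c in seq) / len(seq)
    return not (commonly_open or n_frac > max_n_frac or repeat_frac > max_repeat_frac)
```

Using fixed windows keeps every training sequence the same length, so sequence length cannot act as a predictive feature.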
deltaSVM
For each tissue, we calculated deltaSVM scores for all common variants (~10M with minor allele frequency >1%) in the EUR superset from the 1000 Genomes Project (v3),50 using the combined model, and identified variants overlapping open chromatin regions in the corresponding tissue. We used ±10 bp regions centered on those variants for scoring. We then determined variants in the top 15th percentile of all deltaSVM scores as potentially impactful regulatory variants. This threshold was chosen based on our previous analyses of allele-biased chromatin accessibility in heart tissues.27 The lists of these deltaSVM positive variants are provided in Data S5.
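The scoring idea can be illustrated with a minimal sketch. It assumes a lookup table of per-k-mer weights (the dictionary `weights` is hypothetical) and sums the weight differences between the alternative and reference alleles over all k-mers in the scored window; with a ±10-bp window and 11-mers, every k-mer in the 21-bp sequence overlaps the variant. Reverse-complement handling, the actual LS-GKM scoring, and the ranking of variants by absolute score are simplifications or assumptions of this example.

```python
from typing import Dict, Set


def delta_svm(ref_seq: str, alt_seq: str, weights: Dict[str, float], k: int = 11) -> float:
    """Sum of (alt weight - ref weight) over all k-mers in two equal-length windows."""
    if len(ref_seq) != len(alt_seq):
        raise ValueError("reference and alternative windows must have equal length")
    total = 0.0
    for i in range(len(ref_seq) - k + 1):
        total += weights.get(alt_seq[i:i + k], 0.0) - weights.get(ref_seq[i:i + k], 0.0)
    return total


def top_fraction(scores: Dict[str, float], fraction: float = 0.15) -> Set[str]:
    """Return variant IDs in the top `fraction` by absolute score (assumed ranking)."""
    n_keep = round(len(scores) * fraction)
    ranked = sorted(scores, key=lambda v: (-abs(scores[v]), v))
    return set(ranked[:n_keep])
```

A negative score in this sketch means the alternative allele is predicted to reduce regulatory activity relative to the reference allele, and a positive score means the opposite.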
Tissue mapping of BP-associated genes
We first identified genes with ≥5 GWAS hits, whose deltaSVM scores are significant in at least one BP tissue, in the gene region. We required these GWAS hits to be of suggestive significance (p < 5.0 × 10−6) in associations with S/DBP in the UKB cohort, as well as in the 95% credible sets obtained from the UKB GWAS fine-mapping datasets.61 Then, we designated a gene as having a tissue-specific effect if ≥ 50% of its GWAS hits were deltaSVM positive in the corresponding tissue.
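The two thresholds above can be expressed as a short sketch. The input format (for each gene, one set of deltaSVM-positive tissues per GWAS hit) and the function name are assumptions for illustration, as is the choice to compute the 50% fraction over hits that are deltaSVM positive in at least one tissue.

```python
from typing import Dict, List, Set


def tissue_specific_effects(hits_by_gene: Dict[str, List[Set[str]]],
                            min_hits: int = 5,
                            min_fraction: float = 0.5) -> Dict[str, List[str]]:
    """Map each qualifying gene to the tissues holding >= min_fraction of its hits.

    Each element of hits_by_gene[gene] is the set of tissues in which one GWAS
    hit is deltaSVM positive; hits with an empty set are ignored.
    """
    result: Dict[str, List[str]] = {}
    for gene, hits in hits_by_gene.items():
        positive_hits = [h for h in hits if h]
        if len(positive_hits) < min_hits:
            continue
        tissues = sorted(set().union(*positive_hits))
        result[gene] = [
            t for t in tissues
            if sum(t in h for h in positive_hits) / len(positive_hits) >= min_fraction
        ]
    return result
```

Under this reading, a gene can be assigned to more than one tissue, or to none, if no single tissue accounts for at least half of its hits.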
TF motif enrichment analysis
We identified TF motifs enriched for high-scoring 11-mers (i.e., the top fifth percentile) from the trained gkm-SVM models, as previously described,27 with a minor modification. Briefly, for each of the manually curated TF motifs from the CIS-BP database,49 we identified all 11-mers that significantly matched a given motif using FIMO with default parameters,54 and calculated enrichment of the matched 11-mers in the top fifth percentile ranked by their SVM weights. We determined TF motifs as significant if their multiple testing corrected binomial p ≤ 0.01 and enrichment score was ≥4. To further reduce false positives, we only considered TFs expressed in the tissue of interest, as previously defined (i.e., the top 10,000 genes ranked by median gene expression from GTEx V8). Similar to enrichment analysis of the high-scoring 11-mers, we also identified TF motifs enriched for regions surrounding predicted regulatory variants. We used ±10 bp regions centered at each variant to scan TF motifs using FIMO with default parameters. We tested both reference and alternative alleles and called a “hit” if either of them significantly matched a motif. We calculated the expected hit frequency using all common variants and the observed hit frequency using the predicted impactful regulatory variants only. Enrichment scores were calculated as the ratio of observed to expected hit frequencies. Motif enrichment results are provided in Data S5.
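One way to read the 11-mer enrichment test is sketched below. It treats the number of motif-matched 11-mers that fall in the top fifth percentile as a binomial count with success probability 0.05, and it uses a Bonferroni correction; both the one-sided test and the correction method are assumptions of this example, since the correction procedure is not specified above. All function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p); intended for modest n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(n_matched: int, n_matched_top: int, top_fraction: float = 0.05) -> float:
    """Observed share of matched k-mers in the top set divided by the expected share."""
    return (n_matched_top / n_matched) / top_fraction


def is_enriched(n_matched: int, n_matched_top: int, n_tests: int,
                top_fraction: float = 0.05, alpha: float = 0.01,
                min_enrichment: float = 4.0) -> bool:
    """Apply both thresholds: corrected p <= alpha and enrichment >= min_enrichment."""
    p_value = binomial_upper_tail(n_matched_top, n_matched, top_fraction)
    p_adjusted = min(1.0, p_value * n_tests)  # Bonferroni correction (assumed)
    return p_adjusted <= alpha and enrichment(n_matched, n_matched_top, top_fraction) >= min_enrichment
```

For instance, a motif matching 40 11-mers, 10 of which are in the top fifth percentile, has an enrichment of about 5 and passes both thresholds when 100 motifs are tested, whereas 4 of 40 gives an enrichment of about 2 and does not.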
Supplementary Material
Highlights.
Variants in enhancers exhibit greater tissue specificity than gene expression
Enhancer variants disentangle tissue-specific dysregulation of genes and CREs
Artery-specific enhancer variants explain the most heritability of blood pressure
ACKNOWLEDGMENTS
We thank Dr. Luciano Martelotto for his advice and improved protocol for our ATAC-seq experiments. This study has benefited from useful comments from Drs. Xiaofeng Zhu, Alanna Morrison, and Charles Gu. This research was supported by the computational resources of the high-performance computing core at NYU and National Institutes of Health grants HL086694, HL141980, and HL128782 to A.C.
Footnotes
DECLARATION OF INTERESTS
P.M. is a cofounder and the chief executive officer of Benthos Prime Central (Houston, TX, USA).
SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.celrep.2023.113351.
REFERENCES
- 1.Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, McMahon A, Morales J, Mountjoy E, Sollis E, et al. (2019). The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, Ntritsos G, Dimou N, Cabrera CP, Karaman I, et al. (2018). Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, and Visscher PM; GIANT Consortium (2018). Meta-analysis of genome-wide association studies for height and body mass index in 700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Fisher RA (1919). —The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans. R. Soc. Edinb. Earth Sci. 52, 399–433. [Google Scholar]
- 5.Visscher PM, Yengo L, Cox NJ, and Wray NR (2021). Discovery and implications of polygenicity of common diseases. Science 373, 1468–1473. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012). Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337, 1190–1195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Chakravarti A, and Turner TN (2016). Revealing rate-limiting steps in complex disease biology: The crucial importance of studying rare, extreme-phenotype families. Bioessays 38, 578–586. [DOI] [PubMed] [Google Scholar]
- 8.Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, Sachs KV, Li X, Li H, Kuperwasser N, Ruda VM, et al. (2010). From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Claussnitzer M, Dankel SN, Kim KH, Quon G, Meuleman W, Haugen C, Glunk V, Sousa IS, Beaudry JL, Puviindran V, et al. (2015). FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N. Engl. J. Med. 373, 895–907. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Chatterjee S, Kapoor A, Akiyama JA, Auer DR, Lee D, Gabriel S, Berrios C, Pennacchio LA, and Chakravarti A (2016). Enhancer Variants Synergistically Drive Dysfunction of a Gene Regulatory Network In Hirschsprung Disease. Cell 167, 355–368.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee SI, Cooper GM, et al. (2012). Massively parallel functional dissection of mammalian enhancers in vivo. Nat. Biotechnol. 30, 265–270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, and Beer MA (2015). A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955–961. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Zhou J, and Troyanskaya OG (2015). Predicting effects of noncoding variants with deep learning-based sequence model. Nat Meth 12, 931–934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Kelley DR, Snoek J, and Rinn JL (2016). learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999. 10.1101/gr.200535.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Avsec Z, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, and Zeitlinger J (2021). Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366.
- 16. Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X, Rogov P, Melnikov A, McDonel P, Do R, Mikkelsen TS, and Sankaran VG (2016). Systematic Functional Dissection of Common Genetic Variation Affecting Red Blood Cell Traits. Cell 165, 1530–1545.
- 17. Kreimer A, Zeng H, Edwards MD, Guo Y, Tian K, Shin S, Welch R, Wainberg M, Mohan R, Sinnott-Armstrong NA, et al. (2017). Predicting gene expression in massively parallel reporter assays: A comparative study. Hum. Mutat. 38, 1240–1250.
- 18. Shigaki D, Adato O, Adhikari AN, Dong S, Hawkins-Hooker A, Inoue F, Juven-Gershon T, Kenlay H, Martin B, Patra A, et al. (2019). Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay. Hum. Mutat. 40, 1280–1291.
- 19. Poulter NR, Prabhakaran D, and Caulfield M (2015). Hypertension. Lancet 386, 801–812.
- 20. Young JH, Chang YPC, Kim JDO, Chretien JP, Klag MJ, Levine MA, Ruff CB, Wang NY, and Chakravarti A (2005). Differential susceptibility to hypertension is due to selection during the out-of-Africa expansion. PLoS Genet. 1, e82.
- 21. Young JH (2007). Evolution of blood pressure regulation in humans. Curr. Hypertens. Rep. 9, 13–18.
- 22. Fuchs FD, and Whelton PK (2020). High Blood Pressure and Cardiovascular Disease. Hypertension 75, 285–292.
- 23. Lifton RP (1996). Molecular genetics of human blood pressure variation. Science 272, 676–680.
- 24. Cabrera CP, Ng FL, Nicholls HL, Gupta A, Barnes MR, Munroe PB, and Caulfield MJ (2019). Over 1000 genetic loci influencing blood pressure with multiple systems and tissues implicated. Hum. Mol. Genet. 28, R151–R161.
- 25. Ghandi M, Lee D, Mohammad-Noori M, and Beer MA (2014). Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput. Biol. 10, e1003711.
- 26. Lee D (2016). LS-GKM: a new gkm-SVM for large-scale datasets. Bioinformatics 32, 2196–2198.
- 27. Lee D, Kapoor A, Safi A, Song L, Halushka MK, Crawford GE, and Chakravarti A (2018). Human cardiac cis-regulatory elements, their cognate transcription factors, and regulatory DNA sequence variants. Genome Res. 28, 1577–1588.
- 28. ENCODE Project Consortium; Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, et al. (2020). Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710.
- 29. Han SK, Muto Y, Wilson PC, Humphreys BD, Sampson MG, Chakravarti A, and Lee D (2022). Quality assessment and refinement of chromatin accessibility data using a sequence-based predictive model. Proc. Natl. Acad. Sci. USA 119, e2212810119.
- 30. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh PR, Anttila V, Xu H, Zang C, Farh K, et al. (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235.
- 31. Finucane HK, Reshef YA, Anttila V, Slowikowski K, Gusev A, Byrnes A, Gazal S, Loh PR, Lareau C, Shoresh N, et al. (2018). Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629.
- 32. GTEx Consortium (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330.
- 33. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, et al. (2015). UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 12, e1001779.
- 34. Pan-UKB (2020). https://pan.ukbb.broadinstitute.org.
- 35. Hoffmann TJ, Ehret GB, Nandakumar P, Ranatunga D, Schaefer C, Kwok PY, Iribarren C, Chakravarti A, and Risch N (2017). Genome-wide association analyses using electronic health records identify new loci influencing blood pressure variation. Nat. Genet. 49, 54–64.
- 36. Giri A, Hellwege JN, Keaton JM, Park J, Qiu C, Warren HR, Torstenson ES, Kovesdy CP, Sun YV, Wilson OD, et al. (2019). Transethnic association study of blood pressure determinants in over 750,000 individuals. Nat. Genet. 51, 51–62.
- 37. Del Greco M F, Pattaro C, Luchner A, Pichler I, Winkler T, Hicks AA, Fuchsberger C, Franke A, Melville SA, Peters A, et al. (2011). Genome-wide association analysis and fine mapping of NT-proBNP level provide novel insight into the role of the MTHFR-CLCN6-NPPA-NPPB gene cluster. Hum. Mol. Genet. 20, 1660–1671.
- 38. Salo PP, Havulinna AS, Tukiainen T, Raitakari O, Lehtimäki T, Kähönen M, Kettunen J, Männikkö M, Eriksson JG, Jula A, et al. (2017). Genome-Wide Association Study Implicates Atrial Natriuretic Peptide Rather Than B-Type Natriuretic Peptide in the Regulation of Blood Pressure in the General Population. Circ. Cardiovasc. Genet. 10, e001713.
- 39. Naito T, Inoue K, Sonehara K, Baba R, Kodama T, Otagaki Y, Okada A, Itcho K, Kobuke K, Kishimoto S, et al. (2023). Genetic Risk of Primary Aldosteronism and Its Contribution to Hypertension: A Cross-Ancestry Meta-Analysis of Genome-Wide Association Studies. Circulation 147, 1097–1109.
- 40. Lee B-K, Bhinge AA, Battenhouse A, McDaniell RM, Liu Z, Song L, Ni Y, Birney E, Lieb JD, Furey TS, et al. (2012). Cell-Type Specific and Combinatorial Usage of Diverse Transcription Factors Revealed by Genome-Wide Binding Studies in Multiple Human Cells. Genome Res. 22, 9–24.
- 41. Reiter F, Wienerroither S, and Stark A (2017). Combinatorial function of transcription factors and cofactors. Curr. Opin. Genet. Dev. 43, 73–81.
- 42. Landolin JM, Johnson DS, Trinklein ND, Aldred SF, Medina C, Shulha H, Weng Z, and Myers RM (2010). Sequence features that drive human promoter function and tissue specificity. Genome Res. 20, 890–898.
- 43. Vaz-Drago R, Custódio N, and Carmo-Fonseca M (2017). Deep intronic mutations and human disease. Hum. Genet. 136, 1093–1111.
- 44. Wainschtein P, Jain D, Zheng Z, TOPMed Anthropometry Working Group; NHLBI Trans-Omics for Precision Medicine TOPMed Consortium; Cupples LA, Shadyab AH, McKnight B, Shoemaker BM, Mitchell BD, et al. (2022). Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273.
- 45. Koido M, Hon CC, Koyama S, Kawaji H, Murakawa Y, Ishigaki K, Ito K, Sese J, Parrish NF, Kamatani Y, et al. (2023). Prediction of the cell-type-specific transcription of non-coding RNAs from genome sequences via machine learning. Nat. Biomed. Eng. 7, 830–844. 10.1038/s41551-022-00961-8.
- 46. Mayet J, and Hughes A (2003). Cardiac and vascular pathophysiology in hypertension. Heart 89, 1104–1109.
- 47. Donovan KA, An J, Nowak RP, Yuan JC, Fink EC, Berry BC, Ebert BL, and Fischer ES (2018). Thalidomide promotes degradation of SALL4, a transcription factor implicated in Duane Radial Ray syndrome. Elife 7, e38430.
- 48. Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, et al. (2019). GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773.
- 49. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, et al. (2014). Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity. Cell 158, 1431–1443.
- 50. 1000 Genomes Project Consortium; Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, and Abecasis GR (2015). A global reference for human genetic variation. Nature 526, 68–74.
- 51. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
- 52. Langmead B, and Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
- 53. Martin M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12.
- 54. Grant CE, Bailey TL, and Noble WS (2011). FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018.
- 55. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, and Liu XS (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.
- 56. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, and Durbin R; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
- 57. Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, Satpathy AT, Rubin AJ, Montine KS, Wu B, et al. (2017). An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962.
- 58. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, and Greenleaf WJ (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218.
- 59. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium; Patterson N, Daly MJ, Price AL, and Neale BM (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295.
- 60. Gazal S, Finucane HK, Furlotte NA, Loh PR, Palamara PF, Liu X, Schoech A, Bulik-Sullivan B, Neale BM, Gusev A, and Price AL (2017). Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427.
- 61. Kanai M, Ulirsch JC, Karjalainen J, Kurki M, Karczewski KJ, Fauman E, Wang QS, Jacobs H, Aguet F, Ardlie KG, et al. (2021). Insights from Complex Trait Fine-Mapping across Diverse Populations. Preprint at medRxiv. 10.1101/2021.09.03.21262975.
Data Availability Statement
Raw and processed data for kidney ATAC-seq generated in this study have been deposited at the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) and are publicly available as of the date of publication. Accession numbers are listed in the key resources table. The Supplemental Data (Data S1–S6) have been deposited at Zenodo and are publicly available. The DOI is listed in the key resources table.
All original code with documentation has been deposited at GitHub (https://github.com/Dongwon-Lee/bph2) and is publicly available. The DOI for the code is also listed in the key resources table.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
KEY RESOURCES TABLE.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Biological samples | | |
| Frozen human kidney tissues | The National Disease Research Interchange (NDRI) | http://www.ndriresource.org/; RRID:SCR_000550 |
| Frozen human kidney tissues | Gift of Life Michigan | N/A |
| Chemicals, peptides, and recombinant proteins | | |
| 6x Homogenization Buffer | In-house | 30 mM CaCl2, 18 mM Mg(Ac)2, 60 mM Tris pH 7.8, 0.1 mM PMSF, 1 mM β-mercaptoethanol, 320 mM sucrose, 0.1 mM EDTA, 0.1% NP-40 |
| ATAC-Resuspension Buffer (RSB) | In-house | 10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, 1% BSA |
| PMSF | Sigma | P7626-1G |
| β-mercaptoethanol | Sigma | M6250-100ML |
| Iodixanol solution | Sigma | D1556-250ML |
| Calcium chloride | Sigma | 21115-100ML |
| Magnesium acetate | Sigma | 63052-100ML |
| EDTA | Ambion | AM9261 |
| Sucrose | Sigma | S7903-250G |
| Digitonin | Sigma-Aldrich | Product# 300410 |
| BSA | Sigma-Aldrich | Product# A2153 |
| NP-40 | Thermo Fisher Scientific | CAT# 85124 |
| Tris-HCl | Thermo Fisher Scientific | CAT# 15567027 |
| Sodium Chloride Solution | Thermo Fisher Scientific | CAT# AM9759 |
| Magnesium Chloride Solution | Thermo Fisher Scientific | CAT# AM9530G |
| Tween 20 | Thermo Fisher Scientific | CAT# 28320 |
| TD buffer | Illumina | 20034198 |
| Transposase | Illumina | 15027865 |
| RNase inhibitor | Thermo Fisher Scientific | CAT# AM2696 |
| DAPI | Thermo Fisher Scientific | D1306 |
| Nuclei EZ lysis buffer | Sigma-Aldrich | Product# NUC-101 |
| Critical commercial assays | | |
| MinElute Reaction Cleanup Kit | Qiagen | 28206 |
| Deposited data | | |
| Raw and processed data for kidney tissues | This paper | GEO: GSE200047 |
| UK Biobank GWAS summary statistics | UK Biobank33 | http://www.nealelab.is/uk-biobank/ |
| GWAS summary statistics for BP datasets | NHGRI-EBI GWAS Catalog1 | http://www.ebi.ac.uk/gwas/; RRID:SCR_012745 |
| Human Gene Annotation data from GENCODE | GENCODE48 | https://www.gencodegenes.org/; RRID:SCR_014966 |
| GTEx RNA-seq data for BP tissues | GTEx Consortium32 | http://commonfund.nih.gov/GTEx/; RRID:SCR_013042 |
| CIS-BP TFBS Motif data | Weirauch et al.49 | http://cisbp.ccbr.utoronto.ca; RRID:SCR_017236 |
| 1000 Genomes Project | The 1000 Genomes Project Consortium50 | http://www.1000genomes.org/; RRID:SCR_006828 |
| DNase-seq and ATAC-seq data for BP tissues | ENCODE Project Consortium28 | https://www.encodeproject.org/; RRID:SCR_015482 |
| Software and algorithms | | |
| BEDTools | Quinlan & Hall51 | https://github.com/arq5x/bedtools2; RRID:SCR_006646 |
| Bowtie2 | Langmead & Salzberg52 | RRID:SCR_016368 |
| cutadapt | Martin53 | RRID:SCR_011841 |
| FIMO | Grant et al.54 | https://meme-suite.org/meme/doc/fimo.html |
| gkmQC | Han et al.29 | https://github.com/Dongwon-Lee/gkmQC |
| LDSC | Finucane et al.30 | RRID:SCR_022801 |
| LS-GKM | Lee26 | https://github.com/Dongwon-Lee/lsgkm |
| MACS2 | Zhang et al.55 | RRID:SCR_013291 |
| Picard Toolkit | Broad Institute | https://broadinstitute.github.io/picard/; RRID:SCR_006525 |
| Python 3.9.15 | Python Software Foundation | http://www.python.org/; RRID:SCR_008394 |
| R 3.6.2 | The R Project for Statistical Computing | https://www.r-project.org/; RRID:SCR_001905 |
| SAMtools | Li et al.56 | RRID:SCR_002105 |
| Other | | |
| Pipeline scripts | This paper | https://github.com/Dongwon-Lee/bph2; https://doi.org/10.5281/zenodo.8400966 |
| Supplemental Data | This paper | https://doi.org/10.5281/zenodo.8057373 |
