Biological constraints on GWAS SNPs at suggestive significance thresholds reveal additional BMI loci

Reza K Hammond; Matthew C Pahl; Chun Su; Diana L Cousminer; Michelle E Leonard; Sumei Lu; Claudia A Doege; Yadav Wagley; Kenyaita M Hodge; Chiara Lasconi; Matthew E Johnson; James A Pippin; Kurt D Hankenson; Rudolph L Leibel; Alessandra Chesi; Andrew D Wells; Struan FA Grant

doi:10.7554/eLife.62206

. 2021 Jan 18;10:e62206. doi: 10.7554/eLife.62206

Biological constraints on GWAS SNPs at suggestive significance thresholds reveal additional BMI loci

Reza K Hammond ^1,², Matthew C Pahl ^1,², Chun Su ^1,², Diana L Cousminer ^1,², Michelle E Leonard ^1,², Sumei Lu ^1,², Claudia A Doege ^3,^4,⁵, Yadav Wagley ⁶, Kenyaita M Hodge ^1,², Chiara Lasconi ^1,², Matthew E Johnson ^1,², James A Pippin ^1,², Kurt D Hankenson ⁶, Rudolph L Leibel ⁷, Alessandra Chesi ^1,², Andrew D Wells ^1,^8,⁹, Struan FA Grant ^1,^2,^10,^11,^12,^✉

Editors: Mone Zaidi¹³, Mone Zaidi¹⁴

PMCID: PMC7815306 PMID: 33459256

Abstract

To uncover novel significant association signals (p<5×10⁻⁸), genome-wide association studies (GWAS) requires increasingly larger sample sizes to overcome statistical correction for multiple testing. As an alternative, we aimed to identify associations among suggestive signals (5 × 10⁻⁸≤p<5×10⁻⁴) in increasingly powered GWAS efforts using chromatin accessibility and direct contact with gene promoters as biological constraints. We conducted retrospective analyses of three GIANT BMI GWAS efforts using ATAC-seq and promoter-focused Capture C data from human adipocytes and embryonic stem cell (ESC)-derived hypothalamic-like neurons. This approach, with its extremely low false-positive rate, identified 15 loci at p<5×10⁻⁵ in the 2010 GWAS, of which 13 achieved genome-wide significance by 2018, including at NAV1, MTIF3, and ADCY3. Eighty percent of constrained 2015 loci achieved genome-wide significance in 2018. We observed similar results in waist-to-hip ratio analyses. In conclusion, biological constraints on sub-significant GWAS signals can reveal potentially true-positive loci for further investigation in existing data sets without increasing sample size.

Research organism: Human

Introduction

Genome-wide association studies (GWAS) have been widely employed to identify genetic variants that are associated with risk for disease and physiologically relevant traits (Tam et al., 2019; Visscher et al., 2017). GWAS was first described in 2005 (Klein et al., 2005) and has since been utilized to study a large range of common and complex traits in humans. As of July 2020, the GWAS Catalog consisted of 4628 publications reporting a total of 189,811 associations that achieved genome-wide significance (p<5×10⁻⁸) between genetic variants and human common complex traits. For instance, in 2009, 54 near-independent genome-wide significant (GWS) signals associated with variation in height had been identified in a population of tens of thousands of individuals (Visscher, 2008), while by 2014, that number had jumped to 697 and was estimated to explain more than 20% of the trait’s heritability (Wood et al., 2014). The number of significant GWS signals increased to 3,290 in 2018, accounting for nearly a quarter of the heritability (Yengo et al., 2018). And yet, despite these successes, GWAS has a clear shortcoming in that a large proportion of predicted heritability remains unexplained despite a constantly growing number of implicated loci (Tam et al., 2019). These novel loci are only identifiable using larger sample sizes, which come at a significant cost of both time and money, and each successive independent signal explains less and less of the overall estimated heritability (Zuk et al., 2012).

The additional signals achieving genome-wide significance in successive rounds of GWAS are often those that failed to achieve the commonly utilized p=5×10⁻⁸ threshold in initial rounds, a strict bar that can only be met when larger cohort sizes are recruited. Various approaches have been previously utilized to identify sub-threshold signals that go on to achieve genome-wide significance. Sub-threshold signals have been selected for replication in a stage 2 GWAS due to their proximity to putative candidate genes to identify novel loci, as with the identification of the obesity locus NPC1 and replicated subsequently in a large meta-analysis (Meyre et al., 2009; Turcot et al., 2018). Epigenomic maps have also been utilized to implicate biologically relevant sub-threshold variants that were subsequently experimentally validated (Wang et al., 2016). We explored if it would be possible to apply a systematic genome-wide molecular-genetics-based approach to sub-threshold signals to predict single nucleotide polymorphisms (SNPs) that would go on to achieve genome-wide significance in a future round of GWAS. Over the past decade, sequencing technologies such aa RNA-seq (Wang et al., 2009), ATAC-seq (Buenrostro et al., 2013), and high-resolution promoter-focused Capture C Chesi et al., 2019; Hughes et al., 2014; Su et al., 2020 have been developed to facilitate the annotation of genes and their regulatory elements. Such data have been utilized to identify physical variant-to-gene interactions via three-dimensional genomics to implicate effector genes at GWAS loci where both sequence variant and gene reside within regions of open chromatin (Arnold et al., 2015; Çalışkan et al., 2019; Chesi et al., 2019; Cousminer et al., 2020; Javierre et al., 2016; Smemo et al., 2014; Su et al., 2019; Su et al., 2020). By leveraging high-resolution promoter-focused Capture C with ATAC-seq, it is possible to physically connect putatively functional non-coding elements, such as enhancers, harboring disease-relevant SNPs to promoters of specific genes thereby potentially mechanistically implicated in the SNP-associated phenotype. We hypothesized that by applying this variant-to-gene mapping method to sub-threshold SNP signals, the molecular constraints on these signals would result in a set of SNPs that are plausibly related to the biology of the trait in question, thereby bypassing the requirement of a larger GWAS to implicate these signals.

To test this hypothesis, we leveraged multiple GWAS data sets, increasing in sample size over time, to determine whether our sub-threshold implicated leads became genome-wide significant in a subsequent larger study. We chose to test this method utilizing GWAS data sets for body mass index (BMI) and waist-to-hip ratio adjusted for BMI (WHRadjBMI) as initial traits because they have both been the subject of particularly large GWAS efforts by the GIANT consortium. Additionally, both traits have implicated a large number of loci and therefore provide optimal statistical power for our purposes. With respect to variant-to-gene mapping, we utilized our existing 3D genomic data sets for adipocytes as well as hypothalamic neurons as previous studies have implicated these cell types in BMI-associated variants (Locke et al., 2015).

Results

Suggestive proxy SNP identification

We first applied our chromatin-based variant-to-gene mapping approach to predict BMI loci that would be subsequently reported as genome-wide significant in a 2015 GWAS (n = 339,224) but were only suggestive in 2010 (n = 249,796). Because the sentinel SNPs reported from a GWAS are not necessarily causal with regard to phenotype, but very likely in linkage disequilibrium (LD) with the responsible allele, we first identified proxy SNPs (r² > 0.8) for each of the 2010 suggestive SNPs, 5×10⁻⁸≤p<5×10⁻⁴. Here, we detected 26,343 proxy SNPs; however, some of these proxy signals were in LD with an already established genome-wide significant signal. Given the purpose of this analysis was to identify new signals that were not yet genome-wide significant, we eliminated all proxies that were in even very modest LD, with an already genome-wide significant signal using an r² > 0.1. This filtering left a residual of 23,197 signals.

ATAC-seq and promoter-focused capture C

We generated ATAC-seq and promoter-focused Capture C libraries from mesenchymal stem cell (MSC)-derived adipocytes and leveraged our existing comparable data from human ESC-derived hypothalamic neurons to query open chromatin maps for open sub-threshold SNPs that contact open gene promoters (Pahl et al., 2020). The adipocytes were derived from mesenchymal stem cells. Three ATAC-seq libraries were sequenced and analyzed with the ENCODE pipeline (https://github.com/kundajelab/atac_dnase_pipelines). Peaks from all replicates were merged by bedtools (v2.25.0) provided peaks were present in at least two biological replicates. These resulted in 2,225,635 adipocyte peaks and 179,212 hypothalamic neuron peaks.

The adipose capture C libraries had an average of 1.4 billion reads per adipose library, with an average of 41% valid read pairs and 89% capture efficiency. We then called significant interactions using the CHiCAGO pipeline and performed analyses at a 1-fragment resolution to identify short-distance interactions (258,882 interactions) and at 4-fragment resolution to identify long-distance interactions (278,040) (Cairns et al., 2016).

Variant-to-gene mapping reveals suggestive 2010 BMI loci that subsequently achieved genome-wide significance in 2015

We next identified physical contacts between any of the remaining proxy SNPs and gene promoters, with the additional constraint that both points of contact were within regions of open chromatin and that the SNP itself did not map to a gene promoter. This approach favors SNPs most likely to have a functional role in the regulation of genes relevant to a given trait and is therefore dependent on the cell type utilized to make such inferences. For BMI variant-to-gene mapping, we used data derived from MSC-derived adipocytes and human ESC-derived hypothalamic neurons because of the known roles of these cell types in BMI. To identify the point at which this approach is no longer viable due to noise, that is, at point of negligible return and excessive false positives, we stratified the suggestive regions into several bins: 5×10⁻⁸≤p<5×10⁻⁷, 5×10⁻⁸≤p<5×10⁻⁶, 5×10⁻⁸≤p<5×10⁻⁵, and 5×10⁻⁸≤p<5×10⁻⁴. Note that each successive bin is inclusive of the SNPs from the previous bin.

Upon identifying loci that passed these filters, we quantified those that had reached genome-wide significance by 2015. To avoid redundant inclusion of SNPs in LD with one another, we collapsed all biologically constrained SNPs into independent SNP clusters, designated here as separate independent ‘loci’. We defined such loci as the set of SNPs surviving the variant-to-gene mapping filter that were in LD with one another at a relatively relaxed r² threshold of >0.4.

One hundred and sixty-one of the 23,197 suggestive proxy SNPs survived these biological filters at our most relaxed p-value threshold. These SNPs corresponded to 78 loci, of which 11 achieved genome-wide significance by 2015 (Table 1); these are annotated on the 2015 BMI Manhattan plot (Figure 1). Four of these loci were highlighted in the 2010 study, but the associations were only considered ‘suggestive’ within the stage one discovery set at that time, that is at p>5×10⁻⁸. Across all suggestive bins and cell types, the positive predictive value was low through 2015, and the proportion of constrained signals actually achieving genome-wide significance (GWS) did not differ significantly from the proportion of the unconstrained signals achieving GWS within the corresponding p-value bin (Figure 2). This is likely a function of the relatively modest increase in sample size (+89,428) between 2010 and 2015. At the relatively relaxed p-value threshold of 5×10⁻⁸≤p<5×10⁻⁵, we observed that 6/15 2010 biologically constrained loci were GWS in 2015, whereas 43/163 unconstrained loci were GWS by 2015. A flowchart describing this pipeline using the 2010–2015 BMI data is available in Figure 3.

Table 1. 2015 BMI loci that were implicated with our method in the 2010 data set.

The 2015 genome-wide significant BMI loci whose sentinel SNP was in LD with SNPs implicated from suggestive association in the 2010 BMI GWAS. Key: Notable genes from biological relevance to obesity (B); copy number variation (C); DEPICT analyses (D); GRAIL results (G); BMI-associated variant is in strong LD (r² ≥ 0.8) with a missense variant in the indicated gene (M); gene nearest to index SNP (N); association and eQTL data converge to affect gene expression (Q) (Locke et al., 2015).

Novel as of 2015 (Locke et al.)
2015 sentinel SNP		2010 implicated SNPs	2015 assigned locus name	Interacting gene
rs4740619		rs10810462	C9orf93(C,M,N)	TCONS_00015651
rs17094222		rs117597828	HIF1AN(N)	PAX2
rs2820292		rs12086240, rs2820315	NAV1(N)	TIMM17A
rs758747		rs2238435	NLRC3(N)	TCONS_00024950, TCONS_00024564, TCONS_00024949, TCONS_00024562, RP11-462G12.1, TCONS_00024568, TCONS_00024570, RP11-95P2.1, TCONS_00024567, TCONS_00024569, TCONS_00024320, TCONS_00024952
rs3736485		rs7183479	SCG3(B,D); DMXL2(M,N)	LYSMD2, SCG3, CTD-2308G16.1, TMOD2
Identified between 2010 (Speliotes et al.) and 2015 (Locke et al.)
2015 Sentinel SNP	2010 Implicated SNPs		2015 Assigned locus name	Interacting gene
rs17024393	rs72705210		GNAT2(N); AMPD2(D)	GSTM3, AHCYL1
rs4256980	rs10840079, rs10840087, rs11041999, rs11042023, rs12803166, rs4256980,		TRIM66(D,M,N); TUB(B)	PBLD, RPL27A, TRIM66
Identified in 2010 (Speliotes et al.), but was not genome-wide significant
2015 Sentinel SNP	2010 Implicated SNPs		2015 Assigned Locus Name	Interacting gene
rs10182181	rs12713419, rs13012304, rs6718510, rs7597332, rs7608976		ADCY3(B,M,N,Q); POMC(B,G); NCOA1(B); SH2B1(B,M,Q); APOBR(M,Q);	ADCY3, TCONS_00003602
rs12016871	rs7988412,		MTIF3(N); GTF3A(Q) *~1 Mb from sentinel	MTIF3
rs1808579	rs1788783		NPC1(B,G,M,Q); C18orf8(N,Q)	NPC1
rs2287019	rs11672660, rs34783010		QPCTL(N); GIPR(B,M)	GIPR

Open in a new tab

Figure 1. — Cell type where locus was identified indicated below locus name. Color indicates the p-value threshold where the locus became implicated (Locke et al., 2015). Color key: Green – 5×10⁻⁸≤p<5×10⁻⁷, blue – 5×10⁻⁷≤p<5×10⁻⁶, orange – 5×10⁻⁶≤p<5×10⁻⁵, red – 5×10⁻⁵≤p<5×10⁻⁴.

Figure 2. — Positive predictive value is depicted as a percentage for each bar. Above these percentages, the p-value, as identified through Fisher’s exact test, is posted. These p-values depict the probability that the proportions of salvaged SNPs using variant-to-gene mapping differ from simply salvaging all suggestive SNPs within the same suggestive bin.

Figure 2—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2015 in each cell type.

elife-62206-fig2-data1.csv^{(125B, csv)}

Figure 2—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2015.

elife-62206-fig2-data2.csv^{(101B, csv)}

Figure 3. — BMI 2010–2015 data is utilized here as an example to report the number of SNPs and loci that occur at each step of the analysis.

BMI GWAS variant-to-gene mapping constraints between 2010 and 2018

Using the same set of 2010 surviving SNPs from the previous section, we next identified how many of these SNPs reached GWS by the 2018 BMI GWAS (N = 681,275), representing a nearly tripling in cohort size compared to 2010. Here, we observed that 45 of the 78 2010 biologically constrained loci achieved genome-wide significance by 2018 at the most relaxed p-value threshold (Table 2).

Table 2. 2018 BMI loci that were identified using 2010 salvaged.

SNPs 2018 BMI loci identified as genes nearest to genome-wide significant SNPs that could be identified using SNPs salvaged from suggestive regions of the 2010 BMI GWAS.

Nearest gene to sentinel	Surviving proxy SNPs	Lowest threshold found
ABHD17A	rs893543, rs893542, rs11671347	5 × 10⁻⁴
AC007879.5	rs11677847, rs72951700, rs11689163, rs72966483, rs11694560, rs11692026, rs964621, rs964622	5 × 10⁻⁴
ADCY3	rs6718510, rs7597332, rs7608976, rs13012304, rs12713419	5 × 10⁻⁴
ADCY9	rs710893, rs2531993, rs2238435	5 × 10⁻⁵
AK5	rs12729914	5 × 10⁻⁵
AP000439.5	rs11605729	5 × 10⁻⁴
BCL7A	rs7299842	5 × 10⁻⁴
C10orf32	rs7085104	5 × 10⁻⁴
C18orf8	rs1788826	5 × 10⁻⁴
C1orf61	rs11264483	5 × 10⁻⁴
CCDC171	rs10810462	5 × 10⁻⁴
CNNM2	rs1926032	5 × 10⁻⁴
COQ4	rs1468648	5 × 10⁻⁴
CRTC1	rs4808845, rs4808844	5 × 10⁻⁴
DPYD	rs12077442	5 × 10⁻⁴
EIF2B5	rs3914188, rs35637422	5 × 10⁻⁴
EXOSC10	rs1884429, rs12041740	5 × 10⁻⁴
FAIM2	rs422022	5 × 10⁻⁴
GAB2	rs869202	5 × 10⁻⁴
GIPR	rs34783010, rs11672660	5 × 10⁻⁷
GPR61	rs72705210	5 × 10⁻⁴
HIF1AN	rs117597828	5 × 10⁻⁴
HOXB1	rs2326013	5 × 10⁻⁴
IFNGR1	rs17258750	5 × 10⁻⁴
IPO9	rs2820315	5 × 10⁻⁵
KCNJ12	rs9906072	5 × 10⁻⁴
LMOD1	rs2047264	5 × 10⁻⁴
MAP2K3	rs2001651, rs3785542	5 × 10⁻⁴
MAP3K7CL	rs928277	5 × 10⁻⁴
MEF2D	rs2274319, rs1925950, rs12038396, rs3818463, rs2274320, rs2274317	5 × 10−⁴
MLN	rs11752353, rs6921487, rs72880511, rs1887340, rs73746509	5 × 10⁻⁴
MLXIP	rs28530689, rs10773037, rs28737311, rs36158849, rs2280573	5 × 10⁻⁴
MST1R	rs3774758, rs2252833, rs6446187	5 × 10^⁻⁴
MTIF3	rs7988412	5 × 10⁻⁷
MTOR	rs11581010, rs10864490	5 × 10⁻⁴
NAV1	rs12086240	5 × 10⁻⁵
NPC1	rs1788783	5 × 10⁻⁴
RASA2	rs2042864	5 × 10⁻⁴
RCAN2	rs3934393	5 × 10⁻⁵
RNU6-543P	rs10761689	5 × 10⁻⁴
RP11-493K19.3	rs13100903	5 × 10⁻⁴
RP11-562L8.1	rs12887636	5 × 10⁻⁵
RP11-68I18.10	rs10788800	5 × 10⁻⁵
RP11-707P17.1	rs7183479	5 × 10⁻⁴
SAE1	rs466477	5 × 10⁻⁴
SKAP1	rs16951519, rs2240121	5 × 10⁻⁵
STK33	rs10840087, rs11041999, rs34009921	5 × 10⁻⁴
TNRC6B	rs6001834, rs4820409	5 × 10⁻⁴
TRIM66	rs10840079, rs4256980, rs11042023, rs12803166	5 × 10⁻⁶
TTC34	rs6424062	5 × 10⁻⁵
URM1	rs7859557, rs2240948	5 × 10⁻⁴
XXYLT1	rs58434965	5 × 10⁻⁴

Open in a new tab

The proportion of 2010 suggestive SNPs meeting these criteria that achieved GWS by 2018 within suggestive bin 5×10⁻⁸≤p<5×10⁻⁵ was particularly striking (Figure 4). At this threshold, 86.7% (13/15) of biologically constrained loci identified achieved genome-wide significance by 2018, whereas only 40% (6/15) had achieved genome-wide significance by 2015 (Figure 2); clearly, this improvement is a function of relative cohort size. While an increase was also observed with loci identified with no constraint (43/163 suggestive 2010 loci achieving GWS by 2015 and 105/163 achieving GWS by 2018), the precision was significantly higher with the biological constraint than without.

Figure 4. — BMI SNPs salvaged via variant-to-gene mapping that go on to reach genome-wide significance by 2018, as well as the set of unconstrained 2010 suggestive SNPs that achieve genome-wide significance by 2018. Positive predictive value is depicted for each bar. Above these percentages, the p-value, as identified through Fisher’s exact test, is posted. These p-values depict the probability that the proportions of salvaged SNPs using variant-to-gene mapping differ from simply salvaging all suggestive SNPs within the same suggestive bin.

Figure 4—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig4-data1.csv^{(129B, csv)}

Figure 4—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig4-data2.csv^{(103B, csv)}

Figure 4—figure supplement 1. — BMI SNPs salvaged via variant-to-gene mapping that go on to reach genome-wide significance by 2018, as well as the set of unconstrained 2010 suggestive SNPs that achieve genome-wide significance by 2018. Positive predictive value is depicted for each bar. Above these percentages, the p-value, as identified through Fisher’s exact test, is posted. These p-values depict the probability that the proportions of salvaged SNPs using variant-to-gene mapping differ from simply salvaging all suggestive SNPs within the same suggestive bin.

Figure 4—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig4-data1.csv^{(129B, csv)}

Figure 4—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig4-data2.csv^{(103B, csv)}

We assessed the positive predictive capability of this method across the stratified p-value bins to determine at which level of significance this biologically constrained analytic no longer outperformed randomly selected signals from the same bin. As shown in Figure 4—figure supplement 1, there is no point across either cell type or p-value bin where randomly sampled SNPs outperform the chromatin-constrained approach. Forty-four of 78 suggestive loci were identified in the 5×10⁻⁸≤p<5×10⁻⁴ bin, with a positive predictive value approximately 1.2-fold higher than random selection. As anticipated, fewer novel chromatin-implicated loci were detected in the 5×10⁻⁸≤p<5×10⁻⁵ bin, though the positive predictive value was approximately 1.5-fold higher than random. Taken together, these results suggest that this chromatin-constrained approach consistently performs better than random within these serial data sets, though there are diminishing returns as the p-value threshold is made increasingly less stringent.

BMI GWAS variant-to-gene mapping constraints between 2015 and 2018

We also applied the chromatin-based variant-to-gene mapping approach to the 2015 BMI GWAS data to assess the efficiency of this analysis in identifying loci that would achieve genome-wide significance by 2018. The identified proxy SNPs, their nearest 2018 GWS proxy, and the nearest gene to those proxies are described in Supplementary file 1 (see Supplementary file 2 for the implicated genes). We observed that across all cell types with 5×10⁻⁸≤p<5×10⁻⁴, 117 of the implicated 184 loci reached GWS in 2018. At the 5×10⁻⁸≤p<5×10⁻⁵ threshold, 80% (57/71) of the constrained 2015 implicated loci reached GWS in 2018, while 59% (148/248) of the unconstrained 2015 loci reached GWS in 2018 (Figure 5).

Figure 5. — BMI SNPs salvaged via variant-to-gene mapping that go on to reach genome-wide significance by 2018, as well as the set of unconstrained 2015 suggestive SNPs that achieve genome-wide significance by 2018. Positive predictive value is depicted for each bar. The posterior probability that loci identified by our chromatin-based constraint more often achieve GWS than loci with no constraint is posted above these percentages.

Figure 5—source data 1. Number of 2015 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig5-data1.csv^{(139B, csv)}

Figure 5—source data 2. Number of 2015 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig5-data2.csv^{(104B, csv)}

The results from the analyses at these years present the greatest levels of significance observed thus far. Given the presence of many more loci in each bin, we observed significance at the 5×10⁻⁸≤p<5×10⁻⁶ threshold in both cell types, where 93% (26/28) of the surviving signals reached GWS by 2018 in both cell types. This appears to be principally due to the larger sample sizes at this bin size relative to 2010, which provided additional power to observe such differences. Considering the results of all BMI retrospective analyses through 2018, we found that this method positively identified sub-threshold BMI signals that went on to achieve GWS at a later date significantly more often than when we apply no biological constraint.

Constraining WHRadjBMI GWAS reports: 2010–2018

We also assessed the performance of this method in the 2010 WHRadjBMI GWAS (n = 77,167) relative to 2018 (n = 694,649) (The results of the WHRadjBMI 2010 WHRadjBMI relative to 2015 [n = 224,459,459], and 2015 WHRadjBMI relative to 2018, are available in our Figure 6—figure supplement 2, Figure 6—figure supplement 2, and Supplementary file 2). Here, we found 15,457 proxies within the most relaxed p-value threshold of 5×10⁻⁸≤p<5×10⁻⁴. One hundred and fifty-seven of these proxies survived our biological constraints, corresponding to 71 independent loci. Thirty-six of these ultimately achieved GWS by the 2018 GWAS (Figure 6).

Figure 6. — WHRadjBMI SNPs salvaged via variant-to-gene mapping that go on to reach genome-wide significance by 2018, as well as the set of unconstrained 2010 suggestive SNPs that achieve genome-wide significance by 2018. Positive predictive value is depicted for each bar. Above these percentages, the p-value, as identified through Fisher’s exact test, is posted. These p-values depict the probability that the proportions of salvaged SNPs using variant-to-gene mapping differ from simply salvaging all suggestive SNPs within the same suggestive bin.

Figure 6—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig6-data1.csv^{(128B, csv)}

Figure 6—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig6-data2.csv^{(100B, csv)}

Figure 6—figure supplement 1. — WHRadjBMI SNPs salvaged via variant-to-gene mapping that go on to reach genome-wide significance by 2018, as well as the set of unconstrained 2010 suggestive SNPs that achieve genome-wide significance by 2018. Positive predictive value is depicted for each bar. Above these percentages, the p-value, as identified through Fisher’s exact test, is posted. These p-values depict the probability that the proportions of salvaged SNPs using variant-to-gene mapping differ from simply salvaging all suggestive SNPs within the same suggestive bin.

Figure 6—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig6-data1.csv^{(128B, csv)}

Figure 6—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig6-data2.csv^{(100B, csv)}

Precision of the biologically constrained 2010 loci reaching genome-wide significance in 2018 for both traits through 5×10⁻⁸≤p<5×10⁻⁶ was 100% for each cell type, except for the 2010 WHRadjBMI SNPs constrained by adipose chromatin data which equals 83% (Figure 6—figure supplement 1). This represents a 1.3- to 1.6-fold increase over the mean PPV of a randomly sampled unconstrained SNP set at the same p-value threshold. The number of loci surviving this constraint present in this p-value threshold is quite modest (BMI: 4/4 vs WHRadjBMI: 5/6). The threshold may be further relaxed (5×10⁻⁸<p<5×10⁻⁵) to identify additional loci: BMI: 13/15 vs WHRadjBMI: 13/20. Relaxing even further to p<5×10⁻⁴ yielded far more loci, but brought in more potential false positives (BMI: 44/78 vs WHRadjBMI: 36/71), although they could still become GWS at a future time point. Despite the larger rate of false positives at a threshold of 5×10⁻⁸<p<5×10⁻⁵ in WHRadjBMI relative to BMI, we still observe that the chromatin-constraint surviving sub-threshold WHRadjBMI signals went on to achieve GWS at a later date significantly more often than when we apply no constraint.

Predictive power of negative control does not differ from the unconstrained set

We assessed whether there was any period (2010–2018) in which PPV for the set of SNPs that do not physically contact gene promoters or are not located within regions of open chromatin differed significantly from the unconstrained signals, that is, if there was any period in which such a negative control outperformed, the unconstrained signals. We found that there was no p-value bin, cell type or trait for which there was a difference between this negative set and the unconstrained set (Figure 7 and Figure 8). The absence of observable differences between the negative control and unconstrained sets supports the inference that the differences observed between our biological constrained data and the unconstrained data are in fact attributable to the chromatin-constrained analytic strategy.

Figure 7. — BMI SNPs failing the variant-to-gene mapping filter that go on to reach genome-wide significance by 2018. Positive predictive value is depicted for each bar. The posterior probability that loci identified by our chromatin-based constraint more often achieve GWS than loci with no constraint is posted above these percentages. There is no threshold where this data differs significantly from the unconstrained set.

Figure 7—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig7-data1.csv^{(152B, csv)}

Figure 8. — WHRadjBMI SNPs failing the variant-to-gene mapping filter that go on to reach genome-wide significance by 2018. Positive predictive value is depicted for each bar. The posterior probability that loci identified by our chromatin-based constraint more often achieve GWS than loci with no constraint is posted above these percentages. There is no threshold where this data differs significantly from the unconstrained set.

Figure 8—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig8-data1.csv^{(142B, csv)}

Finally, we assessed the classification rates of this approach for both traits in all testable years. At each threshold for prior GWAS significance, we used our biological constraints to predict whether an implicated locus would achieve GWS in a subsequent GWAS. We found that, across all thresholds and chromatin-constrained loci in each cell type, sensitivity was generally less than 25%, indicating a high rate of false negatives. This is expected because the chromatin-constrained data represent effects that are only captured by the cell-types (and their respective developmental maturities) that were profiled. By combining the loci identified in either adipose or hypothalamic neurons, sensitivity was increased, though the increment was modest. Specificity, however, was consistently >80%, and often >90%, indicating few false positives. The precision was particularly high through 5×10⁻⁸<p<5×10⁻⁵; however, expanding through to 5×10⁻⁸<p<5×10⁻⁴ resulted in relatively high proportions of false positives in most analyses, although always at a lower rate than without a chromatin-constraint. As a control, we analyzed classifications of a negative control set in which loci failed to survive the variant-to-gene mapping filter for each cell type and suggestive threshold. There was no instance across trait, cell type, or p-value bin in which the specificity of the negative control outperformed the variant-to-gene mapping filter (Source data 1). All together, the consistently high specificity and precision observed conveys the capacity of this chromatin-constraint to preferentially retain signals that are likely to achieve GWS in larger data sets while minimizing the presence of false positives through 5×10⁻⁸<p<5×10⁻⁵.

Discussion

In this study, we used our recently described variant-to-gene structural mapping approach (Chesi et al., 2019; Su et al., 2020) to conduct retrospective biologically constrained analyses of previous sequential GWAS reports to determine whether we could implicate statistically suggestive SNPs that would subsequently achieve GWS in incrementally larger data sets. Using a combination of high-resolution promoter-focused Capture C and ATAC-seq, this method enables the prioritization of statistically suggestive loci.

We assessed the extent to which this method can be applied to ‘salvage’ sub-threshold loci for possible consideration. We used BMI and WHR adjusted for BMI because of the relatively large number of loci identified in GWAS efforts conducted in large cohorts that continued to increase over time. The number of SNPs surviving these filters for each cell type and trait at each successively relaxed statistical cutoff are described in Source data 1.

Given the fairly modest increase in sample size between 2010 (N = 249,796) and 2015 (N = 339,224) for BMI, it was not particularly surprising that majority of the surviving 2010 BMI signals did not achieve GWS until 2018 (N = 681,275). Only six of the 15 surviving signals achieved GWS by 2015, but 13 eventually achieved GWS by 2018. Reaching deeper into the 2010 suggestive signals through p<5×10⁻⁴ showed a much clearer trend for both traits: roughly 20% of the surviving 2010 signals achieved GWS by 2015 and nearly 50% achieved GWS by 2018. Despite positive predictive values consistently greater than 80% for loci with 5×10⁻⁸≤p<5×10⁻⁶, it remains to be seen how many of the remaining surviving SNPs with less significant p-values will achieve GWS by the publication of the next GWAS in the future for each corresponding trait.

In this study, we never identified a particular sub-threshold bin where no constraint was as precise as this chromatin-based constraint. We did, however, find that the precision observed for loci in data sets between 2010 and 2018 varied by trait. While the positive predictive value was nearly 90% for 2010 BMI constrained loci reaching GWS by 2018 at 5×10⁻⁸≤p<5×10⁻⁵, only 65% of the surviving 2010 WHRadjBMI achieved GWS by the subsequent 2018 study. At the next, more restrictive, threshold, 5×10⁻⁸≤p<5×10⁻⁶, we found that nearly all of the surviving loci for both traits of the same years reached GWS by their respective 2018 study, although the number of loci within this threshold is small (BMI: 4/4, WHRadjBMI: 5/6). Thus, our findings suggest that this strategy is capable of identifying loci that will achieve GWS at 5×10⁻⁸≤p<5×10⁻⁶. Additional loci can be identified at 5×10⁻⁸≤p<5×10⁻⁵, at the expense of a degree of false positives. Although as noted above, such false-positive signals could possibly go on to be GWS in a future study.

We also observed that across all suggestive thresholds, specificity was consistently >80%, and often >90%. The consistently high specificity and precision, at least through 5×10⁻⁸≤p<5×10⁻⁶, suggest that true negatives are largely properly identified without the generation of large amounts of false positives. The low false-positive rate did come at a cost, however. Sensitivity was extremely low at all thresholds, often below 20% (Source data 1). The lack of sensitivity conveys high levels of false negatives, meaning many signals that would eventually achieve GWS would not be properly classified with this method.

The strength of this method is in its ability to attach biological relevance to sub-threshold SNPs in individual cell types, but this can also serve as its weakness. However, given the cost in time and resources to test candidate loci, we believe that high specificity and precision are more important characteristics for such a classifier. While we believe this trade-off is acceptable for the identification of novel biologically relevant loci prior to their confirmation via GWAS, we recognize that there are many loci that are falsely identified as not relevant. In utilizing the chromatin state of individual cell types in such a manner, we may reject loci that are biologically relevant in a different cell type or those that simply lack such an epigenetic mechanism. Provided a more comprehensive library of chromatin state in a much more diverse set of cell types, or the inclusion of additional biological filters that could identify loci that lack this epigenetic mechanism, we could potentially increase sensitivity in a substantial manner while retaining high rates of precision and specificity.

Despite any such limitations, we implicated loci most likely to become significant in the context of larger data sets with just our chromatin-based constraint approach. Using this variant-to-gene mapping approach, one can prioritize loci/genes of borderline statistical significance that may have important candidacy based upon functional considerations. Confirmation could come via larger data sets and/or by direct molecular physiological analyses of the candidates.

Materials and methods

ATAC-seq library generation and peak calls

Tn5 transposase transposition (Illumina Cat #FC-121–1030, Nextera) and purification of the Tn5 transposase derived DNA from 100,000 cells of each investigated cell type. The samples were then shipped to the Center of Spatial and Functional Genomics at CHOP where the ATAC-seq process was completed. Live cells were harvested via trypsinization, followed by a series of wash steps. One hundred thousand cells from each sample were pelleted at 550 × g for 5 min at 4°C. The cell pellet was then resuspended in 50 μl cold lysis buffer (10 mM Tris–HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl₂, 0.1% IGEPAL CA-630) and centrifuged immediately at 550 × g for 10 min at 4°C. The nuclei were resuspended in the transposition reaction mix (2× TD Buffer [Illumina Cat #FC-121–1030, Nextera], 2.5 µl Tn5 transposase [Illumina Cat #FC-121–1030, Nextera], and nuclease-free H₂O) on ice and then incubated for 45 min at 37°C. The transposed DNA was then purified using the MinElute Kit (Qiagen), eluted with 10.5 μl elution buffer (EB), frozen, and sent to the Center for Spatial and Functional Genomics at CHOP. The transposed DNA was PCR amplified using Nextera primers for 12 cycles to generate each library. The PCR was subsequently cleaned up using AMPureXP beads (Agencourt), and libraries were paired-end sequenced on an Illumina HiSeq 4000 (100 bp read length) and the Illumina NovaSeq platform. Open chromatin regions were called using the ENCODE ATAC-seq pipeline (https://www.encodeproject.org/atac-seq/), selecting the resulting IDR conservative peaks (with all coordinates referring to hg19). We define a genomic region open if it has 1 bp overlap with an ATAC-seq peak.

Cell fixation for chromatin capture

The protocol used for cell fixation was similar to previous methods (Hughes et al., 2014). Cells were collected and single-cell suspension were made with aliquots of 10⁷ cells in 10 ml media (e.g. RPMI + 10% FCS). Five hundred and forty microliters of 37% formaldehyde was added and incubated for 10 min at RT on a platform rocker. The reaction was quenched by adding 1.5 ml 1 M cold glycine (4°C) for a total volume of 12 ml. Fixed cells were centrifuged at 1000 rpm for 5 min at 4°C and supernatant removed. The cell pellets were washed in 10 ml cold PBS (4°C) followed by centrifugation as above. Supernatant was removed and cell pellets were resuspended in 5 ml of cold lysis buffer (10 mM Tris pH8, 10 mM NaCl, 0.2% NP-40 [Igepal] supplemented with protease inhibitor cocktails). Resuspended cells were incubated for 20 min on ice, centrifuged as above, and the lysis buffer removed. Finally, cell pellets were resuspended in 1 ml fresh lysis buffer, transferred to 1.5 ml Eppendorf tubes, and snap frozen (ethanol/dry ice or liquid nitrogen). Cells were stored at −80°C until they were thawed for 3C library generation.

Capture C

Custom capture baits were designed using an Agilent SureSelect library design targeting both ends of DpnII restriction fragments encompassing promoters (including alternative promoters) of all human coding genes, noncoding RNA, antisense RNA, snRNA, miRNA, snoRNA, and lincRNA transcripts, totaling 36,691 RNA-baited fragments. The library was designed using scripts generously provided by Dr. Hughes (Oxford, UK), utilizing the RefSeq, lincRNA transcripts, and sno/miRNA tracks in the hg19 assembly downloaded from the UCSC Table Browser on 16 September 2015. The capture library design covered 95% of all coding RNA promoters and 88% of RNA types described above. The missing 5% of coding genes that could not be designed were either duplicated genes or contained highly repetitive DNA in their promoter regions.

The isolated DNA from the 3C libraries was quantified using a Qubit fluorometer (Life Technologies), and 10 μg of each library was sheared in dH₂O using a QSonica Q800R to an average DNA fragment size of 350 bp. QSonica settings used were 60% amplitude, 30 s on, 30 s off, 2 min intervals, for a total of five intervals at 4°C. After shearing, DNA was purified using AMPureXP beads (Agencourt). Sample concentration was checked via Qubit fluorometer and DNA size was assessed on a Bioanalyzer 2100 using a 1000 DNA Chip. Agilent SureSelect XT Library Prep Kit (Agilent) was used to repair DNA ends and for adaptor ligation following the standard protocol. Excess adaptors were removed using AMPureXP beads. Size and concentration were checked again before hybridization. One microgram of adaptor ligated library was used as input for the SureSelect XT capture kit using their standard protocol and our custom-designed Capture C library. The quantity and quality of the captured library were assessed by Qubit fluorometer and Bioanalyser using a high-sensitivity DNA Chip. Each SureSelect XT library was initially sequenced on one lane HiSeq 4000 paired-end sequencing (100 bp read length) for QC. All six Capture C promoter interactome libraries were then sequenced three at a time on an S2 flow cells on an Illumina NovaSeq, generating ~1.6 billion paired-end reads per sample (50 bp read length).

Analysis of promoter-focused capture C data

Quality control of the raw fastq files was performed with FastQC. Paired-end reads were pre-processed with the HiCUP pipeline (Wingett et al., 2015), with bowtie2 as aligner and hg19 as reference genome. Significant promoter interactions at 1-DpnII fragment resolution were called using CHiCAGO (Cairns et al., 2016), with default parameters except for binsize which was set to 2500. Significant interactions at 4-DpnII fragment resolution were also called with CHiCAGO using artificial .baitmap and .rmap files where DpnII fragments were grouped into four consecutively and using default parameters except for removeAdjacent that was set to false. We define PIR a promoter-interacting region, irrespective of whether it is a baited region or not. The CHiCAGO function peakEnrichment4Features() was used to assess enrichment of genomic features in promoter-interacting regions at both 1-fragment and 4-fragment resolution.

Variant-to-gene mapping pipeline

BMI and WHRadjBMI GWAS summary statistics derived from European ancestry, each from 2010, 2015, and 2018, were downloaded from the Genetic Investigation of Anthropometric Traits (GIANT) consortium https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files (Heid et al., 2010a; Locke et al., 2015; Pulit et al., 2019; Shungin et al., 2015; Speliotes et al., 2010b; Yengo et al., 2018). We identified all genome-wide significant variants from each data set as any SNP with p<5×10⁻⁸. We then identified the sets of variants in the varying suggestive bins of 5×10⁻⁸≤p<5×10⁻⁷, 5 × 10⁻⁸ ≤ p<5×10⁻⁶, 5 × 10⁻⁸ ≤ p<5×10⁻⁵, and 5 × 10⁻⁸ ≤ p<5×10⁻⁴. Upon the identification of these variants within each bin, we utilized SNiPA v3.3 (Arnold et al., 2015) to find their proxy SNPs, which we define as any SNP with r² > 0.8 in Europeans for each region. Only signals with LD information within SNiPA were considered.

We next conducted a variant-to-gene mapping for each cell type within each of the suggestive bins to identify the gene promoters that are in contact with these SNPs. Promoter Capture C libraries are utilized to identify the genes with which these variants have interactions. Significant interactions were called using the CHiCAGO pipeline (Cairns et al., 2016) utilizing both 1-fragment and 4-fragment resolutions. We chose to focus on non-gene-to-gene interactions; thus, we ignored interactions that were identified as bait-to-bait as these represent the interactions between gene promoters. ATAC-seq libraries were also utilized to identify only those gene promoters that interact with SNPs in regions of open chromatin. Genes within the annotated MHC region were removed due to its highly polymorphic nature.

We next removed SNPs within the suggestive bins that were in linkage disequilibrium with any SNP that was significant genome wide. This was done to determine an entirely novel set of variant interacting genes that are entirely independent of those that are already known. To accomplish this, we removed all variants that were found to be in LD with a variant that is significant at a genome level. Thus, we identified all proxies of genome-wide significant variants at r² > 0.1 and remove any variant within the suggestive region that is found to be a proxy of these genome-wide significant variants. This provides a filtered set of SNPs that are independent of any SNP that is significant at a genome-wide scale.

Retrospective analysis

Interacting SNP-gene promoter pairs were identified by the variant-to-gene mapping pipeline described above for each GWAS across all suggestive zones for each cell type. Of these SNP-gene promoter pairs, we identify those that are in tight LD (r² >0.8) with an SNP that is genome-wide significant in a future study. The implicated genes derived from our variant-to-gene mapping approach were frequently different from the locus ‘names’ traditionally used in publications reporting GWAS findings, typically based on the nearest gene. In contrast, our variant-to-gene mapping used the integration of ATAC-seq and promoter-focused Capture C data to identify the gene promotor(s) physically in contact with a genomic region (almost always non-coding) harboring an associated variant. However, in order to assess the 2015 loci that were identifiable in 2010 with this method, for consistency’s sake, we annotated the implicated 2010 loci with the published locus ‘name’ annotation from the respective 2015 GWAS report. However, unlike the 2015 efforts, the 2018 studies did not annotate the reported loci with an arbitrary gene notation; thus, we simply labeled the nearest gene to the 2018 sentinel SNP in which the surviving signals were in tight LD. The gene promoters in contact with each surviving signal are also available in Source data 1. The surviving SNPs that were in tight LD with an SNP that achieved GWS in 2015 were annotated with the published 2015 locus name of a sentinel SNP residing up to 1 MB away. We indicate the locus name that was provided in each 2015 study, as well as provide the set of gene promoters that were identified by the variant-to-gene mapping pipeline. Manhattan plots were generated in R (R Development Core Team, 2019), utilizing p-values of SNPs from the corresponding summary statistics. SNP positions were identified with dbSNP Build 151 to plot the relative location of the SNPs on the Manhattan plot. The highlighted loci point to the SNP with the lowest p-value identified at each locus.

In the case of all remaining comparisons, retrospective annotation was performed differently. Rather than trying to provide a locus name to the set of SNPs that were identified to be in tight LD with a genome-wide significant SNP in a later year, we elected to identify the surviving SNP, the gene promoters it physically contacts, and its best proxy that reached GWS (as defined by both r² and p-value in the summary statistics).

To assess the predictive power of this method, we identified the positive predictive value of each set of salvaged SNPs across each retrospective analysis. To accurately quantify the number of loci that are identifiable using salvaged suggestive SNPs, we group the surviving variants based on their LD (r² > 0.4) to one another as a locus. We extend this one additional time to include the proxies (r² > 0.4) of each previously identified proxy SNP as a component of each locus. This allows SNPs that failed to be identified as a proxy of the lead SNP to be included within the locus via the transitive property, but only via one iteration of this process. The resulting loci identified are independent from one another, which prevents double counting of loci. We count these loci for each cell type across each suggestive bin, and we also identify the number of distinct loci regardless of cell type. We then identify the set of true positives as a locus that is in tight LD with a genome-wide significant SNP in a future GWAS. Those failing to meet these criteria are deemed false positives. We also identified loci of all genome-wide significant independent suggestive SNPs (r² < 0.4) for each retrospective analysis and identified true-positive, false-positive, false-negative, and true-negative counts for this set using the same metrics as the constrained data sets. We used a beta-binomial model and used Monte Carlo approximation over 100,000 iterations to identify the posterior probability that loci identified by our chromatin-based constraint more often achieve GWS than loci with no constraint. We also assessed significance of the positive predictive values by calculating an empirical distribution from 10,000 iterations of a randomly sampled population of the unconstrained suggestive signals within each p-value threshold bin. At each iteration, we randomly selected N loci, where N is the corresponding number of loci surviving our chromatin-constraints from either cell type, and quantified the positive predictive value from this sampling. We then plotted the location where the positive predictive values fell for each cell type to determine whether these different significantly from the random population.

We calculated sensitivity, specificity, false-negative rate, and false positive rate of this approach across both traits for all testable years. To do this, we identified binary classifications of sub-threshold loci that were predicted to either reach or not reach GWS by the future year in each cell type separately. We then created confusion matrices for each cell type and p-value bin to calculate each binary classification performance metric. Additionally, we identified these values for the previously described negative control data (the effective inverse of the biological constrained data) as well as the randomly sampled data. For both of these data sets, we identified these metrics across 10,000 randomly sampled populations where we randomly selected N loci, where N is the corresponding number of loci surviving our chromatin-constraints from either cell type at the corresponding p-value threshold.

All source code available on Github: https://github.com/rkweku/SubThresholdProjectScripts.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Struan FA Grant, Email: grants@email.chop.edu.

Mone Zaidi, Icahn School of Medicine at Mount Sinai, United States.

Funding Information

This paper was supported by the following grants:

Eunice Kennedy Shriver National Institute of Child Health and Human Development R01-HD056465 to Struan FA Grant.
National Human Genome Research Institute R01-HG010067 to Struan FA Grant.
Children's Hospital of Philadelphia Daniel B. Burke Endowed Chair for Diabetes Research to Struan FA Grant.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Methodology, Writing - original draft, Writing - review and editing.

Formal analysis, Investigation, Writing - review and editing.

Investigation, Writing - review and editing.

Data curation, Writing - review and editing.

Resources, Writing - review and editing.

Writing - review and editing.

Data curation, Writing - review and editing.

Writing - review and editing.

Formal analysis, Investigation, Writing - review and editing.

Investigation, Writing - review and editing.

Conceptualization, Supervision, Investigation, Writing - original draft, Project administration, Writing - review and editing.

Additional files

Source data 1. Classification metrics for each condition.

elife-62206-data1.zip^{(95.8KB, zip)}

Transparent reporting form

elife-62206-transrepform.docx^{(66.5KB, docx)}

Data availability

Adipose ATAC-seq and promoter-focused capture C data will be made available on GEO prior to publication. Hypothalamic Neuron ATAC-eq and promoter-focused capture C data is the subject of another atlas-based manuscript currently under peer review and through that process that dataset will be made available once the paper is published- the corresponding hypothalamus preprint can be found at: https://doi.org/10.1101/2020.07.06.146951v1.full.

The following datasets were generated:

Hammond RK, Pahl MC, Su C, Cousminer DL, Leonard ME, Lu S, Doege CA, Wagley Y, Hodge KM, Lasconi C, Johnson ME, Terry NA, Ghanem LR, Brown CS, Pippin JA, Hankenson KD, Leibel RL, Chesi A, Wells AD, Grant SFA. 2020. MSC-Derived Adipose ATAC-seq and Promoter-Focused Capture-C. NCBI Gene Expression Omnibus. GSE164745

Pahl MC, Doege CA, Hodge KM, Littleton SH, Leonard ME, Lu S, Rausch R, Pippin JA, Bradfield JP, Hammond RK, Boehm K, Berkowitz RI, Lasconi C, Chesi A, Johnson ME, Wells AD, Voight BF, Leibel RL, Cousminer DL, Grant SFA. 2020. ESC-Derived Hypothalamic Neuron ATAC-seq and Promoter-Focused Capture-C. NCBI Gene Expression Omnibus. GSE164911

The following previously published datasets were used:

Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Allen HL, Lindgren CM, Luan J, Magi R. 2010. GWAS 2010 BMI Summary Statistics. Broad Institute. GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq

Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, Thorleifsson G, Zillikens MC, Speliotes EK, Magi R. 2010. GWAS 2010 WHRadjBMI Summary Statistics. Broad Institute. GIANT_WHRadjBMI_Heid2010_publicrelease_HapMapCeuFreq

Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, Croteau-Chonka DC, Esko T. 2015. GWAS 2015 BMI Summary Statistics. Broad Institute. SNP_gwas_mc_merge_nogc.tbl.uniq

Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Magi R, Strawbridge R, Pers TH, Fischer K, Justice AE, Workalemahu T, Wu JM. 2015. GWAS 2015 WHRadjBMI Summary Statistics. Broad Institute. GIANT_2015_WHRadjBMI_COMBINED_EUR

Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J. 2018. GWAS 2018 BMI Summary Statistics. Broad Institute. Meta-analysis_Locke_et_al%2BUKBiobank_2018_UPDATED

Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium 2018. GWAS 2018 WHRadjBMI Summary Statistics. Zenodo.

References

Arnold M, Raffler J, Pfeufer A, Suhre K, Kastenmüller G. SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics. 2015;31:1334–1336. doi: 10.1093/bioinformatics/btu779. [DOI] [PMC free article] [PubMed] [Google Scholar]
Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cairns J, Freire-Pritchett P, Wingett SW, Várnai C, Dimond A, Plagnol V, Zerbino D, Schoenfelder S, Javierre BM, Osborne C, Fraser P, Spivakov M. CHiCAGO: robust detection of DNA looping interactions in capture Hi-C data. Genome Biology. 2016;17:127. doi: 10.1186/s13059-016-0992-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Çalışkan M, Manduchi E, Rao HS, Segert JA, Beltrame MH, Trizzino M, Park Y, Baker SW, Chesi A, Johnson ME, Hodge KM, Leonard ME, Loza B, Xin D, Berrido AM, Hand NJ, Bauer RC, Wells AD, Olthoff KM, Shaked A, Rader DJ, Grant SFA, Brown CD. Genetic and epigenetic fine mapping of complex trait associated loci in the human liver. The American Journal of Human Genetics. 2019;105:89–107. doi: 10.1016/j.ajhg.2019.05.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chesi A, Wagley Y, Johnson ME, Manduchi E, Su C, Lu S, Leonard ME, Hodge KM, Pippin JA, Hankenson KD, Wells AD, Grant SFA. Genome-scale capture C promoter interactions implicate effector genes at GWAS loci for bone mineral density. Nature Communications. 2019;10:1260. doi: 10.1038/s41467-019-09302-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cousminer DL, Wagley Y, Pippin J, Elhakeem A, Way G. Genome-wide association study implicates novel loci and reveals candidate effector genes for longitudinal pediatric bone accrual through variant-to-gene mapping. medRxiv. 2020 doi: 10.1101/2020.02.17.20024133. [DOI] [PMC free article] [PubMed]
Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, Thorleifsson G, Zillikens MC, Speliotes EK, Mägi R, Workalemahu T, White CC, Bouatia-Naji N, Harris TB, Berndt SI, Ingelsson E, Willer CJ, Weedon MN, Luan J, Vedantam S, Esko T, Kilpeläinen TO, Kutalik Z, Li S, Monda KL, Dixon AL, Holmes CC, Kaplan LM, Liang L, Min JL, Moffatt MF, Molony C, Nicholson G, Schadt EE, Zondervan KT, Feitosa MF, Ferreira T, Lango Allen H, Weyant RJ, Wheeler E, Wood AR, Estrada K, Goddard ME, Lettre G, Mangino M, Nyholt DR, Purcell S, Smith AV, Visscher PM, Yang J, McCarroll SA, Nemesh J, Voight BF, Absher D, Amin N, Aspelund T, Coin L, Glazer NL, Hayward C, Heard-Costa NL, Hottenga JJ, Johansson A, Johnson T, Kaakinen M, Kapur K, Ketkar S, Knowles JW, Kraft P, Kraja AT, Lamina C, Leitzmann MF, McKnight B, Morris AP, Ong KK, Perry JR, Peters MJ, Polasek O, Prokopenko I, Rayner NW, Ripatti S, Rivadeneira F, Robertson NR, Sanna S, Sovio U, Surakka I, Teumer A, van Wingerden S, Vitart V, Zhao JH, Cavalcanti-Proença C, Chines PS, Fisher E, Kulzer JR, Lecoeur C, Narisu N, Sandholt C, Scott LJ, Silander K, Stark K, Tammesoo ML, Teslovich TM, Timpson NJ, Watanabe RM, Welch R, Chasman DI, Cooper MN, Jansson JO, Kettunen J, Lawrence RW, Pellikka N, Perola M, Vandenput L, Alavere H, Almgren P, Atwood LD, Bennett AJ, Biffar R, Bonnycastle LL, Bornstein SR, Buchanan TA, Campbell H, Day IN, Dei M, Dörr M, Elliott P, Erdos MR, Eriksson JG, Freimer NB, Fu M, Gaget S, Geus EJ, Gjesing AP, Grallert H, Grässler J, Groves CJ, Guiducci C, Hartikainen AL, Hassanali N, Havulinna AS, Herzig KH, Hicks AA, Hui J, Igl W, Jousilahti P, Jula A, Kajantie E, Kinnunen L, Kolcic I, Koskinen S, Kovacs P, Kroemer HK, Krzelj V, Kuusisto J, Kvaloy K, Laitinen J, Lantieri O, Lathrop GM, Lokki ML, Luben RN, Ludwig B, McArdle WL, McCarthy A, Morken MA, Nelis M, Neville MJ, Paré G, Parker AN, Peden JF, Pichler I, Pietiläinen KH, Platou CG, Pouta A, Ridderstråle M, Samani NJ, Saramies J, Sinisalo J, Smit JH, Strawbridge RJ, Stringham HM, Swift AJ, Teder-Laving M, Thomson B, Usala G, van Meurs JB, van Ommen GJ, Vatin V, Volpato CB, Wallaschofski H, Walters GB, Widen E, Wild SH, Willemsen G, Witte DR, Zgaga L, Zitting P, Beilby JP, James AL, Kähönen M, Lehtimäki T, Nieminen MS, Ohlsson C, Palmer LJ, Raitakari O, Ridker PM, Stumvoll M, Tönjes A, Viikari J, Balkau B, Ben-Shlomo Y, Bergman RN, Boeing H, Smith GD, Ebrahim S, Froguel P, Hansen T, Hengstenberg C, Hveem K, Isomaa B, Jørgensen T, Karpe F, Khaw KT, Laakso M, Lawlor DA, Marre M, Meitinger T, Metspalu A, Midthjell K, Pedersen O, Salomaa V, Schwarz PE, Tuomi T, Tuomilehto J, Valle TT, Wareham NJ, Arnold AM, Beckmann JS, Bergmann S, Boerwinkle E, Boomsma DI, Caulfield MJ, Collins FS, Eiriksdottir G, Gudnason V, Gyllensten U, Hamsten A, Hattersley AT, Hofman A, Hu FB, Illig T, Iribarren C, Jarvelin MR, Kao WH, Kaprio J, Launer LJ, Munroe PB, Oostra B, Penninx BW, Pramstaller PP, Psaty BM, Quertermous T, Rissanen A, Rudan I, Shuldiner AR, Soranzo N, Spector TD, Syvanen AC, Uda M, Uitterlinden A, Völzke H, Vollenweider P, Wilson JF, Witteman JC, Wright AF, Abecasis GR, Boehnke M, Borecki IB, Deloukas P, Frayling TM, Groop LC, Haritunians T, Hunter DJ, Kaplan RC, North KE, O'Connell JR, Peltonen L, Schlessinger D, Strachan DP, Hirschhorn JN, Assimes TL, Wichmann HE, Thorsteinsdottir U, van Duijn CM, Stefansson K, Cupples LA, Loos RJ, Barroso I, McCarthy MI, Fox CS, Mohlke KL, Lindgren CM, MAGIC Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nature Genetics. 2010a;42:949–960. doi: 10.1038/ng.685. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, De Gobbi M, Taylor S, Gibbons R, Higgs DR. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nature Genetics. 2014;46:205–212. doi: 10.1038/ng.2871. [DOI] [PubMed] [Google Scholar]
Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Várnai C, Thiecke MJ, Burden F, Farrow S, Cutler AJ, Rehnström K, Downes K, Grassi L, Kostadima M, Freire-Pritchett P, Wang F, Stunnenberg HG, Todd JA, Zerbino DR, Stegle O, Ouwehand WH, Frontini M, Wallace C, Spivakov M, Fraser P, BLUEPRINT Consortium Lineage-Specific genome architecture links enhancers and Non-coding disease variants to target gene promoters. Cell. 2016;167:1369–1384. doi: 10.1016/j.cell.2016.09.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557. [DOI] [PMC free article] [PubMed] [Google Scholar]
Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, Croteau-Chonka DC, Esko T, Fall T, Ferreira T, Gustafsson S, Kutalik Z, Luan J, Mägi R, Randall JC, Winkler TW, Wood AR, Workalemahu T, Faul JD, Smith JA, Zhao JH, Zhao W, Chen J, Fehrmann R, Hedman ÅK, Karjalainen J, Schmidt EM, Absher D, Amin N, Anderson D, Beekman M, Bolton JL, Bragg-Gresham JL, Buyske S, Demirkan A, Deng G, Ehret GB, Feenstra B, Feitosa MF, Fischer K, Goel A, Gong J, Jackson AU, Kanoni S, Kleber ME, Kristiansson K, Lim U, Lotay V, Mangino M, Leach IM, Medina-Gomez C, Medland SE, Nalls MA, Palmer CD, Pasko D, Pechlivanis S, Peters MJ, Prokopenko I, Shungin D, Stančáková A, Strawbridge RJ, Sung YJ, Tanaka T, Teumer A, Trompet S, van der Laan SW, van Setten J, Van Vliet-Ostaptchouk JV, Wang Z, Yengo L, Zhang W, Isaacs A, Albrecht E, Ärnlöv J, Arscott GM, Attwood AP, Bandinelli S, Barrett A, Bas IN, Bellis C, Bennett AJ, Berne C, Blagieva R, Blüher M, Böhringer S, Bonnycastle LL, Böttcher Y, Boyd HA, Bruinenberg M, Caspersen IH, Chen YI, Clarke R, Daw EW, de Craen AJM, Delgado G, Dimitriou M, Doney ASF, Eklund N, Estrada K, Eury E, Folkersen L, Fraser RM, Garcia ME, Geller F, Giedraitis V, Gigante B, Go AS, Golay A, Goodall AH, Gordon SD, Gorski M, Grabe HJ, Grallert H, Grammer TB, Gräßler J, Grönberg H, Groves CJ, Gusto G, Haessler J, Hall P, Haller T, Hallmans G, Hartman CA, Hassinen M, Hayward C, Heard-Costa NL, Helmer Q, Hengstenberg C, Holmen O, Hottenga JJ, James AL, Jeff JM, Johansson Å, Jolley J, Juliusdottir T, Kinnunen L, Koenig W, Koskenvuo M, Kratzer W, Laitinen J, Lamina C, Leander K, Lee NR, Lichtner P, Lind L, Lindström J, Lo KS, Lobbens S, Lorbeer R, Lu Y, Mach F, Magnusson PKE, Mahajan A, McArdle WL, McLachlan S, Menni C, Merger S, Mihailov E, Milani L, Moayyeri A, Monda KL, Morken MA, Mulas A, Müller G, Müller-Nurasyid M, Musk AW, Nagaraja R, Nöthen MM, Nolte IM, Pilz S, Rayner NW, Renstrom F, Rettig R, Ried JS, Ripke S, Robertson NR, Rose LM, Sanna S, Scharnagl H, Scholtens S, Schumacher FR, Scott WR, Seufferlein T, Shi J, Smith AV, Smolonska J, Stanton AV, Steinthorsdottir V, Stirrups K, Stringham HM, Sundström J, Swertz MA, Swift AJ, Syvänen AC, Tan ST, Tayo BO, Thorand B, Thorleifsson G, Tyrer JP, Uh HW, Vandenput L, Verhulst FC, Vermeulen SH, Verweij N, Vonk JM, Waite LL, Warren HR, Waterworth D, Weedon MN, Wilkens LR, Willenborg C, Wilsgaard T, Wojczynski MK, Wong A, Wright AF, Zhang Q, Brennan EP, Choi M, Dastani Z, Drong AW, Eriksson P, Franco-Cereceda A, Gådin JR, Gharavi AG, Goddard ME, Handsaker RE, Huang J, Karpe F, Kathiresan S, Keildson S, Kiryluk K, Kubo M, Lee JY, Liang L, Lifton RP, Ma B, McCarroll SA, McKnight AJ, Min JL, Moffatt MF, Montgomery GW, Murabito JM, Nicholson G, Nyholt DR, Okada Y, Perry JRB, Dorajoo R, Reinmaa E, Salem RM, Sandholm N, Scott RA, Stolk L, Takahashi A, Tanaka T, van 't Hooft FM, Vinkhuyzen AAE, Westra HJ, Zheng W, Zondervan KT, Heath AC, Arveiler D, Bakker SJL, Beilby J, Bergman RN, Blangero J, Bovet P, Campbell H, Caulfield MJ, Cesana G, Chakravarti A, Chasman DI, Chines PS, Collins FS, Crawford DC, Cupples LA, Cusi D, Danesh J, de Faire U, den Ruijter HM, Dominiczak AF, Erbel R, Erdmann J, Eriksson JG, Farrall M, Felix SB, Ferrannini E, Ferrières J, Ford I, Forouhi NG, Forrester T, Franco OH, Gansevoort RT, Gejman PV, Gieger C, Gottesman O, Gudnason V, Gyllensten U, Hall AS, Harris TB, Hattersley AT, Hicks AA, Hindorff LA, Hingorani AD, Hofman A, Homuth G, Hovingh GK, Humphries SE, Hunt SC, Hyppönen E, Illig T, Jacobs KB, Jarvelin MR, Jöckel KH, Johansen B, Jousilahti P, Jukema JW, Jula AM, Kaprio J, Kastelein JJP, Keinanen-Kiukaanniemi SM, Kiemeney LA, Knekt P, Kooner JS, Kooperberg C, Kovacs P, Kraja AT, Kumari M, Kuusisto J, Lakka TA, Langenberg C, Marchand LL, Lehtimäki T, Lyssenko V, Männistö S, Marette A, Matise TC, McKenzie CA, McKnight B, Moll FL, Morris AD, Morris AP, Murray JC, Nelis M, Ohlsson C, Oldehinkel AJ, Ong KK, Madden PAF, Pasterkamp G, Peden JF, Peters A, Postma DS, Pramstaller PP, Price JF, Qi L, Raitakari OT, Rankinen T, Rao DC, Rice TK, Ridker PM, Rioux JD, Ritchie MD, Rudan I, Salomaa V, Samani NJ, Saramies J, Sarzynski MA, Schunkert H, Schwarz PEH, Sever P, Shuldiner AR, Sinisalo J, Stolk RP, Strauch K, Tönjes A, Trégouët DA, Tremblay A, Tremoli E, Virtamo J, Vohl MC, Völker U, Waeber G, Willemsen G, Witteman JC, Zillikens MC, Adair LS, Amouyel P, Asselbergs FW, Assimes TL, Bochud M, Boehm BO, Boerwinkle E, Bornstein SR, Bottinger EP, Bouchard C, Cauchi S, Chambers JC, Chanock SJ, Cooper RS, de Bakker PIW, Dedoussis G, Ferrucci L, Franks PW, Froguel P, Groop LC, Haiman CA, Hamsten A, Hui J, Hunter DJ, Hveem K, Kaplan RC, Kivimaki M, Kuh D, Laakso M, Liu Y, Martin NG, März W, Melbye M, Metspalu A, Moebus S, Munroe PB, Njølstad I, Oostra BA, Palmer CNA, Pedersen NL, Perola M, Pérusse L, Peters U, Power C, Quertermous T, Rauramaa R, Rivadeneira F, Saaristo TE, Saleheen D, Sattar N, Schadt EE, Schlessinger D, Slagboom PE, Snieder H, Spector TD, Thorsteinsdottir U, Stumvoll M, Tuomilehto J, Uitterlinden AG, Uusitupa M, van der Harst P, Walker M, Wallaschofski H, Wareham NJ, Watkins H, Weir DR, Wichmann HE, Wilson JF, Zanen P, Borecki IB, Deloukas P, Fox CS, Heid IM, O'Connell JR, Strachan DP, Stefansson K, van Duijn CM, Abecasis GR, Franke L, Frayling TM, McCarthy MI, Visscher PM, Scherag A, Willer CJ, Boehnke M, Mohlke KL, Lindgren CM, Beckmann JS, Barroso I, North KE, Ingelsson E, Hirschhorn JN, Loos RJF, Speliotes EK, LifeLines Cohort Study. ADIPOGen Consortium. AGEN-BMI Working Group. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GLGC. ICBP. MAGIC Investigators. MuTHER Consortium. MIGen Consortium. PAGE Consortium. ReproGen Consortium. GENIE Consortium. International Endogene Consortium Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177. [DOI] [PMC free article] [PubMed] [Google Scholar]
Meyre D, Delplanque J, Chèvre JC, Lecoeur C, Lobbens S, Gallina S, Durand E, Vatin V, Degraeve F, Proença C, Gaget S, Körner A, Kovacs P, Kiess W, Tichet J, Marre M, Hartikainen AL, Horber F, Potoczna N, Hercberg S, Levy-Marchal C, Pattou F, Heude B, Tauber M, McCarthy MI, Blakemore AI, Montpetit A, Polychronakos C, Weill J, Coin LJ, Asher J, Elliott P, Järvelin MR, Visvikis-Siest S, Balkau B, Sladek R, Balding D, Walley A, Dina C, Froguel P. Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in european populations. Nature Genetics. 2009;41:157–159. doi: 10.1038/ng.301. [DOI] [PubMed] [Google Scholar]
Pahl MC, Doege CA, Hodge KM, Littleton SH, Leonard ME, Lu S, Grant SFA. Cis-regulatory architecture of human ESC-derived hypothalamic neuron differentiation aids in variant-to-gene mapping of relevant complex traits. bioRxiv. 2020 doi: 10.1101/2020.07.06.146951. [DOI] [PMC free article] [PubMed]
Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J, Yengo L, Ferreira T, Marouli E, Ji Y, Yang J, Jones S, Beaumont R, Croteau-Chonka DC, Winkler TW, Hattersley AT, Loos RJF, Hirschhorn JN, Visscher PM, Frayling TM, Yaghootkar H, Lindgren CM, GIANT Consortium Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Human Molecular Genetics. 2019;28:166–174. doi: 10.1093/hmg/ddy327. [DOI] [PMC free article] [PubMed] [Google Scholar]
R Development Core Team . Vienna, Austria: R Foundation for Statistical Computing; 2019. http://www.R-project.org [Google Scholar]
Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Mägi R, Strawbridge RJ, Pers TH, Fischer K, Justice AE, Workalemahu T, Wu JMW, Buchkovich ML, Heard-Costa NL, Roman TS, Drong AW, Song C, Gustafsson S, Day FR, Esko T, Fall T, Kutalik Z, Luan J, Randall JC, Scherag A, Vedantam S, Wood AR, Chen J, Fehrmann R, Karjalainen J, Kahali B, Liu CT, Schmidt EM, Absher D, Amin N, Anderson D, Beekman M, Bragg-Gresham JL, Buyske S, Demirkan A, Ehret GB, Feitosa MF, Goel A, Jackson AU, Johnson T, Kleber ME, Kristiansson K, Mangino M, Leach IM, Medina-Gomez C, Palmer CD, Pasko D, Pechlivanis S, Peters MJ, Prokopenko I, Stančáková A, Sung YJ, Tanaka T, Teumer A, Van Vliet-Ostaptchouk JV, Yengo L, Zhang W, Albrecht E, Ärnlöv J, Arscott GM, Bandinelli S, Barrett A, Bellis C, Bennett AJ, Berne C, Blüher M, Böhringer S, Bonnet F, Böttcher Y, Bruinenberg M, Carba DB, Caspersen IH, Clarke R, Daw EW, Deelen J, Deelman E, Delgado G, Doney AS, Eklund N, Erdos MR, Estrada K, Eury E, Friedrich N, Garcia ME, Giedraitis V, Gigante B, Go AS, Golay A, Grallert H, Grammer TB, Gräßler J, Grewal J, Groves CJ, Haller T, Hallmans G, Hartman CA, Hassinen M, Hayward C, Heikkilä K, Herzig KH, Helmer Q, Hillege HL, Holmen O, Hunt SC, Isaacs A, Ittermann T, James AL, Johansson I, Juliusdottir T, Kalafati IP, Kinnunen L, Koenig W, Kooner IK, Kratzer W, Lamina C, Leander K, Lee NR, Lichtner P, Lind L, Lindström J, Lobbens S, Lorentzon M, Mach F, Magnusson PK, Mahajan A, McArdle WL, Menni C, Merger S, Mihailov E, Milani L, Mills R, Moayyeri A, Monda KL, Mooijaart SP, Mühleisen TW, Mulas A, Müller G, Müller-Nurasyid M, Nagaraja R, Nalls MA, Narisu N, Glorioso N, Nolte IM, Olden M, Rayner NW, Renstrom F, Ried JS, Robertson NR, Rose LM, Sanna S, Scharnagl H, Scholtens S, Sennblad B, Seufferlein T, Sitlani CM, Smith AV, Stirrups K, Stringham HM, Sundström J, Swertz MA, Swift AJ, Syvänen AC, Tayo BO, Thorand B, Thorleifsson G, Tomaschitz A, Troffa C, van Oort FV, Verweij N, Vonk JM, Waite LL, Wennauer R, Wilsgaard T, Wojczynski MK, Wong A, Zhang Q, Zhao JH, Brennan EP, Choi M, Eriksson P, Folkersen L, Franco-Cereceda A, Gharavi AG, Hedman ÅK, Hivert MF, Huang J, Kanoni S, Karpe F, Keildson S, Kiryluk K, Liang L, Lifton RP, Ma B, McKnight AJ, McPherson R, Metspalu A, Min JL, Moffatt MF, Montgomery GW, Murabito JM, Nicholson G, Nyholt DR, Olsson C, Perry JR, Reinmaa E, Salem RM, Sandholm N, Schadt EE, Scott RA, Stolk L, Vallejo EE, Westra HJ, Zondervan KT, Amouyel P, Arveiler D, Bakker SJ, Beilby J, Bergman RN, Blangero J, Brown MJ, Burnier M, Campbell H, Chakravarti A, Chines PS, Claudi-Boehm S, Collins FS, Crawford DC, Danesh J, de Faire U, de Geus EJ, Dörr M, Erbel R, Eriksson JG, Farrall M, Ferrannini E, Ferrières J, Forouhi NG, Forrester T, Franco OH, Gansevoort RT, Gieger C, Gudnason V, Haiman CA, Harris TB, Hattersley AT, Heliövaara M, Hicks AA, Hingorani AD, Hoffmann W, Hofman A, Homuth G, Humphries SE, Hyppönen E, Illig T, Jarvelin MR, Johansen B, Jousilahti P, Jula AM, Kaprio J, Kee F, Keinanen-Kiukaanniemi SM, Kooner JS, Kooperberg C, Kovacs P, Kraja AT, Kumari M, Kuulasmaa K, Kuusisto J, Lakka TA, Langenberg C, Le Marchand L, Lehtimäki T, Lyssenko V, Männistö S, Marette A, Matise TC, McKenzie CA, McKnight B, Musk AW, Möhlenkamp S, Morris AD, Nelis M, Ohlsson C, Oldehinkel AJ, Ong KK, Palmer LJ, Penninx BW, Peters A, Pramstaller PP, Raitakari OT, Rankinen T, Rao DC, Rice TK, Ridker PM, Ritchie MD, Rudan I, Salomaa V, Samani NJ, Saramies J, Sarzynski MA, Schwarz PE, Shuldiner AR, Staessen JA, Steinthorsdottir V, Stolk RP, Strauch K, Tönjes A, Tremblay A, Tremoli E, Vohl MC, Völker U, Vollenweider P, Wilson JF, Witteman JC, Adair LS, Bochud M, Boehm BO, Bornstein SR, Bouchard C, Cauchi S, Caulfield MJ, Chambers JC, Chasman DI, Cooper RS, Dedoussis G, Ferrucci L, Froguel P, Grabe HJ, Hamsten A, Hui J, Hveem K, Jöckel KH, Kivimaki M, Kuh D, Laakso M, Liu Y, März W, Munroe PB, Njølstad I, Oostra BA, Palmer CN, Pedersen NL, Perola M, Pérusse L, Peters U, Power C, Quertermous T, Rauramaa R, Rivadeneira F, Saaristo TE, Saleheen D, Sinisalo J, Slagboom PE, Snieder H, Spector TD, Stefansson K, Stumvoll M, Tuomilehto J, Uitterlinden AG, Uusitupa M, van der Harst P, Veronesi G, Walker M, Wareham NJ, Watkins H, Wichmann HE, Abecasis GR, Assimes TL, Berndt SI, Boehnke M, Borecki IB, Deloukas P, Franke L, Frayling TM, Groop LC, Hunter DJ, Kaplan RC, O'Connell JR, Qi L, Schlessinger D, Strachan DP, Thorsteinsdottir U, van Duijn CM, Willer CJ, Visscher PM, Yang J, Hirschhorn JN, Zillikens MC, McCarthy MI, Speliotes EK, North KE, Fox CS, Barroso I, Franks PW, Ingelsson E, Heid IM, Loos RJ, Cupples LA, Morris AP, Lindgren CM, Mohlke KL, ADIPOGen Consortium. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GEFOS Consortium. GENIE Consortium. GLGC. ICBP. International Endogene Consortium. LifeLines Cohort Study. MAGIC Investigators. MuTHER Consortium. PAGE Consortium. ReproGen Consortium New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–196. doi: 10.1038/nature14132. [DOI] [PMC free article] [PubMed] [Google Scholar]
Smemo S, Tena JJ, Kim KH, Gamazon ER, Sakabe NJ, Gómez-Marín C, Aneas I, Credidio FL, Sobreira DR, Wasserman NF, Lee JH, Puviindran V, Tam D, Shen M, Son JE, Vakili NA, Sung HK, Naranjo S, Acemel RD, Manzanares M, Nagy A, Cox NJ, Hui CC, Gomez-Skarmeta JL, Nóbrega MA. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507:371–375. doi: 10.1038/nature13138. [DOI] [PMC free article] [PubMed] [Google Scholar]
Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Lango Allen H, Lindgren CM, Luan J, Mägi R, Randall JC, Vedantam S, Winkler TW, Qi L, Workalemahu T, Heid IM, Steinthorsdottir V, Stringham HM, Weedon MN, Wheeler E, Wood AR, Ferreira T, Weyant RJ, Segrè AV, Estrada K, Liang L, Nemesh J, Park JH, Gustafsson S, Kilpeläinen TO, Yang J, Bouatia-Naji N, Esko T, Feitosa MF, Kutalik Z, Mangino M, Raychaudhuri S, Scherag A, Smith AV, Welch R, Zhao JH, Aben KK, Absher DM, Amin N, Dixon AL, Fisher E, Glazer NL, Goddard ME, Heard-Costa NL, Hoesel V, Hottenga JJ, Johansson A, Johnson T, Ketkar S, Lamina C, Li S, Moffatt MF, Myers RH, Narisu N, Perry JR, Peters MJ, Preuss M, Ripatti S, Rivadeneira F, Sandholt C, Scott LJ, Timpson NJ, Tyrer JP, van Wingerden S, Watanabe RM, White CC, Wiklund F, Barlassina C, Chasman DI, Cooper MN, Jansson JO, Lawrence RW, Pellikka N, Prokopenko I, Shi J, Thiering E, Alavere H, Alibrandi MT, Almgren P, Arnold AM, Aspelund T, Atwood LD, Balkau B, Balmforth AJ, Bennett AJ, Ben-Shlomo Y, Bergman RN, Bergmann S, Biebermann H, Blakemore AI, Boes T, Bonnycastle LL, Bornstein SR, Brown MJ, Buchanan TA, Busonero F, Campbell H, Cappuccio FP, Cavalcanti-Proença C, Chen YD, Chen CM, Chines PS, Clarke R, Coin L, Connell J, Day IN, den Heijer M, Duan J, Ebrahim S, Elliott P, Elosua R, Eiriksdottir G, Erdos MR, Eriksson JG, Facheris MF, Felix SB, Fischer-Posovszky P, Folsom AR, Friedrich N, Freimer NB, Fu M, Gaget S, Gejman PV, Geus EJ, Gieger C, Gjesing AP, Goel A, Goyette P, Grallert H, Grässler J, Greenawalt DM, Groves CJ, Gudnason V, Guiducci C, Hartikainen AL, Hassanali N, Hall AS, Havulinna AS, Hayward C, Heath AC, Hengstenberg C, Hicks AA, Hinney A, Hofman A, Homuth G, Hui J, Igl W, Iribarren C, Isomaa B, Jacobs KB, Jarick I, Jewell E, John U, Jørgensen T, Jousilahti P, Jula A, Kaakinen M, Kajantie E, Kaplan LM, Kathiresan S, Kettunen J, Kinnunen L, Knowles JW, Kolcic I, König IR, Koskinen S, Kovacs P, Kuusisto J, Kraft P, Kvaløy K, Laitinen J, Lantieri O, Lanzani C, Launer LJ, Lecoeur C, Lehtimäki T, Lettre G, Liu J, Lokki ML, Lorentzon M, Luben RN, Ludwig B, Manunta P, Marek D, Marre M, Martin NG, McArdle WL, McCarthy A, McKnight B, Meitinger T, Melander O, Meyre D, Midthjell K, Montgomery GW, Morken MA, Morris AP, Mulic R, Ngwa JS, Nelis M, Neville MJ, Nyholt DR, O'Donnell CJ, O'Rahilly S, Ong KK, Oostra B, Paré G, Parker AN, Perola M, Pichler I, Pietiläinen KH, Platou CG, Polasek O, Pouta A, Rafelt S, Raitakari O, Rayner NW, Ridderstråle M, Rief W, Ruokonen A, Robertson NR, Rzehak P, Salomaa V, Sanders AR, Sandhu MS, Sanna S, Saramies J, Savolainen MJ, Scherag S, Schipf S, Schreiber S, Schunkert H, Silander K, Sinisalo J, Siscovick DS, Smit JH, Soranzo N, Sovio U, Stephens J, Surakka I, Swift AJ, Tammesoo ML, Tardif JC, Teder-Laving M, Teslovich TM, Thompson JR, Thomson B, Tönjes A, Tuomi T, van Meurs JB, van Ommen GJ, Vatin V, Viikari J, Visvikis-Siest S, Vitart V, Vogel CI, Voight BF, Waite LL, Wallaschofski H, Walters GB, Widen E, Wiegand S, Wild SH, Willemsen G, Witte DR, Witteman JC, Xu J, Zhang Q, Zgaga L, Ziegler A, Zitting P, Beilby JP, Farooqi IS, Hebebrand J, Huikuri HV, James AL, Kähönen M, Levinson DF, Macciardi F, Nieminen MS, Ohlsson C, Palmer LJ, Ridker PM, Stumvoll M, Beckmann JS, Boeing H, Boerwinkle E, Boomsma DI, Caulfield MJ, Chanock SJ, Collins FS, Cupples LA, Smith GD, Erdmann J, Froguel P, Grönberg H, Gyllensten U, Hall P, Hansen T, Harris TB, Hattersley AT, Hayes RB, Heinrich J, Hu FB, Hveem K, Illig T, Jarvelin MR, Kaprio J, Karpe F, Khaw KT, Kiemeney LA, Krude H, Laakso M, Lawlor DA, Metspalu A, Munroe PB, Ouwehand WH, Pedersen O, Penninx BW, Peters A, Pramstaller PP, Quertermous T, Reinehr T, Rissanen A, Rudan I, Samani NJ, Schwarz PE, Shuldiner AR, Spector TD, Tuomilehto J, Uda M, Uitterlinden A, Valle TT, Wabitsch M, Waeber G, Wareham NJ, Watkins H, Wilson JF, Wright AF, Zillikens MC, Chatterjee N, McCarroll SA, Purcell S, Schadt EE, Visscher PM, Assimes TL, Borecki IB, Deloukas P, Fox CS, Groop LC, Haritunians T, Hunter DJ, Kaplan RC, Mohlke KL, O'Connell JR, Peltonen L, Schlessinger D, Strachan DP, van Duijn CM, Wichmann HE, Frayling TM, Thorsteinsdottir U, Abecasis GR, Barroso I, Boehnke M, Stefansson K, North KE, McCarthy MI, Hirschhorn JN, Ingelsson E, Loos RJ, MAGIC. Procardis Consortium Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature Genetics. 2010b;42:937–948. doi: 10.1038/ng.686. [DOI] [PMC free article] [PubMed] [Google Scholar]
Su C, Johnson ME, Torres A, Thomas RM, Manduchi E, Sharma P, Wells AD. Human follicular helper T cell promoter connectomes reveal novel genes and regulatory elements at SLE GWAS loci. bioRxiv. 2019 doi: 10.1101/2019.12.20.885426. [DOI]
Su C, Johnson ME, Torres A, Thomas RM, Manduchi E, Sharma P, Mehra P, Le Coz C, Leonard ME, Lu S, Hodge KM, Chesi A, Pippin J, Romberg N, Grant SFA, Wells AD. Mapping effector genes at lupus GWAS loci using promoter Capture-C in follicular helper T cells. Nature Communications. 2020;11:3294. doi: 10.1038/s41467-020-17089-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nature Reviews Genetics. 2019;20:467–484. doi: 10.1038/s41576-019-0127-1. [DOI] [PubMed] [Google Scholar]
Turcot V, Lu Y, Highland HM, Schurmann C, Justice AE, Fine RS, Bradfield JP, Esko T, Giri A, Graff M, Guo X, Hendricks AE, Karaderi T, Lempradl A, Locke AE, Mahajan A, Marouli E, Sivapalaratnam S, Young KL, Alfred T, Feitosa MF, Masca NGD, Manning AK, Medina-Gomez C, Mudgal P, Ng MCY, Reiner AP, Vedantam S, Willems SM, Winkler TW, Abecasis G, Aben KK, Alam DS, Alharthi SE, Allison M, Amouyel P, Asselbergs FW, Auer PL, Balkau B, Bang LE, Barroso I, Bastarache L, Benn M, Bergmann S, Bielak LF, Blüher M, Boehnke M, Boeing H, Boerwinkle E, Böger CA, Bork-Jensen J, Bots ML, Bottinger EP, Bowden DW, Brandslund I, Breen G, Brilliant MH, Broer L, Brumat M, Burt AA, Butterworth AS, Campbell PT, Cappellani S, Carey DJ, Catamo E, Caulfield MJ, Chambers JC, Chasman DI, Chen YI, Chowdhury R, Christensen C, Chu AY, Cocca M, Collins FS, Cook JP, Corley J, Corominas Galbany J, Cox AJ, Crosslin DS, Cuellar-Partida G, D'Eustacchio A, Danesh J, Davies G, Bakker PIW, Groot MCH, Mutsert R, Deary IJ, Dedoussis G, Demerath EW, Heijer M, Hollander AI, Ruijter HM, Dennis JG, Denny JC, Di Angelantonio E, Drenos F, Du M, Dubé MP, Dunning AM, Easton DF, Edwards TL, Ellinghaus D, Ellinor PT, Elliott P, Evangelou E, Farmaki AE, Farooqi IS, Faul JD, Fauser S, Feng S, Ferrannini E, Ferrieres J, Florez JC, Ford I, Fornage M, Franco OH, Franke A, Franks PW, Friedrich N, Frikke-Schmidt R, Galesloot TE, Gan W, Gandin I, Gasparini P, Gibson J, Giedraitis V, Gjesing AP, Gordon-Larsen P, Gorski M, Grabe HJ, Grant SFA, Grarup N, Griffiths HL, Grove ML, Gudnason V, Gustafsson S, Haessler J, Hakonarson H, Hammerschlag AR, Hansen T, Harris KM, Harris TB, Hattersley AT, Have CT, Hayward C, He L, Heard-Costa NL, Heath AC, Heid IM, Helgeland Ø, Hernesniemi J, Hewitt AW, Holmen OL, Hovingh GK, Howson JMM, Hu Y, Huang PL, Huffman JE, Ikram MA, Ingelsson E, Jackson AU, Jansson JH, Jarvik GP, Jensen GB, Jia Y, Johansson S, Jørgensen ME, Jørgensen T, Jukema JW, Kahali B, Kahn RS, Kähönen M, Kamstrup PR, Kanoni S, Kaprio J, Karaleftheri M, Kardia SLR, Karpe F, Kathiresan S, Kee F, Kiemeney LA, Kim E, Kitajima H, Komulainen P, Kooner JS, Kooperberg C, Korhonen T, Kovacs P, Kuivaniemi H, Kutalik Z, Kuulasmaa K, Kuusisto J, Laakso M, Lakka TA, Lamparter D, Lange EM, Lange LA, Langenberg C, Larson EB, Lee NR, Lehtimäki T, Lewis CE, Li H, Li J, Li-Gao R, Lin H, Lin KH, Lin LA, Lin X, Lind L, Lindström J, Linneberg A, Liu CT, Liu DJ, Liu Y, Lo KS, Lophatananon A, Lotery AJ, Loukola A, Luan J, Lubitz SA, Lyytikäinen LP, Männistö S, Marenne G, Mazul AL, McCarthy MI, McKean-Cowdin R, Medland SE, Meidtner K, Milani L, Mistry V, Mitchell P, Mohlke KL, Moilanen L, Moitry M, Montgomery GW, Mook-Kanamori DO, Moore C, Mori TA, Morris AD, Morris AP, Müller-Nurasyid M, Munroe PB, Nalls MA, Narisu N, Nelson CP, Neville M, Nielsen SF, Nikus K, Njølstad PR, Nordestgaard BG, Nyholt DR, O'Connel JR, O'Donoghue ML, Olde Loohuis LM, Ophoff RA, Owen KR, Packard CJ, Padmanabhan S, Palmer CNA, Palmer ND, Pasterkamp G, Patel AP, Pattie A, Pedersen O, Peissig PL, Peloso GM, Pennell CE, Perola M, Perry JA, Perry JRB, Pers TH, Person TN, Peters A, Petersen ERB, Peyser PA, Pirie A, Polasek O, Polderman TJ, Puolijoki H, Raitakari OT, Rasheed A, Rauramaa R, Reilly DF, Renström F, Rheinberger M, Ridker PM, Rioux JD, Rivas MA, Roberts DJ, Robertson NR, Robino A, Rolandsson O, Rudan I, Ruth KS, Saleheen D, Salomaa V, Samani NJ, Sapkota Y, Sattar N, Schoen RE, Schreiner PJ, Schulze MB, Scott RA, Segura-Lepe MP, Shah SH, Sheu WH, Sim X, Slater AJ, Small KS, Smith AV, Southam L, Spector TD, Speliotes EK, Starr JM, Stefansson K, Steinthorsdottir V, Stirrups KE, Strauch K, Stringham HM, Stumvoll M, Sun L, Surendran P, Swift AJ, Tada H, Tansey KE, Tardif JC, Taylor KD, Teumer A, Thompson DJ, Thorleifsson G, Thorsteinsdottir U, Thuesen BH, Tönjes A, Tromp G, Trompet S, Tsafantakis E, Tuomilehto J, Tybjaerg-Hansen A, Tyrer JP, Uher R, Uitterlinden AG, Uusitupa M, Laan SW, Duijn CM, Leeuwen N, van Setten J, Vanhala M, Varbo A, Varga TV, Varma R, Velez Edwards DR, Vermeulen SH, Veronesi G, Vestergaard H, Vitart V, Vogt TF, Völker U, Vuckovic D, Wagenknecht LE, Walker M, Wallentin L, Wang F, Wang CA, Wang S, Wang Y, Ware EB, Wareham NJ, Warren HR, Waterworth DM, Wessel J, White HD, Willer CJ, Wilson JG, Witte DR, Wood AR, Wu Y, Yaghootkar H, Yao J, Yao P, Yerges-Armstrong LM, Young R, Zeggini E, Zhan X, Zhang W, Zhao JH, Zhao W, Zhao W, Zhou W, Zondervan KT, Rotter JI, Pospisilik JA, Rivadeneira F, Borecki IB, Deloukas P, Frayling TM, Lettre G, North KE, Lindgren CM, Hirschhorn JN, Loos RJF, CHD Exome+ Consortium. EPIC-CVD Consortium. ExomeBP Consortium. Global Lipids Genetic Consortium. GoT2D Genes Consortium. EPIC InterAct Consortium. INTERVAL Study. ReproGen Consortium. T2D-Genes Consortium. MAGIC Investigators. Understanding Society Scientific Group Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nature Genetics. 2018;50:26–41. doi: 10.1038/s41588-017-0011-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Visscher PM. Sizing up human height variation. Nature Genetics. 2008;40:489–490. doi: 10.1038/ng0508-489. [DOI] [PubMed] [Google Scholar]
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics. 2009;10:57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang X, Tucker NR, Rizki G, Mills R, Krijger PH, de Wit E, Subramanian V, Bartell E, Nguyen XX, Ye J, Leyton-Mange J, Dolmatova EV, van der Harst P, de Laat W, Ellinor PT, Newton-Cheh C, Milan DJ, Kellis M, Boyer LA. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife. 2016;5:e10557. doi: 10.7554/eLife.10557. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wingett S, Ewels P, Furlan-Magaril M, Nagano T, Schoenfelder S, Fraser P, Andrews S. HiCUP: pipeline for mapping and processing Hi-C data [version 1; peer review: 2 approved, 1 approved with reservations] F1000Research. 2015;4:1310. doi: 10.12688/f1000research.7334.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, Chu AY, Estrada K, Luan J, Kutalik Z, Amin N, Buchkovich ML, Croteau-Chonka DC, Day FR, Duan Y, Fall T, Fehrmann R, Ferreira T, Jackson AU, Karjalainen J, Lo KS, Locke AE, Mägi R, Mihailov E, Porcu E, Randall JC, Scherag A, Vinkhuyzen AA, Westra HJ, Winkler TW, Workalemahu T, Zhao JH, Absher D, Albrecht E, Anderson D, Baron J, Beekman M, Demirkan A, Ehret GB, Feenstra B, Feitosa MF, Fischer K, Fraser RM, Goel A, Gong J, Justice AE, Kanoni S, Kleber ME, Kristiansson K, Lim U, Lotay V, Lui JC, Mangino M, Mateo Leach I, Medina-Gomez C, Nalls MA, Nyholt DR, Palmer CD, Pasko D, Pechlivanis S, Prokopenko I, Ried JS, Ripke S, Shungin D, Stancáková A, Strawbridge RJ, Sung YJ, Tanaka T, Teumer A, Trompet S, van der Laan SW, van Setten J, Van Vliet-Ostaptchouk JV, Wang Z, Yengo L, Zhang W, Afzal U, Arnlöv J, Arscott GM, Bandinelli S, Barrett A, Bellis C, Bennett AJ, Berne C, Blüher M, Bolton JL, Böttcher Y, Boyd HA, Bruinenberg M, Buckley BM, Buyske S, Caspersen IH, Chines PS, Clarke R, Claudi-Boehm S, Cooper M, Daw EW, De Jong PA, Deelen J, Delgado G, Denny JC, Dhonukshe-Rutten R, Dimitriou M, Doney AS, Dörr M, Eklund N, Eury E, Folkersen L, Garcia ME, Geller F, Giedraitis V, Go AS, Grallert H, Grammer TB, Gräßler J, Grönberg H, de Groot LC, Groves CJ, Haessler J, Hall P, Haller T, Hallmans G, Hannemann A, Hartman CA, Hassinen M, Hayward C, Heard-Costa NL, Helmer Q, Hemani G, Henders AK, Hillege HL, Hlatky MA, Hoffmann W, Hoffmann P, Holmen O, Houwing-Duistermaat JJ, Illig T, Isaacs A, James AL, Jeff J, Johansen B, Johansson Å, Jolley J, Juliusdottir T, Junttila J, Kho AN, Kinnunen L, Klopp N, Kocher T, Kratzer W, Lichtner P, Lind L, Lindström J, Lobbens S, Lorentzon M, Lu Y, Lyssenko V, Magnusson PK, Mahajan A, Maillard M, McArdle WL, McKenzie CA, McLachlan S, McLaren PJ, Menni C, Merger S, Milani L, Moayyeri A, Monda KL, Morken MA, Müller G, Müller-Nurasyid M, Musk AW, Narisu N, Nauck M, Nolte IM, Nöthen MM, Oozageer L, Pilz S, Rayner NW, Renstrom F, Robertson NR, Rose LM, Roussel R, Sanna S, Scharnagl H, Scholtens S, Schumacher FR, Schunkert H, Scott RA, Sehmi J, Seufferlein T, Shi J, Silventoinen K, Smit JH, Smith AV, Smolonska J, Stanton AV, Stirrups K, Stott DJ, Stringham HM, Sundström J, Swertz MA, Syvänen AC, Tayo BO, Thorleifsson G, Tyrer JP, van Dijk S, van Schoor NM, van der Velde N, van Heemst D, van Oort FV, Vermeulen SH, Verweij N, Vonk JM, Waite LL, Waldenberger M, Wennauer R, Wilkens LR, Willenborg C, Wilsgaard T, Wojczynski MK, Wong A, Wright AF, Zhang Q, Arveiler D, Bakker SJ, Beilby J, Bergman RN, Bergmann S, Biffar R, Blangero J, Boomsma DI, Bornstein SR, Bovet P, Brambilla P, Brown MJ, Campbell H, Caulfield MJ, Chakravarti A, Collins R, Collins FS, Crawford DC, Cupples LA, Danesh J, de Faire U, den Ruijter HM, Erbel R, Erdmann J, Eriksson JG, Farrall M, Ferrannini E, Ferrières J, Ford I, Forouhi NG, Forrester T, Gansevoort RT, Gejman PV, Gieger C, Golay A, Gottesman O, Gudnason V, Gyllensten U, Haas DW, Hall AS, Harris TB, Hattersley AT, Heath AC, Hengstenberg C, Hicks AA, Hindorff LA, Hingorani AD, Hofman A, Hovingh GK, Humphries SE, Hunt SC, Hypponen E, Jacobs KB, Jarvelin MR, Jousilahti P, Jula AM, Kaprio J, Kastelein JJ, Kayser M, Kee F, Keinanen-Kiukaanniemi SM, Kiemeney LA, Kooner JS, Kooperberg C, Koskinen S, Kovacs P, Kraja AT, Kumari M, Kuusisto J, Lakka TA, Langenberg C, Le Marchand L, Lehtimäki T, Lupoli S, Madden PA, Männistö S, Manunta P, Marette A, Matise TC, McKnight B, Meitinger T, Moll FL, Montgomery GW, Morris AD, Morris AP, Murray JC, Nelis M, Ohlsson C, Oldehinkel AJ, Ong KK, Ouwehand WH, Pasterkamp G, Peters A, Pramstaller PP, Price JF, Qi L, Raitakari OT, Rankinen T, Rao DC, Rice TK, Ritchie M, Rudan I, Salomaa V, Samani NJ, Saramies J, Sarzynski MA, Schwarz PE, Sebert S, Sever P, Shuldiner AR, Sinisalo J, Steinthorsdottir V, Stolk RP, Tardif JC, Tönjes A, Tremblay A, Tremoli E, Virtamo J, Vohl MC, Amouyel P, Asselbergs FW, Assimes TL, Bochud M, Boehm BO, Boerwinkle E, Bottinger EP, Bouchard C, Cauchi S, Chambers JC, Chanock SJ, Cooper RS, de Bakker PI, Dedoussis G, Ferrucci L, Franks PW, Froguel P, Groop LC, Haiman CA, Hamsten A, Hayes MG, Hui J, Hunter DJ, Hveem K, Jukema JW, Kaplan RC, Kivimaki M, Kuh D, Laakso M, Liu Y, Martin NG, März W, Melbye M, Moebus S, Munroe PB, Njølstad I, Oostra BA, Palmer CN, Pedersen NL, Perola M, Pérusse L, Peters U, Powell JE, Power C, Quertermous T, Rauramaa R, Reinmaa E, Ridker PM, Rivadeneira F, Rotter JI, Saaristo TE, Saleheen D, Schlessinger D, Slagboom PE, Snieder H, Spector TD, Strauch K, Stumvoll M, Tuomilehto J, Uusitupa M, van der Harst P, Völzke H, Walker M, Wareham NJ, Watkins H, Wichmann HE, Wilson JF, Zanen P, Deloukas P, Heid IM, Lindgren CM, Mohlke KL, Speliotes EK, Thorsteinsdottir U, Barroso I, Fox CS, North KE, Strachan DP, Beckmann JS, Berndt SI, Boehnke M, Borecki IB, McCarthy MI, Metspalu A, Stefansson K, Uitterlinden AG, van Duijn CM, Franke L, Willer CJ, Price AL, Lettre G, Loos RJ, Weedon MN, Ingelsson E, O'Connell JR, Abecasis GR, Chasman DI, Goddard ME, Visscher PM, Hirschhorn JN, Frayling TM, Electronic Medical Records and Genomics (eMEMERGEGE) Consortium. MIGen Consortium. PAGEGE Consortium. LifeLines Cohort Study Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014;46:1173–1186. doi: 10.1038/ng.3097. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of european ancestry. Human Molecular Genetics. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. PNAS. 2012;109:1193–1198. doi: 10.1073/pnas.1119675109. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife. doi: 10.7554/eLife.62206.sa1

Decision letter

Editor: Mone Zaidi¹

Reviewed by: David Meyre²

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The study is well done and is likely to significantly impact our understanding gene associations in large GWAS datasets. Review critique was thoughtfully and thoroughly addressed. Notably, statistical correction for multiple testing of GWAS data requires increasingly large sample sizes to establish potential associations. This retrospective study used chromatin accessibility and direct contact with gene promoters as biological constraints. The application of such constraints on otherwise sub-significant GWAS signals was shown to reveal potentially true-positive loci without the requirement to increase sample size.

Decision letter after peer review:

Thank you for submitting your article "Biological constraints on GWAS SNPs at suggestive significance thresholds reveals true BMI loci" for consideration by eLife. Your article has been reviewed by two peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Clifford Rosen as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: David Meyre (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that substantial revisions are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

The authors sought to determine whether SNP's located in chromatin that physically contacts GWAS hits provide evidence that boost test scores from suggestive to significant. If successful, this approach might obviate the need for larger, more expensive GWAS studies. The strategy is to first identify relevant open chromatin sites, focusing on genes that were suggestive in early studies. Then in parallel ask which suggestive loci in these early studies became significant in later and larger studies. They claim that positive results support the argument that epigenetic evidence would have boosted scores and precluded the need for larger and more expensive studies. The manuscript has potential, but work is needed. Critical evidence is not provided, or provided in a way that makes critical reading a challenge.

Essential revisions:

1) This is not the first study to demonstrate that biological annotations combined to more relax thresholds of statistical associations from GWAS “rescue” true associations in obesity field. As an illustration, Meyre et al., 2009, rescued a modest stage 1 GWAS association signal for extreme obesity in the NPC1 gene using a candidate gene strategy, and the association was replicated in stage 2. The GWAS obesity hit in NPC1 was recently confirmed as a genome-wide significant signal for BMI in a large meta-analysis (Turcot et al., 2018). Wang et al. (Diabetes 2019) demonstrated a strong enrichment in positive associations with BMI for SNPs located in/near syndromic obesity genes. The authors may like to discuss these and other reports in literature.

2) A strengths and limitation section would add a lot in the Discussion.

3) The biological criteria (chromatin accessibility / direct contact with gene promoters) used in this study are very original, but to be more exhaustive additional biological criteria may have been used to select more SNPs (listed in Li and Meyre, Int J Obes 2013 in the “hypothesis-driven GWAS analysis” section). This may be acknowledged as a limitation of the study.

4) Tissues targeted in the study (adipocyte and hypothalamus) are extremely relevant in the context of genetic susceptibility to obesity. However, genetic association studies have also highlighted the important role of other tissues in energy balance (e.g. beta-cells, liver, muscle, stomach… see Locke et al., 2015, and Pigeyre et al., Clin Sci 2016). Having not explored all tissues potentially relevant for obesity may be acknowledged as another limitation of the study.

5) While Speliotes et al., 2010 and Yengo et al., 2018 GWAS for BMI have been performed in populations of European ancestry, the Locke et al., 2015 study included a multi-ethnic population. Did the authors analyze the GWAS summary statistics in the European population in the Locke et al., 2015 study. If yes, they may provide more details on what they did in the Materials and methods section. If no, I think using the multi-ethnic GWAS summary statistics may add some heterogeneity, and I recommend to focus the analysis in the European population in the Locke et al., 2015 study.

6) Organization of data and analysis. Curiously, the epigenetic data, evidence and analysis of the Capture-C and ATAC-seq data is in the Materials and methods rather than the Results. The combination of methods and data are somewhat incomplete perhaps because of their unusual location in the manuscript. These are key to the overall study and should be in the Results section. As I read the manuscript, my curiosity grew about the nature of the evidence for connecting flanking and regulatory SNPs with the previously reported target SNPs – what is the source of the data, what is the nature of the evidence, – only to learn that they are in the Materials and methods, albeit incompletely. What is the evidence for open chromatin at their targets and in the sentinel genes that were suggestive in early GWAS studies?

7) Study design, analysis, and presentation. A flow chart would help readers understand the sequence of tasks and help the authors with the organization of the results.

8) Bottom-line. The key question is what is the benefit of the proposed method? What's missing – a simple statement about how many suggestive loci became significant later, and how many didn't, in the regular course of work, and in parallel how many epigenetic hits at suggestive loci become significant later, and how many didn't? That is, what's the quantitative benefit of the new assay. Remarkably they don't show the numbers for this seemingly simple and central question. Surprisingly, this point is not even in the Abstract; the 4th and 5th sentences, which address the key results, do not make these points clearly.

9) Chromatin evidence. The evidence is based on two cell lines – MSC-derived adipocytes and ESC-derived hypothalamic-like neurons. Both are relevant cell types in vivo. But beyond that functional connection, no other rationale is provided, and no discussion is provided to critically evaluate the reliability of the evidence. Chromatin states are dynamic properties of cells both in vivo and in vitro. Physiological conditions and disease state can impact these profiles. How stable are these profiles in vitro, and how consistent are these features with their in vivo counter-parts? How do these profiles vary among individuals in health or disease? What about single-cell heterogeneity? Presumably, these factors would contribute noise in the assays, creating false positives, false negatives. All of this is fine, in principle; every approach has limitations. But remarkably little consideration to these issues, either in study design or in discussion of the results and analysis. These issues need careful, thorough and critical consideration.

10) The authors should discuss how the model might work. The authors are correctly concerned about linkage disequilibrium, with an emphasis on independent evidence (subsection “Variant-to-gene mapping pipeline”). But no data are formally presented; if these SNPs are independent (no LD), what is the argument that these data can be combined rather than additive? Are positive results for this variant-to-gene mapping simply a reflection of additive effects? Or do they argue that SNP-interaction (epistasis) is involved? The text is unclear about these issues.

11) Approach. These are important questions for any proposed method:

A) The authors use a frequentist approach. Perhaps they justify that approach versus Bayes, which the logic of their approach nicely fits.

B) The authors should be able to give clear basic statistics – number/percentage of hits that validate in general, and then when applying their approach, essentially they need to give the false positive and false negative rates, i.e. how many does their approach put forward that end up being negative, and how many do they miss with their approach (presumably loci with non-epigenetic mechanisms would obviously be missed). Obviously, they need to be clear that a comparable threshold is being applied so that readers can assess the relative performance of the proposed and standard methods. Importantly, are they preserving the type I error rate with their new method, and what is their power? Unfortunately, the Abstract, Results and Discussion are not clear on these points – these are the essence of the paper.

eLife. 2021 Jan 18;10:e62206. doi: 10.7554/eLife.62206.sa2

Author response

Essential revisions:

1) This is not the first study to demonstrate that biological annotations combined to more relax thresholds of statistical associations from GWAS “rescue” true associations in obesity field. As an illustration, Meyre et al., 2009, rescued a modest stage 1 GWAS association signal for extreme obesity in the NPC1 gene using a candidate gene strategy, and the association was replicated in stage 2. The GWAS obesity hit in NPC1 was recently confirmed as a genome-wide significant signal for BMI in a large meta-analysis (Turcot et al., 2018). Wang et al. (Diabetes 2019) demonstrated a strong enrichment in positive associations with BMI for SNPs located in/near syndromic obesity genes. The authors may like to discuss these and other reports in literature.

The reviewer raises a good point and speaks to the feasibility of the uniform genome wide approach we have employed. In response to these points, we now refer to previous work in this context of using biological annotations to salvage sub-threshold signals. We’ve taken the suggestion of including the key example of the Meyre et al., 2009 study as an introduction to this idea of sub-threshold SNP mining, along with citing the Turcot et al. paper which replicated this association. We have also added an additional study by Wang et al., 2016, as it relates greatly to this paper in its use of epigenomic data to mine biologically relevant sub-threshold signals from GWAS.

2) A strengths and limitation section would add a lot in the Discussion.

We agree that a strengths and limitation section is an excellent addition to the manuscript. While we previously included a number of points regarding the limitations of this study, we recognize that a more complete breakdown is warranted. We have added a far more comprehensive discussion concerning the classification rates, noting the relatively poor sensitivity while also highlighting the high precision and specificity. We believe this trade-off is an acceptable outcome given the positive predictions went on to largely achieve genome-wide significance. As per comment #4, we have also noted in our limitations the lack of a more diverse set of cell types, although adipocytes and hypothalamic neurons are prime candidate, it is clear there are other relevant cell types that could be used in such a study; indeed they could be the subject of a future study in this regard.

3) The biological criteria (chromatin accessibility / direct contact with gene promoters) used in this study are very original, but to be more exhaustive additional biological criteria may have been used to select more SNPs (listed in Li and Meyre, Int J Obes 2013 in the “hypothesis-driven GWAS analysis” section). This may be acknowledged as a limitation of the study.

We agree that more criteria and constraints could always be employed. Our intention was to observe if this unique, and relatively straightforward, combination of chromatin state and physical contacts between implicated proxy SNPs and gene promoters alone would suffice to salvage sub-threshold SNPs. We do not doubt that additional filters could improve predictions to a degree, especially at more relaxed p-value thresholds, so we have highlighted this point in our Discussion while presenting the possibility that additional criteria could be used in a future study to improve the quality of the predictions.

4) Tissues targeted in the study (adipocyte and hypothalamus) are extremely relevant in the context of genetic susceptibility to obesity. However, genetic association studies have also highlighted the important role of other tissues in energy balance (e.g. beta-cells, liver, muscle, stomach… see Locke et al., 2015, and Pigeyre et al., Clin Sci 2016). Having not explored all tissues potentially relevant for obesity may be acknowledged as another limitation of the study.

5) While Speliotes et al., 2010 and Yengo et al., 2018 GWAS for BMI have been performed in populations of European ancestry, the Locke et al., 2015 study included a multi-ethnic population. Did the authors analyze the GWAS summary statistics in the European population in the Locke et al., 2015 study. If yes, they may provide more details on what they did in the Materials and methods section. If no, I think using the multi-ethnic GWAS summary statistics may add some heterogeneity, and I recommend to focus the analysis in the European population in the Locke et al., 2015 study.

Thank you for catching this omission. We leveraged European ancestry results for all the studies analyzed. We have updated the Materials and methods accordingly to reflect this point.

6) Organization of data and analysis. Curiously, the epigenetic data, evidence and analysis of the Capture-C and ATAC-seq data is in the Materials and methods rather than the Results. The combination of methods and data are somewhat incomplete perhaps because of their unusual location in the manuscript. These are key to the overall study and should be in the Results section. As I read the manuscript, my curiosity grew about the nature of the evidence for connecting flanking and regulatory SNPs with the previously reported target SNPs – what is the source of the data, what is the nature of the evidence, – only to learn that they are in the Materials and methods, albeit incompletely. What is the evidence for open chromatin at their targets and in the sentinel genes that were suggestive in early GWAS studies?

Our intention with this form of presentation was to facilitate narrative flow and to dedicate largely the Results section to the process involving the retrospective analyses employed, while placing the processing of the epigenetic data in the Materials and methods section. However, given the reviewer’s comment, we agree that additional details should be provided in the Results section. We have therefore added additional information on these datasets in the Results section, while leaving the extensive details of how these data were generated and processed within the Materials and methods. The hypothalamic neuron related data were previously generated and analyzed in a separate study of ours – which we now cited. In addition, we include specific ATAC-seq peak numbers, as well as the interaction counts, for our three adipocyte libraries.

7) Study design, analysis, and presentation. A flow chart would help readers understand the sequence of tasks and help the authors with the organization of the results.

Given the complexity of these analyses, we strongly agree with this suggestion. To aid comprehension, we have now added a flowchart outlining the pipelines used (see Figure 3). The flowchart we include uses the BMI 2010-2015 data to exemplify our approach, and includes numbers to aid description of each computational step and how many SNPs and loci occur at each step of these analyses.

8) Bottom-line. The key question is what is the benefit of the proposed method? What's missing – a simple statement about how many suggestive loci became significant later, and how many didn't, in the regular course of work, and in parallel how many epigenetic hits at suggestive loci become significant later, and how many didn't? That is, what's the quantitative benefit of the new assay. Remarkably they don't show the numbers for this seemingly simple and central question. Surprisingly, this point is not even in the Abstract; the 4th and 5th sentences, which address the key results, do not make these points clearly.

We opted principally not to highlight these numbers outside of the various figures due to the many conditions that we were employed. This was to avoid the perception that we were somehow cherry-picking the best trait, year, and P-value bin given the variance in these numbers, potentially confusing or misleading the reader from the start. However, we do acknowledge that such numbers in text would be beneficial to keep the reader from consistently having to refer back to the figures and tables. Therefore, we have added such numbers at the end of each respective section using counts from within the consistent bin of 5 x 10^-8 ≤ P < 5 x 10^-5.

9) Chromatin evidence. The evidence is based on two cell lines – MSC-derived adipocytes and ESC-derived hypothalamic-like neurons. Both are relevant cell types in vivo. But beyond that functional connection, no other rationale is provided, and no discussion is provided to critically evaluate the reliability of the evidence. Chromatin states are dynamic properties of cells both in vivo and in vitro. Physiological conditions and disease state can impact these profiles. How stable are these profiles in vitro, and how consistent are these features with their in vivo counter-parts? How do these profiles vary among individuals in health or disease? What about single-cell heterogeneity? Presumably, these factors would contribute noise in the assays, creating false positives, false negatives. All of this is fine, in principle; every approach has limitations. But remarkably little consideration to these issues, either in study design or in discussion of the results and analysis. These issues need careful, thorough and critical consideration.

Our data are generated from three replicates and/or donors, and we focus on regions that are accessible across all replicates; indeed, we do note that there is high consistency between replicates. It should be noted that most chromatin differences across individuals are quantitative, not qualitative. A study by the Regev group (Gate et al., Nature Genetics 2018) showed that open chromatin regions in T cells from >100 individuals were highly consistent, but that 15% of OCR varied quantitatively in a manner associated with common genetic variation. Combined with the additional constraint of insisting on only SNPs coinciding with open chromatin via ATAC-seq, our Capture C based approach avoids a high degree of false positives; however, it is challenging to determine the false negative rate, given a lack of reference data. However, given the statistical significance of our observations, we strongly believe our data should be shared with the complex disease genetics field. Furthermore, we ensure we only work with cells that are derived from healthy individuals, given we are focused on characterizing the susceptibility conferred by these loci before onset of disease, i.e. we are investigating the genetic build up to disease pathogenesis, or in other words “before the car crash”. The data generated here are consistent with the datasets we describe in recent papers we published in Nature Communications for lupus and bone mineral density. Finally, we have not conducted chromatin conformation capture in a single cell setting, an approach which is still in its infancy and does represent some drawbacks, including important multiple amplification steps which can itself a high degree of false positives. We have highlighted these points in the Discussion.

10) The authors should discuss how the model might work. The authors are correctly concerned about linkage disequilibrium, with an emphasis on independent evidence (subsection “Variant-to-gene mapping pipeline”). But no data are formally presented; if these SNPs are independent (no LD), what is the argument that these data can be combined rather than additive? Are positive results for this variant-to-gene mapping simply a reflection of additive effects? Or do they argue that SNP-interaction (epistasis) is involved? The text is unclear about these issues.

We are less clear on the issues raised by the reviewer in this regard. We are in fact not concerned about LD, rather we are leveraging it to integrate with ATAC-seq data in order to shortlist putative causal variants, which are further shortlisted by the further integration of Capture C data. We are treating all the independent BMI and WHR loci as independent signals, and making inferences as such. Our study does not address or observe any form of epistasis.

11) Approach. These are important questions for any proposed method:

A) The authors use a frequentist approach. Perhaps they justify that approach versus Bayes, which the logic of their approach nicely fits.

We appreciate this suggestion. We have updated the statistical analysis with a Bayesian model to identify the probability that our constrained method outperforms no constraints. We note few changes to the results, though the situations where changes occurred resulted in >95% probability of our biological constraint outperforming no constraints. And overall, the changes do not affect our interpretation of these results.

B) The authors should be able to give clear basic statistics – number/percentage of hits that validate in general, and then when applying their approach, essentially they need to give the false positive and false negative rates, i.e. how many does their approach put forward that end up being negative, and how many do they miss with their approach (presumably loci with non-epigenetic mechanisms would obviously be missed). Obviously, they need to be clear that a comparable threshold is being applied so that readers can assess the relative performance of the proposed and standard methods. Importantly, are they preserving the type I error rate with their new method, and what is their power? Unfortunately, the Abstract, Results and Discussion are not clear on these points – these are the essence of the paper.

While we reported the sensitivity and specificity values in the previous version of the manuscript, we acknowledge that the false positive and false negative rates were not directly included. We did reference the general trend of these values, given their relationship to sensitivity and specificity, and the overall pattern of these rates across all analyses within the Results section entitled “Predictive power of negative control does not differ from the unconstrained set”, as well as in the Discussion; however, we acknowledge that such information relating to the preservation of the type I error rate was not included. To accommodate this comment, we have made our supplementary data file more comprehensive, which includes the classification metrics for each condition while also now including FPR and FNR. This file has been subsequently renamed Source data 1. Additionally, we now include a more direct discussion of these values and their trends within the Discussion section.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Hammond RK, Pahl MC, Su C, Cousminer DL, Leonard ME, Lu S, Doege CA, Wagley Y, Hodge KM, Lasconi C, Johnson ME, Terry NA, Ghanem LR, Brown CS, Pippin JA, Hankenson KD, Leibel RL, Chesi A, Wells AD, Grant SFA. 2020. MSC-Derived Adipose ATAC-seq and Promoter-Focused Capture-C. NCBI Gene Expression Omnibus. GSE164745
Pahl MC, Doege CA, Hodge KM, Littleton SH, Leonard ME, Lu S, Rausch R, Pippin JA, Bradfield JP, Hammond RK, Boehm K, Berkowitz RI, Lasconi C, Chesi A, Johnson ME, Wells AD, Voight BF, Leibel RL, Cousminer DL, Grant SFA. 2020. ESC-Derived Hypothalamic Neuron ATAC-seq and Promoter-Focused Capture-C. NCBI Gene Expression Omnibus. GSE164911
Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Allen HL, Lindgren CM, Luan J, Magi R. 2010. GWAS 2010 BMI Summary Statistics. Broad Institute. GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq
Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, Thorleifsson G, Zillikens MC, Speliotes EK, Magi R. 2010. GWAS 2010 WHRadjBMI Summary Statistics. Broad Institute. GIANT_WHRadjBMI_Heid2010_publicrelease_HapMapCeuFreq
Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, Croteau-Chonka DC, Esko T. 2015. GWAS 2015 BMI Summary Statistics. Broad Institute. SNP_gwas_mc_merge_nogc.tbl.uniq
Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Magi R, Strawbridge R, Pers TH, Fischer K, Justice AE, Workalemahu T, Wu JM. 2015. GWAS 2015 WHRadjBMI Summary Statistics. Broad Institute. GIANT_2015_WHRadjBMI_COMBINED_EUR
Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J. 2018. GWAS 2018 BMI Summary Statistics. Broad Institute. Meta-analysis_Locke_et_al%2BUKBiobank_2018_UPDATED
Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium 2018. GWAS 2018 WHRadjBMI Summary Statistics. Zenodo. [DOI] [PMC free article] [PubMed]

Supplementary Materials

Figure 2—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2015 in each cell type.

elife-62206-fig2-data1.csv^{(125B, csv)}

Figure 2—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2015.

elife-62206-fig2-data2.csv^{(101B, csv)}

Figure 4—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig4-data1.csv^{(129B, csv)}

Figure 4—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig4-data2.csv^{(103B, csv)}

Figure 5—source data 1. Number of 2015 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig5-data1.csv^{(139B, csv)}

Figure 5—source data 2. Number of 2015 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig5-data2.csv^{(104B, csv)}

Figure 6—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig6-data1.csv^{(128B, csv)}

Figure 6—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig6-data2.csv^{(100B, csv)}

Figure 6—figure supplement 2—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2015 in each cell type.

elife-62206-fig6-figsupp2-data1.csv^{(127B, csv)}

Figure 6—figure supplement 2—source data 2. Number of 2010 loci identified with no constraint and the number that achieved GWS by 2015.

elife-62206-fig6-figsupp2-data2.csv^{(99B, csv)}

Figure 6—figure supplement 3—source data 1. Number of 2015 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig6-figsupp3-data1.csv^{(95B, csv)}

Figure 6—figure supplement 3—source data 2. Number of 2015 loci identified with no constraint and the number that achieved GWS by 2018.

elife-62206-fig6-figsupp3-data2.csv^{(104B, csv)}

Figure 7—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig7-data1.csv^{(152B, csv)}

Figure 8—source data 1. Number of 2010 loci identified by constrained method and the number that achieved GWS by 2018 in each cell type.

elife-62206-fig8-data1.csv^{(142B, csv)}

Source data 1. Classification metrics for each condition.

elife-62206-data1.zip^{(95.8KB, zip)}

Transparent reporting form

elife-62206-transrepform.docx^{(66.5KB, docx)}

Data Availability Statement

The following datasets were generated:

The following previously published datasets were used:

Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J. 2018. GWAS 2018 BMI Summary Statistics. Broad Institute. Meta-analysis_Locke_et_al%2BUKBiobank_2018_UPDATED

Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium 2018. GWAS 2018 WHRadjBMI Summary Statistics. Zenodo.

[bib1] Arnold M, Raffler J, Pfeufer A, Suhre K, Kastenmüller G. SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics. 2015;31:1334–1336. doi: 10.1093/bioinformatics/btu779. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Cairns J, Freire-Pritchett P, Wingett SW, Várnai C, Dimond A, Plagnol V, Zerbino D, Schoenfelder S, Javierre BM, Osborne C, Fraser P, Spivakov M. CHiCAGO: robust detection of DNA looping interactions in capture Hi-C data. Genome Biology. 2016;17:127. doi: 10.1186/s13059-016-0992-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Çalışkan M, Manduchi E, Rao HS, Segert JA, Beltrame MH, Trizzino M, Park Y, Baker SW, Chesi A, Johnson ME, Hodge KM, Leonard ME, Loza B, Xin D, Berrido AM, Hand NJ, Bauer RC, Wells AD, Olthoff KM, Shaked A, Rader DJ, Grant SFA, Brown CD. Genetic and epigenetic fine mapping of complex trait associated loci in the human liver. The American Journal of Human Genetics. 2019;105:89–107. doi: 10.1016/j.ajhg.2019.05.010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Chesi A, Wagley Y, Johnson ME, Manduchi E, Su C, Lu S, Leonard ME, Hodge KM, Pippin JA, Hankenson KD, Wells AD, Grant SFA. Genome-scale capture C promoter interactions implicate effector genes at GWAS loci for bone mineral density. Nature Communications. 2019;10:1260. doi: 10.1038/s41467-019-09302-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Cousminer DL, Wagley Y, Pippin J, Elhakeem A, Way G. Genome-wide association study implicates novel loci and reveals candidate effector genes for longitudinal pediatric bone accrual through variant-to-gene mapping. medRxiv. 2020 doi: 10.1101/2020.02.17.20024133. [DOI] [PMC free article] [PubMed]

[bib8] Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, De Gobbi M, Taylor S, Gibbons R, Higgs DR. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nature Genetics. 2014;46:205–212. doi: 10.1038/ng.2871. [DOI] [PubMed] [Google Scholar]

[bib9] Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Várnai C, Thiecke MJ, Burden F, Farrow S, Cutler AJ, Rehnström K, Downes K, Grassi L, Kostadima M, Freire-Pritchett P, Wang F, Stunnenberg HG, Todd JA, Zerbino DR, Stegle O, Ouwehand WH, Frontini M, Wallace C, Spivakov M, Fraser P, BLUEPRINT Consortium Lineage-Specific genome architecture links enhancers and Non-coding disease variants to target gene promoters. Cell. 2016;167:1369–1384. doi: 10.1016/j.cell.2016.09.037. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Meyre D, Delplanque J, Chèvre JC, Lecoeur C, Lobbens S, Gallina S, Durand E, Vatin V, Degraeve F, Proença C, Gaget S, Körner A, Kovacs P, Kiess W, Tichet J, Marre M, Hartikainen AL, Horber F, Potoczna N, Hercberg S, Levy-Marchal C, Pattou F, Heude B, Tauber M, McCarthy MI, Blakemore AI, Montpetit A, Polychronakos C, Weill J, Coin LJ, Asher J, Elliott P, Järvelin MR, Visvikis-Siest S, Balkau B, Sladek R, Balding D, Walley A, Dina C, Froguel P. Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in european populations. Nature Genetics. 2009;41:157–159. doi: 10.1038/ng.301. [DOI] [PubMed] [Google Scholar]

[bib13] Pahl MC, Doege CA, Hodge KM, Littleton SH, Leonard ME, Lu S, Grant SFA. Cis-regulatory architecture of human ESC-derived hypothalamic neuron differentiation aids in variant-to-gene mapping of relevant complex traits. bioRxiv. 2020 doi: 10.1101/2020.07.06.146951. [DOI] [PMC free article] [PubMed]

[bib14] Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J, Yengo L, Ferreira T, Marouli E, Ji Y, Yang J, Jones S, Beaumont R, Croteau-Chonka DC, Winkler TW, Hattersley AT, Loos RJF, Hirschhorn JN, Visscher PM, Frayling TM, Yaghootkar H, Lindgren CM, GIANT Consortium Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Human Molecular Genetics. 2019;28:166–174. doi: 10.1093/hmg/ddy327. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] R Development Core Team . Vienna, Austria: R Foundation for Statistical Computing; 2019. http://www.R-project.org [Google Scholar]

[bib17] Smemo S, Tena JJ, Kim KH, Gamazon ER, Sakabe NJ, Gómez-Marín C, Aneas I, Credidio FL, Sobreira DR, Wasserman NF, Lee JH, Puviindran V, Tam D, Shen M, Son JE, Vakili NA, Sung HK, Naranjo S, Acemel RD, Manzanares M, Nagy A, Cox NJ, Hui CC, Gomez-Skarmeta JL, Nóbrega MA. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507:371–375. doi: 10.1038/nature13138. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Su C, Johnson ME, Torres A, Thomas RM, Manduchi E, Sharma P, Wells AD. Human follicular helper T cell promoter connectomes reveal novel genes and regulatory elements at SLE GWAS loci. bioRxiv. 2019 doi: 10.1101/2019.12.20.885426. [DOI]

[bib20] Su C, Johnson ME, Torres A, Thomas RM, Manduchi E, Sharma P, Mehra P, Le Coz C, Leonard ME, Lu S, Hodge KM, Chesi A, Pippin J, Romberg N, Grant SFA, Wells AD. Mapping effector genes at lupus GWAS loci using promoter Capture-C in follicular helper T cells. Nature Communications. 2020;11:3294. doi: 10.1038/s41467-020-17089-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nature Reviews Genetics. 2019;20:467–484. doi: 10.1038/s41576-019-0127-1. [DOI] [PubMed] [Google Scholar]

[bib23] Visscher PM. Sizing up human height variation. Nature Genetics. 2008;40:489–490. doi: 10.1038/ng0508-489. [DOI] [PubMed] [Google Scholar]

[bib24] Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics. 2009;10:57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Wang X, Tucker NR, Rizki G, Mills R, Krijger PH, de Wit E, Subramanian V, Bartell E, Nguyen XX, Ye J, Leyton-Mange J, Dolmatova EV, van der Harst P, de Laat W, Ellinor PT, Newton-Cheh C, Milan DJ, Kellis M, Boyer LA. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife. 2016;5:e10557. doi: 10.7554/eLife.10557. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Wingett S, Ewels P, Furlan-Magaril M, Nagano T, Schoenfelder S, Fraser P, Andrews S. HiCUP: pipeline for mapping and processing Hi-C data [version 1; peer review: 2 approved, 1 approved with reservations] F1000Research. 2015;4:1310. doi: 10.12688/f1000research.7334.1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, Frayling TM, Hirschhorn J, Yang J, Visscher PM, GIANT Consortium Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of european ancestry. Human Molecular Genetics. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. PNAS. 2012;109:1193–1198. doi: 10.1073/pnas.1119675109. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Biological constraints on GWAS SNPs at suggestive significance thresholds reveal additional BMI loci

Reza K Hammond

Matthew C Pahl

Chun Su

Diana L Cousminer

Michelle E Leonard

Sumei Lu

Claudia A Doege

Yadav Wagley

Kenyaita M Hodge

Chiara Lasconi

Matthew E Johnson

James A Pippin

Kurt D Hankenson

Rudolph L Leibel

Alessandra Chesi

Andrew D Wells

Struan FA Grant

Roles

Abstract

Introduction

Results

Suggestive proxy SNP identification

ATAC-seq and promoter-focused capture C

Variant-to-gene mapping reveals suggestive 2010 BMI loci that subsequently achieved genome-wide significance in 2015

Table 1. 2015 BMI loci that were implicated with our method in the 2010 data set.

Figure 1. 2015 BMI Manhattan plot depicting loci identified with 2010 salvaged SNPs 2015 BMI loci identifiable with 2010 salvaged SNPs.

Figure 2. Independent 2010 BMI SNPs identified via variant-to-gene mapping that go on to reach genome-wide significance by 2015, as well as the set of unconstrained 2010 suggestive SNPs that achieve genome-wide significance by 2015.

Figure 3. Flowchart of the pipeline describing each computational step.

BMI GWAS variant-to-gene mapping constraints between 2010 and 2018

Table 2. 2018 BMI loci that were identified using 2010 salvaged.

Figure 4. Independent 2010.

Figure 4—figure supplement 1. Empirical distribution of positive predictive values of suggestive 2010 BMI SNPs achieving GWS by 2018.

BMI GWAS variant-to-gene mapping constraints between 2015 and 2018

Figure 5. Independent 2015.

Constraining WHRadjBMI GWAS reports: 2010–2018

Figure 6. Independent 2010.

Figure 6—figure supplement 1. Empirical distribution of positive predictive values of suggestive 2010 WHRadjBMI SNPs achieving GWS by 2018.

Figure 6—figure supplement 2. Independent 2010.

Figure 6—figure supplement 3. Independent 2015.

Predictive power of negative control does not differ from the unconstrained set

Figure 7. Independent 2010.

Figure 8. Independent 2010.

Discussion

Materials and methods

ATAC-seq library generation and peak calls

Cell fixation for chromatin capture

Capture C

Analysis of promoter-focused capture C data

Variant-to-gene mapping pipeline

Retrospective analysis

Funding Statement

Contributor Information

Funding Information

Additional information

Competing interests

Author contributions

Additional files

Data availability

References

Decision letter

Roles

Author response

Associated Data

Data Citations

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases