Abstract
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., −0.37±0.09 for low-frequency positive-LD 0–100bp SNP pairs) that decayed with distance (e.g., −0.07±0.01 for low-frequency positive-LD 1–10kb), varied with allele frequency (e.g., −0.15±0.04 for common positive-LD 0–100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0–100bp) (because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results). We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., −0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb (due to alternative splicing)) and −0.32±0.04 for low-frequency positive-LD H3K27ac 0–1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs)—even though these quantities are widely assumed to be equal. We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
Introduction
Inferring the genome-wide distribution of causal genetic effects has yielded rich insights into the polygenic architecture of human diseases and complex traits1–21. However, virtually all published studies of disease and complex trait architectures assume that nearby SNPs have independent causal effects on disease1–21—an assumption that warrants careful scrutiny. Correlated effects may arise due to natural selection22–28, e.g., due to linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have a reduced aggregate effect on fitness and escape negative selection23,28; correlated effects have also been reported in studies of rare coding variants (concordant effects29–34) and model organisms (concordant35 or opposite35,36 effects). Despite these findings, SNP-pair effect correlations have yet to be systematically investigated in genome-wide data.
Here, we propose a method, linkage disequilibrium SNP-pair effect correlation regression (LDSPEC), to estimate correlations of standardized derived allele causal disease effect sizes for pairs of proximal SNPs, depending on their minor allele frequency (MAF), LD, and functional annotations. Roughly, LDSPEC determines that a SNP-pair annotation has positive (resp. negative) correlation of causal effect sizes (of derived alleles) if SNPs with concordant signed LD to pairs of SNPs in the SNP-pair annotation have higher (resp. lower) χ2 statistics than SNPs with discordant signed LD. We performed extensive simulations with real genotypes to show that LDSPEC is well-calibrated in null simulations and produces attenuated estimates of nonzero SNP-pair effect correlations in causal simulations. We applied LDSPEC to 70 UK Biobank diseases and complex traits37 (N=306K), estimating effect correlations for common (MAF≥5%) positive-LD, common negative-LD, low-frequency (0.5%≤MAF<5%) positive-LD, and low-frequency negative-LD SNP pairs depending on their functional annotations. We note that because we consider derived alleles, positive-LD and negative-LD SNP pairs differ in a way that is not arbitrary and may yield very different results. We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection38,39.
We note that this study expands upon an unpublished preprint40, which contained key ideas and derivations and detected SNP-pair effect correlations for extremely-short-range SNP pairs (0–100bp) that varied with LD; here, we introduce improved methodology, analyze functional SNP-pair annotations, identify SNP-pair effect correlations at longer distances, and perform evolutionary forward simulations to interpret our findings.
Results
Overview of methods
LDSPEC estimates the signed correlation of standardized derived allele causal disease effect sizes across SNP pairs in a given SNP-pair annotation, e.g., set of 0–100bp SNP pairs. The method adopts and improves upon key ideas and derivations from a recent preprint40 (see Discussion). In detail, for a SNP-pair annotation defined by a set of SNP pairs , LDSPEC estimates the SNP-pair effect correlation
(1) |
where denote standardized derived allele causal disease effect sizes of SNPs (i.e., number of standard deviations increase in phenotype per 1 standard deviation increase in genotype) under a random-effects model, denote expected per-SNP heritabilities, and denotes expected per-SNP-pair effect covariance. We note that previous work has broadly assumed that causal effects are independent1–21 (implying ), but LDSPEC challenges this assumption. To assess correlations specific to the SNP-pair annotation, LDSPEC also estimates the excess SNP-pair effect correlation , defined as the difference between and its expected value across distance-matched SNP pairs. To assess the impact of SNP-pair effect correlations on SNP-heritability, LDSPEC separately estimates genome-wide SNP-heritability and the sum of causal effect size variances across SNPs (); the two quantities may be different when causal effects are not independent (as assumed in previous work1–21).
LDSPEC relies on the fact that the association statistic for a given SNP includes the effects of all SNPs tagged by that SNP4,41. Methods for analyzing single-SNP annotations5 determine that a single-SNP annotation is enriched for heritability if SNPs with higher LD to SNPs in the single-SNP annotation have higher statistics than SNPs with low LD to SNPs in the single-SNP annotation. LDSPEC further determines that a SNP-pair annotation has a positive (resp. negative) correlation of causal effect sizes (of derived alleles) if SNPs with concordant signed LD to SNP pairs in the SNP-pair annotation have higher (resp. lower) statistics than SNPs with discordant signed LD to SNP pairs in the SNP-pair annotation.
In detail, under a polygenic model1, the expected of SNP can be written as
(2) |
where is the GWAS sample size, is the LD score of SNP and single-SNP annotation (ref.5,8) (defined as , where is the value of single-SNP annotation for SNP and is the signed LD between SNPs denotes the contribution of single-SNP annotation to per-SNP heritability (ref.5,8), is the directional LD score of SNP and SNP-pair annotation (defined as where is the set of SNP pairs in SNP-pair annotation denotes the contribution of SNP-pair annotation to per-SNP-pair effect covariance, and denotes disease/trait SNP-heritability. The last term is different from 1 in the analogous LDSC equation4,5 because LDSC uses an external LD reference panel while our method uses in-sample LD to avoid challenges that arise from the use of inaccurate LD reference panels11,42,43 (Methods); we also use a larger 10Mb LD window compared to the 1Mb window commonly used in LDSC4,5. Equation (2) allows us to estimate and via multivariate linear regression of on and and we can further estimate quantities such as and based on estimates of and . We employ regression weights to account for dependency between regression SNPs and heteroskedasticity, and estimate standard errors via genomic block-jackknife, analogous to previous work4,5. Further details are provided in the Methods section and Supplementary Note; we have publicly released open-source software implementing LDSPEC (see Code availability).
We applied LDSPEC to 70 well-powered diseases and complex traits from the UK Biobank37 (z-score >5 for nonzero SNP-heritability; average N=305,646 unrelated “in.white.British.ancestry.subset” individuals, a previously-defined subset of UK Biobank participants who self-reported White British ethnicity and had very similar genetic ancestry based on principal component analysis), including 29 independent diseases/traits ( average N=298,430) (Supplementary Table 1; see Data availability). We considered 14,820,648 imputed SNPs (version “imp_v3” from ref.37, MAF ≥ 0.1%, INFO score44 ≥ 0.6, ref.11,43). We analyzed 165 single-SNP annotations, including 163 baseline-LF annotations11 and 2 annotations for deleterious coding SNPs (common and low-frequency SNPs with CADD pathogenicity score45 >20, resp.) (Supplementary Tables 2,3). We refer to the heritability model defined by the 165 single-SNP annotations as the “baseline” model. We further constructed a “baseline-SP” model including, in addition to the 165 single-SNP annotations, 136 SNP-pair annotations obtained by stratifying 34 main SNP-pair annotations by MAF (common or low-frequency) and LD (positive or negative): 3 proximity-based annotations (0–100bp, 100bp-1kb, 1–10kb), 5 gene-based annotations (e.g., same-gene promoter SNP pairs), 7 functional 0–100bp annotations, and 19 functional 0–1kb annotations (e.g., pairs of H3K27ac SNPs with distances 0–100bp) (Table 1, Supplementary Tables 4,5). The functional SNP-pair annotations were constructed from 38 binary baseline single-SNP functional annotations, subject to a requirement to yield at least 1 million SNP pairs (this requirement is more difficult to satisfy for 0–100bp annotations, implying a smaller number of functional 0–100bp annotations retained). We excluded SNP-pair annotations involving one common SNP and one low-frequency SNP, because these SNP pairs had low levels of LD, limiting the informativeness of directional LD scores (Methods). We have publicly released all SNP annotations and LDSPEC output from this study (see Data availability).
Table 1.
Number of SNP pairs | Average distance | |
---|---|---|
| ||
Proximal 0–100bp | 3.5M | 47bp |
Proximal 100bp-1kb | 27M | 546bp |
Proximal 1–10kb | 253M | 5.4kb |
| ||
Same-exon | 0.81M | 3.6kb |
Same-gene exonic | 1.8M | 53kb |
Same-gene promoter | 1.2M | 46kb |
Same-protein-domain | 0.19M | 47kb |
Same-gene | 1889M | 390kb |
| ||
H3K27ac-100 | 1.4M | 46bp |
H3K27ac (PGC2)-100 | 0.92M | 46bp |
H3K4me1–100 | 1.4M | 46bp |
Intron-100 | 1.3M | 46bp |
Repressed-100 | 1.6M | 45bp |
Super enhancer-100 | 0.61M | 46bp |
Transcribed-100 | 1.1M | 44bp |
| ||
DGF-1k | 1.2M | 387bp |
DHS-1k | 1.9M | 383bp |
DHS peaks-1k | 0.91M | 366bp |
Enhancer-1k | 0.73M | 418bp |
Fetal DHS-1k | 0.84M | 349bp |
H3K27ac-1k | 11M | 483bp |
H3K27ac (PGC2)-1k | 7.2M | 469bp |
H3K4me1–1k | 10M | 466bp |
H3K4me1 peaks-1k | 2.1M | 432bp |
H3K4me3–1k | 2.7M | 436bp |
H3K9ac-1k | 2.5M | 441bp |
Intron-1k | 11M | 487bp |
Promoter-1k | 1.3M | 469bp |
Repressed-1k | 10M | 466bp |
Super enhancer-1k | 5.3M | 487bp |
TFBS-1k | 1.9M | 388bp |
Transcribed-1k | 6.8M | 458bp |
Super enhancer (Vahedi)-1k | 0.59M | 485bp |
Typical enhancer-1k | 0.65M | 475bp |
Simulations assessing calibration and power
We performed null simulations (heritable traits with zero SNP-pair effect correlations) and causal simulations (heritable traits with nonzero SNP-pair effect correlations). We used the same UK Biobank genotype data (N=337,426) and restricted to chromosome 1 SNPs (M=1,161,341) for computational tractability (analogous to ref.8,11). In our primary simulations, SCV was set to 0.5 (similar to previous work11; SNP-heritability was slightly different from SCV when SNP-pair effect correlations were nonzero), causal SNP proportion was set to 0.2 (similar to previous work11), LD-dependent and MAF-dependent genetic architectures were specified based on previous work8,11, and functional enrichment was simulated by assigning a positive to the common Super enhancer (Hnisz) single-SNP annotation; other settings were also evaluated. True simulated values of nonzero SNP-pair effect correlations for SNP-pair annotations in causal simulations are described below, and generative model parameters for all simulations are provided in Supplementary Table 7. Results were obtained by running LDSPEC with the baseline-SP model. We assessed bias (in null and causal simulations) and power (in causal simulations) using mean estimates and empirical SEs across 50 simulation replicates (empirical SE = empirical , and assessed calibration (in null and causal simulations) by comparing average jackknife SE (across 50 simulation replicates) to empirical SD; we note that aggregating 50 simulation replicates reduces the empirical SE, analogous to meta-analyzing 29 independent diseases/traits in real data. Further details are provided in the Methods section.
We first performed null simulations, simulating heritable traits with functional enrichment but zero SNP-pair effect correlations for all SNP-pair annotations. We reached 6 main conclusions. First, estimates of SNP-pair effect correlation () were approximately unbiased, with no significant bias for all 136 SNP-pair annotations (P>0.05/136) (Figure 1a and Supplementary Table 8); furthermore, we did not observe a trend towards negative for positive-LD SNP pairs or positive for negative-LD SNP pairs. Second, estimates of excess SNP-pair effect correlation were approximately unbiased, with no significant bias for all 136 SNP-pair annotations (P>0.05/136) (Supplementary Figure 1a). Third, estimates of the contribution of a SNP-pair annotation to per-SNP-pair effect covariance () were approximately unbiased, with no significant bias for all 136 SNP-pair annotations (P>0.05/136) (Supplementary Figure 1a). Fourth, estimates of the contribution of a single-SNP annotation to per-SNP heritability (), total SNP-heritability, and heritability enrichment were approximately unbiased, analogous to previous work5,8,11 (Supplementary Figure 2a). Fifth, distinct from estimates of total SNP-heritability, estimates of the sum of causal effect size variances across all SNPs (total SCV), as well as total heritability shrinkage (total SCV divided by total SNP-heritability) were approximately unbiased (Supplementary Figure 2a). Sixth, jackknife standard errors for all quantities were well-calibrated (Supplementary Figure 3a).
We next performed causal simulations, simulating heritable traits with functional enrichment and nonzero SNP-pair effect correlation for a subset of SNP-pair annotations. To mimic results in real data (see below), we specified negative contributions to per-SNP-pair effect covariance () for 6 positive-LD SNP-pair annotations (common and low-frequency 0–100bp, 100bp-1kb, and super-enhancer 0–1kb; zero for the 6 corresponding negative-LD SNP-pair annotations; Supplementary Table 7); other SNP-pair annotations that overlap the 6 causal SNP-pair annotations are expected to have nonzero SNP-pair effect correlation (). We reached 6 main conclusions. First, estimates of SNP-pair effect correlation () were significantly negative for all 3 causal common positive-LD SNP-pair annotations (P<0.05/136), non-significantly negative for all 3 causal low-frequency positive-LD SNP-pair annotations (P>0.05/136), and attenuated towards 0 for all 6 causal SNP-pair annotations (Figure 1b and Supplementary Table 9); estimates were non-significant for the 6 corresponding negative-LD SNP-pair annotations (P>0.05/136), consistent with their zero simulated . 10 of the remaining 62 non-causal positive-LD SNP-pair annotations had significantly negative estimates (P<0.05/136), as expected due to overlap with the 6 causal positive-LD SNP-pair annotations (Supplementary Figure 1b). 1 negative-LD SNP-pair annotations had a slightly but significantly positive estimate (common negative-LD 1–10kb, 0.016±0.004) (P<0.05/136) (Supplementary Figure 1b), suggesting a slight bias (perhaps due to collinearity of directional LD scores between SNP-pair annotations (Supplementary Table 6)); we believe that this should not impact our interpretation of results in real data, as the magnitude of estimates was much larger in real data (see below) and LDSPEC produced unbiased estimates in null simulations. Second, estimates of excess SNP-pair effect correlation were significantly negative for the two SNP-pair annotations that were simulated to have negative (common and low-frequency positive-LD super-enhancer 0–1kb) (P<0.05/136) (Supplementary Figure 1b). 4 other positive-LD functional SNP-pair annotations also had significantly negative estimates (P<0.05/136), as expected due to overlap with the causal SNP-pair annotations (Supplementary Figure 1b). 1 negative-LD functional SNP-pair annotation had a slightly but significantly positive estimate (common negative-LD intron 0–1kb, 0.061±0.014) (P<0.05/136) (Supplementary Figure 1b), suggesting a slight bias (perhaps due to collinearity of directional LD scores between SNP-pair annotations (Supplementary Table 6), analogous to the estimates above); we believe that this should not impact our interpretation of results in real data, as we detected substantially more significantly positive estimates for negative-LD functional SNP-pair annotations with larger magnitudes in real data (see below) and LDSPEC produced unbiased estimates in null simulations. Third, estimates of the contribution of a SNP-pair annotation to per-SNP-pair effect covariance () were significantly negative for 1 of 6 causal SNP-pair annotations (common Super enhancer 0–1kb) (P<0.05/136) but non-significant and attenuated towards 0 for the other 5 (P>0.05/136) (Supplementary Figure 1b). 4 of the 130 non-causal SNP-pair annotations also had significantly nonzero estimates (low-frequency positive-LD 1–10kb, common negative-LD 1–10kb, common positive-LD intron 0–1kb, common negative-LD intron 0–1kb) (P<0.05/136) (Supplementary Figure 1b) (perhaps due to the collinearity of directional LD scores between SNP-pair annotations (Supplementary Table 6), analogous to the estimates above); we believe that this should not impact our interpretation of results in real data, as analyses of real data primarily focused on estimates (see below) and LDSPEC produced unbiased estimates in null simulations. Fourth, estimates of the contribution of a single-SNP annotation to per-SNP heritability () were attenuated towards 0 (7.4×10−7 ±4.3×10−8, true value 1.9×10−6 for the common Super enhancer (Hnisz) single-SNP annotation), analogous to the attenuated estimates (running LDSPEC or S-LDSC5 using the baseline model without SNP-pair annotations produced more attenuated estimates of 5.2×10−7 ±2.8×10−8 and 5.0×10−7 ±2.5×10−8, respectively, suggesting that modeling SNP-pair annotations could partially mitigate the attenuation in these simulations; Supplementary Figure 2); estimates of total SNP-heritability and heritability enrichment were approximately unbiased, analogous to null simulations (Supplementary Figure 2b, Supplementary Table 9). Fifth, distinct from estimates of total SNP-heritability, estimates of total heritability shrinkage (total SCV divided by total SNP-heritability) were significantly smaller than 1 but attenuated towards 1 (0.80±0.01, true value 0.56), consistent with the attenuation of estimates (Supplementary Figure 2b, Supplementary Table 9). Sixth, jackknife standard errors for all quantities were well-calibrated, analogous to null simulations (Supplementary Figure 3b).
We performed 5 secondary analyses. First, we performed null and causal simulations at a lower value of SCV (0.2 instead of 0.5). Analogous to our primary simulations, LDSPEC produced approximately unbiased estimates of and in null simulations, and produced significantly negative but attenuated estimates of , and for a subset of causal SNP-pair annotations in causal simulations (slightly biased estimates of for other SNP-pair annotations) (Supplementary Figure 4). Second, we performed null and causal simulations at a lower value of causal SNP proportion (0.1 instead of 0.2). Analogous to our primary simulations, LDSPEC produced approximately unbiased estimates of and in null simulations (though estimates of were slightly biased), and produced significantly negative but attenuated estimates of and for a subset of causal SNP-pair annotations in causal simulations (Supplementary Figure 5). Third, we performed causal simulations where we specified negative values for both the 6 causal positive-LD SNP-pair annotations (as in primary simulations) and the 6 corresponding negative-LD SNP-pair annotations (vs. zero in primary simulations). Analogous to our primary causal simulations, LDSPEC produced significantly negative and slightly attenuated estimates of and for a subset of causal SNP-pair annotations (with slightly biased estimates of and for other SNP-pair annotations); the estimates were less attenuated, suggesting that LDSPEC was more effective when the positive-LD and negative-LD strata of the same SNP-pair annotation had the same (Supplementary Figure 6a). Fourth, we performed causal simulations where we specified positive values for both the 6 causal positive-LD SNP-pair annotations (vs. negative in primary simulations) and the 6 corresponding negative-LD SNP-pair annotations (vs. zero in primary simulations). Analogous to our primary causal simulations, LDSPEC produced significantly positive and slightly attenuated estimates of , and for a subset of causal SNP-pair annotations; once again, the estimates were less attenuated, suggesting that LDSPEC was more effective when the positive-LD and negative-LD strata of the same SNP-pair annotation had the same (Supplementary Figure 6b). Fifth, we applied LDSPEC to the primary null and causal simulation data using LD scores and directional LD scores that were computed with smaller window sizes (1Mb, 3Mb, 5Mb, instead of 10Mb). LDSPEC produced more biased estimates of , heritability, and heritability enrichment as the window size decreased (Supplementary Figure 7).
We conclude that LDSPEC is well-calibrated in null simulations and produces attenuated estimates of nonzero SNP-pair effect correlations in causal simulations.
Analysis of 70 diseases and complex traits
We applied LDSPEC with the baseline-SP model to publicly available summary statistics and in-sample LD of 70 diseases and complex traits (29 independent diseases/traits) from the UK Biobank37 (Supplementary Table 1; see Data availability), analyzing 136 SNP-pair annotations (Table 1). For each SNP-pair annotation, estimates were meta-analyzed across the 29 independent diseases/traits using random-effects meta-analysis, analogous to previous studies5,8 (Methods). Statistical significance was assessed via a Bonferroni p-value threshold, correcting for the number of hypotheses tested. Analysis of each UK Biobank disease/trait required roughly 12 hours for a single-core CPU, and required roughly 128GB of memory (Methods).
We first discuss results for the 3 proximity-based SNP-pair annotations (12 annotations when stratified by MAF and LD; Table 1). Results are reported in Figure 2 and Supplementary Table 14. First, for low-frequency positive-LD SNP-pair annotations, we detected strongly and significantly negative (P<0.05/136) SNP-pair effect correlations () for 0–100bp and 1–10kb SNP-pair annotations (−0.37±0.09 and −0.07±0.01; negative but non-significant estimate for 100bp-1kb). The negative between positive-LD SNP pairs can potentially be explained by linkage masking23 (also see ref.28), whereby haplotypes containing linked SNPs with opposite effects on disease escape negative selection. Specifically, a haplotype harboring two SNPs with opposite effects on disease/trait may have a reduced aggregate effect on fitness in individuals carrying that haplotype, e.g., under stabilizing selection38,39,46,47. The more strongly negative for SNP pairs at closer genomic distances may be partly because the magnitude of LD slightly decays with distance (e.g., average of 0.69, 0.64, 0.55 for common positive-LD 0–100bp, 100–1kb, 1–10kb, resp., Supplementary Table 4), reducing linkage masking effects, but predominantly because nearby SNPs are more likely to have shared functional roles (e.g., median of 541bp for mean segment length across functional annotations in Supplementary Table 1 of ref.5); SNP pairs with similar functional roles and opposite effects on a given disease are likely to also have opposite effects on pleiotropic traits underlying pleiotropic selection38 (but this is less likely for SNP pairs with different functional roles). Second, for common positive-LD SNP-pair annotations, our estimate of was negative with suggestive significance (P=0.001 > 0.05/136) for the 0–100bp SNP-pair annotation (−0.15±0.04; non-significant estimates for 100bp-1kb and 1–10kb). Common positive-LD SNP-pair annotations had less negative estimates than their low-frequency counterparts (significantly positive difference for 1–10kb, P<0.05/68; positive but non-significant differences for the remaining 2 comparisons; Supplementary Table 15), perhaps because common SNPs have smaller per-allele effects on disease and fitness than low-frequency SNPs11,13,16, limiting the impact of linkage masking. Third, common and low-frequency negative-LD SNP-pair annotations had less negative estimates than their positive-LD counterparts (significantly positive differences for common 0–100bp, P<0.05/68; positive but non-significant differences for the remaining 5 comparisons; Supplementary Table 15), consistent with linkage masking, which implicates a negative SNP-pair effect correlation for positive-LD SNP pairs and a less negative or weakly positive SNP-pair effect correlation for negative-LD SNP pairs (see Forward simulations recapitulate empirical findings.)
We next discuss results for the 5 gene-based SNP-pair annotations (20 annotations when stratified by MAF and LD; Table 1). Results are reported in Figure 2 and Supplementary Table 14. First, for low-frequency positive-LD SNP-pair annotations, we detected strongly and significantly negative (P<0.05/136) SNP-pair effect correlations () for same-exon, same-gene exonic, same-gene promoter, and same-protein-domain SNP-pair annotations (−0.16±0.04, −0.13±0.02, −0.37±0.08, and −0.17±0.04; estimates of excess SNP-pair effect correlation were very similar to estimates of for these SNP-pair annotations due to their large genomic distances (implying a close to zero expected value of for distance-matched SNP pairs) (Supplementary Table 13). The strongly negative (and ) estimates are consistent with shared functional roles for SNP pairs in these gene-based annotations; the same-gene promoter SNP-pair annotation had the most negative estimate, perhaps because promoter SNPs can either increase or decrease gene expression levels48, supporting masking effects on gene expression, disease/trait, and fitness. Second, for common positive-LD SNP-pair annotations, estimates were non-significant and less negative than their low-frequency counterparts (significantly positive difference for same-gene promoter, P<0.05/68; positive but non-significant differences for 4 of 5 comparisons; Supplementary Table 15), analogous to results for proximity-based SNP-pair annotations. Third, common and low-frequency negative-LD SNP-pair annotations had less negative estimates than their positive-LD counterparts (significantly positive differences for 9 of 10 comparisons, P<0.05/68; positive but non-significant difference for the remaining 1 comparison; Supplementary Table 15), analogous to results for proximity-based SNP-pair annotations.
Finally, we discuss results for the 7 functional 0–100bp and 19 functional 0–1kb SNP-pair annotations (e.g., pairs of H3K27ac SNPs with distance 0–100bp; 104 annotations when stratified by MAF and LD; Table 1). We primarily focus on excess SNP-pair effect correlations to assess information specific to these functional annotations. estimates are reported in Figure 3 and Supplementary Table 16; corresponding estimates are reported in Supplementary Figure 8 and Supplementary Table 13. First, for low-frequency positive-LD SNP-pair annotations, we detected strongly and significantly negative (P<0.05/136) for 9 of 19 functional 0–1kb SNP-pair annotations (e.g., −0.24±0.02 for H3K27ac 0–1kb; significantly positive for Repressed 0–1kb, 0.21±0.10, P<0.05/136; non-significant for the remaining 9 functional 0–1kb and all 7 functional 0–100bp). SNP pairs in these SNP-pair annotations have stronger effects on disease5 and are likely to have similar functional roles, thus are expected to be more strongly impacted by linkage masking (exception: the significantly positive estimate for the low-frequency positive-LD Repressed 0–1kb SNP-pair annotation (corresponding estimate non-significant) is likely because SNP pairs in this annotation have weaker effects on disease5 and are likely to have weaker effects on fitness, thus expected to be less strongly impacted by linkage masking). Interestingly, low-frequency positive-LD functional 0–100bp SNP-pair annotations had less negative estimates than the corresponding functional 0–1kb SNP-pair annotations (significantly positive differences for H3K27ac and Transcribed, P<0.05/7; non-significant for the remaining 5; Supplementary Table 17); SNP pairs at very short genomic distances may generally have shared functional roles supporting linkage masking regardless of functional annotation, limiting the difference in between functional SNP pairs and other distance-matched SNP pairs. Second, for common positive-LD SNP-pair annotations, we detected significantly negative (P<0.05/136) for only 4 of 19 functional 0–1kb SNP-pair annotations (e.g., −0.05±0.01 for H3K27ac 0–1kb; non-significant for the remaining 15 functional 0–1kb and all 7 functional 0–100bp). Common positive-LD functional SNP-pair annotations had less negative estimates than their low-frequency counterparts (significantly positive differences for 12 out of 26 comparisons, P<0.05/68; Supplementary Table 17), analogous to results for proximity-based SNP-pair annotations. Third, common and low-frequency negative-LD functional SNP-pair annotations had less negative estimates than their positive-LD counterparts (significantly positive differences for 36 of 38 functional 0–1kb (and 0 of 14 functional 0–100bp), significantly negative difference for common Repressed 0–1kb, P<0.05/68; Supplementary Table 17), analogous to results for proximity-based SNP-pair annotations; 5 of 19 common negative-LD functional 0–1kb SNP-pair annotations had weakly but significantly positive (P<0.05/136) estimates (Supplementary Figure 8), perhaps because SNP pairs with concordant effects are more likely to be on different haplotypes to have a smaller aggregate impact on fitness under stabilizing selection.
We investigated whether excess SNP-pair effect correlations were larger for functional SNP-pair annotations with larger disease heritability enrichments for the underlying functional single-SNP annotations; we hypothesized that this might be the case, because pairs of SNPs with more strongly enriched heritability and shared functional roles are expected to be more strongly impacted by linkage masking. Results are reported in Figure 4, Supplementary Figure 9, and Supplementary Table 18. For positive-LD functional SNP-pair annotations, we observed significantly more negative (P<0.05/4) estimates for functional annotations with higher disease heritability enrichments, with a stronger effect for low-frequency SNP-pair annotations (e.g., regression slope of −0.179±0.031 for low-frequency positive-LD 0–1kb vs. −0.024±0.009 for common positive-LD 0–1kb). For negative-LD functional SNP-pair annotations, we observed significantly more positive (P<0.05/4) estimates for functional annotations with higher disease heritability enrichments (e.g., regression slope of 0.036±0.010 for common negative-LD 0–1kb; non-significant slope of −0.041±0.029 for low-frequency negative-LD 0–1kb). These results support our hypothesis that functional annotations that are more enriched for disease heritability are more impacted by linkage masking.
Although most of our results reflect a meta-analysis across diseases/traits, an assessment of results for individual diseases/traits is also important. For individual diseases/traits, we detected 12 significantly nonzero (P<0.05/136) SNP-pair effect correlations (), spanning 10 diseases/traits and 9 SNP-pair annotations (Supplementary Table 11); this suggests that LDSPEC can detect nonzero for individual diseases/traits, but has limited power to do so. These findings included a significantly negative estimate of the common positive-LD H3K4me3 0–1kb SNP-pair annotation for Monocyte Count (−0.19±0.05) and a significantly positive estimate of the common negative-LD 0–1kb H3K4me1 SNP-pair annotation for Forced Vital Capacity (0.21±0.05). We assessed the heterogeneity of estimates across 29 independent diseases/traits by computing a statistic quantifying relative excess cross-trait variance as compared to within-trait variance (Methods). Results are reported in Supplementary Table 19. The medium relative excess cross-trait variance was 4.0% across all 136 SNP-pair annotations (17.3% when restricting to the 12 proximity-based SNP-pair annotations), implying a low level of heterogeneity. We detected significant heterogeneity (P<0.05/136) for 1 SNP-pair annotation, the low-frequency positive-LD Repressed 0–1kb SNP-pair annotation (P=3.5×10−4).
We compared LDSPEC results obtained using the baseline-SP model to results obtained using other heritability models, including the baseline-SP-proximity model (165 single-SNP annotations + 12 proximity-based SNP-pair annotations only), the baseline-SP-gene model (165 single-SNP annotations + 20 gene-based SNP-pair annotations only), and the baseline-SP-functional model (165 single-SNP annotations + 104 functional SNP-pair annotations only). We determined that each of these models produced similar estimates as the baseline-SP model for SNP-pair annotations shared between the models (correlation of 0.96 across 136 SNP-pair correlations; non-significant difference (P>0.05/136) for all 136 comparisons; Supplementary Figure 10).
We conclude that positive-LD SNP pairs tend to have strongly negative SNP-pair effect correlations of disease effects, negative-LD SNP pairs tend to have less negative or weakly positive SNP-pair effect correlations, low-frequency SNP pairs tend to have stronger SNP-pair effect correlations than common SNP pairs, and SNP pairs in shared functional annotations tend to have much stronger SNP-pair effect correlations.
Impact of SNP-pair effect correlations on SNP-heritability
We assessed the impact of SNP-pair effect correlation on SNP-heritability by estimating and comparing two closely related quantities: SNP-heritability and sum of causal effect size variances (SCV) (Methods); the two quantities may be different when causal effects are not independent (as assumed in previous work1–21). SNP-heritability quantifies the aggregate impact of SNPs on disease and may be more relevant to applications such as polygenic risk scores (PRS)49,50, whereas SCV pertains to the impact of individual SNPs on disease and may be more relevant to applications such as fine-mapping51.
Results are reported in Figure 5 and Supplementary Table 20. First, SNP-heritability was substantially smaller than SCV, with a regression slope of 0.89±0.01; accordingly, heritability shrinkage, defined as the ratio between SNP-heritability and SCV, was equal to 0.87±0.02 (average across 29 independent diseases/traits). This implies that the phenomenon of negative SNP-pair effect correlations for positive-LD SNP pairs (and less negative or weakly positive SNP-pair effect correlations for negative-LD SNP pairs) can substantially impact SNP-heritability. Second, average heritability shrinkage was even stronger for certain functional annotations, e.g., 0.79±0.01 for common Super enhancer (Hnisz) SNPs; average of 0.83±0.01 across the 6 common functional annotations that had enriched heritability (heritability enrichment >1) and were large enough to be included in both 0–100bp and 0–1kb SNP-pair annotations (implying more accurate modeling of heritability shrinkage) and 0.84±0.01 across the corresponding 6 low-frequency functional annotations.
We performed 3 secondary analyses. First, we assessed the impact of modeling SNP-pair effect correlations on genome-wide SNP-heritability estimates; we determined that modeling SNP-pair effect correlations had a limited impact, as models that do not account for SNP-pair effect correlations produced similar estimates (Supplementary Figure 11a). Second, we assessed the impact of modeling SNP-pair effect correlations on estimates of heritability enrichment for single-SNP annotations; again, we determined that modeling SNP-pair effect correlations had a limited impact, as models that do not account for SNP-pair effect correlations produced similar estimates (Supplementary Figure 11b). Third, we confirmed that LDSPEC and S-LDSC5,8 (using the baseline model without SNP-pair annotations) produced similar estimates of each single-SNP annotation’s contribution to per-SNP heritability (τ), as well as genome-wide SNP-heritability (Supplementary Figure 11c,d).
We conclude that SNP-heritability is systematically smaller than SCV across diseases/traits, and that this heritability shrinkage is stronger for functionally important annotations.
Forward simulations under stabilizing selection recapitulate empirical findings
Our finding that positive-LD SNP pairs tend to have negative SNP-pair effect correlations can potentially be explained by linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection23,28. To test this hypothesis, we performed forward simulations of a quantitative trait under stabilizing selection, in which alleles that either increase or decrease the value of the phenotype are selected against38,39. In our primary simulations, we assumed a constant population size with 10,000 diploid individuals, mutation rate μ = 1×10−8, and fitness function (defined as the relationship between fitness and trait effect size of an allele) consistent with strong stabilizing selection (width of fitness function = 2; Supplementary Figure 12a); other settings were also evaluated. We assessed the (true) SNP-pair effect correlations () of SNP-pair annotations stratified by MAF and LD at different distances. Further details of the forward simulation framework are provided in the Methods section.
Results are reported in Figure 6 and Supplementary Table 21. We determined that positive-LD 0–100bp, 100bp-1kb, and 1–10kb SNP pairs had substantially negative SNP-pair effect correlations whereas negative-LD 0–100bp, 100bp-1kb, and 1–10kb SNP pairs had weakly positive SNP-pair effect correlations, which is consistent with linkage masking and qualitatively consistent with results for real diseases/traits (Figure 2). We did not observe a sharp decay of with distance as in real data (Figure 2), perhaps because we did not simulate more proximal SNPs to have shared functional roles, which is the case in real data (Supplementary Table 1 of ref.5). Under stabilizing selection, SNP pairs with discordant effects on the trait (for derived alleles) will have strongly positive LD, because haplotypes containing both derived alleles or both ancestral alleles are less susceptible to selection (than haplotypes containing one derived allele and one ancestral allele). On the other hand, SNP pairs with concordant effects on the trait (for derived alleles) will have weakly negative LD, because haplotypes containing both derived alleles are more susceptible to selection but haplotypes containing both ancestral alleles are less susceptible to selection (than haplotypes containing one derived allele and one ancestral allele). These consequences are consistent with the “Bulmer effect”, in which stabilizing selection reduces the phenotypic variance in each generation by weeding out extreme deviations from the norm46,47.
Accordingly, we determined that SNP pairs with opposite trait effects (for derived alleles) tended to be in strongly positive LD, and SNP pairs with concordant trait effects (for derived alleles) tended to be in weakly negative LD (Supplementary Figure 12b). The level of LD was relatively low when the disease/trait effects were either very small or very large, perhaps because small-effect SNPs are less impacted by stabilizing selection, and large-effect SNPs are efficiently removed from the population before the emergence of a second SNP masking the first SNP’s trait effect. LD was not significantly different from zero for neutral SNP pairs with at least one zero-effect SNP (Supplementary Figure 12b), consistent with the hypothesis that negative arises only under selection. We also performed simulations with other selection strengths (width of the fitness function: 4 for moderate selection and 1×106 for no selection, instead of 2 for strong selection in primary simulation). Results were similar for moderate selection vs. strong selection, but the LD between SNP pairs with correlated effects disappeared under no selection, consistent with our expectation (Supplementary Figure 12b).
In summary, our results suggest that a model of stabilizing selection on a complex trait can potentially explain the patterns we observe in real data, providing an evolutionary explanation for our findings.
Discussion
We have developed LDSPEC, a method that analyzes summary statistics and in-sample LD to estimate correlations of causal disease effect sizes for pairs of nearby SNPs, depending on their functional annotations. We recommend applying LDSPEC using the baseline-SP model, which contains 165 single-SNP annotations11 and 136 new SNP-pair annotations, including 12 proximity-based, 20 gene-based, and 104 functional SNP-pair annotations. We have shown that LDSPEC is approximately unbiased and well-calibrated in null simulations and capable of detecting nonzero SNP-pair effect correlations (with attenuated estimates) in causal simulations. Applying LDSPEC with the baseline-SP model to 70 UK Biobank diseases and complex traits37, we detected strongly and significantly nonzero SNP-pair effect correlations for nearby SNP pairs that decayed with distance. We determined that positive-LD SNP pairs had strongly negative disease-effect correlations, that negative-LD SNP pairs had less negative or weakly positive disease-effect correlations, and that SNP pairs in shared functional annotations that were enriched for disease heritability had stronger disease-effect correlations that spanned longer distances. As a consequence, SNP-heritability is systematically smaller than the sum of causal effect size variances, particularly for certain functional annotations. The negative SNP-pair effect correlations between positive-LD SNP pairs can potentially be explained by linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have a reduced aggregate effect on fitness and escape negative selection. Forward simulations showed that our findings are consistent with an evolutionary model involving stabilizing selection.
To our knowledge, no published study has systematically investigated SNP-pair effect correlations in genome-wide data. Our work expands upon an unpublished preprint40, which contained key ideas and derivations and detected SNP-pair effect correlations for extremely-short-range SNP pairs (0–100bp) that varied with LD. We note 4 important differences between our work and ref.40. First, our work stratifies SNP pairs by MAF and functional annotations. Second, our work identifies SNP-pair effect correlations at larger genomic distances (up to tens of kilobases). Third, our work performs evolutionary forward simulations to interpret our findings. Fourth, our work introduces improved methodology: LDSPEC uses a more accurate model15 for per-SNP heritability (165 baseline-LF single-SNP annotations11 vs. 26 MAF-and-LD single-SNP annotations in ref.40); LDSPEC adopts a principled estimator for SNP-pair effect correlations, whereas ref.40 uses a two-step heuristic assuming per-SNP heritability to be the same across SNPs; and LDSPEC more accurately computes LD scores and directional LD scores using a much larger LD window (10Mb vs. 1Mb) (leveraging an efficient implementation).
Our findings have several implications for future work. First, our findings challenge the widespread assumption of independent causal SNP-to-disease effects in studies of disease and complex trait architectures1–21. We have shown that modeling SNP-pair effect correlations distinguishes total SNP-heritability from the sum of causal SNP-to-trait effect size variances. Despite the limited impact of modeling SNP-pair effect correlations on estimates of SNP-heritability and heritability enrichment, its impact on other genetic architecture parameters (e.g., parameters related to polygenicity13,14,17,19 or selection11,13,16,20) remains to be assessed. Second, our findings motivate further prioritization of joint association testing methods that increase statistical power in the presence of linkage masking23,28,52. Third, our findings motivate the development of improved fine-mapping methods to disentangle linkage-masked SNPs by modeling SNP-pair effect correlations; incorporating functional annotations43,51–54 (including SNP-pair annotations) and analyzing data from diverse populations with different LD patterns55–57 will likely remain valuable. Fourth, negative SNP-pair effect correlations may contribute to poor cross-population transferability of polygenic risk scores (PRS)49,50,58–60, as linked SNPs with opposite effects in one population may not be linkage-masked in a different population due to different LD patterns. Ongoing efforts to improve cross-population PRS61,62 may benefit from modeling SNP-pair effect correlations.
We note several limitations of our work. First, LDSPEC produces attenuated estimates of SNP-pair effect correlations in causal simulations, possibly because there is a high level of collinearity of directional LD scores between SNP-pair annotations, and it is challenging to distinguish between SNP-pair annotations with highly correlated directional LD scores; however, LDSPEC is unbiased and well-calibrated in null simulations. Second, LDSPEC attains incomplete power in some settings, including simulations (Figure 1b) and analyses of individual diseases/traits (Supplementary Tables 10,11); an important future direction is to improve the power of LDSPEC, e.g., by incorporating products of z-scores of nearby SNPs. Third, we only considered binary SNP-pair annotations in this work; an important future direction is to extend LDSPEC to incorporate continuous SNP-pair annotations, analogous to incorporation of continuous single-SNP annotations in S-LDSC8. Fourth, although we have shown via forward simulations that stabilizing selection can produce the negative SNP-pair effect correlations observed in real data, we currently cannot exclude the possibility that this could be produced by other evolutionary mechanisms. For example, Hill–Robertson interference8,22 can create negative LD for pairs of deleterious SNPs (concordant effects on fitness) and antagonistic epistasis can create positive LD between SNP pairs26. Stabilizing selection may be a more plausible explanation, because Hill–Robertson interference is less relevant to SNP pairs with opposite effects and the impact of epistatis on disease is hypothesized to be small63–65. Nonetheless, investigating the impact of a broad set of evolutionary models on SNP-pair effect correlations is an important future direction. Fifth, we have estimated SNP-pair effect correlations for low-frequency and common variants, but not for rare variants (for which LDSPEC is underpowered due to a lower level of LD between rare SNP pairs). Investigating SNP-pair effect correlations for rare variants (which have often been reported to have concordant effects29–34, motivating the development of rare variant burden tests52,66,67) is an important future direction. Sixth, analogous to other studies that employ linear complex trait models1–21, we have not investigated the potential impact of epistatic interactions on our estimates; however, the impact of epistatic interaction on these models is hypothesized to be small63–65. Seventh, we have not assessed the impact of unmodeled causal variants that are missing from the data on our estimates. However, shared tagging of unmodeled causal variants could produce spurious positive effect correlations between positive-LD SNP pairs, but would not be expected to produce the negative effect correlations that we report here. Eighth, we have analyzed “in.white.British.ancestry.subset” samples from the UK Biobank, but an important future direction is to extend our analyses to cohorts of diverse genetic ancestry68,69. Despite these limitations, our work provides a comprehensive genome-wide assessment of SNP-pair effect correlations of causal disease effect sizes across MAF, LD, and functional annotations.
Methods
Modelling SNP-pair effect correlations
We considered individuals, SNPs, and assume a polygenic model1,71
(3) |
where is a quantitative phenotype, is the standardized genotype, is the SNP causal effects on phenotype, and is the environmental factor. We model as fixed and model and as random variables independent of each other. Previous work has assumed independent SNP-to-phenotype effects1–21 (implying elements of are independent), but our model allows SNP-to-phenotype effects to be correlated by assuming a general covariance We standardize71 as , where is the number of derived alleles for individual and SNP , and is the derived allele frequency of SNP .
We consider binary/continuous single-SNP annotations, where represents the value of annotation for SNP . We consider binary SNP-pair annotations, where indicates if SNP pair is in the annotation (we set diagonal elements for modelling convenience). We model the SNP causal effect covariance as a linear combination of contributions from single-SNP annotations and SNP-pair annotations:
(4) |
where represents the contribution of single-SNP annotation to per-SNP heritability, and represents the contribution of SNP-pair annotation to per-SNP-pair covariance. Analyzing standardized effect sizes (as in this paper) may produce slightly different results compared to analyzing non-standardized (per-allele) effect sizes, as the two analyses, together with model (4), imply different MAF-dependent genetic architectures.
Inference via LDSPEC
Let be the summary association statistic for SNP and be the signed in-sample LD between SNPs and . Then the chi-square statistic is equal to . Under the correlated SNP effect model (Equations (3),(4)),
(5) |
where is the LD score of SNP for single-SNP annotation and is the directional LD score of SNP for SNP-pair annotation . Please see the Supplementary Note for more details.
We use all SNPs in the data set as both reference SNPs (for computing LD and directional LD scores) and regression SNPs (for estimating and via regression). We prefer in-sample LD over external LD reference panels because external LD data sets may have smaller sample sizes and may not match the GWAS cohort, potentially reducing power and introducing estimation bias. For computational tractability, we approximate the LD and directional LD scores using SNPs in an adjacent 10Mb window; using a smaller window may introduce estimation biases (Supplementary Figure 7). We use two sets of regression weights similar to previous work4: LD score weights proportional to accounting for dependency between regression SNPs and heteroskedasticity weights proportional to (approximating ), where is the LD score of SNP and is estimated using reference SNPs in the adjacent 10Mb window. We estimate the covariance of estimates of and using a genomic block jackknife with 100 equally-sized blocks of adjacent SNPs; estimates of and are approximately normally distributed.
LDSPEC further estimates a number of parameters for single-SNP annotations and SNP-pair annotations. Let be the set of SNPs in a binary single-SNP annotation and be the set of SNP pairs in a SNP-pair annotation .
Heritability of a single-SNP annotation . It holds that (second term is 0 when SNP effects are independent; see Supplementary Note for more details). For computational efficiency, we approximate the coefficient of in the second term as , where is the average signed LD across SNP pairs in and can be precomputed (see Data availability).
Sum of causal effect size variance (SCV) of a single-SNP annotation . SCV(c) is equal to when SNP effects are independent.
Heritability enrichment of a single-SNP annotation . For a common single-SNP annotation , the common heritability enrichment is , where is the common SNP heritability and is the number of common SNPs. We define and estimate low-frequency heritability enrichment for a low-frequency single-SNP annotation similarly.
Heritability shrinkage of a single-SNP annotation .
Total SNP-pair effect covariance of a SNP-pair annotation .
SNP-pair effect correlation of a SNP-pair annotation , where .
Total excess SNP-pair effect covariance of a SNP-pair annotation , where, for a heritability model with non-overlapping proximity-based SNP-pair annotations (such as baseline-SP), sums over the non-overlapping proximity-based SNP-pair annotations. for proximity-based annotations by definition.
Excess SNP-pair effect correlation of a SNP-pair annotation , where .
Heritability, SCV, total SNP-pair effect covariance, and excess total SNP-pair effect covariance are linear in and (therefore approximately normal); we estimate their SE and further compute z-scores to test for significance using the covariance of estimates of and . Since heritability enrichment may not be normally distributed, analogous to previous work5, we test for significant enrichment by testing whether , which is linear in and (therefore approximately normal). Since heritability shrinkage may not be normally distributed, we test for significant shrinkage by testing whether , which is linear in and (therefore approximately normal). Since (resp. ) may not be normally distributed, we test for significantly nonzero (resp. )using the p-value for nonzero (resp. ). We also report jackknife SE for heritability enrichment, heritability shrinkage, , and , even though this is not what we use to assess significance.
The computational cost for LDSPEC to analyze one UK Biobank disease/trait (14,820,648 SNPs) was roughly 12 hours for a single-core CPU, and roughly 128GB of memory; this assumes precomputed LD and directional LD scores (which need to be computed only once for all diseases/traits analyzed).
Genotype data
We considered 337,426 unrelated “in.white.British.ancestry.subset” individuals and 70 diseases and complex traits from the UK Biobank37 (average N=305,646, z-score >5 for nonzero SNP-heritability; Supplementary Table 1). The subset of 29 independent diseases/traits (average N=298,430) was selected to have pairwise genetic correlation7 . We considered the set of 14,820,648 UK Biobank imputed SNPs (version “imp_v3” from ref.37) with MAF ≥ 0.1% and INFO score ≥ 0.6, similar to previous work11,43. This set of SNPs was used as both the regression SNPs and reference SNPs in the LDSPEC analysis. We considered disease effects defined with respect to derived alleles of SNPs. To determine the ancestral allele (opposite of the derived allele) at each variant site, we obtained a whole genome alignment of the Human hg19 genome assembly to the Chimpanzee panTro6 genome assembly from the UCSC genome browser (see Data availability). We converted the MAF formatted file (hg19.panTro6.synNet.maf.gz) to VCF format using MAFFilter v1.3.172 (see Code availability) and extracted the chimpanzee allele at all variant sites in the UK Biobank.
SNP annotations
We considered 165 single-SNP annotations (Supplementary Tables 2,3), including 163 annotations in the baseline-LF model11 and 2 annotations of CADD score45 for deleterious coding SNPs (common and low-frequency CADD score >20 SNPs, resp.). The 165 single-SNP annotations were constructed from 45 main functional annotations (baseline model version provided in Supplementary Table 2). Since we considered a different set of reference SNPs, we recomputed these main functional annotations. Specifically, the original .bed reference files were used for 36 main functional annotations. The annotations “Nucleotide diversity” and “Recombination rate” were recomputed following the original definition8,11. The annotation “MAF-adjusted LLD-AFR” was computed using the 1000 genome African population LD score73 (missing values imputed as 1). The annotations “Conserved (GERP RS ≥ 4)”, “Conserved (GERP NS)”, “CpG content”, “Deleterious (CADD ≥ 20)” were obtained from the CADD database v1.645 (see Data availability). The annotations “Non-synonymous” and “Synonymous” were curated using SnpEff v4.3t74 (see Code availability). All single-SNP annotations analyzed are publicly available (see Data availability).
We constructed 136 SNP-pair annotations obtained by stratifying 34 main SNP-pair annotations by MAF (common or low-frequency) and LD (positive or negative): 3 proximity-based annotations (0–100bp, 100bp-1kb, 1–10kb), 5 gene-based annotations (e.g., same-gene promoter SNP pairs), 7 functional 0–100bp annotations, and 19 functional 0–1kb annotations (e.g., pairs of H3K27ac SNPs with distances 0–100bp) (Table 1, Supplementary Tables 4,5; Data availability). For gene-based annotations, we used GENCODE v41 for exon and gene annotations (Data availability) and downloaded the promoter annotation from ref.70, and annotated protein domains using VEP v10275 (Code availability). The functional SNP-pair annotations were constructed from 38 binary baseline model single-SNP functional annotations (Supplementary Table 2), restricted to functional SNP-pair annotations with at least 1 million SNP pairs (combined across MAF and LD bins). We excluded SNP-pair annotations involving one common SNP and one low-frequency SNP, because these SNP pairs had low levels of LD, limiting the informativeness of directional LD scores. All SNP-pair annotations analyzed are publicly available (see Data availability).
Simulations
For all simulations, we used the UK Biobank genotype data of all 337,426 samples and all 1,161,341 SNPs on chromosome 1, analogous to previous work8,11. We considered two values of SCV (0.5 or 0.2) and two values of causal SNP proportion (0.2 or 0.1). We repeated all simulations 50 times. All simulation parameters are reported in Supplementary Table 7. We note that heritabilities are different from SCVs in causal simulations with nonzero SNP-pair effect correlations.
In null simulations, we simulated heritable traits with functional enrichment but zero SNP-pair effect correlations. First, we simulated per-SNP heritability of SNPs according to Equation (4), where we incorporated the LD-dependent genetic architecture by assigning nonzero to LD-related single-SNP annotations based on estimates from previous work8,11 and incorporated functional enrichments by assigning a positive to the common Super enhancer (Hnisz) single-SNP annotation, also motivated by previous work5,8,11 (Supplementary Table 7). Second, we simulated the MAF-dependent genetic architecture by further multiplying the simulated per-SNP heritability of each SNP by , where is the derived allele frequency and we used based on previous work16. Third, we simulated the sparse genetic architecture by randomly selecting a subset of causal SNPs, setting the simulated per-SNP heritability of non-causal SNPs to zero, and scaling up the simulated per-SNP heritability of causal SNPs to match the target SCV (making equal to target SCV). Finally, we sampled causal SNP effect sizes for each SNP from a normal distribution with mean zero and variance equal to the simulated per-SNP heritability. We determined the true values of and by regressing simulated causal effects on the subset of causal single-SNP and SNP-pair annotations following Equation (4) and determined true values of other quantities based on true values of and .
In casual simulations, we simulated heritable traits with functional enrichment and nonzero SNP-pair effect correlations. In primary causal simulations, we simulated negative for positive-LD SNP pairs but zero for negative-LD SNP pairs, to mimic our findings in real-data analysis that positive-LD SNP pairs had strongly negative estimates but negative-LD SNP pairs had very weakly positive estimates (Figures 2, Supplementary Figure 8). First, we simulated LD-and-MAF dependent genetic architectures and functional enrichments for per-SNP heritability of SNPs by repeating the first and second steps in null simulations. Second, we assigned nonzero contributions to SNP-pair effect correlation () to a subset of SNP-pair annotations (Supplementary Table 7) and calculated the correlation matrix of SNP effect sizes by summing up contributions from all causal SNP-pair annotations. Third, we calculated the covariance matrix of SNP effect sizes by scaling the simulated correlation matrix by simulated per-SNP heritability. Fourth, we simulated SNP causal effect sizes by blocks of 100 SNPs, randomly selecting a subset of blocks to be causal based on the target causal SNP proportion, and sampled causal SNP effect sizes from a multivariate normal distribution with zero mean and the simulated covariance matrix for causal SNP blocks (we removed negative eigenvalues from covariance matrices to keep them positive semidefinite). Fifth, we rescaled the simulated causal effect sizes to match the target SCV by scaling to be equal to the target SCV. We calculated the true parameter values the same as in null simulations.
Data analysis
We used genomic jackknife to assess standard error and statistical significance when aggregating dependent estimates, including analyses in Figure 4 and Supplementary Figure 9, and Supplementary Tables 15,17. For analysis of heterogeneity across diseases/traits (in Analysis of 70 diseases and complex traits), let be the number of diseases/traits and let be the point estimate and SE of the th trait. We assume that . Let be the unweighted mean. For the ratio between across-trait variance and average SE, the across-trait variance is estimated as (second term corrects for bias), and the average SE is computed as . Let be the weighted mean. The chi-square statistic is and follows a distribution.
Forward evolutionary simulations
Forward evolutionary simulations were performed on SLiM v3.676 (Code availability) using a fixed population size of 10,000 diploid individuals, each with a single chromosome of length 100kb, mutation rate , and recombination rate 1×10−8. For simulations of stabilizing selection, new mutations were introduced at rate with effect sizes of (trait-decreasing), 0 (neutral), or (trait-increasing) (with equal probability); was used in the main simulation, and additional values were considered for simulation of linkage disequilibrium varying over a log-scaled range from 1×10−4 to 1 (Supplementary Figure 12b). At the end of each generation, aggregate trait effect for each individual was calculated as across variants each with effect size . Individual fitness (as a function of aggregate trait effect for each individual in a given generation) was calculated as depending on the width of fitness function parameter , following ref.38. We considered 3 values for the width parameter: strong selection (, used in the main simulation), moderate selection (), and effectively neutral (Supplementary Figure 12a). Simulations were run for 10N = 100,000 generations. Pairwise linkage disequilibrium was computed using emeraLD77 v0.1 (Supplementary Figure 12) (Code availability), or using correlation coefficient (Figure 6). An aggregate of 5,000 simulated populations was run, and the mean statistic (e.g., or ) was summarized within each run and then between runs to derive mean values and confidence intervals.
Supplementary Material
Acknowledgements
We are grateful to Kangcheng Hou, Kushal Dey, Ali Akbari, Luke O’Connor, Armin Schoech, Corbin Quick, Xihao Li, Hui Li, Tiffany Amariuta, Karthik Jagadeesh, Katherine Siewert-Rocks, Jordan Rossen, Elizabeth Dorans, Xihong Lin, and Soumya Raychaudhuri for their helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and was funded by National Institutes of Health (NIH) grants U01 HG009379, R01 MH101244, R37 MH107649, U01 HG012009 and R01 HG006399. C.C. was funded by the NIGMS 5T32GM007748–44. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Footnotes
Competing Interests Statement
The authors declare no competing interests.
Code availability
Software implementing LDSPEC and code for generating all results of the paper are available at https://github.com/martinjzhang/LDSPEC. MAFFilter v1.3.172 is available at https://jydu.github.io/maffilter/. SnpEff v4.3t74 is available at http://pcingola.github.io/SnpEff/. VEP v10275 is available at https://useast.ensembl.org/info/docs/tools/vep/script/vep_download.html. SLiM version v3.6 is available at https://github.com/MesserLab/SLiM/releases/tag/v3.6. emeraLD version v0.1 is available at https://github.com/statgen/emeraLD.
Data availability
Information of imputed SNPs and corresponding ancestral alleles, GWAS summary statistics, baseline-SP single-SNP and SNP-pair annotations, LD scores, directional LD scores, and LDSPEC output from this study are available at https://figshare.com/projects/LD_SNP-pair_effect_correlation_regression_LDSPEC_/188052. We did not release in-sample LD files due to their large sizes; similar in-sample LD files can be found in ref.43. The whole genome alignment of the Human hg19 genome assembly to the Chimpanzee panTro6 genome assembly is available at http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsPanTro6/. CADD database v1.645 is available at https://cadd.gs.washington.edu/download. GENCODE v41 is available at https://www.gencodegenes.org/human/release_41.html. The promoter annotation from ref.70 is available at https://alkesgroup.broadinstitute.org/cS2G.
References
- 1.Yang Jian, Benyamin Beben, McEvoy Brian P, Gordon Scott, Henders Anjali K, Nyholt Dale R, Madden Pamela A, Heath Andrew C, Martin Nicholas G, Montgomery Grant W, et al. Common snps explain a large proportion of the heritability for human height. Nature genetics, 42(7):565, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Speed Doug, Hemani Gibran, Johnson Michael R, and Balding David J. Improved heritability estimation from genome-wide snps. The American Journal of Human Genetics, 91(6):1011–1021, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Zhou Xiang, Carbonetto Peter, and Stephens Matthew. Polygenic modeling with bayesian sparse linear mixed models. PLoS genetics, 9(2):e1003264, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Bulik-Sullivan Brendan K, Loh Po-Ru, Finucane Hilary K, Ripke Stephan, Yang Jian, Patterson Nick, Daly Mark J, Price Alkes L, and Neale Benjamin M. Ld score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature genetics, 47(3):291–295, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Finucane Hilary K, Bulik-Sullivan Brendan, Gusev Alexander, Trynka Gosia, Reshef Yakir, Loh Po-Ru, Anttila Verneri, Xu Han, Zang Chongzhi, Farh Kyle, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature genetics, 47(11):1228, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Loh Po-Ru, Bhatia Gaurav, Gusev Alexander, Finucane Hilary K, Bulik-Sullivan Brendan K, Pollack Samuela J, de Candia Teresa R, Lee Sang Hong, Wray Naomi R, Kendler Kenneth S, et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nature genetics, 47(12):1385–1392, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Bulik-Sullivan Brendan, Finucane Hilary K, Anttila Verneri, Gusev Alexander, Day Felix R, Loh Po-Ru, ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, Duncan Laramie, et al. An atlas of genetic correlations across human diseases and traits. Nature genetics, 47(11):1236–1241, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Gazal Steven, Finucane Hilary K, Furlotte Nicholas A, Loh Po-Ru, Palamara Pier Francesco, Liu Xuanyao, Schoech Armin, Bulik-Sullivan Brendan, Neale Benjamin M, Gusev Alexander, et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nature genetics, 49(10):1421, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Speed Doug, Cai Na, Consortium Ucleb, Johnson Michael R, Nejentsev Sergey, and Balding David J. Reevaluation of snp heritability in complex human traits. Nature genetics, 49(7):986–992, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Lu Qiongshi, Li Boyang, Ou Derek, Erlendsdottir Margret, Powles Ryan L, Jiang Tony, Hu Yiming, Chang David, Jin Chentian, Dai Wei, et al. A powerful approach to estimating annotation-stratified genetic covariance via gwas summary statistics. The American Journal of Human Genetics, 101(6):939–964, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Gazal Steven, Loh Po-Ru, Finucane Hilary K, Ganna Andrea, Schoech Armin, Sunyaev Shamil, and Price Alkes L. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nature genetics, 50(11):1600–1607, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Evans Luke M, Tahmasbi Rasool, Vrieze Scott I, Abecasis Gonçalo R, Das Sayantan, Gazal Steven, Bjelland Douglas W, De Candia Teresa R, Haplotype Reference Consortium, Goddard Michael E, et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nature genetics, 50(5):737–745, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Zeng Jian, De Vlaming Ronald, Wu Yang, Robinson Matthew R, Lloyd-Jones Luke R, Yengo Loic, Yap Chloe X, Xue Angli, Sidorenko Julia, McRae Allan F, et al. Signatures of negative selection in the genetic architecture of human complex traits. Nature genetics, 50(5):746–753, 2018. [DOI] [PubMed] [Google Scholar]
- 14.Zhang Yan, Qi Guanghao, Park Ju-Hyun, and Chatterjee Nilanjan. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nature genetics, 50(9):1318–1326, 2018. [DOI] [PubMed] [Google Scholar]
- 15.Gazal Steven, Marquez-Luna Carla, Finucane Hilary K, and Price Alkes L. Reconciling s-ldsc and ldak functional enrichment estimates. Nature genetics, 51(8):1202–1204, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Schoech Armin P, Jordan Daniel M, Loh Po-Ru, Gazal Steven, O’Connor Luke J, Balick Daniel J, Palamara Pier F, Finucane Hilary K, Sunyaev Shamil R, and Price Alkes L. Quantification of frequency-dependent genetic architectures in 25 uk biobank traits reveals action of negative selection. Nature communications, 10(1):1–10, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.O’Connor Luke J, Schoech Armin P, Hormozdiari Farhad, Gazal Steven, Patterson Nick, and Price Alkes L. Extreme polygenicity of complex traits is explained by negative selection. The American Journal of Human Genetics, 105(3):456–476, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Speed Doug and Balding David J. Sumher better estimates the snp heritability of complex traits from summary statistics. Nature genetics, 51(2):277–284, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.O’Connor Luke J. The distribution of common-variant effect sizes. Nature genetics, 53(8):1243–1249, 2021. [DOI] [PubMed] [Google Scholar]
- 20.Zeng Jian, Xue Angli, Jiang Longda, Lloyd-Jones Luke R, Wu Yang, Wang Huanwei, Zheng Zhili, Yengo Loic, Kemper Kathryn E, Goddard Michael E, et al. Widespread signatures of natural selection across human complex traits and functional genomic categories. Nature Communications, 12(1):1164, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Song Shuang, Jiang Wei, Zhang Yiliang, Hou Lin, and Zhao Hongyu. Leveraging ld eigenvalue regression to improve the estimation of snp heritability and confounding inflation. The American Journal of Human Genetics, 109(5):802–811, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Hill William G and Robertson Alan. The effect of linkage on limits to artificial selection. Genetics Research, 8(3):269–294, 1966. [PubMed] [Google Scholar]
- 23.Brown Brielin C, Price Alkes L, Patsopoulos Nikolaos A, and Zaitlen Noah. Local joint testing improves power and identifies hidden heritability in association studies. Genetics, 203(3):1105–1116, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Sohail Mashaal, Vakhrusheva Olga A, Sul Jae Hoon, Pulit Sara L, Francioli Laurent C, Genome of the Netherlands Consortium, Alzheimer’s Disease Neuroimaging Initiative, van den Berg Leonard H, Veldink Jan H, de Bakker Paul IW, et al. Negative selection in humans and fruit flies involves synergistic epistasis. Science, 356(6337):539–542, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Garcia Jesse A and Lohmueller Kirk E. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genetics, 17(7):e1009676, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Ragsdale Aaron P. Local fitness and epistatic effects lead to distinct patterns of linkage disequilibrium in protein-coding genes. Genetics, 221(4):iyac097, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Good Benjamin H. Linkage disequilibrium between rare mutations. Genetics, 220(4):iyac004, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Li Ang, Liu Shouye, Bakshi Andrew, Jiang Longda, Chen Wenhan, Zheng Zhili, Sullivan Patrick F, Visscher Peter M, Wray Naomi R, Yang Jian, et al. mbat-combo: a more powerful test to detect gene-trait associations from gwas data. The American Journal of Human Genetics, 110(1):30–43, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Romeo Stefano, Pennacchio Len A, Fu Yunxin, Boerwinkle Eric, Tybjaerg-Hansen Anne, Hobbs Helen H, and Cohen Jonathan C. Population-based resequencing of angptl4 uncovers variations that reduce triglycerides and increase hdl. Nature genetics, 39(4):513–516, 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Do Ron, Stitziel Nathan O, Won Hong-Hee, Jørgensen Anders Berg, Duga Stefano, Merlini Pier Angelica, Kiezun Adam, Farrall Martin, Goel Anuj, Zuk Or, et al. Exome sequencing identifies rare ldlr and apoa5 alleles conferring risk for myocardial infarction. Nature, 518(7537):102–106, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Genovese Giulio, Fromer Menachem, Stahl Eli A, Ruderfer Douglas M, Chambert Kimberly, Landén Mikael, Moran Jennifer L, Purcell Shaun M, Sklar Pamela, Sullivan Patrick F, et al. Increased burden of ultra-rare protein-altering variants among 4,877 individuals with schizophrenia. Nature neuroscience, 19(11):1433–1441, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Ganna Andrea, Genovese Giulio, Howrigan Daniel P, Byrnes Andrea, Kurki Mitja I, Zekavat Seyedeh M, Whelan Christopher W, Kals Mart, Nivard Michel G, Bloemendal Alex, et al. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nature neuroscience, 19(12):1563–1565, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Ganna Andrea, Satterstrom F Kyle, Zekavat Seyedeh M, Das Indraniel, Kurki Mitja I, Churchhouse Claire, Alfoldi Jessica, Martin Alicia R, Havulinna Aki S, Byrnes Andrea, et al. Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum. The American Journal of Human Genetics, 102(6):1204–1211, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Barton Alison R, Sherman Maxwell A, Mukamel Ronen E, and Loh Po-Ru. Whole-exome imputation within uk biobank powers rare coding variant association and fine-mapping analyses. Nature genetics, 53(8):1260–1269, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.She Richard and Jarosz Daniel F. Mapping causal variants with single-nucleotide resolution reveals biochemical drivers of phenotypic change. Cell, 172(3):478–490, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Bernstein Max R, Zdraljevic Stefan, Andersen Erik C, and Rockman Matthew V. Tightly linked antagonistic-effect loci underlie polygenic phenotypic variation in c. elegans. Evolution letters, 3(5):462–473, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Bycroft Clare, Freeman Colin, Petkova Desislava, Band Gavin, Elliott Lloyd T, Sharp Kevin, Motyer Allan, Vukcevic Damjan, Delaneau Olivier, O’Connell Jared, et al. The uk biobank resource with deep phenotyping and genomic data. Nature, 562(7726):203–209, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Simons Yuval B, Bullaughey Kevin, Hudson Richard R, and Sella Guy. A population genetic interpretation of gwas findings for human quantitative traits. PLoS biology, 16(3):e2002985, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Simons Yuval Benjamin, Mostafavi Hakhamanesh, Smith Courtney Jean, Pritchard Jonathan K, and Sella Guy. Simple scaling laws control the genetic architectures of human complex traits. bioRxiv, pages 2022–10, 2022. [Google Scholar]
- 40.Schoech Armin P, Weissbrod Omer, O’Connor Luke J, Patterson Nick, Shi Huwenbo, Reshef Yakir, and Price Alkes. Negative short-range genomic autocorrelation of causal effects on human complex traits. bioRxiv, 2020. Submitted to Cell Genomics on 09/23/20, under revision on 01/04/21, withdrawn from consideration due to lack of bandwidth of the first author (who left academia). [Google Scholar]
- 41.Yang Jian, Weedon Michael N, Purcell Shaun, Lettre Guillaume, Estrada Karol, Willer Cristen J, Smith Albert V, Ingelsson Erik, O’connell Jeffrey R, Mangino Massimo, et al. Genomic inflation factors under polygenic inheritance. European Journal of Human Genetics, 19(7):807–812, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Benner Christian, Havulinna Aki S, Järvelin Marjo-Riitta, Salomaa Veikko, Ripatti Samuli, and Pirinen Matti. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. The American Journal of Human Genetics, 101(4):539–551, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Weissbrod Omer, Hormozdiari Farhad, Benner Christian, Cui Ran, Ulirsch Jacob, Gazal Steven, Schoech Armin P, Van De Geijn Bryce, Reshef Yakir, Márquez-Luna Carla, et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nature genetics, 52(12):1355–1363, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Marchini Jonathan and Howie Bryan. Genotype imputation for genome-wide association studies. Nature Reviews Genetics, 11(7):499–511, 2010. [DOI] [PubMed] [Google Scholar]
- 45.Rentzsch Philipp, Witten Daniela, Cooper Gregory M, Shendure Jay, and Kircher Martin. Cadd: predicting the deleteriousness of variants throughout the human genome. Nucleic acids research, 47(D1):D886–D894, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Bulmer MG1. The effect of selection on genetic variability. The American Naturalist, 105(943):201–211, 1971. [Google Scholar]
- 47.Bulmer MG. The genetic variability of polygenic characters under optimizing selection, mutation and drift. Genetics Research, 19(1):17–25, 1972. [DOI] [PubMed] [Google Scholar]
- 48.Deng NA, Zhou Heng, Fan Hua, and Yuan Yuan. Single nucleotide polymorphisms and cancer susceptibility. Oncotarget, 8(66):110635, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Chatterjee Nilanjan, Shi Jianxin, and García-Closas Montserrat. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nature Reviews Genetics, 17(7):392–406, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Torkamani Ali, Wineinger Nathan E, and Topol Eric J. The personal and clinical utility of polygenic risk scores. Nature Reviews Genetics, 19(9):581–590, 2018. [DOI] [PubMed] [Google Scholar]
- 51.Schaid Daniel J, Chen Wenan, and Larson Nicholas B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nature Reviews Genetics, 19(8):491–504, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Li Xihao, Li Zilin, Zhou Hufeng, Gaynor Sheila M, Liu Yaowu, Chen Han, Sun Ryan, Dey Rounak, Arnett Donna K, Aslibekyan Stella, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature genetics, 52(9):969–983, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Kichaev Gleb, Yang Wen-Yun, Lindstrom Sara, Hormozdiari Farhad, Eskin Eleazar, Price Alkes L, Kraft Peter, and Pasaniuc Bogdan. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS genetics, 10(10):e1004722, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Chen Wenan, McDonnell Shannon K, Thibodeau Stephen N, Tillmans Lori S, and Schaid Daniel J. Incorporating functional annotations for fine-mapping causal variants in a bayesian framework using summary statistics. Genetics, 204(3):933–958, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Kichaev Gleb and Pasaniuc Bogdan. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. The American Journal of Human Genetics, 97(2):260–271, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Ishigaki Kazuyoshi, Sakaue Saori, Terao Chikashi, Luo Yang, Sonehara Kyuto, Yamaguchi Kensuke, Amariuta Tiffany, Too Chun Lai, Laufer Vincent A, Scott Ian C, et al. Multi-ancestry genome-wide association analyses identify novel genetic mechanisms in rheumatoid arthritis. Nature genetics, 54(11):1640–1651, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Mahajan Anubha, Spracklen Cassandra N, Zhang Weihua, Ng Maggie CY, Petty Lauren E, Kitajima Hidetoshi, Yu Grace Z, Rüeger Sina, Speidel Leo, Kim Young Jin, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nature genetics, 54(5):560–572, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Márquez-Luna Carla, Loh Po-Ru, South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium, and Price Alkes L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genetic epidemiology, 41(8):811–823, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Martin Alicia R, Kanai Masahiro, Kamatani Yoichiro, Okada Yukinori, Neale Benjamin M, and Daly Mark J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature genetics, 51(4):584–591, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Wang Ying, Guo Jing, Ni Guiyan, Yang Jian, Visscher Peter M, and Yengo Loic. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nature communications, 11(1):3865, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Weissbrod Omer, Kanai Masahiro, Shi Huwenbo, Gazal Steven, Peyrot Wouter J, Khera Amit V, Okada Yukinori, Martin Alicia R, Finucane Hilary K, et al. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nature Genetics, 54(4):450–458, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Ruan Yunfeng, Lin Yen-Feng, Feng Yen-Chen Anne, Chen Chia-Yen, Lam Max, Guo Zhenglin, He Lin, Sawa Akira, Martin Alicia R, et al. Improving polygenic prediction in ancestrally diverse populations. Nature genetics, 54(5):573–580, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Hill William G, Goddard Michael E, and Visscher Peter M. Data and theory point to mainly additive genetic variance for complex traits. PLoS genetics, 4(2):e1000008, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Mäki-Tanila Asko and Hill William G. Influence of gene interaction on complex trait variation with multilocus models. Genetics, 198(1):355–367, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Hivert Valentin, Sidorenko Julia, Rohart Florian, Goddard Michael E, Yang Jian, Wray Naomi R, Yengo Loic, and Visscher Peter M. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. The American Journal of Human Genetics, 108(5):786–798, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Lee Seunggeung, Abecasis Gonçalo R, Boehnke Michael, and Lin Xihong. Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1):5–23, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Povysil Gundula, Petrovski Slavé, Hostyk Joseph, Aggarwal Vimla, Allen Andrew S, and Goldstein David B. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nature Reviews Genetics, 20(12):747–759, 2019. [DOI] [PubMed] [Google Scholar]
- 68.Kanai Masahiro, Akiyama Masato, Takahashi Atsushi, Matoba Nana, Momozawa Yukihide, Ikeda Masashi, Iwata Nakao, Ikegawa Shiro, Hirata Makoto, Matsuda Koichi, et al. Genetic analysis of quantitative traits in the japanese population links cell types to complex human diseases. Nature genetics, 50(3):390–400, 2018. [DOI] [PubMed] [Google Scholar]
- 69.All of Us Research Program Investigators. The “all of us” research program. New England Journal of Medicine, 381(7):668–676, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Gazal Steven, Weissbrod Omer, Hormozdiari Farhad, Dey Kushal K, Nasser Joseph, Jagadeesh Karthik A, Weiner Daniel J, Shi Huwenbo, Fulco Charles P, O’Connor Luke J, et al. Combining snp-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nature Genetics, 54(6):827–836, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Yang Jian, Hong Lee S, Goddard Michael E, and Visscher Peter M. Gcta: a tool for genome-wide complex trait analysis. The American Journal of Human Genetics, 88(1):76–82, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Dutheil Julien Y, Gaillard Sylvain, and Stukenbrock Eva H. Maffilter: a highly flexible and extensible multiple genome alignment files processor. BMC genomics, 15:1–10, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature, 526(7571):68, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Cingolani P., Platts A., Coon M., Nguyen T., Wang L., Land S.J., Lu X., and Ruden D.M.. A program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: Snps in the genome of drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2):80–92, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.McLaren William, Gil Laurent, Hunt Sarah E, Riat Harpreet Singh, Ritchie Graham RS, Thormann Anja, Flicek Paul, and Cunningham Fiona. The ensembl variant effect predictor. Genome biology, 17(1):1–14, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Haller Benjamin C and Messer Philipp W. SLiM 3: Forward genetic simulations beyond the Wright-Fisher model. Mol. Biol. Evol., 36(3):632–637, March 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Quick Corbin, Fuchsberger Christian, Taliun Daniel, Abecasis Gonçalo, Boehnke Michael, and Kang Hyun Min. emeraLD: rapid linkage disequilibrium estimation with massive datasets. Bioinformatics, 35(1):164–166, January 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Information of imputed SNPs and corresponding ancestral alleles, GWAS summary statistics, baseline-SP single-SNP and SNP-pair annotations, LD scores, directional LD scores, and LDSPEC output from this study are available at https://figshare.com/projects/LD_SNP-pair_effect_correlation_regression_LDSPEC_/188052. We did not release in-sample LD files due to their large sizes; similar in-sample LD files can be found in ref.43. The whole genome alignment of the Human hg19 genome assembly to the Chimpanzee panTro6 genome assembly is available at http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsPanTro6/. CADD database v1.645 is available at https://cadd.gs.washington.edu/download. GENCODE v41 is available at https://www.gencodegenes.org/human/release_41.html. The promoter annotation from ref.70 is available at https://alkesgroup.broadinstitute.org/cS2G.