Skip to main content
Scientific Reports logoLink to Scientific Reports
. 2016 Jun 8;6:27644. doi: 10.1038/srep27644

A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

Guillaume Pare 1,2,3,4,a, Shihong Mao 3, Wei Q Deng 5
PMCID: PMC4897708  PMID: 27273519

Abstract

Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.


Currently known genetic associations only explain a relatively small proportion of complex traits variance. In accordance with the widely accepted polygenic nature of complex traits, it has been proposed that weak, yet undetected, associations underlie complex trait heritability of a wide variety of phenotypes such as height1, cognitive function2 or rheumatoid arthritis3, etc. We have recently shown that the joint association of multiple weakly associated variants over large chromosomal regions contributes to complex traits variance4. Such regional associations are not easily amenable to estimation using summary-level association statistics because of sensitivity to linkage disequilibrium (LD). Nonetheless, only large meta-analyses have the necessary power to identify weakly associated variants and results are typically reported in the form of summary association statistics. In this report, we propose a novel method to assess the contribution of regional associations to complex traits variance using summary association statistics. Estimation of regional associations with our novel method, LD Adjusted Regional Genetic Variance (LARGV), can help identify key genomic regions involved in regulation of complex traits.

Clustering of weak associations within defined chromosomal regions has been previously suggested5 and can increase variance explained at established association loci as compared to genome-wide significant SNPs alone6. Such regional associations extended up to 433.0 Kb from genome-wide significant SNPs4, a distance compatible with long-range cis regulation of gene expression7,8. Furthermore, regional associations appeared to be the results of multiple weak associations rather than one or a few very significant univariate associations. These results point towards the existence of key regulatory regions where functional genetic variants aggregate, the identification of which can lead to novel biological insights and a better understanding of complex traits genetics.

Several methods have been described to estimate the overall contribution of common genetic variants to complex traits variance, but no method was specifically designed to estimate regional association using summary association statistics while accounting for LD (Table 1). For instance, a popular approach is based on variance component models using genetic relatedness as the variance-covariance matrix of the random effect. An implementation of this approach uses REML1 to estimate the genetic effect, and modifications have been reported to either take into account LD between SNPs9 or to handle large datasets10. While very useful and informative, all of these approaches require individual-level data. This latter pitfall has been overcome by the development of LDScore11, which uses summary-level association statistics as inputs and has been shown to be equivalent to Elston regression12. However, LDScore is ill suited for estimation of regional heritability because it is based on regressing genetic effects on the sum of linkage disequilibrium and requires large number of SNPs for precise estimations. A multi-SNP locus-association method has been described that uses summary association statistics, but it necessitates to first prune SNPs for p-value (p < 0.01) and LD (r2 < 0.1)13. Finally, alternative methods14,15 estimate genetic variance from the distribution of effect sizes and are not appropriate for regional genetic variance because the small number of SNPs will lead to an imprecise estimation and linkage equilibrium is assumed. There is thus a need for a method to estimate the regional contribution of common genetic variants using summary association statistics while simultaneously taking LD into account.

Table 1. Advantages and disadvantages of existing methods to estimate regional genetic variance from summary association statistics.

Method Advantages Disadvantages
Multi-SNP locus association13 -Can estimate genetic variance of medium to large regions (i.e. small regions might not have SNPs remaining after LD and p-value pruning) -Need for LD pruning as it assumes uncorrelated markers-Need for p-value pruning-Assumes population structure has been entirely adjusted for
LDScore11 -Includes all SNPs irrespective of LD-Adjusts for potential bias such as population stratification-Can estimate genetic covariance between two traits -Need for a reference LD structure-Best suited for genome-wide analysis or large regions
Distribution of effect size3,14,15 (AVENGEME and ABPA) -Estimates the proportion of markers affecting a trait-Can estimate genetic covariance between two traits -Need for LD pruning as it assumes uncorrelated markers-Assumes population structure has been entirely adjusted for-Best suited for genome-wide analysis or large regions
LD Adjusted Regional Genetic Variance (LARGV) -Includes all SNPs irrespective of LD-Can estimate genetic variance of any region, small or large -Need for a reference LD structure-Assumes population structure has been entirely adjusted for

Results

Comparison of genetic variance estimated using summary statistics and variance component models

Our method estimates the regional contribution to complex trait variance using summary data by adjusting the variance explained by each SNP for its LD with neighboring SNPs. Simulations using 1000 Genomes Project16 (1000G) data showed that genetic regional variance is accurately estimated by our method when no haplotype or interaction effects are present (Figure S1), as predicted by theoretical derivations (see Methods). We sought to compare estimation of overall genetic variance by our novel method to variance component models. We divided the genome into SNP blocks of median size 250 Kb (85–95 contiguous SNPs) minimizing inter-block LD and applied our method to BMI and height in the Health Retirement Study17 (HRS; N = 7,776) for which individual-level genotypes were available. Using only summary association statistics for our method and corresponding individual-level data for variance component models, both methods provided consistent estimates of genome-wide genetic variance (Fig. 1). We explored the impact of adjustment for genetic principal components. The first 20 components provided adequate protection against population stratification while the inclusion of fewer components led to the inflation in genetic variance, especially for height. Using 20 components, genetic variance was estimated at 0.12 (95% CI 0.01–0.24) for BMI with our novel method and 0.14 (95% CI 0.05–0.23) with variance component models18. The corresponding estimates for height were 0.28 (95% CI 0.17–0.39) and 0.30 (95% CI 0.20–0.39).

Figure 1. Genome-wide genetic variance estimated by summary association statistics and variance component models.

Figure 1

The overall genetic variance estimated by summary association statistics and variance component models is illustrated as a function of number of principal components for BMI (a) and height (b). Blue lines represent estimates of genetic variance using summary association statistics with LARGV; with 95% confidence intervals illustrated as blue shaded area. Corresponding estimates for variance component models are in red.

Using summary association statistics to identify regional associations

We next sought to compare the ability of our method to identify regional associations with other approaches also using summary association statistics. We ranked SNP blocks (median size of 250 Kb) according to decreasing regional genetic variance by applying three different methods on Genetic Investigation of Anthropometric Traits consortium (GIANT) summary association statistics19,20: (1) our proposed approach (LARGV), (2) LDScore and (3) the multi-SNP locus-association proposed by Ehret and colleagues13. We then used SNP block ranks derived from GIANT by each of the three methods to estimate genetic variance in HRS (which is not part of GIANT) with variance component models, successively adding a higher proportion of SNP blocks. As illustrated in Fig. 2, the genetic variance estimated by LARGV increased more rapidly with the proportion of top SNP blocks included as compared to other two methods. The multi-SNP locus-association method performed well, but a large proportion of SNP blocks did not have any SNP left after LD and p-value pruning (68.4% for BMI and 26.7% for height), highlighting the disadvantage of pruning instead of adjusting for LD. We obtained consistent results when using LD data from 1000G data instead of HRS (Figure S2), changing SNP block size (Figure S3), or changing the number of neighboring SNPs included in LD calculations (Figure S4).

Figure 2. Genetic variance as a function of proportion of top SNP blocks.

Figure 2

Genetic variance in HRS based on SNP block ranking derived from GIANT summary association statistics. Three methods were tested to rank SNP blocks: LARGV (red), LD Score (blue) and multi-SNP locus-association (green). The median SNP block size was 250 Kb (i.e. 85–95 SNPs). Genetic variance was calculated in HRS using variance component models for BMI (a) and height (b).

A relatively small proportion of blocks contributed disproportionately to genetic variance. The top 25% SNP blocks explained 0.94 of BMI genetic variance (i.e. = 0.132/0.140) and 0.83 of height genetic variance when using LARGV on GIANT data to rank the genetic variance of each SNP block. These results could potentially be explained by the presence of one or more very strong associations in each of these top SNP blocks. To explore this possibility, we recorded the minimum univariate association SNP p-value for each block in both GIANT and HRS (Fig. 3). Median minimum univariate p-value was 2.0 × 10−2 for BMI in GIANT, with 1% of blocks having one or more genome-wide significant associations (p < 5 × 10−8). On the other hand, the median minimum p-value was 1.9 × 10−3 for height in GIANT, with 9% of blocks having one or more genome-wide significant associations. The median minimum univariate p-values were 1.8 × 10−2 and 1.7 × 10−2 for BMI and height in HRS. No SNP reached genome-wide significance in HRS.

Figure 3. Minimum univariate SNP association p-value for each SNP block.

Figure 3

SNP block ranks are based on GIANT data using our novel approach while the minimum univariate SNP association p-values were taken from GIANT (a,c) or calculated in HRS (b,d) for BMI (a,b) and height (c,d). The median SNP block size was 250 Kb (i.e. 85–95 SNPs).

Analysis of known linkage peaks

A unique application of our method is the estimation of genetic variance over extended genomic regions using summary association statistics. We therefore tested the hypothesis that previously identified linkage peaks are enriched for regional associations. Based on results from the largest linkage study of height and BMI, three peaks with suggestive (LOD>2.0) evidence of linkage in Europeans were identified, all for height21. The only peak with LOD > 3.0 showed a significant (p = 0.002) enrichment in regional association within a distance of +/−7.5 Mb from the linkage marker, corresponding to an estimated excess regional genetic variance of 0.0044 over the genome-wide average (Table 2). Upon a closer inspection, the region encompasses several sub-regions with genome-wide significant associations.

Table 2. Excess regional genetic variance at three suggestive (LOD > 2.0) linkage peaks for height using GIANT summary association statistics.

Chr. Peak Marker LOD Score Excess genetic variance 95% CI Upper limit 95% CI lower limit p-value
11 D11S2000 2.74 −0.0021 0.0002 −0.0044 0.07
12 D12S1301 2.07 −0.0005 0.0014 −0.0024 0.61
15 D15S655 3.00 0.0044 0.0072 0.0017 0.002

Regional genetic variance was calculated within +/−7.5 Mb of each peak marker and compared to genome-wide average for regions of equivalent size. P-values are two-sided.

Discussion

We propose a novel method to estimate regional genetic variance from summary association statistics. Using this method, we confirmed a major role of regional associations in complex trait heritability, whereby the aggregation of genetic associations contributes disproportionately to phenotypic variance. Selecting the top SNP blocks from the GIANT meta-analysis, we showed that 25% of the genome is responsible for up to 0.94 and 0.83 of BMI and height genetic variance. A large proportion of these blocks had unremarkable minimum univariate p-values, suggesting the presence of multiple weak associations underlies their impact on phenotypic variance, especially for BMI. The concentration of genetic associations within these regions supports the existence of critical nodes in the genetic regulation of complex traits such as height and BMI, with implications not only for association testing but also for population genetics and natural selection. These results also suggest that a combination of strong genetic associations and regional associations contribute to complex traits variance, with the relative proportions varying across traits. For instance, a higher proportion of genetic variance was found in the top 25% blocks for BMI yet these blocks had less significant minimum univariate p-value than height.

Our method can also be used to estimate the genetic variance explained by extended regions. We therefore tested the hypothesis that some of the previously identified linkage peaks are the result of regional associations. The only known linkage peak with LOD > 3.0 for height showed a marked and significant enrichment in regional association. This region had been previously identified in multiple linkage studies21,22. Genetic variance explained by the region was estimated at 0.0044, which is unlikely to explain the linkage peak by itself. Nonetheless, the juxtaposition of linkage and regional associations points towards concentration of functional variants as a potential explanation for the observed linkage. The lack of regional association at other peaks can be explained by false-positive linkage results, rare variant associations undetected by genome-wide association studies or by differences in genetic architecture between studied populations.

Our proposed method has several advantages and is complementary to other methods. First, it provides results that are highly consistent with the widely used variance component models requiring individual-level data. Second, it is computationally straightforward and can be applied to large datasets. Third, it is agnostic and therefore complementary to functional annotations of the genome. Fourth, since SNPs are not pruned by either LD or significance, every SNP block or region can be evaluated.

A few limitations are worth mentioning. First, the assumption that SNPs contribute to genetic variance without any interaction or haplotype effects can lead to an underestimation of genetic variance. While our estimates were consistent with variance component models, there is a need for statistical models that better capture genetic variance when strong haplotype effects are expected23. Second, estimation of overall genome-wide genetic variance using summary association statistics is dependent on both the accuracy of SNP effect size estimates and the correct specification of LD structure. For instance, differences in adjustment for population stratification (e.g. principal components) in individual studies that participate in a meta-analysis could potentially influence the results. Ideally, LD data would come from the same population used to derived the summary association statistics, which could be included as summary statistics for each SNP (i.e. ηd, see Methods). While this information is currently not available, consistent results were observed when using LD data from either HRS or 1000G, demonstrating the robustness of our approach to LD misspecification. Third, we have only tested continuous traits. Nevertheless, our method can be easily adapted to other outcome types through the use of generalized linear models.

In this report, we establish a novel method to estimate the regional contribution of common variants to complex traits variance using summary association statistics. Our method has several applications, such as ranking of genetic regions for genetic variance and identification of key regions contributing to genetic variance. Our method can also be used to perform network analysis using summary association statistics, or to combine summary association statistics with other types of genetic annotations as we have shown with linkage results.

Methods

Methods overview

We have previously shown that large-region joint association, where multiple genetic variants are included as independent variables in a linear model, is a simple and powerful way to test for regional associations when individual-level data are available4. However, such regional joint associations are not easily amenable to estimation using summary data because of sensitivity to linkage disequilibrium. To evaluate the contribution of large regions joint associations to the genetic variance of complex traits, we first devised an algorithm to divide the genome in blocks of SNPs in such a way to minimize inter-block linkage disequilibrium and thus “spillage” associations. We then derived a method to estimate regional associations using summary data and showed this method to be equivalent to a multiple linear regression model when genetic effects are strictly additive (i.e. no haplotype or interaction effect). Using regional variance estimates from summary association statistics for height and BMI from the Genetic Investigation of Anthropometric Traits (GIANT) consortium, we estimated the distribution of genetic effects across the genome and validated results in the Health Retirement Study (HRS), which is not part of GIANT. A user-friendly software is available to produce SNP blocks and LD adjustment upon request at pareg@mcmaster.ca.

Dividing the genome into SNP blocks

We first divided the genome into regions of contiguous SNPs varying in size (e.g. from 195 SNPs to 205 SNPs), herein referred to as SNP blocks and used as units for regional associations. To minimize inter-block LD and thus “spillage” associations, we devised a greedy algorithm optimizing the choice of block boundary sequentially from one end of a chromosome to the other. Briefly, using a user-defined minimum and maximum block size (in number of SNPs) and starting at one end of a chromosome arm, each possible “cut-point” between the first and second block are tested and maximal LD (r2) between pairs of SNPs crossing block boundary is calculated. The cut-point that minimizes the maximal LD is chosen, thus defining the first block, and the procedure is repeated for each subsequent block until all SNPs on a chromosome arm have been assigned to a block. We empirically determined that SNP blocks of size 85 SNPs to 95 SNPs (median 90 SNPs) had a median physical size of 250 Kb.

Estimating the contribution of regional genetic associations with individual-level genotypes

The use of adjusted R2 lends itself nicely to estimation of regional variance explained when individual-level genotypes are available4. In this context, SNPs comprised in a given SNP block are included as independent variables in a multiple linear regression model and the goodness of fit statistic, adjusted R2 calculated. Because the adjusted R2accounts for the number of SNPs included in each block, the expected adjusted R2 is zero under the null hypothesis of no association and the expected sum of adjusted R2 over all SNP blocks is also zero. The overall contribution of regional associations to complex traits variance can be estimated by simply summing the adjusted R2 over all (or selected) SNP blocks. Furthermore, the distribution of adjusted R2 under the null hypothesis of no association has been previously described24 and can be used to derive the distribution of the sum of adjusted R2.

Estimating the contribution of regional genetic associations with summary association statistics

Estimating the contribution of regional genetic associations from summary association statistics is challenging when the exact SNP linkage disequilibrium structure of source populations is unknown. While approaches have been described to perform joint or conditional associations23,25 using estimated SNP covariance matrices, they do not perform well when estimating regional variance explained because of sensitivity to misspecification of linkage disequilibrium (data not shown) and ensuing an overestimation of regional associations. We therefore created a simple procedure to estimate regional variance explained from summary association statistics data, adjusting for linkage disequilibrium.

Without loss of generalizability, we assume a quantitative trait (Y) standard normally distributed and genotypes normalized to have mean = 0 and standard deviation = 1 throughout. Given an n × m genotype matrix X representing genotypes at m SNPs in n individuals and the pairwise linkage disequilibrium (r2) between two SNPs k and l as r2k,l, for a SNP d, the following LD adjustment (ηd) can be defined as the summation of LD between the dth SNP and 100 SNPs upstream and downstream:

graphic file with name srep27644-m1.jpg

with a distance of 100 SNPs assumed sufficient to ensure linkage equilibrium (other values might be used). Only including SNPs with summary GWAS statistics in the sum, variance explained by each SNP d is given by:

graphic file with name srep27644-m2.jpg

where bd denotes the univariate regression coefficient commonly reported in GWAS results (assuming genotypes have been standardized to have mean zero and SD = 1). Regional variance explained is then given by the sum of Inline graphic over SNPs in a given region. Assuming a strictly additive genetic model where each SNP contributes additively to a trait without any interaction or haplotype effects, we demonstrate the expected total variance explained over a region Inline graphic is approximately equal to the expected value of the multiple linear regression variance explained Inline graphic when the sample size is sufficiently large.

To simplify the calculation, we define D such that Inline graphic is an m by m symmetric matrix, where Dk is a m × 1 vector whose entries represent the kth column of D. We will make use of the following properties:

graphic file with name srep27644-m7.jpg
graphic file with name srep27644-m8.jpg
graphic file with name srep27644-m9.jpg

where N is the total number of individuals in the sample.

Estimation of regional genetic variance with multiple linear regression models

Suppose the genotype matrix is fixed while the true genetic effect is a random vector β, whose individual components, i.e. the SNPs, i = 1, 2, …, m, have mean zero and variance σ2. The size of the variance σ2 is on the scale of M1, where M is the number of genome-wide SNPs. The genetic model can be expressed as:

graphic file with name srep27644-m10.jpg

where ε is a vector of standard normal error with identity variance covariance matrix. Then, the vector of estimated multiple linear regression coefficients B is given by:

graphic file with name srep27644-m11.jpg

The multivariate variance explained R2 can be written in terms of the true effect and the error term:

graphic file with name srep27644-m12.jpg

and since the error term has identity variance covariance and the true effect β has variance covariance matrix σ2I, the expected variance explained is simplified to

graphic file with name srep27644-m13.jpg

and the variance can be calculated accordingly:

graphic file with name srep27644-m14.jpg

Estimation of regional genetic variance using summary association statistics

The univariate regression coefficients, denoted by lower case b and directly obtained from GWAS summary statistics, are given by

graphic file with name srep27644-m15.jpg

The total variance explained over a region Inline graphic can be calculated using only the univariate regression coefficients from GWAS:

graphic file with name srep27644-m17.jpg

We can rewrite the expression by defining:

graphic file with name srep27644-m18.jpg

Thus, we have (Equation 10):

graphic file with name srep27644-m19.jpg

Since Inline graphic as LD tends to be weak 100 SNPs upstream and downstream away from the index SNP, we can simplify the expected total variance explained using the summary statistics to (Equation 11):

graphic file with name srep27644-m21.jpg

The variance of Inline graphic can be similarly derived. By using the cyclic property of trace, and the fact that DΛD is a positive definite matrix, an upper bound for the variance can be expressed as (Equation 12):

graphic file with name srep27644-m23.jpg

where Inline graphic and both terms are small with respect to Inline graphic.

Comparison of regional genetic variance estimated using multiple regression models and summary association statistics

The expected values are equivalent between multiple linear regression model Inline graphic and the summary statistics derived regional sum (Inline graphic), with the number of SNPs m replaced by the “effective” number of genetic markers Inline graphic. Variance is slightly bigger for multiple linear regression models Inline graphic as compared to regional sum Inline graphic because the “effective” number of genetic markers Inline graphic is always equal (no LD) or less than the number of markers (m). In other words, Inline graphic is expected to be equal (no LD) or lower than the corresponding Inline graphic.

Health Retirement Study

We conducted large region joint association analysis for height using genome-wide data from the publicly available Health Retirement Study (HRS; dbGaP Study Accession: phs000428.v1.p1). HRS quality control criteria were used for filtering of both genotype and phenotype data, namely: (1) SNPs and individuals with missingness higher than 2% were excluded, (2) related individuals were excluded, (3) only participants with self-reported European ancestry genetically confirmed by principal component analysis were included, (4) SNPs with Hardy-Weinberg equilibrium p < 1 × 10−6 were excluded, (5) individuals for whom the reported sex does not match their genetic sex were excluded, (6) SNPs with minor allele frequency lower than 0.02 were removed. The final dataset included 7,776 European participants genotyped for 740,748 SNPs. Height and BMI was adjusted for age and sex in all analyses. To mitigate the effect of outliers, we have removed values outside the 1st and 99th percentile range for each of height and BMI. All analyses are adjusted for the first 20 genetic principal components unless stated otherwise. All LD estimates used throughout the manuscript were derived from HRS genotypes. HRS was not part of the GIANT meta-analysis of height and BMI26,27.

Additional Information

How to cite this article: Pare, G. et al. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics. Sci. Rep. 6, 27644; doi: 10.1038/srep27644 (2016).

Supplementary Material

Supplementary Information
srep27644-s1.pdf (513.9KB, pdf)

Footnotes

Author Contributions G.P. designed the experiment; G.P. and W.Q.D. wrote the manuscript; S.M. analyzed the data and prepared tables and figures; All authors reviewed the manuscript.

References

  1. Yang J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42, 565–569, doi: 10.1038/ng.608 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Davies G. et al. Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N = 53949). Mol Psychiatry 20, 183–192, doi: 10.1038/mp.2014.188 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Stahl E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature genetics 44, 483–489, doi: 10.1038/ng.2232 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Pare G., Asma S. & Deng W. Q. Contribution of large region joint associations to complex traits genetics. PLoS Genet 11, e1005103, doi: 10.1371/journal.pgen.1005103 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Beyene J., Tritchler D., Asimit J. L. & Hamid J. S. Gene- or region-based analysis of genome-wide association studies. Genet Epidemiol 33 Suppl 1, S105–110, doi: 10.1002/gepi.20481 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Gusev A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet 9, e1003993, doi: 10.1371/journal.pgen.1003993 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Cheung V. G. & Spielman R. S. Genetics of human gene expression: mapping DNA variants that influence gene expression. 10, 595–604, doi: 10.1038/nrg2630 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Consortium T. E. P. An integrated encyclopedia of DNA elements in the human genome. 489, 57–74, doi: 10.1038/nature11247 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Speed D., Hemani G., Johnson M. R. & Balding D. J. Improved heritability estimation from genome-wide SNPs. Am J Hum Genet 91, 1011–1021, doi: 10.1016/j.ajhg.2012.10.010 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Loh P.-R. et al. Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis. bioRxiv (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bulik-Sullivan B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295, doi: 10.1038/ng.3211 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Bulik-Sullivan B. Relationship between LD Score and Haseman-Elston Regression. bioRxiv (2015). [Google Scholar]
  13. Ehret G. B. et al. A multi-SNP locus-association method reveals a substantial fraction of the missing heritability. Am J Hum Genet 91, 863–871, doi: 10.1016/j.ajhg.2012.09.013 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Palla L. & Dudbridge F. A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait. Am J Hum Genet 97, 250–259, doi: 10.1016/j.ajhg.2015.06.005 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. So H. C., Li M. & Sham P. C. Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet Epidemiol 35, 447–456, doi: 10.1002/gepi.20593 (2011). [DOI] [PubMed] [Google Scholar]
  16. Genomes Project C. et al. A global reference for human genetic variation. Nature 526, 68–74, doi: 10.1038/nature15393 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Sonnega A. et al. Cohort Profile: the Health and Retirement Study (HRS). Int J Epidemiol 43, 576–585, doi: 10.1093/ije/dyu067 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Yang J., Lee S. H., Goddard M. E. & Visscher P. M. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88, 76–82, doi: 10.1016/j.ajhg.2010.11.011 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Locke A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206, doi: 10.1038/nature14177 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Wood A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature genetics 46, 1173–1186, doi: 10.1038/ng.3097 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Sammalisto S. et al. Genome-wide linkage screen for stature and body mass index in 3.032 families: evidence for sex- and population-specific genetic effects. Eur J Hum Genet 17, 258–266, doi: 10.1038/ejhg.2008.152 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Perola M. et al. Combined genome scans for body stature in 6,602 European twins: evidence for common Caucasian loci. PLoS Genet 3, e97, doi: 10.1371/journal.pgen.0030097 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Vilhjalmsson B. et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. bioRxiv (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Ohtani K. & Tanizaki H. Exact Distributions of R2 and Adjusted R2 in a Linear Regression Model with Multivariate Error Terms. Journal Of The Japan Statistical Society 34, 101–109, doi: 10.14490/jjss.34.101 (2004). [DOI] [Google Scholar]
  25. Yang J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44, 369–375, S361–363, doi: 10.1038/ng.2213 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Berndt S. I. et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat Genet 45, 501–512, doi: 10.1038/ng.2606 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Lango Allen H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838, doi: 10.1038/nature09410 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information
srep27644-s1.pdf (513.9KB, pdf)

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group

RESOURCES