Golan et al. (1) report that restricted maximum likelihood (REML) seriously underestimates SNP heritability when using a case–control design. Their conclusions are based on results from simplified linkage equilibrium SNP simulation (SLES), which the authors acknowledge may be unrealistic.
We simulated case–control data using the liability threshold model (1, 2), based on a real genome-wide association study (GWAS) of 800,000 SNPs from 64,000 samples, i.e., a genome-wide linkage disequilibrium SNP simulation (GLDS). Our simulation used a population disease risk of K = 0.01 and a proportion of cases in the sample of P = 0.5 (therefore, there were 640 cases and 640 controls in the estimation analyses). We randomly selected 10,000, 1,000, or 100 SNPs across the genome as risk loci. The genomic relationship matrix (GRM) was based on all of the SNPs. For comparison, we used SLES (without real GWAS data), following Golan et al. (1), in which the GRM was calculated only from the risk SNPs, which are independent of each other.
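The liability threshold step and the case–control ascertainment described above can be sketched as follows. This is a minimal illustration, not the simulation code used for this letter: it draws independent SNPs (so it is SLES-like rather than GLDS, which requires real genotypes), treats every SNP as a risk locus, and the function name, the allele-frequency range, and the normally distributed effect sizes are all assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist


def simulate_case_control(n_pop, n_snps, h2, K, rng):
    """Simulate an ascertained case-control sample under a liability threshold model.

    Assumption: SNPs are independent (no linkage disequilibrium), so this is
    SLES-like toy data, not a GLDS based on real genotypes.
    """
    # Hypothetical allele-frequency range; not taken from the source.
    p = rng.uniform(0.05, 0.5, size=n_snps)
    genotypes = rng.binomial(2, p, size=(n_pop, n_snps)).astype(float)
    # Standardize each SNP to mean 0 and variance 1 using its allele frequency.
    Z = (genotypes - 2 * p) / np.sqrt(2 * p * (1 - p))

    # Every SNP is a risk locus; effects are scaled so Var(genetic value) ~ h2.
    beta = rng.normal(0.0, np.sqrt(h2 / n_snps), size=n_snps)
    liability = Z @ beta + rng.normal(0.0, np.sqrt(1 - h2), size=n_pop)

    # Individuals whose liability exceeds the threshold for prevalence K are cases.
    threshold = NormalDist().inv_cdf(1 - K)
    affected = liability > threshold

    # Ascertain: keep all cases plus an equal number of random controls (P = 0.5).
    cases = np.flatnonzero(affected)
    controls = rng.choice(np.flatnonzero(~affected), size=cases.size, replace=False)
    keep = np.concatenate([cases, controls])
    return Z[keep], affected[keep].astype(int)
```

Calling `simulate_case_control(64_000, 1_000, 0.5, 0.01, np.random.default_rng(0))` would mirror the population size and prevalence quoted above, yielding roughly 640 cases and the same number of controls; the heritability of 0.5 is an arbitrary choice for the example.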
In Fig. 1A, we show that SLES unrealistically inflates the correlation between the eigenvectors of the GRM and disease status compared with GLDS (Fig. 1B) or with that inferred from real data [e.g., figure S1 in the study by Gusev et al. (3)]. This artifactual correlation between the eigenvectors and disease status caused the inaccuracy of the REML estimates. The bias depends on the ratio of the number of individuals (N) to the number of risk SNPs (M) (Fig. 1). Unlike REML, a sophisticated approach, Haseman–Elston regression [referred to as phenotype correlation–genotype correlation (PCGC) by Golan et al. (1)] does not use the eigensystem of the covariance structure; therefore, SLES does not affect the PCGC estimate (1). With GLDS, the REML estimates were stable and close to the true value regardless of the value of N/M (Fig. 2A). With SLES, the bias in the REML estimates became more severe as N/M increased (Fig. 2A).
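The two quantities contrasted in this paragraph can be illustrated with a short sketch: the correlation of each GRM eigenvector with the trait, and a Haseman–Elston estimate that uses only the off-diagonal GRM entries and never touches the eigensystem. The function names are hypothetical, the regression is fitted through the origin on phenotype cross-products (one common variant of Haseman–Elston regression), and no ascertainment correction of the kind used in PCGC is applied, so the slope is on the observed scale.

```python
import numpy as np


def grm(Z):
    """Genomic relationship matrix from an N x M matrix of standardized genotypes."""
    return Z @ Z.T / Z.shape[1]


def eigvec_trait_correlation(A, y):
    """Pearson correlation between each eigenvector of A and the trait y.

    Entries follow the order returned by numpy.linalg.eigh (ascending eigenvalues).
    """
    _, vecs = np.linalg.eigh(A)                  # columns are eigenvectors
    vecs = vecs - vecs.mean(axis=0)              # center each eigenvector
    vecs = vecs / np.linalg.norm(vecs, axis=0)   # scale to unit length
    yc = y - y.mean()
    yc = yc / np.linalg.norm(yc)
    return vecs.T @ yc


def haseman_elston(A, y):
    """Slope of standardized phenotype cross-products on off-diagonal GRM entries.

    Assumption: regression through the origin, no ascertainment correction.
    """
    y = (y - y.mean()) / y.std()
    upper = np.triu_indices_from(A, k=1)         # each pair of individuals once
    a = A[upper]
    products = np.outer(y, y)[upper]
    return float(a @ products / (a @ a))
```

With standardized genotypes `Z` and disease status `y`, `eigvec_trait_correlation(grm(Z), y)` gives the kind of per-eigenvector correlation plotted in Fig. 1, whereas `haseman_elston(grm(Z), y)` depends only on pairwise relationships.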
We considered results from real data analyses (3) and plotted published SNP heritability estimates against sample size for nine diseases (Fig. 2B). There was no difference between the REML and PCGC estimates, regardless of sample size, in striking contrast to figure 2B in the study by Golan et al. (1). We also show estimation errors for the nine diseases, assuming that the PCGC estimates are the true values (Fig. 2C); these were again dramatically different from the results in figure S4 of Golan et al. (1).
In deriving the correction factor for case–control ascertainment bias, Lee et al. (2) used a simulation from a multivariate normal distribution based on a predefined relationship matrix. In real data analyses, the true relationship matrix is not known but can be approximated from genotypes. The pairwise GRM estimator is unbiased under linkage disequilibrium; that is, the expectation of the estimator for each SNP is the kinship in the identical-by-descent (IBD) fraction sense (4), and therefore so is the expectation of the estimate averaged over multiple SNPs. SLES ignores the concepts of linkage disequilibrium, IBD, and coalescence. We urge researchers to use a more realistic genetic model (e.g., GLDS at least) in their simulation strategies and to be cautious of results drawn from SLES (1, 5).
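For readers unfamiliar with the correction factor for case–control ascertainment mentioned above, the sketch below implements the transformation in the form commonly attributed to Lee et al. (2): the observed-scale estimate is multiplied by K²(1 − K)² / [z² P(1 − P)], where z is the height of the standard normal density at the liability threshold for prevalence K. The formula is not reproduced in this letter, and the function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, K, P):
    """Convert an observed-scale SNP heritability to the liability scale.

    K: population prevalence; P: proportion of cases in the sample.
    Assumes the transformation usually attributed to Lee et al. (2011).
    """
    nd = NormalDist()
    t = nd.inv_cdf(1 - K)      # liability threshold for prevalence K
    z = nd.pdf(t)              # standard normal density at the threshold
    return h2_obs * (K * (1 - K)) ** 2 / (z ** 2 * P * (1 - P))
```

With the design used here (K = 0.01, P = 0.5), `observed_to_liability(h2_obs, 0.01, 0.5)` rescales the observed-scale estimate by a factor of roughly 0.55.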
Footnotes
The author declares no conflict of interest.
References
- 1. Golan D, Lander ES, Rosset S. Measuring missing heritability: Inferring the contribution of common variants. Proc Natl Acad Sci USA. 2014;111(49):E5272–E5281. doi: 10.1073/pnas.1419064111.
- 2. Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88(3):294–305. doi: 10.1016/j.ajhg.2011.02.002.
- 3. Gusev A, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium; SWE-SCZ Consortium. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet. 2014;95(5):535–552. doi: 10.1016/j.ajhg.2014.10.004.
- 4. Thompson EA. Identity by descent: Variation in meiosis, across genomes, and in populations. Genetics. 2013;194(2):301–326. doi: 10.1534/genetics.112.148825.
- 5. Chen G-B. Estimating heritability of complex traits from genome-wide association studies using IBS-based Haseman–Elston regression. Front Genet. 2014;5:107. doi: 10.3389/fgene.2014.00107.