In a recent publication in PNAS, Krishna Kumar et al. (1) claim that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” We show below that those claims are false due to their misunderstanding of the theory and practice of random-effect models underlying genome-wide complex trait analysis (GCTA) (2).
GCTA, more precisely, the genomic-relatedness-based restricted maximum-likelihood (GREML) approach (3) implemented in GCTA (4), is a method to estimate the proportion of phenotypic variation that can be explained by all genome-wide SNPs ($h^2_{\mathrm{SNP}}$) using an SNP-derived genetic relationship matrix. Krishna Kumar et al. (1) claim that the estimate of $h^2_{\mathrm{SNP}}$ from GCTA-GREML is unreliable, based on the observation that the observed variance explained per SNP ($\hat{\sigma}^2_u = \hat{h}^2_{\mathrm{SNP}}/m$, where m is the number of SNPs) from simulations is inconsistent with their expectation. This is because they wrongly state that “GCTA assumes that the SNPs used are in linkage equilibrium” (ref. 1, p. 2), and mistakenly believe that $\sigma^2_u$ should be the same regardless of the number of SNPs fitted in the model, in both their original paper (1) and their subsequent response (5) to our commentary (2). In fact, GREML fits all of the SNPs jointly in a random-effect model so that each SNP effect is fitted conditioning on the joint effects of all of the other SNPs [i.e., it accounts for linkage disequilibrium (LD) between the SNPs] (3). The estimate of $\sigma^2_u$ in a random-effect model is interpreted as the variance of an SNP effect when it is fitted jointly with all of the other SNPs. Therefore, $\sigma^2_u$ for a random subset of SNPs ($\sigma^2_{u(\mathrm{subset})}$) is larger than that for the entire set ($\sigma^2_{u(\mathrm{all})}$) if SNPs are in LD.
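The dependence of the per-SNP variance on the number of fitted SNPs can be illustrated with a deliberately extreme toy case, sketched below. It assumes perfect LD: every SNP in the "full" set is an exact copy of a SNP in the "subset". Under that assumption the two genetic relationship matrices coincide, so a single-component GREML analysis would return the same estimate of SNP heritability for both sets, and dividing that estimate by m gives a per-SNP variance that is k times larger for the subset. The sample size, block count, copy number k, and the heritability value 0.5 are hypothetical choices for illustration only, and the helper names are not part of GCTA.

```python
import random
import statistics

def standardize(column):
    """Scale one SNP's genotype counts to mean 0 and variance 1."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]

def grm(snps):
    """Genetic relationship matrix A = Z Z' / m from m standardized SNP columns."""
    m, n = len(snps), len(snps[0])
    return [[sum(z[i] * z[j] for z in snps) / m for j in range(n)]
            for i in range(n)]

rng = random.Random(0)
n, blocks, k = 30, 40, 4  # hypothetical: individuals, LD blocks, SNPs per block

# Subset: one SNP per block, genotype = allele count with allele frequency 0.5.
subset = []
while len(subset) < blocks:
    column = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
    if len(set(column)) > 1:  # skip monomorphic SNPs, which cannot be standardized
        subset.append(standardize(column))

# Full set: each SNP copied k times, i.e. perfect LD within every block.
full = [z for z in subset for _ in range(k)]

a_subset, a_full = grm(subset), grm(full)
max_diff = max(abs(x - y) for row_s, row_f in zip(a_subset, a_full)
               for x, y in zip(row_s, row_f))

h2_snp = 0.5  # hypothetical estimate; the same for both sets because the GRMs match
var_per_snp_subset = h2_snp / len(subset)
var_per_snp_full = h2_snp / len(full)
ratio = var_per_snp_subset / var_per_snp_full

print(f"largest GRM difference: {max_diff:.1e}")
print(f"per-SNP variance ratio (subset / full): {ratio:.1f}")
```

Real data lie between this extreme and the case of fully unlinked SNPs, so the ratio is smaller than k in practice, but the direction of the effect is the same.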
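Krishna Kumar et al. (1) show by analysis of a real dataset (ref. 1, figure 4A) that the estimated per-SNP variance for a random subset of SNPs, $\hat{\sigma}^2_{u(\mathrm{subset})}$, was, on average, much larger than that for the entire set, $\hat{\sigma}^2_{u(\mathrm{all})}$. They further used the estimates from our previous studies (3, 6) as examples to show that $\hat{\sigma}^2_u$ with a smaller m was larger than $\hat{\sigma}^2_u$ with a larger m (5). All of these observations are entirely consistent with published theory that $\sigma^2_u$ for a random subset of SNPs is larger than $\sigma^2_{u(\mathrm{all})}$ if SNPs are in LD. It was clearly demonstrated by Yang et al. (figure 2 of ref. 3) that $\hat{h}^2_{\mathrm{SNP}}$ increases toward a plateau as m increases.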
From simulations of unlinked SNPs (figure 2 in ref. 1), Krishna Kumar et al. (1) observed that $\mathrm{SD}(\hat{\sigma}^2_{u(\mathrm{subset})})$ was much larger than $\mathrm{SD}(\hat{\sigma}^2_{u(\mathrm{all})})$. Their claim that this is a failure of GCTA-GREML is incorrect, because $\mathrm{SD}(\hat{\sigma}^2_u)$ is expected to increase as m decreases. If SNPs are unlinked, $\mathrm{SD}(\hat{\sigma}^2_u) \approx \sqrt{2/m}/n$, where n = sample size (2, 7). For n = 2,000 and m = 50,000, this equation predicts that $\mathrm{SD}(\hat{\sigma}^2_u) \approx 3.2 \times 10^{-6}$, which is highly consistent with the observation by Krishna Kumar et al. (1) of $3.1 \times 10^{-6}$.
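The predicted value quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the approximation var($\hat{h}^2_{\mathrm{SNP}}$) ≈ 2/(n² var(A_ij)) for unrelated individuals, with var(A_ij) = 1/m for m unlinked SNPs, and a phenotype scaled to unit variance so that the per-SNP variance is the SNP heritability divided by m. The function name is hypothetical.

```python
import math

def predicted_sd_per_snp_variance(n: int, m: int) -> float:
    """Approximate SD of the estimated per-SNP variance for m unlinked SNPs.

    Assumes var(h2_hat) ~= 2 / (n^2 * var(A_ij)) with var(A_ij) = 1 / m,
    and a per-SNP variance defined as h2_hat / m.
    """
    if n <= 0 or m <= 0:
        raise ValueError("n and m must be positive")
    sd_h2 = math.sqrt(2.0 * m) / n  # SD of the SNP-heritability estimate
    return sd_h2 / m                # equals sqrt(2 / m) / n

sd = predicted_sd_per_snp_variance(n=2000, m=50000)
print(f"{sd:.2e}")  # 3.16e-06
```

Because the result scales as 1/sqrt(m) for fixed n, a tenfold smaller SNP set gives a predicted SD roughly 3.2 times larger, which is the direction of the difference seen in the simulations.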
There are many other errors in the paper by Krishna Kumar et al. (1), as pointed out by us (2) and others (8). In conclusion, Krishna Kumar et al. (1, 5) misunderstood the model and assumptions underlying GCTA-GREML, and therefore used an incorrect expected mean and SD of the estimated per-SNP variance, $\hat{\sigma}^2_u$, for comparison with the values observed from resampling. Hence, their conclusion that GREML estimates are biased is not supported by empirical evidence.
Acknowledgments
We thank Bill Hill, Alkes Price, John Witte, and Mark Blows for comments and support. This research was supported by the Australian National Health and Medical Research Council (Grants 1078037 and 1078901) and the Sylvia & Charles Viertel Charitable Foundation.
Footnotes
The authors declare no conflict of interest.
References
- 1. Krishna Kumar S, Feldman MW, Rehkopf DH, Tuljapurkar S. Limitations of GCTA as a solution to the missing heritability problem. Proc Natl Acad Sci USA. 2016;113(1):E61–E70. doi: 10.1073/pnas.1520109113.
- 2. Yang J, Lee SH, Wray NR, Goddard ME, Visscher PM. Commentary on “Limitations of GCTA as a solution to the missing heritability problem.” bioRxiv. 2016. doi: 10.1101/036574.
- 3. Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–569. doi: 10.1038/ng.608.
- 4. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82. doi: 10.1016/j.ajhg.2010.11.011.
- 5. Kumar SK, Feldman MW, Rehkopf DH, Tuljapurkar S. Response to Commentary on “Limitations of GCTA as a solution to the missing heritability problem.” bioRxiv. 2016. doi: 10.1101/039594.
- 6. Yang J, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet. 2011;43(6):519–525. doi: 10.1038/ng.823.
- 7. Visscher PM, et al. Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet. 2014;10(4):e1004269. doi: 10.1371/journal.pgen.1004269.
- 8. Gamazon ER, Park DS. SNP-based heritability estimation: Measurement noise, population stratification and stability. bioRxiv. 2016. doi: 10.1101/040055.