We thank de los Campos and Sorensen (D&S) for their Correspondence, which follows their recent work (ref. 1). D&S agree that maximum prediction accuracy depends on h2M, defined as the variance explained by genotyped markers in the population. They claim that estimates of h2M in a finite sample (h2G-BLUP or h2G) may overestimate h2M, and that this is exacerbated for unrelated individuals. We respond by showing how and why we disagree with these claims.
h2G and h2G-BLUP are estimates of the same parameter from equivalent models (refs 3–6) and so, for the same dataset, they must have the same value. Both measure the proportion of the phenotypic variance that is explained by the markers. This proportion depends on linkage disequilibrium (LD) between the SNPs and the causal variants or quantitative trait loci (QTL). If the LD is imperfect, then h2M will be less than the conventional heritability, h2, which is the proportion of variance explained by all causal variants. The extent of LD depends on the relatedness of the individuals in the sample. If closely related individuals are included, long-range LD is generated even between SNPs and QTL on different chromosomes. Thus, inclusion of close relatives increases h2M and its estimates. Usually, the parameter we wish to estimate is the h2M among individuals who are no more closely related than randomly sampled individuals from the population (e.g. ref. 7).
D&S state that the accuracy of prediction R2TST does not approach h2M even in an infinite sample. This is incorrect. R2TST depends on two factors – h2M and the accuracy with which the marker effects are estimated (refs 3, 8). If the marker effects are estimated with no error, then R2TST = h2M. In practice, the accuracy of estimating SNP effects is usually low in humans, and this explains the low R2TST often reported. Ref. 1 claims that “the estimated h2G did not provide a good indication of prediction R2”. In their simulations of unrelated individuals (GEN cohort; h2=0.8), they state “when [non-causal] markers were used we observed only a small extent of missing heritability [h2G=0.737, vs. h2G=0.773 for causal markers] but the reduction in R2 due to use of markers that were in imperfect LD with causal loci was dramatic [R2=0.071, vs. R2=0.517 for causal markers]”. Although the number of causal loci was the same in both analyses, the number of markers differed: 300,000 markers, corresponding to M=60,000 independent markers (ref. 2), in the non-causal set vs. M=5,000 in the causal set. Equation (1) of ref. 2 demonstrates that R2 decreases with higher M (which increases the variance of the estimated genetic relationships).
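The dependence of R2 on M can be illustrated with a short numerical sketch. It assumes a commonly used approximation, R2 ≈ h2M × N·h2M / (N·h2M + M), where N is the size of the training sample; this form is an assumption made here for illustration and may differ in detail from Equation (1) of ref. 2. The function name `expected_r2` and the training sample size N = 5,000 are hypothetical and are not taken from ref. 1; the h2G and M values are those quoted above.

```python
def expected_r2(h2_m: float, n_train: float, m_eff: float) -> float:
    """Approximate out-of-sample prediction R2 (assumed formula, see text).

    h2_m    : proportion of phenotypic variance explained by the markers
    n_train : number of individuals in the training sample (N)
    m_eff   : effective number of independent markers (M)
    """
    if not 0.0 < h2_m <= 1.0:
        raise ValueError("h2_m must lie in (0, 1]")
    if n_train <= 0 or m_eff <= 0:
        raise ValueError("n_train and m_eff must be positive")
    # Squared accuracy of the estimated marker effects: tends to 1 as N grows
    # and falls as M grows.
    accuracy_sq = n_train * h2_m / (n_train * h2_m + m_eff)
    return h2_m * accuracy_sq


if __name__ == "__main__":
    n_hypothetical = 5_000  # assumed training size, for illustration only
    scenarios = {
        "causal markers (h2G=0.773, M=5,000)": (0.773, 5_000),
        "non-causal markers (h2G=0.737, M=60,000)": (0.737, 60_000),
    }
    for label, (h2_m, m_eff) in scenarios.items():
        r2 = expected_r2(h2_m, n_hypothetical, m_eff)
        print(f"{label}: approximate R2 = {r2:.3f}")
```

Under this assumed formula, R2 approaches h2M as N grows without bound, and for a fixed N it falls as M rises, even when h2M changes little. This is the qualitative pattern described above; the printed values depend on the hypothetical N and are not intended to reproduce the R2 values reported in ref. 1.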
D&S say that R2TST is zero if the training and testing datasets are independent. This is a distracting statement because individuals within a species are always related to some degree. D&S also question our focus on the prediction accuracy that can be obtained in an independent validation sample (ref. 2). We disagree with the opinion of D&S that the prediction accuracy that can be obtained in a non-independent validation sample is a quantity of equal interest.
References
- 1. de Los Campos G, Vazquez AI, Fernando R, Klimentidis YC, Sorensen D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 2013;9:e1003608. doi: 10.1371/journal.pgen.1003608.
- 2. Wray NR, Yang J, Hayes BJ, Price AL, Goddard ME, Visscher PM. Pitfalls of predicting complex traits from SNPs. Nat Rev Genet. 2013;14:507–515. doi: 10.1038/nrg3457.
- 3. Goddard ME, Wray NR, Verbyla KL, Visscher PM. Estimating effects and making predictions from genome-wide marker data. Statistical Science. 2009;24:517–529.
- 4. Habier D, Fernando RL, Dekkers JC. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–2397. doi: 10.1534/genetics.107.081190.
- 5. VanRaden PM. Efficient methods to compute genomic predictions. Journal of Dairy Science. 2008;91:4414–4423. doi: 10.3168/jds.2007-0980.
- 6. Goddard ME. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009;136:245–257. doi: 10.1007/s10709-008-9308-0.
- 7. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
- 8. Goddard ME, Hayes BJ, Meuwissen TH. Using the genomic relationship matrix to predict the accuracy of genomic selection. Journal of Animal Breeding and Genetics. 2011;128:409–421. doi: 10.1111/j.1439-0388.2011.00964.x.
- 9. Makowsky R, Pajewski NM, Klimentidis YC, Vazquez AI, Duarte CW, Allison DB, de Los Campos G. Beyond missing heritability: prediction of complex traits. PLoS Genet. 2011;7:e1002051. doi: 10.1371/journal.pgen.1002051.
- 10. Janss L, de Los Campos G, Sheehan N, Sorensen D. Inferences from genomic models in stratified populations. Genetics. 2012;192:693–704. doi: 10.1534/genetics.112.141143.
