Main Text
We thank Browning and Browning for examining how fine-scale population structure affects the variance explained when all SNPs are considered together, as in the methods we have proposed and implemented. Recently, we have taken the methodology further and have partitioned additive genetic variation across the genome.1 Browning and Browning investigate two sources of bias in estimates of the variance explained by SNPs (population stratification, and correlation between environment and genotype), but their examples refer mainly to the latter.
We agree with Browning and Browning that an environmental factor that is correlated with genotype and has a large effect on phenotype will bias estimates of the genetic variance. This is not a problem specifically for our methods but for all genetic studies, including pedigree studies and genome-wide association studies (GWASs). For example, common environmental effects within a family will bias traditional estimates of heritability. Our method uses very distantly related individuals, so we would expect that our estimates are less likely to be biased than estimates based on close relatives. Such a correlation between genotype and environmental effects on the trait would also bias individual SNP effects in GWASs. However, for many complex traits, including quantitative traits and disease, SNP effects are similar in different ethnic groups (e.g., for type 2 diabetes2 and height3). This could hardly be due to the same chance confounding in different countries. The estimates of genetic variance from our methods are just the combined effects of all the SNPs. It has also been shown that estimates from all SNPs have predictive power across countries.3 Yang et al.4 used people of British ancestry living in Australia. It seems most unlikely that there is some environmental factor that has a large effect and is correlated with the part of Britain from which their ancestors emigrated. Another example of environment-genotype correlation is when SNP genotypes are called differently between cases and controls. We have emphasized the importance of strict quality control (QC) to minimize this possibility. For example, in Lee et al.5 we caution in three different places that any artificial allele frequency difference between cases and controls will result in the estimation of spurious “genetic” variance.
Although genotype-environment correlation does bias the estimate of SNP effects, the large bias observed in the simulations of Browning and Browning relies on unrealistically large effects. Browning and Browning simulated a quantitative trait with mean values of 1, 2, and 3 for Scotland, England, and Wales, respectively, and a standard deviation of 0.4 within each region. In terms of human height (which is stratified across Europe and has a standard deviation of ∼7 cm), these parameters imply an average height difference between the Scots and English of ∼17 cm (1/0.4 standard deviations) and between the Scots and Welsh of ∼35 cm (2/0.4 standard deviations). For these simulations, which use the genotype data of the two WTCCC control samples in which Scots comprise 9% of the sample, English 86%, and Welsh 5%, the phenotypic variance is ∼0.30, and the variance due to population difference is 0.14, so the expected proportion of variance explained by population difference ($h_S^2$) is ∼47%. If we assume that the height difference between the Scots and Welsh is 2 cm (this might not reflect the truth but might be more realistic) and that the English fall somewhere between the Scots and Welsh, then $h_S^2$ = 0.3%, a small effect. Likewise, Browning and Browning considered some extreme disease-region correlations that generated spurious detection of variance explained by SNPs. However, for the example closest to the real regional distortions in the WTCCC bipolar disorder data set (75% of the Welsh and Scots and 30% of the English samples chosen as “cases”), the variance explained was only 4% and decreased to only ∼2% when the variance estimated by the null model was subtracted. This potential bias is small in comparison to the 38% of variance explained by SNPs for bipolar disorder, i.e., a relative bias of about 5% (2%/38%).
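The variance figures in this paragraph can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and for the 2 cm scenario it assumes that the English mean lies exactly midway between the Scottish and Welsh means, which the text does not specify.

```python
def variance_partition(weights, means, within_sd):
    """Split the total variance into between-region and within-region parts."""
    total_weight = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / total_weight
    total = between + within_sd ** 2
    return between, total, between / total

# Sample proportions quoted in the text: Scotland, England, Wales.
weights = [0.09, 0.86, 0.05]

# Simulated trait: regional means 1, 2, 3 and within-region SD 0.4.
between, total, h2_s = variance_partition(weights, [1.0, 2.0, 3.0], 0.4)
print(round(between, 2), round(total, 2), round(h2_s, 2))  # about 0.14, 0.30, 0.46

# Height scenario: 2 cm between Scots and Welsh, SD 7 cm within regions.
# Assumption: the English mean sits exactly halfway (0, 1, 2 cm).
_, _, h2_s_height = variance_partition(weights, [0.0, 1.0, 2.0], 7.0)
print(round(100 * h2_s_height, 1))  # about 0.3 (percent)
```

Under these assumptions the between-region share falls from nearly half of the phenotypic variance to roughly 0.3%, which is the contrast drawn above.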
Browning and Browning also simulated 200 generations of mating for an isolation-by-distance demographic model with a 50% chance of migration of progeny to adjacent regions. If we take their 5 × 5 grid axes as latitude and longitude, then for the quantitative trait the environmental gradient among the five latitude units was simulated to be one phenotypic standard deviation per unit. Of all phenotypic variation in the population, two-thirds was between latitudes. For human height this would be a gradient of 28 cm (four standard deviations) between the extreme latitudes. We performed the same simulation and used a random sample of 2000 individuals with a phenotype and the last five generations of pedigree information to estimate heritability, for a model that assumes that all family resemblance is due to additive genetic factors. Our estimate is based upon the correct identity-by-descent with respect to the base population of five generations in the past and does not utilize any SNP markers. Using restricted maximum likelihood, we estimated heritability to be at the upper boundary of 1.0. A least-squares estimate (Haseman-Elston regression) on the same data resulted in an estimated additive genetic variance that was more than four times the phenotypic variance. This example highlights again that the spurious estimation of “genetic” variance is due to a genotype-environment correlation and is not restricted to SNP-based estimates of genetic variation.
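The two-thirds figure and the 28 cm gradient can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not a reconstruction of the simulation itself: it assumes that the five latitudes are equally populated and that the standard deviation within a latitude equals one unit of the gradient, which is one reading of the description that recovers the quoted fraction. The function name is hypothetical.

```python
def between_group_fraction(means, within_sd):
    """Fraction of total variance lying between equally sized groups."""
    grand_mean = sum(means) / len(means)
    between = sum((m - grand_mean) ** 2 for m in means) / len(means)
    return between / (between + within_sd ** 2)

# Assumption: latitude means step by one within-latitude SD per unit.
latitude_means = [0, 1, 2, 3, 4]
fraction = between_group_fraction(latitude_means, within_sd=1.0)
print(round(fraction, 3))  # 0.667, i.e. two-thirds

# Translating the gradient to height with an SD of 7 cm.
height_sd_cm = 7.0
gradient_cm = (latitude_means[-1] - latitude_means[0]) * height_sd_cm
print(gradient_cm)  # 28.0
```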
Population stratification without an environment-genotype correlation can also bias estimates of SNP effects due to confounding between a SNP and unlinked causative polymorphisms. In this case, fitting PCs helps to minimize the bias. However, when the confounding is purely due to an environmental effect rather than other genes, it is not surprising that PCs based on SNPs do not fully remove the bias. Despite this, it is noteworthy that in the most extreme scenario considered by Browning and Browning (90% of Scots and Welsh and 10% of English samples as “cases”), they found that fitting PCs substantially reduced the bias. We observed only a small drop in variance explained by the fitting of PCs in Yang et al.1,4 and Lee et al.,5 implying that the degree of population stratification in the real analyses was trivial compared to that of the simulations.
One can detect population stratification by examining the correlation between relationships as estimated from different chromosomes.1 If there is population stratification and mean differences between populations are due to genetic factors, then SNP variants on one chromosome will be correlated with causal variants on other chromosomes, and consequently variation associated with a single chromosome will be inflated. However, when the variances for all chromosomes are estimated simultaneously, this bias is removed.1 Browning and Browning found only small correlations between relationships when these were estimated from different chromosomes, but this is not surprising because the bias in their example relies on genotype-environment correlation rather than population stratification alone.
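The diagnostic described above can be illustrated with a toy simulation. The sketch below is not the implementation used in our analyses; all function names, sample sizes, and allele-frequency settings are hypothetical choices made for illustration. It simulates two "chromosomes" of unlinked SNPs for two populations, computes pairwise genomic relationships from each chromosome separately, and correlates the two sets of off-diagonal values.

```python
import random
from math import sqrt

def simulate_genotypes(pop_labels, n_snps, freq_shift, rng):
    """Genotypes coded 0/1/2; population 1 has allele frequencies shifted up,
    population 0 shifted down, by freq_shift at every SNP."""
    geno = [[] for _ in pop_labels]
    for _ in range(n_snps):
        base = rng.uniform(0.3, 0.7)
        for i, pop in enumerate(pop_labels):
            p = base + (freq_shift if pop == 1 else -freq_shift)
            geno[i].append((rng.random() < p) + (rng.random() < p))
    return geno

def relationship_pairs(geno):
    """Off-diagonal genomic relationships, one value per pair of individuals."""
    n, m = len(geno), len(geno[0])
    z = [[0.0] * m for _ in range(n)]
    used = 0
    for s in range(m):
        p = sum(row[s] for row in geno) / (2 * n)  # sample allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNPs carry no information
        sd = sqrt(2 * p * (1 - p))
        for i in range(n):
            z[i][s] = (geno[i][s] - 2 * p) / sd
        used += 1
    return [sum(a * b for a, b in zip(z[i], z[j])) / used
            for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_chromosome_correlation(freq_shift, n_per_pop=20, n_snps=300, seed=1):
    """Correlation between relationships estimated from two independent chromosomes."""
    rng = random.Random(seed)
    labels = [0] * n_per_pop + [1] * n_per_pop
    chr_a = relationship_pairs(simulate_genotypes(labels, n_snps, freq_shift, rng))
    chr_b = relationship_pairs(simulate_genotypes(labels, n_snps, freq_shift, rng))
    return pearson(chr_a, chr_b)

stratified = cross_chromosome_correlation(freq_shift=0.15)
homogeneous = cross_chromosome_correlation(freq_shift=0.0)
print(stratified, homogeneous)
```

With these settings the stratified sample should show a strong positive correlation between chromosomes, whereas the homogeneous sample should show a correlation near zero. The allele-frequency differences used here are deliberately exaggerated so that the effect is visible in a small sample.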
In conclusion, all genetic studies should guard against confounding between genotype and environment. Fitting region might help to remove such environmental effects if, for example, data from across Europe are analyzed. However, it is extremely unlikely that confounding explains more than a small proportion, if any, of the genetic variance estimated by Yang et al.1,4 and Lee et al.5 The environmental effects would have to be unbelievably large, and we can think of no factor that could cause such a large correlation between genotype and environment as to substantially affect our estimates in the samples we analyzed. We fully agree with Browning and Browning when they write: “However, if the data have been ascertained to avoid biases and ensure homogeneity, this inflation should be a very small part of the estimate.”
References
1. Yang J., Manolio T.A., Pasquale L.R., Boerwinkle E., Caporaso N., Cunningham J.M., de Andrade M., Feenstra B., Feingold E., Hayes M.G. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 2011;43:519–525. doi: 10.1038/ng.823.
2. Waters K.M., Stram D.O., Hassanein M.T., Le Marchand L., Wilkens L.R., Maskarinec G., Monroe K.R., Kolonel L.N., Altshuler D., Henderson B.E., Haiman C.A. Consistent association of type 2 diabetes risk variants found in Europeans in diverse racial and ethnic groups. PLoS Genet. 2010;6:e1001078. doi: 10.1371/journal.pgen.1001078.
3. Lango Allen H., Estrada K., Lettre G., Berndt S.I., Weedon M.N., Rivadeneira F., Willer C.J., Jackson A.U., Vedantam S., Raychaudhuri S. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. doi: 10.1038/nature09410.
4. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
5. Lee S.H., Wray N.R., Goddard M.E., Visscher P.M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 2011;88:294–305. doi: 10.1016/j.ajhg.2011.02.002.
