Main Text
To the Editor: In Speed et al.,1 we identified two potential issues in SNP-based heritability estimation: (1) estimates of h2 can be biased when the tagging of causal variants differs from that of the SNPs used for calculating the genomic-relationship matrix (GRM), and (2) the accuracy of h2 estimates depends on how closely the assumed relationship between a causal variant’s minor allele frequency (MAF) and effect size matches the true relationship (this relationship can be modeled with a scale parameter s, where the standard assumption is s = −1). To resolve the first issue, we proposed computing an adjusted GRM, in which SNPs are weighted to account for uneven tagging. This approach is implemented in our software LDAK.1
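The roles of the scale parameter s and of SNP weightings can be illustrated with a minimal sketch. It assumes the parameterization in which each SNP is centred and multiplied by [2p(1 − p)]^(s/2), where p is its allele frequency, so that s = −1 gives standardized genotypes. The function name `scaled_grm` and its arguments are hypothetical, introduced only for illustration; this is not the LDAK or GCTA implementation, and the weightings themselves are taken as given rather than computed.

```python
import numpy as np

def scaled_grm(genotypes, s=-1.0, weights=None):
    """Illustrative GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    s       : assumed MAF/effect-size scale parameter (s = -1 standardizes each SNP).
    weights : optional per-SNP weightings (hypothetical input); None means
              every SNP gets weight 1, i.e. an unadjusted GRM.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # allele frequency of each SNP
    var = 2.0 * p * (1.0 - p)                # expected genotype variance
    if np.any(var == 0):
        raise ValueError("remove monomorphic SNPs before computing the GRM")
    x = (g - 2.0 * p) * var ** (s / 2.0)     # centre, then scale by [2p(1-p)]^(s/2)
    w = np.ones(g.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    return (x * w) @ x.T / w.sum()           # weighted average of per-SNP products
```

In this sketch, giving a SNP weight zero is equivalent to dropping it, which is why weightings can also act as a form of pruning.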
Lee et al. proposed continuing to use an unadjusted GRM computed via GCTA and to instead take a MAF-stratification approach to estimating h2. We summarize the major claims of their paper as follows:
(1) Their MAF-stratification approach accounts for uncertainty about s, (2) the weightings from LDAK are suboptimal for dense SNP data, and (3) their MAF-stratification approach gives less biased estimates of h2. We agree with the first, show that the second can be avoided by appropriate parameter selection in LDAK, and disagree with the third. In our opinion, adjusting for uneven tagging becomes even more important in the analysis of dense SNP data, and the MAF-stratification method of Lee et al. can be improved by incorporating the weightings used in LDAK.
To support the third claim, Lee et al. presented simulations in which the causal variants tend to be poorly tagged. However, the design of the simulations means that potential biases in their MAF-stratification approach are not evident. For example, for architecture E, causal variants are restricted to SNPs with a MAF < 0.1, so estimates of h2 tend to be biased downward if they are based on a GRM computed from all SNPs (demonstrated by the performance of standard GCTA in their simulations). MAF < 0.1 defines one of the (arbitrary) tranches in their analysis; had there been a mismatch between the MAF tranche from which causal variants were randomly selected and the MAF tranche used for analysis, their approach would have experienced biases similar to those suffered by GCTA.
Here, we demonstrate the continued importance of adjusting for uneven tagging through simulation (50 replicates in each case). We considered a data set of 6,387 individuals with genotypes for 4,238,038 SNPs after imputation against the 1000 Genomes reference panel. In addition to considering architectures A–F defined by Lee et al., we also considered architectures G and H, in which the 10,000 causal SNPs are well tagged, and architectures I and J, in which they are poorly tagged. (Tagging is measured by T, effectively the multiplicity of a signal.1 Here, T ranges from 1 to 2,032 and has a median of 92; a SNP is defined as well tagged if T > 179 and poorly tagged if T < 45.) We compared four analysis methods: GCTA (standard GRM), LDAK (weighted GRM), and MAF-stratified versions of both (MAF-GCTA and MAF-LDAK), in which a separate GRM was computed for each of five MAF tranches. MAF-LDAK was implemented with the “region” option in LDAK. Each GRM was computed with s = −1, and we used the default settings for LDAK.
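The MAF-stratified analyses can be sketched as follows: SNPs are assigned to tranches by MAF, and one GRM (with s = −1) is computed per tranche, optionally using per-SNP weightings. This is only an illustration under stated assumptions. The tranche boundaries in `MAF_EDGES` are hypothetical (only the MAF < 0.1 cut is given in the text), the function name `maf_tranche_grms` is ours for this example, and the weightings are taken as an input rather than computed.

```python
import numpy as np

# Hypothetical boundaries for five tranches; only MAF < 0.1 is stated in the text.
MAF_EDGES = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)

def maf_tranche_grms(genotypes, edges=MAF_EDGES, weights=None):
    """Return one standardized (s = -1) GRM per MAF tranche, plus SNP counts.

    genotypes : (individuals x SNPs) matrix of 0/1/2 allele counts.
    weights   : optional per-SNP weightings (assumed given); None = all ones.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    w = np.ones(g.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    # Tranche t holds SNPs with edges[t] <= MAF < edges[t + 1]; MAF = 0.5 joins the last.
    tranche = np.digitize(maf, np.asarray(edges[1:-1]))
    grms, counts = [], []
    for t in range(len(edges) - 1):
        # Skip monomorphic SNPs and SNPs whose weighting is zero.
        keep = (tranche == t) & (maf > 0) & (w > 0)
        counts.append(int(keep.sum()))
        if not keep.any():
            grms.append(None)                # empty tranche: no GRM to fit
            continue
        pt, wt = p[keep], w[keep]
        x = (g[:, keep] - 2.0 * pt) / np.sqrt(2.0 * pt * (1.0 - pt))
        grms.append((x * wt) @ x.T / wt.sum())
    return grms, counts
```

The resulting GRMs would then be fitted jointly as separate variance components; that fitting step is not shown here.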
Our results for architectures A–F (Table 1) agree with those of Lee et al. in that MAF-GCTA outperformed GCTA. Any overestimation of h2 by LDAK appeared to be slight, less than what was observed by Lee et al.; we return to this point below. When causal variants were well or poorly tagged (architectures G–J), the estimates of h2 from MAF-GCTA (similar to those of GCTA) tended to be biased upward or downward, respectively, whereas those from MAF-LDAK were much closer to the truth, indicating that adjusting for uneven tagging and stratifying by MAF are both advantageous. We repeated these simulations with 500, 1,000, 2,000, and 5,000 causal variants and obtained similar results each time (data not shown).
Table 1.
| Architecture | GCTA | LDAK | MAF-GCTA | MAF-LDAK |
|---|---|---|---|---|
| Causal SNPs Picked at Random | | | | |
| Architecture A (s = −1) | 0.50 (0.05) | 0.51 (0.08) | 0.50 (0.05) | 0.51 (0.08) |
| Architecture B (s = 0) | 0.52 (0.04) | 0.50 (0.07) | 0.50 (0.04) | 0.50 (0.07) |
| 70% Causal SNPs with MAF < 0.1 | | | | |
| Architecture C (s = −1) | 0.47 (0.04) | 0.51 (0.06) | 0.50 (0.04) | 0.51 (0.06) |
| Architecture D (s = 0) | 0.52 (0.05) | 0.53 (0.07) | 0.52 (0.05) | 0.53 (0.07) |
| All Causal SNPs with MAF < 0.1 | | | | |
| Architecture E (s = −1) | 0.45 (0.05) | 0.51 (0.08) | 0.52 (0.04) | 0.53 (0.07) |
| Architecture F (s = 0) | 0.45 (0.04) | 0.52 (0.06) | 0.51 (0.03) | 0.53 (0.05) |
| Well-Tagged Causal SNPs | | | | |
| Architecture G (s = −1) | 0.89 (0.03) | 0.56 (0.06) | 0.89 (0.04) | 0.56 (0.06) |
| Architecture H (s = 0) | 0.88 (0.03) | 0.54 (0.08) | 0.84 (0.03) | 0.54 (0.07) |
| Poorly Tagged Causal SNPs | | | | |
| Architecture I (s = −1) | 0.13 (0.05) | 0.49 (0.06) | 0.14 (0.05) | 0.49 (0.06) |
| Architecture J (s = 0) | 0.13 (0.05) | 0.50 (0.07) | 0.13 (0.05) | 0.51 (0.07) |
Architectures A–F have been defined by Lee et al.; we additionally considered architectures G and H, where causal variants are well-tagged SNPs, and architectures I and J, where causal variants are poorly tagged SNPs. The true (simulated) h2 is 0.5. Each value reports the mean estimate of h2 over 50 replicates (the empirical SD is provided in parentheses). Note that Lee et al. also reported the Akaike information criterion (AIC), but we omit this score because it can be highly misleading; for example, in our simulations, the highest AIC was achieved with GCTA for architecture G, where the estimates of h2 were on average 80% higher than the true value.
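The contrast between methods for architectures G–J can be made explicit with a small calculation on the mean estimates in Table 1. The sketch below simply copies those table values and compares them with the simulated h2 of 0.5; the helper names `mean_abs_bias` and `relative_bias` are hypothetical and introduced only for this illustration.

```python
TRUE_H2 = 0.5  # simulated heritability

# Mean h2 estimates copied from Table 1 (architectures G-J only).
TABLE1 = {
    "G": {"GCTA": 0.89, "LDAK": 0.56, "MAF-GCTA": 0.89, "MAF-LDAK": 0.56},
    "H": {"GCTA": 0.88, "LDAK": 0.54, "MAF-GCTA": 0.84, "MAF-LDAK": 0.54},
    "I": {"GCTA": 0.13, "LDAK": 0.49, "MAF-GCTA": 0.14, "MAF-LDAK": 0.49},
    "J": {"GCTA": 0.13, "LDAK": 0.50, "MAF-GCTA": 0.13, "MAF-LDAK": 0.51},
}

def mean_abs_bias(table, method, true_h2=TRUE_H2):
    """Average absolute difference between the mean estimate and the true h2."""
    errors = [abs(row[method] - true_h2) for row in table.values()]
    return sum(errors) / len(errors)

def relative_bias(estimate, true_h2=TRUE_H2):
    """Bias expressed as a fraction of the true h2."""
    return (estimate - true_h2) / true_h2

for method in ("GCTA", "LDAK", "MAF-GCTA", "MAF-LDAK"):
    print(f"{method:9s} mean |bias| = {mean_abs_bias(TABLE1, method):.3f}")
```

On these values, the mean absolute bias across architectures G–J is about 0.37 for MAF-GCTA and 0.03 for MAF-LDAK.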
The above-reported upward bias of LDAK is smaller than that found by Lee et al. (the second claim above), which might reflect our smaller number of SNPs (approximately four million versus eight million), a consequence of the stricter quality control (QC) that we imposed. We believe that strict QC is crucial in the estimation of h2 for binary traits,1,2 but to allow direct comparison, we relaxed the QC thresholds to match those of Lee et al. Afterwards, our data set had 7,190,149 SNPs. Because of the limited time to prepare this response, we only used chromosomes 1 and 2 (1,153,686 SNPs) and simulated 1,000 causal variants. Focusing on the architectures with s = −1 in Table 2, we observed overestimation of h2 by LDAK, although to a lesser extent than did Lee et al. (2%, 2%, and 1% for architectures A, C, and E, respectively, compared with 6%, 5%, and 3%). When weightings are calculated, LDAK by default models linkage-disequilibrium decay with distance in order to give more weight to local correlations than to long-range correlations that might be due to relatedness. For unrelated individuals, this is unnecessary, and so a large value (such as 100 Mb) can be used for the decay parameter. With this change, no bias is apparent (LDAK2 in Table 2).
Table 2.
| Architecture | GCTA | LDAK | LDAK2 |
|---|---|---|---|
| Causal SNPs Picked at Random | | | |
| Architecture A (s = −1) | 0.49 (0.03) | 0.52 (0.04) | 0.50 (0.04) |
| Architecture B (s = 0) | 0.55 (0.02) | 0.54 (0.03) | 0.52 (0.03) |
| 70% Causal SNPs with MAF < 0.1 | | | |
| Architecture C (s = −1) | 0.49 (0.02) | 0.52 (0.04) | 0.50 (0.04) |
| Architecture D (s = 0) | 0.51 (0.02) | 0.52 (0.03) | 0.50 (0.03) |
| All Causal SNPs with MAF < 0.1 | | | |
| Architecture E (s = −1) | 0.42 (0.03) | 0.51 (0.04) | 0.49 (0.04) |
| Architecture F (s = 0) | 0.46 (0.03) | 0.51 (0.04) | 0.49 (0.04) |
LDAK employs the default parameter settings when computing weightings; LDAK2 turns off the linkage-disequilibrium-decay function. The true (simulated) h2 is 0.5. Each value reports the mean estimate of h2 over 50 replicates (the empirical SD is provided in parentheses).
In summary, although we agree with Lee et al. that MAF stratification is effective in reducing biases caused by misspecification of the scale parameter s, we feel that it remains important to adjust for uneven tagging. Incorporating the LDAK weightings not only improves accuracy but also effects SNP pruning (for imputed data, approximately 90% of SNPs will receive weight zero and so can be discarded), thus reducing the subsequent task of computing GRMs by a factor of about ten.
References
- 1. Speed D., Hemani G., Johnson M.R., Balding D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010.
- 2. Lee S.H., Wray N.R., Goddard M.E., Visscher P.M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 2011;88:294–305. doi: 10.1016/j.ajhg.2011.02.002.