Human Molecular Genetics. 2021 Aug 9;31(1):146–155. doi: 10.1093/hmg/ddab203

False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy

Zhihui Zhang 1,2, Xiangjun Xiao 3, Wen Zhou 4, Dakai Zhu 5, Christopher I Amos 6,7
PMCID: PMC8682785  PMID: 34368847

Abstract

Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances in imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can bias downstream analysis. Many studies have compared the performance of popular imputation approaches, but few have investigated the bias characteristics of downstream association analyses. Herein, we show that imputation accuracy is diminished when the real genotypes contain minor alleles. Although these genotypes are less common, particularly at loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large portion of uncertain SNPs. The discordance of P-values was largest as the P-value approached 0 or when the imputation quality was poor. Although eliminating poorly imputed SNPs can remove false positive (FP) SNPs, it sometimes sacrificed more than 80% of true positive (TP) SNPs. For top ranked SNPs, removing variants with moderate imputation quality did not reduce the proportion of FP SNPs, and increasing the sample size of the reference panel did not greatly benefit the results either. Additionally, samples with a balanced ratio between cases and controls can dramatically improve the number of TP SNPs observed in imputation-based GWAS. These results raise concerns about association studies of rare variants, particularly when case–control studies are unbalanced.

Introduction

Missing genotype data are common in genetic studies and arise when single nucleotide polymorphisms (SNPs) are not covered by the genotyping platform. In genome-wide association studies (GWAS), missing data usually hinder the analysis when the study contains sparse markers or when multiple studies from different genotyping platforms are combined. While whole genome sequencing may not currently be feasible for large studies due to its high cost, genotype imputation is an economical approach to obtain dense genetic data and boost the power of GWAS (1–3). Applications of genotype imputation have advanced the discovery of novel loci in GWAS (4–7) and benefitted other genetic studies, such as fine mapping (6,8), discoveries of causal rare variants (9–11) and estimation of polygenic risk scores (12,13), which require large sample sizes to adequately characterize genetic architecture for predictive modeling.

The theory underlying genotype imputation is that unobserved genotypes can be predicted, or imputed, by identifying the shared sequence between the haplotypes of the study samples and the haplotypes of the external references (1). In practice, imputation methods estimate untyped genotypes by evaluating the probability of all possible genotypes and generate uncertain, or probabilistic, genotypes. Although the mean accuracy of imputed results can be well calibrated by schemes such as selecting proper reference panels (10,14) or removing SNPs with poor imputation quality (15), the uncertain nature of the imputed genotypes may unavoidably bias the downstream analysis (16–19).

Many studies have compared the performance of popular imputation tools in various scenarios and suggested strategies to correct imputation results (15,20–24). However, most comparisons focus on the accuracy of the genotype imputation, and few studies have evaluated how the uncertainty in genotype data affects the downstream GWAS. Although most current imputation approaches are mature and can achieve >90% imputation accuracy for many SNPs, especially for alleles with minor allele frequency (MAF) > 5% (2,23), even small discordances between the imputed uncertain genotypes and the unknown genotypes can cause large bias in the downstream analysis. A recent study briefly examined the bias in a BMI association study by comparing the significant SNPs identified from imputed genotype data with 73 established SNPs reported by prior studies (25). However, the established BMI-associated SNPs were undetectable in that study owing to many factors, such as the sample size, the cutoff of P-values or other unique features of the data; thus, the bias characteristics caused solely by the uncertainty of the imputation could not be identified.

To examine the bias emerging from imputation-based GWAS in detail, we randomly selected samples from the UK Biobank data and masked the genotypes at varying rates for subsequent imputation. We compared the association results obtained from the study data with complete genotypes and with imputed genotypes. In this paper, we considered various scenarios, such as the ratio between cases and controls in the study samples, the sample size of the reference panel and the density of the genotyped marker panel. We not only present how the probabilistic genotypes affect the correctness of P-values in different scenarios but also show the distribution of TP and FP SNPs at different cutoffs of P-values and post-imputation quality. Overall, our study will benefit genotype imputation-based studies and offers an in-depth view of the bias that can be caused by using imputed genotypes in GWAS.

Results

In Figure 1A and B, we compared the discordance between the allelic dosage from imputed genotypes and the expected dosage from masked genotypes, as described in the Material and Methods section; the results were stratified by categories of masked genotypes. Note that the left and right panels in Figure 1 represent results with 20% and 80% of variants masked, respectively. As the value of R2 decreased, the mean discordance of all curves in Figure 1A and B deviated from 0 as expected, because low values of R2 indicate poor imputation quality. Additionally, the imputation was less accurate when the expected genotypes contained more copies of minor alleles. For example, in Figure 1A (20% of variants masked), when R2 is around 0.5, the mean discordance is 0.17, 0.42 and 0.90 when the expected genotypes are homozygous major (grey curve), heterozygous (orange curve) and homozygous minor (cyan curve), respectively. However, the mean discordance of all imputed genotypes had a relatively flat curve (red curve) and showed a similar trend to the curve of homozygous major (grey curve). The same pattern was observed when 80% of variants were imputed (Fig. 1B). The flattened discordance curves of all genotypes can be explained by the prevalence of the different genotypes. As shown in Figure 1C and D, the leading proportion among all imputed genotypes is expected to be homozygous major, and this proportion is larger than that of heterozygous and homozygous minor genotypes combined. Note that the overall imputation quality suffered as the ratio of masked variants increased. When 20% of genotypes were masked (Fig. 1C), 91.15% of variants had R2 > 0.75, and the proportion dropped to 22.81% when 80% of genotypes were masked (Fig. 1D). However, according to Figure 1A and B, even though more genotype data were masked, the mean discordance curves of all genotypes (red curves) did not change significantly. For example, when R2 is around 0.5, the mean discordance of all genotypes (red curves) increased only slightly, from 0.23 for masking 20% of variants (Fig. 1A) to 0.27 for masking 80% of variants (Fig. 1B). In other words, the poor imputation quality in the sparse data is hidden when only the mean discordance of all imputed genotypes is compared. In Figure 1E and F, we showed the mean discordance as a function of minor allele frequency (MAF) and again stratified the results by types of masked genotypes. The imputation accuracy was greatly reduced when MAF was low and the expected genotypes contained minor alleles. For 80% of variants masked (Fig. 1F), when MAF decreased from 0.5 to 0.05, the mean discordance increased from 0.49 to 1.21 for homozygous minor (cyan curve) and from 0.21 to 0.61 for heterozygous (orange curve). For homozygous major, in contrast, the mean discordance decreased as MAF decreased (grey curves), and the trends were comparable to the curves of all variants (red curves). A similar trend was observed when we analyzed the data from the Indian descent population (Supplementary Material, Fig. S1).

Figure 1

Imputation accuracy. The top panels (A and B) show the relationship between the discordance of the imputed allelic dosage and the reported imputation quality metric R2 when 20% (left panels) and 80% (right panels) of variants are masked. The calculation of the discordance is described in the Material and Methods section. The y-axis of the curves is the mean discordance in each R2 (x-axis) bin with a size of 0.05. Error bars represent 95% confidence intervals of the mean. The middle panels (C and D) describe the density of masked variants as a function of R2. Each bar of the plots corresponds to an R2 interval of 0.05. The bottom panels (E and F) show the relationship between the discordance of the imputed allelic dosage and the minor allele frequency (MAF) when 20% (left panels) and 80% (right panels) of variants are masked. The y-axis of the curves is the mean discordance in each MAF (x-axis) bin with a size of 0.005. Error bars represent 95% confidence intervals of the mean. All results in these figures are stratified by types of masked genotypes.

To examine the accuracy of the downstream association analysis, we compared P-values of SNPs obtained from BMI association tests using imputed genotypes (Pi) and non-imputed genotypes (Pt), respectively. When 20% of the genotyped data were masked (Fig. 2A and C), the difference between Pi and Pt was very small for most SNPs, though it was slightly larger when Pi approached 0. We calculated the discordance of P-values by the equation in the Material and Methods section. In Figure 2A, only two variants had a maximum discordance around 3.5, and all others were below 2.5. The discordance of P-values depends strongly on the imputation quality. As shown in Figure 2C, most SNPs had high imputation quality and thus clustered around R2 = 1, which corresponded to the area with the smallest discordance of P-values in Figure 2A. However, when 80% of the genotyped data were imputed (Fig. 2B and D), a large proportion of SNPs had inflated Pi (positive y-axis). When Pi approached 0 in Figure 2B, the discordance surged and the maximum value reached 6.78, which means that Pi is nearly seven orders of magnitude smaller than Pt (Pi/Pt ≈ 10^−6.78). Accordingly, the imputed variants were spread along the axis of R2 in Figure 2D, and the band became wider as the imputation quality decreased. In Figure 2E and F, we showed the changes of the P-values’ discordance along the axis of MAF. The discordance was slightly higher when MAF was low, and the trend was more obvious when 80% of genotypes were imputed (Fig. 2F). We also observed that SNPs with low imputation quality tended to accumulate at low MAF (Fig. 2E and F). It is reasonable to hypothesize that the discordance will be much larger for rare variants with MAF < 5%. According to these results, variants with poor imputation quality are likely to generate small Pi and thus greatly increase the chance of FP results.

Figure 2

Accuracy of the association analysis. Panels A and B show the distribution of SNPs as a function of P-values’ discordance (y-axis) and Pi (x-axis). Panels C and D show the distribution of SNPs with Pi < 0.05 as a function of P-values’ discordance (y-axis) and the imputation quality metric R2 (x-axis). Color scales in these panels (A, B, C and D) indicate counts of variants within each hexagonal bin. Panels E and F show the distribution of SNPs with Pi < 0.05 as a function of P-values’ discordance (y-axis) and MAF (x-axis). The absolute discordance of P-values was averaged in each MAF bin with a size of 0.005. The color scales indicate the range of R2, which was also averaged in each MAF bin. Pi and Pt are described in the Material and Methods section. 20% of variants are masked in panels A, C and E, and 80% in panels B, D and F.

The main aim of GWAS is to classify SNPs by calculating their P-values. Therefore, to estimate the confidence of imputation-based GWAS, we are less interested in the discordance of P-values than in whether the significant SNPs identified with real genotypes are also top ranked in studies using uncertain genotypes. First, we used SNPs with Pt < 0.05 as the target SNPs and evaluated the ability to detect them from the list of SNPs ranked by association studies using imputed probabilistic genotypes. As shown in Figure 3, we plotted the precision–recall (PR) curves and stratified the results by different cutoffs of R2. When the rate of masked variants increased from 20% (Fig. 3A) to 80% (Fig. 3B) with R2 > 0, the area under the curve (AUC) decreased from 0.75 to 0.23. Without removing any SNPs by the post-imputation quality metric (R2 > 0), the maximum precision rate reached 0.97 for 20% of variants masked (Fig. 3A, red curve) but only 0.63 for 80% of variants masked (Fig. 3B, red curve). Removing SNPs with different cutoffs of R2 improved the maximum precision rates in all scenarios because, as mentioned previously, poorly imputed markers tend to generate FP findings. For example, when the cutoff of R2 is 0.8, the maximum precision rate reached 1, regardless of the proportion of variants imputed. However, a stringent cutoff of R2 came at a heavy cost in recall. For example, in Figure 3B, as the cutoff of R2 increased from 0 to 0.8, the maximum recall rate declined sharply from 1 (all target SNPs were identified) to 0.17 (fewer than 1 in 5 target SNPs were identified). Note that, for each cutoff of R2, SNPs with R2 below the threshold were discarded and never detected, which resulted in a maximum recall rate < 1. With fewer samples in the study data, the PR curves had a similar pattern, but the ability to identify target SNPs was lower (Supplementary Material, Fig. S3) because the power of the association analysis decreased.

Figure 3

PR curves for the association analysis. The target SNPs are defined as variants with Pt < 0.05 obtained by analyzing the real genotypes of masked variants. At different thresholds (t) of Pi, the precision rate is the ratio between the number of SNPs with {Pi < t and Pt < 0.05} and the number of SNPs with Pi < t; and the recall rate is the ratio between the number of SNPs with {Pi < t and Pt < 0.05} and the number of SNPs with Pt < 0.05. The rate of masked variants is 20% in panel (A) and 80% in (B). All curves are stratified by imputation quality metric R2. Note that, in panel (A), the curve of R2 > 0 overlaps with the curve of R2 > 0.2.
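The precision and recall definitions in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name, its parameters and the toy P-values and R2 values are hypothetical. SNPs removed by the R2 filter are treated as never detected but still counted among the target SNPs, following the description in the text.

```python
def precision_recall(p_imputed, p_true, r2, t, target=0.05, r2_cutoff=0.0):
    """Precision and recall at Pi threshold t, keeping only SNPs with R2 > r2_cutoff.

    SNPs removed by the R2 filter can never be detected, but they still count
    toward the number of target SNPs (Pt < target) in the recall denominator.
    """
    n_targets = sum(pt < target for pt in p_true)
    # A SNP is "called" when it survives the R2 filter and has Pi < t.
    called = [pi < t and q > r2_cutoff for pi, q in zip(p_imputed, r2)]
    n_called = sum(called)
    hits = sum(c and pt < target for c, pt in zip(called, p_true))
    precision = hits / n_called if n_called else float("nan")
    recall = hits / n_targets if n_targets else float("nan")
    return precision, recall


# Hypothetical toy data for five SNPs (not taken from the paper).
p_imputed = [0.001, 0.02, 0.04, 0.30, 0.60]
p_true = [0.002, 0.20, 0.03, 0.04, 0.70]
r2 = [0.95, 0.40, 0.50, 0.90, 0.30]

print(precision_recall(p_imputed, p_true, r2, t=0.05))                 # no R2 filtering (R2 > 0)
print(precision_recall(p_imputed, p_true, r2, t=0.05, r2_cutoff=0.8))  # stringent R2 cutoff
```

In this toy example the stringent cutoff raises precision while lowering recall, which is the trade-off the PR curves display.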

According to the results in Figure 2, the discordance of P-values is a function of both R2 and Pi. In Figure 4, therefore, we took SNPs with Pt < 1e-4 as the target SNPs and examined the evolution of precision and recall rates as a function of Pi. When 20% of genotypes were masked, the cutoff of R2 did not improve the precision rate much (Fig. 4A), whereas it reduced the maximum recall rate from 1 to 0.88 as the cutoff of R2 changed from 0 to 0.8 (Fig. 4C). The maximum recall rate was reached as −log10(Pi) ranged from 3.53 to 3.57. Note that, in the data with complete genotypes, the maximum recall rate should be achieved at −log10(Pi) = 4. When 80% of the genotyping data were masked (Fig. 4B and D), both precision and recall curves behaved quite differently depending on the cutoff of R2. A stringent cutoff of R2 significantly enhanced precision rates (Fig. 4B) but sacrificed recall rates (Fig. 4D). For example, at the point where −log10(Pi) was around 4, when the cutoff of R2 rose from 0 to 0.8, the precision rate increased by 900% (from 0.06 to 0.60) and the recall rate decreased by 36.99% (from 0.73 to 0.46). We also noticed that a stringent threshold of Pi (corresponding to a large value of −log10(Pi) in Figure 4) can achieve a similar effect to a stringent cutoff of R2, increasing the precision rate but decreasing the recall rate. Sometimes, with a stringent threshold of Pi, a moderate cutoff of R2 is good enough to obtain a comparable precision rate but a higher recall rate than a stringent cutoff of R2. For example, if the threshold of −log10(Pi) was around 5.7, R2 > 0.8 and R2 > 0.6 resulted in the same precision rate (1.0), but the latter had a higher recall rate (0.19 for R2 > 0.8 but 0.38 for R2 > 0.6). Note that the maximum recall rates were reached as −log10(Pi) ranged from 2.15 to 3.07, which was much lower than the expected value of 4. Both precision and recall rates decreased dramatically in the data with a small sample size, especially in the data with sparse variants and under stringent cutoffs of R2 (Supplementary Material, Fig. S4).

Figure 4

Evolution of precision and recall rates of the association analysis along changes of Pi. The top-ranked SNPs are defined as variants with Pt < 1e-4 calculated by analyzing the real genotypes of masked variants. At each cutoff (t) of Pi, the precision rate is the ratio between the number of SNPs with {Pi < t and Pt < 1e-4} and the number of SNPs with Pi < t; and the recall rate is the ratio between the number of SNPs with {Pi < t and Pt < 1e-4} and the number of SNPs with Pt < 1e-4. The rate of masked variants is 20% in panel (A) and (C) and 80% in (B) and (D). All curves are stratified by imputation quality metric R2.

Many studies have suggested that a large reference panel can improve the quality of genotype imputation (26), which raises the question of whether the size of the reference panel affects the association results obtained with probabilistic genotypes. In Figure 5, we compared the top ranked SNPs (Pi < 1e-5) when 80% of variants were imputed from references of 8000 and 80 000 samples, respectively. As shown in Figure 5A, a 10 times larger reference panel changed the mean discordance of P-values only marginally (3.49 and 3.62 for 8000 and 80 000 samples in the reference, respectively), although with slightly less variability between SNPs (standard deviation of 1.94 and 1.71 for 8000 and 80 000 samples in the reference, respectively). Among these top ranked SNPs, we grouped the SNPs with Pt < 1e-5 as TP SNPs and the others as FP SNPs. In Figure 5B, if no SNPs were filtered out, the reference panels with 8000 and 80 000 samples yielded the same number of TP SNPs (11 TP SNPs), but the latter generated 10 more FP SNPs (from 68 to 78). When the SNPs were stratified by the cutoff of R2 as shown in Figure 5B, the larger reference panel improved the quality of the imputation in all groups and, therefore, more SNPs had larger values of R2 among both TP and FP SNPs. This is more obvious in the percent stacked bar plot shown in Figure 5C. In this case, a moderate cutoff of R2 resulted in more TP SNPs as well as more FP SNPs with the larger reference panel. For example, if the reference panel had 8000 samples and SNPs with R2 < 0.6 were discarded, we obtained 10 TP and 4 FP SNPs with Pi < 1e-5, corresponding to a precision rate of 0.71. However, for 80 000 samples in the reference, the same cutoff of R2 produced 10 TP but 11 FP SNPs, corresponding to a precision rate of 0.48. Only with a stringent cutoff of R2 did the data imputed from the large reference panel achieve high precision and recall rates. Using the cutoff R2 > 0.8, the precision and recall rates were 0.89 and 0.73, respectively, for 8000 samples in the reference, and 1 and 0.82, respectively, for 80 000 samples.
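The precision rates quoted for the moderate cutoff follow directly from the TP and FP counts given above. The short Python check below is only a worked illustration; the function name and the dictionary layout are hypothetical, while the counts are those stated in the text for SNPs with Pi < 1e-5 after discarding R2 < 0.6.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of called SNPs that are true positives."""
    return tp / (tp + fp)


# (TP, FP) counts reported in the text, keyed by reference panel size.
counts = {8000: (10, 4), 80000: (10, 11)}

for panel_size, (tp, fp) in counts.items():
    print(panel_size, round(precision(tp, fp), 2))
# 8000 0.71
# 80000 0.48
```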

Figure 5

Bias characteristics of top ranked SNPs as the size of the reference panel was varied. The top ranked SNPs in this figure are SNPs with Pi < 1e-5. SNPs with both Pi and Pt smaller than 1e-5 were regarded as TP SNPs, and SNPs with Pi < 1e-5 but Pt > 1e-5 as FP SNPs. The rate of masked variants is 80% in all panels. (A) The distribution of the P-values’ discordance for the two sample sizes of the reference panel. The calculation of the discordance is described in the Material and Methods section. (B) The counts of TP SNPs and FP SNPs grouped by sample size of the reference panel. The results were stratified by different cutoffs of the imputation quality metric R2. (C) Percent stacked bar plot of the same results shown in panel (B).

We further explored the impact of varying ratios between cases and controls on the top ranked SNPs. In Figure 6, we presented the results for SNPs with Pi < 1e-4 when 80% of variants were imputed in studies with case: control ratios of 1:1, 7:1 and 1:7, respectively. In Figure 6A, the unbalanced ratio between cases and controls slightly elevated the mean discordance of P-values from 3.13 for case: control = 1:1 to 3.24 for case: control = 7:1 and 3.32 for case: control = 1:7. Using complete genotypes, we observed 26, 34 and 25 SNPs with Pt < 1e-4 for case: control = 1:1, 7:1 and 1:7, respectively. Without filtering any SNPs by imputation quality, however, we observed 19, 3 and 7 TP SNPs for case: control = 1:1, 7:1 and 1:7, respectively (Fig. 6B). We can conclude that the recall rates decreased dramatically when the ratio of case: control was unbalanced. Additionally, as shown in Figure 6C, the false discovery rates (FDR) of the unbalanced data are much higher than that of the balanced data. As the cutoff of R2 rises from 0 to 0.8, the FDR drops from 0.94 to 0.33 for case: control = 1:1, but only from 0.97 to 0.57 for case: control = 1:7. For case: control = 7:1, more stringent cutoffs of R2 did not improve the FDR; it decreased slightly from 0.98 at R2 > 0 to 0.91 at R2 > 0.6 but returned to 1 at R2 > 0.8. Note that, when we used a more stringent threshold of P-value (Pi < 1e-5), the FDRs were even higher (Supplementary Material, Fig. S6).
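The drop in recall can be made concrete from the counts in this paragraph. The Python sketch below is illustrative only: it assumes that the 26, 34 and 25 SNPs found with complete genotypes are the target SNPs and that the 19, 3 and 7 TP SNPs are the targets recovered after imputation; the function and variable names are hypothetical.

```python
def recall(tp: int, n_targets: int) -> float:
    """Fraction of target SNPs (found with complete genotypes) recovered after imputation."""
    return tp / n_targets


# (TP SNPs after imputation, target SNPs with complete genotypes) per case:control ratio.
observed = {"1:1": (19, 26), "7:1": (3, 34), "1:7": (7, 25)}

for ratio, (tp, n_targets) in observed.items():
    print(f"case:control {ratio}: recall = {recall(tp, n_targets):.2f}")
# case:control 1:1: recall = 0.73
# case:control 7:1: recall = 0.09
# case:control 1:7: recall = 0.28
```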

Figure 6

Bias characteristics of top ranked SNPs as the ratio of case: control was varied. The top ranked SNPs in this figure are SNPs with Pi < 1e-4. SNPs with both Pi and Pt smaller than 1e-4 were regarded as TP SNPs, and SNPs with Pi < 1e-4 but Pt > 1e-4 as FP SNPs. The rate of masked variants is 80% in all panels. (A) The distribution of the P-values’ discordance when the ratio of case: control is 1:1, 7:1 and 1:7, respectively. The calculation of the discordance is described in the Material and Methods section. (B) The counts of TP SNPs and FP SNPs grouped by ratio of case: control. The results were stratified by different cutoffs of the imputation quality metric R2. (C) Changes of FDR with varying cutoffs of R2. The red, green and blue curves represent the data with case: control = 1:1, 1:7 and 7:1, respectively.

Discussion

In this paper, we presented the bias in GWAS that originates from imputed genotypes. Rather than deterministic data, the imputed genotypes are estimated from the probabilities of potential genotypes, which yields probabilistic data and brings uncertainty to the downstream analysis. Many studies have evaluated the performance of popular imputation tools by comparing the squared correlation between the imputed and true genotypes or by calculating the percentage of imputed results that match the true genotypes (2,23). Measured in this way, imputation methods can achieve high accuracy rates even if a small portion of the imputed genotypes is poorly imputed, yet these genotypes may play important roles in the downstream analysis. As shown in our analysis, it is less accurate to impute genotypes with copies of minor alleles than to impute homozygous major genotypes. However, the poorly imputed genotypes, although present, are less common, especially for variants with small MAF; therefore, they did not affect the average imputation accuracy of all genotypes. In downstream GWAS, the samples carrying minor alleles provide the informative data, and the quality of their imputed genotypes determines the confidence of the positive results identified by GWAS.

As we know, the popular reference panels used for imputation are very dense. For example, the 1000 Genomes Phase III panel has 49 143 605 sites from 2504 samples (26), and the HRC panel contains 64 976 haplotypes at 39 635 008 sites from 32 470 samples (27). In contrast, the genotyping arrays to be imputed are usually sparse. Taking the UK Biobank data as an example, before QC, the Affymetrix UK Biobank Axiom array has about 850 000 variants, and this number is even smaller after data cleaning (28). Therefore, if we impute the UK Biobank array data using either of the reference panels mentioned above, more than 80% of variants will be imputed. According to our results, the imputation quality was dramatically reduced when the density of SNPs in the study samples was much smaller than the density of the reference panel. When 80% of genotypes were imputed, the P-values of most SNPs obtained by the imputation-based GWAS tended to be inflated, especially for top SNPs ranked by imputed P-values. The inflation of the P-values leads to false discoveries in GWAS, especially for SNPs with low MAF.

A popular strategy to avoid FP results in imputation-based analysis is to remove SNPs with poor imputation quality based on a certain metric, such as the R2 used in this paper. Generally, once the cutoff of R2 decreases the inflation factor (λ) to about 1, we usually assume that the GWAS results are reliable. In our analysis, a stringent cutoff of R2 indeed decreased λ. For example, when 80% of variants were imputed, the cutoff of R2 = 0.8 lowered the value of λ from 1.30 to 1.06 in the data imputed with the reference of 8000 samples. However, stringent filtering of SNPs in turn forfeits the ability to obtain TP results. For a sparse sample with a large portion of imputed genotypes, more than 80% of TP results are lost when the cutoff of R2 is very stringent. Additionally, even if SNPs are removed according to their values of R2, FP results continue to exist among the remaining findings, and their proportion is relatively large when a moderate cutoff of R2 is selected. Therefore, we suggest that trait-associated SNPs obtained from imputation-based GWAS with sparse genotyping be verified by experiments or validated in another independent GWAS. Removing SNPs by stringent R2 filtering can reduce the FP rate, but additional quality controls for the imputed genotypes are also necessary.
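For readers unfamiliar with the inflation factor, the sketch below computes λ under its conventional definition: the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The paper does not state which estimator it used, so this is an assumption; the function name and the example P-values are hypothetical and use only the Python standard library.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(p_values):
    """Genomic inflation factor for two-sided P-values in (0, 1].

    Each P-value is converted to a 1-df chi-square statistic z**2, where z is
    the standard normal quantile of p / 2; lambda is the median statistic
    divided by the median expected under the null hypothesis.
    """
    normal = NormalDist()
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical P-values: an evenly spaced (null-like) set and an inflated copy.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
inflated = [p ** 1.2 for p in null_like]

print(round(inflation_factor(null_like), 2))  # close to 1 for null-like P-values
print(inflation_factor(inflated) > 1.0)       # systematically small P-values push lambda above 1
```

A value of λ near 1 only indicates that the bulk of the test statistics behaves as expected; as the paragraph above notes, it does not rule out FP results among the top ranked SNPs.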

Other factors that affect the accuracy of imputation also affect the downstream GWAS analysis. For example, some studies have suggested that an imputation reference with a large sample size can improve the accuracy of imputation (26). According to our results, for the top ranked SNPs, the large reference panel did not increase the number of TP SNPs obtained by the analysis but improved the imputation quality within TP SNPs. Therefore, significant SNPs tend to have higher values of R2, which increases the confidence in a TP finding. However, the results using the large reference panel contained more FP results and considerable inflation with a moderate cutoff of R2. The sample size of the study data also affects the results of imputation-based association studies. A small sample size with sparse genotypes significantly reduces the confidence in the associated SNPs identified from imputed data (Supplementary Material, Fig. S5).

An unbalanced ratio between cases and controls is very common in real genetic studies, but its impact on GWAS using imputed data is rarely discussed. For example, the most cited study using samples from the UK Biobank compared all individuals affected by a condition or disease with all other participants, yielding a median ratio of affected to nonaffected participants of 1:192 (29). According to our results, imputation in unbalanced data did not change the mean discordance of P-values significantly. However, the proportion of FP SNPs among the top ranked variants was dramatically larger when unbalanced data were used for imputation. The situation is even worse if the significance level is set to be very small. Therefore, data with an approximately balanced ratio between cases and controls are preferred for imputation-based GWAS, although there may be some trade-off between power and FP rate to be gained by having two to three times as many controls as cases when controls are plentiful and cases are few. Note that the results in this paper are mainly drawn from UK Biobank array data, so verification using other sources of data is necessary.

Overall, our study comprehensively assessed the bias in GWAS caused by the use of imputed genotypes. We evaluated the accuracy of imputed genotypes and post-imputation GWAS P-values stratified by types of genotypes and values of R2. We also presented the ability to remove FP results and obtain TP results when different cutoffs of R2 and thresholds of P-values were applied. Other factors that may contribute to post-imputation GWAS results were also discussed. The results in this paper will benefit studies that depend on imputed data by offering a valuable reference.

Material and Methods

The genotyping and phenotype data sets of 450 000 participants were obtained from UK Biobank (28). In this study, we focused on autosomal variants and performed QC by removing variants with MAF < 5%, Hardy–Weinberg equilibrium P-value < 10^−7 and a missing rate per SNP or per individual > 10%. After QC, 271 360 variants remained. The relatedness of individuals was inferred by KING (30), and individuals with relatives up to the second degree were removed. We used BMI as the phenotype of interest and dichotomized the samples into obese cases (30 < BMI < 39.9) and normal-weight controls (18.5 < BMI < 24.9).
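The thresholds in this paragraph can be summarized as two small predicates. The following Python sketch only restates the stated cutoffs; it is not the authors' pipeline (which is not described at this level of detail), the function names are hypothetical, and the per-individual missingness filter and the relatedness filter are omitted.

```python
from typing import Optional


def bmi_group(bmi: float) -> Optional[str]:
    """Dichotomize BMI using the ranges given in the text; other values are excluded."""
    if 30 < bmi < 39.9:
        return "case"
    if 18.5 < bmi < 24.9:
        return "control"
    return None


def variant_passes_qc(maf: float, hwe_p: float, missing_rate: float) -> bool:
    """Keep a variant unless MAF < 5%, HWE P-value < 1e-7 or missing rate > 10%."""
    return maf >= 0.05 and hwe_p >= 1e-7 and missing_rate <= 0.10


print(bmi_group(32.0), bmi_group(22.0), bmi_group(27.0))
print(variant_passes_qc(maf=0.20, hwe_p=0.5, missing_rate=0.01))
print(variant_passes_qc(maf=0.01, hwe_p=0.5, missing_rate=0.01))
```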

As shown in Figure 7, factors that influence imputation accuracy and GWAS accuracy jointly obstruct positive findings in GWAS. To study the impact of these factors, we prepared the study samples by randomly selecting three groups of 5000 individuals (15 000 samples in total) from the British population, with ratios between cases and controls of 1:1 and two extreme ratios of 1:7 and 7:1, respectively. Two reference panels with sample sizes of 8000 and 80 000, respectively, were randomly selected from the remaining British UK Biobank participants. We replicated the results by performing the same analysis in the Indian population from the UK Biobank data with a sample size of 2000 and a case: control ratio of 1:1. The remaining 3500 Indian participants from UK Biobank served as the reference panel for that analysis. Note that all figures in this paper show results from the British population, and the results for the Indian population are shown in the Supplementary Material (Figs. S1–S5).
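To make the three study designs concrete, the sketch below converts each case: control ratio into counts for a group of 5000 individuals. The exact counts are not reported in the text, so this assumes the ratios were met exactly; the function name is hypothetical.

```python
def case_control_counts(n_total: int, case_parts: int, control_parts: int):
    """Split n_total individuals into cases and controls at an exact ratio."""
    unit, remainder = divmod(n_total, case_parts + control_parts)
    if remainder:
        raise ValueError("n_total is not divisible by the sum of the ratio parts")
    return case_parts * unit, control_parts * unit


for case_parts, control_parts in [(1, 1), (1, 7), (7, 1)]:
    cases, controls = case_control_counts(5000, case_parts, control_parts)
    print(f"case:control {case_parts}:{control_parts} -> {cases} cases, {controls} controls")
```

Under this assumption, the two unbalanced designs each contain only 625 individuals in the smaller group.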

Figure 7

Diagram of the workflow.

Within the three study sample sets, we first performed the BMI association analysis using PLINK 1.90 (31) and obtained ‘true’ P-values for each SNP (Pt). Next, we randomly and evenly masked 20% (54 272) and 80% (217 088) of markers, respectively, in both cases and controls for the subsequent imputation. Following modern genotype imputation guidelines, we pre-phased the data using SHAPEIT4 (32) and imputed the masked variants using Beagle v.4.1 (33) with the recommended parameters. The imputation produced decimal values called ‘allelic dosage’ ranging from 0 to 2 at each masked site. The discordance of genotypes is calculated as the absolute difference between the allelic dosage and the true dosage of the masked genotype. The true dosage is obtained by summing the copies of minor alleles carried by each individual at each site. For example, if a heterozygous genotype was masked and obtained an imputed dosage of 0.3, the discordance is |1 − 0.3| = 0.7. We then used the imputed allelic dosage to perform the BMI association analysis again using PLINK 1.90 (31) and obtained post-imputation P-values, denoted by Pi. The discordance of P-values is calculated as −log10(Pi/Pt).
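The two discordance measures defined above can be written out directly. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and only the formulas |true dosage − imputed dosage| and −log10(Pi/Pt) come from the text.

```python
import math


def genotype_discordance(true_dosage: float, imputed_dosage: float) -> float:
    """Absolute difference between the true minor-allele count (0, 1 or 2) and the imputed dosage."""
    return abs(true_dosage - imputed_dosage)


def p_value_discordance(p_imputed: float, p_true: float) -> float:
    """-log10(Pi / Pt); positive when the imputed P-value is smaller than the true one."""
    return -math.log10(p_imputed / p_true)


def fold_change(discordance: float) -> float:
    """Ratio Pt / Pi implied by a given P-value discordance."""
    return 10 ** discordance


# Example from the text: a masked heterozygote (true dosage 1) imputed as 0.3.
print(genotype_discordance(1, 0.3))

# Hypothetical P-values: Pi = 1e-6 against Pt = 1e-3 gives a discordance of about 3.
print(p_value_discordance(1e-6, 1e-3))

# Pt / Pi implied by the maximum discordance of 6.78 reported for Figure 2B.
print(f"{fold_change(6.78):.2e}")
```

A discordance of 6.78 therefore corresponds to a Pt/Pi ratio of roughly six million.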

After imputation, the quality of the imputed genotypes must be measured without any knowledge of the true genotypes. Beagle v.4.1 (33) provides the allelic R2 metric, which estimates the squared correlation between the best-guess genotype and the allelic dosage. The value of R2 ranges from 0 to 1, and lower values correspond to poorer imputation quality. We used Beagle for imputation because its accuracy has been shown to be comparable to that of Minimac or IMPUTE2 while being less computationally demanding (33).
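
To make the idea behind this metric concrete, the sketch below computes a squared Pearson correlation between best-guess genotypes and allelic dosages at a single site. It follows the description in the text, not Beagle's internal estimator, which works from posterior genotype probabilities. As a simplifying assumption, the best-guess genotype is approximated here by the integer (0, 1 or 2) nearest to the dosage, and all function names are hypothetical.

```python
def best_guess_genotype(dosage):
    """Nearest of 0, 1, 2 to the dosage (ties go to the lower value).
    Assumption: a stand-in for the most probable genotype."""
    return min((0, 1, 2), key=lambda g: abs(g - dosage))


def squared_correlation(x, y):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def allelic_r2_sketch(dosages):
    """Squared correlation between best-guess genotypes and dosages at one site."""
    best_guess = [best_guess_genotype(d) for d in dosages]
    return squared_correlation(best_guess, dosages)


# Illustrative dosages (not from the study) for five individuals at one site.
confident = [0.02, 0.98, 1.95, 0.05, 1.01]   # dosages close to 0, 1 or 2
uncertain = [0.45, 0.55, 1.40, 0.60, 0.90]   # dosages far from any integer
print(allelic_r2_sketch(confident))
print(allelic_r2_sketch(uncertain))
```

Dosages that sit close to 0, 1 or 2 give a value near 1, whereas dosages far from any integer genotype pull the value down, which matches the interpretation that low R2 flags poorly imputed sites.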

Supplementary Material

S1_ddab203
S2_ddab203
S3_ddab203
S4_ddab203
S5_ddab203
S6_ddab203
Supplementary_Material_ddab203

Acknowledgements

We wish to thank people who participated in the UK Biobank and made this research possible. The datasets used for the analyses in this manuscript were obtained from the database of UK Biobank at https://www.ukbiobank.ac.uk/. We would like to thank UK Biobank for distributing the data used in this study.

Conflict of Interest statement. No potential competing interest was reported by the authors.

Contributor Information

Zhihui Zhang, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA; Institute of Clinical and Translational Medicine, Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA.

Xiangjun Xiao, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA.

Wen Zhou, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA.

Dakai Zhu, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA.

Christopher I Amos, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA; Institute of Clinical and Translational Medicine, Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA.

Funding

This work was supported by the National Cancer Institute of the National Institutes of Health under Award Numbers R01CA242218 and U19CA203654. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Dr Amos is a research scholar of the Cancer Prevention Research Institute of Texas and is supported by RR170048.

References

  • 1. Li, Y., Willer, C., Sanna, S. and Abecasis, G. (2009) Genotype imputation. Annu. Rev. Genomics Hum. Genet., 10, 387–406.
  • 2. Marchini, J. and Howie, B. (2010) Genotype imputation for genome-wide association studies. Nat. Rev. Genet., 11, 499–511.
  • 3. Bosse, Y. and Amos, C.I. (2018) A decade of GWAS results in lung cancer. Cancer Epidemiol. Biomark. Prev., 27, 363–379.
  • 4. Liu, J.Z., Tozzi, F., Waterworth, D.M., Pillai, S.G., Muglia, P., Middleton, L., Berrettini, W., Knouff, C.W., Yuan, X., Waeber, G. et al. (2010) Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat. Genet., 42, 436–440.
  • 5. Yan, G., Qiao, R., Zhang, F., Xin, W., Xiao, S., Huang, T., Zhang, Z. and Huang, L. (2017) Imputation-based whole-genome sequence association study rediscovered the missing QTL for lumbar number in Sutai pigs. Sci. Rep., 7, 1–10.
  • 6. Ng, M.C.Y., Graff, M., Lu, Y., Justice, A.E., Mudgal, P., Liu, C.T., Young, K., Yanek, L.R., Feitosa, M.F., Wojczynski, M.K. et al. (2017) Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African ancestry anthropometry genetics consortium. PLoS Genet., 13, 81.
  • 7. McKay, J.D., Hung, R.J., Han, Y., Zong, X., Carreras-Torres, R., Christiani, D.C., Caporaso, N.E., Johansson, M., Xiao, X., Li, Y. et al. (2017) Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet., 49, 1126–1132.
  • 8. Mahajan, A., Taliun, D., Thurner, M., Robertson, N.R., Torres, J.M., Rayner, N.W., Payne, A.J., Steinthorsdottir, V., Scott, R.A., Grarup, N. et al. (2018) Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet., 50, 1505–1513.
  • 9. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet., 39, 906–913.
  • 10. Mitt, M., Kals, M., Pärn, K., Gabriel, S.B., Lander, E.S., Palotie, A., Ripatti, S., Morris, A.P., Metspalu, A., Esko, T. et al. (2017) Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet., 25, 869–876.
  • 11. Wang, Y., McKay, J.D., Rafnar, T., Wang, Z., Timofeeva, M.N., Broderick, P., Zong, X., Laplana, M., Wei, Y., Han, Y. et al. (2014) Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer. Nat. Genet., 46, 736–741.
  • 12. Yanes, T., McInerney-Leo, A.M., Law, M.H. and Cummings, S. (2020) The emerging field of polygenic risk scores and perspective for use in clinical care. Hum. Mol. Genet., 00, 1–12.
  • 13. Choi, S.W., Mak, T.S.H. and O’Reilly, P.F. (2020) Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc., 15, 2759–2772.
  • 14. Das, S., Abecasis, G.R. and Browning, B.L. (2018) Genotype imputation from large reference panels. Annu. Rev. Genomics Hum. Genet., 19, 73–96.
  • 15. Johnson, E.O., Hancock, D.B., Levy, J.L., Gaddis, N.C., Saccone, N.L., Bierut, L.J. and Page, G.P. (2013) Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. Hum. Genet., 132, 509–522.
  • 16. Almeida, M.A.A., Oliveira, P.S.L., Pereira, T.V., Krieger, J.E. and Pereira, A.C. (2011) An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations. BMC Genet., 12, 10.
  • 17. Hao, K., Chudin, E., McElwee, J. and Schadt, E.E. (2009) Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies. BMC Genet., 10, 27.
  • 18. de Bakker, P.I.W., Ferreira, M.A.R., Jia, X., Neale, B.M., Raychaudhuri, S. and Voight, B.F. (2008) Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet., 17, 122–128.
  • 19. Guan, Y. and Stephens, M. (2008) Practical issues in imputation-based association mapping. PLoS Genet., 4, e1000279.
  • 20. Hoffmann, T.J. and Witte, J.S. (2015) Strategies for imputing and analyzing rare variants in association studies. Trends Genet., 31, 556–563.
  • 21. Nothnagel, M., Ellinghaus, D., Schreiber, S., Krawczak, M. and Franke, A. (2009) A comprehensive evaluation of SNP genotype imputation. Hum. Genet., 125, 163–171.
  • 22. Bai, W.-Y., Zhu, X.-W., Cong, P.-K., Zhang, X.-J., Richards, J.B. and Zheng, H.-F. (2019) Genotype imputation and reference panel: a systematic evaluation on haplotype size and diversity. Brief. Bioinform., 21, 1806–1817.
  • 23. Shi, S., Yuan, N., Yang, M., Du, Z., Wang, J., Sheng, X., Wu, J. and Xiao, J. (2018) Comprehensive assessment of genotype imputation performance. Hum. Hered., 83, 107–116.
  • 24. Liu, Q., Cirulli, E.T., Han, Y., Yao, S., Liu, S. and Zhu, Q. (2014) Systematic assessment of imputation performance using the 1000 genomes reference panels. Brief. Bioinform., 16, 549–562.
  • 25. Palmer, C. and Pe’er, I. (2016) Bias characterization in probabilistic genotype data and improved signal detection with multiple imputation. PLoS Genet., 12, e1006091.
  • 26. Auton, A., Abecasis, G.R., Altshuler, D.M., Durbin, R.M., Bentley, D.R., Chakravarti, A., Clark, A.G., Donnelly, P., Eichler, E.E., Flicek, P. et al. (2015) A global reference for human genetic variation. Nature, 526, 68–74.
  • 27. McCarthy, S., Das, S., Kretzschmar, W., Delaneau, O., Wood, A.R., Teumer, A., Kang, H.M., Fuchsberger, C., Danecek, P., Sharp, K. et al. (2016) A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet., 48, 1279–1283.
  • 28. Bycroft, C., Freeman, C., Petkova, D., Band, G., Elliott, L.T., Sharp, K., Motyer, A., Vukcevic, D., Delaneau, O., O’Connell, J. et al. (2018) The UK biobank resource with deep phenotyping and genomic data. Nature, 562, 203–209.
  • 29. Canela-Xandri, O., Rawlik, K. and Tenesa, A. (2018) An atlas of genetic associations in UK biobank. Nat. Genet., 50, 1593–1599.
  • 30. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M. and Chen, W.-M. (2010) Robust relationship inference in genome-wide association studies. Bioinformatics, 26, 2867–2873.
  • 31. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M. and Lee, J.J. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience, 4, 7.
  • 32. Delaneau, O., Zagury, J.F., Robinson, M.R., Marchini, J.L. and Dermitzakis, E.T. (2019) Accurate, scalable and integrative haplotype estimation. Nat. Commun., 10, 1–10.
  • 33. Browning, B.L. and Browning, S.R. (2016) Genotype imputation with millions of reference samples. Am. J. Hum. Genet., 98, 116–126.

Articles from Human Molecular Genetics are provided here courtesy of Oxford University Press
