Author manuscript; available in PMC: 2014 Aug 8.
Published in final edited form as: Anim Genet. 2013 Oct 27;45(1):153. doi: 10.1111/age.12093

Validation of imputation between equine genotyping arrays

A M McCoy 1, M E McCue 1
PMCID: PMC4000747  NIHMSID: NIHMS554390  PMID: 24164665

Background

Two genotyping arrays are available for the horse, containing ~54 000 and ~65 000 markers, of which only ~45 000 are shared. This leads to a loss of information when combining datasets generated on separate arrays. Genotype imputation offers a potential solution to this problem. Our objective was to assess the accuracy of genotype imputation for the two equine genotyping arrays across scenarios constructed to examine factors previously reported to affect imputation success in domestic animals and humans, including imputed population size, reference population size, reference population makeup (similar to or different from the imputed population) and length of shared haplotype blocks (linkage disequilibrium, LD).1,2

Methods

Genotypes from 248 horses of three breeds [Quarter Horse (QH), n = 143; Standardbred (STB), n = 72; Thoroughbred (TB), n = 33] genotyped on the Illumina Equine SNP70 BeadChip were ‘masked’ down to the 45 703 markers shared by the SNP70 and SNP50 chips and subsequently imputed back to the complete marker set for five chromosomes (ECA 1, 6, 15, 26 and X) using BEAGLE3 with default settings (Appendix S1, Fig. S1). Additionally, 30 QH genotyped on the SNP50 had their genotypes masked and imputed, using a reference population of 280 horses from 13 diverse breeds.
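The mask-and-impute validation design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker names and genotype codes (0, 1, 2 copies of an allele) are toy data, the helper functions `mask_to_shared` and `imputation_success` are hypothetical, the "imputed" dictionary is a hand-written stand-in for the output of imputation software such as BEAGLE, and imputation success is assumed here to mean the proportion of masked genotypes that are called identically to the original genotype.

```python
def mask_to_shared(genotypes, shared_markers):
    """Keep only the genotypes at markers present on both arrays."""
    return {m: g for m, g in genotypes.items() if m in shared_markers}


def imputation_success(true_genotypes, imputed_genotypes, masked_markers):
    """Fraction of masked markers whose imputed genotype equals the original.

    Assumed definition for illustration; the article's exact metric is
    described in its supplementary material.
    """
    compared = [m for m in masked_markers if m in imputed_genotypes]
    if not compared:
        raise ValueError("no masked markers were imputed")
    matches = sum(true_genotypes[m] == imputed_genotypes[m] for m in compared)
    return matches / len(compared)


# Toy genotypes for one horse on the denser array (hypothetical marker names).
true_gt = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1, "snp5": 0}
shared = {"snp1", "snp3"}          # markers common to both arrays

observed = mask_to_shared(true_gt, shared)
masked = set(true_gt) - shared     # markers that must be imputed back

# Stand-in for imputation output: snp4 is deliberately called wrongly.
imputed = dict(observed, snp2=1, snp4=2, snp5=0)

success = imputation_success(true_gt, imputed, masked)
print(f"imputation success: {success:.1%}")
```

Because the original genotypes at the masked markers are known, every imputed call can be checked directly, which is what makes this design suitable for validation.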

Results/Conclusions

Results for 20 SNP70 scenarios are summarized in Table S1. Overall, mean imputation success was 94.8% (individual horse range 82.2–100%). Generally, ECA 1, 15 and 26 performed better than did ECA 6 and X. For ECA 6, this may be partly because a large block of imputed markers is located at the end of the chromosome and thus does not have an ideal haplotype context for imputation. Contrary to previous reports,2 size of the imputed population did not impact imputation success. Imputation success increased with larger reference population sizes (Fig. S2) and when imputed and reference populations were breed-matched. However, large mixed-breed reference populations resulted in more accurate imputation than did small breed-matched reference populations. Breeds with longer LD had higher imputation success than did those with shorter LD (TB > STB > QH; Fig. S2). These results reflect findings reported in humans.1,4 Allelic R², the estimated squared correlation between the imputed allele dosage and the true allele dosage for a marker, was used as a measure of confidence for imputed genotype calls. The overall mean R² was 0.771 (range, 0.582–0.981). Imputation success and R² were highly linearly correlated (r² = 0.79). Results for the SNP50 were comparable to those for the SNP70 (Appendix S1). The total number of markers available for analysis after imputation was 73 200, an increase of ~27 500 markers from the set shared by the two chips. In conclusion, imputation between the two arrays was highly accurate.
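To make the allelic R-squared definition concrete, the sketch below computes a squared Pearson correlation between true and imputed allele dosages for a single marker. It is a minimal illustration, not the calculation performed by BEAGLE: imputation software estimates this quantity from posterior genotype probabilities without access to the true genotypes, whereas this example assumes the truth is known, as in a masking experiment. The function name `squared_correlation` and the dosage values are hypothetical.

```python
from statistics import mean


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("dosages have no variance; correlation is undefined")
    return sxy ** 2 / (sxx * syy)


# Toy data for one marker across six horses (illustrative values only).
true_dosage = [0, 1, 2, 1, 0, 2]                 # copies of the counted allele
imputed_dosage = [0.1, 0.9, 1.8, 1.2, 0.0, 1.7]  # expected copies after imputation

r2 = squared_correlation(true_dosage, imputed_dosage)
print(f"allelic R-squared (empirical): {r2:.3f}")
```

Values near 1 indicate that the imputed dosages track the true dosages closely; a monomorphic marker has no dosage variance, so the correlation is undefined and the function raises an error rather than returning a number.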

Acknowledgements

Thanks to Dr. James MacLeod for TB data and Robert Schaefer for a custom shell script. AMM was supported by an NIH institutional training grant (T32OD10993).

Footnotes

Supporting information

Additional supporting information may be found in the online version of this article.

Appendix S1 Supplemental text.

Figure S1 Complete pipeline for imputation of equine genotyping data.

Figure S2 Mean imputation success with an imputed population n = 10 across a range of reference population sizes (n = 20–100) for each of three breeds [Quarter Horse (QH), red squares; Standardbred (STB), blue circles; Thoroughbred (TB), green triangles].

Figure S3 Venn diagram of marker overlap between the Illumina Equine SNP50 and SNP70 beadchips.

Table S1 Summary of SNP70 validation scenario results.

Table S2 Summary of preliminary imputation data.

Table S3 Summary of SNP50 validation scenario results.
