Abstract
Horse body size varies greatly due to intense selection within each breed. American Miniatures are less than one meter tall at the withers while Shires and Percherons can exceed two meters. The genetic basis for this variation is not known. We hypothesize that the breed population structure of the horse should simplify efforts to identify genes controlling size. In support of this, here we show with genome-wide association scans (GWAS) that genetic variation at just four loci can explain the great majority of horse size variation. Unlike humans, which are naturally reproducing and possess many genetic variants with weak effects on size, we show that horses, like other domestic mammals, carry just a small number of size loci with alleles of large effect. Furthermore, three of our horse size loci contain the LCORL, HMGA2 and ZFAT genes that have previously been found to control human height. The LCORL/NCAPG locus is also implicated in cattle growth and HMGA2 is associated with dog size. Extreme size diversification is a hallmark of domestication. Our results in the horse, complemented by the prior work in cattle and dog, serve to pinpoint those very few genes that have played major roles in the rapid evolution of size during domestication.
Introduction
The horse, like other domestic mammals, comprises many inbred and highly selected breed populations and has experienced intense selection for certain traits. For example, extreme size diversification is a hallmark of domestication [1] and horses are no exception to this pattern. Today, horse breeds like the American Miniature average less than one meter tall at the withers while Shires and Percherons can exceed two meters [2]. The genetic basis for horse size variation is not known but we hypothesize that the breed population structure of the horse should simplify efforts to identify genes controlling size.
Size is a highly complex trait and until recently no human variants contributing to natural size variation had been found. Now, genome-wide association scans (GWAS) and meta-analyses with large sample sizes have identified nearly 200 size loci in the human genome [3]–[14]. Control of human size is mediated by a huge number of genes of very small effect [15]. In fact, it has been estimated that 697 genes, if identified, would explain just 15.7% of variance in human height [3]. In contrast, a single gene, IGF1, explains ∼10–15% of dog size variation [16], [17] and the majority of dog breed-average mass can be explained by as few as six loci [18]. Domestic mammals therefore offer a powerful system in which to investigate genes controlling size. In support of this, here we show with two GWAS that genetic variation at just four loci can explain the great majority of horse size variation. Unlike humans, which are naturally reproducing and possess many genetic variants with weak effects on size [3], [15], we show that horses, like other domestic mammals [18]–[20], carry just a small number of size loci with alleles of large effect.
Results and Discussion
With the ultimate goal of understanding the genetics of size and the rapid changes in size that have occurred in species under domestication, we previously quantified horse size variation by collecting 33 measurements of the head, neck, trunk and limbs from each of 1215 horses of known breed [2]. Our principal components (PC) analysis of the measurements showed that PC1 (which we will refer to as ‘PC1-size’) quantifies overall horse size and explains 65.9% of the variance in the body measurements [2].
To identify genes controlling PC1-size variation we conducted two GWAS (Fig. 1) using the equine 50 K SNP genotyping chip (Illumina, Inc.). DNA was collected from 48 horses of 16 different large and small breeds (three horses per breed) plus 48 Thoroughbreds of variable size. We inspected pedigrees to avoid including close relatives. The equine 50 K SNP chip has a ∼5 Mbp gap in coverage on chromosome 6. Because high mobility group AT-hook 2 (HMGA2) is within this interval and is a strong candidate for size [3]–[10], [12], [13], we added SNP genotypes from the HMGA2 locus to both GWA scans. We discovered and genotyped 34 SNPs in and around HMGA2 using two-direction capillary sequencing of seven amplicons in each of our 96 horse samples.
We first examined the genotypes via a principal components analysis to assess breed phylogenetic relationships (Fig. 1A–B). Each breed has a distinct genetic signature, as was found in a recent horse phylogeny [21]. The PC1 axis of variation distinguishes between the Thoroughbreds and all the other breeds in our sample, i.e. the 16 breeds of extreme size. It makes sense that these SNPs would readily distinguish the Thoroughbred breed, because a Thoroughbred’s genome was sequenced to provide the horse reference. As a consequence, a disproportionate number of the total SNP discoveries in the horse species have involved sequences from Thoroughbred chromosomes. Interestingly, we find that breeds assort on the genotype PC2, PC3 and PC4 axes largely by size (Fig. 1A–B). The PC2 axis separates our eight sampled large breeds from our eight small breeds. Furthermore, the PC3 axis separates the very largest breeds (Shire and Clydesdale) from the other large breeds, and PC4 separates three of the smallest breeds (American Miniature, Falabella and Shetland Pony) from the other small breeds. This finding supports a model of horse evolution in which divergence and genetic differentiation according to body size occurred early and was subsequently followed by creation of breed lines.
The GWAS were conducted using EMMA [22] to correct for population structure, with sex included as a covariate. Markers with <10% minor allele frequency or >20% missing genotypes were excluded. No samples were excluded. Following EMMA correction, the GWA scans using 16 horse breeds and Thoroughbreds had genomic inflation factors [23] of 1.189 and 1.114, respectively (Fig. 1C).
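As an illustration of how a genomic inflation factor can be obtained from a set of association p-values, the following minimal sketch uses the common median-based definition: the median observed 1-df chi-square statistic divided by its expected median under the null. The text does not state which estimator was used, so this choice, the function name `genomic_inflation` and the simulated uniform p-values are assumptions for illustration only.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Median-based inflation factor (assumed estimator, not taken from the paper).

    Each p-value must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic as z**2,
    # where z is the standard normal quantile at 1 - p/2.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Illustrative input: evenly spaced p-values mimic a well-calibrated null scan,
# for which the inflation factor should be close to 1.
uniform_p = [(i + 0.5) / 10000 for i in range(10000)]
lam = genomic_inflation(uniform_p)
```

Values above 1, such as the 1.189 and 1.114 reported here, indicate test statistics that are somewhat larger than expected under the null.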
The 16 breed GWAS was conducted with 48 of our measured horses that have extreme PC1-size values. We selected three horses from each of eight small and eight large breeds (Fig. 1D). In the Thoroughbred GWAS we genotyped 24 small and 24 large Thoroughbred horses, which represent the ∼10% smallest and ∼10% largest horses for PC1-size among the 219 Thoroughbreds we measured (Fig. 1E). This multi-breed design tests our hypothesis that many of the alleles controlling size are likely to be shared widely across extreme-sized breeds and in some cases, may contribute to size variation within breeds. Limited locus and allelic heterogeneity, and breed sharing of alleles identical-by-descent, is a common pattern for traits under selection in domestic mammals [16], [18], [21], [24]–[26].
We have identified four loci in the 16 breed scan and two loci in the Thoroughbred scan that are significantly associated with horse size following Bonferroni correction for multiple hypothesis testing (Fig. 1D–E and Fig. 2). The locus on chromosome 3 was identified independently in both scans. The four loci on chromosomes 3, 6, 9 and 11 together explain 83% of size variance in the 48 horses from 16 breeds (Fig. 2). Together, the loci on chromosomes 3 and 28 explain an estimated 59% of the variance in Thoroughbred size. While these estimates are likely to be upwardly biased by our small sample size, they nevertheless make the qualitative point that the genetic control of horse size includes loci with large effects. The simplicity of the genetic control of horse size contrasts greatly with the complexity of human size genetics [3], [15] but is similar to results for the domestic dog [18], [20].
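For orientation, the arithmetic behind a Bonferroni threshold can be sketched as follows. The SNP counts are the post-filter totals given in Materials and Methods; the family-wise error rate of 0.05 is an assumption, as the text does not state the value used, and it is also an assumption that these counts are the exact number of tests corrected for.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha = 0.05 is an assumed default."""
    return alpha / n_tests


# Post-filter SNP counts reported for the two scans.
snp_counts = {"16 breed": 37584, "Thoroughbred": 38496}

for scan, n in snp_counts.items():
    t = bonferroni_threshold(n)
    print(f"{scan}: p < {t:.3g}  (-log10 p > {-math.log10(t):.2f})")
```

Under these assumptions both thresholds fall slightly above 1e-6, so a SNP would need a -log10 p-value of nearly 6 to be called significant.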
The top genome-wide associated SNP in both GWAS is on chromosome 3 at 105,547,002 bp and is located 100 kb upstream of the ligand dependent nuclear receptor corepressor-like (LCORL) gene. The association signal at this SNP is near its maximum possible value in our 16 breed scan, as the alleles nearly perfectly segregate by size (Fig. 3). LCORL encodes a transcription factor and has repeatedly been associated with human height [3], [5], [6], [8]–[14]. In cattle, LCORL was identified in a screen for loci under selection [27] and the immediately adjacent gene, NCAPG, has been implicated in prenatal growth [28]. We inferred haplotypes for SNPs flanking the associated SNP (Fig. 3A–D). Haplotype #3 is found in all eight small breeds but only two large breeds (Fig. 3C). Together the eight small breeds carry five different haplotypes. In contrast, haplotype diversity is low in the large breeds, as six of them carry just a single haplotype, consistent with a selective sweep at this locus. The sizes of individual horses are plotted in Fig. 3E.
We also found a significant association with horse size for SNPs within and adjacent to HMGA2 (Fig. 4). We inferred 9-SNP haplotypes and found 10 haplotypes above a 1% frequency (Fig. 4A). Haplotype #1 is carried on 55% of the small horse chromosomes but just a single large horse chromosome (Fig. 4B, C). Haplotype #10, in contrast, is common in large breeds but not found in any small breeds (Fig. 4C). HMGA2 is an architectural transcription factor that regulates gene expression and directs cellular growth, proliferation and differentiation [29]. It was the first gene in which a common variant was associated with human height [4] and this finding has been replicated in many different human populations [3], [5], [7]–[10], [12], [13]. Mice homozygous for an HMGA2 knockout have just 40% of the body weight of controls [30]. Furthermore, the HMGA2 locus has twice been associated with size in dogs [18], [20].
Our association on chromosome 9 is intergenic, in a gene-sparse region 410 kbp upstream of zinc finger and AT hook domain containing (ZFAT). ZFAT encodes a transcription factor [31] and has been associated with height in multiple human populations [3], [11], [12]. It plays a role in hematopoiesis during development, and mice homozygous for a knockout of the gene die as embryos [31].
For the other statistically associated SNPs, the association in the 16 breed scan on chromosome 11 is in the first intron of the LIM and SH3 protein 1 (LASP1) gene. LASP1 mediates cell migration and survival and its expression is induced by IGF1 [32]. Its mis-expression in the mouse disrupts chondrocyte differentiation [33]. Thus, LASP1 is a good candidate for further investigation. However, the locus is gene-dense and fine-mapping will be needed to identify the causal variant or variants contributing to size variation. The Thoroughbred association on chromosome 28 is at a pair of SNPs 3 kbp apart at 18,161,215 bp and 18,164,558 bp. The SNPs are in perfect linkage disequilibrium and are intergenic between chronic lymphocytic leukemia up-regulated 1 (CLLU1) and pleckstrin homology domain containing, family G member 7 (PLEKHG7). The 16 breed scan does not show any association with size at this locus (Fig. 1D), so genotyping in additional Thoroughbreds will be the best way to confirm and refine the association. On chromosome 14 the Thoroughbred scan identified a marginally significant association (Fig. 1E) for a set of SNPs spanning a large interval from 14.7–16.4 Mbp. This region in the horse reference genome assembly lacks genes except for a pair of pseudogenes. One of the pseudogenes is derived from vacuolar protein sorting 4 homolog A (VPS4A), the protein product of which was recently shown to interact with Ras to promote growth factor signaling [34].
Three of the five significant loci we identified have previously been associated with size in humans, which argues against them being false positives. This finding also illustrates the conservation of size determination in mammals and makes possible a comparison of the evolution of these genes in natural versus intensely selected species.
Nearly 1% of all human genes are now implicated in contributing to size variation [3]. We show here that, in stark contrast, the control of the majority of horse size is genetically fairly simple. Genes controlling size in the horse are drawn largely from the broad set already identified in this role in humans. By combining our results with previous findings in cattle and dog we have identified a very short list of genes that were selected repeatedly in domestication to act as major drivers of rapid and extreme size diversification. We hypothesize that HMGA2 or LCORL, or both, may also drive size variation in other domestic mammals. By highlighting here a small but important subset of the size genes found in humans, the horse also offers guidance for exploring size genetics in humans and other mammals.
Note added in proof: while this paper was under review, complementary data describing genome-wide associations with withers height for the LCORL/NCAPG and ZFAT loci were reported for Franches-Montagnes horses [35].
Materials and Methods
Ethics Statement
Horses were sampled with signed consent from owners under a protocol approved by the institutional animal care and use committee at Cornell University.
Sample Collection and Phenotyping
A total of 33 measurements, breed identity, sex and date of birth were collected for each horse, as previously described [2]. Pedigrees and photographs were also collected and were used to confirm owner statements of breed identity. Pedigrees were also inspected to avoid genotyping close relatives. DNA was extracted from tail hair bulbs or blood using standard methods. The measurement data from a total of 1215 horses representing 65 breeds were subjected to a correlation matrix principal components analysis (R; princomp() function) to quantify PC1-size for each horse. See ref. [2] for details.
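To make the PC1-size calculation concrete, the sketch below performs a correlation matrix principal components analysis on a small table of measurements and returns each animal's score on the first component. It is not the R code used in the study: the first eigenvector is found here by power iteration, the function names are hypothetical, and the three-column toy data set is invented purely for illustration (the study used 33 measurements per horse).

```python
import math


def standardize(rows):
    """Center each column and scale it to unit variance (divisor n).

    Assumes every column varies; a constant column would divide by zero.
    """
    n, m = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / n) for c, mu in zip(cols, means)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(m)] for i in range(m)]


def first_component(corr, iterations=500):
    """Power iteration for the dominant eigenvector and its eigenvalue.

    Starts from an all-positive vector, which suits size-like data in
    which all measurements are positively correlated.
    """
    m = len(corr)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(m)) for i in range(m))
    return v, eigenvalue


def pc1_scores(rows):
    """Return (PC1 score per row, proportion of variance explained by PC1)."""
    z = standardize(rows)
    corr = correlation_matrix(z)
    loadings, eigenvalue = first_component(corr)
    scores = [sum(a * b for a, b in zip(r, loadings)) for r in z]
    # The trace of a correlation matrix equals the number of variables.
    return scores, eigenvalue / len(corr)


# Hypothetical measurements (cm) for four animals: three columns only.
toy = [
    [86.0, 40.0, 95.0],
    [150.0, 62.0, 160.0],
    [165.0, 66.0, 178.0],
    [180.0, 70.0, 195.0],
]
scores, explained = pc1_scores(toy)
```

Because the toy measurements all increase together, the first component captures nearly all of the variance and orders the animals from smallest to largest.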
Genotyping and Genome-wide Association Analysis
Genome-wide SNP genotypes were collected for 96 horses using the equine 50 K SNP chip (Illumina, Inc.). The 16 breed sample and the Thoroughbred sample were each run as their own batches at Geneseek, Inc. The Illumina software genotype calls were used. SNPs were removed from the analysis if more than 20% of the samples had a missing genotype or if the minor allele frequency was less than 10%. No samples were removed from the analysis. After filtering, 48 samples and 37,584 SNPs were analyzed in the 16 breed GWA scan, and 48 samples and 38,496 SNPs were analyzed in the Thoroughbred scan. The proportion of size variation explained was estimated using a normal linear model and by comparing the residual variance of a null model with sex only (VN) to a full model (VF) with sex and relevant markers. The proportion of explained variance is defined as 1 - (VF/VN).
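The marker filter and the explained-variance calculation described above can be sketched as follows. This is a minimal illustration rather than the analysis code: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with `None` for a missing call, the linear models are fitted by ordinary least squares, and residual variance is taken as the mean squared residual (the text does not specify the divisor). All function names and the toy data are hypothetical.

```python
def passes_qc(genotypes, max_missing=0.20, min_maf=0.10):
    """Keep a SNP unless >20% of calls are missing or its MAF is <10%."""
    called = [g for g in genotypes if g is not None]
    if not called or 1.0 - len(called) / len(genotypes) > max_missing:
        return False
    freq = sum(called) / (2.0 * len(called))
    return min(freq, 1.0 - freq) >= min_maf


def solve(a, b):
    """Solve a x = b by Gaussian elimination (assumes a is full rank)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def residual_variance(predictors, y):
    """Fit y ~ intercept + predictors by least squares; return mean squared residual."""
    X = [[1.0] + list(p) for p in predictors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return sum(e * e for e in resid) / len(y)


def proportion_explained(sex, markers, y):
    """1 - (VF/VN): null model has sex only, full model has sex plus markers."""
    v_null = residual_variance([[s] for s in sex], y)
    v_full = residual_variance([[s] + list(m) for s, m in zip(sex, markers)], y)
    return 1.0 - v_full / v_null


# Hypothetical toy data: six animals, one marker with a strong additive effect.
sex = [0, 1, 0, 1, 0, 1]
markers = [[0], [0], [1], [1], [2], [2]]
size = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
prop = proportion_explained(sex, markers, size)
```

In this toy example the single marker accounts for nearly all of the variance left after adjusting for sex, so `prop` is close to 1. With only 48 animals per scan, estimates of this kind are expected to be upwardly biased, as noted in Results and Discussion.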
Haplotype Inference
Haploview [36] was used to assess patterns of linkage disequilibrium at the LCORL and HMGA2 loci and blocks of contiguous SNPs were chosen for haplotype inference based on those patterns. Haplotypes were inferred with PHASE [37] using the default parameter values. Due to the small number of samples for each of the 16 breeds, the haplotype inference was conducted using the entire sample set together.
Acknowledgments
We thank the many horse owners who kindly provided samples.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: We thank the Cornell University Center for Vertebrate Genomics for grant support (NS and SB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Clutton-Brock J, Natural History Museum (London, England). A natural history of domesticated mammals. Cambridge, U.K.; New York, NY, USA; London: Cambridge University Press; Natural History Museum; 1999. viii, 238 p.
- 2. Brooks SA, Makvandi-Nejad S, Chu E, Allen JJ, Streeter C, et al. Morphological variation in the horse: defining complex traits of body size and shape. Animal Genetics. 2010;41:159–165. doi: 10.1111/j.1365-2052.2010.02127.x.
- 3. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. doi: 10.1038/nature09410.
- 4. Weedon MN, Lettre G, Freathy RM, Lindgren CM, Voight BF, et al. A common variant of HMGA2 is associated with adult and childhood height in the general population. Nat Genet. 2007;39:1245–1250. doi: 10.1038/ng2121.
- 5. Gudbjartsson DF, Walters GB, Thorleifsson G, Stefansson H, Halldorsson BV, et al. Many sequence variants affecting diversity of adult human height. Nat Genet. 2008;40:609–615. doi: 10.1038/ng.122.
- 6. Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet. 2008;40:575–583. doi: 10.1038/ng.121.
- 7. Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet. 2008;40:584–591. doi: 10.1038/ng.125.
- 8. Kim JJ, Lee HI, Park T, Kim K, Lee JE, et al. Identification of 15 loci influencing height in a Korean population. J Hum Genet. 2010;55:27–31. doi: 10.1038/jhg.2009.116.
- 9. Liu JZ, Medland SE, Wright MJ, Henders AK, Heath AC, et al. Genome-wide association study of height and body mass index in Australian twin families. Twin Research and Human Genetics. 2010;13:179–193. doi: 10.1375/twin.13.2.179.
- 10. Sovio U, Bennett AJ, Millwood IY, Molitor J, O’Reilly PF, et al. Genetic determinants of height growth assessed longitudinally from infancy to adulthood in the northern Finland birth cohort 1966. PLoS Genet. 2009;5:e1000409. doi: 10.1371/journal.pgen.1000409.
- 11. Takeuchi F, Nabika T, Isono M, Katsuya T, Sugiyama T, et al. Evaluation of genetic loci influencing adult height in the Japanese population. J Hum Genet. 2009;54:749–752. doi: 10.1038/jhg.2009.99.
- 12. N’Diaye A, Chen GK, Palmer CD, Ge B, Tayo B, et al. Identification, replication, and fine-mapping of loci associated with adult height in individuals of African ancestry. PLoS Genet. 2011;7:e1002298. doi: 10.1371/journal.pgen.1002298.
- 13. Carty CL, Johnson NA, Hutter CM, Reiner AP, Peters U, et al. Genome-wide association study of body height in African Americans: the Women’s Health Initiative SNP Health Association Resource (SHARe). Hum Mol Genet. 2011.
- 14. Okada Y, Kamatani Y, Takahashi A, Matsuda K, Hosono N, et al. A genome-wide association study in 19 633 Japanese subjects identified LHX3-QSOX2 and IGF1 as adult height loci. Hum Mol Genet. 2010;19:2303–2312. doi: 10.1093/hmg/ddq091.
- 15. Perola M. Genome-wide association approaches for identifying loci for human height genes. Best Pract Res Clin Endocrinol Metab. 2011;25:19–23. doi: 10.1016/j.beem.2010.10.013.
- 16. Sutter NB, Bustamante CD, Chase K, Gray MM, Zhao K, et al. A single IGF1 allele is a major determinant of small size in dogs. Science. 2007;316:112–115. doi: 10.1126/science.1137045.
- 17. Chase K, Carrier DR, Adler FR, Jarvik T, Ostrander EA, et al. Genetic basis for systems of skeletal quantitative traits: principal component analysis of the canid skeleton. Proc Natl Acad Sci U S A. 2002;99:9930–9935. doi: 10.1073/pnas.152333099.
- 18. Boyko AR, Quignon P, Li L, Schoenebeck JJ, Degenhardt JD, et al. A simple genetic architecture underlies morphological variation in dogs. PLoS Biol. 2010;8:e1000451. doi: 10.1371/journal.pbio.1000451.
- 19. Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Rosengren Pielberg G, et al. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011;7:e1002316. doi: 10.1371/journal.pgen.1002316.
- 20. Jones P, Chase K, Martin A, Davern P, Ostrander EA, et al. Single-nucleotide-polymorphism-based association mapping of dog stereotypes. Genetics. 2008;179:1033–1044. doi: 10.1534/genetics.108.087866.
- 21. McCue ME, Bannasch DL, Petersen JL, Gurr J, Bailey E, et al. A high density SNP array for the domestic horse and extant perissodactyla: utility for association mapping, genetic diversity, and phylogeny studies. PLoS Genet. 2012;8:e1002451. doi: 10.1371/journal.pgen.1002451.
- 22. Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–1723. doi: 10.1534/genetics.107.080101.
- 23. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7:781–791. doi: 10.1038/nrg1916.
- 24. Cadieu E, Neff MW, Quignon P, Walsh K, Chase K, et al. Coat variation in the domestic dog is governed by variants in three genes. Science. 2009;326:150–153. doi: 10.1126/science.1177808.
- 25. Parker HG, vonHoldt BM, Quignon P, Margulies EH, Shao S, et al. An expressed fgf4 retrogene is associated with breed-defining chondrodysplasia in domestic dogs. Science. 2009;325:995–998. doi: 10.1126/science.1173275.
- 26. Karlsson EK, Baranowska I, Wade CM, Salmon Hillbertz NH, Zody MC, et al. Efficient mapping of mendelian traits in dogs through genome-wide association. Nat Genet. 2007;39:1321–1328. doi: 10.1038/ng.2007.10.
- 27. Flori L, Fritz S, Jaffrezic F, Boussaha M, Gut I, et al. The genome response to artificial selection: a case study in dairy cattle. PLoS One. 2009;4:e6595. doi: 10.1371/journal.pone.0006595.
- 28. Eberlein A, Takasuga A, Setoguchi K, Pfuhl R, Flisikowski K, et al. Dissection of genetic factors modulating fetal growth in cattle indicates a substantial role of the non-SMC condensin I complex, subunit G (NCAPG) gene. Genetics. 2009;183:951–964. doi: 10.1534/genetics.109.106476.
- 29. Cleynen I, Van de Ven WJ. The HMGA proteins: a myriad of functions (Review). Int J Oncol. 2008;32:289–305.
- 30. Zhou X, Benson KF, Ashar HR, Chada K. Mutation responsible for the mouse pygmy phenotype in the developmentally regulated factor HMGI-C. Nature. 1995;376:771–774. doi: 10.1038/376771a0.
- 31. Tsunoda T, Takashima Y, Tanaka Y, Fujimoto T, Doi K, et al. Immune-related zinc finger gene ZFAT is an essential transcriptional regulator for hematopoietic differentiation in blood islands. Proc Natl Acad Sci U S A. 2010;107:14199–14204. doi: 10.1073/pnas.1002494107.
- 32. Loughran G, Huigsloot M, Kiely PA, Smith LM, Floyd S, et al. Gene expression profiles in cells transformed by overexpression of the IGF-I receptor. Oncogene. 2005;24:6185–6193. doi: 10.1038/sj.onc.1208772.
- 33. Hermann-Kleiter N, Ghaffari-Tabrizi N, Blumer MJ, Schwarzer C, Mazur MA, et al. Lasp1 misexpression influences chondrocyte differentiation in the vertebral column. Int J Dev Biol. 2009;53:983–991. doi: 10.1387/ijdb.072435nh.
- 34. Zheng ZY, Cheng CM, Fu XR, Chen LY, Xu L, et al. CHMP6 and VPS4A mediate the recycling of Ras to the plasma membrane to promote growth factor signaling. Oncogene. 2012.
- 35. Signer-Hasler H, Flury C, Haase B, Burger D, Simianer H, et al. A genome-wide association study reveals loci influencing height and other conformation traits in horses. PLoS One. 2012;7:e37282. doi: 10.1371/journal.pone.0037282.
- 36. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265. doi: 10.1093/bioinformatics/bth457.
- 37. Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001;68:978–989. doi: 10.1086/319501.