Abstract
High-density single nucleotide polymorphism (SNP) genotyping arrays are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships between individuals in populations and studying marker–trait associations in mapping experiments. We developed a genotyping array including about 90 000 gene-associated SNPs and used it to characterize genetic variation in allohexaploid and allotetraploid wheat populations. The array includes a significant fraction of common genome-wide distributed SNPs that are represented in populations of diverse geographical origin. We used density-based spatial clustering algorithms to enable high-throughput genotype calling in complex data sets obtained for polyploid wheat. We show that these model-free clustering algorithms provide accurate genotype calling in the presence of multiple clusters including clusters with low signal intensity resulting from significant sequence divergence at the target SNP site or gene deletions. Assays that detect low-intensity clusters can provide insight into the distribution of presence–absence variation (PAV) in wheat populations. A total of 46 977 SNPs from the wheat 90K array were genetically mapped using a combination of eight mapping populations. The developed array and cluster identification algorithms provide an opportunity to infer detailed haplotype structure in polyploid wheat and will serve as an invaluable resource for diversity studies and investigating the genetic basis of trait variation in wheat.
Keywords: single nucleotide polymorphism, polyploid wheat, wheat iSelect array, genotyping, high-density map, genetic diversity
Introduction
High-density single nucleotide polymorphism (SNP) data are widely used to detect marker–trait associations in quantitative trait locus (QTL) mapping experiments and genome-wide association studies (GWAS) (Cook et al., 2012; Jia et al., 2013; Tian et al., 2011; Zhao et al., 2011). Advances in next-generation sequencing have significantly facilitated the discovery of SNPs by whole genome (Berkman et al., 2012; Chia et al., 2012; Xu et al., 2012), transcriptome (Allen et al., 2011; Cavanagh et al., 2013; Oliver et al., 2013) or reduced-representation sequencing in diverse populations of individuals (Elshire et al., 2011; Poland et al., 2012; Saintenac et al., 2011, 2013; Van Poecke et al., 2013). Sets of informative SNPs selected based on their distribution across the genome, minor allele frequency (MAF) and intervariant linkage disequilibrium (LD), have been used to design high-density genotyping assays based on various technological principles (Cavanagh et al., 2013; Ganal et al., 2011; Kim et al., 2007; Song et al., 2013). While SNP arrays can be prone to ascertainment bias caused by preselection of SNPs in populations of limited size (Albrechtsen et al., 2010), reduced computational requirements for downstream data processing, high call frequency, low error rate and ease of use make SNP-based platforms an attractive genotyping tool.
High-density SNP arrays have been developed for a number of economically important crops and animals (Ganal et al., 2011; Sim et al., 2012; Song et al., 2013; Wiedmann et al., 2008; Zhao et al., 2011) and successfully used for genetic studies. The GWAS of 413 diverse rice accessions using a 44K SNP genotyping chip identified dozens of alleles controlling 34 morphological, developmental and agronomic traits (Zhao et al., 2011). The 50K maize SNP chip has been used to study the genetic control of maize kernel composition in a nested association mapping panel (Cook et al., 2012) and identify signatures of wild relative allele introgressions in the maize genome (Hufford et al., 2012). The recently developed 9K SNP wheat chip was used to detect genomic regions targeted by breeding and improvement selection in wheat (Cavanagh et al., 2013).
The allotetraploid and allohexaploid genomes of durum (Triticum turgidum subsp. durum (Desf.) Husnot) and bread wheat (Triticum aestivum L.), respectively, pose a significant challenge for the analysis of genotyping data generated using most SNP genotyping platforms (Akhunov et al., 2009). The ratio of allelic variants observed in polyploids often deviates from the ratio observed in diploid organisms, resulting in genotype cluster plots (plots of the fluorescence intensities of the A and B alleles) that are difficult to analyse using conventional genotype calling software. In the polyploid wheat genome, this problem is further complicated by the presence of paralogous loci and secondary SNPs that interfere with genotyping oligonucleotide annealing (Akhunov et al., 2009). While there have been attempts to develop cluster identification algorithms for polyploid genotyping data (Serang et al., 2012), genotype calling in allopolyploid wheat still remains a significant challenge. In our previous study (Cavanagh et al., 2013), we applied the default algorithm implemented in GenomeStudio (Illumina) followed by extensive manual data curation. This approach resulted in high-quality genotype calls for many assays, but not for those that generated multiple clusters, closely spaced clusters or clusters with low fluorescence signal intensity. Further development of genotype calling procedures for polyploid species was required to accelerate the analysis of these complex data sets.
Here, we present the development of a wheat SNP iSelect array comprising approximately 90 000 gene-associated SNPs that provides dense coverage of the wheat genome. To analyse the complex genotyping data generated for polyploid wheat, we applied two complementary model-free density-based clustering algorithms: OPTICS and DBSCAN (Ankerst et al., 1999; Ester et al., 1996). We demonstrate the utility of the developed array and genotype calling algorithms to reliably detect SNPs across worldwide wheat populations including hexaploid and tetraploid cultivars and landraces. A total of 46 977 SNP markers were genetically mapped using eight mapping populations, creating a resource for diversity studies and high-resolution dissection of complex traits in wheat.
Results
Variant discovery
For hexaploid wheat, more than 526 million quality-filtered RNA-seq reads (∼73 Gbp) were generated for 19 bread wheat accessions (Table S1). On average, 77% of reads from each accession were mapped to the reference transcripts (RTs). After quality filtering, 67 686 variants were discovered of which 72% were transitions and 28% were transversions. Among the 39 110 SNPs located in the protein-coding region, 24 460 SNPs were synonymous and 14 650 SNPs were nonsynonymous. Re-sequencing of sites polymorphic between accessions Kukri and RAC875 validated about 73% of SNPs (53 of 73) (Table S2), a result comparable to other wheat studies in which SNP discovery was performed using next-generation sequencing (Allen et al., 2011; Cavanagh et al., 2013; Edwards et al., 2012; Lai et al., 2012).
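The transition/transversion split reported above follows from the base classes of the two alleles: a change within purines (A, G) or within pyrimidines (C, T) is a transition, and any other change is a transversion. A minimal sketch of that classification is given here; the function names and the toy SNP list are illustrative and are not taken from the study's pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a biallelic SNP."""
    ref, alt = ref.upper(), alt.upper()
    alleles = {ref, alt}
    if ref == alt or alleles - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid biallelic SNP: {ref}/{alt}")
    # Both alleles in the same base class means a transition.
    same_class = alleles <= PURINES or alleles <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(snps) -> float:
    """Fraction of SNPs in an iterable of (ref, alt) pairs that are transitions."""
    labels = [classify_snp(ref, alt) for ref, alt in snps]
    if not labels:
        raise ValueError("no SNPs supplied")
    return labels.count("transition") / len(labels)


# Toy data for illustration only (not from the study).
toy_snps = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C")]
toy_fraction = transition_fraction(toy_snps)
```

Applied to the full variant set, the same fraction would correspond to the 72% transitions reported for bread wheat.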
For tetraploid wheat, 666 million quality-filtered RNA-seq reads (∼64 Gbp) were generated for 18 cultivars selected from a worldwide collection of durum wheat (Maccaferri et al., 2011) (Table S3) and one accession of emmer wheat (T. turgidum subsp. dicoccum Shrank ex Schübler Thell). Reads were mapped to RTs assembled for cultivar Svevo from ∼66 million reads (Table S4) and used to identify a total of 52 646 variants. The frequencies of transitions and transversions, and synonymous and nonsynonymous mutations were similar to those observed for bread wheat.
For assay design, we combined the SNPs discovered in this study with those previously identified in hexaploid wheat (Allen et al., 2011; Cavanagh et al., 2013; Pont et al., 2013) and with a small set of SNPs discovered by amplicon sequencing in a set of 24 varieties (M. Ganal, unpublished data). To this marker set, SNPs from the diploid ancestor of the wheat D genome, Aegilops tauschii (Luo et al., 2013), were added. A total of 91 829 SNPs (Table S5) were included in the genotyping array, of which 261 and 91 568 were Infinium I (two probes per SNP) and Infinium II (one probe per SNP) assays, respectively. Of the 91 829 SNPs included in the original assay design, 81 587 (89%) passed the assay design process and produced functional assays.
Analysis of 81 587 nucleotide sequences corresponding to the functional iSelect SNP detection probes against the contigs assembled in the chromosome survey sequencing (CSS) project (http://wheat-urgi.versailles.inra.fr/Seq-Repository) identified 517 587 hybridization sites in the wheat genome. The average number of hybridization sites per probe was 6.3 with a median of three, suggesting that probes mostly targeted low-copy sequences in the wheat genome (Appendix S1, Figure S1). Using transcriptome and whole-genome shotgun sequences available for nine wheat varieties from the discovery panel (AC Barrie, Alsen, Baxter, Chara, Pastor, Volcani, Westonia, Xiaoyan54 and Yitpi), 25 252 (31%) of the SNPs could be assigned to a specific locus (on the A, B or D genome) in the CSS assemblies based on the association of the intervarietal polymorphism with sequence variation that distinguished between the hybridization sites on the different genomes to which the SNP detection probes were predicted to hybridize (Table S6). Comparison of the chromosomal assignments for 4538 of these SNPs that were also present on the 9K wheat iSelect assay and which had been previously genetically mapped (Cavanagh et al., 2013) revealed 93.1% accuracy for the in silico assignments. The remaining 56 335 SNPs, which did not show polymorphism among these nine accessions, were tentatively assigned to wheat chromosomes based on the best blastn hit (based on percentage identity) of the nucleotide sequence flanking the SNP against the CSS contigs. Comparison of the tentative chromosomal locations for these SNPs with evidence from genetic mapping (Cavanagh et al., 2013) indicated 79.6% accuracy for such assignments.
By comparing the flanking sequences of 81 587 SNPs, 13 357, 13 548 and 12 870 orthologous genes were uniquely tagged in Brachypodium, rice and sorghum, respectively (Table S7), providing a resource for comparative analysis of the wheat genome.
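The tentative assignment by best blastn hit, and its comparison with genetic mapping evidence, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Hit` record, the function names and the toy values are hypothetical, hits are assumed to be already parsed from tabular blastn output, and ties in percentage identity are resolved by keeping the first hit encountered, a rule the text does not specify.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    snp_id: str
    chromosome: str   # chromosome of the CSS contig that was hit
    pident: float     # percentage identity of the alignment


def best_hit_per_snp(hits):
    """Map each SNP to the chromosome of its highest-identity hit."""
    best = {}
    for hit in hits:
        current = best.get(hit.snp_id)
        # Strict '>' keeps the first hit when identities are tied.
        if current is None or hit.pident > current.pident:
            best[hit.snp_id] = hit
    return {snp_id: hit.chromosome for snp_id, hit in best.items()}


def assignment_accuracy(predicted, mapped):
    """Fraction of SNPs present in both dicts whose chromosomes agree."""
    shared = predicted.keys() & mapped.keys()
    if not shared:
        raise ValueError("no SNPs in common")
    return sum(predicted[s] == mapped[s] for s in shared) / len(shared)


# Toy data for illustration only.
toy_hits = [
    Hit("snp1", "1A", 98.0), Hit("snp1", "1B", 99.5),
    Hit("snp2", "2D", 100.0),
    Hit("snp3", "3A", 97.0), Hit("snp3", "3B", 96.0),
]
toy_predicted = best_hit_per_snp(toy_hits)
toy_mapped = {"snp1": "1B", "snp2": "2D", "snp3": "3B"}
toy_accuracy = assignment_accuracy(toy_predicted, toy_mapped)
```

In this toy case two of three assignments agree with the map; the 93.1% and 79.6% figures above are the same kind of ratio computed over SNPs that had been genetically mapped previously.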
SNP genotype calling in polyploid wheat
As shown previously (Akhunov et al., 2009; Cavanagh et al., 2013), genotyping of polyploid wheat is complicated by the presence of duplicated (homoeologous and paralogous) genes. Due to low coding sequence divergence between homoeologous gene copies on different wheat genomes (2%–4%), and often between paralogous gene copies on the same genome, oligonucleotide probes can hybridize not only to the targeted locus, but also to its homoeologues and/or paralogues. As a consequence, the ratio of allele-specific fluorescent signals observed for an assay depends on the dosage of alternative SNP variants in the wheat genome. Increasing locus copy number reduces the ratio of allele-specific fluorescent signal, and the separation of SNP allele clusters (Figure 1). Wheat genotyping can be further complicated by the presence of mutations that modify oligonucleotide annealing sites located in one or more gene copies (Figure 1). This can result in assays that do not hybridize to all gene copies and show different cluster types.
Figure 1.

Assay IWB2818 shows multiple clusters in unrelated hexaploid wheat accessions, which can be tracked within bi-parental mapping populations as biallelic markers. The targeted [T/C] single nucleotide polymorphism (SNP) site is located in the A genome of hexaploid wheat. An SNP is located in the primer binding sequence of the B genome and results in the additional cluster (C3) on the genotyping plot due to failed/reduced hybridization for the assay oligonucleotide probe. Chara × Glenlea DH samples are shown in blue (situation C2/C3, polymorphism in Genome B). Westonia × Kauz DH samples are shown in red (situation C1/C3, polymorphism in Genome A). Diverse germplasm is shown in grey. Theta is the angle of deviation from pure T allele signal, where 0 represents pure T allele signal and 1 represents pure C allele signal; R is the intensity of hybridization signal. The graphical representation of genotypes in clusters C1, C2 and C3 is shown on the right side, where a grey arrow represents the Infinium probe.
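The compression of clusters with increasing locus copy number can be illustrated with a small calculation. The sketch below assumes a common polar transform of the two normalized allele intensities, Theta = (2/π)·arctan(Y/X) and R = X + Y, and further assumes that signal is strictly proportional to allele dosage and that the probe hybridizes equally to every gene copy. These are idealizations for illustration, not properties established in the text, and the function names are hypothetical.

```python
import math


def polar_coordinates(x: float, y: float) -> tuple[float, float]:
    """Convert normalized allele intensities (x, y) to (theta, R).

    theta runs from 0 (pure first-allele signal) to 1 (pure second-allele
    signal); R is the summed signal intensity.
    """
    if x < 0 or y < 0 or (x == 0 and y == 0):
        raise ValueError("intensities must be non-negative and not both zero")
    theta = (2 / math.pi) * math.atan2(y, x)
    return theta, x + y


def expected_theta(alt_copies: int, total_copies: int) -> float:
    """Idealized theta when signal is proportional to allele dosage."""
    if total_copies <= 0 or not 0 <= alt_copies <= total_copies:
        raise ValueError("invalid copy numbers")
    theta, _ = polar_coordinates(total_copies - alt_copies, alt_copies)
    return theta


# Single-copy target in a homozygous line: both copies carry the alternative allele.
theta_single_copy = expected_theta(2, 2)
# Same SNP when the probe also binds two homoeologues fixed for the other allele:
# only 2 of the 6 hybridizing copies carry the alternative allele.
theta_three_genomes = expected_theta(2, 6)
```

Under these assumptions the homozygous alternative cluster moves from Theta ≈ 1 to Theta ≈ 0.30 when two invariant homoeologous copies also hybridize, which is why clusters crowd together as copy number rises.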
We applied the standard diploid version of GenomeStudio (GS) software (Illumina) to call genotypes for the iSelect 90K SNP assay. For this purpose, a diverse worldwide panel of almost 2500 hexaploid accessions was assembled and used to develop a cluster file storing information about cluster positions on the genotyping plot. A total of 35 684 (44%) assays showed three distinct clusters corresponding to the AA, AB and BB genotypes expected for a biallelic SNP (Table S8): 20 785 had well-separated clusters that were correctly captured by the default algorithm (Figure 2a); 9960 had poor cluster separation, for which manual clustering was required and heterozygous genotypes could not be called (Figure 2c); and 4939 showed four clusters. Of the remaining assays, 25 199 (31%) were monomorphic (consistent with 73% Sanger-based validation rate) and 20 704 (25%) showed complex clustering patterns that could not be correctly captured even with manual curation (Figure 2e,g,i). Similar proportions of polymorphic and monomorphic sites were identified in the SNP discovery panel. Overall, 56 388 (69%) of the 81 587 functional iSelect bead chip assays visually revealed polymorphism among the unrelated wheat accessions, of which 35 684 (63% of 56 388) could be correctly clustered for genotype calling providing six times more markers than the previously developed 9K iSelect assay (Cavanagh et al., 2013). In a diverse set of 55 tetraploid cultivars and landraces, 20 197 SNPs showed clustering corresponding to bi-allelic sites. A total of 36 037 biallelic SNPs segregated in the populations of both tetraploid and hexaploid wheat.
Figure 2.
Examples of clustering obtained using diploid and polyploid versions of the GenomeStudio software, respectively: (a, b) assay IWB8846; (c, d) assay IWB63414; (e, f) assay IWB36584; (g, h) assay IWB15488; and (i, j) assay IWB54207.
The shortcomings of the standard version of the GS software for analysing polyploid genotyping data are its inability to identify multiple (>3) clusters, its inability to call heterozygous genotypes when clusters are compressed due to the hybridization of assay probes to duplicated targets, and the requirement for time-consuming manual curation of assays incorrectly clustered by the default algorithm. To address these shortcomings, we used two model-free density-based cluster identification algorithms: DBSCAN (Ester et al., 1996) and OPTICS (Ankerst et al., 1999). Both algorithms can detect any number of clusters of arbitrary shape. They each require only two user-defined input parameters, ‘minimum number of points in cluster’ and ‘cluster distance’. The first parameter specifies how many data points must lie within the circular area defined by the cluster distance to form a cluster, while the second parameter defines the minimum separation distance that clusters need in order not to merge. Together, these two parameters define the density of the cluster areas. The ‘minimum number of points in cluster’ parameter helps to minimize the merging of two or more clusters that are not fully separated. A modified OPTICS algorithm can identify a user-defined number of clusters. To speed up manual annotation, Illumina developed a polyploid version of GS that currently implements both of these algorithms.
Using these algorithms in combination with a cluster file developed using multiple bi-parental mapping populations, we identified clusters in genotyping data sets from unrelated wheat lines (Appendix S1, Figures S2–S4, Tables S9, S10). Among the 56 388 assays that exhibited visible polymorphism, 46 880 (83%) had more than a single cluster correctly captured. For the other 9508 assays, only one of the observed clusters was captured, indicating that one or more additional clusters on a genotyping plot were not present in any of the six mapping populations used for cluster file development. Only 1783 (4%) of the 48 663 assays revealing polymorphism in the six mapping populations were not present in the unrelated accessions. Inclusion of additional mapping populations at the cluster file development stage should increase the number of polymorphisms that can be correctly called in diverse populations.
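To make the role of the two parameters concrete, the following is a minimal textbook DBSCAN written in plain Python and applied to toy (Theta, R) points. It is a sketch, not the implementation in the polyploid version of GS: the names `eps` and `min_pts` are the conventional DBSCAN names, and treating them as equivalents of ‘cluster distance’ and ‘minimum number of points in cluster’ is an assumption. The toy coordinates are invented for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep growing
    return labels


def toy_cluster(theta, r):
    """Five tightly spaced points around a cluster centre."""
    offsets = [(-0.02, 0.0), (0.02, 0.0), (0.0, -0.02), (0.0, 0.02), (0.0, 0.0)]
    return [(theta + dt, r + dr) for dt, dr in offsets]


# Three full-intensity clusters, one low-intensity cluster and one stray sample.
toy_points = (toy_cluster(0.05, 1.0) + toy_cluster(0.5, 1.0)
              + toy_cluster(0.95, 1.0) + toy_cluster(0.5, 0.2) + [(0.3, 0.6)])
toy_labels = dbscan(toy_points, eps=0.05, min_pts=4)
n_clusters = len(set(toy_labels) - {NOISE})
```

With these settings the four groups are recovered as separate clusters, including the low-intensity one, and the stray sample is left unassigned. Because no genotype model is imposed, the number of clusters is an output rather than an input.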
To confirm the accuracy of the clustering, we compared genotype calls produced by the diploid and polyploid versions of GS for biallelic assays with three clusters corresponding to the AA, AB and BB genotypes. The concordance between the two data sets was 99.6%, and the overall cluster assignment rate was 99% and 97% for the diploid and polyploid versions of GS, respectively. The differences in genotype and cluster assignment rates were primarily due to three factors: (i) low data density, especially for heterozygous genotypes that prevented cluster identification using DBSCAN and OPTICS. This was most notable for SNPs that likely had single-dose occurrence in the wheat genome and produced well-spaced clusters (Figure 2a,b); (ii) cluster compression (Figure 2c,d) and irregular cluster shape (Figure 2g,h) that prevented complete data capture by the default diploid algorithm; and (iii) application of the Confidence Score Limit in the polyploid version to exclude nonreliable data.
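The two summary statistics used in this comparison can be sketched as below. The text does not give the exact definitions, so the sketch assumes that concordance is computed only over samples called by both methods and that the assignment rate is the fraction of samples receiving any call; the function names, the `NO_CALL` marker and the toy calls are illustrative.

```python
NO_CALL = None


def call_rate(calls) -> float:
    """Fraction of samples that received a genotype call."""
    if not calls:
        raise ValueError("no samples supplied")
    return sum(c is not NO_CALL for c in calls) / len(calls)


def concordance(calls_a, calls_b) -> float:
    """Agreement between two call sets over samples called in both."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call sets must cover the same samples")
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not NO_CALL and b is not NO_CALL]
    if not both:
        raise ValueError("no samples called by both methods")
    return sum(a == b for a, b in both) / len(both)


# Toy calls for one assay across five samples (illustration only).
diploid_calls = ["AA", "AB", "BB", "AA", "BB"]
polyploid_calls = ["AA", NO_CALL, "BB", "AA", "AA"]

toy_concordance = concordance(diploid_calls, polyploid_calls)
toy_rate_polyploid = call_rate(polyploid_calls)
```

Keeping the two measures separate matters here: a sample excluded by a confidence threshold lowers the assignment rate without counting as a discordant call.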
To assess the accuracy of near-automated genotype clustering in mapping populations (3-step procedure described in Appendix S1), we used the polyploid GS to identify polymorphisms in two doubled-haploid mapping populations. The majority (average 79%) of SNPs were detected in the first step (Table S11). The remaining SNPs were captured mostly in the second step, in which the rate of incorrectly clustered assays increased to an average of 5.9%. Visual inspection of 5000 randomly selected assays for which only a single cluster was detected revealed a ∼5% rate of missed polymorphisms. Genotype calling of the same mapping populations using the cluster file developed for the diploid version of GS revealed substantially fewer polymorphic assays: 11 187 and 11 877 in the Chara × Glenlea and Young × AUS33414 populations, respectively.
Construction of genetic maps
Eight doubled-haploid mapping populations were used to order SNPs along wheat chromosomes. Genotype calling was performed using the polyploid version of GS. A total of 45 109 assays revealed polymorphism in the mapping populations (Tables S12 and S13). Of these assays, 44 345 could be mapped to one or more of 46 977 loci on specific wheat chromosomes. Of the remaining 764 polymorphic assays, 20 mapped to linkage groups that could not be unambiguously assigned to a wheat chromosome, and 744 were not linked with any other markers. Of the assays revealing polymorphism that could be mapped on wheat chromosomes, 41 746 mapped to a single position, 2508 to two different positions, 69 to three positions and two to four positions. Consistent with previously observed levels of genetic diversity in the wheat genomes, the majority of mapped markers were located in the A (35%) and B (50%) genomes. Only 15% of markers mapped to the D genome (Table 1).
Table 1.
Distribution of mapped SNP loci across the wheat genome
| Chromosome group | A genome | B genome | D genome | Total |
|---|---|---|---|---|
| 1 | 2260 | 4020 | 1082 | 7362 |
| 2 | 2502 | 6456 | 1561 | 10 519 |
| 3 | 1975 | 2739 | 899 | 5613 |
| 4 | 2017 | 1513 | 320 | 3850 |
| 5 | 2672 | 3347 | 1120 | 7139 |
| 6 | 2369 | 2810 | 618 | 5797 |
| 7 | 2867 | 2526 | 1304 | 6697 |
| Total | 16 662 | 23 411 | 6904 | 46 977 |
SNP, single nucleotide polymorphism.
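The relationship between assays and loci, and the genome shares quoted in the text, can be checked with a short calculation. The counts below are copied from the text and from the totals row of Table 1; only the variable names are introduced here.

```python
# Number of assays that mapped to 1, 2, 3 or 4 positions (from the text).
assays_by_position_count = {1: 41_746, 2: 2_508, 3: 69, 4: 2}

# Each assay contributes one locus per position it maps to.
total_loci = sum(n_positions * n_assays
                 for n_positions, n_assays in assays_by_position_count.items())

# Genome totals from Table 1.
loci_per_genome = {"A": 16_662, "B": 23_411, "D": 6_904}
genome_share = {genome: round(100 * n / sum(loci_per_genome.values()))
                for genome, n in loci_per_genome.items()}
```

The weighted sum gives 46 977 loci, matching the Table 1 grand total, and the rounded shares are 35%, 50% and 15% for the A, B and D genomes.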
Six of the doubled-haploid mapping populations were used to construct a consensus SNP map containing 40 267 loci (Table S13). Comparison of the consensus map order with that obtained for individual populations showed high collinearity across chromosomes, confirming the high accuracy of genotype calling using the polyploid GS (Figure 3a). Comparative analysis of SNP order revealed by assays detecting segregation at nontarget SNPs (see below) showed a high level of gene order conservation between homoeologous chromosomes, as well as frequent gene duplications across chromosomes (Figure 3b). These assays provide insights into the structural organization of the wheat genome, revealing new and previously characterized re-arrangements (Devos et al., 1995).
Figure 3.

(a) Alignment of chromosome 2 consensus maps with genetic maps from individual bi-parental crosses. BTS/AUS = BT-Schomburgk × AUS33384, Cha/Glen = Chara × Glenlea, Op/Syn = W7984 × Opata M85, Sun/AUS = Sundor × AUS30604, Wes/Kauz = Westonia × Kauz, Yo/AUS = Young × AUS33414. Chromosome 2B from Yo/AUS was excluded from consensus map construction due to the presence of the alien Sr36 introgression in cultivar Young, whose presence restricts recombination and complicates map construction. (b) Comparative analysis of the order of single nucleotide polymorphism (SNP) loci in the wheat genome based on SNPs showing segregation at two (left) and three (right) duplicated loci.
Identification of nontarget SNPs and null alleles
The ability of the polyploid clustering algorithms to detect any number of clusters allowed for the capture of genotypic data for SNP assays that detected polymorphism at nontarget SNPs located on homoeologous chromosomes or duplicated paralogous targets on different chromosomes. Such assays showed more than the three expected clusters for a biallelic SNP when genotyped in unrelated germplasm but could be resolved as biallelic markers in segregating bi-parental mapping populations (Figure 1). A total of 25 643 assays detected multiple clusters in the population of unrelated hexaploid wheat accessions, representing 31% (25 643/81 587) of the entire content in the iSelect 90K bead chip array, and 46% (25 643/56 388) of all polymorphic assays. Using eight mapping populations, we were able to map polymorphisms revealed by 18 360 (72%) of these assays.
The ability of the clustering algorithms implemented in the polyploid version of GS to detect clusters of any shape allowed for the identification of null alleles (clusters with low signal intensity) resulting from either the deletion of single-copy genes in the wheat genome or the divergence of genotyping probe annealing sites (Figure 4). A total of 1660 single-locus SNPs showed evidence for null alleles. We investigated the molecular basis of null allele origin by comparing the sequences of SNP probes detecting these alleles in wheat cultivar Chinese Spring with the genomic sequence of this cultivar. Based on the comparison of flanking sequences of 94 SNP assays detecting the null alleles in cultivar Chinese Spring, 46 assays did not have annealing sites in the genome. This result suggests that about 50% of null alleles result from gene deletions and the remainder are the consequence of sequence divergence at the SNP probe annealing sites.
Figure 4.

Examples of null alleles in the wheat genome. (a) Assay IWB17050 detecting a null allele; (b) Assay IWB12859 detects a co-dominant single nucleotide polymorphism locus that also shows evidence of a null allele; (c) Frequency of nulls in the populations of different geographical origin.
Genetic variation assessment using the 90K wheat SNP assay
The 90K iSelect genotyping assay was tested by surveying SNP variation in a sample of 550 hexaploid and 55 tetraploid wheat accessions including landraces and cultivars of different geographic origin from North America, Australia, Europe and Asia (Table S14). The number of biallelic polymorphic loci per population varied from 12 524 in Australian material to 21 110 in European material (Table 2). The level of genetic diversity in the cultivars was either comparable to or higher than that of the population of landraces, possibly due to ascertainment bias in the SNP discovery panel, which comprised mainly cultivars.
Table 2.
SNP diversity summary assessed in the populations of wheat cultivars and landraces
| Populations | Ploidy | Accessions | Mean heterozygosity | Number of polymorphic bi-allelic SNPs |
|---|---|---|---|---|
| Asia | 6n | 29 | 0.20 | 16 968 |
| Australia | 6n | 182 | 0.24 | 12 524 |
| Canada | 6n | 46 | 0.17 | 15 427 |
| Europe | 6n | 71 | 0.18 | 21 110 |
| USA | 6n | 95 | 0.15 | 17 013 |
| Landraces | 6n | 127 | 0.20 | 17 984 |
| Durum wheat | 4n | 55 | 0.07 | 20 197 |
SNP, single nucleotide polymorphism.
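Per-population summaries of this kind are derived from allele frequencies at each SNP. The sketch below computes minor allele frequency and a heterozygosity value from genotype calls. It assumes that ‘mean heterozygosity’ refers to expected heterozygosity (2pq at a biallelic SNP) averaged over SNPs, which the text does not state explicitly; the function names and toy genotypes are illustrative.

```python
CALLED = ("AA", "AB", "BB")


def allele_frequency(genotypes) -> float:
    """Frequency of allele B among called genotypes; no-calls are ignored."""
    called = [g for g in genotypes if g in CALLED]
    if not called:
        raise ValueError("no called genotypes")
    return sum(g.count("B") for g in called) / (2 * len(called))


def minor_allele_frequency(genotypes) -> float:
    p = allele_frequency(genotypes)
    return min(p, 1 - p)


def expected_heterozygosity(genotypes) -> float:
    """Expected heterozygosity 2pq at one biallelic SNP."""
    p = allele_frequency(genotypes)
    return 2 * p * (1 - p)


def mean_heterozygosity(snp_genotypes) -> float:
    """Average expected heterozygosity over a list of SNPs."""
    values = [expected_heterozygosity(g) for g in snp_genotypes]
    return sum(values) / len(values)


# Toy genotypes for two SNPs in four lines (illustration only).
snp_1 = ["AA", "AA", "AA", "BB"]
snp_2 = ["AA", "BB", "AA", "BB"]
toy_maf = minor_allele_frequency(snp_1)
toy_mean_het = mean_heterozygosity([snp_1, snp_2])
```

A SNP counts as polymorphic in a population when its minor allele frequency is above zero, which is how a per-population tally of polymorphic biallelic SNPs could be obtained from the same frequencies.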
To ascertain the transferability of SNP markers across populations, we assessed the number of shared alleles and the degree of genetic differentiation (FST) between the wheat populations (Table 3). The majority of polymorphic SNPs were shared among populations, suggesting that the targeting of SNPs with both alleles present in at least two individuals in the discovery panel enriched the array for common SNP variants. This observation is consistent with the prevalence of SNPs of intermediate to high MAF in the populations (Figure 5a). FST variation between the populations of different geographical origin is likely caused by the usage of different founders (Table 3) and/or by allele frequency divergence during the development of locally adapted populations. For example, broad usage of landraces in the breeding programmes of Asia could have resulted in low FST between landraces and Asian cultivars (Cavanagh et al., 2013). Our analyses also confirm previous observations showing the high proportion of shared alleles between wheat cultivars as a whole and landraces (Cavanagh et al., 2013), suggesting that the majority of alleles for wheat improvement were contributed by landraces.
Table 3.
The number of SNP markers shared between populations (above diagonal) and the estimates of pairwise FST (below diagonal)*
| Population | Landraces | Asia | USA | Europe | Canada | Australia |
|---|---|---|---|---|---|---|
| Landraces | - | 15 823 | 14 770 | 16 312 | 14 772 | 8173 |
| Asia | 0.02 | - | 14 448 | 15 773 | 14 501 | 7842 |
| USA | 0.10 | 0.15 | - | 15 920 | 13 761 | 7908 |
| Europe | 0.11 | 0.11 | 0.18 | - | 14 867 | 8645 |
| Canada | 0.17 | 0.16 | 0.28 | 0.22 | - | 7442 |
| Australia | 0.26 | 0.26 | 0.32 | 0.31 | 0.31 | - |
SNP, single nucleotide polymorphism.
*Weir and Cockerham's unbiased pairwise FST.
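The idea behind pairwise FST can be illustrated with a deliberately simple estimator based on heterozygosities, FST = (HT − HS)/HT, combined over SNPs as a ratio of sums. This is not Weir and Cockerham's estimator used for Table 3, which additionally corrects for sample size; the simpler form is swapped in here only to show how allele frequency divergence translates into FST. The function name and the toy frequencies are illustrative.

```python
def fst_two_populations(freqs_pop1, freqs_pop2) -> float:
    """Simple heterozygosity-based FST for two equally weighted populations.

    Each argument is a list of allele frequencies, one per biallelic SNP,
    in the same SNP order.
    """
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("frequency lists must cover the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
        h_t = 2 * p_bar * (1 - p_bar)                      # pooled population
        numerator += h_t - h_s
        denominator += h_t
    if denominator == 0:
        raise ValueError("all SNPs are monomorphic in the pooled sample")
    return numerator / denominator


# Toy allele frequencies (illustration only).
fst_diverged = fst_two_populations([0.2], [0.8])
fst_identical = fst_two_populations([0.3, 0.6], [0.3, 0.6])
fst_fixed = fst_two_populations([0.0], [1.0])
```

Identical frequencies give FST = 0 and fixed differences give FST = 1, so the low value between landraces and Asian cultivars reflects similar allele frequencies, whereas the higher values involving Australian material reflect stronger divergence.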
Figure 5.

Single nucleotide polymorphism (SNP) distribution across populations. (a) Minor allele frequency across populations of different origin. (b) Shared and private SNPs between the analysed tetraploid and hexaploid wheat populations.
The 90K assay included 4427 functional SNP assays discovered by re-sequencing two subspecies of Ae. tauschii (ssp. tauschii and ssp. strangulata) (You et al., 2011). Of these SNPs, 2827 SNPs were bi-allelic in the panel used for training the clustering algorithms (Tables S5 and S8). As only one of the Ae. tauschii haplotypes was closely related to the wheat D genome (Wang et al., 2013), we expected that the majority of these SNPs would be monomorphic in hexaploid wheat. Consistently, in a set of 550 hexaploid wheat lines (Table S14), only 796 of these SNPs (18%) were polymorphic. However, in mapping populations developed using synthetic wheats created by hybridizing tetraploid wheat with Ae. tauschii, the fraction of segregating SNPs was significantly higher. For example, of 1332 genetically mapped SNPs discovered in Ae. tauschii, 1219 were polymorphic only in the synthetic wheat mapping populations.
For a set of SNPs mapped to the A and B genomes, we assessed the proportion of shared alleles between tetraploid durum and hexaploid bread wheat populations. Of 30 238 biallelic SNPs in durum (pasta) and hexaploid wheat populations, 10 251 SNPs (34%) were shared, consistent with the previous observation (Dvorak et al., 2006) that there was an extensive gene flow from the populations of tetraploid ancestors to hexaploid wheat (Figure 5b). Of 8906 variants discovered by sequencing the durum wheat transcriptome (Table S5), there were nearly two times more SNPs (3691) that were polymorphic in tetraploid than in hexaploid wheat (1777).
The extent of LD, the nonrandom association of alleles at different loci, was assessed in the populations of cultivars and landraces. Consistent with the effect of wheat improvement on LD (Cavanagh et al., 2013), the rate of LD decay was higher in landraces than in cultivars (Figure S5). Likewise, our analysis confirmed previously observed genome-specific LD patterns in the wheat genomes (Chao et al., 2010) with LD in the D genome decaying two to three times slower than in the A and B genomes.
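A standard way to quantify the nonrandom association between two biallelic loci is the squared correlation r², computed from haplotype and allele frequencies as r² = D² / (pA(1 − pA)pB(1 − pB)) with D = pAB − pA·pB. The sketch below assumes that haplotypes are known (for example, from homozygous calls in inbred lines) and that r² is the statistic of interest; the text does not specify which LD measure was used, and the function name and toy haplotypes are illustrative.

```python
def r_squared(haplotypes) -> float:
    """LD r^2 between two biallelic loci.

    haplotypes is a list of (allele_at_locus_1, allele_at_locus_2) pairs,
    with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                     # coefficient of disequilibrium
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("at least one locus is monomorphic")
    return d * d / denominator


# Toy haplotypes (illustration only).
complete_ld = r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```

LD decay is then described by how such pairwise values fall off with genetic distance between loci, which is the comparison made between landraces and cultivars and among the A, B and D genomes.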
Discussion
We present the development of a resource for high-density genotyping of wheat using a custom iSelect bead array assaying 81 587 gene-associated SNPs. The utility of the iSelect assay for functional studies in wheat was maximized by anchoring the SNPs to CSS contigs with high (93%) accuracy for chromosome assignment, identifying orthologous genes in Brachypodium, rice and sorghum, and generating genetic maps containing 46 977 loci. The MAF of SNP alleles ranging from intermediate to high in the populations of different origin suggests high transferability of SNP markers. The value of the iSelect array for genetic studies and breeding of durum and bread wheat was enhanced by including SNPs discovered in diverse populations of tetraploid and hexaploid wheat. The inclusion of SNPs polymorphic in Ae. tauschii provides an opportunity to analyse variation in this wild species and to map introgressions of genetic material from this wild relative which has been extensively used as a source of alleles contributing to abiotic and biotic stress tolerance in wheat (Jones et al., 2013; Periyannan et al., 2013; Sohail et al., 2011).
The model-free density-based clustering algorithms implemented in the polyploid version of GS provided a significant improvement for genotyping polyploid wheat. While the requirement to visually inspect each SNP remains, manual curation of incorrectly clustered SNPs is simplified by a modified OPTICS algorithm that allows automatic re-clustering of an assay for a user-defined number of clusters. The polyploid version of GS also has the ability to detect densely spaced clusters or clusters of arbitrary shape. One of the useful applications of OPTICS and DBSCAN algorithms was for chromosomal assignment of alleles for assays that revealed multiple clusters due to segregation at more than one duplicated locus. Assays revealing multiple clusters in unrelated wheat accessions tended to segregate as biallelic markers in bi-parental mapping populations. By tracking cluster positions for loci that segregated in the mapping populations, it was possible to establish the allelic relationship between the multiple clusters observed in unrelated wheat accessions. This strategy allowed us to establish the allelic relationship between clusters for 72% (18 360) of the 25 643 assays showing multiple clusters. This capability provides opportunities to better utilize assays that reveal segregation at more than one duplicated locus in genetic diversity studies, GWAS and for investigating structural variation in the wheat genome.
The clustering algorithm reliably detected clusters showing low signal intensity due to divergence of SNP assay probe hybridization sites or presence–absence variations (PAVs). The latter type of variation has been shown to contribute to phenotype (Chia et al., 2012; Springer et al., 2009), and the resources developed here will provide an opportunity to investigate the impact of PAVs on trait variation in wheat.
Single nucleotide polymorphisms on the array were shown to be polymorphic across multiple populations of different geographical origin, suggesting that the array can be used as a genotyping platform in various wheat genetic studies. A high proportion of shared SNPs is likely the result of using common founders for developing regional populations and intercrossing of relatively few locally adapted cultivars in regional breeding programmes (Chao et al., 2010). In spite of the significant fraction of SNPs shared among landraces and cultivars, we observed differentiation in allele frequency between regional populations and landraces. This allele frequency shift can be attributed to several factors, including disproportional usage of a limited number of founders in developing regional populations and enrichment of alleles associated with regional adaptation by local breeding programmes (Cavanagh et al., 2013). This conclusion is consistent with the effect of wheat improvement on patterns of LD. The observed elevated correlation of alleles in wheat cultivars compared with that in landraces is suggestive of a population bottleneck probably caused by the usage of a limited number of landrace accessions in breeding.
In conclusion, the developed 90K array, genotype calling algorithms and high-density genetic maps provide a useful resource for analysing genome-wide variation in wheat. The high data quality and low proportion of missing genotypes provide an opportunity to create a high-resolution haplotype map of the wheat genome and build a framework for future analyses of genomic variation in mapping experiments and diversity studies. A haplotype map of wheat will serve as a resource for the extrapolation of data across diversity studies and imputation of missing genotypes in experiments using low-coverage sequencing as a genotyping tool. These developments will advance the field of wheat genetics and genomics and help in elucidating intricate relationships between phenotype and genotype.
Experimental procedures
Plant material
The distribution of the 90K SNPs across populations was assessed in a diverse panel of 726 accessions including tetraploid and hexaploid landraces (Table S14). A total of eight bi-parental doubled-haploid mapping populations were used to order SNPs along chromosomes: BT-Schomburgk × AUS33384 (CIGM92.1712), Young × AUS33414 (CIGM93.238), Chara × Glenlea, W7984 × Opata M85, Sundor × AUS30604, Westonia × Kauz, Avalon × Cadenza and Savannah × Rialto. Ditelosomic lines for Chinese Spring wheat (Kimber and Sears, 1968) were used to test the accuracy of clustering and to assign the consensus genetic map linkage groups to wheat chromosomes. For cluster file development for hexaploid wheat, 2473 bread wheat lines comprising 1979 worldwide wheat accessions and 494 F4 progeny from a nested association mapping population were used. The F4 lines were included to provide a sufficient number of heterozygous individuals for the majority of SNPs to ensure correct clustering of the heterozygous SNP alleles. For cluster file development in durum wheat, diverse accessions from a worldwide durum panel, recombinant inbred lines from a four-way cross of (Neodur × Claudio) × (Colosseo × Rascon37/Tarro2/Rascon37), six F1 samples (Dylan × Normanno; Tiziana × Normanno; Dupri × Normanno; Achille × Normanno; Strongfield × Saragolla; Kofa × Claudio) and the corresponding nine F1 parental lines were used.
SNP discovery
The RTs of tetraploid and hexaploid wheat were generated by assembling RNA-seq data obtained using several next-generation sequencing platforms (Appendix S1). SNP discovery was performed in the transcriptomes of 19 accessions of hexaploid (Table S1) and 18 accessions of tetraploid (Table S3) wheat.
Selection of SNPs for the genotyping assay design
For assay design, SNPs were filtered to remove those that (i) had sequences showing similarity to repeats (e-value ≤1e−10) identified by comparing 100 bp SNP-flanking sequences with the GIRI (http://www.girinst.org/repbase/) and ITMI Triticeae Repeat Sequence databases (wheat.pw.usda.gov/ITMI/Repeats) or (ii) were located in close proximity (<50 bp) to the exon–intron junctions identified in the wheat genome assembly (Brenchley et al., 2012). The selected SNPs were then submitted to the Illumina Assay Design Tool for design score calculation (www.illumina.com). A total of 91 829 SNPs were included in the assay design (Table S5).
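The two exclusion criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `CandidateSnp` record, its field names and the `passes_design_filters` helper are hypothetical, and only the two thresholds (repeat e-value ≤1e−10 and junction distance <50 bp) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; everything else in this sketch is hypothetical.
MAX_REPEAT_EVALUE = 1e-10    # repeat-database hits at or below this e-value count as repeats
MIN_JUNCTION_DISTANCE = 50   # bp; SNPs closer than this to an exon-intron junction are removed


@dataclass
class CandidateSnp:
    snp_id: str
    best_repeat_evalue: Optional[float]  # None when the flanking sequence has no repeat hit
    junction_distance: Optional[int]     # bp to the nearest exon-intron junction, None if unknown


def passes_design_filters(snp: CandidateSnp) -> bool:
    """Return True when the SNP survives both exclusion criteria."""
    is_repeat = (
        snp.best_repeat_evalue is not None
        and snp.best_repeat_evalue <= MAX_REPEAT_EVALUE
    )
    near_junction = (
        snp.junction_distance is not None
        and snp.junction_distance < MIN_JUNCTION_DISTANCE
    )
    return not (is_repeat or near_junction)


# Toy candidates (invented identifiers and values, for illustration only).
candidates = [
    CandidateSnp("snp_a", None, 120),   # no repeat hit, far from junction
    CandidateSnp("snp_b", 1e-30, 200),  # strong repeat hit
    CandidateSnp("snp_c", 1e-3, 10),    # weak repeat hit, but close to a junction
]
selected = [s.snp_id for s in candidates if passes_design_filters(s)]
print(selected)
```

In this sketch a SNP with an unknown junction distance is kept; whether such SNPs were retained in the actual design is not stated in the text.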
Synonymous or nonsynonymous SNPs were annotated by comparing sequences with the nonredundant protein database at NCBI (https://www.ncbi.nlm.nih.gov/) using the blastx program with the e-value threshold of ≤1e−10. For functional annotation, RTs were translated into six reading frames and compared against the protein sequences (blastx e-value threshold ≤1e−05) predicted in the rice, sorghum, maize and barley genomes. The output of the blastx program was used for automated functional annotation using blast2GO (http://www.blast2go.de/).
SNP genotype calling using the diploid version of Genome Studio (GS)
Single nucleotide polymorphism allele clustering and genotype calling for tetraploid and hexaploid wheat were performed with GS v2011.1 as described in Cavanagh et al. (2013). In brief, the default clustering algorithm implemented in GS was first used to identify assays that produced three distinct clusters corresponding to the AA, AB and BB genotypes expected for biallelic SNPs. Manual curation was performed for assays that produced compressed SNP allele clusters that could not be discriminated by the default algorithm. The accuracy of SNP clustering was validated visually.
SNP genotype calling in hexaploid wheat using the polyploid version of GS
Single nucleotide polymorphism clustering was performed with GS Polyploid Clustering v1.0 software using the three steps described in Appendix S1. In the first step, the density-based DBSCAN clustering algorithm (Cluster Distance = 0.07 and Minimum Number of Points in Cluster = 10) was used to identify assays producing one or more clusters. DBSCAN does not have an a priori expectation for the number of clusters and can find arbitrarily shaped clusters (Ester et al., 1996). Setting the minimum number of points in a cluster to ten helped to minimize the merging of clusters into a single cluster when clusters were not well separated. The clustered SNPs were then filtered based on custom cluster number, call rate and MAF. In the second step, SNP assays for which only a single cluster was detected in the first step were re-clustered using the OPTICS (Ankerst et al., 1999) clustering algorithm (Cluster Distance = 0.07, Minimum Number of Points in Cluster = 10 and Force Two Clusters option). This step allowed the identification of two clusters that were closely spaced due to the presence of duplicated copies of the SNP locus in the wheat genome. As in the first step, assays with two clusters were filtered based on cluster number. In the third step, assays for which satisfactory SNP clustering was not yet achieved were re-clustered using the DBSCAN algorithm with parameters Cluster Distance = 0.09 and Minimum Number of Points in Cluster = 10, followed by filtering based on custom cluster number, call rate and MAF. This step allowed for the identification of clusters that were too broad to be detected in the first DBSCAN step. Finally, wheat accessions were assigned to a SNP cluster for each assay using a Confidence Score Limit of 0.8. A MAF of 0.35 was used to filter SNP clustering performed for genetic mapping populations, and a MAF of 0.05 was used to filter SNP clustering for unrelated wheat accessions.
The accuracy of SNP clustering was visually checked, and incorrectly clustered SNPs were manually curated. Sample cluster assignments for each SNP assay were converted to genotype calls (Appendix S1, Figures S3 and S4).
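To illustrate how density-based clustering finds clusters without a predefined cluster count, the following is a minimal pure-Python sketch of the textbook DBSCAN procedure (Ester et al., 1996). It is not the GS Polyploid Clustering implementation. The defaults `eps=0.07` and `min_pts=10` mirror the Cluster Distance and Minimum Number of Points in Cluster settings of the first step, but treating Cluster Distance as a Euclidean radius on two-dimensional signal coordinates is an assumption, and the toy points are invented.

```python
import math

NOISE = -1  # label for samples that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps=0.07, min_pts=10):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1        # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue                # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


def grid(cx, cy, nx=4, ny=3, step=0.01):
    """Toy cluster: a small regular grid of nx * ny points starting at (cx, cy)."""
    return [(cx + ix * step, cy + iy * step)
            for ix in range(nx) for iy in range(ny)]


# Three well-separated toy clusters of 12 samples each plus one stray sample.
points = grid(0.1, 1.0) + grid(0.5, 1.0) + grid(0.9, 1.0) + [(0.3, 2.0)]
labels = dbscan(points)
n_clusters = len(set(labels) - {NOISE})
print(n_clusters, labels.count(NOISE))
```

Raising `eps` (as in the third step, where Cluster Distance = 0.09) lets broader clusters satisfy the density requirement, at the cost of a greater risk of merging neighbouring clusters.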
Data analyses
Basic summary statistics for each SNP (MAF, average heterozygosity and FST) and LD were calculated using the R package genetics. The linkage map was constructed using the MSTmap program (Wu et al., 2008). Linkage groups were assigned to chromosomes based on the best blastn hit from a comparison of SNP-flanking sequences with the CSS sequences. The program MergeMap (Wu et al., 2011) was used to construct the consensus map using the previously described strategy (Cavanagh et al., 2013).
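As a hedged illustration of the per-SNP summary statistics named above, the sketch below computes MAF, observed heterozygosity and a pairwise LD estimate in plain Python. It does not reproduce the R package genetics: genotypes are assumed to be coded as the number of B alleles (0, 1, 2, with `None` for a missing call), and LD is approximated by the squared Pearson correlation of these allele counts, which is a substitute for a haplotype-based estimate. All function names and the toy data are hypothetical.

```python
def called(genotypes):
    """Drop missing calls (None) from a list of 0/1/2 genotype codes."""
    return [g for g in genotypes if g is not None]


def maf(genotypes):
    """Minor allele frequency, or None when no genotype was called."""
    calls = called(genotypes)
    if not calls:
        return None
    freq_b = sum(calls) / (2 * len(calls))
    return min(freq_b, 1 - freq_b)


def heterozygosity(genotypes):
    """Observed proportion of heterozygous (code 1) calls."""
    calls = called(genotypes)
    if not calls:
        return None
    return sum(1 for g in calls if g == 1) / len(calls)


def r_squared(geno_a, geno_b):
    """Squared correlation of allele counts over samples called at both SNPs."""
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    if not pairs:
        return None
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


# Toy genotypes for two SNPs across six samples (invented values).
snp1 = [0, 0, 2, 2, 1, None]
snp2 = [0, 0, 2, 2, 1, 2]
print(maf(snp1), heterozygosity(snp1), r_squared(snp1, snp2))
```

A MAF threshold such as the 0.05 cut-off applied to unrelated accessions can then be expressed as `maf(snp) >= 0.05`.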
Acknowledgments
This project is funded by the USDA AFRI Triticeae-CAP (2011-68002-30029), USDA AFRI (2009-65300-05638), Borlaug Global Rust Initiative, National Science Foundation Plant Genome Research Program Grant DBI-0701916, Department of Primary Industries of Victoria, Grains Research and Development Corporation Australia, the Howard Hughes Medical Institute, the Gordon & Betty Moore Foundation (J. Dubcovsky), CSIRO Food Futures Flagship, Agroalimentare e ricerca, Genome Canada, Genome Prairie, Province of Saskatchewan and Western Grains Research Foundation. AA and KJE are funded by the BBSRC WISP (BB/I003207/1). We thank the Wheat Genetic Improvement Network for providing the Avalon × Cadenza map, Limagrain UK Limited for supplying the Savannah × Rialto population, Jingjuan Zhang for the Westonia × Kauz population, and Manisha Shankar and Sue Broughton for the BT-Schomburgk × AUS33384 and Young × AUS33414 mapping populations. CL and IM are affiliated with Illumina Inc.; RW, JP and MG are affiliated with TraitGenetics GmbH.
Supporting Information
Additional Supporting information may be found in the online version of this article:
Figure S1. Cumulative distribution of the number of putative hybridization sites for genotyping oligonucleotides.
Figure S2. Sequential addition of mapping populations during cluster file development in polyploid version of GenomeStudio.
Figure S3. Calling genotypes at the targeted SNP locus.
Figure S4. Calling sample genotypes for iSelect assays that detect multiple clusters in a population of unrelated wheat accessions.
Figure S5. LD decay in the populations of wheat cultivars and landraces.
Table S1. Summary of sequencing data generated for wheat transcriptome.
Table S2. SNP validation.
Table S3. Durum wheat genotypes used for SNP discovery.
Table S4. Summary of RNA-seq data generated for cultivar Svevo.
Table S5. Annotation of SNP loci.
Table S6. Assignment of SNPs to a specific locus in the wheat genome using CSS assemblies.
Table S7. Blast hits of SNP flanking sequences against CDS and protein sequences of Brachypodium, rice and sorghum.
Table S8. Annotation of clustering patterns observed for SNP assays.
Table S9. Proportion of iSelect 90K bead chip assays trained to capture polymorphisms.
Table S10. Theoretical expectations for segregation at a single-copy, duplicated and triplicated SNP locus.
Table S11. Numbers of assays that reveal polymorphism in the Chara × Glenlea and Young × AUS33414 mapping populations.
Table S12. Genetic linkage maps for bi-parental doubled-haploid mapping populations.
Table S13. Consensus SNP genetic linkage map for hexaploid wheat.
Table S14. Hexaploid and tetraploid wheat accessions used to assess the distribution of the 90K SNPs across populations.
Appendix S1. Methods (SNP discovery, cluster file development, map construction).
References
- Akhunov E, Nicolet C, Dvorak J. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theor. Appl. Genet. 2009;119:507–517. doi: 10.1007/s00122-009-1059-5.
- Albrechtsen A, Nielsen FC, Nielsen R. Ascertainment biases in SNP chips affect measures of population divergence. Mol. Biol. Evol. 2010;27:2534–2547. doi: 10.1093/molbev/msq148.
- Allen AM, Barker GLA, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D'Amore R, McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ. Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol. J. 2011;9:1086–1099. doi: 10.1111/j.1467-7652.2011.00628.x.
- Ankerst M, Breunig MM, Kriegel H, Sander J. OPTICS: ordering points to identify the clustering structure. In: Ankerst M, Breunig MM, editors. ACM SIGMOD International Conference on Management of Data. New York, NY: ACM Press; 1999. pp. 49–60.
- Berkman PJ, Lai K, Lorenc MT, Edwards D. Next-generation sequencing applications for wheat crop improvement. Am. J. Bot. 2012;99:365–371. doi: 10.3732/ajb.1100309.
- Brenchley R, Spannagl M, Pfeifer M, Barker GL, D'Amore R, Allen AM, McKenzie N, Kramer M, Kerhornou A, Bolser D, Kay S, Waite D, Trick M, Bancroft I, Gu Y, Huo N, Luo M-C, Sehgal S, Gill B, Kianian S, Anderson O, Kersey P, Dvorak J, McCombie WR, Hall A, Mayer KFX, Edwards KJ, Bevan MW, Hall N. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature. 2012;491:705–710. doi: 10.1038/nature11650.
- Cavanagh CR, Chao S, Wang S, Huang BE, Stephen S, Kiani S, Forrest K, Saintenac C, Brown-Guedira GL, Akhunova A, See D, Bai G, Pumphrey M, Tomar L, Wong D, Kong S, Reynolds M, da Silva ML, Bockelman H, Talbert L, Anderson JA, Dreisigacker S, Baenziger S, Carter A, Korzun V, Morrell PL, Dubcovsky J, Morell MK, Sorrells ME, Hayden MJ, Akhunov E. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars. Proc. Natl Acad. Sci. USA. 2013;110:8057–8062. doi: 10.1073/pnas.1217133110.
- Chao S, Dubcovsky J, Dvorak J, Luo M-C, Baenziger SP, Matnyazov R, Clark DR, Talbert LE, Anderson JA, Dreisigacker S, Glover K, Chen J, Campbell K, Bruckner PL, Rudd JC, Haley S, Carver BF, Perry S, Sorrells ME, Akhunov ED. Population- and genome-specific patterns of linkage disequilibrium and SNP variation in spring and winter wheat (Triticum aestivum L.). BMC Genomics. 2010;11:727. doi: 10.1186/1471-2164-11-727.
- Chia J-M, Song C, Bradbury PJ, Costich D, de Leon N, Doebley J, Elshire RJ, Gaut B, Geller L, Glaubitz JC, Gore M, Guill KE, Holland J, Hufford MB, Lai J, Li M, Liu X, Lu Y, McCombie R, Nelson R, Poland J, Prasanna BM, Pyhäjärvi T, Rong T, Sekhon RS, Sun Q, Tenaillon MI, Tian F, Wang J, Xu X, Zhang Z, Kaeppler SM, Ross-Ibarra J, McMullen MD, Buckler ES, Zhang G, Xu Y, Ware D. Maize HapMap2 identifies extant variation from a genome in flux. Nat. Genet. 2012;44:803–807. doi: 10.1038/ng.2313.
- Cook JP, McMullen MD, Holland JB, Tian F, Bradbury P, Ross-Ibarra J, Buckler ES, Flint-Garcia SA. Genetic architecture of maize kernel composition in the nested association mapping and inbred association panels. Plant Physiol. 2012;158:824–834. doi: 10.1104/pp.111.185033.
- Devos KM, Dubcovsky J, Dvorak J, Chinoy CN, Gale MD. Structural evolution of wheat chromosomes 4A, 5A, and 7B and its impact on recombination. Theor. Appl. Genet. 1995;91:282–288. doi: 10.1007/BF00220890.
- Dvorak J, Akhunov ED, Akhunov AR, Deal KR, Luo M-C. Molecular characterization of a diagnostic DNA marker for domesticated tetraploid wheat provides evidence for gene flow from wild tetraploid wheat to hexaploid wheat. Mol. Biol. Evol. 2006;23:1386–1396. doi: 10.1093/molbev/msl004.
- Edwards D, Wilcox S, Barrero RA, Fleury D, Cavanagh CR, Forrest KL, Hayden MJ, Moolhuijzen P, Keeble-Gagnère G, Bellgard MI, Lorenc MT, Shang CA, Baumann U, Taylor JM, Morell MK, Langridge P, Appels R, Fitzgerald A. Bread matters: a national initiative to profile the genetic diversity of Australian wheat. Plant Biotechnol. J. 2012;10:703–708. doi: 10.1111/j.1467-7652.2012.00717.x.
- Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6:e19379. doi: 10.1371/journal.pone.0019379.
- Ester M, Kriegel H, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad U, editors. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). Menlo Park, CA: AAAI Press; 1996. pp. 226–231.
- Ganal MW, Durstewitz G, Polley A, Bérard A, Buckler ES, Charcosset A, Clarke JD, Graner E-M, Hansen M, Joets J, Le Paslier M-C, McMullen MD, Montalent P, Rose M, Schön C-C, Sun Q, Walter H, Martin OC, Falque M. A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS ONE. 2011;6:e28334. doi: 10.1371/journal.pone.0028334.
- Hufford MB, Xu X, van Heerwaarden J, Pyhäjärvi T, Chia J-M, Cartwright RA, Elshire RJ, Glaubitz JC, Guill KE, Kaeppler SM, Lai J, Morrell PL, Shannon LM, Song C, Springer NM, Swanson-Wagner RA, Tiffin P, Wang J, Zhang G, Doebley J, McMullen MD, Ware D, Buckler ES, Yang S, Ross-Ibarra J. Comparative population genomics of maize domestication and improvement. Nat. Genet. 2012;44:808–811. doi: 10.1038/ng.2309.
- Jia G, Huang X, Zhi H, Zhao Y, Zhao Q, Li W, Chai Y, Yang L, Liu K, Lu H, Zhu C, Lu Y, Zhou C, Fan D, Weng Q, Guo Y, Huang T, Zhang L, Lu T, Feng Q, Hao H, Liu H, Lu P, Zhang N, Li Y, Guo E, Wang S, Wang S, Liu J, Zhang W, Chen G, Zhang B, Li W, Wang Y, Li H, Zhao B, Li J, Diao X, Han B. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica). Nat. Genet. 2013;45:957–961. doi: 10.1038/ng.2673.
- Jones H, Gosman N, Horsnell R, Rose GA, Everest LA, Bentley AR, Tha S, Uauy C, Kowalski A, Novoselovic D, Simek R, Kobiljski B, Kondic-Spika A, Brbaklic L, Mitrofanova O, Chesnokov Y, Bonnett D, Greenland A. Strategy for exploiting exotic germplasm using genetic, morphological, and environmental diversity: the Aegilops tauschii Coss. example. Theor. Appl. Genet. 2013;126:1793–1808. doi: 10.1007/s00122-013-2093-x.
- Kim S, Plagnol V, Hu TT, Toomajian C, Clark RM, Ossowski S, Ecker JR, Weigel D, Nordborg M. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 2007;39:1151–1155. doi: 10.1038/ng2115.
- Kimber G, Sears E. Nomenclature for the description of aneuploids in the Triticinae. In: Findlay K, Shepherd K, editors. Proceedings of Third International Wheat Genetics Symposium. Canberra, Australia: Australian Academy of Science; 1968. pp. 468–473.
- Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D. Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol. J. 2012;10:743–749. doi: 10.1111/j.1467-7652.2012.00718.x.
- Luo M-C, Gu YQ, You FM, Deal KR, Ma Y, Hu Y, Huo N, Wang Y, Wang J, Chen S, Jorgensen CM, Zhang Y, McGuire PE, Pasternak S, Stein JC, Ware D, Kramer M, McCombie WR, Kianian SF, Martis MM, Mayer KFX, Sehgal SK, Li W, Gill BS, Bevan MW, Simková H, Dolezel J, Weining S, Lazo GR, Anderson OD, Dvorak J. A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc. Natl Acad. Sci. USA. 2013;110:7940–7945. doi: 10.1073/pnas.1219082110.
- Maccaferri M, Sanguineti MC, Demontis A, El-Ahmed A, Garcia del Moral L, Maalouf F, Nachit M, Nserallah N, Ouabbou H, Rhouma S, Royo C, Villegas D, Tuberosa R. Association mapping in durum wheat grown across a broad range of water regimes. J. Exp. Bot. 2011;62:409–438. doi: 10.1093/jxb/erq287.
- Oliver RE, Tinker NA, Lazo GR, Chao S, Jellen EN, Carson ML, Rines HW, Obert DE, Lutz JD, Shackelford I, Korol AB, Wight CP, Gardner KM, Hattori J, Beattie AD, Bjørnstad Å, Bonman JM, Jannink J-L, Sorrells ME, Brown-Guedira GL, Mitchell Fetch JW, Harrison SA, Howarth CJ, Ibrahim A, Kolb FL, McMullen MS, Murphy JP, Ohm HW, Rossnagel BG, Yan W, Miclaus KJ, Hiller J, Maughan PJ, Redman Hulse RR, Anderson JM, Islamovic E, Jackson EW. SNP discovery and chromosome anchoring provide the first physically-anchored hexaploid oat map and reveal synteny with model species. PLoS ONE. 2013;8:e58068. doi: 10.1371/journal.pone.0058068.
- Periyannan S, Moore J, Ayliffe M, Bansal U, Wang X, Huang L, Deal K, Luo M, Kong X, Bariana H, Mago R, McIntosh R, Dodds P, Dvorak J, Lagudah E. The gene Sr33, an ortholog of barley Mla genes, encodes resistance to wheat stem rust race Ug99. Science. 2013;341:786–788. doi: 10.1126/science.1239028.
- Poland JA, Brown PJ, Sorrells ME, Jannink J-L. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE. 2012;7:e32253. doi: 10.1371/journal.pone.0032253.
- Pont C, Murat F, Guizard S, Flores R, Foucrier S, Bidet Y, Quraishi UM, Alaux M, Doležel J, Fahima T, Budak H, Keller B, Salvi S, Maccaferri M, Steinbach D, Feuillet C, Quesneville H, Salse J. Wheat syntenome unveils new evidences of contrasted evolutionary plasticity between paleo- and neoduplicated subgenomes. Plant J. 2013;76:1030–1044. doi: 10.1111/tpj.12366.
- Saintenac C, Jiang D, Akhunov ED. Targeted analysis of nucleotide and copy number variation by exon capture in allotetraploid wheat genome. Genome Biol. 2011;12:R88. doi: 10.1186/gb-2011-12-9-r88.
- Saintenac C, Jiang D, Wang S, Akhunov E. Sequence-based mapping of the polyploid wheat genome. G3 (Bethesda). 2013;3:1105–1114. doi: 10.1534/g3.113.005819.
- Serang O, Mollinari M, Garcia AAF. Efficient exact maximum a posteriori computation for bayesian SNP genotyping in polyploids. PLoS ONE. 2012;7:e30906. doi: 10.1371/journal.pone.0030906.
- Sim S-C, Durstewitz G, Plieske J, Wieseke R, Ganal MW, Van Deynze A, Hamilton JP, Buell CR, Causse M, Wijeratne S, Francis DM. Development of a large SNP genotyping array and generation of high-density genetic maps in tomato. PLoS ONE. 2012;7:e40563. doi: 10.1371/journal.pone.0040563.
- Sohail Q, Inoue T, Tanaka H, Eltayeb AE, Matsuoka Y, Tsujimoto H. Applicability of Aegilops tauschii drought tolerance traits to breeding of hexaploid wheat. Breed Sci. 2011;61:347–357. doi: 10.1270/jsbbs.61.347.
- Song Q, Hyten DL, Jia G, Quigley CV, Fickus EW, Nelson RL, Cregan PB. Development and evaluation of SoySNP50K, a high-density genotyping array for soybean. PLoS ONE. 2013;8:e54985. doi: 10.1371/journal.pone.0054985.
- Springer NM, Ying K, Fu Y, Ji T, Yeh C-T, Jia Y, Wu W, Richmond T, Kitzman J, Rosenbaum H, Iniguez AL, Barbazuk WB, Jeddeloh JA, Nettleton D, Schnable PS. Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet. 2009;5:e1000734. doi: 10.1371/journal.pgen.1000734.
- Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, Rocheford TR, McMullen MD, Holland JB, Buckler ES. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat. Genet. 2011;43:159–162. doi: 10.1038/ng.746.
- Van Poecke RMP, Maccaferri M, Tang J, Truong HT, Janssen A, van Orsouw NJ, Salvi S, Sanguineti MC, Tuberosa R, van der Vossen EAG. Sequence-based SNP genotyping in durum wheat. Plant Biotechnol. J. 2013;11:809–817. doi: 10.1111/pbi.12072.
- Wang J, Luo M-C, Chen Z, You FM, Wei Y, Zheng Y, Dvorak J. Aegilops tauschii single nucleotide polymorphisms shed light on the origins of wheat D-genome genetic diversity and pinpoint the geographic origin of hexaploid wheat. New Phytol. 2013;198:925–937. doi: 10.1111/nph.12164.
- Wiedmann RT, Smith TPL, Nonneman DJ. SNP discovery in swine by reduced representation and high throughput pyrosequencing. BMC Genet. 2008;9:81. doi: 10.1186/1471-2156-9-81.
- Wu Y, Bhat PR, Close TJ, Lonardi S. Efficient and accurate construction of genetic linkage maps from the minimum spanning tree of a graph. PLoS Genet. 2008;4:e1000212. doi: 10.1371/journal.pgen.1000212.
- Wu Y, Close TJ, Lonardi S. Accurate construction of consensus genetic maps via integer linear programming. IEEE/ACM Trans. Comput. Biol. Bioinform. 2011;8:381–394. doi: 10.1109/TCBB.2010.35.
- Xu X, Liu X, Ge S, Jensen JD, Hu F, Li X, Dong Y, Gutenkunst RN, Fang L, Huang L, Li J, He W, Zhang G, Zheng X, Zhang F, Li Y, Yu C, Kristiansen K, Zhang X, Wang J, Wright M, McCouch S, Nielsen R, Wang J, Wang W. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 2012;30:105–111. doi: 10.1038/nbt.2050.
- You FM, Huo N, Deal KR, Gu YQ, Luo M-C, McGuire PE, Dvorak J, Anderson OD. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. BMC Genomics. 2011;12:59. doi: 10.1186/1471-2164-12-59.
- Zhao K, Tung C-W, Eizenga GC, Wright MH, Ali ML, Price AH, Norton GJ, Islam MR, Reynolds A, Mezey J, McClung AM, Bustamante CD, McCouch SR. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2011;2:467. doi: 10.1038/ncomms1467.