Annals of Botany. 2016 Jun 6;118(1):71–87. doi: 10.1093/aob/mcw081

An ultra-high density genetic linkage map of perennial ryegrass (Lolium perenne) using genotyping by sequencing (GBS) based on a reference shotgun genome assembly

Janaki Velmurugan 1,2, Ewan Mollison 1,3,4, Susanne Barth 1, David Marshall 3, Linda Milne 3, Christopher J Creevey 5, Bridget Lynch 2, Helena Meally 1, Matthew McCabe 5, Dan Milbourne 1,*
PMCID: PMC4934400  PMID: 27268483

Abstract

Background and Aims High-density genetic linkage maps that are extensively anchored to assembled genome sequences of the organism in question are extremely useful in gene discovery. To facilitate this process in perennial ryegrass (Lolium perenne L.), a high-density single nucleotide polymorphism (SNP)- and presence/absence variant (PAV)-based genetic linkage map has been developed in an F2 mapping population that has been used as a reference population in numerous studies. To provide a reference sequence to which to align genotyping by sequencing (GBS) reads, a shotgun assembly of one of the grandparents of the population, a tenth-generation inbred line, was created using Illumina-based sequencing.

Methods The assembly was based on paired-end Illumina reads, scaffolded by mate pair and long jumping distance reads in the range of 3–40 kb, with >200-fold initial genome coverage. A total of 169 individuals from an F2 mapping population were used to construct PstI-based GBS libraries tagged with unique 4–9 nucleotide barcodes, resulting in 284 million reads, with approx. 1·6 million reads per individual. A bioinformatics pipeline was employed to identify both SNPs and PAVs. A core genetic map was generated using high confidence SNPs, to which lower confidence SNPs and PAVs were subsequently fitted in a straightforward binning approach.

Key Results The assembly comprises 424 750 scaffolds, covering 1·11 Gbp of the 2·5 Gbp perennial ryegrass genome, with a scaffold N50 of 25 212 bp and a contig N50 of 3790 bp. It is available for download, and access to a genome browser has been provided. Comparison of the assembly with available transcript and gene model data sets for perennial ryegrass indicates that approx. 570 Mbp of the gene-rich portion of the genome has been captured. An ultra-high density genetic linkage map with 3092 SNPs and 7260 PAVs was developed, anchoring just over 200 Mb of the reference assembly.

Conclusions The genetic map and assembly, together with another recently released genome assembly, represent a significant resource for the perennial ryegrass genetics community.

Keywords: Lolium perenne, perennial ryegrass, genome assembly, genotyping by sequencing, GBS, single nucleotide polymorphism, linkage mapping, presence/absence variation

INTRODUCTION

Perennial ryegrass (Lolium perenne L.) is an important component species in pastoral-based production systems in temperate regions. It is diploid, with seven chromosomes in its haploid complement (2n = 2x = 14), and has a genome size of approx. 2·5 Gb (Kopecky et al., 2010). Despite its importance, it remains poorer in genome-based resources than other grass species, such as members of the closely related Triticeae, and at the time of writing it lacked a published physical map and genome sequence.

High-quality genetic linkage maps remain a cornerstone of discovery genetics in plant species. Despite their numerous drawbacks, including a restricted representation of the true genetic diversity of a species, much progress in discovery genetics continues to be made using ‘flagship’ reference mapping populations over timescales approaching a decade. The greatest advances can be gained in such reference populations by developing genetic maps that are densely populated with genetic markers located in genic regions and that are sequence characterized in such a way as to allow anchoring to pre-existing or emerging physical maps, and to other important reference maps in the same or other species.

The ‘F2 biomass’ population has been used to study segregation distortion (Anhalt et al., 2008) and as a basis for several quantitative trait locus (QTL) mapping studies for traits including rust resistance (Tomaszewski et al., 2012), biomass yield (Anhalt et al., 2009) and polar (A. Foito, JHI, Dundee, UK, unpubl. res.) and non-polar metabolites (Foito et al., 2015). Both parents of the F1 parental genotype of this population were originally maintainer lines in a cytoplasmic male sterility (CMS) programme at Teagasc (Connolly and Wrightturner, 1984) and originated from an inter-specific cross between meadow fescue (Festuca pratensis) and perennial ryegrass. The initial inter-specific hybrid was backcrossed for several generations to the ryegrass parent and recurrently self-pollinated for nine (maternal grandparent) or ten (paternal grandparent) generations. The Lolium contribution to the pedigree of the inbred lines derived from the ryegrass cultivar ‘S24’ [the Institute of Biological, Environmental and Rural Sciences (IBERS)] for the maternal grandparent and from the ryegrass cultivar ‘Premo’ (Mommersteeg International BV) for the paternal grandparent. The inbred lines have been analysed using both fluorescent and genomic in situ hybridization approaches (Anhalt et al., 2008), and no large intact portions of the fescue genome were evident, indicating that the grandparents largely reflect a perennial ryegrass genetic background. Offspring arising from self-pollination of a single F1 plant from a cross between these two self-compatible lines were used as the basis of the original F2 biomass population of 360 individuals (Anhalt et al., 2008; Tomaszewski et al., 2012). The population is also the basis for an ongoing initiative to develop a recombinant inbred line (RIL) population for perennial ryegrass at Teagasc.

Advances in sequencing technology have allowed the development of approaches to generate extremely large numbers of DNA markers in a quick and cost-effective manner (Davey et al., 2011). Although sequencing costs have continued to fall over the last several years, it remains relatively expensive and computationally intensive to sequence and assemble whole genomes in order to identify genetic variation/DNA markers for the large numbers of genotypes that tend to comprise experimental populations. As an alternative, numerous strategies have been developed that rely on sequencing reduced sub-sets of the genomes of different individuals to identify such variation. High throughput polymorphism detection methods such as Complexity Reduction of Polymorphic Sequences (CRoPS) (van Orsouw et al., 2007), restriction site-associated DNA sequencing (RADseq) (Davey and Blaxter, 2010) and genotyping by sequencing (GBS) (Elshire et al., 2011) use genome complexity reduction approaches to target specific regions of the genome, and markers are identified by examining DNA variation in similar sub-sets of the genome from different individuals using widely available bioinformatics-based approaches. These methods, combined with the power of next-generation sequencing technology, have radically enhanced our ability to generate thousands of markers in reasonably large experimental populations, opening up a wealth of applications in areas such as discovery genetics and genomics-assisted plant breeding.

The specific GBS approach described by Elshire et al. (2011) is increasingly becoming a method of choice for high throughput genotyping applications. This method has a simple protocol for the generation of genotyping libraries, which lacks a specific gel-based size selection step, avoids the use of divergent Y-adaptors and is amenable to parallelization using either manual or automated liquid handling approaches. Combining these features with a simple in-line barcoding system and the ability to tailor the protocol to suit different organisms and applications by changing the methylation-sensitive restriction enzyme(s) employed for the complexity-reduction step makes GBS a powerful but easy to adopt approach for genome-wide marker generation. Because of this, it has been widely adopted in plant species.
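To illustrate how the choice of restriction enzyme drives complexity reduction, the following minimal Python sketch performs an in silico PstI digest of a linear sequence. PstI recognizes CTGCAG and cuts the top strand between A and G (CTGCA^G). The function names and toy sequence are hypothetical, and the sketch ignores methylation sensitivity, which in practice further restricts which sites are cut.

```python
import re

PSTI_SITE = "CTGCAG"   # palindromic recognition site, so one strand suffices
CUT_OFFSET = 5         # top-strand cut position: CTGCA^G

def psti_fragments(seq: str) -> list[str]:
    """Split a linear sequence into top-strand fragments at every PstI site."""
    seq = seq.upper()
    # Zero-width lookahead reports the start of every recognition site.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(f"(?={PSTI_SITE})", seq)]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def flanked_fragments(seq: str) -> list[str]:
    """Keep only fragments with a PstI-cut end on both sides.

    In this simplified view, only such fragments can receive an adaptor at
    each end and be amplified into a GBS library.
    """
    return psti_fragments(seq)[1:-1]

demo = "AAACTGCAGTTTTCTGCAGCC"
print(psti_fragments(demo))     # ['AAACTGCA', 'GTTTTCTGCA', 'GCC']
print(flanked_fragments(demo))  # ['GTTTTCTGCA']
```

Applied genome-wide, only the sequence adjacent to the sites of a rare (6 bp) cutter is sampled, which is why the same sub-set of loci can be compared across individuals.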

Although it was conceived primarily as a method for detecting single nucleotide polymorphism (SNP) variation, GBS can also survey other forms of variation including small insertions/deletions (InDels), simple sequence repeats (SSRs) and ‘presence/absence’ variation derived from ‘anonymous’ DNA polymorphisms that cause variation in whether specific DNA fragments are amplified across individuals (Elshire et al., 2011). The GBS approach has been shown to work remarkably well irrespective of the availability of a reference genome (Ward et al., 2013), since fragments produced in individuals of the experimental population can be auto-assembled to produce a reference sequence set to which the same fragments can subsequently be re-aligned in order to identify polymorphisms. However, the approach is also extremely useful when it is coupled with an externally derived reference sequence, and it is a useful tool for applications such as anchoring shotgun genome assemblies and spotting misassemblies in existing reference genome sequences (Mascher et al., 2013).

As previously mentioned, the F2 biomass population has been used extensively for a variety of purposes, and the densest map available to date for the cross is a Diversity Arrays Technology (DArT)-based map comprising 297 markers, anchored and oriented with 29 SSR markers (Tomaszewski et al., 2012). Although the sequences of the DArT markers are available (Bartos et al., 2011), their lack of genomic context and the relatively low density of the map limit the continued use of the population as a platform for genetic analysis. The objective of the current study was to generate a high-density genetic linkage map of the F2 biomass population using GBS in order to increase its utility for future genetic mapping studies. Although such a map could be developed in the absence of a reference genomic sequence, we decided to increase the utility of the mapping resource by assembling a reference shotgun sequence of the inbred line that was the paternal grandparent of the mapping population. Short read fragments (from the Illumina HiSeq 2000 platform) mapped using this approach would thus generally be anchored to larger fragments from the shotgun assembly, providing better genomic context for each mapped marker and increasing its utility for future applications. This is especially timely given the recent publication of an annotated synteny-based draft genome sequence of another genotype of L. perenne (Byrne et al., 2015). We present the genetic map, all SNP information and the reference assembly as resources for the forage genetics community.

MATERIALS AND METHODS

Shotgun sequencing and assembly of reference sequence

Illumina HiSeq and GAII sequencing was used to generate approx. 207-fold raw coverage of the genome of the paternal grandparent of the F2 biomass population. Reads were generated from a range of paired-end (<300 bp insert), mate pair (3 kbp insert) and long jumping distance (8, 20 and 40 kbp insert) libraries, with read lengths ranging from 51 to 160 bp. Supplementary Data Table S1 shows details for all libraries used.

For the 300 bp and 3 kbp insert libraries, library production was as follows: DNA of the paternal grandparent (2 μg) was fragmented for 30 min with NEBNext® dsDNA Fragmentase (NEB) and purified using a QIAquick PCR purification kit (Qiagen). The NEBNext® End Repair Module was used to blunt-end the fragments, and purification of the reaction was performed using a QIAquick PCR purification kit (Qiagen). The NEBNext® dA-Tailing Module was used to adenylate the blunt-ended fragments, and purification of the reaction was performed using a QIAquick PCR purification kit (Qiagen). Illumina standard paired-end adaptors were ligated onto the adenylated fragments using the Quick Ligation™ Kit (NEB) and purification was performed using a QIAquick PCR purification kit (Qiagen). Adaptor-ligated fragments were then size selected by electrophoresis on an agarose gel, excision of a 2 mm gel slice and extraction of DNA from the agarose using the QIAquick gel extraction kit (Qiagen). PCR enrichment (12 cycles) of the library was performed using Illumina PCR Paired End Primers 1.0 and 2.0 and a Phusion™ High-Fidelity PCR Kit (Finnzymes). The library size and absence of adaptor dimers were determined with a DNA1000 chip on an Agilent 2100 bioanalyser. Sequencing was performed on either an Illumina HiSeq 2000 or GAII platform as outlined in Table S1. Long jumping distance libraries were constructed by Eurofins Genomics using a proprietary method, and sequenced on an Illumina HiSeq 2000 platform.

Assembly was carried out using the resulting FASTQ files. Prior to assembly, additional quality control stages were carried out. Sickle (Joshi and Fass, 2011) was used to trim low quality base calls from 5′ ends of the reads using a quality cut-off of Q30, equivalent to 99·9 % confidence in base calls, and a minimum remaining read length of 50 bp (35 bp in the case of the 3 kbp libraries, as these were sequenced with a read length of 51 bp). FastUniq (Xu et al., 2012) was then used to remove redundant read pairs that may have arisen from PCR duplication. As a result of this filtering, final genome coverage was reduced to approx. 105-fold. Following trimming and de-duplication, paired-end and singleton reads were assembled using CLC Assembly Cell (http://www.clcbio.com/, CLC Bio, Aarhus, Denmark) with a k-mer length of 41 and then scaffolded with the 3–40 kb read pairs using SSPACE (Boetzer et al., 2011).
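The relationship between the Q30 cut-off and 99·9 % base-call confidence, and the sequencing volume implied by the quoted fold coverages, can be checked with a short Python sketch. It uses the standard Phred definition Q = −10 log10(P) and the approx. 2·5 Gb genome size given in the Introduction; the helper names are hypothetical.

```python
GENOME_SIZE_BP = 2.5e9  # approximate L. perenne genome size quoted in the text

def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred score, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)

def bases_for_coverage(fold: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Approximate number of sequenced bases implied by a given fold coverage."""
    return fold * genome_size_bp

for q in (10, 20, 30):
    print(f"Q{q}: {100 * (1 - phred_to_error(q)):.1f} % base-call accuracy")

print(f"207-fold raw: {bases_for_coverage(207) / 1e9:.1f} Gbp")
print(f"105-fold after filtering: {bases_for_coverage(105) / 1e9:.1f} Gbp")
```

On these assumptions, approx. 517 Gbp of raw sequence was reduced to approx. 262 Gbp by trimming and de-duplication.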

Preliminary annotation of the reference sequence

RepeatMasker version 3.2.8 (Smit et al., 2010) was used to identify common repetitive elements in the scaffolded Lolium assembly using a wheat-based model. The widely used open-source gene prediction tool Augustus (Stanke and Waack, 2003) was then applied to the repeat-masked assembly, with a wheat-based gene model. BLAST searching (Altschul et al., 1990) of barley cDNA sequences and publicly available peptide sequences for barley, rice and Brachypodium was carried out using default cut-off parameters (E-value = 10), which allows some very dissimilar matches to be returned. This parameter was intentionally left at the default setting to allow identification of distant homologues.

Viewing Lolium genomic data

In order to view data generated in this study in a more accessible format, a JBrowse-based genome browser (Skinner et al., 2009) has been set up and made available at https://ics.hutton.ac.uk/jbrowse/lolium, and the scaffolded genome assembly is available for download at https://ics.hutton.ac.uk/jbrowse/lolium/data/seq/lolium_scaffolds.zip. Raw reads generated in this study have been deposited with the European Nucleotide Archive under study accession PRJEB12921 (http://www.ebi.ac.uk/ena/data/view/PRJEB12921).

Estimating coverage of the Lolium gene complement

Four approaches were taken to estimate the degree to which the gene complement of Lolium was captured within the assembly.

Byrne et al. (2015) have published models for 28 455 Lolium genes, yielding 40 068 transcripts, based on several RNA sequencing (RNA-Seq) studies. Sequences for these genes were compared using BLAST against our Lolium assembly (E-value = 10). Completeness of BLAST hits was assessed based on cumulative identity percentage (CIP) and cumulative alignment percentage (CALP) across all high-scoring segment pairs (HSPs) for each match, a method described in Salse et al. (2008).
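The CIP/CALP criterion can be sketched as follows, based on our reading of Salse et al. (2008): CIP is the number of identical positions summed over all HSPs divided by the total HSP alignment length, and CALP is that total alignment length divided by the query length. The `HSP` class, function names and example values are hypothetical, and the sketch assumes that HSPs do not overlap on the query (overlapping HSPs would inflate CALP). The default thresholds are those applied in the Results (≥95 % identity over ≥70 % of the query).

```python
from dataclasses import dataclass

@dataclass
class HSP:
    identities: int  # identical positions in this high-scoring segment pair
    align_len: int   # alignment length of this HSP

def cip_calp(hsps: list[HSP], query_len: int) -> tuple[float, float]:
    """Cumulative identity % and cumulative alignment length % over all HSPs."""
    total_aln = sum(h.align_len for h in hsps)
    total_id = sum(h.identities for h in hsps)
    cip = 100 * total_id / total_aln if total_aln else 0.0
    calp = 100 * total_aln / query_len  # assumes non-overlapping HSPs
    return cip, calp

def is_strong_match(hsps: list[HSP], query_len: int,
                    min_cip: float = 95.0, min_calp: float = 70.0) -> bool:
    """True if the hit meets both the identity and the coverage thresholds."""
    cip, calp = cip_calp(hsps, query_len)
    return cip >= min_cip and calp >= min_calp

hsps = [HSP(identities=390, align_len=400), HSP(identities=340, align_len=350)]
print(cip_calp(hsps, query_len=1000))
print(is_strong_match(hsps, query_len=1000))  # True
```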

Ruttink et al. (2013) used an orthology-guided assembly (OGA) approach to create a reference transcriptome for Lolium. For simplicity, we used only the OGA based on Brachypodium distachyon in this study. Sequences of > 200 bases from the OGA transcriptome were compared using BLAST against the Lolium assembly (E-value = 10) and match strength was evaluated as above. Another Lolium transcriptome assembly is described by Farrell et al. (2014); all sequences in this transcriptome assembly are >200 bp in length, so no filtering was required before applying the approach described above.

The Core Eukaryotic Genes Mapping Approach (CEGMA; Parra et al., 2007) was used to search the Lolium assembly for 458 core proteins that are conserved across eukaryotes, with a more highly conserved sub-set of 248 used to indicate completeness of coverage.

A set of 47 genes associated with control of flowering in rice and Brachypodium were selected from Higgins et al. (2010) and an additional gene associated with flowering in barley, CEN, from Comadran et al. (2012). Peptide sequences for those genes were searched against the Lolium genome assembly from this study using Exonerate (Slater and Birney, 2005), with the top-ranked hit being considered as the probable Lolium orthologue. The same approach was applied to identify the equivalent scaffold from the assembly of Byrne et al. (2015), with the additional step of confirming the equivalency of the scaffold through manual inspection of BLAST-based comparisons of the scaffolds between the assemblies in order to ensure that they represented the orthologous regions in both assemblies.

GBS library construction

A total of 169 individuals from the F2 biomass population were used for the mapping study. For reference purposes, the paternal grandparent and the F1 parental genotype were also used. Unfortunately, the maternal grandparent was no longer extant at the time of the study. Genomic DNA from these individuals was extracted from approx. 3 g of flash-frozen, fresh leaf material using a variation of the CTAB (cetyltrimethylammonium bromide) method of Doyle (1987). GBS libraries were constructed using an adapted version of the protocol outlined by Elshire et al. (2011), employing the methylation-sensitive 6 bp rare-cutting restriction enzyme PstI instead of ApeKI. A set of 48 unique barcode adaptors was generated from complementary sequences with a PstI overhang. The barcodes varied from four to nine nucleotides in length. A common adaptor and PCR primers A and B were also generated. Complementary oligos for each of the 48 adaptors at 50 µM were annealed using the following program: 95 °C, 2 min; ramp to 25 °C at 0·1 °C s−1; 25 °C, 30 min; 4 °C hold. The annealed adaptors were diluted 1:15 and then subjected to a further 1:100 dilution. A 100 µL aliquot of the 1:1500 diluted barcoded adaptors was mixed with 100 µL of the common adaptor to make a 200 µL working stock of 0·6 ng µL−1. These were quantified using the Qubit fluorometer.

DNA was digested in 20 µL reactions containing 200–220 ng of genomic DNA, 2 µL of 10× NEB buffer 3, 1·5 µL of bovine serum albumin (BSA), 20 U of PstI and 13·5 µL of molecular grade water incubated at 37 °C for 2 h, then deactivated at 80 °C for 20 min. In the ligation reaction, 20 µL of digested product was combined with 12 ng of the working stock of annealed adaptor mix, 5 µL of 10× T4 ligase buffer and 400 U of T4 ligase in a 50 µL reaction. All ligation reactions were incubated at 22 °C for 1 h and then at 65 °C for 30 min to deactivate the ligase.
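The dilution and volume figures above can be checked with simple arithmetic, sketched below in Python. The volumes left over for genomic DNA plus PstI in the digest, and for T4 ligase plus any water in the ligation, are inferred by subtraction and are not stated in the text; the helper names are hypothetical.

```python
def dilution_factor(*steps: float) -> float:
    """Overall dilution of sequential steps (e.g. 1:15 then 1:100 gives 1:1500)."""
    factor = 1.0
    for step in steps:
        factor *= step
    return factor

def volume_for_mass(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (uL) of a stock needed to deliver a given mass."""
    return mass_ng / conc_ng_per_ul

def remaining_volume(total_ul: float, *components_ul: float) -> float:
    """Volume (uL) left in a reaction after the listed components."""
    return total_ul - sum(components_ul)

overall = dilution_factor(15, 100)
adaptor_ul = volume_for_mass(12, 0.6)                    # 12 ng at 0.6 ng/uL
digest_gap = remaining_volume(20, 2, 1.5, 13.5)          # DNA + PstI (inferred)
ligation_gap = remaining_volume(50, 20, adaptor_ul, 5)   # ligase + water (inferred)

print(f"overall dilution 1:{overall:.0f}")
print(f"adaptor mix {adaptor_ul:.1f} uL, digest gap {digest_gap:.1f} uL, "
      f"ligation gap {ligation_gap:.1f} uL")
```

On this reading, 12 ng of adaptor corresponds to 20 µL of working stock, and the stated components leave 3 µL of the digest for DNA and enzyme.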

The ligation reaction was cleaned up using the Qiagen QIAquick PCR purification kit, with an elution volume of 50 µL. The PCR was set up as a 50 µL reaction containing 10 µL of the purified ligation reaction, 25 µL of NEB 2× Taq master mix, 2 µL of a 3 µM primer 1 and 2 mix, and 13 µL of molecular grade water. The PCR program was 72 °C for 5 min; 98 °C for 30 s; 18 cycles of 98 °C for 10 s, 65 °C for 30 s, 72 °C for 30 s; 72 °C for 5 min; and 4 °C hold. The PCR-enriched libraries were purified using the Qiagen MinElute purification kit and eluted in 21 µL. Library quality was checked on the 2100 Bioanalyser from Agilent Technologies. The GBS libraries were sequenced in two lanes of an Illumina HiSeq 2000 (Bentley et al., 2008) as single-end 100 bp reads.

Variant calling pipeline

The sequenced reads were de-multiplexed and trimmed to 66 bp using the process_radtags component of Stacks (Catchen et al., 2013). A sliding window quality filter (-w 0·15, -s 10) was applied: as a window spanning 15 % of the read length moves along the read, the read is discarded if the mean Phred score within the window falls below 10. Reads with uncalled bases were also discarded. The de-multiplexed reads were aligned to the reference assembly using Bowtie (Langmead et al., 2009), allowing two mismatches (-v 2) and retaining only uniquely mapping reads (-m 1). The resulting SAM alignment files were post-processed (converted to BAM format, sorted and indexed) and used to create a consensus mpileup file with SAMtools 0.1.18 (Li et al., 2009).
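The read-level quality filter can be sketched in Python as follows, based on our understanding of the Stacks documentation: a read is discarded when the mean score within a window spanning 15 % of the read length drops below the score limit. This is a simplified re-implementation for illustration, not process_radtags itself; it assumes Phred+33 quality encoding, skips barcode handling, and uses hypothetical function names.

```python
def decode_phred33(qual: str) -> list[int]:
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in qual]

def passes_sliding_window(scores: list[int], win_frac: float = 0.15,
                          min_mean: float = 10.0) -> bool:
    """False if the mean score in any window of win_frac * read length is below min_mean."""
    n = len(scores)
    w = max(1, int(n * win_frac))
    window_sum = sum(scores[:w])
    if window_sum / w < min_mean:
        return False
    for i in range(w, n):
        window_sum += scores[i] - scores[i - w]  # slide the window one base
        if window_sum / w < min_mean:
            return False
    return True

def keep_read(seq: str, qual: str, trim_to: int = 66) -> bool:
    """Truncate to trim_to bases, drop reads with uncalled bases, then apply the window test."""
    seq, qual = seq[:trim_to], qual[:trim_to]
    if "N" in seq.upper():
        return False
    return passes_sliding_window(decode_phred33(qual))

print(keep_read("A" * 66, "I" * 30 + "#" + "I" * 35))      # True: one poor base is tolerated
print(keep_read("A" * 66, "I" * 30 + "#" * 9 + "I" * 27))  # False: a whole window averages Q2
```

Averaging over a window, rather than rejecting on any single low score, tolerates isolated poor base calls while still removing reads with a run of unreliable sequence.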

For SNP discovery, VarScan.v.2.2.11 (Koboldt et al., 2012) was used to call SNP variants from the mpileup file with the following settings: minimum coverage, 8; minimum reads, 2; minimum variant allele frequency, 0·2; minimum average quality, 20; P-value threshold, 0·05. The resulting variant list was filtered to retain SNP markers with at least one heterozygous individual, in order to capture as many potentially segregating markers as possible.
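The coverage, read-count and allele-frequency cut-offs, together with the heterozygote filter, can be approximated as below. This is not VarScan's algorithm, which also applies base-quality and P-value filters; the 0·75 variant-frequency threshold for a homozygous-variant call, the table layout and all names are assumptions for illustration.

```python
from typing import Optional

def call_genotype(ref_reads: int, alt_reads: int, min_cov: int = 8,
                  min_reads: int = 2, min_vaf: float = 0.2,
                  hom_vaf: float = 0.75) -> Optional[str]:
    """Return 'hom_ref', 'het', 'hom_alt', or None if coverage is too low to call."""
    depth = ref_reads + alt_reads
    if depth < min_cov:
        return None
    vaf = alt_reads / depth
    if alt_reads < min_reads or vaf < min_vaf:
        return "hom_ref"
    return "hom_alt" if vaf >= hom_vaf else "het"

def has_heterozygote(calls) -> bool:
    """Keep a SNP only if at least one individual is called heterozygous."""
    return any(c == "het" for c in calls)

# Hypothetical (ref_reads, alt_reads) per individual for two candidate SNPs
snp_table = {
    "snp_1": [(10, 0), (5, 5), (1, 9)],
    "snp_2": [(10, 0), (12, 0), (3, 2)],
}
retained = [snp for snp, counts in snp_table.items()
            if has_heterozygote(call_genotype(r, a) for r, a in counts)]
print(retained)  # ['snp_1']
```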

To identify PAVs, all the individual alignment files in BAM format were merged using the SAMtools merge command. The merged BAM file was then converted to SAM format. The merged SAM file was then parsed into a text file to produce a table with genotypes from the population in columns and independent loci having at least one alignment as rows. A simple UNIX shell script was subsequently used to identify alignments from this table for which between 10 % and 50 % of the individuals (16) exhibited a Not Called (NC) designation (indicating potential PA variation), and for which the average read depth for individuals exhibiting alignments was eight.
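A minimal Python sketch of the PAV filter, assuming a per-locus list of read depths with one entry per individual (0 meaning Not Called). It interprets the depth criterion as a minimum mean depth of eight among individuals with alignments, which is our reading of the text; thresholds are parameters, and the locus names and depths are hypothetical. Retained loci are scored with the B (absent)/D (present) coding used in the linkage analysis.

```python
def is_candidate_pav(depths: list[int], min_nc: float = 0.10, max_nc: float = 0.50,
                     min_mean_depth: float = 8.0) -> bool:
    """depths: read depth per individual at one locus; 0 means Not Called (NC)."""
    called = [d for d in depths if d > 0]
    nc_fraction = (len(depths) - len(called)) / len(depths)
    if not called or not (min_nc <= nc_fraction <= max_nc):
        return False
    return sum(called) / len(called) >= min_mean_depth

def score_pav(depths: list[int]) -> list[str]:
    """Dominant coding used for the linkage analysis: D = present, B = absent."""
    return ["D" if d > 0 else "B" for d in depths]

loci = {
    "locus_a": [0, 0, 0, 10, 12, 9, 11, 10, 8, 14],  # 30 % NC, deep coverage
    "locus_b": [0, 0, 0, 0, 0, 0, 9, 9, 9, 9],       # 60 % NC: too many absences
    "locus_c": [0, 0, 5, 5, 6, 4, 5, 5, 5, 5],       # 20 % NC but shallow coverage
}
pavs = {name: score_pav(d) for name, d in loci.items() if is_candidate_pav(d)}
print(sorted(pavs))  # ['locus_a']
```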

Linkage map construction

Linkage map calculations were performed using R/qtl (Broman et al., 2003) and JoinMap (v.4.1, Kyazma; Van Ooijen, 2011). Using the F2 population type in R/qtl, SNP markers were initially scored as co-dominant [A, H, B] and PAV markers as dominant. For simplicity, PAVs were all coded in the same way [B (absent), D (present)] regardless of probable grandparental line derivation. Different optimal settings were required to resolve SNPs and PAVs into linkage groups (LGs), so the two data sets were kept separate for this stage. DArT markers from a previous genetic linkage map published in this population (Tomaszewski et al., 2012) were included in order to identify LG designations. Pairwise recombination fractions were estimated between markers within each of the two data sets using the est.rf() function of R/qtl, and markers were grouped using formLinkageGroups(). The grouping function resolved the SNPs into the expected seven LGs, but yielded 14 LGs for PAV markers, representing seven paternal and seven maternal grandparent-derived sets of LGs. These PAVs were subsequently recoded as B,D or A,C to reflect grandparental origin.
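Grouping markers into LGs from pairwise estimates can be sketched as single-linkage clustering: two markers are joined when their recombination fraction is below a maximum and their LOD score is above a minimum, and LGs are the resulting connected components (computed here with union-find). This mirrors the idea behind R/qtl's formLinkageGroups() but is not its code; the 0·25/3·0 thresholds, marker names and pairwise values are illustrative assumptions, since the settings used in the study are not listed.

```python
def form_linkage_groups(markers, pairs, max_rf=0.25, min_lod=3.0):
    """Partition markers into linkage groups by transitive pairwise linkage.

    pairs maps (marker_a, marker_b) -> (recombination_fraction, lod).
    """
    parent = {m: m for m in markers}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), (rf, lod) in pairs.items():
        if rf < max_rf and lod > min_lod:
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    # Number linkage groups from largest to smallest
    return sorted(groups.values(), key=len, reverse=True)

markers = ["m1", "m2", "m3", "m4", "m5"]
pairs = {
    ("m1", "m2"): (0.05, 20.0),
    ("m2", "m3"): (0.20, 4.0),
    ("m4", "m5"): (0.10, 10.0),
    ("m3", "m4"): (0.45, 0.5),  # too loosely linked to merge the two groups
}
print(form_linkage_groups(markers, pairs))  # [['m1', 'm2', 'm3'], ['m4', 'm5']]
```

Because linkage is transitive here, m1 and m3 share a group even though they were never tested directly, which is also why a single spurious pair can merge two true LGs if the thresholds are too permissive.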

The output of R/qtl was used to create chromosomally designated locus genotype files for Joinmap 4.1 (DArT markers were not carried forward in the analysis). A framework map was created with only the SNP markers (with <40 % missing data), using the maximum likelihood algorithm of JoinMap 4.1. To reduce map inflation due to low levels of genotyping error, a single round of imputation-based error correction was performed. Graphical genotypes based on the maximum likelihood maps for each LG were exported and used as input for the genotype error correction module ‘GBS Plumage for F’ (Spindel et al., 2013) with the setting of (-ct 1). This process identified all singletons (double recombinants) in the graphical genotypes and replaced them with missing values. The map order was then recalculated with the maximum likelihood algorithm of JoinMap 4.1 using the error-corrected data set. The final map comprised only non-redundant loci, as all identical loci are grouped into bins automatically during grouping in JoinMap 4.1.
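The singleton-masking idea can be illustrated as below: along one individual's ordered genotype calls for an LG, a call that disagrees with both nearest non-missing neighbours, while those neighbours agree with each other, is treated as a probable double recombinant and set to missing. This is a simplified stand-in for the GBS Plumage correction, not its implementation; the "-" missing symbol and the function name are assumptions.

```python
def mask_singletons(calls: list[str], missing: str = "-") -> list[str]:
    """Set isolated calls (probable double recombinants) to missing.

    A call is masked when its nearest non-missing neighbours agree with each
    other but not with it. Comparisons use the original calls, not masked ones.
    """
    out = list(calls)
    observed = [i for i, c in enumerate(calls) if c != missing]
    for k in range(1, len(observed) - 1):
        before = calls[observed[k - 1]]
        current = calls[observed[k]]
        after = calls[observed[k + 1]]
        if before == after and current != before:
            out[observed[k]] = missing
    return out

print(mask_singletons(["A", "A", "H", "A", "A"]))  # ['A', 'A', '-', 'A', 'A']
print(mask_singletons(["A", "A", "H", "H", "B"]))  # unchanged: a run, not a singleton
```

Setting the call to missing, rather than overwriting it with the flanking genotype, lets the mapping software treat the position as uninformative instead of asserting a genotype that was never observed.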

The PAV markers (and SNP loci that were excluded due to high missing data) were subsequently fitted to the framework map using a simple binning strategy. Pairwise recombination fractions were calculated between the SNP markers and the PAVs in JoinMap 4.1. Each PAV marker (and each high-missing-data SNP) was placed in the SNP bin on the framework linkage map with which it exhibited the lowest recombination fraction (RF), with ties in RF resolved by choosing the highest LOD (logarithm of odds) score.
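The binning rule amounts to choosing the framework bin with the smallest RF and breaking ties on the larger LOD. A minimal Python sketch, assuming the pairwise RF and LOD values have already been computed (for example, exported from JoinMap); the dictionary layout, bin names and values are hypothetical.

```python
def place_marker(stats: dict[str, tuple[float, float]]) -> str:
    """Return the framework bin with the lowest RF, ties broken by the highest LOD.

    stats maps bin_id -> (recombination_fraction, lod) for one PAV marker.
    """
    return min(stats, key=lambda bin_id: (stats[bin_id][0], -stats[bin_id][1]))

pav_stats = {
    "pav_17": {"LG1_bin3": (0.10, 12.0), "LG1_bin4": (0.02, 15.0), "LG1_bin5": (0.02, 18.0)},
    "pav_42": {"LG2_bin1": (0.01, 30.0), "LG2_bin2": (0.08, 22.0)},
}
placements = {pav: place_marker(stats) for pav, stats in pav_stats.items()}
print(placements)  # {'pav_17': 'LG1_bin5', 'pav_42': 'LG2_bin1'}
```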

RESULTS

Generating a reference sequence for GBS alignment

We generated a draft assembly of the low copy portion of the genome of the inbred paternal grandparent of the F2 biomass population from Illumina-sequenced paired-end, mate pair and long jumping distance libraries. Assembly and scaffolding produced a final assembly of 1·11 Gbp, consisting of 424 750 scaffolds, with a scaffold N50 of 25 212 bp, a contig N50 of 3790 bp and a GC content of 44·16 %. This GC content is consistent with that of barley (Rostoks et al., 2002). The assembly size of 1·11 Gbp reflects only around 40 % of the total genome size of Lolium, most probably as a result of the limitations of short-read sequencing when assembling complex plant genomes, with many repetitive regions being collapsed into a limited number of contigs. Table 1 summarizes the assembly statistics.

Table 1.

Summary statistics for the Lolium perenne genome draft assembly

Statistic                              Contigs          Scaffolds
No. of sequences                       624 485          424 750
Max. size (bp)                         94 591           282 695
Mean size (bp)                         1338             2618
N50 (bp)                               3790             25 212
Total length (bp)                      835 987 474      1 112 005 533
GC (%)                                 44·21            44·16
N (%)                                  3·08             27·19
No. of sequences ≥N50                  54 586           10 877
Sequences ≥N50 (%)                     8·74             2·56
No. of sequences <500 bp               320 891          254 591
Sequences <500 bp (%)                  51·38            59·94
No. of bases in sequences <500 bp      104 144 782      81 476 618
Bases in sequences <500 bp (%)         12·46            7·33

The shortest contig and scaffold in the assembly are equal in length at 143 bp. For completeness, we have included all sequences above this size in the released version of the assembly from this study. Consequently, short sequences (<500 bp) are highly abundant, accounting for approx. 50 % to 60 % of the total number of sequences in the unscaffolded and scaffolded versions of the assembly, respectively. However, by length, these sequences comprise only a small part of the assembly. The 254 591 scaffolds <500 bp that account for approx. 60 % of the total of 424 750 scaffolds contain only 7·33 % of the sequence, whilst 2·56 % of scaffolds account for 50 % of the sequence. Figure 1 illustrates the distribution of contig and scaffold lengths according to size range groupings.
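N50, and the count of sequences at or above it reported in Table 1, can be computed as in the Python sketch below: sort lengths in descending order and report the length at which the running total first reaches half of the total assembly length. The toy lengths are hypothetical; the final line reuses Table 1 figures only to check the reported share of scaffold bases in sequences <500 bp.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")

def count_at_least(lengths: list[int], threshold: int) -> int:
    """Number of sequences with length >= threshold."""
    return sum(1 for length in lengths if length >= threshold)

toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
value = n50(toy)
print(value, count_at_least(toy, value))  # 8 3

# Table 1 check: bases in scaffolds <500 bp as a share of the scaffolded assembly
print(f"{100 * 81_476_618 / 1_112_005_533:.2f} %")  # 7.33 %
```

This is why 2·56 % of scaffolds can hold half of the sequence: N50 is weighted by bases, not by sequence count.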

Fig. 1.

Distribution of contig and scaffold lengths. Scaffolds and contigs are grouped according to size range, from <500 bp to > 280 000 bp, to indicate the proportion of sequence held by scaffolds/contigs within each size range group. The numbers of scaffolds/contigs in each size range group are plotted on the left-hand vertical axis, with the corresponding base pair length for each group plotted on the right-hand vertical axis. (A) Scaffold/contig counts and base pair lengths for each group are shown along with (B) cumulative running totals.

Estimate of gene space coverage

We used four methods to gain an insight into the coverage of the L. perenne gene space by the assembly, based respectively on: a core reference set of proteins (CEGMA); the ability to find the majority of genes involved in controlling flowering; and representation of both specific public L. perenne transcript assembly data sets and of a set of gene models associated with the recently released draft assembly of perennial ryegrass.

CEGMA classifies coverage as either ‘complete’ or ‘partial’, based on the length of the aligned region. Complete coverage of a gene is defined as an alignment across >70 % of its length, and partial coverage as an alignment across <70 % of its length but with significant identity. Of the 248 core proteins used by CEGMA to estimate completeness of coverage, 239 (96·37 %) were found to have complete alignment and 246 (99·19 %) were found to have either complete or partial alignment within the Lolium assembly.
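The CEGMA summary percentages follow directly from the counts above, and the complete/partial rule can be expressed as a simple classifier. The classifier is a simplification: it takes a precomputed aligned fraction and a yes/no significance flag, whereas CEGMA's own assessment of significance is more involved; the names are hypothetical.

```python
def classify_cegma_hit(aligned_fraction: float, significant: bool) -> str:
    """'complete' if a significant alignment covers >70 % of the protein, else 'partial' or 'absent'."""
    if not significant:
        return "absent"
    return "complete" if aligned_fraction > 0.70 else "partial"

CORE_SET = 248
complete, complete_or_partial = 239, 246
print(f"complete: {100 * complete / CORE_SET:.2f} %")                        # 96.37 %
print(f"complete or partial: {100 * complete_or_partial / CORE_SET:.2f} %")  # 99.19 %
print(classify_cegma_hit(0.85, True), classify_cegma_hit(0.40, True))        # complete partial
```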

Because traits related to flowering are important in the utility of perennial ryegrass as a forage crop, we decided to investigate whether we had captured a significant number of the genes involved in the control of this characteristic. Higgins et al. (2010) have identified the probable rice and Brachypodium orthologues of approximately 50 genes involved in the induction of flowering in Arabidopsis. Homologues of 47 of the genes described by Higgins et al. (2010) and an additional gene (CEN) described by Comadran et al. (2012) in barley were identified in the Lolium assembly by both BLAST and Exonerate methods. The genes were located on 48 individual scaffolds ranging in length from 227 bp to 140 152 bp (perhaps demonstrating the utility of retaining shorter scaffolds in the assembly). The majority of the genes were located on large scaffolds with substantial sequence both up- and downstream of the gene’s position. Of the 48 genes, 33 appear to be complete models based on homology with rice, Brachypodium or barley, as indicated by Exonerate; 12 genes are classed as partial models due to genome scaffolding around the N-terminus, two lie on short scaffolds and one is truncated by scaffolding around the C-terminus. A list of the scaffolds containing the 48 genes is available in Supplementary Data Table S2.

The Brachypodium-based OGA transcriptome assembly described in Ruttink et al. (2013) contained 46 459 sequences, with 41 120 (88·51 %) of these being longer than 200 bp. BLAST searching (E-value = 10) of these within the draft Lolium assembly matched 38 876 sequences (94·54 %), with 27 427 (66·70 % of total sequences, 70·55 % of matched sequences) aligning with at least 95 % identity over at least 70 % of the query sequence length, based on the calculations for CIP and CALP described by Salse et al. (2008). The 27 427 strongly matched transcripts over 200 bp occurred on 11 778 scaffolds ranging from 220 bp to 282 695 bp. Ninety-six per cent (11 315) of these scaffolds were >5 kbp in length and these contained 26 888 (98·03 %) of the strongly matched transcripts.

The transcriptome assembly described in Farrell et al. (2014) contained 185 833 sequences, all >200 bp in length. BLAST searching (E-value = 10) matched 138 028 (74·28 %) within the Lolium genome assembly; 109 320 (58·83 % of total sequences and 79·20 % of matched sequences) aligned with at least 95 % identity over at least 70 % of the query sequence length. The 109 320 strongly matched transcripts occurred on 14 934 scaffolds ranging from 202 bp to 282 695 bp. Ninety-one per cent (13 576) of these scaffolds were >5 kbp in length and these contained 106 644 (97·55 %) of the strongly matched transcripts.

The assembly published by Byrne et al. (2015) contained sequences for 28 455 gene models. BLAST searching (E-value = 10) of these within the draft Lolium assembly matched 28 067 sequences (98·64 %), with 22 563 (79·29 % of total sequences, 80·39 % of matched sequences) aligning with at least 95 % identity over at least 70 % of the query sequence length. The 22 563 strongly matched gene models occurred on 12 551 scaffolds ranging from 205 bp to 274 411 bp and with a maximum of 45 models on one scaffold; 11 662 scaffolds (92·92 %) were >5 kbp in length and contained 21 461 (95·12 %) of the strongly matched models.

Gene models and transcripts from all sets combined were located on a total of 18 135 distinct scaffolds, totalling 570 Mb in length. Of these, 5584 scaffolds were specific to the combined Farrell and Ruttink transcript sets and 1855 were specific to the set of gene models from Byrne, with 10 696 scaffolds common to both sets. Figure 2 shows the distribution of scaffold sizes along with the numbers of transcripts/gene models aligned to them. Although the number of features differs greatly between the two transcript sets and the gene model set, the distribution of scaffold size bins and number of features in each bin is consistent between the sets, with larger (>5 kb) scaffolds being much more prevalent and containing the large majority of transcripts/gene models. In particular, a sharp increase in the number of scaffolds and transcripts/models occurs in the 10–20 kb size range, with scaffold count tailing off rapidly, but transcript/model numbers remaining high before beginning to tail off beyond 50 kbp. This trend is reflected in the cumulative totals, with a steep rise in numbers occurring between the 5 kb and 50 kb size ranges and then rapidly levelling out beyond 50 kb.

Fig. 2.

Distribution of transcripts and gene models, and associated scaffolds. Scaffolds are grouped according to size range, from <500 bp to >280 000 bp, to indicate the proportion of transcripts from both Ruttink and Farrell sets and gene models from Byrne et al. contained by scaffolds within each size range group. The numbers of scaffolds in each size range group are plotted on the left-hand vertical axis, with the number of associated transcripts for each group plotted on the right-hand vertical axis. (A) Scaffold counts with numbers of transcripts and gene models for each group are shown along with (B) cumulative running totals.

Accessing the draft assembly

As part of this study, we present a genome browser that allows dynamic viewing of the assembly, with tracks for the features described above. In addition, we provide further layers of annotation based on preliminary de novo prediction and on homology-based methods involving comparisons with barley, rice and B. distachyon. Tracks for all of the features described below are also available on the browser.

Prior to preliminary annotation, RepeatMasker version 3.2.8 was used with a wheat-based model to identify common repetitive elements in the scaffolded Lolium assembly. The majority of repeats identified in Lolium belonged to the retroelement and DNA transposon classes; in total, 67·77 Mbp of sequence, or 6·09 % of the assembled genome sequence, was masked. Supplementary Data Table S3 details the repeat content identified.
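The masked proportion is simply masked bases divided by assembled bases. As a hedged sketch, assuming RepeatMasker was run with soft-masking (its -xsmall option writes repeats in lowercase), the fraction can be computed directly from the masked sequences; whether assembly gap characters are counted in the denominator changes the result slightly. The function name is hypothetical.

```python
def soft_masked_fraction(sequences):
    """Fraction of bases written in lowercase (soft-masked) across all sequences."""
    total = masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0

# Cross-check against the reported figures: 67.77 Mbp masked out of the
# 1 112 005 676 bp in all 424 750 scaffolds (834 624 995 + 277 380 681 bp).
reported_fraction = 67.77e6 / (834_624_995 + 277_380_681)
print(f"{reported_fraction:.2%}")
```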

The widely used open-source gene prediction tool Augustus (Stanke and Waack, 2003) was used for gene prediction on the repeat-masked genome assembly, with a wheat-based gene model. In total, 188 842 predicted entities were identified on 59 903 scaffolds, with a maximum of 74 entities on one scaffold. Three scaffolds, representing 521·8 kbp and containing 20 predictions, are likely to be mitochondrial; Augustus did not predict gene models for scaffolds known to be associated with the chloroplast. This prediction of 188 842 entities is clearly a gross overestimate and probably reflects a number of confounding factors, including retroelements, pseudogenes, gene fragments and sequencing errors.

BLAST searches of barley cDNA sequences and of publicly available peptide sequences for barley, rice and Brachypodium were carried out using default cut-off parameters (E-value = 10). Using this lenient threshold, 99·75 % (26 094) of 26 159 barley peptide sequences, 98·42 % (30 539) of 31 029 Brachypodium peptides and 88·8 % (58 905) of 66 338 rice peptides had matches within the assembly. As expected from the phylogenetic relationship between barley and Lolium, a greater proportion of barley peptide sequences had matches within the Lolium assembly. In total, 19 477 scaffolds contained homology matches to at least one of the above data sets.

GBS library construction and sequencing results

GBS libraries were developed for 169 individuals, the F1 parent and the paternal grandparent of the mapping population, following the protocol of Elshire et al. (2011) with the restriction enzyme PstI, and were sequenced on an Illumina HiSeq 2000 to generate single-end 100 bp reads. In total, sequencing yielded 284 908 063 reads for the progeny genotypes. Reads were de-multiplexed and trimmed to 66 bp to maintain a consistent read length and quality. After cleaning, an average of approx. 1·7 million reads per individual was obtained.

Alignment of GBS reads to the assembly

The Lolium shotgun assembly described above (424 750 scaffolds) was used as the reference sequence for SNP variant identification in the F2 biomass population. Reads from the de-multiplexed FASTQ file of each individual were aligned to the reference, allowing up to two mismatches. Of the total 284 908 063 reads, 164 285 405 (57·7 %) had at least one reported alignment; on average, 58 % of the reads from each individual aligned to the reference genome. A further 15 118 076 reads (5·3 % of the total) were not reported because they aligned to more than one location, as the alignment settings retained only reads that mapped uniquely to the reference. The remaining 105 504 582 reads (37·0 %) failed to align under the settings used.
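Because the three read categories partition the total, the aligned count follows by subtraction, which provides a quick consistency check on the figures above. This is plain arithmetic on the reported counts, not part of the alignment pipeline.

```python
TOTAL_READS = 284_908_063
MULTI_MAPPED = 15_118_076   # suppressed: aligned to more than one location
UNALIGNED = 105_504_582     # no valid alignment under the settings used

aligned = TOTAL_READS - MULTI_MAPPED - UNALIGNED  # uniquely aligned reads

for label, n in [("aligned (unique)", aligned), ("multi-mapped", MULTI_MAPPED), ("unaligned", UNALIGNED)]:
    print(f"{label:>16}: {n:>11,} ({100 * n / TOTAL_READS:.1f} %)")
```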

In total, there were 213 310 PstI restriction sites located on 64 977 scaffolds in the assembly. These scaffolds accounted for 75 % of the total size of the assembly (834 624 995 bp), with the remaining 359 773 scaffolds accounting for only 25 % (277 380 681 bp). Of the 64 977 scaffolds possessing PstI sites, 26 954 (41·5 %) had at least one GBS read aligning to them, and these scaffolds contained a total of 111 903 PstI sites (52·4 % of the total PstI sites in the assembly).
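Counting candidate PstI sites amounts to scanning each scaffold for the recognition sequence CTGCAG (cut as CTGCA^G). Because the site is palindromic, scanning one strand finds every site. The sketch below is a hypothetical illustration that assumes scaffolds are held in memory as a name-to-sequence dictionary; it does not model which fragments fall within the size range selected during library preparation.

```python
PSTI_SITE = "CTGCAG"  # PstI recognition sequence, cut as CTGCA^G; palindromic

def count_psti_sites(seq):
    """Count PstI sites in one scaffold, ignoring case (soft-masked bases included)."""
    # The site cannot overlap itself, so str.count gives the full number of sites.
    return seq.upper().count(PSTI_SITE)

def summarize_psti(scaffolds):
    """scaffolds: dict mapping scaffold name -> sequence (hypothetical in-memory layout)."""
    sites = {name: count_psti_sites(seq) for name, seq in scaffolds.items()}
    with_sites = [name for name, n in sites.items() if n > 0]
    total_len = sum(len(seq) for seq in scaffolds.values())
    len_with_sites = sum(len(scaffolds[name]) for name in with_sites)
    return {
        "total_sites": sum(sites.values()),
        "scaffolds_with_sites": len(with_sites),
        "fraction_of_length_with_sites": len_with_sites / total_len,
    }
```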

SNP variant identification

Using VarScan (minimum read depth of eight, minimum of two reads supporting a variant, minimum average Phred base quality of 20, minimum variant allele frequency of 0·2), 22 805 SNP positions were reported. These included variants that were monomorphic amongst the progeny individuals but differed from the reference nucleotide at that position. Of these, 9127 variants were polymorphic within the population and biallelic. The majority of these 9127 variants were transitions, with C/T and A/G types accounting for 31 and 29 %, respectively. The remaining SNPs were transversions, with C/G, G/T, A/C and A/T types accounting for 15, 10, 10 and 6 %, respectively (Table 2). The R/qtl function geno.table() was used to examine the segregation pattern of the markers, and 4329 of the 9127 markers were eliminated because of severe departure from the expected Mendelian segregation ratio (1:2:1), using a cut-off P-value of <1e-10. The remaining 4798 SNP markers were used for map construction. The identity and location of these SNP variants are provided as a track on the JBrowse instance for the assembly.
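The segregation filter can be sketched as a chi-square goodness-of-fit test against the 1:2:1 expectation for a co-dominant marker in an F2. This stdlib-only sketch is not R/qtl's implementation; it relies on the fact that, with two degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2), and it assumes the genotype counts exclude missing calls. Function names are hypothetical.

```python
import math

def chisq_121_pvalue(n_aa, n_ab, n_bb):
    """P-value of a chi-square test of observed F2 genotype counts against 1:2:1."""
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return math.exp(-chi2 / 2)

def passes_segregation_filter(n_aa, n_ab, n_bb, cutoff=1e-10):
    """Keep a marker unless its departure from 1:2:1 is severe (P < cutoff)."""
    return chisq_121_pvalue(n_aa, n_ab, n_bb) >= cutoff
```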

Table 2.

Statistics of identified SNP markers (number and proportion of transition vs. transversion type markers)

Type Type of variation Number Proportion of total (%)
Transition C/T 2832 31
Transition A/G 2681 29
Transversion C/G 1334 15
Transversion A/C 875 10
Transversion G/T 870 10
Transversion A/T 535 6
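Classifying a biallelic SNP as a transition (A/G or C/T) or a transversion (all other pairs), and recomputing the rounded proportions from the counts in Table 2, takes only a few lines. The helper names below are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def snp_class(ref, alt):
    """Return 'transition' or 'transversion' for a biallelic nucleotide substitution."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"not a biallelic substitution: {ref}/{alt}")
    return "transition" if pair in TRANSITIONS else "transversion"

def snp_label(ref, alt):
    """Alphabetically ordered label, matching the style of Table 2 (e.g. 'C/T')."""
    return "/".join(sorted((ref.upper(), alt.upper())))

# Counts from Table 2
TABLE2 = {"C/T": 2832, "A/G": 2681, "C/G": 1334, "A/C": 875, "G/T": 870, "A/T": 535}
total = sum(TABLE2.values())  # 9127 biallelic SNPs
percent = {label: round(100 * n / total) for label, n in TABLE2.items()}
print(total, percent)
```

Rounding to whole percentages makes the six values sum to 101 %.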

Presence/absence variant (PAV) identification

Although the GBS approach was originally envisaged primarily as a method for genome-wide SNP discovery, a second type of variation has been reported in many studies that use it (Elshire et al., 2011). This variation manifests itself as the presence of alignments at a locus in some individuals and the lack of alignments in others. Such PAVs can arise from several types of event (SNPs and small InDels at restriction sites, larger InDels, inversions, etc.), all of which prevent the formation, for some alleles at a locus, of PstI site-bounded fragments in the size range selected by the PCR amplification step, while such fragments are present for other alleles. The result is that the presence of the fragment segregates as a dominant marker in the population, with the exact mode depending on the allelic configuration and population type involved.

Importantly, PA variation can actually far exceed SNP variation in GBS studies (Elshire et al., 2011). Because of its potential to add significant numbers of markers, we decided to explore the use of a very simple two-step filtering approach to identify potential PAVs in the F2 biomass population using a series of UNIX commands (see the Materials and Methods for details not included below).

There were 111 903 independent loci in the genome with at least one read aligning to them. For each locus, individuals with alignments were scored as present and individuals with no alignments as absent. We then filtered this table of variants to identify marker loci that satisfied two criteria, applied in the following order. (1) PAVs are expected to exhibit a Mendelian segregation ratio of 3:1 (presence:absence) in an F2 population; in the F2 biomass population, the ideal expected ratio is 127:42. To take into account the known segregation distortion within this population (Anhalt et al., 2008), data were filtered to identify loci with alignments (potential ‘presence’ category variants) in between 50 and 90 % of the total population. This reduced the number of candidate loci to 20 180. (2) Lack of read alignment for potential ‘absence’ variants might be due to segregation of the recessive allele, but might also be due to a technically derived lack of read coverage. This latter class is effectively missing data, but such instances cannot easily be distinguished from ‘absence’ variants on a case-by-case basis. To minimize this confounding effect, we retained only loci with a mean read depth of at least eight alignments per individual, i.e. loci with a reasonable read depth on average. Of the 20 180 loci from the previous round, 7714 potential PAVs remained after this filtering step.
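The two filtering steps above can be sketched for a single locus, given the number of reads aligned to that PstI site in each individual. This is a hypothetical Python re-expression of what the study implemented with UNIX commands: the window bounds and depth threshold come from the text, whereas treating the bounds as inclusive and averaging depth over all individuals (including those scored as absent) are assumptions. With 169 individuals, an exact 3:1 ratio gives 126·75 present and 42·25 absent, hence the expected 127:42.

```python
from statistics import mean

def presence_calls(depths):
    """Score each individual as present (1) if any read aligns to the locus, else absent (0)."""
    return [1 if d > 0 else 0 for d in depths]

def is_candidate_pav(depths, min_present=0.50, max_present=0.90, min_mean_depth=8.0):
    calls = presence_calls(depths)
    present_fraction = sum(calls) / len(calls)
    # Step 1: tolerant window around the 3:1 (presence:absence) F2 expectation
    if not (min_present <= present_fraction <= max_present):
        return False
    # Step 2: adequate average read depth (assumed: mean over all individuals)
    return mean(depths) >= min_mean_depth
```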

Construction of a high density SNP- and PAV-based genetic linkage map of the F2 biomass population

A total of 4798 SNPs and 7714 PAVs were carried forward for linkage map construction. In order to identify and orient LGs, segregation data for 326 DArT markers previously used for map construction in this F2 biomass population (Anhalt et al., 2008) were included in early rounds of the analysis (grouping and early rounds of mapping prior to error correction), but were removed for later rounds.

The SNP and PAV markers were initially assigned to LGs using R/qtl. The two sub-sets of markers (co-dominant SNPs and dominant PAVs) were grouped separately, as different optimal settings were required to resolve each sub-set efficiently into LGs. At RF/LOD thresholds of 0·11/7, the SNP markers resolved into seven LGs (identified by the presence of DArT markers). The PA markers resolved into 14 LGs at RF/LOD thresholds of 0·12/10.

Of the 4798 SNP markers, 3105 grouped into seven large LGs and were used for subsequent map calculation. The number of SNP markers per LG ranged from 269 to 563. Of the 7714 PAV markers, 7265 resolved into seven LGs, with 903 to 1426 markers per LG.

We adopted a two-stage mapping process, using the co-dominant SNP markers to construct a framework map, to which we subsequently fitted the PAV markers using a binning approach. After grouping the markers into LGs and removing redundant loci, an initial round of marker ordering was performed on the SNP markers using the maximum likelihood algorithm of JoinMap 4.1. The resulting LGs ranged in size from 687 to 1324 cM. Given that the entire map length of the previous DArT marker-based map of the F2 biomass population was 966 cM, these map lengths were vastly inflated. This phenomenon is well established in the production of ultra-high density genetic linkage maps with relatively low resolution, where the cumulative effect of low levels of genotyping error (yielding false recombination events between markers) results in artificial map expansion when the data are analysed with more ‘traditional’ mapping algorithms and approaches, such as those implemented in JoinMap (van Os et al., 2006).
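A back-of-the-envelope calculation shows why small genotyping error rates inflate dense maps. Under a simplified two-class (backcross-like) genotype model, which is an assumption since an F2 has three genotype classes, an independent miscall at either of two completely linked markers creates an apparent recombinant with probability 2e(1 - e). Summed over the intervals of a dense LG, and using roughly 1 cM per 1 % recombination for small values, the spurious length grows linearly with marker number. The marker count and error rate in the example are hypothetical.

```python
def spurious_map_length_cM(n_markers, error_rate):
    """Expected map length (cM) added by genotyping error alone, on top of the true length.

    Assumes a two-class genotype model, independent errors at each marker with
    probability error_rate, and tightly linked neighbours (true r close to 0).
    """
    apparent_r_per_interval = 2 * error_rate * (1 - error_rate)
    return (n_markers - 1) * apparent_r_per_interval * 100  # ~1 cM per 1 % recombination

# Hypothetical example: 500 markers on one LG with a 1 % miscall rate
print(round(spurious_map_length_cM(500, 0.01), 1))
```

Under these assumptions, even a 1 % error rate adds several hundred cM to a single LG, comparable in scale to the inflated LG lengths observed before error correction.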

To address this problem, we adopted a conservative approach to correct potential genotyping errors, followed by removal of redundant marker data, in order to decrease map length while maintaining the accuracy of marker order. From the maximum likelihood maps produced in JoinMap, graphical genotypes were generated for each LG in the framework SNP map (Fig. 3). These were used as input files for the ‘GBS Plumage for F2’ utility (Spindel et al., 2013), which is specifically designed to correct genotype errors in F2 populations. Erroneous genotype calls usually manifest themselves as apparent double recombinants. Using the default settings of GBS Plumage, potential double recombinants in the progeny LGs (rendered as graphical genotypes) were identified and replaced with missing values. After error correction, a second round of ordering was performed in JoinMap 4.1 using the maximum likelihood mapping algorithm, and pairwise recombination fractions between all pairs of markers were calculated on the error-corrected, re-ordered LGs (Fig. 3).
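The core idea of this kind of error correction can be illustrated with a much simpler rule than GBS Plumage uses: within one individual's graphical genotype, a single call that disagrees with two agreeing neighbours implies two crossovers in a very short interval and is more plausibly an error, so it is replaced with a missing value. The function below is a hypothetical sketch of that rule only; it is not the GBS Plumage algorithm.

```python
from typing import List, Optional

def mask_singleton_double_recombinants(calls: List[Optional[str]]) -> List[Optional[str]]:
    """Replace a call with None when it differs from both neighbours and the neighbours agree.

    calls: genotype calls ('A', 'H', 'B' or None for missing) for one individual,
    in map order along one LG.
    """
    out = list(calls)
    for i in range(1, len(calls) - 1):
        left, mid, right = calls[i - 1], calls[i], calls[i + 1]
        # Compare against the original calls so earlier masking does not cascade.
        if None not in (left, mid, right) and left == right and mid != left:
            out[i] = None
    return out
```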

Fig. 3.

Examples of graphical genotypes of chromosomes 2 and 3 (A) before and (B) after genotype error correction. The x-axis consists of genotype calls of individuals and the y-axis consists of markers ordered by chromosomal map position. The blue colour represents the allele from the paternal grandparent, pink from the maternal grandparent and yellow for the heterozygous state.

In total, 1865 unique bins representing 10 352 markers were used to calculate the map. The total final map length was 952·6 cM, which is in keeping with previous map lengths for perennial ryegrass and, specifically, for this population, indicating that the error correction and redundancy removal were effective. The number of markers per chromosome ranged from 845 to 1986 (Table 3), and the number of unique bins per chromosome ranged from 179 to 331. The average spacing between unique markers across all chromosomes was 0·4 cM, with a maximum spacing of 15·8 cM.

Table 3.

Summary of the genetic map [linkage group (LG), total number of markers, number of SNPs, PA markers, number of SNP bins, map length, and number and size of scaffolds anchored for each LG]

LG Total no. of markers No. of SNP markers No. of PA markers No. of SNP bins Map length (cM) No. of unique scaffolds anchored No. of bases anchored
1 1319 418 901 240 124·5 573 22 689 863
2 1502 466 1036 287 139·4 665 28 082 237
3 1986 563 1423 317 153·9 886 36 169 696
4 1839 557 1282 331 196·0 853 34 568 556
5 1421 370 1051 233 119·8 617 25 520 777
6 845 269 576 179 89·3 373 15 266 325
7 1440 449 991 278 129·7 624 25 795 316
Total 10 352 3092 7260 1865 952·6 4591 188 092 770

Markers in the map were defined by alignment to the reference gene space assembly produced in the paternal grandparent. The 10 352 markers on the map represent 4767 unique scaffolds in the assembly. The majority of the scaffolds anchored were in the size range between 10 and 100 kb (Table 4). The total size of the 4767 scaffolds accounts for 18 % (200 Mbp) of the total size of the reference assembly. Supplementary Data Table S4 contains a complete list of the markers, genetic order and identity of anchor markers used to create bins, the bin assignment of the remaining markers, and a list of the unique scaffolds anchored by the markers.

Table 4.

Summary of the size distribution of anchored scaffolds in the map of the F2 biomass population

Scaffold size range (bp) No. of anchored scaffolds
≤500 75
500–1000 63
1000–5000 223
5000–10 000 296
10 000–50 000 2556
50 000–100 000 1241
100 000–500 000 313
Total 4767

The number of markers per scaffold ranged from one to 15. Of the 4767 scaffolds anchored by GBS markers, 175 carried 720 markers that mapped to more than one chromosome. Of the remaining 4591 scaffolds, 1007 scaffolds (1590 markers) were anchored only by SNP markers, 2877 scaffolds (5331 markers) only by PA markers and 707 scaffolds (2711 markers) by both SNP and PA markers. Scaffolds mapping to multiple chromosomes accounted for 1 % (1 997 625 bp) of the anchored assembly space. This could be due to misassembly of the scaffolds, but might also arise from events such as incorrect alignment of fragments to the reference genome.

The GBS-based map of the F2 biomass population is defined by a considerably larger number of PAVs than SNPs (more than twice as many PAVs as SNPs). Because of their dominant nature, PAV markers are more prone to genotype scoring error, largely due to the difficulty in distinguishing the recessive allelic state (absence of an alignment) from a technically derived lack of read coverage on a per-genotype basis. The existence of 707 scaffolds anchored by both PAV and SNP markers afforded an opportunity to test the accuracy of the PAV markers relative to the more informative SNP markers.

We examined the pairwise recombination fraction (from the JoinMap pairwise data file) between all pairs of SNPs and PAVs occupying the same scaffold for all 707 scaffolds. Given the resolving power of the population and the maximum size of the scaffolds in our assembly, these markers should generally co-segregate. Of a total of 1455 pairwise observations between SNP and PAV markers on the same scaffolds, 430 (30 %) had pairwise recombination fractions <0·01 and 932 (64 %) had pairwise recombination fractions between 0·01 and 0·05. A further 60 (4 %) had pairwise recombination fractions between 0·05 and 0·1, and the remaining 33 (2 %) had recombination fractions exceeding 0·1. To test how well the binning strategy used to place the PAV markers on the map performed, we also examined the map distance between SNP and PA markers occurring on the same scaffold according to the non-redundant bin they occupied on the final map. Of the same 1455 pairwise comparisons, 852 (59 %) pairs were separated by <1 cM, 318 (22 %) by between 1 and 5 cM, 132 (9 %) by between 5 and 10 cM, and the remaining 153 (10 %) by >10 cM.
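The interval summaries above (counts and rounded percentages of SNP–PAV pairs per recombination-fraction or map-distance class) can be produced with a small helper. Here a value lying exactly on a class boundary is assigned to the higher class (so 0·01 falls in the 0·01–0·05 class); this boundary convention is an assumption, as the text does not state it. Names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

RF_EDGES = [0.01, 0.05, 0.10]  # classes: <0.01, 0.01-0.05, 0.05-0.1, >=0.1
CM_EDGES = [1.0, 5.0, 10.0]    # classes: <1, 1-5, 5-10, >=10 cM

def tabulate(values, edges):
    """Return (count, rounded %) for each class defined by the sorted edges."""
    counts = Counter(bisect_right(edges, v) for v in values)
    total = len(values)
    return [(counts[i], round(100 * counts[i] / total)) for i in range(len(edges) + 1)]
```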

Previous work on the F2 biomass population showed the presence of unusually high levels of segregation distortion. Anhalt et al. (2008) showed that 63 % of the total markers used in an amplified fragment length polymorphism- (AFLP) and SSR-based map of the population showed segregation distortion, a level twice that observed in other mapping populations of perennial ryegrass used for comparison in the same study. Linkage groups 3, 5, 6 and 7 were reported to have a high level of segregation distortion, and LG 2 and LG 4 to have the least amount of segregation distortion. In particular, LG 6 was reported to be completely distorted.

The GBS-based map of the population also exhibited significant levels of distortion, but the overall level was much lower than that observed by Anhalt et al. (2008). Of the 10 352 markers and 1865 unique bins on the GBS map, 4357 (42 %) markers and 618 (33 %) bins exhibited segregation distortion (P-value <0·05). This is a 2-fold discrepancy with the figure found in the previous study. However, the observations of Anhalt et al. (2008) were based on only 75 markers, with marker densities ranging from only eight to 17 per LG. To investigate this apparent discrepancy, we placed all 75 markers from the map of the F2 biomass population presented by Anhalt et al. (2008) on to the combined GBS map. As expected, these markers mapped to areas exhibiting segregation distortion in the current map. However, it is apparent that the increased marker coverage of the current map yields a better representation of areas exhibiting lower levels of segregation distortion, which were significantly under-represented on the previous map (Figs 4 and 5). While segregation distortion is apparent on all chromosomes, the majority of distorted markers are from LG 6 (96 % of marker bins distorted) and LG 3 (57 % of marker bins distorted), and together these LGs account for over half (57 %) of the distorted loci on the map. Thus, as well as higher marker density and coverage, the current map of the F2 biomass population provides a better representation of both distorted and non-distorted genome regions, which could be a useful feature in future trait mapping experiments.
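Calling a marker distorted at P <0·05 can be sketched as a chi-square goodness-of-fit test against the Mendelian expectation for its marker type (3:1 for dominant PAVs, 1:2:1 for co-dominant SNPs in an F2). This stdlib-only sketch uses the closed-form chi-square tail probabilities for one and two degrees of freedom; it is illustrative, the function names are hypothetical, and it may differ from the software actually used to flag distorted loci.

```python
import math

def chisq_sf(x, df):
    """Chi-square survival function for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # P(|Z| > sqrt(x))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

def distortion_pvalue(observed, ratio):
    """Goodness-of-fit P-value of observed class counts against an expected ratio."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chisq_sf(chi2, df=len(ratio) - 1)

def is_distorted(observed, ratio, alpha=0.05):
    return distortion_pvalue(observed, ratio) < alpha
```

For example, a dominant marker scored present in 127 and absent in 42 of 169 individuals, is_distorted((127, 42), (3, 1)), is not flagged, whereas a 100:69 split is.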

Fig. 4.

Distribution of segregation distortion across the chromosomes. Each line on a chromosome represents a framework marker on the map. Red indicates loci with segregation distortion (P-value <0·05) and green represents non-distorted loci. The highlighted loci, with map position on the left and locus name on the right, show the positions of markers previously published by Anhalt et al. (2008) on the current linkage map.

Fig. 5.

Distribution of marker density across the chromosomes. The x-axis represents 5 cM map intervals and the y-axis represents the number of GBS markers in each interval.

Homozygosity level of the genotype used for the reference sequence

Both the F1 parent and the paternal grandparent were genotyped by GBS alongside the progeny individuals. Unfortunately, the maternal grandparent of the population was no longer in existence at the time of the study and so could not be examined. However, inclusion of the paternal inbred grandparent, which was also used for the reference sequence assembly, provided an opportunity to examine the extent of homozygosity of this inbred line. This is particularly interesting, since the extent of residual heterozygosity indicates whether heterozygous (biallelic) loci need to be accounted for in the assembly, or in gene expression-based experiments involving this genotype. Of 3030 loci scored in the paternal grandparent and mapped using GBS in the F2 population, 3015 were homozygous calls, 11 were heterozygous calls and four represented alleles from the other grandparent (these may represent ‘missed’ heterozygote calls). Thus, the mapping data indicate that the paternal grandparent, which is also the reference sequence genotype, is approx. 99 % homozygous (Fig. 6).
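The homozygosity estimate is a simple proportion of calls. In the hypothetical encoding below, 'P' marks a homozygous call for the paternal grandparent's own allele, 'H' a heterozygous call and 'M' a homozygous call for the other grandparent's allele; both 'H' and 'M' are counted as deviations from homozygosity.

```python
from collections import Counter

def homozygosity_summary(calls):
    """Summarize per-locus calls ('P', 'H' or 'M') for the inbred grandparent."""
    counts = Counter(calls)
    n = sum(counts.values())
    return {
        "n": n,
        "homozygous_fraction": counts["P"] / n,
        "deviating_fraction": (n - counts["P"]) / n,
    }

# Counts reported in the text: 3015 homozygous, 11 heterozygous, 4 other-parent calls
calls = ["P"] * 3015 + ["H"] * 11 + ["M"] * 4
summary = homozygosity_summary(calls)
print(f"{summary['homozygous_fraction']:.1%} homozygous, {summary['deviating_fraction']:.1%} deviating")
```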

Fig. 6.

Graphical genotypes of parents along with a sub-set of individuals from chromosome 1. The first column represents the F1 parent, and the second column the paternal grandparent. Blue represents alleles from the paternal grandparent, pink represents alleles from the maternal grandparent and yellow indicates a heterozygous allelic state.

Anchoring of gene-containing scaffolds

Comparison of the 4767 GBS-anchored scaffolds with the 18 135 scaffolds that had good matches to the Byrne gene models and the Ruttink and Farrell transcript sets identified 3679 anchored scaffolds (79·32 %) that contained transcripts/gene models from any set. The total length of anchored scaffolds containing matches to transcripts or gene models is 184·36 Mbp, corresponding to 92 % of the total cumulative length (200 Mb) of anchored scaffolds. In terms of the total proportion of the potential ‘gene space’ of L. perenne anchored in the study, the 18 135 genic scaffolds cover approx. 570 Mb, and we have anchored approximately one-third of this. The use of methylation-sensitive enzymes in GBS is expected to target genic areas, and the results for the PstI-based approach used in this study support this expectation, with the vast majority of anchored scaffolds showing evidence of being gene containing.

In order to gain an insight into the performance of the synteny-based approach for chromosomal anchoring of scaffolds adopted by Byrne et al. (2015) relative to the direct anchoring of scaffolds to chromosomes via genetic mapping in this study, we focused on the 48 scaffolds containing flowering-related genes that we identified in our assembly. On examination, 22 of the 48 scaffolds were directly anchored to our genetic map (Table S2). We identified the ‘equivalent’ scaffolds in the assembly of Byrne et al. (2015), defining ‘equivalent’ as a scaffold that contained the probable flowering gene orthologue as identified by Exonerate, but also exhibited a BLAST-based similarity profile on a scaffold level that confirmed that each match represented the orthologous genomic region (for simplicity, we ignored scaffolds from the Byrne et al. assembly that overlapped our scaffold, but did not contain the flowering gene).

Using this approach, we found 47 equivalent scaffolds in the Byrne et al. (2015) assembly, but were unable to resolve an equivalent for our FRIGIDA-containing scaffold due to multiple strong matches (Table S2). All of the 47 matched genes appear to be represented in the Byrne et al. (2015) assembly by complete models (or in one case a near-complete model) based on homology with rice and Brachypodium, and all possessed gene models from the annotation associated with the assembly.

Comparing the 22 scaffolds for which we have a genetic location with the chromosomal assignments of the equivalent scaffolds in the Byrne et al. (2015) assembly revealed agreement in 18 cases and conflicts in four. We did not compare positions within chromosomes because of the widely differing map lengths of individual LGs in our map and in the reference map used for anchoring the Genome Zipper. However, at a whole-chromosome level, the scaffolds containing Lolium homologues of the genes AP1, CEN, FCA and FIE1 were placed on chromosomes 2, 6, 2 and 1, respectively, by Byrne et al. (2015), but were anchored to chromosomes 3, 5, 5 and 3, respectively, in our map (Table S2). For CEN, FCA and FIE1, these scaffold locations were supported by multiple PA and/or SNP markers in the map, whereas the scaffold containing AP1 was anchored by a single SNP used to create the framework map. Assuming that, in general, scaffolds directly anchored by multiple markers or by single high-confidence markers are robustly assigned, this implies that the synteny-based method produced incorrect chromosomal assignments for 18 % (four of 22) of these scaffolds.

DISCUSSION

Genotyping by sequencing offers an order-of-magnitude increase in our ability to create densely populated genetic maps in an extremely cost- and time-effective manner. A genetic linkage map was successfully created containing >10 000 markers, located in 1865 non-redundant bins, using 169 individuals of the well-characterized perennial ryegrass F2 biomass mapping population. Experience to date suggests that, once the methodology and basic resources are established, dense maps of this sort can be created in a matter of weeks.

Although it is possible to perform GBS in the absence of a reference sequence, early pilot experiments using the mapping population suggested that auto-assembly of GBS reads to create a reference sequence, as performed in other studies (Chen et al., 2013; Russell et al., 2014), could be problematic, with relatively minor changes in assembly parameters causing relatively large differences in the resulting assemblies (data not shown). Because of this, it was decided to generate a reference sequence to which to align GBS reads. In this case, the only existing inbred grandparental line of the mapping population was used (the paternal grandparent). As outlined previously, this genotype is a tenth-generation inbred line originating from a CMS programme (Connolly and Wrightturner, 1984). Theoretically, a genotype at this level of inbreeding should retain well below 1 % heterozygosity, making it an ideal candidate for use in a genome assembly initiative, since only a single haplotype is expected to be present for the majority of the genome. Near-complete homozygosity avoids the assembly problems associated with a highly heterozygous species, in which SNP densities have been estimated at approximately one SNP every 30 bp (Xing et al., 2007). Inclusion of the paternal grandparent in the GBS experiment allowed us to confirm the expected high level of homozygosity, with only 0·5 % of >3000 mapped SNP markers present in the paternal grandparent deviating from the expectation of homozygosity.
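The expectation of well below 1 % heterozygosity follows from the fact that each generation of self-fertilization halves the expected proportion of heterozygous loci. As a worked example, assuming ten generations of strict selfing starting from a fully heterozygous individual and no selection favouring heterozygotes (assumptions, since the breeding history is only summarized here), the function name being hypothetical:

```python
def residual_heterozygosity(generations, initial=1.0):
    """Expected heterozygosity after repeated selfing: halves every generation."""
    return initial * 0.5 ** generations

print(f"{residual_heterozygosity(10):.3%}")
```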

It is important to note that our study has taken place against the backdrop of the recent release of a more complete synteny-based draft genome sequence of L. perenne by Byrne et al. (2015). That assembly was generated in a sixth-generation inbred line of perennial ryegrass (P226/135/16). Utilizing a similar mixture of Illumina paired-end, mate pair and long jumping distance library sequencing that we adopted, Byrne et al. (2015) additionally used long read PacBio sequences equivalent to 9-fold coverage of the genome for closure of assembly gaps. Their resulting assembly captured 1128 Mbp of the perennial ryegrass genome in 48 415 scaffolds (67 024 contigs) with a scaffold N50 of 70 062 bp (contig N50: 16 370 bp). These figures account only for scaffolds and contigs in excess of 1 kb, and, adjusting for this by also only considering sequences in excess of 1 kb, by comparison, our assembly captures 977 Mbp of the genome in 90 787 scaffolds (166 217 contigs) with a scaffold N50 of 32 299 bp (contig N50: 5559 bp). Based on this comparison, our assembly offers slightly lower genome coverage, which is captured in just under double the number of scaffolds. Byrne et al. (2015) also utilized multiple RNA-seq data sets to generate a comprehensive annotation comprising 28 455 genes on 13 725 scaffolds that accounted for 796 Mbp of their assembly. Subsequently, using the synteny-driven Genome Zipper approach (Pfeifer et al., 2013), they organized a total of 13 411 scaffolds (approx. 800 Mbp in total) and 10 464 annotated genes into a linear order on the perennial ryegrass genome by virtue of comparison with the reference genomes of Brachypodium, rice and Sorghum.

We utilized the gene models generated by Byrne et al. (2015), in addition to extensive transcript data sets generated in L. perenne by Ruttink et al. (2013) and Farrell et al. (2014), to identify the gene-containing portion of our assembly, identifying scaffolds comprising 570 Mbp in total length that contain high confidence matches to these L. perenne sequences. The approx. 10 000 GBS tags comprising the map of the F2 biomass population anchor 4767 scaffolds, equivalent to approx. 200 Mb of the assembly. Although this is considerably lower than the total length assigned a chromosomal location and order by Byrne et al. (2015), the anchoring is more direct in nature, and we used this feature to test the performance of the synteny-based approach by comparing chromosomal assignments for a small sub-set of genes involved in flowering. The comparison reveals that synteny-based anchoring performs well, with >80 % concordance between genetic mapping and synteny-based results at a ‘whole-chromosome’ assignment level. However, the results also demonstrate that, while synteny-based anchoring is a powerful approach, GBS-based genetic mapping in this and other populations may also contribute to the long-term goal of producing a more comprehensive, chromosomally anchored pseudomolecule assembly of perennial ryegrass in the future through validating and augmenting synteny-based assignments.

Reduced representational sequencing approaches for genotyping are largely based on the concept of characterizing the same sequence tag across all individuals in the study population, with variation being detected within the window of sequence covered by the tag. However, a variety of polymorphic events (e.g. SNPs and InDels at the restriction sites being used for complexity reduction) can cause a second type of polymorphism, which manifests itself as the differential detection of aligned tags in different individuals. PAV has in fact been observed at a frequency far higher than SNP variation (Lu et al., 2015). For instance, 80–90 % of the maize genome is reported to show some PAVs (Chia et al., 2012), and 1·1 million PAVs have recently been mapped to the maize pan-genome (Lu et al., 2015). Given the potential for PAVs to add significantly to the marker density of the map (and the extent of genome anchoring), we decided to develop and test some simple procedures to both score and map them in this study.

Since PAVs manifest themselves at the alignment stage, we adopted a relatively straightforward approach, screening the Bowtie-generated alignment files for loci exhibiting the footprint of PA variation. Because lack of alignment at a locus in any individual could arise from technical sources such as sequence under-representation in the GBS libraries, we first required loci to conform to the expected Mendelian segregation pattern for dominant markers in the population, and then applied a filter based on read depth across all individuals to retain loci in which under-representation was not a general problem. In recognition of the fact that the population exhibits segregation distortion, and that lack of read coverage at individual loci could still contribute to apparent ‘absence’ variants, a somewhat arbitrarily determined window around the expected 3:1 ratio was used, allowing for either a 3-fold increase or decrease in the presence:absence ratio, equivalent to ratios between 9:1 and 1:1.

For the mapping component of the study, early attempts to incorporate the PAVs directly into the map were problematic, resulting in vastly inflated map distances. This was probably due to a combination of incomplete genotype information [heterozygotes (Aa) and homozygotes (AA) are indistinguishable] and a higher potential for miscalls. To circumvent this, a high-quality framework SNP-based map was generated, and a very simple binning approach was adopted to place PAVs into this framework map, whose marker order and distances were kept fixed. This maintained the integrity of the map produced using the more robust SNP markers, whilst allowing use of the considerable amount of anchoring information associated with the PAVs. Over half of the total mapped scaffold length (113 Mb of 200 Mb) was anchored solely by PAVs. Although the chromosomal positions of scaffolds anchored by PAVs might be expected to be inherently less accurate, we felt that including this information would benefit future gene discovery applications, provided that: (1) the inclusion of the PAVs did not degrade the map; and (2) the potential accuracy range of the PAVs was reasonably well understood.
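
The binning step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each framework bin is summarized by a ‘signature’ of F2 genotypes coded 0 (AA), 1 (AB) and 2 (BB), and that a PAV is attached to the bin and linkage phase with the lowest proportion of discordant calls. The coding, the function names and the mismatch criterion are illustrative assumptions rather than the exact procedure used in this study; the key property it shares with the approach described above is that the framework order and distances are never modified.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical coding: framework genotypes 0 = AA, 1 = AB, 2 = BB, None = missing;
# PAV calls 1 = tag present, 0 = tag absent, None = missing.
Genotype = Optional[int]
Presence = Optional[int]


def expected_presence(g: int, phase: str) -> int:
    """Presence expected for a dominant PAV given framework genotype g.

    phase "A": presence allele linked to the A allele (present in AA and AB).
    phase "B": presence allele linked to the B allele (present in AB and BB).
    """
    if phase == "A":
        return 1 if g in (0, 1) else 0
    return 1 if g in (1, 2) else 0


def mismatch_rate(signature: List[Genotype], pav: List[Presence], phase: str) -> float:
    """Fraction of jointly scored individuals whose PAV call disagrees with the bin."""
    pairs = [(g, p) for g, p in zip(signature, pav) if g is not None and p is not None]
    if not pairs:
        return 1.0  # no shared data: treat as a complete mismatch
    discordant = sum(expected_presence(g, phase) != p for g, p in pairs)
    return discordant / len(pairs)


def place_pav(bins: Dict[str, List[Genotype]], pav: List[Presence]) -> Tuple[str, str, float]:
    """Attach a PAV to the best-matching framework bin; the framework is left unchanged."""
    best: Optional[Tuple[str, str, float]] = None
    for bin_id, signature in bins.items():
        for phase in ("A", "B"):
            rate = mismatch_rate(signature, pav, phase)
            if best is None or rate < best[2]:
                best = (bin_id, phase, rate)
    if best is None:
        raise ValueError("no framework bins supplied")
    return best


if __name__ == "__main__":
    framework = {
        "LG1_bin12": [0, 1, 2, 1, 0, 2, 1, 1],
        "LG1_bin13": [0, 1, 2, 2, 0, 2, 1, 0],
    }
    print(place_pav(framework, [1, 1, 0, 1, 1, 0, 1, None]))  # → ('LG1_bin12', 'A', 0.0)
```

Both linkage phases are tried because a dominant PAV can be in coupling with either grandparental allele; in either phase the heterozygote is always expected to carry the tag, which is exactly the loss of information that made direct mapping of PAVs unreliable.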

Adopting a binning process addresses the first of these two points. We attempted to quantify the second on two assumptions: that SNP markers were, in general, more accurately placed than PAVs; and that a SNP and a PAV occupying the same scaffold (given that the scaffold N50 of the assembly is 25 kb) should theoretically co-segregate in the absence of genotyping error and missing data. Over 90 % of SNP–PAV pairs occupying the same scaffold had pairwise recombination fractions of no more than 5 % (0·05), and just over 80 % of PAVs ended up in a final bin no more than 5 cM away from their physically paired SNP anchor marker. Assuming a low error rate in the SNP data set, each percentage point of genotype-calling error in a PAV should translate into an approx. 1 % increase in the recombination fraction between the ‘reference’ SNP and the ‘query’ PAV in question. Our results suggest that our filtering approach is identifying PAVs with low error rates (approx. 30 % below 1 % error and approx. 60 % below 5 % error), with the binning process placing the majority of the data at map distances consistent with these recombination fractions. More sophisticated approaches to both identify and map PAVs could undoubtedly be implemented in similar studies, but this study demonstrates that, even using the relatively straightforward approaches adopted here, PAV markers can contribute significantly to anchoring mapped markers to sequence assemblies in pairwise mapping populations subjected to GBS.
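
The correspondence between a recombination fraction of 0·05 and a distance of about 5 cM follows from the map function used to convert recombination fractions into map distances. Which map function underlies the map is not stated in this section, so the Python sketch below simply implements the two standard choices, Haldane (no interference) and Kosambi, for comparison.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map function (no crossover interference): d = -50 ln(1 - 2r) cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map function: d = 25 ln((1 + 2r) / (1 - 2r)) cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    # Apparent recombination fractions of the size expected from a few per cent
    # of genotyping error in a PAV that truly co-segregates with its SNP anchor.
    for r in (0.01, 0.05):
        print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

At small r both functions give roughly 100 × r cM (about 1·01 and 1·00 cM at r = 0·01; about 5·27 and 5·02 cM at r = 0·05), so a PAV whose error rate inflates its apparent recombination fraction with a co-scaffold SNP to a few per cent is expected to fall within a few cM of that SNP, whichever function is used.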

The F2 biomass population has, in the past, been used as an exemplar of high levels of segregation distortion (SD), with Anhalt et al. (2008) reporting levels of SD exceeding 60 %. The SNP framework map presented here contrasts with these previous results, with 42 % of SNP markers (or 33 % of the markers representing the signatures of the non-redundant bin set) exhibiting distortion. Placing the markers used in the previous study on this map provides insights into the discrepancy, which seems to be due to a combination of low marker density and an unfortunate distribution of markers in the previous study. The majority of the distortion appears on LG 6, which exhibits an under-representation of the grandparental-derived genome, and on the top two-thirds of LG 3, which exhibits a similar pattern of distortion. The source of the extensive SD in these regions remains unknown. To our knowledge, distortion at the level observed on LG 6 is unique to the F2 biomass population. SD on LG 3 has previously been associated with the presence of the self-compatibility locus F in the International Lolium Genome Initiative (ILGI) reference mapping population (Thorogood et al., 2002). LG 3 of the F2 biomass population was the focus of a study by Manzanares (2013), who, based in part on the SD observed on LG 3 in this population, tested the hypothesis that the F locus was responsible for its self-compatibility phenotype. Although that conclusion was based on only four markers on LG 3, the results indicated that the self-compatibility phenotype of the F2 biomass population is controlled by a single locus that does not reside on LG 3. Thus the SD observed on this chromosome seems unrelated to the F locus, and remains unexplained.
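
Segregation distortion of a codominant SNP in an F2 is conventionally assessed with a chi-square goodness-of-fit test against the expected 1:2:1 ratio. The Python sketch below assumes that test; the significance threshold, and whether any multiple-testing correction was applied, are not stated in this section, so any cut-off used with it is a hypothetical choice. The A-allele frequency indicates the direction of distortion, such as the under-representation of one grandparental genome described for LG 6.

```python
import math
from typing import Tuple


def f2_distortion_test(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit of F2 genotype counts to the 1:2:1 ratio.

    Returns (chi2, p_value). With three classes there are 2 degrees of freedom,
    for which the chi-square survival function is exactly exp(-chi2 / 2).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.exp(-chi2 / 2.0)


def allele_a_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of the A allele; 0.5 is expected in an undistorted F2."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


if __name__ == "__main__":
    chi2, p = f2_distortion_test(20, 80, 60)
    print(f"chi2 = {chi2:.1f}, p = {p:.2e}, freq(A) = {allele_a_frequency(20, 80, 60):.3f}")
```

For example, counts of 20 AA, 80 AB and 60 BB give χ² = 20 (p = e^−10, about 4·5 × 10⁻⁵) and an A-allele frequency of 0·375, i.e. a marked deficit of the A-derived genome at that marker.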
The existence of a physically anchored, SNP-based map of the F2 biomass population opens interesting prospects for establishing the identity of the locus responsible for the self-compatibility phenotype in this population, and thus for understanding whether one or several such loci exist among the self-compatible L. perenne genotypes available. Regardless of the source of the variation, a more accurate understanding of the distribution of distortion across the map is important for the continuing utility of the F2 biomass population as a key reference population for future trait mapping and discovery genetics applications.

Conclusion

Our main goal in this study was to produce a high-density, extensively chromosomally anchored genetic map in a key reference mapping population of perennial ryegrass. We adopted a highly inclusive approach, maximizing the number of anchored fragments by utilizing both SNP and the more frequent PA variation revealed by GBS. Combined with existing and emerging genomic resources, such as the recently published synteny-based draft genome sequence of the species (Byrne et al., 2015) and, hopefully, more comprehensive assemblies to be built in the near future, the current map will be a useful tool for understanding the genetic basis of the numerous traits that segregate in this population. For example, a phenotypic data set for the segregation of polar secondary metabolites already exists as an extension of the recently published study on the mapping of non-polar metabolites (Foito et al., 2015), and (interestingly, in the context of the flowering-associated genes described) the population also segregates for heading date. Unlike maps produced in the species to date, the high level of direct and indirect anchoring to two perennial ryegrass assemblies makes it possible to routinely identify candidate genes underlying mapped traits. In addition to its future use in trait genetic analysis, the F2 biomass population is also the source population for the long-term goal of generating a recombinant inbred line (RIL) population for perennial ryegrass.

Finally, the availability of a draft assembly of a second perennial ryegrass genotype, in addition to that of genotype P226/135/16, will allow comparisons that may yield useful insights into intra-specific variation in genome structure in L. perenne, similar to those enabled by the availability of substantial genome-wide sequence information for multiple haplotypes in other recently characterized species (Potato Genome Sequencing Consortium, 2011; Deokar et al., 2014; Wilson et al., 2015).

SUPPLEMENTARY DATA

Supplementary data are available online at www.aob.oxfordjournals.org and consist of the following. Table S1: summary of initial sequencing and raw fold-coverage achieved. Table S2: list of 48 flowering-associated genes identified in the study, along with associated scaffold ID, location on that scaffold and the chromosomal assignment for the 22 scaffolds we anchored. The corresponding scaffolds from Byrne et al. (2015), and their chromosomal assignment are also shown for comparison, as well as completeness of the gene model from both assemblies. Table S3: summary statistics for repeat masking with the wheat-based model. Table S4: multi-tab Excel spreadsheet containing the map positions for all the markers, the non-redundant set of 1865 bins used to calculate the core map, number of markers present in each bin, list of unique scaffold names anchored using the map, and a list of scaffolds that map to multiple chromosomes.

ACKNOWLEDGEMENTS

The authors wish to acknowledge Trinity College Dublin (Elaine Kenny), University College Dublin (Alison Murphy) and the Oslo Sequencing Centre (Lex Nederbragt and Gregor Gilfillan) for their technical expertise in the sequencing for the assembly described herein. We also thank Dr Tom Ruttink, ILVO, Belgium for access to Lolium perenne OGA assembled transcript data, and Dr Stephen Byrne for useful discussions that contributed to the revised draft of the manuscript. This study was funded by Teagasc core funding and Teagasc PhD Walsh Fellowships to J.V. and E.M.

LITERATURE CITED

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.
  2. Anhalt U, Heslop-Harrison P, Byrne S, Guillard A, Barth S. 2008. Segregation distortion in Lolium: evidence for genetic effects. Theoretical and Applied Genetics 117: 297–306.
  3. Anhalt U, Heslop-Harrison J, Piepho H, Byrne S, Barth S. 2009. Quantitative trait loci mapping for biomass yield traits in a Lolium inbred line derived F2 population. Euphytica 170: 99–107.
  4. Bartos J, Sandve S, Kolliker R, et al. 2011. Genetic mapping of DArT markers in the Festuca–Lolium complex and their use in freezing tolerance association analysis. Theoretical and Applied Genetics 122: 1133–1147.
  5. Bentley D, Balasubramanian S, Swerdlow H, et al. 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59.
  6. Boetzer M, Henkel C, Jansen H, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27: 578–579.
  7. Broman K, Wu H, Sen S, Churchill G. 2003. R/qtl: QTL mapping in experimental crosses. Bioinformatics 19: 889–890.
  8. Byrne SL, Nagy I, Pfeifer M, et al. 2015. A synteny-based draft genome sequence of the forage grass Lolium perenne. The Plant Journal 84: 816–826.
  9. Catchen J, Hohenlohe P, Bassham S, Amores A, Cresko W. 2013. Stacks: an analysis tool set for population genomics. Molecular Ecology 22: 3124–3140.
  10. Chen Q, Ma Y, Yang Y, et al. 2013. Genotyping by genome reducing and sequencing for outbred animals. PLoS One 8: e67500. doi:10.1371/journal.pone.0067500.
  11. Chia J, Song C, Bradbury P, et al. 2012. Maize HapMap2 identifies extant variation from a genome in flux. Nature Genetics 44: 803–807.
  12. Comadran J, Kilian B, Russell J, et al. 2012. Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nature Genetics 44: 1388–1392.
  13. Connolly V, Wright-Turner R. 1984. Induction of cytoplasmic male-sterility into ryegrass (Lolium perenne). Theoretical and Applied Genetics 68: 449–453.
  14. Davey J, Blaxter M. 2010. RADSeq: next-generation population genetics. Briefings in Functional Genomics 9: 416–423.
  15. Davey J, Hohenlohe P, Etter P, Boone J, Catchen J, Blaxter M. 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics 12: 499–510.
  16. Deokar A, Ramsay L, Sharpe A, et al. 2014. Genome wide SNP identification in chickpea for use in development of a high density genetic map and improvement of chickpea reference genome assembly. BMC Genomics 15: 708. doi:10.1186/1471-2164-15-708.
  17. Doyle JJ. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin 19: 11–15.
  18. Elshire R, Glaubitz J, Sun Q, et al. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6: e19379. doi:10.1371/journal.pone.0019379.
  19. Farrell J, Byrne S, Paina C, Asp T. 2014. De novo assembly of the perennial ryegrass transcriptome using an RNA-Seq strategy. PLoS One 9: e103567. doi:10.1371/journal.pone.0103567.
  20. Foito A, Hackett C, Byrne S, Stewart D, Barth S. 2015. Quantitative trait loci analysis to study the genetic regulation of non-polar metabolites in perennial ryegrass. Metabolomics 11: 412–424.
  21. Higgins J, Bailey P, Laurie D. 2010. Comparative genomics of flowering time pathways using Brachypodium distachyon as a model for the temperate grasses. PLoS One 5: e10065. doi:10.1371/journal.pone.0010065.
  22. Joshi NA, Fass JN. 2011. Sickle: a sliding-window, adaptive, quality-based trimming tool for FastQ files.
  23. Koboldt D, Zhang Q, Larson D, et al. 2012. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research 22: 568–576.
  24. Kopecky D, Havrankova M, Loureiro J, et al. 2010. Physical distribution of homoeologous recombination in individual chromosomes of Festuca pratensis in Lolium multiflorum. Cytogenetic and Genome Research 129: 162–172.
  25. Langmead B, Trapnell C, Pop M, Salzberg S. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10: R25. doi:10.1186/gb-2009-10-3-r25.
  26. Li H, Handsaker B, Wysoker A, et al. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
  27. Lu F, Romay MC, et al. 2015. High-resolution genetic mapping of maize pan-genome sequence anchors. Nature Communications 6: 6914.
  28. Manzanares C. 2013. Genetics of self-incompatibility in perennial ryegrass (Lolium perenne L.). PhD thesis, University of Birmingham, UK.
  29. Mascher M, Wu S, St Amand P, Stein N, Poland J. 2013. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley. PLoS One 8: e76925. doi:10.1371/journal.pone.0076925.
  30. van Orsouw N, Hogers R, Janssen A, et al. 2007. Complexity Reduction of Polymorphic Sequences (CRoPS): a novel approach for large-scale polymorphism discovery in complex genomes. PLoS One 2: e1172.
  31. van Os H, Andrzejewski S, Bakker E, et al. 2006. Construction of a 10,000-marker ultradense genetic recombination map of potato: providing a framework for accelerated gene isolation and a genomewide physical map. Genetics 173: 1075–1087.
  32. Parra G, Bradnam K, Korf I. 2007. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23: 1061–1067.
  33. Pfeifer M, Martis M, Asp T, et al. 2013. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics. Plant Physiology 161: 571–582.
  34. Potato Genome Sequencing Consortium. 2011. Genome sequence and analysis of the tuber crop potato. Nature 475: 189–195.
  35. Rostoks N, Park Y, Ramakrishna W, et al. 2002. Genomic sequencing reveals gene content, genomic organization, and recombination relationships in barley. Functional and Integrative Genomics 2: 51–59.
  36. Russell J, Hackett C, Hedley P, et al. 2014. The use of genotyping by sequencing in blackcurrant (Ribes nigrum): developing high-resolution linkage maps in species without reference genome sequences. Molecular Breeding 33: 835–849.
  37. Ruttink T, Sterck L, Rohde A, Bendixen C, et al. 2013. Orthology Guided Assembly in highly heterozygous crops: creating a reference transcriptome to uncover genetic diversity in Lolium perenne. Plant Biotechnology Journal 11: 605–617.
  38. Salse J, Bolot S, Throude M, et al. 2008. Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. The Plant Cell 20: 11–24.
  39. Skinner M, Uzilov A, Stein L, Mungall C, Holmes I. 2009. JBrowse: a next-generation genome browser. Genome Research 19: 1630–1638.
  40. Slater G, Birney E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31.
  41. Smit AFA, Hubley R, Green P. 2010. RepeatMasker Open-3.0. http://www.repeatmasker.org.
  42. Spindel J, Wright M, Chen C, et al. 2013. Bridging the genotyping gap: using genotyping by sequencing (GBS) to add high-density SNP markers and new value to traditional bi-parental mapping and breeding populations. Theoretical and Applied Genetics 126: 2699–2716.
  43. Stanke M, Waack S. 2003. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19: II215–II225.
  44. Thorogood D, Kaiser W, Jones J, Armstead I. 2002. Self-incompatibility in ryegrass 12. Genotyping and mapping the S and Z loci of Lolium perenne L. Heredity 88: 385–390.
  45. Tomaszewski C, Byrne S, Foito A, et al. 2012. Genetic linkage mapping in an F2 perennial ryegrass population using DArT markers. Plant Breeding 131: 345–349.
  46. Van Ooijen J. 2011. Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genetics Research 93: 343–349.
  47. Ward J, Bhangoo J, Fernandez-Fernandez F, et al. 2013. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation. BMC Genomics 14: 2. doi:10.1186/1471-2164-14-2.
  48. Wilson A, Wickett N, Grabowski P, Fant J, Borevitz J, Mueller G. 2015. Examining the efficacy of a genotyping-by-sequencing technique for population genetic analysis of the mushroom Laccaria bicolor and evaluating whether a reference genome is necessary to assess homology. Mycologia 107: 217–226.
  49. Xing Y, Frei U, Schejbel B, Asp T, Lubberstedt T. 2007. Nucleotide diversity and linkage disequilibrium in 11 expressed resistance candidate genes in Lolium perenne. BMC Plant Biology 7: 43.
  50. Xu H, Luo X, Qian J, et al. 2012. FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7: e52249. doi:10.1371/journal.pone.0052249.

Articles from Annals of Botany are provided here courtesy of Oxford University Press