Plant Physiology. 2014 Sep 10;166(3):1241–1254. doi: 10.1104/pp.114.247668

De Novo Genome Assembly of the Economically Important Weed Horseweed Using Integrated Data from Multiple Sequencing Platforms

Yanhui Peng, Zhao Lai, Thomas Lane, Madhugiri Nageswara-Rao, Miki Okada, Marie Jasieniuk, Henriette O’Geen, Ryan W Kim, R Douglas Sammons, Loren H Rieseberg, C Neal Stewart Jr
PMCID: PMC4226366  PMID: 25209985

De novo genome assembly and genomic resources of horseweed will be useful to understand the genetic and molecular bases of weediness.

Abstract

Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insert sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with an N50 (50% of the assembly) of 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeat and single-nucleotide polymorphisms were surveyed. Genomic patterns were detected that associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed.


In the past few years, genomic approaches have revolutionized plant biology. Complete or draft genome data are currently available for tens of plant species (http://www.phytozome.net). From the model plant Arabidopsis (Arabidopsis thaliana; Arabidopsis Genome Initiative, 2000) to important crop plants such as rice (Oryza sativa; Goff et al., 2002; Yu et al., 2002; International Rice Genome Sequencing Project, 2005), soybean (Glycine max; Schmutz et al., 2010), maize (Zea mays; Schnable et al., 2009), and chickpea (Cicer arietinum; Varshney et al., 2013) to economic woody plants such as wine grape (Vitis vinifera; Jaillon et al., 2007) and poplar (Populus trichocarpa; Tuskan et al., 2006), powerful complete genome data sets and tools allow the unprecedented ability to explore the genetics and genomics of plant form and function. Furthermore, genome sequences of additional plant species will soon be available from in-progress large-scale sequencing projects (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html).

One critical group of plants that has been largely overlooked in the genomics revolution consists of economically significant agricultural weeds (Stewart, 2009; Stewart et al., 2009). Weeds cause about $36 billion annual damage in the United States alone (Pimentel et al., 2000). The cost is even higher if one includes weeds in pastures, golf courses, aquatic environments, etc. Unchecked, weeds effectively outcompete crops for resources and decrease the yield of food, feed, and fiber. Despite their profound economic significance, weedy plant genomics data are very scarce relative to other economically important plants.

Weeds experienced surreptitious accidental domestication (Warwick and Stewart, 2005) and adaptation to changing agricultural environments to become among the world’s most successful plants. They are persistent, adaptable, stress tolerant, competitive, and capable of extreme levels of reproduction (e.g. some species can produce one million seeds per plant). Herbicide application has been the dominant weed-control measure in recent decades, but increasingly, the evolution of herbicide resistance is challenging this practice. More than 400 biotypes of weeds have evolved resistance to one or more of all the major groups of herbicides (Heap, 2014), among which resistance to glyphosate is currently of greatest concern. Horseweed (Conyza canadensis), which is in the Compositae (Asteraceae) family, was the first eudicot weedy species to evolve glyphosate resistance and is also one of the most widely distributed glyphosate-resistant species in the world (VanGessel, 2001; Heap, 2014). Horseweed is a true diploid (2n = 18), with a self-fertilizing mating system and the smallest genome of any agricultural weed (335 Mb; Stewart et al., 2009). We sought to describe the reference genome of horseweed in order to enable genomic approaches to elucidate the mechanism of nontarget herbicide resistance and to gain a better understanding of the evolution and spread of herbicide resistance across agricultural landscapes. Weed genomics is critical to understand weed biology, and understanding weed biology is critical in weed management. To date, transcriptomes using next-generation sequencing of a few weedy species have been produced to aid the discovery of genes that contribute to herbicide resistance and other weediness traits, with most progress occurring in Conyza and Amaranthus spp. and Lolium rigidum (Lee et al., 2009; Peng et al., 2010; Riggins et al., 2010; Lai et al., 2012; Gaines et al., 2014).

Horseweed shares many other weediness features with the world’s most damaging weeds. These features include a relatively short vegetative phase, indeterminate flowering, self-compatibility, high seed output, long-distance seed dispersal, competitiveness with crop plants, a deep root system, discontinuous dormancy, environmental plasticity, and allelopathy (Basu et al., 2004; Chao et al., 2005). Also, horseweed is amenable to genetic transformation and regeneration (Halfhill et al., 2007), which facilitates overexpression or knockdown analysis of potential gene targets. Horseweed is autogamous, and a single plant may produce well over 200,000 seeds. The small seeds have pappi that enable wind and water dispersal (Bhowmik and Bekech, 1993; Weaver, 2001) to sometimes very long distances (DeVlaming and Proctor, 1968; Weaver and McWilliams, 1980; Andersen, 1993; Dauer et al., 2006; Shields et al., 2006). Moreover, horseweed seeds are not dormant and can germinate immediately after maturity in the fall, spring, or midsummer (Bhowmik and Bekech, 1993; Weaver, 2001). Thus, horseweed is a bona fide weed with model plant qualities.

The Compositae is the largest and most diverse plant family, with over 24,000 described species. Despite the evolutionary success and economic importance of plants in this family, a draft whole-genome publication for any member of the family or for any plants from closely related families is lacking. Genome projects for lettuce (Lactuca sativa; https://lgr.genomecenter.ucdavis.edu), sunflower (Helianthus annuus; http://www.sunflowergenome.org), and several other members of the family are in progress, but de novo genome assembly is often arduous when genomes are large and complex (Kane et al., 2011; Truco et al., 2013). A draft sequence of a streamlined Compositae genome would greatly advance the genomics and genetics of economically, ecologically, and evolutionarily important plants within this important plant family.

In this study, to further increase sequence resources for horseweed, we performed de novo genome sequencing and assembly of the Tennessee glyphosate-resistant horseweed biotype TN-R using multiple sequencing platforms, including 454 GS-FLX, Illumina HiSeq 2000, and PacBio RS. Using a modified strategy that has proven effective in vertebrates (Li et al., 2010a), date palm (Phoenix dactylifera; Al-Dous et al., 2011), and flax (Linum usitatissimum; Wang et al., 2012), the draft genome of horseweed was assembled into scaffolds. Genomes and transcriptomes of this and seven other horseweed biotypes were also sequenced, and the data were used to investigate genome variation and to facilitate molecular marker development and gene discovery. The primary goal of this project was to build substantial genomic resources for horseweed, which will subsequently be useful to elucidate the genomic basis of weediness traits.

RESULTS

De Novo Sequencing and Assembly

The reference genome source was DNA extracted from six individual plants of a single glyphosate-resistant biotype that was originally collected in 2002 from a soybean field in Jackson, TN (TN-R; Mueller et al., 2003). Comprehensive genome shotgun sequences were obtained using seven libraries with insert sizes that ranged from 350 bp to 10 kb and three sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS; Table I). Two plates of 454 GS-FLX runs on three independent libraries (approximately 600 bp) produced 2,238,807 raw reads with an average read length of approximately 390 bp, for a total yield of approximately 860 Mb data. Two lanes of HiSeq 2000 runs (2 × 100 bp) of two short paired-end libraries produced 786,389,990 raw reads, for a total yield of approximately 77 Gb data. One lane of a HiSeq 2000 run (2 × 100 bp) of one 3-kb mate-paired library produced 381,844,926 raw reads, for a total yield of approximately 37 Gb data. Ten SMRT cells of PacBio RS runs produced 513,084 sequences with an average read length of 3.1 kb, which yielded 1.44 Gb of total data (Table I). All the sequence data together represented approximately 350× coverage of the horseweed genome.

Table I. Summary of sequencing read yield results from multiple platforms and mate-paired libraries of various sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb).

Platform | Libraries (n) | Read Length | Filtered Reads | Data (Approximate Coverage)
--- | --- | --- | --- | ---
454 GS-FLX | 600 bp (3) | 390 bp | 2,238,807 | 860 Mb (2.5×)
Illumina PE | 350 bp (2) | 2 × 100 bp | 786,389,990 | 77 Gb (230×)
Illumina MP | 3 kb (1) | 2 × 100 bp | 381,844,926 | 37 Gb (110×)
PacBio RS | 10 kb (1) | 3.1 kb | 513,084 | 1.4 Gb (4.5×)
Total | Seven libraries | 0.1–10 kb | 1,170,986,807 | 116.3 Gb (350×)
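
The approximate coverage values in Table I follow from dividing each yield by the genome size. The sketch below redoes that arithmetic, assuming the 335-Mb genome size quoted in the text and the rounded yields from the table; the names `fold_coverage` and `yields_bp` are illustrative only, and small differences from the table are expected because the inputs are rounded.

```python
# Approximate fold-coverage = sequenced bases / genome size.
GENOME_SIZE_BP = 335e6  # genome size quoted in the text (335 Mb)

# Rounded yields per platform from Table I, in base pairs.
yields_bp = {
    "454 GS-FLX": 860e6,
    "Illumina PE": 77e9,
    "Illumina MP": 37e9,
    "PacBio RS": 1.4e9,
}


def fold_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Return the approximate fold-coverage for a given yield in bases."""
    return total_bases / genome_size


coverage = {name: fold_coverage(bp) for name, bp in yields_bp.items()}
total_coverage = fold_coverage(sum(yields_bp.values()))

for name, cov in coverage.items():
    print(f"{name}: {cov:.1f}x")
print(f"Total: {total_coverage:.0f}x")
```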

Table II. Summary of the de novo genome assembly of horseweed.

N90 and N70, 90% and 70% of the assembly, respectively.

Parameter | Contigs | Scaffolds
--- | --- | ---
Sequence number | 20,075 | 13,966
N90 (bp; n) | 8,698; 14,599 | 13,506; 10,318
N70 (bp; n) | 14,517; 8,886 | 23,243; 6,219
N50 (bp; n) | 20,764; 5,122 | 33,561; 3,575
Average length (bp) | 16,258 | 26,546
Maximum single sequence (bp) | 102,072 | 182,395
Base pairs (Mb) | 311.27 | 344.88
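
The N50, N70, and N90 statistics in Table II each pair a length with a sequence count. A minimal sketch of how such values are commonly computed is given below; the function name `nx` and the toy lengths are hypothetical, and the assemblers used in this study may apply slightly different conventions.

```python
def nx(lengths, x=50):
    """Return (Nx length, number of sequences) for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x% of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Hypothetical toy assembly with five contigs (300 bp in total).
toy_lengths = [100, 80, 60, 40, 20]
n50_length, n50_count = nx(toy_lengths, 50)
n90_length, n90_count = nx(toy_lengths, 90)
print(n50_length, n50_count, n90_length, n90_count)
```

Read this way, the scaffold N50 entry "33,561; 3,575" means that the 3,575 longest scaffolds, each at least 33,561 bp, hold half of the assembled bases.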

De novo genome assembly is often not straightforward (Baker, 2012), in that many factors impact the integrity and accuracy of the draft assembly. To integrate data from multiple sequencing platforms, we used a de novo assembly strategy (Supplemental Fig. S1). Factors that could impact the assembly quality were tested, including Kmer (the possible subsequences of length k from reads obtained through DNA sequencing effort) size, sequencing depth and coverage, and the hybrid assembly of multiplex data (Supplemental Fig. S2). Increased coverage could increase assembly quality by increasing the 50% of the assembly (N50) value and assembled genome size and may also reduce the numbers of contigs. However, as the coverage reached 50×, the effect of this factor was limited. For a given coverage, lower Kmer size translated into increased sensitivity, but with more assembly errors, while higher Kmer size translated into highly specific, accurate, and reliable assembly (Supplemental Fig. S2). Therefore, given adequate coverage, higher Kmer provided better assembly. To mitigate potential problems, we chose deep coverage (greater than 50×), high Kmer (more than 25 nucleotides) cutoff for accuracy, and multiple sequencing platforms and library types. Indeed, the data that were integrated among multiple Next Generation Sequencing platforms and libraries with different insert sizes increased the quality of the de novo assembly (Supplemental Fig. S2). Newbler was used, since it was designed specifically for assembling sequence data generated by the 454 GS-FLX (Margulies et al., 2005), whereas SOAPdenovo is among the most popular tools to assemble Illumina short reads (Li et al., 2010b). Using the PacBio RS sequencing platform, reads of over 10 kb in length were produced. However, the higher error rate prevented the direct use of PacBio reads in assembly (Supplemental Fig. S3). 
After error correction by mapping high-quality, but short, Illumina reads, the inclusion of the long reads from PacBio increased the quality of de novo genome assembly by using it for either the primary assembly or the guidance of scaffolding (Supplemental Figs. S3–S5). The filtered and high-quality 454 GS-FLX single reads and high-coverage Illumina paired-end reads were combined to generate de novo contigs. Mate-pair libraries included true mates mixed with chimeras and also paired ends. To find those true mates with correct orientation and distance, the de novo-assembled contigs were used as reference for the mapping of mate-pair reads (Supplemental Table S1; Supplemental Fig. S6). The total size of the assembled contigs was 311.3 Mb, in which N50 was composed of 5,122 sequences of 20,764 bp or longer (Fig. 1; Table II). The contigs were further assembled to 344.9-Mb scaffolds, in which the N50 increased to 33,561 bp (3,575 sequences). Without gaps, the assembly contained 92.3% genome coverage. The horseweed genome contained a 34.9% G/C base ratio, which was higher than sequenced woody eudicot genomes but equivalent to other eudicot genomes and much lower than monocot genomes (Fig. 1C; Supplemental Table S2). The G/C content of coding regions was 38.6%, which was higher than that of the entire genome (Fig. 1D). The assembled draft of horseweed genome data has been submitted to the National Center for Biotechnology Information with accession number SUB535309.
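
The Kmer definition given above (every subsequence of length k taken from the reads) can be made concrete with a short sketch. The toy reads and the function name `count_kmers` are illustrative assumptions; this is not how Newbler or SOAPdenovo is implemented, only a way to see what the Kmer size parameter controls.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Hypothetical toy reads that overlap by seven bases.
reads = ["ACGTACGT", "CGTACGTA"]
kmers = count_kmers(reads, k=4)
for kmer, n in sorted(kmers.items()):
    print(kmer, n)
```

A larger k gives fewer chance matches between unrelated reads (higher specificity) but requires longer exact overlaps, which is why it depends on adequate coverage.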

Figure 1.

De novo genome assembly of horseweed. A, Assembly units were sorted by descending length, and the cumulative DNA content was plotted as a function of the total number of assembly units. B, Length distributions of de novo-assembled contigs. C, G/C content distributions in horseweed computed over a bin size of approximately 500 bp. D, G/C content distribution of unique coding DNA sequences.
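
The G/C distribution in Figure 1C is computed over bins of approximately 500 bp. The sketch below shows one straightforward way to do this, assuming non-overlapping windows and ignoring ambiguous bases; the function names are hypothetical, and the exact binning used for the figure is not stated in the text.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(seq, window=500):
    """GC fraction of consecutive non-overlapping windows (the last may be shorter)."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Hypothetical toy sequence split into 4-bp windows for illustration.
print(windowed_gc("GGGGAAAACGAT", window=4))
```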

To inspect the de novo assembly accuracy, multiple independent sources of horseweed DNA and complementary DNA sequences were used to align with the assembled genome (Table III). These sequences included EST reads from Sanger sequencing, transcriptome shotgun reads from 454 and Illumina sequencing, and partial genomic reads, which were used for the genome assembly (Zhou et al., 2009; Peng et al., 2010; Yuan et al., 2010; sample identifiers WSYE and NUSE from the One-Thousand Plant Transcriptomes project [www.onekp.com]). Over 90% of sequences could be aligned to the genome. The mapping and BLAST varied among sequence reads. However, in the aligned sequences, 91.2% or more had over 90% coverage and identity. The nonperfect alignments indicated that EST reads could be used to further improve assembly accuracy and gene prediction.

Table III. Alignment of ESTs, transcriptome sequencing (RNAseq), and genomic shotgun sequences to scaffolds.

The threshold of alignment coverage and identity was 90% or greater.

Sequence Type | Sequences (n) | Aligned (n) | Aligned (%) | Alignment Coverage (%) | Alignment Identity (%)
--- | --- | --- | --- | --- | ---
Sanger ESTs (more than 600 bp) | 5,482 | 5,012 | 91.4 | 91.2 | 93.7
454 GS-FLX RNAseq (more than 200 bp) | 145,796 | 132,579 | 90.9 | 92.1 | 92.6
Illumina RNAseq (100 bp) | 3,987,575 | 3,591,820 | 90.1 | 91.6 | 94.5
454 GS-FLX genomic reads (more than 390 bp) | 499,955 | 450,754 | 90.2 | ≥95 | ≥99
Illumina genomic reads (100 bp) | 9,315,014 | 8,476,173 | 91.0 | ≥95 | ≥99

Identification of Repetitive Elements and Annotation of Protein-Coding Genes

By using the plant repeat database (RepeatMasker libraries 20140131) and RepeatMasker 4.0.5, a total of 233,521 loci (20.4 Mb) were identified to be various nucleotide repeat elements. These sequences represented 6.25% of the assembled genome (Table IV) and were composed of 172,825 simple sequence repeats (SSRs), 28,494 low-complexity elements, 23,425 different types of retroelements, 5,424 small RNA structures, 2,283 DNA transposons, and 675 unclassified elements. Retroelements were the largest class of mobile elements found in the genome (2.61%), of which most were long terminal repeat-type retroelements (2.54%). Small RNAs contributed 0.61% to the genome, whereas identified DNA transposons represented just 0.16% of the genome.

Table IV. Repeat element analysis in the horseweed genome.

For elements, most repeats were fragmented by insertions or deletions and have been counted as one element.

Repeat Elements | Elements (n) | Length (bp) | Percentage of Genome
--- | --- | --- | ---
Retroelements | 23,425 | 8,509,514 | 2.61
Short interspersed elements | 29 | 1,433 | 0.00
Long interspersed elements | 902 | 219,700 | 0.07
Long terminal repeat elements | 22,889 | 8,288,381 | 2.54
  Yeast transposon family1/Copia | 15,599 | 4,999,694 | 1.54
  Gypsy/Dictyostelium intermediate repeat sequence1 | 7,110 | 3,276,883 | 1.00
DNA transposons | 2,283 | 531,765 | 0.16
  Hobo-Activator | 337 | 61,039 | 0.02
  Tc1-IS630-Pogo | 719 | 166,937 | 0.05
  Enhancer/suppressor mutator | 2 | 644 | 0.00
  Tourist/Harbinger | 418 | 137,376 | 0.04
Unclassified | 675 | 278,725 | 0.09
Small RNAs | 5,424 | 2,003,521 | 0.61
Simple repeats | 172,825 | 7,669,927 | 2.35
Low complexity | 28,494 | 1,400,133 | 0.43
Total | 233,521 | 20,393,585 | 6.25

Masked SSR loci were analyzed using the Simple Sequence Repeat Identification Tool (http://archive.gramene.org/db/markers/ssrtool) with a threshold minimum of six repeats. The results included 51,892 loci (Supplemental Table S3) containing 44,589 dimer (85.9%), 6,864 trimer (13.2%), 299 tetramer (0.58%), 60 pentamer (0.12%), and 80 hexamer (0.15%) motifs. The majority of these loci contained six to nine repeats (Supplemental Fig. S7). Considering the position in scaffold, repeat number, and length, 1,316 loci were identified to have appropriate sequences to develop PCR primers, which could be used as SSR markers (Okada et al., 2013).
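
To illustrate what a minimum of six repeats means for dimer through hexamer motifs, the following sketch finds perfect tandem repeats with a regular expression. It is a simplified stand-in written for this illustration, not the Simple Sequence Repeat Identification Tool itself; the function names and the toy sequence are hypothetical.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT, AAA)."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_repeats=6, unit_sizes=(2, 3, 4, 5, 6)):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for size in unit_sizes:
        # A captured unit followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Hypothetical toy sequence: an (AT)7 dimer locus and an (AGC)6 trimer locus.
toy_seq = "GGC" + "AT" * 7 + "CCG" + "AGC" * 6 + "T"
for start, motif, repeats in find_ssrs(toy_seq):
    print(start, motif, repeats)
```

The `is_primitive` check keeps a long dimer run from also being reported as a tetramer or hexamer locus.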

Using previous EST and transcriptome data as a starting point (Zhou et al., 2009; Peng et al., 2010; Yuan et al., 2010), 44,592 horseweed gene models were predicted. A total of 25,439 unique transcripts were returned after removing redundancies. The longest gene (using coding sequence only and without tallying introns) was composed of 17,120 bp. It was annotated to encode a midasin-like protein. The mean and median transcript lengths were 2,210 and 1,830 bp, respectively. Over 95% of the transcripts were longer than 715 bp (Fig. 2A). The predicted gene sequences were further compared with the UniProt and National Center for Biotechnology Information nonredundant protein databases for assigning biological information. Using a threshold e value of 10^−4 or less, 79.6% of these (20,248 of 25,439) had at least one hit to the database. The majority of the alignments had an identity of 50% or greater (Fig. 2C). Within the annotated genes, 18,406 returned hits at e values of 10^−10 or less. Moreover, most of the hits had only one high-scoring segment pair with the aligned genes (Fig. 2, B and D). Overall, the majority of the predicted genes in the assembled horseweed genome could be aligned with hits in a known protein database with high identity and specificity, which indicated that most of these genes were likely protein-coding sequences.

Figure 2.

Annotation of protein-coding transcripts. A, Distribution of lengths of predicted transcripts. B, Identity distribution of alignments with hits in the database. C, E value distribution of aligned hits. D, Distribution of high-scoring segment pairs (HSPs) from aligned hits.

Gene Ontology (GO) annotation resulted in 14,897 GO-annotated genes. The largest gene category with molecular function was kinase activity (1,780), followed by nucleotide binding (1,740), transporter activity (1,031), and another 15 subgroups with 50 or more terms (Supplemental Fig. S8A). Among genes annotated to various biological processes, the largest subgroup was stress-response genes (5,446), followed by cellular component organization-related genes (4,582), transport (3,799), and 27 additional groups with 50 or more terms (Supplemental Fig. S8B). The potential cellular localization of the genes was predicted by assigning 17,878 cellular component GO terms to various queries (Supplemental Fig. S8C). Genes that encoded proteins with enzyme activity were further divided into six groups (transferases, hydrolases, oxidoreductases, ligases, lyases, and isomerases; Supplemental Fig. S8D).

Fisher’s exact test showed 10 GO terms that were significantly overrepresented in horseweed compared with Arabidopsis (Fig. 3). The enriched GO terms and related gene families included vacuolar transport, photosynthetic acclimation, detoxification of nitrogenous compounds, response to UV-B light, chloroplast thylakoid lumen, protein targeting to chloroplasts, protein peptidyl-prolyl isomerization, peptidyl-prolyl cis-trans-isomerase activity, hydrolase activity (acting on carbon-nitrogen), glycolysis, and NAD(P)H dehydrogenase complex assembly. Four transporter subgroup GO terms were overrepresented in the horseweed genome at the P < 0.001 level: drug transmembrane transporters (GO:0015238), xanthine transmembrane transporters (GO:0042907), uracil transmembrane transporters (GO:0015210), and allantoin uptake transporters (GO:0005274; Supplemental Fig. S9). However, these data are not sufficient to draw conclusions about biological function and the evolution of herbicide resistance; further gene expression analysis and other follow-on experiments need to be performed.
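
For readers unfamiliar with the enrichment test, the sketch below computes a one-sided Fisher’s exact P value for overrepresentation of a single GO term from a 2 × 2 table of gene counts. The counts are hypothetical and the function name `fisher_right_tail` is an illustration; the software and settings used for the published analysis are not described in this section.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test P value for overrepresentation.

    2 x 2 table of gene counts:
                        with term   without term
        test set            a            b
        reference set       c            d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # genes in the test set
    col1 = a + c          # genes carrying the term in either set
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


# Hypothetical counts: 12 of 100 test genes versus 3 of 100 reference genes.
p_value = fisher_right_tail(12, 88, 3, 97)
print(f"P = {p_value:.4g}")
```

A term would be reported as overrepresented when this P value falls at or below the chosen cutoff (0.001 in Fig. 3), usually after considering that many terms are tested at once.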

Figure 3.

Differential GO term distribution by enrichment analysis of the horseweed and Arabidopsis genomes using the Fisher’s exact test. The entire transcriptome in horseweed was set as the test data set, and the entire transcriptome in Arabidopsis was set as the reference data set (P ≤ 0.001). [See online article for color version of this figure.]

Gene families that are commonly associated with nontarget herbicide resistance include cytochrome P450s, glutathione S-transferases (GSTs), glycosyltransferases (GTs), and ATP-binding cassette (ABC) transporters (Yuan et al., 2007). Relative to Arabidopsis, the horseweed genome has more members in each of these families (Table V). There were 323 unique cytochrome P450 genes at 401 loci in horseweed, which represents a 26% increase over Arabidopsis. Also, 155 ABC transporter genes at 213 loci were found in horseweed, which is 14% more than in Arabidopsis. Horseweed had 6% more GTs than Arabidopsis. There were 54 horseweed GSTs, which is one more gene than is found in Arabidopsis (Table V).

Table V. Analysis of gene families (cytochrome P450s, GSTs, ABC transporters, and GTs) that are potentially involved in nontarget glyphosate resistance in horseweed compared with those tallied for Arabidopsis.

Loci data refer to the total potential copy numbers in the gene family.

Gene Family | Arabidopsis | Horseweed
--- | --- | ---
Cytochrome P450s | 256 | 323 (401 Loci)
GSTs | 53 | 54 (54 Loci)
ABC transporters | 136 | 155 (213 Loci)
GTs | 361 | 381 (457 Loci)
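
The percentage increases quoted in the text can be recomputed from the unique gene counts in Table V. This is a plain arithmetic check; the dictionary and function names are illustrative only.

```python
# (Arabidopsis, horseweed) unique gene counts from Table V.
gene_counts = {
    "Cytochrome P450s": (256, 323),
    "GSTs": (53, 54),
    "ABC transporters": (136, 155),
    "GTs": (361, 381),
}


def percent_increase(reference, observed):
    """Percentage by which observed exceeds reference."""
    return 100.0 * (observed - reference) / reference


increase = {
    family: percent_increase(arabidopsis, horseweed)
    for family, (arabidopsis, horseweed) in gene_counts.items()
}
for family, pct in increase.items():
    print(f"{family}: {pct:.1f}% more genes in horseweed")
```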

Plastid and Mitochondrial Genomes

Plastid genome sequences from 454 GS-FLX reads were isolated by searching against known plastid genome databases and then subjected to de novo assembly. The assembled plastid genomes were further mapped with accurate Illumina reads to fix homopolymer errors. The entire chloroplast genome (approximately 153 kb; Supplemental Fig. S10) and most of the mitochondrial genome (approximately 450 kb in scaffolds) were obtained (Supplemental Table S4; Supplemental Data Sets S1 and S2). The horseweed chloroplast genome contained two inverted repeats (24,936 bp each), a large single-copy fragment (84,634 bp), and a small single-copy fragment (18,063 bp). The orientation of the two inverted repeats was determined by PCR. A total of 95 protein-coding genes (88 unique), 39 tRNA genes (28 unique), and eight ribosomal RNA genes (four unique) were annotated (Supplemental Data Set S1). These genes comprised 59.7% of the chloroplast genome. The remainder, the noncoding portion of the chloroplast genome, was composed of introns, intergenic spacers, and pseudogenes. The largest gene was the ycf2 gene, which was 6,085 bp. The smallest gene was a 34-bp tRNA gene. The G/C content in the chloroplast genome was 37.2%. The chloroplast genome was nearly identical to that of lettuce and sunflower, except in the orientation of the two inverted repeat elements (Supplemental Fig. S11). The lettuce chloroplast has reverse inverted repeat elements compared with horseweed and sunflower.

The horseweed mitochondrial genome reads from 454 GS-FLX, Illumina paired-end, and true-mate pairs were parsed and assembled into 123 scaffolds (N50 = 10,057 bp), with the help of plant mitochondrial databases. The scaffold sizes ranged from 315 to 43,498 bp, for a total of 453,334 bp in the mitochondrial genome (Supplemental Fig. S12; Supplemental Data Set S2), which was moderately sized compared with other plant mitochondrial genomes.

Genome Resequencing and Genomic Variation Analysis

We resequenced four population pairs that were either glyphosate resistant (R) or susceptible (S) from the United States. The representative populations used for resequencing were from California (CA-R versus CA-S), Delaware (DE-R versus DE-S), Indiana (IN-R versus IN-S), and Tennessee (TN-R versus TN-S). Individual seedlings surviving a glyphosate application were used as eventual DNA donors, and their DNA was pooled for sequencing (Table VI; Supplemental Fig. S13). Within two HiSeq 2000 flow cell lanes, a total of 55.6 Gb (170× coverage) of paired-end reads was produced, and each of the seven additional genomes was assembled using TN-R as the reference genome (Table VI). Interpopulation genome variation included single-nucleotide variation, multiple-nucleotide variation (defined as 4 bp or less, with equal numbers of nucleotides at each locus), insertions/deletions, and replacement (Table VII). When a minimum cutoff value (20 times or greater coverage at the variant loci, 20% or greater frequency) was applied to avoid calling errors, the frequency of detected variants in the genome varied among biotypes, ranging from 0.75 to 1.59 counts per kb (mean of 1.01; Table VII). Compared with the TN-R reference genome, CA-R had the fewest variants, followed by, in ascending order, IN-R, CA-S, IN-S, DE-R, TN-S, and DE-S. IN-R, IN-S, and TN-S had fewer variants relative to each other compared with the other biotypes. Compared with their partners (DE-S and CA-S, respectively), DE-R and CA-R had fewer variants. Similarly, IN-R had fewer variants than IN-S (Supplemental Table S5). The minimum average quality score for the position of each variant was 30 (one error in 1,000; Fig. 4A), which ensured that the results were quite reliable. 
To determine whether a variant was homozygous, we applied a threshold of 90% probability and found that IN-R and CA-R had the highest frequency of heterozygosity (nearly 80%), whereas only 50% of variants were heterozygous in the DE-R, DE-S, and TN-S biotypes (Fig. 4B). Most of the sequence variation among samples was composed of single-nucleotide variations/single-nucleotide polymorphisms (SNPs) in all eight biotypes (Fig. 4C). Fisher’s exact test showed that a total of 2,010 specific variants were found in all four glyphosate-resistant biotypes, which were distributed among 1,370 loci. Only 539 specific variants were found in all four glyphosate-susceptible biotypes, and these were located at 425 loci (Fig. 4D). Since the physical map of the horseweed genome was not available, the location of each specific variant in the sorted scaffold arrays instead of in the genome was listed (Supplemental Fig. S14). A genome-wide association study requires SNP information from the genomes of many control and test individuals. Specific SNPs in glyphosate-resistant and glyphosate-susceptible genomes were located in hotspots, which were defined as the ones with low P values (P ≤ 0.01), and might be associated with the phenotypes of glyphosate resistance (Supplemental Fig. S14). However, these findings are not definitive, given the low sample sizes of biotypes analyzed.
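
The filtering and zygosity rules described above (at least 20× coverage, at least 20% variant frequency, and 90% or greater frequency for a homozygous call) can be summarized in a few lines. The sketch also converts a Phred-scaled quality score into an error probability. Function and parameter names are hypothetical, and the actual variant caller used in the study is not named in this section.

```python
def classify_variant(depth, alt_reads, min_depth=20, min_freq=0.20, hom_freq=0.90):
    """Return 'homozygous', 'heterozygous', or None when the call is filtered out."""
    if depth <= 0 or depth < min_depth:
        return None  # insufficient coverage at the variant locus
    freq = alt_reads / depth
    if freq < min_freq:
        return None  # too rare to distinguish from sequencing error
    return "homozygous" if freq >= hom_freq else "heterozygous"


def phred_to_error(q):
    """Error probability implied by a Phred-scaled quality score q."""
    return 10 ** (-q / 10)


# Hypothetical read counts at three loci.
print(classify_variant(depth=40, alt_reads=38))
print(classify_variant(depth=40, alt_reads=20))
print(classify_variant(depth=12, alt_reads=12))
print(phred_to_error(30))
```

Because DNA was pooled from several plants per biotype, a "heterozygous" call here reflects an intermediate variant frequency in the pool rather than the genotype of a single individual.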

Table VI. Summary of genome resequencing of seven additional horseweed biotypes using the Illumina HiSeq 2000 platform.

R, Resistant; S, susceptible.

Population Identifier | Accession Location | Phenotype | Coverage
--- | --- | --- | ---
CA-S | Fresno, CA | S | 36×
CA-R | Fresno, CA | R | 19×
DE-S | Georgetown, DE | S | 24×
DE-R | Georgetown, DE | R | 28×
IN-S | Knox County, IN | S | 16×
IN-R | Knox County, IN | R | 16×
TN-S | Jackson, TN | S | 30×

Table VII. Summary of genomic variation among the sequenced horseweed biotypes.

For each biotype, genomic DNA was isolated from six individual plants. Variants having frequencies of 90% or greater were considered to be homozygous, whereas variants having frequencies between 20% and 90% were considered to be heterozygous. MNV, Multiple-nucleotide variation; SNV, single-nucleotide variation. *Value of average frequency.

Biotype | MNV | SNV | Deletion | Insertion | Replacement | Heterozygous | Homozygous | Total No. | Size (bp) | kb per Variant
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
CA-R | 7,069 | 188,525 | 35,475 | 3,664 | 621 | 184,596 | 50,758 | 235,354 | 247,452 | 1.32
CA-S | 13,253 | 279,016 | 10,329 | 5,407 | 1,035 | 235,416 | 73,624 | 309,040 | 329,025 | 0.99
DE-R | 11,215 | 352,016 | 33,265 | 10,712 | 1,210 | 228,343 | 180,075 | 408,418 | 432,327 | 0.76
DE-S | 13,024 | 419,703 | 41,670 | 11,843 | 1,255 | 234,044 | 253,451 | 487,495 | 516,307 | 0.63
IN-R | 9,890 | 273,165 | 16,479 | 6,049 | 832 | 248,044 | 58,371 | 306,415 | 323,209 | 1.01
IN-S | 10,394 | 298,866 | 20,132 | 7,360 | 965 | 270,989 | 66,728 | 337,717 | 356,007 | 0.92
TN-R | 6,650 | 174,915 | 48,417 | 3,646 | 617 | 155,282 | 78,963 | 234,245 | 246,577 | 1.33
TN-S | 9,725 | 330,880 | 58,920 | 10,432 | 1,043 | 205,585 | 205,335 | 410,920 | 433,873 | 0.75
Total | 81,220 | 2,317,086 | 264,687 | 59,113 | 7,578 | 1,762,299 | 967,305 | 2,729,604 | 2,884,777 | 0.96*
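
Table VII reports variant spacing as kb per variant, whereas the text quotes variant density in counts per kb; one is the reciprocal of the other. The sketch below makes that conversion and also derives the share of heterozygous variants per biotype from the table. Values are copied from Table VII; the variable and function names are illustrative.

```python
# Values copied from Table VII: (heterozygous, total variants, kb per variant).
table_vii = {
    "CA-R": (184_596, 235_354, 1.32),
    "CA-S": (235_416, 309_040, 0.99),
    "DE-R": (228_343, 408_418, 0.76),
    "DE-S": (234_044, 487_495, 0.63),
    "IN-R": (248_044, 306_415, 1.01),
    "IN-S": (270_989, 337_717, 0.92),
    "TN-R": (155_282, 234_245, 1.33),
    "TN-S": (205_585, 410_920, 0.75),
}


def variants_per_kb(kb_per_variant):
    """Variant density is the reciprocal of the spacing between variants."""
    return 1.0 / kb_per_variant


def het_percent(heterozygous, total):
    """Percentage of variants classified as heterozygous."""
    return 100.0 * heterozygous / total


density = {b: variants_per_kb(row[2]) for b, row in table_vii.items()}
het = {b: het_percent(row[0], row[1]) for b, row in table_vii.items()}

for biotype in table_vii:
    print(f"{biotype}: {density[biotype]:.2f} variants/kb, {het[biotype]:.0f}% heterozygous")
```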

Figure 4.

Summary of genome variants and distribution among biotypes. A, Total variant counts in different biotypes. B, Percentage of heterozygous and homozygous variants among biotypes. C, Distribution of variant types. Del, Deletion; Ins, insertion; MNV, multiple-nucleotide variation; Rep, replacement; SNV, single-nucleotide variation. D, Number of specific variants in glyphosate-resistant (GR) and glyphosate-susceptible (GS) populations. [See online article for color version of this figure.]

Phylogenetic analysis of whole-genome SNP variation provided a phylogeographic hypothesis regarding the evolution of glyphosate resistance and its spread (Fig. 5). Biotypes from proximate locations generally shared the same clades, such as the Delaware pairs and the Indiana pairs, which is consistent with our previous studies (Yuan et al., 2010; Okada et al., 2013). Although not in the same clade, the TN-R branch was proximate to TN-S. The position of CA-S on the phylogram was basal, which might be of biological importance with regard to gene flow in the species. The phylogenetic pattern suggests that glyphosate resistance in horseweed has evolved independently multiple times (Fig. 5).

Figure 5.

Phylogenetic tree of sequenced horseweed biotypes based on whole-genome single-nucleotide variation profiles using the shrunk-genomes method of the program PhyloSNP (Faison et al., 2014) with a position delta of zero. Bootstrap supports were all 100%, from 1,000 iterations. The tree was built by using the neighbor-joining method in Phylip 3.695. The scale bar indicates the number of genetic changes per unit of length.

5-Enolpyruvylshikimate-3-Phosphate Synthase Genes

In the horseweed genome, there are two 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene copies (EPSPS1 and EPSPS2); thus, no gene amplification was observed. Eleven EPSPS1 and 32 EPSPS2 variants were detected (Supplemental Fig. S15). However, no EPSPS mutation cosegregated with glyphosate resistance in our data. Furthermore, none of the observed EPSPS sequence variants have been previously reported to be associated with resistance. There also were no significant differences in EPSPS gene expression level among these eight biotypes in response to glyphosate treatment, based on an RNAseq study (Y. Peng, Y. Sang, S. Allen, and C.N. Stewart, unpublished data). Therefore, we conclude that glyphosate resistance in horseweed is conferred by a non-target-site mechanism.

DISCUSSION

To our knowledge, horseweed is the first economically important weed to have its genome sequenced and assembled. The de novo strategy using whole-genome shotgun sequencing in this study was similar to those used in several recent reports (Al-Dous et al., 2011; Wang et al., 2012; Varshney et al., 2013), except that we also included third-generation sequencing reads. With 92.3% genome coverage in 13,966 scaffolds, the assembly should be considered a draft genome comparable to those of rice, soybean, maize, chickpea, grape, poplar, and date palm (Goff et al., 2002; Yu et al., 2002; Tuskan et al., 2006; Jaillon et al., 2007; Schnable et al., 2009; Schmutz et al., 2010; Al-Dous et al., 2011; Varshney et al., 2013). Further improvements in assembly accuracy and backfilling will require the use of either long-insertion mate-pair reads or physical maps (International Rice Genome Sequencing Project, 2005; Wang et al., 2012; Al-Mssallem et al., 2013).

The draft horseweed genome has proven to be immediately useful. We have successfully cloned 10 target promoters of interest to study glyphosate resistance and response (data not shown). Moreover, 12 SSR loci have already been tested and successfully used in a recent horseweed population genetics study (Okada et al., 2013). Also, the genome sequences have been used to annotate the explosive composition B (hexahydro-1,3,5-trinitro-1,3,5-triazine and 2,4,6-trinitrotoluene) response genes in Baccharis halimifolia (Ali et al., 2014). Furthermore, we observed that the horseweed chloroplast genome is nearly identical to those of sunflower and lettuce (https://lgr.genomecenter.ucdavis.edu). Higher plants have various mitochondrial genome architectures with respect to size and construction compared with the chloroplast genome, which is more conserved (Tian et al., 2006). Brassica carinata has a small mitochondrial genome (232,241 bp), whereas Cucurbita pepo has a much larger mitochondrial genome (982,832 bp; Supplemental Table S4).

Next-generation sequencing technologies have been widely used in whole-genome sequencing and resequencing, which has led to the development of rapid genome-wide SNP detection applications in model and crop plant species for exploring within-species diversity, genotyping by sequencing, construction of haplotype maps, and performing genome-wide association studies of interesting traits (Craig et al., 2008; Huang et al., 2009, 2010; Atwell et al., 2010; Todesco et al., 2010; Wu et al., 2010; Kump et al., 2011; Deschamps et al., 2012). With the reported horseweed genome data, 51,892 SSR markers and over 2.7 million SNPs, as well as chloroplast DNA markers, were identified. Although glyphosate-susceptible and glyphosate-resistant plants had opposite phenotypic responses to glyphosate treatment, 99.93% of their genomes were identical. We found many SNPs in glyphosate-resistant or glyphosate-susceptible genomes that were located in hot spots, which were defined as those with low P values (P ≤ 0.01), and might be associated with glyphosate resistance. However, many SNPs will be silent at the protein sequence level and not of interest for further study in terms of biological functions. Other mutations involved in gene expression regulation (e.g. within promoter regions) are of potential interest for elucidating resistance and other traits. Interestingly, horseweed exhibited considerable morphological variability among accessions (Supplemental Fig. S13); thus, integrating genomic information with resistance phenomena could be useful in weed science not only in studying the rapid evolution of herbicide resistance but also other traits.

The evolution of glyphosate resistance is a critical process that, in most cases, has not been elucidated with regard to genomics and molecular mechanisms. For over a decade, we have known that glyphosate resistance in horseweed is a semidominant trait that is determined by a single simple Mendelian locus (Zelaya et al., 2004). The major focus of weed scientists to combat resistance has been to explore more herbicide control strategies while essentially ignoring the genomic and evolutionary bases of resistance. With few genomic resources for weeds and little expertise to utilize the available resources among the weed science community (Stewart, 2009), discovering the molecular mechanisms underlying nontarget resistance has been extremely difficult. However, now that the genomics era has found its way to weed science, we can begin to answer fundamental questions about what makes weeds so weedy and capable of adapting to control measures and, thereby, design approaches that reduce the further evolution of herbicide resistance in weeds (Basu et al., 2004; Stewart et al., 2009). The currently known mechanisms of glyphosate resistance in weedy plants include EPSPS target-site mutations (Baerson et al., 2002; Collavo and Sattin, 2012; Sammons and Gaines, 2014), EPSPS gene amplification (Gaines et al., 2010, 2013; Jugulam et al., 2014), and vacuolar sequestration of glyphosate (Ge et al., 2010, 2012, 2014); however, in the last case, the genes conferring resistance have not been identified.

Non-target-site resistance, especially via altered sequestration, is interesting both biologically and practically for weed management, but is not well characterized at the genomic or molecular level for most weeds (Yuan et al., 2007). This is the case with the many biotypes of glyphosate-resistant horseweed as well as several other glyphosate-resistant weedy plant species, such as johnsongrass (Sorghum halepense), ryegrass (Lolium spp.), velvet bean (Mucuna pruriens), and wild radish (Raphanus raphanistrum; Mueller et al., 2003; Feng et al., 2004; Main et al., 2004; Zelaya et al., 2004; Koger and Reddy, 2005; Owen and Zelaya 2005; Preston and Wakelin, 2008; Ge et al., 2010, 2012; Riar et al., 2011; Rojano-Delgado et al., 2012; Vila-Aiub et al., 2012; Ashworth et al., 2014; Sammons and Gaines, 2014). Physiologically, we know that non-target-site resistance can involve a plethora of metabolic, conversion, sequestration, and reduced translocation processes, including oxidation, conjugation, or compartmentation of the herbicide molecules (Yuan et al., 2007; Cummins et al., 2013; Iwakami et al., 2014). Horseweed represents a model weedy plant with a non-target-site glyphosate-resistant mechanism via altered translocation or transport (Feng et al., 2004; Stewart et al., 2009; Ge et al., 2010).

An important goal of effective weed management is to stop or slow the evolution and spread of herbicide resistance in weeds. Thus, it is vital to understand whether glyphosate resistance originated once and spread from a single source horseweed population or originated multiple independent times within distinct populations. If resistance originated once and spread, say, via seeds, then the molecular mechanisms would be expected to be the same or similar among populations because of identical descent. In that case, the prevention of seed dispersal and movement by machinery or other means would be an effective resistance management strategy. Alternatively, if resistance originated multiple times and spread from multiple sources, reduction in both seed dispersal and selection pressure will be needed. Management would also benefit from understanding evolutionary dynamics. In the case of horseweed, the evolution of glyphosate resistance occurred independently in multiple locations and might be caused by more than one nontarget resistance gene (Yuan et al., 2010; Okada et al., 2013; this study). Moreover, approximate Bayesian computation (ABC) analyses of microsatellite marker variation indicated that resistant populations underwent expansion after greatly increased glyphosate use in California in the 1990s, but many years before it was detected, strongly suggesting that diversity in weed-control practices prior to herbicide regulation probably kept resistance frequencies low (Okada et al., 2013). Data from this study also seem not to support the long-range rapid spread of resistance from a single source. At one time, the application of a combination of herbicides simultaneously or in sequence was considered to be an effective resistance management strategy. However, resistance to multiple herbicides is becoming increasingly common (Ashworth et al., 2014; Heap, 2014), and this approach is no longer consistently effective. 
Herbicides with new modes of action or more diversified control measures are desperately needed. Understanding the genomic mechanisms underlying nontarget resistance would allow the development of novel approaches, such as allelopathy and the expanded use of safeners.

For the first time, we have genomics data to begin to address the genetic basis of weediness. Do weeds have unique genes that endow weediness or merely variants of genes common to all plants? Most researchers have posited that the latter situation is most plausible (Basu et al., 2004; Stewart et al., 2009). Indeed, our finding lends credence to the gene family expansion hypothesis, given that nontarget resistance candidate gene families, ABC transporters, and cytochrome P450 genes, and to a lesser extent GST and GT genes, were overrepresented in horseweed relative to other published plant genomes. Weeds, at least some of them, might have an inordinate adeptness for rapid evolution that is manifested by herbicide resistance; gene family expansion and/or changes in gene expression might be why weeds are weeds (Yuan et al., 2007; Shaner, 2009; Ge et al., 2010; Peng et al., 2010; Gaines et al., 2014; Sammons and Gaines, 2014). Other genes of interest with enriched GO terms, such as vacuole transport, photosynthetic acclimation, and detoxification of nitrogenous compounds, were found (Fig. 3). Finally, horseweed seems to have relatively more protein-coding genes (44,592; also supported by EST data) in comparison with other sequenced plant genomes while maintaining the smallest genome of all economically important weeds (Sterck et al., 2007; Stewart et al., 2009; Paterson et al., 2010). However, de novo genome assembly (contigs or scaffolds) tends to underestimate repeat numbers and overestimate gene numbers, because repeats are compacted on each other and genes are somewhat fragmentary and get annotated as two unique genes in some cases (Chaisson and Pevzner, 2008; Baker, 2012; Seabury et al., 2013). Repeat numbers in the horseweed genome are lower than in most other sequenced plant genomes, in part because horseweed has a relatively compact genome. However, read-mapping analyses revealed that the read depth of repeated regions was much higher than for nonrepeats. 
This finding implies that repeat numbers are underestimated in the horseweed genome and that a de novo repeat-searching model will be needed for further analysis.

The small genome size of horseweed and other economically important agricultural weeds might have a practical and evolutionary significance. Twenty-three of the 25 most-studied weedy plant species have 1C genome sizes of less than 5,000 Mb, and 13 of these have genome sizes of less than 2,000 Mb (Stewart et al., 2009). Recently, a meta-analysis was performed in which invasiveness among plant species was negatively associated with genome size and positively associated with chromosome number (Pandit et al., 2014). This study analyzed 890 species among 62 genera. Even though it represents one datum, horseweed appears to be, potentially, an archetypical weed in that it has a very streamlined genome but with a large number of genes, which could give it the capacity for rapid evolution in changing environments. Thus, beyond its use for practical agricultural research, the horseweed genome might shed light on standing questions of invasion, evolution, and changing climates (Caplat et al., 2013). The bane of farmers could be the harbinger of genomic enlightenment.

MATERIALS AND METHODS

Plants and Phenotyping

Seeds sampled from horseweed (Conyza canadensis) populations were germinated and grown in potting medium (3B potting medium, 10-cm-diameter pot, one plant per pot; Supplemental Fig. S13) in a greenhouse under a 16-/8-h light/dark photoperiod at ambient temperatures (25°C ± 2°C). Twenty-four 3-month-old plants from each population were treated with glyphosate at the field rate (0.84 kg ha⁻¹; Roundup Weathermax; Monsanto). Treatment occurred at the rosette stage when plants were 6 to 8 cm in diameter, and glyphosate-treated plants were considered to be resistant if they were alive at the end of 3 weeks (Yuan et al., 2010).

DNA Isolation and Genome Sequencing Using Multiple Platforms

Genomic DNA was extracted from six individual plants of the glyphosate-resistant biotype that was collected originally from a soybean field in Jackson, TN (TN-R), using Plant DNeasy kits (Qiagen). Whole-genome shotgun sequencing was performed using the Illumina HiSeq 2000, 454 GS-FLX Titanium, and PacBio RS sequencing platforms. A total of three sequencing libraries with insert sizes of approximately 600 bp were constructed for the 454 GS-FLX Titanium system. Two paired-end sequencing libraries with insert sizes of approximately 350 bp and one mate-pair library with insert sizes of approximately 3,000 bp were constructed for the Illumina HiSeq 2000 system. One library with insert sizes of approximately 10,000 bp was constructed for the PacBio RS sequencing system.

De Novo Genome Assembly

The de novo genome assembly strategy is shown in Supplemental Figure S1. The 454 GS-FLX reads were assembled with Newbler. The Illumina paired-end sequence reads were divided by sequencing depth (approximately 50×) using error-corrected PacBio long reads as guidance (Koren et al., 2012), and subsets were assembled using SOAPdenovo and CLC Genomic Workbench. NGen was used to combine the primary assembled contigs. CLC Genomic Workbench was used to find the true mate pairs by mapping against the total contigs. NGen was used for the final scaffold construction and gap closure. All the parameter settings were described previously (Wang et al., 2012; Varshney et al., 2013). To check the completeness of the assembly, we mapped ESTs and transcriptome data to the genome assembly using BLASTN (with e value cutoff of 10⁻²⁰).
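As an illustration of how the scaffold statistic reported for this assembly (N50) is conventionally computed, the following is a minimal sketch under the common definition: the length of the scaffold at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly length. It is not the authors' script; the function name and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First scaffold at which the cumulative length reaches 50% of the total.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

In practice the lengths would be read from the scaffold FASTA file rather than typed in by hand.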

Assembly of Chloroplast and Mitochondrial Genomes

The sequencing reads were screened against custom plant chloroplast and mitochondrial genome databases. The chloroplast and mitochondrial read subsets were assembled separately. The complete chloroplast genome was annotated using DOGMA (Wyman et al., 2004). To remove nonmitochondrial sequences from the assembly, the mitochondrial contigs were further screened against plant mitochondrial genome databases using BLASTN with e value cutoff of 10⁻⁵.

Gene and Repeat Annotation

Gene prediction was performed using homology-based and transcript-based methods. Previous horseweed ESTs and transcriptome data were aligned to the genome assembly using BLAT (blat-34; identity ≥ 0.98, coverage ≥ 0.98) to generate spliced alignments (Kent, 2002), which were linked according to their overlap using PASA (Haas et al., 2003). Plant proteins of other species were also mapped to the genome using TBLASTN with e value cutoff of 10⁻⁵. Predicted transcripts were assigned biological information by searching the UniProt protein database using BLASTX with e value cutoff of 10⁻³. RepeatMasker version 4.0.5 (http://www.repeatmasker.org/) was used to search for putative transposable element regions of the horseweed genome against an updated set of RepeatMasker libraries (20140131). To develop SSR markers for further population genetic studies, the masked SSR loci were further analyzed using the Simple Sequence Repeat Identification Tool (http://archive.gramene.org/db/markers/ssrtool) with a threshold minimum of six repeats.
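The six-repeat threshold for SSR loci can be illustrated with a short sketch. This is not the Simple Sequence Repeat Identification Tool itself; it is a regular-expression scan for perfect tandem repeats, and the motif length range (2 to 6 bp), the exclusion of single-base motifs, and the function name are assumptions made for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=6, min_motif=2, max_motif=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    min_repeats=6 follows the threshold stated in the text; the motif
    length range and the skipping of single-base motifs are assumptions.
    """
    # Lazy motif match so the shortest repeating unit is tried first,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA": a homopolymer run, not counted here
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence: seven AT repeats and six AGC repeats.
example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 6 + "T"
print(find_ssrs(example))
```

A run of only five copies of a motif falls below the threshold and is not reported.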

GO Annotation and Enrichment Analysis

Results of BLASTX of horseweed transcripts were further mapped using the GO database to assign GO terms and carry out GO annotation using the Blast2Go program integrated in CLC Genomic Workbench (http://www.blast2go.com). Enrichment analysis was performed by using the entire set of available Arabidopsis (Arabidopsis thaliana) transcripts as a reference. The P value of Fisher’s exact test was set to P ≤ 0.001 to reduce noise and output the most specific terms.
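The enrichment test can be sketched from first principles. The example below computes a one-sided (over-representation) Fisher's exact test P value from a 2 × 2 table of gene counts and applies the P ≤ 0.001 cutoff stated above. Whether the Blast2Go analysis used a one-sided or two-sided test is not stated here, so the one-sided form is an assumption, and the function name and all counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test P value for over-representation.

    a: test-set genes annotated with the GO term
    b: test-set genes without the term
    c: reference-set genes with the term
    d: reference-set genes without the term
    """
    row1 = a + b          # size of the test set
    col1 = a + c          # all genes carrying the term
    total = a + b + c + d
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    numerator = sum(
        comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


# Hypothetical counts, for illustration only.
p_value = fisher_enrichment_p(30, 970, 50, 24950)
print(p_value, p_value <= 0.001)  # cutoff taken from the text
```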

Genome Resequencing and Variation Analysis

DNA samples of another seven horseweed biotypes were isolated using the method described above. Paired-end DNA libraries (approximately 350-bp insertion) were constructed for each biotype, bar coded, and subjected to Illumina sequencing using the HiSeq 2000 platform. After trimming and quality-control steps, the remaining data for each population were assembled into genomes with the TN-R assembly as reference using CLC Genomic Workbench 7.03. The probabilistic variant detection model was chosen to report the variants. A minimum cutoff value (20 times or greater coverage at the variant loci, 20% or greater frequency) was applied to avoid issues of sequencing and assembly errors in variant calling. Fisher’s exact test was carried out to detect specific variants in glyphosate-susceptible or glyphosate-resistant groups. The cutoff sample frequency was 90% or greater, which meant that only variants present in all test samples, but not in any reference samples, would be reported. To carry out phylogenetic analysis based on whole-genome single-nucleotide variation profiles among sequenced horseweed biotypes, the shrunk-genomes method of the program PhyloSNP (Faison et al., 2014) with a position delta of zero surrounding each SNP was chosen. The algorithm of a quantitative method was used, in which the created matrix of presence/absence and the position of each SNP from the shrunk-genome alignment can be used directly to generate a distance matrix among the biotypes. Finally, the tree was built by using the neighbor-joining method in Phylip 3.695.
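The filtering and comparison steps described above can be sketched as follows. This is not the CLC Genomic Workbench or PhyloSNP implementation: the coverage and frequency cutoffs come from the text, but the function names, the biotype labels, the variant identifiers, and the choice of a symmetric-difference count as the pairwise distance are assumptions made for illustration.

```python
def passes_cutoff(coverage, frequency, min_coverage=20, min_frequency=0.20):
    """Keep a variant call with >= 20x coverage and >= 20% frequency (cutoffs from the text)."""
    return coverage >= min_coverage and frequency >= min_frequency


def group_specific(variants_by_biotype, test_group, reference_group):
    """Variants present in every test biotype and absent from every reference biotype."""
    shared = set.intersection(*(variants_by_biotype[b] for b in test_group))
    in_reference = set().union(*(variants_by_biotype[b] for b in reference_group))
    return shared - in_reference


def snp_distance_matrix(variants_by_biotype, names):
    """Pairwise distance = number of SNPs present in one biotype but not the other.

    Assumption: a simple distance on presence/absence data, not PhyloSNP's own algorithm.
    """
    return [
        [len(variants_by_biotype[a] ^ variants_by_biotype[b]) for b in names]
        for a in names
    ]


# Hypothetical biotypes and variant identifiers, for illustration only.
variants = {
    "R1": {"v1", "v2", "v3"},
    "R2": {"v1", "v2"},
    "S1": {"v2", "v4"},
    "S2": {"v2"},
}
print(group_specific(variants, ["R1", "R2"], ["S1", "S2"]))
print(snp_distance_matrix(variants, ["R1", "R2", "S1", "S2"]))
```

A distance matrix of this kind is the usual input to a neighbor-joining program.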

Supplemental Data

The following materials are available in the online version of this article.

Acknowledgments

We thank Sara Allen, Laura Abercrombie, Qidong Jia, Guanglin Li, and Nolan Kane for assistance with material sampling, sequencing, and data collection. We also thank the providers of the original sources of germplasm, including Anil Shrestha, David Grantz, Vince Davis, Robert M. Hayes, and Barbara Scott.

Glossary

N50

50% of the assembly

SSR

simple sequence repeat

GO

Gene Ontology

GST

glutathione S-transferase

GT

glycosyltransferase

ABC

ATP-binding cassette

SNP

single-nucleotide polymorphism

Kmer

the possible subsequences of length k from reads obtained through DNA sequencing effort

Footnotes

1

This work was supported by the U.S. Department of Agriculture-National Institute of Food and Agriculture, Monsanto, the National Science Foundation (grant no. DBI–0820451), and the University of California, Davis, Genome Center.

[C]

Some figures in this article are displayed in color online but in black and white in the print edition.

[W]

The online version of this article contains Web-only data.

[OPEN]

Articles can be viewed online without a subscription.

References

  1. Al-Dous EK, George B, Al-Mahmoud ME, Al-Jaber MY, Wang H, Salameh YM, Al-Azwani EK, Chaluvadi S, Pontaroli AC, DeBarry J, et al. (2011) De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera). Nat Biotechnol 29: 521–527
  2. Ali A, Zinnert JC, Muthukumar B, Peng Y, Chung SM, Stewart CN Jr. (2014) Physiological and transcriptional responses of Baccharis halimifolia to the explosive “composition B” (RDX/TNT) in amended soil. Environ Sci Pollut Res Int 21: 8261–8270
  3. Al-Mssallem IS, Hu S, Zhang X, Lin Q, Liu W, Tan J, Yu X, Liu J, Pan L, Zhang T, et al. (2013) Genome sequence of the date palm Phoenix dactylifera L. Nat Commun 4: 2274
  4. Andersen MC. (1993) Diaspore morphology and seed dispersal in several wind-dispersed Asteraceae. Am J Bot 80: 487–492
  5. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815
  6. Ashworth MB, Walsh MJ, Flower KC, Powles SB. (2014) Identification of the first glyphosate-resistant wild radish (Raphanus raphanistrum L.) populations. Pest Manag Sci 70: 1432–1436
  7. Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, et al. (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465: 627–631
  8. Baerson SR, Rodriguez DJ, Tran M, Feng Y, Biest NA, Dill GM. (2002) Glyphosate-resistant goosegrass: identification of a mutation in the target enzyme 5-enolpyruvylshikimate-3-phosphate synthase. Plant Physiol 129: 1265–1275
  9. Baker M. (2012) De novo genome assembly: what every biologist should know. Nat Methods 9: 333–337
  10. Basu C, Halfhill MD, Mueller TC, Stewart CN Jr. (2004) Weed genomics: new tools to understand weed biology. Trends Plant Sci 9: 391–398
  11. Bhowmik PC, Bekech MM. (1993) Horseweed (Conyza canadensis) seed production, emergence, and distribution in no-tillage and conventional tillage corn (Zea mays). Agron Trends Agric Sci 1: 67–71
  12. Caplat P, Cheptou PO, Diez J, Guisan A, Larson BMH, MacDougall AS, Peltzer DA, Richardson DM, Shea K, van Kleunen M, et al. (2013) Movement, impacts and management of plant distributions in response to climate change: insights from invasions. Oikos 122: 1265–1274
  13. Chaisson MJ, Pevzner PA. (2008) Short read fragment assembly of bacterial genomes. Genome Res 18: 324–330
  14. Chao WS, Horvath DP, Anderson JV, Foley MP. (2005) Potential model weeds to study genomics, ecology, and physiology in the 21st century. Weed Sci 53: 929–937
  15. Collavo A, Sattin M. (2012) Resistance to glyphosate in Lolium rigidum selected in Italian perennial crops: bioevaluation, management and molecular bases of target-site resistance. Weed Res 52: 16–24
  16. Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ, Pawlowski TL, Laub T, Nunn G, Stephan DA, et al. (2008) Identification of genetic variants using bar-coded multiplexed sequencing. Nat Methods 5: 887–893
  17. Cummins I, Wortley DJ, Sabbadin F, He Z, Coxon CR, Straker HE, Sellars JD, Knight K, Edwards L, Hughes D, et al. (2013) Key role for a glutathione transferase in multiple-herbicide resistance in grass weeds. Proc Natl Acad Sci USA 110: 5812–5817
  18. Dauer JT, Mortensen DA, Humston R. (2006) Controlled experiments to predict horseweed (Conyza canadensis) dispersal distances. Weed Sci 54: 484–489
  19. Deschamps S, Llaca V, May GD. (2012) Genotyping-by-sequencing in plants. Biology (Basel) 1: 460–483
  20. DeVlaming V, Proctor VW. (1968) Dispersal of aquatic organisms: viability of seeds recovered from the droppings of captive killdeer and mallard ducks. Am J Bot 55: 20–26
  21. Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K, Simonyan V, Mazumder R. (2014) Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics 104: 1–7
  22. Feng PCC, Tran M, Chiu T, Sammons RD, Heck GR, Jacob CA. (2004) Investigations into glyphosate-resistant horseweed (Conyza canadensis): retention, uptake, translocation, and metabolism. Weed Sci 52: 498–505
  23. Gaines TA, Lorentz L, Figge A, Herrmann J, Maiwald F, Ott MC, Han H, Busi R, Yu Q, Powles SB, et al. (2014) RNA-Seq transcriptome analysis to identify genes involved in metabolism-based diclofop resistance in Lolium rigidum. Plant J 78: 865–876
  24. Gaines TA, Wright AA, Molin WT, Lorentz L, Riggins CW, Tranel PJ, Beffa R, Westra P, Powles SB. (2013) Identification of genetic elements associated with EPSPS gene amplification. PLoS ONE 8: e65819
  25. Gaines TA, Zhang W, Wang D, Bukun B, Chisholm ST, Shaner DL, Nissen SJ, Patzoldt WL, Tranel PJ, Culpepper AS, et al. (2010) Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proc Natl Acad Sci USA 107: 1029–1034
  26. Ge X, d’Avignon DA, Ackerman JJH, Collavo A, Sattin M, Ostrander EL, Hall EL, Sammons RD, Preston C. (2012) Vacuolar glyphosate-sequestration correlates with glyphosate resistance in ryegrass (Lolium spp.) from Australia, South America, and Europe: a 31P NMR investigation. J Agric Food Chem 60: 1243–1250
  27. Ge X, d’Avignon DA, Ackerman JJH, Sammons D. (2014) In vivo 31P-nuclear magnetic resonance studies of glyphosate uptake, vacuolar sequestration, and tonoplast pump activity in glyphosate-resistant horseweed. Plant Physiol 166: 1255–1268
  28. Ge X, d’Avignon DA, Ackerman JJH, Sammons RD. (2010) Rapid vacuolar sequestration: the horseweed glyphosate resistance mechanism. Pest Manag Sci 66: 345–348
  29. Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92–100
  30. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31: 5654–5666
  31. Halfhill MD, Good LL, Basu C, Burris J, Main CL, Mueller TC, Stewart CN Jr. (2007) Transformation and segregation of GFP fluorescence and glyphosate resistance in horseweed (Conyza canadensis) hybrids. Plant Cell Rep 26: 303–311
  32. Heap I (2014) The International Survey of Herbicide Resistant Weeds. www.weedscience.com (May 2014)
  33. Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, Guan J, Fan D, Weng Q, Huang T, et al. (2009) High-throughput genotyping by whole-genome resequencing. Genome Res 19: 1068–1076
  34. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, et al. (2010) Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42: 961–967
  35. International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793–800
  36. Iwakami S, Uchino A, Kataoka Y, Shibaike H, Watanabe H, Inamura T. (2014) Cytochrome P450 genes induced by bispyribac-sodium treatment in a multiple-herbicide-resistant biotype of Echinochloa phyllopogon. Pest Manag Sci 70: 549–558
  37. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–467
  38. Jugulam M, Niehues K, Godar AS, Koo DH, Danilova T, Friebe B, Sehgal S, Varanasi VK, Wiersma A, Westra P, et al. (2014) Tandem amplification of a chromosomal segment harboring EPSPS locus confers glyphosate resistance in Kochia scoparia. Plant Physiol 166: 1200–1207
  39. Kane NC, Gill N, King MG, Bowers JE, Berges H, Gouzy J, Rieseberg LH. (2011) Progress towards a reference genome for sunflower. Botany 89: 429–437
  40. Kent WJ. (2002) BLAT: the BLAST-like alignment tool. Genome Res 12: 656–664
  41. Koger CH, Reddy KN. (2005) Role of absorption and translocation in the mechanism of glyphosate resistance in horseweed (Conyza canadensis). Weed Sci 53: 84–89
  42. Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, et al. (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol 30: 693–700
  43. Kump KL, Bradbury PJ, Wisser RJ, Buckler ES, Belcher AR, Oropeza-Rosas MA, Zwonitzer JC, Kresovich S, McMullen MD, Ware D, et al. (2011) Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population. Nat Genet 43: 163–168
  44. Lai Z, Kane NC, Kozik A, Hodgins KA, Dlugosch KM, Barker MS, Matvienko M, Yu Q, Turner KG, Pearl SA, et al. (2012) Genomics of Compositae weeds: EST libraries, microarrays, and evidence of introgression. Am J Bot 99: 209–218
  45. Lee RM, Thimmapuram J, Thinglum KA, Gong G, Hernandez AG, Wright CL, Kim RW, Mikel M, Tranel PJ. (2009) Sampling the waterhemp (Amaranthus tuberculatus) genome using pyrosequencing technology. Weed Sci 57: 463–469
  46. Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, et al. (2010a) The sequence and de novo assembly of the giant panda genome. Nature 463: 311–317
  47. Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, et al. (2010b) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 20: 265–272
  48. Main CL, Mueller TC, Hayes RM, Wilkerson JB. (2004) Response of selected horseweed (Conyza canadensis (L.) Cronq.) populations to glyphosate. J Agric Food Chem 52: 879–883
  49. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380
  50. Mueller TC, Massey JH, Hayes RM, Main CL, Stewart CN Jr. (2003) Shikimate accumulates in both glyphosate-sensitive and glyphosate-resistant horseweed (Conyza canadensis L. Cronq.). J Agric Food Chem 51: 680–684
  51. Okada M, Hanson BD, Hembree KJ, Peng Y, Shrestha A, Stewart CN Jr, Wright SD, Jasieniuk M. (2013) Evolution and spread of glyphosate resistance in Conyza canadensis in California. Evol Appl 6: 761–777
  52. Owen MD, Zelaya IA. (2005) Herbicide-resistant crops and weed resistance to herbicides. Pest Manag Sci 61: 301–311
  53. Pandit MK, White SM, Pocock MJO. (2014) The contrasting effects of genome size, chromosome number and ploidy level on plant invasiveness: a global analysis. New Phytol 203: 697–703
  54. Paterson AH, Freeling M, Tang H, Wang X. (2010) Insights from the comparison of plant genome sequences. Annu Rev Plant Biol 61: 349–372
  55. Peng Y, Abercrombie LLG, Yuan JS, Riggins CW, Sammons RD, Tranel PJ, Stewart CN Jr. (2010) Characterization of the horseweed (Conyza canadensis) transcriptome using GS-FLX 454 pyrosequencing and its application for expression analysis of candidate non-target herbicide resistance genes. Pest Manag Sci 66: 1053–1062
  56. Pimentel D, Lach L, Zuniga R, Morrison D. (2000) Environmental and economic costs of nonindigenous species in the United States. Bioscience 50: 53–65
  57. Preston C, Wakelin AM. (2008) Resistance to glyphosate from altered herbicide translocation patterns. Pest Manag Sci 64: 372–376
  58. Riar DS, Norsworthy JK, Johnson DB, Scott RC, Bagavathiannan M. (2011) Glyphosate resistance in a johnsongrass (Sorghum halepense) biotype from Arkansas. Weed Sci 59: 299–304
  59. Riggins CW, Peng Y, Stewart CN Jr, Tranel PJ. (2010) Characterization of waterhemp transcriptome using 454 pyrosequencing and its application for studies of herbicide target-site genes. Pest Manag Sci 66: 1042–1052
  60. Rojano-Delgado AM, Cruz-Hipolito H, De Prado R, Luque de Castro MD, Franco AR. (2012) Limited uptake, translocation and enhanced metabolic degradation contribute to glyphosate tolerance in Mucuna pruriens var. utilis plants. Phytochemistry 73: 34–41
  61. Sammons RD, Gaines TA. (2014) Glyphosate resistance: state of knowledge. Pest Manag Sci 70: 1367–1377
  62. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al. (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178–183 [DOI] [PubMed] [Google Scholar]
  63. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, et al. (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115 [DOI] [PubMed] [Google Scholar]
  64. Seabury CM, Dowd SE, Seabury PM, Raudsepp T, Brightsmith DJ, Liboriussen P, Halley Y, Fisher CA, Owens E, Viswanathan G, et al. (2013) A multi-platform draft de novo genome assembly and comparative analysis for the scarlet macaw (Ara macao). PLoS ONE 8: e62415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Shaner DL. (2009) the role of translocation as a mechanism of resistance to glyphosate. Weed Sci 57: 118–123 [Google Scholar]
  66. Shields EJ, Dauer JT, VanGessel MJ, Neumann G. (2006) Horseweed (Conyza canadensis) seed collected in the planetary boundary layer. Weed Sci 54: 1063–1067 [Google Scholar]
  67. Sterck L, Rombauts S, Vandepoele K, Rouzé P, Van de Peer Y. (2007) How many genes are there in plants (...and why are they there)? Curr Opin Plant Biol 10: 199–203 [DOI] [PubMed] [Google Scholar]
  68. Stewart CN Jr, editor (2009) Weedy and Invasive Plant Genomics. Wiley-Blackwell, Ames, IA [Google Scholar]
  69. Stewart CN, Jr, Tranel PJ, Horvath DP, Anderson JV, Rieseberg LH, Westwood JH, Mallory-Smith CA, Zapiola ML, Dlugosch KM. (2009) Evolution of weediness and invasiveness: charting the course for weed genomics. Weed Sci 57: 451–462 [Google Scholar]
  70. Tian X, Zheng J, Hu S, Yu J. (2006) The rice mitochondrial genomes and their variations. Plant Physiol 140: 401–410 [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Todesco M, Balasubramanian S, Hu TT, Traw MB, Horton M, Epple P, Kuhns C, Sureshkumar S, Schwartz C, Lanz C, et al. (2010) Natural allelic variation underlying a major fitness trade-off in Arabidopsis thaliana. Nature 465: 632–636 [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Truco MJ, Ashrafi H, Kozik A, van Leeuwen H, Bowers J, Wo SRC, Michelmore RW. (2013) An ultra-high-density, transcript-based, genetic map of lettuce. G3 (Bethesda) 3: 617–631 [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–1604 [DOI] [PubMed] [Google Scholar]
  74. VanGessel MJ. (2001) Glyphosate-resistant horseweed from Delaware. Weed Sci 49: 703–705 [Google Scholar]
  75. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MT, Azam S, Fan G, Whaley AM, et al. (2012) Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30: 83–89 [DOI] [PubMed] [Google Scholar]
  76. Varshney RK, Song C, Saxena RK, Azam S, Yu S, Sharpe AG, Cannon S, Baek J, Rosen BD, Tar’an B, et al. (2013) Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat Biotechnol 31: 240–246 [DOI] [PubMed] [Google Scholar]
  77. Vila-Aiub MM, Balbi MC, Distéfano AJ, Fernández L, Hopp E, Yu Q, Powles SB. (2012) Glyphosate resistance in perennial Sorghum halepense (johnsongrass), endowed by reduced glyphosate translocation and leaf uptake. Pest Manag Sci 68: 430–436 [DOI] [PubMed] [Google Scholar]
  78. Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, et al. (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72: 461–473 [DOI] [PubMed] [Google Scholar]
  79. Warwick SI, Stewart CN Jr (2005) Crops come from wild plants: how domestication, transgenes, and linkage effects shape ferality. In J Gressel, ed, Crop Ferality and Volunteerism. CRC Press, Boca Raton, FL, pp 9–30 [Google Scholar]
  80. Weaver SE. (2001) The biology of Canadian weeds. 115. Conyza canadensis. Can J Plant Sci 81: 867–875 [Google Scholar]
  81. Weaver SE, McWilliams EL. (1980) The biology of Canadian weeds. 44. Amaranthus retroflexus L., A. powellii S. Wats. and A. hybridus L. Can J Plant Sci 60: 1215–1234 [Google Scholar]
  82. Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT. (2010) SNP discovery by high-throughput sequencing in soybean. BMC Genomics 11: 469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Wyman SK, Jansen RK, Boore JL. (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20: 3252–3255 [DOI] [PubMed] [Google Scholar]
  84. Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–92 [DOI] [PubMed] [Google Scholar]
  85. Yuan JS, Abercrombie LG, Cao Y, Halfhill MD, Zhou X, Peng Y, Hu J, Rao MR, Heck GR, Larosa TJ, et al. (2010) Functional genomics analysis of glyphosate resistance in Conyza canadensis (horseweed). Weed Sci 58: 109–117 [Google Scholar]
  86. Yuan JS, Tranel PJ, Stewart CN., Jr (2007) Non-target-site herbicide resistance: a family business. Trends Plant Sci 12: 6–13 [DOI] [PubMed] [Google Scholar]
  87. Zelaya IA, Owen MDK, Vangessel MJ. (2004) Inheritance of evolved glyphosate resistance in Conyza canadensis (L.) Cronq. Theor Appl Genet 110: 58–70 [DOI] [PubMed] [Google Scholar]
  88. Zhou X, Su Z, Sammons RD, Peng Y, Tranel PJ, Stewart CN, Jr, Yuan JS. (2009) Novel software package for cross-platform transcriptome analysis (CPTRA). BMC Bioinformatics (Suppl 11) 10: S16. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Plant Physiology are provided here courtesy of Oxford University Press