G3: Genes | Genomes | Genetics. 2022 Aug 17;12(12):jkac182. doi: 10.1093/g3journal/jkac182

Insights from the genomes of 4 diploid Camelina spp.

Sara L Martin 1, Beatriz Lujan Toro 2, Tracey James 3, Connie A Sauder 4, Martin Laforest 5
Editor: J J Emerson
PMCID: PMC9713399  PMID: 35976116

Abstract

Plant evolution has been a complex process involving hybridization and polyploidization, making the origin and evolution of a plant’s genome challenging to understand even once a published genome is available. The oilseed crop, Camelina sativa (Brassicaceae), has a fully sequenced allohexaploid genome with 3 unknown ancestors. To better understand which extant species best represent the ancestral genomes that contributed to C. sativa’s formation, we sequenced and assembled chromosome level draft genomes for 4 diploid members of Camelina: C. neglecta, C. hispida var. hispida, C. hispida var. grandiflora, and C. laxa using long and short read data scaffolded with proximity data. We then conducted phylogenetic analyses on regions of synteny and on genes described for Arabidopsis thaliana, from across each nuclear genome and the chloroplasts, to examine evolutionary relationships within Camelina and Camelineae. We conclude that C. neglecta is closely related to C. sativa’s sub-genome 1 and that C. hispida var. hispida and C. hispida var. grandiflora are most closely related to C. sativa’s sub-genome 3. Further, the abundance and density of transposable elements, specifically Helitrons, suggest that the progenitor genome that contributed C. sativa’s sub-genome 3 may be more similar to the genome of C. hispida var. hispida than that of C. hispida var. grandiflora. These diploid genomes show few structural differences when compared to C. sativa’s genome, indicating little change to chromosome structure following allopolyploidization. This work also indicates that C. neglecta and C. hispida are important resources for understanding the genetics of C. sativa and potential resources for crop improvement.

Keywords: allopolyploidy, chromosome evolution, genome evolution, phylogenomics, Camelina, Brassicaceae

Introduction

A key goal in evolutionary biology is understanding the evolution of genomes: how changes in structure and content affect the rate of evolution (Otto and Whitton 2000). Three linked processes shape genome structure: hybridization (Stebbins 1969; Rieseberg 1997; Soltis and Soltis 2009; Abbott et al. 2013), polyploidization (de Wet 1971; Levin 1983; Soltis and Soltis 1999; Husband et al. 2013), and chromosomal rearrangements (Rieseberg 2001; Rieseberg and Willis 2007). Crop genomes such as maize, canola, soybean, sugarcane, and wheat (Schmutz et al. 2010; Schnable et al. 2011; Chalhoub et al. 2014; The International Wheat Genome Sequencing Consortium 2014) have underscored the frequency of these processes. While less effort has focused on diploid wild crop relatives (Michael and VanBuren 2015), sequencing the extant representatives of crop progenitors can provide insight into the crop’s origins and genomic evolution (Marcussen et al. 2014; Latta et al. 2019).

The oilseed crop and allohexaploid Camelina sativa (L.) Crantz (2n = 40) has been sequenced and well described (Kagale et al. 2014). Its 3 parental genomes are thought to have diverged from each other recently, between 2.5 MYA (Žerdoner Čalasan et al. 2019) and 5.4 MYA (Kagale et al. 2014), with the dates of the hybridizations contributing to Camelina sativa’s formation estimated as 5,000–10,000 years ago (Kagale et al. 2014). The Camelina genus includes 4 taxa with known diploid chromosome counts: Camelina neglecta (2n = 12, 1C = 265 Mb) J. Brock, Mandáková, Lysak & Al-Shehbaz, Camelina laxa C. A. Mey. (2n = 12, 1C = 275 Mb), Camelina hispida Boiss. var. hispida (2n = 14, 1C = 355 Mb), and Camelina hispida var. grandiflora (Boiss.) Hedge (2n = 14, 1C = 315 Mb) (Al-Shehbaz 2012; Martin et al. 2017; Brock et al. 2019). These species could be modern representatives of genomes involved in C. sativa’s formation.

While determining the origin of a polyploid lineage is generally difficult (Kyriakidou et al. 2018; Rothfels 2021), our knowledge of the ancestral chromosome structure of the Camelineae provides an opportunity to investigate related diploid genomes. In the 2000s, researchers defined 24 large conserved collinear regions or blocks (Ancestral Crucifer Karyotype or ACK) among crucifer genomes (Schranz et al. 2006; Murat et al. 2015; Lysak et al. 2016) and reconstructed an ancestral karyotype with 8 chromosomes similar to Arabidopsis lyrata (L.) O’Kane & Al-Shehbaz (Koch and Kiefer 2005). These blocks were later grouped into 16 ancestral karyotype regions (ABK) corresponding to ancestral chromosome arms (Murat et al. 2015). Camelina sativa’s sub-genomes each show a conserved ACK structure, with 21 in-block breaks that, most likely, occurred in its progenitors (Lysak et al. 2016). Here we extend our understanding of the evolution of C. sativa by assembling draft genomes for C. neglecta, C. laxa, C. hispida var. hispida, and C. hispida var. grandiflora. Using these data and the ACK structure, we phase C. sativa’s sub-genomes and examine relationships within the Camelineae using a phylogenetic analysis before delving further into C. sativa’s potential paternal lineage using transposable element (TE) abundance.

Materials and methods

Plant material and nucleic acid isolation

Camelina neglecta, C. hispida var. hispida, C. hispida var. grandiflora, and C. laxa seeds were obtained from the North Central Regional Plant Introduction Station (NCRPIS) (PI650135, PI650139, PI650133, and PI633185, respectively). Seeds obtained from NCRPIS were stratified in petri dishes using filter paper that was moistened with 0.2% KNO3, sealed with Parafilm (Pechiney Plastic Packaging Company, Illinois, USA), and placed at 4°C in the dark for 2 weeks. For seed germination, the plates were then placed at room temperature under growth lights with a 16 h/8 h day/night light cycle. Seedlings were then sown on soil (soil, peat, and sand; 1:2:1; Promix, Rivière-du-Loup, Québec, Canada) with a photoperiod of 16 h 20°C days/8 h 18°C nights. For the largely self-incompatible species, C. hispida and C. laxa, rosette leaves were sampled using dry ice for DNA extractions and stored at −80°C. For the self-compatible C. neglecta, after 6 weeks the plants were placed for another 6 weeks at 4°C with an 8 h/16 h day/night photoperiod for vernalization. Plants were then transplanted to 5-inch pots with the same soil and allowed to self-pollinate and set seed in a 16-h/8-h day/night photoperiod with 20°C days and 18°C nights. After approximately 3 months, the mature seeds were collected, and this process was repeated to obtain a fifth-generation inbred line before sampling and storage. Vouchers for each of the accessions have been deposited in the DAO (Department of Agriculture Ottawa) herbarium (C. neglecta: DAO 902176; C. laxa: DAO 902754; C. hispida var. hispida: DAO 902780; and C. hispida var. grandiflora: DAO 902768).

For Pacific Biosciences long read (PacBio; Pacific Biosciences, Menlo Park, CA, USA) and Illumina short read (Illumina Inc., San Diego, California, USA) sequencing, total DNA was extracted using a FastDNA Spin Kit (MP Biomedicals, Solon, Ohio); grinding was done in the FastPrep (MP Biomedicals) at 4.0 for 20 s, with the addition of 1 ceramic bead. Two DNA extractions were pooled for a total volume of 200 μl and precipitated with the addition of 20 μl 3 M NaOAc and 200 μl 100% ethanol. Following overnight incubation at −20°C, the DNA was centrifuged at 13,000 rpm at 4°C for 30 min. The ethanol was decanted and the DNA pellet was washed with cold 70% ethanol, dried at 37°C for approximately 20 min, and resuspended in 100 μl 5 mM Tris–HCl (pH 8.5). The DNA concentration, as determined by Qubit, was 110 ng/μl. DNA quality was determined by running 1.0 μl on a 0.8% E-gel (Invitrogen by ThermoFisher) beside 0.2 μg of a 20 kb ladder (GeneRuler 1kbPlus, ThermoFisher).

DNA was also extracted to generate sequence data using Oxford Nanopore Technologies (ONT, Oxford Science Park, UK). High-molecular weight DNA extraction procedures were carried out as described in Workman et al. (2018). The Short Read Eliminator kit (Circulomics Inc.) was used as per the directions.

Sequencing

Sequencing was conducted in 5 different locations. McGill University’s Genome Quebec Innovation Centre completed PacBio sequencing using P6-C4 chemistry for C. laxa, C. hispida var. hispida, and C. neglecta. A total of 7 Single-Molecule Real-Time (SMRT) cells were used for sequencing C. neglecta and 8 cells each were used for C. laxa and C. hispida. They also generated paired-end (PE) Illumina data for C. laxa, C. hispida var. hispida, and C. hispida var. grandiflora with runs of 2 × 150 bases. The sequencing facility at the Microbial Molecular Technologies Laboratory (MMTL) in Ottawa (Ottawa Research and Development Centre, Agriculture and Agri-Food Canada) was used for PE sequencing using Illumina MiSeq v3 chemistry with runs of 2 × 300 bases with 500 bp inserts. The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Canada prepared and sequenced 4 additional libraries: 1 PE library with 550 bp inserts using a Nano kit (Illumina, San Diego, CA, USA) and 3 mate-pair (MP) libraries with 3, 5, and 10 kb inserts using the Nextera MP kit (Illumina). All 4 libraries were sequenced on an Illumina HiSeq-2000 using v4 chemistry and flow cells with runs of 2 × 126 bases. ONT sequencing data were obtained using a MinION with 1 FLO-MIN106 flow cell run for 48 h for each taxon in the Martin Laboratory. Base calling was completed with Guppy v. 3.2.2 + 9fe0a78 (Wick et al. 2019). Finally, library preparation for chromosome conformation capture (Hi-C) analyses used the Proximo Hi-C 2.0 Kit from Phase Genomics Inc. (Seattle, WA, USA), and the libraries were sequenced using an Illumina HiSeq 4000.

Quality control and de novo genome assembly

Raw PacBio and ONT reads were self-corrected and assembled using Canu v1.8 (Koren et al. 2016). The corrected error rate was set to 14.4% for both C. hispida var. grandiflora, where we only had ONT data, and C. hispida var. hispida. Draft assemblies were polished using Illumina data filtered and trimmed with Trimmomatic v 0.33 (Bolger et al. 2014) and up to 3 iterations of Pilon v 1.23 (Walker et al. 2014) using bowtie2 v 2.3.4.3 (Langmead and Salzberg 2012) and Burrows-Wheeler Aligner (bwa) (Li 2013). For C. hispida and C. laxa, the program Purge Haplotigs (Roach et al. 2018) was used to reduce redundancy resulting from heterozygosity.

The quality of the polished assemblies was evaluated with QUAST 5.0.2 (Gurevich et al. 2013). The completeness of each assembly’s gene space was evaluated using BUSCO v3.0.2 (Simão et al. 2015) with the embryophyta_odb10 database, and the degree to which the assemblies had been successfully collapsed was further assessed with HapPy (Guiglielmoni et al. 2021). Final heterozygosity and error rates were calculated with samtools v 1.9 (Li et al. 2009), and SNP detection and phasing were completed with Princess v0.01 (Mahmoud et al. 2021).
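The contiguity statistics reported later (N50 and NG50) follow standard definitions that can be illustrated in a few lines. The sketch below is not QUAST's implementation; the function names are hypothetical and the contig lengths are toy values. N50 uses half of the assembly length as the target, whereas NG50 uses half of the estimated genome size.

```python
def _length_at_target(lengths, target):
    """Return the contig length at which the running total first reaches target."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # the assembly is too short to reach the target


def n50(lengths):
    """N50: contigs of this length or longer cover half of the assembly."""
    return _length_at_target(lengths, sum(lengths) / 2)


def ng50(lengths, genome_size):
    """NG50: as N50, but relative to the estimated genome size."""
    return _length_at_target(lengths, genome_size / 2)


toy_contigs = [50, 40, 30, 20, 10]      # toy lengths, not from the assemblies
print(n50(toy_contigs))                 # half of 150 is reached at the 40 bp contig
print(ng50(toy_contigs, genome_size=200))
```

Because the assemblies here are shorter than the flow cytometry estimates, NG50 is the more conservative of the two values, which is consistent with NG50 being at or below N50 in Table 2.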

The chloroplast genome for each species was extracted from the draft assemblies by aligning contigs with C. sativa’s chloroplast genome using nucmer v 4.0 from MUMmer v 4.0. Chloroplast contigs were ordered and oriented based on C. sativa’s chloroplast and merged using a custom script written in R.

Genome scaffolding

Phase Genomics used their proprietary software, Proximo, to produce chromosome level scaffolds (Oddes et al. 2018) for C. neglecta, C. hispida var. hispida, and C. laxa. The scaffolding tool, ntJoin 1.0.3-0 (Coombe et al. 2020), was used to scaffold the genome of C. hispida var. grandiflora using the Hi-C scaffolded assembly of C. hispida var. hispida. A final round of polishing by Pilon was completed following scaffolding.

All genomes are available from GenBank as part of Bioproject PRJNA750147.

Phylogenetic analysis of C. sativa sub-genomes and Camelineae diploids

We determined the phylogenetic relationships among genomes for (1) diploid members of the Camelineae: Arabidopsis lyrata (Ensembl Genomes version 1.0), Capsella rubella (NCBI v. ANNY00000000.1), Neslia paniculata (L.) Desvaux (Wright S, personal communication), (2) the 3 sub-genomes of C. sativa (NCBI version JFZQ0000000.1), and (3) the 4 diploid Camelina species sequenced here. We used 3 methods to extract regions of the genomes for analysis. First we identified random fragments within homologous regions of the genomes, second we used a reciprocal best hit (RBH) method to identify shared genes from Arabidopsis thaliana (TAIR version 10), and third we used Orthofinder 2.3.11 (Emms and Kelly 2017, 2019).

For the first approach, shared homologous regions within each ACK block as described in the A. lyrata genome (Schranz et al. 2006) were isolated (Supplementary Table 1). Each ACK region was cut into 1,000 bp fragments, aligned to A. lyrata’s genome using bowtie2 (Langmead and Salzberg 2012), and filtered. Pairwise collinearity between this set of filtered fragments and each taxon was determined using nucmer from MUMmer (v4.0); a custom R script then determined which fragments overlapped among all genomes and were found in 3 copies within a consistent set of C. sativa’s chromosomes (Supplementary Table 2). Each set was aligned using msa 1.18.0 (Bodenhofer et al. 2015) and run in MrBayes v3.2.1 using a stepping stone model run for 6,000,000 generations to determine the appropriate model (Ronquist et al. 2012). We then ran the preferred model for 5,000,000 generations, checking convergence of each run with the R package rwty (Warren et al. 2017). We then calculated 7 metrics to exclude biased sequences or sources of misleading phylogenetic signal using TreeCmp 2.0 (Bogdanowicz et al. 2012) and TreSpEx 1.1 (Struck 2014). Specifically, following Nikolov et al. (2019), we calculated the number of matching splits and the Robinson-Foulds tree distances using TreeCmp, and the upper quartile and standard deviation of the long-branch scores, average patristic differences, and R2 of the saturation score and slope with TreSpEx. Fragments were excluded from further consideration if they failed the convergence checks or were outliers for one or more of the 7 phylogenetic metrics at the 99th percentile. For each set of trees belonging to ABKs located on the same chromosome or chromosome arm, we estimated species trees with ASTRAL-III (5.1.1) (Zhang et al. 2018).
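The exclusion rule at the end of this step can be sketched as follows. This is an illustration only, under two assumptions not stated in the text: that an outlier is a value above the 99th percentile (upper tail only), and that the percentile is taken by the nearest-rank method. The function names and metric labels are hypothetical; the original analysis used R.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (assumed method; the paper does not specify one)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def flag_outliers(metrics, pct=99.0):
    """Return fragment ids that exceed the percentile cutoff for any metric.

    metrics maps fragment id -> {metric name: value}; every fragment is
    expected to carry the same metric names.
    """
    names = sorted({name for row in metrics.values() for name in row})
    cutoffs = {
        name: percentile_nearest_rank([row[name] for row in metrics.values()], pct)
        for name in names
    }
    return {
        frag
        for frag, row in metrics.items()
        if any(row[name] > cutoffs[name] for name in names)
    }


# Toy data: 200 fragments with two hypothetical metrics.
toy = {f"f{i}": {"rf_distance": float(i), "lb_score_sd": 1.0} for i in range(1, 201)}
toy["f5"]["lb_score_sd"] = 9.0
print(sorted(flag_outliers(toy)))
```

A fragment is dropped as soon as a single metric flags it, which matches the "one or more of the 7 phylogenetic metrics" wording above.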

Our second method used the information phasing the sub-genome structure of C. sativa, as determined by the fragments above, to identify RBHs (Chen et al. 2017) for A. thaliana genes in each genome and sub-genome. Specifically, sequences for genes identified for A. thaliana in the TAIR 10 assembly (available from www.arabidopsis.org) were used to extract similar sequences for each genome. Following recommendations by Chen et al. (2017), BLAST 2.2.31 (Altschul et al. 1990) hits for A. thaliana genes were screened (e-value less than or equal to 0.0001, 70% or more of the query length, 70% or greater identity with the query, and a bit score ratio between the first and second BLAST hit of 1.2 or greater), extracted from each target genome, BLASTed back to A. thaliana’s genome, and filtered again with the additional criterion of aligning to the gene’s original location.
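The screening thresholds listed above translate directly into a filter over BLAST hits. The following is a minimal sketch, not the authors' script: the `Hit` record, function names, and toy values are hypothetical, and alignment length is used as a stand-in for the portion of the query covered, which is an approximation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str      # target sequence name
    identity: float   # percent identity
    aln_len: int      # alignment length (approximates query coverage)
    evalue: float
    bitscore: float


def best_forward_hit(hits, query_len, max_evalue=1e-4, min_cov=0.70,
                     min_identity=70.0, min_ratio=1.2):
    """Return the top hit if it passes all screening criteria, else None."""
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)
    if not ranked:
        return None
    best = ranked[0]
    if best.evalue > max_evalue:
        return None
    if best.aln_len < min_cov * query_len:
        return None
    if best.identity < min_identity:
        return None
    # Require a clear margin between the first and second hit.
    if len(ranked) > 1 and best.bitscore < min_ratio * ranked[1].bitscore:
        return None
    return best


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Reciprocal check: does the back-hit overlap the gene's original location?"""
    return a_start <= b_end and b_start <= a_end


toy_hits = [Hit("chrA", 85.0, 900, 1e-50, 1200.0), Hit("chrB", 80.0, 850, 1e-40, 900.0)]
print(best_forward_hit(toy_hits, query_len=1000))
```

The bit score ratio is what distinguishes a unique best match from one of several near-identical copies, which matters in a genome with three closely related sub-genomes.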

The phylogenetic analysis of these best hits was then completed as above using MrBayes and ASTRAL-III by ABK. In addition, we randomly selected 1 sequence from each ACK 25 times to produce sets of unlinked data and analyzed these using StarBEAST2 2.6.3 (Ogilvie et al. 2017; Suchard et al. 2018; Bouckaert et al. 2019) and PhyloNet 3.8.2 (Than et al. 2008; Wen et al. 2018; Cao et al. 2019), and used them to generate a consensus tree with ASTRAL-III. StarBEAST2 was run with the GTR site model for between 20 and 100 million generations, as required to produce effective sample size (ESS) values above 200, using a configuration file created by BEAUti 2.6.3 (Bouckaert et al. 2019). Convergence was examined with Tracer (1.7.1) (Rambaut et al. 2018), trees were summarized with TreeAnnotator 2.6.3 (Bouckaert et al. 2019), and trees were plotted with the densiTree function in the R package phangorn 2.5.5 (Schliep 2011). PhyloNet’s MCMC_Seq command was run with chain lengths of between 20 and 80 million, as required to produce ESS values above 200, and the maximum number of reticulations set to 4. Networks and trees produced by PhyloNet were visualized using the plotTree function from the package phytools 0.7.20 (Revell 2012).
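The construction of the unlinked data sets, 1 locus drawn at random from each ACK block and repeated 25 times, can be sketched as below. The function name, the fixed seed, and the toy locus labels are assumptions for illustration; the original sampling was done in R.

```python
import random


def sample_unlinked_sets(loci_by_block, n_sets=25, seed=0):
    """Draw one locus per ACK block, n_sets times, with a reproducible seed."""
    rng = random.Random(seed)
    blocks = sorted(loci_by_block)
    return [
        {block: rng.choice(loci_by_block[block]) for block in blocks}
        for _ in range(n_sets)
    ]


# Toy input: 24 blocks (A to X), each with 5 hypothetical loci.
toy_loci = {
    chr(ord("A") + i): [f"{chr(ord('A') + i)}_{j}" for j in range(5)]
    for i in range(24)
}
replicates = sample_unlinked_sets(toy_loci)
print(len(replicates), len(replicates[0]))
```

Each replicate then holds 24 loci, one per block, so that no two loci in a set come from the same conserved collinear region.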

Finally, AUGUSTUS 3.3.2 (Stanke et al. 2006) was run on the final version of the assembled genomes, reference genomes, and Camelina sativa’s phased sub-genomes to predict genes in silico using Arabidopsis. Amino acid sequences were then provided to OrthoFinder (Emms and Kelly 2019), which estimated the species tree.

Entire chloroplast genomes for Arabis alpina, Arabidopsis thaliana, A. lyrata, Capsella bursa-pastoris, C. grandiflora, C. rubella, and Camelina sativa were downloaded from the Chloroplast Genome Database (https://rocaplab.ocean.washington.edu/old_website/tools/cpbase [last accessed Sept 2021]) and aligned with the 4 Camelina chloroplasts using the msa function in R (see above). A consensus tree was then estimated with MrBayes as above. Annotation of the chloroplast genomes was completed with GeSeq 1.77 (Tillich et al. 2017) through CHLOROBOX website (https://chlorobox.mpimp-golm.mpg.de/index.html [last accessed Sept 2021]), using Camelina sativa, A. lyrata, and Capsella rubella’s chloroplasts as reference sequences. The chloroplast genomes with their annotations were then visualized with the website’s OrganellarGenomeDRAW (OGDRAW) 1.3.1 (Greiner et al. 2019).

Synteny between Camelina diploids and C. sativa and A. lyrata

To evaluate collinearity between the chromosome level draft genomes, we used nucmer with both C. sativa and A. lyrata as references for comparison and scripts written in R employing the package circlize 0.4.11 (Gu et al. 2014) to order and visualize the alignments.

Transposable element annotation

TEs were located using the Extensive de novo TE Annotator 1.8.3 (EDTA; Ou et al., 2019), EAHelitron 1.5.1 (Hu et al. 2019), LTR_FINDER/LTR_FINDER_parallel (Xu and Wang 2007; Ou and Jiang 2019), and LTRharvest (Ellinghaus et al. 2008). Data from these last 2 programs were processed with LTR_retriever (Ou and Jiang 2018), which was used to calculate the LTR Assembly Index (LAI) (Ou et al. 2018). We determined whether Helitrons detected in C. sativa’s sub-genome 3 were also detected in syntenic regions of C. hispida var. hispida and C. hispida var. grandiflora using a script written in R and output from EAHelitron and nucmer. Specifically, we examined all regions of synteny (determined by nucmer), determined if a Helitron was detected in the region (by EAHelitron using the presence of the sequence CTAG and a GC rich hairpin), and examined sequence similarity for 1,000 bp upstream of the 3′ end. We considered any pairs with at least 80% identity to represent the same Helitron insertion event.
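The final comparison, similarity over the 1,000 bp upstream of the 3′ end with an 80% identity threshold, can be illustrated as follows. This is a sketch under stated assumptions: the two sequences are already aligned to equal length with "-" for gaps, the 3′ coordinate is a 0-based exclusive end on the forward strand, and identity is counted over alignment columns. The function names are hypothetical and the original script was written in R.

```python
def upstream_window(sequence, three_prime_end, width=1000):
    """Slice up to width bases ending at the (exclusive) 3' end coordinate."""
    start = max(0, three_prime_end - width)
    return sequence[start:three_prime_end]


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical, non-gap bases."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")   # ignore columns that are gaps in both
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def same_insertion(aligned_a, aligned_b, threshold=80.0):
    """Treat a pair as one Helitron insertion event at or above the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
print(same_insertion("ACGTACGTAC", "ACGTACTTTT"))
```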

Additional software

The version of R used was 3.6.3 (2020-02-29), “Holding the Windsock.” Sequence handling, tree plotting, and graphical display were facilitated by numerous R packages in addition to those mentioned above, including: ape 5.0 (Paradis and Schliep 2019), apex 1.0.4 (Jombart et al. 2017), Biostrings 2.56.0 (Pagès et al. 2020), pals 1.7 (Wright 2021), pBrackets 1.0.1 (Schulz 2021), plotrix 3.7.8 (Lemon 2006), plyr 1.8.6 (Wickham 2011), ips 0.0.11 (Heibl 2008), IRanges 2.22.2 (Lawrence et al. 2013), outliers 0.14 (Komsta 2011), RIdeogram 0.2.2 (Hao et al. 2020), Rsamtools 2.4.0 (Morgan et al. 2020), seqinr 3.6.1 (Bastolla et al. 2007), stringr 1.4.0 (Wickham 2019), treeio 1.10.0 (Wang et al. 2020), and VennDiagram 1.7.1 (Chen 2021).

The program FigTree 1.4.4 (Rambaut 2018) was used to convert trees including their support values to a format easily readable in R.

Unless otherwise specified, all tools were run using default settings.

Results

Genome assemblies

Sequencing coverage of the genome differed for each species, with C. neglecta receiving the majority of our sequencing efforts (286× raw coverage) (Table 1). Following assembly with Canu and polishing with Pilon, C. neglecta had the most contiguous assembly with 204 contigs and an NG50 of 11,493,634 (Table 2). Scaffolding using Hi-C data and Phase Genomics’ Proximo resulted in chromosome level assemblies for C. neglecta (n = 6), C. hispida var. hispida (n = 7), and C. laxa (n = 6) with approximately 70% or more of their expected lengths and NG50s of 29,279,412, 39,460,631, and 31,147,072, respectively. Following scaffolding by ntJoin, over 70% of the expected genome length for C. hispida var. grandiflora was incorporated into a chromosome level assembly (n = 7). The completeness of gene space, as evaluated by the proportion of conserved genes recovered in the Benchmarking Universal Single Copy Orthologs (BUSCO) analysis, indicated that all assemblies had at least 90% of the expected genes (Table 2). The small percentage of duplicated BUSCO genes indicated that the assemblies have largely been collapsed. This was confirmed by haploidy scores from HapPy of over 0.9. The genomes all rank as gold quality based on the high level of intact long terminal repeat (LTR) elements detected in the genome (LAI > 20) (Table 2).

Table 1.

Genome sizes (1C) as estimated from flow cytometry and the fold coverage (x) for the 4 genomes sequenced here provided by long read data (PacBio and ONT) and short read data including Illumina PE reads and Illumina MPs with upper limits of estimates of heterozygosity (H) and error rates (ER) calculated by GenomeScope from the Illumina data.

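| Species | Genome size (Mb) | PacBio (×) | ONT (×) | Illumina PE (×) | Illumina MP (×) | Raw coverage (×) | GenomeScope H | GenomeScope ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Camelina neglecta | 265 | 38 | 60 | 92 | 96 | 286 | 0.17% | 0.08% |
| Camelina hispida var. hispida | 355 | 38 | 53 | 52 | - | 143 | 1.09% | 0.15% |
| Camelina hispida var. grandiflora | 315 | - | 32 | 64 | - | 96 | 2.14% | 0.15% |
| Camelina laxa | 275 | 57 | 74 | 47 | - | 177 | 1.36% | 0.24% |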

Table 2.

Statistics on genome assemblies for C. neglecta, C. hispida var. hispida, C. hispida var. grandiflora, and C. laxa generated by QUAST and BUSCO based on the embryophyta_odb10 database before and after scaffolding.

| Metric | C. neglecta (Canu) | C. neglecta (Hi-C) | C. hispida var. hispida (Canu and Purge Haplotigs) | C. hispida var. hispida (Hi-C) | C. hispida var. grandiflora (Canu and Purge Haplotigs) | C. hispida var. grandiflora (ntJoin) | C. laxa (Canu and Purge Haplotigs) | C. laxa (Hi-C) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Contigs (≥0 bp) | 204 | 6 | 747 | 7 | 1,072 | 7 | 900 | 6 |
| Contigs (≥10,000 bp) | 157 | 6 | 703 | 7 | 896 | 7 | 785 | 6 |
| Contigs (≥50,000 bp) | 139 | 6 | 552 | 7 | 748 | 7 | 530 | 6 |
| Total length (≥0 bp) | 210,252,877 | 194,776,448 | 305,512,316 | 283,171,403 | 247,512,955 | 247,668,533 | 223,800,267 | 199,991,310 |
| Total length (≥10,000 bp) | 210,014,359 | 194,776,448 | 305,328,622 | 283,171,403 | 247,070,138 | 247,668,533 | 223,546,568 | 199,991,310 |
| Total length (≥50,000 bp) | 209,462,969 | 194,776,448 | 301,174,771 | 283,171,403 | 242,642,354 | 247,668,533 | 217,296,729 | 199,991,310 |
| Largest contig | 19,486,679 | 48,079,882 | 6,147,326 | 45,593,769 | 3,518,166 | 41,100,942 | 4,931,297 | 39,604,730 |
| Total length | 210,252,877 | 194,726,046 | 305,512,316 | 283,171,403 | 247,512,955 | 247,668,533 | 223,800,267 | 199,991,310 |
| Estimated genome size | 265,000,000 | 265,000,000 | 360,000,000 | 360,000,000 | 320,000,000 | 320,000,000 | 275,000,000 | 275,000,000 |
| Percent of expected genome size | 79% | 74% | 85% | 79% | 77% | 77% | 81% | 73% |
| N50 | 13,701,387 | 30,499,177 | 1,053,466 | 41,586,011 | 477,640 | 37,976,760 | 651,797 | 31,824,321 |
| NG50 | 11,493,634 | 29,279,412 | 843,316 | 39,460,631 | 346,627 | 30,680,265 | 477,199 | 31,147,072 |
| BUSCO genes searched | 2,121 | 2,121 | 2,121 | 2,121 | 2,121 | 2,121 | 2,121 | 2,121 |
| Complete (%) | 2,082 (98) | 2,084 (98) | 2,055 (97) | 2,025 (96) | 2,063 (97) | 1,901 (90) | 2,079 (98) | 2,032 (96) |
| Complete single copy (%) | 2,051 (97) | 2,055 (97) | 1,938 (91) | 1,971 (93) | 1,863 (88) | 1,853 (88) | 1,939 (91) | 1,978 (93) |
| Complete duplicated (%) | 31 (1) | 29 (1) | 117 (6) | 54 (3) | 200 (9) | 48 (2) | 140 (7) | 54 (3) |
| Fragmented (%) | 14 (<1) | 14 (<1) | 28 (1) | 43 (2) | 18 (<1) | 65 (3) | 14 (<1) | 18 (<1) |
| Missing (%) | 25 (1) | 23 (1) | 38 (2) | 53 (3) | 40 (2) | 155 (7) | 28 (1) | 71 (3) |

Metrics reported once per species:

| Metric | C. neglecta | C. hispida var. hispida | C. hispida var. grandiflora | C. laxa |
| --- | --- | --- | --- | --- |
| N's per 100 kb | 1 | 16 | 32,233 | 20 |
| Haploidy score | 0.99 | 0.97 | 0.96 | 0.94 |
| LAI | 21.1 | 25.7 | 27.6 | 22.0 |
| Error rate | 0.004 | 0.007 | 0.02 | 0.007 |
| SNPs | 420,396 | 1,564,682 | 1,813,558 | 1,488,895 |
| Heterozygosity | 0.002 | 0.006 | 0.007 | 0.008 |

The initial assembly values for C. hispida and C. laxa are for assemblies following assembly by Canu and reduction, by Purge Haplotigs, in the number of alternative contigs resulting from the heterozygosity of these genomes. All assemblies were polished with Pilon. Additional metrics are (1) the haploidy score (HapPy) with 1 indicating perfect collapse, (2) the LAI (LTR_retriever) with scores >20 indicating gold quality, (3) final error rate (samtools), (4) number of SNPs (PRINCESS), and (5) final heterozygosity (PRINCESS).

Phylogenetic relationships among C. sativa and Camelineae diploids

In total, 1,444 fragments within the ACK blocks described for the A. lyrata genome were found to be shared among the 8 core Camelineae taxa with 3 copies distributed across C. sativa’s genome. Fragments from each ancestral chromosome showed the same pattern of localization in C. sativa’s genome in the majority of cases (Table 3). For example, ACK blocks A, B, and C from ancestral chromosome 1 all showed the highest number of hits on C. sativa’s chromosomes 3, 14, and 17. Where differences occurred, there was consistency corresponding to ancestral chromosome arms and, therefore, ABK group. The pattern of hits across the genomes resulted in 11 groupings, corresponding to either ancestral chromosome arms or entire ancestral chromosomes, that were used in further phylogenetic analyses (Table 4).
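The ranking of C. sativa chromosomes by number of fragment hits, as summarized per ACK block in Table 3, amounts to a simple tally. The sketch below uses a hypothetical function name and toy hit counts rather than the actual bowtie2 output.

```python
from collections import Counter


def top_chromosomes(hit_chromosomes, n=3):
    """Return the n chromosomes with the most aligned fragments, most hits first."""
    return [chrom for chrom, _ in Counter(hit_chromosomes).most_common(n)]


# Toy example: one chromosome label per aligned fragment of a single ACK block.
toy_hits = [3] * 10 + [14] * 8 + [17] * 5 + [9]
print(top_chromosomes(toy_hits))
```

With three sub-genomes, a well-conserved block is expected to yield three dominant chromosomes, and stray hits elsewhere (chromosome 9 in the toy data) fall outside the top 3.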

Table 3.

Distribution of the top 3 chromosomes with successful alignments of fragments from each of Arabidopsis lyrata's ACKs across the genome of Camelina sativa as determined by bowtie2 ordered by number of hits.

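| ACK | ABK | Camelina sativa top chromosome | Camelina sativa 2nd chromosome | Camelina sativa 3rd chromosome |
| --- | --- | --- | --- | --- |
| A | AK2 | 3 | 14 | 17 |
| B | AK2 | 14 | 3 | 17 |
| C | AK1 | 3 | 14 | 17 |
| D | AK3 | 7 | 16 | 9 |
| E | AK4 | 7 | 16 | 9 |
| F | AK6 | 1 | 15 | 19 |
| G | AK6 | 1 | 15 | 19 |
| H | AK5 | 15 | 1 | 19 |
| I | AK7 | 7 | 16 | 9 |
| J | AK8 | 5 | 6 | 4 |
| K | AK9 | 4 | 6 | 9 |
| L | AK9 | 4 | 6 | 9 |
| M | AK10 | 9 | 6 | 4 |
| N | AK10 | 4 | 6 | 9 |
| O | AK11 | 2 | 13 | 8 |
| P | AK11 | 13 | 2 | 8 |
| Q | AK12 | 8 | 13 | 20 |
| R | AK12 | 8 | 13 | 20 |
| S | AK13 | 10 | 11 | 12 |
| T | AK14 | 10 | 11 | 12 |
| U | AK14 | 11 | 12 | 10 |
| V | AK15 | 11 | 20 | 18 |
| W | AK16 | 18 | 11 | 2 |
| X | AK16 | 11 | 2 | 18 |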

Table 4.

Summary of ACK and ABK groups across Camelina sativa's chromosomes based on bowtie hits showing preservation of ancestral genome structure across ABK regions, but movement of some regions across chromosomes.

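| ACK | ABK | Sub-genome 1 | Sub-genome 2 | Sub-genome 3 | Fragments in group | Normalized quartet score |
| --- | --- | --- | --- | --- | --- | --- |
| ABC | AK1/2 | 14 | 3 | 17 | 262 | 0.78 |
| DE | AK3/4 | 7 | 16 | 9 | 98 | 0.80 |
| FGH | AK5/6 | 19 | 1 | 15 | 269 | 0.79 |
| I | AK7 | 7 | 16 | 9 | 28 | 0.82 |
| J | AK8 | 4 | 6 | 5 | 104 | 0.77 |
| KLMN | AK9/10 | 4 | 6 | 9 | 115 | 0.79 |
| OP | AK11 | 8 | 13 | 2 | 37 | 0.80 |
| QR | AK12 | 8 | 13 | 20 | 180 | 0.79 |
| STU | AK13/14 | 11 | 10 | 12 | 202 | 0.78 |
| V | AK15 | 11 | 18 | 20 | 19 | 0.80 |
| WX | AK16 | 11 | 18 | 2 | 130 | 0.78 |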

Chromosomes were assigned to sub-genomes based on summary trees that ASTRAL-III constructed from the MrBayes trees of the ACK fragments, grouped by ABK. The normalized quartet score produced by ASTRAL-III, which indicates the similarity of trees from individual fragments (out of one), is presented. Sub-genome 1 was defined as the genome with the closest relationship to C. neglecta, while sub-genome 3 was defined as that with the closest relationship to the C. hispida varieties.

The species trees produced by coalescent modeling by ASTRAL-III from gene trees produced from all fragments within the 11 groups indicated the phylogenetic relationship between C. sativa’s chromosomes and the other taxa (Tables 4 and 5 and Fig. 1). Trees were largely congruent as indicated by the high normalized quartet scores (Table 4).

Table 5.

Sub-genome structure of C. sativa based on phylogenetic relationships of ABK regions and diploid genomes determined here in comparison to the original structure proposed by Kagale et al. (2014).

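| Sub-genome | This article | Kagale et al. (2014) |
| --- | --- | --- |
| Sub-genome 1 | Cs 14 | Cs 17 |
| | Cs 07 | Cs 16 |
| | Cs 19 | Cs 15 |
| | Cs 04 | Cs 04 |
| | Cs 08 | Cs 13 |
| | Cs 11 | Cs 11 |
| Sub-genome 2 | Cs 03 | Cs 14 |
| | Cs 16 | Cs 07 |
| | Cs 01 | Cs 19 |
| | Cs 06 | Cs 06 |
| | Cs 13 | Cs 08 |
| | Cs 10 | Cs 10 |
| | Cs 18 | Cs 18 |
| Sub-genome 3 | Cs 17 | Cs 03 |
| | Cs 15 | Cs 05 |
| | Cs 09 | Cs 01 |
| | Cs 05 | Cs 09 |
| | Cs 02 | Cs 20 |
| | Cs 20 | Cs 02 |
| | Cs 12 | Cs 12 |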

Fig 1.

Species trees. These species trees were produced by ASTRAL-III using a multispecies coalescent model and trees produced by MrBayes for each fragment identified by bowtie2’s alignment of sequence fragments from A. lyrata. These are grouped within 11 ABK groups that correspond to either whole ancestral chromosomes or chromosome arms that have been largely conserved. The ACK(s) included in each tree are indicated in the box to the left of each tree, and the ancestral chromosome structure with division into ACK and ABK is shown in the lower right.

The consensus species trees estimated by StarBEAST2 using the RBH sequences also most consistently recovered this topology (Figs. 2 and 3), as did the final species tree predicted by OrthoFinder based on 14,513 gene trees (Fig. 4). Further, all methods indicated Arabidopsis is the basal genus in this group, while Neslia is the closest genus to Camelina among those studied.

Fig. 2.

Density tree plots. Density tree plots of 1,000 randomly chosen trees estimated by StarBEAST2 for a set of a) fragment sequences, b) genes, and c) 2 sets of reciprocal gene sequences.

Fig. 3.

Consensus networks. The consensus networks generated with PhyloNet using 1 set of reciprocal gene sequences (a) and the 3 most credible networks contributing to this consensus (b–d) with the percentage of credible networks represented.

Fig 4.

Trees from OrthoFinder. Phylogenetic trees generated by OrthoFinder based on in silico predictions of amino acids produced by AUGUSTUS. a) A consensus of 14,513 gene trees with an inferred root and b) a concatenated multiple sequence alignment of 7,401 shared, single-copy genes. Node labels indicate support values for each bipartition with the higher levels of support for the concatenated data expected for the type of analysis.

The primary uncertainty in tree topology was the position of C. laxa: either basal to all the Camelina spp. or in a clade with C. sativa’s sub-genome 3 and C. hispida. Specifically, of the randomly chosen sets of sequences from across the genome, ASTRAL-III consensus trees placed C. laxa basal to the Camelina group just over half the time (14 of 25 for the fragments and 15 of 25 for the RBH genes). The consensus trees produced by StarBEAST2 showed greater disparity, with 18 of 25 suggesting C. laxa is basal for the fragments but only 6 of 25 for the RBH genes. A thin majority, 14 of 25, of the RBH sets placed C. laxa in a clade with C. sativa’s sub-genome 3. This topology, with C. laxa in the sub-genome 3 clade, was also produced by OrthoFinder (Fig. 4). Density tree plots of trees generated by StarBEAST2 indicated mixed signals for C. laxa’s position for sequence fragments (Fig. 2a) and for genes (Fig. 2b). These plots also indicated that some trees differed in the timing of divergence in genes among C. laxa, C. sativa’s sub-genome 3, and the varieties of C. hispida (Fig. 2c). Both ASTRAL-III and StarBEAST2 account for incomplete lineage sorting; however, given the propensity of Brassicaceae taxa to hybridize, reticulate evolution between these taxa was investigated with PhyloNet. For many of the sets of 24 unlinked fragments or genes, PhyloNet indicated that trees without reticulations were the most credible, both for the fragments (10 of 25) and for the RBH genes (17 of 25). With other sets, 1 reticulation was included in the credible networks, but its placement was inconsistent (Fig. 3).

Results for chloroplasts

The chloroplast (cp) assemblies produced for the 4 diploid Camelina species sequenced here, based on the scaffolding of contigs using C. sativa’s cp, resulted in contigs between 152,239 bp (C. hispida var. grandiflora) and 153,366 bp (C. neglecta) in comparison to the 153,044 bp for C. sativa’s cp assembly. Annotation by GeSeq recovered expected features such as the large inverted repeat, and ARAGORN 1.2.38, invoked by GeSeq, found the same 39 genes in all 4 of the Camelina cp assemblies produced here and in that of C. sativa. The cp assemblies varied in the degree of fragmentation of the genes within the assembly, with C. hispida var. grandiflora showing the greatest number of fragmented features (89) and C. neglecta the fewest (20) (Supplementary Fig. 1). The resulting tree from entire chloroplast sequences of all 11 taxa included in the analysis showed that the chloroplast sequence from C. sativa was most closely related to that of C. neglecta and that these taxa were sister to a clade containing the varieties of C. hispida, with C. laxa basal. As with the analysis of nuclear genes, the chloroplast indicates Capsella as more closely related to Camelina than Arabidopsis (Fig. 5).

Fig. 5.

Chloroplast-based phylogeny. Phylogeny constructed from whole chloroplast alignment using MrBayes. Node labels indicate posterior probabilities.

Synteny between Camelina diploids and A. lyrata and C. sativa

The 4 diploid species sequenced here showed extensive synteny with both A. lyrata and C. sativa genomes with conservation of the ACK and ABK blocks (Fig. 6). The analysis indicates a high level of synteny between A. lyrata and C. neglecta, and to a greater degree, between C. neglecta and C. sativa sub-genome 1 (Fig. 6). Similarly, there is strong synteny between A. lyrata and C. hispida and, to a greater extent, between C. hispida and C. sativa sub-genome 3 (Fig. 6). In contrast, the genome of C. laxa has more extensive rearrangements and breaks within ACK and ABK blocks (Fig. 7), including fragmentation of the E block and separation of parts of the F, J, U, and W blocks onto separate chromosomes. Further, chromosome number reduction in this species involved portions of several of the ancestral chromosomes being incorporated into several different chromosomes; for example, the upper arm of chromosome 6, AK06 (FG), has been split between chromosomes 3 and 5.

Fig. 6.

Synteny plots with C. neglecta and C. sativa’s subgenomes 1 and 2. Plots indicating regions of synteny as indicated by nucmer between C. neglecta, with A. lyrata (a), C. sativa’s sub-genomes (b, c, e), and between A. lyrata and C. sativa’s sub-genomes (d, f). Plot (b) is colored by alignment with A. lyrata’s chromosomes, analogs for the ancestral chromosomes.

Fig. 7.

Synteny plots with C. hispida var. hispida, C. laxa, and C. sativa sub-genome 3. Plots indicating regions of synteny as indicated by nucmer among the diploid Camelina species C. hispida var. hispida and C. laxa with C. sativa’s sub-genomes (a, d, g, h) and with A. lyrata (b, e, i). Synteny between C. sativa sub-genomes and A. lyrata are shown for comparison (c, f). Plots (b) and (f) are colored by alignment with A. lyrata’s chromosomes, analogs for the ancestral chromosomes.

Transposable element annotation and comparison

Annotation of the TEs by EDTA indicated that TEs made up 34%–35% of the diploid Camelina genomes, with the exception of C. hispida var. hispida, where TEs accounted for 50% of the genome (Fig. 8 and Supplementary Table 2). The largest groups of TEs identified belonged to the Helitron (DHH) and Gypsy superfamilies (RLC). EAHelitron identified fewer Helitrons than EDTA (Table 6). However, with EAHelitron the difference in the number of Helitrons between C. neglecta and C. hispida var. hispida was even more pronounced, with Helitron densities of 9.6 and 20.4 per 1,000,000 bp, respectively. The percentage of C. hispida var. grandiflora’s genome made up of TEs (34%) and its Helitron density (11.8) were much lower than those of C. hispida var. hispida (Supplementary Table 2). The 3 sub-genomes of C. sativa also showed differences in the number of TEs and specifically Helitrons, with sub-genomes 1 and 2 showing similar percentages of TEs at 30.2% and 26.6% and similar Helitron densities at 6.1 and 5.8, but sub-genome 3 showing higher values at 40% and 14.7. In syntenic regions, C. sativa shared more Helitrons with C. hispida var. hispida (258) than with C. hispida var. grandiflora (178) (Table 7 and Fig. 9).

Fig. 8.

TE types and abundance. Types of TE in a) Camelina neglecta, b) C. laxa, c) C. hispida var. grandiflora, d) C. hispida var. hispida, e) Arabidopsis lyrata, f) C. sativa sub-genome 1, g) C. sativa sub-genome 2, and h) C. sativa sub-genome 3, as annotated by the Extensive de novo TE Annotator (EDTA). Pie graphs are scaled by the percentage of the genome attributed to TEs, which is highest at 49.6% for C. hispida var. hispida (d). Classification of TE type follows the unified classification system for eukaryotic TEs that uses a 3 letter code indicating class, order, and superfamily (34). Here, these are divided into 3 groups: DNA transposons (DNA), which include Helitrons (DHH); LTR retrotransposons (LTR), which include RLC (Copia) and RLG (Gypsy); and miniature inverted-repeat transposable elements (MITEs).

Table 6.

Results of Helitron (DHH) annotation by EAHelitron.

Species | Helitrons | Helitron density
Camelina neglecta | 1,877 | 9.6
Camelina hispida var. hispida | 5,776 | 20.4
Camelina hispida var. grandiflora | 2,933 | 11.8
Camelina laxa | 937 | 4.6
Camelina sativa sub-genome 1 | 1,220 | 6.1
Camelina sativa sub-genome 2 | 1,024 | 5.8
Camelina sativa sub-genome 3 | 3,395 | 14.7
Arabidopsis lyrata | 3,242 | 16.7
Arabidopsis thaliana | 665 | 5.6
Capsella rubella | 1,826 | 14.1
Capsella grandiflora | 226 | 2.1
Capsella bursa-pastoris | 2,576 | 9.6
Neslia paniculata | 1,003 | 8.9

Helitron density is calculated as the number of Helitrons detected in the genome for each 1,000,000 bp.
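The density definition above can be written out as a short calculation. The sketch below uses counts and densities copied from Table 6 for the Camelina diploids and C. sativa’s sub-genome 3; the function names are illustrative, and the back-calculated spans are only approximations implied by the rounded densities, not assembly sizes reported by the authors.

```python
# (Helitron count, Helitron density per 1,000,000 bp), copied from Table 6.
table6 = {
    "Camelina neglecta": (1877, 9.6),
    "Camelina hispida var. hispida": (5776, 20.4),
    "Camelina hispida var. grandiflora": (2933, 11.8),
    "Camelina laxa": (937, 4.6),
    "Camelina sativa sub-genome 3": (3395, 14.7),
}


def helitron_density(count, span_bp):
    """Helitrons per 1,000,000 bp of sequence."""
    return count / span_bp * 1_000_000


def implied_span_mb(count, density):
    """Sequence span (Mb) implied by a count and a per-Mb density.

    Approximate only: the densities in Table 6 are rounded to 1 decimal.
    """
    return count / density


spans_mb = {name: implied_span_mb(c, d) for name, (c, d) in table6.items()}

for name, span in spans_mb.items():
    print(f"{name:<36} ~{span:.0f} Mb implied")
```

Feeding an implied span back into `helitron_density` returns the tabulated density, which is a quick consistency check on how the two columns relate.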

Table 7.

The number of Helitrons detected by EAHelitron in C. sativa’s sub-genome 3 that are shared, within regions of synteny, with C. hispida var. grandiflora and C. hispida var. hispida, by C. sativa chromosome.

C. sativa chromosome | C. sativa Helitron count | Helitrons shared with C. hispida var. grandiflora | Helitrons shared with C. hispida var. hispida | Common Helitrons
Cs 17 | 516 | 24 | 31 | 9
Cs 09 | 518 | 29 | 41 | 17
Cs 15 | 456 | 27 | 38 | 7
Cs 05 | 513 | 27 | 59 | 14
Cs 12 | 505 | 21 | 35 | 12
Cs 02 | 496 | 27 | 29 | 9
Cs 20 | 391 | 23 | 25 | 7
Total | 3,395 | 178 | 258 | 75
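The totals row of Table 7 and the share of sub-genome 3 Helitrons found in each variety can be recomputed from the per-chromosome rows. This is a minimal sketch using only the values in the table; the variable names are illustrative.

```python
# Rows of Table 7: (chromosome, C. sativa Helitron count,
#                   shared with var. grandiflora, shared with var. hispida,
#                   common to both varieties)
table7 = [
    ("Cs 17", 516, 24, 31, 9),
    ("Cs 09", 518, 29, 41, 17),
    ("Cs 15", 456, 27, 38, 7),
    ("Cs 05", 513, 27, 59, 14),
    ("Cs 12", 505, 21, 35, 12),
    ("Cs 02", 496, 27, 29, 9),
    ("Cs 20", 391, 23, 25, 7),
]

# Column sums, skipping the chromosome label.
total, with_grandiflora, with_hispida, common = (
    sum(column) for column in zip(*(row[1:] for row in table7))
)

pct_grandiflora = 100 * with_grandiflora / total
pct_hispida = 100 * with_hispida / total

print(f"Total Helitrons in sub-genome 3: {total}")
print(f"Shared with var. grandiflora: {with_grandiflora} ({pct_grandiflora:.1f}%)")
print(f"Shared with var. hispida:     {with_hispida} ({pct_hispida:.1f}%)")
print(f"Common to both varieties:     {common}")
```

Every chromosome shows more Helitrons shared with var. hispida than with var. grandiflora, so the difference in the totals is not driven by a single chromosome.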

Fig. 9.

Helitron ideogram and abundance. a) Density of Helitrons identified by EAHelitron across the sub-genome 3 of C. sativa with the locations of Helitrons shared by C. hispida var. hispida, C. hispida var. grandiflora or both indicated and b) Venn diagram of the number of Helitrons shared among C. hispida var. hispida, C. hispida var. grandiflora, and C. sativa’s sub-genome 3.

Discussion

Whole-genome sequencing is allowing a deeper understanding of the processes that have shaped plant evolution including polyploidization, hybridization, and chromosomal rearrangements. Allopolyploid crops have become excellent systems for understanding these genomic changes because they have often received substantial sequencing effort (e.g. Triticum aestivum L. [Appels et al. 2018], Solanum lycopersicum L. [Sato et al. 2012], Zea mays ssp. mays L. [Schnable et al. 2009; Jiao et al. 2017], and Arachis hypogaea L. [Bertioli et al. 2019]). Moreover, as work on these species has shown, a greater understanding of the genomic changes associated with allopolyploidization or autopolyploidization, the genomic consequences of domestication, and the potential breeding resources can be achieved when related, extant diploid species are also sequenced (e.g. Sato et al. 2012; Marcussen et al. 2014; Bertioli et al. 2016; Ramos-Madrigal et al. 2016).

The genome of the allohexaploid Camelina sativa (L.) Crantz (camelina; 2n = 40) has been sequenced (Kagale et al. 2014), but extant relatives of the 3 parental genomes have not. Four diploid taxa are known within the genus: Camelina neglecta, Camelina laxa, Camelina hispida var. hispida, and Camelina hispida var. grandiflora. Here we sequenced the genomes of these 4 diploids, assembled high quality chromosome level drafts, and examined their phylogenetic relationship with C. sativa’s sub-genomes. Each genome showed synteny with A. lyrata with conservation of ACK and ABK blocks, with each region mapping once, confirming a diploid structure (Figs. 6 and 7). This conservation of ACK and ABK blocks allowed for further dissection of the genomes and phylogenetic analysis.

Camelina sativa’s sub-genomes

Species trees produced by ASTRAL-III’s multispecies coalescent model, using trees produced from sequence data from each ABK or ACK block, indicated the phylogenetic relationships between these regions from C. sativa’s chromosomes and the other taxa included (Fig. 1). These phylogenies indicate that the phasing of C. sativa’s chromosomes to sub-genomes required reassessment compared to that originally suggested by Kagale et al. (2014). Given that rearrangement and fractionation have not been predominant mechanisms of genome change in C. sativa (Lysak et al. 2016), limited visual information is available for this division. Phasing polyploid genomes into their sub-genomes remains a challenging issue (Rothfels 2021), and the majority of tools to aid in this are currently only able to handle tetraploids (e.g. AlloppNET; Jones 2013). Here, phasing was accomplished by using the distribution of ancestral syntenic blocks and placing them in phylogenetic context with related diploids. This allowed us to define sub-genome 1 as composed of the chromosomes with the closest phylogenetic relationship with C. neglecta, sub-genome 2 as those chromosomes sister to the clade with C. neglecta and sub-genome 1, and sub-genome 3 as chromosomes with the closest phylogenetic relationship to the varieties of C. hispida (Tables 4 and 5 and Fig. 1). This nomenclature differs from that of our previous work, which defined sub-genome 2 as most closely related to C. neglecta (Lujan Toro 2017), but it is concordant with the revised sub-genome definition recently published for C. sativa (Chaudhary et al. 2020).
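The three definitions above amount to a simple lookup from a chromosome’s phylogenetic placement to a sub-genome label. The sketch below illustrates that rule only; it is not the authors’ pipeline, and the function name, the placement strings, and the example input are hypothetical (the chromosomes shown are ones the text associates with each sub-genome).

```python
# Hypothetical encoding of the sub-genome definitions given in the text.
SUBGENOME_RULES = {
    "closest to C. neglecta": 1,
    "sister to (C. neglecta + sub-genome 1)": 2,
    "closest to C. hispida": 3,
}


def assign_subgenome(placement):
    """Return the sub-genome (1, 2, or 3) for a phylogenetic placement.

    Returns None when the placement matches none of the three definitions,
    so ambiguous chromosomes are flagged rather than forced into a group.
    """
    return SUBGENOME_RULES.get(placement)


# Illustrative input: placements would come from the per-block species trees.
placements = {
    "Cs 08": "closest to C. neglecta",
    "Cs 10": "sister to (C. neglecta + sub-genome 1)",
    "Cs 17": "closest to C. hispida",
}

phased = {chrom: assign_subgenome(p) for chrom, p in placements.items()}
print(phased)
```

Returning `None` for unmatched placements is a deliberate choice in this sketch, since conflicting signal between blocks would need manual inspection rather than automatic assignment.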

In their analysis, Chaudhary et al. (2020) characterized 193 accessions of Camelina, including C. neglecta, C. laxa, C. hispida, C. rumelica Velen, tetraploid and hexaploid C. microcarpa Andrz. ex DC, and C. sativa, using a genotyping by sequencing (GBS) approach that mapped SNPs in sequences for these accessions to C. sativa’s genome. They determined that sequences from C. neglecta aligned to 6 of C. sativa’s chromosomes and tetraploid C. microcarpa aligned with 13 of the chromosomes. However, they determined that the read alignments did not correspond to the sub-genome structure published in Kagale et al. (2014) and refined the composition of the sub-genomes accordingly, with C. neglecta sequences aligning to sub-genome 1, the tetraploid C. microcarpa sequences aligning to sub-genomes 1 and 2, and C. hispida partially aligning to sub-genome 3 (Chaudhary et al. 2020). As a result, 2 very different approaches, an analysis based on SNPs and a phylogenetic analysis of whole-genome sequences, have reached the same conclusions about refinements to the sub-genome structure of C. sativa.

In all 3 sub-genomes, the majority of the changes in the genome structure in comparison to A. lyrata are shared between the diploid genomes and the sub-genomes of C. sativa (see below, Figs. 6 and 7). This conservation of structure in these closely related genomes is in line with Lysak et al.’s expectations that the majority of the changes in chromosome structure seen in C. sativa compared to the ancestral chromosomes were likely present in the ancestral diploid genomes (Lysak et al. 2016). While the formation of C. sativa was relatively recent (Kagale et al. 2014), it is clear that allopolyploidization did not result in extensive structural reorganization of the genome, but rather can be characterized by conservation of ancestral chromosome structure and gene order.

Inter-genome recombination has been observed in some allopolyploids, including resynthesized Brassica napus (Pires et al. 2004), and if this has occurred in C. sativa’s genome it could result in conflicting signals in some loci, as could historical gene flow and introgression. Some genes appear to show a signal of hybridizations among C. laxa, C. sativa’s sub-genome 3, and the varieties of C. hispida (Figs. 2 and 3), and further evaluations of this possibility could be completed with broader, population level sampling of the taxa and evaluation of the evidence of introgression (Martin et al. 2015; Martin and Jiggins 2017; Crowl et al. 2020).

Camelina neglecta—an extant, maternal relative

With the refinements to the phasing of the sub-genome structure of C. sativa, in addition to the close phylogenetic relationship, there is strong synteny between the draft C. neglecta genome and C. sativa’s sub-genome 1 (Fig. 6), with similar conservation of the ancestral chromosome structure. Specifically, only 2 large inversions are apparent when C. neglecta’s genome and C. sativa’s sub-genome 1 are compared: one at the start of C. neglecta’s chromosome 6 and end of C. sativa’s chromosome 8, and one within C. neglecta’s chromosome 2 and C. sativa’s chromosome 7, which is echoed in the homologous chromosome of C. sativa’s sub-genome 2. Interestingly, the inverted region in C. neglecta’s chromosome 6 is more similar to A. lyrata’s chromosome 6, suggesting this change may have occurred in C. sativa’s sub-genome 1 following allopolyploidization. This similarity and the chloroplast data, which suggest the maternal lineage of C. sativa is most closely related to C. neglecta, indicate that C. neglecta is a close extant, maternal relative of C. sativa.

C. neglecta is also the closest relative of sub-genome 2 (Fig. 6), with the largest differences in structure being the lack of fusion in C. sativa’s chromosomes 10 and 18 compared to C. neglecta’s chromosome 5 and several additional areas of inversion within C. neglecta’s chromosome 3. This suggests that the taxon that contributed C. sativa’s sub-genome 2 is or was a close relative of C. neglecta that retained the ancestral chromosome structure of n = 7 reconstructed for the genus by Mandáková et al. (2019). This taxon is currently unknown, but given that C. neglecta was only recently described as a distinct species because of its morphological similarity to C. microcarpa, it is possible that the taxon has been collected, but is similarly cryptic and not yet identified as distinct from accessions of C. microcarpa.

Camelina hispida—an extant, paternal relative

The varieties of C. hispida show a close phylogenetic relationship with C. sativa’s sub-genome 3 (Fig. 7). There is extensive synteny among these genomes, indicating strong preservation of chromosome structure and gene order (e.g. Fig. 7). As in the comparisons between C. neglecta and C. sativa’s sub-genomes 1 and 2, inversions are apparent in 3 chromosomes, one of which is shared in comparisons with A. lyrata and potentially represents a change since C. sativa’s formation. While the phylogenies indicate that both varieties of C. hispida share a common ancestor with C. sativa’s sub-genome 3, the frequency of TEs differs strongly between the 2 varieties and is responsible for the 12% larger 2C DNA content of C. hispida var. hispida (Martin et al. 2017).

The frequency of TEs and, in particular, shared Helitrons suggest C. hispida var. hispida’s genome may more closely resemble the genome that contributed C. sativa’s sub-genome 3. The frequency of TEs varies across C. sativa’s sub-genomes, with sub-genome 3 having the highest proportion of these elements, at 40% of the sub-genome, compared to 27%–30% in the other two (Fig. 8, Table 6, and Supplementary Table 2). Among the diploids sequenced here, this percentage is closest to the percentage observed for C. hispida var. hispida (50%). The largest portion of these TEs by count, Helitrons, make up 15% of C. sativa’s sub-genome 3 and 16% of C. hispida var. hispida’s genome. Helitrons were first discovered in the genomes of Arabidopsis thaliana, Oryza sativa L., and Caenorhabditis elegans Maupas (Kapitonov and Jurka 2001) and are the most abundant TEs in A. thaliana’s genome. They make up the second largest TE portion of the genome after the less numerous, but longer retro-elements such as Copia and Gypsy (Quesneville 2020), though estimates vary based on the process or algorithms used to search for the elements and the level of conservation expected (e.g. Xiong et al. 2014 or Yang and Bennetzen 2009). Hu et al. (2019) have suggested that, because Helitron density is highly evolutionarily labile, it could be used for species identification. In this case, we investigated whether Helitrons may provide clues to which genome was more similar to that which contributed to the formation of C. sativa. We used the software developed by Hu et al. (2019), EAHelitron, to determine whether Helitrons detected in C. sativa’s sub-genome 3 were more often shared with C. hispida var. hispida or C. hispida var. grandiflora. This indicated that 7.6% of the Helitrons in C. sativa’s sub-genome 3 are shared with C. hispida var. hispida, while 5.2% are shared with C. hispida var. grandiflora (Table 7 and Fig. 9). This suggests that the genome that contributed C. sativa’s sub-genome 3 may have been more similar to the genome of C. hispida var. hispida with its higher TE content. However, it is also possible that with an expansion of TEs in C. hispida var. hispida’s genome, shared Helitrons were more likely to be retained.

Here we see reduced TE abundance and Helitron density in sub-genome 3 relative to the levels observed in C. hispida var. hispida, and in sub-genome 1 relative to C. neglecta. However, one expectation is that genomic shock and relaxed selection pressure following allopolyploidization allow an increase in TE abundance (Parisod and Senerchia 2012; Vicient and Casacuberta 2017). For example, proliferation of TEs has been observed in allotetraploid Coffea arabica L. Specifically, copia elements have increased the size of sub-genome C compared to that observed in the diploid representative of that progenitor genome, C. canephora L. (Yu et al. 2011). Similarly, the transposon Sunfish was observed to be released from repression in the early generations of synthetic autopolyploids of Arabidopsis thaliana and A. arenosa (L.) Lawalrée as well as their allopolyploid derivative A. suecica (Fries.) Norrlin. However, the same was not true for other TE families in A. suecica (Madlung et al. 2005), and more recent work by Burns et al. (2021) concluded there was no upregulation in transposon activity in A. suecica compared to its ancestral species. Ågren et al. (2016) also determined that the allotetraploid Capsella bursa-pastoris did not show genome-wide TE proliferation compared to its progenitors, though they did note a higher abundance of retrotransposons in gene-rich regions, and Sarilar et al. (2013) determined that early generations of synthetic allotetraploid Brassica napus did not show increased activity in 3 groups of TEs. Interestingly, similar to our observation, Hu et al. (2019) observed that Helitron density was also lower in the sub-genomes of the allotetraploid Brassica napus L. compared to the extant representatives of the parental genomes B. rapa L. and B. oleracea L. Here, the repeatome of C. sativa is less complete than that of the diploid genomes assembled with long reads, and this may mean that the Helitrons in C. sativa’s genome are underrepresented. Alternatively, the diploid lineages may have seen proliferation of TEs since C. sativa’s formation. As in the systems explored above, resynthesis of C. sativa could determine whether the apparent changes are repeatable. Further, the variation in TE abundance between C. hispida varieties and the presence of extant G1G3, C. rumelica, and G1G2, “C. microcarpa 4x,” tetraploids (Mandáková et al. 2019; Chaudhary et al. 2020) in addition to the G1G2G3 hexaploid C. sativa provides an intriguing opportunity to use resynthesis to study TE dynamics after allopolyploidization and investigate if and how TE abundance contributes to sub-genome dominance or lack thereof (Bird et al. 2018; Alger and Edger 2020).
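The size of the reduction discussed above can be made concrete with the Helitron densities from Table 6. The following is a rough sketch: it treats C. hispida var. hispida and C. neglecta as stand-ins for the progenitors of sub-genomes 3 and 1, which is an assumption made for illustration, and it ignores the possible underrepresentation of repeats in the C. sativa assembly noted in the text.

```python
# Helitron densities per 1,000,000 bp, copied from Table 6.
# Each pair is (C. sativa sub-genome, diploid used here as a proxy progenitor).
pairs = {
    "sub-genome 3 vs C. hispida var. hispida": (14.7, 20.4),
    "sub-genome 1 vs C. neglecta": (6.1, 9.6),
}


def relative_change(subgenome_value, diploid_value):
    """Fractional change in the sub-genome relative to the diploid proxy."""
    return (subgenome_value - diploid_value) / diploid_value


changes = {label: relative_change(sub, dip) for label, (sub, dip) in pairs.items()}

for label, change in changes.items():
    print(f"{label}: {change:+.0%} Helitron density")
```

Both comparisons come out negative, matching the direction of the difference described in the paragraph; the magnitudes depend on which diploid is chosen as the proxy.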

The results presented by Žerdoner Čalasan et al. (2019) suggest that C. anomala, a taxon with unknown chromosome number, is sister to C. hispida raising the possibility that this species could also be a close relative of C. sativa’s sub-genome 3. However, the species has apparently not been collected since the 1800s and remains largely a mystery.

Camelina laxa 

Camelina laxa is morphologically distinct from the other taxa studied here, with its flexuous or zig-zagging stems, and also has the genome most diverged from the ancestral karyotype. It is clear from the phylogenetic evidence here and cytological evidence presented by Mandáková et al. (2019) that C. laxa and C. neglecta represent separate transitions from n = 7 to n = 6. In C. laxa, extensive chromosomal rearrangements occurred, resulting from chromosome shattering, while in C. neglecta chromosome reduction resulted from the fusion of 2 chromosomes (Mandáková et al. 2019). However, the position of C. laxa in the phylogenetic tree was the least stable element across all analyses, with 2 main alternatives: either as basal to all other Camelina species sampled or as basal to the clade containing the varieties of C. hispida and C. sativa’s sub-genome 3. While Mandáková et al.’s (2019) analysis of 48 single copy genes using ASTRAL placed C. laxa in a clade with C. hispida, the consensus from previous work is that C. laxa should be considered basal to the other species of Camelina. For example, Brock et al.’s (2018) maximum likelihood consensus phylogeny, constructed from ddRADseq data for 48 specimens from gene bank material and field collections of C. sativa, C. microcarpa hexaploids, C. rumelica, C. laxa, and C. hispida, placed C. laxa as basal to the other species. Similarly, a neighbor-joining tree produced using GBS data indicated C. laxa was basal to all taxa except the G1G3 tetraploid C. rumelica (Chaudhary et al. 2020). Finally, in a more comprehensive analysis of the Camelineae, which included C. alyssum (Mill.) Thell. and C. anomala Boiss. & Hausskn. ex Boiss. as well as representatives from many other genera included in the tribe (Neslia, Capsella, Arabidopsis, Catolobus, Pseudoarabidopsis, and Chrysochamela), Žerdoner Čalasan et al. (2019) also placed C. laxa as basal to Camelina. As a result, the preponderance of evidence currently suggests that this more diverged lineage is basal within Camelina.

Conclusion

As domestication has decreased genetic variability in C. sativa accessions (Manca et al. 2013), other members of the genus might be of value for the improvement of C. sativa. Here our results indicate that C. neglecta and C. hispida var. hispida could be useful species in which to investigate variation in traits such as seed size, disease resistance, and oil profiles. However, the collections available for these species are limited, and international efforts to collect and preserve a broader selection of germplasm for the genus should be considered as we look to these species as sources of desirable traits for the improvement of the crop species (Ford-Lloyd et al. 2011).

Supplementary Material

jkac182_Supplementary_Figure_S1
jkac182_Supplemental_Tables

Acknowledgments

We would like to thank the ORDC growth facility team for support rearing the plant material, the ORDC Molecular Technology Laboratory for support with sequencing, and the U.S. National Plant Germplasm System and GRIN-Global for conserving and providing the germplasm needed for this research. We thank Dr S. Wright for sharing the draft genome sequence of Neslia paniculata and Julia Mata for help processing convergence data.

Funding

Funding was provided by Agriculture and Agri-Food Canada as part of the project “Gene flow, diversity and relationships within the Brassicaceae: Focus on Camelina” (J-001029).

Conflicts of interest

None declared.

Contributor Information

Sara L Martin, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada.

Beatriz Lujan Toro, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada.

Tracey James, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada.

Connie A Sauder, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada.

Martin Laforest, Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada.

Data Availability

All assembled genomes are available from the National Center for Biotechnology Information as part of Bioproject PRJNA750147.

Supplemental material is available at G3 online.

Literature cited

  1. Abbott R, Albach D, Ansell S, Arntzen JW, Baird SJE, Bierne N, Boughman J, Brelsford A, Buerkle CA, Buggs R, et al. Hybridization and speciation. J Evol Biol. 2013;26(2):229–246. doi:10.1111/j.1420–9101.2012.02599.x. [DOI] [PubMed] [Google Scholar]
  2. Ågren JA, Huang HR, Wright SI.. Transposable element evolution in the allotetraploid capsella bursa-pastoris. Am J Bot. 2016;103(7):1197–1202. doi: 10.3732/ajb.1600103. [DOI] [PubMed] [Google Scholar]
  3. Al-Shehbaz IA. A generic and tribal synopsis of the Brassicaceae (Cruciferae). Taxon. 2012;61(5):931–954. [Google Scholar]
  4. Alger EI, Edger PP.. One subgenome to rule them all: underlying mechanisms of subgenome dominance. Curr Opin Plant Biol. 2020;54:108–113. doi: 10.1016/j.pbi.2020.03.004. [DOI] [PubMed] [Google Scholar]
  5. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. [DOI] [PubMed] [Google Scholar]
  6. Appels R, Eversole K, Feuillet C, Keller B, Rogers J, Stein N, Pozniak CJ, Choulet F, Distelfeld A, Poland J, et al. ; The International Wheat Genome Sequencing Consortium (IWGSC). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 2018;361(6403):eaar7191. doi: 10.1126/science.aar7191. [DOI] [PubMed] [Google Scholar]
  7. Bastolla U, Porto M, Roman HE, Vendruscolo M.. Seqin{R} 1.0–2: a contributed package to the {R} project for statistical computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Structural Approaches to Sequence Evolution: Molecules, Networks, Populations. New York: Springer-Verlag; 2007. p. 207–232. [Google Scholar]
  8. Bertioli DJ, Cannon SB, Froenicke L, Huang G, Farmer AD, Cannon EKS, Liu X, Gao D, Clevenger J, Dash S, et al. The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet. 2016;48(4):438–446. doi: 10.1038/ng.3517. [DOI] [PubMed] [Google Scholar]
  9. Bertioli DJ, Jenkins J, Clevenger J, Dudchenko O, Gao D, Seijo G, Leal-Bertioli SCM, Ren L, Farmer AD, Pandey MK, et al. The genome sequence of segmental allotetraploid peanut Arachis hypogaea. Nat Genet. 2019;51(5):877–884. doi: 10.1038/s41588-019-0405-z. [DOI] [PubMed] [Google Scholar]
  10. Bird KA, VanBuren R, Puzey JR, Edger PP.. The causes and consequences of subgenome dominance in hybrids and recent polyploids. New Phytol. 2018;220(1):87–93. doi: 10.1111/nph.15256. [DOI] [PubMed] [Google Scholar]
  11. Bodenhofer U, Bonatesta E, Horejs-Kainrath C, Hochreiter S.. msa: an R package for multiple sequence alignment. Bioinformatics. 2015;31(24):3997–3999. doi: 10.1093/bioinformatics/btv176. [DOI] [PubMed] [Google Scholar]
  12. Bogdanowicz D, Giaro K, Wróbel B.. TreeCmp: comparison of trees in polynomial time. Evol Bioinform Online. 2012;8(8):EBO.S9657–487. doi: 10.4137/EBO.S9657. [DOI] [Google Scholar]
  13. Bolger AM, Lohse M, Usadel B.. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, et al. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol. 2019;15(4):e1006650. doi: 10.1371/journal.pcbi.1006650. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Brock JR, Dönmez AA, Beilstein MA, Olsen KM.. Phylogenetics of Camelina Crantz. (Brassicaceae) and insights on the origin of gold-of-pleasure (Camelina sativa). Mol Phylogenet Evol. 2018;127:834–842. doi: 10.1016/j.ympev.2018.06.031. [DOI] [PubMed] [Google Scholar]
  16. Brock JR, Mandáková T, Lysak MA, Al-Shehbaz IA.. Camelina neglecta (Brassicaceae, Camelineae), a new diploid species from Europe. PhytoKeys. 2019;115:51–57. doi: 10.3897/phytokeys.115.31704. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Burns R, Mandáková T, Gunis J, Soto-Jiménez LM, Liu C, Lysak MA, Novikova PY, Nordborg M.. Gradual evolution of allopolyploidy in Arabidopsis suecica. Nat Ecol Evol. 2021;5(10):1367–1381. doi: 10.1038/s41559-021-01525-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Cao Z, Liu X, Ogilvie H, Yan Z, Nakhleh L. Practical aspects of phylogenetic network analysis using PhyloNet. bioRxiv, 2019. doi: 10.1101/746362. [DOI]
  19. Chalhoub B, Denoeud F, Liu S, Parkin IAP, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B, et al. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science. 2014;345(6199):950–953. doi: 10.1126/science.1253435. [DOI] [PubMed] [Google Scholar]
  20. Chaudhary R, Koh CS, Kagale S, Tang L, Wu SW, Lv Z, Mason AS, Sharpe AG, Diederichsen A, Parkin IAP.. Assessing diversity in the camelina genus provides insights into the genome structure of Camelina sativa. G3 (Bethesda). 2020;10(4):1297–1308. doi: 10.1534/g3.119.400957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Chen H. VennDiagram: generate high-resolution Venn and Euler plots, 2021. https://cran.r-project.org/package=VennDiagram.
  22. Chen MY, Liang D, Zhang P.. Phylogenomic resolution of the phylogeny of laurasiatherian mammals: exploring phylogenetic signals within coding and noncoding sequences. Genome Biol Evol. 2017;9(8):1998–2012. doi: 10.1093/gbe/evx147. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Coombe L, Nikolić V, Chu J, Birol I, Warren RL.. ntJoin: fast and lightweight assembly-guided scaffolding using minimizer graphs. Bioinformatics. 2020;36(12):3885–3887. doi: 10.1093/bioinformatics/btaa253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Crowl AA, Manos PS, McVay JD, Lemmon AR, Lemmon EM, Hipp AL.. Uncovering the genomic signature of ancient introgression between white oak lineages (Quercus). New Phytol. 2020;226(4):1158–1170. doi: 10.1111/nph.15842. [DOI] [PubMed] [Google Scholar]
  25. Ellinghaus D, Kurtz S, Willhoeft U.. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics. 2008;9(1):18. doi: 10.1186/1471-2105-9-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Emms DM, Kelly S.. STRIDE: species tree root inference from gene duplication events. Mol Biol Evol. 2017;34(12):3267–3278. doi: 10.1093/molbev/msx259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Emms DM, Kelly S.. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238. doi: 10.1186/s13059-019-1832-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Ford-Lloyd BV, Schmidt M, Armstrong SJ, Barazani O, Engels J, Hadas R, Hammer K, Kell SP, Kang D, Khoshbakht K, et al. Crop wild relatives—undervalued, underutilized and under threat? Bioscience. 2011;61(7):559–565. doi: 10.1525/bio.2011.61.7.10. [DOI] [Google Scholar]
  29. Greiner S, Lehwark P, Bock R.. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019;47(W1):W59–W64. doi: 10.1093/nar/gkz238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Gu Z, Gu L, Eils R, Schlesner M, Brors B.. Circlize implements and enhances circular visualization in R. Bioinformatics. 2014;30(19):2811–2812. doi: 10.1093/bioinformatics/btu393. [DOI] [PubMed] [Google Scholar]
  31. Guiglielmoni N, Houtain A, Derzelle A, Van Doninck K, Flot JF.. Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms. BMC Bioinformatics. 2021;22(1):1–23. doi: 10.1186/s12859-021-04118-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Gurevich A, , SavelievV, , VyahhiN, , Tesler G.. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. doi: 10.1093/bioinformatics/btt086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J.. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on idiograms, PeerJ Comput Sci. 2020. doi: 10.7717/peerj-cs.251. [DOI] [PMC free article] [PubMed]
  34. Heibl C. PHYLOCH: R language tree plotting tools and interfaces to diverse phylogenetic software packages, 2008. http://www.christophheibl.de/Rpackages.html.
  35. Hu K, Xu K, Wen J, Yi B, Shen J, Ma C, Fu T, Ouyang Y, Tu J. Helitron distribution in Brassicaceae and whole genome helitron density as a character for distinguishing plant species. BMC Bioinformatics. 2019;20(1):1–20. doi: 10.1186/s12859-019-2945-8.
  36. Husband BC, Baldwin SJ, Suda J. The incidence of polyploidy in natural plant populations: major patterns and evolutionary processes. In: Greilhuber J, Dolezel J, Wendel J, editors. Plant Genome Diversity. Volume 2. Vienna: Springer; 2013. p. 255–276. http://link.springer.com/10.1007/978-3-7091-1160-4.
  37. Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, Campbell MS, Stein JC, Wei X, Chin CS, et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546(7659):524–527. doi: 10.1038/nature22971.
  38. Jombart T, Archer F, Schliep K, Kamvar Z, Harris R, Paradis E, Goudet J, Lapp H. apex: phylogenetics with multiple genes. Mol Ecol Resour. 2017;17(1):19–26. doi: 10.1111/1755-0998.12567.
  39. Jones G. Bayesian phylogenetic analysis for diploid and allotetraploid species networks. bioRxiv, 2013. doi: 10.1101/129361.
  40. Kagale S, Koh C, Nixon J, Bollina V, Clarke WE, Tuteja R, Spillane C, Robinson SJ, Links MG, Clarke C, et al. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure. Nat Commun. 2014;5:3706–3717. doi: 10.1038/ncomms4706.
  41. Kapitonov VV, Jurka J. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 2001;98(15):8714–8719. doi: 10.1073/pnas.151269298.
  42. Koch MA, Kiefer M. Genome evolution among cruciferous plants: a lecture from the comparison of the genetic maps to three diploid species - Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana. Am J Bot. 2005;92(4):761–767.
  43. Komsta L. Outliers: tests for outliers, 2011. R package version 0.15. [accessed 2022 July]. https://cran.r-project.org/web/packages/outliers/
  44. Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv. 2016. doi: 10.1101/071282.
  45. Kyriakidou M, Tai HH, Anglin NL, Ellis D, Strömvik MV. Current strategies of polyploid plant genome sequence assembly. Front Plant Sci. 2018;9:1660–1675. doi: 10.3389/fpls.2018.01660.
  46. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
  47. Latta RG, Bekele WA, Wight CP, Tinker NA. Comparative linkage mapping of diploid, tetraploid, and hexaploid Avena species suggests extensive chromosome rearrangement in ancestral diploids. Sci Rep. 2019;9(1):1–12. doi: 10.1038/s41598-019-48639-7.
  48. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey V. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118. doi: 10.1371/journal.pcbi.1003118.
  49. Lemon J. Plotrix: a package in the red light district of R. R-News. 2006;6(4):8–12.
  50. Levin DA. Polyploidy and novelty in flowering plants. Am Nat. 1983;122(1):1–25.
  51. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, 2013:1–3. http://arxiv.org/abs/1303.3997.
  52. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
  53. Lujan Toro BE. Genome assembly of Camelina microcarpa Andrz. Ex DC, a step towards understanding genome evolution in Camelina. Ottawa, Ontario, Canada: Carleton University, 2017.
  54. Lysak MA, Mandakova T, Schranz ME. Comparative paleogenomics of crucifers: ancestral genomic blocks revisited. Curr Opin Plant Biol. 2016;30:108–115. doi: 10.1016/j.pbi.2016.02.001.
  55. Madlung A, Tyagi AP, Watson B, Jiang H, Kagochi T, Doerge RW, Martienssen R, Comai L. Genomic changes in synthetic Arabidopsis polyploids. Plant J. 2005;41(2):221–230. doi: 10.1111/j.1365-313X.2004.02297.x.
  56. Mahmoud M, Doddapaneni H, Timp W, Sedlazeck FJ. PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation. Genome Biol. 2021;22(1):1–17. doi: 10.1186/s13059-021-02486-w.
  57. Manca A, Pecchia P, Mapelli S, Masella P, Galasso I. Evaluation of genetic diversity in a Camelina sativa (L.) Crantz collection using microsatellite markers and biochemical traits. Genet Resour Crop Evol. 2013;60(4):1223–1236. doi: 10.1007/s10722-012-9913-8.
  58. Mandáková T, Pouch M, Brock JR, Al-Shehbaz IA, Lysak MA. Origin and evolution of diploid and allopolyploid Camelina genomes were accompanied by chromosome shattering. Plant Cell. 2019;31(11):2596–2612. doi: 10.1105/tpc.19.00366.
  59. Marcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, Jakobsen KS, Wulff BBH, Steuernagel B, Mayer KFX, Olsen O-A, International Wheat Genome Sequencing Consortium. Ancient hybridizations among the ancestral genomes of bread wheat. Science. 2014;345(6194):1250092. doi: 10.1126/science.1250092.
  60. Martin SH, Davey JW, Jiggins CD. Evaluating the use of ABBA-BABA statistics to locate introgressed loci. Mol Biol Evol. 2015;32(1):244–257. doi: 10.1093/molbev/msu269.
  61. Martin SH, Jiggins CD. Interpreting the genomic landscape of introgression. Curr Opin Genet Dev. 2017;47:69–74. doi: 10.1016/j.gde.2017.08.007.
  62. Martin SL, Smith TW, James T, Shalabi F, Kron P, Sauder CA. An update to the Canadian range, abundance, and ploidy of Camelina spp. (Brassicaceae) east of the Rocky Mountains. Botany. 2017;95(4):405–417. doi: 10.1139/cjb-2016-0070.
  63. Michael TP, VanBuren R. Progress, challenges and the future of crop genomes. Curr Opin Plant Biol. 2015;24:71–81. doi: 10.1016/j.pbi.2015.02.002.
  64. Morgan M, Pagès H, Obenchain V, Hayden N.. Rsamtools: binary alignment (BAM), FASTA, variant call (BCF), and tabix file import, 2020. R package version 2.12.0. http://bioconductor.org/packages/Rsamtools.
  65. Murat F, Louis A, Maumus F, Armero A, Cooke R, Quesneville H, Crollius HR, Salse J. Understanding Brassicaceae evolution through ancestral genome reconstruction. Genome Biol. 2015;16(1):262. doi: 10.1186/s13059-015-0814-y.
  66. Nikolov LA, Shushkov P, Nevado B, Gan X, Al-Shehbaz IA, Filatov D, Bailey CD, Tsiantis M. Resolving the backbone of the Brassicaceae phylogeny for investigating trait diversity. New Phytol. 2019;222(3):1638–1651. doi: 10.1111/nph.15732.
  67. Oddes S, Zelig A, Kaplan N. Three invariant Hi-C interaction patterns: applications to genome assembly. Methods. 2018;142:89–99. doi: 10.1016/j.ymeth.2018.04.013.
  68. Ogilvie HA, Bouckaert RR, Drummond AJ. StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Mol Biol Evol. 2017;34(8):2101–2114. doi: 10.1093/molbev/msx126.
  69. Otto SP, Whitton J. Polyploid incidence and evolution. Annu Rev Genet. 2000;34(1):401–437. doi: 10.1146/annurev.genet.34.1.401.
  70. Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46(21):e126. doi: 10.1093/nar/gky730.
  71. Ou S, Jiang N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 2018;176(2):1410–1422. doi: 10.1104/pp.17.01310.
  72. Ou S, Jiang N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mob DNA. 2019;10(1):1–3. doi: 10.1186/s13100-019-0193-0.
  73. Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20(1):275. doi: 10.1186/s13059-019-1905-y.
  74. Pagès H, Aboyoun P, Gentleman R, DebRoy S.. Biostrings: efficient manipulation of biological strings, 2022. R package version 2.64.0. [accessed 2022 July]. https://bioconductor.org/packages/release/bioc/html/Biostrings.html.
  75. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35(3):526–528.
  76. Parisod C, Senerchia N. Responses of transposable elements to polyploidy. In: Grandbastien M-A, Casacuberta JM, editors. Topics in Current Genetics. Vol. 24. Berlin, Heidelberg: Springer-Verlag; 2012. p. 147–168. http://link.springer.com/10.1007/978-1-62703-568-2.
  77. Pires CJ, Zhao J, Schranz ME, Leon EJ, Quijada PA, Lukens LN, Osborn TC. Flowering time divergence and genomic rearrangements in resynthesized Brassica polyploids (Brassicaceae). Biol J Linn Soc. 2004;82(4):675–688. [accessed 2011 Apr 29]. http://www3.botany.ubc.ca/rieseberglab/plantevol/Piresetal2004.pdf.
  78. Quesneville H. Twenty years of transposable element analysis in the Arabidopsis thaliana genome. Mob DNA. 2020;11(1):1–13. doi: 10.1186/s13100-020-00223-x.
  79. Rambaut A. FigTree, 2018. [accessed 2022 July]. http://tree.bio.ed.ac.uk/software/figtree/.
  80. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018;67(5):901–904. doi: 10.1093/sysbio/syy032.
  81. Ramos-Madrigal J, Smith BD, Moreno-Mayar JV, Gopalakrishnan S, Ross-Ibarra J, Gilbert MTP, Wales N. Genome sequence of a 5,310-year-old maize cob provides insights into the early stages of maize domestication. Curr Biol. 2016;26(23):3195–3201. doi: 10.1016/j.cub.2016.09.036.
  82. Revell LJ. phytools: phylogenetic tools for comparative biology (and other things). Methods Ecol Evol. 2012;3(2):217–223. doi: 10.1111/j.2041-210X.2011.00169.x.
  83. Rieseberg LH. Hybrid origins of plant species. Annu Rev Ecol Syst. 1997;28(1):359–389. doi: 10.1146/annurev.ecolsys.28.1.359.
  84. Rieseberg LH. Chromosomal rearrangements and speciation. Trends Ecol Evol. 2001;16(7):351–358.
  85. Rieseberg LH, Willis JH. Plant speciation. Science. 2007;317(5840):910–914. doi: 10.1126/science.1137729.
  86. Roach MJ, Schmidt SA, Borneman AR. Purge haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics. 2018;19(1):1–10. doi: 10.1186/s12859-018-2485-7.
  87. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539–542. doi: 10.1093/sysbio/sys029.
  88. Rothfels CJ. Polyploid phylogenetics. New Phytol. 2021;230(1):66–72. doi: 10.1111/nph.17105.
  89. Sarilar V, Palacios PM, Rousselet A, Ridel C, Falque M, Eber F, Chèvre AM, Joets J, Brabant P, Alix K. Allopolyploidy has a moderate impact on restructuring at three contrasting transposable element insertion sites in resynthesized Brassica napus allotetraploids. New Phytol. 2013;198(2):593–604. doi: 10.1111/nph.12156.
  90. Sato S, Tabata S, Hirakawa H, Asamizu E, Shirasawa K, Isobe S, Kaneko T, Nakamura Y, Shibata D, Aoki K, et al. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012;485(7400):635–641. doi: 10.1038/nature11119.
  91. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27(4):592–593. doi: 10.1093/bioinformatics/btq706.
  92. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463(7278):178–183. doi: 10.1038/nature08670.
  93. Schnable JC, Springer NM, Freeling M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci USA. 2011;108(10):4069–4074. doi: 10.1073/pnas.1101368108.
  94. Schnable PS, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326(5956):1112–1115.
  95. Schranz ME, Lysak MA, Mitchell-Olds T. The ABC’s of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci. 2006;11(11):535–542. doi: 10.1016/j.tplants.2006.09.002.
  96. Schulz A. pBrackets: plot brackets, 2021. R package version 1.0.1. [accessed 2022 July]. https://cran.r-project.org/web/packages/pBrackets/index.html
  97. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
  98. Soltis D, Soltis P. Polyploidy: recurrent formation and genome evolution. Trends Ecol Evol. 1999;14(9):348–352. http://www.ncbi.nlm.nih.gov/pubmed/10441308.
  99. Soltis PS, Soltis DE. The role of hybridization in plant speciation. Annu Rev Plant Biol. 2009;60:561–588. doi: 10.1146/annurev.arplant.043008.092039. http://www.ncbi.nlm.nih.gov/pubmed/19575590.
  100. Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7(1):62. doi: 10.1186/1471-2105-7-62.
  101. Stebbins GL. The significance of hybridization for plant taxonomy and evolution. Taxon. 1969;18(1):26–35.
  102. Struck TH. Trespex-detection of misleading signal in phylogenetic reconstructions based on tree information. Evol Bioinform Online. 2014;10:51–67. doi: 10.4137/EBo.s14239.
  103. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1):vey016. doi: 10.1093/ve/vey016.
  104. Than C, Ruths D, Nakhleh L. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics. 2008;9(1):322. doi: 10.1186/1471-2105-9-322.
  105. The International Wheat Genome Sequencing Consortium. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 2014;345(6194):1251788. doi: 10.1126/science.1251788.
  106. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, Greiner S. GeSeq - versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45(W1):W6–W11. doi: 10.1093/nar/gkx391.
  107. Vicient CM, Casacuberta JM. Impact of transposable elements on polyploid plant genomes. Ann Bot. 2017;120(2):195–207. doi: 10.1093/aob/mcx078.
  108. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. doi: 10.1371/journal.pone.0112963.
  109. Wang L-G, Lam TT-Y, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, et al. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Mol Biol Evol. 2020;37(2):599–603. doi: 10.1093/molbev/msz240.
  110. Warren DL, Geneva AJ, Lanfear R, Rosenberg M. RWTY (R We There Yet): an R package for examining convergence of Bayesian phylogenetic analyses. Mol Biol Evol. 2017;34(4):1016–1020. doi: 10.1093/molbev/msw279.
  111. Wen D, Yu Y, Zhu J, Nakhleh L. Inferring phylogenetic networks using PhyloNet. Syst Biol. 2018;67(4):735–740. doi: 10.1093/sysbio/syy015.
  112. Wet JMJ. Polyploidy and evolution in plants. Taxon. 1971;20(1):29–35.
  113. Wick RR, Judd LM, Holt KE. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019;20(1):129. doi: 10.1186/s13059-019-1727-y.
  114. Wickham H. The split-apply-combine strategy for data analysis. J Stat Softw. 2011;40(1):1–29. http://www.jstatsoft.org/v40/i01/.
  115. Wickham H. stringr: simple, consistent wrappers for common string operations, 2019. R package version 1.4.0. [accessed 2022 July]. https://cran.r-project.org/package=stringr
  116. Workman R, Fedak R, Kilburn D, Hao S, Liu K, Timp W. High molecular weight DNA extraction from recalcitrant plant species for third generation sequencing. Protoc Exch. Version. 2018;1:1–15. doi: 10.1038/protex.2018.059.
  117. Wright K. pals: color palettes, colormaps, and tools to evaluate them, v. 1.7 2021. [accessed 2022 July]. https://kwstat.github.io/pals/.
  118. Xiong W, He L, Lai J, Dooner HK, Du C. HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes. Proc Natl Acad Sci USA. 2014;111(28):10263–10268. doi: 10.1073/pnas.1410068111.
  119. Xu Z, Wang H. LTR-FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Suppl. 2):265–268. doi: 10.1093/nar/gkm286.
  120. Yang L, Bennetzen JL. Structure-based discovery and description of plant and animal Helitrons. Proc Natl Acad Sci USA. 2009;106(31):12832–12837. doi: 10.1073/pnas.0905563106.
  121. Yu Q, Guyot R, de Kochko A, Byers A, Navajas-Pérez R, Langston BJ, Dubreuil-Tranchant C, Paterson AH, Poncet V, Nagai C, et al. Micro-collinearity and genome evolution in the vicinity of an ethylene receptor gene of cultivated diploid and allotetraploid coffee species (Coffea). Plant J. 2011;67(2):305–317. doi: 10.1111/j.1365-313X.2011.04590.x.
  122. Žerdoner Čalasan A, Seregin AP, Hurka H, Hofford NP, Neuffer B. The Eurasian steppe belt in time and space: phylogeny and historical biogeography of the false flax (Camelina Crantz, Camelineae, Brassicaceae). Flora Morphol Distrib Funct Ecol Plants. 2019;260(October):151477. doi: 10.1016/j.flora.2019.151477.
  123. Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018;19(S6):15–30. doi: 10.1186/s12859-018-2129-y.

Associated Data


Supplementary Materials

jkac182_Supplementary_Figure_S1
jkac182_Supplemental_Tables

Data Availability Statement

All assembled genomes are available from the National Center for Biotechnology Information as part of Bioproject PRJNA750147.

Supplemental material is available at G3 online.


Articles from G3: Genes|Genomes|Genetics are provided here courtesy of Oxford University Press
