Abstract
Lobularia maritima (L.) Desv. is an ornamental plant cultivated across the world. It belongs to the family Brassicaceae and can tolerate dry, poor and contaminated habitats. Here, we present a chromosome-scale, high-quality genome assembly of L. maritima based on an integrated approach combining Illumina short reads and Hi–C chromosome conformation data. The genome was assembled into 12 pseudochromosomes with a total length of 197.70 Mb, and it includes 25,813 protein-coding genes. Approximately 41.94% of the genome consists of repetitive sequences, with abundant long terminal repeat transposable elements. Comparative genomic analysis confirmed that L. maritima underwent a species-specific whole-genome duplication (WGD) event ~22.99 million years ago. We identified ~1900 species-specific genes, 25 expanded gene families, and 50 positively selected genes in L. maritima. Functional annotations of these genes indicated that they are mainly related to stress tolerance. These results provide new insights into the stress tolerance of L. maritima, and this genomic resource will be valuable for further genetic improvement of this important ornamental plant.
Subject terms: Plant stress responses, Plant breeding, Genome
Introduction
Whole-genome duplication (WGD), or polyploidy, has had a strong influence on the evolution of the tree of life, and it seems to have occurred in the evolutionary history of most plant species1,2, especially in angiosperms3. WGDs have been found in most species-rich angiosperm families, including Brassicaceae, Poaceae, Asteraceae, Solanaceae, Fabaceae and Orchidaceae4–11. Previous studies suggested that WGDs can strengthen the adaptation of plants to environmental challenges12 because of genomic reorganization and novelties. Through subfunctionalization or reciprocal loss of duplicated genes in differentiated populations of an ancestral species, WGDs can also promote reproductive isolation and thus facilitate speciation13. Brassicaceae (also known as Cruciferae), a monophyletic group distributed worldwide, has been highly diversified by complicated WGD events and subsequent evolution, with ~350 genera and ~4000 species14,15. It contains many important crops (e.g., cabbage, rapeseed and mustard) that have been domesticated for food, biofuels, and ornamentals16. The well-known model organism Arabidopsis thaliana, which is of paramount importance in studies of the development, gene expression and genome evolution of flowering plants, is also a member of this family17,18. Analyses of the A. thaliana genome have provided clear evidence that three ancient WGD events (γ, β and α) occurred in its evolutionary history. The oldest WGD event, the At-γ event, was related to the diversification of eudicots and perhaps all angiosperms19–21. The At-β event postdated the Brassicaceae–Caricaceae divergence ~70 million years ago (Mya)22,23. The At-α event, in contrast, was specific to the Brassicaceae family19, occurring ~40 Mya24. In addition, independent WGDs more recent than the Neogene may have promoted the colonization of harsh environments by Brassicaceae taxa by increasing their stress tolerance and conferring high adaptability25,26.
However, detailed investigation of WGDs in numerous genera present in arid habitats is still badly needed27,28.
Lobularia maritima (L.) Desv., commonly known as sweet alyssum, is a perennial and diploid (2n = 24) herbaceous plant of the family Brassicaceae. This ornamental plant naturally occurs in the western Mediterranean region and has been widely cultivated since its domestication29,30. Its flowers range in color from pale violet to deep purple31. In addition to tolerating dry and poor habitats, L. maritima is recognized as a nickel hyperaccumulator that can remove heavy metals from contaminated soils32. As a facultative halophyte closely related to Arabidopsis thaliana, L. maritima seems to be an ideal model for revealing the molecular mechanisms underlying plant tolerance to drought and salt stress33. However, studies of L. maritima have focused mainly on its cultivation, management and rapid propagation in vitro29.
In this study, we report a chromosome-scale assembly of the L. maritima genome anchored on 12 pseudochromosomes. We further identified a recent L. maritima-specific WGD event that occurred after the Brassicaceae-specific At-α event using comparative and evolutionary analyses. We also revealed numerous genomic changes by which L. maritima has adapted to harsh habitats.
Results
Genome sequencing and assembly
Samples for genome sequencing were obtained from an L. maritima seedling with purple flowers (Fig. 1a). We obtained 59.77 Gb of clean reads with various insert sizes and 22.31 Gb of Hi–C clean reads (~112.89-fold coverage) after Illumina sequencing and quality control (Supplementary Table 1). Two methods were employed to estimate the genome size of L. maritima. First, we determined the L. maritima genome size to be 225 Mb using flow cytometry with A. thaliana as the external control (Supplementary Fig. 1). Second, we used k-mer-based statistics34, and the genome size was calculated to be 264 Mb (Supplementary Fig. 2).
Fig. 1. Morphological and genomic characteristics of L. maritima.
a Morphological characteristics of L. maritima: schematic representations of (a) aerial parts; (b) abaxial surfaces of leaves; (c) adaxial surfaces of leaves; (d) flowers; (e) petals; and (f) a receptacle and pedicel. b Hi–C chromatin interaction map for the 12 pseudochromosomes of the L. maritima genome. c Genome comparison between L. maritima and C. rubella at the chromosome level: (a) syntenic relationships between the L. maritima and C. rubella genomes; (b) gene density (window size = 100 kb, nonoverlapping); (c) density distribution of Copia elements (window size = 100 kb, nonoverlapping); and (d) density distribution of Gypsy elements (window size = 100 kb, nonoverlapping). d Estimated insertion times of intact LTR retrotransposons
Based on the clean reads, we generated a de novo genome assembly with a total length of 197.70 Mb. We further anchored this assembly on 12 pseudochromosomes (Fig. 1b, Table 1 and Supplementary Fig. 3). We then evaluated the completeness of the assembly using BUSCO v4.1.235 and found that 99% of the single-copy orthologs were intact (Supplementary Table 2), suggesting the high quality of the assembled genome.
Table 1.
Assembly and annotation statistics of the L. maritima genome
| Genome feature | Value |
|---|---|
| Estimated genome size (Mb) | 264 |
| Total scaffold number | 27,734 |
| Total length (bp) | 197,688,650 |
| Total length of chromosomes (bp) | 174,586,151 |
| Longest scaffold length (bp) | 16,503,592 |
| Scaffold L50 | 7 |
| Scaffold N50 length (bp) | 14,943,599 |
| GC content (%) | 36.02 |
| Repeat content (%) | 41.94 |
| Number of predicted genes | 25,813 |
| Average coding sequence length (bp) | 241 |
| Average gene length (bp) | 2431 |
| Number of exons | 140,984 |
| Average number of exons per gene | 5.46 |
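The scaffold N50 and L50 values in Table 1 follow the usual definitions: after sorting scaffolds from longest to shortest, N50 is the length of the scaffold at which the running total first reaches half of the assembly length, and L50 is the number of scaffolds needed to reach that point. The sketch below illustrates the calculation on a toy list of lengths; the function name `n50_l50` and the toy values are hypothetical and are not taken from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths (not from the L. maritima assembly).
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")
```

Applied to the full list of scaffold lengths, the same calculation would be expected to give the values reported in Table 1 (L50 of 7 and N50 of 14,943,599 bp).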
Genome annotation
To predict protein-coding sequences, we combined de novo, homology-based and transcriptome-based methods. We predicted 25,813 complete protein-coding genes. On average, these genes were 2431 base pairs (bp) long and contained 5.46 exons (Table 1). In our assembly, 97.99% (25,295 of 25,813) of the genes were annotated on the 12 pseudochromosomes, and only 2.01% (518 of 25,813) were located on unanchored scaffolds. Circos v0.69 (http://circos.ca) was used to visualize the collinearity blocks between L. maritima and Capsella rubella, gene density, Copia density, and Gypsy density on individual chromosomes (Fig. 1c). Among the 25,813 predicted genes, 81.30% and 95.71% had homologs in the Swiss-Prot36 and TrEMBL36 databases, respectively. Additionally, we annotated 95.15%, 80.65%, and 36.55% of the genes using the InterPro37, Gene Ontology (GO)38 and Kyoto Encyclopedia of Genes and Genomes (KEGG)39 databases, respectively (Supplementary Table 3). In addition, 41.94% (83 Mb) of the assembled L. maritima genome comprised repetitive sequences (Supplementary Table 4). Of these repetitive sequences, long terminal repeat (LTR) retrotransposons were the most frequent, spanning 14.24% of the assembled genome with 13.23% intact LTR retrotransposons. The other common repetitive sequences were DNA transposons (10.06%), tandem repeats (8.56%) and LINEs (5.65%) (Supplementary Tables 4 and 5). To analyze the evolutionary dynamics of these LTRs, we estimated their insertion dates in four related species (A. thaliana, Arabidopsis lyrata, C. rubella and L. maritima). The recent insertions in A. lyrata may have contributed to its relatively large genome size (207 Mb). Similarly, L. maritima had more recent insertions than A. thaliana and C. rubella (Fig. 1d and Supplementary Table 5).
Diverse genetic changes can be caused by transposable elements (including LTR retrotransposons), which might have promoted lineage-specific diversification and adaptation40. This may partly contribute to the tolerance of L. maritima to arid habitats. However, the number of transcription factors in the L. maritima genome (1799) was similar to that in other closely related Brassicaceae species (Supplementary Table 6; all transcription factor data for other species were downloaded from http://www.transcriptionfactor.org).
Comparative genomic analyses and WGD analyses
Using the ColinearScan v1.0.141 program and MCScanX v142 package, the protein sequences of L. maritima were compared to those of the diploid C. rubella, which has not been affected by a recent WGD event, to identify the collinear blocks in the genomes. The whole-genome alignments showed high collinearity and conservation, and several collinear regions almost completely spanned chromosomes of the two species (Fig. 2). It is worth noting that each chromosome or chromosomal region in C. rubella was represented on multiple independent chromosomes in the L. maritima genome after the Brassicaceae-specific At-α WGD event, suggesting that the L. maritima genome experienced a specific WGD event.
Fig. 2. Dotplot comparing the L. maritima and C. rubella genomes.
Collinear regions in the L. maritima genome. Regions from putative subgenomes are circled in purple and green, respectively
Furthermore, we determined the karyotype of L. maritima using previously reported methods28,43 (Supplementary Fig. 4) and recovered two sets of conserved genomic blocks44,45. However, the patterns of genomic blocks suggested that L. maritima experienced many postpolyploid diploidization events and a reduction in chromosome number. We also analyzed the gene retention rates of the two subgenomes in each genomic block with the C. rubella genome as the reference. The results showed that the two subgenomes retained similar numbers of genes (Supplementary Table 7). We also assessed the absence or presence of genome dominance by examining the expression levels of each pair of duplicated genes with high confidence. Based on RNA-seq data from flower, leaf, and stem tissues, we failed to find any evidence of biased expression in each genomic block between the two subgenomes (Supplementary Table 8). These results are largely consistent with the patterns of autopolyploids, which usually show a few instances of biased gene retention and no genome dominance.
Recent WGD event in L. maritima
To identify possible WGD events, we calculated the Ks values between the collinear genes. The L. maritima collinear blocks produced two visible peaks, at 0.583 and 1.287 (Fig. 3a), representing two different WGD events. We then estimated the occurrence times of each WGD event based on the Ks values. However, dating ancestral events in plants can be influenced by divergent evolutionary rates46. Thus, by aligning the L. maritima peak with the corresponding location in the C. rubella Ks distribution, as in a previous report46, we performed evolutionary rate correction (Fig. 3b). After correction, the peaks of Ks for the two WGD events were 0.378 and 0.855, corresponding to 22.99 and 52.01 Mya, respectively (Fig. 3b). The results indicated an ancient WGD event shared with C. rubella and a recent species-specific WGD event in L. maritima. In addition, Ks estimation indicated that C. rubella and L. maritima diverged approximately 21.53 Mya (Fig. 3b). These findings were consistent with those of the synteny and collinearity analyses of L. maritima and C. rubella and suggested that L. maritima experienced a species-specific WGD event after sharing a WGD event with other Brassicaceae species.
Fig. 3. Evolutionary rate correction.
a Distribution of uncorrected Ks values in syntenic blocks; b distribution of corrected Ks values in syntenic blocks and age estimates for the events
Phylogeny and divergence
We obtained the genome sequences of representative Brassicaceae species to clarify the genome evolution and divergence of L. maritima. Gene family clusters were defined based on the L. maritima protein-coding genes and the annotated gene sets of 10 published genomes (Supplementary Table 9) using OrthoFinder v2.3.1247. A total of 25,316 orthogroups were determined across the 11 species. Among these orthogroups, 1,986 were putative single-copy gene families, and 24,705 genes from L. maritima could be clustered into 16,821 orthogroups. In addition, we identified 1878 L. maritima-specific genes in these gene families. Functional annotations of these genes indicated that they were distinctly enriched in the GO terms “positive regulation of response to salt stress”, “abscisic acid-activated signaling pathway”, “response to freezing”, “response to stimulus”, and “response to biotic stimulus”, indicating that the genes retained after the WGD event may be relevant in the adaptation of L. maritima to multiple environmental stress factors (Fig. 4a and Supplementary Tables 10 and 11). For example, a homolog of these L. maritima-specific genes, ABI4, acts as both an activator and a repressor of gene expression and plays a critical role in phytohormone signaling pathways in plant development and biotic/abiotic stress responses48. Another homolog, ABI1, serves as a key repressor of the abscisic acid (ABA) signaling pathway and regulates diverse ABA responses to abiotic stress49,50. The species-specific calcium-dependent protein kinase (CDPK) genes recovered here (Supplementary Table 11) were also demonstrated to be involved in numerous aspects of plant growth and development, from sensing biotic and abiotic stress to mediating hormone-related development51.
Fig. 4. Evolutionary analyses of the L. maritima genome.
a Gene Ontology enrichment of species-specific genes in L. maritima compared to 10 other species. b Estimation of divergence times between 11 species in the Brassicaceae family. The A. thaliana α and β duplication events were estimated to have occurred ~50 and ~60 million years ago, respectively76. c Phylogenetic tree of the KTI gene family constructed from sequences of L. maritima, A. thaliana, and S. parvula, indicating the expanded gene copies in L. maritima
To verify the phylogenetic position of L. maritima, we used the concatenated protein sequence alignment of the 1986 single-copy gene families in the 11-species phylogenetic analyses. The results confirmed that L. maritima belonged to Lineage II11,52 (Fig. 4b), consistent with its position in the chloroplast genome phylogeny reported previously53. In our analyses performed using MCMCtree54, L. maritima was estimated to have diverged from the other closely related species ~22.63 (18.74, 26.61) Mya (Fig. 4b).
Expansion and contraction of gene families in L. maritima
Gene families with significantly expanded or contracted copy numbers are usually related to the adaptive divergence of one species from closely related species55,56. We compared the genomes of L. maritima and 10 other species, with Aethionema arabicum as the outgroup (Fig. 4b), to explore the expansion and contraction of the gene families in L. maritima. Twenty-five gene families, comprising 319 genes, were significantly expanded in L. maritima (P < 0.05). Functional annotation of these genes indicated that they were mainly enriched in “response to molecule of bacterial origin”, “response to insect”, “response to molecule of fungal origin”, “response to wounding” and “response to salt stress” (Supplementary Tables 12 and 13). For example, one of the expanded gene families, the KTI gene family, comprised versatile protease inhibitors related to defense against insect attack (Fig. 4c)57. In addition, the HIPP gene family, involved in stress responses58, was also greatly expanded in L. maritima.
Positively selected genes in L. maritima
Genes with signs of positive selection are usually regarded as being involved in the adaptive divergence of one species from closely related species59. We conducted positive selection analysis by using L. maritima as the foreground branch and five related Brassicaceae species (Eutrema yunnanense, C. rubella, A. arabicum, A. lyrata, and Schrenkiella parvula) as the background branches. We identified 10,581 single-copy orthologous gene families. To identify the genes that evolved in response to positive selection, we adopted the branch-site model in the PAML v4.9 package54. After false discovery rate (FDR) correction, we identified 50 genes that were possibly under positive selection. The functions of the significantly positively selected genes (PSGs) indicated that they were associated with stress tolerance and the survival of plants (Fig. 5 and Supplementary Table 14). For example, SGT1B is involved in innate immunity and resistance in plants mediated by multiple R genes60–63. YchF1 is involved in salinity stress tolerance and disease resistance against bacterial pathogens64. EIF4A3 is an important factor for abiotic stress adaptation and can regulate plant resistance to abiotic stress partially by regulating the expression of acetoacetyl-CoA thiolase 265.
Fig. 5. Genes in the L. maritima genome associated with various biotic and abiotic stimuli.
Gene IDs for the gene names are listed below. ABI1: Lma13276; ABI4: Lma14310, Lma25376; EFR: Lma05693, Lma14720, Lma21740; EIF4A3: Lma15462; ERD15: Lma14373, Lma26266; ERF012: Lma17999; HIPP06: Lma19275; HIPP14: Lma12975; HIPP33: Lma25212; HIPP43: Lma16268; KTI1: Lma22625, Lma04403, Lma25677, Lma22622, Lma22621, Lma04406, Lma22624, Lma22623, Lma25676, Lma04405; MYB96: Lma13875; SGT1B: Lma21436; PME1: Lma13269; ROSY1: Lma11854; RPP13L4: Lma09152, Lma03536, Lma21662, Lma03543, Lma09149, Lma21664, Lma09145; RTM3: Lma20357, Lma22531, Lma20352, Lma10871, Lma08006, Lma20353; TIL: Lma24858, Lma24859; DSC2: Lma14716, Lma17306; YchF1: Lma17801. Abbreviations: ABI ABA insensitive; EFR EF-TU receptor; EIF eukaryotic initiation factor; ERD early responsive to dehydration stress; ERF ethylene-responsive factor; HIPP heavy metal-associated isoprenylated plant protein; KTI Kunitz trypsin inhibitor; MYB myeloblastosis oncogene; PME pectin methyl esterase; ROSY interactor of synaptotagmin; RPP resistance to P. pachyrhizi; RTM restricted tobacco etch virus movement; TIL temperature-induced lipocalin; DSC2 desmocollin-2
Discussion
L. maritima is an important ornamental plant in horticulture because of its colorful flowers and stress tolerance. In this study, by combining Illumina and Hi–C data, a chromosome-level high-quality L. maritima genome was assembled. The L. maritima genome was ~197.70 Mb in size, and 88.31% (174.59 Mb) of the sequences were assigned to 12 pseudochromosomes. We annotated 25,813 genes and found substantially more repetitive elements (especially intact LTR retrotransposons) in the L. maritima genome than in the genomes of other Brassicaceae species. In addition, most intact LTR retrotransposons expanded rapidly in the recent past. Such proliferation of LTR retrotransposons may have partly resulted in the increased genome size of L. maritima. Phylogenetic reconstructions showed that L. maritima diverged early as an independent branch of Brassicaceae Lineage II.
WGDs have been discovered in the histories of many diverse eukaryotes, including Danio rerio66, Saccharomyces cerevisiae67, and A. thaliana68–71. Through large-scale phylogenomic analyses, ancient WGDs were found to occur in the common ancestors of both seed plants and angiosperms4,9,71,72. WGDs have played an essential role in angiosperm diversification and environmental adaptation9. Polyploids can tolerate high environmental stress, with present-day polyploids often appearing to occur at high frequencies in disturbed and harsh environments73–75. Under environmental stresses, polyploids may have been more successful because their changing environments created many opportunities to make use of the evolutionary benefits of WGDs76. The comparison of L. maritima and the diploid C. rubella indicated a recent WGD event that was specific to L. maritima, followed by extensive chromosomal rearrangements. Furthermore, we evaluated whether biased gene retention occurred after the WGD event. The two subgenomes retained similar numbers of genes, and neither subgenome showed genome dominance. This indicates that L. maritima might have undergone an autopolyploidization event. Analysis of the Ks values between the collinear genes suggested that the recent L. maritima-specific WGD event occurred ~22.99 Mya. The comparison of between-species Ks distributions indicated that the L. maritima–C. rubella divergence occurred ~21.53 Mya. Thus, this divergence and the aforementioned L. maritima-specific WGD event occurred at almost the same time. L. maritima and C. rubella belong to two major lineages, and it is highly likely that the divergence of the two major lineages and genus diversification of each lineage in Brassicaceae occurred radiatively at the same time. This rapid radiation was accompanied by polyploidy in a few of the genera.
This is also consistent with the previous suggestion that further WGDs might have occurred in Brassicaceae since the Neogene, with radiative diversification, which further helped members of this family colonize arid habitats by increasing their stress tolerance26. As a result of the WGD event, species-specific genes and expanded gene families became further involved in responses to environmental stresses, for example, drought and pathogen attack, which might have facilitated the adaptation of L. maritima to harsh environments. In addition, the positively selected genes in L. maritima may have enhanced defense against fungal and bacterial attack. Thus, the species-specific WGD event may have promoted the adaptation of L. maritima to harsh environments, which is consistent with previous findings for numerous plants76,77. These genomic traits may also explain why L. maritima is a nickel hyperaccumulator32 and a halophyte with a high tolerance to salt stress33. Overall, whole-genome sequencing of L. maritima could elucidate the stress tolerance of this ornamental plant and be useful in future breeding programs.
Materials and methods
Materials and DNA/RNA extraction
The L. maritima seedling was cultivated in Jinjiang District, Chengdu City, Sichuan Province, China (N 30°34′21.86″, E 104°09′45.47″). We harvested fresh and healthy roots, stems, leaves and flowers and immediately froze them in liquid nitrogen. Before DNA/RNA extraction, we stored these tissues in a −80 °C freezer in the laboratory. To extract high-quality genomic DNA, the cetyl trimethylammonium bromide (CTAB)78 method was used. Additionally, we extracted total RNA from the flower, stem and leaf tissues using Qiagen RNeasy Plant Mini Kits.
Library construction and sequencing
We randomly fragmented the purified genomic DNA using a focused ultrasonicator and obtained fragments of desired lengths by electrophoresing the DNA fragments in 0.8% General Purpose Agarose E-Gel. Then, we created Illumina libraries with large (2-, 5-, 10- and 20-kb) and small (350- and 500-bp) inserts using the purified DNA fragments. Based on the PE-150 protocol, the libraries were finally sequenced on an Illumina HiSeq 2000 platform. RNA libraries were constructed with a TruSeq RNA Library Preparation Kit v2 and sequenced on the same platform.
A Hi–C library was constructed using five main steps. First, we fixed the sample with formaldehyde to crosslink DNA–DNA interactions that are bridged by proteins. Second, the crosslinked DNA was treated with the restriction endonuclease HindIII to produce sticky ends. Third, terminal DNA repair was used to introduce biotin-labeled bases in order to facilitate subsequent DNA purification and capture. Fourth, we ensured the location of the interacting DNA through cyclization of the end-repaired DNA and DNA fragments. Finally, we extracted and purified the DNA sample and then used Covaris S2 to shear the DNA sample. After A-tailing, pulldown, and adapter ligation, the DNA library was sequenced on an Illumina platform using the PE-150 protocol. We used HiC-Pro v2.8.179 to remove duplicates and then assessed quality. After trimming low-quality reads and removing adapters, more than 22.31 Gb (~112.89-fold coverage) of clean data was generated. Then, all clean data were submitted to the 3D-DNA v180419 pipeline80.
Genome assembly
Approximately 79.49 Gb of raw reads was generated by sequencing all six DNA libraries. These raw reads were filtered following a previous study81. We first used Trimmomatic v0.3382 to perform quality filtering of short reads. We then used the BFC error corrector83 followed by FastUniq v1.184 to delete duplicates in the mate pair data. The resultant reads produced approximately 59.77 Gb of clean data (Supplementary Table 1).
We used Platanus v1.2.485 software to perform de novo assembly of the L. maritima genome. Thereafter, the Hi–C clean reads were aligned to the draft assembly using the Juicer v1.6.2 pipeline86, and the draft assembly was scaffolded using the 3D-DNA v180419 pipeline80. We then used Juicebox Assembly Tools87 to polish the results from the 3D-DNA v180419 pipeline. The Hi–C scaffolds were anchored on 12 pseudochromosomes. In total, 88.31% of the assembled sequences were anchored to the pseudochromosomes. In addition, we assessed the quality of the assembled genome using the BUSCO v4.1.235 pipeline (database: embryophyta_odb10, 2020-09-02, containing 1614 BUSCO genes).
Repeat element annotation
Repeat elements were identified with the RepeatMasker v4.0.788 and RepeatModeler v1.0.1189 programs using the assembled L. maritima genome as the input. We also identified intact LTR retrotransposons by searching the L. maritima genome using LTRharvest v1.5.1090 and LTR_Finder v1.0691. We further combined these results using LTR_retriever v1.992. We also estimated insertion time according to a substitution rate of $7 \times 10^{-9}$ per site per year.
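As a rough illustration of how an insertion time follows from the stated substitution rate, the sketch below assumes the widely used relation T = K/(2μ), where K is the divergence between the two LTRs of one element and μ is the substitution rate. It further assumes that K is the Jukes–Cantor-corrected distance computed from LTR identity. The text does not state which distance model was used, so this choice, the function names and the example identity of 0.98 are all assumptions.

```python
import math

MU = 7e-9  # substitutions per site per year, as stated in the text


def jc69_distance(p):
    """Jukes-Cantor corrected distance from the proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_mya(ltr_identity, mu=MU):
    """Insertion time in million years, assuming T = K / (2 * mu)."""
    k = jc69_distance(1 - ltr_identity)
    return k / (2 * mu) / 1e6


# Hypothetical element whose two LTRs are 98% identical.
print(f"{insertion_time_mya(0.98):.2f} Mya")
```

Under these assumptions, elements with nearly identical LTRs are dated to the recent past, which is the pattern summarized for L. maritima in Fig. 1d.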
Gene prediction and annotation
To predict genes in the L. maritima genome, we first assembled transcripts using the de novo and genome-guided modes in Trinity v2.6.693. Then, these transcripts were used to create transcript-based predictions with the PASA v2.1.0 pipeline94. We also carried out homolog predictions. In such predictions, the protein sequences of A. thaliana, A. arabicum, A. lyrata, Eutrema yunnanense, Brassica rapa, Sisymbrium irio, C. rubella, Tarenaya hassleriana, Leavenworthia alabamica and Carica papaya were mapped to the L. maritima genome using Exonerate v2.2.0 (https://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate). GlimmerHMM v3.0.495 and Augustus v3.2.296 were trained with genes from the PASA results and used for de novo gene prediction. We merged the gene models from the three sources using EVidenceModeler v1.1.197. To annotate the functions of all predicted genes, we aligned the protein sequences of L. maritima to Swiss-Prot and TrEMBL36 using blastp and generated functional assignments based on the best hit. Protein domains were determined by searching against the InterPro37 database. In addition, Blast2GO v2.598 was used to identify the Gene Ontology38 annotations and KEGG39 pathways using the KAAS server (https://www.genome.jp/kegg/kaas).
Synteny and WGD
To construct syntenic blocks between L. maritima and C. rubella, all protein sequences of L. maritima were compared to protein sequences of C. rubella. The gene pairs with an e-value ≤ 1e-5 were further analyzed. We applied the ColinearScan v1.0.141 program, which can effectively evaluate genomic blocks of collinear genes, and the MCScanX v1 package42 to find the syntenic blocks between the C. rubella and L. maritima genomes. Thereafter, we used these collinear gene pairs to construct a dotplot. Next, we used the script “add_ka_and_ks_to_collinearity.pl” in MCScanX to calculate the Ks values of the collinear orthologous gene pairs. We converted the Ks values to divergence times ($T$) based on $T = K_s/(2r)$, where $r$ is the neutral substitution rate ($8.22 \times 10^{-9}$). Finally, we performed evolutionary rate correction because of the inconsistent evolutionary rates among species. The evolutionary rate correction method was as reported by Wang et al.46. Briefly, under the assumption that the C. rubella peak appears at $k_C$ and the L. maritima peak appears at $k_L$, we can use the equation $r = (k_L - k_C)/k_C$ to describe the relative evolutionary rate of L. maritima (this $r$ is distinct from the neutral substitution rate used above). Then, rate correction was performed to obtain the corrected rate $k_{L,\mathrm{correction}}$ of L. maritima relative to $k_C$: (1) For the Ks between duplicates in L. maritima, we defined the correction coefficient $W_L$ as $k_{L,\mathrm{correction}}/k_L = k_C/k_L = W_L$; thus, we obtained $k_{L,\mathrm{correction}} = (k_C/k_L) \times k_L = k_L/(1 + r)$ and $W_L = 1/(1 + r)$. (2) For the Ks between homologous genes from C. rubella and L. maritima, if the peak was located at $k_{L\text{-}C}$, we applied the same correction coefficient $W_L$ and calculated a corrected evolutionary rate $k_{L\text{-}C,\mathrm{correction}} = W_L \times k_{L\text{-}C}$.
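The dating and rate-correction arithmetic described above can be sketched in a few lines. The conversion uses the stated rate of 8.22 × 10⁻⁹ and reproduces the ages reported for the corrected Ks peaks (0.378 and 0.855). The correction functions follow the definitions of the relative rate and of the coefficient W_L = 1/(1 + r) given in the text, but the peak positions passed to them here are hypothetical values chosen only to show the mechanics. The function names are likewise illustrative and are not taken from the authors' scripts.

```python
NEUTRAL_RATE = 8.22e-9  # neutral substitution rate used in the text


def ks_to_mya(ks, rate=NEUTRAL_RATE):
    """Convert a Ks value to a divergence time in million years: T = Ks / (2r)."""
    return ks / (2 * rate) / 1e6


def correction_coefficient(k_l, k_c):
    """W_L = 1 / (1 + r), with relative rate r = (k_L - k_C) / k_C."""
    relative_rate = (k_l - k_c) / k_c
    return 1 / (1 + relative_rate)


def correct_ks(ks, k_l, k_c):
    """Scale a Ks value by the correction coefficient W_L."""
    return correction_coefficient(k_l, k_c) * ks


# Corrected peaks reported in the text.
for peak in (0.378, 0.855):
    print(f"Ks = {peak}: {ks_to_mya(peak):.2f} Mya")

# Hypothetical peak positions (k_L = 1.0, k_C = 0.8), for illustration only.
print(f"corrected Ks = {correct_ks(1.0, k_l=1.0, k_c=0.8):.3f}")
```

Because W_L simplifies to k_C/k_L, correcting the L. maritima peak itself always returns k_C, which is a convenient sanity check on any implementation.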
Phylogeny and divergence
The genomes of L. maritima and 10 other species (A. arabicum, B. rapa, L. alabamica, E. yunnanense, S. irio, A. thaliana, A. lyrata, C. rubella, S. parvula and Thlaspi arvense) were selected to generate clusters of gene families. For genes with alternative splicing variants, we retained only the longest protein sequence to remove redundancy. Using OrthoFinder v2.3.1247, we obtained orthologous gene families. Protein sequences from 1986 single-copy gene families were used to construct a phylogenetic tree. MAFFT v7.31399 software was used for sequence alignment of each single-copy gene family with default settings. A phylogenetic tree was built using RAxML v8.0.0100 under the PROTGAMMALGX model, and divergence times were calculated using the MCMCTree program of the PAML v4.9 package54. The calibration information for MCMCTree was extracted based on the TimeTree database101 (http://www.time.org/).
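The step between per-family alignment and tree building is the concatenation of the single-copy alignments into one supermatrix. The sketch below shows one minimal way to do this with plain dictionaries. It is not the authors' script: the function name, the taxon labels and the toy sequences are hypothetical. Filling absent taxa with gaps is a defensive assumption that should not be triggered for strictly single-copy families.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-family alignments into one supermatrix row per taxon.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    taxa: ordered list of taxon names expected in the supermatrix.
    """
    rows = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            # A taxon missing from this family is padded with gap characters.
            rows[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Hypothetical toy alignments for two gene families and two taxa.
families = [
    {"Lmar": "MK-", "Crub": "MKL"},
    {"Lmar": "GG", "Crub": "G-"},
]
supermatrix = concatenate_alignments(families, ["Lmar", "Crub"])
for taxon, row in supermatrix.items():
    print(taxon, row)
```

Keeping the taxon order fixed across families ensures that every column of the supermatrix compares homologous positions from the same set of species.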
Gene family expansion and contraction
Based on the dated phylogeny, we determined the expansions and contractions of orthologous gene families in the 11 Brassicaceae species (A. arabicum, B. rapa, L. alabamica, E. yunnanense, S. irio, T. arvense, C. rubella, A. thaliana, A. lyrata, S. parvula, and L. maritima) by using the CAFÉ v4.2102 program. Genes in significantly expanded families were then used for Gene Ontology enrichment analysis.
Genes under positive selection
We selected six genomes, i.e., those of A. arabicum, A. lyrata, C. rubella, E. yunnanense, S. parvula and L. maritima, to identify orthologs for analyzing positive selection. First, Proteinortho v6.0.21103 was used to detect orthologs among the six genomes. Next, we used the PosiGene v0.1104 pipeline for genome-wide detection of the genes with positive selection and specified the L. maritima clade as the foreground branch. Finally, PSGs were identified based on an FDR-corrected P value < 0.05.
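The final filtering step, selecting genes with an FDR-corrected P value below 0.05, can be illustrated with a short sketch. It assumes the Benjamini–Hochberg procedure, which the text does not name explicitly. The function name and the P values are hypothetical and do not come from the PosiGene output.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values for four genes.
raw = [0.001, 0.8, 0.02, 0.03]
adjusted = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, selected)
```

In this toy example three of the four genes pass the threshold; with real data the same filter would be applied to the branch-site test P values across all tested orthologous families.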
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2017YFC0505203), National Natural Science Foundation of China (31590821), Fundamental Research Funds for the Central Universities (2018CDDY-S02-SCU and SCU2019D013), National High-Level Talents Special Support Plan (10 Thousand of People Plan), and 985 and 211 Projects of Sichuan University.
Author contributions
Q.H., J.L., and Z.X. designed the research; L.Z. collected the materials and performed genome sequencing; L.H., Y.M., W.Y., T.L., J.J., L.W., and L.F. conducted the genome assembly, annotation and evolution-related data analysis; L.H., Q.H., and J.L. wrote the paper. All authors read and approved the final paper.
Data availability
Raw Illumina short reads and Hi–C reads used for de novo whole-genome assembly have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive database under accession number PRJNA630530. The genome and related annotation data have been deposited in the National Genomics Data Center (PRJCA002888).
Conflict of interest
The authors declare that they have no conflict of interest.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Li Huang, Yazhen Ma
Supplementary information
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41438-020-00422-w.
References
- 1. Schranz ME, Mitchell-Olds T. Independent ancient polyploidy events in the sister families Brassicaceae and Cleomaceae. Plant Cell. 2006;18:1152–1165. doi: 10.1105/tpc.106.041111.
- 2. Van de Peer Y, Mizrachi E, Marchal K. The evolutionary significance of polyploidy. Nat. Rev. Genet. 2017;18:411–424. doi: 10.1038/nrg.2017.26.
- 3. Adams KL, Wendel JF. Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 2005;8:135–141. doi: 10.1016/j.pbi.2005.01.001.
- 4. Blanc G, Wolfe KH. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004;16:1667–1678. doi: 10.1105/tpc.021345.
- 5. Paterson AH, Bowers JE, Chapman BA. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl Acad. Sci. USA. 2004;101:9903–9908. doi: 10.1073/pnas.0307901101.
- 6. Bertioli DJ, et al. An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics. 2009;10:45. doi: 10.1186/1471-2164-10-45.
- 7. Tang H, Bowers JE, Wang X, Paterson AH. Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl Acad. Sci. USA. 2010;107:472–477. doi: 10.1073/pnas.0908007107.
- 8. Jiao Y, Li J, Tang H, Paterson AH. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell. 2014;26:2792–2802. doi: 10.1105/tpc.114.127597.
- 9. Jiao Y, et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011;473:97–100. doi: 10.1038/nature09916.
- 10. Cai J, et al. The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 2015;47:65. doi: 10.1038/ng.3149.
- 11. Huang C, et al. Resolution of Brassicaceae phylogeny using nuclear genes uncovers nested radiations and supports convergent morphological evolution. Mol. Biol. Evol. 2016;33:394–412. doi: 10.1093/molbev/msv226.
- 12. Hegarty MJ, Hiscock SJ. Genomic clues to the evolutionary success of polyploid plants. Curr. Biol. 2008;18:R435–R444. doi: 10.1016/j.cub.2008.03.043.
- 13. Sémon M, Wolfe KH. Reciprocal gene loss between Tetraodon and zebrafish after whole genome duplication in their ancestor. Trends Genet. 2007;23:108–112. doi: 10.1016/j.tig.2007.01.003.
- 14. Perumal S, et al. Elucidating the major hidden genomic components of the A, C, and AC genomes and their influence on Brassica evolution. Sci. Rep. 2017;7:1–12. doi: 10.1038/s41598-016-0028-x.
- 15. Kiefer M, et al. BrassiBase: introduction to a novel knowledge database on Brassicaceae evolution. Plant Cell Physiol. 2014;55:e3. doi: 10.1093/pcp/pct158.
- 16. Appel O, Al-Shehbaz IA. Cruciferae. In: Flowering Plants · Dicotyledons. Springer; 2003.
- 17. Al-Shehbaz IA, Beilstein MA, Kellogg EA. Systematics and phylogeny of the Brassicaceae (Cruciferae): an overview. Plant Syst. Evol. 2006;259:89–120. doi: 10.1007/s00606-006-0415-z.
- 18. O’Kane SL Jr. Brassicaceae, molecular systematics and evolution of. Brenner’s Encycl. Genet. Second Ed. 2013:374–376. doi: 10.1016/B978-0-12-374984-0.00169-8.
- 19. Blanc G, Hokamp K, Wolfe KH. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003;13:137–144. doi: 10.1101/gr.751803.
- 20. De Bodt S, Maere S, Van de Peer Y. Genome duplication and the origin of angiosperms. Trends Ecol. Evol. 2005;20:591–597. doi: 10.1016/j.tree.2005.07.008.
- 21. Soltis DE, et al. Polyploidy and angiosperm diversification. Am. J. Bot. 2009;96:336–348. doi: 10.3732/ajb.0800079.
- 22. Ming R, et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008;452:991–996. doi: 10.1038/nature06856.
- 23. Tang H, et al. Synteny and collinearity in plant genomes. Science. 2008;320:486–488. doi: 10.1126/science.1153917.
- 24. Fawcett JA, Maere S, Van de Peer Y. Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proc. Natl Acad. Sci. USA. 2009;106:5737–5742. doi: 10.1073/pnas.0900906106.
- 25. Wang X, et al. The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 2011;43:1035. doi: 10.1038/ng.919.
- 26. Kagale S, et al. Polyploid evolution of the Brassicaceae during the Cenozoic era. Plant Cell. 2014;26:2777–2791. doi: 10.1105/tpc.114.126391.
- 27. Guo X, et al. The genomes of two Eutrema species provide insight into plant adaptation to high altitudes. DNA Res. 2018;25:307–315. doi: 10.1093/dnares/dsy003.
- 28. Kang M, et al. A chromosome-scale genome assembly of Isatis indigotica, an important medicinal plant used in traditional Chinese medicine. Hortic. Res. 2020;7:1–10. doi: 10.1038/s41438-020-0240-5.
- 29. Huang R, et al. Artificially induced polyploidization in Lobularia maritima (L.) Desv. and its effect on morphological traits. HortScience. 2015;50:636–639. doi: 10.21273/HORTSCI.50.5.636.
- 30. Gómez JM, et al. Phenotypic selection and response to selection in Lobularia maritima: importance of direct and correlational components of natural selection. J. Evol. Biol. 2000;13:689–699. doi: 10.1046/j.1420-9101.2000.00196.x.
- 31. Polunin O, Everard B. Flowers of Europe: A Field Guide. Oxford: Oxford University Press; 1969.
- 32. Yuan XY, Zhang XY, Ma J, Hou XF. Tissue culture in vitro and establishment of regeneration system of Lobularia maritima. North. Hort. 2010;8:145–146.
- 33. Popova OV, Golldack D. In the halotolerant Lobularia maritima (Brassicaceae) salt adaptation correlates with activation of the vacuolar H+-ATPase and the vacuolar Na+/H+ antiporter. J. Plant Physiol. 2007;164:1278–1288. doi: 10.1016/j.jplph.2006.08.011.
- 34. Liu B, et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quant. Biol. 2013;35:62–67.
- 35. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
- 36. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000;28:45–48. doi: 10.1093/nar/28.1.45.
- 37. Hunter S, et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 2008;37:D211–D215. doi: 10.1093/nar/gkn785.
- 38. Ashburner M, et al. Gene ontology: tool for the unification of biology. Nat. Genet. 2000;25:25. doi: 10.1038/75556.
- 39. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- 40. Oliver KR, McComb JA, Greene WK. Transposable elements: powerful contributors to angiosperm evolution and diversity. Genome Biol. Evol. 2013;5:1886–1901. doi: 10.1093/gbe/evt141.
- 41. Wang X, et al. Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice. BMC Bioinform. 2006;7:447. doi: 10.1186/1471-2105-7-447.
- 42. Wang Y, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49. doi: 10.1093/nar/gkr1293.
- 43. Lysak MA, Mandáková T, Schranz ME. Comparative paleogenomics of crucifers: ancestral genomic blocks revisited. Curr. Opin. Plant Biol. 2016;30:108–115. doi: 10.1016/j.pbi.2016.02.001.
- 44. Mandáková T, Guo X, Özüdoğru B, Mummenhoff K, Lysak MA. Hybridization-facilitated genome merger and repeated chromosome fusion after 8 million years. Plant J. 2018;96:748–760. doi: 10.1111/tpj.14065.
- 45. Mandáková T, Lysak MA. Chromosomal phylogeny and karyotype evolution in x = 7 crucifer species (Brassicaceae). Plant Cell. 2008;20:2559–2570. doi: 10.1105/tpc.108.062166.
- 46. Wang J, et al. Two likely auto-tetraploidization events shaped kiwifruit genome and contributed to establishment of the Actinidiaceae family. iScience. 2018;7:230–240. doi: 10.1016/j.isci.2018.08.003.
- 47. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015;16:157. doi: 10.1186/s13059-015-0721-2.
- 48. Chandrasekaran U, Luo X, Zhou W, Shu K. Multifaceted signaling networks mediated by abscisic acid insensitive 4. Plant Commun. 2020;1:100040. doi: 10.1016/j.xplc.2020.100040.
- 49. Harb A, Krishnan A, Ambavaram MMR, Pereira A. Molecular and physiological analysis of drought stress in Arabidopsis reveals early responses leading to acclimation in plant growth. Plant Physiol. 2010;154:1254–1271. doi: 10.1104/pp.110.161752.
- 50. Kariola T, et al. Early responsive to dehydration 15, a negative regulator of abscisic acid responses in Arabidopsis. Plant Physiol. 2006;142:1559–1573. doi: 10.1104/pp.106.086223.
- 51. Boudsocq M, Sheen J. CDPKs in immune and stress signaling. Trends Plant Sci. 2013;18:30–40. doi: 10.1016/j.tplants.2012.08.008.
- 52. Beilstein MA, Al-Shehbaz IA, Kellogg EA. Brassicaceae phylogeny and trichome evolution. Am. J. Bot. 2006;93:607–619. doi: 10.3732/ajb.93.4.607.
- 53. Guo X, et al. Plastome phylogeny and early diversification of Brassicaceae. BMC Genomics. 2017;18:176. doi: 10.1186/s12864-017-3555-3.
- 54. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088.
- 55. Dassanayake M, et al. The genome of the extremophile crucifer Thellungiella parvula. Nat. Genet. 2011;43:913–918. doi: 10.1038/ng.889.
- 56. Sudmant PH, et al. Diversity of human copy number variation and multicopy genes. Science. 2010;330:641–646. doi: 10.1126/science.1197005.
- 57. Arnaiz A, et al. Arabidopsis kunitz trypsin inhibitors in defense against spider mites. Front. Plant Sci. 2018;9:986. doi: 10.3389/fpls.2018.00986.
- 58. Zschiesche W, et al. The zinc-binding nuclear protein HIPP3 acts as an upstream regulator of the salicylate-dependent plant immunity pathway and of flowering time in Arabidopsis thaliana. New Phytol. 2015;207:1084–1096. doi: 10.1111/nph.13419.
- 59. Fitch WM. Distinguishing homologous from analogous proteins. Syst. Zool. 1970;19:99–113. doi: 10.2307/2412448.
- 60. Azevedo C, et al. Role of SGT1 in resistance protein accumulation in plant immunity. EMBO J. 2006;25:2007–2016. doi: 10.1038/sj.emboj.7601084.
- 61. Tör M, et al. Arabidopsis SGT1b is required for defense signaling conferred by several downy mildew resistance genes. Plant Cell. 2002;14:993–1003. doi: 10.1105/tpc.001123.
- 62. Holt BF. Antagonistic control of disease resistance protein stability in the plant immune system. Science. 2005;309:929–932. doi: 10.1126/science.1109977.
- 63. Austin MJ. Regulatory role of SGT1 in early R gene-mediated plant defenses. Science. 2002;295:2077–2080. doi: 10.1126/science.1067747.
- 64. Cheung M-Y, et al. ATP binding by the P-loop NTPase OsYchF1 (an unconventional G protein) contributes to biotic but not abiotic stress responses. Proc. Natl Acad. Sci. USA. 2016;113:2648–2653. doi: 10.1073/pnas.1522966113.
- 65. Pascuan C, Frare R, Alleva K, Ayub ND, Soto G. mRNA biogenesis-related helicase eIF4AIII from Arabidopsis thaliana is an important factor for abiotic stress adaptation. Plant Cell Rep. 2016;35:1205–1208. doi: 10.1007/s00299-016-1947-5.
- 66. Postlethwait JH, et al. Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res. 2000;10:1890–1902. doi: 10.1101/gr.164800.
- 67. Wolfe KH, Shields DC. Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997;387:708. doi: 10.1038/42711.
- 68. Blanc G, Barakat A, Guyot R, Cooke R, Delseny M. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell. 2000;12:1093–1101. doi: 10.1105/tpc.12.7.1093.
- 69. Vision TJ, Brown DG, Tanksley SD. The origins of genomic duplications in Arabidopsis. Science. 2000;290:2114–2117. doi: 10.1126/science.290.5499.2114.
- 70. Simillion C, Vandepoele K, Van Montagu MCE, Zabeau M, Van de Peer Y. The hidden duplication past of Arabidopsis thaliana. Proc. Natl Acad. Sci. USA. 2002;99:13627–13632. doi: 10.1073/pnas.212522399.
- 71. Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–438. doi: 10.1038/nature01521.
- 72. Doyle JJ, et al. Evolutionary genetics of genome merger and doubling in plants. Annu. Rev. Genet. 2008;42:443–461. doi: 10.1146/annurev.genet.42.110807.091524.
- 73. Madlung A. Polyploidy and its effect on evolutionary success: old questions revisited with new tools. Heredity. 2013;110:99. doi: 10.1038/hdy.2012.79.
- 74. Ramsey J. Polyploidy and ecological adaptation in wild yarrow. Proc. Natl Acad. Sci. USA. 2011;108:7096–7101. doi: 10.1073/pnas.1016631108.
- 75. Diallo AM, Nielsen LR, Kjær ED, Petersen KK, Ræbild A. Polyploidy can confer superiority to West African Acacia senegal (L.) Willd. trees. Front. Plant Sci. 2016;7:821. doi: 10.3389/fpls.2016.00821.
- 76. Vanneste K, Baele G, Maere S, Van de Peer Y. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary. Genome Res. 2014;24:1334–1347. doi: 10.1101/gr.168997.113.
- 77. Vanneste K, Maere S, Van de Peer Y. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos. Trans. R. Soc. B Biol. Sci. 2014;369:20130353. doi: 10.1098/rstb.2013.0353.
- 78. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 1987;19:11–15.
- 79. Servant N, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x.
- 80. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
- 81. Wu H, et al. A high-quality Actinidia chinensis (kiwifruit) genome. Hortic. Res. 2019;6:1–9. doi: 10.1038/s41438-018-0066-6.
- 82. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 83. Li H. BFC: correcting Illumina sequencing errors. Bioinformatics. 2015;31:2885–2887. doi: 10.1093/bioinformatics/btv290.
- 84. Xu H, et al. FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS ONE. 2012;7:e52249. doi: 10.1371/journal.pone.0052249.
- 85. Kajitani R, et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24:1384–1395. doi: 10.1101/gr.170720.113.
- 86. Durand NC, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
- 87. Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.
- 88. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. In: Current Protocols in Bioinformatics. John Wiley & Sons, Inc.; 2002.
- 89. Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 2005;110:462–467. doi: 10.1159/000084979.
- 90. Ellinghaus D, Kurtz S, Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 2008;9:18. doi: 10.1186/1471-2105-9-18.
- 91. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–W268. doi: 10.1093/nar/gkm286.
- 92. Ou S, Jiang N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 2018;176:1410–1422. doi: 10.1104/pp.17.01310.
- 93. Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
- 94. Haas BJ, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–5666. doi: 10.1093/nar/gkg770.
- 95. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315.
- 96. Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32:W309–W312. doi: 10.1093/nar/gkh379.
- 97. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:1–22. doi: 10.1186/gb-2008-9-1-r7.
- 98. Conesa A, et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–3676. doi: 10.1093/bioinformatics/bti610.
- 99. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010.
- 100. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 101. Hedges SB, Dudley J, Kumar S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22:2971–2972. doi: 10.1093/bioinformatics/btl505.
- 102. De Bie T, Cristianini N, Demuth JP, Hahn MW. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006;22:1269–1271. doi: 10.1093/bioinformatics/btl097.
- 103. Lechner M, et al. Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinform. 2011;12:124. doi: 10.1186/1471-2105-12-124.
- 104. Sahm A, Bens M, Platzer M, Szafranski K. PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes. Nucleic Acids Res. 2017;45:e100. doi: 10.1093/nar/gkx179.