Plant Biotechnology Journal. 2023 Jul 3;21(10):2084–2099. doi: 10.1111/pbi.14115

Decoupling subgenomes within hybrid lavandin provide new insights into speciation and monoterpenoid diversification of Lavandula

Jingrui Li 1,2, Hui Li 1,2, Yiming Wang 3, Wenying Zhang 1,2, Di Wang 1,2, Yanmei Dong 1,2, Zhengyi Ling 1,2, Hongtong Bai 1,2, Xiaohua Jin 2,4, Xiaodi Hu 3, Lei Shi 1,2
PMCID: PMC10502749  PMID: 37399213

Summary

Polyploidization and transposable elements help shape plant genome diversity and secondary metabolic variation in some edible crops. However, the specific contribution of these variations to the chemo‐diversity of Lamiaceae, particularly in economic shrubs, is still poorly documented. The rich essential oils (EOs) of Lavandula plants are distinguished by monoterpenoids among the main EO‐producing species, L. angustifolia (LA), L. × intermedia (LX) and L. latifolia (LL). Herein, the first allele‐aware chromosome‐level genome was assembled using the lavandin cultivar ‘Super’, and its hybrid origin was verified by two complete subgenomes (LX‐LA and LX‐LL). Genome‐wide phylogenetics confirmed that LL, like LA, underwent two lineage‐specific whole‐genome duplications (WGDs) after the γ triplication event, and that their speciation occurred after the last WGD. Chloroplast phylogenetic analysis indicated that LA was the maternal source of ‘Super’, which produces premium EO (higher linalyl/lavandulyl acetate and lower 1,8‐cineole and camphor) close to that of LA. Gene expression, especially of the monoterpenoid biosynthetic genes, was biased towards LX‐LA alleles. Asymmetric transposon insertions in the two decoupled ‘Super’ subgenomes were responsible for speciation and monoterpenoid divergence of the progenitors. Both hybrid and parental evolutionary analyses revealed that AAT gene loss associated with LTR (long terminal repeat) retrotransposons causes the absence of linalyl/lavandulyl acetate production in LL, and that multiple BDH copies retained through tandem duplication and DNA transposons resulted in higher camphor accumulation in LL. Advances in allelic variations of monoterpenoids have the potential to revolutionize future lavandin breeding and EO production.

Keywords: Lavandula × intermedia, allelic variations, speciation, transposon, monoterpenoid evolution


Polyploidy and transposon insertion are associated with speciation and monoterpenoid diversification of Lavandula.


Introduction

Lavender has long been used in the cosmetics, alternative medicine and aromatherapy industries owing to the bactericidal and antioxidant properties of its abundant essential oil (EO). The global lavender oil market was valued at USD 34.2 million in 2020 and is expected to reach USD 49.4 million by 2027. Lavandula × intermedia, commonly known as lavandin, is a naturally occurring hybrid complex between L. angustifolia subsp. angustifolia and L. latifolia (Upson and Andrews, 2004). It is typically a much larger and more robust plant than either parent and can claim to be the most widely grown lavender, dominating the world's production of lavender EO (accounting for ~85% of the annual global yield). Another feature of lavandin is its sterility: it can only be propagated clonally, giving rise to the classic images of regular, unvarying rows of lavender. Despite its prevalence in cultivation and on the EO market, lavandin has received limited attention from the scientific community. One important constraint is the camphorous smell of lavandin EO (Aprotosoaie et al., 2017; Pokajewicz et al., 2023; Salehi et al., 2018). Because lavandin is inherently high‐yielding and robust, current lavandin breeding programs mainly emphasize EO quality improvement.

As representative volatile renewable resources, the lavender EO mainly consists of mono‐ and sesquiterpenoids, which are primarily stored in the peltate glandular trichome (PGT) (Guitton et al., 2018; Li et al., 2021a). The monoterpenoids play important ecological and physiological roles in Lavandula plants, and generally, their biosynthesis can be regulated by different developmental growth stages, herbivory attacks and pathogen infections (Li et al., 2019, 2021a; Woronuk et al., 2011). Among these, the monoterpenoids 1,8‐cineole and camphor are of particular significance as they add a sharp overtone to the L. latifolia scent, while linalyl acetate and lavandulyl acetate impart a pleasant scent of L. angustifolia EOs (Guitton et al., 2010; Woronuk et al., 2011). Additionally, the EO quality differs across lavandin cultivars and some cultivar trials demonstrated that L. × intermedia cv. ‘Super’ exhibits a mild aroma and produces high‐quality oil with higher linalyl/lavandulyl acetate and lower 1,8‐cineole/camphor contents compared with most L. latifolia cultivars (Guitton et al., 2010; Upson and Andrews, 2004).

Transposable elements (TEs) are a major cause of structural variation and are divided into two classes, DNA transposons and retrotransposons. DNA transposons transpose through a cut‐and‐paste mechanism, whereas retrotransposons transpose via RNA copies that are inserted into another genomic region, a copy‐and‐paste mechanism (Vicient and Casacuberta, 2017). TEs can modify gene expression levels by directly altering gene copy numbers through deletions, duplications and inversions. TE insertions near genes can also cause large‐scale perturbations of cis‐regulatory regions and are therefore more likely to quantitatively change gene expression and phenotypes (Feschotte et al., 2002; Kim et al., 2019; Xia et al., 2020). Our latest study of L. angustifolia indicated that the astounding diversity of volatile terpenoids arose largely from lineage‐specific whole‐genome duplication and tandem duplication, which expanded the terpenoid biosynthetic gene families (Li et al., 2021a). However, despite their importance, the specific contribution of gene duplication and transposons to the chemo‐diversity of Lamiaceae, particularly in economically important Lavandula, is still poorly documented.

The recently reported L. angustifolia reference genome cannot represent the full range of genetic diversity related to phenotypes such as EO composition (Li et al., 2021a; Mansfeld et al., 2021; Sun et al., 2021). Here, we de novo assemble a high‐quality chromosome‐scale genome of L. × intermedia ‘Super’, with fully represented subgenomes (LX‐LA and LX‐LL). Genome phasing analysis shows that TEs are responsible for genome size variation, allele‐specific expression and the evolutionary history of two lavandin subgenomes, and have a great impact on monoterpenoid biosynthetic gene families. We also unveil the genetic components responsible for the distinct EO composition of Lavandula species, demonstrating their origin by homeolog losses, pseudogenization and altered gene expression patterns. A holistic view of the hybrid lavandin genome aids EO component‐related allele discovery and provides a cornerstone for the functional genomics of Lavandula to facilitate further improvement.

Results

Disparate monoterpenoid accumulation of three lavender taxa

The three lavender taxa, L. angustifolia (LA), L. × intermedia (LX) and L. latifolia (LL), are the main EO‐producing plants worldwide (Figure 1a,b). As mono‐ and sesquiterpenoids are the main constituents of lavender EO, we analysed the volatile terpenoid profiles by solid‐phase microextraction coupled to gas chromatography‐mass spectrometry (Figure 1c; Table S1). Principal component analysis revealed that five monoterpenoids varied dramatically among the three lavender taxa: linalool, linalyl acetate, lavandulyl acetate, 1,8‐cineole and camphor (Figure 1d). Besides linalool, linalyl acetate and lavandulyl acetate were identified as the major monoterpenoids of the LA flower, while 1,8‐cineole and camphor constituted the main monoterpenoids of the LL flower (Figure 1c,e). The flower linalool contents of LA and LX were 60.26% and 71.61% higher than that of LL, respectively. Strikingly, there was no linalyl/lavandulyl acetate in the LL flower, whereas linalyl/lavandulyl acetate accumulated dominantly in the flowers of LA and LX (38.12 and 25.91 μg/mg in total, respectively). The LL flower accumulated 5.23 μg/mg camphor and 8.02 μg/mg 1,8‐cineole, which were ~5011‐ and 61‐fold higher than in the LA flower and ~3‐ and 4‐fold higher than in the LX flower, respectively (Figure 1e).

Figure 1.


Monoterpenoid accumulation and genomic features of Lavandula. (a) Morphological characteristics among three lavender taxa. Scale bars: 1 cm. (b) Original distribution of Lavandula (refer to Upson and Andrews, 2004). L. × intermedia grows naturally in the overlapping geographical areas of L. angustifolia and L. latifolia. (c) Gas chromatograms show linalool, linalyl acetate, lavandulyl acetate, 1,8‐Cineole and camphor in the flowers of L. angustifolia (LA), L. × intermedia (LX) and L. latifolia (LL). (d) Principal component analysis of volatile terpenoids from Lavandula plants. (e) Contents of linalool, linalyl acetate, lavandulyl acetate, 1,8‐cineole and camphor in the flower of L. angustifolia (LAF), L. × intermedia (LXF) and L. latifolia (LLF). (f) Landscape of the L. × intermedia genome. The tracks I–V of circos plot represent the distribution of chromosomes (purple items indicate LX‐LA and orange items indicate LX‐LL), transposable element density, gene density, gene expression level (FPKM; the highest expression of different sequenced tissues) and structural variation density. Each line in the centre of the circle connects a pair of homologous genes in LX‐LA and LX‐LL subgenomes.

Assembly and annotation of L. × intermedia genome

The two closely related Lavandula species, LA and LL, are recognized as the parents of LX on the basis of geographic distribution, botanical traits and some DNA markers (Hind et al., 2018; Upson and Andrews, 2004; Figure 1a,b). However, LA and LL carry different numbers of chromosomes (27 and 24 pairs, respectively) and we identified 51 chromosomes in L. × intermedia ‘Super’ (Figure S1), suggesting that the biparental genetic information was combined in this hybrid. To investigate the genetic basis of monoterpenoid differences among lavender taxa, we generated a chromosome‐level assembly for lavandin (cv. ‘Super’) using a combination of single‐molecule real‐time sequencing by PacBio, 10× Genomics, high‐throughput chromatin conformation capture (Hi‐C) and Illumina short‐read sequencing. Using flow cytometry, we estimated the lavandin genome size as 1108.08 Mb (Figure S2), which is consistent with k‐mer counting results (1068.16 Mb) (Figure S3). We first generated 103× coverage (110.21 Gb) of PacBio long reads for a primary assembly, which was then polished with 51× coverage (54.11 Gb) of Illumina short reads. This resulted in an assembly with a contig N50 of 1.82 Mb and a total length of 1.80 Gb. Next, after assisted assembly with 117× coverage (125.21 Gb) of 10× Genomics reads, we generated 116× coverage (123.96 Gb) of Hi‐C data, which allowed >98.98% of the contigs to be anchored onto 51 pseudomolecules with a scaffold N50 of 35.46 Mb (Table S2; Figures S4 and S5). The length of the lavandin assembly was nearly twice that predicted by flow cytometry and the genome survey, indicating that it could be subgenome‐resolved. As the LA reference genome has been published (Li et al., 2021a), we generated an LL draft genome with a size of 981.67 Mb and a contig N50 of 49.32 kb using Illumina sequencing (Tables [Link], [Link]). Finally, we divided the lavandin assembly into 826.68‐Mb LX‐LA and 976.84‐Mb LX‐LL subgenomes based on the differentiable LA and LL genomes (Figure S6).
The 51 lavandin chromosomes were re‐numbered from Chr1 to Chr27 for the LX‐LA subgenome and from Chr28 to Chr51 for the LX‐LL subgenome (Table 1; Table S6). The final lavandin assembly reached a high long terminal repeat assembly index (LAI; 16.63) (Ou et al., 2018) and high completeness (exceeding 98%) (Tables S7 and S8). A total of 109 080 protein‐coding genes were inferred from the lavandin genome, with supporting evidence from the RNA‐seq data of various tissues and from PacBio isoform sequencing data (Tables [Link], [Link]).
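
The coverage figures quoted above follow from dividing each data yield by the estimated genome size. The short Python sketch below reproduces that arithmetic under the assumption that the k‐mer estimate (1068.16 Mb) was the denominator; the source does not state this explicitly, and the variable and function names are illustrative only.

```python
# Sequencing depth = total sequenced bases / estimated genome size.
# Assumption: the k-mer estimate (not the flow-cytometry value) is the denominator.
GENOME_SIZE_MB = 1068.16

# Data yields (Gb) as reported in the text for each library type.
yields_gb = {
    "PacBio": 110.21,
    "Illumina": 54.11,
    "10x Genomics": 125.21,
    "Hi-C": 123.96,
}


def depth(yield_gb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage for a yield given in Gb and a genome size in Mb."""
    return yield_gb * 1000.0 / genome_mb


# Rounded fold coverage per library.
coverage = {name: round(depth(gb)) for name, gb in yields_gb.items()}

# Ratio of the two phased subgenomes (Mb) to the estimated genome size.
subgenome_total_mb = 826.68 + 976.84
assembly_ratio = subgenome_total_mb / GENOME_SIZE_MB

if __name__ == "__main__":
    for name, fold in coverage.items():
        print(f"{name}: {fold}x")
    print(f"assembly / estimated genome size = {assembly_ratio:.2f}")
```

With these inputs the rounded depths match the reported 103×, 51×, 117× and 116×, and the assembly is roughly 1.7 times the estimated genome size, in keeping with a subgenome‐resolved assembly.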

Table 1.

Assembly and annotation of genomes of three lavender taxa

Parameter | LA | LX‐LA | LX‐LL | LL
Total length of contigs | 911.14 M | 823.46 M | 972.63 M | 981.67 M
Total number of contigs | 1383 | 1032 | 1167 | 304 655
N50 of contigs (bp) | 1.22 M | 1.88 M | 1.78 M | 49.32 K
Largest contig (bp) | 9 968 313 | 9 755 447 | 7 131 504 | 406 389
Total length of scaffolds | 914.49 M | 826.68 M | 976.84 M | 1000.62 M
Total number of scaffolds | 306 | 566 | 559 | 274 440
N50 of scaffolds (bp) | 3.00 M | 4.23 M | 5.39 M | 0.44 M
N50 of super‐scaffolds (bp) | 36.65 M | 35.45 M | 35.45 M | 36.60 M
Largest scaffold (bp) | 46 645 376 | 12 658 113 | 17 021 574 | 2 291 030
GC content | 38.58% | 38.72% | 38.89% | 38.75%
Number of genes anchored to chromosome | 60 156 | 54 293 | 54 319 | 57 354

Parameter | LA | LX (both subgenomes) | LL
Complete genome BUSCOs | 98.4% | 98.0% | 98.5%
Complete protein‐coding gene BUSCOs | 96.5% | 98.8% | 96.7%
Repeat density | 58.28% | 59.32% | 65.30%
Number of protein‐coding genes | 65 905 | 109 080 | 58 126
Average length of transcripts (bp) | 2739.26 | 2681.58 | 2336.27
Average length of coding sequences (bp) | 1114.44 | 1133.70 | 1071.91
Number of annotated genes | 62 822 (95.3%) | 104 861 (96.1%) | 55 602 (95.7%)
Number of miRNA | 1351 | 1448 | 845
Number of tRNA | 1298 | 2449 | 1450
Number of rRNA | 399 | 1990 | 1587
Number of snRNA | 1199 | 2041 | 1203

The genome of L. angustifolia has been published (Li et al., 2021a).

Genome evolution of L. × intermedia ‘Super’

Phylogenomic analysis of the genomes of 12 additional close and distant relatives of L. × intermedia ‘Super’ revealed that LA and LL diverged from the common ancestor approximately 5.3 million years ago (MYA) (Figure 2a,b). Molecular dating also showed that LA separated from LX‐LA approximately 2.9 MYA before the divergence of LL and LX‐LL approximately 0.9 MYA, suggesting the formation of LX at least within 0.9 MYA (Figure 2a). A total of 11 gene families showed significant expansion in the LA genome and contraction in the LL genome. These gene families were related to ‘monoterpenoid biosynthesis’, ‘sesquiterpenoid and triterpenoid biosynthesis’ and ‘plant–pathogen interactions’ pathways (Figure 2c). This indicated that L. angustifolia and L. latifolia functionally diverged at the genome level during evolution. After the γ triplication event, L. latifolia, like L. angustifolia (Li et al., 2021a), underwent two WGDs approximately 6.86 and 29.6 MYA, suggesting that L. × intermedia originated after the last WGD in L. angustifolia and L. latifolia (Figure S7).

Figure 2.


Genomic evolution of lavandin. (a) Phylogenetic tree shows the relationship among L. angustifolia, L. latifolia and two L. × intermedia subgenomes and their divergence times. (b) Density distributions of Ks for paralogous genes in L. angustifolia and L. latifolia. (c) KEGG pathway enrichment of expansive genes in L. angustifolia in comparison with L. latifolia. (d) Phylogenetic tree of Lavandula species constructed using the maximum likelihood methods, based on the whole chloroplast genome sequences.

The chloroplast genomes of 16 Lavandula cultivars, belonging to eight species, were assembled (Figures S8 and S9). Whole chloroplast genome analysis revealed that L. angustifolia is the plastid donor of L. × intermedia ‘Super’ (Figure 2d). This result was confirmed by the phylogenetic analysis of all CDSs in the chloroplast genomes (Figure S10). Our findings are in line with the morphology‐based hybrid origin of lavandin and support the speculated spontaneous hybridization between wild L. angustifolia and L. latifolia.

Asymmetric subgenome evolution in lavandin

Investigation of macrosyntenic relationships between the two lavandin subgenomes revealed two major processes that explain the 27‐ and 24‐chromosome karyotypes of LX‐LA and LX‐LL, respectively: Chr4, Chr9 and Chr27 of LX‐LA showed good collinearity with Chr31 of LX‐LL, while Chr16 and Chr25 of LX‐LA showed good collinearity with Chr42 of LX‐LL (Figures S11 and S12). The high‐quality phased genome of lavandin enabled the detection of structural variants between the LX‐LA and LX‐LL subgenomes, including large insertions or deletions (Figure 1f). Functional enrichment analysis showed that genes involved in secondary metabolic pathways, such as monoterpenoid, flavone, flavonol and zeatin biosynthesis, were susceptible to variations (Figure S13). Although the LA genome contains more chromosomes, it is smaller than that of LL (LA: 914.49 Mb; LL: 1000.62 Mb) and is intermediate in size between the two LX subgenomes (LX‐LA: 826.68 Mb; LX‐LL: 976.84 Mb). Using web‐based Assemblytics, a total of 4169 and 4605 large variants (≥50 bp) were detected in the LA vs. LL and LX‐LA vs. LX‐LL comparisons, respectively (Table S14). The results suggest that insertions and tandem duplications enlarged the LL genome and the LX‐LL subgenome (Figure S14).

Transposable elements (TEs) are one of the major factors responsible for variation in genome sequence and gene expression patterns (Alonge et al., 2020; Alseekh et al., 2020; Feschotte et al., 2002). LTR retrotransposons (LTR‐RTs) were identified as the most abundant type of TEs in lavandin, accounting for 56.62% of the LX‐LA and 53.31% of the LX‐LL subgenomes (Figure 3a; Table S15). Phylogenetic analysis indicated that the Gypsy and Copia lineages were distributed differently between the subgenomes: the subclades Ale, TAR and Maximus in Copia and Tekay in Gypsy expanded in LX‐LL, whereas Ivana in Copia and Reina in Gypsy expanded in LX‐LA (Figure 3b). Additionally, recent LTR‐RT burst events occurred at approximately 0.3 MYA in LX‐LA and approximately 1.5 MYA in LX‐LL. The Gypsy and Copia LTR insertions were also younger in LL and LX‐LL than in LA and LX‐LA (Figure 3c). Based on these results, we speculate that the burst of LTR‐RTs in LX‐LL was the major contributor to the divergence between LX‐LA and LX‐LL.
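
Insertion times of intact LTR‐RTs are conventionally estimated from the sequence divergence between the 5′ and 3′ LTRs of the same element, using T = K / (2r). The following Python sketch illustrates that general calculation only. It assumes a Jukes‐Cantor correction and a placeholder substitution rate; the text above does not state which distance model or rate was used, so `ASSUMED_RATE` and all function names are hypothetical.

```python
import math

# Placeholder substitution rate (substitutions per site per year).
# Assumption for illustration only; the rate used in the study is not given here.
ASSUMED_RATE = 1.3e-8


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatched sites between two aligned LTRs (gaps skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(ltr5: str, ltr3: str, rate: float = ASSUMED_RATE) -> float:
    """Estimated insertion time in million years: T = K / (2 * rate)."""
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2.0 * rate) / 1e6
```

Because both LTRs are identical at the moment of insertion, a smaller K implies a more recent insertion; the absolute ages scale inversely with whatever rate is assumed.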

Figure 3.


Transposable element (TE) insertion dynamics and allele‐specific expression pattern between the LX‐LA and LX‐LL subgenomes of lavandin. (a) Repeat sequence distribution patterns and LTR‐RT types. (b) Classification of LTR‐RT superfamilies. The subgenome location of Copia and Gypsy are indicated by the outer circle. (c) Estimated times of intact LTR‐RT, Copia and Gypsy insertion in two LX subgenomes. (d) Distribution of differentially expressed alleles (DEAs) and differentially expressed terpenoid biosynthetic alleles (DETBAs) with soft (FC > 1) and robust cut‐off (FC > 2) in all tissues. (e,f) The number of DEAs (e) and DETBAs (f) across the five tissues towards the LX‐LA (blue) or LX‐LL (yellow) subgenome. (g,h) Tissue‐specific expression patterns of TBGs harbouring LTR‐RTs (green) or lacking LTR‐RTs (orange) in upstream (g) and intergenic (h) sequences. Gene expression levels were examined in five different tissues of L. × intermedia. Asterisks indicate significant differences with the Wilcoxon rank‐sum test (*P < 0.05, **P < 0.01). (i) Proportion of LTR‐RTs in protein‐coding genes relative to that in the whole genome, TBGs and pseudogenes of lavandin.

The variations in transcript abundance between two parental accessions can be inferred by assessing the allelic imbalance of the two alleles in an F1 hybrid (Erik et al., 2020). The subgenome‐resolved lavandin genome enabled us to distinguish 33 684 allelic genes (Table S16). A total of 28 258 alleles showed allele‐specific expression in at least one tissue type, and 2898 alleles exhibited biased expression in all tissue types (Figure S15). Allelic genes showed a subtle expression bias towards the LX‐LA subgenome, with roughly 50.84% of alleles biased towards LX‐LA (relaxed cut‐off, fold‐change [FC] > 1) across all five tissues. The bias increased slightly (51.68%) when a robust cut‐off (FC > 2) of differential expression was applied (Figure 3d; Figure S16). Functional enrichment of LTR‐RTs inserted in different positions of LX‐LA and LX‐LL alleles showed that LTR‐RTs inserted in overlap positions of LX‐LL alleles were enriched in terpenoid biosynthetic pathways (Figure S17). A stronger allele‐specific expression bias towards LX‐LA (53.23% and 56.52% with relaxed and robust cut‐offs, respectively) was observed among differentially expressed terpenoid biosynthetic alleles (Figure 3e,f; Figures S18 and S19). The vast majority of differentially expressed alleles, especially terpenoid biosynthetic alleles, maintained the same pattern of dominance across all tissues. Approximately 50.29%–52.15% of the total alleles and 52.31%–63.10% of the terpenoid biosynthetic alleles showed expression bias towards the LX‐LA rather than the LX‐LL subgenome (Figure 3e,f).
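
The bias percentages above are fractions of differentially expressed allele pairs whose LX‐LA copy is the more highly expressed one. The Python sketch below shows how such a fraction could be computed from paired expression values with relaxed (FC > 1) and robust (FC > 2) cut‐offs. It is a simplified illustration on made‐up numbers: the function names and toy data are hypothetical, and the statistical testing that a real differential expression analysis requires is omitted.

```python
from typing import Iterable, Tuple


def bias_direction(la: float, ll: float, fc_cutoff: float) -> str:
    """Classify one allele pair as biased towards 'LA', 'LL' or 'none'.

    la and ll are expression values of the LX-LA and LX-LL alleles.
    A pair is biased when the higher value exceeds the lower by more than
    fc_cutoff-fold (fc_cutoff is assumed to be >= 1).
    """
    hi, lo = max(la, ll), min(la, ll)
    if la == ll:
        return "none"  # equal expression, including both alleles silent
    if lo > 0 and hi / lo <= fc_cutoff:
        return "none"  # difference does not pass the cut-off
    return "LA" if la > ll else "LL"


def fraction_toward_la(pairs: Iterable[Tuple[float, float]], fc_cutoff: float) -> float:
    """Share of biased allele pairs whose LX-LA allele is dominant."""
    calls = [bias_direction(la, ll, fc_cutoff) for la, ll in pairs]
    n_la = calls.count("LA")
    n_ll = calls.count("LL")
    if n_la + n_ll == 0:
        raise ValueError("no biased allele pairs at this cut-off")
    return n_la / (n_la + n_ll)


# Toy (LX-LA, LX-LL) expression values, for illustration only.
toy_pairs = [(10, 5), (3, 4), (8, 8), (0, 2), (9, 3), (6, 5)]

relaxed = fraction_toward_la(toy_pairs, fc_cutoff=1.0)
robust = fraction_toward_la(toy_pairs, fc_cutoff=2.0)
```

A stricter cut‐off shrinks the set of biased pairs, so the fraction can move in either direction; in the reported data it rises slightly towards LX‐LA.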

We found that most gene families related to terpenoid biosynthesis were impacted by LTRs, with an above‐average LTR‐RT insertion rate in LA, LX and LL (Figure S20; Table S17). LTR‐RTs resided in the upstream and intragenic sequences of terpenoid biosynthetic genes (TBGs), largely giving rise to transcriptional diversification (Figure 3g,h). Interestingly, across tissues, LTR‐RTs inserted upstream of genes in the 2‐C‐methyl‐D‐erythritol‐4‐phosphate pathway (which contributes mainly to monoterpenoid biosynthesis) were associated with significant expression differences between LX‐LL and LX‐LA, whereas genes in the mevalonate pathway showed no such differences (Figures S21 and S22), suggesting that different LTR‐RT insertions may contribute to distinct monoterpenoid accumulation between LA and LL. Additionally, compared with genome‐wide protein‐coding genes, TBGs and the related pseudogenes showed a significantly higher content of LTR‐RTs (Figure 3i). These results suggest that LTR‐RTs potentially acted as a driving force for EO chemotype diversification during the evolution of Lavandula subgenera.

Monoterpenoid biosynthetic alleles in lavandin

Since disparate monoterpenoid accumulation was observed in the two parents of lavandin, we focused on the monoterpenoid biosynthetic pathway in the chloroplast. Genes related to terpenoid backbone biosynthesis via the 2‐C‐methyl‐D‐erythritol‐4‐phosphate pathway, as well as those encoding prenyltransferases and monoterpenoid synthases, have been well characterized (Table S18). The peltate glandular trichome (PGT) is the main organ that produces monoterpenoids, and we found that a total of 15 alleles have higher expression levels in LX‐LA than in LX‐LL (Figure 4). 1‐deoxy‐D‐xylulose 5‐phosphate synthase (DXS) and 1‐deoxy‐D‐xylulose 5‐phosphate reductoisomerase (DXR) have been defined as rate‐limiting enzymes for terpenoid biosynthesis (Abbas et al., 2017; Nagegowda and Gupta, 2020). The expression of DXS and DXR alleles also showed bias towards the LX‐LA subgenome. Among 10 DXR allelic genes with copies in both LX‐LA and LX‐LL, seven DXRs (LiDXR1A‐LiDXR7A) of the LX‐LA subgenome showed significantly higher expression levels than their allelic genes (LiDXR1L‐LiDXR7L) in the LX‐LL subgenome, and four alleles (LiDXR7A‐LiDXR10A vs LiDXR7L‐LiDXR10L) showed the opposite tendency. Although LX‐LL had one more DXR copy, LiDXR22L showed no expression in the five tissues (Figure 4).

Figure 4.


Overview of monoterpenoid biosynthetic pathway in lavandin. Expression atlas of monoterpenoid biosynthetic gene families in the LX‐LA and LX‐LL subgenomes across the five tissues. R, S, L, F and PGT represent the root, stem, leaf, flower and peltate glandular trichome of L. × intermedia. Horizontally oriented genes in the heat map indicate allelic genes between LX‐LA and LX‐LL subgenomes. Grey boxes indicate the absence of alleles between the two subgenomes. Asterisks indicate the dominance of differentially expressed alleles between the two subgenomes, and two asterisks indicate differentially expressed alleles with FC >2 cut‐offs. The colour indicates the normalized expression level (relative percentage of the highest expression level in each gene family).

Linalyl acetate, lavandulyl acetate, 1,8‐cineole and camphor were the main monoterpenoids that differed among LA, LX and LL (Figure 1c) and are important indices of Lavandula EO quality. We identified 40 and 28 members of the TPS‐b subfamily in LX‐LA and LX‐LL, respectively, of which 20 and 9 showed a PGT‐specific expression pattern (Figure S23). Linalool and lavandulol are the precursors of the superior components linalyl acetate and lavandulyl acetate, respectively. The identified linalool synthases (LINSs) showed no significant difference between the two lavandin subgenomes. The lavandulyl diphosphate synthases (LPPSs) identified in LX‐LA, LiLPPS1A and LiLPPS3A, were highly expressed in PGT, whereas LiLPPS1L and LiLPPS2L, found in LX‐LL, were not expressed. This gene expression trend was consistent with the lavandulol contents in LA and LL. The sharp components, 1,8‐cineole and camphor, accumulated more in LL and in leaf tissues. We found three 1,8‐cineole synthase (CINS) genes from LX‐LA in lavandin, two of which (LiCINS2A and LiCINS3A) showed high expression levels in leaf, in line with 1,8‐cineole accumulation. The loss of CINS alleles in LX‐LL might underlie the fragrant EO of ‘Super’. Borneol is the precursor of camphor. Among the four and six bornyl‐diphosphate synthases (BPPSs) identified in LX‐LA and LX‐LL, respectively, LiBPPS1A of LX‐LA and three BPPSs of LX‐LL (LiBPPS1L, LiBPPS5L and LiBPPS7L) showed higher expression in leaf (Figure 4).

Genetic basis of the high‐level accumulation of linalyl/lavandulyl acetate in L. angustifolia

Besides LINS and LPPS, the synthases of the linalyl acetate and lavandulyl acetate precursors, we systematically mined the genomic organization of the alcohol acetyltransferase genes (AATs) responsible for linalyl/lavandulyl acetate biosynthesis in the three lavender taxa.

The AATs in LA

Phylogenetic analysis showed that three tandemly arranged genes in LA (La14G01394, La14G01393 and La14G01640) grouped together with LiAAT3 in the BAHD Va subfamily, suggesting that these proteins exhibit similar catalytic activity (Figure 5a; Figures S24 and S25). The expression pattern of La14G01394 was consistent with the high‐level accumulation of lavandulyl acetate in flowers and glandular trichomes (Li et al., 2021a). Then the La14G01394 CDS was cloned and expressed in bacteria (Figure S26). The purified La14G01394 protein could convert lavandulol to lavandulyl acetate but not convert linalool to linalyl acetate (Figure 5b,c; Figure S27). Mapping of genomic DNA and RNA‐Seq reads revealed the insertion of ‘GT’ and ‘GACAGTTT’ at nucleotide positions 274–275 and 378–385, respectively, in La14G01393 (Figure 5d), and deletion of ‘A/G’ at 464 nt in La14G01640 (Figures S28 and S29), causing pseudogenization of the putative AATs (La14G01393 and La14G01640). Genes located on Chr15 and Chr14 were almost completely collinear within LA, and microsynteny analysis revealed that genes flanking La14G01394 on Chr14 were inversely arranged relative to the syntenic genes on Chr15 (Figure S30). These results indicated that grouped and close‐range La15G00488 and La15G00501 were collinear to La14G01393 and La14G01394 (Figure 5a) and probably arose 65.26 MYA via WGD. Considering the phylogeny and WGD history of AAT‐related genes in LA, we propose an evolutionary model for AATs. First, La14G01393 and La14G01394 were derived by tandem duplication of the ancestral AAT gene, followed by the gradual pseudogenization of La14G01393. Then, La15G00501 and La15G00488 arose by WGD and likely underwent functional segregation (Figure 5e). In addition, we found that La06G01094 grouped together with the functional LiAAT4 (Figure 5a). 
Enzyme activity assay showed that the purified La06G01094 produced linalyl acetate and lavandulyl acetate from linalool and lavandulol (substrates), respectively (Figure 5b,c; Figure S27), suggesting the multifunctionality of the AAT4.
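
The pseudogenization described above rests on indels whose lengths are not multiples of three, which shift the reading frame downstream of the mutation. The minimal Python sketch below tracks the cumulative frame offset after each reported indel; the function name is illustrative, and the check ignores other routes to loss of function, such as in‐frame stop codons.

```python
from typing import List


def reading_frame_after(indels: List[int]) -> List[int]:
    """Cumulative reading-frame offset (0, 1 or 2) after each indel.

    Positive values are insertions, negative values are deletions,
    listed in 5'-to-3' order. An offset of 0 means the frame is restored.
    """
    offsets = []
    shift = 0
    for size in indels:
        shift = (shift + size) % 3
        offsets.append(shift)
    return offsets


# Indel sizes taken from the text.
la14g01393 = reading_frame_after([+2, +8])  # 'GT' and 'GACAGTTT' insertions
la14g01640 = reading_frame_after([-1])      # single-nucleotide deletion
```

Neither gene returns to an offset of 0, so the coding frame stays disrupted after the last indel in both cases, which is consistent with the proposed pseudogenization of La14G01393 and La14G01640.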

Figure 5.


Evolution of AATs in lavandin. (a) Phylogenetic analysis of Lavandula AATs. AAT3 and AAT4 clustered into the BAHD IIIa and Va subgroups, respectively. Paralogs in LA, LX and LL genomes are indicated in purple, orange and green fonts, respectively. (b,c) Analysis of the activity of LaAAT3 (b) and LaAAT4 (c) in vitro. The in vitro enzyme assay was carried out using purified proteins and linalool or lavandulol (substrate). LaAAT4 produced linalyl acetate and lavandulyl acetate, whereas LaAAT3 produced only lavandulyl acetate. (d) Alignment of AAT nucleotide sequences for comparison between LA and LL genomes and LX‐LA and LX‐LL subgenomes. Dark purple squares represent the coding sequence (CDS); dashed boxes represent the recuperative CDS region of the pseudogene; green and red lines indicate abnormal insertions and terminations, respectively; blue squares represent TEs; grey lines between two genes represent regions with high sequence similarity. (e) Proposed evolutionary models for the birth of AAT3, as inferred from LA and LX‐LA assemblies and LL and LX‐LL assemblies. A series of tandem duplication, whole‐genome duplication (WGD), gene loss, pseudogenization and neofunctionalization events are included. Purple colour indicates the AAT modules. Pseudogenes, lost genes and neofunctionalized genes are indicated with white, grey and blue arrows, respectively.

The differences in AATs among LA, LL and LX

Syntenic analysis revealed that sequences at 27 283 446–27 283 794 (348 bp), 27 289 279–27 289 396 (117 bp), and 27 294 141–27 294 359 (218 bp) nucleotide positions on Chr13 of LL were partially homologous to La14G01393/La14G01394 (Figure 5d; Figure S31). A closer look at the region showed that AAT in LL was the result of gradual pseudogenization due to the insertion of one (5477 bp) and five (543, 1213, 253, 245 and 215 bp) Copia‐type LTR‐RTs at nucleotide positions 27 283 794–27 289 279 and 27 289 396–27 294 141, respectively, in the Chr13 of LL (Figure 5d). Based on sequence alignment results, the pseudogene sequence in LL was recovered and labelled as LL_Chr13ψ. Similar patterns were identified between LX‐LA and LX‐LL subgenomes using similar approaches, and the corresponding pseudogene loci were recovered and labelled as LX‐LA_Chr14ψ and LX‐LL_Chr40ψ, respectively (Figure 5d). The LX‐LA_Chr14ψ, LX‐LL_Chr40ψ and LL_Chr13ψ loci grouped together with La14G01393 (Figure S32), again confirming the pseudogenization of La14G01393‐corresponding loci in LA, LX‐LA, LX‐LL and LL. Both LL and LX‐LL results suggested that the pseudogenization of AAT3 locus 1 (due to LTR‐RT insertion) and loss of AAT3 locus 2 were partially responsible for the lack of lavandulyl acetate biosynthesis in LL (Figure 5e). Moreover, a series of phylogenetic and collinearity analyses revealed that the AAT4 gene was completely lost in LX‐LL and LL, uncovering at least in part the basis of linalyl/lavandulyl acetate absence in LL.
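
The coordinates quoted for Chr13 of LL can be checked with simple interval arithmetic: the three homologous segments are separated by two gaps that contain the Copia‐type LTR‐RT insertions. The Python sketch below uses only the numbers given in the text; variable names are illustrative, and lengths are computed as end minus start to match the segment sizes reported.

```python
# AAT-homologous segments on Chr13 of LL (start, end), as given in the text.
homologous_segments = [
    (27_283_446, 27_283_794),
    (27_289_279, 27_289_396),
    (27_294_141, 27_294_359),
]

# Lengths of the homologous segments (end - start).
segment_lengths = [end - start for start, end in homologous_segments]

# Gaps between consecutive segments, where the LTR-RTs are inserted.
gaps = [
    (homologous_segments[i][1], homologous_segments[i + 1][0])
    for i in range(len(homologous_segments) - 1)
]
gap_lengths = [end - start for start, end in gaps]

# Copia-type LTR-RT lengths (bp) reported for each gap.
ltr_lengths = [[5477], [543, 1213, 253, 245, 215]]
ltr_totals = [sum(group) for group in ltr_lengths]

# Fraction of each gap accounted for by the annotated LTR-RTs.
ltr_fraction = [total / gap for total, gap in zip(ltr_totals, gap_lengths)]

if __name__ == "__main__":
    print(segment_lengths)  # lengths of the three homologous segments
    print(gap_lengths)      # lengths of the two intervening gaps
    print(ltr_totals)       # total LTR-RT length per gap
```

The segment lengths reproduce the 348, 117 and 218 bp stated above. The single 5477‐bp element fills almost the whole first gap (5485 bp), whereas the five shorter elements account for about half of the second gap (2469 of 4745 bp).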

The AATs in Lamiaceae

Sequence comparison among lavender taxa and three other species belonging to the Nepetoideae subclade of Lamiaceae, including Ocimum basilicum (Obas) (Gonda et al., 2020), Salvia splendens (Sspl) (Jia et al., 2021) and Thymus quinquecostatus (Tqui) (Sun et al., 2022), showed the lack of AAT3 and AAT4 homologues in Obas, Sspl and Tqui (Figure S33), consistent with the absence of linalyl/lavandulyl acetate in the three species (He et al., 2020; Lee et al., 2005; Mathew and Thoppil, 2011).

Genetic basis of the high‐level accumulation of camphor in L. latifolia

Compared with LX‐LL, LX‐LA had fewer BPPS copies with lower expression, so the LL‐derived LX‐LL subgenome contributes more borneol production. Camphor is synthesized from borneol by borneol dehydrogenase (BDH), a member of the short‐chain dehydrogenase/reductase (SDR) superfamily (Ma et al., 2021; Singh et al., 2020). We identified all BDH gene copies in the LA, LX and LL genome assemblies by searching for all SDR members, followed by clustering analysis (Figure 6a; Figures S34 and S35).

Figure 6.


Evolution of BDHs in lavandin. (a) Phylogenetic analysis of Lavandula BDHs. (b) Analysis of the activity of LiBDHs encoded by four loci. The enzyme activity assay was carried out in vitro using purified proteins and borneol (substrate). (c) Syntenic analysis of functional BDH loci across LA, LX‐LA, LX‐LL and LL. Grey lines indicate syntenic blocks. (d) Proposed evolutionary models for the birth of BDHs, as inferred from LA and LX‐LA assemblies and LL and LX‐LL assemblies. A series of tandem duplication, WGD, gene loss, pseudogenization and translocation events are included. Different colours denote distinct BDH modules. White and grey arrows indicate pseudogenes and lost genes, respectively.

The BDHs in LL

In LL, eight BDH‐like gene sequences, scattered on Chr20, Chr11 and Chr12, were identified after correction by PCR and sequence alignment. The Ll20G00455, Ll20G00456 and Ll20G00457 genes tandemly arranged in a large syntenic block on Chr20 were identified as potential paralogs that maintained synteny and close evolutionary relationships with Ll12G01327/1328, Ll12G01325 and Ll12G01324 on Chr12, respectively, suggesting that these genes are the result of WGD. Ll20G00457, Ll12G01327/1328, Ll12G01325 and Ll12G01324 were identified as pseudogenes, based on their incomplete CDSs or the presence of stop codons in the open reading frame and negligible or no expression (Figure S36). The Ll12G01326 locus might harbour two genes with two complete CDSs, labelled as Ll12G01326‐1 and Ll12G01326‐2 (Table S19), and the massive insertion of TEs likely promoted the formation of this highly variable region. The BDH on Chr11 of LL, Ll11G00242, located in a subtelomeric region, carried DNA‐type transposon insertions (one upstream, one intragenic and one downstream; Table S20), and Ll11G00242 grouped with Ll12G01326‐1 and Ll12G01326‐2 (Figure 6a), suggesting that Ll11G00242 and Ll12G01326 were potentially derived from one active locus and that Ll11G00242 moved to another location via a cut‐and‐paste mechanism. Subtelomeric regions of eukaryotic genomes have previously been suggested to facilitate gene recombination and transposon insertion and serve as hotbeds for the origin of new genes. Together, these results suggest that the ancestral BDH gene underwent local tandem duplication, producing four BDH copies at four loci (Locus 1–4), which then increased to eight via WGD in LL, followed by translocation and insertion/deletion (Figure 6d). Evidence collected using multiple approaches, including syntenic and phylogenetic analyses, protein sequence alignments and WGD within LX‐LL, supports this hypothetical model of BDH evolution.

The differences in BDHs among LA, LL and LX

Next, we compared the two hybrid subgenomes and two parental genomes to analyse the differences in camphor abundance. The Chr3 segment of LX‐LA, which exhibited a syntenic relationship with Chr47 of LX‐LL, showed associations with Chr3 of LA and Chr20 of LL (Figure 6c). Three BDH‐like paralogs were found on an ~18‐kb contig in LA (La03G00501, La03G00502 and La03G00503) and LX‐LA (Li03G00441/442, Li03G00443 and Li03G00444) (Figure S37). Phylogenetic analysis showed that these three BDH‐like genes grouped with the BDH loci 1, 3 and 4 of LL and LX‐LL (Figure 6a). Frameshift mutations occurred in La03G00501 and Li03G00441/442 at locus 1, resulting in their loss of function. Besides these three BDH‐like genes, we found no BDHs elsewhere in the LA and LX‐LA assemblies and only sporadic BDH traces at the position corresponding to locus 2 in LL and LX‐LL. These results suggest that BDH copies at loci 1 and 2 were pseudogenized and lost in LA and LX‐LA during evolution, whereas those at loci 3 and 4 were functional. Indeed, the purified Li03G00443 and Li03G00444 proteins (Figure S38) could convert borneol into camphor in vitro (Figure 6b; Figure S39). Therefore, compared with two functional BDHs in LA and LX‐LA, the LL and LX‐LL assemblies contained three functional BDHs, which were responsible for the strong camphor production of L. latifolia.

The BDHs in Lamiaceae

Similarly, by exploring the available Nepetoideae subclade genome sequence data, we were able to identify several sets of BDH genes. Synteny analysis showed that BDH likely evolved in an ancestor predating the emergence of Lavandula. At loci 1, 3 and 4, we found two BDH copies in Obas, one BDH copy in Tqui and no BDH in Sspl. At locus 2, only LL and LX‐LL contained a functional BDH, whereas LA, LX‐LA, Obas, Sspl and Tqui lacked BDH (Figure S40). The BDH copy number in each species was consistent with its camphor content, because terpenoid accumulation, at least to a certain extent, is controlled by biosynthetic enzymes in a dosage‐dependent manner (He et al., 2020; Lee et al., 2005; Mathew and Thoppil, 2011).

Discussion

Lavandin is recognized as a spontaneous hybrid of L. angustifolia and L. latifolia and occupies an intermediate ecological niche, with many physiological advantages over its parents, yet a complete description of its molecular underpinnings remains elusive (Upson and Andrews, 2004). At present, far more attention is paid to L. angustifolia because of its superior EO, whereas lavandin and L. latifolia have been studied much less (Mathias et al., 2023; Pokajewicz et al., 2023). Our study is the first to decode the fully phased genome of the world‐renowned lavandin cultivar ‘Super’, which has outstanding EO characteristics. The integration of PacBio long‐read and Illumina short‐read data with parental genome alignment‐based subgenome phasing substantially improved the contiguity and accuracy of the genome assembly, with a notably enhanced assembly of LTRs (LAI value = 16.53). Resolving the lavandin genome into two subgenomes provides opportunities to identify allelic variations among progenitors that underlie important biological traits, for example, to meet the urgent requirement of modern lavandin breeding for new selections with higher linalyl acetate and lower camphor contents. In addition, the high‐quality phased lavandin genome provides an attractive model for studying the basis of heterosis, and further studies of its morphological and resistance advantages can now be carried out.

The merger of divergent genomes via hybridization often causes phenotypic and genotypic perturbations because of conflicts between parental genomes (Roy, 2021; Sharbrough et al., 2017; Xiao et al., 2021). A dominant subgenome often emerges immediately, with a significantly greater number of retained genes and a higher level of allele‐specific expression, as the plant returns to a diploid‐like state with reduced hybrid/polyploid incompatibilities (Panchy et al., 2016; Roy, 2021). Biased homologue gene expression patterns have been observed in a few animals and some allopolyploid plant species, including Eragrostis tef (VanBuren et al., 2020), Brassica napus (Zhang et al., 2018), Musa balbisiana (Wang et al., 2019) and Gossypium hirsutum (Zhao et al., 2018). In L. × intermedia ‘Super’, we observed a slight expression bias towards LX‐LA alleles, especially evident in the DETBAs, in line with its EO chemotype, which is the closest to L. angustifolia among numerous lavandin cultivars. The allele‐specific expression patterns in two lavandin subgenomes may represent alternative strategies for counteracting deleterious effects and improving adaptability. Hybridization is one route to speciation, and knowledge of the early stages of post‐hybridization evolution is particularly important (Bird et al., 2018). Allopolyploid speciation requires rapid evolutionary reconciliation of two diverged genomes. We detected only slight expression dominance and no large‐scale structural rearrangements or biased gene loss, probably because the sterile lavandin is a nascent F1 hybrid. Research on Mimulus peregrinus suggests that gene expression dominance increases over successive generations, and the greatest bias was observed in the naturally established 140‐year‐old allopolyploid (Edger et al., 2017). Chromosome doubling via tissue culture to produce novel fertile L. × intermedia lines deserves further research on global dominant alleles, as such lines can generate successive generations (Urwin, 2014).

Decades of research have shown that structural variations in the plant genome underlie crop improvement and domestication traits such as flavour, fruit size, productivity and stress resistance (Alonge et al., 2020; Alseekh et al., 2020; VanBuren et al., 2020). Our study shows that TEs play critical roles in genome size expansion and transcriptional diversification, as observed previously (Feschotte et al., 2002; Kim et al., 2019; Xia et al., 2020). Differential gene gain and loss also contribute to gene content variation across Lavandula and lead to lineage‐specific gene repertoires. Moreover, the genus Lavandula belongs to the species‐rich and chemical‐diverse Lamiaceae. TEs are significantly enriched in TBGs involved in terpenoid modification (such as AAT and BDH), which results in the colossal chemo‐diversity of the Lavandula metabolome, and the insertion of TEs largely influences the expression level of these TBGs. A recent study showed that TEs contribute to the colossal chemo‐diversity in plants as TE‐mediated recombination likely facilitates the formation of plant biosynthetic gene clusters and the establishment of coregulation of these gene clusters (Boutanaev and Osbourn, 2018). The evolutionary pattern of AATs and BDHs leading to the production of linalyl/lavandulyl acetate and camphor indicates that tandem duplication and WGD are closely connected with EO component diversification within Lavandula in a dosage‐dependent manner. TEs are among the key factors leading to allelic variation in the two lavandin subgenomes. The AAT and BDH pseudogenes derived from duplicated or retrotransposed genes and gene loss may have facilitated the environmental adaptation of L. angustifolia and L. latifolia during evolution, which echoes the study of Xu and Guo and could be regarded as a typical case compliant with the ‘less is more’ rule (Xu and Guo, 2020). 
Subtelomeric regions of eukaryotic genomes have previously been suggested to facilitate gene recombination and promote cluster formation (David et al., 2009; Freeling, 2009; Li et al., 2021b). The BDH gene cluster is located close to the subtelomeric region, and perturbation of the BDH copy at locus 2 was most likely caused by transposon insertion. Plants can alter their armoury of specialized metabolism to adapt to and survive in diverse ecosystems (Frezza et al., 2019; Pichersky and Raguso, 2018; Wink, 2003). The combination of parental genomes allows for EO chemotype diversification in lavandin, which likely reflects the adaptation to particular environmental niches.

Together, we used the genome phasing‐driven approach in hybrid lavandin to investigate aroma divergence between its progenitors and EO component‐related heterosis, providing an explicit example to study metabolic diversification in horticultural crops. A comprehensive understanding of linalyl/lavandulyl acetate and camphor biosynthesis‐related allelic variation between the two lavandin subgenomes will facilitate the design and application of breeding strategies for the development of novel lavandin cultivars with the desired EO chemotype.

Materials and methods

Plant materials

The experimental lavender plants were grown in the nursery of aromatic plant germplasm resources of the Institute of Botany, Chinese Academy of Sciences. L. latifolia and L. × intermedia ‘Super’ were used for genome sequencing. Fresh leaves from a single 2‐year‐old plant were harvested and frozen immediately in liquid nitrogen for genomic DNA isolation. Tissue culture seedlings of L. × intermedia ‘Super’ provided by the National Wild Plant Germplasm Resource Center for Beijing Botanical Garden, Institute of Botany, Chinese Academy of Sciences were used to construct the Hi‐C library. Fresh full‐bloom flowers, leaves and stems of L. angustifolia ‘Jingxun 2’, L. latifolia and L. × intermedia ‘Super’ were collected in duplicate for determination of volatiles and RNA‐sequencing, and the glandular trichomes from the calyx of L. × intermedia ‘Super’ were isolated with dissecting needles for RNA extraction.

Solid‐phase microextraction (SPME) coupled to GC–MS analysis

Fresh flowers (10 mg) from three plants of L. angustifolia ‘Jingxun 2’, L. latifolia and L. × intermedia ‘Super’ were collected and placed into headspace vials. 3‐Octanol (Sigma Aldrich, Saint Louis, MO, USA) was added as an internal standard. All experiments were run in triplicate. The headspace vials were kept in a laboratory water bath at 40 °C (for flower samples) or 70 °C (for leaf and stem samples) for 40 min. SPME analysis (20 min exposure to a 2 cm DVB/CAR/PDMS fibre, Supelco, Bellefonte, PA, USA, followed by analyte desorption at 220 °C for 3 min) was performed using a Varian CP‐3800/Saturn 2000 apparatus (Varian, Walnut Creek, CA, USA) equipped with a Zebron ZB‐5 MSI (30 m × 0.25 mm × 0.25 μm) column (Phenomenex, Shim‐Pol, Poland). About 10 μL of C7–C40 Saturated Alkanes Standard (Sigma, 1000 μg/mL) was used as the reference for retention index (RI) determination.

The GC oven temperature was programmed from 40 to 100 °C at a rate of 3 °C/min; to 114 °C at a rate of 2 °C/min; and then to 280 °C at a rate of 120 °C/min. Scanning was performed from 35 to 550 m/z in electron impact mode at 70 eV. Samples were injected at split ratios of 80 : 1 and 20 : 1 for flower and leaf/stem samples, respectively, and helium was used as the carrier gas at a flow rate of 1 mL/min.

Agilent MassHunter 5.0 was used to analyse the chromatograms and mass spectra. Identification of all volatile constituents obtained by SPME analysis was based on the comparison of experimentally obtained compound mass spectra with mass spectra available in the NIST14 database. In addition, the experimentally obtained RI according to C7–C40 alkane ladders was compared with RI available in the mass spectra databases NIST 2014 and literature data. The quantification analysis was performed through the integration of the peak area of the chromatograms.
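
The text does not state which RI formula was applied. As a minimal sketch, assuming the linear (van den Dool and Kratz) retention index that is commonly used for temperature‐programmed runs, the RI of a peak can be interpolated between the two n‐alkanes of the ladder that bracket it. The function name and all retention times below are hypothetical and serve only to illustrate the calculation.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at `rt` (minutes).

    `alkane_rts` maps alkane carbon number -> retention time (minutes)
    and is assumed to increase monotonically with carbon number.
    Assumed formula: RI = 100 * (n + (rt - t_n) / (t_next - t_n)).
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    i = bisect_right(times, rt) - 1  # index of the last alkane eluting at or before rt
    if i == len(times) - 1:  # peak co-elutes with the last alkane
        return 100.0 * carbons[-1]
    n, t_n, t_next = carbons[i], times[i], times[i + 1]
    step = carbons[i + 1] - n  # normally 1 for a complete ladder
    return 100.0 * (n + step * (rt - t_n) / (t_next - t_n))


if __name__ == "__main__":
    # Hypothetical retention times for part of a C7-C40 ladder
    ladder = {10: 10.0, 11: 12.0, 12: 14.5}
    print(linear_retention_index(11.0, ladder))
```

The computed RI would then be compared with database and literature values, alongside the mass spectral match, as described above.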

Library construction and sequencing

High‐molecular weight genomic DNA from L. latifolia and L. × intermedia ‘Super’ leaves was isolated by a modified CTAB method for PacBio and Illumina sequencing. For L. × intermedia ‘Super’, the construction and sequencing of PacBio, Illumina, 10× Genomics and Hi‐C libraries followed previously published methods (Li et al., 2021a). For L. latifolia, we generated a total of 200× clean Illumina reads from libraries with insert sizes of 350 bp (70×), 450 bp (40×), 2 kb (30×), 5 kb (30×) and 10 kb (30×). RNA for the full‐length transcriptome and expression atlas was extracted using the Magen® Plant RNA Kit according to the manufacturer's protocol. Library construction and sequencing followed the procedure used for L. angustifolia (Beekwilder et al., 2004).

Genome assembly, subgenome phasing and quality assessment

The genome size of LX was estimated by Jellyfish (https://github.com/gmarcais/Jellyfish, v2.2.9) using a k‐mer size of 17. De novo assembly of the PacBio reads was performed using FALCON (https://github.com/PacificBiosciences/FALCON, v0.7) with the parameters ‐max_diff 100, ‐max_cov 100 and ‐min_cov 3. The resulting assemblies were then polished by the consensus‐calling algorithm Quiver (https://www.pacb.com, v7.0.1.66975). Illumina paired‐end reads were mapped to the contig assemblies, which were then corrected using Pilon (https://github.com/broadinstitute/pilon, v1.22). The 10× Genomics data were aligned to the assembly using BWA (https://github.com/lh3/bwa, v0.7.8) and scaffolding was performed by FragScaff (https://sourceforge.net/projects/fragscaff, v140324.1). Finally, the genome assembly was anchored onto pseudochromosomes by the LACHESIS (http://shendurelab.github.io/LACHESIS) pipeline. The names and orientation of LX pseudochromosomes were adjusted based on their mapping position on LA and LL genomes using NUCmer software.
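
Jellyfish counts k‐mers; the genome size is then derived from the resulting depth histogram. The following is a minimal sketch of one common approach, assuming genome size ≈ (total number of k‐mers) / (depth of the main histogram peak). The histogram values, the `min_depth` cut‐off and the function name are hypothetical and not taken from this study; for a highly heterozygous hybrid such as lavandin, the histogram may show more than one peak and the relevant one may need to be chosen manually.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size (in bases) from a k-mer depth histogram.

    `histogram` maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below `min_depth` are treated as sequencing errors and ignored
    (the cut-off is an assumption and is normally chosen by inspecting the plot).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Hypothetical toy histogram: an error peak at low depth and a main peak at 20x
    toy_histogram = {1: 1000, 2: 200, 19: 50, 20: 100, 21: 50}
    print(estimate_genome_size(toy_histogram))
```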

Complementary methods were employed to evaluate the quality of the genome assembly. First, the completeness of genome and protein‐coding gene was assessed based on conserved plant genes (embryophyta_odb10) in the BUSCO (https://busco.ezlab.org, v3.0.2) databases. Second, Illumina short‐reads were mapped to the assembled genome using BWA (https://github.com/lh3/bwa, v0.7.8) to assess coverage rate and average depth. Third, the genome LTR assembly index (LAI) was evaluated by LTR_retriever (https://github.com/oushujun/LTR_retriever, v1.0.7) package.

Genome annotation

The repetitive sequences were annotated by combining de novo‐based and homology‐based approaches. RepeatMasker and RepeatProteinMask (http://www.repeatmasker.org, v4.0.5) were used to identify TEs by alignment to the repeat library (Repbase), whereas de novo prediction of TEs was performed using RepeatModeler (http://www.repeatmasker.org/RepeatModeler, v1.0.4), RepeatScout (http://www.repeatmasker.org, v1.0.5) and LTR_Finder (https://github.com/xzhub/LTR_Finder, v1.0.7). Tandem repeats were detected using Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html, v4.07b).

To predict protein‐coding genes, we used homologue‐based prediction, ab initio prediction and transcriptome‐based prediction. For homologue‐based prediction, sequences of homologous proteins from eight plants (Arabidopsis thaliana, Catharanthus roseus, Salvia miltiorrhiza, Salvia splendens, Scutellaria baicalensis, Sesamum indicum, Solanum lycopersicum and Vitis vinifera) were aligned to the genome using TblastN with an E‐value cut‐off of 1E‐5. Then, we used GeneWise (https://www.ebi.ac.uk/Tools/psa/genewise, v2.4.1) to generate gene models based on the alignments of proteins to the genomic sequence. For ab initio prediction, five ab initio gene prediction programs were used to predict gene models, including Augustus (http://bioinf.uni‐greifswald.de/augustus, v3.2.3), GENSCAN (http://hollywood.mit.edu/GENSCAN.html, v1.0), geneid (https://genome.crg.cat/software/geneid, v1.4), GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm, v3.0.4) and SNAP (http://korflab.ucdavis.edu/software.html, v2006‐07‐28). Additionally, in order to optimize the genome annotation, five tissues (root, stem, leaf, flower and peltate glandular trichome) RNA‐seq data were aligned to the lavender genome using TopHat (http://ccb.jhu.edu/software/tophat, v2.0.13) to identify exon regions and splicing positions. The alignment results were then used as input for Cufflinks (http://cole‐trapnell‐lab.github.io/cufflinks, v2.1.1) to assemble transcripts to the gene models. Finally, all genes predicted from the above three approaches were merged to form a comprehensive and nonredundant gene set by EVidenceModeler (http://evidencemodeler.github.io, v1.1.1).

We then generated functional annotations of the lavender genes with BLAST in public protein databases, including Swiss‐Prot (https://www.uniprot.org), NR (https://www.ncbi.nlm.nih.gov), InterPro (http://www.ebi.ac.uk/interpro, v32.0), Pfam (http://pfam.xfam.org, v27.0) and KEGG (https://www.kegg.jp). We identified the tRNA genes using tRNAscan‐SE software (http://lowelab.ucsc.edu/tRNAscan‐SE, v1.3.1). The rRNA fragments were predicted by aligning to the plant rRNA sequences using BlastN at an E‐value of 1E‐10. The miRNA and snRNA genes were predicted by Infernal software (http://eddylab.org/infernal, v1.1rc4) against the Rfam database (http://rfam.xfam.org, v13.0).

Orthologous gene pairs analysis

The orthologous gene pairs between subgenomes of lavandin were identified using collinearity analysis, which was performed using the JCVI utility libraries (https://github.com/tanghaibao/jcvi, v1.1.17). LAST software (https://gitlab.com/mcfrith/last, v1238) was applied to perform homologue search between the two subgenomes, and then the syntenic blocks were screened by Mcscan. The result was filtered and integrated by jcvi.compara.synteny mcscan module. The orthologous gene pairs between LX‐LA and LX‐LL were defined as allelic genes in the hybrid genotype. RNA‐seq data from five tissue types (root, stem, leaf, flower and glandular trichome) were mapped against the lavandin genome to determine the expression landscape of allelic genes. The FPKM of each pair of allelic genes was computed by HTSeq, and the DESeq R package was used for differential expression analyses. Differentially expressed alleles were defined with an adjusted P‐value of <0.05.
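
To make the expression comparison concrete, the sketch below shows how an FPKM value and a per‐pair log2 ratio between LX‐LA and LX‐LL alleles could be computed from fragment counts. It uses the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments); the pseudocount, the function names and all numbers are hypothetical, and the sketch does not reproduce the DESeq statistical test used in the study.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_allele_ratio(fpkm_la, fpkm_ll, pseudocount=1.0):
    """log2(LX-LA / LX-LL) expression ratio for one allelic gene pair.

    The pseudocount (an assumption) avoids division by zero for silent alleles.
    Positive values indicate bias towards the LX-LA allele.
    """
    return math.log2((fpkm_la + pseudocount) / (fpkm_ll + pseudocount))


if __name__ == "__main__":
    # Hypothetical counts for one allelic pair in one tissue
    la = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=10_000_000)
    ll = fpkm(fragments=200, gene_length_bp=2000, total_mapped_fragments=10_000_000)
    print(la, ll, log2_allele_ratio(la, ll))
```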

Insertion time, position and phylogenetic analysis of LTR‐RT

For the de novo prediction of intact LTR‐RTs, LTR_FINDER (v1.07) (Xu and Wang, 2007) and LTRharvest (v1.5.8) (Ellinghaus et al., 2008) were used against the genome sequences. The two terminal repeats of each LTR‐RT were aligned with MUSCLE (v3.8.31) (Edgar, 2004), and elements with a nucleotide divergence rate (λ) between the two repeats exceeding 0.75 were filtered out. The genetic distance (K) was calculated using the formula K = −0.75 ln(1 − 4λ/3). Kimura distance analysis was used to infer the evolutionary distance of LTR‐RTs, based on the calculation of pairwise divergence between LTR‐RT copies and consensus sequences. The insertion time (T) of each LTR‐RT was calculated with the formula T = K/(2r), where r represents the nucleotide substitution rate, which was set to 1.32 × 10⁻⁸ substitutions/site/year. A neighbour‐joining unrooted phylogenetic tree was constructed by uncorrected pairwise distances using TreeBeST (v1.9.2, http://treesoft.sourceforge.net/treebest.shtml) with the suggested parameters. The LTR‐RTs were classified into Copia, Gypsy and other superfamilies using LTRdigest (v1.07), whereas the secondary LTR‐RTs were determined according to their phylogenetic properties.
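
The two formulas above can be sketched in a few lines. The substitution rate is the value stated in the text; the function names and the example divergence of 0.02 are hypothetical and used only for illustration.

```python
import math

# Substitution rate stated in the text (substitutions/site/year)
SUBSTITUTION_RATE = 1.32e-8


def genetic_distance(divergence):
    """K = -0.75 * ln(1 - 4*lambda/3); only defined for 0 <= lambda < 0.75."""
    if not 0.0 <= divergence < 0.75:
        raise ValueError("divergence must be in the interval [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * divergence / 3.0)


def insertion_time_years(divergence, rate=SUBSTITUTION_RATE):
    """T = K / (2r), in years."""
    return genetic_distance(divergence) / (2.0 * rate)


if __name__ == "__main__":
    example_divergence = 0.02  # hypothetical divergence between the two LTRs
    print(f"{insertion_time_years(example_divergence):.0f} years")
```

The factor of 2 in the denominator reflects that both terminal repeats accumulate substitutions independently after insertion. The logarithm is undefined once λ reaches 0.75, which is why highly divergent pairs have to be excluded beforehand.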

To analyse the insertion positions of LTR‐RTs, we first determined whether LTR‐RTs were inserted within a specific range (2 kb upstream/downstream and intragenic) of each gene in the lavender genome and divided the genes into two types: with and without LTR‐RT insertions. We then calculated the odds ratio from the comparison of the two types of genes and the P value according to the hypergeometric distribution (Fisher's exact test). Finally, the LTR‐RTs specifically enriched in each gene family were analysed.
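
The enrichment test can be illustrated with a 2 × 2 table of gene counts (gene family versus all other genes, with versus without LTR‐RT insertion). This is a minimal sketch that uses only the Python standard library and assumes a one‐sided test for enrichment; the table layout, function names and counts are hypothetical and are not taken from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2 x 2 table

                      with LTR-RT   without LTR-RT
    gene family            a              b
    other genes            c              d
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def enrichment_p_value(a, b, c, d):
    """One-sided P(X >= a) under the hypergeometric distribution,
    i.e. Fisher's exact test for enrichment of insertions in the family."""
    family = a + b        # genes in the family
    with_ltr = a + c      # genes carrying an insertion
    total = a + b + c + d
    upper = min(family, with_ltr)
    favourable = sum(
        comb(with_ltr, k) * comb(total - with_ltr, family - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, family)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 family genes carry an insertion, 1 of 4 others do
    print(odds_ratio(3, 1, 1, 3), enrichment_p_value(3, 1, 1, 3))
```

For genome‐scale tables, a library routine such as SciPy's `fisher_exact` could be used instead of the explicit sum.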

Identification of terpenoid biosynthetic genes

Protein sequences of L. angustifolia and Arabidopsis thaliana annotated as involved in the terpenoid biosynthesis pathway (including ACAT, HMGS, HMGR, MVK, PMK, MVD, DXR, DXS, MCT, CMK, MDS, HDS, HDR, IDI, GPPS, FPPS and TPS) were used as reference (Li et al., 2021a). We searched for homologues to these proteins in the genomes of LX and LL using BLASTP with an E‐value cut‐off of 1e−5. Conserved domains were used as search queries against the predicted proteome using hmmsearch in HMMER. For monoterpenoid synthases identification, LX members of TPS‐b subfamily and published protein sequences of Lavandula, including CINS (Demissie et al., 2012), LINS (Landmann et al., 2007) and BPPS (Despinasse et al., 2017), were used to construct the ML tree using iqtree software. For prenyltransferases LPPS, the ML tree was constructed using protein sequences of LX prenyltransferases and the published LPPS (Demissie et al., 2013).

AAT and BDH analyses of lavender and other species of Lamiaceae

AAT and BDH belong to the BAHD and SDR superfamilies, respectively (D'Auria, 2006; Ma et al., 2021). We first downloaded the protein sequences of BAHD (Yu et al., 2009) and SDR (Moummou et al., 2012) of Arabidopsis. We then constructed an ML tree with the published AAT (Sarker and Mahmoud, 2015) and BDH (Sarker et al., 2012) of Lavandula using iqtree software. Next, we performed a detailed analysis of the genes clustered with the published AAT and BDH by protein and DNA sequence alignment, gene expression testing, gene cloning and pseudogene recovery. The region of the LA or LL genome containing an AAT or BDH gene and about 5–10 genes upstream and downstream was compared with the corresponding genomic regions of LX‐LA, LX‐LL, Ocimum basilicum (Obas), Salvia splendens (Sspl) and Thymus quinquecostatus (Tqui) using JCVI utility libraries (https://github.com/tanghaibao/jcvi, v1.1.17). The coding sequences of AAT or BDH were confirmed by PCR and RNA‐Seq. The coding sequences of pseudogenes were recovered manually.

Functional characterization of AATs and BDHs in vitro

The cDNAs for targeted genes were cloned using specific primers (Table S21). Positive colonies were verified by sequencing and then subcloned into the pET‐28a (+) expression vector (Novagen, USA) according to the protocol of the pEASY®‐Uni Seamless Cloning and Assembly Kit (TransGen Biotech, China). The recombinant proteins were expressed and purified following the methods described previously (Ma et al., 2021; Sarker and Mahmoud, 2015). For the in vitro enzyme assays, BDH reactions were performed in a final volume of 200 μL of buffer (10 mM sodium phosphate buffer, pH 8.0, 1 mM NAD+) containing 10 μg of purified protein and 0.1 mM borneol; AAT reactions were performed in a final volume of 500 μL of buffer (Tris‐HCl, pH 7.5, 0.2 mM acetyl CoA) containing 10 μg of purified protein and 5 mM linalool or lavandulol. All reactions were incubated at 30 °C with 150 rpm shaking overnight.

Assay products were analysed using a Varian CP‐3800/Saturn 2000 apparatus (Varian, Walnut Creek, CA, USA) equipped with a Zebron ZB‐5 MSI (30 m × 0.25 mm × 0.25 μm) column (Phenomenex, Shim‐Pol, Poland). The GC–MS conditions were the same as those described above for the volatile analysis. The products were confirmed by comparing their RI and mass spectra to those of authentic standards analysed under the same conditions.

The methods for Chloroplast genome assembly and phylogenetic analysis (Method S1), Genome size and heterozygosity evaluation (Method S2) and Analysis of gene family, phylogenetic evolution and WGD (Method S3) are attached in the Supporting Information files.

Conflict of Interest

The authors declare no competing financial interests.

Author Contributions

LS, HL and JRL conceived and designed the study; JRL, HL, DW, YMD, ZYL and HTB prepared the materials; JRL and WYZ performed the experiments; JRL, YMW and XDH analysed data and prepared results; JRL wrote and revised the manuscript. All authors read and approved the final draft.

Supporting information

Figure S1 The karyotype of L. × intermedia ‘Super’.

Figure S2 Genome size estimation of L. × intermedia and L. latifolia based on flow cytometry study.

Figure S3 17‐mer analysis to estimate the genome size of L. × intermedia and L. latifolia.

Figure S4 The relation of different regions along chromosomes in the genome of L. × intermedia.

Figure S5 The workflow of L. × intermedia genome assembly.

Figure S6 Coverage depth of L. × intermedia genome.

Figure S7 Density distributions of 4dTv for paralogous genes in Lavandula.

Figure S8 The chloroplast genome of Lavandula.

Figure S9 Characteristics of complete chloroplast genomes for Lavandula species.

Figure S10 Phylogenetic tree reconstructions of Lavandula using ML based on whole chloroplast genome sequences.

Figure S11 Microsynteny between two subgenomes of L. × intermedia.

Figure S12 Dotplots of syntenic blocks between two subgenomes of L. × intermedia.

Figure S13 KEGG pathway enrichment of genes with structural variants between the LX‐LA and LX‐LL subgenomes.

Figure S14 Large structural variants (≥50 bp) in LA vs LL and LX‐LA vs LX‐LL comparisons.

Figure S15 The number of total gene pairs with homeolog expression bias across the five tissues within hybrid lavandin.

Figure S16 The number of total gene pairs with significant homeolog expression bias across the five tissues within hybrid lavandin.

Figure S17 KEGG pathway enrichment of genes with intragenic LTR‐RT insertions between the LX‐LA and LX‐LL subgenomes.

Figure S18 The number of terpenoid biosynthetic gene pairs with homeolog expression bias across the five tissues within hybrid lavandin.

Figure S19 The number of terpenoid biosynthetic gene pairs with significant homeolog expression bias across the five tissues within hybrid lavandin.

Figure S20 Transposable element insertions in gene families related to terpenoid biosynthesis in L. angustifolia, L. × intermedia and L. latifolia.

Figure S21 Tissue‐specific expression patterns of terpenoid biosynthetic genes harbouring LTR‐RTs (green) or lacking LTR‐RTs (orange) in upstream sequences.

Figure S22 Tissue‐specific expression patterns of terpenoid biosynthetic genes harbouring LTR‐RTs (green) or lacking LTR‐RTs (orange) in intergenic sequences.

Figure S23 Expression patterns of members of the TPS‐b subfamily in the LX‐LA and LX‐LL subgenomes across the five tissues.

Figure S24 Amino acid sequence alignment of putative AATs in Lavandula.

Figure S25 Phylogenetic analysis of BAHD gene family in Lavandula.

Figure S26 SDS‐PAGE analysis of protein samples extracted from bacterial cells expressing AAT3 and AAT4 of L. angustifolia.

Figure S27 The mass spectrum of linalyl acetate and lavandulyl acetate.

Figure S28 Sequence alignment of cloned La14G01640 and genomic La14G01640.

Figure S29 Mapping of RNA‐Seq reads to La14G01640.

Figure S30 Microsynteny between chromosome 14 and chromosome 15 of L. angustifolia nearby the AAT locus.

Figure S31 Microsynteny between L. angustifolia chromosome 14 and L. latifolia chromosome 3 nearby the AAT locus.

Figure S32 Phylogenetic analysis of all putative AAT copies and recovered AAT pseudogenes in Lavandula.

Figure S33 Microsynteny analysis of AAT3 and AAT4 across Lavandula (LA, LX‐LA, LX‐LL and LL), Ocimum basilicum (Obas), Salvia splendens (Sspl) and Thymus quinquecostatus (Tqui).

Figure S34 Phylogenetic analysis of SDR gene family in Lavandula.

Figure S35 Amino acid sequence alignment of putative BDHs in Lavandula.

Figure S36 The gene expression levels of BDHs in flower, root, leaf and stem of L. latifolia.

Figure S37 Microsynteny between L. angustifolia chromosome 3 and 12 and L. latifolia chromosome 20 and 12 nearby the BDH locus.

Figure S38 SDS‐PAGE analysis of protein samples extracted from bacterial cells expressing BDHs using the pDE2 vector.

Figure S39 The mass spectrum of camphor.

Figure S40 Microsynteny analysis of BDH loci 1, 3 and 4 and BDH locus 2 across Lavandula (LA, LX‐LA, LX‐LL and LL), Ocimum basilicum (Obas), Salvia splendens (Sspl) and Thymus quinquecostatus (Tqui).

Method S1 Chloroplast genome assembly and phylogenetic analysis.

Method S2 Genome size and heterozygosity evaluation.

Method S3 Analysis of gene family, phylogenetic evolution and WGD.

PBI-21-2084-s005.pdf (3.9MB, pdf)

Table S1 Identification of volatile terpenoids in three lavender taxa.

PBI-21-2084-s019.xlsx (14.3KB, xlsx)

Table S2 Statistics on sequencing of the L. × intermedia genome.

PBI-21-2084-s016.xlsx (9.2KB, xlsx)

Table S3 Statistics on sequencing of the L. latifolia genome.

PBI-21-2084-s022.xlsx (9.3KB, xlsx)

Table S4 Results of initial genome assembly of L. latifolia.

PBI-21-2084-s015.xlsx (9.4KB, xlsx)

Table S5 Results of Hi‐C‐assisted assembly of L. latifolia genome.

PBI-21-2084-s008.xlsx (9.5KB, xlsx)

Table S6 Results of Hi‐C‐assisted assembly of L. × intermedia genome.

PBI-21-2084-s006.xlsx (9.5KB, xlsx)

Table S7 Results of BUSCO assessment of Lavandula genome.

PBI-21-2084-s017.xlsx (9.4KB, xlsx)

Table S8 The long terminal repeat assembly index score of Lavandula genome.

PBI-21-2084-s012.xlsx (9.1KB, xlsx)

Table S9 Prediction of the gene structure of L. latifolia genome.

PBI-21-2084-s002.xlsx (10.5KB, xlsx)

Table S10 Prediction of the gene structure of L. × intermedia genome.

PBI-21-2084-s001.xlsx (10.5KB, xlsx)

Table S11 Basic statistical results of the gene structure of relative species.

PBI-21-2084-s011.xlsx (9.8KB, xlsx)

Table S12 Statistical results of gene functional annotations.

PBI-21-2084-s013.xlsx (9.3KB, xlsx)

Table S13 Statistics of non‐coding RNA in the genome of L. × intermedia and L. latifolia.

PBI-21-2084-s020.xlsx (10.4KB, xlsx)

Table S14 Large variants in comparisons of LA vs LL and LX‐LA vs LX‐LL.

PBI-21-2084-s004.xlsx (10.4KB, xlsx)

Table S15 Transposable elements (TEs) classification in genomes of three lavender taxa.

PBI-21-2084-s009.xlsx (18.4KB, xlsx)

Table S16 The homologue gene pairs between the two subgenomes of L. × intermedia.

PBI-21-2084-s014.xlsx (1MB, xlsx)

Table S17 The position of TEs inserted in terpenoid biosynthetic genes of Lavandula.

PBI-21-2084-s021.xlsx (22.3KB, xlsx)

Table S18 Monoterpenoid biosynthetic genes identified in lavandin.

PBI-21-2084-s007.xlsx (14.3KB, xlsx)

Table S19 TEs inserted upstream of Ll12G01326.

PBI-21-2084-s018.xlsx (12.5KB, xlsx)

Table S20 TE insertions near Ll11G00242.

PBI-21-2084-s003.xlsx (10.2KB, xlsx)

Table S21 Primers for cloning AATs and BDHs from L. × intermedia.

PBI-21-2084-s010.xlsx (9.4KB, xlsx)

Acknowledgements

We thank Yalong Guo (Institute of Botany, Chinese Academy of Sciences) for helpful discussions on heterosis and Yan Zhu (Institute of Botany, Chinese Academy of Sciences) for her help with terpenoid analysis experiments. This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDA23080603), the National Natural Science Foundation of China (grant nos. 31701956 and 32270411), the Special Research Assistant Grant of the Chinese Academy of Sciences and the China Postdoctoral Science Foundation (grant no. 2021M703477).

Contributor Information

Xiaodi Hu, Email: huxiaodi@novogene.com.

Lei Shi, Email: shilei_67@126.com.

Data Availability Statement

The raw genome and transcriptome sequencing data of L. × intermedia ‘Super’ and L. latifolia have been deposited in the National Center for Biotechnology Information (NCBI) database under project number PRJNA700125.

References

  1. Abbas, F., Ke, Y., Yu, R., Yue, Y., Amanullah, S., Jahangir, M.M. and Fan, Y. (2017) Volatile terpenoids: multiple functions, biosynthesis, modulation and manipulation by genetic engineering. Planta, 246, 803–816.
  2. Alonge, M., Wang, X., Benoit, M., Soyk, S., Pereira, L., Zhang, L., Suresh, H. et al. (2020) Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell, 182, 145–161.
  3. Alseekh, S., Scossa, F. and Fernie, A.R. (2020) Mobile transposable elements shape plant genome diversity. Trends Plant Sci. 25, 1062–1064.
  4. Aprotosoaie, A.C., Gille, E., Trifan, A., Luca, V.S. and Miron, A. (2017) Essential oils of Lavandula genus: a systematic review of their chemistry. Phytochem. Rev. 16, 761–799.
  5. Beekwilder, J., Alvarez‐Huerta, M., Neef, E., Verstappen, F.W., Bouwmeester, H.J. and Aharoni, A. (2004) Functional characterization of enzymes forming volatile esters from strawberry and banana. Plant Physiol. 135, 1865–1878.
  6. Bird, K.A., VanBuren, R., Puzey, J.R. and Edger, P.P. (2018) The causes and consequences of subgenome dominance in hybrids and recent polyploids. New Phytol. 220, 87–93.
  7. Boutanaev, A.M. and Osbourn, A.E. (2018) Multigenome analysis implicates miniature inverted‐repeat transposable elements MITEs in metabolic diversification in eudicots. Proc. Natl Acad. Sci., USA, 115, E6650–E6658.
  8. D'Auria, J.C. (2006) Acyltransferases in plants: a good time to be BAHD. Curr. Opin. Plant Biol. 9, 331–340.
  9. David, P., Chen, N.W., Pedrosa‐Harand, A., Thareau, V., Sévignac, M., Cannon, S.B., Debouck, D. et al. (2009) A nomadic subtelomeric disease resistance gene cluster in common bean. Plant Physiol. 151, 1048–1065.
  10. Demissie, Z.A., Cella, M.A., Sarker, L.S., Thompson, T.J., Rheault, M.R. and Mahmoud, S.S. (2012) Cloning, functional characterization and genomic organization of 1,8‐cineole synthases from Lavandula. Plant Mol. Biol. 79, 393–411.
  11. Demissie, Z.A., Erland, L., Rheault, M.R. and Mahmoud, S.S. (2013) The biosynthetic origin of irregular monoterpenes in Lavandula: isolation and biochemical characterization of a novel cis‐prenyl diphosphate synthase gene, lavandulyl diphosphate synthase. J. Biol. Chem. 288, 6333–6341.
  12. Despinasse, Y., Fiorucci, S., Antonczak, S., Moja, S., Bony, A., Nicolè, F., Baudino, S. et al. (2017) Bornyl‐diphosphate synthase from Lavandula angustifolia: A major monoterpene synthase involved in essential oil quality. Phytochemistry, 137, 24–33.
  13. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
  14. Edger, P.P., Smith, R., McKain, M.R., Cooley, A.M., Vallejo‐Marin, M., Yuan, Y., Bewick, A.J. et al. (2017) Subgenome dominance in an interspecific hybrid, synthetic allopolyploid, and a 140‐year‐old naturally established neo‐allopolyploid monkeyflower. Plant Cell, 29, 2150–2167.
  15. Ellinghaus, D., Kurtz, S. and Willhoeft, U. (2008) LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 1–14.
  16. Erik, D.V., Sawers, R.H. and Angélica, C.J. (2020) Cis‐ and trans‐regulatory variations in the domestication of the chili pepper fruit. Mol. Biol. Evol. 6, 1594–1603.
  17. Feschotte, C., Jiang, N. and Wessler, S.R. (2002) Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3, 329–341.
  18. Freeling, M. (2009) Bias in plant gene content following different sorts of duplication: tandem, whole‐genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60, 433–453.
  19. Frezza, C., Venditti, A., Giuliani, C., Foddai, S., Maggi, F., Fico, G., Bianco, A. et al. (2019) Preliminary study on the phytochemical evolution of different Lamiaceae species based on iridoids. Biochem. System. Ecol. 82, 44–51.
  20. Gonda, I., Faigenboim, A., Adler, C., Milavski, R., Karp, M.J., Shachter, A., Ronen, G. et al. (2020) The genome sequence of tetraploid sweet basil, Ocimum basilicum L., provides tools for advanced genome editing and molecular breeding. DNA Res. 27, dsaa027.
  21. Guitton, Y., Nicolè, F., Moja, S., Valot, N., Legrand, S., Jullien, F. and Legendre, L. (2010) Differential accumulation of volatile terpene and terpene synthase mRNAs during lavender Lavandula angustifolia and L. × intermedia inflorescence development. Physiol. Plant. 138, 150–163.
  22. Guitton, Y., Nicolè, F., Jullien, F., Caissard, J.C., Saint‐Marcoux, D., Legendre, L., Pasquier, B. et al. (2018) A comparative study of terpene composition in different clades of the genus Lavandula. Bot. Lett. 165, 494–505.
  23. He, T., Li, X., Wang, X., Xu, X., Yan, X., Li, X., Sun, S. et al. (2020) Chemical composition and anti‐oxidant potential on essential oils of Thymus quinquecostatus Celak. from Loess Plateau in China, regulating Nrf2/Keap1 signaling pathway in zebrafish. Sci. Rep. 10, 1–18.
  24. Hind, K.R., Adal, A.M., Upson, T.M. and Mahmoud, S.S. (2018) An assessment of plant DNA barcodes for the identification of cultivated Lavandula (Lamiaceae) taxa. Biocatal. Agric. Biotechnol. 16, 459–466.
  25. Jia, K.H., Liu, H., Zhang, R.G., Xu, J., Zhou, S.S., Jiao, S.Q., Yan, X.M. et al. (2021) Chromosome‐scale assembly and evolution of the tetraploid Salvia splendens Lamiaceae genome. Hortic. Res. 8, 177.
  26. Kim, S., Mun, S., Kim, T., Lee, K.H., Kang, K., Cho, J.Y. and Han, K. (2019) Transposable element‐mediated structural variation analysis in dog breeds using whole‐genome sequencing. Mamm. Genome, 30, 289–300.
  27. Landmann, C., Fink, B., Festner, M., Dregus, M. and Schwab, W. (2007) Cloning and functional characterization of three terpene synthases from lavender (Lavandula angustifolia). Arch. Biochem. Biophys. 465, 417–429.
  28. Lee, S.J., Umano, K., Shibamoto, T. and Lee, K.G. (2005) Identification of volatile components in basil (Ocimum basilicum L.) and thyme leaves (Thymus vulgaris L.) and their antioxidant properties. Food Chem. 91, 131–137.
  29. Li, H., Li, J., Dong, Y., Hao, H., Ling, Z., Bai, H., Wang, H. et al. (2019) Time‐series transcriptome provides insights into the gene regulation network involved in the volatile terpenoid metabolism during the flower development of lavender. BMC Plant Biol. 19, 1–17.
  30. Li, J., Wang, Y., Dong, Y., Zhang, W., Wang, D., Bai, H., Li, K. et al. (2021a) The chromosome‐based lavender genome provides new insights into Lamiaceae evolution and terpenoid biosynthesis. Hortic. Res. 8, 53.
  31. Li, Y., Leveau, A., Zhao, Q., Feng, Q., Lu, H., Miao, J., Xue, Z. et al. (2021b) Subtelomeric assembly of a multi‐gene pathway for antimicrobial defense compounds in cereals. Nat. Commun. 12, 1–13.
  32. Ma, R., Su, P., Jin, B., Guo, J., Tian, M., Mao, L., Tang, J. et al. (2021) Molecular cloning and functional identification of a high‐efficiency (+)‐borneol dehydrogenase from Cinnamomum camphora L. Presl. Plant Physiol. Biochem. 158, 363–371.
  33. Mansfeld, B.N., Boyher, A., Berry, J.C., Wilson, M., Ou, S., Polydore, S., Michael, T.P. et al. (2021) Large structural variations in the haplotype‐resolved African cassava genome. Plant J. 108, 1830–1848.
  34. Mathew, J. and Thoppil, J.E. (2011) Chemical composition and mosquito larvicidal activities of Salvia essential oils. Pharm. Biol. 49, 456–463.
  35. Mathias, F., Aurélie, B., Florence, N., Sandrine, M. and Frédéric, J. (2023) Lavandula angustifolia Mill. a model of aromatic and medicinal plant to study volatile organic compounds synthesis, evolution and ecological functions. Bot. Lett. 170, 65–76.
  36. Moummou, H., Kallberg, Y., Tonfack, L.B., Persson, B. and Rest, B.V.D. (2012) The plant short‐chain dehydrogenase (SDR) superfamily: genome‐wide inventory and diversification patterns. BMC Plant Biol. 12, 219.
  37. Nagegowda, D.A. and Gupta, P. (2020) Advances in biosynthesis, regulation, and metabolic engineering of plant specialized terpenoids. Plant Sci. 294, 110457.
  38. Ou, S., Chen, J. and Jiang, N. (2018) Assessing genome assembly quality using the LTR Assembly Index LAI. Nucleic Acids Res. 46, e126.
  39. Panchy, N., Lehti‐Shiu, M. and Shiu, S.H. (2016) Evolution of gene duplication in plants. Plant Physiol. 171, 2294–2316.
  40. Pichersky, E. and Raguso, R.A. (2018) Why do plants produce so many terpenoid compounds? New Phytol. 220, 692–702.
  41. Pokajewicz, K., Czarniecka‐Wiera, M., Krajewska, A., Maciejczyk, E. and Wieczorek, P.P. (2023) Lavandula × intermedia—a bastard lavender or a plant of many values? Part I. biology and chemical composition of lavandin. Molecules, 28(7), 2943.
  42. Roy, S.W. (2021) Dual fertilization, intragenomic conflict, genome downsizing, and angiosperm dominance. Trends Plant Sci. 26, 767–769.
  43. Salehi, B., Mnayer, D., Özçelik, B., Altin, G., Kasapoğlu, K.N., Daskaya‐Dikmen, C., Sharifi‐Rad, M. et al. (2018) Plants of the genus Lavandula: from farm to pharmacy. Nat. Prod. Commun. 13, 1934578X1801301.
  44. Sarker, L.S. and Mahmoud, S.S. (2015) Cloning and functional characterization of two monoterpene acetyltransferases from glandular trichomes of L. × intermedia. Planta, 242, 709–719.
  45. Sarker, L.S., Galata, M., Demissie, Z.A. and Mahmoud, S.S. (2012) Molecular cloning and functional characterization of borneol dehydrogenase from the glandular trichomes of Lavandula × intermedia. Arch. Biochem. Biophys. 528, 163–170.
  46. Sharbrough, J., Conover, J.L., Tate, J.A., Wendel, J.F. and Sloan, D.B. (2017) Cytonuclear responses to genome doubling. Am. J. Bot. 104, 1277–1280.
  47. Singh, P., Kalunke, R.M., Shukla, A., Tzfadia, O., Thulasiram, H.V. and Giri, A.P. (2020) Biosynthesis and tissue‐specific partitioning of camphor and eugenol in Ocimum kilimandscharicum. Phytochemistry, 177, 112451.
  48. Sun, Y., Shang, L., Zhu, Q.H., Fan, L. and Guo, L. (2021) Twenty years of plant genome sequencing: Achievements and challenges. Trends Plant Sci. 27, 391–401.
  49. Sun, M., Zhang, Y., Zhu, L., Liu, N., Bai, H., Sun, G., Zhang, J. et al. (2022) Chromosome‐level assembly and analysis of the Thymus genome provide insights into glandular secretory trichome formation and monoterpenoid biosynthesis in thyme. Plant Commun. 3, 100413.
  50. Upson, T. and Andrews, S. (2004) The Genus Lavandula. Royal Botanic Gardens, Kew. Chicago, IL: The University of Chicago Press.
  51. Urwin, N.A. (2014) Generation and characterisation of colchicine‐induced polyploid Lavandula × intermedia. Euphytica, 197, 331–339.
  52. VanBuren, R., Man, W.C., Wang, X., Pardo, J., Yocca, A.E., Wang, H., Chaluvadi, S.R. et al. (2020) Exceptional subgenome stability and functional divergence in the allotetraploid Ethiopian cereal teff. Nat. Commun. 11, 1–11.
  53. Vicient, C.M. and Casacuberta, J.M. (2017) Impact of transposable elements on polyploid plant genomes. Ann. Bot. 120, 195–207.
  54. Wang, Z., Miao, H., Liu, J., Xu, B., Yao, X., Xu, C., Zhao, S. et al. (2019) Musa balbisiana genome reveals subgenome evolution and functional divergence. Nat. Plants, 5, 810–821.
  55. Wink, M. (2003) Evolution of secondary metabolites from an ecological and molecular phylogenetic perspective. Phytochemistry, 64, 3–19.
  56. Woronuk, G., Demissie, Z., Rheault, M. and Mahmoud, S. (2011) Biosynthesis and therapeutic properties of Lavandula essential oil constituents. Planta Med. 77, 7–15.
  57. Xia, E., Tong, W., Hou, Y., An, Y., Chen, L., Wu, Q., Liu, Y. et al. (2020) The reference genome of tea plant and resequencing of 81 diverse accessions provide insights into its genome evolution and adaptation. Mol. Plant, 13, 1013–1026.
  58. Xiao, Y., Jiang, S., Cheng, Q., Wang, X., Yan, J., Zhang, R., Qiao, F. et al. (2021) The genetic mechanism of heterosis utilization in maize improvement. Genome Biol. 22, 1–29.
  59. Xu, Y.C. and Guo, Y.L. (2020) Less is more, natural loss‐of‐function mutation is a strategy for adaptation. Plant Commun. 1, 100103.
  60. Xu, Z. and Wang, H. (2007) LTR_FINDER: an efficient tool for the prediction of full‐length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268.
  61. Yu, X.H., Gou, J.Y. and Liu, C.J. (2009) BAHD superfamily of acyl‐CoA dependent acyltransferases in Populus and Arabidopsis: bioinformatics and gene expression. Plant Mol. Biol. 70, 421–442.
  62. Zhang, D., Pan, Q., Tan, C., Liu, L., Ge, X., Li, Z. and Yan, M. (2018) Homeolog expression is modulated differently by different subgenomes in Brassica napus hybrids and allotetraploids. Plant Mol. Biol. Report. 36, 387–398.
  63. Zhao, B., Cao, J.F., Hu, G.J., Chen, Z.W., Wang, L.Y., Shangguan, X.X., Wang, L.J. et al. (2018) Core cis‐element variation confers subgenome‐biased expression of a transcription factor that functions in cotton fiber elongation. New Phytol. 218, 1061–1075.

Articles from Plant Biotechnology Journal are provided here courtesy of Society for Experimental Biology (SEB) and the Association of Applied Biologists (AAB) and John Wiley and Sons, Ltd