Background
Isodon lophanthoides is a perennial herb distributed in southern China and southeast Asia, and the whole plant has medicinal value. The absence of a reference genome has hindered evolutionary and genomic breeding research on this species. Results: In this study, we present a high-quality, chromosome-level genome assembly of I. lophanthoides that integrates PacBio and Hi-C sequencing data. We assembled a genome of 412.78 Mb with a scaffold N50 of ~33.43 Mb, organized into 12 pseudochromosomes. This assembly includes 36,324 genes and 209.51 Mb of repetitive sequences. Phylogenetic analysis revealed that I. lophanthoides and its sister species Isodon rubescens diverged approximately 9.99 million years ago (MYA) and shared a recent whole-genome duplication (WGD) event. By combining gene expression profiles with metabolite fluctuations in response to methyl jasmonate, we further identified two key enzymes involved in the salicin synthesis pathway. Conclusions: This genome assembly provides an essential reference for future research on I. lophanthoides and enhances our understanding of salicin synthesis and medicinal metabolite profiles in response to exogenous methyl jasmonate.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12870-024-05979-5.
Keywords: Isodon lophanthoides, Genome assembly, Medicinal metabolites, Methyl jasmonate
Introduction
Isodon lophanthoides is a perennial herb, and the whole plant has medicinal value. It is distributed primarily in southern China, India, Myanmar, Nepal, Thailand, Vietnam, etc. [1]. In southern China, I. lophanthoides commonly thrives in moist habitats including valleys, fields, streams, and riverbanks [1]. Its leaves have a distinctive characteristic: when rubbed by hand, they release a yellow juice, which gave rise to the local name “Xihuangcao”. Researchers have so far confirmed four sources of the traditional Chinese medicinal herb “Xihuangcao”: I. lophanthoides, I. lophanthoides var. gerardianus, I. lophanthoides var. graciliflorus, and Isodon serra [2].
The genus Isodon encompasses a diverse array of medicinally active compounds. Four novel 7,20-non-epoxy-ent-kaurane skeleton diterpenes isolated from Isodon excisoides exhibited potent cytotoxic activities against human cancer cells [3]. Oridonin, isolated from I. rubescens, has antitumor and anticancer properties [4, 5]. As a traditional Chinese medicine, I. lophanthoides has been reported to exhibit anti-tumour, anti-inflammatory, anti-hepatitis, and anticancer properties [6]. The main components of I. lophanthoides are flavonoids, diterpenoids, triterpenes, etc. [6–8].
Previous research has primarily focused on terpenoids within this genus [9, 10]. Two oxidase genes, IrCYP706V2 and IrCYP706V7, involved in oridonin biosynthesis have been identified [11]. Additionally, six copalyl diphosphate synthase genes (IlCPS1-6) and five kaurene synthase-like genes (IlKSL1-5) have been confirmed in I. lophanthoides var. gerardiana [12].
Aspirin is widely recognized for its analgesic properties and is commonly used in clinical practice. Approximately 200 years ago, salicin was extracted from willow bark and subsequently oxidized to produce salicylic acid. This compound was then acetylated to yield aspirin [13, 14]. Because salicin is an important substance for the industrial synthesis of aspirin, elucidating its biosynthetic mechanism in vivo is of great importance. Recent research has preliminarily identified the synthetic pathways of salicin in willow [15, 16]. A UDP-glycosyltransferase from willow, UGT71L1, involved in salicinoid biosynthesis has been identified [17, 18]. Researchers used this enzyme to develop a metabolic engineering approach for salicin biosynthesis in Escherichia coli, presenting a viable method for its production [19]. These findings will support our investigation into the previously uncharacterized synthetic pathway of salicin in I. lophanthoides.
Genome assembly and multi-omics analyses have advanced our understanding of plant evolutionary history and provided insights into plant metabolic pathways, which will facilitate modern breeding efforts [20]. A previous study assembled a chromosome-level genome of I. rubescens and used multi-omics data to identify ent-kaurene hydroxylases and offer new perspectives on the evolution of the lineage-specific diterpenoid pathway [11]. Here, we present a high-quality, chromosome-level genome assembly of I. lophanthoides, revealing the genome characteristics and evolutionary history of the species. Further metabolomic profiling identified medicinal metabolites across tissues, and gene expression analysis in response to methyl jasmonate revealed genes in the salicin synthesis pathway. This genome assembly and metabolite analysis will provide a foundation for studying the evolution and medicinal value of I. lophanthoides.
Results
Karyotype, genome survey, and genome characteristics of Isodon lophanthoides
I. lophanthoides, a perennial herb with significant medicinal value, contains bioactive compounds with anti-inflammatory and anticancer properties (Fig. 1a) [6]. Knowing the chromosome number of a plant is crucial for genome assembly and directly influences the accuracy of the assembled genome. To determine the chromosome number of I. lophanthoides, a karyotype experiment was conducted, and the base chromosome number was found to be 12 (Fig. 1b).
Fig. 1.

Morphological, karyotype and genomic features of I. lophanthoides. (a) The morphological characteristics of I. lophanthoides; (b) The chromosome number of I. lophanthoides; (c) The Hi-C interaction matrix of I. lophanthoides; (d) Genome characteristics of I. lophanthoides
Before assembling the genome, a total of 52 Gb of Illumina data was used for a genome survey. GenomeScope with a k-mer size of 21 estimated the genome size, heterozygosity rate, and repeat sequence rate at 380 Mb, 0.53%, and 39.7%, respectively (Supplementary Data Figure S1). These results indicated that I. lophanthoides is a diploid with relatively low heterozygosity, which is favourable for assembling a high-quality genome.
A total of 11 Gb of HiFi data was used for genome assembly, resulting in 470 contigs with a contig N50 of 19.96 Mb. Then, 12 pseudochromosomes were anchored using Hi-C technology (Fig. 1c). The final assembled genome was 459.79 Mb with 35 scaffolds and a scaffold N50 of 33.43 Mb. Evaluation of assembly quality showed a complete BUSCO rate of 98.7%. After eliminating redundancy, the chromosome-level genome of I. lophanthoides was 412.78 Mb, located on 12 pseudochromosomes (Fig. 1d). The chromosome lengths ranged from 28.60 Mb to 42.83 Mb, with a GC content of 34.69%. LTR elements and repetitive sequences comprised 26.83% (110.74 Mb) and 50.75% (209.51 Mb) of the genome sequences, respectively. The MAKER pipeline [21] was run with three kinds of evidence (de novo, homology-based, and transcriptome-based) to predict gene structures. A total of 36,324 genes were predicted in the assembled genome. Assessment against the embryophyta_odb10 database suggested a BUSCO completeness of 92.1% for the gene annotation. Functional annotation indicated that 80.57% of the genes in I. lophanthoides had homologous annotation.
Genome comparison and evolutionary analysis of the genus Isodon
Investigating genome evolution can help disentangle the complex evolutionary history of the genus Isodon. In this study, 12 species were selected to construct a time-calibrated phylogenetic tree. A total of 610 single-copy orthologous genes were identified (Supplementary Data Figure S2). The phylogenetic tree indicated that I. lophanthoides and I. rubescens diverged about 9.99 MYA, suggesting that the genus Isodon may share a common ancestor. Further gene family analysis detected 619 expanded and 2,244 contracted gene families in I. lophanthoides (Fig. 2a). The expanded gene families were related to carbon fixation in photosynthetic organisms, glyoxylate and dicarboxylate metabolism, anthocyanin biosynthesis, and phenylpropanoid biosynthesis. The contracted gene families were involved in linoleic acid metabolism, arachidonic acid metabolism, and glucosinolate biosynthesis.
Fig. 2.
Evolutionary analysis of I. lophanthoides. (a) Phylogenetic tree and gene family contraction and expansion. (b) Genome collinearity comparison among I. lophanthoides, I. rubescens and Salvia miltiorrhiza. (c) Ks plot of Arabidopsis thaliana (At), Vitis vinifera (Vv), I. lophanthoides (Il), I. rubescens (Ir) and Thymus quinquecostatus (Tq)
Genome collinearity between I. lophanthoides and I. rubescens showed that most chromosomes had a one-to-one parallel relationship (Fig. 2b). Chromosomal rearrangements occurred between chromosomes 1 and 11 of I. rubescens and chromosomes 6 and 11 of I. lophanthoides. Furthermore, chromosomes 3, 8, and 12 of I. lophanthoides carry large inversions relative to I. rubescens. I. rubescens and Salvia miltiorrhiza showed less collinearity than I. rubescens and I. lophanthoides, indicating that S. miltiorrhiza is more distantly related to the two Isodon species.
A WGD event generates a large number of redundant genes in a plant, promoting rapid recombination of the genome, generating gene diversity, and thus promoting plant evolution [22]. Based on the distribution of synonymous substitutions per synonymous site (Ks), a peak at Ks ≈ 0.99 indicated that a WGD event occurred in I. lophanthoides, with an inferred r value of 6.34 × 10⁻⁹ to 6.65 × 10⁻⁹ (95% confidence interval). Using the formula divergence time = Ks / (2 × r), the WGD event in I. lophanthoides was dated to between 74.48 and 77.75 MYA and was shared with I. rubescens (Fig. 2c).
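The dating formula can be checked with a few lines of Python. This is a minimal sketch, not the authors' script: the function name `wgd_age_mya` is hypothetical, r is assumed to be expressed in substitutions per site per year, and the inputs are the rounded values quoted in the text. With these rounded inputs the result is roughly 74 to 78 MYA, close to the reported interval; small differences are expected because the unrounded Ks peak is not given.

```python
def wgd_age_mya(ks_peak: float, rate: float) -> float:
    """Return the age in million years using T = Ks / (2 * r).

    rate is assumed to be substitutions per synonymous site per year.
    """
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    return ks_peak / (2.0 * rate) / 1e6


# Rounded values quoted in the text (the exact Ks peak is not reported).
KS_PEAK = 0.99
RATE_LOW, RATE_HIGH = 6.34e-9, 6.65e-9

# A faster rate gives a younger estimate, a slower rate an older one.
younger = wgd_age_mya(KS_PEAK, RATE_HIGH)
older = wgd_age_mya(KS_PEAK, RATE_LOW)
print(f"Estimated WGD age: {younger:.1f} to {older:.1f} MYA")
```

Because the age is inversely proportional to r, the upper bound of the rate interval yields the lower bound of the age interval.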
Medicinal metabolites distribution in whole plant and fluctuation under exogenous methyl jasmonate treatment
Because I. lophanthoides is a traditional herb, it is essential to identify the metabolites present in the plant. We detected 3,002 metabolites with LC-MS widely targeted metabolite profiling. Annotation of the metabolites against KEGG metabolic pathways indicated 79 metabolites in the biosynthesis of amino acids pathway (map01230), 43 in arachidonic acid metabolism (map00590), 41 in flavonoid biosynthesis (map00941), 22 in diterpenoid biosynthesis (map00904), and 19 in alanine, aspartate and glutamate metabolism (map00250) (Fig. 3a).
Fig. 3.
Metabolite distribution in the whole plant and accumulation dynamics in leaf. a. The classification of partial metabolites. b. The distribution of metabolites in root, stem, and leaf. c. The significant enrichment of the metabolites in root (blue) and stem (dark pink). d. Clustering of metabolites in I. lophanthoides. e. The significant enrichment of increased or decreased metabolites in three treatment groups compared with the control check (CK).
A total of 2,766 metabolites were shared among root, stem, and leaf. Forty-three metabolites were unique to leaf, while 149 metabolites were detected only in leaf and root. Additionally, 11 metabolites were identified only in root and stem (Fig. 3b). The 43 metabolites identified exclusively in leaf included several secondary metabolites, such as kaempferol, cyanidin, vinblastine, divinylprotochlorophyllide, and 2’,3,4,4’,6’-pentahydroxychalcone 4’-O-glucoside. Compared with leaf tissue, 844 differentially accumulated metabolites (DAMs) were significantly reduced in root and 966 DAMs in stem. These DAMs were mainly involved in the biosynthesis pathways of flavonoids, terpenoids, and carotene (Fig. 3c).
Astragaloside IV and wedelolactone were detected only in leaves treated with exogenous methyl jasmonate for 12 h, 48 h, and 96 h. Astragaloside IV was assigned to cluster 10 and wedelolactone to cluster 5. Both initially showed a rapid increase, but astragaloside IV declined at 48 h, whereas wedelolactone declined at 12 h (Fig. 3d). Subsequently, we intersected the DAMs from the three comparisons (CK vs. 12 h, CK vs. 48 h, and CK vs. 96 h): 36 shared metabolites showed an upward trend and 298 showed a downward trend. The decreased DAMs were mainly enriched in isoquinoline alkaloid biosynthesis and arachidonic acid metabolism. The increased DAMs were mainly enriched in glycolysis/gluconeogenesis, nicotinate and nicotinamide metabolism, pyrimidine metabolism, flavonoid biosynthesis, and purine metabolism (Fig. 3e).
Proline can regulate cell osmotic pressure and thereby improve the drought stress resistance of plants [23]. Under exogenous methyl jasmonate treatment, the concentration of proline rose and reached its peak at 12 h (Fig. 4a). We further investigated other metabolites in the proline biosynthesis pathway. Proline biosynthesis in plants originates from glutamate or ornithine, and the two routes converge on glutamate 5-semialdehyde (Fig. 4b). Glutamate 5-semialdehyde can be spontaneously converted into 1-pyrroline-5-carboxylate, the direct precursor for proline synthesis in vivo. The concentration of glutamate 5-semialdehyde increased more than 10-fold during the initial 12-hour period (Fig. 4c). Concurrently, the proline content also increased rapidly and reached its peak at 12 h. These results suggest that the increase in proline was attributable to the accumulation of glutamate 5-semialdehyde, its conversion into 1-pyrroline-5-carboxylate, and the subsequent synthesis of proline. The synthesis of precursor substances was also activated to replenish the pathway.
Fig. 4.
The biosynthesis pathways of proline and salicin in plants. a. The relative content dynamics of proline under exogenous methyl jasmonate treatment. b. The pathway of proline synthesis. c. The relative content dynamics of glutamate 5-semialdehyde under exogenous methyl jasmonate treatment. d. The pathway of salicin synthesis. e. Heatmaps of the accumulation levels of metabolites (left) and the transcript levels of the candidate genes (right). f. Protein structure characteristics of ILLG10G59.5 and ILLG9G12.30.
Putative structural genes regulate the biosynthesis of salicin
Salicin, a natural product with analgesic and anti-rheumatic effects, is widely used in clinical medicine for therapeutic purposes [24]. We retrieved the plant salicin synthesis pathway from the PlantCyc database and obtained the EC numbers of the enzymes in the pathway (Fig. 4d). We then examined the relationships among the metabolome, transcriptome, and protein structures, focusing on those related to salicin biosynthesis in vivo. The contents of benzoic acid, benzaldehyde, benzyl alcohol, and salicin increased within 12 h of methyl jasmonate treatment and then decreased under continued treatment (Fig. 4e).
To analyze the biosynthesis of salicin in I. lophanthoides, we combined two approaches to detect potential genes in the synthetic pathway. The candidate genes ILLG8G53.18 and ILLG10G43.71, encoding EC 1.2.1.28, were annotated by EggNOG. ILLG10G59.5 and ILLG9G12.30, encoding EC 1.1.1.90 and EC 2.4.1.172, respectively, were identified on the basis of amino acid sequence similarity. We further calculated the root-mean-square deviation (RMSD), a quantitative index of structural similarity, between functionally characterized template proteins and the candidate proteins [25]. The protein structures of ILLG10G59.5 and ILLG9G12.30 in I. lophanthoides were highly similar to their templates, with low RMSD values of 0.310 and 0.115, respectively (Fig. 4f). The expression of the four genes was elevated between 12 and 24 h (Fig. 4e), in line with the metabolite trend, in which most metabolites in the salicin pathway peaked at 12 h and declined between 12 and 48 h. These observations suggest that the expression of the candidate genes and the accumulation of metabolites in the salicin synthesis pathway responded consistently to methyl jasmonate.
Discussion
The genus Isodon comprises a variety of species, which are distributed in tropical and subtropical regions and on the Qinghai–Xizang Plateau, the “roof of the world” [26]. As a traditional medicine, I. lophanthoides, one species of the genus Isodon, is known as “Xihuangcao” and has been shown to possess anti-inflammatory, anti-hepatitis and anti-cancer properties. The availability of high-quality genomic resources can facilitate molecular breeding and evolutionary studies of plants [19].
In this study, we presented a genome of I. lophanthoides assembled by combining PacBio HiFi and Hi-C data. The genome size was about 412 Mb, with a scaffold N50 of 33.43 Mb. The karyotype of the genus Isodon has long been of interest to researchers. A previous study investigated the chromosome numbers of 24 taxa of Lamiaceae from the Hengduan Mountains Region, including 11 taxa of Isodon [27]. Except for the tetraploid I. ternifolius, the other Isodon taxa have 24 chromosomes (2 C), suggesting that the base chromosome number for the genus Isodon is C = 12. I. lophanthoides was reported to have 36 chromosomes and a base chromosome number of C = 18 [28], whereas the other sources of “Xihuangcao”, namely I. serra, I. lophanthoides var. gerardiana, and I. lophanthoides var. graciliflora, all have 24 chromosomes (C = 12) [28]. To resolve this discrepancy, we carried out a karyotype experiment and confirmed that I. lophanthoides is a diploid with a total of 24 chromosomes (2 C). BUSCO completeness of the genome assembly and the gene annotation was 98.7% and 92.1%, respectively. A total of 36,324 genes were predicted. These results indicated that the assembly is of high quality. To date, the only publicly available genome of this genus is that of I. rubescens [20]. The assembled genome of I. lophanthoides will facilitate a deeper understanding of the evolution of the genus. From the time-calibrated phylogenetic tree, we inferred that I. rubescens and I. lophanthoides diverged about 9.99 MYA.
Transcriptome and metabolome data are valuable resources for understanding changes in plant metabolites [29, 30]. We detected abundant medicinal ingredients in I. lophanthoides, such as flavonoids, terpenoids, and eicosanoids (Supplementary Data Figure S3-S5). Additionally, we identified 79 metabolites involved in the amino acid synthesis pathway, including alanine, lysine, and proline (Supplementary Data Figure S6, map01230). We then observed metabolite fluctuations in response to exogenous methyl jasmonate. Proline rose sharply under exogenous methyl jasmonate treatment. As a natural product, salicin can alleviate osteoarthritis and headaches [24]. We identified four candidate genes by combining transcriptome and metabolome data. To support the function of the candidate genes, we compared the protein structures of the candidates with those of reported functional genes, which indicated that the protein structures of ILLG10G59.5 and ILLG9G12.30 were analogous to the templates. These candidate genes merit experimental validation in further research.
Conclusion
Here, we presented a high-quality chromosome-level reference genome for I. lophanthoides, together with transcriptome and metabolome data. We characterized the major medicinal metabolites and examined the content dynamics of proline, which rose sharply under short-term exogenous methyl jasmonate treatment. In addition, four candidate genes involved in the salicin synthesis pathway were identified. The results of this study will facilitate further investigation into the secondary metabolic pathways of I. lophanthoides.
Methods and materials
Plant material collection and sequencing
Fresh young leaves of I. lophanthoides were collected from Guangdong Academy of Agricultural Sciences (Guangzhou, China) and immediately frozen in liquid nitrogen. The plants were grown in the medicinal plant germplasm resource garden. The leaves were used to extract high-quality genomic DNA. Root, stem, leaf, and flower tissues, together with leaves sampled at six time points (5 h, 12 h, 24 h, 48 h, 72 h, and 96 h) under treatment with 200 mM methyl jasmonate, were collected to generate transcriptome data. For long-read sequencing, a library was constructed with 15 kb genomic DNA fragments according to the manufacturer’s protocol and sequenced on the PacBio Revio platform. Low-quality reads and sequence adapters were then removed to obtain clean reads. The Hi-C library was constructed from young leaves using the DpnII enzyme for high-throughput sequencing. Sequencing generated a final total of 61 Gb of raw data for chromosome assembly.
Karyotype experiment
The karyotype experiment followed a previously described method [27]. Freshly growing root tips of 5–10 mm length were cut from germinated plants on wet filter paper. They were pretreated in 2 mM 8-hydroxyquinoline for 2 h at 4℃ and then fixed in 45% acetic acid for 10 min at about 2℃. Finally, they were macerated in a 1:1 mixture of 1 mol/L hydrochloric acid and 45% acetic acid for 20–23 s at 60℃ and stained with DAPI for 5 min at room temperature in the dark.
Genome survey and chromosome assembly
A total of 52 Gb of Illumina sequences were generated and used to assess genome size and heterozygosity with the software Jellyfish and GenomeScope [31, 32]. The PacBio HiFi data were then assembled using hifiasm with default parameters [33]. The Hi-C data were used to anchor the contigs into a chromosome-level genome. The assembly was adjusted with 3D-DNA and visualized in Juicebox [34, 35].
Annotation of repetitive sequences, gene structure and function
The repetitive sequences were identified by EDTA with the default pipeline and were subsequently masked. The masked genome was used to further identify the gene structures [36].
Three kinds of evidence (de novo prediction, homologous protein evidence, and EST evidence) were used to predict gene structures with the MAKER pipeline [21]. The software Augustus was used for de novo gene prediction. HISAT2, StringTie, and gffread were used to generate EST evidence from 206 Gb of RNA-seq data sequenced on the NovaSeq 6000 platform [37, 38]. Protein sequences from Salvia splendens (GCF_004379255.2), Salvia hispanica (GCF_023119035.1), and S. miltiorrhiza (GCF_028751815.1) were combined to provide homology evidence [39, 40].
Functional annotation was achieved by comparing predicted proteins with public databases using BLAST with an E-value cut-off of 1e-5 [41]. The best-hit BLAST results were considered as gene functions. GO and KEGG annotations were obtained by using EggNOG and KAAS [42, 43].
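The best-hit selection described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' pipeline: it assumes BLAST results in the standard 12-column tabular layout (query ID, subject ID, ..., E-value in column 11, bit score in column 12), the function name `best_hits` is hypothetical, and the gene and protein identifiers in the sample are made up.

```python
import csv
import io


def best_hits(tabular_text, evalue_cutoff=1e-5):
    """Return {query: (subject, evalue, bitscore)} keeping the top-scoring hit.

    Assumes the standard 12-column BLAST tabular layout; hits with an
    E-value above the cut-off are discarded.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > evalue_cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical sample: two hits for geneA, one weak hit for geneB.
sample = (
    "geneA\tprotX\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "geneA\tprotY\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t200\n"
    "geneB\tprotZ\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30\n"
)
for gene, (subject, evalue, bitscore) in best_hits(sample).items():
    print(gene, subject, evalue, bitscore)
```

Ranking by bit score rather than E-value avoids ties when several strong hits all report an E-value of 0.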
Genome quality analysis
Multiple analyses were carried out to assess the accuracy and integrity of the assembled genome. First, the scaffold N50, a metric of the length and contiguity of the genome assembly, was calculated. Furthermore, RNA-seq data were mapped to the genome using HISAT2 to assess completeness [44]. BUSCO was used to assess genome completeness with the embryophyta_odb10 database [45].
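For readers unfamiliar with the N50 statistic, a minimal sketch of its usual definition is given here: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the cumulative sum to at least
        # half of the total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy example with made-up lengths (arbitrary units).
toy_lengths = [10, 20, 30, 40]
print(n50(toy_lengths))
```

In the toy example the total is 100, so the cumulative sum reaches half of the assembly with the two longest sequences (40 + 30), giving an N50 of 30.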
Evolutionary analysis
The single-copy orthologous genes of 12 species were identified by OrthoFinder using protein sequences [46]. The CDSs corresponding to the protein sequences were then extracted and aligned using MAFFT [47]. Fossil-calibrated divergence times between species were obtained from the TimeTree database (http://timetree.org/). Finally, the MCMCTree program was used to estimate the divergence times of the species analysed. Gene family expansion and contraction were analysed using the CAFE software [48, 49]. JCVI was used to identify collinear blocks between species [50]. WGDI, which can elucidate intricate, multi-layered patterns of gene collinearity, was used to detect the WGD [51]. The NG86 algorithm was used to calculate the synonymous substitutions per synonymous site (Ks). The ages of the WGD and the divergence time were calculated as follows: Divergence time = Ks / (2 × r) [52].
Metabolome analysis
Metabolome samples were collected from root, stem, and leaf, and from leaves at four time points (12 h, 48 h, 72 h, and 96 h) under treatment with 200 mM methyl jasmonate; each group included three biological replicates. Each sample was weighed, 2-chlorophenylalanine in methanol was added, and the mixture was vortexed for 30 s. After grinding, the samples were sonicated for 15 min. Following centrifugation, 300 µL of the supernatant was filtered through a 0.22 μm membrane to obtain the prepared samples for LC-MS. Metabolite determination was carried out on a Thermo Vanquish system equipped with an ACQUITY UPLC HSS T3 column (150 × 2.1 mm, 1.8 μm, Waters) maintained at 40℃. The autosampler temperature was 8℃. Analytes were eluted with a gradient using 0.1% formic acid in water (A1) and 0.1% formic acid in acetonitrile (B1), or 5 mM ammonium formate in water (A2) and acetonitrile (B2), at a flow rate of 0.25 mL/min. After equilibration, 2 µL of each sample was injected. The proportion of solvent B (v/v) followed a linear gradient: 0–1 min, 2% B2/B1; 1–9 min, 2–50% B2/B1; 9–12 min, 50–98% B2/B1; 12–13.5 min, 98% B2/B1; 13.5–14 min, 98–2% B2/B1; 14–20 min, 2% B1 in positive mode (14–17 min, 2% B2 in negative mode). ESI-MSn experiments were conducted on a Thermo Q Exactive HF-X mass spectrometer with a spray voltage of 3.5 kV in positive mode and −2.5 kV in negative mode. Sheath gas and auxiliary gas were set at 30 and 10 arbitrary units, respectively. The capillary temperature was 325℃. The analyzer scanned over a mass range of m/z 81–1,000 for a full scan at a mass resolution of 60,000. Data-dependent acquisition MS/MS experiments were carried out with HCD scans. The normalized collision energy was 30 eV. The raw data were converted into mzXML format with ProteoWizard [53]. Peak identification, filtration, and alignment were performed using XCMS. To facilitate comparisons, peak areas were subjected to batch normalization. The R package ropls was used to analyze the DAMs.
The screening criteria for up-regulated metabolites were fold change > 2, p-value < 0.05, and VIP > 1; the criteria for down-regulated metabolites were fold change < 0.5, p-value < 0.05, and VIP > 1.
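The thresholds above translate directly into a small classification rule. The sketch below is illustrative only: the function name `classify_dam` and the example values are hypothetical, and in practice the fold change, p-value, and VIP score would come from the statistical analysis (for example the ropls output) rather than being typed by hand.

```python
def classify_dam(fold_change, p_value, vip):
    """Label a metabolite as 'up', 'down', or 'not significant'.

    Thresholds follow the stated criteria: p-value < 0.05 and VIP > 1,
    with fold change > 2 (up) or fold change < 0.5 (down).
    """
    if p_value >= 0.05 or vip <= 1:
        return "not significant"
    if fold_change > 2:
        return "up"
    if fold_change < 0.5:
        return "down"
    return "not significant"


# Hypothetical metabolites: (fold change, p-value, VIP).
examples = {
    "metabolite_1": (3.2, 0.01, 1.8),
    "metabolite_2": (0.3, 0.02, 1.4),
    "metabolite_3": (2.5, 0.20, 1.6),
}
for name, values in examples.items():
    print(name, classify_dam(*values))
```

Note that the inequalities are strict, so a metabolite with a fold change of exactly 2 or 0.5 is not counted as differentially accumulated.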
Transcriptome and metabolome data processing
RNA-seq reads were aligned using the HISAT2 pipeline. The software SAMtools was used to sort the alignments [54]. Gene expression was quantified with featureCounts [55]. The R package DESeq2 was used for data normalization and the identification of differentially expressed genes (DEGs) [56].
The R package clusterProfiler was employed for enrichment analysis of transcriptomic and metabolomic data [57]. The R package Mfuzz was used to analyze the trends in metabolite expression and to delineate clustering groups [58].
Determination of structural genes of the salicin pathway
The salicin pathway was drawn according to PlantCyc (https://www.plantcyc.org/), and the enzyme numbers on the pathway were used to acquire the amino acid sequences. The ILLG8G53.18 and ILLG10G43.71 genes, which carry the structural domains of enzyme number (EC) 1.2.1.28, were annotated by EggNOG, and their identity and length were confirmed against AT1G04580 by BLAST. The candidate genes for the remaining two enzymes, EC 1.1.1.90 (phenylacetaldehyde reductase) and EC 2.4.1.172 (salicyl-alcohol β-D-glucosyltransferase), were obtained by BLAST searches with the protein sequences of QXI89045.1 and UGT71L1 at an E-value cut-off of 1e-5 [17, 59]. We then combined gene expression levels with the trends in transcript levels to determine the candidate genes. To further examine the protein structures of the candidate genes, we compared the similarity between template and target protein structures using RMSDs. Homology models of the template and target sequences were constructed with SWISS-MODEL and visualized in ChimeraX [60, 61]. The RMSDs between templates and targets were then calculated by Matchmaker [62].
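To make the RMSD comparison concrete, the sketch below computes the root-mean-square deviation between two equal-length sets of paired atomic coordinates. It is a simplified illustration, not a reimplementation of Matchmaker: it assumes the two structures have already been superimposed and that atoms are paired in order, whereas the tool used in the study performs the alignment and superposition itself. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Return the RMSD between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and that the
    i-th atom of coords_a corresponds to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy coordinates: each atom in the second set is displaced by 1 unit.
template = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
target = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(rmsd(template, target))
```

Identical structures give an RMSD of 0, and lower values indicate closer structural agreement between a candidate and its template.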
Acknowledgements
Not applicable.
Author contributions
W.J. H and X.S.Q designed the experiments of this manuscript. L.J.Y wrote the main text of the manuscript. L.J.Y, L.F.P, and L.W. analyzed the data. Z.L., G.Z.P, C.J.X, and W.S.K provided some valuable advice and made further modifications. All authors read and approved the final version of the manuscript.
Funding
This work was supported by the Guangdong Science and Technology Plan Project (2023B1212060038), the Natural Science Foundation of Crops Research Institute, Guangdong Academy of Agricultural Sciences (0145), and the Chaozhou Science and Technology Plan Project (202102NY04).
Data availability
The genome assembly, together with the raw Illumina, PacBio, and transcriptome sequencing data, has been deposited in the NCBI Sequence Read Archive under accession number PRJNA1125617 and in the Figshare database (dx.doi.org/10.6084/m9.figshare.26062414). The biological materials will be shared by the contacts upon request.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Jieying Liu, Shiqiang Xu, and Fangping Li contributed equally to this work.
References
- 1.Xiang CL, Chen YP, Funamoto T et al. Cytology of thirteen taxa in the genus isodon (lamiaceae: nepetoideae) from China. Plant Divers Resour, 2014(36): 561–8.
- 2.Wong LL, Liang Z, Chen H, et al. Rapid differentiation of Xihuangcao from the three Isodon species by UPLC-ESI-QTOF-MS/MS and chemometrics analysis. Chin Med. 2016;11:48.
- 3.Dai LP, Li XF, Feng QM, et al. Isolation and identification of two pairs of cytotoxic diterpene tautomers and their tautomerization mechanisms. Sci Rep. 2020;10:1442.
- 4.Zhou GB, Kang H, Wang L, et al. Oridonin, a diterpenoid extracted from medicinal herbs, targets AML1-ETO fusion protein and shows potent antitumor activity with low adverse effects on t(8;21) leukemia in vitro and in vivo. Blood. 2007;109:3441–50.
- 5.Zhen T, Wu CF, Liu P, et al. Targeting of AML1-ETO in t(8;21) leukemia by oridonin generates a tumor suppressor-like protein. Sci Transl Med. 2012;4:127ra38.
- 6.Lin L, Dong Y, Yang B, et al. Chemical constituents and biological activity of Chinese medicinal herb ‘Xihuangcao’. Comb Chem High Throughput Screen. 2011;14:720–9.
- 7.Lin C, Liu F, Zhang R, et al. High-performance thin-layer chromatographic fingerprints of triterpenoids for distinguishing between Isodon lophanthoides and Isodon lophanthoides var. gerardianus. J AOAC Int. 2019;102:714–9.
- 8.Du Z, Peng Z, Yang H, et al. Identification and functional characterization of three cytochrome P450 genes for the abietane diterpenoid biosynthesis in Isodon lophanthoides. Planta. 2023;257:90.
- 9.Lin CZ, Zhao W, Feng XL, et al. Cytotoxic diterpenoids from Rabdosia lophanthoides var. gerardianus. Fitoterapia. 2016;109:14–9.
- 10.Xie W, Zhang D, Wen X, et al. A practical technique for rapid characterisation of ent-kaurane diterpenoids in Isodon serra (Maxim.) Hara by UHPLC-Q-TOF-MS/MS. Phytochem Anal. 2022;33:517–32.
- 11.Sun Y, Shao J, Liu H, et al. A chromosome-level genome assembly reveals that tandem-duplicated CYP706V oxidase genes control oridonin biosynthesis in the shoot apex of Isodon rubescens. Mol Plant. 2023;16:517–32.
- 12.Yang R, Du Z, Qiu T, et al. Discovery and functional characterization of a diverse diterpene synthase family in the medicinal herb Isodon lophanthoides var. gerardiana. Plant Cell Physiol. 2021;62:1423–35.
- 13.Fiebich BL, Appel K. Anti-inflammatory effects of willow bark extract. Clin Pharmacol Ther. 2003;96(74):96–7.
- 14.Vlachojannis JE, Cameron M, Chrubasik S. A systematic review on the effectiveness of willow bark for musculoskeletal pain. Phytother Res. 2009;23:897–900.
- 15.Pridham JB, Saltmarsh MJ. The biosynthesis of phenolic glucosides in plants. Biochem J. 1963;87:218–24.
- 16.Babst BA, Harding SA, Tsai CJ. Biosynthesis of phenolic glycosides from phenylpropanoid and benzenoid precursors in Populus. J Chem Ecol. 2010;36:286–97.
- 17.Gordon H, Fellenberg C, Lackus ND, et al. CRISPR/Cas9 disruption of UGT71L1 in poplar connects salicinoid and salicylic acid metabolism and alters growth and morphology. Plant Cell. 2022;34:2925–47.
- 18.Fellenberg C, Corea O, Yan LH, et al. Discovery of salicyl benzoate UDP-glycosyltransferase, a central enzyme in poplar salicinoid phenolic glycoside biosynthesis. Plant J. 2020;102:99–115.
- 19.Zhang M, Liu C, Xi D, et al. Metabolic engineering of Escherichia coli for high-level production of salicin. ACS Omega. 2022;7:33147–55.
- 20.Li F, Hou Z, Xu S, et al. Haplotype-resolved genomes of octoploid species in Phyllanthaceae family reveal a critical role for polyploidization and hybridization in speciation. Plant J. 2024;119:348–63.
- 21.Cantarel BL, Korf I, Robb SM, et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–96.
- 22.Hu J, Qiu S, Wang F, et al. Functional divergence of CYP76AKs shapes the chemodiversity of abietane-type diterpenoids in genus Salvia. Nat Commun. 2023;14:4696.
- 23.Peng Z, Lu Q, Verma DP. Reciprocal regulation of delta 1-pyrroline-5-carboxylate synthetase and proline dehydrogenase genes controls proline levels during and after osmotic stress in plants. Mol Gen Genet. 1996;253:334–41.
- 24.Zhu Z, Gao S, Chen C, et al. The natural product salicin alleviates osteoarthritis progression by binding to IRE1α and inhibiting endoplasmic reticulum stress through the IRE1α-IκBα-p65 signaling pathway. Exp Mol Med. 2022;54:1927–39.
- 25.Bruschweiler R. Efficient RMSD measures for the comparison of two molecular ensembles. Root-mean-square deviation. Proteins-Structure Function Bioinf. 2003;50:26–34.
- 26.Yu XQ, Maki M, Drew BT, et al. Phylogeny and historical biogeography of Isodon (Lamiaceae): rapid radiation in south-west China and Miocene overland dispersal into Africa. Mol Phylogenet Evol. 2014;77:183–94.
- 27.Chen Y, Zhao F, Peng H, et al. Chromosome numbers of 24 taxa of Lamiaceae from southwest China. Caryologia. 2018;71:298–306.
- 28.Huang SS. Studies on chromosome number of four original species of Chinese medicine Xihuangcao. J Trop Subtropical Bot. 2011;19:374–6.
- 29.Nagano AJ, Kawagoe T, Sugisaka J, et al. Annual transcriptome dynamics in natural environments reveals plant seasonal adaptation. Nat Plants. 2019;5:74–83.
- 30.Li N, Shao T, Xu L, et al. Transcriptome analysis reveals the molecular mechanisms underlying the enhancement of salt-tolerance in Melia azedarach under salinity stress. Sci Rep. 2024;14:10981.
- 31.Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–70.
- 32.Vurture GW, Sedlazeck FJ, Nattestad M, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–4.
- 33.Cheng H, Concepcion GT, Feng X, et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170–5.
- 34.Durand NC, Robinson JT, Shamim MS, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101.
- 35.Dudchenko O, Batra SS, Omer AD, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–5.
- 36.Ou S, Su W, Liao Y, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275.
- 37.Pertea M, Kim D, Pertea GM, et al. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11:1650–67.
- 38.Pertea G, Pertea M. GFF utilities: GffRead and GffCompare. F1000Res. 2020;9.
- 39.Jia KH, Liu H, Zhang RG, et al. Chromosome-scale assembly and evolution of the tetraploid Salvia splendens (Lamiaceae) genome. Hortic Res. 2021;8:177.
- 40.Pan X, Chang Y, Li C, et al. Chromosome-level genome assembly of Salvia miltiorrhiza with orange roots uncovers the role of Sm2OGD3 in catalyzing 15,16-dehydrogenation of tanshinones. Hortic Res. 2023;10:uhad69.
- 41.Camacho C, Coulouris G, Avagyan V, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
- 42.Cantalapiedra CP, Hernandez-Plaza A, Letunic I, et al. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021;38:5825–9.
- 43.Moriya Y, Itoh M, Okuda S, et al. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35:W182–5.
- 44.Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–60.
- 45.Manni M, Berkeley MR, Seppey M, et al. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38:4647–54.
- 46.Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238.
- 47.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
- 48.De Bie T, Cristianini N, Demuth JP, et al. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006;22:1269–71.
- 49.Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.
- 50.Tang H, Krishnakumar V, Zeng X, et al. JCVI: a versatile toolkit for comparative genomics analysis. iMeta. 2024:e211.
- 51.Sun P, Jiao B, Yang Y, et al. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 2022;15:1841–51.
- 52.Badouin H, Gouzy J, Grassa CJ, et al. The sunflower genome provides insights into oil metabolism, flowering and asterid evolution. Nature. 2017;546:148–52.
- 53.Smith CA, Want EJ, O’Maille G, et al. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem. 2006;78:779–87.
- 54.Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10.
- 55.Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2013;30(7):923–30.
- 56.Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.
- 57.Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov (Camb). 2021;2:100141.
- 58.Kumar L, Futschik ME. Mfuzz: a software package for soft clustering of microarray data. Bioinformation. 2007;2:5–7.
- 59.Sanchez R, Bahamonde C, Sanz C, et al. Identification and functional characterization of genes encoding phenylacetaldehyde reductases that catalyze the last step in the biosynthesis of hydroxytyrosol in olive. Plants-Basel. 2021;10.
- 60.Waterhouse A, Bertoni M, Bienert S, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46:W296–303.
- 61.Meng EC, Goddard TD, Pettersen EF, et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci. 2023;32:e4792.
- 62.Meng EC, Pettersen EF, Couch GS, et al. Tools for integrated sequence-structure analysis with UCSF Chimera. BMC Bioinformatics. 2006;7:339.
Data Availability Statement
The genome assembly and the raw Illumina, PacBio, and transcriptome sequencing data have been deposited in the NCBI Sequence Read Archive under accession number PRJNA1125617 and in the Figshare database at dx.doi.org/10.6084/m9.figshare.26062414. The biological materials will be shared by the contacts upon request.



