Skip to main content
PeerJ logoLink to PeerJ
. 2025 Oct 23;13:e20136. doi: 10.7717/peerj.20136

Full-length transcriptome profiling of Gentiana straminea Maxim. provides new insights into iridoid biosynthesis pathway

Lina Yang 1, Tao He 2,3,, Le Wang 3,, Xiaochun Ning 4, Shuai Wang 3
Editor: Vladimir Uversky
PMCID: PMC12554311  PMID: 41146991

Abstract

Gentiana straminea Maxim. is a traditional Chinese medicinal plant celebrated for its diverse array of bioactive compounds, particularly iridoid glycosides. These compounds are recognized as the main components that exert therapeutic effects against rheumatism, osteoarthritis, hepatitis, gastritis, and cholecystitis. Consequently, research on G. straminea has attracted significant interest, yet the genetic factors underlying the production and diversification of its secondary metabolites remain poorly understood, especially the pathway of iridoid biosynthesis. In the present study, a full-length transcriptome analysis based on PacBio Sequel II platform and Illumina sequencing was performed to identify genes that were differentially expressed in five G. straminea tissues, and proteins catalyzing iridoid biosynthesis were characterized. After sequence clustering and redundancy removal, a total of 32,776 isoforms were identified in PacBio sequencing, with an average length of 2,589.14 bp, an N50 value of 2,767 bp, and a guanine-cytosine (GC) content of 41.43%. Results of Illumina sequencing unveiled that a total of 31,330 genes were found in common in all the five tissues. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis revealed that the DEGs were mainly enriched in terms related to biosynthesis of secondary metabolites, metabolic pathways, mitogen-activated protein kinase (MAPK) signaling pathway, etc. A total of 117 isoforms encoding 19 key enzymes related to the iridoid synthesis pathway were identified, including one geranyl diphosphate synthase (GPPS) and two geranylgeranyl diphosphate synthases (GGPPS). A phylogenetic analysis further classified plant G(G)PPSs into three distinct branches. Profiling tissue-specific expression of key genes involved in iridoid synthesis revealed that the Quantitative reverse transcription polymerase chain reaction (RT-qPCR) results demonstrated the consistent trend with the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values in the root, stem, leaf, flower, ovary, non-embryonic callus (NEC), and embryonic callus (EC). Among them, AACT, IDI, ISPH, and GCPE had the highest expression levels in leaves, whereas DXS and GPPS had the highest expression levels in stems. This work provides the first transcriptomic analysis of G. straminea, serving as a foundational resource for elucidating mechanisms of bioactive compound biosynthesis, facilitating molecular studies, and supporting genomic exploration of this medicinal species.

Keywords: Gentiana straminea, SMRT sequencing, Iridoid biosynthesis, Expression profile

Introduction

Gentiana straminea Maxim., a member of the Gentianaceae family, is a medicinal plant widely used in traditional Chinese medicine (Ye et al., 2021). This species is primarily distributed in high-altitude regions (2,000–4,950 m) of Qinghai, Xizang and Sichuan, where it thrives in alpine meadows, forests, and grasslands (Jia et al., 2012). Previous studies have indicated that iridoids from the roots of G. straminea have therapeutic effects against rheumatism, osteoarthritis, hepatitis, gastritis, and cholecystitis (Zhou et al., 2016). The main medicinal effects are associated with gentiopicroside, loganic acid, sweroside and swertiamarin, which are all iridoid compounds (Wei et al., 2012; Wu et al., 2016). Because of their abundant pharmacological properties, iridoid have emerged as a research hotspot in related studies. To date, the research on the iridoid biosynthetic pathway is relatively well-established primarily in Catharanthus roseus (Oudin et al., 2007). The biosynthesis pathway of iridoids in G. straminea remains poorly understood. Elucidating its biosynthesis mechanism has become essential for targeted enhancement of pharmacologically active compounds in G. straminea.

Preliminary results demonstrate that iridoids (oxygenated monoterpene structurally derived from two isoprene units), are biosynthesized via a three-step biosynthetic pathway. The first stage involves synthesis of the precursors isopentyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), which can be formed via the mevalonate pathway (MVA) and the methylerythritol phosphate pathway (MEP). MVA occurs mainly in the cytoplasm, whereas MEP occurs mainly in plastids (Zhan et al., 2023). The second stage involves the formation of the iridoid skeleton, during which IPP and DMAPP are catalytically condensed by geranyl diphosphate synthases (GPPS) to produce geranyl diphosphate (GPP), while geranylgeranyl diphosphate (GGPP) is generated through a catalytic process mediated by geranylgeranyl diphosphate synthases (GGPPS) (Vranová, Coman & Gruissem, 2013). Then, GPP and GGPP are used as the raw materials for the synthesis of different terpenoids (monoterpenes, diterpenes, triterpenes, etc.) through different metabolic pathways (Sun et al., 2012). The third stage is the synthesis of iridoid from GPP. GPP is converted to geraniol by geranyl diphosphate diphosphatase (GES)-mediated catalysis and hydrolysis (Oudin et al., 2007). Geraniol undergoes structural modifications including glycosylation, hydroxylation, methylation, and isomerization, along with other reactions, to yield iridoid derivatives (Zhao & Wang, 2020). Gentiopicroside, as one of these derivatives, possesses significant medicinal value in Gentianaceae plants (Wu et al., 2016).

Advances in sequencing technology have facilitated the application of transcriptome sequencing in transcript profiling, gene discovery, and determining which genes are expressed in plants. The transcriptome consists of all the RNA transcripts of a species and reflects the functions of different cells and tissues during a particular period. Modern high-throughput RNA sequencing technologies enable the analysis of genes that regulate the synthesis of secondary metabolites in non-model species. This approach can uncover new genes, potential metabolic pathways, and associated genetic regulatory mechanisms (Ozsolak & Milos, 2012). The third-generation single-molecule real-time sequencing (SMRT) enables sequencing of transcripts up to 10 kb without a reference genome. However, it is limited by high cost per base, high error rates, and low throughput (Rhoads & Au, 2015). Second-generation sequencing produces short read lengths, but provides high sequencing accuracy. Due to limitations imposed by its read length and assembly algorithms, second-generation sequencing cannot accurately obtain the complete sequence of transcripts, particularly for different transcripts with high homology. Consequently, the synergistic integration of second- and third-generation sequencing techniques enables the generation of high-fidelity sequencing data with low error rate. Full-length transcriptome-based Illumina sequencing has been applied in research involving Coptis deltoidei (Zhong et al., 2020), Ranunculus japonicus (Xu et al., 2023), Fritillaria hupehensis (Guo et al., 2021), Angelica sinensis (Gao et al., 2021), Torreya grandis (Lou et al., 2019), and Salvia miltiorrhiza (Xu et al., 2016).

This study aimed to identify and characterize key genes involved in iridoid biosynthesis in G. straminea through an integrated multi-omics approach. Combining third-generation (PacBio) and second-generation (Illumina) sequencing platforms, we obtained full-length transcriptomes and quantified tissue-specific expression patterns of iridoid pathway-related genes. Candidate genes were further validated via RT-qPCR, while differentially expressed genes (DEGs) across tissues were functionally annotated using KEGG enrichment analysis. These findings establish a critical genetic foundation for deciphering the iridoid biosynthetic pathway in G. straminea and will facilitate future metabolic engineering efforts.

Materials & Methods

Preparation and collection of samples for transcriptome sequencing and qPCR analysis

Samples of G. straminea individuals were collected in August 2023 during the flowering stage from Yushu, Maqin County, Qinghai Province, China (N34°38′380, E100°23′546, altitude 4,200 m). Fresh tissues from five types—root, stem, leaves, flower, and ovary—were collected, washed with sterilized water, wrapped in foil, and then preserved in liquid nitrogen. Tissues of these five types were further utilized for library construction and SMRT sequencing. Additionally, for the quantification of gene expression via qPCR, two additional tissue types—non-embryonic callus (NEC) and embryonic callus (EC)—were included. The generation of EC and NEC tissues was carried out following the protocol outlined by He, Yang & Zhao (2011), utilizing leaves as explants. Three biological replicates were collected for all samples.

RNA extraction and SMRT sequencing

Samples of G. straminea were used for total RNA extraction on ice, according to the manufacturer’s protocol, with TRIzol reagent (Life Technologies, Karlsbad, CA, USA). An Agilent 2100 Bioanalyzer and agarose gel electrophoresis were used to determine the integrity of the total RNA. A Nanodrop microspectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) was used to check the purity and concentration of the RNA. The Clontech SMARTer PCR cDNA Synthesis Kit was used to reverse transcribe the oligo (dT) magnetic bead-enriched mRNA to cDNA. PCR cycle optimization was employed to identify the ideal number of amplification cycles for subsequent large-scale PCRs. Double-stranded cDNAs were generated with the optimized cycle number. Additionally, size selection at >5 kb and equal mixing without size selection of cDNA were performed with the BluePippinTM Size Selection System. The next step in the construction of the SMRTbell library was carried out by large-scale PCR. The sequencing primer was matched with the SMRTbell template by annealing, and then linked to the polymerase. Sequencing was conducted using the PacBio Sequel II platform at Gene Denovo Biotechnology Co.

The raw sequencing reads from the cDNA library were classified via the Pacific Biosciences Iso-Seq pipeline, with high-quality circular consensus sequence (CCS) first extracted. Transcript integration was assessed according to whether the CCS reads contained all 5′primers, the 3′primer and the poly-A sequences. Full length sequences (FLs) were those that contained all three sequences. After the removal of primers, barcodes and poly A tails, full-length nonchimeric (FLNC) reads were obtained. Reads less than 50 bp in length were discarded. The entire isoform was generated by clustering the FLNC reads. Minimap2 was used for similar FLNC reads, which were then clustered hierarchically to obtain a consistency sequence (unpolished consensus isoforms). The consistency sequence was then further corrected via the Quiver algorithm. The high-quality isoforms (prediction accuracy ≥ 0.99) were used for subsequent analysis.

Library construction and Illumina sequencing

Total RNA was enriched by Oligo(dT) beads to form mRNA, then was fragmented into short fragments. With random primers, fragments were transcribed into cDNA, then syntheisized the second-strand cDNA with DNA polymerase I, Rnase H, dNTP and buffer. the obtained cDNA was purified with QiaQuick PCR extraction kit (Qiagen, Venlo, The Netherlands), end repaired, poly(A) added, and ligated to Illumina sequencing adapters. The ligated products were screened by agarose gel electrophoresis, amplified by PCR, and sequenced by Gente Denovo Biotechnology Co. (Guangzhou, China) using Illumina HiSeqTM 4000. High quality clean reads were obtained by fastp (Version 0.18.0), with removing adapters, containing more than 10% of unknown nucleotides (N) and low-quality reads.

Isoform expression and differential expression analysis

Using the full-length transcriptome as the reference, the clean and high-quality reads were mapped using RSEM (version 1.2.8) to determine the isoform expression in five different tissues of G. straminea. The expression levels of isoforms from each sample were calculated and normalized to FPKM. Differentially expressed genes (DEGs) were identified using the DESeq2 software with —log2 (Fold Change)— ≥ 2 and a false discovery rate (FDR) below 0.05 as the screening criteria.

Functional annotation, structure analysis

The sequences of the isoforms were checked against the non-redundant protein (Nr) database of the NCBI (http://www.ncbi.nlm.nih.gov), the Clusters of Orthologous Genes/EukaryoticOrthologous Group (COG/KOG) database (http://www.ncbi.nlm.nih.gov/COG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg), and the Swiss-Prot protein database (http://www.expasy.ch/sprot) via the BLASTx program (http://www.ncbi.nlm.nih.gov/BLAST/), with an E value threshold of 1e−5, to assess the similarity of the sequences to those of genes from other species. Gene Ontology (GO) annotation was analyzed using isoforms from the Nr annotation results by Blast2GO software. The top 20 scoring isoforms and no fewer than 33 high-scoring segment pair hits (HSPs) were selected for the Blast2GO analysis. Isoforms were functionally classified using WEGO software. Transcription factors (TFs) were predicted via hmmscan by aligning the protein coding sequences to the Plant TFdb (https://planttfdb.gao-lab.org/).

Identification of the putative iriodoid biosynthesis pathway

Based on previous literature and known information from public databases, candidate isoforms for iridoid biosynthesis were screened from the sequence annotation files derived from third-generation sequencing. These candidate isoforms were subsequently annotated against the Nr, Swiss-Prot, KEGG, and KOG databases. Isoforms annotated with Enzyme Commission numbers (EC numbers) and keywords associated with terpenoid backbone biosynthesis, cytochrome P450, and iridoids were selected for further analysis. The screened isoforms were mapped to reference pathways, such as terpenoid backbone biosynthesis (map000900) and monoterpenoid biosynthesis (map00902), within the KEGG database. With reference to published core iridoid biosynthesis pathways (Oudin et al., 2007; Miettinen et al., 2014; Ni et al., 2019; Liu et al., 2017; Salim et al., 2014; Rai et al., 2013; Xu et al., 2022). These putative enzyme isoforms were assigned to corresponding biochemical reaction steps, thereby establishing a putative pathway for iridoid biosynthesis in G. straminea.

Protein-protein interaction network analysis

Functional enrichment analysis of protein-protein interaction (PPI) networks for genes involved in the iridoid biosynthesis pathway was conducted using the String database (https://cn.string-db.org/) (Szklarczyk et al., 2014). The protein sequences corresponding to key enzymatic components of the iridoid biosynthetic machinery in G. straminea were queried against the string database, with A. thaliana designated as the reference model organism for orthologous mapping. After sequence homology analysis, the highest-confidence protein isoforms (with open reading frames, ORFs) were selected based on maximal sequence identity from the mapping interface. Network construction parameters were optimized as follows: Minimum required interaction score: High confidence (score ≥ 0.700); max number of interactors to show: first shell limited to query proteins only; visualization output: dynamic Scalable Vector Graphics (SVG) network representation.

Identification and bioinformatic analysis of G(G)PPSs in G. straminea

For identification of GsG(G)PPSs, local BLAST search was performed using GGPPSs from Arabidopsis thaliana (Beck et al., 2013) or Chimonanthus praecox (Kamran et al., 2020) as queries. A threshold of e-value <10−10 was applied for preliminary screening. After searching, sequences of putative GsG(G)PPSs were further subjected to CDD (https://www.ncbi.nlm.nih.gov/cdd/) and InterPro (https://www.ebi.ac.uk/interpro/result/InterProScan/) for domain confirmation (Paysan-Lafosse et al., 2023). Prediction and analysis of the physicochemical properties of the GsG(G)PPS amino acid sequences were performed via ExPASy (https://web.expasy.org/protparam/) (Artimo et al., 2012). Sequences were submitted to SignalP4.1 server for prediction of the signal peptide (https://services.healthtech.dtu.dk/services/SignalP-4.1/) (Petersen et al., 2011). Subcellular localization was determined via WoLF PSORT (https://wolfpsort.hgc.jp/), transmembrane structure was predicted by HMHMM2.0 (https://services.healthtech.dtu.dk/services/TMHMM-2.0/). In addition, the annotated sequence information was submitted to the MEME website (https://meme-suite.org/meme/doc/meme.html), using 6–100 residues as the optimal motif size to search for 10 conserved motifs and predicted the conserved protein motifs in the sequence (Bailey et al., 2015). Similar to the GsGGPPS SSU, GsGGPPS, and GsGPPS amino acid sequences were downloaded from NCBI BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Protein sequence alignment was performed via DNAMAN. Protein structure prediction was performed via SWISS-MODEL (https://swissmodel.expasy.org/).

Phylogentic analysis of G(G)PPS gene family

The GsG(G)PPSs obtained, and G(G)PPSs from other species were incorporated for phylogenetic analysis. G(G)PPSs in Arabidopsis thaliana and Nicotiana tabacum genomes were identified through BLAST by using the ensemble database (https://asia.ensembl.org/Multi/Tools/Blast). G(G)PPS homologues from other species available in the NCBI database were included for phylogenetic analyses of the G(G)PPS family. Details on all the G(G)PPSs used for phylogenetic analysis were listed in Table S1. These species protein sequences were performed multiple sequence alignment using the ClustalW program integrated in MEGA11.0. Phylogenetic inference of G(G)PPSs was conducted using the neighbor-joining method in MEGA 11.0 software, with a bootstrap test of 1,000 replicates (Tamura, Stecher & Kumar, 2021) and the best-fit substitution model JTT+G+I. The refinement of the evolutionary tree was completed using the online software Evoview (https://www.evolgenius.info/evolview/#/).

Expression analysis of key enzymes by real—time quantitative PCR

First-strand cDNA synthesis was performed using a cDNA reverse transcription kit (PrimeScriptTMII 1st Strand cDNA Synthesis Kit), following the protocol provided. Primers for reverse transcription-quantitative polymerase chain reaction (RT-qPCR) were designed using the OligoArchitect online sever and synthesized by Sangon Biotech (Shanghai) Co., Ltd. The primers sequence shown in Table S2, qPCR was performed using TB Green Premix Ex Taq with a 20 μL reaction system, which included 10 μL of TB Green Premix, 0.8 μL each of forward and reverse primers (10 μM), two μL of cDNA, 6.4 μL of ddH2O. The reaction procedure consisted of the following steps: pre-denaturation at 94 °C for 5 min, denaturation at 94 °C for 30 s, annealing at 53 °C for 30 s, extension at 72 °C for 30 s, followed by 40 cycles. The GAPDH gene was utilized as the internal reference for relative expression analysis. The quantification of gene expressions was conducted using three biological replicates. Relative expression was calculated using the Ct (2−ΔΔCt) method, following the approach described by Livak & Schmittgen (2001), with root expression serving as the control. The significance analysis of difference tissue was conducted with means of gene expression by the Duncan test at 5%.

Results

Transcriptome sequencing of G. straminea

Both SMRT and Illumina sequencing were performed on the root, stem, leaf, flower, and ovary tissues of G. straminea. The average amount of raw data generated was 6.5 GB for second-generation sequencing per sample (Table 1). For PacBio sequencing, a total of 62.47 GB of raw data was obtained. The per-base coverage depth reached approximately 45× (circular consensus sequences corrected depth). A total of 23,318,162 subreads were generated from third-generation sequencing. After self-correction and merging, 499,496 circular consensus sequences (CCS) were generated, with an average CCS read length of 2,789 bp. The distributions for the number of passes and read lengths of these CCS reads are shown in Figs. S1AS1B. The full-length nonchimeric sequences with high-precision CCS reads were identified, and similar FLNC reads were clustered hierarchically to obtain consensus sequences (Fig. S1C). A total of 41,785 high-quality isoforms (HQs) and 140 low-quality isoforms (LQs) were obtained after further correction. After removing redundant sequences, the total length of the isoforms was 84,861,577 bp. A total of 32,776 isoforms were obtained, and the lengths ranged from 165 to 10,169 bp, with an average length of 2,589.14 bp, an N50 of 2,767 bp, and a guanine-cytosine (GC) content of 41.43%. The number and length distribution of the isoforms are shown in Fig. S1D.

Table 1. Comparison with reference gene sequence Pure reads obtained in second generation sequencing.

Sample CleanData (GB) Total_Mapped (%) Unique_Mapped (%)
R-1 6.574 76.63 18.14
R-2 6.944 76.75 18.04
R-3 6.449 76.95 18.08
S-1 6.712 73.75 18.86
S-2 5.728 73.86 18.9
S-3 5.835 73.33 19.04
L-1 6.398 76.13 20.3
L-2 5.922 76.74 19.34
L-3 6.232 76.95 19.39
F-1 6.507 71.36 18.76
F-2 6.837 70.92 18.59
F-3 6.406 71.18 18.65
O-1 7.384 72.48 19.04
O-2 6.594 71.91 18.92
O-3 7.001 72.3 18.89

Notes.

R
tissue of root
S
stem
L
leaf
F
flower
O
ovary; the numbers after letters indicated three biological replicates

Functional annotation of the full-length transcriptome of G. straminea

The HQ unigenes were annotated against four functional annotation databases Nr, Swiss-Prot, KEGG, and KOG. A total of 31,434 (95.9%) unigenes were successfully annotated, while 1,342 were unannotated. The highest number of unigenes (31,235; 97.47%) was annotated to Nr database, followed by the KEGG database and the Swiss-Prot database, in which 30,990 (94.55%) and 27,622 (84.27%) unigenes, respectively, were annotated. The lowest number of unigenes was annotated in the KOG database (22,753; 69.42%). Summary, 21,742 common unigenes (66.34%) were annotated in all four databases (Fig. 1A). These findings were compared with those for 414 species annotated in the Nr database (top ten shown in Fig. 1B). The species with the most annotated sequence information was Coffea arabica, with 8,701 (27.86%) unigenes, followed by Coffea eugenioides, Coffea canephora, and Olea europaea, with 5,318 (17.03%), 3,512 (11.24%), and 1,051 (3.36%) unigenes, respectively.

Figure 1. Venn diagram and species distribution.

Figure 1

(A) Venn diagram showing the number of unigenes annotated to four databases; (B) the top ten species distribution annotated in the Nr database.

The KOG analysis identified 22,753 unigenes, which could be classified into 25 categories (Fig. 2). The largest number of annotated genes was associated with general function prediction only 4,733 genes (20.80%), followed by 4,146 genes (18.22%) annotated to signal transduction mechanisms; 2,859 genes (12.57%) annotated to posttranslational modifications, protein turnover, and chaperones; 1,617 genes (7.11%) annotated to carbohydrate transport and metabolism; and 1,533 genes (6.74%) annotated to RNA processing and modification. The lowest number was observed for cell motility (37; 0.16%). In addition, 1,210 (5.30%) genes with unknown functions were identified.

Figure 2. KOG function classification.

Figure 2

The unigenes annotated by GO function analysis were associated with 51 GO terms, which were grouped into three categories: cellular component, molecular function, and biological process (Fig. S2). The top three GO enriched terms in the biological process category were cellular process, metabolic process, and response to stimulus, with 21,052, 18,417 and 7,559 genes, respectively. The top three enriched GO terms in the molecular function category were binding (18,864), catalytic activity (16,549), and transporter activity (3,175). In the cellular component category, cellular anatomical entity (16,552) and protein-containing complex (6,946) terms were highly enriched.

In the KEGG database, 30,990 unigenes of G. straminea were annotated and classified into five main categories and 19 subclasses (shown in Table S3). Of these annotated unigenes, 9,485 were mapped to specific KEGG pathways. The greatest number of genes (4,647; 48.99%) was annotated in metabolism pathways, followed by secondary metabolite biosynthesis (2,494; 26.29%), carbon metabolism (826; 8.71%), and biosynthesis of amino acids (635; 6.69%) (Table S4). Genes annotated to secondary metabolite biosynthesis pathways may be related to the synthesis of the medicinal components of G. straminea. In addition to carbon metabolism, the biosynthesis of amino acids and other metabolic pathways may be related to cellular osmotic regulation and the oxidative stress response. These annotated genes provide important sequence information for investigating the biosynthetic mechanism of the metabolites of G. straminea.

In this study, 708 annotated genes were found to participate in 20 standard KEGG secondary metabolism pathways in the transcriptome of G. straminea (Table S5); among these genes, 121 genes were annotated to the terpenoid backbone biosynthesis pathway and 67 genes were enriched in terpenoids (monoterpenoid, diterpenoid, sesquiterpenoid and triterpenoid biosynthesis) pathways. Furthermore, there were 84 genes involved in phenylpropanoid biosynthesis, while 41 genes were involved in flavonoids, isoflavonoid, flavone and flavonol biosynthesis. Additionally, 89 genes related to the synthesis of various alkaloids (indole, isoquinoline, tropane, piperidine, and pyridine alkaloid biosynthesis) were annotated, as shown in Table S5.

Predicting TFs

According to the assembly results, 1,151 genes were annotated as TFs, distributed in 51 TF families. The largest number of genes belonged to the GRAS family, with 128 genes (11.12%), followed by the ARF, C3H, bHLH, and WRKY families, with 92, 70, 66 and 66 genes, respectively. The least common families were the NF-YB (1), M-type (1), Whirly (1), AP2 (1), and YABBY (1) families. The ten TF families with the greatest number of genes in G. straminea are shown in Fig. S3.

DEGs analysis

A total of 32,470 genes were detected, and venn diagram analysis revealed that 31,330 genes were commonly found in the five tissues (Fig. 3B). In the comparisons of root and stem, root and leaf, root and flower, root and ovary, a total of 9,809, 10,503, 13,195, 9,699 DEGs were identified, respectively. Among these, 6,594, 5,762, 6,572, 5,727 DEGs were up-regulated and 3,260, 4,741, 6,623, 3,972 DEGs were down-regulated, respectively. In the contrast between leaf and stem, leaf and flower, leaf and ovary, a total of 6,980, 10,475, 10,006 DEGs were separately detected. Of these, 4,030, 4,707, 5,002 DEGs were up-reaulated and 2,950, 5,768, 5,004 DEGs were down-regulated. Furthermore, 8,855, 7,021 DEGs were identified in the stem vs flower, stem vs ovary comparison groups, respectively. Comparing with ovary, in the tissue of flower, 3,456 DEGs were up-regulated, and 2,218 DEGs were down-regulated, as shown in Fig. 3A.

Figure 3. Distribution of DEG expression level across experimental groups and venn diagram of tissue-species DEGs.

Figure 3

(A) Up-regulated and down-regulated number distribution of DEG expression level across experimental groups. (B) Venn diagram of DEG in different tissues.

In the comparisons of five different tissues, DEGs annotated were mainly enriched in the metabolic pathways, biosynthesis of secondary metabolites. Additionally, DEGs genes were enriched in the pentose and glucuronate interconversions in the root-vs-flower and root-vs-ovaries; carbon metabolism in leaves-vs-stem and flowers-vs-ovaries; amino sugar and nucleotide sugar metabolism in roots-vs-stem (Fig. S4).

Analysis of iridoid biosynthesis genes in G. straminea

Iridoid compound, which are common secondary metabolite components found in various medicinal plants, are the main components of G. straminea and have significant biological activity. By combining these results with previous research results (Ni et al., 2019; Liu et al., 2017), we identified a putative pathway for iridoid biosynthesis and the isoforms involved (Fig. 4). Our results revealed that 117 isoforms encoded 19 key enzymes (Table S6). The expression levels of these key enzyme isoforms in seven tissues are shown with a heatmap (Fig. 4), of which, GCPE, STR, ISPE, DXR, ISPH,7-DLS showed relatively high expression in leaves, other genes showed different expression patterns in different tissues. To further elucidate the interactions among these genes. The STRING database was employed to construct a protein-protein interaction (PPI) network for proteins annotated in the iridoid biosynthesis pathway of G. straminea. A. thaliana was used as the reference organism, the annotated protein sequences were uploaded to STRING database. Initial analysis revealed only 18 associated proteins, with subsequent network formation showing just 15 functionally connected proteins (Fig. 5A). The resulting PPI network comprised 15 nodes and 82 edges (Fig. 5A), exhibiting an average node degree of 10.9 and an average local clustering coefficient of 0.863. The PPI enrichment p-value was highly significant (<1.0e−16). STRING database annotations indicated experimentally validated interactions between: GGPPS SSU and GGPPS (Purple edges), AACT and HMGCS (light blue edges). The other interacting pairs were derived from curated databases and predicted interactions. The protein interaction network-associated genes displayed distinct tissue-specific expression profiles (Fig. 5B).

Figure 4. Putative pathways and heatmap of isoforms related to iridoid biosynthesis in different tissues of G. straminea.

Figure 4

Note: Enzymes highlighted in red indicate annotated genes in G. straminea, while black labels denote unannotated homologs, the red numerals in parentheses represent the isoform count in G. straminea. Different color-coded blocks visualized tissue-specific expression patterns across seven tissues types (R, root; S, stem; L, leaf; F, flower; O, ovary; NEC, non-embryonic callus; EC, embryonic callus). With each row corresponds to a normalized expression profile (FPKM) of iridoid biosynthesis-related isoforms. The heatmap employs a row-wise-z-score normalization, where red indicates high relative expression, and blue indicates low relative expression. AACT, Acetyl-CoA C-acetyltransferase; HMGCS, Hydroxymethylglutaryl-CoA synthase; HMGCR, Hydroxymethylglutaryl-CoA reductase (NADPH); MVK, Mevalonate kinase; PMK: Phosphomevalonate kinase; MVD, Diphosphomevalonate decarboxylase; IDI, Isopentenyl-diphosphate delta-isomerase; DXS, 1-Deoxy-D-xylulose-5-phosphate synthase; DXR, 1-Deoxy-D-xylulose-5-phosphate reductoisomerase; ISPD, 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase; ISPE, 4-Diphosphocytidyl-2-C-methyl-D-erythritol kinase; ISPF, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; GCPE, (E)-4-Hydroxy-3- methylbut-2-enyl-diphosphate synthase; ISPH, 4-Hydroxy-3-methylbut-2-en1yl diphosphate reductase; GPPS, Geranyl diphosphate synthase; GGPPS, Geranylgeranyl diphosphate synthase; GES, Geranyl diphosphate diphosphatase; POR, Cytochrome P450 reductase; G10H, Geraniol 10 -hydroxylase; 10-HG O, 10 -Hydroxygeraniol oxidoreductase; ISY1, Iridoid synthase; 7-DLS, 7-Deoxyloganetic acid synthase; 7-DLGT, 7-Deoxyloganetic acid glucosyltransferase; 7-DLH, 7-Deoxyloganic acid hydroxylase; LAMT, Loganic acid O-methyltransferase; SLS, Secologanin synthase; STR, Strictosidine synthase.

Figure 5. Protein-protein interaction network and gene expression profiling in the iridoid biosynthesis pathway of G. straminea.

Figure 5

(A) Protein-protein interaction network of the iridoid biosynthetic enzymes. Nodes represent pathways proteins, edges colors denote interaction evidence types: purple and light blue: Known interactions (database and experimental evidence): green: Gene neighborhood; red: Gene fusion; blue: Gene co-occurrence; yellow: Text mining; black: Coexpression; light purple: Protein homology. (B) Tissue-specific expression of iridoid-related isoforms. Heatmap displays row-wise z-score normalized FPKM values (red: high; blue: low). R, root; S, stem; L, leaf; F, flower; O, ovary; NEC, non-embryonic callus; EC, embryonic callus. AACT: Acetyl-CoA C-acetyltransferase; HMGCS: Hydroxymethylglutaryl-CoA synthase; HMGCR: Hydroxymethylglutaryl-CoA reductase (NADPH); MVK: Mevalonate kinase; PMK: Phosphomevalonate kinase; MVD: Diphosphomevalonate decarboxylase; IDI: Isopentenyl-diphosphate delta-isomerase; DXS: 1-Deoxy-D-xylulose-5-phosphate synthase; DXR: 1-Deoxy-D-xylulose-5-phosphate reductoisomerase; ISPE: 4-Diphosphocytidyl-2-C-methyl-D-erythritol kinase; GCPE: (E)-4-Hydroxy-3- methylbut-2-enyl-diphosphate synthase; ISPH: 4-Hydroxy-3-methylbut-2-en1yl diphosphate reductase; GPPS: Geranyl diphosphate synthase; GGPPS: Geranylgeranyl diphosphate synthase ; GGPPS SSU: Geranylgeranyl diphosphate synthase small subunits.

Bioinformatics analysis of G(G)PPS

Our results revealed that ten isoforms had GPPS/GGPPS annotations, three of which had open reading frames (ORFs). Two of these were annotated as GGPPS, while one isoform was annotated as GPPS, among the two GGPPSs, one was categorized as GGPPS small subunits (GGPPS SSU), and one was classified as a typical GGPPS. The prediction results revealed that the amino acid length of G(G)PPS (SSU) ranged from 342 to 424 aa, with corresponding molecular weights were 37.9 kDa and pI values were 5.81 (Table 1). Two of Gs GGPPS (SSU) possessed nagative GRAVY values, ranging from −0.050 to −0.187, indicating that these proteins have hydrophilicity. Gs GPPS had a positive GRAVY value (0.049), suggesting hydrophobicity of them. Three of Gs G(G)PPS (SSU) were identified no signal peptide. Two of Gs GGPPS (SSU) were localized in the chloroplast, Gs GPPS were predicted to be mitochondrion. Transmembrane structures were not predicted in any of the G(G)PPS proteins by TMHMM2.0 predictions (Table 2). Pfam protein structural domain prediction revealed a distinctive polyprenyl-synt domain shared by all the G(G)PPS proteins (Figs. S5A and S5B).

Table 2. Physicochemical, structural properties and subcellular localization of GsG(G)PPS.

Isoform number Gene name length (aa) MW (kD) pI SP SL GRAVY TS
Isoform0028091 GsGGPPS SSU 347 37.90001 5.81 NO chloroplast −0.187 o
Isoform0030969 GsGGPPS 368 40.01208 6.28 NO chloroplast −0.050 o
Isoform0027756 GsGPPS 424 46.45244 6.48 NO mitochondrion 0.049 o

Notes.

MW
molecular weight
pI
isoelectric point
SP
Signal peptide cleavage site
SL
Subcellular localization
GRAVY
grand average of hydropathicity
TS
Transmembrane structures
o
indicates that the protein is predicted to be outside the membrane

G(G)PPS usually contains two highly conserved aspartic acid-rich regions-FRAM and SARM with the sequences of DD(XX)1−2D (D is aspartic acid, and X refers to any amino acid). The first conserved region FRAM (DDXXXXD) is consistent with the binding site of the substrate dimethylallyl diphosphate (DMAPP), and the second conserved region SARM (DDXXD) corresponds to the binding site of the substrate isopentenyl diphosphate (IPP). which affects the catalytic activity of G(G)PPS. Some G(G)GPPS proteins also have the characteristic sequence CXXXC (C is cysteine, and X refers to any hydrophobic amino acid) of their structural domain, which is essential for proteins-proteins interactions (Beck et al., 2013). Sequence alignment results revealed that the Gs GGPPS SSU sequences were similar to those of Pj GGPPS SSU, Ae GGPPS SSU, Ca GGPPS SSU, and Si GGPPS SSU2, with identity values of 81.74%, 81.55%, 81.49% and 81.61%, respectively, according to DNAMAN (Table S7). The identities of the Gs GGPPS sequences were similar to those of Cr GGPPS, Ca GPPS, Ce GGPPS and Gj GGPPS, with values of 74.06%, 72.29%, 72.04% and 71.28%, respectively (Table S7). The Gs GPPS sequences were similar to those of Ce SPPS, Ca SPPS, Cr GPPS1, Cr GPPS2, Gsy FPPS, Si SPPS and Na SPPS, with identities of 91.84%, 91.76%, 92%, 91.84%, 91.53%, 90.68%, and 90.75%, respectively (Table S7). Gs GGPPS SSU were enriched with one FARM (DD(XX)2D) and two CXXXC regions. The Gs GGPPS subunit underwent a change in the second aspartic acid enrichment motif, from D to E, i.e, DDXXE (Fig. S5A). Gs GGPPS was enriched with one FARM region (DD(XX)2D), one SARM region each (DDXXD), and one CXXXC region (Fig. S5A). Gs GPPS was enriched with two SARM regions (DDXXD) (Fig. S5B).

The analysis of the conserved motifs in Gs G(G)PPS revealed that both Gs GGPPS SSU and Gs GGPPS contained motif 1, 2, 4, 6, 9. Gs GGPPS SSU and Gs GPPS shared motif 3, 5, 7. Gs GGPPS and Gs GPPS shared common motif 7 and 8 (Fig. 6). The difference in motif composition may influence the function of Gs G(G)PPSs, leading to changes in catalytic activity, protein subcellular localization, and other aspects. Protein structure prediction revealed that Gs GGPPS SSU, Gs GGPPS, and Gs GPPS exhibited high structural similarity to their homologous counterparts from Mucuna pruriens (GMQE = 0.85), Handroanthus impetiginosus (GMQE = 0.81), and Catharanthus roseus (GMQE = 0.79), respectively. Gs G(G)PPS mainly contained α-helices and random coils in the tertiary structure (Fig. S6).

Figure 6. Conserved motif analysis of G(G)PPS in G. straminea.

Figure 6

Phylogentic analysis of G(G)PPSs

Phylogenetic analysis revealed that the G(G)PPSs identified can be categorized into three distinct branches. Among them, Gs GGPPS, together with large subunits of GGPPS (GGPPS LSUs) and GGPPSs from other species, clustered into group 1. Gs GGPPS SSU along with the small subunits of GGPPS (GGPPS SSUs) from other species, was grouped into the second branch (group 2). Gs GPPSs formed the third branch, together with GPPS, SPPS, and FPPS from various other species (group 3) (Fig. 7). Gs G(G)PPSs were categorized into three distinct groups based on their sequence and functional divergence.

Figure 7. Phylogenetic tree of the G(G)PPS gene family across different species.

Figure 7

The abbreviations and sequence ID of G(G)PPS gene family are shown in Table S1. Red triangles, red circles and red stars indicated the proteins annotated in this study. the numbers on the branches represent the statistical support values for those branches. These values are obtained through bootstrap analysis, with higher numbers indicating more reliable and credible branch divisions.

Expression analysis by real-time quantitative PCR

To further analyze the expression patterns of genes annotated to iridoid biosynthesis pathway in different tissues, RT-qPCR was performed on key genes of the iridoid biosynthesis pathway. We selected the upstream and downstream initiators of the MVA (AACT, MVD) and MEP (DXS, ISPH, GCPE) pathways, and these genes (IDI, GPPS) serving as key genes involved in intermediate production and skeletal formation steps. As shown in Fig. 8, the qPCR analysis revealed distinct tissue-specific expression patterns of these genes; AACT, IDI, ISPH, and GCPE showed their highest expression levels in leaves, DXS and GPPS were most abundantly expressed in stems, while MVD exhibited peak expression in non-embryogenic callus. Conversely, the lowest expression levels were observed for AACT, DXS, and ISPH in root tissues, and for IDI, MVD, GCPE, and GPPS in flowers. Significant differential expression across tissues was observed for all genes, with particularly pronounced variations for ISPH, GCPE, and GPPS. The RNA-seq results demonstrated some similar but also distinct patterns: AACT and GPPS showed highest expression in embryogenic callus, ISPH and GCPE in leaves, while MVD was uniquely most highly expressed in roots. For most genes (DXS, IDI, ISPH, GCPE, GPPS), the expression trends across different tissues were generally consistent between qPCR and RNA-seq data. However, discrepancies were noted for AACT and MVD in roots and leaves, as well as for DXS and MVD in the ovary.

Figure 8. Tissue-specific expression abundances of key genes involved in iridoid synthesis.

Figure 8

Note: Bar chart indicated the relative expression levels of genes, line chart indicated the FPKM values of genes. NEC indicated non-embryonic callus, EC indicated embryonic callus. Bars represent standard deviation; Different lowercase letters indicating significant differences at the 0.05 level of probability according to Duncan’s multiple-rangetest.

Discussion

As an important medicinal plant, G. straminea produces diverse iridoids, its principal bioactive constituents, which are primarily biosynthesized via the terpenoid pathway. Transcriptome analysis revealed significant enrichment of genes associated with secondary metabolite synthesis, especially terpenoid backbone biosynthesis, with 121 genes annotated to this pathway. This genetic repertoire presumably underlies the molecular foundation for iridoid production in the plant. Comparing with the results obtained for G. straminea via Illumina NGS (Zhou et al., 2016), and Gentiana waltonii and Gentiana robusta via the Illumina Hiseq X Ten platform (Ni et al., 2019), we obtained more annotation information than previous studies, significantly enriching the genomic resources for G. straminea, and providing a more comprehensive genetic database.

TFs can regulate gene expression by recognizing specific DNA sequences in gene promoters, which is important for understanding gene expression regulatory mechanisms (Jose, Franco & Roberto, 2016). In plants, major TF families including GRAS, bHLH, and WRKY are implicated in hormone signaling and secondary metabolism. Our annotation identified predominant TF distributions in the GRAS, ARF, C3H, bHLH, WRKY, and FAR1 families, consistent with their metabolic regulatory functions. As members of the GRAS family, the SmDELLA1 protein positively regulates phenolic acid and flavonoid biosynthesis (Li et al., 2024). DELLA proteins additionally modulate jasmonic acid (JA) signaling and cell wall formation (Hou et al., 2010; Wang et al., 2021). As the second largest angiosperms TF class, bHLHs govern epidermal differentiation, stress responses, and secondary metabolism. They are key regulators of anthocyanin biosynthesis (Jaakola, 2013) and loganic acid production (Fu et al., 2024). Plant-specific WRKYs bind conserved W-box elements to activate downstream genes (Brand et al., 2013). Artemisia carvifolia AaWRKY1 promoted artemisinin biosynthesis via activating sesquiterpene synthase expression (Ma et al., 2009).

Iridoids are present in traditional medicinal plants and regulate various diseases in the human body. The synthesis of iridoids has been reported in C. roseus (Oudin et al., 2007), Gentiana rigescens (Zhang et al., 2015), Valeriana jatamansi (Zhao and Wang., 2020), Swertia mussotii (Liu et al., 2017) and Rehmannia glutinosa (Sun et al., 2012). In our study, 117 isoforms involved in 19 key enzymes were annotated, spanning multiple stages of iridoid synthesis. Comparative analyses reveal species-specific patterns: for instance. S. mussotii exhibited 39 transcripts link to 24 enzyme categories (Liu et al., 2017), Gentiana lhassica had 171 unigenes encoding 27 key enzymes (Heng et al., 2021), and in V. jatamansi contained 24 unigenes associated with three metabolic pathways (Zhao & Wang, 2020). In Panax ginseng (Kim et al., 2014) and Ganoderma lucidum (Shi et al., 2012), overexpressed MVD could significantly increase the accumulation of terpenoids in plants. The overexpression of HDR gene in Artemisia annua (Ma et al., 2017) and Ginkgo biloba (Kim et al., 2021) could significantly increase the terpenoids content.

Studies show that genes typically do not act alone but co-express with related genes in their metabolic pathways to achieve specific functions (Zhao et al., 2013a; Zhao et al., 2013b). Some research has demonstrated that co-expression of G(G)PPS and GES genes significantly enhances the accumulation of terpenoid indole alkaloids (TIAs) (Kumar et al., 2015). Upregulation of DXS expression markedly increases the expression of its downstream gene GGPPS, thereby promoting carotenoid biosynthesis (Henriquez et al., 2016). Co-expression of DXS and GGPPS in Salvia miltiorrhiza resulted in transgenic hairy roots with substantially higher tanshinone content compared to single-gene transformants and wild-type controls (Shi et al., 2016). Overexpression of LcGPPS.SSU1 enhanced DXS expression level in the metabolic pathway via a positive feedback regulatory mechanism, concomitantly elevating GGPPSs expression levels (Zhao et al., 2020). Elevated ISPG ((E)-4-hydroxy-3-methylbut-2-enyl diphosphate synthase, also designated GCPE) levels induced HMBPP accumulation, which subsequently activated ISPH to alleviate both HMBPP buildup and the negative effects of GCPE overexpression (Li et al., 2017). Coordinated expression of GCPE and ISPH simultaneously eliminated the accumulation of both HMBPP and MECPP, leading to a dramatic increase in isoprenoid production (Li et al., 2017). Deficiency in ISPE activity reduces the levels of MEP pathway metabolites (chlorophylls and carotenoids), while stimulating the expression of ISPF and ISPG (Ahn & Pai, 2008). Methylerythritol cyclodiphosphate (MEcPP) may regulate MEP pathway activity through feedback modulation of DXS protein abundance (Wang et al., 2019). PtHMGR overexpression resulted in transcriptional activation of genes across the MEP and MVA pathways (including DXS, IDI, and GPS), suggesting a global regulatory role (Wei et al., 2019). WRKY transcription factors likely participate in terpenoid biosynthesis by binding to W-box elements (TTGACC) in the promoters of both MEP pathway genes (IDS, DXS) and MVA pathway genes (HMGS) (Meng et al., 2018). When ISPC (1-Deoxy-D-xylulose-5-phosphate synthase, also known DXR), ISPE, ISPH and ISPG were expressed as a tandem gene cluster under a shared promoter in the host strain, taxadiene production was significantly improved (Huang & Zhong, 2017).

These regulatory mechanisms ultimately converge to optimize the production of terpenoid precursors. GPP/GGPP serve as essential terpenoid precursors synthesized from IPP and DMAPP through distinct enzymatic pathways. Researchers have demonstrated that GPPS catalyzes the condensation of IPP and DMAPP form GPP for monoterpene synthesis, while mediates their conversion to GGPP as the precursor for diterpene synthesis, triterpene synthesis, etc. (Tholl et al., 2004; Liang, Ko & Wang, 2002). Since both of them act on the same substrate, some scholars have hypothesized that the IPP flow direction determines the product differences (Tholl et al., 2004). In the third stage, geraniol is formed via the action of GES, and then 10-hydroxygeraniol is formed via the catalytic action of G10H (Liang, Ko & Wang, 2002). The genes encoding G10H in C. roseus (Krithika et al., 2015) and S. mussotii (Wang et al., 2010) have been cloned. Although the G10H gene was not annotated in our results, we detected its transfer partner cytochrome P450 reductase (CPR, POR, EC1.6.2.4); this was the partner of G10H, in the catalytic production of 10-hydroxygeraniol from geraniol. Previous studied have demonstrated that cytochrome P450 monooxygenases (e.g., CYP76B6 in C. roseus and CYP76B10 in S. mussotii) can catalyze the conversion of geraniol to loganic acid (Wang et al., 2010; Collu et al., 2001; Hofer et al., 2013). Notably, the catalytic activity of plant CYP450s depends on electron transfer mediated by POR (Durst & Nelson, 1995). And their expression patterns were similar to G10H (Hofer et al., 2013). Base on relevant reports in R. glutinosa (Sun et al., 2012), the POR annotated in this study may catalyze geraniol formation.

These cytochromes P450-mediated conversions represent critical downstream modifications of terpenoid precursors. At the upstream level, the biosynthesis of these precursors is tightly regulated by prenyltransferases. GPP synthase, a member of the short chain prenyltransferase family, catalyzes the condensation of DMAPP and IPP to form GPP. Both FPPS and GGPPS belong to this enzyme family. These prenyltransferases not only regulate IPP flux but also exhibit conserved structural characteristics across plant species (Durst & Nelson, 1995). Our result revealed that the amino acid sizes, molecular weights and isoelectric points of Gs G(G)PPS annotated in this study were essentially similar to those reported for GGPPS in S. miltiorrhiza (Li et al., 2024), Liriodendron tulipifera (Zhang et al., 2021) and wintersweet flower (Kamran et al., 2020). The characteristic conserved motif of Gs GGPPS SSU was consistent with that of Cp GPPS.SSU2 and Cp GPPS.SSU1 reported in wintersweet flower (Kamran et al., 2020). The Gs GGPPS was consistent with the Ltu GGPPS2 reported in the Liriodendron tulipifera (Zhang et al., 2021) and the Cp GPPS reported in wintersweet flower (Kamran et al., 2020). The characteristic conserved motif of Gs GPPS was similar to other characteristics of homologous GPPSs (Kamran et al., 2020).

G(G)PPS was shown to exist in both homologous and heterologous forms in the plant material (Chen, Fan & Wang, 2015), heterodimeric G(G)PPS contained one LSU and one SSU, and the LSU of the heterodimeric GPPS showed 50%–75% sequence similarity to that of GGPPS and possessed isopentenyl transferase activity, which catalyzes the production of mainly GGPP, as well as a small amount of GPP and FPP (Tholl et al., 2004; Kamran et al., 2020). However, the heterodimeric GPPS SSU shares little sequence similarity to with GGPPS, only 22%–38%, lacks the DD(XX)1−2D motif, and shows no isoprenyl transferase activity (Tholl et al., 2004). Five full-length GPPS and GGPPS genes were successfully annotated in the wintersweet flower transcriptome, these genes were classified into three branches by phylogenetic analysis, namely the SSU representing the heterodimeric GPPS and the homodimeric GPPS and GGPPS (Kamran et al., 2020).

Studies demonstrated that GGPPS large subunit (LSU) can form heterodimeric with GPPS inactive small subunit (SSU) to catalyze monoterpene precursor substances. For instance, homologous and heterologous GPP synthetases have been identified in C. roseus, and classified as the LSU of Cr GPPS, the SSU of heterologous Cr GPPS, and homologous Cr GPPS, the LSU of Cr GPPS is bifunctional in the formation of GPP and GGPP, whereas the inactive SSU of Cr GPPS can integrate with Cr GPPS LSU, increasing enzyme activity, and result in the production of only GPP (Rai et al., 2013). It was hypothesized that the inactive SSU of the heterodimeric Cr GPPS interacting with the bifunctional G(G)PPS redirected metabolic flux towards, and thus acting as an important regulator of monoterpene indole alkaloid biosynthesis (Zhang et al., 2021). Similar mechanisms occured in Arabidopsis (Orlova et al., 2010), Antirrhinum majus (Tholl et al., 2004) and C. roseus (Zhang et al., 2021). In tobacco, the studies further shown that Am SSU overexpression enhanced GPPS activity in leaves and flowers and promoted monoterpene production (Orlova et al., 2010). These findings collectively indicate that both homodimeric and heterodimeric G(G)PPS are regulated monoterpenes formation across species, with LSU either binding SSU or functioning as a homodimer to regulate the flow of IPPs. However, the reason for this phenomenon in G. straminea remains still unclear, and requires further investigation.

RNA-seq covers nearly all exon regions of a gene, so the obtained gene expression levels effectively represent the combined expression across all exonic regions. In contrast, qPCR quantifies expression by amplifying a localized region using designed primers and does not account for the full-length gene (Everaert et al., 2017). Furthermore, RNA integrity and purity, cDNA synthesis efficiency, and PCR amplification efficiency may influence the qPCR results (Lin et al., 2023). Additional research has indicated that GC content also has a strong sample-specific impact on gene expression results (Hansen, Irizarry & WU, 2012; Love, Hogenesch & Irizarry, 2016). For some low-expression genes, expression estimates may be less accurate (Li, Jiang & Wong, 2010). In this study, real-time qPCR validation demonstrated strong concordance between gene expression profiles and transcriptome sequencing data for the majority of analyzed genes. However, certain genes exhibited inter-platform discrepancies, a phenomenon consistent with established methodological variance reports. The overall high correlation (85%) between qPCR and RNA-seq data aligns with previously reported plantform concordance metrics (Everaert et al., 2017). However, the observed inter-platform discrepancies for certain genes could be attributed to the complex, multi-layered control system governing plant secondary metabolism, which includes transcriptional, post-transcriptional, translational, and post-translational processes (Hemmerlin, Harwood & Bach, 2012; Deng et al., 2024; Zhao et al., 2023). Our study integrated RT-qPCR and RNA-seq data to reveal significantly higher expression levels of representative iridoid biosynthesis-related genes in aerial tissues. These findings were corroborated by previous studies: Rai demonstated in C. roseus that GPPS exhibited peaked in the flower tissues, followed by the stem, across examined organs (roots, stems, leaves, flowers, and siliques) (Rai et al., 2013). Similarly, Zhou et al. (2016) reported that GPPS exhibited higher expression levels in the flowers comparing to root. Collectively, these results highlight the species-specific tissue distribution patterns of genes involved in iridoid biosynthesis pathways.

In traditional Chinese medicine, the root of Gentiana spp. serve as the official medicinal parts, while Tibetan medicine also utilizes aerial portions (stems, leaves, and flowers) (Yang, 1991). Some studies also have reported superior medicinal quality in the aerial parts of Gentiana species (Zhao, Dorje & Wang, 2010). Similar results were also observed in transcriptome analyses of G. crassicaulis, G. lhassica and G. Rhodantha (Kang et al., 2021a; Kang et al., 2021b; Zhang et al., 2023). Moreover, quantitative analysis of gentiopicroside and loganic acid content across different tissues (roots, stems, leaves, and flowers) of G. lhassica demonstrated markedly higher total iridoid accumulation in aerial organs than in underground parts, and with statistically significant differences (Kang et al., 2021b; Zhang et al., 2023). This suggests that they may possess higher quality than the underground parts.

In summary, this study delineates the tissue-specific expression landscape of iridoid biosynthesis genes in G. straminea through transcriptome analysis, with aerial-part-enriched key genes revealing molecular insights into its medicinal compound accumulation. This work fills critical knowledge gaps underlying iridoid biosynthesis in this species and provides valuable candidate targets (through the annotation of key genes) for future metabolic engineering optimization of iridoid compounds and heterologous biosynthesis of medicinal components.

Conclusions

Combining Pac-Bio long-read and Illumina short-read (HiSeqTM 4000) sequencing, this study conducted full-length transcriptome assembly and differential expression analysis across five tissues of G. straminea. A total of 32,776 full-length transcripts of high-quality without redundancy were obtained, and 31,434 isoforms were annotated in the NR, KEGG, KOG and Swiss-Prot databases. Illumina sequencing revealed 31,330 genes that were commonly expressed across all five tissues. According to KEGG enrichment analysis, the DEGs were mainly enriched in biosynthesis of secondary metabolites, metabolic pathways, mitogen-activated protein kinase (MAPK) signaling pathway. In summary, 708 genes were classified into 20 KEGG secondary metabolism pathways in the transcriptome of G. straminea. All genes involved in the biosynthesis of iridoids were screened, and a total of 117 isoforms were annotated into the iridoid synthesis pathway, resulting in the identification of key genes encoding 19 enzymes. RT-qPCR results showed that AACT, IDI, ISPH, and GCPE had the highest expression levels in leaves, while DXS and GPPS had the highest expression levels in stems. DXS, IDI, MVD, ISPH, and GPPS exhibited the higher expression levels in NEC than in EC, RT-qPCR results showed a similar trend in expression abundance across the tested tissues. The polyprenyl_synt domain was highly conserved in both the identified GsGGPPSs and GsGPPSs. Through phylogenetic analysis, the GsG(G)PPSs annotated in this study could be classified into three branches. These new results provide valuable information for further research on functional gene development and active ingredient accumulation patterns in G. straminea.

Supplemental Information

Supplemental Information 1. Sequence information of proteins analyzed in this study.
DOI: 10.7717/peerj.20136/supp-1
Supplemental Information 2. Primer for real-time quantitative PCR.
DOI: 10.7717/peerj.20136/supp-2
Supplemental Information 3. Gene function annotation via KEGG metabolic pathway classification.
DOI: 10.7717/peerj.20136/supp-3
Supplemental Information 4. The ten pathways with the highest number of annotated genes in the KEGG.
peerj-13-20136-s004.doc (27.5KB, doc)
DOI: 10.7717/peerj.20136/supp-4
Supplemental Information 5. The number of genes involved in secondary metabolism according to the KEGG pathway analysis.
peerj-13-20136-s005.doc (32.5KB, doc)
DOI: 10.7717/peerj.20136/supp-5
Supplemental Information 6. Isoforms involved in iridoid biosynthesis.

“ –” i ndicatesnoE . C . number or KEGG Orthology ( KO )

peerj-13-20136-s006.doc (54.5KB, doc)
DOI: 10.7717/peerj.20136/supp-6
Supplemental Information 7. Protein sequence homology of GsG(G)PPS.

Note: in G. straminea , GsGGPPS SSU denotes the small subunits of geranylgeranyl diphosphate synthase ; GsGGPPS represents the geranylgeranyl diphosphate synthase; GsGPPS indicates geranyl pyrophosphate synthase .

DOI: 10.7717/peerj.20136/supp-7
Supplemental Information 8. SMRT sequencing of G.straminea.

(a) CCS read length distribution; (b) CCS pass distribution; (c) Consensus isoform distribution; (d) Length distribution of transcript isoforms .

DOI: 10.7717/peerj.20136/supp-8
Supplemental Information 9. GO function classification.
peerj-13-20136-s009.jpg (113.7KB, jpg)
DOI: 10.7717/peerj.20136/supp-9
Supplemental Information 10. Gene number of each TF family (top ten).
DOI: 10.7717/peerj.20136/supp-10
Supplemental Information 11. KEGG pathway enrichment of DEGs in different groups.
peerj-13-20136-s011.png (804.3KB, png)
DOI: 10.7717/peerj.20136/supp-11
Supplemental Information 12. Multiple sequence alignment of deduced amino acid sequences of GGPPSSSU, GGPPS, GPPS-like proteins from G. straminea and other plant species.

(a) Sequence alignment of GsGGPPS(SSU) with GGPPS(SSU) from other plants; (b) Sequence alignment of GsGPPS with GPPS, SPPS from other plants; Identical residues are shaded in dark blue, highly similar residues are shaded in pink, and similar residues are shaded in light blue. Polyprenyl-synt domain was marked with purple border; DDXXXXD FARM and DDXXD (SARM) motif was marked with red border, CxxxC motif was marked with green border; Aspartate-rich region was marked with red Triangle; Chain length determination region was marked with green horizontal line; Active site lid residues was marked with blue horizontal line. Gene abbreviations, the species and gene ID information were shown in Table S1.

peerj-13-20136-s012.pdf (916.4KB, pdf)
DOI: 10.7717/peerj.20136/supp-12
Supplemental Information 13. Protein tertiary structure of G(G)PPS of G . straminea.

Structure prediction was performed by homologous modelling with the SWISS-MODEL sever, GsGGPPS SSU using structure of the geranylgeranyl pyrophosphate synthase small subunit protein of Mucuna pruriens (velvet bean) (A0A371F419) as the template. GsGGPPS using structure of the geranylgeranyl pyrophosphate synthase protein of Handroanthus impetiginosus GGPPS (A0A2G9GV50) as the template, GsGPPS using structure of the geranyl pyrophosphate synthase protein of Catharanthus roseus (B2MV87) as the template. (a) Predicted structure of GsGGPPS SSU shown; (b) Predicted structure of GsGGPPS shown; (c) Predicted structure of GsGPPS shown.

peerj-13-20136-s013.png (395.4KB, png)
DOI: 10.7717/peerj.20136/supp-13
Supplemental Information 14. The detailed procedure for the PCR experiment.
DOI: 10.7717/peerj.20136/supp-14
Supplemental Information 15. MIQE checklist.
DOI: 10.7717/peerj.20136/supp-15
Supplemental Information 16. The average Ct of q-PCR.
peerj-13-20136-s016.xls (31.5KB, xls)
DOI: 10.7717/peerj.20136/supp-16
Supplemental Information 17. Gs Transcriptome Functional annotation.
peerj-13-20136-s017.xlsx (6.9MB, xlsx)
DOI: 10.7717/peerj.20136/supp-17
Supplemental Information 18. Gs Isoforms sequence.
peerj-13-20136-s018.fasta (177.5KB, fasta)
DOI: 10.7717/peerj.20136/supp-18

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

Funding Statement

This research was supported by the Youth Fund projects, Qinghai University (No. 2020-DNY-7) and Central Leading Local Science and Technology Development Fund Project (2024ZY030). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Contributor Information

Tao He, Email: hetaoxn@aliyun.com.

Le Wang, Email: wangleqhu@163.com.

Additional Information and Declarations

Competing Interests

The authors declare that they have no conflicts of interest.

Author Contributions

Lina Yang conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Tao He conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Le Wang conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Xiaochun Ning performed the experiments, prepared figures and/or tables, sample collection, and approved the final draft.

Shuai Wang performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The data is available at China National Center for Bioinformation: CRA017932, CRA019968.

References

  • Ahn & Pai (2008).Ahn CS, Pai HS. Physiological function of IspE, a plastid MEP pathway gene for isoprenoid biosynthesis, in organelle biogenesis and cell morphogenesis in Nicotiana benthamiana. Plant Molecular Biology. 2008;66(5):503–517. doi: 10.1007/s11103-007-9286-0. [DOI] [PubMed] [Google Scholar]
  • Artimo et al. (2012).Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G. ExPASy: SIB bioinformatics resource portal. Nucleic Acids Research. 2012;40(Web Server issue):W597–W603. doi: 10.1093/nar/gks400. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bailey et al. (2015).Bailey TL, Johnson J, Grant CE, Noble WS. The MEME suite. Nucleic Acids Research. 2015;43(W1):W39–W49. doi: 10.1093/nar/gkv416. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Beck et al. (2013).Beck G, Coman D, Herren E, guila Ruiz-Sola M, Rodríguez-Concepción M, Gruissem W, Vranová E. Characterization of the GGPP synthase gene family in Arabidopsis thaliana. Plant Molecular Biology. 2013;82(4–5):393–416. doi: 10.1007/s11103-013-0070-z. [DOI] [PubMed] [Google Scholar]
  • Brand et al. (2013).Brand LH, Fischer NM, Harter K, Kohlbacher O, Wanke D. Elucidating the evolutionary conserved DNA-binding specificities of WRKY transcription factors by molecular dynamics and in vitro binding assays. Nucleic Acids Research. 2013;41(21):9764–9778. doi: 10.1093/nar/gkt732. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Chen, Fan & Wang (2015).Chen QW, Fan DJ, Wang GD. Heteromeric geranyl (geranyl) diphosphate synthase is involved in monoterpene biosynthesis in Arabidosis flowers. Molecular Plant. 2015;8(9):1434–1437. doi: 10.1016/j.molp.2015.05.001. [DOI] [PubMed] [Google Scholar]
  • Collu et al. (2001).Collu G, Unver N, Peltenburg-Looman AM, Heijden RVD, Verpoorte R, Memelink J. Geraniol 10-hydroxylase, a cytochrome P450 enzyme involved in terpenoid indole alkaloid biosynthesis. FEBS Letters. 2001;508(2):215–220. doi: 10.1016/s0014-5793(01)03045-9. [DOI] [PubMed] [Google Scholar]
  • Deng et al. (2024).Deng KY, Li ZW, Huang TB, Huang JZ. Noncoding RNAs in regulation of plant secondary metabolism. Plant Physiology and Biochemistry. 2024;211:108718. doi: 10.1016/j.plaphy.2024.108718. [DOI] [PubMed] [Google Scholar]
  • Durst & Nelson (1995).Durst F, Nelson DR. Diversity and evolution of plant P450 and P450-reductases. Drug Metabolism and Drug Interactions. 1995;12(3):4. doi: 10.1515/dmdi.1995.12.3-4.189. [DOI] [PubMed] [Google Scholar]
  • Everaert et al. (2017).Everaert C, Luypaert M, Maag JLV, Cheng QX, Dinger ME, Hellemans J, Mestdagh Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data. Scientific Reports. 2017;7:1559. doi: 10.1038/s41598-017-01617-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Fu et al. (2024).Fu HH, Wang YM, Mi FK, Wang L, Yang Y, Wang F, Yue ZG, He YH. Transcriptome and metabolome analysis reveals mechanism of light intensity modulating iridoid biosynthesis in Gentiana macrophylla Pall. BMC Plant Biology. 2024;24(1):526. doi: 10.1186/s12870-024-05217-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Gao et al. (2021).Gao X, Guo FX, Chen Y, Bai G, Liu YX, Jin JQ, Wang Q. Full-length transcriptome analysis provides new insights into the early bolting occurrence in medicinal Angelica sinensis. Scientific Reports. 2021;11(1):13000. doi: 10.1038/S41598-021-92494-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Guo et al. (2021).Guo KY, Chen J, Niu YQ, Lin XM. Full-length transcriptome sequencing provides insights into flavonoid biosynthesis in Fritillaria hupehensis. Life. 2021;11(4):287. doi: 10.3390/life11040287. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Hansen, Irizarry & WU (2012).Hansen KD, Irizarry RA, WU ZJ. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics. 2012;13(2):204–216. doi: 10.1093/biostatistics/kxr054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • He, Yang & Zhao (2011).He T, Yang LN, Zhao ZG. Embryogenesis of Gentiana straminea and assessment of genetic stability of regenerated plants using inter simple sequence repeat (ISSR) marker. African Journal of Biotechnology. 2011;10(39):7604–7610. doi: 10.5897/AJB11.572. [DOI] [Google Scholar]
  • Hemmerlin, Harwood & Bach (2012).Hemmerlin A, Harwood JL, Bach TJ. A raison d’être for two distinct pathways in the early steps of plant isoprenoid biosynthesis. Progress Lipid Research. 2012;51(2):95–148. doi: 10.1016/j.plipres.2011.12.001. [DOI] [PubMed] [Google Scholar]
  • Heng et al. (2021).Heng K, Zhao ZL, Ni LH, Li WT, Zhao SJ, Liu TH. Transcriptome analysis and validation of key genes involved in biosynthesis of iridoids in Gentiana lhassica. China Journal Chinese Materia Medica. 2021;46(18):4704–4711. doi: 10.19540/j.cnki.cjcmm.20210610.101. [DOI] [PubMed] [Google Scholar]
  • Henriquez et al. (2016).Henriquez MA, Soliman A, Li GY, Hannoufa A, Ayele B, Daayf Molecular cloning, functional characterization and expression of potato (Solanum tuberosum) 1-deoxy-D-xylulose 5-phosphate synthase 1 (StDXS1) in response to Phytophthora infestans. Plant Science. 2016;243:71–83. doi: 10.1016/j.plantsci.2015.12.001. [DOI] [PubMed] [Google Scholar]
  • Hofer et al. (2013).Hofer R, Lemeng D, Andre F, Ginglinger JF, Lugan R, Gavira C, Grec S, Lang G, Memelink J, Krol SVD. Geraniol hydroxylase and hydroxygeraniol oxidase activities of the CYP76 family of cytochrome P450 enzymes and potential for engineering the early steps of the (seco)iridoid pathway. Metabolic Engineering. 2013;20:221–232. doi: 10.1016/j.ymben.2013.08.001. [DOI] [PubMed] [Google Scholar]
  • Hou et al. (2010).Hou XL, Yun L, Lee C, Xia KF, Yan YY, Yu H. DELLAs modulate jasmonate signaling via competitive binding to JAZs. Developmental Cell. 2010;19(6):884–894. doi: 10.1016/j.devcel.2010.10.024. [DOI] [PubMed] [Google Scholar]
  • Huang & Zhong (2017).Huang XB, Zhong LL. Enhancing MEP2 pathway of E. coli to produce taxol precursor. Journal of Chongqing University. 2017;40(7):91–98. doi: 10.11835/j.issn.1000-582X.2017.07.012. [DOI] [Google Scholar]
  • Jaakola (2013).Jaakola L. New insights into the regulation of anthocyanin biosynthesis in fruits. Trends Plant Science. 2013;18(9):477–483. doi: 10.1016/j.tplants.2013.06.003. [DOI] [PubMed] [Google Scholar]
  • Jia et al. (2012).Jia N, Li YW, Wu Y, Xi MM, Hur GM, Zhang XX, Cui J, Sun WJ, Wen AD. Comparison of the anti-inflammatory and analgesic effects of Gentiana macrophylla Pall. and Gentiana straminea Maxim. and identification of their active constituents. Journal of Ethnopharmacology. 2012;144(3):638–645. doi: 10.1016/j.jep.2012.10.004. [DOI] [PubMed] [Google Scholar]
  • Jose, Franco & Roberto (2016).Jose M, Franco Z, Roberto S. Identification of plant transcription factor target sequences, Gene Regulatory Mechanisms. Biochimica et Biophysica Acta-Biomembranes. 2016;1860(1):21–30. doi: 10.1016/j.bbagrm.2016.05.001. [DOI] [PubMed] [Google Scholar]
  • Kamran et al. (2020).Kamran HM, Hussain SB, Shang JZ, Xiang L, Chen LQ. Identification and molecular characterization of Geranyl Diphosphate Synthase (GPPS) genes in wintersweet flower. Plants. 2020;9(5):666. doi: 10.3390/plants9050666. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Kang et al. (2021a).Kang H, Zhao ZL, Ni LH, Li WT, Zhao SJ, Liu TH. Transcriptome analysis and exploration of genes involved in the biosynthesis of iridoids in Gentiana crassicaulis (Gentianaceae) Acta Pharmaceutica Sinica. 2021a;56(7):2005–2014. doi: 10.16438/j.0513-4870.2020-1921. [DOI] [Google Scholar]
  • Kang et al. (2021b).Kang H, Zhao ZL, Ni LH, Li WT, Zhao SJ, Liu TH. Transcriptome analysis and validation of key genes involved in the biosynthesis of iridoids in Gentiana lhassica. China Journal of Chinese Materia Medica. 2021b;46(18):4704–4711. doi: 10.19540/j.cnki.cjcmm.20210610.101. [DOI] [PubMed] [Google Scholar]
  • Kim et al. (2021).Kim YB, Kim SM, Sathasivam R, Kim YK, Kim SU. Overexpression of Ginkgo biloba Hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase 2 gene (GbHDR2) in Nicotiana tabacum cv. Xanthi. 3. Biotech. 2021;11(7):337. doi: 10.1007/s13205-021-02887-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Kim et al. (2014).Kim YK, Kim YB, Uddin MR, Lee SH, Kim SU, Park SU. Enhanced triterpene accumulation in Panax ginseng Hairy roots overexpressing mevalonate-5-pyrophosphate decarboxylase and farnesyl pyrophosphate synthase. ACS Synthetic Biology. 2014;3(10):773–779. doi: 10.1021/sb400194g. [DOI] [PubMed] [Google Scholar]
  • Krithika et al. (2015).Krithika R, Srivastava PL, Rani B, Kolet SP, Chopade M, Soniya M, Thulasiram HV. Characterization of 10-Hydroxygeraniol Dehydrogenase from Catharanthus roseus reveals cascaded enzymatic activity in iridoid biosynthesis. Scientific Reports. 2015;5:8258. doi: 10.1038/srep08258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Kumar et al. (2015).Kumar K, Kumar SR, Dwivedi V, Rai A, Shukla Ak, Shanker K, Nagegowda DA. Precursor feeding studies and molecular characterization of geraniol synthase establish the limiting role of geraniol in monoterpene indole alkaloid biosynthesis in Catharanthus roseus leaves. Plant Science. 2015;239:56–66. doi: 10.1016/j.plantsci.2015.07.007. [DOI] [PubMed] [Google Scholar]
  • Li et al. (2017).Li QY, Fan FY, Gao X, Yang C, Bi CH, Tang JL, Liu T, Zhang XL. Balanced activation of IspG and IspH to eliminate MEP intermediate accumulation and improve isoprenoids production in Escherichia coli. Metabolic Engineering. 2017;44:13–21. doi: 10.1016/j.ymben.2017.08.005. [DOI] [PubMed] [Google Scholar]
  • Li, Jiang & Wong (2010).Li J, Jiang H, Wong WH. Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biology. 2010;11(5):R50. doi: 10.1186/gb-2010-11-5-r50. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Li et al. (2024).Li YY, Pang QY, Li B, Fu YC, Guo MY, Zhang CJ, Tian Q, Hu SY, Niu JF, Wang SQ. Characteristics of CXE family of Salvia miltiorrhiza and identification of interactions between SmGID1s and SmDELLAs. Plant Physiology and Biochemistry. 2024;206:108140. doi: 10.1016/j.plaphy.2023.108140. [DOI] [PubMed] [Google Scholar]
  • Liang, Ko & Wang (2002).Liang PH, Ko TP, Wang AH. Structure, mechanism and function of prenyltransferases. European Journal of Biochemistry. 2002;269(14):3339–3354. doi: 10.1046/j.1432-1033.2002.03014.x. [DOI] [PubMed] [Google Scholar]
  • Lin et al. (2023).Lin Y, Liu GF, Rao Y, Wang B, Tian RF, Tan YY, Peng T. Identification and validation of reference genes for qRT-PCR analyses under different experimental conditions in Allium wallichii. Journal of Plant Physiology. 2023;281:153925. doi: 10.1016/j.jplph.2023.153925. [DOI] [PubMed] [Google Scholar]
  • Liu et al. (2017).Liu Y, Wang Y, Guo FX, Zhan L, Mohr T, Cheng P, Huo NX, Gu RH, Pei DN, Sun JQ, Tang L, Long CL, Huang LQ, Gu YQ. Deep sequencing and transcriptome analyses to identify genes involved in secoiridoid biosynthesis in the Tibetan medicinal plant Swertia mussotii. Scientific Reports. 2017;7:43108. doi: 10.1038/srep43108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Livak & Schmittgen (2001).Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real time quantitive PCR and the 2−ΔΔCT method. Methods. 2001;25(4):402–408. doi: 10.1006/meth.2001.1262. [DOI] [PubMed] [Google Scholar]
  • Lou et al. (2019).Lou HQ, Ding HQ, Wu MZ, Zhang JS, Zhang FC, Chen WC, Yang Y, Suo JW, Yu WW, Xu CM, Song LL. Full-length transcriptome analysis of the genes involved in tocopherol biosynthesis in Torreya grandis. Journal of Agricultural and Food Chemistry. 2019;67(7):1877–1888. doi: 10.1021/acs.jafc.8b06138. [DOI] [PubMed] [Google Scholar]
  • Love, Hogenesch & Irizarry (2016).Love M, Hogenesch J, Irizarry R. Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology. 2016;34(12):1287–1291. doi: 10.1038/nbt.3682. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Ma et al. (2017).Ma DM, Li G, Zhu Y, Xie DY. Overexpression and suppression of Artemisia annua 4-hydroxy-3-methylbut-2-enyldiphosphate reductase 1 gene (AaHDR1) differentially regulate artemisinin and terpenoid biosynthesis. Frontiers in Plant Science. 2017;8:77. doi: 10.3389/fpls.2017.00077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Ma et al. (2009).Ma DM, Pu GB, Lei CY, Ma LQ, Wang HH, Guo YW, Chen JL, Du ZG, Wang H, Li GF, Ye HC, Liu BY. Isolation and characterization of AaWRKY1, an Artemisia annua transcription factor that regulates the amorpha-4, 11-diene synthase gene, a key gene of artemisinin biosynthesis. Plant and Cell Physiology. 2009;50(12):2146–2161. doi: 10.1093/pcp/pcp149. [DOI] [PubMed] [Google Scholar]
  • Meng et al. (2018).Meng XX, Xu F, Song QL, Ye JB, Liao YL, Zhang WW. Isolation, characterization and functional analysis of a novel 3-hydroxy-3-methylglutaryl-coenzyme A synthase gene (GbHMGS2) from Ginkgo biloba. Acta Physiologiae Plantarum. 2018;40:72. doi: 10.1007/s11738-018-2650-7. [DOI] [Google Scholar]
  • Miettinen et al. (2014).Miettinen K, Dong L, Navrot N, Schneider T, Burlat V, Pollier J, Woittiez L, van der Krol S, Lugan R, Ilc T, Verpoorte R, Oksman-Caldentey K-M, Martinoia E, Bouwmeester H, Goossens A, Memelink J, Werck-Reichhart D. The seco-iridoid pathway from Catharanthus roseus. Nature Communications. 2014;5:3606. doi: 10.1038/ncomms4606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Ni et al. (2019).Ni LH, Zhao ZL, Wu JR, Dorje GAAWE, Ma M. Analysis of transcriptomes to explore genes contributing to iridoid biosynthesis in Gentiana waltonii and Gentiana robusta (Gentianaceae) Acta Pharmaceutica Sinica B. 2019;54(5):944–953. doi: 10.16438/j.0513-4870.2019-0082. Chinese. [DOI] [Google Scholar]
  • Orlova et al. (2010).Orlova I, Nagegowda DA, Kish CM, Gutensohn AM, Haeda AH. The small subunit of snapdragon geranyl diphosphate synthase modifies the chain length specificity of tobacco geranylgeranyl diphosphate synthase in planta. Plant. 2010;21(12):4002–4017. doi: 10.1105/TPC.109.071282. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Oudin et al. (2007).Oudin A, Courtois M, Rideau M, Clastre M. The iridoid pathway in Catharanthus roseus alkaloid biosynthesis. Phytochemistry Reviews. 2007;6(2):259–276. doi: 10.1007/s11101-006-9054-9. [DOI] [Google Scholar]
  • Ozsolak & Milos (2012).Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics. 2012;12(2):87–98. doi: 10.1038/nrg2934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Paysan-Lafosse et al. (2023).Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA, Bileschi ML, Bork P, Bridge A, Colwell L, Gough J, Haft DH, Letunić I, Marchler-Bauer A, Mi H, Natale DA, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A. InterPro in 2022. Nucleic Acids Research. 2023;51(D1):D418–D427. doi: 10.1093/nar/gkac993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Petersen et al. (2011).Petersen TN, Brunak S, Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods. 2011;8(10):785–786. doi: 10.1038/nmeth.1701. [DOI] [PubMed] [Google Scholar]
  • Rai et al. (2013).Rai A, Smita SS, Singh AK, Shanker K, Nagegowda DA. Heteromeric and homomeric geranyl diphosphate synthases from Catharanthus roseus and their role in monoterpene indole alkaloid biosynthesis. Molecular Plant. 2013;6(5):1531–1549. doi: 10.1093/mp/sst058. [DOI] [PubMed] [Google Scholar]
  • Rhoads & Au (2015).Rhoads A, Au KF. PacBio sequencing and its applications, Genomics, Proteomics. Bioinformatics. 2015;3(5):278–289. doi: 10.1016/j.gpb.2015.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Salim et al. (2014).Salim V, Wiens B, Masada-Atsumi S, Yu F, De Luca V. 7-Deoxyloganetic acid synthase catalyzes a key 3 step oxidation to form 7-deoxyloganetic acid in Catharanthus roseus iridoid biosynthesis. Phytochemistry. 2014;101:23–31. doi: 10.1016/j.phytochem.2014.02.009. [DOI] [PubMed] [Google Scholar]
  • Shi et al. (2016).Shi M, Luo XQ, Ju GH, Li LL, Huang SX, Zhang T, Wang HZ, Kai GY. Enhanced diterpene tanshinone accumulation and bioactivity of transgenic Salvia miltiorrhiza hairy roots by pathway engineering. Journal of Agruculture and Food Chemistry. 2016;64(12):2523–2530. doi: 10.1021/acs.jafc.5b04697. [DOI] [PubMed] [Google Scholar]
  • Shi et al. (2012).Shi L, Qin L, Xu YJ, Ren A, Fang X, Mu DS, Tan Q, Zhao MW. Molecular cloning, characterization, and function analysis of a mevalonate pyrophosphate decarboxylase gene from Ganoderma lucidum. Molecular Biology Reports. 2012;39(5):6149–6159. doi: 10.1007/s11033-011-1431-9. [DOI] [PubMed] [Google Scholar]
  • Sun et al. (2012).Sun P, Song SH, Zhou LL, Zhang B, Qi JJ, Li XE. Transcriptome analysis reveals putative genes involved in iridoid biosynthesis in Rehmannia glutinosa. International Journal of Molecular Sciences. 2012;13(10):13748–13763. doi: 10.3390/ijms131013748. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Szklarczyk et al. (2014).Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, Von Mering C. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Research. 2014;43(D1):D447–D452. doi: 10.1093/nar/gku1003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Tamura, Stecher & Kumar (2021).Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Molecular Biology and Evolution. 2021;38(7):3022–3027. doi: 10.1093/molbev/msab120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Tholl et al. (2004).Tholl D, Kish CM, Orlova I, Sherman D, Gershenzon J, Pichersky E, Dudareva N. Formation of monoterpenes in Antirrhinum majus and Clarkia breweri flowers involves heterodimeric geranyl diphosphate synthases. The Plant Cell. 2004;16(4):977–992. doi: 10.1105/tpc.020156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Vranová, Coman & Gruissem (2013).Vranová E, Coman D, Gruissem W. Network analysis of the MVA and MEP pathways for isoprenoid synthesis. Annual Review Plant Biology. 2013;64(1):665–670. doi: 10.1146/annurev-arplant-050312-120116. [DOI] [PubMed] [Google Scholar]
  • Wang et al. (2019).Wang JZ, Lei YX, Xiao YM, Dehesh K. Uncovering the functional residues of Arabidopsis isoprenoid biosynthesis enzyme HDS. PANS. 2019;117(1):355–361. doi: 10.1073/pnas.1916434117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Wang et al. (2010).Wang J, Liu Y, Cai Y, Zhang F, Xia G, Xiang F. Cloning and functional analysis of geraniol 10-hydroxylase, a cytochrome P450 from Swertia mussotii Franch. Bioscience Biotechnology and Biochemistry. 2010;74(8):1583. doi: 10.1271/bbb.100175. [DOI] [PubMed] [Google Scholar]
  • Wang et al. (2021).Wang Y, Yu WT, Ran LF, Chen Z, Wang CN, Dou Y, Qin YY, Suo QW, Li YH, Zeng JJ, Liang AM, Dai YL, Wu YP, Qu XF, Xiao YH. DELLA-NAC interactions mediate GA signaling to promote secondary cell wall formation in cotton stem. Frontier in Plant Science. 2021;12(12):655127. doi: 10.3389/fpls.2021.655127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Wei et al. (2019).Wei H, Xu C, Movahedi A, Sun WB, Li DW, Qiang ZG. Characterization and function of 3-hydroxy-3-methylglutaryl-CoA reductase in Populus trichocarpa: overexpression of PtHMGR enhances terpenoids in transgenic poplar. Frontiers Plant Science. 2019;10:1476–1493. doi: 10.3389/fpls.2019.01476. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Wei et al. (2012).Wei SH, Zhang PC, Feng XZ, Kodama H, Yu CY, Chen G. Qualitative and quantitative determination of ten iridoids and secoiridoids in Gentiana straminea Maxim. by LC-UV-ESI- MS. Journal of Natural Medicines. 2012;66(1):102–108. doi: 10.1007/s11418-011-0560-8. [DOI] [PubMed] [Google Scholar]
  • Wu et al. (2016).Wu JR, Zhao ZL, Wu LH, Wang ZT. Authentication of Gentiana straminea Maxim and its substitutes based on chemical profiling of iridoids using liquid chromatography with mass spectrometry. Biomedical Chromatography. 2016;30(12):2061–2066. doi: 10.1002/bmc.3763. [DOI] [PubMed] [Google Scholar]
  • Xu et al. (2016).Xu ZC, Luo HM, Ji AJ, Zhang X, Song JY, Chen SL. Global identification of the full-length transcripts and alternative splicing related to phenolic acid biosynthetic genes in Salvia miltiorrhiza. Frontier in Plant Science. 2016;7:100. doi: 10.3389/fpls.2016.00100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Xu et al. (2023).Xu J, Shan TY, Zhang JJ, Zhong XX, Tao YJ. Full-length transcriptome analysis provides insights into flavonoid biosynthesis in Ranunculus japonicus. Physiologia Plantarum. 2023;175(4):1–13. doi: 10.1111/ppl.13965. [DOI] [PubMed] [Google Scholar]
  • Xu et al. (2022).Xu C, Ye P, Wu Q, Liang SC, Wei WK, Yang JF, Chen WW, Zhang RT, Ma DM. Identification and functional characterization of three iridoid synthases in Gardenia jasminoides. Planta. 2022;255(3):58. doi: 10.1007/s00425-022-03824-3. [DOI] [PubMed] [Google Scholar]
  • Yang (1991).Yang YC. Tibetan medicine. Xining: Qinghai People Press; 1991. pp. 11–12. Chinese. [Google Scholar]
  • Ye et al. (2021).Ye HG, Li CY, Ye WC, Zeng FY, Liu FF, Liu YY, Wang FG, Ye YS, Fu L, Li JR. Common Chinese Materia Medica. Springer; Singapore: 2021. pp. 525–564. [DOI] [Google Scholar]
  • Zhan et al. (2023).Zhan T, Li F, Lan J, Lin LH, Yang ZR, Xie CZ, Wang HB, Zheng XS. Functional characterization of four mono-terpene synthases (TPSs) provided insight into the biosynthesis of volatile monoterpenes in the medicinal herb Blumea balsamifera. Physiology and Molecular Biology of Plants. 2023;29(4):459–469. doi: 10.1007/s12298-023-01306-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Zhang et al. (2015).Zhang XD, Allan A, Li CH, Wang YZ, Yao QY. De novo assembly and characterization of the transcriptome of the Chinese Medicinal Herb, Gentiana rigescens. International Journal of Molecular Sciences. 2015;16(5):11550–11573. doi: 10.3390/ijms160511550. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Zhang et al. (2021).Zhang CG, Liu HH, Zong YX, Tu ZH, Li HG. Isolation, expression, and functional analysis of the geranylgeranyl pyrophosphate synthase (GGPPS) gene from Liriodendron tulipifera. Plant Physiol and Bioch. 2021;166:700–711. doi: 10.1016/j.plaphy.2021.06.052. [DOI] [PubMed] [Google Scholar]
  • Zhang et al. (2023).Zhang T, Wang M, Li Z, Wu X, Liu X. Transcriptome analysis and exploration of genes involved in the biosynthesis of secoiridoids in Gentiana rhodantha. PeerJ. 2023;11:e14968. doi: 10.7717/peerj.14968. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Zhao et al. (2013a).Zhao L, Chang WC, Xiao Y, Liu HW, Liu P. Methylerythritol phosphate pathway of isoprenoid biosynthesis. The Annual Review of Biochemistry. 2013a;82:497–530. doi: 10.1146/annurev-biochem-052010-100934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Zhao et al. (2020).Zhao YX, Chen YC, Gao M, Yin HF, Wu LW, Wang YD. Overexpression of geranyl diphosphate synthase small subunit 1 (LcGPPS.SSU1) enhances the monoterpene content and biomass. Industrial Crops and Products. 2020;143:111926. doi: 10.1016/j.indcrop.2019.111926. [DOI] [Google Scholar]
  • Zhao, Dorje & Wang (2010).Zhao ZL, Dorje G, Wang ZT. Identification of medicinal plants used as Tibetan traditional medicine. Journal of Ethnopharmacology. 2010;132:122–126. doi: 10.1016/j.jep.2010.07.051. Chinese. [DOI] [PubMed] [Google Scholar]
  • Zhao et al. (2013b).Zhao L, Gao LP, Wang HX, Chen XT, Wang YS, Yang H, Wei CL, Wan XC, Xia T. The R2R3-MYB, bHLH, WD40, and related transcription factors in flavonoid biosynthesis. Functional & Integrative Genomics. 2013b;13(1):75–98. doi: 10.1007/s10142-012-0301-4. [DOI] [PubMed] [Google Scholar]
  • Zhao et al. (2023).Zhao Y, Liu G, Yang F, Liang Y, Gao Q, Xiang C, Li X, Yang R, Zhang G, Jiang H, Yu L, Yang S. Multilayered regulation of secondary metabolism in medicinal plants. Molecular Horticulture. 2023;3(1):11. doi: 10.1186/s43897-023-00059-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Zhao & Wang (2020).Zhao S, Wang CS. Deep sequencing and transcriptome analyses to identify genes involved in iridoid biosynthesis in the medicinal plant Valeriana jatamansi Jones. Notulae botanicea Horti Agrobotanici Cluj-Napoca. 2020;48(1):189–199. doi: 10.15835/nbha48111759. [DOI] [Google Scholar]
  • Zhong et al. (2020).Zhong FR, Huang L, Qi LM, Ma YT, Yan ZY. Full-length transcriptome analysis of Coptis deltoidea and identification of putative genes involved in benzylisoquinoline alkaloids biosynthesis based on combined sequencing platforms. Plant Molecular Biology. 2020;102(5):477–499. doi: 10.1007/s11103-019-00959-y. [DOI] [PubMed] [Google Scholar]
  • Zhou et al. (2016).Zhou DW, Gao S, Wang H, Lei TX, Shen JW, Gao J, Chen SL, Yin J, Liu JQ. De novo sequencing transcriptome of endemic Gentiana straminea (Gentianaceae) to identify genes involved in the biosynthesis of active ingredients. Gene. 2016;575(1):160–170. doi: 10.1016/j.gene.2015.08.055. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Information 1. Sequence information of proteins analyzed in this study.
DOI: 10.7717/peerj.20136/supp-1
Supplemental Information 2. Primer for real-time quantitative PCR.
DOI: 10.7717/peerj.20136/supp-2
Supplemental Information 3. Gene function annotation via KEGG metabolic pathway classification.
DOI: 10.7717/peerj.20136/supp-3
Supplemental Information 4. The ten pathways with the highest number of annotated genes in the KEGG.
peerj-13-20136-s004.doc (27.5KB, doc)
DOI: 10.7717/peerj.20136/supp-4
Supplemental Information 5. The number of genes involved in secondary metabolism according to the KEGG pathway analysis.
peerj-13-20136-s005.doc (32.5KB, doc)
DOI: 10.7717/peerj.20136/supp-5
Supplemental Information 6. Isoforms involved in iridoid biosynthesis.

“ –” i ndicatesnoE . C . number or KEGG Orthology ( KO )

peerj-13-20136-s006.doc (54.5KB, doc)
DOI: 10.7717/peerj.20136/supp-6
Supplemental Information 7. Protein sequence homology of GsG(G)PPS.

Note: in G. straminea , GsGGPPS SSU denotes the small subunits of geranylgeranyl diphosphate synthase ; GsGGPPS represents the geranylgeranyl diphosphate synthase; GsGPPS indicates geranyl pyrophosphate synthase .

DOI: 10.7717/peerj.20136/supp-7
Supplemental Information 8. SMRT sequencing of G.straminea.

(a) CCS read length distribution; (b) CCS pass distribution; (c) Consensus isoform distribution; (d) Length distribution of transcript isoforms .

DOI: 10.7717/peerj.20136/supp-8
Supplemental Information 9. GO function classification.
peerj-13-20136-s009.jpg (113.7KB, jpg)
DOI: 10.7717/peerj.20136/supp-9
Supplemental Information 10. Gene number of each TF family (top ten).
DOI: 10.7717/peerj.20136/supp-10
Supplemental Information 11. KEGG pathway enrichment of DEGs in different groups.
peerj-13-20136-s011.png (804.3KB, png)
DOI: 10.7717/peerj.20136/supp-11
Supplemental Information 12. Multiple sequence alignment of deduced amino acid sequences of GGPPSSSU, GGPPS, GPPS-like proteins from G. straminea and other plant species.

(a) Sequence alignment of GsGGPPS(SSU) with GGPPS(SSU) from other plants; (b) Sequence alignment of GsGPPS with GPPS, SPPS from other plants; Identical residues are shaded in dark blue, highly similar residues are shaded in pink, and similar residues are shaded in light blue. Polyprenyl-synt domain was marked with purple border; DDXXXXD FARM and DDXXD (SARM) motif was marked with red border, CxxxC motif was marked with green border; Aspartate-rich region was marked with red Triangle; Chain length determination region was marked with green horizontal line; Active site lid residues was marked with blue horizontal line. Gene abbreviations, the species and gene ID information were shown in Table S1.

peerj-13-20136-s012.pdf (916.4KB, pdf)
DOI: 10.7717/peerj.20136/supp-12
Supplemental Information 13. Protein tertiary structure of G(G)PPS of G . straminea.

Structure prediction was performed by homologous modelling with the SWISS-MODEL sever, GsGGPPS SSU using structure of the geranylgeranyl pyrophosphate synthase small subunit protein of Mucuna pruriens (velvet bean) (A0A371F419) as the template. GsGGPPS using structure of the geranylgeranyl pyrophosphate synthase protein of Handroanthus impetiginosus GGPPS (A0A2G9GV50) as the template, GsGPPS using structure of the geranyl pyrophosphate synthase protein of Catharanthus roseus (B2MV87) as the template. (a) Predicted structure of GsGGPPS SSU shown; (b) Predicted structure of GsGGPPS shown; (c) Predicted structure of GsGPPS shown.

peerj-13-20136-s013.png (395.4KB, png)
DOI: 10.7717/peerj.20136/supp-13
Supplemental Information 14. The detailed procedure for the PCR experiment.
DOI: 10.7717/peerj.20136/supp-14
Supplemental Information 15. MIQE checklist.
DOI: 10.7717/peerj.20136/supp-15
Supplemental Information 16. The average Ct of q-PCR.
peerj-13-20136-s016.xls (31.5KB, xls)
DOI: 10.7717/peerj.20136/supp-16
Supplemental Information 17. Gs Transcriptome Functional annotation.
peerj-13-20136-s017.xlsx (6.9MB, xlsx)
DOI: 10.7717/peerj.20136/supp-17
Supplemental Information 18. Gs Isoforms sequence.
peerj-13-20136-s018.fasta (177.5KB, fasta)
DOI: 10.7717/peerj.20136/supp-18

Data Availability Statement

The following information was supplied regarding data availability:

The data is available at China National Center for Bioinformation: CRA017932, CRA019968.


Articles from PeerJ are provided here courtesy of PeerJ, Inc

RESOURCES