Nature Communications. 2025 Nov 24;16:10339. doi: 10.1038/s41467-025-65299-6

Multi-omics analyses reveal regulatory networks underpinning metabolite biosynthesis in Nicotiana tabacum

Jiaming Li 1,#, Qinggang Liao 1,#, Huina Zhou 2, Risheng Hu 3, Yangyang Li 3, Zhengrong Hu 3, Bei Yu 3, Pingping Liu 2, Qingxia Zheng 2, Wenxuan Pu 4, Song Sheng 5, Yongjun Liu 3, Shaolong Wu 3, Tianbo Liu 3, Qinzhi Xiao 3, Shuhui Duan 3, Junping Gao 4, Xiaoxu Li 4, Shuaibin Wang 4, Hanqian Xiao 3, Zhicheng Zhou 3, Zhongshan Lu 3, Jiashuo Yang 3, Jianbin Yan 1
PMCID: PMC12644487  PMID: 41285720

Abstract

Tobacco is a significant industrial crop, serving as a model for plant science and a promising species for the production of proteins and small molecules. However, systems biology studies of tobacco under natural field cultivation conditions remain scarce. Here, we construct a genome-scale metabolic regulatory network through integration of dynamic transcriptomic and metabolomic profiles from field-grown tobacco leaves across two ecologically distinct regions. We map 25,984 genes and 633 metabolites into 3.17 million regulatory pairs using multi-algorithm integration. This network reveals three pivotal transcriptional hubs: NtMYB28 (promoting hydroxycinnamic acid synthesis by modifying Nt4CL2 and NtPAL2 expression), NtERF167 (amplifying lipid synthesis via NtLACS2 activation) and NtCYC (driving aroma production through NtLOX2 induction). These transcriptional hubs achieve substantial yield improvements of target metabolites by rewiring metabolic flux. The present work provides a systems-level atlas of tobacco metabolic regulation and may help to guide metabolic engineering.

Subject terms: Secondary metabolism, Gene regulation, Metabolomics, Agricultural genetics


Tobacco is not only an important industrial crop but also serves as a model for plant science research and a chassis for plant metabolic engineering. Here, the authors report a genome-scale metabolic regulatory network and reveal key transcriptional hubs for the biosynthesis of different metabolites.

Introduction

Plant metabolomes constitute dynamic biochemical blueprints that orchestrate environmental adaptation through intricate networks of primary and secondary metabolites which function as molecular sensors and effectors1,2. Primary metabolites, including sugars, amino acids, lipids, nucleotides, and vitamins, serve as essential structural or energy materials required for plant growth, development, and reproduction3. Secondary metabolites, such as antibiotics, toxins, hormones, and pigments, enhance plant adaptation to specific environmental conditions4. These metabolites operate as structural sentinels (such as membrane lipids maintaining fluidity under cold stress), reactive oxygen mitigators (such as flavonoids and hydroxycinnamate conjugates scavenging ultraviolet radiation (UV)-induced reactive oxygen species (ROS)), and signaling molecules (including phytohormones such as abscisic acid (ABA) and strigolactones, along with specialized metabolites like dihydroactinidiolide5 (DHA)). In several dimensions of genetic research, metabolism stands out as a direct factor influencing phenotypes6. In addition to their roles in responding to various stresses or serving as signaling molecules, plant metabolites have diverse industrial applications. Nicotiana tabacum, an important crop in the tobacco industry, is used as a raw material for cigarette manufacturing and as a source of nicotine. It can also be genetically engineered to produce various high value-added endogenous metabolites7, such as chlorogenic acid (CGA)8, triacylglycerol (TAG)9,10 and flavorings (such as DHA)11,12, highlighting the great potential of this chassis in sustainable biomanufacturing and the bioeconomy13. Additionally, owing to its rich metabolite diversity and substantial biomass, tobacco serves as an ideal model species for plant metabolic regulation research.

The phenylpropanoid pathway stands as one of the crucial secondary metabolic routes. Its products, including flavonoids, lignin, and hydroxycinnamic acids, play vital roles in plant growth, development, and stress resistance14. CGA, one of the most important metabolites among hydroxycinnamic acids, displays diverse pharmacological properties such as anti-inflammatory, antioxidant, antiviral, hypoglycemic, and hypolipidemic activities15. Therefore, it finds extensive applications in the food, pharmaceutical, and nutraceutical industries15. While the phenylpropanoid biosynthesis pathway has been extensively studied in rice16 and Arabidopsis thaliana17, a comprehensive metabolic profile of phenylpropanoid metabolism in tobacco remains elusive.

Lipid metabolism is one of the most crucial primary metabolic pathways in plants, where diverse lipids are indispensable for structural maintenance and stress resistance18. Moreover, plant-derived oils serve as an ideal source for biodiesel production19. In tobacco, oil is synthesized primarily in the green photosynthetic tissues, where leaf oil is deposited in the form of oil bodies20. Although the oil content in green biomass is considerably lower than that found in oil-crop seeds, there is potential to enhance lipid content in tobacco leaves by identifying and utilizing regulators of lipid synthesis, such as transcription factors (TFs) capable of controlling the expression of multiple genes10.

In addition to biosynthetic pathways, several crucial metabolites in plants are derived from carotenoid degradation, including ABA, strigolactones, and the majority of volatile compounds responsible for aroma11. Aroma is a sophisticated combination of aromatic volatile organic compounds present in plants. This diverse blend serves various purposes in the food, flavors, fragrances, cosmetics, and pharmaceutical industries21. Furthermore, it acts as a repellent for herbivores and plays a role in attracting natural predators22. In tobacco, the main source of aroma is the breakdown of carotenoids, leading to the production of terpenoids or terpenoid-like metabolites23. Lipoxygenase (LOX) catalyzes the oxidation of unsaturated fatty acids, such as linoleic acid and linolenic acid, producing free radicals. This oxidative process results in the breakdown of carotenoids and the generation of volatile compounds derived from carotenoids, such as ionone, damascenone, and DHA11. DHA is a volatile compound with a sweet tea-like aroma, serving as a fragrance and flavoring agent in food or cosmetics. Additionally, it functions as a crucial stress signaling molecule within plant organisms5. Ionone is also a significant aroma compound in tobacco, imparting notes of black tea, woody, or fruity fragrances, which is utilized in the manufacturing of fragrances, flavorings, beverages, food products, and cosmetics. Additionally, ionone can induce resistance in plants against certain insects24. Damascenone is known for its calming, cholagogic, cell regeneration-promoting, and vascular wall-strengthening effects, primarily extracted from roses, which is utilized in the production of fragrances, essential oils, and more25. Damascenones and ionones belong to the group of rose ketones, structurally similar chemical compounds. These substances hold significant value in perfumery and pharmaceuticals, but the high costs associated with their extraction limit the scale of production. 
Therefore, there is a need to develop large-scale and cost-effective raw material sources26.

Although tobacco offers a rich source of substrates for diverse synthetic targets, its metabolic regulatory mechanisms are highly complex1. Metabolic pathways compete and may even inhibit each other27,28. Therefore, employing systems biology approaches to investigate the dynamic accumulation patterns of metabolites and their genetic mechanisms is imperative for elucidating metabolic regulatory mechanisms. Systems biology research based on multi-omics analysis enables multi-tiered dissection of genetic and developmental processes to understand complex traits, thereby facilitating efficient identification of key metabolic pathway regulatory genes, including structural enzymes and transcription factors29,30. Genes and metabolites participating in the same biosynthetic pathway, along with structural genes and their regulatory counterparts, frequently display analogous expression patterns31. Based on this principle, previous studies have constructed gene-metabolite regulatory networks in multiple species, such as human32, Arabidopsis33, maize34,35, rice16, barley36,37, Tibetan hulless barley38, tomato39–41, cotton42, and kiwifruit43,44, and identified putative regulatory factors governing metabolite biosynthesis. For instance, profiling variomes, transcriptomes, and metabolomes across hundreds of tomato genotypes enabled the construction of a metabolic-genetic regulatory network that identified SlMYB12 as a key gene regulating fruit coloration through metabolic rewiring39. Integrative analysis of 20 developmental-stage samples in MicroTom tomato established a metabolic regulatory network, revealing the transcription factor SlMYB75 governing flavonoid biosynthesis and the regulator SlbHLH114 controlling steroidal alkaloid production40.
Similarly, a rice developmental metabolic network uncovered key regulators of lignin biosynthesis and phospholipid metabolism16, while multi-omics integration across six developmental stages in two barley cultivars deciphered the metabolic regulatory framework of grain development, providing resources for nutritional quality improvement37. Beyond identifying metabolic pathway regulators, such networks demonstrate versatile functionality: integration of multi-omics data from three rice tissues exposed a jasmonate-mediated anti-herbivore network with pivotal NAM/ATAF/CUC (NAC) 1/3/4 resistance genes45, and analysis of a diploid F₂ potato population derived from homozygous inbred lines identified a pectin methylesterase gene exhibiting dominant heterosis effects46. While foundational frameworks exist for some model plants, economically critical crops like tobacco remain underexplored in field-relevant contexts, particularly regarding spatiotemporal gene-metabolite regulatory mechanisms during development. This knowledge gap impedes metabolic engineering applications and molecular design breeding. In prior research, we performed co-expression analysis of transcriptomic and metabolomic data derived from three cultivars across three field sites at six developmental stages, providing insights into the molecular mechanisms underlying complex genetics in tobacco47.

In this work, we determine the dynamic metabolome and transcriptome data after topping under open field conditions in two typical tobacco-growing belts of China, including high-altitude mountainous areas (HM) and low-altitude flat areas (LF) to accurately understand the metabolic regulatory patterns in tobacco leaves under natural environments. To construct a comprehensive regulatory landscape for tobacco metabolism, we combine metabolome data with transcriptome data on the basis of the strong correlation and similar expression patterns observed between metabolite accumulation and gene expression. Under the guidance of metabolic regulation landscape, we identify some key genes that can regulate the production of hydroxycinnamic acids, lipids and aroma compounds, and engineer tobacco plants capable of efficiently producing hydroxycinnamic acid, lipid and aromatic compounds. This study establishes a high-quality tobacco metabolic regulation atlas, providing critical multi-omics resources for identifying key regulatory genes governing developmental processes and metabolic pathways, alongside analyzing underlying mechanisms of metabolite synthesis in tobacco.

Results

Environment affects tobacco leaf development, gene expression and metabolite accumulation patterns

K326 is a model tobacco variety that is widely cultivated globally, so analyzing its metabolic regulatory network is broadly representative. In addition, given that the tobacco will be used for industrial-scale production in field environments, we conducted field planting experiments in two typical tobacco cultivation regions with distinct environments, HM and LF (Supplementary Fig. 1a). To understand the effect of different environmental factors on tobacco development, we analyzed climate data from both regions during the tobacco sampling period. LF and HM showed no significant differences in net solar radiation, relative humidity, and precipitation during the sample collection period. The main climatic differences between the two sites were in temperature, including average temperature, temperature range, and maximum temperature (Fig. 1a–f, Supplementary Fig. 1b, c, Supplementary Data 1). The temperature range, average temperature and maximum temperature differed highly significantly between LF and HM, whereas the minimum temperature did not differ (Fig. 1d–f, Supplementary Fig. 1c, d, and Supplementary Data 1). A previous study has shown that temperature is a critical factor affecting the growth and metabolic homeostasis of tobacco48. Therefore, the perturbations in gene expression and metabolite accumulation caused by temperature variations can enhance the accuracy and stability of subsequent metabolic regulatory networks.

Fig. 1. Schematic representation of the experimental design and summary of metabolome and transcriptome profiling.

a–f Comparison of the net solar radiation (a), relative humidity (b), rainfall (c), temperature difference (d), average temperature (e) and maximum air temperature (f) between Guiyang (LF) and Longshan (HM) from 20 to 40 days after topping (DAT) (n = 21). *P < 0.05, **P < 0.01, ***P < 0.001. NS, not significant. g Schematic diagram of tobacco sampling. The 8th and 9th leaves from the bottom to the top after topping were selected for analysis. h Tobacco middle leaves from 20 to 40 DAT in LF and HM. i The Soil Plant Analysis Development chlorophyll index (SPAD)/Specific Leaf Weight (SLW) of tobacco middle leaves from 20 to 40 DAT in LF and HM (n = 3 biological replicates). Error bars represent the means ± standard errors of three replicates. j, k PCA results for the metabolome (j) and transcriptome (k) data from 22 tobacco samples. l, m KEGG enrichment for time-course differentially expressed genes (l) and metabolites (m). The P values were calculated using hypergeometric distribution (two-sided) and then adjusted using the FDR. In (a–f), boxes represent the median values and the first and third quartiles; whiskers represent the minimum and maximum values. A two-tailed Student’s t test was used to determine P values. Source data are provided as a Source Data file.

To create a standardized system for tobacco leaf development, we adopted a topping strategy (Fig. 1g). Topping was strictly based on the same developmental stage of the plants in LF and HM. Thus, although ecological variations between sites induced differences in growth rates, flowering time, and maturity period (Supplementary Data 1), synchronized topping at the same developmental stage maximally neutralized pre-existing developmental disparities49,50. On the day of flowering, we manually removed the inflorescence at the top of the main stem along with some new leaves, ensuring each tobacco plant bore 18 healthy leaves51. Topping removes apical dominance, halts reproductive growth, and redirects more nutrients towards the synthesis of secondary metabolites, thus standardizing the nutrient system of the tobacco plant52. After topping, the upper leaves are still growing, while the lower leaves are prone to pests and diseases, receive less light, and have entered the senescence process. Therefore, we selected the 8th and 9th leaves in the middle position for sampling (Fig. 1g). Moreover, we began sampling on the 20th day after topping to avoid the period immediately following topping when hormone levels within the tobacco plant are disrupted, and levels of various metabolites such as phenolics, alkaloids, pigments, and terpenes fluctuate significantly, which may interfere with our research53.

To understand metabolic dynamics and the relevant underlying mechanisms, we collected tobacco leaves from 20 to 40 days after topping (DAT) in LF and HM, and performed transcriptomic and metabolomic analysis. The leaf color differed between HM and LF during the maturation process (Fig. 1h). Therefore, we measured the chlorophyll content in the leaves. The results indicated that the SPAD (Soil Plant Analysis Development) index normalized by SLW (Specific Leaf Weight) was significantly higher in LF leaves than in HM leaves, suggesting that the senescence of leaves in LF was slower than that in HM (Fig. 1i). For metabolomic analysis, samples were analyzed by multiple methods, including an ultrahigh-performance liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based nontargeted method combined with a targeted metabolic profiling method, a gas chromatography-mass spectrometry (GC-MS)-based nontargeted method and a derivative GC-MS nontargeted method. In total, we identified and quantified 633 metabolites among all the samples, which were classified into 14 major categories based on their metabolic characteristics (Supplementary Data 2). Hierarchical clustering analysis revealed that the location specificity was greater than time-course specificity (Supplementary Fig. 2). Principal component analysis (PCA) also showed that the samples could be divided into two groups by location based on their metabolite profiles (Fig. 1j). Furthermore, in both LF and HM, the period from 20 to 40 DAT can be divided into two clusters based on time-course, with the cluster transition point at 28 DAT for LF and 30 DAT for HM (Supplementary Fig. 2).

Flavonoids and lipids can be divided into different subclasses according to their structure or type of modification; therefore, we analyzed these metabolites in detail. The contents of most flavonoids, particularly the subclasses of flavonoid-3-O-glycosides, flavonoid-O-glycosides, flavonoid-C-glycosides, O-methylated flavonoids and flavanols, increased in samples from HM (Supplementary Fig. 3). Interestingly, at 28 DAT, most flavonoid contents in HM decreased. Temperature data revealed a sharp decline in the lowest temperature at 28 DAT compared to 27 DAT (Supplementary Fig. 1c, and Supplementary Data 1), suggesting that flavonoids were heavily consumed to counteract reactive oxygen species generated by environmental stress during this abrupt temperature change54. By 30 DAT, temperatures had returned to average levels, and most flavonoid contents increased, suggesting that stress-induced flavonoid synthesis genes had initiated excessive accumulation of flavonoid compounds55 (Supplementary Fig. 3).

The lipid accumulation patterns differed significantly between LF and HM and were categorized into early and late stages based on the accumulation in different phases (Supplementary Fig. 4). A majority of the lipids, such as TAG and free fatty acids (FFAs), were present at relatively high levels in the tobacco leaves of HM, which may be associated with the relatively low temperatures and UV-B stress in HM (Supplementary Fig. 4). The contents of specific lipid subclasses in LF tobacco leaves—particularly diacylglycerols and phosphatidylethanolamines—underwent significant alterations around 28-30 and 34 DAT. Meteorological data indicate substantial rainfall and temperature declines during these periods (Supplementary Fig. 1b, c; and Supplementary Data 1). To counteract UV-B and low-temperature stress, tobacco requires relatively high accumulation of lipids to maintain cell membrane fluidity.

To identify the genetic basis of tobacco leaf development, the tobacco leaves harvested in the same batch as the metabolomic samples were collected in triplicate for RNA sequencing (RNA-seq). The replicates of RNA-seq showed highly positive correlations (Supplementary Fig. 5). In total, 34,839 protein-coding genes were found to be expressed (average Fragments Per Kilobase Million (FPKM) ≥ 1, median absolute deviation ≥0.01) in 22 samples (Supplementary Data 3). Hierarchical clustering analysis and PCA showed that the location specificity was greater than time-course specificity, suggesting that site plays a major role in gene expression (Fig. 1k, and Supplementary Fig. 6). The samples from the two sites could also be clustered separately at different time points (Supplementary Fig. 6). Overall, both the metabolome and transcriptome showed significant developmental specificity during leaf development between LF and HM. These results implied that environment has a strong impact on metabolite accumulation and gene expression, while the accumulation patterns of metabolites gradually change over time during the maturation process.
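The expression filter described above (average FPKM ≥ 1 and median absolute deviation ≥ 0.01) can be illustrated with a minimal sketch. It assumes that both statistics are computed per gene across all samples, which the text does not state explicitly; the function names and the toy FPKM values are hypothetical and are not taken from the study.

```python
from statistics import mean, median

def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])

def is_expressed(fpkm, min_mean=1.0, min_mad=0.01):
    """Assumed rule: mean FPKM >= min_mean and MAD >= min_mad across samples."""
    return mean(fpkm) >= min_mean and mad(fpkm) >= min_mad

def filter_expressed(table, min_mean=1.0, min_mad=0.01):
    """Keep only the genes (dict keys) whose FPKM profile passes both thresholds."""
    return {gene: profile for gene, profile in table.items()
            if is_expressed(profile, min_mean, min_mad)}

# Toy FPKM profiles (hypothetical values, one number per sample)
toy = {
    "geneA": [5.0, 7.5, 6.0, 9.0],  # expressed and variable: kept
    "geneB": [0.1, 0.2, 0.0, 0.3],  # mean FPKM below 1: dropped
    "geneC": [3.0, 3.0, 3.0, 3.0],  # constant profile, MAD = 0: dropped
}
kept = filter_expressed(toy)
```

The MAD threshold removes genes whose expression does not vary across samples, since such genes cannot carry information in later correlation analyses.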

Metabolite difference is linked with extensive temporal transcriptional changes

To understand how metabolite abundance and gene expression change over time, we conducted a temporal analysis of differential gene expression and metabolite production in LF and HM. A total of 14,595 differentially expressed genes (DEGs) and 308 differentially abundant metabolites were recognized in LF, while 6,943 DEGs and 196 differentially abundant metabolites were recognized in HM. There were 3,476 common temporal DEGs and 134 common temporal differentially abundant metabolites between LF and HM (Supplementary Fig. 7a, b). Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses highlighted significant differences in the functions of temporal DEGs between LF and HM (Fig. 1l, and Supplementary Fig. 7c). For instance, GO enrichment analysis revealed a more diverse biological functionality for these genes in HM (Supplementary Fig. 7c). KEGG analysis indicated that the functions of temporally specific genes in HM were enriched in processes such as photosynthesis, carotenoid biosynthesis, and FFA degradation (Fig. 1l). KEGG enrichment analysis of the temporally specific differentially abundant metabolites also suggested noteworthy distinctions in the biological functions of metabolites between LF and HM (Fig. 1m).
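The enrichment statistics used for these KEGG and GO analyses are described in the figure legends as hypergeometric tests followed by FDR adjustment. The sketch below illustrates the general idea with a one-sided over-representation test and the Benjamini-Hochberg procedure. Both choices are assumptions made for illustration (the legend mentions a two-sided test and does not name the FDR method), and the function names and example counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k): chance of drawing at least k annotated genes when N genes
    are sampled from a background of M genes, n of which are annotated."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 50 DEGs, background of 1000 genes, three pathways
# given as (annotated genes in background, annotated genes among DEGs).
pathways = {"pathway_1": (40, 8), "pathway_2": (100, 6), "pathway_3": (20, 1)}
raw = [hypergeom_sf(k, 1000, n, 50) for n, k in pathways.values()]
fdr = dict(zip(pathways, benjamini_hochberg(raw)))
```

A pathway would then be called enriched when its adjusted value falls below a chosen cutoff such as 0.05.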

The dynamic expression patterns of genes reflect their role in leaf development. Therefore, temporal clustering analysis was conducted by Mfuzz using the fuzzy c-means algorithm, revealing that temporally specific genes and metabolites in LF and HM could be categorized into 9 clusters (Supplementary Fig. 8a–d). In LF, temporally specific DEGs in Cluster V were enriched in photosynthesis and chlorophyll metabolism, while those in Cluster IX exhibited enrichment in lipid and starch metabolism. In HM, Cluster IX displayed enrichment in lipid metabolism (Supplementary Fig. 9a, b). KEGG enrichment analysis of the metabolites in LF indicated that Cluster VI was primarily enriched in amino acid metabolism, whereas in HM, Cluster VIII was enriched in the synthesis of plant hormones and phenylpropanoids (Supplementary Fig. 10a, b). These results suggested dynamic changes in genes and metabolites at the temporal level. However, it is difficult to establish direct parallelism between genes and metabolites. Therefore, we aimed to construct a comprehensive metabolic regulatory network in the next step.

Multiple cluster analysis provides a clearer depiction of complex gene-metabolite regulatory landscape

Regulatory genes and their targets often maintain similar expression patterns across large time scales and different tissues28. Based on these principles, we merged multiple clustering methods, including Weighted Gene Correlation Network Analysis (WGCNA), Pearson correlation coefficient (PCC)-based analysis of coregulation16 and k-means56 to assess the associations between genes and metabolites, and employed GINI coefficients complemented by machine learning approaches to capture non-linear relationships (Fig. 2a). Initially, we excluded genes that showed no correlation with any metabolite (absolute correlation coefficient |r| < 0.5) (Supplementary Fig. 11a, b). In total, 25,984 genes were retained (Supplementary Data 4). Subsequently, the genes and metabolites were separated into 9 k-means clusters (Fig. 2b, Supplementary Data 5). To calibrate the gene-metabolite regulatory relationships within the clusters, we employed a coregulation method based directly on the PCCs (Fig. 2b, Supplementary Data 6). By analyzing the mapping relationship between the PCC and k-means clustering results, we found that different clustering methods exhibited biases in aggregating regulatory pairs, including some known metabolism-associated genes. By merging multiple clustering methods, we enhanced the accuracy of the two-dimensional regulatory relationships and derived the final clustering results, encompassing 630 metabolites and 21,527 genes (Fig. 2c, and Supplementary Data 7). These clusters displayed uniform and distinct expression patterns across tobacco leaf development. For instance, the expression of genes and metabolites in cluster 2 continuously increased, while those in cluster 3 continuously decreased, suggesting that the abundance patterns of tobacco genes are closely aligned with the major metabolic pathways (Fig. 2c).
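The initial filtering step, which removes genes whose absolute Pearson correlation with every metabolite is below 0.5, can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: profiles are plain lists of equal length, constant profiles are treated as uncorrelated, and all names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return sxy / sqrt(sxx * syy)

def genes_linked_to_metabolites(genes, metabolites, threshold=0.5):
    """Keep genes with |r| >= threshold for at least one metabolite.

    Returns {gene: (best_metabolite, r)} for the retained genes.
    """
    kept = {}
    for gene, expr in genes.items():
        scores = {m: pearson(expr, prof) for m, prof in metabolites.items()}
        best = max(scores, key=lambda m: abs(scores[m]))
        if abs(scores[best]) >= threshold:
            kept[gene] = (best, scores[best])
    return kept

# Toy time-course profiles (hypothetical values)
metabolites = {"met1": [1, 2, 3, 4, 5]}
genes = {
    "g_up": [2, 4, 6, 8, 10],    # r = 1 with met1: kept
    "g_down": [5, 4, 3, 2, 1],   # r = -1 with met1: kept
    "g_flat": [1, -1, 0, -1, 1], # r = 0 with met1: dropped
}
kept = genes_linked_to_metabolites(genes, metabolites)
```

Using the absolute value keeps negatively correlated genes, which matters because repressors are expected to anticorrelate with the metabolites they control.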

Fig. 2. Multiple cluster analysis of metabolites and genes.

a Workflow of the gene-metabolite cluster analysis. b Mapping relationship between the clusterings obtained with the two methods. Pink lines represent some known metabolism-associated genes. c Characterization of the final clustering. The word cloud shows the TF family enrichment analysis. Font size represents the -log (FDR). Blue lines in each cluster display the average metabolite content or gene expression level. d Pearson correlation coefficients (PCCs) and GINI coefficients between same-cluster (purple) and random (green) gene-gene or gene-metabolite pairs. e Significant overlap of regulatory pairs identified through distinct methodologies indicates the robustness of regulatory relationships. f Ternary plot showing the deviation in the distribution of correlations between metabolites (ME) and structural genes (SG), and between transcription factors (TF) and MEs. g PCCs and GINI coefficients between metabolites are significantly higher than those between metabolites and SG or TF. Blue lines represent the median values. Different lowercase letters indicate significant differences (PCCs: a vs b P = 2.18e-16, b vs c P = 3.87e-17, a vs c P = 5.87e-22; GINI: a vs b P = 3.52e-17, b vs c P = 4.77e-18, a vs c P = 6.3e-25). A two-tailed Games-Howell test was used to determine P values. h PCCs and GINI coefficients between the expression values of the known metabolism-associated TFs and those of their targets in the same cluster (n = 4 biological replicates). Error bars represent the means ± standard errors of four biological replicates. i PCC correlation between dihydroactinidiolide (DHA) and expression level of SGs. j PCC correlation between NtLOX2 and expression level of TFs. k PCC correlation between chlorogenic acid (CGA) and expression level of SGs. l PCC correlation between Nt4CL2A, NtPAL2 and expression level of TFs. m GINI correlation between DHA and expression level of SGs. n GINI correlation between NtLOX2 and expression level of TFs.
o GINI correlation between CGA and expression level of SGs. p GINI correlation between Nt4CL2A, NtPAL2 and expression level of TFs. In (c) and (e), the P values were calculated using hypergeometric distribution (two-sided) and then adjusted using the FDR. Source data are provided as a Source Data file.

To specifically identify regulatory genes, we collected information on all tobacco TFs. In total, 2,390 TFs were detected at the transcriptome level as regulators. A total of 22,569 structural genes (SGs) were identified as potential targets. Through enrichment of TF families in each cluster, it was found that every cluster displayed significant enrichment of specific TF categories, suggesting their potential pivotal roles in the regulation of the corresponding SGs and metabolites (MEs) in the same cluster. Multiple cluster analysis provides a clearer depiction of complex genome-wide regulatory relationships, laying the groundwork for establishing the genome-wide metabolic regulatory network (GMRN) (Fig. 2c, and Supplementary Fig. 12, Supplementary Data 7).

To capture non-linear regulatory relationships and enhance the comprehensiveness of the global regulatory network, we computed GINI coefficients between each SG, TF, and ME, and implemented a simple random forest approach to rank regulatory importance, thereby facilitating the screening of candidate regulatory genes or metabolites. A total of 2,332,526 high-confidence GINI regulatory pairs were identified across 24,959 genes and 633 metabolites, with importance rankings computed for every SG, TF, and ME. Next, we characterized the clustering and regulation of these genes. Individuals within the same cluster exhibited stronger correlations or higher GINI coefficients than did those within random pairings (Fig. 2d). The overlap between PCC and GINI regulatory pairs was significantly greater than that in the random control, supporting the accuracy and robustness of regulatory relationship mining (Fig. 2e). The regulatory relationships of ME-ME, ME-SG, and ME-TF pairs were unbalanced across all clusters (Fig. 2f). The distribution of the PCC and GINI coefficients of the ME-ME, ME-SG, and ME-TF pairs also confirmed this result statistically (Fig. 2g). To validate the accuracy of our data, we collected from the literature known TFs that regulate different metabolic pathways, including lipid, sugar, phenylpropanoid, and terpene pathways. The results reveal that these TFs and their targets are clustered together, and their PCC and GINI coefficients are greater than the average value, indicating the accuracy of our gene-metabolite correlation clustering (Fig. 2h). Due to the imbalance in regulatory relationships among ME-ME, ME-SG, and ME-TF, we adopted a stepwise analysis approach to cascade through ME-SG-TF to identify regulators of CGA and DHA. Compared with other SGs, NtLOX2 has the greatest correlation with DHA (Fig. 2i), while Nt4CL2A (encoding 4-coumarate:CoA ligase) and NtPAL2 (encoding phenylalanine ammonia lyase) have the greatest correlation with CGA (Fig. 2k). Similarly, compared to other TFs, the TEOSINTE BRANCHED 1/CYCLOIDEA/PCF (TCP) transcription factor NtCYC (CYCLOIDEA) has the strongest correlation with NtLOX2 (Fig. 2j), and the MYB transcription factor NtMYB28 exhibited the strongest correlation with Nt4CL2A and NtPAL2 (Fig. 2l). We subsequently examined the GINI coefficients and importance rankings for NtCYC with DHA and NtMYB28 with CGA. Consistent with PCC results, the GINI coefficients and importance rankings of NtLOX2-DHA, NtCYC-NtLOX2, Nt4CL2A/NtPAL2-CGA, and NtMYB28-Nt4CL2A/NtPAL2 were significantly higher than those of other regulatory relationships involving SGs or TFs (Fig. 2m–p, Supplementary Data 8). These results suggest that NtCYC and NtMYB28 may be key regulators of DHA and CGA metabolism, respectively.
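The stepwise ME-SG-TF cascade can be pictured as two successive ranking steps: first find the structural gene most strongly correlated with the metabolite of interest, then find the transcription factor most strongly correlated with that structural gene. The sketch below uses only Pearson correlation for both steps and omits the GINI coefficients and random forest rankings that the study also applied. The profile names and values are hypothetical placeholders, not data from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def top_partner(target, candidates):
    """Return (name, r) of the candidate profile with the largest |r| to target."""
    scores = {name: pearson(target, prof) for name, prof in candidates.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]

def cascade(metabolite, sg_profiles, tf_profiles):
    """Step 1: metabolite -> best structural gene. Step 2: that gene -> best TF."""
    sg, r_me_sg = top_partner(metabolite, sg_profiles)
    tf, r_sg_tf = top_partner(sg_profiles[sg], tf_profiles)
    return {"SG": sg, "r_ME_SG": r_me_sg, "TF": tf, "r_SG_TF": r_sg_tf}

# Hypothetical time-course profiles
metabolite = [1, 2, 3, 4, 6]
sg_profiles = {"SG_a": [2, 4, 6, 8, 12], "SG_b": [5, 1, 4, 2, 3]}
tf_profiles = {"TF_x": [1, 2, 3, 4, 6], "TF_y": [3, 1, 2, 5, 0]}
result = cascade(metabolite, sg_profiles, tf_profiles)
```

Cascading through the structural gene, rather than correlating TFs with the metabolite directly, reflects the imbalance noted above: ME-TF correlations tend to be weaker than ME-SG or SG-TF correlations.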

Construction of a genome-wide metabolic regulatory network and characterization

To construct the metabolic regulatory network during tobacco leaf development, we expanded the two-dimensional regulatory relationships, including PCC and GINI regulatory pairs, obtained from multiple clustering and repeated the analysis five times (using the three separate transcriptomic or metabolomic replicates, the mean expression levels and PCC-GINI merged regulatory pairs). Subsequently, we merged the results to obtain the final network, which comprised 3,175,516 high-confidence regulatory pairs (Fig. 3a). To characterize the GMRN in tobacco precisely, we used NetworkX to quantify its topological architecture with network parameters such as transitivity, average eigenvector centrality, and average path length. The tobacco GMRN showed high transitivity (0.101) and average eigenvector centrality (0.688), with an average path length of 2.638. We assigned 2,802 genes exhibiting only GINI coefficient-based regulatory relationships to a provisional Cluster 0, facilitating analysis of their interactions with other clusters. The mapping relationship between different clusters in the GMRN revealed that intra-cluster regulatory pairs were dominant, but cross-cluster regulatory relationships still existed, particularly in Cluster 0 (Fig. 3a). The distributions of the hub genes and metabolites across the five networks were consistent (Fig. 3b, c, and Supplementary Data 9). Enrichment analyses of the hub genes indicated their involvement in various metabolic pathways, possibly representing essential metabolic processes in organism development (Fig. 3d, e). Additionally, the top three TF families among the hub genes were the Ethylene Response Factor (ERF), WRKY, and MYB families, which are well known for their active involvement in metabolic regulation (Fig. 3f).
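Two of the topology parameters mentioned above, transitivity and average path length, have simple definitions that can be written out directly. The study computed them with NetworkX; the sketch below instead re-implements the standard definitions in plain Python on a toy undirected graph, purely for illustration. It assumes a connected, unweighted graph without self-loops, leaves out eigenvector centrality, and uses hypothetical node names.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def transitivity(adj):
    """3 * triangles / connected triples (fraction of closed triples)."""
    closed = 0   # each triangle is counted three times, once per corner
    triples = 0  # paths of length two, counted by their center node
    for nbrs in adj.values():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy regulatory graph: a triangle TF1-SG1-ME1 with one extra edge ME1-SG2
toy = build_graph([("TF1", "SG1"), ("SG1", "ME1"), ("TF1", "ME1"), ("ME1", "SG2")])
t = transitivity(toy)         # one triangle, five connected triples
l = average_path_length(toy)  # six node pairs with distances 1, 1, 2, 1, 2, 1
```

Transitivity measures how often two partners of the same node are themselves linked, and the average path length measures how many steps separate a typical pair of nodes.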

Fig. 3. GMRN provides insights into the regulation of metabolites by transcription factors.

Fig. 3

a Genome-wide landscape of GMRN across 9 co-response clusters and 1 GINI-specific cluster (C0) (left panel) and mapping relationship of regulatory pairs between different clusters in GMRN (right panel). b, c Number of overlapping hub genes (b) and metabolites (c) between the GMRNs from different data sources. The orange bars represent the shared hub genes and metabolites across the four GMRNs. d, e GO (d) and KEGG (e) enrichment of hub genes in GMRNs. The P values were calculated using hypergeometric distribution (two-sided) and then adjusted using the FDR. BP, biological process; CC, cellular component; MF, molecular function. f The top 15 TF families with the highest total number of key regulators predicted by GMRN. g A sub-network centered on ERF genes. SG, structural genes. TF, transcription factors. ME, metabolites. h GMRN recovers targets of ERF1 and DREB3 associated with different metabolites. Green arrows represent the regulatory pairs used for validation. Dashed lines represent GINI coefficient relationships, while solid lines denote PCC (Pearson correlation coefficient) relationships. i Validation of the regulation to the targets of ERF1 and DREB3 (n = 5 biological replicates). ERF1 and DREB3 enhanced the transcription of BCCP1, POD1, HSFA2 and MST6 in a dual-luciferase reporter assay, respectively. The bottom panel show schematic diagrams of the effector and reporter plasmids. EV, empty vector. j A Sub-GMRN centered on WRKY genes. k GMRN recovers targets of WRKY11, WRKY22 and WRKY42 associated with different metabolites. Green arrows represent the regulatory pairs used for validation. l Validation of the regulation to the targets of WRKY11 and WRKY42 (n = 5 biological replicates). WRKY11 and WRKY42 enhanced the transcription of FLS, GATL9, Glu1 and LCYB in a dual-luciferase reporter assay, respectively. The bottom panels show schematic diagrams of the effector and reporter plasmids. 
In (i) and (l), error bars represent the means ± standard errors of five biological replicates. ***P < 0.001. A two-tailed Student’s t test was used to determine P values. Source data are provided as a Source Data file.

Given the potential key roles of ERF and WRKY TFs in plant metabolic regulation and their high rankings in the hub gene analysis, we extracted metabolic regulatory networks centered on ERF or WRKY TFs; the topology of these sub-networks differed from that of the GMRN. Compared to the GMRN, the ERF-GMRN exhibits higher transitivity (0.637), lower average eigenvector centrality (0.0523), and shorter average path length (2.368). The ERF-GMRN comprised 2,442 nodes and 57,798 regulatory pairs (Fig. 3g, and Supplementary Data 10). The five ERF TFs with the highest degree of connectivity in the network were ERF1, ERF49, ERF167, Dehydration-Responsive Element Binding protein 3 (DREB3), and RELATED TO APETALA2.4 (RAP2.4) (Fig. 3g, and Supplementary Data 10). ERF1 and RAP2.4 were previously shown to regulate wax synthesis57,58. DREB3 is involved in ABA signaling pathways and drought resistance and can regulate phospholipid and carbohydrate metabolism16,59. ERF49 is associated with plant responses to high temperature and drought stress60, while the function of ERF167 is unknown. The GMRN recovered some known regulatory relationships of ERF1 and DREB3, including those validated by laboratory experiments. ERF1 is primarily associated with potential cascade regulation in the metabolism of various lipids, while DREB3 is associated with phospholipid and carbohydrate metabolism (Fig. 3h). We randomly selected targets identified by PCC or GINI coefficient for ERF1 and DREB3 and performed dual-luciferase assays. ERF1 activated the expression of the lipid transport protein gene Biotin Carboxyl Carrier Protein 1 (BCCP1) and the peroxidase gene Peroxidase 1 (POD1), whereas DREB3 positively regulated the monosaccharide transporter gene MONOSACCHARIDE TRANSPORTER 6 (MST6) and heat shock transcription factor A-2 (HSFA2) (Fig. 3i).

Compared to the GMRN, the WRKY-GMRN exhibits higher transitivity (0.499) and longer average path length (3.295), but lower average eigenvector centrality (0.0253). The top three TFs in terms of connectivity in the WRKY-GMRN were WRKY42, WRKY22, and WRKY11 (Fig. 3j, and Supplementary Data 11). WRKY42 regulates chlorophyll degradation and carotenoid synthesis61. WRKY11, regulated by WRKY22, promotes anthocyanin, flavonoid, and lignin synthesis, and can enhance plant resistance to pathogens62–65. Some targets of WRKY TFs were recovered by the GMRN (Fig. 3k). We randomly selected some PCC or GINI regulatory pairs for validation. The results showed that WRKY11 promotes the expression of the flavanol synthase gene Flavanol Synthase (FLS) and the galacturonosyltransferase gene Galacturonosyltransferase-Like 9 (GATL9), while WRKY42 positively regulates the expression of the lycopene beta-cyclase gene Lycopene β-Cyclase (LCYB) and the glutamine synthetase gene Glutamine 1 (Glu1) (Fig. 3l). These results not only further validated the accuracy of the GMRN, but also allowed reassembly of the regulatory networks of ERF and WRKY TFs in tobacco. Overall, the GMRN will facilitate the discovery of metabolic regulatory genes.
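
Extracting a TF-family-centered sub-network and ranking its members by connectivity, as done above for ERF and WRKY TFs, can be sketched as follows. The extraction rule used here (keep every regulatory pair with at least one endpoint in the family) is an assumption, since the text does not state the exact rule, and all node names are hypothetical placeholders.

```python
from collections import Counter

def family_subnetwork(pairs, family):
    """Keep regulatory pairs in which at least one endpoint is a family member."""
    return [(a, b) for a, b in pairs if a in family or b in family]

def rank_by_degree(pairs, family):
    """Family members ordered by number of regulatory pairs (ties broken by name)."""
    degree = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    members = [n for n in degree if n in family]
    return sorted(members, key=lambda n: (-degree[n], n))

# Hypothetical regulatory pairs: TF = transcription factor,
# SG = structural gene, ME = metabolite.
pairs = [
    ("TF1", "SG1"), ("TF1", "SG2"), ("TF1", "ME1"),
    ("TF2", "SG1"),
    ("TF3", "SG3"), ("SG3", "ME2"),
]
family = {"TF1", "TF2"}  # placeholder for one TF family

sub = family_subnetwork(pairs, family)
top = rank_by_degree(sub, family)
```

Here `sub` keeps the four pairs that touch TF1 or TF2, and `top` lists TF1 (three pairs) ahead of TF2 (one pair).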

The GMRN provides insights into the identification of key regulators involved in hydroxycinnamic acid biosynthesis

Next, we used the GMRN to identify key TFs that regulate the biosynthesis of target metabolites. The clustering analysis results revealed that NtMYB28 may regulate CGA biosynthesis by modifying the expression of Nt4CL2A and NtPAL2. Phenylalanine ammonia lyase and cinnamate 4-hydroxylase act as rate-limiting enzymes in the biosynthetic pathway of CGA66 (Supplementary Fig. 13). To understand the regulatory relationships of NtMYB28, we extracted a sub-regulatory network from the GMRN centered on target metabolites (Fig. 4a). The expression patterns of phenylpropanoid biosynthetic genes and related TFs, and the accumulation patterns of hydroxycinnamic acids were highly consistent (Fig. 4b). KEGG enrichment of genes and metabolites coregulated in the GMRN of NtMYB28 revealed that their molecular functions were primarily enriched in phenylalanine metabolism and phenylpropanoid biosynthesis (Supplementary Figs. 14, 15a, b).

Fig. 4. Transcription factor NtMYB28 regulates hydroxycinnamic acid biosynthesis and the expression of genes involved in phenylpropanoid pathway.

Fig. 4

a The sub-network extracted from the GMRN for chlorogenic acid biosynthesis. SG, structural genes. TF, transcription factors. ME, metabolites. b Expression patterns of MEs, SGs, and TFs. “4” and “-4” represent the maximum and minimum value of the Z-score standardized expression, respectively. DAT, days after topping. c Transcriptional activation assay of NtMYB28 in yeast. d Transient expression assay showing transcriptional activation of the LUC reporter gene (driven by the promoter of Nt4CL2A and NtPAL2) by NtMYB28 (n = 5 biological replicates). e Specific binding of NtMYB28 protein to promoters of Nt4CL2A and NtPAL2 by one-hybrid system using bait and either prey or negative control. f EMSA with NtMYB28 protein performed with the probes derived from the Nt4CL2A and NtPAL2 promoter. g, h RNA-seq and RT-qPCR showing that Nt4CL2A (g) and NtPAL2 (h) are positively regulated by NtMYB28 (n = 3 biological replicates). i Comparison of chlorogenic acid, neochlorogenic acid, cryptochlorogenic acid and caffeic acid mass spectrometry data of WT, NtMYB28-OE and ntmyb28-KO tobaccos. j Chlorogenic acid, neochlorogenic acid, cryptochlorogenic acid and caffeic acid contents in leaves of WT, NtMYB28-OE and ntmyb28-KO tobaccos (n = 3 biological replicates). k GMRN accurately predicts targets of NtMYB28, as demonstrated by the identification of differentially expressed genes (DEGs) in the TF mutant and the overexpression lines. l Heatmap analysis of the genes related to phenylpropanoid pathway. “2” and “-2” represent the maximum and minimum value of the Z-score standardized expression, respectively. m KEGG pathway enriched by DEGs between WT, NtMYB28-OE and ntmyb28-KO tobaccos. In (c), (e), and (f), the images were representative of three independent repeats with similar results. In (d), (g), (h), and (j), error bars represent the means ± standard errors of biological replicate. A two-tailed Student’s t test was used to determine P values. 
Different lowercase letters indicate significant differences (P < 0.001). In (k) and (m), the P values were calculated using hypergeometric distribution (two-sided) and then adjusted using the FDR. Source data are provided as a Source Data file.

Transcriptional activation activity analysis and subcellular localization revealed that NtMYB28 functions as a TF (Fig. 4c, Supplementary Fig. 16). To further investigate whether NtMYB28 directly induces the expression of phenylpropanoid biosynthetic genes, we cloned the promoters of Nt4CL2A and NtPAL2 (Supplementary Fig. 17a). The promoter regions (1,000-bp region upstream of ATG) of Nt4CL2A and NtPAL2 contain eleven MYB binding motifs, including AC I-III, Secondary Metabolism-Responsive Element (SMRE)-I and II (Supplementary Fig. 17b). Together, the LUC activity assay, yeast one-hybrid (Y1H) assay and electrophoretic mobility shift assay (EMSA) revealed that the NtMYB28 protein could directly bind to the promoters of Nt4CL2A and NtPAL2 via the AC-I motif (Fig. 4d–f).

We generated two independent NtMYB28 overexpression lines (NtMYB28-overexpressed (OE)#8 and 13) and two independent ntmyb28 mutant lines (ntmyb28-knockout (KO)#11 and 14) (Supplementary Fig. 18a–c). The mRNA levels of Nt4CL2A and NtPAL2 were significantly greater in the OE lines, whereas they were downregulated in the ntmyb28-KO lines (Fig. 4g, h, and Supplementary Fig. 19a). Hydroxycinnamic acid levels in the two NtMYB28-OE lines increased 1.2-2.0-fold compared to those of the wild-type (WT) tobacco, whereas they were significantly reduced (1.5-2.3-fold) in the ntmyb28-KO lines (Fig. 4i, j). PCA revealed that the transcriptomes of the two OE lines and of the two KO lines were each quite distinct from that of the WT (Supplementary Fig. 19b). Only genes with a fold change (FC) ≥ 2 and False Discovery Rate (FDR) ≤ 0.01 for each line were considered DEGs (Supplementary Fig. 19c–e, and Supplementary Data 12). The overlap among these DEGs and the predicted NtMYB28 targets in the GMRN was significantly greater than that in the random control (Fig. 4k), which indicates that the GMRN accurately predicted the targets of NtMYB28. The expression levels of SGs used to construct the sub-regulatory network of the phenylpropanoid biosynthesis pathway in the GMRN differed among the WT, OE and KO lines (Fig. 4l). Additionally, KEGG enrichment of DEGs between the WT and OE lines revealed that they were mainly enriched in phenylalanine metabolism (Fig. 4m). These results strongly demonstrate that NtMYB28 is a regulatory factor in hydroxycinnamic acid biosynthesis and validate the accuracy of the GMRN. Based on the significant temperature differences between LF and HM, and considering the established roles of MYB family members, we conducted cold stress experiments to examine the function of NtMYB28 in response to abiotic stress67. 
Under cold stress, NtMYB28-OE lines exhibited superior growth compared to the wild type (Supplementary Fig. 20a). Relative to WT plants, NtMYB28-OE lines displayed higher chlorophyll content, and increased antioxidant capacity (Supplementary Fig. 20b–e). In contrast, ntmyb28-KO lines showed severe growth inhibition under cold stress with significantly compromised antioxidant activity (Supplementary Fig. 20a–e). These results indicated that NtMYB28 can also serve as a candidate stress-tolerance gene for molecular breeding in tobacco.
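
The DEG threshold and the overlap test applied to NtMYB28 above can be sketched as follows. Two assumptions are made for illustration: the FC ≥ 2 cutoff is applied symmetrically to up- and down-regulated genes, and the overlap is tested with a one-sided upper-tail hypergeometric probability, which is simpler than the two-sided, FDR-adjusted test named in the figure legends. The function names and all numbers are hypothetical.

```python
from math import comb

def is_deg(fold_change, fdr, fc_cutoff=2.0, fdr_cutoff=0.01):
    """True if a gene passes both thresholds.

    fold_change is a positive expression ratio (line / WT).
    Assumption: the FC cutoff applies in both directions.
    """
    changed = fold_change >= fc_cutoff or fold_change <= 1.0 / fc_cutoff
    return changed and fdr <= fdr_cutoff

def hypergeom_upper_tail(population, targets, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `population` genes, of which `targets` are predicted targets."""
    upper = min(targets, drawn)
    hits = sum(
        comb(targets, i) * comb(population - targets, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(population, drawn)

# Hypothetical example: 20 genes in total, 5 predicted targets,
# 5 DEGs, 3 of which are predicted targets.
p_value = hypergeom_upper_tail(population=20, targets=5, drawn=5, overlap=3)
```

A small `p_value` means that an overlap at least this large would rarely arise if the DEGs were drawn at random, which is the sense in which the overlap exceeds the random control.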

Identification of an uncharacterized transcription factor regulating lipid biosynthesis via the GMRN

In the ERF-GMRN, among the top five TFs with the greatest connectivity, only NtERF167 has an unknown function. NtERF167 clustered with numerous SGs involved in lipid metabolism (Supplementary Fig. 21a). Consistent with this, KEGG enrichment analysis of the GMRN also revealed significant enrichment of various lipid biosynthesis pathways in cluster VIII (Supplementary Fig. 12, Supplementary Data 7). Therefore, we speculated that NtERF167 may be a central TF regulating lipid metabolism. We further extracted a sub-regulatory network from the ERF-GMRN centered on these lipids and the corresponding SGs. NtERF167 exhibits a direct and strong correlation with lipids and SGs (Fig. 5a). The expression patterns of lipid biosynthesis genes, related TFs and the accumulation patterns of lipids showed a high level of consistency (Fig. 5b, and Supplementary Fig. 21b). Phylogenetic analysis revealed that NtERF167 is related to NtERF1 and NtSHINE2 (NtSHN2) (Supplementary Fig. 22). The expression patterns of NtERF167 were strongly correlated with and similar to those of Long-Chain Acyl-CoA Synthetase 2 (NtLACS2), and importance ranking of NtERF167 and NtLACS2 was prioritized over other SGs in the regulatory networks, suggesting that NtERF167 is a potential regulatory factor for NtLACS2 (Fig. 5b, and Supplementary Data 8). The genes and metabolites co-regulated with NtERF167 in the GMRN were primarily enriched in lipid metabolism and flavonoid biosynthesis (Supplementary Fig. 23a, b). Additionally, the cis-elements of genes co-expressed in the GMRN of NtERF167 mainly included binding sites for DREB and ERF TFs (Supplementary Fig. 24).

Fig. 5. NtERF167 is a key regulatory factor in lipid metabolism and disrupts lipidomic flux.

Fig. 5

a The sub-network extracted from the GMRN for lipid biosynthesis. SG, structural genes. TF, transcription factors. ME, metabolites. b Expression patterns of lipids, SGs, and TFs. “4” and “−4” represent the maximum and minimum value of the Z-score standardized expression, respectively. DAT, days after topping. c Transcriptional activation assay of NtERF167 in yeast. d Subcellular localization of NtERF167-GFP fusion protein in Nicotiana benthamiana leaf cells. GFP, green fluorescent protein. mKATE, Katushka protein. Scale bars, 20 µm. e Transient expression assay showing transcriptional activation of the LUC reporter gene (driven by the promoter of NtLACS2) by NtERF167 (n = 5 biological replicates). *** P < 0.001. f Specific binding of NtERF167 to promoters of NtLACS2 by one-hybrid system using bait and either prey or negative control. g EMSA with NtERF167 protein performed with the probes derived from the NtLACS2 promoter. h RNA-seq and RT-qPCR showing that NtLACS2 are positively regulated by NtERF167 (n = 3 biological replicates). i Lipid content in leaves of WT, NtERF167-OE and nterf167-KO tobaccos (n = 3 biological replicates). j TAG (triglyceride) and LPE (Lysophosphatidylethanolamine) contents in leaves of WT, NtERF167-OE and nterf167-KO tobaccos (n = 3 biological replicates). k GMRN accurately predicts targets of NtERF167. l Expression patterns of the genes showed an opposite expression pattern between the NtERF167-OE and nterf167-KO tobaccos. “2” and “-1” represent the maximum and minimum value of the Z-score standardized expression, respectively. m KEGG enrichment of genes showed an opposite expression pattern between the NtERF167-OE and nterf167-KO tobaccos. In (c), (d), (f), and (g), the images were representative of three independent repeats with similar results. In e, h, i, and j error bars represent the means ± standard errors of biological replicate. A two-tailed Student’s t test was used to determine P values. 
Different lowercase letters indicate significant differences (P < 0.001). In k, m the P values were calculated using hypergeometric distribution (two-sided) and then adjusted using the FDR. Source data are provided as a Source Data file.

Transcriptional activation activity analysis and subcellular localization revealed that NtERF167 functions as a TF (Fig. 5c, d). The promoter region (2000 bp region upstream of ATG) of NtLACS2 contains the ERF binding motif GCC-box (Supplementary Fig. 25b). Dual-luciferase reporter assays, Y1H assay and EMSA showed that NtERF167 protein could directly bind to the promoter of NtLACS2 (Fig. 5e–g, and Supplementary Fig. 25a, b).

To confirm the function of NtERF167 in lipid biosynthesis, we generated transgenic tobacco plants overexpressing NtERF167, driven by the 35S promoter. Furthermore, we used CRISPR/Cas9-mediated genome editing to generate nterf167 loss-of-function mutants with the same genetic background as the NtERF167-overexpressing plants. We obtained two independent NtERF167-overexpressing lines (NtERF167-OE#1 and 6) and two independent nterf167 mutant lines (nterf167-KO#3 and 28) (Supplementary Fig. 26a-c). The mRNA levels of NtLACS2 were significantly greater in the OE lines than in the WT lines and were downregulated in the nterf167-KO mutants (Fig. 5h, and Supplementary Fig. 27b). Total lipid content of the two NtERF167-OE lines significantly increased compared with WT plants, whereas the content of nterf167-KO lines significantly decreased compared with that of the WT plants (P < 0.01, Fig. 5i, and Supplementary Fig. 26a-c). PCA revealed that the lipidomic profiles of the two OE lines and two KO lines were quite distinct from that of the WT (Supplementary Fig. 27a, b, and Supplementary Data 13). The levels of most glycerophospholipids (phosphatidic acid, phosphatidylglycerol, phosphatidylinositol, Phosphatidylserine, lysophosphatidylglycerol, and lysophosphatidylethanolamine) were significantly greater in the OE lines than in the WT or KO lines (Supplementary Fig. 27c, d). In addition, the skeletal lipids DAG and TAG were more abundant in the OE plants (Fig. 5j). RNA-seq of NtERF167-OE, nterf167-KO and WT plants was performed to elucidate how NtERF167 regulates lipid biosynthesis (Supplementary Fig. 28a). The overlap between DEGs among WT, OE and KO lines and the predicted NtERF167 targets in the GMRN was significantly greater than that in the random control group (Fig. 5k, and Supplementary Fig. 28b-d, Supplementary Data 14). A series of vital genes involved in FFA oxidation and lipid metabolism were differentially expressed among the WT, OE and KO lines (Fig. 5l). 
Additionally, KEGG enrichment of DEGs that exhibited opposite expression patterns between the KO and OE lines revealed that they were mainly enriched in lipid metabolism (Fig. 5m), suggesting that NtERF167 may be related to multiple biological pathways. These results indicate that NtERF167 is a key regulatory factor in lipid metabolism and that altering its expression disrupts lipidomic flux. We also evaluated the role of NtERF167 in cold stress tolerance. Compared to the WT, loss of NtERF167 function resulted in increased sensitivity to cold stress in tobacco (Supplementary Fig. 29a-b). The nterf167-KO lines exhibited significantly reduced chlorophyll content and elevated membrane lipid peroxidation under low-temperature conditions (Supplementary Fig. 29c-d). These results suggested that NtERF167 also plays a role in abiotic stress responses.

The GMRN illustrates the key regulators of aroma formation

Aroma arises from a complex combination of volatile metabolites in plants, which derive primarily from the degradation of carotenoids. LOX enzymes catalyze the oxidation of polyunsaturated FFAs, such as linoleic and linolenic acids. The free radicals generated during the intermediate steps of FFA oxidation are responsible for the oxidative degradation of carotenoid pigments11. The co-oxidation reaction catalyzed by LOX involves the nonspecific cleavage of carotenoids23,68, yielding apocarotenoid volatiles (Supplementary Fig. 30). To identify genes regulating the degradation of carotenoids and aroma formation, we focused on DHA, which is a crucial flavoring agent and signaling molecule in plants. The GMRN revealed that NtCYC may regulate DHA formation by modifying the expression of NtLOX2. Phylogenetic analysis revealed a close evolutionary relationship between NtCYC and TCP4 in Arabidopsis and tobacco (Supplementary Fig. 31). The sub-network centered on several aroma terpenoid metabolites revealed a wide variety of metabolite species and TF families involved in terpenoid metabolism, demonstrating the complexity of terpenoid biosynthesis processes (Fig. 6a). In the network, DHA, LOXs, and most TCPs were found in cluster VI (Supplementary Data 7). KEGG enrichment analysis of the GMRN revealed that metabolic pathways that may be related to aroma formation, such as FFA degradation, glutathione metabolism, oxidative phosphorylation, sesquiterpenoid, and triterpenoid biosynthesis, were specifically enriched in cluster VI (Fig. 2c, Supplementary Fig. 12). As LOX genes are also implicated in jasmonic acid (JA) synthesis, we examined the expression patterns of genes involved in carotenoid degradation or JA synthesis with related TFs and the accumulation patterns of DHA. The expression patterns of NtCYC were strongly correlated with and similar to those of NtLOX2 and DHA (Fig. 6b), suggesting that NtCYC is a potential regulatory factor for NtLOX2.

Fig. 6. Transcription factor NtCYC rewrites aroma formation.

Fig. 6

a The sub-network extracted from the GMRN for carotenoid degradation. SG, structural genes. TF, transcription factors. b Expression patterns of metabolites, SGs, and TFs. “4” and “-4” represent the maximum and minimum value of the Z-score standardized expression, respectively. DAT, days after topping. c Transcriptional activation assay of NtCYC in yeast. d Subcellular localization of NtCYC-GFP fusion protein in Nicotiana benthamiana leaf cells. GFP, green fluorescent protein. mKATE, Katushka protein. Scale bars, 20 µm. e Transient expression assay showing transcriptional activation of the LUC reporter gene (driven by the promoter of NtLOX2) by NtCYC (n = 5 biological replicates). *** P < 0.001. f Specific binding of NtCYC protein to promoters of NtLOX2 by one-hybrid system using bait and either prey or negative control (bait/pGADT7). g EMSA with NtCYC protein performed with the probes derived from the NtLOX2 promoter. h–j Dihydroactinidiolide (h), damascenone (i) and ionone (j) contents and corresponding mass spectrometry data in leaves of WT, NtCYC-OE and ntcyc-KO tobaccos (n = 3 biological replicates). k RNA-seq and RT-qPCR showing that NtLOX2 is positively regulated by NtCYC (n = 3 biological replicates). l MDA content between WT, NtCYC-OE and ntcyc-KO tobaccos (n = 3 biological replicates). m GMRN accurately predicts targets of NtCYC. The P values were calculated using hypergeometric distribution (two-sided) and then adjusted using the FDR. n Expression patterns of the genes related to carotenoid pathway in WT, NtCYC-OE and ntcyc-KO tobaccos. “2” and “−2” represent the maximum and minimum value of the Z-score standardized expression, respectively. In (c), (d), (f), and (g), the images were representative of three independent repeats with similar results. In (e), (h), (i), (j), (k), and (l), error bars represent the means ± standard errors of biological replicates. A two-tailed Student’s t test was used to determine P values. 
Different lowercase letters indicate significant differences (P < 0.001). Source data are provided as a Source Data file.

Transcriptional activation activity analysis and subcellular localization revealed that NtCYC functions as a TF (Fig. 6c, d). Y1H, EMSA and dual-luciferase reporter assays revealed that NtCYC protein binds to the NtLOX2 promoter (Fig. 6e–g, and Supplementary Fig. 32a, b). To further investigate the function of NtCYC in aroma formation and carotenoid metabolism, we generated transgenic tobacco plants overexpressing NtCYC, driven by the 35S promoter. Furthermore, we used CRISPR/Cas9-mediated genome editing to create ntcyc loss-of-function mutants. We obtained two independent NtCYC overexpression lines (NtCYC-OE#7 and 12) and two independent lines of ntcyc mutants (ntcyc-KO#27 and 43) (Supplementary Fig. 33a–c). Aroma metabolites, including β-ionone, β-damascenone and DHA, were profiled in extracts of tobacco leaves. Because the concentration of aroma compounds in tobacco is relatively low, which poses challenges for detection with GC-MS, we used the more sensitive LC-MS for analysis. In the leaves of the two NtCYC-OE lines, the levels of DHA and β-ionone increased significantly compared to those in the leaves of the WT plants, whereas they were significantly reduced in the leaves of the ntcyc-KO mutants (Fig. 6h–j). The mRNA levels of NtLOX2 were significantly greater in the OE lines than in the WT and were downregulated in the ntcyc-KO mutants (Fig. 6k, and Supplementary Fig. 34c). Furthermore, since LOXs degrade carotenoids to produce aroma compounds through FFA oxidation, we measured the content of malondialdehyde (MDA), a crucial lipid oxidation indicator, in NtCYC-OE, ntcyc-KO, and WT plants. MDA content increased significantly in NtCYC-OE compared to WT plants, while it decreased markedly in ntcyc-KO (P < 0.01, Fig. 6l). Interestingly, the level of β-damascenone showed the opposite trend to DHA and β-ionone (Fig. 6i). Because β-damascenone is not subject to degradation by LOX, the enhancement of NtLOX2 expression mediated by NtCYC redirected metabolic flux towards DHA and β-ionone, whose formation is catalyzed by LOXs.

RNA-seq of NtCYC-OE, ntcyc-KO and WT plants was performed to elucidate how NtCYC regulates aroma formation. PCA showed that the two OE lines shared a similar transcriptome, as did the two KO lines, and that both were quite distinct from the WT (Supplementary Fig. 34a). The overlap between DEGs among WT, OE and KO plants and the predicted NtCYC targets in the GMRN was significantly greater than that in the random control (Fig. 6m, and Supplementary Fig. 34b, d, Supplementary Data 15). In addition, the expression levels of the SGs used to construct the sub-regulatory network involved in JA biosynthesis were different among the WT, OE and KO lines (Fig. 6n). GO and KEGG enrichment analyses of these DEGs revealed similar molecular functions as those revealed by the KEGG enrichment analysis of the GMRN, such as functions in sesquiterpenoid and triterpenoid biosynthesis (Supplementary Fig. 35a, b). In conclusion, NtCYC is a regulatory factor in aroma formation and affects the metabolic flux of terpenoid metabolism. Given that NtCYC may mediate oxidative homeostasis and evidence that certain CYC transcription factors function as negative regulators in abiotic stresses69, we characterized the role of NtCYC under cold stress. Compared to WT, ntcyc-KO plants exhibited enhanced growth phenotypes under cold stress (Supplementary Fig. 36a, b). The ntcyc-KO lines displayed higher chlorophyll content, accompanied by significantly reduced hydrogen peroxide accumulation and membrane lipid peroxidation (Supplementary Fig. 36c-e). These results demonstrated that NtCYC negatively regulates cold stress tolerance in tobacco through peroxide-mediated mechanisms.

Discussion

In this study, dynamic transcriptomic and metabolomic profiles were collected across multiple timepoints and ecological gradients following topping. To comprehensively characterize tobacco metabolites, we employed integrated analytical platforms including GC-MS, LC-MS/MS, GC-MS derivatization, and lipidomics. Our species-specific metabolite library encompasses nearly 30 tobacco-abundant compounds such as nicotine, scopoletin, and solanesol. In total, 25,984 genes and 633 fully annotated and qualified metabolites were identified, enabling construction of a high-confidence metabolic regulatory network containing 3.17 million regulatory relationships. To date, extensive metabolomic datasets have been generated for metabolic regulatory network construction in multiple species, such as barley37, rice70, kiwifruit43, tomato39. Among these, tomato profiles quantified 540 annotated metabolites, kiwifruit 515, and barley 419 (from a total of 986 detected metabolites, with 567 remaining unannotated), while rice studies report 825 metabolites. Compared with these systems, our fully annotated metabolite set (633 compounds) represents a competitively comprehensive resource, providing a solid foundation for reconstructing pathway-specific regulatory networks.

The choice of experimental environment in the present research was carefully considered. Population genetics studies demonstrate that utilizing diverse environments strengthens genotype-phenotype/metabolite associations and reveals genetic effects39. Recent research further indicates that environmental perturbations significantly improve gene-metabolite correlations in networks71. In our previous multi-omics study across three distinct ecoregions, geographical location emerged as the dominant factor perturbing the tobacco metabolome, regulating the biosynthesis, accumulation, and gene expression of diverse metabolites47. To enhance the ecological relevance and gene-metabolite correlations of the metabolic network, the present study was conducted under open-field conditions across two distinct ecological regions, LF and HM, so that our findings represent real-world scenarios for tobacco breeding and metabolic engineering applications. The high-altitude HM site imposed UV-B and cold stress, triggering significant accumulation of lipids and flavonoids (Supplementary Figs. 3, 4). These findings align with established mechanisms: flavonoids serve as ultraviolet shields and antioxidants55, exemplified by significantly enhanced flavonoid levels in Ginkgo leaves at high altitudes72. The OsRLCK160-OsbZIP48 regulatory module further demonstrates flavonoid-mediated UV-B resistance in rice73. Concurrently, lipids function as potent stress-mitigating metabolites that stabilize membranes during cold/dehydration stress and form cuticular waxes to limit water loss, explaining their stress-inducible accumulation57,74. The results of this study demonstrate ecological differences between the two sites at the metabolomic level and indicate that metabolic changes are an important characteristic of the plant stress response. 
Notably, three identified transcription factors exhibit dual functions of metabolic regulation and abiotic stress resistance, highlighting the utility of environment-driven metabolic networks for discovering stress-resistance candidate genes.

The application of synthetic biology approaches has the potential to improve tobacco as a production platform for proteins and small molecules. However, metabolic pathways are strictly regulated, and the overexpression of a single biosynthetic enzyme may not have a significant impact on output efficiency27. Recent studies have revealed that modifying the expression of TFs may lead to more comprehensive upregulation of plant metabolism75. For example, overexpression of ZmNAC78 can increase the iron content in maize grains by 50%76. Overexpression of SlMYB12 can alter the metabolic profile of tomato fruit39. The exploration of TFs and their downstream target genes, especially their regulatory relationships with metabolites, may be challenging due to the less significant coregulation effects among them16. Recent studies have suggested that networks are efficient tools for researching the relationships between TFs and other genes or metabolic pathways, which helps us discover and characterize some hidden and complex relationships31,77. However, in cases with small sample numbers, the prediction quality and weight values of the network are often lower, especially when more genes are used in the network construction, as evidenced by WGCNA40. Theoretically, the regulatory network becomes more robust as the number of samples and the differences among them increase, highlighting the regulatory relationship between genes and metabolism78. Therefore, the integration of multiple omics datasets, along with increased sample size and differences between samples, significantly enhances the accuracy and predictive ability of GMRNs. In this study, we sampled and measured data from LF and HM, two regions with significant ecological differences, and increased the sample size to enhance the accuracy of our network.

Recent advances in genomics, transcriptomics, epigenomics, and metabolomics have enabled successful construction of multi-omics-based metabolic regulatory networks across diverse species16,32,37,42,45,46. These networks represent a powerful systems biology framework for analyzing and presenting biological data, as deconstructing network architectures provides an effective approach to elucidate the highly dynamic and complex principles of functional genomics79. Compared to population genetics-based metabolite genome-wide association studies (mGWAS), metabolic regulatory networks enable faster data generation and regulator gene identification. However, networks can hardly capture metabolic changes caused by sequence variations. A wheat multi-omics study demonstrated that these approaches are complementary: metabolic networks accelerate the identification of major-effect genes from mGWAS-derived candidate pools80, while integrating mGWAS or metabolite-quantitative trait locus genetic variations into metabolic networks enhances prediction accuracy37. Current mainstream approaches for metabolic network construction primarily employ co-expression networks and coregulation analysis. This methodology enables the identification of genes and metabolites exhibiting coordinated expression patterns, which are subsequently clustered into distinct modules. This highly effective framework has facilitated the identification of numerous metabolic regulators across diverse species, such as tomato40, rice16, cotton42, kiwifruit44, and barley36. However, reducing false positives in regulatory pairs remains a persistent challenge. First, similar expression patterns do not necessarily indicate strong correlations; indeed, the present study revealed significant divergence between regulatory pairs identified through co-regulation analysis versus correlation-based approaches (Fig. 2a).
Moreover, capturing nonlinear relationships arising from metabolite synthesis lagging behind gene expression proves particularly difficult81. Building upon established methodologies for metabolic regulatory network construction, this study optimizes the network assembly pipeline by integrating linear and nonlinear relationships, merging clustering-correlation analyses, and refining outcomes through machine learning-based modeling. We validated the accuracy of regulatory interactions within the metabolic network through multiple orthogonal approaches: comparative analysis of PCC and GINI coefficients between module-enriched versus random regulatory pairs; verification that wet-lab confirmed regulatory relationships exhibit high correlation and co-cluster within identical network modules; and transcriptomic profiling of gene overexpression/knockout plant materials to quantify the overlap between DEGs and network-connected genes. Collectively, these results demonstrate exceptional accuracy of our metabolic network. This study establishes a pioneering framework for constructing metabolic regulatory networks across diverse species.

GMRN facilitates the identification of the functions of regulated genes in metabolite synthesis40. First, we validated the accuracy of the GMRN using the hydroxycinnamic acid metabolic pathway. The biosynthesis of hydroxycinnamic acid diverges from the phenylpropanoid pathway, and PAL and 4CL act as the rate-limiting enzymes in the biosynthetic pathway of CGA82. In previous research, R2R3-MYB TFs have been identified as regulatory factors in the phenylpropanoid pathway83. It has been suggested that in Arabidopsis, AtMYB15 directly activates PAL1 and Cinnamate 4-hydroxylase (C4H), while its ortholog in Lonicera macranthoides, LmMYB15, activates 4CL, MYB3, and MYB4 to regulate CGA biosynthesis and phenylpropanoid metabolism17,84. Similarly, the orthologs of AtMYB15 in rice, OsMYB30, OsMYB55, and OsMYB110, have been reported to directly activate OsPAL1-3, Os4CL3, and Os4CL5 to promote lignin synthesis85. However, few studies have reported MYB transcription factors regulating hydroxycinnamic acid biosynthesis in Nicotiana tabacum. Previous research identified that NtWRKY33a modulates polyphenol biosynthesis within the same phenylpropanoid pathway by targeting NtMYB4 and the hydroxycinnamoyl-CoA shikimic acid/cinnamate transferase gene NtHCT in tobacco86. However, the downstream regulatory targets of MYB4 remain unidentified. In this study, through the GMRN, we identified a key transcription factor, NtMYB28, which directly targets Nt4CL2A and NtPAL2, the genes encoding the rate-limiting enzymes 4CL and PAL in the phenylpropanoid pathway. NtMYB28 also influenced the transcription levels of phenylpropanoid pathway-related SGs and the synthesis efficiency of hydroxycinnamic acids. This discovery provides theoretical guidance for engineering tobacco plants that accumulate hydroxycinnamic acid. The present study thus reveals a key MYB transcription factor in the regulation of phenylpropanoid metabolism in Nicotiana tabacum.
Furthermore, previous studies demonstrated that hydroxycinnamic acids play pivotal roles in plant responses to environmental and biotic stresses16,87,88. CGA exhibits multifunctional protective effects, including combating oxidative stress, enhancing cold tolerance, and mitigating UV-B radiation damage. For instance, in Citrus species, exogenous application of CGA significantly improves cold tolerance through efficient scavenging of ROS87. MYB TFs constitute pivotal regulators in plant stress responses, orchestrating diverse metabolic pathways to combat environmental stresses89. In Nicotiana tabacum, NtMYB27 functions downstream of the brassinosteroid signal kinase NtBES1 to enhance UV-B stress tolerance by modulating flavonoid biosynthesis88. Furthermore, OsMYB30, OsMYB55, and OsMYB110 confer pathogen resistance through reinforced cell wall synthesis16. Consistently, the present study identified NtMYB28 as a key positive regulator of cold tolerance, thereby furnishing a prime molecular target for developing cold-tolerant tobacco cultivars.

Tobacco has been a potential production platform for oils10. Tobacco leaves contain 1.7%–4% oil per dry weight90, which is extractable as FFA esters, the major component of biofuel oil9. Furthermore, lipids, including membrane lipids, storage lipids (such as TAG), and surface lipids (wax, cutin, and suberin), serve as key functional compounds for plants to combat diverse abiotic stresses91. Lipids function not only as crucial signaling molecules for transmitting stress information in plants, but also enable plants to preserve membrane fluidity and stability under low-temperature stress through enhanced synthesis of unsaturated fatty acids92,93. Moreover, lipids constitute the primary components of the plant cuticle, a composite hydrophobic matrix of waxy lipids deposited on the outer epidermal cell walls of aerial organs. This structure serves as a critical barrier that minimizes water loss and safeguards internal tissue integrity94. In the present study, a key ERF family transcription factor, NtERF167, that significantly regulates lipid metabolism was identified from the GMRN. Based on the sub-regulatory network of lipid metabolism SGs, we found that NtERF167 has a strong correlation with most lipid SGs and the strongest correlation with LACS216,58. LACS2 is a key gene in lipid metabolism that acts as a bridge between the chloroplast and endoplasmic reticulum lipid synthesis pathways95. This study revealed that NtERF167 positively regulated the expression of the LACS2 gene, significantly impacting lipid composition in tobacco by increasing the lysophosphatidic acid, lysophosphatidylglycerol, and TAG contents. Previous studies suggested that AP2/ERF family TFs may participate in abiotic stress resistance, FFA metabolism or wax synthesis57,96. In Arabidopsis, overexpression of the AP2/ERF transcription factor WIN1/SHN1 promotes cuticular wax biosynthesis, enhances drought tolerance, and modulates cuticular permeability57.
In soybean, overexpression of GmWIN1-5, the functional ortholog of AtWIN1, significantly increased total lipid and phospholipid content in hairy roots and transgenic tobacco leaves, while substantially enhancing cold tolerance in transgenic plants74. The WIN1 ortholog in Nicotiana tabacum, NtERF1, modulates the expression of lipid transport genes under aluminum stress97. However, ERF-mediated regulation of lipid biosynthesis in Nicotiana tabacum has not been extensively investigated98. This study demonstrates that NtERF167 rewires lipid metabolic flux, with its mediated lipid biosynthesis concurrently enhancing cold tolerance in tobacco. These data expand the metabolic map of tobacco lipids and provide a critical gene resource for molecular breeding and lipid metabolic engineering in Nicotiana species.

Natural aroma substances, which have various applications in the food, flavor, fragrance, cosmetics, and pharmaceutical industries21, have high extraction costs26. The aroma in plants primarily stems from the degradation of carotenoids by LOXs and carotenoid cleavage dioxygenases (CCDs)11. Previous studies have reported mechanisms underlying aroma production and regulation in tea, flowers, and vegetables11,99,100. CsLOX9 controls the biosynthesis of signature C9 volatiles in cucumber (Cucumis sativus), while CsDof1.8 transcriptionally regulates CsLOX9 to modulate cucumber aroma99. ERF1 regulates floral scent production in Osmanthus fragrans by controlling ionone biosynthesis through transcriptional modulation of the CCD4 gene100. There have been few reports on structural enzymes or the transcription factors controlling aroma production in tobacco. The GMRN revealed that NtLOX2 exhibits strong correlations with multiple aroma compounds and functions as a key regulatory gene controlling aroma compound biosynthesis. Furthermore, this study experimentally validated that the TCP-class transcription factor NtCYC regulates NtLOX2 expression, thereby precisely controlling aroma compound biosynthesis in tobacco. While TCP transcription factors have been characterized in prior studies as pivotal regulators mediating plant growth, environmental stress responses, and senescence, their role in controlling aroma compound biosynthesis has not been explored101–104. Therefore, this study expands the known biological functions of TCP transcription factors and provides key genetic targets for metabolic engineering of aroma compound biosynthesis in tobacco. Furthermore, previous research has established that TCP transcription factors mediate plant abiotic stress responses through dual regulatory roles, acting as both activators and repressors105.
Transgenic Arabidopsis overexpressing AtTCP13 exhibited leaf curling and reduced leaf growth under osmotic stress, whereas enhanced dehydration tolerance was observed106. Conversely, ZmTCP14 negatively regulates drought tolerance in maize by promoting ROS production69. Consistently, this study demonstrated that overexpression of NtCYC promotes MDA production in tobacco, indicating its positive regulation of ROS generation. ntcyc-KO plants exhibited enhanced cold tolerance, demonstrating that NtCYC modulates cold resilience by regulating ROS accumulation in plants. The signaling pathway mediated by NtCYC requires further validation in future studies.

In future research, candidate regulatory genes for corresponding metabolites can be directly extracted from the GMRN of this study. We will continue to refine the metabolic network, for example by identifying nontarget metabolites to increase the number of metabolites covered and by exploring the integration of advanced methods, such as convolutional neural networks, for network modeling to improve accuracy. We will also explore other applications of metabolic networks, such as identifying unknown metabolite synthesis pathways. Furthermore, we will collect and integrate more multi-omics data for tobacco, developing a multi-omics metabolic regulatory database and interactive web platform. In conclusion, our study expands the gene expression and metabolite datasets of tobacco, providing a wealth of publicly available data and gene resources for foundational research on tobacco.

Methods

Plant materials and growth conditions

Tobacco plants (Nicotiana tabacum L. cv. K326) were grown in experimental fields in Guiyang and Longshan, China, under natural conditions. After topping, leaf samples were harvested every 2 days between 16:00 and 18:00 h. Leaves (without petioles) were harvested, rapidly frozen in liquid nitrogen, and stored at -80 °C. Three biological replicates were collected, each a mixed sample from at least three individual plants, and each was divided for parallel RNA extraction and metabolite profiling.

For cold stress treatment, seeds of wild-type (WT) and transgenic plants were germinated and cultivated in plastic pots under a 14 h light (200 μmol m−2 s−1, 25 °C)/10 h dark (25 °C) cycle at 70% relative humidity for 45 days. The seedlings were then transferred to another greenhouse at 10 °C for 7 or 14 days. The phenotypes of the WT and transgenic plants under stress were recorded. All aboveground leaves of each group, which comprised five biological replicates, were immediately frozen in liquid nitrogen and later used for measurement of physiological indices.

Meteorological data collection

Climate data (relative humidity, precipitation and air temperature) corresponding to the sampling site coordinates were obtained from the China Meteorological Data Network (http://data.cma.cn/). Specifically, relative humidity refers to the ratio (%) of actual water vapor pressure in air to the saturation vapor pressure at a given temperature. Precipitation denotes the accumulated depth (mm) of liquid or solid water reaching the Earth’s surface per unit area over a specified time period. The net solar radiation was measured by an NR LITE2 net radiometer (Kipp & Zonen, Delft, Netherlands). The radiometers were installed 2 m above the ground and at least 0.5 m above the tobacco plant canopy107.

SPAD index measurement

The SPAD index of tobacco middle leaves at different sampling stages was measured using a SPAD-502 PLUS chlorophyll meter (Minolta Co. Ltd., Osaka, Japan), which has an accuracy of ±1.0 SPAD unit. Measurements were taken symmetrically at three positions along both sides of the midvein, equidistant from the midvein and leaf margin108. Readings directly on the leaf midrib were avoided. The average of these readings was used as the chlorophyll meter reading for the leaf. At least ten plants were measured per time point. Leaf area was measured on-site using a leaf area meter (LI-3100, LiCor, Lincoln, NE) to prevent shrinkage. Dry weight was determined after oven drying at 70 °C for 72 h. Specific leaf weight (SLW) was calculated as the ratio of dry weight to leaf area109. The normalized SPAD value was calculated as the ratio of the raw SPAD reading to SLW.
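The normalization described above can be illustrated with a minimal sketch. The function names and the units (grams and cm²) are assumptions made for illustration; the text specifies only the two ratios.

```python
def specific_leaf_weight(dry_weight_g, leaf_area_cm2):
    """SLW = leaf dry weight / leaf area (units assumed: g and cm^2)."""
    if leaf_area_cm2 <= 0:
        raise ValueError("leaf area must be positive")
    return dry_weight_g / leaf_area_cm2


def normalized_spad(readings, dry_weight_g, leaf_area_cm2):
    """Average the per-leaf SPAD readings, then divide by SLW."""
    if not readings:
        raise ValueError("at least one SPAD reading is required")
    mean_spad = sum(readings) / len(readings)
    return mean_spad / specific_leaf_weight(dry_weight_g, leaf_area_cm2)


# Six readings (three positions on each side of the midvein), hypothetical values.
leaf_readings = [40.0, 42.0, 44.0, 40.0, 42.0, 44.0]
value = normalized_spad(leaf_readings, dry_weight_g=2.0, leaf_area_cm2=400.0)
```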

Chemicals and reagents

Methanol and acetonitrile, both of HPLC grade, were obtained from Merck (Darmstadt, Germany). Acetic acid, formic acid, and ammonium acetate, also HPLC grade, were sourced from CNW (Düsseldorf, Germany). Standards including CGA, neo-chlorogenic acid, crypto-chlorogenic acid, caffeic acid, β-damascenone, β-ionone, and DHA were purchased from Sigma Aldrich (Milan, Italy), all with a purity exceeding 97%. All chemicals and reagents were used as received, without further purification. Ultrapure water (18.2 MΩ·cm) was produced using a Milli-Q system (Bedford, MA, USA) and was utilized throughout the experiments.

Metabolome profiling

We performed untargeted metabolome profiling based on GC-MS and LC-MS/MS. A total of 66 samples (eleven developmental stages × three biological replicates × two sites) were analyzed, together with eight analytical blanks and eight pooled biological quality control (QC) samples prepared by pooling 10 µL aliquots of every leaf extract. The injection sequence was randomized within blocks of 12 samples, with QC injections every 8 samples110.
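One way to build such an injection sequence is sketched below. How samples were assigned to blocks is not stated, so the sketch assumes consecutive blocks of 12 in the input order; the function name, the seed and the "QC" label are illustrative.

```python
import random


def build_injection_sequence(samples, block_size=12, qc_every=8, seed=0):
    """Shuffle samples within consecutive blocks, then add a QC after every `qc_every` samples."""
    rng = random.Random(seed)  # fixed seed so the run order is reproducible
    randomized = []
    for start in range(0, len(samples), block_size):
        block = list(samples[start:start + block_size])
        rng.shuffle(block)  # randomization is confined to the block
        randomized.extend(block)

    sequence = []
    for position, sample in enumerate(randomized, start=1):
        sequence.append(sample)
        if position % qc_every == 0:
            sequence.append("QC")  # pooled quality-control injection
    return sequence


sample_ids = [f"S{i:02d}" for i in range(1, 67)]  # 66 hypothetical sample labels
run_order = build_injection_sequence(sample_ids)
```

With 66 samples and a QC after every eighth sample, this scheme produces eight QC injections.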

For direct-injection GC-MS profiling, 55 ± 0.5 mg frozen leaf powder (liquid-N₂ milled) was weighed into 2 mL Eppendorf tubes, extracted with 1.5 mL MeOH/CHCl₃ (3:2 v/v) for 40 min at 4 °C (vortex 30 s, 10 min sonication, 30 min shaking at 1400 × g). After centrifugation (10 min, 12,000 × g, 4 °C), 1.0 mL of the upper polar phase was removed and dried in a CentriVap vacuum concentrator for 6 h. The residue was re-dissolved in 250 µL MeOH/CHCl₃ (3:2), vortexed, centrifuged (10 min, 12,000 × g) and 200 µL transferred to a 250 µL glass micro-vial for immediate GC-MS analysis. An Agilent 8890 GC coupled to a 5975 MSD was used with a DB-5MS column (30 m × 0.25 mm × 0.25 µm). Injection volume 2 µL, split 5:1, inlet 290 °C; carrier gas He, 1.0 mL min⁻¹ constant flow. Oven: 50 °C (4 min), 5 °C min⁻¹, 300 °C (10 min). Transfer line 280 °C. EI⁺ 70 eV, source 230 °C, quadrupole 150 °C, m/z 50-600, solvent delay 4 min. Daily autotune with PFTBA kept mass accuracy within 0.1 Da. NetCDF files were imported into XCMS-online v3.12.0 (centWave: 15 ppm, peak width 5-25 s, snthresh 6). Annotation used an in-house tobacco library (NIST17 + 180 authentic standards, match ≥800, RI ± 20) resulting in 127 level-1 identifications. Blank filtering removed features present in ≥50% of blanks with ≥20% sample mean intensity. Data were normalized by sum-of-selected-metabolites and batch-corrected with QC-RLSC (MetNormalizer v1.2.0)111.
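The blank-filtering rule can be read as follows: a feature is discarded when, in at least 50% of the blanks, its intensity reaches at least 20% of its mean intensity across the samples. The sketch below implements that reading; the interpretation, the function names and the data layout are assumptions rather than the pipeline used in this study.

```python
def is_blank_contaminant(sample_intensities, blank_intensities,
                         blank_fraction=0.5, intensity_ratio=0.2):
    """Return True if the feature should be removed as a blank-derived signal."""
    sample_mean = sum(sample_intensities) / len(sample_intensities)
    threshold = intensity_ratio * sample_mean
    # Count blanks in which the feature is detected at or above the threshold.
    high_blanks = sum(1 for b in blank_intensities if b > 0 and b >= threshold)
    return high_blanks / len(blank_intensities) >= blank_fraction


def filter_features(samples_by_feature, blanks_by_feature):
    """Keep only features that are not flagged as blank contaminants."""
    return {
        feature: intensities
        for feature, intensities in samples_by_feature.items()
        if not is_blank_contaminant(intensities, blanks_by_feature[feature])
    }


# Hypothetical intensities for two features, with eight blanks each.
samples = {"feat_1": [1000.0, 1200.0, 800.0], "feat_2": [1000.0, 1200.0, 800.0]}
blanks = {
    "feat_1": [250.0, 300.0, 0.0, 0.0, 0.0, 0.0, 210.0, 220.0],  # 4 of 8 blanks >= 200
    "feat_2": [250.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],        # 1 of 8 blanks >= 200
}
kept = filter_features(samples, blanks)
```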

For derivatization GC-MS metabolomics, 20 mg lyophilized powder was extracted with 1.5 mL isopropanol/acetonitrile/water (3/3/2 v/v/v) for 1 h at 4 °C in an ultrasonic bath, centrifuged (12,000 × g, 10 min) and 500 µL supernatant dried in vacuo. Methoximation was carried out with 100 µL methoxyamine (20 mg mL⁻¹ in pyridine) 90 min at 37 °C, followed by 100 µL BSTFA + 1% TMCS 60 min at 60 °C. After cooling, 1 µL was injected. Same 8890-5975 platform, DB-5MS column. Inlet 300 °C, split 10:1, injection 1 µL; He 1.2 mL min⁻¹. Oven: 70 °C (4 min), 5 °C min⁻¹, 310 °C (15 min). Transfer line 280 °C; EI 70 eV, source 230 °C, quadrupole 150 °C, m/z 50-600, solvent delay 5.2 min. XCMS-online v3.12.0 (centWave, as above) + CAMERA grouping. Metabolites were identified (match ≥800 vs Fiehn RI library) yielding 186 level-1 identifications. Blank filtering, missing-value imputation, Sample-Specific Median (SSM) normalization and QC-RLSC batch correction were applied prior to multivariate analysis (SIMCA 16, OPLS-DA, VIP > 1)112.

For untargeted LC-MS metabolomics, 20 mg freeze-dried powder was extracted with 1.5 mL 75% MeOH containing umbelliferone (2 mg L⁻¹) 1 h at 40 °C, centrifuged (12,000 × g, 15 min, 4 °C) and 800 µL supernatant transferred to LC vials. Agilent 1290 UPLC, Zorbax SB-C18 (100 × 2.1 mm, 1.8 µm) at 60 °C. Mobile phase A 0.1% formic acid in water, B acetonitrile. Gradient: 0-2 min 15-30% B, 2-10 min 30-85% B, 10-15 min 85-90% B, 17 min 100% B (hold 3 min), re-equilibrate 4 min; flow 0.3 mL min⁻¹, UV 200-400 nm. Agilent 6540 Q-TOF, ESI⁺/ESI⁻, gas temp 350 °C, gas flow 12 L min⁻¹, nebulizer 40 psi, sheath gas 350 °C 10 L min⁻¹, capillary 3.5 kV, nozzle 480 V, fragmentor 130 V, oct RF 750 V; mass range 50-1,000 m/z, acquisition rate 2 spectra s⁻¹, internal reference masses 121.0509/922.0098 (ESI⁺) and 112.9856/966.0007 (ESI⁻). Raw files converted to mzXML by MSConvert (ProteoWizard 3.0.21). Peak picking, alignment and area extraction performed with AntDAS v1.0 using parameters: peak width 6-30 s, mass tolerance 10 ppm, minimum peak intensity 1000. Annotation used an in-house PMDB combined with HMDB (mass error ≤5 ppm, MS/MS match ≥80). Data were normalized by umbelliferone area, log-transformed and Pareto-scaled prior to OPLS-DA (SIMCA 14.1)113.
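The final preprocessing chain (internal-standard normalization, log transformation, Pareto scaling) can be sketched as below. The log base and the use of the sample standard deviation are not given in the text and are assumed here (log2, n − 1 denominator). Pareto scaling itself is the standard operation of mean-centering each feature and dividing by the square root of its standard deviation.

```python
import math


def pareto_scale(values):
    """Mean-center and divide by the square root of the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return [0.0] * n  # constant feature carries no variation
    return [(v - mean) / math.sqrt(sd) for v in values]


def preprocess_feature(peak_areas, internal_standard_areas):
    """Normalize to the internal standard, log2-transform, then Pareto-scale."""
    ratios = []
    for area, standard in zip(peak_areas, internal_standard_areas):
        if area <= 0 or standard <= 0:
            raise ValueError("areas must be positive before log transformation")
        ratios.append(area / standard)
    logged = [math.log2(r) for r in ratios]
    return pareto_scale(logged)


# Hypothetical peak areas of one metabolite and of umbelliferone in three samples.
scaled = preprocess_feature([100.0, 400.0, 1600.0], [100.0, 100.0, 100.0])
```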

For untargeted lipidomics, 20 mg lyophilized powder was extracted with 1.5 mL isopropanol/ethyl acetate (2:3 v/v) for 30 min at 30 °C and centrifuged (12,000 × g, 10 min). Then, 500 µL supernatant was dried and reconstituted in 200 µL isopropanol/acetonitrile/water (2:1:1 v/v/v), sonicated for 10 min, centrifuged again, and 150 µL was transferred to LC vials. Waters I-Class UPLC, Acquity CORTECS C18 (100 × 2.1 mm, 1.6 µm) at 40 °C. Mobile phase A acetonitrile/water (6:4) + 0.1% formic acid + 10 mM ammonium formate, B isopropanol/acetonitrile (9:1) + 0.1% formic acid + 10 mM ammonium formate. Gradient: 0–2 min 20–55% B, 2–12 min 55–58% B, 12–13 min 58–65% B, 13–26 min 65–99% B (hold 1 min), re-equilibrate 4 min; flow 0.3 mL min⁻¹. Waters Xevo G2-S Q-TOF with traveling-wave IMS, ESI⁺, capillary 3.0 kV, sampling cone 40 V, source temp 120 °C, desolvation temp 400 °C, desolvation gas 800 L h⁻¹, cone gas 50 L h⁻¹; mass range 120–2000 m/z, scan time 0.2 s, lock-mass leucine enkephalin 556.2771. Data were acquired in resolution mode (~20,000 FWHM). Raw files were imported to Progenesis QI v3.2 (Waters) for alignment, peak picking (automatic sensitivity, 10 ppm), deconvolution and normalization to total ion signal. Lipids were annotated (level-1 when both precursor ≤5 ppm and ≥2 diagnostic MS/MS fragments matched; otherwise level-2) against LipidMAPS and in-house LipidBlast libraries (collision energy 20–40 eV ramp). PCA and heatmap plotting were performed using the prcomp function and ComplexHeatmap in R software, respectively114. Metabolite functional enrichment analysis was conducted with MBROLE 2.0115.

Transcriptome profiling and analysis

Total RNA was extracted using a kit (TransGen Biotech Co. Ltd.). RNA quality was evaluated by 1% agarose gel electrophoresis. RNA-seq libraries were constructed and sequenced on the Illumina HiSeq-2000 platform.

The quality of the raw transcriptome reads was assessed using FastQC, and reads were filtered with Trim Galore. The filtered reads were then aligned to the reference genome using HISAT2116. Gene expression for each primary gene model was quantified with StringTie using default parameters117. For time-course DEG analyses, gene expression data from StringTie were merged, and time-course DEGs were identified using Ballgown and clustered with Mfuzz118. Differentially expressed genes at various stages were determined using DESeq2 with a likelihood ratio test. PCA and heatmap visualizations were created using the prcomp function and ComplexHeatmap in R software114.

For functional enrichment analysis, all tobacco genes were annotated using eggNOG-mapper v2.0, and GO and KEGG annotations were then extracted with a Python script. Enrichment analysis was performed in R using a hypergeometric test applied to the full set of DEGs. Terms with FDR < 0.01 were considered significantly enriched.
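A hypergeometric enrichment test with an FDR cutoff can be sketched as follows. The original analysis was run in R; this Python version is only an illustration, and the Benjamini-Hochberg procedure is assumed as the FDR method because the text does not name one. All function names and the toy data are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `population`, of which `annotated` carry the term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(term_to_genes, degs, background, fdr_cutoff=0.01):
    """Return {term: FDR} for terms over-represented among the DEGs."""
    background = set(background)
    degs = set(degs) & background
    terms, pvalues = [], []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(annotated & degs)
        terms.append(term)
        pvalues.append(hypergeom_sf(overlap, len(background), len(annotated), len(degs)))
    fdr = benjamini_hochberg(pvalues)
    return {t: q for t, q in zip(terms, fdr) if q < fdr_cutoff}


# Toy example: 100 background genes, 10 DEGs that all belong to term "A".
genes = [f"g{i}" for i in range(100)]
significant = enrich({"A": genes[:10], "B": genes[50:60]}, genes[:10], genes)
```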

Weighted gene correlation network analysis

WGCNA was conducted in R software following the official manual119. FPKM data for all genes served as input for the analysis. A soft-thresholding power of 16 was selected to achieve an approximate scale-free topology. The clustering dendrogram was cut at a height of 0.6 to merge similar dynamic modules. The module-metabolite correlation matrix was computed using the Pearson correlation coefficient, with correlation coefficients greater than 0.8 or less than -0.8 considered as strong positive or negative correlations, respectively.
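To illustrate what a soft-thresholding power of 16 does, the sketch below applies the WGCNA power transformation to single correlation values and labels module-metabolite correlations with the ±0.8 cutoff. It assumes an unsigned network, where adjacency is the absolute correlation raised to the chosen power; whether a signed or unsigned network was used is not stated, and the function names are illustrative.

```python
def soft_threshold_adjacency(correlation, power=16):
    """Unsigned WGCNA-style adjacency: |r| raised to the soft-thresholding power."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return abs(correlation) ** power


def label_module_metabolite(correlation, cutoff=0.8):
    """Classify a module-metabolite Pearson correlation using the +/-0.8 cutoff."""
    if correlation > cutoff:
        return "strong positive"
    if correlation < -cutoff:
        return "strong negative"
    return "not strong"


# A high power keeps strong correlations and pushes moderate ones toward zero.
adjacency_strong = soft_threshold_adjacency(0.9)    # 0.9 ** 16, roughly 0.19
adjacency_moderate = soft_threshold_adjacency(0.5)  # 0.5 ** 16, roughly 1.5e-5
```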

Multiple cluster analysis and construction of gene-metabolite regulatory network

Linear relationships between genes and metabolites were constructed using Pearson’s correlation algorithm78. Non-linear relationships were quantified via the GINI coefficient, while machine learning was employed to rank the importance of gene-metabolite regulatory pairs. Only genes exhibiting |correlation| > 0.5 with at least one metabolite were retained for subsequent analysis. Both PCC and GINI coefficients between gene and metabolite expression levels were computed in R. Multiple cluster analysis was performed by standardizing transcriptomic and metabolomic datasets separately, followed by integrated co-regulation analysis37. Normalized expression values of genes and metabolites were calculated based on their expression levels across all samples. Co-regulation analysis was conducted, encompassing a total of 11 time points (22 samples in total for LF and HM), using MeV 4.9 with the k-means method120. Concurrently, metabolomic data underwent independent k-means clustering. The resulting metabolite clusters were subjected to co-expression/co-regulation analysis with genes to compute pairwise regulatory relationships. Regulatory pairs exhibiting inconsistency in gene-metabolite co-clustering or metabolite cluster-gene coregulation analyses were filtered out through PCC-based correction using Shell, Python and R scripts, yielding the final clustering solution. Regulatory relationships based on PCC and GINI coefficients were filtered and merged using custom Python scripts. All analyses were conducted in quadruplicate. Only regulatory relationships consistently detected in at least two biological replicates were retained. Expression trends of genes and metabolites were visualized separately in R. Hierarchical clustering (HCL) was performed using the prcomp function, enabling graphical interpretation of sample relatedness. Gene-metabolite regulatory networks were constructed with network topological properties computed using the NetworkX module in Python16.
Visualization of all gene-metabolite associations was achieved using Gephi and Cytoscape software121,122.
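The correlation and filtering steps can be sketched in plain Python as below. The exact Gini statistic is not specified in the text; the sketch assumes the rank-based Gini correlation coefficient, which is one common choice for capturing monotone non-linear relationships. The rule for merging PCC and GINI results is likewise not described and is therefore left out. All function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def gini_correlation(x, y):
    """Gini correlation of x with respect to the ranks of y (assumed variant)."""
    n = len(x)
    weights = [2 * i - n - 1 for i in range(1, n + 1)]
    x_sorted_by_x = sorted(x)
    x_sorted_by_y = [xv for _, xv in sorted(zip(y, x))]  # ties in y are broken by x
    denominator = sum(w * v for w, v in zip(weights, x_sorted_by_x))
    if denominator == 0:
        return 0.0
    return sum(w * v for w, v in zip(weights, x_sorted_by_y)) / denominator


def retain_genes(gene_profiles, metabolite_profiles, cutoff=0.5):
    """Keep genes whose |PCC| exceeds the cutoff with at least one metabolite."""
    return {
        gene for gene, profile in gene_profiles.items()
        if any(abs(pearson(profile, m)) > cutoff for m in metabolite_profiles.values())
    }


def consistent_pairs(pairs_per_run, min_support=2):
    """Keep gene-metabolite pairs detected in at least `min_support` runs."""
    counts = {}
    for pairs in pairs_per_run:
        for pair in set(pairs):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, count in counts.items() if count >= min_support}


genes = {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [1.0, -1.0, -1.0, 1.0]}
metabolites = {"met_1": [2.0, 4.0, 6.0, 8.0]}
kept_genes = retain_genes(genes, metabolites)
```

In this toy case a monotone but curved relationship, such as x = 1, 2, 3, 4 against y = 1, 4, 9, 16, gives a Gini correlation of exactly 1 while the Pearson coefficient stays below 1, which is the kind of relationship the second measure is meant to recover.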

Phylogenetic analysis

The amino acid sequences were aligned using MUSCLE (v3.8.1551, with default setting)123, and maximum-likelihood trees were built by IQ-TREE (v 2.2.2.3, settings: -m MFP -B 1000)124.

Expression analysis

Total RNA was extracted using TRIzol (Thermo Fisher Scientific, USA). First-strand cDNA was synthesized by ReverTra Ace (TOYOBO Biotech). RT‒qPCR was performed using Applied Biosystems QuantStudio 3 RealTime PCR System. Actin1 was used as internal reference to normalize the expression value of each sample. The primers used in this research are listed in Supplementary Data 16. Oligonucleotide primers were purchased from Sangon Biotech.
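Normalization to a reference gene is commonly done with the 2^−ΔΔCt calculation. The text states only that Actin1 served as the internal reference, so the sketch below assumes that method, assumes near-100% amplification efficiency, and uses hypothetical Ct values.

```python
def relative_expression(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Fold change of a target gene by the 2^-ddCt calculation (assumed method)."""
    delta_ct_sample = ct_target - ct_reference                   # normalize to the reference gene
    delta_ct_control = ct_target_control - ct_reference_control  # same for the control sample
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: target gene and Actin1 in a transgenic line and in WT.
fold_change = relative_expression(ct_target=22.0, ct_reference=18.0,
                                  ct_target_control=24.0, ct_reference_control=18.0)
```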

Protein subcellular localization

The coding sequences of NtMYB28, NtERF167, and NtCYC were introduced into the pVBG2300-GFP vector. Oligonucleotide primers were purchased from Sangon Biotech. These recombinant constructs were transformed into the Agrobacterium strain GV3101, and then suspended in an infiltration buffer (10 mM MgCl2, 10 mM MES, pH=5.7, 150 mM acetosyringone) at an OD600 of 0.5-0.8. The suspension was transferred into leaves of Nicotiana benthamiana125.

The nuclear localization marker NLS, linked to the mKATE red fluorescent protein (RFP), was co-transformed with pVBG2300-GFP. After infiltration, the Nicotiana benthamiana plants were incubated for 48–72 h. The fluorescent protein signals were visualized using a high-resolution laser confocal microscope.

Transcriptional activation assay

The coding sequences of NtMYB28, NtERF167, and NtCYC were inserted into the BamHI/SalI site of the pGBKT7 vector to create GAL4 BD constructs. Oligonucleotide primers were purchased from Sangon Biotech. The Matchmaker Gold Yeast One-Hybrid System (Clontech) was used. Competent Y2HGold cells were prepared with lithium acetate/PEG and transformed with 200 ng of each BD construct (pGBKT7-NtMYB28, pGBKT7-NtERF167, pGBKT7-NtCYC) or control vectors (pGBKT7-empty and pGBKT7-53). Transformants were selected on synthetic dropout (SD) medium lacking tryptophan (SD/-Trp) at 30 °C for 3 d. Three independent colonies per construct were streaked onto SD/-Trp agar (control) and SD/-Trp/-His/-Ade (stringent) supplemented with 5-bromo-4-chloro-3-indolyl-α-D-galactopyranoside (X-α-Gal, 40 µg mL⁻¹). Plates were incubated at 30 °C for 3-5 d and photographed. The trans-activation activities were evaluated based on the growth status of yeast cells and the activity of β-galactosidase126.

Yeast one-hybrid assay

Promoter sequences of Nt4CL2A, NtPAL2, NtLACS2 and NtLOX2 were inserted into the pAbAi vector. Oligonucleotide primers were purchased from Sangon Biotech. The constructs were transferred into the Y1H Gold yeast (S. cerevisiae) strain, and the yeast was then grown on SD/-Ura/-Leu medium with AbA (Aureobasidin A). The coding sequences of NtMYB28, NtERF167 and NtCYC were cloned into the pGADT7 vector and transferred into the Y1H Gold yeast strain containing pAbAi-Nt4CL2A/NtPAL2/NtLACS2/NtLOX2, respectively. Interactions of the TFs with the promoter fragments were tested on SD/-Ura/-Leu medium with the tested AbA concentration. Primers used are listed in Supplementary Data 16.

Transient dual-luciferase reporter assays

Promoter sequences of Nt4CL2A, NtPAL2, NtLACS2, and NtLOX2 were inserted into the pGreenII 0800 vector to construct reporter vectors (pro-LUC). The coding sequences of NtMYB28, NtERF167, and NtCYC were inserted into the effector vector pGreenII 62-SK. Oligonucleotide primers were purchased from Sangon Biotech. These reporter and effector vector combinations were transformed into Agrobacterium strain GV3101 cells. The Agrobacterium strains were then introduced into the leaves of Nicotiana benthamiana plants. Luminescence was determined using a live-imaging apparatus 2–5 days after infiltration127.

EMSA

The coding sequences of NtMYB28, NtERF167 and NtCYC were inserted into the pET28a vector to generate the recombinant His-NtMYB28, His-NtERF167 and His-NtCYC plasmids, which were subsequently transformed into E. coli BL21 (DE3) cells. For protein expression, a single colony was inoculated into 5 mL LB containing 50 µg mL⁻¹ kanamycin and grown at 37 °C overnight, then diluted 1:100 into 500 mL fresh LB and grown to OD₆₀₀ = 0.6. Expression was induced with 0.5 mM IPTG and continued for 6 h at 28 °C with shaking. Cells were harvested by centrifugation (4 °C, 6,000 × g, 10 min) and stored at -80 °C until purification126. About 5 μg of purified recombinant fusion protein was used per assay.

Conserved motifs of the Nt4CL2A, NtPAL2, NtLACS2 and NtLOX2 promoters were predicted using JASPAR. Oligonucleotide probes including predicted binding sites were synthesized with biotin at the 3′-hydroxyl end of the sense strand. The probes used for EMSA are listed in Supplementary Data 16. To validate the specificity of the shifted band, nuclear proteins were pre-incubated with a 50- or 100-fold excess of non-labeled identical or mutated oligonucleotides for 20 minutes before the addition of labeled probes126. The signal was detected using a kit (Beyotime, Jiangsu, China; GS009). Oligonucleotide primers and probes were purchased from Sangon Biotech.

Vector construction and generation of over-expression lines

To construct the NtMYB28, NtERF167 and NtCYC over-expression plasmids, the coding sequences of NtMYB28, NtERF167 and NtCYC were cloned into the Bsa I and Eco31 I sites of the pBWA(V)BS-ccdB binary vector using specific primers listed in Supplementary Data 16. Oligonucleotide primers were purchased from Sangon Biotech. The constructs were introduced into Agrobacterium tumefaciens strain GV3101, which was used to generate stable transgenic lines128. Briefly, disinfected tobacco seeds were sown on germination medium and cultured for 4-5 weeks. Sterile tobacco leaves were cut into small pieces and inoculated onto pre-culture medium. After 2-3 days of pre-culture, the leaf pieces were immersed in the Agrobacterium suspension for 10-15 minutes and then placed on co-cultivation medium. After co-cultivation for 2 days, the leaves were transferred to induction medium for callus induction. Callus tissues that met the selection criteria were inoculated onto the corresponding selective medium. Vigorously growing positive callus tissues were inoculated onto differentiation medium and cultured for 15-30 days at 23 °C with a 16 h/8 h light/dark cycle. Seedlings that formed from the callus during differentiation were transferred to the vigorous seedling medium and grown for 7-10 days to obtain the transformed tobacco lines.

CRISPR/Cas9 constructions and mutant genotyping

We generated knockout lines of NtMYB28, NtERF167 and NtCYC by the CRISPR/Cas9 technique129. Guide RNAs (gRNA) were designed using CRISPR-P 2.0130. The scaffold containing two gRNAs was amplified by PCR and inserted into the pKSE401 vector containing Cas9 gene131. Oligonucleotide primers were purchased from Sangon Biotech. The vector was transformed into Agrobacterium strain GV3101 for genetic transformation. Primers used for vector construction are listed in Supplementary Data 16. The derived constructs were transformed into tobacco as described above.

For genotyping of the ntmyb28, nterf167 and ntcyc mutants, the mutations were identified by Illumina resequencing. Two independent homozygous knockout lines without the Cas9 gene were selected for further analysis.

Immunoblot analysis

Total protein was extracted from frozen tissue (100 mg) in 500 µL RIPA buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 1 mM PMSF, 1× protease inhibitor cocktail) on ice for 30 min, cleared by centrifugation (12 000 × g, 15 min, 4 °C), and quantified with the BCA assay. For each lane, 30 µg protein was mixed with 4× Laemmli buffer, boiled (95 °C, 5 min), resolved on 10% SDS-PAGE (120 V, 90 min), and electro-transferred to 0.22 µm PVDF membranes (100 V, 60 min, 4 °C). Membranes were blocked with 5% non-fat milk in TBST (25 mM Tris, 150 mM NaCl, 0.1% Tween-20) for 1 h at room temperature, incubated overnight at 4 °C with primary antibodies (mouse anti-His 1:2000, ab18184, Abcam; rabbit anti-Actin 1:5000, AC038, Abclonal), washed with TBST, and incubated with HRP-conjugated secondary antibodies (goat anti-mouse or anti-rabbit, 1:10 000) for 1 h at room temperature. Membranes were then washed again and visualized with ECL substrate on a ChemiDoc imager82.
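
The loading arithmetic implied by this protocol can be sketched in a few lines of Python. This is an illustration only, not part of the published method: the function name `lane_volumes` and the example concentration are hypothetical, and the sketch assumes the lysate is mixed directly with the 4× buffer, with no extra diluent, so that the buffer makes up one quarter of the final volume.

```python
def lane_volumes(protein_ug_per_ul, target_ug=30.0, buffer_fold=4.0):
    """Return (lysate_ul, buffer_ul, total_ul) for one gel lane.

    A k-fold concentrated loading buffer is at 1x when it is 1/k of the
    final volume, so buffer volume = lysate volume / (k - 1).
    """
    if protein_ug_per_ul <= 0:
        raise ValueError("protein concentration must be positive")
    lysate_ul = target_ug / protein_ug_per_ul      # volume holding 30 ug protein
    buffer_ul = lysate_ul / (buffer_fold - 1.0)    # 4x buffer needed to reach 1x
    return lysate_ul, buffer_ul, lysate_ul + buffer_ul
```

For a lysate measured at 2 µg/µL by BCA (a made-up value), this gives 15 µL of lysate plus 5 µL of 4× buffer per lane.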

Determination of hydroxycinnamic acid and aroma components content by LC-MS/MS

The content of hydroxycinnamic acid and aroma components was measured using LC-MS/MS. 200 mg of frozen leaf powder (liquid-N₂ milled, -80 °C) was weighed into 2 mL Eppendorf tubes, mixed with 1 mL ice-cold methanol (LC-MS grade, Merck) and vortexed 30 s. The suspension was sonicated 1 h at 4 °C (Branson 5510, 40 kHz), centrifuged (15 min, 14 000 × g, 4 °C) and 800 µL supernatant transferred to a 1.5 mL tube. Solvent was removed in a Speed-Vac (30 °C, 6 h) and the residue reconstituted in 200 µL 50% acetonitrile/water (v/v) containing 0.1% formic acid. After vortexing and centrifugation (10 min, 14 000 × g) the solution was transferred to amber LC vials and kept at 4 °C in the autosampler (max 12 h until injection).
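
Because the extract is concentrated before injection, a concentration read from the vial has to be scaled back to tissue content. The sketch below works through that arithmetic with the volumes given above. It assumes, as a simplification not stated in the source, that the extraction volume stays at 1 mL and that no analyte is lost during drying and reconstitution; the function name is hypothetical.

```python
def tissue_content_ng_per_g(vial_ng_per_ml,
                            tissue_mg=200.0,
                            extraction_ml=1.0,
                            aliquot_ml=0.8,
                            reconstitution_ml=0.2):
    """Convert a vial concentration (ng/mL) to tissue content (ng/g)."""
    # Mass of analyte in the reconstituted 200 uL.
    analyte_in_vial_ng = vial_ng_per_ml * reconstitution_ml
    # Only 800 uL of the 1 mL extract was dried down, so scale up by 1/0.8.
    analyte_in_extract_ng = analyte_in_vial_ng * (extraction_ml / aliquot_ml)
    # Express per gram of the 200 mg of leaf powder extracted.
    return analyte_in_extract_ng / (tissue_mg / 1000.0)
```

Under these assumptions the overall factor is 1.25, so a vial reading of 100 ng/mL corresponds to 125 ng per g of leaf powder.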

Chromatography was performed on an Agilent 1290 Infinity UHPLC coupled to an AB Sciex 3200 QTRAP triple-quadrupole mass spectrometer equipped with a Turbo-V ion source. Separation was achieved on an Agilent ZORBAX Eclipse Plus C18 column (100 × 2.1 mm, 1.8 µm) thermostated at 25 °C. Mobile phases A (water) and B (acetonitrile) both contained 0.1% formic acid. The gradient delivered at 0.8 mL min⁻¹ was: 0-0.5 min 10% B, 0.5-15 min 10-70% B, 15-16 min 70% B, 16-17 min 70-10% B, followed by 3 min re-equilibration (total run time 20 min). Injection volume was 20 µL. The mass spectrometer operated in scheduled SRM mode with a 30 s window, dwell time 30 ms, pause time 5 ms. Source parameters: curtain gas 20 psi, ion-spray voltage +4500 V/ − 4500 V, temperature 550 °C, nebulizer gas 50 psi, heater gas 60 psi. Compound-dependent parameters (precursor/product ion pairs, declustering potential and collision energy) were optimized by direct infusion of 1 µg mL⁻¹ standards. Every analytical batch commenced with three conditioning QC injections; samples were randomized within blocks of 12.
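
The gradient programme above can be written as a table of breakpoints, which makes it easy to check the run time and the solvent composition at any point. The following is a minimal sketch, assuming linear ramps between breakpoints and re-equilibration at the starting 10% B (the source gives only the 3 min duration); the names `GRADIENT`, `percent_b` and `solvent_b_volume_ml` are illustrative.

```python
# Breakpoints (time in min, % mobile phase B) taken from the gradient text.
# The 17-20 min segment assumes re-equilibration at the starting 10% B.
GRADIENT = [(0.0, 10.0), (0.5, 10.0), (15.0, 70.0),
            (16.0, 70.0), (17.0, 10.0), (20.0, 10.0)]
FLOW_ML_PER_MIN = 0.8


def percent_b(t_min, table=GRADIENT):
    """Linearly interpolate %B at time t_min (minutes)."""
    if not table[0][0] <= t_min <= table[-1][0]:
        raise ValueError("time is outside the programmed run")
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: table must be sorted by time")


def solvent_b_volume_ml(table=GRADIENT, flow=FLOW_ML_PER_MIN):
    """Volume of B used per run: trapezoidal integration of %B over time."""
    area = sum((t1 - t0) * (b0 + b1) / 2.0
               for (t0, b0), (t1, b1) in zip(table, table[1:]))
    return flow * area / 100.0
```

With these breakpoints the programme ends at 20 min, matching the stated total run time, and each run consumes about 5.8 mL of mobile phase B out of 16 mL total flow.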

Matrix-matched calibration curves (0.5-500 ng mL⁻¹, R² ≥ 0.995) were prepared in extract of blank tissue. Limits of detection (LOD, S/N = 3) and quantification (LOQ, S/N = 10) ranged 0.03-0.15 ng mL⁻¹ and 0.10-0.50 ng mL⁻¹, respectively. Recovery (spike 10 and 100 ng g⁻¹) was 92-108% with RSD < 8%. Analyst 1.7.1 (AB Sciex) was used for peak integration; peaks were manually inspected and re-integrated when necessary. Technical replicates were averaged before statistical analysis.
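
The calibration and recovery figures above rest on a straight-line fit of peak area against concentration. The sketch below shows one way to compute the fit, R², back-calculated concentration and percent recovery in plain Python. It assumes an unweighted least-squares fit, which the source does not specify, and the calibration points are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def back_calculate(area, slope, intercept):
    """Concentration (ng/mL) corresponding to a measured peak area."""
    return (area - intercept) / slope


def recovery_percent(measured, spiked):
    """Recovery of a spiked amount, in percent."""
    return 100.0 * measured / spiked


# Hypothetical calibration levels (ng/mL) and peak areas, not data from the paper.
LEVELS = [0.5, 5.0, 50.0, 500.0]
AREAS = [110.0, 1010.0, 10010.0, 100010.0]
SLOPE, INTERCEPT, R2 = fit_line(LEVELS, AREAS)
```

A curve would pass the stated acceptance criterion when `R2 >= 0.995`, and a spike would fall inside the reported range when `recovery_percent` returns a value between 92 and 108.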

Lipid extraction

Fresh tobacco leaves were harvested, ground into powder, labeled and stored at -80 °C. Samples were weighed into labeled 2 mL EP tubes (200 ± 5 mg for samples with high water content, such as watermelon and tomato; 100 ± 3 mg for samples with low water content, such as leaves) and stored at -80 °C. To each weighed sample, 1.5 mL of dichloromethane:methanol (2:1, v/v) was added and the mixture was vortexed for 1 minute. Then 200 μL of water was added, and the sample was vortexed for a further 1 minute and left to stand for 1 minute. After centrifugation at 12,000 × g for 10 minutes, 1 mL of the lower (dichloromethane) layer was quantitatively transferred to a labeled 1.5 mL EP tube and dried under a stream of nitrogen. The dried sample was re-dissolved in 0.2 mL of reconstitution solvent by vortexing for 1 minute and standing for 1 minute, a cycle that was repeated five times. The supernatant was then transferred to sample vials.

Lipidomics data acquisition

Chromatography was performed on a Shimadzu Nexera X2 UHPLC (SIL-30AC autosampler, LC-30AD pumps, CTO-30A column oven) coupled to a SCIEX QTRAP 5500 equipped with a differential-mobility spectrometry (DMS) device (SelexION®) and an ESI Turbo-V source. Separation was achieved on a Phenomenex Kinetex C18 column (100 × 2.1 mm, 1.7 µm) maintained at 50 °C. Mobile phases were A = acetonitrile/water (6:4 v/v) + 10 mM ammonium formate + 0.1% formic acid and B = isopropanol/acetonitrile (9:1 v/v) + 10 mM ammonium formate + 0.1% formic acid. The gradient (total flow 0.4 mL min⁻¹) was: 0-2 min 20% B, 2-12 min 20-58% B, 12-13 min 58-65% B, 13-26 min 65-99% B, held 1 min, re-equilibrate 4 min at 20% B. DMS compensation voltage (CoV) and separation voltage (SV) were optimized per class. Source conditions were identical for all three acquisition methods: curtain gas 25 psi, collision gas (CAD) medium, ion-spray voltage +5 500 V/ − 4 500 V, temperature 150 °C, nebulizer gas 50 psi, heater gas 60 psi. Scheduled MRM was employed with a 30 s detection window, dwell time 20 ms and pause time 5 ms. Each transition was acquired 20 times to ensure < 5% CV. Three complementary methods (Lipidyzer™ v3.0) were applied: Method-1 (phosphatidylcholines, lysophosphatidylcholines, sphingomyelins, 42 µL injection, DMS on, SV 3 700 V), Method-2 (triacylglycerols, diacylglycerols, 50 µL injection, DMS off) and Method-3 (phosphatidylethanolamines, phosphatidylserines, phosphatidylglycerols, phosphatidic acids, phosphatidylinositols, lysophosphatidylethanolamines, lysophosphatidylserines, lysophosphatidylinositols, lysophosphatidic acids, 39 µL injection, DMS on, SV 3 700 V). Mass spectrometer tuning was performed daily using 40 µL SPLASH mix, 50 µL SCIEX PEG tuning mix and 50 µL Lyso-tune mix (17:1 LPG, 17:1 LPS, 17:1 LPI, 16:0 LPA, final 1 mg mL⁻¹ except LPA 10 µg mL⁻¹). Calibration curves (0.1-2000 ng mL⁻¹, R² ≥0.995) were generated for 1 150 lipid species using class-specific internal standards. 
LOD (S/N = 3) and LOQ (S/N = 10) ranged 0.02-0.80 ng mL⁻¹ and 0.05-2.0 ng mL⁻¹, respectively. Recovery (spike 20 and 200 ng mL⁻¹) was 92-108% with RSD < 8%. Intra-day precision (n = 6) and inter-day precision (three consecutive days) were < 6% and < 9%, respectively132. MultiQuant 3.0.3 (SCIEX) was used for peak integration with automatic smoothing (Gaussian, 3 points) and baseline subtraction. Lipid species were normalized to their respective class-internal standards and expressed as nmol g⁻¹ dry weight132.
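
The normalization and precision figures reported above can be illustrated with a short sketch. It assumes single-point quantification against a class-specific internal standard with a response factor of 1, which is a simplification of the calibration-curve approach described in the text; the function names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def lipid_nmol_per_g(area_lipid, area_is, is_nmol, dry_weight_g):
    """Lipid content (nmol per g dry weight) from an internal-standard ratio.

    area_is is the peak area of the class-specific internal standard and
    is_nmol is the amount of that standard added to the sample.
    """
    if area_is <= 0 or dry_weight_g <= 0:
        raise ValueError("internal-standard area and dry weight must be positive")
    return (area_lipid / area_is) * is_nmol / dry_weight_g


def cv_percent(values):
    """Coefficient of variation (%), using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)
```

For example, a lipid peak half the size of its internal standard, with 2 nmol of standard added to 0.1 g of dry tissue, corresponds to 10 nmol g⁻¹. `cv_percent` applied to six same-day replicates would give the intra-day precision in the form reported above.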

Physiological and biochemical parameters measurement

MDA content was used as an indicator of lipid peroxidation133. Fifty milligrams of fresh leaf tissue was homogenized in 1 mL of 80% (v/v) ethanol. Following centrifugation, the supernatant was reacted with thiobarbituric acid to produce the pinkish-red thiobarbituric acid-MDA chromogen. Absorbance was measured using a UV-vis spectrophotometer, and MDA content was calculated as nmol/g fresh weight.

To determine the content of H₂O₂, 1 g of tissue was ground to powder in liquid nitrogen and 2 mL of 100 mmol L⁻¹ K-phosphate buffer (pH 6.8, containing 0.1 mmol L⁻¹ EDTA) was added. Total soluble protein content was measured by the Bradford method134. The content of H₂O₂ was measured using the Ampliflu Red (Sigma-Aldrich, St. Louis, MO, USA) method135.

For the determination of chlorophyll content, 0.5 g of leaf tissue was frozen and ground in liquid nitrogen, then extracted in 95% ethanol at 4 °C for 24 h in the dark136. The absorbance of the supernatant was measured at 665, 649 and 470 nm, and the content was calculated using the formula proposed by Bradford method134.
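
The source does not reproduce the equations used to convert these three absorbances into pigment content. As an illustration only, the sketch below uses the coefficients widely attributed to Lichtenthaler and Wellburn for 95% ethanol extracts read at 665, 649 and 470 nm; whether the authors used exactly these coefficients is an assumption. The extract volume is not given in the text, so it is left as a required parameter, and the function names are hypothetical.

```python
def pigment_conc_mg_per_l(a665, a649, a470):
    """Return (chl_a, chl_b, carotenoids) in mg/L of extract.

    Coefficients are the commonly cited ones for 95% ethanol; they are
    assumed here, not taken from the paper.
    """
    chl_a = 13.95 * a665 - 6.88 * a649
    chl_b = 24.96 * a649 - 7.32 * a665
    carotenoids = (1000.0 * a470 - 2.05 * chl_a - 114.8 * chl_b) / 245.0
    return chl_a, chl_b, carotenoids


def content_mg_per_g(conc_mg_per_l, extract_volume_ml, fresh_weight_g=0.5):
    """Scale an extract concentration to mg pigment per g fresh weight."""
    return conc_mg_per_l * (extract_volume_ml / 1000.0) / fresh_weight_g
```

With made-up readings of A665 = 0.5, A649 = 0.2 and A470 = 0.3, the sketch gives about 5.60 mg/L chlorophyll a and 1.33 mg/L chlorophyll b. For a hypothetical 25 mL extract of the 0.5 g sample, that is about 0.28 mg chlorophyll a per g fresh weight.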

Statistics and reproducibility

All statistical analyses in this study were conducted using R software137. Two-tailed Student’s t test, the hypergeometric test or Tukey’s test was applied as appropriate. Sample sizes were not predetermined using statistical methods. For metabolic and physiological assessments, a minimum of three completely independent experiments were conducted, each including at least three biological replicates. For the RNA-seq and metabolome data, three biological experiments were conducted for each dataset.
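
For readers unfamiliar with the hypergeometric test named above, the sketch below computes a one-sided enrichment P value in Python. It is an illustration of the test itself, not the authors' R code, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: size of the background set
    K: background items carrying the annotation
    n: size of the selected set
    k: selected items carrying the annotation
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lower = max(k, 0)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return hits / comb(N, n)
```

For instance, drawing 3 items from a background of 10 in which 4 are annotated, the probability of seeing at least 2 annotated items is 40/120, or one third.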

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Acknowledgements

This research was financially supported by the National Key R&D Program of China projects (2023YFA0915800 to J.B.Y.); the National Natural Science Foundation of China (Grant No. 32425011, 32488302 to J.B.Y., 32502746 to J.M.L.); Project grants 19-23Aa01 to J.S.Y.; Beijing Life Science Academy (BLSA) (2024200CA0020 to J.B.Y.); China National Tobacco Corporation (110202102025 to J.B.Y.); HN2021KJ03 to J.S.Y.; HN2023KJ04 to J.S.Y.; HN2024KJ01 to J.S.Y.; 110202101037 (JY-14) to J.S.Y.; the Agricultural Science and Technology Innovation Program (ASTIP, No.CAAS-CSIAF-202302) to J.B.Y., and the New Cornerstone Science Foundation through the XPLORER PRIZE to J.B.Y.

Author contributions

J.B.Y. and J.S.Y. supervised the entire project. J.M.L., J.S.Y., Q.G.L. and H.N.Z. designed the experiments. J.M.L., J.S.Y., Q.G.L., H.N.Z., Y.Y.L., Z.R.H., B.Y., P.P.L., Q.X.Z., S.S., Y.J.L., S.L.W., T.B.L., Q.Z.X., S.H.D., J.P.G., X.X.L. and S.B.W. performed parts of the experiments. R.S.H., W.X.P., H.Q.X., Z.C.Z. and Z.S.L. advised on the project. W.X.P. provided plant materials. J.M.L. analyzed the data and wrote the manuscript. J.B.Y. and J.S.Y. revised the manuscript. All authors discussed and commented on the manuscript.

Peer review

Peer review information

Nature Communications thanks Peter Waterhouse, who co-reviewed with Leila Asadyar, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

The raw transcriptome data for GMRN construction were deposited in the Sequence Read Archive (SRA) of the NCBI under BioProject accession PRJNA1112100. The remaining raw transcriptome data for the WT, OE or KO tobacco lines are available at the NCBI under BioProject accession PRJNA1112565. All metabolomics and lipidomics data generated in this study have been deposited to the EMBL-EBI MetaboLights database under accession code MTBLS13081 (GMRN metabolomics), MTBLS13060 (hydroxycinnamic acid), MTBLS13071 (lipidomics), and MTBLS13072 (aroma components). Source data are provided with this paper.

Code availability

The Python, Perl and R scripts used to calculate coefficients, merge multiple clusters, construct the gene-metabolite regulatory network and visualize the results are publicly accessible on GitHub (https://github.com/SIMON8423Lee/Genome-scale-Metabolic-Regulatory-Network). A permanent version of all code is available at Zenodo (10.5281/zenodo.17252227)138.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Jiaming Li, Qinggang Liao.

Contributor Information

Jiashuo Yang, Email: yangjiasuo@163.com.

Jianbin Yan, Email: jianbinlab@caas.cn.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-025-65299-6.

References

  • 1. Kosmacz, M., Sokołowska, E. M., Bouzaa, S. & Skirycz, A. Towards a functional understanding of the plant metabolome. Curr. Opin. Plant Biol. 55, 47–51 (2020).
  • 2. Shen, S., Zhan, C., Yang, C., Fernie, A. R. & Luo, J. Metabolomics-centered mining of plant metabolic diversity and function: Past decade and future perspectives. Mol. Plant 16, 43–63 (2023).
  • 3. Fernie, A. R. & Tohge, T. The genetics of plant metabolism. Annu. Rev. Genet. 51, 287–310 (2017).
  • 4. Park, Y. H. et al. Microalgal secondary metabolite productions as a component of biorefinery: A review. Bioresour. Technol. 344, 126206 (2022).
  • 5. Shumbe, L., Bott, R. & Havaux, M. Dihydroactinidiolide, a high light-induced β-carotene derivative that can regulate gene expression and photoacclimation in Arabidopsis. Mol. Plant 7, 1248–1251 (2014).
  • 6. Luo, J. Metabolite-based genome-wide association studies in plants. Curr. Opin. Plant Biol. 24, 31–38 (2015).
  • 7. Molina-Hidalgo, F. J. et al. Engineering metabolism in Nicotiana species: a promising future. Trends Biotechnol. 39, 901–913 (2021).
  • 8. Li, R. et al. High-resolution genome mapping and functional dissection of chlorogenic acid production in Lonicera maackii. Plant Physiol. 192, 2902–2922 (2023).
  • 9. Vicente, G., Martinez, M. & Aracil, J. Optimisation of integrated biodiesel production. Part I. A study of the biodiesel purity and yield. Bioresour. Technol. 98, 1724–1733 (2007).
  • 10. Andrianov, V. et al. Tobacco as a production platform for biofuel: overexpression of Arabidopsis DGAT and LEC2 genes increases accumulation and shifts the composition of lipids in green biomass. Plant Biotechnol. J. 8, 277–287 (2010).
  • 11. Liang, M. H., He, Y. J., Liu, D. M. & Jiang, J. G. Regulation of carotenoid degradation and production of apocarotenoids in natural and engineered organisms. Crit. Rev. Biotechnol. 41, 513–534 (2021).
  • 12. Gong, X., Li, F., Liang, Y., Han, X. & Wen, M. Characteristics of NtCCD1-3 from tobacco, and protein engineering of the CCD1 to enhance β-ionone production in yeast. Front. Microbiol. 13, 1011297 (2022).
  • 13. Bally, J. et al. The rise and rise of Nicotiana benthamiana: a plant for all reasons. Annu. Rev. Phytopathol. 56, 405–426 (2018).
  • 14. Dong, N. Q. & Lin, H. X. Contribution of phenylpropanoid metabolism to plant development and plant–environment interactions. J. Integr. Plant Biol. 63, 180–209 (2021).
  • 15. Lu, H., Tian, Z., Cui, Y., Liu, Z. & Ma, X. Chlorogenic acid: A comprehensive review of the dietary sources, processing effects, bioavailability, beneficial properties, mechanisms of action, and future directions. Compr. Rev. Food Sci. Food Saf. 19, 3130–3158 (2020).
  • 16. Yang, C. et al. Rice metabolic regulatory network spanning the entire life cycle. Mol. Plant 15, 258–275 (2022).
  • 17. Chezem, W. R., Memon, A., Li, F.-S., Weng, J.-K. & Clay, N. K. SG2-type R2R3-MYB transcription factor MYB15 controls defense-induced lignification and basal immunity in Arabidopsis. Plant Cell 29, 1907–1926 (2017).
  • 18. Muñoz, C. F. et al. Genetic engineering of microalgae for enhanced lipid production. Biotechnol. Adv. 52, 107836 (2021).
  • 19. Yang, Y., Chaffin, T. A., Ahkami, A. H., Blumwald, E. & Stewart, C. N. Plant synthetic biology innovations for biofuels and bioproducts. Trends Biotechnol. 40, 1454–1468 (2022).
  • 20. Lersten, N. R., Czlapinski, A. R., Curtis, J. D., Freckmann, R. & Horner, H. T. Oil bodies in leaf mesophyll cells of angiosperms: overview and a selected survey. Am. J. Bot. 93, 1731–1739 (2006).
  • 21. Maffei, M. E., Gertsch, J. & Appendino, G. Plant volatiles: production, function and pharmacology. Nat. Prod. Rep. 28, 1359–1380 (2011).
  • 22. Pichersky, E. & Gershenzon, J. The formation and function of plant volatiles: perfumes for pollinator attraction and defense. Curr. Opin. Plant Biol. 5, 237–243 (2002).
  • 23. Gayen, D., Ali, N., Sarkar, S. N., Datta, S. K. & Datta, K. Down-regulation of lipoxygenase gene reduces degradation of carotenoids of golden rice during storage. Planta 242, 353–363 (2015).
  • 24. Murata, M., Kobayashi, T. & Seo, S. α-Ionone, an apocarotenoid, induces plant resistance to western flower thrips, Frankliniella occidentalis, independently of jasmonic acid. Molecules 25, 17 (2019).
  • 25. Milošević, T., Argyropoulou, C., Solujić, S., Murat-Spahić, D. & Skaltsa, H. Chemical composition and antimicrobial activity of essential oils from Centaurea pannonica and C. jacea. Nat. Prod. Commun. 5, 1934578X1000501030 (2010).
  • 26. Williams, A. Rose ketones: Celebrating 30 years of success. Perfum. Flavorist 27, 18–31 (2002).
  • 27. Zhu, X. et al. Synthetic biology of plant natural products: From pathway elucidation to engineered biosynthesis in plant cells. Plant Commun. 2, 100229 (2021).
  • 28. Walley, J. W. et al. Integration of omic networks in a developmental atlas of maize. Science 353, 814–818 (2016).
  • 29. Civelek, M. & Lusis, A. J. Systems genetics approaches to understand complex traits. Nat. Rev. Genet. 15, 34–48 (2014).
  • 30. Morabito, A., De Simone, G., Pastorelli, R., Brunelli, L. & Ferrario, M. Algorithms and tools for data-driven omics integration to achieve multilayer biological insights: a narrative review. J. Transl. Med. 23, 425 (2025).
  • 31. Zhu, W. et al. A translatome-transcriptome multi-omics gene regulatory network reveals the complicated functional landscape of maize. Genome Biol. 24, 1–26 (2023).
  • 32. Karjalainen, M. K. et al. Genome-wide characterization of circulating metabolic biomarkers. Nature 628, 130–138 (2024).
  • 33. Tohge, T. & Fernie, A. R. Combining genetic diversity, informatics and metabolomics to facilitate annotation of plant gene function. Nat. Protoc. 5, 1210–1227 (2010).
  • 34. Han, L. et al. A multi-omics integrative network map of maize. Nat. Genet. 55, 144–153 (2023).
  • 35. Wang, D. et al. An integrated analysis of transcriptome and metabolome provides insights into the responses of maize (Zea mays L.) roots to different straw and fertilizer conditions. Environ. Exp. Bot. 194, 104732 (2022).
  • 36. Shen, L. et al. A transcriptional atlas identifies key regulators and networks for the development of spike tissues in barley. Cell Rep. 42, 113441 (2023).
  • 37. Song, R. et al. Unraveling the regulatory network of barley grain metabolism through the integrative analysis of multiomics and mQTL. Nat. Commun. 16, 5544 (2025).
  • 38. Xu, C. et al. Integrative metabolomic and transcriptomic analyses reveal the mechanisms of Tibetan hulless barley grain coloration. Front. Plant Sci. 13, 1038625 (2022).
  • 39. Zhu, G. et al. Rewiring of the fruit metabolome in tomato breeding. Cell 172, 249–261.e12 (2018).
  • 40. Li, Y. et al. MicroTom metabolic network: rewiring tomato metabolic regulatory network throughout the growth cycle. Mol. Plant 13, 1203–1218 (2020).
  • 41. Tang, J. et al. Integrated transcriptomics and metabolomics analyses reveal the molecular mechanisms of red-light on carotenoids biosynthesis in tomato fruit. Food Qual. Saf. 6, fyac009 (2022).
  • 42. Liu, Z. et al. Cotton metabolism regulatory network: Unraveling key genes and pathways in fiber development and growth regulation. Plant Commun. 6, 101221 (2025).
  • 43. Shu, P. et al. A comprehensive metabolic map reveals major quality regulations in red-flesh kiwifruit (Actinidia chinensis). N. Phytol. 238, 2064–2079 (2023).
  • 44. Zeng, Z. et al. Kiwifruit spatiotemporal multiomics networks uncover key tissue-specific regulatory processes throughout the life cycle. Plant Physiol. 197, kiae567 (2025).
  • 45. Chen, Y. et al. Multiomic analyses reveal key sectors of jasmonate-mediated defense responses in rice. Plant Cell 36, 3362–3377 (2024).
  • 46. Li, D. et al. Integrative multi-omics analysis reveals genetic and heterotic contributions to male fertility and yield in potato. Nat. Commun. 15, 8652 (2024).
  • 47. Liu, P. et al. Integrating transcriptome and metabolome reveals molecular networks involved in genetic and environmental variation in tobacco. DNA Res. 27, dsaa006 (2020).
  • 48. Gu, K. et al. The physiological response of different tobacco varieties to chilling stress during the vigorous growing period. Sci. Rep. 11, 22136 (2021).
  • 49. Kurt, D. & Kinay, A. Effects of irrigation, nitrogen forms and topping on sun cured tobacco. Ind. Crops Prod. 162, 113276 (2021).
  • 50. Lu, J. et al. Constitutive activation of nitrate reductase in tobacco alters flowering time and plant biomass. Sci. Rep. 11, 4222 (2021).
  • 51. Basha, S. J., Bai, P. P., Krishna, S. K. & Rao, C. C. Effect of nitrogen and topping on performance of bidi tobacco (Nicotiana tabacum L.) varieties under rainfed conditions. Ann. Plant Soil Res. 22, 415–419 (2020).
  • 52. Lei, B. et al. Nitrogen application and differences in leaf number retained after topping affect the tobacco (Nicotiana tabacum) transcriptome and metabolome. BMC Plant Biol. 22, 38 (2022).
  • 53. Gu, K. et al. Effects of topping and non-topping on growth-regulating hormones of flue-cured tobacco (Nicotiana tabacum L.)—a proteomic analysis. Front. Plant Sci. 14, 1255252 (2023).
  • 54. Pourcel, L., Routaboul, J. M., Cheynier, V., Lepiniec, L. & Debeaujon, I. Flavonoid oxidation in plants: from biochemical properties to physiological functions. Trends Plant Sci. 12, 29–36 (2007).
  • 55. Shomali, A. et al. Diverse physiological roles of flavonoids in plant environmental stress responses and tolerance. Plants 11, 3158 (2022).
  • 56. Lin, X. et al. Systemic identification of wheat spike development regulators by integrated multi-omics, transcriptional network, GWAS and genetic analyses. Mol. Plant 17, 438–459 (2024).
  • 57. Kannangara, R. et al. The transcription factor WIN1/SHN1 regulates cutin biosynthesis in Arabidopsis thaliana. Plant Cell 19, 1278–1294 (2007).
  • 58. Yang, S. U., Kim, H., Kim, R. J., Kim, J. & Suh, M. C. AP2/DREB transcription factor RAP2.4 activates cuticular wax biosynthesis in Arabidopsis leaves under drought. Front. Plant Sci. 11, 895 (2020).
  • 59. Yang, W., Xin, Z., Zhang, Q., Zhang, Y. & Niu, L. The tree peony DREB transcription factor PrDREB2D regulates seed α-linolenic acid accumulation. Plant Physiol. 195, 745–761 (2024).
  • 60. Chen, X. et al. ERF49 mediates brassinosteroid regulation of heat stress tolerance in Arabidopsis thaliana. BMC Biol. 20, 254 (2022).
  • 61. Chen, H. et al. Transcription factor CrWRKY42 coregulates chlorophyll degradation and carotenoid biosynthesis in citrus. Plant Physiol. 195, 728–744 (2024).
  • 62. Liu, W. et al. MdWRKY11 participates in anthocyanin accumulation in red-fleshed apples by affecting MYB transcription factors and the photoresponse factor MdHY5. J. Agric. Food Chem. 67, 8783–8793 (2019).
  • 63. Wen, W. et al. MsWRKY11, activated by MsWRKY22, functions in drought tolerance and modulates lignin biosynthesis in alfalfa (Medicago sativa L.). Environ. Exp. Bot. 184, 104373 (2021).
  • 64. Wang, Z. et al. Molecular cloning and functional characterization of NtWRKY11b in promoting the biosynthesis of flavonols in Nicotiana tabacum. Plant Sci. 304, 110799 (2021).
  • 65. Jiang, C.-H. et al. Transcription factors WRKY70 and WRKY11 served as regulators in rhizobacterium Bacillus cereus AR156-induced systemic resistance to Pseudomonas syringae pv. tomato DC3000 in Arabidopsis. J. Exp. Bot. 67, 157–174 (2016).
  • 66. Vogt, T. Phenylpropanoid biosynthesis. Mol. Plant 3, 2–20 (2010).
  • 67. Dubos, C. et al. MYB transcription factors in Arabidopsis. Trends Plant Sci. 15, 573–581 (2010).
  • 68. Cataldo, V. F., López, J., Cárcamo, M. & Agosin, E. Chemical vs. biotechnological synthesis of C13-apocarotenoids: Current methods, applications and perspectives. Appl. Microbiol. Biotechnol. 100, 5703–5718 (2016).
  • 69. Jiao, P. et al. ZmTCP14, a TCP transcription factor, modulates drought stress response in Zea mays L. Environ. Exp. Bot. 208, 105232 (2023).
  • 70. Wang, W.-Q., Xu, D.-Y., Sui, Y.-P., Ding, X.-H. & Song, X.-J. A multiomic study uncovers a bZIP23-PER1A–mediated detoxification pathway to enhance seed vigor in rice. Proc. Natl Acad. Sci. USA 119, e2026355119 (2022).
  • 71. McClune, C. J. et al. Discovery of FoTO1 and Taxol genes enables biosynthesis of baccatin III. Nature 643, 582–592 (2025).
  • 72. Liu, S. et al. UV-B promotes flavonoid biosynthesis in Ginkgo biloba by inducing the GbHY5-GbMYB1-GbFLS module. Hortic. Res. 10, uhad118 (2023).
  • 73. Zhang, F. et al. OsRLCK160 contributes to flavonoid accumulation and UV-B tolerance by regulating OsbZIP48 in rice. Sci. China Life Sci. 65, 1380–1394 (2022).
  • 74. Cai, G. et al. Functional characterization of transcription factor WIN1 genes associated with lipid biosynthesis and stress tolerance in soybean (Glycine max). Environ. Exp. Bot. 200, 104916 (2022).
  • 75. O’Connor, S. E. Engineering of secondary metabolism. Annu. Rev. Genet. 49, 71–94 (2015).
  • 76. Yan, P. et al. Biofortification of iron content by regulating a NAC transcription factor in maize. Science 382, 1159–1165 (2023).
  • 77. Zhao, K. & Rhee, S. Y. Omics-guided metabolic pathway discovery in plants: Resources, approaches, and opportunities. Curr. Opin. Plant Biol. 67, 102222 (2022).
  • 78. Bishara, A. J. & Hittner, J. B. Testing the significance of a correlation with nonnormal data: comparison of Pearson, Spearman, transformation, and resampling approaches. Psychol. Methods 17, 399 (2012).
  • 79.Trewavas, A. A brief history of systems biology. Plant Cell18, 2420–2430 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Chen, Y. et al. A wheat integrative regulatory network from large-scale complementary functional datasets enables trait-associated gene discovery for crop improvement. Mol. Plant16, 393–414 (2023). [DOI] [PubMed] [Google Scholar]
  • 81.Yang, B. et al. CAT Bridge: an efficient toolkit for gene–metabolite association mining from multiomics data. GigaScience13, giae083 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Xie, Y. et al. FHY3 and FAR1 integrate light signals with the miR156-SPL module-mediated aging pathway to regulate Arabidopsis flowering. Mol. Plant13, 483–498 (2020). [DOI] [PubMed] [Google Scholar]
  • 83.Li, W. et al. A natural allele of a transcription factor in rice confers broad-spectrum blast resistance. Cell170, 114–126. e15 (2017). [DOI] [PubMed] [Google Scholar]
  • 84.Tang, N. et al. A R2R3-MYB transcriptional activator LmMYB15 regulates chlorogenic acid biosynthesis and phenylpropanoid metabolism in Lonicera macranthoides. Plant Sci.308, 110924 (2021). [DOI] [PubMed] [Google Scholar]
  • 85.Kishi-Kaboshi, M., Seo, S., Takahashi, A. & Hirochika, H. The MAMP-responsive MYB transcription factors MYB30, MYB55 and MYB110 activate the HCAA synthesis pathway and enhance immunity in rice. Plant Cell Physiol.59, 903–915 (2018). [DOI] [PubMed] [Google Scholar]
  • 86.Wang, Z. et al. Transcription factor NtWRKY33a modulates the biosynthesis of polyphenols by targeting NtMYB4 and NtHCT genes in tobacco. Plant Sci.326, 111522 (2023). [DOI] [PubMed] [Google Scholar]
  • 87.Xiao, P. et al. Transcriptome and metabolome atlas reveals contributions of sphingosine and chlorogenic acid to cold tolerance in Citrus. Plant Physiol.196, 634–650 (2024). [DOI] [PubMed] [Google Scholar]
  • 88.Wang, Z. et al. NtMYB27 acts downstream of NtBES1 to modulate flavonoids accumulation in response to UV-B radiation in tobacco. Plant J.119, 2867–2884 (2024). [DOI] [PubMed] [Google Scholar]
  • 89.Li, C., Ng, C. K.-Y. & Fan, L.-M. MYB transcription factors, active players in abiotic stress signaling. Environ. Exp. Bot.114, 80–91 (2015). [Google Scholar]
  • 90.Koiwai, A., Suzuki, F., Matsuzaki, T. & Kawashima, N. The fatty acid composition of seeds and leaves of Nicotiana species. Phytochemistry22, 1409–1412 (1983). [Google Scholar]
  • 91.Liu, X. et al. Plant lipid remodeling in response to abiotic stresses. Environ. Exp. Bot.165, 174–184 (2019). [Google Scholar]
  • 92.Hou, Q., Ufer, G. & Bartels, D. Lipid signalling in plant responses to abiotic stress. Plant Cell Environ.39, 1029–1048 (2016). [DOI] [PubMed] [Google Scholar]
  • 93.Yu, L., Zhou, C., Fan, J., Shanklin, J. & Xu, C. Mechanisms and functions of membrane lipid remodeling in plants. Plant J.107, 37–53 (2021). [DOI] [PubMed] [Google Scholar]
  • 94.González-Valenzuela, L., Renard, J., Depège-Fargeix, N. & Ingram, G. The plant cuticle. Curr. Biol.33, R210–R214 (2023). [DOI] [PubMed] [Google Scholar]
  • 95. Ding, L.-N. et al. Long-chain acyl-CoA synthetase 2 is involved in seed oil production in Brassica napus. BMC Plant Biol. 20, 1–14 (2020).
  • 96. Trinh, D.-C. et al. PUCHI regulates very long chain fatty acid biosynthesis during lateral root and callus formation. Proc. Natl Acad. Sci. USA 116, 14325–14330 (2019).
  • 97. Tsuchiya, Y., Katsuhara, M., Sasaki, T. & Yamamoto, Y. Oxygen supply is a prerequisite for response to aluminum in cultured cells of tobacco (Nicotiana tabacum). Plant Cell Physiol. 66, 1044–1060 (2025).
  • 98. Gao, Y. et al. Molecular characterization and systematic analysis of NtAP2/ERF in tobacco and functional determination of NtRAV-4 under drought stress. Plant Physiol. Biochem. 156, 420–435 (2020).
  • 99. Sun, Y. et al. The CsDof1.8–CsLIPOXYGENASE09 module regulates C9 aroma production in cucumber. Plant Physiol. 196, 338–351 (2024).
  • 100. Han, Y. et al. Mechanism of floral scent production in Osmanthus fragrans and the production and regulation of its key floral constituents, β-ionone and linalool. Hortic. Res. 6, 106 (2019).
  • 101. Schommer, C. et al. Control of jasmonate biosynthesis and senescence by miR319 targets. PLoS Biol. 6, e230 (2008).
  • 102. Yang, X. et al. CYCLOIDEA-like genes control floral symmetry, floral orientation, and nectar guide patterning. Plant Cell 35, 2799–2820 (2023).
  • 103. Hao, J. et al. GrTCP11, a cotton TCP transcription factor, inhibits root hair elongation by down-regulating jasmonic acid pathway in Arabidopsis thaliana. Front. Plant Sci. 12, 769675 (2021).
  • 104. Shen, F. et al. The CIN-TCP transcription factors regulate endocycle progression and pavement cell size by promoting cell wall pectin degradation. Nat. Commun. 16, 4108 (2025).
  • 105. Lopez, J. A., Sun, Y., Blair, P. B. & Mukhtar, M. S. TCP three-way handshake: linking developmental processes with plant immunity. Trends Plant Sci. 20, 238–245 (2015).
  • 106. Urano, K. et al. CIN-like TCP13 is essential for plant growth regulation under dehydration stress. Plant Mol. Biol. 108, 257–275 (2022).
  • 107. Gong, X. et al. Energy budget for tomato plants grown in a greenhouse in northern China. Agric. Water Manag. 255, 107039 (2021).
  • 108. Li, Y. et al. Cold stress in the harvest period: effects on tobacco leaf quality and curing characteristics. BMC Plant Biol. 21, 1–15 (2021).
  • 109. Esfahani, M., Abbasi, H. A., Rabiei, B. & Kavousi, M. Improvement of nitrogen management in rice paddy fields using chlorophyll meter (SPAD). Paddy Water Environ. 6, 181–188 (2008).
  • 110. Hu, Z. et al. Integrative analysis of transcriptome and metabolome provides insights into the underlying mechanism of cold stress response and recovery in two tobacco cultivars. Environ. Exp. Bot. 200, 104920 (2022).
  • 111. Zhao, Y. et al. Investigation of the relationship between the metabolic profile of tobacco leaves in different planting regions and climate factors using a pseudotargeted method based on gas chromatography/mass spectrometry. J. Proteome Res. 12, 5072–5083 (2013).
  • 112. Xu, G. et al. Metabolic engineering of a 1,8-cineole synthase enhances aphid repellence and increases trichome density in transgenic tobacco (Nicotiana tabacum L.). Pest Manag. Sci. 79, 3342–3353 (2023).
  • 113. Li, L. et al. Lipidome and metabolome analysis of fresh tobacco leaves in different geographical regions using liquid chromatography–mass spectrometry. Anal. Bioanal. Chem. 407, 5009–5020 (2015).
  • 114. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
  • 115. López-Ibáñez, J., Pazos, F. & Chagoyen, M. MBROLE 2.0—functional enrichment of chemical compounds. Nucleic Acids Res. 44, W201–W204 (2016).
  • 116. Sun, Y. et al. Divergence in the ABA gene regulatory network underlies differential growth control. Nat. Plants 8, 549–560 (2022).
  • 117. Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
  • 118. Kumar, L. & Futschik, M. E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5 (2007).
  • 119. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 1–13 (2008).
  • 120. Gasch, A. P. & Eisen, M. B. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol. 3, 1–22 (2002).
  • 121. Kohl, M., Wiese, S. & Warscheid, B. Cytoscape: software for visualization and analysis of biological networks. Data Min. Proteom. 696, 291–303 (2011).
  • 122. Bastian, M., Heymann, S. & Jacomy, M. Gephi: an open source software for exploring and manipulating networks. Proc. Int. AAAI Conf. Web Soc. Media 3, 361–362 (2009).
  • 123. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
  • 124. Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
  • 125. Sleight, S. C., Bartley, B. A., Lieviant, J. A. & Sauro, H. M. In-Fusion BioBrick assembly and re-engineering. Nucleic Acids Res. 38, 2624–2636 (2010).
  • 126. Wang, S. et al. The R2R3-MYB transcription factor FaMYB63 participates in regulation of eugenol production in strawberry. Plant Physiol. 188, 2146–2165 (2022).
  • 127. Hellens, R. P., Edwards, E. A., Leyland, N. R., Bean, S. & Mullineaux, P. M. pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol. Biol. 42, 819–832 (2000).
  • 128. Horsch, R. et al. A simple and general method for transferring genes into plants. Science 227, 1229–1231 (1985).
  • 129. Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262–1278 (2014).
  • 130. Liu, H. et al. CRISPR-P 2.0: an improved CRISPR-Cas9 tool for genome editing in plants. Mol. Plant 10, 530–532 (2017).
  • 131. Xing, H.-L. et al. A CRISPR/Cas9 toolkit for multiplex genome editing in plants. BMC Plant Biol. 14, 1–12 (2014).
  • 132. Hornburg, D. et al. Dynamic lipidome alterations associated with human health, disease and ageing. Nat. Metab. 5, 1578–1594 (2023).
  • 133. Hodges, D. M., DeLong, J. M., Forney, C. F. & Prange, R. K. Improving the thiobarbituric acid-reactive-substances assay for estimating lipid peroxidation in plant tissues containing anthocyanin and other interfering compounds. Planta 207, 604–611 (1999).
  • 134. Bradford, M. M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254 (1976).
  • 135. Smith, A. M., Ratcliffe, R. G. & Sweetlove, L. J. Activation and function of mitochondrial uncoupling protein in plants. J. Biol. Chem. 279, 51944–51952 (2004).
  • 136. Che, Y. et al. Potassium ion regulates hormone, Ca2+ and H2O2 signal transduction and antioxidant activities to improve salt stress resistance in tobacco. Plant Physiol. Biochem. 186, 40–51 (2022).
  • 137. Crawley, M. J. et al. The R book (John Wiley & Sons, 2012).
  • 138. Li, J. et al. Multi-omics analyses reveal regulatory networks underpinning metabolite biosynthesis in Nicotiana tabacum. Zenodo https://doi.org/10.5281/zenodo.17252227 (2025).

Associated Data

Supplementary Materials

Peer Review file (3.5MB, pdf)
41467_2025_65299_MOESM3_ESM.pdf (8KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (17.4KB, xlsx)
Supplementary Data 2 (672.6KB, xlsx)
Supplementary Data 3 (23.7MB, xlsx)
Supplementary Data 4 (299.3KB, xlsx)
Supplementary Data 5 (1.9MB, xlsx)
Supplementary Data 6 (1.8MB, xlsx)
Supplementary Data 7 (330.2KB, xlsx)
Supplementary Data 8 (40.2KB, xlsx)
Supplementary Data 9 (312.6KB, xlsx)
Supplementary Data 10 (1.8MB, xlsx)
Supplementary Data 11 (1.9MB, xlsx)
Supplementary Data 12 (882.4KB, xlsx)
Supplementary Data 13 (102KB, xlsx)
Supplementary Data 14 (544.7KB, xlsx)
Supplementary Data 15 (749.7KB, xlsx)
Supplementary Data 16 (12.9KB, xlsx)
Reporting Summary (103.5KB, pdf)
Source Data (10.3MB, xlsx)

Data Availability Statement

The raw transcriptome data for GMRN construction have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1112100. The remaining raw transcriptome data, for the WT, OE and KO tobacco lines, are available at the NCBI under BioProject accession PRJNA1112565. All metabolomics and lipidomics data generated in this study have been deposited in the EMBL-EBI MetaboLights database under accession codes MTBLS13081 (GMRN metabolomics), MTBLS13060 (hydroxycinnamic acid), MTBLS13071 (lipidomics), and MTBLS13072 (aroma components). Source data are provided with this paper.

The Python, Perl, and R scripts used to calculate coefficients, merge multiple clusters, construct the gene-metabolite regulatory network, and produce visualizations are publicly accessible on GitHub (https://github.com/SIMON8423Lee/Genome-scale-Metabolic-Regulatory-Network). A permanent version of all code is available at Zenodo (https://doi.org/10.5281/zenodo.17252227; ref. 138).


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
