The EMBO Journal. 2025 Jul 3;44(16):4409–4418. doi: 10.1038/s44318-025-00496-z

A million shades of green: understanding and harnessing plant metabolic diversity

Rocky D Payet 1, Adnane Aouidate 1,2, Rebecca Casson 1, Alan Houghton 1, Mai-Truc Pham 1, Anne Osbourn 1
PMCID: PMC12361408  PMID: 40610793

Abstract

Recent developments in single-cell -omic and metabolite imaging technologies and the increasing availability of high-quality genome assemblies are having a transformative impact on the way research is carried out into plant specialised metabolism. Integrating these technologies into pathway discovery projects is therefore highly advantageous. Here, we present a general introduction into methods and workflows in specialised metabolism research. We review a range of recent methodologies, highlighting what they might be used for and common pitfalls which may be encountered. Finally, we provide a practical guide on how these technologies may be incorporated into a specialised metabolic pathway discovery pipeline for researchers who are new to the field.

Keywords: Natural Products, Plant Metabolism, Specialised Metabolism

Subject terms: Metabolism, Plant Biology


This commentary presents a step-by-step guide to starting a project that characterises plant secondary metabolites.


Introduction

Plants produce a wide diversity of compounds. Broadly, these are separated into primary metabolites, which are necessary for growth and development, and specialised metabolites (sometimes called natural products, see “Glossary”), which serve a host of ecological functions ranging from defence to communication with other organisms and interaction with their environment (Suresh et al, 2023). It should be noted, however, that the distinction between these two groups of compounds is not always well defined (Ji et al, 2024), such as in the case of sterols. Humans have exploited natural products for over 5000 years (Petrovska, 2012). More than 30% of our drugs derive from a direct plant source, and more than 60% of small molecule drugs introduced in the past 20 years are based on bioactive molecules from plant extracts or their derivatives. To date, over 200,000 plant natural products have been reported, yet genomic predictions suggest that plants can make millions of structurally varied molecules (Fang et al, 2019), indicating there are many more still to discover. The complexities of plant genomes, such as size and high repeat content, have traditionally presented substantial challenges to elucidation of the genes and enzymes of plant natural product biosynthetic pathways. However, the advent of long-read sequencing technologies such as single-molecule real-time (SMRT/PacBio) sequencing and the reduced cost of sequencing have helped to overcome these barriers (van Dijk et al, 2018), heralding a new era of high-quality plant genome assemblies (Marks et al, 2021). Natural products research (and indeed research into metabolism in general) is an exciting, fast-moving area, but the multifaceted and interdisciplinary nature of the subject makes it a large and complex field. In this article, we attempt to demystify natural products research by outlining the archetypal steps to elucidate a biosynthesis pathway, while addressing the key considerations and common challenges faced. These steps encompass determining a product of interest or species of interest to work on, identifying candidate biosynthetic genes and pathways, and elucidating the activities of these in a heterologous host. We emphasise recent technological advancements in the specialised metabolism space and outline key practical considerations for those who are unfamiliar with the technologies, or how they may be incorporated into a gene discovery pipeline.

Augmenting natural products research through the use of cheminformatics databases

A common entry point to natural products research is through an activity-guided, empirical approach in which an unknown compound with known (and desired) activity is tracked and isolated using various fractionation experiments (Nothias et al, 2018). These methods are often highly successful but require analytical chemistry infrastructure to isolate and assign structures to compounds, as well as a large quantity of starting material. This approach is still employed, but technological advances have put emphasis on high-throughput methods (Ayon, 2023), with particular focus on automated screening of vast compound libraries—a process often referred to as bioprospecting. There are about 391,000 species of plant known to science at the time of writing this article, and as many as 2000 new plant species are described every year (Antonelli et al, 2023). Prioritisation of plant species to focus on for particular purposes has often been guided by ethnobotany (Teixidor-Toneu et al, 2018).

In situations where one has a given bioactivity or property of interest (e.g. anti-fungal or anti-cancer activity), or perhaps a common molecular feature associated with a desired property (e.g. natural products which derive from 3-hydroxyflavone, the backbone of all flavonols), it may be appropriate to consult specialised databases, such as Coconut (Sorokina et al, 2021), Lotus (Rutz et al, 2022) or NPASS (Zhao et al, 2023). These databases compile information from globally published research, concentrating on natural products and metabolites that have been extracted and identified using techniques such as high-performance liquid chromatography (HPLC), gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), and advanced nuclear magnetic resonance (NMR) spectroscopy. Furthermore, it is possible to filter these databases based on molecular features, species/genera of origin, and/or bioactivities using varied, recent approaches (Boldini et al, 2024; Gaudry et al, 2024). General databases such as Reaxys® (Elsevier Limited; https://www.reaxys.com/#/search/quick/query) or ChEMBL (Mendez et al, 2019), which are not restricted to natural products or bioactive molecules, may also be consulted (Box 1). It is, however, essential to recognise that data within these databases are derived from published studies that may contain inaccuracies, such as incorrect stereochemistry assignments, typographical errors in species or genus names, or misidentification of compounds. To mitigate these issues, it is advisable to first select only those molecules with complete stereochemistry, followed by thorough manual verification to ensure accurate structural assignment. Additionally, cross-referencing with taxonomic databases such as the National Center for Biotechnology Information taxonomy database (Schoch et al, 2020) can help confirm the accuracy of species names. 
Errors may also arise when the organism of study has not been properly identified taxonomically. Such errors are particularly difficult to resolve in studies that lack sufficient metadata or information on provenance, and they can lead to incorrect assumptions.
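
As one illustration of the cross-referencing step described above, the sketch below flags species names that do not exactly match a reference list and suggests a probable correction for near misses. It is a minimal example under stated assumptions: the reference list `ACCEPTED_NAMES`, the function name `check_species_name` and the similarity cutoff of 0.85 are hypothetical choices made for illustration, and in practice the reference names would be exported from a taxonomic database rather than typed by hand.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real one would be exported from a
# taxonomic database.
ACCEPTED_NAMES = [
    "Catharanthus roseus",
    "Papaver somniferum",
    "Nicotiana benthamiana",
]


def check_species_name(name, accepted=ACCEPTED_NAMES, cutoff=0.85):
    """Classify a species name taken from a database record.

    Returns a (status, suggestion) pair where status is "exact",
    "probable_typo" or "unmatched".
    """
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned in accepted:
        return ("exact", cleaned)
    # Fuzzy string match; the cutoff is an arbitrary illustrative value.
    close = get_close_matches(cleaned, accepted, n=1, cutoff=cutoff)
    if close:
        return ("probable_typo", close[0])
    return ("unmatched", None)
```

Any suggested correction should still be verified manually, since a close string match does not guarantee that the record refers to the suggested species.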

Curated and filtered data from these databases can be valuable for various purposes. For example, surveying the occurrence of a given compound or core structure across different plant lineages can provide insights into the underpinning evolutionary dynamics. It can also be used to inform on the likely phytochemistry of species which have not yet been the subject of study, based on their relationship to other known species (Rodríguez-López et al, 2022). In other scenarios, one may have a compound of interest and wish to determine how it is biosynthesised in a particular plant species. Here, chemical databases may also be useful because they can provide information on other plant species that may produce the compound of interest, and/or shed light on related compounds that are synthesised by the same species.

Box 1. Chemical databases.

Reaxys: https://www.reaxys.com/#/search/quick/query. A database that contains millions of chemical substances and reactions, along with their associated chemical and physical properties, as well as literature references.

Lotus: https://lotus.naturalproducts.net/. One of the largest and best-annotated resources for NP occurrences, available free of charge and without any restrictions. LOTUS is a dynamic database that is hosted both on its official website and on Wikidata. The Wikidata version enables community curation and the addition of new data.

NPASS: https://bidd.group/NPASS/. Natural Product Activity and Species Source is designed to provide a freely accessible database that integrates detailed information on species, sources, and biological activities of natural products.

Coconut: https://coconut.naturalproducts.net/. The COlleCtion of Open NatUral producTs is a platform that supports natural product research by offering data, tools, and services for deposition, curation, and reuse. Launched in 2021, it has become one of the largest open natural product databases. COCONUT includes chemical structures, names and synonyms, species, organism parts, geographic information on sample collection locations, and available literature references.

ChEMBL: https://www.ebi.ac.uk/chembl/. A manually curated database of bioactive molecules with drug-like properties. This database combines chemical, bioactivity, and genomic data to facilitate the translation of genomic information into effective new drugs.

Identifying biosynthetic gene candidates

Once a target compound and/or plant species for investigation has been identified, the next step is ideally to obtain material of that species. This material can later be used to gather important information about where to look for biosynthesis genes, and can also be used to clone gene candidates directly, which is cheaper than DNA synthesis. It may be obtained through a botanical garden or a commercial supplier such as a plant nursery. Whichever route is taken, it is important to secure information about the provenance of the material in order to ensure Nagoya Protocol compliance (Aubertin and Filoche, 2011). This can be challenging in many countries, where established processes are lacking. Smith et al (2018) provide comprehensive guidelines on navigating Nagoya and are a good first port of call. The metabolite profiles of commercially purchased plant material may differ substantially from bona fide botanic garden accessions, possibly due to nomenclature issues or hybridisation with other closely related species. Indeed, even within the same species it is well documented that different chemotypes can exist between individuals, and thus the specimen of study may not mirror published data (Anaia et al, 2024; Ziaja and Müller, 2025). It can be a good idea to prepare herbarium specimens of the material used. This is particularly salient for groups of species which lack genomic information, and can serve to ensure reproducibility and traceability of subsequent work (Davis and Knapp, 2024). The plant material obtained may be clonally propagated, e.g. from cuttings or bulbs. However, in some instances the only material available is seed, and in this scenario ensuring successful germination can be an issue.

Once material from the plant of interest has been obtained, the next step is to confirm the presence of the natural product of interest, where it accumulates, and, ideally, in which tissue it is biosynthesised; this is a vital piece of information to guide later gene discovery. Classically, determining sites of accumulation would entail extraction from a tissue homogenate followed by analysis by LC-MS or GC-MS. More recently, matrix-assisted laser desorption/ionisation (MALDI) imaging approaches (which do not involve homogenisation) have been used to investigate the localisation of target compounds at the cell type level, thus greatly improving the resolution of these endeavours. In these methods, plant material is finely sliced (e.g. to 80 µm thickness (Yamamoto et al, 2019)) and mounted on a glass slide. The sample is then freeze-dried, immersed in a matrix solution to facilitate ionisation, and then subjected to MALDI imaging mass spectrometry (Nakabayashi et al, 2021; Yamamoto et al, 2019). This has allowed for localisation within tissues (Nakabayashi et al, 2021) and even at the level of specific cell types within tissues (Yamamoto et al, 2019), down to a resolution of 10 µm. While MALDI imaging is conceptually straightforward, the selection of an appropriate matrix for a tissue of interest can be empirical, potentially requiring a number of different options to be explored (Alolga et al, 2024). Additionally, it relies heavily on tissue sectioning, which may be difficult for tissues that are very small or otherwise hard to work with.

In cases where the biosynthesis of a natural product of interest is conditional, this feature can be exploited to facilitate pathway discovery, e.g. hormone elicitors (Šenkyřík et al, 2023) or biotic factors (Katoch et al, 2022) can greatly stimulate the biosynthesis of some natural products. To determine sites of biosynthesis, labelling experiments that measure the incorporation of labelled radioisotopes or stable isotopes into final product remain the “gold standard” approach (Eljounaidi et al, 2024; Mehta et al, 2024; Trojanowska et al, 2000). These approaches can also be combined with MALDI imaging (Schwaiger-Haber et al, 2023), allowing for highly precise determination of sites of synthesis at the cell type level. Furthermore, knowledge of the precise localisation of a given compound in a native tissue can suggest potential function and can yield insights into how precursors are partitioned to facilitate flux.

Having established which tissues/conditions the natural product of interest is located/biosynthesised in, the next step in the pathway elucidation process is to access sequence information. Next-generation sequencing (NGS) resources pertaining to the plant of study may already be publicly available; the 1KP project is an excellent first port of call for this, containing NGS data for over 1000 species of plant (Carpenter et al, 2019). However, issues with sequencing depth can be encountered with this database, which can complicate later cloning efforts. Furthermore, these transcriptomes typically consist of one organ per species or of homogenised seedlings and therefore may not span sites of biosynthesis. Finally, the sequenced individual in the 1KP database may not well represent the accession acquired, for example in species which show high levels of intraspecific variation. It is therefore preferable to generate in-house NGS resources from the accession in use. This may involve performing RNAseq on different tissues, followed by either mapping back to a published transcriptome, or more commonly generation of de novo transcriptome resources for different plant tissues/treatment conditions. Typically, this would involve short-read Illumina sequencing. However, more recently long-read Iso-Seq has been employed in the construction of reference-quality transcriptomes (Zhang et al, 2022). While transcriptomes can be obtained from a single library, it is advisable to design a sequencing experiment to cover a range of tissues in which biosynthesis is and is not taking place, as this allows for exclusion of genes that are unlikely to be involved in the pathway of interest. Furthermore, quantitative information is often a useful line of evidence in pathway discovery: for example, genes that have very low expression may be excluded, and genes that are differentially expressed in tissues where biosynthesis is taking place can be shortlisted. 
It is therefore recommended that a minimum of three replicates per tissue, preferably four, are sequenced, as this allows for robust, statistical quantification of transcript abundance.
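
The shortlisting logic described above (exclude very lowly expressed genes, keep genes enriched where biosynthesis takes place) can be sketched as follows. This is a simplified illustration, not a substitute for a dedicated differential expression package: it only compares replicate means and performs no statistical test. The function name `shortlist_candidates`, the thresholds, the pseudocount and the example abundances are all hypothetical.

```python
import math
from statistics import mean


def shortlist_candidates(expr, active, inactive,
                         min_tpm=1.0, min_log2fc=2.0, pseudocount=1.0):
    """Rank genes enriched in the tissue where biosynthesis takes place.

    expr maps gene -> tissue -> list of replicate abundances (e.g. TPM).
    Thresholds are illustrative assumptions, not recommended values.
    """
    hits = []
    for gene, by_tissue in expr.items():
        on = mean(by_tissue[active])     # replicate mean, active tissue
        off = mean(by_tissue[inactive])  # replicate mean, inactive tissue
        if on < min_tpm:
            continue                     # very low expression: exclude
        log2fc = math.log2((on + pseudocount) / (off + pseudocount))
        if log2fc >= min_log2fc:
            hits.append((gene, log2fc))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up abundances for three hypothetical genes, three replicates each.
example_expression = {
    "geneA": {"root": [50.0, 60.0, 55.0], "leaf": [1.0, 2.0, 0.0]},
    "geneB": {"root": [0.2, 0.1, 0.3], "leaf": [0.0, 0.0, 0.1]},
    "geneC": {"root": [30.0, 28.0, 32.0], "leaf": [29.0, 31.0, 30.0]},
}
shortlisted = shortlist_candidates(example_expression, "root", "leaf")
```

In this toy data set only geneA is retained: geneB falls below the expression floor and geneC is expressed equally in both tissues.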

Single-cell sequencing technologies have also recently been developed, and have considerable potential for facilitating elucidation of natural product pathways that are restricted to specific cell types (Lin et al, 2023; Wang et al, 2022), as well as pathways which are split between multiple, different cell types (Ozber and Facchini, 2022; Sun et al, 2023). These sequencing methods rely on the dissociation of tissues into cells in suspension, followed by preparation of protoplasts (Cole et al, 2021). The suspended protoplasts are then attached onto individually barcoded particles using microfluidics, followed by preparation of libraries and sequencing. The data can then be resolved into distinct cell types through use of principal component clustering (Cole et al, 2021). It should be noted, however, that the preparation of protoplasts for some tissues and some species can be extremely challenging. In such cases, single nucleus sequencing might be more feasible. The workflow for single nucleus sequencing is similar to single cell, but crucially it does not rely on the preparation of protoplasts (Sunaga-Franze et al, 2021). These methods are contrasted in Box 2, and well-reviewed by Ding et al (2020). These single cell and nucleus sequencing methods are substantially different from bulk RNA-seq, in which whole tissues are homogenised and the information on which cells are expressing which genes is therefore lost. Furthermore, transcript abundance in bulk RNA-seq reflects both the level of expression and the abundance of the cell type in which expression is taking place. Single-cell sequencing circumvents this problem; however, in both methods the architecture of where genes are expressed in the context of a tissue is lost. Very recently, spatial transcriptomic methods such as Stereo-seq have been developed and incorporated into plant science, which can overcome both these issues (Yin et al, 2023).
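
The confounding of expression level with cell-type abundance in bulk RNA-seq can be made concrete with a small worked example. The sketch below models the bulk signal as an abundance-weighted average across cell types, which is a deliberate simplification; the cell-type labels, fractions and expression values are invented for illustration.

```python
def bulk_signal(cell_fractions, per_cell_expression):
    """Abundance-weighted average of expression across cell types."""
    return sum(
        fraction * per_cell_expression.get(cell_type, 0.0)
        for cell_type, fraction in cell_fractions.items()
    )


# Hypothetical tissue made of one rare and one common cell type.
fractions = {"rare_cell_type": 0.05, "common_cell_type": 0.95}

# Gene X is highly expressed, but only in the rare cell type.
gene_x = {"rare_cell_type": 100.0, "common_cell_type": 0.0}
# Gene Y is weakly expressed in every cell.
gene_y = {"rare_cell_type": 5.0, "common_cell_type": 5.0}

bulk_x = bulk_signal(fractions, gene_x)
bulk_y = bulk_signal(fractions, gene_y)
```

Under these assumptions both genes give essentially the same bulk value (about 5), even though their cell-type expression patterns are entirely different, which is the information that single-cell and single-nucleus methods recover.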

While many of the provided examples of single-cell sequencing use Illumina short-read sequencing technologies, the decreasing costs of SMRT sequencing have led to more widespread adoption (van Dijk et al, 2018). This long-read sequencing type permits differentiation between transcript isoforms and has recently been integrated into single-cell (Al’Khafaji et al, 2024; Shi et al, 2023) and even single-nucleus sequencing pipelines (Hardwick et al, 2022). However, it should be noted that there exist examples of biosynthetic pathways for which the functions are split over multiple different cell types, such as monoterpene indole alkaloids in Catharanthus roseus (Li et al, 2023) and morphine biosynthesis in Papaver somniferum (Ozber and Facchini, 2022). Sequencing or focusing on a single cell-type may therefore not always be appropriate. A comparison between long-read and short-read sequencing technologies is made in Box 2.

In addition to transcriptome sequencing applications, SMRT sequencing has had a transformative impact on genomics, resulting in a dramatic increase in the number of sequenced plant genomes. This can be combined with chromosome conformation capture technologies (Hi-C), which reveal contacts between genomic regions which are in close three-dimensional proximity (Šimková et al, 2024), to resolve even very complex genomes down to chromosome-level assemblies (Chávez Montes et al, 2022). These developments have led to the discovery that in a burgeoning number of cases, genes involved in particular natural product biosynthetic pathways are co-located in the genome in biosynthetic gene clusters (BGCs) (Nützmann et al, 2018; Smit and Lichman, 2022). This phenomenon offers opportunities to identify biosynthetic pathway genes (Kerwin et al, 2024; Liu et al, 2020; Winzer et al, 2012) based on genomic location, which can greatly accelerate the pathway elucidation process. Such clusters can readily be predicted by publicly available algorithms, such as plantiSMASH (Kautsar et al, 2017). Importantly, clustering can enable the discovery of genes encoding unexpected pathway components that would not have been found otherwise, based on a Rosetta Stone approach, e.g. a reductase involved in the D-fucosylation of the triterpene glycoside adjuvant QS-21 (Reed et al, 2023). Key considerations to take into account when sequencing a plant genome are the ploidy of the organism and its heterozygosity, and sequencing depth should be adjusted accordingly.
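
To illustrate the idea of physical clustering, the toy scan below groups biosynthetic genes that lie close together along one chromosome and keeps groups containing several distinct enzyme families. It is not the algorithm used by plantiSMASH or any other published tool; the function name `find_candidate_clusters`, the family labels, and the `max_gap` and `min_families` thresholds are hypothetical.

```python
def find_candidate_clusters(ordered_genes, max_gap=2, min_families=3):
    """Group nearby biosynthetic genes along one chromosome.

    ordered_genes: list of (gene_id, family) in chromosomal order, where
    family is None for genes without a biosynthetic annotation.
    max_gap: largest number of intervening unannotated genes tolerated.
    """
    clusters, current, last_index = [], [], None

    def diverse_enough(run):
        # Require several distinct enzyme families within the run.
        return len({family for _, family in run}) >= min_families

    for index, (gene_id, family) in enumerate(ordered_genes):
        if family is None:
            continue
        if last_index is not None and index - last_index - 1 > max_gap:
            if diverse_enough(current):
                clusters.append([gene for gene, _ in current])
            current = []
        current.append((gene_id, family))
        last_index = index
    if diverse_enough(current):
        clusters.append([gene for gene, _ in current])
    return clusters


# Invented gene order with hypothetical family annotations.
chromosome = [
    ("g1", None), ("g2", "scaffold_enzyme"), ("g3", "CYP450"),
    ("g4", None), ("g5", "transferase"), ("g6", None), ("g7", None),
    ("g8", None), ("g9", None), ("g10", "CYP450"),
]
predicted = find_candidate_clusters(chromosome)
```

Here g2, g3 and g5 form one candidate cluster, while the isolated g10 is discarded. Any such prediction is only a hypothesis to be tested functionally.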

Knowledge of precursor molecules allows for probable routes of biosynthesis to be inferred. While natural products are highly diverse, many enzymatic transformations are common to most classes. For example, hydroxylations around a carbon skeleton are frequently made by cytochrome P450 family members (Nguyen and Dang, 2021). This allows gene lists to be curated for likely relevant classes of enzyme. However, in cases when a non-canonical gene family is involved in a pathway, or if there is no precedent for the step in literature, then difficulties may be encountered. If a pathway is clustered in a BGC, then examining other genes in the cluster may lead to discovery of missing steps, for example to a scaffold protein required for the formation of a protein complex required for biosynthetic pathway function (Boccia et al, 2024; Jozwiak et al, 2024). Where pathway genes are not organised in BGCs, differential gene expression analyses between tissues/conditions where the metabolite is accumulated to high or low levels (Carroll et al, 2023; Payet et al, 2024), or correlation with known biosynthesis enzymes (Jo et al, 2024) may all help identify functional genes.
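
Correlation with a known biosynthesis enzyme, as mentioned above, can be sketched as ranking candidates by the Pearson correlation of their expression profile with that of a "bait" gene across samples. This is a minimal illustration: the function names, gene names and expression values are hypothetical, and real analyses would typically use many more samples and a dedicated co-expression workflow.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / math.sqrt(var_x * var_y)


def rank_by_coexpression(bait_profile, candidate_profiles):
    """Sort candidate genes by correlation with the bait, highest first."""
    scores = {
        gene: pearson(bait_profile, profile)
        for gene, profile in candidate_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented profiles across four samples (tissues or conditions).
bait = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "cand1": [2.0, 4.0, 6.0, 8.0],  # rises with the bait
    "cand2": [4.0, 3.0, 2.0, 1.0],  # falls as the bait rises
    "cand3": [5.0, 5.0, 5.0, 5.0],  # flat
}
ranking = rank_by_coexpression(bait, profiles)
```

With so few samples a high correlation can easily arise by chance, which is one reason the replicated, multi-tissue sampling discussed in this section matters.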

Finally, there is a current drive to develop gene discovery pipelines which incorporate multiple different omics strategies—typically genomics, transcriptomics and metabolomics (Louwen et al, 2023). These multi-omic methods have historically been applied to microbial systems but are successfully being implemented in plant systems (Li et al, 2023). These approaches are particularly powerful, as they incorporate multiple lines of evidence. It should be noted, however, that to achieve sufficient statistical power it is generally necessary to sample multiple tissues/conditions in replicate, with a minimum of three, preferably four replicates per sample; however this can be expensive. Moreover, such an experiment should be designed to span both tissues/conditions where biosynthesis is known or suspected and tissues/conditions where it is not, so that meaningful comparisons can be made.

Box 2. NGS sequencing methods and their uses.

Short-read sequencing
Pros: Cheap; Can achieve high sequencing depth; Ideal for standard transcriptomics
Cons: Difficult to resolve highly repetitive/complex sequences; Cannot be used to distinguish sequence isoforms well

Long-read sequencing
Pros: Can resolve highly repetitive/complex sequences; Can be used to distinguish isoforms; Ideal for reference transcriptomes/genomes
Cons: More expensive; Lower accuracy per read

Single-cell sequencing
Pros: Isolated cells can be used for further multi-omics (proteomics/metabolomics); Transcripts across the cell are represented (e.g. in the cytoplasm)
Cons: Rely on generating protoplasts for cell sorting; Ease of dissociation of cells can introduce heavy bias; Cannot use with frozen samples

Single nucleus sequencing
Pros: Does not rely on generating protoplasts; Reduced bias of ease of dissociation; Can be used with frozen samples
Cons: Cannot be used to profile proteome or metabolome; Only captures transcripts in nuclei

Ratification of candidate genes and pathways

Following the identification of candidate genes, the next step in the process is to validate their function. This process can be broadly categorised into two different approaches: “bottom-up” and “top-down”.

In the “bottom-up” approach, candidate genes are expressed in a heterologous host. This host may be microbial, e.g. yeast or bacteria, or multicellular, i.e. plants. Microbial systems are often tractable and easy to scale up; for example, Escherichia coli is readily transformable and can be grown to high density in a controlled incubator. However, some of the key enzyme families involved in natural product biosynthesis in plants, such as cytochrome P450s, are localised on the endoplasmic reticulum, an internal structure that bacteria lack. Eukaryotic microbes such as yeast can provide useful alternatives for expression and purification of such enzymes, whether for individual steps (Nguyen et al, 2023) or indeed whole pathways (Winegar et al, 2024). However, these typically require additional metabolic engineering to provide the necessary precursors and co-factors. The plant system Nicotiana benthamiana (a wild relative of tobacco) is emerging as a rapid and powerful alternative to these microbial systems (Golubova et al, 2024), as it is better suited for expression of plant natural product enzymes and will naturally produce most of the necessary co-factors. Furthermore, in addition to stable transformation, N. benthamiana is amenable to transient transformation by agroinfiltration (Sainsbury et al, 2012), allowing for rapid screening of putative biosynthesis genes (Carlson et al, 2023; Reed et al, 2017).

The first step of the “bottom-up” approach is to clone the candidate gene into a suitable expression vector, such as pEAQ (Sainsbury et al, 2009). The resulting expression construct is then transformed into Agrobacterium tumefaciens, and A. tumefaciens cells bearing the expression constructs are then infiltrated into the leaves of N. benthamiana (Sainsbury et al, 2012). Expression of the introduced genes peaks after 4–5 days (Reed et al, 2017), after which plant material is harvested, the metabolites are extracted into an appropriate solvent, and either LC-MS or GC-MS is used to determine the presence of the target metabolite. Successive genes can be stacked together, either in multi-gene constructs (Payet et al, 2024) or by combining Agrobacterium strains (Reed et al, 2017), and thus complete biosynthetic pathways can be reconstituted.

One common issue experienced with this approach is that precursor pools may be limiting and unable to meet the demand of constitutively expressed downstream enzymes. This can be addressed by boosting precursor supply, for example by co-expressing a feedback insensitive truncated 3-hydroxy-3-methylglutaryl-CoA reductase gene to boost triterpene production (Reed et al, 2017; Rodríguez-Concepción and Boronat, 2015).

When the host species is amenable to transformation, a “top-down” approach can be used to validate candidate gene function (Small, 2007). In this approach, small interfering RNAs can be expressed to activate endogenous gene silencing mechanisms, thereby reducing expression of the candidate gene and, correspondingly, production of the target molecule (Boccia et al, 2024; Jo et al, 2024). While this is an effective strategy, it should be noted that RNAi methods typically do not achieve complete silencing, and so this method rarely results in a complete knockout phenotype. Furthermore, silencing is often transient, and over time the efficacy may reduce. CRISPR-Cas can also be deployed to manipulate gene expression, either to silence genes using a nuclease-dead Cas9 (Zhang et al, 2023), or to knock out genes at the genetic level (Mercx et al, 2017). Biosynthetic genes can also be overexpressed in such systems to further demonstrate their involvement in planta (Grzech et al, 2024). Additionally, depending on whether the metabolites are present in root tissue, transformation via Agrobacterium rhizogenes to generate hairy roots can provide an excellent platform to test candidate genes (Shi et al, 2021). These stably transformed roots can be propagated indefinitely, providing a potential reservoir of pathway intermediates that can be isolated and used for in vitro enzyme assays. Furthermore, hairy roots are routinely genetically manipulated by CRISPR-Cas systems, allowing for complete knockout of candidate genes (Kiryushkin et al, 2021).

Generally, the “bottom-up” approach is faster and more accessible, as it does not require the organism of study to be transformed. However, the “top-down” method has many advantages in certain situations. For example, for many years steroidal glycoalkaloids (SGA) from potato and tomato could not be reconstituted in a heterologous host as the recently described scaffold protein GAME15 was not known (Boccia et al, 2024; Jozwiak et al, 2024). As such, many SGA pathway steps were discovered based on a “top-down” strategy, in which genes were knocked out or overexpressed in the native organism (Sonawane et al, 2022; Nakayasu et al, 2021). The “top-down” approach also serves as a formal validation of function in the producing plant, whereas this information is lost in a “bottom-up” approach.

Detection and structural assignment of natural products

Compounds produced by heterologous expression are typically validated through mass spectrometry coupled chromatographic methods such as GC-MS and LC-MS. This allows for the identification of the putative products through their molecular mass and retention times. Furthermore, through comparison of the fragmentation pattern of known, similar compounds (such as those of a closely related standard) it is possible to suggest potential structures of unknown products based on the fragments observed and some knowledge of the likely activity of the tested genes (Moses et al, 2014b).
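
As a worked example of identification by molecular mass, the sketch below computes the monoisotopic mass of a molecular formula and the expected m/z of its singly protonated ion. It is a minimal illustration that handles only simple formulas containing C, H, N and O without brackets; the function names are hypothetical, and the formula C30H50O is an arbitrary illustrative input rather than a compound taken from the text.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}
PROTON = 1.00727647  # mass of a proton (Da)


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C30H50O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula):
    """Expected m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


substrate_mass = monoisotopic_mass("C30H50O")
# A single hydroxylation adds one oxygen atom to the formula.
hydroxylation_shift = monoisotopic_mass("C30H50O2") - substrate_mass
```

Under these assumptions the neutral mass is about 426.386 Da, the [M+H]+ ion is expected near m/z 427.393, and a single hydroxylation shifts the mass by about 15.995 Da. A matching mass narrows down the candidates but does not by itself establish a structure.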

Additionally, the use of spectral libraries can assist with the identification of products if samples are run under identical conditions (Morehouse et al, 2023). It should be noted, however, that unambiguous identification requires either the exact matching of retention time and mass spectrum to an authenticated standard, or else the purification and structural elucidation of the product through other means, as mass spectrometry cannot unambiguously determine structure by itself due to structural isomers (and especially enantiomers) commonly sharing identical mass spectra (Zhou et al, 2022).
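
Spectral library matching is often based on a similarity score between an observed spectrum and a reference spectrum. The sketch below uses a simple binned cosine similarity, which is one common choice and not necessarily the scoring used by any particular library or software. The function names, the bin width and the example peaks are hypothetical, and fixed-width binning can split peaks that fall close to a bin edge.

```python
import math


def binned(spectrum, bin_width=0.01):
    """Sum intensities of peaks that fall into the same m/z bin."""
    bins = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_score(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine similarity between two spectra given as (m/z, intensity)."""
    a = binned(spectrum_a, bin_width)
    b = binned(spectrum_b, bin_width)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum cannot be matched
    return dot / (norm_a * norm_b)


# Invented peak lists for illustration only.
observed = [(91.05, 100.0), (119.09, 40.0), (203.18, 10.0)]
unrelated = [(50.00, 100.0)]
```

A score close to 1 indicates very similar fragmentation, but, as noted above, isomers can share near-identical spectra, so a high score is supporting evidence rather than an unambiguous identification.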

To characterise the structure of unknown products from heterologous expression experiments, higher quantities of product are needed. To accommodate this, infiltrations are typically scaled up for metabolite extraction and isolation (Reed et al, 2017). Given the complex nature of plant material, extraction and isolation of pure compounds necessitates various separation techniques and strategies. These techniques include critical sample preparation steps such as drying, grinding, decolorising, defatting, and fractionating of crude extracts prior to column chromatography. Advances in modern extraction techniques, such as pressurised liquid extraction and solid-phase extraction, have enhanced sample extraction efficiency, complementing conventional methods such as soaking, Soxhlet extraction, and ultrasound-assisted extraction (Cheok et al, 2014; Majinda, 2012). Furthermore, alongside flash chromatography and medium-pressure LC for rapid preliminary fractionation of complex mixtures, HPLC is regarded as a highly effective and convenient technique for compound isolation. HPLC systems can be coupled with various types of detectors, such as mass selective detectors, photodiode array detectors or evaporative light scattering detectors, facilitating the handling of non-ultraviolet absorbing compounds (Bucar et al, 2013; Sticher, 2008).

The structural elucidation of natural products primarily relies on the use of advanced NMR spectroscopy (Bross-Walch et al, 2005; Kwan and Huang, 2008). Using triterpenoids as an example, the core aglycone structure is usually characterised first through proton (1H) and carbon (13C) NMR spectra—these can be compared with published literature. In addition to discerning the core aglycone structure, the splitting pattern and chemical shift of proton signals influenced by substituents can also inform on the presence of attached moieties. For instance, resonance signals of H–29 and H–30 in the ursane scaffold appear as doublet peaks while those in the oleanane-type structure exhibit singlet peaks (Kaweetripob et al, 2016; Moses et al, 2014a; Wu et al, 2020).

Proposed structures are further confirmed using 2D NMR techniques such as heteronuclear single quantum coherence (HSQC), heteronuclear multiple bond correlation (HMBC), rotating-frame nuclear Overhauser effect spectroscopy (ROESY), and nuclear Overhauser effect spectroscopy (NOESY) (Reynolds and Mazzola, 2015). However, for complex, highly decorated natural products such as saponins, obtaining comprehensive structural information (including sugar ring conformation and anomeric configuration, linkage positions and sequence of the sugar chain, and characterisation of acyl moieties and their linkage positions) presents significant challenges. A combination of various 1D and 2D NMR experiments is therefore necessary (da Silva et al, 2016).

More recently, the combined use of heteronuclear single quantum coherence-total correlation spectroscopy (HSQC-TOCSY) and heteronuclear two-bond correlation (H2BC) has proven beneficial for assigning carbon and proton resonances within sugar moieties (Graziani et al, 2018; Shiomi et al, 2016; Wallace et al, 2022) (Box 3).
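To illustrate why adding a 13C dimension helps with sugar-rich molecules, the sketch below compares peaks first by 1H shift alone and then by 1H and 13C shift together. All chemical shifts, labels and tolerances are hypothetical values chosen for illustration; they are not taken from any of the cited studies.

```python
from itertools import combinations


def overlapping_pairs(peaks, h_tol=0.02, c_tol=None):
    """Return pairs of labels whose signals coincide within the tolerances.

    peaks: dict mapping a label to a (1H ppm, 13C ppm) tuple.
    c_tol=None compares the 1H dimension only; otherwise two signals
    overlap only if they coincide in both the 1H and 13C dimensions.
    """
    pairs = []
    for (a, (ha, ca)), (b, (hb, cb)) in combinations(sorted(peaks.items()), 2):
        same_h = abs(ha - hb) <= h_tol
        same_c = c_tol is None or abs(ca - cb) <= c_tol
        if same_h and same_c:
            pairs.append((a, b))
    return pairs


# Hypothetical ring-proton shifts for two sugar residues (illustrative only).
peaks = {
    "Glc H-3": (3.45, 77.9),
    "Xyl H-3": (3.44, 76.5),
    "Glc H-2": (3.20, 74.8),
}

if __name__ == "__main__":
    print(overlapping_pairs(peaks))             # 1H dimension only
    print(overlapping_pairs(peaks, c_tol=0.5))  # 1H and 13C dimensions
```

In this toy data set the two H-3 signals coincide in the 1H dimension but are separated once their attached carbons are taken into account.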

Box 3. NMR methods and their uses.

2D NMR experiments

COSY: (correlation spectroscopy) cross-peaks show signals of protons attached to carbons that are adjacent to each other.

TOCSY: (total correlation spectroscopy) is much like COSY, except that it can also show longer-range correlations between protons within a group of connected carbons (a spin system).

NOESY: (nuclear Overhauser effect spectroscopy) and ROESY (rotating-frame nuclear Overhauser effect spectroscopy) show cross-peaks when protons are spatially close to each other. These experiments are particularly useful for determining stereochemistry.

HSQC: (heteronuclear single quantum coherence) shows which protons are directly attached to which carbon.

HMBC: (heteronuclear multiple bond correlation) shows connections between carbons and protons separated by two, three or sometimes even four bonds. These cross-peaks can be used to assign carbon-carbon connectivity.

HSQC-TOCSY: (heteronuclear single quantum coherence-total correlation spectroscopy) looks like a normal TOCSY but is additionally resolved along the 13C dimension, which helps to separate overlapping cross-peaks. This is particularly useful for molecules that contain multiple sugars, where the cross-peaks would normally overlap in a standard TOCSY experiment.

H2BC: (heteronuclear two-bond correlation) is similar to HMBC, except that the cross-peaks are limited to a distance of two bonds. This spectrum is usually used to identify the carbons directly adjacent to the carbon to which the proton is attached.
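The experiments in Box 3 can be summarised as a simple lookup table for planning an analysis. This is a minimal sketch: the dictionary name, the shortened descriptions and the keyword search helper are assumptions made for illustration, paraphrasing the box rather than reproducing any established software interface.

```python
# Short paraphrases of what each 2D experiment in Box 3 reports.
NMR_EXPERIMENTS = {
    "COSY": "protons on adjacent carbons",
    "TOCSY": "protons within the same spin system",
    "NOESY": "protons close in space",
    "ROESY": "protons close in space",
    "HSQC": "protons directly attached to a carbon",
    "HMBC": "protons and carbons two to four bonds apart",
    "HSQC-TOCSY": "spin systems resolved along the 13C dimension",
    "H2BC": "protons and carbons two bonds apart",
}


def experiments_reporting(keyword):
    """Return, in alphabetical order, experiments whose description mentions keyword."""
    key = keyword.lower()
    return sorted(
        name for name, info in NMR_EXPERIMENTS.items() if key in info.lower()
    )


if __name__ == "__main__":
    print(experiments_reporting("space"))
    print(experiments_reporting("spin system"))
```

A plain substring match keeps the helper dependency-free; a real planning tool would more likely tag each experiment with structured attributes.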

Future perspectives

Developments in NGS methods and metabolite imaging have greatly enriched researchers’ capacity to identify putative biosynthesis genes from a wide variety of non-reference organisms. However, one of the key bottlenecks in natural product discovery remains the functional validation of biosynthesis genes. Increasing the accuracy with which biosynthesis genes are predicted will therefore be vital in overcoming this bottleneck. In the future, the development of large databases of functionally validated genes that can be used to train machine learning algorithms will greatly accelerate this capability.

Looking forward, the further development of multi-omic pipelines (Li et al, 2023), coupled with social network endeavours such as GNPS (Aron et al, 2020), is likely to improve dramatically our capacity to identify natural product biosynthetic genes and pathways. Additionally, initiatives such as the Earth BioGenome Project (Lewin et al, 2022), which aims to sequence the genomes of every eukaryote on the planet, have revolutionised natural products research and will continue to do so. Furthermore, advances in our understanding of protein structure and enzyme superfamilies will increase our capacity to predict what biosynthesis genes may do based on their translated DNA sequences (Bordin et al, 2024). Indeed, when coupled with modern machine learning approaches, a future may be imagined where it is possible to design our own enzymes for biosynthesis of new-to-nature products based on these knowledge foundations (Notin et al, 2024) (Box 4).

Taken together, it becomes clear that elucidating a natural product biosynthesis pathway is a highly multidisciplinary endeavour. It relies on skillsets in cheminformatics and bioinformatics to identify compounds with promising bioactivity and putative biosynthesis genes, molecular biology to clone genes and perform heterologous expression experiments, and chemistry to purify and assign structures to newly produced compounds (Fig. 1). As the techniques employed in each of the described steps become increasingly advanced, so too will the need to employ multidisciplinary teams to address the challenges of natural products research. Building such interdisciplinary teams also provides excellent training opportunities for early career researchers.

Figure 1. Flow chart showing the key stages of a specialised metabolite gene discovery pipeline.

Rectangular boxes, key milestones; circles, action to take at the key milestones; diamonds with dashed arrows, specific techniques by which the circles would be achieved. Readers may wish to align their current research project with this chart to decide potential next steps.

Box 4. Take home messages.

  • Cheminformatics databases can be used to systematically identify compounds based on structural features (e.g. shared aglycone) and bioactivity and can also be integrated with taxonomy to investigate chemistry across genera.

  • Understanding where and when biosynthesis occurs is vital for metabolic pathway elucidation.

  • Single-cell omics techniques are powerful additions to pathway elucidation, but it must be taken into consideration that some pathways are segmented over a variety of different cell types.

  • Heterologous reconstitution of biosynthesis pathways in Nicotiana benthamiana is effective and highly scalable.

Supplementary information

Peer Review File (329.5KB, pdf)

Acknowledgements

We would like to acknowledge the following funding sources to the Osbourn lab: The Novozymes Prize 2023 (Novo Nordisk Foundation), Wellcome Discovery Award #227375/Z/23/Z, BBSRC responsive mode award APP3941, the BBSRC Institute Strategic Programme Grant “Harnessing biosynthesis for sustainable food and health” (BB/X01097X/1) and the John Innes Foundation.

Glossary

Natural products

Specialised metabolites produced by living organisms, typically excluding large biomolecules such as proteins and nucleic acids.

Cheminformatics

Cheminformatics (also known as chemical informatics) uses computational methods to address chemical challenges and derive insights from chemical data. It encompasses database management, data mining, visualisation, model development, application development, property prediction, etc. Relevant chemical data includes small-molecule formulas, structures, properties, spectra, as well as biological and industrial activities.

Author contributions

Rocky D Payet: Conceptualisation; Writing—original draft; Writing—review and editing. Adnane Aouidate: Writing—original draft. Rebecca Casson: Writing—original draft. Alan Houghton: Writing—original draft. Mai-Truc Pham: Writing—original draft. Anne Osbourn: Conceptualisation; Supervision; Funding acquisition; Project administration; Writing—review and editing.

Disclosure and competing interests statement

AO is a co-founder and CSO of HotHouse Therapeutics. The other authors declare no competing interests.

Footnotes

Contributor Information

Rocky D Payet, Email: Rocky.payet@jic.ac.uk.

Anne Osbourn, Email: Anne.osbourn@jic.ac.uk.

Peer review information

A peer review file is available at 10.1038/s44318-025-00496-z

References

  1. Al’Khafaji AM, Smith JT, Garimella KV, Babadi M, Popic V, Sade-Feldman M, Gatzen M, Sarkizova S, Schwartz MA, Blaum EM et al (2024) High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol 42:582–586
  2. Alolga RN, Wang S-L, Qi L-W, Zang H, Huang F-Q (2024) MALDI mass spectrometry imaging in targeted drug discovery and development: the pros, the cons, and prospects in global omics techniques. TrAC Trends Anal Chem 178:117860
  3. Anaia RA, Chiocchio I, Sontowski R, Swinkels B, Vergara F, van Dam NM (2024) Ontogeny and organ-specific steroidal glycoside diversity is associated with differential expression of steroidal glycoside pathway genes in two Solanum dulcamara leaf chemotypes. Plant Biol 10.1111/plb.13704
  4. Antonelli A, Fry C, Smith R, Eden J, Govaerts R, PJK, Nic Lughadha E, Onstein R, Simmonds M, Zizka A et al (2023) State of the world’s plants and fungi 2023. https://www.kew.org/science/state-of-the-worlds-plants-and-fungi
  5. Aron AT, Gentry EC, McPhail KL, Nothias L-F, Nothias-Esposito M, Bouslimani A, Petras D, Gauglitz JM, Sikora N, Vargas F et al (2020) Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nat Protoc 15:1954–1991
  6. Aubertin C, Filoche G (2011) The Nagoya Protocol on the use of genetic resources: one embodiment of an endless discussion. Sustain Debate 10.18472/SustDeb.v2n1.2011.3906
  7. Ayon NJ (2023) High-throughput screening of natural product and synthetic molecule libraries for antibacterial drug discovery. Metabolites 13:625
  8. Boccia M, Kessler D, Seibt W, Grabe V, Rodríguez López CE, Grzech D, Heinicke S, O’Connor SE, Sonawane PD (2024) A scaffold protein manages the biosynthesis of steroidal defense metabolites in plants. Science 386:1366–1372
  9. Boldini D, Ballabio D, Consonni V, Todeschini R, Grisoni F, Sieber S (2024) Effectiveness of molecular fingerprints for exploring the chemical space of natural products. J Cheminform 16:35
  10. Bordin N, Scholes H, Rauer C, Roca-Martínez J, Sillitoe I, Orengo C (2024) Clustering protein functional families at large scale with hierarchical approaches. Protein Sci 33:e5140
  11. Bross-Walch N, Kühn T, Moskau D, Zerbe O (2005) Strategies and tools for structure determination of natural products using modern methods of NMR spectroscopy. Chem Biodivers 2:147–177
  12. Bucar F, Wube A, Schmid M (2013) Natural product isolation – how to get from biological material to pure compounds. Nat Prod Rep 30:525–545
  13. Carlson ED, Rajniak J, Sattely ES (2023) Multiplicity of the Agrobacterium infection of Nicotiana benthamiana for transient DNA delivery. ACS Synth Biol 12:2329–2338
  14. Carpenter EJ, Matasci N, Ayyampalayam S, Wu S, Sun J, Yu J, Jimenez Vieira FR, Bowler C, Dorrell RG, Gitzendanner MA et al (2019) Access to RNA-sequencing data from 1,173 plant species: the 1000 Plant transcriptomes initiative (1KP). Gigascience 8:giz126
  15. Carroll E, Ravi Gopal B, Raghavan I, Mukherjee M, Wang ZQ (2023) A cytochrome P450 CYP87A4 imparts sterol side-chain cleavage in digoxin biosynthesis. Nat Commun 14:4042
  16. Chávez Montes RA, Haber A, Pardo J, Powell RF, Divisetty UK, Silva AT, Hernández-Hernández T, Silveira V, Tang H, Lyons E et al (2022) A comparative genomics examination of desiccation tolerance and sensitivity in two sister grass species. Proc Natl Acad Sci USA 119:e2118886119
  17. Cheok CY, Salman HAK, Sulaiman R (2014) Extraction and quantification of saponins: a review. Food Res Int 59:16–40
  18. Cole B, Bergmann D, Blaby-Haas CE, Blaby IK, Bouchard KE, Brady SM, Ciobanu D, Coleman-Derr D, Leiboff S, Mortimer JC et al (2021) Plant single-cell solutions for energy and the environment. Commun Biol 4:1–12
  19. da Silva AJR, Borges RM, Soares V (2016) Nuclear magnetic resonance in saponin structure elucidation. In: Williams A, Martin G, Rovnyak D (eds) Modern NMR approaches to the structure elucidation of natural products: Volume 2: Data acquisition and applications to compound classes. Royal Society of Chemistry, London, p 486–501
  20. Davis CC, Knapp S (2024) Exploring biodiversity through museomics. Nat Rev Genet 26:149–150
  21. Ding J, Adiconis X, Simmons SK, Kowalczyk MS, Hession CC, Marjanovic ND, Hughes TK, Wadsworth MH, Burks T, Nguyen LT et al (2020) Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol 38:737–746
  22. Eljounaidi K, Radzikowska BA, Whitehead CB, Taylor DJ, Conde S, Davis W, Dowle AA, Langer S, James S, Unsworth WP et al (2024) Variation of terpene alkaloids in Daphniphyllum macropodum across plants and tissues. New Phytol 243:299–313
  23. Fang C, Fernie AR, Luo J (2019) Exploring the diversity of plant metabolism. Trends Plant Sci 24:83–98
  24. Gaudry A, Pagni M, Mehl F, Moretti S, Quiros-Guerrero LM, Cappelletti L, Rutz A, Kaiser M, Marcourt L, Queiroz EF, Ioset JR, Grondin A, David B, Wolfender JL, Allard PM (2024) A sample-centric and knowledge-driven computational framework for natural products drug discovery. ACS Cent Sci 10:494–510
  25. Golubova D, Tansley C, Su H, Patron NJ (2024) Engineering Nicotiana benthamiana as a platform for natural product biosynthesis. Curr Opin Plant Biol 81:102611
  26. Graziani V, Scognamiglio M, Belli V, Esposito A, D’Abrosca B, Chambery A, Russo R, Panella M, Russo A, Ciardiello F et al (2018) Metabolomic approach for a rapid identification of natural products with cytotoxic activity against human colorectal cancer cells. Sci Rep 8:5309
  27. Grzech D, Smit SJ, Alam RM, Boccia M, Nakamura Y, Hong B, Barbole R, Heinicke S, Kunert M, Seibt W et al (2024) Incorporation of nitrogen in antinutritional Solanum alkaloid biosynthesis. Nat Chem Biol 21:131–142
  28. Hardwick SA, Hu W, Joglekar A, Fan L, Collier PG, Foord C, Balacco J, Lanjewar S, Sampson MM, Koopmans F et al (2022) Single-nuclei isoform RNA sequencing unlocks barcoded exon connectivity in frozen brain tissue. Nat Biotechnol 40:1082–1092
  29. Ji W, Osbourn A, Liu Z (2024) Understanding metabolic diversification in plants: branchpoints in the evolution of specialized metabolism. Philos Trans R Soc B Biol Sci 379:20230359
  30. Jo S, El-Demerdash A, Owen C, Srivastava V, Wu D, Kikuchi S, Reed J, Hodgson H, Harkess A, Shu S et al (2024) Unlocking saponin biosynthesis in soapwort. Nat Chem Biol 21:215–226
  31. Jozwiak A, Panda S, Akiyama R, Yoneda A, Umemoto N, Saito K, Yasumoto S, Muranaka T, Gharat SA, Kazachkova Y et al (2024) A cellulose synthase-like protein governs the biosynthesis of Solanum alkaloids. Science 386:eadq5721
  32. Katoch K, Gupta S, Gupta AP, Goyal P, Devi R, Dey A, Pandey DK (2022) Biotic elicitation for enhanced production of plumbagin in regenerated shoot cultures of Plumbago zeylanica using response surface methodology. Plant Cell Tissue Organ Cult 151:605–615
  33. Kautsar SA, Suarez Duran HG, Blin K, Osbourn A, Medema MH (2017) plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res 45:W55–W63
  34. Kaweetripob W, Mahidol C, Thongnest S, Prawat H, Ruchirawat S (2016) Polyoxygenated ursane and oleanane triterpenes from Siphonodon celastrineus. Phytochemistry 129:58–67
  35. Kerwin RE, Hart JE, Fiesel PD, Lou Y-R, Fan P, Jones AD, Last RL (2024) Tomato root specialized metabolites evolved through gene duplication and regulatory divergence within a biosynthetic gene cluster. Sci Adv 10:eadn3991
  36. Kiryushkin AS, Ilina EL, Guseva ED, Pawlowski K, Demchenko KN (2021) Hairy CRISPR: genome editing in plants using hairy root transformation. Plants 11:51
  37. Kwan EE, Huang SG (2008) Structural elucidation with NMR spectroscopy: practical strategies for organic chemists. Eur J Org Chem 2008:2671–2688
  38. Lewin HA, Richards S, Lieberman Aiden E, Allende ML, Archibald JM, Bálint M, Barker KB, Baumgartner B, Belov K, Bertorelle G et al (2022) The Earth BioGenome Project 2020: starting the clock. Proc Natl Acad Sci USA 119:e2115635118
  39. Li C, Wood JC, Vu AH, Hamilton JP, Rodriguez Lopez CE, Payne RME, Serna Guerrero DA, Gase K, Yamamoto K, Vaillancourt B et al (2023) Single-cell multi-omics in the medicinal plant Catharanthus roseus. Nat Chem Biol 19:1031–1041
  40. Lin J-L, Chen L, Wu W-K, Guo X-X, Yu C-H, Xu M, Nie G-B, Dun J, Li Y, Xu B et al (2023) Single-cell RNA sequencing reveals a hierarchical transcriptional regulatory network of terpenoid biosynthesis in cotton secretory glandular cells. Mol Plant 16:1990–2003
  41. Liu Z, Cheema J, Vigouroux M, Hill L, Reed J, Paajanen P, Yant L, Osbourn A (2020) Formation and diversification of a paradigm biosynthetic gene cluster in plants. Nat Commun 11:5354
  42. Louwen JJR, Medema MH, van der Hooft JJ (2023) Enhanced correlation-based linking of biosynthetic gene clusters to their metabolic products through chemical class matching. Microbiome 11:13
  43. Majinda RRT (2012) Extraction and isolation of saponins. In: Sarker SD, Nahar L (eds) Natural products isolation. Humana Press, Totowa, NJ, p 415–426
  44. Marks RA, Hotaling S, Frandsen PB, VanBuren R (2021) Representation and participation across 20 years of plant genome sequencing. Nat Plants 7:1571–1578
  45. Mehta N, Meng Y, Zare R, Kamenetsky-Goldstein R, Sattely E (2024) A developmental gradient reveals biosynthetic pathways to eukaryotic toxins in monocot geophytes. Cell 187:5620–5637.e10
  46. Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M et al (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47:D930–D940
  47. Mercx S, Smargiasso N, Chaumont F, De Pauw E, Boutry M, Navarre C (2017) Inactivation of the β(1,2)-xylosyltransferase and the α(1,3)-fucosyltransferase genes in Nicotiana tabacum BY-2 cells by a multiplex CRISPR/Cas9 strategy results in glycoproteins without plant-specific glycans. Front Plant Sci 8:403
  48. Morehouse NJ, Clark TN, McMann EJ, van Santen JA, Haeckl FPJ, Gray CA, Linington RG (2023) Annotation of natural product compound families using molecular networking topology and structural similarity fingerprinting. Nat Commun 14:308
  49. Moses T, Papadopoulou KK, Osbourn A (2014a) Metabolic and functional diversity of saponins, biosynthetic intermediates and semi-synthetic derivatives. Crit Rev Biochem Mol Biol 49:439–462
  50. Moses T, Pollier J, Almagro L, Buyst D, Van Montagu M, Pedreño MA, Martins JC, Thevelein JM, Goossens A (2014b) Combinatorial biosynthesis of sapogenins and saponins in Saccharomyces cerevisiae using a C-16α hydroxylase from Bupleurum falcatum. Proc Natl Acad Sci USA 111:1634–1639
  51. Nakabayashi R, Hashimoto K, Mori T, Toyooka K, Sudo H, Saito K (2021) Spatial metabolomics using imaging mass spectrometry to identify the localization of asparaptine A in Asparagus officinalis. Plant Biotechnol 38:311
  52. Nakayasu M, Umemoto N, Akiyama R, Ohyama K, Lee HJ, Miyachi H, Watanabe B, Muranaka T, Saito K, Sugimoto Y et al (2021) Characterization of C-26 aminotransferase, indispensable for steroidal glycoalkaloid biosynthesis. Plant J 108:81–92
  53. Nguyen T-AM, Grzech D, Chung K, Xia Z, Nguyen T-D, Dang T-TT (2023) Discovery of a cytochrome P450 enzyme catalyzing the formation of spirooxindole alkaloid scaffold. Front Plant Sci 14:1125158
  54. Nguyen T-D, Dang T-TT (2021) Cytochrome P450 enzymes as key drivers of alkaloid chemical diversification in plants. Front Plant Sci 12:682181
  55. Nothias L-F, Nothias-Esposito M, da Silva R, Wang M, Protsyuk I, Zhang Z, Sarvepalli A, Leyssen P, Touboul D, Costa J et al (2018) Bioactivity-based molecular networking for the discovery of drug leads in natural product bioassay-guided fractionation. J Nat Prod 81:758–767
  56. Notin P, Rollins N, Gal Y, Sander C, Marks D (2024) Machine learning for functional protein design. Nat Biotechnol 42:216–228
  57. Nützmann H-W, Scazzocchio C, Osbourn A (2018) Metabolic gene clusters in eukaryotes. Annu Rev Genet 52:159–183
  58. Ozber N, Facchini PJ (2022) Phloem-specific localization of benzylisoquinoline alkaloid metabolism in opium poppy. J Plant Physiol 271:153641
  59. Payet RD, Bilham LJ, Kabir SMT, Monaco S, Norcott AR, Allen MGE, Zhu X-Y, Davy AJ, Brearley CA, Todd JD et al (2024) Elucidation of Spartina dimethylsulfoniopropionate synthesis genes enables engineering of stress tolerant plants. Nat Commun 15:8568
  60. Petrovska BB (2012) Historical review of medicinal plants’ usage. Pharmacogn Rev 6:1
  61. Reed J, Orme A, El-Demerdash A, Owen C, Martin LBB, Misra RC, Kikuchi S, Rejzek M, Martin AC, Harkess A et al (2023) Elucidation of the pathway for biosynthesis of saponin adjuvants from the soapbark tree. Science 379:1252–1264
  62. Reed J, Stephenson MJ, Miettinen K, Brouwer B, Leveau A, Brett P, Goss RJ, Goossens A, O’Connell MA, Osbourn A (2017) A translational synthetic biology platform for rapid access to gram-scale quantities of novel drug-like molecules. Metab Eng 42:185
  63. Reynolds WF, Mazzola EP (2015) Nuclear magnetic resonance in the structural elucidation of natural products. In: Kinghorn AD, Falk H, Kobayashi J (eds) Progress in the chemistry of organic natural products. Springer International Publishing, Cham, p 223–309
  64. Rodríguez-Concepción M, Boronat A (2015) Breaking new ground in the regulation of the early steps of plant isoprenoid biosynthesis. Curr Opin Plant Biol 25:17–22
  65. Rodríguez-López CE, Jiang Y, Kamileen MO, Lichman BR, Hong B, Vaillancourt B, Buell CR, O’Connor SE (2022) Phylogeny-aware chemoinformatic analysis of chemical diversity in Lamiaceae enables iridoid pathway assembly and discovery of aucubin synthase. Mol Biol Evol 39:msac057
  66. Rutz A, Sorokina M, Galgonek J, Mietchen D, Willighagen E, Gaudry A, Graham JG, Stephan R, Page R, Vondrášek J et al (2022) The LOTUS initiative for open knowledge management in natural products research. eLife 11:e70780
  67. Sainsbury F, Saxena P, Geisler K, Osbourn A, Lomonossoff GP (2012) Chapter Nine - using a virus-derived system to manipulate plant natural product biosynthetic pathways. In: Hopwood DA (ed) Methods in enzymology. Academic Press, Cambridge, MA, p 185–202
  68. Sainsbury F, Thuenemann EC, Lomonossoff GP (2009) pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnol J 7:682–693
  69. Schoch CL, Ciufo S, Domrachev M, Hotton CL, Kannan S, Khovanskaya R, Leipe D, Mcveigh R, O’Neill K, Robbertse B et al (2020) NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database 2020:baaa062
  70. Schwaiger-Haber M, Stancliffe E, Anbukumar DS, Sells B, Yi J, Cho K, Adkins-Travis K, Chheda MG, Shriver LP, Patti GJ (2023) Using mass spectrometry imaging to map fluxes quantitatively in the tumor ecosystem. Nat Commun 14:2876
  71. Shi M, Liao P, Nile SH, Georgiev MI, Kai G (2021) Biotechnological exploration of transformed root culture for value-added products. Trends Biotechnol 39:137–149
  72. Shi Z-X, Chen Z-C, Zhong J-Y, Hu K-H, Zheng Y-F, Chen Y, Xie S-Q, Bo X-C, Luo F, Tang C et al (2023) High-throughput and high-accuracy single-cell RNA isoform analysis using PacBio circular consensus sequencing. Nat Commun 14:2631
  73. Shiomi N, Abe T, Kikuchi H, Aritsuka T, Takata Y, Fukushi E, Fukushi Y, Kawabata J, Ueno K, Onodera S (2016) Structural analysis of novel kestose isomers isolated from sugar beet molasses. Carbohydr Res 424:1–7
  74. Small I (2007) RNAi for revealing and engineering plant gene functions. Curr Opin Biotechnol 18:148–153
  75. Smit SJ, Lichman BR (2022) Plant biosynthetic gene clusters in the context of metabolic evolution. Nat Prod Rep 39:1465–1482
  76. Smith D, Hinz H, Mulema J, Weyl P, Ryan MJ (2018) Biological control and the Nagoya Protocol on access and benefit sharing – a case of effective due diligence. Biocontrol Sci Technol 28:914–926
  77. Sonawane PD, Jozwiak A, Barbole R, Panda S, Abebie B, Kazachkova Y, Gharat SA, Ramot O, Unger T, Wizler G et al (2022) 2-oxoglutarate-dependent dioxygenases drive expansion of steroidal alkaloid structural diversity in the genus Solanum. New Phytol 234:1394–1410
  78. Sorokina M, Merseburger P, Rajan K, Yirik MA, Steinbeck C (2021) COCONUT online: Collection of Open Natural Products database. J Cheminform 13:2
  79. Sticher O (2008) Natural product isolation. Nat Prod Rep 25:517–554
  80. Sun S, Shen X, Li Y, Li Y, Wang S, Li R, Zhang H, Shen G, Guo B, Wei J et al (2023) Single-cell RNA sequencing provides a high-resolution roadmap for understanding the multicellular compartmentation of specialized metabolism. Nat Plants 9:179–190
  81. Sunaga-Franze DY, Muino JM, Braeuning C, Xu X, Zong M, Smaczniak C, Yan W, Fischer C, Vidal R, Kliem M et al (2021) Single-nucleus RNA sequencing of plant tissues using a nanowell-based system. Plant J 108:859–869
  82. Suresh PS, Kumari S, Sahal D, Sharma U (2023) Innate functions of natural products: a promising path for the identification of novel therapeutics. Eur J Med Chem 260:115748
  83. Teixidor-Toneu I, Jordan FM, Hawkins JA (2018) Comparative phylogenetic methods and the cultural evolution of medicinal plant use. Nat Plants 4:754–761
  84. Trojanowska MR, Osbourn AE, Daniels MJ, Threlfall DR (2000) Biosynthesis of avenacins and phytosterols in roots of Avena sativa cv. Image. Phytochemistry 54:153–164
  85. van Dijk EL, Jaszczyszyn Y, Naquin D, Thermes C (2018) The third revolution in sequencing technology. Trends Genet 34:666–681
  86. Wallace F, Fontana C, Ferreira F, Olivaro C (2022) Structure elucidation of triterpenoid saponins found in an immunoadjuvant preparation of Quillaja brasiliensis using mass spectrometry and 1H and 13C NMR spectroscopy. Molecules 27:2402
  87. Wang Q, Wu Y, Peng A, Cui J, Zhao M, Pan Y, Zhang M, Tian K, Schwab W, Song C (2022) Single-cell transcriptome atlas reveals developmental trajectories and a novel metabolic pathway of catechin esters in tea leaves. Plant Biotechnol J 20:2089–2106
  88. Winegar PH, Hudson GA, Dell LB, Astolfi MCT, Reed J, Payet RD, Ombredane HCJ, Iavarone AT, Chen Y, Gin JW et al (2024) Verazine biosynthesis from simple sugars in engineered Saccharomyces cerevisiae. Metab Eng 85:145–158
  89. Winzer T, Gazda V, He Z, Kaminski F, Kern M, Larson TR, Li Y, Meade F, Teodor R, Vaistij FE et al (2012) A Papaver somniferum 10-gene cluster for synthesis of the anticancer alkaloid noscapine. Science 336:1704–1708
  90. Wu Z-W, Li W-B, Zhou J, Liu X, Wang L, Chen B, Wang M-K, Ji L, Hu W-C, Li F (2020) Oleanane- and ursane-type triterpene saponins from Centella asiatica exhibit neuroprotective effects. J Agric Food Chem 68:6977–6986
  91. Yamamoto K, Takahashi K, Caputi L, Mizuno H, Rodriguez-Lopez CE, Iwasaki T, Ishizaki K, Fukaki H, Ohnishi M, Yamazaki M et al (2019) The complexity of intercellular localisation of alkaloids revealed by single-cell metabolomics. New Phytol 224:848–859
  92. Yin R, Xia K, Xu X (2023) Spatial transcriptomics drives a new era in plant research. Plant J 116:1571–1581
  93. Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W, Marquez Y, Milne L, Riegler S, Matsui A et al (2022) A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol 23:149
  94. Zhang Y-M, Zheng L, Xie K (2023) CRISPR/dCas9-mediated gene silencing in two plant fungal pathogens. mSphere 8:e0059422
  95. Zhao H, Yang Y, Wang S, Yang X, Zhou K, Xu C, Zhang X, Fan J, Hou D, Li X et al (2023) NPASS database update 2023: quantitative natural product activity and species source database for biomedical research. Nucleic Acids Res 51:D621–D628
  96. Zhou C, Yang Y, Tian J, Wu Y, An F, Li C, Zhang Y (2022) 22R- but not 22S-hydroxycholesterol is recruited for diosgenin biosynthesis. Plant J 109:940–951
  97. Ziaja D, Müller C (2025) Intraspecific and intra-individual chemodiversity and phenotypic integration of terpenes across plant parts and development stages in an aromatic plant. Plant Biol 10.1111/plb.13763
  98. Šenkyřík JB, Křivánková T, Kaczorová D, Štefelová N (2023) Investigation of the effect of the auxin antagonist PEO-IAA on cannabinoid gene expression and content in Cannabis sativa L. plants under in vitro conditions. Plants 12:1664
  99. Šimková H, Câmara AS, Mascher M (2024) Hi-C techniques: from genome assemblies to transcription regulation. J Exp Bot 75:5357–5365

Articles from The EMBO Journal are provided here courtesy of Nature Publishing Group
