Proceedings of the National Academy of Sciences of the United States of America. 2016 Nov 7;113(47):E7610–E7618. doi: 10.1073/pnas.1610218113

Illuminating a plant’s tissue-specific metabolic diversity using computational metabolomics and information theory

Dapeng Li a, Sven Heiling a, Ian T Baldwin a, Emmanuel Gaquerel a,b,1
PMCID: PMC5127351  PMID: 27821729

Significance

Population geneticists have educated molecular biologists in how to harness the statistical power of variance arising from interindividual natural variation to elucidate gene function in plants. The metabolic differences among tissues within a plant provide another source of variance that can be harnessed in the quest to understand gene function. We combine the power of information theory statistics and computational metabolomics to parse metabolic diversity within an ecological model plant, Nicotiana attenuata, to reveal intriguing patterns of metabolic specialization in floral limb and anthers, the responsible mechanisms of which we parse further by detecting and silencing the expression of two UDP-glycosyltransferases involved in floral flavonoid metabolism. The workflow defines a framework for future evolutionary studies on plant tissue metabolic specialization.

Keywords: secondary metabolism, mass spectrometry, metabolomics, information theory, Nicotiana attenuata

Abstract

Secondary metabolite diversity is considered an important fitness determinant for plants’ biotic and abiotic interactions in nature. This diversity can be examined in two dimensions. The first one considers metabolite diversity across plant species. A second way of looking at this diversity is by considering the tissue-specific localization of pathways underlying secondary metabolism within a plant. Although these cross-tissue metabolite variations are increasingly regarded as important readouts of tissue-level gene function and regulatory processes, they have rarely been comprehensively explored by nontargeted metabolomics. As such, important questions have remained superficially addressed. For instance, which tissues exhibit prevalent signatures of metabolic specialization? Reciprocally, which metabolites contribute most to this tissue specialization in contrast to those metabolites exhibiting housekeeping characteristics? Here, we explore tissue-level metabolic specialization in Nicotiana attenuata, an ecological model with rich secondary metabolism, by combining tissue-wide nontargeted mass spectral data acquisition, information theory analysis, and tandem MS (MS/MS) molecular networks. This analysis was conducted for two different methanolic extracts of 14 tissues and deconvoluted 895 nonredundant MS/MS spectra. Using information theory analysis, anthers were found to harbor the most specialized metabolome, and most unique metabolites of anthers and other tissues were annotated through MS/MS molecular networks. Tissue–metabolite association maps were used to predict tissue-specific gene functions. Predictions for the function of two UDP-glycosyltransferases in flavonoid metabolism were confirmed by virus-induced gene silencing. The present workflow allows biologists to amortize the vast amount of data produced by modern MS instrumentation in their quest to understand gene function.


Plants are elegant synthetic chemists making use of their metabolic prowess to produce complex blends of structurally diverse chemicals. Commonly quoted estimates state that plants produce somewhere on the order of 200,000 chemical structures. Secondary metabolites, also referred to as specialized metabolites or natural products, contribute to the largest fraction of this structural diversity. Compared with their counterparts in central metabolism (primary metabolites), secondary metabolite groups have diversified to the extreme in plant lineages, likely as a result of the multiple ecological roles they fulfill (1). The high degree of plasticity of secondary metabolism pathways is consistent with the existence of large families of metabolism-related genes such as cytochrome P450s and UDP-glycosyltransferases in plant genomes that can create structural and chemical modifications almost without limits. The majority of metabolic gene functions remain unknown, however, either because the metabolites that they produce are unknown or significant associations remain to be identified between the expression of specific metabolic genes and characterized metabolic groups.

The biosynthesis of particular secondary metabolites or of complete metabolic groups is frequently taxonomically restricted (2). For this reason, certain secondary metabolite classes have been used as signature characters for biochemical investigation of specific plant families: for instance, quinolizidine alkaloids for Fabaceae (3), tropane and steroidal alkaloids for Solanaceae (4), and iridoids for Lamiaceae (5). Another way of looking at plant secondary metabolism diversity is to consider the precise tissue-specific localization of pathways responsible for their production. Compositional differences are, for instance, readily apparent across floral tissues that produce metabolic blends very different from their vegetative counterparts (6). In the most extreme cases, the accumulation of secondary metabolites can be restricted to specific cell types. For instance, plant defense metabolites are frequently produced in specialized tissues/cell types as a means of minimizing autotoxicity reactions in the surrounding tissues and/or of maximizing the defensive function of these metabolites toward aggressors that attack in a spatially specific manner (7, 8). A better exploration of tissue-level metabolic specialization is therefore particularly helpful in understanding the contribution of a given tissue to an organism’s fitness. Deep biological insight based on single-cell metabolomics has remained technically challenging. Important steps toward this goal have nevertheless been taken in microbial metabolomics, where the technique has proven to be an excellent indicator of phenotypic heterogeneity; comparable progress has not yet been made in plant science.

From a mechanistic standpoint, the accumulation of secondary metabolites in a given tissue requires the spatial-temporal coordination of a vast array of cellular processes in which systems controlling biosynthesis, storage, and degradation are of central importance. Regulatory mechanisms coordinating these processes are only beginning to be uncovered for some model metabolic pathways, such as glucosinolate metabolism in Brassicaceae (9). Coexpression analysis using information about gene and secondary metabolite cross-tissue expression patterns has been applied successfully to infer biosynthetic genes in secondary metabolism (4, 10, 11). In Arabidopsis, several “-omics”-based tissue atlases (e.g., for gene expression, alternative splicing, proteome) are publicly accessible to conduct such types of analysis (12).

Tissue-level nontargeted metabolomics of downstream metabolic readouts is more challenging to implement. Notably, the potential of mass spectrometry (MS)-based metabolomics and of the large-scale acquisition of tandem MS (MS/MS) spectra for as many metabolites as possible within a metabolic profile is severely constrained by the absence of straightforward classification and visualization pipelines that enable facile pathway interpretations. Metabolite annotation and identification are the obvious bottlenecks that thwart the metabolomics analysis of secondary metabolism (13, 14). Ideally, we need approaches that combine the strengths of state-of-the-art statistical methods currently emerging from the genomics field with the recent advances in metabolomics data mining, such as the method of MS/MS molecular networking, which allows unknown metabolites to be readily classified based solely on their fragmentation patterns.

Here, we developed a pipeline combining tissue-wide nontargeted MS data acquisition and information theory to mine patterns of tissue-specific structural diversity. With this pipeline, we analyzed a compendium of 14 dissected tissues of Nicotiana attenuata, an ecological model for chemically mediated adaptive traits in the wild. The analysis resulted in the deconvolution of 895 nonredundant MS/MS spectra, of which 595 exhibited preferential tissue specificity. Using information theory analysis, we asked whether certain tissues exhibited a higher degree of tissue metabolic specialization and which MS/MS data were linked to these patterns. From all this information, tissue–metabolite association maps were created to provide predictions about the tissue-level analysis of gene functions, some of which were tested by gene silencing techniques.

Results

A Compendium of MS Profiles Obtained from Isolated N. attenuata Tissues.

Here, we isolated 14 tissues from 28- and 50-d-old N. attenuata plants growing under controlled growth conditions in the glasshouse (Fig. 1A). Pools of 100-mg tissues were extracted using independent extractions with 80% or 20% (vol/vol) methanol to increase the coverage of the metabolome with polar to semipolar compounds not efficiently extracted by 80% (vol/vol) methanol. We used an optimized ultrahigh-performance liquid chromatography (UHPLC) electrospray ionization (ESI)/quadrupole time-of-flight (qTOF) MS method to analyze the metabolome profiles of these tissues. Identical chromatographic conditions were used for the analysis of these two extraction types because retention time consistency for identical mass features (with mass features being m/z signals detected at a given retention time by the peak picking method) is one of the criteria implemented in our bioinformatics workflow.

Fig. 1.

Integration of MS-based metabolomics and information theory analysis highlights tissue-specific metabolome specialization. (A) Tissues were collected and analyzed separately for metabolomic profiling. Detailed explanations of the tissue collection procedure are provided in Materials and Methods. (B) Hierarchical clustering, using the Euclidean distance as the clustering metric, of tissue-specific idMS/MS relative expression profiles. The heat map coloring depicts the scaled intensities. Z-score–normalized median absolute distances captured the cross-tissue variations for idMS/MS intensities (895 idMS/MSs) obtained for each tissue. (C) Information theory analysis of tissue-level idMS/MS composition δj and Hj based on idMS/MS cross-tissue distributions is displayed in a 2D space to reveal gradients of metabolic specialization. ANT, anthers; BUD, floral bud; COR, corolla tube; FIL, filaments; LEA, rosette leaves; LIM, corolla limb; PED, floral pedicel; ROO, root; SEE, seeds; SEP, floral sepals; STE, stem; STY, floral style and stigma.

The dataset (Dataset S1) was processed using the R package XCMS utilizing optimized parameters and analyzed by principal component analysis (PCA), which confirmed that extensive variations in the composition of mass features exist among the different tissue profiles (SI Appendix, Fig. S1). The XCMS × PCA processing procedure is a very common one and is, together with a priori knowledge annotation of prominent mass features and of in-source fragmentation patterns, frequently considered as the central mining step in most metabolomics studies. However, patterns revealed from this type of data mining provide little to no information with respect to compound diversity among samples. This type of biological interpretation critically requires an analysis at the level of metabolites described by deconvoluted spectra, and not at the level of the individual mass features on which this common procedure operates.

Creating a Multitissue Indiscriminant MS/MS Library for Metabolite Structural Analysis.

To collect a holistic repertoire of structural information on the metabolic diversity in our tissue compendium, we implemented a tissue-wide analytical pipeline for indiscriminant (data-independent) MS/MS (idMS/MS) analysis. Compared with data-dependent acquisition methods involving the preselection of a restricted list of precursor ions for collision-induced dissociation (CID) fragmentation, this approach considers for fragmentation analysis all signals within an m/z range set as large as possible (15). In recent years, the idMS/MS technique, sometimes referred to as shotgun or broad-scale MS/MS, has gained considerable interest as an exploratory method for metabolomics measurements. In a previous study, we showed that idMS/MS can be efficiently implemented on most qTOF instruments by running replicated measurements of the same sample using idMS/MS at different CID voltages to maximize fragment coverage (16). Furthermore, because the idMS/MS method has the disadvantage of being uninformative about precursor-to-fragment relationships, we optimized a computational pipeline based on cross-sample correlation calculations to perform fragment relationship assignments with high confidence (16).

Here, we improved the previous computational pipeline for exploiting cross-tissue metabolic variations to gain statistical power in precursor-to-fragment assignments. Briefly, for each CID voltage, precursor and fragment relationships were assigned using Pearson correlation coefficient (PCC) analysis across all tissues (Materials and Methods). The idMS/MS spectra reconstructed at each CID voltage were merged into a composite idMS/MS spectrum, and some of the redundant sub-idMS/MSs were grouped simultaneously via the calculation of spectral similarity. Notably, not all putatively redundant sub-idMS/MSs could be merged into respective compound-specific idMS/MS spectra using the single spectral similarity threshold value applied to the dataset; hence, metabolites prone to particularly intense in-source fragmentation frequently yielded several idMS/MS spectra in the analysis. This possible challenge was likely minimal in our study, however, because these different sub-idMS/MSs are expected to covary across the tissue dataset and to form tight clusters during the structural clustering analysis applied later on in the workflow (SI Appendix, Fig. S2). The discrimination of nearly coeluting isobaric peaks, resulting from compounds with the same molecular weight but different structures, is a challenge inherent to all large-scale metabolomics studies and one that can only be partly resolved via technical advances such as enhanced ion mobility MS. From a data processing standpoint, if two nearly coeluting isobaric species return overlapping fragmentation patterns, the correlation score for the precursor-fragment assignment will consequently be affected. Such a scenario could explain challenges encountered during the assembly of certain spectra (which were therefore not included in subsequent analyses).
However, an advantage of the precursor-fragment assignment method in our pipeline is that it does not rely solely on the chromatography behavior of candidate m/z signals but also on their coregulated behavior across the tissue dataset, which, to a certain extent, improves the assembly of nearly coeluting metabolites. The deconvolution efficiency was tested by comparing idMS/MS spectra with previously reported MS/MS spectra at optimized CID voltages for major N. attenuata secondary metabolites as well as unknowns (Dataset S1). Altogether, the computational pipeline (merging of CID voltage-specific data and partial redundancy filtering) retrieved a library of 895 nonredundant idMS/MSs (Dataset S1); these idMS/MSs were used as the data for all subsequent analyses presented here.
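The correlation-based assignment described above can be illustrated with a minimal sketch. It assumes that each precursor and candidate fragment is summarized by one intensity value per tissue; the function names, the example intensities, and the `min_pcc` cutoff are hypothetical and are not taken from Materials and Methods.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)

def assign_fragments(precursor, candidates, min_pcc=0.8):
    """Keep candidate fragments whose cross-tissue profile tracks the precursor.

    min_pcc is an illustrative cutoff, not the value used in the study.
    """
    return [name for name, profile in candidates.items()
            if pearson(precursor, profile) >= min_pcc]

# Hypothetical intensities across five tissues
precursor = [10.0, 200.0, 15.0, 180.0, 5.0]
candidates = {
    "frag_a": [1.0, 21.0, 1.4, 17.5, 0.6],      # covaries with the precursor
    "frag_b": [50.0, 48.0, 52.0, 49.0, 51.0],   # roughly constant background ion
}
assigned = assign_fragments(precursor, candidates)
print(assigned)
```

In this toy case only the covarying fragment is retained. A real pipeline would additionally restrict candidates to signals that coelute with the precursor, as the text notes.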

Tissues Differ in Their Degree of Metabolic Specialization.

For a first perspective into tissue metabolic relationships, we normalized idMS/MS spectra intensities (precursor intensities in MS mode) using a modified Z-score method, termed ZMAD (Z-score normalized median absolute distance) (Materials and Methods) and used hierarchical clustering analysis (HCA) (Fig. 1B). When merging the datasets obtained from the two extraction procedures, three main clusters appeared from the HCA based on Euclidean distance calculations (Fig. 1B): one cluster with most nonreproductive tissues of flowers (corolla limb, corolla tube, sepal, pedicel, bud, and filament), one with the reproductive parts (anther, nectary, ovary, style, and stigma), and a last one with vegetative tissues (leaf, root, stem, and seed). Interestingly, tissues that connect reproductive and nonreproductive parts in flowers, namely, filaments and anthers, exhibited strongly divergent idMS/MS compositional profiles, demonstrating that relatively fine-scale spatial modulations of metabolism can be analyzed by this approach. It should be noted that the upstream computational procedure used to deconvolute idMS/MS spectra was performed tissue-wide (and not at the individual tissue level) and relied on matrix alignment and noise filtering steps to produce an idMS/MS tissue-wide matrix that was of a consistent size (Dataset S1). A drawback of this computational approach is that no information about the number of idMS/MSs per tissue is readily available to explore tissue-level metabolic specialization. Intuitively, the presence of a few high-intensity idMS/MSs in a given tissue compared with the average calculated across all tissues could be indicative of a high degree of metabolic specialization, whereas the presence of a large number of average-intensity idMS/MSs could reflect a low metabolic specialization. Such interpretations are linked to the frequency distribution of each idMS/MS within the dataset.
Information theory, which was pioneered by Shannon (17) in a seminal article in 1948, provides the statistical framework to cope with this type of analysis. In defining tissue metabolic diversity and specialization, we therefore considered idMS/MS spectra as symbols, in the sense of information theory, and estimated for each tissue’s metabolome its diversity based on the Shannon entropy of its frequency distribution. In other words, tissue-level metabolome specialization was measured as the average specificity of each of its idMS/MS components. Using previously implemented formulae (18), we retrieved values for the following indexes: diversity (Hj) reflecting the tissue-level idMS/MS diversity and specialization (δj) for the tissue-level idMS/MS specialization as inferred from the average idMS/MS specificity in the dataset.
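To make the two indexes concrete, the sketch below computes a Shannon diversity and a specialization value per tissue from an intensity matrix. It assumes the formulae take the form commonly used in this information theory framework: Hj is the Shannon entropy (log base 2) of the relative frequencies within tissue j, the specificity Si of a spectrum is the mean over tissues of (pij/pi) log2(pij/pi), where pi is its average frequency, and δj is the frequency-weighted mean of Si in tissue j. The exact formulae of ref. 18 are not reproduced in this section, so this form, the function name, and the toy matrices are assumptions.

```python
from math import log2

def tissue_indexes(intensities):
    """intensities[i][j]: non-negative intensity of spectrum i in tissue j.

    Returns (H, delta): per-tissue diversity and specialization.
    Assumes every tissue has a non-zero total intensity.
    """
    n_spec = len(intensities)
    n_tis = len(intensities[0])
    totals = [sum(intensities[i][j] for i in range(n_spec)) for j in range(n_tis)]
    # p[i][j]: relative frequency of spectrum i within tissue j
    p = [[intensities[i][j] / totals[j] for j in range(n_tis)]
         for i in range(n_spec)]
    # Diversity H_j: Shannon entropy of the tissue's frequency distribution
    H = [-sum(p[i][j] * log2(p[i][j]) for i in range(n_spec) if p[i][j] > 0)
         for j in range(n_tis)]
    # Specificity S_i: how unevenly spectrum i is distributed across tissues
    S = []
    for i in range(n_spec):
        mean_p = sum(p[i]) / n_tis
        if mean_p == 0:
            S.append(0.0)
            continue
        S.append(sum((pij / mean_p) * log2(pij / mean_p)
                     for pij in p[i] if pij > 0) / n_tis)
    # Specialization delta_j: average specificity weighted by frequency in tissue j
    delta = [sum(p[i][j] * S[i] for i in range(n_spec)) for j in range(n_tis)]
    return H, delta

# Four spectra spread evenly over three tissues: maximal diversity, no specialization
H_flat, d_flat = tissue_indexes([[1.0, 1.0, 1.0]] * 4)
# Two spectra, each confined to a single tissue: no diversity, maximal specialization
H_spec, d_spec = tissue_indexes([[1.0, 0.0], [0.0, 1.0]])
print(H_flat, d_flat, H_spec, d_spec)
```

Under these assumptions, a tissue dominated by a few spectra that occur almost nowhere else sits at low Hj and high δj, which is the corner of the 2D space occupied by anthers in Fig. 1C.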

Visualizing tissue metabolic profiles in a 2D space using these two indexes as coordinates revealed a number of interesting patterns (Fig. 1C). The most obvious one was that tissues significantly vary in their degree of δj and Hj. When extraction types were merged to achieve a more comprehensive view, anthers emerged as the tissue with the least diverse, most specialized metabolome (Hj = 5.16, δj = 1.95). In other words, several idMS/MSs exhibited relatively high intensity levels concomitant with low-frequency distributions across tissues. Root (Hj = 6.66, δj = 1.49), stem (Hj = 6.91, δj = 1.32), and sepal (Hj = 7.39, δj = 1.38) samples followed anthers in terms of low idMS/MS diversity and middle to high idMS/MS specialization. In the case of roots, the relatively high specialization index value retrieved for this tissue was especially supported by idMS/MS spectra collected from the 20% (vol/vol) methanol extraction (SI Appendix, Fig. S3). The signature for low diversity and low specialization detected in seeds (Hj = 6.34, δj = 1.09) was in line with the low density of chromatographic peaks seen for this tissue. Style and stigma (Hj = 6.47, δj = 1.11), filaments (Hj = 7.32, δj = 1.08), ovary (Hj = 7.31, δj = 1.10), corolla tube (Hj = 7.81, δj = 1.12), and pedicel (Hj = 8.17, δj = 1.16) were the tissues exhibiting lowest specialization indexes. Of the tissues listed here, the pedicel had the most diverse idMS/MS profile.

Tissue-Level Differentiations in 17-Hydroxygeranyllinalool Diterpene Glycosides and Phenolamides.

As a first step toward mining metabolite compositional variations across tissues, we amortized previous chemical knowledge acquired from N. attenuata leaves and evaluated whether the distribution across tissues was differentially modulated at different levels of known secondary metabolic pathways. For this analysis, we selected as a case study the 17-hydroxygeranyllinalool diterpene glycosides (17-HGL-DTGs) pathway that produces abundant acyclic diterpenes with antiherbivore functions (19). For this metabolic group, we retrieved the corresponding idMS/MSs for the most abundant metabolites and investigated cross-tissue modulations as visualized by plotting individual tissue ZMAD-normalized values (SI Appendix, Fig. S4). The 17-HGL-DTGs can be subcategorized based on their sugar/malonyl decorations as follows: the precursor molecule (lyciumoside I), core structures with a higher degree of glycosylation but no malonyl groups (nicotianoside III, lyciumoside IV, and attenoside), and monomalonylated (nicotianosides IV, Ia, and VI) and dimalonylated (nicotianosides V, II, and VII) structures. Lyciumoside I and lyciumoside IV, its direct rhamnosylation product, were detected in young and photosynthetically active rosette leaves and at lower normalized levels in certain floral organs. The analysis revealed that 17-HGL-DTGs varied significantly among tissues, and the variance was organized by biosynthetic sequence in the pathway. The general trend was that greater tissue-specific variation was found in the downstream steps of the pathway. This trend was particularly apparent for monomalonylated 17-HGL-DTGs and suggests an increased translocation from source to sink tissues that increased with 17-HGL-DTG glycosylation and malonylation. Dimalonylated 17-HGL-DTGs were more abundant in certain reproductive organs relative to all other tissues.
Another pathway monitoring example is provided for the phenolamide pathway, for which an apparent greater specificity toward certain reproductive organs was detected for polyacylated spermidine conjugates (SI Appendix, Fig. S4).

Large-Scale Inference of IdMS/MS Tissue Specificity Reveals Basic Principles of Tissue Interdependencies.

The above descriptions confirmed that tissue-based differentiations are detectable for characterized metabolic pathways. This finding is consistent with the idea that specific groups of metabolites specifically accumulate in one or several tissues, albeit being detectable at lower levels in almost all other tissues. The specificity index of information theory of a given idMS/MS serving for δj calculation tends to be stringent and excludes features exhibiting a significant degree of specificity (association) with more than one tissue (Fig. 2A, Center). In an attempt to assess the degree of association of an idMS/MS toward one or several tissues statistically, we analyzed idMS/MS expression distribution across tissues using reduction of kurtosis as developed by Li et al. (20). The kurtosis analysis measures expression distribution patterns rather than frequencies and skirts the restriction of the number of tissues with which a given idMS/MS can be associated. As such, the method has been found to be highly successful in detecting tissue specificity from large-scale data. Briefly, idMS/MS spectra that exhibit high tissue specificity are characterized by high kurtosis values with either right- or left-tailed leptokurtic distributions, whereas idMS/MS spectra that exhibit low tissue specificity have low kurtosis values with normal distributions (Fig. 2 A and B). A total of 595 of 895 idMS/MSs exhibited preferential tissue associations (Q < 0.05), with the rest of the idMS/MSs being considered as non–tissue-associated features. For ease of interpretation, SI Appendix, Fig. S5 reports the statistical support via mapping of kurtosis Q values and inferred tissue associations for previously discussed tissue-level differentiations in the 17-HGL-DTG and phenolamide pathways.
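The contrast between leptokurtic and flat cross-tissue distributions can be seen with the plain kurtosis statistic. The sketch below computes only the sample excess kurtosis of a cross-tissue profile; it is not the reduction-of-kurtosis procedure of ref. 20, it does not produce Q values, and the two 14-tissue profiles are hypothetical.

```python
def excess_kurtosis(values):
    """Sample excess kurtosis (fourth standardized moment minus 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return 0.0  # constant profile: treated as carrying no tail information
    return m4 / m2 ** 2 - 3.0

# Hypothetical profiles across 14 tissues
specific = [0.0] * 13 + [10.0]                 # signal confined to one tissue
spread = [float(i) for i in range(1, 15)]      # signal spread evenly over tissues

k_specific = excess_kurtosis(specific)
k_spread = excess_kurtosis(spread)
print(k_specific, k_spread)
```

The single-tissue profile yields a strongly positive value, whereas the evenly spread profile yields a negative one, which is the qualitative distinction drawn in Fig. 2 A and B.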

Fig. 2.

Large-scale analysis of idMS/MS tissue specificity. (A) Cross-tissue distribution patterns for three idMS/MS examples. Z-score–normalized median absolute distances captured cross-tissue variations for idMS/MS intensities. idMS/MSs deconvoluted for m/z 295.102 @ 374 s and 901.404 @ 1,032 s revealed clear tissue specificity for one and two tissue types, respectively. idMS/MS for m/z 627.340 @ 1,473 s was not associated with a particular tissue. (B) Density of intensity levels of each idMS/MS across all analyzed tissues is computed and filtered using a reduction of kurtosis method to determine idMS/MS with significant tissue specificity. (C) Bar chart showing the number of idMS/MSs per tissue using an intensity threshold of 2 (Left), and bar chart showing the percentage of idMS/MSs illustrating tissue specificity per tissue (Right). (D) Heat map matrix visualizing idMS/MS sharing among tissues as measured using the Jaccard index. The idMS/MS classifications to main compound classes in N. attenuata as obtained by idMS/MS alignments to public libraries and manual curation are shown in Dataset S1.

To retrieve tissue idMS/MS–specific associations, we defined a tissue relative expression threshold Z (Z = 2) through the calculation of a reduction of kurtosis according to the rationale proposed by Li et al. (20) (Fig. 2C and SI Appendix, Fig. S6). Seeds again harbored the smallest associated metabolome, with 99 specifically expressed idMS/MSs, followed by stem (158 idMS/MSs) and root (278 idMS/MSs), whereas a general trend was that floral organs had the largest numbers of associated idMS/MSs. An interesting level of analysis was therefore to look at the percentage of tissue-specific idMS/MSs compared with nonspecific ones per tissue. For instance, the idMS/MSs specific to roots, although few in number (278 idMS/MSs), represented 72.3% of the total detectable root metabolome, indicating the relatively high metabolic specialization of this tissue.
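The thresholding step can be sketched as follows, under two stated assumptions: that ZMAD scaling subtracts the cross-tissue median and divides by the median absolute deviation (the exact definition is given in Materials and Methods and may include a scaling constant, such as the 0.6745 factor of the common modified Z-score), and that an idMS/MS is counted as associated with a tissue when its scaled value reaches Z = 2. The tissue labels follow Fig. 1; the intensities and function names are hypothetical.

```python
from statistics import median

def zmad(profile):
    """Median/MAD scaling of one cross-tissue profile (assumed form of ZMAD)."""
    med = median(profile)
    mad = median([abs(v - med) for v in profile])
    if mad == 0:
        return [0.0 for _ in profile]  # no spread: nothing stands out
    return [(v - med) / mad for v in profile]

def associated_counts(matrix, tissues, z=2.0):
    """Count, per tissue, the spectra whose scaled intensity reaches z."""
    counts = {t: 0 for t in tissues}
    for profile in matrix:
        for tissue, score in zip(tissues, zmad(profile)):
            if score >= z:
                counts[tissue] += 1
    return counts

tissues = ["ANT", "LEA", "ROO", "STE", "SEE"]
matrix = [
    [100.0, 5.0, 6.0, 4.0, 5.0],   # stands out in anthers only
    [3.0, 4.0, 50.0, 45.0, 5.0],   # shared by root and stem
    [7.0, 7.0, 7.0, 7.0, 7.0],     # housekeeping-like, no association
]
counts = associated_counts(matrix, tissues)
print(counts)
```

Dividing each count by the number of spectra detectable in that tissue would give the kind of per-tissue percentage reported for roots.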

In agreement with the importance of not restricting the analysis only to single tissue-specific idMS/MSs and considering different degrees of specificity based on the number of tissues in which a given idMS/MS accumulates, we detected that idMS/MSs specifically associated with more than one tissue were highly prevalent (97%) in the tissue-specific idMS/MS pool (SI Appendix, Fig. S7). The relative strength of the metabolic interdependencies between two tissues based on the number of shared tissue-specific idMS/MSs was scored using the Jaccard index (Fig. 2D). Clustering based on this score again supported the fact that vegetative tissues such as leaf, stem, and root cluster apart from floral counterparts in terms of secondary metabolite profiles. The “floral” cluster subdivided into three smaller clusters: one with tight connections between tissues not directly involved in reproduction (besides the complete bud); one comprising tissues with mostly reproductive functions (filament, style, ovary, but also the nectary); and, finally, anthers. The individualized positioning of anthers in this clustering analysis is in agreement with the information theory specialization signature detected in this tissue as discussed above.
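As a brief illustration of the sharing score, the Jaccard index of two tissues is the number of tissue-specific idMS/MSs they share divided by the number found in either. The spectrum identifiers below are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical sets of tissue-associated spectrum identifiers
limb = {"s1", "s2", "s3"}
tube = {"s2", "s3", "s4"}
root = {"s9"}

print(jaccard(limb, tube), jaccard(limb, root))
```

Computing this score for every tissue pair gives a symmetric matrix that can be clustered, as in Fig. 2D.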

MS/MS Structural Analysis of IdMS/MS Associations.

Examples of annotated tissue-specific idMS/MS spectra shared by different tissues are presented in Dataset S1. Metabolite annotation remains a bottleneck in metabolomics studies because public spectral databases are poorly populated with plant-specialized metabolites, with many of them being taxa-specific and frequently species-specific. The MS/MS molecular network method pioneered by Dorrestein and coworkers (21) circumvents the limitation of spectral databases via the analysis of within-dataset MS/MS similarities to accelerate hypothesis generation about the identity of unknown MS/MS (21). This approach also has the advantage of being amenable to the visualization of putative biochemical relationships among metabolites corresponding to highly similar MS/MS spectra (16). In a recent study, we improved the scoring and classification of MS/MS similarities for plant secondary metabolites notably by the implementation of a biclustering method that detects possible compound familial groupings according to fragment and neutral loss (NL)-based similarities (16). Applying this method to the total pool of idMS/MSs from the present study resulted in the formation of nine modules within which idMS/MSs are expected to share high structural similarities (Fig. 3A). Modules 4, 6, 7, 8, and 9 largely corresponded to previously identified compound classes: flavonoid glycosides, phenolics, 17-HGL-DTGs, acyl sugars, and nicotine and polyamine derivatives, respectively. As expected, uncharacterized metabolites likely belonging to these groups and not previously thoroughly investigated were also detected as sub-idMS/MSs that did not merge during the redundancy filtering step. A comprehensive view is provided in Dataset S1 that concatenates MS/MS spectral content and NLs, as well as their clustering and association with tissues. A critical consideration was whether tissues differ in their module relative composition as depicted in the stacked bar chart of Fig. 3B. 
For instance, it is clearly visible that complete O-acyl sugar metabolism (M8) is absent from anthers and the style, that the stem and seeds lack 17-HGL-DTG metabolism (M7), and that the flavonoid module (M4) is overrepresented in certain flower tissues. Molecular networks can be constructed for each module to visualize structural relationships among idMS/MSs better (Fig. 3C). The case of module M4 is presented (Fig. 4). A subpart of this flavonoid-enriched module contains O-acyl sugar type II due to shared NLs with flavonoid glycosides; those two groups are still discriminated according to the edge density and by the careful inspection of idMS/MS tissue coexpression scores. By simply mapping the relative expression of idMS/MSs onto nodes, it is possible to pinpoint metabolites that are characteristic of a given tissue rapidly, for instance, the dramatic overrepresentation of kaempferol-3-O-glucoside (KG) at the limb level (808.684 ZMAD scaled intensity). Also, the idMS/MS for m/z 295.102 specific to anthers and not coexpressed across tissues with any other flavonoid glycosides from module M4 is putatively annotated by our method as a glucose ester with C4 side chains, depicted here as 6-tuliposide B (22).
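The two similarity types underlying the modules, shared fragments and shared NLs, can be sketched with a cosine-type normalized dot product. This is a simplified stand-in: peak matching by rounding m/z, the absence of intensity or mass weighting, and the function names are assumptions, and the m/z values are illustrative figures for the protonated glucosides of kaempferol and quercetin and their aglycone fragments.

```python
from math import sqrt

def cosine_similarity(spec_a, spec_b, decimals=2):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts.

    Peaks are considered matched when their m/z agree after rounding.
    """
    a = {round(mz, decimals): inten for mz, inten in spec_a.items()}
    b = {round(mz, decimals): inten for mz, inten in spec_b.items()}
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def neutral_losses(precursor_mz, spectrum):
    """Re-express a spectrum as precursor-minus-fragment mass differences."""
    return {precursor_mz - mz: inten
            for mz, inten in spectrum.items() if mz < precursor_mz}

# Illustrative single-fragment spectra: each glucoside loses a hexose (about 162.05)
kg_precursor, kg = 449.108, {287.055: 100.0}
qg_precursor, qg = 465.103, {303.050: 80.0}

fragment_score = cosine_similarity(kg, qg)
nl_score = cosine_similarity(neutral_losses(kg_precursor, kg),
                             neutral_losses(qg_precursor, qg))
print(fragment_score, nl_score)
```

The two toy spectra share no fragment but share the same neutral loss, which shows why NL-based scoring can group glycosides of different aglycones, and also why O-acyl sugars with similar sugar losses can fall into a flavonoid-enriched module.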

Fig. 3.

Combination of structural classifications of idMS/MS and tissue specificity of expression. (A) Biclustering analysis to classify idMS/MSs according to structural similarities. The analysis used two scoring methods: one based on shared fragments among spectra, whereas the other scored shared common NLs among spectra. Using biclustering, which favors clustering based on iterative alignments of spectra based on the two scoring methods, produces large modules (M) with structurally related idMS/MSs. Some of these modules were congruent with known compound families, whereas others were composed of yet unknown or poorly characterized metabolites. Module annotation and idMS/MS intensity distribution are reported in Dataset S1. (B) Relative contribution of each module to the idMS/MSs associated with a given tissue. The visualization highlights the complete absence of specific metabolic groups, corresponding here to particular modules, such as O-acyl sugars (O-AS), in anthers. (C) Molecular networks constructed for each module. Nodes represent idMS/MSs and edges represent similarity values based on the two scoring types. Tissue specificity can easily be mapped to the molecular networks.

Fig. 4.

Distribution of a flavonoid-enriched module among different flower parts. (A) Network representation and annotation of module M4 from the biclustering analysis. Nodes correspond to idMS/MS spectra, and edges correspond to their pairwise similarity as measured according to the fragment (NDP; >0.6) and NL (>0.6) similarity. Many of the spectra correspond to flavonoid glycosides, albeit O-acyl sugars of type II are also present due to shared NLs. (B) Cross-tissue coexpression (based on an Ochiai score > 0.6) between idMS/MS spectra discriminates flavonoid glycosides from O-acyl sugars. The analysis reveals metabolites within the M4 module with high tissue specificity, such as idMS/MS at m/z 295.102, predicted to be a tuliposide derivative, which is abundant in anthers (Fig. 3A). (C) Examples of visualization of cross-tissue variations for idMS/MSs of M4. Node size is proportional to the cross-tissue relative intensity of each idMS/MS. Color mapping denotes rules presented in A. Gray nodes do not exhibit tissue specificity, whereas yellow nodes were detected as tissue-specific. Red-circled nodes are annotated as flavonoid glycosides. Identifications of KG, kaempferol-3-O-sophoroside (glucosyl(1-2)glucoside) (KGG), KGR, QG, QGG, and quercetin-3-O-rutinoside [QGR (Rutin)] are according to Snook et al. (53).
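The Ochiai score used for cross-tissue coexpression can be sketched on binary tissue associations: it is the number of tissues shared by two features divided by the geometric mean of the numbers of tissues associated with each. The tissue sets below are hypothetical, and treating associations as simple sets is an assumption about how the score is applied.

```python
from math import sqrt

def ochiai(a, b):
    """Ochiai coefficient between two sets of associated tissues."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

# Hypothetical tissue associations for two features
feature_x = {"BUD", "LIM", "COR", "ANT"}
feature_y = {"BUD", "LIM"}

score = ochiai(feature_x, feature_y)
print(round(score, 2))
```

Because only the tissues in which a feature is called associated enter the score, two features are rated similar when they peak in the same tissues, regardless of how their intensities vary elsewhere.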

Exploring Metabolite and Gene Coassociations Across Tissues Facilitates Metabolic Gene Pathway Assignment.

In this last section, we illustrate the power of first determining tissue–metabolite associations in generating predictions about the assignment of unknown genes to particular pathways. In the case of a unimodal regulation (with cross-tissue transport being minimal), the logic behind these predictions is that a gene responsible for the production of a given set of metabolites will share maximal tissue associations with these metabolites. As for gene expression data, we used an RNA-sequencing (RNAseq) transcriptome dataset (SI Appendix, Table S2) in which tissues and developmental stages largely overlap with those tissues and developmental stages used for metabolomics but that also included treatment responses to account for the fact that certain genes are expressed constitutively at low levels but the metabolites can accumulate without turnover to high levels. Similar to idMS/MS data, the kurtosis filtering allowed us to filter out genes with quasiconstant expression and focus on the genes exhibiting leptokurtic distributions (SI Appendix, Figs. S6 and S8). Thirty-seven percent of the total genes expressed exhibited leptokurtic distributions (i.e., expressed specifically in one or several tissues) (Fig. 5A). Subsequent analysis steps followed the steps presented above for the analysis of metabolites. Overrepresented gene ontologies (GOs) within the complete set of genes with preferential tissue associations corresponded to general processes such as chloroplast thylakoid activity, monocarboxylic acid biosynthetic process, anion transport, and metal ion transport (SI Appendix, Fig. S9). This GO overrepresentation analysis was also conducted on a module basis for a gene set specifically associated with a given idMS/MS module using an Ochiai similarity index (SI Appendix, Figs. S9 and S10). 
For this calculation, emphasis is placed on tissue specificity rather than on the characterization of a trend of coexpression across the complete tissue set such as is the case when using simple PCC analysis. Through this approach, it is now possible to target specific metabolic gene families, UDP-glycosyltransferases here, and to predict their importance for the metabolic group enriched within a given idMS/MS module (Fig. 5B). For mining this latter gene family, an additional filtering criterion is the presence within the coassociated idMS/MSs of NLs corresponding to glucose or rhamnose moieties. Modules 4, 7, and 8 are made up of idMS/MSs corresponding to glycosylated secondary metabolites, and hence enriched in the presence of these latter NLs (Fig. 5B). We extracted 10 members of this gene family that had cotissue specificities with members of M4. Next, we tested, by transient gene silencing using virus-induced gene silencing (SI Appendix, Fig. S11), the pathway assignment of two of these UDP-glycosyltransferases highlighted by the Ochiai similarity analysis: UDP-glycosyltransferase-A (UGT-A) [Ochiai similarity = 0.71 with quercetin-3-O-glucose (QG)] and UDP-glycosyltransferase-B (UGT-B) (Ochiai similarity = 0.71 with rutin). Briefly, when silencing UGT-A, a majority of the flavonoid glycosides in flower buds were significantly decreased in their accumulations, namely, KG, QG, kaempferol-3-O-glucose-rhamnose (KGR), and quercetin-3-O-glucose-glucose (QGG). On the other hand, silencing UGT-B translated into significant decreases in the levels of rhamnose-containing KGR and rutin, whereas QGG, QG, and KG accumulated to higher levels compared with the empty vector control. This result is consistent with the conclusion that UGT-B likely controls the rhamnosylation of these flavonoid glycosides and that the higher accumulations of nonrhamnose flavonoid glycosides reflect the metabolic tension existing with the UGT-A–mediated glucosylation process (Fig. 5C and SI Appendix, Fig. S11). Future work could test these predictions and examine the enzymatic properties of these two UDP-glycosyltransferases.

Fig. 5.

Silencing UGT-A and UGT-B reveals their involvement as UDP-glucosyltransferase and UDP-rhamnosyltransferase, respectively, in floral flavonoid glycoside metabolism, two predictions of the tissue coexpression analysis. (A) Results of the kurtosis filtering analysis for preferential tissue–gene associations. Examples are provided for the tissue specificity of members of large metabolic gene families. Notably, 71% of all predicted UDP-glycosyltransferases (GT) exhibit tissue specificity in the transcriptome dataset. (B, Left) Number of tissue-specific idMS/MS spectra containing glucose or rhamnose NLs, and therefore predicted to be glycosylated secondary metabolites, compared with the total number of tissue-specific idMS/MSs per biclustering module. M4 is enriched in flavonoid glycosides, M7 in 17-HGL-DTGs, and M8 in O-acyl sugars. (B, Right) Number of UDP-GT coassociated across tissues (Ochiai score > 0.4) with at least one idMS/MS containing glucose (G) or rhamnose (R) NL of each module. (C) Relative levels of precursors corresponding to idMS/MSs referred to in B after analysis of flower buds of plants inoculated with empty vector and gene silencing constructs for UGT-A and UGT-B (SI Appendix, Fig. S11). As supported by the annotation of idMS/MS spectra, silencing UGT-A decreases the glucosylation of flavonols, whereas silencing UGT-B decreases their additional rhamnosylation. Identifications of KG, KGG, KGR, QG, QGG, and QGR (Rutin) are according to Snook et al. (53). *P < 0.05; **P < 0.01; ***P < 0.001.

Discussion

In this study, we investigated tissue-level variations in secondary metabolism in an ecological model plant using computational metabolomics and information theory statistics. Information theory has been used for multivariate data generated in a broad scope of biological contexts ranging from plant ecology (23) to microbiome diversity (24), but, to our knowledge, it has never been used to summarize trends in MS-based metabolomics data. Previous studies identified preferential tissue-based redirectionalities in secondary metabolism, for instance, during the maturation of tomato fruits for which the green, turning, and red developmental stages are characterized by rearrangements in pathways related to flavonoids, phenolics, and glycoalkaloids (25). However, to our knowledge, no unbiased metabolomics study, other than a study of the AtMetExpress database (6), has been applied with rigorous statistical analysis to such a broad range of tissues as in the present study.

This statistical portfolio revealed that tissues exhibit distinct states of secondary metabolism activity but also that they differ in their degree of specialization. An extracted feature illustrative of the explorative power of this approach was that connecting tissues of flowers such as the anthers and filaments, on one hand, and the corolla limb and tube, on the other hand, differed dramatically in their metabolite specialization signatures. For the corolla, this contrast highlights the fact that limbs are functionally specialized for attracting and guiding pollinators, and likely require a highly specialized metabolome to fulfill this function. The latter is especially expected in a species such as N. attenuata whose main pollinator, the hawkmoth Manduca sexta, is also a voracious folivore during its larval stage (26), requiring that the plant critically fine-tune its blend of secondary metabolites to solve the dilemma imposed by these two contrasting interactions (27, 28). As expected, the green tissues of our dataset display the most prototypic and undifferentiated metabolic profiles, as highlighted by both targeted and nontargeted analyses.

N. attenuata is a pioneer plant in postfire habitats (29, 30), and as such, it represents one of the primary food sources for herbivorous insects (31). It is well established that the photosynthetically active tissues of this plant mount a very strong specialized metabolic response locally and systemically during biotic challenges such as insect herbivory (32). It would therefore be very interesting to reassess how the specialization indices readjust during stress adaptation, taking advantage of preexisting knowledge on antiherbivory function of many secondary metabolite classes (6, 19, 33). Also, the pools of many of these defensive secondary metabolites are rearranged during ontogeny in the form of quantitative gradients established across tissues (19, 34). The optimal defense theory provides a conceptual framework that links these quantitative patterns with the value of different tissues for the plant’s fitness (35, 36). Even though the developmental stages of multiple tissues would need to be separately analyzed using our analytical approach to evaluate this theory thoroughly, several defense-related metabolites exhibited higher relative levels in reproductive tissues than in vegetative counterparts, a central prediction of the optimal defense theory. A last remark concerns the extremely low metabolic diversity detected in seeds, a result that could possibly be due to the fact that most apolar metabolites present in the seed endosperm were poorly recovered with our extraction systems. This result speaks to the need to use a more sophisticated combination of extraction and chromatographic systems in future experiments to capture the behavior of a broader range of compound classes.

It is tempting to consider that signatures of high metabolic specializations observed for certain tissues correlate with their highly specialized physiological functions. Previous tissue-level -omics analyses in plants and animals are consistent with the expectation that physiological differentiation is accompanied by qualitative variations of metabolic capacities (37, 38). As previously noted, this claim is difficult to support only with metabolomics due to the sparse knowledge about secondary metabolite biosynthetic schemes. The GO enrichment analysis conducted in this study supports the view that the kurtosis-based method can discriminate gene signatures involved in tissue-specialized physiological processes (e.g., pollen tube growth) from housekeeping ones (SI Appendix, Figs. S9 and S10), so it is reasonable to propose that metabolites extracted by the kurtosis method also reflect tissue-level functions, even if transport processes may obfuscate some of these trends. In this regard, the case of anthers exhibiting a prevalent signature of high idMS/MS specialization, greater than all other reproductive organs, is particularly germane. In line with our observation of the manufacture of a specific set of metabolites in this tissue (Dataset S1), previous studies have shown that metabolites, notably certain phenolic derivatives, are abundant and highly specific for the tapetum (specialized layer of nutritive cells and source of precursors for the pollen coat within anthers) of anthers and pollen grains (39–41). The biosynthesis of these phenolic derivatives, with potential roles in pollen coat composition and establishment of fertilization barriers, has been linked to rapid metabolic gene evolution through retroposition and neofunctionalization (42). 
In a cross-species study on transcriptome evolutionary divergence, the fastest rates of gene expression divergence and signatures of transcriptome specialization were detected in anthers, whereas the lowest rates of evolution were detected in roots (43). Our metabolomics study therefore suggests that transcriptome and metabolome specialization may be coupled patterns in anthers, likely as a result of strong reproduction-related selection pressures exerted at this tissue level. More broadly, it would therefore be very interesting to analyze whether such kinds of metabolic specialization patterns are consistent across species for homologous tissues.

Navigating large datasets in such a way that knowledge can be made more efficiently accessible for hypothesis formulation is one of the challenges that thwart the routine application of certain -omics technologies to nonmodel systems organisms. In this study, we used data-independent MS/MS acquisition. This approach, albeit suffering from redundant data collection for certain metabolites prone to intense in-source fragmentation, maximizes the comprehensiveness of fragment data collection, which forms the foundation of such unbiased analysis. The present study also speaks to the power of the previously described molecular network method for plant samples and identifies directions for its integration with genomics data. The latter is illustrated by our functional studies on UDP-glycosyltransferases and the assignment of two previously uncharacterized genes of N. attenuata to the glucosylation and rhamnosylation steps in floral flavonoid glycoside metabolism (44, 45) (Fig. 5). Importantly, the data platform generated here can also be mined for additional metabolic gene families (e.g., P450, BAHD acyltransferases).

Tissue-level PCC coexpression analysis among genes and metabolites has traditionally been shown to be an efficient way forward for gene function analysis in secondary metabolism (6, 44, 46, 47). However, one important message of the present study is that because PCC-based coexpression analysis relies on trends inferred from gene/metabolite expression levels, certain tissue-level gene–metabolite associations are difficult to capture via this approach because they take place only in a few of the analyzed tissues, thereby resulting in a poor coexpression output. Consistent with this finding, a recent study on gene-sharing analysis in plants and animals demonstrated that an approach that puts emphasis on gene expression tissue specificities is significantly more efficient in identifying functional gene clusters than one that relies on the complete tissue-level expression dataset (20). Our kurtosis analyses show that this inference is likely to be more pronounced when incorporating metabolomics as another -omics dimension, because up to 97% of detected secondary metabolites exhibit tissue-specific expression in only a few of the tissue atlases. As such, we concluded that relying on expression levels monitored across the overall tissue set would decrease rather than increase the statistical power to discover biologically meaningful gene–metabolite associations. We thus adopted for gene-to-metabolite analysis a modified Ochiai similarity analysis in which the emphasis is placed on tissue specificity. Comparison of performance between this Ochiai similarity analysis and the PCC-based coexpression analysis revealed that the PCC analysis returned poor coexpression values (PCC for the association QG/UGT-A is only 0.09, whereas the Ochiai similarity for the same association is 0.71) and failed to associate UGT-A and UGT-B specifically to flavonoids (SI Appendix, Table S3). 
Similar comparisons of performance between these two approaches for a compendium of 70 previously characterized gene–metabolite associations also confirmed that the Ochiai similarity analysis systematically outperforms the PCC-based approach, especially when metabolites exhibited high tissue specificity (SI Appendix, Table S3). Taken together, this study reinforces the power of applying approaches combining large-scale metabolomics and information theory analysis to accelerate hypothesis generation on metabolic gene function.
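The contrast between the two measures can be illustrated with a minimal sketch. It uses the standard Ochiai coefficient on binary tissue-specificity vectors (shared tissues divided by the geometric mean of the two set sizes), not necessarily the modified version applied in this study, and all function names and data values below are hypothetical.

```python
import math


def ochiai(a, b):
    """Standard Ochiai coefficient between two binary tissue-specificity vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    n_a = sum(1 for x in a if x)
    n_b = sum(1 for y in b if y)
    if n_a == 0 or n_b == 0:
        return 0.0  # no tissue-specific signal in one of the two profiles
    return shared / math.sqrt(n_a * n_b)


def pearson(x, y):
    """Pearson correlation coefficient (PCC) across the complete tissue set."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: 1 marks a tissue in which the gene or metabolite was
# called tissue-specific (e.g., after Z-threshold filtering), 0 otherwise.
gene_specific = [1, 1, 0, 0, 0, 0, 0]
metabolite_specific = [1, 0, 0, 0, 0, 0, 0]
print("Ochiai:", ochiai(gene_specific, metabolite_specific))

# Hypothetical quantitative levels across the same tissues, for the PCC.
gene_levels = [9.0, 8.0, 1.0, 5.0, 2.0, 6.0, 3.0]
metabolite_levels = [9.0, 0.5, 5.0, 1.0, 6.0, 2.0, 4.0]
print("PCC:", pearson(gene_levels, metabolite_levels))
```

Because the Ochiai coefficient only considers the tissues flagged as specific, variation in the remaining tissues does not dilute the score, whereas the PCC is computed over every tissue.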

Conclusion

In summary, a major strength of this study is that it combines three approaches: (i) information theory to capture signatures of diversity and specialization in the dataset, (ii) computational MS to accelerate the structural annotation of the diversity of compounds collected, and (iii) experimental gene silencing to test hypotheses regarding metabolic gene functions derived from metabolomics–transcriptomics integration. A recent breakthrough study on mammals’ metabolomes has highlighted the power of metabolomics to predict markers associated with organ specialization in a phylogenetic context (48). Future directions will make use of genomics resources existing for related species of N. attenuata to extend the approach to the diagnosis of gene divergence effects contributing the most to tissue-metabolic specialization.

Materials and Methods

Tissue-Level Metabolite Extraction.

Here, we extracted 14 different tissues from 28- and 50-d-old N. attenuata plants growing in the glasshouse (Fig. 1A). For nonreproductive tissues, the sample collection included a pool of all nonsenescing rosette leaves; combined lower, middle, and higher segments of the stem; the complete root system; and matured seeds. Reproductive parts were harvested as follows. Complete floral buds of 8-mm length, a stage at which the corolla has not yet protruded from the sepals and for which important gene expression and metabolic reconfigurations have been detected in previous work (49), were harvested. Mature flowers at anthesis (5 d after 8-mm stage, 7:00 PM) were carefully separated into the following parts: pedicel, complete sepal ring, nectary, ovary (not including the nectary), style, anthers, filaments (not including anthers), corolla tube (not including the limb), and corolla limb.

Pools of 100 mg of isolated tissues (SI Appendix, Materials and Methods) were extracted as follows using extraction buffers containing either 20% or 80% (vol/vol) methanol to increase the coverage of chemically diverse metabolite classes. One milliliter of extraction buffer [50 mM acetate buffer (pH 4.8) containing 20% or 80% (vol/vol) methanol] per 100 mg of tissue was added, and samples were homogenized in a ball mill (Genogrinder 2000; SPEX CertiPrep) for 45 s at a rate of 1× and at 250 strokes per minute. Homogenized samples were centrifuged at 16,000 × g at 4 °C for 30 min, and supernatants were transferred into 1.5-mL microcentrifuge tubes and recentrifuged as before. Supernatants of 400 μL were transferred to 2-mL glass vials for MS-based metabolomics. To prevent the discarding of tissue-specific metabolites from the XCMS analysis due to poor grouping across samples (SI Appendix, Materials and Methods), five mixed extracts containing all 14 tissues at different ratios were generated and processed simultaneously with all other tissue samples.

UHPLC-ESI/qTOF-MS Conditions for IdMS/MS Data Acquisition.

Data-independent or idMS/MS fragmentation analysis was conducted to gain structural information on the overall detectable metabolic profile. Injection and UHPLC binary gradient-based separation conditions used for the MS and MS/MS mode analyses are described in SI Appendix, Materials and Methods. For all MS analyses, the column eluent was infused into a MicrOTOF-Q II (Bruker Daltonics) equipped with quadrupole and TOF analyzers and fitted with an electrospray source operated in positive ionization mode (capillary voltage = 4,500 V, capillary exit = 130 V, dry temperature = 180 °C, dry gas flow = 8 L⋅min−1). The concept of the idMS/MS approach relies on the fact that the quadrupole is operated with a very large mass isolation window (so that quasi all m/z signals are considered for fragmentation). For this determination, several independent analyses are performed with increasing CID collision energy (CE) values because the MicrOTOF-Q II instrument can operate neither alternated scans collected in MS and MS/MS mode nor CE ramping. Briefly, samples were first analyzed by UHPLC-ESI/qTOF-MS using the single-MS mode (low-fragmentation condition derived from in-source fragmentation) by scanning from m/z 50–1,400 at a rate of 5,000 scans per second. MS/MS analyses were conducted using nitrogen as collision gas and involved independent measurements at the following four different CID voltages: 20, 30, 40, and 50 eV. The quadrupole was operated throughout the measurement with the largest mass isolation window, from m/z 50–1,400. This mass range is automatically activated by the operating software of the instrument when the precursor m/z and the isolation width are set to 400 and 300 Da, respectively. Mass fragments were scanned in the single-MS mode between m/z 50 and 1,400 at a rate of 5,000 scans per second. Mass calibration was performed using sodium formate (50 mL of isopropanol, 200 μL of formic acid, 1 mL of 1 M NaOH in water). 
Data files were calibrated postrun on the average spectrum from this time segment, using the Bruker high-precision calibration algorithm. The idMS/MS dataset has been deposited in the open metabolomics database Metabolights (www.ebi.ac.uk) under accession no. MTBLS335.

Assembly of Compound-Specific IdMS/MS.

We used a previously designed precursor-to-product assignment pipeline (15) using the output results from processing with the R packages XCMS and CAMERA. The idMS/MS assembly was achieved via correlational analysis between MS1 and idMS/MS mass signals for low- and high-CEs and newly implemented rules (SI Appendix, Materials and Methods). The correlation analysis for precursor-to-product assignment was implemented using an R script, and rules were operated using a C# script available at GitHub (https://github.com/PlantDefenseMetabolism).

Defining Tissue Metabolic Diversity and Specialization Using Information Theory.

Tissue metabolic diversity, the $H_j$ index, was calculated as the Shannon entropy of the idMS/MS tissue-level frequency distribution. Tissue metabolic specialization, the $\delta_j$ index, was measured as the average idMS/MS specificity of the idMS/MS components of each tissue. Framework details are described in SI Appendix, Materials and Methods.
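As an illustration, the two indices can be sketched as follows, assuming the framework follows the information theory definitions of Martínez and Reyes-Valdés (18): with p_ij the relative frequency of idMS/MS i in tissue j and p_i its mean across the t tissues, H_j = −Σ_i p_ij log2(p_ij), the specificity S_i = (1/t) Σ_j (p_ij/p_i) log2(p_ij/p_i), and δ_j = Σ_i p_ij S_i. The exact implementation is in the SI Appendix; the function name and input layout here are hypothetical.

```python
import math


def tissue_indices(intensity):
    """Return (H, S, delta) for a feature-by-tissue intensity matrix.

    intensity[i][j] is the signal of idMS/MS i in tissue j (hypothetical layout).
    Every tissue column is assumed to contain at least one nonzero value.
    """
    n_feat, n_tis = len(intensity), len(intensity[0])

    # p[i][j]: relative frequency of feature i within tissue j.
    totals = [sum(row[j] for row in intensity) for j in range(n_tis)]
    p = [[row[j] / totals[j] for j in range(n_tis)] for row in intensity]

    # Diversity H_j: Shannon entropy (in bits) of the distribution in tissue j.
    H = [
        sum(-p[i][j] * math.log2(p[i][j]) for i in range(n_feat) if p[i][j] > 0)
        for j in range(n_tis)
    ]

    # Specificity S_i: information carried by feature i about tissue identity.
    S = []
    for i in range(n_feat):
        p_mean = sum(p[i]) / n_tis
        if p_mean == 0:
            S.append(0.0)  # feature never detected
            continue
        S.append(
            sum((v / p_mean) * math.log2(v / p_mean) for v in p[i] if v > 0) / n_tis
        )

    # Specialization delta_j: frequency-weighted average specificity in tissue j.
    delta = [sum(p[i][j] * S[i] for i in range(n_feat)) for j in range(n_tis)]
    return H, S, delta


# Hypothetical toy matrix: 3 features (rows) x 3 tissues (columns).
H, S, delta = tissue_indices([[10, 10, 0], [5, 5, 0], [0, 1, 20]])
print("H:", H)
print("S:", S)
print("delta:", delta)
```

Under these definitions, a tissue dominated by features found nowhere else has low diversity and high specialization, whereas a tissue sharing all of its features evenly with the others has a specialization of zero.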

IdMS/MS Similarity Scoring.

The idMS/MS spectra were aligned in a pairwise manner, and their similarity was calculated according to two scores. First, a standard normalized dot product (NDP), also referred to as cosine correlation method, was used to score fragment similarity among spectra using the following equation:

$$\mathrm{NDP} = \frac{\left(\sum_{i}^{S1 \& S2} W_{S1,i}\, W_{S2,i}\right)^{2}}{\sum_{i} W_{S1,i}^{2}\, \sum_{i} W_{S2,i}^{2}},$$

where S1 and S2 correspond, respectively, to spectrum 1 and spectrum 2, and $W_{S1,i}$ and $W_{S2,i}$ indicate the peak intensity-based weights given to the ith common peak, with common peaks defined as those differing by less than 0.01 Da between the two spectra. Weights were calculated as follows:

$$W = [\text{Peak intensity}]^{m}\,[\text{Mass}]^{n},$$

with m = 0.5 and n = 2 as suggested by MassBank (50).
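The scoring can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' script: spectra are given as lists of (m/z, intensity) pairs, common peaks are paired greedily by smallest mass difference, and the denominator sums run over all peaks of each spectrum. The function names are hypothetical.

```python
def peak_weight(intensity, mass, m=0.5, n=2):
    """Weight W = intensity^m * mass^n, with the MassBank-suggested exponents."""
    return (intensity ** m) * (mass ** n)


def ndp(spec1, spec2, tol=0.01):
    """Normalized dot product between two spectra given as [(mz, intensity), ...].

    Peaks are treated as common when their m/z values differ by less than `tol` Da.
    Assumption: each peak of spec2 is matched at most once (greedy nearest match).
    """
    w1 = [(mz, peak_weight(inten, mz)) for mz, inten in spec1]
    w2 = [(mz, peak_weight(inten, mz)) for mz, inten in spec2]

    shared = 0.0
    used = set()
    for mz1, weight1 in w1:
        best = None  # (mass difference, index in w2, weight in w2)
        for k, (mz2, weight2) in enumerate(w2):
            if k in used:
                continue
            diff = abs(mz1 - mz2)
            if diff < tol and (best is None or diff < best[0]):
                best = (diff, k, weight2)
        if best is not None:
            used.add(best[1])
            shared += weight1 * best[2]

    denom = sum(w * w for _, w in w1) * sum(w * w for _, w in w2)
    return (shared ** 2) / denom if denom else 0.0


# Hypothetical spectra sharing two of three fragments.
spectrum_a = [(127.039, 100.0), (145.050, 40.0), (303.050, 900.0)]
spectrum_b = [(127.040, 80.0), (287.055, 700.0), (303.049, 500.0)]
print("NDP:", ndp(spectrum_a, spectrum_b))
```

With n = 2, high-mass fragments contribute far more to the score than low-mass ones, whereas the square root applied to intensities (m = 0.5) dampens the influence of a few dominant peaks.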

A second scoring method involving the analysis of shared NLs among individual idMS/MSs was implemented as described in SI Appendix, Materials and Methods. For this analysis, we used a list of 52 NLs commonly encountered during MS/MS fragmentation (Dataset S1) as well as more specific ones that had been previously annotated for MS/MS spectra of N. attenuata secondary metabolite classes.

IdMS/MS Tissue-Specificity Inference Using Kurtosis Filtering.

We used an outlier-insensitive Z-score measure, generally considered preferable for the statistical description of sample groups containing extreme differences in values, by using median and median absolute deviation (MAD) instead of mean and SD for the normalization of both idMS/MS and RNAseq datasets to obtain relative expressions within tissues, as calculated using the following equation described by Birmingham et al. (51):

$$Z_i = \frac{E_i - \mathrm{Median}(E)}{\mathrm{MAD}(E)},$$

where $E_i$ is the expression level of a metabolite or a gene in tissue i, and E is the vector of expression levels of that metabolite or gene across all tissue samples.
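A minimal sketch of this normalization is given below. It follows the equation as written, without the consistency constant (1.4826) that is sometimes applied to the MAD; the function name and the example values are hypothetical.

```python
import statistics


def robust_z(levels):
    """Median/MAD-based Z-scores for one metabolite or gene across tissues."""
    med = statistics.median(levels)
    mad = statistics.median([abs(v - med) for v in levels])
    if mad == 0:
        # More than half of the values equal the median: the score is undefined.
        raise ValueError("MAD is zero; robust Z-score is undefined")
    return [(v - med) / mad for v in levels]


# Hypothetical levels across five tissues, with one extreme value.
print(robust_z([1.0, 2.0, 3.0, 4.0, 100.0]))
# prints [-2.0, -1.0, 0.0, 1.0, 97.0]
```

Because the median and MAD are barely affected by the single extreme value, that value stands out with a very large score, whereas a mean/SD-based Z-score would be compressed by the outlier itself.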

Kurtosis (K) was calculated for each metabolite and gene using an R package (moments) utilizing the following equation:

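$$K = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{4}}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}\right)^{2}},$$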

where $X_i$ stands for the expression level of a metabolite or a gene in the ith tissue and $\bar{X}$ is the mean of the same metabolite or gene across tissues. The P value of the kurtosis was calculated using the anscombe.test function in the R “moments” package.
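For illustration, the kurtosis equation above translates directly into the following sketch (Pearson's kurtosis, for which a normal distribution gives K = 3 and leptokurtic distributions give K > 3). This is a plain Python transcription of the formula, not the R code used in the study, and the example profile is hypothetical.

```python
def kurtosis(levels):
    """Pearson's kurtosis: fourth central moment over squared second central moment."""
    n = len(levels)
    mean = sum(levels) / n
    m2 = sum((v - mean) ** 2 for v in levels) / n
    m4 = sum((v - mean) ** 4 for v in levels) / n
    if m2 == 0:
        raise ValueError("constant profile; kurtosis is undefined")
    return m4 / (m2 ** 2)


# Hypothetical profile: a metabolite detected in only 1 of 14 tissues.
single_tissue = [0.0] * 13 + [1.0]
print(kurtosis(single_tissue))

# A flat, evenly spread profile gives a much lower value.
print(kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A profile concentrated in one or a few tissues yields a kurtosis well above 3, which is the property exploited by the filtering.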

Tissue specificity for a metabolite or a gene was defined using the reduction of kurtosis method as previously described (20). When the high expression values observed in certain tissues are removed from a leptokurtically distributed metabolite or gene, its kurtosis is reduced. The Z threshold used to filter the data from a particular tissue was obtained by plotting the cumulative reductions in the kurtosis curves for any given kurtosis threshold using different Z threshold values (SI Appendix, Fig. S6). Defining the false discovery rate-adjusted P value as Q, we chose a Z threshold of 2 for metabolite datasets, at which the highest proportion (98.3%) of the metabolites with Q < 0.01 exhibit reduced kurtosis after applying the threshold cutoff. Similarly, a threshold of 3 was applied for the RNAseq dataset.
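The per-feature step of this procedure can be sketched as follows, assuming that tissues whose robust Z-score exceeds the threshold are removed and that they are retained as tissue-specific only if their removal lowers the kurtosis. The selection of the threshold itself from cumulative curves and Q values is not reproduced here, and the function names and example values are hypothetical.

```python
import statistics


def kurtosis(levels):
    """Pearson's kurtosis (normal distribution = 3)."""
    n = len(levels)
    mean = sum(levels) / n
    m2 = sum((v - mean) ** 2 for v in levels) / n
    m4 = sum((v - mean) ** 4 for v in levels) / n
    if m2 == 0:
        raise ValueError("constant profile; kurtosis is undefined")
    return m4 / (m2 ** 2)


def robust_z(levels):
    """Median/MAD-based Z-scores."""
    med = statistics.median(levels)
    mad = statistics.median([abs(v - med) for v in levels])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z-score is undefined")
    return [(v - med) / mad for v in levels]


def specific_tissues(levels, z_threshold=2.0):
    """Return (tissue indices called specific, kurtosis before, kurtosis after).

    Tissues with Z > z_threshold are removed; they are reported as specific
    only if the kurtosis of the remaining values is lower than the original.
    """
    z = robust_z(levels)
    high = [i for i, zi in enumerate(z) if zi > z_threshold]
    rest = [v for i, v in enumerate(levels) if i not in high]
    k_before = kurtosis(levels)
    if not high or len(set(rest)) < 2:
        return [], k_before, None  # nothing to remove, or too little left
    k_after = kurtosis(rest)
    return (high if k_after < k_before else []), k_before, k_after


# Hypothetical metabolite profile across 14 tissues; the last one is extreme.
profile = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 3, 50]
tissues, k_before, k_after = specific_tissues(profile, z_threshold=2.0)
print(tissues, k_before, k_after)
```

In this toy profile, only the last tissue exceeds the Z threshold of 2, and removing it lowers the kurtosis, so it is the only tissue reported as specific.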

RNAseq Dataset of Different Tissues and Data Mining.

A detailed overview of the RNAseq dataset and National Center for Biotechnology Information database accession numbers are available in SI Appendix, Table S2. A list of the tissues collected for RNAseq analysis and of metadata related to this experiment is also available at the Nicotiana attenuata Data Hub database web site (nadh.ice.mpg.de/NaDH). This experiment included tissues and physiological conditions in addition to those reported in the metabolomics study presented here.

Virus-Induced Gene Silencing.

Vector construction, plant growth, and inoculation conditions followed Saedler and Baldwin (52) and are detailed in SI Appendix, Materials and Methods.

Supplementary Material

Supplementary File
pnas.1610218113.sd01.xlsx (29.2MB, xlsx)

Acknowledgments

We thank Dr. Mathias Schöttner for technical support in establishing the idMS/MS acquisition method and Dr. Klaus Gase for help with gene silencing construct design. D.L. and I.T.B. are funded by the Max Planck Society, by Advanced Grant 293926 of the European Research Council (to I.T.B.), and by the Collaborative Research Centre “Chemical Mediators in Complex Biosystems” (Grant SFB 1127). E.G.’s research in Heidelberg is supported within the framework of the Deutsche Forschungsgemeinschaft Excellence Initiative to the University of Heidelberg.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The MS/MS dataset has been deposited in the European Molecular Biology Laboratory European Bioinformatics Institute open metabolomics database Metabolights, www.ebi.ac.uk (accession no. MTBLS335). The RNA sequencing dataset is available at the Nicotiana attenuata Data Hub database (nadh.ice.mpg.de/NaDH) and at the National Center for Biotechnology Information Sequence Read Archive (SRA) database, https://www.ncbi.nlm.nih.gov/sra (accession nos. NA1498ROT, NA1500LET, NA1717LEC, NA1504STT, NA1505COE, NA1515COL, NA1506STI, NA1507POL, NA1508SNP, NA1509STO, NA1510STS, NA1511NEC, NA1512ANT, NA1513OVA, NA1514PED, NA1516OFL, NA1517FLB, NA1501SES, NA1502SEW, and NA1503SED).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1610218113/-/DCSupplemental.

References

  • 1.Weng JK, Philippe RN, Noel JP. The rise of chemodiversity in plants. Science. 2012;336(6089):1667–1670. doi: 10.1126/science.1217411. [DOI] [PubMed] [Google Scholar]
  • 2.Wink M. Evolution of secondary metabolites from an ecological and molecular phylogenetic perspective. Phytochemistry. 2003;64(1):3–19. doi: 10.1016/s0031-9422(03)00300-5. [DOI] [PubMed] [Google Scholar]
  • 3.Wink M, Carey DB. Variability of quinolizidine alkaloid profiles of Lupinus argenteus (Fabaceae) from North-America. Biochem Syst Ecol. 1994;22(7):663–669. [Google Scholar]
  • 4.Itkin M, et al. GLYCOALKALOID METABOLISM1 is required for steroidal alkaloid glycosylation and prevention of phytotoxicity in tomato. Plant Cell. 2011;23(12):4507–4525. doi: 10.1105/tpc.111.088732. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.vonPoser GL, Toffoli ME, Sobral M, Henriques AT. Iridoid glucosides substitution patterns in Verbenaceae and their taxonomic implication. Plant Syst Evol. 1997;205(3-4):265–287. [Google Scholar]
  • 6.Matsuda F, et al. AtMetExpress development: A phytochemical atlas of Arabidopsis development. Plant Physiol. 2010;152(2):566–578. doi: 10.1104/pp.109.148031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Tissier A. Glandular trichomes: What comes after expressed sequence tags? Plant J. 2012;70(1):51–68. doi: 10.1111/j.1365-313X.2012.04913.x. [DOI] [PubMed] [Google Scholar]
  • 8.Schilmiller AL, et al. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics. Plant Physiol. 2010;153(3):1212–1223. doi: 10.1104/pp.110.157214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Zang YX, et al. Genome-wide identification of glucosinolate synthesis genes in Brassica rapa. FEBS J. 2009;276(13):3559–3574. doi: 10.1111/j.1742-4658.2009.07076.x. [DOI] [PubMed] [Google Scholar]
  • 10.Hirai MY, et al. Omics-based identification of Arabidopsis Myb transcription factors regulating aliphatic glucosinolate biosynthesis. Proc Natl Acad Sci USA. 2007;104(15):6478–6483. doi: 10.1073/pnas.0611629104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Rajniak J, Barco B, Clay NK, Sattely ES. A new cyanogenic metabolite in Arabidopsis required for inducible pathogen defence. Nature. 2015;525(7569):376–379. doi: 10.1038/nature14907. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Sakurai T, et al. PRIMe Update: Innovative content for plant metabolomics and integration of gene expression and metabolite accumulation. Plant Cell Physiol. 2013;54(2):e5. doi: 10.1093/pcp/pcs184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Bowen BP, Northen TR. Dealing with the unknown: Metabolomics and metabolite atlases. J Am Soc Mass Spectrom. 2010;21(9):1471–1476. doi: 10.1016/j.jasms.2010.04.003. [DOI] [PubMed] [Google Scholar]
  • 14.Allard PM, et al. Integration of molecular networking and in-silico MS/MS fragmentation for natural products dereplication. Anal Chem. 2016;88(6):3317–3323. doi: 10.1021/acs.analchem.5b04804. [DOI] [PubMed] [Google Scholar]
  • 15.Broeckling CD, Heuberger AL, Prince JA, Ingelsson E, Prenni JE. Assigning precursor-product ion relationships in indiscriminant MS/MS data from non-targeted metabolite profiling studies. Metabolomics. 2013;9(1):33–43. [Google Scholar]
  • 16.Li D, Baldwin IT, Gaquerel E. Navigating natural variation in herbivory-induced secondary metabolism in coyote tobacco populations using MS/MS structural analysis. Proc Natl Acad Sci USA. 2015;112(30):E4147–E4155. doi: 10.1073/pnas.1503106112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Shannon CE. A mathematical theory of communication. AT&T Tech J. 1948;27(3):379–423. [Google Scholar]
  • 18.Martínez O, Reyes-Valdés MH. Defining diversity, specialization, and gene specificity in transcriptomes through information theory. Proc Natl Acad Sci USA. 2008;105(28):9709–9714. doi: 10.1073/pnas.0803479105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Heiling S, et al. Jasmonate and ppHsystemin regulate key Malonylation steps in the biosynthesis of 17-Hydroxygeranyllinalool Diterpene Glycosides, an abundant and effective direct defense against herbivores in Nicotiana attenuata. Plant Cell. 2010;22(1):273–292. doi: 10.1105/tpc.109.071449. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Li S, et al. Gene-sharing networks reveal organizing principles of transcriptomes in Arabidopsis and other multicellular organisms. Plant Cell. 2012;24(4):1362–1378. doi: 10.1105/tpc.111.094748.
  • 21. Watrous J, et al. Mass spectral molecular networking of living microbial colonies. Proc Natl Acad Sci USA. 2012;109(26):E1743–E1752. doi: 10.1073/pnas.1203689109.
  • 22. Nomura T, Murase T, Ogita S, Kato Y. Molecular identification of tuliposide B-converting enzyme: A lactone-forming carboxylesterase from the pollen of tulip. Plant J. 2015;83(2):252–262. doi: 10.1111/tpj.12883.
  • 23. Ulanowicz RE. Information theory in ecology. Comput Chem. 2001;25(4):393–399. doi: 10.1016/s0097-8485(01)00073-0.
  • 24. Eren AM, Borisy GG, Huse SM, Mark Welch JL. Oligotyping analysis of the human oral microbiome. Proc Natl Acad Sci USA. 2014;111(28):E2875–E2884. doi: 10.1073/pnas.1409644111.
  • 25. Moco S, et al. Tissue specialization at the metabolite level is perceived during the development of tomato fruit. J Exp Bot. 2007;58(15-16):4131–4146. doi: 10.1093/jxb/erm271.
  • 26. Kessler D, Diezel C, Baldwin IT. Changing pollinators as a means of escaping herbivores. Curr Biol. 2010;20(3):237–242. doi: 10.1016/j.cub.2009.11.071.
  • 27. Euler M, Baldwin IT. The chemistry of defense and apparency in the corollas of Nicotiana attenuata. Oecologia. 1996;107(1):102–112. doi: 10.1007/BF00582240.
  • 28. Kessler D, Baldwin IT. Making sense of nectar scents: The effects of nectar secondary metabolites on floral visitors of Nicotiana attenuata. Plant J. 2007;49(5):840–854. doi: 10.1111/j.1365-313X.2006.02995.x.
  • 29. Baldwin IT, Staszak-Kozinski L, Davidson R. Up in smoke: I. Smoke-derived germination cues for postfire annual, Nicotiana attenuata Torr. ex Watson. J Chem Ecol. 1994;20(9):2345–2371. doi: 10.1007/BF02033207.
  • 30. Baldwin IT, Morse L. Up in smoke: II. Germination of Nicotiana attenuata in response to smoke-derived cues and nutrients in burned and unburned soils. J Chem Ecol. 1994;20(9):2373–2391. doi: 10.1007/BF02033208.
  • 31. Baldwin IT. An ecologically motivated analysis of plant-herbivore interactions in native tobacco. Plant Physiol. 2001;127(4):1449–1458.
  • 32. Gulati J, Kim SG, Baldwin IT, Gaquerel E. Deciphering herbivory-induced gene-to-metabolite dynamics in Nicotiana attenuata tissues using a multifactorial approach. Plant Physiol. 2013;162(2):1042–1059. doi: 10.1104/pp.113.217588.
  • 33. Weinhold A, Baldwin IT. Trichome-derived O-acyl sugars are a first meal for caterpillars that tags them for predation. Proc Natl Acad Sci USA. 2011;108(19):7855–7859. doi: 10.1073/pnas.1101306108.
  • 34. Onkokesung N, et al. MYB8 controls inducible phenolamide levels by activating three novel hydroxycinnamoyl-coenzyme A:polyamine transferases in Nicotiana attenuata. Plant Physiol. 2012;158(1):389–407. doi: 10.1104/pp.111.187229.
  • 35. McKey D. Adaptive patterns in alkaloid physiology. Am Nat. 1974;108(961):305–320.
  • 36. McKey D. Herbivores: Their Interaction with Secondary Plant Metabolites. Academic; New York: 1979. The distribution of secondary compounds within plants; pp. 55–133.
  • 37. Uhlén M, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419. doi: 10.1126/science.1260419.
  • 38. Schmid M, et al. A gene expression map of Arabidopsis thaliana development. Nat Genet. 2005;37(5):501–506. doi: 10.1038/ng1543.
  • 39. Bassard JE, Ullmann P, Bernier F, Werck-Reichhart D. Phenolamides: Bridging polyamines to the phenolic metabolism. Phytochemistry. 2010;71(16):1808–1824. doi: 10.1016/j.phytochem.2010.08.003.
  • 40. Werner C, Hu WQ, Lorenzi-Riatsch A, Hesse M. Di-coumaroylspermidines and tri-coumaroylspermidines in anthers of different species of the genus Aphelandra. Phytochemistry. 1995;40(2):461–465.
  • 41. Meurer B, Wiermann R, Strack D. Phenylpropanoid patterns in Fagales pollen and their phylogenetic relevance. Phytochemistry. 1988;27(3):823–828.
  • 42. Matsuno M, et al. Evolution of a novel phenolic pathway for pollen development. Science. 2009;325(5948):1688–1692. doi: 10.1126/science.1174095.
  • 43. Yang R, Wang X. Organ evolution in angiosperms driven by correlated divergences of gene sequences and expression patterns. Plant Cell. 2013;25(1):71–82. doi: 10.1105/tpc.112.106716.
  • 44. Yonekura-Sakakibara K, et al. Comprehensive flavonol profiling and transcriptome coexpression analysis leading to decoding gene-metabolite correlations in Arabidopsis. Plant Cell. 2008;20(8):2160–2176. doi: 10.1105/tpc.108.058040.
  • 45. Tohge T, Fernie AR. Combining genetic diversity, informatics and metabolomics to facilitate annotation of plant gene function. Nat Protoc. 2010;5(6):1210–1227. doi: 10.1038/nprot.2010.82.
  • 46. Ginglinger JF, et al. Gene coexpression analysis reveals complex metabolism of the monoterpene alcohol linalool in Arabidopsis flowers. Plant Cell. 2013;25(11):4640–4657. doi: 10.1105/tpc.113.117382.
  • 47. Mintz-Oron S, et al. Gene expression and metabolism in tomato fruit surface tissues. Plant Physiol. 2008;147(2):823–851. doi: 10.1104/pp.108.116004.
  • 48. Ma S, et al. Organization of the mammalian metabolome according to organ function, lineage specialization, and longevity. Cell Metab. 2015;22(2):332–343. doi: 10.1016/j.cmet.2015.07.005.
  • 49. Stitz M, Hartl M, Baldwin IT, Gaquerel E. Jasmonoyl-L-isoleucine coordinates metabolic networks required for anthesis and floral attractant emission in wild tobacco (Nicotiana attenuata). Plant Cell. 2014;26(10):3964–3983. doi: 10.1105/tpc.114.128165.
  • 50. Horai H, et al. MassBank: A public repository for sharing mass spectral data for life sciences. J Mass Spectrom. 2010;45(7):703–714. doi: 10.1002/jms.1777.
  • 51. Birmingham A, et al. Statistical methods for analysis of high-throughput RNA interference screens. Nat Methods. 2009;6(8):569–575. doi: 10.1038/nmeth.1351.
  • 52. Saedler R, Baldwin IT. Virus-induced gene silencing of jasmonate-induced direct defences, nicotine and trypsin proteinase-inhibitors in Nicotiana attenuata. J Exp Bot. 2004;55(395):151–157. doi: 10.1093/jxb/erh004.
  • 53. Snook ME, Chortyk OT, Sisson VA, Costello CE. The flower flavonols of Nicotiana species. Phytochemistry. 1992;31(5):1639–1647.

Associated Data

Supplementary Materials

Supplementary File
pnas.1610218113.sd01.xlsx (29.2MB, xlsx)

