Development of an NMR-Based Platform for the Direct Structural Annotation of Complex Natural Products Mixtures

Joseph M Egan; Jeffrey A van Santen; Dennis Y Liu; Roger G Linington

doi:10.1021/acs.jnatprod.0c01076

Author manuscript; available in PMC: 2022 Apr 23.

Published in final edited form as: J Nat Prod. 2021 Mar 22;84(4):1044–1055. doi: 10.1021/acs.jnatprod.0c01076

PMCID: PMC8330833 NIHMSID: NIHMS1704875 PMID: 33750122

Abstract

The development of new 'omics' platforms is having a significant impact on the landscape of natural products discovery. However, despite the advantages that such platforms bring to the field, there remains no straightforward method for characterizing the chemical landscape of natural products libraries using two-dimensional nuclear magnetic resonance (2D-NMR) experiments. NMR analysis provides a powerful complement to mass spectrometric approaches, given the universal coverage of NMR experiments. However, the high degree of signal overlap, particularly in one-dimensional NMR spectra, has limited applications of this approach.

To address this issue, we have developed a new data analysis platform for complex mixture analysis, termed MADByTE (Metabolomics And Dereplication By Two-dimensional Experiments). This platform employs a combination of TOCSY and HSQC spectra to identify spin system features within complex mixtures, and then matches spin system features between samples to create a chemical similarity network for a given sample set. In this report we describe the design and construction of the MADByTE platform, and demonstrate the application of chemical similarity networks for both the dereplication of known compound scaffolds and the prioritization of bioactive metabolites from a bacterial prefractionated extract library.

Natural products have traditionally played a central role in drug discovery, but novel bioactive compound discovery is becoming increasingly difficult as the field matures and the number of known scaffolds increases.¹ Standard approaches rely heavily on bioassay-guided fractionation, which often results in the re-isolation of known compounds and carries an inescapable material cost. To reduce the chances of re-isolation, numerous dereplication methods have been developed, including UV, MS and NMR-based platforms.^2,3 However, many of these tools rely on in-house databases that are slow and expensive to generate, and require high coverage for exact database matching.

As demonstrated by the development and widespread adoption of the Global Natural Products Social molecular networks platform (GNPS), methods which leverage structural information rather than direct spectral matching are of high value to the natural products community.⁴ These tools improve prioritization efforts by enabling investigators to group metabolites into compound families, and to determine the distribution of these families across sample sets. Despite the numerous advantages of this methodology, MS analysis suffers from several inherent limitations, including variable ionization efficiency between analytes, and ion suppression.^5-7 In addition, MS yields limited structural information compared to other analytical methods. There is therefore a need for complementary methods that can address the existing limitations of these approaches.

In contrast to MS, NMR provides direct structural information, is a universal method of detection, and is semi-quantitative under standard conditions. NMR-based metabolomics approaches have increasingly focused on the development of platforms capable of highly accurate annotations of known primary metabolites, especially in biofluids.^8-12 These approaches have been successfully used to highlight high priority regions of the spectra, using spectral variability as a proxy to gauge potential novelty or detection of biomarkers.^13,14 However, ¹H NMR-based metabolomics methods cannot easily associate signals that derive from the same molecule, limiting identification options for unknown constituents.

An inherent strength of NMR-based platforms is their ability to resolve complex mixtures in two or more dimensions. 2D NMR data offer a robust method of annotation when compared against a database of known compounds,^15,16 but are often limited by the availability of reference data.¹⁷ Several platforms exist for annotating metabolites utilizing 2D NMR data, including dereplication utilities and targeted metabolomics in biofluids (Table 1).¹⁸ For example, the newly developed SMART (Small Molecule Accurate Recognition Technology) platform employs convolutional neural networks to compare HSQC spectra against a large library of experimental and calculated spectra to identify structurally similar molecules.¹⁹ This provides investigators with candidate structures that can inform and accelerate dereplication efforts, and determine core structural elements for unknown compounds.²⁰ In a separate application, HSQC-TOCSY spectra were used to fingerprint extracts from a library of bacterial isolates, and these fingerprints were used to prioritize strains enriched in NMR motifs for polyketide and peptidic natural products, scaffold types with significant precedent in bioactive natural products discovery.²¹

Table 1.

Overview of 2D NMR Metabolomics and Dereplication Utilities

Feature	MetaboMiner²²	COLMAR¹⁶	SMART 2.0²⁰	DEREP-NP¹⁵	MADByTE
Analysis Type	Targeted	Targeted	Targeted	Targeted	Untargeted
Designed for Mixtures	✓	✓	✘	✘	✓
Reference Database	Required	Required	Required	Required	Optional
Metabolite Type	1°	1°	2°	2°	2°
Sample Type	Biofluids	Biofluids	Single Compounds	Single Compounds	Complex Mixtures
Batch Comparison	✘	✘	✘	✘	✓
Bioactivity Integration	✘	✘	✘	✘	✓
Multiple Data Integration	✘	✓	✘	✓	✓
Required Solvent System	Buffered D₂O	Buffered D₂O/CDCl₃	Various	Various	Various

Many existing NMR-based tools require pure, or simplified, samples for accurate structure prediction.²³ To address this issue, we have developed a new metabolomics platform, termed MADByTE (Metabolomics And Dereplication By Two-dimensional Experiments), to deconvolute NMR data from complex natural products mixtures. This platform is designed to address a previously unsolved problem in natural products, namely the grouping of complex mixtures by shared compound families, based directly on NMR data of complex mixtures.

The central function of MADByTE is to identify spin system features in individual mixtures, and to match these spin system features between samples. In addition, the platform contains modules to dereplicate known chemistry by feature matching to data from pure compounds, and to prioritize features associated with highly bioactive samples for the isolation of bioactive constituents. The analysis pipeline works by integrating ¹H-¹³C connectivity data from HSQC spectra with ¹H-¹H scalar coupling information from TOCSY spectra to define discrete substructures present in each sample. In principle, any resolvable spin system from the mixture can be annotated using this approach. This includes multiple features from the same molecule, and individual features from different molecules in the mixture. Together, these spin system features provide sets of diagnostic ¹H and ¹³C signals that describe the chemical composition of each mixture. The spin system features are then compared across the sample set to identify common features, providing a view of the chemical diversity of the sample set, and accelerating the natural products discovery process.

Unlike many of the existing NMR-based profiling tools, MADByTE does not require a bespoke spectral reference library against which to compare NMR data (Table 1). This is an important distinction, as it offers a new mechanism to evaluate the chemical similarities and differences between samples, regardless of whether these constituents are known or novel natural product classes. However, if such a reference library is available, MADByTE includes an optional dereplication module to identify known compound groups using these data. By acquiring HSQC and TOCSY spectra for a set of 85 natural product prefractions and grouping these prefractions based on spin system feature matching, we demonstrate the viability of the platform for the untargeted analysis of natural product mixtures. In addition, we apply this dataset to direct applications in natural products research, including compound dereplication and bioactivity-based compound discovery.

RESULTS AND DISCUSSION

NMR experiment selection and parameter optimization were governed by the need to balance digital resolution against acquisition time. To reduce acquisition times, we employed non-uniform sampling (NUS) for both two-dimensional experiments, with an NUS rate of 50%. This is a conservative value and was selected to limit the generation of signal artefacts in the NUS processing step.²⁴ We elected to use a phase-sensitive version of the HSQC experiment for all analyses. Phase-sensitive HSQC experiments have marginally lower signal-to-noise ratios than absolute value experiments due to the increased length of the phase-sensitive pulse sequence.²⁵ However, this minor decrease in signal-to-noise is compensated for by the addition of carbon multiplicity data in the phases of the cross peaks in the spectrum (by convention CH and CH₃ signals are positive, while CH₂ peaks are negative).

An alternative approach would have been to acquire these data in a single HSQC-TOCSY experiment. However, this strategy suffers from three major drawbacks. Firstly, we observed a significant signal-to-noise penalty when using the HSQC-TOCSY over the stand-alone TOCSY pulse sequence (Figure S3). Secondly, the presence of matching cross-peaks on both sides of the diagonal in the TOCSY spectrum is used to eliminate background noise in complex samples. This strategy would not have been possible with the use of a single HSQC-TOCSY spectrum, potentially increasing artefacts and reducing clustering accuracy. Finally, it is not possible to assign specific ¹H-¹³C relationships from the HSQC-TOCSY experiment, because each carbon signal displays cross peaks with all of the adjacent protons in the spin system. For these reasons, separate TOCSY and HSQC spectra were selected for this analysis method.

Solvent selection is also an important consideration for experimental design. If users wish to integrate biological data, then DMSO-d6 is required for NMR acquisition to ensure that sample compositions are the same between biological screening (which almost always uses DMSO as the solvent vehicle) and NMR experiments. Alternatively, methanol-d4 may be used to include as many mixture components as possible if this was used for initial extract preparation. Because MADByTE only includes protons attached to carbons for building spin system features, there is little impact from the removal of exchangeable signals in the dataset. The only situations where this may influence spin system feature creation are those where exchangeable signals are part of larger spin systems (e.g., secondary amines). In these cases, the spin system will be split into two smaller components. Finally, deuterochloroform can be a useful option in situations where samples contain predominantly non-polar constituents, although the residual solvent signal is in a region of the spectrum that often includes legitimate natural product signals, which can complicate the analysis of highly aromatic substituents. The only approach that is not appropriate is the inclusion of data acquired in different solvents for different samples. Because the system aligns spin system features based on chemical shifts, it is not possible to compare datasets in different solvents, as chemical shift variations of the same compounds between different solvents are often substantial.

Feature Construction and Comparison

The data processing workflow consists of five steps: data acquisition, peak picking, creation of spin system features, comparison of features between samples, and generation of network graphs for data visualization. As inputs, the platform requires peak picked tables from TOCSY and HSQC spectra. The platform is therefore vendor and pulse sequence-independent and works with data from any frequency of spectrometer. Using peak picked tables allows users to apply data processing functions available in NMR processing software (e.g., apodization functions, linear prediction, zero filling, phase correction, baseline correction) to produce the highest quality peak lists prior to analysis. Using this approach it is even possible to apply advanced processing methods such as covariance NMR, which have been shown to have a significant influence on spectral quality in some cases.²⁶

Spin system features are created using the following workflow. The NMR solvent is selected from a drop-down menu, and the HSQC peak lists are filtered to remove signals in the solvent and water peak regions in a band ±0.02 ppm either side of both the solvent and water signals in the ¹H dimension. Three regions in the HSQC spectrum that do not contain legitimate signals (¹H > 7.0 with ¹³C < 50 ppm, ¹H < 7.0 with ¹³C > 170 ppm and ¹H < 2.4 with ¹³C > 100 ppm) are also removed (for a graphical representation of these filters see Figure S22). Additionally, the shielded quadrant of the TOCSY spectrum (<2.5 ppm) is excluded because of the often-poor performance of peak picking algorithms in this crowded region. TOCSY data are then symmetrized by correcting f1 chemical shifts against the corresponding f2 values, due to the higher resolution and more accurate peak picking in f2. Finally, the list of TOCSY signals is then compared against the HSQC peak list, and TOCSY signals without corresponding HSQC cross-peaks in the ¹H dimension are excluded.
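
The filtering steps described above can be sketched in Python as follows. This is a minimal illustration and not the MADByTE source code: the function names, the tuple layout of the peak lists, and the ¹H matching tolerance `h_tol` are assumptions introduced here. The symmetrization step is approximated by snapping each f1 shift to the nearest observed f2 shift, and a TOCSY peak is retained only if both of its ¹H shifts have an HSQC partner, which is one possible reading of the text. The example uses the commonly quoted residual solvent (2.50 ppm) and water (3.33 ppm) positions for DMSO-d6 and invented peak values.

```python
from typing import List, Tuple

HsqcPeak = Tuple[float, float]   # (1H ppm, 13C ppm)
TocsyPeak = Tuple[float, float]  # (f2 1H ppm, f1 1H ppm)


def filter_hsqc(peaks: List[HsqcPeak], solvent_h: float, water_h: float,
                band: float = 0.02) -> List[HsqcPeak]:
    """Drop HSQC peaks near the solvent/water signals and in the three
    regions stated in the text to contain no legitimate signals."""
    kept = []
    for h, c in peaks:
        near_solvent = abs(h - solvent_h) <= band or abs(h - water_h) <= band
        empty_region = ((h > 7.0 and c < 50.0) or
                        (h < 7.0 and c > 170.0) or
                        (h < 2.4 and c > 100.0))
        if not near_solvent and not empty_region:
            kept.append((h, c))
    return kept


def symmetrize(peaks: List[TocsyPeak], h_tol: float = 0.03) -> List[TocsyPeak]:
    """Snap each f1 shift to the nearest observed f2 shift within h_tol.
    (Assumed procedure; h_tol is a hypothetical tolerance.)"""
    f2_values = sorted({f2 for f2, _ in peaks})
    snapped = []
    for f2, f1 in peaks:
        nearest = min(f2_values, key=lambda v: abs(v - f1))
        snapped.append((f2, nearest if abs(nearest - f1) <= h_tol else f1))
    return snapped


def filter_tocsy(peaks: List[TocsyPeak], hsqc: List[HsqcPeak],
                 h_tol: float = 0.03,
                 shielded_cutoff: float = 2.5) -> List[TocsyPeak]:
    """Remove the shielded quadrant and peaks lacking HSQC partners."""
    hsqc_h = [h for h, _ in hsqc]

    def has_hsqc_partner(shift: float) -> bool:
        return any(abs(shift - h) <= h_tol for h in hsqc_h)

    kept = []
    for f2, f1 in symmetrize(peaks, h_tol):
        if f2 < shielded_cutoff and f1 < shielded_cutoff:
            continue  # both protons shielded: excluded quadrant
        if has_hsqc_partner(f2) and has_hsqc_partner(f1):
            kept.append((f2, f1))
    return kept


# Invented example peak lists (DMSO-d6: solvent 2.50 ppm, water 3.33 ppm).
hsqc = [(2.51, 39.5), (4.75, 78.0), (1.37, 21.0),
        (0.78, 10.5), (7.50, 30.0), (3.34, 55.0)]
tocsy = [(4.75, 1.38), (1.37, 4.76), (1.37, 0.78), (4.75, 6.20)]

hsqc_clean = filter_hsqc(hsqc, solvent_h=2.50, water_h=3.33)
tocsy_clean = filter_tocsy(tocsy, hsqc_clean)
print(hsqc_clean)   # [(4.75, 78.0), (1.37, 21.0), (0.78, 10.5)]
print(tocsy_clean)  # [(4.75, 1.37), (1.37, 4.75)]
```

In this toy example the (1.37, 0.78) cross-peak is dropped because both shifts fall below 2.5 ppm, while the 1.37 ppm proton is still retained through its correlation to the 4.75 ppm signal.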

The removal of TOCSY peaks in the shielded region could raise concerns that important elements of the structure are being ignored. Fortunately, the spin system creation strategy mitigates this issue. Although the region below 2.5 ppm in each dimension is excluded, this only removes TOCSY cross-peaks between protons if both have chemical shifts below this value. The exclusion would be a significant limitation of the platform only if many natural product structures lacked any ¹H NMR signals >2.5 ppm, because such compounds would be excluded from analysis entirely.

To understand the distribution of NMR signals in natural product structures we predicted the ¹H NMR spectra for 27,844 microbial natural products from the Natural Products Atlas database using the stereo-aware HOSE code model available from NMRShiftDB2 (Supplemental Information).^27,28 Predicted NMR spectra were filtered to retain only proton atoms attached to carbon, and the proportion of ¹H signals >2.5 ppm was determined (Figure S24). Interestingly, very few molecules (0.4%) are predicted to contain exclusively shielded ¹H signals, while 94.9% possessed at least 10% deshielded signals. While MADByTE will not identify every spin system in a given mixture, this result demonstrates that most compounds contain one or more spin systems with proton signals in an appropriate chemical shift range for detection using this approach.
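
The summary statistics in this analysis can be illustrated with a short Python sketch. The library below is a toy dictionary of invented shift lists, not data from the Natural Products Atlas, and the function names are hypothetical; only the 2.5 ppm cutoff and the 10% level are taken from the text.

```python
from typing import Dict, List, Tuple


def deshielded_fraction(shifts: List[float], cutoff: float = 2.5) -> float:
    """Fraction of carbon-bound 1H shifts above the cutoff."""
    if not shifts:
        return 0.0
    return sum(1 for s in shifts if s > cutoff) / len(shifts)


def summarize(library: Dict[str, List[float]], cutoff: float = 2.5,
              min_fraction: float = 0.10) -> Tuple[float, float]:
    """Return (share of compounds with only shielded signals,
    share of compounds with at least min_fraction deshielded signals)."""
    fractions = [deshielded_fraction(s, cutoff) for s in library.values()]
    n = len(fractions)
    only_shielded = sum(1 for f in fractions if f == 0.0) / n
    enough_deshielded = sum(1 for f in fractions if f >= min_fraction) / n
    return only_shielded, enough_deshielded


# Toy predicted 1H shifts (ppm) for four hypothetical compounds.
toy_library = {
    "compound_a": [0.85, 1.25, 1.30, 2.10],   # exclusively shielded
    "compound_b": [0.90, 1.40, 3.60, 5.30],   # half deshielded
    "compound_c": [7.20, 7.35, 2.30],         # mostly deshielded
    "compound_d": [1.0] * 19 + [4.1],         # 1 of 20 deshielded (5%)
}

only_shielded, enough_deshielded = summarize(toy_library)
print(only_shielded, enough_deshielded)  # 0.25 0.5
```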

Because MADByTE is designed to characterize the chemical landscape between sets of mixtures, a full description of all NMR signals is not required. Instead, diagnostic spin systems of distinctive substructures are used to match related compound families. Any signal >2.5 ppm can act as an 'antenna' for that functional group by providing TOCSY cross-peaks to all other protons within range of the TOCSY signal propagation. Therefore, removal of data in the highly congested shielded region of the TOCSY spectrum does not influence the detection of diagnostic spin system features containing at least one signal with a chemical shift >2.5 ppm. It is important to note that users must exercise caution if using the MADByTE platform to analyze specialist compound classes, such as sterols or low molecular weight isoprenoids, which may be more difficult to detect due to the low numbers of deshielded positions in these molecules.

Spin system features for individual samples are created by generating a directed graph from each TOCSY peak table, where nodes represent ¹H signals in the TOCSY spectrum, and edges represent TOCSY cross-peaks between ¹H signals. Because multiple members of each spin system should generate cross-peaks to any given spin system member, nodes containing only a single connection to a spin system are removed. Similarly, edges between nodes are removed if they are not reciprocal (i.e., are not observed on both sides of the diagonal of the TOCSY spectrum).
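
A minimal sketch of this graph construction is given below, using only the Python standard library. It is not the MADByTE implementation. It assumes that the TOCSY shifts have already been symmetrized so that the same proton carries an identical value in both dimensions, applies the reciprocity filter before the single-connection filter, and prunes single-connection nodes in one pass; the order of operations and the one-pass pruning are assumptions, as the text does not specify them.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_spin_systems(tocsy_peaks: List[Tuple[float, float]]) -> List[List[float]]:
    """Group 1H shifts into spin systems from (f2, f1) TOCSY cross-peaks."""
    # Directed edges; diagonal peaks carry no connectivity information.
    directed = {(a, b) for a, b in tocsy_peaks if a != b}

    # Keep only reciprocal edges (observed on both sides of the diagonal).
    adjacency: Dict[float, Set[float]] = defaultdict(set)
    for a, b in directed:
        if (b, a) in directed:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Remove nodes with a single connection (one pass; an assumption).
    weak = {n for n, nbrs in adjacency.items() if len(nbrs) < 2}
    pruned = {n: nbrs - weak for n, nbrs in adjacency.items() if n not in weak}

    # Each remaining connected component is one spin system feature.
    seen: Set[float] = set()
    systems = []
    for start in sorted(pruned):
        if start in seen or not pruned[start]:
            continue
        component: Set[float] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(pruned[node] - component)
        seen |= component
        systems.append(sorted(component))
    return systems


# Invented example: one fully reciprocal three-proton system, one
# reciprocal pair (each node has a single connection), one one-sided peak.
peaks = [(0.78, 1.37), (1.37, 0.78), (0.78, 4.75), (4.75, 0.78),
         (1.37, 4.75), (4.75, 1.37), (3.50, 3.90), (3.90, 3.50),
         (6.10, 2.80)]
spin_systems = build_spin_systems(peaks)
print(spin_systems)  # [[0.78, 1.37, 4.75]]
```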

The resulting graph includes sub-graphs for every unique spin system in the sample (Figure S26 Panel B). Nodes in these sub-graphs are annotated with ¹³C chemical shifts to form (¹H, ¹³C) pairs by integration of the HSQC peak table data. In instances where spectral overlap yields multiple candidate ¹³C chemical shifts, values of the closest HSQC resonance are included in the node annotation.
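
The annotation step can be sketched as a nearest-neighbour lookup in the ¹H dimension of the HSQC peak table. The function name and the tolerance `h_tol` are hypothetical, and the example shifts are invented; this is an illustration of the idea rather than the platform's own code.

```python
from typing import List, Tuple


def annotate_with_carbon(spin_system: List[float],
                         hsqc: List[Tuple[float, float]],
                         h_tol: float = 0.03) -> List[Tuple[float, float]]:
    """Attach a 13C shift to each 1H node, choosing the HSQC resonance
    closest in 1H when several candidates fall within h_tol."""
    pairs = []
    for h in spin_system:
        candidates = [(hh, c) for hh, c in hsqc if abs(hh - h) <= h_tol]
        if not candidates:
            continue  # no HSQC partner: node cannot be annotated
        _, carbon = min(candidates, key=lambda p: abs(p[0] - h))
        pairs.append((h, carbon))
    return pairs


# Invented example: two HSQC peaks overlap near 1.37 ppm.
hsqc_peaks = [(0.78, 10.5), (1.36, 21.0), (1.39, 26.4), (4.75, 78.0)]
annotated = annotate_with_carbon([0.78, 1.37, 4.75], hsqc_peaks)
print(annotated)  # [(0.78, 10.5), (1.37, 21.0), (4.75, 78.0)]
```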

To compare spin system features between samples, a square similarity matrix is created containing all spin system features in the sample set. Each pairwise combination is scored for (¹H, ¹³C) pair overlap by dividing the number of overlapping (¹H, ¹³C) pairs by the total number of (¹H, ¹³C) pairs in the spin system. This results in an asymmetric matrix, which scores the overlap of (for example) spin system A with B in one dimension, and B with A in the other. The higher of the two similarity scores (A vs B or B vs A) is used to define edges between spin system nodes. This approach is appropriate because variation in compound concentrations or resolution between samples can lead to the creation of larger or smaller spin system features for the same molecule in different samples.
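
The scoring rule can be written compactly in Python. In the sketch below, the matching tolerances `h_tol` and `c_tol` stand in for the user-adjustable chemical shift error ranges; their default values here are placeholders chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (1H ppm, 13C ppm)


def pairs_match(p: Pair, q: Pair, h_tol: float = 0.03,
                c_tol: float = 0.4) -> bool:
    """Two (1H, 13C) pairs match if both shifts agree within tolerance."""
    return abs(p[0] - q[0]) <= h_tol and abs(p[1] - q[1]) <= c_tol


def directed_overlap(a: List[Pair], b: List[Pair]) -> float:
    """Number of pairs in a with a match in b, divided by the size of a."""
    if not a:
        return 0.0
    matched = sum(1 for p in a if any(pairs_match(p, q) for q in b))
    return matched / len(a)


def similarity(a: List[Pair], b: List[Pair]) -> float:
    """The higher of the two directed scores defines the edge."""
    return max(directed_overlap(a, b), directed_overlap(b, a))


# Invented example: the same motif detected with four members in one
# sample and only three members in another.
feature_a = [(0.78, 10.5), (1.37, 21.0), (1.79, 36.9), (4.75, 78.0)]
feature_b = [(0.80, 10.6), (1.38, 21.2), (4.77, 78.3)]

print(directed_overlap(feature_a, feature_b))  # 0.75
print(directed_overlap(feature_b, feature_a))  # 1.0
print(similarity(feature_a, feature_b))        # 1.0
```

Taking the higher score means that a smaller feature fully contained in a larger one still scores 1.0, which reflects the reasoning in the text about concentration and resolution differences between samples.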

The resulting similarity matrix scores spin system overlap between each spin system feature and all other spin system features in the sample set. MADByTE illustrates the chemical interrelatedness of sample sets by creating network graphs that include all spin system features and include connections (edges) between features with similarity scores above a minimum threshold (Figure 1, Outputs 1, 2 and 3). These networks are used to identify interrelated spin system features between samples for natural products identification, as illustrated below.
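
As a rough illustration of how edges could be derived from the similarity scores, the sketch below scores every pair of features and keeps those at or above a cut-off. For brevity it uses exact matching on pre-binned (¹H, ¹³C) pairs rather than tolerance-based matching, and the threshold value, feature labels, and shifts are all invented for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Feature = FrozenSet[Tuple[float, float]]  # binned (1H, 13C) pairs


def score(a: Feature, b: Feature) -> float:
    """Higher of the two directed overlap scores (exact matching)."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return max(shared / len(a), shared / len(b))


def network_edges(features: Dict[str, Feature],
                  threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (feature, feature, score) for every pair at or above threshold."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        s = score(features[a], features[b])
        if s >= threshold:
            edges.append((a, b, s))
    return edges


# Hypothetical feature labels: <sample>_<feature index>.
features = {
    "s1_f1": frozenset({(0.8, 10.5), (1.4, 21.0), (4.8, 78.0)}),
    "s2_f1": frozenset({(0.8, 10.5), (1.4, 21.0)}),
    "s3_f1": frozenset({(7.2, 128.0), (7.3, 129.0)}),
}
edges = network_edges(features)
print(edges)  # [('s1_f1', 's2_f1', 1.0)]
```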

Figure 1. MADByTE Workflow. Following raw data acquisition and standard processing steps (Fourier transform, linear prediction, reconstruction, phase correction, supervised peak picking; yellow box), two stages of analysis are performed. Per-sample processing (blue box) constructs spin system features (SSF) for each sample independently from each set of spectra. After each sample is processed, the sample comparison step (green box) calculates the correlation matrix relating each spin system by similarity. The correlation matrix is then used to generate the three network outputs (Outputs 1, 2, and 3).

The three network outputs offer different representations of the spin system features in the sample set. The full association network (Output 1) includes every spin system feature from every sample in the dataset. This comprehensive viewpoint includes unique spin system features that are not shared between samples. While this visualization is comprehensive, it can become congested and hard to interpret for large numbers of samples. The similarity network (Output 2) removes unique nodes, providing a shared-chemistry viewpoint that includes all spin system features that are matched to at least one other spin system feature in the sample set. This visualization is particularly useful for dereplication or targeted objectives such as bioactive motif discovery, where identifying shared chemistry is the primary objective. In this visualization the full spin system feature for each extract is presented, and related spin system features are connected by edges. The hybrid network (Output 3) presents a single consensus spin system feature node between samples. This consensus node displays only the chemical shifts that are shared between samples, providing a simplified viewpoint of the conserved NMR features in spin system features shared between extracts. In cases where only single resonances are shared, hybrid nodes are not constructed, requiring a minimum of two members in overlapping spin systems. This visualization is useful for identifying the diagnostic signals between extracts, and is valuable for selecting signals to pursue for NMR-guided compound isolation.
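
The relationship between the three outputs can be illustrated with a small sketch. Output 1 corresponds to the full feature dictionary, Output 2 to the subset of features that participate in at least one edge, and Output 3 to consensus nodes built from the shifts shared by two matched features, subject to the two-member minimum described above. The function names are hypothetical, the shifts are invented, and exact matching on pre-binned pairs is used as a simplification.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Feature = FrozenSet[Tuple[float, float]]  # binned (1H, 13C) pairs


def similarity_view(features: Dict[str, Feature],
                    edges: List[Tuple[str, str]]) -> Set[str]:
    """Output 2 analogue: keep features matched to at least one other."""
    return {name for edge in edges for name in edge if name in features}


def hybrid_node(a: Feature, b: Feature,
                min_members: int = 2) -> Optional[Feature]:
    """Output 3 analogue: consensus of shared shifts, or None if fewer
    than min_members resonances are shared."""
    shared = a & b
    return shared if len(shared) >= min_members else None


features = {
    "s1_f1": frozenset({(0.8, 10.5), (1.4, 21.0), (4.8, 78.0)}),
    "s2_f1": frozenset({(0.8, 10.5), (1.4, 21.0)}),
    "s3_f1": frozenset({(7.2, 128.0), (7.3, 129.0)}),  # unique feature
    "s4_f1": frozenset({(4.8, 78.0), (3.3, 49.0)}),
}
edges = [("s1_f1", "s2_f1")]

shared_view = similarity_view(features, edges)
consensus = hybrid_node(features["s1_f1"], features["s2_f1"])
single = hybrid_node(features["s1_f1"], features["s4_f1"])

print(sorted(shared_view))  # ['s1_f1', 's2_f1']
print(sorted(consensus))    # [(0.8, 10.5), (1.4, 21.0)]
print(single)               # None
```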

Graphical User Interface

The platform is supported by a feature-rich, platform-independent graphical user interface (GUI). The GUI includes tools for data and parameter selection, configuration of analysis settings, and results visualization (Figure 2). User modifiable parameters include chemical shift error ranges for peak matching in both ¹H and ¹³C, and minimum similarity score cut-off for network generation. The visualization panels allow direct visualization of 1D spectra using NMRglue,²⁹ HSQC and TOCSY plotting, as well as interactive network visualizations that include dynamic features such as sub-network size filtering. Finally, the GUI provides options to perform dereplication against user-supplied reference libraries and overlay of biological datasets, both of which are discussed in detail below.

Figure 2. The graphical user interface (GUI). A) Analysis setup window, including user-modifiable parameters and lists of optional reference spectra for inclusion in analysis. B) Native NMR plotting for spectral review, including options for viewing both 1D spectra and points derived from HSQC and TOCSY processing. C) Network results view, including interactive tools that allow users to highlight nodes of interest and display the NMR signals used for their construction. D) Example of network filtration based on spin system size, performed using the bottom slider, to include only spin system features containing a defined number of spin system members.

The MADByTE tool has been designed with ease of use as a core requirement. The platform is supported by an extensive user manual, and the code base contains detailed comments to allow command line implementation or script-based automations by advanced users. It is provided under the MIT Software license and is available for free as a code repository from https://github.com/liningtonlab/MADByTE.

Application to Known Scaffolds

To test the effectiveness of the platform for grouping compound classes we initially acquired a training dataset comprising ¹H, TOCSY and HSQC spectra for 17 commercially available natural products and natural product analogues (Supporting Information Figure S4). Following supervised peak picking, peak lists were imported, and the sample set processed as described above. Selected features from the resulting full association network are presented in Figure 3 (full network is available in the Supporting Information as Figure S5). The full association network contained one sub-cluster (central cluster) containing three reference compounds. Closer examination of this sub-network revealed the presence of the related polyketide macrocycles azithromycin (3), erythromycin (4), and roxithromycin (5) (Figure 3A). These compounds were related by the presence of two major spin system features shared between all three compounds. The first (pink nodes) included signals from the lactone junction of the macrocyclic core, while the second (blue nodes) contained signals from the pendant cladinose sugar. In the former case the spin system feature contains four resonances (0.76-0.80, 1.37-1.38, 1.77-1.81, 4.75-5.13 ppm) found in each dataset (Figure 3B). These four signals are sufficient to identify this motif as a commonly shared sub-structure. The cladinose substructure includes six (¹H, ¹³C) correlations, present in two discrete spin systems. One of these spin systems, containing three features (1.52-1.53, 2.28-2.29, 4.73-4.84 ppm), was detected as a core motif in all three datasets (Figure 3B). The positions of the signals in each spin system (Figure 3C) highlight the similarities in signal composition between samples, while demonstrating that moderate variations in chemical shift are tolerated without eliminating connections between spin system features. These results demonstrate the ability of the algorithm to connect substructures even in the absence of all possible spin systems, and to use these connections to group structurally related molecules from different samples.

Review of the network (Figure 3) revealed two other matching compound sets, including chloramphenicol (1)/ thiamphenicol (2), and epirubicin (6)/ daunomycin (7). Encouragingly, compounds with low structural similarity to other members of the training set did not form connections to these clusters. Instead, these compounds (e.g., puromycin (8) and mupirocin (9)) remained as single sub-networks containing only the spin system features identified from their own NMR spectra, without false-positive connections to other members of the test set.

Detection of Non-Native Compounds in Complex Matrices

To evaluate the platform's ability to identify metabolites in complex mixtures, a set of nine spiked prefraction samples was prepared (Table 2). Aliquots (25 μL) of each prefraction were taken from the Linington lab extract library, dried, and spiked with 0.5 mg of one of three reference compounds as shown in Table 2. These three compounds (novobiocin, mupirocin, and erythromycin) were selected because they are structurally dissimilar to one another and were known not to occur in these fractions. The prefractions included three different polarity fractions (A, C and E suffixes) from each of three different crude extracts (1526, 1726 and 1814). Spiked compounds were distributed so that they occurred only once in each extract and polarity fraction (i.e., each spiked compound was added to three samples without duplication of either extract or polarity; see Table 2). HSQC and TOCSY spectra were collected for each sample, processed, peak picked, and subjected to standard analysis, including the reference spectra for each pure reference compound (Figure 4). The resulting full association network shows clear separation of the extract prefractions based on the presence of spiked reference compounds.

Table 2.

Prefractions Spiked with Reference Compounds.

Sample Name	Extract Prefraction	Spiked Compound	Node ID (Figure 4)
1526_A_SPK	1526 A	Erythromycin	A
1526_C_SPK	1526 C	Mupirocin	B
1526_E_SPK	1526 E	Novobiocin	C
1726_A_SPK	1726 A	Novobiocin	D
1726_C_SPK	1726 C	Erythromycin	E
1726_E_SPK	1726 E	Mupirocin	F
1814_A_SPK	1814 A	Mupirocin	G
1814_C_SPK	1814 C	Novobiocin	H
1814_E_SPK	1814 E	Erythromycin	I
Mupirocin	-	-	9
Erythromycin	-	-	4
Novobiocin	-	-	10
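
The spiking design in Table 2 forms a 3 × 3 Latin square: each compound occurs once per extract and once per polarity fraction. The short check below transcribes the nine spiked samples from the table and verifies this property; the function name is introduced here for illustration only.

```python
from collections import Counter
from typing import List, Tuple

# (extract, polarity fraction, spiked compound), transcribed from Table 2.
design: List[Tuple[str, str, str]] = [
    ("1526", "A", "Erythromycin"),
    ("1526", "C", "Mupirocin"),
    ("1526", "E", "Novobiocin"),
    ("1726", "A", "Novobiocin"),
    ("1726", "C", "Erythromycin"),
    ("1726", "E", "Mupirocin"),
    ("1814", "A", "Mupirocin"),
    ("1814", "C", "Novobiocin"),
    ("1814", "E", "Erythromycin"),
]


def is_latin_square(rows: List[Tuple[str, str, str]]) -> bool:
    """True if no compound repeats within an extract or a polarity
    fraction, and every compound is used the same number of times."""
    per_extract = Counter((extract, compound) for extract, _, compound in rows)
    per_polarity = Counter((polarity, compound) for _, polarity, compound in rows)
    per_compound = Counter(compound for _, _, compound in rows)
    no_repeats = all(n == 1 for n in per_extract.values()) and \
        all(n == 1 for n in per_polarity.values())
    balanced = len(set(per_compound.values())) == 1
    return no_repeats and balanced


print(is_latin_square(design))  # True
```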

Figure 4. Full annotation network illustrating extract prefractions containing spiked reference compounds (green, gold, and pink nodes), spin system features (grey nodes), and pure compound reference data (blue nodes; erythromycin (4), mupirocin (9), and novobiocin (10)).

As with all MADByTE networks, nodes in this graph are grouped based on the presence of shared spin system features. Because the spiked compounds were each added to three different prefractions, we expected the network to cluster around spin system features from these compounds. Figure 4 confirms this assumption, with the prefractions containing each spiked compound (pink, yellow and green nodes) forming sub-clusters within the network. In some cases, prefractions and reference compounds were connected by a single shared spin system (e.g., F to 9 and H to 10). However, in most cases spiked prefractions and reference compounds were connected by at least two shared spin system features, demonstrating the robust connections between spectra even in the case of high chemical complexity in the spiked prefraction spectra.

However, this network includes spin system feature nodes for all spin system features in the sample set, not just spin system features related to the spiked reference compounds. Connections between these additional spin system features influence overall network structure. In this case, there were few common spin system features between the more non-polar prefractions (suffixes C and E). By contrast, the polar 'A' fractions contained several spin system features that were shared between prefractions. These correlations create edges between the three 'A' fractions (1526A, node A; 1726A, node D; and 1814A, node G), decreasing the distance between these nodes in the network and moving sample 1526A to a position between groups. This illustrates an important aspect of network interpretation. Nodes that are grouped together in the network plot will include a high proportion of shared spin system features. However, every edge is of equal importance to the network and many compound classes may be represented by just one or two spin system features. Therefore, the two connections between reference compound 4 (erythromycin) and sample 1526A (node A) indicate strong chemical similarity between these two nodes, even though node A also possesses shared spin system features with other nodes (e.g., nodes B and G) that influence the position of node A relative to node 4. To assist with detailed data interpretation, the GUI includes an interactive network visualization panel that displays a 'pop up' window containing details about spin system feature membership when users hover over selected nodes.

Structural Dereplication in a Natural Product Prefraction Library

To extend the analysis to real-world datasets, we acquired data for 85 samples from our prefractionated microbial natural products library (example spectra presented in Supporting Information figures S16-S19; full dataset available for download at https://doi.org/10.5281/zenodo.4317721). Following data processing to generate spin system features, these samples were combined with the pure compound dataset and a hybrid network (Figure 1, Output 3) generated using standard parameters. The hybrid network contains a single spin system feature node for each connection between extract nodes that contains just the overlapping NMR features between the spin system features of the individual samples. This view is the most valuable for large networks as it significantly reduces network complexity compared to the other two network options. It also highlights the most robust and consistent NMR features that link samples together which can be helpful when manually inspecting individual NMR spectra.

Given the high degree of signal overlap that is common in NMR, particularly for ¹H spectra, there is a limit to the complexity of samples that can successfully be analyzed as complex mixtures. Encouragingly, this analysis yielded a hybrid network containing a large number of spin system features that grouped samples into distinct regions of related chemical space (Figure 5). Interestingly, several natural products prefractions contained spin system features that linked to reference compounds, suggesting the presence of known compound families in the extract library. Although analysis was successful in this use case, for libraries of exceptionally high complexity (e.g., botanical supplements or traditional Chinese medicine preparations) it is recommended that a prefractionation step be employed prior to NMR acquisition and analysis.

Figure 5 highlights several commonly encountered situations in natural products chemistry. The nodes in the red dashed box are interconnected by a large number of shared spin system features but are not connected to other prefractions outside this region. This indicates that these prefractions contain many related chemical features, but that these features are not present elsewhere in the sample set. By contrast, the nodes in the blue dashed box share very few spin system features, and are not connected to other nodes in the network. This indicates that these samples contain very few spin system features that are shared with other samples in this experiment. However, it would be incorrect to assume that these samples contain few natural products. The hybrid network includes only those spin system features shared between at least two samples in the set. To review the spin systems that are unique to individual samples users must examine the full association network (Figure 1, Output 1). The full association network includes all the spin system features detected for each sample, and can be used to highlight features that are unique for a particular sample or group of samples in the set.

MADByTE contains an optional 'dereplication' module that compares network spin systems against a user-supplied internal library of spin systems from pure compound reference spectra. Laboratories that have legacy spectra for compounds of interest can import the peak lists from these spectra to create custom reference libraries for dereplication of common spin systems. Alternatively, these lists can be extracted from the scientific literature for cases where TOCSY and HSQC spectra were reproduced in the original papers. Therefore, the platform can either be used in a targeted mode (with the dereplication module) to identify the presence of known compound families for which reference spectra are available, or in an untargeted mode to relate spin systems between samples in the absence of reference spectra. It is important to note that the use of reference libraries is optional, and is not a requirement for analysis. In this case, prefractions 1565C and 1565D included spin system features that linked to the reference compound novobiocin (Figures 5B and C). UPLC-MS analysis of prefractions 1565C and 1565D (Figures 5D and E) against a commercial standard confirmed the presence of novobiocin, demonstrating the value of MADByTE for known compound dereplication.

Prioritization of Bioactive Constituents from Prefraction Libraries

Among metabolomics platforms there is an increasing focus on integration of bioactivity profiles with spectroscopic information.^30-32 Bioactivity-driven prioritization via NMR of complex mixtures is a relatively new field, with most applications focused on primary metabolomics³³ and biomarker detection.³⁴ However, recent developments in MS-based bioactivity prioritization have demonstrated the value of this general strategy for bioactive compound discovery.³⁵ The MADByTE GUI contains an optional function to layer bioactivity data over the resulting networks to highlight NMR features that are correlated with activity. This is the first NMR data analysis platform designed for untargeted analysis of natural product fractions that includes this capability.

To demonstrate the utility of this approach we overlaid antimicrobial screening data from our in-house BioMAP screening platform, comprising 15 clinically relevant bacterial pathogens and laboratory strains, onto the prefractionated extract library used to generate Figure 5.³⁶ From these data we generated a new network view containing only those prefractions with activity against one or more pathogens (34 prefractions). The network was highlighted by color coding nodes based on degree of activity, from narrow spectrum (yellow nodes, 1–4 organisms) to broad spectrum (red nodes, >10 organisms) activities. This new full association network (Figure 6A) identified several 'hotspots' of bioactivity with both broad-spectrum bioactivities and shared spin system features, suggesting the presence of conserved bioactive compound families between prefractions.

Figure 6. A) Full association network for natural products extract library. Extract nodes color coded by bioactivity profile. B) Expansion of high activity region (dashed box in panel A), highlighting shared spin system features between prefractions (grey nodes marked with asterisks). C) Expansions of TOCSY spectra for active prefractions and bioactive component collismycin A (11). Shared spin system features highlighted in pink. D) Integration of the NMR data for collismycin A (11) (teal node) and subsequent reprocessing verified a match between the spin system features of the bioactive component and the spin system features from the original prefractions.

One hotspot region (Figures 6A and 6B) contained two related prefractions, 2108D and 2108E, with shared spin system features. To identify this predicted bioactive component, we performed a re-fermentation of the producing organism under our standard culture conditions.³⁶ Subsequent NMR-guided isolation of the active component was performed via sequential rounds of fractionation, tracking signals from the predicted spin system feature by ¹H NMR. This approach was effective at rapidly isolating the predicted bioactive component, which was identified as the pyridine-containing biaryl compound collismycin A (Figures S6-S13) through traditional 1D and 2D NMR structure elucidation methods.^37,38 Peak picking of the TOCSY and HSQC spectra for the purified material, followed by data processing and incorporation into the spin system feature network shown in Figure 6A, demonstrated that this pure compound (11) was directly associated with spin system features from the prioritized bioactive prefractions (Figure 6C). Subsequent UPLC-MS analysis confirmed the presence of collismycin A in all three related prefractions (Figure S13). Finally, rescreening of pure collismycin A in the BioMAP antimicrobial screening panel confirmed the predicted antibacterial activity for this bioactive metabolite (Supporting Information, Table S7).

Limitations and Future Opportunities

Extracting accurate and useful information from NMR spectra of complex mixtures remains an unsolved challenge, more than 60 years after the invention of the NMR spectrometer. While our platform provides a new mechanism for extracting this information from 2D NMR spectra, several challenges and limitations remain. Firstly, situations where proton signals from different spin systems possess identical chemical shifts can occasionally result in the creation of 'fused' spin system features that contain signals from both spin systems. These fused spin system features will not associate well with spin system features from other samples because of these additional erroneous signals. Fortunately, the use of an asymmetric comparison matrix for comparing spin system features (Figure 1) mitigates this issue, because edges will be formed if one spin system feature is a subset of another, removing the requirement that all NMR signals must match between the two spin system features being compared. The hybrid network view is useful for addressing this issue as only the shared NMR features are presented in this view, highlighting the common NMR features between samples.
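
The benefit of an asymmetric comparison for fused spin system features can be shown with a short sketch. This is an illustration under stated assumptions rather than the scoring function used by MADByTE: spin systems are represented as lists of (¹H, ¹³C) shift pairs, and the matching tolerances, the threshold, the function names, and the chemical shifts are all hypothetical.

```python
def fraction_matched(query, target, h_tol=0.03, c_tol=0.4):
    """Fraction of (1H, 13C) resonances in `query` that have a partner in `target`.

    The score is asymmetric: swapping `query` and `target` can change the result.
    The tolerances (ppm) are illustrative assumptions.
    """
    if not query:
        return 0.0
    hits = sum(
        any(abs(h - th) <= h_tol and abs(c - tc) <= c_tol for th, tc in target)
        for h, c in query
    )
    return hits / len(query)

def shares_edge(a, b, threshold=1.0):
    """Form an edge if either feature is contained in the other."""
    return max(fraction_matched(a, b), fraction_matched(b, a)) >= threshold

# Hypothetical shifts (ppm): a clean spin system and a 'fused' version that
# carries two extra resonances from an unrelated, overlapping spin system.
clean = [(7.45, 128.1), (7.90, 131.5), (8.60, 149.2)]
fused = clean + [(7.46, 115.3), (6.80, 110.0)]

print(fraction_matched(clean, fused))  # 1.0
print(fraction_matched(fused, clean))  # 0.6
print(shares_edge(clean, fused))       # True
```

A symmetric score that required every resonance to match in both directions would reject this pair, whereas taking the larger of the two directional scores retains the edge because the clean feature is fully contained in the fused one.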

As discussed above, the current platform excludes TOCSY cross peaks where the chemical shifts in F1 and F2 are both < 2.5 ppm. While this is not an issue for molecules that contain other signals above this range, it does preclude the use of MADByTE for studying specialty non-polar compound classes (e.g., lipids, volatile terpenes) that contain exclusively shielded ¹H NMR signals. Resolving this issue is a major challenge that will require significant methodological innovation to address.
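
The exclusion rule can be expressed as a simple filter. The sketch below assumes that TOCSY cross peaks are available as (F1, F2) pairs in ppm; the function name and the example peaks are hypothetical, and only the 2.5 ppm cutoff comes from the text.

```python
def keep_tocsy_peak(f1_ppm, f2_ppm, cutoff=2.5):
    """Return False only when both 1H shifts fall below the cutoff."""
    return not (f1_ppm < cutoff and f2_ppm < cutoff)

# Hypothetical cross peaks (F1, F2) in ppm.
peaks = [(1.25, 0.88), (1.60, 3.95), (7.31, 7.05), (2.10, 2.30)]
kept = [p for p in peaks if keep_tocsy_peak(*p)]

print(kept)  # [(1.6, 3.95), (7.31, 7.05)]
```

A cross peak is retained as long as one of its two shifts lies at or above the cutoff, so a compound whose protons all resonate below 2.5 ppm would contribute no TOCSY correlations at all under this rule.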

Lastly, this platform can only dereplicate compound classes if users supply reference spectra for those compounds. Although open access databases exist that contain structures for most known natural products, the NMR data for these compounds are often missing or presented in tabular formats in journal articles and supporting information files. Availability of large-scale repositories of raw or calculated NMR spectra for these compounds would greatly improve the functionality of this platform for de novo compound dereplication. Current 'lightweight' NMR predictors do not have sufficient accuracy to be useful for this task.^39,40 While it is possible to generate very high-quality NMR predictions using higher levels of theory, these calculations are computationally expensive.^23,41 Generation of such a dataset would be a massive but highly valuable undertaking that would greatly serve the natural products community in the coming years.

Conclusion

In summary, we have developed a new platform for NMR-based untargeted metabolomic analysis of complex natural products mixtures. This platform is the first open-source tool capable of grouping natural product mixtures by shared spin system features using 2D-NMR-based untargeted metabolomics. It has been successfully employed for the dereplication of known compound classes (Figure 5), and for the direct prioritization and isolation of bioactive constituents from fraction libraries (Figure 6). The platform is supported by a user-friendly, feature-rich GUI, is freely available to the natural products community, and includes extensive documentation to support adoption by non-specialists.

EXPERIMENTAL SECTION

NMR Data Collection

For this study all data were acquired in 5 mm Shigemi tubes on either Avance III TCI (600 MHz) or Avance III QCI (600 MHz) spectrometers in DMSO-d6 (CortecNet lot Q0611) at 300 K and referenced to residual solvent signals (¹H: 2.50 ppm, ¹³C: 39.5 ppm). HSQC spectra were recorded as 32 scans (TD: 4096 × 256), collected by non-uniform sampling at 50% followed by linear prediction and zero filling. TOCSY spectra were recorded as 16 scans (TD: 1024 × 128), collected by non-uniform sampling at 50% followed by linear prediction and zero filling. Proton spectra were recorded as 64 scans (TD: 131 k). All spectra were manually referenced and phased, followed by supervised peak picking. Peaks were picked if they possessed reasonable peak shape and were above the defined noise threshold.

Pure Compound Networking

The reference compounds selected for the standard compound network were chosen to represent diverse scaffolds from natural products or natural product derivatives. Daunomycin (7), roxithromycin (5), erythromycin (4), puromycin (8), novobiocin (10), and cycloheximide were obtained from Sigma-Aldrich. Ursolic acid, betulinic acid, and oleanolic acid were purchased from Extrasynthese SA. Chloramphenicol (1) was obtained from Calbiochem. Azithromycin (3) and rifamycin S were purchased from TCI. Thiamphenicol (2) was acquired from Spectrum Chemicals, and actinomycin D was purchased from RPI. Mupirocin (9) was purchased from AppliChem. Epirubicin (6) was purchased from MP Biomedicals LLC, and staurosporine was purchased from LC Laboratories.

Parameters used for this study are displayed in Table S2. The resulting networks were exported in GraphML format and processed in Gephi for visualization using the Force Atlas 2 algorithm with default parameters except: spacing = 10, dissuade hubs = True, prevent overlap = True.

Actinobacterial Growth Conditions

From frozen glycerol stocks, isolate RL12_176_HVF_A was grown on solid Marine Broth plates for two weeks until colonies were visible. Four single colony isolates were selected and grown in 7 mL of SYP media for 3 days. Following inspection for consistent phenotype and acceptable growth, 3 mL of each culture was transferred to 60 mL of fresh media for 5 days. After 5 days, 40 mL of culture was transferred to 1 L of SYP media containing 20 g of XAD-7 resin. This culture was grown for 7 days (25 °C, 200 rpm), resulting in 4 L of culture for extraction to create extract 2108.

Actinobacteria Extraction

Bacteria and resin were filtered under vacuum and the supernatant sterilized and discarded. Cells and resin were shaken in 250 mL of 50:50 CH₂Cl₂:MeOH per 1 L culture for 1 h, then filtered and the resulting filtrate concentrated to dryness in vacuo.

Fractionation of Actinobacterial Extract

Actinobacterial extracts were loaded onto 10 g of Celite per 1 L culture. Each bacterial extract was fractionated via flash chromatography on a Combiflash ISCO instrument using a linear gradient of 10 to 100% MeOH over 30 min, followed by a 100% EtOAc wash for 10 min, using a C₁₈ SepPak column.

Isolation of Collismycin A

Each fraction from the Combiflash™ separation was evaluated by ¹H NMR. One fraction in particular, 2108 fraction 10, contained NMR signals consistent with the prioritized spin system from bioactivity profiling. This fraction showed a promising profile with one major component and a predominant m/z feature at 276.1 [M+H]⁺ (Figure S6). NMR analysis identified this component as collismycin A (11).³⁸ The aromatic region of the ¹H NMR spectrum of collismycin A (Figure S8) matched the expected proton values for the spin system derived from the MADByTE analysis; further, the HSQC spectrum confirmed correct assignment of carbon values to these protons as predicted in the spin system feature (Figure S10).

Biological Activity Evaluation

Extract prefractions were initially screened in the BioMAP panel as part of a previous study.³⁶ Activity information was simplified for use in this study by reducing absolute values of inhibition to a three-value system (1 = hit, 0.5 = mild activity, 0 = no activity). Prefractions displaying over 50% inhibition were classified as hits, those displaying 25–50% inhibition were listed as mildly active, and those below 25% were listed as having no activity. Values were summed across the bacterial panel and this sum was used as the bioactivity hit rate for each prefraction for visualization.
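
The scoring scheme described above amounts to a threshold function followed by a sum across the panel. The following sketch shows one way to write it; the function names and the example inhibition values are hypothetical, and the treatment of values that fall exactly on 25% or 50% (both scored as mild activity) is an assumption, since the text does not specify the boundary behavior.

```python
def activity_score(percent_inhibition):
    """Reduce a percent inhibition value to the three-value system."""
    if percent_inhibition > 50:
        return 1.0   # hit
    if percent_inhibition >= 25:
        return 0.5   # mild activity (boundary handling is an assumption)
    return 0.0       # no activity

def bioactivity_hit_rate(inhibitions):
    """Sum the simplified scores across all organisms in the panel."""
    return sum(activity_score(value) for value in inhibitions)

# Hypothetical percent inhibition values for one prefraction against 15 organisms.
panel = [92, 61, 48, 30, 12, 0, 5, 77, 25, 50, 3, 0, 18, 55, 40]

print(bioactivity_hit_rate(panel))  # 6.5
```

Here four organisms score as hits and five as mildly active, giving a hit rate of 4 × 1 + 5 × 0.5 = 6.5 for this prefraction.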

Antimicrobial susceptibility tests for collismycin A were performed against a select panel of bacteria using a miniaturized high-throughput assay adapted from the broth microdilution method outlined by the Clinical and Laboratory Standards Institute (CLSI).⁴² Bacterial test strains were individually grown on fresh Nutrient Broth (NB, ATCC Medium 3) agar, Tryptic Soy Broth (TSB, ATCC Medium 18) agar or Brain Heart Infusion (BHI, ATCC Medium 44) agar, as recommended by the American Type Culture Collection (ATCC) cultivation protocol (Table S6). Individual colonies were used to inoculate 3 mL of sterile NB, TSB or BHI media and grown overnight with shaking (200 rpm; 37 °C). Listeria ivanovii (ATCC BAA-139) and Streptococcus pneumoniae (ATCC 49619) were incubated overnight but not shaken (37 °C; 5% CO₂). Saturated overnight cultures were diluted in their respective media according to turbidity to achieve approximately 5 × 10⁵ CFU of final inoculum density and dispensed into sterile clear polystyrene 384-well microplates (Thermo Scientific 265202) with a final screening volume of 30 μL. L. ivanovii was diluted with and grown in Haemophilus Test Medium (HTM; ATCC Medium 2167). DMSO solutions of collismycin A and antibiotic controls were prepared as 1:1 dilution series and pinned into each assay plate (200 nL) using a high-throughput pinning robot (Tecan Freedom EVO 100) to achieve final screening concentrations ranging from 128 μM to 3.91 nM. In each 384-well plate, lane 1 was reserved for DMSO vehicle and culture medium; lane 2 was reserved for DMSO vehicle, culture medium and target bacteria; and lanes 23 and 24 were reserved for antibiotic controls, DMSO vehicle, culture medium and target bacteria. After compound pinning, assay plates were read as T₀ at OD₆₀₀ using an automated plate reader (Molecular Devices SpectraMax i3x), sealed with a lid and placed in a humidity-controlled incubator at 37 °C for 18–20 h. Final OD₆₀₀ readings were obtained on the same plate reader for T₂₀ values. L. ivanovii and S. pneumoniae were incubated in a separate incubator (37 °C; 5% CO₂). Resulting growth curves for each dilution series were used to determine the MIC values for all test compounds following standard procedures.
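
The stated concentration range is consistent with a 16-point two-fold dilution series, as the short calculation below shows. This assumes that "1:1 dilution series" denotes serial two-fold dilution; the number of points is inferred from the range and is not stated in the text, and the function name is hypothetical.

```python
def twofold_series(top_nM, n_points):
    """Concentrations (nM) of a serial two-fold dilution starting at `top_nM`."""
    return [top_nM / 2**i for i in range(n_points)]

# 128 uM expressed in nM, diluted two-fold over 16 points.
series = twofold_series(128_000.0, 16)

print(len(series))           # 16
print(round(series[-1], 2))  # 3.91
```

Fifteen successive halvings of 128 μM give 128,000 nM / 2¹⁵ ≈ 3.91 nM, which matches the lowest screening concentration reported.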

ACKNOWLEDGMENTS

We thank Drs. A. Lewis and E. Ye for assistance with NMR experiment selection and data acquisition and L. Flores-Bocanegra, G. Peterson, C. Fergusson, T. Clark, T. Bergeron, R. Reher, and K. B. Kang for alpha testing and user experience feedback.

Funding Sources

This work was supported by NIH U41-AT008718 (RGL), NSERC Discovery (RGL), and NIH F31-AT010098 (JME).

Footnotes

There are no conflicts to declare.

Supporting Information

The following files are available free of charge:

General experimental procedures, NMR spectra of collismycin A, supporting experiments, and explanations of code logic are included in the supporting information. (PDF)

NMR Data

NMR Data (HSQC and TOCSY) for all experiments can be found at https://doi.org/10.5281/zenodo.4317721.

Source Code

Source code, installation instructions, and user manual can be found at https://github.com/liningtonlab/MADByTE

REFERENCES

(1) Pye CR; Bertin MJ; Lokey RS; Gerwick WH; Linington RG Proc. Natl. Acad. Sci. U. S. A. 2017, 114 (22), 5601–5606.
(2) Dumolin C; Aerts M; Verheyde B; Schellaert S; Vandamme T; Van der Jeugt F; De Canck E; Cnockaert M; Wieme AD; Cleenwerck I; Peiren J; Dawyndt P; Vandamme P; Carlier A mSystems 2019, 4 (5), e00437–19.
(3) Hubert J; Nuzillard JM; Renault JH Phytochem. Rev. 2017, 16 (1), 55–95.
(4) Wang M; Carver JJ; Phelan VV; Sanchez LM; Garg N; Peng Y; Nguyen DD; Watrous J; Kapono CA; Luzzatto-Knaan T; Porto C; Bouslimani A; Melnik AV; Meehan MJ; Liu WT; Crüsemann M; Boudreau PD; Esquenazi E; Sandoval-Calderón M; Kersten RD; Pace LA; Quinn RA; Duncan KR; Hsu CC; Floros DJ; Gavilan RG; Kleigrewe K; Northen T; Dutton RJ; Parrot D; Carlson EE; Aigle B; Michelsen CF; Jelsbak L; Sohlenkamp C; Pevzner P; Edlund A; McLean J; Piel J; Murphy BT; Gerwick L; Liaw CC; Yang YL; Humpf HU; Maansson M; Keyzers RA; Sims AC; Johnson AR; Sidebottom AM; Sedio BE; Klitgaard A; Larson CB; Boya CAP; Torres-Mendoza D; Gonzalez DJ; Silva DB; Marques LM; Demarque DP; Pociute E; O’Neill EC; Briand E; Helfrich EJN; Granatosky EA; Glukhov E; Ryffel F; Houson H; Mohimani H; Kharbush JJ; Zeng Y; Vorholt JA; Kurita KL; Charusanti P; McPhail KL; Nielsen KF; Vuong L; Elfeki M; Traxler MF; Engene N; Koyama N; Vining OB; Baric R; Silva RR; Mascuch SJ; Tomasi S; Jenkins S; Macherla V; Hoffman T; Agarwal V; Williams PG; Dai J; Neupane R; Gurr J; Rodríguez AMC; Lamsa A; Zhang C; Dorrestein K; Duggan BM; Almaliti J; Allard PM; Phapale P; Nothias LF; Alexandrov T; Litaudon M; Wolfender JL; Kyle JE; Metz TO; Peryea T; Nguyen DT; VanLeer D; Shinn P; Jadhav A; Müller R; Waters KM; Shi W; Liu X; Zhang L; Knight R; Jensen PR; Palsson B; Pogliano K; Linington RG; Gutiérrez M; Lopes NP; Gerwick WH; Moore BS; Dorrestein PC; Bandeira N Nat. Biotechnol. 2016, 34 (8), 828–837.
(5) Annesley TM Clin. Chem. 2003, 49 (7), 1041–1044.
(6) Jessome LL; Volmer DA LCGC North Am. 2006, 24 (5), 498–510.
(7) Matuszewski BK; Constanzer ML; Chavez-Eng CM Anal. Chem. 1998, 70 (5), 882–889.
(8) Verhoeven A; Slagboom E; Wuhrer M; Giera M; Mayboroda OA Anal. Chim. Acta 2017, 976, 52–62.
(9) Bingol K; Brüschweiler R J. Proteome Res. 2015, 14 (6), 2642–2648.
(10) Bingol K; Li DW; Brüschweiler-Li L; Cabrera OA; Megraw T; Zhang F; Brüschweiler R ACS Chem. Biol. 2015, 10 (2), 452–459.
(11) Walker LR; Hoyt DW; Walker SM; Ward JK; Nicora CD; Bingol K Magn. Reson. Chem. 2016, 54 (12), 998–1003.
(12) Giskeødegård GF; Madssen TS; Euceda LR; Tessem MB; Moestue SA; Bathen TF NMR Biomed. 2019, 32 (10), e3927.
(13) Gu H; Pan Z; Xi B; Asiago V; Musselman B; Raftery D Anal. Chim. Acta 2011, 686 (1–2), 57–63.
(14) Cloarec O; Dumas ME; Craig A; Barton RH; Trygg J; Hudson J; Blancher C; Gauguier D; Lindon JC; Holmes E; Nicholson J Anal. Chem. 2005, 77 (5), 1282–1289.
(15) Zani CL; Carroll AR J. Nat. Prod. 2017, 80 (6), 1758–1766.
(16) Robinette SL; Zhang F; Brüschweiler-Li L; Brüschweiler R Anal. Chem. 2008, 80 (10), 3606–3611.
(17) McAlpine JB; Chen S-N; Kutateladze A; MacMillan JB; Appendino G; Barison A; Beniddir MA; Biavatti MW; Bluml S; Boufridi A; Butler MS; Capon RJ; Choi YH; Coppage D; Crews P; Crimmins MT; Csete M; Dewapriya P; Egan JM; Garson MJ; Genta-Jouve G; Gerwick WH; Gross H; Harper MK; Hermanto P; Hook JM; Hunter L; Jeannerat D; Ji N-Y; Johnson TA; Kingston DGII; Koshino H; Lee H-W; Lewin G; Li J; Linington RG; Liu M; McPhail KL; Molinski TF; Moore BS; Nam J-W; Neupane RP; Niemitz M; Nuzillard J-M; Oberlies NH; Ocampos FMMM; Pan G; Quinn RJ; Reddy DSS; Renault J-H; Rivera-Chávez J; Robien W; Saunders CM; Schmidt TJ; Seger C; Shen B; Steinbeck C; Stuppner H; Sturm S; Taglialatela-Scafati O; Tantillo DJ; Verpoorte R; Wang B-G; Williams CM; Williams PG; Wist J; Yue J-M; Zhang C; Xu Z; Simmler C; Lankin DC; Bisson J; Pauli GF Nat. Prod. Rep. 2019, 36 (1), 35–107.
(18) Robinette SL; Brüschweiler R; Schroeder FC; Edison AS Acc. Chem. Res. 2012, 45 (2), 288–297.
(19) Zhang C; Idelbayev Y; Roberts N; Tao Y; Nannapaneni Y; Duggan BM; Min J; Lin EC; Gerwick EC; Cottrell GW; Gerwick WH Sci. Rep. 2017, 7 (1), 14243.
(20) Reher R; Kim HW; Zhang C; Mao HH; Wang M; Nothias L; Caraballo-rodriguez M; Glukhov E; Teke B; Leao T; Alexander KL; Duggan M; Everbroeck E. L. Van; Dorrestein PC; Cottrell GW; Gerwick WH J. Am. Chem. Soc. 2020, 142 (9), 4114–4120.
(21) Buedenbender L; Habener LJ; Grkovic T; Kurtböke DI; Duffy S; Avery VM; Carroll AR J. Nat. Prod. 2018, 81 (4), 957–965.
(22) Xia J; Bjorndahl TC; Tang P; Wishart DS BMC Bioinformatics 2008, 9, 1–16.
(23) Howarth A; Ermanis K; Goodman JM Chem. Sci. 2020, 11 (17), 4351–4359.
(24) Schlippenbach T. Von; Oefner PJ; Gronwald W Sci. Rep. 2018, 8 (1), 1–10.
(25) Boyer RD; Johnson R; Krishnamurthy K J. Magn. Reson. 2003, 165 (2), 253–259.
(26) Brüschweiler R; Zhang F J. Chem. Phys. 2004, 120 (11), 5253–5260.
(27) Kuhn S; Johnson SR ACS Omega 2019, 4 (4), 7323–7329.
(28) Van Santen JA; Jacob G; Singh AL; Aniebok V; Balunas MJ; Bunsko D; Neto FC; Castaño-Espriu L; Chang C; Clark TN; Cleary Little JL; Delgadillo DA; Dorrestein PC; Duncan KR; Egan JM; Galey MM; Haeckl FPJ; Hua A; Hughes AH; Iskakova D; Khadilkar A; Lee J-H; Lee S; Legrow N; Liu DY; Macho JM; McCaughey CS; Medema MH; Neupane RP; O’Donnell TJ; Paula JS; Sanchez LM; Shaikh AF; Soldatou S; Terlouw BR; Tran TA; Valentine M; Van Der Hooft JJJ; Vo DA; Wang M; Wilson D; Zink KE; Linington RG ACS Cent. Sci. 2019, 5 (11).
(29) Helmus JJ; Jaroniec CP J. Biomol. NMR 2013, 55 (4), 355–367.
(30) Kellogg JJ; Graf TN; Paine MF; McCune JS; Kvalheim OM; Oberlies NH; Cech NB J. Nat. Prod. 2017, 80 (5), 1457–1466.
(31) Robinette SL; Lindon JC; Nicholson JK Anal. Chem. 2013, 85 (11), 5297–5303.
(32) Kurita KL; Glassey E; Linington RG Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (39), 11999–12004.
(33) Lewis IA; Schommer SC; Hodis B; Robb KA; Tonelli M; Westler WM; Sussman MR; Markley JL Anal. Chem. 2007, 79 (24), 9385–9390.
(34) Song Z; Wang H; Yin X; Deng P; Jiang W Clin. Chem. Lab. Med. 2019, 57 (4), 417–441.
(35) Wolfender JL; Litaudon M; Touboul D; Queiroz EF Nat. Prod. Rep. 2019, 36 (6), 855–868.
(36) Wong WR; Oliver AG; Linington RG Chem. Biol. 2012, 19 (11), 1483–1495.
(37) Shindo K; Yamagishi Y; Okada Y; Kawai H J. Antibiot. 1994, 47 (9), 1072–1074.
(38) Garcia I; Vior NM; González-Sabín J; Braña AF; Rohr J; Moris F; Méndez C; Salas JA Chem. Biol. 2013, 20 (8), 1022–1032.
(39) Lodewyk MW; Siebert MR; Tantillo DJ Chem. Rev. 2012, 112 (3), 1839–1862.
(40) Jain R; Bally T; Rablen PR J. Org. Chem. 2009, 74 (11), 4017–4023.
(41) Yesiltepe Y; Nuñez JR; Colby SM; Thomas DG; Borkum MI; Reardon PN; Washton NM; Metz TO; Teeguarden JG; Govind N; Renslow RS J. Cheminform. 2018, 10 (1), 1–16.
(42) Weinstein MP Methods for Dilution Antimicrobial Susceptibility Tests for Bacteria That Grow Aerobically, 11th ed.; Clinical and Laboratory Standards Institute: Wayne, PA, 2018.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental

NIHMS1704875-supplement-Supplemental.pdf^{(2.8MB, pdf)}

[R1] (1).Pye CR; Bertin MJ; Lokey RS; Gerwick WH; Linington RG Proc. Natl. Acad. Sci. U. S. A 2017, 114 (22), 5601–5606. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] (2).Dumolin C; Aerts M; Verheyde B; Schellaert S; Vandamme T; Van der Jeugt F; De Canck E; Cnockaert M; Wieme AD; Cleenwerck I; Peiren J; Dawyndt P; Vandamme P; Carlier A mSystems 2019, 4 (5), e00437–19. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] (3).Hubert J; Nuzillard JM; Renault JH Phytochem. Rev 2017, 16 (1), 55–95. [Google Scholar]

[R4] (4).Wang M; Carver JJ; Phelan VV; Sanchez LM; Garg N; Peng Y; Nguyen DD; Watrous J; Kapono CA; Luzzatto-Knaan T; Porto C; Bouslimani A; Melnik AV; Meehan MJ; Liu WT; Crüsemann M; Boudreau PD; Esquenazi E; Sandoval-Calderón M; Kersten RD; Pace LA; Quinn RA; Duncan KR; Hsu CC; Floros DJ; Gavilan RG; Kleigrewe K; Northen T; Dutton RJ; Parrot D; Carlson EE; Aigle B; Michelsen CF; Jelsbak L; Sohlenkamp C; Pevzner P; Edlund A; McLean J; Piel J; Murphy BT; Gerwick L; Liaw CC; Yang YL; Humpf HU; Maansson M; Keyzers RA; Sims AC; Johnson AR; Sidebottom AM; Sedio BE; Klitgaard A; Larson CB; Boya CAP; Torres-Mendoza D; Gonzalez DJ; Silva DB; Marques LM; Demarque DP; Pociute E; O’Neill EC; Briand E; Helfrich EJN; Granatosky EA; Glukhov E; Ryffel F; Houson H; Mohimani H; Kharbush JJ; Zeng Y; Vorholt JA; Kurita KL; Charusanti P; McPhail KL; Nielsen KF; Vuong L; Elfeki M; Traxler MF; Engene N; Koyama N; Vining OB; Baric R; Silva RR; Mascuch SJ; Tomasi S; Jenkins S; Macherla V; Hoffman T; Agarwal V; Williams PG; Dai J; Neupane R; Gurr J; Rodríguez AMC; Lamsa A; Zhang C; Dorrestein K; Duggan BM; Almaliti J; Allard PM; Phapale P; Nothias LF; Alexandrov T; Litaudon M; Wolfender JL; Kyle JE; Metz TO; Peryea T; Nguyen DT; VanLeer D; Shinn P; Jadhav A; Müller R; Waters KM; Shi W; Liu X; Zhang L; Knight R; Jensen PR; Palsson B; Pogliano K; Linington RG; Gutiérrez M; Lopes NP; Gerwick WH; Moore BS; Dorrestein PC; Bandeira N Nat. Biotechnol 2016, 34 (8), 828–837. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] (5).Annesley TM Clin. Chem 2003, 49 (7), 1041–1044. [DOI] [PubMed] [Google Scholar]

[R6] (6).Jessome LL; Volmer DA LCGC North Am. 2006, 24 (5), 498–510. [Google Scholar]

[R7] (7).Matuszewski BK; Constanzer ML; Chavez-Eng CM Anal. Chem 1998, 70 (5), 882–889. [DOI] [PubMed] [Google Scholar]

[R8] (8).Verhoeven A; Slagboom E; Wuhrer M; Giera M; Mayboroda OA Anal. Chim. Acta 2017, 976, 52–62. [DOI] [PubMed] [Google Scholar]

[R9] (9).Bingol K; Brüschweiler RJ Proteome Res. 2015, 14 (6), 2642–2648. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] (10).Bingol K; Li DW; Brüschweiler-Li L; Cabrera OA; Megraw T; Zhang F; Brüschweiler R ACS Chem. Biol 2015, 10 (2), 452–459. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] (11).Walker LR; Hoyt DW; Walker SM; Ward JK; Nicora CD; Bingol K Magn. Reson. Chem 2016, 54 (12), 998–1003. [DOI] [PubMed] [Google Scholar]

[R12] (12).Giskeødegård GF; Madssen TS; Euceda LR; Tessem MB; Moestue SA; Bathen TF NMR Biomed. 2019, 32 (10), e3927. [DOI] [PubMed] [Google Scholar]

[R13] (13).Gu H; Pan Z; Xi B; Asiago V; Musselman B; Raftery D Anal. Chim. Acta 2011, 686 (1–2), 57–63. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] (14).Cloarec O; Dumas ME; Craig A; Barton RH; Trygg J; Hudson J; Blancher C; Gauguier D; Lindon JC; Holmes E; Nicholson J Anal. Chem 2005, 77 (5), 1282–1289. [DOI] [PubMed] [Google Scholar]

[R15] (15).Zani CL; Carroll AR J. Nat. Prod 2017, 80 (6), 1758–1766. [DOI] [PubMed] [Google Scholar]

[R16] (16).Robinette SL; Zhang F; Brüschweiler-Li L; Brüschweiler R Anal. Chem 2008, 80 (10), 3606–3611. [DOI] [PubMed] [Google Scholar]

[R17] (17).McAlpine JB; Chen S-N; Kutateladze A; MacMillan JB; Appendino G; Barison A; Beniddir MA; Biavatti MW; Bluml S; Boufridi A; Butler MS; Capon RJ; Choi YH; Coppage D; Crews P; Crimmins MT; Csete M; Dewapriya P; Egan JM; Garson MJ; Genta-Jouve G; Gerwick WH; Gross H; Harper MK; Hermanto P; Hook JM; Hunter L; Jeannerat D; Ji N-Y; Johnson TA; Kingston DGII; Koshino H; Lee H-W; Lewin G; Li J; Linington RG; Liu M; McPhail KL; Molinski TF; Moore BS; Nam J-W; Neupane RP; Niemitz M; Nuzillard J-M; Oberlies NH; Ocampos FMMM; Pan G; Quinn RJ; Reddy DSS; Renault J-H; Rivera-Chávez J; Robien W; Saunders CM; Schmidt TJ; Seger C; Shen B; Steinbeck C; Stuppner H; Sturm S; Taglialatela-Scafati O; Tantillo DJ; Verpoorte R; Wang B-G; Williams CM; Williams PG; Wist J; Yue J-M; Zhang C; Xu Z; Simmler C; Lankin DC; Bisson J; Pauli GF Nat. Prod. Rep 2019, 36 (1), 35–107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] (18).Robinette SL; Brüschweiler R; Schroeder FC; Edison AS Acc. Chem. Res 2012, 45 (2), 288–297. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] (19).Zhang C; Idelbayev Y; Roberts N; Tao Y; Nannapaneni Y; Duggan BM; Min J; Lin EC; Gerwick EC; Cottrell GW; Gerwick WH Sci. Rep 2017, 7 (1), 14243. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] (20).Reher R; Kim HW; Zhang C; Mao HH; Wang M; Nothias L; Caraballo-rodriguez M; Glukhov E; Teke B; Leao T; Alexander KL; Duggan M; Everbroeck E. L. Van; Dorrestein PC; Cottrell GW; Gerwick WH; Reher R; Kim HW; Zhang C; Mao HH; Wang M J. Am. Chem. Soc 2020, 142 (9), 4114–4120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] (21).Buedenbender L; Habener LJ; Grkovic T; Kurtböke DI; Duffy S; Avery VM; Carroll AR J. Nat. Prod 2018, 81 (4), 957–965. [DOI] [PubMed] [Google Scholar]

[R22] (22).Xia J; Bjorndahl TC; Tang P; Wishart DS BMC Bioinformatics 2008, 9, 1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] (23).Howarth A; Ermanis K; Goodman JM Chem. Sci 2020, 11 (17), 4351–4359. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] (24).Schlippenbach T. Von; Oefner PJ; Gronwald W Sci. Rep 2018, 8 (1), 1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] (25).Boyer RD; Johnson R; Krishnamurthy KJ Magn. Reson 2003, 165 (2), 253–259. [DOI] [PubMed] [Google Scholar]

[R26] (26).Brüschweiler R; Zhang FJ Chem. Phys 2004, 120 (11), 5253–5260. [DOI] [PubMed] [Google Scholar]

[R27] (27).Kuhn S; Johnson SR ACS Omega 2019, 4 (4), 7323–7329. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] (28).Van Santen JA; Jacob G; Singh AL; Aniebok V; Balunas MJ; Bunsko D; Neto FC; Castaño-Espriu L; Chang C; Clark TN; Cleary Little JL; Delgadillo DA; Dorrestein PC; Duncan KR; Egan JM; Galey MM; Haeckl FPJ; Hua A; Hughes AH; Iskakova D; Khadilkar A; Lee J-H; Lee S; Legrow N; Liu DY; Macho JM; McCaughey CS; Medema MH; Neupane RP; O’Donnell TJ; Paula JS; Sanchez LM; Shaikh AF; Soldatou S; Terlouw BR; Tran TA; Valentine M; Van Der Hooft JJJ; Vo DA; Wang M; Wilson D; Zink KE; Linington RG ACS Cent. Sci 2019, 5 (11). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] (29).Helmus JJ; Jaroniec CP J. Biomol. NMR 2013, 55 (4), 355–367. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] (30).Kellogg JJ; Graf TN; Paine MF; McCune JS; Kvalheim OM; Oberlies NH; Cech NB J. Nat. Prod 2017, 80 (5), 1457–1466. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] (31).Robinette SL; Lindon JC; Nicholson JK Anal. Chem 2013, 85 (11), 5297–5303. [DOI] [PubMed] [Google Scholar]

[R32] (32). Kurita KL; Glassey E; Linington RG Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (39), 11999–12004.

[R33] (33). Lewis IA; Schommer SC; Hodis B; Robb KA; Tonelli M; Westler WM; Sussman MR; Markley JL Anal. Chem. 2007, 79 (24), 9385–9390.

[R34] (34). Song Z; Wang H; Yin X; Deng P; Jiang W Clin. Chem. Lab. Med. 2019, 57 (4), 417–441.

[R35] (35). Wolfender JL; Litaudon M; Touboul D; Queiroz EF Nat. Prod. Rep. 2019, 36 (6), 855–868.

[R36] (36). Wong WR; Oliver AG; Linington RG Chem. Biol. 2012, 19 (11), 1483–1495.

[R37] (37). Shindo K; Yamagishi Y; Okada Y; Kawai H J. Antibiot. 1994, 47 (9), 1072–1074.

[R38] (38). Garcia I; Vior NM; González-Sabín J; Braña AF; Rohr J; Moris F; Méndez C; Salas JA Chem. Biol. 2013, 20 (8), 1022–1032.

[R39] (39). Lodewyk MW; Siebert MR; Tantillo DJ Chem. Rev. 2012, 112 (3), 1839–1862.

[R40] (40). Jain R; Bally T; Rablen PR J. Org. Chem. 2009, 74 (11), 4017–4023.

[R41] (41). Yesiltepe Y; Nuñez JR; Colby SM; Thomas DG; Borkum MI; Reardon PN; Washton NM; Metz TO; Teeguarden JG; Govind N; Renslow RS J. Cheminform. 2018, 10 (1), 1–16.

[R42] (42). Weinstein MP Methods for Dilution Antimicrobial Susceptibility Tests for Bacteria That Grow Aerobically, 11th ed.; Clinical and Laboratory Standards Institute: Wayne, PA, 2018.
