Abstract
High-throughput chemical analysis of natural products mixtures lags behind developments in genome sequencing technologies and laboratory automation, leading to a disconnect between library-scale chemical and biological profiling that limits new molecule discovery. Here we report a new orthogonal sample multiplexing strategy that can increase mass spectrometry-based profiling up to 30-fold over traditional methods. Profiled pooled samples undergo subsequent computational deconvolution to reconstruct peak lists for each sample in the set. We validated this approach using in silico experiments and demonstrated a high assignment precision (>97%) for large, pooled samples (r = 30), particularly for infrequently occurring metabolites of relevance in drug discovery applications. Requiring only 5% of the previously required MS acquisition time, this approach was repeated on a recent biological activity profiling study on 925 natural products extracts, leading to the rediscovery of all previously reported bioactive metabolites. This new method is compatible with MS data from any instrument vendor and is supported by an open-source software package: https://github.com/liningtonlab/MultiplexMS.
INTRODUCTION
Advances in high-throughput multi-omics have revolutionized the field of natural products (NP) discovery, becoming an integral tool for prioritizing and directing the isolation of new chemical entities from biological organisms.1,2 Metabolomics, an “omics” branch focused on the comprehensive and quantitative chemical characterization of complex NP mixtures, has benefitted from these improvements and remains at the forefront of NP discovery.3 Untargeted metabolomics analysis is a mainstay for NP profiling as it leverages the unparalleled sensitivity of mass spectrometry (MS) to reveal chemical information on large numbers of metabolites in biological samples.4 However, MS data acquisition on large extract libraries remains a rate-limiting step for discovery programs due to the inherent time cost of chromatographic separation of complex mixtures. Academic high-throughput screening facilities can assay upwards of 50,000 samples a day. By contrast, MS data acquisition on these samples typically requires weeks or months of continuous instrument time.5 This means that modern NP initiatives such as the National Cancer Institute Program for Natural Products Discovery 1,000,000-member fraction library program cannot easily leverage the power of MS-based analyses in their discovery model.6 For example, with an elution gradient as short as 5 minutes, UHPLC-MS analysis of this library would require more than 9 years of continuous instrument time, not including replicate analyses that would increase the processing time three-fold.6,7
Developments in high-speed sampling techniques, such as matrix-assisted laser desorption/ionization (MALDI)8, acoustic mist ionization (AMI)9, and acoustic droplet ejection (ADE)10 provide effective and rapid sample delivery for very fast sample analyses (<10 s). These techniques are optimal for the targeted analysis of samples with few analytes per sample and greatly outpace LC methods for respective experiments.8 However, UHPLC-MS methods can improve coverage in highly complex samples such as natural product extracts, given the additional dimension afforded by the chromatographic separation step.
In fields such as public health screening, one solution to the challenge of low sample throughput is sample pooling. In this approach, tests are run on pools of samples, and individual samples are only retested if the pool returns a positive result.11,12 This is best suited to applications where the frequency of positive results is low, where screening many negative pools leads to a substantial reduction in the number of tests that must be performed. The discovery of bioactive NPs is well-suited to such a strategy. Hit rates in biological NP screens are typically less than 1%, and this often includes many different classes of NP structures.13 In general, individual NP structures are sparsely distributed across large extract libraries, making them well-suited to a pool / deconvolute approach. Despite a long history of sample pooling applications in various fields of science, no sample pooling MS-based method has yet been developed for NP extract libraries.13-16
In this study, we introduce MultiplexMS, a dual-grid orthogonal multiplexing strategy that overcomes high-throughput limitations of untargeted analyses by leveraging highly sensitive MS instrumentation with a pooled-sample approach. Drawing inspiration from previously described methods, we created a multiplexing strategy to efficiently analyze NP extracts by pooling rows and columns from grids of samples. A computational workflow deconvolutes the resulting pooled MS data back into MS feature lists of individual samples (Figure 1, S1).11,13,14,17 Quantitative information is sacrificed in this approach to increase throughput, making this technique ideal for untargeted NP metabolomics research when the objective is to find rare molecules in a large collection of complex mixtures. Consequently, this method is not suited for classic quantitative or targeted metabolomics experiments which require accurate concentration determinations for analytes that occur in many or all samples in the sample set.19 Instead, MultiplexMS enables rapid MS profiling of large extract libraries and returns lists of all MS features present in each sample in a single analytical experiment. The approach offers a scalable efficiency over traditional methods, dependent upon the number of samples pooled. The method is also supported by a stand-alone software package that manages all aspects of pool design and MS feature deconvolution. Herein we describe the development of the MultiplexMS method including proof-of-concept validation, assessment of pool size limits, in silico modeling to optimize pool design, the development of an open-source software platform for using MultiplexMS, and the validation of the platform with bioactive compound discovery from a 925-member extract library.
Figure 1.
Overview of the MMSO strategy. Symmetrical initial and rearranged grids of size r² are generated using the MultiplexMS app. Rows and columns of each grid are individually pooled and analyzed by UPLC-MS. Aligned MS features (retention time_m/z pair) in each dataset are traced back to the intersecting well of the analyzed row/column. If the sample at the intersection is the same in both grids, then the feature is assigned to that sample.
EXPERIMENTAL SECTION
General experimental information
All solvents used in the MS acquisition were of Optima LCMS grade. Laboratory antibiotic standards were used without additional purification. NMR spectra were measured on a Bruker AVANCE II 600 MHz spectrometer equipped with a 5 mm QCI cryoprobe (Supporting Method 1).
LCMS conditions
All measurements were performed on an ACQUITY I-Class UPLC system (Waters Corp.) with an ACQUITY HSS T3 column (1.8 μm, 2.1 × 100 mm, Waters Corp.). Separation was achieved using a linear elution gradient (mobile phase A: H2O + 0.01% formic acid; mobile phase B: acetonitrile + 0.01% formic acid, 0.5 mL/min) as follows: 0 – 0.3 min, 5% B; 0.3 – 4.7 min, 5% – 90% B; 4.7 – 5.5 min, 90% – 98% B; 5.5 – 5.8 min, 98% B; 5.8 – 7.5 min, 5% B. MS data were acquired on a SYNAPT G2-Si qTOF (Waters Corp.). All mass measurements were recorded using ESI+ data-independent acquisition (DIA) experiments, with ion mobility mode enabled (Supporting Method 2).
MS data processing
All samples were processed using the Progenesis QI software suite (v2.2.5826.42898, Nonlinear Dynamics, Waters™ Corp.). MS data were uploaded into Progenesis QI for spectral alignment, lock mass calibration, and peak picking using default settings. The generated feature table was subjected to blank subtraction and minimum intensity thresholding, and was then rearranged into a flat .csv file containing MS features on one axis with pooled sample names on the other. The rearranged feature table was provided to MultiplexMS for computational deconvolution (Supporting Method 3 – 4).
Determination of minimum intensity threshold
All MS data files were preprocessed in Waters™ Progenesis QI using default settings. The cut-off threshold was calculated based on feature count per sample for a range of cut-off intensity values, and the minimum intensity threshold was set at the inflection point of this curve. Below this point the number of MS features in each dataset increased dramatically, indicating the introduction of noise features; features below the threshold were therefore removed from the analysis (Supporting Method 4).
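The threshold selection described above can be illustrated with a short Python sketch. This is not the procedure from Supporting Method 4; it assumes one plausible reading in which the "inflection point" is approximated by the cut-off with the largest discrete second difference of the feature-count curve (a simple knee detector). The function names and the toy intensities are hypothetical.

```python
def feature_counts(intensities, cutoffs):
    """Number of features at or above each candidate cut-off intensity."""
    return [sum(1 for x in intensities if x >= c) for c in cutoffs]


def knee_cutoff(intensities, cutoffs):
    """Cut-off where the count curve bends most sharply.

    Assumption: cutoffs are evenly spaced and sorted in ascending order,
    so the discrete second difference is a fair measure of curvature.
    """
    counts = feature_counts(intensities, cutoffs)

    def bend(k):
        return counts[k - 1] - 2 * counts[k] + counts[k + 1]

    best = max(range(1, len(cutoffs) - 1), key=bend)
    return cutoffs[best]


# Toy data: 120 low-intensity "noise" features plus 4 stronger signals.
toy_intensities = [5] * 20 + [15] * 100 + [100, 200, 300, 400]
toy_cutoffs = [0, 10, 20, 30, 40]
toy_threshold = knee_cutoff(toy_intensities, toy_cutoffs)
```

On real data the count curve is noisier, so smoothing or visual inspection of the plot may be preferable to a purely automatic choice.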
Sample preparation
Sample preparation from frozen DMSO stock is described in detail in Supporting Method 5. Sample pooling was performed using a TECAN Evo 150 liquid handler equipped with a LiHa robotic arm to automate the multiplexing process. Stock extracts were suspended and transferred in DMSO. Following the sample pooling procedure, the destination plates were concentrated in vacuo to remove the DMSO. To ensure each sample was at the same concentration, the dried mixtures were resuspended in 5 μL DMSO, and then brought to a final concentration with 50% (v/v) methanol/water for mass spectrometric analysis.
RESULTS
Development of the multiplexed sampling strategy
MultiplexMS uses a dual-grid orthogonal pooling strategy to maximize the throughput of NP extract samples while maintaining important chemical information when analyzed by MS (Figure 1, Supporting Method 6). Adapted from array testing strategies11, this method arranges an n sample library in two MultiplexMS Organization (MMSO) configured r × r grids (initial and rearranged), where samples are multiplexed in pools of size r. MMSO strategically organizes the samples into two configurations such that no two samples that share a row or column in the first grid fall in the same row/column in the second grid (Figure S2). The dual-grid strategy mitigates an inherent disadvantage of deconvoluting a single grid: when the same feature(s) occur in multiple samples in a grid, false positive assignments and overinflated feature lists result for reconstructed samples.11 For example, if the same feature appears in many samples, then the deconvolution algorithm will return false positives in cases where two or more possible solutions cannot be disambiguated (Figure S3). While the MMSO sampling approach doubles the number of pooled samples to be analyzed, the incorporation of the second grid considerably improves the precision of MS feature assignments and reduces false positive rates (FPRs) following computational deconvolution (Figure S4). The rows and columns of the initial and rearranged grids are pooled in groups of size r, where each extract makes up 1 ⁄ r of the pooled sample, and then diluted to the final volume for MS analysis. This strategy enables each sample to be analyzed at a fixed concentration irrespective of grid size, thereby preventing dilution effects.
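To make the grid constraint concrete, the following Python sketch builds one pair of grids in which samples that share a row or column in the initial grid share neither in the rearranged grid. It is an illustration only and is not the MMSO arrangement (which is given in Figure S2): the modular mapping used here is an assumption that works only for odd r, and the names `build_grids` and `shares_line` are hypothetical.

```python
def build_grids(samples, r):
    """Return (initial, rearranged) r x r grids as lists of rows.

    Illustrative mapping (an assumption, not the published layout): the
    sample at (i, j) in the initial grid moves to
    ((i + j) % r, (i + 2 * j) % r) in the rearranged grid. For odd r the
    mapping is a bijection that separates every row-mate and column-mate.
    Unused wells are filled with None.
    """
    if r % 2 == 0:
        raise ValueError("this illustrative mapping requires odd r")
    if len(samples) > r * r:
        raise ValueError("too many samples for an r x r grid")
    padded = list(samples) + [None] * (r * r - len(samples))
    initial = [padded[i * r:(i + 1) * r] for i in range(r)]
    rearranged = [[None] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            rearranged[(i + j) % r][(i + 2 * j) % r] = initial[i][j]
    return initial, rearranged


def shares_line(grid, a, b):
    """True if samples a and b sit in the same row or column of grid."""
    pos = {s: (i, j) for i, row in enumerate(grid) for j, s in enumerate(row)}
    (ia, ja), (ib, jb) = pos[a], pos[b]
    return ia == ib or ja == jb
```

Each row and each column of both grids then defines one pooled sample, giving 4r pooled samples for up to r² library members.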
MMSO pooled samples are analyzed by UPLC-MS, and a feature list is generated for each mixture from a user-preferred software package (e.g., MZmine 319, Waters™ Progenesis QI, etc.). MultiplexMS then computationally deconvolutes the initial and rearranged grids by assigning features detected at the intersection of row/column combinations in both grids to the sample at that position. A feature must be detected in the correct positions in both grids to be given a sample assignment. An advantage of using this method is that each sample is analyzed four times (each row/column in both grids), providing built-in replication for quality control (Figure 1). The MMSO approach is designed to increase the confidence of the analyte assignments in cases where the same analyte appears multiple times in a sample set, a possible risk associated with orthogonal sample pooling.17
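The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the MultiplexMS package: features are treated as simple presence/absence labels, pooling is simulated as a set union, and the function names and the toy 3 × 3 layout are hypothetical.

```python
def pool_lines(grid, features):
    """Simulate pooling: union the feature sets along each row and column."""
    rows = [set().union(*(features.get(s, set()) for s in row)) for row in grid]
    cols = [
        set().union(*(features.get(row[j], set()) for row in grid))
        for j in range(len(grid))
    ]
    return rows, cols


def candidates(grid, rows, cols):
    """Features seen in both the row pool and the column pool of each well."""
    return {
        s: rows[i] & cols[j]
        for i, row in enumerate(grid)
        for j, s in enumerate(row)
        if s is not None
    }


def deconvolute(initial, rearranged, pools_initial, pools_rearranged):
    """Keep a feature only if it maps to the same sample in both grids."""
    first = candidates(initial, *pools_initial)
    second = candidates(rearranged, *pools_rearranged)
    return {s: first[s] & second[s] for s in first}


# Toy example: one feature shared by samples A and E, one found only in C.
initial = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"]]
rearranged = [["A", "H", "F"], ["I", "D", "B"], ["E", "C", "G"]]
features = {"A": {"f_shared"}, "E": {"f_shared"}, "C": {"f_rare"}}

single_grid = candidates(initial, *pool_lines(initial, features))
dual_grid = deconvolute(
    initial,
    rearranged,
    pool_lines(initial, features),
    pool_lines(rearranged, features),
)
```

In this toy case the single grid wrongly assigns the shared feature to B and D, because A and E occupy opposite corners of a rectangle; the second grid breaks that rectangle and both false assignments disappear.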
In silico testing of the MultiplexMS strategy
Instances of the same feature(s) occurring in multiple samples in a grid can expose a potential pitfall of the MMSO sampling strategy and deconvoluting protocol, namely that this feature may be falsely assigned to additional samples in the set. Conversely, if NP libraries are comprised exclusively of molecules that appear with low frequency, then the deconvoluting algorithm will correctly reconstitute MS feature lists for every sample. To assess the performance of the MMSO method, we performed an in silico experiment using LCMS data for 1,015 bacterial extract fractions (prefractions) from our in-house library. The dataset was chosen based on the taxonomic relatedness of the source organisms (Actinobacteria), and the potential overlap of natural products between samples.21 This provided an ideal test case to quantitatively evaluate the performance (precision and FPR) of the MMSO method for NP mixtures.
To estimate feature frequency in metabolic profile datasets we performed an in silico subsampling of the full dataset and counted feature presence for all MS features. This experiment was repeated 50 times for three different grid sizes (r = 10, 20, and 30), containing 100, 400, and 900 prefractions, respectively (Figure S5). The number of occurrences, x, for each MS feature was averaged for all repetitions and plotted for each grid size (blue bars, Figure 2). In all three cases, the average frequency of an MS feature being present just once exceeded 2000 (r = 10, 2654 ± 391; r = 20, 2273 ± 149; r = 30, 2040 ± 71) out of an average 10,253 ± 657, 13,686 ± 251, and 15,453 ± 98 total MS features in each subsampling dataset, respectively. The frequency of MS features present twice in the dataset (x = 2) surpassed 1000 for all grids (r = 10, 1380 ± 219; r = 20, 1258 ± 119; r = 30, 1196 ± 30), decreasing for higher frequency counts (e.g., x = 5: r = 10, 544 ± 189; r = 20, 571 ± 97; r = 30, 512 ± 36).
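The subsampling and counting procedure can be sketched as follows. This is a minimal illustration that assumes the feature table is held as a mapping from sample name to a set of feature identifiers; the function names, the fixed seed, and the toy table are hypothetical and do not come from the study.

```python
import random
from collections import Counter


def occurrence_histogram(feature_table, r, rng):
    """Draw r * r samples and count how many features occur x times."""
    chosen = rng.sample(sorted(feature_table), r * r)
    per_feature = Counter(f for s in chosen for f in feature_table[s])
    # Map x (number of samples containing a feature) to number of features.
    return Counter(per_feature.values())


def mean_histogram(feature_table, r, repeats=50, seed=0):
    """Average the occurrence histogram over repeated random subsamples."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(repeats):
        totals.update(occurrence_histogram(feature_table, r, rng))
    return {x: n / repeats for x, n in sorted(totals.items())}


# Toy table: nine samples, each with one unique feature and one common one.
toy_table = {f"s{i}": {f"u{i}", "common"} for i in range(9)}
toy_result = mean_histogram(toy_table, r=3, repeats=5)
```

With the toy table every draw uses all nine samples, so nine features occur once and one feature occurs nine times.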
Figure 2.
In silico testing of the MultiplexMS strategy. The results of the MS feature frequency count and precision calculation for each subset population using the dual grid sampling scheme with (A) r = 10, (B) r = 20, and (C) r = 30 grid dimensions.
In addition to the MS feature count, we also determined the precision and FPR of the MMSO method for the three selected r² grid sizes (see SI, green trendline, Figure 2). The MS feature table in each selected sample was used to simulate pooled rows and columns in silico, and the deconvolution algorithm was applied to generate MS feature lists for reconstructed samples. One limitation of this approach is that only correct and false assignments are assessed because effects that could lead to feature loss (i.e., ion suppression, sample preparation issues, etc.) are excluded. Nevertheless, this experiment provides a valuable testbed for evaluating the computational deconvolution algorithm under ideal acquisition conditions. The performance of the algorithm was tested on both single grid (Figure S6), and MMSO layouts (Figure 2). In both scenarios, MS features in the reconstructed prefractions were compared to those in the ground-truth dataset, allowing a facile determination of the precision (true positives / total positives; see SI, green trendline in Figure 2, S7) of analyte assignments. Concurrently, the FPR was assessed as 1 – precision (see Supporting Method 7).
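The two metrics defined above can be written out as a short Python sketch. It assumes that ground-truth and reconstructed feature lists are available as mappings from sample name to a set of feature identifiers; the function name and the toy data are hypothetical.

```python
def precision_and_fpr(truth, reconstructed):
    """Precision = true positives / total positives; FPR = 1 - precision.

    The FPR follows the definition used in the text (1 - precision),
    not the confusion-matrix definition FP / (FP + TN).
    """
    true_pos = sum(
        len(feats & truth.get(sample, set()))
        for sample, feats in reconstructed.items()
    )
    total_pos = sum(len(feats) for feats in reconstructed.values())
    if total_pos == 0:
        return None, None  # no assignments were made
    precision = true_pos / total_pos
    return precision, 1.0 - precision


# Toy data: feature f1 is wrongly assigned to sample B.
toy_truth = {"A": {"f1", "f2"}, "B": {"f3"}}
toy_reconstructed = {"A": {"f1", "f2"}, "B": {"f3", "f1"}}
toy_precision, toy_fpr = precision_and_fpr(toy_truth, toy_reconstructed)
```

Three of the four assignments in the toy data are correct, giving a precision of 0.75 and an FPR of 0.25.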
Despite selecting a large, taxonomically related metabolomics dataset for in silico subsampling, the majority of MS features were present five times or fewer (e.g., r = 10, 62%). This illustrates the sparse distribution of MS features in this representative prefraction library and suggests that deconvolution methods that are optimized for sparse sample sets are appropriate for applications in natural products. The MMSO sampling strategy also improves assignment precision in all experimental grid sizes (e.g., x = 5: r = 10: 86 ± 13%; r = 20: 95 ± 9%; r = 30: 97 ± 6%) compared to the single grid approach (x = 5: r = 10: 31 ± 8%; r = 20: 25 ± 5%; r = 30: 23 ± 4%). This provides an argument for using dual grids in this application.
Importantly, a balance must be struck between throughput and FPR as r increases. Increasing grid dimensions r² increases the probability of multiple samples containing the same MS feature(s). In all cases, the FPR is inversely related to r as MS feature frequency increases. As expected, the precision of MS feature assignment is very high when x is small (x ≤ 5), but trends downward as x increases. At very high values of x, a point of inflection is reached due to the increased likelihood of an MS feature being detected in the correct well despite the increased FPR of assignments. However, when the objective is to identify ‘rare’ metabolites that are sparsely distributed in a library (as is common in drug discovery applications), then metabolites that occur with high frequency and correlate poorly with biological activity can typically be safely excluded from downstream analyses, limiting concern about high FPRs for abundant features in the dataset.
MultiplexMS successfully deconvolutes “one-compound-one-well” libraries
Although in silico experiments demonstrated the applicability of the MMSO method for complex library deconvolution, they do not consider practical issues (e.g., ion suppression) that can impact MS data quality. To address this issue, we performed a series of experiments of increasing complexity designed to evaluate the accuracy of this method in real-world scenarios. As an initial proof-of-concept, a series of commercially available antibiotic and antifungal NP standards (n = 25) were arranged in MMSO grids (r = 5) as a “one-compound-one-well” experiment (Figure S8). This simple multiplexing case provided a benchmarking experiment to observe the potential effects of sample pooling (i.e., chromatographic overlap and ion suppression) with MS analyses. The sample list was provided to the MultiplexMS app, which generated the sample lists for the initial and rearranged grids. Pooled mixtures were prepared by combining samples from appropriate rows and columns in each grid (20 pooled samples total), diluted to an appropriate final concentration, and analyzed by UPLC-MS. Preprocessing of the MS data for pooled samples was performed with Progenesis QI for peak picking and alignment of MS features. A strict blank subtraction and minimum intensity threshold were applied to the output file (see Methods). MS feature lists for each pooled sample were provided to the MultiplexMS app for feature deconvolution and sample reconstruction for each NP standard. Separately, each of the 25 NP standards was analyzed individually by MS to establish a ground-truth dataset comprised of the correct m/z – retention time pairs. A comprehensive examination of m/z – retention time pairs assigned to the standards showed all 25 NPs were assigned to the correct reconstructed sample, with two instances of false positive assignments (Figure S9). In the first, m/z 335.10 was correctly assigned to penicillin G but was also assigned to the reconstructed sample of cloxacillin. Reexamination of the cloxacillin standard revealed low-intensity contamination of the commercial material with penicillin G, confirming the MultiplexMS assignment (Figure S10). The second involved the structural isomers tetracycline and doxycycline, where poor peak shape and overlapping elution times complicated automated peak picking and alignment (Figure S11). Overall, the precision of feature assignment was 96% with two instances of false positive assignments. The success of this initial test case encouraged us to extend the methodology to the analysis of complex mixtures and to explore the detection limits for complex mixture analysis using mass spectrometry.
MS feature deconvolution performance for LCMS analysis of complex mixtures
A major concern with the multiplexing approach is that interference between analytes could lead to loss of information compared to analyzing samples individually. Ion suppression, loss of chromatographic resolution due to peak overlap, and limitations of automated peak detection software can all impact the detection of MS features in highly complex mixtures. To assess the impact of increasing sample complexity on MS feature recovery we selected a prefraction from our in-house marine bacterial library that was known to contain several NP compound classes, including micromonolactam (1), dracolactam A (2), and dracolactam C (3).21-23 This target prefraction, represented as i, was analyzed in triplicate and MS features appearing in all three replicates were retained to generate a ground-truth feature list. Next, a sequential number of prefractionated extracts from other source organisms were added to the target prefraction from i + 1, i + 2… i + 49 and then incrementally for i + 59, 69, 79, and 99. The total pooled prefractions represent potential grid dimensions from r = 1 to 100 (Figure 3A). The concentrations of individual prefractions in each pooled sample of size r were kept constant across the set to eliminate dilution effects. Each pooled sample was analyzed by UPLC-MS, preprocessed in Progenesis QI to generate an MS feature list, and compared to the ground-truth feature list to determine rates of recovery for the ground-truth MS features from the target prefraction.
Figure 3.
MS feature deconvolution performance for LCMS analysis of complex mixtures. (A) Pooling scheme showing the successive addition of complex mixtures to a ground-truth sample containing known molecules 1, 2, and 3. (B) Information recovery of benchmark molecules. Each molecule shows consistent peak shape and intensity up to r = 60 in all cases, while signal intensity reduces above r = 70. (C) Relative feature recovery of the full MS feature list from the ground-truth feature list was assessed for each test mixture and plotted as percent recovery. Red: r = 5, Green: r = 10, Yellow: r = 31. (D) Assessment of feature recovery as a function of mixture complexity. Ground-truth MS features are on the y-axis in order of feature retention success throughout the dataset.
We assessed feature recovery in two ways. Firstly, the three benchmark molecules (1 – 3) were examined to determine information recovery in the presence of increasing numbers of additional prefractions (Figure 3B). Extracted ion chromatograms (EICs) for each molecule show consistent peak shape and intensity up to r = 60 in all cases. Above r = 70, the peak shape for compounds 2 (m/z 486.29; tR 2.58 min) and 3 (m/z 452.28; tR 3.01 min) remained consistent, but peak intensities were reduced, reaching a minimum of 64% for 2 and 43% for 3 of the ground-truth signals when r = 100 (Figure 3B). Similarly, for benchmark molecule 1 (m/z 452.28; tR 3.11 min), peak shape remained consistent through r = 100, but peak intensity began to decrease at r = 70, with the greatest reduction (50% of the ground-truth signal) observed at r = 100 (Figure 3B).
Secondly, recovery of the full ground-truth feature list from the target extract was assessed for each test mixture. Figure 3C illustrates the percentage of recovered features from the ground-truth feature list as a function of mixture complexity (r = 0 → 100). Pooling up to 10 extracts returned >97% of ground-truth features. Increasing the mixture complexity to 20 samples reduced the recovery to 90%, while pooling up to 30 samples reduced the overall recovery rate to 74%. Examination of the signal intensities for unrecovered features indicated that most were low-intensity analytes, close to the intensity threshold (Figure 3D). Importantly, most natural products possess multiple MS features under standard LCMS acquisition conditions, suggesting that the loss of small numbers of low-intensity features will have a negligible impact on the chemical characterization of complex mixtures, particularly for drug discovery applications.7 An important observation from this analysis is that grid size selection should not be made on the basis of reduction in acquisition time, but instead should be optimized based on sample complexity and MS feature recovery. These factors are heavily influenced by the complexity, chemical similarity, and chromatographic properties of the mixtures being analyzed. We recommend that users of the MultiplexMS platform perform the simple benchmarking method outlined above using their sample libraries and use the plot in Figure 3C to select an appropriate grid size for each sample set.
Assessment of MultiplexMS performance with complex mixtures
The evaluation of MS feature recovery as a function of sample mixture complexity (described above) yielded a feature return of >97% for mixtures containing 10 extracts, suggesting that MultiplexMS should provide excellent recall for medium grid sizes. However, this analysis did not account for changes in MS feature list composition caused by ion suppression, peak overlap, etc. To examine the real-world performance of this method we prepared MMSO grids containing 90 bacterial prefractions and 10 NP standards (Figure S12). Samples were arranged so that each row/column in the initial grid contained one NP standard, providing an internal reference during the deconvolution step. Rows and columns were pooled and analyzed by UPLC-MS (40 samples in total) following the MMSO protocol. Separately, each prefraction was analyzed independently in quadruplicate to establish a ground-truth list of MS features. A strict replicate comparison was applied so that only features in all 4 replicates were retained. To avoid dilution effects, the individual replicate samples were analyzed at the same concentration as the individual prefractions in each pooled row/column.
Following preprocessing of the multiplexed MS data in Progenesis QI with blank subtraction and the application of a minimum intensity threshold, pooled samples were computationally deconvoluted to reconstruct MS feature lists for each original prefraction. MS features pertaining to the NP standards were correctly assigned to their parent samples (Figure S13) with zero instances of false positive assignment. Next, MS feature lists from the ground-truth prefraction replicate analyses were compared to the reconstructed samples from the multiplexed plates. Feature recovery was calculated as the percentage of MS features in the ground-truth feature list that were present in the reconstructed feature list for each prefraction (Figure S14). Feature recovery ranged from a maximum of 97% to a minimum of 57% with 72% of samples possessing feature recovery rates of 80% or higher. Importantly, most of the samples with low feature recoveries (<70%) were non-polar 'wash' fractions containing comparatively few MS features in the ground truth dataset (Figure S15). In these cases the loss of a small number of features resulted in large reductions in percentage feature recovery because of the small value of the denominator. As in the pooling experiment described above, most missing features possessed low signal intensities. Examination of these missing features revealed that most were low intensity m/z ions, high frequency background ions, isotopologues, or multiply charged ions identified by Progenesis QI (Figure S16). While these features are important for determining accurate atomic composition, they are not required to prioritize bioactive components from complex mixtures. Therefore, the loss of small numbers of MS features specific to an analyte does not impact the information content of the reconstituted MS feature lists for connecting the presence/ absence of molecules to specific biological phenotypes.
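The feature recovery calculation, and the effect of a small denominator on it, can be illustrated with a brief Python sketch. It assumes feature lists are stored as mappings from sample name to a set of feature identifiers; the function name and the toy samples are hypothetical.

```python
def feature_recovery(truth, reconstructed):
    """Percent of ground-truth features found in each reconstructed sample."""
    return {
        sample: 100.0 * len(feats & reconstructed.get(sample, set())) / len(feats)
        for sample, feats in truth.items()
        if feats  # skip samples with an empty ground-truth list
    }


# Toy data: each sample loses exactly one feature on reconstruction.
rich_truth = {f"f{i}" for i in range(100)}   # feature-rich prefraction
wash_truth = {"w0", "w1", "w2", "w3"}        # sparse 'wash' fraction
toy_truth = {"rich": rich_truth, "wash": wash_truth}
toy_reconstructed = {
    "rich": rich_truth - {"f0"},
    "wash": wash_truth - {"w0"},
}
toy_recovery = feature_recovery(toy_truth, toy_reconstructed)
```

Losing a single feature costs the feature-rich sample one percentage point but the four-feature sample twenty-five, which is the denominator effect noted for the non-polar wash fractions.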
Overall, this experiment supported the results from the in silico analyses (Figure 2) that predicted excellent FPRs for all samples, but high FPRs for commonly occurring features. Utilizing a 10 × 10 grid reduced the data acquisition time 10-fold over quadruplicate individual analyses without major data loss compared to ground-truth MS feature lists. The strong overlap in feature recovery between in silico and experimental analyses validates this approach for bioactive compound discovery using automated data integration strategies.
Application of MultiplexMS to ultra-high-throughput library analysis
Bioactive compound prioritization is an important component of NP drug discovery, especially when a small number of analytes in NP extract libraries exhibit bioactivity.13 The collection of metabolomics data from a large set of NP extracts along with accompanying bioassay data can highlight the active constituents in datasets, minimizing the time-to-discovery in natural products pipelines. NP Analyst, an open platform for Compound Activity Mapping, is one such tool that integrates metabolomic and bioactivity data to enable compound prioritization.21 In the original study from our laboratory, Lee et al. used metabolomics data of 925 in-house marine bacteria prefractionated extracts, measured in technical triplicates (2775 samples), together with an Antibiotic Mode of Action Profile (BioMAP) screening panel, to highlight priority molecules for isolation leading to the discovery of new bioactive molecules.21,24
Acquisition of the MS data for the 925-member prefraction library required more than 15 days of continuous acquisition time, effectively occupying one high-resolution mass spectrometer for several weeks; substantially longer than the time required to acquire the corresponding biological screening results. This original NP Analyst experiment, therefore, provided the perfect test platform to benchmark the MMSO strategy against a “ground-truth” dataset. Since metabolomics data had already been acquired on each sample in triplicate for the original NP Analyst study, we first tested the principle of the MMSO strategy by pooling feature lists in silico using an r value of 31. A grid size of r = 31 is the minimum size required to incorporate all 925 samples into a single analysis. This grid size provided the highest possible throughput, reducing the total number of samples required for MS analysis from n = 2775 (925 × 3) to 124 and reducing the theoretical analysis time from 15 consecutive days of instrument time to 15 hours; a 24-fold reduction in acquisition time.
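The grid-size and throughput arithmetic above can be checked with a short Python sketch. The 7.5 min per injection is an assumption taken from the total gradient length in the LCMS conditions and ignores injection overhead; the function name and the returned keys are hypothetical.

```python
import math


def pooling_plan(n_samples, replicates=3, minutes_per_run=7.5):
    """Smallest square grid that holds n_samples, and the resulting run counts."""
    r = math.isqrt(n_samples - 1) + 1      # smallest r with r * r >= n_samples
    pooled_runs = 4 * r                    # r rows + r columns, in two grids
    individual_runs = n_samples * replicates
    return {
        "r": r,
        "pooled_runs": pooled_runs,
        "individual_runs": individual_runs,
        "pooled_hours": pooled_runs * minutes_per_run / 60,
        "individual_days": individual_runs * minutes_per_run / 60 / 24,
        "fold_by_runs": individual_runs / pooled_runs,
    }


plan = pooling_plan(925)
```

Under these assumptions the plan gives r = 31, 124 pooled runs (about 15.5 hours) against 2775 individual runs (about 14.5 days). By run count the ratio is roughly 22-fold; the 24-fold figure quoted in the text corresponds to the rounded values of 15 days and 15 hours.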
NP Analyst prioritizes MS features based on the strength (Activity Score) and consistency (Cluster Score) of the activity profiles for the samples where the MS feature is found.25 If a set of samples have similar bioactivity profiles and contain one or more MS features in common, those MS features will be prioritized as candidate bioactive features. Therefore, errors in reconstitution from MultiplexMS will impact both Activity and Cluster scores for that feature, possibly deprioritizing important bioactive molecules. To assess the value of MultiplexMS data for bioactive compound discovery we performed a new NP Analyst experiment using the reconstituted MS feature lists from the in silico MultiplexMS experiment and the BioMAP activity data from the original NP Analyst study. Next, we compared the Activity and Cluster Scores for each MS feature between the original and in silico MultiplexMS experiments. Examining any changes in Activity and Cluster scores for active features from the original experiment provides an objective measure of the influence of MultiplexMS analysis on bioactive compound discovery; the main application for which this platform was designed.
The original metabolomics dataset contains 9,834 detected MS features, of which 845 were defined as 'active' using Activity and Cluster score filters of 2.0 and 0.3, respectively. Following computational deconvolution of the in silico pooled dataset, the Activity and Cluster scores were recalculated on all active and inactive MS features. An MS feature presence filter was applied, eliminating those features that are present in >20 reconstructed prefractions (Figure 4A, S17). Encouragingly, all the 845 bioactive MS features from the original experiment were detected in the filtered in silico experiment. Of these, 798 showed no change in Activity and Cluster scores, indicating that most bioactive features were correctly assigned to their original positions in each grid. Among the 47 features possessing changes in either Activity or Cluster Score, changes ranged from –4.1 to +0.23 (Activity Score) and –0.46 to +0.042 (Cluster Score) (Figure 4B, C). In total, just 26 features possessed changes in scores large enough to change their assignment from 'active' to 'inactive'. We also assessed the in silico NP Analyst dataset to look for instances where features were falsely assigned as 'active' compared to the original dataset. Interestingly, there were no cases of 'active' misassignments; a reasonable result when one considers that features would have to be falsely assigned only to prefractions with similar biological profiles to be scored as 'active'. Overall, the in silico NP Analyst experiment provided strong evidence that this strategy was appropriate for high-throughput bioactive metabolite discovery, with low false negative 'active' feature assignment rates and zero false positives.
Figure 4.
In silico MultiplexMS comparison of Activity and Cluster Scores to a ground truth experiment set. (A) Assessment of the FPR in the active prefractions following in silico MultiplexMS. First, MS features that were present in ≥ 20 samples in the dataset were omitted. Next, Activity and Cluster Score filters of 2.0 and 0.3, respectively, were applied. (B) Absolute changes in the original Activity Scores of the active MS features compared to the in silico MultiplexMS experiment. (C) Absolute changes in the original Cluster Scores of the active MS features versus the in silico MultiplexMS deconvoluted scores.
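The comparison of 'active' calls described above can be illustrated with a minimal Python sketch. It is not the NP Analyst or MultiplexMS implementation: the function names, the use of `>=` at the cutoffs, and the toy scores are all assumptions made for illustration. Only the cutoff values (2.0 and 0.3) come from the text.

```python
# Cutoffs quoted in the text; treating them as inclusive (>=) is an assumption.
ACTIVITY_CUTOFF = 2.0
CLUSTER_CUTOFF = 0.3


def is_active(scores):
    """scores is an (activity_score, cluster_score) tuple."""
    activity, cluster = scores
    return activity >= ACTIVITY_CUTOFF and cluster >= CLUSTER_CUTOFF


def compare_calls(original, multiplexed):
    """Compare calls between two {feature_id: (activity, cluster)} maps."""
    summary = {"unchanged_scores": [], "lost_active": [], "gained_active": []}
    for feature, orig_scores in original.items():
        new_scores = multiplexed.get(feature)
        if new_scores is None:
            # Feature absent after deconvolution: lost if it was active before.
            if is_active(orig_scores):
                summary["lost_active"].append(feature)
            continue
        if new_scores == orig_scores:
            summary["unchanged_scores"].append(feature)
        if is_active(orig_scores) and not is_active(new_scores):
            summary["lost_active"].append(feature)    # false negative
        elif not is_active(orig_scores) and is_active(new_scores):
            summary["gained_active"].append(feature)  # false positive
    return summary


# Hypothetical scores for four features (not taken from the study).
original = {
    "f1": (3.5, 0.60),
    "f2": (2.4, 0.35),
    "f3": (1.1, 0.10),
    "f4": (4.0, 0.50),
}
multiplexed = {
    "f1": (3.5, 0.60),  # unchanged
    "f2": (1.8, 0.20),  # dropped below both cutoffs
    "f3": (1.1, 0.10),  # unchanged, still inactive
    "f4": (3.9, 0.48),  # small change, still active
}

summary = compare_calls(original, multiplexed)
print(summary)
```

In this toy example, `lost_active` plays the role of the features that changed from 'active' to 'inactive', and `gained_active` the role of falsely 'active' features.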
As a final validation of the approach, we elected to repeat the entire NP Analyst analysis experimentally by creating pooled samples from the original stock solutions and reacquiring the raw MS data. The prefraction sample list was provided to the MultiplexMS app to generate the initial and rearranged grids for r = 31 and to create the pooled lists for the rows and columns of each grid. Prefractions were pooled using a Tecan Evo 150 liquid handling robot (see Methods), creating 62 pooled extracts per grid (124 total). Multiplexed samples were subjected to MS analysis and preprocessed in Progenesis QI for peak picking and alignment, and the pooled feature lists were computationally deconvoluted with a minimum intensity value of 70 to generate reconstructed prefraction feature lists. The MS feature lists of the original and deconvoluted datasets were compared to determine the extent of feature inflation when r = 31. The MultiplexMS experiment yielded 4,369 MS features compared to 9,834 in the original feature lists. However, there were, on average, 1,095 ± 131 MS features per prefraction present in the MultiplexMS deconvoluted list, versus 214 ± 178 in the original table. This comparison shows the extent of the FPR when pooling up to 31 prefractions and provides an ideal test case when prioritizing for the rarer analytes in a dataset.
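The row and column pooling and the subsequent deconvolution can be sketched on a toy 3 × 3 grid. This is a simplified illustration, not the MultiplexMS code: it assumes an ideal instrument (each pool shows the union of its members' features), ignores the minimum intensity threshold, and assumes that calls from the initial and rearranged grids are combined by intersection. All function names and the sample and feature labels are hypothetical.

```python
from itertools import product


def build_pools(grid):
    """Return (row_pools, column_pools) as lists of sample names for a square grid."""
    r = len(grid)
    rows = [[s for s in grid[i] if s is not None] for i in range(r)]
    cols = [[grid[i][j] for i in range(r) if grid[i][j] is not None] for j in range(r)]
    return rows, cols


def measure_pool(pool, truth):
    """Simulate an ideal MS run: a pool shows the union of its members' features."""
    features = set()
    for sample in pool:
        features |= truth.get(sample, set())
    return features


def deconvolute(grid, truth):
    """Assign a feature to a sample only if both its row and column pools contain it."""
    rows, cols = build_pools(grid)
    row_feats = [measure_pool(p, truth) for p in rows]
    col_feats = [measure_pool(p, truth) for p in cols]
    calls = {}
    for i, j in product(range(len(grid)), repeat=2):
        if grid[i][j] is not None:  # None marks an empty grid position
            calls[grid[i][j]] = row_feats[i] & col_feats[j]
    return calls


# Hypothetical ground truth: f1 occurs in samples A and E, f2 only in I.
truth = {"A": {"f1"}, "E": {"f1"}, "I": {"f2"}}

grid1 = [["A", "B", "C"],
         ["D", "E", "F"],
         ["G", "H", "I"]]
# Rearranged grid (an arbitrary reshuffle chosen for this example).
grid2 = [["A", "E", "I"],
         ["B", "F", "G"],
         ["C", "D", "H"]]

calls1 = deconvolute(grid1, truth)
calls2 = deconvolute(grid2, truth)
# Assumption: a feature is kept only if both grids agree on the assignment.
combined = {s: calls1[s] & calls2[s] for s in calls1}

print(sorted(s for s, f in calls1.items() if "f1" in f))    # single-grid calls for f1
print(sorted(s for s, f in combined.items() if "f1" in f))  # calls after both grids
```

In the first grid alone, f1 is also assigned to B and D because they share a row with one true carrier and a column with the other. This is the kind of false positive that becomes more frequent for commonly occurring features and that a second, rearranged grid helps to remove.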
Prefraction MS feature lists, together with the original BioMAP dataset, were integrated using the NP Analyst platform, and a prefraction-feature activity network was generated for bioactive compound identification and prioritization (Figure 5A, S18–20). Encouragingly, the new analysis recapitulated the creation of distinct communities for the known bioactive molecules from the original study (Figure 5). These included communities for micromonolactam (1) and dracolactam A (2); the collismycin analogs collismycin A (4), collismycin B (5), and SF2738D (6); and amychelin C (7). The characterization data for each molecule were compared to the previous report, including m/z, retention time, Activity Score and Cluster Score for representative ions from each molecule (Figure 5B, S21). All previously identified molecules were present in the MultiplexMS version of the experiment with very similar chromatographic attributes and identical Activity and Cluster Scores in almost all cases.
Figure 5.
Ultra-high-throughput MultiplexMS application with complex mixtures. (A) The NP Analyst output from multiplexing 925 prefractionated extracts in 31 × 31 sampling grids (124 pooled samples). Distinct communities of MS features were generated. Circled communities highlight dereplicated molecules in the dataset. (B) Activity and Cluster Score comparison between molecules identified in the original analyses versus MultiplexMS.
The new NP Analyst network highlighted several additional communities for further investigation. One of these, community 16, included two features, m/z 295.1223 and 299.17, in two prefractions: RLUS-2072D and -E. Re-examination of the mass spectral data for these prefractions showed that these two MS features were precursor ions for the known antibiotic natural product staurosporine (8). To verify this assignment, the producing organism was re-cultured, the target molecule was isolated, and the structure was confirmed by 1H-NMR and comparison with a commercial standard (Figure S24, Supporting Method 10).
Together, these results demonstrate that MultiplexMS can be applied successfully to decrease acquisition time and increase sample throughput for bioactive compound discovery. In a complex real-world example containing over 900 samples, the system performed equivalently to the gold-standard approach with individual replicate analyses but required less than five percent of the MS acquisition time. Chemical characterization of bioactive metabolites was highly similar between the two methods indicating that, at the compound level, the information content was equivalent. This result paves the way for an application to other large NP-based screening projects where limitations in data acquisition rates currently preclude the use of next-generation data integration strategies such as NP Analyst.
CONCLUSIONS
Technological advances in instrument sensitivity and preprocessing software suites can help facilitate compound discovery by directing bioactive compound identification in HTS applications. However, recent advances in laboratory automation are significantly increasing the scale of some NP libraries.6 In turn, the time required to analyze these samples has increased dramatically, reaching as much as decades of instrument time in extreme cases.6 This is further complicated by the fact that many commercial and third-party MS processing software packages cannot perform peak picking and alignment of thousands of samples in a single analysis, making it functionally impossible to analyze these sample sets using traditional methods.
Throughput is well recognized as a limiting factor by the metabolomics community.26 Numerous strategies for addressing this issue have been proposed, including the use of pooled quality control (pooled QC) samples to define ground truth peak lists.27 For example, Stancliffe et al. recently reported a data acquisition strategy capable of analyzing thousands of samples by using the ground truth peak list from a pooled QC sample to peak pick features from batches of research samples.28 However, although the pooled QC strategy eliminates the need to acquire technical replicates for all samples, the number of MS runs still increases linearly with the sample count. By contrast, MultiplexMS requires significantly fewer MS runs for a given sample set. For example, the 925-sample NP Analyst experiment described above would have required 928 MS runs using the pooled QC approach (3 × pooled QC and 925 sample runs), as opposed to just 124 runs using MultiplexMS. Further, the success of the pooled QC approach would have required the extraction of accurate peak lists from a pooled QC mixture containing 925 samples, compared to mixtures containing just 31 samples using MultiplexMS.
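The run counts quoted above follow from simple arithmetic, sketched here in Python. The helper names are hypothetical. Two points are assumptions rather than statements from the text: that libraries larger than r × r samples would be split into additional grid pairs, and that the individual-analysis baseline uses triplicate runs per sample.

```python
import math


def multiplex_runs(n_samples, r, grids_per_block=2):
    """Pooled runs for r x r grids: r row pools + r column pools per grid.

    Assumes each block of up to r*r samples gets an initial and a
    rearranged grid (grids_per_block=2).
    """
    blocks = math.ceil(n_samples / (r * r))
    return blocks * grids_per_block * 2 * r


def pooled_qc_runs(n_samples, n_qc=3):
    """One run per sample plus the pooled QC injections."""
    return n_samples + n_qc


def replicate_runs(n_samples, replicates=3):
    """Individual analysis with technical replicates (triplicate is an assumption)."""
    return n_samples * replicates


n, r = 925, 31
print(multiplex_runs(n, r))                                   # pooled runs for MultiplexMS
print(pooled_qc_runs(n))                                      # runs for the pooled QC strategy
print(round(multiplex_runs(n, r) / replicate_runs(n), 3))     # fraction of triplicate baseline
```

With these inputs the functions return 124 and 928 runs, matching the figures in the text, and the MultiplexMS run count is roughly 4.5% of a triplicate baseline.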
This raises an important distinction between the two methods. Many existing strategies, such as pooled QC, are designed for primary metabolomics applications where the samples (e.g., urine or plasma) are constitutionally similar to one another but the concentrations of analytes vary between samples. In these cases, pooled QC and other methods are effective because the composition of the pooled QC sample is similar to the composition of individual samples. What is required is the relative abundance of each analyte in each sample, necessitating single sample runs.
By contrast, in the discovery of bioactive natural products, the composition of each extract can be markedly different, making pooled QC strategies impractical due to signal suppression and chromatographic overlap. However, in this case, the goal of the analysis is to determine the presence/absence of individual features, rather than their relative abundances. This change in scope permits sample multiplexing to increase sample throughput at the expense of analyte quantitation, enabling much higher throughput than is possible with other methods.
One limitation of this approach is that FPRs can be high for commonly encountered features. As discussed above, this is of limited importance for bioactive compound discovery. Most natural product sampling strategies are designed to maximize species29 and chemical30 diversity within the sample set, with the goal of discovering 'rare' metabolites with unique structural and biological features. Commonly encountered features can therefore typically be safely ignored due to their poor correlation with biological phenotypes, as evidenced by the NP Analyst experiment described above. Users can minimize the occurrence of false positives by designing chromatographic methods that limit overlap and by incorporating appropriate process blank subtraction into their workflows. Blank subtraction excludes highly prevalent background MS features from the peak lists for multiplexed rows and columns and ensures that peak lists are representative of mixture composition. Where possible, we recommend that users employ 'high-resolution' mass spectrometry systems with resolving powers > 20,000, such as the instrument used in this study (see Methods). Inclusion of additional axes of separation such as ion mobility spectrometry31 should also be considered to reduce the instances of incorrect alignment of features with similar physicochemical properties.
In summary, MultiplexMS provides a high-throughput pipeline to rapidly acquire qualitative information on the chemical complexity of large NP libraries, providing opportunities for the examination of large collections. It is supported by an open-source software package that includes an easy-to-use GUI, making it suitable for non-programmers. Users can employ their data processing software of choice for peak picking and alignment, meaning that MultiplexMS is vendor-independent, and can be readily incorporated into most existing MS workflows.
ACKNOWLEDGMENT
Funding was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery (R.G.L.) and PGSD (M.J.J.R.) programs and the National Institutes of Health (U41AT008718 to R.G.L.). We thank Dr. Sandra Keerthisinghe for assistance with SFU’s Centre for High-Throughput Chemical Biology.
Footnotes
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website. Experimental section, supporting methods, isolation of bioactive compounds, and supporting figures (PDF)
The authors declare no competing interests.
Data Availability Statement
The companion tool and a tutorial are freely available for Windows and MacOS (https://github.com/liningtonlab/MultiplexMS). Raw mass spectrometry data have been deposited to MassIVE (MSV000090912; doi: 10.25345/C5SF2MH02). Code and data for manuscript figures and plots are available for download at Zenodo (doi: 10.5281/zenodo.7968494). The MultiplexMS GUI was developed in Python 3.832 and leverages the widely used packages pandas (v1.5.2)33 and NumPy (v1.23.5)34 for data handling and calculations. The Gooey package (v1.0.8.1)35 was used to build the GUI framework.
REFERENCES
- (1).Leão T; Wang M; Moss N; da Silva R; Sanders J; Nurk S; Gurevich A; Humphrey G; Reher R; Zhu Q; Belda-Ferre P; Glukhov E; Whitner S; Alexander KL; Rex R; Pevzner P; Dorrestein PC; Knight R; Bandeira N; Gerwick WH; Gerwick L A Multi-omics Characterization of the Natural Product Potential of Tropical Filamentous Marine. Mar. Drugs 2021, 19, 20.
- (2).Yu Q; Xiao H; Jedrychowski MP; Schweppe DK; Navarrete-Perea J; Knott J; Rogers J; Chouchani ET; Gygi SP Sample Multiplexing for Targeting Pathway Proteomics in Aging Mice. Proc. Natl. Acad. Sci. U. S. A 2020, 117, 9723–9732.
- (3).Demarque DP; Dusi RG; de Sousa FDM; Grossi SM; Silvério MRS; Lopes NP; Espindola LS Mass Spectrometry-based Metabolomics Approach in the Isolation of Bioactive Natural Products. Sci. Rep 2020, 10.
- (4).Caesar LK; Montaser R; Keller NP; Kelleher NL Metabolomics and Genomics in Natural Products Research: Complementary Tools for Targeting New Chemical Entities. Nat. Prod. Rep 2021, 38, 2041–2065.
- (5).Pye CR; Bertin MJ; Lokey RS; Gerwick WH; Linington RG Retrospective Analysis of Natural Products Provides Insights for Future Discovery Trends. Proc. Natl. Acad. Sci. U. S. A 2017, 114, 5601–5606.
- (6).Thornburg CC; Britt JR; Evans JR; Akee RK; Whitt JA; Trinh SK; Harris MJ; Thompson JR; Ewing TL; Shipley SM; Grothaus PG; Newman DJ; Schneider JP; Grkovic T; O’Keefe BR NCI Program for Natural Product Discovery: a Publicly-Accessible Library of Natural Product Fractions for High-Throughput Screening. ACS Chem. Biol 2018, 13, 2484–2497.
- (7).Clark TN; Houriet J; Vidar WS; Kellogg JJ; Todd DA; Cech NB; Linington RG Interlaboratory Comparison of Untargeted Mass Spectrometry Data Uncovers Underlying Causes for Variability. J. Nat. Prod 2021, 84, 824–835.
- (8).Radosevich AJ; Pu F; Chang-Yen D; Sawicki JW; Talaty NN; Elsen NL; Williams JD; Pan JY Ultra-High-Throughput Ambient MS: Direct Analysis at 22 Samples Per Second by Infrared Matrix-Assisted Laser Desorption Electrospray Ionization Mass Spectrometry. Anal. Chem 2022, 94, 4913–4918.
- (9).Sinclair I; Stearns R; Pringle S; Wingfield J; Datwani S; Hall E; Ghislain L; Majlof L; Bachman M Novel Acoustic Loading of a Mass Spectrometer: Toward Next-Generation High-Throughput MS Screening. J. Lab. Autom 2016, 21, 19–26.
- (10).Zhang H; Liu C; Hua W; Ghislain LP; Liu J; Aschenbrenner L; Noell S; Dirico KJ; Lanyon LF; Steppan CM; West M; Arnold DW; Covey TR; Datwani SS; Troutman MD Acoustic Ejection Mass Spectrometry for High-Throughput Analysis. Anal. Chem 2021, 93, 10850–10861.
- (11).Verdun CM; Fuchs T; Harar P; Elbrächter D; Fischer DS; Berner J; Grohs P; Theis FJ; Krahmer F Group Testing for SARS-CoV-2 Allows for up to 10-Fold Efficiency Increase Across Realistic Scenarios and Testing Strategies. Front. Public Health 2021, 9, 583377.
- (12).Shental N; Levy S; Wuvshet V; Skorniakov S; Shalem B; Ottolenghi A; Greenshpan Y; Steinberg R; Edri A; Gillis R; Goldhirsh M; Moscovici K; Sachren S; Friedman LM; Nesher L; Shemer-Avni Y; Porgador A; Hertz T Efficient High-Throughput SARS-CoV-2 Testing to Detect Asymptomatic Carriers. Sci. Adv 2020, 6, 5961.
- (13).Ohnesorge N; Sasore T; Hillary D; Alvarez Y; Carey M; Kennedy BN Orthogonal Drug Pooling Enhances Phenotype-Based Discovery of Ocular Antiangiogenic Drugs in Zebrafish Larvae. Front. Pharmacol 2019, 10, 508.
- (14).Dorfman R. The Detection of Defective Members of Large Populations. Ann. Math. Stat 1943, 14, 436–440.
- (15).Kainkaryam RM; Gilbert AC; Woolf PJ Smart Pooling of mRNA Samples in Microarray Experiments. BMC Bioinf. 2010, 11, 299–307.
- (16).Elkin LL; Harden DG; Saldanha S; Ferguson H; Cheney DL; Pieniazek SN; Maloney DP; Zewinski J; O’Connell J; Banks M Just-in-Time Compound Pooling Increases Primary Screening Capacity Without Compromising Screening Quality. J. Biomol. Screening 2015, 20, 577–587.
- (17).Kainkaryam RM; Woolf PJ Pooling in High-Throughput Drug Screening. Curr. Opin. Drug Discovery Dev 2009, 12, 339–350.
- (18).Deng K; Lan X; Fang Q; Li M; Xie G; Xie L Untargeted Metabolomics Reveals Alterations in the Primary Metabolites and Potential Pathways in the Vegetative Growth of Morchella sextelata. Front. Mol. Biosci 2021, 8, 632341.
- (19).Pluskal T; Castillo S; Villar-Briones A; Orešič M MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry-Based Molecular Profile Data. BMC Bioinf. 2010, 11, 395.
- (20).Schulze CJ; Bray WM; Woerhmann MH; Stuart J; Lokey RS; Linington RG “Function-First” Lead Discovery: Mode of Action Profiling of Natural Product Libraries Using Image-Based Screening. Chem. Biol 2013, 20, 285–295.
- (21).Lee S; van Santen JA; Farzaneh N; Liu DY; Pye CR; Baumeister TUH; Wong WR; Linington RG NP Analyst: An Open Online Platform for Compound Activity. ACS Cent. Sci 2022, 8, 223–234.
- (22).Hoshino S; Okada M; Awakawa T; Asamizu S; Onaka H; Abe I Mycolic Acid Containing Bacterium Stimulates Tandem Cyclization of Polyene Macrolactam in a Lake Sediment Derived Rare Actinomycete. Org. Lett 2017, 19, 4992–4995.
- (23).Skellam EJ; Stewart AK; Strangman WK; Wright JLC Identification of Micromonolactam, a New Polyene Macrocyclic Lactam from Two Marine Micromonospora Strains Using Chemical and Molecular Methods: Clarification of the Biosynthetic Pathway from a Glutamate Starter Unit. J. Antibiot 2013, 66, 431–441.
- (24).Wong WR; Oliver AG; Linington RG Development of Antibiotic Activity Profile Screening for the Classification and Discovery of Natural Product Antibiotics. Chem. Biol 2012, 19, 1483–1495.
- (25).Kurita KL; Glassey E; Linington RG Integration of High-Content Screening and Untargeted Metabolomics for Comprehensive Functional Annotation of Natural Product Libraries. Proc. Natl. Acad. Sci. U. S. A 2015, 112, 11999–12004.
- (26).D’Atri V; Fekete S; Clarke A; Veuthey JL; Guillarme D Recent Advances in Chromatography for Pharmaceutical Analysis. Anal. Chem 2019, 91, 210–239.
- (27).Evans AM; O’Donovan C; Playdon M; Beecher C; Beger RD; Bowden JA; Broadhurst D; Clish CB; Dasari S; Dunn WB; Griffin JL; Hartung T; Hsu PC; Huan T; Jans J; Jones CM; Kachman M; Kleensang A; Lewis MR; Monge ME; Mosley JD; Taylor E; Tayyari F; Theodoridis G; Torta F; Ubhi BK; Vuckovic D Dissemination and Analysis of the Quality Assurance (QA) and Quality Control (QC) Practices of LC-MS Based Untargeted Metabolomics Practitioners. Metabolomics 2020, 16, 113.
- (28).Stancliffe E; Schwaiger-Haber M; Sindelar M; Murphy MJ; Soerensen M; Patti GJ An Untargeted Metabolomics Workflow that Scales to Thousands of Samples for Population-Based Studies. Anal. Chem 2022, 94, 17370–17378.
- (29).Clark CM; Nguyen L; Pham VC; Sanchez LM; Murphy BT Automated Microbial Library Generation Using the Bioinformatics Platform IDBac. Molecules 2022, 27, 2038.
- (30).Kalkreuter E; Pan G; Cepeda AJ; Shen B Targeting Bacterial Genomes for Natural Product Discovery. Trends Pharmacol. Sci 2020, 41, 13–26.
- (31).Perez de Souza L; Alseekh S; Scossa F; Fernie AR Ultra-High-Performance Liquid Chromatography High-Resolution Mass Spectrometry Variants for Metabolomics Research. Nat. Methods 2021, 18 (7), 733–746.
- (32).The pandas development team, pandas-dev/pandas: Pandas, 2020. doi: 10.5281/zenodo.3509134.
- (33).McKinney Wes, "Data structures for statistical computing in python" in Proceedings of the 9th python in Science Conference, van der Walt Stefan and Millman Jarrod, Ed. (2010), pp. 56–61.
- (34).Harris CR; Millman KJ; van der Walt SJ; Gommers R; Virtanen P; Cournapeau D; Wieser E; Taylor J; Berg S; Smith NJ; Kern R; Picus M; Hoyer S; van Kerkwijk MH; Brett M; Haldane A; del Río JF; Wiebe M; Peterson P; Gérard-Marchant P; Sheppard K; Reddy T; Weckesser W; Abbasi H; Gohlke C; Oliphant TE Array Programming with NumPy. Nature 2020, 585, 357–362.
- (35).chriskiehl, Gooey, 2021. https://github.com/chriskiehl/Gooey.