Abstract
The continuous emergence of new pollutants poses significant threats to both human health and ecological environments. Nontarget analysis based on mass spectrometry has become prevalent for detecting new pollutants due to its high throughput capabilities. However, structural elucidation remains a major challenge in nontarget analysis. Here, we review the implementation of machine learning techniques to accelerate nontarget structural elucidation, with particular focus on spectral library matching, structural database retrieval, and de novo structure generation. We examine the design principles, technical characteristics, and comparative evaluation of these computational approaches. In addition, we show their applications in environmental nontarget analysis for new pollutant identification. Finally, we discuss the challenges of current approaches and future development trends. This review aims to deepen the understanding of existing computational approaches, promote the application of machine learning techniques in nontarget identification, and facilitate the integration of artificial intelligence with environmental pollutant analysis.
Keywords: mass spectrometry, nontarget identification, new pollutants, machine learning, neural network


1. Introduction
Numerous new chemicals with substantial economic value have been produced and used. More than two hundred million chemicals are registered with the Chemical Abstracts Service (CAS), and the number continues to grow. These chemicals may be released into the environment, leading to the emergence of new pollutants. New pollutants, also called chemicals of emerging concern, contaminants of emerging concern, or emerging contaminants, are hazardous chemical substances that have been newly discovered or have recently drawn concern, but that are not yet regulated or for which existing management measures are insufficient to prevent and control their risks. New pollutants have been continuously detected in the environment and the human body, posing a significant threat to human health and ecological security. Some new pollutants, such as perfluorinated compounds, also exhibit remarkable persistence, bioaccumulation, and toxicity. There is an urgent need to investigate the risks associated with these new pollutants. Accurate elucidation of their chemical structures is a prerequisite for assessing their potential environmental risks. Traditional target analysis is often inadequate for high-throughput identification of new pollutants because it relies on available standards. In recent years, nontarget analysis based on high-resolution tandem mass spectrometry (HRMS) has emerged as an important approach for identifying new pollutants in environmental samples owing to its high detection throughput. Several studies published in Science have underscored the importance of nontarget analysis in pollutant discovery. Researchers have discovered a large number of new substitutes and transformation products in the environment through nontarget analysis.
Nontarget analysis refers to the identification of compounds without prior information, using data provided by HRMS such as the precursor mass-to-charge ratio (MS1), isotope distribution, and fragment masses (MS2). In nontarget analysis, structural elucidation tends to be the critical and rate-limiting step. A common method for identifying compound structures from mass spectra is to compare them against a spectral library. Many such libraries have been developed, including NIST, GNPS, and MassBank. However, the accuracy of library matching depends heavily on the spectral similarity algorithm. Traditional spectral similarity algorithms such as cosine similarity have limited accuracy and cannot fully represent the degree of structural similarity. In addition, library matching cannot identify compounds that are absent from the spectral library. To broaden the scope of application, several studies have attempted to identify candidates from structural databases, such as CAS, PubChem, and ChemSpider, based on MS1 and isotopic distribution. In silico fragmented MS2 data, along with other candidate metadata, are then compared to experimental MS2 spectra to identify and rank the candidates. Although this approach has improved the coverage of nontarget identification, a large number of chemicals and their transformation products in the environment remain unidentified.
In recent years, the rapid advancement of machine learning technology and mass spectral databases has led to the emergence of machine learning models trained on mass spectral data. These models can predict the chemical structures of unknown compounds, either directly or indirectly, by learning the relationships between mass spectral data and chemical structures. Machine learning-based approaches can effectively address the limited scope of application of current methods and establish a foundation for the future development of nontarget analysis. This review explores the implementation of machine learning techniques in three pivotal structural elucidation methodologies: spectral library matching, structural database retrieval, and de novo structure generation. We illustrate the underlying technical principles and performance metrics of existing algorithms. Furthermore, we demonstrate the application of these machine learning-driven structural identification approaches to environmental samples. Finally, we suggest future development directions.
2. Implementation of Machine Learning for Mass Spectrometry-Based Identification
The integration of machine learning techniques with mass spectral data was initially uncommon in environmental nontarget identification but common in the field of metabolomics. Mass spectral analysis is selected as a key tool for metabolite identification due to its broad coverage, high sensitivity, and selectivity. Therefore, previous studies retained large amounts of mass spectral data, providing a solid foundation for the application of machine learning. Additionally, mass spectra are generated by the cleavage and transformation of molecules through various physicochemical processes at specific collision energies, implying that they may be related to chemical structures. Machine learning techniques are well-suited for identifying these potential patterns. Finally, metabolites are primarily small-mass compounds, which reduces the complexity of applying machine learning techniques. The establishment of various automated identification processes based on mass spectral data has led to the extension of these techniques to other areas, such as environmental nontarget identification.
Machine learning-based mass spectral identification methods can be broadly classified into three categories according to their scope of identification: (1) enhanced spectral library matching, which uses machine learning to improve spectral similarity algorithms and thereby the performance of traditional library matching, but whose scope is limited to compounds with identified mass spectra, on the scale of hundreds of thousands; (2) structural database retrieval, which uses machine learning to find the most likely structure among known chemical structures from mass spectral information, with a scope limited to all known chemical structures, on the scale of billions; and (3) de novo generation, which uses machine learning to propose chemical structures directly from mass spectral information; because these structures need not match any known chemical structure, the scope of identification is unrestricted.
2.1. Enhanced Library Matching Methods Based on Machine Learning
Library matching is a classical method for mass spectrometry-based identification; its workflow is presented in Figure 1. Each spectrum obtained from a sample is compared individually with those in mass spectral databases, and the chemical structure with the highest spectral similarity score is selected as the identification result. Cosine similarity is commonly used to calculate the spectral similarity score. The reference spectra in these databases primarily originate from mass spectral analysis of standards. Several major mass spectral databases have been established, along with software. Information on high-resolution mass spectral databases is presented in Table 1.
Figure 1. Workflow of library matching (scored by cosine similarity).
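The scoring step of this workflow can be sketched as follows. This is a minimal illustration, assuming the query and library spectra have already been binned onto a common m/z grid; the compound names and intensity values are invented for the example, and real tools add peak matching within a mass tolerance and other preprocessing.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (1-D intensity vectors)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def best_library_match(query, library):
    """Return (name, score) of the library entry most similar to the query."""
    scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy library with hypothetical compound names and four intensity bins each.
library = {
    "compound_A": [0.0, 1.0, 0.5, 0.0],
    "compound_B": [1.0, 0.0, 0.0, 0.2],
}
query = [0.0, 0.9, 0.6, 0.0]
name, score = best_library_match(query, library)
```

Here the query shares its two peaks with `compound_A` and none with `compound_B`, so `compound_A` is returned as the top hit.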
Table 1. Information on High-Resolution Mass Spectral Databases.
| database | scale (number of spectra) | profitability | homepage |
|---|---|---|---|
| NIST | 2,374,064 | commercial | https://chemdata.nist.gov |
| GNPS | 592,542 | nonprofit | https://gnps.ucsd.edu |
| MassBank | 122,512 | nonprofit | https://massbank.jp |
| MoNA | 235,754 | nonprofit | https://mona.fiehnlab.ucdavis.edu |
| METLIN | 960,000 | commercial | https://metlin.scripps.edu |
| mzCloud | 16,531,567 | nonprofit | https://www.mzcloud.org |
| HMDB | 76,416 | nonprofit | https://hmdb.ca |
All data were accessed on 25 March 2025. Tandem MS and GC-MS spectra are both included when counting the number of spectra.
However, library matching with current algorithms may return incorrect results. Li et al. compared spectral entropy with 42 spectral similarity algorithms previously studied in different fields of science and proposed spectral entropy as the optimal method for mass spectral library matching. On a data set of 25,138 molecules from NIST, it achieved a false discovery rate (FDR) of 5.8% at a similarity score threshold of 0.75, compared to 9.6% FDR for dot product similarity, improving the accuracy of MS-based annotation in small-molecule research. This study demonstrates that spectral similarity algorithms still have room for improvement, which is crucial for enhancing the accuracy and coverage of library matching methods. The FDR of a similarity algorithm can also vary significantly between samples. Scheubert et al. evaluated the FDR of traditional library matching algorithms using 70 GNPS data sets. The results showed that, even with improved FDR estimation, some data sets required a high similarity score threshold (for example, 0.99) to achieve 1% FDR, which resulted in a low annotation rate. Addressing the FDR issue to improve the accuracy of library matching therefore requires new methodologies.
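For orientation, an entropy-based similarity of the kind discussed above can be sketched as below. The formula used here (one minus the entropy increase of the merged spectrum, divided by ln 4) is the commonly cited unweighted form and is stated as an assumption rather than taken from this review; peak matching within a mass tolerance and any intensity reweighting are omitted, and both spectra are assumed to be binned onto the same m/z grid.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (natural log) of a normalized intensity vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def entropy_similarity(a, b):
    """Unweighted spectral entropy similarity of two binned spectra, in [0, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.sum() == 0 or b.sum() == 0:
        return 0.0
    a = a / a.sum()            # normalize intensities to sum to 1
    b = b / b.sum()
    merged = (a + b) / 2       # 1:1 mixture of the two spectra
    increase = 2 * shannon_entropy(merged) - shannon_entropy(a) - shannon_entropy(b)
    return float(1.0 - increase / np.log(4))
```

Identical spectra give a similarity of 1 because merging adds no entropy, while spectra with no shared bins give 0.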
To enhance the accuracy of library matching, researchers have incorporated machine learning models into the similarity algorithm. Huber et al. developed a novel spectral similarity algorithm, Spec2Vec, based on machine learning. The results from Spec2Vec align more closely with the structural similarity of two compounds than those of the classical cosine similarity algorithm. The method is inspired by Word2Vec, a technique from natural language processing. Spec2Vec treats each peak in an MS/MS spectrum as a word in natural language, then analyzes how peaks are distributed, for example, the co-occurrence or absence of sets of peaks, to generate abstract spectral embeddings. Finally, it calculates the cosine similarity between the spectral embeddings. Spec2Vec can partially recover the information about distribution patterns in MS/MS data that classical algorithms miss, thereby improving the accuracy of library matching. In the method evaluation, Spec2Vec achieved a notably better true/false positive ratio than classical algorithms across all recall rates and a retrieval accuracy of up to 88%.
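The peaks-as-words idea can be sketched as follows. This is an illustration only: the word format follows the "peak@..." and "loss@..." convention, the spectra are invented, and random vectors stand in for the word embeddings that a trained Word2Vec model would supply. Any intensity weighting of the words is omitted.

```python
import numpy as np

def spectrum_to_document(peaks, precursor_mz, decimals=2):
    """Translate peaks and neutral losses into words such as 'peak@200.45'."""
    words = [f"peak@{mz:.{decimals}f}" for mz, _ in peaks]
    words += [f"loss@{precursor_mz - mz:.{decimals}f}"
              for mz, _ in peaks if mz < precursor_mz]
    return words

def embed_document(words, word_vectors, dim):
    """Sum the vectors of all known words into one spectral embedding."""
    embedding = np.zeros(dim)
    for word in words:
        if word in word_vectors:
            embedding += word_vectors[word]
    return embedding

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Toy spectra as (m/z, intensity) pairs; all values are invented.
doc_a = spectrum_to_document([(200.45, 1.0), (150.10, 0.4)], precursor_mz=300.90)
doc_b = spectrum_to_document([(200.45, 0.8), (120.05, 0.6)], precursor_mz=300.90)

# Placeholder word vectors (assumption): a trained model would provide these.
rng = np.random.default_rng(0)
dim = 16
vocabulary = sorted(set(doc_a) | set(doc_b))
word_vectors = {word: rng.normal(size=dim) for word in vocabulary}

similarity = cosine(embed_document(doc_a, word_vectors, dim),
                    embed_document(doc_b, word_vectors, dim))
```

With trained embeddings, words that tend to occur together in spectra end up close in the embedding space, so two spectra can score as similar even when few of their peaks match exactly.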
Following the rise of deep learning, which significantly enhanced learning capacity through the introduction of nonlinear neurons, Huber et al. developed MS2DeepScore based on the Siamese Network architecture to replace the original machine learning model. Rather than using unsupervised learning to explore distribution patterns in MS/MS data, the new model is trained directly against a structural similarity score: it predicts Tanimoto scores for pairs of molecules from their fragment spectra with a root mean squared error of about 0.15. This enables MS2DeepScore to retrieve compounds with greater structural similarity than classical methods at all recall rates during mass spectral retrieval. Subsequently, De Jonge et al. incorporated MS1 information and employed a random forest model to make decisions based on the multidimensional data provided by Spec2Vec, MS2DeepScore, and MS1. The resulting method, MS2Query, achieved higher accuracy than MS2DeepScore when matching exact compounds and a higher average Tanimoto score when searching for analogues. Bittremieux et al. proposed GLEAMS, which utilizes contrastive learning techniques. Its basic framework is derived from Convolutional Neural Networks (CNN) and the Siamese Network. The advantage of contrastive learning is that the model can directly incorporate negative samples during training, which reduces the false discovery rate (FDR) and enhances the learning capacity of the model. Guo et al. encoded basic information, such as MS1 and adduct type, as numerical values and adopted the more advanced Transformer architecture in CLERMS. The model outperformed MS2DeepScore in the accuracy of mass spectral identification.
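The training target and error metric mentioned above can be made concrete with a short sketch. It assumes molecular fingerprints are given as equal-length binary vectors; the fingerprints and predicted scores below are invented, and no actual Siamese network is involved.

```python
import numpy as np

def tanimoto(fp_a, fp_b):
    """Tanimoto score: shared on-bits divided by on-bits in either fingerprint."""
    fp_a = np.asarray(fp_a, dtype=bool)
    fp_b = np.asarray(fp_b, dtype=bool)
    union = np.logical_or(fp_a, fp_b).sum()
    return float(np.logical_and(fp_a, fp_b).sum() / union) if union else 0.0

def rmse(predicted, target):
    """Root mean squared error between predicted and true similarity scores."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))

# True structural similarity for two hypothetical molecule pairs.
true_scores = [
    tanimoto([1, 1, 0, 0], [1, 0, 1, 0]),   # 1 shared bit of 3 set bits
    tanimoto([1, 1, 1, 0], [1, 1, 1, 0]),   # identical fingerprints
]
# Scores a spectrum-based model might predict for the same pairs (invented).
predicted_scores = [0.40, 0.90]
error = rmse(predicted_scores, true_scores)
```

A model of this kind is judged by how small `error` is across many spectrum pairs, since it never sees the structures at prediction time.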
Methods of enhanced library matching are listed in Table 2.
Table 2. Methods of Enhanced Library Matching.
| method | model architecture | model input | model output | training set | references |
|---|---|---|---|---|---|
| Spec2Vec | Word2Vec | word vector | spectral vector | GNPS | Huber et al. |
| MS2DeepScore | Siamese Network | binned spectrum | spectral vector | GNPS | Huber et al. |
| MS2Query | Random Forest | similarity score | similarity score | GNPS | De Jonge et al. |
| GLEAMS | CNN, Siamese Network | binned spectrum | spectral vector | MassIVE-KB | Bittremieux et al. |
| CLERMS | Transformer | binned spectrum | spectral vector | GNPS | Guo et al. |
A few models additionally take metadata such as precursor m/z, adduct type, and collision energy as input; these inputs are not included here.
Spec2Vec needs a spectral document as input, where peaks and losses are translated into words, for example “peak@200.45” and “loss@100.45.”
Library matching is crucial for mass spectrometry-based nontarget identification and remains widely used in environmental nontarget analysis due to its ease of use and high efficiency. Utilizing machine learning models to enhance library matching offers a promising strategy for environmental nontarget analysis, as it improves the understanding of fragmentation patterns and reduces false positives. In addition to improving accuracy, machine learning can help establish relationships between compounds with low spectral similarity but high structural similarity, thereby adapting the library matching method to accommodate compounds with similar structures but low spectral similarity.
2.2. Structural Database Retrieval Methods Based on Machine Learning
The disparity between known chemical structures and available mass spectra is substantial. While databases like PubChem and ZINC contain hundreds of millions of chemical structures, only a fraction have corresponding mass spectral data. This gap creates a class of compounds we call ‘unknown knowns’: chemicals that exist in reference databases but remain unidentified in analysis due to lacking spectral information. In environmental nontarget analysis, even compounds with known structures often fall into this category when their mass spectra are unavailable. Given current analytical limitations, it is impractical to characterize all organic compounds through conventional library matching approaches, leaving most chemical structures undetermined.
Machine learning enables two principal strategies for identifying unknown knowns, as shown in Figure 2. The first approach follows a from-spectrum-to-structure workflow: (1) Candidate structures are first filtered by precursor mass within an allowable tolerance. (2) Molecular fingerprints of these candidates are algorithmically generated. (3) A machine learning model then predicts the molecular fingerprint from the query mass spectrum. (4) The predicted fingerprint is compared against candidate fingerprints, with the highest similarity match selected as the identification result. The second approach follows a from-structure-to-spectrum workflow: (1) Candidate structures are first filtered by precursor mass within an allowable tolerance. (2) A machine learning model then predicts mass spectra from these candidates. (3) The query mass spectrum is compared against candidate mass spectra, with the highest similarity match selected as the identification result.
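The from-spectrum-to-structure workflow can be sketched as below. This is a minimal illustration under several assumptions: fingerprints are represented as sets of on-bit indices, the fingerprint "predicted from the spectrum" is simply passed in rather than produced by a model, the mass tolerance is expressed in ppm, and the candidate names, masses, and fingerprints are invented.

```python
def within_ppm(mass, target, ppm):
    """True if mass lies within the given ppm tolerance of target."""
    return abs(mass - target) <= target * ppm * 1e-6

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    fp_a, fp_b = set(fp_a), set(fp_b)
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0

def rank_candidates(query_mass, predicted_fp, candidates, ppm=10.0):
    """Filter candidates by precursor mass, then rank by fingerprint similarity."""
    filtered = [c for c in candidates if within_ppm(c["mass"], query_mass, ppm)]
    scored = [(c["name"], tanimoto(predicted_fp, c["fingerprint"])) for c in filtered]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical structural database entries.
candidates = [
    {"name": "cand_1", "mass": 180.0634, "fingerprint": {1, 2, 3, 4}},
    {"name": "cand_2", "mass": 180.0640, "fingerprint": {1, 2, 7}},
    {"name": "cand_3", "mass": 181.0700, "fingerprint": {1, 2, 3, 4}},
]
ranking = rank_candidates(180.0635, {1, 2, 3, 5}, candidates)
```

The from-structure-to-spectrum workflow has the same shape: the mass filter is unchanged, and the scoring line compares the query spectrum with a spectrum predicted for each candidate instead of comparing fingerprints.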
Figure 2. Two main approaches of structural database retrieval. Candidate structures are obtained by searching the structure library using precursor m/z and isotope pattern. Approach 1 predicts structure from spectrum while approach 2 predicts spectrum from structure.
2.2.1. Methods of from-Spectrum-to-Structure Identification
To map mass spectra to chemical structures, researchers have used molecular fingerprints to represent chemical structures, as they can be directly generated by algorithms. In addition, the fixed length of a molecular fingerprint makes it suitable for model training. Binning is the most straightforward and commonly used method for representing mass spectra. For each mass spectrum, an upper bound of m/z (e.g., 500 Da) is first defined. Subsequently, based on the desired resolution (e.g., 0.01 Da), the m/z range from 0 to 500 Da is divided into 50,000 bins. The intensity of each peak in the mass spectrum is then assigned to its corresponding bin as a characteristic value. If no peak falls within a bin, the value is set to zero. This process results in a binned representation of the mass spectrum. However, Rasche et al. explored an alternative approach to representing mass spectra, shown in Figure 3. They introduced fragmentation graphs and fragmentation trees to reduce noise interference and avoid fragment mass errors. A fragmentation graph is a specialized type of graph where each node corresponds to the mass of a fragment in the mass spectrum. Directed edges between the nodes, pointing from larger to smaller masses, represent possible fragmentation pathways. By annotating and computing the most likely subgraph from the fragmentation graph, a final fragmentation tree can be obtained. Annotation refers to calculating molecular formulas from MS/MS data using algorithms. Given the large number of possible molecular formulas, isotopic analysis is used to filter potential candidates and reduce computational demands. Computation involves scoring all subgraphs of the fragmentation graph, with the subgraph yielding the highest score being selected as the fragmentation tree. The scoring criteria include both isotopic distribution and fragmentation patterns. The isotopic distribution score is derived through Bayesian analysis, while the fragmentation score is determined by analyzing mass spectrum fragmentation patterns, such as by manually assigning reward scores for common neutral losses. In summary, this method generates a mass spectral fragmentation tree containing fragment masses, their corresponding molecular formulas, and fragmentation patterns.
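The binning scheme described in this paragraph (an upper bound of 500 Da and a resolution of 0.01 Da, giving 50,000 bins) can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: intensities of peaks that fall into the same bin are summed, and peaks at or above the upper bound are discarded.

```python
import numpy as np

def bin_spectrum(peaks, max_mz=500.0, bin_width=0.01):
    """Convert (m/z, intensity) pairs into a fixed-length intensity vector."""
    n_bins = int(round(max_mz / bin_width))
    binned = np.zeros(n_bins)              # bins without a peak stay at zero
    for mz, intensity in peaks:
        # Small epsilon guards against floating-point error at bin edges.
        index = int(np.floor(mz / bin_width + 1e-9))
        if 0 <= index < n_bins:
            binned[index] += intensity     # assumption: sum peaks sharing a bin
    return binned

# Invented example spectrum; the last peak lies above the 500 Da bound.
spectrum = [(100.004, 1.0), (250.123, 0.5), (600.0, 0.8)]
vector = bin_spectrum(spectrum)
```

The resulting vector has the same length for every spectrum, which is what makes it convenient as model input, at the cost of being very sparse.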
Figure 3. Simplified procedure for calculating molecular formula score from its fragmentation tree.
Building on the fragmentation tree, Dührkop et al. developed CSI:FingerID (Figure 4). CSI:FingerID utilizes the Support Vector Machine (SVM), a machine learning technique. The model takes multidimensional data from the fragmentation tree as input and outputs a molecular fingerprint, which is used to retrieve the corresponding chemical structure from databases. Compared to classical methods and some machine learning approaches, CSI:FingerID demonstrates significant improvements in accuracy while maintaining acceptable efficiency. The model is available in the latest version of the SIRIUS software and has shown strong performance in identifying compounds with precursor masses below 500 Da. However, as the size of the fragmentation tree grows exponentially with increasing mass, both the runtime and the number of candidate molecular formulas increase significantly, which can affect accuracy. Consequently, CSI:FingerID may be less effective for compounds with large masses.
Figure 4. Workflow of CSI:FingerID. Reproduced with permission from ref .
Some studies have also employed the SVM but circumvented the use of fragmentation trees. Heinonen et al. employed a similar SVM to predict molecular fingerprints directly from binned mass spectra, rather than from fragmentation trees, and performed structural database retrieval using the predicted fingerprints. The resulting model, FingerID, outperformed MetFrag on a test set derived from KEGG. Brouard et al. also applied SVM to map chemical structures into a latent space, replacing molecular fingerprints for structural database retrieval. The results demonstrated that the IOKR model outperformed both FingerID and CSI:FingerID on a test set derived from MassBank. Wang et al. developed a molecular fingerprint specifically for perfluorinated compounds and predicted fingerprints from mass spectra using SVM. This approach compensated for the lack of training data for perfluorinated compounds, improving the accuracy of the model over existing methods in identifying such compounds.
Given the limited learning capability of traditional machine learning frameworks, many studies have turned to deep learning models, which offer more parameters and greater learning capacity. Nguyen et al. utilized a Graph Neural Network (GNN) model to learn chemical structures and proposed ADAPTIVE, which achieved higher accuracy than kernel-based models such as FingerID, CSI:FingerID, and IOKR on a benchmark data set in ref . Goldman et al. developed MIST based on the Transformer architecture. This model predicts molecular fingerprints by learning fragmentation trees, incorporating multidimensional information such as fragment structures annotated by MAGMa and molecular formulas. Additionally, a contrastive fine-tuning strategy was employed to increase the similarity between the predicted and true fingerprints while enhancing the dissimilarity from decoy fingerprints. MIST achieved performance comparable to that of CSI:FingerID in structural database retrieval. Fan et al. sought to directly predict molecular fingerprints from mass spectra using a three-layer neural network, leading to the development of MetFID. When evaluated on the CASMI 2016 testing data set, MetFID demonstrated performance comparable to CSI:FingerID.
The mechanisms vary across methods of from-spectrum-to-structure identification. For clarity, these methods are summarized in Table 3.
Table 3. Methods of from-Spectrum-to-Structure Identification.
| method | model architecture | model input | model output | training set | references |
|---|---|---|---|---|---|
| CSI:FingerID | SVM | fragmentation tree feature | molecular fingerprint | GNPS | Dührkop et al. |
| FingerID | SVM | spectral features | molecular fingerprint | MassBank | Heinonen et al. |
| IOKR | SVM | spectral features, fragmentation tree features | molecular fingerprint | GNPS | Brouard et al. |
| APP-ID | SVM | spectral features | molecular fingerprint | in-house PFAS data set | Wang et al. |
| ADAPTIVE | SVM, GNN | spectral features | latent vector | GNPS | Nguyen et al. |
| MIST | Transformer | formulas set | molecular fingerprint | NIST, MoNA, GNPS | Goldman et al. |
| MetFID | ANN | binned spectrum | molecular fingerprint | MoNA | Fan et al. |
A few models additionally take metadata such as precursor m/z, adduct type, and collision energy as input; these inputs are not included here.
Fragmentation tree features are obtained through different kernel methods used in SVM or calculating similarity scores with fragmentation trees of a series of representative compounds.
Spectral features are obtained through different kernel methods used in SVM or calculating similarity scores with mass spectra of a series of representative compounds.
ADAPTIVE maps both the mass spectrum and the molecular graph into a latent space and retrieves candidates there. The latent vector plays the same role as a molecular fingerprint.
Formulas set is obtained from fragment m/z assigned by SIRIUS.
ANN: Artificial Neural Network.
Methods for from-spectrum-to-structure identification typically involve predicting molecular fingerprints from mass spectra or from fragmentation trees derived from mass spectra. The predicted fingerprints are then used to search chemical structure databases for compounds with the most similar fingerprints, thereby enabling molecular identification. Numerous parameters representing the characteristics of mass spectra or fragmentation trees are selected as key features for the model, which is most evident in the choice of kernels in the SVM model. For example, in CSI:FingerID, the fragment type, fragment intensity, loss type, and loss intensity in the fragmentation tree each have corresponding kernels. Additionally, molecular formula, subtrees, and common substructures are also considered. In FingerID, kernels designed for mass spectra typically incorporate features such as m/z, intensity, and collision energy. Using a binned spectrum as model input is simpler; however, due to the absence of feature engineering, the model requires more parameters to achieve comparable performance.
2.2.2. Methods of from-Structure-to-Spectrum Identification
To predict mass spectra from chemical structure, researchers initially aimed to develop a collection of fragmentation rules. Simulated spectra were then generated by in silico fragmentation of candidate chemical structures based on these rules and compared with the spectra to be identified. Several widely used commercial software tools, such as ACD Fragmenter and Mass Frontier, follow this approach. Another strategy, however, does not rely on fixed rules but simulates bond breaking for each bond in the chemical structure, producing a fragmentation tree. The fragmentation tree, first introduced by Hill et al., is similar to the one described earlier, but its nodes contain specific chemical structures rather than molecular formulas.
Fragmentation tree-based rules are commonly employed to predict mass spectra from molecular chemical structures. Wolf et al. developed the MetFrag tool to assist in deducing chemical structures from mass spectra. This strategy considered a broader range of potential bond-breaking scenarios and enabled candidate ranking based on specific MS2 information, such as bond dissociation energy, fragment mass, and fragment abundance. This made MetFrag a popular tool for compound identification, offering advantages over rule-based strategies. Later, Ruttkies et al. optimized the MetFrag algorithm by introducing new filtering rules and ranking strategies such as InChIKey filtering, element restrictions, substructure restrictions, log P score, etc., making the method comparable to some machine learning models. Based on the same principle of scoring as MetFrag, Ridder et al. proposed an algorithm for generating fragmentation trees from multilevel mass spectra to enhance identification accuracy when such data are available. Similarly, Kind et al. employed a comparable strategy to predict the chemical structures of lipids from mass spectra. Their study successfully generated simulated lipid spectra that closely resembled real spectra, and they established LipidBlast, a database containing simulated mass spectra for 119,200 lipids. LipidBlast demonstrated superior performance due to the relatively clear fragmentation rules of lipids and their limited variety of substituent groups.
To more accurately predict the breaking of chemical bonds, researchers introduced machine learning models. Allen et al. proposed a different bond-breaking strategy called Competitive Fragmentation Modeling (CFM) to predict mass spectra using machine learning. In this approach, bond-breaking events are governed by six assumptions related to bond-breaking limitations, and they are repeated in the initial molecules and fragments produced in subsequent steps until no further fragmentation occurs. Each bond-breaking event is modeled as a Markov process, involving state transitions between two charged molecules. These processes are ultimately combined into a Markov network. Using this framework, the authors constructed a probabilistic graphical model to learn the state transition parameters for each bond-breaking event. Later, Allen et al. trained multiple CFM models for 10, 20, and 40 eV collision energies under both positive and negative ion modes and developed online tools for ease of use.
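The Markov view of fragmentation can be illustrated with a toy calculation. This is a simplified sketch, not the CFM implementation: the three states, their m/z values, the transition probabilities, and the number of steps are all invented, whereas in the actual model the transition parameters are learned from data.

```python
import numpy as np

def predict_intensities(transition, start, n_steps):
    """Propagate a probability distribution through a Markov chain of fragment states."""
    transition = np.asarray(transition, dtype=float)
    state = np.zeros(transition.shape[0])
    state[start] = 1.0                     # all probability starts on the precursor
    for _ in range(n_steps):
        state = state @ transition         # one round of bond-breaking events
    return state

# Hypothetical states: precursor (index 0) and two fragments (indices 1 and 2).
mz_values = [180.06, 162.05, 144.04]
# Row i gives the probabilities of moving from state i to each state;
# the diagonal is the probability of not fragmenting further in that step.
transition = [
    [0.5, 0.4, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
intensities = predict_intensities(transition, start=0, n_steps=2)
predicted_spectrum = list(zip(mz_values, intensities))
```

The final state probabilities act as relative peak intensities at the corresponding m/z values, which is how a fragmentation graph with learned transition probabilities can be turned into a predicted spectrum.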
Currently, models that follow the from-structure-to-spectrum identification approach have shifted to deep learning frameworks, which offer more parameters and stronger learning capabilities. Deep learning-based methods typically use binned spectra to fix output lengths. Wei et al. initially proposed NEIMS, a model based on a multilayer perceptron (MLP), which can quickly predict mass spectra from molecular fingerprints and achieved better accuracy than the CFM model. Zhu et al. employed the more advanced GNN architecture to learn the probability distribution of molecular substructures in fragmentation trees, thereby predicting the corresponding intensities detected in mass spectrometry. This approach resulted in more stable and accurate predictions compared to the CFM and NEIMS models. Li et al., on the other hand, used both GNN and MLP to learn chemical structures and molecular fingerprints, respectively, and combined the manually weighted outputs of both models to obtain the final predicted mass spectra. The combined model outperformed either individual model. These two studies demonstrate the importance of topological information in training machine learning models. Goldman et al. proposed ICEBERG, a method for predicting mass spectra through fragmentation event inference and fragmentation graph reconstruction. Specifically, ICEBERG consists of two main steps. First, it generates a fragmentation graph through repeated bond breaking. Then, a GNN model is employed to score and filter out bond-breaking processes in the fragmentation graph, known as fragmentation graph reconstruction. The reconstructed graph can be used to predict mass spectra in a manner similar to the CFM model. The average cosine similarity between the predicted spectra of ICEBERG and the real spectra in the test set is 0.727, which exceeds that of the CFM and MassFormer models. Additionally, Hong et al. developed 3DMolMS, which utilizes the three-dimensional chemical structure of molecules by incorporating the type and spatial location of each atom as model inputs to predict mass spectra. The distances between atoms were employed to provide supplementary structural information for model training. The model outperformed CFM and MassFormer on the test set.
The mechanisms vary across methods of from-structure-to-spectrum identification. For clarity, these methods are summarized in Table 4.
Table 4. Methods of from-Structure-to-Spectrum Identification.
| method | model architecture | model input | model output | training set | references |
|---|---|---|---|---|---|
| CFM-ID | PGM | fragmentation graph | fragmentation graph | METLIN | Wang et al. |
| MolDiscovery | PGM | fragmentation graph | fragmentation graph | GNPS, MoNA | Cao et al. |
| NEIMS | MLP/GNN | molecular fingerprint | binned spectrum | NIST | Wei et al. |
| RASSP:FN | MLP, GNN | molecular graph, formulas set | binned spectrum | NIST | Zhu et al. |
| RASSP:SN | MLP, GNN | molecular graph, atom set | binned spectrum | NIST | Zhu et al. |
| ESP | MLP, GNN | molecular graph, molecular fingerprint | binned spectrum | NIST | Li et al. |
| MassFormer | MLP, GAT | molecular graph | binned spectrum | NIST | Young et al. |
| ICEBERG | GNN | molecular graph | fragment m/z, intensity | NIST, NPLIB1 | Goldman et al. |
| 3DMolMS | Point-based DNN | 3D molecular feature | binned spectrum | NIST, PCDL | Hong et al. |
PGM: probabilistic graphical model; GAT: Graph Transformer; CID: collision-induced dissociation; PCDL: Agilent Personal Compound Database and Library.
A few models take metadata such as adduct type and collision energy as input; these are not included here.
The formulas set is obtained through an exhaustive method, whereas the atom set is obtained through a heuristic bond-breaking method.
ICEBERG predicts fragment intensities through GNN independently.
NPLIB1: a subset of GNPS.
3D molecular features are similar to molecular fingerprints but with 3D information.
Methods of from-structure-to-spectrum identification mainly use the molecular graph and the fragmentation graph as model input, because the graph format effectively captures the topological structure of molecules and the relationships among different fragmentation processes. With a fragmentation graph, machine learning models predict the probability of each fragmentation process, so the spectrum is predicted through iterative fragmentation, which aligns more closely with the underlying mechanisms of mass spectrometry. The molecular graph, however, can be used directly by machine learning models to generate the predicted spectrum. Although this reduces the interpretability of the method, it enhances the ability of models to learn molecular topological information that may be overlooked by fragmentation graph-based methods. In addition, molecular fingerprints convey various types of chemical information, such as ring structures and aromaticity, and are therefore also chosen as model input. To ensure that the results represent both the m/z and the intensity of the mass spectrum, most methods use a binned spectrum as model output. The fragmentation graph can also be converted into a mass spectrum.
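As a rough illustration of how a fragmentation graph can be built by repeated bond breaking, the sketch below enumerates fragments of a toy heavy-atom skeleton. It is a simplified assumption-laden example: it ignores hydrogen rearrangements, ring opening, and any learned scoring, and it is not the algorithm of CFM-ID or ICEBERG. All function names are hypothetical.

```python
from collections import deque


def connected_components(atoms, bonds):
    """Group atom indices into connected fragments using the given bonds."""
    neighbours = {atom: set() for atom in atoms}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    unvisited = set(atoms)
    components = []
    while unvisited:
        queue = deque([unvisited.pop()])
        component = set()
        while queue:
            atom = queue.popleft()
            if atom in component:
                continue
            component.add(atom)
            queue.extend(neighbours[atom] - component)
        unvisited -= component
        components.append(frozenset(component))
    return components


def fragmentation_graph(atoms, bonds, max_depth=2):
    """Map each fragment to the fragments obtained by breaking one of its bonds."""
    graph = {}
    frontier = deque([(frozenset(atoms), 0)])  # breadth-first over fragments
    while frontier:
        fragment, depth = frontier.popleft()
        if fragment in graph or depth >= max_depth:
            continue
        inner_bonds = [(i, j) for i, j in bonds if i in fragment and j in fragment]
        children = set()
        for broken in inner_bonds:
            remaining = [bond for bond in inner_bonds if bond != broken]
            parts = connected_components(fragment, remaining)
            if len(parts) > 1:  # breaking a ring bond does not split the fragment
                children.update(parts)
        graph[fragment] = children
        frontier.extend((child, depth + 1) for child in children)
    return graph


# Heavy-atom skeleton of ethanol: C0-C1-O2
atoms = [0, 1, 2]
bonds = [(0, 1), (1, 2)]
graph = fragmentation_graph(atoms, bonds)
for parent, children in graph.items():
    print(sorted(parent), "->", sorted(sorted(child) for child in children))
```

In a learned model, each parent-to-child edge of such a graph would receive a predicted probability or score; here the edges are only enumerated.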
2.3. De Novo Structural Generation Methods
Although billions of chemical structures are known, fully cataloging all possible structures remains a distant goal. Numerous compounds, such as perfluorinated compounds, which are not included in structural databases, have been produced and released into the environment. Their transformation products have also been detected in various environmental media.
Currently, the confirmation of chemical structures primarily relies on manual labor. To automate the identification of completely unknown compounds, a common approach is de novo structural generation from mass spectra. This method uses machine learning to predict chemical structures, rather than molecular fingerprints, directly from mass spectra. It is analogous to the translation task in natural language processing, where mass spectral data are translated into chemical structures, typically represented by SMILES. This approach is particularly useful for identifying unknown compounds, as it allows the model to generate specific chemical structures without relying on comparisons with existing structures. However, it presents significant challenges in model design. Traditional machine learning models predict spectra or molecular fingerprints and compare them with existing databases, so even if a prediction is partly incorrect, the correct structure can still be ranked highly. In contrast, de novo generation models cannot recover from incorrect predictions. Therefore, the accuracy requirements for de novo generation models are extremely high, placing substantial demands on both the learning capacity of the model and the quality of the training data.
Several methods based on de novo generation have already been proposed; their general workflow is simplified in Figure 5. Stravs et al. predicted molecular fingerprints from mass spectra using CSI:FingerID and then obtained molecular formulas through SIRIUS. They subsequently developed MSNovelist, which takes molecular fingerprints and formulas as input to predict the SMILES notation of unknown compounds. The model is based on Long Short-Term Memory (LSTM) networks, which generate predictions by sequentially predicting SMILES characters. For each prediction, MSNovelist generates multiple candidate structures simultaneously and ranks them using a modified Platt score.
Figure 5. Workflow of de novo structural generation methods.
LSTM models may fail to capture the global correlations among fragments in MS2 spectra, leading to limited generation accuracy. However, with the rapid advancement of deep learning, Transformer-based models utilizing self-attention mechanisms have revitalized de novo generation methods. Shrivastava et al. developed MassGenie, a model capable of directly predicting SMILES from binned mass spectra. The model is based on a Transformer and generates multiple candidate predictions, which are ranked using MetFrag. Litsa et al. also employed a Transformer-based framework to develop Spec2Mol. This study introduced two pretraining tasks, SMILES restoration and molecular property prediction, to enhance the generation of accurate SMILES. After pretraining a SMILES encoder-decoder, the mass spectrum encoder was trained to minimize the mean squared error between encoded SMILES and mass spectra. Spec2Mol generates multiple candidate predictions simultaneously, ranking them by mass error relative to measured precursor ions. Yang et al. developed TeFT, a Transformer-based model for generating fragmentation trees. Unlike previous models, TeFT outputs fragmentation trees annotated with chemical structures. However, the TeFT model is less accurate for known compounds than models like CSI:FingerID because its outputs must be ranked against traditional fragmentation trees.
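Ranking generated candidates by mass error relative to the measured precursor ion, as described above for Spec2Mol, can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are given as element counts rather than SMILES (in practice a cheminformatics toolkit would derive them), only [M+H]+ ions are considered, and the function names and candidate formulas are hypothetical rather than taken from the Spec2Mol implementation.

```python
# Monoisotopic masses (Da) of the elements used in this example.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}
PROTON_MASS = 1.007276


def protonated_mz(element_counts):
    """m/z of the singly charged [M+H]+ ion for a dictionary of element counts."""
    neutral_mass = sum(MONOISOTOPIC_MASS[el] * n for el, n in element_counts.items())
    return neutral_mass + PROTON_MASS


def rank_by_mass_error(candidates, precursor_mz):
    """Return candidate names sorted by absolute mass error (ppm), smallest first."""
    def abs_error_ppm(item):
        _, element_counts = item
        return abs(protonated_mz(element_counts) - precursor_mz) / precursor_mz * 1e6
    return [name for name, _ in sorted(candidates.items(), key=abs_error_ppm)]


# Hypothetical candidates produced by a generative model, keyed by formula.
candidates = {
    "C10H14N2O2": {"C": 10, "H": 14, "N": 2, "O": 2},
    "C8H10N4O2": {"C": 8, "H": 10, "N": 4, "O": 2},
    "C9H10N2O3": {"C": 9, "H": 10, "N": 2, "O": 3},
}
ranking = rank_by_mass_error(candidates, precursor_mz=195.0877)
print(ranking)
```

Mass error alone cannot distinguish isomers, so such a filter is usually only one part of candidate ranking.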
Methods of de novo structural generation are listed in Table 5.
Table 5. Methods of De Novo Structural Generation.
| method | model architecture | model input | model output | training set | references |
|---|---|---|---|---|---|
| MSNovelist | LSTM | molecular fingerprints | SMILES | HMDB, COCONUT, DSSTox | Stravs et al. |
| MassGenie | Transformer | binned spectrum | SMILES | ZINC, GNPS | Shrivastava et al. |
| Spec2Mol | Transformer | binned spectrum | SMILES | PubChem, ZINC, NIST | Litsa et al. |
| TeFT | Transformer | binned spectrum | fragmentation tree | GNPS, HMDB, MoNA | Yang et al. |
MSNovelist first predicted molecular fingerprint through CSI:FingerID.
TeFT compares the predicted fragmentation trees with a reference fragmentation tree computed by a self-constructed algorithm, which is similar to SIRIUS but can be applied to low-resolution mass spectrometry data.
Although the de novo generation strategy offers a promising approach for identifying unknown compounds, the accuracy of these methods remains limited. Beyond the inherent difficulty of the task, the lack of high-quality mass spectra for training is a significant challenge. In fact, with sufficiently rich training data, deep learning models can outperform traditional methods, as demonstrated in fields such as natural language processing and protein structure prediction. Leaving learning efficiency aside, achieving comparable results for all known molecular structures may require at least one million molecular structures with mass spectra acquired under multiple instrumental conditions, whereas existing databases contain fewer than 100,000 molecular structures with corresponding mass spectra. Several studies have attempted to use data augmentation techniques to expand the available mass spectra for model training, but the improvements achieved have been modest.
2.4. Assistant Tools for Identification Based on Machine Learning
Various machine learning-based tools have also been developed to assist in the mass spectrometry analysis of unknown compounds. These computational approaches facilitate critical analytical tasks, including substructure elucidation, compound class prediction, and retention time estimation, all of which contribute to comprehensive mass spectral interpretation and compound identification. Van Der Hooft et al. modeled the relationship among MS2 data as a Latent Dirichlet Allocation (LDA) problem and proposed MS2LDA. Sets of fragment peaks and neutral losses that frequently occur together in MS2 spectra are classified as Mass2Motifs, which represent common substructures in organic compounds. Wandy et al. developed MotifDB, an online database designed to store annotations of Mass2Motifs.
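The text-like view of spectra that underlies topic modeling can be illustrated by turning each MS2 spectrum into a bag of "words" made of fragment peaks and neutral losses. The sketch below is a simplified assumption: the word naming scheme, the rounding to two decimals, and the use of intensities as word weights are illustrative choices and not the actual preprocessing of MS2LDA. The resulting documents could then be passed to any LDA implementation.

```python
from collections import Counter


def spectrum_to_words(precursor_mz, peaks, decimals=2):
    """Represent one MS2 spectrum as weighted fragment and neutral-loss words."""
    words = Counter()
    for mz, intensity in peaks:
        words[f"fragment_{mz:.{decimals}f}"] += intensity
        loss = precursor_mz - mz
        if loss > 0:  # neutral loss relative to the precursor ion
            words[f"loss_{loss:.{decimals}f}"] += intensity
    return words


# Hypothetical spectrum: precursor m/z and (m/z, intensity) fragment peaks.
document = spectrum_to_words(195.09, [(138.07, 100.0), (110.07, 30.0)])
for word, weight in sorted(document.items()):
    print(word, weight)
```

Words that repeatedly appear together across many such documents are what a topic model would group into a motif.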
Classification information plays a crucial role in screening candidates for nontarget identification. Tripathi et al. first obtained molecular fingerprints of features in samples using CSI:FingerID, and then created Qemistree, a tree that integrates both sample data and the hierarchical clustering of molecular fingerprints. Qemistree can be used not only to annotate nodes in molecular networks but also to analyze correlations between samples, thereby enhancing the interpretability of mass spectral analysis. Additionally, Dührkop et al. employed a deep neural network model to predict the ClassyFire classification of compounds from their molecular fingerprints. The model, CANOPUS, achieved an accuracy of 99.8% on the test set. Zhao et al. developed a random forest model to identify neutral losses containing iodine in mass spectra. This approach facilitated the determination of the number of iodine atoms substituted during the identification of iodinated disinfection byproducts in environmental samples.
Network information provides valuable additional insights for identification. Chen et al. proposed a global network optimization approach, NetID, to annotate nontarget LC-MS data. It adds biochemical connections to a molecular network based on spectral similarity. Five previously unrecognized metabolites were identified through this approach. Shen et al. developed a similar method, MetDNA, which can expand metabolite annotations without the need for a comprehensive standard spectral library.
Multidimensional information derived from instrumental metadata, such as retention time and collision energy, is frequently used in nontarget identification. Domingo-Almenara et al. trained a deep learning model on the METLIN small molecule retention time (SMRT) data set, which includes 80,038 small molecules. In 70% of cases, the correct molecular identity was ranked among the top three candidates based on predicted retention times. Kretschmer et al. developed a similar data set, RepoRT, containing 8,809 unique compounds and their corresponding retention time data, to facilitate the development of machine learning models for retention time prediction.
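The "top three" figure quoted above is a top-k ranking metric. The following sketch shows how candidates might be ranked by the deviation between predicted and observed retention time and how a top-k hit rate could be computed. The data, names, and retention times are invented for illustration, and the prediction model itself is assumed to exist elsewhere.

```python
def rank_by_retention_time(candidates, observed_rt):
    """Sort (name, predicted_rt) pairs by absolute deviation from the observed RT."""
    return sorted(candidates, key=lambda candidate: abs(candidate[1] - observed_rt))


def top_k_rate(queries, k=3):
    """Fraction of queries whose true compound is among the k best-ranked candidates."""
    hits = 0
    for observed_rt, true_name, candidates in queries:
        ranked = rank_by_retention_time(candidates, observed_rt)
        if true_name in [name for name, _ in ranked[:k]]:
            hits += 1
    return hits / len(queries)


# Hypothetical queries: (observed RT in min, true compound, [(candidate, predicted RT)]).
queries = [
    (5.0, "A", [("A", 5.2), ("B", 9.0), ("C", 1.0), ("D", 4.0)]),
    (10.0, "X", [("X", 2.0), ("Y", 9.5), ("Z", 10.5), ("W", 11.0)]),
]
print(top_k_rate(queries, k=3))
```

In this toy data the second query misses the top three because the predicted retention time of the true compound is far from the observed value, which is the failure mode such a metric exposes.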
3. Application of Machine Learning for Environmental Nontarget Identification
Machine learning has been successfully applied to mass spectrometry-based identification. The use of machine learning techniques to assist or replace manual identification is expected to become a prevailing trend. The primary current application scenarios are suspect list screening for chemicals without standard MS/MS spectra and nontarget identification of spectra with unknown structures.
3.1. Suspect List Screening for Chemicals without Standard MS/MS Spectra
A large number of new pollutants with potential toxicity are present in the environment, but many of them are difficult to identify because reference mass spectra are absent. McEachran et al. constructed a mass spectral database containing simulated spectra for approximately 700,000 compounds in the DSSTox database using CFM-ID. Corresponding metadata from PubChem and DSSTox were used to label these simulated spectra, thereby enhancing the informational dimension of the database. They also analyzed the frequency of occurrence of each compound in samples recorded by PubChem and DSSTox to assist users in identifying compounds that are more likely to appear in real samples. Similarly, several compound databases that previously lacked mass spectral data were supplemented with simulated spectra obtained through CFM-ID, facilitating querying and usage.
This simulated mass spectra-based method has also been employed to identify several high-concern new pollutants. Li et al. predicted the mass spectra of 3566 polycyclic aromatic compounds (PACs) in DSSTox using NEIMS and constructed a suspect list containing both mass spectral information and retention indices. This suspect list was then used to screen PACs in PM2.5 samples, leading to the successful identification of 350 PACs. Finally, the potential risks of these PACs were predicted using a QSAR model. Yu et al. employed CFM-ID to generate mass spectra for per- and polyfluoroalkyl substances (PFASs), which have limited MS2 spectra in databases, to determine their potential structures. The study ultimately identified 50 PFASs in airborne particulate matter in China. Wang et al. fine-tuned CFM-ID on a data set of new psychoactive substances (NPSs) using transfer learning, improving the predictive performance of the model for NPSs. The resulting NPS-MS model led to the successful identification of 3-chlorobenzocyclidine and two related derivatives in real forensic samples, demonstrating its potential to identify previously unknown NPSs in forensic samples.
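The first step of suspect list screening, matching measured features against suspect ions within a mass tolerance, can be sketched as below. The suspect m/z values are the [M-H]- ions of PFOA and PFOS calculated from monoisotopic masses; the feature list, the 5 ppm tolerance, and the function name are illustrative assumptions. In a real workflow, matched features would subsequently be compared with simulated MS2 spectra and retention information.

```python
def screen_suspects(feature_mzs, suspects, tolerance_ppm=5.0):
    """Return (feature m/z, suspect name, error in ppm) for every match."""
    hits = []
    for feature_mz in feature_mzs:
        for name, suspect_mz in suspects.items():
            error_ppm = (feature_mz - suspect_mz) / suspect_mz * 1e6
            if abs(error_ppm) <= tolerance_ppm:
                hits.append((feature_mz, name, error_ppm))
    return hits


# Suspect list: [M-H]- m/z values calculated from monoisotopic masses.
suspects = {"PFOA": 412.9664, "PFOS": 498.9302}
# Hypothetical precursor m/z values of features detected in negative mode.
features = [412.9660, 498.9310, 300.1234]

hits = screen_suspects(features, suspects)
for feature_mz, name, error_ppm in hits:
    print(f"{feature_mz:.4f} matches {name} ({error_ppm:+.1f} ppm)")
```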
3.2. Nontarget Identification of Spectra with Unknown Structures
Currently, machine learning-based mass spectral identification algorithms are widely used in nontarget analysis. Qian et al. identified 69 transformation products of sartans using SIRIUS, which served as a scoring tool to rank the candidate chemical structures of central nodes in the transformation reaction molecular network. Similarly, Meyer et al. employed SIRIUS, CFM-ID, and MetFrag to annotate MS2 spectra that lacked matching results in databases. The study successfully identified 87 human pharmaceutical metabolites in wastewater treatment plants, 25 of which were first detected in influent wastewater. SIRIUS and CFM-ID have become the most widely used machine learning tools in environmental nontarget identification.
To identify compounds not present in the structural databases, de novo generation methods are increasingly being applied to real samples. Zwerger et al. conducted a nontarget identification of mycosporine-like amino acids (MAAs) in 33 algal samples using machine learning models such as SIRIUS, CANOPUS, and MSNovelist. MSNovelist, which is capable of identifying completely unknown compounds, did not yield meaningful results for MAAs according to the conclusion in this study. This likely stems from the lower accuracy of current de novo generation models for known compounds, which require further improvements.
4. Conclusions and Perspectives
In summary, the implementation of machine learning for environmental nontarget identification can be broadly categorized into the three approaches discussed above: enhanced library matching, structural database retrieval, and de novo generation, as shown in Figure 6. These methods address the growing diversity of environmental pollutants, yet each faces distinct challenges. Enhanced library matching, the gold standard for accuracy, is constrained by limited library coverage. Structural database retrieval, augmented by spectrum-structure machine learning models, extends identification to nonlibrary compounds but sacrifices some accuracy. For entirely novel structures, de novo generation offers the broadest potential; however, its accuracy suffers due to the immense and uncharted chemical space of unknown pollutants. Moreover, reliable de novo prediction demands expansive training data sets, which remain a critical bottleneck.
Figure 6. Comparison of three machine learning approaches for environmental nontarget identification.
With the emergence of new methods, the application of machine learning techniques in mass spectral identification is advancing rapidly, particularly for environmental nontarget analysis of new pollutants. As environmental monitoring increasingly requires the detection of unknown compounds, traditional manual identification has become unsustainable, positioning machine learning as a critical enabler. Currently, established methods like SIRIUS and CFM-ID are widely used due to their user-friendly interfaces, relatively reliable predictions, and computational efficiency. However, the accuracy of these methods is highly dependent on the similarity between the predicted compounds and those in their training sets. For new pollutants with significant structural differences, their accuracy is relatively low, and they struggle to identify compounds that are not present in existing structural databases. The emerging de novo generation approach, which shows great potential for identifying compounds beyond existing structural databases, is hindered by issues related to accuracy and interpretability. To improve de novo generation methods, a larger amount of high-resolution mass spectral data is needed. However, current high-resolution mass spectral data sets remain limited in size and diversity. There are several ways to expand personal databases. One option is to collect additional spectra from commercial sources, open-access repositories, or published literature. Another involves using existing machine learning software to predict spectra; however, the accuracy of these predictions remains limited. Additionally, quantum chemistry models can generate theoretical spectra, but they are currently too computationally demanding for practical use.
Furthermore, several widely used machine learning methods tend to be computationally intensive as well, as they rely on exhaustive algorithms when converting mass spectra to model input, such as fragmentation tree computation and molecular bond breaking. These demands increase exponentially as the molecular mass of compounds grows, which explains why these methods impose an upper mass limit for identifiable compounds. Deep learning methods also need a similar mass limit to save computational cost. Finally, current machine learning models are trained on many metabolites but few pollutants, which may limit their direct applicability to environmental pollutants. To address this, researchers could retrain existing models using pollutant-specific data sets or apply transfer learning strategies to enhance their performance for environmental pollutants. Another significant challenge is the lack of standardized data sets, which makes it difficult to compare the performance of different methods. Addressing these issues is crucial for advancing mass spectrometry-based environmental nontarget identification. Once these challenges are overcome, the workflow of mass spectrometry-based identification could become more accurate and efficient, which would significantly enhance the identification and risk assessment of new pollutants in environmental samples.
Acknowledgments
This study was supported by the Key Research and Development Program of Zhejiang Province (2023C02037), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0750100), the National Natural Science Foundation of China (22376092, U24A20512, 22276090, and 22206075), Anhui Provincial Key Research and Development Project (2023t07020004), State Environmental Protection Key Laboratory of Dioxin Pollution Control Open Fund (2023) and Jiangsu Provincial Administration for Market Regulation Science and Technology Program (KJ2024011).
The authors declare no competing financial interest.
Published as part of Environment & Health special issue “New Pollutants: Challenges and Prospects”.
References
- Khan N. A., López-Maldonado E. A., Majumder A., Singh S., Varshney R., López J. R., Méndez P. F., Ramamurthy P. C., Khan M. A., Khan A. H., et al. A state-of-art-review on emerging contaminants: Environmental chemistry, health effect, and modern treatment methods. Chemosphere. 2023;344:140264. doi: 10.1016/j.chemosphere.2023.140264.
- Xu J., Liu J. Managing the Risks of New Pollutants in China: The Perspective of Policy Integration. Environ. Health. 2023;1(6):360–366. doi: 10.1021/envhealth.3c00054.
- Bao J., Liu W., Liu L., Jin Y., Dai J., Ran X., Zhang Z., Tsuda S. Perfluorinated compounds in the environment and the blood of residents living near fluorochemical plants in Fuxin, China. Environ. Sci. Technol. 2011;45(19):8075–8080. doi: 10.1021/es102610x.
- Wang X., Zhu Q., Yan X., Wang Y., Liao C., Jiang G. A review of organophosphate flame retardants and plasticizers in the environment: analysis, occurrence and risk assessment. Sci. Total Environ. 2020;731:139071. doi: 10.1016/j.scitotenv.2020.139071.
- Sima M. W., Jaffé P. R. A critical review of modeling Poly- and Perfluoroalkyl Substances (PFAS) in the soil-water environment. Sci. Total Environ. 2021;757:143793. doi: 10.1016/j.scitotenv.2020.143793.
- Pérez F., Nadal M., Navarro-Ortega A., Fàbrega F., Domingo J. L., Barceló D., Farré M. Accumulation of perfluoroalkyl substances in human tissues. Environ. Int. 2013;59:354–362. doi: 10.1016/j.envint.2013.06.004.
- Lewis A. J., Yun X., Spooner D. E., Kurz M. J., McKenzie E. R., Sales C. M. Exposure pathways and bioaccumulation of per- and polyfluoroalkyl substances in freshwater aquatic ecosystems: Key considerations. Sci. Total Environ. 2022;822:153561. doi: 10.1016/j.scitotenv.2022.153561.
- Washington J. W., Yoo H., Ellington J. J., Jenkins T. M., Libelo E. L. Concentrations, distribution, and persistence of perfluoroalkylates in sludge-applied soils near Decatur, Alabama, USA. Environ. Sci. Technol. 2010;44(22):8390–8396. doi: 10.1021/es1003846.
- Dickman R. A., Aga D. S. A review of recent studies on toxicity, sequestration, and degradation of per- and polyfluoroalkyl substances (PFAS). J. Hazard. Mater. 2022;436:129120. doi: 10.1016/j.jhazmat.2022.129120.
- Vermeulen R., Schymanski E. L., Barabási A.-L., Miller G. W. The exposome and health: Where chemistry meets biology. Science. 2020;367(6476):392–396. doi: 10.1126/science.aay3164.
- Escher B. I., Stapleton H. M., Schymanski E. L. Tracking complex mixtures of chemicals in our changing environment. Science. 2020;367(6476):388–392. doi: 10.1126/science.aay6636.
- Gosetti F., Mazzucco E., Gennaro M. C., Marengo E. Contaminants in water: non-target UHPLC/MS analysis. Environ. Chem. Lett. 2016;14:51–65. doi: 10.1007/s10311-015-0527-1.
- Hollender J., Schymanski E. L., Singer H. P., Ferguson P. L. Nontarget Screening with High Resolution Mass Spectrometry in the Environment: Ready to Go? Environ. Sci. Technol. 2017;51(20):11505–11512. doi: 10.1021/acs.est.7b02184.
- Ciccarelli D., Samanipour S., Rapp-Wright H., Bieber S., Letzel T., O’Brien J. W., Marczylo T., Gant T. W., Vineis P., Barron L. P. Bridging knowledge gaps in human chemical exposure via drinking water with non-target screening. Crit. Rev. Environ. Sci. Technol. 2025;55(3):190–214. doi: 10.1080/10643389.2024.2396690.
- Richardson S. D., Kimura S. Y. Water analysis: emerging contaminants and current issues. Anal. Chem. 2020;92(1):473–505. doi: 10.1021/acs.analchem.9b05269.
- Díaz R., Ibáñez M., Sancho J. V., Hernández F. Target and non-target screening strategies for organic contaminants, residues and illicit substances in food, environmental and human biological samples by UHPLC-QTOF-MS. Anal. Methods. 2012;4(1):196–209. doi: 10.1039/C1AY05385J.
- Ibáñez M., Sancho J. V., Hernández F., McMillan D., Rao R. Rapid non-target screening of organic pollutants in water by ultraperformance liquid chromatography coupled to time-of-light mass spectrometry. TrAC, Trends Anal. Chem. 2008;27(5):481–489. doi: 10.1016/j.trac.2008.03.007.
- Alygizakis N., Giannakopoulos T., Thomaidis N. S., Slobodnik J. Detecting the sources of chemicals in the Black Sea using non-target screening and deep learning convolutional neural networks. Sci. Total Environ. 2022;847:157554. doi: 10.1016/j.scitotenv.2022.157554.
- NIST, National Institute of Standards and Technology. https://chemdata.nist.gov (accessed 25 March 2025).
- Wang M., Carver J. J., Phelan V. V., Sanchez L. M., Garg N., Peng Y., Nguyen D. D., Watrous J., Kapono C. A., Luzzatto-Knaan T., et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 2016;34(8):828–837. doi: 10.1038/nbt.3597.
- Horai H., Arita M., Kanaya S., Nihei Y., Ikeda T., Suwa K., Ojima Y., Tanaka K., Tanaka S., Aoshima K., et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 2010;45(7):703–714. doi: 10.1002/jms.1777.
- CAS, Chemical Abstracts Service. https://www.cas.org/ (accessed 25 March 2025).
- Kim S., Chen J., Cheng T., Gindulyte A., He J., He S., Li Q., Shoemaker B. A., Thiessen P. A., Yu B., Zaslavsky L., Zhang J., Bolton E. E. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019;47(D1):D1102–D1109. doi: 10.1093/nar/gky1033.
- ChemSpider database. https://www.chemspider.com/ (accessed 25 March 2025).
- Ionas A. C., Ballesteros Gómez A., Leonards P. E., Covaci A. Identification strategies for flame retardants employing time-of-flight mass spectrometric detectors along with spectral and spectra-less databases. J. Mass Spectrom. 2015;50(8):1031–1038. doi: 10.1002/jms.3618.
- Lohne J. J., Turnipseed S. B., Andersen W. C., Storey J., Madson M. R. Application of single-stage orbitrap mass spectrometry and differential analysis software to nontargeted analysis of contaminants in dog food: detection, identification, and quantification of glycoalkaloids. J. Agric. Food Chem. 2015;63(19):4790–4798. doi: 10.1021/acs.jafc.5b00959.
- Wang F., Liigand J., Tian S., Arndt D., Greiner R., Wishart D. S. CFM-ID 4.0: more accurate ESI-MS/MS spectral prediction and compound identification. Anal. Chem. 2021;93(34):11692–11700. doi: 10.1021/acs.analchem.1c01465.
- Dührkop K., Shen H., Meusel M., Rousu J., Böcker S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl. Acad. Sci. U.S.A. 2015;112(41):12580–12585. doi: 10.1073/pnas.1509788112.
- Dührkop K., Fleischauer M., Ludwig M., Aksenov A. A., Melnik A. V., Meusel M., Dorrestein P. C., Rousu J., Böcker S. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods. 2019;16(4):299–302. doi: 10.1038/s41592-019-0344-8.
- van Der Hooft J. J. J., Wandy J., Barrett M. P., Burgess K. E., Rogers S. Topic modeling for untargeted substructure exploration in metabolomics. Proc. Natl. Acad. Sci. U.S.A. 2016;113(48):13738–13743. doi: 10.1073/pnas.1608041113.
- Shrivastava A. D., Swainston N., Samanta S., Roberts I., Wright Muelas M., Kell D. B. MassGenie: A transformer-based deep learning method for identifying small molecules from their mass spectra. Biomolecules. 2021;11(12):1793. doi: 10.3390/biom11121793.
- Stravs M. A., Dührkop K., Böcker S., Zamboni N. MSNovelist: de novo structure generation from mass spectra. Nat. Methods. 2022;19(7):865–870. doi: 10.1038/s41592-022-01486-3.
- Litsa E. E., Chenthamarakshan V., Das P., Kavraki L. E. An end-to-end deep learning framework for translating mass spectra to de-novo molecules. Commun. Chem. 2023;6(1):132. doi: 10.1038/s42004-023-00932-3.
- Huber F., Ridder L., Verhoeven S., Spaaks J. H., Diblen F., Rogers S., Van Der Hooft J. J. Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships. PLoS Comput. Biol. 2021;17(2):e1008724. doi: 10.1371/journal.pcbi.1008724.
- Huber F., van der Burg S., van der Hooft J. J. J., Ridder L. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. J. Cheminf. 2021;13(1):84. doi: 10.1186/s13321-021-00558-4.
- Liebal U. W., Phan A. N., Sudhakar M., Raman K., Blank L. M. Machine learning applications for mass spectrometry-based metabolomics. Metabolites. 2020;10(6):243. doi: 10.3390/metabo10060243.
- Tsugawa H., Cajka T., Kind T., Ma Y., Higgins B., Ikeda K., Kanazawa M., VanderGheynst J., Fiehn O., Arita M. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods. 2015;12(6):523–526. doi: 10.1038/nmeth.3393.
- Schmid R., Heuckeroth S., Korf A., Smirnov A., Myers O., Dyrlund T. S., Bushuiev R., Murray K. J., Hoffmann N., Lu M., Sarvepalli A., Zhang Z., Fleischauer M., Dührkop K., Wesner M., Hoogstra S. J., Rudt E., Mokshyna O., Brungs C., Ponomarov K., Mutabdžija L., Damiani T., Pudney C. J., Earll M., Helmer P. O., Fallon T. R., Schulze T., Rivas-Ubach A., Bilbao A., Richter H., Nothias L. F., Wang M., Orešič M., Weng J. K., Böcker S., Jeibmann A., Hayen H., Karst U., Dorrestein P. C., Petras D., Du X., Pluskal T. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol. 2023;41(4):447–449. doi: 10.1038/s41587-023-01690-2.
- Smith C. A., Want E. J., O’Maille G., Abagyan R., Siuzdak G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 2006;78(3):779–787. doi: 10.1021/ac051437y.
- Li Y., Kind T., Folz J., Vaniya A., Mehta S. S., Fiehn O. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat. Methods. 2021;18(12):1524–1531. doi: 10.1038/s41592-021-01331-z.
- Scheubert K., Hufsky F., Petras D., Wang M., Nothias L. F., Dührkop K., Bandeira N., Dorrestein P. C., Böcker S. Significance estimation for large scale metabolomics annotations by spectral matching. Nat. Commun. 2017;8(1):1494. doi: 10.1038/s41467-017-01318-5.
- Mikolov T., Sutskever I., Chen K., Corrado G. S., Dean J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013;26:3111–3119.
- de Jonge N. F., Louwen J. J. R., Chekmeneva E., Camuzeaux S., Vermeir F. J., Jansen R. S., Huber F., van der Hooft J. J. J. MS2Query: reliable and scalable MS2 mass spectra-based analogue search. Nat. Commun. 2023;14(1):1752. doi: 10.1038/s41467-023-37446-4.
- Bittremieux W., May D. H., Bilmes J., Noble W. S. A learned embedding for efficient joint analysis of millions of mass spectra. Nat. Methods. 2022;19(6):675–678. doi: 10.1038/s41592-022-01496-1.
- Guo H., Xue K., Sun H., Jiang W., Pu S. Contrastive learning-based embedder for the representation of tandem mass spectra. Anal. Chem. 2023;95(20):7888–7896. doi: 10.1021/acs.analchem.3c00260.
- Vaswani A. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017;30:5998–6008.
- Wang M., Wang J., Carver J., Pullman B. S., Cha S. W., Bandeira N. Assembling the Community-Scale Discoverable Human Proteome. Cell Syst. 2018;7(4):412–421. doi: 10.1016/j.cels.2018.08.004.
- Irwin J. J., Sterling T., Mysinger M. M., Bolstad E. S., Coleman R. G. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012;52(7):1757–1768. doi: 10.1021/ci3001277.
- Rasche F., Svatos A., Maddula R. K., Böttcher C., Böcker S. Computing fragmentation trees from tandem mass spectrometry data. Anal. Chem. 2011;83(4):1243–1251. doi: 10.1021/ac101825k.
- Böcker S., Rasche F. Towards de novo identification of metabolites by analyzing tandem mass spectra. Bioinformatics. 2008;24(16):i49–i55. doi: 10.1093/bioinformatics/btn270.
- Böcker S., Letzel M. C., Lipták Z., Pervukhin A. SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics. 2009;25(2):218–224. doi: 10.1093/bioinformatics/btn603.
- Heinonen M., Shen H., Zamboni N., Rousu J. Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics. 2012;28(18):2333–2341. doi: 10.1093/bioinformatics/bts437.
- Brouard C., Shen H., Dührkop K., d’Alché-Buc F., Böcker S., Rousu J. Fast metabolite identification with input output kernel regression. Bioinformatics. 2016;32(12):i28–i36. doi: 10.1093/bioinformatics/btw246.
- Wang X., Yu N., Jiao Z., Li L., Yu H., Wei S. Machine learning–enhanced molecular network reveals global exposure to hundreds of unknown PFAS. Sci. Adv. 2024;10(21):eadn1039. doi: 10.1126/sciadv.adn1039.
- Nguyen D. H., Nguyen C. H., Mamitsuka H.. ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra. Bioinformatics. 2019;35(14):i164–i172. doi: 10.1093/bioinformatics/btz319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goldman S., Wohlwend J., Stražar M., Haroush G., Xavier R. J., Coley C. W.. Annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nat. Mach. Intell. 2023;5(9):965–979. doi: 10.1038/s42256-023-00708-3. [DOI] [Google Scholar]
- Ridder L., van der Hooft J. J., Verhoeven S.. Automatic compound annotation from mass spectrometry data using MAGMa. Mass Spectrom. 2014;3(S2):S0033–S0033. doi: 10.5702/massspectrometry.S0033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fan Z., Alley A., Ghaffari K., Ressom H. W.. MetFID: artificial neural network-based compound fingerprint prediction for metabolite annotation. Metabolomics. 2020;16(10):104. doi: 10.1007/s11306-020-01726-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hill D. W., Kertesz T. M., Fontaine D., Friedman R., Grant D. F.. Mass spectral metabonomics beyond elemental formula: chemical database querying by matching experimental with computational fragmentation spectra. Anal. Chem. 2008;80(14):5574–5582. doi: 10.1021/ac800548g. [DOI] [PubMed] [Google Scholar]
- Hill A. W., Mortishire-Smith R. J.. Automated assignment of high-resolution collisionally activated dissociation mass spectra using a systematic bond disconnection approach. Rapid Commun. Mass Spectrom. 2005;19(21):3111–3118. doi: 10.1002/rcm.2177. [DOI] [Google Scholar]
- Wolf S., Schmidt S., Müller-Hannemann M., Neumann S.. In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinf. 2010;11:148. doi: 10.1186/1471-2105-11-148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ruttkies C., Schymanski E. L., Wolf S., Hollender J., Neumann S.. MetFrag relaunched: incorporating strategies beyond in silico fragmentation. J. Cheminf. 2016;8:3. doi: 10.1186/s13321-016-0115-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ridder L., van der Hooft J. J., Verhoeven S., de Vos R. C., van Schaik R., Vervoort J.. Substructure-based annotation of high-resolution multistage MSn spectral trees. Rapid Commun. Mass Spectrom. 2012;26(20):2461–2471. doi: 10.1002/rcm.6364. [DOI] [PubMed] [Google Scholar]
- Kind T., Liu K.-H., Lee D. Y., DeFelice B., Meissen J. K., Fiehn O.. LipidBlast in silico tandem mass spectrometry database for lipid identification. Nat. Methods. 2013;10(8):755–758. doi: 10.1038/nmeth.2551. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Allen F., Greiner R., Wishart D.. Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification. Metabolomics. 2015;11:98–110. doi: 10.1007/s11306-014-0676-4. [DOI] [Google Scholar]
- Allen F., Pon A., Wilson M., Greiner R., Wishart D.. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra. Nucleic Acids Res. 2014;42(W1):W94–W99. doi: 10.1093/nar/gku436. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wei J. N., Belanger D., Adams R. P., Sculley D.. Rapid prediction of electron–ionization mass spectrometry using neural networks. ACS Cent. Sci. 2019;5(4):700–708. doi: 10.1021/acscentsci.9b00085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhu R. L., Jonas E.. Rapid Approximate Subset-Based Spectra Prediction for Electron Ionization-Mass Spectrometry. Anal. Chem. 2023;95(5):2653–2663. doi: 10.1021/acs.analchem.2c02093. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li X., Zhou Chen Y., Kalia A., Zhu H., Liu L. p., Hassoun S., Wren J.. An Ensemble Spectral Prediction (ESP) model for metabolite annotation. Bioinformatics. 2024;40(8):btae490. doi: 10.1093/bioinformatics/btae490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goldman S., Li J., Coley C. W.. Generating molecular fragmentation graphs with autoregressive neural networks. Anal. Chem. 2024;96(8):3419–3428. doi: 10.1021/acs.analchem.3c04654. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hong Y., Li S., Welch C. J., Tichy S., Ye Y., Tang H., Elofsson A.. 3DMolMS: prediction of tandem mass spectra from 3D molecular conformations. Bioinformatics. 2023;39(6):btad354. doi: 10.1093/bioinformatics/btad354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cao L., Guler M., Tagirdzhanov A., Lee Y. Y., Gurevich A., Mohimani H.. MolDiscovery: Learning mass spectrometry fragmentation of small molecules. Nat. Commun. 2021;12(1):3718. doi: 10.1038/s41467-021-23986-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Young A., Röst H., Wang B.. Tandem mass spectrum prediction for small molecules using graph transformers. Nat. Mach. Intell. 2024;6(4):404–416. doi: 10.1038/s42256-024-00816-8. [DOI] [Google Scholar]
- Yang X., Neta P., Stein S. E.. Extending a tandem mass spectral library to include MS2 spectra of fragment ions produced in-source and MSn spectra. J. Am. Soc. Mass Spectrom. 2017;28(11):2280–2287. doi: 10.1007/s13361-017-1748-2. [DOI] [PubMed] [Google Scholar]
- Kurwadkar S., Dane J., Kanel S. R., Nadagouda M. N., Cawdrey R. W., Ambade B., Struckhoff G. C., Wilkin R.. Per-and polyfluoroalkyl substances in water and wastewater: A critical review of their global occurrence and distribution. Sci. Total Environ. 2022;809:151003. doi: 10.1016/j.scitotenv.2021.151003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Y., Munir U., Huang Q.. Occurrence of per-and polyfluoroalkyl substances (PFAS) in soil: Sources, fate, and remediation. Soil Environ. Health. 2023;1(1):100004. doi: 10.1016/j.seh.2023.100004. [DOI] [Google Scholar]
- Lin H., Taniyasu S., Yamazaki E., Wei S., Wang X., Gai N., Kim J. H., Eun H., Lam P. K., Yamashita N.. Per-and polyfluoroalkyl substances in the air particles of Asia: levels, seasonality, and size-dependent distribution. Environ. Sci. Technol. 2020;54(22):14182–14191. doi: 10.1021/acs.est.0c03387. [DOI] [PubMed] [Google Scholar]
- Hochreiter S., Schmidhuber J.. Long Short-term Memory. Neural Comput. 1997;9(8):1735–1780. doi: 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]
- Yang Y., Sun S., Yang S., Yang Q., Lu X., Wang X., Yu Q., Huo X., Qian X.. Structural annotation of unknown molecules in a miniaturized mass spectrometer based on a transformer enabled fragment tree method. Commun. Chem. 2024;7(1):109. doi: 10.1038/s42004-024-01189-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sorokina M., Steinbeck C.. Review on natural products databases: where to find data in 2020. J. Cheminf. 2020;12(1):20. doi: 10.1186/s13321-020-00424-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Richard A. M., Williams C. R.. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat. Res. Fundam. Mol. Mech. Mutagen. 2002;499(1):27–52. doi: 10.1016/S0027-5107(01)00289-5. [DOI] [PubMed] [Google Scholar]
- Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., Bridgland A., Meyer C., Kohl S. A. A., Ballard A. J., Cowie A., Romera-Paredes B., Nikolov S., Jain R., Adler J., Back T., Petersen S., Reiman D., Clancy E., Zielinski M., Steinegger M., Pacholska M., Berghammer T., Bodenstein S., Silver D., Vinyals O., Senior A. W., Kavukcuoglu K., Kohli P., Hassabis D.. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wandy J., Zhu Y., van der Hooft J. J. J., Daly R., Barrett M. P., Rogers S., Stegle O.. Ms2lda. org: web-based topic modelling for substructure discovery in mass spectrometry. Bioinformatics. 2018;34(2):317–318. doi: 10.1093/bioinformatics/btx582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tripathi A., Vázquez-Baeza Y., Gauglitz J. M., Wang M., Dührkop K., Nothias-Esposito M., Acharya D. D., Ernst M., van der Hooft J. J. J., Zhu Q.. et al. Chemically informed analyses of metabolomics mass spectrometry data with Qemistree. Nat. Chem. Biol. 2021;17(2):146–151. doi: 10.1038/s41589-020-00677-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dührkop K., Nothias L. F., Fleischauer M., Reher R., Ludwig M., Hoffmann M. A., Petras D., Gerwick W. H., Rousu J., Dorrestein P. C., Böcker S.. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 2021;39(4):462–471. doi: 10.1038/s41587-020-0740-8. [DOI] [PubMed] [Google Scholar]
- Djoumbou Feunang Y., Eisner R., Knox C., Chepelev L., Hastings J., Owen G., Fahy E., Steinbeck C., Subramanian S., Bolton E., Greiner R., Wishart D. S.. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminf. 2016;8(1):61. doi: 10.1186/s13321-016-0174-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao T., Shen Q., Li X.-F., Huan T.. IodoFinder: Machine Learning-Guided Recognition of Iodinated Chemicals in Nontargeted LC-MS/MS Analysis. Environ. Sci. Technol. 2025;59(9):4530–4539. doi: 10.1021/acs.est.4c12698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen L., Lu W., Wang L., Xing X., Chen Z., Teng X., Zeng X., Muscarella A. D., Shen Y., Cowan A., McReynolds M. R., Kennedy B. J., Lato A. M., Campagna S. R., Singh M., Rabinowitz J. D.. Metabolite discovery through global annotation of untargeted metabolomics data. Nat. Methods. 2021;18(11):1377–1385. doi: 10.1038/s41592-021-01303-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shen X., Wang R., Xiong X., Yin Y., Cai Y., Ma Z., Liu N., Zhu Z. J.. Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics. Nat. Commun. 2019;10(1):1516. doi: 10.1038/s41467-019-09550-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kretschmer F., Harrieder E.-M., Hoffmann M. A., Böcker S., Witting M.. RepoRT: a comprehensive repository for small molecule retention times. Nat. Methods. 2024;21(2):153–155. doi: 10.1038/s41592-023-02143-z. [DOI] [PubMed] [Google Scholar]
- Domingo-Almenara X., Guijas C., Billings E., Montenegro-Burke J. R., Uritboonthai W., Aisporna A. E., Chen E., Benton H. P., Siuzdak G.. The METLIN small molecule dataset for machine learning-based retention time prediction. Nat. Commun. 2019;10(1):5811. doi: 10.1038/s41467-019-13680-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McEachran A. D., Balabin I., Cathey T., Transue T. R., Al-Ghoul H., Grulke C., Sobus J. R., Williams A. J.. Linking in silico MS/MS spectra with chemistry data to improve identification of unknowns. Sci. Data. 2019;6(1):141. doi: 10.1038/s41597-019-0145-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wishart D. S., Guo A., Oler E., Wang F., Anjum A., Peters H., Dizon R., Sayeeda Z., Tian S., Lee B. L., Berjanskii M., Mah R., Yamamoto M., Jovel J., Torres-Calzada C., Hiebert-Giesbrecht M., Lui V. W., Varshavi D., Varshavi D., Allen D., Arndt D., Khetarpal N., Sivakumaran A., Harford K., Sanford S., Yee K., Cao X., Budinski Z., Liigand J., Zhang L., Zheng J., Mandal R., Karu N., Dambrova M., Schiöth H. B., Greiner R., Gautam V.. HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Res. 2022;50(D1):D622–D631. doi: 10.1093/nar/gkab1062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wishart D. S., Feunang Y. D., Guo A. C., Lo E. J., Marcu A., Grant J. R., Sajed T., Johnson D., Li C., Sayeeda Z., Assempour N., Iynkkaran I., Liu Y., Maciejewski A., Gale N., Wilson A., Chin L., Cummings R., Le D., Pon A., Knox C., Wilson M.. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D1082. doi: 10.1093/nar/gkx1037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ramirez-Gaona M., Marcu A., Pon A., Guo A. C., Sajed T., Wishart N. A., Karu N., Djoumbou Feunang Y., Arndt D., Wishart D. S.. YMDB 2.0: a significantly expanded version of the yeast metabolome database. Nucleic Acids Res. 2017;45(D1):D440–D445. doi: 10.1093/nar/gkw1058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li T., Su W., Zhong L., Liang W., Feng X., Zhu B., Ruan T., Jiang G.. An Integrated Workflow Assisted by In Silico Predictions To Expand the List of Priority Polycyclic Aromatic Compounds. Environ. Sci. Technol. 2023;57(49):20854–20863. doi: 10.1021/acs.est.3c07087. [DOI] [PubMed] [Google Scholar]
- Yu N., Guo H., Yang J., Jin L., Wang X., Shi W., Zhang X., Yu H., Wei S.. Non-Target and Suspect Screening of Per- and Polyfluoroalkyl Substances in Airborne Particulate Matter in China. Environ. Sci. Technol. 2018;52(15):8205–8214. doi: 10.1021/acs.est.8b02492. [DOI] [PubMed] [Google Scholar]
- Wang F., Pasin D., Skinnider M. A., Liigand J., Kleis J.-N., Brown D., Oler E., Sajed T., Gautam V., Harrison S.. et al. Deep Learning-Enabled MS/MS Spectrum Prediction Facilitates Automated Identification Of Novel Psychoactive Substances. Anal. Chem. 2023;95(50):18326–18334. doi: 10.1021/acs.analchem.3c02413. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qian Y., Ke Y., Wang L., Yu N., He Y., Yu Q., Wei S., Ren H., Geng J.. Entropy Similarity-Driven Transformation Reaction Molecular Networking Reveals Transformation Pathways and Potential Risks of Emerging Contaminants in Wastewater: The Example of Sartans. Environ. Sci. Technol. 2025;59:4153. doi: 10.1021/acs.est.4c13144. [DOI] [PubMed] [Google Scholar]
- Meyer C., Stravs M. A., Hollender J.. How Wastewater Reflects Human MetabolismSuspect Screening of Pharmaceutical Metabolites in Wastewater Influent. Environ. Sci. Technol. 2024;58(22):9828–9839. doi: 10.1021/acs.est.4c00968. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zwerger M. J., Hammerle F., Siewert B., Ganzera M.. Application of feature-based molecular networking in the field of algal research with special focus on mycosporine-like amino acids. J. Appl. Phycol. 2023;35(3):1377–1392. doi: 10.1007/s10811-023-02906-3. [DOI] [Google Scholar]




