PLOS One. 2025 Aug 13;20(8):e0324668. doi: 10.1371/journal.pone.0324668

An automated software-assisted approach for exploring metabolic susceptibility and degradation products in macromolecules using high-resolution mass spectrometry

Paula Cifuentes 1,2,3,*,#, Ismael Zamora 3,#, Tatiana Radchenko 2,#, Fabien Fontaine 3,#, Albert Garriga 3,#, Luca Morettoni 4,#, Jesper Kammersgaard Christensen 5, Hans Helleberg 5, Bridget A Becker 6
Editor: Yash Gupta 7
PMCID: PMC12349704  PMID: 40802852

Abstract

A comprehensive understanding of drug metabolism is crucial for advancements in drug development. Automation has improved various stages of this process, from compound procurement to data analysis, but significant challenges persist in the metabolite identification (MetID) of macromolecules due to their size, structural complexity, and associated computational demands. This study introduces new algorithms for automated Liquid Chromatography-High-Resolution Mass Spectrometry (LC-HRMS) data analysis applicable to macromolecules. A novel peak detection approach based on the most abundant mass (MaM) is presented and systematically compared with the monoisotopic mass (MiM) approach commonly used in small-molecule MetID. Additionally, three structure visualization strategies (expanded, or atom-level; non-expanded, or monomer-level; and a hybrid mode) are evaluated for their impact on data processing time and interpretability, based on their distinct fragmentation strategies. The workflow was validated using six diverse datasets, comprising linear and cyclic peptides and oligonucleotides with both natural and unnatural monomers, covering a molecular weight range of 700–7630 Da. A total of 970 metabolites were identified under various experimental and ionization conditions. The MaM algorithm yielded higher scores and a greater number of matches, giving greater confidence in the predicted metabolite structures, while the non-expanded visualization significantly reduced processing times (ranging from minutes to under an hour for most peptides). Furthermore, the visualization algorithm, which integrates monomer-level and atom/bond notation, enables clear localization of metabolic biotransformations. Compared to previous studies, the proposed workflow demonstrated reduced processing time, consistent detection of degradation products, and enhanced visualization capabilities, advancing automated MetID for macromolecules.

Introduction

An essential aspect of the drug development process is the comprehensive identification and characterization of the major metabolites of the drug candidate and the enzymes responsible for its metabolic transformation, commonly known as drug metabolism. These studies are crucial, as certain metabolites may exhibit superior potency or improved pharmacokinetic properties compared to the parent drugs, thereby enhancing therapeutic efficacy [1]. Conversely, some metabolites may be toxic or chemically reactive, potentially interfering with the metabolism of co-administered drugs and increasing the risk of drug-drug interactions [2, 3].

Therefore, MetID plays a vital role not only in guiding chemical modifications to improve metabolic stability and reduce toxicity, but also in informing clinical monitoring strategies and supporting personalized medicine approaches that aim to prevent adverse drug reactions. Collectively, these efforts are essential to the development of safe and effective therapeutic agents [4].

In recent years, the use of macromolecules such as peptides and oligonucleotides as therapeutic agents has grown rapidly in drug development, making MetID for these compounds increasingly important [5,6]. However, MetID for macromolecules presents greater challenges than for small molecules, especially in data analysis and result interpretation. The large size and structural complexity of such compounds, which often consist of hundreds of atoms, lead to an exponential increase in spectral signals that must be interpreted, a larger number of fragments to compute and compare, and difficulty in determining the specific location where the biotransformation has occurred [7]. Consequently, this complexity demands significantly more software processing time and memory.

Although several tools have been developed for automated MetID, these are primarily designed for small molecules and often struggle to process multiply charged states, which are prevalent in large biomolecules. Some specialized approaches have been developed to address these challenges, but they frequently suffer from high false-positive rates in complex biological matrices and may offer limited support for various ionization modes, as highlighted in previous research [7].

In our previous publications [8, 9] we developed software solutions focused on automating data analysis, primarily for small molecules, with some applicability to macromolecules. These tools have helped to speed up both the data processing step and the review and visualization of results, as they perform the following steps automatically: selecting the chromatographic peaks that are related to the compound of interest; finding the mass spectral information for each extracted peak; assigning potential structures by comparing the predicted theoretical fragmentation with the actual mass-to-charge ratio (m/z) values obtained from the experimental spectra; and scoring potential solutions according to the fragments assigned to the spectra, either alone or by comparison with the parent fragmentation. After clustering the results from different experimental conditions and consolidating them into a single experimental entity, the results are stored in the database. Subsequently, upon the conclusion of the review process, a report is generated.

The primary aim of this article is to present novel algorithms and approaches for automated LC-HRMS data analysis that specifically address the challenges of MetID in macromolecules. One of the new approaches introduced in the automated workflow is a peak detection algorithm based on the MaM peak, an approach that, to our knowledge, has not been previously reported. To demonstrate its suitability for MetID of macromolecules, the algorithm is compared with the traditional MiM peak detection method, which has long been used in small-molecule MetID studies.

In addition, two visualization strategies for macromolecules are presented. In the expanded form, all atoms and intermonomer bonds are shown, whereas in the non-expanded form, the structure is represented by linking the monomer acronyms. These visualizations have direct implications for the computational process: in the non-expanded form, the structure is not subjected to virtual atom-level metabolite generation. Instead, biotransformations are applied at the monomer level, which reduces the number of potential fragments generated and leads to decreased processing time and memory consumption. The non-expanded approach is compared to the expanded one, also demonstrating how this representation can facilitate the identification of biotransformation sites.

These proposed approaches are integrated into a workflow that enables the interpretation of data acquired under diverse experimental conditions and ionization modes. To validate the applicability of this workflow, analysis was conducted on six datasets spanning a molecular weight range from 700 to 7630 Da. These datasets consist of both linear and cyclic peptides, incorporating natural and unnatural amino acids, as well as oligonucleotides. Specifically, dataset-1 comprises 9 commercially available peptides, dataset-2 includes one commercially available peptide and 4 synthetic analogues, dataset-3 involves a natural peptide hormone and 7 synthetic analogues, dataset-4 features an antisense oligonucleotide, dataset-5 contains 25 commercially available peptides, and dataset-6 is composed of a peptide hormone. Covering macromolecules of varying sizes and structural types (linear, cyclic, and containing non-standard monomers), these datasets demonstrate that the proposed methodology can be applied across a wide compound applicability domain.

Comparisons of the results obtained for certain compounds with those of prior studies have enabled an evaluation of several factors, such as the number and structure of identified metabolites, along with a consideration of the time consumed during the data processing step.

Materials and methods

Experimental data

For this study, six different experimental data sets (linear/cyclic, natural/unnatural amino acids, and an oligonucleotide dataset) have been used for the MetID, as shown in Table 1. The proteases and biological matrices used in the experimental incubations of these datasets represent key relevant proteolytic environments that therapeutic peptides are likely to encounter in vivo. This includes enzymes involved in gastrointestinal metabolism—where peptide hydrolysis primarily occurs—such as trypsin, chymotrypsin, elastase, and pepsin. The other proteases and matrices reflect metabolism in the liver, blood, and other physiological contexts, ensuring coverage of a broad range of relevant peptide degradation pathways [10].

Table 1. Summary of the number of compounds of each dataset, along with the molecular weight range of the compounds and the corresponding data acquisition mode. (DDA = data-dependent acquisition, DIA = data-independent acquisition).

Dataset | Number of compounds | Molecular weight range (Da) | Data acquisition mode | Incubation conditions
Dataset-1 | 9 | 1282–3429 | DDA | Trypsin, Chymotrypsin, Pancreatic Elastase, and Pepsin
Dataset-2 | 5 | 3298–4184 | DDA | Dipeptidyl peptidase-4 (DPP-4) and neutral endopeptidase (NEP)
Dataset-3 | 8 | 1637–1679 | DDA and DIA | Human Serum
Dataset-4 | 1 | 7633 | DDA | Human Liver
Dataset-5 | 25 | 708–1900 | DIA | Human Cathepsin G, Human Neutrophil Elastase, Human MMP-12 catalytic domain, and Bovine pancreatic trypsin
Dataset-6 | 1 | 5808 | DIA | Insulin-degrading enzyme (IDE)

The first set (dataset-1) is composed of nine commercially available peptides (secretin, calcitonin, oxytocin, octreotide, deslorelin, histrelin, goserelin, buserelin, and leuprolide), each of which was incubated separately with four selected protease enzymes: trypsin, chymotrypsin, pancreatic elastase, and pepsin. Data acquisition was performed using a Thermo Orbitrap® instrument in full scan mode with data-dependent tandem mass spectrometry (MS/MS). The detailed experimental conditions for this dataset are documented in the referenced bibliography [11]. Three of the compounds are cyclic peptides (octreotide, oxytocin, and calcitonin) and five contain unnatural amino acids (secretin, calcitonin, octreotide, deslorelin, and histrelin). Molecular weight ranges from 1282 to 3429 Da, as illustrated in Table 2.

Table 2. Dataset-1: sequence structures and molecular weights.

Compound name Molecular weight (Da) Sequence Structure
Deslorelin 1282.45 H-Pyr-His-Trp-Ser-Tyr-D-Trp-Leu-Arg-Pro-NHEt Linear
Goserelin 1269.41 Glp-His-Trp-Ser-Tyr-Ser-tBu-Leu-Arg-Pro-NHNHCONH2 Linear
Buserelin 1238.66 Glp-His-Trp-Ser-Tyr-Ser-tBu-Leu-Arg-Pro-NHEt Linear
Histrelin 1323.5 Glp-His-Trp-Ser-Tyr-HisBzl-Leu-Arg-Pro-NHEt Linear
Leuprolide 1209.4 Glp-His-Trp-Ser-Tyr-D-Leu-Leu-Arg-Pro-NHEt Linear
Secretin Human 3039.41 H-His-Ser-Asp-Gly-Thr-Phe-Thr-Ser-Glu-Leu-Ser-Arg-Leu-Arg-Glu-Gly-Ala-Arg-Leu-Gln-Arg-Leu-Leu-Gln-Gly-Leu-Val-NH2 Linear
Octreotide 1019.24 H-D-Phe-Cys (1)-Phe-D-Trp-Lys-Thr-Cys (1)-Thr-ol Cyclic
Oxytocin 1007.19 H-Cys (1)-Tyr-Ile-Gln-Asn-Cys (1)-Pro-Leu-Gly-NH2 Cyclic
Calcitonin 3429.71 H-Cys (1)-Ser-Asn-Leu-Ser-Thr-Cys (1)-Val-Leu-Gly-Lys-Leu-Ser-Gln-Glu-Leu-His-Lys-Leu-Gln-Thr-Tyr-Pro-Arg-Thr-Asn-Thr-Gly-Ser-Gly-Thr-Pro-NH2 Cyclic

Dataset-2 consists of the commercially available glucagon-like peptide-1 (GLP-1), a 30-amino-acid compound, and four synthetic analogues designed to have reduced susceptibility to enzymatic degradation: taspoglutide, exenatide, liraglutide, and semaglutide. All are linear peptides. MetID was conducted in the presence of DPP-4 and NEP, as both enzymes are known to be involved in native GLP-1 degradation. Data acquisition employed a Thermo Orbitrap® instrument operating in full scan mode with data-dependent MS/MS, as detailed in the cited reference [11]. The exception is semaglutide, which was incubated in dog plasma, with the two metabolites first synthesized and then spiked into the plasma; these data were collected using a Waters® ACQUITY® Ultra-Performance Liquid Chromatography system with a Vion Ion Mobility Spectrometry Quadrupole Time-of-Flight (IMS-QToF) Mass Spectrometer operated by UNIFI in data-independent mode, in collaboration with Zealand Pharma. Taspoglutide contains non-natural amino acids, and liraglutide has a C-16 fatty acid side chain (palmitic acid). Molecular weights range from 3297 to 4184 Da, as presented in Table 3, with exenatide being the largest.

Table 3. Dataset-2: sequence structures and molecular weights of GLP-1 and its analogues.

Compound name Molecular weight (Da) Sequence Structure
GLP-1 3297.68 H2N-His-Ala-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Lys-Gly-Arg-Gly-OH Linear
Liraglutide 3751.20 H-His-Ala-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys(γ-Glu-palmitoyl)-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-OH Linear
Taspoglutide 3338.71 H-His-Aib-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Lys-Aib-Arg-NH2 Linear
Semaglutide 4113.58 H-His-Aib-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys(γ-Glu-ADO-C18 di-acid)-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-OH Linear
Exenatide 4184.03 H-His-Gly-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Leu-Ser-Lys-Gln-Met-Glu-Glu-Glu-Ala-Val-Arg-Leu-Phe-Ile-Glu-Trp-Leu-Lys-Asn-Gly-Gly-Pro-Ser-Ser-Gly-Ala-Pro-Pro-Pro-Ser-NH2 Linear

Dataset-3 includes somatostatin, a natural growth-inhibiting peptide hormone, along with seven 14-amino-acid cyclic analogues. Data were collected in two acquisition modes: the first on a Thermo Q-Exactive® instrument employing full scan mode with data-dependent MS/MS, and the second as High Definition MSE (HDMSE) data on a Vion IMS QTof Mass Spectrometer. The detailed experimental conditions for this dataset are documented in the referenced bibliography [11, 12]. In the synthesis of these analogues, a common approach is employed, which entails substituting some of the natural amino acids with non-natural or modified ones (Fig 1). Notably, these analogues feature the substitution of Phe (7) by Msa, which enhances rigidity due to the ortho substitution, and of Trp (8) by D-Trp [13]. Additionally, various permutations involve substituting Ala (1), Cys (3), and Cys (14) with their D-amino acid equivalents, along with the substitution of Lys (4) by ornithine [13]. Molecular weight ranges from 1636 to 1678 Da (Table 4). Given the inherently low stability of somatostatin, a critical consideration for its pharmaceutical utility, there is great interest in evaluating whether these novel analogues (Table 4) exhibit prolonged lifetimes in human serum.

Fig 1. Structure of somatostatin and its seven modified analogues including unnatural amino acids.

All eight peptides exhibit a cyclic structure, closing through the disulfide bond (between monomer 3 and 14).

Table 4. Dataset-3 is composed of somatostatin and its seven modified analogues, with the corresponding molecular formulas and molecular weights.

Compound name Molecular formula Monoisotopic mass (Da) Structure
Somatostatin C76H104N18O19S2 1636.7167 Cyclic
Analogue 6 C79H110N18O19S2 1678.7636 Cyclic
Analogue 30 C78H108N18O19S2 1664.7480 Cyclic
Analogue 31 C79H110N18O19S2 1678.7636 Cyclic
Analogue 35 C78H108N18O19S2 1664.7480 Cyclic
Analogue 64 C79H110N18O19S2 1678.7636 Cyclic
Analogue 65 C79H110N18O19S2 1678.7636 Cyclic
Analogue 95 C79H110N18O19S2 1678.7636 Cyclic

Dataset-4 includes an antisense oligonucleotide (ASO) with the formula C242H307N91O150P94 (molecular weight of 7633 Da) containing 25 monomers. ASOs are synthetic, small-sized single-stranded nucleic acids. Data were collected using a Thermo Orbitrap® instrument in DDA mode. This dataset pertains to the incubation of the ASO in human liver tissue, a commonly studied experimental condition [14]. It enables researchers to evaluate the efficacy and selectivity of ASOs in targeting specific messenger RNA molecules within the complex environment of the liver.

In this study, dataset-5 comprises a collection of 25 structurally diverse linear and cyclic peptides, with molecular weights ranging from 708 to 1900 Da (atosiban, BIO-11006, BIO-1211, carbetocin, CSP7, deslorelin, desmopressin, felypressin, gonadorelin, iseganan, lanreotide, LDTRYLEQLHKLY, leuprolide, lypressin, M10 peptide, MMI-0100, NAS-911, octreotide, peptide T, salmon calcitonin, somatostatin, SPX-101, triptorelin, vasopressin, and vapreotide), as depicted in Table 5. These compounds were incubated with four pulmonary proteases (human cathepsin G, human neutrophil elastase, human MMP-12 catalytic domain, and bovine pancreatic trypsin). Data are unavailable for the bovine pancreatic trypsin incubation of felypressin, iseganan, LDTRYLEQLHKLY, lypressin, MMI-0100, and vasopressin, and data are likewise unavailable for the human cathepsin G incubation of atosiban, lanreotide, and leuprolide. Data acquisition was performed using a Waters® Q-TOF instrument in data-independent mode. The data were originally used to develop an assay workflow aimed at guiding the initial chemical modifications of peptide hits in early respiratory drug discovery projects. The detailed experimental conditions for this dataset are documented in the referenced bibliography [15]. That workflow uses WebMetabase to detect and elucidate the structures of metabolites formed through enzymatic proteolysis. In this study, the data are used to compare the results obtained with the new approach against the earlier ones, and to show the reduction in data processing time achieved by this workflow.

Table 5. Dataset-5, composed of 25 peptides, with the corresponding sequence structures and molecular weights.

Compound name Molecular weight (Da) Sequence Structure
BIO-1211 708.8 4-[(2-tolyl)-urea]-phenylacetyl-Leu-Asp-Val-Pro-OH Linear
CSP7 815.92 H-Phe-Thr-Thr-Phe-Thr-Val-Thr-OH Linear
Peptide T 857.87 H-Ala-Ser-Thr-Thr-Thr-Asn-Tyr-Thr-OH Linear
BIO-11006 1050.18 Ac-Gly-Ala-Gln-Phe-Ser-Lys-Thr-Ala-Ala-Lys-OH Linear
SPX-101 1179.38 H-D-Ala-D-Ala-Leu-Pro-Ile-Pro-Leu-Asp-Glu-Thr-D-Ala-D-Ala-OH Linear
M10 peptide 1181.27 H-Thr-Arg-Pro-Ala-Ser-Phe-Trp-Glu-Thr-Ser-OH Linear
Gonadorelin 1182.31 H-Pyr-His-Trp-Ser-Tyr-Gly-Leu-Arg-Pro-Gly-NH2 Linear
Leuprolide 1209.42 H-Pyr-His-Trp-Ser-Tyr-D-Leu-Leu-Arg-Pro-NHEt Linear
Deslorelin 1282.48 H-Pyr-His-Trp-Ser-Tyr-D-Trp-Leu-Arg-Pro-NHEt Linear
Triptorelin 1311.47 H-Pyr-His-Trp-Ser-Tyr-D-Trp-Leu-Arg-Pro-Gly-OH Linear
NAS-911 1393.68 H-Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Sar-Leu-Met(O2)-NH2 Linear
LDTRYLEQLHKLY 1691.95 H-Leu-Asp-Thr-Arg-Tyr-Leu-Glu-Gln-Leu-His-Lys-Leu-Tyr-OH Linear
MMI-0100 2283.68 H-Tyr-Ala-Arg-Ala-Ala-Ala-Arg-Gln-Ala-Arg-Ala-Lys-Ala-Leu-Ala-Arg-Gln-Leu-Gly-Val-Ala-Ala-OH Linear
Salmon Calcitonin 3431.89 H-Cys (1)-Ser-Asn-Leu-Ser-Thr-Cys (1)-Val-Leu-Gly-Lys-Leu-Ser-Gln-Glu-Leu-His-Lys-Leu-Gln-Thr-Tyr-Pro-Arg-Thr-Asn-Thr-Gly-Ser-Gly-Thr-Pro-NH2 Linear
Carbetocin 988.17 deamino-Cys (1)-Tyr(Me)-Ile-Gln-Asn-Cys (1)-Pro-Leu-Gly-NH2 Cyclic
Atosiban 994.19 deamino-Cys (1)-D-Tyr(Et)-Ile-Thr-Asn-Cys (1)-Pro-Orn-Gly-NH2 Cyclic
Octreotide 1019.25 H-D-Phe-Cys (1)-Phe-D-Trp-Lys-Thr-Cys (1)-Thr-ol Cyclic
Felypressin 1040.23 H-Cys (1)-Phe-Phe-Gln-Asn-Cys (1)-Pro-Lys-Gly-NH2 Cyclic
Lypressin 1056.23 H-Cys (1)-Tyr-Phe-Gln-Asn-Cys (1)-Pro-Lys-Gly-NH2 Cyclic
Desmopressin 1069.22 deamino-Cys (1)-Tyr-Phe-Gln-Asn-Cys (1)-Pro-D-Arg-Gly-NH2 Cyclic
Vasopressin 1084.24 H-Cys (1)-Tyr-Phe-Gln-Asn-Cys (1)-Pro-Arg-Gly-NH2 Cyclic
Lanreotide 1096.33 H-D-2Nal-Cys (1)-Tyr-D-Trp-Lys-Val-Cys (1)-Thr-NH2 Cyclic
Vapreotide 1131.38 H-D-Phe-Cys (1)-Tyr-D-Trp-Lys-Val-Cys (1)-Trp-NH2 Cyclic
Somatostatin 1637.88 H-Ala-Gly-Cys (1)-Lys-Asn-Phe-Phe-Trp-Lys-Thr-Phe-Thr-Ser-Cys (1)-OH Cyclic
Iseganan 1900.28 H-Arg-Gly-Gly-Leu-Cys (1)-Tyr-Cys (2)-Arg-Gly-Arg-Phe-Cys (2)-Val-Cys (1)-Val-Gly-Arg-NH2 Cyclic

Dataset-6 comprises human insulin, a peptide hormone containing three disulfide bridges, one of which is located within Chain A, while the other two covalently connect Chain A to Chain B (Fig 2). Data were collected on a Waters® QTOF instrument. Insulin was analyzed following incubation with IDE, a protease widely recognized for its pivotal role in degrading and inactivating insulin. The detailed experimental conditions for this dataset are documented in the referenced bibliography [16].

Fig 2. Insulin structure with the linear visualization.

The structure of insulin consists of two peptide chains known as Chain A, comprising 21 amino acids (numbered 1–21), and Chain B, comprising 30 amino acids (numbered 22–51). The A and B chains are interconnected by two disulfide bonds (highlighted in pink and light blue), and an additional disulfide bond is formed within the A Chain (highlighted in purple).

Data preprocessing

The MassMetaSite procedure consists of three steps: (a) data reading, (b) automatic detection of the chromatographic peaks related to the parent compound and its metabolites, and (c) structure elucidation by proposing a potential metabolite structure based on the fragmentation pattern for each peak detected in the previous step.

a) Data reading. Three different acquisition files need to be defined, depending on the data. Firstly, a blank file is employed to distinguish relevant signals from background peaks. This file is crucial for investigating whether a detected peak in the incubation file is attributable to the compound of interest or was already present in the incubation matrix (blank sample). Secondly, a substrate file is used to analyze the fragmentation pattern of the substrate. This step is essential in the structure elucidation process, which involves comparing the fragments assigned to the spectra of the parent compound with the spectra of potential metabolites. Lastly, the incubation file contains all the products after incubation, either in vitro or in vivo, and serves to investigate and identify metabolites formed during the incubation process.

b) Automatic detection of the chromatographic peaks. During the automated chromatographic peak detection stage, an initial spectral noise analysis is conducted. For each full scan (intensity vs. m/z), a noise level is computed by calculating the change in slope between two consecutive shortlists of ions present in the full scan, and ions below this threshold are systematically eliminated. Subsequently, the list of ions is examined across chromatographic retention times. Ions are selected based on specific m/z values to precisely determine the presence or absence of peak formation.

Following the identification of a potential peak in the incubation sample, a background analysis is performed. Specifically, for the selected m/z and retention time of the potential peak, a search is conducted to verify the presence of the peak in the blank sample. If the peak is detected in the blank, a peak alignment optimization is initiated using a combination of the Hodgkin and Pearson similarity indices, which allows a comprehensive comparison of both peak shape and intensity. The sample peak is excluded from the analysis whenever it exhibits a similar shape and equivalent (or lower) intensity relative to the blank peak. The Negative Control Area Ratio is then computed, representing the ratio between the peak area in the incubation sample and the corresponding peak area in the blank.
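
The blank comparison described above can be sketched as follows. This is a minimal illustration, not the MassMetaSite implementation: it assumes the two extracted ion traces are already aligned and evenly sampled, it uses the Pearson correlation for shape and an area ratio for intensity, and the function names, the `shape_threshold` and the `ratio_threshold` values are hypothetical choices made only for this example.

```python
import math

def pearson(a, b):
    """Pearson correlation: compares peak shape only, independent of scale."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat trace has no shape to correlate
    return cov / math.sqrt(var_a * var_b)

def hodgkin(a, b):
    """Hodgkin index 2*sum(a*b) / (sum(a^2) + sum(b^2)): sensitive to shape and magnitude."""
    denom = sum(x * x for x in a) + sum(y * y for y in b)
    return 2 * sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0

def area(trace):
    """Trapezoidal peak area, assuming evenly spaced scans (unit spacing)."""
    return sum((trace[i] + trace[i + 1]) / 2 for i in range(len(trace) - 1))

def compare_with_blank(sample, blank, shape_threshold=0.9, ratio_threshold=1.0):
    """Return similarity indices, the area ratio, and an exclusion flag.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    shape = pearson(sample, blank)
    blank_area = area(blank)
    ratio = area(sample) / blank_area if blank_area > 0 else math.inf
    return {
        "pearson": shape,
        "hodgkin": hodgkin(sample, blank),
        "area_ratio": ratio,
        # Exclude when the shape is similar and the intensity is equivalent or lower.
        "exclude": shape >= shape_threshold and ratio <= ratio_threshold,
    }

blank = [0, 1, 4, 9, 4, 1, 0]                       # hypothetical blank trace
same_as_blank = compare_with_blank(blank, blank)    # background signal
ten_fold = compare_with_blank([10 * v for v in blank], blank)  # real signal
```

In this toy case the first peak would be discarded as background, while the second keeps the same shape but has an area ratio of 10 and is retained. The Hodgkin index drops for the second peak because, unlike the Pearson correlation, it penalizes differences in magnitude.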

Subsequently, a filtered spectrum is computed by merging all the scans within the peak retention time range. This involves selecting the m/z values whose intensity correlates with the chromatographic peak shape. Each m/z value of each filtered spectrum is compared with the theoretical m/z values of the potential metabolites of the parent compound. There are two options to represent the theoretical m/z of the compound for the peak under consideration: the monoisotopic or the most abundant isotope species. Additionally, the isotope pattern derived from the metabolite formula is compared to the one from the experimental spectra, and a filter may be set on the similarity between the observed and predicted intensity for each potential isotope. In addition, m/z values from multiple charge states are also used in the analysis.
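
As an illustration of how multiple charge states enter the comparison, the sketch below computes theoretical m/z values for protonated ions and matches them against observed values within a ppm tolerance. It assumes positive-mode [M+zH]z+ ions only; the function names, the tolerance, and the observed m/z list are hypothetical. The neutral mass is the monoisotopic mass of somatostatin from Table 4.

```python
PROTON_MASS = 1.007276466  # Da

def mz_for_charge(neutral_mass, charge):
    """Theoretical m/z of the [M+zH]z+ ion (positive mode assumed)."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match_charge_states(neutral_mass, observed_mz, max_charge=5, tol_ppm=5.0):
    """Return (charge, observed m/z, ppm error) for every observed value
    that falls within tol_ppm of a theoretical charge state."""
    hits = []
    for z in range(1, max_charge + 1):
        theoretical = mz_for_charge(neutral_mass, z)
        for mz in observed_mz:
            err = ppm_error(mz, theoretical)
            if abs(err) <= tol_ppm:
                hits.append((z, mz, err))
    return hits

somatostatin_mono = 1636.7167             # Da, from Table 4
observed = [819.3660, 500.0000, 546.5795]  # hypothetical filtered spectrum
hits = match_charge_states(somatostatin_mono, observed)
matched_charges = [z for z, _, _ in hits]
```

Here the doubly and triply charged ions are matched, while the unrelated value at m/z 500 is ignored. A negative-mode version would subtract proton masses instead of adding them.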

For each selected m/z value extracted from the filtered spectra, a comprehensive metabolite classification is conducted. This classification categorizes metabolites into distinct groups, including first-generation metabolites, second or higher generation metabolites, metabolites stemming from biotransformations unrecognized by the software (referred to as “red peaks” denoting unknowns), and cases where the fragment ion may arise from ion adduct formation or in-source neutral loss.

Finally, an MS/MS evaluation is conducted, examining whether m/z values observed in the parent spectrum are present within the potential peak. The evaluation considers the shift expected from the obtained formula: a value is classified as non-shifted when the same m/z observed in the parent spectrum is also observed in the metabolite, and as shifted when the m/z of a peak in the filtered spectrum differs from that of a peak in the parent spectrum by the expected change.

The m/z values are scored according to multiple criteria: isotope similarity, retention time, MS/MS comparison and calculated m/z. Among all the values above the score threshold, the m/z that will represent the peak in the chromatogram is the one with the highest m/z value. This process results in a compiled list of peaks, each associated with an assigned m/z, retention time range, area, full scan filtered spectra, and MS/MS spectra.

c) Structure elucidation. The third stage of data processing is structure elucidation (Fig 3), during which the fragment ions obtained from the parent and those from the metabolite are compared.

Fig 3. The third step of the MassMetaSite procedure: structure elucidation.

This process has two starting points: the parent structure, or the metabolite structure which is obtained by virtual synthesis:

  1. Identification of metabolite fragments from fragmentation of the parent
    • Parent fragmentation: During this process, the parent molecule is fragmented, and the m/z of the fragments are computed. There could be more than one m/z value for a single fragment due to potential hydrogen rearrangements. Fragment structures are then associated with the spectra m/z values considering a user-specified tolerance.
    • Generation of metabolite fragments: Metabolite fragments are built from parent fragments using the atom map between metabolite and parent. The m/z of a resulting metabolite fragment may be shifted or equal to the parent m/z depending on whether the fragment contains sites of metabolism or not [17].
    • Association between parent peaks and metabolite peaks: For each parent spectrum, whether MS or MS/MS, the software checks if there are peaks with the same or shifted m/z in the associated metabolite spectrum. A shifted m/z is equal to the m/z of the parent plus the change of m/z due to the chemical modifications introduced during metabolism. This results in Substrate-Metabolite peak pairs that can be used for structural identification.
    • Matches: When substrate and metabolite fragments are identical and both peaks of the Substrate-Metabolite fragment pair have the same m/z value, the observed and calculated interpretation match. Likewise, when the metabolite fragment is different from the substrate fragment and the Substrate-Metabolite fragment pair have a shifted mass, the interpretations also match [18].
    • Mismatches: Mismatching fragments are those where the m/z is observed as non-shifted between the parent and metabolite spectra, but the atom set of the fragment corresponds to a chemical modification that would change the m/z. Similarly, a mismatch is detected when the m/z is observed as shifted between the parent and the metabolite spectra, but the atom set of the fragment corresponds to a modification that would not change the m/z of the fragment [18].
  2. Identification of metabolite fragment from the structure of the metabolite: Virtual fragments of the metabolites are generated based on a predefined list of metabolic biotransformation reactions [19].
    • Fragmentation of the metabolite: This is the same as the parent fragmentation, but the number of bonds that can be cut is usually lower, since exhaustively fragmenting every candidate metabolite has a greater computational cost.
    • Metmatches: The fragments obtained in this way that are assigned to the metabolite spectra are called metmatches. This fragmentation strategy is particularly beneficial for cyclic peptides, where the metabolite might be a linear peptide due to amide hydrolysis-induced ring opening, leading to a markedly different fragmentation pattern compared to the parent.

Scoring is done by summing the intensity for the matching peaks plus the sum of the intensity for the metmatching peaks minus the sum of the intensity for the mismatching peaks. The solutions with the highest score are auto selected by the system and reported as potential structural candidates [18].
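
The match/mismatch logic and the scoring rule above can be sketched as follows. This is a simplified reading of the text, not the software's code: the function names, the absolute m/z tolerance, and the example peak values are hypothetical, and real fragments may carry several m/z values because of hydrogen rearrangements.

```python
def classify_pair(parent_mz, metabolite_mz, contains_site, delta_mass,
                  charge=1, tol=0.005):
    """Classify one Substrate-Metabolite peak pair.

    contains_site: True if the fragment's atom set includes the site of
                   metabolism (so a shift of delta_mass/charge is expected).
    Returns "match", "mismatch", or "unrelated" (neither shifted nor equal).
    """
    observed_shift = metabolite_mz - parent_mz
    expected_shift = delta_mass / charge
    if abs(observed_shift) <= tol:
        observed = "non-shifted"
    elif abs(observed_shift - expected_shift) <= tol:
        observed = "shifted"
    else:
        return "unrelated"
    expected = "shifted" if contains_site else "non-shifted"
    return "match" if observed == expected else "mismatch"

def score_candidate(assigned_peaks):
    """Score = sum I(match) + sum I(metmatch) - sum I(mismatch).

    assigned_peaks: iterable of (intensity, label) tuples.
    """
    sign = {"match": 1, "metmatch": 1, "mismatch": -1}
    return sum(sign.get(label, 0) * intensity for intensity, label in assigned_peaks)

HYDROXYLATION = 15.9949  # Da, +O; used here only as an example biotransformation

# Shifted fragment that contains the modified site: consistent.
pair_a = classify_pair(300.1000, 316.0949, contains_site=True, delta_mass=HYDROXYLATION)
# Shifted fragment that should not contain the site: inconsistent.
pair_b = classify_pair(300.1000, 316.0949, contains_site=False, delta_mass=HYDROXYLATION)
# Unshifted fragment that does not contain the site: consistent.
pair_c = classify_pair(150.0500, 150.0501, contains_site=False, delta_mass=HYDROXYLATION)

score = score_candidate([(100.0, "match"), (50.0, "metmatch"),
                         (30.0, "mismatch"), (20.0, "unassigned")])
```

A candidate structure that places the modification on the wrong monomer accumulates mismatches and is penalized, which is how the score separates positional isomers.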

Each experiment consisted of a set of samples, i.e., one sample per incubation time point per matrix. MassMetaSite processes each sample as a separate entity, and thus generates three main pieces of information for each sample: metabolic scheme, spectrometry data (product ion assignment) and outcomes (retention time, MS area, MS relative area, collision cross section, and parts per million (ppm) mass error) for each found component. WebMetabase then consolidates all these data from the individual files into a single interpretation for the entire experiment (time/matrix) and analyses which metabolite peaks from each sample can be clustered based on their retention time and m/z.
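
The consolidation step can be pictured with a simple greedy grouping of peaks by m/z and retention time. The sketch below is only an assumption about how such clustering might look; the tolerances, field names, and peak values are hypothetical and do not reflect WebMetabase internals.

```python
def cluster_peaks(peaks, rt_tol=0.1, mz_tol=0.01):
    """Greedily group peaks whose m/z and retention time both fall within
    tolerance of the first peak in an existing cluster.

    peaks: list of dicts with "sample", "mz" and "rt" (minutes) keys.
    """
    clusters = []
    for peak in sorted(peaks, key=lambda p: (p["mz"], p["rt"])):
        for cluster in clusters:
            ref = cluster[0]
            if abs(peak["mz"] - ref["mz"]) <= mz_tol and abs(peak["rt"] - ref["rt"]) <= rt_tol:
                cluster.append(peak)
                break
        else:
            clusters.append([peak])  # no compatible cluster found: start a new one
    return clusters

peaks = [
    {"sample": "t0",  "mz": 819.3656, "rt": 5.21},
    {"sample": "t60", "mz": 819.3661, "rt": 5.24},
    {"sample": "t60", "mz": 655.3012, "rt": 3.80},
]
clusters = cluster_peaks(peaks)
cluster_sizes = sorted(len(c) for c in clusters)
```

The two peaks near m/z 819.366 are treated as the same component observed at two time points, so their areas can be followed across the incubation.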

Settings/Structure visualization

In this study, data have been processed with distinct algorithms, establishing the groundwork for a comprehensive comparison among them. This research is focused on three crucial dimensions:

-Peak detection (Monoisotopic Mass and Most Abundant Mass). Different peak detection algorithms are employed depending on the molecular size. The MiM corresponds to the isotopic peak with the lowest mass-to-charge (m/z) ratio and is calculated using the mass of the lightest isotope of each element present in the molecule. It is particularly useful for accurately determining the molecular formula, especially for smaller molecules [20]. Conversely, the MaM corresponds to the most intense peak of the molecule’s isotopic distribution, which reflects the natural abundance of all isotopes in the molecule, not just the lightest ones.

For larger molecules, or when the monoisotopic ion is undetectable, the MaM is employed for peak detection. As molecular size increases, so does the probability that the molecule contains at least one heavy isotope atom (mainly 13C). Consequently, the MiM peak may be much more difficult to detect than the MaM peak. In addition, MaM peaks are typically the ones selected for triggering MS/MS scans in DDA when no preferred list is provided to the acquisition software.
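
The effect of molecular size on the isotopic envelope can be reproduced with a small calculation. The sketch below is an approximation and not the algorithm used in the software: it bins isotopologues by their number of extra neutrons, uses rounded natural abundances for C, H, N, O and S only, and estimates the MaM by adding a fixed spacing of about 1.00335 Da per extra neutron. All names are hypothetical.

```python
# element -> (monoisotopic mass in Da, [(extra neutrons, natural abundance), ...])
ISOTOPES = {
    "C": (12.0,       [(0, 0.9893),   (1, 0.0107)]),
    "H": (1.00782503, [(0, 0.999885), (1, 0.000115)]),
    "N": (14.0030740, [(0, 0.99636),  (1, 0.00364)]),
    "O": (15.9949146, [(0, 0.99757),  (1, 0.00038), (2, 0.00205)]),
    "S": (31.9720707, [(0, 0.9499),   (1, 0.0075),  (2, 0.0425), (4, 0.0001)]),
}
NEUTRON_SPACING = 1.00335  # approximate spacing between isotopic peaks, Da

def monoisotopic_mass(formula):
    """MiM: every atom taken as its lightest isotope."""
    return sum(ISOTOPES[element][0] * count for element, count in formula.items())

def offset_distribution(formula, max_offset=30):
    """Probability of each number of extra neutrons, built atom by atom."""
    dist = [1.0] + [0.0] * max_offset
    for element, count in formula.items():
        isotopes = ISOTOPES[element][1]
        for _ in range(count):
            new = [0.0] * (max_offset + 1)
            for i, p in enumerate(dist):
                if p == 0.0:
                    continue
                for shift, abundance in isotopes:
                    if i + shift <= max_offset:
                        new[i + shift] += p * abundance
            dist = new
    return dist

def most_abundant_offset(formula):
    """Number of extra neutrons of the most intense isotopic peak."""
    dist = offset_distribution(formula)
    return max(range(len(dist)), key=dist.__getitem__)

def most_abundant_mass(formula):
    """Approximate MaM: MiM plus the offset times the peak spacing."""
    return monoisotopic_mass(formula) + most_abundant_offset(formula) * NEUTRON_SPACING

somatostatin = {"C": 76, "H": 104, "N": 18, "O": 19, "S": 2}  # formula from Table 4
mim = monoisotopic_mass(somatostatin)
small_offset = most_abundant_offset({"C": 1, "H": 4})   # methane
large_offset = most_abundant_offset({"C": 300})         # hypothetical all-carbon species
```

For the somatostatin formula, the computed MiM agrees with the value listed in Table 4. For a small molecule the monoisotopic peak is the most intense one (offset 0), whereas for the hypothetical 300-carbon species the most intense peak lies three neutrons above the monoisotopic peak, which illustrates why a MiM-based search loses sensitivity as size grows.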

In this study all datasets have been processed with both the MiM and MaM algorithms, except dataset-5 that has been exclusively subjected to processing with the MaM settings.

-Acquisition modes (Data-dependent acquisition and Data-independent acquisition). LC-HRMS stands as the preferred method for MetID, with DDA being a commonly used strategy in MS data acquisition. In DDA, precursor ions selected based on their abundances are often employed to drive MS/MS. In contrast, DIA methods, such as MSE and HDMSE, eliminate the risk of overlooking metabolites by avoiding precursor ion selection [21]. DIA HDMSE is a method that combines ion mobility separation with MSE data acquisition. It alternates between low and high collision energy ion mobility spectrometry-mass spectrometry scans, enabling accurate mass measurements of both precursor and product ions simultaneously. In contrast to DDA, where a specific m/z must be isolated before fragmentation, DIA provides more complex but more complete datasets.
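As a rough illustration of abundance-driven precursor selection in DDA, the sketch below picks the most intense ions of a survey scan. Real acquisition software applies additional rules (for example dynamic exclusion or inclusion lists) that are not modelled here; the function name, parameters, and scan values are hypothetical.

```python
def select_precursors(survey_scan, top_n=3, min_intensity=0.0):
    """Return the m/z values of the top_n most intense ions in a survey scan.

    survey_scan is a list of (mz, intensity) tuples.
    """
    candidates = [(mz, inten) for mz, inten in survey_scan if inten >= min_intensity]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Invented survey scan: the low-abundance ions at m/z 500.1 and 720.4
# would not trigger MS/MS when only two precursors are selected.
scan = [(500.1, 1e4), (650.3, 5e5), (720.4, 2e3), (810.2, 8e4)]
print(select_precursors(scan, top_n=2))  # [650.3, 810.2]
```

The example makes the limitation mentioned in the text concrete: ions that never rank among the most abundant are never fragmented, which is the risk that DIA avoids.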

Data from dataset-3 was acquired employing the two predetermined strategies, DIA and DDA, facilitating a comparison of outcomes obtained from both acquisition modes. Settings used for the processing of DIA (MSE/HDMSE) and DDA data for somatostatin synthetic analogues are presented in S6-S9 Files.

-Structure visualization (Expanded and non-expanded). Two visualization options are available for representing the structure of polymeric compounds such as peptides or oligonucleotides during data analysis. Monomers of the compound can be depicted either in an expanded form, revealing all atoms and intermonomer bonds, or in a non-expanded form, where the structure is represented by linking the monomer acronyms. In this study, dataset-4 was processed using both visualization options, enabling a comparative analysis of processing time and providing an illustrative example of how metabolite structures are visualized after metabolic reactions using both approaches.

The choice between expanded and non-expanded monomers has an impact on structure visualization. The non-expanded mode shows the monomer symbol, making it simpler for the user to identify the structure and the place where the biotransformation takes place, and it is therefore the recommended option. Nevertheless, it also has implications for the computation process. The part of the structure that is represented as monomers does not undergo virtual metabolite structure generation: the biotransformation is applied at the monomer level and not at the atomic level. The resulting compound is therefore not a valid chemical structure, since there is no information on the exact chemical structure obtained after the reaction. The part of the structure that is represented as atoms/bonds undergoes a typical virtual reaction, and a defined chemical structure is obtained for each potential metabolite. In the part of the molecule represented as monomers, fewer chemical structures need to be constructed during the calculation, resulting in reduced computation time. Fragmentation is also treated differently: for the part of the molecule treated as monomers, only the typical a, b, c and x, y, z fragmentation is considered, which reduces the number of potential fragments generated and thereby decreases time and memory consumption. For the rest of the molecule, treated as atoms/bonds, all the bonds are disconnected to generate fragments, which yields a larger number of fragments.
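To give a feeling for why the two modes differ so much in cost, the following sketch compares the number of backbone ions for a linear chain with the number of ways of choosing bonds to disconnect in an atom-level model. It is a counting illustration under simplifying assumptions (a linear chain, at most two simultaneous bond cuts, a hypothetical bond count), not a description of how the software enumerates fragments.

```python
from math import comb


def backbone_fragment_count(n_monomers):
    """Monomer-level fragmentation of a linear chain: each of the n - 1
    junctions between monomers gives one a, b, c, x, y and z ion."""
    return 6 * (n_monomers - 1)


def bond_cut_combinations(n_bonds, max_cuts=2):
    """Atom-level fragmentation: number of ways to choose between 1 and
    max_cuts bonds to disconnect (candidate cuts, not unique fragments)."""
    return sum(comb(n_bonds, k) for k in range(1, max_cuts + 1))


# A 10-monomer linear chain versus a hypothetical 150-bond expanded structure.
print(backbone_fragment_count(10))         # 54
print(bond_cut_combinations(150))          # 11325
print(bond_cut_combinations(150, max_cuts=3))
```

The monomer-level count grows linearly with chain length, whereas the atom-level count grows combinatorially with the number of bonds and the number of simultaneous cuts allowed.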

Furthermore, there exists the option to work with a combination of both visualizations within the molecular structure. This can be achieved by selectively choosing which segments of the molecule to expand or maintain in a non-expanded state.

Data analysis

Following data consolidation, manual data interpretation by the user is conducted for peak selection and structure elucidation steps, applying diverse data analysis criteria to systematically eliminate any potential false positive metabolites. These criteria are:

Peak selection

  • MS area (%): Only peaks with a relative area above 0.5% are reported.

  • Difference between observed and calculated m/z (amu, ppm): For the MS signal, the system computes the difference between the observed and the computed m/z. The observed m/z is derived from the m/z values found in the different scans, yielding a value that is compared with the one from the vendor software package in order to account for effects such as peak saturation and loss of accuracy at the top of the peak. A difference of less than 10 ppm between observed and computed values is required [22].

  • Value of isotopic all similarity: Quantifying the match between observed and expected isotopic patterns for peaks, where a low value suggests pattern variability.

  • Negative control area ratio: Establishing the ratio between peak areas in the incubation sample and the blank, with a signal observed in both considered non-specific.

  • Kinetics: Reflects changes in metabolite abundance over time. At time 0 (t = 0), when the incubation begins, the cluster chart should show only ions related to the parent compound. There should be no signals corresponding to metabolites at this point, as no biotransformation has occurred yet. The profile of a first-generation metabolite usually has an exponential shape, as it is starting to be formed. If the metabolite is further metabolized, its signal will decrease, since it is consumed to generate a second-generation metabolite. Typically, a second-generation metabolite has a sigmoidal profile, since the first-generation metabolite must be formed before it can be further metabolized [11].

  • Shape of the metabolite peak: Ideally, metabolite peaks should exhibit a Gaussian shape; however, in practice, peak tails may occasionally occur. It is important to distinguish these from peaks that resemble background noise or exhibit irregular shapes, such as broad or asymmetric profiles, which may suggest contamination or interference rather than the presence of a true metabolite [23].
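The quantitative criteria in the list above can be expressed as a simple filter. The sketch below covers only the relative area threshold, the ppm mass error, and the negative control check; the isotopic similarity, kinetics, and peak shape criteria are judged by the user and are not modelled. Function and parameter names are hypothetical.

```python
def ppm_error(observed_mz, computed_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - computed_mz) / computed_mz * 1e6


def passes_peak_selection(relative_area_pct, observed_mz, computed_mz,
                          area_in_blank, min_area_pct=0.5, max_ppm=10.0):
    """Apply the three numeric criteria to one candidate metabolite peak."""
    if relative_area_pct <= min_area_pct:
        return False  # relative MS area not above 0.5%
    if abs(ppm_error(observed_mz, computed_mz)) >= max_ppm:
        return False  # mass error not below 10 ppm
    if area_in_blank > 0:
        return False  # also present in the negative control: non-specific
    return True


print(passes_peak_selection(1.2, 1000.004, 1000.000, area_in_blank=0.0))  # True
print(passes_peak_selection(0.3, 1000.004, 1000.000, area_in_blank=0.0))  # False
```

Treating any signal in the blank as disqualifying is a simplification; in practice a ratio between the incubation and blank areas is inspected, as stated in the list.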

Structure elucidation.

The second step of the algorithm proposes potential metabolite structures based on the fragmentation pattern for each peak detected in the peak selection step. Fig 4 illustrates the MS Spectra data interpretation window, highlighting the analysis of fragment structures used to generate the score, including the count of matches and mismatches.

Fig 4. Fragmentation pattern for the M2-38 metabolite of oxytocin after 120 minutes of incubation with chymotrypsin.

Fig 4

On the left, full scan/data-dependent MS/MS spectra for oxytocin and M2-38 are presented, while on the right, a subset of fragment structures derived from the selected matched peaks is displayed.

This window allows for a comparison to determine if the metabolites exhibit a similar fragmentation pattern compared to the substrate fragmentation. Metabolite fragment ions may either share the same m/z as a parent fragment ion (non-shifted ion) or exhibit a defined mass shift (shifted ion).
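The distinction between shifted and non-shifted ions can be sketched as a comparison of each metabolite fragment with the parent fragment list. The example below is a simplified illustration, not the scoring procedure of the software; the function name, the tolerance, the fragment m/z values, and the +15.9949 Da shift (an oxygen addition, chosen only as an example) are assumptions.

```python
def classify_fragment(metabolite_mz, parent_fragment_mzs, mass_shift,
                      charge=1, tol_ppm=10.0):
    """Label a metabolite fragment ion relative to the parent fragments."""

    def close(a, b):
        return abs(a - b) / b * 1e6 <= tol_ppm

    for parent_mz in parent_fragment_mzs:
        if close(metabolite_mz, parent_mz):
            return "non-shifted"  # same m/z as a parent fragment
        if close(metabolite_mz, parent_mz + mass_shift / charge):
            return "shifted"      # parent fragment plus the metabolic mass shift
    return "unassigned"


parent_fragments = [285.1234, 723.3456]  # invented parent fragment m/z values
shift = 15.9949                          # example mass shift in Da

print(classify_fragment(285.1235, parent_fragments, shift))  # non-shifted
print(classify_fragment(739.3405, parent_fragments, shift))  # shifted
print(classify_fragment(400.0000, parent_fragments, shift))  # unassigned
```

A fragment that keeps the parent m/z does not contain the modified site, while a shifted fragment does, which is what allows the biotransformation to be localized.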

The MS and MS/MS spectra contain 5 types of fragments:

  • Black peaks: These peaks lack fragment assignments in the parent and have no effect on the interpretation of the metabolite under consideration.

  • Red peaks: Represent matching peaks, and their structural interpretation aligns with the proposed metabolite structure. Clicking on red peaks reveals the assigned structure in the right panel.

  • Cyan peaks: Indicate mismatching peaks, and their structural interpretation contradicts the proposed metabolite structure.

  • Coral peaks: Correspond to metabolite matching peaks with structural information consistent with the proposed metabolite structure. However, they lack a substrate fragment match, resulting from manual editing or MassMetaSite if metabolite fragmentation is selected in the settings.

  • Light green peaks: Denote metabolite mismatching peaks, providing structural information contrary to the proposed structure under study. These peaks lack substrate peak matches and stem from the propagation of a manually edited peak.

It is essential to consider the isotope pattern and ensure that it aligns with the expected charge state of the metabolite. The charge of the ion significantly influences the spacing between isotopic peaks, and deviations in the observed pattern may serve as indicators of errors in charge assignment or other issues.
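The relation between charge and isotopic spacing can be made explicit: adjacent isotopic peaks are separated by approximately 1.00335/z on the m/z axis, where 1.00335 Da is the mass difference between 13C and 12C. A minimal sketch, with a hypothetical function name and invented peak positions:

```python
C13_C12_DIFF = 1.00335  # mass difference between 13C and 12C in Da


def charge_from_spacing(isotope_mzs):
    """Estimate the charge state from the mean spacing of consecutive
    isotopic peaks, given in increasing m/z order."""
    spacings = [b - a for a, b in zip(isotope_mzs, isotope_mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(C13_C12_DIFF / mean_spacing)


print(charge_from_spacing([500.0000, 501.0034]))            # 1
print(charge_from_spacing([600.0000, 600.5017, 601.0034]))  # 2
print(charge_from_spacing([858.4300, 858.7645, 859.0989]))  # 3
```

If the charge estimated in this way disagrees with the charge assumed for the proposed metabolite, the assignment deserves a closer look.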

Furthermore, the structural assignment of the isotope pattern peaks is checked manually. If the structure assignment of a match or mismatch peak is not the expected one, it can be removed from the analysis, and the score is then recalculated. In addition, black peaks can be examined, and structural information can be added using the fragment structure editor if considered appropriate.

Processing time.

In this study, the data processing time has also been collected, encompassing the time required for importing data into WebMetabase. Notably, dataset-5 facilitated a comparison with processing times previously reported in the literature [15], which were obtained with an earlier version (2021) of the same software. A comparison of the processing time has also been conducted between the different algorithms and settings outlined in the Data Preprocessing section, since the processing time may vary depending on the peak detection algorithm employed as well as on the choice of compound representation (expanded, non-expanded, or mixed).

Results and discussion

This section presents the experimental results obtained through the application of our approach and algorithms to perform the MetID of the five distinct peptide datasets and an oligonucleotide dataset. All these metabolite structural assignments have been checked manually and are considered reliable because the fragmentation was adequate, the isotope pattern was as expected, the difference between the observed and theoretical m/z was small (<10 ppm), and the score was high.

Monoisotopic mass and most abundant mass

One of the primary objectives of this study is to conduct a comprehensive comparison between the two algorithms, MiM and MaM. To achieve this goal, datasets 1, 2, 3, 4 and 6 as previously outlined, have undergone processing with both algorithm configurations. Table 6 presents the number of identified metabolites corresponding to each dataset, based on the employed algorithm.

Table 6. Number of identified metabolites for each dataset, considering the algorithm, incubation conditions, and acquisition mode (in case of dataset-3).

DATASET-1 INCUBATION CONDITIONS
Trypsin Chymotrypsin Pancreatic Elastase Pepsin
MiM 34 42 39 35
MaM 36 45 43 37
DATASET-2 INCUBATION CONDITIONS
DPP-4 NEP
MiM 26 4
MaM 27 4
DATASET-3 ACQUISITION MODE
DDA DIA
MiM 50 111
MaM 50 111
DATASET-4 STRUCTURE VISUALIZATION
NON-EXPANDED
MiM 7
MaM 11
DATASET-5 INCUBATION CONDITIONS
Trypsin MMP12 Neutrophil Elastase CatG
MaM 31 60 70 77
DATASET-6 INCUBATION CONDITIONS
IDE
MiM 8
MaM 12

Notable differences between MiM and MaM algorithms are observed in compounds such as calcitonin from dataset-1 or taspoglutide from dataset-2. These variations are attributed to the larger peptide structures of these compounds. As molecular size increases, the relative intensity of the MiM tends to decrease. In such cases, the use of the MaM algorithm provides a more precise MetID in larger peptides.

The analysis of dataset-1 resulted in the identification of 150 metabolites with the MiM algorithm and 161 metabolites with the MaM algorithm. Calcitonin, a cyclic peptide, is one of the largest peptides in this dataset (3429.71 Da). The same 6 metabolites were identified with both settings: M1-2178, M2-2309, M3-1981, M4-1852, M5-499, and M6-1739, with respective retention times of 1.86, 1.91, 2.45, 2.47, 2.52, and 2.99 minutes. However, there is a noticeable difference between the two settings in the score values (Table 7). A higher score indicates a better match between the theoretical product ion m/z values and the observed m/z values in the MS/MS spectrum and therefore a more confident structure prediction. This scoring system helps in distinguishing reliable matches from potential false positives.

Table 7. Retention times of the identified Calcitonin metabolites along with their corresponding values for score, matches, mismatches, and metmatches obtained using both algorithms.

RT
(minutes)
Most abundant mass Monoisotopic mass
Score Matches Mismatches MetMatches Score Matches Mismatches MetMatches
1.86 445.4 2 1 13 126.1 1 0 0
1.91 928.3 10 0 17 523.4 6 0 0
2.45 914.2 12 2 28 175.5 2 0 1
2.47 1062.3 13 1 40 197.1 0 0 0
2.52 1864.0 17 1 19 454.9 6 0 0
2.99 1177.4 23 3 30 234.5 2 0 0

The dataset-2, consisting of GLP-1 and four synthetic analogues, comprises linear peptides with a molecular weight exceeding 3000 Da, thereby accentuating the significant differences when utilizing MaM or MiM algorithms. This contrast is evident in the case of taspoglutide, as illustrated below.

Taspoglutide (3338.71 Da) incubated with DPP-4 yielded 15 metabolite peaks with the MaM settings (M1-2175, M2-2163, M3-1966, M4-2223, M5-1925, M6-1895, M7-2222, M8-2154, M9-2255, M10-2094, M11-2147, M12-1977, M13-1396, M14-1146 and M15-407), with retention times of 2.69, 3.54, 3.74, 3.88, 3.96, 4.00, 4.26, 4.47, 4.48, 4.54, 4.54, 4.92, 5.27, 5.90, and 6.80 minutes, respectively. In contrast, 14 metabolites were identified using the MiM settings: the same as with MaM except M6-1895 (retention time 4.00 minutes). Eight of the metabolites correspond to first-generation products (from a single reaction) and are indicated by the green color of the peak, as shown in Fig 5. The other seven metabolites, colored brown, result from multiple enzymatic reactions. A score is calculated and reported for each metabolite. The larger number of matches in the MaM analysis contributes to higher maximum score values, and these higher scores convey a greater level of confidence in the results. For example, metabolite M4-2223 obtains a score of 1302.1 with 28 matching fragments with MaM, whereas the same metabolite obtains a score of 807.1 with 17 matching fragments with MiM. Other results are shown in the supporting information.

Fig 5. Extracted ion chromatograms of Taspoglutide after 24 hours of incubation with DPP-4, using both algorithms.

Fig 5

(Blue peak: represents the parent peptide compound, green peaks: first generation of metabolites, and brown peaks: second generation or higher).

The peptide GLP-1 (3297.68 Da) exhibits a brief half-life, primarily attributed to its swift degradation by proteases DPP-4 and NEP. MetID of GLP-1, incubated with DPP-4, revealed the presence of three metabolites: M1-137, M2-394, and M3-208, with respective retention times of 6.53, 6.58, and 6.66 minutes. Notably, M3-208 exhibits the common cleavage site reported in bibliography [24] and attributed to DPP-4, occurring between Ala (8) and Glu (9). A discernible distinction between the two algorithms lies in the appearance of false positives, as shown in Fig 6, with a notable increase observed when employing the MiM settings.

Fig 6. False positives of the GLP-1 compound using both the MaM and MiM algorithms.

Fig 6

It is noteworthy that the number obtained with the MiM algorithm is significantly higher.

Semaglutide, a GLP-1 analogue, underwent data collection using the HDMSE acquisition mode on a Waters® QToF instrument. Structural assignments for two degradation products with both algorithms MiM and MaM, namely M1-3446 and M2-3418, have been achieved with high mass accuracy, featuring retention times of 2.77 and 3.07 minutes, respectively (Fig 7). Consistent with prior bibliography, these metabolites arise from three distinct metabolic modifications, specifically induced by amide hydrolysis and sequential beta-oxidation in the fatty acid part [25].

Fig 7. Metabolites identified and extracted ion chromatogram of Semaglutide using MiM algorithm.

Fig 7

Dataset-3 (comprising somatostatin and seven synthetic analogs incubated with human serum) was analysed with different acquisition modes in order to illustrate that the MetID workflow can use data coming from distinct mass spectrometry acquisition techniques such as DIA and DDA.

DDA data were collected with a Thermo Scientific Q-Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer (Q-Exactive) employing full scan mode, and DIA HDMSE data were acquired using a Vion IMS QTof Mass Spectrometer. Both datasets were processed through Mass-MetaSite and subsequently uploaded to WebMetabase for visualization via the Mass-MetaSite Batch Processor.

DDA and DIA data underwent processing with both algorithms (MiM and MaM). The results obtained show no distinctions. The identified metabolites, score values, and various parameters such as the numbers of matches, mismatches, and metmatches remain consistent across both algorithms. Considering the minimal chemical or monomer modifications within the peptide structure of these compounds, no substantial shift in molecular size was observed in this dataset.

The analysis of this dataset collected with DDA led to the identification of 17 metabolites with each of the algorithms. All the identified metabolites were produced by amide hydrolysis. The principal metabolites observed result from the loss of Ala (−71 Da) and AlaGly (−128 Da) from the linear segment of the structure (Fig 8). The incorporation of D-Trp at the eighth position improved stability relative to the parent compound somatostatin, as the synthetic analogs avoid the ring opening observed between D-Trp (8) and Lys (9). This observation aligns with previous literature, which highlighted that the introduction of Msa residues, coupled with the presence of D-Trp8, enhances the aromatic side-chain interactions in somatostatin, providing greater stability [13].
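The nominal mass shifts quoted above follow directly from the residue masses of the amino acids removed by amide hydrolysis. As a small worked example, the sketch below uses the standard monoisotopic residue masses of glycine and alanine; the dictionary and function names are hypothetical.

```python
# Standard monoisotopic residue masses (amino acid minus water), in Da.
RESIDUE_MONO_MASS = {
    "Gly": 57.02146,
    "Ala": 71.03711,
}


def loss_shift(residues):
    """Mass shift (Da) when the listed residues are cleaved off the peptide."""
    return -sum(RESIDUE_MONO_MASS[r] for r in residues)


print(round(loss_shift(["Ala"]), 5))         # -71.03711
print(round(loss_shift(["Ala", "Gly"]), 5))  # -128.05857
```

Rounded to the nominal mass, these shifts are −71 and −128 Da, matching the metabolite labels M6-71 and M5-128.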

Fig 8. Somatostatin (Parent compound) and major metabolites identified using both algorithms.

Fig 8

Metabolites M5-128 and M6-71 indicate cleavages from the tail portion of somatostatin. Additionally, M3 + 18 represents a ring-opening product occurring between DTrp (8) and Lys (9).

Similarly, for DIA data, the identification of key metabolites, specifically -Ala and -AlaGly, is consistent. As previously documented in bibliography [26], the analog labeled as 95 demonstrates superior stability, characterized by delayed and reduced metabolic transformations compared to other analogs. This stability is further elucidated in Fig 9, which delineates the time/response profiles of the substrate, illustrating the gradual disappearance of the peptide.

Fig 9. Substrate profiles employing In-In scaling for somatostatin, Analogue 31, 65, and 95.

Fig 9

Dataset-5 contains 16 linear and 12 cyclic peptides, incubated with cathepsin G, neutrophil elastase, trypsin and MMP-12. The data were collected using LC-HRMS, with analysis performed on a Synapt G2® high-definition quadrupole time-of-flight mass spectrometer (Waters®), operating in positive electrospray ionization mode.

The data processing time, employing the settings outlined in the referenced study [13] and the non-expanded structure visualization, has been substantially reduced. As an illustration, the compound salmon calcitonin, which previously needed two hours for processing, now requires only 25 minutes with the new methodology.

As an illustrative example of this dataset, somatostatin and its analogs are described here, while the MetID for the other compounds can be found in the Supplementary Information. Specifically, the dataset includes somatostatin and analogs that have been synthesized over the past few decades by introducing modifications such as exchange and deletion of amino acids, ring size reduction, or disulfide bridge modification, among others [13]. These analogs, namely octreotide, lanreotide, and vapreotide, are octapeptides characterized by a shorter and consequently less flexible ring structure compared to somatostatin.

Previous literature reports that, among somatostatin and its analogs, ring opening was observed only for somatostatin, as was also observed in this study [13]. Although somatostatin is rapidly degraded by proteases, its analogs are stable, as illustrated in Fig 10, which presents extracted ion chromatograms after 60 minutes of incubation with neutrophil elastase. The processing time for these compounds was 15 minutes.

Fig 10. Extracted ion chromatograms using MaM algorithms.

Fig 10

A) somatostatin, B) lanreotide, C) octreotide, and D) vapreotide after 60 minutes of incubation with neutrophil elastase.

Fig 11 presents a detailed MetID of somatostatin incubated with neutrophil elastase. The analysis identified the same metabolites as previously reported in the literature [13]: M1-1371, M2-1204, M3-230, M4 + 18, M5-909, M6 + 18, and M7-661, with respective retention times (RT) of 0.73, 1.60, 1.60, 1.71, 1.93, 1.93, and 2.21 minutes.

Fig 11. Summarized MetID reports with each retention time (RT) from incubation of somatostatin with neutrophil elastase.

Fig 11

Dataset-6 contains data for human insulin (5808 Da), a cyclic peptide with three disulfide bridges, after 2 minutes of incubation with IDE. Processing with the MaM algorithm led to the identification of 12 metabolites, designated as M1-2965, M2-3315, M3-3145, M4-2973, M5-2902, M6-3452, M7-3151, M8-3032, M9-2961, M10-3289, M11-2869, and M12-2798, with respective retention times of 2.06, 7.78, 8.08, 9.14, 9.32, 9.65, 9.80, 10.38, 11.17, 11.43, 11.69, and 12.39 minutes (Fig 12). These metabolites have been previously documented in the literature and are generated through two cleavages, one within Chain A and the other within Chain B. Notably, four of them have been reported previously as major IDE-degraded insulin fragments (Fig 13) [16]. The formation of these metabolites results from cleavages within the A chain, at positions A13-14 or A14-15, and in the middle of the B chain, at positions B9-10 or B14-15.

Fig 12. Extracted ion chromatograms of Insulin after 2 minutes of incubation with IDE.

Fig 12

Blue peak: substrate/parent peptide, green peaks: first generation metabolites, brown peaks: second generation or higher metabolites.

In contrast, MiM identified 8 metabolites, M1-3306, M2-2971, M3-3450, M4-3150, M5-2959, M6-3287, M7-2867, and M8-2618, with respective retention times of 7.74, 9.14, 9.65, 9.78, 11.15, 11.43, 11.67, and 15.65 minutes (Fig 13). Notably, two of the major products previously reported in the literature are absent [16]. Moreover, consistent with previous observations, there is a significant difference in score values between the two algorithms, with MaM scores consistently higher owing to the larger number of matches and the absence of mismatches (Table 8).

Fig 13. Four of the major products corresponding to insulin fragments, using the MaM algorithm, after incubation with IDE.

Fig 13

These metabolites, resulting from two distinct cleavages—one within Chain A and the other within Chain B—have been previously identified in the bibliography.

Table 8. Retention times of the identified Insulin metabolites along with their corresponding values for score, matches, mismatches, and metmatches obtained using both algorithms. NI = Non-identified metabolites.

RT
(minutes)
Most abundant mass Monoisotopic mass
Score Matches Mismatches MetMatches Score Matches Mismatches MetMatches
2.06 509.6 3 0 0 NI NI NI NI
7.78 1017.5 6 0 18 132.9 2 0 12
8.08 760.3 6 0 5 NI NI NI NI
9.14 816.3 6 0 8 253.3 4 0 9
9.32 767.5 6 0 2 NI NI NI NI
9.65 767.1 6 0 9 243.6 4 2 10
9.80 874.0 6 0 4 278.7 4 0 5
10.38 992.6 9 0 2 NI NI NI NI
11.17 792.4 6 0 4 319.0 2 0 3
11.43 737.5 6 0 4 223.8 4 0 4
11.69 509.6 3 0 0 210.1 4 1 7
12.39 789.8 6 0 8 NI NI NI NI
15.65 NI NI NI NI 138.9 2 0 0

Structure visualization – atoms/bonds vs monomer

The analysis of biotransformation products for therapeutic oligonucleotides using LC-HRMS presents a significant challenge, primarily attributed to the high molecular weight of these compounds. Given that these oligonucleotides consist of multiple monomers susceptible to metabolic reactions, constructing a virtual set containing all potential metabolites becomes a resource-intensive task in terms of time and computational requirements. Furthermore, the extensive number of cleavable bonds amplifies the complexity of the fragmentation analysis, demanding additional time and computing resources. This study presents a fragmentation algorithm that allows the analysis at either the monomer level (non-expanded) or the atom/bond level (expanded).

In this section, three experiments involving the incubation of ASOs in Human Liver at various timepoints are presented, comprising two sets incubated with distinct oligonucleotide strains (dataset-4). The data were acquired in DDA mode on a Thermo Q-Exactive® spectrometer.

A total of 11 metabolites have been identified in both experiments (expanded and non-expanded) using the MaM algorithm, M1-5473, M2-5473, M3-3282, M4-2567, M5-930, M6-617, M7-616, M8-313, M9-312, M10-304, M11-304, with respective retention times of 7.17, 8.62, 14.42, 16.36, 17.11, 17.59, 17.84, 17.90, 17.92, 17.95, and 18.26 (Fig 14). The identified structures of the metabolites can be attributed to specific biotransformation reactions, including o-dealkylation, phosphoester hydrolysis, aromatic deamination, and nucleobase loss.

Fig 14. Extracted ion chromatograms of ASOs after 72 hours of incubation with the modified strain.

Fig 14

In contrast, using the MiM algorithm, a total of 7 metabolites have been identified (using non-expanded visualization), M1-5470, M2-5470, M3-2566, M4-617, M5-313, M6-313, and M7-304, with respective retention times of 7.17, 8.62, 16.36, 17.55, 17.89, 17.93, and 18.24. Table 9 illustrates the score value differences between the two algorithms.

Table 9. Retention times of the identified ASO metabolites along with their corresponding values for score, matches, mismatches, and metmatches obtained using both algorithms. NI = Non-identified metabolites.

RT
(minutes)
Most Abundant Mass Monoisotopic Mass
Score Matches Mismatches MetMatches Score Matches Mismatches MetMatches
7.17 2654.9 31 0 2 835.9 15 0 0
8.62 2547.5 32 0 5 896.8 15 0 0
14.42 2250.2 31 0 11 NI NI NI NI
16.36 4445.7 62 6 33 788.3 35 0 3
17.11 4019.8 46 4 19 NI NI NI NI
17.59 5056.3 67 3 18 426.1 25 0 2
17.84 1778.9 24 0 0 NI NI NI NI
17.90 6171.2 101 6 19 541.9 35 0 5
17.92 4081.9 62 1 4 561.3 5 0 3
17.95 8142.5 112 5 38 NI NI NI NI
18.26 5423.3 83 2 21 371.5 25 0 1

In addition, a paired-samples t-test was performed on the Score values reported in Tables 7–9 (excluding metabolites not found in one of the two approaches), revealing a statistically significant difference between the MiM and MaM algorithms (p = 0.0002904).
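For readers unfamiliar with the procedure, a paired-samples t statistic can be computed as sketched below. The example uses only the calcitonin scores from Table 7, so it does not reproduce the p-value reported above, which was obtained from Tables 7–9 together. Pairs in which either score is missing (NI) are dropped, as in the text. The function name is hypothetical; converting the statistic into a p-value requires the Student t distribution, available for instance through `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired-samples t statistic and degrees of freedom.

    Pairs where either value is None (metabolite not identified by one of
    the two approaches) are excluded before the differences are computed.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)
             if a is not None and b is not None]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Scores of the six calcitonin metabolites from Table 7.
mam_scores = [445.4, 928.3, 914.2, 1062.3, 1864.0, 1177.4]
mim_scores = [126.1, 523.4, 175.5, 197.1, 454.9, 234.5]

t_stat, dof = paired_t(mam_scores, mim_scores)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

A paired test is the natural choice here because each metabolite is scored twice, once per algorithm, so the comparison is made within each metabolite rather than between two independent groups.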

In Fig 15, the two distinct structure visualizations are presented for the same identified metabolite, showcasing a nucleobase loss from the parent compound and two phosphoester hydrolyses. The depiction at the bond level provides a clearer understanding of the biotransformation pathways and chemical alterations experienced by the compound. It is noteworthy to consider the processing time, which, in this specific example, is 40 minutes for the non-expanded representation and extends to 70 minutes when three of the monomers are expanded.

Fig 15. Illustration of nucleobase loss in both expanded and non-expanded structural representations of ASO.

Fig 15

This visualization algorithm allows monomer and atom/bond notation to be combined, making it easy to see the metabolic changes in the structure. As a result, the need to expand all monomers individually is avoided, alleviating the associated high processing time. The constrained structure alignment between the substrate and the metabolite, which maintains the same orientation, facilitates the interpretation of the biotransformations that have occurred.

Processing time

The processing time is influenced by the size and molecular weight of the peptide, as shown in Table 10. For peptides with molecular weights between 3000 and 4000 Da, processing times range from 22 to 30 minutes when using the non-expanded visualization mode. In contrast, the expanded mode results in longer processing times, extending from 2 up to 8 hours. For peptides exceeding 4000 Da, the expanded mode becomes impractical due to excessive memory requirements and long processing times.

Table 10. Estimated processing times for peptides based on molecular weight and the used visualization approach.

Molecular weight (Da) Number of compounds All monomers non-expanded (minutes) All monomers expanded
< 1000 5 5 - 8 25 - 30 minutes
1000 - 1200 15 8 - 13 30 - 40 minutes
1200 - 1500 7 13 - 18 40 - 60 minutes
1500 - 3000 12 18 - 22 60 - 120 minutes
3000 - 4000 6 22 - 30 2–8 hours
> 4000 4 30 - 40 *

* Not computable due to high memory requirements and extended processing time.

This difference is due to the fragmentation method used in each mode: the non-expanded mode operates at the monomer level, limiting fragmentation to predefined ion types (e.g., a, b, c, x, y, z ions in peptide analysis), while the expanded mode simulates fragmentation at the atomic level by disconnecting all chemical bonds. As a result, the expanded approach generates a significantly higher number of theoretical fragments, increasing processing times.

Conclusions

A new automated workflow for LC-HRMS data analysis has been developed and described, addressing challenges associated with result visualization and computational time in processing incubation data of macromolecules. This approach has proved effective for the analysis of both linear and cyclic peptides containing natural or unnatural amino acids. A total of 970 metabolites have been identified across different incubation conditions and peak detection algorithms. Furthermore, its applicability extends beyond peptides, as demonstrated by successful processing of oligonucleotide data. The results have shown that the workflow can efficiently manage experimental data within a molecular weight range spanning 700–7630 Da. Importantly, its effectiveness has been validated across multiple acquisition modes, as data from both DDA and DIA acquisition have been processed.

WebMetabase was employed for the processing and visualization of data derived from six datasets using different algorithms in the data preprocessing step.

In larger molecules (>3000 Da), notable differences were observed between the MiM and MaM peak detection algorithms. The MaM approach identified a greater number of metabolites, including several that were missed by MiM but previously reported in the literature, as for example in the case of insulin. In these high-mass compounds, the MaM algorithm produced higher scoring and more numerous matches, indicating increased confidence in structural predictions. Additionally, it demonstrated a lower incidence of false positives, reinforcing its suitability for macromolecules.

Two visualization strategies for macromolecules are presented, expanded and non-expanded, which directly influence how biotransformations are computed. The non-expanded mode reduces preprocessing time by minimizing the number of chemical structures that must be generated during analysis, with processing times ranging from 5 minutes for small peptides (<1000 Da) up to 40 minutes for larger peptides (>4000 Da). In contrast, the expanded mode simulates fragmentation at the atomic level and requires processing times ranging from 25 minutes for small peptides (<1000 Da) to several hours for larger peptides. For peptides larger than 4000 Da, the expanded mode becomes impractical due to excessively long processing times and high memory requirements. Moreover, both strategies can be combined in a hybrid approach, allowing selective expansion of specific monomers while keeping others non-expanded, as illustrated in the oligonucleotide dataset. This flexibility enhances interpretability by enabling targeted bond-level investigation of biotransformations without incurring the computational cost of expanding all the monomers within the compound.

Supporting information

S1 File. Metabolite identification reports exported from WebMetabase for each compound incubated in each protease using both algorithms.

(PDF)

pone.0324668.s001.pdf (122.7KB, pdf)
S2 File. Dataset-1 and Dataset-2 MaM Settings.

(PDF)

pone.0324668.s002.pdf (51.3KB, pdf)
S3 File. Dataset-1 and Dataset-2 MiM Settings.

(PDF)

pone.0324668.s003.pdf (51.3KB, pdf)
S4 File. Dataset-2 Semaglutide MaM Settings.

(PDF)

pone.0324668.s004.pdf (51.7KB, pdf)
S5 File. Dataset-2 Semaglutide MiM Settings.

(PDF)

pone.0324668.s005.pdf (51.7KB, pdf)
S6 File. Dataset-3 MaM Settings DDA.

(PDF)

pone.0324668.s006.pdf (51.3KB, pdf)
S7 File. Dataset-3 MiM Settings DDA.

(PDF)

pone.0324668.s007.pdf (51.3KB, pdf)
S8 File. Dataset-3 MaM Settings DIA.

(PDF)

pone.0324668.s008.pdf (51.7KB, pdf)
S9 File. Dataset-3 MiM Settings DIA.

(PDF)

pone.0324668.s009.pdf (51.7KB, pdf)
S10 File. Dataset-4 MaM Settings.

(PDF)

pone.0324668.s010.pdf (51.6KB, pdf)
S11 File. Dataset-4 MiM Settings.

(PDF)

pone.0324668.s011.pdf (51.6KB, pdf)
S12 File. Dataset-5 MaM Settings.

(PDF)

pone.0324668.s012.pdf (51.6KB, pdf)
S13 File. Dataset-6 MaM Settings.

(PDF)

pone.0324668.s013.pdf (51.6KB, pdf)
S14 File. Dataset-6 MiM Settings.

(PDF)

pone.0324668.s014.pdf (51.6KB, pdf)

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

This work has been partially supported by Doctorats Industrials, AGAUR, Generalitat de Catalunya. Industrial Doctorate grant 00002/2023.

References

  • 1. Evans L, Phipps R, Shanu-Wilson J, Steele J, Wrigley S. Methods for metabolite generation and characterization by NMR. Identification and Quantification of Drugs, Metabolites, Drug Metabolizing Enzymes, and Transporters. Elsevier. 2020. p. 119–50. doi: 10.1016/b978-0-12-820018-6.00004-1
  • 2. Wu Y, Pan L, Chen Z, Zheng Y, Diao X, Zhong D. Metabolite Identification in the Preclinical and Clinical Phase of Drug Development. Curr Drug Metab. 2021;22(11):838–57. doi: 10.2174/1389200222666211006104502
  • 3. Li AP. Overview: Evaluation of metabolism-based drug toxicity in drug development. Chem Biol Interact. 2009;179(1):1–3. doi: 10.1016/j.cbi.2008.11.013
  • 4. Kania J. Analyzing drug metabolism: a key factor in drug development and safety assessment. J Drug Metab Toxicol. 2024;15:329.
  • 5. Wang L, Wang N, Zhang W, Cheng X, Yan Z, Shao G, et al. Therapeutic peptides: current applications and future directions. Signal Transduct Target Ther. 2022;7(1):48. doi: 10.1038/s41392-022-00904-4
  • 6. Moumné L, Marie A-C, Crouvezier N. Oligonucleotide Therapeutics: From Discovery and Development to Patentability. Pharmaceutics. 2022;14(2):260. doi: 10.3390/pharmaceutics14020260
  • 7. Cuyckens F, Dillen L, Cools W, Bockx M, Vereyken L, de Vries R, et al. Identifying metabolite ions of peptide drugs in the presence of an in vivo matrix background. Bioanalysis. 2012;4(5):595–604. doi: 10.4155/bio.11.333
  • 8. Mass Analytica. WebMetabase. 2023.
  • 9. Mass Analytica. MassMetaSite. 2023.
  • 10. Yao J-F, Yang H, Zhao Y-Z, Xue M. Metabolism of Peptide Drugs and Strategies to Improve their Metabolic Stability. Curr Drug Metab. 2018;19(11):892–901. doi: 10.2174/1389200219666180628171531
  • 11. Radchenko T, Brink A, Siegrist Y, Kochansky C, Bateman A, Fontaine F, et al. Software-aided approach to investigate peptide structure and metabolic susceptibility of amide bonds in peptide drugs based on high resolution mass spectrometry. PLoS One. 2017;12(11):e0186461. doi: 10.1371/journal.pone.0186461
  • 12. Radchenko T. New advances in metabolism prediction: Biotransformation of peptides and its implications in drug discovery. Barcelona: Universitat Pompeu Fabra. 2018. https://www.tdx.cat/handle/10803/665008
  • 13. Martín-Gago P, Aragón E, Gomez-Caminals M, Fernández-Carneado J, Ramón R, Martin-Malpartida P, et al. Insights into structure-activity relationships of somatostatin analogs containing mesitylalanine. Molecules. 2013;18(12):14564–84. doi: 10.3390/molecules181214564
  • 14. Basiri B, Xie F, Wu B, Humphreys SC, Lade JM, Thayer MB, Yamaguchi P, Florio M, Rock B. Introducing an in vitro liver stability assay capable of predicting the in vivo pharmacodynamic efficacy of siRNAs for IVIVC. Mol Ther Nucleic Acids. 2020;21:725–36.
  • 15. Wesche F, De Maria L, Leek T, Narjes F, Bird J, Su W, et al. Automated high-throughput in vitro assays to identify metabolic hotspots and protease stability of structurally diverse, pharmacologically active peptides for inhalation. J Pharm Biomed Anal. 2022;211:114518. doi: 10.1016/j.jpba.2021.114518
  • 16. Manolopoulou M, Guo Q, Malito E, Schilling AB, Tang W-J. Molecular basis of catalytic chamber-assisted unfolding and cleavage of human insulin by human insulin-degrading enzyme. J Biol Chem. 2009;284(21):14177–88. doi: 10.1074/jbc.M900068200
  • 17. Bonn B, Leandersson C, Fontaine F, Zamora I. Enhanced metabolite identification with MS(E) and a semi-automated software for structural elucidation. Rapid Commun Mass Spectrom. 2010;24(21):3127–38. doi: 10.1002/rcm.4753
  • 18. Cece-Esencan EN, Fontaine F, Plasencia G, Teppner M, Brink A, Pähler A, et al. Software-aided cytochrome P450 reaction phenotyping and kinetic analysis in early drug discovery. Rapid Commun Mass Spectrom. 2016;30(2):301–10. doi: 10.1002/rcm.7429
  • 19. Zelesky V, Schneider R, Janiszewski J, Zamora I, Ferguson J, Troutman M. Software automation tools for increased throughput metabolic soft-spot identification in early drug discovery. Bioanalysis. 2013;5(10):1165–79. doi: 10.4155/bio.13.89
  • 20. Soares R, Franco C, Pires E, Ventosa M, Palhinhas R, Koci K, et al. Mass spectrometry and animal science: protein identification strategies and particularities of farm animal species. J Proteomics. 2012;75(14):4190–206. doi: 10.1016/j.jprot.2012.04.009
  • 21. Radchenko T, Kochansky CJ, Cancilla M, Wrona MD, Mortishire-Smith RJ, Kirk J, et al. Metabolite identification using an ion mobility enhanced data-independent acquisition strategy and automated data processing. Rapid Commun Mass Spectrom. 2020;34(12):e8792. doi: 10.1002/rcm.8792
  • 22. Mass Analytica. https://mass-analytica.com/products/webchembase/chromatography-quality-and-multiple-signal-detection/. Accessed 2024 February 26.
  • 23. Wahab M, Patel D, Armstrong D. Peak shapes and their measurements: the need and the concept behind total peak shape analysis. LC GC North Am. 2017. Dec;12:846–53.
  • 24. Manandhar B, Ahn J-M. Glucagon-like peptide-1 (GLP-1) analogs: recent advances, new possibilities, and therapeutic implications. J Med Chem. 2015;58(3):1020–37. doi: 10.1021/jm500810s
  • 25. Jensen L, Helleberg H, Roffel A, van Lier JJ, Bjørnsdottir I, Pedersen PJ, et al. Absorption, metabolism and excretion of the GLP-1 analogue semaglutide in humans and nonclinical species. Eur J Pharm Sci. 2017;104:31–41. doi: 10.1016/j.ejps.2017.03.020
  • 26. Wrona MD, Kirk JM, Zamora I, Radchenko T, Escola A, Riera A, et al. Somatostatin analogue catabolite screening and identification using Vion IMS QTof with WebMetabase. 720006586EN. Waters. 2019.

Decision Letter 0

Yash Gupta

10 Jun 2025

PONE-D-25-21950

An automated software-assisted approach for exploring metabolic susceptibility and degradation products in macromolecules using High-Resolution Mass Spectrometry

PLOS ONE

Dear Dr. Cifuentes López,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Both reviewers have raised critical questions regarding the use of ambiguous statements such as "less time". The authors need to review the manuscript thoroughly to confirm compliance with standard scientific language supported by citations or facts. A concrete description of the novelty is also needed, as the authors have previously published similar work.

Please submit your revised manuscript by Jul 25 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Yash Gupta, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating in your Funding Statement: [This work has been partially supported by Doctorats Industrials, AGAUR, Generalitat de Catalunya. Industrial Doctorate grant 00002/2023.].

Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now.  Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement.

Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf.

4. Thank you for stating the following in the Competing Interests section: [The author declare the following competing financial interest(s): P.C. is an employee of Lead Molecular Design, S.L., and I.Z. is the CEO of the company. Lead Molecular Design, S.L. develops analytical software, including MassMetaSite and Oniro, which were used in this study.].

We note that you received funding from a commercial source: [Lead Molecular Design]

Please provide an amended Competing Interests Statement that explicitly states this commercial funder, along with any other relevant declarations relating to employment, consultancy, patents, products in development, marketed products, etc.

Within this Competing Interests Statement, please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your amended Competing Interests Statement within your cover letter. We will change the online submission form on your behalf.

5. Thank you for stating the following in the Acknowledgments Section of your manuscript: [This work was supported by the Generalitat de Catalunya and Lead Molecular Design S.L through the Industrial Doctorate grant 00002/2023.]

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: [This work has been partially supported by Doctorats Industrials, AGAUR, Generalitat de Catalunya. Industrial Doctorate grant 00002/2023.].

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1. The authors cite previous work for their software [6,7], which suggests that some automated data analysis already exists. The paper would be strengthened by highlighting the novelty of this work: is it the integration of peak detection algorithms for macromolecules (monoisotopic vs most abundant), or the visualization approach?

2. Authors mentioned two peak detection techniques (MiM vs. MaM) and using fragment match scoring. For big molecules, for instance, how were mass inaccuracies (ppm) managed? Were several charge states taken into account and deconvoluted?

3. The authors claim “substantial reductions in processing time” and “consistent identification of degradation products” compared to prior work, but these claims need quantitative support. For processing time, it would be helpful to present a table or bar chart showing the workflow’s runtime for each dataset and algorithm (MiM vs MaM), alongside the runtimes of previous methods (if available from the literature or the authors’ earlier pipelines). The statement "processing time ranged from 5 minutes to 2 hours per experiment," for instance, is helpful, but it would be better to show the specific circumstances (e.g., a large peptide took 2 h) and the extent of the improvement. Likewise, does "consistent identification" imply that the same metabolites were discovered in each replicate, and do the degradation results match previous reports?

4. Are these new algorithms and this workflow available to other researchers? It would be better to mention whether the software is open-source or otherwise accessible (and how).

5. The authors are encouraged to perform full experimental validation (e.g. NMR or orthogonal MS for each metabolite) to support the strong claims in the Conclusion and Abstract (“accurate prediction of metabolite structures”, “proven the analysis”, “greater confidence”).

Minor Comments

Ensure consistency in heading capitalization, e.g. “Materials and methods” and “Results and Discussion”. The authors are advised to follow the journal’s style and apply it throughout the manuscript.

In tables, replace “Nº of dataset” with “Dataset number” or simply “Dataset”. Avoid using “Nº” (the degree symbol) in formal tables, or define it in the caption.

Both the Abstract and the Conclusion summarize similar findings; improve both.

For consistency, write either “minutes” or “min” uniformly; the authors have used both, which needs to be corrected.

Check grammar and define acronyms at first use.

Reviewer #2: Major comments:

1. Please rewrite the introduction, including citations, that provide a scientific basis for the claims made compared to the existing literature.

2. In the introduction section, authors should:

(a) briefly explain how metabolites contribute to therapeutic efficacy, toxicity, and drug-drug interactions.

(b) highlight the importance of metabolite profiling that leads to the development of an effective drug in the introduction.

(c) explain the adverse effects of metabolites, preventing adverse drug reactions, and guiding clinical efforts.

(d) expand tools for in silico models and in vitro assays, identifying drug metabolites. Advantages and limitations of these approaches regarding the present study.

(e) give a clearer explanation of hybrid approaches, with an example. Which specific features make these software solutions faster than existing tools in terms of capabilities and performance?

(f) Please highlight any scientific report citing the unsuitability of the conventional approach for larger and more intricate molecules such as peptides, oligonucleotides and antibodies, where the challenges increase with size.

(g) Line 99:101 – Does the article aim to address the challenges in processing macromolecules only, or both macromolecules and small molecules, as described in the manuscript?

(h) Line 107:111 – The datasets comprise linear and cyclic peptides with natural and unnatural amino acids, and oligonucleotides. Can they represent the scope of application for a wide variety of compounds? Please define how the coverage of applicability follows from the compounds included in the datasets.

(i) Line 141:144 – Repeated lines.

3. Authors should provide an alternate approach when the fragmentation data does not match well with the theoretical.

4. How does this system handle data quality control and verification? Is it an entirely automated process? What threshold detail can it provide for the user to make an accurate decision?

5. These algorithms are based on fragmentation, isotope pattern, and m/z differences. Is there any possibility to combine these software with other data types for broad information, like 3D structure prediction? Also, quantitative details are required for more reliable weightage.

6. It would be great to mention the statistical analysis of the results throughout. Are these statistically significant? Also, mention these differences between the two algorithms.

7. For high score context, authors should mention a specific numerical threshold.

8. Have previous studies conducted similar studies or comparisons? Please cite them and mention your novelty.

9. MiM and MaM algorithms have the distinction of handling larger peptides. Elaborate underlying reason behind it. Different ways to process peptide size or mass could be a reason for it?

10. Please comment on how score differences reflect the practical reliability of the identification. Are the score differences substantial enough for metabolite identification? Also, mention scores for false-positive or false-negative identification. Authors should discuss the score threshold in all analysis.

11. Please comment: Is MiM less sensitive to certain types of enzyme degradation? OR the score differences related to the algorithm's sensitivity.

12. Comment on how these algorithms handle false positives.

13. You mention that the non-expanded structure visualization helped achieve the reduced processing time. Could you clarify why this approach is more efficient compared to the expanded visualization?

14. Why were particular proteases like cathepsin G, neutrophil elastase, and trypsin chosen?

15. Line 247:251 – Generations of metabolites are classified with virtual screening based on predefined biotransformation reactions and m/z ratio, but differences among metabolites due to stereochemistry and reactions outside of the considered biotransformation limit the chances of correct analysis. How do authors believe that their method has overcome these limitations?

16. Line 411: 415 – classification of contaminations and background noise by unrecognised and irregular peaks may correspond to variation among metabolites of different generations. How do authors justify their parameters outlining the peak selection?

17. Line 315:317 – Dependency on the predefined databases should be countered if the structure is not matched?

18. Could this approach be applied to high-throughput screening of peptide libraries or to analyse metabolites in clinical samples?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes:  SUMIT KUMAR

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Aug 13;20(8):e0324668. doi: 10.1371/journal.pone.0324668.r002

Author response to Decision Letter 1


10 Jul 2025

Dear Editor and Reviewers,

We sincerely thank you for your thoughtful and constructive comments on our manuscript. We have carefully considered each point raised and have revised the manuscript accordingly to address all concerns. We provide detailed responses to the academic editor and reviewers’ comments along with explanations of the changes made in the "Response to Reviewers" document.

Thank you for your time and consideration.

Attachment

Submitted filename: Response to Reviewers.docx

pone.0324668.s016.docx (60.8KB, docx)

Decision Letter 1

Yash Gupta

30 Jul 2025

An automated software-assisted approach for exploring metabolic susceptibility and degradation products in macromolecules using High-Resolution Mass Spectrometry

PONE-D-25-21950R1

Dear Dr. Cifuentes López,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yash Gupta, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed the points raised, and the manuscript can be accepted for publication in the journal.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes:  Sumit Kumar

Reviewer #2: Yes:  Bhawna Saini

**********

Acceptance letter

Yash Gupta

PONE-D-25-21950R1

PLOS ONE

Dear Dr. Cifuentes López,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yash Gupta

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 File. Metabolite identification reports exported from WebMetabase for each compound incubated in each protease using both algorithms.

    (PDF)

    pone.0324668.s001.pdf (122.7KB, pdf)
    S2 File. Dataset-1 and Dataset-2 MaM Settings.

    (PDF)

    pone.0324668.s002.pdf (51.3KB, pdf)
    S3 File. Dataset-1 and Dataset-2 MiM Settings.

    (PDF)

    pone.0324668.s003.pdf (51.3KB, pdf)
    S4 File. Dataset-2 Semaglutide MaM Settings.

    (PDF)

    pone.0324668.s004.pdf (51.7KB, pdf)
    S5 File. Dataset-2 Semaglutide MiM Settings.

    (PDF)

    pone.0324668.s005.pdf (51.7KB, pdf)
    S6 File. Dataset-3 MaM Settings DDA.

    (PDF)

    pone.0324668.s006.pdf (51.3KB, pdf)
    S7 File. Dataset-3 MiM Settings DDA.

    (PDF)

    pone.0324668.s007.pdf (51.3KB, pdf)
    S8 File. Dataset-3 MaM Settings DIA.

    (PDF)

    pone.0324668.s008.pdf (51.7KB, pdf)
    S9 File. Dataset-3 MiM Settings DIA.

    (PDF)

    pone.0324668.s009.pdf (51.7KB, pdf)
    S10 File. Dataset-4 MaM Settings.

    (PDF)

    pone.0324668.s010.pdf (51.6KB, pdf)
    S11 File. Dataset-4 MiM Settings.

    (PDF)

    pone.0324668.s011.pdf (51.6KB, pdf)
    S12 File. Dataset-5 MaM Settings.

    (PDF)

    pone.0324668.s012.pdf (51.6KB, pdf)
    S13 File. Dataset-6 MaM Settings.

    (PDF)

    pone.0324668.s013.pdf (51.6KB, pdf)
    S14 File. Dataset-6 MiM Settings.

    (PDF)

    pone.0324668.s014.pdf (51.6KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0324668.s016.docx (60.8KB, docx)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLOS One are provided here courtesy of PLOS
